linux: improve openat and openat2 handling by cyphar · Pull Request #939 · RsyncProject/rsync

cyphar · 2026-06-01T01:24:47Z

When looking at the usage of openat and openat2 on Linux, I noticed two bugs:

openat(2) famously does not return -EINVAL for unknown flag bits on Linux, so the fallback code to use O_RESOLVE_BENEATH if defined would result in all open operations with the "secure resolver" becoming insecure. This would require a fairly convoluted build environment to trigger, so this isn't exploitable but if it were triggered it would be a security bug.
The usage of openat2(2) did not handle -EAGAIN at all, which could lead to spurious errors. openat2(2) can return -EAGAIN when trying to walk into .. during a scoped lookup like RESOLVE_BENEATH and there was a rename or mount operation anywhere else on the system. The reason for this behaviour is a little esoteric but in short I implemented it this way so that userspace could choose their retry policy and (more importantly) the kernel could avoid potentially-unbounded retries that could lead to a DoS. The man page does mention this but it is easy to miss:

EAGAIN how.resolve contains either RESOLVE_IN_ROOT or RESOLVE_BENEATH, and the kernel could not ensure that a ".." component didn't escape (due to a race condition or potential attack). The caller may choose to retry the openat2() call.
It is possible for some of the concerns raised in other threads about instability with 3.8.3 were caused by this, but nobody has posted strace logs so it's hard to know for sure, and I haven't had time to test this myself.

I also took a quick look at the path_has_dotdot_component usage and I'm not really convinced that it is actually doing anything useful. The comments seem to be Claude-authored so I'm not sure reading them is super helpful, but the upshot of:

	/* Reject any path with a literal ".." component (bare "..",
	 * "../foo", "foo/..", "foo/../bar", "subdir/.."). The previous
	 * substring-based check caught only "../" prefix and "/../"
	 * substring; bare ".." and trailing "/.." escape on the per-
	 * component walk fallback used by NetBSD/OpenBSD/Solaris/Cygwin
	 * and pre-5.6 Linux. RESOLVE_BENEATH on Linux/FreeBSD/macOS
	 * catches some of these in-kernel with EXDEV, but the front
	 * door must reject them consistently with EINVAL across all
	 * platforms so callers can rely on the validation. */

appears to be that .. is problematic and RESOLVE_BENEATH catches "some of these". This is is incorrect -- .. cannot escape the starting point with RESOLVE_BENEATH (and O_RESOLVE_BENEATH as it was modeled after RESOLVE_BENEATH), regardless of where you put it in the path. However, you can walk into .. during a lookup without issue if it remains within the starting root.

Reading the rest of the code, it seems this was actually done because the fallback userspace resolver cannot handle .. and so the decision was to ban it for everything for "consistency" but this doesn't really make sense -- symlinks can contain .. so there will always be an inconsistency between the resolver that rejects all symlinks and those that can handle them. It seems like it would be more helpful to most users to permit .. for RESOLVE_BENEATH and only block it in the resolver.

My final patch does this, but I can drop it if you prefer.

For what it's worth, there actually are safe ways of handling .. in userspace. libpathrs (shameless plug!) does this on Linux using /proc/self/fd/... to check the logical path against the real path in a very solid way, and Go's os.Root has a brute-force approach that in principle works on any platform by restarting the lookup each time they hit .. and re-applying the logical path walked so far (I plan to adapt the approach for libpathrs at some point so it can be cross-platform as well).

I suspect this will also help alleviate some of the issues people have had with regressions on v3.8.3.

On Linux, openat(2) famously does not return an error if you pass it invalid flag bits. This was something I fixed in openat2(2) but for backward compatibility reasons, openat(2) will always silently ignore unknown O_* flags (which makes adding new ones quite painful). As a result, the fallback path added by commits e1c5f0e ("t_chmod_secure: probe kernel RESOLVE_BENEATH at runtime; drop test skip") and commit 7f60ec0 ("syscall: also use O_RESOLVE_BENEATH on FreeBSD and MacOS") would always return sucess and could cause spurious test failures so we can just remove it. O_RESOLVE_BENEATH will never get added to Linux for this reason, and anyone doing so out-of-tree is just creating a massive foot-gun so there is no need to entertain such broken kernels. Fixes: 7f60ec0 ("syscall: also use O_RESOLVE_BENEATH on FreeBSD and MacOS") Fixes: e1c5f0e ("t_chmod_secure: probe kernel RESOLVE_BENEATH at runtime; drop test skip") Signed-off-by: Aleksa Sarai <aleksa@amutable.com>

In order to avoid in-kernel DoS due to unbounded retries, openat2(2) will return -EAGAIN when trying to walk through ".." if there is any racing rename or mount on the entire system. Note that this applies regardless of whether the rename was on the same filesytem or mount was in the same mount namespace as the process doing openat(2) -- as a result, calling openat2(2) on even a modestly busy system will result in spurious -EAGAIN errors every once in a while and it is necessary to implement a retry loop for it. (Libraries such as libpathrs and heavy users of openat2(2) like runc all do this, and in our testing we found that ~256 iterations is enough to provide resilience even on incredibly rename-heavy machines.) Fixes: 4fa7156 ("syscall: use openat2(RESOLVE_BENEATH) on Linux for secure_relative_open") Signed-off-by: Aleksa Sarai <aleksa@amutable.com>

This code was added in commit 30656c5 ("syscall: add symlink-race-safe do_*_at() wrappers and harden secure_relative_open") but the justification given is really not particularly convincing. The stated justification is that RESOLVE_BENEATH does not block some ".." escapes, which is categorically false (if it were true, that would be a Linux kernel bug and the kernel actually has an extra safety check at the end of lookup that would render this kind of breakout practically impossible). Reading the code, the actual reason appears to be to provide some consistency between the RESOLVE_BENEATH and fallback per-component O_NOFOLLOW lookups. However, the O_NOFOLLOW resolver can never have full parity with the RESOLVE_BENEATH resolvers because RESOLVE_BENEATH permit symlinks and symlinks can contain ".." components. It seems more prudent to just allow ".." for modern systems and reject it for older ones. If truly necessary, there are even ways to support ".." in a cross-platform way so this could even be seen as a "not-implemented-yet" error rather than a permanent API contract. For the really-old systems case (with no O_NOFOLLOW or AT_DIRFD) then just as we have to accept symlinks (which might contain "..") we might as well accept bare ".." components. Rejecting ".." for that case is just pure security theatre. Fixes: 30656c5 ("syscall: add symlink-race-safe do_*_at() wrappers and harden secure_relative_open") Signed-off-by: Aleksa Sarai <aleksa@amutable.com>

cyphar added 3 commits June 1, 2026 11:25

cyphar force-pushed the linux-openat-openat2 branch from 062c12e to 3d80c98 Compare June 1, 2026 01:25

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

linux: improve openat and openat2 handling#939

linux: improve openat and openat2 handling#939
cyphar wants to merge 3 commits into
RsyncProject:masterfrom
cyphar:linux-openat-openat2

cyphar commented Jun 1, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

cyphar commented Jun 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

cyphar commented Jun 1, 2026 •

edited

Loading