linux: improve openat and openat2 handling#939
Open
cyphar wants to merge 3 commits into
Open
Conversation
On Linux, openat(2) famously does not return an error if you pass it invalid flag bits. This was something I fixed in openat2(2) but for backward compatibility reasons, openat(2) will always silently ignore unknown O_* flags (which makes adding new ones quite painful). As a result, the fallback path added by commits e1c5f0e ("t_chmod_secure: probe kernel RESOLVE_BENEATH at runtime; drop test skip") and commit 7f60ec0 ("syscall: also use O_RESOLVE_BENEATH on FreeBSD and MacOS") would always return sucess and could cause spurious test failures so we can just remove it. O_RESOLVE_BENEATH will never get added to Linux for this reason, and anyone doing so out-of-tree is just creating a massive foot-gun so there is no need to entertain such broken kernels. Fixes: 7f60ec0 ("syscall: also use O_RESOLVE_BENEATH on FreeBSD and MacOS") Fixes: e1c5f0e ("t_chmod_secure: probe kernel RESOLVE_BENEATH at runtime; drop test skip") Signed-off-by: Aleksa Sarai <aleksa@amutable.com>
In order to avoid in-kernel DoS due to unbounded retries, openat2(2) will return -EAGAIN when trying to walk through ".." if there is any racing rename or mount on the entire system. Note that this applies regardless of whether the rename was on the same filesytem or mount was in the same mount namespace as the process doing openat(2) -- as a result, calling openat2(2) on even a modestly busy system will result in spurious -EAGAIN errors every once in a while and it is necessary to implement a retry loop for it. (Libraries such as libpathrs and heavy users of openat2(2) like runc all do this, and in our testing we found that ~256 iterations is enough to provide resilience even on incredibly rename-heavy machines.) Fixes: 4fa7156 ("syscall: use openat2(RESOLVE_BENEATH) on Linux for secure_relative_open") Signed-off-by: Aleksa Sarai <aleksa@amutable.com>
This code was added in commit 30656c5 ("syscall: add symlink-race-safe do_*_at() wrappers and harden secure_relative_open") but the justification given is really not particularly convincing. The stated justification is that RESOLVE_BENEATH does not block some ".." escapes, which is categorically false (if it were true, that would be a Linux kernel bug and the kernel actually has an extra safety check at the end of lookup that would render this kind of breakout practically impossible). Reading the code, the actual reason appears to be to provide some consistency between the RESOLVE_BENEATH and fallback per-component O_NOFOLLOW lookups. However, the O_NOFOLLOW resolver can never have full parity with the RESOLVE_BENEATH resolvers because RESOLVE_BENEATH permit symlinks and symlinks can contain ".." components. It seems more prudent to just allow ".." for modern systems and reject it for older ones. If truly necessary, there are even ways to support ".." in a cross-platform way so this could even be seen as a "not-implemented-yet" error rather than a permanent API contract. For the really-old systems case (with no O_NOFOLLOW or AT_DIRFD) then just as we have to accept symlinks (which might contain "..") we might as well accept bare ".." components. Rejecting ".." for that case is just pure security theatre. Fixes: 30656c5 ("syscall: add symlink-race-safe do_*_at() wrappers and harden secure_relative_open") Signed-off-by: Aleksa Sarai <aleksa@amutable.com>
062c12e to
3d80c98
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
When looking at the usage of
openatandopenat2on Linux, I noticed two bugs:openat(2)famously does not return-EINVALfor unknown flag bits on Linux, so the fallback code to useO_RESOLVE_BENEATHif defined would result in all open operations with the "secure resolver" becoming insecure. This would require a fairly convoluted build environment to trigger, so this isn't exploitable but if it were triggered it would be a security bug.openat2(2)did not handle-EAGAINat all, which could lead to spurious errors.openat2(2)can return-EAGAINwhen trying to walk into..during a scoped lookup likeRESOLVE_BENEATHand there was a rename or mount operation anywhere else on the system. The reason for this behaviour is a little esoteric but in short I implemented it this way so that userspace could choose their retry policy and (more importantly) the kernel could avoid potentially-unbounded retries that could lead to a DoS. The man page does mention this but it is easy to miss:I also took a quick look at the
path_has_dotdot_componentusage and I'm not really convinced that it is actually doing anything useful. The comments seem to be Claude-authored so I'm not sure reading them is super helpful, but the upshot of:appears to be that
..is problematic andRESOLVE_BENEATHcatches "some of these". This is is incorrect --..cannot escape the starting point withRESOLVE_BENEATH(andO_RESOLVE_BENEATHas it was modeled afterRESOLVE_BENEATH), regardless of where you put it in the path. However, you can walk into..during a lookup without issue if it remains within the starting root.Reading the rest of the code, it seems this was actually done because the fallback userspace resolver cannot handle
..and so the decision was to ban it for everything for "consistency" but this doesn't really make sense -- symlinks can contain..so there will always be an inconsistency between the resolver that rejects all symlinks and those that can handle them. It seems like it would be more helpful to most users to permit..forRESOLVE_BENEATHand only block it in the resolver.My final patch does this, but I can drop it if you prefer.
For what it's worth, there actually are safe ways of handling
..in userspace. libpathrs (shameless plug!) does this on Linux using/proc/self/fd/...to check the logical path against the real path in a very solid way, and Go'sos.Roothas a brute-force approach that in principle works on any platform by restarting the lookup each time they hit..and re-applying the logical path walked so far (I plan to adapt the approach for libpathrs at some point so it can be cross-platform as well).I suspect this will also help alleviate some of the issues people have had with regressions on v3.8.3.