Skip to content

Implement find builtin command#36

Open
matt-dz wants to merge 78 commits intomainfrom
matt-dz/implement-find-builtin
Open

Implement find builtin command#36
matt-dz wants to merge 78 commits intomainfrom
matt-dz/implement-find-builtin

Conversation

@matt-dz
Copy link
Collaborator

@matt-dz matt-dz commented Mar 10, 2026

Summary

  • Add a sandboxed find builtin for searching directory trees with expression-based predicates
  • Supports -name, -iname, -path, -ipath, -type, -size, -empty, -newer, -mtime, -mmin, -maxdepth, -mindepth, -print, -print0, -prune, -true, -false, and logical operators (!, -a, -o, ())
  • Blocks unsafe predicates (-exec, -delete, -regex, -fprint, etc.) for sandbox safety
  • Uses recursive-descent expression parser with bounded AST (max 64 depth, 256 nodes) and explicit-stack DFS traversal (capped at 256 depth)

Implementation

  • interp/builtins/find/find.go — Command handler, arg parsing, DFS walker
  • interp/builtins/find/expr.go — AST types + recursive-descent parser
  • interp/builtins/find/eval.go — Expression evaluator
  • interp/builtins/find/match.go — Glob/type/size matching helpers
  • 24 YAML test scenarios covering basic, predicates, logic, depth, size, output, prune, errors, and sandbox categories
  • Import allowlist updated with path.Match and strings.ToLower

Test plan

  • go test ./interp/... ./tests/... — all pass
  • RSHELL_BASH_TEST=1 go test ./tests/ -run TestShellScenariosAgainstBash — all pass (ordering-sensitive tests marked skip_assert_against_bash due to ReadDir returning sorted entries vs GNU find's filesystem order)
  • Pentest scenarios for resource exhaustion, symlink loops, blocked predicates
  • Code review cycle

🤖 Generated with Claude Code

@matt-dz
Copy link
Collaborator Author

matt-dz commented Mar 10, 2026

@codex review

Copy link
Collaborator Author

@matt-dz matt-dz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Summary

Reviewed the full find builtin implementation across 4 new Go files, 24 YAML test scenarios, and supporting changes (registration, import allowlist, documentation).

Overall assessment: needs fixes

The architecture is sound — expression parser with bounded AST, explicit-stack DFS traversal, all filesystem access through sandbox APIs (callCtx.ReadDir, callCtx.LstatFile, callCtx.StatFile). The blocked-predicates list correctly prevents -exec, -delete, -regex, and friends. No direct os filesystem calls.

However, there is one P1 correctness bug and several P2 gaps that need attention before merge.

# Priority File Finding
1 P1 Badge interp/builtins/find/find.go:116 -maxdepth/-mindepth extraction steals arguments from predicates
2 P2 Badge interp/builtins/find/eval.go:31 Dead field followLink in evalContext
3 P2 Badge interp/builtins/find/ No Go unit tests for parser, evaluator, or matcher
4 P2 Badge tests/scenarios/cmd/find/ Missing test coverage for -mtime, -mmin, -newer, -path, -ipath, -L flag, -size with various units
5 P2 Badge interp/builtins/find/expr.go:308 -type accepts empty string argument without error
6 P3 Badge interp/builtins/find/match.go:124 baseName only handles / separator — add comment noting shell normalizes paths
7 P3 Badge tests/scenarios/cmd/find/logic/or.yaml skip_assert_against_bash scenarios should document the reason for divergence

Positive Observations

  • All filesystem access uses sandbox APIs — no direct os.Open/os.Stat/os.ReadDir calls
  • Expression parser has proper AST bounds (64 depth, 256 nodes) preventing resource exhaustion
  • DFS traversal uses explicit stack capped at 256 depth
  • Context cancellation checked before every ReadDir and on each stack iteration
  • Blocked predicates list is comprehensive and well-justified
  • path.Match (Go stdlib) is immune to ReDoS
  • -newer file access goes through sandbox — no information leak outside AllowedPaths

Copy link
Collaborator Author

@matt-dz matt-dz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Summary (Round 2)

Re-reviewed after fixes for round 1 findings.

Overall assessment: safe to merge (with minor P3 nits)

The P1 from round 1 (-maxdepth/-mindepth extraction corrupting predicate arguments) has been properly fixed — depth options are now only parsed from leading positions before the expression starts. The dead followLink field has been removed, -type "" validation added, baseName documented, and 6 new test scenarios added for -path, -ipath, -mmin, -mtime, -size variants, and -type "" error.

No P0 or P1 findings remain. Two minor P3 items noted below.

# Priority File Finding
1 P3 Badge tests/scenarios/cmd/find/errors/nonexistent.yaml:14 skip_assert_against_bash nested under expect (silently ignored)
2 P3 Badge interp/builtins/find/eval.go:131 -newer stats reference file once per tree entry — could cache

Positive Observations

  • P1 fix is clean — leading-only parse with break at first non-depth-option, passes remaining args as a slice without mutation
  • All round 1 P2 findings addressed: dead field removed, -type "" validated, new test scenarios cover previously-untested predicates
  • Sandbox integrity remains solid — all filesystem access exclusively through callCtx.ReadDir, callCtx.StatFile, callCtx.LstatFile
  • Expression parser bounds (64 depth, 256 nodes) prevent resource exhaustion
  • Blocked predicate list is comprehensive
  • -L symlink loop bounded by maxTraversalDepth=256 (consistent with ls -R approach)

@datadog-prod-us1-3

This comment has been minimized.

@AlexandreYang
Copy link
Member

@codex @DataDog make a comprehensive code and security review

@datadog-prod-us1-6
Copy link
Contributor

datadog-prod-us1-6 bot commented Mar 10, 2026

View session in Datadog

Bits Dev status: ✅ Done

CI Auto-fix: Disabled | Enable

Comment @DataDog to request changes

@datadog-official
Copy link
Contributor

Great PR overall — I did a full pass on the new find builtin implementation and tests with both code-quality and security lenses.

What I reviewed

  • interp/builtins/find/find.go
  • interp/builtins/find/expr.go
  • interp/builtins/find/eval.go
  • interp/builtins/find/match.go
  • Integration points:
    • interp/register_builtins.go
    • interp/builtins/builtins.go
    • tests/scenarios/cmd/find/**
    • tests/import_allowlist_test.go

I also ran:

  • go test ./tests -count=1
  • go test ./tests -run 'TestShellScenarios/cmd/find' -count=1
  • manual behavior repros with go run ./cmd/rshell ...

Findings

1) P1 – Bash compatibility bug: -newer missing reference is silently ignored

Location: interp/builtins/find/eval.go:130-134

evalNewer does:

refInfo, err := ec.callCtx.StatFile(ec.ctx, refPath)
if err != nil {
    return false
}

So a nonexistent/inaccessible reference file just makes predicate false per-entry, with no error and exit code 0.

Why this is a problem

  • GNU find errors out and exits non-zero when the -newer reference cannot be stat’d.
  • Silent failure can hide user mistakes and produce misleading “no matches” results.

Observed repro

  • rshell: exit 0, no stderr for missing ref.
  • GNU find: exit 1, explicit “No such file or directory”.

Recommended fix

  • Resolve/stat -newer reference once during parse/setup (or first use), cache mtime in expression node, and fail command immediately on error.

2) P2 – Input validation bug: malformed -type values accepted

Location:

  • Parser validation: interp/builtins/find/expr.go:310-317
  • Matcher behavior: interp/builtins/find/match.go:37-46

Current parser allows commas anywhere, so -type , is accepted. Matcher skips commas and returns false for all files, so command succeeds with zero output.

Why this is a problem

  • GNU find treats malformed -type arguments as an error (non-zero exit).
  • Accepting malformed predicates reduces compatibility and can mask mistakes.

Recommended fix

  • Parse as comma-separated tokens and require each token to be exactly one valid type char (f,d,l,p,s), no empty elements, no leading/trailing/consecutive commas.

3) P2 – Performance/security hardening: -newer does repeated stat per entry

Location: interp/builtins/find/eval.go:130-132

The reference path is stat’d for every visited file. On large trees this is avoidable overhead and can become expensive.

Recommended fix


Security review summary

Good security controls already present

  • Denylist blocks dangerous predicates (-exec, -delete, -fprint*, -regex, etc.) in expr.go.
  • AST complexity protections (maxExprDepth, maxExprNodes).
  • Traversal depth cap and explicit-stack DFS in find.go.
  • Uses sandboxed CallContext filesystem APIs only.

Hardening gap (non-blocking but worth adding)

  • No explicit test scenarios for -L symlink cycles/fanout exhaustion behavior (depth cap helps, but dedicated regression tests would be valuable).

Recommendation on merge readiness

I would not approve as-is due to finding #1 (-newer error handling).
After fixing #1 and #2, this looks strong and close to merge quality.

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d0c155cb1c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

matt-dz and others added 4 commits March 11, 2026 08:58
Add a sandboxed find builtin for searching directory trees by name,
type, size, and time predicates. Uses a recursive-descent expression
parser with bounded AST (max 64 depth, 256 nodes) and explicit-stack
DFS traversal (capped at 256 depth). Blocks -exec, -delete, -regex
and other unsafe predicates for sandbox safety.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
P1: Fix -maxdepth/-mindepth extraction to only parse leading positions,
preventing corruption of predicate arguments (e.g. find . -name -maxdepth).

P2: Remove dead followLink field from evalContext, add empty-string
validation for -type predicate, add test scenarios for -path, -ipath,
-mmin, -mtime, -size variants, and -type empty error.

P3: Document baseName forward-slash-only convention.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The field was nested under expect: where it was silently ignored by the
YAML decoder. Removed since the test passes against bash anyway.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- builtins.go: Expand ReadDir comment to explain that sorted entries
  cause builtins (ls -R, find) to produce deterministic but different
  output ordering than GNU coreutils/findutils
- find.go: Add NOTE explaining the ordering divergence at the walker
- empty_file.yaml: Add skip_assert_against_bash (ordering divergence)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@matt-dz matt-dz force-pushed the matt-dz/implement-find-builtin branch from d0c155c to b3d04ee Compare March 11, 2026 12:59
P1: Surface -newer reference file errors — report to stderr and set
exit code 1 instead of silently returning false. Cache reference file
modtime to avoid redundant stat calls per entry.

P1: Detect symlink loops with -L — track visited directory paths to
break cycles, preventing CPU/memory DoS on attacker-controlled trees.

P2: Reject malformed -type arguments — validate comma-separated format
properly, rejecting leading/trailing/consecutive commas and adjacent
type chars without separators (e.g. ",", "f,", ",d", "fd").

Low: Add comment explaining why errorf uses errors.New(fmt.Sprintf())
instead of fmt.Errorf (fmt.Errorf is not in the import allowlist).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add fmt.Errorf to the import allowlist (pure function, no I/O) and
replace all errorf() calls with fmt.Errorf() directly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@matt-dz
Copy link
Collaborator Author

matt-dz commented Mar 11, 2026

@codex @DataDog make a comprehensive code and security review

@datadog-datadog-prod-us1-2
Copy link

Absolutely — I completed a comprehensive code + security review of this PR’s find builtin.

Scope reviewed

  • interp/builtins/find/find.go
  • interp/builtins/find/expr.go
  • interp/builtins/find/eval.go
  • interp/builtins/find/match.go
  • Related wiring/docs/tests:
    • interp/register_builtins.go
    • interp/builtins/builtins.go
    • tests/scenarios/cmd/find/**
    • tests/import_allowlist_test.go
    • SHELL_FEATURES.md

I also validated behavior with:

  • go test ./tests -count=1
  • go test ./tests -run 'TestShellScenarios/cmd/find' -count=1
  • manual GNU find vs rshell repros.

Findings

P1 — -newer missing reference file is silently ignored (bash compatibility + correctness)

Where: interp/builtins/find/eval.go (evalNewer)

Current behavior:

  • If StatFile(refPath) fails, code returns false for that entry.
  • Command can finish with exit 0 and no stderr.

Expected (GNU find):

  • Missing/inaccessible -newer reference should be an error with non-zero exit.

Impact

  • User mistakes are silently masked as “no matches”.
  • Diverges from expected find semantics.

Fix recommendation

  • Validate/stat -newer reference once during parse/setup.
  • If stat fails, print error and exit 1.
  • Cache reference mtime for evaluation (also improves performance).

P2 — malformed -type arguments accepted (compatibility/input validation)

Where:

  • Validation: interp/builtins/find/expr.go (parseTypePredicate)
  • Matching: interp/builtins/find/match.go (matchType)

Current parser accepts comma anywhere, so -type , is accepted and just matches nothing (exit 0).

Expected (GNU find):

  • malformed type list should error out (exit 1).

Fix recommendation

  • Parse -type as comma-separated tokens.
  • Require each token to be exactly one valid type char (f,d,l,p,s).
  • Reject empty segments (leading/trailing/double commas).

P2 — avoid repeated stat of -newer reference per file (perf/security hardening)

Where: interp/builtins/find/eval.go (evalNewer)

Reference file is stat’d for every visited path.

Impact

  • Unnecessary overhead on large trees.
  • Can amplify cost under heavy workload.

Fix recommendation

  • Resolve reference once and reuse cached modtime.

P3 — hardening gap: no dedicated tests for -L symlink cycles / traversal exhaustion

Where: behavior in interp/builtins/find/find.go; test coverage gap in tests/scenarios/cmd/find/**

The implementation has good depth caps (maxTraversalDepth, AST limits), but there are no explicit regression scenarios for:

  • symlink loops with -L
  • large/fanout traversal stress

Fix recommendation

  • Add pentest scenarios for loop/fanout limits and confirm deterministic non-hanging behavior.

Security posture (positive)

Strong design choices already present:

  • Dangerous predicates blocked (-exec, -delete, -fprint*, -regex, etc.).
  • AST complexity bounded (maxExprDepth, maxExprNodes).
  • Explicit-stack DFS with traversal depth cap.
  • All file operations go through sandboxed CallContext APIs.
  • Import allowlist remains strict.

Overall assessment

  • Do not merge yet due to P1 (-newer error handling).
  • After fixing P1 + P2, this is close to merge-ready.
  • Security model is solid, with remaining work mainly in hardening tests and one perf/correctness cleanup.

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 10e4148c02

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

matt-dz and others added 3 commits March 11, 2026 09:29
Cover all major code paths including symlinks (-L, loop detection,
broken links), -newer, -true/-false, comma-separated -type, size
units (c/w/k/M/b), exact numeric comparisons, logical operator
aliases (-not/-and/-or), parser error paths, sandbox-blocked
predicates, paths with spaces, pipe integration, and explicit -print.

Brings total find test scenarios from 29 to 81.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fix P1 bug where -newer with a nonexistent reference file would cache
a zero-time sentinel, causing subsequent entries (with fresh
evalContexts) to match against time.Time{} instead of returning false.
Replace per-evalContext newerErr flag with a shared newerErrors map
that persists across all entries in the traversal, consistent with
newerCache.

Also improve documentation:
- Clarify -maxdepth/-mindepth leading-only parse is intentional
- Document symlink loop detection path-text limitation and
  maxTraversalDepth=256 safety bound

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move depth option parsing from a leading-only extraction loop into the
recursive-descent expression parser itself. This allows commands like
`find . -name '*.go' -maxdepth 1` to work correctly, while the parser's
natural token ownership prevents the argument-stealing bug that the
previous naive scan-and-extract approach suffered from.

Add 9 test scenarios covering parser integration, anti-stealing, depth
band selection, last-wins semantics, and edge cases.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Collaborator

@julesmcrt julesmcrt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Summary

This PR implements a find builtin with good overall design: sandbox enforcement goes through os.Root (immune to .. traversal), blocked predicates (-exec, -delete, -regex, etc.) are enforced at parse time, expression AST size is bounded (64 depth / 256 nodes), and the explicit-stack DFS avoids deep Go recursion. The import allowlist is correctly updated. No P0/P1 issues found.

Verdict: needs minor fixes — four P2 findings (one correctness bug, three coverage gaps) and two P3 issues.

# Priority File Finding
1 P2 Badge interp/builtins/find/match.go:96 Integer overflow in compareSize for file sizes near MaxInt64
2 P2 Badge tests/scenarios/cmd/find/predicates/false.yaml skip_assert_against_bash: true on tests whose output is order-independent
3 P2 Badge tests/scenarios/cmd/find/ Missing sandbox coverage: no test for -newer with a path outside allowed_paths
4 P2 Badge interp/builtins/find/find.go:111–117 Intentional -maxdepth-after-predicates limitation has no scenario test
5 P3 Badge interp/builtins/find/ Missing pentest Go tests (acknowledged in PR checklist)
6 P3 Badge tests/scenarios/cmd/find/ No stdout_windows overrides for path-separator output on Windows

…ings

Path-based cycle detection missed loops that re-enter the same directory
under different textual paths (e.g. dir/a/link_to_dir → dir), allowing
expansion until maxTraversalDepth=256. Now track visited directories by
canonical file identity (device + inode) on Unix via FileInfo.Sys(), with
path-based fallback on Windows. Zero additional I/O — reuses FileInfo
already in hand.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@matt-dz
Copy link
Collaborator Author

matt-dz commented Mar 11, 2026

@codex @DataDog Ready for another review — this latest commit replaces path-based symlink loop detection with canonical file identity (dev+inode) tracking, addressing the P1 DoS concern. Would appreciate a look at the FileIdentity callback pattern and the Windows fallback path. Thanks!

The evalMtime refactor introduced time.Hour and time.Second usage
which were not in the allowed symbols list.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7375410f46

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +313 to +314
if ancestorPaths[path] {
callCtx.Errf("find: File system loop detected; '%s' has already been visited.\n", path)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Detect symlink cycles when file identity lookup fails

The fallback cycle check uses the raw traversed path string, so loops like a/loop -> . become a/loop, a/loop/loop, a/loop/loop/loop, etc. and never hit ancestorPaths[path]; this means if FileIdentity is unavailable for a subtree (e.g., platform/ACL-specific identity lookup failure), loop detection is effectively disabled and traversal repeats until the hard depth cap, causing avoidable CPU and output amplification.

Useful? React with 👍 / 👎.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Claude Opus 4.6] Valid observation. The path-based fallback won't detect cycles because the traversed path keeps growing (a/loop, a/loop/loop, ...) and never matches an ancestor entry.

However, I believe this is acceptable as-is for the following reasons:

  1. FileIdentity works on all supported platforms (Linux/macOS via dev+inode, Windows via volume serial+file index). The path-based fallback only activates when the syscall fails, which is rare (exotic filesystems, ACL edge cases).

  2. maxTraversalDepth=256 provides a hard safety cap — even if the fallback fails, traversal stops at depth 256. For an AI agent sandbox, this is the ultimate bound.

  3. A proper fix would require symlink resolution (os.EvalSymlinks or equivalent), which isn't available within the builtin sandbox and would add syscall overhead per directory entry.

The existing code comments (lines 267-274) explicitly acknowledge the depth cap as the ultimate safety bound. If we want tighter cycle detection in the fallback, we could resolve canonical paths via StatFile + comparing dev/inode from the info, but that would duplicate what FileIdentity already does.

Leaving this for the reviewer to decide — not resolving the thread.

matt-dz and others added 2 commits March 13, 2026 14:52
Narrow the OpenDir return type from *os.File to fs.ReadDirFile so
builtin implementations only see read-only directory methods (ReadDir,
Close, Read, Stat) instead of the full os.File method set (Seek, Write,
Truncate, Sync, Fd, etc.).

*os.File already satisfies fs.ReadDirFile, so no wrapper is needed.

Addresses PR #36 feedback from thieman.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ching

1. Set failed=true when ctx.Err() is detected in both the path
   iteration loop and the walk loop, so cancelled/timed-out find
   returns exit code 1 instead of silently succeeding with partial
   output.

2. Remove the negative diff clamp in evalMtime's exact-match branch.
   GNU find computes negative day buckets for future-dated files, so
   they never match non-negative -mtime N. The clamp incorrectly
   forced them into bucket 0.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@matt-dz matt-dz requested a review from thieman March 13, 2026 19:05
@matt-dz
Copy link
Collaborator Author

matt-dz commented Mar 13, 2026

Hey @thieman — addressed your *os.File feedback (narrowed to fs.ReadDirFile) and picked up the remaining bot comments:

  • Context cancellation: find now returns exit 1 when cancelled mid-traversal instead of silently succeeding with partial output.
  • Future-dated file mtime: Removed the negative delta clamp so -mtime 0 no longer incorrectly matches future-dated files, matching GNU find.
  • Symlink cycle fallback (path-based detection): Acknowledged the limitation but we think the tradeoff is acceptable here — FileIdentity works on all supported platforms, and the maxTraversalDepth=256 hard cap provides defense-in-depth for the rare case where it doesn't. Happy to revisit if you feel differently.

Ready for another look when you have a moment!

@matt-dz
Copy link
Collaborator Author

matt-dz commented Mar 13, 2026

@chatgpt-codex-connector — addressed your review comments:

  • Context cancellation: find now returns exit 1 when cancelled mid-traversal instead of silently succeeding with partial output.
  • Future-dated file mtime: Removed the negative delta clamp so -mtime 0 no longer incorrectly matches future-dated files, matching GNU find.
  • Symlink cycle fallback (path-based detection): Acknowledged the limitation but the tradeoff is acceptable here — FileIdentity works on all supported platforms, and the maxTraversalDepth=256 hard cap provides defense-in-depth for the rare case where it doesn't.

Could you take another look?

matt-dz and others added 3 commits March 13, 2026 15:11
1. mmin_plus_zero: use explicit mod_time in the past instead of relying
   on file creation timing. On fast Windows CI runners, the file could
   be created in the same time quantum as Now(), making -mmin +0 fail.

2. ls/sandbox/outside_allowed_paths: update stderr_windows to match
   current Go os.Root error format (no longer includes "statat" prefix).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The scenario expected "      3\n" (width-7 padding) but GNU wc outputs
"3\n" for single-column stdin. Verified with:
  docker run --rm debian:bookworm-slim bash -c 'printf "one\ntwo\nthree\n" | wc -l'

Removed skip_assert_against_bash since our output now matches bash.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a9fbd9aa44

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

GNU find evaluates age predicates relative to a single invocation
timestamp. Previously, walkPath called callCtx.Now() per root path,
so multi-path invocations could produce inconsistent results for
files near minute/day boundaries.

Now captured once in run() and passed via walkOptions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds TestNowCalledOnce which runs find with two root paths and a
-mmin predicate, asserting that callCtx.Now() is invoked exactly
once (not per root path). This guards against regressions where
multi-path find invocations evaluate age predicates against
inconsistent timestamps.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@matt-dz
Copy link
Collaborator Author

matt-dz commented Mar 13, 2026

@chatgpt-codex-connector — addressed your latest feedback:

  • Capture evaluation time once (cced11b): Moved callCtx.Now() from walkPath to run() so all root paths evaluate age predicates against a single invocation timestamp, matching GNU find.
  • Added regression test (4a43391): TestNowCalledOnce asserts Now() is called exactly once per find invocation, regardless of how many root paths are provided.

Ready for another look!

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4a4339116e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

GNU find falls back to lstat when a -newer reference is a dangling
symlink under -L mode, rather than treating it as fatal. Apply the
same fallback in both eager validation (find.go) and evalNewer
(eval.go) to match bash-compatible behavior.

Added scenario test for -L with dangling symlink -newer reference.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@matt-dz
Copy link
Collaborator Author

matt-dz commented Mar 13, 2026

@chatgpt-codex-connector — addressed your latest comment:

  • Dangling -newer refs under -L (f4b613a): Both eager validation and evalNewer now fall back to LstatFile when StatFile fails with ErrNotExist under -L, matching GNU find. Added scenario test newer_dangling_symlink_L.

Ready for another look!

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f4b613a5e9

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

matt-dz and others added 2 commits March 13, 2026 16:03
Set mod_time on new.txt to ensure it is strictly newer than the
symlink reference, avoiding nondeterminism on filesystems with
coarse or same-tick timestamps.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…nt-find-builtin

# Conflicts:
#	allowedpaths/portable.go
#	allowedpaths/portable_unix.go
#	allowedpaths/portable_windows.go
#	allowedpaths/sandbox.go
#	interp/register_builtins.go
#	tests/allowed_symbols_test.go
The Windows error message for accessing /etc varies between Go
versions (with/without "statat" prefix). Use stderr_contains_windows
to match just the stable prefix instead of the exact string.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@matt-dz
Copy link
Collaborator Author

matt-dz commented Mar 13, 2026

@chatgpt-codex-connector — merged main and resolved conflicts. All functionality intact, CI green on Linux/macOS/Docker. The only remaining failure is the verified/allowed_symbols label check (needs human approval since allowedsymbols/ was updated during the merge).

Ready for another look!

On fast Windows CI runners, the file could be created with the same
timestamp as Now(), making -mmin 0 match (ceil(0) == 0). Use explicit
mod_time in the past to ensure the file is always > 0 minutes old.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@chatgpt-codex-connector
Copy link

Codex Review: Didn't find any major issues. Breezy!

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@matt-dz
Copy link
Collaborator Author

matt-dz commented Mar 13, 2026

@codex make a comprehensive code and security review

…d-builtin

# Conflicts:
#	allowedsymbols/symbols_builtins.go
Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 32c6ee0d18

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +176 to +178
if followLinks {
if _, lerr := callCtx.LstatFile(ctx, ref); lerr == nil {
continue

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Reject escaped -newer references when -L cannot follow target

In -L mode this fallback treats every StatFile failure as if it were a dangling symlink and silently switches to LstatFile. If a -newer reference is a symlink inside allowed_paths that points outside the sandbox, StatFile returns a permission/path-escape error but LstatFile succeeds, so find -L ... -newer ref proceeds and exits 0 instead of enforcing the sandbox denial (while direct outside refs fail). This weakens the path restriction model and changes matching behavior; the fallback should only apply to dangling-target errors.

Useful? React with 👍 / 👎.

Comment on lines +422 to +424
if err != nil {
// Dangling symlink: stat fails but lstat succeeds.
childInfo, err = callCtx.LstatFile(ctx, childPath)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Surface symlink escape errors during -L directory walk

The -L walker also falls back from StatFile to LstatFile on any error, so symlinks that resolve outside allowed_paths are silently treated as regular links instead of reporting a sandbox violation. In practice, an in-sandbox link like out -> /etc makes find -L . succeed and print ./out with exit code 0, even though following the link is disallowed. This masks policy violations and makes -L ineffective for enforcing follow-link access checks.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants