Author: Security Analysis
Date: 2026-04-02
Scope: /sessions/cool-friendly-einstein/mnt/claude-code/src/utils/bash/ (23 files, 12,306 LOC)
Claude Code's bash parser and execution subsystem is a security architecture that replaces naive shell-quote parsing with a fail-closed AST analysis system built on a tree-sitter-compatible parse tree. The design philosophy is explicit: never interpret code we don't understand. If the walker encounters a node type that is not on the allowlist, the entire command is classified as "too-complex" and requires explicit user approval.
The system rejects 15 dangerous AST node types and 35+ dangerous shell builtins, and applies strict semantic checks to command arguments before execution. This is not a sandbox; it is a permission-and-visibility system that answers one critical question: can we produce a trustworthy argv[] for static analysis? If yes, downstream permission matching proceeds. If no, the user is asked.
The bash parsing system consists of four distinct layers:
┌─────────────────────────────────────────────────────────┐
│ Layer 1: Lexer & Tokenizer (bashParser.ts)              │
│ ├─ Token types: WORD, OP, COMMENT, STRING, DOLLAR, etc  │
│ ├─ UTF-8 byte offset tracking (not JS char indices)     │
│ └─ Heredoc delimiter extraction                         │
├─────────────────────────────────────────────────────────┤
│ Layer 2: Recursive Descent Parser (bashParser.ts)       │
│ ├─ Grammar-based AST construction                       │
│ ├─ 50ms timeout + 50K node budget (anti-DoS)            │
│ └─ TsNode output (tree-sitter compatible format)        │
├─────────────────────────────────────────────────────────┤
│ Layer 3: Security-Aware Tree Walking (ast.ts)           │
│ ├─ Allowlist-based node type filtering (FAIL-CLOSED)    │
│ ├─ Variable scope tracking (VAR_PLACEHOLDER, literals)  │
│ ├─ Dangerous pattern detection                          │
│ └─ Outputs: SimpleCommand[] with argv/envVars/redirects │
├─────────────────────────────────────────────────────────┤
│ Layer 4: Semantic Post-Checks (ast.ts)                  │
│ ├─ Dangerous builtin detection                          │
│ ├─ Subscript arithmetic evaluation guards               │
│ ├─ Path/newline/hash injection detection                │
│ └─ Returns: ok:true OR rejection reason                 │
└─────────────────────────────────────────────────────────┘
User Command
↓
[parseForSecurity(cmd)] ← Entry point in ast.ts
├─ Pre-checks: control chars, unicode whitespace, zsh syntax
├─ Parse: parseCommandRaw(cmd) → AST root
└─ Walk: parseForSecurityFromAst(root) → SimpleCommand[]
↓
├─ kind: 'simple' → proceed with argv validation
├─ kind: 'too-complex' → request user approval
└─ kind: 'parse-unavailable' → fallback to shell-quote (legacy)
↓
[checkSemantics(commands)] ← Post-parse validation
├─ Strip safe wrappers (nohup, time, timeout, nice, env, stdbuf)
├─ Dangerous builtin check (eval, source, trap, enable, etc.)
├─ Array subscript arithmetic detection
├─ Path constraint checks
└─ Return: ok:true OR reject
↓
[Permission Matching] ← bashPermissions.ts
└─ Match argv[0] + deny rules against user allowlist
↓
[Subprocess Execution] ← BashTool
└─ spawn with sandboxed environment
// parser.ts lines 65-82
if (feature('TREE_SITTER_BASH')) {
await ensureParserInitialized()
const mod = getParserModule()
logLoadOnce(mod !== null)
if (!mod) return null
const rootNode = mod.parse(command)
// ... tree-sitter path
}
// Falls back to shell-quote when unavailable

- TREE_SITTER_BASH: Primary parser (enabled in production/ant-builds)
- TREE_SITTER_BASH_SHADOW: Logging-only fallback path
- External builds: use legacy shell-quote parsing
The primary parser is hand-rolled (not tree-sitter WASM), implementing bash grammar in pure TypeScript with these characteristics:
Context-sensitive lexing for bash:
function nextToken(L: Lexer, ctx: 'cmd' | 'arg' = 'arg'): Token {
// In 'cmd' mode: [ [[ { are operators (test/group start)
// In 'arg' mode: they're word characters (glob/subscript)
// Multi-char operators (longest match):
// && || |& && |
// << <<- <<< <& <(
// >> >& >| &> &>>
// ;; ;;& ;&
// (( ))
// Handles UTF-8 byte offsets (critical for source positions)
// Tracks heredoc pending delimiters
}

Token Types (15 types):
- WORD: Unquoted text
- NUMBER: Digits (with base syntax: 10#ff)
- OP: Operators (&&, ||, |, ;, >, <, (, ), etc.)
- NEWLINE: Line separator
- COMMENT: # ... text
- DQUOTE / SQUOTE: Quote delimiters
- ANSI_C: $'...' strings
- DOLLAR / DOLLAR_PAREN / DOLLAR_BRACE / DOLLAR_DPAREN
- BACKTICK: Command substitution delimiter
- LT_PAREN / GT_PAREN: Process substitution <(...) / >(...)
- EOF: End of input
// bashParser.ts lines 145-159
function advance(L: Lexer): void {
const c = L.src.charCodeAt(L.i)
L.i++ // JS string index
if (c < 0x80) {
L.b++ // ASCII: 1 byte
} else if (c < 0x800) {
L.b += 2 // 2-byte UTF-8
} else if (c >= 0xd800 && c <= 0xdbff) {
L.b += 4 // Surrogate pair: 4 bytes
L.i++ // Also consume the low surrogate so it is not counted again
} else {
L.b += 3 // 3-byte UTF-8
}
}

This ensures startIndex/endIndex in AST nodes are UTF-8 byte offsets (matching tree-sitter), not JS character indices, which is critical for correct source mapping.
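The distinction matters as soon as a command contains non-ASCII text. The following is a minimal sketch, assuming a hypothetical helper `utf8ByteOffset` (not part of bashParser.ts), of how a JS string index and the corresponding UTF-8 byte offset diverge:

```typescript
// Hypothetical helper for illustration; not taken from bashParser.ts.
// Converts a JS string index (UTF-16 code units) to a UTF-8 byte offset.
function utf8ByteOffset(src: string, charIdx: number): number {
  let bytes = 0
  let i = 0
  while (i < charIdx && i < src.length) {
    const c = src.charCodeAt(i)
    const next = i + 1 < src.length ? src.charCodeAt(i + 1) : 0
    if (c < 0x80) {
      bytes += 1 // ASCII
      i += 1
    } else if (c < 0x800) {
      bytes += 2 // 2-byte UTF-8
      i += 1
    } else if (c >= 0xd800 && c <= 0xdbff && next >= 0xdc00 && next <= 0xdfff) {
      bytes += 4 // surrogate pair: one code point, 4 bytes
      i += 2 // consume both UTF-16 code units
    } else {
      bytes += 3 // 3-byte UTF-8
      i += 1
    }
  }
  return bytes
}

const sample = 'echo "héllo 🎉" done'
const charIdx = sample.indexOf('done')
const byteIdx = utf8ByteOffset(sample, charIdx)
console.log(charIdx, byteIdx)
```

For this sample the word `done` starts at JS index 16 but at byte offset 19: the `é` contributes one extra byte and the emoji two.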
// bashParser.ts lines 27-31
const PARSE_TIMEOUT_MS = 50 // Wall-clock timeout
const MAX_NODES = 50_000 // Node budget cap
const MAX_COMMAND_LENGTH = 10000 // Character limit
// Adversarial input example that triggers timeout:
// `(( a[0][0][0]... ))` with ~2800 subscripts exhausts budget

Returns PARSE_ABORTED (a distinct sentinel) when the timeout or node budget is exceeded. This is treated as fail-closed (too-complex), not routed to the legacy fallback.
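The budget mechanism can be sketched as below. This is an illustration under stated assumptions, not the parser's actual code: `ParseBudget`, `countGroups`, and `classify` are hypothetical names, and the "parser" is a toy that charges one node per opening parenthesis. Only the limits (50 ms, 50,000 nodes) and the abort-means-too-complex rule come from the text above.

```typescript
// Illustrative names only; the real parser's internals are not shown in this document.
const ABORTED = Symbol('PARSE_ABORTED')

class ParseBudget {
  private nodes = 0
  private readonly maxNodes: number
  private readonly deadline: number

  constructor(maxNodes: number, timeoutMs: number) {
    this.maxNodes = maxNodes
    this.deadline = Date.now() + timeoutMs
  }

  // Charge one node; returns false once either limit is exceeded.
  charge(): boolean {
    this.nodes += 1
    return this.nodes <= this.maxNodes && Date.now() <= this.deadline
  }
}

// Toy stand-in for a parser: one "node" per opening parenthesis.
function countGroups(src: string, budget: ParseBudget): number | typeof ABORTED {
  let groups = 0
  for (const ch of src) {
    if (ch === '(') {
      if (!budget.charge()) return ABORTED
      groups += 1
    }
  }
  return groups
}

function classify(src: string): 'simple' | 'too-complex' {
  const result = countGroups(src, new ParseBudget(50_000, 50))
  // An aborted parse is never retried with a weaker parser.
  return result === ABORTED ? 'too-complex' : 'simple'
}

console.log(classify('((a))'), classify('('.repeat(60_000)))
```

Using a sentinel that is distinct from "parser unavailable" is what keeps an adversarial input from being downgraded to the legacy path.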
Core statement types parsed:
program
├─ command (pipe operators │ separate)
├─ list (&&, ||, ; chaining)
├─ pipeline (| and |&)
├─ redirected_statement (> >> < <<'EOF' etc.)
├─ negated_command (! cmd)
├─ declaration_command (export/declare/typeset/readonly/local)
├─ variable_assignment (VAR=value at statement level)
├─ for_statement (for VAR in LIST; do BODY; done)
├─ while_statement (while COND; do BODY; done)
├─ if_statement (if COND; then...; fi)
├─ case_statement (case VAR in PATTERN) cmd;; esac)
├─ function_definition (name() { BODY })
├─ subshell ((cmd))
├─ compound_statement ({ cmd; })
├─ test_command ([[ EXPR ]])
├─ unset_command (unset VAR...)
└─ comment (# text)
command
├─ variable_assignment* (VAR=x prefix)
├─ command_name | word (argv[0])
├─ argument* (argv[1..n])
└─ file_redirect* (> >> < << <<<)
argument
├─ word
├─ number (with `NN#` base syntax)
├─ raw_string ('...')
├─ string ("...") [with expansion children]
├─ concatenation (adjacent string/word parts)
├─ arithmetic_expansion ($((expr)))
├─ simple_expansion ($VAR)
├─ command_substitution ($(...) or `...`)
└─ process_substitution (<(...) >(...))
Structural Nodes (traversed recursively, not executed):
// ast.ts lines 54-65
const STRUCTURAL_TYPES = new Set([
'program', // Root node
'list', // a && b || c
'pipeline', // a | b | c
'redirected_statement', // cmd > file
])
const SEPARATOR_TYPES = new Set([
'&&', '||', '|', ';', '&', '|&', '\n'
])

Dangerous Node Types (15 types, immediate rejection):
// ast.ts lines 186-205
const DANGEROUS_TYPES = new Set([
'command_substitution', // $(cmd) or `cmd`
'process_substitution', // <(cmd) >(cmd)
'expansion', // ${VAR} general expansion
'simple_expansion', // $VAR (conditionally allowed)
'brace_expression', // {a,b,c}
'subshell', // (cmd1; cmd2)
'compound_statement', // { cmd; }
'for_statement', // for VAR in ...; do...; done
'while_statement', // while COND; do...; done
'until_statement', // until COND; do...; done
'if_statement', // if...; then...; fi
'case_statement', // case VAR in PATTERN) cmd;; esac
'function_definition', // name() { body }
'test_command', // [[ ... ]]
'ansi_c_string', // $'...\n...'
'translated_string', // $"..." (gettext)
'herestring_redirect', // <<< content
'heredoc_redirect', // << delimiter
])

Safe Leaf Nodes (terminal, no dangerous children):
// walkArgument switch cases (ast.ts ~1408):
case 'word':
return node.text.replace(/\\(.)/g, '$1') // Unescape
case 'number':
// SECURITY: NN#<expansion> syntax (e.g., 10#$(cmd))
// tree-sitter emits expansion as child—MUST REJECT
if (node.children.length > 0) return tooComplex(node)
return node.text
case 'raw_string':
return stripRawString(node.text) // Remove ' quotes
case 'string':
return walkString(node, ...) // Complex—see below
case 'concatenation':
// Adjacent strings/words: "foo"bar → "foobar"
// Brace expansion check ($BRACE_EXPANSION_RE)
// Recurse into children
case 'arithmetic_expansion': {
const err = walkArithmetic(node) // Validate literals only
if (err) return err // Propagate the rejection
return node.text // Return full $((...))
}

Safe Command-Level Nodes:
declaration_command (export/declare/typeset/readonly/local)
negated_command (! cmd)
variable_assignment (VAR=value at statement level)
unset_command (unset VAR...)
test_command ([[ ... ]])
// ast.ts lines 212-218
export function nodeTypeId(nodeType: string | undefined): number {
if (!nodeType) return -2 // No type (pre-check)
if (nodeType === 'ERROR') return -1 // Parse error
const i = DANGEROUS_TYPE_IDS.indexOf(nodeType)
return i >= 0 ? i + 1 : 0 // Unknown type ID
}
// Indices (stable for analytics):
// 1=command_substitution, 2=process_substitution, 3=expansion,
// 4=simple_expansion, 5=brace_expression, ..., 15=heredoc_redirect

Core Principle: If we can't prove a node is safe, reject it.
// ast.ts lines 1-19
/**
* FAIL-CLOSED: we never interpret structure we don't understand.
* If tree-sitter produces a node we haven't explicitly allowlisted,
* we refuse to extract argv and the caller must ask the user.
*/

Example: if a new bash feature is added to the tree-sitter grammar, an older walker rejects the unfamiliar node as tooComplex rather than silently misinterpreting it.
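A minimal sketch of the fail-closed walk, assuming a simplified node shape and two tiny allowlists. `MiniNode`, `extractArgv`, and the type sets are illustrative and much smaller than the real ones in ast.ts:

```typescript
// Simplified node shape and type sets; all names here are illustrative.
interface MiniNode {
  type: string
  text: string
  children: MiniNode[]
}

type WalkResult =
  | { kind: 'simple'; argv: string[] }
  | { kind: 'too-complex'; nodeType: string }

const STRUCTURAL = new Set(['program', 'command'])
const SAFE_LEAVES = new Set(['word', 'number'])

function extractArgv(node: MiniNode, argv: string[] = []): WalkResult {
  // A leaf type is only trusted when it really is a leaf.
  if (SAFE_LEAVES.has(node.type) && node.children.length === 0) {
    argv.push(node.text)
    return { kind: 'simple', argv }
  }
  // Anything not explicitly allowlisted is rejected, including node
  // types that did not exist when this walker was written.
  if (!STRUCTURAL.has(node.type)) {
    return { kind: 'too-complex', nodeType: node.type }
  }
  for (const child of node.children) {
    const result = extractArgv(child, argv)
    if (result.kind === 'too-complex') return result
  }
  return { kind: 'simple', argv }
}

const leaf = (type: string, text: string): MiniNode => ({ type, text, children: [] })
const tree = (type: string, children: MiniNode[]): MiniNode => ({ type, text: '', children })

const known = extractArgv(
  tree('program', [tree('command', [leaf('word', 'echo'), leaf('word', 'hi')])]),
)
const unknown = extractArgv(
  tree('program', [tree('command', [leaf('word', 'echo'), leaf('future_construct', '???')])]),
)
console.log(known, unknown)
```

The rejection branch has no list of "bad" types to keep up to date: a node is accepted only if it is named in an allowlist, so the default for anything new is refusal.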
Problem: VAR=safe && cmd $VAR must resolve $VAR to safe, not treat it as unknown.
Solution: varScope Map tracks variable assignments as we walk the AST:
// ast.ts lines 472-475
const varScope = new Map<string, string>()
const err = collectCommands(root, commands, varScope)
// varScope maps var names → literal values or placeholders

Three Value Types:
- Literal strings (e.g., /tmp, foo):
  - Returned DIRECTLY to downstream path validation
  - VAR=/etc && rm $VAR → argv=['rm', '/etc']
- Placeholders:
  - CMDSUB_PLACEHOLDER = __CMDSUB_OUTPUT__ (from $(cmd))
  - VAR_PLACEHOLDER = __TRACKED_VAR__ (unknown value: loop var, read stdin, etc.)
  - Bare $VAR with placeholder → too-complex (can't prove safety)
  - Inside strings: "prefix$VAR" → allowed (output embedded, not a bare arg)
- containsAnyPlaceholder() guard (ast.ts lines 94-96):
  function containsAnyPlaceholder(value: string): boolean { return value.includes(CMDSUB_PLACEHOLDER) || value.includes(VAR_PLACEHOLDER) }
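The bare-versus-embedded rule can be illustrated with a small sketch. The placeholder value is the one documented above; `resolveVar` and its result type are hypothetical, and the handling of untracked variables is an assumption made for the example rather than a statement about ast.ts:

```typescript
// Placeholder value as documented above; the function and result type are hypothetical.
const VAR_PLACEHOLDER = '__TRACKED_VAR__'

type Resolved = { ok: true; value: string } | { ok: false; reason: string }

function resolveVar(name: string, scope: Map<string, string>, insideString: boolean): Resolved {
  const value = scope.get(name)
  if (value === undefined) {
    // Assumption: untracked variables are rejected in this sketch.
    return { ok: false, reason: `$${name} is not tracked` }
  }
  if (value === VAR_PLACEHOLDER && !insideString) {
    // A bare unknown value could expand to any path or flag.
    return { ok: false, reason: `bare $${name} has an unknown value` }
  }
  return { ok: true, value }
}

const scope = new Map<string, string>([
  ['DIR', '/etc'], // from DIR=/etc
  ['LINE', VAR_PLACEHOLDER], // from `read LINE`
])

const literal = resolveVar('DIR', scope, false) // resolves to '/etc'
const bare = resolveVar('LINE', scope, false) // rejected
const embedded = resolveVar('LINE', scope, true) // placeholder kept in the string
console.log(literal, bare, embedded)
```

Literals flow straight through to path validation, while an unknown value is only tolerated where it cannot become a whole argument on its own.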
Problem: Variables set in a subshell/branch shouldn't leak outside:
# Flag-omission attack:
true || FLAG=--dry-run && cmd $FLAG
# Bash: || RHS doesn't run → FLAG unset → $FLAG empty
# If scope leaked: our argv would show ['cmd', '--dry-run'] → bypass

Solution (ast.ts lines 504-564): Scope snapshots at branch points.
// Pipeline: ALL stages are subshells
if (isPipeline) {
scope = new Map(varScope) // Copy scope for first stage
}
// List (&&, ||): snapshot at entry
const snapshot = needsSnapshot ? new Map(varScope) : null
// After ||/| operators, reset to snapshot
if (child.type === '||' || child.type === '|' || child.type === '&') {
scope = new Map(snapshot ?? varScope)
}

- &&, ;: Sequential, scope shared (VAR=x && cmd $VAR)
- ||, |, &: Isolated, scope reset to pre-operator state
- for/while bodies: Scope copies (body assignments don't leak)
- if branches: Scope copies (branch assignments don't leak)
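The snapshot rule for `||` can be sketched on a toy tree. This assumes, as in tree-sitter-bash, that `a || b && c` parses left-associatively into nested list nodes; the `Item` type, `walk`, and `run` are hypothetical simplifications, and `<unset>` stands in for an untracked variable:

```typescript
// Toy AST; all names are illustrative, not the real ast.ts types.
type Item =
  | { type: 'list'; children: Item[] }
  | { type: 'op'; op: '&&' | '||' | ';' }
  | { type: 'assign'; name: string; value: string }
  | { type: 'cmd'; name: string; varArg?: string }

function walk(node: Item, scope: Map<string, string>, seen: string[][]): void {
  switch (node.type) {
    case 'assign':
      scope.set(node.name, node.value)
      return
    case 'cmd': {
      const argv = [node.name]
      if (node.varArg !== undefined) argv.push(scope.get(node.varArg) ?? '<unset>')
      seen.push(argv)
      return
    }
    case 'op':
      return
    case 'list': {
      // Only pay for a copy when the list actually contains a branch operator.
      const needsSnapshot = node.children.some((c) => c.type === 'op' && c.op === '||')
      const snapshot = needsSnapshot ? new Map(scope) : null
      let current = scope
      for (const child of node.children) {
        if (child.type === 'op') {
          // The right-hand side of || may not run: give it a throwaway scope.
          if (child.op === '||') current = new Map(snapshot ?? scope)
          continue
        }
        walk(child, current, seen)
      }
      return
    }
  }
}

function run(root: Item): string[][] {
  const seen: string[][] = []
  walk(root, new Map(), seen)
  return seen
}

// (true || FLAG=--dry-run) && cmd $FLAG
const branchy: Item = {
  type: 'list',
  children: [
    {
      type: 'list',
      children: [
        { type: 'cmd', name: 'true' },
        { type: 'op', op: '||' },
        { type: 'assign', name: 'FLAG', value: '--dry-run' },
      ],
    },
    { type: 'op', op: '&&' },
    { type: 'cmd', name: 'cmd', varArg: 'FLAG' },
  ],
}

// FLAG=--dry-run && cmd $FLAG
const sequential: Item = {
  type: 'list',
  children: [
    { type: 'assign', name: 'FLAG', value: '--dry-run' },
    { type: 'op', op: '&&' },
    { type: 'cmd', name: 'cmd', varArg: 'FLAG' },
  ],
}

console.log(run(branchy), run(sequential))
```

In the branchy case the assignment lands in a copy that is discarded when the inner list ends, so `cmd` never sees `--dry-run`; in the sequential case the shared scope carries it through.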
In a `while read VAR` loop, VAR is assigned in the condition, which always runs, while the body runs conditionally.
while read V < file; do
cmd $V
done

Solution (ast.ts lines 839-877): Track read VAR in REAL scope (accessible to body), but reject if the read MIGHT NOT execute:
// e.g., if true || read VAR; then... fi
// The ||'d read may not run, so VAR might not be set
// But VAR is already tracked → placeholder override bypass
// Fail closed: reject if overwriting a known literal

When $(cmd) appears in argv, the inner command is extracted and permission-checked separately:
// ast.ts lines 1374-1393
function collectCommandSubstitution(
csNode: Node,
innerCommands: SimpleCommand[],
varScope: Map<string, string>,
): ParseForSecurityResult | null {
// Inner commands see COPY of outer scope (no pollution)
const innerScope = new Map(varScope)
for (const child of csNode.children) {
if (child.type === '$(' || child.type === '`' || child.type === ')') {
continue
}
const err = collectCommands(child, innerCommands, innerScope)
if (err) return err
}
return null
}

Examples:
echo $(git rev-parse HEAD)
# Extracts: ['echo', '$(git rev-parse HEAD)']
# ['git', 'rev-parse', 'HEAD']
# Both must match permission rules
git commit -m "SHA: $(git rev-parse --short HEAD)"
# Outer: ['git', 'commit', '-m', 'SHA: __CMDSUB_OUTPUT__']
# Inner: ['git', 'rev-parse', '--short', 'HEAD']

Pattern: $(cat <<'EOF'...EOF) is idiomatic for multi-line content.
// ast.ts lines 1721-1775
function extractSafeCatHeredoc(subNode: Node): string | 'DANGEROUS' | null {
// Exact match: $(cat <<'DELIM'...DELIM)
// - Quoted delimiter (literal body)
// - No other arguments to cat
// - No pipeline/redirection
if (PROC_ENVIRON_RE.test(body)) return 'DANGEROUS'
if (/\bsystem\s*\(/.test(body)) return 'DANGEROUS'
return body // Embed body in argv
}
// Example:
gh pr create --body "$(cat <<'EOF'
## Summary
...
EOF
)"
# body = "## Summary\n...\n"
# Embedded in argv as literal (no placeholder)

Dangerous patterns rejected:
- /proc/*/environ
- jq system() calls
String nodes are complex: they can contain expansions ($VAR, $()) and have tree-sitter newline quirks.
// ast.ts lines 1508-1652
function walkString(node: Node, ...): string | ParseForSecurityResult {
let result = ''
let cursor = -1 // Track previous child endIndex
let sawDynamicPlaceholder = false
let sawLiteralContent = false
for (const child of node.children) {
// Index gap = dropped newline (tree-sitter quirk)
if (cursor !== -1 && child.startIndex > cursor && child.type !== '"') {
result += '\n'.repeat(child.startIndex - cursor)
sawLiteralContent = true
}
cursor = child.endIndex
switch (child.type) {
case 'string_content':
// Unescape only $ ` " \ (double-quote rules, not generic)
result += child.text.replace(/\\([$`"\\])/g, '$1')
sawLiteralContent = true
break
case 'command_substitution':
// $() inside "..." — extract inner cmd
const err = collectCommandSubstitution(child, innerCommands, varScope)
if (err) return err
result += CMDSUB_PLACEHOLDER
sawDynamicPlaceholder = true
break
case 'simple_expansion':
// $VAR inside "..." — resolveSimpleExpansion(insideString=true)
const v = resolveSimpleExpansion(child, varScope, true)
if (typeof v !== 'string') return v
if (v === VAR_PLACEHOLDER) sawDynamicPlaceholder = true
else sawLiteralContent = true
result += v
break
}
}
// SECURITY: Reject solo-placeholder strings
// "$(cmd)" becomes just __CMDSUB_OUTPUT__, hiding the real path
if (sawDynamicPlaceholder && !sawLiteralContent) {
return tooComplex(node)
}
return result
}

Key invariant: "$(cmd)" alone is rejected (argv would be ['cmd', '__CMDSUB_OUTPUT__']), but "sha: $(cmd)" is allowed (the output is embedded in a longer string and cannot be a bare path).
$((expr)) is only safe if it contains literal integers and operators (no variables, no substitutions).
// ast.ts lines 1659-1702
const ARITH_LEAF_RE = /^(?:[0-9]+|0[xX][0-9a-fA-F]+|[0-9]+#[0-9a-zA-Z]+|[-+*/%^&|~!<>=?:(),]+|<<|>>|\*\*|&&|\|\||[<>=!]=|\$\(\(|\)\))$/
function walkArithmetic(node: Node): ParseForSecurityResult | null {
for (const child of node.children) {
if (child.children.length === 0) {
if (!ARITH_LEAF_RE.test(child.text)) {
return tooComplex(child) // Variable ref → RCE via subscript
}
} else {
// Recurse into expression trees
const err = walkArithmetic(child)
if (err) return err
}
}
return null
}
// Rejected (RCE via arithmetic injection):
// VAR='a[$(id)]' && echo $((VAR))
// bash evaluates subscript: runs id, stores exit code

Post-parsing semantic validation catches dangerous patterns that tokenize fine but are unsafe by name or argument content.
35+ dangerous commands (ast.ts lines 2086-2134):
const EVAL_LIKE_BUILTINS = new Set([
'eval', // eval "..."
'source', // source file
'.', // . file (alias for source)
'exec', // exec cmd (replaces shell with cmd)
'command', // command -p cmd
'builtin', // builtin cmd
'fc', // fc (fix command, reevaluates)
'coproc', // coproc cmd (coprocess spawning)
'noglob', // noglob cmd (zsh precommand)
'nocorrect', // nocorrect cmd (zsh precommand)
'trap', // trap 'cmd' SIGNAL (guaranteed execution on EXIT)
'enable', // enable -f /path/lib.so name (dlopen arbitrary .so)
'mapfile', // mapfile -C callback (callback runs per-line)
'readarray', // readarray -C callback
'hash', // hash -p /path cmd (poison lookup cache)
'bind', // bind -x '"key":cmd' (interactive callback)
'complete', // complete -C cmd (executes cmd)
'compgen', // compgen -C cmd (non-interactive completion)
'alias', // alias name='cmd' (with expand_aliases shopt)
'let', // let EXPR (arithmetic eval → subscript RCE)
])

const ZSH_DANGEROUS_BUILTINS = new Set([
'zmodload', // Load zsh modules
'emulate', // Switch shell emulation mode
'sysopen', // Open file descriptor
'sysread', // Read from file descriptor
'syswrite', // Write to file descriptor
'sysseek', // Seek in file descriptor
'zpty', // Pseudo-terminal control
'ztcp', // TCP socket control
'zsocket', // Socket control
'zf_rm', 'zf_mv', 'zf_ln', 'zf_chmod', 'zf_chown', // zsh/files module builtins (file operations)
'zf_mkdir', 'zf_rmdir', 'zf_chgrp',
])

Commands like timeout, nohup, nice, env, stdbuf wrap other commands. Permission checking must see the wrapped command, not the wrapper.
// ast.ts lines 2220-2384
function checkSemantics(commands: SimpleCommand[]): SemanticCheckResult {
for (const cmd of commands) {
let a = cmd.argv
// Strip wrappers until we see the real command
for (;;) {
if (a[0] === 'time' || a[0] === 'nohup') {
a = a.slice(1)
} else if (a[0] === 'timeout') {
// Parse timeout flags: -v, -k DUR, -s SIG, --foreground, etc.
// SECURITY: Fail CLOSED on unknown flags
// (previous versions fell through to name='timeout')
let i = 1
while (i < a.length) {
const arg = a[i]!
if (arg === '--foreground' || arg === '--preserve-status') {
i++ // no-value flags
} else if (/^--(?:kill-after|signal)=[...]/.test(arg)) {
i++ // fused long form
} else if ((arg === '-k' || arg === '-s') && a[i+1]) {
i += 2 // space-separated form
} else if (/^-[ks][A-Za-z0-9_.+-]+$/.test(arg)) {
i++ // fused short form
} else if (arg.startsWith('-')) {
return {
ok: false,
reason: `timeout with ${arg} flag cannot be statically analyzed`
}
} else {
break // duration found
}
}
if (a[i] && /^\d+(?:\.\d+)?[smhd]?$/.test(a[i]!)) {
a = a.slice(i + 1)
} else {
// SECURITY: non-matching duration → fail CLOSED
// GNU timeout accepts .5, +5, 5e-1, inf (not matched by regex)
// Previously this fell through to name='timeout'
return {
ok: false,
reason: `timeout duration '${a[i]}' cannot be statically analyzed`
}
}
} else if (a[0] === 'nice') {
// -n N or -N (legacy), then wrapped command
} else if (a[0] === 'env') {
// [VAR=val...] [-i] [-0] [-v] [-u NAME...] cmd
// SECURITY: -S (argv splitter), -C (altwd), -P (altpath) → reject
} else if (a[0] === 'stdbuf') {
// -o MODE, -e MODE, -i MODE (various forms)
// SECURITY: fail closed on unknown flags
} else {
break // real command found
}
}
const name = a[0]
// ... continue with name checks
}
}

Bash arithmetically evaluates subscripts in certain contexts, running $(cmd) even from single-quoted strings:
test -v 'a[$(id)]' # Evaluates subscript → runs id
printf -v 'a[$(id)]' # Same
read 'a[$(id)]' <<< x # Same
unset 'a[$(id)]' # Same
wait -p 'a[$(id)]' # bash 5.1+
[[ 'a[$(id)]' -eq 0 ]] # Arithmetic comparison

Detection (ast.ts lines 2428-2496):
const SUBSCRIPT_EVAL_FLAGS: Record<string, Set<string>> = {
test: new Set(['-v', '-R']),
'[': new Set(['-v', '-R']),
'[[': new Set(['-v', '-R']),
printf: new Set(['-v']),
read: new Set(['-a']),
unset: new Set(['-v']),
wait: new Set(['-p']),
}
const TEST_ARITH_CMP_OPS = new Set(['-eq', '-ne', '-lt', '-le', '-gt', '-ge'])
// In checkSemantics:
if (dangerFlags) {
for (let i = 1; i < a.length; i++) {
// Check for [brackets] in operands following danger flags
if (dangerFlags.has(a[i]!) && a[i+1]?.includes('[')) {
return { ok: false, reason: '... array subscript ...' }
}
}
}
if (name === '[[') {
for (let i = 2; i < a.length; i++) {
if (TEST_ARITH_CMP_OPS.has(a[i]!)) {
if (a[i-1]?.includes('[') || a[i+1]?.includes('[')) {
return { ok: false, reason: '... array subscript ...' }
}
}
}
}

// ast.ts lines 2197-2204
const PROC_ENVIRON_RE = /\/proc\/.*\/environ/
const NEWLINE_HASH_RE = /\n[ \t]*#/
// Detected in argument values, env vars, redirect targets

These run FIRST, before AST parsing, and catch tree-sitter/bash differentials:
// ast.ts lines 408-437
function parseForSecurityFromAst(cmd: string, root: Node): ParseForSecurityResult {
// Control characters (bash silently drops, tree-sitter mishandles)
if (CONTROL_CHAR_RE.test(cmd)) {
return { kind: 'too-complex', reason: 'Contains control characters' }
}
// const CONTROL_CHAR_RE = /[\x00-\x08\x0B-\x1F\x7F]/
// Unicode whitespace (invisible in terminals, not IFS)
if (UNICODE_WHITESPACE_RE.test(cmd)) {
return { kind: 'too-complex', reason: 'Contains Unicode whitespace' }
}
// const UNICODE_WHITESPACE_RE = /[\u00A0\u1680\u2000-\u200B\u2028\u2029\u202F\u205F\u3000\uFEFF]/
// Backslash-escaped whitespace (tree-sitter/bash differential)
if (BACKSLASH_WHITESPACE_RE.test(cmd)) {
return { kind: 'too-complex', reason: 'Contains backslash-escaped whitespace' }
}
// const BACKSLASH_WHITESPACE_RE = /\\[ \t]|[^ \t\n\\]\\\n/
// Example: `cat\ test` (tree-sitter keeps backslash, bash treats as literal space)
// Zsh dynamic tilde expansion: ~[name] invokes zsh_directory_name hook
if (ZSH_TILDE_BRACKET_RE.test(cmd)) {
return { kind: 'too-complex', reason: 'Contains zsh ~[ dynamic directory syntax' }
}
// const ZSH_TILDE_BRACKET_RE = /~\[/
// Zsh equals expansion: =cmd → /path/to/cmd (equivalent to $(which cmd))
if (ZSH_EQUALS_EXPANSION_RE.test(cmd)) {
return { kind: 'too-complex', reason: 'Contains zsh =cmd equals expansion' }
}
// const ZSH_EQUALS_EXPANSION_RE = /(?:^|[\s;&|])=[a-zA-Z_]/
// Example: `=curl evil.com` runs `/usr/bin/curl` (zsh only)
// Brace expansion combined with quotes (obfuscation attempt)
if (BRACE_WITH_QUOTE_RE.test(maskBracesInQuotedContexts(cmd))) {
return { kind: 'too-complex', reason: 'Contains brace with quote character' }
}
// const BRACE_WITH_QUOTE_RE = /\{[^}]*['"]/
// Example: `{a'}',b}` uses quoted } to obfuscate expansion
}

// ast.ts lines 331-371
function maskBracesInQuotedContexts(cmd: string): string {
// Fast path: no braces → skip scan
if (!cmd.includes('{')) return cmd
// Scan bash quote state (not regex-able due to quote nesting)
const out: string[] = []
let inSingle = false, inDouble = false, i = 0
while (i < cmd.length) {
const c = cmd[i]!
if (inSingle) {
if (c === "'") inSingle = false
out.push(c === '{' ? ' ' : c)
i++
} else if (inDouble) {
if (c === '\\' && (cmd[i+1] === '"' || cmd[i+1] === '\\')) {
out.push(c, cmd[i+1]!)
i += 2
} else {
if (c === '"') inDouble = false
out.push(c === '{' ? ' ' : c)
i++
}
} else {
if (c === '\\' && i + 1 < cmd.length) {
out.push(c, cmd[i+1]!)
i += 2
} else {
if (c === "'") inSingle = true
else if (c === '"') inDouble = true
out.push(c)
i++
}
}
}
return out.join('')
}
// Example:
// Input: echo "json" '{a,b}'
// Output: echo "json" ' a,b}' (the quoted { is masked to a space)
// Input: '{a,b}'
// Output: ' a,b}' (quoted { masked, so BRACE_WITH_QUOTE_RE won't match quoted braces)

Variables whose values are OS/shell-controlled (safe to expand):
// ast.ts lines 125-149
const SAFE_ENV_VARS = new Set([
'HOME', 'PWD', 'OLDPWD', 'USER', 'LOGNAME', 'SHELL', 'PATH',
'HOSTNAME', 'UID', 'EUID', 'PPID', 'RANDOM', 'SECONDS', 'LINENO',
'TMPDIR', 'BASH_VERSION', 'BASHPID', 'SHLVL', 'HISTFILE', 'IFS',
])
// NOTE: $IFS is safe INSIDE strings (quote prevents word-split)
// but NOT as bare arg (classic injection: IFS=:; VAR=a:b; cmd $VAR)

// ast.ts lines 167-174
const SPECIAL_VAR_NAMES = new Set([
'?', // exit status
'$', // shell PID
'!', // last background PID
'#', // number of positional args
'0', // script name
'-', // shell options
])
// NOT included: '@' and '*' (positional params, empty in fresh BashTool)
// If included, "$*" placeholder would mismatch bash behavior

# User enters:
VAR=/etc && cat "$VAR/passwd"
# Step 1: parseForSecurity
# - Pre-checks: pass (no dangerous patterns)
# - Parse: builds AST
# - Walk: collectCommands
# - variable_assignment VAR=/etc → varScope.set('VAR', '/etc')
# - command cat with argv child string("$VAR/passwd")
# - walkString → simple_expansion $VAR
# - resolveSimpleExpansion: tracked '/etc' → return literal
# - result = '/etc/passwd' (literal inside string)
# - Output: SimpleCommand { argv: ['cat', '/etc/passwd'], ... }
# Step 2: checkSemantics(['cat', '/etc/passwd'])
# - name='cat' (not dangerous)
# - No array subscripts
# - Path regex check in downstream code
# Step 3: Permission matching
# - User allowlist: Bash(cat:etc/passwd)
# - argv[0]='cat' matches ✓
# - PROCEED with execution

# User enters:
echo "Built at $(date '+%Y-%m-%d')"
# Step 1: parseForSecurity
# - Walk: walkString finds command_substitution
# - collectCommandSubstitution → extract inner commands
# - Output: SimpleCommand { argv: ['date', '+%Y-%m-%d'], ... }
# - Outer argv: ['echo', 'Built at __CMDSUB_OUTPUT__']
# - (Inner cmd is separate, permission-checked separately)
# Step 2: checkSemantics twice
# - Inner: checkSemantics(['date', '+%Y-%m-%d'])
# - name='date' → not dangerous
# - Outer: checkSemantics(['echo', 'Built at __CMDSUB_OUTPUT__'])
# - name='echo' → not dangerous
# Step 3: Permission matching (both commands)
# - User allowlist: Bash(echo:*) AND Bash(date:*)
# - PROCEED

# User enters:
git push $(cat /tmp/branch)
# Step 1: parseForSecurity
# - Walk: command git with argument push (word)
# - Next argument: command_substitution (BARE, not inside string)
# - Bare $() → command_name may be a path
# - Return: too-complex (reason: 'Contains command_substitution')
# Result: { kind: 'too-complex', reason: '...', nodeType: 'command_substitution' }
# Step 2: Request user permission
# - "This command is complex and requires your approval"
# - User reviews and approves/denies

# User enters:
eval "rm -rf /$VAR"
# Step 1: parseForSecurity
# - Parse: succeeds (valid syntax)
# - Walk: argv=['eval', 'rm -rf /$VAR']
# Step 2: checkSemantics
# - name='eval' → in EVAL_LIKE_BUILTINS
# - Return: { ok: false, reason: 'eval is a dangerous builtin...' }
# Result: Permission denied immediately (no need for user prompt for known-dangerous)

// ast.ts lines 224-234
const REDIRECT_OPS: Record<string, Redirect['op']> = {
'>': '>', // stdout to file
'>>': '>>', // append
'<': '<', // stdin from file
'>&': '>&', // redirect fd to fd
'<&': '<&', // dup fd
'>|': '>|', // clobber (noclobber override)
'&>': '&>', // stdout+stderr (bash)
'&>>': '&>>', // append stdout+stderr
'<<<': '<<<', // herestring (content as stdin)
}

// commands.ts lines 19-35
function generatePlaceholders() {
const salt = randomBytes(8).toString('hex') // 16 hex chars
return {
SINGLE_QUOTE: `__SINGLE_QUOTE_${salt}__`,
DOUBLE_QUOTE: `__DOUBLE_QUOTE_${salt}__`,
NEW_LINE: `__NEW_LINE_${salt}__`,
ESCAPED_OPEN_PAREN: `__ESCAPED_OPEN_PAREN_${salt}__`,
ESCAPED_CLOSE_PAREN: `__ESCAPED_CLOSE_PAREN_${salt}__`,
}
}
// SECURITY: Each command split gets new random salt
// Prevents: `sort __SINGLE_QUOTE__ hello --help __SINGLE_QUOTE__`
// from injecting arguments via placeholder collision

// ast.ts lines 2029-2031
function stripRawString(text: string): string {
return text.slice(1, -1) // Remove ' quotes: 'foo' → foo
}

// commands.ts lines 37-39
const ALLOWED_FILE_DESCRIPTORS = new Set(['0', '1', '2'])
// stdin, stdout, stderr only
// Reject: 3>&1, 9>/tmp/log (custom FDs)

// bashParser.ts lines 165-193
function byteAt(L: Lexer, charIdx: number): number {
if (L.byteTable) return L.byteTable[charIdx]! // Cache hit
// Lazy computation: only build table on non-ASCII lookups
const t = new Uint32Array(L.len + 1)
let b = 0, i = 0
while (i < L.len) {
t[i] = b
const c = L.src.charCodeAt(i)
if (c < 0x80) {
b++; i++
} else if (c < 0x800) {
b += 2; i++
} else if (c >= 0xd800 && c <= 0xdbff) {
t[i+1] = b + 2
b += 4; i += 2
} else {
b += 3; i++
}
}
t[L.len] = b
L.byteTable = t
return t[charIdx]!
}

Fast path: ASCII-only input never allocates byteTable (byte==char).
Scope copies only on branch points (||, |, &), not on sequential (&&, ;):
// ast.ts lines 530-544
const isPipeline = node.type === 'pipeline'
let needsSnapshot = false
if (!isPipeline) {
for (const c of node.children) {
if (c && (c.type === '||' || c.type === '&')) {
needsSnapshot = true
break
}
}
}
const snapshot = needsSnapshot ? new Map(varScope) : null

Optimization: Pre-scan for risky operators before allocating Map.
// ast.ts lines 331-334
function maskBracesInQuotedContexts(cmd: string): string {
if (!cmd.includes('{')) return cmd // >90% of commands
// ... full quote-state scan
}

Attack: Exploit cases where tree-sitter and bash parse differently, hiding dangerous code.
Mitigations:
- Pre-checks (lines 408-437): Catch known differentials before AST.
  - Control characters, Unicode whitespace, backslash-whitespace, zsh-syntax
- Explicit allowlist (FAIL-CLOSED): Unknown node types → too-complex.
  - New bash feature? Unknown to old parser? Safely rejected.
- Coverage: All known differentials catalogued in code comments.
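The regex pre-checks lend themselves to a table-driven form. The sketch below reuses four of the regexes and reason strings from the pre-check excerpt earlier in this document; the `PRE_CHECKS` table and `preCheck` wrapper are hypothetical and omit the backslash-whitespace and brace checks:

```typescript
// Regexes copied from the pre-check excerpt earlier in this document;
// the table and the preCheck wrapper are hypothetical helpers.
const PRE_CHECKS: ReadonlyArray<readonly [RegExp, string]> = [
  [/[\x00-\x08\x0B-\x1F\x7F]/, 'Contains control characters'],
  [/[\u00A0\u1680\u2000-\u200B\u2028\u2029\u202F\u205F\u3000\uFEFF]/, 'Contains Unicode whitespace'],
  [/~\[/, 'Contains zsh ~[ dynamic directory syntax'],
  [/(?:^|[\s;&|])=[a-zA-Z_]/, 'Contains zsh =cmd equals expansion'],
]

// Returns the first rejection reason, or null when the raw text passes.
function preCheck(cmd: string): string | null {
  for (const [re, reason] of PRE_CHECKS) {
    if (re.test(cmd)) return reason
  }
  return null
}

console.log(preCheck('ls -la'))
console.log(preCheck('ls\u00A0-la'))
console.log(preCheck('=curl evil.com'))
console.log(preCheck('FOO=bar make'))
```

Because these checks run on the raw command text, they do not depend on the parser agreeing with bash about how the suspicious bytes tokenize.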
Attack: eval ${VAR_NAME} where VAR_NAME contains dangerous code.
Mitigation: Variable names validated in assignment:
// ast.ts lines 1835-1841
if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(name)) {
return {
kind: 'too-complex',
reason: `Invalid variable name (bash treats as command): ${name}`,
}
}

Attack: VAR='a[$(id)]' && test -v $VAR → bash runs id.
Mitigation: walkArithmetic() validates literals only, checkSemantics() detects -v flags with [ operands.
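The flag-based half of this mitigation can be sketched as follows. The table is a subset of the SUBSCRIPT_EVAL_FLAGS shown earlier, stored in a Map; `findSubscriptRisk` is a hypothetical name, and the sketch does not cover the `[[ ... -eq ... ]]` comparison case:

```typescript
// Subset of the SUBSCRIPT_EVAL_FLAGS table shown earlier; findSubscriptRisk is a hypothetical name.
const SUBSCRIPT_FLAGS = new Map<string, Set<string>>([
  ['test', new Set(['-v', '-R'])],
  ['printf', new Set(['-v'])],
  ['read', new Set(['-a'])],
  ['unset', new Set(['-v'])],
  ['wait', new Set(['-p'])],
])

// argv is assumed to be post-quote-removal, so 'a[$(id)]' arrives as a[$(id)].
function findSubscriptRisk(argv: string[]): string | null {
  const name = argv[0]
  if (name === undefined) return null
  const flags = SUBSCRIPT_FLAGS.get(name)
  if (!flags) return null
  for (let i = 1; i + 1 < argv.length; i++) {
    const flag = argv[i]!
    const operand = argv[i + 1]!
    if (flags.has(flag) && operand.includes('[')) {
      return `${name} ${flag} operand '${operand}' contains an array subscript`
    }
  }
  return null
}

console.log(findSubscriptRisk(['test', '-v', 'a[$(id)]']))
console.log(findSubscriptRisk(['test', '-v', 'HOME']))
```

The check is deliberately coarse: any `[` in the operand of a listed flag is refused, with no attempt to decide whether the subscript is harmless.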
Attack: cat <<'EOF' <content with /proc/self/environ> EOF → read the process environment (potential secrets).
Mitigation: extractSafeCatHeredoc() rejects /proc/*/environ patterns.
Attack: enable -f /path/lib.so name → dlopen arbitrary .so.
Mitigation: enable in EVAL_LIKE_BUILTINS → checkSemantics() rejects.
Attack: timeout -k 5 10 eval "..." → wrapper confusion, eval never checked.
Mitigation: checkSemantics() iterates all known timeout flags, fails CLOSED on unknown (doesn't fall through to name='timeout').
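A reduced sketch of the wrapper-stripping loop, covering only `time`, `nohup`, and a subset of `timeout` flags. The duration regex is the one quoted in the checkSemantics excerpt; `stripWrappers` and its result type are hypothetical, and the fused flag forms are intentionally left out, so they fail closed here:

```typescript
// Hypothetical, reduced version of the wrapper handling described above.
type Stripped = { ok: true; argv: string[] } | { ok: false; reason: string }

const DURATION_RE = /^\d+(?:\.\d+)?[smhd]?$/

function stripWrappers(argv: string[]): Stripped {
  let a = argv
  for (;;) {
    const head = a[0]
    if (head === 'time' || head === 'nohup') {
      a = a.slice(1)
    } else if (head === 'timeout') {
      let i = 1
      while (i < a.length) {
        const arg = a[i]!
        if (arg === '--foreground' || arg === '--preserve-status') {
          i += 1 // no-value flags
        } else if ((arg === '-k' || arg === '-s') && a[i + 1] !== undefined) {
          i += 2 // flag plus separate value
        } else if (arg.startsWith('-')) {
          // Unknown flag: refuse rather than guess where the command starts.
          return { ok: false, reason: `timeout with ${arg} flag cannot be statically analyzed` }
        } else {
          break // duration candidate
        }
      }
      const duration = a[i]
      if (duration === undefined || !DURATION_RE.test(duration)) {
        return { ok: false, reason: `timeout duration '${duration}' cannot be statically analyzed` }
      }
      a = a.slice(i + 1)
    } else {
      return { ok: true, argv: a } // real command found
    }
  }
}

console.log(stripWrappers(['timeout', '-k', '5', '10', 'eval', 'rm -rf /']))
console.log(stripWrappers(['timeout', '--weird', '10', 'ls']))
```

With the wrapper removed, the first example exposes `eval` as argv[0], which is what lets the builtin check fire.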
Attack: Command contains a literal placeholder such as __SINGLE_QUOTE__ → argument injection via placeholder collision.
Mitigation: Placeholders include 16-char random salt (commands.ts lines 19-35).
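The effect of the salt can be shown with a small round trip. This sketch uses Node's `randomBytes` as in the generatePlaceholders excerpt; `makePlaceholder` and `maskSingleQuotes` are hypothetical helpers, not functions from commands.ts:

```typescript
import { randomBytes } from 'node:crypto'

// Hypothetical helpers; only the salting scheme mirrors the excerpt above.
function makePlaceholder(label: string): string {
  const salt = randomBytes(8).toString('hex') // 16 hex chars, fresh per call
  return `__${label}_${salt}__`
}

function maskSingleQuotes(cmd: string): { token: string; masked: string; restore: (s: string) => string } {
  const token = makePlaceholder('SINGLE_QUOTE')
  return {
    token,
    masked: cmd.split("'").join(token),
    restore: (s: string) => s.split(token).join("'"),
  }
}

// The attacker-supplied text contains the unsalted name on purpose.
const hostile = "sort __SINGLE_QUOTE__ 'hello' --help __SINGLE_QUOTE__"
const { token, masked, restore } = maskSingleQuotes(hostile)
const roundTrip = restore(masked)
console.log(masked)
console.log(roundTrip === hostile)
```

Because the attacker cannot predict the salt, the literal `__SINGLE_QUOTE__` in the input is never mistaken for a real placeholder, and restoring the masked text reproduces the original command exactly.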
Documented adversarial inputs that trigger parser limits:
# Timeout trigger (bashParser.ts lines 444-456):
# (( a[0][0][0]... )) with ~2800 subscripts → PARSE_TIMEOUT_MS
# Node budget exhaustion:
# Deeply nested: (((((...))))) with 10K levels → MAX_NODES
# Result: PARSE_ABORTED sentinel → too-complex (fail-closed)

# Correct: foo\<LF>bar → joined to foobar
# Incorrect: foo\\<LF>bar → NOT joined (paired backslashes)
# commands.ts lines 105-119: Regex handles odd/even backslash count

# Supported: <<'EOF' (quoted, literal body)
# Rejected: <<EOF (unquoted, expansion enabled)
# Reason: tree-sitter grammar gap; backticks not parsed in body children

if (feature('TREE_SITTER_BASH') || feature('TREE_SITTER_BASH_SHADOW')) {
// Tree-sitter path
}
// External builds: shell-quote fallback (legacy)

Handled downstream in bashPermissions.ts (not in parser):
- Permission rules: Bash(git:*), Bash(cat:etc/passwd), etc.
- Deny rules: Bash(rm:/), etc.
// parser.ts lines 40-44, 120-124, 128-133
logEvent('tengu_tree_sitter_load', { success })
logEvent('tengu_tree_sitter_parse_abort', {
cmdLength: command.length,
panic: false, // or true
})

| File | LOC | Purpose |
|---|---|---|
| bashParser.ts | 4436 | Lexer, tokenizer, recursive descent parser → TsNode |
| ast.ts | 2679 | Tree walker, SimpleCommand extraction, semantic checks |
| commands.ts | 1339 | shell-quote integration, redirection stripping, prefix extraction |
| heredoc.ts | 733 | Heredoc extraction/restoration for shell-quote |
| ShellSnapshot.ts | 582 | Shell environment snapshot |
| treeSitterAnalysis.ts | 506 | Tree-sitter utilities (legacy) |
| ParsedCommand.ts | 318 | Wrapper for parsed command data |
| shellQuote.ts | 304 | Quote handling utilities |
| bashPipeCommand.ts | 294 | Pipe command execution |
| shellCompletion.ts | 259 | Shell completion integration |
| parser.ts | 230 | Public API, TREE_SITTER_BASH feature gate |
| prefix.ts | 204 | Command prefix extraction for permission rules |
| shellQuoting.ts | 128 | Quote/escape utilities |
| specs/* | ~140 | Command-specific specs (pyright, timeout, srun, etc.) |
// ast.ts imports
import { SHELL_KEYWORDS } from './bashParser.js'
import type { Node } from './parser.js'
import { PARSE_ABORTED, parseCommandRaw } from './parser.js'
// commands.ts imports
import { quote, tryParseShellCommand } from './shellQuote.js'
import { extractHeredocs, restoreHeredocs } from './heredoc.js'
// parser.ts imports
import { logEvent } from '../../services/analytics/index.js'
import {
ensureParserInitialized,
getParserModule,
type TsNode,
} from './bashParser.js'

- Tilde expansion modeling: Conservative reject on ~ in assignment RHS.
- IFS modeling: Reject all IFS assignments (can't model custom splits).
- PS4 allowlist: Strict charset allowlist (not exhaustive, but conservative).
- Brace expansion edge cases: Over-rejects some valid cases for safety.
- Heredoc body handling: Can't inspect multiline bodies without false positives.
- Cached parser module: Current implementation reloads on each command.
- Incremental parsing: Large scripts could be parsed in chunks.
- Whitelist-based command categories: (already done via bashPermissions.ts)
- Integration with LSP: Real-time linting of bash commands.
Claude Code's bash parser is a security-first, fail-closed system that prioritizes user visibility and safety over convenience. By implementing:
- Explicit allowlists instead of blacklists
- Variable scope tracking with unknown-value sentinels
- Comprehensive semantic checks on dangerous builtins
- Parser differential detection via pre-checks
- Resource limits against adversarial input
...the system ensures that dangerous commands either execute with explicit user approval or are safely rejected. The design is inherently conservative—better to ask the user than to silently execute code that might be dangerous.
Command Nodes (checked against permission rules):
- command (simple command)
- declaration_command (export/declare/typeset/readonly/local)
- negated_command (! cmd)
- unset_command (unset VAR)
- test_command ([[ expr ]])

Structural Nodes (recursively traversed):
- program (root)
- list (&&, ||, ;)
- pipeline (|, |&)
- redirected_statement
- subshell ((cmd))
- compound_statement ({cmd;})
- for_statement, while_statement, until_statement
- if_statement, case_statement, elif_clause, else_clause
- do_group
- function_definition
- comment

Argument Nodes (analyzed for content):
- word
- number
- raw_string ('...')
- string ("...")
- concatenation
- simple_expansion ($VAR)
- command_substitution ($(...) or `...`)
- arithmetic_expansion ($((expr)))
- expansion (${VAR}, ${VAR:-default}, etc.)
- brace_expression ({a,b,c})
- process_substitution (<(...) or >(...))
- ansi_c_string ($'...')
- translated_string ($"...")
- string_content (literal inside double quotes)
- variable_name
- special_variable_name ($?, $$, etc.)

Redirect Nodes:
- file_redirect (> >> < <&)
- heredoc_redirect (<< or <<-)
- herestring_redirect (<<<)
- heredoc_body
- heredoc_start
- heredoc_end
- heredoc_content
- file_descriptor

Operators:
- &&, ||, |, |&, &, ;
- |, <<, >>, >|, <&, >&, &>, &>>, <<<
- (, ), [[, ]], [, ], {, }
- !, =, +=, test operators (-f, -d, ==, !=, =~, etc.)

END OF ANALYSIS
- Total AST Node Types Identified: 50+
- Dangerous Node Types: 15
- Dangerous Builtins: 35+
- Safe Environment Variables: 24
- Safe Special Variables: 6
- Redirect Operators: 10
- Parser Timeout (ms): 50
- Max Node Budget: 50,000
- Max Command Length: 10,000 characters