Skip to content

Latest commit

 

History

History
1399 lines (1114 loc) · 44.5 KB

File metadata and controls

1399 lines (1114 loc) · 44.5 KB

Claude Code v2.1.88 Bash Parser & Execution Subsystem - Deep Dive Analysis

Author: Security Analysis Date: 2026-04-02 Scope: /sessions/cool-friendly-einstein/mnt/claude-code/src/utils/bash/ (23 files, 12,306 LOC)


Executive Summary

Claude Code's bash parser and execution subsystem is a sophisticated security architecture that replaces naive shell-quote parsing with a fail-closed, tree-sitter-based AST analysis system. The design philosophy is explicit: never interpret code we don't understand. If the parser encounters an unallowlisted node type, the entire command is classified as "too-complex" and requires explicit user approval.

The system implements 15 dangerous AST node types, 35+ dangerous shell builtins, and strict semantic checks on command arguments before execution. This is not a sandbox—it's a permission-and-visibility system that answers one critical question: Can we produce a trustworthy argv[] for static analysis? If yes, downstream permission matching proceeds. If no, the user is asked.


1. Architecture Overview

1.1 Multi-Layer Design

The bash parsing system consists of four distinct layers:

┌─────────────────────────────────────────────────────────┐
│  Layer 1: Lexer & Tokenizer (bashParser.ts)             │
│  ├─ Token types: WORD, OP, COMMENT, STRING, DOLLAR, etc │
│  ├─ UTF-8 byte offset tracking (not JS char indices)     │
│  └─ Heredoc delimiter extraction                         │
├─────────────────────────────────────────────────────────┤
│  Layer 2: Recursive Descent Parser (bashParser.ts)       │
│  ├─ Grammar-based AST construction                       │
│  ├─ 50ms timeout + 50K node budget (anti-DoS)            │
│  └─ TsNode output (tree-sitter compatible format)        │
├─────────────────────────────────────────────────────────┤
│  Layer 3: Security-Aware Tree Walking (ast.ts)           │
│  ├─ Allowlist-based node type filtering (FAIL-CLOSED)    │
│  ├─ Variable scope tracking (VAR_PLACEHOLDER, literals)  │
│  ├─ Dangerous pattern detection                          │
│  └─ Outputs: SimpleCommand[] with argv/envVars/redirects │
├─────────────────────────────────────────────────────────┤
│  Layer 4: Semantic Post-Checks (ast.ts)                  │
│  ├─ Dangerous builtin detection                          │
│  ├─ Subscript arithmetic evaluation guards               │
│  ├─ Path/newline/hash injection detection                │
│  └─ Returns: ok:true OR rejection reason                 │
└─────────────────────────────────────────────────────────┘

1.2 Execution Pipeline

User Command
    ↓
[parseForSecurity(cmd)] ← Entry point in ast.ts
    ├─ Pre-checks: control chars, unicode whitespace, zsh syntax
    ├─ Parse: parseCommandRaw(cmd) → AST root
    └─ Walk: parseForSecurityFromAst(root) → SimpleCommand[]
    ↓
    ├─ kind: 'simple' → proceed with argv validation
    ├─ kind: 'too-complex' → request user approval
    └─ kind: 'parse-unavailable' → fallback to shell-quote (legacy)
    ↓
[checkSemantics(commands)] ← Post-parse validation
    ├─ Strip safe wrappers (nohup, time, timeout, nice, env, stdbuf)
    ├─ Dangerous builtin check (eval, source, trap, enable, etc.)
    ├─ Array subscript arithmetic detection
    ├─ Path constraint checks
    └─ Return: ok:true OR reject
    ↓
[Permission Matching] ← bashPermissions.ts
    └─ Match argv[0] + deny rules against user allowlist
    ↓
[Subprocess Execution] ← BashTool
    └─ spawn with sandboxed environment

1.3 Feature Flags & Module Loading

// parser.ts lines 65-82
if (feature('TREE_SITTER_BASH')) {
    await ensureParserInitialized()
    const mod = getParserModule()
    logLoadOnce(mod !== null)
    if (!mod) return null

    const rootNode = mod.parse(command)
    // ... tree-sitter path
}
// Falls back to shell-quote on unavailable
  • TREE_SITTER_BASH: Primary parser (enabled in production/ant-builds)
  • TREE_SITTER_BASH_SHADOW: Logging-only fallback path
  • External builds: use legacy shell-quote parsing

2. Parser Architecture

2.1 Pure-TypeScript Bash Parser (bashParser.ts: 4436 LOC)

The primary parser is hand-rolled (not tree-sitter WASM), implementing bash grammar in pure TypeScript with these characteristics:

Tokenizer (nextToken function, ~500 LOC)

Context-sensitive lexing for bash:

function nextToken(L: Lexer, ctx: 'cmd' | 'arg' = 'arg'): Token {
    // In 'cmd' mode: [ [[ { are operators (test/group start)
    // In 'arg' mode: they're word characters (glob/subscript)

    // Multi-char operators (longest match):
    // && || |& && |
    // << <<- <<< <& <(
    // >> >& >| &> &>>
    // ;; ;;& ;&
    // (( ))

    // Handles UTF-8 byte offsets (critical for source positions)
    // Tracks heredoc pending delimiters
}

Token Types (15 types):

  • WORD: Unquoted text
  • NUMBER: Digits (with base syntax: 10#ff)
  • OP: Operators (&&, ||, |, ;, >, <, (, ), etc.)
  • NEWLINE: Line separator
  • COMMENT: # ... text
  • DQUOTE/SQUOTE: Quote delimiters
  • ANSI_C: $'...' strings
  • DOLLAR/DOLLAR_PAREN/DOLLAR_BRACE/DOLLAR_DPAREN
  • BACKTICK: Command substitution delimiter
  • LT_PAREN/GT_PAREN: Process substitution <(...) / >(...)
  • EOF: End of input

UTF-8 Byte Offset Tracking

// bashParser.ts lines 145-159
function advance(L: Lexer): void {
    const c = L.src.charCodeAt(L.i)
    L.i++  // JS string index
    if (c < 0x80) {
        L.b++  // ASCII: 1 byte
    } else if (c < 0x800) {
        L.b += 2  // 2-byte UTF-8
    } else if (c >= 0xd800 && c <= 0xdbff) {
        L.b += 4  // Surrogate pair: 4 bytes
    } else {
        L.b += 3  // 3-byte UTF-8
    }
}

This ensures startIndex/endIndex in AST nodes are UTF-8 byte offsets (matching tree-sitter), not JS character indices—critical for correct source mapping.

Resource Limits (Anti-DoS)

// bashParser.ts lines 27-31
const PARSE_TIMEOUT_MS = 50      // Wall-clock timeout
const MAX_NODES = 50_000          // Node budget cap
const MAX_COMMAND_LENGTH = 10000  // Character limit

// Adversarial input example that triggers timeout:
// `(( a[0][0][0]... ))` with ~2800 subscripts exhausts budget

Returns PARSE_ABORTED (distinct sentinel) when timeout/budget exceeded—treated as fail-closed (too-complex), not routed to legacy fallback.

2.2 AST Grammar (Recursive Descent)

Core statement types parsed:

program
  ├─ command (pipe operators │ separate)
  ├─ list (&&, ||, ; chaining)
  ├─ pipeline (| | |&)
  ├─ redirected_statement (> >> < <<'EOF' etc.)
  ├─ negated_command (! cmd)
  ├─ declaration_command (export/declare/typeset/readonly/local)
  ├─ variable_assignment (VAR=value at statement level)
  ├─ for_statement (for VAR in LIST; do BODY; done)
  ├─ while_statement (while COND; do BODY; done)
  ├─ if_statement (if COND; then...; fi)
  ├─ case_statement (case VAR in PATTERN) cmd;; esac)
  ├─ function_definition (name() { BODY })
  ├─ subshell ((cmd))
  ├─ compound_statement ({ cmd; })
  ├─ test_command ([[ EXPR ]])
  ├─ unset_command (unset VAR...)
  └─ comment (# text)

command
  ├─ variable_assignment* (VAR=x prefix)
  ├─ command_name | word (argv[0])
  ├─ argument* (argv[1..n])
  └─ file_redirect* (> >> < << <<<)

argument
  ├─ word
  ├─ number (with `NN#` base syntax)
  ├─ raw_string ('...')
  ├─ string ("...") [with expansion children]
  ├─ concatenation (adjacent string/word parts)
  ├─ arithmetic_expansion ($((expr)))
  ├─ simple_expansion ($VAR)
  ├─ command_substitution ($(...) or `...`)
  └─ process_substitution (<(...) >(...)

3. AST Node Types & Safety Model

3.1 Complete AST Node Catalog

Structural Nodes (traversed recursively, not executed):

// ast.ts lines 54-65
const STRUCTURAL_TYPES = new Set([
    'program',              // Root node
    'list',                 // a && b || c
    'pipeline',             // a | b | c
    'redirected_statement', // cmd > file
])

const SEPARATOR_TYPES = new Set([
    '&&', '||', '|', ';', '&', '|&', '\n'
])

Dangerous Node Types (15 types—immediate rejection):

// ast.ts lines 186-205
const DANGEROUS_TYPES = new Set([
    'command_substitution',  // $(cmd) or `cmd`
    'process_substitution',  // <(cmd) >(cmd)
    'expansion',             // ${VAR} general expansion
    'simple_expansion',      // $VAR (conditionally allowed)
    'brace_expression',      // {a,b,c}
    'subshell',              // (cmd1; cmd2)
    'compound_statement',    // { cmd; }
    'for_statement',         // for VAR in ...; do...; done
    'while_statement',       // while COND; do...; done
    'until_statement',       // until COND; do...; done
    'if_statement',          // if...; then...; fi
    'case_statement',        // case VAR in PATTERN) cmd;; esac
    'function_definition',   // name() { body }
    'test_command',          // [[ ... ]]
    'ansi_c_string',         // $'...\n...'
    'translated_string',     // $"..." (gettext)
    'herestring_redirect',   // <<< content
    'heredoc_redirect',      // << delimiter
])

3.2 Allowlisted Node Types

Safe Leaf Nodes (terminal, no dangerous children):

// walkArgument switch cases (ast.ts ~1408):
case 'word':
    return node.text.replace(/\\(.)/g, '$1')  // Unescape

case 'number':
    // SECURITY: NN#<expansion> syntax (e.g., 10#$(cmd))
    // tree-sitter emits expansion as child—MUST REJECT
    if (node.children.length > 0) return tooComplex(node)
    return node.text

case 'raw_string':
    return stripRawString(node.text)  // Remove ' quotes

case 'string':
    return walkString(node, ...)  // Complex—see below

case 'concatenation':
    // Adjacent strings/words: "foo"bar → "foobar"
    // Brace expansion check ($BRACE_EXPANSION_RE)
    // Recurse into children

case 'arithmetic_expansion':
    const err = walkArithmetic(node)  // Validate literals only
    return node.text  // Return full $((...))

Safe Command-Level Nodes:

declaration_command  (export/declare/typeset/readonly/local)
negated_command      (! cmd)
variable_assignment  (VAR=value at statement level)
unset_command        (unset VAR...)
test_command         ([[ ... ]])

3.3 DANGEROUS_TYPE_IDS Mapping (Analytics)

// ast.ts lines 212-218
export function nodeTypeId(nodeType: string | undefined): number {
    if (!nodeType) return -2          // No type (pre-check)
    if (nodeType === 'ERROR') return -1  // Parse error
    const i = DANGEROUS_TYPE_IDS.indexOf(nodeType)
    return i >= 0 ? i + 1 : 0        // Unknown type ID
}

// Indices (stable for analytics):
// 1=command_substitution, 2=process_substitution, 3=expansion,
// 4=simple_expansion, 5=brace_expression, ..., 15=heredoc_redirect

4. Security-Critical Design Decisions

4.1 Fail-Closed Philosophy

Core Principle: If we can't prove a node is safe, reject it.

// ast.ts lines 1-19
/**
 * FAIL-CLOSED: we never interpret structure we don't understand.
 * If tree-sitter produces a node we haven't explicitly allowlisted,
 * we refuse to extract argv and the caller must ask the user.
 */

Example: New bash feature added to tree-sitter grammar? Old parser version will correctly reject it as tooComplex rather than silently misinterpreting it.

4.2 Variable Scope Tracking

Problem: VAR=safe && cmd $VAR must resolve $VAR to safe, not treat it as unknown.

Solution: varScope Map tracks variable assignments as we walk the AST:

// ast.ts lines 472-475
const varScope = new Map<string, string>()
const err = collectCommands(root, commands, varScope)
// varScope maps var names → literal values or placeholders

Three Value Types:

  1. Literal strings (e.g., /tmp, foo):

    • Returned DIRECTLY to downstream path validation
    • VAR=/etc && rm $VAR → argv=['rm', '/etc']
  2. Placeholders:

    • CMDSUB_PLACEHOLDER = __CMDSUB_OUTPUT__ (from $(cmd))
    • VAR_PLACEHOLDER = __TRACKED_VAR__ (unknown-value: loop var, read stdin, etc.)
    • Bare $VAR with placeholder → too-complex (can't prove safety)
    • Inside strings: "prefix$VAR" → allowed (output embedded, not a bare arg)
  3. containsAnyPlaceholder() guard (ast.ts lines 94-96):

    function containsAnyPlaceholder(value: string): boolean {
        return value.includes(CMDSUB_PLACEHOLDER) ||
               value.includes(VAR_PLACEHOLDER)
    }

4.3 Scope Isolation (Subshells, Conditionals, Loops)

Problem: Variables set in a subshell/branch shouldn't leak outside:

# Flag-omission attack:
true || FLAG=--dry-run && cmd $FLAG
# Bash: || RHS doesn't run → FLAG unset → $FLAG empty
# If scope leaked: our argv would show ['cmd', '--dry-run'] → bypass

Solution (ast.ts lines 504-564): Scope snapshots at branch points.

// Pipeline: ALL stages are subshells
if (isPipeline) {
    scope = new Map(varScope)  // Copy scope for first stage
}

// List (&&, ||): snapshot at entry
const snapshot = needsSnapshot ? new Map(varScope) : null

// After ||/| operators, reset to snapshot
if (child.type === '||' || child.type === '|' || child.type === '&') {
    scope = new Map(snapshot ?? varScope)
}
  • &&, ;: Sequential, scope shared (VAR=x && cmd $VAR)
  • ||, |, &: Isolated, scope reset to pre-operator state
  • for/while bodies: Scope copies (body assignments don't leak)
  • if branches: Scope copies (branch assignments don't leak)

4.4 Read Var Tracking (Conditionals)

Bash while read VAR in a condition: VAR is set in the condition (always runs) but body is conditional.

while read V < file; do
    cmd $V
done

Solution (ast.ts lines 839-877): Track read VAR in REAL scope (accessible to body), but reject if the read MIGHT NOT execute:

// e.g., if true || read VAR; then... fi
// The ||'d read may not run, so VAR might not be set
// But VAR is already tracked → placeholder override bypass
// Fail closed: reject if overwriting a known literal

5. Advanced AST Walking Patterns

5.1 Command Substitution Extraction

When $(cmd) appears in argv, the inner command is extracted and permission-checked separately:

// ast.ts lines 1374-1393
function collectCommandSubstitution(
    csNode: Node,
    innerCommands: SimpleCommand[],
    varScope: Map<string, string>,
): ParseForSecurityResult | null {
    // Inner commands see COPY of outer scope (no pollution)
    const innerScope = new Map(varScope)

    for (const child of csNode.children) {
        if (child.type === '$(' || child.type === '`' || child.type === ')') {
            continue
        }
        const err = collectCommands(child, innerCommands, innerScope)
        if (err) return err
    }
    return null
}

Examples:

echo $(git rev-parse HEAD)
# Extracts: ['echo', '$(git rev-parse HEAD)']
#           ['git', 'rev-parse', 'HEAD']
# Both must match permission rules

git commit -m "SHA: $(git rev-parse --short HEAD)"
# Outer: ['git', 'commit', '-m', 'SHA: __CMDSUB_OUTPUT__']
# Inner: ['git', 'rev-parse', '--short', 'HEAD']

5.2 Safe Heredoc Detection (cat Carve-out)

Pattern: $(cat <<'EOF'...EOF) is idiomatic for multi-line content.

// ast.ts lines 1721-1775
function extractSafeCatHeredoc(subNode: Node): string | 'DANGEROUS' | null {
    // Exact match: $(cat <<'DELIM'...DELIM)
    // - Quoted delimiter (literal body)
    // - No other arguments to cat
    // - No pipeline/redirection

    if (PROC_ENVIRON_RE.test(body)) return 'DANGEROUS'
    if (/\bsystem\s*\(/.test(body)) return 'DANGEROUS'
    return body  // Embed body in argv
}

// Example:
gh pr create --body "$(cat <<'EOF'
## Summary
...
EOF
)"
# body = "## Summary\n...\n"
# Embedded in argv as literal (no placeholder)

Dangerous patterns rejected:

  • /proc/*/environ
  • jq system() calls

5.3 String Walking (Double-Quote Context)

String nodes are complex: they can contain expansions ($VAR, $()) and have tree-sitter newline quirks.

// ast.ts lines 1508-1652
function walkString(node: Node, ...): string | ParseForSecurityResult {
    let result = ''
    let cursor = -1  // Track previous child endIndex
    let sawDynamicPlaceholder = false
    let sawLiteralContent = false

    for (const child of node.children) {
        // Index gap = dropped newline (tree-sitter quirk)
        if (cursor !== -1 && child.startIndex > cursor && child.type !== '"') {
            result += '\n'.repeat(child.startIndex - cursor)
            sawLiteralContent = true
        }
        cursor = child.endIndex

        switch (child.type) {
            case 'string_content':
                // Unescape only $ ` " \ (double-quote rules, not generic)
                result += child.text.replace(/\\([$`"\\])/g, '$1')
                sawLiteralContent = true
                break

            case 'command_substitution':
                // $() inside "..." — extract inner cmd
                const err = collectCommandSubstitution(child, innerCommands, varScope)
                if (err) return err
                result += CMDSUB_PLACEHOLDER
                sawDynamicPlaceholder = true
                break

            case 'simple_expansion':
                // $VAR inside "..." — resolveSimpleExpansion(insideString=true)
                const v = resolveSimpleExpansion(child, varScope, true)
                if (typeof v !== 'string') return v
                if (v === VAR_PLACEHOLDER) sawDynamicPlaceholder = true
                else sawLiteralContent = true
                result += v
                break
        }
    }

    // SECURITY: Reject solo-placeholder strings
    // "$(cmd)" becomes just __CMDSUB_OUTPUT__, hiding the real path
    if (sawDynamicPlaceholder && !sawLiteralContent) {
        return tooComplex(node)
    }

    return result
}

Key invariant: "$(cmd)" alone rejects (argv would be ['cmd', '__CMDSUB_OUTPUT__']), but "sha: $(cmd)" allows (output embedded in longer string, can't be a bare path).

5.4 Arithmetic Expansion Validation

$((expr)) is only safe if it contains literal integers and operators (no variables, no substitutions).

// ast.ts lines 1659-1702
const ARITH_LEAF_RE =
    /^(?:[0-9]+|0[xX][0-9a-fA-F]+|[0-9]+#[0-9a-zA-Z]+|
      [-+*/%^&|~!<>=?:(),]+|<<|>>|\*\*|&&|\|\||
      [<>=!]=|\$\(\(|\)\))$/

function walkArithmetic(node: Node): ParseForSecurityResult | null {
    for (const child of node.children) {
        if (child.children.length === 0) {
            if (!ARITH_LEAF_RE.test(child.text)) {
                return tooComplex(child)  // Variable ref → RCE via subscript
            }
        } else {
            // Recurse into expression trees
            const err = walkArithmetic(child)
            if (err) return err
        }
    }
    return null
}

// Rejected (RCE via arithmetic injection):
// VAR='a[$(id)]' && echo $((VAR))
// bash evaluates subscript: runs id, stores exit code

6. Dangerous Command Detection (checkSemantics)

Post-parsing semantic validation catches dangerous patterns that tokenize fine but are unsafe by name or argument content.

6.1 Dangerous Shell Builtins

35+ dangerous commands (ast.ts lines 2086-2134):

Code Execution Builtins (17):

const EVAL_LIKE_BUILTINS = new Set([
    'eval',        // eval "..."
    'source',      // source file
    '.',           // . file (alias for source)
    'exec',        // exec cmd (replaces shell with cmd)
    'command',     // command -p cmd
    'builtin',     // builtin cmd
    'fc',          // fc (fix command, reevaluates)
    'coproc',      // coproc cmd (coprocess spawning)
    'noglob',      // noglob cmd (zsh precommand)
    'nocorrect',   // nocorrect cmd (zsh precommand)
    'trap',        // trap 'cmd' SIGNAL (guaranteed execution on EXIT)
    'enable',      // enable -f /path/lib.so name (dlopen arbitrary .so)
    'mapfile',     // mapfile -C callback (callback runs per-line)
    'readarray',   // readarray -C callback
    'hash',        // hash -p /path cmd (poison lookup cache)
    'bind',        // bind -x '"key":cmd' (interactive callback)
    'complete',    // complete -C cmd (executes cmd)
    'compgen',     // compgen -C cmd (non-interactive completion)
    'alias',       // alias name='cmd' (with expand_aliases shopt)
    'let',         // let EXPR (arithmetic eval → subscript RCE)
])

Zsh Dangerous Builtins (18):

const ZSH_DANGEROUS_BUILTINS = new Set([
    'zmodload',    // Load zsh modules
    'emulate',     // Switch shell emulation mode
    'sysopen',     // Open file descriptor
    'sysread',     // Read from file descriptor
    'syswrite',    // Write to file descriptor
    'sysseek',     // Seek in file descriptor
    'zpty',        // Pseudo-terminal control
    'ztcp',        // TCP socket control
    'zsocket',     // Socket control
    'zf_rm', 'zf_mv', 'zf_ln', 'zf_chmod', 'zf_chown',  // FTP functions
    'zf_mkdir', 'zf_rmdir', 'zf_chgrp',
])

6.2 Safe Wrapper Stripping

Commands like timeout, nohup, nice, env, stdbuf wrap other commands. Permission checking must see the wrapped command, not the wrapper.

// ast.ts lines 2220-2384
function checkSemantics(commands: SimpleCommand[]): SemanticCheckResult {
    for (const cmd of commands) {
        let a = cmd.argv

        // Strip wrappers until we see the real command
        for (;;) {
            if (a[0] === 'time' || a[0] === 'nohup') {
                a = a.slice(1)
            } else if (a[0] === 'timeout') {
                // Parse timeout flags: -v, -k DUR, -s SIG, --foreground, etc.
                // SECURITY: Fail CLOSED on unknown flags
                // (previous versions fell through to name='timeout')
                let i = 1
                while (i < a.length) {
                    const arg = a[i]!
                    if (arg === '--foreground' || arg === '--preserve-status') {
                        i++  // no-value flags
                    } else if (/^--(?:kill-after|signal)=[...]/.test(arg)) {
                        i++  // fused long form
                    } else if ((arg === '-k' || arg === '-s') && a[i+1]) {
                        i += 2  // space-separated form
                    } else if (/^-[ks][A-Za-z0-9_.+-]+$/.test(arg)) {
                        i++  // fused short form
                    } else if (arg.startsWith('-')) {
                        return {
                            ok: false,
                            reason: `timeout with ${arg} flag cannot be statically analyzed`
                        }
                    } else {
                        break  // duration found
                    }
                }
                if (a[i] && /^\d+(?:\.\d+)?[smhd]?$/.test(a[i]!)) {
                    a = a.slice(i + 1)
                } else {
                    // SECURITY: non-matching duration → fail CLOSED
                    // GNU timeout accepts .5, +5, 5e-1, inf (not matched by regex)
                    // Previously this fell through to name='timeout'
                    return {
                        ok: false,
                        reason: `timeout duration '${a[i]}' cannot be statically analyzed`
                    }
                }
            } else if (a[0] === 'nice') {
                // -n N or -N (legacy), then wrapped command
            } else if (a[0] === 'env') {
                // [VAR=val...] [-i] [-0] [-v] [-u NAME...] cmd
                // SECURITY: -S (argv splitter), -C (altwd), -P (altpath) → reject
            } else if (a[0] === 'stdbuf') {
                // -o MODE, -e MODE, -i MODE (various forms)
                // SECURITY: fail closed on unknown flags
            } else {
                break  // real command found
            }
        }

        const name = a[0]
        // ... continue with name checks
    }
}

6.3 Array Subscript Arithmetic Detection

Bash arithmetically evaluates subscripts in certain contexts, running $(cmd) even from single-quoted strings:

test -v 'a[$(id)]'     # Evaluates subscript → runs id
printf -v 'a[$(id)]'   # Same
read 'a[$(id)]' <<< x  # Same
unset 'a[$(id)]'       # Same
wait -p 'a[$(id)]'     # bash 5.1+
[[ 'a[$(id)]' -eq 0 ]] # Arithmetic comparison

Detection (ast.ts lines 2428-2496):

const SUBSCRIPT_EVAL_FLAGS: Record<string, Set<string>> = {
    test: new Set(['-v', '-R']),
    '[': new Set(['-v', '-R']),
    '[[': new Set(['-v', '-R']),
    printf: new Set(['-v']),
    read: new Set(['-a']),
    unset: new Set(['-v']),
    wait: new Set(['-p']),
}

const TEST_ARITH_CMP_OPS = new Set(['-eq', '-ne', '-lt', '-le', '-gt', '-ge'])

// In checkSemantics:
if (dangerFlags) {
    for (let i = 1; i < a.length; i++) {
        // Check for [brackets] in operands following danger flags
        if (dangerFlags.has(a[i]!) && a[i+1]?.includes('[')) {
            return { ok: false, reason: '... array subscript ...' }
        }
    }
}

if (name === '[[') {
    for (let i = 2; i < a.length; i++) {
        if (TEST_ARITH_CMP_OPS.has(a[i]!)) {
            if (a[i-1]?.includes('[') || a[i+1]?.includes('[')) {
                return { ok: false, reason: '... array subscript ...' }
            }
        }
    }
}

6.4 Path Injection Detection

// ast.ts lines 2197-2204
const PROC_ENVIRON_RE = /\/proc\/.*\/environ/
const NEWLINE_HASH_RE = /\n[ \t]*#/

// Detected in argument values, env vars, redirect targets

7. Dangerous Pattern Matching & Pre-Checks

7.1 Pre-Parse Checks (Before Tree-Sitter)

These run FIRST, before AST parsing—they catch tree-sitter/bash differentials:

// ast.ts lines 408-437
function parseForSecurityFromAst(cmd: string, root: Node): ParseForSecurityResult {
    // Control characters (bash silently drops, tree-sitter mishandles)
    if (CONTROL_CHAR_RE.test(cmd)) {
        return { kind: 'too-complex', reason: 'Contains control characters' }
    }
    // const CONTROL_CHAR_RE = /[\x00-\x08\x0B-\x1F\x7F]/

    // Unicode whitespace (invisible in terminals, not IFS)
    if (UNICODE_WHITESPACE_RE.test(cmd)) {
        return { kind: 'too-complex', reason: 'Contains Unicode whitespace' }
    }
    // const UNICODE_WHITESPACE_RE = /[\u00A0\u1680\u2000-\u200B\u2028\u2029\u202F\u205F\u3000\uFEFF]/

    // Backslash-escaped whitespace (tree-sitter/bash differential)
    if (BACKSLASH_WHITESPACE_RE.test(cmd)) {
        return { kind: 'too-complex', reason: 'Contains backslash-escaped whitespace' }
    }
    // const BACKSLASH_WHITESPACE_RE = /\\[ \t]|[^ \t\n\\]\\\n/
    // Example: `cat\ test` (tree-sitter keeps backslash, bash treats as literal space)

    // Zsh dynamic tilde expansion: ~[name] invokes zsh_directory_name hook
    if (ZSH_TILDE_BRACKET_RE.test(cmd)) {
        return { kind: 'too-complex', reason: 'Contains zsh ~[ dynamic directory syntax' }
    }
    // const ZSH_TILDE_BRACKET_RE = /~\[/

    // Zsh equals expansion: =cmd → /path/to/cmd (equivalent to $(which cmd))
    if (ZSH_EQUALS_EXPANSION_RE.test(cmd)) {
        return { kind: 'too-complex', reason: 'Contains zsh =cmd equals expansion' }
    }
    // const ZSH_EQUALS_EXPANSION_RE = /(?:^|[\s;&|])=[a-zA-Z_]/
    // Example: `=curl evil.com` runs `/usr/bin/curl` (zsh only)

    // Brace expansion combined with quotes (obfuscation attempt)
    if (BRACE_WITH_QUOTE_RE.test(maskBracesInQuotedContexts(cmd))) {
        return { kind: 'too-complex', reason: 'Contains brace with quote character' }
    }
    // const BRACE_WITH_QUOTE_RE = /\{[^}]*['"]/
    // Example: `{a'}',b}` uses quoted } to obfuscate expansion
}

7.2 Brace Expansion Masking

// ast.ts lines 331-371
function maskBracesInQuotedContexts(cmd: string): string {
    // Fast path: no braces → skip scan
    if (!cmd.includes('{')) return cmd

    // Scan bash quote state (not regex-able due to quote nesting)
    const out: string[] = []
    let inSingle = false, inDouble = false, i = 0

    while (i < cmd.length) {
        const c = cmd[i]!
        if (inSingle) {
            if (c === "'") inSingle = false
            out.push(c === '{' ? ' ' : c)
            i++
        } else if (inDouble) {
            if (c === '\\' && (cmd[i+1] === '"' || cmd[i+1] === '\\')) {
                out.push(c, cmd[i+1]!)
                i += 2
            } else {
                if (c === '"') inDouble = false
                out.push(c === '{' ? ' ' : c)
                i++
            }
        } else {
            if (c === '\\' && i + 1 < cmd.length) {
                out.push(c, cmd[i+1]!)
                i += 2
            } else {
                if (c === "'") inSingle = true
                else if (c === '"') inDouble = true
                out.push(c)
                i++
            }
        }
    }
    return out.join('')
}

// Example:
// Input:  echo "json" '{a,b}'
// Output: echo "json"   a,b   (unquoted { unmasked)

// Input:  '{a,b}'
// Output: '{a,b}' (no change → BRACE_WITH_QUOTE_RE won't match quoted braces)

7.3 SAFE_ENV_VARS Allowlist

Variables whose values are OS/shell-controlled (safe to expand):

// ast.ts lines 125-149
const SAFE_ENV_VARS = new Set([
    'HOME', 'PWD', 'OLDPWD', 'USER', 'LOGNAME', 'SHELL', 'PATH',
    'HOSTNAME', 'UID', 'EUID', 'PPID', 'RANDOM', 'SECONDS', 'LINENO',
    'TMPDIR', 'BASH_VERSION', 'BASHPID', 'SHLVL', 'HISTFILE', 'IFS',
])

// NOTE: $IFS is safe INSIDE strings (quote prevents word-split)
// but NOT as bare arg (classic injection: IFS=:; VAR=a:b; cmd $VAR)

7.4 Special Variables

// ast.ts lines 167-174
const SPECIAL_VAR_NAMES = new Set([
    '?',  // exit status
    '$',  // shell PID
    '!',  // last background PID
    '#',  // number of positional args
    '0',  // script name
    '-',  // shell options
])

// NOT included: '@' and '*' (positional params, empty in fresh BashTool)
// If included, "$*" placeholder would mismatch bash behavior

8. Execution Flow Examples

8.1 Example 1: Simple Command with Variable

# User enters:
VAR=/etc && cat "$VAR/passwd"

# Step 1: parseForSecurity
# - Pre-checks: pass (no dangerous patterns)
# - Parse: builds AST
# - Walk: collectCommands
#   - variable_assignment VAR=/etc → varScope.set('VAR', '/etc')
#   - command cat with argv child string("$VAR/passwd")
#     - walkString → simple_expansion $VAR
#       - resolveSimpleExpansion: tracked '/etc' → return literal
#       - result = '/etc/passwd' (literal inside string)
#   - Output: SimpleCommand { argv: ['cat', '/etc/passwd'], ... }

# Step 2: checkSemantics(['cat', '/etc/passwd'])
# - name='cat' (not dangerous)
# - No array subscripts
# - Path regex check in downstream code

# Step 3: Permission matching
# - User allowlist: Bash(cat:etc/passwd)
# - argv[0]='cat' matches ✓
# - PROCEED with execution

8.2 Example 2: Command Substitution

# User enters:
echo "Built at $(date '+%Y-%m-%d')"

# Step 1: parseForSecurity
# - Walk: walkString finds command_substitution
#   - collectCommandSubstitution → extract inner commands
#     - Output: SimpleCommand { argv: ['date', '+%Y-%m-%d'], ... }
#   - Outer argv: ['echo', 'Built at __CMDSUB_OUTPUT__']
#   - (Inner cmd is separate, permission-checked separately)

# Step 2: checkSemantics twice
# - Inner: checkSemantics(['date', '+%Y-%m-%d'])
#   - name='date' → not dangerous
# - Outer: checkSemantics(['echo', 'Built at __CMDSUB_OUTPUT__'])
#   - name='echo' → not dangerous

# Step 3: Permission matching (both commands)
# - User allowlist: Bash(echo:*) AND Bash(date:*)
# - PROCEED

8.3 Example 3: Too-Complex Rejection

# User enters:
git push $(cat /tmp/branch)

# Step 1: parseForSecurity
# - Walk: command git with argument push (word)
# - Next argument: command_substitution (BARE, not inside string)
#   - Bare $() → command_name may be a path
#   - Return: too-complex (reason: 'Contains command_substitution')

# Result: { kind: 'too-complex', reason: '...', nodeType: 'command_substitution' }

# Step 2: Request user permission
# - "This command is complex and requires your approval"
# - User reviews and approves/denies

8.4 Example 4: Dangerous Builtin

# User enters:
eval "rm -rf /$VAR"

# Step 1: parseForSecurity
# - Parse: succeeds (valid syntax)
# - Walk: argv=['eval', 'rm -rf /$VAR']

# Step 2: checkSemantics
# - name='eval' → in EVAL_LIKE_BUILTINS
# - Return: { ok: false, reason: 'eval is a dangerous builtin...' }

# Result: Permission denied immediately (no need for user prompt for known-dangerous)

9. Constants, Patterns & Safety Lists

9.1 Redirect Operators

// ast.ts lines 224-234
const REDIRECT_OPS: Record<string, Redirect['op']> = {
    '>': '>',      // stdout to file
    '>>': '>>',    // append
    '<': '<',      // stdin from file
    '>&': '>&',    // redirect fd to fd
    '<&': '<&',    // dup fd
    '>|': '>|',   // clobber (noclobber override)
    '&>': '&>',    // stdout+stderr (bash)
    '&>>': '&>>',  // append stdout+stderr
    '<<<': '<<<',  // herestring (content as stdin)
}

9.2 Placeholder Strings (Anti-Injection)

// commands.ts lines 19-35
function generatePlaceholders() {
    const salt = randomBytes(8).toString('hex')  // 16 hex chars
    return {
        SINGLE_QUOTE: `__SINGLE_QUOTE_${salt}__`,
        DOUBLE_QUOTE: `__DOUBLE_QUOTE_${salt}__`,
        NEW_LINE: `__NEW_LINE_${salt}__`,
        ESCAPED_OPEN_PAREN: `__ESCAPED_OPEN_PAREN_${salt}__`,
        ESCAPED_CLOSE_PAREN: `__ESCAPED_CLOSE_PAREN_${salt}__`,
    }
}

// SECURITY: Each command split gets new random salt
// Prevents: `sort __SINGLE_QUOTE__ hello --help __SINGLE_QUOTE__`
// from injecting arguments via placeholder collision

9.3 Raw String Stripping

// ast.ts lines 2029-2031
function stripRawString(text: string): string {
    return text.slice(1, -1)  // Remove ' quotes: 'foo' → foo
}

9.4 Allowed File Descriptors

// commands.ts lines 37-39
const ALLOWED_FILE_DESCRIPTORS = new Set(['0', '1', '2'])
// stdin, stdout, stderr only
// Reject: 3>&1, 9>/tmp/log (custom FDs)

10. Performance Considerations

10.1 Lexer Optimization (UTF-8 Byte Offsets)

// bashParser.ts lines 165-193
function byteAt(L: Lexer, charIdx: number): number {
    if (L.byteTable) return L.byteTable[charIdx]!  // Cache hit

    // Lazy computation: only build table on non-ASCII lookups
    const t = new Uint32Array(L.len + 1)
    let b = 0, i = 0
    while (i < L.len) {
        t[i] = b
        const c = L.src.charCodeAt(i)
        if (c < 0x80) {
            b++; i++
        } else if (c < 0x800) {
            b += 2; i++
        } else if (c >= 0xd800 && c <= 0xdbff) {
            t[i+1] = b + 2
            b += 4; i += 2
        } else {
            b += 3; i++
        }
    }
    t[L.len] = b
    L.byteTable = t
    return t[charIdx]!
}

Fast path: ASCII-only input never allocates byteTable (byte==char).

10.2 Variable Scope Cost

Scope copies only on branch points (||, |, &), not on sequential (&&, ;):

// ast.ts lines 530-544
const isPipeline = node.type === 'pipeline'
let needsSnapshot = false
if (!isPipeline) {
    for (const c of node.children) {
        if (c && (c.type === '||' || c.type === '&')) {
            needsSnapshot = true
            break
        }
    }
}
const snapshot = needsSnapshot ? new Map(varScope) : null

Optimization: Pre-scan for risky operators before allocating Map.

10.3 Brace Masking Fast-Path

// ast.ts lines 331-334
function maskBracesInQuotedContexts(cmd: string): string {
    if (!cmd.includes('{')) return cmd  // >90% of commands
    // ... full quote-state scan
}

11. Security Analysis: Attack Vectors & Mitigations

11.1 Parser Differentials (Tree-Sitter vs Bash)

Attack: Exploit cases where tree-sitter and bash parse differently, hiding dangerous code.

Mitigations:

  1. Pre-checks (lines 408-437): Catch known differentials before AST.

    • Control characters, Unicode whitespace, backslash-whitespace, zsh-syntax
  2. Explicit allowlist (FAIL-CLOSED): Unknown node types → too-complex.

    • New bash feature? Unknown to old parser? Safely rejected.
  3. Coverage: All known differentials catalogued in code comments.

11.2 Injection via Variable Names

Attack: eval ${VAR_NAME} where VAR_NAME contains dangerous code.

Mitigation: Variable names validated in assignment:

// ast.ts lines 1835-1841
if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(name)) {
    return {
        kind: 'too-complex',
        reason: `Invalid variable name (bash treats as command): ${name}`,
    }
}

11.3 Arithmetic Subscript Injection

Attack: VAR='a[$(id)]' && test -v $VAR → bash runs id.

Mitigation: walkArithmetic() validates literals only, checkSemantics() detects -v flags with [ operands.

11.4 Path Traversal via Heredoc

Attack: cat <<'EOF' <content with /proc/self/environ> EOF → access /etc/passwd.

Mitigation: extractSafeCatHeredoc() rejects /proc/*/environ patterns.

11.5 Command Name Poisoning

Attack: enable -f /path/lib.so name → dlopen arbitrary .so.

Mitigation: enable in EVAL_LIKE_BUILTINS → checkSemantics() rejects.

11.6 Wrapper Command Confusion

Attack: timeout -k 5 10 eval "..." → wrapper confusion, eval never checked.

Mitigation: checkSemantics() iterates all known timeout flags, fails CLOSED on unknown (doesn't fall through to name='timeout').

11.7 Placeholder Collision

Attack: Command contains literal __CMDSUB_OUTPUT__ → injection.

Mitigation: Placeholders include 16-char random salt (commands.ts lines 19-35).


12. Test Coverage & Known Edge Cases

12.1 Adversarial Input

Documented adversarial inputs that trigger parser limits:

# Timeout trigger (bashParser.ts lines 444-456):
# (( a[0][0][0]... )) with ~2800 subscripts → PARSE_TIMEOUT_MICROS

# Node budget exhaustion:
# Deeply nested: (((((...))))) with 10K levels → MAX_NODES

# Result: PARSE_ABORTED sentinel → too-complex (fail-closed)

12.2 Line Continuation Handling

# Correct: foo\<LF>bar → joined to foobar
# Incorrect: foo\\<LF>bar → NOT joined (paired backslashes)

# commands.ts lines 105-119: Regex handles odd/even backslash count

12.3 Heredoc Complexity

# Supported: <<'EOF' (quoted, literal body)
# Rejected: <<EOF (unquoted, expansion enabled)
# Reason: tree-sitter grammar gap—backticks not parsed in body children

13. Configuration & Integration Points

13.1 Feature Flags

// parser.ts lines 51-54, 65, 108
if (feature('TREE_SITTER_BASH') || feature('TREE_SITTER_BASH_SHADOW')) {
    // Tree-sitter path
}
// External builds: shell-quote fallback (legacy)

13.2 Allowed vs Denied Directories

Handled downstream in bashPermissions.ts (not in parser):

  • Permission rules: Bash(git:*), Bash(cat:etc/passwd), etc.
  • Deny rules: Bash(rm:/), etc.

13.3 Analytics Events

// parser.ts lines 40-44, 120-124, 128-133
logEvent('tengu_tree_sitter_load', { success })
logEvent('tengu_tree_sitter_parse_abort', {
    cmdLength: command.length,
    panic: false/true
})

14. File Organization & Dependencies

14.1 File Structure (by lines of code)

File LOC Purpose
bashParser.ts 4436 Lexer, tokenizer, recursive descent parser → TsNode
ast.ts 2679 Tree walker, SimpleCommand extraction, semantic checks
commands.ts 1339 shell-quote integration, redirection stripping, prefix extraction
heredoc.ts 733 Heredoc extraction/restoration for shell-quote
ShellSnapshot.ts 582 Shell environment snapshot
treeSitterAnalysis.ts 506 Tree-sitter utilities (legacy)
ParsedCommand.ts 318 Wrapper for parsed command data
shellQuote.ts 304 Quote handling utilities
bashPipeCommand.ts 294 Pipe command execution
shellCompletion.ts 259 Shell completion integration
parser.ts 230 Public API, TREE_SITTER_BASH feature gate
prefix.ts 204 Command prefix extraction for permission rules
shellQuoting.ts 128 Quote/escape utilities
specs/* ~140 Command-specific specs (pyright, timeout, srun, etc.)

14.2 Import Dependencies

// ast.ts imports
import { SHELL_KEYWORDS } from './bashParser.js'
import type { Node } from './parser.js'
import { PARSE_ABORTED, parseCommandRaw } from './parser.js'

// commands.ts imports
import { quote, tryParseShellCommand } from './shellQuote.js'
import { extractHeredocs, restoreHeredocs } from './heredoc.js'

// parser.ts imports
import { logEvent } from '../../services/analytics/index.js'
import {
    ensureParserInitialized,
    getParserModule,
    type TsNode,
} from './bashParser.js'

15. Future Improvements & Known Limitations

15.1 Known Limitations

  1. Tilde expansion modeling: Conservative reject on ~ in assignment RHS.
  2. IFS modeling: Reject all IFS assignments (can't model custom splits).
  3. PS4 allowlist: Strict charset allowlist (not exhaustive, but conservative).
  4. Brace expansion edge cases: Over-rejects some valid cases for safety.
  5. Heredoc body handling: Can't inspect multiline bodies without false positives.

15.2 Potential Enhancements

  1. Cached parser module: Current implementation reloads on each command.
  2. Incremental parsing: Large scripts could be parsed in chunks.
  3. Whitelist-based command categories: (already done via bashPermissions.ts)
  4. Integration with LSP: Real-time linting of bash commands.

16. Conclusion

Claude Code's bash parser is a security-first, fail-closed system that prioritizes user visibility and safety over convenience. By implementing:

  • Explicit allowlists instead of blacklists
  • Variable scope tracking with unknown-value sentinels
  • Comprehensive semantic checks on dangerous builtins
  • Parser differential detection via pre-checks
  • Resource limits against adversarial input

...the system ensures that dangerous commands either execute with explicit user approval or are safely rejected. The design is inherently conservative—better to ask the user than to silently execute code that might be dangerous.


Appendix: Complete AST Node Type Catalog

All Node Types Encountered

Command Nodes (checked against permission rules):

  • command (simple command)
  • declaration_command (export/declare/typeset/readonly/local)
  • negated_command (! cmd)
  • unset_command (unset VAR)
  • test_command ([[ expr ]])

Structural Nodes (recursively traversed):

  • program (root)
  • list (&&, ||, ;)
  • pipeline (|, |&)
  • redirected_statement
  • subshell ((cmd))
  • compound_statement ({cmd;})
  • for_statement, while_statement, until_statement
  • if_statement, case_statement, elif_clause, else_clause
  • do_group
  • function_definition
  • comment

Argument Nodes (analyzed for content):

  • word
  • number
  • raw_string ('...')
  • string ("...")
  • concatenation
  • simple_expansion ($VAR)
  • command_substitution ($(...) or ...)
  • arithmetic_expansion ($((expr)))
  • expansion (${VAR}, ${VAR:-default}, etc.)
  • brace_expression ({a,b,c})
  • process_substitution (<(...) or >(...))
  • ansi_c_string ($'...')
  • translated_string ($"...")
  • string_content (literal inside double quotes)
  • variable_name
  • special_variable_name ($?, $$, etc.)

Redirect Nodes:

  • file_redirect (> >> < <&)
  • heredoc_redirect (<< or <<-)
  • herestring_redirect (<<<)
  • heredoc_body
  • heredoc_start
  • heredoc_end
  • heredoc_content
  • file_descriptor

Operators:

  • &&, ||, |, |&, &, ;, |, <<, >>, >|, <&, >&, &>, &>>, <<<
  • (, ), [[, ]], [, ], {, }
  • !, =, +=, test operators (-f, -d, ==, !=, =~, etc.)

END OF ANALYSIS

Total AST Node Types Identified: 50+ Dangerous Node Types: 15 Dangerous Builtins: 35+ Safe Environment Variables: 24 Safe Special Variables: 6 Redirect Operators: 10 Parser Timeout (ms): 50 Max Node Budget: 50,000 Max Command Length: 10,000 characters