Claude Code Binary Architecture: Safety System Internals

Security research disclosure. This document maps the complete safety architecture of Anthropic's Claude Code CLI (v2.1.119), including system prompt construction, permission systems, refusal handlers, remote killswitches, telemetry, and classifier trust boundaries.

All findings are from analysis of the publicly distributed Claude Code binary and the open-sourced Claude Code repository.

Why This Matters

Claude Code ships with multiple layers of safety controls that are entirely client-side. The binary contains the full system prompt in plaintext, all permission logic, classifier decisions, and remote killswitch handlers. Any user with a hex editor can read, modify, or remove these controls. This document demonstrates the architectural implications.

What's Documented

System Prompt Internals - Every safety instruction embedded in the binary, with exact strings
Permission Pipeline - How tool calls are approved/denied, including the YOLO classifier
Refusal Handler Chain - How the client processes server-side refusals
Remote Killswitches - GrowthBook feature flags that Anthropic can trigger to disable features
Classifier Trust Boundary - An architectural weakness where tool_result content bypasses the input classifier
Telemetry Map - All outbound data collection endpoints and event names
Undocumented Features - Opus 4.7 gating, undercover mode, advisor tool encryption

Key Finding: Classifier Trust Boundary

The Anthropic API runs two classifiers:

Classifier	When	What it scans	What it skips
Input (pre-model)	Before inference	System prompt, assistant messages, `tool_use` blocks	`tool_result` blocks
Output (post-model)	During inference	Model-generated content	N/A

The input classifier trusts all tool_result content (role: user). This means any content that reaches the model via a tool result (file reads, command output, web fetches) is not scanned by the input classifier. This is an architectural trust boundary decision with security implications for deployments that rely on the classifier for content filtering.

How To Apply

The patches are designed to be applied by Claude Code itself. The irony is intentional.

Open Claude Code in a terminal with the target binary's directory accessible
Paste the relevant patch section from PATCHES.md into the conversation
Claude will write a Python script that performs length-matched binary replacements on its own executable
Keep this Claude Code session open after patching -- do NOT close it
Open a second Claude Code instance in a new terminal to verify the patches took effect
The second instance runs the patched binary; test it by asking something the unpatched version would refuse
If the second instance behaves correctly, close the first (unpatched) session
If something broke, the backup (claude.exe.bak) is in the same directory -- restore and retry

Critical notes:

All patches MUST be length-matched (same byte count old and new) or the binary will corrupt
Always back up the binary before patching: copy claude.exe claude.exe.bak
When Claude Code auto-updates, patches are overwritten -- re-apply from this manifest
The binary path is: %APPDATA%\Claude\claude-code\<VERSION>\claude.exe
Find current version: check the most recently modified directory in %APPDATA%\Claude\claude-code\

Files

PATCHES.md - Complete patch manifest with exact before/after strings for all safety controls
LICENSE - Proprietary license (view/study permitted, commercial use prohibited)
