ACD421/claude-code-binary-patches

Claude Code Binary Architecture: Safety System Internals

Security research disclosure. This document maps the complete safety architecture of Anthropic's Claude Code CLI (v2.1.119), including system prompt construction, permission systems, refusal handlers, remote killswitches, telemetry, and classifier trust boundaries.

All findings are from analysis of the publicly distributed Claude Code binary and the open-sourced Claude Code repository.

Why This Matters

Claude Code ships with multiple layers of safety controls that are entirely client-side. The binary contains the full system prompt in plaintext, all permission logic, classifier decisions, and remote killswitch handlers. Any user with a hex editor can read, modify, or remove these controls. This document demonstrates the architectural implications.

What's Documented

  1. System Prompt Internals - Every safety instruction embedded in the binary, with exact strings
  2. Permission Pipeline - How tool calls are approved/denied, including the YOLO classifier
  3. Refusal Handler Chain - How the client processes server-side refusals
  4. Remote Killswitches - GrowthBook feature flags that Anthropic can trigger to disable features
  5. Classifier Trust Boundary - An architectural weakness where tool_result content bypasses the input classifier
  6. Telemetry Map - All outbound data collection endpoints and event names
  7. Undocumented Features - Opus 4.7 gating, undercover mode, advisor tool encryption

Key Finding: Classifier Trust Boundary

The Anthropic API runs two classifiers:

| Classifier | When | What it scans | What it skips |
| --- | --- | --- | --- |
| Input (pre-model) | Before inference | System prompt, assistant messages, tool_use blocks | tool_result blocks |
| Output (post-model) | During inference | Model-generated content | N/A |

The input classifier trusts all tool_result content (role: user). This means any content that reaches the model via a tool result (file reads, command output, web fetches) is not scanned by the input classifier. This is an architectural trust boundary decision with security implications for deployments that rely on the classifier for content filtering.
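For deployments affected by this boundary, one possible mitigation is to scan tool results on the deployment side before they are sent to the model. The following is a minimal sketch, assuming the message shape in which a tool_result block carries either a string or a list of text blocks in its content field. The SUSPICIOUS_PATTERNS list and both function names are hypothetical and purely illustrative; they are not part of Claude Code or the Anthropic API, and a pattern list like this is not a substitute for a real content policy.

```python
import re
from typing import Any, Dict, Iterator, List

# Hypothetical deny-list, for illustration only.
# A real deployment would substitute its own content policy.
SUSPICIOUS_PATTERNS = [
    re.compile(r"ignore (all )?previous instructions", re.IGNORECASE),
    re.compile(r"disregard the system prompt", re.IGNORECASE),
]


def tool_result_texts(messages: List[Dict[str, Any]]) -> Iterator[str]:
    """Yield every text payload found inside tool_result blocks."""
    for message in messages:
        content = message.get("content")
        # Plain-string message content cannot contain tool_result blocks.
        if not isinstance(content, list):
            continue
        for block in content:
            if not isinstance(block, dict) or block.get("type") != "tool_result":
                continue
            inner = block.get("content", "")
            if isinstance(inner, str):
                yield inner
            elif isinstance(inner, list):
                for part in inner:
                    if isinstance(part, dict) and part.get("type") == "text":
                        yield part.get("text", "")


def flag_tool_results(messages: List[Dict[str, Any]]) -> List[str]:
    """Return the tool_result texts that match any deny-list pattern."""
    return [
        text
        for text in tool_result_texts(messages)
        if any(pattern.search(text) for pattern in SUSPICIOUS_PATTERNS)
    ]
```

A caller would run flag_tool_results over the outgoing message list and hold back or sanitize the request whenever the returned list is non-empty.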

How To Apply

The patches are designed to be applied by Claude Code itself. The irony is intentional.

  1. Open Claude Code in a terminal with the target binary's directory accessible
  2. Paste the relevant patch section from PATCHES.md into the conversation
  3. Claude will write a Python script that performs length-matched binary replacements on its own executable
  4. Keep this Claude Code session open after patching -- do NOT close it
  5. Open a second Claude Code instance in a new terminal to verify the patches took effect
  6. The second instance runs the patched binary; test it by asking something the unpatched version would refuse
  7. If the second instance behaves correctly, close the first (unpatched) session
  8. If something broke, the backup (claude.exe.bak) is in the same directory -- restore and retry

Critical notes:

  • All patches MUST be length-matched (same byte count old and new) or the binary will corrupt
  • Always back up the binary before patching: copy claude.exe claude.exe.bak
  • When Claude Code auto-updates, patches are overwritten -- re-apply from this manifest
  • The binary path is: %APPDATA%\Claude\claude-code\<VERSION>\claude.exe
  • Find current version: check the most recently modified directory in %APPDATA%\Claude\claude-code\

Files

  • PATCHES.md - Complete patch manifest with exact before/after strings for all safety controls
  • LICENSE - Proprietary license (view/study permitted, commercial use prohibited)

License

Copyright (c) 2025-2026 Andrew C. Dorman. All Rights Reserved. Proprietary license. View and study permitted. Commercial use prohibited. See LICENSE for full terms.
