Skip to content

Security hardening: 30 exploitable vulnerabilities (batches 1–7)#38

Merged
jkyberneees merged 9 commits into
mainfrom
fix/next-security-vulnerabilities-tdd
Jun 13, 2026
Merged

Security hardening: 30 exploitable vulnerabilities (batches 1–7)#38
jkyberneees merged 9 commits into
mainfrom
fix/next-security-vulnerabilities-tdd

Conversation

@jkyberneees

Copy link
Copy Markdown
Contributor

Summary

This PR closes 30 exploitable vulnerabilities discovered in the odek agent runtime through a TDD-driven hardening pass. Each fix is paired with a RED regression test; the branch is kept green with the full suite plus the race detector.

Scope

The hardening spans file tools, browser/network tools, performance/parallel tools, the Web UI server, sub-agent handling, config loading, and the @-resource resolver. Every change is minimal and follows the existing code style.

Batches

Batch Commit Focus
1 9a1eaa6 Browser history cap, search/multi_grep caps, file-size limits, odek serve CSRF / clickjacking / security headers, untrusted-content wrappers for head_tail / diff / sort / tr / json_query
2 67e3da7 Shell / parallel_shell output caps, browser request timeout, batch_patch file-size + diff wrapping, transcribe file-size + output caps, tree directory-width cap
3 73582b1 patch file-size limits + file-mode preservation, glob match cap, sub-agent task-file cap, session_search get/list wrapping and caps
4 09e4372 @-resource / --ctx prompt wrapping, resource file-size cap, delegate_tasks summary cap, patch / batch_patch output-expansion cap
5 db47d8f write_file content cap, file_info CWD confinement + path wrapping, WebSocket max payload, session file size cap, SKILL.md size cap
6 8dbba3d tree path wrapping, head_tail output cap, AGENTS.md / IDENTITY.md size caps, search_files symlink hardening
7 68078c4 base64 file-mode wrapping, browser title/element-text wrapping, browser interactive-element cap, config file size cap, FileResolver.Search symlink metadata hardening

Key security themes

  • Untrusted-content boundary: file-reading tools (read_file, search_files, multi_grep, batch_read, head_tail, diff, sort, tr, json_query, glob, file_info, tree, base64 file mode), shell, browser, transcribe, session_search, @-resources, --ctx files, and MCP tools now wrap external output in a per-call nonce wrapper.
  • DoS / OOM caps: search results, shell output, browser history/elements, perf-tool inputs/outputs, write_file content, sub-agent task/summary sizes, session/skill/config/resource files, and WebSocket payloads are all bounded.
  • Symlink/metadata hardening: search_files and FileResolver.Search use Lstat to avoid following symlinks for metadata.
  • Network hardening: browser and http_batch re-classify redirect hops; browser enforces a request timeout and caps extracted elements.
  • Serve hardening: CSRF-on-localhost protection, security headers, and WebSocket payload limits.

Testing

# Full suite
go test ./... -count=1

# Race detector on affected packages
go test -race ./cmd/odek ./internal/config ./internal/resource -count=1

Both pass.

Documentation

  • AGENTS.md updated with the new defenses and constants.
  • docs/SECURITY.md updated with the untrusted-content wrapper table, config-size-cap section, and resource-resolver symlink note.

@cloudflare-workers-and-pages

cloudflare-workers-and-pages Bot commented Jun 13, 2026

Copy link
Copy Markdown

Deploying with  Cloudflare Workers  Cloudflare Workers

The latest updates on your project. Learn more about integrating Git with Workers.

Status Name Latest Commit Preview URL Updated (UTC)
✅ Deployment successful!
View logs
odek 47fae17 Commit Preview URL

Branch Preview URL
Jun 13 2026, 12:41 PM

Comment thread internal/resource/resource.go Dismissed
vprotocol finding (axis 2.1/2.9): the maxHeadTailTotalBytes comment
claimed a "total across all files" guarantee that the code enforces
per-file. Correct the comment to state the cap is per-file and that the
~10 MiB aggregate bound comes from the separate 10-file-per-call limit.

Add TestHeadTail_CapsOutputSizeMultiFile to lock both the per-file cap
and the aggregate bound across the max 10 files. Additive only — no
behavior change to security logic.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@jkyberneees

Copy link
Copy Markdown
Contributor Author

🔏 Verification Certificate — AI Verification Protocol v5.2.7

Ran the AI Verification Protocol against this PR's diff (8 commits, ~1,790 insertions across 30 files).

Classification: KnownGroundTruth — human-authored security fixes, each with a TDD test for the exact change (generator_identity absent; Ci_source: human_floor)
Pipeline diversity: single-model run (Opus 4.8). Provider diversity unavailable → monoculture caveat (§0.1). B/C/D/E roles played over a static diff; no live replay sandbox (§6.2-F not run — signal f skipped, weight redistributed per §3.5).
Untrusted-input pre-scan: clean — no verdict-affecting strings in the diff.

Axes Summary

Axis Status Key finding
2.1 Semantic Correctness ⚠️→✅ maxHeadTailTotalBytes comment claimed a "total across all files" guarantee the code enforces per-file. Fixed (comment + test).
2.2 Behavioral Contract Caps/wrapping match documented intent in AGENTS.md / SECURITY.md.
2.3 Security Surface CSRF requireLocalOrigin fails closed on unknown methods; Origin-present → localhost-only; empty-Origin allowed (standard pattern). Symlink hardening via Lstat. WS/file/config size caps sound.
2.4 Structural Integrity limitWriter clean; no new coupling.
2.5 Behavioral Exploration -race on concurrent paths (limitWriter, parallel shell, delegate_tasks) clean. transcribe temp file properly defer-removed (no leak).
2.6 Dependency Integrity No new dependencies.
2.7 Generator Provenance ⚠️ (info) Human-authored; no generator_identity. Non-penalizing note.
2.8 Adversarial Surface Untrusted-content wrapping extended to title/elements, tree/glob/file_info paths, session_search, @-resources, ctx files. unwrapUntrusted correctly newline-trims.
2.9 Documentation Coverage Every new cap documented in AGENTS.md + docs/SECURITY.md; new §15 config-cap section.

Signals (qualitative)

  • o (oracle agreement): very high — 49 tests, ≥1 per change (now 50 with the added multi-file test).
  • b (branch coverage): high on changed lines; both branches of caps/wrap exercised.
  • t (static depth): go vet + go build clean.
  • m / f: not independently run (no mutation framework / live sandbox in this static pass) → redistributed per §3.5.

Findings & Remediations (auto-applied, additive — §7.1)

# Axis Type Files Auto-applied
1 2.1 / 2.9 Doc accuracy — corrected the head_tail cap comment to state the cap is per-file, with the ~10 MiB aggregate bound coming from the separate 10-file-per-call limit cmd/odek/perf_tools.go
2 2.1 (oracle) Added TestHeadTail_CapsOutputSizeMultiFile locking the real per-file + aggregate bound cmd/odek/next_security_vulnerabilities_test.go

ℹ️ The two remediations above are committed and pushed as 016e091 — additive (one comment, one test). Per §7.4/§7.1, no behavior-changing source logic was modified — all security logic is exactly as authored.

Repair Gate (§7.5): ✅ build passes · ✅ full cmd/odek suite passes · ✅ -race clean · ✅ go vet clean · ✅ additive/reversible · ✅ no new warnings.

Verdict: AutoApprove (single-reviewer caveat → HumanReviewRecommended under a multi-provider policy)

The PR is genuinely high quality: comprehensive DoS/size caps, consistent untrusted-content wrapping at every filesystem/network boundary, symlink hardening (Lstat/O_NOFOLLOW), REST CSRF + clickjacking headers, and strict TDD coverage. No exploitable defects found — only one stated-vs-actual documentation inaccuracy (now fixed) plus a coverage-strengthening test.

Rationale: all 9 axes pass after remediation; the only binding gate is the §0.1 monoculture caveat — this was a single-model verification pass with no live replay sandbox. A human auditor requiring a fully independent (multi-provider) pipeline should treat this as HumanReviewRecommended.

🤖 Generated via the AI Verification Protocol with Claude Code

CodeQL flagged uncontrolled user data (the @-reference query) reaching a
path expression in FileResolver.Search. A query like "../../etc/passwd"
makes filepath.Join(root, query) clean to a path outside root, so
filepath.Glob matched files the workspace must not expose and os.Lstat
leaked their metadata (size, etc.) — the same metadata-disclosure class
this PR hardens elsewhere.

Add a separator-aware withinRoot() confinement primitive and apply it to
every Search match before touching the filesystem. Reuse it in Load,
replacing the prior HasPrefix(absTarget, absRoot) check that also fixed a
boundary bug ("/foo" matching "/foobar").

TestFileResolver_SearchOutsideRoot fails against the pre-fix code
(leaks @../secret.txt) and passes with the guard.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@jkyberneees jkyberneees merged commit 19b6986 into main Jun 13, 2026
7 checks passed
jkyberneees added a commit that referenced this pull request Jun 17, 2026
* fix(sec): scope audit ingest recorder to run context (#20)

Replace the package-global ingestRecorder callback with a context.Context
value carried through the agent loop. This removes the race where
concurrent WebSocket sessions could overwrite each other's audit
attribution.

Key changes:
- internal/loop: add WithIngestRecorder / IngestRecorderFrom helpers.
- cmd/odek/untrusted.go: wrapUntrusted now reads the recorder from ctx;
  remove setIngestRecorder globals.
- cmd/odek tools: embed ctxTool and pass toolCtx() to wrapUntrusted.
- cmd/odek serve/main: inject per-session/per-prompt recorder into ctx.
- odek.go: toolAdapter forwards SetContext to odek.Tool implementations.
- Tests: add unit, integration, and concurrency regression tests.

Fixes finding #20 from sec_findings.md.

* fix(sec): scope prompt cancel by session ID (#19)

Replace the global currentPromptCancel atomic.Value with a mutex-protected
map keyed by session ID. POST /api/cancel now requires a session_id query
parameter, so one WebSocket connection cannot cancel another connection's
running prompt.

Changes:
- cmd/odek/serve.go: add promptCancels map + register/unregister/cancel helpers;
  handlePrompt registers/unregisters the cancel func for the session it runs;
  handleCancel requires session_id.
- cmd/odek/ui/app.js: include sessionId in cancel fetch URL.
- cmd/odek/serve_test.go: update existing cancel tests to pass session_id;
  add TestServe_Cancel_MissingSessionIDReturns400 and
  TestServe_Cancel_CannotCrossSessions regression test.

Fixes finding #19 from sec_findings.md.

* fix(sec): redact Session.Task before persisting (#21)

Store.Save already redacted Message.Content and ReasoningContent, but the
session Task field (the first user prompt / session title) was written to
disk verbatim. Secrets in the first prompt leaked into the session file and
index, and were returned by the session APIs.

Changes:
- internal/session/session.go: redact sess.Task in saveLocked before
  marshalling and updating the index.
- internal/session/session_test.go: add TestStore_SaveRedactsTask covering
  in-memory, on-disk, and index redaction of the Task field.

Fixes finding #21 from sec_findings.md.

* fix(sec): confine glob/search_files patterns to workspace (#22)

Replace filepath.Glob in glob and search_files(target=files) with a
workspace-confined walk helper. filepath.Glob follows symlinked directory
components and resolves .. segments, allowing patterns like link/*.txt or
../.ssh/id_* to enumerate files outside the working directory.

Changes:
- cmd/odek/file_tool.go: add confinedGlob helper using filepath.WalkDir,
  skipping all symlinks, rejecting .. and absolute patterns, and verifying
  every match stays inside the resolved root. Refactor globTool.Call and
  searchFilesTool.searchFiles to use it.
- cmd/odek/security_vulnerabilities_test.go: add
  TestGlob_DotDotPatternRejected, TestSearchFiles_DotDotPatternRejected, and
  TestGlob_AbsolutePatternRejected.

Fixes finding #22 from sec_findings.md.

* fix(sec): nonce sub-agent untrusted-input fence (#24)

buildSubagentRequest previously wrapped untrusted parent input in constant
<untrusted_input> tags with no per-call nonce and no neutralisation of the
marker string inside the body. A crafted </untrusted_input> in the goal or
context could close the fence early and inject instructions.

Changes:
- cmd/odek/subagent.go: add wrapUntrustedSubagentInput and
  neutraliseSubagentInputLiterals helpers. Generate a random nonce per
  request, emit <untrusted_input_<nonce>> tags, and replace literal
  occurrences of untrusted_input with a look-alike so injected close tags
  cannot pair with the wrapper.
- cmd/odek/subagent_prompt_isolation_test.go: update
  TestSubagentRequest_UntrustedIsFenced for nonce'd tags; add
  TestSubagentRequest_UntrustedNeutralisesCloseTag regression test.

Fixes finding #24 from sec_findings.md.

* fix(sec): reject symlinks in sandbox --ctx injection (#25)

InjectFiles used os.Stat, which follows symlinks, and only verified that
the symlink path itself was under cwd. A symlink committed inside the
project pointing at an arbitrary host file (e.g. leak -> /etc/shadow)
would have its target copied into the container.

Changes:
- internal/sandbox/sandbox.go: use os.Lstat in InjectFiles; reject any ctx
  file that is a symlink. Resolve cwd to an absolute path and use
  isPathUnder for the relative-path check.
- internal/sandbox/sandbox_test.go: add TestInjectFiles_SkipsSymlink.

Fixes finding #25 from sec_findings.md.

* fix(sec): allowlist sandbox network modes (#26)

BuildRunArgs only rewrote 'host' to 'none', but other Docker network modes
such as 'container:<name>' were passed through unchanged. That allowed the
sandbox to share another container's network namespace and bypass intended
network isolation.

Changes:
- internal/sandbox/sandbox.go: enforce an allowlist of 'none' and 'bridge'
  for --sandbox-network; reject any other value (including 'container:...')
  by forcing it to 'none' with a warning. Empty Network defaults to
  'bridge'.
- internal/sandbox/sandbox_test.go: add tests for container:, bridge,
  empty, and host network modes.

Fixes finding #26 from sec_findings.md.

* fix(sec): FD-based API key handoff for Telegram restart child (#28)

spawnChild copied os.Environ() and re-injected ODEK_API_KEY,
DEEPSEEK_API_KEY, and OPENAI_API_KEY, leaving the key visible in the child
process environment (/proc/<pid>/environ) and crash dumps.

Changes:
- cmd/odek/telegram.go: reuse the existing FD-based key handoff helpers
  (writeKeyToUnlinkedFile / readKeyFromInheritedFD). telegramCmd reads the
  key from  if present; spawnChild writes the resolved key
  to an unlinked tempfile and passes the FD to the child via the safe
  ODEK_API_KEY_FD signal env var instead of the key itself.
- cmd/odek/telegram_test.go: replace env-injection test with a ProcAttr
  interception test that verifies the key is passed via FD 3 and absent
  from the child env.

Fixes finding #28 from sec_findings.md.

* fix(sec): harden Telegram document filename sanitization (#30)

sanitizeDocName only stripped directory components and rejected empty/.
/.. names. It allowed hidden files, arbitrary characters, and very long
filenames, so an attacker could send a document named '.bashrc' or an
overlong filename that would be saved under ~/.odek/media/.

Changes:
- internal/telegram/download.go: reject names starting with '.', restrict
  characters to [A-Za-z0-9._-] (replacing others with '_'), enforce a
  maxDocNameLen of 200 while preserving the extension, and factor out
  fallbackDocName.
- internal/telegram/download_test.go: add test cases for hidden files,
  unsafe characters, Unicode names, and overlong names.

Fixes finding #30 from sec_findings.md.

* Fix #31: decline auto-save of tainted skills; require --force to promote

- AutoSaveSuggestions now skips suggestions with untrusted provenance unless allowUntrusted is true.

- RunAutoSaveLoop declines tainted skills by default and logs them.

- promoteSkill refuses to clear NeedsReview on tainted skills without --force.

- CLI help, SECURITY.md, LEARNING.md, README.md, and AGENTS.md updated.

- Added/updated tests for promote refusal/force and auto-save decline/allow paths.

* Fix #34: escalate interpreters that can invoke shell commands to code_execution

- Add embeddedShellInterpreters map for awk/gawk/mawk/nawk, ed/ex, vi/vim/nvim/view, emacs/emacsclient.

- Classify their non-version/help invocations as code_execution.

- Detect sed 'e' command, -f/--file, and s///e flag as code_execution.

- Remove awk from writePrefixes; add embeddedShellInterpreters to isKnownCommandName.

- Add regression tests for awk, sed, vim, find -exec bypasses.

- Update SECURITY.md and AGENTS.md.

* Fix #37: cap MCP server stdout line size to prevent OOM

- Replace unbounded bufio.Reader.ReadString with bufio.Scanner limited to 10 MiB per line.

- On oversized line, report error to pending callers and close the connection.

- Add TestReadLoop_OversizedResponse regression test.

* security: harden MCP tools/list metadata trust (#38)

- Validate MCP tool names in internal/mcpclient/client.go Discover:
  ASCII letters/digits/_/-, non-empty, <=64 chars.
- Reject raw MCP tool names that shadow odek built-ins in cmd/odek/main.go
  loadMCPTools, even though prefixed names are normally used.
- Add per-tool approval for project-level MCP servers in
  cmd/odek/mcp_approval.go, persisted in ~/.odek/mcp_tool_approvals.json.
- Reuse ODEK_APPROVE_MCP env var for both server and tool approvals.
- Add unit tests and E2E coverage for name validation, shadowing, and
  approval gating.
- Update docs/SECURITY.md, AGENTS.md, and docs/MCP.md.

* security: cap file sizes in read-only perf tools and inline inputs (#39, #40)

- Add 10 MiB size checks to count_lines, checksum, head_tail, and
  word_count before scanning/hashing, matching other perf tools.
- Add maxInlineContentBytes (10 MiB) and enforce it on base64 inline
  string/content and tr inline content arguments.
- Add tests for each new size-cap rejection path.
- Update docs/SECURITY.md and AGENTS.md.

* test+docs: increase coverage of size-cap changes and keep docs consistent

- Add exact-size boundary tests for count_lines, checksum, head_tail,
  word_count, base64 inline content, and tr inline content.
- Add tail-mode and decode-string oversized rejection tests for head_tail
  and base64 to cover both branches.
- Add CHEATSHEET note listing the perf tools with 10 MiB file/inline caps.

* security: fix schedule locking/JSON caps and nonce tool-result delimiters (#41-#43)

- internal/schedule/store.go: fileLock now returns an error instead of a
  silent no-op fallback; all mutating callers (Add, Put, Remove, SetEnabled,
  SaveState) abort when the flock cannot be acquired.
- internal/schedule/store.go: readJSON now Stat()s schedule/state files and
  rejects anything larger than 10 MiB before reading.
- internal/loop/loop.go: tool-result delimiter now embeds a per-call random
  hex nonce in both the opening and closing lines, preventing a tool or MCP
  server from forging the closing delimiter.
- Add/update tests for lock failure, oversized schedule file, exact-size
  boundary, and nonce uniqueness.
- Update docs/SECURITY.md and AGENTS.md.

* security: fix parallel_shell/batch_patch/browser/telegram/restart findings (#44-#48)

- parallel_shell: bind commands to agent context via CommandContext, run
  each in its own process group, kill the group on cancellation/timeout,
  and cap per-command timeouts at 30 minutes.
- batch_patch: add trustedClasses field and pass it to CheckOperation for
  consistent approval behavior with write_file/patch.
- browser: wrap clickableRef.URL as untrusted; keep rawURL for internal
  click resolution.
- telegram: enforce MaxMsgLength in UTF-16 code units instead of bytes.
- telegram: write restart.json with 0600 instead of 0644.
- Add regression tests for all five fixes.
- Update docs/SECURITY.md and AGENTS.md.

* test+docs: increase coverage of #44-#48 changes and keep CHEATSHEET current

- Add TestParallelShell_ContextCancel_Explicit to cover the
  context.Canceled branch in runOne.
- Add TestBrowser_Click_ButtonAcknowledges and
  TestBrowser_Click_InputSubmitAcknowledges to cover button/submit click
  paths and increase browser_tool.go coverage.
- Update docs/CHEATSHEET.md: parallel_shell process-group kill, browser
  URL wrapping, Telegram flock/0600 restart marker/UTF-16 length limits.

* security: fixes #50-#53 (Telegram MarkdownV2 escape, resource limit cap, WS connection limits, sub-agent progress bounds)

- #50: Escape agent-generated text in send_message before Telegram MarkdownV2 send
- #51: Cap /api/resources limit to 100 in handler and Registry.Search
- #52: Enforce max 20 concurrent WebSocket connections + per-IP upgrade rate limit
- #53: Cap sub-agent progress stream at 100k lines / 100 MiB and cancel child on overflow

Tests and docs (SECURITY.md, CHEATSHEET.md) updated.

* security: harden #54-#59 (subagent cleanup, fsatomic, resource search, schedule perms, config TOCTOU, flock docs)

- Scope sub-agent --task file deletion to odek temp files only
- Create fsatomic temp files with exact perms via O_EXCL + random suffix
- Sanitize resource search query: 256-byte cap + glob metachar escaping
- Create schedule directory with 0700 permissions
- Read config via single Open+LimitReader to remove size-check TOCTOU
- Document advisory flock semantics in package doc
- Update SECURITY.md with defenses 43-48 and attack-vector rows
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants