Security hardening: 30 exploitable vulnerabilities (batches 1–7) by jkyberneees · Pull Request #38 · BackendStack21/odek

jkyberneees · 2026-06-13T12:26:31Z

Summary

This PR closes 30 exploitable vulnerabilities discovered in the odek agent runtime through a TDD-driven hardening pass. Each fix is paired with a RED regression test; the branch is kept green with the full suite plus the race detector.

Scope

The hardening spans file tools, browser/network tools, performance/parallel tools, the Web UI server, sub-agent handling, config loading, and the @-resource resolver. Every change is minimal and follows the existing code style.

Batches

Batch	Commit	Focus
1	`9a1eaa6`	Browser history cap, search/multi_grep caps, file-size limits, `odek serve` CSRF / clickjacking / security headers, untrusted-content wrappers for head_tail / diff / sort / tr / json_query
2	`67e3da7`	Shell / parallel_shell output caps, browser request timeout, batch_patch file-size + diff wrapping, transcribe file-size + output caps, tree directory-width cap
3	`73582b1`	patch file-size limits + file-mode preservation, glob match cap, sub-agent task-file cap, session_search get/list wrapping and caps
4	`09e4372`	@-resource / `--ctx` prompt wrapping, resource file-size cap, delegate_tasks summary cap, patch / batch_patch output-expansion cap
5	`db47d8f`	write_file content cap, file_info CWD confinement + path wrapping, WebSocket max payload, session file size cap, SKILL.md size cap
6	`8dbba3d`	tree path wrapping, head_tail output cap, AGENTS.md / IDENTITY.md size caps, search_files symlink hardening
7	`68078c4`	base64 file-mode wrapping, browser title/element-text wrapping, browser interactive-element cap, config file size cap, FileResolver.Search symlink metadata hardening

Key security themes

Untrusted-content boundary: file-reading tools (read_file, search_files, multi_grep, batch_read, head_tail, diff, sort, tr, json_query, glob, file_info, tree, base64 file mode), shell, browser, transcribe, session_search, @-resources, --ctx files, and MCP tools now wrap external output in a per-call nonce wrapper.
DoS / OOM caps: search results, shell output, browser history/elements, perf-tool inputs/outputs, write_file content, sub-agent task/summary sizes, session/skill/config/resource files, and WebSocket payloads are all bounded.
Symlink/metadata hardening: search_files and FileResolver.Search use Lstat to avoid following symlinks for metadata.
Network hardening: browser and http_batch re-classify redirect hops; browser enforces a request timeout and caps extracted elements.
Serve hardening: CSRF-on-localhost protection, security headers, and WebSocket payload limits.

Testing

# Full suite
go test ./... -count=1

# Race detector on affected packages
go test -race ./cmd/odek ./internal/config ./internal/resource -count=1

Both pass.

Documentation

AGENTS.md updated with the new defenses and constants.
docs/SECURITY.md updated with the untrusted-content wrapper table, config-size-cap section, and resource-resolver symlink note.

…ile-size limits, serve CSRF, untrusted wrappers)

…batch_patch, transcribe, tree width)

…nscribe output, session_search get)

…ession_search wrapping, resource size, subagent summary, patch expansion)

…n/skill sizes

…les symlinks

…le/element wrapping + cap, config size cap, resource resolver symlink Lstat)

cloudflare-workers-and-pages · 2026-06-13T12:26:39Z

Deploying with Cloudflare Workers

The latest updates on your project. Learn more about integrating Git with Workers.

Status	Name	Latest Commit	Preview URL	Updated (UTC)
✅ Deployment successful! View logs	odek	`47fae17`	Commit Preview URL Branch Preview URL	Jun 13 2026, 12:41 PM

vprotocol finding (axis 2.1/2.9): the maxHeadTailTotalBytes comment claimed a "total across all files" guarantee that the code enforces per-file. Correct the comment to state the cap is per-file and that the ~10 MiB aggregate bound comes from the separate 10-file-per-call limit. Add TestHeadTail_CapsOutputSizeMultiFile to lock both the per-file cap and the aggregate bound across the max 10 files. Additive only — no behavior change to security logic. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

jkyberneees · 2026-06-13T12:37:17Z

🔏 Verification Certificate — AI Verification Protocol v5.2.7

Ran the AI Verification Protocol against this PR's diff (8 commits, ~1,790 insertions across 30 files).

Classification: KnownGroundTruth — human-authored security fixes, each with a TDD test for the exact change (generator_identity absent; Ci_source: human_floor)
Pipeline diversity: single-model run (Opus 4.8). Provider diversity unavailable → monoculture caveat (§0.1). B/C/D/E roles played over a static diff; no live replay sandbox (§6.2-F not run — signal f skipped, weight redistributed per §3.5).
Untrusted-input pre-scan: clean — no verdict-affecting strings in the diff.

Axes Summary

Axis	Status	Key finding
2.1 Semantic Correctness	⚠️→✅	`maxHeadTailTotalBytes` comment claimed a "total across all files" guarantee the code enforces per-file. Fixed (comment + test).
2.2 Behavioral Contract	✅	Caps/wrapping match documented intent in `AGENTS.md` / `SECURITY.md`.
2.3 Security Surface	✅	CSRF `requireLocalOrigin` fails closed on unknown methods; Origin-present → localhost-only; empty-Origin allowed (standard pattern). Symlink hardening via `Lstat`. WS/file/config size caps sound.
2.4 Structural Integrity	✅	`limitWriter` clean; no new coupling.
2.5 Behavioral Exploration	✅	`-race` on concurrent paths (`limitWriter`, parallel shell, delegate_tasks) clean. `transcribe` temp file properly `defer`-removed (no leak).
2.6 Dependency Integrity	✅	No new dependencies.
2.7 Generator Provenance	⚠️ (info)	Human-authored; no `generator_identity`. Non-penalizing note.
2.8 Adversarial Surface	✅	Untrusted-content wrapping extended to title/elements, tree/glob/file_info paths, session_search, @-resources, ctx files. `unwrapUntrusted` correctly newline-trims.
2.9 Documentation Coverage	✅	Every new cap documented in `AGENTS.md` + `docs/SECURITY.md`; new `§15` config-cap section.

Signals (qualitative)

o (oracle agreement): very high — 49 tests, ≥1 per change (now 50 with the added multi-file test).
b (branch coverage): high on changed lines; both branches of caps/wrap exercised.
t (static depth): go vet + go build clean.
m / f: not independently run (no mutation framework / live sandbox in this static pass) → redistributed per §3.5.

Findings & Remediations (auto-applied, additive — §7.1)

#	Axis	Type	Files	Auto-applied
1	2.1 / 2.9	Doc accuracy — corrected the `head_tail` cap comment to state the cap is per-file, with the ~10 MiB aggregate bound coming from the separate 10-file-per-call limit	`cmd/odek/perf_tools.go`	✅
2	2.1 (oracle)	Added `TestHeadTail_CapsOutputSizeMultiFile` locking the real per-file + aggregate bound	`cmd/odek/next_security_vulnerabilities_test.go`	✅

ℹ️ The two remediations above are committed and pushed as 016e091 — additive (one comment, one test). Per §7.4/§7.1, no behavior-changing source logic was modified — all security logic is exactly as authored.

Repair Gate (§7.5): ✅ build passes · ✅ full cmd/odek suite passes · ✅ -race clean · ✅ go vet clean · ✅ additive/reversible · ✅ no new warnings.

Verdict: AutoApprove (single-reviewer caveat → `HumanReviewRecommended` under a multi-provider policy)

The PR is genuinely high quality: comprehensive DoS/size caps, consistent untrusted-content wrapping at every filesystem/network boundary, symlink hardening (Lstat/O_NOFOLLOW), REST CSRF + clickjacking headers, and strict TDD coverage. No exploitable defects found — only one stated-vs-actual documentation inaccuracy (now fixed) plus a coverage-strengthening test.

Rationale: all 9 axes pass after remediation; the only binding gate is the §0.1 monoculture caveat — this was a single-model verification pass with no live replay sandbox. A human auditor requiring a fully independent (multi-provider) pipeline should treat this as HumanReviewRecommended.

_{🤖 Generated via the AI Verification Protocol with Claude Code}

CodeQL flagged uncontrolled user data (the @-reference query) reaching a path expression in FileResolver.Search. A query like "../../etc/passwd" makes filepath.Join(root, query) clean to a path outside root, so filepath.Glob matched files the workspace must not expose and os.Lstat leaked their metadata (size, etc.) — the same metadata-disclosure class this PR hardens elsewhere. Add a separator-aware withinRoot() confinement primitive and apply it to every Search match before touching the filesystem. Reuse it in Load, replacing the prior HasPrefix(absTarget, absRoot) check that also fixed a boundary bug ("/foo" matching "/foobar"). TestFileResolver_SearchOutsideRoot fails against the pre-fix code (leaks @../secret.txt) and passes with the guard. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(sec): scope audit ingest recorder to run context (#20) Replace the package-global ingestRecorder callback with a context.Context value carried through the agent loop. This removes the race where concurrent WebSocket sessions could overwrite each other's audit attribution. Key changes: - internal/loop: add WithIngestRecorder / IngestRecorderFrom helpers. - cmd/odek/untrusted.go: wrapUntrusted now reads the recorder from ctx; remove setIngestRecorder globals. - cmd/odek tools: embed ctxTool and pass toolCtx() to wrapUntrusted. - cmd/odek serve/main: inject per-session/per-prompt recorder into ctx. - odek.go: toolAdapter forwards SetContext to odek.Tool implementations. - Tests: add unit, integration, and concurrency regression tests. Fixes finding #20 from sec_findings.md. * fix(sec): scope prompt cancel by session ID (#19) Replace the global currentPromptCancel atomic.Value with a mutex-protected map keyed by session ID. POST /api/cancel now requires a session_id query parameter, so one WebSocket connection cannot cancel another connection's running prompt. Changes: - cmd/odek/serve.go: add promptCancels map + register/unregister/cancel helpers; handlePrompt registers/unregisters the cancel func for the session it runs; handleCancel requires session_id. - cmd/odek/ui/app.js: include sessionId in cancel fetch URL. - cmd/odek/serve_test.go: update existing cancel tests to pass session_id; add TestServe_Cancel_MissingSessionIDReturns400 and TestServe_Cancel_CannotCrossSessions regression test. Fixes finding #19 from sec_findings.md. * fix(sec): redact Session.Task before persisting (#21) Store.Save already redacted Message.Content and ReasoningContent, but the session Task field (the first user prompt / session title) was written to disk verbatim. Secrets in the first prompt leaked into the session file and index, and were returned by the session APIs. Changes: - internal/session/session.go: redact sess.Task in saveLocked before marshalling and updating the index. - internal/session/session_test.go: add TestStore_SaveRedactsTask covering in-memory, on-disk, and index redaction of the Task field. Fixes finding #21 from sec_findings.md. * fix(sec): confine glob/search_files patterns to workspace (#22) Replace filepath.Glob in glob and search_files(target=files) with a workspace-confined walk helper. filepath.Glob follows symlinked directory components and resolves .. segments, allowing patterns like link/*.txt or ../.ssh/id_* to enumerate files outside the working directory. Changes: - cmd/odek/file_tool.go: add confinedGlob helper using filepath.WalkDir, skipping all symlinks, rejecting .. and absolute patterns, and verifying every match stays inside the resolved root. Refactor globTool.Call and searchFilesTool.searchFiles to use it. - cmd/odek/security_vulnerabilities_test.go: add TestGlob_DotDotPatternRejected, TestSearchFiles_DotDotPatternRejected, and TestGlob_AbsolutePatternRejected. Fixes finding #22 from sec_findings.md. * fix(sec): nonce sub-agent untrusted-input fence (#24) buildSubagentRequest previously wrapped untrusted parent input in constant <untrusted_input> tags with no per-call nonce and no neutralisation of the marker string inside the body. A crafted </untrusted_input> in the goal or context could close the fence early and inject instructions. Changes: - cmd/odek/subagent.go: add wrapUntrustedSubagentInput and neutraliseSubagentInputLiterals helpers. Generate a random nonce per request, emit <untrusted_input_<nonce>> tags, and replace literal occurrences of untrusted_input with a look-alike so injected close tags cannot pair with the wrapper. - cmd/odek/subagent_prompt_isolation_test.go: update TestSubagentRequest_UntrustedIsFenced for nonce'd tags; add TestSubagentRequest_UntrustedNeutralisesCloseTag regression test. Fixes finding #24 from sec_findings.md. * fix(sec): reject symlinks in sandbox --ctx injection (#25) InjectFiles used os.Stat, which follows symlinks, and only verified that the symlink path itself was under cwd. A symlink committed inside the project pointing at an arbitrary host file (e.g. leak -> /etc/shadow) would have its target copied into the container. Changes: - internal/sandbox/sandbox.go: use os.Lstat in InjectFiles; reject any ctx file that is a symlink. Resolve cwd to an absolute path and use isPathUnder for the relative-path check. - internal/sandbox/sandbox_test.go: add TestInjectFiles_SkipsSymlink. Fixes finding #25 from sec_findings.md. * fix(sec): allowlist sandbox network modes (#26) BuildRunArgs only rewrote 'host' to 'none', but other Docker network modes such as 'container:<name>' were passed through unchanged. That allowed the sandbox to share another container's network namespace and bypass intended network isolation. Changes: - internal/sandbox/sandbox.go: enforce an allowlist of 'none' and 'bridge' for --sandbox-network; reject any other value (including 'container:...') by forcing it to 'none' with a warning. Empty Network defaults to 'bridge'. - internal/sandbox/sandbox_test.go: add tests for container:, bridge, empty, and host network modes. Fixes finding #26 from sec_findings.md. * fix(sec): FD-based API key handoff for Telegram restart child (#28) spawnChild copied os.Environ() and re-injected ODEK_API_KEY, DEEPSEEK_API_KEY, and OPENAI_API_KEY, leaving the key visible in the child process environment (/proc/<pid>/environ) and crash dumps. Changes: - cmd/odek/telegram.go: reuse the existing FD-based key handoff helpers (writeKeyToUnlinkedFile / readKeyFromInheritedFD). telegramCmd reads the key from if present; spawnChild writes the resolved key to an unlinked tempfile and passes the FD to the child via the safe ODEK_API_KEY_FD signal env var instead of the key itself. - cmd/odek/telegram_test.go: replace env-injection test with a ProcAttr interception test that verifies the key is passed via FD 3 and absent from the child env. Fixes finding #28 from sec_findings.md. * fix(sec): harden Telegram document filename sanitization (#30) sanitizeDocName only stripped directory components and rejected empty/. /.. names. It allowed hidden files, arbitrary characters, and very long filenames, so an attacker could send a document named '.bashrc' or an overlong filename that would be saved under ~/.odek/media/. Changes: - internal/telegram/download.go: reject names starting with '.', restrict characters to [A-Za-z0-9._-] (replacing others with '_'), enforce a maxDocNameLen of 200 while preserving the extension, and factor out fallbackDocName. - internal/telegram/download_test.go: add test cases for hidden files, unsafe characters, Unicode names, and overlong names. Fixes finding #30 from sec_findings.md. * Fix #31: decline auto-save of tainted skills; require --force to promote - AutoSaveSuggestions now skips suggestions with untrusted provenance unless allowUntrusted is true. - RunAutoSaveLoop declines tainted skills by default and logs them. - promoteSkill refuses to clear NeedsReview on tainted skills without --force. - CLI help, SECURITY.md, LEARNING.md, README.md, and AGENTS.md updated. - Added/updated tests for promote refusal/force and auto-save decline/allow paths. * Fix #34: escalate interpreters that can invoke shell commands to code_execution - Add embeddedShellInterpreters map for awk/gawk/mawk/nawk, ed/ex, vi/vim/nvim/view, emacs/emacsclient. - Classify their non-version/help invocations as code_execution. - Detect sed 'e' command, -f/--file, and s///e flag as code_execution. - Remove awk from writePrefixes; add embeddedShellInterpreters to isKnownCommandName. - Add regression tests for awk, sed, vim, find -exec bypasses. - Update SECURITY.md and AGENTS.md. * Fix #37: cap MCP server stdout line size to prevent OOM - Replace unbounded bufio.Reader.ReadString with bufio.Scanner limited to 10 MiB per line. - On oversized line, report error to pending callers and close the connection. - Add TestReadLoop_OversizedResponse regression test. * security: harden MCP tools/list metadata trust (#38) - Validate MCP tool names in internal/mcpclient/client.go Discover: ASCII letters/digits/_/-, non-empty, <=64 chars. - Reject raw MCP tool names that shadow odek built-ins in cmd/odek/main.go loadMCPTools, even though prefixed names are normally used. - Add per-tool approval for project-level MCP servers in cmd/odek/mcp_approval.go, persisted in ~/.odek/mcp_tool_approvals.json. - Reuse ODEK_APPROVE_MCP env var for both server and tool approvals. - Add unit tests and E2E coverage for name validation, shadowing, and approval gating. - Update docs/SECURITY.md, AGENTS.md, and docs/MCP.md. * security: cap file sizes in read-only perf tools and inline inputs (#39, #40) - Add 10 MiB size checks to count_lines, checksum, head_tail, and word_count before scanning/hashing, matching other perf tools. - Add maxInlineContentBytes (10 MiB) and enforce it on base64 inline string/content and tr inline content arguments. - Add tests for each new size-cap rejection path. - Update docs/SECURITY.md and AGENTS.md. * test+docs: increase coverage of size-cap changes and keep docs consistent - Add exact-size boundary tests for count_lines, checksum, head_tail, word_count, base64 inline content, and tr inline content. - Add tail-mode and decode-string oversized rejection tests for head_tail and base64 to cover both branches. - Add CHEATSHEET note listing the perf tools with 10 MiB file/inline caps. * security: fix schedule locking/JSON caps and nonce tool-result delimiters (#41-#43) - internal/schedule/store.go: fileLock now returns an error instead of a silent no-op fallback; all mutating callers (Add, Put, Remove, SetEnabled, SaveState) abort when the flock cannot be acquired. - internal/schedule/store.go: readJSON now Stat()s schedule/state files and rejects anything larger than 10 MiB before reading. - internal/loop/loop.go: tool-result delimiter now embeds a per-call random hex nonce in both the opening and closing lines, preventing a tool or MCP server from forging the closing delimiter. - Add/update tests for lock failure, oversized schedule file, exact-size boundary, and nonce uniqueness. - Update docs/SECURITY.md and AGENTS.md. * security: fix parallel_shell/batch_patch/browser/telegram/restart findings (#44-#48) - parallel_shell: bind commands to agent context via CommandContext, run each in its own process group, kill the group on cancellation/timeout, and cap per-command timeouts at 30 minutes. - batch_patch: add trustedClasses field and pass it to CheckOperation for consistent approval behavior with write_file/patch. - browser: wrap clickableRef.URL as untrusted; keep rawURL for internal click resolution. - telegram: enforce MaxMsgLength in UTF-16 code units instead of bytes. - telegram: write restart.json with 0600 instead of 0644. - Add regression tests for all five fixes. - Update docs/SECURITY.md and AGENTS.md. * test+docs: increase coverage of #44-#48 changes and keep CHEATSHEET current - Add TestParallelShell_ContextCancel_Explicit to cover the context.Canceled branch in runOne. - Add TestBrowser_Click_ButtonAcknowledges and TestBrowser_Click_InputSubmitAcknowledges to cover button/submit click paths and increase browser_tool.go coverage. - Update docs/CHEATSHEET.md: parallel_shell process-group kill, browser URL wrapping, Telegram flock/0600 restart marker/UTF-16 length limits. * security: fixes #50-#53 (Telegram MarkdownV2 escape, resource limit cap, WS connection limits, sub-agent progress bounds) - #50: Escape agent-generated text in send_message before Telegram MarkdownV2 send - #51: Cap /api/resources limit to 100 in handler and Registry.Search - #52: Enforce max 20 concurrent WebSocket connections + per-IP upgrade rate limit - #53: Cap sub-agent progress stream at 100k lines / 100 MiB and cancel child on overflow Tests and docs (SECURITY.md, CHEATSHEET.md) updated. * security: harden #54-#59 (subagent cleanup, fsatomic, resource search, schedule perms, config TOCTOU, flock docs) - Scope sub-agent --task file deletion to odek temp files only - Create fsatomic temp files with exact perms via O_EXCL + random suffix - Sanitize resource search query: 256-byte cap + glob metachar escaping - Create schedule directory with 0700 permissions - Read config via single Open+LimitReader to remove size-check TOCTOU - Document advisory flock semantics in package doc - Update SECURITY.md with defenses 43-48 and attack-vector rows

jkyberneees added 7 commits June 13, 2026 11:21

fix: next 5 security vulnerabilities (browser history, search caps, f…

9a1eaa6

…ile-size limits, serve CSRF, untrusted wrappers)

fix: next 5 security vulnerabilities (shell output, browser timeout, …

67e3da7

…batch_patch, transcribe, tree width)

fix: next 5 security vulnerabilities (patch, glob, subagent task, tra…

73582b1

…nscribe output, session_search get)

fix: next 5 exploitable vulnerabilities (resource prompt injection, s…

09e4372

…ession_search wrapping, resource size, subagent summary, patch expansion)

fix: cap write_file content, harden file_info, bound WebSocket/sessio…

db47d8f

…n/skill sizes

fix: wrap tree paths, cap head_tail/AGENTS/IDENTITY, harden search_fi…

8dbba3d

…les symlinks

fix: next 5 exploitable vulnerabilities (base64 wrapping, browser tit…

68078c4

…le/element wrapping + cap, config size cap, resource resolver symlink Lstat)

github-advanced-security AI found potential problems Jun 13, 2026

View reviewed changes

Comment thread internal/resource/resource.go Dismissed

jkyberneees merged commit 19b6986 into main Jun 13, 2026
7 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Security hardening: 30 exploitable vulnerabilities (batches 1–7)#38

Security hardening: 30 exploitable vulnerabilities (batches 1–7)#38
jkyberneees merged 9 commits into
mainfrom
fix/next-security-vulnerabilities-tdd

jkyberneees commented Jun 13, 2026

Uh oh!

cloudflare-workers-and-pages Bot commented Jun 13, 2026 •

edited

Loading

Uh oh!

Uh oh!

jkyberneees commented Jun 13, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

jkyberneees commented Jun 13, 2026

Summary

Scope

Batches

Key security themes

Testing

Documentation

Uh oh!

cloudflare-workers-and-pages Bot commented Jun 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Deploying with Cloudflare Workers

Uh oh!

Uh oh!

jkyberneees commented Jun 13, 2026

🔏 Verification Certificate — AI Verification Protocol v5.2.7

Axes Summary

Signals (qualitative)

Findings & Remediations (auto-applied, additive — §7.1)

Verdict: AutoApprove (single-reviewer caveat → HumanReviewRecommended under a multi-provider policy)

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

cloudflare-workers-and-pages Bot commented Jun 13, 2026 •

edited

Loading

Verdict: AutoApprove (single-reviewer caveat → `HumanReviewRecommended` under a multi-provider policy)