fix(hermes): route via content-filter proxy; close stale-token windows in proxy + rotator #53
Open
dgokeeffe wants to merge 2 commits into
Hermes is the only long-running interactive CLI in CoDA: Claude/Codex/Gemini are re-spawned per turn, so they re-read their token from disk on every call, and OpenCode already routes through the local content-filter proxy. Hermes loaded ~/.hermes/config.yaml once at startup, cached the api_key in memory, and silently 403'd on its next turn after the in-process PAT rotator swapped tokens (#issue).

setup_hermes.py now points base_url at http://127.0.0.1:4000 instead of the gateway/serving endpoint. The proxy reads ~/.databrickscfg per request (with a short cache) and injects a fresh Bearer, so Hermes' stale in-memory api_key is overridden transparently. This is the same trick OpenCode uses.

Also adds PAT_ROTATION_INTERVAL / PAT_TOKEN_LIFETIME env knobs to pat_rotator so e2e tests can compress the rotation cycle without a code change. Prod defaults (900s/600s) unchanged.

Test coverage closed too: test_cli_token_rotation.py was missing a TestUpdateHermes class and the omnibus test was literally named "four". Added the class and bumped to "five".

Verified end-to-end on daveok: minted PAT, opened hermes chat, sent prompt, waited 90s for two rotations (old tokens ELIMINATED per app logs), sent a second prompt in the same long-running Hermes process. The second turn succeeded where it would have 403'd against the unfixed code.

Co-authored-by: Isaac
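The env knobs described above could be read roughly as follows. This is a minimal sketch, not the actual pat_rotator code: the helper names `_env_seconds` and `load_rotation_settings` are hypothetical, and assigning 600s to the interval and 900s to the lifetime is an assumption inferred from the "10-min rotation" mentioned in the second commit.

```python
import os

# Assumed defaults: the PR mentions 900s/600s and a 10-minute rotation,
# so this sketch takes 600s as the interval and 900s as the lifetime.
DEFAULT_ROTATION_INTERVAL = 600
DEFAULT_TOKEN_LIFETIME = 900


def _env_seconds(name, default, env=None):
    """Read a positive integer number of seconds from env (hypothetical helper)."""
    env = os.environ if env is None else env
    raw = env.get(name)
    if raw is None or raw.strip() == "":
        return default
    try:
        value = int(raw)
    except ValueError:
        # Malformed override: fall back to the default rather than crash.
        return default
    return value if value > 0 else default


def load_rotation_settings(env=None):
    """Return (rotation_interval, token_lifetime) in seconds."""
    interval = _env_seconds("PAT_ROTATION_INTERVAL", DEFAULT_ROTATION_INTERVAL, env)
    lifetime = _env_seconds("PAT_TOKEN_LIFETIME", DEFAULT_TOKEN_LIFETIME, env)
    return interval, lifetime
```

An e2e test could then export, say, `PAT_ROTATION_INTERVAL=30` before starting the app to compress the cycle, while an unset environment keeps the defaults.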
…r skip

Two related bugs surfaced while end-to-end testing the Hermes-routing fix (GH-52):

1. **Proxy cache could serve revoked tokens for up to 30s after rotation.** content_filter_proxy._get_fresh_token cached the parsed ~/.databrickscfg value purely on read time, so a rotation that rewrote the file mid-cache was invisible until TTL expired. In prod (10-min rotation) that's a ~5% failure window per cycle. Now stats the cfg file's mtime and invalidates the cache the instant mtime advances; the TTL stays as a backstop for weird filesystem behaviour.

2. **Rotator deadlocked if an idle skip outran the token's lifetime.** The _rotation_loop skipped when session_count == 0, without checking whether the in-process token was approaching expiry. If it expired during the skip, the next attempt to mint a replacement 403'd (we'd authenticate with a dead token) and the rotator was stuck forever. Now still skips when idle but only while the token has > one rotation_interval of life left; otherwise rotates anyway to keep our own auth alive.

Tests added for both: tests/test_content_filter_proxy.py (4 cases) plus TestRotationOnNearExpiry in test_pat_rotator.py (2 cases). All 37 tests in the PAT-lifecycle group pass.

Closes #52

Co-authored-by: Isaac
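Both fixes can be illustrated with a small sketch. It is not the code in content_filter_proxy.py or pat_rotator.py: the class `MtimeTokenCache`, the function `should_rotate`, and all parameter names are hypothetical, and the `parse` callable stands in for whatever actually extracts the token from ~/.databrickscfg.

```python
import os
import time


class MtimeTokenCache:
    """Cache a parsed token until the file's mtime changes or the TTL expires."""

    def __init__(self, path, parse, ttl=30.0, clock=time.monotonic):
        self._path = path
        self._parse = parse      # callable: file text -> token (hypothetical)
        self._ttl = ttl          # backstop in case mtime is unreliable
        self._clock = clock
        self._token = None
        self._read_at = 0.0
        self._mtime_ns = None

    def get(self):
        now = self._clock()
        mtime_ns = os.stat(self._path).st_mtime_ns
        fresh = (
            self._token is not None
            and mtime_ns == self._mtime_ns          # any mtime change invalidates
            and now - self._read_at < self._ttl     # TTL still applies
        )
        if not fresh:
            with open(self._path, encoding="utf-8") as fh:
                self._token = self._parse(fh.read())
            self._mtime_ns = mtime_ns
            self._read_at = now
        return self._token


def should_rotate(session_count, token_expires_at, rotation_interval, now):
    """Skip rotation when idle, unless the token is within one interval of expiry."""
    if session_count > 0:
        return True
    remaining = token_expires_at - now
    # Idle: keep skipping only while more than one interval of life is left.
    return remaining <= rotation_interval
```

The cache costs one `os.stat` call per request, which is cheap next to re-reading and re-parsing the file. Comparing mtimes for inequality rather than "greater than" is a choice made for this sketch; it also catches a file whose mtime moves backwards.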
Summary
~/.databrickscfg per request, the same trick OpenCode already relies on.
Changes
| File | Change |
| --- | --- |
| setup_hermes.py | base_url: http://127.0.0.1:4000 instead of the gateway/serving endpoint. Diagnostic banner updated. |
| pat_rotator.py | PAT_ROTATION_INTERVAL / PAT_TOKEN_LIFETIME env knobs (prod defaults unchanged). Idle-skip now still rotates when the in-process token is within one rotation interval of expiry, which prevents deadlock. |
| content_filter_proxy.py | _get_fresh_token consults file mtime, invalidating the cache the instant the rotator rewrites ~/.databrickscfg. 30s TTL kept as a backstop. |
| cli_auth.py | _update_hermes was already in place as defence-in-depth. |
| tests/test_cli_token_rotation.py | TestUpdateHermes (3 cases); omnibus test bumped from "four" → "five". |
| tests/test_pat_rotator.py | TestRotationOnNearExpiry (2 cases). |
| tests/test_content_filter_proxy.py | |

Test plan
- … (pytest tests/test_pat_rotator.py tests/test_cli_token_rotation.py tests/test_content_filter_proxy.py).
- … hermes chat, sent prompt #1 ("hello-one"), waited 90s during which two rotations fired (per app logs: "Old token ELIMINATED"), sent prompt #2 ("hello-two") in the same long-running Hermes process. Both turns succeeded; the second would have 403'd against the unfixed code.
- … grep inside the container that ~/.hermes/config.yaml now points base_url at http://127.0.0.1:4000.

Closes #52.
This pull request and its description were written by Isaac.