refactor(auth): introduce pluggable Provider chain (PR1 foundation) by naveen-kurra · Pull Request #77 · initializ/forge

naveen-kurra · 2026-05-22T22:51:58Z

Summary

PR 1 of 7 implementing the pluggable auth provider chain (design: FORGE_AUTHN_PROVIDER_PLUGIN.md).
This PR introduces the foundation — the Provider interface, ChainProvider
composition, factory registry, and migrates the existing --auth-url and
channel-loopback logic onto the new architecture. Zero externally observable
behavior change.

Where this fits in the architecture diagrams

This PR implements the highlighted boxes in the design diagrams.
Boxes marked "later" are scoped to subsequent PRs (see What's not in this PR).

Diagram 1 — Component architecture

Box	Status
`forge-core/auth` package (Provider, Middleware, ChainProvider, Registry, Identity+Context)	✅ this PR
`providers/statictoken`	✅ this PR
`providers/httpverifier`	✅ this PR
`providers/oidc` (Provider, Discovery, JWKSCache)	⏳ PR 2
`types.AuthConfig` (forge.yaml block)	⏳ PR 3
`security.AuthDomains` hook (egress merge)	⏳ PR 3
`runtime.audit` (EventAuthVerify / EventAuthFail)	⏳ PR 4
TUI wizard `step_auth.go`	⏳ PR 5
Web UI `AuthStep`	⏳ PR 6

Diagram 2 — Class diagram

Type	Status	Notes
`Provider` interface (`Name`, `Verify`)	✅
`Identity` struct	✅	Deliberately no `Valid` field — sentinel errors are the contract
`Headers` (case-insensitive)	✅
`ChainProvider`	✅	First-match wins; fail-closed on non-yield errors
`Registry` (`Register` / `Build` / `RegisteredTypes`)	✅	`database/sql` driver pattern
`HTTPVerifierProvider`	✅	wire format preserved byte-for-byte
`StaticTokenProvider`	✅	`crypto/subtle` constant-time compare
`OIDCProvider` + Discovery + JWKSCache + ClaimMap	⏳ PR 2
Sentinel errors: `ErrTokenNotForMe`, `ErrTokenRejected`, `ErrInvalidToken`, `ErrProviderNotConfigured`	✅

Diagram 3 — Startup sequence (config → chain)

Implements the runner → Registry.BuildChain → PrependChain(internal static_token) → install Middleware leg.
The AuthDomains → Egress step is scoped to PR 3.

Diagram 4 — Request verification (chain walk)

Fully implemented (chain walk, WithIdentity, response handling).
The auth_verify / auth_fail audit event emission is scoped to PR 4.

Diagram 5 — OIDC verify deep-dive

Out of scope — entirely in PR 2.

What this PR adds

New files (forge-core/auth/)

provider.go — Provider interface, Identity, Headers, HeadersFromRequest, sentinel errors
chain_provider.go — ChainProvider (first-match wins, fail-closed) + PrependChain helper (flattens nested chains)
context.go — WithIdentity / IdentityFromContext
registry.go — Register / Build / RegisteredTypes with concurrency safety
settings.go — UnmarshalSettings (yaml roundtrip so providers keep typed Config structs)
providers/httpverifier/provider.go — migrated --auth-url logic; same JSON wire format
providers/statictoken/provider.go — constant-time compare; env-var or literal token; defensive identity copy

Modified files

forge-core/auth/middleware.go — gutted from 213 → 130 lines; now delegates to a Provider chain
forge-cli/runtime/runner.go — resolveAuth now returns auth.MiddlewareOptions; new buildLegacyAuthChain translates legacy CLI flags into a chain

Tests

forge-core/auth/provider_test.go — chain ordering, fail-closed semantics, registry concurrency, headers, context
forge-core/auth/middleware_test.go — rewritten to use MiddlewareOptions API
forge-core/auth/providers/httpverifier/provider_test.go — happy path, valid:false, 401/500, network error, org-id precedence, wire-format lock, context cancellation
forge-core/auth/providers/statictoken/provider_test.go — match/mismatch, TokenEnv precedence, defensive copy, race safety, factory
forge-cli/runtime/auth_chain_test.go — 11 e2e tests covering legacy --auth-url, loopback short-circuit, skip paths, wire format, verifier-unreachable

What's NOT in this PR (deliberate)

OIDC provider — PR 2
auth: YAML block in forge.yaml — PR 3
Egress allowlist auto-extension for IdP hosts — PR 3
auth_verify / auth_fail audit events — PR 4
TUI wizard step — PR 5
Web UI wizard step — PR 6
Documentation updates — PR 7
Okta provider — Phase 3 (v0.11.0)

Backward compatibility

Existing config	Behavior after this PR
No auth configured	Anonymous access (unchanged)
`--auth-url https://x/verify`	Synthesizes `http_verifier` chain (unchanged wire format)
`FORGE_AUTH_URL=...` env	Synthesizes `http_verifier` chain (unchanged)
`FORGE_AUTH_ORG_ID=...`	Passed to `http_verifier`'s `DefaultOrg` setting
Channel adapters (Slack/Telegram)	Continue to authenticate via auto-injected `static_token` provider (loopback short-circuits before external verifier)
Existing custom verifier sidecar	Works unchanged — same `POST {url}` with `{token, org_id}` body, same response shape

Verification (all pass locally)

Test plan (manual smoke)

forge run with no auth flags → anonymous access works (curl :8000/healthz → 200)
forge run --auth-url http://localhost:9999/verify with a fake verifier accepting good-token → Bearer good-token accepted, Bearer bad returns 401
Slack/Telegram channel adapter still receives messages and authenticates loopback calls without the external verifier being hit (check verifier access logs are empty during channel events)
Verifier unreachable → request returns 401, not 500

Notes for reviewers

The middleware refactor preserves the existing public skip-paths list byte-for-byte
Identity.Valid was intentionally dropped — the package comment explains why (nil-error return is the only "valid" signal)
ChainProvider is fail-closed on non-yield errors (e.g., a network error from one provider does NOT fall through to the next). This is intentional security posture
The static_token provider returns ErrTokenNotForMe on mismatch (yield), the http_verifier returns ErrTokenRejected (deny) — this difference is what enables the loopback short-circuit in a chain [static_token, http_verifier]

Replaces the monolithic auth middleware with a Provider interface, ChainProvider composition, and a database/sql-style factory registry. Existing --auth-url logic is migrated into the new http_verifier provider; the implicit channel-loopback token becomes a static_token provider injected at the chain head. No externally observable behavior change. Backward compatible with all existing --auth-url / FORGE_AUTH_URL / FORGE_AUTH_ORG_ID flags and env vars; channel-adapter loopback continues to short-circuit before the external verifier is consulted. Why: PR2 (OIDC), PR3 (forge.yaml auth: block), and Phase 3 (Okta) plug in via Register() — no further changes to middleware or runner. Tests: - 94.3% combined coverage across forge-core/auth/... - 11 e2e tests in forge-cli/runtime/auth_chain_test.go verifying legacy --auth-url, loopback short-circuit, skip-paths, wire format - -race clean across all touched packages

Implements auth.Provider for any OIDC-compliant issuer (Auth0, Keycloak, Azure AD, Google Workspace, Ping, JumpCloud, Okta in OIDC-only mode). Security posture: - Reject alg=none and HMAC algorithms (HS256/384/512) outright - Whitelist asymmetric algs: RS256/384/512, PS256/384/512, ES256/384/512 - Algorithm derived from JWKS-declared alg, NOT token header — defends against algorithm-confusion attacks - iss exact match, aud contains check with optional azp == ClientID fallback, exp/nbf with configurable ClockSkew Performance: - Lazy discovery + JWKS load (no network on New) - Discovery cached forever (per OIDC spec, issuer is stable) - JWKS cache with sync.RWMutex hot-path; refresh on unknown kid - Stampede protection: refetch-grace window suppresses repeated JWKS fetches for the same bad kid Why: PR3 (forge.yaml auth: block) can now plug `type: oidc` directly into the auth.providers list. Phase 3 (Okta) composes oidc.Provider to add introspection + group sync without rewriting JWT verification. Tests (coverage 83.3% on oidc, 88.0% combined auth): - Happy paths: RS256, ECDSA P-256 - Security: alg=none reject, HMAC reject, alg-confusion reject, missing-kid reject, HMAC-in-JWKS reject - Standard claims: wrong iss, wrong aud, azp fallback, expired exp, not-yet-valid nbf, ClockSkew tolerance, missing exp, missing aud, multi-audience array - JWKS: lazy load, rotation triggers refresh, unknown kid after refresh, stampede prevention, malformed RSA key skipped, 500 response error - Discovery: issuer-mismatch rejected, JWKSURL override bypasses discovery - ClaimMap: defaults, custom names, X-Org-ID header override, groups as single-string tolerated - Concurrency: 50 parallel Verify calls under -race - Context: respects cancellation - Factory: registers as "oidc", forward-compat with unknown YAML keys No regressions: - All forge-core packages pass (28/28) - All forge-cli packages pass (14/14) - All forge-plugins packages pass (4/4) - Race detector clean across auth/... + cli/runtime - gofmt + go vet clean - Anti-pattern greps clean (no http.DefaultClient, no plain == on tokens, no token/secret in logs, no disallowed algs accepted)

Adds the typed AuthConfig schema, validation, and the AuthDomains egress hook. The runner now builds the provider chain via auth.Registry.BuildChain(cfg.Auth.Providers) when forge.yaml has an auth: block, falling back to the legacy --auth-url path otherwise. Schema: auth: required: true providers: - type: oidc name: corporate-sso settings: issuer: https://login.example.com audience: api://forge Precedence: --no-auth → anonymous (unchanged) forge.yaml auth.providers → Registry.BuildChain --auth-url / FORGE_AUTH_URL → legacy http_verifier nothing → loopback-only chain When both forge.yaml auth: and --auth-url are configured, the YAML block wins and a warning is logged. Loopback static_token is always prepended for channel-adapter callbacks. Egress integration: security.AuthDomains(cfg.Auth) returns issuer/verifier hosts. The runner merges these into the egress allowlist BEFORE the enforcer is constructed — without this, an OIDC issuer configured in forge.yaml would be silently blocked at JWKS-fetch time. Validation: validate.ValidateAuthConfig is called from ValidateForgeConfig. Each provider type's required settings are checked at `forge validate` time (not just at runtime construction): - oidc: issuer + audience - http_verifier: url - static_token: token or token_env (literal token warns) - unknown type: error with list of known types Side-effect import: cmd-side runner.go imports providers/oidc as `_` so auth.Build("oidc", settings) succeeds via factory registration. Tests (88.6% validate, 90.8% security, 88% combined auth): - Validation: every provider type, missing required fields, unknown type, missing type, duplicate names warning, ValidateForgeConfig integration - AuthDomains: oidc issuer, http_verifier URL, JWKS-URL override, multi-provider dedup, static_token contributes nothing, malformed URLs skipped, port stripping, unknown provider returns nothing - Runner: buildChainFromConfig (empty, single provider, ordered chain with first-match-wins, invalid type, factory error), buildLegacyHTTPVerifierChain, buildUserAuthChain precedence rules (neither / only legacy / only YAML / both prefers YAML / oidc via registry) No regressions: - All forge-core packages pass (28/28) - All forge-cli packages pass (15/15) - All forge-plugins packages pass (4/4) - Race detector clean across auth/.../validate/security/runtime - gofmt + go vet clean - Anti-pattern greps clean (no DefaultClient, no token-in-logs, no plain == on tokens, oidc uses injected client only)

golangci-lint v2.10.1 flagged removeKey as unused — it was scaffolded for a key-rotation test case that was implemented differently (by adding a new key without removing the old). Dropping the dead code keeps CI lint green.

Replaces the thin {method, path, remote_addr} payload of the legacy auth_success/auth_failure events with structured fields suitable for operators to dashboard and alert on, without leaking PII. Events: auth_verify (success) { provider, user_id, org_id, groups_count, token_kind, method, path, remote_addr } auth_fail (failure) { reason, token_kind, method, path, remote_addr } Reason codes (stable contract — changing is a breaking change for downstream consumers): missing_token no Authorization header rejected provider recognized + denied (revoked, expired, wrong iss/aud) invalid token malformed / cryptographically invalid not_for_me chain exhausted, no provider claimed the token infrastructure transient provider error (network, etc.) Privacy invariants enforced by test: - email, claims, token bytes NEVER appear in either event - test asserts the literal absence of email / SSN / secret values from the serialized audit line Interface changes: auth.MiddlewareOptions.OnAuth signature widened from func(*http.Request, bool) to func(*http.Request, *auth.Identity, error, string) so the caller can read the verified principal + chain error + token kind, instead of just a bool. Only caller (runner.go) updated. New: auth.ErrMissingBearer sentinel for the no-header case so callers can distinguish missing-token from chain errors. New: auth.TokenKind(token) returns "jwt" | "opaque" | "empty" — cheap structural classification suitable for logging. Backward compat: AuditAuthSuccess and AuditAuthFailure constants kept as deprecated aliases (point to the new "auth_verify"/"auth_fail" strings) so any consumer that grep'd for the old constants compiles but emits the new string. Scheduled for removal in v0.11.0. Tests (100% coverage on TokenKind, makeAuthAuditCallback, authFailReason): - TokenKind: jwt/opaque/empty/edge-cases - Middleware: OnAuth receives (Identity, err, kind) on every decision - Middleware: ErrMissingBearer surfaces on no-header requests - Audit: auth_verify carries provider/user_id/org_id/groups_count/kind - Audit: auth_fail carries stable reason codes for each sentinel - Audit: PII-leak test (email + secret claims never appear in payload) - Audit: CorrelationID propagates from request context - E2E: full middleware → callback → audit log path No regressions: - All forge-core packages pass (28/28) - All forge-cli packages pass (15/15) - All forge-plugins packages pass (4/4) - go test -race clean across affected packages - golangci-lint v2.10.1: 0 issues - gofmt + go vet clean

Adds the Authentication step to the `forge init` wizard so users can configure the auth chain without hand-writing forge.yaml. Same surface is exposed via non-interactive --auth* flags for CI/script use. Interactive flow (between Egress and Review steps): 🔐 Authentication ○ None — anonymous (default) ○ OIDC (JWT) — Auth0, Keycloak, Azure AD, Google, Okta-OIDC, … ○ HTTP Verifier — legacy external /verify endpoint ○ Custom — write a commented stub, edit forge.yaml manually OIDC sub-flow: Issuer → Audience → Groups Claim (defaults to "groups") HTTP Verifier sub-flow: URL → Default org_id (optional) None / Custom: pick and continue, no inputs Non-interactive flags: --auth=none|oidc|http_verifier|custom --auth-issuer=<url> (required with --auth=oidc) --auth-audience=<aud> (required with --auth=oidc) --auth-url=<url> (required with --auth=http_verifier) --auth-default-org=<org> (http_verifier, optional) --auth-groups-claim=<name> (oidc, default: groups) Scaffold output: - Adds auth: block to forge.yaml (alphabetical keys for stable diffs) - Auto-merges the issuer/verifier host into egress.allowed_domains - For custom mode: writes a commented-out template stub instead - Settings are pre-rendered as YAML in Go (init_auth.go) so nested maps like claim_map: {groups: roles} emit correctly without template helpers UX details: - "None" is the first option; one Enter to skip the entire step - Backspace on empty input returns to the picker (consistent with other wizard steps) - Live validation on issuer/URL fields: must be parseable + http(s) - The wizard's existing egress preview displays the auth-host auto-add in the final review step No runtime behavior change — PR3 already wired forge.yaml auth: into the runner. This PR is pure UX: easier to author the YAML. Tests (TUI step + scaffold helpers): - All 4 mode flows: None (default), Custom (stub), OIDC (full + custom groups claim), HTTP Verifier (with/without default org) - Issuer trailing-slash trimmed; groups claim suppressed at default - AuthEgressHosts populated correctly per mode - Apply() correctness for each mode - Summary text per mode - renderAuthBlock: empty/none/custom/oidc/http_verifier shapes, nested claim_map, deterministic key ordering, name suppression when same as type - buildAuthFromFlags: happy paths, missing required fields, unknown modes, claim-map suppression at default - mergeEgressDomains: dedupe + order preservation - End-to-end: `forge init --auth=oidc` produces a forge.yaml that passes `forge validate` - E2E negative: `--auth=oidc` without --auth-issuer fails with the expected error message No regressions: - All forge-core packages pass (28/28) - All forge-cli packages pass (15/15) - All forge-plugins packages pass (4/4) - golangci-lint v2.10.1: 0 issues - gofmt + go vet clean

Adds the matching Authentication step to the Web UI wizard (`forge ui`) so users without a terminal can configure the auth chain through the browser-based create-agent flow. Same backend contract as PR5 — both surfaces feed the single scaffold path that writes forge.yaml. Backend changes: - forge-ui/types.go: new AuthCreateOptions (mode + settings) field on AgentCreateOptions; new AuthProviderTypeMeta + AuthProviderTypes on WizardMetadata. - forge-ui/handlers_create.go: GET /api/wizard/meta now returns the four founding auth provider types so the frontend renders the picker from server-driven metadata (adding Okta in Phase 3 = one line here, no frontend change). - forge-cli/cmd/ui.go: createFunc translates opts.Auth into the initOptions fields the scaffold expects (AuthMode, AuthSettings, AuthEgressHosts). - forge-cli/cmd/init_auth.go: new authEgressHostsFromSettings helper extracts issuer/verifier hosts for the Web UI path (same hosts the TUI wizard and --auth flags compute). Frontend changes (forge-ui/static/dist/app.js): - WIZARD_STEPS extended from 9 to 10 entries (Authentication slotted between Env Vars and Review). - form.auth = {mode, settings} added to initial state. - Step renders a radio group fed by meta.auth_provider_types, with conditional OIDC / HTTP Verifier sub-forms. - canNext validation: OIDC requires issuer + audience; HTTP Verifier requires url. - buildAuthPayload reshapes the wizard's flat groups_claim into the nested claim_map: {groups: ...} shape before submission; strips empty optional fields and trims trailing slashes from issuer. - Review step adds an "A2A Auth" row with a one-line summary. UX details: - None is the default — anonymous unless the user picks otherwise. - Switching modes resets settings so stale fields from a previously selected mode don't leak into the submitted payload. - Custom mode writes a commented stub the user can edit later. Backward compat: - Requests without `auth` continue to work (anonymous access). - AuthCreateOptions.Settings uses `map[string]any` so future provider types add fields server-side without breaking older payloads. Tests: - handleGetWizardMeta now asserts all 4 auth provider types are exposed with non-empty label + description. - New TestHandleCreateAgent_WithAuthPayload: AgentCreateOptions.Auth round-trips through the JSON boundary with correct settings. - New TestHandleCreateAgent_WithoutAuthPayload: omitted auth field keeps Auth nil (no spurious anonymous-mode payload). - New authEgressHostsFromSettings tests: oidc, oidc+jwks_url, http_verifier, no-network modes, nil settings, malformed URL. - End-to-end smoke (manual): forge ui POST /api/agents with auth payload → generated forge.yaml has auth: block + login.example.com auto-added to egress.allowed_domains. No regressions: - All 4 modules pass (forge-core, forge-cli, forge-plugins, forge-ui) - go test -race clean on touched packages - golangci-lint v2.10.1: 0 issues - gofmt + go vet clean

initializ-mk · 2026-05-23T17:56:00Z

Code Review

Reviewed against the two-phase plan (Phase 1: static_token + http_verifier with --auth-url backward compat; Phase 2: OIDC + Okta). Per the decision to ship both phases together, findings are consolidated below in priority order.

Architecture and security defenses are largely sound. The items below are fixable without redesign.

Blockers — security / correctness

1. JWKS cache TTL is dead code

forge-core/auth/providers/oidc/jwks.go:4877–4881 — ttl/lastFetch are set but never read. Rotated or compromised keys never expire until process restart. The doc claims "we refresh on miss," but a key removed from the IdP's JWKS never invalidates locally. Either implement TTL-driven refresh or drop the field. High severity.

2. Issuer trailing-slash inconsistency

forge-core/auth/providers/oidc/provider.go:5276 vs forge-core/auth/providers/oidc/discovery.go:4747. newDiscovery trims / from the configured issuer, but jwt.WithIssuer and the discovery-doc compare use mixed-slash forms. Any IdP that emits iss with a trailing slash gets spurious rejections. Fix: normalize in Config.New so all comparisons use the same form.

3. Middleware silently disables auth when `Chain == nil`

forge-core/auth/middleware.go:2895 returns a no-op wrapper. A misconfigured runner that forgets to wire a chain silently runs unauthenticated with no panic. Fail-loud (panic or require explicit AnonymousChain) is safer.

4. `--no-auth` + YAML `Required: true` silently grants anonymous access

forge-cli/runtime/runner.go:1753 short-circuits before checking whether cfg.Auth.Required is set or cfg.Auth.Providers is non-empty. A user who set auth: { required: true, providers: [...] } and ran --no-auth gets anonymous access with no warning. At minimum warn; refuse outright when Required: true.

5. JWKS refresh failure does not update cooldown

forge-core/auth/providers/oidc/jwks.go:4933 — when refreshLocked errors, lastMissRefresh is not updated. Every subsequent unknown-kid request hammers the IdP again. For a flapping IdP, this is a DDoS amplifier. Update lastMissRefresh (or a separate lastFailedRefresh) on error too.

6. httpverifier classifies 5xx/network errors as `ErrInvalidToken`

forge-core/auth/providers/httpverifier/provider.go:4227, 4239. Fail-closed direction is correct, but the audit signal is misleading: operators reading audit logs see "invalid token" when the IdP is down. Consider a separate ErrProviderUnavailable.

7. `auth_domains.go` strips port via `u.Hostname()`

forge-core/security/auth_domains.go:88 — for issuers like https://login.example.com:8443, only the hostname is added to the egress allowlist. Verify the enforcer is port-agnostic, or return u.Host (host:port).

Blockers — process / maintainability

8. `forge-ui/static/dist/app.js` hand-edited under `dist/`

2756-line bundle file with no source file, no package.json, no bundler, no sourcemap in the repo. 169 lines of new auth UI live in a path that conventionally signals "generated artifact, don't edit." Either rename to static/app.js to remove the misleading dist/ directory, or introduce a real source folder + minimal esbuild step. Will only get worse as more wizard steps land.

9. Web UI does not validate auth payload server-side

forge-ui/handlers_create.go:135–160 decodes JSON, checks only name and model_provider, then hands raw opts.Auth (including arbitrary Settings map[string]any) to CreateFunc. validate.ValidateAuthConfig exists but is not invoked on this path. A buggy client could write empty issuer/audience into YAML and only fail at forge run. Cheap fix: call ValidateAuthConfig in handleCreateAgent before delegating.

10. Loopback "ALWAYS prepended" doc overstates the code

forge-cli/runtime/runner.go:1744 — the prepend is conditional on r.authToken != \"\". Currently works in practice because ResolveAuth always mints a token, but any future refactor that short-circuits that path will silently break channel adapters. Tighten doc to "whenever an internal token was minted" or unconditionally mint one before the prepend.

Smaller bugs

statictoken length leak — forge-core/auth/providers/statictoken/provider.go:6567. subtle.ConstantTimeCompare is not constant-time across differing lengths. Worth a comment or hash-then-compare.
OIDC PSS default-when-empty alg — forge-core/auth/providers/oidc/jwks.go:5028–5031. JWKS entry with no alg field defaults to RS256; legitimate PS-signed tokens mismatch the cross-check at provider.go:5348.
mergeEgressDomains breaks sort order — forge-cli/cmd/init.go:1048. deriveEgressDomains returns sorted, then auth hosts get appended unsorted at the end. Diff stability suffers.
renderAuthBlock name parameter is dead — forge-cli/cmd/init.go:1053 calls with mode twice, so name: never renders. Either drop the parameter or wire a real name source.
TUI esc mid-input aborts the whole wizard — forge-cli/internal/tui/wizard.go:131–135. Backspace-on-empty is the documented escape hatch but undiscoverable.
Identity.Claims is unfiltered shallow copy of JWT — forge-core/auth/providers/oidc/claims.go:4612. Custom claims could leak into downstream consumers. Worth a doc warning.

Test gaps

Warning-log assertion on YAML/--auth-url precedence — forge-cli/runtime/auth_config_test.go:225 uses a nop logger that discards the Warn. The "warns when both configured" half of the contract is unasserted. Capture warnings into a slice-logger.
No test for --no-auth + YAML auth precedence (Bug Tavily web search, Bubble Tea wizard, and bug fixes #4).
No test that loopback static_token is actually prepended in the full resolveAuth flow with YAML auth — only buildLegacyAuthChain is exercised in auth_chain_test.go.
No TestAuthDomains in forge-core/security/ at all. auth_domains.go has zero direct unit tests despite being the egress contract — add coverage for bare hostname, port, malformed URL, dedup.
No web-handler test for buildAuthPayload translating flat groups_claim → nested claim_map. Since app.js is hand-edited, this is risky.
No wizard test for backspace-to-picker behavior in step_auth.go.
No test for invalid URL submission on the OIDC issuer wizard step (validation lives in validateHTTPSURL, tested standalone, but not as a wizard interaction).
renderAuthBlock YAML quoting — writeYAMLMap (forge-cli/cmd/init_auth.go:194) writes key: value raw. Values containing : followed by space, #, leading !, &, * could break the YAML. Low risk but a hardening gap with no test.

What's strong (worth keeping as-is)

Alg-confusion defense is layered and correct — three independent checks: WithValidMethods at parser construction, isAllowedAlg pre-JWKS, and key.Alg != headerAlg cross-check. Alg sourced from JWKS, not token header.
HMAC and alg=none rejected at three layers including JWKS load time — even a tampered JWKS source can't inject HMAC keys.
ChainProvider fail-closed semantics — exactly what the docstring promises (chain_provider.go:2685–2691); PrependChain correctly flattens nested chains.
PII negative-substring assertion in audit test — forge-cli/runtime/auth_audit_test.go:123–154 is exactly the right shape, including key-name assertions (\"claims\", \"email\").
Single source-of-truth scaffold path — TUI, --auth flags, and Web UI all converge through initOptions.{AuthMode,AuthSettings,AuthEgressHosts}. Server-driven auth_provider_types metadata in handlers_create.go:124–129 makes adding Okta in Phase 3 a one-line server change.
Validation cleanly decoupled from provider implementations — forge-core/validate/auth.go:9–15 comment explains the trade-off well. Keeps validate import-free of crypto.
Pre-rendering YAML in Go rather than template helpers — right call for nested maps (e.g., claim_map). Alphabetical sort makes diffs stable.
Test coverage — auth/ 94.9%, httpverifier 90.9%, statictoken 96.6%, oidc 83.3%. Race-clean across the board.

Verified PR claims (all check out)

alg=none and HMAC algorithms rejected at three layers ✓
Algorithm derived from JWKS-declared alg, not token header (alg-confusion defense) ✓
iss exact match, aud contains check with azp == ClientID fallback ✓
exp/nbf with configurable ClockSkew ✓
JWKS cache with sync.RWMutex hot path ✓
Stampede protection via refetch-grace ✓ (but does not cover refresh failures — see Resolve Custom Tool Entrypoint Relative to Working Directory in Runner #5)
Constant-time compare for static_token ✓ (with caveat — see smaller bugs)
ChainProvider fail-closed on non-yield errors ✓
No http.DefaultClient, no token in logs, no plain == on tokens ✓
Backward compatibility: --auth-url, FORGE_AUTH_URL, FORGE_AUTH_ORG_ID, channel adapter loopback all preserved ✓
Deprecated AuditAuthSuccess/AuditAuthFailure aliases point to new event names ✓
ErrMissingBearer sentinel exists and distinguishes missing-header from chain errors ✓
TokenKind() returns \"jwt\" | \"opaque\" | \"empty\" without leaking token bytes ✓
Side-effect import _ \"…/providers/oidc\" in runner.go:24 ✓

Before this fix the JWKS cache's ttl/lastFetch fields were set but never read — the cache only refreshed on unknown-kid lookups. Real-world IdPs revoke keys by REMOVING them from the JWKS document, which means a stolen private key remained trusted by Forge until process restart. The "refresh-on-miss" semantics handle key ROTATION (new kid arrives) but not key REVOCATION (kid silently disappears). Reviewer flagged: forge-core/auth/providers/oidc/jwks.go — high severity. Fix: - Split refresh-time state into two fields: lastSuccessful — drives TTL accounting; only advanced on success lastAttempt — drives error backoff; advanced on every attempt This stops a failed refresh during an IdP outage from silently extending trust in already-cached keys. - Get() now triggers a refresh when (now - lastSuccessful) > ttl, on the request path. A stale cache treats every kid as a miss so the refresh actually runs. - backoffActive() suppresses retry storms during JWKS outages (refetchGrace window — 30s default). - Successful refresh clears lastMissRefresh, so a previously-unknown kid can be re-attempted immediately after a successful reload. - Injectable clock (`now func() time.Time`) so tests exercise TTL expiry without time.Sleep. Tests (all -race clean): TestJWKS_TTLExpiryForcesRefresh Same kid, second call within TTL → no refetch. Advance virtual time past TTL → refetch happens. TestJWKS_RevokedKeyEventuallyRejected Sign with key A → accept. IdP removes key A from JWKS. Within TTL → still accepted (acceptable revocation lag). Past TTL → refresh picks up new JWKS → token rejected (ErrInvalidToken). TestJWKS_FailedRefreshDoesNotExtendTTL Warm cache → fail JWKS → past TTL → refresh fails (returns error). Endpoint recovers → past refetchGrace → next call retries refresh successfully (proves lastSuccessful was NOT bumped by the failure). Misleading doc comment removed; new struct comment documents the two-timestamp design and exact refresh triggers. Backward compat: no API changes. Default TTL (1 hour) and min TTL (5 min) clamp unchanged. Phase 1 OIDC behavior preserved otherwise. Verification: go test -race ./forge-core/auth/providers/oidc/ — all green go test -count=1 forge-core/... forge-cli/... forge-plugins/... forge-ui/... — 47/47 packages pass golangci-lint v2.10.1 — 0 issues

…itializ#2) Before: newDiscovery trimmed the configured Issuer, but the discovery-doc check compared the trimmed value against the IdP's RAW emitted form, and jwt.WithIssuer received the user's RAW configured form. Result: any IdP that emits iss with a trailing slash (Auth0) was spuriously rejected against a config without slash, and vice versa. Reviewer flagged: provider.go vs discovery.go. Fix: - New Config.normalize() trims trailing slash from Issuer; called once in New() so every downstream comparison uses the same canonical form. - Discovery comparison now trims meta.Issuer before checking against the already-trimmed d.issuer. - JWT iss validation: replaced jwt.WithIssuer (strict equality, no trim option) with a manual checkIssuer that trims both the token's iss claim and the configured Issuer before comparing. Security: trimming a single trailing slash cannot collapse distinct issuers — "https://attacker.example" and "https://legit.example" remain distinct after trim. The trim is normalization, not weakening. TestIssuer_GenuineMismatchStillRejected guards this invariant. Backward compat: existing configs continue to work; new configs with trailing slashes now also work. JWT iss claim with trailing slash now also works. Tests: TestIssuer_ConfigWithoutSlash_TokenWithSlash (Auth0-shaped iss + Okta-shaped config) TestIssuer_ConfigWithSlash_TokenWithoutSlash (reverse) TestIssuer_BothWithoutSlash_StillWorks (regression — common case) TestIssuer_BothWithSlash_StillWorks (regression — Auth0 paired) TestIssuer_GenuineMismatchStillRejected (security guard) TestIssuer_TokenMissingIssClaim (no iss → ErrTokenRejected) TestDiscovery_TrailingSlashTolerated (discovery doc has slash, config doesn't) Verification: go test -race ./forge-core/auth/providers/oidc/ — all green full sweep — 47/47 packages pass golangci-lint v2.10.1 — 0 issues

…ializ#3) Reviewer flagged: forge-core/auth/middleware.go silently returned a no-op wrapper when opts.Chain == nil. A misconfigured runner that forgot to wire a chain would silently serve unauthenticated requests in production. High severity — the open-prod-endpoint failure mode is the highest-impact misconfiguration in the auth subsystem. Fix: - New MiddlewareOptions.AllowAnonymous explicit opt-in (default false). - Middleware() panics at construction when Chain == nil && !AllowAnonymous. Fails loudly at startup, not silently at the first real request. - Runner updated to set AllowAnonymous=true in the two legitimate anonymous paths: 1. --no-auth flag (operator explicit opt-in) 2. No auth: block AND no --auth-url AND no channels (legacy local dev default — preserved for backward compat) Backward compat: - Existing deployments with a configured chain — unchanged. - Deployments with --no-auth — unchanged (runner sets the flag). - Deployments that ran anonymous by accident — now panic at startup with a clear, actionable message naming the flag to set. This IS a behavior change, by design. Tests (all -race clean): TestMiddleware_NilChainPanicsWithoutAllowAnonymous Asserts the panic happens AND the message mentions AllowAnonymous so operators know how to fix it. TestMiddleware_NilChainWithAllowAnonymousPassesThrough Counterpart: explicit opt-in works. TestMiddleware_NonNilChainIgnoresAllowAnonymous AllowAnonymous is only consulted when Chain is nil; a configured chain always enforces auth. The existing PR1 "nil chain passes through" test updated to use the new AllowAnonymous flag. The E2E TestE2E_NoAuthConfigured_AnonymousAccess test driver sets AllowAnonymous=true automatically when given a nil chain. Verification: go test -race ./forge-core/auth/... ./forge-cli/runtime/ — all green full sweep — 47/47 packages pass golangci-lint v2.10.1 — 0 issues

…initializ#4) Reviewer flagged: a user who configures forge.yaml with 'auth: { required: true, providers: [...] }' and then runs the agent with --no-auth gets anonymous access silently. The flag and the YAML block were treated independently. The --no-auth happy-path short-circuit returned an anonymous middleware before consulting cfg.Auth at all. This is the second-highest-impact misconfiguration class in the auth subsystem (open prod endpoint by mistake), and it's worse than the nil-Chain case (initializ#3) because here the operator deliberately wrote 'required: true' into their config. Fix (graduated severity): auth.required == true + --no-auth → startup error (refuse) providers populated + --no-auth → warning (operator intent unclear; we proceed but make the override visible) empty auth block + --no-auth → unchanged (anonymous, no log) Error message names what's wrong AND lists the three ways to fix it: - remove --no-auth - set 'auth.required: false' - delete the 'auth:' block Warning includes provider count so dashboards can alert. Tests (all -race clean): TestResolveAuth_NoAuthWithRequiredFails auth.required=true + --no-auth → error; message contains "--no-auth conflicts with". TestResolveAuth_NoAuthWithProvidersWarns providers set, required=false + --no-auth → no error, warning emitted containing "--no-auth overrides". TestResolveAuth_NoAuthWithEmptyYAMLConfig_NoWarning Regression: --no-auth alone (no auth: block) still works without spurious warning. TestResolveAuth_NoAuthWithRequiredFalseAndProviders_WarnsNotFails Explicit Required: false is respected — warn, don't refuse. New recordLogger test helper captures Warn/Info calls so assertions can check what the runner emitted; mirrors the no-op nopLogger pattern already in use. Backward compat: - --no-auth alone (no auth: block): unchanged - --no-auth + providers + no required: warns, still anonymous - --no-auth + required: true: BREAKING — refuses to start. This is intentional; the previous behavior silently exposed a "required auth" deployment as anonymous, which the user explicitly tried to prevent in their forge.yaml. Verification: go test -race ./forge-core/auth/... ./forge-cli/runtime/ — green full sweep — 47/47 packages pass golangci-lint v2.10.1 — 0 issues

initializ-mk · 2026-05-23T18:35:58Z

Re-review after fix commits

Verified the 4 fix commits against the working tree and ran go test -race ./forge-core/auth/... ./forge-cli/runtime/ — all green.

✅ Fixed and verified

#	Item	Commit	Verdict
1	JWKS TTL dead code	`379d578`	Correct. Two-timestamp design (`lastSuccessful`/`lastAttempt`) implemented as documented. Failed refresh does NOT advance `lastSuccessful` — verified at `jwks.go:230–290`. `TestJWKS_FailedRefreshDoesNotExtendTTL` (`provider_test.go:1000`) is the strongest assertion. Injectable clock works.
2	Issuer trailing slash	`3a7a131`	Correct. `Config.normalize()` (`provider.go:113–116`) called in `New()`. Manual `checkIssuer` trims both sides. `TestIssuer_GenuineMismatchStillRejected` guards the security invariant. All 7 tests present.
3	Middleware nil-chain panic	`19f7251`	Correct. Panic at construction (`middleware.go:90–95`) with `AllowAnonymous` named in message. Runner wires the flag in both legitimate paths (`runner.go:1778–1781`, `1810–1815`). 3 tests present including the "configured chain ignores AllowAnonymous" guard.
4	`--no-auth` + YAML cross-check	`b33522d`	Correct. Three-way severity at `runner.go:1753–1782`. Error message lists all 3 remediation paths. Warning includes provider count. 4 tests + `recordLogger` helper present.
5	JWKS refresh-failure cooldown	(subsumed by #1)	Fixed as side-effect. `errBackoff` check at `jwks.go:175` suppresses unknown-kid stampedes after a failed refresh within `refetchGrace`. Note: during the backoff window, even valid-but-uncached kids are denied — deliberate trade-off, worth a release note.
—	`TestAuthDomains` unit tests	(added)	`forge-core/security/auth_domains_test.go` — 9 tests, 136 lines.

❌ Still outstanding

Blockers

httpverifier 5xx classification — forge-core/auth/providers/httpverifier/provider.go:131, 143 still wrap transport errors and non-200/non-401 responses (including 5xx) as ErrInvalidToken. No ErrProviderUnavailable sentinel introduced. Audit signal is wrong when the verifier is down.
auth_domains.go port stripping — forge-core/security/auth_domains.go:88 still calls u.Hostname(). This one needs a decision, not just a fix: TestAuthDomains_PortStripped (auth_domains_test.go:112–125) now asserts want := []string{"localhost"} for http://localhost:8080/... with message "port must be stripped" — so this is now codified as intended behavior. Please confirm: is the egress matcher port-agnostic? If yes, the test + code are correct and this can be closed. If no, an issuer on :8443 will be silently blocked at JWKS-fetch time and the fix needs to flow through both auth_domains.go and the test.
forge-ui/static/dist/app.js — still hand-edited under dist/ with no src/, no package.json, no build script, no README. The maintenance trap is unchanged.
Web UI server-side validation — forge-ui/handlers_create.go:135–175 still only checks Name/ModelProvider before delegating. validate.ValidateAuthConfig is not called on opts.Auth despite being importable. A client can post empty issuer/audience and only fail at forge run.
Loopback "ALWAYS prepended" doc — forge-cli/runtime/runner.go:1744 doc still says "ALWAYS prepended" while the code at :1792 is conditional on r.authToken != "". Either tighten the doc or make the code unconditional.

Smaller bugs

statictoken length-leak comment — forge-core/auth/providers/statictoken/provider.go:100 unchanged. One-liner comment about subtle.ConstantTimeCompare's length-mismatch fast-path.
mergeEgressDomains sort stability — forge-cli/cmd/init_egress.go:25–43 preserves insertion order, no sort.Strings on the result. forge.yaml allowed_domains stays "sorted base + appended auth hosts."
renderAuthBlock dead name parameter — forge-cli/cmd/init.go:1053 still calls renderAuthBlock(opts.AuthMode, opts.AuthMode, ...), so the name != mode branch at init_auth.go:151 is dead.

Test gaps

Warning-log assertion on YAML/--auth-url precedence — forge-cli/runtime/auth_config_test.go:226 (TestBuildUserAuthChain_BothSourcesPrefersYAML) still uses nopLogger{}. The new recordLogger helper (added at line 349 for the --no-auth cross-check tests) wasn't retrofitted onto this test. Cheap to fix.
Web-handler test for buildAuthPayload → claim_map transformation — still no Go-side test. The translation only lives in hand-edited app.js, which has no test coverage.

Summary

The 4 top blockers are cleanly resolved. The TTL fix elegantly subsumes the refresh-cooldown concern (#5). 5 blockers + 3 smaller bugs + 2 test gaps remain. None of the outstanding items are crypto/correctness in the JWT/JWKS path — they're now concentrated in the egress, audit signal, web UI surface, and maintenance areas.

Most actionable next pass: (a) decide on the port-strip question (#2), (b) add server-side validation in handlers_create.go (#4), (c) decide the dist/app.js story (#3). The rest are quick mechanical fixes.

…nitializ#5) Reviewer flagged: when refreshLocked errors, lastMissRefresh is not updated → every subsequent unknown-kid request hammers the IdP again → a flapping IdP becomes a DDoS amplifier (N callers × broken IdP = N IdP hits per request burst). The reviewer was looking at the pre-fix-initializ#1 code. Fix initializ#1 (commit 379d578) addressed this concern through a DIFFERENT mechanism: it added a lastAttempt + lastErr pair and a backoffActive() guard, so any refresh error (not just unknown-kid-after-refresh) suppresses subsequent refresh attempts inside the refetchGrace window. This commit makes the no-amplifier property explicit by: 1. Adding TestJWKS_FailedRefreshDoesNotHammerIdP — fires 50 unknown-kid requests during a backoff window and asserts the JWKS endpoint receives exactly ONE hit (the initial refresh that failed). Then advances time past refetchGrace and asserts the next request triggers EXACTLY ONE retry — and the burst after that retry stays capped at one IdP call again. 2. Adding a doc comment on backoffActive() that names the DDoS-amplifier concern explicitly and references the test that pins the contract, so future maintainers don't accidentally remove the gate. No code behavior change — fix initializ#1 already produced the right behavior; this commit makes the test coverage match. Verification: go test -race ./forge-core/auth/providers/oidc/ — green (incl. new test) full sweep — 39/39 packages pass golangci-lint v2.10.1 — 0 issues

…nitializ#6) Reviewer flagged: httpverifier classified 5xx + network errors as ErrInvalidToken. Fail-closed direction was correct, but the audit signal was misleading — operators reading audit logs saw "invalid token" alerts firing when the actual problem was the IdP being down. Different response, different runbook, distinct alert. Fix: - New sentinel auth.ErrProviderUnavailable. Same chain behavior as ErrInvalidToken (stop, fail-closed, 401 response), but distinct audit signal. - httpverifier reclassifies its error mapping: HTTP 200 + valid:true → Identity (unchanged) HTTP 200 + valid:false → ErrTokenRejected (unchanged) HTTP 401 → ErrTokenRejected (unchanged) HTTP other 4xx → ErrTokenRejected (NEW: was ErrInvalidToken) HTTP 5xx → ErrProviderUnavailable (CHANGED) Network/transport error → ErrProviderUnavailable (CHANGED) Undecodable body → ErrProviderUnavailable (CHANGED; 200-but-garbage is verifier misbehavior, not a token issue) Local marshal/build err → ErrInvalidToken (unchanged; never the verifier's fault) - middleware.classifyAuthFailure: new branch returns "auth provider unavailable" as the client-visible message. - runner.authFailReason: new reason code "provider_unavailable" in the audit event. Documented in the function's reason-code table. Audit-event contract additions: auth_fail { reason: "provider_unavailable", ... } Operators dashboarding on `reason:provider_unavailable` can now alert on IdP downtime distinctly from token issues. Tests (all -race clean): TestVerify_500_ReturnsProviderUnavailable (httpverifier) TestVerify_502BadGateway_ReturnsProviderUnavailable TestVerify_NetworkError_ReturnsProviderUnavailable TestVerify_Non401_4xx_ReturnsRejected (regression: 400/403 stay rejected) TestVerify_UndecodableBody_ReturnsProviderUnavailable TestAuthAudit_EmitsAuthFailOnError (new "provider_unavailable" row) TestClassifyAuthFailure (new "auth provider unavailable" message row) Backward compat: - Chain behavior preserved (ErrProviderUnavailable fail-closed like ErrInvalidToken). - Existing client retry logic that retried on "invalid token" responses will still retry on "auth provider unavailable" (both are 401-class). - ErrInvalidToken sentinel preserved for token-side failures elsewhere (oidc malformed JWT, bad kid, etc.) — unchanged. OIDC provider not updated in this commit. Its JWKS-fetch failures currently surface via the chain's "any other error fails closed" path, mapped to authFailReason "infrastructure". Could be tightened to ErrProviderUnavailable in a follow-up, but reviewer's specific concern was the misleading "invalid token" signal from httpverifier — OIDC already produces "infrastructure" which is at least not misleading. Verification: go test -race forge-core/auth/... forge-cli/runtime/ — green full sweep — all packages pass golangci-lint v2.10.1 — 0 issues

…iz#7) Reviewer asked: is the egress matcher port-agnostic? If yes, the current TestAuthDomains_PortStripped behavior is correct; if no, an issuer on :8443 is silently blocked at JWKS-fetch time. Investigation result: YES, the egress matcher IS port-agnostic. All three callsites strip the port off the outbound host before passing to matcher.IsAllowed: - egress_enforcer.go:40 — req.URL.Hostname() before IsAllowed - egress_proxy.go:210 — extractHost() via net.SplitHostPort - safe_dialer.go:42 — net.SplitHostPort(addr) So auth_domains.go emitting hostname-only entries is correct and consistent. No code fix needed. What WAS missing: the cross-package contract was implicit. If someone later hardens the matcher to be port-aware ("allow :443 but not :8080"), auth_domains.go would silently break — every forge.yaml that points at a non-443 IdP would suddenly fail JWKS fetches. This commit: - Adds a doc comment on hostFromURL() naming the three callsites whose port-stripping behavior we depend on. If any of them changes, auth_domains needs to follow. - Adds TestAuthDomains_AssumesPortAgnosticMatcher that asserts the contract directly: builds a matcher with the host AuthDomains produces, checks the hostname-only form is allowed, AND checks that host:port form is NOT (yet) allowed. If a future change makes the matcher port-aware, the second assertion flips and the test flags the cross-package break. No production code changes. The fix is preventative documentation + test, not behavior change. Verification: go test -race ./forge-core/security/ ./forge-core/auth/... — green full sweep — all packages pass golangci-lint v2.10.1 — 0 issues

…nitializ#8) Reviewer flagged: forge-ui/static/dist/app.js is a 110KB hand-edited ES-module file living under a path (dist/) that conventionally signals "generated build artifact, don't touch." There is no source folder, no package.json, no bundler config, no sourcemap. Every UI change has been a direct edit to the "dist" file. This will only get worse as more wizard steps land. Decision: rename out of dist/ rather than introduce a bundler. A bundler (esbuild, etc.) would add real operational debt — package.json, npm install in CI, version pinning, sourcemap rotation, build step in the Makefile, contributor onboarding cost — for files that are hand-written single-file ES modules with no JSX, no TypeScript, no imports beyond htm/preact CDN globals. The file IS the source. Removing the misleading dist/ name communicates that honestly. Changes: - git mv app.js / style.css / index.html / monaco/ up one level from forge-ui/static/dist/ → forge-ui/static/. History preserved via git rename detection. - static/embed.go: //go:embed all:dist → //go:embed app.js style.css index.html monaco. Explicit list keeps embed.go itself out of the served assets and documents the asset surface. - server.go: drop fs.Sub(static.FS, "dist") — the FS is now rooted directly at assets. Removes the unused "io/fs" import. The SPA fallback's static.FS.Open() replaces distFS.Open(). Manual smoke (against built binary): - GET / → 200 text/html (index.html) - GET /app.js → 200 text/javascript, 110469 bytes (unchanged) - GET /style.css → 200 - GET /monaco/editor.js → 200, 2317605 bytes - GET /create → 200 (SPA fallback still works) - GET /api/wizard/meta auth_provider_types length == 4 (PR6 unaffected) Future: if app.js ever does grow JSX/TS/minification needs, that's the right time to introduce a build step — into a NEW static/src/ directory producing into static/build/ or similar. Adding it preemptively today is over-engineering. Verification: go build forge-ui/... — green full sweep — 39/39 packages pass golangci-lint v2.10.1 — 0 issues manual end-to-end smoke test of all asset paths — all 200

…nitializ#9) Reviewer flagged: handleCreateAgent decoded JSON, checked only name and model_provider, then handed opts.Auth (with an arbitrary Settings map[string]any) straight to CreateFunc. The validate package's ValidateAuthConfig existed but was never called on this path — so a buggy frontend could POST an oidc payload missing issuer/audience, the agent directory would scaffold cleanly with a malformed auth: block, and the error would only surface when the user ran `forge run`. By then the directory, .env.example, and channel tokens are all on disk. Fix: - New validateAuthPayload(opts.Auth) helper in handlers_create.go. - Called from handleCreateAgent immediately after the name/provider checks, BEFORE CreateFunc runs. - Translates AuthCreateOptions → types.AuthConfig exactly the way cmd/ui.go's createFunc does, so this check sees the same shape the eventual scaffold will. No drift between "wizard accepts" and "forge validate accepts". - 400 with the validation errors joined as the response message. - Modes "" / "none" / "custom" skip validation — they don't write provider entries, nothing to validate. Validation surface (delegated to validate.ValidateAuthConfig): - oidc: issuer + audience required - http_verifier: url required - static_token: token or token_env required - unknown type: rejected with list of known types The frontend's pre-submit checks (canNext in app.js step 8) already guard the happy path, but those are defense in depth at best — anyone with curl can bypass them. This is the server-side enforcement that matters. Tests: TestHandleCreateAgent_OIDCMissingIssuerRejected 400, error names "issuer" TestHandleCreateAgent_OIDCMissingAudienceRejected 400, error names "audience" TestHandleCreateAgent_OIDCBothFieldsAccepted 201 (regression) TestHandleCreateAgent_HTTPVerifierMissingURLRejected 400, error names "url" TestHandleCreateAgent_UnknownAuthModeRejected 400, error names unknown type TestHandleCreateAgent_NoneAndCustomSkipValidation 201 for mode=none and mode=custom TestHandleCreateAgent_WithoutAuthPayload 201 with nil Auth (regression) Each failure test also asserts that CreateFunc was NOT called — the agent directory must never exist on disk for malformed auth. Verification: go test -race ./forge-ui/ — green full sweep — all packages pass golangci-lint v2.10.1 — 0 issues

…tializ#10) Reviewer flagged: resolveAuth's comment claimed the loopback static_token is "ALWAYS prepended at the chain head (regardless of source)", but the actual code is conditional on r.authToken != "". It works in practice only because ResolveAuth() always mints a token in the non-NoAuth path. Any future refactor that short-circuits that path would silently break channel-adapter callbacks — no compile error, no test failure, just "my Slack bot stopped working after we updated forge" weeks later. Fix is documentation + an invariant test. The conditional itself is correct (--no-auth path returns early before the prepend, and that path also doesn't mint a token), so the code stays as-is. What was missing is making the upstream dependency explicit so a future maintainer either preserves the invariant or notices when they don't. Changes: - resolveAuth doc: replaced "ALWAYS prepended" with an explicit description of the precondition (ResolveAuth() must have minted a token) and a reference to the pinning test. Inline comment at the prepend site explains why the conditional is defensive, not load-bearing. - ResolveAuth() doc: states the invariant it maintains (after nil return, either r.authToken != "" or r.cfg.NoAuth is true) and names the downstream consumer + test. - New tests: TestResolveAuth_InvariantMintsTokenInNonNoAuthPath Direct invariant pin: non-NoAuth + ResolveAuth() returns nil → r.authToken MUST be non-empty. Failure message tells future maintainers what they broke and where to look. TestResolveAuth_NoAuthPathDoesNotMintToken Counterpart: --no-auth path does NOT mint. Pins the "anonymous mode skips token generation" property too, so neither direction can drift. If someone ever refactors ResolveAuth to skip minting in the non-NoAuth path, the invariant test fails immediately — and the failure message points straight at the loopback-prepend dependency. Verification: go test -race ./forge-cli/runtime/ — green (incl. 2 new tests) full sweep — all packages pass golangci-lint v2.10.1 — 0 issues

Six related cleanups bundled per the reviewer's grouping: 11a. statictoken length leak (provider.go) subtle.ConstantTimeCompare returns 0 instantly on length mismatch without inspecting bytes, leaking the configured token's length via measurable timing differences across many trials. Fix: SHA-256 hash both sides and constant-time compare the 32-byte digests. The comparison is now over equal-length inputs in every case. Pinned by TestVerify_MismatchedLengths_AreRejected — exercises empty, prefix, suffix, off-by-one, and same-length-but-different inputs. 11b. OIDC PSS-default mismatch (jwks.go, provider.go) parseJWK was defaulting an RSA JWK with no `alg` field to "RS256". The cross-check at provider.go then rejected any PS256/PS384/PS512 token signed by the same RSA key — RFC 7518 says "infer from context" when alg is absent, and the same RSA key signs both RS* and PS* families. Fix: preserve empty alg through parseJWK. In the cross-check, when the JWK's alg is empty, fall back to trusting the token's header alg — which the asymmetric whitelist already validated upstream. When the JWK declares an alg explicitly, the strict comparison still applies (algorithm-confusion defense intact). Pinned by TestVerify_JWKWithoutAlgAcceptsAnyAsymmetric and TestVerify_JWKExplicitAlgStillStrict. 11c. mergeEgressDomains sort order (init_egress.go) deriveEgressDomains returned sorted output; mergeEgressDomains appended AuthEgressHosts unsorted at the tail. When AuthEgressHosts came from a map iteration, the rendered forge.yaml's egress.allowed_domains section diffed across runs even when the underlying input set was unchanged. Fix: sort the merged result. Generated forge.yaml is now stable. Pinned by TestMergeEgressDomains_SortedOutput. 11d. renderAuthBlock dead `name` parameter (init_auth.go, init.go) Caller passed opts.AuthMode for both the mode and name arguments; the name was suppressed by the equality check inside the function. The wizard never captures an explicit provider name today. Fix: dropped the parameter. If a future wizard step adds name capture, reintroduce it then. Existing tests updated; renamed TestRenderAuthBlock_NameSameAsModeSuppressed → TestRenderAuthBlock_NoNameEverEmitted with a doc-pinning intent. 11e. TUI backspace-on-empty undiscoverable (step_auth.go) Backspace-on-empty was the documented escape hatch out of an active sub-step input, but no visible affordance surfaced it. Fix: inline kbd hint under the input in each sub-step phase — "Backspace at empty back to picker". Doesn't change behavior, makes it discoverable. 11f. Identity.Claims unfiltered shallow copy (claims.go, provider.go) The OIDC provider's copyClaims returns the full JWT claim set on Identity.Claims, including custom claims the issuer adds (PII, raw group memberships, internal IDs). Downstream consumers reading Identity.Claims will see all of it. Fix is documentation for now: WARNING on both the Identity.Claims field and the copyClaims helper, naming the risk and pointing future consumers at typed Identity fields (UserID/Email/OrgID/Groups) as the portable surface. Filtering is properly a future authz-layer concern — not something this package can do without policy input. Verification: go test -race forge-core/... forge-cli/... — green full sweep — all packages pass golangci-lint v2.10.1 — 0 issues

Comment alignment in the candidate-tokens slice tripped CI's 'gofmt -l forge-core forge-cli forge-plugins' check. Mechanical fix — gofmt re-aligns trailing comments to the same column. CI Test job was failing here: Test → Check formatting → 'test -z $(gofmt -l ...)' exits 1 on the new TestVerify_MismatchedLengths_AreRejected block I added in b3d4264 (review #11a). Local gofmt -l also reports it now; pushed.

…ializ#12) Six coverage gaps and one hardening item from the reviewer's eight-point batch. Two items in the original list are already covered: 12.2 (--no-auth + YAML auth precedence) is exercised by the four TestResolveAuth_NoAuth* tests added in review initializ#4 commit b33522d. 12.4 (TestAuthDomains in forge-core/security) is exercised by the eight TestAuthDomains_* tests in security/auth_domains_test.go (added in PR3 + strengthened in review initializ#7 commit cf4ae14). The six real gaps: 12.1 Warning capture for --auth-url + YAML auth precedence TestBuildUserAuthChain_BothSourcesPrefersYAML was asserting the chain-behavior half of the contract through a nop logger that swallowed the Warn — silent regression on the "we logged a warning" half. Switched to recordLogger and added an assertion on "--auth-url and forge.yaml". If a refactor ever drops that warning, the test fails. 12.3 Full resolveAuth + YAML loopback head-of-chain test auth_chain_test.go exercised buildLegacyAuthChain. Nothing exercised resolveAuth() end-to-end with a YAML auth: block AND a minted internal token, then asserted "loopback is at chain head, ahead of YAML provider." Added TestResolveAuth_YAMLAuth_LoopbackPrependedAtChainHead that inspects chain.Providers() ordering directly and smoke-tests the loopback token verifies with the "internal" source. 12.5 Backend contract test for the frontend's claim_map translation app.js's buildAuthPayload translates flat groups_claim → nested claim_map.groups before submitting. The backend doesn't run that translation; if the frontend ever regresses, downstream YAML would silently drop the custom claim. Added two tests in the UI handler: - TestHandleCreateAgent_NestedClaimMapShape pins what the backend EXPECTS post-translation (claim_map.groups present). - TestHandleCreateAgent_FlatGroupsClaim_PreservedAsExtraField documents the failure mode if the JS regresses — the request succeeds structurally but claim_map is missing. Real JS tests would require a Node toolchain we deliberately don't add (see PR6 / review initializ#8). These Go tests are the next-best contract. 12.6 Backspace-to-picker behavior in step_auth.go Added TestAuthStep_BackspaceOnEmptyInputReturnsToPicker and a counterpart TestAuthStep_BackspaceWithContentDoesNotReturnToPicker to pin the escape-hatch behavior the review #11e hint advertises. 12.7 Invalid issuer URL blocks advance validateHTTPSURL was unit-tested standalone; never exercised as a wizard interaction. Added TestAuthStep_InvalidIssuerURLBlocksAdvance: bad URL + Enter → still on issuer phase; type a valid replacement + Enter → advances to audience. 12.8 renderAuthBlock YAML quoting hardening writeYAMLMap previously emitted "key: value" raw. Values containing ": " sequences, "#" markers, leading YAML-significant chars (!, *, -, [, {, etc.), reserved tokens (true/false/null/yes/no…), or control chars could break the parser. Added: - yamlScalar/needsYAMLQuoting helpers that emit double-quoted + escaped strings only when the input would otherwise change meaning. Plain values pass through unchanged so generated forge.yaml remains readable. - TestRenderAuthBlock_QuotesUnsafeValues (19 sub-cases) pinning the predicate's classification per input shape. - TestRenderAuthBlock_QuotedValueIsParseable round-trips a value containing ": ", "#", "!", and quotes through the renderer → yaml.v3 unmarshaler, asserting the rendered line is valid YAML AND decodes back to the original string. Note: the renderer remains a focused subset (string / number / bool / map). If auth-settings ever grow richer types, switch to yaml.v3 for the whole block. Verification: test -z "$(gofmt -l forge-core forge-cli forge-plugins)" — clean go test -race ./forge-core/... ./forge-cli/... — all green golangci-lint v2.10.1 — 0 issues

initializ-mk · 2026-05-23T20:48:22Z

Round-3 review — reviews #5–#12

Verified all 8 fix commits against the working tree. go test -race on all affected packages (auth, httpverifier, oidc, statictoken, runtime, cmd, ui, security, validate) — all green.

✅ All items resolved

#	Item	Verdict
5	JWKS no-hammering proof	Verified. `TestJWKS_FailedRefreshDoesNotHammerIdP` (`provider_test.go:1049`) uses `atomic.Int32` to count `/jwks` hits, fires 50 unknown-kid requests during backoff → asserts exactly 1 hit; advances `35s` past `refetchGrace` → asserts 2; another burst of 50 → stays at 2. Real counters, not log greps.
6	`ErrProviderUnavailable` sentinel	Verified end-to-end: sentinel at `provider.go:146`; httpverifier mapping (5xx, network err, undecodable body → ErrProviderUnavailable; 401 / non-401 4xx → ErrTokenRejected) at `httpverifier/provider.go:120–171`; chain semantics via the generic non-yield-error path (`chain_provider.go:43–56`); user-facing message in `classifyAuthFailure` (`middleware.go:153–157`); stable reason code `"provider_unavailable"` in `runner.go:1924–1940`. Four targeted tests in httpverifier.
7	Port-stripping contract	Verified. `auth_domains.go:81–95` documents the three matcher callsites the contract depends on. New `TestAuthDomains_AssumesPortAgnosticMatcher` (`auth_domains_test.go:152–195`) builds a real `DomainMatcher` and asserts hostname is allowed AND `host:port` form is NOT — so any future port-aware change flips this assertion loudly.
8	`static/dist/` rename	Verified. `forge-ui/static/dist/` removed; `static/` directly contains `app.js`, `style.css`, `index.html`, `monaco/`, `embed.go`. `//go:embed` directives updated; `server.go:108–128` serves directly from `static.FS` (no `fs.Sub("dist")`). Convention documented in `embed.go:14–17`. All `ui` tests green.
9	Server-side auth validation	Verified. `handleCreateAgent` calls `validateAuthPayload(opts.Auth)` before `CreateFunc` (`handlers_create.go:162–165`), returns 400 with the validator's detail. Translation to `types.AuthConfig` matches the `cmd/ui.go` shape. `TestHandleCreateAgent_OIDCMissingIssuerRejected` asserts 400 + "issuer is required" + `CreateFunc` not called.
10	Loopback invariant	Verified. `runner.go:1752–1762` doc tightened to "WHEN ResolveAuth() has minted one." `ResolveAuth` itself (`runner.go:130–137`) documents the mint invariant. Code remains conditional (correct). Pinned by `TestResolveAuth_InvariantMintsTokenInNonNoAuthPath` + the `--no-auth` counterpart in `auth_chain_test.go:308–341`.
11	Six smaller bugs	Verified all six: statictoken now SHA-256 hashes both sides before constant-time compare (`provider.go:108–123`, with length-leak comment + `TestVerify_MismatchedLengths_AreRejected`); JWK empty-`alg` fallback fixed at `jwks.go:306–317` + `provider.go:233–243` with `TestVerify_JWKWithoutAlgAcceptsAnyAsymmetric` and `TestVerify_JWKExplicitAlgStillStrict`; `mergeEgressDomains` now sorts (`init_egress.go:28–42`); `renderAuthBlock` `name` parameter dropped; backspace hint added to TUI; `Identity.Claims` unfiltered-copy warning at `provider.go:53–65`.
12	Coverage gaps + YAML quoting	Verified. `TestBuildUserAuthChain_BothSourcesPrefersYAML` now uses `recordLogger` and asserts `warnedAbout("--auth-url and forge.yaml")`. Backend tests for claim_map shaping (`TestHandleCreateAgent_NestedClaimMapShape` + `…_FlatGroupsClaim_PreservedAsExtraField`). Wizard tests for Backspace + invalid issuer URL. YAML quoting: new `yamlScalar`/`needsYAMLQuoting` (`init_auth.go:217–274`) handles leading specials, `:` , `#`, control chars, YAML 1.1 bool/null literals, empty string. Round-trip test via `yaml.v3`.

Minor note (acknowledged in commit message, not a blocker)

OIDC's own JWKS-fetch failures still surface as "infrastructure" rather than "provider_unavailable". Pre-existing behavior preserved; commit body of #6 calls this out as a follow-up. Reasonable scoping decision.

Test suite

ok  forge-core/auth                          1.719s
ok  forge-core/auth/providers/httpverifier   2.767s
ok  forge-core/auth/providers/oidc           8.239s
ok  forge-core/auth/providers/statictoken    3.601s
ok  forge-cli/runtime                        6.322s
ok  forge-cli/cmd                           19.215s
ok  forge-core/security                     13.125s
ok  forge-core/validate                      1.498s
ok  forge-ui                                 3.373s

Verdict

All 12 review items across three rounds are resolved with corresponding tests pinning the behavior. The two highest-risk fixes in this round — ErrProviderUnavailable (new sentinel touching chain semantics) and the static/dist/ rename (path refactor) — both look clean: the sentinel rides the existing fail-closed contract through ChainProvider without needing a special branch, and the static rename touches all //go:embed directives and server callers consistently.

LGTM — ready to merge.

Naveen Kurra added 7 commits May 22, 2026 15:34

Naveen Kurra added 4 commits May 23, 2026 14:15

Naveen Kurra added 9 commits May 23, 2026 14:36

naveen-kurra requested a review from initializ-mk May 23, 2026 20:17

initializ-mk merged commit 7998f12 into initializ:main May 23, 2026
9 checks passed

naveen-kurra mentioned this pull request May 24, 2026

feat(auth): Phase 2 — AWS Sigv4, GCP IAP, Azure AD providers (v0.11.0) #79

Open

9 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor(auth): introduce pluggable Provider chain (PR1 foundation)#77

refactor(auth): introduce pluggable Provider chain (PR1 foundation)#77
initializ-mk merged 20 commits into
initializ:mainfrom
naveen-kurra:pr1-auth-provider-foundation

naveen-kurra commented May 22, 2026

Uh oh!

initializ-mk commented May 23, 2026

Uh oh!

initializ-mk commented May 23, 2026

Uh oh!

initializ-mk commented May 23, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

naveen-kurra commented May 22, 2026

Summary

Where this fits in the architecture diagrams

What this PR adds

What's NOT in this PR (deliberate)

Backward compatibility

Verification (all pass locally)

Test plan (manual smoke)

Notes for reviewers

Uh oh!

initializ-mk commented May 23, 2026

Code Review

Blockers — security / correctness

1. JWKS cache TTL is dead code

2. Issuer trailing-slash inconsistency

3. Middleware silently disables auth when Chain == nil

4. --no-auth + YAML Required: true silently grants anonymous access

5. JWKS refresh failure does not update cooldown

6. httpverifier classifies 5xx/network errors as ErrInvalidToken

7. auth_domains.go strips port via u.Hostname()

Blockers — process / maintainability

8. forge-ui/static/dist/app.js hand-edited under dist/

9. Web UI does not validate auth payload server-side

10. Loopback "ALWAYS prepended" doc overstates the code

Smaller bugs

Test gaps

What's strong (worth keeping as-is)

Verified PR claims (all check out)

Uh oh!

initializ-mk commented May 23, 2026

Re-review after fix commits

✅ Fixed and verified

❌ Still outstanding

Blockers

Smaller bugs

Test gaps

Summary

Uh oh!

initializ-mk commented May 23, 2026

Round-3 review — reviews #5–#12

✅ All items resolved

Minor note (acknowledged in commit message, not a blocker)

Test suite

Verdict

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

3. Middleware silently disables auth when `Chain == nil`

4. `--no-auth` + YAML `Required: true` silently grants anonymous access

6. httpverifier classifies 5xx/network errors as `ErrInvalidToken`

7. `auth_domains.go` strips port via `u.Hostname()`

8. `forge-ui/static/dist/app.js` hand-edited under `dist/`