refactor(auth): introduce pluggable Provider chain (PR1 foundation)#77
Conversation
Replaces the monolithic auth middleware with a Provider interface, ChainProvider composition, and a database/sql-style factory registry. Existing --auth-url logic is migrated into the new http_verifier provider; the implicit channel-loopback token becomes a static_token provider injected at the chain head. No externally observable behavior change. Backward compatible with all existing --auth-url / FORGE_AUTH_URL / FORGE_AUTH_ORG_ID flags and env vars; channel-adapter loopback continues to short-circuit before the external verifier is consulted. Why: PR2 (OIDC), PR3 (forge.yaml auth: block), and Phase 3 (Okta) plug in via Register() — no further changes to middleware or runner. Tests: - 94.3% combined coverage across forge-core/auth/... - 11 e2e tests in forge-cli/runtime/auth_chain_test.go verifying legacy --auth-url, loopback short-circuit, skip-paths, wire format - -race clean across all touched packages
Implements auth.Provider for any OIDC-compliant issuer (Auth0, Keycloak, Azure AD, Google Workspace, Ping, JumpCloud, Okta in OIDC-only mode). Security posture: - Reject alg=none and HMAC algorithms (HS256/384/512) outright - Whitelist asymmetric algs: RS256/384/512, PS256/384/512, ES256/384/512 - Algorithm derived from JWKS-declared alg, NOT token header — defends against algorithm-confusion attacks - iss exact match, aud contains check with optional azp == ClientID fallback, exp/nbf with configurable ClockSkew Performance: - Lazy discovery + JWKS load (no network on New) - Discovery cached forever (per OIDC spec, issuer is stable) - JWKS cache with sync.RWMutex hot-path; refresh on unknown kid - Stampede protection: refetch-grace window suppresses repeated JWKS fetches for the same bad kid Why: PR3 (forge.yaml auth: block) can now plug `type: oidc` directly into the auth.providers list. Phase 3 (Okta) composes oidc.Provider to add introspection + group sync without rewriting JWT verification. Tests (coverage 83.3% on oidc, 88.0% combined auth): - Happy paths: RS256, ECDSA P-256 - Security: alg=none reject, HMAC reject, alg-confusion reject, missing-kid reject, HMAC-in-JWKS reject - Standard claims: wrong iss, wrong aud, azp fallback, expired exp, not-yet-valid nbf, ClockSkew tolerance, missing exp, missing aud, multi-audience array - JWKS: lazy load, rotation triggers refresh, unknown kid after refresh, stampede prevention, malformed RSA key skipped, 500 response error - Discovery: issuer-mismatch rejected, JWKSURL override bypasses discovery - ClaimMap: defaults, custom names, X-Org-ID header override, groups as single-string tolerated - Concurrency: 50 parallel Verify calls under -race - Context: respects cancellation - Factory: registers as "oidc", forward-compat with unknown YAML keys No regressions: - All forge-core packages pass (28/28) - All forge-cli packages pass (14/14) - All forge-plugins packages pass (4/4) - Race detector clean across auth/... + cli/runtime - gofmt + go vet clean - Anti-pattern greps clean (no http.DefaultClient, no plain == on tokens, no token/secret in logs, no disallowed algs accepted)
Adds the typed AuthConfig schema, validation, and the AuthDomains
egress hook. The runner now builds the provider chain via
auth.Registry.BuildChain(cfg.Auth.Providers) when forge.yaml has an
auth: block, falling back to the legacy --auth-url path otherwise.
Schema:
auth:
required: true
providers:
- type: oidc
name: corporate-sso
settings:
issuer: https://login.example.com
audience: api://forge
Precedence:
--no-auth → anonymous (unchanged)
forge.yaml auth.providers → Registry.BuildChain
--auth-url / FORGE_AUTH_URL → legacy http_verifier
nothing → loopback-only chain
When both forge.yaml auth: and --auth-url are configured, the YAML
block wins and a warning is logged. Loopback static_token is always
prepended for channel-adapter callbacks.
Egress integration:
security.AuthDomains(cfg.Auth) returns issuer/verifier hosts.
The runner merges these into the egress allowlist BEFORE the
enforcer is constructed — without this, an OIDC issuer configured
in forge.yaml would be silently blocked at JWKS-fetch time.
Validation:
validate.ValidateAuthConfig is called from ValidateForgeConfig.
Each provider type's required settings are checked at `forge
validate` time (not just at runtime construction):
- oidc: issuer + audience
- http_verifier: url
- static_token: token or token_env (literal token warns)
- unknown type: error with list of known types
Side-effect import: cmd-side runner.go imports providers/oidc as `_`
so auth.Build("oidc", settings) succeeds via factory registration.
Tests (88.6% validate, 90.8% security, 88% combined auth):
- Validation: every provider type, missing required fields,
unknown type, missing type, duplicate names warning,
ValidateForgeConfig integration
- AuthDomains: oidc issuer, http_verifier URL, JWKS-URL override,
multi-provider dedup, static_token contributes nothing, malformed
URLs skipped, port stripping, unknown provider returns nothing
- Runner: buildChainFromConfig (empty, single provider, ordered
chain with first-match-wins, invalid type, factory error),
buildLegacyHTTPVerifierChain, buildUserAuthChain precedence rules
(neither / only legacy / only YAML / both prefers YAML / oidc via
registry)
No regressions:
- All forge-core packages pass (28/28)
- All forge-cli packages pass (15/15)
- All forge-plugins packages pass (4/4)
- Race detector clean across auth/.../validate/security/runtime
- gofmt + go vet clean
- Anti-pattern greps clean (no DefaultClient, no token-in-logs,
no plain == on tokens, oidc uses injected client only)
golangci-lint v2.10.1 flagged removeKey as unused — it was scaffolded for a key-rotation test case that was implemented differently (by adding a new key without removing the old). Dropping the dead code keeps CI lint green.
Replaces the thin {method, path, remote_addr} payload of the legacy
auth_success/auth_failure events with structured fields suitable for
operators to dashboard and alert on, without leaking PII.
Events:
auth_verify (success)
{ provider, user_id, org_id, groups_count, token_kind,
method, path, remote_addr }
auth_fail (failure)
{ reason, token_kind, method, path, remote_addr }
Reason codes (stable contract — changing is a breaking change for
downstream consumers):
missing_token no Authorization header
rejected provider recognized + denied (revoked, expired, wrong iss/aud)
invalid token malformed / cryptographically invalid
not_for_me chain exhausted, no provider claimed the token
infrastructure transient provider error (network, etc.)
Privacy invariants enforced by test:
- email, claims, token bytes NEVER appear in either event
- test asserts the literal absence of email / SSN / secret values
from the serialized audit line
Interface changes:
auth.MiddlewareOptions.OnAuth signature widened from
func(*http.Request, bool)
to
func(*http.Request, *auth.Identity, error, string)
so the caller can read the verified principal + chain error + token
kind, instead of just a bool. Only caller (runner.go) updated.
New: auth.ErrMissingBearer sentinel for the no-header case so callers
can distinguish missing-token from chain errors.
New: auth.TokenKind(token) returns "jwt" | "opaque" | "empty" — cheap
structural classification suitable for logging.
Backward compat:
AuditAuthSuccess and AuditAuthFailure constants kept as deprecated
aliases (point to the new "auth_verify"/"auth_fail" strings) so any
consumer that grep'd for the old constants compiles but emits the
new string. Scheduled for removal in v0.11.0.
Tests (100% coverage on TokenKind, makeAuthAuditCallback, authFailReason):
- TokenKind: jwt/opaque/empty/edge-cases
- Middleware: OnAuth receives (Identity, err, kind) on every decision
- Middleware: ErrMissingBearer surfaces on no-header requests
- Audit: auth_verify carries provider/user_id/org_id/groups_count/kind
- Audit: auth_fail carries stable reason codes for each sentinel
- Audit: PII-leak test (email + secret claims never appear in payload)
- Audit: CorrelationID propagates from request context
- E2E: full middleware → callback → audit log path
No regressions:
- All forge-core packages pass (28/28)
- All forge-cli packages pass (15/15)
- All forge-plugins packages pass (4/4)
- go test -race clean across affected packages
- golangci-lint v2.10.1: 0 issues
- gofmt + go vet clean
Adds the Authentication step to the `forge init` wizard so users can
configure the auth chain without hand-writing forge.yaml. Same surface
is exposed via non-interactive --auth* flags for CI/script use.
Interactive flow (between Egress and Review steps):
🔐 Authentication
○ None — anonymous (default)
○ OIDC (JWT) — Auth0, Keycloak, Azure AD, Google, Okta-OIDC, …
○ HTTP Verifier — legacy external /verify endpoint
○ Custom — write a commented stub, edit forge.yaml manually
OIDC sub-flow: Issuer → Audience → Groups Claim (defaults to "groups")
HTTP Verifier sub-flow: URL → Default org_id (optional)
None / Custom: pick and continue, no inputs
Non-interactive flags:
--auth=none|oidc|http_verifier|custom
--auth-issuer=<url> (required with --auth=oidc)
--auth-audience=<aud> (required with --auth=oidc)
--auth-url=<url> (required with --auth=http_verifier)
--auth-default-org=<org> (http_verifier, optional)
--auth-groups-claim=<name> (oidc, default: groups)
Scaffold output:
- Adds auth: block to forge.yaml (alphabetical keys for stable diffs)
- Auto-merges the issuer/verifier host into egress.allowed_domains
- For custom mode: writes a commented-out template stub instead
- Settings are pre-rendered as YAML in Go (init_auth.go) so nested
maps like claim_map: {groups: roles} emit correctly without
template helpers
UX details:
- "None" is the first option; one Enter to skip the entire step
- Backspace on empty input returns to the picker (consistent with
other wizard steps)
- Live validation on issuer/URL fields: must be parseable + http(s)
- The wizard's existing egress preview displays the auth-host
auto-add in the final review step
No runtime behavior change — PR3 already wired forge.yaml auth: into
the runner. This PR is pure UX: easier to author the YAML.
Tests (TUI step + scaffold helpers):
- All 4 mode flows: None (default), Custom (stub), OIDC (full + custom
groups claim), HTTP Verifier (with/without default org)
- Issuer trailing-slash trimmed; groups claim suppressed at default
- AuthEgressHosts populated correctly per mode
- Apply() correctness for each mode
- Summary text per mode
- renderAuthBlock: empty/none/custom/oidc/http_verifier shapes,
nested claim_map, deterministic key ordering, name suppression
when same as type
- buildAuthFromFlags: happy paths, missing required fields, unknown
modes, claim-map suppression at default
- mergeEgressDomains: dedupe + order preservation
- End-to-end: `forge init --auth=oidc` produces a forge.yaml that
passes `forge validate`
- E2E negative: `--auth=oidc` without --auth-issuer fails with the
expected error message
No regressions:
- All forge-core packages pass (28/28)
- All forge-cli packages pass (15/15)
- All forge-plugins packages pass (4/4)
- golangci-lint v2.10.1: 0 issues
- gofmt + go vet clean
Adds the matching Authentication step to the Web UI wizard (`forge ui`)
so users without a terminal can configure the auth chain through the
browser-based create-agent flow. Same backend contract as PR5 — both
surfaces feed the single scaffold path that writes forge.yaml.
Backend changes:
- forge-ui/types.go: new AuthCreateOptions (mode + settings) field on
AgentCreateOptions; new AuthProviderTypeMeta + AuthProviderTypes on
WizardMetadata.
- forge-ui/handlers_create.go: GET /api/wizard/meta now returns the
four founding auth provider types so the frontend renders the
picker from server-driven metadata (adding Okta in Phase 3 = one
line here, no frontend change).
- forge-cli/cmd/ui.go: createFunc translates opts.Auth into the
initOptions fields the scaffold expects (AuthMode, AuthSettings,
AuthEgressHosts).
- forge-cli/cmd/init_auth.go: new authEgressHostsFromSettings helper
extracts issuer/verifier hosts for the Web UI path (same hosts the
TUI wizard and --auth flags compute).
Frontend changes (forge-ui/static/dist/app.js):
- WIZARD_STEPS extended from 9 to 10 entries (Authentication slotted
between Env Vars and Review).
- form.auth = {mode, settings} added to initial state.
- Step renders a radio group fed by meta.auth_provider_types, with
conditional OIDC / HTTP Verifier sub-forms.
- canNext validation: OIDC requires issuer + audience; HTTP Verifier
requires url.
- buildAuthPayload reshapes the wizard's flat groups_claim into the
nested claim_map: {groups: ...} shape before submission; strips
empty optional fields and trims trailing slashes from issuer.
- Review step adds an "A2A Auth" row with a one-line summary.
UX details:
- None is the default — anonymous unless the user picks otherwise.
- Switching modes resets settings so stale fields from a previously
selected mode don't leak into the submitted payload.
- Custom mode writes a commented stub the user can edit later.
Backward compat:
- Requests without `auth` continue to work (anonymous access).
- AuthCreateOptions.Settings uses `map[string]any` so future provider
types add fields server-side without breaking older payloads.
Tests:
- handleGetWizardMeta now asserts all 4 auth provider types are
exposed with non-empty label + description.
- New TestHandleCreateAgent_WithAuthPayload: AgentCreateOptions.Auth
round-trips through the JSON boundary with correct settings.
- New TestHandleCreateAgent_WithoutAuthPayload: omitted auth field
keeps Auth nil (no spurious anonymous-mode payload).
- New authEgressHostsFromSettings tests: oidc, oidc+jwks_url,
http_verifier, no-network modes, nil settings, malformed URL.
- End-to-end smoke (manual): forge ui POST /api/agents with auth
payload → generated forge.yaml has auth: block + login.example.com
auto-added to egress.allowed_domains.
No regressions:
- All 4 modules pass (forge-core, forge-cli, forge-plugins, forge-ui)
- go test -race clean on touched packages
- golangci-lint v2.10.1: 0 issues
- gofmt + go vet clean
Code ReviewReviewed against the two-phase plan (Phase 1: static_token + http_verifier with Architecture and security defenses are largely sound. The items below are fixable without redesign. Blockers — security / correctness1. JWKS cache TTL is dead code
2. Issuer trailing-slash inconsistency
3. Middleware silently disables auth when
|
Before this fix the JWKS cache's ttl/lastFetch fields were set but never
read — the cache only refreshed on unknown-kid lookups. Real-world IdPs
revoke keys by REMOVING them from the JWKS document, which means a stolen
private key remained trusted by Forge until process restart. The
"refresh-on-miss" semantics handle key ROTATION (new kid arrives) but
not key REVOCATION (kid silently disappears).
Reviewer flagged: forge-core/auth/providers/oidc/jwks.go — high severity.
Fix:
- Split refresh-time state into two fields:
lastSuccessful — drives TTL accounting; only advanced on success
lastAttempt — drives error backoff; advanced on every attempt
This stops a failed refresh during an IdP outage from silently
extending trust in already-cached keys.
- Get() now triggers a refresh when (now - lastSuccessful) > ttl, on
the request path. A stale cache treats every kid as a miss so the
refresh actually runs.
- backoffActive() suppresses retry storms during JWKS outages
(refetchGrace window — 30s default).
- Successful refresh clears lastMissRefresh, so a previously-unknown
kid can be re-attempted immediately after a successful reload.
- Injectable clock (`now func() time.Time`) so tests exercise TTL
expiry without time.Sleep.
Tests (all -race clean):
TestJWKS_TTLExpiryForcesRefresh
Same kid, second call within TTL → no refetch.
Advance virtual time past TTL → refetch happens.
TestJWKS_RevokedKeyEventuallyRejected
Sign with key A → accept.
IdP removes key A from JWKS.
Within TTL → still accepted (acceptable revocation lag).
Past TTL → refresh picks up new JWKS → token rejected (ErrInvalidToken).
TestJWKS_FailedRefreshDoesNotExtendTTL
Warm cache → fail JWKS → past TTL → refresh fails (returns error).
Endpoint recovers → past refetchGrace → next call retries refresh
successfully (proves lastSuccessful was NOT bumped by the failure).
Misleading doc comment removed; new struct comment documents the
two-timestamp design and exact refresh triggers.
Backward compat: no API changes. Default TTL (1 hour) and min TTL
(5 min) clamp unchanged. Phase 1 OIDC behavior preserved otherwise.
Verification:
go test -race ./forge-core/auth/providers/oidc/ — all green
go test -count=1 forge-core/... forge-cli/... forge-plugins/... forge-ui/...
— 47/47 packages pass
golangci-lint v2.10.1 — 0 issues
…itializ#2) Before: newDiscovery trimmed the configured Issuer, but the discovery-doc check compared the trimmed value against the IdP's RAW emitted form, and jwt.WithIssuer received the user's RAW configured form. Result: any IdP that emits iss with a trailing slash (Auth0) was spuriously rejected against a config without slash, and vice versa. Reviewer flagged: provider.go vs discovery.go. Fix: - New Config.normalize() trims trailing slash from Issuer; called once in New() so every downstream comparison uses the same canonical form. - Discovery comparison now trims meta.Issuer before checking against the already-trimmed d.issuer. - JWT iss validation: replaced jwt.WithIssuer (strict equality, no trim option) with a manual checkIssuer that trims both the token's iss claim and the configured Issuer before comparing. Security: trimming a single trailing slash cannot collapse distinct issuers — "https://attacker.example" and "https://legit.example" remain distinct after trim. The trim is normalization, not weakening. TestIssuer_GenuineMismatchStillRejected guards this invariant. Backward compat: existing configs continue to work; new configs with trailing slashes now also work. JWT iss claim with trailing slash now also works. Tests: TestIssuer_ConfigWithoutSlash_TokenWithSlash (Auth0-shaped iss + Okta-shaped config) TestIssuer_ConfigWithSlash_TokenWithoutSlash (reverse) TestIssuer_BothWithoutSlash_StillWorks (regression — common case) TestIssuer_BothWithSlash_StillWorks (regression — Auth0 paired) TestIssuer_GenuineMismatchStillRejected (security guard) TestIssuer_TokenMissingIssClaim (no iss → ErrTokenRejected) TestDiscovery_TrailingSlashTolerated (discovery doc has slash, config doesn't) Verification: go test -race ./forge-core/auth/providers/oidc/ — all green full sweep — 47/47 packages pass golangci-lint v2.10.1 — 0 issues
…ializ#3) Reviewer flagged: forge-core/auth/middleware.go silently returned a no-op wrapper when opts.Chain == nil. A misconfigured runner that forgot to wire a chain would silently serve unauthenticated requests in production. High severity — the open-prod-endpoint failure mode is the highest-impact misconfiguration in the auth subsystem. Fix: - New MiddlewareOptions.AllowAnonymous explicit opt-in (default false). - Middleware() panics at construction when Chain == nil && !AllowAnonymous. Fails loudly at startup, not silently at the first real request. - Runner updated to set AllowAnonymous=true in the two legitimate anonymous paths: 1. --no-auth flag (operator explicit opt-in) 2. No auth: block AND no --auth-url AND no channels (legacy local dev default — preserved for backward compat) Backward compat: - Existing deployments with a configured chain — unchanged. - Deployments with --no-auth — unchanged (runner sets the flag). - Deployments that ran anonymous by accident — now panic at startup with a clear, actionable message naming the flag to set. This IS a behavior change, by design. Tests (all -race clean): TestMiddleware_NilChainPanicsWithoutAllowAnonymous Asserts the panic happens AND the message mentions AllowAnonymous so operators know how to fix it. TestMiddleware_NilChainWithAllowAnonymousPassesThrough Counterpart: explicit opt-in works. TestMiddleware_NonNilChainIgnoresAllowAnonymous AllowAnonymous is only consulted when Chain is nil; a configured chain always enforces auth. The existing PR1 "nil chain passes through" test updated to use the new AllowAnonymous flag. The E2E TestE2E_NoAuthConfigured_AnonymousAccess test driver sets AllowAnonymous=true automatically when given a nil chain. Verification: go test -race ./forge-core/auth/... ./forge-cli/runtime/ — all green full sweep — 47/47 packages pass golangci-lint v2.10.1 — 0 issues
…initializ#4) Reviewer flagged: a user who configures forge.yaml with 'auth: { required: true, providers: [...] }' and then runs the agent with --no-auth gets anonymous access silently. The flag and the YAML block were treated independently. The --no-auth happy-path short-circuit returned an anonymous middleware before consulting cfg.Auth at all. This is the second-highest-impact misconfiguration class in the auth subsystem (open prod endpoint by mistake), and it's worse than the nil-Chain case (initializ#3) because here the operator deliberately wrote 'required: true' into their config. Fix (graduated severity): auth.required == true + --no-auth → startup error (refuse) providers populated + --no-auth → warning (operator intent unclear; we proceed but make the override visible) empty auth block + --no-auth → unchanged (anonymous, no log) Error message names what's wrong AND lists the three ways to fix it: - remove --no-auth - set 'auth.required: false' - delete the 'auth:' block Warning includes provider count so dashboards can alert. Tests (all -race clean): TestResolveAuth_NoAuthWithRequiredFails auth.required=true + --no-auth → error; message contains "--no-auth conflicts with". TestResolveAuth_NoAuthWithProvidersWarns providers set, required=false + --no-auth → no error, warning emitted containing "--no-auth overrides". TestResolveAuth_NoAuthWithEmptyYAMLConfig_NoWarning Regression: --no-auth alone (no auth: block) still works without spurious warning. TestResolveAuth_NoAuthWithRequiredFalseAndProviders_WarnsNotFails Explicit Required: false is respected — warn, don't refuse. New recordLogger test helper captures Warn/Info calls so assertions can check what the runner emitted; mirrors the no-op nopLogger pattern already in use. Backward compat: - --no-auth alone (no auth: block): unchanged - --no-auth + providers + no required: warns, still anonymous - --no-auth + required: true: BREAKING — refuses to start. This is intentional; the previous behavior silently exposed a "required auth" deployment as anonymous, which the user explicitly tried to prevent in their forge.yaml. Verification: go test -race ./forge-core/auth/... ./forge-cli/runtime/ — green full sweep — 47/47 packages pass golangci-lint v2.10.1 — 0 issues
Re-review after fix commitsVerified the 4 fix commits against the working tree and ran ✅ Fixed and verified
❌ Still outstandingBlockers
Smaller bugs
Test gaps
SummaryThe 4 top blockers are cleanly resolved. The TTL fix elegantly subsumes the refresh-cooldown concern (#5). 5 blockers + 3 smaller bugs + 2 test gaps remain. None of the outstanding items are crypto/correctness in the JWT/JWKS path — they're now concentrated in the egress, audit signal, web UI surface, and maintenance areas. Most actionable next pass: (a) decide on the port-strip question (#2), (b) add server-side validation in |
…nitializ#5) Reviewer flagged: when refreshLocked errors, lastMissRefresh is not updated → every subsequent unknown-kid request hammers the IdP again → a flapping IdP becomes a DDoS amplifier (N callers × broken IdP = N IdP hits per request burst). The reviewer was looking at the pre-fix-initializ#1 code. Fix initializ#1 (commit 379d578) addressed this concern through a DIFFERENT mechanism: it added a lastAttempt + lastErr pair and a backoffActive() guard, so any refresh error (not just unknown-kid-after-refresh) suppresses subsequent refresh attempts inside the refetchGrace window. This commit makes the no-amplifier property explicit by: 1. Adding TestJWKS_FailedRefreshDoesNotHammerIdP — fires 50 unknown-kid requests during a backoff window and asserts the JWKS endpoint receives exactly ONE hit (the initial refresh that failed). Then advances time past refetchGrace and asserts the next request triggers EXACTLY ONE retry — and the burst after that retry stays capped at one IdP call again. 2. Adding a doc comment on backoffActive() that names the DDoS-amplifier concern explicitly and references the test that pins the contract, so future maintainers don't accidentally remove the gate. No code behavior change — fix initializ#1 already produced the right behavior; this commit makes the test coverage match. Verification: go test -race ./forge-core/auth/providers/oidc/ — green (incl. new test) full sweep — 39/39 packages pass golangci-lint v2.10.1 — 0 issues
…nitializ#6) Reviewer flagged: httpverifier classified 5xx + network errors as ErrInvalidToken. Fail-closed direction was correct, but the audit signal was misleading — operators reading audit logs saw "invalid token" alerts firing when the actual problem was the IdP being down. Different response, different runbook, distinct alert. Fix: - New sentinel auth.ErrProviderUnavailable. Same chain behavior as ErrInvalidToken (stop, fail-closed, 401 response), but distinct audit signal. - httpverifier reclassifies its error mapping: HTTP 200 + valid:true → Identity (unchanged) HTTP 200 + valid:false → ErrTokenRejected (unchanged) HTTP 401 → ErrTokenRejected (unchanged) HTTP other 4xx → ErrTokenRejected (NEW: was ErrInvalidToken) HTTP 5xx → ErrProviderUnavailable (CHANGED) Network/transport error → ErrProviderUnavailable (CHANGED) Undecodable body → ErrProviderUnavailable (CHANGED; 200-but-garbage is verifier misbehavior, not a token issue) Local marshal/build err → ErrInvalidToken (unchanged; never the verifier's fault) - middleware.classifyAuthFailure: new branch returns "auth provider unavailable" as the client-visible message. - runner.authFailReason: new reason code "provider_unavailable" in the audit event. Documented in the function's reason-code table. Audit-event contract additions: auth_fail { reason: "provider_unavailable", ... } Operators dashboarding on `reason:provider_unavailable` can now alert on IdP downtime distinctly from token issues. Tests (all -race clean): TestVerify_500_ReturnsProviderUnavailable (httpverifier) TestVerify_502BadGateway_ReturnsProviderUnavailable TestVerify_NetworkError_ReturnsProviderUnavailable TestVerify_Non401_4xx_ReturnsRejected (regression: 400/403 stay rejected) TestVerify_UndecodableBody_ReturnsProviderUnavailable TestAuthAudit_EmitsAuthFailOnError (new "provider_unavailable" row) TestClassifyAuthFailure (new "auth provider unavailable" message row) Backward compat: - Chain behavior preserved (ErrProviderUnavailable fail-closed like ErrInvalidToken). - Existing client retry logic that retried on "invalid token" responses will still retry on "auth provider unavailable" (both are 401-class). - ErrInvalidToken sentinel preserved for token-side failures elsewhere (oidc malformed JWT, bad kid, etc.) — unchanged. OIDC provider not updated in this commit. Its JWKS-fetch failures currently surface via the chain's "any other error fails closed" path, mapped to authFailReason "infrastructure". Could be tightened to ErrProviderUnavailable in a follow-up, but reviewer's specific concern was the misleading "invalid token" signal from httpverifier — OIDC already produces "infrastructure" which is at least not misleading. Verification: go test -race forge-core/auth/... forge-cli/runtime/ — green full sweep — all packages pass golangci-lint v2.10.1 — 0 issues
…iz#7) Reviewer asked: is the egress matcher port-agnostic? If yes, the current TestAuthDomains_PortStripped behavior is correct; if no, an issuer on :8443 is silently blocked at JWKS-fetch time. Investigation result: YES, the egress matcher IS port-agnostic. All three callsites strip the port off the outbound host before passing to matcher.IsAllowed: - egress_enforcer.go:40 — req.URL.Hostname() before IsAllowed - egress_proxy.go:210 — extractHost() via net.SplitHostPort - safe_dialer.go:42 — net.SplitHostPort(addr) So auth_domains.go emitting hostname-only entries is correct and consistent. No code fix needed. What WAS missing: the cross-package contract was implicit. If someone later hardens the matcher to be port-aware ("allow :443 but not :8080"), auth_domains.go would silently break — every forge.yaml that points at a non-443 IdP would suddenly fail JWKS fetches. This commit: - Adds a doc comment on hostFromURL() naming the three callsites whose port-stripping behavior we depend on. If any of them changes, auth_domains needs to follow. - Adds TestAuthDomains_AssumesPortAgnosticMatcher that asserts the contract directly: builds a matcher with the host AuthDomains produces, checks the hostname-only form is allowed, AND checks that host:port form is NOT (yet) allowed. If a future change makes the matcher port-aware, the second assertion flips and the test flags the cross-package break. No production code changes. The fix is preventative documentation + test, not behavior change. Verification: go test -race ./forge-core/security/ ./forge-core/auth/... — green full sweep — all packages pass golangci-lint v2.10.1 — 0 issues
…nitializ#8) Reviewer flagged: forge-ui/static/dist/app.js is a 110KB hand-edited ES-module file living under a path (dist/) that conventionally signals "generated build artifact, don't touch." There is no source folder, no package.json, no bundler config, no sourcemap. Every UI change has been a direct edit to the "dist" file. This will only get worse as more wizard steps land. Decision: rename out of dist/ rather than introduce a bundler. A bundler (esbuild, etc.) would add real operational debt — package.json, npm install in CI, version pinning, sourcemap rotation, build step in the Makefile, contributor onboarding cost — for files that are hand-written single-file ES modules with no JSX, no TypeScript, no imports beyond htm/preact CDN globals. The file IS the source. Removing the misleading dist/ name communicates that honestly. Changes: - git mv app.js / style.css / index.html / monaco/ up one level from forge-ui/static/dist/ → forge-ui/static/. History preserved via git rename detection. - static/embed.go: //go:embed all:dist → //go:embed app.js style.css index.html monaco. Explicit list keeps embed.go itself out of the served assets and documents the asset surface. - server.go: drop fs.Sub(static.FS, "dist") — the FS is now rooted directly at assets. Removes the unused "io/fs" import. The SPA fallback's static.FS.Open() replaces distFS.Open(). Manual smoke (against built binary): - GET / → 200 text/html (index.html) - GET /app.js → 200 text/javascript, 110469 bytes (unchanged) - GET /style.css → 200 - GET /monaco/editor.js → 200, 2317605 bytes - GET /create → 200 (SPA fallback still works) - GET /api/wizard/meta auth_provider_types length == 4 (PR6 unaffected) Future: if app.js ever does grow JSX/TS/minification needs, that's the right time to introduce a build step — into a NEW static/src/ directory producing into static/build/ or similar. Adding it preemptively today is over-engineering. Verification: go build forge-ui/... — green full sweep — 39/39 packages pass golangci-lint v2.10.1 — 0 issues manual end-to-end smoke test of all asset paths — all 200
…nitializ#9) Reviewer flagged: handleCreateAgent decoded JSON, checked only name and model_provider, then handed opts.Auth (with an arbitrary Settings map[string]any) straight to CreateFunc. The validate package's ValidateAuthConfig existed but was never called on this path — so a buggy frontend could POST an oidc payload missing issuer/audience, the agent directory would scaffold cleanly with a malformed auth: block, and the error would only surface when the user ran `forge run`. By then the directory, .env.example, and channel tokens are all on disk. Fix: - New validateAuthPayload(opts.Auth) helper in handlers_create.go. - Called from handleCreateAgent immediately after the name/provider checks, BEFORE CreateFunc runs. - Translates AuthCreateOptions → types.AuthConfig exactly the way cmd/ui.go's createFunc does, so this check sees the same shape the eventual scaffold will. No drift between "wizard accepts" and "forge validate accepts". - 400 with the validation errors joined as the response message. - Modes "" / "none" / "custom" skip validation — they don't write provider entries, nothing to validate. Validation surface (delegated to validate.ValidateAuthConfig): - oidc: issuer + audience required - http_verifier: url required - static_token: token or token_env required - unknown type: rejected with list of known types The frontend's pre-submit checks (canNext in app.js step 8) already guard the happy path, but those are defense in depth at best — anyone with curl can bypass them. This is the server-side enforcement that matters. Tests: TestHandleCreateAgent_OIDCMissingIssuerRejected 400, error names "issuer" TestHandleCreateAgent_OIDCMissingAudienceRejected 400, error names "audience" TestHandleCreateAgent_OIDCBothFieldsAccepted 201 (regression) TestHandleCreateAgent_HTTPVerifierMissingURLRejected 400, error names "url" TestHandleCreateAgent_UnknownAuthModeRejected 400, error names unknown type TestHandleCreateAgent_NoneAndCustomSkipValidation 201 for mode=none and mode=custom TestHandleCreateAgent_WithoutAuthPayload 201 with nil Auth (regression) Each failure test also asserts that CreateFunc was NOT called — the agent directory must never exist on disk for malformed auth. Verification: go test -race ./forge-ui/ — green full sweep — all packages pass golangci-lint v2.10.1 — 0 issues
…tializ#10) Reviewer flagged: resolveAuth's comment claimed the loopback static_token is "ALWAYS prepended at the chain head (regardless of source)", but the actual code is conditional on r.authToken != "". It works in practice only because ResolveAuth() always mints a token in the non-NoAuth path. Any future refactor that short-circuits that path would silently break channel-adapter callbacks — no compile error, no test failure, just "my Slack bot stopped working after we updated forge" weeks later. Fix is documentation + an invariant test. The conditional itself is correct (--no-auth path returns early before the prepend, and that path also doesn't mint a token), so the code stays as-is. What was missing is making the upstream dependency explicit so a future maintainer either preserves the invariant or notices when they don't. Changes: - resolveAuth doc: replaced "ALWAYS prepended" with an explicit description of the precondition (ResolveAuth() must have minted a token) and a reference to the pinning test. Inline comment at the prepend site explains why the conditional is defensive, not load-bearing. - ResolveAuth() doc: states the invariant it maintains (after nil return, either r.authToken != "" or r.cfg.NoAuth is true) and names the downstream consumer + test. - New tests: TestResolveAuth_InvariantMintsTokenInNonNoAuthPath Direct invariant pin: non-NoAuth + ResolveAuth() returns nil → r.authToken MUST be non-empty. Failure message tells future maintainers what they broke and where to look. TestResolveAuth_NoAuthPathDoesNotMintToken Counterpart: --no-auth path does NOT mint. Pins the "anonymous mode skips token generation" property too, so neither direction can drift. If someone ever refactors ResolveAuth to skip minting in the non-NoAuth path, the invariant test fails immediately — and the failure message points straight at the loopback-prepend dependency. Verification: go test -race ./forge-cli/runtime/ — green (incl. 2 new tests) full sweep — all packages pass golangci-lint v2.10.1 — 0 issues
Six related cleanups bundled per the reviewer's grouping: 11a. statictoken length leak (provider.go) subtle.ConstantTimeCompare returns 0 instantly on length mismatch without inspecting bytes, leaking the configured token's length via measurable timing differences across many trials. Fix: SHA-256 hash both sides and constant-time compare the 32-byte digests. The comparison is now over equal-length inputs in every case. Pinned by TestVerify_MismatchedLengths_AreRejected — exercises empty, prefix, suffix, off-by-one, and same-length-but-different inputs. 11b. OIDC PSS-default mismatch (jwks.go, provider.go) parseJWK was defaulting an RSA JWK with no `alg` field to "RS256". The cross-check at provider.go then rejected any PS256/PS384/PS512 token signed by the same RSA key — RFC 7518 says "infer from context" when alg is absent, and the same RSA key signs both RS* and PS* families. Fix: preserve empty alg through parseJWK. In the cross-check, when the JWK's alg is empty, fall back to trusting the token's header alg — which the asymmetric whitelist already validated upstream. When the JWK declares an alg explicitly, the strict comparison still applies (algorithm-confusion defense intact). Pinned by TestVerify_JWKWithoutAlgAcceptsAnyAsymmetric and TestVerify_JWKExplicitAlgStillStrict. 11c. mergeEgressDomains sort order (init_egress.go) deriveEgressDomains returned sorted output; mergeEgressDomains appended AuthEgressHosts unsorted at the tail. When AuthEgressHosts came from a map iteration, the rendered forge.yaml's egress.allowed_domains section diffed across runs even when the underlying input set was unchanged. Fix: sort the merged result. Generated forge.yaml is now stable. Pinned by TestMergeEgressDomains_SortedOutput. 11d. renderAuthBlock dead `name` parameter (init_auth.go, init.go) Caller passed opts.AuthMode for both the mode and name arguments; the name was suppressed by the equality check inside the function. The wizard never captures an explicit provider name today. Fix: dropped the parameter. If a future wizard step adds name capture, reintroduce it then. Existing tests updated; renamed TestRenderAuthBlock_NameSameAsModeSuppressed → TestRenderAuthBlock_NoNameEverEmitted with a doc-pinning intent. 11e. TUI backspace-on-empty undiscoverable (step_auth.go) Backspace-on-empty was the documented escape hatch out of an active sub-step input, but no visible affordance surfaced it. Fix: inline kbd hint under the input in each sub-step phase — "Backspace at empty back to picker". Doesn't change behavior, makes it discoverable. 11f. Identity.Claims unfiltered shallow copy (claims.go, provider.go) The OIDC provider's copyClaims returns the full JWT claim set on Identity.Claims, including custom claims the issuer adds (PII, raw group memberships, internal IDs). Downstream consumers reading Identity.Claims will see all of it. Fix is documentation for now: WARNING on both the Identity.Claims field and the copyClaims helper, naming the risk and pointing future consumers at typed Identity fields (UserID/Email/OrgID/Groups) as the portable surface. Filtering is properly a future authz-layer concern — not something this package can do without policy input. Verification: go test -race forge-core/... forge-cli/... — green full sweep — all packages pass golangci-lint v2.10.1 — 0 issues
Comment alignment in the candidate-tokens slice tripped CI's 'gofmt -l forge-core forge-cli forge-plugins' check. Mechanical fix — gofmt re-aligns trailing comments to the same column. CI Test job was failing here: Test → Check formatting → 'test -z $(gofmt -l ...)' exits 1 on the new TestVerify_MismatchedLengths_AreRejected block I added in b3d4264 (review #11a). Local gofmt -l also reports it now; pushed.
…ializ#12) Six coverage gaps and one hardening item from the reviewer's eight-point batch. Two items in the original list are already covered: 12.2 (--no-auth + YAML auth precedence) is exercised by the four TestResolveAuth_NoAuth* tests added in review initializ#4 commit b33522d. 12.4 (TestAuthDomains in forge-core/security) is exercised by the eight TestAuthDomains_* tests in security/auth_domains_test.go (added in PR3 + strengthened in review initializ#7 commit cf4ae14). The six real gaps: 12.1 Warning capture for --auth-url + YAML auth precedence TestBuildUserAuthChain_BothSourcesPrefersYAML was asserting the chain-behavior half of the contract through a nop logger that swallowed the Warn — silent regression on the "we logged a warning" half. Switched to recordLogger and added an assertion on "--auth-url and forge.yaml". If a refactor ever drops that warning, the test fails. 12.3 Full resolveAuth + YAML loopback head-of-chain test auth_chain_test.go exercised buildLegacyAuthChain. Nothing exercised resolveAuth() end-to-end with a YAML auth: block AND a minted internal token, then asserted "loopback is at chain head, ahead of YAML provider." Added TestResolveAuth_YAMLAuth_LoopbackPrependedAtChainHead that inspects chain.Providers() ordering directly and smoke-tests the loopback token verifies with the "internal" source. 12.5 Backend contract test for the frontend's claim_map translation app.js's buildAuthPayload translates flat groups_claim → nested claim_map.groups before submitting. The backend doesn't run that translation; if the frontend ever regresses, downstream YAML would silently drop the custom claim. Added two tests in the UI handler: - TestHandleCreateAgent_NestedClaimMapShape pins what the backend EXPECTS post-translation (claim_map.groups present). - TestHandleCreateAgent_FlatGroupsClaim_PreservedAsExtraField documents the failure mode if the JS regresses — the request succeeds structurally but claim_map is missing. Real JS tests would require a Node toolchain we deliberately don't add (see PR6 / review initializ#8). These Go tests are the next-best contract. 12.6 Backspace-to-picker behavior in step_auth.go Added TestAuthStep_BackspaceOnEmptyInputReturnsToPicker and a counterpart TestAuthStep_BackspaceWithContentDoesNotReturnToPicker to pin the escape-hatch behavior the review #11e hint advertises. 12.7 Invalid issuer URL blocks advance validateHTTPSURL was unit-tested standalone; never exercised as a wizard interaction. Added TestAuthStep_InvalidIssuerURLBlocksAdvance: bad URL + Enter → still on issuer phase; type a valid replacement + Enter → advances to audience. 12.8 renderAuthBlock YAML quoting hardening writeYAMLMap previously emitted "key: value" raw. Values containing ": " sequences, "#" markers, leading YAML-significant chars (!, *, -, [, {, etc.), reserved tokens (true/false/null/yes/no…), or control chars could break the parser. Added: - yamlScalar/needsYAMLQuoting helpers that emit double-quoted + escaped strings only when the input would otherwise change meaning. Plain values pass through unchanged so generated forge.yaml remains readable. - TestRenderAuthBlock_QuotesUnsafeValues (19 sub-cases) pinning the predicate's classification per input shape. - TestRenderAuthBlock_QuotedValueIsParseable round-trips a value containing ": ", "#", "!", and quotes through the renderer → yaml.v3 unmarshaler, asserting the rendered line is valid YAML AND decodes back to the original string. Note: the renderer remains a focused subset (string / number / bool / map). If auth-settings ever grow richer types, switch to yaml.v3 for the whole block. Verification: test -z "$(gofmt -l forge-core forge-cli forge-plugins)" — clean go test -race ./forge-core/... ./forge-cli/... — all green golangci-lint v2.10.1 — 0 issues
Round-3 review — reviews #5–#12Verified all 8 fix commits against the working tree. ✅ All items resolved
Minor note (acknowledged in commit message, not a blocker)OIDC's own JWKS-fetch failures still surface as Test suiteVerdictAll 12 review items across three rounds are resolved with corresponding tests pinning the behavior. The two highest-risk fixes in this round — LGTM — ready to merge. |
Summary
PR 1 of 7 implementing the pluggable auth provider chain (design:
FORGE_AUTHN_PROVIDER_PLUGIN.md).This PR introduces the foundation — the
Providerinterface,ChainProvidercomposition, factory registry, and migrates the existing
--auth-urlandchannel-loopback logic onto the new architecture. Zero externally observable
behavior change.
Where this fits in the architecture diagrams
This PR implements the highlighted boxes in the design diagrams.
Boxes marked "later" are scoped to subsequent PRs (see What's not in this PR).
Diagram 1 — Component architecture
forge-core/authpackage (Provider, Middleware, ChainProvider, Registry, Identity+Context)providers/statictokenproviders/httpverifierproviders/oidc(Provider, Discovery, JWKSCache)types.AuthConfig(forge.yaml block)security.AuthDomainshook (egress merge)runtime.audit(EventAuthVerify / EventAuthFail)step_auth.goAuthStepDiagram 2 — Class diagram
Providerinterface (Name,Verify)IdentitystructValidfield — sentinel errors are the contractHeaders(case-insensitive)ChainProviderRegistry(Register/Build/RegisteredTypes)database/sqldriver patternHTTPVerifierProviderStaticTokenProvidercrypto/subtleconstant-time compareOIDCProvider+ Discovery + JWKSCache + ClaimMapErrTokenNotForMe,ErrTokenRejected,ErrInvalidToken,ErrProviderNotConfiguredDiagram 3 — Startup sequence (config → chain)
Implements the runner → Registry.BuildChain → PrependChain(internal static_token) → install Middleware leg.
The
AuthDomains → Egressstep is scoped to PR 3.Diagram 4 — Request verification (chain walk)
Fully implemented (chain walk,
WithIdentity, response handling).The
auth_verify/auth_failaudit event emission is scoped to PR 4.Diagram 5 — OIDC verify deep-dive
Out of scope — entirely in PR 2.
What this PR adds
New files (
forge-core/auth/)provider.go—Providerinterface,Identity,Headers,HeadersFromRequest, sentinel errorschain_provider.go—ChainProvider(first-match wins, fail-closed) +PrependChainhelper (flattens nested chains)context.go—WithIdentity/IdentityFromContextregistry.go—Register/Build/RegisteredTypeswith concurrency safetysettings.go—UnmarshalSettings(yaml roundtrip so providers keep typedConfigstructs)providers/httpverifier/provider.go— migrated--auth-urllogic; same JSON wire formatproviders/statictoken/provider.go— constant-time compare; env-var or literal token; defensive identity copyModified files
forge-core/auth/middleware.go— gutted from 213 → 130 lines; now delegates to aProviderchainforge-cli/runtime/runner.go—resolveAuthnow returnsauth.MiddlewareOptions; newbuildLegacyAuthChaintranslates legacy CLI flags into a chainTests
forge-core/auth/provider_test.go— chain ordering, fail-closed semantics, registry concurrency, headers, contextforge-core/auth/middleware_test.go— rewritten to useMiddlewareOptionsAPIforge-core/auth/providers/httpverifier/provider_test.go— happy path,valid:false, 401/500, network error, org-id precedence, wire-format lock, context cancellationforge-core/auth/providers/statictoken/provider_test.go— match/mismatch,TokenEnvprecedence, defensive copy, race safety, factoryforge-cli/runtime/auth_chain_test.go— 11 e2e tests covering legacy--auth-url, loopback short-circuit, skip paths, wire format, verifier-unreachableWhat's NOT in this PR (deliberate)
auth:YAML block in forge.yaml — PR 3auth_verify/auth_failaudit events — PR 4Backward compatibility
--auth-url https://x/verifyhttp_verifierchain (unchanged wire format)FORGE_AUTH_URL=...envhttp_verifierchain (unchanged)FORGE_AUTH_ORG_ID=...http_verifier'sDefaultOrgsettingstatic_tokenprovider (loopback short-circuits before external verifier)POST {url}with{token, org_id}body, same response shapeVerification (all pass locally)
go build ./forge-core/... ./forge-cli/... ./forge-plugins/...cleango test ./forge-core/...— 25/25 packages greengo test ./forge-cli/...— 14/14 packages greengo test -race ./forge-core/auth/... ./forge-cli/runtime/— greengo vetcleangofmtcleanforge-core/auth/— 94.9%httpverifier/— 90.9%statictoken/— 96.6%http.DefaultClientin auth code==token compare (constant-time only)Test plan (manual smoke)
forge runwith no auth flags → anonymous access works (curl :8000/healthz→ 200)forge run --auth-url http://localhost:9999/verifywith a fake verifier acceptinggood-token→Bearer good-tokenaccepted,Bearer badreturns 401Notes for reviewers
Identity.Validwas intentionally dropped — the package comment explains why (nil-error return is the only "valid" signal)ChainProvideris fail-closed on non-yield errors (e.g., a network error from one provider does NOT fall through to the next). This is intentional security posturestatic_tokenprovider returnsErrTokenNotForMeon mismatch (yield), thehttp_verifierreturnsErrTokenRejected(deny) — this difference is what enables the loopback short-circuit in a chain[static_token, http_verifier]