Skip to content

refactor(auth): introduce pluggable Provider chain (PR1 foundation)#77

Merged
initializ-mk merged 20 commits into
initializ:mainfrom
naveen-kurra:pr1-auth-provider-foundation
May 23, 2026
Merged

refactor(auth): introduce pluggable Provider chain (PR1 foundation)#77
initializ-mk merged 20 commits into
initializ:mainfrom
naveen-kurra:pr1-auth-provider-foundation

Conversation

@naveen-kurra
Copy link
Copy Markdown
Collaborator

Summary

PR 1 of 7 implementing the pluggable auth provider chain (design: FORGE_AUTHN_PROVIDER_PLUGIN.md).
This PR introduces the foundation — the Provider interface, ChainProvider
composition, factory registry, and migrates the existing --auth-url and
channel-loopback logic onto the new architecture. Zero externally observable
behavior change.

Where this fits in the architecture diagrams

This PR implements the highlighted boxes in the design diagrams.
Boxes marked "later" are scoped to subsequent PRs (see What's not in this PR).

Diagram 1 — Component architecture

Box Status
forge-core/auth package (Provider, Middleware, ChainProvider, Registry, Identity+Context) ✅ this PR
providers/statictoken ✅ this PR
providers/httpverifier ✅ this PR
providers/oidc (Provider, Discovery, JWKSCache) ⏳ PR 2
types.AuthConfig (forge.yaml block) ⏳ PR 3
security.AuthDomains hook (egress merge) ⏳ PR 3
runtime.audit (EventAuthVerify / EventAuthFail) ⏳ PR 4
TUI wizard step_auth.go ⏳ PR 5
Web UI AuthStep ⏳ PR 6

Diagram 2 — Class diagram

Type Status Notes
Provider interface (Name, Verify)
Identity struct Deliberately no Valid field — sentinel errors are the contract
Headers (case-insensitive)
ChainProvider First-match wins; fail-closed on non-yield errors
Registry (Register / Build / RegisteredTypes) database/sql driver pattern
HTTPVerifierProvider wire format preserved byte-for-byte
StaticTokenProvider crypto/subtle constant-time compare
OIDCProvider + Discovery + JWKSCache + ClaimMap ⏳ PR 2
Sentinel errors: ErrTokenNotForMe, ErrTokenRejected, ErrInvalidToken, ErrProviderNotConfigured

Diagram 3 — Startup sequence (config → chain)

Implements the runner → Registry.BuildChain → PrependChain(internal static_token) → install Middleware leg.
The AuthDomains → Egress step is scoped to PR 3.

Diagram 4 — Request verification (chain walk)

Fully implemented (chain walk, WithIdentity, response handling).
The auth_verify / auth_fail audit event emission is scoped to PR 4.

Diagram 5 — OIDC verify deep-dive

Out of scope — entirely in PR 2.

What this PR adds

New files (forge-core/auth/)

  • provider.goProvider interface, Identity, Headers, HeadersFromRequest, sentinel errors
  • chain_provider.goChainProvider (first-match wins, fail-closed) + PrependChain helper (flattens nested chains)
  • context.goWithIdentity / IdentityFromContext
  • registry.goRegister / Build / RegisteredTypes with concurrency safety
  • settings.goUnmarshalSettings (yaml roundtrip so providers keep typed Config structs)
  • providers/httpverifier/provider.go — migrated --auth-url logic; same JSON wire format
  • providers/statictoken/provider.go — constant-time compare; env-var or literal token; defensive identity copy

Modified files

  • forge-core/auth/middleware.go — gutted from 213 → 130 lines; now delegates to a Provider chain
  • forge-cli/runtime/runner.goresolveAuth now returns auth.MiddlewareOptions; new buildLegacyAuthChain translates legacy CLI flags into a chain

Tests

  • forge-core/auth/provider_test.go — chain ordering, fail-closed semantics, registry concurrency, headers, context
  • forge-core/auth/middleware_test.go — rewritten to use MiddlewareOptions API
  • forge-core/auth/providers/httpverifier/provider_test.go — happy path, valid:false, 401/500, network error, org-id precedence, wire-format lock, context cancellation
  • forge-core/auth/providers/statictoken/provider_test.go — match/mismatch, TokenEnv precedence, defensive copy, race safety, factory
  • forge-cli/runtime/auth_chain_test.go — 11 e2e tests covering legacy --auth-url, loopback short-circuit, skip paths, wire format, verifier-unreachable

What's NOT in this PR (deliberate)

  • OIDC provider — PR 2
  • auth: YAML block in forge.yaml — PR 3
  • Egress allowlist auto-extension for IdP hosts — PR 3
  • auth_verify / auth_fail audit events — PR 4
  • TUI wizard step — PR 5
  • Web UI wizard step — PR 6
  • Documentation updates — PR 7
  • Okta provider — Phase 3 (v0.11.0)

Backward compatibility

Existing config Behavior after this PR
No auth configured Anonymous access (unchanged)
--auth-url https://x/verify Synthesizes http_verifier chain (unchanged wire format)
FORGE_AUTH_URL=... env Synthesizes http_verifier chain (unchanged)
FORGE_AUTH_ORG_ID=... Passed to http_verifier's DefaultOrg setting
Channel adapters (Slack/Telegram) Continue to authenticate via auto-injected static_token provider (loopback short-circuits before external verifier)
Existing custom verifier sidecar Works unchanged — same POST {url} with {token, org_id} body, same response shape

Verification (all pass locally)

  • go build ./forge-core/... ./forge-cli/... ./forge-plugins/... clean
  • go test ./forge-core/... — 25/25 packages green
  • go test ./forge-cli/... — 14/14 packages green
  • go test -race ./forge-core/auth/... ./forge-cli/runtime/ — green
  • go vet clean
  • gofmt clean
  • Coverage forge-core/auth/94.9%
  • Coverage httpverifier/90.9%
  • Coverage statictoken/96.6%
  • Anti-pattern grep: no http.DefaultClient in auth code
  • Anti-pattern grep: no plain == token compare (constant-time only)
  • Anti-pattern grep: no token/secret in log statements

Test plan (manual smoke)

  • forge run with no auth flags → anonymous access works (curl :8000/healthz → 200)
  • forge run --auth-url http://localhost:9999/verify with a fake verifier accepting good-tokenBearer good-token accepted, Bearer bad returns 401
  • Slack/Telegram channel adapter still receives messages and authenticates loopback calls without the external verifier being hit (check verifier access logs are empty during channel events)
  • Verifier unreachable → request returns 401, not 500

Notes for reviewers

  • The middleware refactor preserves the existing public skip-paths list byte-for-byte
  • Identity.Valid was intentionally dropped — the package comment explains why (nil-error return is the only "valid" signal)
  • ChainProvider is fail-closed on non-yield errors (e.g., a network error from one provider does NOT fall through to the next). This is intentional security posture
  • The static_token provider returns ErrTokenNotForMe on mismatch (yield), the http_verifier returns ErrTokenRejected (deny) — this difference is what enables the loopback short-circuit in a chain [static_token, http_verifier]

Naveen Kurra added 7 commits May 22, 2026 15:34
Replaces the monolithic auth middleware with a Provider interface,
ChainProvider composition, and a database/sql-style factory registry.
Existing --auth-url logic is migrated into the new http_verifier
provider; the implicit channel-loopback token becomes a static_token
provider injected at the chain head.

No externally observable behavior change. Backward compatible with all
existing --auth-url / FORGE_AUTH_URL / FORGE_AUTH_ORG_ID flags and env
vars; channel-adapter loopback continues to short-circuit before the
external verifier is consulted.

Why: PR2 (OIDC), PR3 (forge.yaml auth: block), and Phase 3 (Okta)
plug in via Register() — no further changes to middleware or runner.

Tests:
- 94.3% combined coverage across forge-core/auth/...
- 11 e2e tests in forge-cli/runtime/auth_chain_test.go verifying
  legacy --auth-url, loopback short-circuit, skip-paths, wire format
- -race clean across all touched packages
Implements auth.Provider for any OIDC-compliant issuer (Auth0, Keycloak,
Azure AD, Google Workspace, Ping, JumpCloud, Okta in OIDC-only mode).

Security posture:
- Reject alg=none and HMAC algorithms (HS256/384/512) outright
- Whitelist asymmetric algs: RS256/384/512, PS256/384/512, ES256/384/512
- Algorithm derived from JWKS-declared alg, NOT token header — defends
  against algorithm-confusion attacks
- iss exact match, aud contains check with optional azp == ClientID
  fallback, exp/nbf with configurable ClockSkew

Performance:
- Lazy discovery + JWKS load (no network on New)
- Discovery cached forever (per OIDC spec, issuer is stable)
- JWKS cache with sync.RWMutex hot-path; refresh on unknown kid
- Stampede protection: refetch-grace window suppresses repeated JWKS
  fetches for the same bad kid

Why: PR3 (forge.yaml auth: block) can now plug `type: oidc` directly
into the auth.providers list. Phase 3 (Okta) composes oidc.Provider
to add introspection + group sync without rewriting JWT verification.

Tests (coverage 83.3% on oidc, 88.0% combined auth):
- Happy paths: RS256, ECDSA P-256
- Security: alg=none reject, HMAC reject, alg-confusion reject,
  missing-kid reject, HMAC-in-JWKS reject
- Standard claims: wrong iss, wrong aud, azp fallback, expired exp,
  not-yet-valid nbf, ClockSkew tolerance, missing exp, missing aud,
  multi-audience array
- JWKS: lazy load, rotation triggers refresh, unknown kid after refresh,
  stampede prevention, malformed RSA key skipped, 500 response error
- Discovery: issuer-mismatch rejected, JWKSURL override bypasses discovery
- ClaimMap: defaults, custom names, X-Org-ID header override, groups
  as single-string tolerated
- Concurrency: 50 parallel Verify calls under -race
- Context: respects cancellation
- Factory: registers as "oidc", forward-compat with unknown YAML keys

No regressions:
- All forge-core packages pass (28/28)
- All forge-cli packages pass (14/14)
- All forge-plugins packages pass (4/4)
- Race detector clean across auth/... + cli/runtime
- gofmt + go vet clean
- Anti-pattern greps clean (no http.DefaultClient, no plain == on
  tokens, no token/secret in logs, no disallowed algs accepted)
Adds the typed AuthConfig schema, validation, and the AuthDomains
egress hook. The runner now builds the provider chain via
auth.Registry.BuildChain(cfg.Auth.Providers) when forge.yaml has an
auth: block, falling back to the legacy --auth-url path otherwise.

Schema:
  auth:
    required: true
    providers:
      - type: oidc
        name: corporate-sso
        settings:
          issuer: https://login.example.com
          audience: api://forge

Precedence:
  --no-auth                 → anonymous (unchanged)
  forge.yaml auth.providers → Registry.BuildChain
  --auth-url / FORGE_AUTH_URL → legacy http_verifier
  nothing                   → loopback-only chain

When both forge.yaml auth: and --auth-url are configured, the YAML
block wins and a warning is logged. Loopback static_token is always
prepended for channel-adapter callbacks.

Egress integration:
  security.AuthDomains(cfg.Auth) returns issuer/verifier hosts.
  The runner merges these into the egress allowlist BEFORE the
  enforcer is constructed — without this, an OIDC issuer configured
  in forge.yaml would be silently blocked at JWKS-fetch time.

Validation:
  validate.ValidateAuthConfig is called from ValidateForgeConfig.
  Each provider type's required settings are checked at `forge
  validate` time (not just at runtime construction):
  - oidc: issuer + audience
  - http_verifier: url
  - static_token: token or token_env (literal token warns)
  - unknown type: error with list of known types

Side-effect import: cmd-side runner.go imports providers/oidc as `_`
so auth.Build("oidc", settings) succeeds via factory registration.

Tests (88.6% validate, 90.8% security, 88% combined auth):
- Validation: every provider type, missing required fields,
  unknown type, missing type, duplicate names warning,
  ValidateForgeConfig integration
- AuthDomains: oidc issuer, http_verifier URL, JWKS-URL override,
  multi-provider dedup, static_token contributes nothing, malformed
  URLs skipped, port stripping, unknown provider returns nothing
- Runner: buildChainFromConfig (empty, single provider, ordered
  chain with first-match-wins, invalid type, factory error),
  buildLegacyHTTPVerifierChain, buildUserAuthChain precedence rules
  (neither / only legacy / only YAML / both prefers YAML / oidc via
  registry)

No regressions:
- All forge-core packages pass (28/28)
- All forge-cli packages pass (15/15)
- All forge-plugins packages pass (4/4)
- Race detector clean across auth/.../validate/security/runtime
- gofmt + go vet clean
- Anti-pattern greps clean (no DefaultClient, no token-in-logs,
  no plain == on tokens, oidc uses injected client only)
golangci-lint v2.10.1 flagged removeKey as unused — it was scaffolded
for a key-rotation test case that was implemented differently (by
adding a new key without removing the old). Dropping the dead code
keeps CI lint green.
Replaces the thin {method, path, remote_addr} payload of the legacy
auth_success/auth_failure events with structured fields suitable for
operators to dashboard and alert on, without leaking PII.

Events:

  auth_verify (success)
    { provider, user_id, org_id, groups_count, token_kind,
      method, path, remote_addr }

  auth_fail (failure)
    { reason, token_kind, method, path, remote_addr }

Reason codes (stable contract — changing is a breaking change for
downstream consumers):

  missing_token   no Authorization header
  rejected        provider recognized + denied (revoked, expired, wrong iss/aud)
  invalid         token malformed / cryptographically invalid
  not_for_me      chain exhausted, no provider claimed the token
  infrastructure  transient provider error (network, etc.)

Privacy invariants enforced by test:
  - email, claims, token bytes NEVER appear in either event
  - test asserts the literal absence of email / SSN / secret values
    from the serialized audit line

Interface changes:

  auth.MiddlewareOptions.OnAuth signature widened from
    func(*http.Request, bool)
  to
    func(*http.Request, *auth.Identity, error, string)
  so the caller can read the verified principal + chain error + token
  kind, instead of just a bool. Only caller (runner.go) updated.

  New: auth.ErrMissingBearer sentinel for the no-header case so callers
  can distinguish missing-token from chain errors.

  New: auth.TokenKind(token) returns "jwt" | "opaque" | "empty" — cheap
  structural classification suitable for logging.

Backward compat:

  AuditAuthSuccess and AuditAuthFailure constants kept as deprecated
  aliases (point to the new "auth_verify"/"auth_fail" strings) so any
  consumer that grep'd for the old constants compiles but emits the
  new string. Scheduled for removal in v0.11.0.

Tests (100% coverage on TokenKind, makeAuthAuditCallback, authFailReason):
  - TokenKind: jwt/opaque/empty/edge-cases
  - Middleware: OnAuth receives (Identity, err, kind) on every decision
  - Middleware: ErrMissingBearer surfaces on no-header requests
  - Audit: auth_verify carries provider/user_id/org_id/groups_count/kind
  - Audit: auth_fail carries stable reason codes for each sentinel
  - Audit: PII-leak test (email + secret claims never appear in payload)
  - Audit: CorrelationID propagates from request context
  - E2E: full middleware → callback → audit log path

No regressions:
  - All forge-core packages pass (28/28)
  - All forge-cli packages pass (15/15)
  - All forge-plugins packages pass (4/4)
  - go test -race clean across affected packages
  - golangci-lint v2.10.1: 0 issues
  - gofmt + go vet clean
Adds the Authentication step to the `forge init` wizard so users can
configure the auth chain without hand-writing forge.yaml. Same surface
is exposed via non-interactive --auth* flags for CI/script use.

Interactive flow (between Egress and Review steps):

  🔐 Authentication
    ○ None              — anonymous (default)
    ○ OIDC (JWT)        — Auth0, Keycloak, Azure AD, Google, Okta-OIDC, …
    ○ HTTP Verifier     — legacy external /verify endpoint
    ○ Custom            — write a commented stub, edit forge.yaml manually

  OIDC sub-flow: Issuer → Audience → Groups Claim (defaults to "groups")
  HTTP Verifier sub-flow: URL → Default org_id (optional)
  None / Custom: pick and continue, no inputs

Non-interactive flags:

  --auth=none|oidc|http_verifier|custom
  --auth-issuer=<url>            (required with --auth=oidc)
  --auth-audience=<aud>          (required with --auth=oidc)
  --auth-url=<url>               (required with --auth=http_verifier)
  --auth-default-org=<org>       (http_verifier, optional)
  --auth-groups-claim=<name>     (oidc, default: groups)

Scaffold output:
- Adds auth: block to forge.yaml (alphabetical keys for stable diffs)
- Auto-merges the issuer/verifier host into egress.allowed_domains
- For custom mode: writes a commented-out template stub instead
- Settings are pre-rendered as YAML in Go (init_auth.go) so nested
  maps like claim_map: {groups: roles} emit correctly without
  template helpers

UX details:
- "None" is the first option; one Enter to skip the entire step
- Backspace on empty input returns to the picker (consistent with
  other wizard steps)
- Live validation on issuer/URL fields: must be parseable + http(s)
- The wizard's existing egress preview displays the auth-host
  auto-add in the final review step

No runtime behavior change — PR3 already wired forge.yaml auth: into
the runner. This PR is pure UX: easier to author the YAML.

Tests (TUI step + scaffold helpers):
- All 4 mode flows: None (default), Custom (stub), OIDC (full + custom
  groups claim), HTTP Verifier (with/without default org)
- Issuer trailing-slash trimmed; groups claim suppressed at default
- AuthEgressHosts populated correctly per mode
- Apply() correctness for each mode
- Summary text per mode
- renderAuthBlock: empty/none/custom/oidc/http_verifier shapes,
  nested claim_map, deterministic key ordering, name suppression
  when same as type
- buildAuthFromFlags: happy paths, missing required fields, unknown
  modes, claim-map suppression at default
- mergeEgressDomains: dedupe + order preservation
- End-to-end: `forge init --auth=oidc` produces a forge.yaml that
  passes `forge validate`
- E2E negative: `--auth=oidc` without --auth-issuer fails with the
  expected error message

No regressions:
- All forge-core packages pass (28/28)
- All forge-cli packages pass (15/15)
- All forge-plugins packages pass (4/4)
- golangci-lint v2.10.1: 0 issues
- gofmt + go vet clean
Adds the matching Authentication step to the Web UI wizard (`forge ui`)
so users without a terminal can configure the auth chain through the
browser-based create-agent flow. Same backend contract as PR5 — both
surfaces feed the single scaffold path that writes forge.yaml.

Backend changes:
- forge-ui/types.go: new AuthCreateOptions (mode + settings) field on
  AgentCreateOptions; new AuthProviderTypeMeta + AuthProviderTypes on
  WizardMetadata.
- forge-ui/handlers_create.go: GET /api/wizard/meta now returns the
  four founding auth provider types so the frontend renders the
  picker from server-driven metadata (adding Okta in Phase 3 = one
  line here, no frontend change).
- forge-cli/cmd/ui.go: createFunc translates opts.Auth into the
  initOptions fields the scaffold expects (AuthMode, AuthSettings,
  AuthEgressHosts).
- forge-cli/cmd/init_auth.go: new authEgressHostsFromSettings helper
  extracts issuer/verifier hosts for the Web UI path (same hosts the
  TUI wizard and --auth flags compute).

Frontend changes (forge-ui/static/dist/app.js):
- WIZARD_STEPS extended from 9 to 10 entries (Authentication slotted
  between Env Vars and Review).
- form.auth = {mode, settings} added to initial state.
- Step renders a radio group fed by meta.auth_provider_types, with
  conditional OIDC / HTTP Verifier sub-forms.
- canNext validation: OIDC requires issuer + audience; HTTP Verifier
  requires url.
- buildAuthPayload reshapes the wizard's flat groups_claim into the
  nested claim_map: {groups: ...} shape before submission; strips
  empty optional fields and trims trailing slashes from issuer.
- Review step adds an "A2A Auth" row with a one-line summary.

UX details:
- None is the default — anonymous unless the user picks otherwise.
- Switching modes resets settings so stale fields from a previously
  selected mode don't leak into the submitted payload.
- Custom mode writes a commented stub the user can edit later.

Backward compat:
- Requests without `auth` continue to work (anonymous access).
- AuthCreateOptions.Settings uses `map[string]any` so future provider
  types add fields server-side without breaking older payloads.

Tests:
- handleGetWizardMeta now asserts all 4 auth provider types are
  exposed with non-empty label + description.
- New TestHandleCreateAgent_WithAuthPayload: AgentCreateOptions.Auth
  round-trips through the JSON boundary with correct settings.
- New TestHandleCreateAgent_WithoutAuthPayload: omitted auth field
  keeps Auth nil (no spurious anonymous-mode payload).
- New authEgressHostsFromSettings tests: oidc, oidc+jwks_url,
  http_verifier, no-network modes, nil settings, malformed URL.
- End-to-end smoke (manual): forge ui POST /api/agents with auth
  payload → generated forge.yaml has auth: block + login.example.com
  auto-added to egress.allowed_domains.

No regressions:
- All 4 modules pass (forge-core, forge-cli, forge-plugins, forge-ui)
- go test -race clean on touched packages
- golangci-lint v2.10.1: 0 issues
- gofmt + go vet clean
@initializ-mk
Copy link
Copy Markdown
Contributor

Code Review

Reviewed against the two-phase plan (Phase 1: static_token + http_verifier with --auth-url backward compat; Phase 2: OIDC + Okta). Per the decision to ship both phases together, findings are consolidated below in priority order.

Architecture and security defenses are largely sound. The items below are fixable without redesign.


Blockers — security / correctness

1. JWKS cache TTL is dead code

forge-core/auth/providers/oidc/jwks.go:4877–4881ttl/lastFetch are set but never read. Rotated or compromised keys never expire until process restart. The doc claims "we refresh on miss," but a key removed from the IdP's JWKS never invalidates locally. Either implement TTL-driven refresh or drop the field. High severity.

2. Issuer trailing-slash inconsistency

forge-core/auth/providers/oidc/provider.go:5276 vs forge-core/auth/providers/oidc/discovery.go:4747. newDiscovery trims / from the configured issuer, but jwt.WithIssuer and the discovery-doc compare use mixed-slash forms. Any IdP that emits iss with a trailing slash gets spurious rejections. Fix: normalize in Config.New so all comparisons use the same form.

3. Middleware silently disables auth when Chain == nil

forge-core/auth/middleware.go:2895 returns a no-op wrapper. A misconfigured runner that forgets to wire a chain silently runs unauthenticated with no panic. Fail-loud (panic or require explicit AnonymousChain) is safer.

4. --no-auth + YAML Required: true silently grants anonymous access

forge-cli/runtime/runner.go:1753 short-circuits before checking whether cfg.Auth.Required is set or cfg.Auth.Providers is non-empty. A user who set auth: { required: true, providers: [...] } and ran --no-auth gets anonymous access with no warning. At minimum warn; refuse outright when Required: true.

5. JWKS refresh failure does not update cooldown

forge-core/auth/providers/oidc/jwks.go:4933 — when refreshLocked errors, lastMissRefresh is not updated. Every subsequent unknown-kid request hammers the IdP again. For a flapping IdP, this is a DDoS amplifier. Update lastMissRefresh (or a separate lastFailedRefresh) on error too.

6. httpverifier classifies 5xx/network errors as ErrInvalidToken

forge-core/auth/providers/httpverifier/provider.go:4227, 4239. Fail-closed direction is correct, but the audit signal is misleading: operators reading audit logs see "invalid token" when the IdP is down. Consider a separate ErrProviderUnavailable.

7. auth_domains.go strips port via u.Hostname()

forge-core/security/auth_domains.go:88 — for issuers like https://login.example.com:8443, only the hostname is added to the egress allowlist. Verify the enforcer is port-agnostic, or return u.Host (host:port).


Blockers — process / maintainability

8. forge-ui/static/dist/app.js hand-edited under dist/

2756-line bundle file with no source file, no package.json, no bundler, no sourcemap in the repo. 169 lines of new auth UI live in a path that conventionally signals "generated artifact, don't edit." Either rename to static/app.js to remove the misleading dist/ directory, or introduce a real source folder + minimal esbuild step. Will only get worse as more wizard steps land.

9. Web UI does not validate auth payload server-side

forge-ui/handlers_create.go:135–160 decodes JSON, checks only name and model_provider, then hands raw opts.Auth (including arbitrary Settings map[string]any) to CreateFunc. validate.ValidateAuthConfig exists but is not invoked on this path. A buggy client could write empty issuer/audience into YAML and only fail at forge run. Cheap fix: call ValidateAuthConfig in handleCreateAgent before delegating.

10. Loopback "ALWAYS prepended" doc overstates the code

forge-cli/runtime/runner.go:1744 — the prepend is conditional on r.authToken != \"\". Currently works in practice because ResolveAuth always mints a token, but any future refactor that short-circuits that path will silently break channel adapters. Tighten doc to "whenever an internal token was minted" or unconditionally mint one before the prepend.


Smaller bugs

  • statictoken length leakforge-core/auth/providers/statictoken/provider.go:6567. subtle.ConstantTimeCompare is not constant-time across differing lengths. Worth a comment or hash-then-compare.
  • OIDC PSS default-when-empty algforge-core/auth/providers/oidc/jwks.go:5028–5031. JWKS entry with no alg field defaults to RS256; legitimate PS-signed tokens mismatch the cross-check at provider.go:5348.
  • mergeEgressDomains breaks sort orderforge-cli/cmd/init.go:1048. deriveEgressDomains returns sorted, then auth hosts get appended unsorted at the end. Diff stability suffers.
  • renderAuthBlock name parameter is deadforge-cli/cmd/init.go:1053 calls with mode twice, so name: never renders. Either drop the parameter or wire a real name source.
  • TUI esc mid-input aborts the whole wizardforge-cli/internal/tui/wizard.go:131–135. Backspace-on-empty is the documented escape hatch but undiscoverable.
  • Identity.Claims is unfiltered shallow copy of JWTforge-core/auth/providers/oidc/claims.go:4612. Custom claims could leak into downstream consumers. Worth a doc warning.

Test gaps

  1. Warning-log assertion on YAML/--auth-url precedenceforge-cli/runtime/auth_config_test.go:225 uses a nop logger that discards the Warn. The "warns when both configured" half of the contract is unasserted. Capture warnings into a slice-logger.
  2. No test for --no-auth + YAML auth precedence (Bug Tavily web search, Bubble Tea wizard, and bug fixes #4).
  3. No test that loopback static_token is actually prepended in the full resolveAuth flow with YAML auth — only buildLegacyAuthChain is exercised in auth_chain_test.go.
  4. No TestAuthDomains in forge-core/security/ at all. auth_domains.go has zero direct unit tests despite being the egress contract — add coverage for bare hostname, port, malformed URL, dedup.
  5. No web-handler test for buildAuthPayload translating flat groups_claim → nested claim_map. Since app.js is hand-edited, this is risky.
  6. No wizard test for backspace-to-picker behavior in step_auth.go.
  7. No test for invalid URL submission on the OIDC issuer wizard step (validation lives in validateHTTPSURL, tested standalone, but not as a wizard interaction).
  8. renderAuthBlock YAML quotingwriteYAMLMap (forge-cli/cmd/init_auth.go:194) writes key: value raw. Values containing : followed by space, #, leading !, &, * could break the YAML. Low risk but a hardening gap with no test.

What's strong (worth keeping as-is)

  • Alg-confusion defense is layered and correct — three independent checks: WithValidMethods at parser construction, isAllowedAlg pre-JWKS, and key.Alg != headerAlg cross-check. Alg sourced from JWKS, not token header.
  • HMAC and alg=none rejected at three layers including JWKS load time — even a tampered JWKS source can't inject HMAC keys.
  • ChainProvider fail-closed semantics — exactly what the docstring promises (chain_provider.go:2685–2691); PrependChain correctly flattens nested chains.
  • PII negative-substring assertion in audit testforge-cli/runtime/auth_audit_test.go:123–154 is exactly the right shape, including key-name assertions (\"claims\", \"email\").
  • Single source-of-truth scaffold path — TUI, --auth flags, and Web UI all converge through initOptions.{AuthMode,AuthSettings,AuthEgressHosts}. Server-driven auth_provider_types metadata in handlers_create.go:124–129 makes adding Okta in Phase 3 a one-line server change.
  • Validation cleanly decoupled from provider implementationsforge-core/validate/auth.go:9–15 comment explains the trade-off well. Keeps validate import-free of crypto.
  • Pre-rendering YAML in Go rather than template helpers — right call for nested maps (e.g., claim_map). Alphabetical sort makes diffs stable.
  • Test coverage — auth/ 94.9%, httpverifier 90.9%, statictoken 96.6%, oidc 83.3%. Race-clean across the board.

Verified PR claims (all check out)

  • alg=none and HMAC algorithms rejected at three layers ✓
  • Algorithm derived from JWKS-declared alg, not token header (alg-confusion defense) ✓
  • iss exact match, aud contains check with azp == ClientID fallback ✓
  • exp/nbf with configurable ClockSkew ✓
  • JWKS cache with sync.RWMutex hot path ✓
  • Stampede protection via refetch-grace ✓ (but does not cover refresh failures — see Resolve Custom Tool Entrypoint Relative to Working Directory in Runner #5)
  • Constant-time compare for static_token ✓ (with caveat — see smaller bugs)
  • ChainProvider fail-closed on non-yield errors ✓
  • No http.DefaultClient, no token in logs, no plain == on tokens ✓
  • Backward compatibility: --auth-url, FORGE_AUTH_URL, FORGE_AUTH_ORG_ID, channel adapter loopback all preserved ✓
  • Deprecated AuditAuthSuccess/AuditAuthFailure aliases point to new event names ✓
  • ErrMissingBearer sentinel exists and distinguishes missing-header from chain errors ✓
  • TokenKind() returns \"jwt\" | \"opaque\" | \"empty\" without leaking token bytes ✓
  • Side-effect import _ \"…/providers/oidc\" in runner.go:24

Naveen Kurra added 4 commits May 23, 2026 14:15
Before this fix the JWKS cache's ttl/lastFetch fields were set but never
read — the cache only refreshed on unknown-kid lookups. Real-world IdPs
revoke keys by REMOVING them from the JWKS document, which means a stolen
private key remained trusted by Forge until process restart. The
"refresh-on-miss" semantics handle key ROTATION (new kid arrives) but
not key REVOCATION (kid silently disappears).

Reviewer flagged: forge-core/auth/providers/oidc/jwks.go — high severity.

Fix:
- Split refresh-time state into two fields:
    lastSuccessful — drives TTL accounting; only advanced on success
    lastAttempt    — drives error backoff; advanced on every attempt
  This stops a failed refresh during an IdP outage from silently
  extending trust in already-cached keys.
- Get() now triggers a refresh when (now - lastSuccessful) > ttl, on
  the request path. A stale cache treats every kid as a miss so the
  refresh actually runs.
- backoffActive() suppresses retry storms during JWKS outages
  (refetchGrace window — 30s default).
- Successful refresh clears lastMissRefresh, so a previously-unknown
  kid can be re-attempted immediately after a successful reload.
- Injectable clock (`now func() time.Time`) so tests exercise TTL
  expiry without time.Sleep.

Tests (all -race clean):
  TestJWKS_TTLExpiryForcesRefresh
    Same kid, second call within TTL → no refetch.
    Advance virtual time past TTL → refetch happens.
  TestJWKS_RevokedKeyEventuallyRejected
    Sign with key A → accept.
    IdP removes key A from JWKS.
    Within TTL → still accepted (acceptable revocation lag).
    Past TTL → refresh picks up new JWKS → token rejected (ErrInvalidToken).
  TestJWKS_FailedRefreshDoesNotExtendTTL
    Warm cache → fail JWKS → past TTL → refresh fails (returns error).
    Endpoint recovers → past refetchGrace → next call retries refresh
    successfully (proves lastSuccessful was NOT bumped by the failure).

Misleading doc comment removed; new struct comment documents the
two-timestamp design and exact refresh triggers.

Backward compat: no API changes. Default TTL (1 hour) and min TTL
(5 min) clamp unchanged. Phase 1 OIDC behavior preserved otherwise.

Verification:
  go test -race ./forge-core/auth/providers/oidc/  — all green
  go test -count=1 forge-core/... forge-cli/... forge-plugins/... forge-ui/...
    — 47/47 packages pass
  golangci-lint v2.10.1                              — 0 issues
…itializ#2)

Before: newDiscovery trimmed the configured Issuer, but the
discovery-doc check compared the trimmed value against the IdP's
RAW emitted form, and jwt.WithIssuer received the user's RAW configured
form. Result: any IdP that emits iss with a trailing slash (Auth0) was
spuriously rejected against a config without slash, and vice versa.

Reviewer flagged: provider.go vs discovery.go.

Fix:
- New Config.normalize() trims trailing slash from Issuer; called once
  in New() so every downstream comparison uses the same canonical form.
- Discovery comparison now trims meta.Issuer before checking against
  the already-trimmed d.issuer.
- JWT iss validation: replaced jwt.WithIssuer (strict equality, no
  trim option) with a manual checkIssuer that trims both the token's
  iss claim and the configured Issuer before comparing.

Security: trimming a single trailing slash cannot collapse distinct
issuers — "https://attacker.example" and "https://legit.example"
remain distinct after trim. The trim is normalization, not weakening.
TestIssuer_GenuineMismatchStillRejected guards this invariant.

Backward compat: existing configs continue to work; new configs with
trailing slashes now also work. JWT iss claim with trailing slash now
also works.

Tests:
  TestIssuer_ConfigWithoutSlash_TokenWithSlash  (Auth0-shaped iss + Okta-shaped config)
  TestIssuer_ConfigWithSlash_TokenWithoutSlash  (reverse)
  TestIssuer_BothWithoutSlash_StillWorks        (regression — common case)
  TestIssuer_BothWithSlash_StillWorks           (regression — Auth0 paired)
  TestIssuer_GenuineMismatchStillRejected       (security guard)
  TestIssuer_TokenMissingIssClaim               (no iss → ErrTokenRejected)
  TestDiscovery_TrailingSlashTolerated          (discovery doc has slash, config doesn't)

Verification:
  go test -race ./forge-core/auth/providers/oidc/  — all green
  full sweep — 47/47 packages pass
  golangci-lint v2.10.1                            — 0 issues
…ializ#3)

Reviewer flagged: forge-core/auth/middleware.go silently returned a
no-op wrapper when opts.Chain == nil. A misconfigured runner that
forgot to wire a chain would silently serve unauthenticated requests
in production. High severity — the open-prod-endpoint failure mode
is the highest-impact misconfiguration in the auth subsystem.

Fix:
- New MiddlewareOptions.AllowAnonymous explicit opt-in (default false).
- Middleware() panics at construction when Chain == nil && !AllowAnonymous.
  Fails loudly at startup, not silently at the first real request.
- Runner updated to set AllowAnonymous=true in the two legitimate
  anonymous paths:
    1. --no-auth flag (operator explicit opt-in)
    2. No auth: block AND no --auth-url AND no channels (legacy local
       dev default — preserved for backward compat)

Backward compat:
- Existing deployments with a configured chain — unchanged.
- Deployments with --no-auth — unchanged (runner sets the flag).
- Deployments that ran anonymous by accident — now panic at startup
  with a clear, actionable message naming the flag to set. This IS
  a behavior change, by design.

Tests (all -race clean):
  TestMiddleware_NilChainPanicsWithoutAllowAnonymous
    Asserts the panic happens AND the message mentions AllowAnonymous
    so operators know how to fix it.
  TestMiddleware_NilChainWithAllowAnonymousPassesThrough
    Counterpart: explicit opt-in works.
  TestMiddleware_NonNilChainIgnoresAllowAnonymous
    AllowAnonymous is only consulted when Chain is nil; a configured
    chain always enforces auth.

The existing PR1 "nil chain passes through" test updated to use the
new AllowAnonymous flag. The E2E TestE2E_NoAuthConfigured_AnonymousAccess
test driver sets AllowAnonymous=true automatically when given a nil
chain.

Verification:
  go test -race ./forge-core/auth/... ./forge-cli/runtime/ — all green
  full sweep — 47/47 packages pass
  golangci-lint v2.10.1 — 0 issues
…initializ#4)

Reviewer flagged: a user who configures forge.yaml with
'auth: { required: true, providers: [...] }' and then runs the agent
with --no-auth gets anonymous access silently. The flag and the YAML
block were treated independently. The --no-auth happy-path short-circuit
returned an anonymous middleware before consulting cfg.Auth at all.

This is the second-highest-impact misconfiguration class in the auth
subsystem (open prod endpoint by mistake), and it's worse than the
nil-Chain case (initializ#3) because here the operator deliberately wrote
'required: true' into their config.

Fix (graduated severity):
  auth.required == true  + --no-auth  →  startup error (refuse)
  providers populated    + --no-auth  →  warning (operator intent
                                        unclear; we proceed but make
                                        the override visible)
  empty auth block       + --no-auth  →  unchanged (anonymous, no log)

Error message names what's wrong AND lists the three ways to fix it:
  - remove --no-auth
  - set 'auth.required: false'
  - delete the 'auth:' block

Warning includes provider count so dashboards can alert.

Tests (all -race clean):
  TestResolveAuth_NoAuthWithRequiredFails
    auth.required=true + --no-auth → error; message contains
    "--no-auth conflicts with".
  TestResolveAuth_NoAuthWithProvidersWarns
    providers set, required=false + --no-auth → no error, warning
    emitted containing "--no-auth overrides".
  TestResolveAuth_NoAuthWithEmptyYAMLConfig_NoWarning
    Regression: --no-auth alone (no auth: block) still works without
    spurious warning.
  TestResolveAuth_NoAuthWithRequiredFalseAndProviders_WarnsNotFails
    Explicit Required: false is respected — warn, don't refuse.

New recordLogger test helper captures Warn/Info calls so assertions
can check what the runner emitted; mirrors the no-op nopLogger pattern
already in use.

Backward compat:
  - --no-auth alone (no auth: block): unchanged
  - --no-auth + providers + no required: warns, still anonymous
  - --no-auth + required: true: BREAKING — refuses to start. This is
    intentional; the previous behavior silently exposed a "required
    auth" deployment as anonymous, which the user explicitly tried
    to prevent in their forge.yaml.

Verification:
  go test -race ./forge-core/auth/... ./forge-cli/runtime/ — green
  full sweep — 47/47 packages pass
  golangci-lint v2.10.1 — 0 issues
@initializ-mk
Copy link
Copy Markdown
Contributor

Re-review after fix commits

Verified the 4 fix commits against the working tree and ran go test -race ./forge-core/auth/... ./forge-cli/runtime/ — all green.

✅ Fixed and verified

# Item Commit Verdict
1 JWKS TTL dead code 379d578 Correct. Two-timestamp design (lastSuccessful/lastAttempt) implemented as documented. Failed refresh does NOT advance lastSuccessful — verified at jwks.go:230–290. TestJWKS_FailedRefreshDoesNotExtendTTL (provider_test.go:1000) is the strongest assertion. Injectable clock works.
2 Issuer trailing slash 3a7a131 Correct. Config.normalize() (provider.go:113–116) called in New(). Manual checkIssuer trims both sides. TestIssuer_GenuineMismatchStillRejected guards the security invariant. All 7 tests present.
3 Middleware nil-chain panic 19f7251 Correct. Panic at construction (middleware.go:90–95) with AllowAnonymous named in message. Runner wires the flag in both legitimate paths (runner.go:1778–1781, 1810–1815). 3 tests present including the "configured chain ignores AllowAnonymous" guard.
4 --no-auth + YAML cross-check b33522d Correct. Three-way severity at runner.go:1753–1782. Error message lists all 3 remediation paths. Warning includes provider count. 4 tests + recordLogger helper present.
5 JWKS refresh-failure cooldown (subsumed by #1) Fixed as side-effect. errBackoff check at jwks.go:175 suppresses unknown-kid stampedes after a failed refresh within refetchGrace. Note: during the backoff window, even valid-but-uncached kids are denied — deliberate trade-off, worth a release note.
TestAuthDomains unit tests (added) forge-core/security/auth_domains_test.go — 9 tests, 136 lines.

❌ Still outstanding

Blockers

  1. httpverifier 5xx classificationforge-core/auth/providers/httpverifier/provider.go:131, 143 still wrap transport errors and non-200/non-401 responses (including 5xx) as ErrInvalidToken. No ErrProviderUnavailable sentinel introduced. Audit signal is wrong when the verifier is down.

  2. auth_domains.go port strippingforge-core/security/auth_domains.go:88 still calls u.Hostname(). This one needs a decision, not just a fix: TestAuthDomains_PortStripped (auth_domains_test.go:112–125) now asserts want := []string{"localhost"} for http://localhost:8080/... with message "port must be stripped" — so this is now codified as intended behavior. Please confirm: is the egress matcher port-agnostic? If yes, the test + code are correct and this can be closed. If no, an issuer on :8443 will be silently blocked at JWKS-fetch time and the fix needs to flow through both auth_domains.go and the test.

  3. forge-ui/static/dist/app.js — still hand-edited under dist/ with no src/, no package.json, no build script, no README. The maintenance trap is unchanged.

  4. Web UI server-side validationforge-ui/handlers_create.go:135–175 still only checks Name/ModelProvider before delegating. validate.ValidateAuthConfig is not called on opts.Auth despite being importable. A client can post empty issuer/audience and only fail at forge run.

  5. Loopback "ALWAYS prepended" docforge-cli/runtime/runner.go:1744 doc still says "ALWAYS prepended" while the code at :1792 is conditional on r.authToken != "". Either tighten the doc or make the code unconditional.

Smaller bugs

  • statictoken length-leak commentforge-core/auth/providers/statictoken/provider.go:100 unchanged. One-liner comment about subtle.ConstantTimeCompare's length-mismatch fast-path.
  • mergeEgressDomains sort stabilityforge-cli/cmd/init_egress.go:25–43 preserves insertion order, no sort.Strings on the result. forge.yaml allowed_domains stays "sorted base + appended auth hosts."
  • renderAuthBlock dead name parameterforge-cli/cmd/init.go:1053 still calls renderAuthBlock(opts.AuthMode, opts.AuthMode, ...), so the name != mode branch at init_auth.go:151 is dead.

Test gaps

  • Warning-log assertion on YAML/--auth-url precedenceforge-cli/runtime/auth_config_test.go:226 (TestBuildUserAuthChain_BothSourcesPrefersYAML) still uses nopLogger{}. The new recordLogger helper (added at line 349 for the --no-auth cross-check tests) wasn't retrofitted onto this test. Cheap to fix.
  • Web-handler test for buildAuthPayloadclaim_map transformation — still no Go-side test. The translation only lives in hand-edited app.js, which has no test coverage.

Summary

The 4 top blockers are cleanly resolved. The TTL fix elegantly subsumes the refresh-cooldown concern (#5). 5 blockers + 3 smaller bugs + 2 test gaps remain. None of the outstanding items are crypto/correctness in the JWT/JWKS path — they're now concentrated in the egress, audit signal, web UI surface, and maintenance areas.

Most actionable next pass: (a) decide on the port-strip question (#2), (b) add server-side validation in handlers_create.go (#4), (c) decide the dist/app.js story (#3). The rest are quick mechanical fixes.

Naveen Kurra added 9 commits May 23, 2026 14:36
…nitializ#5)

Reviewer flagged: when refreshLocked errors, lastMissRefresh is not
updated → every subsequent unknown-kid request hammers the IdP again
→ a flapping IdP becomes a DDoS amplifier (N callers × broken IdP =
N IdP hits per request burst).

The reviewer was looking at the pre-fix-initializ#1 code. Fix initializ#1 (commit 379d578)
addressed this concern through a DIFFERENT mechanism: it added a
lastAttempt + lastErr pair and a backoffActive() guard, so any refresh
error (not just unknown-kid-after-refresh) suppresses subsequent
refresh attempts inside the refetchGrace window.

This commit makes the no-amplifier property explicit by:

1. Adding TestJWKS_FailedRefreshDoesNotHammerIdP — fires 50 unknown-kid
   requests during a backoff window and asserts the JWKS endpoint
   receives exactly ONE hit (the initial refresh that failed). Then
   advances time past refetchGrace and asserts the next request
   triggers EXACTLY ONE retry — and the burst after that retry stays
   capped at one IdP call again.

2. Adding a doc comment on backoffActive() that names the DDoS-amplifier
   concern explicitly and references the test that pins the contract,
   so future maintainers don't accidentally remove the gate.

No code behavior change — fix initializ#1 already produced the right behavior;
this commit makes the test coverage match.

Verification:
  go test -race ./forge-core/auth/providers/oidc/ — green (incl. new test)
  full sweep — 39/39 packages pass
  golangci-lint v2.10.1 — 0 issues
…nitializ#6)

Reviewer flagged: httpverifier classified 5xx + network errors as
ErrInvalidToken. Fail-closed direction was correct, but the audit signal
was misleading — operators reading audit logs saw "invalid token" alerts
firing when the actual problem was the IdP being down. Different
response, different runbook, distinct alert.

Fix:
- New sentinel auth.ErrProviderUnavailable. Same chain behavior as
  ErrInvalidToken (stop, fail-closed, 401 response), but distinct
  audit signal.
- httpverifier reclassifies its error mapping:
    HTTP 200 + valid:true   → Identity (unchanged)
    HTTP 200 + valid:false  → ErrTokenRejected (unchanged)
    HTTP 401                → ErrTokenRejected (unchanged)
    HTTP other 4xx          → ErrTokenRejected (NEW: was ErrInvalidToken)
    HTTP 5xx                → ErrProviderUnavailable (CHANGED)
    Network/transport error → ErrProviderUnavailable (CHANGED)
    Undecodable body        → ErrProviderUnavailable (CHANGED;
                              200-but-garbage is verifier misbehavior,
                              not a token issue)
    Local marshal/build err → ErrInvalidToken (unchanged; never the
                              verifier's fault)
- middleware.classifyAuthFailure: new branch returns "auth provider
  unavailable" as the client-visible message.
- runner.authFailReason: new reason code "provider_unavailable" in
  the audit event. Documented in the function's reason-code table.

Audit-event contract additions:
  auth_fail { reason: "provider_unavailable", ... }

Operators dashboarding on `reason:provider_unavailable` can now alert
on IdP downtime distinctly from token issues.

Tests (all -race clean):
  TestVerify_500_ReturnsProviderUnavailable          (httpverifier)
  TestVerify_502BadGateway_ReturnsProviderUnavailable
  TestVerify_NetworkError_ReturnsProviderUnavailable
  TestVerify_Non401_4xx_ReturnsRejected              (regression: 400/403 stay rejected)
  TestVerify_UndecodableBody_ReturnsProviderUnavailable
  TestAuthAudit_EmitsAuthFailOnError                 (new "provider_unavailable" row)
  TestClassifyAuthFailure                            (new "auth provider unavailable" message row)

Backward compat:
- Chain behavior preserved (ErrProviderUnavailable fail-closed like
  ErrInvalidToken).
- Existing client retry logic that retried on "invalid token" responses
  will still retry on "auth provider unavailable" (both are 401-class).
- ErrInvalidToken sentinel preserved for token-side failures elsewhere
  (oidc malformed JWT, bad kid, etc.) — unchanged.

OIDC provider not updated in this commit. Its JWKS-fetch failures
currently surface via the chain's "any other error fails closed" path,
mapped to authFailReason "infrastructure". Could be tightened to
ErrProviderUnavailable in a follow-up, but reviewer's specific concern
was the misleading "invalid token" signal from httpverifier — OIDC
already produces "infrastructure" which is at least not misleading.

Verification:
  go test -race forge-core/auth/... forge-cli/runtime/ — green
  full sweep — all packages pass
  golangci-lint v2.10.1 — 0 issues
…iz#7)

Reviewer asked: is the egress matcher port-agnostic? If yes, the
current TestAuthDomains_PortStripped behavior is correct; if no, an
issuer on :8443 is silently blocked at JWKS-fetch time.

Investigation result: YES, the egress matcher IS port-agnostic. All
three callsites strip the port off the outbound host before passing
to matcher.IsAllowed:

  - egress_enforcer.go:40 — req.URL.Hostname() before IsAllowed
  - egress_proxy.go:210   — extractHost() via net.SplitHostPort
  - safe_dialer.go:42     — net.SplitHostPort(addr)

So auth_domains.go emitting hostname-only entries is correct and
consistent. No code fix needed.

What WAS missing: the cross-package contract was implicit. If someone
later hardens the matcher to be port-aware ("allow :443 but not :8080"),
auth_domains.go would silently break — every forge.yaml that points at
a non-443 IdP would suddenly fail JWKS fetches.

This commit:
  - Adds a doc comment on hostFromURL() naming the three callsites
    whose port-stripping behavior we depend on. If any of them changes,
    auth_domains needs to follow.
  - Adds TestAuthDomains_AssumesPortAgnosticMatcher that asserts the
    contract directly: builds a matcher with the host AuthDomains
    produces, checks the hostname-only form is allowed, AND checks
    that host:port form is NOT (yet) allowed. If a future change makes
    the matcher port-aware, the second assertion flips and the test
    flags the cross-package break.

No production code changes. The fix is preventative documentation +
test, not behavior change.

Verification:
  go test -race ./forge-core/security/ ./forge-core/auth/... — green
  full sweep — all packages pass
  golangci-lint v2.10.1 — 0 issues
…nitializ#8)

Reviewer flagged: forge-ui/static/dist/app.js is a 110KB hand-edited
ES-module file living under a path (dist/) that conventionally signals
"generated build artifact, don't touch." There is no source folder, no
package.json, no bundler config, no sourcemap. Every UI change has been
a direct edit to the "dist" file. This will only get worse as more
wizard steps land.

Decision: rename out of dist/ rather than introduce a bundler.

A bundler (esbuild, etc.) would add real operational debt — package.json,
npm install in CI, version pinning, sourcemap rotation, build step in
the Makefile, contributor onboarding cost — for files that are
hand-written single-file ES modules with no JSX, no TypeScript, no
imports beyond htm/preact CDN globals. The file IS the source. Removing
the misleading dist/ name communicates that honestly.

Changes:
  - git mv app.js / style.css / index.html / monaco/ up one level
    from forge-ui/static/dist/ → forge-ui/static/. History preserved
    via git rename detection.
  - static/embed.go: //go:embed all:dist → //go:embed app.js style.css
    index.html monaco. Explicit list keeps embed.go itself out of the
    served assets and documents the asset surface.
  - server.go: drop fs.Sub(static.FS, "dist") — the FS is now rooted
    directly at assets. Removes the unused "io/fs" import. The SPA
    fallback's static.FS.Open() replaces distFS.Open().

Manual smoke (against built binary):
  - GET /              → 200 text/html (index.html)
  - GET /app.js        → 200 text/javascript, 110469 bytes (unchanged)
  - GET /style.css     → 200
  - GET /monaco/editor.js → 200, 2317605 bytes
  - GET /create        → 200 (SPA fallback still works)
  - GET /api/wizard/meta auth_provider_types length == 4 (PR6 unaffected)

Future: if app.js ever does grow JSX/TS/minification needs, that's the
right time to introduce a build step — into a NEW static/src/ directory
producing into static/build/ or similar. Adding it preemptively today
is over-engineering.

Verification:
  go build forge-ui/... — green
  full sweep — 39/39 packages pass
  golangci-lint v2.10.1 — 0 issues
  manual end-to-end smoke test of all asset paths — all 200
…nitializ#9)

Reviewer flagged: handleCreateAgent decoded JSON, checked only name and
model_provider, then handed opts.Auth (with an arbitrary Settings
map[string]any) straight to CreateFunc. The validate package's
ValidateAuthConfig existed but was never called on this path — so a
buggy frontend could POST an oidc payload missing issuer/audience, the
agent directory would scaffold cleanly with a malformed auth: block,
and the error would only surface when the user ran `forge run`. By
then the directory, .env.example, and channel tokens are all on disk.

Fix:
- New validateAuthPayload(opts.Auth) helper in handlers_create.go.
- Called from handleCreateAgent immediately after the name/provider
  checks, BEFORE CreateFunc runs.
- Translates AuthCreateOptions → types.AuthConfig exactly the way
  cmd/ui.go's createFunc does, so this check sees the same shape the
  eventual scaffold will. No drift between "wizard accepts" and
  "forge validate accepts".
- 400 with the validation errors joined as the response message.
- Modes "" / "none" / "custom" skip validation — they don't write
  provider entries, nothing to validate.

Validation surface (delegated to validate.ValidateAuthConfig):
  - oidc:          issuer + audience required
  - http_verifier: url required
  - static_token:  token or token_env required
  - unknown type:  rejected with list of known types

The frontend's pre-submit checks (canNext in app.js step 8) already
guard the happy path, but those are defense in depth at best — anyone
with curl can bypass them. This is the server-side enforcement that
matters.

Tests:
  TestHandleCreateAgent_OIDCMissingIssuerRejected      400, error names "issuer"
  TestHandleCreateAgent_OIDCMissingAudienceRejected    400, error names "audience"
  TestHandleCreateAgent_OIDCBothFieldsAccepted         201 (regression)
  TestHandleCreateAgent_HTTPVerifierMissingURLRejected 400, error names "url"
  TestHandleCreateAgent_UnknownAuthModeRejected        400, error names unknown type
  TestHandleCreateAgent_NoneAndCustomSkipValidation    201 for mode=none and mode=custom
  TestHandleCreateAgent_WithoutAuthPayload             201 with nil Auth (regression)

Each failure test also asserts that CreateFunc was NOT called — the
agent directory must never exist on disk for malformed auth.

Verification:
  go test -race ./forge-ui/ — green
  full sweep — all packages pass
  golangci-lint v2.10.1 — 0 issues
…tializ#10)

Reviewer flagged: resolveAuth's comment claimed the loopback static_token
is "ALWAYS prepended at the chain head (regardless of source)", but the
actual code is conditional on r.authToken != "". It works in practice
only because ResolveAuth() always mints a token in the non-NoAuth path.
Any future refactor that short-circuits that path would silently break
channel-adapter callbacks — no compile error, no test failure, just
"my Slack bot stopped working after we updated forge" weeks later.

Fix is documentation + an invariant test. The conditional itself is
correct (--no-auth path returns early before the prepend, and that path
also doesn't mint a token), so the code stays as-is. What was missing
is making the upstream dependency explicit so a future maintainer
either preserves the invariant or notices when they don't.

Changes:
- resolveAuth doc: replaced "ALWAYS prepended" with an explicit
  description of the precondition (ResolveAuth() must have minted a
  token) and a reference to the pinning test. Inline comment at the
  prepend site explains why the conditional is defensive, not
  load-bearing.
- ResolveAuth() doc: states the invariant it maintains (after nil
  return, either r.authToken != "" or r.cfg.NoAuth is true) and names
  the downstream consumer + test.
- New tests:
    TestResolveAuth_InvariantMintsTokenInNonNoAuthPath
      Direct invariant pin: non-NoAuth + ResolveAuth() returns nil →
      r.authToken MUST be non-empty. Failure message tells future
      maintainers what they broke and where to look.
    TestResolveAuth_NoAuthPathDoesNotMintToken
      Counterpart: --no-auth path does NOT mint. Pins the "anonymous
      mode skips token generation" property too, so neither direction
      can drift.

If someone ever refactors ResolveAuth to skip minting in the non-NoAuth
path, the invariant test fails immediately — and the failure message
points straight at the loopback-prepend dependency.

Verification:
  go test -race ./forge-cli/runtime/ — green (incl. 2 new tests)
  full sweep — all packages pass
  golangci-lint v2.10.1 — 0 issues
Six related cleanups bundled per the reviewer's grouping:

11a. statictoken length leak (provider.go)
  subtle.ConstantTimeCompare returns 0 instantly on length mismatch
  without inspecting bytes, leaking the configured token's length via
  measurable timing differences across many trials. Fix: SHA-256 hash
  both sides and constant-time compare the 32-byte digests. The
  comparison is now over equal-length inputs in every case.

  Pinned by TestVerify_MismatchedLengths_AreRejected — exercises empty,
  prefix, suffix, off-by-one, and same-length-but-different inputs.

11b. OIDC PSS-default mismatch (jwks.go, provider.go)
  parseJWK was defaulting an RSA JWK with no `alg` field to "RS256".
  The cross-check at provider.go then rejected any PS256/PS384/PS512
  token signed by the same RSA key — RFC 7518 says "infer from
  context" when alg is absent, and the same RSA key signs both RS*
  and PS* families.

  Fix: preserve empty alg through parseJWK. In the cross-check, when
  the JWK's alg is empty, fall back to trusting the token's header alg
  — which the asymmetric whitelist already validated upstream. When
  the JWK declares an alg explicitly, the strict comparison still
  applies (algorithm-confusion defense intact).

  Pinned by TestVerify_JWKWithoutAlgAcceptsAnyAsymmetric and
  TestVerify_JWKExplicitAlgStillStrict.

11c. mergeEgressDomains sort order (init_egress.go)
  deriveEgressDomains returned sorted output; mergeEgressDomains
  appended AuthEgressHosts unsorted at the tail. When AuthEgressHosts
  came from a map iteration, the rendered forge.yaml's
  egress.allowed_domains section diffed across runs even when the
  underlying input set was unchanged.

  Fix: sort the merged result. Generated forge.yaml is now stable.

  Pinned by TestMergeEgressDomains_SortedOutput.

11d. renderAuthBlock dead `name` parameter (init_auth.go, init.go)
  Caller passed opts.AuthMode for both the mode and name arguments;
  the name was suppressed by the equality check inside the function.
  The wizard never captures an explicit provider name today.

  Fix: dropped the parameter. If a future wizard step adds name
  capture, reintroduce it then. Existing tests updated; renamed
  TestRenderAuthBlock_NameSameAsModeSuppressed →
  TestRenderAuthBlock_NoNameEverEmitted with a doc-pinning intent.

11e. TUI backspace-on-empty undiscoverable (step_auth.go)
  Backspace-on-empty was the documented escape hatch out of an
  active sub-step input, but no visible affordance surfaced it.

  Fix: inline kbd hint under the input in each sub-step phase —
  "Backspace at empty   back to picker". Doesn't change behavior,
  makes it discoverable.

11f. Identity.Claims unfiltered shallow copy (claims.go, provider.go)
  The OIDC provider's copyClaims returns the full JWT claim set on
  Identity.Claims, including custom claims the issuer adds (PII, raw
  group memberships, internal IDs). Downstream consumers reading
  Identity.Claims will see all of it.

  Fix is documentation for now: WARNING on both the Identity.Claims
  field and the copyClaims helper, naming the risk and pointing future
  consumers at typed Identity fields (UserID/Email/OrgID/Groups) as
  the portable surface. Filtering is properly a future authz-layer
  concern — not something this package can do without policy input.

Verification:
  go test -race forge-core/... forge-cli/... — green
  full sweep — all packages pass
  golangci-lint v2.10.1 — 0 issues
Comment alignment in the candidate-tokens slice tripped CI's
'gofmt -l forge-core forge-cli forge-plugins' check. Mechanical fix —
gofmt re-aligns trailing comments to the same column.

CI Test job was failing here:
  Test → Check formatting → 'test -z $(gofmt -l ...)' exits 1
on the new TestVerify_MismatchedLengths_AreRejected block I added in
b3d4264 (review #11a). Local gofmt -l also reports it now; pushed.
…ializ#12)

Six coverage gaps and one hardening item from the reviewer's eight-point
batch. Two items in the original list are already covered:

  12.2 (--no-auth + YAML auth precedence) is exercised by the four
       TestResolveAuth_NoAuth* tests added in review initializ#4 commit b33522d.
  12.4 (TestAuthDomains in forge-core/security) is exercised by the
       eight TestAuthDomains_* tests in security/auth_domains_test.go
       (added in PR3 + strengthened in review initializ#7 commit cf4ae14).

The six real gaps:

12.1  Warning capture for --auth-url + YAML auth precedence
  TestBuildUserAuthChain_BothSourcesPrefersYAML was asserting the
  chain-behavior half of the contract through a nop logger that
  swallowed the Warn — silent regression on the "we logged a warning"
  half. Switched to recordLogger and added an assertion on
  "--auth-url and forge.yaml". If a refactor ever drops that warning,
  the test fails.

12.3  Full resolveAuth + YAML loopback head-of-chain test
  auth_chain_test.go exercised buildLegacyAuthChain. Nothing exercised
  resolveAuth() end-to-end with a YAML auth: block AND a minted
  internal token, then asserted "loopback is at chain head, ahead of
  YAML provider." Added TestResolveAuth_YAMLAuth_LoopbackPrependedAtChainHead
  that inspects chain.Providers() ordering directly and smoke-tests
  the loopback token verifies with the "internal" source.

12.5  Backend contract test for the frontend's claim_map translation
  app.js's buildAuthPayload translates flat groups_claim → nested
  claim_map.groups before submitting. The backend doesn't run that
  translation; if the frontend ever regresses, downstream YAML would
  silently drop the custom claim. Added two tests in the UI handler:
  - TestHandleCreateAgent_NestedClaimMapShape pins what the backend
    EXPECTS post-translation (claim_map.groups present).
  - TestHandleCreateAgent_FlatGroupsClaim_PreservedAsExtraField
    documents the failure mode if the JS regresses — the request
    succeeds structurally but claim_map is missing.
  Real JS tests would require a Node toolchain we deliberately don't
  add (see PR6 / review initializ#8). These Go tests are the next-best contract.

12.6  Backspace-to-picker behavior in step_auth.go
  Added TestAuthStep_BackspaceOnEmptyInputReturnsToPicker and a
  counterpart TestAuthStep_BackspaceWithContentDoesNotReturnToPicker
  to pin the escape-hatch behavior the review #11e hint advertises.

12.7  Invalid issuer URL blocks advance
  validateHTTPSURL was unit-tested standalone; never exercised as a
  wizard interaction. Added TestAuthStep_InvalidIssuerURLBlocksAdvance:
  bad URL + Enter → still on issuer phase; type a valid replacement
  + Enter → advances to audience.

12.8  renderAuthBlock YAML quoting hardening
  writeYAMLMap previously emitted "key: value" raw. Values containing
  ": " sequences, "#" markers, leading YAML-significant chars
  (!, *, -, [, {, etc.), reserved tokens (true/false/null/yes/no…),
  or control chars could break the parser. Added:
  - yamlScalar/needsYAMLQuoting helpers that emit double-quoted +
    escaped strings only when the input would otherwise change
    meaning. Plain values pass through unchanged so generated
    forge.yaml remains readable.
  - TestRenderAuthBlock_QuotesUnsafeValues (19 sub-cases) pinning
    the predicate's classification per input shape.
  - TestRenderAuthBlock_QuotedValueIsParseable round-trips a value
    containing ": ", "#", "!", and quotes through the renderer →
    yaml.v3 unmarshaler, asserting the rendered line is valid YAML
    AND decodes back to the original string.

Note: the renderer remains a focused subset (string / number / bool /
map). If auth-settings ever grow richer types, switch to yaml.v3 for
the whole block.

Verification:
  test -z "$(gofmt -l forge-core forge-cli forge-plugins)" — clean
  go test -race ./forge-core/... ./forge-cli/... — all green
  golangci-lint v2.10.1 — 0 issues
@naveen-kurra naveen-kurra requested a review from initializ-mk May 23, 2026 20:17
@initializ-mk
Copy link
Copy Markdown
Contributor

Round-3 review — reviews #5#12

Verified all 8 fix commits against the working tree. go test -race on all affected packages (auth, httpverifier, oidc, statictoken, runtime, cmd, ui, security, validate) — all green.

✅ All items resolved

# Item Verdict
5 JWKS no-hammering proof Verified. TestJWKS_FailedRefreshDoesNotHammerIdP (provider_test.go:1049) uses atomic.Int32 to count /jwks hits, fires 50 unknown-kid requests during backoff → asserts exactly 1 hit; advances 35s past refetchGrace → asserts 2; another burst of 50 → stays at 2. Real counters, not log greps.
6 ErrProviderUnavailable sentinel Verified end-to-end: sentinel at provider.go:146; httpverifier mapping (5xx, network err, undecodable body → ErrProviderUnavailable; 401 / non-401 4xx → ErrTokenRejected) at httpverifier/provider.go:120–171; chain semantics via the generic non-yield-error path (chain_provider.go:43–56); user-facing message in classifyAuthFailure (middleware.go:153–157); stable reason code "provider_unavailable" in runner.go:1924–1940. Four targeted tests in httpverifier.
7 Port-stripping contract Verified. auth_domains.go:81–95 documents the three matcher callsites the contract depends on. New TestAuthDomains_AssumesPortAgnosticMatcher (auth_domains_test.go:152–195) builds a real DomainMatcher and asserts hostname is allowed AND host:port form is NOT — so any future port-aware change flips this assertion loudly.
8 static/dist/ rename Verified. forge-ui/static/dist/ removed; static/ directly contains app.js, style.css, index.html, monaco/, embed.go. //go:embed directives updated; server.go:108–128 serves directly from static.FS (no fs.Sub("dist")). Convention documented in embed.go:14–17. All ui tests green.
9 Server-side auth validation Verified. handleCreateAgent calls validateAuthPayload(opts.Auth) before CreateFunc (handlers_create.go:162–165), returns 400 with the validator's detail. Translation to types.AuthConfig matches the cmd/ui.go shape. TestHandleCreateAgent_OIDCMissingIssuerRejected asserts 400 + "issuer is required" + CreateFunc not called.
10 Loopback invariant Verified. runner.go:1752–1762 doc tightened to "WHEN ResolveAuth() has minted one." ResolveAuth itself (runner.go:130–137) documents the mint invariant. Code remains conditional (correct). Pinned by TestResolveAuth_InvariantMintsTokenInNonNoAuthPath + the --no-auth counterpart in auth_chain_test.go:308–341.
11 Six smaller bugs Verified all six: statictoken now SHA-256 hashes both sides before constant-time compare (provider.go:108–123, with length-leak comment + TestVerify_MismatchedLengths_AreRejected); JWK empty-alg fallback fixed at jwks.go:306–317 + provider.go:233–243 with TestVerify_JWKWithoutAlgAcceptsAnyAsymmetric and TestVerify_JWKExplicitAlgStillStrict; mergeEgressDomains now sorts (init_egress.go:28–42); renderAuthBlock name parameter dropped; backspace hint added to TUI; Identity.Claims unfiltered-copy warning at provider.go:53–65.
12 Coverage gaps + YAML quoting Verified. TestBuildUserAuthChain_BothSourcesPrefersYAML now uses recordLogger and asserts warnedAbout("--auth-url and forge.yaml"). Backend tests for claim_map shaping (TestHandleCreateAgent_NestedClaimMapShape + …_FlatGroupsClaim_PreservedAsExtraField). Wizard tests for Backspace + invalid issuer URL. YAML quoting: new yamlScalar/needsYAMLQuoting (init_auth.go:217–274) handles leading specials, : , #, control chars, YAML 1.1 bool/null literals, empty string. Round-trip test via yaml.v3.

Minor note (acknowledged in commit message, not a blocker)

OIDC's own JWKS-fetch failures still surface as "infrastructure" rather than "provider_unavailable". Pre-existing behavior preserved; commit body of #6 calls this out as a follow-up. Reasonable scoping decision.

Test suite

ok  forge-core/auth                          1.719s
ok  forge-core/auth/providers/httpverifier   2.767s
ok  forge-core/auth/providers/oidc           8.239s
ok  forge-core/auth/providers/statictoken    3.601s
ok  forge-cli/runtime                        6.322s
ok  forge-cli/cmd                           19.215s
ok  forge-core/security                     13.125s
ok  forge-core/validate                      1.498s
ok  forge-ui                                 3.373s

Verdict

All 12 review items across three rounds are resolved with corresponding tests pinning the behavior. The two highest-risk fixes in this round — ErrProviderUnavailable (new sentinel touching chain semantics) and the static/dist/ rename (path refactor) — both look clean: the sentinel rides the existing fail-closed contract through ChainProvider without needing a special branch, and the static rename touches all //go:embed directives and server callers consistently.

LGTM — ready to merge.

@initializ-mk initializ-mk merged commit 7998f12 into initializ:main May 23, 2026
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants