feat(auth): Phase 2 — AWS Sigv4, GCP IAP, Azure AD providers (v0.11.0)#79
Open
naveen-kurra wants to merge 12 commits into
added 12 commits
May 23, 2026 19:19
…r 1)

HeadersFromRequest gains Authorization, X-Goog-Iap-Jwt-Assertion, X-Amz-Date, X-Amz-Security-Token so future providers consuming non-Bearer formats (aws_sigv4, gcp_iap) can read what they need without changing the Provider.Verify signature.

TokenKind recognizes the "AWS4-HMAC-SHA256 " prefix and returns "sigv4", so audit logs can distinguish Sigv4 requests from "empty" even though the Bearer extractor returns "".

Middleware now consults the chain even when no Bearer token was extracted, provided a non-Bearer auth header is present (Sigv4 Authorization or IAP assertion). When NO auth headers at all are present, the audit reason still resolves to ErrMissingBearer, preserving review initializ#4's stable "missing_token" reason code.

Phase 1 providers see zero behavior change; their Verify path is unchanged. All Phase 1 tests pass without modification.
…e 2 pr 2)

aws_sigv4 authenticates AWS-IAM callers by reflecting their Sigv4 signature to STS GetCallerIdentity. No aws-sdk-go-v2 dependency (decision §9.1): the STS RPC is ~150 LOC of hand-rolled HTTP + XML. Forge never holds the caller's secret key; STS validates the signature on Forge's behalf.
Key pieces:
- sigv4_parser.go: pure string parser, fuzz-tested, never panics
- sts_client.go: 200/4xx/5xx classification per review initializ#6 contract
- identity_cache.go: hash(AKID|YYYYMMDD)-keyed TTL cache, opportunistic eviction past 10k entries, Put does NOT extend prior expiry
- arn_matcher.go: shell-style globs via path.Match (decision §9.3), invalid patterns fail at Factory time
- provider.go: scope check (service=sts, region match) before any STS round trip, cache hit avoids RPC, rejection does NOT poison the cache
security:
- Algorithm: only AWS4-HMAC-SHA256 prefix is claimed
- Scope: cross-service replay (s3->sts) and cross-region replay (eu-west-1->us-east-1) rejected at parse-time
- Cache: bucketing by YYYYMMDD bounds stolen-key window to a day
- Body cap: 64 KiB on STS responses
- Logs: STS error bodies summarized at 200 chars, newlines stripped
audit:
- ErrTokenNotForMe -> not_for_me (no AWS4 prefix)
- ErrInvalidToken -> invalid (malformed Sigv4)
- ErrTokenRejected -> rejected (scope/allowlist/STS 4xx)
- ErrProviderUnavailable -> provider_unavailable (STS 5xx/network)
extras:
- security.AuthDomains gains sts.<region>.amazonaws.com (+ override host when sts_endpoint set for tests)
- forge-cli/runtime/runner.go side-effect imports aws_sigv4
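The allowlist semantics of arn_matcher.go can be sketched as follows. This is a Python illustration, not the Go implementation: it assumes only the documented behavior of Go's path.Match, where `*` and `?` never match `/`. The names `glob_to_regex` and `ArnMatcher.allowed`, and the example ARNs, are hypothetical; character classes are deliberately unsupported here.

```python
import re


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a shell glob into a regex where '*' and '?' never cross '/'."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append("[^/]*")
        elif ch == "?":
            parts.append("[^/]")
        elif ch in "[]\\":
            # Character classes and escapes are out of scope for this sketch.
            raise ValueError(f"unsupported glob character: {ch!r}")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


class ArnMatcher:
    def __init__(self, patterns):
        # Compile eagerly so a bad pattern fails at construction time,
        # mirroring "invalid patterns fail at Factory time".
        self._compiled = [glob_to_regex(p) for p in patterns]

    def allowed(self, arn: str) -> bool:
        """True when the whole ARN matches at least one pattern."""
        return any(rx.fullmatch(arn) for rx in self._compiled)
```

Because `*` stops at `/`, a pattern such as `arn:aws:sts::123456789012:assumed-role/Deploy/*` matches the assumed-role ARN that STS returns for a session, but not the `arn:aws:iam::…:role/Deploy` form of the same role.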
gcp_iap consumes the X-Goog-Iap-Jwt-Assertion header that GCP's Identity-Aware Proxy forwards on every authenticated request when Forge sits behind a GCP HTTPS load balancer with IAP enabled.
Decision §9.4: IAP issuer (https://cloud.google.com/iap) and JWKS URL (https://www.gstatic.com/iap/verify/public_key-jwk) are hardcoded. They're the only stable contract GCP exposes; an override knob would be a footgun.
key pieces:
- iap_jwks.go: ES256-only JWKS cache, TTL refresh + backoff + stale-grace (mirrors Phase 1 OIDC review initializ#1 pattern)
- provider.go: header-presence check, claims projection, iss/aud gates, sub/email required-claims check
- parseECJWKSet drops non-EC / non-P-256 / non-ES256-labeled keys during parse (defense in depth against compromised JWKS)
- alg whitelist rejects RS256 BEFORE key lookup (algorithm-confusion defense)
- aud as string OR array both parse (JWT spec allows either)
- audit reasons follow Phase 1 contract:
  rejected: iss/aud mismatch, expired, bad signature
  invalid: alg != ES256, missing sub/email, bad kid
  provider_unavailable: JWKS fetch failed AND no prior key cached
  not_for_me: header absent
extras:
- security.AuthDomains returns www.gstatic.com when gcp_iap is configured
- forge-cli/runtime/runner.go side-effect imports gcp_iap
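A rough sketch of the checks that do not need a key: the alg gate before any key lookup, the iss/aud gates with aud accepted as string or array, and the required-claims check. It is written in Python for illustration and does NOT verify the signature, so a "continue" result authenticates nothing. The function name `classify_unverified` and the "continue" marker are assumptions; the check order in the real provider may differ.

```python
import base64
import json

IAP_ISSUER = "https://cloud.google.com/iap"


def _decode_segment(segment: str) -> dict:
    """Decode one base64url JWT segment into a JSON object."""
    padded = segment + "=" * (-len(segment) % 4)
    value = json.loads(base64.urlsafe_b64decode(padded))
    if not isinstance(value, dict):
        raise ValueError("JWT segment is not a JSON object")
    return value


def classify_unverified(assertion: str, expected_aud: str) -> str:
    """Return an audit reason, or 'continue' when signature checks should run next."""
    parts = assertion.split(".")
    if len(parts) != 3:
        return "invalid"
    try:
        header = _decode_segment(parts[0])
        claims = _decode_segment(parts[1])
    except ValueError:  # covers base64, UTF-8 and JSON decoding errors
        return "invalid"
    # Algorithm gate runs before any key lookup (algorithm-confusion defense).
    if header.get("alg") != "ES256":
        return "invalid"
    if claims.get("iss") != IAP_ISSUER:
        return "rejected"
    # aud may be a single string or an array of strings.
    aud = claims.get("aud")
    audiences = [aud] if isinstance(aud, str) else aud
    if not isinstance(audiences, list) or expected_aud not in audiences:
        return "rejected"
    if not claims.get("sub") or not claims.get("email"):
        return "invalid"
    return "continue"
```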
azure_ad authenticates Microsoft Entra ID tokens. Composes the
Phase 1 oidc.Provider (decision §9.2) for signature verify + base
claim validation; layers AAD-specific concerns on top:
- Tenant lock-in via the tid claim
- Optional Microsoft Graph group enrichment when JWT groups claim
is empty (AAD truncates at ~200 groups)
- Single-tenant vs multi-tenant issuer template
key pieces:
- provider.go: composed oidc + tenant gate + Source overwrite to
"azure_ad" (replaces the inner "oidc" stamp)
- tenant.go: ExtractTenantID — typed accessor for the tid claim
- graph_client.go: Graph /me/transitiveMemberOf with pagination,
same-host enforcement (rejects redirect attacks), 401/403 ->
ErrTokenRejected, 5xx -> ErrProviderUnavailable, defensive
cap at 5000 groups, body cap 1 MiB per page
- graph_cache.go: 5 min TTL, same shape as aws_sigv4's cache
key decisions:
- oidc.Config gains internal SkipIssuerCheck flag with yaml:"-"
so it CANNOT be set via forge.yaml — only callable from another
Go package. AAD multi-tenant uses it; everything else leaves it
off. Surfacing it in YAML would let operators disable iss
validation by accident.
- Soft-fail on Graph 5xx/401: Identity returned with empty Groups
rather than blocking prod traffic. Hard-fail mode (graph_required)
out of scope for v0.11.
- Forge reflects the CALLER's Bearer to Graph; holds no Graph
credentials of its own.
audit reasons:
- ErrTokenRejected -> rejected (tid mismatch, bad sig, Graph 401)
- ErrInvalidToken -> invalid (missing tid, malformed claims)
- ErrProviderUnavailable -> provider_unavailable (Graph 5xx, JWKS down)
extras:
- security.AuthDomains returns login.microsoftonline.com always;
graph.microsoft.com when groups_mode=graph
- forge-cli/runtime/runner.go side-effect imports azure_ad
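The cache shape shared by graph_cache.go and aws_sigv4's identity cache can be sketched like this, in Python for illustration. It assumes the semantics stated in the commit messages: a fixed TTL, a Put that does not extend a live entry's expiry, and opportunistic eviction once the cache is full. SHA-256 as the key hash, the injectable clock, and all names are assumptions.

```python
import hashlib
import time


class IdentityCache:
    def __init__(self, ttl_seconds: float, max_entries: int = 10_000, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._max = max_entries
        self._clock = clock
        self._entries = {}  # key -> (expires_at, identity)

    @staticmethod
    def key(access_key_id: str, yyyymmdd: str) -> str:
        """Key on hash(AKID|YYYYMMDD) so entries naturally roll over each day."""
        return hashlib.sha256(f"{access_key_id}|{yyyymmdd}".encode()).hexdigest()

    def get(self, key: str):
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, identity = entry
        if self._clock() >= expires_at:
            del self._entries[key]
            return None
        return identity

    def put(self, key: str, identity) -> None:
        now = self._clock()
        existing = self._entries.get(key)
        if existing is not None and existing[0] > now:
            # Live entry: refresh the value but keep the ORIGINAL expiry.
            self._entries[key] = (existing[0], identity)
            return
        if len(self._entries) >= self._max:
            self._evict_expired(now)
        self._entries[key] = (now + self._ttl, identity)

    def _evict_expired(self, now: float) -> None:
        # Opportunistic: only expired entries are dropped, and only when full.
        for k in [k for k, (exp, _) in self._entries.items() if exp <= now]:
            del self._entries[k]
```

Keeping the original expiry means a steady stream of requests cannot keep one cached identity alive indefinitely; it is always re-verified upstream after at most one TTL.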
…(phase 2 pr 5)

Wires aws_sigv4, gcp_iap, and azure_ad into the operator surfaces:
cli (forge-cli/cmd/init*.go):
- New non-interactive flags namespaced --auth-aws-* / --auth-gcp-iap-* / --auth-azure-* (StringSlice for repeatable allowed-principal globs)
- buildAuthFromFlags validates required combinations and emits the right egress hosts per provider (sts.<region>.amazonaws.com, www.gstatic.com, login.microsoftonline.com + graph.microsoft.com when groups_mode=graph)
- authEgressHostsFromSettings mirrors the same logic for the Web UI
- renderAuthBlock supports []string lists with proper YAML quoting (allowed_principals)
web ui (forge-ui/handlers_create.go):
- AuthProviderTypeMeta lists the three new types with helpful labels
validate (forge-core/validate/auth.go):
- knownAuthProviderTypes admits aws_sigv4 / gcp_iap / azure_ad
- validateProviderSettings enforces per-type required keys (aws_sigv4.region, gcp_iap.audience, azure_ad.audience + tenant_id-unless-multi-tenant, azure_ad.groups_mode whitelist)
tests:
- 11 new renderer + flag-parsing tests
- Round-trip YAML parse used instead of brittle quote-pattern asserts
- Updated wizard-meta test to expect 7 auth provider types
deliberate scope cut:
- TUI step_auth.go sub-step input flows for the 3 new providers are NOT included. Adding them is mechanical (~100 LOC per provider, mirroring the OIDC issuer→audience→groups_claim phase chain) but out of scope for v0.11.0 cut. Non-interactive flag path covers the production-critical CI/CD case; operators using the TUI can pick "Custom" and edit forge.yaml directly until the follow-up lands.
…pr 6)

Adds the operator-facing documentation for the three Phase 2 providers that shipped in PRs 1–5, plus a top-level auth index, chain-semantics concepts page, CHANGELOG, and a README link.
new docs:
- docs/auth/index.md: provider matrix and chain-semantics overview
- docs/auth/concepts/chain.md: first-match-wins, no-fall-through on reject, non-Bearer header support, mixed-chain worked example
- docs/auth/providers/aws_sigv4.md: STS reflection setup, awscurl example, assumed-role-vs-IAM-role gotcha called out twice
- docs/auth/providers/gcp_iap.md: backend service ID lookup steps, hardcoded JWKS rationale, GCP IAM Conditions for allowlisting
- docs/auth/providers/azure_ad.md: app registration walkthrough, single/multi/graph mode configs, multi-tenant warning prominent
every provider doc includes:
- Prerequisites checklist
- forge.yaml example
- Configuration reference table
- Audit log shape (literal JSON)
- Troubleshooting matrix (grep-able reason codes)
- Security model + limitations sections
CHANGELOG.md (new file):
- Lists Added / Changed entries for v0.11.0
- "Notes for upgraders" makes the non-breaking nature explicit
- Calls out the known TUI sub-flow gap from PR 5
README.md:
- Adds Auth Providers row to the Security documentation table
…owlist
The wizard was asking for Egress confirmation before the operator had
picked an auth provider, so STS / AAD authority / IAP JWKS hosts never
appeared in the egress list. Forge would scaffold a forge.yaml whose
egress_hosts blocked its own auth-provider RPC calls — failure happens
later at `forge run`, with no signal the wizard could have caught.
changes:
- Swap step order in init.go: Auth now runs immediately before Egress
- Extend DeriveEgressFunc with (authMode, authSettings) so the Egress
step's Prepare(ctx) pulls the operator's auth choice from
WizardContext and forwards it into deriveEgressDomains
- deriveEgressDomains calls authEgressHostsFromSettings (same helper
the non-interactive --auth=… path uses) — TUI and CLI now produce
identical egress lists for any given auth choice
- EgressStep's inferSource() learns to label auth-derived hosts:
sts.<region>.amazonaws.com → "aws_sigv4 auth"
www.gstatic.com → "gcp_iap auth"
login.microsoftonline.com → "azure_ad auth"
graph.microsoft.com → "azure_ad auth (graph)"
<oidc issuer host> → "oidc auth"
<http_verifier url host> → "http_verifier auth"
tests:
- TestDeriveEgressDomains_AuthProviderHostsMerged: 8 cases pinning the
per-provider host emission (incl. graph-mode adds graph host)
- TestDeriveEgressDomains_AuthHostsMergeNotOverwrite: auth pass is
additive — provider / channel hosts still emit alongside auth hosts
docs:
- docs/auth/concepts/chain.md gains a "TUI wizard ordering" section
explaining the Auth-before-Egress invariant
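The per-provider host derivation and the additive merge described above can be sketched as below. This is a Python illustration of the mapping listed in the commit message, not the Go helper itself; the settings keys `issuer` and `url`, and both function names, are assumptions.

```python
from urllib.parse import urlsplit


def auth_egress_hosts(provider_type: str, settings: dict) -> list[str]:
    """Hosts the chosen auth provider must reach; empty when none are needed."""
    if provider_type == "aws_sigv4":
        return [f"sts.{settings['region']}.amazonaws.com"]
    if provider_type == "gcp_iap":
        return ["www.gstatic.com"]
    if provider_type == "azure_ad":
        hosts = ["login.microsoftonline.com"]
        if settings.get("groups_mode") == "graph":
            hosts.append("graph.microsoft.com")
        return hosts
    if provider_type == "oidc":
        return [urlsplit(settings["issuer"]).hostname]  # key name assumed
    if provider_type == "http_verifier":
        return [urlsplit(settings["url"]).hostname]  # key name assumed
    return []


def derive_egress(base_hosts: list[str], provider_type: str, settings: dict) -> list[str]:
    """Merge auth hosts into the existing list without dropping or duplicating entries."""
    merged = list(base_hosts)
    for host in auth_egress_hosts(provider_type, settings):
        if host not in merged:
            merged.append(host)
    return merged
```

Running the auth step first matters only because `derive_egress` needs the provider choice as input; the merge itself never removes provider or channel hosts that were already present.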
…, cleanup

Final-pass audit findings against the phase 2 design doc surfaced one correctness bug and several small improvements. All gates clean (go test -race / golangci-lint / gofmt). 42 packages pass.

BUG fix: middleware emits token_kind="iap_jwt" for IAP requests.
The strategy doc §5/§10 lists five token_kind values: empty, opaque, jwt, sigv4, iap_jwt. PR1 wired sigv4 detection but missed iap_jwt, so successful GCP IAP requests audited with token_kind="empty", the same value as no-auth requests, defeating the audit-pipeline goal of counting IAP traffic distinctly. Middleware now classifies X-Goog-Iap-Jwt-Assertion presence as kind="iap_jwt" on the empty-Bearer path. New regression test pins it.

Improvement: graph_client.go avoids per-page URL re-parse.
ensureGraphHost was parsing GraphClient.endpoint via url.Parse on EVERY pagination step. Pre-parse the endpoint Host once at construction and compare against that string instead. Trims redundant work on multi-page Graph responses.

Improvement: gcp_iap classifyJWTErr ordering hardened.
Replaced the bare substring match on "kid" (which would catch unrelated errors) with the specific patterns: "kid " (e.g. "kid X not found") and "not found" (covers JWKS-resolution failures). Pre-existing ordering invariant comment is now actually defended.

Cleanup: drop redundant single-function file.
Moved ExtractTenantID from azure_ad/tenant.go into provider.go alongside other claim accessors and removed the empty tenant.go. The function was a 1-liner and didn't justify its own file.

Cleanup: inline audienceContains shim.
Replaced the audienceContains() wrapper (one-liner around slices.Contains) with a direct call at the use site. Less indirection, same behavior.

Cleanup: middleware, simplify hasNonBearerAuth boolean expr.
Folded the multi-line if-chain into a single boolean expression. Same semantics, less noise.

audit findings deferred as nits, not fixed:
- aws_sigv4 Parser as zero-value struct (cosmetic; keeps symmetry)
- egress_step.go hostOf manual URL parsing (cosmetic; non-hot path)
- 10k eviction comment wording
audit findings confirmed not bugs:
- GraphCache TTL test (already exists in graph_cache_test.go)
- PrependChain loopback invariant intact (runner.go line 2036)
The Phase 2 provider docs were committed as MD files under docs/auth/ but we don't want to version-control them: the source-of-truth lives in the design folder, and we'll deliver via the doc site separately.
- .gitignore: add docs/auth/
- git rm --cached docs/auth/** (local files preserved)
- README.md: drop the now-broken "Auth Providers" docs row
- CHANGELOG.md: drop the docs/auth/*.md links from the v0.11.0 entry
No code or test changes.
…ontract

Real-AWS testing surfaced a documentation gap: callers cannot use raw `awscurl` / `aws-sdk-go` against Forge's `aws_sigv4` provider because Sigv4 binds the signature to the destination host. Standard tools sign for the URL they're addressing (Forge); STS then rejects the reflected signature because the host bytes don't match.

The server-side code is correct. The client just needs to sign a hypothetical STS request, then attach the resulting headers to its real POST to Forge. Same pattern as aws-iam-authenticator for EKS.

This commit:
- Ships `scripts/forge-aws-sign.py`, a ~100 LOC reference client using boto3.session + SigV4Auth. CLI flags for --region, --url, --profile, --body, --verbose. Reads SSO/IRSA/profile/env credentials via boto3's standard chain.
- Extends the package-level docstring in `forge-core/auth/providers/aws_sigv4/provider.go` with a "Client-side signing contract" section spelling out the 4-step pattern and pointing readers to the reference script.
- Adds a "Client-side requirement" section to CHANGELOG.md so adopters know to grab the helper or write their own before integrating.

Validated against real AWS:
- STS reflection: 200, identity stamped, correct ARN/Account/UserID
- ARN allowlist match: 200 (matching pattern)
- ARN allowlist miss: 401 reason=rejected (correct authz gate)
- No-auth: 401 reason=missing_token (Phase 1 contract preserved)
…uthenticator)

Phase 2 PR 2's original "reflect Sigv4 headers" design was broken in the obvious way: Sigv4 binds its signature to the destination host as part of the canonicalized signing input. Headers signed for Forge's host could not be replayed against STS: STS sees host:sts.<region>.amazonaws.com, recomputes the signature, gets a different hash, rejects with "SignatureDoesNotMatch". Caught during real-AWS smoke; documented in PR initializ#79 description.

This commit replaces the pattern with the same approach aws-iam-authenticator uses for EKS:

Client (3 lines):
    url = boto3.client('sts').generate_presigned_url('get_caller_identity', ExpiresIn=900)
    token = 'forge-aws-v1.' + base64.urlsafe_b64encode(url.encode()).rstrip(b'=').decode()
    requests.post(forge_url, headers={'Authorization': f'Bearer {token}'}, ...)

Server:
    Authorization: Bearer forge-aws-v1.<base64-of-presigned-sts-url>
    → decode + validate host (SSRF guard) + GET on the URL → STS → identity

Net effect on caller experience: identical to JWT/OIDC/azure_ad: "mint token, send Bearer, done." Three lines of client code, hidden in ~15 lines of any AWS SDK in any language.

what changed:
forge-core/auth/providers/aws_sigv4/
- sigv4_parser.go: was parsing AWS4-HMAC-SHA256 Authorization header; now parses forge-aws-v1.<base64-url> Bearer tokens (URL host validation, SSRF guard, X-Amz-Credential parsing for cache key derivation)
- sts_client.go: was POST with reflected headers; now GET on the pre-signed URL; same 200/4xx/5xx classification and 64 KiB body cap
- provider.go: Verify() now reads the Bearer token (not raw headers); SSRF guard via expectedHost field; same cache + ARN allowlist semantics
forge-core/auth/
- provider.go: HeadersFromRequest reverts X-Amz-Date and X-Amz-Security-Token (no longer needed); keeps X-Goog-Iap-Jwt-Assertion for gcp_iap
- provider.go: TokenKind detects "forge-aws-v1." prefix → "sigv4" (was: "AWS4-HMAC-SHA256 " on raw Authorization)
- middleware.go: simplify: empty-Bearer fallback only handles IAP (aws_sigv4 rides standard Bearer flow now)
scripts/forge-aws-sign.py: rewrite as a clean reference client.
- --token-only: print just the token for use with curl/other tools
- Otherwise: do the round-trip POST and print the response
CHANGELOG.md: replace "client wrapper required" friction note with the 3-line happy path snippet

what stays unchanged:
- forge.yaml shape (still type: aws_sigv4, region:, allowed_principals:)
- identity_cache.go, arn_matcher.go (cache and authz logic untouched)
- security.AuthDomains (sts.<region>.amazonaws.com derivation)
- forge-cli/cmd/init* flag set and renderer
- validate.ValidateAuthConfig (region still required)
- forge-ui/handlers_create.go (AuthProviderTypeMeta entry)

Tests: 42 packages pass, golangci-lint v2.10.1 clean, gofmt clean, no aws-sdk-go imports (decision §9.1 still holds).
Net diff: +732 / -625 lines (mostly test rewrites; ~80 LOC net less in the provider package because the new flow is structurally simpler).
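After this change the five token_kind values can be derived roughly as follows. This is a Python sketch of the classification described in the commit messages, not the Go TokenKind function; in particular, treating "three dot-separated segments" as the jwt test is an assumption, since the source does not say how jwt and opaque are told apart.

```python
def token_kind(bearer: str, headers: dict) -> str:
    """Classify a request for audit logging: empty, opaque, jwt, sigv4 or iap_jwt."""
    if bearer:
        # The aws_sigv4 prefix is checked first so it is never mistaken for opaque.
        if bearer.startswith("forge-aws-v1."):
            return "sigv4"
        if bearer.count(".") == 2:  # assumption: JWT compact form has three segments
            return "jwt"
        return "opaque"
    # Empty-Bearer path: only the IAP assertion header is considered.
    if headers.get("X-Goog-Iap-Jwt-Assertion"):
        return "iap_jwt"
    return "empty"
```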
…n client

Two correctness fixes surfaced by live AWS testing of the pre-signed URL pattern from b3444c2.

1. Preserve the raw URL byte-for-byte.
Round-tripping the presigned URL through Go's net/url package re-encoded query params in subtle ways (e.g. "/" in X-Amz-Credential, "+" inside X-Amz-Security-Token) that didn't match how the AWS SDK emitted them on the caller side. STS recomputes the canonical request using whatever bytes we send and gets a different hash → 4xx SignatureDoesNotMatch → audit reason "rejected".
- PresignedToken gains a RawURL field: the exact bytes from the decoded token payload.
- The parsed *url.URL is kept ONLY for SSRF host validation and query-param inspection. It is NEVER used to construct the outbound request.
- Provider.Verify now passes parsed.RawURL to STSClient.GetCallerIdentity.

2. Use SigV4QueryAuth directly in the reference client (not boto3's high-level generate_presigned_url).
boto3.client('sts').generate_presigned_url('get_caller_identity', ...) produces a URL STS rejects with SignatureDoesNotMatch when GET. Known quirk: the high-level presigner signs as if the request were a POST. aws-iam-authenticator works around this by signing the AWSRequest explicitly; scripts/forge-aws-sign.py now does the same:
    req = AWSRequest(method='GET', url='https://sts.{region}.amazonaws.com/?Action=GetCallerIdentity&Version=2011-06-15')
    SigV4QueryAuth(creds, 'sts', region, expires=900).add_auth(req)
    token = 'forge-aws-v1.' + base64.urlsafe_b64encode(req.url.encode()).rstrip(b'=').decode()

Live validation against real AWS (account 412664885516, SSO assumed-role):
- Happy path: HTTP 400 body-shape error + auth_verify with correct ARN
- Deny path: HTTP 401 + auth_fail reason="rejected" + token_kind="sigv4"

42 packages still pass; golangci-lint clean; gofmt clean.

(Known follow-up surfaced but out of scope: hot-reload of forge.yaml doesn't rebuild the auth chain, so allowlist changes require a hard restart. Same caveat affects all providers, not just aws_sigv4.)
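The raw-versus-parsed split can be illustrated with the sketch below. It uses Python's urllib as a stand-in for Go's net/url, so the exact characters that change differ from the Go case; the point is only that a parse-then-rebuild round trip is not guaranteed to be byte-identical. The `PresignedToken` shape follows the commit message; `rebuilt_url` and the contrived example URL are hypothetical.

```python
from dataclasses import dataclass
from urllib.parse import SplitResult, parse_qsl, urlencode, urlsplit, urlunsplit


@dataclass(frozen=True)
class PresignedToken:
    raw_url: str         # exact decoded token payload; the only value sent to STS
    parsed: SplitResult  # for host validation and query inspection only


def parse_presigned(raw_url: str) -> PresignedToken:
    return PresignedToken(raw_url=raw_url, parsed=urlsplit(raw_url))


def rebuilt_url(token: PresignedToken) -> str:
    """What NOT to send: the URL reassembled from its parsed parts."""
    query = urlencode(parse_qsl(token.parsed.query, keep_blank_values=True))
    return urlunsplit(token.parsed._replace(query=query))


raw = (
    "https://sts.us-east-1.amazonaws.com/?Action=GetCallerIdentity"
    "&X-Amz-Credential=AKIAEXAMPLE/20260523/us-east-1/sts/aws4_request"
)
token = parse_presigned(raw)
# The rebuilt URL percent-encodes "/" in the credential, so its bytes differ
# from what the caller signed, even though both URLs mean the same thing.
print(rebuilt_url(token) == token.raw_url)
```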
Summary
Phase 2 of the pluggable auth provider work — three cloud-native providers
on top of the Phase 1 foundation (#77 /
7998f12). Customers authenticate to Forge using identities they already have in their cloud; no parallel
IdP required.

- aws_sigv4: the caller pre-signs an STS GetCallerIdentity URL with their AWS SDK; Forge invokes it. STS returns the caller's canonical ARN. Same pattern as aws-iam-authenticator (EKS). Wire format: Authorization: Bearer forge-aws-v1.<base64-of-presigned-sts-url>
- gcp_iap: verifies X-Goog-Iap-Jwt-Assertion when Forge sits behind a GCP HTTPS LB + IAP. Wire format: X-Goog-Iap-Jwt-Assertion: <jwt>
- azure_ad: composes the Phase 1 oidc provider. Wire format: Authorization: Bearer <aad-jwt>

Forge never holds any IdP secrets: all three providers verify a caller-minted credential against a third party (STS / GCP JWKS / AAD JWKS).
Why this matters
Today, putting Forge behind any of the three big cloud IdPs requires
standing up a parallel OIDC issuer (Cognito for AWS, Workspace SAML, etc.).
This PR removes that friction:
- aws_sigv4: callers authenticate with their existing IAM credentials. Zero secrets stored on Forge, no token endpoint to host.
- gcp_iap: Forge verifies the IAP assertion directly.
- azure_ad: AAD-specific quirks (tenant gate, groups overage) handled correctly.
Design pivot during PR review: aws_sigv4 switched to pre-signed URL pattern

The PR went through an in-flight design correction caught by real-AWS
smoke testing. The TL;DR:
What was wrong (original design, commits 9a1ebae through 382294e)

The first design had clients sign their POST to Forge using AWS Sigv4
("header reflection"). Forge would forward the signed headers to STS,
expecting STS to validate them.

This is broken in a deterministic way: Sigv4 binds its signature to
the destination host as part of the canonicalized signing input.
Headers signed for Forge's hostname can't be replayed against STS:
STS sees host: sts.<region>.amazonaws.com, recomputes the
signature, gets a different hash, and rejects with
SignatureDoesNotMatch. Standard tools (awscurl, boto3.client('sts'),
all AWS SDKs) always sign for the URL they're calling, so there was
no working client path.
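The host binding can be reproduced with a few lines of stdlib Python. The sketch follows the published SigV4 header-signing steps (canonical request, string to sign, derived signing key) but signs only the host and x-amz-date headers; the secret, timestamp, and hostnames are made-up illustration values.

```python
import hashlib
import hmac


def _hmac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode(), hashlib.sha256).digest()


def sigv4_signature(secret_key: str, host: str, amz_date: str, region: str,
                    service: str, method: str = "POST", path: str = "/",
                    query: str = "", body: bytes = b"") -> str:
    """Compute a SigV4 header signature over the host and x-amz-date headers."""
    date = amz_date[:8]
    canonical_headers = f"host:{host}\nx-amz-date:{amz_date}\n"
    signed_headers = "host;x-amz-date"
    canonical_request = "\n".join([
        method, path, query, canonical_headers, signed_headers,
        hashlib.sha256(body).hexdigest(),
    ])
    scope = f"{date}/{region}/{service}/aws4_request"
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256", amz_date, scope,
        hashlib.sha256(canonical_request.encode()).hexdigest(),
    ])
    # Signing key: HMAC chain over date, region, service, "aws4_request".
    key = _hmac(("AWS4" + secret_key).encode(), date)
    key = _hmac(key, region)
    key = _hmac(key, service)
    key = _hmac(key, "aws4_request")
    return hmac.new(key, string_to_sign.encode(), hashlib.sha256).hexdigest()


args = dict(secret_key="EXAMPLE-SECRET", amz_date="20260523T191900Z",
            region="us-east-1", service="sts")
signed_for_forge = sigv4_signature(host="forge.example.internal", **args)
recomputed_by_sts = sigv4_signature(host="sts.us-east-1.amazonaws.com", **args)
print(signed_for_forge == recomputed_by_sts)
```

Everything except the host is identical between the two calls, yet the signatures differ, which is exactly the mismatch STS reports as SignatureDoesNotMatch.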
What's correct (current design, commit 8568535)

Switched to the pre-signed URL pattern that aws-iam-authenticator uses for EKS:

1. The client pre-signs an STS GetCallerIdentity URL: signature embedded in query params, signed for STS's host.
2. The client base64url-encodes the URL, prefixes it with forge-aws-v1., and sends it as a standard Authorization: Bearer … header.
3. Forge decodes the token, validates that the URL host is sts.<region>.amazonaws.com (SSRF guard), GETs the URL, parses the XML response, stamps Identity.
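The host validation in the last step can be sketched as follows, in Python for illustration. Only the host match is stated in this PR; the https requirement, the userinfo and port checks, and the Action check are extra assumptions a guard of this kind would plausibly make. The function name is hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


def validate_presigned_sts_url(raw_url: str, region: str) -> str:
    """Return raw_url unchanged if it targets the regional STS endpoint, else raise."""
    parts = urlsplit(raw_url)
    expected_host = f"sts.{region}.amazonaws.com"
    if parts.scheme != "https":
        raise ValueError("pre-signed URL must use https")
    # Reject "https://sts...amazonaws.com@evil.example/" style userinfo tricks.
    if parts.username is not None or parts.password is not None:
        raise ValueError("userinfo is not allowed in the pre-signed URL")
    if parts.hostname != expected_host or parts.port not in (None, 443):
        raise ValueError("pre-signed URL does not target the configured STS host")
    # Assumption: only GetCallerIdentity may be invoked on the caller's behalf.
    if parse_qs(parts.query).get("Action") != ["GetCallerIdentity"]:
        raise ValueError("pre-signed URL must call GetCallerIdentity")
    # Hand back the original string so the outbound GET uses the signed bytes.
    return raw_url
```

The comparison is an exact match on the parsed hostname, so look-alike suffixes such as `sts.us-east-1.amazonaws.com.evil.example` fail before any outbound request is made.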
Why this is the right design (not just a fix)
Three design properties land cleanly:
- aws_sigv4 callers now have the same caller experience as JWT/OIDC/azure_ad callers: mint a Bearer token, attach it, send. Three lines of Python (or any AWS-SDK-bearing language). No custom signing logic, no header manipulation, no per-call procedure.
- The aws-iam-authenticator-style token format works with any AWS SDK's SigV4QueryAuth API.
- The server side reduces to a GET on the pre-signed URL: ~80 LOC of STS client code instead of header-forwarding plumbing.
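What the STS client does with the response can be sketched like this, in Python for illustration. The 200/4xx/5xx mapping and the 64 KiB cap come from the commit messages; how an oversized or unparseable 200 is classified is an assumption, as are the names `Identity` and `classify_sts_response`.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

STS_NS = {"sts": "https://sts.amazonaws.com/doc/2011-06-15/"}
MAX_BODY = 64 * 1024  # 64 KiB cap on STS responses


@dataclass
class Identity:
    arn: str
    account: str
    user_id: str


def classify_sts_response(status: int, body: bytes):
    """Map an STS HTTP response to (audit_reason, identity_or_None)."""
    if status >= 500:
        return ("provider_unavailable", None)
    if 400 <= status < 500:
        return ("rejected", None)
    # Assumption: anything else that is not a small, well-formed 200 is
    # treated as an upstream problem rather than a caller error.
    if status != 200 or len(body) > MAX_BODY:
        return ("provider_unavailable", None)
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return ("provider_unavailable", None)
    result = root.find("sts:GetCallerIdentityResult", STS_NS)
    if result is None:
        return ("provider_unavailable", None)
    arn = result.findtext("sts:Arn", namespaces=STS_NS)
    account = result.findtext("sts:Account", namespaces=STS_NS)
    user_id = result.findtext("sts:UserId", namespaces=STS_NS)
    if not (arn and account and user_id):
        return ("provider_unavailable", None)
    return ("ok", Identity(arn=arn, account=account, user_id=user_id))
```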
Locked design decisions (unchanged through the pivot)

- §9.1: No aws-sdk-go-v2 dependency. STS client is hand-rolled HTTP + XML.
- §9.2: azure_ad composes Phase 1 oidc.Provider; no JWT/JWKS code in azure_ad/.
- §9.3: allowed_principals uses path.Match shell globs (no regex).

What clients write (the actual 3 lines)
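A minimal sketch of the token-wrapping step, assuming the pre-signed GetCallerIdentity URL has already been produced by the caller's AWS SDK (the commit history uses botocore's SigV4QueryAuth for that part). The helper names `wrap_presigned_url` and `unwrap_token` are illustrative and not part of the reference script.

```python
import base64

TOKEN_PREFIX = "forge-aws-v1."


def wrap_presigned_url(presigned_url: str) -> str:
    """Encode a pre-signed STS URL as an unpadded base64url Bearer token."""
    encoded = base64.urlsafe_b64encode(presigned_url.encode()).rstrip(b"=").decode()
    return TOKEN_PREFIX + encoded


def unwrap_token(token: str) -> str:
    """Inverse of wrap_presigned_url; returns the URL exactly as it was signed."""
    if not token.startswith(TOKEN_PREFIX):
        raise ValueError("not a forge-aws-v1 token")
    payload = token[len(TOKEN_PREFIX):]
    padded = payload + "=" * (-len(payload) % 4)  # restore stripped padding
    return base64.urlsafe_b64decode(padded).decode()
```

The caller then sends the result as `Authorization: Bearer <token>`; the unwrap side shows why the server can recover the signed URL byte-for-byte.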
A reference client lives at scripts/forge-aws-sign.py (~80 LOC; CLI flags
for --region, --profile, --token-only, etc.).

✅ Real-AWS validation (live AWS account)
End-to-end validated against real AWS STS using SSO assumed-role
credentials in account 412664885516. Forge instance built from
commit 8568535, running locally.

1. curl with no auth headers: reason missing_token ("valid bearer token required").
2. Valid forge-aws-v1. token: auth_verify fires; auth_verify emitted.
3. Caller ARN matching allowed_principals: auth_verify fires; ArnMatcher accepts matching ARNs.
4. Caller ARN not matching allowed_principals: reason rejected ("token rejected by auth provider"); ArnMatcher correctly denies non-matching ARNs.

Audit log emitted (Test #2 success path)
    {
      "event": "auth_verify",
      "fields": {
        "method": "POST",
        "path": "/tasks/send",
        "provider": "aws_sigv4",
        "user_id": "arn:aws:sts::412664885516:assumed-role/AWSReservedSSO_PowerUserAccess_c794d5f2c2fe4370/Naveen",
        "org_id": "412664885516",
        "token_kind": "sigv4",
        "groups_count": 0,
        "remote_addr": "[::1]:62448"
      }
    }

Every field correct:
- provider matches the configured name
- user_id is the STS-returned canonical ARN (assumed-role form, including session name)
- org_id is the AWS account number
- token_kind is the new forge-aws-v1.-prefix detection

Two bugs caught during live testing (both fixed in 8568535)

1. URL re-encoding by net/url. Round-tripping the presigned URL through Go's *url.URL.String() re-encoded query params in ways that differed from how the AWS SDK emitted them (e.g., / in X-Amz-Credential, + inside X-Amz-Security-Token). STS recomputed the canonical request from those re-encoded bytes and rejected. Fix: PresignedToken keeps a RawURL string field with the byte-for-byte original; the parsed *url.URL is used only for SSRF host validation and query-param inspection.
2. boto3 generate_presigned_url quirk. The reference client originally used boto3.client('sts').generate_presigned_url('get_caller_identity', …), which produces a URL STS rejects with SignatureDoesNotMatch (known boto3 quirk: signs as if POST). Fix: the reference client now uses the lower-level SigV4QueryAuth.add_auth() directly, the same pattern aws-iam-authenticator uses.

Other layers also exercised live
- The egress allowlist included sts.us-east-1.amazonaws.com, so Forge's outbound STS call was permitted.
- static_token still works for the dashboard at http://localhost:9999/ (auto-prepended via PrependChain; Phase 1 review #10 invariant intact under Phase 2).
- Hot reload picks up forge.yaml content changes for most fields. Caveat: auth-chain providers are constructed once at startup; modifying allowed_principals requires a hard restart (Ctrl-C + forge run again) for the new allowlist to take effect. Documented as a follow-up; affects all providers, not just aws_sigv4.
- The wizard Auth-before-Egress reorder (639bfa9) ensures wizard-scaffolded configs include the STS host in egress_hosts.

What was NOT live-tested (and why)
- gcp_iap provider: requires a GCP project with HTTPS LB + IAP enabled. Covered by unit tests with a fake JWKS signer.
- azure_ad provider: requires an Entra tenant + app registration. Covered by unit tests with a fake AAD.

Per PHASE2_TEST_STRATEGY.md §8.2, those live tests run at release-tag time, not on every PR.

What landed (12 commits)
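- 55942d8: header contract for non-Bearer providers (PR 1)
- 9a1ebae: aws_sigv4 provider (initial, header-reflection design)
- 5b71071: gcp_iap provider
- 1e23140: azure_ad provider
- 47c4748: CLI, Web UI, and validation wiring (PR 5)
- 98578f9: operator docs and CHANGELOG (PR 6)
- 639bfa9: wizard runs Auth before Egress
- b5f303b: final-pass audit fix (iap_jwt token_kind) and cleanup
- aaf8375: untrack docs/auth (per reviewer feedback)
- 382294e: client-side signing contract and reference script
- b3444c2: switch aws_sigv4 to pre-signed URL pattern
- 8568535: preserve raw URL bytes; use SigV4QueryAuth in client

Total: ~42 files, +5,500 / -700 lines net (mostly tests + the design-pivot rewrite).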
Phase 1 compatibility
- Zero behavior change for Phase 1 providers (static_token, oidc, http_verifier). Phase 1 test suite passes unmodified.
- The Headers map gained one new key, X-Goog-Iap-Jwt-Assertion, for gcp_iap. Existing keys unchanged.
- The oidc package gained an internal SkipIssuerCheck field with yaml:"-": unreachable from forge.yaml, only set by azure_ad multi-tenant. Operators see no change.

Security model highlights
- aws_sigv4: the pre-signed URL host MUST match sts.<configured-region>.amazonaws.com. A token whose URL points elsewhere is rejected at parse time, before any outbound request.
- azure_ad: multi-tenant requires explicit opt-in; Graph calls only fire after tid validation.

Known deferred work
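- TUI sub-step flows for the three new providers: operators edit forge.yaml directly until the TUI follow-up lands.
- Hot reload of the auth chain: changes to auth.providers (incl. allowed_principals) require a hard forge run restart. Affects all providers, not just Phase 2.

Test plan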
- go test -race -count=1 ./... (42 packages green)
- golangci-lint v2.10.1 (0 issues)
- gofmt -l forge-core forge-cli forge-plugins (clean)
- Invariant checks (no aws-sdk-go import, IAP constants confined, no JWT in azure_ad, skip_issuer_check never in YAML): all pass
- Live smoke for gcp_iap and azure_ad: runs at release-tag time per PHASE2_TEST_STRATEGY.md §8.2

Design artifacts (offline)
Full design package in ~/Desktop/forge_designs_and_PRD/phase2_implementation/:
- PHASE2_CLOUD_NATIVE_PROVIDERS.md: top-level design + §9 locked decisions
- PHASE2_PROGRESS_MAP.md: diagram-to-PR tracker
- PHASE2_TEST_STRATEGY.md: pyramid, harnesses, security catalog, CI gates, manual smoke
- PR1_HEADER_CONTRACT.md through PR6_DOCS.md: per-PR checklists with code sketches and acceptance criteria