feat: add experimental AI provider resource #368
Adds a Terraform resource for declarative Coder AI provider configuration, supporting all SDK provider types with AWS Bedrock and API-key providers. Secrets use Terraform 1.11+ write-only arguments and are never stored in state.
- Remove the entitlement check (AI Bridge/Gateway is moving off the premium license gate).
- Collapse the api_keys alias map into a single write-only api_key_wo + api_key_wo_version, with a computed api_key_masked. This drops the alias<->UUID-pool reconciliation (mergeAIProviderKeyState); AI Bridge uses only the oldest key per provider, so the multi-key pool was speculative.
- Replace settingsEqual with a present-now-or-before nil check; the server merges credentials and clears settings when the block is dropped.
- Drop the redundant stringValue/boolValue helpers (ValueString/ValueBool already return the zero value for null/unknown).
- State the write-only / Terraform 1.11+ requirement once at the resource level instead of repeating it per attribute.
Force-pushed from 7c12298 to 844cbf5.
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2dde8c9a39
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 891e3b5e17
Force-pushed from 59d4e41 to b8ee453.
Each parallel test boots its own Coder container with an embedded PostgreSQL. Starting too many at once on the standard hosted runner overwhelms it and they fail to become ready in time. Cap concurrency with -parallel=4 instead of defaulting to GOMAXPROCS.
Force-pushed from b8ee453 to 3ca1539.
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3ca1539a17
The acceptance suite boots many Coder containers (each with an embedded PostgreSQL) concurrently. On resource-constrained CI runners these can be slow to bind their HTTP listener, causing intermittent 'coder failed to become ready in time' failures within the previous 90s budget. Double the budget to 180s to absorb startup contention.
Force-pushed from 28abf98 to 4e28632.
@codex review re:

Verified this against coder/coder at current main. The premise is correct, but loosening only the provider here wouldn't actually unblock proxy deployments, so I'm holding off.

The provider is faithfully mirroring the server's create gate. CreateAIProviderRequest.Validate() in codersdk/aiproviders.go rejects type=bedrock when req.Settings.Bedrock == nil || !req.Settings.Bedrock.IsConfigured(), and IsConfigured() only checks region + credentials, ignoring base_url, exactly like this check. The create handler (coderd/ai_providers.go) calls that same Validate(), so a base-URL-only Bedrock provider (AWS SDK default credential chain, no region) currently returns an HTTP 400 from Coder regardless of what Terraform does.

The base-URL-aware IsBedrockConfigured(baseURL, settings) does exist, but it's only used by legacy env-var seeding (cli/server.go) and migration detection (coderd/ai_providers_migrate.go), never in the create/update API validation path. So if I switch this check to be base-URL-aware in isolation, the apply still fails, just with a server-side 400 instead of a clean client-side diagnostic, which is worse UX.

This is an inconsistency in Coder itself between the Bedrock docs and CreateAIProviderRequest.Validate(). The right fix is upstream: have CreateAIProviderRequest.Validate() use IsBedrockConfigured(baseURL, ...) (or correct the docs). Once the API accepts base-URL-only Bedrock, I'll mirror it here.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4e28632a3c
DisplayName types.String `tfsdk:"display_name"`
Enabled     types.Bool   `tfsdk:"enabled"`
BaseURL     types.String `tfsdk:"base_url"`
APIKeyWO    types.String `tfsdk:"api_key_wo"`
Defer validation when settings is unknown
When settings (or the nested bedrock object) is supplied from an input variable, it is unknown during ValidateResourceConfig, but this model decodes it into Go pointers that cannot represent an unknown object. In that case req.Config.Get can raise a value-conversion diagnostic before the leaf-level unknown checks run, or data.bedrock() can look nil and emit Missing Bedrock Settings. As a result, otherwise valid module configurations such as settings = var.ai_provider_settings fail terraform validate/plan. Represent this layer as a types.Object, or explicitly defer when the object is unknown.
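The deferral the review asks for comes down to treating "unknown" as a third state, distinct from both "null" and "known". The sketch below models that with a small stdlib-only type; `objectValue`, `validateBedrock`, and the attribute names are hypothetical stand-ins, not the provider's code or the plugin framework's API.

```go
package main

import "fmt"

type valueState int

const (
	stateNull valueState = iota
	stateUnknown
	stateKnown
)

// objectValue is a hypothetical tri-state stand-in for a framework object
// value. A plain Go pointer can only express nil vs. set, which is why an
// unknown object looks like a missing one.
type objectValue struct {
	state valueState
	attrs map[string]string
}

// validateBedrock returns a diagnostic summary, or "" when the config is
// valid or validation must be deferred. Attribute names are illustrative.
func validateBedrock(providerType string, settings objectValue) string {
	if providerType != "bedrock" {
		return ""
	}
	switch settings.state {
	case stateUnknown:
		// Value comes from a variable or computed reference: defer until
		// it is known instead of reporting it as missing.
		return ""
	case stateNull:
		return "Missing Bedrock Settings"
	}
	if settings.attrs["region"] == "" && settings.attrs["access_key_id"] == "" {
		return "Missing Bedrock Settings"
	}
	return ""
}

func main() {
	fmt.Printf("unknown: %q\n", validateBedrock("bedrock", objectValue{state: stateUnknown}))
	fmt.Printf("null:    %q\n", validateBedrock("bedrock", objectValue{state: stateNull}))
	known := objectValue{state: stateKnown, attrs: map[string]string{"region": "us-east-1"}}
	fmt.Printf("known:   %q\n", validateBedrock("bedrock", known))
}
```

The key design point is the order of checks: the unknown case is handled before any code that would interpret an absent value as an error.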
Run the acceptance suite with -parallel=1 so only one Coder container boots at a time. If the readiness timeouts disappear, the failures are caused by resource contention from concurrent container startup on the 2-core hosted runner.
When an acceptance test fails (e.g. the readiness timeout in StartCoder), fetch the Coder container's stdout/stderr via the Docker API and emit it through t.Logf so the coderd startup output is visible in CI. This is how we surface why containers never become ready under CI load. The cleanup is registered after the container-removal cleanup so it runs first (LIFO), while the container still exists.
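The ordering argument relies on Go's testing.T.Cleanup running callbacks in last-registered, first-run order. A minimal sketch of that ordering follows, using a tiny hypothetical `cleanups` type in place of a real *testing.T so it can run standalone; the step names are illustrative.

```go
package main

import "fmt"

// cleanups mimics the ordering of testing.T.Cleanup: functions run in
// last-registered, first-called order. It is a stand-in for illustration.
type cleanups struct{ fns []func() }

func (c *cleanups) Cleanup(fn func()) { c.fns = append(c.fns, fn) }

func (c *cleanups) run() {
	for i := len(c.fns) - 1; i >= 0; i-- {
		c.fns[i]()
	}
}

// runOrder registers the two cleanups in the order described above and
// returns the sequence in which they actually executed.
func runOrder(failed bool) []string {
	var order []string
	c := &cleanups{}

	// Registered first, so it runs last.
	c.Cleanup(func() { order = append(order, "remove container") })

	// Registered second, so it runs first, while the container still exists.
	c.Cleanup(func() {
		if failed {
			order = append(order, "dump container logs")
		}
	})

	c.run()
	return order
}

func main() {
	fmt.Println(runOrder(true))
	fmt.Println(runOrder(false))
}
```

In a real test the log dump would check t.Failed() and write through t.Logf; the sketch only demonstrates why registration order matters.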
Fix errcheck lint failure by handling the error returned from closing the container log reader, matching the existing puller.Close pattern.
The serial -parallel=1 experiment showed the readiness failures are not caused by Go test parallelism alone, so restore the workflow to the stock go test parallelism while continuing to investigate the startup failures themselves.
The acceptance harness boots Coder without CODER_PG_CONNECTION_URL, so Coder falls back to its embedded PostgreSQL. The upstream Coder image doesn't bundle the Postgres binary, so on every startup Coder downloads the zonky.io embedded-postgres jar from Maven Central. Shared CI egress IPs get rate-limited by Cloudflare (which fronts repo.maven.apache.org), and a single non-200 response turns the whole CI lane red with "coder failed to become ready in time".

Bake the binary into a derived image so Coder skips the Maven download entirely (embedded-postgres skips both the fetch and the decompress when <binariesPath>/bin/pg_ctl exists). A new Dockerfile fetches the jar once at build time with retries and extracts it to the path Coder stats. The test job builds this image and points the harness at it via CODER_IMAGE/CODER_VERSION; the harness now falls back to a locally-built image when the registry pull fails.
The CODER_IMAGE/CODER_VERSION env vars (set in CI to point at the derived
embedded-PostgreSQL image) were clobbering explicit per-test version pins.
Back-compat tests that pin an older Coder version (e.g. v2.25.0, v2.29.5)
were therefore running against the latest-derived image, which broke
assertions like TestCheckNoResourceAttr("cors_behavior") since newer Coder
defaults that attribute.
Guard the env override so it only applies when a test left the image and
version at the built-in defaults. The derived image is a substitute for the
default latest boot only (it exists solely at :local and is built FROM
:latest), so version-pinned tests keep using the upstream registry image at
their requested version.
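The guard described in this commit message can be sketched as a small resolution function. This is an illustration under assumptions: `resolveImage`, the default image name, and the injected `getenv` parameter are hypothetical; only the CODER_IMAGE and CODER_VERSION variable names come from the text above.

```go
package main

import (
	"fmt"
	"os"
)

const (
	defaultImage   = "example.invalid/coder" // hypothetical default image name
	defaultVersion = "latest"
)

// resolveImage applies the CODER_IMAGE/CODER_VERSION overrides only when
// the test left both image and version at their defaults. getenv is
// injected so the logic can be exercised without touching the process env.
func resolveImage(image, version string, getenv func(string) string) (string, string) {
	if image != defaultImage || version != defaultVersion {
		// An explicit pin (for example a back-compat version) always wins.
		return image, version
	}
	if v := getenv("CODER_IMAGE"); v != "" {
		image = v
	}
	if v := getenv("CODER_VERSION"); v != "" {
		version = v
	}
	return image, version
}

func main() {
	img, ver := resolveImage(defaultImage, defaultVersion, os.Getenv)
	fmt.Println("default boot:", img, ver)

	img, ver = resolveImage(defaultImage, "v2.25.0", os.Getenv)
	fmt.Println("pinned boot: ", img, ver)
}
```

Checking for the defaults first keeps the override from silently replacing a pinned version, which is the failure mode the commit describes.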
Summary
Adds coderd_experimental_ai_provider for managing Coder AI Gateway providers from Terraform, including OpenAI-style API key providers and AWS Bedrock provider settings. Secret inputs use Terraform 1.11+ write-only arguments (*_wo) with explicit version fields so plaintext keys and AWS credentials are sent to Coder without being stored in Terraform state.

The resource supports create/read/update/delete/import flows, generated docs/examples, masked key state, Bedrock region derivation from canonical Bedrock base_url values, API key rotation/clearing semantics, and Bedrock credential rotation/clearing semantics.

Validation is split between schema validators and resource-level ValidateConfig: built-in validators cover simple required-together and non-empty secret checks, while resource validation covers type-dependent combinations and Bedrock's region-or-credentials requirement. Validation deliberately defers when required values are unknown during Terraform's validate/plan walk, matching Terraform's handling of variables and computed references.

The tests cover schema validation, write-only secret handling, API key rotation, Bedrock region derivation, unknown-variable deferral, and the plan-stability cases needed for optional+computed AI provider fields.
Notes
This resource requires Terraform 1.11 or later when configured because it uses write-only arguments. The provider test cases for this resource skip Terraform versions below 1.11.
Relates to CODAGT-607