feat: add experimental AI provider resource#368

Open
ethanndickson wants to merge 16 commits into main from ethan/ai-provider-resource

@ethanndickson

@ethanndickson ethanndickson commented Jun 23, 2026

Member

Summary

Adds coderd_experimental_ai_provider for managing Coder AI Gateway providers from Terraform, including OpenAI-style API key providers and AWS Bedrock provider settings. Secret inputs use Terraform 1.11+ write-only arguments (*_wo) with explicit version fields so plaintext keys and AWS credentials are sent to Coder without being stored in Terraform state.
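Because a write-only value never reaches state, the provider has nothing to diff the secret against; the stored version field is the only signal that a key was rotated. A minimal sketch of that decision, assuming a hypothetical helper named shouldSendSecret and assuming the secret is always sent on create (neither is taken from this PR's code):

```go
package main

import "fmt"

// shouldSendSecret is a hypothetical helper, not code from this PR.
// A write-only value is never persisted, so the stored version number is
// the only thing the provider can compare between plan and prior state.
func shouldSendSecret(isCreate bool, priorVersion, plannedVersion int64) bool {
	if isCreate {
		return true // assumption: a configured secret is always sent on create
	}
	// On update, resend only when the user bumped the version field.
	return priorVersion != plannedVersion
}

func main() {
	fmt.Println(shouldSendSecret(true, 0, 1))  // create
	fmt.Println(shouldSendSecret(false, 1, 1)) // update, version unchanged
	fmt.Println(shouldSendSecret(false, 1, 2)) // update, version bumped
}
```

Under these assumptions, editing api_key_wo without bumping api_key_wo_version would not trigger an update, which is the usual trade-off of write-only arguments.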

The resource supports create/read/update/delete/import flows, generated docs/examples, masked key state, Bedrock region derivation from canonical Bedrock base_url values, API key rotation/clearing semantics, and Bedrock credential rotation/clearing semantics.
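As an illustration of the region derivation, here is a rough sketch that assumes "canonical" means the standard AWS runtime host shape bedrock-runtime.<region>.amazonaws.com. The helper name regionFromBedrockURL is hypothetical, and the resource's real matching rules may differ:

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// regionFromBedrockURL is an illustrative helper (hypothetical name).
// It returns the region for hosts shaped like
// bedrock-runtime.<region>.amazonaws.com and reports false for anything else,
// such as a proxy or custom endpoint.
func regionFromBedrockURL(raw string) (string, bool) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", false
	}
	host := u.Hostname()
	const prefix, suffix = "bedrock-runtime.", ".amazonaws.com"
	if !strings.HasPrefix(host, prefix) || !strings.HasSuffix(host, suffix) {
		return "", false
	}
	region := strings.TrimSuffix(strings.TrimPrefix(host, prefix), suffix)
	// Reject empty or multi-label remainders; a region is a single DNS label.
	if region == "" || strings.Contains(region, ".") {
		return "", false
	}
	return region, true
}

func main() {
	fmt.Println(regionFromBedrockURL("https://bedrock-runtime.us-east-1.amazonaws.com"))
	fmt.Println(regionFromBedrockURL("https://llm-proxy.example.com"))
}
```

A non-canonical URL yields no region here, so the caller would have to fall back to an explicitly configured region.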

Validation is split between schema validators and resource-level ValidateConfig: built-in validators cover simple required-together and non-empty secret checks, while resource validation covers type-dependent combinations and Bedrock's region-or-credentials requirement. Validation deliberately defers when required values are unknown during Terraform's validate/plan walk, matching Terraform's handling of variables and computed references.
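The deferral rule can be sketched without the plugin framework by modelling the three states an attribute can be in. The attr type and validateBedrock function below are stand-ins invented for illustration, and treating an access key as "credentials" is an assumption:

```go
package main

import "fmt"

// attrState models the three states a Terraform attribute can be in.
type attrState int

const (
	attrNull attrState = iota
	attrUnknown
	attrKnown
)

// attr is a hypothetical stand-in for a framework string value.
type attr struct {
	state attrState
	value string
}

// set reports whether the attribute has a known, non-empty value.
func (a attr) set() bool { return a.state == attrKnown && a.value != "" }

// validateBedrock returns a diagnostic message, or deferred=true when the
// answer depends on a value that is not known yet.
func validateBedrock(region, accessKey attr) (diag string, deferred bool) {
	if region.set() || accessKey.set() {
		return "", false // requirement already met by known values
	}
	if region.state == attrUnknown || accessKey.state == attrUnknown {
		return "", true // an unknown value may still satisfy it at apply time
	}
	return "bedrock requires a region or credentials", false
}

func main() {
	fmt.Println(validateBedrock(attr{attrNull, ""}, attr{attrNull, ""}))
	fmt.Println(validateBedrock(attr{attrUnknown, ""}, attr{attrNull, ""}))
	fmt.Println(validateBedrock(attr{attrKnown, "us-east-1"}, attr{attrNull, ""}))
}
```

Only the first call produces a diagnostic; the second defers because the region could still resolve to a real value once variables are known.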

The tests cover schema validation, write-only secret handling, API key rotation, Bedrock region derivation, unknown-variable deferral, and the plan-stability cases needed for optional+computed AI provider fields.

Notes

This resource requires Terraform 1.11 or later when configured because it uses write-only arguments. The provider test cases for this resource skip Terraform versions below 1.11.

Relates to CODAGT-607

@linear-code

linear-code Bot commented Jun 23, 2026

CODAGT-607

Adds a Terraform resource for declarative Coder AI provider configuration,
supporting all SDK provider types with AWS Bedrock and API-key providers.
Secrets use Terraform 1.11+ write-only arguments and are never stored in
state.
- Remove the entitlement check (AI Bridge/Gateway is moving off the
  premium license gate).
- Collapse the api_keys alias map into a single write-only api_key_wo +
  api_key_wo_version, with a computed api_key_masked. This drops the
  alias<->UUID-pool reconciliation (mergeAIProviderKeyState); AI Bridge
  uses only the oldest key per provider, so the multi-key pool was
  speculative.
- Replace settingsEqual with a present-now-or-before nil check; the
  server merges credentials and clears settings when the block is
  dropped.
- Drop the redundant stringValue/boolValue helpers (ValueString/ValueBool
  already return the zero value for null/unknown).
- State the write-only / Terraform 1.11+ requirement once at the resource
  level instead of repeating it per attribute.
@ethanndickson ethanndickson force-pushed the ethan/ai-provider-resource branch from 7c12298 to 844cbf5 Compare June 23, 2026 05:46
@ethanndickson

Member Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2dde8c9a39

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread internal/provider/ai_provider_resource_test.go
@ethanndickson ethanndickson changed the title from "Add experimental AI provider resource" to "feat: add experimental AI provider resource" Jun 23, 2026
@ethanndickson

Member Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 891e3b5e17

Comment thread internal/provider/ai_provider_resource.go
@ethanndickson ethanndickson force-pushed the ethan/ai-provider-resource branch from 59d4e41 to b8ee453 Compare June 23, 2026 06:56
Each parallel test boots its own Coder container with an embedded PostgreSQL. Starting too many at once on the standard hosted runner overwhelms it and they fail to become ready in time. Cap concurrency with -parallel=4 instead of defaulting to GOMAXPROCS.
@ethanndickson ethanndickson force-pushed the ethan/ai-provider-resource branch from b8ee453 to 3ca1539 Compare June 23, 2026 06:59
@ethanndickson

Member Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3ca1539a17

Comment thread internal/provider/ai_provider_resource.go
The acceptance suite boots many Coder containers (each with an embedded
PostgreSQL) concurrently. On resource-constrained CI runners these can be
slow to bind their HTTP listener, causing intermittent 'coder failed to
become ready in time' failures within the previous 90s budget. Double the
budget to 180s to absorb startup contention.
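For context, a readiness wait with a fixed budget usually looks something like the loop below. This is a generic sketch, not the harness's StartCoder code: waitReady, readyAfter, and runDemo are hypothetical names, and the probe is a stub rather than a real HTTP check.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// waitReady calls probe every interval until it succeeds or budget elapses.
func waitReady(ctx context.Context, probe func(context.Context) error, budget, interval time.Duration) error {
	ctx, cancel := context.WithTimeout(ctx, budget)
	defer cancel()
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		if err := probe(ctx); err == nil {
			return nil
		}
		select {
		case <-ctx.Done():
			return fmt.Errorf("failed to become ready in time: %w", ctx.Err())
		case <-ticker.C:
			// Try again on the next tick.
		}
	}
}

// readyAfter returns a stub probe that fails n times before succeeding.
func readyAfter(n int) func(context.Context) error {
	calls := 0
	return func(context.Context) error {
		calls++
		if calls <= n {
			return errors.New("not ready")
		}
		return nil
	}
}

// runDemo waits on a stub probe with a budget given in milliseconds.
func runDemo(failures, budgetMillis int) error {
	budget := time.Duration(budgetMillis) * time.Millisecond
	return waitReady(context.Background(), readyAfter(failures), budget, 10*time.Millisecond)
}

func main() {
	fmt.Println(runDemo(3, 2000))   // slow start, generous budget
	fmt.Println(runDemo(1000, 100)) // never ready within the budget
}
```

Raising the budget only helps when startup is slow rather than stuck, which is why a larger timeout absorbs contention but cannot fix a container that never binds its listener.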
@ethanndickson ethanndickson force-pushed the ethan/ai-provider-resource branch from 28abf98 to 4e28632 Compare June 23, 2026 07:27
@ethanndickson

Member Author

@codex review

re:

Accept Bedrock custom endpoints with default credentials
Honor full Bedrock base URLs without a region

Verified this against coder/coder at current main — the premise is correct, but loosening only the provider here wouldn't actually unblock proxy deployments, so I'm holding off.

The provider is faithfully mirroring the server's create gate. CreateAIProviderRequest.Validate() in codersdk/aiproviders.go rejects type=bedrock when req.Settings.Bedrock == nil || !req.Settings.Bedrock.IsConfigured() — and IsConfigured() only checks region + credentials, ignoring base_url, exactly like this check. The create handler (coderd/ai_providers.go) calls that same Validate(), so a base-URL-only Bedrock provider (AWS SDK default credential chain, no region) currently returns an HTTP 400 from Coder regardless of what Terraform does.

The base-URL-aware IsBedrockConfigured(baseURL, settings) does exist, but it's only used by legacy env-var seeding (cli/server.go) and migration detection (coderd/ai_providers_migrate.go) — never in the create/update API validation path.

So if I switch this check to be base-URL-aware in isolation, the apply still fails, just with a server-side 400 instead of a clean client-side diagnostic, which is worse UX. This is an inconsistency in Coder itself between the Bedrock docs and CreateAIProviderRequest.Validate(). The right fix is upstream: have CreateAIProviderRequest.Validate() use IsBedrockConfigured(baseURL, ...) (or correct the docs). Once the API accepts base-URL-only Bedrock, I'll mirror it here.
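To make the mismatch concrete, here is a simplified model of the two checks. The struct and function names are hypothetical and the exact field logic is an assumption; the real implementations live in coder/coder and may differ in detail.

```go
package main

import "fmt"

// bedrockSettings is a hypothetical, trimmed-down settings shape.
type bedrockSettings struct {
	Region          string
	AccessKeyID     string
	SecretAccessKey string
}

// configured models the create gate described above: it looks only at
// region and credentials and never sees the base URL.
func (s *bedrockSettings) configured() bool {
	if s == nil {
		return false
	}
	return s.Region != "" || (s.AccessKeyID != "" && s.SecretAccessKey != "")
}

// configuredWithBaseURL models the looser, base-URL-aware variant.
func configuredWithBaseURL(baseURL string, s *bedrockSettings) bool {
	return baseURL != "" || s.configured()
}

func main() {
	// Proxy deployment: custom base URL, no region, default credential chain.
	proxyOnly := &bedrockSettings{}
	fmt.Println(proxyOnly.configured())
	fmt.Println(configuredWithBaseURL("https://bedrock-proxy.example.com", proxyOnly))
}
```

In this model the proxy-only configuration fails the first check and passes the second, which is the gap that would need to close on the server before the provider can relax its own validation.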

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4e28632a3c

DisplayName types.String `tfsdk:"display_name"`
Enabled types.Bool `tfsdk:"enabled"`
BaseURL types.String `tfsdk:"base_url"`
APIKeyWO types.String `tfsdk:"api_key_wo"`

P2: Defer validation when settings is unknown

When settings (or the nested bedrock object) is supplied from an input variable, it is unknown during ValidateResourceConfig, but this model decodes it into Go pointers that cannot represent an unknown object. In that case req.Config.Get can raise a value-conversion diagnostic before the leaf-level unknown checks run, or data.bedrock() can look nil and emit Missing Bedrock Settings, so otherwise valid module configurations such as settings = var.ai_provider_settings fail terraform validate/plan; represent this layer as a types.Object or explicitly defer when the object is unknown.
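The false positive described in this comment can be modelled in a few lines. The types below are simplified stand-ins (not the plugin framework's types), and the model covers only the "looks nil" branch, not the value-conversion diagnostic:

```go
package main

import "fmt"

type objState int

const (
	objNull objState = iota
	objUnknown
	objKnown
)

// settingsObject is a hypothetical stand-in for an object-typed attribute.
type settingsObject struct {
	state  objState
	region string
}

type bedrockModel struct{ Region string }

// toPointer mimics decoding into a Go pointer: null and unknown both
// collapse to nil, so the distinction is lost.
func toPointer(o settingsObject) *bedrockModel {
	if o.state != objKnown {
		return nil
	}
	return &bedrockModel{Region: o.region}
}

// validatePointer cannot tell "absent" from "not known yet".
func validatePointer(o settingsObject) string {
	if toPointer(o) == nil {
		return "Missing Bedrock Settings"
	}
	return ""
}

// validateObject inspects the object state first and defers on unknown.
func validateObject(o settingsObject) (diag string, deferred bool) {
	switch o.state {
	case objUnknown:
		return "", true
	case objNull:
		return "Missing Bedrock Settings", false
	}
	return "", false
}

func main() {
	fromVariable := settingsObject{state: objUnknown}
	fmt.Println(validatePointer(fromVariable))
	fmt.Println(validateObject(fromVariable))
}
```

Keeping the object-level state around is what lets validation defer for settings = var.ai_provider_settings instead of reporting the block as missing.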

@ethanndickson ethanndickson self-assigned this Jun 23, 2026
Run the acceptance suite with -parallel=1 so only one Coder container
boots at a time. If the readiness timeouts disappear, the failures are
caused by resource contention from concurrent container startup on the
2-core hosted runner.

When an acceptance test fails (e.g. the readiness timeout in StartCoder),
fetch the Coder container's stdout/stderr via the Docker API and emit it
through t.Logf so the coderd startup output is visible in CI. This is how
we surface why containers never become ready under CI load. The cleanup is
registered after the container-removal cleanup so it runs first (LIFO),
while the container still exists.
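
The ordering argument relies on testing.T.Cleanup running callbacks last-added, first-called. The sketch below shows that ordering with a small fake so it runs outside go test; startContainer, fakeT, and cleanupOrder are hypothetical names, and the real cleanup additionally dumps logs only when the test has failed.

```go
package main

import "fmt"

// cleaner is the subset of testing.TB used here; *testing.T satisfies it.
type cleaner interface {
	Cleanup(func())
}

// startContainer is a hypothetical harness step that only records events.
func startContainer(t cleaner, events *[]string) {
	// Registered first, so it runs last.
	t.Cleanup(func() { *events = append(*events, "remove container") })
	// Registered second, so it runs first, while the container still exists.
	t.Cleanup(func() { *events = append(*events, "dump container logs") })
}

// fakeT imitates the last-in, first-out order of testing.T.Cleanup.
type fakeT struct{ fns []func() }

func (f *fakeT) Cleanup(fn func()) { f.fns = append(f.fns, fn) }

// finish runs the registered callbacks in reverse registration order.
func (f *fakeT) finish() {
	for i := len(f.fns) - 1; i >= 0; i-- {
		f.fns[i]()
	}
}

// cleanupOrder runs the registration against fakeT and returns the order
// in which the callbacks fired.
func cleanupOrder() []string {
	var events []string
	t := &fakeT{}
	startContainer(t, &events)
	t.finish()
	return events
}

func main() {
	fmt.Println(cleanupOrder())
}
```

The log dump is recorded before the removal, matching the requirement that logs be fetched while the container is still present.
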
Fix errcheck lint failure by handling the error returned from closing the
container log reader, matching the existing puller.Close pattern.

The serial -parallel=1 experiment showed the readiness failures are not
caused by Go test parallelism alone, so restore the workflow to the
stock go test parallelism while continuing to investigate the startup
failures themselves.

The acceptance harness boots Coder without CODER_PG_CONNECTION_URL, so
Coder falls back to its embedded PostgreSQL. The upstream Coder image
doesn't bundle the Postgres binary, so on every startup Coder downloads
the zonky.io embedded-postgres jar from Maven Central. Shared CI egress
IPs get rate-limited by Cloudflare (which fronts repo.maven.apache.org),
and a single non-200 response reds the whole lane with "coder failed to
become ready in time".

Bake the binary into a derived image so Coder skips the Maven download
entirely (embedded-postgres skips both the fetch and the decompress when
<binariesPath>/bin/pg_ctl exists). A new Dockerfile fetches the jar once
at build time with retries and extracts it to the path Coder stats. The
test job builds this image and points the harness at it via CODER_IMAGE
/CODER_VERSION; the harness now falls back to a locally-built image when
the registry pull fails.
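
The skip condition the derived image relies on amounts to a file-existence check. A rough sketch, using a hypothetical needsDownload helper rather than the embedded-postgres library's own code:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// needsDownload reports whether bin/pg_ctl is missing under binariesPath.
// It is an illustrative helper, not the embedded-postgres implementation.
func needsDownload(binariesPath string) bool {
	_, err := os.Stat(filepath.Join(binariesPath, "bin", "pg_ctl"))
	return err != nil
}

// demo checks a temp dir before and after a placeholder pg_ctl is created.
func demo() (before, after bool, err error) {
	dir, err := os.MkdirTemp("", "pgbin")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(dir)

	before = needsDownload(dir)
	if err := os.MkdirAll(filepath.Join(dir, "bin"), 0o755); err != nil {
		return false, false, err
	}
	// A placeholder file stands in for the real binary baked into the image.
	if err := os.WriteFile(filepath.Join(dir, "bin", "pg_ctl"), []byte("#!/bin/sh\n"), 0o755); err != nil {
		return false, false, err
	}
	after = needsDownload(dir)
	return before, after, nil
}

func main() {
	fmt.Println(demo())
}
```

Extracting the jar to the expected path at image build time makes the check succeed on every container start, so no network fetch happens at test time.
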
The CODER_IMAGE/CODER_VERSION env vars (set in CI to point at the derived
embedded-PostgreSQL image) were clobbering explicit per-test version pins.
Back-compat tests that pin an older Coder version (e.g. v2.25.0, v2.29.5)
were therefore running against the latest-derived image, which broke
assertions like TestCheckNoResourceAttr("cors_behavior") since newer Coder
defaults that attribute.

Guard the env override so it only applies when a test left the image and
version at the built-in defaults. The derived image is a substitute for the
default latest boot only (it exists solely at :local and is built FROM
:latest), so version-pinned tests keep using the upstream registry image at
their requested version.
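
A minimal sketch of that guard, assuming a hypothetical resolveImage helper and a placeholder default image name (the harness's real defaults are not shown in this conversation):

```go
package main

import "fmt"

// Built-in defaults. The image name is a placeholder, not the harness's
// real value.
const (
	defaultImage   = "example.invalid/coder"
	defaultVersion = "latest"
)

// resolveImage applies CODER_IMAGE/CODER_VERSION only when the test left
// both image and version at the built-in defaults.
func resolveImage(image, version string, getenv func(string) string) (string, string) {
	if image != defaultImage || version != defaultVersion {
		return image, version // an explicit per-test pin wins
	}
	if v := getenv("CODER_IMAGE"); v != "" {
		image = v
	}
	if v := getenv("CODER_VERSION"); v != "" {
		version = v
	}
	return image, version
}

func main() {
	env := map[string]string{"CODER_IMAGE": "coder-embedded-pg", "CODER_VERSION": "local"}
	getenv := func(k string) string { return env[k] }
	fmt.Println(resolveImage(defaultImage, defaultVersion, getenv)) // default boot
	fmt.Println(resolveImage(defaultImage, "v2.25.0", getenv))      // pinned test
}
```

Passing the environment lookup as a function keeps the helper testable; production code would pass os.Getenv.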