AIR CLI Integration: Adding support for air run configuration by riddhibhagwat-db · Pull Request #5657 · databricks/cli

riddhibhagwat-db · 2026-06-18T18:42:42Z

Changes

Ports the air run YAML config schema and its structural validation from the Python CLI (cli/sdk/config.py) to Go, under experimental/air/cmd/.

Schema (runconfig.go): the top-level runConfig plus the nested environment (with docker_image), code_source/snapshot/git, and permission blocks. Reuses the compute model from the parent branch. Includes custom YAML unmarshalers for the three polymorphic fields that don't map to a single Go type: environment.dependencies (string path or inline list), environment.version (string or int), and git.remote (bool or remote-name string).
Loader (runconfig_load.go): loadRunConfig decodes a YAML file with KnownFields(true) — mirroring pydantic's extra="forbid" so unknown keys are rejected — then runs the validation pass.
Validation: every structural rule from the Python schema — required fields, the experiment_name/mlflow_run_name task-key regex and length caps, secret-ref scope/key format, the environment docker-image/dependencies/version exclusivity rules, git branch-xor-commit and remote-requires-branch rules, code_source snapshot requirements, and include_paths relative/no-traversal checks.

Two deliberate divergences from the Python schema, both following from the training-service-only port:

The compute.node_pool_id / compute.pool_name fields were already dropped on the parent branch.
The top-level priority field is dropped here: it's a node-pool queue-ordering knob (it requires a pool in Python) with no meaning for serverless workloads.

Why

"Structural" validation (types, required fields, format/cross-field rules) needs no workspace access, so it's a self-contained, fully unit-testable unit that's worth landing on its own ahead of the launch logic. Splitting it out keeps the upcoming handle_run PR focused on orchestration rather than mixing in ~900 lines of schema.

The extra="forbid" / KnownFields behavior is load-bearing: it's what turns a typo'd or stale config key into an actionable error instead of a silently-ignored field, so it's preserved faithfully. This is stacked on air-integration-m2-1 (the compute model).

Tests

New unit tests in runconfig_test.go (62 subtests, table-driven), covering:

Loading a minimal config and a full-featured config (all blocks populated).
Each polymorphic union decoding both of its forms (dependencies string vs list, git.remote bool vs string, default-unset).
Unknown-field rejection at top level and nested — including explicit cases asserting the dropped priority field and the not-yet-ported bases key surface as errors.
Every validation rule's failure mode, plus file-level errors (missing file, empty file).

go test ./experimental/air/... passes; ./task lint-q reports 0 issues.

Add the experimental `air` command group as the Go port surface for the Python `air` CLI. Every subcommand (run, status, list, logs, cancel, register-image) is registered as a stub that returns a not-implemented error; the real implementations land in later milestones. The package lives under experimental/air/cmd (imported as aircmd), matching the layout of the other experimental features (aitools, genie, postgres); cmd/experimental/ keeps only the dispatcher. TEST_PACKAGES in Taskfile.yml gains ./experimental/air/... so the unit tests keep running after the move. Includes unit tests for the command-tree wiring and the not-implemented stubs, plus an acceptance test exercising the stubs end-to-end. Co-authored-by: Isaac

Rename the run-details subcommand from `status` to `get`, matching the Python air CLI's current `air get run` naming (it replaced `get status`). Renames the file, constructor, command name, and updates the stub/help/unimplemented tests and goldens accordingly. Co-authored-by: Isaac

Implement the read-only run-details command (renamed from `status` to `get`). It fetches a job run via the Jobs API and renders the run's status, start time, duration, retries, experiment, accelerators, dashboard URL, MLflow deep-link, and a foreach/sweep summary. Output is the air-style {v, ts, data} JSON envelope under -o json, or a text view. Renames the command-level identifiers (status -> get) while keeping the run's "status" field/label. Adds format/mlflow/sweep/output helpers with unit tests and an acceptance test, and drops `get` from the not-implemented stub coverage. Co-authored-by: Isaac

Co-authored-by: Isaac

Add compute.go: the gpuType model and compute-block validation the upcoming `air run` config layer depends on. Defines the canonical GPU_* accelerator types, parseGPUType (exact, case-sensitive), gpusPerNode (partition counts), and computeConfig.validate (positive count, multiple-of-per-node, mutually exclusive node_pool_id/pool_name). Co-authored-by: Isaac

The training compute config no longer supports pool placement, so remove the node_pool_id and pool_name fields and the validation that rejected setting both. Co-authored-by: Isaac

Port the run YAML schema and its structural validation from the Python CLI's sdk/config.py: the top-level runConfig plus the environment, docker_image, code_source/snapshot/git, and permission blocks. loadRunConfig decodes a YAML file with KnownFields (mirroring pydantic extra="forbid") and runs the validation pass. "Structural" covers types, required fields, and format/cross-field rules that need no workspace access. Online checks (compute pool resolution, GPU availability), git/filesystem checks, _bases_ composition, and CLI --override handling are deferred to later milestones. Two deliberate divergences from the Python schema, both following from the training-service-only port: the compute pool fields were already dropped, and the top-level priority field is dropped here since it is a node-pool queue-ordering knob with no meaning for serverless workloads. Co-authored-by: Isaac

github-actions · 2026-06-18T18:44:32Z

Waiting for approval

Could not determine reviewers from git history.
Round-robin suggestion: @apeforest

Eligible reviewers: @apeforest, @ben-hansen-db, @bfontain, @lu-wang-dl, @maggiewang-db, @panchalhp-db, @pardis-beikzadeh-db, @vinchenzo-db

_{Suggestions based on git history. See OWNERS for ownership rules.}

eng-dev-ecosystem-bot · 2026-06-18T19:13:09Z

Integration test report

Commit: 73088e7

Run: 27781953252

	Env	🟨KNOWN	💚RECOVERED	🙈SKIP	✅pass	🙈skip	Time
🟨	aws linux	7		13	264	1014	7:04
🟨	aws windows	7		13	266	1012	8:27
💚	aws-ucws linux		7	13	360	928	6:14
💚	aws-ucws windows		7	13	362	926	7:20
💚	azure linux		1	15	267	1012	5:47
💚	azure windows		1	15	269	1010	5:30
💚	azure-ucws linux		1	15	365	924	6:59
💚	azure-ucws windows		1	15	367	922	7:57
💚	gcp linux		1	15	263	1015	6:53
💚	gcp windows		1	15	265	1013	8:06

20 interesting tests: 13 SKIP, 7 KNOWN

	Test Name	aws linux	aws windows	aws-ucws linux	aws-ucws windows	azure linux	azure windows	azure-ucws linux	azure-ucws windows	gcp linux	gcp windows
🟨	TestAccept	🟨K	🟨K	💚R	💚R	💚R	💚R	💚R	💚R	💚R	💚R
🙈	TestAccept/bundle/invariant/no_drift	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S
🙈	TestAccept/bundle/resources/permissions	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S
🟨	TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions	🟨K	🟨K	💚R	💚R	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S
🟨	TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions/DATABRICKS_BUNDLE_ENGINE=direct	🟨K	🟨K	💚R	💚R
🟨	TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions/DATABRICKS_BUNDLE_ENGINE=terraform	🟨K	🟨K	💚R	💚R
🟨	TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions	🟨K	🟨K	💚R	💚R	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S
🟨	TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions/DATABRICKS_BUNDLE_ENGINE=direct	🟨K	🟨K	💚R	💚R
🟨	TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions/DATABRICKS_BUNDLE_ENGINE=terraform	🟨K	🟨K	💚R	💚R
🙈	TestAccept/bundle/resources/postgres_branches/basic	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S
🙈	TestAccept/bundle/resources/postgres_branches/recreate	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S
🙈	TestAccept/bundle/resources/postgres_branches/replace_existing	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S
🙈	TestAccept/bundle/resources/postgres_branches/update_protected	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S
🙈	TestAccept/bundle/resources/postgres_branches/without_branch_id	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S
🙈	TestAccept/bundle/resources/postgres_endpoints/basic	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S
🙈	TestAccept/bundle/resources/postgres_projects/update_display_name	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S
🙈	TestAccept/bundle/resources/synced_database_tables/basic	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S
🙈	TestAccept/bundle/resources/vector_search_endpoints/drift/recreated_same_name	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S
🙈	TestAccept/bundle/resources/vector_search_indexes/recreate/embedding_dimension	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S
🙈	TestAccept/ssh/connection	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S

Top 21 slowest tests (at least 2 minutes):

duration	env	testname
5:00	gcp windows	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
4:50	gcp windows	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
4:23	gcp linux	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
4:15	gcp linux	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:37	aws-ucws linux	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:28	aws windows	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:22	azure-ucws windows	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
3:01	aws linux	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:56	aws-ucws linux	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:56	azure-ucws linux	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:56	azure windows	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:51	aws windows	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:51	azure linux	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:49	aws linux	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:48	azure linux	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:47	aws-ucws windows	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:46	azure-ucws windows	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:33	azure windows	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:30	azure-ucws linux	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:23	gcp linux	TestSecretsPutSecretStringValue
2:22	aws-ucws windows	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct

riddhibhagwat-db added 8 commits June 12, 2026 20:40

experimental/air: rename stale TestBuildStatusData to TestBuildGetData

89042d0

Co-authored-by: Isaac

experimental/air: apply testifylint fixes in get/format tests

c99239c

Co-authored-by: Isaac

experimental/air: drop node pool / pool name compute fields

62be1a1

The training compute config no longer supports pool placement, so remove the node_pool_id and pool_name fields and the validation that rejected setting both. Co-authored-by: Isaac

Merge branch 'air-cli' into air-integration-m2-2

73088e7

riddhibhagwat-db temporarily deployed to test-trigger-is June 18, 2026 18:48 — with GitHub Actions Inactive

riddhibhagwat-db requested review from ben-hansen-db and maggiewang-db June 18, 2026 21:10

riddhibhagwat-db self-assigned this Jun 18, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

AIR CLI Integration: Adding support for air run configuration#5657

AIR CLI Integration: Adding support for air run configuration#5657
riddhibhagwat-db wants to merge 9 commits into
air-clifrom
air-integration-m2-2

riddhibhagwat-db commented Jun 18, 2026

Uh oh!

github-actions Bot commented Jun 18, 2026 •

edited

Loading

Uh oh!

eng-dev-ecosystem-bot commented Jun 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

riddhibhagwat-db commented Jun 18, 2026

Changes

Why

Tests

Uh oh!

github-actions Bot commented Jun 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Waiting for approval

Uh oh!

eng-dev-ecosystem-bot commented Jun 18, 2026

Integration test report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

github-actions Bot commented Jun 18, 2026 •

edited

Loading