
feat(spark): unify Databricks connector auth with PAT, OAuth M2M, and OIDC federation strategies #3429

Open

rohitrsh wants to merge 1 commit into flyteorg:master from rohitrsh:feat/databricks-m2m-oidc-auth

Conversation

@rohitrsh
Contributor

Tracking issue

Related to flyteorg/flyte#7319

Why are the changes needed?

Databricks has marked Personal Access Tokens (PATs) as a legacy authentication method and is steering customers toward OAuth machine-to-machine (M2M) credentials and OIDC workload-identity federation. Today the Flyte Databricks connector only supports PAT, which means operators who need to follow that guidance, or whose workspaces restrict PATs, have no supported alternative.

This PR brings OAuth M2M and OIDC federation to the connector while preserving the existing per-namespace tenancy story, and refactors the existing PAT path so all four modes share one extension point.

What changes were proposed in this pull request?

A unified DatabricksAuth strategy abstraction in a new databricks_auth.py module, plus three new strategies and a refactor of the existing PAT path:

| Strategy | What it uses | Where credentials live |
| --- | --- | --- |
| PATAuth | Personal Access Token (existing flow) | databricks-token k8s secret in the workflow namespace, with FLYTE_DATABRICKS_ACCESS_TOKEN fallback |
| OAuthM2MAuth | Service Principal client_id + client_secret | databricks-oauth k8s secret in the workflow namespace, with env-var fallback |
| OIDCConnectorIRSAAuth (Model 1) | The connector pod's own projected JWT (e.g. EKS IRSA) | Federation policy in Databricks bound to the connector SA |
| OIDCNamespaceSAAuth (Model 2) | Per-workflow-namespace ServiceAccount JWT minted via the Kubernetes TokenRequest API | Federation policy in Databricks bound to the workflow SA, discovered from SA labels and annotations |

Each strategy implements auth_type, get_bearer_token(session), invalidate_cache(), and describe(). select_auth(...) resolves task config -> connector env var -> auto-detect; build_auth(...) reconstructs the strategy from DatabricksJobMetadata so long-running jobs can refresh their token transparently on a 401 response.
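To make the extension point concrete, here is a minimal sketch of the interface using the method names listed above; the exact signatures, return types, sync-vs-async style, and resolver arguments in databricks_auth.py are assumptions for illustration only.

```python
# Illustrative sketch: method names follow the PR description, but signatures,
# return types, and resolver arguments are assumptions, not the real module.
from abc import ABC, abstractmethod


class DatabricksAuth(ABC):
    """Single extension point shared by PAT, OAuth M2M, and both OIDC modes."""

    @property
    @abstractmethod
    def auth_type(self) -> str:
        """Short identifier, e.g. 'pat', 'oauth_m2m', or 'oidc_federation'."""

    @abstractmethod
    def get_bearer_token(self, session) -> str:
        """Return a Databricks bearer token, typically served from a per-strategy cache."""

    @abstractmethod
    def invalidate_cache(self) -> None:
        """Drop any cached token so the next call fetches a fresh one (used after a 401)."""

    @abstractmethod
    def describe(self) -> str:
        """Human-readable summary for logs that never includes secret material."""


def select_auth(task_custom: dict, env: dict) -> DatabricksAuth:
    """Resolve the strategy in order: task config, then connector env var, then auto-detect."""
    raise NotImplementedError  # placeholder; the real resolver lives in databricks_auth.py
```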

Important note for reviewers: PAT is being refactored

This PR is more than a feature addition. It refactors the PAT support that landed via #3394 from a direct call into get_databricks_token(...) into a PATAuth strategy that lives alongside the new modes. The behaviour, env vars, custom-secret-name task field, and cross-namespace lookup semantics are preserved end to end (new tests cover all of those scenarios), but the code path now flows through select_auth -> PATAuth.get_bearer_token -> get_databricks_token. If you reviewed #3394 you may want to read connector.py and the new tests/test_databricks_token.py with this lens; the goal is one extension point for all four modes instead of two.

Files changed

  • plugins/flytekit-spark/flytekitplugins/spark/databricks_auth.py (new): strategy abstraction, four strategies, select_auth, build_auth, token cache, OIDC discovery cache, validators.
  • plugins/flytekit-spark/flytekitplugins/spark/connector.py: DatabricksJobMetadata carries auth context for refresh; create calls select_auth; get and delete go through a new _request_with_auth helper that refreshes on 401 (sketched after this list) and falls back to the stored token for older job metadata.
  • plugins/flytekit-spark/flytekitplugins/spark/task.py: DatabricksV2 gains optional per-task overrides (databricks_auth_type, databricks_client_id, databricks_oauth_secret, databricks_oidc_token_file, databricks_oidc_audience). All optional, all backward compatible.
  • plugins/flytekit-spark/tests/test_databricks_auth.py (new): 56 tests covering every strategy, the resolver, auto-detection, fallback rules, and the OIDC Model 2 SA discovery cache.
  • plugins/flytekit-spark/tests/test_databricks_token.py: refactored to call into the strategy via select_auth while preserving the multi-tenant PAT scenarios from #3394 (Add multi-tenant Databricks token support via cross-namespace K8s secrets).
  • plugins/flytekit-spark/tests/test_connector.py: updated DatabricksJobMetadata constructions for the new fields.
  • plugins/flytekit-spark/README.md: full Authentication section, env-var matrix, RBAC manifest for OIDC Model 2, migration guide from PAT.
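For reviewers skimming connector.py, the refresh-on-401 behaviour mentioned above is roughly the following; this is a hedged, synchronous requests-style sketch, whereas the real _request_with_auth may be async and also handles the fallback to the token stored in older DatabricksJobMetadata, which is omitted here.

```python
# Hedged sketch of the refresh-on-401 flow; not the connector's actual helper.
import requests


def request_with_auth(session: requests.Session, method: str, url: str, auth) -> requests.Response:
    """Send a Databricks REST call, refreshing the bearer token once on HTTP 401."""
    token = auth.get_bearer_token(session)
    resp = session.request(method, url, headers={"Authorization": f"Bearer {token}"})
    if resp.status_code == 401:
        # Long-running jobs can outlive an OAuth/OIDC token: drop the cache and retry once.
        auth.invalidate_cache()
        token = auth.get_bearer_token(session)
        resp = session.request(method, url, headers={"Authorization": f"Bearer {token}"})
    return resp
```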

Zero workflow-code changes

A core constraint of this PR is that existing Databricks workflows keep working without edits. Operators flip auth modes by setting connector-level env vars; task authors only touch DatabricksV2 fields if a single workflow needs to diverge from the connector default.
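As an illustration of that per-task escape hatch, a single workflow could override the connector default roughly as below; the cluster settings and identifiers are placeholders, and the databricks_* auth fields are the new optional ones introduced by this PR rather than released flytekit API.

```python
# Hypothetical per-task override using the optional fields added in this PR;
# values are placeholders, not working configuration.
from flytekit import task
from flytekitplugins.spark import DatabricksV2


@task(
    task_config=DatabricksV2(
        spark_conf={"spark.driver.memory": "2g"},
        databricks_instance="my-workspace.cloud.databricks.com",
        databricks_conf={
            "run_name": "flyte-demo",
            "new_cluster": {
                "spark_version": "13.3.x-scala2.12",
                "node_type_id": "m5.xlarge",
                "num_workers": 2,
            },
        },
        # New optional fields from this PR; omit them to keep the connector default.
        databricks_auth_type="oauth_m2m",
        databricks_client_id="<service-principal-client-id>",
        databricks_oauth_secret="databricks-oauth",
    )
)
def my_spark_job() -> None:
    ...
```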

How was this patch tested?

Unit tests

$ cd plugins/flytekit-spark
$ pytest tests/test_databricks_auth.py tests/test_databricks_token.py tests/test_connector.py
============================ 100 passed in 3.04s ============================

The full plugin suite (pytest tests/ --ignore=tests/test_remote_register.py) shows 119 passed. The 5 failures in test_wf.py and test_pyspark_transformers.py are a pre-existing PySpark + Java 18+ compatibility issue (Exception: getSubject is not supported) on the local dev machine running Java 25; they pass on upstream CI which uses Java 11/17. None of those tests touch the files in this PR.

End-to-end on a development EKS cluster

Each auth mode was exercised against a real Databricks workspace by patching the connector deployment with an image built from this branch:

  • pat: per-namespace databricks-token k8s secret. Existing path.
  • oauth_m2m: per-namespace databricks-oauth k8s secret with client_id and client_secret keys, authenticating as a Databricks Service Principal.
  • oidc_federation Model 2: per-workflow-namespace ServiceAccount annotated with flyte.org/databricks-enabled, flyte.org/databricks-client-id, and flyte.org/databricks-audience. The connector minted a JWT via the Kubernetes TokenRequest API and exchanged it for a Databricks bearer token (a hedged sketch of that exchange follows this list), and the resulting job run in the Databricks console was attributed to the per-namespace Service Principal, verifying that Unity Catalog tenancy is preserved.
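For context on that Model 2 exchange, the flow is essentially the standard OAuth 2.0 token exchange; the endpoint path, scope, and parameter names below are assumptions based on Databricks' documented OAuth endpoints, not code from this PR, and minting the ServiceAccount JWT via TokenRequest is not shown.

```python
# Hedged sketch of exchanging a Kubernetes ServiceAccount JWT for a Databricks
# bearer token via OAuth 2.0 token exchange; endpoint and parameters are assumptions.
import requests


def exchange_sa_jwt(workspace_url: str, sa_jwt: str) -> str:
    """Trade a projected ServiceAccount JWT for a Databricks access token."""
    resp = requests.post(
        f"{workspace_url}/oidc/v1/token",
        data={
            "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
            "subject_token_type": "urn:ietf:params:oauth:token-type:jwt",
            "subject_token": sa_jwt,
            "scope": "all-apis",
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["access_token"]
```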

Pre-commit

$ pre-commit run --files <the seven changed files>
ruff..................Passed
ruff-format...........Passed
codespell.............Passed
pydoclint.............Passed

pydoclint-errors-baseline.txt was not modified by this PR.

Setup process

The plugin README's Authentication section now documents the full operator setup, including the RBAC manifest for OIDC Model 2 and a migration guide from PAT. No infra changes are required for existing PAT users; the connector defaults still resolve PAT first.

Screenshots

n/a

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

Related PRs

Docs link

The plugin README in this PR (plugins/flytekit-spark/README.md) is the documentation; there is no separate docs site change.

feat(spark): unify Databricks connector auth with PAT, OAuth M2M, and OIDC federation strategies

Databricks has marked Personal Access Tokens (PATs) as a legacy auth
method (https://docs.databricks.com/aws/en/dev-tools/auth/pat) and is
steering customers toward OAuth machine-to-machine (M2M) and OIDC
workload-identity federation. This change brings both modern auth modes
to the Flyte Databricks connector and refactors the existing PAT support
into a shared strategy module so all four modes resolve identically.

What is added:

* OAuth M2M (client credentials) using a per-namespace 'databricks-oauth'
  K8s secret with operator-level fallbacks via env vars.
* OIDC federation, Model 1: the connector pod's own projected JWT
  (e.g. EKS IRSA) is exchanged for a Databricks bearer token.
* OIDC federation, Model 2: per-workflow-namespace ServiceAccount discovery
  driven by labels and annotations on the SA. The connector mints a JWT via
  the Kubernetes TokenRequest API and exchanges it for a Databricks token.
  This preserves the existing per-namespace tenancy model that PAT
  customers rely on for Unity Catalog access.
* A unified DatabricksAuth strategy abstraction in 'databricks_auth.py'
  with auto-detection, per-strategy token caching, and token refresh on
  401 responses for long-running jobs.

What changes for existing PAT users:

* This PR refactors the PAT support that was added in
  flyteorg#3394 from a direct function call into a 'PATAuth'
  strategy that lives alongside the new modes. The behaviour, env vars,
  and per-namespace 'databricks-token' lookup are preserved end-to-end.
  Reviewers may want to read 'connector.py' and the new tests with this
  refactor in mind: PAT now flows through the same 'select_auth' resolver
  as the new modes so we have one extension point instead of two.
* Workflow code is unchanged. 'DatabricksV2' gains optional override
  fields for power users, but existing tasks keep working without edits.

Validation:

* 'pytest plugins/flytekit-spark/tests/test_databricks_auth.py
   plugins/flytekit-spark/tests/test_databricks_token.py
   plugins/flytekit-spark/tests/test_connector.py' passes (100 tests).
* End-to-end tested on an EKS test cluster against a real Databricks
  workspace for PAT, OAuth M2M, and OIDC Model 2.
* Pre-commit (ruff, ruff-format, codespell, pydoclint) clean on the
  changed plugin files.

Tracking: flyteorg/flyte#7319

Related:

* flyteorg#3394 (PAT multi-tenancy, refactored here)
* flyteorg#3392 (Databricks Serverless compute)
* flyteorg/flyte#6911 (original PAT multi-tenancy issue)

Signed-off-by: Rohit Sharma <rohitrsh@gmail.com>
@codecov

codecov Bot commented Apr 30, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.19%. Comparing base (39d4a9f) to head (ba19fbf).
⚠️ Report is 1 commit behind head on master.

❗ There is a different number of reports uploaded between BASE (39d4a9f) and HEAD (ba19fbf). Click for more details.

HEAD has 104 fewer uploads than BASE.

| Flag | BASE (39d4a9f) | HEAD (ba19fbf) |
| --- | --- | --- |
|  | 105 | 1 |
Additional details and impacted files
@@             Coverage Diff             @@
##           master    #3429       +/-   ##
===========================================
- Coverage   84.08%   73.19%   -10.90%     
===========================================
  Files         388      216      -172     
  Lines       31182    22855     -8327     
  Branches     3016     3016               
===========================================
- Hits        26220    16729     -9491     
- Misses       4082     5287     +1205     
+ Partials      880      839       -41     

☔ View full report in Codecov by Sentry.
