feat(registry): configurable driver registries via TOML config#360
feat(registry): configurable driver registries via TOML config#360
Conversation
5a8bf36 to
5ac1ddd
Compare
0813c82 to
79e6f5c
Compare
79e6f5c to
40e5ad2
Compare
amoeba
left a comment
There was a problem hiding this comment.
I did a quick code review over the new pieces and left some comments. Even if we don't have an immediate use case for this, I think having it in dbc is useful.
Some other more general thoughts:
- Would we want CLI integration for this? I think it would be useful for dbc to be able to do all the work to show/add/remove registries from the project and user config.
- I didn't see any tests for how
dbc installwill work with this. Presumably dbc install will respect the new fields in the user config. - Do we need to define a default priority order for registries to handle name conflicts? We have a lot of the same concerns as conda does with channels. I'm wishing for more tests in this PR for the interaction between the default, user, and project registries around name conflicts.
|
|
||
| func SetProjectRegistries(entries []RegistryEntry, replaceDefaults *bool) error { | ||
| if os.Getenv("DBC_BASE_URL") != "" { | ||
| return nil |
There was a problem hiding this comment.
I think we should warn the user here.
| return nil | ||
| } | ||
| if replaceDefaults != nil && *replaceDefaults && len(entries) == 0 { | ||
| return fmt.Errorf("replace_defaults = true requires at least one [[registries]] entry; omit replace_defaults or add a registry entry") |
There was a problem hiding this comment.
I feel like this error would be better surfaced earlier, such as when we initially parse config.toml or dbc.toml. How this is now, I think this would error like this,
error configuring project registries: replace_defaults = true requires at least one [[registries]] entry; omit replace_defaults or add a registry entry
As a user, it's unclear what I should do to fix it. Which file is my problem in?
If we validated this earlier, I bet we'd have enough context to know which file has the issue and we could tell them.
| } | ||
| for _, e := range entries { | ||
| if e.URL == "" { | ||
| return fmt.Errorf("registry entry has empty url") |
There was a problem hiding this comment.
Are these checks needed or is this the best place to handle this? I see loadGlobalConfig already does validation and that seems like a better place. i.e., I much prefer being able to know that any []RegistryEntry I might get is always valid so I'd rather error before creating a RegistryEntry.
| }) | ||
| } | ||
|
|
||
| func boolPtr(b bool) *bool { return &b } |
There was a problem hiding this comment.
is this the best place for this def?
| } | ||
|
|
||
| if configDir, err := internal.GetUserConfigPath(); err != nil { | ||
| fmt.Fprintf(os.Stderr, "warning: failed to locate config directory: %v\n", err) |
There was a problem hiding this comment.
Do we really want to emit a warning? Won't this warn if the folder isn't found? Seems like we should be okay with that.
The warning when the registry config fails to load once found is fine.
|
|
||
| # Sisyphus workflow artifacts (plans, evidence, notepads) | ||
| .sisyphus/ |
There was a problem hiding this comment.
is this just a tool you use? can we revert this?
| if configDir, err := internal.GetUserConfigPath(); err != nil { | ||
| fmt.Fprintf(os.Stderr, "warning: failed to locate config directory: %v\n", err) | ||
| } else if err := dbc.ConfigureRegistries(configDir); err != nil { | ||
| fmt.Fprintf(os.Stderr, "warning: failed to load registry config: %v\n", err) |
There was a problem hiding this comment.
I don't think the error here includes the file with the issue. Whenever we show an error about a TOML file, we need to show the user the path to it.
| if u.Scheme != "http" && u.Scheme != "https" { | ||
| return Registry{}, false |
There was a problem hiding this comment.
Is the validation in here even needed? In the parent call, we already error when a RegistryEntry is invalid.
Introduce RegistryEntry, GlobalConfig, LoadGlobalConfig, WithGlobalConfig, and WithProjectRegistries. NewClient now merges project, global, and default registries (project wins, then global, then defaults) with deduplication by scheme+host+path. WithBaseURL still overrides all other registry sources. Validation (empty URL, missing host, non-http scheme, replace_defaults=true with no entries) runs at NewClient time so malformed configuration surfaces immediately rather than at first request.
Extend DriversList with omitempty Registries and ReplaceDefaults fields so projects can pin or override driver registries inline in dbc.toml. Existing dbc.toml files without those keys continue to decode unchanged. applyProjectRegistries rebuilds the process-wide dbc client with the merged registry list; callers (add, sync) invoke it after decoding the list.
Load ~/.config/dbc/config.toml once at startup and thread it into the default client via WithGlobalConfig. Project commands (add, sync) rebuild the client with WithProjectRegistries after parsing dbc.toml so driver lookups use the merged registry list. DBC_BASE_URL continues to short circuit all registry merging. Tests cover project-registry wiring, invalid project URLs, and backwards-compatibility with dbc.toml files that omit the new fields.
a41a2ad to
5f8660a
Compare
Two issues from roborev review: - DriversList.ReplaceDefaults is now *bool so a project can explicitly set replace_defaults = false to override a global config's replace_defaults = true (previously collapsed into unset vs false). - NewClient now validates GlobalConfig.Registries URLs with the same path used for project registries, and rejects any configuration that produces an empty registry list (e.g. global replace_defaults = true with zero entries), so malformed config surfaces at NewClient time instead of at first request with silent zero-registry lookups. - LoadGlobalConfig applies the same no-entries check when decoding config.toml so CLI users see the error at startup.
…ries The existing TestAdd/TestSync project-registry tests route through testBaseModel, whose getDriverRegistry stub bypasses dbcClient entirely. Add direct coverage that swaps in a lookup routed to the real dbcClient against an httptest server, proving applyProjectRegistries genuinely rewires driver resolution to the merged registry list and that: - a project [[registries]] entry routes Search() to that server; - an explicit project replace_defaults=false restores built-in defaults even when the global config declared replace_defaults=true; - DBC_BASE_URL continues to short-circuit project registry overrides.
LoadGlobalConfig rejecting global replace_defaults=true with zero entries made the CLI diverge from NewClient: the CLI dropped the global config with only a warning, silently re-enabling the built-in defaults even when the project dbc.toml supplied the registries the user actually wanted. Remove that premature check; the post-merge empty-list guard in NewClient already catches the genuinely broken config (global replace_defaults=true with no entries AND no project entries), while letting the valid combination through. Adds both a library test and a CLI wiring test for the project-supplies-entries case.
Eager initDBCClient() in main() failed startup for the valid
'global replace_defaults=true + project supplies registries' case —
NewClient was called before the project dbc.toml could merge in.
Remove the eager call and rely on lazy init via getDriverRegistry /
initDBCClient, so project commands get a chance to rebuild the client
via applyProjectRegistries first.
Non-project commands (search, info, install) still fail fast on the
broken case (global replace_defaults=true with no entries and no
project override) because NewClient rejects the empty merge — now
surfaced at first use rather than startup.
auth.go's requestDeviceCode now ensures init before touching
dbcClient.HTTPClient(). dbcClientOnce is a pointer so tests can simulate
fresh startup.
Adds TestAddCmdEndToEndThroughRealClient which runs AddCmd.GetModel()
against an httptest registry — proves the production getDriverRegistry
closure routes through the rebuilt dbcClient. Adds TestStartupDeferred*
tests for the main() ordering invariants.
Also replaces 'switch { case err == ... }' with 'switch err' in main().
…in GetDriverList Two findings from roborev review of the branch: MEDIUM: OAuth device-code login ran through initDBCClient just to get an HTTPClient. A misconfigured global registry (e.g. replace_defaults=true with no entries) broke 'dbc auth login' — the very recovery path the user would run to fix the config. Add authHTTPClient() that builds a bare HTTP client without touching registry validation, and route device-code calls through it. LOW: GetDriverList decoded the new [[registries]] / replace_defaults fields but never called applyProjectRegistries, so library consumers using that helper silently ignored project registry overrides. Call it immediately after decoding. Regression tests pin both behaviors: - TestAuthHTTPClientDoesNotRequireRegistryConfig - TestGetDriverListHonorsProjectRegistries
Three findings from review of the branch: MEDIUM (1632): GetDriverList mutated the process-wide dbcClient via applyProjectRegistries, which leaked registry state across calls — a later invocation on a dbc.toml without [[registries]] would silently reuse the previous call's client. Switch GetDriverList to build a per-call dbc.Client scoped to the decoded file, so each call sees a fresh registry merge. Test exercises two sequential calls against two different servers to prove no leakage. MEDIUM (1630): NewClient's pre-merge guard rejected project replace_defaults=true with no project entries, even when the global config supplied registries. Remove the pre-merge guard; the post-merge empty-list check is the correct gate and lets 'drop defaults, keep globals' through. Test updated and new test pins the global-entries+project-replace_defaults case. LOW (1629): startup tests didn't exercise the real main() loading path. Extract loadStartupRegistryConfig() and add TestStartupEndToEndGlobalReplaceDefaultsWithProjectEntries that writes a real config.toml + dbc.toml, runs the full startup sequence, and drives AddCmd.GetModel() against an httptest server.
Addresses review 1633: MEDIUM: extract parseStartupArgs() and have main() use it, so the same config-load + argv-parse helper runs under test. The startup regression test now asserts at each step that dbcClient remains nil — if anyone reintroduces eager initDBCClient() between config load and subcommand dispatch the test fails at whichever step broke. LOW: update WithProjectRegistries doc comment to match the actual semantics: an error only when the MERGED registry list is empty. The old text claimed replace_defaults=true with no entries is always rejected, which contradicts the post-merge-only validation.
Addresses review 1634: the previous parseStartupArgs extraction only covered the parser bit. Anyone reintroducing an eager initDBCClient() between p.Parse() and GetModel() would still not have failed the test. Extract a single runStartup() that covers the full path main() runs: config load, argv parse, subcommand selection, and GetModel() for model commands. main() now calls runStartup() and switches on the returned startupKind for its exit cases. Tests drive runStartup() end-to-end, asserting dbcClient remains nil throughout. No functional change to main()'s behavior — same exit codes, same help and version paths, same subcommand dispatch.
Addresses review 1635: after the runStartup refactor, main() always
called runStartup(configDir, ...) even when internal.GetUserConfigPath
failed, leaving configDir = "". runStartup then forwarded that to
LoadGlobalConfig(""), which reads ./config.toml from the current
working directory. That changed registry resolution based on cwd — a
regression against the pre-refactor behavior that simply skipped the
load on this error path.
Guard runStartup to skip the load when configDir is empty. Document the
sentinel in the helper's doc comment. Add TestRunStartupSkipsLoadWhenConfigDirEmpty
that plants a hostile ./config.toml and asserts runStartup leaves
globalRegistryConfig nil.
Addresses review 1636: runStartup("", ...) skipped the load but did not
clear an already-primed globalRegistryConfig, so a second in-process
call would silently reuse the previous invocation's global registries.
Set globalRegistryConfig = nil explicitly in the empty-configDir branch
and add TestRunStartupClearsStaleGlobalConfigWhenConfigDirEmpty, which
primes the global before calling runStartup("") and asserts the
state is cleared.
Addresses review 1637: clearing only globalRegistryConfig was
insufficient because dbcClient is cached behind dbcClientOnce. A
repeat in-process runStartup would hit the already-run sync.Once and
reuse the prior invocation's client.
Introduce resetClientState() that clears dbcClient, dbcClientErr,
dbcClientOnce, and globalRegistryConfig, and call it at the top of
runStartup so every invocation starts from a clean slate regardless
of configDir. Replaces the narrower empty-configDir-only reset.
Strengthen the regression test to prime all three state paths
(globalRegistryConfig + dbcClient + a 'used' sync.Once) before
calling runStartup(""), and assert a subsequent initDBCClient()
produces a client whose registries do NOT include the stale URL.
Addresses review 1639: TestRunStartupClearsStaleClientState primed globalRegistryConfig, dbcClient, and dbcClientOnce but left dbcClientErr unseeded, so a regression that stopped resetting dbcClientErr in resetClientState() would still pass. Prime a sentinel error and assert runStartup clears it and a fresh initDBCClient() does not surface the sentinel.
Two MEDIUM findings from the full-branch review (job 1641): 1. NewClient regression: passing WithRegistries() alongside WithGlobalConfig/WithProjectRegistries caused the explicit list to be treated as the 'defaults' slice in the merge, prepending extra entries and generally not honoring the documented 'sets the driver registries to use' contract. Track explicitRegistries separately and skip the config-file merge when WithRegistries was passed. Added TestNewClientWithRegistryOptions/WithRegistries_takes_precedence... 2. cmd/dbc add lock ordering: the project lock wrapped the registry lookup, so a slow registry fetch could hold .dbc.project.lock for up to the lookup timeout. Concurrent dbc commands would hit the 10s lock timeout and fail with 'another dbc operation is in progress' even though no file mutation was happening. Restructure the add flow: decode + applyProjectRegistries + getDriverRegistry WITHOUT the lock, then acquire the lock only around the read-modify-write phase, re-reading the file under the lock so concurrent additions aren't clobbered. Added TestAddDoesNotHoldLockDuringRegistryLookup.
Addresses review 1645: the post-lock merge wrote the entire stale pre-lock m.list back onto the freshly re-read on-disk state. That could clobber a concurrent dbc add/remove that changed a different driver, and unconditionally overwrote any concurrent edits to [[registries]] / replace_defaults. Narrow the merge to only the driver entries this invocation actually added/replaced (iterating 'specs'), and leave current.Registries / current.ReplaceDefaults untouched so concurrent edits to those fields survive. Added TestAddPreservesConcurrentEdits which simulates a concurrent editor modifying the file mid-registry-lookup (new driver + [[registries]] addition) and asserts both the new driver and the concurrent changes are present after the add completes.
…faults Addresses review 1646: MEDIUM: a concurrent editor changing [[registries]] or replace_defaults between the unlocked driver lookup and the locked write-back phase could cause dbc add to write drivers that were validated against a stale registry set — later sync/install would fail against the new config. Under the lock, compare the re-read list's registry fields to the pre-lock snapshot via a new registriesChanged() helper. On drift, return a clear 'please retry' error rather than write inconsistent state. LOW: TestAddPreservesConcurrentEdits claimed to cover replace_defaults but the simulated concurrent content didn't set it. Split into two tests: one for concurrent driver-only edits (preserved) and one for registry-config drift (aborts). The drift test verifies the written file still contains the concurrent replace_defaults=true, the concurrent registry, and that the aborted add did NOT write its driver entry.
Addresses review 1648: the previous pattern — read-without-lock, lookup, reacquire-for-write — left a new torn-read race. A concurrent writer using os.Create (the writer path for dbc add/remove) could expose partial file contents to the unlocked initial decode, turning a valid concurrent operation into a spurious TOML parse error or a snapshot of torn state. Take the project lock briefly for the initial read, release it before the network lookup (so slow registry fetches still don't hold the lock), and reacquire for the final merge/write. The lock is never held across I/O that blocks on external services. TestAddDoesNotHoldLockDuringRegistryLookup still passes — the read lock is released before the stalled getDriverRegistry call begins.
Addresses review 1649 (LOW): the torn-read fix was uncovered by any regression test. Add TestAddInitialReadBlocksWhileLockHeld which holds the project lock externally and instruments getDriverRegistry to signal when add progressed past the initial read. With the read lock in place, add blocks before reaching the registry lookup and the test waits out a 300ms window before releasing the lock. Without the read lock fix, add reads the file and drives into getDriverRegistry while the lock is held, which the test detects via the channel signal and fails. Verified the test fails under a reverted fix (python-edited add.go into the pre-fix state) and passes on the current code.
Addresses review 1652 (LOW): the prior TestAddInitialReadBlocksWhileLockHeld test used a 300ms timeout on a registry-hook channel as the failure signal, which could spuriously pass on slow CI runners and didn't exercise the torn-read failure mode directly. Add TestAddInitialReadIsAtomicAgainstTornState: write invalidTOML while holding the project lock, start dbc add, assert add does NOT surface a decode error (it's blocked on the lock), then restore validTOML and release. If the initial read skips the lock, add reads the invalid TOML and the channel yields a TOML decode error, which the test detects. Verified: under a reverted fix (initial read without the lock) this test fails with 'toml: expected character ='. Under the current code the test passes — add blocks on the lock until valid content is restored.
Addresses review 1654 (LOW): registriesChanged compared whole RegistryEntry structs, so a concurrent edit that only renamed a registry would false-positive as 'registry configuration changed' and abort dbc add even though the URL resolution is identical. Equivalent URL spellings (trailing slash, host casing) would also false-positive. Normalize URLs (lowercase scheme/host, strip trailing path slash) and ignore Name when comparing. Documented the intent in the helper's doc comment and added TestRegistriesChanged covering name-only changes, trailing-slash variation, case differences, and tri-state replace_defaults. The 1654 MEDIUM on 'project commands rely on process-global state' is intentionally left as-is: cmd/dbc is a single-command-per-process binary, resetClientState() at the top of runStartup clears all cached state at the start of every run, and in-process reuse is a test-only concern that the test harness handles. Adding Once/Err resets to setDBCClient would be defensive code for an unreachable path.
…tion Addresses review 1657: normalizeRegistryURL dropped query, userinfo, fragment, and full path — so two URLs like 'https://r.example.com/?tenant=a' and 'https://r.example.com/?tenant=b' compared equal. That could hide a real registry config change from the dbc add drift check, letting it write a driver resolved against different registry settings than the final dbc.toml. Rebuild the URL via url.URL.String() after lowercasing the scheme and host and trimming a trailing path slash — only the truly no-op differences. Query, userinfo, fragment, and interior path segments are preserved. Added regression tests for tenant-selector query, userinfo, and path-segment changes all comparing unequal.
Addresses review 1658 (LOW): fragments aren't sent on HTTP requests, so two registry URLs that only differ in fragment resolve to the same endpoint. The prior normalizeRegistryURL preserved fragments, which meant a cosmetic '#a' -> '#b' edit triggered the config-drift abort in dbc add. Clear Fragment and RawFragment before serializing; added a regression test pinning that fragment-only changes compare equal.
…ntics Addresses review 1660: mergeRegistries deduplicated entries using a key of scheme://host + trimmed path, which collapsed semantically distinct registries — tenant-bearing query strings, credential-bearing userinfo, and path-mounted tenants were all treated as duplicates. The merged client silently dropped configured registries and resolved drivers against the wrong tenant/identity. Update the dedupe key to the same normalization used by cmd/dbc's registriesChanged drift check: lowercase scheme/host, trim only a trailing path slash, drop fragment — preserve query, userinfo, and path segments. Added regression tests in TestMergeRegistries covering tenant query, userinfo, path-mounted tenants, casing/trailing-slash collapse, and fragment collapse.
Three separate suites (TestLoadGlobalConfig, TestMergeRegistries, TestNewClientWithRegistryOptions) were each a long chain of t.Run closures with repeated setup. Convert each to a single table-driven test with a small struct of inputs + expected outputs and one driver loop. Adds an urls() helper so mergeRegistries cases can assert the full ordered URL list in one expression. Same coverage, much shorter, easier to scan and extend.
…able Addresses review 1662 (LOW): the table refactor only checked len(cfg.Registries) + ReplaceDefaults, dropping the pre-refactor assertions on the parsed URL and Name fields. A regression in TOML decoding, ordering, or field mapping would now pass unnoticed. Replace the wantRegs int with a wantEntries []RegistryEntry field containing the fully-specified expected contents, and compare via assert.Equal so ordering, URL, and Name mismatches all fail the test.
Addresses review 1664 (LOW): registriesChanged compared raw project fields, so semantically-equivalent edits produced false-positive drift aborts — e.g. flipping nil → &false when defaults were already inherited, or adding an exact duplicate registry entry that mergeRegistries would dedupe anyway. Rewrite the helper to run both DriversList values through the same newDBCClient merge path and compare the resulting normalized URL sets. That matches what the client actually uses for driver resolution, so mergeRegistries no-ops and tri-state flips that don't actually change the active registry set no longer abort dbc add. Added three new subtests: - nil vs &false replace_defaults (no global override): equal - exact duplicate entries: equal - name-only duplicate rename: equal The 1664 MEDIUM on concurrent global config.toml edits is intentionally left out of scope. The global-config-during-add race is extremely unlikely in practice (user editing ~/.config/dbc/config.toml while a separate shell runs dbc add), and the recovery is a simple retry — the already-written driver entry is valid against the dbc.toml the user wrote, just not against the fresh global config, which 'dbc sync' or a fresh add would correctly report.
Summary
Adds support for configuring driver registries via TOML config files, allowing users to add custom registries at both the global (
~/.config/columnar/dbc/config.toml) and project (dbc.toml) level.What's New
~/.config/columnar/dbc/config.toml): loaded on every command viaConfigureRegistries()called inmain()dbc.toml):[[registries]]section respected bysyncandaddcommandsreplace_defaults = true: opt-in flag to suppress built-in registries entirelyDBC_BASE_URLenv var: still overrides everything (unchanged behavior)dbc.tomlfiles without[[registries]]work unchangedConfig Format
Global (
~/.config/columnar/dbc/config.toml):Project (
dbc.toml):Implementation Notes
http/httpsscheme and non-empty hostdefaultRegistriesis snapshotted eagerly ininit()(afterdrivers.goinit runs); env-sensitivity documented in commentsadd.godecodesdbc.tomlonce (single file open) for both registry config and driver listremoveintentionally omits registry wiring — it never queries a registry