Skip to content

feat: check_models external readiness check#712

Merged
mikeknep merged 9 commits into
mainfrom
dd-model-health/mknepper
Jun 11, 2026
Merged

feat: check_models external readiness check#712
mikeknep merged 9 commits into
mainfrom
dd-model-health/mknepper

Conversation

@mikeknep

Copy link
Copy Markdown
Contributor

📋 Summary

Introduces a new check_models CLI command and DataDesigner interface method for checking external readiness of models and tools without triggering a full workload (preview or create). This is the "external deps" analogue to the existing validate functionality (internal coherence).

🔗 Related Issue

N/A, direct to PR

🔄 Changes

  • Lifts the checks run by DatasetBuilder to a standalone readiness.py engine module. Used by both the builder and the end user-facing interfaces
  • Creates a flags.py engine module where the async engine flag is centralized. This cleans up some duplication of the env var magic string / constant that was floating around in a few places.

🧪 Testing

  • make test passes
  • Unit tests added/updated
  • E2E tests added/updated (if applicable)

✅ Checklist

  • Follows commit message conventions
  • Commits are signed off (DCO)
  • Architecture docs updated (if applicable)

@mikeknep mikeknep requested a review from a team as a code owner May 29, 2026 15:02
@greptile-apps

greptile-apps Bot commented May 29, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR introduces a check_models CLI command and DataDesigner.check_models() interface method that runs the same external readiness probes (model health checks and MCP tool checks) that already gate preview and create, but without triggering a full workload. It also extracts the shared check logic into a new engine/readiness.py module and centralizes the async engine feature flag into a new engine/flags.py.

  • New readiness.py engine module lifts _run_model_health_check_if_needed and _run_mcp_tool_check_if_needed out of DatasetBuilder into a public run_readiness_check() function; both DatasetBuilder.build/build_preview and DataDesigner.check_models now share the exact same code path.
  • New flags.py centralizes DATA_DESIGNER_ASYNC_ENGINE so the engine, interface, and readiness modules read a single source of truth instead of each re-parsing the env var independently.
  • check-models CLI command mirrors the existing validate command shape — takes a config source, prints success or a typed error message, exits 1 on failure — complementing validate (internal readiness) with external readiness (provider liveness).

Confidence Score: 5/5

Safe to merge. The refactoring is a clean extraction of existing logic into a shared module with no behavioral regressions.

The readiness check logic is lifted verbatim from DatasetBuilder private methods into readiness.py, and both the existing workload startup path and the new check_models path call the same function. Moving _resolve_async_compatibility() before the readiness probe corrects a subtle inconsistency where the old health check dispatched on the raw env-var flag rather than the resolved async mode. Validation improvements in registry.py are additive correctness fixes. Test coverage is thorough across engine, interface, and CLI layers.

No files require special attention.

Important Files Changed

Filename Overview
packages/data-designer-engine/src/data_designer/engine/readiness.py New module; lifts model and MCP health check logic out of DatasetBuilder into a shared run_readiness_check() function. Logic is equivalent to the removed private methods, with the improvement that async vs. sync dispatch now follows the resolved client_concurrency_mode rather than the raw env-var flag.
packages/data-designer-engine/src/data_designer/engine/flags.py New module centralizing DATA_DESIGNER_ASYNC_ENGINE flag; straightforward env-var read with a module-level bool, enabling tests to monkeypatch a single location.
packages/data-designer-engine/src/data_designer/engine/dataset_builders/dataset_builder.py Replaces the two inline health-check methods with run_readiness_check(); moves _resolve_async_compatibility() before the readiness call to correctly match client mode; switches all flag references to flags.DATA_DESIGNER_ASYNC_ENGINE.
packages/data-designer-engine/src/data_designer/engine/models/registry.py Adds _parse_health_check_chat_response (validates non-empty string) and _validate_health_check_embedding_response (validates exactly one non-empty vector) to replace the permissive lambda x: x parser and unvalidated embedding calls.
packages/data-designer/src/data_designer/interface/data_designer.py Adds check_models() method adjacent to validate(); delegates to run_readiness_check with the resolved client concurrency mode.
packages/data-designer/src/data_designer/cli/controllers/generation_controller.py Adds run_check_models(); handles DataDesignerError and generic Exception separately to surface the error class name for typed engine errors.
scripts/health_checks.py Refactored to use DataDesigner.check_models() rather than calling ModelFacade directly; delegates embedding response validation to the registry layer.
packages/data-designer-engine/tests/engine/test_readiness.py New test file with thorough coverage: alias collection, skip-when-empty, error propagation, ordering, async dispatch, timeout/cancel, and client-mode selection.
packages/data-designer/tests/interface/test_data_designer.py Adds check_models interface tests covering delegation to run_readiness_check, typed error propagation, sync fallback mode, and column materialization.

Sequence Diagram

sequenceDiagram
    participant User
    participant CLI as check_models_command
    participant Ctrl as GenerationController
    participant DD as DataDesigner
    participant RP as ResourceProvider
    participant RC as readiness.py
    participant MR as ModelRegistry
    participant MCP as MCPRegistry

    User->>CLI: data-designer check-models config.yaml
    CLI->>Ctrl: run_check_models(config_source)
    Ctrl->>Ctrl: _load_config(config_source)
    Ctrl->>DD: DataDesigner()
    Ctrl->>DD: check_models(config_builder)
    DD->>RP: _create_resource_provider(check-models, config_builder)
    DD->>DD: _resolve_client_concurrency_mode(config_builder)
    DD->>RC: run_readiness_check(columns, resource_provider, client_concurrency_mode)
    RC->>MR: run_health_check(model_aliases) or arun_health_check(...)
    MR-->>RC: OK or raises typed error
    RC->>MCP: run_health_check(tool_aliases)
    MCP-->>RC: OK or raises
    RC-->>DD: None
    DD-->>Ctrl: None
    Ctrl->>User: All models and tools responded successfully
Loading

Reviews (5): Last reviewed commit: "Be more strict with health check respons..." | Re-trigger Greptile

if not model_aliases:
return

if flags.DATA_DESIGNER_ASYNC_ENGINE:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small edge case: check_models() can build sync-mode clients when the config forces sync fallback, like allow_resize=True, but readiness still branches on the raw async env flag and calls arun_health_check(). I know allow_resize / sync are probably on the way out, so maybe this is just a known limitation, but it might be worth either matching the resolved client mode here or adding a regression/explicit guard so users do not hit an async/sync client internals error.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, I overlooked this, thanks. Addressed in c9c118f. This should be fairly straightforward to unwind if we fully drop the sync engine.

fake_loop = Mock()

with (
patch("data_designer.engine.readiness.ensure_async_engine_loop", return_value=fake_loop, create=True),

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: this patch target does not look like it is used. _run_model_health_check() imports ensure_async_engine_loop inside the function from data_designer.engine.dataset_builders.utils.async_concurrency, so patching data_designer.engine.readiness.ensure_async_engine_loop will not intercept it. Could patch the source path instead and drop create=True; same thing applies to the timeout test below.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch, thanks. Fixed in 8f52dbd

@andreatgretel

Copy link
Copy Markdown
Contributor

I think this new readiness path could eventually replace the implementation in scripts/health_checks.py, but probably not the script itself. The script is catalog-scoped, checking the predefined provider/model defaults when env vars are present; check_models is config-scoped. A nice follow-up might be to keep the script as the maintainer entrypoint, but have it build small temporary configs from PREDEFINED_PROVIDERS_MODEL_MAP and call the shared readiness path so the actual probe logic does not drift.

mikeknep added 8 commits June 10, 2026 09:45
Signed-off-by: Mike Knepper <mknepper@nvidia.com>
Signed-off-by: Mike Knepper <mknepper@nvidia.com>
Signed-off-by: Mike Knepper <mknepper@nvidia.com>
Signed-off-by: Mike Knepper <mknepper@nvidia.com>
Signed-off-by: Mike Knepper <mknepper@nvidia.com>
Signed-off-by: Mike Knepper <mknepper@nvidia.com>
@mikeknep mikeknep force-pushed the dd-model-health/mknepper branch from c4715f9 to c9c118f Compare June 10, 2026 16:26
@mikeknep

Copy link
Copy Markdown
Contributor Author

@andreatgretel I updated the health_checks script to use this new method in b3a9d88, LMK what you think!

@mikeknep mikeknep requested a review from andreatgretel June 10, 2026 16:29
andreatgretel
andreatgretel previously approved these changes Jun 11, 2026

@andreatgretel andreatgretel left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This replacement direction looks good to me: keeping scripts/health_checks.py as the predefined-provider/catalog entrypoint while delegating the actual probe path through DataDesigner.check_models() seems like the right shape.

One behavioral difference I noticed: the old script had stricter pass criteria than the shared readiness path. It retried chat completions and failed empty/non-string responses, and it required embeddings to return exactly one non-empty vector. The new path reports success as long as ModelRegistry.run_health_check() does not raise, so "" for chat or [[]] for embeddings could now pass.

Since this script is our maintainer smoke test for predefined provider defaults, I’d preserve those non-empty checks either in ModelRegistry.run_health_check() or as script-specific strict validation. Otherwise this looks good to me.

Signed-off-by: Mike Knepper <mknepper@nvidia.com>
@mikeknep

Copy link
Copy Markdown
Contributor Author

I've updated ModelRegistry.[a]run_health_check with more strict checking of the responses in 95e213c

@mikeknep mikeknep requested a review from andreatgretel June 11, 2026 19:33
@mikeknep mikeknep merged commit abe7667 into main Jun 11, 2026
61 checks passed
@mikeknep mikeknep deleted the dd-model-health/mknepper branch June 11, 2026 20:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants