Skip to content

Certify data releases directly from release manifests#401

Merged
MaxGhenis merged 4 commits into
mainfrom
migrate-off-bundles
Jun 12, 2026
Merged

Certify data releases directly from release manifests#401
MaxGhenis merged 4 commits into
mainfrom
migrate-off-bundles

Conversation

@MaxGhenis

Copy link
Copy Markdown
Contributor

Fixes #400

Replaces the policyengine-bundles import flow with direct data-release certification. scripts/certify_data_release.py fetches a country's data release manifest from Hugging Face, validates it, and derives the vendored country manifest in one step — regenerating the current US manifest through the new path reproduces it byte-identically apart from the certification source strings (571/571 dataset pins).

Certification gate (now enforced, per docs/release-bundles.md): the model version must exactly match the build-time model (compatibility_basis: built_with_model_package, name and version) or be covered by the publisher's compatible_model_packages claim (PEP 440; basis recorded with a warning). Neither basis refuses certification. Hard failures: unpinned artifacts, missing/unreachable certified dataset.

Removed: src/policyengine/provenance/bundle_import/, the vendored src/policyengine/data/bundle/ archive copies (bundle.json + 8.6k-line validation-report.json shipped in the wheel), scripts/import_policyengine_bundle.py, and their tests. Kept: the legacy single-country refresh (provenance/bundle.py) for data releases that predate release manifests — the UK enhanced FRS stays on it untouched until populace-uk certifies through a manifest, at which point that certification is a PR here using the new script (superseding the flow in policyengine-bundles#28; flagging for Nikhil Woodruff as UK data controller).

Supporting: pyproject pin updates never lower a committed core floor; the US TRACE TRO is regenerated and a regression test asserts the sidecar binds the vendored manifest under the canonical-model convention; AGENTS.md/copilot/skills docs point at the new data-certification engineering skill; the policyengine-bundles repo is untouched (archival is a separate team decision).

Three rounds of independent adversarial review (all findings fixed and verified); full suite 602 passed locally.

🤖 Generated with Claude Code

MaxGhenis and others added 4 commits June 12, 2026 07:19
Replaces the policyengine-bundles import flow: certification fetches
the country's data release manifest from Hugging Face, validates it
(revision pins, reachable certified dataset), and derives the vendored
country manifest in one step — byte-identical to the bundle-imported
manifest apart from the certification source strings (571/571 dataset
pins reproduced for the current US release). The model wheel pin,
pyproject floors, TRACE TRO, and changelog fragment are produced by the
same script. The vendored bundle archive copies and the bundle_import
package are removed; the legacy single-country refresh stays for data
releases that predate release manifests (UK enhanced FRS).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Enforce the certification gate from docs/release-bundles.md: the model
must exactly match the build-time model or be covered by the
publisher's compatibility claim (PEP 440), with the basis recorded in
the vendored manifest; neither basis refuses certification. The data
package repo now comes from the manifest's own location rather than
the default artifact, so an inherited national default cannot
misdirect runtime manifest fetches. Core floors never lower because of
a stale local environment, and the pin helper gains direct test
coverage. Dead AGENTS.md / copilot / skills-README pointers to the
removed release-bundles skill now point at data-certification, and the
architecture doc's flow section describes the in-repo entrypoint.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Regenerate the US TRACE TRO so the shipped sidecar binds the shipped
country manifest (canonical-model convention), with a regression test
asserting the binding. The built-with basis now also requires the
build-time package NAME to match, not just the version string, and the
architecture doc's certification rules describe the implemented gate
(built-with exact or publisher claim; fingerprint basis noted as
planned).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@MaxGhenis MaxGhenis merged commit 8b084c5 into main Jun 12, 2026
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Certify data releases directly from release manifests (retire the bundles hop)

1 participant