Spec rebuild defaults PUF donor fits to unweighted, reversing the verified landmine fix #76

@MaxGhenis

Summary

The spec-driven rebuild defaults donor fits to unweighted, and specs/us-2024.yaml declares no weights: key on any imputation step, including the PUF steps. This silently reverses the verified fix for the eCPS "landmine" failure mode, two days after the root cause was confirmed.

Where the default lives

  • Blueprint (microplex/docs/spec-driven-rebuild.md §1): "Donor fits are unweighted by default. Set weights: donor_weight_column on a step only when deliberately testing a weighted donor fit." — i.e. weighted fits are framed as the experiment, unweighted as canonical.
  • Engine (microplex/src/microplex/imputation.py, _resolve_step_weight_column and the surrounding fit path): "Omitted step.weights means the fit is intentionally unweighted, even if the donor has survey weights."
  • Spec: grep -n "weights" specs/us-2024.yaml → no matches (both the copy here and microplex/packs/us/specs/us-2024.yaml).
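
The grep above could also be run as a structural check on the parsed spec. The following is a minimal sketch, not the microplex API: the spec layout (an imputation_steps list whose entries carry name and donor keys) and the helper steps_without_weights are assumptions made for illustration. Only the weights key itself comes from the blueprint.

```python
# Hypothetical spec layout: the keys "imputation_steps", "name", and "donor"
# are assumptions for illustration; only "weights" is named in the blueprint.
def steps_without_weights(spec, donor="puf"):
    """Return the names of steps for `donor` that declare no weights column."""
    return [
        step["name"]
        for step in spec.get("imputation_steps", [])
        if step.get("donor") == donor and not step.get("weights")
    ]


# Toy spec standing in for a parsed specs/us-2024.yaml (contents invented).
example_spec = {
    "imputation_steps": [
        {"name": "puf_capital_gains", "donor": "puf"},
        {"name": "puf_misc_income", "donor": "puf", "weights": "donor_weight_column"},
        {"name": "scf_net_worth", "donor": "scf"},
    ]
}

missing = steps_without_weights(example_spec)
print(missing)
```

A check of this shape could fail a build whenever an outcome-stratified donor is fit without a declared weight column, rather than relying on a manual grep.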

Why this is the landmine bug class

The 2026-06-06 full-population analysis traced the eCPS income-tax blowup to QRF donor fits trained unweighted on the tail-oversampled PUF: policyengine-us-data puf_impute.py fits with no weight_col, while _stratified_subsample_index keeps all top-0.5%-AGI donors. Because the PUF is stratified on the outcome (income), unweighted conditional fits are biased toward the tail within demographic cells, and rare extreme donors get broadcast as point-mass landmines that detonate under any reweighting. Measurements from that analysis: eCPS miscellaneous_income had a max of $795M with ~3,940 records above $100M (2,301 landmines surviving calibration), whereas the weighted-fit microplex path produced a max of ~$1.4–2.2M and zero landmines, with calibrated income tax at a sane 9.3% of GDP.
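
The bias mechanism can be illustrated with a toy simulation. This is a sketch under stated assumptions, not a reproduction of the 2026-06-06 analysis: the lognormal income distribution, the 1% base sampling rate, and the function simulate_cell are all invented for illustration. It mimics one demographic cell in which every top-0.5% record is kept (weight 1) and the rest are subsampled (weight 1 / rate), then compares unweighted and weighted means.

```python
import random


def simulate_cell(n=200_000, tail_share=0.005, base_rate=0.01, seed=0):
    """Toy outcome-stratified sample of one demographic cell.

    All parameters are illustrative assumptions, not PUF values.
    Returns (population_mean, unweighted_sample_mean, weighted_sample_mean).
    """
    rng = random.Random(seed)
    income = sorted(rng.lognormvariate(10.5, 1.2) for _ in range(n))
    cutoff = int(n * (1 - tail_share))  # index where the kept-in-full tail starts

    sample, weights = [], []
    for i, x in enumerate(income):
        if i >= cutoff:
            # Tail record: always kept, represents only itself.
            sample.append(x)
            weights.append(1.0)
        elif rng.random() < base_rate:
            # Non-tail record: kept with probability base_rate,
            # so it stands in for 1 / base_rate population records.
            sample.append(x)
            weights.append(1.0 / base_rate)

    pop_mean = sum(income) / n
    unweighted = sum(sample) / len(sample)
    weighted = sum(w * x for w, x in zip(weights, sample)) / sum(weights)
    return pop_mean, unweighted, weighted


pop_mean, unweighted, weighted = simulate_cell()
print(f"population {pop_mean:,.0f}  unweighted {unweighted:,.0f}  weighted {weighted:,.0f}")
```

In this toy setup the unweighted mean lands several times above the population mean, while the design-weighted mean stays close to it. A conditional model fit without weights inherits the same tilt toward the tail.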

The code that carried that fix — pe_source_impute_engine.py's run_conditioned_block fitting with weight_col="weight" from household_weight — was deleted in PolicyEngine/microplex-us#261 (last present at f3af332) and its behavior was not carried into the spec.

What's needed

A deliberate decision, recorded in the blueprint, between the two coherent designs:

  1. Weighted donor fits (the verified fix): declare weights: <PUF design weight> on the PUF steps in us-2024.yaml, and flip the blueprint's default-framing for outcome-stratified donors. The engine already supports this (ImputationStep.weights).
  2. Unweighted fits as a deliberate pool-building choice (eCPS-style: tail-rich record pool, calibration owns representativeness): in that case a compensating control must exist, namely a hard weight-ratio bound in the calibration solve policy (calibration/solve_policy.py has no such bound today), because this is exactly the configuration that produced the landmines in eCPS.
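
For option 2, the idea of a hard weight-ratio bound can be sketched as follows. This is an assumption-laden illustration, not code from calibration/solve_policy.py: the function bound_weight_ratio and its max_ratio parameter are hypothetical, and the bound is applied here as a post-hoc clip only to show its effect on individual records.

```python
def bound_weight_ratio(initial, calibrated, max_ratio):
    """Clip each calibrated weight to [w0 / max_ratio, w0 * max_ratio].

    Hypothetical helper: `initial` and `calibrated` are equal-length lists of
    positive weights, and `max_ratio` (> 1) is the largest allowed factor by
    which calibration may scale any single record up or down.
    """
    if max_ratio <= 1:
        raise ValueError("max_ratio must be greater than 1")
    if len(initial) != len(calibrated):
        raise ValueError("weight vectors must have the same length")
    bounded = []
    for w0, w in zip(initial, calibrated):
        lower, upper = w0 / max_ratio, w0 * max_ratio
        bounded.append(min(max(w, lower), upper))
    return bounded


# The first record was scaled up 50x and the third down 100x by a
# (made-up) calibration; both are pulled back to the 10x bound.
bounded = bound_weight_ratio([1.0, 2.0, 1.0], [50.0, 2.5, 0.01], max_ratio=10.0)
print(bounded)
```

Clipping after the solve would generally break the calibration targets, so a real solve policy would more likely impose the bound as a constraint inside the optimization. The sketch only shows what the bound means per record: an extreme donor cannot be amplified by more than max_ratio.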

Either way this should be resolved before any spec-built candidate is scored against frozen eCPS — it changes what the scores mean.

🤖 Generated with Claude Code
