Stop rescaling housing costs during income imputation (fixes #367) #372

Open
vahid-ahmadi wants to merge 1 commit into main from fix/ahc-rent-mortgage-rescale-367

@vahid-ahmadi (Collaborator)

Summary

  • Removes the rent/mortgage rescaling in impute_over_incomes. In the built enhanced FRS this multiplier was ~2.5×, which uniformly inflated rent, mortgage_interest_repayment and mortgage_capital_repayment — pushing AHC poverty rates 10–18 pp above HBAI for children and working-age adults while leaving BHC rates close to official.
  • Deletes the now-unused _safe_rescale_factor helper and its dedicated test file.
  • Adds a regression test module that (a) verifies impute_over_incomes leaves housing-cost columns byte-identical, (b) confirms unlisted output variables pass through untouched, (c) covers the zero-income-baseline shape the old helper guarded, and (d) asserts that the built enhanced FRS's per-renter / per-mortgagor medians stay within 30 % of the raw FRS (a direct guard against the bug coming back).
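The median guard in (d) can be pictured with the sketch below. It is illustrative only: the helper names `median_ratio` and `tracks_raw` are hypothetical, the medians are unweighted, and "within 30 %" is read as a ratio between 0.7 and 1.3. The real test may use weights and a different definition.

```python
from statistics import median


def median_ratio(built, raw):
    """Ratio of medians among households with a positive value (e.g. renters)."""
    built_positive = [v for v in built if v > 0]
    raw_positive = [v for v in raw if v > 0]
    return median(built_positive) / median(raw_positive)


def tracks_raw(built, raw, tolerance=0.30):
    """True when the built median is within `tolerance` of the raw median."""
    return abs(median_ratio(built, raw) - 1.0) <= tolerance


# Toy annual rents; zeros stand for non-renters and are ignored.
raw_rent = [0.0, 5_000.0, 6_000.0, 7_000.0]
inflated_rent = [r * 2.5 for r in raw_rent]  # what the old rescaling produced

unchanged_ok = tracks_raw(raw_rent, raw_rent)
inflated_ok = tracks_raw(inflated_rent, raw_rent)
```

On these toy numbers a uniform 2.5× inflation fails the guard and an untouched column passes it.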

See issue #367 for the full element-by-element audit.

Why rescaling was wrong

The rescale factor was computed as new_income_total / original_income_total using unweighted sums across INCOME_COMPONENTS. FRS dividend_income is near-zero by design (the survey under-reports dividends), so once the SPI-trained QRF replaces it with realistic values the ratio jumps — giving a ~2.5× multiplier on rent and mortgage in the built dataset. Water charges, structural-insurance payments and council tax were not rescaled, so the fingerprint in the enhanced FRS is unambiguous: first-unique-value ratios of 2.523× on the three rescaled columns and 1.000× on every other housing-related column.
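A minimal sketch of the arithmetic described above, using toy numbers rather than FRS data. The two-entry `INCOME_COMPONENTS` list and the function name `old_rescale_factor` are assumptions for illustration; the real list is longer and the removed code may have been structured differently.

```python
# Assumed subset of the real INCOME_COMPONENTS list, for illustration only.
INCOME_COMPONENTS = ["employment_income", "dividend_income"]


def old_rescale_factor(original, imputed):
    """Unweighted ratio of total income after imputation to total before."""
    original_total = sum(sum(original[c]) for c in INCOME_COMPONENTS)
    new_total = sum(sum(imputed[c]) for c in INCOME_COMPONENTS)
    return new_total / original_total


# Three toy households: dividends are zero in the survey data.
original = {
    "employment_income": [20_000.0, 0.0, 10_000.0],
    "dividend_income": [0.0, 0.0, 0.0],
}
# After imputation only dividend_income changes.
imputed = dict(original, dividend_income=[5_000.0, 40_000.0, 0.0])

factor = old_rescale_factor(original, imputed)  # 75_000 / 30_000

# The old code applied the same factor to every household's rent,
# including households whose own income did not change at all.
rent = [6_000.0, 9_000.0, 0.0]
rescaled_rent = [r * factor for r in rent]
```

Here the third household receives no imputed dividends, yet its housing costs would still have been multiplied by 2.5 under the old behaviour.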

Housing-cost year-on-year growth is already handled by per-variable OBR/Ofwat uprating indices, so the rescaling served no defensive purpose and can be removed safely.

Validation path

Locally, the three new unit tests pass on the fixed code and would fail against the old rescaling (they inject a ×100 dividend multiplier that would have blown up rent proportionally). The integration assertion currently fails against the on-disk enhanced_frs_2023_24.h5 because that file was produced before the fix. CI rebuilds the dataset with make data before running tests, so the integration assertion is what validates the end-to-end effect of this change.
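The shape of the pass-through check is roughly as follows. This is a sketch, not the test code in this PR: `impute_stub` is a hypothetical stand-in for impute_over_incomes (whose real signature is not shown here), and the data are toy lists rather than a dataset object.

```python
import copy

HOUSING_COLUMNS = [
    "rent",
    "mortgage_interest_repayment",
    "mortgage_capital_repayment",
]


def impute_stub(data, dividend_multiplier):
    """Hypothetical stand-in: rewrites dividend_income and nothing else."""
    out = copy.deepcopy(data)
    out["dividend_income"] = [
        v * dividend_multiplier for v in data["dividend_income"]
    ]
    return out


def housing_costs_unchanged(before, after):
    """True when every housing-cost column is identical before and after."""
    return all(after[col] == before[col] for col in HOUSING_COLUMNS)


data = {
    "employment_income": [20_000.0, 0.0, 10_000.0],
    "dividend_income": [10.0, 0.0, 250.0],
    "rent": [6_000.0, 9_000.0, 0.0],
    "mortgage_interest_repayment": [0.0, 0.0, 3_000.0],
    "mortgage_capital_repayment": [0.0, 0.0, 4_000.0],
}

# Inject an extreme x100 dividend multiplier, as the unit tests do.
result = impute_stub(data, dividend_multiplier=100)
passed = housing_costs_unchanged(data, result)
```

Under the old rescaling the same input would have scaled the three housing columns along with the income total, so the equality check would fail.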

After the CI rebuild, AHC poverty for children is expected to drop from ~42 % to ~30–33 %, closer to HBAI FYE 2025 (27 %) and the DWP FYE 2026 child projection (33 % relative AHC).

Test plan

  • test_housing_costs_pass_through_unchanged: rent/mortgage are byte-identical after impute_over_incomes, even with extreme imputed dividends.
  • test_only_listed_outputs_are_overwritten: only dividend_income is rewritten; employment_income is untouched.
  • test_housing_costs_preserved_when_income_baseline_is_zero: covers the zero-baseline shape previously handled by _safe_rescale_factor.
  • (CI) test_built_enhanced_frs_housing_costs_track_raw_frs: passes after CI rebuilds the enhanced FRS with the fix.
  • (follow-up, post-rebuild) rerun the poverty comparison in #367 ("2025 baseline poverty vs HBAI: AHC rates too high for non-pensioners") and confirm AHC rates land within the HBAI band.

🤖 Generated with Claude Code

impute_over_incomes multiplied rent, mortgage_interest_repayment and
mortgage_capital_repayment by new_income_total / original_income_total
across INCOME_COMPONENTS. Because FRS dividend_income is near-zero and
the SPI-trained QRF predicts realistic dividends, the ratio inflated
those three columns ~2.5× uniformly in the built enhanced FRS — pushing
AHC poverty rates 10-18pp above HBAI for non-pensioners while BHC stayed
close to official. Housing costs now pass through unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>