Stop rescaling housing costs during income imputation (fixes #367) #372
Open
vahid-ahmadi wants to merge 1 commit into main
impute_over_incomes multiplied rent, mortgage_interest_repayment and mortgage_capital_repayment by new_income_total / original_income_total across INCOME_COMPONENTS. Because FRS dividend_income is near-zero and the SPI-trained QRF predicts realistic dividends, the ratio inflated those three columns ~2.5× uniformly in the built enhanced FRS, pushing AHC poverty rates 10-18 pp above HBAI for non-pensioners while BHC stayed close to official. Housing costs now pass through unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
- Removes the income-ratio multiplier that `impute_over_incomes` applied to housing costs. In the built enhanced FRS this multiplier was ~2.5×, which uniformly inflated `rent`, `mortgage_interest_repayment` and `mortgage_capital_repayment`, pushing AHC poverty rates 10–18 pp above HBAI for children and working-age adults while leaving BHC rates close to official.
- Removes the `_safe_rescale_factor` helper and its dedicated test file.
- Adds a test suite that (a) confirms `impute_over_incomes` leaves housing-cost columns byte-identical, (b) confirms unlisted output variables pass through untouched, (c) covers the zero-income-baseline shape the old helper guarded, and (d) asserts that the built enhanced FRS's per-renter / per-mortgagor medians stay within 30 % of the raw FRS (a direct guard against the bug coming back).

See issue #367 for the full element-by-element audit.
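A minimal sketch of what check (d) could look like, not the test in this PR. It assumes "per-renter / per-mortgagor median" means an unweighted median over rows with a positive value; the helper names `median_ratio` and `within_tolerance` and the toy rent values are hypothetical.

```python
import pandas as pd


def median_ratio(built: pd.Series, raw: pd.Series) -> float:
    """Ratio of medians, taken over payers only (rows with a positive value)."""
    built_median = built[built > 0].median()
    raw_median = raw[raw > 0].median()
    return float(built_median / raw_median)


def within_tolerance(built: pd.Series, raw: pd.Series, tolerance: float = 0.30) -> bool:
    """True when the built median is within `tolerance` of the raw median."""
    return abs(median_ratio(built, raw) - 1.0) <= tolerance


# Toy rent column: two non-renters and three renters (made-up values).
raw_rent = pd.Series([0.0, 400.0, 600.0, 800.0, 0.0])

# A dataset that passes rent through unchanged stays inside the band ...
ok_clean = within_tolerance(raw_rent, raw_rent)

# ... while a uniform 2.5x inflation, as described above, falls outside it.
inflated_rent = raw_rent * 2.5
ok_inflated = within_tolerance(inflated_rent, raw_rent)
```

A ratio-of-medians check like this is deliberately loose: it tolerates ordinary differences between the two datasets but trips on a multiplicative error of the size reported here.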
Why rescaling was wrong
The rescale factor was computed as `new_income_total / original_income_total`, using unweighted sums across `INCOME_COMPONENTS`. FRS `dividend_income` is near-zero by design (the survey under-reports dividends), so once the SPI-trained QRF replaces it with realistic values the ratio jumps, giving a ~2.5× multiplier on rent and mortgage in the built dataset. Water charges, structural-insurance payments and council tax were not rescaled, so the fingerprint in the enhanced FRS is unambiguous: first-unique-value ratios of 2.523× on the three rescaled columns and 1.000× on every other housing-related column.

Housing-cost year-on-year growth is already handled by per-variable OBR/Ofwat uprating indices, so the rescaling served no defensive purpose and removing it leaves no gap.
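The arithmetic behind the inflation can be reproduced on a toy frame. This is an illustrative sketch only: it assumes the factor was a single dataset-wide scalar, uses a two-column subset of `INCOME_COMPONENTS`, and all values are made up so that the ratio comes out at 2.5.

```python
import pandas as pd

# Illustrative subset; the real INCOME_COMPONENTS list is longer.
INCOME_COMPONENTS = ["employment_income", "dividend_income"]
HOUSING_COSTS = [
    "rent",
    "mortgage_interest_repayment",
    "mortgage_capital_repayment",
]

# Toy "raw FRS" rows with near-zero (here exactly zero) dividends.
frs = pd.DataFrame(
    {
        "employment_income": [20_000.0, 30_000.0, 50_000.0],
        "dividend_income": [0.0, 0.0, 0.0],
        "rent": [6_000.0, 0.0, 0.0],
        "mortgage_interest_repayment": [0.0, 3_000.0, 4_000.0],
        "mortgage_capital_repayment": [0.0, 2_000.0, 5_000.0],
        "council_tax": [1_200.0, 1_500.0, 2_000.0],
    }
)

# Imputation replaces dividends with larger, more realistic values.
imputed = frs.copy()
imputed["dividend_income"] = [10_000.0, 40_000.0, 100_000.0]

# Old behaviour: ratio of unweighted income totals, after vs before imputation.
original_income_total = frs[INCOME_COMPONENTS].sum().sum()
new_income_total = imputed[INCOME_COMPONENTS].sum().sum()
factor = new_income_total / original_income_total

# The factor was applied to the three housing-cost columns only.
old_behaviour = imputed.copy()
old_behaviour[HOUSING_COSTS] = old_behaviour[HOUSING_COSTS] * factor

print(f"rescale factor: {factor}")
print(old_behaviour[HOUSING_COSTS + ["council_tax"]])
```

In this toy case the three rescaled columns move by the same factor while `council_tax` is untouched, which mirrors the fingerprint described above.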
Validation path
Locally, the three new unit tests pass on the fixed code and would fail against the old rescaling (they inject a ×100 dividend multiplier that would have blown up rent proportionally). The integration assertion currently fails on disk because `enhanced_frs_2023_24.h5` was produced before the fix. CI rebuilds the dataset with `make data` before running tests, so the integration assertion is what validates the end-to-end effect of this change.

After the CI rebuild, AHC poverty for children is expected to drop from ~42 % to ~30–33 %, closer to HBAI FYE 2025 (27 %) and the DWP FYE 2026 child projection (33 % rel AHC).
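The shape of the unit-test check described in this section can be sketched with stand-in functions. `fixed_imputation` and `old_imputation` are hypothetical simplifications, not the repository's `impute_over_incomes` (whose signature is not shown here); the point is only that a ×100 dividend shock leaves housing costs identical under pass-through and changes them under the old rescale.

```python
import pandas as pd

HOUSING_COSTS = [
    "rent",
    "mortgage_interest_repayment",
    "mortgage_capital_repayment",
]
INCOME_COLS = ["employment_income", "dividend_income"]  # illustrative subset


def fixed_imputation(df: pd.DataFrame, predicted_dividends: pd.Series) -> pd.DataFrame:
    """Stand-in for the fixed behaviour: overwrite dividends, touch nothing else."""
    out = df.copy()
    out["dividend_income"] = predicted_dividends
    return out


def old_imputation(df: pd.DataFrame, predicted_dividends: pd.Series) -> pd.DataFrame:
    """Stand-in for the old behaviour: also rescale housing costs by the income ratio."""
    out = fixed_imputation(df, predicted_dividends)
    factor = out[INCOME_COLS].sum().sum() / df[INCOME_COLS].sum().sum()
    out[HOUSING_COSTS] = out[HOUSING_COSTS] * factor
    return out


def housing_costs_unchanged(before: pd.DataFrame, after: pd.DataFrame) -> bool:
    """True when every housing-cost column is identical before and after."""
    return all(before[col].equals(after[col]) for col in HOUSING_COSTS)


# Made-up households: one renter, one mortgagor.
households = pd.DataFrame(
    {
        "employment_income": [20_000.0, 30_000.0],
        "dividend_income": [100.0, 200.0],
        "rent": [6_000.0, 0.0],
        "mortgage_interest_repayment": [0.0, 3_000.0],
        "mortgage_capital_repayment": [0.0, 2_000.0],
    }
)

# Extreme imputed dividends: 100 times the baseline values.
extreme_dividends = households["dividend_income"] * 100

fixed_ok = housing_costs_unchanged(
    households, fixed_imputation(households, extreme_dividends)
)
old_ok = housing_costs_unchanged(
    households, old_imputation(households, extreme_dividends)
)
```

Comparing whole columns with `Series.equals` rather than a tolerance is what makes the check strict enough to catch any multiplicative change.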
Test plan
- `test_housing_costs_pass_through_unchanged`: rent/mortgage are byte-identical after `impute_over_incomes`, even with extreme imputed dividends.
- `test_only_listed_outputs_are_overwritten`: only `dividend_income` is rewritten; `employment_income` is untouched.
- `test_housing_costs_preserved_when_income_baseline_is_zero`: covers the zero-baseline shape previously handled by `_safe_rescale_factor`.
- `test_built_enhanced_frs_housing_costs_track_raw_frs`: passes after CI rebuilds the enhanced FRS with the fix.

🤖 Generated with Claude Code