
Add LA-level household land value calibration targets #371

Open
vahid-ahmadi wants to merge 3 commits into main from la-land-value-targets


@vahid-ahmadi (Collaborator)

Summary

  • Add 360 LA-level ons/household_land_value/{code} calibration targets
  • Generalise mhclg_regional_land.py methodology to local-authority granularity
  • Ship canonical input CSV, target source module, and 18 unit tests

Closes #370.

What this PR does

Extends the regional methodology to LA level

Each LA's share of national household land value is proportional to its total property wealth (households × avg_house_price), and the shares are scaled so that the LA totals match the ONS national household-land series. This is exactly the formula in mhclg_regional_land.py::_compute_regional_shares, applied one geography deeper.

share_la  = (households_la × avg_house_price_la) / Σ_LAs (households × avg_house_price)
target_la = share_la × HOUSEHOLD_LAND_VALUES[year]
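A minimal pandas sketch of the two formulas above. It assumes an input frame with the households and avg_house_price columns described under Files, indexed by LA code, and assumes HOUSEHOLD_LAND_VALUES behaves like a mapping from year to national total. The helper names, LA codes and numbers are hypothetical; this is not the PR's `_compute_la_shares` / `_compute_la_targets`.

```python
import pandas as pd


def compute_la_shares(la: pd.DataFrame) -> pd.Series:
    """Each LA's share of total property wealth (households x avg house price)."""
    wealth = la["households"] * la["avg_house_price"]
    # Normalise so the shares sum to 1 across all LAs.
    return wealth / wealth.sum()


def compute_la_targets(la: pd.DataFrame, national: dict[int, float]) -> pd.DataFrame:
    """LA x year table: share_la times the national household-land total for that year."""
    shares = compute_la_shares(la)
    return pd.DataFrame({year: shares * total for year, total in national.items()})


# Toy inputs indexed by placeholder LA codes (not real CSV rows or ONS values).
la = pd.DataFrame(
    {
        "households": [60_000, 64_000, 1_115],
        "avg_house_price": [1_200_000, 160_000, 300_000],
    },
    index=pd.Index(["LA_A", "LA_B", "LA_C"], name="code"),
)
national = {2023: 5.0e12, 2024: 5.2e12}  # hypothetical national totals

targets = compute_la_targets(la, national)
print(targets.sum())  # each year's column equals its national total, up to float rounding
```

Because the national total only enters as a final multiplier, the "targets sum to the national series" property holds by construction for every year.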

Files

New

  • policyengine_uk_data/storage/la_land_values.csv — 360 rows: code, name, households, avg_house_price.
    • households from the existing local_authority_weights.h5 (sum of each LA's 2025 weight row) — keeps household-count semantics aligned with the rest of the LA calibration.
    • avg_house_price from HM Land Registry UK HPI (Dec 2025). Primary match on ONS code, name-based fallback for LAs with re-allocated codes (e.g. Sheffield E08000019 → E08000039 in HPI), NI country-level HPI fallback for missing NI LGD months, national-avg fallback for Isles of Scilly.
  • policyengine_uk_data/targets/sources/la_land.py: _compute_la_shares(), _compute_la_targets(), and get_targets(), which returns 360 Target objects with geographic_level=LOCAL_AUTHORITY.
  • policyengine_uk_data/tests/test_la_land_value_targets.py — 18 unit tests.
  • changelog.d/370.md.
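The avg_house_price matching order described above (ONS code first, then name, then the NI country-level figure, then a national average) can be sketched as a plain lookup chain. The generation script is not shown in this PR, so the function, the dictionary inputs and the "GSS code starts with N" test for Northern Ireland are illustrative assumptions.

```python
def lookup_price(
    code: str,
    name: str,
    by_code: dict[str, float],
    by_name: dict[str, float],
    ni_country_price: float,
    national_price: float,
) -> tuple[float, str]:
    """Return (price, source) using the fallback order described in the PR (sketch)."""
    # 1. Primary: exact ONS/GSS code match against the HPI table.
    if code in by_code:
        return by_code[code], "code"
    # 2. Codes re-allocated in HPI (e.g. Sheffield E08000019 -> E08000039): match on name.
    if name in by_name:
        return by_name[name], "name"
    # 3. Northern Ireland LGDs (GSS codes starting with "N"): country-level NI figure.
    if code.startswith("N"):
        return ni_country_price, "ni_country"
    # 4. Anything still unmatched (e.g. Isles of Scilly): national average.
    return national_price, "national"


# Toy lookup tables with placeholder prices.
by_code = {"E08000039": 2.0}
by_name = {"Sheffield": 2.0}
print(lookup_price("E08000019", "Sheffield", by_code, by_name, 3.0, 4.0))  # (2.0, 'name')
```

Returning the source alongside the price makes it easy to log which rows fell through to a fallback, which is where this CSV's Isles of Scilly problem originated.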

Modified

  • policyengine_uk_data/tests/test_regional_land_value_targets.py: test_target_registry_includes_regional now filters by GeographicLevel.REGION. The regional and LA targets share the ons/household_land_value/ name prefix, so filtering by prefix alone would now pull in both.
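Why a prefix filter is no longer enough: once both target families share the ons/household_land_value/ prefix, only the geographic level separates them. The `Target` dataclass and `GeographicLevel` enum below are simplified stand-ins, not the repo's classes, and the name suffixes are placeholders.

```python
from dataclasses import dataclass
from enum import Enum


class GeographicLevel(Enum):  # simplified stand-in, not the repo's class
    REGION = "region"
    LOCAL_AUTHORITY = "local_authority"


@dataclass
class Target:  # simplified stand-in with only the fields used here
    name: str
    geographic_level: GeographicLevel


PREFIX = "ons/household_land_value/"
registry = [
    Target(PREFIX + "REGION_1", GeographicLevel.REGION),  # placeholder suffixes
    Target(PREFIX + "LA_1", GeographicLevel.LOCAL_AUTHORITY),
]

by_prefix = [t for t in registry if t.name.startswith(PREFIX)]
regional = [t for t in by_prefix if t.geographic_level is GeographicLevel.REGION]
print(len(by_prefix), len(regional))  # 2 1
```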

Tests

All are unit tests; no baseline fixture is needed:

CSV data quality

  • Row count matches local_authorities_2021.csv (360)
  • Columns match schema exactly
  • No missing values; covers E/W/S/NI
  • House prices in [£50k, £2m], households positive
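The CSV checks above map onto a handful of pandas assertions. This is a sketch, not the PR's test file: it takes the expected row count as a parameter instead of reading local_authorities_2021.csv, and it assumes country coverage is checked via the leading letter of the GSS code (E, W, S, N). The sample rows are placeholders.

```python
import io

import pandas as pd

EXPECTED_COLUMNS = ["code", "name", "households", "avg_house_price"]


def check_la_land_csv(df: pd.DataFrame, expected_rows: int) -> None:
    """Raise AssertionError if the LA land CSV fails the basic data-quality checks."""
    assert len(df) == expected_rows, f"expected {expected_rows} rows, got {len(df)}"
    assert list(df.columns) == EXPECTED_COLUMNS, list(df.columns)
    assert not df.isna().any().any(), "missing values"
    # Coverage of all four nations, via the GSS code's leading letter.
    assert {"E", "W", "S", "N"} <= set(df["code"].str[0])
    assert df["avg_house_price"].between(50_000, 2_000_000).all()
    assert (df["households"] > 0).all()


# Toy CSV with one placeholder row per nation (illustrative values only).
SAMPLE_CSV = (
    "code,name,households,avg_house_price\n"
    "E00000001,England LA,50000,300000\n"
    "W00000001,Wales LA,40000,200000\n"
    "S00000001,Scotland LA,45000,180000\n"
    "N00000001,NI LGD,30000,170000\n"
)
check_la_land_csv(pd.read_csv(io.StringIO(SAMPLE_CSV)), expected_rows=4)
```

Note that "households positive" only rules out zeros and negatives; it does not catch an implausibly large count.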

Share / target aggregation

  • Shares sum to 1; all positive
  • LA targets sum to the national ONS household-land series within 1e-6 for every year
  • Kensington and Chelsea avg household land > 3× Blackpool
  • London LAs dominate the top quintile of LAs by avg household land
  • London total land dwarfs North East total by >3×

Registry integration

  • get_targets() returns exactly 360
  • Target names follow ons/household_land_value/{code}; geo_code == code
  • All carry GeographicLevel.LOCAL_AUTHORITY
  • Every target has values for every year in HOUSEHOLD_LAND_VALUES
  • get_all_targets(year=2024, geographic_level=LOCAL_AUTHORITY) returns 360 LA land targets
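The naming and coverage checks amount to a shape check on each target. In this sketch a plain dict stands in for a Target object (the field names `name`, `geo_code` and `values` are hypothetical), and LA codes are assumed to be 9-character GSS codes: one letter followed by eight digits.

```python
import re

# Assumption: LA codes are 9-character GSS codes (one letter, eight digits).
NAME_RE = re.compile(r"^ons/household_land_value/([ENSW]\d{8})$")


def check_la_target(target: dict, years: list[int]) -> None:
    """Check one target's name pattern, geo_code and year coverage (dict stand-in)."""
    match = NAME_RE.match(target["name"])
    assert match, f"unexpected name: {target['name']}"
    # The code embedded in the name must equal geo_code.
    assert match.group(1) == target["geo_code"], (match.group(1), target["geo_code"])
    missing = [year for year in years if year not in target["values"]]
    assert not missing, f"missing years: {missing}"


example = {
    "name": "ons/household_land_value/E06000053",
    "geo_code": "E06000053",
    "values": {2023: 1.0, 2024: 1.0},  # placeholder values
}
check_la_target(example, years=[2023, 2024])
```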

Running the new tests plus adjacent suites (regional land, land targets, target DB, release manifest) gives 47 passed, 8 skipped.

Out of scope

Wiring these targets into datasets/local_areas/local_authorities/loss.py, so that the LA reweighting actually calibrates on them, is planned for a follow-up PR.

Sanity check — top 10 LAs by avg household land value (2024)

| LA | Avg household land | Avg house price (UK HPI) |
|---|---|---|
| Kensington and Chelsea | £622k | £1,178k |
| Westminster | £465k | £880k |
| Camden | £414k | £784k |
| Richmond upon Thames | £410k | £777k |
| Elmbridge | £392k | £743k |
| City of London | £391k | £740k |
| Hammersmith and Fulham | £377k | £714k |
| Islington | £369k | £700k |
| Wandsworth | £364k | £689k |
| Haringey | £331k | £627k |

The bottom 10 are all post-industrial or deprived areas (Inverclyde, East Ayrshire, West Dunbartonshire, Hull, Burnley, Hartlepool, Aberdeen, North Ayrshire, Hyndburn, Blackpool), all at £60–72k.
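Dividing an LA's target by its household count shows what the ranking checks and the table above actually measure. Writing h for households, p for avg_house_price and V_t for HOUSEHOLD_LAND_VALUES[t]:

$$
\frac{\text{target}_{la}}{h_{la}}
= \frac{h_{la}\, p_{la}}{\sum_{j} h_j\, p_j} \cdot \frac{V_t}{h_{la}}
= p_{la} \cdot \frac{V_t}{\sum_{j} h_j\, p_j}
$$

The second factor is the same for every LA in a given year, so average household land is a fixed multiple of average house price. This is why the two table columns move in lockstep (622/1,178 and 331/627 are both about 0.53), and why "K&C > 3× Blackpool" is equivalent to the same ratio holding for average house prices.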

Sources

Related

Generalises targets/sources/mhclg_regional_land.py to local-authority
level. Each LA's share of national household land is proportional to
households x avg_house_price, scaled to the ONS National Balance Sheet
household-land series.

Inputs (all already used elsewhere in the repo):
- storage/la_land_values.csv: 360 LAs with households (from the existing
  local_authority_weights.h5 matrix) and avg_house_price (HM Land
  Registry UK HPI Dec 2025).
- _land.HOUSEHOLD_LAND_VALUES for the national anchor.

Tests cover CSV data quality, share/target aggregation, sensible
ordering (K&C > Blackpool by >3x, London boroughs in top quintile),
and registry integration.

Updates test_regional_land_value_targets.py to filter by
GeographicLevel.REGION now that LA targets share the same name prefix.

Closes #370

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
vahid-ahmadi self-assigned this on Apr 20, 2026
@vahid-ahmadi (Collaborator, Author)

Note for whoever picks up #357: this PR mirrors mhclg_regional_land.py's property-wealth-share methodology at LA level, so it inherits the same flat-land-share assumption. When the regional shares are updated to multiply by property_wealth_intensity[region], the LA CSV here (storage/la_land_values.csv) needs regenerating in the same PR — same multiplier, with each LA using its region's intensity — so the LA and regional targets stay numerically consistent with each other and with the region-aware formula in policyengine-uk.
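If the #357 follow-up applies a regional intensity multiplier, the LA share formula would change roughly as follows when the CSV is regenerated. `property_wealth_intensity` is the name used in the comment; the LA-to-region mapping, the dict-based inputs and the example numbers are assumptions for illustration.

```python
def intensity_weighted_shares(
    households: dict[str, float],
    avg_price: dict[str, float],
    la_region: dict[str, str],
    property_wealth_intensity: dict[str, float],
) -> dict[str, float]:
    """LA shares with each LA scaled by its region's intensity (hypothetical sketch)."""
    weighted = {
        la: households[la] * avg_price[la] * property_wealth_intensity[la_region[la]]
        for la in households
    }
    total = sum(weighted.values())
    # Renormalise so the LA shares still sum to 1.
    return {la: w / total for la, w in weighted.items()}


# Toy example: two placeholder regions, three placeholder LAs.
hh = {"a": 100.0, "b": 300.0, "c": 50.0}
price = {"a": 2.0, "b": 1.0, "c": 4.0}
region = {"a": "R1", "b": "R2", "c": "R2"}
print(intensity_weighted_shares(hh, price, region, {"R1": 2.0, "R2": 1.0}))
```

With every intensity set to 1.0 the function reduces to the current flat formula, which gives a simple regression check when the multiplier is introduced.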

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@MaxGhenis (Contributor)

Blocker: data bug in la_land_values.csv — Isles of Scilly (E06000053) has 2,492,115 households. The real figure is ~1,000 (pop ~2,000, 1,115 households per ONS mid-2023 estimate).

Impact: IoS alone absorbs 8.6% of the national household share (2.49M / 29.0M), which the methodology then multiplies by the national household-land value — £65 bn of UK household land 'lives' in Scilly under this target, depressing every other LA's share by that amount. London LAs take the biggest hit because their share-of-average-price is highest.

Quick verification:

$ awk -F',' 'NR>1 {sum+=$3} END {print sum/1e6}' la_land_values.csv
31.5   # with IoS bug
$ awk -F',' 'NR>1 && $2 != "Isles of Scilly" {sum+=$3} END {print sum/1e6}' la_land_values.csv
29.0   # without — matches ONS ~29.4M
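The same totals can be computed with a CSV-aware reader. One reason to prefer it over awk -F',': some official ONS LA names contain commas (for example "Bristol, City of"), and if a CSV quotes such names a plain comma split shifts the households column for those rows. Whether this CSV uses those name forms is not stated in the thread, so treat this as a defensive sketch; the rows below are toy values.

```python
import csv
import io


def total_households(csv_text: str, exclude_names: frozenset = frozenset()) -> float:
    """Sum the households column, skipping rows whose name is in exclude_names."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return sum(
        float(row["households"]) for row in reader if row["name"] not in exclude_names
    )


# Toy CSV: the quoted name contains a comma, which the csv module handles correctly.
SAMPLE = (
    "code,name,households,avg_house_price\n"
    'X1,"Bristol, City of",200,300000\n'
    "E06000053,Isles of Scilly,2000,300000\n"
)
print(total_households(SAMPLE))  # 2200.0
print(total_households(SAMPLE, frozenset({"Isles of Scilly"})))  # 200.0
```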

Looks like a UK-HPI 'national-total-as-fallback' path leaked into one LA row. Likely two lines to fix:

  1. Correct la_land_values.csv row to E06000053,Isles of Scilly,1115,308582 (or whatever the canonical source gives).
  2. Add a test that bounds households per LA, e.g. assert (df.households.between(500, 500_000)).all(). None of the current 18 tests catch a 1000x outlier.

Happy to approve once that's in. The methodology itself is sound — mirrors mhclg_regional_land.py::_compute_regional_shares correctly and the target shape aligns with the regional targets.

The E06000053 row carried households=2,492,115 — roughly the South
West region total — from an upstream fallback that fired during CSV
generation. Real IoS has ~1,115 households per ONS mid-2023. With
the bug, IoS absorbed 7.85% of the national property-wealth share,
understating every other LA's 2024 target by ~8.5% (e.g. K&C moved
from £42.6bn to £46.2bn after the fix).

Two new tests prevent the regression:
- test_households_within_plausible_range: bounds every LA to
  [500, 500_000] so any future 10x+ outlier fails immediately.
- test_isles_of_scilly_households_are_thousands_not_millions: tight
  [500, 5_000] bound on the specific row that leaked.

Methodology unchanged; LA targets still sum to the ONS national
household-land series within 1e-6.
@vahid-ahmadi (Collaborator, Author)

@MaxGhenis thanks — fixed in 3ed729c.

Data fix

  • Patched E06000053,Isles of Scilly from households=2,492,115 to 1,115 (ONS mid-2023 estimate). avg_house_price=308,582 kept as-is.
  • Post-fix UK household total across the CSV drops to 29.73M, in line with the ~29.4M ONS figure you quoted (was 31.5M pre-fix, matching your awk).

Quantified impact of the fix

  • IoS share of national property wealth: 7.85% → 0.0038%.
  • K&C 2024 target: £42.6bn → £46.2bn (+8.5%). Every non-IoS LA gets the same ~8.5% uplift; London LAs were the most suppressed.
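The ~8.5% uplift follows directly from the share arithmetic. Every non-IoS LA keeps the same property wealth, so its share scales by (1 − s_after) / (1 − s_before), where s is the IoS share. A quick check using the figures quoted above:

```python
ios_share_before = 0.0785  # IoS share of national property wealth with the bug
ios_share_after = 0.000038  # 0.0038% after the fix

# Non-IoS wealth is unchanged, so each other LA's share scales by this ratio.
uplift = (1 - ios_share_after) / (1 - ios_share_before)
print(f"uplift: {uplift - 1:.1%}")  # uplift: 8.5%

kc_before_bn = 42.6
print(f"K&C after: £{kc_before_bn * uplift:.1f}bn")  # K&C after: £46.2bn
```

The result reproduces the reported K&C figure, which is a useful cross-check that the fix changed nothing except the IoS row.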

Tests added

  • test_households_within_plausible_range — bounds every LA to [500, 500_000] per your suggestion. A future fallback leak of this class fails immediately.
  • test_isles_of_scilly_households_are_thousands_not_millions — explicit [500, 5,000] bound on E06000053 so the specific row that leaked has a named regression test.
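A sketch of what the two regression checks could look like. They are written as plain functions taking a DataFrame so they run without the repo's CSV; the PR's actual tests are named test_households_within_plausible_range and test_isles_of_scilly_households_are_thousands_not_millions, and their bodies are not shown here. The bounds are the ones quoted in the thread.

```python
import pandas as pd

ISLES_OF_SCILLY = "E06000053"


def check_households_within_plausible_range(df: pd.DataFrame) -> None:
    # A leaked regional or national total (millions of households) falls outside this band.
    bad = df.loc[~df["households"].between(500, 500_000), ["code", "households"]]
    assert bad.empty, f"implausible household counts:\n{bad}"


def check_isles_of_scilly_households(df: pd.DataFrame) -> None:
    ios = df.loc[df["code"] == ISLES_OF_SCILLY, "households"]
    assert len(ios) == 1, "expected exactly one Isles of Scilly row"
    assert 500 <= ios.iloc[0] <= 5_000, ios.iloc[0]


# Toy frames: the fixed IoS row passes, the pre-fix value fails.
fixed = pd.DataFrame({"code": [ISLES_OF_SCILLY, "E00000001"], "households": [1_115, 200_000]})
check_households_within_plausible_range(fixed)
check_isles_of_scilly_households(fixed)
```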

Full suite: 20/20 pass locally via uv run pytest policyengine_uk_data/tests/test_la_land_value_targets.py.

Generation-path note: the 2,492,115 figure matches the South West regional household total, so the fallback that fired during CSV generation was a regional sum, not "national-avg" as the PR body suggested. I'll correct the PR description; worth flagging for whoever regenerates the CSV next.

vahid-ahmadi requested a review from MaxGhenis on April 23, 2026 13:34

Development

Successfully merging this pull request may close these issues.

Add LA-level household land value calibration targets

2 participants