Conversation
Overall Assessment: Static review only. I could not execute the test suite in this shell because the available Python environment is missing core dependencies.
Methodology: The documented extension itself is acceptable; the blockers below are implementation errors inside that extension.
Code Quality: No material findings beyond the methodology issues above.
Performance: No material findings in the changed code.
Security: No findings.
/ai-review
🔁 AI review rerun (requested by @igerber). Static review only; I could not run the test suite in this shell.
/ai-review
🔁 AI review rerun (requested by @igerber). Static review only; this shell does not have the required Python dependencies.
/ai-review
🔁 AI review rerun (requested by @igerber). Static review only; this shell does not have the required Python dependencies.
Methodology: Affected methods are the core dCDH survey extension, the Phase 3 heterogeneity test, and the TWFE diagnostic surface.
Code Quality: No additional findings beyond the methodology propagation issues above.
Performance: No material performance findings in the changed code.
Tech Debt: No relevant findings.
Security: No findings.
/ai-review
🔁 AI review rerun (requested by @igerber). Static review only; I could not run the test suite in this shell because the Python environment is missing core dependencies.
Performance: No findings. Maintainability: No findings.
Security: No findings.
/ai-review
🔁 AI review rerun (requested by @igerber). Overall Assessment: ✅ Looks good. Static review only; I could not execute the test suite in this shell because the available Python environment is missing core dependencies.
/ai-review
🔁 AI review rerun (requested by @igerber). Overall Assessment: ✅ Looks good. Static review only; I could not run the test suite in this shell.
Code Quality: No findings. Performance: No findings. Maintainability: No findings. Tech Debt: No findings. Security: No findings.
/ai-review
🔁 AI review rerun (requested by @igerber). Overall Assessment: Static review only; this shell does not have the required Python dependencies.
/ai-review
🔁 AI review rerun (requested by @igerber). Static review only; I could not run the test suite in this environment because core Python dependencies are missing.
Performance: No findings. Maintainability: No findings.
Tech Debt: No findings; I did not find a mitigating entry for the P1 above.
Security: No findings.
/ai-review
🔁 AI review rerun (requested by @igerber). Overall Assessment: ✅ Looks good. Static review only; I could not run the test suite here.
- Weighted cell aggregation in _validate_and_aggregate_to_cells()
- Survey resolution via _resolve_survey_for_fit() with pweight-only and group-constant validation
- IF expansion from group to observation level for TSL variance
- Survey-aware SE at all call sites (overall, joiners, leavers, multi-horizon, placebos) via _compute_se() dispatcher
- Bootstrap + survey warning (PSU-level deferred)
- 12 new tests in test_survey_dcdh.py
- Documentation updates: REGISTRY.md, ROADMAP.md, llms-full.txt, choosing_estimator.rst

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
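The TSL variance mentioned above operates on influence-function (IF) contributions. A minimal sketch of the textbook stratified-PSU linearization variance, assuming with-replacement PSU sampling; the function and column names are illustrative, not the library's API:

```python
import numpy as np
import pandas as pd

def stratified_psu_variance(psi, strata, psu):
    """Taylor-linearization variance for an IF-based estimator: sum psi
    within each PSU, then within each stratum of n_h PSU totals accumulate
    n_h / (n_h - 1) * sum_j (z_hj - zbar_h)^2."""
    d = pd.DataFrame({"psi": psi, "strata": strata, "psu": psu})
    z = d.groupby(["strata", "psu"])["psi"].sum().reset_index()
    var = 0.0
    for _, zh in z.groupby("strata"):
        n_h = len(zh)
        if n_h < 2:
            continue  # a single-PSU stratum cannot be centered; skipped here
        dev = zh["psi"] - zh["psi"].mean()
        var += n_h / (n_h - 1) * float((dev ** 2).sum())
    return var

# One stratum, two PSUs with totals 1 and 3:
# var = 2/1 * ((1 - 2)^2 + (3 - 2)^2) = 4, so SE = 2
v = stratified_psu_variance([1.0, 0.0, 3.0, 0.0], [1, 1, 1, 1], [1, 1, 2, 2])
```

The single-PSU-stratum policy (skip vs. center at the grand mean) is a design choice survey packages expose as an option; the sketch simply skips such strata.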
…torize IF expansion
- Remove the _validate_group_constant_survey() call: the IF expansion psi_i = U[g] * (w_i / W_g) handles observation-level variation in weights, strata, and PSU within groups correctly
- Vectorize _survey_se_from_group_if using np.bincount + np.unique (was Python loops over all observations)
- Replace test_rejects_varying_weights_within_group with two positive tests: varying weights accepted, and varying weights change the ATT (time-varying noise to survive first-differencing)
- Remove unused survey_weight_type variable

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
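The np.unique + np.bincount vectorization of the group-to-observation IF expansion can be sketched as follows; expand_group_if and its arguments are hypothetical stand-ins for the library's internals:

```python
import numpy as np

def expand_group_if(group_ids, w, U):
    """Expand a group-level influence function U[g] to observation level:
    psi_i = U[g(i)] * (w_i / W_g), with W_g the total weight of group g.
    Vectorized via np.unique/np.bincount instead of a Python loop."""
    uniq, inv = np.unique(group_ids, return_inverse=True)  # obs -> group index
    W_g = np.bincount(inv, weights=w)                      # per-group weight totals
    return U[inv] * (w / W_g[inv])

# Two groups with group-level IFs [2.0, -1.0]
gid = np.array([0, 0, 1, 1, 1])
w = np.array([1.0, 3.0, 2.0, 2.0, 2.0])
psi = expand_group_if(gid, w, np.array([2.0, -1.0]))
# psi = [0.5, 1.5, -1/3, -1/3, -1/3]; within each group psi sums back to U[g]
```

The w_i / W_g ratio is what makes within-group weight variation safe: the group's IF mass is split proportionally, and summing psi over a group always recovers U[g].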
- P1-A: Scale U_centered by 1/divisor before survey IF expansion. dCDH IFs are numerator-scale (U.sum() == N_S * DID_M), but compute_survey_if_variance() expects estimator-scale psi.
- P1-B: Zero-weight cells (w_gt <= 0) are now treated as absent by setting n_gt=0, preventing NaN propagation into estimates.
- P2: Add an SE-pinning test (uniform weights + PSU=group matches the plug-in SE) and a zero-weight cell exclusion test.
- P3: Delete the unused _validate_group_constant_survey() from survey.py, which contradicted the supported within-group variation contract.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
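The P1-A bug class is easy to demonstrate: when IF contributions are numerator-scale (their sum equals divisor times the estimate), a variance routine expecting estimator-scale psi overstates the SE by exactly that divisor. A toy sketch, with the scaling convention assumed from the commit message rather than taken from the library:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
U = rng.normal(1.0, 2.0, size=n)   # numerator-scale IF contributions
divisor = float(n)                 # convention: U.sum() == divisor * estimate

def if_se(psi):
    """Plug-in SE from estimator-scale IFs: sqrt(sum of squared centered psi)."""
    c = psi - psi.mean()
    return np.sqrt((c ** 2).sum())

se_wrong = if_se(U)            # numerator-scale psi fed in directly
se_right = if_se(U / divisor)  # rescaled first, as in the P1-A fix
# se_wrong / se_right == divisor exactly, since if_se is linear in its input
```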
…ey propagation
- P1-A: Drop zero-weight rows entirely from the cell DataFrame (instead of just zeroing n_gt) so the ragged-panel validator doesn't see them
- P1-B: Survey-weighted covariate aggregation in the DID^X path (sum(w*x)/sum(w) instead of an unweighted mean)
- P1-C: Thread _df_survey to all remaining safe_inference() calls: bootstrap t-stats, normalized effects, cost-benefit delta, placebo bootstrap t-stats
- P3: Fix the REGISTRY overview paragraph (it still said survey support was deferred)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
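The P1-B fix replaces an unweighted cell mean with sum(w*x)/sum(w). A small pandas sketch with hypothetical column names shows why it matters, including how a zero-weight subpopulation row drops out:

```python
import pandas as pd

df = pd.DataFrame({
    "cell": ["a", "a", "b", "b"],
    "x":    [1.0, 3.0, 2.0, 10.0],
    "w":    [1.0, 3.0, 1.0, 0.0],   # pweights; the zero-weight row is out of sample
})

# Unweighted mean (the bug): every row counts equally, including w == 0
unweighted = df.groupby("cell")["x"].mean()

# Survey-weighted mean (the fix): sum(w * x) / sum(w) per cell
wx = (df["x"] * df["w"]).groupby(df["cell"]).sum()
weighted = wx / df.groupby("cell")["w"].sum()
# cell "a": (1*1 + 3*3) / 4 = 2.5 ; cell "b": the zero-weight row drops out -> 2.0
```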
- P0: The delta overall surface now uses _df_survey instead of df=None at both safe_inference sites (primary delta path + placebo NaN-SE fallback). This makes overall_* under L_max >= 2 use survey-t inference and respects safe_inference's df <= 0 NaN guard.
- P1: HonestDiD dCDH extraction now propagates df_survey from survey_metadata (mirroring the CS pattern). Survey-backed dCDH HonestDiD bounds now use survey-aware critical values.
- P2: Add 4 regression tests (survey delta t matches reported, t-vs-z differs, survey + controls, survey + honest_did df propagation). Update a stale comment in test_dcdh_extraction.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
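The survey-t dispatch with a df <= 0 NaN guard can be sketched as below; safe_t_inference is a stand-in, not the library's safe_inference(), and the design-df convention (PSUs minus strata) is an assumption:

```python
import numpy as np
from scipy import stats

def safe_t_inference(est, se, df_survey=None, alpha=0.05):
    """CI using the survey t distribution when df is available, z otherwise.
    Guard: df_survey <= 0 (too few PSUs) yields NaN bounds instead of a
    bogus interval."""
    if df_survey is not None:
        if df_survey <= 0:
            return np.nan, np.nan
        crit = stats.t.ppf(1 - alpha / 2, df_survey)
    else:
        crit = stats.norm.ppf(1 - alpha / 2)
    return est - crit * se, est + crit * se

lo_z, hi_z = safe_t_inference(1.0, 0.5)               # z critical value, ~1.96
lo_t, hi_t = safe_t_inference(1.0, 0.5, df_survey=5)  # t_5 critical value, ~2.57
# The survey-t interval is strictly wider at small design df
```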
- P1 #1: _compute_heterogeneity_test now accepts obs_survey_info and runs survey-aware WLS + a Binder TSL IF when survey_design is active. The point estimate comes from solve_ols(weights=W_elig, weight_type='pweight'); the group-level IF ψ_g[X] = inv(X'WX)[1,:] @ x_g * W_g * r_g is expanded to observation level via the w_i/W_g ratio, then compute_survey_if_variance handles stratified/PSU variance. safe_inference uses df_survey. Rank deficiency short-circuits to NaN to avoid a point-estimate/IF mismatch between solve_ols's R-style column drop and pinv's minimum-norm solution.
- P1 #2: twowayfeweights() now accepts Optional[SurveyDesign]. When provided, it resolves weights via _resolve_survey_for_fit and passes them to _validate_and_aggregate_to_cells, restoring fit-vs-helper parity under survey-backed inputs. fweight/aweight are rejected.
- P3: REGISTRY updates: the TWFE parity sentence now includes survey; the heterogeneity Note documents the TSL IF mechanics and the library-extension disclaimer; the line-651 checklist lists survey-aware surfaces; a new survey + bootstrap-fallback Note follows line 652.
- P2: 5 new regression tests in test_survey_dcdh.py: TestSurveyHeterogeneity (uniform-weights match, non-uniform beta change, t-dist df_survey) and TestSurveyTWFEParity (fit-vs-helper match, non-pweight rejection). All 254 targeted tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
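The WLS-plus-IF mechanics in P1 #1 follow a standard pattern: the IF for the slope is the sandwich-form contribution e1' (X'WX)^{-1} x_g * W_g * r_g. A self-contained sketch under those textbook formulas (not the library's solve_ols API):

```python
import numpy as np

def wls_with_unit_if(X, y, W):
    """Weighted least squares plus per-unit IFs for the slope (column 1).
    psi_g = row 1 of (X'WX)^{-1} dotted with x_g, times W_g * r_g, where
    r_g is the residual. The psi sum to zero by the normal equations."""
    XtWX = X.T @ (W[:, None] * X)
    beta = np.linalg.solve(XtWX, X.T @ (W * y))
    r = y - X @ beta
    inv_row = np.linalg.solve(XtWX, np.eye(X.shape[1]))[1]  # row 1 of (X'WX)^-1
    psi = (X @ inv_row) * W * r
    return beta, psi

rng = np.random.default_rng(1)
n = 200
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
W = rng.uniform(0.5, 2.0, size=n)
y = 0.5 + 1.5 * x + rng.normal(size=n)
beta, psi = wls_with_unit_if(X, y, W)
# The first-order condition X'W r = 0 forces psi.sum() to be ~0
```

In the library's described flow these psi would then be expanded to observation level and fed to the stratified/PSU variance routine; this sketch stops at the unit-level IFs.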
- P1 #1: _compute_twfe_diagnostic now uses cell_weight (w_gt when available, else n_gt) for the FE regressions, the normalization denominator, the contribution weights, and the Corollary 1 observation shares. On survey-backed inputs the outputs now match the observation-level pweighted TWFE estimand; the non-survey path is byte-identical.
- P1 #2: Zero-weight rows are dropped before the groupby in _validate_and_aggregate_to_cells when weights are provided, so that d_min/d_max/n_gt reflect the effective sample. This prevents zero-weight subpopulation rows from tripping the fuzzy-DiD guard or inflating downstream n_gt counts.
- P2: 2 new regression tests in test_survey_dcdh.py: TestSurveyTWFEOracle.test_survey_twfe_matches_obs_level_pweighted_ols verifies that beta_fe matches an observation-level pweighted OLS under survey (it would fail if n_gt were still used), and TestZeroWeightSubpopulation.test_mixed_zero_weight_row_excluded_from_validation verifies that an injected zero-weight row with the opposite treatment value doesn't trip the within-cell constancy check. All 256 targeted tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
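Why w_gt (and not n_gt) is the right cell weight: when the regressors are cell-constant, weighting cell means by total cell weight reproduces the observation-level pweighted OLS exactly, because the normal equations aggregate term by term. A simulated check under that assumption (all names are illustrative):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
n = 300
df = pd.DataFrame({
    "g": rng.integers(0, 5, n),
    "t": rng.integers(0, 4, n),
    "w": rng.uniform(0.2, 3.0, n),
})
df["d"] = ((df["g"] + df["t"]) % 2).astype(float)   # treatment, constant within (g, t)
df["y"] = 1.0 + 2.0 * df["d"] + 0.3 * df["g"] + 0.1 * df["t"] + rng.normal(size=n)

def pweighted_beta(X, y, w):
    """Weighted OLS coefficients via the normal equations."""
    XtWX = X.T @ (w[:, None] * X)
    return np.linalg.solve(XtWX, X.T @ (w * y))

def design(frame):
    """Intercept, treatment, and group/time fixed-effect dummies."""
    G = pd.get_dummies(frame["g"], prefix="g", drop_first=True, dtype=float)
    T = pd.get_dummies(frame["t"], prefix="t", drop_first=True, dtype=float)
    return np.column_stack([np.ones(len(frame)), frame["d"], G, T])

# Observation-level pweighted TWFE
b_obs = pweighted_beta(design(df), df["y"].to_numpy(), df["w"].to_numpy())

# Cell level: weighted mean outcome ybar_gt plus total weight w_gt per (g, t)
agg = (df.assign(wy=df["w"] * df["y"])
         .groupby(["g", "t"], as_index=False)
         .agg(w_gt=("w", "sum"), wy=("wy", "sum"), d=("d", "first")))
agg["ybar"] = agg["wy"] / agg["w_gt"]
b_cell = pweighted_beta(design(agg), agg["ybar"].to_numpy(), agg["w_gt"].to_numpy())
# b_cell matches b_obs: with w_gt as the cell weight, the cell-level normal
# equations are identical to the observation-level ones; n_gt would not be
```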
- twowayfeweights() now raises NotImplementedError when the resolved survey design carries replicate weights, matching fit()'s contract. The fit/helper parity promise now holds for unsupported inputs too.
- docs/llms-full.txt and ROADMAP.md no longer claim that Phase 3 parameters raise NotImplementedError; both now correctly note that only `aggregate` remains gated.
- Added a helper-level regression test that pins the replicate-weight rejection, complementing the existing fit()-level test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The results dataclass docstring still described survey_metadata as "always None" and survey integration as deferred. survey_design is now implemented, and results.survey_metadata is populated whenever a SurveyDesign is passed to fit(). The docstring now describes the populated field and its role in survey-aware inference and HonestDiD propagation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…vey branch tests
- P1 #1: The R5 zero-weight filter only ran inside the cell aggregation step, after the NaN/coercion checks for group/time/treatment/outcome. Moved the filter to the very top of _validate_and_aggregate_to_cells so validation only sees the effective sample. fit()'s controls, trends_nonparam, and heterogeneity blocks now also scope their NaN/time-invariance checks to positive-weight rows when survey_weights is active. Legitimate SurveyDesign.subpopulation() inputs with NaN in excluded rows now fit cleanly. The TSL variance path is unchanged (zero-weight observations still contribute zero psi).
- P2: 5 new regression tests in test_survey_dcdh.py: TestZeroWeightSubpopulation now covers NaN outcome and NaN heterogeneity columns in excluded rows; new TestSurveyTrendsLinear / TestSurveyTrendsNonparam / TestSurveyDesign2 classes exercise survey_design combined with those previously untested branches. All 262 targeted tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
R8's controls-block fix scoped NaN/Inf validation to the positive-weight subset (a shorter data_controls) but then assigned those shorter arrays into an x_agg_input built from the full-length frame, causing a length mismatch on any SurveyDesign.subpopulation() / zero-weight excluded row before covariate aggregation could run. Root-caused fix: derive both the validation frame and the aggregation frame from the same positive-weight effective sample (data_eff, survey_weights_eff). Zero-weight rows are now genuinely out-of-sample throughout the DID^X path. Non-survey fits are unchanged.

Added TestZeroWeightSubpopulation.test_zero_weight_row_with_nan_control, pinning the subpopulation contract for the DID^X path: it injects a zero-weight row with a NaN control value and asserts fit() succeeds. All 263 targeted tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
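The root-caused fix derives every downstream frame from one positive-weight effective sample, which is what keeps lengths aligned. A minimal sketch of that contract; effective_sample and the column names are hypothetical:

```python
import numpy as np
import pandas as pd

def effective_sample(df, weights):
    """Restrict both validation and aggregation inputs to the positive-weight
    effective sample, so zero-weight (subpopulation-excluded) rows can carry
    NaN in any column without tripping NaN checks or length mismatches."""
    w = np.asarray(weights, dtype=float)
    mask = w > 0
    return df.loc[mask].reset_index(drop=True), w[mask]

df = pd.DataFrame({
    "g": [1, 1, 2, 2],
    "x": [0.3, 0.1, np.nan, 0.7],   # NaN control only on the excluded row
})
w = [1.0, 2.0, 0.0, 1.5]
df_eff, w_eff = effective_sample(df, w)
# Validation on df_eff sees no NaN, and len(df_eff) == len(w_eff) by construction
```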
- diff_diff/guides/llms-practitioner.txt: add a reversible-treatment branch to the Step 4 decision tree pointing at ChaisemartinDHaultfoeuille (alias DCDH), and a short Step 5 example showing the survey_design + L_max + TSL fallback for dCDH. The practitioner guide previously had zero dCDH coverage despite the full API being live.
- diff_diff/guides/llms-full.txt: fix the stale SyntheticDiDResults table (line 1024): variance_method now accepts "bootstrap", "jackknife", or "placebo" (jackknife was added for synthdid::vcov parity but the table only listed bootstrap/placebo).

Bundled-guide paths only. The old docs/llms-*.txt were moved to diff_diff/guides/ in main (PR #310); these edits target the new canonical location consumed by get_llm_guide().

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
3624305 to d045987

/ai-review
🔁 AI review rerun (requested by @igerber). Static review only; I could not run the test suite in this shell.
Methodology: Affected methods: survey-backed
Performance: No findings. Maintainability: No findings. Tech Debt: No findings.
Security: No findings.
- P1 #1/#2: Add a _validate_group_constant_strata_psu() helper and call it from fit() after the weight_type/replicate-weights checks. The dCDH IF expansion psi_i = U[g] * (w_i / W_g) treats each group as the effective sampling unit; when strata or PSU vary within a group it silently spreads horizon-specific IF mass across observations in different PSUs, contaminating the stratified-PSU variance. Walk back the overstated claim in the old line-669 comment to match. Within-group-varying weights remain supported.
- P1 #3: _survey_se_from_group_if now filters zero-weight rows before np.unique/np.bincount so NaN / non-comparable group IDs on excluded subpopulation rows cannot crash SE factorization. psi stays full-length with zeros in the excluded positions to preserve alignment with resolved.strata / resolved.psu inside compute_survey_if_variance.
- REGISTRY.md line-652 Note updated: it now explicitly states the within-group-constant strata/PSU requirement and the within-group-varying weights support.
- Tests: new TestSurveyWithinGroupValidation class (4 tests: rejects varying PSU, rejects varying strata, accepts varying weights, and ignores zero-weight rows during the constancy check) plus TestZeroWeightSubpopulation.test_zero_weight_row_with_nan_group_id. All 268 targeted tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
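The within-group constancy check described in P1 #1/#2 reduces to a per-group nunique over the design columns. A sketch with hypothetical names (the library's validator raises instead of returning, per the commit):

```python
import pandas as pd

def group_constant_violations(df, group_col, cols):
    """Return design columns (e.g. strata, PSU) that vary within some group.
    When the group is the effective sampling unit for the IF expansion,
    within-group variation in strata/PSU would misallocate IF mass across
    PSUs, so such inputs should be rejected up front."""
    return [
        col for col in cols
        if (df.groupby(group_col)[col].nunique(dropna=True) > 1).any()
    ]

df = pd.DataFrame({
    "g":      [1, 1, 2, 2],
    "psu":    [10, 10, 20, 21],   # varies within group 2 -> violation
    "strata": [1, 1, 2, 2],       # constant within each group -> fine
})
bad = group_constant_violations(df, "g", ["psu", "strata"])
# → ["psu"]
```

Note that dropna=True mirrors the described contract of ignoring zero-weight rows (whose design columns may be NaN) during the constancy check, provided those rows are filtered or NaN-masked first.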
/ai-review
🔁 AI review rerun (requested by @igerber). Static review only; I could not run the test suite in this shell.
Code Quality: No additional findings beyond the methodology issue above.
Performance: No findings. Maintainability: No findings.
Security: No findings.
Summary
Methodology references (required if estimator / math changes)
Validation
Security / privacy
Generated with Claude Code