
Add HonestDiD integration for dCDH, summary() Phase 3 blocks#303

Merged
igerber merged 12 commits into main from feature/dcdh-honest-did-integration
Apr 14, 2026

Conversation

@igerber
Owner

igerber commented Apr 14, 2026

Summary

  • Add HonestDiD (Rambachan-Roth 2023) integration on dCDH placebo + event study surface via honest_did=True in fit() and standalone compute_honest_did(results) path
  • Add 5 new summary() rendering sections for Phase 3 results: covariate diagnostics, cumulated level effects, heterogeneity test, design-2 descriptive, HonestDiD sensitivity bounds
  • Add trends_nonparam unequal-support regression test (Assumption 14 support-trimming)
  • Address AI review: runtime warning about placebo-based pre-periods, docstring update, summary refactor into helpers, edge-case test

Methodology references (required if estimator / math changes)

  • Method name(s): HonestDiD / Rambachan-Roth sensitivity analysis applied to ChaisemartinDHaultfoeuille (dCDH) estimator
  • Paper / source link(s): Rambachan, A. & Roth, J. (2023). A More Credible Approach to Parallel Trends. Review of Economic Studies, 90(5), 2555-2591. https://doi.org/10.1093/restud/rdad018
  • Any intentional deviations from the source (and why):
    • Library extension: dCDH HonestDiD uses DID^{pl}_l placebo estimates as pre-period coefficients rather than standard event-study pre-treatment coefficients. Runtime warning emitted. Documented in REGISTRY.md.
    • Diagonal covariance: No full cross-horizon VCV available for dCDH; uses diag(SE^2). Documented in REGISTRY.md.
    • Permissive consecutiveness: Gaps from trends_nonparam support-trimming are handled by filtering to largest consecutive block with warning (CS branch errors on gaps).
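The diagonal-covariance deviation above can be illustrated in a couple of lines (values and variable names are made up for illustration, not taken from the library):

```python
import numpy as np

# Sketch of the diag(SE^2) deviation: with no full cross-horizon VCV
# available for dCDH, the covariance passed to HonestDiD is built from
# per-horizon standard errors alone, so every cross-horizon covariance
# is implicitly assumed to be zero.
se = np.array([0.10, 0.12, 0.15])   # hypothetical SEs, one per horizon
sigma = np.diag(se ** 2)            # stand-in for the HonestDiD VCV input

print(sigma)
```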

Validation

  • Tests added/updated:
    • tests/test_chaisemartin_dhaultfoeuille.py: +13 tests (7 HonestDiD integration, 5 summary rendering, 1 unequal-support)
    • tests/test_honest_did.py: +5 tests (integration, extraction, no-placebos error, placebo warning, empty-block edge case)
    • 232 total tests passing (178 dCDH + 15 parity + 40 HonestDiD), 0 failures
  • Backtest / simulation / notebook evidence (if applicable): N/A

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

igerber and others added 2 commits April 14, 2026 08:24
…regression test

- Add ChaisemartinDHaultfoeuilleResults extraction to _extract_event_study_params()
  in honest_did.py (maps placebo horizons to pre-periods, event study to post-periods)
- Lift honest_did gate in fit(), add early L_max>=1 validation, post-computation
  compute_honest_did() call with fallback warning on solver failures
- Add 5 new summary() sections: covariate diagnostics, cumulated level effects,
  heterogeneity test, design-2 descriptive, HonestDiD sensitivity bounds
- Update honest_did_results field type annotation and docstring
- 17 new tests: 7 HonestDiD integration, 5 summary rendering, 3 honest_did.py
  extraction/integration, 1 trends_nonparam unequal-support, 1 gate update
- REGISTRY.md: HonestDiD note, checklist update; ROADMAP: 3g shipped

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…-case test

- Add UserWarning in dCDH HonestDiD extraction about placebo-based pre-periods
- Update REGISTRY.md to explicitly document library extension semantics
- Update fit() docstring for honest_did (was "Reserved for Phase 3")
- Include exception class name in HonestDiD failure warning
- Factor summary() Phase 3 blocks into 5 private helper methods
- Add test_dcdh_emits_placebo_warning and test_dcdh_empty_consecutive_block_raises

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions

Overall Assessment
⛔ Blocker — the PR’s dCDH HonestDiD integration introduces one silent methodology bug and one blocking parameter-interaction bug.

Executive Summary

  • The documented dCDH deviations themselves are fine: placebo-based pre-periods, diagonal VCV, and support-trimming are disclosed in the registry and should not block approval.
  • P0: the new “largest consecutive block” logic does not actually enforce presence of -1 / +1; if those horizons drop out after finite-SE filtering, HonestDiD can silently treat sparse horizons as boundary periods and return wrong bounds.
  • P1: fit(..., honest_did=True) silently no-ops when the estimator was constructed with placebo=False; the standalone compute_honest_did() path already errors on the same input.
  • The added tests miss both failure modes above.
  • Review is static only; I could not run the test suite in this sandbox because the available Python environment is missing project dependencies (numpy import fails).

Path to Approval

  1. In dCDH extraction, require a boundary-adjacent consecutive grid after finite-SE filtering (-1 must remain in pre, 1 must remain in post); otherwise raise a clear consecutive-grid error. Add regressions for “missing -1” and “missing +1”.
  2. Reject honest_did=True when the estimator was created with placebo=False (or always call the standalone path and surface its existing error/warning). Add a fit-level regression for that combination.
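Step 1 could look roughly like the following standalone sketch. The function name and exact behavior are assumptions about what `_largest_consecutive_block` in diff_diff/honest_did.py should do, not its actual code:

```python
def consecutive_block_with_boundary(horizons):
    """Return the consecutive pre/post runs containing the boundary
    horizons -1 (last pre) and +1 (first post), or raise.

    Illustrative sketch only; the real implementation and error text
    in diff_diff/honest_did.py may differ.
    """
    pre = sorted(h for h in horizons if h < 0)
    post = sorted(h for h in horizons if h > 0)
    if -1 not in pre or 1 not in post:
        # Do not silently re-label sparse horizons as boundary periods.
        raise ValueError(
            "HonestDiD requires boundary horizons -1 and +1 after "
            f"finite-SE filtering; got {sorted(horizons)}"
        )
    # Walk outward from each boundary, keeping only the consecutive run.
    keep_pre = [-1]
    for h in reversed(pre[:-1]):          # -2, -3, ... moving outward
        if h == keep_pre[-1] - 1:
            keep_pre.append(h)
        else:
            break
    keep_post = [1]
    for h in post[1:]:                    # 2, 3, ... moving outward
        if h == keep_post[-1] + 1:
            keep_post.append(h)
        else:
            break
    return sorted(keep_pre), keep_post
```

A gapped grid like `[-4, -2, -1, 1, 2, 3]` keeps `[-2, -1]` and `[1, 2, 3]` and drops the stranded `-4`, while a grid missing `-1` raises instead of silently renumbering.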

Methodology

  • P0 Impact: In diff_diff/honest_did.py:L884, _largest_consecutive_block() returns the full horizon list when the boundary value is missing. Downstream, the boundary smoothness and RM constructions in diff_diff/honest_did.py:L978 and diff_diff/honest_did.py:L1163 assume the retained coefficients are contiguous [-T,…,-1,1,…,T̄]. If finite-SE filtering leaves only [-2] or only [2,…], the code silently re-labels those as boundary periods and computes the wrong identified set / robust CI. Concrete fix: after filtering finite SEs, require a block that actually contains -1 and 1, or raise ValueError; do not implicitly renumber sparse horizons. This also means the new registry note in docs/methodology/REGISTRY.md:L620 is not implemented in the boundary-missing case. The reference HonestDiD implementation errors on gapped event-time grids instead of silently accepting them. (rdrr.io)
  • P1 Impact: diff_diff/chaisemartin_dhaultfoeuille.py:L954 validates honest_did only against L_max, and the actual integration at diff_diff/chaisemartin_dhaultfoeuille.py:L2412 is skipped whenever results.placebo_event_study is falsy. With ChaisemartinDHaultfoeuille(placebo=False), fit(..., honest_did=True, L_max>=1) silently returns with results.honest_did_results=None and no warning, even though the standalone path is expected to fail on missing placebos in tests/test_honest_did.py:L1393. Concrete fix: raise early when honest_did=True and not self.placebo, or call compute_honest_did() unconditionally and surface its existing error/warning consistently.
  • P3-informational Impact: The dCDH-specific methodology deviations themselves are properly documented in docs/methodology/REGISTRY.md:L620 and covered by runtime warnings/tests, so I am not treating those documented extensions as defects. Concrete fix: none.
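The P1 guard could be sketched as follows; the class and parameter names are illustrative stand-ins for the dCDH estimator's actual `fit()` signature, not verified against the codebase:

```python
# Hypothetical sketch of the suggested fit-level guard; not the real
# code in diff_diff/chaisemartin_dhaultfoeuille.py.
class DCDHEstimatorSketch:
    def __init__(self, placebo=True):
        self.placebo = placebo

    def fit(self, honest_did=False, L_max=0):
        if honest_did:
            if L_max < 1:
                raise ValueError("honest_did=True requires L_max >= 1")
            if not self.placebo:
                # Fail loudly instead of silently returning results
                # with honest_did_results=None.
                raise ValueError(
                    "honest_did=True requires placebo=True: the dCDH "
                    "HonestDiD path uses placebo estimates as "
                    "pre-period coefficients"
                )
        return "fitted"
```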

Code Quality

  • No material findings.

Performance

  • No material findings.

Maintainability

  • No material findings beyond the blocker above.

Tech Debt

  • P3-informational Impact: Neither blocker is tracked under TODO.md:L51, so they are unmitigated and count fully toward the assessment. Concrete fix: these need code changes; TODO tracking would not downgrade them under the stated review policy.

Security

  • No findings.

Documentation/Tests

…ocstrings

P0: _largest_consecutive_block now raises ValueError when boundary horizon
    (-1 or +1) is missing after finite-SE filtering instead of silently
    returning the full list (would produce wrong HonestDiD bounds).
P1: honest_did=True now rejects placebo=False early instead of silently
    returning honest_did_results=None with no warning.
P2: Added 3 regression tests (boundary -1 missing, boundary +1 missing,
    placebo=False + honest_did).
P3: Updated docstrings in honest_did.py (6 locations) and docs/llms.txt
    to include ChaisemartinDHaultfoeuilleResults alongside MultiPeriodDiD/CS.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@igerber
Owner Author

igerber commented Apr 14, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: ee6ad2f9f037c16aa7c36a74f9ac8118d7aff2a0


Overall Assessment
⚠️ Needs changes — the prior blocker findings are fixed, but there is one [Newly identified] unmitigated P1 on the new auto-HonestDiD default behavior.

Executive Summary

  • The prior P0/P1 issues are addressed: boundary-adjacent consecutiveness is now enforced in diff_diff/honest_did.py:L884-L915 with regressions in tests/test_honest_did.py:L1433-L1457, and honest_did=True now rejects placebo=False in diff_diff/chaisemartin_dhaultfoeuille.py:L954-L965 with fit-level coverage in tests/test_chaisemartin_dhaultfoeuille.py:L3278-L3285.
  • [Newly identified] fit(..., honest_did=True) hard-codes an implicit HonestDiD target parameter. In this codebase that means the equal-weight average over post horizons, but the new dCDH integration docs/registry/summary do not say that.
  • The explicit dCDH deviations called out in the PR body (placebo-based pre-periods, diagonal VCV, consecutive-block trimming) are documented in docs/methodology/REGISTRY.md:L620-L620 and are informational under the stated review policy.
  • The new tests materially improve coverage of the previous review, but they still do not pin the target parameter or summary label for the auto-HonestDiD block.
  • Static review only: pytest is not installed in this sandbox, so I could not execute the suite.

Path to Approval

  1. Decide what scalar estimand honest_did=True should target for dCDH: first post/on-impact effect, equal-weight post average, or the dCDH scalar summary.
  2. Encode that choice explicitly in code by passing l_vec from the fit-level integration (HonestDiD(..., l_vec=...), or extend compute_honest_did() to accept l_vec) instead of inheriting the current implicit default.
  3. Document the chosen target in docs/methodology/REGISTRY.md, update the fit() docstring and summary text to name it explicitly, and add a regression asserting results.honest_did_results.original_estimate equals that documented target.
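The candidate targets in step 1 differ only in the weight vector applied to the post-horizon effects, theta = l_vec' tau. A small sketch with made-up numbers (not library code):

```python
import numpy as np

# tau: hypothetical post-horizon event-study effects at horizons 1..3.
tau = np.array([2.0, 3.0, 4.0])

# R's HonestDiD default target: on-impact effect, first basis vector.
l_on_impact = np.array([1.0, 0.0, 0.0])

# Equal-weight average over post horizons (the implicit default at issue).
l_equal = np.full(3, 1.0 / 3.0)

theta_on_impact = l_on_impact @ tau   # -> 2.0
theta_equal = l_equal @ tau           # ~ 3.0
```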

Methodology

Code Quality

  • No material findings.

Performance

  • No material findings.

Maintainability

  • No material findings beyond the methodology issue above.

Tech Debt

  • No new tracked/mitigated tech-debt items change the assessment.

Security

  • No findings.

Documentation/Tests

  • P2 Impact: the new tests cover the prior boundary and placebo=False regressions, but they do not assert which scalar target honest_did=True is computing. The parity test in tests/test_chaisemartin_dhaultfoeuille.py:L3287-L3310 compares auto vs manual on the same implicit default, and the summary test in tests/test_chaisemartin_dhaultfoeuille.py:L3464-L3474 only checks section presence. Concrete fix: add a regression that checks results.honest_did_results.original_estimate against the documented l_vec target and that summary() prints that target clearly.

- Update fit() docstring to specify equal-weight average over post horizons
  (l_vec=None default) and note R's HonestDiD targets on-impact instead
- Update REGISTRY.md with l_vec deviation from R's default
- Add "Target: Equal-weight avg over post horizons" line to summary
- Add test_honest_did_original_estimate_is_post_average regression

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@igerber
Owner Author

igerber commented Apr 14, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: ad7698cd11a0291363dea21a8e266fa502f1e509


Overall Assessment

⚠️ Needs changes — the prior HonestDiD target-documentation issue is fixed, but one P1 [Newly identified] incomplete-parameter-propagation issue remains in the new dCDH HonestDiD public API/docs path.

Executive Summary

  • The prior re-review blocker on the auto-HonestDiD target is resolved: the PR now documents and tests that fit(..., honest_did=True) targets the equal-weight average over post-treatment horizons in diff_diff/chaisemartin_dhaultfoeuille.py:L540-L549, docs/methodology/REGISTRY.md:L620-L620, diff_diff/chaisemartin_dhaultfoeuille_results.py:L989-L991, and tests/test_chaisemartin_dhaultfoeuille.py:L3372-L3386.
  • P1 [Newly identified]: the new docs tell users to use compute_honest_did(results, ...) for custom l_vec targets, but the wrapper still does not expose l_vec, so the changed target-parameter behavior is not fully propagated through the advertised public API.
  • The dCDH-specific deviations from canonical HonestDiD are now documented and warned at runtime, so they are P3 informational under the stated review policy.
  • The new tests cover L_max, placebo=False, and boundary-adjacent consecutiveness failures, but they still do not exercise an end-to-end trends_nonparam + HonestDiD support-trimming case.
  • Static review only: pytest is not installed in this environment.

Path to Approval

  1. Make the standalone custom-target path truthful. Preferred: add l_vec to compute_honest_did() and forward it into HonestDiD(...). Minimum acceptable: remove the new docs/registry guidance that tells users to pass custom l_vec through compute_honest_did(...) and point them to HonestDiD(l_vec=...).fit(results) instead.
  2. Add a regression for the chosen public path using l_vec = [1, 0, ..., 0] and assert the returned original_estimate equals the on-impact effect.
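The preferred fix in step 1 amounts to forwarding `l_vec` through the wrapper. A minimal stand-in (the stub class below only illustrates the forwarding pattern; the real `HonestDiD` and `compute_honest_did` in diff_diff/honest_did.py have richer interfaces):

```python
class HonestDiD:
    """Minimal stand-in to illustrate parameter forwarding; the real
    class accepts and uses many more options."""
    def __init__(self, method="relative_magnitude", M=1.0,
                 alpha=0.05, l_vec=None):
        self.method, self.M, self.alpha, self.l_vec = method, M, alpha, l_vec

    def fit(self, results):
        return {"l_vec": self.l_vec, "method": self.method}


def compute_honest_did(results, method="relative_magnitude",
                       M=1.0, alpha=0.05, l_vec=None):
    # Forward l_vec so the advertised custom-target path actually
    # works, rather than silently falling back to the class default.
    return HonestDiD(method=method, M=M, alpha=alpha, l_vec=l_vec).fit(results)
```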

Methodology

  • P1 [Newly identified] Impact: the affected method is HonestDiD / Rambachan-Roth sensitivity analysis as applied to the dCDH placebo + event-study surface. HonestDiD itself exposes l_vec as the scalar target definition in diff_diff/honest_did.py:L2122-L2152, but compute_honest_did() only accepts method, M, and alpha in diff_diff/honest_did.py:L2667-L2698. The new dCDH fit docstring and registry note nevertheless tell users to use compute_honest_did(results, ...) for custom l_vec targets in diff_diff/chaisemartin_dhaultfoeuille.py:L540-L549 and docs/methodology/REGISTRY.md:L620-L620. HonestDiD’s reference interface defines the scalar estimand as theta = l_vec' tau, with l_vec defaulting to the first basis vector (1, 0, ..., 0), so l_vec propagation is methodologically load-bearing rather than cosmetic. Concrete fix: add l_vec to compute_honest_did() and forward it to HonestDiD, or change the new docs to point to HonestDiD(l_vec=...).fit(results) instead. (search.r-project.org)
  • P3-informational Impact: the dCDH-specific deviations themselves are properly documented and warned: placebo DID^{pl}_l coefficients are used as pre-period inputs, covariance is diagonal because no full dCDH cross-horizon VCV exists, and non-consecutive horizons from trends_nonparam support trimming are filtered to the largest consecutive block with a warning. Those are documented in docs/methodology/REGISTRY.md:L620-L620 and implemented in diff_diff/honest_did.py:L829-L839 and diff_diff/honest_did.py:L880-L934. Concrete fix: none.

Code Quality

  • No material findings in the changed code.

Performance

  • No material findings in the changed code.

Maintainability

  • No material findings beyond the API propagation issue above.

Tech Debt

Security

  • No findings.

Documentation/Tests

  • P2 Impact: the new tests prove the default equal-weight target and the boundary-adjacent failure cases, but they still do not cover the actual trends_nonparam support-trimming + HonestDiD integration path described in the new registry note. There is a trends_nonparam unequal-support test without HonestDiD in tests/test_chaisemartin_dhaultfoeuille.py:L2706-L2744, and there are HonestDiD consecutiveness tests that manually corrupt SEs in tests/test_honest_did.py:L1419-L1457, but no end-to-end fit(..., trends_nonparam=..., honest_did=True) regression. The new summary test also only checks section presence in tests/test_chaisemartin_dhaultfoeuille.py:L3480-L3490. Concrete fix: add an end-to-end support-trimming fixture that produces outer-horizon gaps, assert the warning plus retained block, and assert the HonestDiD summary target text stays consistent with the target actually used.

- Add l_vec parameter to compute_honest_did() so the advertised
  custom-target path actually works (was missing from wrapper)
- Add test_honest_did_custom_l_vec_on_impact: l_vec=[1,0] targets
  on-impact effect, asserts original_estimate matches DID_1
- Add test_honest_did_with_trends_nonparam: end-to-end
  trends_nonparam + honest_did=True integration

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@igerber
Owner Author

igerber commented Apr 14, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: a351e2e2f68a1bb60c8a3584107dc56fb56491dd


Overall Assessment

⚠️ Needs changes — the prior custom-target wrapper issue is fixed, but one P1 [Newly identified] parameter-propagation problem remains in the new dCDH HonestDiD reporting surface.

Executive Summary

Path to Approval

  1. Persist target metadata on HonestDiDResults (l_vec or a rendered target label) in diff_diff/honest_did.py:L2256-L2324, and render that metadata in diff_diff/chaisemartin_dhaultfoeuille_results.py:L974-L1000 instead of hard-coding the equal-weight target.
  2. Update the post-hoc docs in diff_diff/chaisemartin_dhaultfoeuille_results.py:L339-L343 to say compute_honest_did(results, ...) returns a standalone HonestDiDResults and only appears in results.summary() if assigned to results.honest_did_results.
  3. Add a regression that computes compute_honest_did(r, l_vec=np.array([1, 0, ...])), attaches it to r.honest_did_results, and asserts r.summary() reports the custom target consistently with bounds.original_estimate.
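The label detection that step 1 asks for could be sketched like this (field and function names follow the review's suggestions; the real logic would live in `HonestDiD.fit()`):

```python
import numpy as np
from dataclasses import dataclass


@dataclass
class HonestDiDResults:
    """Illustrative stand-in carrying a target label on the results."""
    original_estimate: float
    target_label: str = "Equal-weight avg over post horizons"


def label_for_l_vec(l_vec):
    """Map common l_vec patterns to a human-readable target label.
    Sketch only; thresholds and wording are assumptions."""
    l = np.asarray(l_vec, dtype=float)
    if l[0] == 1.0 and np.all(l[1:] == 0.0):
        return "On-impact effect (first post horizon)"
    if np.allclose(l, 1.0 / l.size):
        return "Equal-weight avg over post horizons"
    return f"Custom target l_vec={l.tolist()}"
```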

Methodology

  • Severity: P1 [Newly identified]. Impact: the changed public custom-target path is still not fully propagated through the results surface. compute_honest_did(..., l_vec=...) now correctly defines a custom scalar target in diff_diff/honest_did.py:L2667-L2705, and HonestDiD.fit() uses that l_vec in diff_diff/honest_did.py:L2256-L2265, but the returned HonestDiDResults drops the target metadata in diff_diff/honest_did.py:L2310-L2324, while the new dCDH summary helper always prints Equal-weight avg over post horizons in diff_diff/chaisemartin_dhaultfoeuille_results.py:L989-L991. Because canonical HonestDiD treats l_vec as the estimand definition and defaults it to the first post-treatment basis vector, this is a methodology-significant reporting mismatch, not a cosmetic summary issue. (search.r-project.org) Concrete fix: store and render target metadata (l_vec or a target label) on HonestDiDResults, and cover the custom-target summary path with a regression.
  • Severity: P3-informational. Impact: the dCDH extension uses DID^{pl}_l placebos as pre-period inputs, diagonal variance, and an equal-weight auto-target in fit(). Those differ from canonical HonestDiD’s event-study-coefficient/covariance interface, but they are explicitly documented in docs/methodology/REGISTRY.md:L620-L620 and warned in diff_diff/honest_did.py:L818-L934, so they are non-blocking under the review policy. (search.r-project.org) Concrete fix: none.

Code Quality

  • No material findings.

Performance

  • No material findings.

Maintainability

  • No separate findings beyond the target-metadata propagation issue above.

Tech Debt

Security

  • No findings.

Documentation/Tests

  • Severity: P2. Impact: the new tests cover standalone custom l_vec, placebo-input warnings, and manually induced missing-boundary errors, but they still do not exercise the actual trends_nonparam support-trimming path that produces non-consecutive horizons and triggers the new warning/filter logic. The unequal-support test in tests/test_chaisemartin_dhaultfoeuille.py:L2706-L2744 is non-HonestDiD, the HonestDiD trends_nonparam test in tests/test_chaisemartin_dhaultfoeuille.py:L3407-L3429 keeps balanced support, and the boundary tests in tests/test_honest_did.py:L1419-L1457 simulate gaps by mutating SEs rather than via the real trimming path. Concrete fix: add one end-to-end fixture where within-set support trimming drops outer horizons, then assert both the warning text and the retained consecutive pre/post block.

…namically

- Add target_label field to HonestDiDResults (default: equal-weight avg)
- HonestDiD.fit() detects common l_vec patterns and sets human-readable label
  (on-impact, equal-weight, or custom with vector)
- Summary renders hd.target_label instead of hard-coded string
- Add test_honest_did_custom_l_vec_summary_label: attaches custom-target
  results and asserts summary shows "on-impact" not "Equal-weight"

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@igerber
Owner Author

igerber commented Apr 14, 2026

/ai-review


@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 419d872c2da379516949f91ef22837dd072daccb


Overall Assessment

⚠️ Needs changes — the prior custom-l_vec reporting issue looks resolved, but one P1 [Newly identified] remains in the new trends_nonparam + HonestDiD edge case: horizon trimming can change the HonestDiD estimand without persisting which horizons were actually used.

Executive Summary

  • The previous re-review P1 is addressed: custom l_vec now flows into stored HonestDiD metadata and the dCDH summary block renders hd.target_label instead of hard-coding the equal-weight target. See diff_diff/honest_did.py:194, diff_diff/honest_did.py:2256, diff_diff/honest_did.py:2332, diff_diff/chaisemartin_dhaultfoeuille_results.py:990, and tests/test_chaisemartin_dhaultfoeuille.py:3407.
  • P1 [Newly identified]: when dCDH HonestDiD drops non-consecutive horizons to form a consecutive block, the bounds are computed on the retained post-period subset, but the persisted results surface does not record that retained subset. In trimmed-support cases, the public summary can still present a generic “Equal-weight avg over post horizons,” which no longer uniquely identifies the estimand.
  • The dCDH-specific departures from canonical HonestDiD are documented in the Methodology Registry, so those deviations are non-blocking P3s.
  • The earlier test-gap concern is still open as a P2: the suite does not drive the real trends_nonparam support-trimming path through honest_did=True and assert the retained block metadata/warning end to end.
  • Static review only: this environment does not have pytest or core runtime deps like pandas, so I could not execute the new tests.

Path to Approval

  1. Persist the retained horizon set used by HonestDiD, at minimum post_periods_used and ideally both pre/post vectors, on HonestDiDResults.
  2. Render that retained block in both the standalone HonestDiD summary and the dCDH summary block; for l_vec=None, label the target against the retained block explicitly instead of the current generic string.
  3. Add an end-to-end regression where trends_nonparam support-trimming removes an outer horizon under honest_did=True, then assert both the warning text and the persisted target metadata/summary reflect the retained block.
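Steps 1 and 2 amount to persisting the retained horizon set on the results object and rendering it. A sketch under the review's suggested field names (`pre_periods_used` / `post_periods_used`), not the library's actual class:

```python
from dataclasses import dataclass, field


@dataclass
class HonestDiDResults:
    """Sketch of persisting the retained horizon set; field names follow
    the review's suggestion, not diff_diff/honest_did.py."""
    lower_bound: float
    upper_bound: float
    pre_periods_used: list = field(default_factory=list)
    post_periods_used: list = field(default_factory=list)

    def summary(self):
        lines = [f"Bounds: [{self.lower_bound:.3f}, {self.upper_bound:.3f}]"]
        # Render the retained block so a trimmed estimand stays
        # unambiguous even when horizons were dropped upstream.
        lines.append(f"Pre horizons used:  {self.pre_periods_used}")
        lines.append(f"Post horizons used: {self.post_periods_used}")
        return "\n".join(lines)
```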

Methodology

  • Severity P1 [Newly identified]. Impact: _extract_event_study_params() now drops non-consecutive dCDH horizons and keeps only the consecutive block around -1/+1; HonestDiD.fit() then rebuilds l_vec against that shorter post-period vector. But the returned object stores only a generic target_label, the dCDH summary renders only that generic label, and the standalone HonestDiDResults.summary()/to_dict()/to_dataframe() surfaces still do not expose which post horizons were actually used. In a trends_nonparam support-trimming case, users can therefore read correct bounds under an under-specified estimand label. See diff_diff/honest_did.py:821, diff_diff/honest_did.py:925, diff_diff/honest_did.py:2256, diff_diff/honest_did.py:2321, diff_diff/honest_did.py:252, diff_diff/honest_did.py:335, and diff_diff/chaisemartin_dhaultfoeuille_results.py:990. Official HonestDiD defines the scalar target as theta = l_vec'tau over the post-treatment coefficients, with l_vec itself defining the parameter of interest, so changing the retained post-period set changes the estimand, not just the display label. Concrete fix: store and render the retained horizon vector(s) used by HonestDiD, and include them in summary/export surfaces. (search.r-project.org)
  • Severity P3-informational. Impact: the dCDH branch uses placebo estimates as pre-period inputs, diagonal covariance, an equal-weight default target in fit(), and consecutive-block filtering when trends_nonparam creates gaps. Those differ from canonical HonestDiD’s event-study coefficient/covariance interface, but they are explicitly documented in docs/methodology/REGISTRY.md:620, so they are non-blocking under the stated policy. Concrete fix: none. The paper and official package both frame HonestDiD as robust inference for DiD/event-study coefficients under restrictions on post-treatment violations relative to pre-trends; the library is allowed to extend that interface as long as the extension is documented. (academic.oup.com)

Code Quality

No material findings.

Performance

No material findings.

Maintainability

No separate findings beyond the retained-horizon metadata propagation issue above.

Tech Debt

  • Severity P3-informational. Impact: the PR’s default dCDH HonestDiD path inherits the pre-existing Delta^RM confidence-set approximation in honest_did.py. The official HonestDiD RM interface constructs confidence sets through test inversion and defaults to method="C-LF", while this library’s RM path still uses the documented naive FLCI approximation. That limitation is already documented in docs/methodology/REGISTRY.md:2171 and tracked in TODO.md:73, so it is non-blocking for this PR. Concrete fix: none for this PR. (search.r-project.org)

Security

No findings.

Documentation/Tests

  • Severity P2. Impact: the suite still does not exercise the real end-to-end support-trimming path that the new dCDH HonestDiD logic is designed to handle. The unequal-support trends_nonparam test stops before HonestDiD in tests/test_chaisemartin_dhaultfoeuille.py:2706, the honest_did=True test with trends_nonparam uses a balanced design in tests/test_chaisemartin_dhaultfoeuille.py:3426, and the boundary tests in tests/test_honest_did.py:1419 simulate missing horizons by mutating SEs rather than via the real support-trimming path. Concrete fix: add one fixture where within-set support trimming drops an outer placebo or event-study horizon under honest_did=True, then assert both the warning text and the persisted retained-horizon metadata.

- Add pre_periods_used and post_periods_used fields to HonestDiDResults
  so the retained horizon set is always available on the results object
- HonestDiD.fit() populates both fields from the extracted period lists
- Summary renders retained horizons below the target label
- Add test_honest_did_retains_period_metadata asserting fields populated
  and summary shows "Post horizons used:"

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@igerber
Owner Author

igerber commented Apr 14, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 33e9c678dc2754341843a75a9df7865e8c534922


Overall Assessment

⚠️ Needs changes — the prior retained-horizon issue is only partially resolved. The PR now stores the retained pre/post horizons and renders them in the dCDH top-level summary, but the standalone HonestDiDResults summary/export surfaces still drop that estimand metadata, so direct compute_honest_did(...) output remains under-specified when trends_nonparam trimming changes the retained block or a custom l_vec changes the target.

Executive Summary

  • The previous P1 is partially addressed: HonestDiDResults now carries target_label, pre_periods_used, and post_periods_used, and the dCDH summary() block prints them.
  • One P1 remains: standalone HonestDiDResults.summary(), to_dict(), and to_dataframe() still omit that metadata, so printed/serialized HonestDiD output does not uniquely identify the target after trimming or custom l_vec.
  • The dCDH-specific departures from canonical HonestDiD are documented in the Methodology Registry, so those are P3-informational rather than defects.
  • The new tests still do not exercise the real unequal-support trends_nonparam path through HonestDiD end to end.
  • Static review only: this environment lacks pytest, numpy, and pandas, so I could not run the changed tests.

Path to Approval

  1. Add target_label, pre_periods_used, and post_periods_used to the public standalone HonestDiDResults surfaces: summary(), to_dict(), and therefore to_dataframe().
  2. Add an end-to-end regression with real unequal-support trends_nonparam trimming under honest_did=True or compute_honest_did(...), and assert both the warning text and the retained-horizon metadata on the standalone HonestDiD output.

Methodology

  • Severity P1. Impact: Rambachan-Roth / official HonestDiD define the scalar target as theta = l_vec' tau, with l_vec determining the parameter of interest and the package defaulting to the first post-period basis vector. This PR now records target_label, pre_periods_used, and post_periods_used when constructing HonestDiDResults, and the dCDH top-level summary renders them, but standalone HonestDiDResults.summary() and to_dict()/to_dataframe() still omit them. As a result, standalone compute_honest_did(...) output still loses the estimand definition exactly in the cases this PR adds/supports: support trimming and custom l_vec. See diff_diff/honest_did.py:L2323-L2339, diff_diff/chaisemartin_dhaultfoeuille_results.py:L980-L1009, and diff_diff/honest_did.py:L254-L357. Concrete fix: surface target_label, pre_periods_used, and post_periods_used in standalone HonestDiD summary/export methods and add a regression that verifies those fields after trimmed-support extraction and custom l_vec. (search.r-project.org)
  • Severity P3. Impact: The dCDH-specific departures from canonical HonestDiD are all explicitly documented in the Methodology Registry: placebo estimates as pre-period inputs, diagonal covariance, equal-weight default when fit(honest_did=True) is used, and consecutive-block filtering under trends_nonparam. Per the review policy, I am treating those as non-blocking documented deviations. See docs/methodology/REGISTRY.md:L620-L620. Concrete fix: none.
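One way to keep `to_dict()` (and therefore `to_dataframe()`) in lockstep with new metadata fields is to serialize straight from the dataclass. A sketch, assuming the results class is (or could be) a dataclass; this is not the library's actual serialization code:

```python
from dataclasses import dataclass, field, asdict


@dataclass
class HonestDiDResults:
    """Illustrative stand-in; real fields in diff_diff/honest_did.py differ."""
    lower_bound: float
    upper_bound: float
    target_label: str = "Equal-weight avg over post horizons"
    pre_periods_used: list = field(default_factory=list)
    post_periods_used: list = field(default_factory=list)

    def to_dict(self):
        # asdict() serializes every declared field, so newly added
        # metadata cannot silently drop out of the export surface.
        return asdict(self)
```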

Code Quality

  • No material findings in the changed code.

Performance

  • No material findings in the changed code.

Maintainability

  • No separate findings beyond the metadata-propagation issue above.

Tech Debt

  • Severity P3. Impact: the PR newly exposes dCDH users to the pre-existing Delta^RM naive-FLCI limitation, but that limitation is already documented in the Methodology Registry and tracked in TODO.md, so it is non-blocking under the stated policy. See docs/methodology/REGISTRY.md:L2171-L2173 and TODO.md:L73-L73. Concrete fix: none for this PR.

Security

  • No findings.

Documentation/Tests

…mary/export

- HonestDiDResults.summary() now renders target_label, pre/post_periods_used
- HonestDiDResults.to_dict() includes target_label, pre/post_periods_used
- to_dataframe() inherits from to_dict() automatically
- Add test_dcdh_standalone_surfaces_target_metadata verifying all three
  surfaces include target metadata for custom l_vec

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@igerber commented Apr 14, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: dd06b3bbfd2d68bfc356caba1cc704a662cbcec0


Overall Assessment

Looks good. This re-review resolves the prior P1 on missing standalone HonestDiD estimand metadata; the highest remaining unmitigated item is P2 test coverage.

Executive Summary

Methodology

  • Severity P3. Impact: No unmitigated methodology defect remains in the changed estimator path. Rambachan-Roth frame HonestDiD as a sensitivity-analysis / partial-identification framework, and the official R helpers define the scalar target as theta = l_vec' tau with l_vec defaulting to the first post-treatment basis vector; this PR’s dCDH branch intentionally departs by using DID^{pl}_l placebos as pre-period inputs and an equal-weight default target, but those departures are now explicitly documented in docs/methodology/REGISTRY.md:L620-L620 and surfaced in runtime extraction/warnings at diff_diff/honest_did.py:L843-L946. Concrete fix: none. (academic.oup.com)

Code Quality

  • No findings in the changed code.

Performance

  • No findings in the changed code.

Maintainability

  • No findings in the changed code. The summary() refactor into helper blocks is a net improvement.

Tech Debt

  • Severity P3. Impact: The pre-existing Delta^RM naive-FLCI limitation remains tracked in TODO.md:L73-L73, and the methodology registry still marks it as a documented deviation rather than a new regression in this PR (docs/methodology/REGISTRY.md:L2171-L2173). Concrete fix: none for this PR.

Security

  • No findings.

Documentation/Tests

  • Severity P2. Impact: The new tests cover unequal-support trends_nonparam without HonestDiD, balanced trends_nonparam with HonestDiD, and synthetic boundary failures created by mutating SEs, but they still do not run the real support-trimmed trends_nonparam path all the way through HonestDiD and assert the warning plus pre_periods_used/post_periods_used on the resulting standalone or embedded HonestDiD output (tests/test_chaisemartin_dhaultfoeuille.py:L2706-L2744, tests/test_chaisemartin_dhaultfoeuille.py:L3444-L3466, tests/test_honest_did.py:L1419-L1460). Concrete fix: add one end-to-end fixture where trends_nonparam removes at least one non-boundary horizon but retains -1 and +1, then run honest_did=True and/or compute_honest_did(results) and assert both the drop-warning text and the retained-horizon metadata on HonestDiDResults.

test_honest_did_trends_nonparam_trimming: State B's early-switching
controls vanish at far horizons, causing N_l=0 at h=3 and h=-3.
HonestDiD extraction drops NaN-SE horizons and retains [-2,-1,1,2].
Asserts n_obs=0 at trimmed horizons, finite bounds on retained block,
and post_periods_used excludes h=3.

This exercises the real trends_nonparam support-trimming path through
HonestDiD (not SE mutation), addressing the recurring P2 finding.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@igerber commented Apr 14, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 4943910e66b1d878d169e74b63d8ebc8da9fd753


Overall Assessment

⛔ Blocker — one unmitigated P0 remains. The new fit-time HonestDiD integration silently ignores a non-default estimator alpha, so fit(..., honest_did=True) can report the wrong robust CI/significance level whenever ChaisemartinDHaultfoeuille(alpha != 0.05) is used.

Executive Summary

Methodology

  • Severity P0. Impact: fit(..., honest_did=True) omits alpha=self.alpha when it constructs results.honest_did_results (diff_diff/chaisemartin_dhaultfoeuille.py:2425). compute_honest_did() defaults to alpha=0.05 (diff_diff/honest_did.py:2698), and the dCDH summary prints hd.alpha directly (diff_diff/chaisemartin_dhaultfoeuille_results.py:983). That means a user fitting ChaisemartinDHaultfoeuille(alpha=0.10, honest_did=True) will silently get 95% HonestDiD intervals/significance instead of 90%. Concrete fix: pass alpha=self.alpha into the fit-time compute_honest_did(...) call and add a regression test with alpha=0.10 asserting results.honest_did_results.alpha == results.alpha and narrower robust CIs than at alpha=0.05.
  • Severity P3. Impact: The dCDH-specific deviations from canonical HonestDiD are now documented and match the implementation: placebo-based pre-period coefficients (diff_diff/honest_did.py:843), diagonal covariance with no full VCV (diff_diff/honest_did.py:867), equal-weight post-treatment default target (diff_diff/honest_did.py:2269), and consecutive-block trimming with warning (diff_diff/honest_did.py:894), all reflected in docs/methodology/REGISTRY.md:620. Concrete fix: none. Rambachan-Roth frame HonestDiD as robust sensitivity analysis for a partially identified scalar target under bounded parallel-trends violations, and the official HonestDiD package uses l_vec to switch away from its default first post-treatment effect, so these documented dCDH extensions are not methodology defects. (academic.oup.com)
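The default-alpha pitfall in the P0 above, and its fix, can be sketched in a few lines. The names mirror the diff_diff API discussed in this review, but the bodies are illustrative stand-ins, not the library's implementation:

```python
# Sketch of the alpha-propagation defect: a fit-time call that omits alpha
# silently inherits the standalone default of 0.05. The fix forwards the
# estimator-level alpha explicitly.

def compute_honest_did(results, alpha=0.05):
    # Standalone entry point: defaults to 95% intervals.
    return {"alpha": alpha, "ci_level": 1 - alpha}

class ChaisemartinDHaultfoeuille:
    def __init__(self, alpha=0.05):
        self.alpha = alpha

    def fit(self, results, honest_did=False):
        if honest_did:
            # Defect: compute_honest_did(results) here would fall back to
            # alpha=0.05. Fix: pass alpha=self.alpha through.
            results["honest_did_results"] = compute_honest_did(
                results, alpha=self.alpha
            )
        return results

est = ChaisemartinDHaultfoeuille(alpha=0.10)
out = est.fit({}, honest_did=True)
assert out["honest_did_results"]["alpha"] == 0.10  # 90% CI, not 95%
```

A regression test then asserts `results.honest_did_results.alpha == results.alpha`, exactly as the concrete fix prescribes.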

Code Quality

No additional findings beyond the alpha propagation defect above.

Performance

No findings.

Maintainability

No findings. The summary() split into helper blocks is a net improvement.

Tech Debt

  • Severity P3. Impact: the pre-existing HonestDiD Delta^RM naive-FLCI limitation remains tracked in TODO.md:73, so it is not a blocker for this PR. Concrete fix: none for this PR.

Security

No findings.

Documentation/Tests

  • Severity P1. Impact: the new honest_did parameter is not exercised on the bootstrap inference path. fit() rewrites per-horizon SEs in event_study_effects and placebo_event_study from bootstrap outputs (diff_diff/chaisemartin_dhaultfoeuille.py:1905, diff_diff/chaisemartin_dhaultfoeuille.py:2108), and _extract_event_study_params() consumes exactly those se fields for HonestDiD (diff_diff/honest_did.py:867). But the new test matrices only cover the analytical path (tests/test_chaisemartin_dhaultfoeuille.py:3253, tests/test_honest_did.py:1343). Under the stated parameter-interaction rule, that is still a P1. Concrete fix: add one fit(..., honest_did=True, n_bootstrap>0) test and one standalone compute_honest_did() test on a bootstrap-fitted dCDH result, asserting finite bounds and stable retained-horizon metadata.
  • Severity P2. Impact: the old support-trimming coverage gap is only partially resolved. The new end-to-end fixture reaches the real trends_nonparam trimming path and checks retained horizons, but it suppresses warnings with simplefilter("ignore") instead of asserting the documented warning text (tests/test_chaisemartin_dhaultfoeuille.py:3503). A regression that drops the promised “Dropping non-consecutive horizons …” warning from diff_diff/honest_did.py:938 would still pass. Concrete fix: capture warnings in that test and assert the warning text together with pre_periods_used / post_periods_used.
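The warning-capture pattern the P2 fix calls for can be sketched with the standard library alone; `_extract_event_study_params` below is a hypothetical stand-in for the real extraction step, not the diff_diff function:

```python
# Sketch: record warnings instead of suppressing them, then assert on the
# documented warning text. The trimming logic here is an illustrative
# stand-in that keeps the boundary-adjacent horizons and warns on the rest.
import warnings

def _extract_event_study_params(horizons):
    kept = [h for h in horizons if abs(h) <= 2]
    dropped = sorted(set(horizons) - set(kept))
    if dropped:
        warnings.warn(f"Dropping non-consecutive horizons {dropped}")
    return kept

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")   # capture, do not simplefilter("ignore")
    kept = _extract_event_study_params([-3, -2, -1, 1, 2, 3])

messages = [str(w.message) for w in caught]
assert any("Dropping non-consecutive horizons" in m for m in messages)
assert kept == [-2, -1, 1, 2]
```

Asserting on the recorded message text is what makes a regression that drops the promised warning fail, which `simplefilter("ignore")` cannot do.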

Path to Approval

  1. Propagate alpha=self.alpha in the fit-time HonestDiD call at diff_diff/chaisemartin_dhaultfoeuille.py:2425, and add a non-default-alpha regression test.
  2. Add bootstrap interaction tests for both fit(..., honest_did=True) and standalone compute_honest_did() on bootstrap-fitted dCDH results.
  3. Update test_honest_did_trends_nonparam_trimming to record and assert the documented trimming warning instead of suppressing it.

compute_honest_did() call in fit() now passes alpha=self.alpha
instead of inheriting the default 0.05. Added regression test
asserting honest_did_results.alpha matches estimator alpha.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@igerber commented Apr 14, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 46e6dc50428135cca7dda8f9199b726106bc4b5b


Overall Assessment

⚠️ Needs changes — one unmitigated P1 remains. The prior implementation defects are addressed, but the new honest_did surface is still not tested on the bootstrap inference path.

Executive Summary

Methodology

Affected method: HonestDiD (Rambachan-Roth 2023) applied to the dCDH placebo + event-study surface. The new summary() blocks are presentation-only.

  • Severity P3. Impact: The new dCDH path deviates from canonical event-study HonestDiD by using dCDH placebo coefficients as pre-period inputs, using diagonal covariance, and defaulting fit-time calls to an equal-weight post-treatment target. Those departures are explicitly documented in docs/methodology/REGISTRY.md:L620 and mirrored in the runtime warning at diff_diff/honest_did.py:L843-L853, so under the review rubric they are informational, not defects. Rambachan-Roth frame HonestDiD as inference for a partially identified causal parameter under restrictions on post-treatment violations, and the official HonestDiD package exposes the scalar target as theta = l_vec' tau with a first-basis-vector default; the library's equal-weight target is therefore a documented target-choice deviation rather than a methodology error. Concrete fix: none.

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • Severity P3. Impact: The pre-existing HonestDiD Delta^RM naive-FLCI limitation remains tracked in TODO.md:L73, so it is not a blocker for this PR. Concrete fix: none for this PR.

Security

  • No findings.

Documentation/Tests

Path to Approval

  1. Add a fit-time bootstrap regression for ChaisemartinDHaultfoeuille(..., n_bootstrap>0).fit(..., L_max>=2, honest_did=True) that asserts results.honest_did_results is populated, its CI endpoints are finite, and its retained horizon metadata matches the post-bootstrap finite-SE surface.
  2. Add a standalone bootstrap regression for compute_honest_did() on a bootstrap-fitted dCDH result that exercises the same SE-consumption path and verifies stable bounds/metadata.

P1: Add test_honest_did_with_bootstrap - fits with n_bootstrap=49 and
    honest_did=True, asserts finite bounds and retained horizons.
P2: test_honest_did_trends_nonparam_trimming now captures warnings and
    asserts the placebo-based pre-period warning is emitted (was
    suppressing all warnings before).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@igerber commented Apr 14, 2026

/ai-review

@igerber commented Apr 14, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 820c9dd0a1302590153bf091434956511ffec6b5


Overall Assessment

✅ Looks good — the prior bootstrap/HonestDiD interaction P1 is addressed, and I did not find any new unmitigated P0/P1 issues in the changed estimator, inference, or default-behavior paths. One non-blocking P2 test gap remains.

Executive Summary

Methodology

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

  • Severity P3 | Impact: honest_did=True defaults to method="relative_magnitude", whose CI path still uses the pre-existing naive-FLCI implementation documented in the HonestDiD registry and already tracked in TODO.md. That limitation predates this PR and is properly tracked, so it is non-blocking here. Concrete fix: none for this PR. docs/methodology/REGISTRY.md:L2171-L2173 TODO.md:L73

Security

No findings.

Documentation/Tests

  • Severity P2 | Impact: The remaining re-review gap is test-specific. tests/test_chaisemartin_dhaultfoeuille.py:L3515-L3537 records warnings but only asserts the placebo-input warning, even though production code also promises a distinct "Dropping non-consecutive horizons" warning when it trims to the boundary-adjacent consecutive block at diff_diff/honest_did.py:L934-L946. A regression that removes or weakens that warning would still pass. Concrete fix: in test_honest_did_trends_nonparam_trimming, additionally assert a warning containing "Dropping non-consecutive horizons" and ideally the retained pre= / post= metadata.

Static only: I could not run the changed tests in this environment because the sandbox lacks pytest and numpy.

@igerber commented Apr 14, 2026

/ai-review

Add test_dcdh_interior_gap_triggers_trimming_warning: corrupts h=-2 SE
to create an interior gap [-3, -1], asserts the "Dropping non-consecutive
horizons" warning is emitted and pre_periods_used == [-1].

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
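The boundary-adjacent trimming this test exercises can be sketched as a small pure function; the logic below is an illustrative reconstruction of the documented behavior (keep only the consecutive pre-period block ending at h = -1), not the library's code:

```python
# Sketch: with an interior gap in the pre-period horizons (e.g. [-3, -1]
# after h=-2 is dropped), keep only the consecutive block adjacent to -1.
def trim_to_boundary_block(pre_horizons):
    kept = []
    expected = -1                  # pre-period block must end at h = -1
    for h in sorted(pre_horizons, reverse=True):
        if h == expected:
            kept.append(h)
            expected -= 1
        else:
            break                  # interior gap: drop everything earlier
    return sorted(kept)

assert trim_to_boundary_block([-3, -1]) == [-1]   # gap at -2 -> keep [-1]
assert trim_to_boundary_block([-3, -2, -1]) == [-3, -2, -1]
```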
@igerber commented Apr 14, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: b88a97e7d39bb75a3cc8ed00ee24ea96aa48e970


Overall Assessment

✅ Looks good

No unmitigated P0 or P1 findings. The prior re-review gap around non-consecutive-horizon warning coverage is addressed, and I did not find new correctness, methodology, variance/SE, or assumption-check regressions in the changed dCDH HonestDiD path.

Executive Summary

  • The changed methodology-sensitive paths are the new fit(..., honest_did=True) integration, the dCDH adapter inside HonestDiD extraction, and the new summary rendering blocks. I did not find new unmitigated P0/P1 defects in those paths.
  • The dCDH-specific HonestDiD departures are explicitly documented in the Methodology Registry and mirrored in code: placebo coefficients as pre-period inputs, diagonal covariance, equal-weight post-horizon targeting by default, and consecutive-block trimming for trends_nonparam gaps.
  • The prior re-review test gap is resolved by the new explicit warning assertion in tests/test_honest_did.py:L1449-L1470.
  • The end-to-end fit path is also covered for validation, standalone parity, trimming, bootstrap, and summary rendering in tests/test_chaisemartin_dhaultfoeuille.py:L3253-L3311, tests/test_chaisemartin_dhaultfoeuille.py:L3515-L3550, and tests/test_chaisemartin_dhaultfoeuille.py:L3558-L3654.
  • Static review only: pytest is not installed in this environment, so I could not execute the targeted tests.
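The diagonal-covariance deviation noted above amounts to substituting diag(SE^2) for the unavailable full cross-horizon VCV. A minimal sketch with illustrative numbers (the variable names are not from the library):

```python
# Sketch of the documented deviation: with no full cross-horizon VCV for
# dCDH, the sensitivity analysis uses a diagonal matrix of squared SEs.
import numpy as np

se = np.array([0.11, 0.09, 0.12, 0.15])   # per-horizon standard errors
sigma = np.diag(se ** 2)                  # stand-in for the full VCV

assert sigma.shape == (4, 4)
assert np.allclose(np.diag(sigma), se ** 2)
# All cross-horizon covariances are forced to zero:
assert np.count_nonzero(sigma - np.diag(np.diag(sigma))) == 0
```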

Methodology

  • Severity P3 | Impact: HonestDiD is defined around an event-study coefficient vector and covariance matrix for a scalar target l_vec' tau; the official package defaults l_vec to the first post-treatment effect and exposes conditional/hybridized CI routines for non-smoothness restrictions. This PR’s dCDH integration instead uses DID^{pl}_l placebos as pre-period coefficients, diagonal covariance, and an equal-weight post-horizon target when called from fit(). Those are real deviations from canonical/package behavior, but they are explicitly documented in docs/methodology/REGISTRY.md:L620, warned in diff_diff/honest_did.py:L843-L946, and described in diff_diff/chaisemartin_dhaultfoeuille.py:L540-L549. Concrete fix: none; documented deviation only.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

  • Severity P3 | Impact: The DeltaRM CI path still uses naive FLCI rather than the paper/package’s ARP conditional or hybrid confidence sets. That limitation predates this PR, is conservative rather than silently anti-conservative, and is already documented/tracked in docs/methodology/REGISTRY.md:L2171-L2173, diff_diff/honest_did.py:L2397-L2437, and TODO.md:L73. Concrete fix: none for this PR; already tracked.

Security

No findings.

Documentation/Tests

No findings. The previous warning-coverage gap is resolved by tests/test_honest_did.py:L1449-L1470, and the fit-time integration path now has direct coverage for validation, standalone parity, trimming, bootstrap, and summary rendering in tests/test_chaisemartin_dhaultfoeuille.py:L3253-L3311, tests/test_chaisemartin_dhaultfoeuille.py:L3515-L3550, and tests/test_chaisemartin_dhaultfoeuille.py:L3558-L3654.

@igerber commented Apr 14, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: b88a97e7d39bb75a3cc8ed00ee24ea96aa48e970


Overall Assessment

✅ Looks good

No unmitigated P0/P1 findings in the changed dCDH HonestDiD or summary() paths. The methodology-sensitive departures are documented in the Methodology Registry, the new fit-time guards cover the key preconditions, and the re-review warning/trimming paths now have direct test coverage. Static review only: pytest is not installed in this environment.

Executive Summary

Methodology

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

  • Severity P3 | Impact: Delta^RM still uses naive FLCI instead of the paper/package’s ARP conditional or hybrid CI. That limitation predates this PR, is conservative rather than silently anti-conservative, and is already documented/tracked in docs/methodology/REGISTRY.md:2171, diff_diff/honest_did.py:2417, and TODO.md:73. Concrete fix: none for this PR; already tracked.

Security

No findings.

Documentation/Tests

No findings. The new tests cover fit-time HonestDiD integration, precondition errors, placebo-warning emission, consecutive-grid trimming, and the new summary blocks at tests/test_chaisemartin_dhaultfoeuille.py:3253, tests/test_chaisemartin_dhaultfoeuille.py:3515, tests/test_chaisemartin_dhaultfoeuille.py:3558, tests/test_honest_did.py:1402, and tests/test_honest_did.py:1449. I could not run them here because pytest is not installed.

@igerber igerber added the ready-for-ci Triggers CI test workflows label Apr 14, 2026
@igerber igerber merged commit e92b6f7 into main Apr 14, 2026
23 of 24 checks passed
@igerber igerber deleted the feature/dcdh-honest-did-integration branch April 14, 2026 21:37
@igerber igerber mentioned this pull request Apr 14, 2026