[hansen_singleton] Replace pandas-datareader with direct FRED / Fama-French downloads#925
mmcky wants to merge 3 commits into
…ench fetch

pandas-datareader 0.10.0 (last released in 2021, unmaintained) breaks at import under pandas 3.0 -- it relies on pandas' private deprecate_kwarg, whose signature changed -- so hansen_singleton_1982 and hansen_singleton_1983 fail to execute under anaconda 2026.06 (see #923). There is no pandas-3.0-compatible pandas-datareader release to pin to.

Replace the two web.DataReader calls with small direct downloads that use only the standard library + pandas:
- FRED: pd.read_csv from the fredgraph.csv endpoint
- Fama-French: parse the F-F_Research_Data_Factors zip from the Ken French data library

Since no extra package is needed, the in-notebook `!pip install pandas-datareader` cell and the now-dead date_parser warnings filter are removed too.

Verified that the new fetch returns byte-identical FRED and Fama-French data to the old pandas-datareader path on pandas 2.3.3, and that the full data construction runs clean with FutureWarning/DeprecationWarning promoted to errors (i.e. pandas-3.0 safe).

Closes #924

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
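For reference, a minimal sketch of the direct FRED path this commit message describes. The helper names `fred_csv_url` and `parse_fred_csv` are hypothetical, the `id` query parameter and the `.` missing-value marker are assumptions about the fredgraph.csv endpoint, and the sample text is made up so the sketch runs offline.

```python
import io
from urllib.parse import urlencode

import pandas as pd

# Assumed endpoint URL: the commit message only names "fredgraph.csv".
FRED_CSV_ENDPOINT = "https://fred.stlouisfed.org/graph/fredgraph.csv"


def fred_csv_url(series_id: str) -> str:
    """Build the download URL for one series (hypothetical helper)."""
    return f"{FRED_CSV_ENDPOINT}?{urlencode({'id': series_id})}"


def parse_fred_csv(source) -> pd.Series:
    """Parse a two-column date/value CSV into a float Series.

    `source` can be a URL, a path, or a file-like object. Using
    index_col=0 avoids depending on the name of the date column.
    """
    frame = pd.read_csv(source, index_col=0, parse_dates=True, na_values=".")
    return frame.iloc[:, 0].astype(float)


# Made-up sample in the assumed layout, so nothing is fetched here.
sample = "observation_date,EXAMPLE\n2020-01-01,1.5\n2020-02-01,.\n2020-03-01,2.5\n"
series = parse_fred_csv(io.StringIO(sample))
```

Passing `fred_csv_url(...)` to `parse_fred_csv` instead of the `StringIO` buffer would hit the live endpoint, which is the build-time dependency the later commits move away from.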
Pull request overview
This PR updates the Hansen–Singleton 1982/1983 lecture notebooks to remove the runtime dependency on pandas-datareader (which is incompatible with pandas 3.0), replacing it with direct downloads from FRED (CSV endpoint) and the Ken French data library (zip + CSV parsing) using only the standard library and pandas.
Changes:
- Removed the in-notebook `!pip install pandas-datareader` and the `pandas_datareader` import usage.
- Added small in-notebook helpers to download/parse FRED series and monthly Fama–French factors directly.
- Updated lecture text to reflect the new data sources (FRED + Ken French) and the direct-download approach.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| lectures/hansen_singleton_1982.md | Replaces pandas-datareader-based FRED/Fama–French fetching with direct downloads and parsing. |
| lectures/hansen_singleton_1983.md | Same migration as 1982 lecture, keeping the constructed estimation dataset consistent while avoiding pandas 3.0 breakage. |
…line
Switch both lectures to read the pre-built monthly CSV from
_static/lecture_specific/hansen_singleton_198{2,3}/ (added in PR #926) via its
raw GitHub URL, replacing the inline FRED / Fama-French download helpers from
the previous commit. The data construction now lives in the per-lecture
make_data.py maintenance scripts; the lectures just read the frozen snapshot.
This keeps the build reproducible and off the live data endpoints, and still
removes the pandas-datareader dependency that breaks under pandas 3.0.
Depends on PR #926 (must land on main first so the raw URL resolves).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
📖 Netlify Preview Ready!
Preview URL: https://pr-925--sunny-cactus-210e3e.netlify.app
    frame = pd.read_csv(DATA_URL, index_col=0, parse_dates=True)
    start = pd.Timestamp(start).to_period("M").to_timestamp("M")
    end = pd.Timestamp(end).to_period("M").to_timestamp("M")
    return frame.loc[start:end]
Fixed in 135e0dd — _data = pd.read_csv(DATA_URL, ...) is read once at cell scope and load_hs_monthly_data slices a copy of it. Verified the CSV is now fetched only once even though both get_estimation_data and get_tbill_estimation_data call it.
    frame = pd.read_csv(DATA_URL, index_col=0, parse_dates=True)
    start = pd.Timestamp(start).to_period("M").to_timestamp("M")
    end = pd.Timestamp(end).to_period("M").to_timestamp("M")
    return frame.loc[start:end]
Fixed in 135e0dd — the vendored CSV is now read once into a cell-scope _data and load_hs_monthly_data returns a sliced .copy(), so repeated calls don't re-fetch or re-parse.
Read the snapshot once into a module-level _data and have load_hs_monthly_data slice a copy of it, instead of re-downloading/parsing on every call. This removes the redundant fetch in hansen_singleton_1983 (which loads via both get_estimation_data and get_tbill_estimation_data). The .copy() keeps callers from mutating the cached frame. Addresses Copilot review on PR #925. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Wrap zipfile.ZipFile(...) in a `with` block so the archive is explicitly closed, instead of leaving it to garbage collection. Pure refactor: both scripts still reproduce byte-identical CSVs. Addresses Copilot review (raised on PR #925, where this code previously lived). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
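The `with`-block change can be sketched as below, assuming the archive holds a single CSV whose header row starts with an empty cell and whose monthly rows are keyed by `YYYYMM`. That layout, and the helper names `read_zip_member` and `parse_monthly_rows`, are assumptions for illustration rather than the code in `make_data.py`.

```python
import io
import zipfile

import pandas as pd


def read_zip_member(zip_bytes: bytes) -> str:
    """Decode the first member of an in-memory zip archive."""
    # The with-block closes the archive even if reading raises.
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        first = archive.namelist()[0]
        return archive.read(first).decode("utf-8")


def parse_monthly_rows(text: str) -> pd.DataFrame:
    """Keep only the leading block of YYYYMM rows (assumed layout)."""
    header, rows = None, []
    for line in text.splitlines():
        cells = [cell.strip() for cell in line.split(",")]
        if header is None:
            # Assumed: the header row starts with an empty cell.
            if len(cells) > 1 and cells[0] == "":
                header = cells[1:]
            continue
        if len(cells[0]) == 6 and cells[0].isdigit():
            rows.append(cells)
        else:
            break  # a blank line or a later section ends the monthly block
    frame = pd.DataFrame(rows, columns=["date"] + header)
    dates = pd.to_datetime(frame.pop("date"), format="%Y%m")
    frame.index = pd.DatetimeIndex(dates)
    return frame.astype(float)
```

Downloading the bytes (for example with `urllib.request.urlopen(url).read()`) is left out so the sketch stays offline; only the archive handling and parsing are shown.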
Closes #924.
Problem
`hansen_singleton_1982` and `hansen_singleton_1983` fail to execute under anaconda 2026.06 / pandas 3.0 (surfaced by the forced full execution in #923). Both `!pip install pandas-datareader` and import it, but `pandas-datareader` 0.10.0 (unmaintained since 2021) relies on the private pandas API `pandas.util._decorators.deprecate_kwarg`, whose signature changed in pandas 3.0, so it dies at import.
Approach
Following discussion, instead of fetching from the data providers at build time, the data is vendored:
- PR #926 adds `_static/lecture_specific/hansen_singleton_198{2,3}/`: a `make_data.py` maintenance script (builds the dataset from FRED + Ken French), the frozen `*_data.csv` snapshot, and a `README.md`.
- This PR removes the `pandas-datareader` dependency (and the inline fetch helpers) and collapses each lecture's hidden data cell to a single `pd.read_csv(<raw GitHub URL>)` of that snapshot, selecting the columns it needs.

This matches the existing vendored-data convention used by `mle`, `ols`, and `pandas_panel`, keeps the build reproducible, and removes the live-fetch fragility (the flaky-network class that also bit `ols`).
Verification
- `make_data.py` output vs old `pandas-datareader` path (pandas 2.3.3)
- `FutureWarning`/`DeprecationWarning` → error

Net effect on the lectures: −236 / +34 lines; the data machinery moves out to the maintenance scripts.
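One way to run the warnings-as-errors check from the Verification list, sketched with the standard-library `warnings` module. `run_strict` and `legacy_call` are hypothetical names, and this is not necessarily how the check was run for this PR.

```python
import warnings


def run_strict(func, *args, **kwargs):
    """Call func with FutureWarning/DeprecationWarning raised as errors."""
    with warnings.catch_warnings():
        warnings.simplefilter("error", FutureWarning)
        warnings.simplefilter("error", DeprecationWarning)
        return func(*args, **kwargs)


def legacy_call():
    # Hypothetical stand-in for code that hits a deprecated API.
    warnings.warn("old keyword", FutureWarning)
    return "ok"
```

Wrapping the data-construction entry point in `run_strict` makes any such warning fail loudly instead of scrolling past in the build log; the filters are restored when the `with` block exits.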
Note
The two `ar1_*` lectures that also fail under a forced run are a separate, pre-existing arviz issue, out of scope here.