Skip to content

[unemployment_bayes] New lecture: Bayesian estimation of nonlinear unemployment dynamics #910

@jstac

Description

@jstac

Goal

Add a new lecture, unemployment_bayes.mdBayesian Estimation of Nonlinear Unemployment Dynamics — covering a smooth nonlinear mean-reversion model for the US unemployment rate, estimated with NUTS in NumPyro, with the full Bayesian-workflow toolkit (prior/posterior predictive checks, R̂, ESS, energy/pair plots, LOO-PSIS, and a NUTS-vs-RWMH-vs-SVI comparison).

It sits alongside the existing Bayesian lectures ar1_bayes.md and bayes_nonconj.md, and is intended as step one of a sequence leading toward a latent-state fisheries/stock-assessment lecture.

Origin

Drafted from unemployment_bayes_lecture.tex (full source embedded below). The draft was reviewed; the Bayesian scaffolding is strong, but the core model needed changes before implementation (see below). Decisions are settled.

Review findings (settled)

The original draft used a sinh restoring force: Δu_t = α + β·sinh(λ(u_{t-1} − ū)) + ε_t. Three substantive problems:

  1. sinh is dynamically explosive over the empirically relevant range. A super-exponential force in a discrete-time map overshoots and diverges. At the prior means, a recession-sized gap (u = 8–9%, which occurred in 1975, 1982, 2009) detonates within a few steps; ~23% of draws from the stated priors produce impossible (negative or NaN) unemployment paths. Verified by simulation:
    lam=1.0  start u=9% ->   9.0   0.8  10.7  -33.4  7.5e15  -inf  nan
    prior predictive (n=2000): exploded=216 (11%)  negative=239 (12%)
    
  2. sinh is symmetric, but the headline stylized fact is asymmetry (Neftçi: rises fast, falls slow). The odd sinh gives equal pull both directions; asymmetry was only bolted on as an afterthought.
  3. The fisheries/Schaefer analogy was wrong as written. It claimed both forces are "smooth, odd functions of the gap." The Schaefer growth term rB(1−B/K) is an even concave parabola, and K (not K/2) is the equilibrium — K/2 is the MSY biomass, which is only the rest point under MSY harvesting.

Agreed design

  • Core model — saturating tanh (in percentage points):
    Δu_t = α + β·tanh(λ(u_{t-1} − ū)) + ε_t,   ε_t ~ N(0, σ²),   β < 0, λ > 0
    
    Odd, near-linear at the origin (reversion coeff βλ, preserving the β–λ identification story), and saturating (|tanh| ≤ 1) so the mean one-step pull is bounded by |β| and the process is globally stable. β reads directly as the max monthly mean-reversion pull (pp); λ sets the gap width of the linear→saturated transition. This is already an LSTAR-family form, tying it to the canonical nonlinear-unemployment literature (Koop–Potter, Teräsvirta).
  • sinh demoted to a cautionary prior-predictive exercise — show it detonates and that ~23% of prior draws are impossible. Lesson: prior predictive checks catch pathologies that look fine equation-by-equation. (Turns the original weakness into the best teaching moment.)
  • Asymmetry promoted to its own substantive extension — hinge term γ(u_{t-1}−ū)⁺ (intuitive), then LSTAR (literature-grade).
  • Priors derived via prior predictive checks, not asserted. Starting point: ū~N(5,1.5²), β~TruncNormal(-0.3,0.2²,high=0), λ~HalfNormal(0.5), α~N(0,0.1²), σ~HalfNormal(0.3).
  • Logit-scale version (x=logit(u), respects (0,1) exactly) offered as an extension, keeping the core interpretable in pp.
  • Fisheries bridge reframed as methodological (latent biomass ↔ latent natural rate; same NumPyro/NUTS state-space workflow), with the Schaefer math corrected — connects to the existing Kalman-filter lectures.
  • Note monthly Δu is autocorrelated → iid-Gaussian residuals will (correctly) fail PPC → motivates quarterly data / AR errors.

Section plan

  1. Introduction & motivation (lead with asymmetry + persistence; methodological fisheries throughline)
  2. The model — tanh smooth-transition force; stability box vs sinh; β–λ identification
  3. Asymmetry extension — hinge then LSTAR
  4. Bayesian formulation — likelihood; priors via prior predictive; sinh cautionary tale
  5. Data — UNRATE/FRED, post-1984 monthly ex-COVID default
  6. Questions, exercises, NumPyro implementation, full diagnostics checklist, sampler comparison (RWMH / fixed-HMC / SVI mean-field vs full-rank)
  7. Extensions — time-varying ū (Kalman bridge), heteroskedastic shocks, logit-scale, corrected Schaefer connection
  8. Suggested lecture timing

Next steps

  • Prototype the NumPyro tanh model on real UNRATE data; confirm the posterior, the β–λ ridge, and that PPC behaves
  • Tune priors via prior-predictive checks (record the workflow for the lecture)
  • Build the sinh cautionary-tale figures (deterministic blow-up + prior-predictive failure rate)
  • Draft lectures/unemployment_bayes.md in MyST, following ar1_bayes.md conventions (GPU admonition include, numpyro/jax/arviz/pymc imports)
  • Restoring-force and transition-width figures
  • Sampler comparison (RWMH / fixed-HMC / SVI) and diagnostics
  • Add to lectures/_toc.yml near the other Bayesian lectures
  • Verify execution (jupytext → py) and check runtime budget
Original draft source — unemployment_bayes_lecture.tex

(Embedded for self-containment. John may also drag-drop the original .tex onto this issue if a downloadable copy is preferred.)

\documentclass[12pt]{article}
\usepackage{amsmath, amssymb, amsthm}
\usepackage{geometry}
\usepackage{hyperref}
\usepackage{booktabs}
\usepackage{enumitem}
\usepackage{xcolor}
\usepackage{natbib}

\geometry{margin=1.2in}

\hypersetup{
    colorlinks=true,
    linkcolor=blue!60!black,
    citecolor=blue!60!black,
    urlcolor=blue!60!black
}

\newtheorem{remark}{Remark}
\newtheorem{example}{Example}

\title{\textbf{Bayesian Estimation of a Nonlinear Unemployment Dynamics Model}\\
\large{Notes Toward a QuantEcon Lecture}}
\author{John Stachurski}
\date{\today}

\begin{document}

\maketitle

\tableofcontents
\bigskip

% =========================================================
\section{Introduction and Motivation}
% =========================================================

These notes develop a QuantEcon lecture on Bayesian estimation of a
nonlinear time series model.  The immediate application is unemployment
dynamics, but the modelling ideas and computational techniques carry
over directly to a range of settings in economics, ecology, and
fisheries management.

\subsection{Why nonlinear dynamics?}

Standard linear autoregressive models of macroeconomic time series
impose symmetry and constant mean-reversion speed by construction.
For unemployment these assumptions are empirically questionable.  A
well-documented stylized fact, going back to \citet{Neftci1984}, is
that unemployment rises quickly during recessions and falls slowly
during recoveries.  A second feature, less discussed but equally
visible in the data, is that the restoring force pulling unemployment
back toward its long-run equilibrium appears stronger the further the
series is from that equilibrium.  Near the natural rate the series
drifts; far above it the series snaps back.

Capturing these features requires a nonlinear model.  We develop one
that is parsimonious, economically interpretable, and — crucially for
our purposes — smooth enough that modern Hamiltonian Monte Carlo (HMC)
samplers work without modification.

\subsection{Why Bayesian estimation?}

The model contains parameters that are only weakly identified from
data: the natural rate $\bar u$ and the nonlinearity scale $\lambda$
are an example of a pair that cannot be sharply separated without
strong prior information.  Classical point estimation would paper over
this uncertainty.  Bayesian estimation surfaces it explicitly through
the joint posterior, making the lecture a natural vehicle for teaching
several foundational ideas:

\begin{itemize}[itemsep=4pt]
  \item how prior choice affects inference when identification is weak,
  \item how to read and interpret posterior correlations between
        parameters,
  \item how to assess model fit through posterior predictive checks,
  \item how NUTS/HMC differs from simpler samplers such as
        Metropolis--Hastings, and when the difference matters.
\end{itemize}

\subsection{Connection to fisheries and the broader research programme}

This lecture is designed as the first step in a sequence.  The
nonlinear mean-reversion structure developed here is structurally
identical to logistic growth in a surplus production model for
fisheries stock assessment.  The $\sinh$ restoring force below plays
exactly the role that the Schaefer production term $rB_t(1-B_t/K)$
plays in a biomass dynamics model.  Both are smooth, odd functions of
the gap from equilibrium; both are approximately linear near
equilibrium and superlinear in the tails.  Mastering Bayesian
estimation in the unemployment setting — where the data are familiar
and the equilibrium concept is well understood — provides the
conceptual and computational foundation for the fisheries application,
where the latent state (biomass) is never directly observed.

% =========================================================
\section{The Model}
% =========================================================

\subsection{Notation and setup}

Let $u_t \in (0, 1)$ denote the civilian unemployment rate at time
$t$, measured as a proportion (so $u_t = 0.05$ corresponds to $5\%$).
In practice we work with percentage points; the priors below are
stated in those units.  Let $T$ denote the sample size and write
$\mathbf{u} = (u_1, \ldots, u_T)$ for the observed data.

The model is a first-order nonlinear autoregression for the change in
unemployment:

\begin{equation}
  \label{eq:model}
  \Delta u_t \;=\; \alpha \;+\; \beta\,\sinh\!\bigl(\lambda(u_{t-1} - \bar u)\bigr)
              \;+\; \epsilon_t,
  \qquad \epsilon_t \overset{\text{iid}}{\sim} \mathcal{N}(0,\,\sigma^2),
\end{equation}

where $\Delta u_t = u_t - u_{t-1}$.  The model has five parameters:

\begin{center}
\begin{tabular}{lll}
\toprule
Parameter & Symbol & Role \\
\midrule
Natural rate      & $\bar u > 0$  & Equilibrium toward which $u_t$ reverts \\
Reversion strength & $\beta < 0$  & Overall speed of mean reversion \\
Nonlinearity scale & $\lambda > 0$ & Controls how quickly nonlinearity bites \\
Drift              & $\alpha \in \mathbb{R}$ & Residual asymmetry / long-run drift \\
Shock volatility   & $\sigma > 0$  & Standard deviation of innovations \\
\bottomrule
\end{tabular}
\end{center}

\subsection{The \texorpdfstring{$\sinh$}{sinh} restoring force}

The key term is $\sinh(\lambda g_t)$ where $g_t = u_{t-1} - \bar u$ is
the unemployment gap.  Recall
\[
  \sinh(x) \;=\; \frac{e^x - e^{-x}}{2}.
\]
This function has three properties that make it attractive here.

\begin{enumerate}[itemsep=4pt]
  \item \textbf{Antisymmetry.}  $\sinh(-x) = -\sinh(x)$, so the
        restoring force is in the correct direction regardless of the
        sign of the gap.

  \item \textbf{Near-linear for small gaps.}  By Taylor expansion,
        $\sinh(x) = x + x^3/6 + O(x^5)$, so for $\lambda g_t$ small
        the model reduces to a standard linear AR(1) gap equation:
        \[
          \Delta u_t \approx \alpha + \beta\lambda(u_{t-1} - \bar u)
                              + \epsilon_t.
        \]
        The composite parameter $\beta\lambda$ plays the role of the
        mean-reversion coefficient in the linear model.

  \item \textbf{Super-exponential for large gaps.}  For large $|x|$,
        $\sinh(x) \approx \tfrac{1}{2}e^{|x|}$, so the restoring
        force grows much faster than the gap itself.  This captures
        the intuition that large deviations from the natural rate
        generate unusually strong corrective pressure.
\end{enumerate}

Figure~\ref{fig:sinh} illustrates the restoring force
$-\beta\sinh(\lambda g)$ as a function of $g$ for representative
parameter values.  (This figure should be generated in the lecture
notebook.)

\begin{remark}
  The parameter $\lambda$ controls the threshold at which nonlinearity
  becomes important.  A small $\lambda$ keeps the model close to
  linear over the observed range of unemployment gaps; a large
  $\lambda$ pushes the model into its nonlinear regime even for modest
  gaps.  Whether the data support a large $\lambda$ is one of the
  interesting questions the posterior will answer.
\end{remark}

\subsection{Identification and the \texorpdfstring{$\beta$--$\lambda$}{beta-lambda} problem}

There is a well-known identification tension between $\beta$ and
$\lambda$.  Near equilibrium, the first-order dynamics depend on
$u_{t-1} - \bar u$ only through the product $\beta\lambda$.  Two
parameter combinations $(\beta_1, \lambda_1)$ and $(\beta_2,
\lambda_2)$ with $\beta_1\lambda_1 = \beta_2\lambda_2$ produce
indistinguishable dynamics in a neighbourhood of $\bar u$.  Only
observations in the tails — large gap episodes — identify $\lambda$
separately from $\beta$.

This near-collinearity will appear as a ridge in the joint posterior
of $(\beta, \lambda)$, which NUTS handles gracefully (it explores
curved posteriors efficiently) but which Metropolis--Hastings with an
isotropic proposal handles poorly.  Making this contrast vivid is a
good teaching moment.  One practical remedy is to normalise the gap
before estimation: define $\tilde g_t = (u_{t-1} - \bar u)/s$ where
$s$ is a prior scale for unemployment gaps (say $s = 2$ percentage
points), so that the argument to $\sinh$ is dimensionless and of order
one.  Then $\lambda$ measures deviations in units of $s$ rather than
raw percentage points.

\subsection{Adding explicit asymmetry}

The basic model~\eqref{eq:model} is symmetric around $\bar u$: the
restoring force is equally strong above and below equilibrium.  To
capture the Neftçi asymmetry — faster rises than falls — one can add
a hinge term:

\begin{equation}
  \label{eq:asymmetric}
  \Delta u_t \;=\; \alpha \;+\; \beta\,\sinh\!\bigl(\lambda(u_{t-1}-\bar u)\bigr)
              \;+\; \gamma\,(u_{t-1} - \bar u)^+
              \;+\; \epsilon_t,
\end{equation}

where $(x)^+ = \max(x,0)$.  With $\gamma < 0$ this adds extra
downward pull when unemployment is above the natural rate, generating
faster recovery from recessions than the symmetric model.  The
$\sinh$ term handles the nonlinear mean reversion; the hinge term
handles the directional asymmetry.  These are conceptually distinct
features and separating them makes both easier to interpret.

\begin{remark}
  An alternative to the hinge term is to replace the Gaussian
  $\epsilon_t$ with a skew-Normal distribution, putting the asymmetry
  in the shock distribution rather than the conditional mean.  Both
  formulations are worth exploring with students; they have different
  interpretations and in general give different fits.
\end{remark}

% =========================================================
\section{Bayesian Formulation}
% =========================================================

\subsection{Likelihood}

Conditional on parameters $\theta = (\alpha, \beta, \lambda, \bar u,
\sigma)$ and initial observation $u_0$, the observations
$u_1,\ldots,u_T$ are independent with

\begin{equation}
  u_t \mid u_{t-1}, \theta
  \;\sim\;
  \mathcal{N}\!\Bigl(
    u_{t-1} + \alpha + \beta\sinh\bigl(\lambda(u_{t-1}-\bar u)\bigr),\;
    \sigma^2
  \Bigr).
\end{equation}

The log-likelihood is therefore

\begin{equation}
  \ell(\theta;\mathbf{u})
  \;=\;
  -\frac{T}{2}\log(2\pi\sigma^2)
  \;-\;
  \frac{1}{2\sigma^2}
  \sum_{t=1}^{T}
  \Bigl[
    \Delta u_t - \alpha - \beta\sinh\bigl(\lambda(u_{t-1}-\bar u)\bigr)
  \Bigr]^2.
\end{equation}

This is differentiable in all parameters, which is why NUTS applies
without modification.

\subsection{Prior distributions}

The priors below are stated in percentage-point units (so $u_t = 5.0$
means $5\%$).  All are weakly informative: they rule out economically
implausible values while remaining diffuse over the plausible range.

\begin{align}
  \bar u   &\;\sim\; \mathcal{N}(5.0,\; 1.5^2)
             \label{prior:ubar}
             \tag{P1} \\[4pt]
  \beta    &\;\sim\; \mathcal{N}(-0.3,\; 0.2^2),
             \quad \beta < 0
             \label{prior:beta}
             \tag{P2} \\[4pt]
  \lambda  &\;\sim\; \mathrm{HalfNormal}(1.0)
             \label{prior:lambda}
             \tag{P3} \\[4pt]
  \alpha   &\;\sim\; \mathcal{N}(0,\; 0.2^2)
             \label{prior:alpha}
             \tag{P4} \\[4pt]
  \sigma   &\;\sim\; \mathrm{HalfNormal}(0.5)
             \label{prior:sigma}
             \tag{P5}
\end{align}

\paragraph{Notes on prior choices.}

\begin{itemize}[itemsep=4pt]

  \item \textbf{Natural rate \eqref{prior:ubar}.}  The US civilian
        unemployment rate has averaged approximately $5.5\%$ since
        1948 and has rarely sustained values below $3.5\%$ or above
        $11\%$.  A $\mathcal{N}(5.0, 1.5^2)$ prior puts the
        $95\%$ credible interval at roughly $(2.1, 7.9)$, which is
        consistent with most estimates of the NAIRU.

  \item \textbf{Reversion strength \eqref{prior:beta}.}  The
        truncation $\beta < 0$ enforces stationarity (mean reversion
        toward $\bar u$).  The prior mean of $-0.3$ implies that, in
        the linear approximation, roughly $30\%$ of a one-unit gap is
        closed each period.  This is moderate mean reversion.

  \item \textbf{Nonlinearity scale \eqref{prior:lambda}.}  A
        $\mathrm{HalfNormal}(1.0)$ prior places most mass on $\lambda
        \in (0, 2)$.  Since $\sinh(x) \approx x$ for $|x| \lesssim
        0.5$, a value $\lambda = 1$ implies nonlinearity becomes
        important when the gap exceeds about half a percentage point.
        Students should be encouraged to try alternative scales and
        observe the effect on the posterior.

  \item \textbf{Drift \eqref{prior:alpha}.}  Centered at zero,
        reflecting the prior belief that there is no secular trend in
        unemployment.  A non-zero posterior mean for $\alpha$ would be
        substantively interesting.

  \item \textbf{Volatility \eqref{prior:sigma}.}  Monthly changes in
        the unemployment rate are typically small (standard deviation
        around $0.2$ percentage points in normal times, larger during
        recessions), so $\mathrm{HalfNormal}(0.5)$ is diffuse but not
        uninformative.

\end{itemize}

\subsection{Posterior}

By Bayes' theorem the joint posterior is

\begin{equation}
  p(\theta \mid \mathbf{u})
  \;\propto\;
  \exp\!\bigl(\ell(\theta;\mathbf{u})\bigr)
  \cdot p(\theta),
\end{equation}

where $p(\theta) = p(\bar u)\,p(\beta)\,p(\lambda)\,p(\alpha)\,p(\sigma)$
is the joint prior.  This posterior has no closed form, so we use
MCMC.

% =========================================================
\section{Data}
% =========================================================

\subsection{Primary source}

The natural dataset is the \textbf{US Civilian Unemployment Rate}
published by the Bureau of Labor Statistics, series
\texttt{UNRATE}, available from FRED (Federal Reserve Bank of St.\
Louis).  It is monthly, seasonally adjusted, from January 1948 to
the present.  In Python:

\begin{verbatim}
import pandas_datareader.data as web
import datetime

start = datetime.datetime(1948, 1, 1)
end   = datetime.datetime(2024, 12, 31)
u = web.DataReader("UNRATE", "fred", start, end)
\end{verbatim}

Alternatively the data can be downloaded directly from
\url{https://fred.stlouisfed.org/series/UNRATE} and read with
\texttt{pandas}.

\subsection{Sample choice and preprocessing}

Several choices are worth discussing with students:

\begin{itemize}[itemsep=4pt]

  \item \textbf{Full sample (1948--present)} includes the Korean War
        boom, the stagflation era of the 1970s--80s, and the
        COVID-19 spike of 2020.  Structural breaks are likely.

  \item \textbf{Post-1984 ``Great Moderation'' sample} (1984--2019)
        is more homogeneous and easier to model with a single
        parameter set.  The COVID observation ($u_t \approx 14.7\%$
        in April 2020) is an extreme outlier and can be treated as a
        special case.

  \item \textbf{Quarterly data} (constructed by taking end-of-quarter
        values or averages) reduces autocorrelation in $\epsilon_t$
        and may be more appropriate for a model without MA terms.

\end{itemize}

For a first lecture, the post-1984 monthly series excluding the
COVID spike is a reasonable default.  The COVID observation can be
reintroduced as an exercise: does the posterior change substantially?
Does the model predict it as a tail event?

\subsection{Supplementary series}

The following related series from FRED are useful for robustness
checks and extensions:

\begin{center}
\begin{tabular}{lll}
\toprule
Series & FRED code & Description \\
\midrule
Short-term unemployment & \texttt{U6RATE}  & Includes marginally attached \\
Long-term unemployment  & \texttt{LNS13025703} & 27 weeks or more \\
Youth unemployment      & \texttt{LNS14000012} & Age 16--24 \\
State-level rates       & \texttt{*UR} codes & 50 states, useful for panel \\
\bottomrule
\end{tabular}
\end{center}

% =========================================================
\section{Interesting Questions and Exercises}
% =========================================================

\subsection{Substantive questions}

The following questions can be addressed directly from the posterior
and are likely to generate interesting class discussion.

\begin{enumerate}[itemsep=6pt]

  \item \textbf{Is the nonlinearity real?}  The posterior of $\lambda$
        answers this.  If the posterior is concentrated away from
        zero, the data support nonlinear mean reversion.  If it is
        diffuse and piles up near zero, the linear model is
        adequate.  This is a natural Bayesian model comparison
        exercise.

  \item \textbf{Where is the natural rate?}  The posterior of $\bar u$
        gives a full probability distribution over the natural rate.
        Compare this to external estimates (the CBO publishes a
        long-run NAIRU estimate).  The posterior width communicates
        the genuine uncertainty in this concept.

  \item \textbf{Is there genuine asymmetry?}  The extended
        model~\eqref{eq:asymmetric} with hinge parameter $\gamma$
        tests this directly.  Students can compare the posterior of
        $\gamma$ to zero, or compare marginal likelihoods of the
        symmetric and asymmetric specifications.

  \item \textbf{How did the Great Recession change things?}  Split
        the sample at 2007:Q4 and re-estimate on each sub-sample.
        Do $\bar u$, $\beta$, and $\lambda$ shift?  Does the
        posterior of $\bar u$ post-2008 lie above the pre-2008
        posterior, consistent with hysteresis?

  \item \textbf{What does the model say about the COVID spike?}
        Compute the posterior predictive probability of observing a
        gap as large as April 2020 under the estimated model.  This
        is a tail probability question and a good illustration of
        posterior predictive checks.

\end{enumerate}

\subsection{Methodological questions for teaching Bayesian foundations}

\begin{enumerate}[itemsep=6pt]

  \item \textbf{Prior sensitivity.}  Tighten and loosen the prior on
        $\bar u$ and $\lambda$.  Which parameters are prior-sensitive?
        Which are data-dominated?  This exercise builds intuition
        about the role of sample size and identification strength.

  \item \textbf{The $\beta$--$\lambda$ ridge.}  Plot the joint
        posterior of $(\beta, \lambda)$ as a scatter of MCMC draws.
        The ridge structure is a vivid illustration of weak
        identification and motivates the normalisation discussed in
        Section~\ref{sec:ident}.

  \item \textbf{Convergence and mixing.}  Run multiple chains from
        dispersed starting points.  Compute $\hat R$ (the
        Gelman--Rubin statistic) for each parameter.  For NUTS this
        should be close to 1.0; for a poorly tuned
        Metropolis--Hastings it may not be.

  \item \textbf{Effective sample size.}  Compare the ESS per second
        for NUTS versus a random-walk Metropolis--Hastings sampler.
        The comparison is especially stark for the $(\beta,\lambda)$
        pair, where NUTS exploits gradient information to traverse
        the ridge efficiently.

  \item \textbf{Posterior predictive checks.}  Generate simulated
        paths $\tilde{\mathbf{u}}$ from the posterior predictive
        distribution.  Compare the distribution of simulated paths to
        the observed series on summary statistics: proportion of time
        above $7\%$, mean duration of spells above $7\%$, maximum
        one-year change.  Systematic discrepancies reveal model
        misspecification.

\end{enumerate}

% =========================================================
\section{Computational Implementation}
% =========================================================

\subsection{NumPyro model}

The model maps naturally to NumPyro.  A sketch of the model
function:

\begin{verbatim}
import numpyro
import numpyro.distributions as dist
import jax.numpy as jnp

def unemployment_model(u):
    # Priors
    ubar  = numpyro.sample("ubar",  dist.Normal(5.0, 1.5))
    beta  = numpyro.sample("beta",  dist.TruncatedNormal(
                                        -0.3, 0.2, high=0.0))
    lam   = numpyro.sample("lam",   dist.HalfNormal(1.0))
    alpha = numpyro.sample("alpha", dist.Normal(0.0, 0.2))
    sigma = numpyro.sample("sigma", dist.HalfNormal(0.5))

    # Conditional mean
    gap      = u[:-1] - ubar
    mu       = u[:-1] + alpha + beta * jnp.sinh(lam * gap)

    # Likelihood
    with numpyro.plate("obs", len(u) - 1):
        numpyro.sample("u_obs", dist.Normal(mu, sigma),
                       obs=u[1:])
\end{verbatim}

Run with \texttt{numpyro.infer.NUTS} and
\texttt{numpyro.infer.MCMC}.  Typical settings: 4 chains, 1000
warmup steps, 2000 sampling steps.  On a laptop CPU with monthly
post-1984 data this runs in under a minute.

\subsection{Diagnostics checklist}

\begin{enumerate}[itemsep=4pt]

  \item \textbf{Trace plots.}  Visual inspection of MCMC chains for
        each parameter.  Well-mixed chains look like white noise
        around a stable mean.

  \item \textbf{$\hat R$ statistic.}  Values above $1.01$ signal
        convergence failure.  Use \texttt{arviz.summary()} which
        computes $\hat R$ for all parameters simultaneously.

  \item \textbf{Effective sample size (ESS).}  Both bulk-ESS and
        tail-ESS should be reported.  Low tail-ESS indicates poor
        exploration of the tails of the posterior, which matters for
        risk-oriented applications.

  \item \textbf{Energy plots.}  ArviZ's \texttt{plot\_energy()} shows
        whether the Hamiltonian energy distribution is well-behaved.
        A gap between the marginal energy distribution and the
        energy transition distribution suggests the sampler is not
        exploring the posterior well.

  \item \textbf{Pair plots.}  Scatter plots of all parameter pairs
        from posterior draws, coloured by chain.  The
        $(\beta,\lambda)$ pair will show the ridge; pair plots also
        reveal any multimodality.

  \item \textbf{Posterior predictive checks.}  Compare observed
        $\Delta u_t$ to the posterior predictive distribution of
        $\Delta u_t$.  Plot the observed series against a band of
        posterior predictive paths.

  \item \textbf{Leave-one-out cross-validation (LOO-CV).}  ArviZ
        implements Pareto-smoothed importance sampling LOO
        (\texttt{arviz.loo()}).  High Pareto $k$ values flag
        observations that are influential and poorly fit by the
        model.  The COVID observation will certainly flag here.

\end{enumerate}

\subsection{Alternative samplers}

A valuable component of the lecture is comparing NUTS to simpler
alternatives.

\begin{itemize}[itemsep=6pt]

  \item \textbf{Random-walk Metropolis--Hastings (RWMH).}
        Implement a simple RWMH sampler in JAX for this model.
        This is conceptually straightforward and students can write
        it from scratch.  Compare ESS per second to NUTS.  The
        comparison will be unfavourable for RWMH on the
        $(\beta,\lambda)$ ridge.

  \item \textbf{NUTS vs HMC with fixed step count.}  NumPyro exposes
        both.  NUTS adapts the trajectory length automatically; fixed
        HMC requires tuning.  Show the sensitivity of fixed HMC to
        the number of leapfrog steps.

  \item \textbf{Variational inference (SVI).}  NumPyro supports
        automatic guide generation via \texttt{AutoNormal} and
        \texttt{AutoMultivariateNormal}.  A mean-field
        \texttt{AutoNormal} guide will miss the $(\beta,\lambda)$
        correlation; a full-rank \texttt{AutoMultivariateNormal}
        will capture it.  Comparing these to the NUTS posterior is
        instructive.

\end{itemize}

% =========================================================
\section{Extensions and Connections}
% =========================================================

\subsection{Time-varying natural rate}

Replace the fixed $\bar u$ with a slowly drifting process:
\[
  \bar u_t = \bar u_{t-1} + \xi_t, \quad \xi_t \sim \mathcal{N}(0, \tau^2),
\]
with a tight prior on $\tau$ (e.g.\ $\mathrm{HalfNormal}(0.05)$).
Now the model has a latent state and filtering becomes relevant.
This is the natural bridge to the Kalman filter lectures already in
QuantEcon and to the fisheries application where latent states are
unavoidable.

\subsection{Heteroskedastic shocks}

Allow $\sigma_t^2$ to depend on the gap:
\[
  \sigma_t^2 = \sigma_0^2 + \sigma_1^2 \cdot \mathbf{1}[u_{t-1} > \bar u].
\]
This captures the well-documented increase in volatility during
recessions.  It adds one parameter and can be estimated without
modification to the NUTS setup.

\subsection{Connection to surplus production models}

The logistic surplus production model (Schaefer 1954) for fish
biomass $B_t$ is:
\[
  B_t = B_{t-1} + r B_{t-1}\!\left(1 - \frac{B_{t-1}}{K}\right) - C_{t-1}
        + \eta_t,
\]
where $r$ is the intrinsic growth rate, $K$ is the carrying capacity,
and $C_{t-1}$ is the catch.  Setting $C_{t-1} = 0$ and defining
$g_t = B_{t-1} - K/2$ (the gap from the biomass that maximises
sustainable yield), the growth term becomes
\[
  r B_{t-1}\!\left(1 - \frac{B_{t-1}}{K}\right)
  \;=\;
  \frac{r}{4K}(K^2 - 4g_t^2)
  \;=\;
  \frac{rK}{4} - \frac{r}{K}g_t^2,
\]
a concave function of $g_t$ that pushes $B_t$ back toward $K/2$.
The $\sinh$ model is a smooth, globally defined analogue: it too
pushes $u_t$ back toward $\bar u$ with a force that grows super-linearly
with the gap.  Making this structural analogy explicit helps students
transfer the Bayesian estimation skills from the unemployment setting
to fisheries without relearning the concepts.

% =========================================================
\section{Suggested Lecture Structure}
% =========================================================

\begin{enumerate}[itemsep=6pt]

  \item \textbf{Motivation} (10 min).  Plot US unemployment 1948--present.
        Highlight asymmetry and nonlinear mean reversion visually.
        Show that a linear AR(1) on the gap produces symmetric,
        constant-speed reversion — a poor fit.

  \item \textbf{The $\sinh$ model} (15 min).  Derive
        equation~\eqref{eq:model}.  Plot the restoring force as a
        function of the gap for different $\lambda$.  Discuss the
        linear approximation and when it breaks down.

  \item \textbf{Bayesian formulation} (15 min).  Write down the
        likelihood and priors.  Discuss prior choices.  Show prior
        predictive simulations — draws from the prior predictive
        distribution — to sanity-check that priors are reasonable.

  \item \textbf{NUTS estimation in NumPyro} (20 min).  Live coding
        or walk through the model function.  Run the sampler.  Show
        trace plots and $\hat R$ in real time.

  \item \textbf{Posterior analysis} (20 min).  Marginal posteriors,
        pair plots, $(\beta,\lambda)$ ridge.  Posterior predictive
        checks.  Answer the substantive questions from
        Section~5.1.

  \item \textbf{Comparison with MH} (10 min).  Run the hand-coded
        RWMH.  Compare mixing and ESS.  Discuss why HMC is better
        for curved posteriors.

  \item \textbf{Outlook} (5 min).  Sketch the extension to latent
        states and the fisheries application.

\end{enumerate}

% =========================================================
\bibliographystyle{plainnat}
\begin{thebibliography}{99}

\bibitem[Neftçi(1984)]{Neftci1984}
Neftçi, S.N.\ (1984).
\newblock Are economic time series asymmetric over the business cycle?
\newblock \textit{Journal of Political Economy}, 92(2):307--328.

\bibitem[Hamilton(1989)]{Hamilton1989}
Hamilton, J.D.\ (1989).
\newblock A new approach to the economic analysis of nonstationary time series
  and the business cycle.
\newblock \textit{Econometrica}, 57(2):357--384.

\bibitem[Koop and Potter(1999)]{KoopPotter1999}
Koop, G.\ and Potter, S.M.\ (1999).
\newblock Dynamic asymmetries in US unemployment.
\newblock \textit{Journal of Business \& Economic Statistics}, 17(3):298--312.

\bibitem[Stock and Watson(1998)]{StockWatson1998}
Stock, J.H.\ and Watson, M.W.\ (1998).
\newblock Median unbiased estimation of coefficient variance in a time-varying
  parameter model.
\newblock \textit{Journal of the American Statistical Association},
  93(441):349--358.

\bibitem[Schaefer(1954)]{Schaefer1954}
Schaefer, M.B.\ (1954).
\newblock Some aspects of the dynamics of populations important to the
  management of commercial marine fisheries.
\newblock \textit{Bulletin of the Inter-American Tropical Tuna Commission},
  1(2):27--56.

\bibitem[Hoffman and Gelman(2014)]{HoffmanGelman2014}
Hoffman, M.D.\ and Gelman, A.\ (2014).
\newblock The No-U-Turn Sampler: Adaptively setting path lengths in
  Hamiltonian Monte Carlo.
\newblock \textit{Journal of Machine Learning Research}, 15:1593--1623.

\bibitem[Phan et al.(2019)]{Phan2019}
Phan, D., Pradhan, N., and Jankowiak, M.\ (2019).
\newblock Composable effects for flexible and accelerated probabilistic
  programming in NumPyro.
\newblock \textit{arXiv preprint arXiv:1912.11554}.

\bibitem[Vehtari et al.(2021)]{Vehtari2021}
Vehtari, A., Gelman, A., Simpson, D., Carpenter, B., and B{\"u}rkner, P.-C.\
  (2021).
\newblock Rank-normalization, folding, and localization: An improved $\hat{R}$
  for assessing convergence of MCMC.
\newblock \textit{Bayesian Analysis}, 16(2):667--718.

\end{thebibliography}

\end{document}

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions