(Embedded for self-containment. John may also drag-drop the original .tex onto this issue if a downloadable copy is preferred.)
\documentclass[12pt]{article}
\usepackage{amsmath, amssymb, amsthm}
\usepackage{geometry}
\usepackage{hyperref}
\usepackage{booktabs}
\usepackage{enumitem}
\usepackage{xcolor}
\usepackage{natbib}
\geometry{margin=1.2in}
\hypersetup{
colorlinks=true,
linkcolor=blue!60!black,
citecolor=blue!60!black,
urlcolor=blue!60!black
}
\newtheorem{remark}{Remark}
\newtheorem{example}{Example}
\title{\textbf{Bayesian Estimation of a Nonlinear Unemployment Dynamics Model}\\
\large{Notes Toward a QuantEcon Lecture}}
\author{John Stachurski}
\date{\today}
\begin{document}
\maketitle
\tableofcontents
\bigskip
% =========================================================
\section{Introduction and Motivation}
% =========================================================
These notes develop a QuantEcon lecture on Bayesian estimation of a
nonlinear time series model. The immediate application is unemployment
dynamics, but the modelling ideas and computational techniques carry
over directly to a range of settings in economics, ecology, and
fisheries management.
\subsection{Why nonlinear dynamics?}
Standard linear autoregressive models of macroeconomic time series
impose symmetry and constant mean-reversion speed by construction.
For unemployment these assumptions are empirically questionable. A
well-documented stylized fact, going back to \citet{Neftci1984}, is
that unemployment rises quickly during recessions and falls slowly
during recoveries. A second feature, less discussed but equally
visible in the data, is that the restoring force pulling unemployment
back toward its long-run equilibrium appears stronger the further the
series is from that equilibrium. Near the natural rate the series
drifts; far above it the series snaps back.
Capturing these features requires a nonlinear model. We develop one
that is parsimonious, economically interpretable, and — crucially for
our purposes — smooth enough that modern Hamiltonian Monte Carlo (HMC)
samplers work without modification.
\subsection{Why Bayesian estimation?}
The model contains parameters that are only weakly identified from
data: the natural rate $\bar u$ and the nonlinearity scale $\lambda$
are an example of a pair that cannot be sharply separated without
strong prior information. Classical point estimation would paper over
this uncertainty. Bayesian estimation surfaces it explicitly through
the joint posterior, making the lecture a natural vehicle for teaching
several foundational ideas:
\begin{itemize}[itemsep=4pt]
\item how prior choice affects inference when identification is weak,
\item how to read and interpret posterior correlations between
parameters,
\item how to assess model fit through posterior predictive checks,
\item how NUTS/HMC differs from simpler samplers such as
Metropolis--Hastings, and when the difference matters.
\end{itemize}
\subsection{Connection to fisheries and the broader research programme}
This lecture is designed as the first step in a sequence. The
nonlinear mean-reversion structure developed here is structurally
identical to logistic growth in a surplus production model for
fisheries stock assessment. The $\sinh$ restoring force below plays
exactly the role that the Schaefer production term $rB_t(1-B_t/K)$
plays in a biomass dynamics model. Both are smooth, odd functions of
the gap from equilibrium; both are approximately linear near
equilibrium and superlinear in the tails. Mastering Bayesian
estimation in the unemployment setting — where the data are familiar
and the equilibrium concept is well understood — provides the
conceptual and computational foundation for the fisheries application,
where the latent state (biomass) is never directly observed.
% =========================================================
\section{The Model}
% =========================================================
\subsection{Notation and setup}
Let $u_t \in (0, 1)$ denote the civilian unemployment rate at time
$t$, measured as a proportion (so $u_t = 0.05$ corresponds to $5\%$).
In practice we work with percentage points; the priors below are
stated in those units. Let $T$ denote the sample size and write
$\mathbf{u} = (u_1, \ldots, u_T)$ for the observed data.
The model is a first-order nonlinear autoregression for the change in
unemployment:
\begin{equation}
\label{eq:model}
\Delta u_t \;=\; \alpha \;+\; \beta\,\sinh\!\bigl(\lambda(u_{t-1} - \bar u)\bigr)
\;+\; \epsilon_t,
\qquad \epsilon_t \overset{\text{iid}}{\sim} \mathcal{N}(0,\,\sigma^2),
\end{equation}
where $\Delta u_t = u_t - u_{t-1}$. The model has five parameters:
\begin{center}
\begin{tabular}{lll}
\toprule
Parameter & Symbol & Role \\
\midrule
Natural rate & $\bar u > 0$ & Equilibrium toward which $u_t$ reverts \\
Reversion strength & $\beta < 0$ & Overall speed of mean reversion \\
Nonlinearity scale & $\lambda > 0$ & Controls how quickly nonlinearity bites \\
Drift & $\alpha \in \mathbb{R}$ & Residual asymmetry / long-run drift \\
Shock volatility & $\sigma > 0$ & Standard deviation of innovations \\
\bottomrule
\end{tabular}
\end{center}
\subsection{The \texorpdfstring{$\sinh$}{sinh} restoring force}
The key term is $\sinh(\lambda g_t)$ where $g_t = u_{t-1} - \bar u$ is
the unemployment gap. Recall
\[
\sinh(x) \;=\; \frac{e^x - e^{-x}}{2}.
\]
This function has three properties that make it attractive here.
\begin{enumerate}[itemsep=4pt]
\item \textbf{Antisymmetry.} $\sinh(-x) = -\sinh(x)$, so the
restoring force is in the correct direction regardless of the
sign of the gap.
\item \textbf{Near-linear for small gaps.} By Taylor expansion,
$\sinh(x) = x + x^3/6 + O(x^5)$, so for $\lambda g_t$ small
the model reduces to a standard linear AR(1) gap equation:
\[
\Delta u_t \approx \alpha + \beta\lambda(u_{t-1} - \bar u)
+ \epsilon_t.
\]
The composite parameter $\beta\lambda$ plays the role of the
mean-reversion coefficient in the linear model.
\item \textbf{Super-exponential for large gaps.} For large $|x|$,
$\sinh(x) \approx \tfrac{1}{2}e^{|x|}$, so the restoring
force grows much faster than the gap itself. This captures
the intuition that large deviations from the natural rate
generate unusually strong corrective pressure.
\end{enumerate}
Figure~\ref{fig:sinh} illustrates the restoring force
$-\beta\sinh(\lambda g)$ as a function of $g$ for representative
parameter values. (This figure should be generated in the lecture
notebook.)
\begin{remark}
The parameter $\lambda$ controls the threshold at which nonlinearity
becomes important. A small $\lambda$ keeps the model close to
linear over the observed range of unemployment gaps; a large
$\lambda$ pushes the model into its nonlinear regime even for modest
gaps. Whether the data support a large $\lambda$ is one of the
interesting questions the posterior will answer.
\end{remark}
\subsection{Identification and the \texorpdfstring{$\beta$--$\lambda$}{beta-lambda} problem}
There is a well-known identification tension between $\beta$ and
$\lambda$. Near equilibrium, the first-order dynamics depend on
$u_{t-1} - \bar u$ only through the product $\beta\lambda$. Two
parameter combinations $(\beta_1, \lambda_1)$ and $(\beta_2,
\lambda_2)$ with $\beta_1\lambda_1 = \beta_2\lambda_2$ produce
indistinguishable dynamics in a neighbourhood of $\bar u$. Only
observations in the tails — large gap episodes — identify $\lambda$
separately from $\beta$.
This near-collinearity will appear as a ridge in the joint posterior
of $(\beta, \lambda)$, which NUTS handles gracefully (it explores
curved posteriors efficiently) but which Metropolis--Hastings with an
isotropic proposal handles poorly. Making this contrast vivid is a
good teaching moment. One practical remedy is to normalise the gap
before estimation: define $\tilde g_t = (u_{t-1} - \bar u)/s$ where
$s$ is a prior scale for unemployment gaps (say $s = 2$ percentage
points), so that the argument to $\sinh$ is dimensionless and of order
one. Then $\lambda$ measures deviations in units of $s$ rather than
raw percentage points.
\subsection{Adding explicit asymmetry}
The basic model~\eqref{eq:model} is symmetric around $\bar u$: the
restoring force is equally strong above and below equilibrium. To
capture the Neftçi asymmetry — faster rises than falls — one can add
a hinge term:
\begin{equation}
\label{eq:asymmetric}
\Delta u_t \;=\; \alpha \;+\; \beta\,\sinh\!\bigl(\lambda(u_{t-1}-\bar u)\bigr)
\;+\; \gamma\,(u_{t-1} - \bar u)^+
\;+\; \epsilon_t,
\end{equation}
where $(x)^+ = \max(x,0)$. With $\gamma < 0$ this adds extra
downward pull when unemployment is above the natural rate, generating
faster recovery from recessions than the symmetric model. The
$\sinh$ term handles the nonlinear mean reversion; the hinge term
handles the directional asymmetry. These are conceptually distinct
features and separating them makes both easier to interpret.
\begin{remark}
An alternative to the hinge term is to replace the Gaussian
$\epsilon_t$ with a skew-Normal distribution, putting the asymmetry
in the shock distribution rather than the conditional mean. Both
formulations are worth exploring with students; they have different
interpretations and in general give different fits.
\end{remark}
% =========================================================
\section{Bayesian Formulation}
% =========================================================
\subsection{Likelihood}
Conditional on parameters $\theta = (\alpha, \beta, \lambda, \bar u,
\sigma)$ and initial observation $u_0$, the observations
$u_1,\ldots,u_T$ are independent with
\begin{equation}
u_t \mid u_{t-1}, \theta
\;\sim\;
\mathcal{N}\!\Bigl(
u_{t-1} + \alpha + \beta\sinh\bigl(\lambda(u_{t-1}-\bar u)\bigr),\;
\sigma^2
\Bigr).
\end{equation}
The log-likelihood is therefore
\begin{equation}
\ell(\theta;\mathbf{u})
\;=\;
-\frac{T}{2}\log(2\pi\sigma^2)
\;-\;
\frac{1}{2\sigma^2}
\sum_{t=1}^{T}
\Bigl[
\Delta u_t - \alpha - \beta\sinh\bigl(\lambda(u_{t-1}-\bar u)\bigr)
\Bigr]^2.
\end{equation}
This is differentiable in all parameters, which is why NUTS applies
without modification.
\subsection{Prior distributions}
The priors below are stated in percentage-point units (so $u_t = 5.0$
means $5\%$). All are weakly informative: they rule out economically
implausible values while remaining diffuse over the plausible range.
\begin{align}
\bar u &\;\sim\; \mathcal{N}(5.0,\; 1.5^2)
\label{prior:ubar}
\tag{P1} \\[4pt]
\beta &\;\sim\; \mathcal{N}(-0.3,\; 0.2^2),
\quad \beta < 0
\label{prior:beta}
\tag{P2} \\[4pt]
\lambda &\;\sim\; \mathrm{HalfNormal}(1.0)
\label{prior:lambda}
\tag{P3} \\[4pt]
\alpha &\;\sim\; \mathcal{N}(0,\; 0.2^2)
\label{prior:alpha}
\tag{P4} \\[4pt]
\sigma &\;\sim\; \mathrm{HalfNormal}(0.5)
\label{prior:sigma}
\tag{P5}
\end{align}
\paragraph{Notes on prior choices.}
\begin{itemize}[itemsep=4pt]
\item \textbf{Natural rate \eqref{prior:ubar}.} The US civilian
unemployment rate has averaged approximately $5.5\%$ since
1948 and has rarely sustained values below $3.5\%$ or above
$11\%$. A $\mathcal{N}(5.0, 1.5^2)$ prior puts the
$95\%$ credible interval at roughly $(2.1, 7.9)$, which is
consistent with most estimates of the NAIRU.
\item \textbf{Reversion strength \eqref{prior:beta}.} The
truncation $\beta < 0$ enforces stationarity (mean reversion
toward $\bar u$). The prior mean of $-0.3$ implies that, in
the linear approximation, roughly $30\%$ of a one-unit gap is
closed each period. This is moderate mean reversion.
\item \textbf{Nonlinearity scale \eqref{prior:lambda}.} A
$\mathrm{HalfNormal}(1.0)$ prior places most mass on $\lambda
\in (0, 2)$. Since $\sinh(x) \approx x$ for $|x| \lesssim
0.5$, a value $\lambda = 1$ implies nonlinearity becomes
important when the gap exceeds about half a percentage point.
Students should be encouraged to try alternative scales and
observe the effect on the posterior.
\item \textbf{Drift \eqref{prior:alpha}.} Centered at zero,
reflecting the prior belief that there is no secular trend in
unemployment. A non-zero posterior mean for $\alpha$ would be
substantively interesting.
\item \textbf{Volatility \eqref{prior:sigma}.} Monthly changes in
the unemployment rate are typically small (standard deviation
around $0.2$ percentage points in normal times, larger during
recessions), so $\mathrm{HalfNormal}(0.5)$ is diffuse but not
uninformative.
\end{itemize}
\subsection{Posterior}
By Bayes' theorem the joint posterior is
\begin{equation}
p(\theta \mid \mathbf{u})
\;\propto\;
\exp\!\bigl(\ell(\theta;\mathbf{u})\bigr)
\cdot p(\theta),
\end{equation}
where $p(\theta) = p(\bar u)\,p(\beta)\,p(\lambda)\,p(\alpha)\,p(\sigma)$
is the joint prior. This posterior has no closed form, so we use
MCMC.
% =========================================================
\section{Data}
% =========================================================
\subsection{Primary source}
The natural dataset is the \textbf{US Civilian Unemployment Rate}
published by the Bureau of Labor Statistics, series
\texttt{UNRATE}, available from FRED (Federal Reserve Bank of St.\
Louis). It is monthly, seasonally adjusted, from January 1948 to
the present. In Python:
\begin{verbatim}
import pandas_datareader.data as web
import datetime
start = datetime.datetime(1948, 1, 1)
end = datetime.datetime(2024, 12, 31)
u = web.DataReader("UNRATE", "fred", start, end)
\end{verbatim}
Alternatively the data can be downloaded directly from
\url{https://fred.stlouisfed.org/series/UNRATE} and read with
\texttt{pandas}.
\subsection{Sample choice and preprocessing}
Several choices are worth discussing with students:
\begin{itemize}[itemsep=4pt]
\item \textbf{Full sample (1948--present)} includes the Korean War
boom, the stagflation era of the 1970s--80s, and the
COVID-19 spike of 2020. Structural breaks are likely.
\item \textbf{Post-1984 ``Great Moderation'' sample} (1984--2019)
is more homogeneous and easier to model with a single
parameter set. The COVID observation ($u_t \approx 14.7\%$
in April 2020) is an extreme outlier and can be treated as a
special case.
\item \textbf{Quarterly data} (constructed by taking end-of-quarter
values or averages) reduces autocorrelation in $\epsilon_t$
and may be more appropriate for a model without MA terms.
\end{itemize}
For a first lecture, the post-1984 monthly series excluding the
COVID spike is a reasonable default. The COVID observation can be
reintroduced as an exercise: does the posterior change substantially?
Does the model predict it as a tail event?
\subsection{Supplementary series}
The following related series from FRED are useful for robustness
checks and extensions:
\begin{center}
\begin{tabular}{lll}
\toprule
Series & FRED code & Description \\
\midrule
Short-term unemployment & \texttt{U6RATE} & Includes marginally attached \\
Long-term unemployment & \texttt{LNS13025703} & 27 weeks or more \\
Youth unemployment & \texttt{LNS14000012} & Age 16--24 \\
State-level rates & \texttt{*UR} codes & 50 states, useful for panel \\
\bottomrule
\end{tabular}
\end{center}
% =========================================================
\section{Interesting Questions and Exercises}
% =========================================================
\subsection{Substantive questions}
The following questions can be addressed directly from the posterior
and are likely to generate interesting class discussion.
\begin{enumerate}[itemsep=6pt]
\item \textbf{Is the nonlinearity real?} The posterior of $\lambda$
answers this. If the posterior is concentrated away from
zero, the data support nonlinear mean reversion. If it is
diffuse and piles up near zero, the linear model is
adequate. This is a natural Bayesian model comparison
exercise.
\item \textbf{Where is the natural rate?} The posterior of $\bar u$
gives a full probability distribution over the natural rate.
Compare this to external estimates (the CBO publishes a
long-run NAIRU estimate). The posterior width communicates
the genuine uncertainty in this concept.
\item \textbf{Is there genuine asymmetry?} The extended
model~\eqref{eq:asymmetric} with hinge parameter $\gamma$
tests this directly. Students can compare the posterior of
$\gamma$ to zero, or compare marginal likelihoods of the
symmetric and asymmetric specifications.
\item \textbf{How did the Great Recession change things?} Split
the sample at 2007:Q4 and re-estimate on each sub-sample.
Do $\bar u$, $\beta$, and $\lambda$ shift? Does the
posterior of $\bar u$ post-2008 lie above the pre-2008
posterior, consistent with hysteresis?
\item \textbf{What does the model say about the COVID spike?}
Compute the posterior predictive probability of observing a
gap as large as April 2020 under the estimated model. This
is a tail probability question and a good illustration of
posterior predictive checks.
\end{enumerate}
\subsection{Methodological questions for teaching Bayesian foundations}
\begin{enumerate}[itemsep=6pt]
\item \textbf{Prior sensitivity.} Tighten and loosen the prior on
$\bar u$ and $\lambda$. Which parameters are prior-sensitive?
Which are data-dominated? This exercise builds intuition
about the role of sample size and identification strength.
\item \textbf{The $\beta$--$\lambda$ ridge.} Plot the joint
posterior of $(\beta, \lambda)$ as a scatter of MCMC draws.
The ridge structure is a vivid illustration of weak
identification and motivates the normalisation discussed in
Section~\ref{sec:ident}.
\item \textbf{Convergence and mixing.} Run multiple chains from
dispersed starting points. Compute $\hat R$ (the
Gelman--Rubin statistic) for each parameter. For NUTS this
should be close to 1.0; for a poorly tuned
Metropolis--Hastings it may not be.
\item \textbf{Effective sample size.} Compare the ESS per second
for NUTS versus a random-walk Metropolis--Hastings sampler.
The comparison is especially stark for the $(\beta,\lambda)$
pair, where NUTS exploits gradient information to traverse
the ridge efficiently.
\item \textbf{Posterior predictive checks.} Generate simulated
paths $\tilde{\mathbf{u}}$ from the posterior predictive
distribution. Compare the distribution of simulated paths to
the observed series on summary statistics: proportion of time
above $7\%$, mean duration of spells above $7\%$, maximum
one-year change. Systematic discrepancies reveal model
misspecification.
\end{enumerate}
% =========================================================
\section{Computational Implementation}
% =========================================================
\subsection{NumPyro model}
The model maps naturally to NumPyro. A sketch of the model
function:
\begin{verbatim}
import numpyro
import numpyro.distributions as dist
import jax.numpy as jnp
def unemployment_model(u):
# Priors
ubar = numpyro.sample("ubar", dist.Normal(5.0, 1.5))
beta = numpyro.sample("beta", dist.TruncatedNormal(
-0.3, 0.2, high=0.0))
lam = numpyro.sample("lam", dist.HalfNormal(1.0))
alpha = numpyro.sample("alpha", dist.Normal(0.0, 0.2))
sigma = numpyro.sample("sigma", dist.HalfNormal(0.5))
# Conditional mean
gap = u[:-1] - ubar
mu = u[:-1] + alpha + beta * jnp.sinh(lam * gap)
# Likelihood
with numpyro.plate("obs", len(u) - 1):
numpyro.sample("u_obs", dist.Normal(mu, sigma),
obs=u[1:])
\end{verbatim}
Run with \texttt{numpyro.infer.NUTS} and
\texttt{numpyro.infer.MCMC}. Typical settings: 4 chains, 1000
warmup steps, 2000 sampling steps. On a laptop CPU with monthly
post-1984 data this runs in under a minute.
\subsection{Diagnostics checklist}
\begin{enumerate}[itemsep=4pt]
\item \textbf{Trace plots.} Visual inspection of MCMC chains for
each parameter. Well-mixed chains look like white noise
around a stable mean.
\item \textbf{$\hat R$ statistic.} Values above $1.01$ signal
convergence failure. Use \texttt{arviz.summary()} which
computes $\hat R$ for all parameters simultaneously.
\item \textbf{Effective sample size (ESS).} Both bulk-ESS and
tail-ESS should be reported. Low tail-ESS indicates poor
exploration of the tails of the posterior, which matters for
risk-oriented applications.
\item \textbf{Energy plots.} ArviZ's \texttt{plot\_energy()} shows
whether the Hamiltonian energy distribution is well-behaved.
A gap between the marginal energy distribution and the
energy transition distribution suggests the sampler is not
exploring the posterior well.
\item \textbf{Pair plots.} Scatter plots of all parameter pairs
from posterior draws, coloured by chain. The
$(\beta,\lambda)$ pair will show the ridge; pair plots also
reveal any multimodality.
\item \textbf{Posterior predictive checks.} Compare observed
$\Delta u_t$ to the posterior predictive distribution of
$\Delta u_t$. Plot the observed series against a band of
posterior predictive paths.
\item \textbf{Leave-one-out cross-validation (LOO-CV).} ArviZ
implements Pareto-smoothed importance sampling LOO
(\texttt{arviz.loo()}). High Pareto $k$ values flag
observations that are influential and poorly fit by the
model. The COVID observation will certainly flag here.
\end{enumerate}
\subsection{Alternative samplers}
A valuable component of the lecture is comparing NUTS to simpler
alternatives.
\begin{itemize}[itemsep=6pt]
\item \textbf{Random-walk Metropolis--Hastings (RWMH).}
Implement a simple RWMH sampler in JAX for this model.
This is conceptually straightforward and students can write
it from scratch. Compare ESS per second to NUTS. The
comparison will be unfavourable for RWMH on the
$(\beta,\lambda)$ ridge.
\item \textbf{NUTS vs HMC with fixed step count.} NumPyro exposes
both. NUTS adapts the trajectory length automatically; fixed
HMC requires tuning. Show the sensitivity of fixed HMC to
the number of leapfrog steps.
\item \textbf{Variational inference (SVI).} NumPyro supports
automatic guide generation via \texttt{AutoNormal} and
\texttt{AutoMultivariateNormal}. A mean-field
\texttt{AutoNormal} guide will miss the $(\beta,\lambda)$
correlation; a full-rank \texttt{AutoMultivariateNormal}
will capture it. Comparing these to the NUTS posterior is
instructive.
\end{itemize}
% =========================================================
\section{Extensions and Connections}
% =========================================================
\subsection{Time-varying natural rate}
Replace the fixed $\bar u$ with a slowly drifting process:
\[
\bar u_t = \bar u_{t-1} + \xi_t, \quad \xi_t \sim \mathcal{N}(0, \tau^2),
\]
with a tight prior on $\tau$ (e.g.\ $\mathrm{HalfNormal}(0.05)$).
Now the model has a latent state and filtering becomes relevant.
This is the natural bridge to the Kalman filter lectures already in
QuantEcon and to the fisheries application where latent states are
unavoidable.
\subsection{Heteroskedastic shocks}
Allow $\sigma_t^2$ to depend on the gap:
\[
\sigma_t^2 = \sigma_0^2 + \sigma_1^2 \cdot \mathbf{1}[u_{t-1} > \bar u].
\]
This captures the well-documented increase in volatility during
recessions. It adds one parameter and can be estimated without
modification to the NUTS setup.
\subsection{Connection to surplus production models}
The logistic surplus production model (Schaefer 1954) for fish
biomass $B_t$ is:
\[
B_t = B_{t-1} + r B_{t-1}\!\left(1 - \frac{B_{t-1}}{K}\right) - C_{t-1}
+ \eta_t,
\]
where $r$ is the intrinsic growth rate, $K$ is the carrying capacity,
and $C_{t-1}$ is the catch. Setting $C_{t-1} = 0$ and defining
$g_t = B_{t-1} - K/2$ (the gap from the biomass that maximises
sustainable yield), the growth term becomes
\[
r B_{t-1}\!\left(1 - \frac{B_{t-1}}{K}\right)
\;=\;
\frac{r}{4K}(K^2 - 4g_t^2)
\;=\;
\frac{rK}{4} - \frac{r}{K}g_t^2,
\]
a concave function of $g_t$ that pushes $B_t$ back toward $K/2$.
The $\sinh$ model is a smooth, globally defined analogue: it too
pushes $u_t$ back toward $\bar u$ with a force that grows super-linearly
with the gap. Making this structural analogy explicit helps students
transfer the Bayesian estimation skills from the unemployment setting
to fisheries without relearning the concepts.
% =========================================================
\section{Suggested Lecture Structure}
% =========================================================
\begin{enumerate}[itemsep=6pt]
\item \textbf{Motivation} (10 min). Plot US unemployment 1948--present.
Highlight asymmetry and nonlinear mean reversion visually.
Show that a linear AR(1) on the gap produces symmetric,
constant-speed reversion — a poor fit.
\item \textbf{The $\sinh$ model} (15 min). Derive
equation~\eqref{eq:model}. Plot the restoring force as a
function of the gap for different $\lambda$. Discuss the
linear approximation and when it breaks down.
\item \textbf{Bayesian formulation} (15 min). Write down the
likelihood and priors. Discuss prior choices. Show prior
predictive simulations — draws from the prior predictive
distribution — to sanity-check that priors are reasonable.
\item \textbf{NUTS estimation in NumPyro} (20 min). Live coding
or walk through the model function. Run the sampler. Show
trace plots and $\hat R$ in real time.
\item \textbf{Posterior analysis} (20 min). Marginal posteriors,
pair plots, $(\beta,\lambda)$ ridge. Posterior predictive
checks. Answer the substantive questions from
Section~5.1.
\item \textbf{Comparison with MH} (10 min). Run the hand-coded
RWMH. Compare mixing and ESS. Discuss why HMC is better
for curved posteriors.
\item \textbf{Outlook} (5 min). Sketch the extension to latent
states and the fisheries application.
\end{enumerate}
% =========================================================
\bibliographystyle{plainnat}
\begin{thebibliography}{99}
\bibitem[Neftçi(1984)]{Neftci1984}
Neftçi, S.N.\ (1984).
\newblock Are economic time series asymmetric over the business cycle?
\newblock \textit{Journal of Political Economy}, 92(2):307--328.
\bibitem[Hamilton(1989)]{Hamilton1989}
Hamilton, J.D.\ (1989).
\newblock A new approach to the economic analysis of nonstationary time series
and the business cycle.
\newblock \textit{Econometrica}, 57(2):357--384.
\bibitem[Koop and Potter(1999)]{KoopPotter1999}
Koop, G.\ and Potter, S.M.\ (1999).
\newblock Dynamic asymmetries in US unemployment.
\newblock \textit{Journal of Business \& Economic Statistics}, 17(3):298--312.
\bibitem[Stock and Watson(1998)]{StockWatson1998}
Stock, J.H.\ and Watson, M.W.\ (1998).
\newblock Median unbiased estimation of coefficient variance in a time-varying
parameter model.
\newblock \textit{Journal of the American Statistical Association},
93(441):349--358.
\bibitem[Schaefer(1954)]{Schaefer1954}
Schaefer, M.B.\ (1954).
\newblock Some aspects of the dynamics of populations important to the
management of commercial marine fisheries.
\newblock \textit{Bulletin of the Inter-American Tropical Tuna Commission},
1(2):27--56.
\bibitem[Hoffman and Gelman(2014)]{HoffmanGelman2014}
Hoffman, M.D.\ and Gelman, A.\ (2014).
\newblock The No-U-Turn Sampler: Adaptively setting path lengths in
Hamiltonian Monte Carlo.
\newblock \textit{Journal of Machine Learning Research}, 15:1593--1623.
\bibitem[Phan et al.(2019)]{Phan2019}
Phan, D., Pradhan, N., and Jankowiak, M.\ (2019).
\newblock Composable effects for flexible and accelerated probabilistic
programming in NumPyro.
\newblock \textit{arXiv preprint arXiv:1912.11554}.
\bibitem[Vehtari et al.(2021)]{Vehtari2021}
Vehtari, A., Gelman, A., Simpson, D., Carpenter, B., and B{\"u}rkner, P.-C.\
(2021).
\newblock Rank-normalization, folding, and localization: An improved $\hat{R}$
for assessing convergence of MCMC.
\newblock \textit{Bayesian Analysis}, 16(2):667--718.
\end{thebibliography}
\end{document}
Goal
Add a new lecture,
unemployment_bayes.md— Bayesian Estimation of Nonlinear Unemployment Dynamics — covering a smooth nonlinear mean-reversion model for the US unemployment rate, estimated with NUTS in NumPyro, with the full Bayesian-workflow toolkit (prior/posterior predictive checks, R̂, ESS, energy/pair plots, LOO-PSIS, and a NUTS-vs-RWMH-vs-SVI comparison).It sits alongside the existing Bayesian lectures
ar1_bayes.mdandbayes_nonconj.md, and is intended as step one of a sequence leading toward a latent-state fisheries/stock-assessment lecture.Origin
Drafted from
unemployment_bayes_lecture.tex(full source embedded below). The draft was reviewed; the Bayesian scaffolding is strong, but the core model needed changes before implementation (see below). Decisions are settled.Review findings (settled)
The original draft used a
sinhrestoring force:Δu_t = α + β·sinh(λ(u_{t-1} − ū)) + ε_t. Three substantive problems:sinhis dynamically explosive over the empirically relevant range. A super-exponential force in a discrete-time map overshoots and diverges. At the prior means, a recession-sized gap (u = 8–9%, which occurred in 1975, 1982, 2009) detonates within a few steps; ~23% of draws from the stated priors produce impossible (negative or NaN) unemployment paths. Verified by simulation:sinhis symmetric, but the headline stylized fact is asymmetry (Neftçi: rises fast, falls slow). The oddsinhgives equal pull both directions; asymmetry was only bolted on as an afterthought.rB(1−B/K)is an even concave parabola, and K (not K/2) is the equilibrium — K/2 is the MSY biomass, which is only the rest point under MSY harvesting.Agreed design
tanh(in percentage points):βλ, preserving the β–λ identification story), and saturating (|tanh| ≤ 1) so the mean one-step pull is bounded by|β|and the process is globally stable. β reads directly as the max monthly mean-reversion pull (pp); λ sets the gap width of the linear→saturated transition. This is already an LSTAR-family form, tying it to the canonical nonlinear-unemployment literature (Koop–Potter, Teräsvirta).sinhdemoted to a cautionary prior-predictive exercise — show it detonates and that ~23% of prior draws are impossible. Lesson: prior predictive checks catch pathologies that look fine equation-by-equation. (Turns the original weakness into the best teaching moment.)γ(u_{t-1}−ū)⁺(intuitive), then LSTAR (literature-grade).ū~N(5,1.5²),β~TruncNormal(-0.3,0.2²,high=0),λ~HalfNormal(0.5),α~N(0,0.1²),σ~HalfNormal(0.3).x=logit(u), respects (0,1) exactly) offered as an extension, keeping the core interpretable in pp.Section plan
tanhsmooth-transition force; stability box vssinh; β–λ identificationsinhcautionary taleNext steps
tanhmodel on real UNRATE data; confirm the posterior, the β–λ ridge, and that PPC behavessinhcautionary-tale figures (deterministic blow-up + prior-predictive failure rate)lectures/unemployment_bayes.mdin MyST, followingar1_bayes.mdconventions (GPU admonition include, numpyro/jax/arviz/pymc imports)lectures/_toc.ymlnear the other Bayesian lecturesOriginal draft source —
unemployment_bayes_lecture.tex(Embedded for self-containment. John may also drag-drop the original
.texonto this issue if a downloadable copy is preferred.)