From 1a4f0523d9b38ac0f62c1dce87cd8e740c1849c9 Mon Sep 17 00:00:00 2001 From: thomassargent30 Date: Sun, 12 Apr 2026 12:59:29 -0600 Subject: [PATCH 01/26] Tom's edits of a new Kihlstrom lecture --- .../quantecon-lecture-writing.instructions.md | 87 ++ .github/prompts/quantecon-lecture.prompt.md | 209 +++ lectures/_static/quant-econ.bib | 53 + lectures/_toc.yml | 1 + lectures/information_market_equilibrium.md | 1185 +++++++++++++++++ 5 files changed, 1535 insertions(+) create mode 100644 .github/instructions/quantecon-lecture-writing.instructions.md create mode 100644 .github/prompts/quantecon-lecture.prompt.md create mode 100644 lectures/information_market_equilibrium.md diff --git a/.github/instructions/quantecon-lecture-writing.instructions.md b/.github/instructions/quantecon-lecture-writing.instructions.md new file mode 100644 index 000000000..ee3b607f7 --- /dev/null +++ b/.github/instructions/quantecon-lecture-writing.instructions.md @@ -0,0 +1,87 @@ +--- +applyTo: "lectures/**/*.md" +description: "MyST markdown and QuantEcon lecture writing conventions. Applied when editing or creating files in the lectures/ directory." +--- + +# QuantEcon Lecture Writing Conventions + +## Equation Spacing (Critical) + +Display equations **must** have a blank line before `$$` and after `$$`: + +``` +text before + +$$ +equation here +$$ + +text after +``` + +Never place `$$` immediately adjacent to text lines. + +## File Frontmatter + +Every lecture `.md` file starts with: + +```yaml +--- +jupytext: + text_representation: + extension: .md + format_name: myst + format_version: 0.13 + jupytext_version: 1.17.1 +kernelspec: + display_name: Python 3 (ipykernel) + language: python + name: python3 +--- +``` + +## Cross-Reference Label + +Immediately after frontmatter, before the title: + +``` +(lecture_label)= +```{raw} jupyter +
...
+``` + +# Title +``` + +## Code Cells + +All executable Python uses `` ```{code-cell} ipython3 ``. +Non-executable code uses `` ```python ``. + +## Citations and References + +- Cite with `{cite}` `` `BibKey` `` +- Check `lectures/_static/quant-econ.bib` for existing keys before adding new ones +- New references go in a separate `_extra.bib` file alongside the lecture + +## Exercises + +Use the paired directives: + +``` +```{exercise} +:label: label_ex1 +... +``` + +```{solution-start} label_ex1 +:class: dropdown +``` +... +```{solution-end} +``` +``` + +## Preferred Python Libraries + +`numpy`, `scipy`, `matplotlib`, `quantecon`, `jax` (for computationally intensive work), `numba` diff --git a/.github/prompts/quantecon-lecture.prompt.md b/.github/prompts/quantecon-lecture.prompt.md new file mode 100644 index 000000000..453651434 --- /dev/null +++ b/.github/prompts/quantecon-lecture.prompt.md @@ -0,0 +1,209 @@ +--- +name: "QuantEcon Lecture from Paper" +description: "Convert a scientific paper (PDF or .tex) into a QuantEcon lecture in MyST markdown. Attach the paper file before invoking. Produces a .md lecture file and a supplementary .bib file." +argument-hint: "Attach the paper PDF or .tex file, then optionally specify the desired output filename (e.g. 'my_topic.md')" +agent: "agent" +--- + +You are helping Thomas Sargent convert a scientific paper into a QuantEcon lecture written in the MyST dialect of markdown, following the style and conventions of [lectures/likelihood_ratio_process.md](../lectures/likelihood_ratio_process.md). + +## Your Task + +1. **Read the attached paper** (PDF or .tex). Understand its core economic/mathematical content, key results, key intuitions, and analytical techniques. + +2. **Draft a complete QuantEcon lecture** as a `.md` file in `lectures/`. 
The lecture should: + - Explain the paper's ideas accessibly to a graduate student audience + - Lead the reader through the theory step by step, not just summarize + - Include substantial Python code cells that illustrate, compute, and visualize the paper's key results + - End with exercises (with full solutions in dropdown blocks) + +3. **Produce a supplementary `.bib` file** for any references not already in [lectures/_static/quant-econ.bib](../lectures/_static/quant-econ.bib). + +--- + +## MyST / Jupyter Book Format Rules + +Follow these rules exactly. Study [lectures/likelihood_ratio_process.md](../lectures/likelihood_ratio_process.md) as the canonical example. + +### File Frontmatter (required, verbatim structure) + +``` +--- +jupytext: + text_representation: + extension: .md + format_name: myst + format_version: 0.13 + jupytext_version: 1.17.1 +kernelspec: + display_name: Python 3 (ipykernel) + language: python + name: python3 +--- +``` + +### Required Header Block + +Immediately after the frontmatter, add a cross-reference label and the QuantEcon logo block: + +``` +(my_lecture_label)= +```{raw} jupyter +
+ + QuantEcon + +
+``` + +# Lecture Title + +```{contents} Contents +:depth: 2 +``` +``` + +### Equations — CRITICAL SPACING RULE + +Every display equation block **must** have a blank line before the opening `$$` and a blank line after the closing `$$`. This is mandatory. + +**Correct:** + +``` +some text before + +$$ +E[x] = \mu +$$ + +some text after +``` + +**Wrong (will break the build):** + +``` +some text before +$$ +E[x] = \mu +$$ +some text after +``` + +Inline math uses single dollars: $\mu$, $\sigma^2$. + +Multi-line aligned equations use: + +``` +$$ +\begin{aligned} +a &= b \\ +c &= d +\end{aligned} +$$ +``` + +### Code Cells + +Use ` ```{code-cell} ipython3 ` for all executable Python. For the `pip install` cell at the top (if needed): + +``` +```{code-cell} ipython3 +:tags: [hide-output] +!pip install --upgrade quantecon +``` +``` + +### Citations + +Use `{cite}` with the BibTeX key: `{cite}` `` `Author_Year` ``. Example: `{cite}` `` `Neyman_Pearson` ``. + +Check [lectures/_static/quant-econ.bib](../lectures/_static/quant-econ.bib) first. Add only truly missing references to the new `.bib` file. + +### Cross-references + +- Link to other lectures: `{doc}` `` `likelihood_ratio_process` `` +- Label a section: `(my_label)=` on the line before the heading +- Reference a label: `{ref}` `` `my_label` `` + +### Admonitions + +``` +```{note} +... +``` + +```{warning} +... +``` +``` + +### Exercises with Solutions + +``` +```{exercise} +:label: ex_label1 + +Exercise text here. +``` + +```{solution-start} ex_label1 +:class: dropdown +``` + +Full solution here, including code cells if needed. + +```{solution-end} +``` +``` + +--- + +## Lecture Structure Template + +Follow this section order: + +1. **Overview** — What is this lecture about? What will the reader learn? List bullets. +2. **Setup** — Imports code cell (all needed libraries). If non-standard packages are needed, add the `pip install` cell first. +3. **Theory sections** — Walk through mathematical content. 
Alternate prose, equations, and code cells. Each major concept gets its own `##` section. +4. **Computational/Simulation sections** — Python code that replicates or extends the paper's numerical results. +5. **Exercises** — 2–4 exercises ranging from straightforward to challenging, each with a full solution. +6. **References** — at the end, just add: `` ```{bibliography} `` on its own if references were cited (the global bib handles this automatically via `_config.yml`). + +--- + +## Python Code Guidelines + +- Use `numpy`, `scipy`, `matplotlib`, `quantecon` as the default stack +- Prefer `jax.numpy` / JAX for computationally intensive sections (this repo already has JAX installed) +- Every figure should call `plt.show()` or `plt.tight_layout(); plt.show()` +- Write clean, readable code with short docstrings on functions +- Simulate and plot the paper's key theoretical results rather than just describing them + +--- + +## Supplementary BibTeX File + +Name it `lectures/_static/_extra.bib`. Format example: + +```bibtex +@article{Author_Year, + author = {Last, First and Last2, First2}, + title = {Full Title of the Paper}, + journal = {Journal Name}, + volume = {XX}, + number = {Y}, + pages = {1--30}, + year = {YYYY} +} +``` + +Only include references **not already found** in `lectures/_static/quant-econ.bib`. + +--- + +## Output + +Produce the complete lecture as a single MyST markdown file. After completing it, also report: +- The name and path of the output file (e.g. 
`lectures/my_topic.md`) +- The name and path of the supplementary bib file (if any new references were needed) +- A brief (3–5 bullet) summary of what the lecture covers diff --git a/lectures/_static/quant-econ.bib b/lectures/_static/quant-econ.bib index 82f5cc7ec..547006f33 100644 --- a/lectures/_static/quant-econ.bib +++ b/lectures/_static/quant-econ.bib @@ -3631,3 +3631,56 @@ @article{shwartz_ziv_tishby2017 journal = {arXiv preprint arXiv:1703.00810}, year = 2017 } + +@article{kihlstrom_mirman1975, + author = {Kihlstrom, Richard E. and Mirman, Leonard J.}, + title = {Information and Market Equilibrium}, + journal = {The Bell Journal of Economics}, + volume = {6}, + number = {1}, + pages = {357--376}, + year = {1975}, + publisher = {The RAND Corporation} +} + +@article{muth1961, + author = {Muth, John F.}, + title = {Rational Expectations and the Theory of Price Movements}, + journal = {Econometrica}, + volume = {29}, + number = {3}, + pages = {315--335}, + year = {1961} +} + +@article{radner1972, + author = {Radner, Roy}, + title = {Existence of Equilibrium Plans, Prices, and Price Expectations + in a Sequence of Markets}, + journal = {Econometrica}, + volume = {40}, + number = {2}, + pages = {289--304}, + year = {1972} +} + +@article{arrow1964, + author = {Arrow, Kenneth J.}, + title = {The Role of Securities in the Optimal Allocation of Risk-Bearing}, + journal = {Review of Economic Studies}, + volume = {31}, + number = {2}, + pages = {91--96}, + year = {1964} +} + +@article{grossman1976, + author = {Grossman, Sanford J.}, + title = {On the Efficiency of Competitive Stock Markets Where Trades Have + Diverse Information}, + journal = {Journal of Finance}, + volume = {31}, + number = {2}, + pages = {573--585}, + year = {1976} +} diff --git a/lectures/_toc.yml b/lectures/_toc.yml index 28999d83f..a24169f91 100644 --- a/lectures/_toc.yml +++ b/lectures/_toc.yml @@ -43,6 +43,7 @@ parts: - file: exchangeable - file: likelihood_bayes - file: blackwell_kihlstrom + - 
file: information_market_equilibrium
 - file: mix_model
 - file: navy_captain
 - file: merging_of_opinions
diff --git a/lectures/information_market_equilibrium.md b/lectures/information_market_equilibrium.md
new file mode 100644
index 000000000..00063a9a1
--- /dev/null
+++ b/lectures/information_market_equilibrium.md
@@ -0,0 +1,1185 @@
---
jupytext:
  text_representation:
    extension: .md
    format_name: myst
    format_version: 0.13
    jupytext_version: 1.17.1
kernelspec:
  display_name: Python 3 (ipykernel)
  language: python
  name: python3
---

(information_market_equilibrium)=
```{raw} jupyter

```

# Information and Market Equilibrium

```{contents} Contents
:depth: 2
```

## Overview

This lecture studies two questions about the **informational role of prices** posed and
answered by {cite:t}`kihlstrom_mirman1975`.

1. **When do prices transmit inside information?** An informed insider observes a private
   signal correlated with an unknown state of the world and adjusts demand accordingly.
   Equilibrium prices shift. Under what conditions can an outside observer *infer* the
   insider's private signal from the equilibrium price?

2. **Do Bayesian price expectations converge?** In a stationary stochastic exchange
   economy, an uninformed observer uses the history of market prices and Bayes' Law to form
   expectations about the economy's structure. Do those expectations eventually
   agree with those of a fully informed observer?

Kihlstrom and Mirman's answers rely on two classical ideas from statistics:

- **Blackwell sufficiency**: a random variable $\tilde{y}$ is *sufficient* for
  $\tilde{y}'$ with respect to an unknown state if knowing $\tilde{y}$ gives all the
  information about the state that $\tilde{y}'$ contains.
- **Bayesian consistency**: as the sample grows, the posterior concentrates on the true
  parameter value (even when the underlying economic structure is not globally identified
  from prices alone). 

Important findings of {cite:t}`kihlstrom_mirman1975` are:

- Equilibrium prices transmit inside information **if and only if** the map from the
  insider's posterior distribution to the equilibrium price vector is invertible
  (one-to-one).
- For a two-state pure exchange economy with CES preferences, invertibility holds whenever the
  elasticity of substitution $\sigma \neq 1$. With Cobb-Douglas preferences ($\sigma = 1$)
  the equilibrium price is independent of the insider's posterior, so information is never
  transmitted.
- In the dynamic economy, as information accumulates, Bayesian price expectations converge
  to **rational expectations**, even when the deep structure of the economy is not identified.

```{note}
{cite:t}`kihlstrom_mirman1975` use the terms "reduced form" and "structural model" in the same
way that careful econometricians do. These two objects come in pairs. To each structure or structural model
there is a reduced form, or collection of reduced forms traced out by different possible regressions.
```

The lecture is organized as follows.

1. Set up the static two-commodity model and define equilibrium.
2. State the price-revelation theorem (Theorem 1 of the paper) and the invertibility
   conditions (Theorem 2).
3. Illustrate invertibility — and its failure — with numerical examples using CES and
   Cobb-Douglas preferences.
4. Introduce the dynamic stochastic economy and derive the Bayesian convergence result.
5. Simulate Bayesian learning from price observations.

This lecture builds on ideas in {doc}`blackwell_kihlstrom` and {doc}`likelihood_bayes`.

## Setup

```{code-cell} ipython3
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import brentq
from scipy.stats import norm
```

## A Two-Commodity Economy with an Informed Insider

### Preferences, Endowments, and the Unknown State

The economy has two goods. 
+ +Good 2 is the numeraire (price normalized to 1); good 1 trades +at price $p > 0$. + +An unknown parameter $\bar{a}$ affects the value of good 1. + +Agent $i$'s expected utility +from a bundle $(x_1^i, x_2^i)$ is + +$$ +U^i(x_1^i, x_2^i) + = \sum_{s=1}^{S} u^i(a_s x_1^i,\, x_2^i)\, PR^i(\bar{a} = a_s), +$$ + +where $PR^i$ is agent $i$'s subjective probability distribution over the finite state space +$A = \{a_1, \ldots, a_S\}$. + +Each agent starts with an endowment $w^i$ of good 2 and a share $\theta^i$ of the +representative firm. + +The firm's profit $\pi$ is determined by profit maximization. + +Agent +$i$'s **budget constraint** is + +$$ +p x_1^i + x_2^i = w^i + \theta^i \pi. +$$ + +Agents maximize expected utility subject to their budget constraints. + +A **competitive +equilibrium** is a price $\hat{p}$ that clears both markets simultaneously. + +### The Informed Agent's Problem + +Suppose **agent 1** (the insider) observes a private signal $\tilde{y}$ correlated with +$\bar{a}$ before trading. + +Upon observing $\tilde{y} = y$, agent 1 updates their prior +$\mu = PR^1$ to a **posterior** $\mu_y = (\mu_{y1}, \ldots, \mu_{yS})$ via Bayes' rule: + +$$ +\mu_{ys} = PR(\bar{a} = a_s \mid \tilde{y} = y). +$$ + +Because agent 1's demand depends on $\mu_y$, the new equilibrium price satisfies + +$$ +\hat{p} = p(\mu_y). +$$ + +Outside observers who see $\hat{p}$ but not $\tilde{y}$ can try to *back out* the +insider's posterior from the price. + +This is possible when the map $\mu \mapsto p(\mu)$ +is **invertible** on the relevant domain. + +(price_revelation_theorem)= +## Price Revelation: Theorem 1 + +### Blackwell Sufficiency + +The price variable $p(\mu_{\tilde{y}})$ *accurately transmits* the insider's private +information if observing the equilibrium price is just as informative about $\bar{a}$ as +observing the signal $\tilde{y}$ directly. 
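To make the insider's update concrete, here is a minimal Bayes'-rule sketch; the likelihood matrix and prior below are illustrative numbers, not values from the paper.

```python
import numpy as np

# phi[s, y] = PR(signal y | state a_s); rows are states, columns are signals.
phi = np.array([[0.8, 0.2],   # state a_1
                [0.3, 0.7]])  # state a_2
mu = np.array([0.5, 0.5])     # agent 1's prior (PR^1(a_1), PR^1(a_2))

def posterior(y):
    """Bayes' rule: mu_{ys} is proportional to phi[s, y] * mu[s]."""
    unnorm = phi[:, y] * mu
    return unnorm / unnorm.sum()

print(posterior(0))   # a signal that is likely under a_1 shifts weight toward a_1
print(posterior(1))   # a signal that is likely under a_2 shifts weight toward a_2
```

Each such posterior is exactly the object $\mu_y$ that enters the equilibrium price $p(\mu_y)$.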
+ +In Blackwell's language ({cite}`blackwell1951` and {cite}`blackwell1953`), this means +$p(\mu_{\tilde{y}})$ is **sufficient** for $\tilde{y}$. + +**Definition.** A random variable $\tilde{y}$ is *sufficient* for $\tilde{y}'$ (with +respect to $\bar{a}$) if there exists a conditional distribution $PR(y' \mid y)$, +**independent of $\bar{a}$**, such that + +$$ +\phi'_a(y') = \sum_{y \in Y} PR(y' \mid y)\, \phi_a(y) +\quad \text{for all } a \text{ and all } y', +$$ + +where $\phi_a(y) = PR(\tilde{y} = y \mid \bar{a} = a)$. + +Thus, once $\tilde{y}$ is known, $\tilde{y}'$ provides no additional information +about $\bar{a}$. + +**Lemma 1** ({cite:t}`kihlstrom_mirman1975`). The posterior distribution $\mu_{\tilde{y}}$ +is sufficient for $\tilde{y}$. + +*Proof sketch.* The posterior $\mu_{\tilde{y}}$ satisfies + +$$ +PR(\bar{a} = a_s \mid \mu_{\tilde{y}} = \mu_y,\; \tilde{y} = y) = \mu_{ys} + = PR(\bar{a} = a_s \mid \mu_{\tilde{y}} = \mu_y). +$$ + +Because the posterior itself *encodes* what $\tilde{y}$ says about $\bar{a}$, observing +$\tilde{y}$ directly would add no information. $\square$ + +**Theorem 1** ({cite:t}`kihlstrom_mirman1975`). In the economy described above, the price +random variable $p(\mu_{\tilde{y}})$ is sufficient for $\tilde{y}$ **if and only if** the +function $p(PR^1)$ is **invertible** on the set + +$$ +P \equiv \bigl\{\, p(\mu_y) : y \in Y,\; + PR(\tilde{y} = y) = \sum_{a \in A} \phi_a(y)\,\mu(a) > 0 \bigr\}. +$$ + +The "only if" direction follows because if $p$ were not one-to-one, two different posteriors +would generate the same price; an observer could not distinguish them, so the price would +not transmit all information that resides in the signal. + +### Two Interpretations + +**Insider trading in a stock market.** Good 1 is a risky asset with random return $\bar{a}$; +good 2 is ''money''. An insider's demand reveals private information about the return. 
+If the invertibility condition holds, outside observers can read the insider's signal from +the equilibrium stock price. + +**Price as a quality signal.** Good 1 has uncertain quality $\bar{a}$. Experienced +consumers (who have sampled the good) observe a signal correlated with quality and buy +accordingly. Uninformed consumers can infer quality from the market price, provided +invertibility holds. + +(invertibility_conditions)= +## Invertibility and the Elasticity of Substitution (Theorem 2) + +When does $p(PR^1)$ fail to be invertible? + +Theorem 2 of {cite:t}`kihlstrom_mirman1975` +shows that for a two-state economy ($S = 2$), the answer turns on the **elasticity of +substitution** $\sigma$ of agent 1's utility function. + +### The Two-State First-Order Condition + +With $S = 2$ and $\mu = (q,\, 1-q)$, the first-order condition for agent 1's demand +(equation (12a) in the paper) reduces to + +$$ +p(q) = \frac{\alpha_1 q + \alpha_2 (1-q)}{\beta_1 q + \beta_2 (1-q)}, +$$ + +where + +$$ +\alpha_s = a_s\, u^1_1(a_s x_1,\, x_2), \qquad +\beta_s = u^1_2(a_s x_1,\, x_2), \qquad s = 1, 2. +$$ + +The equilibrium consumption $(x_1, x_2)$ itself depends on $p$, so this is an implicit +equation in $p$. + +**Theorem 2** ({cite:t}`kihlstrom_mirman1975`). Assume $u^1$ is quasi-concave and +homothetic with continuous first partials. Assume agent 1 always consumes positive +quantities of both goods. For $S = 2$: + +- If $\sigma < 1$ for all feasible allocations, $p(PR^1)$ is **invertible** on $P$. +- If $\sigma > 1$ for all feasible allocations, $p(PR^1)$ is **invertible** on $P$. +- If $u^1$ is **Cobb-Douglas** ($\sigma = 1$), $p(PR^1)$ is **constant** on $P$ + (no information is transmitted). + +Thus, when $\sigma = 1$ the income and substitution effects exactly cancel, +making agent 1's demand for good 1 independent of information about $\bar{a}$. + +So the market price cannot reveal that information. 
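Before turning to specific functional forms, the two-state formula for $p(q)$ can be explored directly at a fixed consumption bundle. Holding $(x_1, x_2)$ fixed is a deliberate simplification (in equilibrium the bundle varies with $p$), and all parameter values below are illustrative.

```python
import numpy as np

def ces_partials(c1, c2, rho):
    """Partial derivatives (u_1, u_2) of u = (c1^rho + c2^rho)^(1/rho)."""
    common = (c1**rho + c2**rho)**(1/rho - 1)
    return common * c1**(rho - 1), common * c2**(rho - 1)

def p_of_q(q, a_states, x1, x2, rho):
    """p(q) = (alpha_1 q + alpha_2 (1-q)) / (beta_1 q + beta_2 (1-q))."""
    alphas = np.array([a * ces_partials(a * x1, x2, rho)[0] for a in a_states])
    betas = np.array([ces_partials(a * x1, x2, rho)[1] for a in a_states])
    w = np.array([q, 1 - q])
    return (alphas @ w) / (betas @ w)

a_states, x1, x2 = (2.0, 0.5), 1.0, 1.0
for rho in (-0.5, 0.5):            # sigma = 2/3 and sigma = 2
    ps = [p_of_q(q, a_states, x1, x2, rho) for q in (0.1, 0.5, 0.9)]
    print(f"rho = {rho}: p(q) at q = 0.1, 0.5, 0.9 ->", np.round(ps, 4))

# Cobb-Douglas u = sqrt(c1*c2): a*u_1/u_2 = x2/x1 for every state value a,
# so the weighted ratio p(q) is the same constant for all posteriors q.
cd_ratio = lambda a: (a * 0.5 * np.sqrt(x2 / (a * x1))) / (0.5 * np.sqrt(a * x1 / x2))
print([round(cd_ratio(a), 6) for a in a_states])
```

With $\sigma \neq 1$ the printed values of $p(q)$ move monotonically in $q$, while the Cobb-Douglas ratios $\alpha_s/\beta_s$ coincide across states, so $p(q)$ cannot vary with $q$.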

### CES Utility

For concreteness we work with the **constant-elasticity-of-substitution** (CES) utility
function

$$
u(c_1, c_2) = \bigl(c_1^{\rho} + c_2^{\rho}\bigr)^{1/\rho}, \qquad \rho \in (-\infty,0) \cup (0,1),
$$

whose elasticity of substitution is $\sigma = 1/(1-\rho)$.

- $\rho \to 0$: Cobb-Douglas ($\sigma = 1$).
- $\rho < 0$: $\sigma < 1$ (complements).
- $0 < \rho < 1$: $\sigma > 1$ (substitutes).

Pertinent partial derivatives are

$$
u_1(c_1,c_2) = \bigl(c_1^\rho + c_2^\rho\bigr)^{1/\rho - 1}\, c_1^{\rho-1}, \qquad
u_2(c_1,c_2) = \bigl(c_1^\rho + c_2^\rho\bigr)^{1/\rho - 1}\, c_2^{\rho-1}.
$$

### Equilibrium Price as a Function of the Posterior

We focus on agent 1 as the *only* informed trader who absorbs one unit of good 1 at
equilibrium (i.e. $x_1 = 1$).

Agent 1's budget constraint then reduces to
$x_2 = W^1 - p$, and the equilibrium price is the unique $p \in (0, W^1)$ satisfying
the first-order condition

$$
p \bigl[q\, u_2(a_1,\, W^1-p) + (1-q)\, u_2(a_2,\, W^1-p)\bigr]
= q\, a_1\, u_1(a_1,\, W^1-p) + (1-q)\, a_2\, u_1(a_2,\, W^1-p).
$$

For Cobb-Douglas utility ($\sigma = 1$), the first-order condition reduces to $p = W^1 - p$,
giving $p^* = W^1/2$ regardless of the posterior $q$—confirming that no information
is transmitted through the price in the Cobb-Douglas case.

For general CES utility we solve the first-order condition numerically below.

```{code-cell} ipython3
def ces_derivatives(c1, c2, rho):
    """
    Returns (u1, u2) for u(c1,c2) = (c1^rho + c2^rho)^(1/rho).
    Uses Cobb-Douglas limit for |rho| < 1e-4 to avoid numerical overflow. 
+ """ + if abs(rho) < 1e-4: + # Cobb-Douglas limit u = sqrt(c1*c2) + u1 = 0.5 * np.sqrt(c2 / c1) + u2 = 0.5 * np.sqrt(c1 / c2) + else: + common = (c1**rho + c2**rho)**(1/rho - 1) + u1 = common * c1**(rho - 1) + u2 = common * c2**(rho - 1) + return u1, u2 + + +def eq_price(q, a1, a2, W1, rho): + """ + Solve for the equilibrium price when the informed agent absorbs one unit + of good 1. With x1 = 1 and budget constraint x2 = W1 - p, the FOC + + p [q u2(a1, x2) + (1-q) u2(a2, x2)] = q a1 u1(a1, x2) + (1-q) a2 u1(a2, x2) + + has a unique root p* in (0, W1). + + Parameters + ---------- + q : posterior probability on state 1 (high state) + a1 : state-1 productivity value (a1 > a2) + a2 : state-2 productivity value + W1 : informed agent's wealth + rho : CES parameter (rho=0 → Cobb-Douglas; analytical p* = W1/2) + + Returns + ------- + p_star : equilibrium price, or nan if solver fails + """ + def residual(p): + x2 = W1 - p # x1 = 1 absorbed at equilibrium + u1_s1, u2_s1 = ces_derivatives(a1, x2, rho) + u1_s2, u2_s2 = ces_derivatives(a2, x2, rho) + lhs = p * (q * u2_s1 + (1 - q) * u2_s2) + rhs = q * a1 * u1_s1 + (1 - q) * a2 * u1_s2 + return lhs - rhs + + try: + return brentq(residual, 1e-6, W1 - 1e-6, xtol=1e-10) + except ValueError: + return np.nan +``` + +```{code-cell} ipython3 +# ── Economy parameters ────────────────────────────────────────────────────── +a1, a2 = 2.0, 0.5 # state values (a1 > a2) +W1 = 4.0 # informed agent's wealth; equilibrium x2 = W1 - p + +# Posterior grid +q_grid = np.linspace(0.05, 0.95, 200) + +# rho values to compare: complements (<0), Cobb-Douglas (=0), substitutes (>0) +rho_values = [-0.5, 0.0, 0.5] +rho_labels = [r"$\rho = -0.5$ ($\sigma = 0.67$, complements)", + r"$\rho = 0$ ($\sigma = 1$, Cobb-Douglas)", + r"$\rho = 0.5$ ($\sigma = 2$, substitutes)"] +colors = ["steelblue", "crimson", "forestgreen"] + +fig, ax = plt.subplots(figsize=(8, 5)) + +for rho, label, color in zip(rho_values, rho_labels, colors): + prices = [eq_price(q, a1, a2, 
W1, rho) for q in q_grid]
    ax.plot(q_grid, prices, label=label, color=color, lw=2)

ax.set_xlabel(r"Posterior probability $q = \Pr(\bar{a} = a_1)$", fontsize=12)
ax.set_ylabel("Equilibrium price $p^*(q)$", fontsize=12)
ax.set_title("Equilibrium price as a function of the informed agent's posterior",
             fontsize=12)
ax.legend(fontsize=10)
ax.grid(alpha=0.3)
plt.tight_layout()
plt.show()
```

The plot confirms Theorem 2.

- **CES with $\sigma \neq 1$**: the equilibrium price is **strictly monotone** in $q$.
  An outside observer who knows the equilibrium map $p^*(\cdot)$ can uniquely invert the
  price to recover $q$—inside information is fully transmitted.
- **Cobb-Douglas ($\sigma = 1$)**: the price is *flat* in $q$—information is never
  transmitted through the market.

```{code-cell} ipython3
# ── Verify that rho=0 (exact Cobb-Douglas) gives a flat line ─────────────────
p_cd = [eq_price(q, a1, a2, W1, rho=0.0) for q in q_grid]

print(f"Cobb-Douglas (rho=0): min p* = {min(p_cd):.6f}, "
      f"max p* = {max(p_cd):.6f}, "
      f"range = {max(p_cd)-min(p_cd):.2e}")
print(f"Analytical CD price = W1/2 = {W1/2:.6f}")
```

Every entry equals $W^1/2 = 2.0$ up to solver tolerance, confirming numerically that the
Cobb-Douglas equilibrium price is independent of the posterior $q$; the closed form
$p^* = W^1/2$ shows that it is also independent of the state values $a_1, a_2$.

(price_monotonicity)=
### Why Monotonicity Depends on $\sigma$

The derivative $\partial p / \partial q$ has the sign of $\alpha_1 \beta_2 - \alpha_2 \beta_1$
(from differentiating the FOC formula).

Using

$$
\frac{\alpha_s}{\beta_s}
  = \frac{a_s\, u_1(a_s x_1, x_2)}{u_2(a_s x_1, x_2)}
  = a_s^{(\sigma-1)/\sigma}\,\Bigl(\frac{x_2}{x_1}\Bigr)^{1/\sigma},
$$

one can show

$$
\frac{\partial}{\partial a}\,\frac{\alpha}{\beta}
  = \frac{(\sigma - 1)}{\sigma}\, a^{-1/\sigma}\,
    \Bigl(\frac{x_2}{x_1}\Bigr)^{1/\sigma}.
$$

This is positive when $\sigma > 1$, negative when $\sigma < 1$, and **zero when $\sigma = 1$**
(Cobb-Douglas). 

The vanishing derivative means the marginal rate of substitution is
independent of $a_s$, so the informed agent's demand—and hence the equilibrium price—does
not respond to changes in beliefs.

Let us visualize the ratio $\alpha_s / \beta_s$ as a function of $a_s$ for different
values of $\sigma$:

```{code-cell} ipython3
a_vals = np.linspace(0.3, 3.0, 300)
x1_fix, x2_fix = 1.0, 1.0  # fix consumption bundle for illustration

fig, ax = plt.subplots(figsize=(7, 4))
for rho, color in zip([-0.5, -1e-6, 0.5], ["steelblue", "crimson", "forestgreen"]):
    sigma = 1 / (1 - rho) if abs(rho) > 1e-8 else 1.0
    ratios = []
    for a in a_vals:
        u1, u2 = ces_derivatives(a * x1_fix, x2_fix, rho)
        ratios.append(a * u1 / u2)
    ax.plot(a_vals, ratios,
            label=rf"$\sigma = {sigma:.2f}$", color=color, lw=2)

ax.set_xlabel(r"State value $a_s$", fontsize=12)
ax.set_ylabel(r"$\alpha_s / \beta_s = a_s u_1 / u_2$", fontsize=12)
ax.set_title(r"Marginal rate of substitution $\alpha_s/\beta_s$ vs. $a_s$", fontsize=12)
ax.axhline(y=1.0, color="black", lw=0.8, ls="--")
ax.legend(fontsize=10)
ax.grid(alpha=0.3)
plt.tight_layout()
plt.show()
```

When $\sigma = 1$ (red line) the ratio is constant across all $a_s$ values—information
about the state has no effect on the marginal rate of substitution.

For $\sigma < 1$ the
ratio is decreasing in $a_s$, and for $\sigma > 1$ it is increasing, making the
equilibrium price strictly monotone in the posterior $q$ in both cases.

(bayesian_price_expectations)=
## Bayesian Price Expectations in a Dynamic Economy

We now turn to the **dynamic** question of Section 3 in {cite:t}`kihlstrom_mirman1975`.

### A Stochastic Exchange Economy

Time is discrete: $t = 1, 2, \ldots$. In each period $t$:

1. Consumer $i$ receives a random endowment $\omega_i^t$.
2. Markets open; competitive prices $p^t = p(\omega^t)$ clear all markets.
3. Consumers trade and consume. 
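The first two steps of this timing can be mimicked in a few lines; the lognormal endowment distribution and the market-clearing map $p(\omega) = c/\omega$ below are toy assumptions chosen only to show how an i.i.d. endowment process induces an i.i.d. price process.

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 0.3                     # a structural parameter (mean of the log endowment)

def price(omega, c=2.0):
    """Stand-in for the deterministic market-clearing map omega -> p(omega)."""
    return c / omega

omegas = rng.lognormal(mean=lam, sigma=0.2, size=5)  # i.i.d. endowments
print(price(omegas))          # the induced i.i.d. equilibrium price sequence
```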

The endowment vectors $\{\tilde{\omega}^t\}$ are **i.i.d.** with density
$f(\omega^t \mid \lambda)$, where $\lambda = (\lambda_1, \ldots, \lambda_n)$ is a
**structural parameter vector** that is *fixed but unknown*.

The equilibrium price at time $t$ is a deterministic function of $\omega^t$, so
$\{p^t\}$ is also i.i.d., with the density induced by $f$ under the map
$\omega \mapsto p(\omega)$ — heuristically,

$$
g(p^t \mid \lambda) = \int f(\omega^t \mid \lambda)\,
    \delta\bigl(p(\omega^t) - p^t\bigr)\, d\omega^t,
$$

where $\delta$ denotes the Dirac delta.

Following econometric convention, {cite:t}`kihlstrom_mirman1975` call $g(p \mid \lambda)$
the **reduced form** and $f(\omega \mid \lambda)$ the **structure**.

### The Identification Problem

Because the map $\omega \mapsto p(\omega)$ is many-to-one, observing prices loses
information relative to observing endowments.

In particular, it may be impossible to
recover $\lambda$ from $g(p \mid \lambda)$ even with infinite price data.

To handle this, partition $\Lambda$ into equivalence classes $\mu$ such that
$\lambda \in \mu$ and $\lambda' \in \mu$ whenever $g(p \mid \lambda) = g(p \mid \lambda')$
for all $p$.

The equivalence class $\mu$ containing the true $\lambda$ is the **reduced
form** (with respect to data on prices).

An observer who knows the infinite price history learns
$\mu$ but not necessarily $\lambda$.

### Bayesian Updating

An uninformed observer begins with a prior $h(\lambda)$ over $\lambda \in \Lambda$.
After observing the price sequence $(p^1, \ldots, p^t)$, the observer's Bayesian
posterior is

$$
h(\lambda \mid p^1, \ldots, p^t)
  = \frac{h(\lambda)\, \prod_{\tau=1}^{t} g(p^\tau \mid \lambda)}
         {\displaystyle\sum_{\lambda' \in \Lambda}
          h(\lambda')\, \prod_{\tau=1}^{t} g(p^\tau \mid \lambda')}.
$$

At time $t$, the observer's price expectations for the next period are

$$
g(p^{t+1} \mid p^1, \ldots, p^t)
  = \sum_{\lambda \in \Lambda} g(p^{t+1} \mid \lambda)\,
    h(\lambda \mid p^1, \ldots, p^t). 
+$$ + +### The Convergence Theorem + +**Theorem** ({cite:t}`kihlstrom_mirman1975`, Section 3). Let $\bar\lambda$ be the true +structural parameter and $\bar\mu$ the reduced form that contains $\bar\lambda$. Then: + +$$ +\lim_{t \to \infty} h(\mu \mid p^1, \ldots, p^t) + = \begin{cases} 1 & \text{if } \mu = \bar\mu, \\ 0 & \text{otherwise,} \end{cases} +$$ + +with probability one. Consequently, + +$$ +\lim_{t \to \infty} g(p^{t+1} \mid p^1, \ldots, p^t) = g(p \mid \bar\mu), +$$ + +which equals the rational-expectations price distribution for a fully informed observer. + +The convergence uses the **Bayesian consistency** result of {cite:t}`degroot1962`: as +long as $g(\cdot \mid \mu)$ and $g(\cdot \mid \mu')$ generate mutually singular measures +(which holds here generically), the posterior concentrates on the true reduced form. + +**Key insight.** Price observers converge to **rational expectations** even if they +never identify the underlying structure $\bar\lambda$. It is the reduced form +$g(p \mid \bar\mu)$ that governs equilibrium price expectations, and the Bayesian +observer learns the reduced form from prices alone. + +(bayesian_simulation)= +## Simulating Bayesian Learning from Prices + +We illustrate the theorem with a two-state example. + +**Setup.** Two possible reduced forms $\mu_1$ and $\mu_2$ generate prices +$p^t \sim N(\bar{p}_i, \sigma_p^2)$ for $i = 1, 2$ respectively. The observer knows +the two possible price distributions (the reduced forms) but not which one governs the +data. + +This is a standard **Bayesian model selection** problem. With a prior $h_0$ on $\mu_1$ +and the observed price $p^t$, the posterior weight on $\mu_1$ after period $t$ is + +$$ +h_t = \frac{h_{t-1}\, g(p^t \mid \mu_1)}{h_{t-1}\, g(p^t \mid \mu_1) + + (1-h_{t-1})\, g(p^t \mid \mu_2)}. 
+$$ + +```{code-cell} ipython3 +def simulate_bayesian_learning(p_bar_true, p_bar_alt, sigma_p, T, h0, n_paths, + seed=42): + """ + Simulate Bayesian learning about which price distribution is true. + + Parameters + ---------- + p_bar_true : mean of the true reduced form + p_bar_alt : mean of the alternative reduced form + sigma_p : common standard deviation of price distributions + T : number of periods + h0 : initial prior probability on the true model + n_paths : number of simulation paths + seed : random seed + + Returns + ------- + h_paths : array of shape (n_paths, T+1) with posterior beliefs on true model + """ + rng = np.random.default_rng(seed) + h_paths = np.zeros((n_paths, T + 1)) + h_paths[:, 0] = h0 + + for path in range(n_paths): + h = h0 + prices = rng.normal(p_bar_true, sigma_p, size=T) + for t, p in enumerate(prices): + g_true = norm.pdf(p, loc=p_bar_true, scale=sigma_p) + g_alt = norm.pdf(p, loc=p_bar_alt, scale=sigma_p) + denom = h * g_true + (1 - h) * g_alt + h = h * g_true / denom + h_paths[path, t + 1] = h + + return h_paths + + +def plot_bayesian_learning(h_paths, p_bar_true, p_bar_alt, ax): + """Plot posterior beliefs over time.""" + T = h_paths.shape[1] - 1 + t_grid = np.arange(T + 1) + + for path in h_paths: + ax.plot(t_grid, path, alpha=0.25, lw=0.8, color="steelblue") + + median_path = np.median(h_paths, axis=0) + ax.plot(t_grid, median_path, color="navy", lw=2.5, label="Median posterior") + + ax.axhline(y=1.0, color="black", ls="--", lw=1.2, label="True model weight = 1") + ax.set_xlabel("Period $t$", fontsize=12) + ax.set_ylabel(r"$h_t$ = posterior weight on true model", fontsize=12) + ax.set_title( + rf"Bayesian learning: $\bar p_{{\\rm true}}={p_bar_true:.1f}$, " + rf"$\bar p_{{\\rm alt}}={p_bar_alt:.1f}$, $\sigma_p={sigma_p:.2f}$", + fontsize=11, + ) + ax.legend(fontsize=10) + ax.set_ylim(-0.05, 1.08) + ax.grid(alpha=0.3) +``` + +```{code-cell} ipython3 +T = 300 +h0 = 0.5 # diffuse prior +n_paths = 40 +sigma_p = 0.4 + +fig, axes = 
plt.subplots(1, 2, figsize=(12, 5)) + +# Case 1: distinct reduced forms (easy to learn) +p_bar_true, p_bar_alt = 2.0, 1.2 +h_paths = simulate_bayesian_learning(p_bar_true, p_bar_alt, sigma_p, T, h0, n_paths) +plot_bayesian_learning(h_paths, p_bar_true, p_bar_alt, axes[0]) +axes[0].set_title("Easy case: means far apart", fontsize=12) + +# Case 2: similar reduced forms (harder to learn) +p_bar_true, p_bar_alt = 2.0, 1.8 +h_paths_hard = simulate_bayesian_learning(p_bar_true, p_bar_alt, sigma_p, T, h0, n_paths) +plot_bayesian_learning(h_paths_hard, p_bar_true, p_bar_alt, axes[1]) +axes[1].set_title("Hard case: means close together", fontsize=12) + +plt.tight_layout() +plt.show() +``` + +In both panels the posterior weight on the true model converges to 1 with probability one, +though convergence is slower when the two price distributions are similar (right panel). + +### Price Expectations vs. Rational Expectations + +We now verify that the observer's price expectations converge to the rational-expectations +distribution $g(p \mid \bar\mu)$. + +```{code-cell} ipython3 +def price_expectation(h_t, p_bar_true, p_bar_alt, p_grid): + """ + Compute the observer's predictive price density at posterior weight h_t. + Mixture: h_t * N(p_bar_true, ...) + (1-h_t) * N(p_bar_alt, ...) 
+ """ + return (h_t * norm.pdf(p_grid, loc=p_bar_true, scale=sigma_p) + + (1 - h_t) * norm.pdf(p_grid, loc=p_bar_alt, scale=sigma_p)) + + +p_bar_true, p_bar_alt = 2.0, 1.2 +sigma_p = 0.4 +T_long = 1000 +n_paths = 1 +h_paths_long = simulate_bayesian_learning( + p_bar_true, p_bar_alt, sigma_p, T_long, h0=0.5, n_paths=n_paths, seed=7 +) + +p_grid = np.linspace(0.0, 3.5, 300) +re_density = norm.pdf(p_grid, loc=p_bar_true, scale=sigma_p) + +fig, ax = plt.subplots(figsize=(8, 5)) +snapshots = [0, 10, 50, 200, T_long] +palette = plt.cm.Blues(np.linspace(0.3, 1.0, len(snapshots))) + +for t_snap, col in zip(snapshots, palette): + h_t = h_paths_long[0, t_snap] + dens = price_expectation(h_t, p_bar_true, p_bar_alt, p_grid) + ax.plot(p_grid, dens, color=col, lw=2, + label=rf"$t = {t_snap}$, $h_t = {h_t:.3f}$") + +ax.plot(p_grid, re_density, "k--", lw=2.5, + label=r"Rational expectations $g(p \mid \bar\mu)$") +ax.set_xlabel("Price $p$", fontsize=12) +ax.set_ylabel("Density", fontsize=12) +ax.set_title("Observer's price distribution converges to rational expectations", fontsize=12) +ax.legend(fontsize=9) +ax.grid(alpha=0.3) +plt.tight_layout() +plt.show() +``` + +The sequence of predictive densities (shades of blue) converges to the rational-expectations +density (dashed black line) as experience accumulates. + +This illustrates the main theorem of +Section 3 of {cite:t}`kihlstrom_mirman1975`. + +(km_extension_nonidentification)= +### Learning the Reduced Form without Identifying the Structure + +The convergence result is particularly striking because the observer converges to +*rational expectations* even when the underlying **structure** $\lambda$ is +*not identified* by prices. 
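A one-line application of Bayes' rule makes the mechanism transparent. If two structures $\lambda^{(1)}$ and $\lambda^{(2)}$ imply the same reduced form, then $g(p \mid \lambda^{(1)}) = g(p \mid \lambda^{(2)})$ for every price $p$, so after any history $p^1, \ldots, p^t$ the posterior odds between them equal the prior odds:

$$
\frac{h_t(\lambda^{(1)})}{h_t(\lambda^{(2)})}
= \frac{h_0(\lambda^{(1)}) \prod_{s=1}^{t} g(p^s \mid \lambda^{(1)})}
       {h_0(\lambda^{(2)}) \prod_{s=1}^{t} g(p^s \mid \lambda^{(2)})}
= \frac{h_0(\lambda^{(1)})}{h_0(\lambda^{(2)})}.
$$

Price data can never move these odds, although the *combined* posterior weight on the pair can still converge to one.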
+ +To illustrate this, consider a case with *three* possible structures +$\lambda^{(1)}, \lambda^{(2)}, \lambda^{(3)}$ but only *two* reduced forms +$\mu_1 = \{\lambda^{(1)}, \lambda^{(2)}\}$ and $\mu_2 = \{\lambda^{(3)}\}$ +(because $\lambda^{(1)}$ and $\lambda^{(2)}$ generate the same price distribution). + +```{code-cell} ipython3 +def simulate_learning_3struct(T, h0_vec, p_bar_vec, sigma_p, true_idx, n_paths, seed=0): + """ + Bayesian learning with 3 structures, 2 reduced forms. + h0_vec : length-3 array of initial prior weights on each structure + p_bar_vec: length-3 array of price means for each structure + (structures 0 and 1 share the same reduced form if p_bar_vec[0]==p_bar_vec[1]) + true_idx: index (0,1,2) of the true structure + Returns : array (n_paths, T+1, 3) posterior weights on each structure + """ + rng = np.random.default_rng(seed) + h_paths = np.zeros((n_paths, T + 1, 3)) + h_paths[:, 0, :] = h0_vec + + for path in range(n_paths): + h = np.array(h0_vec, dtype=float) + prices = rng.normal(p_bar_vec[true_idx], sigma_p, size=T) + for t, p in enumerate(prices): + likelihoods = norm.pdf(p, loc=p_bar_vec, scale=sigma_p) + h = h * likelihoods + h /= h.sum() + h_paths[path, t + 1, :] = h + + return h_paths + + +# Structures 0 and 1 have the same reduced form (same price mean) +p_bar_vec = np.array([2.0, 2.0, 1.2]) +h0_vec = np.array([1/3, 1/3, 1/3]) +sigma_p = 0.4 +T = 400 +true_idx = 0 # True structure is 0 (indistinguishable from 1) + +h_paths_3 = simulate_learning_3struct(T, h0_vec, p_bar_vec, sigma_p, true_idx, n_paths=30) +t_grid = np.arange(T + 1) + +fig, axes = plt.subplots(1, 3, figsize=(13, 4), sharey=True) +struct_labels = [r"$\lambda^{(1)}$", + r"$\lambda^{(2)}$ (same reduced form as $\lambda^{(1)}$)", + r"$\lambda^{(3)}$"] + +for k, (ax, label) in enumerate(zip(axes, struct_labels)): + for path in h_paths_3: + ax.plot(t_grid, path[:, k], alpha=0.25, lw=0.8, color="steelblue") + ax.plot(t_grid, np.median(h_paths_3[:, :, k], axis=0), + 
color="navy", lw=2.5, label="Median") + ax.set_title(f"Structure {label}", fontsize=10) + ax.set_xlabel("Period $t$", fontsize=11) + ax.grid(alpha=0.3) + ax.legend(fontsize=9) + +axes[0].set_ylabel("Posterior weight", fontsize=11) +fig.suptitle( + r"Non-identification: weights on $\lambda^{(1)}$ and $\lambda^{(2)}$ stabilize at " + r"non-degenerate values; $\lambda^{(3)}$ is eliminated", + fontsize=10, y=1.02 +) +plt.tight_layout() +plt.show() +``` + +The observer correctly rules out $\lambda^{(3)}$ (the wrong reduced form) with probability +one, but cannot distinguish $\lambda^{(1)}$ from $\lambda^{(2)}$ because they generate an +identical price distribution. + +Nevertheless, the observer's **price expectations** converge +to rational expectations because both structures imply the same reduced form $\bar\mu$. + +## Exercises + +```{exercise} +:label: km_ex1 + +**Invertibility with CARA preferences.** Consider a two-state economy ($a_1 = 2$, +$a_2 = 0.5$) where the informed agent has **CARA** (constant absolute risk aversion) +preferences over portfolio wealth: + +$$ +u(W) = -e^{-\gamma W}, \quad W = x_2 + \bar{a}\, x_1. +$$ + +The agent chooses $x_1$ to maximize + +$$ +q\,u(W_1) + (1-q)\,u(W_2), \quad W_s = w - p\,x_1 + a_s\,x_1, +$$ + +subject to the budget constraint $p\,x_1 + x_2 = w$. Total supply of good 1 is $X_1 = 1$. + +(a) Derive the first-order condition for the informed agent's optimal $x_1$. + +(b) Use the market-clearing condition $x_1 = 1$ (the informed agent absorbs the entire +supply) to obtain an implicit equation for the equilibrium price $p^*(q)$. Solve it +numerically for $q \in (0,1)$ and several values of $\gamma$. + +(c) Show numerically that $p^*(q)$ is monotone in $q$, so the invertibility condition +holds. Explain intuitively why CARA preferences always lead to an invertible price map +(the elasticity of substitution of portfolio utility is $\sigma = \infty$). 
```

```{solution-start} km_ex1
:class: dropdown
```

**(a) First-order condition.**

Define $W_s = w + (a_s - p)\,x_1$ for $s=1,2$. The FOC is

$$
q\,(a_1 - p)\,\gamma\, e^{-\gamma W_1}
= (1-q)\,(p - a_2)\,\gamma\, e^{-\gamma W_2},
$$

or equivalently (dividing both sides by $\gamma\, e^{-\gamma w}$ and rearranging)

$$
q\,(a_1 - p)\, e^{-\gamma(a_1-p) x_1}
  = (1-q)\,(p - a_2)\, e^{\gamma(p-a_2) x_1}.
$$

**(b) Market-clearing equilibrium price.**

Setting $x_1 = 1$ (all supply absorbed by informed agent), the equation becomes
a scalar root-finding problem in $p$:

$$
F(p;\,q,\gamma) \equiv
  q\,(a_1-p)\,e^{-\gamma(a_1-p)} - (1-q)\,(p-a_2)\,e^{\gamma(p-a_2)} = 0.
$$

```{code-cell} ipython3
from scipy.optimize import brentq

def F_cara(p, q, a1, a2, gamma, x1=1.0):
    """Residual of CARA market-clearing condition."""
    return (q * (a1-p) * np.exp(-gamma*(a1-p)*x1)
            - (1-q) * (p-a2) * np.exp(gamma*(p-a2)*x1))

a1, a2 = 2.0, 0.5
q_grid = np.linspace(0.05, 0.95, 200)
gammas = [0.5, 1.0, 2.0, 5.0]
colors_sol = plt.cm.plasma(np.linspace(0.15, 0.85, len(gammas)))

fig, ax = plt.subplots(figsize=(8, 5))
for gamma, color in zip(gammas, colors_sol):
    # bracket essentially the whole interval (a2, a1): F > 0 near a2, F < 0 near a1,
    # and for large gamma the root can sit very close to a2
    p_eq = [brentq(F_cara, a2+1e-10, a1-1e-10,
                   args=(q, a1, a2, gamma))
            for q in q_grid]
    ax.plot(q_grid, p_eq, lw=2, color=color,
            label=rf"$\gamma = {gamma}$")

ax.set_xlabel(r"Posterior $q = \Pr(\bar a = a_1)$", fontsize=12)
ax.set_ylabel("Equilibrium price $p^*(q)$", fontsize=12)
ax.set_title("CARA preferences: equilibrium prices", fontsize=12)
ax.legend(fontsize=10)
ax.grid(alpha=0.3)
plt.tight_layout()
plt.show()
```

**(c) Invertibility for CARA.**

The price is strictly increasing in $q$ for every $\gamma > 0$. Intuitively, portfolio
utility $u(x_2 + \bar{a}\,x_1)$ treats the two goods as **perfect substitutes** in
creating wealth, giving an elasticity of substitution $\sigma = \infty \neq 1$. 
By +Theorem 2 of {cite:t}`kihlstrom_mirman1975`, the price map is therefore always invertible. + +```{solution-end} +``` + +```{exercise} +:label: km_ex2 + +**Convergence rate and KL divergence.** In the Bayesian learning simulation, the speed of +convergence to rational expectations is determined by the **Kullback-Leibler divergence** +between the two reduced forms. + +The KL divergence from $g(\cdot \mid \mu_2)$ to $g(\cdot \mid \mu_1)$, for two normal +distributions with means $\bar{p}_1$ and $\bar{p}_2$ and common variance $\sigma_p^2$, is + +$$ +D_{KL}(\mu_1 \| \mu_2) = \frac{(\bar{p}_1 - \bar{p}_2)^2}{2\sigma_p^2}. +$$ + +(a) For the "easy" case ($\bar{p}_1 = 2.0$, $\bar{p}_2 = 1.2$) and the "hard" case +($\bar{p}_1 = 2.0$, $\bar{p}_2 = 1.8$), compute $D_{KL}$ for $\sigma_p = 0.4$. + +(b) Re-run the simulations from the lecture for both cases with $n=100$ paths. For each +path compute the first period $T_{0.99}$ at which $h_t \geq 0.99$. Plot histograms of +$T_{0.99}$ for both cases. + +(c) How does the median $T_{0.99}$ scale with $D_{KL}$? Verify numerically that +roughly $T_{0.99} \approx C / D_{KL}$ for some constant $C$. 
```

```{solution-start} km_ex2
:class: dropdown
```

```{code-cell} ipython3
sigma_p = 0.4

def kl_normal(p1, p2, sigma):
    """KL divergence between N(p1,sigma^2) and N(p2,sigma^2)."""
    return (p1 - p2)**2 / (2 * sigma**2)

cases = [("Easy", 2.0, 1.2), ("Hard", 2.0, 1.8)]
for name, p1, p2 in cases:
    kl = kl_normal(p1, p2, sigma_p)
    print(f"{name} case: D_KL = {kl:.4f}")

n_paths = 100

fig, axes = plt.subplots(1, 2, figsize=(11, 4))
for ax, (name, p1, p2) in zip(axes, cases):
    kl = kl_normal(p1, p2, sigma_p)
    paths = simulate_bayesian_learning(p1, p2, sigma_p, T=2000,
                                       h0=0.5, n_paths=n_paths, seed=42)
    # First period where posterior >= 0.99
    T99 = []
    for path in paths:
        idx = np.where(path >= 0.99)[0]
        T99.append(idx[0] if len(idx) > 0 else 2001)

    median_T = np.median(T99)
    ax.hist(T99, bins=20, color="steelblue", edgecolor="white", alpha=0.8)
    ax.axvline(median_T, color="crimson", lw=2,
               label=fr"Median $T_{{0.99}} = {median_T:.0f}$")
    ax.set_title(
        f"{name}: $D_{{KL}} = {kl:.4f}$, "
        fr"$C \approx T_{{0.99}}\, D_{{KL}} = {median_T*kl:.1f}$",
        fontsize=11
    )
    ax.set_xlabel(r"$T_{0.99}$", fontsize=12)
    ax.set_ylabel("Count", fontsize=11)
    ax.legend(fontsize=10)
    ax.grid(alpha=0.3)

plt.tight_layout()
plt.show()
```

The median $T_{0.99}$ scales as approximately $C/D_{KL}$, confirming that learning is
faster when the two reduced forms are more easily distinguished (large $D_{KL}$).

```{solution-end}
```

```{exercise}
:label: km_ex3

**Failure of invertibility—counterexample for $S > 2$.** The paper constructs a
counterexample showing that for $S = 3$ states, even if the elasticity of substitution
of $u^1$ is everywhere greater than one, $p(PR^1)$ need **not** be invertible.

Consider the marginal rate of substitution for the portfolio utility
$u^1(a_s x_1 + x_2)$ (infinite elasticity of substitution) and three states
$a_1 > a_2 > a_3$. 
The MRS is

$$
m(\mu)
= \frac{a_1\beta_1\mu(a_1) + a_2\beta_2\mu(a_2) + a_3\beta_3\mu(a_3)}
       {\beta_1\mu(a_1) + \beta_2\mu(a_2) + \beta_3\mu(a_3)},
$$

where $\beta_s = u^{1\prime}(a_s x_1 + x_2)$.

(a) For the parameterization used by {cite:t}`kihlstrom_mirman1975`—let
$\mu(a_3) = q$, $\mu(a_2) = r$, $\mu(a_1) = 1-r-q$—write $m$ as a function of $(q, r)$.
Compute $\partial m / \partial r$ and show that its sign depends on
$\beta_1\beta_2(a_1-a_2)$ and $\beta_2\beta_3(a_2-a_3)$.

(b) Choose $a_1 = 3$, $a_2 = 2$, $a_3 = 0.5$ and $u'(c) = c^{-\gamma}$ (CRRA with risk
aversion $\gamma$). Fix $x_1 = 1$, $x_2 = 0.5$. For $\gamma = 2$, compute
$\partial m/\partial r$ numerically at this fixed allocation and check whether it
changes sign. Reconcile what you find with the counterexample, in which the equilibrium
allocation responds to beliefs.

(c) Explain why this non-monotonicity does *not* arise in the two-state case $S = 2$.
```

```{solution-start} km_ex3
:class: dropdown
```

**(a)** Rewrite the MRS with $\mu_1 = 1-r-q$:

$$
m(q,r) = \frac{a_1\beta_1(1-r-q) + a_2\beta_2 r + a_3\beta_3 q}
              {\beta_1(1-r-q) + \beta_2 r + \beta_3 q}.
$$

Differentiating using the quotient rule (denominator $D$):

$$
\frac{\partial m}{\partial r}
= \frac{(a_2\beta_2 - a_1\beta_1)D - (a_1\beta_1(1-r-q)+a_2\beta_2 r+a_3\beta_3 q)(\beta_2-\beta_1)}{D^2}.
$$

After simplification this reduces to a signed combination of
$\beta_1\beta_2(a_1-a_2)({\cdot})$ and $\beta_2\beta_3(a_2-a_3)({\cdot})$ terms
whose sign is parameter-dependent. Note, however, that once $(x_1, x_2)$ is fixed the
$\beta_s$ are constants, so both numerator and denominator of $m$ are affine in $r$ and
$\partial m/\partial r$ keeps one sign over the whole simplex. 

**(b) Numerical verification.**

```{code-cell} ipython3
def mrs_3state(q, r, a1, a2, a3, x1, x2, gamma):
    """MRS with mu(a3)=q, mu(a2)=r, mu(a1)=1-r-q, portfolio utility u'(c)=c^{-gamma}."""
    mu1, mu2, mu3 = 1 - r - q, r, q
    beta1 = (a1 * x1 + x2)**(-gamma)
    beta2 = (a2 * x1 + x2)**(-gamma)
    beta3 = (a3 * x1 + x2)**(-gamma)
    num = a1*beta1*mu1 + a2*beta2*mu2 + a3*beta3*mu3
    den = beta1*mu1 + beta2*mu2 + beta3*mu3
    return num / den

a1, a2, a3 = 3.0, 2.0, 0.5
x1, x2 = 1.0, 0.5
gamma = 2.0
q_fix = 0.1  # fix q, vary r
r_grid = np.linspace(0.05, 0.80, 200)

# Filter valid (q+r <= 1)
r_valid = r_grid[r_grid + q_fix <= 0.95]
m_vals = [mrs_3state(q_fix, r, a1, a2, a3, x1, x2, gamma) for r in r_valid]
dm_dr = np.gradient(m_vals, r_valid)

fig, axes = plt.subplots(1, 2, figsize=(11, 4))
axes[0].plot(r_valid, m_vals, color="steelblue", lw=2)
axes[0].set_xlabel(r"$r = \mu(a_2)$", fontsize=12)
axes[0].set_ylabel(r"$m(q, r)$ — MRS", fontsize=12)
axes[0].set_title(fr"MRS at a fixed allocation (CRRA $\gamma={gamma}$)", fontsize=12)
axes[0].grid(alpha=0.3)

axes[1].plot(r_valid, dm_dr, color="crimson", lw=2)
axes[1].axhline(0, color="black", lw=1, ls="--")
axes[1].set_xlabel(r"$r = \mu(a_2)$", fontsize=12)
axes[1].set_ylabel(r"$\partial m / \partial r$", fontsize=12)
axes[1].set_title("Derivative keeps one sign at a fixed allocation", fontsize=12)
axes[1].grid(alpha=0.3)

plt.tight_layout()
plt.show()

print("Sign changes in dm/dr:",
      np.sum(np.diff(np.sign(dm_dr)) != 0))
```

At a *fixed* allocation the derivative does **not** change sign: with the $\beta_s$ held
constant, $m$ is a ratio of two functions that are affine in $r$, so
$\partial m / \partial r$ has a constant sign. The sign reversals behind the
counterexample of {cite:t}`kihlstrom_mirman1975` appear only once the equilibrium
allocation, and hence the $\beta_s$, respond to beliefs, which breaks this affine
structure.

**(c)** In the two-state case $S = 2$, the prior is parameterized by a single scalar $q$
and the MRS is a function of $q$ alone. 
One can show directly that $\partial m / \partial q$
has a definite sign determined entirely by whether $a_1 > a_2$ and whether
$\sigma > 1$ or $\sigma < 1$ hold—there is no room for sign changes. With three states,
the richer interactions among the $\beta_s$ values that emerge once the equilibrium
allocation responds to the two-dimensional prior $(q, r)$ can reverse the sign of the
derivative.

```{solution-end}
```

```{exercise}
:label: km_ex4

**Bayesian learning with misspecified models.** The convergence theorem assumes the true
distribution $g(\cdot \mid \bar\lambda)$ is in the support of the prior (i.e.,
$h(\bar\lambda) > 0$). Investigate what happens when the true model is **not** in the
prior support.

(a) Simulate $T = 1,000$ periods of prices from $N(2.0, 0.4^2)$ but use a prior that
    places equal weight on two *wrong* models: $N(1.5, 0.4^2)$ and $N(2.5, 0.4^2)$.
    Plot the posterior weight on each model over time.

(b) The two wrong models are equally distant from the truth in KL divergence, so
    neither is favored asymptotically. Verify numerically that the *median* predictive
    mean across paths fluctuates around the equal-mixture value
    $0.5 \times 1.5 + 0.5 \times 2.5 = 2.0$.

(c) Relate this finding to the Bayesian consistency literature: when is the limit
    distribution a good approximation to the true distribution even under misspecification?
```

```{solution-start} km_ex4
:class: dropdown
```

```{code-cell} ipython3
def simulate_misspecified(T, p_bar_true, p_bar_wrong, sigma_p, h0, n_paths, seed=0):
    """
    Misspecified Bayesian learning: two wrong models with means p_bar_wrong[0,1].
    True model has mean p_bar_true (not in prior support).
    Returns (n_paths, T+1, 2) array of posterior weights. 
    """
    rng = np.random.default_rng(seed)
    h_paths = np.zeros((n_paths, T + 1, 2))
    h_paths[:, 0, :] = h0

    for path in range(n_paths):
        h = np.array(h0, dtype=float)
        prices = rng.normal(p_bar_true, sigma_p, size=T)
        for t, price in enumerate(prices):
            likes = norm.pdf(price, loc=p_bar_wrong, scale=sigma_p)
            h = h * likes
            h /= h.sum()
            h_paths[path, t + 1, :] = h

    return h_paths


T = 1000
p_true = 2.0
p_wrong = np.array([1.5, 2.5])
sigma_p = 0.4
h0 = np.array([0.5, 0.5])
n_paths = 30

h_misspec = simulate_misspecified(T, p_true, p_wrong, sigma_p, h0, n_paths)

t_grid = np.arange(T + 1)
fig, axes = plt.subplots(1, 2, figsize=(12, 4))

for ax, k, label in zip(axes, [0, 1], [r"$N(1.5, \sigma^2)$", r"$N(2.5, \sigma^2)$"]):
    for path in h_misspec:
        ax.plot(t_grid, path[:, k], alpha=0.2, lw=0.8, color="steelblue")
    ax.plot(t_grid, np.median(h_misspec[:, :, k], axis=0),
            color="navy", lw=2.5, label="Median")
    ax.axhline(0.5, color="crimson", lw=1.5, ls="--", label="0.5 (median across paths)")
    ax.set_title(f"Posterior weight on {label}", fontsize=11)
    ax.set_xlabel("Period $t$", fontsize=11)
    ax.set_ylabel("Posterior weight", fontsize=11)
    ax.legend(fontsize=9)
    ax.grid(alpha=0.3)

plt.tight_layout()
plt.show()

# Predictive mean = h[:,0]*1.5 + h[:,1]*2.5
pred_mean = np.median(
    h_misspec[:, :, 0] * p_wrong[0] + h_misspec[:, :, 1] * p_wrong[1], axis=0
)
print(f"True mean: {p_true}")
print(f"Median predictive mean at T={T}: {pred_mean[-1]:.4f}")
print("(By symmetry the median weights fluctuate around 0.5 → median predictive mean ≈ 2.0)")
```

By symmetry, the two wrong models are equidistant from the true distribution in KL
divergence, so the log posterior odds between them follow a driftless random walk: along
any single path the posterior weight keeps fluctuating rather than settling at a limit.
Across paths, however, the median weight hovers around 0.5, and the median predictive
mean hovers around $0.5 \times 1.5 + 0.5 \times 2.5 = 2.0$, coinciding with the true
mean despite misspecification. 
This is an instance of the general result that under +misspecification, Bayesian posteriors converge to the distribution in the model class that +minimizes KL divergence from the model actually generating the data. + +```{solution-end} +``` + From a9d6c474703dabce25eb9d89da7d49922e6e97fa Mon Sep 17 00:00:00 2001 From: thomassargent30 Date: Mon, 20 Apr 2026 18:45:35 -0600 Subject: [PATCH 02/26] Tom's April 20 edits of several lectures --- lectures/information_market_equilibrium.md | 24 +- lectures/multivariate_normal.md | 400 +++++++++++++++++++- lectures/prob_matrix.md | 417 ++++++++++++++++++++- 3 files changed, 811 insertions(+), 30 deletions(-) diff --git a/lectures/information_market_equilibrium.md b/lectures/information_market_equilibrium.md index 00063a9a1..222f9bccb 100644 --- a/lectures/information_market_equilibrium.md +++ b/lectures/information_market_equilibrium.md @@ -43,11 +43,11 @@ answered by {cite:t}`kihlstrom_mirman1975`. Kihlstrom and Mirman's answers rely on two classical ideas from statistics: -- **Blackwell sufficiency**: a random variable $\tilde{y}$ is *sufficient* for +- **Blackwell sufficiency**: a random variable $\tilde{y}$ is said to be *sufficient* for a random variable $\tilde{y}'$ with respect to an unknown state if knowing $\tilde{y}$ gives all the information about the state that $\tilde{y}'$ contains. -- **Bayesian consistency**: as the sample grows, the posterior concentrates on the true - parameter value (even when the underlying economic ßstructure is not globally identified from prices alone). +- **Bayesian consistency**: as the sample grows, a Bayesian statistician's posterior probability distribution concentrates on the true + parameter value *even when the underlying economic structure is not globally identified from prices alone*. Important findings of {cite:t}`kihlstrom_mirman1975` are: @@ -58,12 +58,12 @@ Important findings of {cite:t}`kihlstrom_mirman1975` are: elasticity of substitution $\sigma \neq 1$. 
With Cobb-Douglas preferences ($\sigma = 1$) the equilibrium price is independent of the insider's posterior, so information is never transmitted. -- In the dynamic economy, as information accumulates, Bayesian price expectations converge to **rational expectations**, even when the deep structure of the economy is notß identified. +- In the dynamic economy, as information accumulates, Bayesian price expectations converge to **rational expectations**, even when the deep structure of the economy is not identified. ```{note} -{cite:t}`kihlstrom_mirman1975` use the terms ''reduced form'' and ''structural'' models in the same -way that careful econometricians do. These two objects come in pairs. To each structure or structural model -there is a reduced form, or collection of reduced forms traced out by different possible regressions. +{cite:t}`kihlstrom_mirman1975` use the terms ''reduced form'' and ''structural'' models in a +way that careful econometricians do. Reduced-form and structural models come in pairs. To each structure or structural model +there is a reduced form, or collection of reduced forms, underlying different possible regressions. ``` The lecture is organized as follows. @@ -164,7 +164,7 @@ $p(\mu_{\tilde{y}})$ is **sufficient** for $\tilde{y}$. **Definition.** A random variable $\tilde{y}$ is *sufficient* for $\tilde{y}'$ (with respect to $\bar{a}$) if there exists a conditional distribution $PR(y' \mid y)$, -**independent of $\bar{a}$**, such that +**independent of**$\bar{a}$, such that $$ \phi'_a(y') = \sum_{y \in Y} PR(y' \mid y)\, \phi_a(y) @@ -468,7 +468,7 @@ equilibrium price strictly monotone in the posterior $q$ in both cases. (bayesian_price_expectations)= ## Bayesian Price Expectations in a Dynamic Economy -We now turn to the **dynamic** question of Section 3 in {cite:t}`kihlstrom_mirman1975`. +We now turn to a question addressed in Section 3 of {cite:t}`kihlstrom_mirman1975`. 
### A Stochastic Exchange Economy @@ -550,13 +550,13 @@ $$ which equals the rational-expectations price distribution for a fully informed observer. -The convergence uses the **Bayesian consistency** result of {cite:t}`degroot1962`: as +Establishing convergence relies on appealing to the **Bayesian consistency** result of {cite:t}`degroot1962`: as long as $g(\cdot \mid \mu)$ and $g(\cdot \mid \mu')$ generate mutually singular measures (which holds here generically), the posterior concentrates on the true reduced form. **Key insight.** Price observers converge to **rational expectations** even if they -never identify the underlying structure $\bar\lambda$. It is the reduced form -$g(p \mid \bar\mu)$ that governs equilibrium price expectations, and the Bayesian +never identify the underlying structure $\bar\lambda$. The reduced form +$g(p \mid \bar\mu)$ statistical model is used to form equilibrium price expectations, and the Bayesian observer learns the reduced form from prices alone. (bayesian_simulation)= diff --git a/lectures/multivariate_normal.md b/lectures/multivariate_normal.md index 7aacee6eb..e1353ec0b 100644 --- a/lectures/multivariate_normal.md +++ b/lectures/multivariate_normal.md @@ -3,8 +3,10 @@ jupytext: text_representation: extension: .md format_name: myst + format_version: 0.13 + jupytext_version: 1.17.1 kernelspec: - display_name: Python 3 + display_name: Python 3 (ipykernel) language: python name: python3 --- @@ -60,7 +62,7 @@ We apply our Python class to some examples. 
We use the following imports: -```{code-cell} ipython +```{code-cell} ipython3 import matplotlib.pyplot as plt import numpy as np from numba import jit @@ -95,7 +97,7 @@ def f(z, μ, Σ): μ: ndarray(float, dim=1 or 2) the mean of z, N by 1 Σ: ndarray(float, dim=2) - the covarianece matrix of z, N by 1 + the covariance matrix of z, N by N """ z = np.atleast_2d(z) @@ -186,7 +188,7 @@ class MultivariateNormal: μ: ndarray(float, dim=1) the mean of z, N by 1 Σ: ndarray(float, dim=2) - the covarianece matrix of z, N by 1 + the covariance matrix of z, N by N Arguments --------- @@ -1093,8 +1095,8 @@ for indices, IQ, conditions in [([*range(2*n), 2*n], 'θ', 'y1, y2, y3, y4'), f'{μ_hat[0]:1.2f} and {Σ_hat[0, 0]:1.2f} respectively') ``` -Evidently, math tests provide no information about $\mu$ and -language tests provide no information about $\eta$. +Evidently, math tests provide no information about $\eta$ and +language tests provide no information about $\theta$. ## Univariate Time Series Analysis @@ -1688,7 +1690,7 @@ plt.show() In the above graph, the green line is what the price of the stock would be if people had perfect foresight about the path of dividends while the -green line is the conditional expectation $E p_t | y_t, y_{t-1}$, which is what the price would +red line is the conditional expectation $E p_t | y_t, y_{t-1}$, which is what the price would be if people did not have perfect foresight but were optimally predicting future dividends on the basis of the information $y_t, y_{t-1}$ at time $t$. 
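The conditional expectations appearing here are instances of the standard formula for
conditioning one block of a jointly normal vector on another,
$\hat\mu_1 = \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(z_2 - \mu_2)$ with conditional
covariance $\Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$. A minimal
self-contained sketch, with an illustrative correlation $\rho = 0.6$ and conditioning
value $v = 2$ (not values taken from the lecture):

```python
import numpy as np

# Bivariate normal with zero means, unit variances, correlation rho
rho = 0.6
Sigma = np.array([[1.0, rho],
                  [rho, 1.0]])

# Condition z1 on z2 = v:
#   mu_hat    = Sigma_12 * Sigma_22^{-1} * v
#   Sigma_hat = Sigma_11 - Sigma_12 * Sigma_22^{-1} * Sigma_21
v = 2.0
mu_hat = Sigma[0, 1] / Sigma[1, 1] * v
Sigma_hat = Sigma[0, 0] - Sigma[0, 1] ** 2 / Sigma[1, 1]

# In this bivariate case the closed forms are rho * v and 1 - rho**2,
# i.e. approximately 1.2 and 0.64 here
print(mu_hat, Sigma_hat)
```

The same algebra, applied blockwise, is what the `cond_dist` method of the
`MultivariateNormal` class automates below.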
@@ -1895,7 +1897,7 @@ G = np.array([[1., 3.]]) R = np.array([[1.]]) x0_hat = np.array([0., 1.]) -Σ0 = np.array([[1., .5], [.3, 2.]]) +Σ0 = np.array([[1., .5], [.5, 2.]]) μ = np.hstack([x0_hat, G @ x0_hat]) Σ = np.block([[Σ0, Σ0 @ G.T], [G @ Σ0, G @ Σ0 @ G.T + R]]) @@ -2300,3 +2302,385 @@ Pjk = P[:, :2] Σy_hat = Pjk @ Σεjk @ Pjk.T print('Σy_hat = \n', Σy_hat) ``` + +## Exercises + +```{exercise} +:label: mv_normal_ex1 + +**Verify conditional mean and variance by simulation** + +For the bivariate normal with + +$$ +\mu = \begin{bmatrix} 0.5 \\ 1.0 \end{bmatrix}, \quad +\Sigma = \begin{bmatrix} 1 & 0.5 \\ 0.5 & 1 \end{bmatrix} +$$ + +fix $z_2 = 2$. + +(a) Use `MultivariateNormal` to compute the analytical conditional mean +$\hat{\mu}_1$ and variance $\hat{\Sigma}_{11}$ of $z_1 \mid z_2 = 2$. + +(b) Draw $10^6$ samples from the joint distribution. Retain only those +for which $|z_2 - 2| < 0.05$. Compute the sample mean and variance of +the retained $z_1$ values. + +(c) Confirm that the sample estimates are close to the analytical values. +``` + +```{solution-start} mv_normal_ex1 +:class: dropdown +``` + +```{code-cell} python3 +import numpy as np +import statsmodels.api as sm + +μ = np.array([.5, 1.]) +Σ = np.array([[1., .5], [.5, 1.]]) + +# (a) analytical conditional distribution +mn = MultivariateNormal(μ, Σ) +mn.partition(1) +μ1_hat, Σ11_hat = mn.cond_dist(0, np.array([2.])) +print(f"Analytical μ̂₁ = {μ1_hat[0]:.4f}, Σ̂₁₁ = {Σ11_hat[0,0]:.4f}") + +# (b) simulation +n = 1_000_000 +data = np.random.multivariate_normal(μ, Σ, size=n) +z1_all, z2_all = data[:, 0], data[:, 1] + +mask = np.abs(z2_all - 2.) 
< 0.05 +z1_cond = z1_all[mask] +print(f"Sample size in band: {mask.sum()}") +print(f"Sample μ̂₁ = {np.mean(z1_cond):.4f}, Σ̂₁₁ = {np.var(z1_cond, ddof=1):.4f}") +``` + +```{solution-end} +``` + +```{exercise} +:label: mv_normal_ex2 + +**Product of regression slopes equals squared correlation** + +For a bivariate normal with standard deviations $\sigma_1 = \sigma_2 = 1$ and +correlation $\rho$, show analytically that $b_1 b_2 = \rho^2$, where +$b_1$ is the slope of $z_1$ on $z_2$ and $b_2$ is the slope of $z_2$ +on $z_1$. + +Then verify numerically for $\rho \in \{0.2, 0.5, 0.9\}$ that +`βs[0] * βs[1]` $= \rho^2$ by constructing the appropriate +`MultivariateNormal` instances. +``` + +```{solution-start} mv_normal_ex2 +:class: dropdown +``` + +The regression slopes are + +$$ +b_1 = \frac{\Sigma_{12}}{\Sigma_{22}} = \frac{\rho \sigma_1 \sigma_2}{\sigma_2^2} += \rho \frac{\sigma_1}{\sigma_2}, \qquad +b_2 = \frac{\Sigma_{21}}{\Sigma_{11}} = \rho \frac{\sigma_2}{\sigma_1} +$$ + +so $b_1 b_2 = \rho^2$. + +```{code-cell} python3 +import numpy as np + +for rho in [0.2, 0.5, 0.9]: + Σ = np.array([[1., rho], [rho, 1.]]) + mn = MultivariateNormal(np.zeros(2), Σ) + mn.partition(1) + product = float(mn.βs[0]) * float(mn.βs[1]) + print(f"ρ={rho:.1f}: b1*b2 = {product:.4f}, ρ² = {rho**2:.4f}, match: {np.isclose(product, rho**2)}") +``` + +```{solution-end} +``` + +```{exercise} +:label: mv_normal_ex3 + +**IQ inference: effect of the signal-to-noise ratio** + +Using the one-dimensional IQ model with $n = 50$ test scores and +$\mu_\theta = 100$, $\sigma_\theta = 10$: + +(a) Vary the test-score noise $\sigma_y \in \{1, 5, 10, 20, 50\}$. +For each value, plot the posterior standard deviation +$\hat{\sigma}_\theta$ as a function of the number of test scores +included (from 1 to 50), with all curves on the same axes. + +(b) Explain intuitively why a larger $\sigma_y$ leads to a slower +decline of posterior uncertainty. 
+``` + +```{solution-start} mv_normal_ex3 +:class: dropdown +``` + +```{code-cell} python3 +import numpy as np +import matplotlib.pyplot as plt + +n_max = 50 +μθ_val, σθ_val = 100., 10. + +fig, ax = plt.subplots() +for σy_val in [1., 5., 10., 20., 50.]: + σθ_hat_arr = np.empty(n_max) + for i in range(1, n_max + 1): + μ_i, Σ_i, _ = construct_moments_IQ(i, μθ_val, σθ_val, σy_val) + mn_i = MultivariateNormal(μ_i, Σ_i) + mn_i.partition(i) + _, Σθ_i = mn_i.cond_dist(1, np.zeros(i)) # conditioning value doesn't affect variance + σθ_hat_arr[i - 1] = np.sqrt(Σθ_i[0, 0]) + ax.plot(range(1, n_max + 1), σθ_hat_arr, label=f'σy={σy_val:.0f}') + +ax.set_xlabel('number of test scores') +ax.set_ylabel(r'posterior $\hat{\sigma}_\theta$') +ax.legend() +plt.show() +``` + +When $\sigma_y$ is large each test score is a noisy signal about $\theta$, +so many more observations are required before the posterior variance falls +appreciably. In the limit $\sigma_y \to 0$ a single observation pins down +$\theta$ exactly. + +```{solution-end} +``` + +```{exercise} +:label: mv_normal_ex4 + +**Prior vs. likelihood in IQ inference** + +Using the one-dimensional IQ model with $n = 20$ test scores and +$\mu_\theta = 100$, $\sigma_y = 10$: + +(a) Fix $\sigma_y = 10$ and vary the prior spread +$\sigma_\theta \in \{1, 5, 10, 50, 500\}$. For each value compute the +posterior mean $\hat{\mu}_\theta$ given the same set of $n = 20$ test +scores and plot $\hat{\mu}_\theta$ against $\sigma_\theta$. + +(b) Show analytically (or verify numerically) that as $\sigma_\theta \to \infty$ +the posterior mean converges to the sample mean $\bar{y}$, and as +$\sigma_y \to \infty$ the posterior mean converges to the prior mean +$\mu_\theta$. +``` + +```{solution-start} mv_normal_ex4 +:class: dropdown +``` + +```{code-cell} python3 +import numpy as np +import matplotlib.pyplot as plt + +n_scores = 20 +μθ_val, σy_val = 100., 10. + +# draw one set of test scores from a fixed "true" θ +np.random.seed(42) +true_θ = 108. 
+y_obs = true_θ + σy_val * np.random.randn(n_scores) +y_bar = np.mean(y_obs) + +σθ_vals = [1., 5., 10., 50., 500.] +μθ_hat_vals = [] + +for σθ_val in σθ_vals: + μ_i, Σ_i, _ = construct_moments_IQ(n_scores, μθ_val, σθ_val, σy_val) + mn_i = MultivariateNormal(μ_i, Σ_i) + mn_i.partition(n_scores) + μθ_hat, _ = mn_i.cond_dist(1, y_obs) + μθ_hat_vals.append(float(μθ_hat)) + +fig, ax = plt.subplots() +ax.semilogx(σθ_vals, μθ_hat_vals, 'o-', label=r'$\hat{\mu}_\theta$') +ax.axhline(y_bar, ls='--', color='r', label=f'sample mean ȳ = {y_bar:.1f}') +ax.axhline(μθ_val, ls=':', color='g', label=f'prior mean μθ = {μθ_val:.0f}') +ax.set_xlabel(r'$\sigma_\theta$') +ax.set_ylabel(r'posterior mean $\hat{\mu}_\theta$') +ax.legend() +plt.show() + +print(f"ȳ = {y_bar:.4f}") +print(f"Large σθ posterior mean ≈ {μθ_hat_vals[-1]:.4f}") +``` + +```{solution-end} +``` + +```{exercise} +:label: mv_normal_ex5 + +**Kalman filter convergence** + +Using the `iterate` function from the Filtering Foundations section with + +$$ +A = \begin{bmatrix} 0.9 & 0 \\ 0 & 0.5 \end{bmatrix}, \quad +C = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \quad +G = \begin{bmatrix} 1 & 0 \end{bmatrix}, \quad +R = \begin{bmatrix} 1 \end{bmatrix} +$$ + +and initial conditions $\hat{x}_0 = [0, 0]'$, $\Sigma_0 = I_2$: + +(a) Simulate $T = 60$ periods of $\{x_t, y_t\}$ and run the filter. + +(b) Plot the sequences of conditional variances $\Sigma_t[0,0]$ and +$\Sigma_t[1,1]$ over time. Verify that they converge to a steady state. + +(c) Plot the filtered state estimates $\hat{x}_t[0]$ together with the +true $x_t[0]$ and the raw observations $y_t$ on a single figure. 
+
```

```{solution-start} mv_normal_ex5
:class: dropdown
```

```{code-cell} python3
import numpy as np
import matplotlib.pyplot as plt

A_ex = np.array([[0.9, 0.], [0., 0.5]])
C_ex = np.array([[1.], [1.]])
G_ex = np.array([[1., 0.]])
R_ex = np.array([[1.]])

T_ex = 60
x0_hat_ex = np.zeros(2)
Σ0_ex = np.eye(2)

# simulate true states and observations
np.random.seed(7)
x_true = np.zeros((T_ex + 1, 2))
y_seq_ex = np.zeros(T_ex)
for t in range(T_ex):
    x_true[t + 1] = A_ex @ x_true[t] + C_ex[:, 0] * np.random.randn()
    y_seq_ex[t] = G_ex @ x_true[t] + np.random.randn()

# run filter
x_hat_seq, Σ_hat_seq = iterate(x0_hat_ex, Σ0_ex, A_ex, C_ex, G_ex, R_ex, y_seq_ex)

# (b) conditional variances
fig, ax = plt.subplots()
ax.plot(Σ_hat_seq[:, 0, 0], label=r'$\Sigma_t[0,0]$')
ax.plot(Σ_hat_seq[:, 1, 1], label=r'$\Sigma_t[1,1]$')
ax.set_xlabel('t')
ax.set_ylabel('conditional variance')
ax.legend()
plt.show()

# (c) filtered state vs. truth vs. observations
fig, ax = plt.subplots()
ax.plot(x_true[1:, 0], label='true $x_t[0]$', alpha=0.7)
ax.plot(x_hat_seq[1:, 0], label=r'filtered $\hat{x}_t[0]$', ls='--')
ax.plot(y_seq_ex, label='observations $y_t$', alpha=0.4, lw=0.8)
ax.set_xlabel('t')
ax.legend()
plt.show()
```

```{solution-end}
```

```{exercise}
:label: mv_normal_ex6

**PCA vs. factor analysis**

In the classic factor analysis model at the end of the lecture the true
covariance is $\Sigma_y = \Lambda \Lambda' + D$.

(a) Set $\sigma_u = 2$ (instead of $0.5$). Recompute the fraction of
variance explained by the first two principal components and compare
it with the $\sigma_u = 0.5$ result. Explain the change.

(b) Compute the conditional expectation $E[f \mid Y] = BY$ with
$B = \Lambda' \Sigma_y^{-1}$ and show that the implied factor-model fit
$\Lambda E[f \mid Y]$ is **not** equal to the two-component PCA
projection $\hat{Y} = P_{:,1:2}\,\epsilon_{1:2}$. Plot both fits on the same
axes.
+ +(c) In one or two sentences, explain why PCA is misspecified for +factor-analytic data. +``` + +```{solution-start} mv_normal_ex6 +:class: dropdown +``` + +```{code-cell} python3 +import numpy as np +import matplotlib.pyplot as plt + +N_fa = 10 +k_fa = 2 + +Λ_fa = np.zeros((N_fa, k_fa)) +Λ_fa[:N_fa//2, 0] = 1 +Λ_fa[N_fa//2:, 1] = 1 + +results_table = {} +for σu_val in [0.5, 2.0]: + D_fa = np.eye(N_fa) * σu_val ** 2 + Σy_fa = Λ_fa @ Λ_fa.T + D_fa + + λ_fa, P_fa = np.linalg.eigh(Σy_fa) + ind_fa = sorted(range(N_fa), key=lambda x: λ_fa[x], reverse=True) + P_fa = P_fa[:, ind_fa] + λ_fa = λ_fa[ind_fa] + + frac = λ_fa[:2].sum() / λ_fa.sum() + results_table[σu_val] = frac + print(f"σu={σu_val}: fraction explained by first 2 PCs = {frac:.4f}") + +# (b) comparison using σu=0.5 +σu_b = 0.5 +D_b = np.eye(N_fa) * σu_b ** 2 +Σy_b = Λ_fa @ Λ_fa.T + D_b + +μz_b = np.zeros(k_fa + N_fa) +Σz_b = np.block([[np.eye(k_fa), Λ_fa.T], [Λ_fa, Σy_b]]) +z_b = np.random.multivariate_normal(μz_b, Σz_b) +f_b = z_b[:k_fa] +y_b = z_b[k_fa:] + +# factor-analytic E[f|y] +B_b = Λ_fa.T @ np.linalg.inv(Σy_b) +Efy_b = B_b @ y_b + +# PCA projection +λ_b, P_b = np.linalg.eigh(Σy_b) +ind_b = sorted(range(N_fa), key=lambda x: λ_b[x], reverse=True) +P_b = P_b[:, ind_b] +ε_b = P_b.T @ y_b +y_hat_b = P_b[:, :2] @ ε_b[:2] + +fig, ax = plt.subplots(figsize=(8, 4)) +ax.scatter(range(N_fa), Λ_fa @ Efy_b, label=r'Factor-analytic $\Lambda E[f\mid y]$') +ax.scatter(range(N_fa), y_hat_b, marker='x', label=r'PCA projection $\hat{y}$') +ax.scatter(range(N_fa), Λ_fa @ f_b, marker='^', alpha=0.6, label=r'True signal $\Lambda f$') +ax.set_xlabel('observation index') +ax.legend() +plt.show() +``` + +PCA is misspecified for factor-analytic data because it imposes no +structure on the residual covariance: it decomposes $\Sigma_y$ into +eigenvectors that need not align with the factor loadings $\Lambda$. 
+The factor model, by contrast, correctly separates the covariance into a +low-rank systematic part $\Lambda\Lambda'$ and a diagonal idiosyncratic +part $D$, so its conditional expectation $E[f\mid Y]$ is the minimum-variance +linear estimator of the factors. + +```{solution-end} +``` diff --git a/lectures/prob_matrix.md b/lectures/prob_matrix.md index b142b9e39..3708b9599 100644 --- a/lectures/prob_matrix.md +++ b/lectures/prob_matrix.md @@ -1,10 +1,10 @@ --- jupytext: text_representation: - extension: .myst + extension: .md format_name: myst format_version: 0.13 - jupytext_version: 1.13.8 + jupytext_version: 1.17.1 kernelspec: display_name: Python 3 (ipykernel) language: python @@ -465,7 +465,7 @@ $$ An associated conditional distribution is $$ -\textrm{Prob}\{Y=i\vert X=j\} = \frac{\rho_{ij}}{ \sum_{j}\rho_{ij}} +\textrm{Prob}\{Y=j\vert X=i\} = \frac{\rho_{ij}}{ \sum_{j}\rho_{ij}} = \frac{\textrm{Prob}\{Y=j, X=i\}}{\textrm{Prob}\{ X=i\}} $$ @@ -491,7 +491,7 @@ The first row is the probability that $Y=j, j=0,1$ conditional on $X=0$. The second row is the probability that $Y=j, j=0,1$ conditional on $X=1$. Note that -- $\sum_{j}\rho_{ij}= \frac{ \sum_{j}\rho_{ij}}{ \sum_{j}\rho_{ij}}=1$, so each row of the transition matrix $P$ is a probability distribution (not so for each column). +- $\sum_{j}p_{ij}= \frac{ \sum_{j}\rho_{ij}}{ \sum_{j}\rho_{ij}}=1$, so each row of the transition matrix $P$ is a probability distribution (not so for each column). 
@@ -891,11 +891,6 @@ $$ f(x,y) =(2\pi\sigma_1\sigma_2\sqrt{1-\rho^2})^{-1}\exp\left[-\frac{1}{2(1-\rho^2)}\left(\frac{(x-\mu_1)^2}{\sigma_1^2}-\frac{2\rho(x-\mu_1)(y-\mu_2)}{\sigma_1\sigma_2}+\frac{(y-\mu_2)^2}{\sigma_2^2}\right)\right] $$ - -$$ -\frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}\exp\left[-\frac{1}{2(1-\rho^2)}\left(\frac{(x-\mu_1)^2}{\sigma_1^2}-\frac{2\rho(x-\mu_1)(y-\mu_2)}{\sigma_1\sigma_2}+\frac{(y-\mu_2)^2}{\sigma_2^2}\right)\right] -$$ - We start with a bivariate normal distribution pinned down by $$ @@ -1199,7 +1194,7 @@ $$ \mu_{0}= (1-q)(1-r)+(1-q)r & =1-q\\ \mu_{1}= q(1-r)+qr & =q\\ \nu_{0}= (1-q)(1-r)+(1-r)q& =1-r\\ -\mu_{1}= r(1-q)+qr& =r +\nu_{1}= r(1-q)+qr& =r \end{aligned} $$ @@ -1488,3 +1483,405 @@ print(c2_ymtb) We have verified that both joint distributions, $c_1$ and $c_2$, have identical marginal distributions of $X$ and $Y$, respectively. So they are both couplings of $X$ and $Y$. + +**Gaussian Copula Example** + +A **Gaussian copula** uses the bivariate normal distribution to induce dependence between +arbitrary marginal distributions. + +The construction has three steps: + +1. Draw $(Z_1, Z_2)$ from a bivariate standard normal with correlation $\rho$. +2. Apply the standard normal CDF: $U_k = \Phi(Z_k)$. The pair $(U_1, U_2)$ has uniform marginals but retains the dependence structure of $(Z_1, Z_2)$ — this is the copula. +3. Apply the inverse CDF of any desired marginal: $X_k = F_k^{-1}(U_k)$. + +The following code illustrates this with exponential marginals. 
+ +```{code-cell} ipython3 +from scipy import stats + +# Gaussian copula parameters +rho_cop = 0.8 +n_cop = 100_000 + +# Step 1: draw from bivariate standard normal with correlation rho_cop +z = np.random.multivariate_normal( + [0, 0], [[1, rho_cop], [rho_cop, 1]], n_cop +) + +# Step 2: apply normal CDF -> uniform marginals (the copula itself) +u1 = stats.norm.cdf(z[:, 0]) +u2 = stats.norm.cdf(z[:, 1]) + +# Step 3: apply inverse CDFs of desired marginals (here: Exponential) +x1 = stats.expon.ppf(u1, scale=1.0) # Exp with mean 1 +x2 = stats.expon.ppf(u2, scale=0.5) # Exp with mean 0.5 + +fig, axes = plt.subplots(1, 2, figsize=(10, 4)) +axes[0].scatter(u1[:3000], u2[:3000], alpha=0.2, s=2) +axes[0].set_xlabel('$u_1$') +axes[0].set_ylabel('$u_2$') +axes[0].set_title(f'Copula (uniform marginals, ρ={rho_cop})') +axes[1].scatter(x1[:3000], x2[:3000], alpha=0.2, s=2) +axes[1].set_xlabel('$x_1$ (Exp, mean=1)') +axes[1].set_ylabel('$x_2$ (Exp, mean=0.5)') +axes[1].set_title('Exponential marginals via Gaussian copula') +plt.tight_layout() +plt.show() + +print(f"Sample correlation of (x1, x2): {np.corrcoef(x1, x2)[0, 1]:.3f}") +print(f"Sample correlation of (u1, u2): {np.corrcoef(u1, u2)[0, 1]:.3f}") +``` + +The left panel shows the copula itself — the dependence structure in uniform coordinates. +The right panel shows the same dependence translated to exponential marginals. +Changing $\rho$ controls the strength of dependence while the marginals remain unchanged. + +## Exercises + +```{exercise} +:label: prob_matrix_ex1 + +**Independence Test** + +Consider the joint distribution + +$$ +F = \begin{bmatrix} 0.3 & 0.2 \\ 0.1 & 0.4 \end{bmatrix} +$$ + +where $X \in \{0,1\}$ and $Y \in \{10, 20\}$. + +(a) Compute the marginal distributions $\mu_i = \text{Prob}\{X=i\}$ and $\nu_j = \text{Prob}\{Y=j\}$. + +(b) Form the independence matrix $f^{\perp}_{ij} = \mu_i \nu_j$ (the outer product of the two marginal vectors). 
+ +(c) Compare $F$ with $f^{\perp}$ and determine whether $X$ and $Y$ are independent. + +(d) Verify your conclusion by computing $\text{Prob}\{X=0|Y=10\}$ and checking whether it equals $\text{Prob}\{X=0\}$. +``` + +```{solution-start} prob_matrix_ex1 +:class: dropdown +``` + +```{code-cell} ipython3 +import numpy as np + +F = np.array([[0.3, 0.2], + [0.1, 0.4]]) + +# (a) marginals +mu = F.sum(axis=1) # sum over columns -> marginal for X +nu = F.sum(axis=0) # sum over rows -> marginal for Y +print("mu (marginal of X):", mu) +print("nu (marginal of Y):", nu) + +# (b) independence matrix +F_indep = np.outer(mu, nu) +print("\nIndependence matrix (outer product):\n", F_indep) +print("\nActual joint F:\n", F) + +# (c) test independence +print("\nIndependent (F == mu ⊗ nu)?", np.allclose(F, F_indep)) + +# (d) conditional vs. marginal +prob_X0_given_Y10 = F[0, 0] / nu[0] +print(f"\nProb(X=0 | Y=10) = {prob_X0_given_Y10:.4f}") +print(f"Prob(X=0) = {mu[0]:.4f}") +``` + +```{solution-end} +``` + +```{exercise} +:label: prob_matrix_ex2 + +**Covariance and Correlation** + +Using the same joint distribution $F$ and values $X \in \{0,1\}$, $Y \in \{10, 20\}$ as in Exercise 1: + +(a) Compute $\mathbb{E}[X]$, $\mathbb{E}[Y]$, and $\mathbb{E}[XY] = \sum_i \sum_j x_i y_j f_{ij}$. + +(b) Compute $\text{Cov}(X,Y) = \mathbb{E}[XY] - \mathbb{E}[X]\mathbb{E}[Y]$. + +(c) Compute $\text{Cor}(X,Y) = \text{Cov}(X,Y) / (\sigma_X \sigma_Y)$. + +(d) Show analytically that $X \perp Y$ implies $\text{Cov}(X,Y) = 0$. 
+``` + +```{solution-start} prob_matrix_ex2 +:class: dropdown +``` + +```{code-cell} ipython3 +import numpy as np + +xs = np.array([0, 1]) +ys = np.array([10, 20]) +F = np.array([[0.3, 0.2], + [0.1, 0.4]]) + +mu = F.sum(axis=1) +nu = F.sum(axis=0) + +# (a) +E_X = xs @ mu +E_Y = ys @ nu +E_XY = sum(xs[i] * ys[j] * F[i, j] for i in range(2) for j in range(2)) +print(f"E[X] = {E_X}, E[Y] = {E_Y}, E[XY] = {E_XY}") + +# (b) +cov_XY = E_XY - E_X * E_Y +print(f"Cov(X,Y) = {cov_XY:.4f}") + +# (c) +var_X = ((xs - E_X)**2) @ mu +var_Y = ((ys - E_Y)**2) @ nu +cor_XY = cov_XY / np.sqrt(var_X * var_Y) +print(f"Cor(X,Y) = {cor_XY:.4f}") +``` + +For part (d): if $X \perp Y$ then $f_{ij} = \mu_i \nu_j$, so + +$$ +\mathbb{E}[XY] = \sum_i \sum_j x_i y_j \mu_i \nu_j += \left(\sum_i x_i \mu_i\right)\!\left(\sum_j y_j \nu_j\right) += \mathbb{E}[X]\,\mathbb{E}[Y] +$$ + +and therefore $\text{Cov}(X,Y) = \mathbb{E}[XY] - \mathbb{E}[X]\mathbb{E}[Y] = 0$. + +```{solution-end} +``` + +```{exercise} +:label: prob_matrix_ex3 + +**Sum of Two Dice (Convolution)** + +Let $X$ and $Y$ each be uniformly distributed on $\{1,2,3,4,5,6\}$, and let $Z = X + Y$. + +(a) Use the convolution formula $h_k = \sum_i f_i g_{k-i}$ to compute the distribution of $Z$. + +(b) Plot the theoretical distribution. + +(c) Simulate $10^6$ rolls and overlay the empirical histogram on the plot. + +(d) Compute $\mathbb{E}[Z]$ and $\text{Var}(Z)$ both from the theoretical distribution and from the simulation. 
+``` + +```{solution-start} prob_matrix_ex3 +:class: dropdown +``` + +```{code-cell} ipython3 +import numpy as np +import matplotlib.pyplot as plt + +# (a) convolution +f = np.ones(6) / 6 +h = np.convolve(f, f) # Z takes values 2,...,12 +z_vals = np.arange(2, 13) + +# (b & c) plot theory and simulation +n = 1_000_000 +z_sim = np.random.randint(1, 7, n) + np.random.randint(1, 7, n) +counts = np.bincount(z_sim, minlength=13)[2:] + +fig, ax = plt.subplots() +ax.bar(z_vals - 0.2, h, 0.4, alpha=0.7, label='Theoretical') +ax.bar(z_vals + 0.2, counts / n, 0.4, alpha=0.7, label='Empirical') +ax.set_xlabel('Z = X + Y') +ax.set_ylabel('Probability') +ax.legend() +plt.show() + +# (d) moments +E_Z = z_vals @ h +Var_Z = ((z_vals - E_Z)**2) @ h +print(f"Theory: E[Z] = {E_Z:.2f}, Var(Z) = {Var_Z:.4f}") +print(f"Simulation: E[Z] = {np.mean(z_sim):.2f}, Var(Z) = {np.var(z_sim):.4f}") +``` + +```{solution-end} +``` + +```{exercise} +:label: prob_matrix_ex4 + +**Multi-Step Transition Probabilities** + +Consider a two-state Markov chain with transition matrix + +$$ +P = \begin{bmatrix} 0.9 & 0.1 \\ 0.2 & 0.8 \end{bmatrix} +$$ + +where $p_{ij} = \text{Prob}\{X(t+1)=j \mid X(t)=i\}$. + +(a) Starting from $\psi_0 = [1, 0]$, compute $\psi_n = \psi_0 P^n$ for $n = 1, 5, 20, 100$. + +(b) Find the stationary distribution $\psi^*$ satisfying $\psi^* P = \psi^*$ and $\sum_i \psi^*_i = 1$. + +(c) Verify numerically that $\psi_n \to \psi^*$ as $n$ grows. 
+``` + +```{solution-start} prob_matrix_ex4 +:class: dropdown +``` + +```{code-cell} ipython3 +import numpy as np + +P = np.array([[0.9, 0.1], + [0.2, 0.8]]) +psi0 = np.array([1.0, 0.0]) + +# (a) +for n in [1, 5, 20, 100]: + print(f"psi_{n:3d} = {psi0 @ np.linalg.matrix_power(P, n)}") + +# (b) stationary: solve (P^T - I) psi = 0 with sum = 1 +A = np.vstack([P.T - np.eye(2), np.ones(2)]) +b = np.array([0.0, 0.0, 1.0]) +psi_star, *_ = np.linalg.lstsq(A, b, rcond=None) +print(f"\nStationary distribution: {psi_star}") + +# (c) verify +psi_100 = psi0 @ np.linalg.matrix_power(P, 100) +print(f"psi_100 close to stationary? {np.allclose(psi_100, psi_star, atol=1e-6)}") +``` + +```{solution-end} +``` + +```{exercise} +:label: prob_matrix_ex5 + +**Fréchet–Hoeffding Bounds** + +Let $X \in \{0,1\}$ and $Y \in \{0,1\}$ with marginals $\mu = [0.5,\, 0.5]$ and $\nu = [0.4,\, 0.6]$. + +(a) Construct the **comonotone** (upper Fréchet) coupling that puts as much mass as possible on the diagonal $\{X=i, Y=i\}$. + +(b) Construct the **counter-monotone** (lower Fréchet) coupling that puts as much mass as possible on the anti-diagonal. + +(c) Construct the **independent** coupling $f^{\perp}_{ij} = \mu_i \nu_j$. + +(d) Verify that all three have the correct marginals. + +(e) For each coupling compute $\text{Cor}(X,Y)$. Which maximises / minimises the correlation? 
+``` + +```{solution-start} prob_matrix_ex5 +:class: dropdown +``` + +```{code-cell} ipython3 +import numpy as np + +xs = np.array([0, 1]) +ys = np.array([0, 1]) +mu = np.array([0.5, 0.5]) +nu = np.array([0.4, 0.6]) + +# (a) upper Fréchet: maximise P(X=i, Y=i) +F_upper = np.array([[0.4, 0.1], + [0.0, 0.5]]) + +# (b) lower Fréchet: maximise P(X=i, Y=1-i) +F_lower = np.array([[0.0, 0.5], + [0.4, 0.1]]) + +# (c) independent +F_indep = np.outer(mu, nu) + +# (d) check marginals +for F, name in [(F_upper, "Upper Fréchet"), + (F_lower, "Lower Fréchet"), + (F_indep, "Independent ")]: + print(f"{name}: row sums = {F.sum(axis=1)}, col sums = {F.sum(axis=0)}") + +# (e) correlations +def correlation(F, xs, ys): + mu_x = F.sum(axis=1) + nu_y = F.sum(axis=0) + E_X = xs @ mu_x + E_Y = ys @ nu_y + E_XY = sum(xs[i]*ys[j]*F[i,j] for i in range(2) for j in range(2)) + cov = E_XY - E_X * E_Y + sig_X = np.sqrt(((xs - E_X)**2) @ mu_x) + sig_Y = np.sqrt(((ys - E_Y)**2) @ nu_y) + return cov / (sig_X * sig_Y) + +print(f"\nCor upper Fréchet = {correlation(F_upper, xs, ys):.4f} (maximum)") +print(f"Cor lower Fréchet = {correlation(F_lower, xs, ys):.4f} (minimum)") +print(f"Cor independent = {correlation(F_indep, xs, ys):.4f}") +``` + +```{solution-end} +``` + +```{exercise} +:label: prob_matrix_ex6 + +**Bayes' Law with a Discrete Prior** + +A coin has unknown bias $\theta \in \{0.2,\, 0.5,\, 0.8\}$ with prior $\pi = [0.25,\, 0.50,\, 0.25]$. + +(a) After observing $k = 7$ heads in $n = 10$ flips, compute the likelihood + +$$ +\mathcal{L}(\theta \mid \text{data}) = \binom{10}{7}\,\theta^7\,(1-\theta)^3 +$$ + +for each $\theta$. + +(b) Apply equation {eq}`eq:condprobbayes` to compute the posterior $\pi(\theta \mid \text{data})$. + +(c) Plot the prior and posterior side by side. + +(d) Repeat for $k = 3$ heads and describe how the posterior shifts. 
+``` + +```{solution-start} prob_matrix_ex6 +:class: dropdown +``` + +```{code-cell} ipython3 +import numpy as np +import matplotlib.pyplot as plt +from scipy.special import comb + +thetas = np.array([0.2, 0.5, 0.8]) +prior = np.array([0.25, 0.50, 0.25]) + +def compute_posterior(k, n, thetas, prior): + likelihood = comb(n, k) * thetas**k * (1 - thetas)**(n - k) + unnorm = likelihood * prior + return unnorm / unnorm.sum(), likelihood + +post7, lik7 = compute_posterior(7, 10, thetas, prior) +post3, lik3 = compute_posterior(3, 10, thetas, prior) + +print("k=7: likelihood =", lik7.round(4), " posterior =", post7.round(4)) +print("k=3: likelihood =", lik3.round(4), " posterior =", post3.round(4)) + +x = np.arange(len(thetas)) +w = 0.3 +fig, axes = plt.subplots(1, 2, figsize=(10, 4)) +for ax, post, title in zip(axes, [post7, post3], ['k=7 heads', 'k=3 heads']): + ax.bar(x - w/2, prior, w, label='Prior', alpha=0.7) + ax.bar(x + w/2, post, w, label='Posterior', alpha=0.7) + ax.set_xticks(x) + ax.set_xticklabels([f'θ={t}' for t in thetas]) + ax.set_ylabel('Probability') + ax.set_title(title) + ax.legend() +plt.tight_layout() +plt.show() +``` + +```{solution-end} +``` From cc8d3dcc5db8944f35e2f9736cc050f587fa0de0 Mon Sep 17 00:00:00 2001 From: Humphrey Yang Date: Tue, 21 Apr 2026 15:46:35 +1000 Subject: [PATCH 03/26] updates --- .../quantecon-lecture-writing.instructions.md | 87 ----- .github/prompts/quantecon-lecture.prompt.md | 209 ----------- lectures/information_market_equilibrium.md | 352 +++++++++++------- lectures/lagrangian_lqdp.md | 60 ++- lectures/multivariate_normal.md | 160 ++++---- lectures/prob_matrix.md | 222 ++++++----- 6 files changed, 431 insertions(+), 659 deletions(-) delete mode 100644 .github/instructions/quantecon-lecture-writing.instructions.md delete mode 100644 .github/prompts/quantecon-lecture.prompt.md diff --git a/.github/instructions/quantecon-lecture-writing.instructions.md b/.github/instructions/quantecon-lecture-writing.instructions.md 
deleted file mode 100644 index ee3b607f7..000000000 --- a/.github/instructions/quantecon-lecture-writing.instructions.md +++ /dev/null @@ -1,87 +0,0 @@ ---- -applyTo: "lectures/**/*.md" -description: "MyST markdown and QuantEcon lecture writing conventions. Applied when editing or creating files in the lectures/ directory." ---- - -# QuantEcon Lecture Writing Conventions - -## Equation Spacing (Critical) - -Display equations **must** have a blank line before `$$` and after `$$`: - -``` -text before - -$$ -equation here -$$ - -text after -``` - -Never place `$$` immediately adjacent to text lines. - -## File Frontmatter - -Every lecture `.md` file starts with: - -```yaml ---- -jupytext: - text_representation: - extension: .md - format_name: myst - format_version: 0.13 - jupytext_version: 1.17.1 -kernelspec: - display_name: Python 3 (ipykernel) - language: python - name: python3 ---- -``` - -## Cross-Reference Label - -Immediately after frontmatter, before the title: - -``` -(lecture_label)= -```{raw} jupyter -
...
-``` - -# Title -``` - -## Code Cells - -All executable Python uses `` ```{code-cell} ipython3 ``. -Non-executable code uses `` ```python ``. - -## Citations and References - -- Cite with `{cite}` `` `BibKey` `` -- Check `lectures/_static/quant-econ.bib` for existing keys before adding new ones -- New references go in a separate `_extra.bib` file alongside the lecture - -## Exercises - -Use the paired directives: - -``` -```{exercise} -:label: label_ex1 -... -``` - -```{solution-start} label_ex1 -:class: dropdown -``` -... -```{solution-end} -``` -``` - -## Preferred Python Libraries - -`numpy`, `scipy`, `matplotlib`, `quantecon`, `jax` (for computationally intensive work), `numba` diff --git a/.github/prompts/quantecon-lecture.prompt.md b/.github/prompts/quantecon-lecture.prompt.md deleted file mode 100644 index 453651434..000000000 --- a/.github/prompts/quantecon-lecture.prompt.md +++ /dev/null @@ -1,209 +0,0 @@ ---- -name: "QuantEcon Lecture from Paper" -description: "Convert a scientific paper (PDF or .tex) into a QuantEcon lecture in MyST markdown. Attach the paper file before invoking. Produces a .md lecture file and a supplementary .bib file." -argument-hint: "Attach the paper PDF or .tex file, then optionally specify the desired output filename (e.g. 'my_topic.md')" -agent: "agent" ---- - -You are helping Thomas Sargent convert a scientific paper into a QuantEcon lecture written in the MyST dialect of markdown, following the style and conventions of [lectures/likelihood_ratio_process.md](../lectures/likelihood_ratio_process.md). - -## Your Task - -1. **Read the attached paper** (PDF or .tex). Understand its core economic/mathematical content, key results, key intuitions, and analytical techniques. - -2. **Draft a complete QuantEcon lecture** as a `.md` file in `lectures/`. 
The lecture should: - - Explain the paper's ideas accessibly to a graduate student audience - - Lead the reader through the theory step by step, not just summarize - - Include substantial Python code cells that illustrate, compute, and visualize the paper's key results - - End with exercises (with full solutions in dropdown blocks) - -3. **Produce a supplementary `.bib` file** for any references not already in [lectures/_static/quant-econ.bib](../lectures/_static/quant-econ.bib). - ---- - -## MyST / Jupyter Book Format Rules - -Follow these rules exactly. Study [lectures/likelihood_ratio_process.md](../lectures/likelihood_ratio_process.md) as the canonical example. - -### File Frontmatter (required, verbatim structure) - -``` ---- -jupytext: - text_representation: - extension: .md - format_name: myst - format_version: 0.13 - jupytext_version: 1.17.1 -kernelspec: - display_name: Python 3 (ipykernel) - language: python - name: python3 ---- -``` - -### Required Header Block - -Immediately after the frontmatter, add a cross-reference label and the QuantEcon logo block: - -``` -(my_lecture_label)= -```{raw} jupyter - -``` - -# Lecture Title - -```{contents} Contents -:depth: 2 -``` -``` - -### Equations — CRITICAL SPACING RULE - -Every display equation block **must** have a blank line before the opening `$$` and a blank line after the closing `$$`. This is mandatory. - -**Correct:** - -``` -some text before - -$$ -E[x] = \mu -$$ - -some text after -``` - -**Wrong (will break the build):** - -``` -some text before -$$ -E[x] = \mu -$$ -some text after -``` - -Inline math uses single dollars: $\mu$, $\sigma^2$. - -Multi-line aligned equations use: - -``` -$$ -\begin{aligned} -a &= b \\ -c &= d -\end{aligned} -$$ -``` - -### Code Cells - -Use ` ```{code-cell} ipython3 ` for all executable Python. 
For the `pip install` cell at the top (if needed): - -``` -```{code-cell} ipython3 -:tags: [hide-output] -!pip install --upgrade quantecon -``` -``` - -### Citations - -Use `{cite}` with the BibTeX key: `{cite}` `` `Author_Year` ``. Example: `{cite}` `` `Neyman_Pearson` ``. - -Check [lectures/_static/quant-econ.bib](../lectures/_static/quant-econ.bib) first. Add only truly missing references to the new `.bib` file. - -### Cross-references - -- Link to other lectures: `{doc}` `` `likelihood_ratio_process` `` -- Label a section: `(my_label)=` on the line before the heading -- Reference a label: `{ref}` `` `my_label` `` - -### Admonitions - -``` -```{note} -... -``` - -```{warning} -... -``` -``` - -### Exercises with Solutions - -``` -```{exercise} -:label: ex_label1 - -Exercise text here. -``` - -```{solution-start} ex_label1 -:class: dropdown -``` - -Full solution here, including code cells if needed. - -```{solution-end} -``` -``` - ---- - -## Lecture Structure Template - -Follow this section order: - -1. **Overview** — What is this lecture about? What will the reader learn? List bullets. -2. **Setup** — Imports code cell (all needed libraries). If non-standard packages are needed, add the `pip install` cell first. -3. **Theory sections** — Walk through mathematical content. Alternate prose, equations, and code cells. Each major concept gets its own `##` section. -4. **Computational/Simulation sections** — Python code that replicates or extends the paper's numerical results. -5. **Exercises** — 2–4 exercises ranging from straightforward to challenging, each with a full solution. -6. **References** — at the end, just add: `` ```{bibliography} `` on its own if references were cited (the global bib handles this automatically via `_config.yml`). 
- ---- - -## Python Code Guidelines - -- Use `numpy`, `scipy`, `matplotlib`, `quantecon` as the default stack -- Prefer `jax.numpy` / JAX for computationally intensive sections (this repo already has JAX installed) -- Every figure should call `plt.show()` or `plt.tight_layout(); plt.show()` -- Write clean, readable code with short docstrings on functions -- Simulate and plot the paper's key theoretical results rather than just describing them - ---- - -## Supplementary BibTeX File - -Name it `lectures/_static/_extra.bib`. Format example: - -```bibtex -@article{Author_Year, - author = {Last, First and Last2, First2}, - title = {Full Title of the Paper}, - journal = {Journal Name}, - volume = {XX}, - number = {Y}, - pages = {1--30}, - year = {YYYY} -} -``` - -Only include references **not already found** in `lectures/_static/quant-econ.bib`. - ---- - -## Output - -Produce the complete lecture as a single MyST markdown file. After completing it, also report: -- The name and path of the output file (e.g. `lectures/my_topic.md`) -- The name and path of the supplementary bib file (if any new references were needed) -- A brief (3–5 bullet) summary of what the lecture covers diff --git a/lectures/information_market_equilibrium.md b/lectures/information_market_equilibrium.md index 222f9bccb..851560d7b 100644 --- a/lectures/information_market_equilibrium.md +++ b/lectures/information_market_equilibrium.md @@ -28,42 +28,51 @@ kernelspec: ## Overview -This lecture studies two questions about the **informational role of prices** posed and +This lecture studies two questions about the **informational role of prices** posed and answered by {cite:t}`kihlstrom_mirman1975`. -1. **When do prices transmit inside information?** An informed insider observes a private +1. *When do prices transmit inside information?* + - An informed insider observes a private signal correlated with an unknown state of the world and adjusts demand accordingly. - Equilibrium prices shift. 
Under what conditions can an outside observer *infer* the + - Equilibrium prices shift. + - Under what conditions can an outside observer *infer* the insider's private signal from the equilibrium price? -2. **Do Bayesian price expectations converge?** In a stationary stochastic exchange - economy, an uninformed observer uses the history of market prices and Bayes' Law to form - expectations about the economy's structure. Do those expectations eventually - agree with those of a fully informed observer? +2. *Do Bayesian price expectations converge?* + - In a stationary stochastic exchange + economy, an uninformed observer uses the history of market prices and Bayes' Law to form + expectations about the economy's structure. + - Do those expectations eventually + agree with those of a fully informed observer? Kihlstrom and Mirman's answers rely on two classical ideas from statistics: -- **Blackwell sufficiency**: a random variable $\tilde{y}$ is said to be *sufficient* for a random variable +- **Blackwell sufficiency**: a random variable $\tilde{y}$ is said to be *sufficient* for a random variable $\tilde{y}'$ with respect to an unknown state if knowing $\tilde{y}$ gives all the information about the state that $\tilde{y}'$ contains. - **Bayesian consistency**: as the sample grows, a Bayesian statistician's posterior probability distribution concentrates on the true - parameter value *even when the underlying economic structure is not globally identified from prices alone*. + parameter value, even when the underlying economic structure is not globally identified from prices alone. Important findings of {cite:t}`kihlstrom_mirman1975` are: -- Equilibrium prices transmit inside information **if and only if** the map from the +- Equilibrium prices transmit inside information *if and only if* the map from the insider's posterior distribution to the equilibrium price vector is invertible (one-to-one). 
- For a two-state pure exchange economy with CES preferences, invertibility holds whenever the - elasticity of substitution $\sigma \neq 1$. With Cobb-Douglas preferences ($\sigma = 1$) + elasticity of substitution $\sigma \neq 1$. + - With Cobb-Douglas preferences ($\sigma = 1$) the equilibrium price is independent of the insider's posterior, so information is never transmitted. -- In the dynamic economy, as information accumulates, Bayesian price expectations converge to **rational expectations**, even when the deep structure of the economy is not identified. +- In the dynamic economy, as information accumulates, Bayesian price expectations converge to **rational expectations**, even when the deep structure of the economy is not identified. ```{note} -{cite:t}`kihlstrom_mirman1975` use the terms ''reduced form'' and ''structural'' models in a -way that careful econometricians do. Reduced-form and structural models come in pairs. To each structure or structural model -there is a reduced form, or collection of reduced forms, underlying different possible regressions. +{cite:t}`kihlstrom_mirman1975` use the terms "reduced form" and "structural" models in a +way that careful econometricians do. + +Reduced-form and structural models come in pairs. + +To each structure or structural model +there is a reduced form, or collection of reduced forms, underlying different possible regressions. ``` The lecture is organized as follows. @@ -71,7 +80,7 @@ The lecture is organized as follows. 1. Set up the static two-commodity model and define equilibrium. 2. State the price-revelation theorem (Theorem 1 of the paper) and the invertibility conditions (Theorem 2). -3. Illustrate invertibility — and its failure — with numerical examples using CES and +3. Illustrate invertibility and its failure with numerical examples using CES and Cobb-Douglas preferences. 4. Introduce the dynamic stochastic economy and derive the Bayesian convergence result. 5. 
Simulate Bayesian learning from price observations. @@ -87,9 +96,9 @@ from scipy.optimize import brentq from scipy.stats import norm ``` -## A Two-Commodity Economy with an Informed Insider +## A two-commodity economy with an informed insider -### Preferences, Endowments, and the Unknown State +### Preferences, endowments, and the unknown state The economy has two goods. @@ -115,7 +124,7 @@ representative firm. The firm's profit $\pi$ is determined by profit maximization. Agent -$i$'s **budget constraint** is +$i$'s budget constraint is $$ p x_1^i + x_2^i = w^i + \theta^i \pi. @@ -126,7 +135,7 @@ Agents maximize expected utility subject to their budget constraints. A **competitive equilibrium** is a price $\hat{p}$ that clears both markets simultaneously. -### The Informed Agent's Problem +### The informed agent's problem Suppose **agent 1** (the insider) observes a private signal $\tilde{y}$ correlated with $\bar{a}$ before trading. @@ -151,9 +160,9 @@ This is possible when the map $\mu \mapsto p(\mu)$ is **invertible** on the relevant domain. (price_revelation_theorem)= -## Price Revelation: Theorem 1 +## Price revelation -### Blackwell Sufficiency +### Blackwell sufficiency The price variable $p(\mu_{\tilde{y}})$ *accurately transmits* the insider's private information if observing the equilibrium price is just as informative about $\bar{a}$ as @@ -162,9 +171,12 @@ observing the signal $\tilde{y}$ directly. In Blackwell's language ({cite}`blackwell1951` and {cite}`blackwell1953`), this means $p(\mu_{\tilde{y}})$ is **sufficient** for $\tilde{y}$. 
-**Definition.** A random variable $\tilde{y}$ is *sufficient* for $\tilde{y}'$ (with +```{prf:definition} Sufficiency +:label: ime_def_sufficiency + +A random variable $\tilde{y}$ is *sufficient* for $\tilde{y}'$ (with respect to $\bar{a}$) if there exists a conditional distribution $PR(y' \mid y)$, -**independent of**$\bar{a}$, such that +**independent of** $\bar{a}$, such that $$ \phi'_a(y') = \sum_{y \in Y} PR(y' \mid y)\, \phi_a(y) @@ -175,11 +187,17 @@ where $\phi_a(y) = PR(\tilde{y} = y \mid \bar{a} = a)$. Thus, once $\tilde{y}$ is known, $\tilde{y}'$ provides no additional information about $\bar{a}$. +``` + +```{prf:lemma} Posterior Sufficiency +:label: ime_lemma_posterior_sufficiency -**Lemma 1** ({cite:t}`kihlstrom_mirman1975`). The posterior distribution $\mu_{\tilde{y}}$ -is sufficient for $\tilde{y}$. +({cite:t}`kihlstrom_mirman1975`) The posterior distribution $\mu_{\tilde{y}}$ +is sufficient for $\tilde{y}$. +``` -*Proof sketch.* The posterior $\mu_{\tilde{y}}$ satisfies +```{prf:proof} (Sketch) +The posterior $\mu_{\tilde{y}}$ satisfies $$ PR(\bar{a} = a_s \mid \mu_{\tilde{y}} = \mu_y,\; \tilde{y} = y) = \mu_{ys} @@ -187,9 +205,13 @@ PR(\bar{a} = a_s \mid \mu_{\tilde{y}} = \mu_y,\; \tilde{y} = y) = \mu_{ys} $$ Because the posterior itself *encodes* what $\tilde{y}$ says about $\bar{a}$, observing -$\tilde{y}$ directly would add no information. $\square$ +$\tilde{y}$ directly would add no information. +``` + +```{prf:theorem} Price Revelation +:label: ime_theorem_price_revelation -**Theorem 1** ({cite:t}`kihlstrom_mirman1975`). In the economy described above, the price +In the economy described above, the price random variable $p(\mu_{\tilde{y}})$ is sufficient for $\tilde{y}$ **if and only if** the function $p(PR^1)$ is **invertible** on the set @@ -197,25 +219,34 @@ $$ P \equiv \bigl\{\, p(\mu_y) : y \in Y,\; PR(\tilde{y} = y) = \sum_{a \in A} \phi_a(y)\,\mu(a) > 0 \bigr\}. 
$$ +``` The "only if" direction follows because if $p$ were not one-to-one, two different posteriors would generate the same price; an observer could not distinguish them, so the price would -not transmit all information that resides in the signal. +not transmit all information that resides in the signal. + +### Two interpretations + +#### Insider trading in a stock market -### Two Interpretations +Good 1 is a risky asset with random return $\bar{a}$; good 2 is "money". + +An insider's demand reveals private information about the return. -**Insider trading in a stock market.** Good 1 is a risky asset with random return $\bar{a}$; -good 2 is ''money''. An insider's demand reveals private information about the return. If the invertibility condition holds, outside observers can read the insider's signal from the equilibrium stock price. -**Price as a quality signal.** Good 1 has uncertain quality $\bar{a}$. Experienced -consumers (who have sampled the good) observe a signal correlated with quality and buy -accordingly. Uninformed consumers can infer quality from the market price, provided -invertibility holds. +#### Price as a quality signal + +Good 1 has uncertain quality $\bar{a}$. + +Experienced consumers (who have sampled the good) observe a signal correlated with quality +and buy accordingly. + +Uninformed consumers can infer quality from the market price, provided invertibility holds. (invertibility_conditions)= -## Invertibility and the Elasticity of Substitution (Theorem 2) +## Invertibility and the elasticity of substitution When does $p(PR^1)$ fail to be invertible? @@ -223,7 +254,7 @@ Theorem 2 of {cite:t}`kihlstrom_mirman1975` shows that for a two-state economy ($S = 2$), the answer turns on the **elasticity of substitution** $\sigma$ of agent 1's utility function. 
-### The Two-State First-Order Condition +### The two-state first-order condition With $S = 2$ and $\mu = (q,\, 1-q)$, the first-order condition for agent 1's demand (equation (12a) in the paper) reduces to @@ -242,21 +273,25 @@ $$ The equilibrium consumption $(x_1, x_2)$ itself depends on $p$, so this is an implicit equation in $p$. -**Theorem 2** ({cite:t}`kihlstrom_mirman1975`). Assume $u^1$ is quasi-concave and -homothetic with continuous first partials. Assume agent 1 always consumes positive -quantities of both goods. For $S = 2$: +```{prf:theorem} Invertibility Conditions +:label: ime_theorem_invertibility_conditions + +Assume $u^1$ is quasi-concave and +homothetic with continuous first partials. Assume agent 1 always consumes positive +quantities of both goods. For $S = 2$: - If $\sigma < 1$ for all feasible allocations, $p(PR^1)$ is **invertible** on $P$. - If $\sigma > 1$ for all feasible allocations, $p(PR^1)$ is **invertible** on $P$. - If $u^1$ is **Cobb-Douglas** ($\sigma = 1$), $p(PR^1)$ is **constant** on $P$ (no information is transmitted). +``` Thus, when $\sigma = 1$ the income and substitution effects exactly cancel, making agent 1's demand for good 1 independent of information about $\bar{a}$. So the market price cannot reveal that information. -### CES Utility +### CES utility For concreteness we work with the **constant-elasticity-of-substitution** (CES) utility function @@ -278,10 +313,10 @@ u_1(c_1,c_2) = \bigl(c_1^\rho + c_2^\rho\bigr)^{1/\rho - 1}\, c_1^{\rho-1}, \qqu u_2(c_1,c_2) = \bigl(c_1^\rho + c_2^\rho\bigr)^{1/\rho - 1}\, c_2^{\rho-1}. $$ -### Equilibrium Price as a Function of the Posterior +### Equilibrium price as a function of the posterior We focus on agent 1 as the *only* informed trader who absorbs one unit of good 1 at -equilibrium (i.e.~$x_1 = 1$). +equilibrium (i.e., $x_1 = 1$). 
Agent 1's budget constraint then reduces to $x_2 = W^1 - p$, and the equilibrium price is the unique $p \in (0, W^1)$ satisfying @@ -292,11 +327,11 @@ p \bigl[q\, u_2(a_1,\, W^1-p) + (1-q)\, u_2(a_2,\, W^1-p)\bigr] = q\, a_1\, u_1(a_1,\, W^1-p) + (1-q)\, a_2\, u_1(a_2,\, W^1-p). $$ -For Cobb-Douglas utility ($\sigma = 1$), first-order-necessary conditions (FOC) become $p = W^1 - p$, +For Cobb-Douglas utility ($\sigma = 1$), the first-order condition becomes $p = W^1 - p$, giving $p^* = W^1/2$ regardless of the posterior $q$—confirming that no information is transmitted through the price in the Cobb-Douglas case. -We compute first-order-necessary conditions numerically below. +We compute first-order conditions numerically below. ```{code-cell} ipython3 def ces_derivatives(c1, c2, rho): @@ -351,6 +386,12 @@ def eq_price(q, a1, a2, W1, rho): ``` ```{code-cell} ipython3 +--- +mystnb: + figure: + caption: equilibrium price vs posterior + name: fig-eq-price-posterior +--- # ── Economy parameters ────────────────────────────────────────────────────── a1, a2 = 2.0, 0.5 # state values (a1 > a2) W1 = 4.0 # informed agent's wealth; equilibrium x2 = W1 - p @@ -371,17 +412,15 @@ for rho, label, color in zip(rho_values, rho_labels, colors): prices = [eq_price(q, a1, a2, W1, rho) for q in q_grid] ax.plot(q_grid, prices, label=label, color=color, lw=2) -ax.set_xlabel(r"Posterior probability $q = \Pr(\bar{a} = a_1)$", fontsize=12) -ax.set_ylabel("Equilibrium price $p^*(q)$", fontsize=12) -ax.set_title("Equilibrium price as a function of the informed agent's posterior", - fontsize=12) +ax.set_xlabel(r"posterior probability $q = \Pr(\bar{a} = a_1)$", fontsize=12) +ax.set_ylabel("equilibrium price $p^*(q)$", fontsize=12) ax.legend(fontsize=10) ax.grid(alpha=0.3) plt.tight_layout() plt.show() ``` -The plot confirms Theorem 2. +The plot confirms {prf:ref}`ime_theorem_invertibility_conditions`. - **CES with $\sigma \neq 1$**: the equilibrium price is **strictly monotone** in $q$. 
An outside observer who knows the equilibrium map $p^*(\cdot)$ can uniquely invert the @@ -390,7 +429,7 @@ The plot confirms Theorem 2. transmitted through the market. ```{code-cell} ipython3 -# ── Verify that rho=0 (exact Cobb-Douglas) gives a flat line ───────────────── +# Verify that rho=0 (exact Cobb-Douglas) gives a flat line p_cd = [eq_price(q, a1, a2, W1, rho=0.0) for q in q_grid] print(f"Cobb-Douglas (rho=0): min p* = {min(p_cd):.6f}, " @@ -403,7 +442,7 @@ Every entry equals $W^1/2 = 2.0$ exactly, confirming analytically that the Cobb- equilibrium price is independent of $q$ and of the state values $a_1, a_2$. (price_monotonicity)= -### Why Monotonicity Depends on $\sigma$ +### Why monotonicity depends on $\sigma$ The derivative $\partial p / \partial q$ has the sign of $\alpha_1 \beta_2 - \alpha_2 \beta_1$ (from differentiating the FOC formula). @@ -435,6 +474,12 @@ Let us visualize the ratio $\alpha_s / \beta_s$ as a function of $a_s$ for diffe values of $\sigma$: ```{code-cell} ipython3 +--- +mystnb: + figure: + caption: marginal rate of substitution + name: fig-mrs-alpha-beta +--- a_vals = np.linspace(0.3, 3.0, 300) x1_fix, x2_fix = 1.0, 1.0 # fix consumption bundle for illustration @@ -448,9 +493,8 @@ for rho, color in zip([-0.5, -1e-6, 0.5], ["steelblue", "crimson", "forestgreen" ax.plot(a_vals, ratios, label=rf"$\sigma = {sigma:.2f}$", color=color, lw=2) -ax.set_xlabel(r"State value $a_s$", fontsize=12) +ax.set_xlabel(r"state value $a_s$", fontsize=12) ax.set_ylabel(r"$\alpha_s / \beta_s = a_s u_1 / u_2$", fontsize=12) -ax.set_title(r"Marginal rate of substitution $\alpha_s/\beta_s$ vs.\ $a_s$", fontsize=12) ax.axhline(y=1.0, color="black", lw=0.8, ls="--") ax.legend(fontsize=10) ax.grid(alpha=0.3) @@ -466,13 +510,15 @@ ratio is decreasing in $a_s$, and for $\sigma > 1$ it is increasing, making the equilibrium price strictly monotone in the posterior $q$ in both cases. 
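As an independent cross-check of the monotonicity discussion, here is a self-contained sketch that re-derives the CES marginal utilities from scratch rather than reusing the lecture's `ces_derivatives` and `eq_price`, with the same parameter values $a_1 = 2$, $a_2 = 0.5$, $W^1 = 4$; the Cobb-Douglas case is handled as an explicit special case rather than as a $\rho \to 0$ limit.

```python
import numpy as np
from scipy.optimize import brentq

def marginal_utils(c1, c2, rho):
    """Marginal utilities of u = (c1^rho + c2^rho)^(1/rho);
    rho = 0 is treated as the Cobb-Douglas case sqrt(c1 * c2)."""
    if rho == 0.0:
        return 0.5 * np.sqrt(c2 / c1), 0.5 * np.sqrt(c1 / c2)
    common = (c1**rho + c2**rho) ** (1 / rho - 1)
    return common * c1 ** (rho - 1), common * c2 ** (rho - 1)

def price(q, a1, a2, W, rho):
    """Equilibrium price: root of the market-clearing FOC with x1 = 1."""
    def F(p):
        u1_hi, u2_hi = marginal_utils(a1, W - p, rho)
        u1_lo, u2_lo = marginal_utils(a2, W - p, rho)
        return (q * a1 * u1_hi + (1 - q) * a2 * u1_lo
                - p * (q * u2_hi + (1 - q) * u2_lo))
    return brentq(F, 1e-6, W - 1e-6)

a1, a2, W = 2.0, 0.5, 4.0
qs = np.linspace(0.05, 0.95, 19)
p_lo = np.array([price(q, a1, a2, W, rho=-0.5) for q in qs])  # sigma = 2/3
p_hi = np.array([price(q, a1, a2, W, rho=0.5) for q in qs])   # sigma = 2
p_cd = np.array([price(q, a1, a2, W, rho=0.0) for q in qs])   # sigma = 1
```

Numerically, the price is strictly monotone in $q$ on both sides of $\sigma = 1$ (increasing for $\sigma = 2$, decreasing for $\sigma = 2/3$ with these parameters), while every Cobb-Douglas price equals $W^1/2 = 2$ exactly.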
(bayesian_price_expectations)= -## Bayesian Price Expectations in a Dynamic Economy +## Bayesian price expectations in a dynamic economy We now turn to a question addressed in Section 3 of {cite:t}`kihlstrom_mirman1975`. -### A Stochastic Exchange Economy +### A stochastic exchange economy + +Time is discrete: $t = 1, 2, \ldots$ -Time is discrete: $t = 1, 2, \ldots$ In each period $t$: +In each period $t$: 1. Consumer $i$ receives a random endowment $\omega_i^t$. 2. Markets open; competitive prices $p^t = p(\omega^t)$ clear all markets. @@ -493,7 +539,7 @@ $$ Following econometric convention, {cite:t}`kihlstrom_mirman1975` call $g(p \mid \lambda)$ the **reduced form** and $f(\omega \mid \lambda)$ the **structure**. -### The Identification Problem +### The identification problem Because the map $\omega \mapsto p(\omega)$ is many-to-one, observing prices loses information relative to observing endowments. @@ -511,9 +557,10 @@ form** (with respect to data on prices). An observer who knows the infinite price history learns $\mu$ but not necessarily $\lambda$. -### Bayesian Updating +### Bayesian updating An uninformed observer begins with a prior $h(\lambda)$ over $\lambda \in \Lambda$. + After observing the price sequence $(p^1, \ldots, p^t)$, the observer's Bayesian posterior is @@ -532,45 +579,57 @@ g(p^{t+1} \mid p^1, \ldots, p^t) h(\lambda \mid p^1, \ldots, p^t). $$ -### The Convergence Theorem +### The convergence theorem + +```{prf:theorem} Bayesian Convergence +:label: ime_theorem_bayesian_convergence + +Let $\bar\lambda$ be the true +structural parameter and $\bar\mu$ the reduced form that contains $\bar\lambda$. -**Theorem** ({cite:t}`kihlstrom_mirman1975`, Section 3). Let $\bar\lambda$ be the true -structural parameter and $\bar\mu$ the reduced form that contains $\bar\lambda$. Then: +Then $$ \lim_{t \to \infty} h(\mu \mid p^1, \ldots, p^t) = \begin{cases} 1 & \text{if } \mu = \bar\mu, \\ 0 & \text{otherwise,} \end{cases} $$ -with probability one. 
Consequently, +with probability one. + +Consequently, $$ \lim_{t \to \infty} g(p^{t+1} \mid p^1, \ldots, p^t) = g(p \mid \bar\mu), $$ which equals the rational-expectations price distribution for a fully informed observer. +``` -Establishing convergence relies on appealing to the **Bayesian consistency** result of {cite:t}`degroot1962`: as +Establishing convergence relies on appealing to the **Bayesian consistency** result of {cite:t}`degroot1962`: as long as $g(\cdot \mid \mu)$ and $g(\cdot \mid \mu')$ generate mutually singular measures (which holds here generically), the posterior concentrates on the true reduced form. -**Key insight.** Price observers converge to **rational expectations** even if they -never identify the underlying structure $\bar\lambda$. The reduced form -$g(p \mid \bar\mu)$ statistical model is used to form equilibrium price expectations, and the Bayesian -observer learns the reduced form from prices alone. +Price observers converge to **rational expectations** even if they never identify the +underlying structure $\bar\lambda$. + +The reduced form $g(p \mid \bar\mu)$ statistical model is used to form equilibrium price +expectations, and the Bayesian observer learns the reduced form from prices alone. (bayesian_simulation)= -## Simulating Bayesian Learning from Prices +## Simulating Bayesian learning from prices We illustrate the theorem with a two-state example. -**Setup.** Two possible reduced forms $\mu_1$ and $\mu_2$ generate prices -$p^t \sim N(\bar{p}_i, \sigma_p^2)$ for $i = 1, 2$ respectively. The observer knows -the two possible price distributions (the reduced forms) but not which one governs the -data. +Two possible reduced forms $\mu_1$ and $\mu_2$ generate prices +$p^t \sim N(\bar{p}_i, \sigma_p^2)$ for $i = 1, 2$ respectively. -This is a standard **Bayesian model selection** problem. 
With a prior $h_0$ on $\mu_1$ -and the observed price $p^t$, the posterior weight on $\mu_1$ after period $t$ is +The observer knows the two possible price distributions (the reduced forms) but not which +one governs the data. + +This is a standard **Bayesian model selection** problem. + +With a prior $h_0$ on $\mu_1$ and the observed price $p^t$, the posterior weight on $\mu_1$ +after period $t$ is $$ h_t = \frac{h_{t-1}\, g(p^t \mid \mu_1)}{h_{t-1}\, g(p^t \mid \mu_1) @@ -623,22 +682,23 @@ def plot_bayesian_learning(h_paths, p_bar_true, p_bar_alt, ax): ax.plot(t_grid, path, alpha=0.25, lw=0.8, color="steelblue") median_path = np.median(h_paths, axis=0) - ax.plot(t_grid, median_path, color="navy", lw=2.5, label="Median posterior") + ax.plot(t_grid, median_path, color="navy", lw=2, label="median posterior") - ax.axhline(y=1.0, color="black", ls="--", lw=1.2, label="True model weight = 1") - ax.set_xlabel("Period $t$", fontsize=12) + ax.axhline(y=1.0, color="black", ls="--", lw=1.2, label="true model weight = 1") + ax.set_xlabel("period $t$", fontsize=12) ax.set_ylabel(r"$h_t$ = posterior weight on true model", fontsize=12) - ax.set_title( - rf"Bayesian learning: $\bar p_{{\\rm true}}={p_bar_true:.1f}$, " - rf"$\bar p_{{\\rm alt}}={p_bar_alt:.1f}$, $\sigma_p={sigma_p:.2f}$", - fontsize=11, - ) ax.legend(fontsize=10) ax.set_ylim(-0.05, 1.08) ax.grid(alpha=0.3) ``` ```{code-cell} ipython3 +--- +mystnb: + figure: + caption: bayesian learning across paths + name: fig-bayesian-learning +--- T = 300 h0 = 0.5 # diffuse prior n_paths = 40 @@ -650,13 +710,11 @@ fig, axes = plt.subplots(1, 2, figsize=(12, 5)) p_bar_true, p_bar_alt = 2.0, 1.2 h_paths = simulate_bayesian_learning(p_bar_true, p_bar_alt, sigma_p, T, h0, n_paths) plot_bayesian_learning(h_paths, p_bar_true, p_bar_alt, axes[0]) -axes[0].set_title("Easy case: means far apart", fontsize=12) # Case 2: similar reduced forms (harder to learn) p_bar_true, p_bar_alt = 2.0, 1.8 h_paths_hard = 
simulate_bayesian_learning(p_bar_true, p_bar_alt, sigma_p, T, h0, n_paths) plot_bayesian_learning(h_paths_hard, p_bar_true, p_bar_alt, axes[1]) -axes[1].set_title("Hard case: means close together", fontsize=12) plt.tight_layout() plt.show() @@ -665,12 +723,18 @@ plt.show() In both panels the posterior weight on the true model converges to 1 with probability one, though convergence is slower when the two price distributions are similar (right panel). -### Price Expectations vs. Rational Expectations +### Price expectations vs. rational expectations We now verify that the observer's price expectations converge to the rational-expectations distribution $g(p \mid \bar\mu)$. ```{code-cell} ipython3 +--- +mystnb: + figure: + caption: price distribution convergence + name: fig-price-convergence +--- def price_expectation(h_t, p_bar_true, p_bar_alt, p_grid): """ Compute the observer's predictive price density at posterior weight h_t. @@ -701,11 +765,10 @@ for t_snap, col in zip(snapshots, palette): ax.plot(p_grid, dens, color=col, lw=2, label=rf"$t = {t_snap}$, $h_t = {h_t:.3f}$") -ax.plot(p_grid, re_density, "k--", lw=2.5, - label=r"Rational expectations $g(p \mid \bar\mu)$") -ax.set_xlabel("Price $p$", fontsize=12) -ax.set_ylabel("Density", fontsize=12) -ax.set_title("Observer's price distribution converges to rational expectations", fontsize=12) +ax.plot(p_grid, re_density, "k--", lw=2, + label=r"rational expectations $g(p \mid \bar\mu)$") +ax.set_xlabel("price $p$", fontsize=12) +ax.set_ylabel("density", fontsize=12) ax.legend(fontsize=9) ax.grid(alpha=0.3) plt.tight_layout() @@ -715,11 +778,10 @@ plt.show() The sequence of predictive densities (shades of blue) converges to the rational-expectations density (dashed black line) as experience accumulates. -This illustrates the main theorem of -Section 3 of {cite:t}`kihlstrom_mirman1975`. +This illustrates {prf:ref}`ime_theorem_bayesian_convergence`. 
(km_extension_nonidentification)= -### Learning the Reduced Form without Identifying the Structure +### Learning the reduced form without identifying the structure The convergence result is particularly striking because the observer converges to *rational expectations* even when the underlying **structure** $\lambda$ is @@ -731,6 +793,12 @@ $\mu_1 = \{\lambda^{(1)}, \lambda^{(2)}\}$ and $\mu_2 = \{\lambda^{(3)}\}$ (because $\lambda^{(1)}$ and $\lambda^{(2)}$ generate the same price distribution). ```{code-cell} ipython3 +--- +mystnb: + figure: + caption: learning with non-identification + name: fig-nonidentification +--- def simulate_learning_3struct(T, h0_vec, p_bar_vec, sigma_p, true_idx, n_paths, seed=0): """ Bayesian learning with 3 structures, 2 reduced forms. @@ -775,18 +843,12 @@ for k, (ax, label) in enumerate(zip(axes, struct_labels)): for path in h_paths_3: ax.plot(t_grid, path[:, k], alpha=0.25, lw=0.8, color="steelblue") ax.plot(t_grid, np.median(h_paths_3[:, :, k], axis=0), - color="navy", lw=2.5, label="Median") - ax.set_title(f"Structure {label}", fontsize=10) - ax.set_xlabel("Period $t$", fontsize=11) + color="navy", lw=2, label=f"median weight on {label}") + ax.set_xlabel("period $t$", fontsize=11) ax.grid(alpha=0.3) ax.legend(fontsize=9) -axes[0].set_ylabel("Posterior weight", fontsize=11) -fig.suptitle( - r"Non-identification: weights on $\lambda^{(1)}$ and $\lambda^{(2)}$ stabilize at " - r"non-degenerate values; $\lambda^{(3)}$ is eliminated", - fontsize=10, y=1.02 -) +axes[0].set_ylabel("posterior weight", fontsize=11) plt.tight_layout() plt.show() ``` @@ -819,13 +881,13 @@ $$ subject to the budget constraint $p\,x_1 + x_2 = w$. Total supply of good 1 is $X_1 = 1$. -(a) Derive the first-order condition for the informed agent's optimal $x_1$. +1. Derive the first-order condition for the informed agent's optimal $x_1$. -(b) Use the market-clearing condition $x_1 = 1$ (the informed agent absorbs the entire +1. 
Use the market-clearing condition $x_1 = 1$ (the informed agent absorbs the entire supply) to obtain an implicit equation for the equilibrium price $p^*(q)$. Solve it numerically for $q \in (0,1)$ and several values of $\gamma$. -(c) Show numerically that $p^*(q)$ is monotone in $q$, so the invertibility condition +1. Show numerically that $p^*(q)$ is monotone in $q$, so the invertibility condition holds. Explain intuitively why CARA preferences always lead to an invertible price map (the elasticity of substitution of portfolio utility is $\sigma = \infty$). ``` @@ -834,7 +896,7 @@ holds. Explain intuitively why CARA preferences always lead to an invertible pr :class: dropdown ``` -**(a) First-order condition.** +**1. First-order condition.** Define $W_s = w + (a_s - p)\,x_1$ for $s=1,2$. The FOC is @@ -847,17 +909,17 @@ or equivalently (dividing by $\gamma$ and rearranging) $$ q\,(a_1 - p)\, e^{-\gamma(a_1-p) x_1} - = (1-q)\,(p - a_2)\, e^{-\gamma(p-a_2) x_1}. + = (1-q)\,(p - a_2)\, e^{\gamma(p-a_2) x_1}. $$ -**(b) Market-clearing equilibrium price.** +**2. Market-clearing equilibrium price.** Setting $x_1 = 1$ (all supply absorbed by informed agent), the equation becomes a scalar root-finding problem in $p$: $$ F(p;\,q,\gamma) \equiv - q\,(a_1-p)\,e^{-\gamma(a_1-p)} - (1-q)\,(p-a_2)\,e^{-\gamma(p-a_2)} = 0. + q\,(a_1-p)\,e^{-\gamma(a_1-p)} - (1-q)\,(p-a_2)\,e^{\gamma(p-a_2)} = 0. 
$$ ```{code-cell} ipython3 @@ -866,7 +928,7 @@ from scipy.optimize import brentq def F_cara(p, q, a1, a2, gamma, x1=1.0): """Residual of CARA market-clearing condition.""" return (q * (a1-p) * np.exp(-gamma*(a1-p)*x1) - - (1-q) * (p-a2) * np.exp(-gamma*(p-a2)*x1)) + - (1-q) * (p-a2) * np.exp(gamma*(p-a2)*x1)) a1, a2 = 2.0, 0.5 q_grid = np.linspace(0.05, 0.95, 200) @@ -875,14 +937,14 @@ colors_sol = plt.cm.plasma(np.linspace(0.15, 0.85, len(gammas))) fig, ax = plt.subplots(figsize=(8, 5)) for gamma, color in zip(gammas, colors_sol): - p_eq = [brentq(F_cara, a2+1e-4, a1-1e-4, + p_eq = [brentq(F_cara, a2, a1, args=(q, a1, a2, gamma)) for q in q_grid] ax.plot(q_grid, p_eq, lw=2, color=color, label=rf"$\gamma = {gamma}$") -ax.set_xlabel(r"Posterior $q = \Pr(\bar a = a_1)$", fontsize=12) -ax.set_ylabel("Equilibrium price $p^*(q)$", fontsize=12) +ax.set_xlabel(r"posterior $q = \Pr(\bar a = a_1)$", fontsize=12) +ax.set_ylabel("equilibrium price $p^*(q)$", fontsize=12) ax.set_title("CARA preferences: equilibrium prices", fontsize=12) ax.legend(fontsize=10) ax.grid(alpha=0.3) @@ -890,12 +952,12 @@ plt.tight_layout() plt.show() ``` -**(c) Invertibility for CARA.** +**3. Invertibility for CARA.** The price is strictly increasing in $q$ for every $\gamma > 0$. Intuitively, portfolio utility $u(x_2 + \bar{a}\,x_1)$ treats the two goods as **perfect substitutes** in -creating wealth, giving an elasticity of substitution $\sigma = \infty \neq 1$. By -Theorem 2 of {cite:t}`kihlstrom_mirman1975`, the price map is therefore always invertible. +creating wealth, giving an elasticity of substitution $\sigma = \infty \neq 1$. By +{prf:ref}`ime_theorem_invertibility_conditions`, the price map is therefore always invertible. 
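As a quick programmatic check of this claim, the following standalone snippet restates the residual derived above so that it runs on its own, with one illustrative value of $\gamma$, and verifies that $p^*(q)$ is strictly increasing:

```python
import numpy as np
from scipy.optimize import brentq

def F_cara(p, q, a1, a2, gamma, x1=1.0):
    """Residual of the CARA market-clearing condition with x1 = 1."""
    return (q * (a1 - p) * np.exp(-gamma * (a1 - p) * x1)
            - (1 - q) * (p - a2) * np.exp(gamma * (p - a2) * x1))

a1, a2, gamma = 2.0, 0.5, 1.0
qs = np.linspace(0.05, 0.95, 19)
p_eq = np.array([brentq(F_cara, a2, a1, args=(q, a1, a2, gamma))
                 for q in qs])
print(p_eq[0], p_eq[-1])   # strictly increasing, inside (a2, a1)
```

Since $F > 0$ at $p = a_2$ and $F < 0$ at $p = a_1$ for any interior $q$, the bracket $(a_2, a_1)$ always contains the unique root.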
```{solution-end} ``` @@ -903,7 +965,7 @@ Theorem 2 of {cite:t}`kihlstrom_mirman1975`, the price map is therefore always i ```{exercise} :label: km_ex2 -**Convergence rate and KL divergence.** In the Bayesian learning simulation, the speed of +In the Bayesian learning simulation, the speed of convergence to rational expectations is determined by the **Kullback-Leibler divergence** between the two reduced forms. @@ -914,14 +976,14 @@ $$ D_{KL}(\mu_1 \| \mu_2) = \frac{(\bar{p}_1 - \bar{p}_2)^2}{2\sigma_p^2}. $$ -(a) For the "easy" case ($\bar{p}_1 = 2.0$, $\bar{p}_2 = 1.2$) and the "hard" case +1. For the "easy" case ($\bar{p}_1 = 2.0$, $\bar{p}_2 = 1.2$) and the "hard" case ($\bar{p}_1 = 2.0$, $\bar{p}_2 = 1.8$), compute $D_{KL}$ for $\sigma_p = 0.4$. -(b) Re-run the simulations from the lecture for both cases with $n=100$ paths. For each +1. Re-run the simulations from the lecture for both cases with $n=100$ paths. For each path compute the first period $T_{0.99}$ at which $h_t \geq 0.99$. Plot histograms of $T_{0.99}$ for both cases. -(c) How does the median $T_{0.99}$ scale with $D_{KL}$? Verify numerically that +1. How does the median $T_{0.99}$ scale with $D_{KL}$? Verify numerically that roughly $T_{0.99} \approx C / D_{KL}$ for some constant $C$. ``` @@ -964,7 +1026,7 @@ for ax, (name, p1, p2) in zip(axes, cases): fontsize=11 ) ax.set_xlabel(r"$T_{0.99}$", fontsize=12) - ax.set_ylabel("Count", fontsize=11) + ax.set_ylabel("count", fontsize=11) ax.legend(fontsize=10) ax.grid(alpha=0.3) @@ -997,24 +1059,24 @@ $$ where $\beta_s = u^{1\prime}(a_s x_1 + x_2)$. -(a) For the parameterization used by {cite:t}`kihlstrom_mirman1975`—let +1. For the parameterization used by {cite:t}`kihlstrom_mirman1975`—let $\mu(a_3) = q$, $\mu(a_2) = r$, $\mu(a_1) = 1-r-q$—write $m$ as a function of $(q, r)$. Compute $\partial m / \partial r$ and show that its sign depends on $\beta_1\beta_2(a_1-a_2)$ and $\beta_2\beta_3(a_2-a_3)$. 
-(b) Choose $a_1 = 3$, $a_2 = 2$, $a_3 = 0.5$ and $u'(c) = c^{-\gamma}$ (CRRA with risk +1. Choose $a_1 = 3$, $a_2 = 2$, $a_3 = 0.5$ and $u'(c) = c^{-\gamma}$ (CRRA with risk aversion $\gamma$). Fix $x_1 = 1$, $x_2 = 0.5$. For $\gamma = 2$, verify numerically that $\partial m/\partial r$ changes sign (i.e., $m$ is *not* globally monotone in $r$), giving a counterexample to invertibility. -(c) Explain why this non-monotonicity does *not* arise in the two-state case $S = 2$. +1. Explain why this non-monotonicity does *not* arise in the two-state case $S = 2$. ``` ```{solution-start} km_ex3 :class: dropdown ``` -**(a)** Rewrite the MRS with $\mu_1 = 1-r-q$: +**1.** Rewrite the MRS with $\mu_1 = 1-r-q$: $$ m(q,r) = \frac{a_1\beta_1(1-r-q) + a_2\beta_2 r + a_3\beta_3 q} @@ -1032,7 +1094,7 @@ After simplification this reduces to a signed combination of $\beta_1\beta_2(a_1-a_2)({\cdot})$ and $\beta_2\beta_3(a_2-a_3)({\cdot})$ terms whose sign is parameter-dependent. -**(b) Numerical verification.** +**2. Numerical verification.** ```{code-cell} ipython3 def mrs_3state(q, r, a1, a2, a3, x1, x2, gamma): @@ -1080,7 +1142,7 @@ print("Sign changes in dm/dr:", The derivative $\partial m / \partial r$ changes sign, confirming that the MRS (and hence the equilibrium price) is **not** monotone in $r$ for $S = 3$. -**(c)** In the two-state case $S = 2$, the prior is parameterized by a single scalar $q$ +**3.** In the two-state case $S = 2$, the prior is parameterized by a single scalar $q$ and the MRS is a function of $q$ alone. One can show directly that $\partial m / \partial q$ has a definite sign determined entirely by whether $a_1 > a_2$ and whether $\sigma > 1$ or $\sigma < 1$ hold—there is no room for sign changes. With three states, @@ -1093,20 +1155,23 @@ can reverse the sign of the derivative. 
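A more compact standalone re-check of the sign change, using the same CRRA parameterization as above; the two probe points $(q, r)$ are arbitrary illustrative choices:

```python
import numpy as np

def mrs(q, r, a=(3.0, 2.0, 0.5), x1=1.0, x2=0.5, gamma=2.0):
    """MRS m = E_mu[a_s beta_s] / E_mu[beta_s], where
    beta_s = (a_s x1 + x2)^(-gamma) and mu = (1-r-q, r, q) on (a1, a2, a3)."""
    a_vec = np.array(a)
    beta = (a_vec * x1 + x2) ** (-gamma)
    mu = np.array([1 - r - q, r, q])
    return (a_vec * beta) @ mu / (beta @ mu)

def dm_dr(q, r, eps=1e-6):
    """Central finite-difference derivative of the MRS in r."""
    return (mrs(q, r + eps) - mrs(q, r - eps)) / (2 * eps)

print(dm_dr(0.1, 0.1), dm_dr(0.9, 0.05))   # opposite signs
```

The derivative is negative at one probe point and positive at the other, so $m$ cannot be globally monotone in $r$, as claimed.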
```{exercise} :label: km_ex4 -**Bayesian learning with misspecified models.** The convergence theorem assumes the true +{prf:ref}`ime_theorem_bayesian_convergence` +assumes the true distribution $g(\cdot \mid \bar\lambda)$ is in the support of the prior (i.e., $h(\bar\lambda) > 0$). Investigate what happens when the true model is **not** in the prior support. -(a) Simulate $T = 1,000$ periods of prices from $N(2.0, 0.4^2)$ but use a prior that +1. Simulate $T = 1,000$ periods of prices from $N(2.0, 0.4^2)$ but use a prior that places equal weight on two *wrong* models: $N(1.5, 0.4^2)$ and $N(2.5, 0.4^2)$. - Plot the posterior weight on each model over time. -(b) Show that the **predictive** (mixture) price distribution converges to the *closest* + - Plot the posterior weight on each model over time. + +2. Show that the **predictive** (mixture) price distribution converges to the *closest* model in KL divergence terms—which by symmetry is the equal mixture, with mean 2.0. - Verify this numerically by computing the predictive mean over time. -(c) Relate this finding to the Bayesian consistency literature: when is the limit + - Verify this numerically by computing the predictive mean over time. + +3. Relate this finding to the Bayesian consistency literature: when is the limit distribution a good approximation to the true distribution even under misspecification? 
``` @@ -1153,11 +1218,11 @@ for ax, k, label in zip(axes, [0, 1], [r"$N(1.5, \sigma^2)$", r"$N(2.5, \sigma^2 for path in h_misspec: ax.plot(t_grid, path[:, k], alpha=0.2, lw=0.8, color="steelblue") ax.plot(t_grid, np.median(h_misspec[:, :, k], axis=0), - color="navy", lw=2.5, label="Median") + color="navy", lw=2, label="median") ax.axhline(0.5, color="crimson", lw=1.5, ls="--", label="0.5 (symmetric limit)") ax.set_title(f"Posterior weight on {label}", fontsize=11) - ax.set_xlabel("Period $t$", fontsize=11) - ax.set_ylabel("Posterior weight", fontsize=11) + ax.set_xlabel("period $t$", fontsize=11) + ax.set_ylabel("posterior weight", fontsize=11) ax.legend(fontsize=9) ax.grid(alpha=0.3) @@ -1174,12 +1239,15 @@ print("(Symmetry implies equal weight on 1.5 and 2.5 → predictive mean = 2.0)" ``` By symmetry, the two wrong models are equidistant from the true distribution in KL -divergence. The posterior therefore converges to the 50-50 mixture, and the predictive mean +divergence. + +The posterior therefore converges to the 50-50 mixture, and the predictive mean converges to $0.5 \times 1.5 + 0.5 \times 2.5 = 2.0$—coinciding with the true mean -despite misspecification. This is an instance of the general result that under +despite misspecification. + +This is an instance of the general result that under misspecification, Bayesian posteriors converge to the distribution in the model class that minimizes KL divergence from the model actually generating the data. ```{solution-end} ``` - diff --git a/lectures/lagrangian_lqdp.md b/lectures/lagrangian_lqdp.md index f1e680cc6..5cce10050 100644 --- a/lectures/lagrangian_lqdp.md +++ b/lectures/lagrangian_lqdp.md @@ -451,11 +451,16 @@ solves. See {cite}`Ljungqvist2012`, ch 12. ## Application -Here we demonstrate the computation with an example which is the deterministic version of an example borrowed from this [quantecon lecture](https://python.quantecon.org/lqcontrol.html). 
+Here we demonstrate the computation with the deterministic permanent-income example from this [quantecon lecture](https://python.quantecon.org/lqcontrol.html). + +Because that model is discounted, we apply the invariant-subspace method to the +equivalent **undiscounted** system obtained from the transformed matrices +$\hat A = \beta^{1/2} A$ and $\hat B = \beta^{1/2} B$. ```{code-cell} ipython3 # Model parameters r = 0.05 +β = 1 / (1 + r) c_bar = 2 μ = 1 @@ -468,7 +473,7 @@ B = [[-1], [0]] # Construct an LQ instance -lq = LQ(Q, R, A, B) +lq = LQ(Q, R, A, B, beta=β) ``` Given matrices $A$, $B$, $Q$, $R$, we can then compute $L$, $N$, and $M=L^{-1}N$. @@ -476,7 +481,7 @@ Given matrices $A$, $B$, $Q$, $R$, we can then compute $L$, $N$, and $M=L^{-1}N$ ```{code-cell} ipython3 def construct_LNM(A, B, Q, R): - n, k = lq.n, lq.k + n = A.shape[0] # construct L and N L = np.zeros((2*n, 2*n)) @@ -496,7 +501,10 @@ def construct_LNM(A, B, Q, R): ``` ```{code-cell} ipython3 -L, N, M = construct_LNM(lq.A, lq.B, lq.Q, lq.R) +A_bar = lq.A * lq.beta ** (1/2) +B_bar = lq.B * lq.beta ** (1/2) + +L, N, M = construct_LNM(A_bar, B_bar, lq.Q, lq.R) ``` ```{code-cell} ipython3 @@ -517,7 +525,7 @@ M @ J @ M.T - J We can compute the eigenvalues of $M$ using `np.linalg.eigvals`, arranged in ascending order. ```{code-cell} ipython3 -eigvals = sorted(np.linalg.eigvals(M)) +eigvals = sorted(np.linalg.eigvals(M), key=lambda z: (abs(z), z.real, z.imag)) eigvals ``` @@ -529,18 +537,14 @@ When we apply Schur decomposition such that $M=V W V^{-1}$, we want To get what we want, let's define a sorting function that tells `scipy.schur` to sort the corresponding eigenvalues with modulus smaller than 1 to the upper left. ```{code-cell} ipython3 -stable_eigvals = eigvals[:n] +tol = 1e-10 def sort_fun(x): - "Sort the eigenvalues with modules smaller than 1 to the top-left." 
- - if x in stable_eigvals: - stable_eigvals.pop(stable_eigvals.index(x)) - return True - else: - return False + "Sort the eigenvalues with modulus smaller than 1 to the top-left." + return abs(x) < 1 - tol -W, V, _ = schur(M, sort=sort_fun) +W, V, stable_dim = schur(M, sort=sort_fun) +stable_dim ``` ```{code-cell} ipython3 @@ -584,25 +588,24 @@ def stable_solution(M, verbose=True): The matrix represents the linear difference equations system. """ n = M.shape[0] // 2 - stable_eigvals = list(sorted(np.linalg.eigvals(M))[:n]) + tol = 1e-10 def sort_fun(x): - "Sort the eigenvalues with modules smaller than 1 to the top-left." - - if x in stable_eigvals: - stable_eigvals.pop(stable_eigvals.index(x)) - return True - else: - return False - - W, V, _ = schur(M, sort=sort_fun) + "Sort the eigenvalues with modulus smaller than 1 to the top-left." + return abs(x) < 1 - tol + + W, V, stable_dim = schur(M, sort=sort_fun) + if stable_dim != n: + raise ValueError( + f"Expected {n} stable eigenvalues inside the unit circle, found {stable_dim}." + ) if verbose: print('eigenvalues:\n') print(' W11: {}'.format(np.diag(W[:n, :n]))) print(' W22: {}'.format(np.diag(W[n:, n:]))) - # compute V21 V11^{-1} - P = V[n:, :n] @ np.linalg.inv(V[:n, :n]) + # compute V21 V11^{-1} without forming the inverse explicitly + P = np.linalg.solve(V[:n, :n].T, V[n:, :n].T).T return W, V, P @@ -761,11 +764,6 @@ For example, when $\beta=\frac{1}{1+r}$, we can solve for $P$ with $\hat{A}=\bet These settings are adopted by default in the function `stationary_P` defined above. 
-```{code-cell} ipython3 -β = 1 / (1 + r) -lq.beta = β -``` - ```{code-cell} ipython3 stationary_P(lq) ``` diff --git a/lectures/multivariate_normal.md b/lectures/multivariate_normal.md index e1353ec0b..2b0d292df 100644 --- a/lectures/multivariate_normal.md +++ b/lectures/multivariate_normal.md @@ -37,7 +37,7 @@ In this lecture, you will learn formulas for * marginal distributions for all subvectors of $x$ * conditional distributions for subvectors of $x$ conditional on other subvectors of $x$ -We will use the multivariate normal distribution to formulate some useful models: +We will use the multivariate normal distribution to formulate some useful models: * a factor analytic model of an intelligence quotient, i.e., IQ * a factor analytic model of two independent inherent abilities, say, mathematical and verbal. @@ -46,7 +46,7 @@ We will use the multivariate normal distribution to formulate some useful model * time series generated by linear stochastic difference equations * optimal linear filtering theory -## The Multivariate Normal Distribution +## The multivariate normal distribution This lecture defines a Python class `MultivariateNormal` to be used to generate **marginal** and **conditional** distributions associated @@ -60,7 +60,7 @@ For a multivariate normal distribution it is very convenient that We apply our Python class to some examples. -We use the following imports: +We use the following imports: ```{code-cell} ipython3 import matplotlib.pyplot as plt @@ -75,11 +75,11 @@ multivariate normal probability density. 
This means that the probability density takes the form $$ -f\left(z;\mu,\Sigma\right)=\left(2\pi\right)^{-\left(\frac{N}{2}\right)}\det\left(\Sigma\right)^{-\frac{1}{2}}\exp\left(-.5\left(z-\mu\right)^{\prime}\Sigma^{-1}\left(z-\mu\right)\right) +f\left(z;\mu,\Sigma\right)=\left(2\pi\right)^{-\left(\frac{N}{2}\right)}\det\left(\Sigma\right)^{-\frac{1}{2}}\exp\left(-.5\left(z-\mu\right)^\top\Sigma^{-1}\left(z-\mu\right)\right) $$ where $\mu=Ez$ is the mean of the random vector $z$ and -$\Sigma=E\left(z-\mu\right)\left(z-\mu\right)^\prime$ is the +$\Sigma=E\left(z-\mu\right)\left(z-\mu\right)^\top$ is the covariance matrix of $z$. The covariance matrix $\Sigma$ is symmetric and positive definite. @@ -157,7 +157,7 @@ $$ and covariance matrix $$ -\hat{\Sigma}_{11}=\Sigma_{11}-\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}=\Sigma_{11}-\beta\Sigma_{22}\beta^{\prime} +\hat{\Sigma}_{11}=\Sigma_{11}-\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}=\Sigma_{11}-\beta\Sigma_{22}\beta^\top $$ where @@ -264,7 +264,7 @@ squares regressions. We’ll compare those linear least squares regressions for the simulated data to their population counterparts. -## Bivariate Example +## Bivariate example We start with a bivariate normal distribution pinned down by @@ -298,7 +298,7 @@ Let's illustrate the fact that you _can regress anything on anything else_. We have computed everything we need to compute two regression lines, one of $z_2$ on $z_1$, the other of $z_1$ on $z_2$. -We'll represent these regressions as +We'll represent these regressions as $$ z_1 = a_1 + b_1 z_2 + \epsilon_1 @@ -322,7 +322,7 @@ $$ E \epsilon_2 z_1 = 0 $$ -Let's compute $a_1, a_2, b_1, b_2$. +Let's compute $a_1, a_2, b_1, b_2$. ```{code-cell} python3 @@ -358,7 +358,12 @@ Now let's plot the two regression lines and stare at them. 
```{code-cell} python3 - +--- +mystnb: + figure: + caption: two regressions + name: fig-two-regressions +--- z2 = np.linspace(-4,4,100) @@ -385,14 +390,13 @@ ax.xaxis.set_ticks_position('bottom') ax.yaxis.set_ticks_position('left') plt.ylabel('$z_1$', loc = 'top') plt.xlabel('$z_2$,', loc = 'right') -plt.title('two regressions') plt.plot(z2,z1, 'r', label = "$z_1$ on $z_2$") plt.plot(z2,z1h, 'b', label = "$z_2$ on $z_1$") plt.legend() plt.show() ``` -The red line is the expectation of $z_1$ conditional on $z_2$. +The red line is the expectation of $z_1$ conditional on $z_2$. The intercept and slope of the red line are @@ -412,7 +416,7 @@ print("1/b2 = ", 1/b2) We can use these regression lines or our code to compute conditional expectations. -Let's compute the mean and variance of the distribution of $z_2$ +Let's compute the mean and variance of the distribution of $z_2$ conditional on $z_1=5$. After that we'll reverse what are on the left and right sides of the regression. @@ -504,9 +508,9 @@ Thus, in each case, for our very large sample size, the sample analogues closely approximate their population counterparts. A Law of Large -Numbers explains why sample analogues approximate population objects. +Numbers explains why sample analogues approximate population objects. -## Trivariate Example +## Trivariate example Let’s apply our code to a trivariate example. @@ -570,7 +574,7 @@ multi_normal.βs[0], results.params Once again, sample analogues do a good job of approximating their populations counterparts. -## One Dimensional Intelligence (IQ) +## One dimensional intelligence (IQ) Let’s move closer to a real-life example, namely, inferring a one-dimensional measure of intelligence called IQ from a list of test @@ -725,7 +729,7 @@ conditional normal distribution of the IQ $\theta$. In the following code, `ind` sets the variables on the right side of the regression. 
-Given the way we have defined the vector $X$, we want to set `ind=1` in order to make $\theta$ the left side variable in the +Given the way we have defined the vector $X$, we want to set `ind=1` in order to make $\theta$ the left side variable in the population regression. ```{code-cell} python3 @@ -811,9 +815,9 @@ Thus, each $y_{i}$ adds information about $\theta$. If we were to drive the number of tests $n \rightarrow + \infty$, the conditional standard deviation $\hat{\sigma}_{\theta}$ would -converge to $0$ at rate $\frac{1}{n^{.5}}$. +converge to $0$ at rate $\frac{1}{n^{.5}}$. -## Information as Surprise +## Information as surprise By using a different representation, let’s look at things from a different perspective. @@ -828,13 +832,13 @@ where $C$ is a lower triangular **Cholesky factor** of $\Sigma$ so that $$ -\Sigma \equiv DD^{\prime} = C C^\prime +\Sigma \equiv DD^\top = C C^\top $$ and $$ -E \epsilon \epsilon' = I . +E \epsilon \epsilon^\top = I . $$ It follows that @@ -928,13 +932,13 @@ np.max(np.abs(μθ_hat_arr - μθ_hat_arr_C)) < 1e-10 np.max(np.abs(Σθ_hat_arr - Σθ_hat_arr_C)) < 1e-10 ``` -## Cholesky Factor Magic +## Cholesky factor magic Evidently, the Cholesky factorizations automatically computes the population **regression coefficients** and associated statistics that are produced by our `MultivariateNormal` class. -The Cholesky factorization computes these things **recursively**. +The Cholesky factorization computes these things **recursively**. Indeed, in formula {eq}`mnv_1`, @@ -944,7 +948,7 @@ Indeed, in formula {eq}`mnv_1`, - the coefficient $c_i$ is the simple population regression coefficient of $\theta - \mu_\theta$ on $\epsilon_i$ -## Math and Verbal Intelligence +## Math and verbal intelligence We can alter the preceding example to be more realistic. 
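Before doing so, here is a minimal standalone check (with made-up numbers) of the fact just stated: for a bivariate normal, the population regression coefficient can be read directly off the Cholesky factor of the covariance matrix.

```python
import numpy as np

# a 2x2 covariance matrix (made-up numbers, purely for illustration)
Σ = np.array([[2.0, 0.8],
              [0.8, 1.5]])
C = np.linalg.cholesky(Σ)          # lower triangular, Σ = C C'

# population regression coefficient of z2 on z1: cov(z2, z1) / var(z1)
b = Σ[1, 0] / Σ[0, 0]

# the same coefficient read off the Cholesky factor
b_chol = C[1, 0] / C[0, 0]
print(b, b_chol)   # identical: 0.4 and 0.4
```

With $z = C\epsilon$ we have $z_1 = C_{11}\epsilon_1$ and $z_2 = C_{21}\epsilon_1 + C_{22}\epsilon_2$, so $\text{cov}(z_2, z_1)/\text{var}(z_1) = C_{21}C_{11}/C_{11}^2 = C_{21}/C_{11}$, which is what the code confirms.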
@@ -1098,7 +1102,7 @@ for indices, IQ, conditions in [([*range(2*n), 2*n], 'θ', 'y1, y2, y3, y4'), Evidently, math tests provide no information about $\eta$ and language tests provide no information about $\theta$. -## Univariate Time Series Analysis +## Univariate time series analysis We can use the multivariate normal distribution and a little matrix algebra to present foundations of univariate linear time series @@ -1110,7 +1114,7 @@ Consider the following model: $$ \begin{aligned} -x_0 & \sim N\left(0, \sigma_0^2\right) \\ +x_0 & \sim N\left(0, \sigma_0^2\right) \\ x_{t+1} & = a x_{t} + b w_{t+1}, \quad w_{t+1} \sim N\left(0, 1\right), t \geq 0 \\ y_{t} & = c x_{t} + d v_{t}, \quad v_{t} \sim N\left(0, 1\right), t \geq 0 \end{aligned} @@ -1166,7 +1170,7 @@ $c$ and $d$ as diagonal respectively. Consequently, the covariance matrix of $Y$ is $$ -\Sigma_{y} = E Y Y^{\prime} = C \Sigma_{x} C^{\prime} + D D^{\prime} +\Sigma_{y} = E Y Y^\top = C \Sigma_{x} C^\top + D D^\top $$ By stacking $X$ and $Y$, we can write @@ -1181,8 +1185,8 @@ $$ and $$ -\Sigma_{z} = EZZ^{\prime}=\left[\begin{array}{cc} -\Sigma_{x} & \Sigma_{x}C^{\prime}\\ +\Sigma_{z} = EZZ^\top=\left[\begin{array}{cc} +\Sigma_{x} & \Sigma_{x}C^\top\\ C\Sigma_{x} & \Sigma_{y} \end{array}\right] $$ @@ -1263,7 +1267,7 @@ x = z[:T+1] y = z[T+1:] ``` -### Smoothing Example +### Smoothing example This is an instance of a classic `smoothing` calculation whose purpose is to compute $E X \mid Y$. @@ -1297,7 +1301,7 @@ print(" E [ X | Y] = ", ) multi_normal_ex1.cond_dist(0, y) ``` -### Filtering Exercise +### Filtering exercise Compute $E\left[x_{t} \mid y_{t-1}, y_{t-2}, \dots, y_{0}\right]$. @@ -1340,7 +1344,7 @@ sub_y = y[:t] multi_normal_ex2.cond_dist(0, sub_y) ``` -### Prediction Exercise +### Prediction exercise Compute $E\left[y_{t} \mid y_{t-j}, \dots, y_{0} \right]$. 
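The smoothing, filtering, and prediction calculations in this section are all instances of one partitioned-normal formula. A minimal self-contained version of that formula (a sketch that parallels, but does not depend on, the `MultivariateNormal` class used above) is:

```python
import numpy as np

def cond_dist(μ, Σ, k, y2):
    """Mean and covariance of z1 | z2 = y2, where z = (z1, z2) is
    N(μ, Σ) and z1 contains the first k entries of z."""
    μ1, μ2 = μ[:k], μ[k:]
    Σ11, Σ12 = Σ[:k, :k], Σ[:k, k:]
    Σ21, Σ22 = Σ[k:, :k], Σ[k:, k:]
    β = Σ12 @ np.linalg.inv(Σ22)       # population regression coefficients
    return μ1 + β @ (y2 - μ2), Σ11 - β @ Σ21

# bivariate check with ρ = 0.6, conditioning on z2 = 1
μ = np.zeros(2)
Σ = np.array([[1.0, 0.6],
              [0.6, 1.0]])
m, v = cond_dist(μ, Σ, 1, np.array([1.0]))
print(m, v)   # mean ρ·1 = 0.6, variance 1 − ρ² = 0.64
```

Smoothing, filtering, and prediction then differ only in how $Z$ is partitioned and which observations enter $y_2$.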
@@ -1380,10 +1384,10 @@ sub_y = y[:t-j+1] multi_normal_ex3.cond_dist(0, sub_y) ``` -### Constructing a Wold Representation +### Constructing a Wold representation Now we’ll apply Cholesky decomposition to decompose -$\Sigma_{y}=H H^{\prime}$ and form +$\Sigma_{y}=H H^\top$ and form $$ \epsilon = H^{-1} Y. @@ -1414,7 +1418,7 @@ y This example is an instance of what is known as a **Wold representation** in time series analysis. -## Stochastic Difference Equation +## Stochastic difference equation Consider the stochastic second-order linear difference equation @@ -1476,8 +1480,8 @@ We have $$ \begin{aligned} \mu_{y} = A^{-1} \mu_{b} \\ -\Sigma_{y} &= A^{-1} E \left[\left(b - \mu_{b} + u \right) \left(b - \mu_{b} + u \right)^{\prime}\right] \left(A^{-1}\right)^{\prime} \\ - &= A^{-1} \left(\Sigma_{b} + \Sigma_{u} \right) \left(A^{-1}\right)^{\prime} +\Sigma_{y} &= A^{-1} E \left[\left(b - \mu_{b} + u \right) \left(b - \mu_{b} + u \right)^\top\right] \left(A^{-1}\right)^\top \\ + &= A^{-1} \left(\Sigma_{b} + \Sigma_{u} \right) \left(A^{-1}\right)^\top \end{aligned} $$ @@ -1495,7 +1499,7 @@ $$ $$ \Sigma_{b}=\left[\begin{array}{cc} -C\Sigma_{\tilde{y}}C^{\prime} & \boldsymbol{0}_{N-2\times N-2}\\ +C\Sigma_{\tilde{y}}C^\top & \boldsymbol{0}_{N-2\times N-2}\\ \boldsymbol{0}_{N-2\times2} & \boldsymbol{0}_{N-2\times N-2} \end{array}\right],\quad C=\left[\begin{array}{cc} \alpha_{2} & \alpha_{1}\\ @@ -1531,7 +1535,7 @@ T = 160 ``` ```{code-cell} python3 -# construct A and A^{\prime} +# construct A and A^\top A = np.zeros((T, T)) for i in range(T): @@ -1567,7 +1571,7 @@ C = np.array([[𝛼2, 𝛼1], [0, 𝛼2]]) Σy = A_inv @ (Σb + Σu) @ A_inv.T ``` -## Application to Stock Price Model +## Application to stock price model Let @@ -1604,7 +1608,7 @@ we have $$ \begin{aligned} \mu_{p} = B \mu_{y} \\ -\Sigma_{p} = B \Sigma_{y} B^{\prime} +\Sigma_{p} = B \Sigma_{y} B^\top \end{aligned} $$ @@ -1641,7 +1645,7 @@ $$ $$ $$ -\Sigma_{z}=D\Sigma_{y}D^{\prime} +\Sigma_{z}=D\Sigma_{y}D^\top $$ 
```{code-cell} python3 @@ -1695,7 +1699,7 @@ be if people did not have perfect foresight but were optimally predicting future dividends on the basis of the information $y_t, y_{t-1}$ at time $t$. -## Filtering Foundations +## Filtering foundations Assume that $x_0$ is an $n \times 1$ random vector and that $y_0$ is a $p \times 1$ random vector determined by the @@ -1713,7 +1717,7 @@ We consider the problem of someone who * *observes* $y_0$ * does not observe $x_0$, -* knows $\hat x_0, \Sigma_0, G, R$ and therefore the joint probability distribution of the vector $\begin{bmatrix} x_0 \cr y_0 \end{bmatrix}$ +* knows $\hat x_0, \Sigma_0, G, R$ and therefore the joint probability distribution of the vector $\begin{bmatrix} x_0 \cr y_0 \end{bmatrix}$ * wants to infer $x_0$ from $y_0$ in light of what he knows about that joint probability distribution. @@ -1730,7 +1734,7 @@ $$ G \Sigma_0 & G \Sigma_0 G' + R \end{bmatrix} $$ -By applying an appropriate instance of the above formulas for the mean vector $\hat \mu_1$ and covariance matrix +By applying an appropriate instance of the above formulas for the mean vector $\hat \mu_1$ and covariance matrix $\hat \Sigma_{11}$ of $z_1$ conditional on $z_2$, we find that the probability distribution of $x_0$ conditional on $y_0$ is ${\mathcal N}(\tilde x_0, \tilde \Sigma_0)$ where @@ -1860,7 +1864,7 @@ $$ \Sigma_{t+1}= C C' + A \Sigma_t A' - A \Sigma_t G' (G \Sigma_t G' +R)^{-1} G \Sigma_t A' . $$ -This is a matrix Riccati difference equation that is closely related to another matrix Riccati difference equation that appears in a quantecon lecture on the basics of linear quadratic control theory. +This is a matrix Riccati difference equation that is closely related to another matrix Riccati difference equation that appears in a quantecon lecture on the basics of linear quadratic control theory. 
That equation has the form @@ -1876,7 +1880,7 @@ P_{t-1} =R + A' P_t A - A' P_t B Stare at the two preceding equations for a moment or two, the first being a matrix difference equation for a conditional covariance matrix, the second being a matrix difference equation in the matrix appearing in a quadratic form for an intertemporal cost of value function. -Although the two equations are not identical, they display striking family resemblences. +Although the two equations are not identical, they display striking family resemblances. * the first equation tells dynamics that work **forward** in time * the second equation tells dynamics that work **backward** in time @@ -1931,7 +1935,7 @@ x1_cond = A @ μ1_hat x1_cond, Σ1_cond ``` -### Code for Iterating +### Code for iterating Here is code for solving a dynamic filtering problem by iterating on our equations, followed by an example. @@ -1972,10 +1976,10 @@ iterate(x0_hat, Σ0, A, C, G, R, [2.3, 1.2, 3.2]) The iterative algorithm just described is a version of the celebrated **Kalman filter**. -We describe the Kalman filter and some applications of it in {doc}`A First Look at the Kalman Filter ` +We describe the Kalman filter and some applications of it in {doc}`A First Look at the Kalman Filter ` -## Classic Factor Analysis Model +## Classic factor analysis model The factor analysis model widely used in psychology and other fields can be represented as @@ -1987,11 +1991,11 @@ $$ where 1. $Y$ is $n \times 1$ random vector, - $E U U^{\prime} = D$ is a diagonal matrix, + $E U U^\top = D$ is a diagonal matrix, 1. $\Lambda$ is $n \times k$ coefficient matrix, 1. $f$ is $k \times 1$ random vector, - $E f f^{\prime} = I$, -1. $U$ is $n \times 1$ random vector, and $U \perp f$ (i.e., $E U f' = 0 $ ) + $E f f^\top = I$, +1. $U$ is $n \times 1$ random vector, and $U \perp f$ (i.e., $E U f^\top = 0 $ ) 1. It is presumed that $k$ is small relative to $n$; often $k$ is only $1$ or $2$, as in our IQ examples. 
@@ -1999,15 +2003,15 @@ This implies that $$ \begin{aligned} -\Sigma_y = E Y Y^{\prime} = \Lambda \Lambda^{\prime} + D \\ -E Y f^{\prime} = \Lambda \\ -E f Y^{\prime} = \Lambda^{\prime} +\Sigma_y = E Y Y^\top = \Lambda \Lambda^\top + D \\ +E Y f^\top = \Lambda \\ +E f Y^\top = \Lambda^\top \end{aligned} $$ Thus, the covariance matrix $\Sigma_Y$ is the sum of a diagonal matrix $D$ and a positive semi-definite matrix -$\Lambda \Lambda^{\prime}$ of rank $k$. +$\Lambda \Lambda^\top$ of rank $k$. This means that all covariances among the $n$ components of the $Y$ vector are intermediated by their common dependencies on the @@ -2026,9 +2030,9 @@ the covariance matrix of the expanded random vector $Z$ can be computed as $$ -\Sigma_{z} = EZZ^{\prime}=\left(\begin{array}{cc} -I & \Lambda^{\prime}\\ -\Lambda & \Lambda\Lambda^{\prime}+D +\Sigma_{z} = EZZ^\top=\left(\begin{array}{cc} +I & \Lambda^\top\\ +\Lambda & \Lambda\Lambda^\top+D \end{array}\right) $$ @@ -2115,7 +2119,7 @@ multi_normal_factor.cond_dist(0, y) We can verify that the conditional mean $E \left[f \mid Y=y\right] = B Y$ where -$B = \Lambda^{\prime} \Sigma_{y}^{-1}$. +$B = \Lambda^\top \Sigma_{y}^{-1}$. ```{code-cell} python3 B = Λ.T @ np.linalg.inv(Σy) @@ -2136,7 +2140,7 @@ $\Lambda I^{-1} f = \Lambda f$. Λ @ f ``` -## PCA and Factor Analysis +## PCA and factor analysis To learn about Principal Components Analysis (PCA), please see this lecture {doc}`Singular Value Decompositions `. @@ -2158,7 +2162,7 @@ governs the data on $Y$ we have generated. So we compute the PCA decomposition $$ -\Sigma_{y} = P \tilde{\Lambda} P^{\prime} +\Sigma_{y} = P \tilde{\Lambda} P^\top $$ where $\tilde{\Lambda}$ is a diagonal matrix. @@ -2172,7 +2176,7 @@ $$ and $$ -\epsilon = P^\prime Y +\epsilon = P^\top Y $$ Note that we will arrange the eigenvectors in $P$ in the @@ -2319,14 +2323,14 @@ $$ fix $z_2 = 2$. -(a) Use `MultivariateNormal` to compute the analytical conditional mean +1. 
Use `MultivariateNormal` to compute the analytical conditional mean $\hat{\mu}_1$ and variance $\hat{\Sigma}_{11}$ of $z_1 \mid z_2 = 2$. -(b) Draw $10^6$ samples from the joint distribution. Retain only those +1. Draw $10^6$ samples from the joint distribution. Retain only those for which $|z_2 - 2| < 0.05$. Compute the sample mean and variance of the retained $z_1$ values. -(c) Confirm that the sample estimates are close to the analytical values. +1. Confirm that the sample estimates are close to the analytical values. ``` ```{solution-start} mv_normal_ex1 @@ -2411,12 +2415,12 @@ for rho in [0.2, 0.5, 0.9]: Using the one-dimensional IQ model with $n = 50$ test scores and $\mu_\theta = 100$, $\sigma_\theta = 10$: -(a) Vary the test-score noise $\sigma_y \in \{1, 5, 10, 20, 50\}$. +1. Vary the test-score noise $\sigma_y \in \{1, 5, 10, 20, 50\}$. For each value, plot the posterior standard deviation $\hat{\sigma}_\theta$ as a function of the number of test scores included (from 1 to 50), with all curves on the same axes. -(b) Explain intuitively why a larger $\sigma_y$ leads to a slower +1. Explain intuitively why a larger $\sigma_y$ leads to a slower decline of posterior uncertainty. ``` @@ -2464,12 +2468,12 @@ $\theta$ exactly. Using the one-dimensional IQ model with $n = 20$ test scores and $\mu_\theta = 100$, $\sigma_y = 10$: -(a) Fix $\sigma_y = 10$ and vary the prior spread +1. Fix $\sigma_y = 10$ and vary the prior spread $\sigma_\theta \in \{1, 5, 10, 50, 500\}$. For each value compute the posterior mean $\hat{\mu}_\theta$ given the same set of $n = 20$ test scores and plot $\hat{\mu}_\theta$ against $\sigma_\theta$. -(b) Show analytically (or verify numerically) that as $\sigma_\theta \to \infty$ +1. Show analytically (or verify numerically) that as $\sigma_\theta \to \infty$ the posterior mean converges to the sample mean $\bar{y}$, and as $\sigma_y \to \infty$ the posterior mean converges to the prior mean $\mu_\theta$. 
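For the second part, it may help that in this scalar conjugate-normal model the posterior mean has the closed form $\hat\mu_\theta = \left(\mu_\theta/\sigma_\theta^2 + \sum_i y_i/\sigma_y^2\right) \big/ \left(1/\sigma_\theta^2 + n/\sigma_y^2\right)$, which makes both limits easy to verify numerically. A standalone sketch with simulated scores:

```python
import numpy as np

def posterior_mean(y, μ_θ, σ_θ, σ_y):
    """Posterior mean of θ in the scalar Gaussian IQ model."""
    n = len(y)
    prec = 1 / σ_θ**2 + n / σ_y**2        # posterior precision
    return (μ_θ / σ_θ**2 + y.sum() / σ_y**2) / prec

rng = np.random.default_rng(0)
y = 100 + 10 * rng.standard_normal(20)    # simulated test scores

# diffuse prior (σ_θ → ∞): posterior mean ≈ sample mean
print(posterior_mean(y, 100, 1e6, 10), y.mean())
# uninformative data (σ_y → ∞): posterior mean ≈ prior mean
print(posterior_mean(y, 100, 10, 1e6))
```

As $\sigma_\theta \to \infty$ the prior terms vanish and the formula collapses to $\bar y$; as $\sigma_y \to \infty$ the data terms vanish and it collapses to $\mu_\theta$.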
@@ -2534,12 +2538,12 @@ $$ and initial conditions $\hat{x}_0 = [0, 0]'$, $\Sigma_0 = I_2$: -(a) Simulate $T = 60$ periods of $\{x_t, y_t\}$ and run the filter. +1. Simulate $T = 60$ periods of $\{x_t, y_t\}$ and run the filter. -(b) Plot the sequences of conditional variances $\Sigma_t[0,0]$ and +1. Plot the sequences of conditional variances $\Sigma_t[0,0]$ and $\Sigma_t[1,1]$ over time. Verify that they converge to a steady state. -(c) Plot the filtered state estimates $\hat{x}_t[0]$ together with the +1. Plot the filtered state estimates $\hat{x}_t[0]$ together with the true $x_t[0]$ and the raw observations $y_t$ on a single figure. ``` @@ -2601,16 +2605,16 @@ plt.show() In the classic factor analysis model at the end of the lecture the true covariance is $\Sigma_y = \Lambda \Lambda' + D$. -(a) Set $\sigma_u = 2$ (instead of $0.5$). Recompute the fraction of +1. Set $\sigma_u = 2$ (instead of $0.5$). Recompute the fraction of variance explained by the first two principal components and compare it with the $\sigma_u = 0.5$ result. Explain the change. -(b) Show that the conditional expectation $E[f \mid Y] = BY$ with -$B = \Lambda' \Sigma_y^{-1}$ is **not** equal to the two-component PCA +1. Show that the conditional expectation $E[f \mid Y] = BY$ with +$B = \Lambda^\top \Sigma_y^{-1}$ is **not** equal to the two-component PCA projection $\hat{Y} = P_{:,1:2}\,\epsilon_{1:2}$. Plot both on the same axes. -(c) In one or two sentences, explain why PCA is misspecified for +1. In one or two sentences, explain why PCA is misspecified for factor-analytic data. ``` diff --git a/lectures/prob_matrix.md b/lectures/prob_matrix.md index 3708b9599..8cbb50f40 100644 --- a/lectures/prob_matrix.md +++ b/lectures/prob_matrix.md @@ -17,7 +17,7 @@ kernelspec: This lecture uses matrix algebra to illustrate some basic ideas about probability theory. -After introducing underlying objects, we'll use matrices and vectors to describe probability distributions. 
+After introducing underlying objects, we'll use matrices and vectors to describe probability distributions. Among concepts that we'll be studying include @@ -29,13 +29,13 @@ Among concepts that we'll be studying include - couplings - copulas - the probability distribution of a sum of two independent random variables - - convolution of marginal distributions + - convolution of marginal distributions - parameters that define a probability distribution - sufficient statistics as data summaries We'll use a matrix to represent a bivariate or multivariate probability distribution and a vector to represent a univariate probability distribution -This {doc}`companion lecture ` describes some popular probability distributions and describes how to use Python to sample from them. +This {doc}`companion lecture ` describes some popular probability distributions and describes how to use Python to sample from them. In addition to what's in Anaconda, this lecture will need the following libraries: @@ -59,14 +59,14 @@ set_matplotlib_formats('retina') ``` -## Sketch of Basic Concepts +## Sketch of basic concepts We'll briefly define what we mean by a **probability space**, a **probability measure**, and a **random variable**. For most of this lecture, we sweep these objects into the background ```{note} -Nevertheless, they'll be lurking beneath **induced distributions** of random variables that we'll focus on here. These deeper objects are essential for defining and analysing the concepts of stationarity and ergodicity that underly laws of large numbers. For a relatively +Nevertheless, they'll be lurking beneath **induced distributions** of random variables that we'll focus on here. These deeper objects are essential for defining and analysing the concepts of stationarity and ergodicity that underly laws of large numbers. For a relatively nontechnical presentation of some of these results see this chapter from Lars Peter Hansen and Thomas J. 
Sargent's online monograph titled "Risk, Uncertainty, and Values":. ``` @@ -76,18 +76,18 @@ Let $\Omega$ be a set of possible underlying outcomes and let $\omega \in \Omega Let $\mathcal{G} \subset \Omega$ be a subset of $\Omega$. -Let $\mathcal{F}$ be a collection of such subsets $\mathcal{G} \subset \Omega$. +Let $\mathcal{F}$ be a collection of such subsets $\mathcal{G} \subset \Omega$. -The pair $\Omega,\mathcal{F}$ forms our **probability space** on which we want to put a probability measure. +The pair $\Omega,\mathcal{F}$ forms our **probability space** on which we want to put a probability measure. -A **probability measure** $\mu$ maps a set of possible underlying outcomes $\mathcal{G} \in \mathcal{F}$ into a scalar number between $0$ and $1$ +A **probability measure** $\mu$ maps a set of possible underlying outcomes $\mathcal{G} \in \mathcal{F}$ into a scalar number between $0$ and $1$ - this is the "probability" that $X$ belongs to $A$, denoted by $ \textrm{Prob}\{X\in A\}$. A **random variable** $X(\omega)$ is a function of the underlying outcome $\omega \in \Omega$. -The random variable $X(\omega)$ has a **probability distribution** that is induced by the underlying probability measure $\mu$ and the function +The random variable $X(\omega)$ has a **probability distribution** that is induced by the underlying probability measure $\mu$ and the function $X(\omega)$: $$ @@ -98,34 +98,34 @@ where ${\mathcal G}$ is the subset of $\Omega$ for which $X(\omega) \in A$. We call this the induced probability distribution of random variable $X$. -Instead of working explicitly with an underlying probability space $\Omega,\mathcal{F}$ and probability measure $\mu$, -applied statisticians often proceed simply by specifying a form for an induced distribution for a random variable $X$. 
+Instead of working explicitly with an underlying probability space $\Omega,\mathcal{F}$ and probability measure $\mu$, +applied statisticians often proceed simply by specifying a form for an induced distribution for a random variable $X$. -That is how we'll proceed in this lecture and in many subsequent lectures. +That is how we'll proceed in this lecture and in many subsequent lectures. -## What Does Probability Mean? +## What does probability mean? Before diving in, we'll say a few words about what probability theory means and how it connects to statistics. -We also touch on these topics in the quantecon lectures and . +We also touch on these topics in {doc}`prob_meaning` and {doc}`navy_captain`. -For much of this lecture we'll be discussing fixed "population" probabilities. +For much of this lecture we'll be discussing fixed "population" probabilities. These are purely mathematical objects. To appreciate how statisticians connect probabilities to data, the key is to understand the following concepts: * A single draw from a probability distribution -* Repeated independently and identically distributed (i.i.d.) draws of "samples" or "realizations" from the same probability distribution -* A **statistic** defined as a function of a sequence of samples -* An **empirical distribution** or **histogram** (a binned empirical distribution) that records observed **relative frequencies** -* The idea that a population probability distribution is what we anticipate **relative frequencies** will be in a long sequence of i.i.d. draws. Here the following mathematical machinery makes precise what is meant by **anticipated relative frequencies** +* Repeated independently and identically distributed (i.i.d.) 
draws of "samples" or "realizations" from the same probability distribution +* A **statistic** defined as a function of a sequence of samples +* An **empirical distribution** or **histogram** (a binned empirical distribution) that records observed **relative frequencies** +* The idea that a population probability distribution is what we anticipate **relative frequencies** will be in a long sequence of i.i.d. draws. Here the following mathematical machinery makes precise what is meant by **anticipated relative frequencies** - **Law of Large Numbers (LLN)** - - **Central Limit Theorem (CLT)** + - **Central Limit Theorem (CLT)** -**Scalar example** +#### Scalar example Let $X$ be a scalar random variable that takes on the $I$ possible values $0, 1, 2, \ldots, I-1$ with probabilities @@ -147,12 +147,12 @@ $$ as a short-hand way of saying that the random variable $X$ is described by the probability distribution $ \{{f_i}\}_{i=0}^{I-1}$. -Consider drawing a sample $x_0, x_1, \dots , x_{N-1}$ of $N$ independent and identically distributoed draws of $X$. +Consider drawing a sample $x_0, x_1, \dots , x_{N-1}$ of $N$ independent and identically distributed draws of $X$. -What do the "identical" and "independent" mean in IID or iid ("identically and independently distributed")? +What do "identical" and "independent" mean in IID or iid ("identically and independently distributed")? - "identical" means that each draw is from the same distribution. -- "independent" means that joint distribution equal products of marginal distributions, i.e., +- "independent" means that the joint distribution equals the product of marginal distributions, i.e., $$ \begin{aligned} @@ -161,9 +161,9 @@ $$ \end{aligned} $$ -We define an **empirical distribution** as follows. +We define an **empirical distribution** as follows. 
-For each $i = 0,\dots,I-1$, let +For each $i = 0,\dots,I-1$, let $$ \begin{aligned} @@ -174,35 +174,30 @@ N & = \sum^{I-1}_{i=0} N_i \quad \text{total number of draws},\\ $$ -Key concepts that connect probability theory with statistics are laws of large numbers and central limit theorems +Key concepts that connect probability theory with statistics are laws of large numbers and central limit theorems. -**LLN:** +A Law of Large Numbers (LLN) states that $\tilde {f_i} \to f_i$ as $N \to \infty$. -- A Law of Large Numbers (LLN) states that $\tilde {f_i} \to f_i \text{ as } N \to \infty$ +A Central Limit Theorem (CLT) describes a **rate** at which $\tilde {f_i} \to f_i$. -**CLT:** +See {doc}`lln_clt` for a detailed treatment of both results. -- A Central Limit Theorem (CLT) describes a **rate** at which $\tilde {f_i} \to f_i$ +For "frequentist" statisticians, **anticipated relative frequency** is **all** that a probability distribution means. +But for a Bayesian it means something else -- something partly subjective and purely personal. -**Remarks** +We say "partly" because a Bayesian also pays attention to relative frequencies. -- For "frequentist" statisticians, **anticipated relative frequency** is **all** that a probability distribution means. -- But for a Bayesian it means something else -- something partly subjective and purely personal. - - * we say "partly" because a Bayesian also pays attention to relative frequencies +## Representing probability distributions - -## Representing Probability Distributions - -A probability distribution $\textrm{Prob} (X \in A)$ can be described by its **cumulative distribution function (CDF)** +A probability distribution $\textrm{Prob} (X \in A)$ can be described by its **cumulative distribution function (CDF)** $$ F_{X}(x) = \textrm{Prob}\{X\leq x\}. 
$$ -Sometimes, but not always, a random variable can also be described by **density function** $f(x)$ +Sometimes, but not always, a random variable can also be described by a **density function** $f(x)$ that is related to its CDF by $$ @@ -215,7 +210,7 @@ $$ Here $B$ is a set of possible $X$'s whose probability of occurring we want to compute. -When a probability density exists, a probability distribution can be characterized either by its CDF or by its density. +When a probability density exists, a probability distribution can be characterized either by its CDF or by its density. For a **discrete-valued** random variable @@ -231,7 +226,7 @@ Doing this enables us to confine our tool set basically to linear algebra. Later we'll briefly discuss how to approximate a continuous random variable with a discrete random variable. -## Univariate Probability Distributions +## Univariate probability distributions We'll devote most of this lecture to discrete-valued random variables, but we'll say a few things about continuous-valued random variables. @@ -281,15 +276,19 @@ $$ where $\theta $ is a vector of parameters that is of much smaller dimension than $I$. -**Remarks:** +A **statistical model** is a joint probability distribution characterized by a list of **parameters**. + +The concept of **parameter** is intimately related to the notion of **sufficient statistic**. + +A **statistic** is a nonlinear function of a data set. + +**Sufficient statistics** summarize all **information** that a data set contains about parameters of a statistical model. + +Note that a sufficient statistic corresponds to a particular statistical model. + +Sufficient statistics are key tools that AI uses to summarize or compress a **big data** set. -- A **statistical model** is a joint probability distribution characterized by a list of **parameters** -- The concept of **parameter** is intimately related to the notion of **sufficient statistic**. 
-- A **statistic** is a nonlinear function of a data set. -- **Sufficient statistics** summarize all **information** that a data set contains about parameters of statistical model. - * Note that a sufficient statistic corresponds to a particular statistical model. - * Sufficient statistics are key tools that AI uses to summarize or compress a **big data** set. -- R. A. Fisher provided a rigorous definition of **information** -- see +R. A. Fisher provided a rigorous definition of **information** -- see . @@ -323,7 +322,7 @@ $$ \textrm{Prob}\{X\in \tilde{X}\} =1 $$ -## Bivariate Probability Distributions +## Bivariate probability distributions We'll now discuss a bivariate **joint distribution**. @@ -357,7 +356,7 @@ $$ \sum_{i}\sum_{j}f_{ij}=1 $$ -## Marginal Probability Distributions +## Marginal probability distributions The joint distribution induce marginal distributions @@ -391,7 +390,7 @@ $$ \end{aligned} $$ -**Digression:** If two random variables $X,Y$ are continuous and have joint density $f(x,y)$, then marginal distributions can be computed by +As a digression, if two random variables $X,Y$ are continuous and have joint density $f(x,y)$, then marginal distributions can be computed by $$ \begin{aligned} @@ -400,7 +399,7 @@ f(y)& = \int_{\mathbb{R}} f(x,y) dx \end{aligned} $$ -## Conditional Probability Distributions +## Conditional probability distributions Conditional probabilities are defined according to @@ -426,7 +425,7 @@ $$ =\frac{ \sum_{i}f_{ij} }{ \sum_{i}f_{ij}}=1 $$ -**Remark:** The mathematics of conditional probability implies: +The mathematics of conditional probability implies: $$ \textrm{Prob}\{X=i|Y=j\} =\frac{\textrm{Prob}\{X=i,Y=j\}}{\textrm{Prob}\{Y=j\}}=\frac{\textrm{Prob}\{Y=j|X=i\}\textrm{Prob}\{X=i\}}{\textrm{Prob}\{Y=j\}} @@ -446,7 +445,7 @@ $$ $$ -## Transition Probability Matrix +## Transition probability matrix Consider the following joint probability distribution of two random variables. 
@@ -495,7 +494,7 @@ Note that -## Application: Forecasting a Time Series +## Application: forecasting a time series Suppose that there are two time periods. @@ -519,11 +518,10 @@ A conditional distribution is $$\text{Prob} \{X(1)=j|X(0)=i\}= \frac{f_{ij}}{ \sum_{j}f_{ij}}$$ -**Remark:** -- This formula is a workhorse for applied economic forecasters. +This formula is a workhorse for applied economic forecasters. -## Statistical Independence +## Statistical independence Random variables X and Y are statistically **independent** if @@ -550,7 +548,7 @@ $$ $$ -## Means and Variances +## Means and variances The mean and variance of a discrete random variable $X$ are @@ -571,7 +569,7 @@ $$ \end{aligned} $$ -## Matrix Representations of Some Bivariate Distributions +## Matrix representations of some bivariate distributions Let's use matrices to represent a joint distribution, conditional distribution, marginal distribution, and the mean and variance of a bivariate random variable. @@ -590,12 +588,9 @@ $$ \textrm{Prob}(X=i)=\sum_j{f_{ij}}=u_i $$ $$ \textrm{Prob}(Y=j)=\sum_i{f_{ij}}=v_j $$ -**Sampling:** +Let's write some Python code that lets us draw some long samples and compute relative frequencies. -Let's write some Python code that let's us draw some long samples and compute relative frequencies. - -The code will let us check whether the "sampling" distribution agrees with the "population" distribution - confirming that -the population distribution correctly tells us the relative frequencies that we should expect in a large sample. +The code lets us check whether the "sampling" distribution agrees with the "population" distribution -- confirming that the population distribution correctly tells us the relative frequencies that we should expect in a large sample. @@ -844,7 +839,7 @@ class discrete_bijoint: Let's apply our code to some examples. 
-**Example 1** +#### Example 1 ```{code-cell} ipython3 # joint @@ -863,7 +858,7 @@ d.marg_dist() d.cond_dist() ``` -**Example 2** +#### Example 2 ```{code-cell} ipython3 xs_new = np.array([10, 20, 30]) @@ -882,7 +877,7 @@ d_new.marg_dist() d_new.cond_dist() ``` -## A Continuous Bivariate Random Vector +## A continuous bivariate random vector A two-dimensional Gaussian distribution has joint density @@ -929,9 +924,9 @@ y = np.linspace(-10, 10, 1_000) x_mesh, y_mesh = np.meshgrid(x, y, indexing="ij") ``` -**Joint Distribution** +#### Joint distribution -Let's plot the **population** joint density. +Let's plot the **population** joint density. ```{code-cell} ipython3 # %matplotlib notebook @@ -967,7 +962,7 @@ x = data[:, 0] y = data[:, 1] ``` -**Marginal distribution** +#### Marginal distribution ```{code-cell} ipython3 plt.hist(x, bins=1_000, alpha=0.6) @@ -987,7 +982,7 @@ plt.hist(y_sim, bins=1_000, density=True, alpha=0.4, histtype="step") plt.show() ``` -**Conditional distribution** +#### Conditional distribution For a bivariate normal population distribution, the conditional distributions are also normal: @@ -1074,7 +1069,7 @@ print(μy, σy) print(μ2 + ρ * σ2 * (1 - μ1) / σ1, np.sqrt(σ2**2 * (1 - ρ**2))) ``` -## Sum of Two Independently Distributed Random Variables +## Sum of two independently distributed random variables Let $X, Y$ be two independent discrete random variables that take values in $\bar{X}, \bar{Y}$, respectively. @@ -1159,8 +1154,6 @@ $$ Given two marginal distribution, $\mu$ for $X$ and $\nu$ for $Y$, a joint distribution $f_{ij}$ is said to be a **coupling** of $\mu$ and $\nu$. -**Example:** - Consider the following bivariate example. $$ @@ -1229,10 +1222,9 @@ But the joint distributions differ. Thus, multiple joint distributions $[f_{ij}]$ can have the same marginals. -**Remark:** -- Couplings are important in optimal transport problems and in Markov processes. 
Please see this {doc}`lecture about optimal transport ` +Couplings are important in optimal transport problems and in Markov processes. Please see this {doc}`lecture about optimal transport `. -## Copula Functions +## Copula functions Suppose that $X_1, X_2, \dots, X_n$ are $N$ random variables and that @@ -1260,9 +1252,9 @@ Thus, for given marginal distributions, we can use a copula function to determi Copula functions are often used to characterize **dependence** of random variables. -**Discrete marginal distribution** +#### Discrete marginal distribution -As mentioned above, for two given marginal distributions there can be more than one coupling. +As mentioned above, for two given marginal distributions there can be more than one coupling. For example, consider two random variables $X, Y$ with distributions @@ -1484,7 +1476,7 @@ We have verified that both joint distributions, $c_1$ and $c_2$, have identical So they are both couplings of $X$ and $Y$. -**Gaussian Copula Example** +### Gaussian copula example A **Gaussian copula** uses the bivariate normal distribution to induce dependence between arbitrary marginal distributions. @@ -1498,6 +1490,12 @@ The construction has three steps: The following code illustrates this with exponential marginals. 
```{code-cell} ipython3 +--- +mystnb: + figure: + caption: gaussian copula with exponential marginals + name: fig-gaussian-copula +--- from scipy import stats # Gaussian copula parameters @@ -1521,11 +1519,9 @@ fig, axes = plt.subplots(1, 2, figsize=(10, 4)) axes[0].scatter(u1[:3000], u2[:3000], alpha=0.2, s=2) axes[0].set_xlabel('$u_1$') axes[0].set_ylabel('$u_2$') -axes[0].set_title(f'Copula (uniform marginals, ρ={rho_cop})') axes[1].scatter(x1[:3000], x2[:3000], alpha=0.2, s=2) axes[1].set_xlabel('$x_1$ (Exp, mean=1)') axes[1].set_ylabel('$x_2$ (Exp, mean=0.5)') -axes[1].set_title('Exponential marginals via Gaussian copula') plt.tight_layout() plt.show() @@ -1533,8 +1529,10 @@ print(f"Sample correlation of (x1, x2): {np.corrcoef(x1, x2)[0, 1]:.3f}") print(f"Sample correlation of (u1, u2): {np.corrcoef(u1, u2)[0, 1]:.3f}") ``` -The left panel shows the copula itself — the dependence structure in uniform coordinates. +The left panel shows the copula itself -- the dependence structure in uniform coordinates, drawn from a bivariate normal with correlation $\rho = 0.8$. + The right panel shows the same dependence translated to exponential marginals. + Changing $\rho$ controls the strength of dependence while the marginals remain unchanged. ## Exercises @@ -1552,13 +1550,13 @@ $$ where $X \in \{0,1\}$ and $Y \in \{10, 20\}$. -(a) Compute the marginal distributions $\mu_i = \text{Prob}\{X=i\}$ and $\nu_j = \text{Prob}\{Y=j\}$. +1. Compute the marginal distributions $\mu_i = \text{Prob}\{X=i\}$ and $\nu_j = \text{Prob}\{Y=j\}$. -(b) Form the independence matrix $f^{\perp}_{ij} = \mu_i \nu_j$ (the outer product of the two marginal vectors). +1. Form the independence matrix $f^{\perp}_{ij} = \mu_i \nu_j$ (the outer product of the two marginal vectors). -(c) Compare $F$ with $f^{\perp}$ and determine whether $X$ and $Y$ are independent. +1. Compare $F$ with $f^{\perp}$ and determine whether $X$ and $Y$ are independent. 
-(d) Verify your conclusion by computing $\text{Prob}\{X=0|Y=10\}$ and checking whether it equals $\text{Prob}\{X=0\}$. +1. Verify your conclusion by computing $\text{Prob}\{X=0|Y=10\}$ and checking whether it equals $\text{Prob}\{X=0\}$. ``` ```{solution-start} prob_matrix_ex1 @@ -1601,13 +1599,13 @@ print(f"Prob(X=0) = {mu[0]:.4f}") Using the same joint distribution $F$ and values $X \in \{0,1\}$, $Y \in \{10, 20\}$ as in Exercise 1: -(a) Compute $\mathbb{E}[X]$, $\mathbb{E}[Y]$, and $\mathbb{E}[XY] = \sum_i \sum_j x_i y_j f_{ij}$. +1. Compute $\mathbb{E}[X]$, $\mathbb{E}[Y]$, and $\mathbb{E}[XY] = \sum_i \sum_j x_i y_j f_{ij}$. -(b) Compute $\text{Cov}(X,Y) = \mathbb{E}[XY] - \mathbb{E}[X]\mathbb{E}[Y]$. +1. Compute $\text{Cov}(X,Y) = \mathbb{E}[XY] - \mathbb{E}[X]\mathbb{E}[Y]$. -(c) Compute $\text{Cor}(X,Y) = \text{Cov}(X,Y) / (\sigma_X \sigma_Y)$. +1. Compute $\text{Cor}(X,Y) = \text{Cov}(X,Y) / (\sigma_X \sigma_Y)$. -(d) Show analytically that $X \perp Y$ implies $\text{Cov}(X,Y) = 0$. +1. Show analytically that $X \perp Y$ implies $\text{Cov}(X,Y) = 0$. ``` ```{solution-start} prob_matrix_ex2 @@ -1642,7 +1640,7 @@ cor_XY = cov_XY / np.sqrt(var_X * var_Y) print(f"Cor(X,Y) = {cor_XY:.4f}") ``` -For part (d): if $X \perp Y$ then $f_{ij} = \mu_i \nu_j$, so +For part 4: if $X \perp Y$ then $f_{ij} = \mu_i \nu_j$, so $$ \mathbb{E}[XY] = \sum_i \sum_j x_i y_j \mu_i \nu_j @@ -1662,13 +1660,13 @@ and therefore $\text{Cov}(X,Y) = \mathbb{E}[XY] - \mathbb{E}[X]\mathbb{E}[Y] = 0 Let $X$ and $Y$ each be uniformly distributed on $\{1,2,3,4,5,6\}$, and let $Z = X + Y$. -(a) Use the convolution formula $h_k = \sum_i f_i g_{k-i}$ to compute the distribution of $Z$. +1. Use the convolution formula $h_k = \sum_i f_i g_{k-i}$ to compute the distribution of $Z$. -(b) Plot the theoretical distribution. +1. Plot the theoretical distribution. -(c) Simulate $10^6$ rolls and overlay the empirical histogram on the plot. +1. 
Simulate $10^6$ rolls and overlay the empirical histogram on the plot. -(d) Compute $\mathbb{E}[Z]$ and $\text{Var}(Z)$ both from the theoretical distribution and from the simulation. +1. Compute $\mathbb{E}[Z]$ and $\text{Var}(Z)$ both from the theoretical distribution and from the simulation. ``` ```{solution-start} prob_matrix_ex3 @@ -1720,11 +1718,11 @@ $$ where $p_{ij} = \text{Prob}\{X(t+1)=j \mid X(t)=i\}$. -(a) Starting from $\psi_0 = [1, 0]$, compute $\psi_n = \psi_0 P^n$ for $n = 1, 5, 20, 100$. +1. Starting from $\psi_0 = [1, 0]$, compute $\psi_n = \psi_0 P^n$ for $n = 1, 5, 20, 100$. -(b) Find the stationary distribution $\psi^*$ satisfying $\psi^* P = \psi^*$ and $\sum_i \psi^*_i = 1$. +1. Find the stationary distribution $\psi^*$ satisfying $\psi^* P = \psi^*$ and $\sum_i \psi^*_i = 1$. -(c) Verify numerically that $\psi_n \to \psi^*$ as $n$ grows. +1. Verify numerically that $\psi_n \to \psi^*$ as $n$ grows. ``` ```{solution-start} prob_matrix_ex4 @@ -1763,15 +1761,15 @@ print(f"psi_100 close to stationary? {np.allclose(psi_100, psi_star, atol=1e-6)} Let $X \in \{0,1\}$ and $Y \in \{0,1\}$ with marginals $\mu = [0.5,\, 0.5]$ and $\nu = [0.4,\, 0.6]$. -(a) Construct the **comonotone** (upper Fréchet) coupling that puts as much mass as possible on the diagonal $\{X=i, Y=i\}$. +1. Construct the **comonotone** (upper Fréchet) coupling that puts as much mass as possible on the diagonal $\{X=i, Y=i\}$. -(b) Construct the **counter-monotone** (lower Fréchet) coupling that puts as much mass as possible on the anti-diagonal. +1. Construct the **counter-monotone** (lower Fréchet) coupling that puts as much mass as possible on the anti-diagonal. -(c) Construct the **independent** coupling $f^{\perp}_{ij} = \mu_i \nu_j$. +1. Construct the **independent** coupling $f^{\perp}_{ij} = \mu_i \nu_j$. -(d) Verify that all three have the correct marginals. +1. Verify that all three have the correct marginals. -(e) For each coupling compute $\text{Cor}(X,Y)$. 
Which maximises / minimises the correlation? +1. For each coupling compute $\text{Cor}(X,Y)$. Which maximises / minimises the correlation? ``` ```{solution-start} prob_matrix_ex5 @@ -1830,7 +1828,7 @@ print(f"Cor independent = {correlation(F_indep, xs, ys):.4f}") A coin has unknown bias $\theta \in \{0.2,\, 0.5,\, 0.8\}$ with prior $\pi = [0.25,\, 0.50,\, 0.25]$. -(a) After observing $k = 7$ heads in $n = 10$ flips, compute the likelihood +1. After observing $k = 7$ heads in $n = 10$ flips, compute the likelihood $$ \mathcal{L}(\theta \mid \text{data}) = \binom{10}{7}\,\theta^7\,(1-\theta)^3 @@ -1838,11 +1836,11 @@ $$ for each $\theta$. -(b) Apply equation {eq}`eq:condprobbayes` to compute the posterior $\pi(\theta \mid \text{data})$. +1. Apply equation {eq}`eq:condprobbayes` to compute the posterior $\pi(\theta \mid \text{data})$. -(c) Plot the prior and posterior side by side. +1. Plot the prior and posterior side by side. -(d) Repeat for $k = 3$ heads and describe how the posterior shifts. +1. Repeat for $k = 3$ heads and describe how the posterior shifts. 
``` ```{solution-start} prob_matrix_ex6 From 41fba896f97d1ec23ec6f0b5da43a1615460502a Mon Sep 17 00:00:00 2001 From: Humphrey Yang Date: Tue, 21 Apr 2026 20:45:23 +1000 Subject: [PATCH 04/26] updates --- lectures/information_market_equilibrium.md | 1143 ++++++++++++++------ lectures/multivariate_normal.md | 31 +- lectures/prob_matrix.md | 213 ++-- 3 files changed, 901 insertions(+), 486 deletions(-) diff --git a/lectures/information_market_equilibrium.md b/lectures/information_market_equilibrium.md index 851560d7b..475df4d1f 100644 --- a/lectures/information_market_equilibrium.md +++ b/lectures/information_market_equilibrium.md @@ -15,7 +15,10 @@ kernelspec: ```{raw} jupyter ``` @@ -28,66 +31,85 @@ kernelspec: ## Overview -This lecture studies two questions about the **informational role of prices** posed and +This lecture studies two questions about the **informational role of prices** +posed and answered by {cite:t}`kihlstrom_mirman1975`. 1. *When do prices transmit inside information?* - An informed insider observes a private - signal correlated with an unknown state of the world and adjusts demand accordingly. + signal correlated with an unknown state of the world and adjusts demand + accordingly. - Equilibrium prices shift. - Under what conditions can an outside observer *infer* the insider's private signal from the equilibrium price? 2. *Do Bayesian price expectations converge?* - In a stationary stochastic exchange - economy, an uninformed observer uses the history of market prices and Bayes' Law to form - expectations about the economy's structure. + economy, an uninformed observer uses the history of market prices and + Bayes' Law to form + beliefs about the economy's structure and hence about its induced price + distribution. - Do those expectations eventually agree with those of a fully informed observer? 
Kihlstrom and Mirman's answers rely on two classical ideas from statistics: -- **Blackwell sufficiency**: a random variable $\tilde{y}$ is said to be *sufficient* for a random variable - $\tilde{y}'$ with respect to an unknown state if knowing $\tilde{y}$ gives all the +- **Blackwell sufficiency**: a random variable $\tilde{y}$ is said to be + *sufficient* for a random variable + $\tilde{y}'$ with respect to an unknown state if knowing $\tilde{y}$ gives + all the information about the state that $\tilde{y}'$ contains. -- **Bayesian consistency**: as the sample grows, a Bayesian statistician's posterior probability distribution concentrates on the true - parameter value, even when the underlying economic structure is not globally identified from prices alone. +- **Bayesian consistency**: as the sample grows, posterior beliefs eliminate + models that imply the wrong **price distribution**, so even when structure is + not identified from prices the posterior mass on the true **reduced form** + still converges to one. Important findings of {cite:t}`kihlstrom_mirman1975` are: -- Equilibrium prices transmit inside information *if and only if* the map from the - insider's posterior distribution to the equilibrium price vector is invertible - (one-to-one). -- For a two-state pure exchange economy with CES preferences, invertibility holds whenever the - elasticity of substitution $\sigma \neq 1$. - - With Cobb-Douglas preferences ($\sigma = 1$) - the equilibrium price is independent of the insider's posterior, so information is never - transmitted. -- In the dynamic economy, as information accumulates, Bayesian price expectations converge to **rational expectations**, even when the deep structure of the economy is not identified. +- Equilibrium prices transmit inside information *if and only if* the map from + the + insider's posterior distribution to the equilibrium price is one-to-one on + the set of + posteriors that can actually arise from the signal. 
+- In the paper's two-state theorem, invertibility holds when the informed + agent's utility is homothetic and the elasticity of substitution is everywhere + either below one or above one, with CES preferences providing a convenient + illustration and Cobb-Douglas preferences ($\sigma = 1$) giving the opposite + case in which the equilibrium price is independent of the insider's posterior. +- In the dynamic economy, as information accumulates, Bayesian price + expectations converge to **rational expectations**, even when the deep + structure is not identified from prices alone. ```{note} -{cite:t}`kihlstrom_mirman1975` use the terms "reduced form" and "structural" models in a +{cite:t}`kihlstrom_mirman1975` use the terms "reduced form" and "structural" +models in a way that careful econometricians do. Reduced-form and structural models come in pairs. To each structure or structural model -there is a reduced form, or collection of reduced forms, underlying different possible regressions. +there is a reduced form, or collection of reduced forms, underlying different +possible regressions. + +In this lecture, a **structure** is a parameterization of the underlying +endowment process. ``` The lecture is organized as follows. 1. Set up the static two-commodity model and define equilibrium. -2. State the price-revelation theorem (Theorem 1 of the paper) and the invertibility - conditions (Theorem 2). -3. Illustrate invertibility and its failure with numerical examples using CES and +2. State the price-revelation theorem and the invertibility conditions. +3. Illustrate invertibility and its failure with numerical examples using CES + and Cobb-Douglas preferences. -4. Introduce the dynamic stochastic economy and derive the Bayesian convergence result. +4. Introduce the dynamic stochastic economy and derive the Bayesian convergence + result. 5. Simulate Bayesian learning from price observations. 
-This lecture builds on ideas in {doc}`blackwell_kihlstrom` and {doc}`likelihood_bayes`. +This lecture builds on ideas in {doc}`blackwell_kihlstrom` and +{doc}`likelihood_bayes`. -## Setup +We start by importing some Python packages. ```{code-cell} ipython3 import numpy as np @@ -96,7 +118,8 @@ from scipy.optimize import brentq from scipy.stats import norm ``` -## A two-commodity economy with an informed insider + +## Setup ### Preferences, endowments, and the unknown state @@ -112,16 +135,28 @@ from a bundle $(x_1^i, x_2^i)$ is $$ U^i(x_1^i, x_2^i) - = \sum_{s=1}^{S} u^i(a_s x_1^i,\, x_2^i)\, PR^i(\bar{a} = a_s), + = \sum_{s=1}^{S} u^i(a_s x_1^i,\, x_2^i)\, P^i(\bar{a} = a_s), $$ -where $PR^i$ is agent $i$'s subjective probability distribution over the finite state space +where $P^i$ is agent $i$'s subjective probability distribution over the finite +state space $A = \{a_1, \ldots, a_S\}$. -Each agent starts with an endowment $w^i$ of good 2 and a share $\theta^i$ of the +Each agent starts with an endowment $w^i$ of good 2 and a share $\theta^i$ of +the representative firm. -The firm's profit $\pi$ is determined by profit maximization. +In the paper's formal model, a single firm transforms good 2 into good 1 +according to +$y_1 = f(y_2)$ with $f' < 0$ and chooses production to maximize + +$$ +\pi(p) = \max_{y_2 \leq 0} \{p f(y_2) + y_2\}. +$$ + +The firm's profit $\pi$ is then distributed to households according to the +shares +$\theta^i$. Agent $i$'s budget constraint is @@ -135,16 +170,25 @@ Agents maximize expected utility subject to their budget constraints. A **competitive equilibrium** is a price $\hat{p}$ that clears both markets simultaneously. +For most of what follows, the production side matters only through the induced +equilibrium price map, so when we turn to numerical illustrations we will +suppress production and use a pure-exchange / portfolio interpretation to keep +the calculations transparent. 
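Although production is suppressed in the numerical work below, the firm's problem above is easy to solve directly. Here is a minimal sketch with an *assumed* technology $f(y_2) = \sqrt{-y_2}$ (which satisfies $f' < 0$ on $y_2 < 0$); for this choice the optimum is $y_2^* = -(p/2)^2$ and $\pi(p) = p^2/4$:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def firm_problem(p):
    """
    Solve max_{y2 <= 0} p * f(y2) + y2 for the assumed f(y2) = sqrt(-y2).
    Returns (profit, optimal y2).
    """
    objective = lambda y2: -(p * np.sqrt(-y2) + y2)
    res = minimize_scalar(objective, bounds=(-100.0, -1e-12),
                          method='bounded')
    return -res.fun, res.x

p = 1.5
pi_num, y2_star = firm_problem(p)
print(pi_num, p**2 / 4)       # numerical vs analytic profit
print(y2_star, -(p / 2)**2)   # numerical vs analytic input choice
```

The numerical solution matches the closed form, confirming that for this assumed technology the firm's supply and profit depend only on the price $p$, which is what lets the production side be folded into the equilibrium price map.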
+ ### The informed agent's problem -Suppose **agent 1** (the insider) observes a private signal $\tilde{y}$ correlated with +Suppose **agent 1** (the insider) observes a private signal $\tilde{y}$ +correlated with $\bar{a}$ before trading. -Upon observing $\tilde{y} = y$, agent 1 updates their prior -$\mu = PR^1$ to a **posterior** $\mu_y = (\mu_{y1}, \ldots, \mu_{yS})$ via Bayes' rule: +Before the signal arrives, agent 1 has prior beliefs +$\mu_0 = P^1$. + +Upon observing $\tilde{y} = y$, agent 1 updates to the +**posterior** $\mu_y = (\mu_{y1}, \ldots, \mu_{yS})$ via Bayes' rule: $$ -\mu_{ys} = PR(\bar{a} = a_s \mid \tilde{y} = y). +\mu_{ys} = P(\bar{a} = a_s \mid \tilde{y} = y). $$ Because agent 1's demand depends on $\mu_y$, the new equilibrium price satisfies @@ -153,46 +197,60 @@ $$ \hat{p} = p(\mu_y). $$ -Outside observers who see $\hat{p}$ but not $\tilde{y}$ can try to *back out* the +Outside observers who see $\hat{p}$ but not $\tilde{y}$ can try to *back out* +the insider's posterior from the price. -This is possible when the map $\mu \mapsto p(\mu)$ -is **invertible** on the relevant domain. +Define the set of realized posteriors + +$$ +M = \{\mu_y : y \in Y,\; P(\tilde y = y) > 0\}. +$$ + +The key question is whether the map $\mu \mapsto p(\mu)$ is one-to-one on $M$. + +To answer that question, we now translate "information in prices" into +Blackwell's language of sufficiency. (price_revelation_theorem)= ## Price revelation ### Blackwell sufficiency -The price variable $p(\mu_{\tilde{y}})$ *accurately transmits* the insider's private -information if observing the equilibrium price is just as informative about $\bar{a}$ as +The price variable $p(\mu_{\tilde{y}})$ *accurately transmits* the insider's +private +information if observing the equilibrium price is just as informative about +$\bar{a}$ as observing the signal $\tilde{y}$ directly. 
-In Blackwell's language ({cite}`blackwell1951` and {cite}`blackwell1953`), this means +In Blackwell's language ({cite:t}`blackwell1951` and {cite:t}`blackwell1953`), +this means $p(\mu_{\tilde{y}})$ is **sufficient** for $\tilde{y}$. ```{prf:definition} Sufficiency :label: ime_def_sufficiency A random variable $\tilde{y}$ is *sufficient* for $\tilde{y}'$ (with -respect to $\bar{a}$) if there exists a conditional distribution $PR(y' \mid y)$, +respect to $\bar{a}$) if there exists a conditional distribution $P(y' \mid y)$, **independent of** $\bar{a}$, such that $$ -\phi'_a(y') = \sum_{y \in Y} PR(y' \mid y)\, \phi_a(y) +\phi'_a(y') = \sum_{y \in Y} P(y' \mid y)\, \phi_a(y) \quad \text{for all } a \text{ and all } y', $$ -where $\phi_a(y) = PR(\tilde{y} = y \mid \bar{a} = a)$. +where $\phi_a(y) = P(\tilde{y} = y \mid \bar{a} = a)$. Thus, once $\tilde{y}$ is known, $\tilde{y}'$ provides no additional information about $\bar{a}$. ``` +{cite:t}`kihlstrom_mirman1975` show that + ```{prf:lemma} Posterior Sufficiency :label: ime_lemma_posterior_sufficiency -({cite:t}`kihlstrom_mirman1975`) The posterior distribution $\mu_{\tilde{y}}$ +The posterior distribution $\mu_{\tilde{y}}$ is sufficient for $\tilde{y}$. ``` @@ -200,30 +258,76 @@ is sufficient for $\tilde{y}$. The posterior $\mu_{\tilde{y}}$ satisfies $$ -PR(\bar{a} = a_s \mid \mu_{\tilde{y}} = \mu_y,\; \tilde{y} = y) = \mu_{ys} - = PR(\bar{a} = a_s \mid \mu_{\tilde{y}} = \mu_y). +P(\bar{a} = a_s \mid \mu_{\tilde{y}} = \mu_y,\; \tilde{y} = y) = \mu_{ys} + = P(\bar{a} = a_s \mid \mu_{\tilde{y}} = \mu_y). $$ -Because the posterior itself *encodes* what $\tilde{y}$ says about $\bar{a}$, observing -$\tilde{y}$ directly would add no information. +This identity says that once the posterior is known, conditioning on the +original signal +$\tilde y$ does not change beliefs about $\bar a$. 
+ +Equivalently, the conditional law of $\tilde y$ given $\mu_{\tilde y}$ is +independent of +$\bar a$, so $\mu_{\tilde y}$ is sufficient for $\tilde y$ in Blackwell's sense. ``` +Now let's think about the mapping from +belief to price. + ```{prf:theorem} Price Revelation :label: ime_theorem_price_revelation In the economy described above, the price -random variable $p(\mu_{\tilde{y}})$ is sufficient for $\tilde{y}$ **if and only if** the -function $p(PR^1)$ is **invertible** on the set +random variable $p(\mu_{\tilde{y}})$ is sufficient for $\tilde{y}$ **if and only +if** the +belief-to-price map is one-to-one on the realized posterior set $M$, +equivalently if its +inverse is well defined on the price set $$ P \equiv \bigl\{\, p(\mu_y) : y \in Y,\; - PR(\tilde{y} = y) = \sum_{a \in A} \phi_a(y)\,\mu(a) > 0 \bigr\}. + P(\tilde{y} = y) = \sum_{a \in A} \phi_a(y)\,\mu_0(a) > 0 \bigr\}. $$ ``` -The "only if" direction follows because if $p$ were not one-to-one, two different posteriors -would generate the same price; an observer could not distinguish them, so the price would -not transmit all information that resides in the signal. +The logic is + +$$ +\tilde y \quad \longrightarrow \quad \mu_{\tilde y} \quad \longrightarrow \quad +p(\mu_{\tilde y}). +$$ + +The first arrow loses no information about $\bar a$ by +{prf:ref}`ime_lemma_posterior_sufficiency`, and the theorem asks when the second +arrow also loses no information. + +The proof has two parts. + +If $p(\cdot)$ is one-to-one on $M$, then observing the price is equivalent to +observing the +posterior itself because + +$$ +P(\mu_{\tilde y} = \mu \mid p(\mu_{\tilde y}) = p) += \begin{cases} +1 & \text{if } \mu = p^{-1}(p), \\ +0 & \text{otherwise.} +\end{cases} +$$ + +This conditional distribution is independent of the state, so price is +sufficient for the +posterior; together with {prf:ref}`ime_lemma_posterior_sufficiency`, price is +therefore +sufficient for the signal. 
+ +Conversely, if two different posteriors in $M$ generated the same price, an +observer of the price could not tell which posterior had occurred, and the paper +shows formally that in this case the conditional distribution of the posterior +given price would depend on the state, so price could not be sufficient. + +Before turning to invertibility itself, it helps to keep in mind the two +economic interpretations emphasized in the paper. ### Two interpretations @@ -233,56 +337,137 @@ Good 1 is a risky asset with random return $\bar{a}$; good 2 is "money". An insider's demand reveals private information about the return. -If the invertibility condition holds, outside observers can read the insider's signal from +If the invertibility condition holds, outside observers can read the insider's +signal from the equilibrium stock price. #### Price as a quality signal Good 1 has uncertain quality $\bar{a}$. -Experienced consumers (who have sampled the good) observe a signal correlated with quality +Experienced consumers (who have sampled the good) observe a signal correlated +with quality and buy accordingly. -Uninformed consumers can infer quality from the market price, provided invertibility holds. +Uninformed consumers can infer quality from the market price, provided +invertibility holds. (invertibility_conditions)= ## Invertibility and the elasticity of substitution -When does $p(PR^1)$ fail to be invertible? +The price-revelation theorem reduces the economic problem to a narrower one: +when is the belief-to-price map actually one-to-one? -Theorem 2 of {cite:t}`kihlstrom_mirman1975` -shows that for a two-state economy ($S = 2$), the answer turns on the **elasticity of +When does the belief-to-price map fail to be invertible? + +Theorem {prf:ref}`ime_theorem_invertibility_conditions` +shows that for a two-state economy ($S = 2$), the answer depends on the +**elasticity of substitution** $\sigma$ of agent 1's utility function. 
+Before stating the theorem, it helps to see the two intermediate steps in the +paper's +argument. + +```{prf:lemma} Same Price Implies Same Allocation +:label: ime_lemma_same_price_same_allocation + +Fix the beliefs of all agents except agent 1. + +If two posterior beliefs $\mu$ and $\mu'$ +both generate the same equilibrium price $p$, then they generate the same +equilibrium +allocation for every trader. +``` + +```{prf:proof} (Sketch) +All uninformed agents face the same price $p$ and keep the same beliefs, so +their demands +are unchanged. + +The firm's supply is also unchanged because it depends only on $p$. + +Market clearing then pins down agent 1's demand as the residual, so agent 1 must +consume the +same bundle under $\mu$ and $\mu'$ as well. +``` + +This lemma lets us define the informed agent's equilibrium bundle as a function +of price +alone: + +$$ +x(p) = (x_1(p), x_2(p)). +$$ + +Whenever the informed agent consumes positive amounts of both goods, optimality +of $x(p)$ +under posterior $\mu$ gives the interior first-order condition + +$$ +p = \frac{\sum_{s=1}^S a_s u_1^1(a_s x_1(p), x_2(p))\, \mu(a_s)} + {\sum_{s=1}^S u_2^1(a_s x_1(p), x_2(p))\, \mu(a_s)}. +$$ + +For a fixed price $p$, the bundle $x(p)$ is fixed too, so invertibility boils +down to +whether this equation admits a unique posterior $\mu$. + +```{prf:lemma} Unique Posterior at a Given Price +:label: ime_lemma_unique_posterior + +If, for each price $p \in P$, the first-order condition above has a unique +solution +$\mu \in M$, then the price map is invertible on $P$. +``` + +This is Lemma 3 in the paper: if two different posteriors gave the same price, +then by +{prf:ref}`ime_lemma_same_price_same_allocation` they would share the same bundle +$x(p)$, +contradicting uniqueness of the posterior that solves the first-order condition +at that +price. 
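The uniqueness argument can be checked numerically. The sketch below fixes a bundle $x(p)$ (as the lemma instructs), evaluates the two-state first-order condition, and verifies that the implied map from posterior $q$ to price is strictly monotone, hence invertible. The CES form with $\rho = 0.5$, the state values, and the bundle are all assumed numbers chosen for illustration:

```python
import numpy as np

ρ = 0.5                # CES parameter, σ = 2 (assumed)
a1, a2 = 2.0, 0.5      # state values (assumed)
x1, x2 = 1.0, 2.0      # fixed bundle x(p) (assumed)

def ces_marginals(c1, c2, ρ):
    """Marginal utilities of u(c1, c2) = (c1^ρ + c2^ρ)^(1/ρ)."""
    common = (c1**ρ + c2**ρ)**(1 / ρ - 1)
    return common * c1**(ρ - 1), common * c2**(ρ - 1)

# α_s = a_s u_1(a_s x1, x2),  β_s = u_2(a_s x1, x2)
u1_1, u2_1 = ces_marginals(a1 * x1, x2, ρ)
u1_2, u2_2 = ces_marginals(a2 * x1, x2, ρ)
α = np.array([a1 * u1_1, a2 * u1_2])
β = np.array([u2_1, u2_2])

def price_of_q(q):
    return (α[0] * q + α[1] * (1 - q)) / (β[0] * q + β[1] * (1 - q))

q_grid = np.linspace(0.01, 0.99, 99)
p_vals = price_of_q(q_grid)

# The map is strictly monotone iff α1 β2 - α2 β1 != 0
print(np.all(np.diff(p_vals) > 0) or np.all(np.diff(p_vals) < 0))  # True

# Invert: recover the posterior from the price via the linear equation
p = price_of_q(0.3)
q_rec = (α[1] - p * β[1]) / (p * β[0] - p * β[1] - α[0] + α[1])
print(q_rec)  # ≈ 0.3
```

Because the first-order condition is linear-fractional in $q$, strict monotonicity in $q$ at a fixed bundle delivers exactly the unique-posterior property that the lemma requires.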
+ ### The two-state first-order condition -With $S = 2$ and $\mu = (q,\, 1-q)$, the first-order condition for agent 1's demand -(equation (12a) in the paper) reduces to +With $S = 2$ and $\mu = (q,\, 1-q)$, define $$ -p(q) = \frac{\alpha_1 q + \alpha_2 (1-q)}{\beta_1 q + \beta_2 (1-q)}, +\alpha_s(p) = a_s\, u^1_1(a_s x_1(p),\, x_2(p)), \qquad +\beta_s(p) = u^1_2(a_s x_1(p),\, x_2(p)), \qquad s = 1, 2. $$ -where +Then the first-order condition becomes $$ -\alpha_s = a_s\, u^1_1(a_s x_1,\, x_2), \qquad -\beta_s = u^1_2(a_s x_1,\, x_2), \qquad s = 1, 2. +p = \frac{\alpha_1(p)\, q + \alpha_2(p)\, (1-q)} + {\beta_1(p)\, q + \beta_2(p)\, (1-q)}. $$ -The equilibrium consumption $(x_1, x_2)$ itself depends on $p$, so this is an implicit -equation in $p$. +At a fixed price $p$, the quantities $\alpha_s(p)$ and $\beta_s(p)$ are +constants, so +uniqueness of the posterior is the same as uniqueness of the scalar $q$ solving +this +equation. ```{prf:theorem} Invertibility Conditions :label: ime_theorem_invertibility_conditions Assume $u^1$ is quasi-concave and -homothetic with continuous first partials. Assume agent 1 always consumes positive -quantities of both goods. For $S = 2$: +homothetic with continuous first partials. + +Assume agent 1 always consumes positive +quantities of both goods. + +For $S = 2$: -- If $\sigma < 1$ for all feasible allocations, $p(PR^1)$ is **invertible** on $P$. -- If $\sigma > 1$ for all feasible allocations, $p(PR^1)$ is **invertible** on $P$. -- If $u^1$ is **Cobb-Douglas** ($\sigma = 1$), $p(PR^1)$ is **constant** on $P$ +- If $\sigma < 1$ for all feasible allocations, the price map is **invertible** + on $P$. +- If $\sigma > 1$ for all feasible allocations, the price map is **invertible** + on $P$. +- If $u^1$ is **Cobb-Douglas** ($\sigma = 1$), the price map is **constant** on + $P$ (no information is transmitted). ``` @@ -291,13 +476,18 @@ making agent 1's demand for good 1 independent of information about $\bar{a}$. 
So the market price cannot reveal that information. +The general theorem is abstract, so we now specialize to CES utility to make the +mechanism concrete. + ### CES utility -For concreteness we work with the **constant-elasticity-of-substitution** (CES) utility +For concreteness we work with the **constant-elasticity-of-substitution** (CES) +utility function $$ -u(c_1, c_2) = \bigl(c_1^{\rho} + c_2^{\rho}\bigr)^{1/\rho}, \qquad \rho \in (-\infty,0) \cup (0,1), +u(c_1, c_2) = \bigl(c_1^{\rho} + c_2^{\rho}\bigr)^{1/\rho}, \qquad \rho \in +(-\infty,0) \cup (0,1), $$ whose elasticity of substitution is $\sigma = 1/(1-\rho)$. @@ -309,17 +499,26 @@ whose elasticity of substitution is $\sigma = 1/(1-\rho)$. Pertinent partial derivatives are $$ -u_1(c_1,c_2) = \bigl(c_1^\rho + c_2^\rho\bigr)^{1/\rho - 1}\, c_1^{\rho-1}, \qquad +u_1(c_1,c_2) = \bigl(c_1^\rho + c_2^\rho\bigr)^{1/\rho - 1}\, c_1^{\rho-1}, +\qquad u_2(c_1,c_2) = \bigl(c_1^\rho + c_2^\rho\bigr)^{1/\rho - 1}\, c_2^{\rho-1}. $$ +This CES example is only an illustration, because the theorem itself covers any +homothetic utility with elasticity everywhere above one or everywhere below one. + +With that example in hand, we can compute the equilibrium price directly as a +function of the posterior. + ### Equilibrium price as a function of the posterior -We focus on agent 1 as the *only* informed trader who absorbs one unit of good 1 at +We focus on agent 1 as the *only* informed trader who absorbs one unit of good 1 +at equilibrium (i.e., $x_1 = 1$). Agent 1's budget constraint then reduces to -$x_2 = W^1 - p$, and the equilibrium price is the unique $p \in (0, W^1)$ satisfying +$x_2 = W^1 - p$, and the equilibrium price is the unique $p \in (0, W^1)$ +satisfying the first-order condition $$ @@ -327,54 +526,37 @@ p \bigl[q\, u_2(a_1,\, W^1-p) + (1-q)\, u_2(a_2,\, W^1-p)\bigr] = q\, a_1\, u_1(a_1,\, W^1-p) + (1-q)\, a_2\, u_1(a_2,\, W^1-p). 
$$ -For Cobb-Douglas utility ($\sigma = 1$), the first-order condition becomes $p = W^1 - p$, -giving $p^* = W^1/2$ regardless of the posterior $q$—confirming that no information +For Cobb-Douglas utility ($\sigma = 1$), the first-order condition becomes $p = +W^1 - p$, +giving $p^* = W^1/2$ regardless of the posterior $q$—confirming that no +information is transmitted through the price in the Cobb-Douglas case. We compute first-order conditions numerically below. ```{code-cell} ipython3 -def ces_derivatives(c1, c2, rho): +def ces_derivatives(c1, c2, ρ): """ - Returns (u1, u2) for u(c1,c2) = (c1^rho + c2^rho)^(1/rho). - Uses Cobb-Douglas limit for |rho| < 1e-4 to avoid numerical overflow. + Return CES marginal utilities. + + Use the Cobb-Douglas limit near rho = 0. """ - if abs(rho) < 1e-4: - # Cobb-Douglas limit u = sqrt(c1*c2) + if abs(ρ) < 1e-4: u1 = 0.5 * np.sqrt(c2 / c1) u2 = 0.5 * np.sqrt(c1 / c2) else: - common = (c1**rho + c2**rho)**(1/rho - 1) - u1 = common * c1**(rho - 1) - u2 = common * c2**(rho - 1) + common = (c1**ρ + c2**ρ)**(1 / ρ - 1) + u1 = common * c1**(ρ - 1) + u2 = common * c2**(ρ - 1) return u1, u2 -def eq_price(q, a1, a2, W1, rho): - """ - Solve for the equilibrium price when the informed agent absorbs one unit - of good 1. With x1 = 1 and budget constraint x2 = W1 - p, the FOC - - p [q u2(a1, x2) + (1-q) u2(a2, x2)] = q a1 u1(a1, x2) + (1-q) a2 u1(a2, x2) - - has a unique root p* in (0, W1). 
- - Parameters - ---------- - q : posterior probability on state 1 (high state) - a1 : state-1 productivity value (a1 > a2) - a2 : state-2 productivity value - W1 : informed agent's wealth - rho : CES parameter (rho=0 → Cobb-Douglas; analytical p* = W1/2) - - Returns - ------- - p_star : equilibrium price, or nan if solver fails - """ +def eq_price(q, a1, a2, W1, ρ): + """Return the equilibrium price for posterior q.""" def residual(p): - x2 = W1 - p # x1 = 1 absorbed at equilibrium - u1_s1, u2_s1 = ces_derivatives(a1, x2, rho) - u1_s2, u2_s2 = ces_derivatives(a2, x2, rho) + x2 = W1 - p + u1_s1, u2_s1 = ces_derivatives(a1, x2, ρ) + u1_s2, u2_s2 = ces_derivatives(a2, x2, ρ) lhs = p * (q * u2_s1 + (1 - q) * u2_s2) rhs = q * a1 * u1_s1 + (1 - q) * a2 * u1_s2 return lhs - rhs @@ -392,45 +574,44 @@ mystnb: caption: equilibrium price vs posterior name: fig-eq-price-posterior --- -# ── Economy parameters ────────────────────────────────────────────────────── a1, a2 = 2.0, 0.5 # state values (a1 > a2) -W1 = 4.0 # informed agent's wealth; equilibrium x2 = W1 - p +W1 = 4.0 -# Posterior grid q_grid = np.linspace(0.05, 0.95, 200) -# rho values to compare: complements (<0), Cobb-Douglas (=0), substitutes (>0) -rho_values = [-0.5, 0.0, 0.5] -rho_labels = [r"$\rho = -0.5$ ($\sigma = 0.67$, complements)", - r"$\rho = 0$ ($\sigma = 1$, Cobb-Douglas)", - r"$\rho = 0.5$ ($\sigma = 2$, substitutes)"] -colors = ["steelblue", "crimson", "forestgreen"] +ρ_values = [-0.5, 0.0, 0.5] +ρ_labels = [ + r"$\rho = -0.5$ ($\sigma = 0.67$, complements)", + r"$\rho = 0$ ($\sigma = 1$, Cobb-Douglas)", + r"$\rho = 0.5$ ($\sigma = 2$, substitutes)", +] fig, ax = plt.subplots(figsize=(8, 5)) -for rho, label, color in zip(rho_values, rho_labels, colors): - prices = [eq_price(q, a1, a2, W1, rho) for q in q_grid] - ax.plot(q_grid, prices, label=label, color=color, lw=2) +for ρ, label in zip(ρ_values, ρ_labels): + prices = [eq_price(q, a1, a2, W1, ρ) for q in q_grid] + ax.plot(q_grid, prices, label=label, 
lw=2) ax.set_xlabel(r"posterior probability $q = \Pr(\bar{a} = a_1)$", fontsize=12) ax.set_ylabel("equilibrium price $p^*(q)$", fontsize=12) ax.legend(fontsize=10) -ax.grid(alpha=0.3) plt.tight_layout() plt.show() ``` The plot confirms {prf:ref}`ime_theorem_invertibility_conditions`. -- **CES with $\sigma \neq 1$**: the equilibrium price is **strictly monotone** in $q$. - An outside observer who knows the equilibrium map $p^*(\cdot)$ can uniquely invert the +- *CES with $\sigma \neq 1$*: the equilibrium price is **strictly monotone** in + $q$. + + - An outside observer who knows the equilibrium map $p^*(\cdot)$ can uniquely + invert the price to recover $q$—inside information is fully transmitted. -- **Cobb-Douglas ($\sigma = 1$)**: the price is *flat* in $q$—information is never +- *Cobb-Douglas ($\sigma = 1$)*: the price is *flat* in $q$—information is never transmitted through the market. ```{code-cell} ipython3 -# Verify that rho=0 (exact Cobb-Douglas) gives a flat line -p_cd = [eq_price(q, a1, a2, W1, rho=0.0) for q in q_grid] +p_cd = [eq_price(q, a1, a2, W1, ρ=0.0) for q in q_grid] print(f"Cobb-Douglas (rho=0): min p* = {min(p_cd):.6f}, " f"max p* = {max(p_cd):.6f}, " @@ -438,14 +619,38 @@ print(f"Cobb-Douglas (rho=0): min p* = {min(p_cd):.6f}, " print(f"Analytical CD price = W1/2 = {W1/2:.6f}") ``` -Every entry equals $W^1/2 = 2.0$ exactly, confirming analytically that the Cobb-Douglas +Every entry equals $W^1/2 = 2.0$ exactly, confirming analytically that the +Cobb-Douglas equilibrium price is independent of $q$ and of the state values $a_1, a_2$. +The numerical plot shows monotonicity, and the next subsection connects that +pattern back to the proof of {prf:ref}`ime_theorem_invertibility_conditions`. + (price_monotonicity)= ### Why monotonicity depends on $\sigma$ -The derivative $\partial p / \partial q$ has the sign of $\alpha_1 \beta_2 - \alpha_2 \beta_1$ -(from differentiating the FOC formula). 
+The key step in the paper fixes a price $p$, treats $\alpha_s(p)$ and
+$\beta_s(p)$ as constants, and then observes that the right-hand side
+
+$$
+\frac{\alpha_1(p)\, q + \alpha_2(p)\, (1-q)}
+     {\beta_1(p)\, q + \beta_2(p)\, (1-q)}
+$$
+
+is a function of $q$ whose derivative is
+
+$$
+\frac{\partial}{\partial q}
+\frac{\alpha_1 q + \alpha_2 (1-q)}
+     {\beta_1 q + \beta_2 (1-q)}
+= \frac{\alpha_1 \beta_2 - \alpha_2 \beta_1}
+       {\bigl[\beta_1 q + \beta_2 (1-q)\bigr]^2}.
+$$
+
+So the sign is determined by $\alpha_1 \beta_2 - \alpha_2 \beta_1$, and if that
+sign is constant then for each fixed price there is at most one posterior weight
+$q$ consistent with the first-order condition, which is exactly what
+{prf:ref}`ime_theorem_invertibility_conditions` requires.

 Using

@@ -463,14 +668,23 @@ $$
 \Bigl(\frac{x_2}{x_1}\Bigr)^{1/\sigma}.
 $$

-This is positive when $\sigma > 1$, negative when $\sigma < 1$, and **zero when $\sigma = 1$**
-(Cobb-Douglas).
+For the CES specification, this derivative is positive when $\sigma > 1$,
+negative when
+$\sigma < 1$, and *zero when $\sigma = 1$*.
+
+In other words, for CES utility the ratio $\alpha_s / \beta_s$ moves
+monotonically with the state value $a_s$ unless $\sigma = 1$, which makes the
+fixed-price first-order-condition expression monotone in $q$ and in turn
+delivers invertibility.

-The vanishing derivative means the marginal rate of substitution is
-independent of $a_s$, so the informed agent's demand—and hence the equilibrium price—does
+The vanishing derivative in the Cobb-Douglas case means the marginal rate of
+substitution is
+independent of $a_s$, so the informed agent's demand, and hence the equilibrium
+price, does
 not respond to changes in beliefs.
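As a quick numerical sanity check on the quotient-rule formula, the sketch below compares the claimed derivative with a central finite difference at an arbitrary posterior; the particular values of $\alpha_s$ and $\beta_s$ are random placeholders, not calibrated to the model.

```python
import numpy as np

rng = np.random.default_rng(0)
α1, α2, β1, β2 = rng.uniform(0.5, 2.0, size=4)  # arbitrary positive values

def rhs(q):
    """Fixed-price right-hand side of the first-order condition."""
    return (α1 * q + α2 * (1 - q)) / (β1 * q + β2 * (1 - q))

q, h = 0.3, 1e-6
numeric = (rhs(q + h) - rhs(q - h)) / (2 * h)   # central difference
claimed = (α1 * β2 - α2 * β1) / (β1 * q + β2 * (1 - q))**2

print(abs(numeric - claimed))
```

The two numbers agree up to finite-difference error, for any positive draw of the four constants.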
-Let us visualize the ratio $\alpha_s / \beta_s$ as a function of $a_s$ for different +Let us visualize the ratio $\alpha_s / \beta_s$ as a function of $a_s$ for +different values of $\sigma$: ```{code-cell} ipython3 @@ -481,38 +695,41 @@ mystnb: name: fig-mrs-alpha-beta --- a_vals = np.linspace(0.3, 3.0, 300) -x1_fix, x2_fix = 1.0, 1.0 # fix consumption bundle for illustration +x1_fix, x2_fix = 1.0, 1.0 fig, ax = plt.subplots(figsize=(7, 4)) -for rho, color in zip([-0.5, -1e-6, 0.5], ["steelblue", "crimson", "forestgreen"]): - sigma = 1 / (1 - rho) if abs(rho) > 1e-8 else 1.0 +for ρ in [-0.5, -1e-6, 0.5]: + σ = 1 / (1 - ρ) if abs(ρ) > 1e-8 else 1.0 ratios = [] for a in a_vals: - u1, u2 = ces_derivatives(a * x1_fix, x2_fix, rho) + u1, u2 = ces_derivatives(a * x1_fix, x2_fix, ρ) ratios.append(a * u1 / u2) - ax.plot(a_vals, ratios, - label=rf"$\sigma = {sigma:.2f}$", color=color, lw=2) + ax.plot(a_vals, ratios, label=rf"$\sigma = {σ:.2f}$", lw=2) ax.set_xlabel(r"state value $a_s$", fontsize=12) ax.set_ylabel(r"$\alpha_s / \beta_s = a_s u_1 / u_2$", fontsize=12) ax.axhline(y=1.0, color="black", lw=0.8, ls="--") ax.legend(fontsize=10) -ax.grid(alpha=0.3) plt.tight_layout() plt.show() ``` -When $\sigma = 1$ (red line) the ratio is constant across all $a_s$ values—information +When $\sigma = 1$ the ratio is constant across all $a_s$ values—information about the state has no effect on the marginal rate of substitution. For $\sigma < 1$ the ratio is decreasing in $a_s$, and for $\sigma > 1$ it is increasing, making the equilibrium price strictly monotone in the posterior $q$ in both cases. +The static analysis asks whether a current price reveals current private +information, whereas the next section asks what a whole history of prices +reveals over time. + (bayesian_price_expectations)= ## Bayesian price expectations in a dynamic economy -We now turn to a question addressed in Section 3 of {cite:t}`kihlstrom_mirman1975`. 
+We now turn to a question addressed in Section 3 of +{cite:t}`kihlstrom_mirman1975`. ### A stochastic exchange economy @@ -525,41 +742,73 @@ In each period $t$: 3. Consumers trade and consume. The endowment vectors $\{\tilde{\omega}^t\}$ are **i.i.d.** with density -$f(\omega^t \mid \lambda)$, where $\lambda = (\lambda_1, \ldots, \lambda_n)$ is a +$f(\omega^t \mid \lambda)$, where $\lambda = (\lambda_1, \ldots, \lambda_n)$ is +a **structural parameter vector** that is *fixed but unknown*. The equilibrium price at time $t$ is a deterministic function of $\omega^t$, so -$\{p^t\}$ is also i.i.d. with density +$\{p^t\}$ is also i.i.d. + +For any measurable price set $P$, let + +$$ +W(P) = \{\omega^t : p(\omega^t) \in P\}. +$$ + +Then $$ -g(p^t \mid \lambda) = \int f(\omega^t \mid \lambda)\, - \mathbf{1}\bigl[p(\omega^t) = p^t\bigr]\, d\omega^t. +P_\lambda(p^t \in P) = P_\lambda(\omega^t \in W(P)) += \int_{W(P)} f(\omega^t \mid \lambda)\, d\omega^t. $$ -Following econometric convention, {cite:t}`kihlstrom_mirman1975` call $g(p \mid \lambda)$ -the **reduced form** and $f(\omega \mid \lambda)$ the **structure**. +The induced price density is denoted by $g(p^t \mid \lambda)$. + +For a given structure $\lambda$, this density is the observable implication of +the model, and when several structures imply the same density we group them +into a single reduced-form class. + +The next issue is therefore what an observer can and cannot infer about the +structure from price data alone. ### The identification problem -Because the map $\omega \mapsto p(\omega)$ is many-to-one, observing prices loses +Because the map $\omega \mapsto p(\omega)$ is many-to-one, observing prices +loses information relative to observing endowments. In particular, it may be impossible to recover $\lambda$ from $g(p \mid \lambda)$ even with infinite price data. 
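The information loss is easy to see in a toy example that is not in the paper: suppose endowments are $\omega \sim N(\lambda, 1)$ with $\lambda \in \{+1, -1\}$, and the (hypothetical) price function is $p(\omega) = \omega^2$, which discards the sign of $\omega$. The two structures then imply the same price distribution, so no amount of price data can separate them.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Endowments under the two candidate structures.
ω_plus = rng.normal(1.0, 1.0, n)    # λ = +1
ω_minus = rng.normal(-1.0, 1.0, n)  # λ = -1

# A many-to-one price map: the sign of ω is lost.
p_plus, p_minus = ω_plus**2, ω_minus**2

# The implied price distributions are indistinguishable.
qs = np.linspace(0.05, 0.95, 19)
gap = np.max(np.abs(np.quantile(p_plus, qs) - np.quantile(p_minus, qs)))
print(f"largest quantile gap: {gap:.3f}")
```

By symmetry of the normal density, $\omega^2$ has exactly the same distribution under both structures, and the empirical quantile gap shrinks toward zero as $n$ grows.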
To handle this, partition $\Lambda$ into equivalence classes $\mu$ such that -$\lambda \in \mu$ and $\lambda' \in \mu$ whenever $g(p \mid \lambda) = g(p \mid \lambda')$ +$\lambda \in \mu$ and $\lambda' \in \mu$ whenever $g(p \mid \lambda) = g(p \mid +\lambda')$ for all $p$. The equivalence class $\mu$ containing the true $\lambda$ is the **reduced -form** (with respect to data on prices). +form** relevant for price data. An observer who knows the infinite price history learns $\mu$ but not necessarily $\lambda$. +Once that distinction is clear, Bayesian updating can be written down directly. + ### Bayesian updating -An uninformed observer begins with a prior $h(\lambda)$ over $\lambda \in \Lambda$. +An uninformed observer begins with a prior $h(\lambda)$ over $\lambda \in +\Lambda$. + +If the observer could see endowments directly, the posterior would be + +$$ +h(\lambda \mid \omega^1, \ldots, \omega^t) + = \frac{h(\lambda)\, \prod_{\tau=1}^{t} f(\omega^\tau \mid \lambda)} + {\displaystyle\sum_{\lambda' \in \Lambda} + h(\lambda')\, \prod_{\tau=1}^{t} f(\omega^\tau \mid \lambda')}, +$$ + +and the paper appeals to a Bayesian consistency result to conclude that this +posterior concentrates on the true structure $\bar \lambda$. After observing the price sequence $(p^1, \ldots, p^t)$, the observer's Bayesian posterior is @@ -571,6 +820,21 @@ h(\lambda \mid p^1, \ldots, p^t) h(\lambda')\, \prod_{\tau=1}^{t} g(p^\tau \mid \lambda')}. $$ +Price data cannot distinguish structures inside the same reduced-form class. + +Indeed, if +$\lambda$ and $\lambda'$ belong to the same class $\mu$, then +$g(\cdot \mid \lambda) = g(\cdot \mid \lambda')$, so + +$$ +\frac{h(\lambda \mid p^1, \ldots, p^t)} + {h(\lambda' \mid p^1, \ldots, p^t)} += \frac{h(\lambda)}{h(\lambda')} +$$ + +for every sample history, so the relative odds within an observationally +equivalent class never change. 
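A minimal numeric check of this odds-invariance, with hypothetical Gaussian densities standing in for $g(\cdot \mid \lambda)$: the first two structures below share a reduced form, so their posterior odds never move from the prior odds $0.2/0.3$.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

# Structures 1 and 2 imply the same price density; structure 3 does not.
means = np.array([2.0, 2.0, 1.2])
h = np.array([0.2, 0.3, 0.5])              # prior over the three structures

for p in rng.normal(2.0, 0.4, size=50):    # prices drawn from structure 1
    h = h * norm.pdf(p, loc=means, scale=0.4)
    h = h / h.sum()

print(h[0] / h[1])  # stays at the prior odds 0.2 / 0.3
```

Each update multiplies the first two weights by an identical likelihood, so their ratio is untouched by normalization.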
+ At time $t$, the observer's price expectations for the next period are $$ @@ -579,6 +843,9 @@ g(p^{t+1} \mid p^1, \ldots, p^t) h(\lambda \mid p^1, \ldots, p^t). $$ +With the posterior and predictive density defined, we can state the paper's +convergence result. + ### The convergence theorem ```{prf:theorem} Bayesian Convergence @@ -587,11 +854,31 @@ $$ Let $\bar\lambda$ be the true structural parameter and $\bar\mu$ the reduced form that contains $\bar\lambda$. +Assume the prior assigns positive probability to $\bar\lambda$ (equivalently, +positive +probability to the class $\bar\mu$). + +Define the posterior mass on a reduced-form class by + +$$ +H_t(\mu) = \sum_{\lambda \in \mu} h(\lambda \mid p^1, \ldots, p^t). +$$ + +Because all structures inside a class imply the same $g(\cdot \mid \lambda)$, +the +predictive density can equivalently be written as + +$$ +g(p^{t+1} \mid p^1, \ldots, p^t) + = \sum_{\mu} g(p^{t+1} \mid \mu)\, H_t(\mu). +$$ + Then $$ -\lim_{t \to \infty} h(\mu \mid p^1, \ldots, p^t) - = \begin{cases} 1 & \text{if } \mu = \bar\mu, \\ 0 & \text{otherwise,} \end{cases} +\lim_{t \to \infty} H_t(\mu) + = \begin{cases} 1 & \text{if } \mu = \bar\mu, \\ 0 & \text{otherwise,} + \end{cases} $$ with probability one. @@ -602,18 +889,29 @@ $$ \lim_{t \to \infty} g(p^{t+1} \mid p^1, \ldots, p^t) = g(p \mid \bar\mu), $$ -which equals the rational-expectations price distribution for a fully informed observer. +which equals the rational-expectations price distribution for a fully informed +observer. ``` -Establishing convergence relies on appealing to the **Bayesian consistency** result of {cite:t}`degroot1962`: as -long as $g(\cdot \mid \mu)$ and $g(\cdot \mid \mu')$ generate mutually singular measures -(which holds here generically), the posterior concentrates on the true reduced form. +The important distinction is that price observers need not learn $\bar \lambda$ +itself. + +They only learn which reduced-form class is correct. 
-Price observers converge to **rational expectations** even if they never identify the -underlying structure $\bar\lambda$. +That is enough for forecasting because every $\lambda \in \bar \mu$ generates +the same price density $g(\cdot \mid \bar \mu)$. -The reduced form $g(p \mid \bar\mu)$ statistical model is used to form equilibrium price -expectations, and the Bayesian observer learns the reduced form from prices alone. +This is exactly the paper's point: rational price expectations emerge from +learning the +reduced form, not from identifying every structural detail of the economy. + +Here "rational expectations" means that the observer's predictive distribution +for next +period's price matches the objective price distribution generated by the true +reduced form. + +The theorem is easiest to absorb in a stripped-down example, so we now turn to a +simple simulation. (bayesian_simulation)= ## Simulating Bayesian learning from prices @@ -623,12 +921,14 @@ We illustrate the theorem with a two-state example. Two possible reduced forms $\mu_1$ and $\mu_2$ generate prices $p^t \sim N(\bar{p}_i, \sigma_p^2)$ for $i = 1, 2$ respectively. -The observer knows the two possible price distributions (the reduced forms) but not which +The observer knows the two possible price distributions (the reduced forms) but +not which one governs the data. -This is a standard **Bayesian model selection** problem. +This is a **Bayesian model selection** problem. -With a prior $h_0$ on $\mu_1$ and the observed price $p^t$, the posterior weight on $\mu_1$ +With a prior $h_0$ on $\mu_1$ and the observed price $p^t$, the posterior weight +on $\mu_1$ after period $t$ is $$ @@ -637,37 +937,22 @@ h_t = \frac{h_{t-1}\, g(p^t \mid \mu_1)}{h_{t-1}\, g(p^t \mid \mu_1) $$ ```{code-cell} ipython3 -def simulate_bayesian_learning(p_bar_true, p_bar_alt, sigma_p, T, h0, n_paths, - seed=42): - """ - Simulate Bayesian learning about which price distribution is true. 
- - Parameters - ---------- - p_bar_true : mean of the true reduced form - p_bar_alt : mean of the alternative reduced form - sigma_p : common standard deviation of price distributions - T : number of periods - h0 : initial prior probability on the true model - n_paths : number of simulation paths - seed : random seed - - Returns - ------- - h_paths : array of shape (n_paths, T+1) with posterior beliefs on true model - """ +def simulate_bayesian_learning( + p_bar_true, p_bar_alt, σ_p, T, h0, n_paths, seed=42 +): + """Simulate posterior learning between two Gaussian reduced forms.""" rng = np.random.default_rng(seed) h_paths = np.zeros((n_paths, T + 1)) h_paths[:, 0] = h0 for path in range(n_paths): h = h0 - prices = rng.normal(p_bar_true, sigma_p, size=T) + prices = rng.normal(p_bar_true, σ_p, size=T) for t, p in enumerate(prices): - g_true = norm.pdf(p, loc=p_bar_true, scale=sigma_p) - g_alt = norm.pdf(p, loc=p_bar_alt, scale=sigma_p) - denom = h * g_true + (1 - h) * g_alt - h = h * g_true / denom + g_true = norm.pdf(p, loc=p_bar_true, scale=σ_p) + g_alt = norm.pdf(p, loc=p_bar_alt, scale=σ_p) + denom = h * g_true + (1 - h) * g_alt + h = h * g_true / denom h_paths[path, t + 1] = h return h_paths @@ -684,12 +969,16 @@ def plot_bayesian_learning(h_paths, p_bar_true, p_bar_alt, ax): median_path = np.median(h_paths, axis=0) ax.plot(t_grid, median_path, color="navy", lw=2, label="median posterior") - ax.axhline(y=1.0, color="black", ls="--", lw=1.2, label="true model weight = 1") + ax.axhline( + y=1.0, + color="black", + ls="--", + lw=1.2, + label="true model weight = 1", + ) ax.set_xlabel("period $t$", fontsize=12) ax.set_ylabel(r"$h_t$ = posterior weight on true model", fontsize=12) ax.legend(fontsize=10) - ax.set_ylim(-0.05, 1.08) - ax.grid(alpha=0.3) ``` ```{code-cell} ipython3 @@ -702,30 +991,38 @@ mystnb: T = 300 h0 = 0.5 # diffuse prior n_paths = 40 -sigma_p = 0.4 +σ_p = 0.4 fig, axes = plt.subplots(1, 2, figsize=(12, 5)) -# Case 1: distinct reduced forms (easy 
to learn) +# Distinct reduced forms. p_bar_true, p_bar_alt = 2.0, 1.2 -h_paths = simulate_bayesian_learning(p_bar_true, p_bar_alt, sigma_p, T, h0, n_paths) +h_paths = simulate_bayesian_learning(p_bar_true, p_bar_alt, σ_p, T, h0, n_paths) plot_bayesian_learning(h_paths, p_bar_true, p_bar_alt, axes[0]) -# Case 2: similar reduced forms (harder to learn) +# Similar reduced forms. p_bar_true, p_bar_alt = 2.0, 1.8 -h_paths_hard = simulate_bayesian_learning(p_bar_true, p_bar_alt, sigma_p, T, h0, n_paths) +h_paths_hard = simulate_bayesian_learning( + p_bar_true, p_bar_alt, σ_p, T, h0, n_paths +) plot_bayesian_learning(h_paths_hard, p_bar_true, p_bar_alt, axes[1]) plt.tight_layout() plt.show() ``` -In both panels the posterior weight on the true model converges to 1 with probability one, -though convergence is slower when the two price distributions are similar (right panel). +In both panels the posterior weight on the true model converges to 1 with +probability one, +though convergence is slower when the two price distributions are similar (right +panel). + +This first simulation tracks posterior mass, and the next one tracks the +predictive density itself. ### Price expectations vs. rational expectations -We now verify that the observer's price expectations converge to the rational-expectations +We now verify that the observer's price expectations converge to the +rational-expectations distribution $g(p \mid \bar\mu)$. ```{code-cell} ipython3 @@ -735,62 +1032,72 @@ mystnb: caption: price distribution convergence name: fig-price-convergence --- -def price_expectation(h_t, p_bar_true, p_bar_alt, p_grid): - """ - Compute the observer's predictive price density at posterior weight h_t. - Mixture: h_t * N(p_bar_true, ...) + (1-h_t) * N(p_bar_alt, ...) 
- """ - return (h_t * norm.pdf(p_grid, loc=p_bar_true, scale=sigma_p) - + (1 - h_t) * norm.pdf(p_grid, loc=p_bar_alt, scale=sigma_p)) +def price_expectation(h_t, p_bar_true, p_bar_alt, sigma_p, p_grid): + """Return the predictive price density at posterior weight h_t.""" + return ( + h_t * norm.pdf(p_grid, loc=p_bar_true, scale=sigma_p) + + (1 - h_t) * norm.pdf(p_grid, loc=p_bar_alt, scale=sigma_p) + ) p_bar_true, p_bar_alt = 2.0, 1.2 -sigma_p = 0.4 -T_long = 1000 +σ_p = 0.4 n_paths = 1 +T_long = 1000 + h_paths_long = simulate_bayesian_learning( - p_bar_true, p_bar_alt, sigma_p, T_long, h0=0.5, n_paths=n_paths, seed=7 + p_bar_true, p_bar_alt, σ_p, T_long, h0=0.5, n_paths=n_paths, seed=7 ) p_grid = np.linspace(0.0, 3.5, 300) -re_density = norm.pdf(p_grid, loc=p_bar_true, scale=sigma_p) +re_density = norm.pdf(p_grid, loc=p_bar_true, scale=σ_p) fig, ax = plt.subplots(figsize=(8, 5)) -snapshots = [0, 10, 50, 200, T_long] +snapshots = [0, 1, 3, 5, 10] palette = plt.cm.Blues(np.linspace(0.3, 1.0, len(snapshots))) for t_snap, col in zip(snapshots, palette): h_t = h_paths_long[0, t_snap] - dens = price_expectation(h_t, p_bar_true, p_bar_alt, p_grid) - ax.plot(p_grid, dens, color=col, lw=2, - label=rf"$t = {t_snap}$, $h_t = {h_t:.3f}$") + dens = price_expectation(h_t, p_bar_true, p_bar_alt, σ_p, p_grid) + ax.plot( + p_grid, + dens, + color=col, + lw=2, + label=rf"$t = {t_snap}$, $h_t = {h_t:.3f}$", + ) ax.plot(p_grid, re_density, "k--", lw=2, - label=r"rational expectations $g(p \mid \bar\mu)$") + label=r"rational expectations $g(p \mid \bar{\mu})$") ax.set_xlabel("price $p$", fontsize=12) ax.set_ylabel("density", fontsize=12) ax.legend(fontsize=9) -ax.grid(alpha=0.3) plt.tight_layout() plt.show() ``` -The sequence of predictive densities (shades of blue) converges to the rational-expectations +The sequence of predictive densities (shades of blue) converges to the +rational-expectations density (dashed black line) as experience accumulates. 
This illustrates {prf:ref}`ime_theorem_bayesian_convergence`. +We can now sharpen the point by looking at a case in which the reduced form is +learned but the underlying structure is not. + (km_extension_nonidentification)= ### Learning the reduced form without identifying the structure -The convergence result is particularly striking because the observer converges to +The convergence result is particularly striking because the observer converges +to *rational expectations* even when the underlying **structure** $\lambda$ is *not identified* by prices. To illustrate this, consider a case with *three* possible structures $\lambda^{(1)}, \lambda^{(2)}, \lambda^{(3)}$ but only *two* reduced forms $\mu_1 = \{\lambda^{(1)}, \lambda^{(2)}\}$ and $\mu_2 = \{\lambda^{(3)}\}$ -(because $\lambda^{(1)}$ and $\lambda^{(2)}$ generate the same price distribution). +(because $\lambda^{(1)}$ and $\lambda^{(2)}$ generate the same price +distribution). ```{code-cell} ipython3 --- @@ -799,24 +1106,19 @@ mystnb: caption: learning with non-identification name: fig-nonidentification --- -def simulate_learning_3struct(T, h0_vec, p_bar_vec, sigma_p, true_idx, n_paths, seed=0): - """ - Bayesian learning with 3 structures, 2 reduced forms. 
- h0_vec : length-3 array of initial prior weights on each structure - p_bar_vec: length-3 array of price means for each structure - (structures 0 and 1 share the same reduced form if p_bar_vec[0]==p_bar_vec[1]) - true_idx: index (0,1,2) of the true structure - Returns : array (n_paths, T+1, 3) posterior weights on each structure - """ +def simulate_learning_3struct( + T, h0_vec, p_bar_vec, σ_p, true_idx, n_paths, seed=0 +): + """Simulate learning with three structures and two reduced forms.""" rng = np.random.default_rng(seed) h_paths = np.zeros((n_paths, T + 1, 3)) h_paths[:, 0, :] = h0_vec for path in range(n_paths): h = np.array(h0_vec, dtype=float) - prices = rng.normal(p_bar_vec[true_idx], sigma_p, size=T) + prices = rng.normal(p_bar_vec[true_idx], σ_p, size=T) for t, p in enumerate(prices): - likelihoods = norm.pdf(p, loc=p_bar_vec, scale=sigma_p) + likelihoods = norm.pdf(p, loc=p_bar_vec, scale=σ_p) h = h * likelihoods h /= h.sum() h_paths[path, t + 1, :] = h @@ -824,20 +1126,24 @@ def simulate_learning_3struct(T, h0_vec, p_bar_vec, sigma_p, true_idx, n_paths, return h_paths -# Structures 0 and 1 have the same reduced form (same price mean) +# Structures 0 and 1 share the same reduced form. p_bar_vec = np.array([2.0, 2.0, 1.2]) -h0_vec = np.array([1/3, 1/3, 1/3]) -sigma_p = 0.4 -T = 400 -true_idx = 0 # True structure is 0 (indistinguishable from 1) +h0_vec = np.array([1 / 3, 1 / 3, 1 / 3]) +σ_p = 0.4 +T = 400 +true_idx = 0 # Structure 0 is observationally equivalent to 1. 
-h_paths_3 = simulate_learning_3struct(T, h0_vec, p_bar_vec, sigma_p, true_idx, n_paths=30) +h_paths_3 = simulate_learning_3struct( + T, h0_vec, p_bar_vec, σ_p, true_idx, n_paths=30 +) t_grid = np.arange(T + 1) fig, axes = plt.subplots(1, 3, figsize=(13, 4), sharey=True) -struct_labels = [r"$\lambda^{(1)}$", - r"$\lambda^{(2)}$ (same reduced form as $\lambda^{(1)}$)", - r"$\lambda^{(3)}$"] +struct_labels = [ + r"$\lambda^{(1)}$", + r"$\lambda^{(2)}$ (same reduced form as $\lambda^{(1)}$)", + r"$\lambda^{(3)}$", +] for k, (ax, label) in enumerate(zip(axes, struct_labels)): for path in h_paths_3: @@ -845,7 +1151,6 @@ for k, (ax, label) in enumerate(zip(axes, struct_labels)): ax.plot(t_grid, np.median(h_paths_3[:, :, k], axis=0), color="navy", lw=2, label=f"median weight on {label}") ax.set_xlabel("period $t$", fontsize=11) - ax.grid(alpha=0.3) ax.legend(fontsize=9) axes[0].set_ylabel("posterior weight", fontsize=11) @@ -853,20 +1158,25 @@ plt.tight_layout() plt.show() ``` -The observer correctly rules out $\lambda^{(3)}$ (the wrong reduced form) with probability -one, but cannot distinguish $\lambda^{(1)}$ from $\lambda^{(2)}$ because they generate an +The observer correctly rules out $\lambda^{(3)}$ (the wrong reduced form) with +probability +one, but cannot distinguish $\lambda^{(1)}$ from $\lambda^{(2)}$ because they +generate an identical price distribution. Nevertheless, the observer's **price expectations** converge -to rational expectations because both structures imply the same reduced form $\bar\mu$. +to rational expectations because both structures imply the same reduced form +$\bar\mu$. 
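Before leaving this section, a compact version of the same experiment can assert both conclusions at once (the Gaussian reduced forms are hypothetical stand-ins, as above): posterior mass concentrates on the true reduced-form class, while the split inside the class never departs from the prior.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)

means = np.array([2.0, 2.0, 1.2])   # structures 1 and 2 share a reduced form
h = np.array([1/3, 1/3, 1/3])       # uniform prior

for p in rng.normal(2.0, 0.4, size=400):   # data generated by structure 1
    h = h * norm.pdf(p, loc=means, scale=0.4)
    h = h / h.sum()

print(f"class weight h1 + h2 = {h[0] + h[1]:.6f}")       # concentrates near 1
print(f"within-class odds h1 / h2 = {h[0] / h[1]:.6f}")  # stays at prior value
```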
+ ## Exercises ```{exercise} :label: km_ex1 -**Invertibility with CARA preferences.** Consider a two-state economy ($a_1 = 2$, -$a_2 = 0.5$) where the informed agent has **CARA** (constant absolute risk aversion) +Consider a two-state economy ($a_1 = 2$, +$a_2 = 0.5$) where the informed agent has **CARA** (constant absolute risk +aversion) preferences over portfolio wealth: $$ @@ -879,17 +1189,24 @@ $$ q\,u(W_1) + (1-q)\,u(W_2), \quad W_s = w - p\,x_1 + a_s\,x_1, $$ -subject to the budget constraint $p\,x_1 + x_2 = w$. Total supply of good 1 is $X_1 = 1$. +subject to the budget constraint $p\,x_1 + x_2 = w$. + +Total supply of good 1 is $X_1 = 1$. 1. Derive the first-order condition for the informed agent's optimal $x_1$. -1. Use the market-clearing condition $x_1 = 1$ (the informed agent absorbs the entire -supply) to obtain an implicit equation for the equilibrium price $p^*(q)$. Solve it +1. Use the market-clearing condition $x_1 = 1$ (the informed agent absorbs the + entire +supply) to obtain an implicit equation for the equilibrium price $p^*(q)$. +Solve it numerically for $q \in (0,1)$ and several values of $\gamma$. -1. Show numerically that $p^*(q)$ is monotone in $q$, so the invertibility condition -holds. Explain intuitively why CARA preferences always lead to an invertible price map -(the elasticity of substitution of portfolio utility is $\sigma = \infty$). +1. Show numerically that $p^*(q)$ is monotone in $q$, so the invertibility + condition +holds in this example. Explain why this is economically similar to the $\sigma > +1$ case in +{prf:ref}`ime_theorem_invertibility_conditions`, but not a direct application of +that theorem. ``` ```{solution-start} km_ex1 @@ -898,7 +1215,9 @@ holds. Explain intuitively why CARA preferences always lead to an invertible pr **1. First-order condition.** -Define $W_s = w + (a_s - p)\,x_1$ for $s=1,2$. The FOC is +Define $W_s = w + (a_s - p)\,x_1$ for $s=1,2$. 
+ +The FOC is $$ q\,(a_1 - p)\,\gamma\, e^{-\gamma W_1} @@ -915,6 +1234,7 @@ $$ **2. Market-clearing equilibrium price.** Setting $x_1 = 1$ (all supply absorbed by informed agent), the equation becomes + a scalar root-finding problem in $p$: $$ @@ -925,39 +1245,46 @@ $$ ```{code-cell} ipython3 from scipy.optimize import brentq -def F_cara(p, q, a1, a2, gamma, x1=1.0): - """Residual of CARA market-clearing condition.""" - return (q * (a1-p) * np.exp(-gamma*(a1-p)*x1) - - (1-q) * (p-a2) * np.exp(gamma*(p-a2)*x1)) +def F_cara(p, q, a1, a2, γ, x1=1.0): + """Residual for the CARA equilibrium condition.""" + return (q * (a1 - p) * np.exp(-γ * (a1 - p) * x1) + - (1 - q) * (p - a2) * np.exp(γ * (p - a2) * x1)) -a1, a2 = 2.0, 0.5 -q_grid = np.linspace(0.05, 0.95, 200) -gammas = [0.5, 1.0, 2.0, 5.0] -colors_sol = plt.cm.plasma(np.linspace(0.15, 0.85, len(gammas))) +a1, a2 = 2.0, 0.5 +q_grid = np.linspace(0.05, 0.95, 200) +γ_values = [0.5, 1.0, 2.0, 5.0] +colors_sol = plt.cm.plasma(np.linspace(0.15, 0.85, len(γ_values))) fig, ax = plt.subplots(figsize=(8, 5)) -for gamma, color in zip(gammas, colors_sol): +for γ, color in zip(γ_values, colors_sol): p_eq = [brentq(F_cara, a2, a1, - args=(q, a1, a2, gamma)) + args=(q, a1, a2, γ)) for q in q_grid] ax.plot(q_grid, p_eq, lw=2, color=color, - label=rf"$\gamma = {gamma}$") + label=rf"$\gamma = {γ}$") ax.set_xlabel(r"posterior $q = \Pr(\bar a = a_1)$", fontsize=12) ax.set_ylabel("equilibrium price $p^*(q)$", fontsize=12) ax.set_title("CARA preferences: equilibrium prices", fontsize=12) ax.legend(fontsize=10) -ax.grid(alpha=0.3) plt.tight_layout() plt.show() ``` **3. Invertibility for CARA.** -The price is strictly increasing in $q$ for every $\gamma > 0$. Intuitively, portfolio -utility $u(x_2 + \bar{a}\,x_1)$ treats the two goods as **perfect substitutes** in -creating wealth, giving an elasticity of substitution $\sigma = \infty \neq 1$. 
By -{prf:ref}`ime_theorem_invertibility_conditions`, the price map is therefore always invertible. +The price is strictly increasing in $q$ for every $\gamma > 0$, because +portfolio utility $u(x_2 + \bar{a}\,x_1)$ treats the two goods as **perfect +substitutes** in creating wealth, so a higher posterior probability of the +high-return state raises the marginal value of the risky asset and pushes the +equilibrium price upward. + +This behavior is similar in spirit to the $\sigma > 1$ case in +{prf:ref}`ime_theorem_invertibility_conditions`, but it is *not* a direct +consequence of that theorem because CARA utility over wealth is not homothetic +in the two-good representation used in the theorem. + +Here monotonicity is verified directly from the specific first-order condition. ```{solution-end} ``` @@ -966,21 +1293,27 @@ creating wealth, giving an elasticity of substitution $\sigma = \infty \neq 1$. :label: km_ex2 In the Bayesian learning simulation, the speed of -convergence to rational expectations is determined by the **Kullback-Leibler divergence** +convergence to rational expectations is determined by the **Kullback-Leibler +divergence** between the two reduced forms. -The KL divergence from $g(\cdot \mid \mu_2)$ to $g(\cdot \mid \mu_1)$, for two normal -distributions with means $\bar{p}_1$ and $\bar{p}_2$ and common variance $\sigma_p^2$, is +The KL divergence from $g(\cdot \mid \mu_2)$ to $g(\cdot \mid \mu_1)$, for two +normal +distributions with means $\bar{p}_1$ and $\bar{p}_2$ and common variance +$\sigma_p^2$, is $$ D_{KL}(\mu_1 \| \mu_2) = \frac{(\bar{p}_1 - \bar{p}_2)^2}{2\sigma_p^2}. $$ -1. For the "easy" case ($\bar{p}_1 = 2.0$, $\bar{p}_2 = 1.2$) and the "hard" case +1. For the "easy" case ($\bar{p}_1 = 2.0$, $\bar{p}_2 = 1.2$) and the "hard" + case ($\bar{p}_1 = 2.0$, $\bar{p}_2 = 1.8$), compute $D_{KL}$ for $\sigma_p = 0.4$. -1. Re-run the simulations from the lecture for both cases with $n=100$ paths. 
For each
-path compute the first period $T_{0.99}$ at which $h_t \geq 0.99$. Plot histograms of
+1. Re-run the simulations from the lecture for both cases with $n=100$ paths.
+   For each path compute the first period $T_{0.99}$ at which $h_t \geq 0.99$.
+   Plot histograms of $T_{0.99}$ for both cases.
 
 1. How does the median $T_{0.99}$ scale with $D_{KL}$? Verify numerically that
@@ -992,25 +1325,25 @@
 roughly $T_{0.99} \approx C / D_{KL}$ for some constant $C$.
 ```
 
 ```{code-cell} ipython3
-sigma_p = 0.4
+σ_p = 0.4
 
-def kl_normal(p1, p2, sigma):
-    """KL divergence between N(p1,sigma^2) and N(p2,sigma^2)."""
-    return (p1 - p2)**2 / (2 * sigma**2)
+def kl_normal(p1, p2, σ):
+    """Return the KL divergence between N(p1, σ^2) and N(p2, σ^2)."""
+    return (p1 - p2)**2 / (2 * σ**2)
 
 cases = [("Easy", 2.0, 1.2), ("Hard", 2.0, 1.8)]
 for name, p1, p2 in cases:
-    kl = kl_normal(p1, p2, sigma_p)
+    kl = kl_normal(p1, p2, σ_p)
     print(f"{name} case: D_KL = {kl:.4f}")
 
 n_paths = 100
 fig, axes = plt.subplots(1, 2, figsize=(11, 4))
 
 for ax, (name, p1, p2) in zip(axes, cases):
-    kl = kl_normal(p1, p2, sigma_p)
-    paths = simulate_bayesian_learning(p1, p2, sigma_p, T=2000,
+    kl = kl_normal(p1, p2, σ_p)
+    paths = simulate_bayesian_learning(p1, p2, σ_p, T=2000,
                                        h0=0.5, n_paths=n_paths, seed=42)
-    # First period where posterior >= 0.99
+    # First period with posterior >= 0.99.
     T99 = []
     for path in paths:
         idx = np.where(path >= 0.99)[0]
@@ -1028,14 +1361,15 @@
     ax.set_xlabel(r"$T_{0.99}$", fontsize=12)
     ax.set_ylabel("count", fontsize=11)
     ax.legend(fontsize=10)
-    ax.grid(alpha=0.3)
 
 plt.tight_layout()
 plt.show()
 ```
 
-The median $T_{0.99}$ scales as approximately $C/D_{KL}$, confirming that learning is
-faster when the two reduced forms are more easily distinguished (large $D_{KL}$).
+The median $T_{0.99}$ scales as approximately $C/D_{KL}$, confirming that
+learning is faster when the two reduced forms are more easily distinguished
+(large $D_{KL}$).
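A rough analytical sanity check on the constant $C$ (a back-of-the-envelope sketch, not part of the simulation above): each observation moves the posterior log odds by $D_{KL}$ on average, so the first passage to $h_t \geq 0.99$ starting from $h_0 = 0.5$ should take roughly $\ln(0.99/0.01)/D_{KL}$ periods, suggesting $C \approx \ln 99 \approx 4.6$.

```python
import numpy as np

# Sketch: posterior log odds start at 0 (h0 = 0.5) and drift upward by
# D_KL per period on average, so the first-passage time to h = 0.99 is
# roughly log(0.99/0.01) / D_KL, i.e. C is approximately log(99).
sigma_p = 0.4
C = np.log(0.99 / 0.01)

for name, p1, p2 in [("Easy", 2.0, 1.2), ("Hard", 2.0, 1.8)]:
    d_kl = (p1 - p2)**2 / (2 * sigma_p**2)
    print(f"{name}: D_KL = {d_kl:.3f}, predicted T_99 ≈ {C / d_kl:.1f}")
```

These predictions line up with the order of magnitude of the medians in the histograms.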
```{solution-end} ``` @@ -1043,13 +1377,17 @@ faster when the two reduced forms are more easily distinguished (large $D_{KL}$) ```{exercise} :label: km_ex3 -**Failure of invertibility—counterexample for $S > 2$.** The paper constructs a -counterexample showing that for $S = 3$ states, even if the elasticity of substitution -of $u^1$ is everywhere greater than one, $p(PR^1)$ need **not** be invertible. +The paper constructs a +counterexample showing that for $S = 3$ states, even if the elasticity of +substitution +of $u^1$ is everywhere greater than one, the price map need **not** be +invertible. Consider the marginal rate of substitution for the portfolio utility $u^1(a_s x_1 + x_2)$ (infinite elasticity of substitution) and three states -$a_1 > a_2 > a_3$. The MRS is +$a_1 > a_2 > a_3$. + +The MRS is $$ m(\mu) @@ -1060,16 +1398,21 @@ $$ where $\beta_s = u^{1\prime}(a_s x_1 + x_2)$. 1. For the parameterization used by {cite:t}`kihlstrom_mirman1975`—let -$\mu(a_3) = q$, $\mu(a_2) = r$, $\mu(a_1) = 1-r-q$—write $m$ as a function of $(q, r)$. +$\mu(a_3) = q$, $\mu(a_2) = r$, $\mu(a_1) = 1-r-q$—write $m$ as a function of +$(q, r)$. Compute $\partial m / \partial r$ and show that its sign depends on $\beta_1\beta_2(a_1-a_2)$ and $\beta_2\beta_3(a_2-a_3)$. -1. Choose $a_1 = 3$, $a_2 = 2$, $a_3 = 0.5$ and $u'(c) = c^{-\gamma}$ (CRRA with risk -aversion $\gamma$). Fix $x_1 = 1$, $x_2 = 0.5$. For $\gamma = 2$, verify numerically -that $\partial m/\partial r$ changes sign (i.e., $m$ is *not* globally monotone in $r$), +1. Choose $a_1 = 3$, $a_2 = 2$, $a_3 = 0.5$ and $u'(c) = c^{-\gamma}$ (CRRA with + risk +aversion $\gamma$). Fix $x_1 = 1$, $x_2 = 0.5$. For $\gamma = 2$, verify +numerically +that $\partial m/\partial r$ changes sign (i.e., $m$ is *not* globally monotone +in $r$), giving a counterexample to invertibility. -1. Explain why this non-monotonicity does *not* arise in the two-state case $S = 2$. +1. 
Explain why this non-monotonicity does *not* arise in the two-state case $S = 2$.
 ```
 
 ```{solution-start} km_ex3
@@ -1087,7 +1430,8 @@
 Differentiating using the quotient rule (denominator $D$):
 
 $$
 \frac{\partial m}{\partial r}
-= \frac{(a_2\beta_2 - a_1\beta_1)D - (a_1\beta_1(1-r-q)+a_2\beta_2 r+a_3\beta_3 q)(\beta_2-\beta_1)}{D^2}.
+= \frac{(a_2\beta_2 - a_1\beta_1)D
+- (a_1\beta_1(1-r-q)+a_2\beta_2 r+a_3\beta_3 q)(\beta_2-\beta_1)}{D^2}.
 $$
 
 After simplification this reduces to a signed combination of
@@ -1097,40 +1441,41 @@
 whose sign is parameter-dependent.
 
 **2. Numerical verification.**
 
 ```{code-cell} ipython3
-def mrs_3state(q, r, a1, a2, a3, x1, x2, gamma):
-    """MRS with mu(a3)=q, mu(a2)=r, mu(a1)=1-r-q, portfolio utility u'(c)=c^{-gamma}."""
-    mu1, mu2, mu3 = 1 - r - q, r, q
-    beta1 = (a1 * x1 + x2)**(-gamma)
-    beta2 = (a2 * x1 + x2)**(-gamma)
-    beta3 = (a3 * x1 + x2)**(-gamma)
-    num = a1*beta1*mu1 + a2*beta2*mu2 + a3*beta3*mu3
-    den = beta1*mu1 + beta2*mu2 + beta3*mu3
+def mrs_3state(q, r, a1, a2, a3, x1, x2, γ):
+    """MRS with μ(a3)=q, μ(a2)=r, μ(a1)=1-r-q and u'(c)=c**(-γ)."""
+    μ1, μ2, μ3 = 1 - r - q, r, q
+    β1 = (a1 * x1 + x2)**(-γ)
+    β2 = (a2 * x1 + x2)**(-γ)
+    β3 = (a3 * x1 + x2)**(-γ)
+    num = a1 * β1 * μ1 + a2 * β2 * μ2 + a3 * β3 * μ3
+    den = β1 * μ1 + β2 * μ2 + β3 * μ3
     return num / den
 
-a1, a2, a3 = 3.0, 2.0, 0.5
-x1, x2 = 1.0, 0.5
-gamma = 2.0
-q_fix = 0.1  # fix q, vary r
-r_grid = np.linspace(0.05, 0.80, 200)
+a1, a2, a3 = 3.0, 2.0, 0.5
+x1, x2 = 1.0, 0.5
+γ = 2.0
+q_fix = 0.1    # fix q, vary r
+r_grid = np.linspace(0.05, 0.80, 200)
 
-# Filter valid (q+r <= 1)
+# Valid region: q + r <= 1. 
r_valid = r_grid[r_grid + q_fix <= 0.95] -m_vals = [mrs_3state(q_fix, r, a1, a2, a3, x1, x2, gamma) for r in r_valid] -dm_dr = np.gradient(m_vals, r_valid) +m_vals = [mrs_3state(q_fix, r, a1, a2, a3, x1, x2, γ) for r in r_valid] +dm_dr = np.gradient(m_vals, r_valid) fig, axes = plt.subplots(1, 2, figsize=(11, 4)) axes[0].plot(r_valid, m_vals, color="steelblue", lw=2) axes[0].set_xlabel(r"$r = \mu(a_2)$", fontsize=12) -axes[0].set_ylabel(r"$m(q, r)$ — MRS", fontsize=12) -axes[0].set_title(fr"MRS is non-monotone in $r$ (CRRA $\gamma={gamma}$)", fontsize=12) -axes[0].grid(alpha=0.3) +axes[0].set_ylabel("MRS m(q, r)", fontsize=12) +axes[0].set_title(f"MRS is non-monotone in r (CRRA gamma={γ})", fontsize=12) axes[1].plot(r_valid, dm_dr, color="crimson", lw=2) axes[1].axhline(0, color="black", lw=1, ls="--") axes[1].set_xlabel(r"$r = \mu(a_2)$", fontsize=12) axes[1].set_ylabel(r"$\partial m / \partial r$", fontsize=12) -axes[1].set_title("Derivative changes sign — non-invertibility for $S=3$", fontsize=12) -axes[1].grid(alpha=0.3) +axes[1].set_title( + "Derivative changes sign - non-invertibility for $S=3$", + fontsize=12, +) plt.tight_layout() plt.show() @@ -1139,15 +1484,19 @@ print("Sign changes in dm/dr:", np.sum(np.diff(np.sign(dm_dr)) != 0)) ``` -The derivative $\partial m / \partial r$ changes sign, confirming that the MRS (and hence +The derivative $\partial m / \partial r$ changes sign, confirming that the MRS +(and hence the equilibrium price) is **not** monotone in $r$ for $S = 3$. -**3.** In the two-state case $S = 2$, the prior is parameterized by a single scalar $q$ -and the MRS is a function of $q$ alone. One can show directly that $\partial m / \partial q$ -has a definite sign determined entirely by whether $a_1 > a_2$ and whether -$\sigma > 1$ or $\sigma < 1$ hold—there is no room for sign changes. With three states, -the two-dimensional prior $(q, r)$ allows richer interactions between $\beta_s$ values that -can reverse the sign of the derivative. 
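As a companion check before turning to part 3, here is a small sketch (the helper `mrs_2state` is ours, written for this check with the same illustrative CRRA parameters): with only two states, the portfolio MRS is a weighted mean of $a_1$ and $a_2$, so it moves monotonically from $a_1$ toward $a_2$ as $q = \mu(a_2)$ rises.

```python
import numpy as np

# Two-state version of the MRS: a weighted mean of a1 and a2 with
# weights (1-q) * u'(a1 x1 + x2) and q * u'(a2 x1 + x2), so it is
# monotone in q for any CRRA coefficient gamma.
def mrs_2state(q, a1, a2, x1, x2, gamma):
    w1 = (1 - q) * (a1 * x1 + x2)**(-gamma)
    w2 = q * (a2 * x1 + x2)**(-gamma)
    return (a1 * w1 + a2 * w2) / (w1 + w2)

q_grid = np.linspace(0.01, 0.99, 400)
m_vals = mrs_2state(q_grid, a1=3.0, a2=2.0, x1=1.0, x2=0.5, gamma=2.0)

# the differences never change sign: m falls from near a1 toward a2
print("monotone decreasing:", bool(np.all(np.diff(m_vals) < 0)))
```

No sign change appears here, in contrast with the three-state computation above.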
+**3.** In the two-state case $S = 2$, the prior is parameterized by a single
+scalar $q$ and the MRS is a function of $q$ alone.
+
+One can show directly that $\partial m / \partial q$ has a definite sign
+determined entirely by whether $a_1 > a_2$ and whether $\sigma > 1$ or
+$\sigma < 1$ holds, so there is no room for sign changes.
+
+With three states, the two-dimensional prior $(q, r)$ allows richer interactions
+between $\beta_s$ values that can reverse the sign of the derivative.
 
 ```{solution-end}
 ```
 
@@ -1158,21 +1507,33 @@ can reverse the sign of the derivative.
 
 {prf:ref}`ime_theorem_bayesian_convergence` assumes the true distribution
 $g(\cdot \mid \bar\lambda)$ is in the support of the prior (i.e.,
-$h(\bar\lambda) > 0$). Investigate what happens when the true model is **not** in the
+$h(\bar\lambda) > 0$).
+
+Investigate what happens when the true model is **not** in the
 prior support.
 
-1. Simulate $T = 1,000$ periods of prices from $N(2.0, 0.4^2)$ but use a prior that
-   places equal weight on two *wrong* models: $N(1.5, 0.4^2)$ and $N(2.5, 0.4^2)$.
+1. Simulate $T = 1,000$ periods of prices from $N(2.0, 0.4^2)$ but use a prior
+   that places equal weight on two *wrong* models: $N(1.5, 0.4^2)$ and
+   $N(2.3, 0.4^2)$.
 
    - Plot the posterior weight on each model over time.
 
-2. Show that the **predictive** (mixture) price distribution converges to the *closest*
-   model in KL divergence terms—which by symmetry is the equal mixture, with mean 2.0.
+2. Show that the **predictive** (mixture) price distribution converges to the
+   *closest* model in KL divergence terms.
 
-   - Verify this numerically by computing the predictive mean over time.
+   - Compute the KL divergence from the true model to each wrong model.
+   - Verify numerically that the posterior concentrates on the closer wrong
+     model and that the predictive mean converges to that model's mean.
 
3. 
Relate this finding to the Bayesian consistency literature: when is the limit - distribution a good approximation to the true distribution even under misspecification? + distribution a good approximation to the true distribution even under + misspecification? + Why is the symmetric pair $N(1.5, 0.4^2)$ and $N(2.5, 0.4^2)$ a + knife-edge case rather + than a setting with a deterministic 50-50 posterior limit? ``` ```{solution-start} km_ex4 @@ -1180,12 +1541,10 @@ prior support. ``` ```{code-cell} ipython3 -def simulate_misspecified(T, p_bar_true, p_bar_wrong, sigma_p, h0, n_paths, seed=0): - """ - Misspecified Bayesian learning: two wrong models with means p_bar_wrong[0,1]. - True model has mean p_bar_true (not in prior support). - Returns (n_paths, T+1, 2) array of posterior weights. - """ +def simulate_misspecified( + T, p_bar_true, p_bar_wrong, sigma_p, h0, n_paths, seed=0 +): + """Simulate learning under a misspecified two-model prior.""" rng = np.random.default_rng(seed) h_paths = np.zeros((n_paths, T + 1, 2)) h_paths[:, 0, :] = h0 @@ -1202,52 +1561,116 @@ def simulate_misspecified(T, p_bar_true, p_bar_wrong, sigma_p, h0, n_paths, seed return h_paths -T = 1000 -p_true = 2.0 -p_wrong = np.array([1.5, 2.5]) -sigma_p = 0.4 -h0 = np.array([0.5, 0.5]) -n_paths = 30 +def predictive_density(weights, means, sigma_p, p_grid): + """Return the predictive density under the current posterior weights.""" + density = np.zeros_like(p_grid) + for weight, mean in zip(weights, means): + density += weight * norm.pdf(p_grid, loc=mean, scale=sigma_p) + return density + + +T = 1000 +p_true = 2.0 +p_wrong = np.array([1.5, 2.3]) +sigma_p = 0.4 +h0 = np.array([0.5, 0.5]) +n_paths = 30 h_misspec = simulate_misspecified(T, p_true, p_wrong, sigma_p, h0, n_paths) +kl_vals = (p_true - p_wrong)**2 / (2 * sigma_p**2) +for mean, kl in zip(p_wrong, kl_vals): + print(f"KL(true || N({mean:.1f}, sigma^2)) = {kl:.4f}") + t_grid = np.arange(T + 1) fig, axes = plt.subplots(1, 2, figsize=(12, 4)) -for 
ax, k, label in zip(axes, [0, 1], [r"$N(1.5, \sigma^2)$", r"$N(2.5, \sigma^2)$"]): +labels = [r"$N(1.5, \sigma^2)$", r"$N(2.3, \sigma^2)$"] +for ax, k, label in zip(axes, [0, 1], labels): for path in h_misspec: ax.plot(t_grid, path[:, k], alpha=0.2, lw=0.8, color="steelblue") ax.plot(t_grid, np.median(h_misspec[:, :, k], axis=0), color="navy", lw=2, label="median") - ax.axhline(0.5, color="crimson", lw=1.5, ls="--", label="0.5 (symmetric limit)") ax.set_title(f"Posterior weight on {label}", fontsize=11) ax.set_xlabel("period $t$", fontsize=11) ax.set_ylabel("posterior weight", fontsize=11) ax.legend(fontsize=9) - ax.grid(alpha=0.3) plt.tight_layout() plt.show() -# Predictive mean = h[:,0]*1.5 + h[:,1]*2.5 +# Predictive density and mean along the median posterior path. +median_path = np.median(h_misspec, axis=0) +p_grid = np.linspace(0.0, 3.5, 300) +closer_idx = np.argmin(kl_vals) + +fig, ax = plt.subplots(figsize=(8, 4)) +colors = plt.cm.Blues(np.linspace(0.3, 1.0, 4)) +for t_snap, color in zip([0, 10, 100, T], colors): + dens = predictive_density(median_path[t_snap], p_wrong, sigma_p, p_grid) + ax.plot(p_grid, dens, color=color, lw=2, label=f"t = {t_snap}") + +ax.plot( + p_grid, + norm.pdf(p_grid, loc=p_wrong[closer_idx], scale=sigma_p), + "k--", + lw=2, + label="KL-best wrong model", +) +ax.set_xlabel("price $p$", fontsize=11) +ax.set_ylabel("density", fontsize=11) +ax.legend(fontsize=9) +plt.tight_layout() +plt.show() + pred_mean = np.median( h_misspec[:, :, 0] * p_wrong[0] + h_misspec[:, :, 1] * p_wrong[1], axis=0 ) print(f"True mean: {p_true}") print(f"Predictive mean at T={T}: {pred_mean[-1]:.4f}") -print("(Symmetry implies equal weight on 1.5 and 2.5 → predictive mean = 2.0)") +print(f"Closer misspecified mean: {p_wrong[np.argmin(kl_vals)]:.1f}") ``` -By symmetry, the two wrong models are equidistant from the true distribution in KL -divergence. 
+
+Here
+
+$$
+D_{KL}\bigl(N(2.0, 0.4^2)\,\|\,N(2.3, 0.4^2)\bigr)
+<
+D_{KL}\bigl(N(2.0, 0.4^2)\,\|\,N(1.5, 0.4^2)\bigr),
+$$
 
-The posterior therefore converges to the 50-50 mixture, and the predictive mean
-converges to $0.5 \times 1.5 + 0.5 \times 2.5 = 2.0$—coinciding with the true mean
-despite misspecification.
+so the model with mean $2.3$ is the unique KL-best approximation among the two
+wrong models, and in the simulation posterior weight concentrates on that model
+while the predictive mean converges to $2.3$, not to the true mean $2.0$.
 
 This is an instance of the general result that under
-misspecification, Bayesian posteriors converge to the distribution in the model class that
+misspecification, Bayesian posteriors converge to the distribution in the
+model class that
 minimizes KL divergence from the model actually generating the data.
 
+The connection is that posterior odds are cumulative likelihood ratios.
+
+If we compare the two wrong Gaussian models $f$ and $g$ via the likelihood
+ratio process $L_t = \prod_{s=1}^{t} f(p_s)/g(p_s)$, then under the true
+distribution $h$ the average log likelihood ratio satisfies
+
+$$
+\frac{1}{t} E_h[\log L_t] = K(h,g) - K(h,f).
+$$
+
+So if $f$ is KL-closer to $h$ than $g$ is, $\log L_t$ has positive drift and
+posterior odds tilt toward $f$.
+
+That is exactly the mechanism emphasized in
+{doc}`Likelihood Ratio Processes <likelihood_ratio_process>`.
+
+The lecture {doc}`likelihood_bayes` gives the Bayesian version of the same
+argument by showing how the posterior is a monotone transform of the likelihood
+ratio process.
+
+The symmetric pair $N(1.5, 0.4^2)$ and $N(2.5, 0.4^2)$ is different because both
+wrong models are equally far from the truth in KL terms, so there is no unique
+pseudo-true model.
+
+In that knife-edge case $\log L_t$ has zero drift and behaves like a random
+walk, so the posterior keeps fluctuating and the symmetry does **not** imply a
+deterministic 50-50 posterior limit.
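A quick Monte Carlo sketch of the drift calculation (the variable names below are ours; the three normal models match the simulation: true $h = N(2.0, 0.4^2)$, wrong models $f = N(2.3, 0.4^2)$ and $g = N(1.5, 0.4^2)$):

```python
import numpy as np

# Estimate the per-period drift E_h[log f(p) - log g(p)] by Monte Carlo
# and compare with the theoretical value K(h, g) - K(h, f).
rng = np.random.default_rng(0)
sigma = 0.4
p = rng.normal(2.0, sigma, size=200_000)     # draws from the true model h

# for equal-variance normals, log f(p) - log g(p) is a quadratic form
log_lr = ((p - 1.5)**2 - (p - 2.3)**2) / (2 * sigma**2)
drift_mc = log_lr.mean()

kl_hg = (2.0 - 1.5)**2 / (2 * sigma**2)      # K(h, g)
kl_hf = (2.0 - 2.3)**2 / (2 * sigma**2)      # K(h, f)
print(f"Monte Carlo drift: {drift_mc:.3f}, theory: {kl_hg - kl_hf:.3f}")
```

The positive drift of about $0.5$ per period is what tilts posterior odds toward the model with mean $2.3$.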
+ ```{solution-end} ``` diff --git a/lectures/multivariate_normal.md b/lectures/multivariate_normal.md index 2b0d292df..55c513e3a 100644 --- a/lectures/multivariate_normal.md +++ b/lectures/multivariate_normal.md @@ -326,13 +326,13 @@ Let's compute $a_1, a_2, b_1, b_2$. ```{code-cell} python3 -beta = multi_normal.βs +β = multi_normal.βs -a1 = μ[0] - beta[0]*μ[1] -b1 = beta[0] +a1 = μ[0] - β[0]*μ[1] +b1 = β[0] -a2 = μ[1] - beta[1]*μ[0] -b2 = beta[1] +a2 = μ[1] - β[1]*μ[0] +b2 = β[1] ``` Let's print out the intercepts and slopes. @@ -2339,18 +2339,15 @@ the retained $z_1$ values. ```{code-cell} python3 import numpy as np -import statsmodels.api as sm μ = np.array([.5, 1.]) Σ = np.array([[1., .5], [.5, 1.]]) -# (a) analytical conditional distribution mn = MultivariateNormal(μ, Σ) mn.partition(1) μ1_hat, Σ11_hat = mn.cond_dist(0, np.array([2.])) print(f"Analytical μ̂₁ = {μ1_hat[0]:.4f}, Σ̂₁₁ = {Σ11_hat[0,0]:.4f}") -# (b) simulation n = 1_000_000 data = np.random.multivariate_normal(μ, Σ, size=n) z1_all, z2_all = data[:, 0], data[:, 1] @@ -2396,12 +2393,12 @@ so $b_1 b_2 = \rho^2$. 
```{code-cell} python3 import numpy as np -for rho in [0.2, 0.5, 0.9]: - Σ = np.array([[1., rho], [rho, 1.]]) +for ρ in [0.2, 0.5, 0.9]: + Σ = np.array([[1., ρ], [ρ, 1.]]) mn = MultivariateNormal(np.zeros(2), Σ) mn.partition(1) product = float(mn.βs[0]) * float(mn.βs[1]) - print(f"ρ={rho:.1f}: b1*b2 = {product:.4f}, ρ² = {rho**2:.4f}, match: {np.isclose(product, rho**2)}") + print(f"ρ={ρ:.1f}: b1*b2 = {product:.4f}, ρ² = {ρ**2:.4f}, match: {np.isclose(product, ρ**2)}") ``` ```{solution-end} @@ -2442,7 +2439,7 @@ for σy_val in [1., 5., 10., 20., 50.]: μ_i, Σ_i, _ = construct_moments_IQ(i, μθ_val, σθ_val, σy_val) mn_i = MultivariateNormal(μ_i, Σ_i) mn_i.partition(i) - _, Σθ_i = mn_i.cond_dist(1, np.zeros(i)) # conditioning value doesn't affect variance + _, Σθ_i = mn_i.cond_dist(1, np.zeros(i)) σθ_hat_arr[i - 1] = np.sqrt(Σθ_i[0, 0]) ax.plot(range(1, n_max + 1), σθ_hat_arr, label=f'σy={σy_val:.0f}') @@ -2490,7 +2487,6 @@ import matplotlib.pyplot as plt n_scores = 20 μθ_val, σy_val = 100., 10. -# draw one set of test scores from a fixed "true" θ np.random.seed(42) true_θ = 108. y_obs = true_θ + σy_val * np.random.randn(n_scores) @@ -2564,7 +2560,6 @@ T_ex = 60 x0_hat_ex = np.zeros(2) Σ0_ex = np.eye(2) -# simulate true states and observations np.random.seed(7) x_true = np.zeros((T_ex + 1, 2)) y_seq_ex = np.zeros(T_ex) @@ -2572,10 +2567,8 @@ for t in range(T_ex): x_true[t + 1] = A_ex @ x_true[t] + C_ex[:, 0] * np.random.randn() y_seq_ex[t] = G_ex @ x_true[t] + np.random.randn() -# run filter x_hat_seq, Σ_hat_seq = iterate(x0_hat_ex, Σ0_ex, A_ex, C_ex, G_ex, R_ex, y_seq_ex) -# (b) conditional variances fig, ax = plt.subplots() ax.plot(Σ_hat_seq[:, 0, 0], label=r'$\Sigma_t[0,0]$') ax.plot(Σ_hat_seq[:, 1, 1], label=r'$\Sigma_t[1,1]$') @@ -2584,7 +2577,6 @@ ax.set_ylabel('conditional variance') ax.legend() plt.show() -# (c) filtered state vs. truth vs. 
observations fig, ax = plt.subplots() ax.plot(x_true[1:, 0], label='true $x_t[0]$', alpha=0.7) ax.plot(x_hat_seq[1:, 0], label=r'filtered $\hat{x}_t[0]$', ls='--') @@ -2633,7 +2625,6 @@ k_fa = 2 Λ_fa[:N_fa//2, 0] = 1 Λ_fa[N_fa//2:, 1] = 1 -results_table = {} for σu_val in [0.5, 2.0]: D_fa = np.eye(N_fa) * σu_val ** 2 Σy_fa = Λ_fa @ Λ_fa.T + D_fa @@ -2644,10 +2635,8 @@ for σu_val in [0.5, 2.0]: λ_fa = λ_fa[ind_fa] frac = λ_fa[:2].sum() / λ_fa.sum() - results_table[σu_val] = frac print(f"σu={σu_val}: fraction explained by first 2 PCs = {frac:.4f}") -# (b) comparison using σu=0.5 σu_b = 0.5 D_b = np.eye(N_fa) * σu_b ** 2 Σy_b = Λ_fa @ Λ_fa.T + D_b @@ -2658,11 +2647,9 @@ z_b = np.random.multivariate_normal(μz_b, Σz_b) f_b = z_b[:k_fa] y_b = z_b[k_fa:] -# factor-analytic E[f|y] B_b = Λ_fa.T @ np.linalg.inv(Σy_b) Efy_b = B_b @ y_b -# PCA projection λ_b, P_b = np.linalg.eigh(Σy_b) ind_b = sorted(range(N_fa), key=lambda x: λ_b[x], reverse=True) P_b = P_b[:, ind_b] diff --git a/lectures/prob_matrix.md b/lectures/prob_matrix.md index 8cbb50f40..bc5ac21f4 100644 --- a/lectures/prob_matrix.md +++ b/lectures/prob_matrix.md @@ -66,35 +66,45 @@ We'll briefly define what we mean by a **probability space**, a **probability me For most of this lecture, we sweep these objects into the background ```{note} -Nevertheless, they'll be lurking beneath **induced distributions** of random variables that we'll focus on here. These deeper objects are essential for defining and analysing the concepts of stationarity and ergodicity that underly laws of large numbers. For a relatively -nontechnical presentation of some of these results see this chapter from Lars Peter Hansen and Thomas J. Sargent's online monograph titled "Risk, Uncertainty, and Values":. +Nevertheless, they'll be lurking beneath **induced distributions** of random variables that we'll focus on here. 
+
+These deeper objects are essential for defining and analysing the concepts of stationarity and ergodicity that underlie laws of large numbers.
+
+For a relatively
+nontechnical presentation of some of these results, see this chapter from Lars Peter Hansen and Thomas J. Sargent's online monograph titled [*Risk, Uncertainty, and Values*](https://lphansen.github.io/QuantMFR/book/1_stochastic_processes.html).
 ```
 
-Let $\Omega$ be a set of possible underlying outcomes and let $\omega \in \Omega$ be a particular underlying outcomes.
+Let $\Omega$ be a set of possible underlying outcomes and let $\omega \in \Omega$ be a particular underlying outcome.
 
-Let $\mathcal{G} \subset \Omega$ be a subset of $\Omega$.
+Let $\mathcal{F}$ be a collection of subsets of $\Omega$ that we call **events**.
 
-Let $\mathcal{F}$ be a collection of such subsets $\mathcal{G} \subset \Omega$.
+(Technically, $\mathcal{F}$ is a [$\sigma$-algebra](https://en.wikipedia.org/wiki/Sigma-algebra).)
 
-The pair $\Omega,\mathcal{F}$ forms our **probability space** on which we want to put a probability measure.
+A **probability measure** $\mu$ maps each event $\mathcal{G} \in \mathcal{F}$ into a scalar number $\mu(\mathcal{G})$ between $0$ and $1$, with $\mu(\Omega)=1$.
 
-A **probability measure** $\mu$ maps a set of possible underlying outcomes $\mathcal{G} \in \mathcal{F}$ into a scalar number between $0$ and $1$
+The triple $\Omega,\mathcal{F},\mu$ forms our **probability space**.
 
-- this is the "probability" that $X$ belongs to $A$, denoted by $ \textrm{Prob}\{X\in A\}$.
+A **random variable** $X(\omega)$ is a function of the underlying outcome $\omega \in \Omega$ that assigns a value in some set of possible values.
 
-A **random variable** $X(\omega)$ is a function of the underlying outcome $\omega \in \Omega$.
+If $A$ is a set of possible values of $X$, then the event that $X$ lies in $A$ is
+
+$$
+\mathcal{G} = \{\omega \in \Omega : X(\omega) \in A\}. 
+$$ -The random variable $X(\omega)$ has a **probability distribution** that is induced by the underlying probability measure $\mu$ and the function -$X(\omega)$: +The random variable $X(\omega)$ has a **probability distribution** induced by the probability measure $\mu$: $$ -\textrm{Prob} (X \in A ) = \int_{\mathcal{G}} \mu(\omega) d \omega -$$ (eq:CDFfromdensity) +\textrm{Prob}(X \in A) = \mu(\mathcal{G}). +$$ -where ${\mathcal G}$ is the subset of $\Omega$ for which $X(\omega) \in A$. +If $\mu$ has a density $p(\omega)$, then we can also write + +$$ +\textrm{Prob}(X \in A) = \int_{\mathcal{G}} p(\omega)\, d \omega +$$ (eq:CDFfromdensity) We call this the induced probability distribution of random variable $X$. @@ -124,6 +134,7 @@ To appreciate how statisticians connect probabilities to data, the key is to und - **Law of Large Numbers (LLN)** - **Central Limit Theorem (CLT)** +### A discrete random variable example #### Scalar example @@ -156,7 +167,7 @@ What do "identical" and "independent" mean in IID or iid ("identically and indep $$ \begin{aligned} -\textrm{Prob}\{x_0 = i_0, x_1 = i_1, \dots , x_{N-1} = i_{N-1}\} &= \textrm{Prob}\{x_0 = i_0\} \cdot \dots \cdot \textrm{Prob}\{x_{I-1} = i_{I-1}\}\\ +\textrm{Prob}\{x_0 = i_0, x_1 = i_1, \dots , x_{N-1} = i_{N-1}\} &= \textrm{Prob}\{x_0 = i_0\} \cdot \dots \cdot \textrm{Prob}\{x_{N-1} = i_{N-1}\}\\ &= f_{i_0} f_{i_1} \cdot \dots \cdot f_{i_{N-1}}\\ \end{aligned} $$ @@ -182,13 +193,14 @@ A Central Limit Theorem (CLT) describes a **rate** at which $\tilde {f_i} \to f_ See {doc}`lln_clt` for a detailed treatment of both results. +### Understanding probability: frequentist vs. Bayesian + For "frequentist" statisticians, **anticipated relative frequency** is **all** that a probability distribution means. But for a Bayesian it means something else -- something partly subjective and purely personal. We say "partly" because a Bayesian also pays attention to relative frequencies. 
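A tiny simulation makes the frequentist reading concrete (this is an illustrative sketch; the probability mass function `f` below is made up for the example): empirical relative frequencies from a long IID sample settle down near the assumed distribution.

```python
import numpy as np

# Illustrative sketch with a made-up pmf f: empirical relative
# frequencies from a long IID sample approximate f, an LLN in action.
rng = np.random.default_rng(1)
f = np.array([0.2, 0.5, 0.3])                  # hypothetical pmf
draws = rng.choice(len(f), size=1_000_000, p=f)
f_hat = np.bincount(draws, minlength=len(f)) / draws.size

print("f     :", f)
print("f_hat :", np.round(f_hat, 4))
```

A Bayesian would read the same `f` differently, as a statement of personal beliefs, while still expecting `f_hat` to behave this way.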
- ## Representing probability distributions A probability distribution $\textrm{Prob} (X \in A)$ can be described by its **cumulative distribution function (CDF)** @@ -216,7 +228,7 @@ For a **discrete-valued** random variable * the number of possible values of $X$ is finite or countably infinite * we replace a **density** with a **probability mass function**, a non-negative sequence that sums to one -* we replace integration with summation in the formula like {eq}`eq:CDFfromdensity` that relates a CDF to a probability mass function +* when a density exists, we replace integration with summation in formulas like {eq}`eq:CDFfromdensity` In this lecture, we mostly discuss discrete random variables. @@ -297,7 +309,7 @@ An example of a parametric probability distribution is a **geometric distributi It is described by $$ -f_{i} = \textrm{Prob}\{X=i\} = (1-\lambda)\lambda^{i},\quad \lambda \in [0,1], \quad i = 0, 1, 2, \ldots +f_{i} = \textrm{Prob}\{X=i\} = (1-\lambda)\lambda^{i},\quad \lambda \in [0,1), \quad i = 0, 1, 2, \ldots $$ Evidently, $\sum_{i=0}^{\infty}f_i=1$. @@ -310,7 +322,7 @@ $$ ### Continuous random variable -Let $X$ be a continous random variable that takes values $X \in \tilde{X}\equiv[X_U,X_L]$ whose distributions have parameters $\theta$. +Let $X$ be a continuous random variable that takes values in a set $\tilde{X} \subseteq \mathbb{R}$ and whose distribution has parameters $\theta$. $$ \textrm{Prob}\{X\in A\} = \int_{x\in A} f(x;\theta)\,dx; \quad f(x;\theta)\ge0 @@ -432,7 +444,7 @@ $$ $$ (eq:condprobbayes) ```{note} -Formula {eq}`eq:condprobbayes` is also what a Bayesian calls **Bayes' Law**. A Bayesian statistician regards marginal probability distribution $\textrm{Prob}({X=i}), i = 1, \ldots, J$ as a **prior** distribution that describes his personal subjective beliefs about $X$. +Formula {eq}`eq:condprobbayes` is also what a Bayesian calls **Bayes' Law**. 
A Bayesian statistician regards marginal probability distribution $\textrm{Prob}({X=i}), i = 0, \ldots, I-1$ as a **prior** distribution that describes his personal subjective beliefs about $X$. He then interprets formula {eq}`eq:condprobbayes` as a procedure for constructing a **posterior** distribution that describes how he would revise his subjective beliefs after observing that $Y$ equals $j$. ``` @@ -839,6 +851,8 @@ class discrete_bijoint: Let's apply our code to some examples. +### Numerical examples + #### Example 1 ```{code-cell} ipython3 @@ -924,6 +938,8 @@ y = np.linspace(-10, 10, 1_000) x_mesh, y_mesh = np.meshgrid(x, y, indexing="ij") ``` +### Joint, marginal, and conditional distributions + #### Joint distribution Let's plot the **population** joint density. @@ -987,9 +1003,9 @@ plt.show() For a bivariate normal population distribution, the conditional distributions are also normal: $$ -\begin{aligned} \\ -[X|Y &= y ]\sim \mathbb{N}\bigg[\mu_X+\rho\sigma_X\frac{y-\mu_Y}{\sigma_Y},\sigma_X^2(1-\rho^2)\bigg] \\ -[Y|X &= x ]\sim \mathbb{N}\bigg[\mu_Y+\rho\sigma_Y\frac{x-\mu_X}{\sigma_X},\sigma_Y^2(1-\rho^2)\bigg] +\begin{aligned} +X \mid Y = y &\sim \mathbb{N}\bigg[\mu_X+\rho\sigma_X\frac{y-\mu_Y}{\sigma_Y},\sigma_X^2(1-\rho^2)\bigg] \\ +Y \mid X = x &\sim \mathbb{N}\bigg[\mu_Y+\rho\sigma_Y\frac{x-\mu_X}{\sigma_X},\sigma_Y^2(1-\rho^2)\bigg] \end{aligned} $$ @@ -997,30 +1013,33 @@ $$ Please see this {doc}`quantecon lecture ` for more details. ``` -Let's approximate the joint density by discretizing and mapping the approximating joint density into a matrix. +Let's approximate the joint density by discretizing and mapping the approximating joint density into a matrix. + +On an evenly spaced grid, we can approximate the conditional distribution by assigning probability weights proportional to a slice of the joint density. 
-We can compute the discretized marginal density by just using matrix algebra and noting that +For fixed $y$, this means that $$ -\textrm{Prob}\{X=i|Y=j\}=\frac{f_{ij}}{\sum_{i}f_{ij}} +z_i +\equiv \frac{f(x_i,y)}{\sum_k f(x_k,y)} $$ Fix $y=0$. ```{code-cell} ipython3 -# discretized marginal density +# discretized conditional distribution of X given Y = 0 x = np.linspace(-10, 10, 1_000_000) z = func(x, y=0) / np.sum(func(x, y=0)) plt.plot(x, z) plt.show() ``` -The mean and variance are computed by +The conditional mean and variance are then approximated by $$ \begin{aligned} -\mathbb{E}\left[X\vert Y=j\right] & =\sum_{i}iProb\{X=i\vert Y=j\}=\sum_{i}i\frac{f_{ij}}{\sum_{i}f_{ij}} \\ -\mathbb{D}\left[X\vert Y=j\right] &=\sum_{i}\left(i-\mu_{X\vert Y=j}\right)^{2}\frac{f_{ij}}{\sum_{i}f_{ij}} +\mathbb{E}\left[X\vert Y=y\right] & \approx \sum_i x_i z_i \\ +\mathbb{D}\left[X\vert Y=y\right] & \approx \sum_i\left(x_i-\mu_{X\vert Y=y}\right)^{2} z_i \end{aligned} $$ @@ -1042,14 +1061,14 @@ plt.show() Fix $x=1$. ```{code-cell} ipython3 -y = np.linspace(0, 10, 1_000_000) +y = np.linspace(-10, 10, 1_000_000) z = func(x=1, y=y) / np.sum(func(x=1, y=y)) plt.plot(y,z) plt.show() ``` ```{code-cell} ipython3 -# discretized mean and standard deviation +# discretized conditional mean and standard deviation μy = np.dot(y,z) σy = np.sqrt(np.dot((y - μy)**2, z)) @@ -1226,7 +1245,7 @@ Couplings are important in optimal transport problems and in Markov processes. P ## Copula functions -Suppose that $X_1, X_2, \dots, X_n$ are $N$ random variables and that +Suppose that $X_1, X_2, \dots, X_N$ are $N$ random variables and that * their marginal distributions are $F_1(x_1), F_2(x_2),\dots, F_N(x_N)$, and @@ -1238,12 +1257,15 @@ $$ H(x_1,x_2,\dots,x_N) = C(F_1(x_1), F_2(x_2),\dots,F_N(x_N)). $$ -We can obtain +If the marginal distributions are continuous, then the copula is unique. 
+In that case, we can recover it from the marginal inverses: $$ -C(u_1,u_2,\dots,u_n) = H[F^{-1}_1(u_1),F^{-1}_2(u_2),\dots,F^{-1}_N(u_N)] +C(u_1,u_2,\dots,u_N) = H(F^{-1}_1(u_1),F^{-1}_2(u_2),\dots,F^{-1}_N(u_N)) $$ +When marginal distributions are not continuous, one uses generalized inverses, and the copula is uniquely determined only on $\textrm{Ran}(F_1)\times \cdots \times \textrm{Ran}(F_N)$. + In a reverse direction of logic, given univariate **marginal distributions** $F_1(x_1), F_2(x_2),\dots,F_N(x_N)$ and a copula function $C(\cdot)$, the function $H(x_1,x_2,\dots,x_N) = C(F_1(x_1), F_2(x_2),\dots,F_N(x_N))$ is a **coupling** of $F_1(x_1), F_2(x_2),\dots,F_N(x_N)$. @@ -1252,6 +1274,8 @@ Thus, for given marginal distributions, we can use a copula function to determi Copula functions are often used to characterize **dependence** of random variables. +### Bivariate examples with discrete and continuous distributions + #### Discrete marginal distribution As mentioned above, for two given marginal distributions there can be more than one coupling. @@ -1272,9 +1296,8 @@ For these two random variables there can be more than one coupling. Let's first generate X and Y. 
```{code-cell} ipython3 -# define parameters -mu = np.array([0.6, 0.4]) -nu = np.array([0.3, 0.7]) +μ = np.array([0.6, 0.4]) +ν = np.array([0.3, 0.7]) # number of draws draws = 1_000_000 @@ -1285,10 +1308,10 @@ p = np.random.rand(draws) # generate draws of X and Y via uniform distribution x = np.ones(draws) y = np.ones(draws) -x[p <= mu[0]] = 0 -x[p > mu[0]] = 1 -y[p <= nu[0]] = 0 -y[p > nu[0]] = 1 +x[p <= μ[0]] = 0 +x[p > μ[0]] = 1 +y[p <= ν[0]] = 0 +y[p > ν[0]] = 1 ``` ```{code-cell} ipython3 @@ -1499,12 +1522,12 @@ mystnb: from scipy import stats # Gaussian copula parameters -rho_cop = 0.8 +ρ_cop = 0.8 n_cop = 100_000 -# Step 1: draw from bivariate standard normal with correlation rho_cop +# Step 1: draw from bivariate standard normal with correlation ρ_cop z = np.random.multivariate_normal( - [0, 0], [[1, rho_cop], [rho_cop, 1]], n_cop + [0, 0], [[1, ρ_cop], [ρ_cop, 1]], n_cop ) # Step 2: apply normal CDF -> uniform marginals (the copula itself) @@ -1569,24 +1592,20 @@ import numpy as np F = np.array([[0.3, 0.2], [0.1, 0.4]]) -# (a) marginals -mu = F.sum(axis=1) # sum over columns -> marginal for X -nu = F.sum(axis=0) # sum over rows -> marginal for Y -print("mu (marginal of X):", mu) -print("nu (marginal of Y):", nu) +μ = F.sum(axis=1) +ν = F.sum(axis=0) +print("μ (marginal of X):", μ) +print("ν (marginal of Y):", ν) -# (b) independence matrix -F_indep = np.outer(mu, nu) +F_indep = np.outer(μ, ν) print("\nIndependence matrix (outer product):\n", F_indep) print("\nActual joint F:\n", F) -# (c) test independence -print("\nIndependent (F == mu ⊗ nu)?", np.allclose(F, F_indep)) +print("\nIndependent (F == μ ⊗ ν)?", np.allclose(F, F_indep)) -# (d) conditional vs. 
marginal -prob_X0_given_Y10 = F[0, 0] / nu[0] +prob_X0_given_Y10 = F[0, 0] / ν[0] print(f"\nProb(X=0 | Y=10) = {prob_X0_given_Y10:.4f}") -print(f"Prob(X=0) = {mu[0]:.4f}") +print(f"Prob(X=0) = {μ[0]:.4f}") ``` ```{solution-end} @@ -1620,22 +1639,19 @@ ys = np.array([10, 20]) F = np.array([[0.3, 0.2], [0.1, 0.4]]) -mu = F.sum(axis=1) -nu = F.sum(axis=0) +μ = F.sum(axis=1) +ν = F.sum(axis=0) -# (a) -E_X = xs @ mu -E_Y = ys @ nu +E_X = xs @ μ +E_Y = ys @ ν E_XY = sum(xs[i] * ys[j] * F[i, j] for i in range(2) for j in range(2)) print(f"E[X] = {E_X}, E[Y] = {E_Y}, E[XY] = {E_XY}") -# (b) cov_XY = E_XY - E_X * E_Y print(f"Cov(X,Y) = {cov_XY:.4f}") -# (c) -var_X = ((xs - E_X)**2) @ mu -var_Y = ((ys - E_Y)**2) @ nu +var_X = ((xs - E_X)**2) @ μ +var_Y = ((ys - E_Y)**2) @ ν cor_XY = cov_XY / np.sqrt(var_X * var_Y) print(f"Cor(X,Y) = {cor_XY:.4f}") ``` @@ -1677,12 +1693,10 @@ Let $X$ and $Y$ each be uniformly distributed on $\{1,2,3,4,5,6\}$, and let $Z = import numpy as np import matplotlib.pyplot as plt -# (a) convolution f = np.ones(6) / 6 -h = np.convolve(f, f) # Z takes values 2,...,12 +h = np.convolve(f, f) z_vals = np.arange(2, 13) -# (b & c) plot theory and simulation n = 1_000_000 z_sim = np.random.randint(1, 7, n) + np.random.randint(1, 7, n) counts = np.bincount(z_sim, minlength=13)[2:] @@ -1695,7 +1709,6 @@ ax.set_ylabel('Probability') ax.legend() plt.show() -# (d) moments E_Z = z_vals @ h Var_Z = ((z_vals - E_Z)**2) @ h print(f"Theory: E[Z] = {E_Z:.2f}, Var(Z) = {Var_Z:.4f}") @@ -1734,21 +1747,18 @@ import numpy as np P = np.array([[0.9, 0.1], [0.2, 0.8]]) -psi0 = np.array([1.0, 0.0]) +ψ0 = np.array([1.0, 0.0]) -# (a) for n in [1, 5, 20, 100]: - print(f"psi_{n:3d} = {psi0 @ np.linalg.matrix_power(P, n)}") + print(f"ψ_{n:3d} = {ψ0 @ np.linalg.matrix_power(P, n)}") -# (b) stationary: solve (P^T - I) psi = 0 with sum = 1 A = np.vstack([P.T - np.eye(2), np.ones(2)]) b = np.array([0.0, 0.0, 1.0]) -psi_star, *_ = np.linalg.lstsq(A, b, rcond=None) -print(f"\nStationary 
distribution: {psi_star}") +ψ_star, *_ = np.linalg.lstsq(A, b, rcond=None) +print(f"\nStationary distribution: {ψ_star}") -# (c) verify -psi_100 = psi0 @ np.linalg.matrix_power(P, 100) -print(f"psi_100 close to stationary? {np.allclose(psi_100, psi_star, atol=1e-6)}") +ψ_100 = ψ0 @ np.linalg.matrix_power(P, 100) +print(f"ψ_100 close to stationary? {np.allclose(ψ_100, ψ_star, atol=1e-6)}") ``` ```{solution-end} @@ -1781,37 +1791,32 @@ import numpy as np xs = np.array([0, 1]) ys = np.array([0, 1]) -mu = np.array([0.5, 0.5]) -nu = np.array([0.4, 0.6]) +μ = np.array([0.5, 0.5]) +ν = np.array([0.4, 0.6]) -# (a) upper Fréchet: maximise P(X=i, Y=i) F_upper = np.array([[0.4, 0.1], [0.0, 0.5]]) -# (b) lower Fréchet: maximise P(X=i, Y=1-i) F_lower = np.array([[0.0, 0.5], [0.4, 0.1]]) -# (c) independent -F_indep = np.outer(mu, nu) +F_indep = np.outer(μ, ν) -# (d) check marginals for F, name in [(F_upper, "Upper Fréchet"), (F_lower, "Lower Fréchet"), (F_indep, "Independent ")]: print(f"{name}: row sums = {F.sum(axis=1)}, col sums = {F.sum(axis=0)}") -# (e) correlations def correlation(F, xs, ys): - mu_x = F.sum(axis=1) - nu_y = F.sum(axis=0) - E_X = xs @ mu_x - E_Y = ys @ nu_y + μ_x = F.sum(axis=1) + ν_y = F.sum(axis=0) + E_X = xs @ μ_x + E_Y = ys @ ν_y E_XY = sum(xs[i]*ys[j]*F[i,j] for i in range(2) for j in range(2)) cov = E_XY - E_X * E_Y - sig_X = np.sqrt(((xs - E_X)**2) @ mu_x) - sig_Y = np.sqrt(((ys - E_Y)**2) @ nu_y) - return cov / (sig_X * sig_Y) + σ_X = np.sqrt(((xs - E_X)**2) @ μ_x) + σ_Y = np.sqrt(((ys - E_Y)**2) @ ν_y) + return cov / (σ_X * σ_Y) print(f"\nCor upper Fréchet = {correlation(F_upper, xs, ys):.4f} (maximum)") print(f"Cor lower Fréchet = {correlation(F_lower, xs, ys):.4f} (minimum)") @@ -1852,28 +1857,28 @@ import numpy as np import matplotlib.pyplot as plt from scipy.special import comb -thetas = np.array([0.2, 0.5, 0.8]) -prior = np.array([0.25, 0.50, 0.25]) +θ_vals = np.array([0.2, 0.5, 0.8]) +π = np.array([0.25, 0.50, 0.25]) -def compute_posterior(k, 
n, thetas, prior): - likelihood = comb(n, k) * thetas**k * (1 - thetas)**(n - k) - unnorm = likelihood * prior +def compute_posterior(k, n, θ_vals, π): + likelihood = comb(n, k) * θ_vals**k * (1 - θ_vals)**(n - k) + unnorm = likelihood * π return unnorm / unnorm.sum(), likelihood -post7, lik7 = compute_posterior(7, 10, thetas, prior) -post3, lik3 = compute_posterior(3, 10, thetas, prior) +post7, lik7 = compute_posterior(7, 10, θ_vals, π) +post3, lik3 = compute_posterior(3, 10, θ_vals, π) print("k=7: likelihood =", lik7.round(4), " posterior =", post7.round(4)) print("k=3: likelihood =", lik3.round(4), " posterior =", post3.round(4)) -x = np.arange(len(thetas)) +x = np.arange(len(θ_vals)) w = 0.3 fig, axes = plt.subplots(1, 2, figsize=(10, 4)) for ax, post, title in zip(axes, [post7, post3], ['k=7 heads', 'k=3 heads']): - ax.bar(x - w/2, prior, w, label='Prior', alpha=0.7) + ax.bar(x - w/2, π, w, label='Prior', alpha=0.7) ax.bar(x + w/2, post, w, label='Posterior', alpha=0.7) ax.set_xticks(x) - ax.set_xticklabels([f'θ={t}' for t in thetas]) + ax.set_xticklabels([f'θ={t}' for t in θ_vals]) ax.set_ylabel('Probability') ax.set_title(title) ax.legend() From 142fa51d1f1473197f71d1fbb710deda6b29dafe Mon Sep 17 00:00:00 2001 From: Humphrey Yang Date: Tue, 21 Apr 2026 22:33:59 +1000 Subject: [PATCH 05/26] updates --- lectures/multivariate_normal.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/lectures/multivariate_normal.md b/lectures/multivariate_normal.md index 55c513e3a..a3be75575 100644 --- a/lectures/multivariate_normal.md +++ b/lectures/multivariate_normal.md @@ -2346,7 +2346,7 @@ import numpy as np mn = MultivariateNormal(μ, Σ) mn.partition(1) μ1_hat, Σ11_hat = mn.cond_dist(0, np.array([2.])) -print(f"Analytical μ̂₁ = {μ1_hat[0]:.4f}, Σ̂₁₁ = {Σ11_hat[0,0]:.4f}") +print(f"Analytical μ1_hat = {μ1_hat[0]:.4f}, Σ11_hat = {Σ11_hat[0,0]:.4f}") n = 1_000_000 data = np.random.multivariate_normal(μ, Σ, size=n) @@ -2355,7 +2355,7 @@ z1_all, 
z2_all = data[:, 0], data[:, 1] mask = np.abs(z2_all - 2.) < 0.05 z1_cond = z1_all[mask] print(f"Sample size in band: {mask.sum()}") -print(f"Sample μ̂₁ = {np.mean(z1_cond):.4f}, Σ̂₁₁ = {np.var(z1_cond, ddof=1):.4f}") +print(f"Sample μ1_hat = {np.mean(z1_cond):.4f}, Σ11_hat = {np.var(z1_cond, ddof=1):.4f}") ``` ```{solution-end} @@ -2398,7 +2398,7 @@ for ρ in [0.2, 0.5, 0.9]: mn = MultivariateNormal(np.zeros(2), Σ) mn.partition(1) product = float(mn.βs[0]) * float(mn.βs[1]) - print(f"ρ={ρ:.1f}: b1*b2 = {product:.4f}, ρ² = {ρ**2:.4f}, match: {np.isclose(product, ρ**2)}") + print(f"ρ={ρ:.1f}: b1*b2 = {product:.4f}, ρ^2 = {ρ**2:.4f}, match: {np.isclose(product, ρ**2)}") ``` ```{solution-end} @@ -2504,15 +2504,15 @@ for σθ_val in σθ_vals: fig, ax = plt.subplots() ax.semilogx(σθ_vals, μθ_hat_vals, 'o-', label=r'$\hat{\mu}_\theta$') -ax.axhline(y_bar, ls='--', color='r', label=f'sample mean ȳ = {y_bar:.1f}') +ax.axhline(y_bar, ls='--', color='r', label=f'sample mean y_bar = {y_bar:.1f}') ax.axhline(μθ_val, ls=':', color='g', label=f'prior mean μθ = {μθ_val:.0f}') ax.set_xlabel(r'$\sigma_\theta$') ax.set_ylabel(r'posterior mean $\hat{\mu}_\theta$') ax.legend() plt.show() -print(f"ȳ = {y_bar:.4f}") -print(f"Large σθ posterior mean ≈ {μθ_hat_vals[-1]:.4f}") +print(f"y_bar = {y_bar:.4f}") +print(f"Large σθ posterior mean approx {μθ_hat_vals[-1]:.4f}") ``` ```{solution-end} From b353c4e024830b5523bda20d099a9ff66eb6cc96 Mon Sep 17 00:00:00 2001 From: Humphrey Yang Date: Tue, 21 Apr 2026 22:35:55 +1000 Subject: [PATCH 06/26] update --- lectures/information_market_equilibrium.md | 26 +++++++++++----------- 1 file changed, 13 insertions(+), 13 deletions(-) diff --git a/lectures/information_market_equilibrium.md b/lectures/information_market_equilibrium.md index 475df4d1f..fdba548c8 100644 --- a/lectures/information_market_equilibrium.md +++ b/lectures/information_market_equilibrium.md @@ -1032,11 +1032,11 @@ mystnb: caption: price distribution convergence name: 
fig-price-convergence --- -def price_expectation(h_t, p_bar_true, p_bar_alt, sigma_p, p_grid): +def price_expectation(h_t, p_bar_true, p_bar_alt, σ_p, p_grid): """Return the predictive price density at posterior weight h_t.""" return ( - h_t * norm.pdf(p_grid, loc=p_bar_true, scale=sigma_p) - + (1 - h_t) * norm.pdf(p_grid, loc=p_bar_alt, scale=sigma_p) + h_t * norm.pdf(p_grid, loc=p_bar_true, scale=σ_p) + + (1 - h_t) * norm.pdf(p_grid, loc=p_bar_alt, scale=σ_p) ) @@ -1542,7 +1542,7 @@ prior support. ```{code-cell} ipython3 def simulate_misspecified( - T, p_bar_true, p_bar_wrong, sigma_p, h0, n_paths, seed=0 + T, p_bar_true, p_bar_wrong, σ_p, h0, n_paths, seed=0 ): """Simulate learning under a misspecified two-model prior.""" rng = np.random.default_rng(seed) @@ -1551,9 +1551,9 @@ def simulate_misspecified( for path in range(n_paths): h = np.array(h0, dtype=float) - prices = rng.normal(p_bar_true, sigma_p, size=T) + prices = rng.normal(p_bar_true, σ_p, size=T) for t, price in enumerate(prices): - likes = norm.pdf(price, loc=p_bar_wrong, scale=sigma_p) + likes = norm.pdf(price, loc=p_bar_wrong, scale=σ_p) h = h * likes h /= h.sum() h_paths[path, t + 1, :] = h @@ -1561,24 +1561,24 @@ def simulate_misspecified( return h_paths -def predictive_density(weights, means, sigma_p, p_grid): +def predictive_density(weights, means, σ_p, p_grid): """Return the predictive density under the current posterior weights.""" density = np.zeros_like(p_grid) for weight, mean in zip(weights, means): - density += weight * norm.pdf(p_grid, loc=mean, scale=sigma_p) + density += weight * norm.pdf(p_grid, loc=mean, scale=σ_p) return density T = 1000 p_true = 2.0 p_wrong = np.array([1.5, 2.3]) -sigma_p = 0.4 +σ_p = 0.4 h0 = np.array([0.5, 0.5]) n_paths = 30 -h_misspec = simulate_misspecified(T, p_true, p_wrong, sigma_p, h0, n_paths) +h_misspec = simulate_misspecified(T, p_true, p_wrong, σ_p, h0, n_paths) -kl_vals = (p_true - p_wrong)**2 / (2 * sigma_p**2) +kl_vals = (p_true - p_wrong)**2 / (2 * 
σ_p**2) for mean, kl in zip(p_wrong, kl_vals): print(f"KL(true || N({mean:.1f}, sigma^2)) = {kl:.4f}") @@ -1607,12 +1607,12 @@ closer_idx = np.argmin(kl_vals) fig, ax = plt.subplots(figsize=(8, 4)) colors = plt.cm.Blues(np.linspace(0.3, 1.0, 4)) for t_snap, color in zip([0, 10, 100, T], colors): - dens = predictive_density(median_path[t_snap], p_wrong, sigma_p, p_grid) + dens = predictive_density(median_path[t_snap], p_wrong, σ_p, p_grid) ax.plot(p_grid, dens, color=color, lw=2, label=f"t = {t_snap}") ax.plot( p_grid, - norm.pdf(p_grid, loc=p_wrong[closer_idx], scale=sigma_p), + norm.pdf(p_grid, loc=p_wrong[closer_idx], scale=σ_p), "k--", lw=2, label="KL-best wrong model", From 1ae9e85544b9898119f449f521eafc259419a7de Mon Sep 17 00:00:00 2001 From: thomassargent30 Date: Tue, 21 Apr 2026 19:18:19 -0600 Subject: [PATCH 07/26] Tom's April 21 edits of new lecture --- lectures/_static/quant-econ.bib | 89 ++ lectures/_toc.yml | 1 + lectures/misspecified_recovery.md | 1348 ++++++++++++++++++++++ lectures/misspecified_recovery_extra.bib | 111 ++ 4 files changed, 1549 insertions(+) create mode 100644 lectures/misspecified_recovery.md create mode 100644 lectures/misspecified_recovery_extra.bib diff --git a/lectures/_static/quant-econ.bib b/lectures/_static/quant-econ.bib index 82f5cc7ec..b178ec0ae 100644 --- a/lectures/_static/quant-econ.bib +++ b/lectures/_static/quant-econ.bib @@ -1,3 +1,92 @@ +@article{BorovickaHansenScheinkman2016, + author = {Borovička, Jaroslav and Hansen, Lars Peter and Scheinkman, José A.}, + title = {Misspecified Recovery}, + journal = {Journal of Finance}, + volume = {71}, + number = {6}, + pages = {2493--2544}, + year = {2016}, + doi = {10.1111/jofi.12404} +} + +@article{Ross2015, + author = {Ross, Stephen A.}, + title = {The Recovery Theorem}, + journal = {Journal of Finance}, + volume = {70}, + number = {2}, + pages = {615--648}, + year = {2015}, + doi = {10.1111/jofi.12092} +} + +@article{HansenScheinkman2009, + author = {Hansen, Lars Peter and 
Scheinkman, José A.}, + title = {Long-Term Risk: An Operator Approach}, + journal = {Econometrica}, + volume = {77}, + number = {1}, + pages = {177--234}, + year = {2009}, + doi = {10.3982/ECTA6761} +} + +@article{AlvarezJermann2005, + author = {Alvarez, Fernando and Jermann, Urban J.}, + title = {Using Asset Prices to Measure the Persistence of the Marginal Utility of Wealth}, + journal = {Econometrica}, + volume = {73}, + number = {6}, + pages = {1977--2016}, + year = {2005}, + doi = {10.1111/j.1468-0262.2005.00643.x} +} + +@article{BakshiChabiYo2012, + author = {Bakshi, Gurdip and Chabi-Yo, Fousseni}, + title = {Variance Bounds on the Permanent and Transitory Components of Stochastic Discount Factors}, + journal = {Journal of Financial Economics}, + volume = {105}, + number = {1}, + pages = {191--208}, + year = {2012}, + doi = {10.1016/j.jfineco.2011.10.004} +} + +@article{BackusGregoryZin1989, + author = {Backus, David K. and Gregory, Allan W. and Zin, Stanley E.}, + title = {Risk Premiums in the Term Structure: Evidence from Artificial Economies}, + journal = {Journal of Monetary Economics}, + volume = {24}, + number = {3}, + pages = {371--399}, + year = {1989}, + doi = {10.1016/0304-3932(89)90033-X} +} + +@article{Hansen2012, + author = {Hansen, Lars Peter}, + title = {Dynamic Valuation Decomposition within Stochastic Economies}, + journal = {Econometrica}, + volume = {80}, + number = {3}, + pages = {911--967}, + year = {2012}, + note = {Fisher--Schultz Lecture}, + doi = {10.3982/ECTA8070} +} + +@article{BackusChernovZin2014, + author = {Backus, David K. 
and Chernov, Mikhail and Zin, Stanley E.}, + title = {Sources of Entropy in Representative Agent Models}, + journal = {Journal of Finance}, + volume = {69}, + number = {1}, + pages = {51--99}, + year = {2014}, + doi = {10.1111/jofi.12090} +} + @article{Borovicka2020, author = {Borovička, Jaroslav}, title = {Survival and Long-Run Dynamics with Heterogeneous Beliefs under Recursive Preferences}, diff --git a/lectures/_toc.yml b/lectures/_toc.yml index f322b2864..b24d80c17 100644 --- a/lectures/_toc.yml +++ b/lectures/_toc.yml @@ -141,6 +141,7 @@ parts: - file: harrison_kreps - file: morris_learn - file: affine_risk_prices + - file: misspecified_recovery - caption: Data and Empirics numbered: true chapters: diff --git a/lectures/misspecified_recovery.md b/lectures/misspecified_recovery.md new file mode 100644 index 000000000..c9d40c948 --- /dev/null +++ b/lectures/misspecified_recovery.md @@ -0,0 +1,1348 @@ +--- +jupytext: + text_representation: + extension: .md + format_name: myst + format_version: 0.13 + jupytext_version: 1.17.1 +kernelspec: + display_name: Python 3 (ipykernel) + language: python + name: python3 +--- + +(misspecified_recovery)= +```{raw} html + +``` + +# Misspecified Recovery + +```{contents} Contents +:depth: 2 +``` + +## Overview + +Asset prices are forward-looking: they encode investors' expectations about future economic +states and their valuations of different risks. A long-standing question in finance is +whether we can *recover* the probability distribution used by investors — their subjective +beliefs — from observed asset prices alone. + +{cite}`BorovickaHansenScheinkman2016` study the challenge of separating investors' +beliefs from their risk preferences using **Perron–Frobenius theory**. The key finding +is that Perron–Frobenius theory applied to Arrow prices recovers a **long-term risk-neutral +measure** that absorbs all long-horizon risk adjustments. 
This recovered measure coincides +with investors' subjective beliefs only under a stringent — and often empirically +implausible — restriction on the stochastic discount factor. + +After completing this lecture you will be able to: + +- Explain why Arrow prices alone cannot identify both transition probabilities and stochastic + discount factors without additional restrictions. +- Construct **risk-neutral** and **long-term risk-neutral** transition matrices from Arrow + prices using the Perron–Frobenius eigenvalue–eigenvector decomposition. +- Decompose any stochastic discount factor process into a trend component, a state-dependent + component, and a **martingale component**, and explain what the martingale encodes. +- Identify the exact condition under which {cite}`Ross2015`'s Recovery Theorem succeeds, + and show that this condition fails in empirically relevant models with recursive utility + or permanent consumption shocks. +- Simulate the {cite}`Bansal_Yaron_2004` long-run risk model and compare the stationary + distributions under the physical and recovered probability measures. + +### Related lectures + +- {doc}`affine_risk_prices`: affine models of the stochastic discount factor and term structure. +- {doc}`markov_asset`: Markov asset pricing and stationary equilibria. +- {doc}`harrison_kreps`: risk-neutral pricing and the change-of-measure approach. + +## Setup + +```{code-cell} ipython3 +import numpy as np +import matplotlib.pyplot as plt +import matplotlib.cm as cm +from scipy import linalg +from scipy.stats import gaussian_kde +import warnings +warnings.filterwarnings('ignore') + +plt.rcParams.update({ + 'axes.spines.top': False, + 'axes.spines.right': False, + 'font.size': 11, + 'figure.dpi': 110, +}) +``` + +## Arrow Prices and the Identification Challenge + +### Arrow prices and stochastic discount factors + +Consider a discrete-time economy with an $n$-state Markov chain $\{X_t\}$ governed +by transition matrix $\mathbf{P} = [p_{ij}]$. 
An **Arrow price** $q_{ij}$ is the +date-$t$ price of a claim that pays $\$1$ tomorrow in state $j$ given that the current +state is $i$. We collect these prices in a matrix $\mathbf{Q} = [q_{ij}]$. + +A **stochastic discount factor** (SDF) $s_{ij}$ prices risk by discounting the payoff +in state $j$ tomorrow when today's state is $i$. Arrow prices and the SDF are linked by + +$$ +q_{ij} = s_{ij} \, p_{ij}. +$$ + +Given $\mathbf{Q}$, any pair $(\mathbf{S}, \mathbf{P})$ satisfying $q_{ij} = s_{ij} p_{ij}$ +for all $(i,j)$ is consistent with the observed prices. The fundamental identification +problem is that $\mathbf{Q}$ has $n^2$ entries, $\mathbf{P}$ has $n(n-1)$ free entries +(rows sum to one), and $\mathbf{S}$ has $n^2$ free entries — so there are far more +unknowns than equations. + +To make progress, we can impose restrictions on the SDF. Two classical restrictions are +studied in the sections that follow. + +### A three-state illustration + +To build intuition, we work with a three-state Markov chain representing +**recession**, **normal**, and **expansion** phases of the business cycle. 
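Before fixing the example's parameters, it helps to see the identification problem numerically. The following sketch uses hypothetical two-state numbers (not the lecture's calibration): given a single Arrow price matrix, *any* transition matrix with the right support can be rationalized by dividing it into the prices elementwise to produce a consistent SDF.

```python
import numpy as np

# Hypothetical two-state Arrow price matrix (made-up numbers for illustration)
Q = np.array([[0.55, 0.40],
              [0.30, 0.65]])

# Any positive transition matrix P rationalizes Q via the SDF
# S = Q / P (elementwise), since q_ij = s_ij * p_ij by construction.
for P in (np.array([[0.8, 0.2], [0.3, 0.7]]),
          np.array([[0.5, 0.5], [0.6, 0.4]])):
    S = Q / P
    assert np.allclose(S * P, Q)   # both (S, P) pairs reproduce the same prices
    print("P =", P.ravel().round(2), " ->  S =", S.round(3).ravel())
```

Both pairs deliver identical Arrow prices and hence identical bond prices (the row sums of `Q`), so observed prices alone cannot tell them apart; only a restriction on the SDF can break the tie.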
+The physical transition matrix and consumption levels are: + +```{code-cell} ipython3 +# Physical transition matrix (recession, normal, expansion) +P_phys = np.array([ + [0.70, 0.25, 0.05], # from recession + [0.15, 0.65, 0.20], # from normal + [0.05, 0.30, 0.65], # from expansion +]) + +# Consumption levels in each state (arbitrary units) +c_levels = np.array([0.85, 1.00, 1.15]) +state_names = ['Recession', 'Normal', 'Expansion'] + +# Preference parameters +delta = 0.99 # monthly discount factor +gamma = 5.0 # coefficient of relative risk aversion + +# Arrow price matrix under power utility with rational expectations: +# q_ij = delta * (c_j / c_i)^{-gamma} * p_ij +n = len(c_levels) +Q_mat = np.zeros((n, n)) +for i in range(n): + for j in range(n): + Q_mat[i, j] = delta * (c_levels[j] / c_levels[i])**(-gamma) * P_phys[i, j] + +print("Arrow price matrix Q:") +print(np.round(Q_mat, 5)) +print(f"\nSum of each row (= price of risk-free bond): {Q_mat.sum(axis=1).round(5)}") +``` + +## Risk-Neutral Probabilities + +The **risk-neutral restriction** sets + +$$ +\bar{s}_{i,j} = \bar{q}_i +$$ + +where $\bar{q}_i = \sum_j q_{ij}$ is the price of a one-period discount bond in state $i$. +Under this restriction all future states are discounted equally from state $i$, so risk +adjustments depend only on the current state. The resulting risk-neutral probabilities are + +$$ +\bar{p}_{ij} = \frac{q_{ij}}{\bar{q}_i}. 
+$$ + +```{code-cell} ipython3 +def risk_neutral_probs(Q): + """Compute risk-neutral transition matrix from Arrow price matrix.""" + q_bonds = Q.sum(axis=1) # one-period bond prices + P_bar = Q / q_bonds[:, np.newaxis] + return P_bar, q_bonds + + +P_bar, q_bonds = risk_neutral_probs(Q_mat) + +print("One-period bond prices (risk-free discount factors):") +for i, (s, qb) in enumerate(zip(state_names, q_bonds)): + print(f" {s:12s}: {qb:.5f} (annualized yield ≈ {-np.log(qb)*12:.2%})") + +print("\nRisk-neutral transition matrix P̄:") +print(np.round(P_bar, 4)) +print(f"\nRow sums: {P_bar.sum(axis=1)}") +``` + +```{note} +Risk-neutral probabilities absorb **one-period** (short-run) risk adjustments. They are +widely used in financial engineering but are generally *not* equal to investors' beliefs. +When short-term interest rates vary across states, risk-neutral probabilities are +also horizon-dependent: the $t$-period forward measure differs from $\bar{\mathbf{P}}^t$. +``` + +## Long-Term Risk-Neutral Probabilities: Perron–Frobenius Theory + +### The eigenvalue problem + +The long-term behavior of discount factors is governed by a different restriction. +**Long-term risk pricing** sets + +$$ +\hat{s}_{ij} = \exp(\hat{\eta}) \frac{\hat{e}_i}{\hat{e}_j} +$$ + +for a scalar $\hat{\eta}$ and a vector of positive numbers $\{\hat{e}_i\}$. +Substituting into $q_{ij} = s_{ij} p_{ij}$ gives: + +$$ +\hat{p}_{ij} = \exp(-\hat{\eta}) \, q_{ij} \, \frac{\hat{e}_j}{\hat{e}_i}. +$$ + +For $\hat{\mathbf{P}}$ to be a valid transition matrix (rows summing to one), we need +$\sum_j \hat{p}_{ij} = 1$, which requires + +$$ +\sum_j q_{ij} \hat{e}_j = \exp(\hat{\eta}) \hat{e}_i, \quad \text{i.e.,} \quad \mathbf{Q} \hat{\mathbf{e}} = \exp(\hat{\eta}) \hat{\mathbf{e}}. +$$ + +This is an **eigenvalue–eigenvector problem** for the Arrow price matrix $\mathbf{Q}$. 
+By the **Perron–Frobenius theorem**, if $\mathbf{Q}$ has strictly positive entries, the +dominant eigenvalue is unique, real, and positive, and its eigenvector has strictly +positive entries. This gives a unique construction: + +1. Solve $\mathbf{Q} \hat{\mathbf{e}} = \exp(\hat{\eta}) \hat{\mathbf{e}}$ for the + dominant eigenvalue–eigenvector pair. +2. Set $\hat{p}_{ij} = \exp(-\hat{\eta}) \, q_{ij} \, \hat{e}_j / \hat{e}_i$. + +{cite}`BorovickaHansenScheinkman2016` call the resulting $\hat{\mathbf{P}}$ the +**long-term risk-neutral measure** because, under $\hat{\mathbf{P}}$, the long-horizon +risk premia on stochastically growing cash flows are identically zero. + +### Python implementation + +```{code-cell} ipython3 +def perron_frobenius(Q): + """ + Compute the Perron-Frobenius decomposition of an Arrow price matrix. + + Parameters + ---------- + Q : ndarray, shape (n, n) + Arrow price matrix. + + Returns + ------- + eta_hat : float — log of the dominant eigenvalue + exp_eta : float — dominant eigenvalue exp(η̂) + e_hat : ndarray — dominant eigenvector (positive, normalized to sum=1) + P_hat : ndarray — long-term risk-neutral transition matrix + """ + eigenvalues, eigenvectors = linalg.eig(Q) + + # Dominant eigenvalue: largest real part (real & positive by Perron–Frobenius) + idx = np.argmax(eigenvalues.real) + exp_eta = eigenvalues[idx].real + e_hat = eigenvectors[:, idx].real + + # Ensure positive entries (PF guarantees existence; numpy may flip sign) + if e_hat.mean() < 0: + e_hat = -e_hat + e_hat = np.abs(e_hat) / np.abs(e_hat).sum() # normalize to sum = 1 + + eta_hat = np.log(exp_eta) + + # Long-term risk-neutral transition matrix + # P_hat[i,j] = exp(-η̂) * Q[i,j] * e_hat[j] / e_hat[i] + P_hat = (1.0 / exp_eta) * Q * e_hat[np.newaxis, :] / e_hat[:, np.newaxis] + + return eta_hat, exp_eta, e_hat, P_hat + + +eta_hat, exp_eta, e_hat, P_hat = perron_frobenius(Q_mat) + +print(f"Dominant eigenvalue exp(η̂) = {exp_eta:.6f}") +print(f"Log eigenvalue η̂ = 
{eta_hat:.5f} " + f"(annualized ≈ {eta_hat*12:.4f})") +print(f"\nEigenvector ê = {e_hat.round(5)}") +print(f"\nLong-term risk-neutral P̂:") +print(np.round(P_hat, 4)) +print(f"\nRow sums: {P_hat.sum(axis=1)}") +``` + +### Comparing the three probability measures + +```{code-cell} ipython3 +fig, axes = plt.subplots(1, 3, figsize=(15, 4.5)) + +matrices = [ + (P_phys, r'Physical $\mathbf{P}$', 'Blues'), + (P_bar, r'Risk-neutral $\bar{\mathbf{P}}$', 'Oranges'), + (P_hat, r'Long-term risk-neutral $\hat{\mathbf{P}}$', 'Greens'), +] + +for ax, (mat, title, cmap) in zip(axes, matrices): + im = ax.imshow(mat, cmap=cmap, vmin=0, vmax=0.85, aspect='auto') + ax.set_title(title, fontsize=12, pad=10) + ax.set_xticks(range(n)); ax.set_yticks(range(n)) + ax.set_xticklabels(state_names, rotation=20, fontsize=9) + ax.set_yticklabels(state_names, fontsize=9) + ax.set_xlabel('Next state', fontsize=9) + ax.set_ylabel('Current state', fontsize=9) + plt.colorbar(im, ax=ax, fraction=0.046, pad=0.04) + for i in range(n): + for j in range(n): + clr = 'white' if mat[i, j] > 0.45 else 'black' + ax.text(j, i, f'{mat[i,j]:.3f}', ha='center', va='center', + fontsize=9, color=clr) + +plt.suptitle('Transition Matrices Under Alternative Probability Measures', + fontsize=13, y=1.02) +plt.tight_layout() +plt.show() +``` + +```{code-cell} ipython3 +# Stationary distributions under each measure +def stationary_dist(P): + """Compute stationary distribution of an ergodic transition matrix P.""" + n = P.shape[0] + A = (P.T - np.eye(n)) + A[-1] = 1.0 + b = np.zeros(n); b[-1] = 1.0 + return linalg.solve(A, b) + +pi_phys = stationary_dist(P_phys) +pi_bar = stationary_dist(P_bar) +pi_hat = stationary_dist(P_hat) + +fig, ax = plt.subplots(figsize=(8, 4)) +x = np.arange(n) +w = 0.25 +labels = [r'Physical $P$', r'Risk-neutral $\bar{P}$', + r'Long-term risk-neutral $\hat{P}$'] +colors = ['steelblue', 'darkorange', 'forestgreen'] +for k, (pi, lbl, col) in enumerate(zip([pi_phys, pi_bar, pi_hat], labels, colors)): 
+ bars = ax.bar(x + k*w, pi, width=w, label=lbl, color=col, alpha=0.85, + edgecolor='white') + for b_, v in zip(bars, pi): + ax.text(b_.get_x() + w/2, v + 0.008, f'{v:.3f}', + ha='center', va='bottom', fontsize=9) + +ax.set_xticks(x + w); ax.set_xticklabels(state_names) +ax.set_ylabel('Stationary probability') +ax.set_title('Stationary Distributions Under Three Probability Measures') +ax.legend(fontsize=9) +plt.tight_layout(); plt.show() + +print("Stationary distributions:") +for lbl, pi in zip(labels, [pi_phys, pi_bar, pi_hat]): + print(f" {lbl:45s}: {np.round(pi,4)}") +``` + +The long-term risk-neutral measure $\hat{\mathbf{P}}$ assigns **higher weight to bad +states** (recession) and **lower weight to good states** (expansion) than the physical +measure $\mathbf{P}$. This is the risk adjustment for long-run growth uncertainty: a +risk-averse investor's long-run discount rates embed a premium for permanent income risk. + +## The Martingale Decomposition + +### Decomposing the SDF process + +Let $\hat{\mathbf{e}}$ and $\hat{\eta}$ solve the Perron–Frobenius problem. Define the +process + +$$ +\frac{\hat{H}_{t+1}}{\hat{H}_t} = (X_t)' \hat{\mathbf{H}} X_{t+1}, +\quad \text{where} \quad +\hat{h}_{ij} = \frac{\hat{p}_{ij}}{p_{ij}}. +$$ + +Because $\sum_j \hat{h}_{ij} p_{ij} = \sum_j \hat{p}_{ij} = 1$, the process $\hat{H}$ +is a martingale under the physical measure $\mathbf{P}$. The accumulated SDF then admits +the **multiplicative decomposition**: + +$$ +S_t = \exp(\hat{\eta} t) \left(\frac{\hat{e}(X_0)}{\hat{e}(X_t)}\right) + \left(\frac{\hat{H}_t}{\hat{H}_0}\right). 
+$$ + +The three components are: + +| Component | Interpretation | +|---|---| +| $\exp(\hat{\eta} t)$ | Deterministic exponential discounting; $-\hat{\eta}$ is the long-run yield | +| $\hat{e}(X_0)/\hat{e}(X_t)$ | State-dependent trend; mean-stationary under $\hat{\mathbf{P}}$ | +| $\hat{H}_t/\hat{H}_0$ | Martingale; encodes long-run risk adjustments | + +```{code-cell} ipython3 +# SDF matrix: s_ij = q_ij / p_ij +S_mat = np.where(P_phys > 0, Q_mat / P_phys, 0.0) + +# Trend SDF: ŝ_ij = exp(η̂) * e_hat_i / e_hat_j +S_hat = exp_eta * e_hat[:, np.newaxis] / e_hat[np.newaxis, :] + +# Martingale increment: ĥ_ij = P̂_ij / P_ij (also = S_ij / Ŝ_ij) +H_incr = np.where(P_phys > 0, P_hat / P_phys, 0.0) + +print("SDF matrix S = Q/P:") +print(np.round(S_mat, 4)) +print("\nTrend SDF Ŝ = exp(η̂) × ê_i / ê_j:") +print(np.round(S_hat, 4)) +print("\nMartingale increment ĥ = Ŝ × H̃_incr (= P̂/P):") +print(np.round(H_incr, 4)) + +# Verify martingale property: E[ĥ_{ij} | X_t=i] = sum_j ĥ_ij * p_ij = 1 +mart_check = (H_incr * P_phys).sum(axis=1) +print(f"\nMartingale property check — E[ĥ | X_t=i] = {mart_check}") +``` + +Higher risk aversion amplifies the pessimistic distortion: as $\gamma$ increases, the +recovered measure assigns growing probability to the recession state. +(The figures illustrating this appear below, after we define the Epstein–Zin utility +function that is needed to compute them.) + +## When Does Recovery Succeed? + +### The Ross recovery condition + +{cite}`Ross2015` proposes to identify investors' subjective beliefs by imposing + +$$ +\widetilde{S}_t = \exp(-\delta t) \frac{m(X_t)}{m(X_0)} +$$ + +for some positive function $m$ and discount rate $\delta$ (Condition 4 in +{cite}`BorovickaHansenScheinkman2016`). Under this restriction, the SDF has **no +martingale component**: $\hat{H}_t \equiv 1$. 
+ +Equivalently, recovery succeeds if and only if the physical stochastic discount factor +takes the "long-term risk pricing" form + +$$ +s_{ij} = \exp(\hat{\eta}) \frac{\hat{e}_i}{\hat{e}_j} +$$ + +with $\hat{h}_{ij} \equiv 1$. In this case $\hat{\mathbf{P}} = \mathbf{P}$ and the +Perron–Frobenius procedure recovers the true probabilities. + +The critical question is: when is the martingale component degenerate? + +### Power utility with trend-stationary consumption + +Consider a power-utility investor with risk aversion $\gamma$ and *trend-stationary* +consumption $C_t = \exp(g_c t)(c \cdot X_t)$ where $c$ is a positive vector. The +one-period SDF is + +$$ +s_{ij} = \exp(-\delta - \gamma g_c) \left(\frac{c_j}{c_i}\right)^{-\gamma}. +$$ + +This has the exact long-term risk pricing form with $\hat{e}_j = c_j^\gamma$ and +$\hat{\eta} = -(\delta + \gamma g_c)$. Therefore $\hat{h}_{ij} \equiv 1$ and **Ross +recovery succeeds exactly** when consumption fluctuations around a deterministic trend +are the only source of risk. 
+ +```{code-cell} ipython3 +# Verify: for trend-stationary power utility, ĥ_ij = 1 identically +gc = 0.002 # monthly trend growth + +# Trend-stationary: consumption growth ratio depends only on state, not history +# s_ij = exp(-delta - gamma*gc) * (c_j/c_i)^(-gamma) +S_trend = np.zeros((n, n)) +for i in range(n): + for j in range(n): + S_trend[i, j] = np.exp(-delta - gamma*gc) * (c_levels[j]/c_levels[i])**(-gamma) + +Q_trend = S_trend * P_phys + +_, exp_eta_t, e_hat_t, P_hat_t = perron_frobenius(Q_trend) + +H_incr_trend = np.where(P_phys > 0, P_hat_t / P_phys, 0.0) + +print("Martingale increment ĥ_ij for trend-stationary power utility:") +print(np.round(H_incr_trend, 6)) +print(f"\nMax deviation from 1: {np.abs(H_incr_trend[P_phys>0] - 1).max():.2e}") +print("→ Martingale is trivial: Recovery succeeds.") +``` + +### Recursive (Epstein–Zin) utility + +When the investor has **Epstein–Zin recursive preferences** with risk aversion +$\gamma \neq 1$, continuation values $V_t$ satisfy the recursion + +$$ +V_t = \bigl[1-\exp(-\delta)\bigr] \log C_t + + \frac{\exp(-\delta)}{1-\gamma} + \log \mathbf{E}_t\bigl[\exp\bigl((1-\gamma)V_{t+1}\bigr)\bigr]. +$$ + +The SDF takes the form (see {cite}`BorovickaHansenScheinkman2016`, Example 2) + +$$ +s_{ij} = \exp(-\delta - g_c)\frac{c_i}{c_j} + \left(\frac{v^*_j}{\mathbf{P}_i v^*}\right), +$$ + +where $v^*_i = \exp\!\bigl[(1-\gamma)v_i\bigr]$ and $\mathbf{P}_i$ is the $i$-th row of +$\mathbf{P}$. The additional factor $v^*_j/(\mathbf{P}_i v^*)$ introduces a **nontrivial +martingale component** whenever continuation values vary across states. + +```{code-cell} ipython3 +def solve_ez_finite(P, c, delta, gamma, gc, tol=1e-12, max_iter=5000): + """ + Solve for Epstein-Zin continuation values in finite Markov chain. + + Solves the fixed-point v_i = (1-β)log(c_i) + β/(1-γ) log(P_i @ exp((1-γ)v)) + where β = exp(-delta - gc). 
The special case γ = 1 (log utility) is handled + separately to avoid the 0/0 indeterminate form: the recursion reduces to + v = (I - β P)^{-1} (1-β) log(c) and the SDF simplifies to + s_ij = exp(-δ - g_c) c_i / c_j. + + Returns + ------- + v : ndarray — continuation values (net of time trend) + vstar : ndarray — exp((1-γ)v) + s : ndarray — one-period SDF matrix + """ + beta = np.exp(-delta - gc) + log_c = np.log(c) + n = len(c) + + if abs(gamma - 1.0) < 1e-10: + # Log utility: (I - β P) v = (1-β) log c + v = linalg.solve(np.eye(n) - beta * P, (1 - beta) * log_c) + vstar = np.ones(n) # exp((1-1)*v) = 1 + Pv = np.ones(n) # P @ ones = ones + else: + # General recursive utility: fixed-point iteration + v = log_c.copy() + for _ in range(max_iter): + vstar = np.exp((1 - gamma) * v) + Pv = P @ vstar + v_new = ((1 - beta) * log_c + + beta / (1 - gamma) * np.log(Pv)) + if np.max(np.abs(v_new - v)) < tol: + v = v_new + break + v = v_new + vstar = np.exp((1 - gamma) * v) + Pv = P @ vstar + + # SDF matrix + s = np.zeros((n, n)) + for i in range(n): + for j in range(n): + s[i, j] = np.exp(-delta - gc) * (c[i] / c[j]) * (vstar[j] / Pv[i]) + + return v, vstar, s + + +# Compare: γ = 1 (log utility, degenerate martingale) vs γ = 5 +gc_ex = 0.001 # monthly consumption trend growth + +for gam, label in [(1.0, 'γ = 1 (log utility)'), (5.0, 'γ = 5 (risk aversion)')]: + v_ez, vstar_ez, S_ez = solve_ez_finite(P_phys, c_levels, + delta, gam, gc_ex) + Q_ez = S_ez * P_phys + _, _, _, P_hat_ez = perron_frobenius(Q_ez) + H_ez = np.where(P_phys > 0, P_hat_ez / P_phys, 0.0) + + pi_hat_ez = stationary_dist(P_hat_ez) + print(f"\n{label}") + print(f" Continuation values v = {v_ez.round(4)}") + print(f" Max |ĥ_ij - 1| = {np.abs(H_ez[P_phys>0] - 1).max():.4f}") + print(f" Stationary P̂ = {pi_hat_ez.round(4)}") + print(f" Stationary P = {pi_phys.round(4)}") +``` + +```{code-cell} ipython3 +# Show how the martingale depends on γ for recursive utility +# Start at 1.0: the gamma=1 special case in 
solve_ez_finite is handled explicitly. +gammas_ez = np.linspace(1.0, 10.0, 50) +mart_errors = [] +pi_rec_hat = [] + +for gam in gammas_ez: + v_g, _, S_g = solve_ez_finite(P_phys, c_levels, delta, gam, gc_ex) + Q_g = S_g * P_phys + _, _, _, Ph = perron_frobenius(Q_g) + H_g = np.where(P_phys > 0, Ph / P_phys, 0.0) + mart_errors.append(np.abs(H_g[P_phys > 0] - 1).max()) + pi_rec_hat.append(stationary_dist(Ph)[0]) + +fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4)) + +ax1.plot(gammas_ez, mart_errors, color='firebrick', lw=2.5) +ax1.set_xlabel('Risk aversion γ') +ax1.set_ylabel(r'$\max_{i,j} |\hat{h}_{ij} - 1|$') +ax1.set_title('Martingale non-degeneracy vs risk aversion\n(Epstein–Zin utility)') + +ax2.plot(gammas_ez, pi_rec_hat, color='steelblue', lw=2.5, + label=r'Recession weight under $\hat{P}$') +ax2.axhline(pi_phys[0], ls='--', color='grey', lw=1.5, + label=f'Recession weight under $P$ ({pi_phys[0]:.3f})') +ax2.set_xlabel('Risk aversion γ') +ax2.set_ylabel('Stationary probability') +ax2.set_title('Recovered recession probability vs risk aversion') +ax2.legend(fontsize=9) + +plt.tight_layout(); plt.show() +``` + +```{code-cell} ipython3 +# Visualize the martingale increment using Epstein-Zin utility (γ=5). +# Trend-stationary power utility always yields ĥ_ij = 1 by construction (see Exercise 4), +# so we use recursive utility here to reveal a genuinely non-trivial martingale. 
gamma_ez_demo = 5.0
_, _, S_ez_demo = solve_ez_finite(P_phys, c_levels, delta, gamma_ez_demo, gc_ex)
Q_ez_demo = S_ez_demo * P_phys
_, _, _, P_hat_ez_demo = perron_frobenius(Q_ez_demo)
H_incr_ez = np.where(P_phys > 0, P_hat_ez_demo / P_phys, 1.0)

fig, axes = plt.subplots(1, 2, figsize=(12, 4.5))

vmax_h = max(1.5, H_incr_ez.max() * 1.05)
vmin_h = min(0.5, H_incr_ez.min() * 0.95)
im0 = axes[0].imshow(H_incr_ez, cmap='RdYlGn', vmin=vmin_h, vmax=vmax_h, aspect='auto')
axes[0].set_title(
    r'Martingale increment $\hat{h}_{ij} = \hat{p}_{ij}/p_{ij}$' '\n'
    r'(Epstein–Zin utility, $\gamma=5$)',
    fontsize=11)
for i in range(n):
    for j in range(n):
        axes[0].text(j, i, f'{H_incr_ez[i,j]:.3f}',
                     ha='center', va='center', fontsize=10)
axes[0].set_xticks(range(n)); axes[0].set_yticks(range(n))
axes[0].set_xticklabels(state_names, rotation=20, fontsize=9)
axes[0].set_yticklabels(state_names, fontsize=9)
axes[0].set_xlabel('Next state'); axes[0].set_ylabel('Current state')
plt.colorbar(im0, ax=axes[0], fraction=0.046)

# How risk aversion γ shifts the recovered measure under Epstein-Zin utility.
gammas_shift = np.linspace(1.0, 12, 60)
rec_wts_ez = []
for g in gammas_shift:
    _, _, S_g = solve_ez_finite(P_phys, c_levels, delta, g, gc_ex)
    Q_g = S_g * P_phys
    _, _, _, Ph = perron_frobenius(Q_g)
    rec_wts_ez.append(stationary_dist(Ph)[0])

axes[1].plot(gammas_shift, rec_wts_ez, color='steelblue', lw=2.5)
axes[1].axhline(pi_phys[0], color='grey', ls='--', lw=1.5,
                label=fr'Physical recession prob = {pi_phys[0]:.3f}')
axes[1].set_xlabel('Risk aversion γ')
axes[1].set_ylabel(r'Recession weight under $\hat{P}$')
axes[1].set_title(r'How $\gamma$ shifts the long-term risk-neutral measure'
                  '\n(Epstein–Zin utility)')
axes[1].legend(fontsize=9)
plt.tight_layout(); plt.show()
```

At $\gamma = 1$ (log utility), the weight $v^*_j/(\mathbf{P}_i v^*)$ in the SDF is
identically one, so the SDF does not load on continuation values: the martingale is
trivial and recovery succeeds.
For $\gamma > 1$, continuation values vary
with the state, generating a non-degenerate martingale component whose size grows with
risk aversion.

## The Long-Run Risk Model

We now illustrate these results quantitatively in the Bansal–Yaron
{cite}`Bansal_Yaron_2004` long-run risk model, using the calibration from Figure 1 of
{cite}`BorovickaHansenScheinkman2016`.

### Model setup

The state vector $X_t = (X_{1t}, X_{2t})'$ follows the continuous-time diffusion

$$
\begin{aligned}
dX_{1t} &= \bar{\mu}_{11}(X_{1t} - \iota_1)\,dt
          + \bar{\mu}_{12}(X_{2t} - \iota_2)\,dt
          + \sqrt{X_{2t}}\,\bar{\sigma}_1 dW_t \\
dX_{2t} &= \bar{\mu}_{22}(X_{2t} - \iota_2)\,dt + \sqrt{X_{2t}}\,\bar{\sigma}_2 dW_t,
\end{aligned}
$$

where $W_t$ is a three-dimensional Brownian motion. Here $X_{1t}$ is the
**predictable component of consumption growth** and $X_{2t}$ is **stochastic volatility**.
In the physical calibration below $\bar{\mu}_{12} = 0$, but we carry the cross-coupling
term because it becomes non-zero under the recovered measure $\hat{P}$.

The representative agent has Epstein–Zin preferences with unit elasticity of substitution.
The stochastic discount factor satisfies

$$
d\log S_t = -\delta\,dt - d\log C_t + d\log H^*_t,
$$

where $H^*$ is a martingale determined by the continuation value of the recursive utility.
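The continuous-time decomposition above mirrors the finite-state factorization $q_{ij} = \exp(\hat{\eta})\,(\hat{e}_i/\hat{e}_j)\,\hat{h}_{ij}\,p_{ij}$ used earlier in the lecture. As a quick self-contained sanity check (hypothetical two-state numbers, not the calibration that follows), one can extract the three pieces from an arbitrary positive SDF and rebuild the Arrow prices exactly:

```python
import numpy as np

# Hypothetical 2-state inputs, for illustration only
P = np.array([[0.9, 0.1],
              [0.3, 0.7]])            # physical transition probabilities
S = np.array([[0.97, 1.05],
              [0.90, 0.99]])          # an arbitrary positive SDF matrix
Q = S * P                             # Arrow prices

# Perron–Frobenius: Q ê = exp(η̂) ê with ê > 0
vals, vecs = np.linalg.eig(Q)
k = np.argmax(vals.real)              # Perron root has the largest real part
exp_eta = vals.real[k]
e_hat = np.abs(vecs[:, k].real)

# Recovered transitions p̂_ij = exp(-η̂) q_ij ê_j / ê_i (rows sum to one)
P_hat = Q * e_hat[None, :] / e_hat[:, None] / exp_eta

# Martingale increments ĥ_ij = p̂_ij / p_ij; rebuild Q from the three pieces
h = P_hat / P
print(np.allclose(exp_eta * (e_hat[:, None] / e_hat[None, :]) * h * P, Q))  # True
```

The rebuild is an identity, so it holds for any positive $\mathbf{S}$; what the lecture is about is whether the $\hat{h}_{ij}$ piece is degenerate.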

```{code-cell} ipython3
# Model parameters from Borovicka-Hansen-Scheinkman (2016), Figure 1
# Monthly frequency
lrr_params = dict(
    delta   = 0.002,    # subjective discount rate
    gamma   = 10.0,     # risk aversion
    mu11    = -0.021,   # mean reversion of X1
    mu12    = 0.0,      # (under P; becomes non-zero under P̂)
    mu22    = -0.013,   # mean reversion of X2
    iota1   = 0.0,      # long-run mean of X1
    iota2   = 1.0,      # long-run mean of X2 (normalized)
    sigma1  = np.array([0.0, 0.00034, 0.0   ]),  # diffusion of X1 (1×3)
    sigma2  = np.array([0.0, 0.0,    -0.038]),   # diffusion of X2 (1×3)
    beta_c0 = 0.0015,   # consumption drift constant
    beta_c1 = 1.0,      # loading on X1
    beta_c2 = 0.0,      # loading on X2
    alpha_c = np.array([0.0078, 0.0, 0.0]),      # consumption diffusion (1×3)
)
```

### Solving the value function

The log continuation value $v(X_t)$ is affine in the state: $v(x) = \bar{v}_0 + \bar{v}_1 x_1 + \bar{v}_2 x_2$.
The coefficients satisfy the algebraic system in Appendix D of {cite}`BorovickaHansenScheinkman2016`.

```{code-cell} ipython3
def solve_value_function(p):
    """
    Solve for Epstein-Zin value function coefficients in the LRR model.

    The continuation value satisfies:
        log V_t = log C_t + v_bar0 + v_bar1*X1_t + v_bar2*X2_t

    Returns v_bar1, v_bar2 and the diffusion loadings A_vec, B_vec.
+ """ + delta, gamma = p['delta'], p['gamma'] + mu11, mu12, mu22 = p['mu11'], p['mu12'], p['mu22'] + sigma1, sigma2 = p['sigma1'], p['sigma2'] + beta_c1, beta_c2 = p['beta_c1'], p['beta_c2'] + mu12_p, alpha_c = p['mu12'], p['alpha_c'] + + # Linear equation for v̄₁ + # δ v̄₁ = β̄c,1 + μ̄₁₁ v̄₁ ⟹ v̄₁ = β̄c,1 / (δ - μ̄₁₁) + v1 = beta_c1 / (delta - mu11) + + # Quadratic equation for v̄₂ + # 0 = (μ̄₂₂ - δ)v̄₂ + β̄c,2 + μ̄₁₂v̄₁ + ½(1-γ)|A + B v̄₂|² + # where A = ᾱc + σ̄₁ v̄₁, B = σ̄₂ + A_vec = alpha_c + sigma1 * v1 + B_vec = sigma2 + + a = 0.5 * (1 - gamma) * np.dot(B_vec, B_vec) + b = (mu22 - delta) + (1 - gamma) * np.dot(A_vec, B_vec) + c = beta_c2 + mu12 * v1 + 0.5 * (1 - gamma) * np.dot(A_vec, A_vec) + + disc = b**2 - 4*a*c + if disc < 0: + raise ValueError("Value function does not exist for these parameters.") + + # "Minus" solution (generates ergodic dynamics under P̂) + v2 = (-b - np.sqrt(disc)) / (2 * a) + return v1, v2, A_vec, B_vec + + +v1, v2, A_vec, B_vec = solve_value_function(lrr_params) +print(f"Value-function slope on X1: v̄₁ = {v1:.4f}") +print(f"Value-function slope on X2: v̄₂ = {v2:.4f}") +print(f"\nInterpretation:") +print(f" Higher X1 (better expected growth) raises continuation value (v̄₁ > 0)") +print(f" Higher X2 (more volatility) lowers continuation value (v̄₂ < 0)") +``` + +### Perron–Frobenius and recovered dynamics + +```{code-cell} ipython3 +def solve_pf_lrr(p, v1, v2, A_vec): + """ + Solve the Perron-Frobenius problem for the long-run risk model. + + Eigenfunction guess: ê(x) = exp(ē₁ x₁ + ē₂ x₂). + + Returns ē₁, ē₂, η̂, and the SDF diffusion vector ᾱs. 
+ """ + delta, gamma = p['delta'], p['gamma'] + mu11, mu12, mu22 = p['mu11'], p['mu12'], p['mu22'] + iota1, iota2 = p['iota1'], p['iota2'] + sigma1, sigma2 = p['sigma1'], p['sigma2'] + alpha_c = p['alpha_c'] + beta_c0 = p['beta_c0'] + beta_c1, beta_c2 = p['beta_c1'], p['beta_c2'] + + # SDF diffusion: ᾱs = −γ ᾱc + (1−γ)(σ̄₁v̄₁ + σ̄₂v̄₂) + alpha_s = (-gamma * alpha_c + + (1 - gamma) * (sigma1 * v1 + sigma2 * v2)) + + # SDF drift parameters in βs(x) = β̄s0 + β̄s11(x1−ι1) + β̄s12(x2−ι2) + beta_s11 = -beta_c1 + beta_s12 = -beta_c2 - 0.5 * np.dot(alpha_s, alpha_s) + beta_s0 = (-delta - beta_c0 + - 0.5 * iota2 * np.dot(alpha_s, alpha_s)) + + # Equation 0 = β̄s11 + μ̄₁₁ ē₁ ⟹ ē₁ = −β̄s11 / μ̄₁₁ + e1 = -beta_s11 / mu11 + + # Quadratic for ē₂ + # 0 = (β̄s12 + ½|ᾱs|²) + ē₁(μ̄₁₂ + σ̄₁·ᾱs) + ½ē₁²|σ̄₁|² + # + ē₂(μ̄₂₂ + σ̄₂·ᾱs + ē₁ σ̄₁·σ̄₂') + ½ē₂²|σ̄₂|² + const_pf = (beta_s12 + 0.5*np.dot(alpha_s, alpha_s) # = 0 by construction + + e1*(mu12 + np.dot(sigma1, alpha_s)) + + 0.5*e1**2*np.dot(sigma1, sigma1)) + lin_pf = mu22 + np.dot(sigma2, alpha_s) + e1*np.dot(sigma1, sigma2) + quad_pf = 0.5 * np.dot(sigma2, sigma2) + + disc = lin_pf**2 - 4*quad_pf*const_pf + e2_m = (-lin_pf - np.sqrt(disc)) / (2*quad_pf) + e2_p = (-lin_pf + np.sqrt(disc)) / (2*quad_pf) + + # η̂ = β̄s0 - β̄s12·ι₂ - ē₂·μ̄₂₂·ι₂ (ι₁ = 0) + eta_m = beta_s0 - beta_s12*iota2 - e2_m*mu22*iota2 + eta_p = beta_s0 - beta_s12*iota2 - e2_p*mu22*iota2 + + # Choose solution with smaller |η̂| (ergodicity requirement) + if abs(eta_m) <= abs(eta_p): + e2, eta_hat = e2_m, eta_m + else: + e2, eta_hat = e2_p, eta_p + + return e1, e2, eta_hat, alpha_s + + +e1, e2, eta_hat_lrr, alpha_s = solve_pf_lrr(lrr_params, v1, v2, A_vec) + +print(f"PF eigenfunction coefficients: ē₁ = {e1:.4f}, ē₂ = {e2:.4f}") +print(f"Log eigenvalue: η̂ = {eta_hat_lrr:.6f} " + f"(annualized = {eta_hat_lrr*12:.4f})") +print(f"\nInterpretation:") +print(f" ē₁ = {e1:.2f}: ê down-weights high-X1 (good growth) states") +print(f" ē₂ = {e2:.2f}: ê up-weights high-X2 (high 
volatility) states") +``` + +### Computing the P̂ dynamics + +```{code-cell} ipython3 +def compute_phat_dynamics(p, e1, e2, alpha_s): + """ + Compute the drift parameters of X under the recovered measure P̂. + + Under P̂, the Brownian motion is + dŴ_t = −√X₂_t α̂_h dt + dW_t + where α̂_h = ᾱs + σ̄₁ ē₁ + σ̄₂ ē₂. + """ + mu11, mu12, mu22 = p['mu11'], p['mu12'], p['mu22'] + iota1, iota2 = p['iota1'], p['iota2'] + sigma1, sigma2 = p['sigma1'], p['sigma2'] + + # Martingale drift correction + alpha_h = alpha_s + sigma1 * e1 + sigma2 * e2 + + # New drift parameters under P̂ + mu_hat_11 = mu11 + mu_hat_12 = mu12 + np.dot(sigma1, alpha_h) + mu_hat_22 = mu22 + np.dot(sigma2, alpha_h) + + # New long-run means + iota_hat_2 = (mu22 / mu_hat_22) * iota2 + iota_hat_1 = (iota1 + + (1.0/mu11) * (mu12*iota2 - mu_hat_12*iota_hat_2)) + + return dict( + mu_hat_11 = mu_hat_11, + mu_hat_12 = mu_hat_12, + mu_hat_22 = mu_hat_22, + iota_hat_1 = iota_hat_1, + iota_hat_2 = iota_hat_2, + alpha_h = alpha_h, + sigma1 = sigma1, + sigma2 = sigma2, + ) + + +phat_dyn = compute_phat_dynamics(lrr_params, e1, e2, alpha_s) + +print("Dynamics of X under P̂ (vs physical P):") +print(f" μ̂₁₁ = {phat_dyn['mu_hat_11']:.4f} " + f"(same as physical μ̄₁₁ = {lrr_params['mu11']:.4f})") +print(f" μ̂₁₂ = {phat_dyn['mu_hat_12']:.6f} " + f"(physical = 0 — new coupling created by risk adjustment)") +print(f" μ̂₂₂ = {phat_dyn['mu_hat_22']:.5f} " + f"(physical = {lrr_params['mu22']:.4f})") +print(f" ι̂₁ = {phat_dyn['iota_hat_1']:.5f} " + f"(physical ι₁ = {lrr_params['iota1']:.4f} — lower mean growth under P̂)") +print(f" ι̂₂ = {phat_dyn['iota_hat_2']:.5f} " + f"(physical ι₂ = {lrr_params['iota2']:.4f} — higher mean volatility under P̂)") +``` + +### Simulating and comparing stationary distributions + +```{code-cell} ipython3 +def simulate_lrr(dyn, T=600_000, seed=42): + """ + Simulate the LRR state vector using Euler-Maruyama (monthly steps). 
+ + Parameters + ---------- + dyn : dict with mu11, mu12, mu22, iota1, iota2, sigma1, sigma2 + T : number of monthly steps + seed : random seed + + Returns + ------- + X1, X2 : ndarray — stationary sample paths (burn-in discarded) + """ + rng = np.random.default_rng(seed) + mu11 = dyn.get('mu11', dyn.get('mu_hat_11')) + mu12 = dyn.get('mu12', dyn.get('mu_hat_12', 0.0)) + mu22 = dyn.get('mu22', dyn.get('mu_hat_22')) + iota1 = dyn.get('iota1', dyn.get('iota_hat_1')) + iota2 = dyn.get('iota2', dyn.get('iota_hat_2')) + sigma1 = dyn['sigma1'] + sigma2 = dyn['sigma2'] + + X1 = np.zeros(T) + X2 = np.full(T, iota2) + + for t in range(1, T): + X2t = max(X2[t-1], 1e-9) + sq_X2 = np.sqrt(X2t) + dW = rng.standard_normal(3) # monthly Δt = 1 + + X1[t] = X1[t-1] + (mu11*(X1[t-1]-iota1) + mu12*(X2t-iota2)) + sq_X2*np.dot(sigma1, dW) + X2[t] = max(X2[t-1] + mu22*(X2t-iota2) + sq_X2*np.dot(sigma2, dW), 1e-9) + + burn = T // 5 + return X1[burn:], X2[burn:] + + +# Simulation under physical P +print("Simulating under physical measure P ...") +X1_P, X2_P = simulate_lrr( + dict(mu11=lrr_params['mu11'], mu12=lrr_params['mu12'], + mu22=lrr_params['mu22'], iota1=lrr_params['iota1'], + iota2=lrr_params['iota2'], + sigma1=lrr_params['sigma1'], sigma2=lrr_params['sigma2']), + T=600_000 +) + +# Simulation under recovered measure P̂ +print("Simulating under recovered measure P̂ ...") +X1_Ph, X2_Ph = simulate_lrr( + dict(mu11 = phat_dyn['mu_hat_11'], + mu12 = phat_dyn['mu_hat_12'], + mu22 = phat_dyn['mu_hat_22'], + iota1 = phat_dyn['iota_hat_1'], + iota2 = phat_dyn['iota_hat_2'], + sigma1 = lrr_params['sigma1'], + sigma2 = lrr_params['sigma2']), + T=600_000 +) +print("Done.") +``` + +```{code-cell} ipython3 +# Reproduce Figure 1 of Borovicka-Hansen-Scheinkman (2016) +def kde2d_contour(ax, X1, X2, levels=8, color='k', alpha=1.0, lw=1.5, + bandwidth=None): + """Plot contour lines of a 2D kernel density estimate.""" + xy = np.vstack([X2, X1]) + kde = gaussian_kde(xy, bw_method=bandwidth) + x2g = 
np.linspace(X2.min()*0.9, X2.max()*1.1, 120) + x1g = np.linspace(X1.min()*0.9, X1.max()*1.1, 120) + X2g, X1g = np.meshgrid(x2g, x1g) + Z = kde(np.vstack([X2g.ravel(), X1g.ravel()])).reshape(X2g.shape) + ax.contour(X2g, X1g, Z, levels=levels, colors=color, alpha=alpha, + linewidths=lw) + +fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(13, 5), sharey=True) + +# Left panel: distribution under P +kde2d_contour(ax1, X1_P, X2_P, color='navy', levels=7) +ax1.set_xlabel('Conditional volatility $X_2$', fontsize=11) +ax1.set_ylabel('Mean growth rate $X_1$', fontsize=11) +ax1.set_title(r'Physical measure $P$', fontsize=12) + +# Right panel: distribution under P̂, plus outermost contour of P̄ (risk-neutral) +kde2d_contour(ax2, X1_Ph, X2_Ph, color='navy', levels=7) +ax2.set_xlabel('Conditional volatility $X_2$', fontsize=11) +ax2.set_title(r'Long-term risk-neutral $\hat{P}$', fontsize=12) + +# Annotate distributional shifts +for ax in (ax1, ax2): + ax.axhline(0, color='grey', lw=0.8, ls='--') + ax.axvline(lrr_params['iota2'], color='grey', lw=0.8, ls='--') + +ax1.annotate(f"Mean X₁ ≈ {X1_P.mean():.4f}", xy=(0.05, 0.92), + xycoords='axes fraction', fontsize=9, color='navy') +ax1.annotate(f"Mean X₂ ≈ {X2_P.mean():.4f}", xy=(0.05, 0.85), + xycoords='axes fraction', fontsize=9, color='navy') +ax2.annotate(f"Mean X₁ ≈ {X1_Ph.mean():.4f}", xy=(0.05, 0.92), + xycoords='axes fraction', fontsize=9, color='navy') +ax2.annotate(f"Mean X₂ ≈ {X2_Ph.mean():.4f}", xy=(0.05, 0.85), + xycoords='axes fraction', fontsize=9, color='navy') + +plt.suptitle('Stationary Distributions of $(X_1, X_2)$ Under $P$ and $\\hat{P}$\n' + '(reproducing Figure 1 of Borovička, Hansen & Scheinkman 2016)', + fontsize=12, y=1.02) +plt.tight_layout(); plt.show() +``` + +The recovered measure $\hat{P}$ concentrates around **lower mean growth** (more negative +$X_1$) and **higher conditional volatility** (larger $X_2$). 
Forecasts made using +$\hat{P}$ are systematically pessimistic compared to forecasts based on the true +distribution $P$. + +## Measuring the Martingale Component + +### Entropy bounds + +Even without observing the full array of Arrow prices, we can obtain **lower bounds** +on the size of the martingale component. For a convex function +$\phi_\theta(r) = [(r)^{1+\theta} - 1] / [\theta(1+\theta)]$, the discrepancy +between $\hat{P}$ and $P$ satisfies + +$$ +\lambda_\theta = E\!\left[\phi_\theta\!\left(\frac{\hat{H}_{t+1}}{\hat{H}_t}\right)\right] +\geq 0, +$$ + +with equality if and only if the martingale is trivial. Two special cases are: + +- **$\theta = -1$**: $\phi_{-1}(r) = -\log r$, so $\lambda_{-1} = -E[\log(\hat{H}_{t+1}/\hat{H}_t)]$ is the **expected log-likelihood** (entropy). +- **$\theta = 1$**: $\lambda_1 = \tfrac{1}{2}\mathrm{Var}[\hat{H}_{t+1}/\hat{H}_t]$. + +```{code-cell} ipython3 +def phi_theta(r, theta): + """Discrepancy function φ_θ(r) = [(r)^{1+θ} - 1] / [θ(1+θ)].""" + if abs(theta) < 1e-10: # θ → 0: relative entropy r log r + return r * np.log(r) + if abs(theta + 1) < 1e-10: # θ → -1: -log r + return -np.log(r) + return (r**(1 + theta) - 1) / (theta * (1 + theta)) + + +def martingale_entropy(Q, P, theta=-1): + """ + Compute the stationary-average discrepancy E[φ_θ(ĥ)] for the finite-state chain. + """ + _, exp_eta, e_hat, P_hat = perron_frobenius(Q) + H_incr = np.where(P > 0, P_hat / P, 1.0) # ĥ_ij + pi_hat = stationary_dist(P_hat) + + # Stationary-average: Σ_i Σ_j π̂_i ĥ_ij p_ij φ_θ(ĥ_ij) + disc = 0.0 + for i in range(P.shape[0]): + for j in range(P.shape[1]): + if P[i, j] > 0: + disc += pi_hat[i] * P[i, j] * phi_theta(H_incr[i, j], theta) + return disc + + +# Compute entropy for different γ values +gammas_ent = np.linspace(1.0, 10.0, 50) # gamma=1 handled by solve_ez_finite +entropies = {'θ=-1 (neg. log)': [], 'θ=0 (rel. 
entropy)': [], 'θ=1 (variance/2)': []} + +for gam in gammas_ent: + v_g, _, S_g = solve_ez_finite(P_phys, c_levels, delta, gam, gc_ex) + Q_g = S_g * P_phys + for theta, key in [(-1, 'θ=-1 (neg. log)'), (0, 'θ=0 (rel. entropy)'), + (1, 'θ=1 (variance/2)')]: + entropies[key].append(martingale_entropy(Q_g, P_phys, theta=theta)) + +fig, ax = plt.subplots(figsize=(8, 4.5)) +colors_ent = ['firebrick', 'darkorange', 'steelblue'] +for (label, vals), col in zip(entropies.items(), colors_ent): + ax.plot(gammas_ent, vals, label=label, color=col, lw=2) + +ax.set_xlabel('Risk aversion γ') +ax.set_ylabel(r'$E[\phi_\theta(\hat{H}_{t+1}/\hat{H}_t)]$') +ax.set_title('Discrepancy Measures for the Martingale Component\n' + '(larger values ↔ larger deviation from Ross recovery)') +ax.legend(fontsize=9) +plt.tight_layout(); plt.show() +``` + +All three discrepancy measures increase with risk aversion, confirming that a higher +$\gamma$ implies a larger — and more economically significant — martingale component. +{cite}`AlvarezJermann2005` and {cite}`BakshiChabiYo2012` use analogous bounds with +long-maturity bond returns to find empirically large martingale components in U.S. data. + +## Exercises + +```{exercise} +:label: ex_risk_neutral + +**Verify risk-neutral probabilities.** Consider a two-state Markov chain with physical +transition matrix + +$$ +\mathbf{P} = \begin{pmatrix} 0.8 & 0.2 \\ 0.4 & 0.6 \end{pmatrix} +$$ + +and Arrow price matrix + +$$ +\mathbf{Q} = \begin{pmatrix} 0.72 & 0.15 \\ 0.36 & 0.42 \end{pmatrix}. +$$ + +1. Compute the risk-neutral transition matrix $\bar{\mathbf{P}}$ and verify it is a + valid probability matrix. +2. Compute the one-period discount bond prices and the implied risk-free rates in each + state. +3. Show that the SDF $\bar{s}_{ij} = \bar{q}_i$ is independent of the next state $j$. 
+``` + +```{solution-start} ex_risk_neutral +:class: dropdown +``` + +```{code-cell} ipython3 +# Exercise 1 solution +P2 = np.array([[0.8, 0.2], + [0.4, 0.6]]) +Q2 = np.array([[0.72, 0.15], + [0.36, 0.42]]) + +P_bar2, q_bonds2 = risk_neutral_probs(Q2) + +print("Risk-neutral transition matrix P̄:") +print(np.round(P_bar2, 4)) +print(f"\nRow sums: {P_bar2.sum(axis=1)}") +print(f"\nOne-period bond prices q̄ᵢ: {q_bonds2}") +print(f"Annualized risk-free rates: {(-np.log(q_bonds2)*12).round(4)}") + +# Verify SDF independence from j +S2 = Q2 / P2 +print(f"\nSDF matrix S = Q/P:") +print(np.round(S2, 4)) +print("Row 0: all entries should equal q̄₀ =", round(q_bonds2[0], 4)) +print("Row 1: all entries should equal q̄₁ =", round(q_bonds2[1], 4)) +``` + +```{solution-end} +``` + +```{exercise} +:label: ex_gamma_sensitivity + +**Risk aversion and recovery distortion.** Using the three-state example from the +lecture (with $\delta = 0.99$ and trend-stationary consumption levels +$c = [0.85, 1.00, 1.15]$), investigate how the recovered probability +vector $\hat{\boldsymbol{\pi}}$ depends on the risk aversion parameter $\gamma$. + +1. For each $\gamma \in \{1, 2, 5, 10, 15\}$, compute the long-term risk-neutral + stationary distribution $\hat{\boldsymbol{\pi}}$. +2. Plot all five distributions as grouped bar charts alongside the physical + distribution $\boldsymbol{\pi}$. +3. At what value of $\gamma$ does the recession probability under $\hat{\mathbf{P}}$ + exceed $50\%$? 
+``` + +```{solution-start} ex_gamma_sensitivity +:class: dropdown +``` + +```{code-cell} ipython3 +# Exercise 2 solution +gammas_ex2 = [1, 2, 5, 10, 15] +all_pi = [] + +for gam in gammas_ex2: + Q_g = np.zeros((3, 3)) + for i in range(3): + for j in range(3): + Q_g[i, j] = delta * (c_levels[j]/c_levels[i])**(-gam) * P_phys[i, j] + _, _, _, Ph_g = perron_frobenius(Q_g) + all_pi.append(stationary_dist(Ph_g)) + +fig, ax = plt.subplots(figsize=(10, 4.5)) +x = np.arange(3) +w = 0.13 +colors_g = plt.cm.Blues(np.linspace(0.3, 0.9, len(gammas_ex2))) + +# Physical distribution +bars = ax.bar(x - 3*w, pi_phys, width=w, color='grey', alpha=0.7, label='Physical P') +for b_, v in zip(bars, pi_phys): + ax.text(b_.get_x()+w/2, v+0.005, f'{v:.3f}', ha='center', va='bottom', fontsize=7) + +for k, (gam, pi_g, col) in enumerate(zip(gammas_ex2, all_pi, colors_g)): + bars = ax.bar(x + (k-1.5)*w, pi_g, width=w, color=col, + label=f'γ={gam}') + for b_, v in zip(bars, pi_g): + ax.text(b_.get_x()+w/2, v+0.005, f'{v:.3f}', + ha='center', va='bottom', fontsize=7) + +ax.set_xticks(x); ax.set_xticklabels(state_names) +ax.set_ylabel('Stationary probability') +ax.set_title(r'Stationary distribution of $\hat{P}$ for varying risk aversion $\gamma$') +ax.legend(fontsize=8, loc='upper right') +plt.tight_layout(); plt.show() + +# Part 3: find γ where recession probability under P̂ exceeds 50% +gammas_fine = np.linspace(1, 30, 200) +rec_probs = [] +for gam in gammas_fine: + Q_g = np.array([[delta*(c_levels[j]/c_levels[i])**(-gam)*P_phys[i,j] + for j in range(3)] for i in range(3)]) + _, _, _, Ph_g = perron_frobenius(Q_g) + rec_probs.append(stationary_dist(Ph_g)[0]) + +# Interpolate crossing point +idx50 = np.where(np.array(rec_probs) > 0.5)[0] +if len(idx50) > 0: + print(f"\nRecession prob under P̂ exceeds 50% at approximately γ ≈ {gammas_fine[idx50[0]]:.1f}") +else: + print(f"\nRecession prob under P̂ does not exceed 50% for γ ≤ 30") + print(f" Maximum recession prob = {max(rec_probs):.4f} at γ = 
30") +``` + +```{solution-end} +``` + +```{exercise} +:label: ex_lrr_gamma + +**Effect of risk aversion in the long-run risk model.** Repeat the long-run risk +simulation from the lecture for $\gamma \in \{5, 10, 15\}$ (keeping all other +parameters fixed at their calibrated values). + +1. For each $\gamma$, compute $(\bar{e}_1, \bar{e}_2)$ and $\hat{\eta}$. +2. Plot $\hat{\iota}_1$ (long-run mean of $X_1$ under $\hat{P}$) as a function of $\gamma$. + Interpret the result in terms of long-run expected consumption growth. +3. Plot $\hat{\iota}_2$ (long-run mean of $X_2$ under $\hat{P}$) as a function of $\gamma$. + Interpret in terms of long-run volatility. +``` + +```{solution-start} ex_lrr_gamma +:class: dropdown +``` + +```{code-cell} ipython3 +# Exercise 3 solution +gammas_lrr = np.linspace(2.0, 18.0, 40) +iota_hat_1_vals = [] +iota_hat_2_vals = [] +eta_hat_vals = [] + +p_copy = dict(lrr_params) # copy to modify gamma + +for gam in gammas_lrr: + p_copy['gamma'] = gam + try: + v1g, v2g, A_g, _ = solve_value_function(p_copy) + e1g, e2g, eta_g, alpha_sg = solve_pf_lrr(p_copy, v1g, v2g, A_g) + dyn_g = compute_phat_dynamics(p_copy, e1g, e2g, alpha_sg) + iota_hat_1_vals.append(dyn_g['iota_hat_1']) + iota_hat_2_vals.append(dyn_g['iota_hat_2']) + eta_hat_vals.append(eta_g) + except Exception: + iota_hat_1_vals.append(np.nan) + iota_hat_2_vals.append(np.nan) + eta_hat_vals.append(np.nan) + +fig, axes = plt.subplots(1, 3, figsize=(14, 4)) + +axes[0].plot(gammas_lrr, iota_hat_1_vals, color='steelblue', lw=2.5) +axes[0].axhline(lrr_params['iota1'], ls='--', color='grey', lw=1.5, + label=f"Physical ι₁ = {lrr_params['iota1']}") +axes[0].set_xlabel('Risk aversion γ'); axes[0].set_ylabel(r'$\hat{\iota}_1$') +axes[0].set_title('Long-run mean of $X_1$ under $\\hat{P}$\n(↓ = lower expected growth)') +axes[0].legend(fontsize=9) + +axes[1].plot(gammas_lrr, iota_hat_2_vals, color='firebrick', lw=2.5) +axes[1].axhline(lrr_params['iota2'], ls='--', color='grey', lw=1.5, + 
                label=f"Physical ι₂ = {lrr_params['iota2']}")
axes[1].set_xlabel('Risk aversion γ'); axes[1].set_ylabel(r'$\hat{\iota}_2$')
axes[1].set_title('Long-run mean of $X_2$ under $\\hat{P}$\n(↑ = higher expected volatility)')
axes[1].legend(fontsize=9)

axes[2].plot(gammas_lrr, np.array(eta_hat_vals)*12, color='purple', lw=2.5)
axes[2].set_xlabel('Risk aversion γ'); axes[2].set_ylabel(r'Annualized $\hat{\eta}$')
axes[2].set_title('Long-run discount rate $\\hat{\\eta}$\n(more negative = higher long-run yield)')

plt.tight_layout(); plt.show()

print("Higher γ → more negative ι̂₁ (P̂ expects lower growth than P)")
print("Higher γ → higher ι̂₂ (P̂ expects higher volatility than P)")
```

```{solution-end}
```

```{exercise}
:label: ex_recovery_test

**Testing the Ross recovery condition.** Show algebraically and numerically that, for
any $n$-state power-utility model with trend-stationary consumption (as in Example 1 of
{cite}`BorovickaHansenScheinkman2016`), the martingale increment satisfies
$\hat{h}_{ij} \equiv 1$.

1. Write the SDF as $s_{ij} = A \cdot (c_j/c_i)^{-\gamma}$ for some constant $A$.
   Show that the Perron-Frobenius eigenvector is $\hat{e}_j = c_j^\gamma$ (up to scale)
   and find $\hat{\eta}$.
2. Compute $\hat{p}_{ij} = \exp(-\hat{\eta}) q_{ij} \hat{e}_j / \hat{e}_i$ and verify
   it equals $p_{ij}$.
3. Confirm numerically for the three-state example with $\gamma = 5$ and
   $c = [0.85, 1.00, 1.15]$.
```

```{solution-start} ex_recovery_test
:class: dropdown
```

**Analytical derivation:**

With $s_{ij} = A \cdot (c_j/c_i)^{-\gamma}$ we have $q_{ij} = A(c_j/c_i)^{-\gamma} p_{ij}$.
Guess $\hat{e}_j = c_j^\gamma$. Then

$$
[\mathbf{Q} \hat{\mathbf{e}}]_i
= \sum_j q_{ij} \hat{e}_j
= A \sum_j \frac{c_j^{-\gamma}}{c_i^{-\gamma}} p_{ij} \cdot c_j^\gamma
= A\, c_i^\gamma \sum_j p_{ij}
= A\, c_i^\gamma
= A \hat{e}_i.
$$

So $\mathbf{Q}\hat{\mathbf{e}} = A \hat{\mathbf{e}}$, confirming $\hat{\mathbf{e}} = \{c_j^\gamma\}$
and $\exp(\hat{\eta}) = A$.
Therefore + +$$ +\hat{p}_{ij} += \frac{1}{A} q_{ij} \frac{\hat{e}_j}{\hat{e}_i} += \frac{1}{A} \cdot A \frac{c_j^{-\gamma}}{c_i^{-\gamma}} p_{ij} + \cdot \frac{c_j^\gamma}{c_i^\gamma} += p_{ij}. +$$ + +Hence $\hat{h}_{ij} = \hat{p}_{ij}/p_{ij} = 1$ for all $(i,j)$. + +```{code-cell} ipython3 +# Exercise 4 numerical verification +# Use trend-stationary power utility (Section "When Does Recovery Succeed?") +gc_ex4 = 0.002 +S_ts = np.zeros((3, 3)) +for i in range(3): + for j in range(3): + S_ts[i, j] = np.exp(-delta - gamma*gc_ex4) * (c_levels[j]/c_levels[i])**(-gamma) + +Q_ts = S_ts * P_phys + +# Perron-Frobenius +_, exp_eta_ts, e_hat_ts, P_hat_ts = perron_frobenius(Q_ts) + +# Check eigenvector is proportional to c^gamma +e_theory = c_levels**gamma +e_theory /= e_theory.sum() + +print("Computed eigenvector ê:", np.round(e_hat_ts, 6)) +print("Theoretical cʸ / norm: ", np.round(e_theory, 6)) +print(f"Max discrepancy: {np.abs(e_hat_ts - e_theory).max():.2e}") + +H_ts = np.where(P_phys > 0, P_hat_ts / P_phys, 0.0) +print(f"\nMartingale increment matrix ĥ:") +print(np.round(H_ts, 6)) +print(f"Max |ĥ_ij - 1|: {np.abs(H_ts[P_phys>0] - 1).max():.2e}") +print("→ Recovery is exact for trend-stationary power utility.") +``` + +```{solution-end} +``` + +## References + +```{bibliography} +:filter: docname in docnames +``` diff --git a/lectures/misspecified_recovery_extra.bib b/lectures/misspecified_recovery_extra.bib new file mode 100644 index 000000000..345a7011c --- /dev/null +++ b/lectures/misspecified_recovery_extra.bib @@ -0,0 +1,111 @@ +% Supplementary bibliography for misspecified_recovery.md +% Add these entries to lectures/_static/quant-econ.bib + +@article{BorovickaHansenScheinkman2016, + author = {Borovička, Jaroslav and Hansen, Lars Peter and Scheinkman, José A.}, + title = {Misspecified Recovery}, + journal = {Journal of Finance}, + volume = {71}, + number = {6}, + pages = {2493--2544}, + year = {2016}, + doi = {10.1111/jofi.12404} +} + +@article{Ross2015, + author 
= {Ross, Stephen A.}, + title = {The Recovery Theorem}, + journal = {Journal of Finance}, + volume = {70}, + number = {2}, + pages = {615--648}, + year = {2015}, + doi = {10.1111/jofi.12092} +} + +@article{HansenScheinkman2009, + author = {Hansen, Lars Peter and Scheinkman, José A.}, + title = {Long-Term Risk: An Operator Approach}, + journal = {Econometrica}, + volume = {77}, + number = {1}, + pages = {177--234}, + year = {2009}, + doi = {10.3982/ECTA6761} +} + +@article{AlvarezJermann2005, + author = {Alvarez, Fernando and Jermann, Urban J.}, + title = {Using Asset Prices to Measure the Persistence of the Marginal Utility + of Wealth}, + journal = {Econometrica}, + volume = {73}, + number = {6}, + pages = {1977--2016}, + year = {2005}, + doi = {10.1111/j.1468-0262.2005.00643.x} +} + +@article{BakshiChabiYo2012, + author = {Bakshi, Gurdip and Chabi-Yo, Fousseni}, + title = {Variance Bounds on the Permanent and Transitory Components of + Stochastic Discount Factors}, + journal = {Journal of Financial Economics}, + volume = {105}, + number = {1}, + pages = {191--208}, + year = {2012}, + doi = {10.1016/j.jfineco.2011.10.004} +} + +@article{BackusGregoryZin1989, + author = {Backus, David K. and Gregory, Allan W. and Zin, Stanley E.}, + title = {Risk Premiums in the Term Structure: Evidence from Artificial Economies}, + journal = {Journal of Monetary Economics}, + volume = {24}, + number = {3}, + pages = {371--399}, + year = {1989}, + doi = {10.1016/0304-3932(89)90033-X} +} + +@article{Hansen2012, + author = {Hansen, Lars Peter}, + title = {Dynamic Valuation Decomposition within Stochastic Economies}, + journal = {Econometrica}, + volume = {80}, + number = {3}, + pages = {911--967}, + year = {2012}, + note = {Fisher--Schultz Lecture at the European Meetings of the Econometric Society}, + doi = {10.3982/ECTA8070} +} + +@article{KrepsPorteus1978, + author = {Kreps, David M. 
and Porteus, Evan L.}, + title = {Temporal Resolution of Uncertainty and Dynamic Choice Theory}, + journal = {Econometrica}, + volume = {46}, + number = {1}, + pages = {185--200}, + year = {1978}, + doi = {10.2307/1913656} +} + +@article{HansenScheinkman2014, + author = {Hansen, Lars Peter and Scheinkman, José A.}, + title = {Stochastic Compounding and Uncertain Valuation}, + year = {2014}, + note = {Working paper, University of Chicago, Columbia University, Princeton University} +} + +@article{BackusChernovZin2014, + author = {Backus, David K. and Chernov, Mikhail and Zin, Stanley E.}, + title = {Sources of Entropy in Representative Agent Models}, + journal = {Journal of Finance}, + volume = {69}, + number = {1}, + pages = {51--99}, + year = {2014}, + doi = {10.1111/jofi.12090} +} From 76d1a159eb6b22d7634d7015b08f38212a014d48 Mon Sep 17 00:00:00 2001 From: Humphrey Yang Date: Wed, 22 Apr 2026 14:21:59 +1000 Subject: [PATCH 08/26] updates --- lectures/information_market_equilibrium.md | 237 +++------------------ 1 file changed, 33 insertions(+), 204 deletions(-) diff --git a/lectures/information_market_equilibrium.md b/lectures/information_market_equilibrium.md index fdba548c8..5dffc3b90 100644 --- a/lectures/information_market_equilibrium.md +++ b/lectures/information_market_equilibrium.md @@ -71,11 +71,9 @@ Important findings of {cite:t}`kihlstrom_mirman1975` are: insider's posterior distribution to the equilibrium price is one-to-one on the set of posteriors that can actually arise from the signal. -- In the paper's two-state theorem, invertibility holds when the informed - agent's utility is homothetic and the elasticity of substitution is everywhere - either below one or above one, with CES preferences providing a convenient - illustration and Cobb-Douglas preferences ($\sigma = 1$) giving the opposite - case in which the equilibrium price is independent of the insider's posterior. 
+ - Invertibility holds when the informed + agent's utility is homothetic and the elasticity of substitution is everywhere + either below one or above one. - In the dynamic economy, as information accumulates, Bayesian price expectations converge to **rational expectations**, even when the deep structure is not identified from prices alone. @@ -230,8 +228,8 @@ $p(\mu_{\tilde{y}})$ is **sufficient** for $\tilde{y}$. ```{prf:definition} Sufficiency :label: ime_def_sufficiency -A random variable $\tilde{y}$ is *sufficient* for $\tilde{y}'$ (with -respect to $\bar{a}$) if there exists a conditional distribution $P(y' \mid y)$, +A random variable $\tilde{y}$ is *sufficient* for $\tilde{y}'$ with +respect to $\bar{a}$ if there exists a conditional distribution $P(y' \mid y)$, **independent of** $\bar{a}$, such that $$ @@ -285,7 +283,7 @@ equivalently if its inverse is well defined on the price set $$ -P \equiv \bigl\{\, p(\mu_y) : y \in Y,\; +\mathcal{P} \equiv \bigl\{\, p(\mu_y) : y \in Y,\; P(\tilde{y} = y) = \sum_{a \in A} \phi_a(y)\,\mu_0(a) > 0 \bigr\}. $$ ``` @@ -360,7 +358,7 @@ when is the belief-to-price map actually one-to-one? When does the belief-to-price map fail to be invertible? -Theorem {prf:ref}`ime_theorem_invertibility_conditions` +{prf:ref}`ime_theorem_invertibility_conditions` shows that for a two-state economy ($S = 2$), the answer depends on the **elasticity of substitution** $\sigma$ of agent 1's utility function. @@ -372,25 +370,21 @@ argument. ```{prf:lemma} Same Price Implies Same Allocation :label: ime_lemma_same_price_same_allocation -Fix the beliefs of all agents except agent 1. +Assume that $u^i$ has continuous first partial derivatives +and that $u^i$ is quasi-concave. Let $p\in\mathcal{P}$. If there exist two measures $\mu^*$ and $\mu'$ in $M$ such that $p(\mu*, P^2, . . . ,P^n), = p(\mu',P^2, ... 
,P^n)=p$, then + +$$ +x^i(\mu^*, P^2, \dots, P^n) = x^i(\mu', P^2, \dots, P^n), \quad +i = 1, \dots, n +$$ +``` + +This lemma says that fix the beliefs of all agents except agent 1. If two posterior beliefs $\mu$ and $\mu'$ both generate the same equilibrium price $p$, then they generate the same equilibrium allocation for every trader. -``` - -```{prf:proof} (Sketch) -All uninformed agents face the same price $p$ and keep the same beliefs, so -their demands -are unchanged. - -The firm's supply is also unchanged because it depends only on $p$. - -Market clearing then pins down agent 1's demand as the residual, so agent 1 must -consume the -same bundle under $\mu$ and $\mu'$ as well. -``` This lemma lets us define the informed agent's equilibrium bundle as a function of price @@ -421,7 +415,7 @@ solution $\mu \in M$, then the price map is invertible on $P$. ``` -This is Lemma 3 in the paper: if two different posteriors gave the same price, +If two different posteriors gave the same price, then by {prf:ref}`ime_lemma_same_price_same_allocation` they would share the same bundle $x(p)$, @@ -476,9 +470,6 @@ making agent 1's demand for good 1 independent of information about $\bar{a}$. So the market price cannot reveal that information. -The general theorem is abstract, so we now specialize to CES utility to make the -mechanism concrete. - ### CES utility For concreteness we work with the **constant-elasticity-of-substitution** (CES) @@ -528,7 +519,7 @@ $$ For Cobb-Douglas utility ($\sigma = 1$), the first-order condition becomes $p = W^1 - p$, -giving $p^* = W^1/2$ regardless of the posterior $q$—confirming that no +giving $p^* = W^1/2$ regardless of the posterior $q$, confirming that no information is transmitted through the price in the Cobb-Douglas case. @@ -601,13 +592,13 @@ plt.show() The plot confirms {prf:ref}`ime_theorem_invertibility_conditions`. 
-- *CES with $\sigma \neq 1$*: the equilibrium price is **strictly monotone** in +- *CES with $\sigma \neq 1$*: the equilibrium price is *strictly monotone* in $q$. - An outside observer who knows the equilibrium map $p^*(\cdot)$ can uniquely invert the - price to recover $q$—inside information is fully transmitted. -- *Cobb-Douglas ($\sigma = 1$)*: the price is *flat* in $q$—information is never + price to recover $q$, that is, inside information is fully transmitted. +- *Cobb-Douglas ($\sigma = 1$)*: the price is *flat* in $q$, that is, information is never transmitted through the market. ```{code-cell} ipython3 @@ -714,7 +705,7 @@ plt.tight_layout() plt.show() ``` -When $\sigma = 1$ the ratio is constant across all $a_s$ values—information +When $\sigma = 1$ the ratio is constant across all $a_s$ values, information about the state has no effect on the marginal rate of substitution. For $\sigma < 1$ the @@ -843,9 +834,6 @@ g(p^{t+1} \mid p^1, \ldots, p^t) h(\lambda \mid p^1, \ldots, p^t). $$ -With the posterior and predictive density defined, we can state the paper's -convergence result. - ### The convergence theorem ```{prf:theorem} Bayesian Convergence @@ -1141,7 +1129,7 @@ t_grid = np.arange(T + 1) fig, axes = plt.subplots(1, 3, figsize=(13, 4), sharey=True) struct_labels = [ r"$\lambda^{(1)}$", - r"$\lambda^{(2)}$ (same reduced form as $\lambda^{(1)}$)", + r"$\lambda^{(2)}$", r"$\lambda^{(3)}$", ] @@ -1377,166 +1365,25 @@ $D_{KL}$). ```{exercise} :label: km_ex3 -The paper constructs a -counterexample showing that for $S = 3$ states, even if the elasticity of -substitution -of $u^1$ is everywhere greater than one, the price map need **not** be -invertible. - -Consider the marginal rate of substitution for the portfolio utility -$u^1(a_s x_1 + x_2)$ (infinite elasticity of substitution) and three states -$a_1 > a_2 > a_3$. 
- -The MRS is - -$$ -m(\mu) -= \frac{a_1\beta_1\mu(a_1) + a_2\beta_2\mu(a_2) + a_3\beta_3\mu(a_3)} - {\beta_1\mu(a_1) + \beta_2\mu(a_2) + \beta_3\mu(a_3)}, -$$ - -where $\beta_s = u^{1\prime}(a_s x_1 + x_2)$. - -1. For the parameterization used by {cite:t}`kihlstrom_mirman1975`—let -$\mu(a_3) = q$, $\mu(a_2) = r$, $\mu(a_1) = 1-r-q$—write $m$ as a function of -$(q, r)$. -Compute $\partial m / \partial r$ and show that its sign depends on -$\beta_1\beta_2(a_1-a_2)$ and $\beta_2\beta_3(a_2-a_3)$. - -1. Choose $a_1 = 3$, $a_2 = 2$, $a_3 = 0.5$ and $u'(c) = c^{-\gamma}$ (CRRA with - risk -aversion $\gamma$). Fix $x_1 = 1$, $x_2 = 0.5$. For $\gamma = 2$, verify -numerically -that $\partial m/\partial r$ changes sign (i.e., $m$ is *not* globally monotone -in $r$), -giving a counterexample to invertibility. - -1. Explain why this non-monotonicity does *not* arise in the two-state case $S = - 2$. -``` - -```{solution-start} km_ex3 -:class: dropdown -``` - -**1.** Rewrite the MRS with $\mu_1 = 1-r-q$: - -$$ -m(q,r) = \frac{a_1\beta_1(1-r-q) + a_2\beta_2 r + a_3\beta_3 q} - {\beta_1(1-r-q) + \beta_2 r + \beta_3 q}. -$$ - -Differentiating using the quotient rule (denominator $D$): - -$$ -\frac{\partial m}{\partial r} -= \frac{(a_2\beta_2 - a_1\beta_1)D - (a_1\beta_1(1-r-q)+a_2\beta_2 r+a_3\beta_3 -q)(\beta_2-\beta_1)}{D^2}. -$$ - -After simplification this reduces to a signed combination of -$\beta_1\beta_2(a_1-a_2)({\cdot})$ and $\beta_2\beta_3(a_2-a_3)({\cdot})$ terms -whose sign is parameter-dependent. - -**2. 
Numerical verification.** - -```{code-cell} ipython3 -def mrs_3state(q, r, a1, a2, a3, x1, x2, γ): - """Return the three-state MRS at (q, r).""" - μ1, μ2, μ3 = 1 - r - q, r, q - β1 = (a1 * x1 + x2)**(-γ) - β2 = (a2 * x1 + x2)**(-γ) - β3 = (a3 * x1 + x2)**(-γ) - num = a1 * β1 * μ1 + a2 * β2 * μ2 + a3 * β3 * μ3 - den = β1 * μ1 + β2 * μ2 + β3 * μ3 - return num / den - -a1, a2, a3 = 3.0, 2.0, 0.5 -x1, x2 = 1.0, 0.5 -γ = 2.0 -q_fix = 0.1 -r_grid = np.linspace(0.05, 0.80, 200) - -# Valid region: q + r <= 1. -r_valid = r_grid[r_grid + q_fix <= 0.95] -m_vals = [mrs_3state(q_fix, r, a1, a2, a3, x1, x2, γ) for r in r_valid] -dm_dr = np.gradient(m_vals, r_valid) - -fig, axes = plt.subplots(1, 2, figsize=(11, 4)) -axes[0].plot(r_valid, m_vals, color="steelblue", lw=2) -axes[0].set_xlabel(r"$r = \mu(a_2)$", fontsize=12) -axes[0].set_ylabel("MRS m(q, r)", fontsize=12) -axes[0].set_title(f"MRS is non-monotone in r (CRRA gamma={γ})", fontsize=12) - -axes[1].plot(r_valid, dm_dr, color="crimson", lw=2) -axes[1].axhline(0, color="black", lw=1, ls="--") -axes[1].set_xlabel(r"$r = \mu(a_2)$", fontsize=12) -axes[1].set_ylabel(r"$\partial m / \partial r$", fontsize=12) -axes[1].set_title( - "Derivative changes sign - non-invertibility for $S=3$", - fontsize=12, -) - -plt.tight_layout() -plt.show() - -print("Sign changes in dm/dr:", - np.sum(np.diff(np.sign(dm_dr)) != 0)) -``` - -The derivative $\partial m / \partial r$ changes sign, confirming that the MRS -(and hence -the equilibrium price) is **not** monotone in $r$ for $S = 3$. - -**3.** In the two-state case $S = 2$, the prior is parameterized by a single -scalar $q$ and the MRS is a function of $q$ alone. - -One can show directly that $\partial m / \partial q$ has a definite sign -determined entirely by whether $a_1 > a_2$ and whether $\sigma > 1$ or $\sigma < -1$ hold, so there is no room for sign changes. 
- -With three states, the two-dimensional prior $(q, r)$ allows richer interactions -between $\beta_s$ values that can reverse the sign of the derivative. - -```{solution-end} -``` - -```{exercise} -:label: km_ex4 - {prf:ref}`ime_theorem_bayesian_convergence` assumes the true distribution $g(\cdot \mid \bar\lambda)$ is in the support of the prior (i.e., $h(\bar\lambda) > 0$). -Investigate what happens when the true model is **not** in the +Investigate what happens when the true model is *not* in the prior support. -1. Simulate $T = 1,000$ periods of prices from $N(2.0, 0.4^2)$ but use a prior +Simulate $T = 1,000$ periods of prices from $N(2.0, 0.4^2)$ but use a prior that places equal weight on two *wrong* models: $N(1.5, 0.4^2)$ and $N(2.3, 0.4^2)$. - - Plot the posterior weight on each model over time. - -2. Show that the **predictive** (mixture) price distribution converges to the - *closest* - model in KL divergence terms. +Plot the posterior weight on each model over time. - - Compute the KL divergence from the true model to each wrong model. - - Verify numerically that the posterior concentrates on the closer wrong - model and that - the predictive mean converges to that model's mean. - -3. Relate this finding to the Bayesian consistency literature: when is the limit - distribution a good approximation to the true distribution even under - misspecification? - Why is the symmetric pair $N(1.5, 0.4^2)$ and $N(2.5, 0.4^2)$ a - knife-edge case rather - than a setting with a deterministic 50-50 posterior limit? +Discuss your findings. ``` -```{solution-start} km_ex4 +```{solution-start} km_ex3 :class: dropdown ``` @@ -1599,7 +1446,7 @@ for ax, k, label in zip(axes, [0, 1], labels): plt.tight_layout() plt.show() -# Predictive density and mean along the median posterior path. 
+# Predictive density and mean along the median posterior path median_path = np.median(h_misspec, axis=0) p_grid = np.linspace(0.0, 3.5, 300) closer_idx = np.argmin(kl_vals) @@ -1639,16 +1486,10 @@ D_{KL}\bigl(N(2.0, 0.4^2)\,\|\,N(2.3, 0.4^2)\bigr) D_{KL}\bigl(N(2.0, 0.4^2)\,\|\,N(1.5, 0.4^2)\bigr), $$ -so the model with mean $2.3$ is the unique KL-best approximation among the two -wrong models, and in the simulation posterior weight concentrates on that model -while the predictive mean converges to $2.3$, not to the true mean $2.0$. +so the model with mean $2.3$ is the KL-best approximation among the two +wrong models, and in the simulation posterior weight concentrates on that model. -This is an instance of the general result that under -misspecification, Bayesian posteriors converge to the distribution in the model -class that -minimizes KL divergence from the model actually generating the data. - -The connection is that posterior odds are cumulative likelihood ratios. +Since posterior odds are cumulative {doc}`likelihood ratios`. If we compare the two wrong Gaussian models $f$ and $g$, then under the true distribution $h$ the average log likelihood ratio satisfies @@ -1660,17 +1501,5 @@ $$ So if $f$ is KL-closer to $h$ than $g$ is, $\log L_t$ has positive drift and posterior odds tilt toward $f$. -That is exactly the mechanism emphasized in {doc}`Likelihood Ratio Processes -`. - -The lecture {doc}`likelihood_bayes` gives the Bayesian version of the same -argument by showing how the posterior is a monotone transform of the likelihood -ratio process. - -The symmetric pair $N(1.5, 0.4^2)$ and $N(2.5, 0.4^2)$ is different because both -wrong models are equally far from the truth in KL terms, so there is no unique -pseudo-true model and that knife-edge symmetry does **not** imply a -deterministic 50-50 posterior limit. 
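The drift claim can be verified directly from the closed-form KL divergence
between Gaussians. The sketch below reuses only the exercise's values (true
model $N(2.0, 0.4^2)$ and wrong models $N(1.5, 0.4^2)$, $N(2.3, 0.4^2)$);
`kl_gauss` is a helper written here for illustration, not a function defined
elsewhere in the lecture:

```python
import numpy as np

def kl_gauss(mu0, sig0, mu1, sig1):
    # closed-form KL( N(mu0, sig0^2) || N(mu1, sig1^2) )
    return np.log(sig1 / sig0) + (sig0**2 + (mu0 - mu1)**2) / (2 * sig1**2) - 0.5

mu_h, sig = 2.0, 0.4                    # true model h = N(2.0, 0.4^2)
kl_hf = kl_gauss(mu_h, sig, 2.3, sig)   # distance to wrong model f = N(2.3, 0.4^2)
kl_hg = kl_gauss(mu_h, sig, 1.5, sig)   # distance to wrong model g = N(1.5, 0.4^2)

# E_h[log f - log g] = KL(h||g) - KL(h||f): positive, so odds drift toward f
drift = kl_hg - kl_hf
print(round(kl_hf, 5), round(kl_hg, 5), round(drift, 5))   # → 0.28125 0.78125 0.5
```

The positive drift is exactly the per-period expected log likelihood ratio, so
posterior odds on the KL-closer model grow linearly in $t$ on average.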
- ```{solution-end} ``` From 5f6caf311a29dd7f089ae79396fb65530c68df78 Mon Sep 17 00:00:00 2001 From: thomassargent30 Date: Wed, 22 Apr 2026 11:25:54 -0600 Subject: [PATCH 09/26] Tom's April 22 edit of misspecified recovery lecture --- lectures/misspecified_recovery.md | 100 ++++++++++++++------ lectures/misspecified_recovery_extra.bib | 111 ----------------------- 2 files changed, 72 insertions(+), 139 deletions(-) delete mode 100644 lectures/misspecified_recovery_extra.bib diff --git a/lectures/misspecified_recovery.md b/lectures/misspecified_recovery.md index c9d40c948..60eabc320 100644 --- a/lectures/misspecified_recovery.md +++ b/lectures/misspecified_recovery.md @@ -29,14 +29,20 @@ kernelspec: ## Overview Asset prices are forward-looking: they encode investors' expectations about future economic -states and their valuations of different risks. A long-standing question in finance is -whether we can *recover* the probability distribution used by investors — their subjective +states and their valuations of different risks. + +A long-standing question in finance is +whether one can *recover* the probability distribution used by investors — their subjective beliefs — from observed asset prices alone. {cite}`BorovickaHansenScheinkman2016` study the challenge of separating investors' -beliefs from their risk preferences using **Perron–Frobenius theory**. The key finding +beliefs from their risk preferences using **Perron–Frobenius theory**. + +Their key finding is that Perron–Frobenius theory applied to Arrow prices recovers a **long-term risk-neutral -measure** that absorbs all long-horizon risk adjustments. This recovered measure coincides +measure** that absorbs all long-horizon risk adjustments. + +This recovered measure coincides with investors' subjective beliefs only under a stringent — and often empirically implausible — restriction on the stochastic discount factor. 
@@ -84,30 +90,41 @@ plt.rcParams.update({ ### Arrow prices and stochastic discount factors Consider a discrete-time economy with an $n$-state Markov chain $\{X_t\}$ governed -by transition matrix $\mathbf{P} = [p_{ij}]$. An **Arrow price** $q_{ij}$ is the +by transition matrix $\mathbf{P} = [p_{ij}]$. + +An **Arrow price** $q_{ij}$ is the date-$t$ price of a claim that pays $\$1$ tomorrow in state $j$ given that the current -state is $i$. We collect these prices in a matrix $\mathbf{Q} = [q_{ij}]$. +state is $i$. + +Collect these prices in a matrix $\mathbf{Q} = [q_{ij}]$. A **stochastic discount factor** (SDF) $s_{ij}$ prices risk by discounting the payoff -in state $j$ tomorrow when today's state is $i$. Arrow prices and the SDF are linked by +in state $j$ tomorrow when today's state is $i$. + +Arrow prices and the SDF are linked by $$ q_{ij} = s_{ij} \, p_{ij}. $$ Given $\mathbf{Q}$, any pair $(\mathbf{S}, \mathbf{P})$ satisfying $q_{ij} = s_{ij} p_{ij}$ -for all $(i,j)$ is consistent with the observed prices. The fundamental identification +for all $(i,j)$ is consistent with the observed prices. + +The fundamental identification problem is that $\mathbf{Q}$ has $n^2$ entries, $\mathbf{P}$ has $n(n-1)$ free entries (rows sum to one), and $\mathbf{S}$ has $n^2$ free entries — so there are far more unknowns than equations. -To make progress, we can impose restrictions on the SDF. Two classical restrictions are +To make progress, we can impose restrictions on the SDF. + +Two classical restrictions are studied in the sections that follow. ### A three-state illustration To build intuition, we work with a three-state Markov chain representing **recession**, **normal**, and **expansion** phases of the business cycle. + The physical transition matrix and consumption levels are: ```{code-cell} ipython3 @@ -148,8 +165,11 @@ $$ $$ where $\bar{q}_i = \sum_j q_{ij}$ is the price of a one-period discount bond in state $i$. 
+ Under this restriction all future states are discounted equally from state $i$, so risk -adjustments depend only on the current state. The resulting risk-neutral probabilities are +adjustments depend only on the current state. + +The resulting risk-neutral probabilities are $$ \bar{p}_{ij} = \frac{q_{ij}}{\bar{q}_i}. @@ -207,9 +227,12 @@ $$ $$ This is an **eigenvalue–eigenvector problem** for the Arrow price matrix $\mathbf{Q}$. + By the **Perron–Frobenius theorem**, if $\mathbf{Q}$ has strictly positive entries, the dominant eigenvalue is unique, real, and positive, and its eigenvector has strictly -positive entries. This gives a unique construction: +positive entries. + +This gives a unique construction: 1. Solve $\mathbf{Q} \hat{\mathbf{e}} = \exp(\hat{\eta}) \hat{\mathbf{e}}$ for the dominant eigenvalue–eigenvector pair. @@ -342,7 +365,9 @@ for lbl, pi in zip(labels, [pi_phys, pi_bar, pi_hat]): The long-term risk-neutral measure $\hat{\mathbf{P}}$ assigns **higher weight to bad states** (recession) and **lower weight to good states** (expansion) than the physical -measure $\mathbf{P}$. This is the risk adjustment for long-run growth uncertainty: a +measure $\mathbf{P}$. + +This is the risk adjustment for long-run growth uncertainty: a risk-averse investor's long-run discount rates embed a premium for permanent income risk. ## The Martingale Decomposition @@ -359,7 +384,9 @@ $$ $$ Because $\sum_j \hat{h}_{ij} p_{ij} = \sum_j \hat{p}_{ij} = 1$, the process $\hat{H}$ -is a martingale under the physical measure $\mathbf{P}$. The accumulated SDF then admits +is a martingale under the physical measure $\mathbf{P}$. + +The accumulated SDF then admits the **multiplicative decomposition**: $$ @@ -399,7 +426,9 @@ print(f"\nMartingale property check — E[ĥ | X_t=i] = {mart_check}") Higher risk aversion amplifies the pessimistic distortion: as $\gamma$ increases, the recovered measure assigns growing probability to the recession state. 
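This amplification can be reproduced outside the lecture's calibration in a few
lines. The sketch below uses a made-up three-state chain with unit-root
consumption growth, so the SDF $s_{ij} = e^{-\delta} g_j^{-\gamma}$ is *not*
transition independent; `recover` is a small helper written for this
illustration only:

```python
import numpy as np

# toy physical chain and growth factors (made-up numbers, not the calibration above)
P = np.array([[0.5, 0.4, 0.1],
              [0.2, 0.6, 0.2],
              [0.1, 0.4, 0.5]])      # recession, normal, expansion
g = np.array([0.98, 1.01, 1.04])    # consumption growth factor in each state
delta = 0.02

def recover(Q):
    # Perron–Frobenius measure implied by a positive Arrow price matrix Q
    vals, vecs = np.linalg.eig(Q)
    k = np.argmax(vals.real)        # dominant (Perron) root
    e = np.abs(vecs[:, k].real)     # positive eigenvector
    return Q * e[None, :] / (vals[k].real * e[:, None])

recession_weight = []
for gamma in (2.0, 5.0, 10.0):
    Q = np.exp(-delta) * g[None, :]**(-gamma) * P
    P_hat = recover(Q)
    assert np.allclose(P_hat.sum(axis=1), 1.0)
    recession_weight.append(P_hat[1, 0])   # normal -> recession under P_hat

# the recovered measure overweights recession, increasingly so in gamma
assert all(w > P[1, 0] for w in recession_weight)
assert recession_weight[0] < recession_weight[1] < recession_weight[2]
```

The asserted ordering is the point: the pessimistic tilt of $\hat{\mathbf{P}}$
relative to $\mathbf{P}$ grows monotonically with risk aversion in this toy
example.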
-(The figures illustrating this appear below, after we define the Epstein–Zin utility + + +(Gigures illustrating this will appear below, after we define the Epstein–Zin utility function that is needed to compute them.) ## When Does Recovery Succeed? @@ -413,7 +442,9 @@ $$ $$ for some positive function $m$ and discount rate $\delta$ (Condition 4 in -{cite}`BorovickaHansenScheinkman2016`). Under this restriction, the SDF has **no +{cite}`BorovickaHansenScheinkman2016`). + +Under this restriction, the SDF has **no martingale component**: $\hat{H}_t \equiv 1$. Equivalently, recovery succeeds if and only if the physical stochastic discount factor @@ -431,7 +462,9 @@ The critical question is: when is the martingale component degenerate? ### Power utility with trend-stationary consumption Consider a power-utility investor with risk aversion $\gamma$ and *trend-stationary* -consumption $C_t = \exp(g_c t)(c \cdot X_t)$ where $c$ is a positive vector. The +consumption $C_t = \exp(g_c t)(c \cdot X_t)$ where $c$ is a positive vector. + +The one-period SDF is $$ @@ -439,7 +472,9 @@ s_{ij} = \exp(-\delta - \gamma g_c) \left(\frac{c_j}{c_i}\right)^{-\gamma}. $$ This has the exact long-term risk pricing form with $\hat{e}_j = c_j^\gamma$ and -$\hat{\eta} = -(\delta + \gamma g_c)$. Therefore $\hat{h}_{ij} \equiv 1$ and **Ross +$\hat{\eta} = -(\delta + \gamma g_c)$. + +Therefore $\hat{h}_{ij} \equiv 1$ and **Ross recovery succeeds exactly** when consumption fluctuations around a deterministic trend are the only source of risk. @@ -485,7 +520,9 @@ s_{ij} = \exp(-\delta - g_c)\frac{c_i}{c_j} $$ where $v^*_i = \exp\!\bigl[(1-\gamma)v_i\bigr]$ and $\mathbf{P}_i$ is the $i$-th row of -$\mathbf{P}$. The additional factor $v^*_j/(\mathbf{P}_i v^*)$ introduces a **nontrivial +$\mathbf{P}$. + +The additional factor $v^*_j/(\mathbf{P}_i v^*)$ introduces a **nontrivial martingale component** whenever continuation values vary across states. 
```{code-cell} ipython3 @@ -640,7 +677,9 @@ plt.tight_layout(); plt.show() ``` At $\gamma = 1$ (log utility), the continuation value is constant across states and the -martingale is trivial, so recovery succeeds. For $\gamma > 1$, continuation values vary +martingale is trivial, so recovery succeeds. + +For $\gamma > 1$, continuation values vary with the state, generating a non-degenerate martingale that grows with risk aversion. ## The Long-Run Risk Model @@ -660,10 +699,13 @@ dX_{2t} &= \bar{\mu}_{22}(X_{2t} - \iota_2)\,dt + \sqrt{X_{2t}}\,\bar{\sigma}_2 \end{aligned} $$ -where $W_t$ is a three-dimensional Brownian motion. Here $X_{1t}$ is the +where $W_t$ is a three-dimensional Brownian motion. + +Here $X_{1t}$ is the **predictable component of consumption growth** and $X_{2t}$ is **stochastic volatility**. The representative agent has Epstein–Zin preferences with unit elasticity of substitution. + The stochastic discount factor satisfies $$ @@ -695,6 +737,7 @@ lrr_params = dict( ### Solving the value function The log continuation value $v(X_t)$ is affine in the state: $v(x) = \bar{v}_0 + \bar{v}_1 x_1 + \bar{v}_2 x_2$. + The coefficients satisfy the algebraic system in Appendix D of {cite}`BorovickaHansenScheinkman2016`. ```{code-cell} ipython3 @@ -981,7 +1024,9 @@ plt.tight_layout(); plt.show() ``` The recovered measure $\hat{P}$ concentrates around **lower mean growth** (more negative -$X_1$) and **higher conditional volatility** (larger $X_2$). Forecasts made using +$X_1$) and **higher conditional volatility** (larger $X_2$). + +Forecasts made using $\hat{P}$ are systematically pessimistic compared to forecasts based on the true distribution $P$. @@ -990,7 +1035,9 @@ distribution $P$. ### Entropy bounds Even without observing the full array of Arrow prices, we can obtain **lower bounds** -on the size of the martingale component. For a convex function +on the size of the martingale component. 
+ +For a convex function $\phi_\theta(r) = [(r)^{1+\theta} - 1] / [\theta(1+\theta)]$, the discrepancy between $\hat{P}$ and $P$ satisfies @@ -999,7 +1046,9 @@ $$ \geq 0, $$ -with equality if and only if the martingale is trivial. Two special cases are: +with equality if and only if the martingale is trivial. + +Two special cases are: - **$\theta = -1$**: $\phi_{-1}(r) = -\log r$, so $\lambda_{-1} = -E[\log(\hat{H}_{t+1}/\hat{H}_t)]$ is the **expected log-likelihood** (entropy). - **$\theta = 1$**: $\lambda_1 = \tfrac{1}{2}\mathrm{Var}[\hat{H}_{t+1}/\hat{H}_t]$. @@ -1341,8 +1390,3 @@ print("→ Recovery is exact for trend-stationary power utility.") ```{solution-end} ``` -## References - -```{bibliography} -:filter: docname in docnames -``` diff --git a/lectures/misspecified_recovery_extra.bib b/lectures/misspecified_recovery_extra.bib deleted file mode 100644 index 345a7011c..000000000 --- a/lectures/misspecified_recovery_extra.bib +++ /dev/null @@ -1,111 +0,0 @@ -% Supplementary bibliography for misspecified_recovery.md -% Add these entries to lectures/_static/quant-econ.bib - -@article{BorovickaHansenScheinkman2016, - author = {Borovička, Jaroslav and Hansen, Lars Peter and Scheinkman, José A.}, - title = {Misspecified Recovery}, - journal = {Journal of Finance}, - volume = {71}, - number = {6}, - pages = {2493--2544}, - year = {2016}, - doi = {10.1111/jofi.12404} -} - -@article{Ross2015, - author = {Ross, Stephen A.}, - title = {The Recovery Theorem}, - journal = {Journal of Finance}, - volume = {70}, - number = {2}, - pages = {615--648}, - year = {2015}, - doi = {10.1111/jofi.12092} -} - -@article{HansenScheinkman2009, - author = {Hansen, Lars Peter and Scheinkman, José A.}, - title = {Long-Term Risk: An Operator Approach}, - journal = {Econometrica}, - volume = {77}, - number = {1}, - pages = {177--234}, - year = {2009}, - doi = {10.3982/ECTA6761} -} - -@article{AlvarezJermann2005, - author = {Alvarez, Fernando and Jermann, Urban J.}, - title = {Using Asset 
Prices to Measure the Persistence of the Marginal Utility - of Wealth}, - journal = {Econometrica}, - volume = {73}, - number = {6}, - pages = {1977--2016}, - year = {2005}, - doi = {10.1111/j.1468-0262.2005.00643.x} -} - -@article{BakshiChabiYo2012, - author = {Bakshi, Gurdip and Chabi-Yo, Fousseni}, - title = {Variance Bounds on the Permanent and Transitory Components of - Stochastic Discount Factors}, - journal = {Journal of Financial Economics}, - volume = {105}, - number = {1}, - pages = {191--208}, - year = {2012}, - doi = {10.1016/j.jfineco.2011.10.004} -} - -@article{BackusGregoryZin1989, - author = {Backus, David K. and Gregory, Allan W. and Zin, Stanley E.}, - title = {Risk Premiums in the Term Structure: Evidence from Artificial Economies}, - journal = {Journal of Monetary Economics}, - volume = {24}, - number = {3}, - pages = {371--399}, - year = {1989}, - doi = {10.1016/0304-3932(89)90033-X} -} - -@article{Hansen2012, - author = {Hansen, Lars Peter}, - title = {Dynamic Valuation Decomposition within Stochastic Economies}, - journal = {Econometrica}, - volume = {80}, - number = {3}, - pages = {911--967}, - year = {2012}, - note = {Fisher--Schultz Lecture at the European Meetings of the Econometric Society}, - doi = {10.3982/ECTA8070} -} - -@article{KrepsPorteus1978, - author = {Kreps, David M. and Porteus, Evan L.}, - title = {Temporal Resolution of Uncertainty and Dynamic Choice Theory}, - journal = {Econometrica}, - volume = {46}, - number = {1}, - pages = {185--200}, - year = {1978}, - doi = {10.2307/1913656} -} - -@article{HansenScheinkman2014, - author = {Hansen, Lars Peter and Scheinkman, José A.}, - title = {Stochastic Compounding and Uncertain Valuation}, - year = {2014}, - note = {Working paper, University of Chicago, Columbia University, Princeton University} -} - -@article{BackusChernovZin2014, - author = {Backus, David K. 
and Chernov, Mikhail and Zin, Stanley E.}, - title = {Sources of Entropy in Representative Agent Models}, - journal = {Journal of Finance}, - volume = {69}, - number = {1}, - pages = {51--99}, - year = {2014}, - doi = {10.1111/jofi.12090} -} From 3d9e1eb4a7556404edd554cbd906011c9bc571c9 Mon Sep 17 00:00:00 2001 From: thomassargent30 Date: Wed, 22 Apr 2026 12:31:47 -0600 Subject: [PATCH 10/26] Add Ross Recovery Theorem lecture - Add lectures/ross_recovery.md: new QuantEcon lecture on Ross (2015) covering the Recovery Theorem, Perron-Frobenius approach, natural vs risk-neutral distributions, tail risk, and efficient markets tests - Add ross_recovery to _toc.yml before misspecified_recovery in asset pricing section - Add 7 new BibTeX entries to quant-econ.bib: BreedenLitzenberger1978, CarrYu2012, BlackScholes1973, Merton1973, CoxRossRubinstein1979, JackwerthRubinstein1996, Weitzman2007 --- lectures/_static/quant-econ.bib | 77 +++ lectures/_toc.yml | 1 + lectures/ross_recovery.md | 1030 +++++++++++++++++++++++++++++++ 3 files changed, 1108 insertions(+) create mode 100644 lectures/ross_recovery.md diff --git a/lectures/_static/quant-econ.bib b/lectures/_static/quant-econ.bib index b178ec0ae..1c88e19c9 100644 --- a/lectures/_static/quant-econ.bib +++ b/lectures/_static/quant-econ.bib @@ -1,3 +1,80 @@ +@article{BreedenLitzenberger1978, + author = {Breeden, Douglas T. 
and Litzenberger, Robert H.}, + title = {Prices of State-Contingent Claims Implicit in Option Prices}, + journal = {Journal of Business}, + volume = {51}, + number = {4}, + pages = {621--651}, + year = {1978}, + doi = {10.1086/296025} +} + +@article{CarrYu2012, + author = {Carr, Peter and Yu, Jiming}, + title = {Risk, Return, and {Ross} Recovery}, + journal = {Journal of Derivatives}, + volume = {20}, + number = {1}, + pages = {38--59}, + year = {2012}, + doi = {10.3905/jod.2012.20.1.038} +} + +@article{BlackScholes1973, + author = {Black, Fischer and Scholes, Myron}, + title = {The Pricing of Options and Corporate Liabilities}, + journal = {Journal of Political Economy}, + volume = {81}, + number = {3}, + pages = {637--654}, + year = {1973}, + doi = {10.1086/260062} +} + +@article{Merton1973, + author = {Merton, Robert C.}, + title = {Theory of Rational Option Pricing}, + journal = {Bell Journal of Economics and Management Science}, + volume = {4}, + number = {1}, + pages = {141--183}, + year = {1973}, + doi = {10.2307/3003143} +} + +@article{CoxRossRubinstein1979, + author = {Cox, John C. and Ross, Stephen A. 
and Rubinstein, Mark}, + title = {Option Pricing: A Simplified Approach}, + journal = {Journal of Financial Economics}, + volume = {7}, + number = {3}, + pages = {229--263}, + year = {1979}, + doi = {10.1016/0304-405X(79)90015-1} +} + +@article{JackwerthRubinstein1996, + author = {Jackwerth, Jens Carsten and Rubinstein, Mark}, + title = {Recovering Probability Distributions from Option Prices}, + journal = {Journal of Finance}, + volume = {51}, + number = {5}, + pages = {1611--1631}, + year = {1996}, + doi = {10.1111/j.1540-6261.1996.tb05219.x} +} + +@article{Weitzman2007, + author = {Weitzman, Martin L.}, + title = {Subjective Expectations and Asset-Return Puzzles}, + journal = {American Economic Review}, + volume = {97}, + number = {4}, + pages = {1102--1130}, + year = {2007}, + doi = {10.1257/aer.97.4.1102} +} + @article{BorovickaHansenScheinkman2016, author = {Borovička, Jaroslav and Hansen, Lars Peter and Scheinkman, José A.}, title = {Misspecified Recovery}, diff --git a/lectures/_toc.yml b/lectures/_toc.yml index b24d80c17..51aa52861 100644 --- a/lectures/_toc.yml +++ b/lectures/_toc.yml @@ -141,6 +141,7 @@ parts: - file: harrison_kreps - file: morris_learn - file: affine_risk_prices + - file: ross_recovery - file: misspecified_recovery - caption: Data and Empirics numbered: true diff --git a/lectures/ross_recovery.md b/lectures/ross_recovery.md new file mode 100644 index 000000000..2c966c45e --- /dev/null +++ b/lectures/ross_recovery.md @@ -0,0 +1,1030 @@ +--- +jupytext: + text_representation: + extension: .md + format_name: myst + format_version: 0.13 + jupytext_version: 1.17.1 +kernelspec: + display_name: Python 3 (ipykernel) + language: python + name: python3 +--- + +(ross_recovery)= +```{raw} jupyter + +``` + +# The Recovery Theorem + +```{contents} Contents +:depth: 2 +``` + +## Overview + +From option prices we can extract risk-neutral (martingale) probabilities of +future outcomes. 
But risk-neutral probabilities blend two things: the market's +*actual* probability beliefs and investors' *risk aversion*. Disentangling the +two has long seemed impossible without imposing parametric assumptions on +preferences. + +{cite}`Ross2015` showed that under a key assumption — the **transition +independence** of the pricing kernel — the natural (real-world) probability +distribution and the pricing kernel can be uniquely recovered from state prices +alone, without historical return data or parametric utility functions. This +result is called the **Recovery Theorem**. + +The theorem has several important implications. + +* It enables model-free extraction of the market's forward-looking probability + distribution from option prices. +* It provides model-free tests of the efficient market hypothesis. +* It sheds light on the "dark matter" of finance: the probability of rare + catastrophic events embedded in market prices. + +This lecture covers + +* The basic Arrow–Debreu framework linking state prices, the pricing kernel, + and natural probabilities. +* The Recovery Theorem and its proof via the Perron–Frobenius theorem. +* A computational implementation that recovers the natural distribution from a + simulated state-price matrix. +* Comparisons between risk-neutral and recovered natural densities. + +Let's import the packages we'll need. + +```{code-cell} ipython3 +import numpy as np +import matplotlib.pyplot as plt +from scipy.linalg import eig +from scipy.stats import norm +import matplotlib.cm as cm + +plt.rcParams['figure.figsize'] = (10, 6) +plt.rcParams['axes.grid'] = True +plt.rcParams['grid.alpha'] = 0.3 +``` + +## Model Setup + +### Arrow–Debreu State Prices + +Consider a discrete-time, discrete-state economy. At each date the economy +occupies one of $m$ states $\theta_1, \ldots, \theta_m$. An **Arrow–Debreu +security** pays \$1 if the economy is in state $\theta_j$ next period and +nothing otherwise. 
+ +Denote by $p(\theta_i, \theta_j)$ the price today, when the current state is +$\theta_i$, of the Arrow–Debreu security paying in state $\theta_j$ next +period. Collect these into an $m \times m$ **state price transition matrix** + +$$ +P = [p(\theta_i, \theta_j)]_{i,j=1}^m. +$$ + +The row sums give the state-dependent interest factor: $\sum_j p(\theta_i, +\theta_j) = e^{-r(\theta_i)}$. + +### The Pricing Kernel + +From the Fundamental Theorem of Asset Pricing, the pricing kernel +$\phi(\theta_i, \theta_j)$ relates state prices to natural probabilities via + +$$ +p(\theta_i, \theta_j) = \phi(\theta_i, \theta_j) \, f(\theta_i, \theta_j), +$$ + +where $f(\theta_i, \theta_j)$ is the natural (conditional) probability of +transitioning from state $\theta_i$ to $\theta_j$. + +In the canonical representative-agent model with additively separable utility +and discount factor $\delta$, the first-order condition gives + +$$ +\phi(\theta_i, \theta_j) = \frac{p(\theta_i, \theta_j)}{f(\theta_i, \theta_j)} + = \frac{\delta U'(c(\theta_j))}{U'(c(\theta_i))}. +$$ + +The key structural property this implies is **transition independence**. + +### Transition Independence + +```{note} +**Definition.** A pricing kernel is *transition independent* if there exists a +positive function $h$ on the state space and a positive scalar $\delta$ such +that for every transition from state $\theta_i$ to $\theta_j$, + +$$ +\phi(\theta_i, \theta_j) = \delta \, \frac{h(\theta_j)}{h(\theta_i)}. +$$ +``` + +Transition independence says the kernel depends on the *ending* state and +normalizes by the *beginning* state. It holds for any agent with +intertemporally additive separable utility (where $h = U'$) and also for +Epstein–Zin recursive preferences {cite}`Epstein_Zin1989`. + +Under transition independence, the state-price equation becomes + +$$ +p(\theta_i, \theta_j) = \delta \, \frac{h(\theta_j)}{h(\theta_i)} \, + f(\theta_i, \theta_j). 
+$$ + +In matrix notation, defining the diagonal matrix $D$ with $D_{ii} = h(\theta_i)/\delta$, + +$$ +DP = \delta F D, +$$ + +or equivalently, + +$$ +F = \frac{1}{\delta} D P D^{-1}. +$$ + +## The Recovery Theorem + +### Reducing to an Eigenvalue Problem + +Since $F$ is a stochastic matrix, its rows sum to one: $F e = e$ where $e$ +is the vector of ones. Substituting the expression for $F$: + +$$ +\frac{1}{\delta} D P D^{-1} e = e +\quad \Longrightarrow \quad +P z = \delta z, \quad z \equiv D^{-1} e. +$$ + +This is an **eigenvalue problem**: we seek a positive vector $z$ and scalar +$\delta$ satisfying $Pz = \delta z$. + +The Perron–Frobenius theorem guarantees exactly one such solution when $P$ is +nonnegative and irreducible. + +```{note} +**Theorem (Perron–Frobenius).** Every nonnegative irreducible matrix has a +unique positive eigenvector (up to scaling) and a unique largest positive +eigenvalue. +``` + +### Statement and Proof Sketch + +```{note} +**Theorem 1 (Recovery Theorem, {cite}`Ross2015`).** Suppose there is no +arbitrage, the state price transition matrix $P$ is irreducible, and the +pricing kernel is transition independent. Then there exists a *unique* +positive solution $(\delta, z, F)$ to the recovery problem. That is, for any +set of state prices there is a unique compatible natural probability transition +matrix and a unique pricing kernel. +``` + +*Proof sketch.* Because $P$ is nonnegative and irreducible, the +Perron–Frobenius theorem gives a unique positive eigenvector $z > 0$ with +positive eigenvalue $\lambda > 0$ satisfying $Pz = \lambda z$. Setting + +$$ +\delta = \lambda, \qquad D_{ii} = \frac{1}{z_i}, +$$ + +the natural probability transition matrix is uniquely recovered as + +$$ +f_{ij} = \frac{1}{\delta} \frac{z_j}{z_i} \, p_{ij}. +$$ + +One can verify that $F$ is indeed a stochastic matrix: all entries are +positive and each row sums to one. Uniqueness follows from the uniqueness of +the Perron–Frobenius eigenvector. 
$\blacksquare$
+
+### Pricing Kernel from the Eigenvector
+
+Since $f_{ij} = (1/\delta)(z_j/z_i)\, p_{ij}$, the recovered kernel values are
+
+$$
+\phi(\theta_i, \theta_j) = \frac{p(\theta_i, \theta_j)}{f(\theta_i, \theta_j)}
+  = \delta \, \frac{z_i}{z_j},
+$$
+
+so the marginal-utility weight attached to a destination state $\theta_j$ is
+proportional to $1/z_j$.
+States with high $z_j$ receive **low** kernel values, meaning the market assigns
+relatively less pricing weight per unit of probability, consistent with those
+states being "good times."
+
+### Corollary: Risk-Neutral Pricing When Rates Are State-Independent
+
+```{note}
+**Theorem 2 ({cite}`Ross2015`).** If the riskless rate is the same in all
+states ($Pe = \gamma e$ for some scalar $\gamma$), then the unique natural
+distribution consistent with recovery is the risk-neutral (martingale)
+distribution itself: $F = (1/\gamma) P$.
+```
+
+This remarkable result says that with a constant interest rate and a bounded
+irreducible state space, recovery forces risk-neutrality, a non-trivial
+restriction of the model.
+
+## Python Implementation
+
+We now implement the Recovery Theorem numerically.
+
+### Building a State Price Matrix from a Lognormal Model
+
+Following {cite}`Ross2015` Section IV, suppose the natural distribution of
+log-returns over one period is normal:
+
+$$
+\log(S_T/S_0) \sim \mathcal{N}\!\left((\mu - \tfrac{1}{2}\sigma^2)T, \sigma^2 T\right).
+$$
+
+With a CRRA pricing kernel $\phi(S_0, S_T) = e^{-\delta T}(S_T/S_0)^{-\gamma}$,
+the state price density is
+
+$$
+P_T(s, s_T) = e^{-\delta T} e^{-\gamma(s_T - s)} \,
+  n\!\left(\frac{s_T - s - (\mu - \frac{1}{2}\sigma^2)T}{\sigma\sqrt{T}}\right),
+$$
+
+where $s = \ln S_0$, $s_T = \ln S_T$, and $n(\cdot)$ is the standard normal
+density.
+
+We discretize this onto a grid of $m$ states and build the matrix $P$.
+
+```{code-cell} ipython3
+def build_state_price_matrix(mu, sigma, gamma, delta, T=1.0, n_states=11, n_sigma=5):
+    """
+    Build an m x m state price transition matrix for the lognormal / CRRA model.
+ + Parameters + ---------- + mu : float Natural expected log-return (annualised) + sigma : float Volatility (annualised) + gamma : float Coefficient of relative risk aversion + delta : float Subjective discount rate + T : float Horizon (years) for one period + n_states : int Number of discrete states + n_sigma : int Grid half-width in standard deviations + + Returns + ------- + P : (m, m) array State price matrix + states : (m,) array State values (log-return grid) + """ + # Equally-spaced grid from -n_sigma*sigma to +n_sigma*sigma + states = np.linspace(-n_sigma * sigma * np.sqrt(T), + n_sigma * sigma * np.sqrt(T), + n_states) + ds = states[1] - states[0] # grid spacing + + m = n_states + P = np.zeros((m, m)) + + drift = (mu - 0.5 * sigma**2) * T + + for i in range(m): + s_i = states[i] + for j in range(m): + s_j = states[j] + log_return = s_j - s_i + # Normal density evaluated at s_j given s_i + n_val = norm.pdf(log_return, loc=drift, scale=sigma * np.sqrt(T)) + # Pricing kernel + kernel = np.exp(-delta * T) * np.exp(-gamma * log_return) + P[i, j] = kernel * n_val * ds + + return P, states +``` + +```{code-cell} ipython3 +# Parameters matching the numerical example in Ross (2015), Section IV +mu = 0.08 # 8% annual expected return +sigma = 0.20 # 20% annual volatility +gamma = 3.0 # CRRA coefficient +delta = 0.02 # 2% annual discount rate +T = 1.0 # one-year horizon + +P, states = build_state_price_matrix(mu, sigma, gamma, delta, T, + n_states=11, n_sigma=5) + +print("State price matrix P (rows = current state, cols = future state)") +print("Row sums (should equal discount factor e^{-r}):") +print(np.round(P.sum(axis=1), 4)) +print(f"\nImplied annual interest rate: {-np.log(P[5].sum()):.4f}") +``` + +### Applying the Recovery Theorem + +The Recovery Theorem requires computing the **dominant eigenvector** of $P$. + +```{code-cell} ipython3 +def recover_natural_distribution(P): + """ + Apply the Recovery Theorem to state price matrix P. 
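Before applying this to the lognormal matrix, we can sanity-check the eigenvalue logic on a hand-built three-state economy. The stochastic matrix, marginal utilities, and discount factor below are made-up numbers: transition-independent prices $p_{ij} = \delta\,(h_j/h_i)\,f_{ij}$ should hand back $\delta$ as the Perron–Frobenius eigenvalue and an eigenvector proportional to $1/h$.

```{code-cell} ipython3
import numpy as np

# Made-up 'true' economy: stochastic matrix F, marginal utilities h
# (high in bad states), and a one-period discount factor
F_true = np.array([[0.6, 0.3, 0.1],
                   [0.2, 0.6, 0.2],
                   [0.1, 0.3, 0.6]])
h = np.array([1.5, 1.0, 0.7])
delta_true = 0.98

# Transition-independent state prices: p_ij = delta * (h_j / h_i) * f_ij
P_check = delta_true * (1.0 / h)[:, None] * h[None, :] * F_true

# The Perron–Frobenius eigenpair should return delta and z ∝ 1/h
eigvals, eigvecs = np.linalg.eig(P_check)
k = np.argmax(eigvals.real)
delta_hat = eigvals.real[k]
z_hat = np.abs(eigvecs[:, k].real)

print(np.isclose(delta_hat, delta_true))
print(np.allclose(z_hat / z_hat[0], (1.0 / h) / (1.0 / h[0])))
```

Because $P$ is similar to $\delta F$, its spectrum is $\delta$ times the spectrum of $F$, so the dominant eigenvalue is exactly the discount factor we planted.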
+
+    Returns
+    -------
+    F : (m, m) array   Recovered natural probability transition matrix
+    z : (m,) array     Dominant eigenvector of P (Perron vector)
+    delta : float      Recovered subjective discount factor
+                       (the Perron–Frobenius eigenvalue of P)
+    phi : (m,) array   Recovered kernel values (relative to the middle state)
+    """
+    m = P.shape[0]
+
+    # Compute all eigenvalues and right eigenvectors
+    eigenvalues, eigenvectors = eig(P)
+
+    # Find the dominant (Perron) eigenvalue, the largest positive real one
+    real_mask = np.isreal(eigenvalues)
+    real_eigenvalues = eigenvalues[real_mask].real
+    real_eigenvectors = eigenvectors[:, real_mask].real
+
+    idx = np.argmax(real_eigenvalues)
+    delta_recovered = real_eigenvalues[idx]
+    z = real_eigenvectors[:, idx]
+
+    # Ensure z is positive (Perron vector)
+    if np.mean(z) < 0:
+        z = -z
+
+    # Normalize so that z at the middle (reference) state equals 1
+    z = z / z[m // 2]
+
+    # Diagonal matrix D with D_ii = 1/z_i
+    D = np.diag(1.0 / z)
+    D_inv = np.diag(z)
+
+    # Recover natural probability matrix
+    F = (1.0 / delta_recovered) * D @ P @ D_inv
+
+    # Clip small numerical negatives and renormalize rows
+    F = np.clip(F, 0, None)
+    F = F / F.sum(axis=1, keepdims=True)
+
+    # Pricing kernel relative to middle state
+    phi = 1.0 / z
+
+    return F, z, delta_recovered, phi
+```
+
+```{code-cell} ipython3
+F, z, delta_rec, phi = recover_natural_distribution(P)
+
+print(f"Recovered discount factor = {delta_rec:.6f} (true: e^(-δT) = {np.exp(-delta):.6f})")
+print(f"\nRecovered kernel φ (monotone decreasing in good states):")
+print(np.round(phi, 4))
+print(f"\nNatural probability matrix F (row sums should be 1):")
+print(np.round(F.sum(axis=1), 6))
+```
+
+### Visualizing Natural vs. Risk-Neutral Distributions
+
+A key insight of {cite}`Ross2015` is that the natural distribution systematically
+differs from the risk-neutral one. In particular, the natural distribution
+stochastically dominates the risk-neutral distribution (Theorem 3 in the paper).
+
+```{code-cell} ipython3
+def get_marginal(transition_matrix, initial_row, n_periods):
+    """
+    Compute the marginal distribution at horizon n_periods by iterating
+    the transition matrix starting from point mass on one state.
+
+    Parameters
+    ----------
+    transition_matrix : (m, m) array
+    initial_row : int    Index of the starting state
+    n_periods : int      Horizon
+    """
+    m = transition_matrix.shape[0]
+    # Start with all weight on the initial row
+    dist = np.zeros(m)
+    dist[initial_row] = 1.0
+
+    for _ in range(n_periods):
+        dist = dist @ transition_matrix
+
+    return dist
+```
+
+```{code-cell} ipython3
+# Starting from the middle state (current state = S_0)
+mid = len(states) // 2
+
+# Risk-neutral transition matrix Q_ij = P_ij / sum_k P_ik (row normalise P)
+row_sums = P.sum(axis=1, keepdims=True)
+Q_rn = P / row_sums   # risk-neutral probabilities
+
+# One-period marginals
+f_nat = F[mid, :]     # natural: row of recovered F
+f_rn = Q_rn[mid, :]   # risk-neutral: row of Q
+
+# State labels in gross return terms
+gross_returns = np.exp(states)
+
+fig, axes = plt.subplots(1, 2, figsize=(14, 5))
+
+# Panel A: densities
+axes[0].plot(gross_returns, f_nat, 'b-o', ms=5, label='Natural (recovered)', lw=2)
+axes[0].plot(gross_returns, f_rn, 'r--s', ms=5, label='Risk-neutral', lw=2)
+axes[0].set_xlabel('Gross return $S_T / S_0$')
+axes[0].set_ylabel('Probability')
+axes[0].set_title('One-Period Marginal Distributions')
+axes[0].legend()
+
+# Panel B: pricing kernel
+axes[1].plot(gross_returns, phi, 'g-^', ms=5, lw=2)
+axes[1].set_xlabel('Gross return $S_T / S_0$')
+axes[1].set_ylabel('Kernel $\\phi$ (relative)')
+axes[1].set_title('Recovered Pricing Kernel')
+
+plt.tight_layout()
+plt.savefig('ross_recovery_distributions.png', dpi=120)
+plt.show()
+```
+
+```{code-cell} ipython3
+# Compute summary statistics
+E_nat = np.sum(f_nat * gross_returns)
+E_rn = np.sum(f_rn * gross_returns)
+std_nat = 
np.sqrt(np.sum(f_nat * (gross_returns - E_nat)**2)) +std_rn = np.sqrt(np.sum(f_rn * (gross_returns - E_rn )**2)) + +risk_free = np.sum(P[mid]) # price of riskless bond from middle state + +print("Summary Statistics (one-period horizon)") +print(f"{'':30s} {'Natural':>12s} {'Risk-Neutral':>12s}") +print("-" * 57) +print(f"{'Expected gross return':30s} {E_nat:>12.4f} {E_rn:>12.4f}") +print(f"{'Std dev':30s} {std_nat:>12.4f} {std_rn:>12.4f}") +print(f"{'Risk-free discount factor':30s} {risk_free:>12.4f}") +print(f"{'Annual risk-free rate':30s} {-np.log(risk_free):>12.4f}") +print(f"{'Equity risk premium':30s} {E_nat - 1/risk_free:>12.4f}") +``` + +### Stochastic Dominance + +Theorem 3 of {cite}`Ross2015` shows that the natural marginal density +**first-order stochastically dominates** the risk-neutral density: the CDF of +the natural distribution lies *below* that of the risk-neutral distribution. + +Intuitively, because the pricing kernel is declining (investors fear bad +outcomes), risk-neutral probabilities overweight bad states and underweight +good states relative to the natural measure. + +```{code-cell} ipython3 +# CDFs +cdf_nat = np.cumsum(f_nat) +cdf_rn = np.cumsum(f_rn) + +fig, ax = plt.subplots(figsize=(9, 5)) +ax.plot(gross_returns, cdf_nat, 'b-o', ms=5, lw=2, label='Natural CDF') +ax.plot(gross_returns, cdf_rn, 'r--s', ms=5, lw=2, label='Risk-neutral CDF') +ax.set_xlabel('Gross return $S_T / S_0$') +ax.set_ylabel('Cumulative probability') +ax.set_title('Stochastic Dominance: Natural CDF lies below Risk-Neutral CDF') +ax.legend() +plt.tight_layout() +plt.savefig('ross_recovery_stochdom.png', dpi=120) +plt.show() + +# Verify dominance: natural CDF should be <= risk-neutral CDF at every point +print(f"Natural CDF ≤ Risk-neutral CDF at all states: " + f"{np.all(cdf_nat <= cdf_rn + 1e-10)}") +``` + +## Extracting the Pricing Kernel and Risk Premium + +The pricing kernel recovered from $P$ via the Perron–Frobenius theorem has an +intuitive interpretation. 
In the CRRA model the kernel is proportional to
+$\exp(-\gamma \cdot \text{log-return})$, so it is decreasing in the return.
+
+The **equity risk premium** from state $\theta_i$, in continuously compounded
+terms, is the log of the expected gross return under the natural measure minus
+the risk-free rate implied by the row sum of $P$:
+
+$$
+\text{ERP}_i = \log \Big( \sum_j f_{ij}\, \frac{S_j}{S_i} \Big) - r_f(\theta_i),
+\qquad
+r_f(\theta_i) = -\log \sum_j p_{ij}.
+$$
+
+```{code-cell} ipython3
+def compute_risk_premia(P, F, states):
+    """
+    Compute the equity risk premium for each starting state.
+
+    Parameters
+    ----------
+    P, F : (m, m) arrays   State price and natural probability matrices
+    states : (m,) array    Log-state grid
+
+    Returns
+    -------
+    erp : (m,) array   Equity risk premium from each starting state
+    rf : (m,) array    Risk-free rate from each starting state
+    """
+    m = len(states)
+
+    rf = np.zeros(m)
+    erp = np.zeros(m)
+
+    for i in range(m):
+        discount = P[i].sum()        # riskless discount factor
+        rf[i] = -np.log(discount)    # risk-free rate
+
+        # Expected gross return under the natural measure:
+        # E[S_j/S_i] = sum_j F_ij * exp(s_j - s_i)
+        relative_returns = np.exp(states - states[i])
+        E_R_nat = np.sum(F[i] * relative_returns)
+
+        erp[i] = np.log(E_R_nat) - rf[i]
+
+    return erp, rf
+
+
+erp, rf = compute_risk_premia(P, F, states)
+
+fig, axes = plt.subplots(1, 2, figsize=(13, 5))
+
+axes[0].plot(np.exp(states), rf * 100, 'b-o', ms=5, lw=2)
+axes[0].set_xlabel('Current state $S / S_0$')
+axes[0].set_ylabel('Annual risk-free rate (%)')
+axes[0].set_title('Risk-Free Rate by State')
+
+axes[1].plot(np.exp(states), erp * 100, 'r-^', ms=5, lw=2)
+axes[1].set_xlabel('Current state $S / S_0$')
+axes[1].set_ylabel('Equity Risk Premium (%)')
+axes[1].set_title('Recovered Equity Risk Premium by State')
+
+plt.tight_layout()
+plt.savefig('ross_recovery_erp.png', dpi=120)
+plt.show()
+
+mid = len(states) // 2
+print(f"At the middle state:")
+print(f"  Risk-free rate ≈ 
{rf[mid]*100:.2f}% (true: {delta*100:.2f}%)")
+print(f"  Equity premium ≈ {erp[mid]*100:.2f}% (true: {(mu-delta)*100:.2f}%)")
+```
+
+## Sensitivity Analysis: Effect of Risk Aversion
+
+The shape of the pricing kernel, and hence the gap between natural and
+risk-neutral probabilities, depends on the coefficient of risk aversion $\gamma$.
+
+```{code-cell} ipython3
+gammas = [1.0, 2.0, 3.0, 5.0, 8.0]
+colors = cm.viridis(np.linspace(0.1, 0.9, len(gammas)))
+
+fig, axes = plt.subplots(1, 2, figsize=(14, 5))
+
+for gamma_val, color in zip(gammas, colors):
+    P_g, states_g = build_state_price_matrix(mu, sigma, gamma_val, delta, T)
+    F_g, z_g, delta_g, phi_g = recover_natural_distribution(P_g)
+    mid_g = len(states_g) // 2
+
+    f_nat_g = F_g[mid_g, :]
+    f_rn_g = P_g[mid_g] / P_g[mid_g].sum()
+
+    gross = np.exp(states_g)
+
+    axes[0].plot(gross, phi_g, color=color, lw=2,
+                 label=f'$\\gamma={gamma_val:.0f}$')
+    axes[1].plot(gross, f_nat_g - f_rn_g, color=color, lw=2,
+                 label=f'$\\gamma={gamma_val:.0f}$')
+
+axes[0].set_xlabel('Gross return')
+axes[0].set_ylabel('Kernel $\\phi$')
+axes[0].set_title('Pricing Kernel vs Risk Aversion')
+axes[0].legend(fontsize=9)
+
+axes[1].axhline(0, color='k', lw=0.8, ls='--')
+axes[1].set_xlabel('Gross return')
+axes[1].set_ylabel('Natural minus risk-neutral probability')
+axes[1].set_title('Natural minus Risk-Neutral Density')
+axes[1].legend(fontsize=9)
+
+plt.tight_layout()
+plt.savefig('ross_recovery_sensitivity.png', dpi=120)
+plt.show()
+```
+
+The plots confirm the single-crossing property from Theorem 3 of
+{cite}`Ross2015`: for returns below some threshold $v$, risk-neutral
+probability exceeds natural probability; above $v$ the natural probability
+dominates. A higher $\gamma$ amplifies this wedge.
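The single-crossing pattern is itself a consequence of exponential tilting: the risk-neutral density is proportional to $e^{-\gamma x}$ times the natural density, so their ratio is monotone and the difference can change sign only once. A minimal self-contained check, with illustrative parameters and independent of the matrices built above:

```{code-cell} ipython3
import numpy as np
from scipy.stats import norm

# Discretized natural density f and a tilted 'risk-neutral' density
# q ∝ e^{-γx} f(x); the parameters here are illustrative only
x_grid = np.linspace(-1.0, 1.0, 401)
f_demo = norm.pdf(x_grid, loc=0.06, scale=0.20)
f_demo = f_demo / f_demo.sum()

gamma_tilt = 3.0
q_demo = np.exp(-gamma_tilt * x_grid) * f_demo
q_demo = q_demo / q_demo.sum()

# The ratio q/f = c e^{-γx} is strictly decreasing, so f - q can
# change sign at most once; count the sign changes on the grid
signs = np.sign(f_demo - q_demo)
signs = signs[signs != 0]
crossings = np.count_nonzero(np.diff(signs))
print(crossings)   # exactly one crossing
```

The same check can be run on any `f_nat_g - f_rn_g` pair from the loop above.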
+
+## Recovering the Discount Rate
+
+A useful by-product of the Recovery Theorem is the recovered subjective
+**discount factor** $e^{-\delta T}$, which equals the Perron–Frobenius
+eigenvalue of $P$; taking $-\log$ of the eigenvalue recovers the discount
+rate $\delta$ itself.
+
+Corollary 1 of {cite}`Ross2015` states that the recovered discount factor is
+bounded above by the largest state-contingent one-period discount factor,
+i.e., the maximum row sum of $P$:
+
+$$
+e^{-\delta T} \leq \max_i \sum_j p(\theta_i, \theta_j).
+$$
+
+```{code-cell} ipython3
+# Vary the true discount rate and check how well we recover it
+true_deltas = np.linspace(0.00, 0.06, 13)
+recovered_deltas = []
+
+for d in true_deltas:
+    P_d, _ = build_state_price_matrix(mu, sigma, gamma=3.0, delta=d, T=1.0)
+    _, _, d_rec, _ = recover_natural_distribution(P_d)
+    recovered_deltas.append(d_rec)
+
+plt.figure(figsize=(8, 5))
+plt.plot(true_deltas * 100, true_deltas * 100, 'k--', lw=1.5, label='45° line')
+plt.plot(true_deltas * 100,
+         [-np.log(d_r) * 100 for d_r in recovered_deltas],
+         'bo-', ms=6, lw=2, label='Recovered $\\delta$')
+plt.xlabel('True discount rate (%)')
+plt.ylabel('Recovered discount rate (%)')
+plt.title('Accuracy of Recovered Discount Rate')
+plt.legend()
+plt.tight_layout()
+plt.savefig('ross_recovery_delta.png', dpi=120)
+plt.show()
+```
+
+## Tail Risk: Natural vs. Risk-Neutral Probabilities of Catastrophe
+
+One of the most striking applications of the Recovery Theorem is its ability
+to separate the market's genuine fear of catastrophes from the risk premium
+attached to them.
+
+{cite}`barro2006rare` and {cite}`MehraPrescott1985` discuss how rare disasters
+might explain the equity premium puzzle. The risk-neutral probability of a
+large decline is elevated both because (a) the market assigns a high natural
+probability to such events and (b) the pricing kernel upweights bad outcomes.
+Recovery lets us decompose these two forces.
+ +```{code-cell} ipython3 +# Compare left-tail probabilities: P(R < threshold) under each measure +thresholds = np.linspace(-0.40, 0.10, 200) # log-returns + +def tail_prob(f_dist, states, threshold): + """CDF evaluated at threshold (log-return).""" + return float(np.sum(f_dist[states <= threshold])) + +P_base, states_base = build_state_price_matrix( + mu, sigma, gamma=3.0, delta=0.02, T=1.0, + n_states=41, n_sigma=5) +F_base, z_base, delta_base, phi_base = recover_natural_distribution(P_base) + +mid_b = len(states_base) // 2 +f_nat_base = F_base[mid_b] +f_rn_base = P_base[mid_b] / P_base[mid_b].sum() + +prob_nat = [tail_prob(f_nat_base, states_base, t) for t in thresholds] +prob_rn = [tail_prob(f_rn_base, states_base, t) for t in thresholds] + +fig, ax = plt.subplots(figsize=(10, 5)) +ax.plot(np.exp(thresholds), prob_nat, 'b-', lw=2, label='Natural (recovered)') +ax.plot(np.exp(thresholds), prob_rn, 'r--', lw=2, label='Risk-neutral') +ax.set_xlabel('Gross return threshold') +ax.set_ylabel('Probability of decline below threshold') +ax.set_title('Tail Probabilities: Natural vs. Risk-Neutral') +ax.axvline(x=0.75, color='gray', ls=':', lw=1.5, label='25% decline') +ax.axvline(x=0.70, color='silver', ls=':', lw=1.5, label='30% decline') +ax.legend() +plt.tight_layout() +plt.savefig('ross_recovery_tail.png', dpi=120) +plt.show() + +# Print specific tail probabilities +for thresh, label in [(-0.25, '25% decline'), (-0.30, '30% decline'), + (-0.10, '10% decline')]: + p_n = tail_prob(f_nat_base, states_base, thresh) + p_r = tail_prob(f_rn_base, states_base, thresh) + print(f"P(log-return < {thresh:.0%}): Natural = {p_n:.4f}, " + f"Risk-Neutral = {p_r:.4f}, Ratio = {p_r/p_n:.2f}x") +``` + +The risk-neutral density assigns higher probability to large drops than the +recovered natural density. The ratio captures the additional weight from risk +aversion — the premium investors demand to bear tail risk. 
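In the lognormal/CRRA setting there is a closed-form benchmark for this wedge: tilting a $\mathcal{N}(m, s^2)$ density by $e^{-\gamma x}$ yields another normal with the same variance and mean $m - \gamma s^2$, a standard exponential-tilting identity. The sketch below uses the lecture's illustrative calibration to reproduce the tail-probability gap without any matrix machinery:

```{code-cell} ipython3
import numpy as np
from scipy.stats import norm

# Closed-form benchmark: tilting N(m, s²) by e^{-γx} shifts the mean
# down by γs² and leaves the variance unchanged
mu_d, sigma_d, gamma_d = 0.08, 0.20, 3.0
m_nat = mu_d - 0.5 * sigma_d**2        # natural mean of the log-return
m_rn = m_nat - gamma_d * sigma_d**2    # risk-neutral mean after tilting

threshold = -0.30                      # a 30% log-decline
p_nat = norm.cdf(threshold, loc=m_nat, scale=sigma_d)
p_rn = norm.cdf(threshold, loc=m_rn, scale=sigma_d)

print(p_nat, p_rn, p_rn / p_nat)
```

The ratio should be close to the grid-based figure printed above, with small differences attributable to discretization and truncation of the state space.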
+ +## Testing Efficient Markets + +{cite}`Ross2015` shows that once the pricing kernel is recovered, one obtains +an **upper bound on the Sharpe ratio** for any investment strategy: + +$$ +\sigma(\phi) \geq e^{-rT} \frac{|\mu_\text{excess}|}{\sigma_\text{asset}}, +$$ + +where $\sigma(\phi)$ is the standard deviation of the pricing kernel. This +follows from the Hansen–Jagannathan bound {cite}`Hansen_Jagannathan_1991`. + +Equivalently, the $R^2$ of any return-forecasting regression using publicly +available information is bounded above by the variance of the pricing kernel: + +$$ +R^2 \leq e^{2rT} \, \mathrm{Var}(\phi). +$$ + +```{code-cell} ipython3 +def kernel_variance(phi, f_nat): + """Variance of the pricing kernel under the natural measure.""" + E_phi = np.sum(phi * f_nat) + E_phi2 = np.sum(phi**2 * f_nat) + return E_phi2 - E_phi**2, E_phi + + +var_phi, E_phi = kernel_variance(phi_base, f_nat_base) +std_phi = np.sqrt(var_phi) + +print(f"Pricing kernel statistics (one year):") +print(f" E[φ] = {E_phi:.4f}") +print(f" Var(φ) = {var_phi:.4f}") +print(f" Std(φ) = {std_phi:.4f}") +print(f"\nHansen-Jagannathan bound on Sharpe ratio: {std_phi:.4f}") +print(f"Upper bound on R² in return forecasting: {var_phi:.4f}") +``` + +## Limitations and Extensions + +The Recovery Theorem is a remarkable theoretical result, but several caveats +apply in practice. + +**Finite state space.** The theorem requires a bounded, irreducible Markov +chain. In continuous, unbounded state spaces (e.g., a lognormal diffusion), +uniqueness fails because any exponential $e^{\alpha x}$ satisfies the +characteristic equation. {cite}`CarrYu2012` establish recovery with a bounded +diffusion. + +**Transition independence.** If the kernel is not transition independent, +recovery is not guaranteed. {cite}`BorovickaHansenScheinkman2016` show that +the Ross recovery can confound the long-run risk component of the kernel with +the natural probability distribution, yielding an incorrect decomposition. 
+ +**Empirical estimation.** Extracting reliable state prices from observed option +prices requires careful interpolation and extrapolation. The mapping from +implied volatilities to state prices via the {cite}`BreedenLitzenberger1978` formula involves second derivatives, which amplify measurement error. + +**State dependence.** The state must capture all relevant variables: the level +of volatility, not just the current index level, is an important state variable +for equity options. + +## Exercises + +```{exercise} +:label: rt_ex1 + +**The Perron–Frobenius vector and the pricing kernel.** + +Consider the $3 \times 3$ state price matrix + +$$ +P = \begin{pmatrix} +0.8 & 0.12 & 0.02 \\ +0.10 & 0.75 & 0.10 \\ +0.05 & 0.15 & 0.72 +\end{pmatrix}. +$$ + +(a) Compute the dominant eigenvalue $\delta$ and the corresponding eigenvector $z$ of $P$. + +(b) Use $z$ to recover the natural probability transition matrix $F$ via + +$$ +f_{ij} = \frac{1}{\delta} \frac{z_j}{z_i} p_{ij}. +$$ + +(c) Verify that each row of $F$ sums to one and all entries are positive. + +(d) Compute the pricing kernel $\phi_i = 1/z_i$ for each state. Does the + kernel decrease as we move from state 1 to state 3 (i.e., from bad to + good states)? 
+``` + +```{solution-start} rt_ex1 +:class: dropdown +``` + +```{code-cell} ipython3 +import numpy as np +from scipy.linalg import eig + +# (a) Dominant eigenvalue and eigenvector +P_ex = np.array([ + [0.80, 0.12, 0.02], + [0.10, 0.75, 0.10], + [0.05, 0.15, 0.72] +]) + +eigenvalues, eigenvectors = eig(P_ex) +real_mask = np.isreal(eigenvalues) +real_ev = eigenvalues[real_mask].real +real_evec = eigenvectors[:, real_mask].real + +idx = np.argmax(real_ev) +delta_ex = real_ev[idx] +z_ex = real_evec[:, idx] +if z_ex.min() < 0: + z_ex = -z_ex +z_ex = z_ex / z_ex[1] # normalise to middle state + +print(f"(a) Dominant eigenvalue δ = {delta_ex:.6f}") +print(f" Eigenvector z = {z_ex}") + +# (b) Recover F +D_ex = np.diag(1.0 / z_ex) +D_inv_ex = np.diag(z_ex) +F_ex = (1.0 / delta_ex) * D_ex @ P_ex @ D_inv_ex + +print(f"\n(b) Recovered natural transition matrix F:") +print(np.round(F_ex, 4)) + +# (c) Row sums +print(f"\n(c) Row sums of F: {np.round(F_ex.sum(axis=1), 8)}") +print(f" All non-negative: {(F_ex >= -1e-10).all()}") + +# (d) Pricing kernel +phi_ex = 1.0 / z_ex +print(f"\n(d) Pricing kernel φ = {np.round(phi_ex, 4)}") +print(f" Kernel decreasing state 1→3: {phi_ex[0] > phi_ex[1] > phi_ex[2]}") +``` + +```{solution-end} +``` + +```{exercise} +:label: rt_ex2 + +**Stochastic dominance.** + +Using the recovered $F$ and the normalised risk-neutral matrix +$Q = P / \text{row sums}$ from the exercise above: + +(a) Compute the one-step marginal distributions $f_j = F_{2,j}$ and $q_j = Q_{2,j}$ + starting from state 2 (index 1 in Python). + +(b) Compute the CDFs $\hat F_k = \sum_{j \leq k} f_j$ and $\hat Q_k = \sum_{j + \leq k} q_j$ for each state. + +(c) Verify numerically that $\hat F_k \leq \hat Q_k$ for every $k$, confirming + that the natural distribution stochastically dominates the risk-neutral + distribution (Theorem 3 of {cite}`Ross2015`). 
+``` + +```{solution-start} rt_ex2 +:class: dropdown +``` + +```{code-cell} ipython3 +import numpy as np + +P_ex = np.array([ + [0.80, 0.12, 0.02], + [0.10, 0.75, 0.10], + [0.05, 0.15, 0.72] +]) + +# Recompute F from exercise 1 +from scipy.linalg import eig +eigenvalues, eigenvectors = eig(P_ex) +real_mask = np.isreal(eigenvalues) +real_ev = eigenvalues[real_mask].real +real_evec = eigenvectors[:, real_mask].real +idx = np.argmax(real_ev) +delta_ex = real_ev[idx] +z_ex = real_evec[:, idx] +if z_ex.min() < 0: + z_ex = -z_ex +z_ex = z_ex / z_ex[1] + +D_ex = np.diag(1.0 / z_ex) +D_inv_ex = np.diag(z_ex) +F_ex = (1.0 / delta_ex) * D_ex @ P_ex @ D_inv_ex +F_ex = np.clip(F_ex, 0, None) +F_ex /= F_ex.sum(axis=1, keepdims=True) + +# (a) Marginals from state 2 (index 1) +start = 1 +f_marg = F_ex[start] +q_marg = P_ex[start] / P_ex[start].sum() + +print("(a) One-step marginals from state 2:") +print(f" Natural f = {np.round(f_marg, 4)}") +print(f" Risk-neutral q = {np.round(q_marg, 4)}") + +# (b) CDFs +cdf_nat = np.cumsum(f_marg) +cdf_rn = np.cumsum(q_marg) + +print("\n(b) CDFs:") +for k in range(3): + print(f" State {k+1}: CDF_nat = {cdf_nat[k]:.4f}, CDF_rn = {cdf_rn[k]:.4f}") + +# (c) Stochastic dominance +dominates = np.all(cdf_nat <= cdf_rn + 1e-10) +print(f"\n(c) Natural CDF ≤ Risk-neutral CDF at all states: {dominates}") +print(" → Natural distribution stochastically dominates risk-neutral distribution ✓") +``` + +```{solution-end} +``` + +```{exercise} +:label: rt_ex3 + +**Risk aversion and tail risk.** + +Write a function `tail_risk_ratio(gamma, threshold, mu, sigma, delta, T)` that: + +1. Constructs the state price matrix $P$ using `build_state_price_matrix` with + the given parameters and `n_states=41`. +2. Applies `recover_natural_distribution` to obtain $F$. +3. Computes $P(\text{log-return} < \text{threshold})$ under both the natural + and risk-neutral distributions starting from the middle state. +4. Returns the ratio $p_\text{risk-neutral} / p_\text{natural}$. 
+ +Using this function, plot the ratio as a function of $\gamma \in [1, 10]$ for +a threshold of $-30\%$ (i.e., `threshold = -0.30`). + +Explain the economic interpretation: why does a higher $\gamma$ raise the ratio? +``` + +```{solution-start} rt_ex3 +:class: dropdown +``` + +```{code-cell} ipython3 +import numpy as np +import matplotlib.pyplot as plt + + +def tail_risk_ratio(gamma, threshold, mu=0.08, sigma=0.20, delta=0.02, T=1.0): + """ + Compute ratio of risk-neutral to natural tail probability P(log-return < threshold). + """ + P_g, states_g = build_state_price_matrix( + mu, sigma, gamma, delta, T, n_states=41, n_sigma=5) + + F_g, z_g, delta_g, phi_g = recover_natural_distribution(P_g) + + mid_g = len(states_g) // 2 + + f_nat_g = F_g[mid_g] + f_rn_g = P_g[mid_g] / P_g[mid_g].sum() + + p_nat = float(np.sum(f_nat_g[states_g <= threshold])) + p_rn = float(np.sum(f_rn_g[states_g <= threshold])) + + if p_nat < 1e-12: + return np.nan + return p_rn / p_nat + + +gammas = np.linspace(1.0, 10.0, 20) +ratios = [tail_risk_ratio(g, -0.30) for g in gammas] + +plt.figure(figsize=(9, 5)) +plt.plot(gammas, ratios, 'b-o', ms=5, lw=2) +plt.xlabel('Risk aversion coefficient $\\gamma$') +plt.ylabel('Risk-neutral / Natural tail probability') +plt.title('Tail Risk Ratio for a 30% Decline vs Risk Aversion') +plt.tight_layout() +plt.savefig('ross_recovery_ex3.png', dpi=120) +plt.show() + +# Economic interpretation +print("Economic interpretation:") +print("A higher γ means the pricing kernel falls more steeply in bad states.") +print("This upweights bad outcomes (crashes) more heavily under risk-neutral") +print("probabilities, raising the ratio — even if the true crash probability") +print("(natural measure) stays the same.") +print(f"\nRatio at γ=1.0: {tail_risk_ratio(1.0, -0.30):.2f}") +print(f"Ratio at γ=5.0: {tail_risk_ratio(5.0, -0.30):.2f}") +print(f"Ratio at γ=10.0: {tail_risk_ratio(10.0, -0.30):.2f}") +``` + +**Economic interpretation.** A higher coefficient of risk 
aversion $\gamma$ +makes the pricing kernel steeper: the market assigns a larger premium per unit +of probability to bad-state payoffs. Risk-neutral probabilities, which +incorporate this premium, overstate the natural probability of a crash by a +factor that grows rapidly with $\gamma$. This is the "dark matter" of finance: +the high risk-neutral probability of a crash seen in option prices can be +attributed mostly to risk aversion rather than a genuinely elevated natural +probability of a catastrophe. + +```{solution-end} +``` + +## References + +```{bibliography} +:filter: docname in docnames +``` From 980d0bd898290e2ce18957b36e8024cefacb9c26 Mon Sep 17 00:00:00 2001 From: thomassargent30 Date: Wed, 22 Apr 2026 13:11:23 -0600 Subject: [PATCH 11/26] Add Economic Networks citation and reference PF theorem to section 1.2.3 --- lectures/_static/quant-econ.bib | 10 ++++ lectures/ross_recovery.md | 100 +++++++++++++++++++------------- 2 files changed, 71 insertions(+), 39 deletions(-) diff --git a/lectures/_static/quant-econ.bib b/lectures/_static/quant-econ.bib index 1c88e19c9..0e800599a 100644 --- a/lectures/_static/quant-econ.bib +++ b/lectures/_static/quant-econ.bib @@ -783,6 +783,16 @@ @book{Sargent_Stachurski_2025 year={2025} } +@book{Sargent_Stachurski_2024, + place={Cambridge}, + series={Structural Analysis in the Social Sciences}, + title={Economic Networks: Theory and Computation}, + publisher={Cambridge University Press}, + author={Sargent, Thomas J. and Stachurski, John}, + year={2024}, + collection={Structural Analysis in the Social Sciences} +} + @incollection{slutsky:1927, address = {Moscow}, author = {Slutsky, Eugen}, diff --git a/lectures/ross_recovery.md b/lectures/ross_recovery.md index 2c966c45e..242c7e08c 100644 --- a/lectures/ross_recovery.md +++ b/lectures/ross_recovery.md @@ -29,15 +29,21 @@ kernelspec: ## Overview From option prices we can extract risk-neutral (martingale) probabilities of -future outcomes. 
But risk-neutral probabilities blend two things: the market's -*actual* probability beliefs and investors' *risk aversion*. Disentangling the -two has long seemed impossible without imposing parametric assumptions on -preferences. +future outcomes. -{cite}`Ross2015` showed that under a key assumption — the **transition +But risk-neutral probabilities blend two things: the market's +*actual* probability beliefs and investors' *risk aversion*. + +Disentangling the +two seems to require imposing parametric assumptions on +preferences of a representative investor. + +Nevertheless, {cite}`Ross2015` showed that under a key assumption — the **transition independence** of the pricing kernel — the natural (real-world) probability distribution and the pricing kernel can be uniquely recovered from state prices -alone, without historical return data or parametric utility functions. This +alone, without historical return data or parametric utility functions. + +This result is called the **Recovery Theorem**. The theorem has several important implications. @@ -46,13 +52,13 @@ The theorem has several important implications. distribution from option prices. * It provides model-free tests of the efficient market hypothesis. * It sheds light on the "dark matter" of finance: the probability of rare - catastrophic events embedded in market prices. + catastrophic events allegedly embedded in market prices. This lecture covers * The basic Arrow–Debreu framework linking state prices, the pricing kernel, and natural probabilities. -* The Recovery Theorem and its proof via the Perron–Frobenius theorem. +* Ross's Recovery Theorem and its proof via the Perron–Frobenius theorem. * A computational implementation that recovers the natural distribution from a simulated state-price matrix. * Comparisons between risk-neutral and recovered natural densities. @@ -75,14 +81,20 @@ plt.rcParams['grid.alpha'] = 0.3 ### Arrow–Debreu State Prices -Consider a discrete-time, discrete-state economy. 
At each date the economy -occupies one of $m$ states $\theta_1, \ldots, \theta_m$. An **Arrow–Debreu +Consider a discrete-time, discrete-state economy. + +At each date the economy +occupies one of $m$ states $\theta_1, \ldots, \theta_m$. + +An **Arrow–Debreu security** pays \$1 if the economy is in state $\theta_j$ next period and nothing otherwise. Denote by $p(\theta_i, \theta_j)$ the price today, when the current state is $\theta_i$, of the Arrow–Debreu security paying in state $\theta_j$ next -period. Collect these into an $m \times m$ **state price transition matrix** +period. + +Collect these into an $m \times m$ **state price transition matrix** $$ P = [p(\theta_i, \theta_j)]_{i,j=1}^m. @@ -115,7 +127,7 @@ The key structural property this implies is **transition independence**. ### Transition Independence -```{note} + **Definition.** A pricing kernel is *transition independent* if there exists a positive function $h$ on the state space and a positive scalar $\delta$ such that for every transition from state $\theta_i$ to $\theta_j$, @@ -123,10 +135,12 @@ that for every transition from state $\theta_i$ to $\theta_j$, $$ \phi(\theta_i, \theta_j) = \delta \, \frac{h(\theta_j)}{h(\theta_i)}. $$ -``` + Transition independence says the kernel depends on the *ending* state and -normalizes by the *beginning* state. It holds for any agent with +normalizes by the *beginning* state. + +It holds for any agent with intertemporally additive separable utility (where $h = U'$) and also for Epstein–Zin recursive preferences {cite}`Epstein_Zin1989`. @@ -151,10 +165,12 @@ $$ ## The Recovery Theorem -### Reducing to an Eigenvalue Problem +### Reduction to an Eigenvalue Problem Since $F$ is a stochastic matrix, its rows sum to one: $F e = e$ where $e$ -is the vector of ones. Substituting the expression for $F$: +is the vector of ones. + +Substituting the expression for $F$: $$ \frac{1}{\delta} D P D^{-1} e = e @@ -168,22 +184,23 @@ $\delta$ satisfying $Pz = \delta z$. 
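To make the eigenvalue problem concrete, here is a small numerical sketch; the $3 \times 3$ state-price matrix below is purely illustrative, not estimated from data. We compute the Perron eigenpair of $P$ and check that the implied $F$ is a stochastic matrix:

```python
import numpy as np
from scipy.linalg import eig

# Illustrative 3-state state-price matrix (hypothetical numbers):
# entries are positive Arrow-Debreu prices, and each row sums to
# less than one, reflecting discounting
P_demo = np.array([[0.85, 0.10, 0.03],
                   [0.12, 0.75, 0.11],
                   [0.05, 0.14, 0.78]])

# Perron eigenpair: the eigenvalue of maximal real part is the
# spectral radius δ, with a positive eigenvector z
eigenvalues, eigenvectors = eig(P_demo)
idx = np.argmax(eigenvalues.real)
δ = eigenvalues[idx].real
z = eigenvectors[:, idx].real
if z.min() < 0:          # fix the arbitrary sign of the eigenvector
    z = -z

# F = (1/δ) D P D^{-1} with D = diag(1/z), so F_ij = P_ij z_j / (δ z_i)
D = np.diag(1.0 / z)
D_inv = np.diag(z)
F_demo = (1.0 / δ) * D @ P_demo @ D_inv

print("δ =", δ)
print("rows of F sum to:", F_demo.sum(axis=1))
```

The row sums equal one by construction, because $\sum_j P_{ij} z_j = \delta z_i$ for the Perron pair $(\delta, z)$.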
The Perron–Frobenius theorem guarantees exactly one such solution when $P$ is nonnegative and irreducible. -```{note} **Theorem (Perron–Frobenius).** Every nonnegative irreducible matrix has a unique positive eigenvector (up to scaling) and a unique largest positive eigenvalue. -``` -### Statement and Proof Sketch +Section 1.2.3 of {cite}`Sargent_Stachurski_2024` provides a proof of this theorem as well as a discussion of its applications to economic networks. + + +### Ross's Recovery Theorem -```{note} -**Theorem 1 (Recovery Theorem, {cite}`Ross2015`).** Suppose there is no -arbitrage, the state price transition matrix $P$ is irreducible, and the + +**Theorem 1 (Recovery Theorem, {cite}`Ross2015`).** Suppose prices provide no +arbitrage opportunities, that the state price transition matrix $P$ is irreducible, and that the pricing kernel is transition independent. Then there exists a *unique* positive solution $(\delta, z, F)$ to the recovery problem. That is, for any set of state prices there is a unique compatible natural probability transition matrix and a unique pricing kernel. -``` + *Proof sketch.* Because $P$ is nonnegative and irreducible, the Perron–Frobenius theorem gives a unique positive eigenvector $z > 0$ with @@ -219,12 +236,12 @@ states being "good times." ### Corollary: Risk-Neutral Pricing When Rates Are State-Independent -```{note} + **Theorem 2 ({cite}`Ross2015`).** If the riskless rate is the same in all states ($Pe = \gamma e$ for some scalar $\gamma$), then the unique natural distribution consistent with recovery is the risk-neutral (martingale) distribution itself: $F = (1/\gamma) P$. -``` + This remarkable result says that with a constant interest rate and a bounded irreducible state space, recovery forces risk-neutrality — a non-trivial @@ -385,8 +402,10 @@ print(np.round(F.sum(axis=1), 6)) ### Visualizing Natural vs. 
Risk-Neutral Distributions A key insight of {cite}`Ross2015` is that the natural distribution systematically -differs from the risk-neutral one. In particular, the natural distribution -stochastically dominates the risk-neutral distribution (Theorem 3 in the paper). +differs from the risk-neutral one. + +In particular, the natural distribution +stochastically dominates the risk-neutral distribution (Theorem 3 in {cite}`Ross2015`). ```{code-cell} ipython3 def get_marginal(transition_matrix, initial_row, n_periods, states_exp): @@ -473,7 +492,7 @@ Theorem 3 of {cite}`Ross2015` shows that the natural marginal density **first-order stochastically dominates** the risk-neutral density: the CDF of the natural distribution lies *below* that of the risk-neutral distribution. -Intuitively, because the pricing kernel is declining (investors fear bad +Because the pricing kernel is declining (investors fear bad outcomes), risk-neutral probabilities overweight bad states and underweight good states relative to the natural measure. @@ -500,8 +519,9 @@ print(f"Natural CDF ≤ Risk-neutral CDF at all states: " ## Extracting the Pricing Kernel and Risk Premium -The pricing kernel recovered from $P$ via the Perron–Frobenius theorem has an -intuitive interpretation. In the CRRA model the kernel is proportional to +The pricing kernel recovered from $P$ via the Perron–Frobenius theorem has the following interpretation. + +In the CRRA model the kernel is proportional to $\exp(-\gamma \cdot \text{log-return})$, so it is decreasing in the return. The **equity risk premium** can be computed as the difference between the @@ -664,10 +684,13 @@ to separate the market's genuine fear of catastrophes from the risk premium attached to them. {cite}`barro2006rare` and {cite}`MehraPrescott1985` discuss how rare disasters -might explain the equity premium puzzle. The risk-neutral probability of a +might explain the equity premium puzzle. 
+ +The risk-neutral probability of a large decline is elevated both because (a) the market assigns a high natural probability to such events and (b) the pricing kernel upweights bad outcomes. -Recovery lets us decompose these two forces. + +Ross's Recovery Machinery lets us decompose these two forces. ```{code-cell} ipython3 # Compare left-tail probabilities: P(R < threshold) under each measure @@ -712,7 +735,9 @@ for thresh, label in [(-0.25, '25% decline'), (-0.30, '30% decline'), ``` The risk-neutral density assigns higher probability to large drops than the -recovered natural density. The ratio captures the additional weight from risk +recovered natural density. + +The ratio captures the additional weight from risk aversion — the premium investors demand to bear tail risk. ## Testing Efficient Markets @@ -724,7 +749,9 @@ $$ \sigma(\phi) \geq e^{-rT} \frac{|\mu_\text{excess}|}{\sigma_\text{asset}}, $$ -where $\sigma(\phi)$ is the standard deviation of the pricing kernel. This +where $\sigma(\phi)$ is the standard deviation of the pricing kernel. + +This follows from the Hansen–Jagannathan bound {cite}`Hansen_Jagannathan_1991`. Equivalently, the $R^2$ of any return-forecasting regression using publicly @@ -1023,8 +1050,3 @@ probability of a catastrophe. 
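A minimal two-state sketch illustrates this decomposition; the numbers below are purely illustrative, and a CRRA kernel stands in for whatever preferences generated the state prices:

```python
import numpy as np

# Illustrative numbers: a rare crash state and a normal state
q_nat = np.array([0.02, 0.98])        # natural probabilities
c = np.array([0.7, 1.0])              # consumption in each state
γ = 10                                # CRRA risk aversion

m = c**(-γ)                           # pricing kernel, up to scale
q_rn = q_nat * m / (q_nat * m).sum()  # risk-neutral probabilities

print(f"Natural crash probability:      {q_nat[0]:.3f}")
print(f"Risk-neutral crash probability: {q_rn[0]:.3f}")
print(f"Ratio: {q_rn[0] / q_nat[0]:.1f}x")
```

Even though the natural crash probability is only 2 percent in this example, risk aversion inflates its risk-neutral counterpart to roughly 40 percent, without any change in the market's actual beliefs.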
```{solution-end} ``` -## References - -```{bibliography} -:filter: docname in docnames -``` From c866955eb617b055ec79f41ae031756897a15313 Mon Sep 17 00:00:00 2001 From: Humphrey Yang Date: Thu, 23 Apr 2026 11:45:28 +1000 Subject: [PATCH 12/26] updates --- lectures/information_market_equilibrium.md | 163 ++++++++++---------- lectures/multivariate_normal.md | 16 +- lectures/prob_matrix.md | 170 +++++++-------------- 3 files changed, 146 insertions(+), 203 deletions(-) diff --git a/lectures/information_market_equilibrium.md b/lectures/information_market_equilibrium.md index 5dffc3b90..5dd40fc49 100644 --- a/lectures/information_market_equilibrium.md +++ b/lectures/information_market_equilibrium.md @@ -248,8 +248,8 @@ about $\bar{a}$. ```{prf:lemma} Posterior Sufficiency :label: ime_lemma_posterior_sufficiency -The posterior distribution $\mu_{\tilde{y}}$ -is sufficient for $\tilde{y}$. +The posterior distribution $\mu_{\tilde{y}}$ is a sufficient statistic for +$\tilde{y}$. ``` ```{prf:proof} (Sketch) @@ -275,16 +275,13 @@ belief to price. ```{prf:theorem} Price Revelation :label: ime_theorem_price_revelation -In the economy described above, the price -random variable $p(\mu_{\tilde{y}})$ is sufficient for $\tilde{y}$ **if and only -if** the -belief-to-price map is one-to-one on the realized posterior set $M$, -equivalently if its -inverse is well defined on the price set +In the model outlined above, the price random variable $p(\mu_{\tilde{y}})$ is +sufficient for the random variable $\tilde{y}$ if and only if the function +$p(P^1)$ is invertible on the set of prices $$ -\mathcal{P} \equiv \bigl\{\, p(\mu_y) : y \in Y,\; - P(\tilde{y} = y) = \sum_{a \in A} \phi_a(y)\,\mu_0(a) > 0 \bigr\}. +\mathcal{P} = \Bigl\{\, p(\mu_y) : y \in Y,\; + P(\tilde{y} = y) = \sum_{a \in A} \phi_a(y)\,\mu_0(a) > 0 \Bigr\}. $$ ``` @@ -370,25 +367,32 @@ argument. 
```{prf:lemma} Same Price Implies Same Allocation :label: ime_lemma_same_price_same_allocation -Assume that $u^i$ has continuous first partial derivatives -and that $u^i$ is quasi-concave. Let $p\in\mathcal{P}$. If there exist two measures $\mu^*$ and $\mu'$ in $M$ such that $p(\mu*, P^2, . . . ,P^n), = p(\mu',P^2, ... ,P^n)=p$, then +Assume that $u^i$ has continuous first partial derivatives and that $u^i$ is +quasi-concave. + +Let $p \in \mathcal{P}$. + +If there exist two measures $\mu^*$ and $\mu'$ in $M$ such that +$p(\mu^*, P^2, \ldots, P^n) = p(\mu', P^2, \ldots, P^n) = p$, then $$ -x^i(\mu^*, P^2, \dots, P^n) = x^i(\mu', P^2, \dots, P^n), \quad -i = 1, \dots, n +x^i(\mu^*, P^2, \ldots, P^n) = x^i(\mu', P^2, \ldots, P^n), \quad +i = 1, \ldots, n. $$ ``` -This lemma says that fix the beliefs of all agents except agent 1. +Fix the beliefs of all agents except agent 1. -If two posterior beliefs $\mu$ and $\mu'$ -both generate the same equilibrium price $p$, then they generate the same -equilibrium -allocation for every trader. +The lemma says that if two posterior beliefs $\mu^*$ and $\mu'$ for agent 1 +both support the same equilibrium price $p$, then they support the same +equilibrium allocation for every trader. + +The intuition is that when the price is unchanged, the demands of the +uninformed traders are unchanged too, so market clearing forces the informed +agent's bundle to be unchanged as well. This lemma lets us define the informed agent's equilibrium bundle as a function -of price -alone: +of price alone: $$ x(p) = (x_1(p), x_2(p)). @@ -410,18 +414,24 @@ whether this equation admits a unique posterior $\mu$. ```{prf:lemma} Unique Posterior at a Given Price :label: ime_lemma_unique_posterior -If, for each price $p \in P$, the first-order condition above has a unique -solution -$\mu \in M$, then the price map is invertible on $P$. +Assume that the first partial derivatives of $u^1$ exist and that $u^1$ is +quasi-concave. 
+ +Also assume that agent 1 always consumes positive quantities of both goods. + +Then $p(P^1)$ is invertible on $\mathcal{P}$ if for each $p \in \mathcal{P}$ +there exists a unique probability measure $\mu \in M$ such that + +$$ +\frac{\sum_{s=1}^S a_s\, u^1_1(a_s x_1(p), x_2(p))\, \mu(a_s)} + {\sum_{s=1}^S u^1_2(a_s x_1(p), x_2(p))\, \mu(a_s)} = p. +$$ ``` -If two different posteriors gave the same price, -then by +If two different posteriors gave the same price, then by {prf:ref}`ime_lemma_same_price_same_allocation` they would share the same bundle -$x(p)$, -contradicting uniqueness of the posterior that solves the first-order condition -at that -price. +$x(p)$, contradicting uniqueness of the posterior that solves the first-order +condition at that price. ### The two-state first-order condition @@ -448,27 +458,26 @@ equation. ```{prf:theorem} Invertibility Conditions :label: ime_theorem_invertibility_conditions -Assume $u^1$ is quasi-concave and -homothetic with continuous first partials. +Assume that the first partial derivatives of $u^1$ exist and that $u^1$ is +quasi-concave and homothetic. -Assume agent 1 always consumes positive -quantities of both goods. +Also suppose that the informed agent always consumes positive quantities of +both goods in all equilibrium allocations. -For $S = 2$: +If $S = 2$ and the elasticity of substitution of $u^1$ is either always less +than one or always greater than one, then $p(P^1)$ is invertible on +$\mathcal{P}$. -- If $\sigma < 1$ for all feasible allocations, the price map is **invertible** - on $P$. -- If $\sigma > 1$ for all feasible allocations, the price map is **invertible** - on $P$. -- If $u^1$ is **Cobb-Douglas** ($\sigma = 1$), the price map is **constant** on - $P$ - (no information is transmitted). +If $u^1$ is Cobb-Douglas (elasticity of substitution constant and equal to +one), then $p(P^1)$ is constant on $\mathcal{P}$. 
``` -Thus, when $\sigma = 1$ the income and substitution effects exactly cancel, -making agent 1's demand for good 1 independent of information about $\bar{a}$. +When $\sigma = 1$ the income and substitution effects exactly cancel, so +agent 1's demand for good 1 does not respond to changes in beliefs about +$\bar{a}$. -So the market price cannot reveal that information. +Because the demand is unchanged, the market-clearing price is unchanged too, +and the price reveals nothing about the insider's signal. ### CES utility @@ -592,14 +601,14 @@ plt.show() The plot confirms {prf:ref}`ime_theorem_invertibility_conditions`. -- *CES with $\sigma \neq 1$*: the equilibrium price is *strictly monotone* in - $q$. +For CES with $\sigma \neq 1$, the equilibrium price is strictly monotone in $q$. + +An outside observer who knows the equilibrium map $p^*(\cdot)$ can therefore +invert the price uniquely to recover $q$, so the inside information is fully +transmitted. - - An outside observer who knows the equilibrium map $p^*(\cdot)$ can uniquely - invert the - price to recover $q$, that is, inside information is fully transmitted. -- *Cobb-Douglas ($\sigma = 1$)*: the price is *flat* in $q$, that is, information is never - transmitted through the market. +For Cobb-Douglas ($\sigma = 1$), the price is flat in $q$, so information is +never transmitted through the market. ```{code-cell} ipython3 p_cd = [eq_price(q, a1, a2, W1, ρ=0.0) for q in q_grid] @@ -620,15 +629,16 @@ pattern back to the proof of {prf:ref}`ime_theorem_invertibility_conditions`. (price_monotonicity)= ### Why monotonicity depends on $\sigma$ -The key derivative in the paper fixes a price $p$, treats $\alpha_s(p)$ and -$\beta_s(p)$ as constants, and then differentiates the right-hand side of +Fix a price $p$ and treat $\alpha_s(p)$ and $\beta_s(p)$ as constants. 
+ +The right-hand side of the two-state first-order condition $$ \frac{\alpha_1(p)\, q + \alpha_2(p)\, (1-q)} {\beta_1(p)\, q + \beta_2(p)\, (1-q)} $$ -is a function of $q$ whose derivative is +is then a function of $q$ alone, with derivative $$ \frac{\partial}{\partial q} @@ -705,12 +715,12 @@ plt.tight_layout() plt.show() ``` -When $\sigma = 1$ the ratio is constant across all $a_s$ values, information -about the state has no effect on the marginal rate of substitution. +When $\sigma = 1$ the ratio is constant across all $a_s$ values, so +information about the state has no effect on the marginal rate of substitution. -For $\sigma < 1$ the -ratio is decreasing in $a_s$, and for $\sigma > 1$ it is increasing, making the -equilibrium price strictly monotone in the posterior $q$ in both cases. +For $\sigma < 1$ the ratio is decreasing in $a_s$, and for $\sigma > 1$ it is +increasing, making the equilibrium price strictly monotone in the posterior $q$ +in both cases. The static analysis asks whether a current price reveals current private information, whereas the next section asks what a whole history of prices @@ -1201,11 +1211,10 @@ that theorem. :class: dropdown ``` -**1. First-order condition.** +For the first-order condition, define $W_s = w + (a_s - p)\,x_1$ for +$s = 1, 2$. -Define $W_s = w + (a_s - p)\,x_1$ for $s=1,2$. - -The FOC is +Then the FOC is $$ q\,(a_1 - p)\,\gamma\, e^{-\gamma W_1} @@ -1219,11 +1228,8 @@ q\,(a_1 - p)\, e^{-\gamma(a_1-p) x_1} = (1-q)\,(p - a_2)\, e^{\gamma(p-a_2) x_1}. $$ -**2. Market-clearing equilibrium price.** - -Setting $x_1 = 1$ (all supply absorbed by informed agent), the equation becomes - -a scalar root-finding problem in $p$: +Setting $x_1 = 1$ (the informed agent absorbs all supply), this becomes a +scalar root-finding problem in $p$: $$ F(p;\,q,\gamma) \equiv @@ -1259,16 +1265,15 @@ plt.tight_layout() plt.show() ``` -**3. Invertibility for CARA.** +The price is strictly increasing in $q$ for every $\gamma > 0$. 
-The price is strictly increasing in $q$ for every $\gamma > 0$, because -portfolio utility $u(x_2 + \bar{a}\,x_1)$ treats the two goods as **perfect -substitutes** in creating wealth, so a higher posterior probability of the -high-return state raises the marginal value of the risky asset and pushes the -equilibrium price upward. +The reason is that portfolio utility $u(x_2 + \bar{a}\,x_1)$ treats the two +goods as perfect substitutes in creating wealth, so a higher posterior +probability of the high-return state raises the marginal value of the risky +asset and pushes the equilibrium price upward. This behavior is similar in spirit to the $\sigma > 1$ case in -{prf:ref}`ime_theorem_invertibility_conditions`, but it is *not* a direct +{prf:ref}`ime_theorem_invertibility_conditions`, but it is not a direct consequence of that theorem because CARA utility over wealth is not homothetic in the two-good representation used in the theorem. @@ -1486,10 +1491,10 @@ D_{KL}\bigl(N(2.0, 0.4^2)\,\|\,N(2.3, 0.4^2)\bigr) D_{KL}\bigl(N(2.0, 0.4^2)\,\|\,N(1.5, 0.4^2)\bigr), $$ -so the model with mean $2.3$ is the KL-best approximation among the two -wrong models, and in the simulation posterior weight concentrates on that model. +so the model with mean $2.3$ is the KL-best approximation among the two wrong +models, and in the simulation posterior weight concentrates on that model. -Since posterior odds are cumulative {doc}`likelihood ratios`. +Posterior odds are cumulative {doc}`likelihood ratios`. 
If we compare the two wrong Gaussian models $f$ and $g$, then under the true distribution $h$ the average log likelihood ratio satisfies diff --git a/lectures/multivariate_normal.md b/lectures/multivariate_normal.md index a3be75575..70f361ae9 100644 --- a/lectures/multivariate_normal.md +++ b/lectures/multivariate_normal.md @@ -1003,7 +1003,7 @@ w_{6} $$ where -$w \begin{bmatrix} w_1 \cr w_2 \cr \vdots \cr w_6 \end{bmatrix}$ +$w = \begin{bmatrix} w_1 \cr w_2 \cr \vdots \cr w_6 \end{bmatrix}$ is a standard normal random vector. We construct a Python function `construct_moments_IQ2d` to construct @@ -1066,7 +1066,7 @@ multi_normal_IQ2d.partition(k) multi_normal_IQ2d.cond_dist(1, [*y1, *y2]) ``` -Now let’s compute distributions of $\theta$ and $\mu$ +Now let’s compute distributions of $\theta$ and $\eta$ separately conditional on various subsets of test scores. It will be fun to compare outcomes with the help of an auxiliary function @@ -1423,7 +1423,7 @@ This example is an instance of what is known as a **Wold representation** in tim Consider the stochastic second-order linear difference equation $$ -y_{t} = \alpha_{0} + \alpha_{1} y_{y-1} + \alpha_{2} y_{t-2} + u_{t} +y_{t} = \alpha_{0} + \alpha_{1} y_{t-1} + \alpha_{2} y_{t-2} + u_{t} $$ where $u_{t} \sim N \left(0, \sigma_{u}^{2}\right)$ and @@ -1518,7 +1518,6 @@ $$ ```{code-cell} python3 # set parameters -T = 80 T = 160 # coefficients of the second order difference equation 𝛼0 = 10 @@ -1526,7 +1525,6 @@ T = 160 𝛼2 = -.9 # variance of u -σu = 1. σu = 10. # distribution of y_{-1} and y_{0} @@ -1840,7 +1838,7 @@ of $x_t$ conditional on $y_0, y_1, \ldots , y_{t-1} = y^{t-1}$ is $$ -x_t | y^{t-1} \sim {\mathcal N}(A \tilde x_t , A \tilde \Sigma_t A' + C C' ) +x_t | y^{t-1} \sim {\mathcal N}(A \tilde x_{t-1} , A \tilde \Sigma_{t-1} A' + C C' ) $$ where $\{\tilde x_t, \tilde \Sigma_t\}_{t=1}^\infty$ can be @@ -2015,7 +2013,7 @@ $\Lambda \Lambda^\top$ of rank $k$. 
This means that all covariances among the $n$ components of the $Y$ vector are intermediated by their common dependencies on the -$k<$ factors. +$k$ factors. Form @@ -2277,8 +2275,8 @@ $Y$ on the first two principal components does a good job of approximating $Ef \mid y$. We confirm this in the following plot of $f$, -$E y \mid f$, $E f \mid y$, and $\hat{y}$ on the -coordinate axis versus $y$ on the ordinate axis. +$E y \mid f$, $E f \mid y$, and $\hat{y}$ against the +observation index on the horizontal axis. ```{code-cell} python3 plt.scatter(range(N), Λ @ f, label='$Ey|f$') diff --git a/lectures/prob_matrix.md b/lectures/prob_matrix.md index bc5ac21f4..3a63a54d3 100644 --- a/lectures/prob_matrix.md +++ b/lectures/prob_matrix.md @@ -53,6 +53,8 @@ As usual, we'll start with some imports import numpy as np import matplotlib.pyplot as plt import prettytable as pt +from scipy import stats +from scipy.special import comb from mpl_toolkits.mplot3d import Axes3D from matplotlib_inline.backend_inline import set_matplotlib_formats set_matplotlib_formats('retina') @@ -300,7 +302,7 @@ Note that a sufficient statistic corresponds to a particular statistical model. Sufficient statistics are key tools that AI uses to summarize or compress a **big data** set. -R. A. Fisher provided a rigorous definition of **information** -- see . +R. A. Fisher provided a rigorous definition of **information** -- see [Fisher information](https://en.wikipedia.org/wiki/Fisher_information). @@ -370,7 +372,7 @@ $$ ## Marginal probability distributions -The joint distribution induce marginal distributions +The joint distribution induces marginal distributions $$ \textrm{Prob}\{X=i\}= \sum_{j=0}^{J-1}f_{ij} = \mu_i, \quad i=0,\ldots,I-1 @@ -433,7 +435,7 @@ where $i=0, \ldots,I-1, \quad j=0,\ldots,J-1$. 
Note that $$ -\sum_{i}\textrm{Prob}\{X_i=i|Y_j=j\} +\sum_{i}\textrm{Prob}\{X=i|Y=j\} =\frac{ \sum_{i}f_{ij} }{ \sum_{i}f_{ij}}=1 $$ @@ -444,7 +446,11 @@ $$ $$ (eq:condprobbayes) ```{note} -Formula {eq}`eq:condprobbayes` is also what a Bayesian calls **Bayes' Law**. A Bayesian statistician regards marginal probability distribution $\textrm{Prob}({X=i}), i = 0, \ldots, I-1$ as a **prior** distribution that describes his personal subjective beliefs about $X$. +Formula {eq}`eq:condprobbayes` is also what a Bayesian calls **Bayes' Law**. + +A Bayesian statistician regards marginal probability distribution $\textrm{Prob}({X=i}), i = 0, \ldots, I-1$ as a **prior** distribution that describes his personal subjective beliefs about $X$. + + He then interprets formula {eq}`eq:condprobbayes` as a procedure for constructing a **posterior** distribution that describes how he would revise his subjective beliefs after observing that $Y$ equals $j$. ``` @@ -491,8 +497,8 @@ where $$ \left[ \begin{matrix} - p_{11} & p_{12}\\ - p_{21} & p_{22} + p_{00} & p_{01}\\ + p_{10} & p_{11} \end{matrix} \right] $$ @@ -519,7 +525,7 @@ Suppose that $$ \begin{aligned} -\text{Prob} \{X(0)=i,X(1)=j\} &=f_{ij}≥0,i=0,\cdots,I-1\\ +\text{Prob} \{X(0)=i,X(1)=j\} &=f_{ij}\geq 0, \quad i=0,\cdots,I-1, \quad j=0,\cdots,J-1\\ \sum_{i}\sum_{j}f_{ij}&=1 \end{aligned} $$ @@ -545,8 +551,8 @@ where $$ \begin{aligned} -\textrm{Prob}\{X=i\} &=f_i\ge0, \sum{f_i}=1 \cr -\textrm{Prob}\{Y=j\} & =g_j\ge0, \sum{g_j}=1 +\textrm{Prob}\{X=i\} &=f_i\ge 0, \quad \sum_{i}{f_i}=1 \cr +\textrm{Prob}\{Y=j\} & =g_j\ge 0, \quad \sum_{j}{g_j}=1 \end{aligned} $$ @@ -572,7 +578,7 @@ $$ \end{aligned} $$ -A continuous random variable having density $f_{X}(x)$) has mean and variance +A continuous random variable having density $f_{X}(x)$ has mean and variance $$ \begin{aligned} @@ -1136,10 +1142,10 @@ Start with a joint distribution $$ \begin{aligned} f_{ij} & =\textrm{Prob}\{X=i,Y=j\}\\ -i& =0, \cdots,I-1\\ -j& =0, \cdots,J-1\\ -& 
\text{stacked to an }I×J\text{ matrix}\\ -& e.g. \quad I=1, J=1 +i& =0, \cdots, I-1\\ +j& =0, \cdots, J-1\\ +& \text{stacked to an }I\times J\text{ matrix}\\ +& e.g. \quad I=2, J=2 \end{aligned} $$ @@ -1148,8 +1154,8 @@ where $$ \left[ \begin{matrix} - f_{11} & f_{12}\\ - f_{21} & f_{22} + f_{00} & f_{01}\\ + f_{10} & f_{11} \end{matrix} \right] $$ @@ -1158,7 +1164,7 @@ From the joint distribution, we have shown above that we obtain **unique** marg Now we'll try to go in a reverse direction. -We'll find that from two marginal distributions, can we usually construct more than one joint distribution that verifies these marginals. +We'll find that from two marginal distributions we can usually construct more than one joint distribution that satisfies these marginals. Each of these joint distributions is called a **coupling** of the two marginal distributions. @@ -1171,7 +1177,7 @@ $$ \end{aligned} $$ -Given two marginal distribution, $\mu$ for $X$ and $\nu$ for $Y$, a joint distribution $f_{ij}$ is said to be a **coupling** of $\mu$ and $\nu$. +Given two marginal distributions, $\mu$ for $X$ and $\nu$ for $Y$, a joint distribution $f_{ij}$ is said to be a **coupling** of $\mu$ and $\nu$. Consider the following bivariate example. @@ -1187,7 +1193,7 @@ $$ We construct two couplings. -The first coupling if our two marginal distributions is the joint distribution +The first coupling of our two marginal distributions is the joint distribution $$f_{ij}= \left[ @@ -1223,7 +1229,7 @@ f_{ij}= \right] $$ -The verify that this is a coupling, note that +To verify that this is a coupling, note that $$ \begin{aligned} @@ -1258,6 +1264,7 @@ H(x_1,x_2,\dots,x_N) = C(F_1(x_1), F_2(x_2),\dots,F_N(x_N)). $$ If the marginal distributions are continuous, then the copula is unique. 
+ In that case, we can recover it from the marginal inverses: $$ @@ -1365,7 +1372,7 @@ draws1 = 1_000_000 # generate draws from uniform distribution p = np.random.rand(draws1) -# generate draws of first copuling via uniform distribution +# generate draws of first coupling via uniform distribution c1 = np.vstack([np.ones(draws1), np.ones(draws1)]) # X=0, Y=0 c1[0, p <= f1_cum[0]] = 0 @@ -1440,7 +1447,7 @@ draws2 = 1_000_000 # generate draws from uniform distribution p = np.random.rand(draws2) -# generate draws of first coupling via uniform distribution +# generate draws of second coupling via uniform distribution c2 = np.vstack([np.ones(draws2), np.ones(draws2)]) # X=0, Y=0 c2[0, p <= f2_cum[0]] = 0 @@ -1464,7 +1471,7 @@ f2_10 = sum((c2[0, :] == 1)*(c2[1, :] == 0))/draws2 f2_11 = sum((c2[0, :] == 1)*(c2[1, :] == 1))/draws2 # print output of second joint distribution -print("first joint distribution for c2") +print("second joint distribution for c2") c2_mtb = pt.PrettyTable() c2_mtb.field_names = ['c2_x_value', 'c2_y_value', 'c2_prob'] c2_mtb.add_row([0, 0, f2_00]) @@ -1507,7 +1514,8 @@ arbitrary marginal distributions. The construction has three steps: 1. Draw $(Z_1, Z_2)$ from a bivariate standard normal with correlation $\rho$. -2. Apply the standard normal CDF: $U_k = \Phi(Z_k)$. The pair $(U_1, U_2)$ has uniform marginals but retains the dependence structure of $(Z_1, Z_2)$ — this is the copula. +2. Apply the standard normal CDF: $U_k = \Phi(Z_k)$. + - The pair $(U_1, U_2)$ has uniform marginals but retains the dependence structure of $(Z_1, Z_2)$ --- this is the copula. 3. Apply the inverse CDF of any desired marginal: $X_k = F_k^{-1}(U_k)$. The following code illustrates this with exponential marginals. 
@@ -1519,22 +1527,21 @@ mystnb: caption: gaussian copula with exponential marginals name: fig-gaussian-copula --- -from scipy import stats # Gaussian copula parameters ρ_cop = 0.8 n_cop = 100_000 -# Step 1: draw from bivariate standard normal with correlation ρ_cop +# Draw from bivariate standard normal with correlation ρ_cop z = np.random.multivariate_normal( [0, 0], [[1, ρ_cop], [ρ_cop, 1]], n_cop ) -# Step 2: apply normal CDF -> uniform marginals (the copula itself) +# Apply normal CDF -> uniform marginals (the copula itself) u1 = stats.norm.cdf(z[:, 0]) u2 = stats.norm.cdf(z[:, 1]) -# Step 3: apply inverse CDFs of desired marginals (here: Exponential) +# Apply inverse CDFs of desired marginals (here: Exponential) x1 = stats.expon.ppf(u1, scale=1.0) # Exp with mean 1 x2 = stats.expon.ppf(u2, scale=0.5) # Exp with mean 0.5 @@ -1545,7 +1552,6 @@ axes[0].set_ylabel('$u_2$') axes[1].scatter(x1[:3000], x2[:3000], alpha=0.2, s=2) axes[1].set_xlabel('$x_1$ (Exp, mean=1)') axes[1].set_ylabel('$x_2$ (Exp, mean=0.5)') -plt.tight_layout() plt.show() print(f"Sample correlation of (x1, x2): {np.corrcoef(x1, x2)[0, 1]:.3f}") @@ -1587,8 +1593,6 @@ where $X \in \{0,1\}$ and $Y \in \{10, 20\}$. 
``` ```{code-cell} ipython3 -import numpy as np - F = np.array([[0.3, 0.2], [0.1, 0.4]]) @@ -1601,7 +1605,7 @@ F_indep = np.outer(μ, ν) print("\nIndependence matrix (outer product):\n", F_indep) print("\nActual joint F:\n", F) -print("\nIndependent (F == μ ⊗ ν)?", np.allclose(F, F_indep)) +print("\nIndependent (F == μ times ν)?", np.allclose(F, F_indep)) prob_X0_given_Y10 = F[0, 0] / ν[0] print(f"\nProb(X=0 | Y=10) = {prob_X0_given_Y10:.4f}") @@ -1632,8 +1636,6 @@ Using the same joint distribution $F$ and values $X \in \{0,1\}$, $Y \in \{10, 2 ``` ```{code-cell} ipython3 -import numpy as np - xs = np.array([0, 1]) ys = np.array([10, 20]) F = np.array([[0.3, 0.2], @@ -1672,17 +1674,17 @@ and therefore $\text{Cov}(X,Y) = \mathbb{E}[XY] - \mathbb{E}[X]\mathbb{E}[Y] = 0 ```{exercise} :label: prob_matrix_ex3 -**Sum of Two Dice (Convolution)** +**Sum of Two Dice** Let $X$ and $Y$ each be uniformly distributed on $\{1,2,3,4,5,6\}$, and let $Z = X + Y$. 1. Use the convolution formula $h_k = \sum_i f_i g_{k-i}$ to compute the distribution of $Z$. -1. Plot the theoretical distribution. +1. Plot the result generated by the formula. 1. Simulate $10^6$ rolls and overlay the empirical histogram on the plot. -1. Compute $\mathbb{E}[Z]$ and $\text{Var}(Z)$ both from the theoretical distribution and from the simulation. +1. Compute $\mathbb{E}[Z]$ and $\text{Var}(Z)$ from the two calculations ``` ```{solution-start} prob_matrix_ex3 @@ -1690,11 +1692,14 @@ Let $X$ and $Y$ each be uniformly distributed on $\{1,2,3,4,5,6\}$, and let $Z = ``` ```{code-cell} ipython3 -import numpy as np -import matplotlib.pyplot as plt - f = np.ones(6) / 6 -h = np.convolve(f, f) +g = np.ones(6) / 6 +h = [ + sum(f[i]*g[k-i] for i in range( + max(0, k-len(g)+1), # f_i exists + min(len(f), k+1)) # g_{k-i} exists + ) + for k in range(len(f) + len(g) - 1)] z_vals = np.arange(2, 13) n = 1_000_000 @@ -1743,10 +1748,8 @@ where $p_{ij} = \text{Prob}\{X(t+1)=j \mid X(t)=i\}$. 
``` ```{code-cell} ipython3 -import numpy as np - -P = np.array([[0.9, 0.1], - [0.2, 0.8]]) +P = np.array([[0.9, 0.1], + [0.2, 0.8]]) ψ0 = np.array([1.0, 0.0]) for n in [1, 5, 20, 100]: @@ -1767,68 +1770,6 @@ print(f"ψ_100 close to stationary? {np.allclose(ψ_100, ψ_star, atol=1e-6)}") ```{exercise} :label: prob_matrix_ex5 -**Fréchet–Hoeffding Bounds** - -Let $X \in \{0,1\}$ and $Y \in \{0,1\}$ with marginals $\mu = [0.5,\, 0.5]$ and $\nu = [0.4,\, 0.6]$. - -1. Construct the **comonotone** (upper Fréchet) coupling that puts as much mass as possible on the diagonal $\{X=i, Y=i\}$. - -1. Construct the **counter-monotone** (lower Fréchet) coupling that puts as much mass as possible on the anti-diagonal. - -1. Construct the **independent** coupling $f^{\perp}_{ij} = \mu_i \nu_j$. - -1. Verify that all three have the correct marginals. - -1. For each coupling compute $\text{Cor}(X,Y)$. Which maximises / minimises the correlation? -``` - -```{solution-start} prob_matrix_ex5 -:class: dropdown -``` - -```{code-cell} ipython3 -import numpy as np - -xs = np.array([0, 1]) -ys = np.array([0, 1]) -μ = np.array([0.5, 0.5]) -ν = np.array([0.4, 0.6]) - -F_upper = np.array([[0.4, 0.1], - [0.0, 0.5]]) - -F_lower = np.array([[0.0, 0.5], - [0.4, 0.1]]) - -F_indep = np.outer(μ, ν) - -for F, name in [(F_upper, "Upper Fréchet"), - (F_lower, "Lower Fréchet"), - (F_indep, "Independent ")]: - print(f"{name}: row sums = {F.sum(axis=1)}, col sums = {F.sum(axis=0)}") - -def correlation(F, xs, ys): - μ_x = F.sum(axis=1) - ν_y = F.sum(axis=0) - E_X = xs @ μ_x - E_Y = ys @ ν_y - E_XY = sum(xs[i]*ys[j]*F[i,j] for i in range(2) for j in range(2)) - cov = E_XY - E_X * E_Y - σ_X = np.sqrt(((xs - E_X)**2) @ μ_x) - σ_Y = np.sqrt(((ys - E_Y)**2) @ ν_y) - return cov / (σ_X * σ_Y) - -print(f"\nCor upper Fréchet = {correlation(F_upper, xs, ys):.4f} (maximum)") -print(f"Cor lower Fréchet = {correlation(F_lower, xs, ys):.4f} (minimum)") -print(f"Cor independent = {correlation(F_indep, xs, ys):.4f}") -``` - 
-```{solution-end} -``` - -```{exercise} -:label: prob_matrix_ex6 - **Bayes' Law with a Discrete Prior** A coin has unknown bias $\theta \in \{0.2,\, 0.5,\, 0.8\}$ with prior $\pi = [0.25,\, 0.50,\, 0.25]$. @@ -1848,15 +1789,11 @@ for each $\theta$. 1. Repeat for $k = 3$ heads and describe how the posterior shifts. ``` -```{solution-start} prob_matrix_ex6 +```{solution-start} prob_matrix_ex5 :class: dropdown ``` ```{code-cell} ipython3 -import numpy as np -import matplotlib.pyplot as plt -from scipy.special import comb - θ_vals = np.array([0.2, 0.5, 0.8]) π = np.array([0.25, 0.50, 0.25]) @@ -1868,13 +1805,16 @@ def compute_posterior(k, n, θ_vals, π): post7, lik7 = compute_posterior(7, 10, θ_vals, π) post3, lik3 = compute_posterior(3, 10, θ_vals, π) -print("k=7: likelihood =", lik7.round(4), " posterior =", post7.round(4)) -print("k=3: likelihood =", lik3.round(4), " posterior =", post3.round(4)) +print("k=7: likelihood =", lik7.round(4), + " posterior =", post7.round(4)) +print("k=3: likelihood =", lik3.round(4), + " posterior =", post3.round(4)) x = np.arange(len(θ_vals)) w = 0.3 fig, axes = plt.subplots(1, 2, figsize=(10, 4)) -for ax, post, title in zip(axes, [post7, post3], ['k=7 heads', 'k=3 heads']): +for ax, post, title in zip( + axes, [post7, post3], ['k=7 heads', 'k=3 heads']): ax.bar(x - w/2, π, w, label='Prior', alpha=0.7) ax.bar(x + w/2, post, w, label='Posterior', alpha=0.7) ax.set_xticks(x) From b8aed808ea763bf54a61abc6c486b2eaa2585ed0 Mon Sep 17 00:00:00 2001 From: Humphrey Yang Date: Thu, 23 Apr 2026 12:04:43 +1000 Subject: [PATCH 13/26] updates --- lectures/misspecified_recovery.md | 33 ++++++++++------ lectures/ross_recovery.md | 65 ++++++++++++++++++------------- 2 files changed, 61 insertions(+), 37 deletions(-) diff --git a/lectures/misspecified_recovery.md b/lectures/misspecified_recovery.md index 60eabc320..f1a980228 100644 --- a/lectures/misspecified_recovery.md +++ b/lectures/misspecified_recovery.md @@ -85,7 +85,7 @@ plt.rcParams.update({ 
}) ``` -## Arrow Prices and the Identification Challenge +## Arrow prices and the identification challenge ### Arrow prices and stochastic discount factors @@ -156,7 +156,7 @@ print(np.round(Q_mat, 5)) print(f"\nSum of each row (= price of risk-free bond): {Q_mat.sum(axis=1).round(5)}") ``` -## Risk-Neutral Probabilities +## Risk-neutral probabilities The **risk-neutral restriction** sets @@ -195,13 +195,16 @@ print(f"\nRow sums: {P_bar.sum(axis=1)}") ``` ```{note} -Risk-neutral probabilities absorb **one-period** (short-run) risk adjustments. They are +Risk-neutral probabilities absorb **one-period** (short-run) risk adjustments. + +They are widely used in financial engineering but are generally *not* equal to investors' beliefs. + When short-term interest rates vary across states, risk-neutral probabilities are also horizon-dependent: the $t$-period forward measure differs from $\bar{\mathbf{P}}^t$. ``` -## Long-Term Risk-Neutral Probabilities: Perron–Frobenius Theory +## Long-term risk-neutral probabilities: Perron–Frobenius theory ### The eigenvalue problem @@ -370,11 +373,13 @@ measure $\mathbf{P}$. This is the risk adjustment for long-run growth uncertainty: a risk-averse investor's long-run discount rates embed a premium for permanent income risk. -## The Martingale Decomposition +## The martingale decomposition ### Decomposing the SDF process -Let $\hat{\mathbf{e}}$ and $\hat{\eta}$ solve the Perron–Frobenius problem. Define the +Let $\hat{\mathbf{e}}$ and $\hat{\eta}$ solve the Perron–Frobenius problem. + +Define the process $$ @@ -431,7 +436,7 @@ recovered measure assigns growing probability to the recession state. (Gigures illustrating this will appear below, after we define the Epstein–Zin utility function that is needed to compute them.) -## When Does Recovery Succeed? +## When does recovery succeed? ### The Ross recovery condition @@ -454,7 +459,9 @@ $$ s_{ij} = \exp(\hat{\eta}) \frac{\hat{e}_i}{\hat{e}_j} $$ -with $\hat{h}_{ij} \equiv 1$. 
In this case $\hat{\mathbf{P}} = \mathbf{P}$ and the +with $\hat{h}_{ij} \equiv 1$. + +In this case $\hat{\mathbf{P}} = \mathbf{P}$ and the Perron–Frobenius procedure recovers the true probabilities. The critical question is: when is the martingale component degenerate? @@ -682,7 +689,7 @@ martingale is trivial, so recovery succeeds. For $\gamma > 1$, continuation values vary with the state, generating a non-degenerate martingale that grows with risk aversion. -## The Long-Run Risk Model +## The long-run risk model We now illustrate the results quantitatively using the Bansal–Yaron {cite}`Bansal_Yaron_2004` long-run risk model, calibrated to {cite}`BorovickaHansenScheinkman2016` @@ -1030,7 +1037,7 @@ Forecasts made using $\hat{P}$ are systematically pessimistic compared to forecasts based on the true distribution $P$. -## Measuring the Martingale Component +## Measuring the martingale component ### Entropy bounds @@ -1106,6 +1113,7 @@ plt.tight_layout(); plt.show() All three discrepancy measures increase with risk aversion, confirming that a higher $\gamma$ implies a larger — and more economically significant — martingale component. + {cite}`AlvarezJermann2005` and {cite}`BakshiChabiYo2012` use analogous bounds with long-maturity bond returns to find empirically large martingale components in U.S. data. @@ -1335,7 +1343,10 @@ $\hat{h}_{ij} \equiv 1$. **Analytical derivation:** With $s_{ij} = A \cdot (c_j/c_i)^{-\gamma}$ we have $q_{ij} = A(c_j/c_i)^{-\gamma} p_{ij}$. -Guess $\hat{e}_j = c_j^\gamma$. Then + +Guess $\hat{e}_j = c_j^\gamma$. 
+ +Then $$ [\mathbf{Q} \hat{\mathbf{e}}]_i diff --git a/lectures/ross_recovery.md b/lectures/ross_recovery.md index 242c7e08c..f1034171a 100644 --- a/lectures/ross_recovery.md +++ b/lectures/ross_recovery.md @@ -77,9 +77,9 @@ plt.rcParams['axes.grid'] = True plt.rcParams['grid.alpha'] = 0.3 ``` -## Model Setup +## Model setup -### Arrow–Debreu State Prices +### Arrow–Debreu state prices Consider a discrete-time, discrete-state economy. @@ -103,7 +103,7 @@ $$ The row sums give the state-dependent interest factor: $\sum_j p(\theta_i, \theta_j) = e^{-r(\theta_i)}$. -### The Pricing Kernel +### The pricing kernel From the Fundamental Theorem of Asset Pricing, the pricing kernel $\phi(\theta_i, \theta_j)$ relates state prices to natural probabilities via @@ -125,7 +125,7 @@ $$ The key structural property this implies is **transition independence**. -### Transition Independence +### Transition independence **Definition.** A pricing kernel is *transition independent* if there exists a @@ -163,9 +163,9 @@ $$ F = \frac{1}{\delta} D P D^{-1}. $$ -## The Recovery Theorem +## The recovery theorem -### Reduction to an Eigenvalue Problem +### Reduction to an eigenvalue problem Since $F$ is a stochastic matrix, its rows sum to one: $F e = e$ where $e$ is the vector of ones. @@ -191,7 +191,7 @@ eigenvalue. Section 1.2.3 of {cite}`Sargent_Stachurski_2024` provides a proof of this theorem as well as a discussion of its applications to economic networks. -### Ross's Recovery Theorem +### Ross's recovery theorem **Theorem 1 (Recovery Theorem, {cite}`Ross2015`).** Suppose prices provide no @@ -217,10 +217,12 @@ f_{ij} = \frac{1}{\delta} \frac{z_j}{z_i} \, p_{ij}. $$ One can verify that $F$ is indeed a stochastic matrix: all entries are -positive and each row sums to one. Uniqueness follows from the uniqueness of +positive and each row sums to one. + +Uniqueness follows from the uniqueness of the Perron–Frobenius eigenvector. 
$\blacksquare$ -### Pricing Kernel from the Eigenvector +### Pricing kernel from the eigenvector The recovered kernel values are @@ -230,11 +232,12 @@ $$ $$ so the kernel at state $\theta_i$ (relative to a baseline state) is $1/z_i$. + States with high $z_i$ have **low** kernel values, meaning the market assigns relatively less pricing weight per unit of probability — consistent with those states being "good times." -### Corollary: Risk-Neutral Pricing When Rates Are State-Independent +### Corollary: risk-neutral pricing when rates are state-independent **Theorem 2 ({cite}`Ross2015`).** If the riskless rate is the same in all @@ -247,11 +250,11 @@ This remarkable result says that with a constant interest rate and a bounded irreducible state space, recovery forces risk-neutrality — a non-trivial restriction of the model. -## Python Implementation +## Python implementation We now implement the Recovery Theorem numerically. -### Building a State Price Matrix from a Lognormal Model +### Building a state price matrix from a lognormal model Following {cite}`Ross2015` Section IV, suppose the natural distribution of log-returns over one period is normal: @@ -335,7 +338,7 @@ print(np.round(P.sum(axis=1), 4)) print(f"\nImplied annual interest rate: {-np.log(P[5].sum()):.4f}") ``` -### Applying the Recovery Theorem +### Applying the recovery theorem The Recovery Theorem requires computing the **dominant eigenvector** of $P$. @@ -399,7 +402,7 @@ print(f"\nNatural probability matrix F (row sums should be 1):") print(np.round(F.sum(axis=1), 6)) ``` -### Visualizing Natural vs. Risk-Neutral Distributions +### Visualizing natural vs. risk-neutral distributions A key insight of {cite}`Ross2015` is that the natural distribution systematically differs from the risk-neutral one. 
@@ -486,7 +489,7 @@ print(f"{'Annual risk-free rate':30s} {-np.log(risk_free):>12.4f}") print(f"{'Equity risk premium':30s} {E_nat - 1/risk_free:>12.4f}") ``` -### Stochastic Dominance +### Stochastic dominance Theorem 3 of {cite}`Ross2015` shows that the natural marginal density **first-order stochastically dominates** the risk-neutral density: the CDF of @@ -517,7 +520,7 @@ print(f"Natural CDF ≤ Risk-neutral CDF at all states: " f"{np.all(cdf_nat <= cdf_rn + 1e-10)}") ``` -## Extracting the Pricing Kernel and Risk Premium +## Extracting the pricing kernel and risk premium The pricing kernel recovered from $P$ via the Perron–Frobenius theorem has the following interpretation. @@ -591,7 +594,7 @@ print(f" Risk-free rate ≈ {rf[mid]*100:.2f}% (true: {delta*100:.2f}%)") print(f" Equity premium ≈ {erp[mid]*100:.2f}% (true: {(mu-delta)*100:.2f}%)") ``` -## Sensitivity Analysis: Effect of Risk Aversion +## Sensitivity analysis: effect of risk aversion The shape of the pricing kernel, and hence the gap between natural and risk-neutral probabilities, depends on the coefficient of risk aversion $\gamma$. @@ -639,9 +642,11 @@ plt.show() The plots confirm the single-crossing property from Theorem 3 of {cite}`Ross2015`: for returns below some threshold $v$, risk-neutral probability exceeds natural probability; above $v$ the natural probability -dominates. A higher $\gamma$ amplifies this wedge. +dominates. -## Recovering the Discount Rate +A higher $\gamma$ amplifies this wedge. + +## Recovering the discount rate A useful by-product of the Recovery Theorem is the **recovered subjective discount rate** $\delta$, which equals the Perron–Frobenius eigenvalue of $P$. @@ -677,7 +682,7 @@ plt.savefig('ross_recovery_delta.png', dpi=120) plt.show() ``` -## Tail Risk: Natural vs. Risk-Neutral Probabilities of Catastrophe +## Tail risk: natural vs. 
risk-neutral probabilities of catastrophe One of the most striking applications of the Recovery Theorem is its ability to separate the market's genuine fear of catastrophes from the risk premium @@ -740,7 +745,7 @@ recovered natural density. The ratio captures the additional weight from risk aversion — the premium investors demand to bear tail risk. -## Testing Efficient Markets +## Testing efficient markets {cite}`Ross2015` shows that once the pricing kernel is recovered, one obtains an **upper bound on the Sharpe ratio** for any investment strategy: @@ -780,24 +785,32 @@ print(f"\nHansen-Jagannathan bound on Sharpe ratio: {std_phi:.4f}") print(f"Upper bound on R² in return forecasting: {var_phi:.4f}") ``` -## Limitations and Extensions +## Limitations and extensions The Recovery Theorem is a remarkable theoretical result, but several caveats apply in practice. **Finite state space.** The theorem requires a bounded, irreducible Markov -chain. In continuous, unbounded state spaces (e.g., a lognormal diffusion), +chain. + +In continuous, unbounded state spaces (e.g., a lognormal diffusion), uniqueness fails because any exponential $e^{\alpha x}$ satisfies the -characteristic equation. {cite}`CarrYu2012` establish recovery with a bounded +characteristic equation. + +{cite}`CarrYu2012` establish recovery with a bounded diffusion. **Transition independence.** If the kernel is not transition independent, -recovery is not guaranteed. {cite}`BorovickaHansenScheinkman2016` show that +recovery is not guaranteed. + +{cite}`BorovickaHansenScheinkman2016` show that the Ross recovery can confound the long-run risk component of the kernel with the natural probability distribution, yielding an incorrect decomposition. **Empirical estimation.** Extracting reliable state prices from observed option -prices requires careful interpolation and extrapolation. The mapping from +prices requires careful interpolation and extrapolation. 
+ +The mapping from implied volatilities to state prices via the {cite}`BreedenLitzenberger1978` formula involves second derivatives, which amplify measurement error. **State dependence.** The state must capture all relevant variables: the level From ba24b95c41fdfbabd7c8cb061c00fda3a4589f60 Mon Sep 17 00:00:00 2001 From: HumphreyYang Date: Thu, 23 Apr 2026 18:54:31 +0800 Subject: [PATCH 14/26] update --- lectures/_static/quant-econ.bib | 12 +++++------- lectures/information_market_equilibrium.md | 3 --- 2 files changed, 5 insertions(+), 10 deletions(-) diff --git a/lectures/_static/quant-econ.bib b/lectures/_static/quant-econ.bib index 547006f33..6cf319fcd 100644 --- a/lectures/_static/quant-econ.bib +++ b/lectures/_static/quant-econ.bib @@ -3655,18 +3655,17 @@ @article{muth1961 @article{radner1972, author = {Radner, Roy}, - title = {Existence of Equilibrium Plans, Prices, and Price Expectations - in a Sequence of Markets}, + title = {Existence of Equilibrium of Plans, Prices, and Price Expectations in a Sequence of Markets}, journal = {Econometrica}, volume = {40}, number = {2}, - pages = {289--304}, + pages = {289--303}, year = {1972} } @article{arrow1964, author = {Arrow, Kenneth J.}, - title = {The Role of Securities in the Optimal Allocation of Risk-Bearing}, + title = {The Role of Securities in the Optimal Allocation of Risk-bearing}, journal = {Review of Economic Studies}, volume = {31}, number = {2}, @@ -3676,11 +3675,10 @@ @article{arrow1964 @article{grossman1976, author = {Grossman, Sanford J.}, - title = {On the Efficiency of Competitive Stock Markets Where Trades Have - Diverse Information}, + title = {On the Efficiency of Competitive Stock Markets Where Trades Have Diverse Information}, journal = {Journal of Finance}, volume = {31}, number = {2}, pages = {573--585}, year = {1976} -} +} \ No newline at end of file diff --git a/lectures/information_market_equilibrium.md b/lectures/information_market_equilibrium.md index 5dd40fc49..d9b4f95cc 100644 --- 
a/lectures/information_market_equilibrium.md +++ b/lectures/information_market_equilibrium.md @@ -88,9 +88,6 @@ Reduced-form and structural models come in pairs. To each structure or structural model there is a reduced form, or collection of reduced forms, underlying different possible regressions. - -In this lecture, a **structure** is a parameterization of the underlying -endowment process. ``` The lecture is organized as follows. From 5d97aad0f85e5c6c4dd633faeff838af18897719 Mon Sep 17 00:00:00 2001 From: HumphreyYang Date: Fri, 24 Apr 2026 01:53:14 +0800 Subject: [PATCH 15/26] updates --- lectures/information_market_equilibrium.md | 104 ++++-- lectures/multivariate_normal.md | 379 +++++++++++++++------ lectures/prob_matrix.md | 70 ++-- 3 files changed, 382 insertions(+), 171 deletions(-) diff --git a/lectures/information_market_equilibrium.md b/lectures/information_market_equilibrium.md index d9b4f95cc..bfb19ac3f 100644 --- a/lectures/information_market_equilibrium.md +++ b/lectures/information_market_equilibrium.md @@ -71,7 +71,7 @@ Important findings of {cite:t}`kihlstrom_mirman1975` are: insider's posterior distribution to the equilibrium price is one-to-one on the set of posteriors that can actually arise from the signal. - - Invertibility holds when the informed + - For the two-state case ($S = 2$), invertibility holds when the informed agent's utility is homothetic and the elasticity of substitution is everywhere either below one or above one. - In the dynamic economy, as information accumulates, Bayesian price @@ -120,8 +120,9 @@ from scipy.stats import norm The economy has two goods. -Good 2 is the numeraire (price normalized to 1); good 1 trades -at price $p > 0$. +Good 2 is the numeraire (price normalized to 1). + +Good 1 trades at price $p > 0$. An unknown parameter $\bar{a}$ affects the value of good 1. @@ -174,7 +175,7 @@ the calculations transparent. 
Suppose **agent 1** (the insider) observes a private signal $\tilde{y}$ correlated with -$\bar{a}$ before trading. +$\bar{a}$ before trading, where $\tilde{y}$ takes values in a finite set $Y$. Before the signal arrives, agent 1 has prior beliefs $\mu_0 = P^1$. @@ -347,9 +348,6 @@ invertibility holds. (invertibility_conditions)= ## Invertibility and the elasticity of substitution -The price-revelation theorem reduces the economic problem to a narrower one: -when is the belief-to-price map actually one-to-one? - When does the belief-to-price map fail to be invertible? {prf:ref}`ime_theorem_invertibility_conditions` @@ -395,6 +393,9 @@ $$ x(p) = (x_1(p), x_2(p)). $$ +Throughout, $u^i_j$ denotes the partial derivative of $u^i$ with respect to its +$j$-th argument. + Whenever the informed agent consumes positive amounts of both goods, optimality of $x(p)$ under posterior $\mu$ gives the interior first-order condition @@ -478,7 +479,7 @@ and the price reveals nothing about the insider's signal. ### CES utility -For concreteness we work with the **constant-elasticity-of-substitution** (CES) +For concreteness we work with a simplified example with the **constant-elasticity-of-substitution** (CES) utility function @@ -513,19 +514,22 @@ We focus on agent 1 as the *only* informed trader who absorbs one unit of good 1 at equilibrium (i.e., $x_1 = 1$). +Let $W_1 = w^1 + \theta^1 \pi$ denote agent 1's total wealth (endowment plus +profit share). + Agent 1's budget constraint then reduces to -$x_2 = W^1 - p$, and the equilibrium price is the unique $p \in (0, W^1)$ +$x_2 = W_1 - p$, and the equilibrium price is the unique $p \in (0, W_1)$ satisfying the first-order condition $$ -p \bigl[q\, u_2(a_1,\, W^1-p) + (1-q)\, u_2(a_2,\, W^1-p)\bigr] -= q\, a_1\, u_1(a_1,\, W^1-p) + (1-q)\, a_2\, u_1(a_2,\, W^1-p). +p \bigl[q\, u_2(a_1,\, W_1-p) + (1-q)\, u_2(a_2,\, W_1-p)\bigr] += q\, a_1\, u_1(a_1,\, W_1-p) + (1-q)\, a_2\, u_1(a_2,\, W_1-p). 
$$ For Cobb-Douglas utility ($\sigma = 1$), the first-order condition becomes $p = -W^1 - p$, -giving $p^* = W^1/2$ regardless of the posterior $q$, confirming that no +W_1 - p$, +giving $p^* = W_1/2$ regardless of the posterior $q$, confirming that no information is transmitted through the price in the Cobb-Douglas case. @@ -616,7 +620,7 @@ print(f"Cobb-Douglas (rho=0): min p* = {min(p_cd):.6f}, " print(f"Analytical CD price = W1/2 = {W1/2:.6f}") ``` -Every entry equals $W^1/2 = 2.0$ exactly, confirming analytically that the +Every entry equals $W_1/2 = 2.0$ exactly, confirming analytically that the Cobb-Douglas equilibrium price is independent of $q$ and of the state values $a_1, a_2$. @@ -740,9 +744,9 @@ In each period $t$: 3. Consumers trade and consume. The endowment vectors $\{\tilde{\omega}^t\}$ are **i.i.d.** with density -$f(\omega^t \mid \lambda)$, where $\lambda = (\lambda_1, \ldots, \lambda_n)$ is +$f(\omega^t \mid \lambda)$, where $\lambda = (\lambda_1, \ldots, \lambda_K)$ is a -**structural parameter vector** that is *fixed but unknown*. +**structural parameter vector** (of dimension $K$) that is *fixed but unknown*. The equilibrium price at time $t$ is a deterministic function of $\omega^t$, so $\{p^t\}$ is also i.i.d. @@ -849,9 +853,7 @@ $$ Let $\bar\lambda$ be the true structural parameter and $\bar\mu$ the reduced form that contains $\bar\lambda$. -Assume the prior assigns positive probability to $\bar\lambda$ (equivalently, -positive -probability to the class $\bar\mu$). +Assume the prior assigns positive probability to the reduced-form class $\bar\mu$. Define the posterior mass on a reduced-form class by @@ -888,6 +890,16 @@ which equals the rational-expectations price distribution for a fully informed observer. ``` +```{note} +Note that the theorem only requires the prior to assign positive probability to the reduced-form class $\bar\mu$ that contains the true structure $\bar\lambda$. 
+ +This is implied by, but weaker than, assigning positive probability to the true +structural parameter $\bar\lambda$ itself. + +A prior could place zero mass on $\bar\lambda$ +while still placing positive mass on other structures inside $\bar\mu$. +``` + The important distinction is that price observers need not learn $\bar \lambda$ itself. @@ -896,7 +908,7 @@ They only learn which reduced-form class is correct. That is enough for forecasting because every $\lambda \in \bar \mu$ generates the same price density $g(\cdot \mid \bar \mu)$. -This is exactly the paper's point: rational price expectations emerge from +Rational price expectations emerge from learning the reduced form, not from identifying every structural detail of the economy. @@ -905,8 +917,7 @@ for next period's price matches the objective price distribution generated by the true reduced form. -The theorem is easiest to absorb in a stripped-down example, so we now turn to a -simple simulation. +Let's now turn to a simple simulation. (bayesian_simulation)= ## Simulating Bayesian learning from prices @@ -920,7 +931,7 @@ The observer knows the two possible price distributions (the reduced forms) but not which one governs the data. -This is a **Bayesian model selection** problem. +This is a **Bayesian model selection** problem we have seen in {doc}`likelihood_bayes`. With a prior $h_0$ on $\mu_1$ and the observed price $p^t$, the posterior weight on $\mu_1$ @@ -931,6 +942,8 @@ h_t = \frac{h_{t-1}\, g(p^t \mid \mu_1)}{h_{t-1}\, g(p^t \mid \mu_1) + (1-h_{t-1})\, g(p^t \mid \mu_2)}. 
$$ +We consider a numerical example with two normal distributions with different means + ```{code-cell} ipython3 def simulate_bayesian_learning( p_bar_true, p_bar_alt, σ_p, T, h0, n_paths, seed=42 @@ -976,6 +989,16 @@ def plot_bayesian_learning(h_paths, p_bar_true, p_bar_alt, ax): ax.legend(fontsize=10) ``` +We consider two cases, one that is easy to learn and another one that is harder to learn, +using $T = 300$ periods, $n = 40$ simulated paths, a diffuse prior $h_0 = 0.5$, and +common standard deviation $\sigma_p = 0.4$. + +- *Easy case*: true model $N(2.0,\, 0.4^2)$, alternative $N(1.2,\, 0.4^2)$. +- *Hard case*: true model $N(2.0,\, 0.4^2)$, alternative $N(1.8,\, 0.4^2)$. + +Whether easy or hard to learn depends on "how close" the true distribution is compared to the +alternative hypothesis. + ```{code-cell} ipython3 --- mystnb: @@ -983,19 +1006,19 @@ mystnb: caption: bayesian learning across paths name: fig-bayesian-learning --- -T = 300 -h0 = 0.5 # diffuse prior +T = 300 +h0 = 0.5 # diffuse prior n_paths = 40 σ_p = 0.4 fig, axes = plt.subplots(1, 2, figsize=(12, 5)) -# Distinct reduced forms. +# Distinct reduced forms p_bar_true, p_bar_alt = 2.0, 1.2 h_paths = simulate_bayesian_learning(p_bar_true, p_bar_alt, σ_p, T, h0, n_paths) plot_bayesian_learning(h_paths, p_bar_true, p_bar_alt, axes[0]) -# Similar reduced forms. +# Similar reduced forms p_bar_true, p_bar_alt = 2.0, 1.8 h_paths_hard = simulate_bayesian_learning( p_bar_true, p_bar_alt, σ_p, T, h0, n_paths @@ -1011,15 +1034,16 @@ probability one, though convergence is slower when the two price distributions are similar (right panel). -This first simulation tracks posterior mass, and the next one tracks the -predictive density itself. - ### Price expectations vs. rational expectations We now verify that the observer's price expectations converge to the rational-expectations distribution $g(p \mid \bar\mu)$. 
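The object converging here is the posterior-mixture predictive density $h_t\, g(\cdot \mid \mu_1) + (1 - h_t)\, g(\cdot \mid \mu_2)$: as the posterior weight on the true reduced form approaches one, the mixture approaches $g(\cdot \mid \bar\mu)$. A minimal sketch with toy numbers — the posterior weight $h$ below is a hypothetical value, not the output of the simulation:

```python
import numpy as np
from scipy.stats import norm

# Toy parameterization: two candidate normal reduced forms
p_bar_true, p_bar_alt, σ_p = 2.0, 1.2, 0.4
h = 0.9   # hypothetical posterior weight on the true reduced form

# Predictive density: posterior-weighted mixture of the two candidates
grid = np.linspace(0.0, 3.5, 7)
predictive = (h * norm.pdf(grid, p_bar_true, σ_p)
              + (1 - h) * norm.pdf(grid, p_bar_alt, σ_p))

# As h -> 1, the predictive density converges pointwise to the
# rational-expectations density g(p | μ̄) = N(2.0, 0.4²)
gap = np.max(np.abs(predictive - norm.pdf(grid, p_bar_true, σ_p)))
print(f"max pointwise gap at h = 0.9: {gap:.4f}")
```

The gap is bounded by $(1 - h)$ times the largest pointwise difference between the two candidate densities, so it shrinks to zero as the posterior concentrates.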
+We continue to use the parameterization of the "easy-to-learn" example above +($\bar{p}_{\text{true}} = 2.0$, $\bar{p}_{\text{alt}} = 1.2$, $\sigma_p = 0.4$), +now extending to $T = 1{,}000$ periods with a single simulated path and prior $h_0 = 0.5$ + ```{code-cell} ipython3 --- mystnb: @@ -1094,6 +1118,12 @@ $\mu_1 = \{\lambda^{(1)}, \lambda^{(2)}\}$ and $\mu_2 = \{\lambda^{(3)}\}$ (because $\lambda^{(1)}$ and $\lambda^{(2)}$ generate the same price distribution). +The three structures have price means $\bar{p}_1 = \bar{p}_2 = 2.0$ and +$\bar{p}_3 = 1.2$, with common standard deviation $\sigma_p = 0.4$, a +uniform prior $h_0 = (1/3, 1/3, 1/3)$, and $T = 400$ periods over $30$ paths. + +The true structure is $\lambda^{(1)}$. + ```{code-cell} ipython3 --- mystnb: @@ -1121,12 +1151,12 @@ def simulate_learning_3struct( return h_paths -# Structures 0 and 1 share the same reduced form. +# Structures 0 and 1 share the same reduced form p_bar_vec = np.array([2.0, 2.0, 1.2]) h0_vec = np.array([1 / 3, 1 / 3, 1 / 3]) σ_p = 0.4 T = 400 -true_idx = 0 # Structure 0 is observationally equivalent to 1. +true_idx = 0 # Structure 0 is observationally equivalent to 1 h_paths_3 = simulate_learning_3struct( T, h0_vec, p_bar_vec, σ_p, true_idx, n_paths=30 @@ -1314,11 +1344,13 @@ roughly $T_{0.99} \approx C / D_{KL}$ for some constant $C$. :class: dropdown ``` +Here is one solution: + ```{code-cell} ipython3 σ_p = 0.4 def kl_normal(p1, p2, σ): - """Return the KL divergence for N(p1, sigma^2) and N(p2, sigma^2).""" + """Return the KL divergence for N(p1, σ^2) and N(p2, σ^2).""" return (p1 - p2)**2 / (2 * σ**2) cases = [("Easy", 2.0, 1.2), ("Hard", 2.0, 1.8)] @@ -1333,7 +1365,7 @@ for ax, (name, p1, p2) in zip(axes, cases): kl = kl_normal(p1, p2, σ_p) paths = simulate_bayesian_learning(p1, p2, σ_p, T=2000, h0=0.5, n_paths=n_paths, seed=42) - # First period with posterior >= 0.99. 
+ # First period with posterior >= 0.99 T99 = [] for path in paths: idx = np.where(path >= 0.99)[0] @@ -1389,6 +1421,8 @@ Discuss your findings. :class: dropdown ``` +Here is one solution: + ```{code-cell} ipython3 def simulate_misspecified( T, p_bar_true, p_bar_wrong, σ_p, h0, n_paths, seed=0 @@ -1429,7 +1463,7 @@ h_misspec = simulate_misspecified(T, p_true, p_wrong, σ_p, h0, n_paths) kl_vals = (p_true - p_wrong)**2 / (2 * σ_p**2) for mean, kl in zip(p_wrong, kl_vals): - print(f"KL(true || N({mean:.1f}, sigma^2)) = {kl:.4f}") + print(f"KL(true || N({mean:.1f}, σ^2)) = {kl:.4f}") t_grid = np.arange(T + 1) fig, axes = plt.subplots(1, 2, figsize=(12, 4)) diff --git a/lectures/multivariate_normal.md b/lectures/multivariate_normal.md index 70f361ae9..96adcd647 100644 --- a/lectures/multivariate_normal.md +++ b/lectures/multivariate_normal.md @@ -67,6 +67,8 @@ import matplotlib.pyplot as plt import numpy as np from numba import jit import statsmodels.api as sm + +rng = np.random.default_rng(0) ``` Assume that an $N \times 1$ random vector $z$ has a @@ -474,7 +476,7 @@ of $\epsilon$ will converge to $\hat{\Sigma}_1$. n = 1_000_000 # sample size # simulate multivariate normal random vectors -data = np.random.multivariate_normal(μ, Σ, size=n) +data = rng.multivariate_normal(μ, Σ, size=n) z1_data = data[:, 0] z2_data = data[:, 1] @@ -517,8 +519,8 @@ Let’s apply our code to a trivariate example. We’ll specify the mean vector and the covariance matrix as follows. ```{code-cell} python3 -μ = np.random.random(3) -C = np.random.random((3, 3)) +μ = rng.random(3) +C = rng.random((3, 3)) Σ = C @ C.T # positive semi-definite multi_normal = MultivariateNormal(μ, Σ) @@ -545,7 +547,7 @@ z2 = np.array([2., 5.]) ```{code-cell} python3 n = 1_000_000 -data = np.random.multivariate_normal(μ, Σ, size=n) +data = rng.multivariate_normal(μ, Σ, size=n) z1_data = data[:, :k] z2_data = data[:, k:] ``` @@ -714,7 +716,7 @@ $\theta$ conditional on our test scores. 
Let’s do that and then print out some pertinent quantities. ```{code-cell} python3 -x = np.random.multivariate_normal(μ_IQ, Σ_IQ) +x = rng.multivariate_normal(μ_IQ, Σ_IQ) y = x[:-1] # test scores θ = x[-1] # IQ ``` @@ -1044,7 +1046,7 @@ n = 2 ```{code-cell} python3 # take one draw -x = np.random.multivariate_normal(μ_IQ2d, Σ_IQ2d) +x = rng.multivariate_normal(μ_IQ2d, Σ_IQ2d) y1 = x[:n] y2 = x[n:2*n] θ = x[2*n] @@ -1261,7 +1263,7 @@ This is going to be very useful for doing the conditioning to be used in the fun exercises below. ```{code-cell} python3 -z = np.random.multivariate_normal(μz, Σz) +z = rng.multivariate_normal(μz, Σz) x = z[:T+1] y = z[T+1:] @@ -1660,7 +1662,7 @@ conditional mean $E \left[p_{t} \mid y_{t-1}, y_{t}\right]$ using the `MultivariateNormal` class. ```{code-cell} python3 -z = np.random.multivariate_normal(μz, Σz) +z = rng.multivariate_normal(μz, Σz) y, p = z[:T], z[T:] ``` @@ -1979,7 +1981,7 @@ We describe the Kalman filter and some applications of it in {doc}`A First Look ## Classic factor analysis model -The factor analysis model widely used in psychology and other fields can +The factor analysis model can be represented as $$ @@ -1990,11 +1992,11 @@ where 1. $Y$ is $n \times 1$ random vector, $E U U^\top = D$ is a diagonal matrix, -1. $\Lambda$ is $n \times k$ coefficient matrix, -1. $f$ is $k \times 1$ random vector, +2. $\Lambda$ is $n \times k$ coefficient matrix, +3. $f$ is $k \times 1$ random vector, $E f f^\top = I$, -1. $U$ is $n \times 1$ random vector, and $U \perp f$ (i.e., $E U f^\top = 0 $ ) -1. It is presumed that $k$ is small relative to $n$; often +4. $U$ is $n \times 1$ random vector, and $U \perp f$ (i.e., $E U f^\top = 0 $ ) +5. It is presumed that $k$ is small relative to $n$; often $k$ is only $1$ or $2$, as in our IQ examples. This implies that @@ -2097,7 +2099,7 @@ $Z$. 
``` ```{code-cell} python3 -z = np.random.multivariate_normal(μz, Σz) +z = rng.multivariate_normal(μz, Σz) f = z[:k] y = z[k:] @@ -2148,8 +2150,9 @@ model. -Technically, this means that the PCA model is misspecified. (Can you -explain why?) +Technically, this means that the PCA model is misspecified. + +(Can you explain why?) Nevertheless, this exercise will let us study how well the first two principal components from a PCA can approximate the conditional @@ -2250,51 +2253,144 @@ Let’s look at them, after which we’ll look at $E f | y = B y$ B @ y ``` -The fraction of variance in $y_{t}$ explained by the first two -principal components can be computed as below. +```{note} +The two largest eigenvalues are both $5.25$ in this example. + + +When an +eigenvalue is repeated, the associated principal components are not +individually pinned down: any orthonormal basis for the same +two-dimensional eigenspace is valid. + +For that reason, it is not meaningful to compare $\epsilon_1$ and +$\epsilon_2$ component-by-component with $E[f \mid Y]$. + +The PC scores +live in a PCA coordinate system, while $E[f \mid Y]$ lives in factor +space. + +Even within the common two-dimensional subspace, the PCA basis can +be rotated or sign-flipped, and its coordinates need not use the same +scaling as the factor coordinates. + +What is uniquely determined is the two-dimensional subspace spanned by +the first two columns of $P$. + +In this symmetric example, that subspace is +exactly the column space of $\Lambda$. +``` + +The fraction of variance in $y_t$ explained by the first two principal +components is ```{code-cell} python3 𝜆_tilde[:2].sum() / 𝜆_tilde.sum() ``` -Compute +To compare PCA with the factor model in observation space, compute $$ \hat{Y} = P_{j} \epsilon_{j} + P_{k} \epsilon_{k} $$ -where $P_{j}$ and $P_{k}$ correspond to the largest two -eigenvalues. +where $P_j$ and $P_k$ are the eigenvectors associated with the two +largest eigenvalues. 
```{code-cell} python3 y_hat = P[:, :2] @ ε[:2] ``` -In this example, it turns out that the projection $\hat{Y}$ of -$Y$ on the first two principal components does a good job of -approximating $Ef \mid y$. +$\hat{Y}$ is the rank-2 PCA approximation to $Y$ in observation space, +so it is a 10-vector rather than a 2-vector. + +The natural observation-space +counterpart from the factor model is $\Lambda E[f \mid Y]$, which is +also a 10-vector. + +In this symmetric example, both vectors lie in the same two-dimensional +subspace, namely the column space of $\Lambda$. + +They are therefore close, +but not identical. + +The PCA reconstruction uses the block means directly, +while $\Lambda E[f \mid Y]$ shrinks those block means toward zero by the +factor $5/(5+\sigma_u^2) \approx 0.952$. + +The next plot makes this comparison concrete. + +The two scatter plots, $E[Y \mid f] = \Lambda f$ and $\hat{Y}$, are both +10-vectors in observation space, so they can be compared directly. + +The horizontal lines show the factor values $f_1$ and $f_2$, together +with their posterior means $E[f_i \mid Y]$. -We confirm this in the following plot of $f$, -$E y \mid f$, $E f \mid y$, and $\hat{y}$ against the -observation index on the horizontal axis. +These are 2-dimensional +factor-space quantities, drawn over the relevant half of the index set to +match the block structure of $\Lambda$. + +This uses the same idea as the earlier formula +$E[Y \mid f] = \Lambda f$: the matrix $\Lambda$ maps a 2-vector in factor +space into a 10-vector in observation space. + +In our example, + +$$ +\Lambda a += +\begin{bmatrix} +a_1 \\ +\vdots \\ +a_1 \\ +a_2 \\ +\vdots \\ +a_2 +\end{bmatrix} +\quad \text{for any } a = \begin{bmatrix} a_1 \\ a_2 \end{bmatrix}, +$$ + +because the first five rows of $\Lambda$ are $(1,0)$ and the last five +rows are $(0,1)$. 
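As a quick sanity check of this block mapping, here is a minimal standalone sketch; it rebuilds the $10 \times 2$ loading matrix locally as `Λ_demo` rather than reusing the lecture's `Λ`, but the block structure is the same one described above:

```python
import numpy as np

# Rebuild the block loading matrix described above:
# first five rows equal (1, 0), last five rows equal (0, 1)
Λ_demo = np.vstack([np.tile([1.0, 0.0], (5, 1)),
                    np.tile([0.0, 1.0], (5, 1))])

a = np.array([2.5, -1.0])
v = Λ_demo @ a

# The first five entries of v replicate a_1 = 2.5,
# the last five replicate a_2 = -1.0
print(v)
```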
+ +Therefore, once we observe $Y=y$, the posterior mean +$E[f \mid Y=y] = \begin{bmatrix} E[f_1 \mid y] \\ E[f_2 \mid y] \end{bmatrix}$ +is converted into the observation-space vector + +$$ +\Lambda E[f \mid Y=y] += +\begin{bmatrix} +E[f_1 \mid y] \\ +\vdots \\ +E[f_1 \mid y] \\ +E[f_2 \mid y] \\ +\vdots \\ +E[f_2 \mid y] +\end{bmatrix}. +$$ + +So the horizontal line at height $E[f_1 \mid y]$ over the first five +indices, together with the horizontal line at height $E[f_2 \mid y]$ +over the last five indices, is exactly a picture of +$\Lambda E[f \mid Y=y]$. ```{code-cell} python3 -plt.scatter(range(N), Λ @ f, label='$Ey|f$') -plt.scatter(range(N), y_hat, label=r'$\hat{y}$') +plt.scatter(range(N), Λ @ f, label=r'$E[Y \mid f]$') +plt.scatter(range(N), y_hat, label=r'$\hat{Y}$') plt.hlines(f[0], 0, N//2-1, ls='--', label='$f_{1}$') plt.hlines(f[1], N//2, N-1, ls='-.', label='$f_{2}$') Efy = B @ y -plt.hlines(Efy[0], 0, N//2-1, ls='--', color='b', label='$Ef_{1}|y$') -plt.hlines(Efy[1], N//2, N-1, ls='-.', color='b', label='$Ef_{2}|y$') +plt.hlines(Efy[0], 0, N//2-1, ls='--', color='b', label=r'$E[f_1 \mid y]$') +plt.hlines(Efy[1], N//2, N-1, ls='-.', color='b', label=r'$E[f_2 \mid y]$') plt.legend() plt.show() ``` -The covariance matrix of $\hat{Y}$ can be computed by first -constructing the covariance matrix of $\epsilon$ and then use the -upper left block for $\epsilon_{1}$ and $\epsilon_{2}$. +To compute the covariance matrix of $\hat{Y}$, first form the covariance +matrix of $\epsilon$ and then extract the upper-left block corresponding +to $\epsilon_1$ and $\epsilon_2$. ```{code-cell} python3 Σεjk = (P.T @ Σy @ P)[:2, :2] @@ -2324,9 +2420,11 @@ fix $z_2 = 2$. 1. Use `MultivariateNormal` to compute the analytical conditional mean $\hat{\mu}_1$ and variance $\hat{\Sigma}_{11}$ of $z_1 \mid z_2 = 2$. -1. Draw $10^6$ samples from the joint distribution. Retain only those -for which $|z_2 - 2| < 0.05$. 
Compute the sample mean and variance of -the retained $z_1$ values. +1. Draw $10^6$ samples from the joint distribution. + + Retain only those for which $|z_2 - 2| < 0.05$. + + Compute the sample mean and variance of the retained $z_1$ values. 1. Confirm that the sample estimates are close to the analytical values. ``` @@ -2335,9 +2433,9 @@ the retained $z_1$ values. :class: dropdown ``` -```{code-cell} python3 -import numpy as np +Here is one solution: +```{code-cell} python3 μ = np.array([.5, 1.]) Σ = np.array([[1., .5], [.5, 1.]]) @@ -2347,7 +2445,7 @@ mn.partition(1) print(f"Analytical μ1_hat = {μ1_hat[0]:.4f}, Σ11_hat = {Σ11_hat[0,0]:.4f}") n = 1_000_000 -data = np.random.multivariate_normal(μ, Σ, size=n) +data = rng.multivariate_normal(μ, Σ, size=n) z1_all, z2_all = data[:, 0], data[:, 1] mask = np.abs(z2_all - 2.) < 0.05 @@ -2389,14 +2487,13 @@ $$ so $b_1 b_2 = \rho^2$. ```{code-cell} python3 -import numpy as np - for ρ in [0.2, 0.5, 0.9]: Σ = np.array([[1., ρ], [ρ, 1.]]) mn = MultivariateNormal(np.zeros(2), Σ) mn.partition(1) - product = float(mn.βs[0]) * float(mn.βs[1]) - print(f"ρ={ρ:.1f}: b1*b2 = {product:.4f}, ρ^2 = {ρ**2:.4f}, match: {np.isclose(product, ρ**2)}") + product = mn.βs[0].item() * mn.βs[1].item() + print(f"ρ={ρ:.1f}: b1*b2 = {product:.4f}") + print(f"ρ^2 = {ρ**2:.4f}, match: {np.isclose(product, ρ**2)}") ``` ```{solution-end} @@ -2411,11 +2508,12 @@ Using the one-dimensional IQ model with $n = 50$ test scores and $\mu_\theta = 100$, $\sigma_\theta = 10$: 1. Vary the test-score noise $\sigma_y \in \{1, 5, 10, 20, 50\}$. -For each value, plot the posterior standard deviation + +- For each value, plot the posterior standard deviation $\hat{\sigma}_\theta$ as a function of the number of test scores included (from 1 to 50), with all curves on the same axes. -1. Explain intuitively why a larger $\sigma_y$ leads to a slower +2. Explain intuitively why a larger $\sigma_y$ leads to a slower decline of posterior uncertainty. 
``` @@ -2423,10 +2521,9 @@ decline of posterior uncertainty. :class: dropdown ``` -```{code-cell} python3 -import numpy as np -import matplotlib.pyplot as plt +Here is one solution: +```{code-cell} python3 n_max = 50 μθ_val, σθ_val = 100., 10. @@ -2449,13 +2546,15 @@ plt.show() When $\sigma_y$ is large each test score is a noisy signal about $\theta$, so many more observations are required before the posterior variance falls -appreciably. In the limit $\sigma_y \to 0$ a single observation pins down +appreciably. + +In the limit $\sigma_y \to 0$ a single observation pins down $\theta$ exactly. ```{solution-end} ``` -```{exercise} +````{exercise} :label: mv_normal_ex4 **Prior vs. likelihood in IQ inference** @@ -2464,30 +2563,41 @@ Using the one-dimensional IQ model with $n = 20$ test scores and $\mu_\theta = 100$, $\sigma_y = 10$: 1. Fix $\sigma_y = 10$ and vary the prior spread -$\sigma_\theta \in \{1, 5, 10, 50, 500\}$. For each value compute the -posterior mean $\hat{\mu}_\theta$ given the same set of $n = 20$ test -scores and plot $\hat{\mu}_\theta$ against $\sigma_\theta$. +$\sigma_\theta \in \{1, 5, 10, 50, 500\}$. + + - For each value compute the + posterior mean $\hat{\mu}_\theta$ given the same set of $n = 20$ test + scores and plot $\hat{\mu}_\theta$ against $\sigma_\theta$. -1. Show analytically (or verify numerically) that as $\sigma_\theta \to \infty$ -the posterior mean converges to the sample mean $\bar{y}$, and as -$\sigma_y \to \infty$ the posterior mean converges to the prior mean -$\mu_\theta$. +1. Show analytically (or verify numerically) that + + - as $\sigma_\theta \to \infty$ the posterior mean converges to the + sample mean $\bar{y}$ (the data dominate the prior), and + - as $\sigma_\theta \to 0$ the posterior mean converges to the prior + mean $\mu_\theta$ (the prior dominates the data). 
+ +```{hint} +The posterior mean formula is +$\hat{\mu}_\theta = \bigl(\mu_\theta/\sigma_\theta^2 + n\bar{y}/\sigma_y^2\bigr) +\big/ \bigl(1/\sigma_\theta^2 + n/\sigma_y^2\bigr)$. ``` +Examine each limit by letting $\sigma_\theta$ go to $\infty$ or $0$. +```` + ```{solution-start} mv_normal_ex4 :class: dropdown ``` -```{code-cell} python3 -import numpy as np -import matplotlib.pyplot as plt +Here is one solution: +```{code-cell} python3 n_scores = 20 μθ_val, σy_val = 100., 10. -np.random.seed(42) +rng = np.random.default_rng(42) true_θ = 108. -y_obs = true_θ + σy_val * np.random.randn(n_scores) +y_obs = true_θ + σy_val * rng.standard_normal(n_scores) y_bar = np.mean(y_obs) σθ_vals = [1., 5., 10., 50., 500.] @@ -2498,19 +2608,34 @@ for σθ_val in σθ_vals: mn_i = MultivariateNormal(μ_i, Σ_i) mn_i.partition(n_scores) μθ_hat, _ = mn_i.cond_dist(1, y_obs) - μθ_hat_vals.append(float(μθ_hat)) + μθ_hat_vals.append(μθ_hat.item()) + +def posterior_mean(σθ_val): + μ_i, Σ_i, _ = construct_moments_IQ(n_scores, μθ_val, σθ_val, σy_val) + mn_i = MultivariateNormal(μ_i, Σ_i) + mn_i.partition(n_scores) + μθ_hat, _ = mn_i.cond_dist(1, y_obs) + return μθ_hat.item() fig, ax = plt.subplots() -ax.semilogx(σθ_vals, μθ_hat_vals, 'o-', label=r'$\hat{\mu}_\theta$') -ax.axhline(y_bar, ls='--', color='r', label=f'sample mean y_bar = {y_bar:.1f}') -ax.axhline(μθ_val, ls=':', color='g', label=f'prior mean μθ = {μθ_val:.0f}') +ax.semilogx(σθ_vals, μθ_hat_vals, 'o-', + label=r'$\hat{\mu}_\theta$') +ax.axhline(y_bar, ls='--', color='r', + label=f'sample mean y_bar = {y_bar:.1f}') +ax.axhline(μθ_val, ls=':', color='g', + label=f'prior mean μθ = {μθ_val:.0f}') ax.set_xlabel(r'$\sigma_\theta$') ax.set_ylabel(r'posterior mean $\hat{\mu}_\theta$') ax.legend() plt.show() +σθ_small = 1e-2 +σθ_large = 1e4 + print(f"y_bar = {y_bar:.4f}") -print(f"Large σθ posterior mean approx {μθ_hat_vals[-1]:.4f}") +print(f"Posterior mean with σθ={σθ_large:.0e}: {posterior_mean(σθ_large):.4f}") +print(f"Posterior mean with 
σθ={σθ_small:.0e}: {posterior_mean(σθ_small):.4f}") +print(f"Prior mean μθ = {μθ_val:.4f}") ``` ```{solution-end} @@ -2535,9 +2660,11 @@ and initial conditions $\hat{x}_0 = [0, 0]'$, $\Sigma_0 = I_2$: 1. Simulate $T = 60$ periods of $\{x_t, y_t\}$ and run the filter. 1. Plot the sequences of conditional variances $\Sigma_t[0,0]$ and -$\Sigma_t[1,1]$ over time. Verify that they converge to a steady state. +$\Sigma_t[1,1]$ over time. -1. Plot the filtered state estimates $\hat{x}_t[0]$ together with the + Verify that they converge to a steady state. + +1. Plot the filtered state estimates $\tilde{x}_t[0]$ together with the true $x_t[0]$ and the raw observations $y_t$ on a single figure. ``` @@ -2545,10 +2672,9 @@ true $x_t[0]$ and the raw observations $y_t$ on a single figure. :class: dropdown ``` -```{code-cell} python3 -import numpy as np -import matplotlib.pyplot as plt +Here is one solution: +```{code-cell} python3 A_ex = np.array([[0.9, 0.], [0., 0.5]]) C_ex = np.array([[1.], [1.]]) G_ex = np.array([[1., 0.]]) @@ -2558,26 +2684,44 @@ T_ex = 60 x0_hat_ex = np.zeros(2) Σ0_ex = np.eye(2) -np.random.seed(7) +rng = np.random.default_rng(7) x_true = np.zeros((T_ex + 1, 2)) y_seq_ex = np.zeros(T_ex) for t in range(T_ex): - x_true[t + 1] = A_ex @ x_true[t] + C_ex[:, 0] * np.random.randn() - y_seq_ex[t] = G_ex @ x_true[t] + np.random.randn() + x_true[t + 1] = A_ex @ x_true[t] + C_ex[:, 0] * rng.standard_normal() + y_seq_ex[t] = (G_ex @ x_true[t]).item() + rng.standard_normal() -x_hat_seq, Σ_hat_seq = iterate(x0_hat_ex, Σ0_ex, A_ex, C_ex, G_ex, R_ex, y_seq_ex) +x_hat_seq, Σ_hat_seq = iterate( + x0_hat_ex, Σ0_ex, A_ex, C_ex, G_ex, R_ex, y_seq_ex) +# x_hat_seq[t] = E[x_t | y^{t-1}] (one-step-ahead prediction) +# Σ_hat_seq[t] = corresponding prediction-error covariance fig, ax = plt.subplots() ax.plot(Σ_hat_seq[:, 0, 0], label=r'$\Sigma_t[0,0]$') ax.plot(Σ_hat_seq[:, 1, 1], label=r'$\Sigma_t[1,1]$') ax.set_xlabel('t') -ax.set_ylabel('conditional variance') 
+ax.set_ylabel('prediction-error variance') ax.legend() plt.show() +# The `iterate` function stores one-step-ahead predictions. +# We recover the filtered estimates E[x_t | y^t] by re-applying +# the measurement-update step at each t. +n_state = 2 +x_filt_seq = np.empty((T_ex, n_state)) +for t in range(T_ex): + xt_hat = x_hat_seq[t] + Σt = Σ_hat_seq[t] + μ_k = np.hstack([xt_hat, G_ex @ xt_hat]) + Σ_k = np.block([[Σt, Σt @ G_ex.T ], + [G_ex @ Σt, G_ex @ Σt @ G_ex.T + R_ex]]) + mn_k = MultivariateNormal(μ_k, Σ_k) + mn_k.partition(n_state) + x_filt_seq[t], _ = mn_k.cond_dist(0, y_seq_ex[t:t+1]) + fig, ax = plt.subplots() -ax.plot(x_true[1:, 0], label='true $x_t[0]$', alpha=0.7) -ax.plot(x_hat_seq[1:, 0], label=r'filtered $\hat{x}_t[0]$', ls='--') +ax.plot(x_true[:-1, 0], label='true $x_t[0]$', alpha=0.7) +ax.plot(x_filt_seq[:, 0], label=r'filtered $\tilde{x}_t[0]$', ls='--') ax.plot(y_seq_ex, label='observations $y_t$', alpha=0.4, lw=0.8) ax.set_xlabel('t') ax.legend() @@ -2595,14 +2739,22 @@ plt.show() In the classic factor analysis model at the end of the lecture the true covariance is $\Sigma_y = \Lambda \Lambda' + D$. -1. Set $\sigma_u = 2$ (instead of $0.5$). Recompute the fraction of -variance explained by the first two principal components and compare -it with the $\sigma_u = 0.5$ result. Explain the change. +1. Set $\sigma_u = 2$ (instead of $0.5$). + + - Recompute the fraction of + variance explained by the first two principal components and compare + it with the $\sigma_u = 0.5$ result. + - Explain the change. -1. Show that the conditional expectation $E[f \mid Y] = BY$ with -$B = \Lambda^\top \Sigma_y^{-1}$ is **not** equal to the two-component PCA -projection $\hat{Y} = P_{:,1:2}\,\epsilon_{1:2}$. Plot both on the same -axes. +1. 
Show that the observation-space factor-analytic posterior + $\Lambda E[f \mid Y] = \Lambda B Y$ (an $N$-vector) is **not** equal to + the two-component PCA reconstruction + $\hat{Y} = P_{:,1:2}\,\epsilon_{1:2}$ (also an $N$-vector). + - Plot both on the same axes. + + *Note:* $E[f \mid Y] = BY$ is a $k$-vector and $\hat{Y}$ is an + $N$-vector, so they cannot be compared directly; the comparison must be + made in observation space via $\Lambda E[f \mid Y]$. 1. In one or two sentences, explain why PCA is misspecified for factor-analytic data. @@ -2612,9 +2764,10 @@ factor-analytic data. :class: dropdown ``` +Here is one solution: + ```{code-cell} python3 -import numpy as np -import matplotlib.pyplot as plt +rng = np.random.default_rng(42) N_fa = 10 k_fa = 2 @@ -2636,40 +2789,50 @@ for σu_val in [0.5, 2.0]: print(f"σu={σu_val}: fraction explained by first 2 PCs = {frac:.4f}") σu_b = 0.5 -D_b = np.eye(N_fa) * σu_b ** 2 +D_b = np.eye(N_fa) * σu_b ** 2 Σy_b = Λ_fa @ Λ_fa.T + D_b μz_b = np.zeros(k_fa + N_fa) Σz_b = np.block([[np.eye(k_fa), Λ_fa.T], [Λ_fa, Σy_b]]) -z_b = np.random.multivariate_normal(μz_b, Σz_b) -f_b = z_b[:k_fa] -y_b = z_b[k_fa:] +z_b = rng.multivariate_normal(μz_b, Σz_b) +f_b = z_b[:k_fa] +y_b = z_b[k_fa:] -B_b = Λ_fa.T @ np.linalg.inv(Σy_b) +B_b = Λ_fa.T @ np.linalg.inv(Σy_b) Efy_b = B_b @ y_b λ_b, P_b = np.linalg.eigh(Σy_b) -ind_b = sorted(range(N_fa), key=lambda x: λ_b[x], reverse=True) -P_b = P_b[:, ind_b] -ε_b = P_b.T @ y_b -y_hat_b = P_b[:, :2] @ ε_b[:2] +ind_b = sorted(range(N_fa), key=lambda x: λ_b[x], reverse=True) +P_b = P_b[:, ind_b] +ε_b = P_b.T @ y_b +y_hat_b = P_b[:, :2] @ ε_b[:2] fig, ax = plt.subplots(figsize=(8, 4)) -ax.scatter(range(N_fa), Λ_fa @ Efy_b, label=r'Factor-analytic $\Lambda E[f\mid y]$') -ax.scatter(range(N_fa), y_hat_b, marker='x', label=r'PCA projection $\hat{y}$') -ax.scatter(range(N_fa), Λ_fa @ f_b, marker='^', alpha=0.6, label=r'True signal $\Lambda f$') +ax.scatter(range(N_fa), + Λ_fa @ Efy_b, label=r'Factor-analytic 
$\Lambda E[f\mid y]$') +ax.scatter(range(N_fa), + y_hat_b, marker='x', label=r'PCA projection $\hat{y}$') +ax.scatter(range(N_fa), + Λ_fa @ f_b, marker='^', alpha=0.6, label=r'True signal $\Lambda f$') ax.set_xlabel('observation index') ax.legend() plt.show() ``` -PCA is misspecified for factor-analytic data because it imposes no -structure on the residual covariance: it decomposes $\Sigma_y$ into -eigenvectors that need not align with the factor loadings $\Lambda$. -The factor model, by contrast, correctly separates the covariance into a -low-rank systematic part $\Lambda\Lambda'$ and a diagonal idiosyncratic -part $D$, so its conditional expectation $E[f\mid Y]$ is the minimum-variance -linear estimator of the factors. +In this symmetric example, PCA does recover the same two-dimensional +observation-space subspace as the factor model, namely the column space +of $\Lambda$. But PCA is still misspecified for factor-analytic data, +because it treats the covariance matrix as an arbitrary matrix to be +approximated and does not use the special decomposition +$\Sigma_y = \Lambda \Lambda^\top + D$ into a common part and an +idiosyncratic noise part. + +So the two methods are solving different problems. PCA forms +$\hat{Y}$ as the best rank-2 approximation to the observed data vector +$Y$, which in this example amounts to using the block means. The factor +model instead computes $\Lambda E[f \mid Y]$, the conditional mean of the +latent common component $\Lambda f$ given the data, and because it +accounts for noise it shrinks those block means toward zero. 
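The shrinkage factor quoted above can be checked directly. The following sketch re-derives it numerically under the same symmetric assumptions ($N = 10$, $k = 2$, block loadings, $\sigma_u = 0.5$); it is self-contained rather than reusing the variables defined in the solution code:

```python
import numpy as np

σu = 0.5
Λ = np.vstack([np.tile([1.0, 0.0], (5, 1)),
               np.tile([0.0, 1.0], (5, 1))])
Σy = Λ @ Λ.T + σu**2 * np.eye(10)   # Σ_y = ΛΛ' + D with D = σu² I
B = Λ.T @ np.linalg.inv(Σy)          # E[f | y] = B y

rng = np.random.default_rng(0)
y = rng.standard_normal(10)
block_means = np.array([y[:5].mean(), y[5:].mean()])

# Because Λ'Λ = 5 I, the posterior mean equals the block means
# scaled by the shrinkage factor 5 / (5 + σu²) ≈ 0.952
print(np.allclose(B @ y, block_means * 5 / (5 + σu**2)))
print(5 / (5 + σu**2))
```

The factor-model posterior really is the PCA block means pulled toward zero, with more shrinkage the larger the idiosyncratic noise $\sigma_u$.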
```{solution-end} ``` diff --git a/lectures/prob_matrix.md b/lectures/prob_matrix.md index 3a63a54d3..b29820c20 100644 --- a/lectures/prob_matrix.md +++ b/lectures/prob_matrix.md @@ -58,6 +58,8 @@ from scipy.special import comb from mpl_toolkits.mplot3d import Axes3D from matplotlib_inline.backend_inline import set_matplotlib_formats set_matplotlib_formats('retina') + +rng = np.random.default_rng(0) ``` @@ -620,7 +622,7 @@ f = np.array([[0.3, 0.2], [0.1, 0.4]]) f_cum = np.cumsum(f) # draw random numbers -p = np.random.rand(1_000_000) +p = rng.random(1_000_000) x = np.vstack([xs[1]*np.ones(p.shape), ys[1]*np.ones(p.shape)]) # map to the bivariate distribution @@ -777,7 +779,7 @@ class discrete_bijoint: xs = self.xs ys = self.ys f_cum = np.cumsum(self.f) - p = np.random.rand(n) + p = rng.random(n) x = np.empty([2, p.shape[0]]) lf = len(f_cum) lx = len(xs)-1 @@ -979,7 +981,7 @@ Next we can use a built-in `numpy` function to draw random samples, then calc μ= np.array([0, 5]) σ= np.array([[5, .2], [.2, 1]]) n = 1_000_000 -data = np.random.multivariate_normal(μ, σ, n) +data = rng.multivariate_normal(μ, σ, n) x = data[:, 0] y = data[:, 1] ``` @@ -990,7 +992,7 @@ y = data[:, 1] plt.hist(x, bins=1_000, alpha=0.6) μx_hat, σx_hat = np.mean(x), np.std(x) print(μx_hat, σx_hat) -x_sim = np.random.normal(μx_hat, σx_hat, 1_000_000) +x_sim = rng.normal(μx_hat, σx_hat, 1_000_000) plt.hist(x_sim, bins=1_000, alpha=0.4, histtype="step") plt.show() ``` @@ -999,7 +1001,7 @@ plt.show() plt.hist(y, bins=1_000, density=True, alpha=0.6) μy_hat, σy_hat = np.mean(y), np.std(y) print(μy_hat, σy_hat) -y_sim = np.random.normal(μy_hat, σy_hat, 1_000_000) +y_sim = rng.normal(μy_hat, σy_hat, 1_000_000) plt.hist(y_sim, bins=1_000, density=True, alpha=0.4, histtype="step") plt.show() ``` @@ -1059,7 +1061,7 @@ Let's draw from a normal distribution with above mean and variance and check how σx = np.sqrt(np.dot((x - μx)**2, z)) # sample -zz = np.random.normal(μx, σx, 1_000_000) +zz = rng.normal(μx, σx, 
1_000_000) plt.hist(zz, bins=300, density=True, alpha=0.3, range=[-10, 10]) plt.show() ``` @@ -1079,7 +1081,7 @@ plt.show() σy = np.sqrt(np.dot((y - μy)**2, z)) # sample -zz = np.random.normal(μy,σy,1_000_000) +zz = rng.normal(μy, σy, 1_000_000) plt.hist(zz, bins=100, density=True, alpha=0.3) plt.show() ``` @@ -1187,7 +1189,7 @@ $$ \text{Prob} \{X=1\}=& q =\mu_{1}\\ \text{Prob} \{Y=0\}=& 1-r =\nu_{0}\\ \text{Prob} \{Y=1\}= & r =\nu_{1}\\ -\text{where } 0 \leq q < r \leq 1 +\text{where } 0 \leq q \leq r \leq 1 \end{aligned} $$ @@ -1309,16 +1311,17 @@ Let's first generate X and Y. # number of draws draws = 1_000_000 -# generate draws from uniform distribution -p = np.random.rand(draws) +# generate independent draws from uniform distribution for X and Y +p_x = rng.random(draws) +p_y = rng.random(draws) -# generate draws of X and Y via uniform distribution +# generate draws of X and Y via independent uniform draws x = np.ones(draws) y = np.ones(draws) -x[p <= μ[0]] = 0 -x[p > μ[0]] = 1 -y[p <= ν[0]] = 0 -y[p > ν[0]] = 1 +x[p_x <= μ[0]] = 0 +x[p_x > μ[0]] = 1 +y[p_y <= ν[0]] = 0 +y[p_y > ν[0]] = 1 ``` ```{code-cell} ipython3 @@ -1370,7 +1373,7 @@ f1_cum = np.cumsum(f1) draws1 = 1_000_000 # generate draws from uniform distribution -p = np.random.rand(draws1) +p = rng.random(draws1) # generate draws of first coupling via uniform distribution c1 = np.vstack([np.ones(draws1), np.ones(draws1)]) @@ -1445,7 +1448,7 @@ f2_cum = np.cumsum(f2) draws2 = 1_000_000 # generate draws from uniform distribution -p = np.random.rand(draws2) +p = rng.random(draws2) # generate draws of second coupling via uniform distribution c2 = np.vstack([np.ones(draws2), np.ones(draws2)]) @@ -1533,7 +1536,7 @@ mystnb: n_cop = 100_000 # Draw from bivariate standard normal with correlation ρ_cop -z = np.random.multivariate_normal( +z = rng.multivariate_normal( [0, 0], [[1, ρ_cop], [ρ_cop, 1]], n_cop ) @@ -1592,6 +1595,8 @@ where $X \in \{0,1\}$ and $Y \in \{10, 20\}$. 
:class: dropdown ``` +Here is one solution: + ```{code-cell} ipython3 F = np.array([[0.3, 0.2], [0.1, 0.4]]) @@ -1635,6 +1640,8 @@ Using the same joint distribution $F$ and values $X \in \{0,1\}$, $Y \in \{10, 2 :class: dropdown ``` +Here is one solution: + ```{code-cell} ipython3 xs = np.array([0, 1]) ys = np.array([10, 20]) @@ -1676,7 +1683,7 @@ and therefore $\text{Cov}(X,Y) = \mathbb{E}[XY] - \mathbb{E}[X]\mathbb{E}[Y] = 0 **Sum of Two Dice** -Let $X$ and $Y$ each be uniformly distributed on $\{1,2,3,4,5,6\}$, and let $Z = X + Y$. +Let $X$ and $Y$ be **independent** random variables, each uniformly distributed on $\{1,2,3,4,5,6\}$, and let $Z = X + Y$. 1. Use the convolution formula $h_k = \sum_i f_i g_{k-i}$ to compute the distribution of $Z$. @@ -1691,6 +1698,8 @@ Let $X$ and $Y$ each be uniformly distributed on $\{1,2,3,4,5,6\}$, and let $Z = :class: dropdown ``` +Here is one solution: + ```{code-cell} ipython3 f = np.ones(6) / 6 g = np.ones(6) / 6 @@ -1703,7 +1712,7 @@ h = [ z_vals = np.arange(2, 13) n = 1_000_000 -z_sim = np.random.randint(1, 7, n) + np.random.randint(1, 7, n) +z_sim = rng.integers(1, 7, n) + rng.integers(1, 7, n) counts = np.bincount(z_sim, minlength=13)[2:] fig, ax = plt.subplots() @@ -1747,6 +1756,8 @@ where $p_{ij} = \text{Prob}\{X(t+1)=j \mid X(t)=i\}$. :class: dropdown ``` +Here is one solution: + ```{code-cell} ipython3 P = np.array([[0.9, 0.1], [0.2, 0.8]]) @@ -1774,25 +1785,29 @@ print(f"ψ_100 close to stationary? {np.allclose(ψ_100, ψ_star, atol=1e-6)}") A coin has unknown bias $\theta \in \{0.2,\, 0.5,\, 0.8\}$ with prior $\pi = [0.25,\, 0.50,\, 0.25]$. +Assume that, conditional on $\theta$, the coin flips are i.i.d. Bernoulli($\theta$). + 1. After observing $k = 7$ heads in $n = 10$ flips, compute the likelihood -$$ -\mathcal{L}(\theta \mid \text{data}) = \binom{10}{7}\,\theta^7\,(1-\theta)^3 -$$ + $$ + \mathcal{L}(\theta \mid \text{data}) = \binom{10}{7}\,\theta^7\,(1-\theta)^3 + $$ -for each $\theta$. + for each $\theta$. -1. 
Apply equation {eq}`eq:condprobbayes` to compute the posterior $\pi(\theta \mid \text{data})$. +2. Apply equation {eq}`eq:condprobbayes` to compute the posterior $\pi(\theta \mid \text{data})$. -1. Plot the prior and posterior side by side. +3. Plot the prior and posterior side by side. -1. Repeat for $k = 3$ heads and describe how the posterior shifts. +4. Repeat for $k = 3$ heads and describe how the posterior shifts. ``` ```{solution-start} prob_matrix_ex5 :class: dropdown ``` +Here is one solution: + ```{code-cell} ipython3 θ_vals = np.array([0.2, 0.5, 0.8]) π = np.array([0.25, 0.50, 0.25]) @@ -1822,7 +1837,6 @@ for ax, post, title in zip( ax.set_ylabel('Probability') ax.set_title(title) ax.legend() -plt.tight_layout() plt.show() ``` From 5c62e885de50c4be903bdd80ffd172522820438f Mon Sep 17 00:00:00 2001 From: HumphreyYang Date: Fri, 24 Apr 2026 14:09:48 +0800 Subject: [PATCH 16/26] updates --- lectures/misspecified_recovery.md | 750 +++++++++++++++--------------- lectures/ross_recovery.md | 299 ++++++------ 2 files changed, 516 insertions(+), 533 deletions(-) diff --git a/lectures/misspecified_recovery.md b/lectures/misspecified_recovery.md index f1a980228..cf274ca5f 100644 --- a/lectures/misspecified_recovery.md +++ b/lectures/misspecified_recovery.md @@ -137,19 +137,19 @@ P_phys = np.array([ # Consumption levels in each state (arbitrary units) c_levels = np.array([0.85, 1.00, 1.15]) -state_names = ['Recession', 'Normal', 'Expansion'] +state_names = ['recession', 'normal', 'expansion'] # Preference parameters -delta = 0.99 # monthly discount factor -gamma = 5.0 # coefficient of relative risk aversion +δ = 0.99 # monthly discount factor +γ = 5.0 # coefficient of relative risk aversion # Arrow price matrix under power utility with rational expectations: -# q_ij = delta * (c_j / c_i)^{-gamma} * p_ij +# q_ij = δ * (c_j / c_i)^{-γ} * p_ij n = len(c_levels) Q_mat = np.zeros((n, n)) for i in range(n): for j in range(n): - Q_mat[i, j] = delta * (c_levels[j] / 
c_levels[i])**(-gamma) * P_phys[i, j]
+        Q_mat[i, j] = δ * (c_levels[j] / c_levels[i])**(-γ) * P_phys[i, j]
 
 print("Arrow price matrix Q:")
 print(np.round(Q_mat, 5))
@@ -179,7 +179,7 @@ $$
 def risk_neutral_probs(Q):
     """Compute risk-neutral transition matrix from Arrow price matrix."""
     q_bonds = Q.sum(axis=1)   # one-period bond prices
-    P_bar = Q / q_bonds[:, np.newaxis]
+    P_bar = Q / q_bonds[:, np.newaxis]
     return P_bar, q_bonds
 
@@ -187,9 +187,9 @@ P_bar, q_bonds = risk_neutral_probs(Q_mat)
 print("One-period bond prices (risk-free discount factors):")
 for i, (s, qb) in enumerate(zip(state_names, q_bonds)):
-    print(f"  {s:12s}: {qb:.5f}  (annualized yield ≈ {-np.log(qb)*12:.2%})")
+    print(f"  {s:12s}: {qb:.5f}  (annualized yield ~ {-np.log(qb)*12:.2%})")
 
-print("\nRisk-neutral transition matrix P̄:")
+print("\nRisk-neutral transition matrix P_bar:")
 print(np.round(P_bar, 4))
 print(f"\nRow sums: {P_bar.sum(axis=1)}")
 ```
@@ -259,39 +259,39 @@ def perron_frobenius(Q):
     Returns
     -------
-    eta_hat : float — log of the dominant eigenvalue
-    exp_eta : float — dominant eigenvalue exp(η̂)
-    e_hat : ndarray — dominant eigenvector (positive, normalized to sum=1)
-    P_hat : ndarray — long-term risk-neutral transition matrix
+    η_hat : float — log of the dominant eigenvalue
+    exp_η : float — dominant eigenvalue exp(η_hat)
+    e_hat : ndarray — dominant eigenvector (positive, normalized to sum=1)
+    P_hat : ndarray — long-term risk-neutral transition matrix
     """
     eigenvalues, eigenvectors = linalg.eig(Q)
 
     # Dominant eigenvalue: largest real part (real & positive by Perron–Frobenius)
-    idx = np.argmax(eigenvalues.real)
-    exp_eta = eigenvalues[idx].real
-    e_hat = eigenvectors[:, idx].real
+    idx = np.argmax(eigenvalues.real)
+    exp_η = eigenvalues[idx].real
+    e_hat = eigenvectors[:, idx].real
 
     # Ensure positive entries (PF guarantees existence; numpy may flip sign)
    if e_hat.mean() < 0:
         e_hat = -e_hat
     e_hat = np.abs(e_hat) / np.abs(e_hat).sum()   # normalize to sum = 1
 
-    eta_hat = np.log(exp_eta)
+    η_hat = np.log(exp_η)
 
     # Long-term risk-neutral transition matrix
-    # P_hat[i,j] = exp(-η̂) * Q[i,j] * e_hat[j] / e_hat[i]
-    P_hat = (1.0 / exp_eta) * Q * e_hat[np.newaxis, :] / e_hat[:, np.newaxis]
+    # P_hat[i,j] = exp(-η_hat) * Q[i,j] * e_hat[j] / e_hat[i]
+    P_hat = (1.0 / exp_η) * Q * e_hat[np.newaxis, :] / e_hat[:, np.newaxis]
 
-    return eta_hat, exp_eta, e_hat, P_hat
+    return η_hat, exp_η, e_hat, P_hat
 
-eta_hat, exp_eta, e_hat, P_hat = perron_frobenius(Q_mat)
+η_hat, exp_η, e_hat, P_hat = perron_frobenius(Q_mat)
 
-print(f"Dominant eigenvalue exp(η̂) = {exp_eta:.6f}")
-print(f"Log eigenvalue η̂ = {eta_hat:.5f} "
-      f"(annualized ≈ {eta_hat*12:.4f})")
-print(f"\nEigenvector ê = {e_hat.round(5)}")
-print(f"\nLong-term risk-neutral P̂:")
+print(f"Dominant eigenvalue exp(η_hat) = {exp_η:.6f}")
+print(f"Log eigenvalue η_hat = {η_hat:.5f} "
+      f"(annualized ~ {η_hat*12:.4f})")
+print(f"\nEigenvector e_hat = {e_hat.round(5)}")
+print(f"\nLong-term risk-neutral P_hat:")
 print(np.round(P_hat, 4))
 print(f"\nRow sums: {P_hat.sum(axis=1)}")
 ```
@@ -302,9 +302,9 @@ print(f"\nRow sums: {P_hat.sum(axis=1)}")
 
 fig, axes = plt.subplots(1, 3, figsize=(15, 4.5))
 
 matrices = [
-    (P_phys, r'Physical $\mathbf{P}$', 'Blues'),
-    (P_bar, r'Risk-neutral $\bar{\mathbf{P}}$', 'Oranges'),
-    (P_hat, r'Long-term risk-neutral $\hat{\mathbf{P}}$', 'Greens'),
+    (P_phys, r'physical $\mathbf{P}$', 'Blues'),
+    (P_bar, r'risk-neutral $\bar{\mathbf{P}}$', 'Oranges'),
+    (P_hat, r'long-term risk-neutral $\hat{\mathbf{P}}$', 'Greens'),
 ]
 
 for ax, (mat, title, cmap) in zip(axes, matrices):
@@ -313,8 +313,8 @@ for ax, (mat, title, cmap) in zip(axes, matrices):
     ax.set_xticks(range(n)); ax.set_yticks(range(n))
     ax.set_xticklabels(state_names, rotation=20, fontsize=9)
     ax.set_yticklabels(state_names, fontsize=9)
-    ax.set_xlabel('Next state', fontsize=9)
-    ax.set_ylabel('Current state', fontsize=9)
+    ax.set_xlabel('next state', fontsize=9)
+    ax.set_ylabel('current state', fontsize=9)
     plt.colorbar(im, ax=ax, fraction=0.046, pad=0.04)
     for i in range(n):
         for j in range(n):
@@ -322,7 +322,7 @@ for ax, (mat, title, cmap) in zip(axes, matrices):
             ax.text(j, i, f'{mat[i,j]:.3f}', ha='center', va='center',
                     fontsize=9, color=clr)
 
-plt.suptitle('Transition Matrices Under Alternative Probability Measures',
+plt.suptitle('transition matrices under alternative probability measures',
              fontsize=13, y=1.02)
 plt.tight_layout()
 plt.show()
@@ -333,37 +333,37 @@ plt.show()
 def stationary_dist(P):
     """Compute stationary distribution of an ergodic transition matrix P."""
     n = P.shape[0]
-    A = (P.T - np.eye(n))
-    A[-1] = 1.0
-    b = np.zeros(n); b[-1] = 1.0
+    A = (P.T - np.eye(n))
+    A[-1] = 1.0
+    b = np.zeros(n); b[-1] = 1.0
     return linalg.solve(A, b)
 
-pi_phys = stationary_dist(P_phys)
-pi_bar = stationary_dist(P_bar)
-pi_hat = stationary_dist(P_hat)
+π_phys = stationary_dist(P_phys)
+π_bar = stationary_dist(P_bar)
+π_hat = stationary_dist(P_hat)
 
 fig, ax = plt.subplots(figsize=(8, 4))
-x = np.arange(n)
-w = 0.25
-labels = [r'Physical $P$', r'Risk-neutral $\bar{P}$',
-          r'Long-term risk-neutral $\hat{P}$']
+x = np.arange(n)
+w = 0.25
+labels = [r'physical $P$', r'risk-neutral $\bar{P}$',
+          r'long-term risk-neutral $\hat{P}$']
 colors = ['steelblue', 'darkorange', 'forestgreen']
 
-for k, (pi, lbl, col) in enumerate(zip([pi_phys, pi_bar, pi_hat], labels, colors)):
-    bars = ax.bar(x + k*w, pi, width=w, label=lbl, color=col, alpha=0.85,
+for k, (π, lbl, col) in enumerate(zip([π_phys, π_bar, π_hat], labels, colors)):
+    bars = ax.bar(x + k*w, π, width=w, label=lbl, color=col, alpha=0.85,
                   edgecolor='white')
-    for b_, v in zip(bars, pi):
+    for b_, v in zip(bars, π):
         ax.text(b_.get_x() + w/2, v + 0.008, f'{v:.3f}',
                 ha='center', va='bottom', fontsize=9)
 
 ax.set_xticks(x + w); ax.set_xticklabels(state_names)
-ax.set_ylabel('Stationary probability')
-ax.set_title('Stationary Distributions Under Three Probability Measures')
+ax.set_ylabel('stationary probability')
+ax.set_title('stationary distributions under three probability measures')
 ax.legend(fontsize=9)
 plt.tight_layout(); plt.show()
 
 print("Stationary distributions:")
-for lbl, pi in zip(labels, [pi_phys, pi_bar, pi_hat]):
-    print(f"  {lbl:45s}: {np.round(pi,4)}")
+for lbl, π in zip(labels, [π_phys, π_bar, π_hat]):
    print(f"  {lbl:45s}: {np.round(π,4)}")
 ```
 
 The long-term risk-neutral measure $\hat{\mathbf{P}}$ assigns **higher weight to bad
@@ -411,22 +411,22 @@ The three components are:
 # SDF matrix: s_ij = q_ij / p_ij
 S_mat = np.where(P_phys > 0, Q_mat / P_phys, 0.0)
 
-# Trend SDF: ŝ_ij = exp(η̂) * e_hat_i / e_hat_j
-S_hat = exp_eta * e_hat[:, np.newaxis] / e_hat[np.newaxis, :]
+# Trend SDF: s_hat_ij = exp(η_hat) * e_hat_i / e_hat_j
+S_hat = exp_η * e_hat[:, np.newaxis] / e_hat[np.newaxis, :]
 
-# Martingale increment: ĥ_ij = P̂_ij / P_ij (also = S_ij / Ŝ_ij)
+# Martingale increment: h_hat_ij = P_hat_ij / P_ij (also = S_ij / S_hat_ij)
 H_incr = np.where(P_phys > 0, P_hat / P_phys, 0.0)
 
 print("SDF matrix S = Q/P:")
 print(np.round(S_mat, 4))
-print("\nTrend SDF Ŝ = exp(η̂) × ê_i / ê_j:")
+print("\nTrend SDF S_hat = exp(η_hat) * e_hat_i / e_hat_j:")
 print(np.round(S_hat, 4))
-print("\nMartingale increment ĥ = Ŝ × H̃_incr (= P̂/P):")
+print("\nMartingale increment h_hat = S_hat * H_tilde_incr (= P_hat/P):")
 print(np.round(H_incr, 4))
 
-# Verify martingale property: E[ĥ_{ij} | X_t=i] = sum_j ĥ_ij * p_ij = 1
+# Verify martingale property: E[h_hat_{ij} | X_t=i] = sum_j h_hat_ij * p_ij = 1
 mart_check = (H_incr * P_phys).sum(axis=1)
-print(f"\nMartingale property check — E[ĥ | X_t=i] = {mart_check}")
+print(f"\nMartingale property check — E[h_hat | X_t=i] = {mart_check}")
 ```
 
 Higher risk aversion amplifies the pessimistic distortion: as $\gamma$ increases, the
@@ -486,26 +486,26 @@ recovery succeeds exactly** when consumption fluctuations around a deterministic
 are the only source of risk.
 ```{code-cell} ipython3
-# Verify: for trend-stationary power utility, ĥ_ij = 1 identically
+# Verify: for trend-stationary power utility, h_hat_ij = 1 identically
 gc = 0.002   # monthly trend growth
 
 # Trend-stationary: consumption growth ratio depends only on state, not history
-# s_ij = exp(-delta - gamma*gc) * (c_j/c_i)^(-gamma)
+# s_ij = exp(-δ - γ*gc) * (c_j/c_i)^(-γ)
 S_trend = np.zeros((n, n))
 for i in range(n):
     for j in range(n):
-        S_trend[i, j] = np.exp(-delta - gamma*gc) * (c_levels[j]/c_levels[i])**(-gamma)
+        S_trend[i, j] = np.exp(-δ - γ*gc) * (c_levels[j]/c_levels[i])**(-γ)
 
 Q_trend = S_trend * P_phys
 
-_, exp_eta_t, e_hat_t, P_hat_t = perron_frobenius(Q_trend)
+_, exp_η_t, e_hat_t, P_hat_t = perron_frobenius(Q_trend)
 H_incr_trend = np.where(P_phys > 0, P_hat_t / P_phys, 0.0)
 
-print("Martingale increment ĥ_ij for trend-stationary power utility:")
+print("Martingale increment h_hat_ij for trend-stationary power utility:")
 print(np.round(H_incr_trend, 6))
 print(f"\nMax deviation from 1: {np.abs(H_incr_trend[P_phys>0] - 1).max():.2e}")
-print("→ Martingale is trivial: Recovery succeeds.")
+print("-> Martingale is trivial: Recovery succeeds.")
 ```
 
 ### Recursive (Epstein–Zin) utility
@@ -533,51 +533,51 @@ The additional factor $v^*_j/(\mathbf{P}_i v^*)$ introduces a **nontrivial
 martingale component** whenever continuation values vary across states.
 
 ```{code-cell} ipython3
-def solve_ez_finite(P, c, delta, gamma, gc, tol=1e-12, max_iter=5000):
+def solve_ez_finite(P, c, δ, γ, gc, tol=1e-12, max_iter=5000):
     """
     Solve for Epstein-Zin continuation values in finite Markov chain.
 
     Solves the fixed-point v_i = (1-β)log(c_i) + β/(1-γ) log(P_i @ exp((1-γ)v))
-    where β = exp(-delta - gc). The special case γ = 1 (log utility) is handled
+    where β = exp(-δ - gc). The special case γ = 1 (log utility) is handled
    separately to avoid the 0/0 indeterminate form: the recursion reduces to
     v = (I - β P)^{-1} (1-β) log(c) and the SDF simplifies to
     s_ij = exp(-δ - g_c) c_i / c_j.
 
     Returns
     -------
-    v : ndarray — continuation values (net of time trend)
     vstar : ndarray — exp((1-γ)v)
-    s : ndarray — one-period SDF matrix
     """
-    beta = np.exp(-delta - gc)
+    β = np.exp(-δ - gc)
     log_c = np.log(c)
-    n = len(c)
+    n = len(c)
 
-    if abs(gamma - 1.0) < 1e-10:
+    if abs(γ - 1.0) < 1e-10:
         # Log utility: (I - β P) v = (1-β) log c
-        v = linalg.solve(np.eye(n) - beta * P, (1 - beta) * log_c)
+        v = linalg.solve(np.eye(n) - β * P, (1 - β) * log_c)
         vstar = np.ones(n)   # exp((1-1)*v) = 1
-        Pv = np.ones(n)      # P @ ones = ones
+        Pv = np.ones(n)      # P @ ones = ones
    else:
         # General recursive utility: fixed-point iteration
         v = log_c.copy()
         for _ in range(max_iter):
-            vstar = np.exp((1 - gamma) * v)
-            Pv = P @ vstar
-            v_new = ((1 - beta) * log_c
-                     + beta / (1 - gamma) * np.log(Pv))
+            vstar = np.exp((1 - γ) * v)
+            Pv = P @ vstar
+            v_new = ((1 - β) * log_c
+                     + β / (1 - γ) * np.log(Pv))
             if np.max(np.abs(v_new - v)) < tol:
                 v = v_new
                 break
            v = v_new
-        vstar = np.exp((1 - gamma) * v)
-        Pv = P @ vstar
+        vstar = np.exp((1 - γ) * v)
+        Pv = P @ vstar
 
     # SDF matrix
     s = np.zeros((n, n))
     for i in range(n):
         for j in range(n):
-            s[i, j] = np.exp(-delta - gc) * (c[i] / c[j]) * (vstar[j] / Pv[i])
+            s[i, j] = np.exp(-δ - gc) * (c[i] / c[j]) * (vstar[j] / Pv[i])
 
     return v, vstar, s
@@ -585,50 +585,50 @@ def solve_ez_finite(P, c, delta, gamma, gc, tol=1e-12, max_iter=5000):
 # Compare: γ = 1 (log utility, degenerate martingale) vs γ = 5
 gc_ex = 0.001   # monthly consumption trend growth
 
-for gam, label in [(1.0, 'γ = 1 (log utility)'), (5.0, 'γ = 5 (risk aversion)')]:
+for γ_val, label in [(1.0, 'γ = 1 (log utility)'), (5.0, 'γ = 5 (risk aversion)')]:
     v_ez, vstar_ez, S_ez = solve_ez_finite(P_phys, c_levels,
-                                           delta, gam, gc_ex)
-    Q_ez = S_ez * P_phys
+                                           δ, γ_val, gc_ex)
+    Q_ez = S_ez * P_phys
     _, _, _, P_hat_ez = perron_frobenius(Q_ez)
-    H_ez = np.where(P_phys > 0, P_hat_ez / P_phys, 0.0)
+    H_ez = np.where(P_phys > 0, P_hat_ez / P_phys, 0.0)
 
-    pi_hat_ez = stationary_dist(P_hat_ez)
+    π_hat_ez = stationary_dist(P_hat_ez)
 
     print(f"\n{label}")
     print(f"  Continuation values v = {v_ez.round(4)}")
-    print(f"  Max |ĥ_ij - 1| = {np.abs(H_ez[P_phys>0] - 1).max():.4f}")
-    print(f"  Stationary P̂ = {pi_hat_ez.round(4)}")
-    print(f"  Stationary P = {pi_phys.round(4)}")
+    print(f"  Max |h_hat_ij - 1| = {np.abs(H_ez[P_phys>0] - 1).max():.4f}")
+    print(f"  Stationary P_hat = {π_hat_ez.round(4)}")
+    print(f"  Stationary P = {π_phys.round(4)}")
 ```
 
 ```{code-cell} ipython3
 # Show how the martingale depends on γ for recursive utility
-# Start at 1.0: the gamma=1 special case in solve_ez_finite is handled explicitly.
-gammas_ez = np.linspace(1.0, 10.0, 50)
+# Start at 1.0: the γ=1 special case in solve_ez_finite is handled explicitly.
+γs_ez = np.linspace(1.0, 10.0, 50)
 mart_errors = []
-pi_rec_hat = []
+π_rec_hat = []
 
-for gam in gammas_ez:
-    v_g, _, S_g = solve_ez_finite(P_phys, c_levels, delta, gam, gc_ex)
-    Q_g = S_g * P_phys
-    _, _, _, Ph = perron_frobenius(Q_g)
-    H_g = np.where(P_phys > 0, Ph / P_phys, 0.0)
+for γ_val in γs_ez:
+    v_g, _, S_g = solve_ez_finite(P_phys, c_levels, δ, γ_val, gc_ex)
+    Q_g = S_g * P_phys
+    _, _, _, Ph = perron_frobenius(Q_g)
+    H_g = np.where(P_phys > 0, Ph / P_phys, 0.0)
     mart_errors.append(np.abs(H_g[P_phys > 0] - 1).max())
-    pi_rec_hat.append(stationary_dist(Ph)[0])
+    π_rec_hat.append(stationary_dist(Ph)[0])
 
 fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))
-ax1.plot(gammas_ez, mart_errors, color='firebrick', lw=2.5)
-ax1.set_xlabel('Risk aversion γ')
+ax1.plot(γs_ez, mart_errors, color='firebrick', lw=2.5)
+ax1.set_xlabel('risk aversion γ')
 ax1.set_ylabel(r'$\max_{i,j} |\hat{h}_{ij} - 1|$')
-ax1.set_title('Martingale non-degeneracy vs risk aversion\n(Epstein–Zin utility)')
-
-ax2.plot(gammas_ez, pi_rec_hat, color='steelblue', lw=2.5,
-         label=r'Recession weight under $\hat{P}$')
-ax2.axhline(pi_phys[0], ls='--', color='grey', lw=1.5,
-            label=f'Recession weight under $P$ ({pi_phys[0]:.3f})')
-ax2.set_xlabel('Risk aversion γ')
-ax2.set_ylabel('Stationary probability')
-ax2.set_title('Recovered recession probability vs risk aversion')
+ax1.set_title('martingale non-degeneracy vs risk aversion\n(Epstein–Zin utility)')
+
+ax2.plot(γs_ez, π_rec_hat, color='steelblue', lw=2.5,
         label=r'recession weight under $\hat{P}$')
+ax2.axhline(π_phys[0], ls='--', color='grey', lw=1.5,
+            label=f'recession weight under $P$ ({π_phys[0]:.3f})')
+ax2.set_xlabel('risk aversion γ')
+ax2.set_ylabel('stationary probability')
+ax2.set_title('recovered recession probability vs risk aversion')
 ax2.legend(fontsize=9)
 plt.tight_layout(); plt.show()
@@ -636,13 +636,13 @@ plt.tight_layout(); plt.show()
 
 ```{code-cell} ipython3
 # Visualize the martingale increment using Epstein-Zin utility (γ=5).
-# Trend-stationary power utility always yields ĥ_ij = 1 by construction (see Exercise 4),
+# Trend-stationary power utility always yields h_hat_ij = 1 by construction (see Exercise 4),
 # so we use recursive utility here to reveal a genuinely non-trivial martingale.
-gamma_ez_demo = 5.0
-_, _, S_ez_demo = solve_ez_finite(P_phys, c_levels, delta, gamma_ez_demo, gc_ex)
-Q_ez_demo = S_ez_demo * P_phys
+γ_ez_demo = 5.0
+_, _, S_ez_demo = solve_ez_finite(P_phys, c_levels, δ, γ_ez_demo, gc_ex)
+Q_ez_demo = S_ez_demo * P_phys
 _, _, _, P_hat_ez_demo = perron_frobenius(Q_ez_demo)
-H_incr_ez = np.where(P_phys > 0, P_hat_ez_demo / P_phys, 1.0)
+H_incr_ez = np.where(P_phys > 0, P_hat_ez_demo / P_phys, 1.0)
 
 fig, axes = plt.subplots(1, 2, figsize=(12, 4.5))
@@ -650,7 +650,7 @@ vmax_h = max(1.5, H_incr_ez.max() * 1.05)
 vmin_h = min(0.5, H_incr_ez.min() * 0.95)
 im0 = axes[0].imshow(H_incr_ez, cmap='RdYlGn', vmin=vmin_h, vmax=vmax_h, aspect='auto')
 axes[0].set_title(
-    r'Martingale increment $\hat{h}_{ij} = \hat{p}_{ij}/p_{ij}$' '\n'
+    r'martingale increment $\hat{h}_{ij} = \hat{p}_{ij}/p_{ij}$' '\n'
     r'(Epstein–Zin utility, $\gamma=5$)',
     fontsize=11)
 for i in range(n):
@@ -660,24 +660,24 @@ for i in range(n):
 axes[0].set_xticks(range(n)); axes[0].set_yticks(range(n))
 axes[0].set_xticklabels(state_names, rotation=20, fontsize=9)
 axes[0].set_yticklabels(state_names, fontsize=9)
-axes[0].set_xlabel('Next state'); axes[0].set_ylabel('Current state')
+axes[0].set_xlabel('next state'); axes[0].set_ylabel('current state')
 plt.colorbar(im0, ax=axes[0], fraction=0.046)
 
 # How risk aversion γ shifts the recovered measure under Epstein-Zin utility.
-gammas_shift = np.linspace(1.0, 12, 60)
-rec_wts_ez = []
-for g in gammas_shift:
-    _, _, S_g = solve_ez_finite(P_phys, c_levels, delta, g, gc_ex)
-    Q_g = S_g * P_phys
+γs_shift = np.linspace(1.0, 12, 60)
+rec_wts_ez = []
+for g in γs_shift:
+    _, _, S_g = solve_ez_finite(P_phys, c_levels, δ, g, gc_ex)
+    Q_g = S_g * P_phys
     _, _, _, Ph = perron_frobenius(Q_g)
     rec_wts_ez.append(stationary_dist(Ph)[0])
 
-axes[1].plot(gammas_shift, rec_wts_ez, color='steelblue', lw=2.5)
-axes[1].axhline(pi_phys[0], color='grey', ls='--', lw=1.5,
-                label=fr'Physical recession prob = {pi_phys[0]:.3f}')
-axes[1].set_xlabel('Risk aversion γ')
-axes[1].set_ylabel(r'Recession weight under $\hat{P}$')
-axes[1].set_title(r'How $\gamma$ shifts the long-term risk-neutral measure'
+axes[1].plot(γs_shift, rec_wts_ez, color='steelblue', lw=2.5)
+axes[1].axhline(π_phys[0], color='grey', ls='--', lw=1.5,
+                label=fr'physical recession prob = {π_phys[0]:.3f}')
+axes[1].set_xlabel('risk aversion γ')
+axes[1].set_ylabel(r'recession weight under $\hat{P}$')
+axes[1].set_title(r'how $\gamma$ shifts the long-term risk-neutral measure'
                   '\n(Epstein–Zin utility)')
 axes[1].legend(fontsize=9)
 plt.tight_layout(); plt.show()
@@ -725,19 +725,19 @@ where $H^*$ is a martingale determined by the continuation value of the recursiv
 # Model parameters from Borovicka-Hansen-Scheinkman (2016), Figure 1
 # Monthly frequency
 lrr_params = dict(
-    delta = 0.002,    # subjective discount rate
-    gamma = 10.0,     # risk aversion
-    mu11 = -0.021,    # mean reversion of X1
-    mu12 = 0.0,       # (under P; becomes non-zero under P̂)
-    mu22 = -0.013,    # mean reversion of X2
-    iota1 = 0.0,      # long-run mean of X1
-    iota2 = 1.0,      # long-run mean of X2 (normalized)
-    sigma1 = np.array([0.0, 0.00034, 0.0 ]),   # diffusion of X1 (1×3)
-    sigma2 = np.array([0.0, 0.0, -0.038]),     # diffusion of X2 (1×3)
-    beta_c0 = 0.0015,   # consumption drift constant
-    beta_c1 = 1.0,      # loading on X1
-    beta_c2 = 0.0,      # loading on X2
-    alpha_c = np.array([0.0078, 0.0, 0.0]),    # consumption diffusion (1×3)
+    δ = 0.002,      # subjective discount rate
+    γ = 10.0,       # risk aversion
+    μ11 = -0.021,   # mean reversion of X1
+    μ12 = 0.0,      # (under P; becomes non-zero under P_hat)
+    μ22 = -0.013,   # mean reversion of X2
+    ι1 = 0.0,       # long-run mean of X1
+    ι2 = 1.0,       # long-run mean of X2 (normalized)
+    σ1 = np.array([0.0, 0.00034, 0.0]),    # diffusion of X1 (1*3)
+    σ2 = np.array([0.0, 0.0, -0.038]),     # diffusion of X2 (1*3)
+    β_c0 = 0.0015,   # consumption drift constant
+    β_c1 = 1.0,      # loading on X1
+    β_c2 = 0.0,      # loading on X2
+    α_c = np.array([0.0078, 0.0, 0.0]),    # consumption diffusion (1*3)
 )
 ```
@@ -757,41 +757,41 @@ def solve_value_function(p):
     Returns v_bar1, v_bar2.
     """
-    delta, gamma = p['delta'], p['gamma']
-    mu11, mu12, mu22 = p['mu11'], p['mu12'], p['mu22']
-    sigma1, sigma2 = p['sigma1'], p['sigma2']
-    beta_c1, beta_c2 = p['beta_c1'], p['beta_c2']
-    mu12_p, alpha_c = p['mu12'], p['alpha_c']
-
-    # Linear equation for v̄₁
-    # δ v̄₁ = β̄c,1 + μ̄₁₁ v̄₁ ⟹ v̄₁ = β̄c,1 / (δ - μ̄₁₁)
-    v1 = beta_c1 / (delta - mu11)
-
-    # Quadratic equation for v̄₂
-    # 0 = (μ̄₂₂ - δ)v̄₂ + β̄c,2 + μ̄₁₂v̄₁ + ½(1-γ)|A + B v̄₂|²
-    # where A = ᾱc + σ̄₁ v̄₁, B = σ̄₂
-    A_vec = alpha_c + sigma1 * v1
-    B_vec = sigma2
-
-    a = 0.5 * (1 - gamma) * np.dot(B_vec, B_vec)
-    b = (mu22 - delta) + (1 - gamma) * np.dot(A_vec, B_vec)
-    c = beta_c2 + mu12 * v1 + 0.5 * (1 - gamma) * np.dot(A_vec, A_vec)
+    δ, γ = p['δ'], p['γ']
+    μ11, μ12, μ22 = p['μ11'], p['μ12'], p['μ22']
+    σ1, σ2 = p['σ1'], p['σ2']
+    β_c1, β_c2 = p['β_c1'], p['β_c2']
+    α_c = p['α_c']
+
+    # Linear equation for v_bar1
+    # δ v_bar1 = β_c1 + μ_bar11 v_bar1  =>  v_bar1 = β_c1 / (δ - μ_bar11)
+    v1 = β_c1 / (δ - μ11)
+
+    # Quadratic equation for v_bar2
+    # 0 = (μ_bar22 - δ)v_bar2 + β_c2 + μ_bar12 v_bar1 + (1/2)(1-γ)|A + B v_bar2|^2
+    # where A = α_c_bar + σ_bar1 v_bar1, B = σ_bar2
+    A_vec = α_c + σ1 * v1
+    B_vec = σ2
+
+    a = 0.5 * (1 - γ) * np.dot(B_vec, B_vec)
+    b = (μ22 - δ) + (1 - γ) * np.dot(A_vec, B_vec)
+    c = β_c2 + μ12 * v1 + 0.5 * (1 - γ) * np.dot(A_vec, A_vec)
 
     disc = b**2 - 4*a*c
     if disc < 0:
         raise ValueError("Value function does not exist for these parameters.")
 
-    # "Minus" solution (generates ergodic dynamics under P̂)
+    # "Minus" solution (generates ergodic dynamics under P_hat)
     v2 = (-b - np.sqrt(disc)) / (2 * a)
 
     return v1, v2, A_vec, B_vec
 
 v1, v2, A_vec, B_vec = solve_value_function(lrr_params)
 
-print(f"Value-function slope on X1: v̄₁ = {v1:.4f}")
-print(f"Value-function slope on X2: v̄₂ = {v2:.4f}")
+print(f"Value-function slope on X1: v_bar1 = {v1:.4f}")
+print(f"Value-function slope on X2: v_bar2 = {v2:.4f}")
 print(f"\nInterpretation:")
-print(f"  Higher X1 (better expected growth) raises continuation value (v̄₁ > 0)")
-print(f"  Higher X2 (more volatility) lowers continuation value (v̄₂ < 0)")
+print(f"  Higher X1 (better expected growth) raises continuation value (v_bar1 > 0)")
+print(f"  Higher X2 (more volatility) lowers continuation value (v_bar2 < 0)")
 ```
 
 ### Perron–Frobenius and recovered dynamics
 
@@ -801,120 +801,120 @@ def solve_pf_lrr(p, v1, v2, A_vec):
     """
     Solve the Perron-Frobenius problem for the long-run risk model.
 
-    Eigenfunction guess: ê(x) = exp(ē₁ x₁ + ē₂ x₂).
+    Eigenfunction guess: e_hat(x) = exp(e_bar1 x1 + e_bar2 x2).
 
-    Returns ē₁, ē₂, η̂, and the SDF diffusion vector ᾱs.
+    Returns e_bar1, e_bar2, η_hat, and the SDF diffusion vector α_s.
    """
-    delta, gamma = p['delta'], p['gamma']
-    mu11, mu12, mu22 = p['mu11'], p['mu12'], p['mu22']
-    iota1, iota2 = p['iota1'], p['iota2']
-    sigma1, sigma2 = p['sigma1'], p['sigma2']
-    alpha_c = p['alpha_c']
-    beta_c0 = p['beta_c0']
-    beta_c1, beta_c2 = p['beta_c1'], p['beta_c2']
-
-    # SDF diffusion: ᾱs = −γ ᾱc + (1−γ)(σ̄₁v̄₁ + σ̄₂v̄₂)
-    alpha_s = (-gamma * alpha_c
-               + (1 - gamma) * (sigma1 * v1 + sigma2 * v2))
-
-    # SDF drift parameters in βs(x) = β̄s0 + β̄s11(x1−ι1) + β̄s12(x2−ι2)
-    beta_s11 = -beta_c1
-    beta_s12 = -beta_c2 - 0.5 * np.dot(alpha_s, alpha_s)
-    beta_s0 = (-delta - beta_c0
-               - 0.5 * iota2 * np.dot(alpha_s, alpha_s))
-
-    # Equation 0 = β̄s11 + μ̄₁₁ ē₁ ⟹ ē₁ = −β̄s11 / μ̄₁₁
-    e1 = -beta_s11 / mu11
-
-    # Quadratic for ē₂
-    # 0 = (β̄s12 + ½|ᾱs|²) + ē₁(μ̄₁₂ + σ̄₁·ᾱs) + ½ē₁²|σ̄₁|²
-    #     + ē₂(μ̄₂₂ + σ̄₂·ᾱs + ē₁ σ̄₁·σ̄₂') + ½ē₂²|σ̄₂|²
-    const_pf = (beta_s12 + 0.5*np.dot(alpha_s, alpha_s)   # = 0 by construction
-                + e1*(mu12 + np.dot(sigma1, alpha_s))
-                + 0.5*e1**2*np.dot(sigma1, sigma1))
-    lin_pf = mu22 + np.dot(sigma2, alpha_s) + e1*np.dot(sigma1, sigma2)
-    quad_pf = 0.5 * np.dot(sigma2, sigma2)
+    δ, γ = p['δ'], p['γ']
+    μ11, μ12, μ22 = p['μ11'], p['μ12'], p['μ22']
+    ι1, ι2 = p['ι1'], p['ι2']
+    σ1, σ2 = p['σ1'], p['σ2']
+    α_c = p['α_c']
+    β_c0 = p['β_c0']
+    β_c1, β_c2 = p['β_c1'], p['β_c2']
+
+    # SDF diffusion: α_s = -γ α_c + (1-γ)(σ_bar1 v_bar1 + σ_bar2 v_bar2)
+    α_s = (-γ * α_c
+           + (1 - γ) * (σ1 * v1 + σ2 * v2))
+
+    # SDF drift parameters in β_s(x) = β_s0 + β_s11(x1-ι1) + β_s12(x2-ι2)
+    β_s11 = -β_c1
+    β_s12 = -β_c2 - 0.5 * np.dot(α_s, α_s)
+    β_s0 = (-δ - β_c0
+            - 0.5 * ι2 * np.dot(α_s, α_s))
+
+    # Equation 0 = β_s11 + μ_bar11 e_bar1  =>  e_bar1 = -β_s11 / μ_bar11
+    e1 = -β_s11 / μ11
+
+    # Quadratic for e_bar2
+    # 0 = (β_s12 + (1/2)|α_s|^2) + e_bar1(μ_bar12 + σ_bar1*α_s) + (1/2)e_bar1^2|σ_bar1|^2
+    #     + e_bar2(μ_bar22 + σ_bar2*α_s + e_bar1 σ_bar1*σ_bar2') + (1/2)e_bar2^2|σ_bar2|^2
+    const_pf = (β_s12 + 0.5*np.dot(α_s, α_s)   # = 0 by construction
+                + e1*(μ12 + np.dot(σ1, α_s))
+                + 0.5*e1**2*np.dot(σ1, σ1))
+    lin_pf = μ22 + np.dot(σ2, α_s) + e1*np.dot(σ1, σ2)
+    quad_pf = 0.5 * np.dot(σ2, σ2)
 
     disc = lin_pf**2 - 4*quad_pf*const_pf
     e2_m = (-lin_pf - np.sqrt(disc)) / (2*quad_pf)
     e2_p = (-lin_pf + np.sqrt(disc)) / (2*quad_pf)
 
-    # η̂ = β̄s0 - β̄s12·ι₂ - ē₂·μ̄₂₂·ι₂   (ι₁ = 0)
-    eta_m = beta_s0 - beta_s12*iota2 - e2_m*mu22*iota2
-    eta_p = beta_s0 - beta_s12*iota2 - e2_p*mu22*iota2
+    # η_hat = β_s0 - β_s12*ι2 - e_bar2*μ_bar22*ι2   (ι1 = 0)
+    η_m = β_s0 - β_s12*ι2 - e2_m*μ22*ι2
+    η_p = β_s0 - β_s12*ι2 - e2_p*μ22*ι2
 
-    # Choose solution with smaller |η̂| (ergodicity requirement)
-    if abs(eta_m) <= abs(eta_p):
-        e2, eta_hat = e2_m, eta_m
+    # Choose solution with smaller |η_hat| (ergodicity requirement)
+    if abs(η_m) <= abs(η_p):
+        e2, η_hat = e2_m, η_m
     else:
-        e2, eta_hat = e2_p, eta_p
+        e2, η_hat = e2_p, η_p
 
-    return e1, e2, eta_hat, alpha_s
+    return e1, e2, η_hat, α_s
 
-e1, e2, eta_hat_lrr, alpha_s = solve_pf_lrr(lrr_params, v1, v2, A_vec)
+e1, e2, η_hat_lrr, α_s = solve_pf_lrr(lrr_params, v1, v2, A_vec)
 
-print(f"PF eigenfunction coefficients: ē₁ = {e1:.4f}, ē₂ = {e2:.4f}")
-print(f"Log eigenvalue: η̂ = {eta_hat_lrr:.6f} "
-      f"(annualized = {eta_hat_lrr*12:.4f})")
+print(f"PF eigenfunction coefficients: e_bar1 = {e1:.4f}, e_bar2 = {e2:.4f}")
+print(f"Log eigenvalue: η_hat = {η_hat_lrr:.6f} "
+      f"(annualized = {η_hat_lrr*12:.4f})")
 print(f"\nInterpretation:")
-print(f"  ē₁ = {e1:.2f}: ê down-weights high-X1 (good growth) states")
-print(f"  ē₂ = {e2:.2f}: ê up-weights high-X2 (high volatility) states")
+print(f"  e_bar1 = {e1:.2f}: e_hat down-weights high-X1 (good growth) states")
+print(f"  e_bar2 = {e2:.2f}: e_hat up-weights high-X2 (high volatility) states")
 ```
 
-### Computing the P̂ dynamics
+### Computing the P_hat dynamics
 
 ```{code-cell} ipython3
-def compute_phat_dynamics(p, e1, e2, alpha_s):
+def compute_phat_dynamics(p, e1, e2, α_s):
     """
-    Compute the drift parameters of X under the recovered measure P̂.
+    Compute the drift parameters of X under the recovered measure P_hat.
 
-    Under P̂, the Brownian motion is
-        dŴ_t = −√X₂_t α̂_h dt + dW_t
-    where α̂_h = ᾱs + σ̄₁ ē₁ + σ̄₂ ē₂.
+    Under P_hat, the Brownian motion is
+        dW_hat_t = -sqrt(X2_t) * α_hat_h dt + dW_t
+    where α_hat_h = α_s + σ_bar1 e_bar1 + σ_bar2 e_bar2.
     """
-    mu11, mu12, mu22 = p['mu11'], p['mu12'], p['mu22']
-    iota1, iota2 = p['iota1'], p['iota2']
-    sigma1, sigma2 = p['sigma1'], p['sigma2']
+    μ11, μ12, μ22 = p['μ11'], p['μ12'], p['μ22']
+    ι1, ι2 = p['ι1'], p['ι2']
+    σ1, σ2 = p['σ1'], p['σ2']
 
     # Martingale drift correction
-    alpha_h = alpha_s + sigma1 * e1 + sigma2 * e2
+    α_h = α_s + σ1 * e1 + σ2 * e2
 
-    # New drift parameters under P̂
-    mu_hat_11 = mu11
-    mu_hat_12 = mu12 + np.dot(sigma1, alpha_h)
-    mu_hat_22 = mu22 + np.dot(sigma2, alpha_h)
+    # New drift parameters under P_hat
+    μ_hat_11 = μ11
+    μ_hat_12 = μ12 + np.dot(σ1, α_h)
+    μ_hat_22 = μ22 + np.dot(σ2, α_h)
 
     # New long-run means
-    iota_hat_2 = (mu22 / mu_hat_22) * iota2
-    iota_hat_1 = (iota1
-                  + (1.0/mu11) * (mu12*iota2 - mu_hat_12*iota_hat_2))
+    ι_hat_2 = (μ22 / μ_hat_22) * ι2
+    ι_hat_1 = (ι1
+               + (1.0/μ11) * (μ12*ι2 - μ_hat_12*ι_hat_2))
 
     return dict(
-        mu_hat_11 = mu_hat_11,
-        mu_hat_12 = mu_hat_12,
-        mu_hat_22 = mu_hat_22,
-        iota_hat_1 = iota_hat_1,
-        iota_hat_2 = iota_hat_2,
-        alpha_h = alpha_h,
-        sigma1 = sigma1,
-        sigma2 = sigma2,
+        μ_hat_11 = μ_hat_11,
+        μ_hat_12 = μ_hat_12,
+        μ_hat_22 = μ_hat_22,
+        ι_hat_1 = ι_hat_1,
+        ι_hat_2 = ι_hat_2,
+        α_h = α_h,
+        σ1 = σ1,
+        σ2 = σ2,
     )
 
-phat_dyn = compute_phat_dynamics(lrr_params, e1, e2, alpha_s)
+phat_dyn = compute_phat_dynamics(lrr_params, e1, e2, α_s)
 
-print("Dynamics of X under P̂ (vs physical P):")
-print(f"  μ̂₁₁ = {phat_dyn['mu_hat_11']:.4f} "
-      f"(same as physical μ̄₁₁ = {lrr_params['mu11']:.4f})")
-print(f"  μ̂₁₂ = {phat_dyn['mu_hat_12']:.6f} "
+print("Dynamics of X under P_hat (vs physical P):")
+print(f"  μ_hat_11 = {phat_dyn['μ_hat_11']:.4f} "
+      f"(same as physical μ_bar_11 = {lrr_params['μ11']:.4f})")
+print(f"  μ_hat_12 = {phat_dyn['μ_hat_12']:.6f} "
       f"(physical = 0 — new coupling created by risk adjustment)")
-print(f"  μ̂₂₂ = {phat_dyn['mu_hat_22']:.5f} "
-      f"(physical = {lrr_params['mu22']:.4f})")
-print(f"  ι̂₁ = {phat_dyn['iota_hat_1']:.5f} "
-      f"(physical ι₁ = {lrr_params['iota1']:.4f} — lower mean growth under P̂)")
-print(f"  ι̂₂ = {phat_dyn['iota_hat_2']:.5f} "
-      f"(physical ι₂ = {lrr_params['iota2']:.4f} — higher mean volatility under P̂)")
+print(f"  μ_hat_22 = {phat_dyn['μ_hat_22']:.5f} "
+      f"(physical = {lrr_params['μ22']:.4f})")
+print(f"  ι_hat_1 = {phat_dyn['ι_hat_1']:.5f} "
+      f"(physical ι1 = {lrr_params['ι1']:.4f} — lower mean growth under P_hat)")
+print(f"  ι_hat_2 = {phat_dyn['ι_hat_2']:.5f} "
+      f"(physical ι2 = {lrr_params['ι2']:.4f} — higher mean volatility under P_hat)")
 ```
 
 ### Simulating and comparing stationary distributions
@@ -926,7 +926,7 @@ def simulate_lrr(dyn, T=600_000, seed=42):
 
     Parameters
     ----------
-    dyn  : dict with mu11, mu12, mu22, iota1, iota2, sigma1, sigma2
+    dyn  : dict with μ11, μ12, μ22, ι1, ι2, σ1, σ2
     T    : number of monthly steps
     seed : random seed
 
@@ -934,25 +934,25 @@
     Returns
    -------
     X1, X2 : ndarray — stationary sample paths (burn-in discarded)
    """
-    rng = np.random.default_rng(seed)
-    mu11 = dyn.get('mu11', dyn.get('mu_hat_11'))
-    mu12 = dyn.get('mu12', dyn.get('mu_hat_12', 0.0))
-    mu22 = dyn.get('mu22', dyn.get('mu_hat_22'))
-    iota1 = dyn.get('iota1', dyn.get('iota_hat_1'))
-    iota2 = dyn.get('iota2', dyn.get('iota_hat_2'))
-    sigma1 = dyn['sigma1']
-    sigma2 = dyn['sigma2']
+    rng = np.random.default_rng(seed)
+    μ11 = dyn.get('μ11', dyn.get('μ_hat_11'))
+    μ12 = dyn.get('μ12', dyn.get('μ_hat_12', 0.0))
+    μ22 = dyn.get('μ22', dyn.get('μ_hat_22'))
+    ι1 = dyn.get('ι1', dyn.get('ι_hat_1'))
+    ι2 = dyn.get('ι2', dyn.get('ι_hat_2'))
+    σ1 = dyn['σ1']
+    σ2 = dyn['σ2']
 
     X1 = np.zeros(T)
-    X2 = np.full(T, iota2)
+    X2 = np.full(T, ι2)
 
     for t in range(1, T):
-        X2t = max(X2[t-1], 1e-9)
-        sq_X2 = np.sqrt(X2t)
-        dW = rng.standard_normal(3)   # monthly Δt = 1
+        X2t = max(X2[t-1], 1e-9)
+        sq_X2 = np.sqrt(X2t)
+        dW = rng.standard_normal(3)   # monthly Δt = 1
 
-        X1[t] = X1[t-1] + (mu11*(X1[t-1]-iota1) + mu12*(X2t-iota2)) + sq_X2*np.dot(sigma1, dW)
-        X2[t] = max(X2[t-1] + mu22*(X2t-iota2) + sq_X2*np.dot(sigma2, dW), 1e-9)
+        X1[t] = X1[t-1] + (μ11*(X1[t-1]-ι1) + μ12*(X2t-ι2)) + sq_X2*np.dot(σ1, dW)
+        X2[t] = max(X2[t-1] + μ22*(X2t-ι2) + sq_X2*np.dot(σ2, dW), 1e-9)
 
     burn = T // 5
     return X1[burn:], X2[burn:]
@@ -961,23 +961,23 @@ def simulate_lrr(dyn, T=600_000, seed=42):
 # Simulation under physical P
 print("Simulating under physical measure P ...")
 X1_P, X2_P = simulate_lrr(
-    dict(mu11=lrr_params['mu11'], mu12=lrr_params['mu12'],
-         mu22=lrr_params['mu22'], iota1=lrr_params['iota1'],
-         iota2=lrr_params['iota2'],
-         sigma1=lrr_params['sigma1'], sigma2=lrr_params['sigma2']),
+    dict(μ11=lrr_params['μ11'], μ12=lrr_params['μ12'],
+         μ22=lrr_params['μ22'], ι1=lrr_params['ι1'],
+         ι2=lrr_params['ι2'],
+         σ1=lrr_params['σ1'], σ2=lrr_params['σ2']),
     T=600_000
 )
 
-# Simulation under recovered measure P̂
-print("Simulating under recovered measure P̂ ...")
+# Simulation under recovered measure P_hat
+print("Simulating under recovered measure P_hat ...")
 X1_Ph, X2_Ph = simulate_lrr(
-    dict(mu11 = phat_dyn['mu_hat_11'],
-         mu12 = phat_dyn['mu_hat_12'],
-         mu22 = phat_dyn['mu_hat_22'],
-         iota1 = phat_dyn['iota_hat_1'],
-         iota2 = phat_dyn['iota_hat_2'],
-         sigma1 = lrr_params['sigma1'],
-         sigma2 = lrr_params['sigma2']),
+    dict(μ_hat_11=phat_dyn['μ_hat_11'],
+         μ_hat_12=phat_dyn['μ_hat_12'],
+         μ_hat_22=phat_dyn['μ_hat_22'],
+         ι_hat_1=phat_dyn['ι_hat_1'],
+         ι_hat_2=phat_dyn['ι_hat_2'],
+         σ1=lrr_params['σ1'],
+         σ2=lrr_params['σ2']),
     T=600_000
 )
 print("Done.")
@@ -988,12 +988,12 @@ print("Done.")
 def kde2d_contour(ax, X1, X2, levels=8, color='k', alpha=1.0, lw=1.5,
                   bandwidth=None):
     """Plot contour lines of a 2D kernel density estimate."""
-    xy = np.vstack([X2, X1])
-    kde = gaussian_kde(xy, bw_method=bandwidth)
-    x2g = np.linspace(X2.min()*0.9, X2.max()*1.1, 120)
-    x1g = np.linspace(X1.min()*0.9, X1.max()*1.1, 120)
+    xy = np.vstack([X2, X1])
+    kde = gaussian_kde(xy, bw_method=bandwidth)
+    x2g = np.linspace(X2.min()*0.9, X2.max()*1.1, 120)
+    x1g = np.linspace(X1.min()*0.9, X1.max()*1.1, 120)
     X2g, X1g = np.meshgrid(x2g, x1g)
-    Z = kde(np.vstack([X2g.ravel(), X1g.ravel()])).reshape(X2g.shape)
+    Z = kde(np.vstack([X2g.ravel(), X1g.ravel()])).reshape(X2g.shape)
     ax.contour(X2g, X1g, Z, levels=levels, colors=color, alpha=alpha, linewidths=lw)
@@ -1001,30 +1001,30 @@ fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(13, 5), sharey=True)
 
 # Left panel: distribution under P
 kde2d_contour(ax1, X1_P, X2_P, color='navy', levels=7)
-ax1.set_xlabel('Conditional volatility $X_2$', fontsize=11)
-ax1.set_ylabel('Mean growth rate $X_1$', fontsize=11)
-ax1.set_title(r'Physical measure $P$', fontsize=12)
+ax1.set_xlabel('conditional volatility $X_2$', fontsize=11)
+ax1.set_ylabel('mean growth rate $X_1$', fontsize=11)
+ax1.set_title(r'physical measure $P$', fontsize=12)
 
-# Right panel: distribution under P̂, plus outermost contour of P̄ (risk-neutral)
+# Right panel: distribution under P_hat, plus outermost contour of P_bar (risk-neutral)
 kde2d_contour(ax2, X1_Ph, X2_Ph, color='navy', levels=7)
-ax2.set_xlabel('Conditional volatility $X_2$', fontsize=11)
-ax2.set_title(r'Long-term risk-neutral $\hat{P}$', fontsize=12)
+ax2.set_xlabel('conditional volatility $X_2$', fontsize=11)
+ax2.set_title(r'long-term risk-neutral $\hat{P}$', fontsize=12)
 
 # Annotate distributional shifts
 for ax in (ax1, ax2):
     ax.axhline(0, color='grey', lw=0.8, ls='--')
-    ax.axvline(lrr_params['iota2'], color='grey', lw=0.8, ls='--')
+    ax.axvline(lrr_params['ι2'], color='grey', lw=0.8, ls='--')
 
-ax1.annotate(f"Mean X₁ ≈ {X1_P.mean():.4f}", xy=(0.05, 0.92),
+ax1.annotate(f"mean X1 ~ {X1_P.mean():.4f}", xy=(0.05, 0.92),
              xycoords='axes fraction', fontsize=9, color='navy')
-ax1.annotate(f"Mean X₂ ≈ {X2_P.mean():.4f}", xy=(0.05, 0.85), +ax1.annotate(f"mean X2 ~ {X2_P.mean():.4f}", xy=(0.05, 0.85), xycoords='axes fraction', fontsize=9, color='navy') -ax2.annotate(f"Mean X₁ ≈ {X1_Ph.mean():.4f}", xy=(0.05, 0.92), +ax2.annotate(f"mean X1 ~ {X1_Ph.mean():.4f}", xy=(0.05, 0.92), xycoords='axes fraction', fontsize=9, color='navy') -ax2.annotate(f"Mean X₂ ≈ {X2_Ph.mean():.4f}", xy=(0.05, 0.85), +ax2.annotate(f"mean X2 ~ {X2_Ph.mean():.4f}", xy=(0.05, 0.85), xycoords='axes fraction', fontsize=9, color='navy') -plt.suptitle('Stationary Distributions of $(X_1, X_2)$ Under $P$ and $\\hat{P}$\n' +plt.suptitle('stationary distributions of $(X_1, X_2)$ under $P$ and $\\hat{P}$\n' '(reproducing Figure 1 of Borovička, Hansen & Scheinkman 2016)', fontsize=12, y=1.02) plt.tight_layout(); plt.show() @@ -1061,52 +1061,52 @@ Two special cases are: - **$\theta = 1$**: $\lambda_1 = \tfrac{1}{2}\mathrm{Var}[\hat{H}_{t+1}/\hat{H}_t]$. ```{code-cell} ipython3 -def phi_theta(r, theta): +def φ_θ(r, θ): """Discrepancy function φ_θ(r) = [(r)^{1+θ} - 1] / [θ(1+θ)].""" - if abs(theta) < 1e-10: # θ → 0: relative entropy r log r + if abs(θ) < 1e-10: # θ -> 0: relative entropy r log r return r * np.log(r) - if abs(theta + 1) < 1e-10: # θ → -1: -log r + if abs(θ + 1) < 1e-10: # θ -> -1: -log r return -np.log(r) - return (r**(1 + theta) - 1) / (theta * (1 + theta)) + return (r**(1 + θ) - 1) / (θ * (1 + θ)) -def martingale_entropy(Q, P, theta=-1): +def martingale_entropy(Q, P, θ=-1): """ - Compute the stationary-average discrepancy E[φ_θ(ĥ)] for the finite-state chain. + Compute the stationary-average discrepancy E[φ_θ(h_hat)] for the finite-state chain. 
""" - _, exp_eta, e_hat, P_hat = perron_frobenius(Q) - H_incr = np.where(P > 0, P_hat / P, 1.0) # ĥ_ij - pi_hat = stationary_dist(P_hat) + _, exp_η, e_hat, P_hat = perron_frobenius(Q) + H_incr = np.where(P > 0, P_hat / P, 1.0) # h_hat_ij + π_hat = stationary_dist(P_hat) - # Stationary-average: Σ_i Σ_j π̂_i ĥ_ij p_ij φ_θ(ĥ_ij) + # Stationary-average: sum_i sum_j π_hat_i h_hat_ij p_ij φ_θ(h_hat_ij) disc = 0.0 for i in range(P.shape[0]): for j in range(P.shape[1]): if P[i, j] > 0: - disc += pi_hat[i] * P[i, j] * phi_theta(H_incr[i, j], theta) + disc += π_hat[i] * P[i, j] * φ_θ(H_incr[i, j], θ) return disc # Compute entropy for different γ values -gammas_ent = np.linspace(1.0, 10.0, 50) # gamma=1 handled by solve_ez_finite -entropies = {'θ=-1 (neg. log)': [], 'θ=0 (rel. entropy)': [], 'θ=1 (variance/2)': []} +γs_ent = np.linspace(1.0, 10.0, 50) # γ=1 handled by solve_ez_finite +entropies = {'θ=-1 (neg. log)': [], 'θ=0 (rel. entropy)': [], 'θ=1 (variance/2)': []} -for gam in gammas_ent: - v_g, _, S_g = solve_ez_finite(P_phys, c_levels, delta, gam, gc_ex) - Q_g = S_g * P_phys - for theta, key in [(-1, 'θ=-1 (neg. log)'), (0, 'θ=0 (rel. entropy)'), - (1, 'θ=1 (variance/2)')]: - entropies[key].append(martingale_entropy(Q_g, P_phys, theta=theta)) +for γ_val in γs_ent: + v_g, _, S_g = solve_ez_finite(P_phys, c_levels, δ, γ_val, gc_ex) + Q_g = S_g * P_phys + for θ, key in [(-1, 'θ=-1 (neg. log)'), (0, 'θ=0 (rel. 
entropy)'), + (1, 'θ=1 (variance/2)')]: + entropies[key].append(martingale_entropy(Q_g, P_phys, θ=θ)) fig, ax = plt.subplots(figsize=(8, 4.5)) colors_ent = ['firebrick', 'darkorange', 'steelblue'] for (label, vals), col in zip(entropies.items(), colors_ent): - ax.plot(gammas_ent, vals, label=label, color=col, lw=2) + ax.plot(γs_ent, vals, label=label, color=col, lw=2) -ax.set_xlabel('Risk aversion γ') +ax.set_xlabel('risk aversion γ') ax.set_ylabel(r'$E[\phi_\theta(\hat{H}_{t+1}/\hat{H}_t)]$') -ax.set_title('Discrepancy Measures for the Martingale Component\n' - '(larger values ↔ larger deviation from Ross recovery)') +ax.set_title('discrepancy measures for the martingale component\n' + '(larger values <-> larger deviation from Ross recovery)') ax.legend(fontsize=9) plt.tight_layout(); plt.show() ``` @@ -1155,18 +1155,18 @@ Q2 = np.array([[0.72, 0.15], P_bar2, q_bonds2 = risk_neutral_probs(Q2) -print("Risk-neutral transition matrix P̄:") +print("Risk-neutral transition matrix P_bar:") print(np.round(P_bar2, 4)) print(f"\nRow sums: {P_bar2.sum(axis=1)}") -print(f"\nOne-period bond prices q̄ᵢ: {q_bonds2}") +print(f"\nOne-period bond prices q_bar_i: {q_bonds2}") print(f"Annualized risk-free rates: {(-np.log(q_bonds2)*12).round(4)}") # Verify SDF independence from j S2 = Q2 / P2 print(f"\nSDF matrix S = Q/P:") print(np.round(S2, 4)) -print("Row 0: all entries should equal q̄₀ =", round(q_bonds2[0], 4)) -print("Row 1: all entries should equal q̄₁ =", round(q_bonds2[1], 4)) +print("Row 0: all entries should equal q_bar0 =", round(q_bonds2[0], 4)) +print("Row 1: all entries should equal q_bar1 =", round(q_bonds2[1], 4)) ``` ```{solution-end} @@ -1194,45 +1194,45 @@ vector $\hat{\boldsymbol{\pi}}$ depends on the risk aversion parameter $\gamma$. 
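The dependence of the distorted stationary distribution on risk aversion can be sketched on a hypothetical two-state chain before turning to the three-state solution. The sketch below is not the lecture's calibration: it assumes an SDF that loads on consumption growth in the destination state, $s(i,j) = \delta g_j^{-\gamma}$, so that the Perron-Frobenius twist genuinely varies with $\gamma$.

```python
import numpy as np

def distorted_stationary(P, g, δ, γ):
    """Stationary distribution of the distorted chain P̂ for the SDF s(i,j) = δ g_j^(-γ)."""
    Q = δ * P * g[None, :] ** (-γ)                 # state-price matrix
    vals, vecs = np.linalg.eig(Q)
    k = np.argmax(vals.real)                       # Perron-Frobenius eigenvalue
    λ, e = vals.real[k], np.abs(vecs[:, k].real)   # dominant eigenvalue and eigenvector
    P_hat = Q * e[None, :] / (λ * e[:, None])      # twisted transition matrix (rows sum to 1)
    w, v = np.linalg.eig(P_hat.T)                  # stationary dist: left eigenvector at 1
    π = np.abs(v[:, np.argmin(np.abs(w - 1))].real)
    return π / π.sum()

P = np.array([[0.8, 0.2],       # hypothetical two-state chain,
              [0.3, 0.7]])      # state 0 = recession
g = np.array([0.97, 1.02])      # gross consumption growth in each destination state
π_lo = distorted_stationary(P, g, 0.98, γ=2.0)
π_hi = distorted_stationary(P, g, 0.98, γ=10.0)
```

Raising $\gamma$ from 2 to 10 pushes the stationary recession weight $\hat{\pi}_0$ up, from roughly 0.67 to roughly 0.85, against a physical stationary weight of 0.6.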
```{code-cell} ipython3 # Exercise 2 solution -gammas_ex2 = [1, 2, 5, 10, 15] -all_pi = [] +γs_ex2 = [1, 2, 5, 10, 15] +all_π = [] -for gam in gammas_ex2: +for γ_val in γs_ex2: Q_g = np.zeros((3, 3)) for i in range(3): for j in range(3): - Q_g[i, j] = delta * (c_levels[j]/c_levels[i])**(-gam) * P_phys[i, j] + Q_g[i, j] = δ * (c_levels[j]/c_levels[i])**(-γ_val) * P_phys[i, j] _, _, _, Ph_g = perron_frobenius(Q_g) - all_pi.append(stationary_dist(Ph_g)) + all_π.append(stationary_dist(Ph_g)) fig, ax = plt.subplots(figsize=(10, 4.5)) -x = np.arange(3) -w = 0.13 -colors_g = plt.cm.Blues(np.linspace(0.3, 0.9, len(gammas_ex2))) +x = np.arange(3) +w = 0.13 +colors_g = plt.cm.Blues(np.linspace(0.3, 0.9, len(γs_ex2))) # Physical distribution -bars = ax.bar(x - 3*w, pi_phys, width=w, color='grey', alpha=0.7, label='Physical P') -for b_, v in zip(bars, pi_phys): +bars = ax.bar(x - 3*w, π_phys, width=w, color='grey', alpha=0.7, label='physical P') +for b_, v in zip(bars, π_phys): ax.text(b_.get_x()+w/2, v+0.005, f'{v:.3f}', ha='center', va='bottom', fontsize=7) -for k, (gam, pi_g, col) in enumerate(zip(gammas_ex2, all_pi, colors_g)): - bars = ax.bar(x + (k-1.5)*w, pi_g, width=w, color=col, - label=f'γ={gam}') - for b_, v in zip(bars, pi_g): +for k, (γ_val, π_g, col) in enumerate(zip(γs_ex2, all_π, colors_g)): + bars = ax.bar(x + (k-1.5)*w, π_g, width=w, color=col, + label=f'γ={γ_val}') + for b_, v in zip(bars, π_g): ax.text(b_.get_x()+w/2, v+0.005, f'{v:.3f}', ha='center', va='bottom', fontsize=7) ax.set_xticks(x); ax.set_xticklabels(state_names) -ax.set_ylabel('Stationary probability') -ax.set_title(r'Stationary distribution of $\hat{P}$ for varying risk aversion $\gamma$') +ax.set_ylabel('stationary probability') +ax.set_title(r'stationary distribution of $\hat{P}$ for varying risk aversion $\gamma$') ax.legend(fontsize=8, loc='upper right') plt.tight_layout(); plt.show() -# Part 3: find γ where recession probability under P̂ exceeds 50% -gammas_fine = np.linspace(1, 30, 200) 
+# Part 3: find γ where recession probability under P_hat exceeds 50% +γs_fine = np.linspace(1, 30, 200) rec_probs = [] -for gam in gammas_fine: - Q_g = np.array([[delta*(c_levels[j]/c_levels[i])**(-gam)*P_phys[i,j] +for γ_val in γs_fine: + Q_g = np.array([[δ*(c_levels[j]/c_levels[i])**(-γ_val)*P_phys[i,j] for j in range(3)] for i in range(3)]) _, _, _, Ph_g = perron_frobenius(Q_g) rec_probs.append(stationary_dist(Ph_g)[0]) @@ -1240,9 +1240,9 @@ for gam in gammas_fine: # Interpolate crossing point idx50 = np.where(np.array(rec_probs) > 0.5)[0] if len(idx50) > 0: - print(f"\nRecession prob under P̂ exceeds 50% at approximately γ ≈ {gammas_fine[idx50[0]]:.1f}") + print(f"\nRecession prob under P_hat exceeds 50% at approximately γ ~ {γs_fine[idx50[0]]:.1f}") else: - print(f"\nRecession prob under P̂ does not exceed 50% for γ ≤ 30") + print(f"\nRecession prob under P_hat does not exceed 50% for γ <= 30") print(f" Maximum recession prob = {max(rec_probs):.4f} at γ = 30") ``` @@ -1269,51 +1269,51 @@ parameters fixed at their calibrated values). 
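The mechanism behind such distortions is exponential tilting of Gaussian shocks: multiplying by the mean-one martingale increment $m = \exp(a\varepsilon - a^2/2)$ turns $N(0,1)$ draws into $N(a,1)$ draws, shifting the conditional mean while leaving the variance unchanged. A minimal Monte Carlo check, with a hypothetical tilt parameter $a$:

```python
import numpy as np

rng = np.random.default_rng(0)
a = -0.5                             # hypothetical tilt parameter
ε = rng.standard_normal(500_000)     # shocks under the physical measure P
m = np.exp(a * ε - 0.5 * a**2)       # Radon-Nikodym increment, E[m] = 1 under P

# Moments of ε under the tilted measure: should be N(a, 1)
tilted_mean = np.mean(m * ε) / np.mean(m)
tilted_var = np.mean(m * ε**2) / np.mean(m) - tilted_mean**2
```

With half a million draws, `tilted_mean` lands close to $a = -0.5$ and `tilted_var` stays close to 1.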
```{code-cell} ipython3 # Exercise 3 solution -gammas_lrr = np.linspace(2.0, 18.0, 40) -iota_hat_1_vals = [] -iota_hat_2_vals = [] -eta_hat_vals = [] +γs_lrr = np.linspace(2.0, 18.0, 40) +ι_hat_1_vals = [] +ι_hat_2_vals = [] +η_hat_vals = [] -p_copy = dict(lrr_params) # copy to modify gamma +p_copy = dict(lrr_params) # copy to modify γ -for gam in gammas_lrr: - p_copy['gamma'] = gam +for γ_val in γs_lrr: + p_copy['γ'] = γ_val try: v1g, v2g, A_g, _ = solve_value_function(p_copy) - e1g, e2g, eta_g, alpha_sg = solve_pf_lrr(p_copy, v1g, v2g, A_g) - dyn_g = compute_phat_dynamics(p_copy, e1g, e2g, alpha_sg) - iota_hat_1_vals.append(dyn_g['iota_hat_1']) - iota_hat_2_vals.append(dyn_g['iota_hat_2']) - eta_hat_vals.append(eta_g) + e1g, e2g, η_g, α_sg = solve_pf_lrr(p_copy, v1g, v2g, A_g) + dyn_g = compute_phat_dynamics(p_copy, e1g, e2g, α_sg) + ι_hat_1_vals.append(dyn_g['ι_hat_1']) + ι_hat_2_vals.append(dyn_g['ι_hat_2']) + η_hat_vals.append(η_g) except Exception: - iota_hat_1_vals.append(np.nan) - iota_hat_2_vals.append(np.nan) - eta_hat_vals.append(np.nan) + ι_hat_1_vals.append(np.nan) + ι_hat_2_vals.append(np.nan) + η_hat_vals.append(np.nan) fig, axes = plt.subplots(1, 3, figsize=(14, 4)) -axes[0].plot(gammas_lrr, iota_hat_1_vals, color='steelblue', lw=2.5) -axes[0].axhline(lrr_params['iota1'], ls='--', color='grey', lw=1.5, - label=f"Physical ι₁ = {lrr_params['iota1']}") -axes[0].set_xlabel('Risk aversion γ'); axes[0].set_ylabel(r'$\hat{\iota}_1$') -axes[0].set_title('Long-run mean of $X_1$ under $\\hat{P}$\n(↓ = lower expected growth)') +axes[0].plot(γs_lrr, ι_hat_1_vals, color='steelblue', lw=2.5) +axes[0].axhline(lrr_params['ι1'], ls='--', color='grey', lw=1.5, + label=f"physical ι1 = {lrr_params['ι1']}") +axes[0].set_xlabel('risk aversion γ'); axes[0].set_ylabel(r'$\hat{\iota}_1$') +axes[0].set_title('long-run mean of $X_1$ under $\\hat{P}$\n(down = lower expected growth)') axes[0].legend(fontsize=9) -axes[1].plot(gammas_lrr, iota_hat_2_vals, color='firebrick', 
lw=2.5) -axes[1].axhline(lrr_params['iota2'], ls='--', color='grey', lw=1.5, - label=f"Physical ι₂ = {lrr_params['iota2']}") -axes[1].set_xlabel('Risk aversion γ'); axes[1].set_ylabel(r'$\hat{\iota}_2$') -axes[1].set_title('Long-run mean of $X_2$ under $\\hat{P}$\n(↑ = higher expected volatility)') +axes[1].plot(γs_lrr, ι_hat_2_vals, color='firebrick', lw=2.5) +axes[1].axhline(lrr_params['ι2'], ls='--', color='grey', lw=1.5, + label=f"physical ι2 = {lrr_params['ι2']}") +axes[1].set_xlabel('risk aversion γ'); axes[1].set_ylabel(r'$\hat{\iota}_2$') +axes[1].set_title('long-run mean of $X_2$ under $\\hat{P}$\n(up = higher expected volatility)') axes[1].legend(fontsize=9) -axes[2].plot(gammas_lrr, np.array(eta_hat_vals)*12, color='purple', lw=2.5) -axes[2].set_xlabel('Risk aversion γ'); axes[2].set_ylabel(r'Annualized $\hat{\eta}$') -axes[2].set_title('Long-run discount rate $\\hat{\\eta}$\n(more negative = higher long-run yield)') +axes[2].plot(γs_lrr, np.array(η_hat_vals)*12, color='purple', lw=2.5) +axes[2].set_xlabel('risk aversion γ'); axes[2].set_ylabel(r'annualized $\hat{\eta}$') +axes[2].set_title('long-run discount rate $\\hat{\\eta}$\n(more negative = higher long-run yield)') plt.tight_layout(); plt.show() -print("Higher γ → more negative ι̂₁ (P̂ expects lower growth than P)") -print("Higher γ → higher ι̂₂ (P̂ expects higher volatility than P)") +print("Higher γ -> more negative ι_hat1 (P_hat expects lower growth than P)") +print("Higher γ -> higher ι_hat2 (P_hat expects higher volatility than P)") ``` ```{solution-end} @@ -1373,29 +1373,29 @@ Hence $\hat{h}_{ij} = \hat{p}_{ij}/p_{ij} = 1$ for all $(i,j)$. 
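The exactness claim can also be checked on a hypothetical two-state chain, with the common growth constant set to zero for brevity: for $s(i,j) = e^{-\delta}(c_j/c_i)^{-\gamma}$, the vector $c^\gamma$ is a Perron eigenvector of $Q$ with eigenvalue $e^{-\delta}$, and the twisted matrix reproduces $P$ exactly, so $\hat{h} \equiv 1$.

```python
import numpy as np

P = np.array([[0.6, 0.4],
              [0.2, 0.8]])           # hypothetical two-state chain
c = np.array([0.95, 1.05])           # detrended consumption levels
δ, γ = 0.03, 4.0

Q = np.exp(-δ) * (c[None, :] / c[:, None]) ** (-γ) * P   # state prices
e = c**γ                             # candidate Perron eigenvector

λ = (Q @ e) / e                      # componentwise ratio: constant iff e is an eigenvector
P_hat = Q * e[None, :] / (np.exp(-δ) * e[:, None])       # twisted transition matrix
```

Here `λ` equals $e^{-\delta}$ in every component and `P_hat` equals `P`, so the martingale increment $\hat{h}_{ij} = \hat{p}_{ij}/p_{ij}$ is identically one.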
# Exercise 4 numerical verification # Use trend-stationary power utility (Section "When Does Recovery Succeed?") gc_ex4 = 0.002 -S_ts = np.zeros((3, 3)) +S_ts = np.zeros((3, 3)) for i in range(3): for j in range(3): - S_ts[i, j] = np.exp(-delta - gamma*gc_ex4) * (c_levels[j]/c_levels[i])**(-gamma) + S_ts[i, j] = np.exp(-δ - γ*gc_ex4) * (c_levels[j]/c_levels[i])**(-γ) Q_ts = S_ts * P_phys # Perron-Frobenius -_, exp_eta_ts, e_hat_ts, P_hat_ts = perron_frobenius(Q_ts) +_, exp_η_ts, e_hat_ts, P_hat_ts = perron_frobenius(Q_ts) -# Check eigenvector is proportional to c^gamma -e_theory = c_levels**gamma +# Check eigenvector is proportional to c^γ +e_theory = c_levels**γ e_theory /= e_theory.sum() -print("Computed eigenvector ê:", np.round(e_hat_ts, 6)) -print("Theoretical cʸ / norm: ", np.round(e_theory, 6)) +print("Computed eigenvector e_hat:", np.round(e_hat_ts, 6)) +print("Theoretical c^γ / norm: ", np.round(e_theory, 6)) print(f"Max discrepancy: {np.abs(e_hat_ts - e_theory).max():.2e}") H_ts = np.where(P_phys > 0, P_hat_ts / P_phys, 0.0) -print(f"\nMartingale increment matrix ĥ:") +print(f"\nMartingale increment matrix h_hat:") print(np.round(H_ts, 6)) -print(f"Max |ĥ_ij - 1|: {np.abs(H_ts[P_phys>0] - 1).max():.2e}") -print("→ Recovery is exact for trend-stationary power utility.") +print(f"Max |h_hat_ij - 1|: {np.abs(H_ts[P_phys>0] - 1).max():.2e}") +print("-> Recovery is exact for trend-stationary power utility.") ``` ```{solution-end} diff --git a/lectures/ross_recovery.md b/lectures/ross_recovery.md index f1034171a..79ed60f3c 100644 --- a/lectures/ross_recovery.md +++ b/lectures/ross_recovery.md @@ -73,8 +73,6 @@ from scipy.stats import norm import matplotlib.cm as cm plt.rcParams['figure.figsize'] = (10, 6) -plt.rcParams['axes.grid'] = True -plt.rcParams['grid.alpha'] = 0.3 ``` ## Model setup @@ -277,35 +275,35 @@ density. We discretize this onto a grid of $m$ states and build the matrix $P$. 
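A quick accuracy check on such a grid: each row sum of $P$ should equal the one-period riskless discount factor, which has a closed form in the lognormal/CRRA case since $E[e^{-\gamma r}] = \exp(-\gamma \cdot \text{drift} + \gamma^2\sigma^2 T/2)$ for $r \sim N(\text{drift}, \sigma^2 T)$. A sketch with the lecture's parameter values:

```python
import numpy as np
from scipy.stats import norm

μ, σ, γ, δ, T = 0.08, 0.20, 3.0, 0.02, 1.0
drift = (μ - 0.5 * σ**2) * T

# Midpoint-rule discretization of the one-period bond price from any state
s = np.linspace(-5 * σ * np.sqrt(T), 5 * σ * np.sqrt(T), 201)
ds = s[1] - s[0]
f = norm.pdf(s, loc=drift, scale=σ * np.sqrt(T))     # natural density of log returns
q_grid = np.sum(np.exp(-δ * T) * np.exp(-γ * s) * f * ds)

# Closed form for the riskless discount factor
q_exact = np.exp(-δ * T - γ * drift + 0.5 * γ**2 * σ**2 * T)
```

With these numbers the two agree to well under a basis point, confirming that a $\pm 5$-standard-deviation grid with a couple hundred points is accurate enough for the pricing integrals.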
```{code-cell} ipython3 -def build_state_price_matrix(mu, sigma, gamma, delta, T=1.0, n_states=11, n_sigma=5): +def build_state_price_matrix(μ, σ, γ, δ, T=1.0, n_states=11, n_σ=5): """ Build an m x m state price transition matrix for the lognormal / CRRA model. Parameters ---------- - mu : float Natural expected log-return (annualised) - sigma : float Volatility (annualised) - gamma : float Coefficient of relative risk aversion - delta : float Subjective discount rate + μ : float Natural expected log-return (annualised) + σ : float Volatility (annualised) + γ : float Coefficient of relative risk aversion + δ : float Subjective discount rate T : float Horizon (years) for one period n_states : int Number of discrete states - n_sigma : int Grid half-width in standard deviations + n_σ : int Grid half-width in standard deviations Returns ------- P : (m, m) array State price matrix states : (m,) array State values (log-return grid) """ - # Equally-spaced grid from -n_sigma*sigma to +n_sigma*sigma - states = np.linspace(-n_sigma * sigma * np.sqrt(T), - n_sigma * sigma * np.sqrt(T), + # Equally-spaced grid from -n_σ*σ to +n_σ*σ + states = np.linspace(-n_σ * σ * np.sqrt(T), + n_σ * σ * np.sqrt(T), n_states) ds = states[1] - states[0] # grid spacing m = n_states P = np.zeros((m, m)) - drift = (mu - 0.5 * sigma**2) * T + drift = (μ - 0.5 * σ**2) * T for i in range(m): s_i = states[i] @@ -313,9 +311,9 @@ def build_state_price_matrix(mu, sigma, gamma, delta, T=1.0, n_states=11, n_sigm s_j = states[j] log_return = s_j - s_i # Normal density evaluated at s_j given s_i - n_val = norm.pdf(log_return, loc=drift, scale=sigma * np.sqrt(T)) + n_val = norm.pdf(log_return, loc=drift, scale=σ * np.sqrt(T)) # Pricing kernel - kernel = np.exp(-delta * T) * np.exp(-gamma * log_return) + kernel = np.exp(-δ * T) * np.exp(-γ * log_return) P[i, j] = kernel * n_val * ds return P, states @@ -323,14 +321,14 @@ def build_state_price_matrix(mu, sigma, gamma, delta, T=1.0, n_states=11, n_sigm 
```{code-cell} ipython3 # Parameters matching the numerical example in Ross (2015), Section IV -mu = 0.08 # 8% annual expected return -sigma = 0.20 # 20% annual volatility -gamma = 3.0 # CRRA coefficient -delta = 0.02 # 2% annual discount rate -T = 1.0 # one-year horizon +μ = 0.08 # 8% annual expected return +σ = 0.20 # 20% annual volatility +γ = 3.0 # CRRA coefficient +δ = 0.02 # 2% annual discount rate +T = 1.0 # one-year horizon -P, states = build_state_price_matrix(mu, sigma, gamma, delta, T, - n_states=11, n_sigma=5) +P, states = build_state_price_matrix(μ, σ, γ, δ, T, + n_states=11, n_σ=5) print("State price matrix P (rows = current state, cols = future state)") print("Row sums (should equal discount factor e^{-r}):") @@ -351,8 +349,8 @@ def recover_natural_distribution(P): ------- F : (m, m) array Recovered natural probability transition matrix z : (m,) array Dominant eigenvector of P (Perron vector) - delta : float Recovered subjective discount rate - phi : (m,) array Recovered kernel values (relative to state 0) + δ : float Recovered subjective discount rate + φ : (m,) array Recovered kernel values (relative to state 0) """ m = P.shape[0] @@ -365,7 +363,7 @@ def recover_natural_distribution(P): real_eigenvectors = eigenvectors[:, real_mask].real idx = np.argmax(real_eigenvalues) - delta_recovered = real_eigenvalues[idx] + δ_recovered = real_eigenvalues[idx] z = real_eigenvectors[:, idx] # Ensure z is positive (Perron vector) @@ -380,24 +378,24 @@ def recover_natural_distribution(P): D_inv = np.diag(z) # Recover natural probability matrix - F = (1.0 / delta_recovered) * D @ P @ D_inv + F = (1.0 / δ_recovered) * D @ P @ D_inv # Clip small numerical negatives and renormalize rows F = np.clip(F, 0, None) F = F / F.sum(axis=1, keepdims=True) # Pricing kernel relative to middle state - phi = 1.0 / z + φ = 1.0 / z - return F, z, delta_recovered, phi + return F, z, δ_recovered, φ ``` ```{code-cell} ipython3 -F, z, delta_rec, phi = recover_natural_distribution(P) 
+F, z, δ_rec, φ = recover_natural_distribution(P) -print(f"Recovered discount rate δ = {delta_rec:.6f} (true: {np.exp(-delta):.6f})") +print(f"Recovered discount rate δ = {δ_rec:.6f} (true: {np.exp(-δ):.6f})") print(f"\nRecovered kernel φ (monotone decreasing in good states):") -print(np.round(phi, 4)) +print(np.round(φ, 4)) print(f"\nNatural probability matrix F (row sums should be 1):") print(np.round(F.sum(axis=1), 6)) ``` @@ -444,7 +442,7 @@ Q_rn = P / row_sums # risk-neutral probabilities # One-period marginals f_nat = F[mid, :] # natural: row of recovered F -f_rn = Q_rn[mid, :] # risk-neutral: row of Q +f_rn = Q_rn[mid, :] # risk-neutral: row of Q # State labels in gross return terms gross_returns = np.exp(states) @@ -452,30 +450,27 @@ gross_returns = np.exp(states) fig, axes = plt.subplots(1, 2, figsize=(14, 5)) # Panel A: densities -axes[0].plot(gross_returns, f_nat, 'b-o', ms=5, label='Natural (recovered)', lw=2) -axes[0].plot(gross_returns, f_rn, 'r--s', ms=5, label='Risk-neutral', lw=2) -axes[0].set_xlabel('Gross return $S_T / S_0$') -axes[0].set_ylabel('Probability') -axes[0].set_title('One-Period Marginal Distributions') +axes[0].plot(gross_returns, f_nat, 'b-o', ms=5, label='natural (recovered)', lw=2) +axes[0].plot(gross_returns, f_rn, 'r--s', ms=5, label='risk-neutral', lw=2) +axes[0].set_xlabel('gross return $S_T / S_0$') +axes[0].set_ylabel('probability') +axes[0].set_title('one-period marginal distributions') axes[0].legend() # Panel B: pricing kernel -axes[1].plot(gross_returns, phi, 'g-^', ms=5, lw=2) -axes[1].set_xlabel('Gross return $S_T / S_0$') -axes[1].set_ylabel('Kernel $\\phi$ (relative)') -axes[1].set_title('Recovered Pricing Kernel') - -plt.tight_layout() -plt.savefig('ross_recovery_distributions.png', dpi=120) +axes[1].plot(gross_returns, φ, 'g-^', ms=5, lw=2) +axes[1].set_xlabel('gross return $S_T / S_0$') +axes[1].set_ylabel('kernel $\\phi$ (relative)') +axes[1].set_title('recovered pricing kernel') plt.show() ``` ```{code-cell} 
ipython3 # Compute summary statistics E_nat = np.sum(f_nat * gross_returns) -E_rn = np.sum(f_rn * gross_returns) +E_rn = np.sum(f_rn * gross_returns) std_nat = np.sqrt(np.sum(f_nat * (gross_returns - E_nat)**2)) -std_rn = np.sqrt(np.sum(f_rn * (gross_returns - E_rn )**2)) +std_rn = np.sqrt(np.sum(f_rn * (gross_returns - E_rn)**2)) risk_free = np.sum(P[mid]) # price of riskless bond from middle state @@ -502,21 +497,19 @@ good states relative to the natural measure. ```{code-cell} ipython3 # CDFs cdf_nat = np.cumsum(f_nat) -cdf_rn = np.cumsum(f_rn) +cdf_rn = np.cumsum(f_rn) fig, ax = plt.subplots(figsize=(9, 5)) -ax.plot(gross_returns, cdf_nat, 'b-o', ms=5, lw=2, label='Natural CDF') -ax.plot(gross_returns, cdf_rn, 'r--s', ms=5, lw=2, label='Risk-neutral CDF') -ax.set_xlabel('Gross return $S_T / S_0$') -ax.set_ylabel('Cumulative probability') -ax.set_title('Stochastic Dominance: Natural CDF lies below Risk-Neutral CDF') +ax.plot(gross_returns, cdf_nat, 'b-o', ms=5, lw=2, label='natural cdf') +ax.plot(gross_returns, cdf_rn, 'r--s', ms=5, lw=2, label='risk-neutral cdf') +ax.set_xlabel('gross return $S_T / S_0$') +ax.set_ylabel('cumulative probability') +ax.set_title('stochastic dominance: natural cdf lies below risk-neutral cdf') ax.legend() -plt.tight_layout() -plt.savefig('ross_recovery_stochdom.png', dpi=120) plt.show() # Verify dominance: natural CDF should be <= risk-neutral CDF at every point -print(f"Natural CDF ≤ Risk-neutral CDF at all states: " +print(f"Natural CDF <= Risk-neutral CDF at all states: " f"{np.all(cdf_nat <= cdf_rn + 1e-10)}") ``` @@ -552,18 +545,18 @@ def compute_risk_premia(P, F, states): m = len(states) gross_returns = np.exp(states) - rf = np.zeros(m) + rf = np.zeros(m) erp = np.zeros(m) for i in range(m): discount = P[i].sum() # riskless discount factor - rf[i] = -np.log(discount) # risk-free rate + rf[i] = -np.log(discount) # risk-free rate # Expected gross return under natural measure # We compute E[S_j/S_i] = sum_j F_ij * exp(s_j - s_i) 
relative_returns = np.exp(states - states[i]) E_R_nat = np.sum(F[i] * relative_returns) - E_R_rn = np.sum((P[i] / discount) * relative_returns) + E_R_rn = np.sum((P[i] / discount) * relative_returns) erp[i] = np.log(E_R_nat) - rf[i] @@ -575,23 +568,21 @@ erp, rf = compute_risk_premia(P, F, states) fig, axes = plt.subplots(1, 2, figsize=(13, 5)) axes[0].plot(np.exp(states), rf * 100, 'b-o', ms=5, lw=2) -axes[0].set_xlabel('Current state $S / S_0$') -axes[0].set_ylabel('Annual risk-free rate (%)') -axes[0].set_title('Risk-Free Rate by State') +axes[0].set_xlabel('current state $S / S_0$') +axes[0].set_ylabel('annual risk-free rate (%)') +axes[0].set_title('risk-free rate by state') axes[1].plot(np.exp(states), erp * 100, 'r-^', ms=5, lw=2) -axes[1].set_xlabel('Current state $S / S_0$') -axes[1].set_ylabel('Equity Risk Premium (%)') -axes[1].set_title('Recovered Equity Risk Premium by State') +axes[1].set_xlabel('current state $S / S_0$') +axes[1].set_ylabel('equity risk premium (%)') +axes[1].set_title('recovered equity risk premium by state') -plt.tight_layout() -plt.savefig('ross_recovery_erp.png', dpi=120) plt.show() mid = len(states) // 2 print(f"At the middle state:") -print(f" Risk-free rate ≈ {rf[mid]*100:.2f}% (true: {delta*100:.2f}%)") -print(f" Equity premium ≈ {erp[mid]*100:.2f}% (true: {(mu-delta)*100:.2f}%)") +print(f" Risk-free rate approx {rf[mid]*100:.2f}% (true: {δ*100:.2f}%)") +print(f" Equity premium approx {erp[mid]*100:.2f}% (true: {(μ-δ)*100:.2f}%)") ``` ## Sensitivity analysis: effect of risk aversion @@ -600,42 +591,40 @@ The shape of the pricing kernel, and hence the gap between natural and risk-neutral probabilities, depends on the coefficient of risk aversion $\gamma$. 
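The direction and size of that gap follow from exponential tilting: with kernel $e^{-\gamma r}$, the risk-neutral density of log returns is the natural $N(\text{drift}, \sigma^2 T)$ density tilted by $e^{-\gamma s}$, which is again normal, $N(\text{drift} - \gamma\sigma^2 T, \sigma^2 T)$. The mean shifts down by $\gamma\sigma^2 T$ while the variance is unchanged, which a grid computation confirms:

```python
import numpy as np
from scipy.stats import norm

μ, σ, T = 0.08, 0.20, 1.0
drift = (μ - 0.5 * σ**2) * T
s = np.linspace(-1.5, 1.5, 3001)
f = norm.pdf(s, loc=drift, scale=σ * np.sqrt(T))   # natural density of log returns

shift, spread = {}, {}
for γ in (1.0, 3.0, 8.0):
    w = np.exp(-γ * s) * f
    q = w / w.sum()                  # risk-neutral probabilities on the grid
    m1 = np.sum(q * s)
    shift[γ] = m1 - drift            # numerical mean shift, theory: -γσ²T
    spread[γ] = np.sum(q * (s - m1) ** 2)   # numerical variance, theory: σ²T
```

Larger $\gamma$ thus pushes the risk-neutral density further left of the natural one without changing its spread, exactly the widening gap seen in the sensitivity plots.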
```{code-cell} ipython3 -gammas = [1.0, 2.0, 3.0, 5.0, 8.0] -colors = cm.viridis(np.linspace(0.1, 0.9, len(gammas))) +γs = [1.0, 2.0, 3.0, 5.0, 8.0] +colors = cm.viridis(np.linspace(0.1, 0.9, len(γs))) fig, axes = plt.subplots(1, 2, figsize=(14, 5)) -for gamma_val, color in zip(gammas, colors): - P_g, states_g = build_state_price_matrix(mu, sigma, gamma_val, delta, T) - F_g, z_g, delta_g, phi_g = recover_natural_distribution(P_g) +for γ_val, color in zip(γs, colors): + P_g, states_g = build_state_price_matrix(μ, σ, γ_val, δ, T) + F_g, z_g, δ_g, φ_g = recover_natural_distribution(P_g) mid_g = len(states_g) // 2 f_nat_g = F_g[mid_g, :] - row_sum = P_g[mid_g].sum() - f_rn_g = P_g[mid_g] / row_sum + row_sum = P_g[mid_g].sum() + f_rn_g = P_g[mid_g] / row_sum gross = np.exp(states_g) erp_val = (np.sum(f_nat_g * np.exp(states_g - states_g[mid_g])) - - np.exp(delta_g)) * 100 + - np.exp(δ_g)) * 100 - axes[0].plot(gross, phi_g, color=color, lw=2, - label=f'$\\gamma={gamma_val:.0f}$') + axes[0].plot(gross, φ_g, color=color, lw=2, + label=f'$\\gamma={γ_val:.0f}$') axes[1].plot(gross, f_nat_g - f_rn_g, color=color, lw=2, - label=f'$\\gamma={gamma_val:.0f}$') + label=f'$\\gamma={γ_val:.0f}$') -axes[0].set_xlabel('Gross return') -axes[0].set_ylabel('Kernel $\\phi$') -axes[0].set_title('Pricing Kernel vs Risk Aversion') +axes[0].set_xlabel('gross return') +axes[0].set_ylabel('kernel $\\phi$') +axes[0].set_title('pricing kernel vs risk aversion') axes[0].legend(fontsize=9) axes[1].axhline(0, color='k', lw=0.8, ls='--') -axes[1].set_xlabel('Gross return') -axes[1].set_ylabel('Natural minus risk-neutral probability') -axes[1].set_title('Natural minus Risk-Neutral Density') +axes[1].set_xlabel('gross return') +axes[1].set_ylabel('natural minus risk-neutral probability') +axes[1].set_title('natural minus risk-neutral density') axes[1].legend(fontsize=9) -plt.tight_layout() -plt.savefig('ross_recovery_sensitivity.png', dpi=120) plt.show() ``` @@ -660,25 +649,23 @@ $$ ```{code-cell} 
ipython3 # Vary the true discount rate and check how well we recover it -true_deltas = np.linspace(0.00, 0.06, 13) -recovered_deltas = [] +true_δs = np.linspace(0.00, 0.06, 13) +recovered_δs = [] -for d in true_deltas: - P_d, _ = build_state_price_matrix(mu, sigma, gamma=3.0, delta=d, T=1.0) +for d in true_δs: + P_d, _ = build_state_price_matrix(μ, σ, γ=3.0, δ=d, T=1.0) _, _, d_rec, _ = recover_natural_distribution(P_d) - recovered_deltas.append(d_rec) + recovered_δs.append(d_rec) plt.figure(figsize=(8, 5)) -plt.plot(true_deltas * 100, true_deltas * 100, 'k--', lw=1.5, label='45° line') -plt.plot(true_deltas * 100, - [-np.log(d_r) * 100 for d_r in recovered_deltas], - 'bo-', ms=6, lw=2, label='Recovered $\\delta$') -plt.xlabel('True discount rate (%)') -plt.ylabel('Recovered discount rate (%)') -plt.title('Accuracy of Recovered Discount Rate') +plt.plot(true_δs * 100, true_δs * 100, 'k--', lw=1.5, label='45 deg line') +plt.plot(true_δs * 100, + [-np.log(d_r) * 100 for d_r in recovered_δs], + 'bo-', ms=6, lw=2, label='recovered $\\delta$') +plt.xlabel('true discount rate (%)') +plt.ylabel('recovered discount rate (%)') +plt.title('accuracy of recovered discount rate') plt.legend() -plt.tight_layout() -plt.savefig('ross_recovery_delta.png', dpi=120) plt.show() ``` @@ -706,35 +693,33 @@ def tail_prob(f_dist, states, threshold): return float(np.sum(f_dist[states <= threshold])) P_base, states_base = build_state_price_matrix( - mu, sigma, gamma=3.0, delta=0.02, T=1.0, - n_states=41, n_sigma=5) -F_base, z_base, delta_base, phi_base = recover_natural_distribution(P_base) + μ, σ, γ=3.0, δ=0.02, T=1.0, + n_states=41, n_σ=5) +F_base, z_base, δ_base, φ_base = recover_natural_distribution(P_base) mid_b = len(states_base) // 2 f_nat_base = F_base[mid_b] -f_rn_base = P_base[mid_b] / P_base[mid_b].sum() +f_rn_base = P_base[mid_b] / P_base[mid_b].sum() prob_nat = [tail_prob(f_nat_base, states_base, t) for t in thresholds] -prob_rn = [tail_prob(f_rn_base, states_base, t) for t in 
thresholds] +prob_rn = [tail_prob(f_rn_base, states_base, t) for t in thresholds] fig, ax = plt.subplots(figsize=(10, 5)) -ax.plot(np.exp(thresholds), prob_nat, 'b-', lw=2, label='Natural (recovered)') -ax.plot(np.exp(thresholds), prob_rn, 'r--', lw=2, label='Risk-neutral') -ax.set_xlabel('Gross return threshold') -ax.set_ylabel('Probability of decline below threshold') -ax.set_title('Tail Probabilities: Natural vs. Risk-Neutral') +ax.plot(np.exp(thresholds), prob_nat, 'b-', lw=2, label='natural (recovered)') +ax.plot(np.exp(thresholds), prob_rn, 'r--', lw=2, label='risk-neutral') +ax.set_xlabel('gross return threshold') +ax.set_ylabel('probability of decline below threshold') +ax.set_title('tail probabilities: natural vs. risk-neutral') ax.axvline(x=0.75, color='gray', ls=':', lw=1.5, label='25% decline') ax.axvline(x=0.70, color='silver', ls=':', lw=1.5, label='30% decline') ax.legend() -plt.tight_layout() -plt.savefig('ross_recovery_tail.png', dpi=120) plt.show() # Print specific tail probabilities for thresh, label in [(-0.25, '25% decline'), (-0.30, '30% decline'), (-0.10, '10% decline')]: p_n = tail_prob(f_nat_base, states_base, thresh) - p_r = tail_prob(f_rn_base, states_base, thresh) + p_r = tail_prob(f_rn_base, states_base, thresh) print(f"P(log-return < {thresh:.0%}): Natural = {p_n:.4f}, " f"Risk-Neutral = {p_r:.4f}, Ratio = {p_r/p_n:.2f}x") ``` @@ -767,22 +752,22 @@ R^2 \leq e^{2rT} \, \mathrm{Var}(\phi). 
$$ ```{code-cell} ipython3 -def kernel_variance(phi, f_nat): +def kernel_variance(φ, f_nat): """Variance of the pricing kernel under the natural measure.""" - E_phi = np.sum(phi * f_nat) - E_phi2 = np.sum(phi**2 * f_nat) - return E_phi2 - E_phi**2, E_phi + E_φ = np.sum(φ * f_nat) + E_φ2 = np.sum(φ**2 * f_nat) + return E_φ2 - E_φ**2, E_φ -var_phi, E_phi = kernel_variance(phi_base, f_nat_base) -std_phi = np.sqrt(var_phi) +var_φ, E_φ = kernel_variance(φ_base, f_nat_base) +std_φ = np.sqrt(var_φ) print(f"Pricing kernel statistics (one year):") -print(f" E[φ] = {E_phi:.4f}") -print(f" Var(φ) = {var_phi:.4f}") -print(f" Std(φ) = {std_phi:.4f}") -print(f"\nHansen-Jagannathan bound on Sharpe ratio: {std_phi:.4f}") -print(f"Upper bound on R² in return forecasting: {var_phi:.4f}") +print(f" E[φ] = {E_φ:.4f}") +print(f" Var(φ) = {var_φ:.4f}") +print(f" Std(φ) = {std_φ:.4f}") +print(f"\nHansen-Jagannathan bound on Sharpe ratio: {std_φ:.4f}") +print(f"Upper bound on R^2 in return forecasting: {var_φ:.4f}") ``` ## Limitations and extensions @@ -866,23 +851,23 @@ P_ex = np.array([ eigenvalues, eigenvectors = eig(P_ex) real_mask = np.isreal(eigenvalues) -real_ev = eigenvalues[real_mask].real +real_ev = eigenvalues[real_mask].real real_evec = eigenvectors[:, real_mask].real -idx = np.argmax(real_ev) -delta_ex = real_ev[idx] -z_ex = real_evec[:, idx] +idx = np.argmax(real_ev) +δ_ex = real_ev[idx] +z_ex = real_evec[:, idx] if z_ex.min() < 0: z_ex = -z_ex z_ex = z_ex / z_ex[1] # normalise to middle state -print(f"(a) Dominant eigenvalue δ = {delta_ex:.6f}") +print(f"(a) Dominant eigenvalue δ = {δ_ex:.6f}") print(f" Eigenvector z = {z_ex}") # (b) Recover F -D_ex = np.diag(1.0 / z_ex) +D_ex = np.diag(1.0 / z_ex) D_inv_ex = np.diag(z_ex) -F_ex = (1.0 / delta_ex) * D_ex @ P_ex @ D_inv_ex +F_ex = (1.0 / δ_ex) * D_ex @ P_ex @ D_inv_ex print(f"\n(b) Recovered natural transition matrix F:") print(np.round(F_ex, 4)) @@ -892,9 +877,9 @@ print(f"\n(c) Row sums of F: {np.round(F_ex.sum(axis=1), 
8)}") print(f" All non-negative: {(F_ex >= -1e-10).all()}") # (d) Pricing kernel -phi_ex = 1.0 / z_ex -print(f"\n(d) Pricing kernel φ = {np.round(phi_ex, 4)}") -print(f" Kernel decreasing state 1→3: {phi_ex[0] > phi_ex[1] > phi_ex[2]}") +φ_ex = 1.0 / z_ex +print(f"\n(d) Pricing kernel φ = {np.round(φ_ex, 4)}") +print(f" Kernel decreasing state 1->3: {φ_ex[0] > φ_ex[1] > φ_ex[2]}") ``` ```{solution-end} @@ -936,20 +921,20 @@ P_ex = np.array([ from scipy.linalg import eig eigenvalues, eigenvectors = eig(P_ex) real_mask = np.isreal(eigenvalues) -real_ev = eigenvalues[real_mask].real +real_ev = eigenvalues[real_mask].real real_evec = eigenvectors[:, real_mask].real -idx = np.argmax(real_ev) -delta_ex = real_ev[idx] -z_ex = real_evec[:, idx] +idx = np.argmax(real_ev) +δ_ex = real_ev[idx] +z_ex = real_evec[:, idx] if z_ex.min() < 0: z_ex = -z_ex z_ex = z_ex / z_ex[1] -D_ex = np.diag(1.0 / z_ex) +D_ex = np.diag(1.0 / z_ex) D_inv_ex = np.diag(z_ex) -F_ex = (1.0 / delta_ex) * D_ex @ P_ex @ D_inv_ex -F_ex = np.clip(F_ex, 0, None) -F_ex /= F_ex.sum(axis=1, keepdims=True) +F_ex = (1.0 / δ_ex) * D_ex @ P_ex @ D_inv_ex +F_ex = np.clip(F_ex, 0, None) +F_ex /= F_ex.sum(axis=1, keepdims=True) # (a) Marginals from state 2 (index 1) start = 1 @@ -962,7 +947,7 @@ print(f" Risk-neutral q = {np.round(q_marg, 4)}") # (b) CDFs cdf_nat = np.cumsum(f_marg) -cdf_rn = np.cumsum(q_marg) +cdf_rn = np.cumsum(q_marg) print("\n(b) CDFs:") for k in range(3): @@ -970,8 +955,8 @@ for k in range(3): # (c) Stochastic dominance dominates = np.all(cdf_nat <= cdf_rn + 1e-10) -print(f"\n(c) Natural CDF ≤ Risk-neutral CDF at all states: {dominates}") -print(" → Natural distribution stochastically dominates risk-neutral distribution ✓") +print(f"\n(c) Natural CDF <= Risk-neutral CDF at all states: {dominates}") +print(" -> Natural distribution stochastically dominates risk-neutral distribution") ``` ```{solution-end} @@ -982,7 +967,7 @@ print(" → Natural distribution stochastically dominates risk-neutral 
distri **Risk aversion and tail risk.** -Write a function `tail_risk_ratio(gamma, threshold, mu, sigma, delta, T)` that: +Write a function `tail_risk_ratio(γ, threshold, μ, σ, δ, T)` that: 1. Constructs the state price matrix $P$ using `build_state_price_matrix` with the given parameters and `n_states=41`. @@ -1006,14 +991,14 @@ import numpy as np import matplotlib.pyplot as plt -def tail_risk_ratio(gamma, threshold, mu=0.08, sigma=0.20, delta=0.02, T=1.0): +def tail_risk_ratio(γ, threshold, μ=0.08, σ=0.20, δ=0.02, T=1.0): """ Compute ratio of risk-neutral to natural tail probability P(log-return < threshold). """ P_g, states_g = build_state_price_matrix( - mu, sigma, gamma, delta, T, n_states=41, n_sigma=5) + μ, σ, γ, δ, T, n_states=41, n_σ=5) - F_g, z_g, delta_g, phi_g = recover_natural_distribution(P_g) + F_g, z_g, δ_g, φ_g = recover_natural_distribution(P_g) mid_g = len(states_g) // 2 @@ -1021,23 +1006,21 @@ def tail_risk_ratio(gamma, threshold, mu=0.08, sigma=0.20, delta=0.02, T=1.0): f_rn_g = P_g[mid_g] / P_g[mid_g].sum() p_nat = float(np.sum(f_nat_g[states_g <= threshold])) - p_rn = float(np.sum(f_rn_g[states_g <= threshold])) + p_rn = float(np.sum(f_rn_g[states_g <= threshold])) if p_nat < 1e-12: return np.nan return p_rn / p_nat -gammas = np.linspace(1.0, 10.0, 20) -ratios = [tail_risk_ratio(g, -0.30) for g in gammas] +γs = np.linspace(1.0, 10.0, 20) +ratios = [tail_risk_ratio(g, -0.30) for g in γs] plt.figure(figsize=(9, 5)) -plt.plot(gammas, ratios, 'b-o', ms=5, lw=2) -plt.xlabel('Risk aversion coefficient $\\gamma$') -plt.ylabel('Risk-neutral / Natural tail probability') -plt.title('Tail Risk Ratio for a 30% Decline vs Risk Aversion') -plt.tight_layout() -plt.savefig('ross_recovery_ex3.png', dpi=120) +plt.plot(γs, ratios, 'b-o', ms=5, lw=2) +plt.xlabel('risk aversion coefficient $\\gamma$') +plt.ylabel('risk-neutral / natural tail probability') +plt.title('tail risk ratio for a 30% decline vs risk aversion') plt.show() # Economic interpretation From 
f18b093db4a7b2961fbd87394c46ffc9aaad75dd Mon Sep 17 00:00:00 2001 From: HumphreyYang Date: Sun, 26 Apr 2026 00:11:27 +0800 Subject: [PATCH 17/26] updates --- lectures/_static/quant-econ.bib | 7 +- lectures/blackwell_kihlstrom.md | 2 +- lectures/cass_fiscal.md | 2 +- lectures/cass_fiscal_2.md | 2 +- lectures/chow_business_cycles.md | 6 +- lectures/hansen_singleton_1982.md | 2 +- lectures/hansen_singleton_1983.md | 4 +- lectures/inventory_q.md | 20 +- lectures/lqcontrol.md | 2 +- lectures/markov_perf.md | 6 +- lectures/misspecified_recovery.md | 769 ++++++++++++--------- lectures/odu.md | 4 +- lectures/ross_recovery.md | 690 +++++++++--------- lectures/rs_inventory_q.md | 14 +- lectures/survival_recursive_preferences.md | 16 +- lectures/theil_1.md | 18 +- lectures/theil_2.md | 22 +- lectures/two_computation.md | 2 +- lectures/var_dmd.md | 15 +- 19 files changed, 853 insertions(+), 750 deletions(-) diff --git a/lectures/_static/quant-econ.bib b/lectures/_static/quant-econ.bib index 62b683e46..f26738561 100644 --- a/lectures/_static/quant-econ.bib +++ b/lectures/_static/quant-econ.bib @@ -127,7 +127,7 @@ @article{BakshiChabiYo2012 number = {1}, pages = {191--208}, year = {2012}, - doi = {10.1016/j.jfineco.2011.10.004} + doi = {10.1016/j.jfineco.2012.01.003} } @article{BackusGregoryZin1989, @@ -138,7 +138,7 @@ @article{BackusGregoryZin1989 number = {3}, pages = {371--399}, year = {1989}, - doi = {10.1016/0304-3932(89)90033-X} + doi = {10.1016/0304-3932(89)90027-5} } @article{Hansen2012, @@ -172,7 +172,8 @@ @article{Borovicka2020 number = {1}, pages = {206--251}, year = {2020}, - publisher = {University of Chicago Press} + publisher = {University of Chicago Press}, + doi = {10.1086/704072} } @article{Sandroni2000Markets, diff --git a/lectures/blackwell_kihlstrom.md b/lectures/blackwell_kihlstrom.md index ad1cfb9d3..65192a9ec 100644 --- a/lectures/blackwell_kihlstrom.md +++ b/lectures/blackwell_kihlstrom.md @@ -962,7 +962,7 @@ The Blackwell order says that, absent costs, 
more information is always better f With costs, the consumer chooses quality investment $\theta$ to maximize *net value*. -If quality investment translates into experiment accuracy with diminishing returns — say, accuracy $\phi(\theta) = 1 - e^{-a\theta}$ for a rate parameter $a$ — then the marginal value of information eventually decreases in $\theta$. +If quality investment translates into experiment accuracy with diminishing returns -- say, accuracy $\phi(\theta) = 1 - e^{-a\theta}$ for a rate parameter $a$ -- then the marginal value of information eventually decreases in $\theta$. With a convex cost $c(\theta) = c \, \theta^2$, the increasing marginal cost eventually overtakes the declining marginal value, producing an interior optimum. diff --git a/lectures/cass_fiscal.md b/lectures/cass_fiscal.md index b7e3646df..618549610 100644 --- a/lectures/cass_fiscal.md +++ b/lectures/cass_fiscal.md @@ -1133,7 +1133,7 @@ and capital stock across time: - The jump in $\tau_c$ depresses $\bar{R}$ below $1$, causing a *sharp drop in consumption*. - After $T = 10$: - The effects of anticipated distortion are over, and the economy gradually adjusts to the lower capital stock. - - Capital must now rise, requiring *austerity* —consumption plummets after $t = T$, indicated by lower levels of consumption. + - Capital must now rise, requiring *austerity* --consumption plummets after $t = T$, indicated by lower levels of consumption. - The interest rate gradually declines, and consumption grows at a diminishing rate along the path to the terminal steady-state. +++ diff --git a/lectures/cass_fiscal_2.md b/lectures/cass_fiscal_2.md index 3f396e500..2ff84230b 100644 --- a/lectures/cass_fiscal_2.md +++ b/lectures/cass_fiscal_2.md @@ -498,7 +498,7 @@ This means that foreign households begin repaying part of their external debt by We now explore the impact of an increase in capital taxation in the domestic economy $10$ periods after its announcement at $t = 1$. 
-Because the change is anticipated, households in both countries adjust immediately—even though the tax does not take effect until period $t = 11$. +Because the change is anticipated, households in both countries adjust immediately--even though the tax does not take effect until period $t = 11$. ```{code-cell} ipython3 shocks_global = { diff --git a/lectures/chow_business_cycles.md b/lectures/chow_business_cycles.md index 6fcc777a5..e7bfb9648 100644 --- a/lectures/chow_business_cycles.md +++ b/lectures/chow_business_cycles.md @@ -351,9 +351,9 @@ The second equation is the discrete Lyapunov equation for $\Gamma_0$. > But in reality the cycles ... are generally not damped. > How can the maintenance of the swings be explained? > ... One way which I believe is particularly fruitful and promising is to study what would become of the solution of a determinate dynamic system if it were exposed to a stream of erratic shocks ... -> Thus, by connecting the two ideas: (1) the continuous solution of a determinate dynamic system and (2) the discontinuous shocks intervening and supplying the energy that may maintain the swings—we get a theoretical setup which seems to furnish a rational interpretation of those movements which we have been accustomed to see in our statistical time data. +> Thus, by connecting the two ideas: (1) the continuous solution of a determinate dynamic system and (2) the discontinuous shocks intervening and supplying the energy that may maintain the swings--we get a theoretical setup which seems to furnish a rational interpretation of those movements which we have been accustomed to see in our statistical time data. > -> — Ragnar Frisch (1933) {cite}`frisch33` +> -- Ragnar Frisch (1933) {cite}`frisch33` Chow's main insight is that oscillations in the deterministic system are *neither necessary nor sufficient* for producing "cycles" in the stochastic system. 
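[Reviewer sketch] Frisch's impulse-and-propagation idea quoted in this hunk can be checked numerically: a damped second-order system dies out on its own, while the same system fed a stream of erratic shocks keeps oscillating. The coefficients below are illustrative, not Chow's estimates.

```python
import numpy as np

# x_t = ρ1 x_{t-1} + ρ2 x_{t-2} + ε_t with complex roots inside the unit circle
ρ1, ρ2, T = 1.2, -0.7, 200   # illustrative coefficients, not Chow's estimates
rng = np.random.default_rng(0)

def simulate(shock_scale):
    x = np.zeros(T)
    x[0] = x[1] = 1.0
    ε = shock_scale * rng.standard_normal(T)
    for t in range(2, T):
        x[t] = ρ1 * x[t-1] + ρ2 * x[t-2] + ε[t]
    return x

deterministic = simulate(0.0)
stochastic = simulate(1.0)

# The deterministic swings are damped away; the shocks maintain them.
print(np.abs(deterministic[-50:]).max())   # essentially zero
print(np.std(stochastic[-100:]))
```

Oscillations in the deterministic solution are thus neither necessary nor sufficient for sustained cycles -- the shock stream does the maintaining.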
@@ -1408,7 +1408,7 @@ plt.show() As $v$ increases, eigenvalues approach the unit circle: oscillations become more persistent in the time domain (left), and the spectral peak becomes sharper in the frequency domain (right). -Complex roots produce a pronounced peak at interior frequencies—the spectral signature of business cycles. +Complex roots produce a pronounced peak at interior frequencies--the spectral signature of business cycles. ```{solution-end} ``` diff --git a/lectures/hansen_singleton_1982.md b/lectures/hansen_singleton_1982.md index 862b7ba03..560cd5c4b 100644 --- a/lectures/hansen_singleton_1982.md +++ b/lectures/hansen_singleton_1982.md @@ -225,7 +225,7 @@ The vector $z_t$ plays the role of **instruments**. The conditional Euler equation $E_t[M_{t+1}R_{t+1}^i - 1] = 0$ says that the pricing error is unpredictable given *everything* in the agent's time-$t$ information set. -That is a very strong restriction — it says the pricing error is orthogonal to every time-$t$ measurable random variable. +That is a very strong restriction -- it says the pricing error is orthogonal to every time-$t$ measurable random variable. We cannot use the entire information set in practice, but we can pick any finite collection of time-$t$ observable variables $z_t$ and the orthogonality must still hold. diff --git a/lectures/hansen_singleton_1983.md b/lectures/hansen_singleton_1983.md index c2df80578..ee70ff25b 100644 --- a/lectures/hansen_singleton_1983.md +++ b/lectures/hansen_singleton_1983.md @@ -36,7 +36,7 @@ kernelspec: > rational expectations econometrics. A rational expectations equilibrium is a > likelihood function. Maximize it. > -> — An Interview with Thomas J. Sargent {cite}`evans2005interview` +> -- An Interview with Thomas J. 
Sargent {cite}`evans2005interview` ## Overview @@ -1869,7 +1869,7 @@ Our estimates reproduce the pattern that {cite:t}`MehraPrescott1985` later calle - *Low estimated risk aversion:* The estimated $\hat\alpha$ values (and thus risk aversion $-\hat\alpha$) from the table above are similar to those in {cite:t}`hansen1983stochastic`, who report $\hat\alpha$ between $-0.32$ and $-1.25$. -- *Tiny return predictability:* The unrestricted-VAR $R_R^2$ values are comparable to the 0.02 to 0.06 range in {cite:t}`hansen1983stochastic` — the predictable component of stock returns is small relative to the unpredictable component. +- *Tiny return predictability:* The unrestricted-VAR $R_R^2$ values are comparable to the 0.02 to 0.06 range in {cite:t}`hansen1983stochastic` -- the predictable component of stock returns is small relative to the unpredictable component. - *Strong rejection for Treasury bills:* The Euler-equation restrictions are decisively rejected for the nominally risk-free Treasury bill return, just as in Table 4 of {cite:t}`hansen1983stochastic`. diff --git a/lectures/inventory_q.md b/lectures/inventory_q.md index e9c9de9ce..9095b5727 100644 --- a/lectures/inventory_q.md +++ b/lectures/inventory_q.md @@ -35,7 +35,7 @@ A firm must decide how much stock to order each period, facing uncertain demand We approach the problem in two ways. First, we solve it exactly using dynamic programming, assuming full knowledge of -the model — the demand distribution, cost parameters, and transition dynamics. +the model -- the demand distribution, cost parameters, and transition dynamics. Second, we show how a manager can learn the optimal policy from experience alone, using [Q-learning](https://en.wikipedia.org/wiki/Q-learning). @@ -475,7 +475,7 @@ All the manager needs to observe at each step is: 4. the discount factor $\beta$, which is determined by the interest rate, and 5. the next inventory level $X_{t+1}$ (which they can read off the warehouse). 
-These are all directly observable quantities — no model knowledge is required. +These are all directly observable quantities -- no model knowledge is required. ### The Q-table and the role of the max @@ -483,7 +483,7 @@ These are all directly observable quantities — no model knowledge is required. It is important to understand how the update rule relates to the manager's actions. -The manager maintains a **Q-table** — a lookup table storing an estimate $q_t(x, +The manager maintains a **Q-table** -- a lookup table storing an estimate $q_t(x, a)$ for every state-action pair $(x, a)$. At each step, the manager is in some state $x$ and must choose a specific action @@ -492,7 +492,7 @@ and next state $X_{t+1}$, and updates *that one entry* $q_t(x, a)$ of the table using the rule above. It is tempting to read the $\max_{a'}$ in the update rule as prescribing the -manager's next action — that is, to interpret the update as saying "move to +manager's next action -- that is, to interpret the update as saying "move to state $X_{t+1}$ and take an action in $\argmax_{a'} q_t(X_{t+1}, a')$." But the $\max$ plays a different role. @@ -512,7 +512,7 @@ The rule governing how the manager chooses actions is called the **behavior poli Because the $\max$ in the update target always points toward $q^*$ regardless of how the manager selects actions, the behavior policy affects only -which $(x, a)$ entries get visited — and hence updated — over time. +which $(x, a)$ entries get visited -- and hence updated -- over time. In the reinforcement learning literature, this property is called **off-policy** learning: the convergence target ($q^*$) does not depend on the behavior policy. @@ -521,8 +521,8 @@ As long as every $(x, a)$ pair is visited infinitely often (so that every entry of the Q-table receives infinitely many updates) and the learning rates satisfy standard conditions (see below), the Q-table converges to $q^*$. 
-The behavior policy affects the *speed* of convergence — visiting important -state-action pairs more frequently leads to faster learning — but not the +The behavior policy affects the *speed* of convergence -- visiting important +state-action pairs more frequently leads to faster learning -- but not the *limit*. In practice, we want the manager to mostly take good actions (to earn reasonable @@ -555,11 +555,11 @@ The stochastic demand shocks naturally drive the manager across different invent A simple but powerful technique for accelerating learning is **optimistic initialization**: instead of starting the Q-table at zero, we initialize every entry to a value above the true optimum. -Because every untried action looks optimistically good, the agent is "disappointed" whenever it tries one — the update pulls that entry down toward reality. This drives the agent to try other actions (which still look optimistically high), producing broad exploration of the state-action space early in training. +Because every untried action looks optimistically good, the agent is "disappointed" whenever it tries one -- the update pulls that entry down toward reality. This drives the agent to try other actions (which still look optimistically high), producing broad exploration of the state-action space early in training. This idea is sometimes called **optimism in the face of uncertainty** and is widely used in both bandit and reinforcement learning settings. -In our problem, the value function $v^*$ ranges from about 13 to 18. We initialize the Q-table at 20 — modestly above the true maximum — to ensure optimistic exploration without being so extreme as to distort learning. +In our problem, the value function $v^*$ ranges from about 13 to 18. We initialize the Q-table at 20 -- modestly above the true maximum -- to ensure optimistic exploration without being so extreme as to distort learning. 
### Implementation @@ -581,7 +581,7 @@ def greedy_policy_from_q(q, K): return σ ``` -The Q-learning loop runs for `n_steps` total steps in a single continuous trajectory — just as a real manager would learn from the ongoing stream of data. +The Q-learning loop runs for `n_steps` total steps in a single continuous trajectory -- just as a real manager would learn from the ongoing stream of data. At specified step counts (given by `snapshot_steps`), we record the current greedy policy. diff --git a/lectures/lqcontrol.md b/lectures/lqcontrol.md index 10f66284f..68edfe99b 100644 --- a/lectures/lqcontrol.md +++ b/lectures/lqcontrol.md @@ -1267,7 +1267,7 @@ The parameters are $r = 0.05, \beta = 1 / (1 + r), \bar c = 1.5, \mu = 2, \sigm Here’s one solution. -We use some fancy plot commands to get a certain style — feel free to +We use some fancy plot commands to get a certain style -- feel free to use simpler ones. The model is an LQ permanent income / life-cycle model with hump-shaped diff --git a/lectures/markov_perf.md b/lectures/markov_perf.md index 49d870890..fd5bcbf70 100644 --- a/lectures/markov_perf.md +++ b/lectures/markov_perf.md @@ -140,7 +140,10 @@ v_i(q_i, q_{-i}) = \max_{\hat q_i} \left\{\pi_i (q_i, q_{-i}, \hat q_i) + \beta v_i(\hat q_i, f_{-i}(q_{-i}, q_i)) \right\} ``` -**Definition** A **Markov perfect equilibrium** of the duopoly model is a pair of value functions $(v_1, v_2)$ and a pair of policy functions $(f_1, f_2)$ such that, for each $i \in \{1, 2\}$ and each possible state, +```{prf:definition} Markov Perfect Equilibrium +:label: def-markov-perfect-equilibrium + +A **Markov perfect equilibrium** of the duopoly model is a pair of value functions $(v_1, v_2)$ and a pair of policy functions $(f_1, f_2)$ such that, for each $i \in \{1, 2\}$ and each possible state, * The value function $v_i$ satisfies Bellman equation {eq}`game4`. * The maximizer on the right side of {eq}`game4` equals $f_i(q_i, q_{-i})$. 
@@ -150,6 +153,7 @@ The adjective "Markov" denotes that the equilibrium decision rules depend only o "Perfect" means complete, in the sense that the equilibrium is constructed by backward induction and hence builds in optimizing behavior for each firm at all possible future states. * These include many states that will not be reached when we iterate forward on the pair of equilibrium strategies $f_i$ starting from a given initial state. +``` ### Computation diff --git a/lectures/misspecified_recovery.md b/lectures/misspecified_recovery.md index cf274ca5f..d71a9bd67 100644 --- a/lectures/misspecified_recovery.md +++ b/lectures/misspecified_recovery.md @@ -28,23 +28,22 @@ kernelspec: ## Overview -Asset prices are forward-looking: they encode investors' expectations about future economic -states and their valuations of different risks. +Asset prices are forward-looking: they encode investors' expectations about future +economic states and their valuations of different risks. -A long-standing question in finance is -whether one can *recover* the probability distribution used by investors — their subjective -beliefs — from observed asset prices alone. +A long-standing question in finance is whether one can *recover* the probability +distribution used by investors -- their subjective beliefs -- from observed asset prices +alone. -{cite}`BorovickaHansenScheinkman2016` study the challenge of separating investors' -beliefs from their risk preferences using **Perron–Frobenius theory**. +{cite:t}`BorovickaHansenScheinkman2016` study the challenge of separating investors' +beliefs from their risk preferences using **Perron–Frobenius theory**. -Their key finding -is that Perron–Frobenius theory applied to Arrow prices recovers a **long-term risk-neutral -measure** that absorbs all long-horizon risk adjustments. +Their key finding is that Perron–Frobenius theory applied to Arrow prices recovers a +**long-term risk-neutral measure** that absorbs all long-horizon risk adjustments. 
-This recovered measure coincides -with investors' subjective beliefs only under a stringent — and often empirically -implausible — restriction on the stochastic discount factor. +This recovered measure coincides with investors' subjective beliefs only under a +stringent -- and often empirically implausible -- restriction on the stochastic discount +factor. After completing this lecture you will be able to: @@ -89,17 +88,16 @@ plt.rcParams.update({ ### Arrow prices and stochastic discount factors -Consider a discrete-time economy with an $n$-state Markov chain $\{X_t\}$ governed -by transition matrix $\mathbf{P} = [p_{ij}]$. +Consider a discrete-time economy with an $n$-state Markov chain $\{X_t\}$ governed by +transition matrix $\mathbf{P} = [p_{ij}]$. -An **Arrow price** $q_{ij}$ is the -date-$t$ price of a claim that pays $\$1$ tomorrow in state $j$ given that the current -state is $i$. +An **Arrow price** $q_{ij}$ is the date-$t$ price of a claim that pays $\$1$ tomorrow in +state $j$ given that the current state is $i$. Collect these prices in a matrix $\mathbf{Q} = [q_{ij}]$. -A **stochastic discount factor** (SDF) $s_{ij}$ prices risk by discounting the payoff -in state $j$ tomorrow when today's state is $i$. +A **stochastic discount factor** (SDF) $s_{ij}$ prices risk by discounting the payoff in +state $j$ tomorrow when today's state is $i$. Arrow prices and the SDF are linked by @@ -107,53 +105,46 @@ $$ q_{ij} = s_{ij} \, p_{ij}. $$ -Given $\mathbf{Q}$, any pair $(\mathbf{S}, \mathbf{P})$ satisfying $q_{ij} = s_{ij} p_{ij}$ -for all $(i,j)$ is consistent with the observed prices. +Given $\mathbf{Q}$, any pair $(\mathbf{S}, \mathbf{P})$ satisfying +$q_{ij} = s_{ij} p_{ij}$ for all $(i,j)$ is consistent with the observed prices. -The fundamental identification -problem is that $\mathbf{Q}$ has $n^2$ entries, $\mathbf{P}$ has $n(n-1)$ free entries -(rows sum to one), and $\mathbf{S}$ has $n^2$ free entries — so there are far more -unknowns than equations. 
+The fundamental identification problem is that $\mathbf{Q}$ has $n^2$ entries, +$\mathbf{P}$ has $n(n-1)$ free entries (rows sum to one), and $\mathbf{S}$ has $n^2$ +free entries -- so there are far more unknowns than equations. To make progress, we can impose restrictions on the SDF. -Two classical restrictions are -studied in the sections that follow. +Two classical restrictions are studied in the sections that follow. ### A three-state illustration -To build intuition, we work with a three-state Markov chain representing -**recession**, **normal**, and **expansion** phases of the business cycle. +To build intuition, we work with a three-state Markov chain representing **recession**, +**normal**, and **expansion** phases of the business cycle. The physical transition matrix and consumption levels are: ```{code-cell} ipython3 -# Physical transition matrix (recession, normal, expansion) P_phys = np.array([ [0.70, 0.25, 0.05], # from recession [0.15, 0.65, 0.20], # from normal [0.05, 0.30, 0.65], # from expansion ]) -# Consumption levels in each state (arbitrary units) c_levels = np.array([0.85, 1.00, 1.15]) state_names = ['recession', 'normal', 'expansion'] -# Preference parameters -δ = 0.99 # monthly discount factor +δ = -np.log(0.99) # monthly subjective discount rate, so exp(-δ) = 0.99 γ = 5.0 # coefficient of relative risk aversion -# Arrow price matrix under power utility with rational expectations: -# q_ij = δ * (c_j / c_i)^{-γ} * p_ij n = len(c_levels) Q_mat = np.zeros((n, n)) for i in range(n): for j in range(n): - Q_mat[i, j] = δ * (c_levels[j] / c_levels[i])**(-γ) * P_phys[i, j] + Q_mat[i, j] = np.exp(-δ) * (c_levels[j] / c_levels[i])**(-γ) * P_phys[i, j] -print("Arrow price matrix Q:") +print("Arrow price matrix Q") print(np.round(Q_mat, 5)) -print(f"\nSum of each row (= price of risk-free bond): {Q_mat.sum(axis=1).round(5)}") +print("Risk-free discount factors:", Q_mat.sum(axis=1).round(5)) ``` ## Risk-neutral probabilities @@ -164,10 +155,11 @@ $$ 
\bar{s}_{i,j} = \bar{q}_i $$ -where $\bar{q}_i = \sum_j q_{ij}$ is the price of a one-period discount bond in state $i$. +where $\bar{q}_i = \sum_j q_{ij}$ is the price of a one-period discount bond in state +$i$. Under this restriction all future states are discounted equally from state $i$, so risk -adjustments depend only on the current state. +adjustments depend only on the current state. The resulting risk-neutral probabilities are @@ -177,31 +169,31 @@ $$ ```{code-cell} ipython3 def risk_neutral_probs(Q): - """Compute risk-neutral transition matrix from Arrow price matrix.""" - q_bonds = Q.sum(axis=1) # one-period bond prices + """Normalize Arrow prices by one-period bond prices.""" + q_bonds = Q.sum(axis=1) P_bar = Q / q_bonds[:, np.newaxis] return P_bar, q_bonds P_bar, q_bonds = risk_neutral_probs(Q_mat) -print("One-period bond prices (risk-free discount factors):") +print("One-period bond prices:") for i, (s, qb) in enumerate(zip(state_names, q_bonds)): print(f" {s:12s}: {qb:.5f} (annualized yield ~ {-np.log(qb)*12:.2%})") -print("\nRisk-neutral transition matrix P_bar:") +print("\nRisk-neutral P_bar:") print(np.round(P_bar, 4)) -print(f"\nRow sums: {P_bar.sum(axis=1)}") +print("Row sums:", P_bar.sum(axis=1)) ``` ```{note} Risk-neutral probabilities absorb **one-period** (short-run) risk adjustments. -They are -widely used in financial engineering but are generally *not* equal to investors' beliefs. +They are widely used in financial engineering but are generally *not* equal to +investors' beliefs. -When short-term interest rates vary across states, risk-neutral probabilities are -also horizon-dependent: the $t$-period forward measure differs from $\bar{\mathbf{P}}^t$. +When short-term interest rates vary across states, risk-neutral probabilities are also +horizon-dependent: the $t$-period forward measure differs from $\bar{\mathbf{P}}^t$. 
``` ## Long-term risk-neutral probabilities: Perron–Frobenius theory @@ -209,6 +201,7 @@ also horizon-dependent: the $t$-period forward measure differs from $\bar{\mathb ### The eigenvalue problem The long-term behavior of discount factors is governed by a different restriction. + **Long-term risk pricing** sets $$ @@ -216,6 +209,7 @@ $$ $$ for a scalar $\hat{\eta}$ and a vector of positive numbers $\{\hat{e}_i\}$. + Substituting into $q_{ij} = s_{ij} p_{ij}$ gives: $$ @@ -231,9 +225,37 @@ $$ This is an **eigenvalue–eigenvector problem** for the Arrow price matrix $\mathbf{Q}$. -By the **Perron–Frobenius theorem**, if $\mathbf{Q}$ has strictly positive entries, the -dominant eigenvalue is unique, real, and positive, and its eigenvector has strictly -positive entries. +The next theorem is the mathematical reason this construction is well defined. + +It is not yet a theorem about recovering investors' true beliefs. + +Instead, it proves that a positive pricing operator has one distinguished positive +eigenvalue-eigenvector pair. + +The proof idea, stated informally, is that a positive matrix maps the positive cone back +into itself. + +Repeatedly applying the matrix and renormalizing pushes all positive vectors toward the +same ray; the expansion rate along that ray is the Perron root. + +In this lecture we use that ray to define the state-dependent component +$\hat{\mathbf e}$ and use the expansion rate to define the long-run discount rate +$\hat{\eta}$. + +```{prf:theorem} Perron--Frobenius +:label: thm-pf-mis + +If $A$ is a matrix with strictly positive entries, then + +1. $A$ has a unique largest positive real eigenvalue $r$ (the Perron root). +2. There exists a strictly positive eigenvector $e \gg 0$ with $Ae = re$, unique up to scaling. +``` + +By {prf:ref}`thm-pf-mis`, the eigenvalue problem for $\mathbf{Q}$ has a unique solution. 
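[Reviewer sketch] The informal argument in this hunk -- apply the matrix repeatedly and renormalize -- can be demonstrated by power iteration on a small strictly positive matrix (an illustrative example, not the lecture's Arrow price matrix):

```python
import numpy as np

A = np.array([[0.7, 0.3],
              [0.2, 0.8]])        # any strictly positive matrix works

v = np.ones(2)
for _ in range(200):
    w = A @ v
    r = w.sum() / v.sum()         # expansion rate along the current direction
    v = w / w.sum()               # renormalize back onto the simplex

# r converges to the Perron root, v to the positive eigenvector.
print(r, v)
print(np.max(np.linalg.eigvals(A).real))
```

Here the rows of `A` sum to one, so the Perron root is exactly 1 and the iteration settles on the eigenvector proportional to `[0.5, 0.5]`.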
+ +What has been proved at this stage is uniqueness of the long-term risk-neutral +construction, not equality between $\hat{\mathbf P}$ and the physical transition matrix +$\mathbf P$. This gives a unique construction: @@ -241,7 +263,7 @@ This gives a unique construction: dominant eigenvalue–eigenvector pair. 2. Set $\hat{p}_{ij} = \exp(-\hat{\eta}) \, q_{ij} \, \hat{e}_j / \hat{e}_i$. -{cite}`BorovickaHansenScheinkman2016` call the resulting $\hat{\mathbf{P}}$ the +{cite:t}`BorovickaHansenScheinkman2016` call the resulting $\hat{\mathbf{P}}$ the **long-term risk-neutral measure** because, under $\hat{\mathbf{P}}$, the long-horizon risk premia on stochastically growing cash flows are identically zero. @@ -249,37 +271,27 @@ risk premia on stochastically growing cash flows are identically zero. ```{code-cell} ipython3 def perron_frobenius(Q): - """ - Compute the Perron-Frobenius decomposition of an Arrow price matrix. - - Parameters - ---------- - Q : ndarray, shape (n, n) - Arrow price matrix. - - Returns - ------- - η_hat : float — log of the dominant eigenvalue - exp_η : float — dominant eigenvalue exp(η_hat) - e_hat : ndarray — dominant eigenvector (positive, normalized to sum=1) - P_hat : ndarray — long-term risk-neutral transition matrix - """ + """Return the Perron root, eigenvector, and long-term risk-neutral matrix.""" eigenvalues, eigenvectors = linalg.eig(Q) - # Dominant eigenvalue: largest real part (real & positive by Perron–Frobenius) - idx = np.argmax(eigenvalues.real) - exp_η = eigenvalues[idx].real - e_hat = eigenvectors[:, idx].real + # Use the positive Perron eigenpair and discard numerical complex roots. 
+ real_mask = np.isreal(eigenvalues) + real_eigenvalues = eigenvalues[real_mask].real + real_eigenvectors = eigenvectors[:, real_mask].real + + idx = np.argmax(real_eigenvalues) + exp_η = real_eigenvalues[idx] + e_hat = real_eigenvectors[:, idx] - # Ensure positive entries (PF guarantees existence; numpy may flip sign) - if e_hat.mean() < 0: + if e_hat.sum() < 0: e_hat = -e_hat - e_hat = np.abs(e_hat) / np.abs(e_hat).sum() # normalize to sum = 1 + if np.any(e_hat <= 0): + raise ValueError("Dominant eigenvector is not strictly positive.") + e_hat = e_hat / e_hat.sum() η_hat = np.log(exp_η) - # Long-term risk-neutral transition matrix - # P_hat[i,j] = exp(-η_hat) * Q[i,j] * e_hat[j] / e_hat[i] + # Change measure using the Perron eigenfunction. P_hat = (1.0 / exp_η) * Q * e_hat[np.newaxis, :] / e_hat[:, np.newaxis] return η_hat, exp_η, e_hat, P_hat @@ -287,13 +299,12 @@ def perron_frobenius(Q): η_hat, exp_η, e_hat, P_hat = perron_frobenius(Q_mat) -print(f"Dominant eigenvalue exp(η_hat) = {exp_η:.6f}") -print(f"Log eigenvalue η_hat = {η_hat:.5f} " - f"(annualized ~ {η_hat*12:.4f})") -print(f"\nEigenvector e_hat = {e_hat.round(5)}") -print(f"\nLong-term risk-neutral P_hat:") +print(f"exp(η_hat) = {exp_η:.6f}") +print(f"η_hat = {η_hat:.5f} (annualized ~ {η_hat*12:.4f})") +print(f"e_hat = {e_hat.round(5)}") +print("\nLong-term risk-neutral P_hat:") print(np.round(P_hat, 4)) -print(f"\nRow sums: {P_hat.sum(axis=1)}") +print("Row sums:", P_hat.sum(axis=1)) ``` ### Comparing the three probability measures @@ -329,9 +340,8 @@ plt.show() ``` ```{code-cell} ipython3 -# Stationary distributions under each measure def stationary_dist(P): - """Compute stationary distribution of an ergodic transition matrix P.""" + """Stationary distribution of an ergodic transition matrix.""" n = P.shape[0] A = (P.T - np.eye(n)) A[-1] = 1.0 @@ -361,26 +371,37 @@ ax.set_title('stationary distributions under three probability measures') ax.legend(fontsize=9) plt.tight_layout(); plt.show() 
-print("Stationary distributions:") +print("Stationary distributions") for lbl, π in zip(labels, [π_phys, π_bar, π_hat]): print(f" {lbl:45s}: {np.round(π,4)}") ``` -The long-term risk-neutral measure $\hat{\mathbf{P}}$ assigns **higher weight to bad -states** (recession) and **lower weight to good states** (expansion) than the physical -measure $\mathbf{P}$. +In this first trend-stationary power-utility example, the long-term risk-neutral measure +$\hat{\mathbf{P}}$ coincides with the physical measure $\mathbf{P}$. -This is the risk adjustment for long-run growth uncertainty: a -risk-averse investor's long-run discount rates embed a premium for permanent income risk. +This is the special success case in {cite}`BorovickaHansenScheinkman2016`: the SDF has +only the Perron--Frobenius trend component and no martingale component. + +The one-period risk-neutral measure $\bar{\mathbf P}$, by contrast, still absorbs +short-run risk adjustments and therefore differs from $\mathbf P$. ## The martingale decomposition ### Decomposing the SDF process +The decomposition in this section answers a diagnostic question: after we remove the +long-run discount rate and the state-dependent Perron--Frobenius trend from the SDF, is +anything left? + +If the answer is yes, the leftover term is a martingale that changes probabilities +between $\mathbf P$ and $\hat{\mathbf P}$. + +The proof is obtained by writing the one-period pricing identity in Perron--Frobenius +form and multiplying those one-period identities over time. + Let $\hat{\mathbf{e}}$ and $\hat{\eta}$ solve the Perron–Frobenius problem. -Define the -process +Define the process $$ \frac{\hat{H}_{t+1}}{\hat{H}_t} = (X_t)' \hat{\mathbf{H}} X_{t+1}, @@ -388,11 +409,10 @@ $$ \hat{h}_{ij} = \frac{\hat{p}_{ij}}{p_{ij}}. $$ -Because $\sum_j \hat{h}_{ij} p_{ij} = \sum_j \hat{p}_{ij} = 1$, the process $\hat{H}$ -is a martingale under the physical measure $\mathbf{P}$. 
+Because $\sum_j \hat{h}_{ij} p_{ij} = \sum_j \hat{p}_{ij} = 1$, the process $\hat{H}$ is +a martingale under the physical measure $\mathbf{P}$. -The accumulated SDF then admits -the **multiplicative decomposition**: +The accumulated SDF then admits the **multiplicative decomposition**: $$ S_t = \exp(\hat{\eta} t) \left(\frac{\hat{e}(X_0)}{\hat{e}(X_t)}\right) @@ -408,61 +428,96 @@ The three components are: | $\hat{H}_t/\hat{H}_0$ | Martingale; encodes long-run risk adjustments | ```{code-cell} ipython3 -# SDF matrix: s_ij = q_ij / p_ij +# Physical SDF implied by Arrow prices and physical probabilities. S_mat = np.where(P_phys > 0, Q_mat / P_phys, 0.0) -# Trend SDF: s_hat_ij = exp(η_hat) * e_hat_i / e_hat_j +# Perron-Frobenius trend component of the SDF. S_hat = exp_η * e_hat[:, np.newaxis] / e_hat[np.newaxis, :] -# Martingale increment: h_hat_ij = P_hat_ij / P_ij (also = S_ij / S_hat_ij) +# Martingale likelihood-ratio increment between P_hat and P. H_incr = np.where(P_phys > 0, P_hat / P_phys, 0.0) print("SDF matrix S = Q/P:") print(np.round(S_mat, 4)) print("\nTrend SDF S_hat = exp(η_hat) * e_hat_i / e_hat_j:") print(np.round(S_hat, 4)) -print("\nMartingale increment h_hat = S_hat * H_tilde_incr (= P_hat/P):") +print("\nMartingale increment h_hat = P_hat/P:") print(np.round(H_incr, 4)) -# Verify martingale property: E[h_hat_{ij} | X_t=i] = sum_j h_hat_ij * p_ij = 1 mart_check = (H_incr * P_phys).sum(axis=1) -print(f"\nMartingale property check — E[h_hat | X_t=i] = {mart_check}") +print(f"\nE[h_hat | X_t=i] = {mart_check}") ``` -Higher risk aversion amplifies the pessimistic distortion: as $\gamma$ increases, the -recovered measure assigns growing probability to the recession state. +Here $\hat h_{ij}=1$ for every transition, so there is no recovery distortion. - -(Gigures illustrating this will appear below, after we define the Epstein–Zin utility -function that is needed to compute them.) 
+The pessimistic distortion appears below once recursive utility introduces a nontrivial +continuation-value martingale. ## When does recovery succeed? ### The Ross recovery condition -{cite}`Ross2015` proposes to identify investors' subjective beliefs by imposing +{cite:t}`Ross2015` proposes to identify investors' subjective beliefs by imposing $$ \widetilde{S}_t = \exp(-\delta t) \frac{m(X_t)}{m(X_0)} $$ for some positive function $m$ and discount rate $\delta$ (Condition 4 in -{cite}`BorovickaHansenScheinkman2016`). +{cite}`BorovickaHansenScheinkman2016`). + +Under this restriction, the SDF has **no martingale component**: $\hat{H}_t \equiv 1$. + +The proposition below states the exact object being tested. -Under this restriction, the SDF has **no -martingale component**: $\hat{H}_t \equiv 1$. +It asks whether the Perron--Frobenius transition matrix $\hat{\mathbf P}$ is the same as +the physical transition matrix $\mathbf P$. -Equivalently, recovery succeeds if and only if the physical stochastic discount factor -takes the "long-term risk pricing" form +The proof is just an accounting exercise: divide the recovered probabilities by the +physical probabilities and see whether the resulting likelihood-ratio increment is +identically one. + +```{prf:proposition} Ross Recovery Condition +:label: prop-ross-recovery-condition + +({cite}`BorovickaHansenScheinkman2016`) Recovery succeeds -- i.e., +$\hat{\mathbf{P}} = \mathbf{P}$ -- if and only if the physical stochastic discount +factor takes the long-term risk pricing form $$ s_{ij} = \exp(\hat{\eta}) \frac{\hat{e}_i}{\hat{e}_j} $$ -with $\hat{h}_{ij} \equiv 1$. +with $\hat{h}_{ij} \equiv 1$, so that the SDF has no martingale component. +``` -In this case $\hat{\mathbf{P}} = \mathbf{P}$ and the -Perron–Frobenius procedure recovers the true probabilities. 
+```{prf:proof} +Using $q_{ij}=s_{ij}p_{ij}$ and the Perron--Frobenius construction, + +$$ +\hat{p}_{ij} += \exp(-\hat{\eta})q_{ij}\frac{\hat{e}_j}{\hat{e}_i} += \exp(-\hat{\eta})s_{ij}p_{ij}\frac{\hat{e}_j}{\hat{e}_i}. +$$ + +Hence the likelihood-ratio increment between the recovered and physical measures is + +$$ +\hat{h}_{ij} += \frac{\hat{p}_{ij}}{p_{ij}} += \exp(-\hat{\eta})s_{ij}\frac{\hat{e}_j}{\hat{e}_i}. +$$ + +Thus $\hat{\mathbf P}=\mathbf P$ if and only if $\hat{h}_{ij}=1$ for every feasible +transition $(i,j)$, which is equivalent to + +$$ +s_{ij} = \exp(\hat{\eta})\frac{\hat{e}_i}{\hat{e}_j}. +$$ + +This is precisely the case in which the martingale term in the multiplicative +decomposition is degenerate. +``` The critical question is: when is the martingale component degenerate? @@ -471,26 +526,75 @@ The critical question is: when is the martingale component degenerate? Consider a power-utility investor with risk aversion $\gamma$ and *trend-stationary* consumption $C_t = \exp(g_c t)(c \cdot X_t)$ where $c$ is a positive vector. -The -one-period SDF is +The one-period SDF is $$ s_{ij} = \exp(-\delta - \gamma g_c) \left(\frac{c_j}{c_i}\right)^{-\gamma}. $$ -This has the exact long-term risk pricing form with $\hat{e}_j = c_j^\gamma$ and +The corollary shows one important case where the recovery condition is satisfied. + +What is being proved is that trend-stationary consumption risk can be absorbed entirely +into the state-dependent ratio $\hat e_i/\hat e_j$. + +The proof works by guessing the Perron--Frobenius eigenvector from marginal utility, +then checking that the recovered transition probabilities reduce to the original +physical probabilities. + +```{prf:corollary} Recovery under Power Utility +:label: cor-recovery-power-utility + +For a power-utility investor with trend-stationary consumption, the SDF takes the exact +long-term risk pricing form with $\hat{e}_j = c_j^\gamma$ and $\hat{\eta} = -(\delta + \gamma g_c)$. 
-Therefore $\hat{h}_{ij} \equiv 1$ and **Ross -recovery succeeds exactly** when consumption fluctuations around a deterministic trend -are the only source of risk. +Therefore $\hat{h}_{ij} \equiv 1$ and Ross recovery succeeds exactly when consumption +fluctuations around a deterministic trend are the only source of risk. +``` + +```{prf:proof} +Let + +$$ +A = \exp(-\delta-\gamma g_c) +$$ + +so that + +$$ +q_{ij} = A\left(\frac{c_j}{c_i}\right)^{-\gamma}p_{ij}. +$$ + +Guess $\hat e_i=c_i^\gamma$. + +Then + +$$ +[\mathbf Q\hat{\mathbf e}]_i += \sum_j A\left(\frac{c_j}{c_i}\right)^{-\gamma}p_{ij}c_j^\gamma += A c_i^\gamma \sum_j p_{ij} += A\hat e_i. +$$ + +Thus $\exp(\hat\eta)=A$ and $\hat{\mathbf e}$ is the Perron--Frobenius eigenvector. + +Substituting into the recovered transition probabilities gives + +$$ +\hat p_{ij} += \frac{1}{A}q_{ij}\frac{\hat e_j}{\hat e_i} += \frac{1}{A} + A\left(\frac{c_j}{c_i}\right)^{-\gamma}p_{ij} + \frac{c_j^\gamma}{c_i^\gamma} += p_{ij}. +$$ + +Hence $\hat h_{ij}=\hat p_{ij}/p_{ij}=1$ for all feasible transitions. +``` ```{code-cell} ipython3 -# Verify: for trend-stationary power utility, h_hat_ij = 1 identically gc = 0.002 # monthly trend growth -# Trend-stationary: consumption growth ratio depends only on state, not history -# s_ij = exp(-δ - γ*gc) * (c_j/c_i)^(-γ) S_trend = np.zeros((n, n)) for i in range(n): for j in range(n): @@ -502,14 +606,21 @@ _, exp_η_t, e_hat_t, P_hat_t = perron_frobenius(Q_trend) H_incr_trend = np.where(P_phys > 0, P_hat_t / P_phys, 0.0) -print("Martingale increment h_hat_ij for trend-stationary power utility:") +print("Trend-stationary h_hat:") print(np.round(H_incr_trend, 6)) -print(f"\nMax deviation from 1: {np.abs(H_incr_trend[P_phys>0] - 1).max():.2e}") -print("-> Martingale is trivial: Recovery succeeds.") +print(f"Max deviation from 1: {np.abs(H_incr_trend[P_phys>0] - 1).max():.2e}") ``` ### Recursive (Epstein–Zin) utility +The previous corollary is a success case for recovery. 
+ +The next calculation is a failure case: it shows exactly where the power-utility proof +breaks once continuation values enter the SDF. + +The key step is to identify an extra term that cannot, in general, be written only as a +ratio of the current and next states. + When the investor has **Epstein–Zin recursive preferences** with risk aversion $\gamma \neq 1$, continuation values $V_t$ satisfy the recursion @@ -527,45 +638,31 @@ s_{ij} = \exp(-\delta - g_c)\frac{c_i}{c_j} $$ where $v^*_i = \exp\!\bigl[(1-\gamma)v_i\bigr]$ and $\mathbf{P}_i$ is the $i$-th row of -$\mathbf{P}$. +$\mathbf{P}$. -The additional factor $v^*_j/(\mathbf{P}_i v^*)$ introduces a **nontrivial -martingale component** whenever continuation values vary across states. +The additional factor $v^*_j/(\mathbf{P}_i v^*)$ introduces a **nontrivial martingale +component** whenever $v^*$ is not constant across states. ```{code-cell} ipython3 def solve_ez_finite(P, c, δ, γ, gc, tol=1e-12, max_iter=5000): - """ - Solve for Epstein-Zin continuation values in finite Markov chain. - - Solves the fixed-point v_i = (1-β)log(c_i) + β/(1-γ) log(P_i @ exp((1-γ)v)) - where β = exp(-δ - gc). The special case γ = 1 (log utility) is handled - separately to avoid the 0/0 indeterminate form: the recursion reduces to - v = (I - β P)^{-1} (1-β) log(c) and the SDF simplifies to - s_ij = exp(-δ - g_c) c_i / c_j. - - Returns - ------- - v : ndarray — continuation values (net of time trend) - vstar : ndarray — exp((1-γ)v) - s : ndarray — one-period SDF matrix - """ - β = np.exp(-δ - gc) + """Solve finite-state Epstein-Zin continuation values and SDF.""" + β = np.exp(-δ) log_c = np.log(c) n = len(c) + flow = (1 - β) * log_c + β * gc if abs(γ - 1.0) < 1e-10: - # Log utility: (I - β P) v = (1-β) log c - v = linalg.solve(np.eye(n) - β * P, (1 - β) * log_c) - vstar = np.ones(n) # exp((1-1)*v) = 1 - Pv = np.ones(n) # P @ ones = ones + # Log utility avoids the (1-gamma) denominator in the recursion. 
+ v = linalg.solve(np.eye(n) - β * P, flow) + vstar = np.ones(n) + Pv = np.ones(n) else: - # General recursive utility: fixed-point iteration + # Fixed-point iteration for the transformed continuation value term. v = log_c.copy() for _ in range(max_iter): vstar = np.exp((1 - γ) * v) Pv = P @ vstar - v_new = ((1 - β) * log_c - + β / (1 - γ) * np.log(Pv)) + v_new = flow + β / (1 - γ) * np.log(Pv) if np.max(np.abs(v_new - v)) < tol: v = v_new break @@ -573,7 +670,7 @@ def solve_ez_finite(P, c, δ, γ, gc, tol=1e-12, max_iter=5000): vstar = np.exp((1 - γ) * v) Pv = P @ vstar - # SDF matrix + # The SDF includes the continuation-value likelihood-ratio term. s = np.zeros((n, n)) for i in range(n): for j in range(n): @@ -582,7 +679,6 @@ def solve_ez_finite(P, c, δ, γ, gc, tol=1e-12, max_iter=5000): return v, vstar, s -# Compare: γ = 1 (log utility, degenerate martingale) vs γ = 5 gc_ex = 0.001 # monthly consumption trend growth for γ_val, label in [(1.0, 'γ = 1 (log utility)'), (5.0, 'γ = 5 (risk aversion)')]: @@ -594,15 +690,12 @@ for γ_val, label in [(1.0, 'γ = 1 (log utility)'), (5.0, 'γ = 5 (risk avers π_hat_ez = stationary_dist(P_hat_ez) print(f"\n{label}") - print(f" Continuation values v = {v_ez.round(4)}") print(f" Max |h_hat_ij - 1| = {np.abs(H_ez[P_phys>0] - 1).max():.4f}") print(f" Stationary P_hat = {π_hat_ez.round(4)}") - print(f" Stationary P = {π_phys.round(4)}") + print(f" Stationary P = {π_phys.round(4)}") ``` ```{code-cell} ipython3 -# Show how the martingale depends on γ for recursive utility -# Start at 1.0: the γ=1 special case in solve_ez_finite is handled explicitly. γs_ez = np.linspace(1.0, 10.0, 50) mart_errors = [] π_rec_hat = [] @@ -635,9 +728,6 @@ plt.tight_layout(); plt.show() ``` ```{code-cell} ipython3 -# Visualize the martingale increment using Epstein-Zin utility (γ=5). 
-# Trend-stationary power utility always yields h_hat_ij = 1 by construction (see Exercise 4), -# so we use recursive utility here to reveal a genuinely non-trivial martingale. γ_ez_demo = 5.0 _, _, S_ez_demo = solve_ez_finite(P_phys, c_levels, δ, γ_ez_demo, gc_ex) Q_ez_demo = S_ez_demo * P_phys @@ -663,7 +753,6 @@ axes[0].set_yticklabels(state_names, fontsize=9) axes[0].set_xlabel('next state'); axes[0].set_ylabel('current state') plt.colorbar(im0, ax=axes[0], fraction=0.046) -# How risk aversion γ shifts the recovered measure under Epstein-Zin utility. γs_shift = np.linspace(1.0, 12, 60) rec_wts_ez = [] for g in γs_shift: @@ -683,17 +772,17 @@ axes[1].legend(fontsize=9) plt.tight_layout(); plt.show() ``` -At $\gamma = 1$ (log utility), the continuation value is constant across states and the -martingale is trivial, so recovery succeeds. +At $\gamma = 1$ (log utility), $v^*=\exp((1-\gamma)v)$ is constant across states, so the +continuation-value martingale is trivial and recovery succeeds. -For $\gamma > 1$, continuation values vary -with the state, generating a non-degenerate martingale that grows with risk aversion. +For $\gamma > 1$, the transformed continuation value $v^*$ varies with the state, +generating a non-degenerate martingale that grows with risk aversion. ## The long-run risk model We now illustrate the results quantitatively using the Bansal–Yaron -{cite}`Bansal_Yaron_2004` long-run risk model, calibrated to {cite}`BorovickaHansenScheinkman2016` -(Figure 1). +{cite}`Bansal_Yaron_2004` long-run risk model, calibrated to +{cite}`BorovickaHansenScheinkman2016` (Figure 1). ### Model setup @@ -708,10 +797,11 @@ $$ where $W_t$ is a three-dimensional Brownian motion. -Here $X_{1t}$ is the -**predictable component of consumption growth** and $X_{2t}$ is **stochastic volatility**. +Here $X_{1t}$ is the **predictable component of consumption growth** and $X_{2t}$ is +**stochastic volatility**. 
-The representative agent has Epstein–Zin preferences with unit elasticity of substitution. +The representative agent has Epstein–Zin preferences with unit elasticity of +substitution. The stochastic discount factor satisfies @@ -719,11 +809,10 @@ $$ d\log S_t = -\delta\,dt - d\log C_t + d\log H^*_t, $$ -where $H^*$ is a martingale determined by the continuation value of the recursive utility. +where $H^*$ is a martingale determined by the continuation value of the recursive +utility. ```{code-cell} ipython3 -# Model parameters from Borovicka-Hansen-Scheinkman (2016), Figure 1 -# Monthly frequency lrr_params = dict( δ = 0.002, # subjective discount rate γ = 10.0, # risk aversion @@ -743,33 +832,25 @@ lrr_params = dict( ### Solving the value function -The log continuation value $v(X_t)$ is affine in the state: $v(x) = \bar{v}_0 + \bar{v}_1 x_1 + \bar{v}_2 x_2$. +The log continuation value $v(X_t)$ is affine in the state: +$v(x) = \bar{v}_0 + \bar{v}_1 x_1 + \bar{v}_2 x_2$. -The coefficients satisfy the algebraic system in Appendix D of {cite}`BorovickaHansenScheinkman2016`. +The coefficients satisfy the algebraic system in Appendix D of +{cite}`BorovickaHansenScheinkman2016`. ```{code-cell} ipython3 def solve_value_function(p): - """ - Solve for Epstein-Zin value function coefficients in the LRR model. - - The continuation value satisfies: - log V_t = log C_t + v_bar0 + v_bar1*X1_t + v_bar2*X2_t - - Returns v_bar1, v_bar2. - """ + """Solve the affine Epstein-Zin value-function coefficients.""" δ, γ = p['δ'], p['γ'] μ11, μ12, μ22 = p['μ11'], p['μ12'], p['μ22'] σ1, σ2 = p['σ1'], p['σ2'] β_c1, β_c2 = p['β_c1'], p['β_c2'] α_c = p['α_c'] - # Linear equation for v_bar1 - # δ v_bar1 = β_c1 + μ_bar11 v_bar1 => v_bar1 = β_c1 / (δ - μ_bar11) + # The X1 coefficient solves a scalar linear equation. 
v1 = β_c1 / (δ - μ11) - # Quadratic equation for v_bar2 - # 0 = (μ_bar22 - δ)v_bar2 + β_c2 + μ_bar12 v_bar1 + (1/2)(1-γ)|A + B v_bar2|^2 - # where A = α_c_bar + σ_bar1 v_bar1, B = σ_bar2 + # The X2 coefficient solves the quadratic equation from the affine recursion. A_vec = α_c + σ1 * v1 B_vec = σ2 @@ -781,7 +862,6 @@ def solve_value_function(p): if disc < 0: raise ValueError("Value function does not exist for these parameters.") - # "Minus" solution (generates ergodic dynamics under P_hat) v2 = (-b - np.sqrt(disc)) / (2 * a) return v1, v2, A_vec, B_vec @@ -789,22 +869,13 @@ def solve_value_function(p): v1, v2, A_vec, B_vec = solve_value_function(lrr_params) print(f"Value-function slope on X1: v_bar1 = {v1:.4f}") print(f"Value-function slope on X2: v_bar2 = {v2:.4f}") -print(f"\nInterpretation:") -print(f" Higher X1 (better expected growth) raises continuation value (v_bar1 > 0)") -print(f" Higher X2 (more volatility) lowers continuation value (v_bar2 < 0)") ``` ### Perron–Frobenius and recovered dynamics ```{code-cell} ipython3 def solve_pf_lrr(p, v1, v2, A_vec): - """ - Solve the Perron-Frobenius problem for the long-run risk model. - - Eigenfunction guess: e_hat(x) = exp(e_bar1 x1 + e_bar2 x2). - - Returns e_bar1, e_bar2, η_hat, and the SDF diffusion vector α_s. - """ + """Solve the LRR Perron-Frobenius coefficients.""" δ, γ = p['δ'], p['γ'] μ11, μ12, μ22 = p['μ11'], p['μ12'], p['μ22'] ι1, ι2 = p['ι1'], p['ι2'] @@ -813,23 +884,23 @@ def solve_pf_lrr(p, v1, v2, A_vec): β_c0 = p['β_c0'] β_c1, β_c2 = p['β_c1'], p['β_c2'] - # SDF diffusion: α_s = -γ α_c + (1-γ)(σ_bar1 v_bar1 + σ_bar2 v_bar2) - α_s = (-γ * α_c - + (1 - γ) * (σ1 * v1 + σ2 * v2)) + # H* is the continuation-value martingale in the recursive utility SDF. + α_h_star = (1 - γ) * (α_c + σ1 * v1 + σ2 * v2) + + # α_s is the diffusion loading of d log S_t. 
+ α_s = -α_c + α_h_star - # SDF drift parameters in β_s(x) = β_s0 + β_s11(x1-ι1) + β_s12(x2-ι2) + # The Ito correction uses d log H*, not the total log-SDF diffusion. β_s11 = -β_c1 - β_s12 = -β_c2 - 0.5 * np.dot(α_s, α_s) + β_s12 = -β_c2 - 0.5 * np.dot(α_h_star, α_h_star) β_s0 = (-δ - β_c0 - - 0.5 * ι2 * np.dot(α_s, α_s)) + - 0.5 * ι2 * np.dot(α_h_star, α_h_star)) - # Equation 0 = β_s11 + μ_bar11 e_bar1 => e_bar1 = -β_s11 / μ_bar11 + # The first Perron coefficient solves a scalar linear equation. e1 = -β_s11 / μ11 - # Quadratic for e_bar2 - # 0 = (β_s12 + (1/2)|α_s|^2) + e_bar1(μ_bar12 + σ_bar1*α_s) + (1/2)e_bar1^2|σ_bar1|^2 - # + e_bar2(μ_bar22 + σ_bar2*α_s + e_bar1 σ_bar1*σ_bar2') + (1/2)e_bar2^2|σ_bar2|^2 - const_pf = (β_s12 + 0.5*np.dot(α_s, α_s) # = 0 by construction + # The second Perron coefficient solves a quadratic equation. + const_pf = (β_s12 + 0.5*np.dot(α_s, α_s) + e1*(μ12 + np.dot(σ1, α_s)) + 0.5*e1**2*np.dot(σ1, σ1)) lin_pf = μ22 + np.dot(σ2, α_s) + e1*np.dot(σ1, σ2) @@ -839,12 +910,11 @@ def solve_pf_lrr(p, v1, v2, A_vec): e2_m = (-lin_pf - np.sqrt(disc)) / (2*quad_pf) e2_p = (-lin_pf + np.sqrt(disc)) / (2*quad_pf) - # η_hat = β_s0 - β_s12*ι2 - e_bar2*μ_bar22*ι2 (ι1 = 0) η_m = β_s0 - β_s12*ι2 - e2_m*μ22*ι2 η_p = β_s0 - β_s12*ι2 - e2_p*μ22*ι2 - # Choose solution with smaller |η_hat| (ergodicity requirement) - if abs(η_m) <= abs(η_p): + # Select the lower eigenvalue root that generates stationary recovered dynamics. 
+ if η_m <= η_p: e2, η_hat = e2_m, η_m else: e2, η_hat = e2_p, η_p @@ -857,35 +927,25 @@ e1, e2, η_hat_lrr, α_s = solve_pf_lrr(lrr_params, v1, v2, A_vec) print(f"PF eigenfunction coefficients: e_bar1 = {e1:.4f}, e_bar2 = {e2:.4f}") print(f"Log eigenvalue: η_hat = {η_hat_lrr:.6f} " f"(annualized = {η_hat_lrr*12:.4f})") -print(f"\nInterpretation:") -print(f" e_bar1 = {e1:.2f}: e_hat down-weights high-X1 (good growth) states") -print(f" e_bar2 = {e2:.2f}: e_hat up-weights high-X2 (high volatility) states") ``` ### Computing the P_hat dynamics ```{code-cell} ipython3 def compute_phat_dynamics(p, e1, e2, α_s): - """ - Compute the drift parameters of X under the recovered measure P_hat. - - Under P_hat, the Brownian motion is - dW_hat_t = -sqrt(X2_t) * α_hat_h dt + dW_t - where α_hat_h = α_s + σ_bar1 e_bar1 + σ_bar2 e_bar2. - """ + """Drift parameters under the recovered measure P_hat.""" μ11, μ12, μ22 = p['μ11'], p['μ12'], p['μ22'] ι1, ι2 = p['ι1'], p['ι2'] σ1, σ2 = p['σ1'], p['σ2'] - # Martingale drift correction + # α_h is the likelihood-ratio loading for the recovered measure. α_h = α_s + σ1 * e1 + σ2 * e2 - # New drift parameters under P_hat μ_hat_11 = μ11 μ_hat_12 = μ12 + np.dot(σ1, α_h) μ_hat_22 = μ22 + np.dot(σ2, α_h) - # New long-run means + # Rewrite the drift in mean-reversion form under P_hat. 
ι_hat_2 = (μ22 / μ_hat_22) * ι2 ι_hat_1 = (ι1 + (1.0/μ11) * (μ12*ι2 - μ_hat_12*ι_hat_2)) @@ -904,36 +964,65 @@ def compute_phat_dynamics(p, e1, e2, α_s): phat_dyn = compute_phat_dynamics(lrr_params, e1, e2, α_s) -print("Dynamics of X under P_hat (vs physical P):") +print("P_hat dynamics:") print(f" μ_hat_11 = {phat_dyn['μ_hat_11']:.4f} " - f"(same as physical μ_bar_11 = {lrr_params['μ11']:.4f})") + f"(physical {lrr_params['μ11']:.4f})") print(f" μ_hat_12 = {phat_dyn['μ_hat_12']:.6f} " - f"(physical = 0 — new coupling created by risk adjustment)") + f"(physical 0)") print(f" μ_hat_22 = {phat_dyn['μ_hat_22']:.5f} " - f"(physical = {lrr_params['μ22']:.4f})") + f"(physical {lrr_params['μ22']:.4f})") print(f" ι_hat_1 = {phat_dyn['ι_hat_1']:.5f} " - f"(physical ι1 = {lrr_params['ι1']:.4f} — lower mean growth under P_hat)") + f"(physical {lrr_params['ι1']:.4f})") print(f" ι_hat_2 = {phat_dyn['ι_hat_2']:.5f} " - f"(physical ι2 = {lrr_params['ι2']:.4f} — higher mean volatility under P_hat)") + f"(physical {lrr_params['ι2']:.4f})") +``` + +For comparison with the paper's Figure 1, we also compute the instantaneous risk-neutral +dynamics. + +This change of measure uses the martingale component of the normalized SDF, whose +diffusion vector is $\alpha_s$. + +```{code-cell} ipython3 +def compute_rn_dynamics(p, α_s): + """Drift parameters under the one-period risk-neutral measure.""" + μ11, μ12, μ22 = p['μ11'], p['μ12'], p['μ22'] + ι1, ι2 = p['ι1'], p['ι2'] + σ1, σ2 = p['σ1'], p['σ2'] + + # Risk-neutral dynamics use the normalized SDF loading. + μ_rn_11 = μ11 + μ_rn_12 = μ12 + np.dot(σ1, α_s) + μ_rn_22 = μ22 + np.dot(σ2, α_s) + + # Rewrite the drift in mean-reversion form under P_bar. 
+ ι_rn_2 = (μ22 / μ_rn_22) * ι2 + ι_rn_1 = (ι1 + + (1.0/μ11) * (μ12*ι2 - μ_rn_12*ι_rn_2)) + + return dict( + μ_rn_11 = μ_rn_11, + μ_rn_12 = μ_rn_12, + μ_rn_22 = μ_rn_22, + ι_rn_1 = ι_rn_1, + ι_rn_2 = ι_rn_2, + ) + + +rn_dyn = compute_rn_dynamics(lrr_params, α_s) + +print("Dynamics of X under P_bar (risk-neutral):") +print(f" μ_rn_12 = {rn_dyn['μ_rn_12']:.6f}") +print(f" μ_rn_22 = {rn_dyn['μ_rn_22']:.5f}") +print(f" ι_rn_1 = {rn_dyn['ι_rn_1']:.5f}") +print(f" ι_rn_2 = {rn_dyn['ι_rn_2']:.5f}") ``` ### Simulating and comparing stationary distributions ```{code-cell} ipython3 def simulate_lrr(dyn, T=600_000, seed=42): - """ - Simulate the LRR state vector using Euler-Maruyama (monthly steps). - - Parameters - ---------- - dyn : dict with μ11, μ12, μ22, ι1, ι2, σ1, σ2 - T : number of monthly steps - seed : random seed - - Returns - ------- - X1, X2 : ndarray — stationary sample paths (burn-in discarded) - """ + """Simulate stationary LRR paths by Euler-Maruyama.""" rng = np.random.default_rng(seed) μ11 = dyn.get('μ11', dyn.get('μ_hat_11')) μ12 = dyn.get('μ12', dyn.get('μ_hat_12', 0.0)) @@ -949,7 +1038,9 @@ def simulate_lrr(dyn, T=600_000, seed=42): for t in range(1, T): X2t = max(X2[t-1], 1e-9) sq_X2 = np.sqrt(X2t) - dW = rng.standard_normal(3) # monthly Δt = 1 + + # Monthly Euler step with dt = 1. 
+ dW = rng.standard_normal(3) X1[t] = X1[t-1] + (μ11*(X1[t-1]-ι1) + μ12*(X2t-ι2)) + sq_X2*np.dot(σ1, dW) X2[t] = max(X2[t-1] + μ22*(X2t-ι2) + sq_X2*np.dot(σ2, dW), 1e-9) @@ -958,8 +1049,6 @@ def simulate_lrr(dyn, T=600_000, seed=42): return X1[burn:], X2[burn:] -# Simulation under physical P -print("Simulating under physical measure P ...") X1_P, X2_P = simulate_lrr( dict(μ11=lrr_params['μ11'], μ12=lrr_params['μ12'], μ22=lrr_params['μ22'], ι1=lrr_params['ι1'], @@ -968,8 +1057,6 @@ X1_P, X2_P = simulate_lrr( T=600_000 ) -# Simulation under recovered measure P_hat -print("Simulating under recovered measure P_hat ...") X1_Ph, X2_Ph = simulate_lrr( dict(μ_hat_11=phat_dyn['μ_hat_11'], μ_hat_12=phat_dyn['μ_hat_12'], @@ -980,14 +1067,23 @@ X1_Ph, X2_Ph = simulate_lrr( σ2=lrr_params['σ2']), T=600_000 ) -print("Done.") + +X1_RN, X2_RN = simulate_lrr( + dict(μ11=rn_dyn['μ_rn_11'], + μ12=rn_dyn['μ_rn_12'], + μ22=rn_dyn['μ_rn_22'], + ι1=rn_dyn['ι_rn_1'], + ι2=rn_dyn['ι_rn_2'], + σ1=lrr_params['σ1'], + σ2=lrr_params['σ2']), + T=600_000 +) ``` ```{code-cell} ipython3 -# Reproduce Figure 1 of Borovicka-Hansen-Scheinkman (2016) def kde2d_contour(ax, X1, X2, levels=8, color='k', alpha=1.0, lw=1.5, - bandwidth=None): - """Plot contour lines of a 2D kernel density estimate.""" + bandwidth=None, linestyle='solid'): + """Add 2D KDE contours to an axis.""" xy = np.vstack([X2, X1]) kde = gaussian_kde(xy, bw_method=bandwidth) x2g = np.linspace(X2.min()*0.9, X2.max()*1.1, 120) @@ -995,22 +1091,21 @@ def kde2d_contour(ax, X1, X2, levels=8, color='k', alpha=1.0, lw=1.5, X2g, X1g = np.meshgrid(x2g, x1g) Z = kde(np.vstack([X2g.ravel(), X1g.ravel()])).reshape(X2g.shape) ax.contour(X2g, X1g, Z, levels=levels, colors=color, alpha=alpha, - linewidths=lw) + linewidths=lw, linestyles=linestyle) fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(13, 5), sharey=True) -# Left panel: distribution under P kde2d_contour(ax1, X1_P, X2_P, color='navy', levels=7) ax1.set_xlabel('conditional volatility $X_2$', 
fontsize=11) ax1.set_ylabel('mean growth rate $X_1$', fontsize=11) ax1.set_title(r'physical measure $P$', fontsize=12) -# Right panel: distribution under P_hat, plus outermost contour of P_bar (risk-neutral) kde2d_contour(ax2, X1_Ph, X2_Ph, color='navy', levels=7) +kde2d_contour(ax2, X1_RN, X2_RN, color='black', levels=3, + alpha=0.65, lw=1.2, linestyle='--') ax2.set_xlabel('conditional volatility $X_2$', fontsize=11) ax2.set_title(r'long-term risk-neutral $\hat{P}$', fontsize=12) -# Annotate distributional shifts for ax in (ax1, ax2): ax.axhline(0, color='grey', lw=0.8, ls='--') ax.axvline(lrr_params['ι2'], color='grey', lw=0.8, ls='--') @@ -1023,9 +1118,12 @@ ax2.annotate(f"mean X1 ~ {X1_Ph.mean():.4f}", xy=(0.05, 0.92), xycoords='axes fraction', fontsize=9, color='navy') ax2.annotate(f"mean X2 ~ {X2_Ph.mean():.4f}", xy=(0.05, 0.85), xycoords='axes fraction', fontsize=9, color='navy') +ax2.plot([], [], color='navy', lw=1.5, label=r'$\hat{P}$') +ax2.plot([], [], color='black', lw=1.2, ls='--', label=r'risk-neutral $\bar{P}$') +ax2.legend(fontsize=9, loc='lower right') plt.suptitle('stationary distributions of $(X_1, X_2)$ under $P$ and $\\hat{P}$\n' - '(reproducing Figure 1 of Borovička, Hansen & Scheinkman 2016)', + '(based on Figure 1 of Borovička, Hansen & Scheinkman 2016)', fontsize=12, y=1.02) plt.tight_layout(); plt.show() ``` @@ -1033,20 +1131,18 @@ plt.tight_layout(); plt.show() The recovered measure $\hat{P}$ concentrates around **lower mean growth** (more negative $X_1$) and **higher conditional volatility** (larger $X_2$). -Forecasts made using -$\hat{P}$ are systematically pessimistic compared to forecasts based on the true -distribution $P$. +Forecasts made using $\hat{P}$ are systematically pessimistic compared to forecasts +based on the true distribution $P$. ## Measuring the martingale component ### Entropy bounds -Even without observing the full array of Arrow prices, we can obtain **lower bounds** -on the size of the martingale component. 
+Even without observing the full array of Arrow prices, we can obtain **lower bounds** on +the size of the martingale component. -For a convex function -$\phi_\theta(r) = [(r)^{1+\theta} - 1] / [\theta(1+\theta)]$, the discrepancy -between $\hat{P}$ and $P$ satisfies +For a convex function $\phi_\theta(r) = [(r)^{1+\theta} - 1] / [\theta(1+\theta)]$, the +discrepancy between $\hat{P}$ and $P$ satisfies $$ \lambda_\theta = E\!\left[\phi_\theta\!\left(\frac{\hat{H}_{t+1}}{\hat{H}_t}\right)\right] @@ -1062,7 +1158,7 @@ Two special cases are: ```{code-cell} ipython3 def φ_θ(r, θ): - """Discrepancy function φ_θ(r) = [(r)^{1+θ} - 1] / [θ(1+θ)].""" + """Discrepancy function.""" if abs(θ) < 1e-10: # θ -> 0: relative entropy r log r return r * np.log(r) if abs(θ + 1) < 1e-10: # θ -> -1: -log r @@ -1071,24 +1167,20 @@ def φ_θ(r, θ): def martingale_entropy(Q, P, θ=-1): - """ - Compute the stationary-average discrepancy E[φ_θ(h_hat)] for the finite-state chain. - """ + """Stationary-average discrepancy E[φ_θ(h_hat)].""" _, exp_η, e_hat, P_hat = perron_frobenius(Q) - H_incr = np.where(P > 0, P_hat / P, 1.0) # h_hat_ij - π_hat = stationary_dist(P_hat) + H_incr = np.where(P > 0, P_hat / P, 1.0) + π = stationary_dist(P) - # Stationary-average: sum_i sum_j π_hat_i h_hat_ij p_ij φ_θ(h_hat_ij) disc = 0.0 for i in range(P.shape[0]): for j in range(P.shape[1]): if P[i, j] > 0: - disc += π_hat[i] * P[i, j] * φ_θ(H_incr[i, j], θ) + disc += π[i] * P[i, j] * φ_θ(H_incr[i, j], θ) return disc -# Compute entropy for different γ values -γs_ent = np.linspace(1.0, 10.0, 50) # γ=1 handled by solve_ez_finite +γs_ent = np.linspace(1.0, 10.0, 50) entropies = {'θ=-1 (neg. log)': [], 'θ=0 (rel. entropy)': [], 'θ=1 (variance/2)': []} for γ_val in γs_ent: @@ -1112,9 +1204,9 @@ plt.tight_layout(); plt.show() ``` All three discrepancy measures increase with risk aversion, confirming that a higher -$\gamma$ implies a larger — and more economically significant — martingale component. 
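The two special cases quoted above are limits of a single one-parameter family when the argument is a likelihood ratio with unit mean. A quick standalone sketch (toy probabilities and ratios, assumed for illustration) confirms that the average discrepancy tends to relative entropy $E[r\log r]$ as $\theta \to 0$ and to $E[-\log r]$ as $\theta \to -1$:

```python
import numpy as np

def phi(r, theta):
    """phi_theta(r) = (r**(1+theta) - 1) / (theta * (1+theta)), theta not in {0, -1}."""
    return (r**(1 + theta) - 1) / (theta * (1 + theta))

# A toy likelihood ratio with E[r] = 1 under probabilities p (assumed numbers).
p = np.array([0.2, 0.3, 0.5])
r = np.array([0.6, 1.4, 0.92])
assert np.isclose(p @ r, 1.0)

# theta -> 0: the average discrepancy tends to relative entropy E[r log r].
lam_near0 = p @ phi(r, 1e-6)
assert np.isclose(lam_near0, p @ (r * np.log(r)), atol=1e-5)

# theta -> -1: it tends to E[-log r].
lam_near_m1 = p @ phi(r, -1 + 1e-6)
assert np.isclose(lam_near_m1, p @ (-np.log(r)), atol=1e-5)

# Both limits are nonnegative, as Jensen's inequality implies for E[r] = 1.
assert lam_near0 >= 0 and lam_near_m1 >= 0
```

Note that the limits hold for the *expectation* under a unit-mean ratio; pointwise, the $(r-1)/\theta$ term diverges but averages out because $E[r] = 1$.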
+$\gamma$ implies a larger -- and more economically significant -- martingale component. -{cite}`AlvarezJermann2005` and {cite}`BakshiChabiYo2012` use analogous bounds with +{cite:t}`AlvarezJermann2005` and {cite:t}`BakshiChabiYo2012` use analogous bounds with long-maturity bond returns to find empirically large martingale components in U.S. data. ## Exercises @@ -1122,8 +1214,9 @@ long-maturity bond returns to find empirically large martingale components in U. ```{exercise} :label: ex_risk_neutral -**Verify risk-neutral probabilities.** Consider a two-state Markov chain with physical -transition matrix +**Verify risk-neutral probabilities.** + +Consider a two-state Markov chain with physical transition matrix $$ \mathbf{P} = \begin{pmatrix} 0.8 & 0.2 \\ 0.4 & 0.6 \end{pmatrix} @@ -1147,7 +1240,6 @@ $$ ``` ```{code-cell} ipython3 -# Exercise 1 solution P2 = np.array([[0.8, 0.2], [0.4, 0.6]]) Q2 = np.array([[0.72, 0.15], @@ -1155,18 +1247,20 @@ Q2 = np.array([[0.72, 0.15], P_bar2, q_bonds2 = risk_neutral_probs(Q2) -print("Risk-neutral transition matrix P_bar:") +print("Risk-neutral P_bar:") print(np.round(P_bar2, 4)) print(f"\nRow sums: {P_bar2.sum(axis=1)}") -print(f"\nOne-period bond prices q_bar_i: {q_bonds2}") +print(f"\nBond prices q_bar_i: {q_bonds2}") print(f"Annualized risk-free rates: {(-np.log(q_bonds2)*12).round(4)}") -# Verify SDF independence from j +S_bar2 = np.repeat(q_bonds2[:, np.newaxis], P_bar2.shape[1], axis=1) +print(f"\nRisk-neutral SDF matrix S_bar:") +print(np.round(S_bar2, 4)) +print("Check Q = S_bar * P_bar:", np.allclose(Q2, S_bar2 * P_bar2)) + S2 = Q2 / P2 -print(f"\nSDF matrix S = Q/P:") +print(f"\nPhysical SDF matrix S = Q/P:") print(np.round(S2, 4)) -print("Row 0: all entries should equal q_bar0 =", round(q_bonds2[0], 4)) -print("Row 1: all entries should equal q_bar1 =", round(q_bonds2[1], 4)) ``` ```{solution-end} @@ -1175,17 +1269,21 @@ print("Row 1: all entries should equal q_bar1 =", round(q_bonds2[1], 4)) ```{exercise} :label: 
ex_gamma_sensitivity -**Risk aversion and recovery distortion.** Using the three-state example from the -lecture (with $\delta = 0.99$ and trend-stationary consumption levels -$c = [0.85, 1.00, 1.15]$), investigate how the recovered probability -vector $\hat{\boldsymbol{\pi}}$ depends on the risk aversion parameter $\gamma$. +**Risk aversion and recovery distortion under recursive utility.** + +Using the three-state Epstein--Zin example from the lecture (with $\exp(-\delta)=0.99$, +$g_c=0.001$, and consumption levels $c = [0.85, 1.00, 1.15]$), investigate how the +recovered probability vector $\hat{\boldsymbol{\pi}}$ depends on the risk aversion +parameter $\gamma$. 1. For each $\gamma \in \{1, 2, 5, 10, 15\}$, compute the long-term risk-neutral - stationary distribution $\hat{\boldsymbol{\pi}}$. + stationary distribution $\hat{\boldsymbol{\pi}}$ using the recursive-utility SDF. 2. Plot all five distributions as grouped bar charts alongside the physical distribution $\boldsymbol{\pi}$. -3. At what value of $\gamma$ does the recession probability under $\hat{\mathbf{P}}$ - exceed $50\%$? +3. Does the recession probability under $\hat{\mathbf{P}}$ exceed $50\%$ for + $\gamma \leq 30$? + +If not, report the maximum value on that range. ``` ```{solution-start} ex_gamma_sensitivity @@ -1193,15 +1291,12 @@ vector $\hat{\boldsymbol{\pi}}$ depends on the risk aversion parameter $\gamma$. 
``` ```{code-cell} ipython3 -# Exercise 2 solution γs_ex2 = [1, 2, 5, 10, 15] all_π = [] for γ_val in γs_ex2: - Q_g = np.zeros((3, 3)) - for i in range(3): - for j in range(3): - Q_g[i, j] = δ * (c_levels[j]/c_levels[i])**(-γ_val) * P_phys[i, j] + _, _, S_g = solve_ez_finite(P_phys, c_levels, δ, γ_val, gc_ex) + Q_g = S_g * P_phys _, _, _, Ph_g = perron_frobenius(Q_g) all_π.append(stationary_dist(Ph_g)) @@ -1210,7 +1305,6 @@ x = np.arange(3) w = 0.13 colors_g = plt.cm.Blues(np.linspace(0.3, 0.9, len(γs_ex2))) -# Physical distribution bars = ax.bar(x - 3*w, π_phys, width=w, color='grey', alpha=0.7, label='physical P') for b_, v in zip(bars, π_phys): ax.text(b_.get_x()+w/2, v+0.005, f'{v:.3f}', ha='center', va='bottom', fontsize=7) @@ -1228,16 +1322,14 @@ ax.set_title(r'stationary distribution of $\hat{P}$ for varying risk aversion $\ ax.legend(fontsize=8, loc='upper right') plt.tight_layout(); plt.show() -# Part 3: find γ where recession probability under P_hat exceeds 50% γs_fine = np.linspace(1, 30, 200) rec_probs = [] for γ_val in γs_fine: - Q_g = np.array([[δ*(c_levels[j]/c_levels[i])**(-γ_val)*P_phys[i,j] - for j in range(3)] for i in range(3)]) + _, _, S_g = solve_ez_finite(P_phys, c_levels, δ, γ_val, gc_ex) + Q_g = S_g * P_phys _, _, _, Ph_g = perron_frobenius(Q_g) rec_probs.append(stationary_dist(Ph_g)[0]) -# Interpolate crossing point idx50 = np.where(np.array(rec_probs) > 0.5)[0] if len(idx50) > 0: print(f"\nRecession prob under P_hat exceeds 50% at approximately γ ~ {γs_fine[idx50[0]]:.1f}") @@ -1252,15 +1344,16 @@ else: ```{exercise} :label: ex_lrr_gamma -**Effect of risk aversion in the long-run risk model.** Repeat the long-run risk -simulation from the lecture for $\gamma \in \{5, 10, 15\}$ (keeping all other -parameters fixed at their calibrated values). 
+**Effect of risk aversion in the long-run risk model.** + +Repeat the long-run risk simulation from the lecture for $\gamma \in \{5, 10, 15\}$ +(keeping all other parameters fixed at their calibrated values). 1. For each $\gamma$, compute $(\bar{e}_1, \bar{e}_2)$ and $\hat{\eta}$. -2. Plot $\hat{\iota}_1$ (long-run mean of $X_1$ under $\hat{P}$) as a function of $\gamma$. - Interpret the result in terms of long-run expected consumption growth. -3. Plot $\hat{\iota}_2$ (long-run mean of $X_2$ under $\hat{P}$) as a function of $\gamma$. - Interpret in terms of long-run volatility. +2. Plot $\hat{\iota}_1$ (long-run mean of $X_1$ under $\hat{P}$) as a function of + $\gamma$ and interpret the result in terms of long-run expected consumption growth. +3. Plot $\hat{\iota}_2$ (long-run mean of $X_2$ under $\hat{P}$) as a function of + $\gamma$ and interpret it in terms of long-run volatility. ``` ```{solution-start} ex_lrr_gamma @@ -1268,13 +1361,12 @@ parameters fixed at their calibrated values). 
``` ```{code-cell} ipython3 -# Exercise 3 solution γs_lrr = np.linspace(2.0, 18.0, 40) ι_hat_1_vals = [] ι_hat_2_vals = [] η_hat_vals = [] -p_copy = dict(lrr_params) # copy to modify γ +p_copy = dict(lrr_params) for γ_val in γs_lrr: p_copy['γ'] = γ_val @@ -1312,8 +1404,6 @@ axes[2].set_title('long-run discount rate $\\hat{\\eta}$\n(more negative = highe plt.tight_layout(); plt.show() -print("Higher γ -> more negative ι_hat1 (P_hat expects lower growth than P)") -print("Higher γ -> higher ι_hat2 (P_hat expects higher volatility than P)") ``` ```{solution-end} @@ -1322,14 +1412,16 @@ print("Higher γ -> higher ι_hat2 (P_hat expects higher volatility than P)") ```{exercise} :label: ex_recovery_test -**Testing the Ross recovery condition.** Show algebraically and numerically that, for -any $n$-state power-utility model with trend-stationary consumption (as in Example 1 of +**Testing the Ross recovery condition.** + +Show algebraically and numerically that, for any $n$-state power-utility model with +trend-stationary consumption (as in Example 1 of {cite}`BorovickaHansenScheinkman2016`), the martingale increment satisfies $\hat{h}_{ij} \equiv 1$. -1. Write the SDF as $s_{ij} = A \cdot (c_j/c_i)^{-\gamma}$ for some constant $A$. - Show that the Perron-Frobenius eigenvector is $\hat{e}_j = c_j^\gamma$ (up to scale) - and find $\hat{\eta}$. +1. Write the SDF as $s_{ij} = A \cdot (c_j/c_i)^{-\gamma}$ for some constant $A$, + show that the Perron-Frobenius eigenvector is $\hat{e}_j = c_j^\gamma$ (up to + scale), and find $\hat{\eta}$. 2. Compute $\hat{p}_{ij} = \exp(-\hat{\eta}) q_{ij} \hat{e}_j / \hat{e}_i$ and verify it equals $p_{ij}$. 3. Confirm numerically for the three-state example with $\gamma = 5$ and @@ -1342,7 +1434,8 @@ $\hat{h}_{ij} \equiv 1$. **Analytical derivation:** -With $s_{ij} = A \cdot (c_j/c_i)^{-\gamma}$ we have $q_{ij} = A(c_j/c_i)^{-\gamma} p_{ij}$. +With $s_{ij} = A \cdot (c_j/c_i)^{-\gamma}$ we have +$q_{ij} = A(c_j/c_i)^{-\gamma} p_{ij}$. 
Guess $\hat{e}_j = c_j^\gamma$. @@ -1352,12 +1445,14 @@ $$ [\mathbf{Q} \hat{\mathbf{e}}]_i = \sum_j q_{ij} \hat{e}_j = A \sum_j \frac{c_j^{-\gamma}}{c_i^{-\gamma}} p_{ij} \cdot c_j^\gamma -= A \sum_j p_{ij} -= A. += A c_i^\gamma \sum_j p_{ij} += A \hat e_i. $$ -So $\mathbf{Q}\hat{\mathbf{e}} = A \hat{\mathbf{e}}$, confirming $\hat{\mathbf{e}} = \{c_j^\gamma\}$ -and $\exp(\hat{\eta}) = A$. Therefore +So $\mathbf{Q}\hat{\mathbf{e}} = A \hat{\mathbf{e}}$, confirming +$\hat{\mathbf{e}} = \{c_j^\gamma\}$ and $\exp(\hat{\eta}) = A$. + +Therefore $$ \hat{p}_{ij} @@ -1370,8 +1465,6 @@ $$ Hence $\hat{h}_{ij} = \hat{p}_{ij}/p_{ij} = 1$ for all $(i,j)$. ```{code-cell} ipython3 -# Exercise 4 numerical verification -# Use trend-stationary power utility (Section "When Does Recovery Succeed?") gc_ex4 = 0.002 S_ts = np.zeros((3, 3)) for i in range(3): @@ -1380,24 +1473,20 @@ for i in range(3): Q_ts = S_ts * P_phys -# Perron-Frobenius _, exp_η_ts, e_hat_ts, P_hat_ts = perron_frobenius(Q_ts) -# Check eigenvector is proportional to c^γ e_theory = c_levels**γ e_theory /= e_theory.sum() -print("Computed eigenvector e_hat:", np.round(e_hat_ts, 6)) -print("Theoretical c^γ / norm: ", np.round(e_theory, 6)) +print("e_hat:", np.round(e_hat_ts, 6)) +print("c^γ normalized:", np.round(e_theory, 6)) print(f"Max discrepancy: {np.abs(e_hat_ts - e_theory).max():.2e}") H_ts = np.where(P_phys > 0, P_hat_ts / P_phys, 0.0) -print(f"\nMartingale increment matrix h_hat:") +print("\nh_hat:") print(np.round(H_ts, 6)) print(f"Max |h_hat_ij - 1|: {np.abs(H_ts[P_phys>0] - 1).max():.2e}") -print("-> Recovery is exact for trend-stationary power utility.") ``` ```{solution-end} ``` - diff --git a/lectures/odu.md b/lectures/odu.md index c62519468..f15c54fb9 100644 --- a/lectures/odu.md +++ b/lectures/odu.md @@ -245,12 +245,12 @@ What kind of optimal policy might result from {eq}`odu_mvf` and the parameterization specified above? 
Intuitively, if we accept at $w_a$ and $w_a\leq w_b$, -then — all other things being given — we should also accept at $w_b$. +then -- all other things being given -- we should also accept at $w_b$. This suggests a policy of accepting whenever $w$ exceeds some threshold value $\bar w$. -But $\bar w$ should depend on $\pi$ — in +But $\bar w$ should depend on $\pi$ -- in fact, it should be decreasing in $\pi$ because - $f$ is a less attractive offer distribution than $g$ diff --git a/lectures/ross_recovery.md b/lectures/ross_recovery.md index 79ed60f3c..c5f10c97c 100644 --- a/lectures/ross_recovery.md +++ b/lectures/ross_recovery.md @@ -28,40 +28,57 @@ kernelspec: ## Overview -From option prices we can extract risk-neutral (martingale) probabilities of -future outcomes. +Option prices reveal **risk-neutral probabilities**, the probabilities implied by asset +prices once risk adjustments have been folded in. -But risk-neutral probabilities blend two things: the market's -*actual* probability beliefs and investors' *risk aversion*. +These are not the **natural probabilities** that investors actually assign to future +states of the world. -Disentangling the -two seems to require imposing parametric assumptions on -preferences of a representative investor. +The two differ because risk-neutral probabilities blend together two distinct objects: +the market's true beliefs about the future, and investors' aversion to risk. -Nevertheless, {cite}`Ross2015` showed that under a key assumption — the **transition -independence** of the pricing kernel — the natural (real-world) probability -distribution and the pricing kernel can be uniquely recovered from state prices -alone, without historical return data or parametric utility functions. +The link between them is the **pricing kernel**, which reweights natural probabilities +to deliver state prices. -This -result is called the **Recovery Theorem**. 
+For example, using the Arrow-security language from {doc}`ge_arrow`, suppose tomorrow +has two states, recession and boom, with natural probabilities $(0.5, 0.5)$ and pricing +kernels $(1.2, 0.7)$. -The theorem has several important implications. +The Arrow prices are then $(0.6, 0.35)$, so the riskless discount factor is the row sum +$0.95$. + +Normalizing the Arrow prices by this row sum gives risk-neutral probabilities +$(0.6/0.95, 0.35/0.95) \approx (0.63, 0.37)$, which overweight the recession state even +though the natural probability of recession is only $0.5$. + +Separating beliefs from risk aversion has traditionally required parametric assumptions +about the preferences of a representative investor. + +{cite:t}`Ross2015` showed otherwise. + +Under a structural restriction on the pricing kernel called **transition independence**, +the natural probability distribution and the pricing kernel can be uniquely recovered +from state prices alone with no historical return data and no assumed utility +function. + +This is the **Recovery Theorem**. + +It has several important implications: * It enables model-free extraction of the market's forward-looking probability distribution from option prices. * It provides model-free tests of the efficient market hypothesis. * It sheds light on the "dark matter" of finance: the probability of rare - catastrophic events allegedly embedded in market prices. + catastrophic events embedded in market prices. This lecture covers -* The basic Arrow–Debreu framework linking state prices, the pricing kernel, - and natural probabilities. -* Ross's Recovery Theorem and its proof via the Perron–Frobenius theorem. -* A computational implementation that recovers the natural distribution from a - simulated state-price matrix. -* Comparisons between risk-neutral and recovered natural densities. 
+* the Arrow–Debreu framework linking state prices, the pricing kernel, and natural + probabilities, +* Ross's Recovery Theorem and its proof via the Perron–Frobenius theorem, +* an implementation that recovers the natural distribution from a + simulated state-price matrix, and +* comparisons between risk-neutral and recovered natural densities. Let's import the packages we'll need. @@ -71,8 +88,6 @@ import matplotlib.pyplot as plt from scipy.linalg import eig from scipy.stats import norm import matplotlib.cm as cm - -plt.rcParams['figure.figsize'] = (10, 6) ``` ## Model setup @@ -81,16 +96,13 @@ plt.rcParams['figure.figsize'] = (10, 6) Consider a discrete-time, discrete-state economy. -At each date the economy -occupies one of $m$ states $\theta_1, \ldots, \theta_m$. +At each date the economy occupies one of $m$ states $\theta_1, \ldots, \theta_m$. -An **Arrow–Debreu -security** pays \$1 if the economy is in state $\theta_j$ next period and -nothing otherwise. +An **Arrow–Debreu security** pays \$1 if the economy is in state $\theta_j$ next period +and nothing otherwise. -Denote by $p(\theta_i, \theta_j)$ the price today, when the current state is -$\theta_i$, of the Arrow–Debreu security paying in state $\theta_j$ next -period. +Denote by $p(\theta_i, \theta_j)$ the price today, when the current state is $\theta_i$, +of the Arrow–Debreu security paying in state $\theta_j$ next period. Collect these into an $m \times m$ **state price transition matrix** @@ -98,49 +110,59 @@ $$ P = [p(\theta_i, \theta_j)]_{i,j=1}^m. $$ -The row sums give the state-dependent interest factor: $\sum_j p(\theta_i, -\theta_j) = e^{-r(\theta_i)}$. +As in {doc}`ge_arrow`, the row sums give the state-dependent riskless discount factor: +$\sum_j p(\theta_i, \theta_j) = e^{-r(\theta_i)}$. 
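To make the mapping from beliefs and the kernel to Arrow prices concrete, here is a minimal sketch that reproduces the two-state recession/boom example from the overview (the probabilities and kernel values are the illustrative numbers quoted there, not data):

```python
import numpy as np

# Illustrative two-state example: (recession, boom)
f = np.array([0.5, 0.5])        # natural probabilities
φ = np.array([1.2, 0.7])        # pricing kernel values

# Arrow prices are kernel values times natural probabilities
p = φ * f

# The row sum prices a sure one-dollar payoff: the riskless discount factor
discount = p.sum()

# Risk-neutral probabilities normalize Arrow prices by the bond price
q = p / discount

print(p, discount, np.round(q, 2))
```

The recession state is overweighted under the risk-neutral measure ($q_1 \approx 0.63 > 0.5$) purely because the kernel is high there.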
### The pricing kernel -From the Fundamental Theorem of Asset Pricing, the pricing kernel -$\phi(\theta_i, \theta_j)$ relates state prices to natural probabilities via +Using the stochastic-discount-factor notation studied in {doc}`markov_asset` and the +Arrow-security notation used in {doc}`ge_arrow`, the pricing kernel $\phi(\theta_i, +\theta_j)$ relates state prices to natural probabilities via $$ p(\theta_i, \theta_j) = \phi(\theta_i, \theta_j) \, f(\theta_i, \theta_j), $$ -where $f(\theta_i, \theta_j)$ is the natural (conditional) probability of -transitioning from state $\theta_i$ to $\theta_j$. +where $f(\theta_i, \theta_j)$ is the natural (conditional) probability of transitioning +from state $\theta_i$ to $\theta_j$. -In the canonical representative-agent model with additively separable utility -and discount factor $\delta$, the first-order condition gives +As in the representative-agent equilibrium calculation in {doc}`ge_arrow`, the +canonical additively separable model with discount factor $\delta$ gives $$ \phi(\theta_i, \theta_j) = \frac{p(\theta_i, \theta_j)}{f(\theta_i, \theta_j)} = \frac{\delta U'(c(\theta_j))}{U'(c(\theta_i))}. -$$ +$$ (eq:canon_ge) The key structural property this implies is **transition independence**. ### Transition independence -**Definition.** A pricing kernel is *transition independent* if there exists a -positive function $h$ on the state space and a positive scalar $\delta$ such -that for every transition from state $\theta_i$ to $\theta_j$, +```{prf:definition} Transition Independence +:label: def-transition-independence + +A pricing kernel is **transition independent** if there exists a positive function $h$ on +the state space and a positive scalar $\delta$ such that for every transition from state +$\theta_i$ to $\theta_j$, $$ \phi(\theta_i, \theta_j) = \delta \, \frac{h(\theta_j)}{h(\theta_i)}. $$ +``` + + +Transition independence says the kernel depends on the *ending* state and normalizes by +the *beginning* state. 
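The definition is easy to visualize as a matrix; below is a small sketch with an arbitrary positive $h$ and a discount factor chosen purely for illustration:

```python
import numpy as np

δ = 0.97                         # subjective discount factor (illustrative)
h = np.array([2.0, 1.5, 1.0])    # positive function of the state (illustrative)

# Transition-independent kernel: φ_ij = δ h(θ_j) / h(θ_i)
φ = δ * h[None, :] / h[:, None]

print(np.round(φ, 3))
```

Every row is proportional to $h$: the dependence on the ending state is common across rows, while rows differ only through the beginning-state normalization $1/h(\theta_i)$.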
+It holds for any agent with intertemporally additively separable utility (where $h = U'$).

-Transition independence says the kernel depends on the *ending* state and
-normalizes by the *beginning* state.
+In particular, this holds for {eq}`eq:canon_ge`.

-It holds for any agent with
-intertemporally additive separable utility (where $h = U'$) and also for
-Epstein–Zin recursive preferences {cite}`Epstein_Zin1989`.
+Ross also notes that some Epstein--Zin specifications can produce transition-independent
+kernels {cite}`Epstein_Zin1989`, although {doc}`misspecified_recovery` shows that
+recursive utility with nontrivial continuation-value martingales need not satisfy the
+Ross restriction.

Under transition independence, the state-price equation becomes

@@ -165,8 +187,8 @@ $$

### Reduction to an eigenvalue problem

-Since $F$ is a stochastic matrix, its rows sum to one: $F e = e$ where $e$
-is the vector of ones.
+Since $F$ is a stochastic matrix, its rows sum to one: $F e = e$ where $e$ is the vector
+of ones.

Substituting the expression for $F$:

@@ -176,33 +198,70 @@ $$
P z = \delta z, \quad z \equiv D^{-1} e.
$$

-This is an **eigenvalue problem**: we seek a positive vector $z$ and scalar
-$\delta$ satisfying $Pz = \delta z$.
+This is an **eigenvalue problem**: we seek a positive vector $z$ and scalar $\delta$
+satisfying $Pz = \delta z$.

-The Perron–Frobenius theorem guarantees exactly one such solution when $P$ is
-nonnegative and irreducible.
+In principle every eigenvalue-eigenvector pair of $P$ is a formal solution, but only the
+one with a strictly positive eigenvector is economically valid: $D_{ii} = 1/z_i$ must be
+positive (so $z_i > 0$), and $F$ must have nonnegative entries.

-**Theorem (Perron–Frobenius).** Every nonnegative irreducible matrix has a
-unique positive eigenvector (up to scaling) and a unique largest positive
-eigenvalue.
+The theorem below guarantees that exactly one such pair exists.
-Section 1.2.3 of {cite}`Sargent_Stachurski_2024` provides a proof of this theorem as well as a discussion of its applications to economic networks. +```{prf:theorem} Perron--Frobenius +:label: thm-perron-frobenius + +If $A$ is a nonnegative irreducible matrix, then + +1. $A$ has a unique largest positive real eigenvalue $r$ (the Perron root). +2. There exists a strictly positive eigenvector $z \gg 0$ with $Az = rz$, + unique up to scaling. +``` + +The proof uses the invariance of the positive cone and irreducibility to isolate the +unique positive ray associated with the Perron root. + +See Section 1.2.3 of {cite}`Sargent_Stachurski_2024` for details. + +Applied to the recovery problem: the Perron root is $\delta$ (the subjective discount +factor) and the Perron vector $z$ determines $D$ via $D_{ii} = 1/z_i$, closing the +system uniquely. ### Ross's recovery theorem +The three assumptions in the theorem each carry a specific role. + +No-arbitrage guarantees that $P$ has nonnegative entries and that the state prices +encode a well-defined pricing measure. + +Irreducibility ensures the economy is not divided into disconnected sub-economies -- +without it, the Perron–Frobenius theorem gives multiple candidate eigenvectors and +recovery breaks down. + +Transition independence is the key economic restriction: it says the pricing kernel +factors as $\delta h(\theta_j)/h(\theta_i)$, so the entire kernel is pinned down by a +single vector $h$ (or equivalently $z$). + -**Theorem 1 (Recovery Theorem, {cite}`Ross2015`).** Suppose prices provide no -arbitrage opportunities, that the state price transition matrix $P$ is irreducible, and that the -pricing kernel is transition independent. Then there exists a *unique* -positive solution $(\delta, z, F)$ to the recovery problem. That is, for any -set of state prices there is a unique compatible natural probability transition -matrix and a unique pricing kernel. 
+```{prf:theorem} Recovery Theorem +:label: thm-ross-recovery +Suppose prices provide no arbitrage opportunities, that the state +price transition matrix $P$ is irreducible, and that the pricing kernel is transition +independent. -*Proof sketch.* Because $P$ is nonnegative and irreducible, the -Perron–Frobenius theorem gives a unique positive eigenvector $z > 0$ with -positive eigenvalue $\lambda > 0$ satisfying $Pz = \lambda z$. Setting +Then there exists a *unique* positive solution $(\delta, z, F)$ to the recovery problem. + +That is, for any set of state prices there is a unique compatible natural probability +transition matrix and a unique pricing kernel. +``` + +```{prf:proof} +Because $P$ is nonnegative and irreducible, the Perron–Frobenius theorem gives a unique +positive eigenvector $z \gg 0$ with positive eigenvalue $\lambda > 0$ satisfying +$Pz = \lambda z$. + +Setting $$ \delta = \lambda, \qquad D_{ii} = \frac{1}{z_i}, @@ -214,39 +273,64 @@ $$ f_{ij} = \frac{1}{\delta} \frac{z_j}{z_i} \, p_{ij}. $$ -One can verify that $F$ is indeed a stochastic matrix: all entries are -positive and each row sums to one. +To confirm $F$ is stochastic, note that all entries are nonnegative (since +$p_{ij} \geq 0$ and $z_i, z_j > 0$) and + +$$ +\sum_j f_{ij} += \frac{1}{\delta z_i} \sum_j z_j \, p_{ij} += \frac{[Pz]_i}{\delta z_i} += \frac{\delta z_i}{\delta z_i} = 1. +$$ -Uniqueness follows from the uniqueness of -the Perron–Frobenius eigenvector. $\blacksquare$ +Uniqueness follows from the uniqueness of the Perron--Frobenius eigenvector. +``` ### Pricing kernel from the eigenvector The recovered kernel values are $$ -\phi(\theta_i, \theta_j) = \frac{\delta}{z_i / z_j} - = \frac{z_j}{z_i} \cdot \frac{1}{1}, +\phi(\theta_i, \theta_j) = \delta \frac{z_i}{z_j}, +\qquad h(\theta_i) = \frac{\delta}{z_i}, $$ -so the kernel at state $\theta_i$ (relative to a baseline state) is $1/z_i$. +where $h(\theta_i) = \delta/z_i$ follows from $D_{ii} = h(\theta_i)/\delta = 1/z_i$. 
+ +Destination states with high $z_j$ have **low** kernel values: for a fixed origin $i$, +the kernel $\delta z_i/z_j$ is decreasing in $z_j$. -States with high $z_i$ have **low** kernel values, meaning the market assigns -relatively less pricing weight per unit of probability — consistent with those -states being "good times." +This means the market assigns relatively less pricing weight per unit of probability to +high-$z_j$ outcomes -- consistent with those states being "good times" that require less +insurance. ### Corollary: risk-neutral pricing when rates are state-independent -**Theorem 2 ({cite}`Ross2015`).** If the riskless rate is the same in all -states ($Pe = \gamma e$ for some scalar $\gamma$), then the unique natural -distribution consistent with recovery is the risk-neutral (martingale) -distribution itself: $F = (1/\gamma) P$. +```{prf:corollary} +:label: cor-risk-neutral-recovery + +If the riskless rate is the same in all states ($Pe = \gamma e$ for +some scalar $\gamma$), then the unique natural distribution consistent with recovery is +the risk-neutral (martingale) distribution itself: $F = (1/\gamma) P$. +``` + +```{prf:proof} +When $Pe = \gamma e$, the vector of ones $e$ is the Perron eigenvector with eigenvalue +$\gamma$. +By the uniqueness part of the Perron--Frobenius theorem, $z = e$ (up to scaling) and +$\delta = \gamma$. -This remarkable result says that with a constant interest rate and a bounded -irreducible state space, recovery forces risk-neutrality — a non-trivial -restriction of the model. +Setting $z = e$ gives $D = I$ (the identity matrix), so + +$$ +F = \frac{1}{\delta} D P D^{-1} = \frac{1}{\gamma} P. \qquad \square +$$ +``` + +This remarkable result says that with a constant interest rate and a bounded irreducible +state space, recovery forces risk-neutrality -- a non-trivial restriction of the model. ## Python implementation @@ -254,73 +338,64 @@ We now implement the Recovery Theorem numerically. 
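As a warm-up, the corollary can be checked in a few lines on a made-up constant-rate example: scale any irreducible stochastic matrix by a common discount factor so that all row sums are equal, and recovery must return that stochastic matrix itself.

```python
import numpy as np

# Made-up natural transition matrix (any irreducible stochastic matrix works)
F0 = np.array([[0.6, 0.3, 0.1],
               [0.2, 0.5, 0.3],
               [0.1, 0.4, 0.5]])
γ_const = 0.96                   # common riskless discount factor
P_const = γ_const * F0           # every row of P sums to γ_const

# The Perron pair is the eigenvalue γ_const with the ones vector
vals, vecs = np.linalg.eig(P_const)
i = np.argmax(vals.real)
z = vecs[:, i].real
z = z / z[0]                     # normalize the scale

# With z = e we get D = I, so recovery returns F = P / γ_const = F0
F_rec = P_const / γ_const

print(np.round(vals.real[i], 6), np.round(z, 6))
print(np.allclose(F_rec, F0))    # True
```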
### Building a state price matrix from a lognormal model -Following {cite}`Ross2015` Section IV, suppose the natural distribution of -log-returns over one period is normal: +Following {cite}`Ross2015` Section IV, suppose the natural distribution of log-returns +over one period is normal: $$ \log(S_T/S_0) \sim \mathcal{N}\!\left((\mu - \tfrac{1}{2}\sigma^2)T, \sigma^2 T\right). $$ -With a CRRA pricing kernel $\phi(S_0, S_T) = e^{-\delta T}(S_T/S_0)^{-\gamma}$, -the state price density is +With a CRRA pricing kernel $\phi(S_0, S_T) = e^{-\delta T}(S_T/S_0)^{-\gamma}$, the +state price density is $$ -P_T(s, s_T) = e^{-\delta T} e^{-\gamma(s_T - s)} \, - n\!\left(\frac{s_T - s - (\mu - \frac{1}{2}\sigma^2)T}{\sigma\sqrt{T}}\right), +f_{ij} \propto + n\!\left(\frac{s_j - s_i - (\mu - \frac{1}{2}\sigma^2)T}{\sigma\sqrt{T}}\right) + \Delta s, +\qquad +p_{ij} = e^{-\delta T} e^{-\gamma(s_j - s_i)} f_{ij}, $$ -where $s = \ln S_0$, $s_T = \ln S_T$, and $n(\cdot)$ is the standard normal -density. +where $s_i = \ln S_i$, $s_j = \ln S_j$, $n(\cdot)$ is the standard normal density, and +the discretized probabilities $f_{ij}$ are normalized row by row. We discretize this onto a grid of $m$ states and build the matrix $P$. ```{code-cell} ipython3 def build_state_price_matrix(μ, σ, γ, δ, T=1.0, n_states=11, n_σ=5): - """ - Build an m x m state price transition matrix for the lognormal / CRRA model. 
- - Parameters - ---------- - μ : float Natural expected log-return (annualised) - σ : float Volatility (annualised) - γ : float Coefficient of relative risk aversion - δ : float Subjective discount rate - T : float Horizon (years) for one period - n_states : int Number of discrete states - n_σ : int Grid half-width in standard deviations - - Returns - ------- - P : (m, m) array State price matrix - states : (m,) array State values (log-return grid) - """ - # Equally-spaced grid from -n_σ*σ to +n_σ*σ + """Build a discretized lognormal/CRRA state-price matrix.""" states = np.linspace(-n_σ * σ * np.sqrt(T), n_σ * σ * np.sqrt(T), n_states) - ds = states[1] - states[0] # grid spacing + ds = states[1] - states[0] m = n_states P = np.zeros((m, m)) + F = np.zeros((m, m)) drift = (μ - 0.5 * σ**2) * T + # First build a row-stochastic natural transition matrix on the bounded grid. for i in range(m): s_i = states[i] for j in range(m): s_j = states[j] log_return = s_j - s_i - # Normal density evaluated at s_j given s_i - n_val = norm.pdf(log_return, loc=drift, scale=σ * np.sqrt(T)) - # Pricing kernel + F[i, j] = norm.pdf(log_return, loc=drift, + scale=σ * np.sqrt(T)) * ds + + F[i] = F[i] / F[i].sum() + + # Price each Arrow claim as natural probability times the CRRA kernel. 
+ for j in range(m): + log_return = states[j] - s_i kernel = np.exp(-δ * T) * np.exp(-γ * log_return) - P[i, j] = kernel * n_val * ds + P[i, j] = kernel * F[i, j] return P, states ``` ```{code-cell} ipython3 -# Parameters matching the numerical example in Ross (2015), Section IV μ = 0.08 # 8% annual expected return σ = 0.20 # 20% annual volatility γ = 3.0 # CRRA coefficient @@ -330,10 +405,9 @@ T = 1.0 # one-year horizon P, states = build_state_price_matrix(μ, σ, γ, δ, T, n_states=11, n_σ=5) -print("State price matrix P (rows = current state, cols = future state)") -print("Row sums (should equal discount factor e^{-r}):") +print("State-price row sums:") print(np.round(P.sum(axis=1), 4)) -print(f"\nImplied annual interest rate: {-np.log(P[5].sum()):.4f}") +print(f"Middle-state risk-free rate: {-np.log(P[5].sum()):.4f}") ``` ### Applying the recovery theorem @@ -342,22 +416,12 @@ The Recovery Theorem requires computing the **dominant eigenvector** of $P$. ```{code-cell} ipython3 def recover_natural_distribution(P): - """ - Apply the Recovery Theorem to state price matrix P. - - Returns - ------- - F : (m, m) array Recovered natural probability transition matrix - z : (m,) array Dominant eigenvector of P (Perron vector) - δ : float Recovered subjective discount rate - φ : (m,) array Recovered kernel values (relative to state 0) - """ + """Recover natural probabilities and the pricing kernel from state prices.""" m = P.shape[0] - # Compute all eigenvalues and right eigenvectors eigenvalues, eigenvectors = eig(P) - # Find the dominant (Perron) eigenvalue — largest positive real one + # Ross recovery uses the Perron root and its strictly positive eigenvector. 
real_mask = np.isreal(eigenvalues) real_eigenvalues = eigenvalues[real_mask].real real_eigenvectors = eigenvectors[:, real_mask].real @@ -366,25 +430,21 @@ def recover_natural_distribution(P): δ_recovered = real_eigenvalues[idx] z = real_eigenvectors[:, idx] - # Ensure z is positive (Perron vector) if np.mean(z) < 0: z = -z - # Normalize so that z[reference] = 1 z = z / z[m // 2] - # Diagonal matrix D with D_ii = 1/z_i D = np.diag(1.0 / z) D_inv = np.diag(z) - # Recover natural probability matrix + # The diagonal similarity transform converts state prices into probabilities. F = (1.0 / δ_recovered) * D @ P @ D_inv - # Clip small numerical negatives and renormalize rows F = np.clip(F, 0, None) F = F / F.sum(axis=1, keepdims=True) - # Pricing kernel relative to middle state + # The kernel is reported relative to the middle state normalization. φ = 1.0 / z return F, z, δ_recovered, φ @@ -393,10 +453,11 @@ def recover_natural_distribution(P): ```{code-cell} ipython3 F, z, δ_rec, φ = recover_natural_distribution(P) -print(f"Recovered discount rate δ = {δ_rec:.6f} (true: {np.exp(-δ):.6f})") -print(f"\nRecovered kernel φ (monotone decreasing in good states):") +print(f"Recovered discount factor δ = {δ_rec:.6f} (true: {np.exp(-δ):.6f})") +print(f"Recovered discount rate = {-np.log(δ_rec):.6f} (true: {δ:.6f})") +print("Recovered kernel φ:") print(np.round(φ, 4)) -print(f"\nNatural probability matrix F (row sums should be 1):") +print("Row sums of recovered F:") print(np.round(F.sum(axis=1), 6)) ``` @@ -405,51 +466,24 @@ print(np.round(F.sum(axis=1), 6)) A key insight of {cite}`Ross2015` is that the natural distribution systematically differs from the risk-neutral one. -In particular, the natural distribution -stochastically dominates the risk-neutral distribution (Theorem 3 in {cite}`Ross2015`). +In particular, the natural distribution stochastically dominates the risk-neutral +distribution (Theorem 3 in {cite}`Ross2015`). 
```{code-cell} ipython3 -def get_marginal(transition_matrix, initial_row, n_periods, states_exp): - """ - Compute the marginal distribution at horizon n_periods by iterating - the transition matrix starting from a given initial distribution. - - Parameters - ---------- - transition_matrix : (m, m) array - initial_row : int Index of the starting state - n_periods : int Horizon - states_exp : (m,) array Gross return levels exp(states) - """ - m = transition_matrix.shape[0] - # Start with all weight on the initial row - dist = np.zeros(m) - dist[initial_row] = 1.0 - - for _ in range(n_periods): - dist = dist @ transition_matrix - - return dist -``` - -```{code-cell} ipython3 -# Starting from the middle state (current state = S_0) mid = len(states) // 2 -# Risk-neutral transition matrix Q_ij = P_ij / sum_k P_ik (row normalise P) row_sums = P.sum(axis=1, keepdims=True) -Q_rn = P / row_sums # risk-neutral probabilities -# One-period marginals -f_nat = F[mid, :] # natural: row of recovered F -f_rn = Q_rn[mid, :] # risk-neutral: row of Q +# Normalize Arrow prices by the one-period riskless bond price in each state. 
+Q_rn = P / row_sums + +f_nat = F[mid, :] +f_rn = Q_rn[mid, :] -# State labels in gross return terms gross_returns = np.exp(states) fig, axes = plt.subplots(1, 2, figsize=(14, 5)) -# Panel A: densities axes[0].plot(gross_returns, f_nat, 'b-o', ms=5, label='natural (recovered)', lw=2) axes[0].plot(gross_returns, f_rn, 'r--s', ms=5, label='risk-neutral', lw=2) axes[0].set_xlabel('gross return $S_T / S_0$') @@ -457,7 +491,6 @@ axes[0].set_ylabel('probability') axes[0].set_title('one-period marginal distributions') axes[0].legend() -# Panel B: pricing kernel axes[1].plot(gross_returns, φ, 'g-^', ms=5, lw=2) axes[1].set_xlabel('gross return $S_T / S_0$') axes[1].set_ylabel('kernel $\\phi$ (relative)') @@ -466,36 +499,35 @@ plt.show() ``` ```{code-cell} ipython3 -# Compute summary statistics E_nat = np.sum(f_nat * gross_returns) E_rn = np.sum(f_rn * gross_returns) std_nat = np.sqrt(np.sum(f_nat * (gross_returns - E_nat)**2)) std_rn = np.sqrt(np.sum(f_rn * (gross_returns - E_rn)**2)) -risk_free = np.sum(P[mid]) # price of riskless bond from middle state +# The row sum is the price of a sure one-dollar payoff next period. +risk_free = np.sum(P[mid]) -print("Summary Statistics (one-period horizon)") +print("One-period summary") print(f"{'':30s} {'Natural':>12s} {'Risk-Neutral':>12s}") print("-" * 57) print(f"{'Expected gross return':30s} {E_nat:>12.4f} {E_rn:>12.4f}") print(f"{'Std dev':30s} {std_nat:>12.4f} {std_rn:>12.4f}") print(f"{'Risk-free discount factor':30s} {risk_free:>12.4f}") print(f"{'Annual risk-free rate':30s} {-np.log(risk_free):>12.4f}") -print(f"{'Equity risk premium':30s} {E_nat - 1/risk_free:>12.4f}") +print(f"{'Arithmetic equity premium':30s} {E_nat - 1/risk_free:>12.4f}") ``` ### Stochastic dominance -Theorem 3 of {cite}`Ross2015` shows that the natural marginal density -**first-order stochastically dominates** the risk-neutral density: the CDF of -the natural distribution lies *below* that of the risk-neutral distribution. 
+Theorem 3 of {cite}`Ross2015` shows that the natural marginal density **first-order +stochastically dominates** the risk-neutral density: the CDF of the natural distribution +lies *below* that of the risk-neutral distribution. -Because the pricing kernel is declining (investors fear bad -outcomes), risk-neutral probabilities overweight bad states and underweight -good states relative to the natural measure. +Because the pricing kernel is declining (investors fear bad outcomes), risk-neutral +probabilities overweight bad states and underweight good states relative to the natural +measure. ```{code-cell} ipython3 -# CDFs cdf_nat = np.cumsum(f_nat) cdf_rn = np.cumsum(f_rn) @@ -508,40 +540,31 @@ ax.set_title('stochastic dominance: natural cdf lies below risk-neutral cdf') ax.legend() plt.show() -# Verify dominance: natural CDF should be <= risk-neutral CDF at every point print(f"Natural CDF <= Risk-neutral CDF at all states: " f"{np.all(cdf_nat <= cdf_rn + 1e-10)}") ``` ## Extracting the pricing kernel and risk premium -The pricing kernel recovered from $P$ via the Perron–Frobenius theorem has the following interpretation. +The pricing kernel recovered from $P$ via the Perron–Frobenius theorem has the following +interpretation. -In the CRRA model the kernel is proportional to -$\exp(-\gamma \cdot \text{log-return})$, so it is decreasing in the return. +In the CRRA model the kernel is proportional to $\exp(-\gamma \cdot \text{log-return})$, +so it is decreasing in the return. -The **equity risk premium** can be computed as the difference between the -expected return under the natural measure and the risk-free rate: +The **log equity risk premium** computed below is the log expected gross return under +the natural measure minus the continuously compounded risk-free rate: $$ -\text{ERP} = E^f[R] - r_f = \frac{\sum_j f_{ij}\, (S_j/S_i)}{\sum_j p_{ij}} - 1. 
+\text{log ERP}_i += \log\left(\sum_j f_{ij}\frac{S_j}{S_i}\right) - r_{f,i}, +\qquad +r_{f,i} = -\log\left(\sum_j p_{ij}\right). $$ ```{code-cell} ipython3 def compute_risk_premia(P, F, states): - """ - Compute the equity risk premium for each starting state. - - Parameters - ---------- - P, F : (m, m) arrays State price and natural probability matrices - states : (m,) array Log-state grid - - Returns - ------- - erp : (m,) array Equity risk premium from each starting state - rf : (m,) array Risk-free rate from each starting state - """ + """Compute log equity premia and risk-free rates by starting state.""" m = len(states) gross_returns = np.exp(states) @@ -549,14 +572,12 @@ def compute_risk_premia(P, F, states): erp = np.zeros(m) for i in range(m): - discount = P[i].sum() # riskless discount factor - rf[i] = -np.log(discount) # risk-free rate + discount = P[i].sum() + rf[i] = -np.log(discount) - # Expected gross return under natural measure - # We compute E[S_j/S_i] = sum_j F_ij * exp(s_j - s_i) + # Gross return from current state i to future state j. 
relative_returns = np.exp(states - states[i]) E_R_nat = np.sum(F[i] * relative_returns) - E_R_rn = np.sum((P[i] / discount) * relative_returns) erp[i] = np.log(E_R_nat) - rf[i] @@ -574,21 +595,23 @@ axes[0].set_title('risk-free rate by state') axes[1].plot(np.exp(states), erp * 100, 'r-^', ms=5, lw=2) axes[1].set_xlabel('current state $S / S_0$') -axes[1].set_ylabel('equity risk premium (%)') -axes[1].set_title('recovered equity risk premium by state') +axes[1].set_ylabel('log equity risk premium (%)') +axes[1].set_title('recovered log equity risk premium by state') plt.show() mid = len(states) // 2 -print(f"At the middle state:") -print(f" Risk-free rate approx {rf[mid]*100:.2f}% (true: {δ*100:.2f}%)") -print(f" Equity premium approx {erp[mid]*100:.2f}% (true: {(μ-δ)*100:.2f}%)") +print("Middle state:") +print(f" Risk-free rate approx {rf[mid]*100:.2f}% " + f"(calibration: {δ*100:.2f}%)") +print(f" Log equity premium approx {erp[mid]*100:.2f}% " + f"(calibration: {(μ-δ)*100:.2f}%)") ``` ## Sensitivity analysis: effect of risk aversion -The shape of the pricing kernel, and hence the gap between natural and -risk-neutral probabilities, depends on the coefficient of risk aversion $\gamma$. +The shape of the pricing kernel, and hence the gap between natural and risk-neutral +probabilities, depends on the coefficient of risk aversion $\gamma$. ```{code-cell} ipython3 γs = [1.0, 2.0, 3.0, 5.0, 8.0] @@ -606,8 +629,6 @@ for γ_val, color in zip(γs, colors): f_rn_g = P_g[mid_g] / row_sum gross = np.exp(states_g) - erp_val = (np.sum(f_nat_g * np.exp(states_g - states_g[mid_g])) - - np.exp(δ_g)) * 100 axes[0].plot(gross, φ_g, color=color, lw=2, label=f'$\\gamma={γ_val:.0f}$') @@ -628,27 +649,27 @@ axes[1].legend(fontsize=9) plt.show() ``` -The plots confirm the single-crossing property from Theorem 3 of -{cite}`Ross2015`: for returns below some threshold $v$, risk-neutral -probability exceeds natural probability; above $v$ the natural probability -dominates. 
+The plots confirm the single-crossing property from Theorem 3 of {cite}`Ross2015`: for
+returns below some threshold $v$, risk-neutral probability exceeds natural probability;
+above $v$ the natural probability dominates.

A higher $\gamma$ amplifies this wedge.

## Recovering the discount rate

-A useful by-product of the Recovery Theorem is the **recovered subjective
-discount rate** $\delta$, which equals the Perron–Frobenius eigenvalue of $P$.
+A useful by-product of the Recovery Theorem is the **recovered subjective discount
+factor** $\delta$, which equals the Perron–Frobenius eigenvalue of $P$.

-Corollary 1 of {cite}`Ross2015` states that $\delta$ is bounded above by the
-largest observed interest factor (i.e., the maximum row sum of $P$):
+The corresponding continuously compounded discount rate is $-\log \delta$.
+
+Corollary 1 of {cite}`Ross2015` states that $\delta$ is bounded above by the largest
+observed riskless discount factor (i.e., the maximum row sum of $P$):

$$
\delta \leq \max_i \sum_j p(\theta_i, \theta_j).
$$

```{code-cell} ipython3
-# Vary the true discount rate and check how well we recover it
true_δs = np.linspace(0.00, 0.06, 13)
recovered_δs = []

@@ -671,25 +692,23 @@ plt.show()

## Tail risk: natural vs. risk-neutral probabilities of catastrophe

-One of the most striking applications of the Recovery Theorem is its ability
-to separate the market's genuine fear of catastrophes from the risk premium
-attached to them.
+One of the most striking applications of the Recovery Theorem is its ability to separate
+the market's genuine fear of catastrophes from the risk premium attached to them.

-{cite}`barro2006rare` and {cite}`MehraPrescott1985` discuss how rare disasters
-might explain the equity premium puzzle.
+{cite:t}`barro2006rare` and {cite:t}`MehraPrescott1985` discuss how rare disasters might
+explain the equity premium puzzle.
-The risk-neutral probability of a
-large decline is elevated both because (a) the market assigns a high natural
-probability to such events and (b) the pricing kernel upweights bad outcomes.
+The risk-neutral probability of a large decline is elevated both because (a) the market
+assigns a high natural probability to such events and (b) the pricing kernel upweights
+bad outcomes.
 
 Ross's Recovery Machinery lets us decompose these two forces.
 
 ```{code-cell} ipython3
-# Compare left-tail probabilities: P(R < threshold) under each measure
-thresholds = np.linspace(-0.40, 0.10, 200)  # log-returns
+thresholds = np.linspace(-0.40, 0.10, 200)
 
 def tail_prob(f_dist, states, threshold):
-    """CDF evaluated at threshold (log-return)."""
+    """Left-tail probability for log returns."""
    return float(np.sum(f_dist[states <= threshold]))
 
 P_base, states_base = build_state_price_matrix(
@@ -715,7 +734,6 @@ ax.axvline(x=0.70, color='silver', ls=':', lw=1.5, label='30% decline')
 ax.legend()
 plt.show()
 
-# Print specific tail probabilities
 for thresh, label in [(-0.25, '25% decline'), (-0.30, '30% decline'),
     p_n = tail_prob(f_nat_base, states_base, thresh)
@@ -724,16 +742,16 @@ for thresh, label in [(-0.25, '25% decline'), (-0.30, '30% decline'),
           f"Risk-Neutral = {p_r:.4f}, Ratio = {p_r/p_n:.2f}x")
 ```
 
-The risk-neutral density assigns higher probability to large drops than the
-recovered natural density.
+The risk-neutral density assigns higher probability to large drops than the recovered
+natural density.
 
-The ratio captures the additional weight from risk
-aversion — the premium investors demand to bear tail risk.
+The ratio captures the additional weight from risk aversion -- the premium investors
+demand to bear tail risk.
## Testing efficient markets -{cite}`Ross2015` shows that once the pricing kernel is recovered, one obtains -an **upper bound on the Sharpe ratio** for any investment strategy: +{cite:t}`Ross2015` shows that once the pricing kernel is recovered, one obtains an **upper +bound on the Sharpe ratio** for any investment strategy: $$ \sigma(\phi) \geq e^{-rT} \frac{|\mu_\text{excess}|}{\sigma_\text{asset}}, @@ -741,11 +759,10 @@ $$ where $\sigma(\phi)$ is the standard deviation of the pricing kernel. -This -follows from the Hansen–Jagannathan bound {cite}`Hansen_Jagannathan_1991`. +This follows from the Hansen–Jagannathan bound {cite}`Hansen_Jagannathan_1991`. -Equivalently, the $R^2$ of any return-forecasting regression using publicly -available information is bounded above by the variance of the pricing kernel: +Equivalently, the $R^2$ of any return-forecasting regression using publicly available +information is bounded above by the variance of the pricing kernel: $$ R^2 \leq e^{2rT} \, \mathrm{Var}(\phi). @@ -753,7 +770,7 @@ $$ ```{code-cell} ipython3 def kernel_variance(φ, f_nat): - """Variance of the pricing kernel under the natural measure.""" + """Return Var(φ) and E[φ].""" E_φ = np.sum(φ * f_nat) E_φ2 = np.sum(φ**2 * f_nat) return E_φ2 - E_φ**2, E_φ @@ -762,7 +779,7 @@ def kernel_variance(φ, f_nat): var_φ, E_φ = kernel_variance(φ_base, f_nat_base) std_φ = np.sqrt(var_φ) -print(f"Pricing kernel statistics (one year):") +print("Pricing kernel statistics:") print(f" E[φ] = {E_φ:.4f}") print(f" Var(φ) = {var_φ:.4f}") print(f" Std(φ) = {std_φ:.4f}") @@ -772,35 +789,39 @@ print(f"Upper bound on R^2 in return forecasting: {var_φ:.4f}") ## Limitations and extensions -The Recovery Theorem is a remarkable theoretical result, but several caveats -apply in practice. +The Recovery Theorem is a remarkable theoretical result, but several caveats apply in +practice. + +**Finite state space.** + +The theorem requires a bounded, irreducible Markov chain. 
+ +In continuous, unbounded state spaces (e.g., a lognormal diffusion), uniqueness fails +because any exponential $e^{\alpha x}$ satisfies the characteristic equation. + +{cite:t}`CarrYu2012` establish recovery with a bounded diffusion. -**Finite state space.** The theorem requires a bounded, irreducible Markov -chain. +**Transition independence.** -In continuous, unbounded state spaces (e.g., a lognormal diffusion), -uniqueness fails because any exponential $e^{\alpha x}$ satisfies the -characteristic equation. +If the kernel is not transition independent, recovery is not guaranteed. -{cite}`CarrYu2012` establish recovery with a bounded -diffusion. +{cite:t}`BorovickaHansenScheinkman2016` show that the Ross recovery can confound the +long-run risk component of the kernel with the natural probability distribution, +yielding an incorrect decomposition. -**Transition independence.** If the kernel is not transition independent, -recovery is not guaranteed. +**Empirical estimation.** -{cite}`BorovickaHansenScheinkman2016` show that -the Ross recovery can confound the long-run risk component of the kernel with -the natural probability distribution, yielding an incorrect decomposition. +Extracting reliable state prices from observed option prices requires careful +interpolation and extrapolation. -**Empirical estimation.** Extracting reliable state prices from observed option -prices requires careful interpolation and extrapolation. +The mapping from implied volatilities to state prices via the +{cite}`BreedenLitzenberger1978` formula involves second derivatives, which amplify +measurement error. -The mapping from -implied volatilities to state prices via the {cite}`BreedenLitzenberger1978` formula involves second derivatives, which amplify measurement error. +**State dependence.** -**State dependence.** The state must capture all relevant variables: the level -of volatility, not just the current index level, is an important state variable -for equity options. 
+The state must capture all relevant variables: the level of volatility, not just the +current index level, is an important state variable for equity options. ## Exercises @@ -813,13 +834,14 @@ Consider the $3 \times 3$ state price matrix $$ P = \begin{pmatrix} -0.8 & 0.12 & 0.02 \\ -0.10 & 0.75 & 0.10 \\ -0.05 & 0.15 & 0.72 +0.5950 & 0.1700 & 0.0272 \\ +0.1594 & 0.5525 & 0.1360 \\ +0.0664 & 0.3188 & 0.5525 \end{pmatrix}. $$ -(a) Compute the dominant eigenvalue $\delta$ and the corresponding eigenvector $z$ of $P$. +(a) Compute the dominant eigenvalue $\delta$ and the corresponding eigenvector $z$ of +$P$. (b) Use $z$ to recover the natural probability transition matrix $F$ via @@ -829,9 +851,10 @@ $$ (c) Verify that each row of $F$ sums to one and all entries are positive. -(d) Compute the pricing kernel $\phi_i = 1/z_i$ for each state. Does the - kernel decrease as we move from state 1 to state 3 (i.e., from bad to - good states)? +(d) Compute the pricing kernel $\phi_i = 1/z_i$ for each state. + +Does the kernel decrease as we move from state 1 to state 3 (i.e., from bad to good +states)? 
``` ```{solution-start} rt_ex1 @@ -842,11 +865,10 @@ $$ import numpy as np from scipy.linalg import eig -# (a) Dominant eigenvalue and eigenvector P_ex = np.array([ - [0.80, 0.12, 0.02], - [0.10, 0.75, 0.10], - [0.05, 0.15, 0.72] + [0.5950, 0.1700, 0.0272], + [0.159375, 0.5525, 0.1360], + [0.06640625, 0.31875, 0.5525] ]) eigenvalues, eigenvectors = eig(P_ex) @@ -859,27 +881,24 @@ idx = np.argmax(real_ev) z_ex = real_evec[:, idx] if z_ex.min() < 0: z_ex = -z_ex -z_ex = z_ex / z_ex[1] # normalise to middle state +z_ex = z_ex / z_ex[1] -print(f"(a) Dominant eigenvalue δ = {δ_ex:.6f}") -print(f" Eigenvector z = {z_ex}") +print(f"δ = {δ_ex:.6f}") +print(f"z = {z_ex}") -# (b) Recover F D_ex = np.diag(1.0 / z_ex) D_inv_ex = np.diag(z_ex) F_ex = (1.0 / δ_ex) * D_ex @ P_ex @ D_inv_ex -print(f"\n(b) Recovered natural transition matrix F:") +print("\nRecovered F:") print(np.round(F_ex, 4)) -# (c) Row sums -print(f"\n(c) Row sums of F: {np.round(F_ex.sum(axis=1), 8)}") -print(f" All non-negative: {(F_ex >= -1e-10).all()}") +print(f"\nRow sums: {np.round(F_ex.sum(axis=1), 8)}") +print(f"Nonnegative: {(F_ex >= -1e-10).all()}") -# (d) Pricing kernel φ_ex = 1.0 / z_ex -print(f"\n(d) Pricing kernel φ = {np.round(φ_ex, 4)}") -print(f" Kernel decreasing state 1->3: {φ_ex[0] > φ_ex[1] > φ_ex[2]}") +print(f"\nφ = {np.round(φ_ex, 4)}") +print(f"Decreasing: {φ_ex[0] > φ_ex[1] > φ_ex[2]}") ``` ```{solution-end} @@ -890,18 +909,18 @@ print(f" Kernel decreasing state 1->3: {φ_ex[0] > φ_ex[1] > φ_ex[2]}") **Stochastic dominance.** -Using the recovered $F$ and the normalised risk-neutral matrix -$Q = P / \text{row sums}$ from the exercise above: +Using the recovered $F$ and the normalised risk-neutral matrix $Q = P / \text{row sums}$ +from the exercise above: (a) Compute the one-step marginal distributions $f_j = F_{2,j}$ and $q_j = Q_{2,j}$ - starting from state 2 (index 1 in Python). +starting from state 2 (index 1 in Python). 
-(b) Compute the CDFs $\hat F_k = \sum_{j \leq k} f_j$ and $\hat Q_k = \sum_{j - \leq k} q_j$ for each state. +(b) Compute the CDFs $\hat F_k = \sum_{j \leq k} f_j$ and +$\hat Q_k = \sum_{j \leq k} q_j$ for each state. -(c) Verify numerically that $\hat F_k \leq \hat Q_k$ for every $k$, confirming - that the natural distribution stochastically dominates the risk-neutral - distribution (Theorem 3 of {cite}`Ross2015`). +(c) Verify numerically that $\hat F_k \leq \hat Q_k$ for every $k$, confirming that the +natural distribution stochastically dominates the risk-neutral distribution (Theorem 3 +of {cite}`Ross2015`). ``` ```{solution-start} rt_ex2 @@ -912,12 +931,11 @@ $Q = P / \text{row sums}$ from the exercise above: import numpy as np P_ex = np.array([ - [0.80, 0.12, 0.02], - [0.10, 0.75, 0.10], - [0.05, 0.15, 0.72] + [0.5950, 0.1700, 0.0272], + [0.159375, 0.5525, 0.1360], + [0.06640625, 0.31875, 0.5525] ]) -# Recompute F from exercise 1 from scipy.linalg import eig eigenvalues, eigenvectors = eig(P_ex) real_mask = np.isreal(eigenvalues) @@ -936,27 +954,23 @@ F_ex = (1.0 / δ_ex) * D_ex @ P_ex @ D_inv_ex F_ex = np.clip(F_ex, 0, None) F_ex /= F_ex.sum(axis=1, keepdims=True) -# (a) Marginals from state 2 (index 1) start = 1 f_marg = F_ex[start] q_marg = P_ex[start] / P_ex[start].sum() -print("(a) One-step marginals from state 2:") -print(f" Natural f = {np.round(f_marg, 4)}") -print(f" Risk-neutral q = {np.round(q_marg, 4)}") +print("One-step marginals from state 2:") +print(f"natural = {np.round(f_marg, 4)}") +print(f"risk-neutral = {np.round(q_marg, 4)}") -# (b) CDFs cdf_nat = np.cumsum(f_marg) cdf_rn = np.cumsum(q_marg) -print("\n(b) CDFs:") +print("\nCDFs:") for k in range(3): - print(f" State {k+1}: CDF_nat = {cdf_nat[k]:.4f}, CDF_rn = {cdf_rn[k]:.4f}") + print(f"state {k+1}: natural = {cdf_nat[k]:.4f}, risk-neutral = {cdf_rn[k]:.4f}") -# (c) Stochastic dominance dominates = np.all(cdf_nat <= cdf_rn + 1e-10) -print(f"\n(c) Natural CDF <= Risk-neutral CDF at all 
states: {dominates}") -print(" -> Natural distribution stochastically dominates risk-neutral distribution") +print(f"\nNatural CDF <= risk-neutral CDF: {dominates}") ``` ```{solution-end} @@ -976,8 +990,8 @@ Write a function `tail_risk_ratio(γ, threshold, μ, σ, δ, T)` that: and risk-neutral distributions starting from the middle state. 4. Returns the ratio $p_\text{risk-neutral} / p_\text{natural}$. -Using this function, plot the ratio as a function of $\gamma \in [1, 10]$ for -a threshold of $-30\%$ (i.e., `threshold = -0.30`). +Using this function, plot the ratio as a function of $\gamma \in [1, 10]$ for a +threshold of $-30\%$ (i.e., `threshold = -0.30`). Explain the economic interpretation: why does a higher $\gamma$ raise the ratio? ``` @@ -992,13 +1006,11 @@ import matplotlib.pyplot as plt def tail_risk_ratio(γ, threshold, μ=0.08, σ=0.20, δ=0.02, T=1.0): - """ - Compute ratio of risk-neutral to natural tail probability P(log-return < threshold). - """ + """Risk-neutral / natural left-tail probability.""" P_g, states_g = build_state_price_matrix( μ, σ, γ, δ, T, n_states=41, n_σ=5) - F_g, z_g, δ_g, φ_g = recover_natural_distribution(P_g) + F_g, _, _, _ = recover_natural_distribution(P_g) mid_g = len(states_g) // 2 @@ -1023,26 +1035,22 @@ plt.ylabel('risk-neutral / natural tail probability') plt.title('tail risk ratio for a 30% decline vs risk aversion') plt.show() -# Economic interpretation -print("Economic interpretation:") -print("A higher γ means the pricing kernel falls more steeply in bad states.") -print("This upweights bad outcomes (crashes) more heavily under risk-neutral") -print("probabilities, raising the ratio — even if the true crash probability") -print("(natural measure) stays the same.") -print(f"\nRatio at γ=1.0: {tail_risk_ratio(1.0, -0.30):.2f}") +print(f"Ratio at γ=1.0: {tail_risk_ratio(1.0, -0.30):.2f}") print(f"Ratio at γ=5.0: {tail_risk_ratio(5.0, -0.30):.2f}") print(f"Ratio at γ=10.0: {tail_risk_ratio(10.0, -0.30):.2f}") ``` -**Economic 
interpretation.** A higher coefficient of risk aversion $\gamma$ -makes the pricing kernel steeper: the market assigns a larger premium per unit -of probability to bad-state payoffs. Risk-neutral probabilities, which -incorporate this premium, overstate the natural probability of a crash by a -factor that grows rapidly with $\gamma$. This is the "dark matter" of finance: -the high risk-neutral probability of a crash seen in option prices can be -attributed mostly to risk aversion rather than a genuinely elevated natural -probability of a catastrophe. +**Economic interpretation.** + +A higher coefficient of risk aversion $\gamma$ makes the pricing kernel steeper: the +market assigns a larger premium per unit of probability to bad-state payoffs. + +Risk-neutral probabilities, which incorporate this premium, overstate the natural +probability of a crash by a factor that grows rapidly with $\gamma$. + +This is the "dark matter" of finance: the high risk-neutral probability of a crash seen +in option prices can be attributed mostly to risk aversion rather than a genuinely +elevated natural probability of a catastrophe. ```{solution-end} ``` - diff --git a/lectures/rs_inventory_q.md b/lectures/rs_inventory_q.md index 347932d20..0a65a44c7 100644 --- a/lectures/rs_inventory_q.md +++ b/lectures/rs_inventory_q.md @@ -378,7 +378,7 @@ The key is to identify where the randomness in profits actually comes from. Recall that per-period profit is $\pi(x, a, d) = \min(x, d) - ca - \kappa \mathbf{1}\{a > 0\}$. -The ordering cost $ca + \kappa \mathbf{1}\{a > 0\}$ is **deterministic** — it +The ordering cost $ca + \kappa \mathbf{1}\{a > 0\}$ is **deterministic** -- it is chosen before the demand shock is realized. So higher ordering shifts the level of profits down but does not affect their @@ -387,9 +387,9 @@ variance. The variance comes from **revenue**: $\min(x, D)$. 
When inventory $x$ is high, $\min(x, D) \approx D$ for most demand -realizations — revenue inherits the full variance of demand. +realizations -- revenue inherits the full variance of demand. -When inventory $x$ is low, $\min(x, D) \approx x$ for most realizations — +When inventory $x$ is low, $\min(x, D) \approx x$ for most realizations -- revenue is nearly deterministic, capped at the inventory level. A risk-sensitive agent therefore prefers lower inventory because it **caps the @@ -484,7 +484,7 @@ $$ \right]. $$ -This is a fixed point equation in $q$ alone — $v^*$ has been eliminated. +This is a fixed point equation in $q$ alone -- $v^*$ has been eliminated. ### The Q-learning update rule @@ -515,7 +515,7 @@ standard Q-learning. Notice several differences from the risk-neutral case: - The Q-values are **positive** (expectations of exponentials) rather than signed. -- The optimal policy is $\sigma(x) = \argmin_a q(x, a)$ — we **minimize** +- The optimal policy is $\sigma(x) = \argmin_a q(x, a)$ -- we **minimize** rather than maximize, because $\psi^{-1}$ is decreasing. - The observed profit enters through $\exp(-\gamma R_{t+1})$ rather than additively. @@ -523,7 +523,7 @@ Notice several differences from the risk-neutral case: than a scaled sum $\beta \cdot \max_{a'} q_t$. As before, the agent needs only to observe $x$, $a$, $R_{t+1}$, and -$X_{t+1}$ — no model knowledge is required. +$X_{t+1}$ -- no model knowledge is required. ### Implementation plan @@ -552,7 +552,7 @@ Our implementation follows the same structure as the risk-neutral Q-learning in As in {doc}`inventory_q`, we use optimistic initialization to accelerate learning. -The logic is the same — initialize the Q-table so that every untried action looks attractive, driving the agent to explore broadly — but the direction is reversed. +The logic is the same -- initialize the Q-table so that every untried action looks attractive, driving the agent to explore broadly -- but the direction is reversed. 
Since the optimal policy *minimizes* $q$, "optimistic" means initializing the Q-table *below* the true values. When the agent tries an action, the update pushes $q$ upward toward reality, making that entry look worse and prompting the agent to try other actions that still appear optimistically good. diff --git a/lectures/survival_recursive_preferences.md b/lectures/survival_recursive_preferences.md index b1347c279..8dfa5b673 100644 --- a/lectures/survival_recursive_preferences.md +++ b/lectures/survival_recursive_preferences.md @@ -315,7 +315,7 @@ subject to $z^1 + z^2 \leq 1$. The first line is the flow payoff from the two agents' felicity functions. -The second line multiplies $\tilde{J}(\upsilon)$ by a term that combines the agents' discount rates, belief-weighted endowment drift, and a variance correction — these arise from absorbing the $Y^{1-\gamma}$ factor via Itô's lemma. +The second line multiplies $\tilde{J}(\upsilon)$ by a term that combines the agents' discount rates, belief-weighted endowment drift, and a variance correction -- these arise from absorbing the $Y^{1-\gamma}$ factor via Itô's lemma. The third line multiplies $\tilde{J}'(\upsilon)$ by the drift of the Pareto share, which depends on the difference in discount rates and the belief-weighted response to endowment risk. @@ -387,7 +387,7 @@ $$ (eq:wealth_decomp) The first term measures how much faster agent 1's portfolio grows. -The second measures how much less agent 1 consumes out of wealth — a lower consumption-wealth ratio means more saving and faster wealth accumulation. +The second measures how much less agent 1 consumes out of wealth -- a lower consumption-wealth ratio means more saving and faster wealth accumulation. When this total difference is positive, agent 1 survives; when negative, she shrinks toward extinction. 
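The survive-or-shrink logic of the wealth decomposition can be illustrated with a toy simulation. The drift and volatility numbers below are made-up illustrative assumptions, not the lecture's calibration: treat the log odds of the Pareto share as a drift plus Brownian noise, with a negative drift standing in for a negative total difference in the decomposition, so that agent 1 shrinks toward extinction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed illustrative numbers: drift and volatility of the log odds
# log(υ / (1 - υ)) of agent 1's Pareto share.
m, s = -0.05, 0.30
dt, n_steps, n_paths = 0.1, 2_000, 20

# Start every path at υ = 0.5, i.e. log odds 0
lam = np.zeros((n_paths, n_steps + 1))
for t in range(n_steps):
    dW = rng.normal(0.0, np.sqrt(dt), size=n_paths)
    lam[:, t + 1] = lam[:, t] + m * dt + s * dW

# Map log odds back to the Pareto share υ ∈ (0, 1)
upsilon_T = 1.0 / (1.0 + np.exp(-lam[:, -1]))
print(upsilon_T.mean())   # well below the starting value 0.5
```

With a positive drift instead (`m > 0`), the same sketch pushes the share toward $\upsilon = 1$, matching the case in which the total difference favors agent 1.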
@@ -508,7 +508,7 @@ $$ = \frac{1-\rho}{\rho} \left[(\omega^1 - \omega^2)\sigma_Y + \frac{(\omega^1 - \omega^2)^2}{2\gamma}\right] $$ (eq:consumption_rates) -The term in brackets is the difference in *subjective* expected portfolio returns — what agent 1 believes she earns relative to agent 2. +The term in brackets is the difference in *subjective* expected portfolio returns -- what agent 1 believes she earns relative to agent 2. The factor $(1-\rho)/\rho$ translates this perceived return advantage into a saving response. @@ -772,8 +772,8 @@ plt.show() Each panel plots two curves in the $(\gamma, \rho)$ plane for a different value of agent 1's belief distortion $\omega^1$ (agent 2 has correct beliefs, $\omega^2 = 0$). -- The dashed curve (blue) is where the boundary drift at $\upsilon = 0$ equals zero — condition (i) in {prf:ref}`survival_conditions`. -- The solid curve (red) is where the boundary drift at $\upsilon = 1$ equals zero — condition (ii). +- The dashed curve (blue) is where the boundary drift at $\upsilon = 0$ equals zero -- condition (i) in {prf:ref}`survival_conditions`. +- The solid curve (red) is where the boundary drift at $\upsilon = 1$ equals zero -- condition (ii). - The shaded region between the two curves is where both agents survive. - The dotted diagonal $\gamma = \rho$ is the separable CRRA case, along which the agent with more accurate beliefs always dominates. @@ -937,7 +937,7 @@ This is outcome (d) in {prf:ref}`survival_conditions`: neither boundary is repel As $\gamma$ increases past roughly 1, the blue curve crosses zero and becomes positive while the red curve stays negative. -Now both boundaries are repelling and we enter the coexistence region — outcome (a). +Now both boundaries are repelling and we enter the coexistence region -- outcome (a). 
## The separable case @@ -1004,7 +1004,7 @@ This figure simulates 20 sample paths of the Pareto share $\upsilon_t$ under sep Agent 2 has correct beliefs, so the log-odds drift is negative and all paths trend toward $\upsilon = 0$. -Agent 1 is driven to extinction — the classical market-selection result of {cite:t}`Blume_Easley2006`. +Agent 1 is driven to extinction -- the classical market-selection result of {cite:t}`Blume_Easley2006`. ## Asset pricing implications @@ -1227,7 +1227,7 @@ plt.show() The left panel shows 20 sample paths of the Pareto share $\upsilon_t$ under parameters inside the coexistence region ($\omega^1 = 0.25$, $\omega^2 = 0$, $\gamma = 5$, IES $\approx 1.49$). -Unlike the separable case in {numref}`fig-crra-pareto-paths`, the paths do not drift to zero — they repeatedly visit a wide range of values, bouncing between the two repelling boundaries. +Unlike the separable case in {numref}`fig-crra-pareto-paths`, the paths do not drift to zero -- they repeatedly visit a wide range of values, bouncing between the two repelling boundaries. The right panel approximates the stationary density by pooling the second half of longer simulations. diff --git a/lectures/theil_1.md b/lectures/theil_1.md index caaa874ec..1af931bb9 100644 --- a/lectures/theil_1.md +++ b/lectures/theil_1.md @@ -53,7 +53,7 @@ from quantecon import LQ Their result justifies a convenient two-step algorithm: 1. **Optimize** under perfect foresight (treat future exogenous variables as known). -2. **Forecast** — substitute optimal forecasts for the unknown future values. +2. **Forecast** -- substitute optimal forecasts for the unknown future values. The striking insight is that these two steps are completely separable. @@ -177,7 +177,7 @@ As part of its computational tractability, this specialization delivers a striki Under quadratic $V$ and linear $g$, the optimal decision rule $h$ decomposes into two components applied in sequence. 
-**Step 1 — Forecasting.** Define the infinite sequence of optimal point forecasts of all current and future states of nature: +**Step 1 -- Forecasting.** Define the infinite sequence of optimal point forecasts of all current and future states of nature: ```{math} :label: eq:forecast_sequence_v3 @@ -195,7 +195,7 @@ The optimal forecast sequence is a (generally nonlinear) function of the current The function $h_2 : S_1 \to S_1^\infty$ depends entirely on the environment $(f, \Phi)$ and is obtained as the solution to a **pure forecasting problem**, with no reference to preferences or technology. -**Step 2 — Optimization.** Given the forecast sequence $\tilde{z}_t$, the optimal action is a **linear** function of $\tilde{z}_t$ and $x_t$: +**Step 2 -- Optimization.** Given the forecast sequence $\tilde{z}_t$, the optimal action is a **linear** function of $\tilde{z}_t$ and $x_t$: ```{math} :label: eq:optimization_rule_v3 @@ -226,21 +226,21 @@ The relationship of original interest, $h = T(f)$, then follows directly from {e ### Certainty equivalence and perfect foresight -The name "certainty equivalence" reflects a further implication of the LQ structure: the function $h_1$ can be derived as if the agent **knew the future path $z_{t+1}, z_{t+2}, \ldots$ with certainty** — i.e., by solving the deterministic problem in which $\tilde{z}_t$ is treated as the realized path rather than a forecast. +The name "certainty equivalence" reflects a further implication of the LQ structure: the function $h_1$ can be derived as if the agent **knew the future path $z_{t+1}, z_{t+2}, \ldots$ with certainty** -- i.e., by solving the deterministic problem in which $\tilde{z}_t$ is treated as the realized path rather than a forecast. Randomness of the environment affects actions only through the forecast $\tilde{z}_t$; conditional on $\tilde{z}_t$, the optimization problem is deterministic. 
This means the LQ problem decouples into: - * **Dynamic optimization under perfect foresight** — solve for $h_1$ from $(V, g)$ by treating $\tilde{z}_t$ as known, yielding a standard deterministic LQ regulator problem independent of the environment $(f, \Phi)$. + * **Dynamic optimization under perfect foresight** -- solve for $h_1$ from $(V, g)$ by treating $\tilde{z}_t$ as known, yielding a standard deterministic LQ regulator problem independent of the environment $(f, \Phi)$. - * **Optimal linear prediction** — solve for $h_2 = S(f)$ from $(f, \Phi)$ using least-squares forecasting theory, which reduces to a standard Kalman/Wiener prediction formula when $f$ is itself linear. + * **Optimal linear prediction** -- solve for $h_2 = S(f)$ from $(f, \Phi)$ using least-squares forecasting theory, which reduces to a standard Kalman/Wiener prediction formula when $f$ is itself linear. ### Cross-equation restrictions A hallmark of the rational expectations hypothesis as it appears in this framework is that it ties together what would otherwise be free parameters in different equations. -The requirement that $\tilde{z}_t = h_2(z_t) = S(f)(z_t)$ — i.e., that agents' forecasts be *optimal* with respect to the *actual* law of motion $f$ — imposes **cross-equation restrictions** between the parameters of the forecasting rule $h_2$ and the parameters of the environment $f$. +The requirement that $\tilde{z}_t = h_2(z_t) = S(f)(z_t)$ -- i.e., that agents' forecasts be *optimal* with respect to the *actual* law of motion $f$ -- imposes **cross-equation restrictions** between the parameters of the forecasting rule $h_2$ and the parameters of the environment $f$. These restrictions, rather than any conditions on distributed lags within a single equation, are the operative empirical content of rational expectations. 
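The contrast between a free-parameter distributed lag and a forecast disciplined by the actual law of motion can be checked with a small simulation. The AR(1) coefficient `rho` and the adaptive-expectations parameter `lam` below are arbitrary illustrative choices: for an AR(1) environment the rational-expectations one-step forecast is pinned to $\rho z_t$, while a geometric distributed lag with a free $\lambda$ is generally inferior.

```python
import numpy as np

rng = np.random.default_rng(1)
rho, lam, T = 0.8, 0.5, 50_000   # illustrative values, not estimates

# Environment: z_{t+1} = rho * z_t + eps_{t+1}
eps = rng.normal(0.0, 1.0, T)
z = np.zeros(T)
for t in range(T - 1):
    z[t + 1] = rho * z[t] + eps[t + 1]

# Rational-expectations forecast: optimal with respect to the actual law f
re_forecast = rho * z[:-1]        # forecast of z[t+1] made at time t

# Adaptive expectations: geometric distributed lag with a free parameter
ae = np.zeros(T)                  # ae[t] = forecast of z[t] made at t-1
for t in range(1, T):
    ae[t] = lam * ae[t - 1] + (1 - lam) * z[t - 1]

mse_re = np.mean((z[1:] - re_forecast) ** 2)
mse_ae = np.mean((z[1:] - ae[1:]) ** 2)
print(mse_re, mse_ae)             # the RE forecast attains the lower MSE
```

The point of the cross-equation restrictions is visible here: the RE forecast coefficient is not free but equal to the environment parameter $\rho$, and any other $\lambda$-weighted lag pays an MSE penalty.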
@@ -318,7 +318,7 @@ Prior practice, exemplified by the adaptive expectations mechanisms of {cite:t}` treating the coefficient $\lambda$ as a free parameter to be estimated from data, with no reference to the underlying environment $f$. -The deficiency is not that {eq}`eq:adaptive_expectations_v3` is a distributed lag — linear forecasting rules are perfectly acceptable simplifications. +The deficiency is not that {eq}`eq:adaptive_expectations_v3` is a distributed lag -- linear forecasting rules are perfectly acceptable simplifications. The deficiency is that the **coefficients** of the distributed lag are left unrestricted by theory. @@ -381,7 +381,7 @@ print(f"Theoretical slope β/(1-β)*P = {theoretical_slope:.4f}") The slope is indeed $\tfrac{\beta}{1-\beta} P$, confirming the analytic formula. -The value matrix $P$ is determined entirely by preferences and technology, not by the noise level — a direct consequence of the certainty equivalence principle. +The value matrix $P$ is determined entirely by preferences and technology, not by the noise level -- a direct consequence of the certainty equivalence principle. ```{solution-end} ``` diff --git a/lectures/theil_2.md b/lectures/theil_2.md index b342cda1f..7c1dd2a3d 100644 --- a/lectures/theil_2.md +++ b/lectures/theil_2.md @@ -41,7 +41,7 @@ problems. The property justifies a two-step algorithm for computing optimal decision rules: 1. *Optimize* under perfect foresight (treat future exogenous variables as known). -2. *Forecast* — substitute optimal forecasts for the unknown future values. +2. *Forecast* -- substitute optimal forecasts for the unknown future values. This lecture extends the certainty equivalence property in two directions motivated by {cite}`hansen2004certainty`: @@ -58,8 +58,8 @@ This lecture extends the certainty equivalence property in two directions motiva parameter $\theta$ and the risk-sensitivity parameter $\sigma$ are linked by $\theta = -\sigma^{-1}$. 
-We illustrate all three settings — ordinary CE, robust CE, and the permanent income -application — with Python code using `quantecon`. +We illustrate all three settings -- ordinary CE, robust CE, and the permanent income +application -- with Python code using `quantecon`. ### Model features @@ -138,7 +138,7 @@ gain $F$ is invariant to the noise level $\sigma$ while $d$ grows with it. --- mystnb: figure: - caption: CE principle — policy vs. value + caption: CE principle -- policy vs. value name: fig-ce-policy-value --- a, b_coeff = 0.9, 1.0 @@ -180,7 +180,7 @@ plt.show() ### Setup and the multiplier problem -The decision maker in Simon and Theil's setting knows his model exactly — he has +The decision maker in Simon and Theil's setting knows his model exactly -- he has no doubt about the transition law {eq}`eq:z_transition_o`. Now suppose he suspects that the true @@ -207,7 +207,7 @@ where $\eta_0$ parametrises the tolerated misspecification budget and $\hat{\mat is the expectation under the distorted law {eq}`eq:distorted_law`. To construct a *robust* decision rule the decision maker solves the -**multiplier problem** — a two-player zero-sum dynamic game: +**multiplier problem** -- a two-player zero-sum dynamic game: ```{math} :label: eq:multiplier @@ -555,7 +555,7 @@ equation is With $\beta R = 1$ (Hall's case), this is $\mathbb{E}_t[\mu_{c,t+1}] = \mu_{ct}$, i.e., the **marginal utility of -consumption is a martingale** — equivalently, consumption follows a random walk. +consumption is a martingale** -- equivalently, consumption follows a random walk. The optimal policy is $\mu_{ct} = -F y_t$ where, from the solved-forward Euler equation, $F = [(R-1),\ (R-1)/(R - \rho)]$. 
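The random-walk implication can be verified in a few lines. The budget-constraint timing below, $a_{t+1} = R a_t + z_t - c_t$ with AR(1) endowment deviations, is an assumed convention consistent with the reported $F = [(R-1),\ (R-1)/(R-\rho)]$; under it, each consumption change equals the endowment surprise scaled by $(R-1)/(R-\rho)$.

```python
import numpy as np

rng = np.random.default_rng(2)
R, rho, T = 1.05, 0.9, 1_000      # assumed illustrative parameters
eps = rng.normal(0.0, 0.1, T)

k = (R - 1) / (R - rho)           # endowment coefficient in F

z = np.zeros(T)                   # endowment deviation, AR(1)
a = np.zeros(T)                   # assets, assumed timing a' = R a + z - c
c = np.zeros(T)

for t in range(T - 1):
    c[t] = (R - 1) * a[t] + k * z[t]     # policy mu_ct = -F y_t, in levels
    a[t + 1] = R * a[t] + z[t] - c[t]
    z[t + 1] = rho * z[t] + eps[t + 1]
c[-1] = (R - 1) * a[-1] + k * z[-1]

# Consumption is a random walk: each change is k times the surprise
print(np.allclose(np.diff(c), k * eps[1:]))   # True
```

So with $\beta R = 1$ the simulated consumption changes are unpredictable from time-$t$ information, exactly the martingale property stated above.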
@@ -594,7 +594,7 @@ The consumption rule takes the certainty-equivalent form \sum_{j=0}^{\infty} R^{-j}(z_{t+j} - b)\right]\right) ``` -where $h_1$ — the first step of the CE algorithm — is *identical* to the +where $h_1$ -- the first step of the CE algorithm -- is *identical* to the non-robust case. Only the expectations operator changes. @@ -607,7 +607,7 @@ The resulting AR(1) dynamics for $\mu_{ct}$ become: ``` with $\tilde{\varphi} < 1$, implying $\mathbb{E}_t[c_{t+1}] > c_t$ under the -approximating model — a form of **precautionary saving**. +approximating model -- a form of **precautionary saving**. The observational equivalence formula {eq}`eq:oe_locus` (derived below) immediately gives the robust AR(1) coefficient: $\tilde{\varphi} = 1/(\tilde{\beta} R)$ @@ -722,8 +722,8 @@ plt.show() ``` The plot confirms the paper's key finding: *activating a preference for -robustness is observationally equivalent — for consumption and saving behaviour -— to increasing the discount factor*. +robustness is observationally equivalent -- for consumption and saving behaviour +-- to increasing the discount factor*. However, {cite:t}`HST_1999` show that the two parametrisations do *not* imply the same asset prices. diff --git a/lectures/two_computation.md b/lectures/two_computation.md index 606b3daef..38ec17088 100644 --- a/lectures/two_computation.md +++ b/lectures/two_computation.md @@ -2078,7 +2078,7 @@ The declining profile among retirees reflects the actuarial calculation: older r We now plot the aggregate transition paths for the labor tax, government debt, capital, and consumption under both schemes. 
```{code-cell} ipython3 -# hh, tech, ss0, ss1 already in scope — just alias from dict for readability +# hh, tech, ss0, ss1 already in scope -- just alias from dict for readability ss0_exp1 = exp1_exo['ss0'] ss1_exp1 = exp1_exo['ss1'] diff --git a/lectures/var_dmd.md b/lectures/var_dmd.md index 8fa8d190f..ee8e8c2bd 100644 --- a/lectures/var_dmd.md +++ b/lectures/var_dmd.md @@ -601,9 +601,14 @@ This is a consequence of a result established by Tu et al. {cite}`tu_Rowley` t -**Proposition** The $p$ columns of $\Phi$ are eigenvectors of $\hat A$. +```{prf:proposition} +:label: prop-dmd-eigenvectors -**Proof:** From formula {eq}`eq:Phiformula` we have +The $p$ columns of $\Phi$ are eigenvectors of $\hat A$. +``` + +```{prf:proof} +From formula {eq}`eq:Phiformula` we have $$ \begin{aligned} @@ -620,20 +625,16 @@ $$ \hat A \Phi = \Phi \Lambda . $$ (eq:APhiLambda) - - Let $\phi_i$ be the $i$th column of $\Phi$ and $\lambda_i$ be the corresponding $i$ eigenvalue of $\tilde A$ from decomposition {eq}`eq:tildeAeigenred`. Equating the $m \times 1$ vectors that appear on the two sides of equation {eq}`eq:APhiLambda` gives - $$ \hat A \phi_i = \lambda_i \phi_i . $$ This equation confirms that $\phi_i$ is an eigenvector of $\hat A$ that corresponds to eigenvalue $\lambda_i$ of both $\tilde A$ and $\hat A$. - -This concludes the proof. +``` Also see {cite}`DDSE_book` (p. 238) From b0d8c606371ac7072d78934efbcd5679df928c8e Mon Sep 17 00:00:00 2001 From: HumphreyYang Date: Sun, 26 Apr 2026 19:36:16 +0800 Subject: [PATCH 18/26] updates --- lectures/ross_recovery.md | 698 +++++++++++++++++++------------------- 1 file changed, 355 insertions(+), 343 deletions(-) diff --git a/lectures/ross_recovery.md b/lectures/ross_recovery.md index c5f10c97c..4f62e0349 100644 --- a/lectures/ross_recovery.md +++ b/lectures/ross_recovery.md @@ -40,17 +40,6 @@ the market's true beliefs about the future, and investors' aversion to risk. 
The link between them is the **pricing kernel**, which reweights natural probabilities to deliver state prices. -For example, using the Arrow-security language from {doc}`ge_arrow`, suppose tomorrow -has two states, recession and boom, with natural probabilities $(0.5, 0.5)$ and pricing -kernels $(1.2, 0.7)$. - -The Arrow prices are then $(0.6, 0.35)$, so the riskless discount factor is the row sum -$0.95$. - -Normalizing the Arrow prices by this row sum gives risk-neutral probabilities -$(0.6/0.95, 0.35/0.95) \approx (0.63, 0.37)$, which overweight the recession state even -though the natural probability of recession is only $0.5$. - Separating beliefs from risk aversion has traditionally required parametric assumptions about the preferences of a representative investor. @@ -59,15 +48,15 @@ about the preferences of a representative investor. Under a structural restriction on the pricing kernel called **transition independence**, the natural probability distribution and the pricing kernel can be uniquely recovered from state prices alone with no historical return data and no assumed utility -function. +function, provided the state-price system is Markov, irreducible, and sufficiently rich. This is the **Recovery Theorem**. It has several important implications: -* It enables model-free extraction of the market's forward-looking probability - distribution from option prices. -* It provides model-free tests of the efficient market hypothesis. +* It shows how state-price transition data can identify the market's forward-looking + natural distribution when the assumption holds +* It provides tests of the efficient market hypothesis. * It sheds light on the "dark matter" of finance: the probability of rare catastrophic events embedded in market prices. @@ -127,11 +116,11 @@ where $f(\theta_i, \theta_j)$ is the natural (conditional) probability of transi from state $\theta_i$ to $\theta_j$. 
As in the representative-agent equilibrium calculation in {doc}`ge_arrow`, the -canonical additively separable model with discount factor $\delta$ gives +canonical additively separable model with discount factor $\beta$ gives $$ \phi(\theta_i, \theta_j) = \frac{p(\theta_i, \theta_j)}{f(\theta_i, \theta_j)} - = \frac{\delta U'(c(\theta_j))}{U'(c(\theta_i))}. + = \frac{\beta U'(c(\theta_j))}{U'(c(\theta_i))}. $$ (eq:canon_ge) The key structural property this implies is **transition independence**. @@ -143,11 +132,11 @@ The key structural property this implies is **transition independence**. :label: def-transition-independence A pricing kernel is **transition independent** if there exists a positive function $h$ on -the state space and a positive scalar $\delta$ such that for every transition from state +the state space and a positive scalar $\beta$ such that for every transition from state $\theta_i$ to $\theta_j$, $$ -\phi(\theta_i, \theta_j) = \delta \, \frac{h(\theta_j)}{h(\theta_i)}. +\phi(\theta_i, \theta_j) = \beta \, \frac{h(\theta_j)}{h(\theta_i)}. $$ ``` @@ -155,7 +144,8 @@ $$ Transition independence says the kernel depends on the *ending* state and normalizes by the *beginning* state. -It holds for any agent with intertemporally additive separable utility (where $h = U'$). +In the representative-agent complete-markets environment above, it holds under +intertemporally additive separable utility (where $h = U'$). In particular, this holds for {eq}`eq:canon_ge`. @@ -167,20 +157,20 @@ Ross restriction. Under transition independence, the state-price equation becomes $$ -p(\theta_i, \theta_j) = \delta \, \frac{h(\theta_j)}{h(\theta_i)} \, +p(\theta_i, \theta_j) = \beta \, \frac{h(\theta_j)}{h(\theta_i)} \, f(\theta_i, \theta_j). 
$$ -In matrix notation, defining the diagonal matrix $D$ with $D_{ii} = h(\theta_i)/\delta$, +In matrix notation, defining the diagonal matrix $D$ with $D_{ii} = h(\theta_i)/\beta$, $$ -DP = \delta F D, +DP = \beta F D, $$ or equivalently, $$ -F = \frac{1}{\delta} D P D^{-1}. +F = \frac{1}{\beta} D P D^{-1}. $$ ## The recovery theorem @@ -193,13 +183,13 @@ of ones. Substituting the expression for $F$: $$ -\frac{1}{\delta} D P D^{-1} e = e +\frac{1}{\beta} D P D^{-1} e = e \quad \Longrightarrow \quad -P z = \delta z, \quad z \equiv D^{-1} e. +P z = \beta z, \quad z \equiv D^{-1} e. $$ -This is an **eigenvalue problem**: we seek a positive vector $z$ and scalar $\delta$ -satisfying $Pz = \delta z$. +This is an **eigenvalue problem** where we seek a positive vector $z$ and scalar $\beta$ +satisfying $Pz = \beta z$. In principle every eigenvalue-eigenvector pair of $P$ is a formal solution, but only the one with a strictly positive eigenvector is economically valid: $D_{ii} = 1/z_i$ must be @@ -212,19 +202,21 @@ The theorem below guarantees that exactly one such pair exists. If $A$ is a nonnegative irreducible matrix, then -1. $A$ has a unique largest positive real eigenvalue $r$ (the Perron root). +1. $A$ has a positive real eigenvalue $r$ equal to its spectral radius (the Perron root). 2. There exists a strictly positive eigenvector $z \gg 0$ with $Az = rz$, unique up to scaling. +3. No other eigenvector is strictly positive. ``` -The proof uses the invariance of the positive cone and irreducibility to isolate the -unique positive ray associated with the Perron root. +Other eigenvalues can have the same modulus when the matrix is imprimitive, but the +strictly positive eigenvector is unique up to scale. See Section 1.2.3 of {cite}`Sargent_Stachurski_2024` for details. -Applied to the recovery problem: the Perron root is $\delta$ (the subjective discount -factor) and the Perron vector $z$ determines $D$ via $D_{ii} = 1/z_i$, closing the -system uniquely. 
+See also the full statement in {doc}`intro:eigen_II`. + +Applied to the recovery problem: the Perron root is $\beta$ (the subjective discount +factor) and the Perron vector $z$ determines $D$ via $D_{ii} = 1/z_i$. ### Ross's recovery theorem @@ -238,8 +230,10 @@ Irreducibility ensures the economy is not divided into disconnected sub-economie without it, the Perron–Frobenius theorem gives multiple candidate eigenvectors and recovery breaks down. -Transition independence is the key economic restriction: it says the pricing kernel -factors as $\delta h(\theta_j)/h(\theta_i)$, so the entire kernel is pinned down by a +Transition independence is the key economic restriction. + +It says the pricing kernel +factors as $\beta h(\theta_j)/h(\theta_i)$, so the entire kernel is pinned down by a single vector $h$ (or equivalently $z$). @@ -250,10 +244,10 @@ Suppose prices provide no arbitrage opportunities, that the state price transition matrix $P$ is irreducible, and that the pricing kernel is transition independent. -Then there exists a *unique* positive solution $(\delta, z, F)$ to the recovery problem. +Then there exists a *unique* positive solution $(\beta, z, F)$ to the recovery problem. -That is, for any set of state prices there is a unique compatible natural probability -transition matrix and a unique pricing kernel. +That is, under these assumptions, the state prices imply a unique compatible natural +probability transition matrix and a unique transition pricing kernel. ``` ```{prf:proof} @@ -264,13 +258,13 @@ $Pz = \lambda z$. Setting $$ -\delta = \lambda, \qquad D_{ii} = \frac{1}{z_i}, +\beta = \lambda, \qquad D_{ii} = \frac{1}{z_i}, $$ the natural probability transition matrix is uniquely recovered as $$ -f_{ij} = \frac{1}{\delta} \frac{z_j}{z_i} \, p_{ij}. +f_{ij} = \frac{1}{\beta} \frac{z_j}{z_i} \, p_{ij}. 
$$ To confirm $F$ is stochastic, note that all entries are nonnegative (since @@ -278,9 +272,9 @@ $p_{ij} \geq 0$ and $z_i, z_j > 0$) and $$ \sum_j f_{ij} -= \frac{1}{\delta z_i} \sum_j z_j \, p_{ij} -= \frac{[Pz]_i}{\delta z_i} -= \frac{\delta z_i}{\delta z_i} = 1. += \frac{1}{\beta z_i} \sum_j z_j \, p_{ij} += \frac{[Pz]_i}{\beta z_i} += \frac{\beta z_i}{\beta z_i} = 1. $$ Uniqueness follows from the uniqueness of the Perron--Frobenius eigenvector. @@ -288,81 +282,120 @@ Uniqueness follows from the uniqueness of the Perron--Frobenius eigenvector. ### Pricing kernel from the eigenvector -The recovered kernel values are +The recovered transition-kernel values are $$ -\phi(\theta_i, \theta_j) = \delta \frac{z_i}{z_j}, -\qquad h(\theta_i) = \frac{\delta}{z_i}, +\phi(\theta_i, \theta_j) = \beta \frac{z_i}{z_j}, +\qquad h(\theta_i) = \frac{\beta}{z_i}, $$ -where $h(\theta_i) = \delta/z_i$ follows from $D_{ii} = h(\theta_i)/\delta = 1/z_i$. +where $h(\theta_i) = \beta/z_i$ follows from $D_{ii} = h(\theta_i)/\beta = 1/z_i$. -Destination states with high $z_j$ have **low** kernel values: for a fixed origin $i$, -the kernel $\delta z_i/z_j$ is decreasing in $z_j$. +Destination states with high $z_j$ have *low* kernel values: for a fixed origin $i$, +the kernel $\beta z_i/z_j$ is decreasing in $z_j$. This means the market assigns relatively less pricing weight per unit of probability to high-$z_j$ outcomes -- consistent with those states being "good times" that require less insurance. -### Corollary: risk-neutral pricing when rates are state-independent +The same eigenvector argument also clarifies a useful limiting case. + +If the one-period +bond price is identical in every current state, then the vector of ones is already the +Perron vector, so recovery has no state-dependent change of measure left to perform. 
```{prf:corollary} :label: cor-risk-neutral-recovery -If the riskless rate is the same in all states ($Pe = \gamma e$ for -some scalar $\gamma$), then the unique natural distribution consistent with recovery is -the risk-neutral (martingale) distribution itself: $F = (1/\gamma) P$. +If the riskless rate is the same in all states ($Pe = b e$ for +some scalar $b$), then the unique natural distribution consistent with recovery is +the risk-neutral (martingale) distribution itself: $F = (1/b) P$. ``` ```{prf:proof} -When $Pe = \gamma e$, the vector of ones $e$ is the Perron eigenvector with eigenvalue -$\gamma$. +When $Pe = b e$, the vector of ones $e$ is the Perron eigenvector with eigenvalue +$b$. By the uniqueness part of the Perron--Frobenius theorem, $z = e$ (up to scaling) and -$\delta = \gamma$. +$\beta = b$. -Setting $z = e$ gives $D = I$ (the identity matrix), so +Setting $z = e$ gives $D = I$, so $$ -F = \frac{1}{\delta} D P D^{-1} = \frac{1}{\gamma} P. \qquad \square +F = \frac{1}{\beta} D P D^{-1} = \frac{1}{b} P. \qquad \square $$ ``` -This remarkable result says that with a constant interest rate and a bounded irreducible -state space, recovery forces risk-neutrality -- a non-trivial restriction of the model. +## Numerical example + +We now demonstrate the Recovery Theorem numerically. + +### Building a finite-state example -## Python implementation +We build the economy directly +on a finite grid of log payoff states $s_1, \ldots, s_m$. -We now implement the Recovery Theorem numerically. +On this grid we choose three primitives: -### Building a state price matrix from a lognormal model +1. a row-stochastic natural transition matrix $F$, +2. a subjective discount factor $\beta = e^{-\rho T}$, and +3. a CRRA transition pricing kernel + $\phi_{ij} = \beta e^{-\gamma(s_j-s_i)}$. 
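These three primitives can be sketched on a tiny three-state grid before turning to the full construction (the numbers below are hypothetical and chosen only for illustration):

```python
import numpy as np

# Three log payoff states and primitives (hypothetical numbers)
s = np.array([-0.2, 0.0, 0.2])
β = np.exp(-0.02)        # subjective discount factor
γ = 3.0                  # CRRA coefficient

# 1. A row-stochastic natural transition matrix F
F = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3],
              [0.1, 0.3, 0.6]])
assert np.allclose(F.sum(axis=1), 1.0)

# 2.-3. The transition-independent CRRA kernel φ_ij = β exp(-γ (s_j - s_i))
φ = β * np.exp(-γ * (s[None, :] - s[:, None]))

# State prices are kernel times natural probability, entry by entry
P = φ * F

# Transition independence holds by construction with z_i ∝ exp(γ s_i)
z = np.exp(γ * s)
assert np.allclose(P, β * (z[:, None] / z[None, :]) * F)
print(np.round(P, 4))
```

The final assertion confirms that the kernel factors as $\beta z_i / z_j$, the structure that the eigenvector calculation later exploits.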
-Following {cite}`Ross2015` Section IV, suppose the natural distribution of log-returns -over one period is normal: +The state-price matrix is then constructed from + +$$ +p_{ij} = \phi_{ij} f_{ij}. +$$ + +This means the Recovery Theorem assumptions hold by construction: $P$ is nonnegative, +$F$ is a Markov transition matrix, and the kernel is transition independent with +$z_i \propto e^{\gamma s_i}$. This benchmark therefore provides a strict test of +whether the eigenvector recovery calculation returns the objects used to construct +prices. + +To keep the example close to Ross's Section IV, we choose $F$ to have lognormal-shaped +rows. In the unbounded continuous model one would write $$ \log(S_T/S_0) \sim \mathcal{N}\!\left((\mu - \tfrac{1}{2}\sigma^2)T, \sigma^2 T\right). $$ -With a CRRA pricing kernel $\phi(S_0, S_T) = e^{-\delta T}(S_T/S_0)^{-\gamma}$, the -state price density is +Following Ross's Table I, we represent the distribution on a finite grid of states. + +Ross's table uses a fixed future payoff distribution, so its rows of $F$ are +identical. + +Here we apply the same finite-grid construction to a Markov transition matrix with +lognormal-shaped rows. + +Ross uses states from $-5$ to $+5$ standard deviations; we use +the same range below. + +The truncation is an essential part of the finite-state model, not a cosmetic +detail: it is what brings the example into the Perron--Frobenius setting. + +In the +unbounded continuous lognormal growth model, Ross shows that recovery is not unique. + +On the finite grid, the natural transition probabilities and state prices are $$ f_{ij} \propto n\!\left(\frac{s_j - s_i - (\mu - \frac{1}{2}\sigma^2)T}{\sigma\sqrt{T}}\right) \Delta s, \qquad -p_{ij} = e^{-\delta T} e^{-\gamma(s_j - s_i)} f_{ij}, +p_{ij} = e^{-\rho T} e^{-\gamma(s_j - s_i)} f_{ij}, $$ where $s_i = \ln S_i$, $s_j = \ln S_j$, $n(\cdot)$ is the standard normal density, and the discretized probabilities $f_{ij}$ are normalized row by row. 
-We discretize this onto a grid of $m$ states and build the matrix $P$. +The next cell constructs this finite grid and builds $P$. ```{code-cell} ipython3 -def build_state_price_matrix(μ, σ, γ, δ, T=1.0, n_states=11, n_σ=5): +def build_state_price_matrix(μ, σ, γ, ρ, T=1.0, n_states=11, n_σ=5): """Build a discretized lognormal/CRRA state-price matrix.""" states = np.linspace(-n_σ * σ * np.sqrt(T), n_σ * σ * np.sqrt(T), @@ -375,7 +408,7 @@ def build_state_price_matrix(μ, σ, γ, δ, T=1.0, n_states=11, n_σ=5): drift = (μ - 0.5 * σ**2) * T - # First build a row-stochastic natural transition matrix on the bounded grid. + # First build a row-stochastic natural transition matrix on the bounded grid for i in range(m): s_i = states[i] for j in range(m): @@ -386,23 +419,25 @@ def build_state_price_matrix(μ, σ, γ, δ, T=1.0, n_states=11, n_σ=5): F[i] = F[i] / F[i].sum() - # Price each Arrow claim as natural probability times the CRRA kernel. + # Price each Arrow claim as natural probability times the CRRA kernel for j in range(m): log_return = states[j] - s_i - kernel = np.exp(-δ * T) * np.exp(-γ * log_return) + kernel = np.exp(-ρ * T) * np.exp(-γ * log_return) P[i, j] = kernel * F[i, j] return P, states ``` +Now choose a calibration and build the state-price matrix. + ```{code-cell} ipython3 μ = 0.08 # 8% annual expected return σ = 0.20 # 20% annual volatility γ = 3.0 # CRRA coefficient -δ = 0.02 # 2% annual discount rate +ρ = 0.02 # 2% annual continuous discount rate T = 1.0 # one-year horizon -P, states = build_state_price_matrix(μ, σ, γ, δ, T, +P, states = build_state_price_matrix(μ, σ, γ, ρ, T, n_states=11, n_σ=5) print("State-price row sums:") @@ -410,71 +445,163 @@ print(np.round(P.sum(axis=1), 4)) print(f"Middle-state risk-free rate: {-np.log(P[5].sum()):.4f}") ``` +The row sums are the model-implied one-period bond prices in each current state. They +vary near the boundaries because the finite grid truncates and renormalizes the +conditional transition probabilities. 
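The mapping from row sums to state-dependent riskless rates can be spelled out on a standalone toy matrix (hypothetical numbers, independent of the `P` built above):

```python
import numpy as np

# A hypothetical 3-state state-price matrix; rows index the current state
P_toy = np.array([[0.50, 0.30, 0.15],
                  [0.25, 0.45, 0.27],
                  [0.12, 0.33, 0.52]])

# Row sum = price today of one sure dollar next period
bond_prices = P_toy.sum(axis=1)

# Continuously compounded one-period riskless rate in each state
risk_free = -np.log(bond_prices)

for i, (b, r) in enumerate(zip(bond_prices, risk_free)):
    print(f"state {i}: bond price {b:.3f}, riskless rate {r:.4f}")
```

States with smaller row sums carry higher riskless rates, which is exactly the state dependence visible in the printed row sums above.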
+ ### Applying the recovery theorem -The Recovery Theorem requires computing the **dominant eigenvector** of $P$. +The Recovery Theorem requires computing the **Perron eigenvector** of $P$. ```{code-cell} ipython3 -def recover_natural_distribution(P): - """Recover natural probabilities and the pricing kernel from state prices.""" +def recover_natural_distribution(P, tol=1e-10): + """ + Recover natural probabilities and the relative pricing kernel + from state prices. + """ + m = P.shape[0] eigenvalues, eigenvectors = eig(P) + eigenvalues = np.real_if_close(eigenvalues, tol=1000) + eigenvectors = np.real_if_close(eigenvectors, tol=1000) - # Ross recovery uses the Perron root and its strictly positive eigenvector. + # Ross recovery uses the Perron root and its strictly positive eigenvector real_mask = np.isreal(eigenvalues) - real_eigenvalues = eigenvalues[real_mask].real - real_eigenvectors = eigenvectors[:, real_mask].real + real_eigenvalues = np.asarray( + eigenvalues[real_mask].real, dtype=float) + real_eigenvectors = np.asarray( + eigenvectors[:, real_mask].real, dtype=float) + + order = np.argsort(real_eigenvalues)[::-1] + + for idx in order: + β_candidate = real_eigenvalues[idx] + z_candidate = real_eigenvectors[:, idx] - idx = np.argmax(real_eigenvalues) - δ_recovered = real_eigenvalues[idx] - z = real_eigenvectors[:, idx] + if np.mean(z_candidate) < 0: + z_candidate = -z_candidate - if np.mean(z) < 0: - z = -z + if β_candidate > 0 and np.all(z_candidate > tol): + β_recovered = β_candidate + z = z_candidate + break + else: + raise ValueError("No strictly positive real eigenvector found") z = z / z[m // 2] D = np.diag(1.0 / z) D_inv = np.diag(z) - # The diagonal similarity transform converts state prices into probabilities. 
- F = (1.0 / δ_recovered) * D @ P @ D_inv + # Converts state prices into probabilities + F = (1.0 / β_recovered) * D @ P @ D_inv - F = np.clip(F, 0, None) - F = F / F.sum(axis=1, keepdims=True) + min_entry = F.min() + row_sum_error = np.max(np.abs(F.sum(axis=1) - 1.0)) - # The kernel is reported relative to the middle state normalization. - φ = 1.0 / z + if min_entry < -tol: + raise ValueError(f"Recovered F has negative entries: min={min_entry}") - return F, z, δ_recovered, φ + if row_sum_error > 1e-8: + raise ValueError( + f"Recovered F row sums are not one: max error={row_sum_error}" + ) + + # The kernel relative to the middle state normalization + φ_relative = 1.0 / z + + return F, z, β_recovered, φ_relative ``` +There are two normalizations to keep separate. + +Ross's Table I reports the kernel shape +with the middle state normalized to one, which is $1/z_j$ under our normalization +$z_{\text{mid}}=1$. + +The actual one-period stochastic discount factor for a transition +from the middle state to state $j$ is $\beta/z_j$. + ```{code-cell} ipython3 -F, z, δ_rec, φ = recover_natural_distribution(P) - -print(f"Recovered discount factor δ = {δ_rec:.6f} (true: {np.exp(-δ):.6f})") -print(f"Recovered discount rate = {-np.log(δ_rec):.6f} (true: {δ:.6f})") -print("Recovered kernel φ:") -print(np.round(φ, 4)) -print("Row sums of recovered F:") -print(np.round(F.sum(axis=1), 6)) +F, z, β_rec, φ_relative = recover_natural_distribution(P) +ρ_rec = -np.log(β_rec) +φ_middle = β_rec * φ_relative + +print(f"Recovered discount factor β = {β_rec:.6f} (true: {np.exp(-ρ):.6f})") +print(f"Recovered discount rate ρ = {ρ_rec:.6f} (true: {ρ:.6f})") +print("Ross-normalized kernel 1/z (middle state = 1):") +print(np.round(φ_relative, 4)) +print("Actual one-period kernel from the middle state β × (1/z):") +print(np.round(φ_middle, 4)) ``` -### Visualizing natural vs. 
risk-neutral distributions +Because we know the data-generating natural transition matrix and pricing kernel +used to construct $P$, we can use them to verify that recovery works in this +simulation. -A key insight of {cite}`Ross2015` is that the natural distribution systematically -differs from the risk-neutral one. +In real data the natural transition matrix is unobserved, so these checks become +internal diagnostics combined with an assessment of the recovery assumptions. -In particular, the natural distribution stochastically dominates the risk-neutral -distribution (Theorem 3 in {cite}`Ross2015`). + +```{code-cell} ipython3 +def true_lognormal_transition_matrix(states, μ, σ, T): + """ + Construct the bounded-grid natural transition matrix used in the simulation. + """ + m = len(states) + ds = states[1] - states[0] + drift = (μ - 0.5 * σ**2) * T + F_true = np.zeros((m, m)) + + for i in range(m): + log_returns = states - states[i] + F_true[i] = norm.pdf(log_returns, loc=drift, + scale=σ * np.sqrt(T)) * ds + F_true[i] = F_true[i] / F_true[i].sum() + + return F_true + + +mid = len(states) // 2 +F_true = true_lognormal_transition_matrix(states, μ, σ, T) +φ_middle_true = np.exp(-ρ * T) * np.exp(-γ * (states - states[mid])) +P_reconstructed = β_rec * (z[:, None] / z[None, :]) * F + +print("Recovery diagnostics") +print(f"max |β_rec - exp(-ρT)| = {abs(β_rec - np.exp(-ρ * T)):.2e}") +print(f"max |φ_middle - true kernel| = " + f"{np.max(np.abs(φ_middle - φ_middle_true)):.2e}") +print(f"max |F - true F| = {np.max(np.abs(F - F_true)):.2e}") +print(f"max |P - recovered kernel times F| = " + f"{np.max(np.abs(P - P_reconstructed)):.2e}") +``` + +Indeed, the discrepancies are at the level of numerical roundoff. + +## Natural vs. risk-neutral distributions + +A key insight of {cite:t}`Ross2015` is that the natural distribution can differ +systematically from the risk-neutral one. 
+ +In this CRRA example, where states are ordered from low to high payoff, Theorem 3 of +{cite:t}`Ross2015` implies that the natural marginal density **first-order +stochastically dominates** the risk-neutral density: the CDF of the natural distribution +lies *below* that of the risk-neutral distribution. + +Because the pricing kernel is declining (investors fear bad outcomes), risk-neutral +probabilities overweight bad states and underweight good states relative to the natural +measure. + +We first plot the natural distribution against the risk-neutral one and the recovered +relative pricing kernel ```{code-cell} ipython3 mid = len(states) // 2 row_sums = P.sum(axis=1, keepdims=True) -# Normalize Arrow prices by the one-period riskless bond price in each state. +# Normalize Arrow prices by the one-period riskless bond price in each state Q_rn = P / row_sums f_nat = F[mid, :] @@ -484,59 +611,31 @@ gross_returns = np.exp(states) fig, axes = plt.subplots(1, 2, figsize=(14, 5)) -axes[0].plot(gross_returns, f_nat, 'b-o', ms=5, label='natural (recovered)', lw=2) -axes[0].plot(gross_returns, f_rn, 'r--s', ms=5, label='risk-neutral', lw=2) +axes[0].plot(gross_returns, f_nat, label='natural (recovered)', lw=2) +axes[0].plot(gross_returns, f_rn, label='risk-neutral', lw=2) axes[0].set_xlabel('gross return $S_T / S_0$') axes[0].set_ylabel('probability') axes[0].set_title('one-period marginal distributions') axes[0].legend() -axes[1].plot(gross_returns, φ, 'g-^', ms=5, lw=2) +axes[1].plot(gross_returns, φ_relative, 'g-^', lw=2) axes[1].set_xlabel('gross return $S_T / S_0$') -axes[1].set_ylabel('kernel $\\phi$ (relative)') -axes[1].set_title('recovered pricing kernel') +axes[1].set_ylabel('relative kernel $1/z$') +axes[1].set_title('recovered relative pricing kernel') plt.show() ``` -```{code-cell} ipython3 -E_nat = np.sum(f_nat * gross_returns) -E_rn = np.sum(f_rn * gross_returns) -std_nat = np.sqrt(np.sum(f_nat * (gross_returns - E_nat)**2)) -std_rn = np.sqrt(np.sum(f_rn * 
(gross_returns - E_rn)**2)) - -# The row sum is the price of a sure one-dollar payoff next period. -risk_free = np.sum(P[mid]) - -print("One-period summary") -print(f"{'':30s} {'Natural':>12s} {'Risk-Neutral':>12s}") -print("-" * 57) -print(f"{'Expected gross return':30s} {E_nat:>12.4f} {E_rn:>12.4f}") -print(f"{'Std dev':30s} {std_nat:>12.4f} {std_rn:>12.4f}") -print(f"{'Risk-free discount factor':30s} {risk_free:>12.4f}") -print(f"{'Annual risk-free rate':30s} {-np.log(risk_free):>12.4f}") -print(f"{'Arithmetic equity premium':30s} {E_nat - 1/risk_free:>12.4f}") -``` - -### Stochastic dominance - -Theorem 3 of {cite}`Ross2015` shows that the natural marginal density **first-order -stochastically dominates** the risk-neutral density: the CDF of the natural distribution -lies *below* that of the risk-neutral distribution. - -Because the pricing kernel is declining (investors fear bad outcomes), risk-neutral -probabilities overweight bad states and underweight good states relative to the natural -measure. +The CDF clearly shows the first-order stochastic dominance ```{code-cell} ipython3 cdf_nat = np.cumsum(f_nat) cdf_rn = np.cumsum(f_rn) fig, ax = plt.subplots(figsize=(9, 5)) -ax.plot(gross_returns, cdf_nat, 'b-o', ms=5, lw=2, label='natural cdf') -ax.plot(gross_returns, cdf_rn, 'r--s', ms=5, lw=2, label='risk-neutral cdf') +ax.plot(gross_returns, cdf_nat, lw=2, label='natural cdf') +ax.plot(gross_returns, cdf_rn, lw=2, label='risk-neutral cdf') ax.set_xlabel('gross return $S_T / S_0$') ax.set_ylabel('cumulative probability') -ax.set_title('stochastic dominance: natural cdf lies below risk-neutral cdf') ax.legend() plt.show() @@ -544,75 +643,22 @@ print(f"Natural CDF <= Risk-neutral CDF at all states: " f"{np.all(cdf_nat <= cdf_rn + 1e-10)}") ``` -## Extracting the pricing kernel and risk premium - -The pricing kernel recovered from $P$ via the Perron–Frobenius theorem has the following -interpretation. 
- -In the CRRA model the kernel is proportional to $\exp(-\gamma \cdot \text{log-return})$, -so it is decreasing in the return. - -The **log equity risk premium** computed below is the log expected gross return under -the natural measure minus the continuously compounded risk-free rate: - -$$ -\text{log ERP}_i -= \log\left(\sum_j f_{ij}\frac{S_j}{S_i}\right) - r_{f,i}, -\qquad -r_{f,i} = -\log\left(\sum_j p_{ij}\right). -$$ - -```{code-cell} ipython3 -def compute_risk_premia(P, F, states): - """Compute log equity premia and risk-free rates by starting state.""" - m = len(states) - gross_returns = np.exp(states) - - rf = np.zeros(m) - erp = np.zeros(m) - - for i in range(m): - discount = P[i].sum() - rf[i] = -np.log(discount) - - # Gross return from current state i to future state j. - relative_returns = np.exp(states - states[i]) - E_R_nat = np.sum(F[i] * relative_returns) - - erp[i] = np.log(E_R_nat) - rf[i] +The gap between the two CDFs is generated by the slope of the pricing kernel. - return erp, rf +In the +CRRA benchmark, this slope is controlled by the risk-aversion coefficient $\gamma$. +We next vary $\gamma$ to see how the recovered kernel and the natural/risk-neutral +wedge change. 
-erp, rf = compute_risk_premia(P, F, states) - -fig, axes = plt.subplots(1, 2, figsize=(13, 5)) - -axes[0].plot(np.exp(states), rf * 100, 'b-o', ms=5, lw=2) -axes[0].set_xlabel('current state $S / S_0$') -axes[0].set_ylabel('annual risk-free rate (%)') -axes[0].set_title('risk-free rate by state') - -axes[1].plot(np.exp(states), erp * 100, 'r-^', ms=5, lw=2) -axes[1].set_xlabel('current state $S / S_0$') -axes[1].set_ylabel('log equity risk premium (%)') -axes[1].set_title('recovered log equity risk premium by state') - -plt.show() - -mid = len(states) // 2 -print("Middle state:") -print(f" Risk-free rate approx {rf[mid]*100:.2f}% " - f"(calibration: {δ*100:.2f}%)") -print(f" Log equity premium approx {erp[mid]*100:.2f}% " - f"(calibration: {(μ-δ)*100:.2f}%)") -``` - -## Sensitivity analysis: effect of risk aversion +## Effect of risk aversion The shape of the pricing kernel, and hence the gap between natural and risk-neutral probabilities, depends on the coefficient of risk aversion $\gamma$. +We illustrate this by plotting the relative pricing kernel $1/z$ and the gap between +the natural and risk-neutral densities for a range of values of $\gamma$. 
+ ```{code-cell} ipython3 γs = [1.0, 2.0, 3.0, 5.0, 8.0] colors = cm.viridis(np.linspace(0.1, 0.9, len(γs))) @@ -620,8 +666,8 @@ colors = cm.viridis(np.linspace(0.1, 0.9, len(γs))) fig, axes = plt.subplots(1, 2, figsize=(14, 5)) for γ_val, color in zip(γs, colors): - P_g, states_g = build_state_price_matrix(μ, σ, γ_val, δ, T) - F_g, z_g, δ_g, φ_g = recover_natural_distribution(P_g) + P_g, states_g = build_state_price_matrix(μ, σ, γ_val, ρ, T) + F_g, z_g, β_g, φ_relative_g = recover_natural_distribution(P_g) mid_g = len(states_g) // 2 f_nat_g = F_g[mid_g, :] @@ -630,14 +676,14 @@ for γ_val, color in zip(γs, colors): gross = np.exp(states_g) - axes[0].plot(gross, φ_g, color=color, lw=2, + axes[0].plot(gross, φ_relative_g, color=color, lw=2, label=f'$\\gamma={γ_val:.0f}$') axes[1].plot(gross, f_nat_g - f_rn_g, color=color, lw=2, label=f'$\\gamma={γ_val:.0f}$') axes[0].set_xlabel('gross return') -axes[0].set_ylabel('kernel $\\phi$') -axes[0].set_title('pricing kernel vs risk aversion') +axes[0].set_ylabel('relative kernel $1/z$') +axes[0].set_title('relative pricing kernel vs risk aversion') axes[0].legend(fontsize=9) axes[1].axhline(0, color='k', lw=0.8, ls='--') @@ -649,51 +695,49 @@ axes[1].legend(fontsize=9) plt.show() ``` -The plots confirm the single-crossing property from Theorem 3 of {cite}`Ross2015`: for -returns below some threshold $v$, risk-neutral probability exceeds natural probability; -above $v$ the natural probability dominates. +Because the states are ordered from low to high payoff, the plots show the +single-crossing property from Theorem 3 of {cite}`Ross2015`: for returns below some +threshold $v$, risk-neutral probability exceeds natural probability; above $v$ the +natural probability dominates. A higher $\gamma$ amplifies this wedge. ## Recovering the discount rate -A useful by-product of the Recovery Theorem is the **recovered subjective discount -factor** $\delta$, which equals the Perron–Frobenius eigenvalue of $P$. 
+A useful by-product of the Recovery Theorem is the *recovered subjective discount +factor* $\beta$, which equals the Perron–Frobenius eigenvalue of $P$. -The corresponding continuously compounded discount rate is $-\log \delta$. +The corresponding continuously compounded discount rate is $\rho = -\log \beta$. -Corollary 1 of {cite}`Ross2015` states that $\delta$ is bounded above by the largest -observed interest factor (i.e., the maximum row sum of $P$): +Corollary 1 of {cite:t}`Ross2015` states that $\beta$ is bounded above by the largest +state-dependent one-period discount factor — equivalently, the maximum row sum of $P$: $$ -\delta \leq \max_i \sum_j p(\theta_i, \theta_j). +\beta \leq \max_i \sum_j p(\theta_i, \theta_j). $$ +Sweeping the true $\rho$ over a grid and reporting the recovered values alongside the +recovery error confirms that the eigenvalue calculation pins down $\beta$ accurately: + ```{code-cell} ipython3 -true_δs = np.linspace(0.00, 0.06, 13) -recovered_δs = [] - -for d in true_δs: - P_d, _ = build_state_price_matrix(μ, σ, γ=3.0, δ=d, T=1.0) - _, _, d_rec, _ = recover_natural_distribution(P_d) - recovered_δs.append(d_rec) - -plt.figure(figsize=(8, 5)) -plt.plot(true_δs * 100, true_δs * 100, 'k--', lw=1.5, label='45 deg line') -plt.plot(true_δs * 100, - [-np.log(d_r) * 100 for d_r in recovered_δs], - 'bo-', ms=6, lw=2, label='recovered $\\delta$') -plt.xlabel('true discount rate (%)') -plt.ylabel('recovered discount rate (%)') -plt.title('accuracy of recovered discount rate') -plt.legend() -plt.show() +true_ρs = np.linspace(0.00, 0.06, 13) +recovered_ρs = np.empty_like(true_ρs) + +for k, rho in enumerate(true_ρs): + P_d, _ = build_state_price_matrix(μ, σ, γ=3.0, ρ=rho, T=1.0) + _, _, β_d, _ = recover_natural_distribution(P_d) + recovered_ρs[k] = -np.log(β_d) + +print( + f"max |true ρ - recovered ρ| = {np.max(np.abs(true_ρs - recovered_ρs)):.2e}") +np.column_stack([true_ρs, recovered_ρs]) ``` ## Tail risk: natural vs. 
risk-neutral probabilities of catastrophe One of the most striking applications of the Recovery Theorem is its ability to separate -the market's genuine fear of catastrophes from the risk premium attached to them. +the market's recovered natural probability of catastrophes from the risk premium +attached to them. {cite:t}`barro2006rare` and {cite:t}`MehraPrescott1985` discuss how rare disasters might explain the equity premium puzzle. @@ -702,9 +746,19 @@ The risk-neutral probability of a large decline is elevated both because (a) the assigns a high natural probability to such events and (b) the pricing kernel upweights bad outcomes. -Ross's Recovery Machinery lets us decompose these two forces. +Ross's recovery machinery lets us decompose these two forces. + +The next cell plots left-tail probabilities under the recovered natural and the +risk-neutral measures from the middle state, so the gap between the curves isolates +the pricing-kernel contribution to crash probabilities. ```{code-cell} ipython3 +--- +mystnb: + figure: + caption: Tail probabilities under the recovered natural and risk-neutral measures + name: fig-tail-probs +--- thresholds = np.linspace(-0.40, 0.10, 200) def tail_prob(f_dist, states, threshold): @@ -712,9 +766,9 @@ def tail_prob(f_dist, states, threshold): return float(np.sum(f_dist[states <= threshold])) P_base, states_base = build_state_price_matrix( - μ, σ, γ=3.0, δ=0.02, T=1.0, + μ, σ, γ=3.0, ρ=0.02, T=1.0, n_states=41, n_σ=5) -F_base, z_base, δ_base, φ_base = recover_natural_distribution(P_base) +F_base, z_base, β_base, φ_relative_base = recover_natural_distribution(P_base) mid_b = len(states_base) // 2 f_nat_base = F_base[mid_b] @@ -724,75 +778,55 @@ prob_nat = [tail_prob(f_nat_base, states_base, t) for t in thresholds] prob_rn = [tail_prob(f_rn_base, states_base, t) for t in thresholds] fig, ax = plt.subplots(figsize=(10, 5)) -ax.plot(np.exp(thresholds), prob_nat, 'b-', lw=2, label='natural (recovered)') -ax.plot(np.exp(thresholds), 
prob_rn, 'r--', lw=2, label='risk-neutral') +ax.plot(np.exp(thresholds), prob_nat, lw=2, label='natural (recovered)') +ax.plot(np.exp(thresholds), prob_rn, lw=2, label='risk-neutral') ax.set_xlabel('gross return threshold') ax.set_ylabel('probability of decline below threshold') -ax.set_title('tail probabilities: natural vs. risk-neutral') ax.axvline(x=0.75, color='gray', ls=':', lw=1.5, label='25% decline') ax.axvline(x=0.70, color='silver', ls=':', lw=1.5, label='30% decline') ax.legend() plt.show() - -for thresh, label in [(-0.25, '25% decline'), (-0.30, '30% decline'), - (-0.10, '10% decline')]: - p_n = tail_prob(f_nat_base, states_base, thresh) - p_r = tail_prob(f_rn_base, states_base, thresh) - print(f"P(log-return < {thresh:.0%}): Natural = {p_n:.4f}, " - f"Risk-Neutral = {p_r:.4f}, Ratio = {p_r/p_n:.2f}x") ``` The risk-neutral density assigns higher probability to large drops than the recovered natural density. -The ratio captures the additional weight from risk aversion -- the premium investors -demand to bear tail risk. +In this CRRA +simulation, increasing risk aversion makes the risk-neutral crash probability rise +faster than the recovered natural crash probability. + +We will say more in {ref}`rt_ex3`. ## Testing efficient markets -{cite:t}`Ross2015` shows that once the pricing kernel is recovered, one obtains an **upper -bound on the Sharpe ratio** for any investment strategy: +{cite:t}`Ross2015` shows that once the pricing kernel is recovered, one obtains an *upper +bound on the Sharpe ratio* for strategies based on the stock-market filtration used in +recovery: $$ -\sigma(\phi) \geq e^{-rT} \frac{|\mu_\text{excess}|}{\sigma_\text{asset}}, +\frac{|\mu_\text{excess}|}{\sigma_\text{asset}} \leq e^{rT}\, \sigma(M), $$ -where $\sigma(\phi)$ is the standard deviation of the pricing kernel. +where $\sigma(M)$ is the standard deviation of the actual one-period stochastic discount +factor projected on that filtration. 
Arbitrary orthogonal noise in a candidate kernel
+does not tighten this market-efficiency bound.

-This follows from the Hansen–Jagannathan bound {cite}`Hansen_Jagannathan_1991`.
+This follows from the Hansen–Jagannathan bound of {cite:t}`Hansen_Jagannathan_1991`.

-Equivalently, the $R^2$ of any return-forecasting regression using publicly available
-information is bounded above by the variance of the pricing kernel:
+Equivalently, under the Recovery Theorem assumptions, the $R^2$ of return-forecasting
+regressions based on that information set is bounded above by the variance of the
+pricing kernel:

$$
-R^2 \leq e^{2rT} \, \mathrm{Var}(\phi).
+R^2 \leq e^{2rT} \, \mathrm{Var}(M).
$$

-```{code-cell} ipython3
-def kernel_variance(φ, f_nat):
-    """Return Var(φ) and E[φ]."""
-    E_φ = np.sum(φ * f_nat)
-    E_φ2 = np.sum(φ**2 * f_nat)
-    return E_φ2 - E_φ**2, E_φ
-
-
-var_φ, E_φ = kernel_variance(φ_base, f_nat_base)
-std_φ = np.sqrt(var_φ)
-
-print("Pricing kernel statistics:")
-print(f"  E[φ]   = {E_φ:.4f}")
-print(f"  Var(φ) = {var_φ:.4f}")
-print(f"  Std(φ) = {std_φ:.4f}")
-print(f"\nHansen-Jagannathan bound on Sharpe ratio: {std_φ:.4f}")
-print(f"Upper bound on R^2 in return forecasting: {var_φ:.4f}")
-```
-
## Limitations and extensions

The Recovery Theorem is a remarkable theoretical result, but several caveats apply in
practice.

-**Finite state space.**
+*Finite state space:*

The theorem requires a bounded, irreducible Markov chain.

@@ -801,7 +835,7 @@
because any exponential $e^{\alpha x}$ satisfies the characteristic equation.

{cite:t}`CarrYu2012` establish recovery with a bounded diffusion.

-**Transition independence.**
+*Transition independence:*

If the kernel is not transition independent, recovery is not guaranteed.

@@ -809,20 +843,15 @@
long-run risk component of the kernel with the natural probability distribution,
yielding an incorrect decomposition.
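A minimal numerical sketch makes this concrete; the distortion matrix `noise` below is hypothetical and serves only to break transition independence, so the recovered matrix no longer matches the true transition probabilities:

```python
import numpy as np

# True transition probabilities and a transition-independent kernel component
F_true = np.array([[0.6, 0.3, 0.1],
                   [0.2, 0.6, 0.2],
                   [0.1, 0.3, 0.6]])
m = np.array([1.3, 1.0, 0.8])
S_ti = 0.99 * m[None, :] / m[:, None]

# Hypothetical multiplicative distortion that is NOT of the form m_i * n_j,
# so the full kernel S_ti * noise is not transition independent
noise = np.array([[1.1, 0.9, 1.0],
                  [1.0, 1.2, 0.8],
                  [0.9, 1.0, 1.1]])

P_state = S_ti * noise * F_true          # state price matrix

# Perron–Frobenius recovery applied to the distorted state prices
vals, vecs = np.linalg.eig(P_state)
k = np.argmax(vals.real)                 # Perron root of a positive matrix
z = np.abs(vecs[:, k].real)              # positive Perron eigenvector
F_rec = (P_state / vals[k].real) * z[None, :] / z[:, None]

print("max |F_rec - F_true| =", np.abs(F_rec - F_true).max())
```

The recovered `F_rec` is still a valid stochastic matrix, but it differs from `F_true`: the eigenvector can only absorb kernel components of the ratio form $\beta z_i/z_j$.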
-**Empirical estimation.**
+*Empirical estimation:*

Extracting reliable state prices from observed option prices requires careful
interpolation and extrapolation.

The mapping from implied volatilities to state prices via the
-{cite}`BreedenLitzenberger1978` formula involves second derivatives, which amplify
+{cite:t}`BreedenLitzenberger1978` formula involves second derivatives, which amplify
measurement error.

-**State dependence.**
-
-The state must capture all relevant variables: the level of volatility, not just the
-current index level, is an important state variable for equity options.
-
## Exercises

```{exercise}
:label: rt_ex1

Consider the $3 \times 3$ state price matrix

$$
P = \begin{pmatrix}
0.5950 & 0.1700 & 0.0272 \\
-0.1594 & 0.5525 & 0.1360 \\
-0.0664 & 0.3188 & 0.5525
+0.159375 & 0.5525 & 0.1360 \\
+0.06640625 & 0.31875 & 0.5525
\end{pmatrix}.
$$

-(a) Compute the dominant eigenvalue $\delta$ and the corresponding eigenvector $z$ of
+(a) Compute the Perron eigenvalue $\beta$ and the corresponding eigenvector $z$ of
$P$.

(b) Use $z$ to recover the natural probability transition matrix $F$ via

$$
-f_{ij} = \frac{1}{\delta} \frac{z_j}{z_i} p_{ij}.
+f_{ij} = \frac{1}{\beta} \frac{z_j}{z_i} p_{ij}.
$$

(c) Verify that each row of $F$ sums to one and all entries are positive.

-(d) Compute the pricing kernel $\phi_i = 1/z_i$ for each state.
+(d) Compute the relative kernel component $1/z_i$ for each state. For a transition from
+state $i$ to state $j$, the full pricing kernel is $\beta z_i/z_j$.

Does the kernel decrease as we move from state 1 to state 3 (i.e., from bad to good
states)?
```

```{solution-start} rt_ex1
:class: dropdown ``` -```{code-cell} ipython3 -import numpy as np -from scipy.linalg import eig +Here is one solution: +```{code-cell} ipython3 P_ex = np.array([ [0.5950, 0.1700, 0.0272], [0.159375, 0.5525, 0.1360], @@ -877,18 +906,18 @@ real_ev = eigenvalues[real_mask].real real_evec = eigenvectors[:, real_mask].real idx = np.argmax(real_ev) -δ_ex = real_ev[idx] +β_ex = real_ev[idx] z_ex = real_evec[:, idx] if z_ex.min() < 0: z_ex = -z_ex z_ex = z_ex / z_ex[1] -print(f"δ = {δ_ex:.6f}") +print(f"β = {β_ex:.6f}") print(f"z = {z_ex}") D_ex = np.diag(1.0 / z_ex) D_inv_ex = np.diag(z_ex) -F_ex = (1.0 / δ_ex) * D_ex @ P_ex @ D_inv_ex +F_ex = (1.0 / β_ex) * D_ex @ P_ex @ D_inv_ex print("\nRecovered F:") print(np.round(F_ex, 4)) @@ -896,9 +925,9 @@ print(np.round(F_ex, 4)) print(f"\nRow sums: {np.round(F_ex.sum(axis=1), 8)}") print(f"Nonnegative: {(F_ex >= -1e-10).all()}") -φ_ex = 1.0 / z_ex -print(f"\nφ = {np.round(φ_ex, 4)}") -print(f"Decreasing: {φ_ex[0] > φ_ex[1] > φ_ex[2]}") +φ_relative_ex = 1.0 / z_ex +print(f"\nrelative kernel 1/z = {np.round(φ_relative_ex, 4)}") +print(f"Decreasing: {φ_relative_ex[0] > φ_relative_ex[1] > φ_relative_ex[2]}") ``` ```{solution-end} @@ -918,31 +947,23 @@ starting from state 2 (index 1 in Python). (b) Compute the CDFs $\hat F_k = \sum_{j \leq k} f_j$ and $\hat Q_k = \sum_{j \leq k} q_j$ for each state. -(c) Verify numerically that $\hat F_k \leq \hat Q_k$ for every $k$, confirming that the -natural distribution stochastically dominates the risk-neutral distribution (Theorem 3 -of {cite}`Ross2015`). +(c) Verify numerically that $\hat F_k \leq \hat Q_k$ for every $k$, confirming stochastic +dominance in this ordered three-state example. 
``` ```{solution-start} rt_ex2 :class: dropdown ``` -```{code-cell} ipython3 -import numpy as np - -P_ex = np.array([ - [0.5950, 0.1700, 0.0272], - [0.159375, 0.5525, 0.1360], - [0.06640625, 0.31875, 0.5525] -]) +Here is one solution: -from scipy.linalg import eig +```{code-cell} ipython3 eigenvalues, eigenvectors = eig(P_ex) real_mask = np.isreal(eigenvalues) real_ev = eigenvalues[real_mask].real real_evec = eigenvectors[:, real_mask].real idx = np.argmax(real_ev) -δ_ex = real_ev[idx] +β_ex = real_ev[idx] z_ex = real_evec[:, idx] if z_ex.min() < 0: z_ex = -z_ex @@ -950,9 +971,7 @@ z_ex = z_ex / z_ex[1] D_ex = np.diag(1.0 / z_ex) D_inv_ex = np.diag(z_ex) -F_ex = (1.0 / δ_ex) * D_ex @ P_ex @ D_inv_ex -F_ex = np.clip(F_ex, 0, None) -F_ex /= F_ex.sum(axis=1, keepdims=True) +F_ex = (1.0 / β_ex) * D_ex @ P_ex @ D_inv_ex start = 1 f_marg = F_ex[start] @@ -981,7 +1000,7 @@ print(f"\nNatural CDF <= risk-neutral CDF: {dominates}") **Risk aversion and tail risk.** -Write a function `tail_risk_ratio(γ, threshold, μ, σ, δ, T)` that: +Write a function `tail_risk_ratio(γ, threshold, μ, σ, ρ, T)` that: 1. Constructs the state price matrix $P$ using `build_state_price_matrix` with the given parameters and `n_states=41`. @@ -991,7 +1010,7 @@ Write a function `tail_risk_ratio(γ, threshold, μ, σ, δ, T)` that: 4. Returns the ratio $p_\text{risk-neutral} / p_\text{natural}$. Using this function, plot the ratio as a function of $\gamma \in [1, 10]$ for a -threshold of $-30\%$ (i.e., `threshold = -0.30`). +30 percent simple decline, i.e. `threshold = np.log(0.70)`. Explain the economic interpretation: why does a higher $\gamma$ raise the ratio? ``` @@ -1000,25 +1019,23 @@ Explain the economic interpretation: why does a higher $\gamma$ raise the ratio? 
:class: dropdown ``` -```{code-cell} ipython3 -import numpy as np -import matplotlib.pyplot as plt - +Here is one solution: -def tail_risk_ratio(γ, threshold, μ=0.08, σ=0.20, δ=0.02, T=1.0): +```{code-cell} ipython3 +def tail_risk_ratio(γ, threshold, μ=0.08, σ=0.20, ρ=0.02, T=1.0): """Risk-neutral / natural left-tail probability.""" P_g, states_g = build_state_price_matrix( - μ, σ, γ, δ, T, n_states=41, n_σ=5) + μ, σ, γ, ρ, T, n_states=41, n_σ=5) F_g, _, _, _ = recover_natural_distribution(P_g) mid_g = len(states_g) // 2 f_nat_g = F_g[mid_g] - f_rn_g = P_g[mid_g] / P_g[mid_g].sum() + f_rn_g = P_g[mid_g] / P_g[mid_g].sum() p_nat = float(np.sum(f_nat_g[states_g <= threshold])) - p_rn = float(np.sum(f_rn_g[states_g <= threshold])) + p_rn = float(np.sum(f_rn_g[states_g <= threshold])) if p_nat < 1e-12: return np.nan @@ -1026,31 +1043,26 @@ def tail_risk_ratio(γ, threshold, μ=0.08, σ=0.20, δ=0.02, T=1.0): γs = np.linspace(1.0, 10.0, 20) -ratios = [tail_risk_ratio(g, -0.30) for g in γs] +threshold_30 = np.log(0.70) +ratios = [tail_risk_ratio(g, threshold_30) for g in γs] plt.figure(figsize=(9, 5)) -plt.plot(γs, ratios, 'b-o', ms=5, lw=2) +plt.plot(γs, ratios, '-o', ms=5, lw=2) plt.xlabel('risk aversion coefficient $\\gamma$') plt.ylabel('risk-neutral / natural tail probability') plt.title('tail risk ratio for a 30% decline vs risk aversion') plt.show() - -print(f"Ratio at γ=1.0: {tail_risk_ratio(1.0, -0.30):.2f}") -print(f"Ratio at γ=5.0: {tail_risk_ratio(5.0, -0.30):.2f}") -print(f"Ratio at γ=10.0: {tail_risk_ratio(10.0, -0.30):.2f}") ``` -**Economic interpretation.** - A higher coefficient of risk aversion $\gamma$ makes the pricing kernel steeper: the market assigns a larger premium per unit of probability to bad-state payoffs. -Risk-neutral probabilities, which incorporate this premium, overstate the natural -probability of a crash by a factor that grows rapidly with $\gamma$. 
+Risk-neutral probabilities incorporate this premium, so in this CRRA simulation the +risk-neutral crash probability rises faster with $\gamma$ than the recovered natural +crash probability. -This is the "dark matter" of finance: the high risk-neutral probability of a crash seen -in option prices can be attributed mostly to risk aversion rather than a genuinely -elevated natural probability of a catastrophe. +Recovery separates the market's estimated natural crash probability from the +pricing-kernel premium attached to crash states. ```{solution-end} ``` From ef5c278f6ea8522bdf76117ffb87225d1a30607c Mon Sep 17 00:00:00 2001 From: HumphreyYang Date: Mon, 27 Apr 2026 00:16:04 +0800 Subject: [PATCH 19/26] updates Co-authored-by: Copilot --- lectures/misspecified_recovery.md | 2169 ++++++++++++++--------------- lectures/ross_recovery.md | 83 +- 2 files changed, 1141 insertions(+), 1111 deletions(-) diff --git a/lectures/misspecified_recovery.md b/lectures/misspecified_recovery.md index d71a9bd67..aa56f8ca1 100644 --- a/lectures/misspecified_recovery.md +++ b/lectures/misspecified_recovery.md @@ -20,7 +20,7 @@ kernelspec: ``` -# Misspecified Recovery +# Misspecified recovery ```{contents} Contents :depth: 2 @@ -28,829 +28,694 @@ kernelspec: ## Overview -Asset prices are forward-looking: they encode investors' expectations about future -economic states and their valuations of different risks. +The lecture {doc}`ross_recovery` studies conditions under which recovery is valid. -A long-standing question in finance is whether one can *recover* the probability -distribution used by investors -- their subjective beliefs -- from observed asset prices -alone. +There, **transition independence** lets us use Arrow prices to separate investors' +beliefs from the pricing kernel. -{cite:t}`BorovickaHansenScheinkman2016` study the challenge of separating investors' -beliefs from their risk preferences using **Perron–Frobenius theory**. 
+This lecture asks what the same Perron--Frobenius calculation delivers when transition +independence fails. -Their key finding is that Perron–Frobenius theory applied to Arrow prices recovers a -**long-term risk-neutral measure** that absorbs all long-horizon risk adjustments. +{cite:t}`BorovickaHansenScheinkman2016` show that the stochastic discount factor can be +decomposed into three pieces: a deterministic long-run discount component, a +state-dependent eigenfunction ratio, and a martingale likelihood ratio. -This recovered measure coincides with investors' subjective beliefs only under a -stringent -- and often empirically implausible -- restriction on the stochastic discount -factor. +The first two pieces are exactly what the Perron--Frobenius eigenpair can absorb. -After completing this lecture you will be able to: +The martingale piece is different: it changes the probability measure. -- Explain why Arrow prices alone cannot identify both transition probabilities and stochastic - discount factors without additional restrictions. -- Construct **risk-neutral** and **long-term risk-neutral** transition matrices from Arrow - prices using the Perron–Frobenius eigenvalue–eigenvector decomposition. -- Decompose any stochastic discount factor process into a trend component, a state-dependent - component, and a **martingale component**, and explain what the martingale encodes. -- Identify the exact condition under which {cite}`Ross2015`'s Recovery Theorem succeeds, - and show that this condition fails in empirically relevant models with recursive utility - or permanent consumption shocks. -- Simulate the {cite}`Bansal_Yaron_2004` long-run risk model and compare the stationary - distributions under the physical and recovered probability measures. +In their words, it produces a probability measure that "absorbs long-term risk +adjustments" {cite}`BorovickaHansenScheinkman2016`. 
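+In symbols, and anticipating notation defined below, the accumulated stochastic
+discount factor admits the multiplicative decomposition
+
+$$
+S_t = \exp(\hat{\eta} t) \left(\frac{\hat{e}(X_0)}{\hat{e}(X_t)}\right)
+      \left(\frac{\hat{H}_t}{\hat{H}_0}\right),
+$$
+
+where $\exp(\hat{\eta} t)$ is the deterministic long-run discount component,
+$\hat{e}$ is the Perron--Frobenius eigenfunction, and $\hat{H}$ is the martingale
+component.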
-### Related lectures +Thus the probabilities recovered from Arrow prices need not be the correctly specified +transition probabilities for the state process. -- {doc}`affine_risk_prices`: affine models of the stochastic discount factor and term structure. -- {doc}`markov_asset`: Markov asset pricing and stationary equilibria. -- {doc}`harrison_kreps`: risk-neutral pricing and the change-of-measure approach. +Instead, they can already include compensation for long-run risk. -## Setup +The likelihood ratio between the recovered probabilities and the correctly specified +probabilities is the martingale component. -```{code-cell} ipython3 -import numpy as np -import matplotlib.pyplot as plt -import matplotlib.cm as cm -from scipy import linalg -from scipy.stats import gaussian_kde -import warnings -warnings.filterwarnings('ignore') - -plt.rcParams.update({ - 'axes.spines.top': False, - 'axes.spines.right': False, - 'font.size': 11, - 'figure.dpi': 110, -}) -``` - -## Arrow prices and the identification challenge - -### Arrow prices and stochastic discount factors - -Consider a discrete-time economy with an $n$-state Markov chain $\{X_t\}$ governed by -transition matrix $\mathbf{P} = [p_{ij}]$. - -An **Arrow price** $q_{ij}$ is the date-$t$ price of a claim that pays $\$1$ tomorrow in -state $j$ given that the current state is $i$. - -Collect these prices in a matrix $\mathbf{Q} = [q_{ij}]$. +If that martingale is constant, Ross recovery returns the correctly specified +transition probabilities. -A **stochastic discount factor** (SDF) $s_{ij}$ prices risk by discounting the payoff in -state $j$ tomorrow when today's state is $i$. +If it is not constant, the recovered measure embeds long-horizon risk adjustments. -Arrow prices and the SDF are linked by +In the examples below, this typically shifts probability toward adverse long-run-risk +states, so the recovered measure looks more pessimistic than the correctly specified +probability law. 
-$$ -q_{ij} = s_{ij} \, p_{ij}. -$$ - -Given $\mathbf{Q}$, any pair $(\mathbf{S}, \mathbf{P})$ satisfying -$q_{ij} = s_{ij} p_{ij}$ for all $(i,j)$ is consistent with the observed prices. - -The fundamental identification problem is that $\mathbf{Q}$ has $n^2$ entries, -$\mathbf{P}$ has $n(n-1)$ free entries (rows sum to one), and $\mathbf{S}$ has $n^2$ -free entries -- so there are far more unknowns than equations. - -To make progress, we can impose restrictions on the SDF. - -Two classical restrictions are studied in the sections that follow. +We will: -### A three-state illustration - -To build intuition, we work with a three-state Markov chain representing **recession**, -**normal**, and **expansion** phases of the business cycle. - -The physical transition matrix and consumption levels are: +- use results from {doc}`ross_recovery` without re-proving it, +- diagnose misspecification through the likelihood-ratio martingale, +- show why recursive utility and permanent shocks break recovery, +- measure the difference in a long-run risk model. 
```{code-cell} ipython3 -P_phys = np.array([ - [0.70, 0.25, 0.05], # from recession - [0.15, 0.65, 0.20], # from normal - [0.05, 0.30, 0.65], # from expansion -]) - -c_levels = np.array([0.85, 1.00, 1.15]) -state_names = ['recession', 'normal', 'expansion'] - -δ = -np.log(0.99) # monthly subjective discount rate, so exp(-δ) = 0.99 -γ = 5.0 # coefficient of relative risk aversion - -n = len(c_levels) -Q_mat = np.zeros((n, n)) -for i in range(n): - for j in range(n): - Q_mat[i, j] = np.exp(-δ) * (c_levels[j] / c_levels[i])**(-γ) * P_phys[i, j] - -print("Arrow price matrix Q") -print(np.round(Q_mat, 5)) -print("Risk-free discount factors:", Q_mat.sum(axis=1).round(5)) +import numpy as np +import matplotlib.pyplot as plt +from scipy import linalg +from scipy.integrate import solve_ivp +from scipy.stats import gaussian_kde ``` -## Risk-neutral probabilities - -The **risk-neutral restriction** sets - -$$ -\bar{s}_{i,j} = \bar{q}_i -$$ - -where $\bar{q}_i = \sum_j q_{ij}$ is the price of a one-period discount bond in state -$i$. - -Under this restriction all future states are discounted equally from state $i$, so risk -adjustments depend only on the current state. - -The resulting risk-neutral probabilities are - -$$ -\bar{p}_{ij} = \frac{q_{ij}}{\bar{q}_i}. -$$ +The next cell contains code inherited from the previous lecture: row-normalizing Arrow +prices, finding a positive Perron pair, and computing stationary distributions. 
```{code-cell} ipython3 +:tags: [hide-input] + def risk_neutral_probs(Q): """Normalize Arrow prices by one-period bond prices.""" q_bonds = Q.sum(axis=1) - P_bar = Q / q_bonds[:, np.newaxis] + P_bar = Q / q_bonds[:, None] return P_bar, q_bonds -P_bar, q_bonds = risk_neutral_probs(Q_mat) +def perron_frobenius(Q): + """Positive Perron pair and induced long-term risk-neutral transition matrix.""" + eigenvalues, eigenvectors = linalg.eig(Q) + eigenvalues = np.real_if_close(eigenvalues, tol=1000) + eigenvectors = np.real_if_close(eigenvectors, tol=1000) -print("One-period bond prices:") -for i, (s, qb) in enumerate(zip(state_names, q_bonds)): - print(f" {s:12s}: {qb:.5f} (annualized yield ~ {-np.log(qb)*12:.2%})") + real_mask = np.isreal(eigenvalues) + vals = np.asarray(eigenvalues[real_mask].real, dtype=float) + vecs = np.asarray(eigenvectors[:, real_mask].real, dtype=float) + + for idx in np.argsort(vals)[::-1]: + exp_eta = vals[idx] + e = vecs[:, idx] + if e.sum() < 0: + e = -e + if exp_eta > 0 and np.all(e > 0): + break + else: + raise ValueError("No strictly positive Perron eigenvector found") -print("\nRisk-neutral P_bar:") -print(np.round(P_bar, 4)) -print("Row sums:", P_bar.sum(axis=1)) -``` + e = e / e.sum() + eta = np.log(exp_eta) + P_hat = (1 / exp_eta) * Q * e[None, :] / e[:, None] -```{note} -Risk-neutral probabilities absorb **one-period** (short-run) risk adjustments. + if np.max(np.abs(P_hat.sum(axis=1) - 1)) > 1e-8: + raise ValueError("Recovered transition matrix is not stochastic") + if P_hat.min() < -1e-10: + raise ValueError("Recovered transition matrix has negative entries") -They are widely used in financial engineering but are generally *not* equal to -investors' beliefs. + return eta, exp_eta, e, P_hat -When short-term interest rates vary across states, risk-neutral probabilities are also -horizon-dependent: the $t$-period forward measure differs from $\bar{\mathbf{P}}^t$. 
-``` -## Long-term risk-neutral probabilities: Perron–Frobenius theory +def stationary_dist(P): + """Stationary distribution of an ergodic transition matrix.""" + n = P.shape[0] + A = P.T - np.eye(n) + A[-1] = 1 + b = np.zeros(n) + b[-1] = 1 + return linalg.solve(A, b) -### The eigenvalue problem -The long-term behavior of discount factors is governed by a different restriction. +def martingale_increment(Q, P): + """Likelihood-ratio increment from actual to recovered probabilities.""" + eta, exp_eta, e, P_hat = perron_frobenius(Q) + H = np.ones_like(P) + mask = P > 0 + H[mask] = P_hat[mask] / P[mask] + return H, eta, e, P_hat +``` -**Long-term risk pricing** sets +## One-period and long-term risk-neutral matrices -$$ -\hat{s}_{ij} = \exp(\hat{\eta}) \frac{\hat{e}_i}{\hat{e}_j} -$$ +Let $\mathbf{P}=[p_{ij}]$ denote the correctly specified transition matrix and +$\mathbf{Q}=[q_{ij}]$ the Arrow price matrix. -for a scalar $\hat{\eta}$ and a vector of positive numbers $\{\hat{e}_i\}$. +Here "correctly specified" means the transition law that actually governs the Markov +state in the model. -Substituting into $q_{ij} = s_{ij} p_{ij}$ gives: +The one-period stochastic discount factor (SDF) satisfies $$ -\hat{p}_{ij} = \exp(-\hat{\eta}) \, q_{ij} \, \frac{\hat{e}_j}{\hat{e}_i}. +q_{ij} = s_{ij} p_{ij}. $$ -For $\hat{\mathbf{P}}$ to be a valid transition matrix (rows summing to one), we need -$\sum_j \hat{p}_{ij} = 1$, which requires +We will compare $\mathbf{P}$ with two probability matrices constructed from the same +Arrow price matrix $\mathbf{Q}$. + +First, the **one-period risk-neutral matrix** divides each row of $\mathbf{Q}$ by the +price of a one-period discount bond in the current state: $$ -\sum_j q_{ij} \hat{e}_j = \exp(\hat{\eta}) \hat{e}_i, \quad \text{i.e.,} \quad \mathbf{Q} \hat{\mathbf{e}} = \exp(\hat{\eta}) \hat{\mathbf{e}}. +\bar p_{ij} += \frac{q_{ij}}{\sum_k q_{ik}}. 
$$ -This is an **eigenvalue–eigenvector problem** for the Arrow price matrix $\mathbf{Q}$. +This matrix absorbs one-period risk adjustments into transition probabilities. -The next theorem is the mathematical reason this construction is well defined. +Second, the **long-term risk-neutral matrix** uses the positive Perron eigenpair of +$\mathbf{Q}$. -It is not yet a theorem about recovering investors' true beliefs. +Let $(\exp(\hat \eta), \hat e)$ solve -Instead, it proves that a positive pricing operator has one distinguished positive -eigenvalue-eigenvector pair. +$$ +\mathbf{Q}\hat e = \exp(\hat \eta)\hat e. +$$ -The proof idea, stated informally, is that a positive matrix maps the positive cone back -into itself. +Then define -Repeatedly applying the matrix and renormalizing pushes all positive vectors toward the -same ray; the expansion rate along that ray is the Perron root. +$$ +\hat p_{ij} += \exp(-\hat \eta) q_{ij} \frac{\hat e_j}{\hat e_i}. +$$ -In this lecture we use that ray to define the state-dependent component -$\hat{\mathbf e}$ and use the expansion rate to define the long-run discount rate -$\hat{\eta}$. +This construction removes the state-dependent Perron eigenfunction from Arrow prices +and returns a stochastic matrix $\hat{\mathbf{P}}$. -```{prf:theorem} Perron--Frobenius -:label: thm-pf-mis +In {doc}`ross_recovery`, transition independence pins down the split between $s_{ij}$ +and $p_{ij}$. -If $A$ is a matrix with strictly positive entries, then +Here we drop transition independence. -1. $A$ has a unique largest positive real eigenvalue $r$ (the Perron root). -2. There exists a strictly positive eigenvector $e \gg 0$ with $Ae = re$, unique up to scaling. -``` +The question is whether $\hat{\mathbf{P}}$ still +equals the correctly specified transition matrix $\mathbf{P}$. -By {prf:ref}`thm-pf-mis`, the eigenvalue problem for $\mathbf{Q}$ has a unique solution. 
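As a standalone sanity check of this construction (re-deriving the Perron pair directly rather than relying on the helper functions above), any strictly positive price matrix yields a valid stochastic matrix $\hat{\mathbf{P}}$; the entries of `Q` below are arbitrary, not calibrated to anything:

```python
import numpy as np

rng = np.random.default_rng(0)
Q = 0.99 * rng.uniform(0.1, 1.0, size=(4, 4))   # arbitrary positive "price" matrix

vals, vecs = np.linalg.eig(Q)
k = np.argmax(vals.real)                         # Perron root of a positive matrix
exp_eta, e = vals[k].real, np.abs(vecs[:, k].real)
P_hat = (Q / exp_eta) * e[None, :] / e[:, None]

print(P_hat.sum(axis=1))   # each row sums to one because Q e = exp(eta) e
```

Row sums equal one by construction: summing $\hat p_{ij}$ over $j$ reproduces the eigenvalue equation divided by $\exp(\hat\eta)\hat e_i$.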
+### Where recovery works -What has been proved at this stage is uniqueness of the long-term risk-neutral -construction, not equality between $\hat{\mathbf P}$ and the physical transition matrix -$\mathbf P$. +We start with a three-state economy: recession, normal, and expansion. -This gives a unique construction: +The correctly specified transition matrix is deliberately simple. -1. Solve $\mathbf{Q} \hat{\mathbf{e}} = \exp(\hat{\eta}) \hat{\mathbf{e}}$ for the - dominant eigenvalue–eigenvector pair. -2. Set $\hat{p}_{ij} = \exp(-\hat{\eta}) \, q_{ij} \, \hat{e}_j / \hat{e}_i$. +For trend-stationary consumption and power utility, the SDF is -{cite:t}`BorovickaHansenScheinkman2016` call the resulting $\hat{\mathbf{P}}$ the -**long-term risk-neutral measure** because, under $\hat{\mathbf{P}}$, the long-horizon -risk premia on stochastically growing cash flows are identically zero. +$$ +s_{ij}=A\left(\frac{c_j}{c_i}\right)^{-\gamma}. +$$ -### Python implementation +This is a case where Ross recovery should return the correctly specified transition +matrix. ```{code-cell} ipython3 -def perron_frobenius(Q): - """Return the Perron root, eigenvector, and long-term risk-neutral matrix.""" - eigenvalues, eigenvectors = linalg.eig(Q) - - # Use the positive Perron eigenpair and discard numerical complex roots. - real_mask = np.isreal(eigenvalues) - real_eigenvalues = eigenvalues[real_mask].real - real_eigenvectors = eigenvectors[:, real_mask].real - - idx = np.argmax(real_eigenvalues) - exp_η = real_eigenvalues[idx] - e_hat = real_eigenvectors[:, idx] - - if e_hat.sum() < 0: - e_hat = -e_hat - if np.any(e_hat <= 0): - raise ValueError("Dominant eigenvector is not strictly positive.") - e_hat = e_hat / e_hat.sum() - - η_hat = np.log(exp_η) - - # Change measure using the Perron eigenfunction. 
- P_hat = (1.0 / exp_η) * Q * e_hat[np.newaxis, :] / e_hat[:, np.newaxis] - - return η_hat, exp_η, e_hat, P_hat +P_true = np.array([ + [0.70, 0.25, 0.05], + [0.15, 0.65, 0.20], + [0.05, 0.30, 0.65], +]) +c_levels = np.array([0.997, 1.000, 1.003]) +state_names = ['recession', 'normal', 'expansion'] -η_hat, exp_η, e_hat, P_hat = perron_frobenius(Q_mat) +δ = -np.log(0.99) # monthly subjective discount rate +γ_power = 5.0 # risk aversion +g_c = 0.002 # monthly trend growth -print(f"exp(η_hat) = {exp_η:.6f}") -print(f"η_hat = {η_hat:.5f} (annualized ~ {η_hat*12:.4f})") -print(f"e_hat = {e_hat.round(5)}") -print("\nLong-term risk-neutral P_hat:") -print(np.round(P_hat, 4)) -print("Row sums:", P_hat.sum(axis=1)) +# Price Arrow claims as actual probabilities times the power-utility SDF +S_power = ( + np.exp(-δ - γ_power * g_c) + * (c_levels[None, :] / c_levels[:, None])**(-γ_power) +) +Q_power = S_power * P_true ``` -### Comparing the three probability measures - -```{code-cell} ipython3 -fig, axes = plt.subplots(1, 3, figsize=(15, 4.5)) - -matrices = [ - (P_phys, r'physical $\mathbf{P}$', 'Blues'), - (P_bar, r'risk-neutral $\bar{\mathbf{P}}$', 'Oranges'), - (P_hat, r'long-term risk-neutral $\hat{\mathbf{P}}$', 'Greens'), -] - -for ax, (mat, title, cmap) in zip(axes, matrices): - im = ax.imshow(mat, cmap=cmap, vmin=0, vmax=0.85, aspect='auto') - ax.set_title(title, fontsize=12, pad=10) - ax.set_xticks(range(n)); ax.set_yticks(range(n)) - ax.set_xticklabels(state_names, rotation=20, fontsize=9) - ax.set_yticklabels(state_names, fontsize=9) - ax.set_xlabel('next state', fontsize=9) - ax.set_ylabel('current state', fontsize=9) - plt.colorbar(im, ax=ax, fraction=0.046, pad=0.04) - for i in range(n): - for j in range(n): - clr = 'white' if mat[i, j] > 0.45 else 'black' - ax.text(j, i, f'{mat[i,j]:.3f}', ha='center', va='center', - fontsize=9, color=clr) - -plt.suptitle('transition matrices under alternative probability measures', - fontsize=13, y=1.02) -plt.tight_layout() 
-plt.show() -``` +We now compute both risk-neutral matrices from the same Arrow price matrix. ```{code-cell} ipython3 -def stationary_dist(P): - """Stationary distribution of an ergodic transition matrix.""" - n = P.shape[0] - A = (P.T - np.eye(n)) - A[-1] = 1.0 - b = np.zeros(n); b[-1] = 1.0 - return linalg.solve(A, b) - -π_phys = stationary_dist(P_phys) +P_bar, q_bonds = risk_neutral_probs(Q_power) +η_hat, exp_η, e_hat, P_hat = perron_frobenius(Q_power) +π_true = stationary_dist(P_true) π_bar = stationary_dist(P_bar) π_hat = stationary_dist(P_hat) - -fig, ax = plt.subplots(figsize=(8, 4)) -x = np.arange(n) -w = 0.25 -labels = [r'physical $P$', r'risk-neutral $\bar{P}$', - r'long-term risk-neutral $\hat{P}$'] -colors = ['steelblue', 'darkorange', 'forestgreen'] -for k, (π, lbl, col) in enumerate(zip([π_phys, π_bar, π_hat], labels, colors)): - bars = ax.bar(x + k*w, π, width=w, label=lbl, color=col, alpha=0.85, - edgecolor='white') - for b_, v in zip(bars, π): - ax.text(b_.get_x() + w/2, v + 0.008, f'{v:.3f}', - ha='center', va='bottom', fontsize=9) - -ax.set_xticks(x + w); ax.set_xticklabels(state_names) -ax.set_ylabel('stationary probability') -ax.set_title('stationary distributions under three probability measures') -ax.legend(fontsize=9) -plt.tight_layout(); plt.show() - -print("Stationary distributions") -for lbl, π in zip(labels, [π_phys, π_bar, π_hat]): - print(f" {lbl:45s}: {np.round(π,4)}") ``` -In this first trend-stationary power-utility example, the long-term risk-neutral measure -$\hat{\mathbf{P}}$ coincides with the physical measure $\mathbf{P}$. - -This is the special success case in {cite}`BorovickaHansenScheinkman2016`: the SDF has -only the Perron--Frobenius trend component and no martingale component. - -The one-period risk-neutral measure $\bar{\mathbf P}$, by contrast, still absorbs -short-run risk adjustments and therefore differs from $\mathbf P$. 
+The one-period risk-neutral matrix differs from the correctly specified matrix because +it includes one-period risk adjustments. -## The martingale decomposition +The long-term risk-neutral transition matrix coincides with the correctly specified +transition matrix because, after the Perron eigenfunction is removed, no +likelihood-ratio term remains. -### Decomposing the SDF process +Here is the likelihood-ratio term explicitly. -The decomposition in this section answers a diagnostic question: after we remove the -long-run discount rate and the state-dependent Perron--Frobenius trend from the SDF, is -anything left? +Define -If the answer is yes, the leftover term is a martingale that changes probabilities -between $\mathbf P$ and $\hat{\mathbf P}$. - -The proof is obtained by writing the one-period pricing identity in Perron--Frobenius -form and multiplying those one-period identities over time. - -Let $\hat{\mathbf{e}}$ and $\hat{\eta}$ solve the Perron–Frobenius problem. +$$ +\hat h_{ij} += \frac{\hat p_{ij}}{p_{ij}} += \exp(-\hat\eta)s_{ij}\frac{\hat e_j}{\hat e_i}. +$$ -Define the process +For a path of states, the product $$ -\frac{\hat{H}_{t+1}}{\hat{H}_t} = (X_t)' \hat{\mathbf{H}} X_{t+1}, -\quad \text{where} \quad -\hat{h}_{ij} = \frac{\hat{p}_{ij}}{p_{ij}}. +\hat H_t += \prod_{\tau=1}^t \hat h_{X_{\tau-1}, X_\tau} $$ -Because $\sum_j \hat{h}_{ij} p_{ij} = \sum_j \hat{p}_{ij} = 1$, the process $\hat{H}$ is -a martingale under the physical measure $\mathbf{P}$. +is the likelihood-ratio martingale that changes probabilities from the correctly +specified measure to the recovered measure. -The accumulated SDF then admits the **multiplicative decomposition**: +In the power-utility example, write $$ -S_t = \exp(\hat{\eta} t) \left(\frac{\hat{e}(X_0)}{\hat{e}(X_t)}\right) - \left(\frac{\hat{H}_t}{\hat{H}_0}\right). +A = \exp(-\delta-\gamma g_c), +\qquad +s_{ij}=A\left(\frac{c_j}{c_i}\right)^{-\gamma}. 
$$ -The three components are: +Taking $\hat e_i=c_i^\gamma$, up to scale, gives -| Component | Interpretation | -|---|---| -| $\exp(\hat{\eta} t)$ | Deterministic exponential discounting; $-\hat{\eta}$ is the long-run yield | -| $\hat{e}(X_0)/\hat{e}(X_t)$ | State-dependent trend; mean-stationary under $\hat{\mathbf{P}}$ | -| $\hat{H}_t/\hat{H}_0$ | Martingale; encodes long-run risk adjustments | +$$ +[\mathbf{Q}\hat e]_i += \sum_j A\left(\frac{c_j}{c_i}\right)^{-\gamma}p_{ij}c_j^\gamma += A c_i^\gamma += A\hat e_i, +$$ -```{code-cell} ipython3 -# Physical SDF implied by Arrow prices and physical probabilities. -S_mat = np.where(P_phys > 0, Q_mat / P_phys, 0.0) +so $\exp(\hat\eta)=A$. -# Perron-Frobenius trend component of the SDF. -S_hat = exp_η * e_hat[:, np.newaxis] / e_hat[np.newaxis, :] +Consequently, -# Martingale likelihood-ratio increment between P_hat and P. -H_incr = np.where(P_phys > 0, P_hat / P_phys, 0.0) +$$ +\hat h_{ij} += A^{-1}A\left(\frac{c_j}{c_i}\right)^{-\gamma} + \frac{c_j^\gamma}{c_i^\gamma} +=1. +$$ -print("SDF matrix S = Q/P:") -print(np.round(S_mat, 4)) -print("\nTrend SDF S_hat = exp(η_hat) * e_hat_i / e_hat_j:") -print(np.round(S_hat, 4)) -print("\nMartingale increment h_hat = P_hat/P:") -print(np.round(H_incr, 4)) +```{code-cell} ipython3 +matrices = [ + ("correctly specified P", P_true), + ("one-period risk-neutral P_bar", P_bar), + ("long-term risk-neutral P_hat", P_hat), +] -mart_check = (H_incr * P_phys).sum(axis=1) -print(f"\nE[h_hat | X_t=i] = {mart_check}") +for label, mat in matrices: + print(label) + print(np.round(mat, 3)) + print() ``` -Here $\hat h_{ij}=1$ for every transition, so there is no recovery distortion. +```{code-cell} ipython3 +H_power = np.divide(P_hat, P_true, out=np.ones_like(P_true), where=P_true > 0) +e_theory = c_levels**γ_power -The pessimistic distortion appears below once recursive utility introduces a nontrivial -continuation-value martingale. 
+print("Perron eigenfunction: numerical vs c^gamma") +for name, e_num, e_th in zip(state_names, e_hat / e_hat[1], + e_theory / e_theory[1]): + print(f"{name:9s}: {e_num:.6f} {e_th:.6f}") -## When does recovery succeed? +print("\nlikelihood-ratio increment h_hat = P_hat / P") +print(np.round(H_power, 6)) -### The Ross recovery condition +print("\nconditional means under P") +print(np.round((P_true * H_power).sum(axis=1), 6)) -{cite:t}`Ross2015` proposes to identify investors' subjective beliefs by imposing +print(f"\nmax |h_hat - 1| = " + f"{np.max(np.abs(H_power[P_true > 0] - 1)):.2e}") +``` -$$ -\widetilde{S}_t = \exp(-\delta t) \frac{m(X_t)}{m(X_0)} -$$ +The output illustrates the difference between short-horizon and long-horizon risk +adjustments. -for some positive function $m$ and discount rate $\delta$ (Condition 4 in -{cite}`BorovickaHansenScheinkman2016`). +The one-period risk-neutral matrix $\bar{\mathbf{P}}$ is close to, but not the same as, +the correctly specified matrix $\mathbf{P}$. -Under this restriction, the SDF has **no martingale component**: $\hat{H}_t \equiv 1$. +It changes the transition probabilities because one-period Arrow prices include +one-period risk adjustments. -The proposition below states the exact object being tested. +By contrast, the long-term risk-neutral matrix $\hat{\mathbf{P}}$ is exactly the same +as $\mathbf{P}$ in this example. -It asks whether the Perron--Frobenius transition matrix $\hat{\mathbf P}$ is the same as -the physical transition matrix $\mathbf P$. +The diagnostic confirms why: the likelihood-ratio increment $\hat h_{ij}$ is one for +every transition, so the martingale $\hat H_t$ is identically one. -The proof is just an accounting exercise: divide the recovered probabilities by the -physical probabilities and see whether the resulting likelihood-ratio increment is -identically one. 
+This is the condition under which Ross recovery returns the correctly specified +transition matrix: after the Perron eigenfunction removes the state-dependent part of +the SDF, no likelihood-ratio martingale remains. -```{prf:proposition} Ross Recovery Condition -:label: prop-ross-recovery-condition +## The martingale diagnostic -({cite}`BorovickaHansenScheinkman2016`) Recovery succeeds -- i.e., -$\hat{\mathbf{P}} = \mathbf{P}$ -- if and only if the physical stochastic discount -factor takes the long-term risk pricing form +Let $(\hat \eta, \hat e)$ be the positive Perron pair of $\mathbf{Q}$: $$ -s_{ij} = \exp(\hat{\eta}) \frac{\hat{e}_i}{\hat{e}_j} +\mathbf{Q} \hat e = \exp(\hat\eta) \hat e. $$ -with $\hat{h}_{ij} \equiv 1$, so that the SDF has no martingale component. -``` - -```{prf:proof} -Using $q_{ij}=s_{ij}p_{ij}$ and the Perron--Frobenius construction, +The associated long-term risk-neutral transition matrix is $$ -\hat{p}_{ij} -= \exp(-\hat{\eta})q_{ij}\frac{\hat{e}_j}{\hat{e}_i} -= \exp(-\hat{\eta})s_{ij}p_{ij}\frac{\hat{e}_j}{\hat{e}_i}. +\hat p_{ij} += \exp(-\hat\eta) q_{ij} \frac{\hat e_j}{\hat e_i}. $$ -Hence the likelihood-ratio increment between the recovered and physical measures is +Compare $\hat{\mathbf{P}}$ with the correctly specified transition matrix +$\mathbf{P}$ by defining $$ -\hat{h}_{ij} -= \frac{\hat{p}_{ij}}{p_{ij}} -= \exp(-\hat{\eta})s_{ij}\frac{\hat{e}_j}{\hat{e}_i}. +\hat h_{ij} = \frac{\hat p_{ij}}{p_{ij}}. $$ -Thus $\hat{\mathbf P}=\mathbf P$ if and only if $\hat{h}_{ij}=1$ for every feasible -transition $(i,j)$, which is equivalent to +For a fixed current state $i$, the numbers $\hat h_{ij}$ average to one under the +correctly specified transition probabilities: $$ -s_{ij} = \exp(\hat{\eta})\frac{\hat{e}_i}{\hat{e}_j}. +\sum_j \hat h_{ij} p_{ij}=1. $$ -This is precisely the case in which the martingale term in the multiplicative -decomposition is degenerate. 
-```
-
-The critical question is: when is the martingale component degenerate?
-
-### Power utility with trend-stationary consumption
+Thus $\hat h_{ij}$ is a one-period likelihood-ratio increment.

-Consider a power-utility investor with risk aversion $\gamma$ and *trend-stationary*
-consumption $C_t = \exp(g_c t)(c \cdot X_t)$ where $c$ is a positive vector.
+Multiplying these increments over time gives a martingale.

-The one-period SDF is
+The one-period SDF can be written as

$$
-s_{ij} = \exp(-\delta - \gamma g_c) \left(\frac{c_j}{c_i}\right)^{-\gamma}.
+s_{ij}
+= \exp(\hat\eta) \frac{\hat e_i}{\hat e_j} \hat h_{ij}.
$$

-The corollary shows one important case where the recovery condition is satisfied.
+The Perron calculation therefore separates the SDF into:

-What is being proved is that trend-stationary consumption risk can be absorbed entirely
-into the state-dependent ratio $\hat e_i/\hat e_j$.
-
-The proof works by guessing the Perron--Frobenius eigenvector from marginal utility,
-then checking that the recovered transition probabilities reduce to the original
-physical probabilities.
+| Part | Role |
+|---|---|
+| $\exp(\hat\eta)$ | deterministic long-run discounting |
+| $\hat e_i / \hat e_j$ | state-dependent long-run term |
+| $\hat h_{ij}$ | likelihood ratio that changes probabilities |

-```{prf:corollary} Recovery under Power Utility
-:label: cor-recovery-power-utility
+If $\hat h_{ij}=1$ for every feasible transition, then the recovered transition matrix
+and the correctly specified transition matrix are the same.

-For a power-utility investor with trend-stationary consumption, the SDF takes the exact
-long-term risk pricing form with $\hat{e}_j = c_j^\gamma$ and
-$\hat{\eta} = -(\delta + \gamma g_c)$.
+When $\hat h_{ij}$ varies across feasible transitions, the accumulated martingale
+$\hat H_t$ drives a wedge between the recovered probability measure and the correctly
+specified probability measure.
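To make the decomposition concrete, here is a minimal self-contained sketch on a hypothetical two-state model (the matrices `P` and `S` below are illustrative numbers, not the lecture's calibration): form $\mathbf{Q}=\mathbf{S}\circ\mathbf{P}$, compute the Perron pair, and check both that each row of $\hat h$ averages to one under $\mathbf{P}$ and that the three parts in the table multiply back to the SDF exactly.

```python
import numpy as np

# Hypothetical two-state model: P and S are illustrative only.
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])            # transition matrix
S = np.array([[0.99, 1.05],
              [0.90, 0.97]])          # one-period SDF s_ij
Q = S * P                             # Arrow prices q_ij = s_ij * p_ij

# Perron pair of Q: dominant eigenvalue exp(eta_hat), positive eigenvector e_hat.
eigvals, eigvecs = np.linalg.eig(Q)
k = np.argmax(eigvals.real)
exp_eta = eigvals.real[k]
e_hat = np.abs(eigvecs[:, k].real)    # Perron vector is positive up to sign

# Long-term risk-neutral transition matrix and likelihood-ratio increment.
P_hat = Q * e_hat[None, :] / (exp_eta * e_hat[:, None])
h_hat = P_hat / P

# Each row of h_hat has conditional mean one under P: a martingale increment.
assert np.allclose((P * h_hat).sum(axis=1), 1.0)

# The three parts multiply back to the SDF exactly.
S_rebuilt = exp_eta * (e_hat[:, None] / e_hat[None, :]) * h_hat
assert np.allclose(S_rebuilt, S)
print("decomposition verified")
```

The reconstruction works for any positive `S` and irreducible `P`, since dividing $\hat p_{ij}$ by $p_{ij}$ simply undoes the Perron change of measure.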
-Therefore $\hat{h}_{ij} \equiv 1$ and Ross recovery succeeds exactly when consumption -fluctuations around a deterministic trend are the only source of risk. -``` - -```{prf:proof} -Let +```{prf:proposition} Recovery diagnostic +:label: prop-misspecified-recovery-diagnostic -$$ -A = \exp(-\delta-\gamma g_c) -$$ +For a finite-state Markov model with correctly specified transition matrix $\mathbf{P}$ and Arrow +matrix $\mathbf{Q}$, Perron--Frobenius recovery returns the correctly specified transition matrix +if and only if $\hat h_{ij}=1$ for every transition with $p_{ij}>0$. -so that +Equivalently, recovery returns the correctly specified transition matrix if and only if +the SDF has no nonconstant likelihood-ratio martingale: $$ -q_{ij} = A\left(\frac{c_j}{c_i}\right)^{-\gamma}p_{ij}. +s_{ij}=\exp(\hat\eta)\frac{\hat e_i}{\hat e_j}. $$ +``` -Guess $\hat e_i=c_i^\gamma$. - -Then +```{prf:proof} +Using $q_{ij}=s_{ij}p_{ij}$, $$ -[\mathbf Q\hat{\mathbf e}]_i -= \sum_j A\left(\frac{c_j}{c_i}\right)^{-\gamma}p_{ij}c_j^\gamma -= A c_i^\gamma \sum_j p_{ij} -= A\hat e_i. +\hat h_{ij} +=\frac{\hat p_{ij}}{p_{ij}} +=\exp(-\hat\eta)s_{ij}\frac{\hat e_j}{\hat e_i}. $$ -Thus $\exp(\hat\eta)=A$ and $\hat{\mathbf e}$ is the Perron--Frobenius eigenvector. - -Substituting into the recovered transition probabilities gives - -$$ -\hat p_{ij} -= \frac{1}{A}q_{ij}\frac{\hat e_j}{\hat e_i} -= \frac{1}{A} - A\left(\frac{c_j}{c_i}\right)^{-\gamma}p_{ij} - \frac{c_j^\gamma}{c_i^\gamma} -= p_{ij}. -$$ +Thus $\hat{\mathbf{P}}=\mathbf{P}$ if and only if $\hat h_{ij}=1$ on every feasible +transition. -Hence $\hat h_{ij}=\hat p_{ij}/p_{ij}=1$ for all feasible transitions. +This condition is the same as saying that the SDF can be written in the displayed form +with no extra likelihood-ratio term. 
``` -```{code-cell} ipython3 -gc = 0.002 # monthly trend growth - -S_trend = np.zeros((n, n)) -for i in range(n): - for j in range(n): - S_trend[i, j] = np.exp(-δ - γ*gc) * (c_levels[j]/c_levels[i])**(-γ) +The power-utility calculation above illustrates the proposition: the likelihood-ratio increment $\hat h_{ij}$ is a constant one. -Q_trend = S_trend * P_phys +## Recursive utility -_, exp_η_t, e_hat_t, P_hat_t = perron_frobenius(Q_trend) - -H_incr_trend = np.where(P_phys > 0, P_hat_t / P_phys, 0.0) - -print("Trend-stationary h_hat:") -print(np.round(H_incr_trend, 6)) -print(f"Max deviation from 1: {np.abs(H_incr_trend[P_phys>0] - 1).max():.2e}") -``` +The previous example worked because all risk adjustment in the SDF could be written as +a ratio of a function of today's state to a function of tomorrow's state. -### Recursive (Epstein–Zin) utility +The Perron eigenfunction removes exactly that kind of term. -The previous corollary is a success case for recovery. +Recursive utility usually adds something else: a continuation-value term that behaves +like a likelihood ratio. -The next calculation is a failure case: it shows exactly where the power-utility proof -breaks once continuation values enter the SDF. - -The key step is to identify an extra term that cannot, in general, be written only as a -ratio of the current and next states. - -When the investor has **Epstein–Zin recursive preferences** with risk aversion -$\gamma \neq 1$, continuation values $V_t$ satisfy the recursion +For the unit-EIS Epstein--Zin case in {cite:t}`BorovickaHansenScheinkman2016`, with +$C_t=\exp(g_c t)c(X_t)$, write the translated continuation value as $V_t=g_c t+v(X_t)$, +and define $$ -V_t = \bigl[1-\exp(-\delta)\bigr] \log C_t - + \frac{\exp(-\delta)}{1-\gamma} - \log \mathbf{E}_t\bigl[\exp\bigl((1-\gamma)V_{t+1}\bigr)\bigr]. +v_i^*=\exp((1-\gamma)v_i). 
$$ -The SDF takes the form (see {cite}`BorovickaHansenScheinkman2016`, Example 2) +The SDF is $$ -s_{ij} = \exp(-\delta - g_c)\frac{c_i}{c_j} - \left(\frac{v^*_j}{\mathbf{P}_i v^*}\right), +s_{ij} += \exp(-\delta-g_c) \frac{c_i}{c_j} + \frac{v_j^*}{\sum_k p_{ik}v_k^*}. $$ -where $v^*_i = \exp\!\bigl[(1-\gamma)v_i\bigr]$ and $\mathbf{P}_i$ is the $i$-th row of -$\mathbf{P}$. +The denominator is the conditional expectation of $v_j^*$ given current state $i$, so +the last fraction has conditional mean one under $\mathbf{P}$. + +It is therefore a likelihood-ratio increment. + +When $v^*$ is not constant, that likelihood ratio varies across next-period states. + +That variation is why recovery no longer returns the correct transition matrix. -The additional factor $v^*_j/(\mathbf{P}_i v^*)$ introduces a **nontrivial martingale -component** whenever $v^*$ is not constant across states. +The next cell solves the finite-state continuation-value equation and builds the SDF. ```{code-cell} ipython3 -def solve_ez_finite(P, c, δ, γ, gc, tol=1e-12, max_iter=5000): - """Solve finite-state Epstein-Zin continuation values and SDF.""" +def solve_ez_unit_eis(P, c, δ, γ, g_c, tol=1e-12, max_iter=10_000): + """Finite-state unit-EIS Epstein-Zin continuation values and SDF.""" β = np.exp(-δ) log_c = np.log(c) n = len(c) - flow = (1 - β) * log_c + β * gc + flow = (1 - β) * log_c + β * g_c - if abs(γ - 1.0) < 1e-10: - # Log utility avoids the (1-gamma) denominator in the recursion. + if abs(γ - 1) < 1e-10: v = linalg.solve(np.eye(n) - β * P, flow) - vstar = np.ones(n) - Pv = np.ones(n) + v_star = np.ones(n) + Pv_star = np.ones(n) else: - # Fixed-point iteration for the transformed continuation value term. 
v = log_c.copy() for _ in range(max_iter): - vstar = np.exp((1 - γ) * v) - Pv = P @ vstar - v_new = flow + β / (1 - γ) * np.log(Pv) + v_star = np.exp((1 - γ) * v) + Pv_star = P @ v_star + v_new = flow + β / (1 - γ) * np.log(Pv_star) if np.max(np.abs(v_new - v)) < tol: v = v_new break v = v_new - vstar = np.exp((1 - γ) * v) - Pv = P @ vstar + else: + raise ValueError("Epstein-Zin fixed point did not converge.") - # The SDF includes the continuation-value likelihood-ratio term. - s = np.zeros((n, n)) - for i in range(n): - for j in range(n): - s[i, j] = np.exp(-δ - gc) * (c[i] / c[j]) * (vstar[j] / Pv[i]) + v_star = np.exp((1 - γ) * v) + Pv_star = P @ v_star - return v, vstar, s + S = ( + np.exp(-δ - g_c) + * (c[:, None] / c[None, :]) + * (v_star[None, :] / Pv_star[:, None]) + ) + return v, v_star, S +``` -gc_ex = 0.001 # monthly consumption trend growth +At log utility, $v^*$ is constant and the martingale disappears. -for γ_val, label in [(1.0, 'γ = 1 (log utility)'), (5.0, 'γ = 5 (risk aversion)')]: - v_ez, vstar_ez, S_ez = solve_ez_finite(P_phys, c_levels, - δ, γ_val, gc_ex) - Q_ez = S_ez * P_phys - _, _, _, P_hat_ez = perron_frobenius(Q_ez) - H_ez = np.where(P_phys > 0, P_hat_ez / P_phys, 0.0) +As risk aversion rises, continuation values matter more. - π_hat_ez = stationary_dist(P_hat_ez) - print(f"\n{label}") - print(f" Max |h_hat_ij - 1| = {np.abs(H_ez[P_phys>0] - 1).max():.4f}") - print(f" Stationary P_hat = {π_hat_ez.round(4)}") - print(f" Stationary P = {π_phys.round(4)}") -``` +The recovered probability measure then moves farther away from the correctly specified +probability measure. 
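The distortion can be isolated with a single transition row. In the sketch below (the row `p` and the values `v_star` are hypothetical numbers, not output from the solver above), the term $v_j^*/\sum_k p_{ik}v_k^*$ always averages to one under the row's probabilities, yet varies across next-period states whenever $v^*$ is nonconstant.

```python
import numpy as np

# Hypothetical transition row and continuation-value transforms v*_j.
p = np.array([0.2, 0.5, 0.3])
v_star = np.array([1.10, 1.00, 0.92])      # nonconstant across next states

increment = v_star / (p @ v_star)           # likelihood-ratio increment
print(np.round(increment, 4))               # varies across next-period states
print(p @ increment)                        # conditional mean is one, up to rounding

# With constant v*, the reweighting collapses to one and recovery is exact.
flat = np.ones(3) / (p @ np.ones(3))
print(np.allclose(flat, 1.0))
```

States with high continuation value (here the first state) get `increment` above one, which is exactly how the recovered measure overweights them.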
-```{code-cell} ipython3 -γs_ez = np.linspace(1.0, 10.0, 50) -mart_errors = [] -π_rec_hat = [] - -for γ_val in γs_ez: - v_g, _, S_g = solve_ez_finite(P_phys, c_levels, δ, γ_val, gc_ex) - Q_g = S_g * P_phys - _, _, _, Ph = perron_frobenius(Q_g) - H_g = np.where(P_phys > 0, Ph / P_phys, 0.0) - mart_errors.append(np.abs(H_g[P_phys > 0] - 1).max()) - π_rec_hat.append(stationary_dist(Ph)[0]) - -fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4)) - -ax1.plot(γs_ez, mart_errors, color='firebrick', lw=2.5) -ax1.set_xlabel('risk aversion γ') -ax1.set_ylabel(r'$\max_{i,j} |\hat{h}_{ij} - 1|$') -ax1.set_title('martingale non-degeneracy vs risk aversion\n(Epstein–Zin utility)') - -ax2.plot(γs_ez, π_rec_hat, color='steelblue', lw=2.5, - label=r'recession weight under $\hat{P}$') -ax2.axhline(π_phys[0], ls='--', color='grey', lw=1.5, - label=f'recession weight under $P$ ({π_phys[0]:.3f})') -ax2.set_xlabel('risk aversion γ') -ax2.set_ylabel('stationary probability') -ax2.set_title('recovered recession probability vs risk aversion') -ax2.legend(fontsize=9) - -plt.tight_layout(); plt.show() -``` +To make the mechanism visible in a small three-state example, the figure below uses +the more dispersed consumption vector + +$$ +c=(0.85, 1.00, 1.15). +$$ + +The heatmap reports percentage deviations of the likelihood-ratio increment from one: +$100(\hat h_{ij}-1)$. + +Positive entries are transitions that receive more probability under the recovered +measure than under the correctly specified probability measure. + +The right panel reports the increase in the recovered recession probability, measured +in percentage points. 
```{code-cell} ipython3 -γ_ez_demo = 5.0 -_, _, S_ez_demo = solve_ez_finite(P_phys, c_levels, δ, γ_ez_demo, gc_ex) -Q_ez_demo = S_ez_demo * P_phys -_, _, _, P_hat_ez_demo = perron_frobenius(Q_ez_demo) -H_incr_ez = np.where(P_phys > 0, P_hat_ez_demo / P_phys, 1.0) +--- +mystnb: + figure: + caption: Recursive utility generates a nonconstant likelihood-ratio increment that distorts recovery. + name: fig-mr-recursive-martingale +--- +c_recursive = np.array([0.85, 1.00, 1.15]) +γ_demo = 10.0 +_, _, S_demo = solve_ez_unit_eis(P_true, c_recursive, δ, γ_demo, g_c) +Q_demo = S_demo * P_true +H_demo, _, _, P_hat_demo = martingale_increment(Q_demo, P_true) +H_dev = 100 * (H_demo - 1) + +γ_grid = np.linspace(1, 15, 80) +rec_prob = [] +for γ in γ_grid: + _, _, S_g = solve_ez_unit_eis(P_true, c_recursive, δ, γ, g_c) + Q_g = S_g * P_true + _, _, _, P_hat_g = martingale_increment(Q_g, P_true) + rec_prob.append(stationary_dist(P_hat_g)[0]) +rec_prob = np.array(rec_prob) +rec_prob_gain = 100 * (rec_prob - π_true[0]) fig, axes = plt.subplots(1, 2, figsize=(12, 4.5)) -vmax_h = max(1.5, H_incr_ez.max() * 1.05) -vmin_h = min(0.5, H_incr_ez.min() * 0.95) -im0 = axes[0].imshow(H_incr_ez, cmap='RdYlGn', vmin=vmin_h, vmax=vmax_h, aspect='auto') -axes[0].set_title( - r'martingale increment $\hat{h}_{ij} = \hat{p}_{ij}/p_{ij}$' '\n' - r'(Epstein–Zin utility, $\gamma=5$)', - fontsize=11) -for i in range(n): - for j in range(n): - axes[0].text(j, i, f'{H_incr_ez[i,j]:.3f}', - ha='center', va='center', fontsize=10) -axes[0].set_xticks(range(n)); axes[0].set_yticks(range(n)) -axes[0].set_xticklabels(state_names, rotation=20, fontsize=9) -axes[0].set_yticklabels(state_names, fontsize=9) -axes[0].set_xlabel('next state'); axes[0].set_ylabel('current state') -plt.colorbar(im0, ax=axes[0], fraction=0.046) - -γs_shift = np.linspace(1.0, 12, 60) -rec_wts_ez = [] -for g in γs_shift: - _, _, S_g = solve_ez_finite(P_phys, c_levels, δ, g, gc_ex) - Q_g = S_g * P_phys - _, _, _, Ph = perron_frobenius(Q_g) - 
rec_wts_ez.append(stationary_dist(Ph)[0]) - -axes[1].plot(γs_shift, rec_wts_ez, color='steelblue', lw=2.5) -axes[1].axhline(π_phys[0], color='grey', ls='--', lw=1.5, - label=fr'physical recession prob = {π_phys[0]:.3f}') -axes[1].set_xlabel('risk aversion γ') -axes[1].set_ylabel(r'recession weight under $\hat{P}$') -axes[1].set_title(r'how $\gamma$ shifts the long-term risk-neutral measure' - '\n(Epstein–Zin utility)') -axes[1].legend(fontsize=9) -plt.tight_layout(); plt.show() +bound = np.max(np.abs(H_dev)) +im = axes[0].imshow(H_dev, cmap='coolwarm', vmin=-bound, vmax=bound, aspect='auto') +axes[0].set_xticks(range(3)) +axes[0].set_yticks(range(3)) +axes[0].set_xticklabels(state_names, rotation=20) +axes[0].set_yticklabels(state_names) +axes[0].set_xlabel('next state') +axes[0].set_ylabel(r'current state') +axes[0].set_title(r'likelihood-ratio distortion, $\gamma=10$') + +for i in range(3): + for j in range(3): + axes[0].text(j, i, f"{H_dev[i, j]:.1f}", + ha='center', va='center', fontsize=9) +plt.colorbar(im, ax=axes[0], fraction=0.046, pad=0.04, + label=r'$100(\hat h_{ij}-1)$') + +axes[1].plot(γ_grid, rec_prob_gain, lw=2.5) +axes[1].axhline(0, ls='--', lw=1.5, color='0.5') +axes[1].set_xlabel(r"risk aversion $\gamma$") +axes[1].set_ylabel('increase in recession probability\n(percentage points)') +axes[1].set_title('recession probability distortion') +axes[1].set_ylim(0, rec_prob_gain.max() * 1.08) + +plt.tight_layout() +plt.show() ``` -At $\gamma = 1$ (log utility), $v^*=\exp((1-\gamma)v)$ is constant across states, so the -continuation-value martingale is trivial and recovery succeeds. +## Permanent shocks + +Recursive utility is one way to generate a nonconstant likelihood ratio. + +Permanent shocks provide another. + +Suppose consumption has a permanent multiplicative shock, -For $\gamma > 1$, the transformed continuation value $v^*$ varies with the state, -generating a non-degenerate martingale that grows with risk aversion. 
+$$ +\log C_{t+1}-\log C_t += g + x(X_{t+1})-x(X_t) + \sigma \varepsilon_{t+1}, +$$ + +where $\varepsilon_{t+1}$ is independent over time. -## The long-run risk model +With power utility, the SDF contains + +$$ +\exp(-\delta-\gamma g) +\exp\{-\gamma[x(X_{t+1})-x(X_t)]\} +\exp(-\gamma\sigma\varepsilon_{t+1}). +$$ -We now illustrate the results quantitatively using the Bansal–Yaron -{cite}`Bansal_Yaron_2004` long-run risk model, calibrated to -{cite}`BorovickaHansenScheinkman2016` (Figure 1). +The middle term depends only on the current and next Markov states. -### Model setup +It is a ratio of +state functions, so the Perron eigenfunction can absorb it. -The state vector $X_t = (X_{1t}, X_{2t})'$ follows the continuous-time diffusion +The permanent shock term depends on the new shock $\varepsilon_{t+1}$. + +Because that +shock is not summarized by the finite Markov state in this calculation, it cannot be +removed by a state eigenfunction. + +After dividing by its conditional mean, the shock term becomes a likelihood-ratio +increment: + +$$ +\frac{\exp(-\gamma\sigma\varepsilon_{t+1})} + {E[\exp(-\gamma\sigma\varepsilon_{t+1})]}. +$$ + +Thus permanent consumption shocks can break belief recovery, even under ordinary power +utility. + +## Long-run risk + +We now use the Bansal--Yaron long-run risk model, in the calibration reported by +{cite:t}`BorovickaHansenScheinkman2016`. + +The point is to see how different the recovered measure can look in a standard +macro-finance model. + +The state vector $X_t=(X_{1t},X_{2t})'$ follows $$ \begin{aligned} -dX_{1t} &= \bar{\mu}_{11}(X_{1t} - \iota_1)\,dt + \sqrt{X_{2t}}\,\bar{\sigma}_1 dW_t \\ -dX_{2t} &= \bar{\mu}_{22}(X_{2t} - \iota_2)\,dt + \sqrt{X_{2t}}\,\bar{\sigma}_2 dW_t, +dX_{1t} +&= [\mu_{11}(X_{1t}-\iota_1)+\mu_{12}(X_{2t}-\iota_2)]dt + + \sqrt{X_{2t}}\sigma_1 dW_t, \\ +dX_{2t} +&= \mu_{22}(X_{2t}-\iota_2)dt + + \sqrt{X_{2t}}\sigma_2 dW_t . \end{aligned} $$ -where $W_t$ is a three-dimensional Brownian motion. 
+Here $X_1$ is predictable consumption growth and $X_2$ is stochastic volatility. -Here $X_{1t}$ is the **predictable component of consumption growth** and $X_{2t}$ is -**stochastic volatility**. - -The representative agent has Epstein–Zin preferences with unit elasticity of +The representative agent has Epstein--Zin utility with unit elasticity of intertemporal substitution. -The stochastic discount factor satisfies +The continuation value introduces a martingale $H^*$ into the SDF: $$ -d\log S_t = -\delta\,dt - d\log C_t + d\log H^*_t, +d\log S_t = -\delta dt - d\log C_t + d\log H_t^*. $$ -where $H^*$ is a martingale determined by the continuation value of the recursive -utility. +The next cell sets the calibration. ```{code-cell} ipython3 lrr_params = dict( - δ = 0.002, # subjective discount rate - γ = 10.0, # risk aversion - μ11 = -0.021, # mean reversion of X1 - μ12 = 0.0, # (under P; becomes non-zero under P_hat) - μ22 = -0.013, # mean reversion of X2 - ι1 = 0.0, # long-run mean of X1 - ι2 = 1.0, # long-run mean of X2 (normalized) - σ1 = np.array([0.0, 0.00034, 0.0]), # diffusion of X1 (1*3) - σ2 = np.array([0.0, 0.0, -0.038]), # diffusion of X2 (1*3) - β_c0 = 0.0015, # consumption drift constant - β_c1 = 1.0, # loading on X1 - β_c2 = 0.0, # loading on X2 - α_c = np.array([0.0078, 0.0, 0.0]), # consumption diffusion (1*3) + δ=0.002, + γ=10.0, + μ11=-0.021, + μ12=0.0, + μ22=-0.013, + ι1=0.0, + ι2=1.0, + σ1=np.array([0.0, 0.00034, 0.0]), + σ2=np.array([0.0, 0.0, -0.038]), + β_c0=0.0015, + β_c1=1.0, + β_c2=0.0, + α_c=np.array([0.0078, 0.0, 0.0]), ) ``` -### Solving the value function +In this affine model, the continuation value and the Perron eigenfunction have a simple +exponential-affine form. + +Think of the translated continuation value as having slopes $(v_1, v_2)$ with respect +to predictable growth and volatility. + +The Perron eigenfunction has analogous slopes $(e_1, e_2)$. 
+ +Once those slopes are known, changing probabilities is a drift adjustment: the +instantaneous risk-neutral measure uses the SDF shock exposure, while the long-term +risk-neutral measure also includes the Perron eigenfunction shock exposure. -The log continuation value $v(X_t)$ is affine in the state: -$v(x) = \bar{v}_0 + \bar{v}_1 x_1 + \bar{v}_2 x_2$. +The next functions compute these pieces in that order. -The coefficients satisfy the algebraic system in Appendix D of -{cite}`BorovickaHansenScheinkman2016`. +This is why the code below is organized as value-function coefficients, Perron +coefficients, and then the drift of $X$ under the recovered and risk-neutral probability +measures. ```{code-cell} ipython3 def solve_value_function(p): - """Solve the affine Epstein-Zin value-function coefficients.""" - δ, γ = p['δ'], p['γ'] - μ11, μ12, μ22 = p['μ11'], p['μ12'], p['μ22'] - σ1, σ2 = p['σ1'], p['σ2'] - β_c1, β_c2 = p['β_c1'], p['β_c2'] - α_c = p['α_c'] - - # The X1 coefficient solves a scalar linear equation. + """Slopes of the affine continuation value.""" + δ, γ = p["δ"], p["γ"] + μ11, μ12, μ22 = p["μ11"], p["μ12"], p["μ22"] + σ1, σ2 = p["σ1"], p["σ2"] + β_c1, β_c2 = p["β_c1"], p["β_c2"] + α_c = p["α_c"] + v1 = β_c1 / (δ - μ11) - # The X2 coefficient solves the quadratic equation from the affine recursion. + # The volatility slope solves a scalar quadratic. 
A_vec = α_c + σ1 * v1 B_vec = σ2 @@ -858,635 +723,763 @@ def solve_value_function(p): b = (μ22 - δ) + (1 - γ) * np.dot(A_vec, B_vec) c = β_c2 + μ12 * v1 + 0.5 * (1 - γ) * np.dot(A_vec, A_vec) - disc = b**2 - 4*a*c + disc = b**2 - 4 * a * c if disc < 0: raise ValueError("Value function does not exist for these parameters.") v2 = (-b - np.sqrt(disc)) / (2 * a) - return v1, v2, A_vec, B_vec - + return v1, v2 -v1, v2, A_vec, B_vec = solve_value_function(lrr_params) -print(f"Value-function slope on X1: v_bar1 = {v1:.4f}") -print(f"Value-function slope on X2: v_bar2 = {v2:.4f}") -``` -### Perron–Frobenius and recovered dynamics +def solve_pf_lrr(p, v1, v2): + """Perron eigenfunction slopes and the SDF diffusion loading.""" + δ, γ = p["δ"], p["γ"] + μ11, μ12, μ22 = p["μ11"], p["μ12"], p["μ22"] + ι1, ι2 = p["ι1"], p["ι2"] + σ1, σ2 = p["σ1"], p["σ2"] + α_c = p["α_c"] + β_c0, β_c1, β_c2 = p["β_c0"], p["β_c1"], p["β_c2"] -```{code-cell} ipython3 -def solve_pf_lrr(p, v1, v2, A_vec): - """Solve the LRR Perron-Frobenius coefficients.""" - δ, γ = p['δ'], p['γ'] - μ11, μ12, μ22 = p['μ11'], p['μ12'], p['μ22'] - ι1, ι2 = p['ι1'], p['ι2'] - σ1, σ2 = p['σ1'], p['σ2'] - α_c = p['α_c'] - β_c0 = p['β_c0'] - β_c1, β_c2 = p['β_c1'], p['β_c2'] - - # H* is the continuation-value martingale in the recursive utility SDF. α_h_star = (1 - γ) * (α_c + σ1 * v1 + σ2 * v2) - - # α_s is the diffusion loading of d log S_t. α_s = -α_c + α_h_star - # The Ito correction uses d log H*, not the total log-SDF diffusion. β_s11 = -β_c1 β_s12 = -β_c2 - 0.5 * np.dot(α_h_star, α_h_star) - β_s0 = (-δ - β_c0 - - 0.5 * ι2 * np.dot(α_h_star, α_h_star)) + β_s0 = -δ - β_c0 - 0.5 * ι2 * np.dot(α_h_star, α_h_star) - # The first Perron coefficient solves a scalar linear equation. e1 = -β_s11 / μ11 - # The second Perron coefficient solves a quadratic equation. 
- const_pf = (β_s12 + 0.5*np.dot(α_s, α_s) - + e1*(μ12 + np.dot(σ1, α_s)) - + 0.5*e1**2*np.dot(σ1, σ1)) - lin_pf = μ22 + np.dot(σ2, α_s) + e1*np.dot(σ1, σ2) - quad_pf = 0.5 * np.dot(σ2, σ2) + const = (β_s12 + 0.5 * np.dot(α_s, α_s) + + e1 * (μ12 + np.dot(σ1, α_s)) + + 0.5 * e1**2 * np.dot(σ1, σ1)) + lin = μ22 + np.dot(σ2, α_s) + e1 * np.dot(σ1, σ2) + quad = 0.5 * np.dot(σ2, σ2) - disc = lin_pf**2 - 4*quad_pf*const_pf - e2_m = (-lin_pf - np.sqrt(disc)) / (2*quad_pf) - e2_p = (-lin_pf + np.sqrt(disc)) / (2*quad_pf) + disc = lin**2 - 4 * quad * const + roots = [(-lin - np.sqrt(disc)) / (2 * quad), + (-lin + np.sqrt(disc)) / (2 * quad)] - η_m = β_s0 - β_s12*ι2 - e2_m*μ22*ι2 - η_p = β_s0 - β_s12*ι2 - e2_p*μ22*ι2 + candidates = [] + for e2 in roots: + eta = (β_s0 - β_s11 * ι1 - β_s12 * ι2 + - e1 * (μ11 * ι1 + μ12 * ι2) - e2 * μ22 * ι2) + candidates.append((eta, e2)) - # Select the lower eigenvalue root that generates stationary recovered dynamics. - if η_m <= η_p: - e2, η_hat = e2_m, η_m - else: - e2, η_hat = e2_p, η_p - - return e1, e2, η_hat, α_s - - -e1, e2, η_hat_lrr, α_s = solve_pf_lrr(lrr_params, v1, v2, A_vec) - -print(f"PF eigenfunction coefficients: e_bar1 = {e1:.4f}, e_bar2 = {e2:.4f}") -print(f"Log eigenvalue: η_hat = {η_hat_lrr:.6f} " - f"(annualized = {η_hat_lrr*12:.4f})") -``` + # Choose the stable Perron root used for the long-term factorization. + eta, e2 = min(candidates) + return e1, e2, eta, α_s -### Computing the P_hat dynamics -```{code-cell} ipython3 -def compute_phat_dynamics(p, e1, e2, α_s): - """Drift parameters under the recovered measure P_hat.""" - μ11, μ12, μ22 = p['μ11'], p['μ12'], p['μ22'] - ι1, ι2 = p['ι1'], p['ι2'] - σ1, σ2 = p['σ1'], p['σ2'] +def recovered_lrr_dynamics(p, e1, e2, α_s): + """State dynamics under the long-term risk-neutral measure.""" + μ11, μ12, μ22 = p["μ11"], p["μ12"], p["μ22"] + ι1, ι2 = p["ι1"], p["ι2"] + σ1, σ2 = p["σ1"], p["σ2"] - # α_h is the likelihood-ratio loading for the recovered measure. 
α_h = α_s + σ1 * e1 + σ2 * e2 μ_hat_11 = μ11 μ_hat_12 = μ12 + np.dot(σ1, α_h) μ_hat_22 = μ22 + np.dot(σ2, α_h) - # Rewrite the drift in mean-reversion form under P_hat. ι_hat_2 = (μ22 / μ_hat_22) * ι2 - ι_hat_1 = (ι1 - + (1.0/μ11) * (μ12*ι2 - μ_hat_12*ι_hat_2)) + ι_hat_1 = ι1 + (μ12 * ι2 - μ_hat_12 * ι_hat_2) / μ11 return dict( - μ_hat_11 = μ_hat_11, - μ_hat_12 = μ_hat_12, - μ_hat_22 = μ_hat_22, - ι_hat_1 = ι_hat_1, - ι_hat_2 = ι_hat_2, - α_h = α_h, - σ1 = σ1, - σ2 = σ2, + μ11=μ_hat_11, + μ12=μ_hat_12, + μ22=μ_hat_22, + ι1=ι_hat_1, + ι2=ι_hat_2, + σ1=σ1, + σ2=σ2, + α_h=α_h, ) -phat_dyn = compute_phat_dynamics(lrr_params, e1, e2, α_s) +def risk_neutral_lrr_dynamics(p, α_s): + """State dynamics under the instantaneous risk-neutral measure.""" + μ11, μ12, μ22 = p["μ11"], p["μ12"], p["μ22"] + ι1, ι2 = p["ι1"], p["ι2"] + σ1, σ2 = p["σ1"], p["σ2"] + + μ_bar_11 = μ11 + μ_bar_12 = μ12 + np.dot(σ1, α_s) + μ_bar_22 = μ22 + np.dot(σ2, α_s) + + ι_bar_2 = (μ22 / μ_bar_22) * ι2 + ι_bar_1 = ι1 + (μ12 * ι2 - μ_bar_12 * ι_bar_2) / μ11 -print("P_hat dynamics:") -print(f" μ_hat_11 = {phat_dyn['μ_hat_11']:.4f} " - f"(physical {lrr_params['μ11']:.4f})") -print(f" μ_hat_12 = {phat_dyn['μ_hat_12']:.6f} " - f"(physical 0)") -print(f" μ_hat_22 = {phat_dyn['μ_hat_22']:.5f} " - f"(physical {lrr_params['μ22']:.4f})") -print(f" ι_hat_1 = {phat_dyn['ι_hat_1']:.5f} " - f"(physical {lrr_params['ι1']:.4f})") -print(f" ι_hat_2 = {phat_dyn['ι_hat_2']:.5f} " - f"(physical {lrr_params['ι2']:.4f})") + return dict( + μ11=μ_bar_11, + μ12=μ_bar_12, + μ22=μ_bar_22, + ι1=ι_bar_1, + ι2=ι_bar_2, + σ1=σ1, + σ2=σ2, + ) ``` -For comparison with the paper's Figure 1, we also compute the instantaneous risk-neutral -dynamics. +For the calibration used here, the recovered measure changes the long-run state +distribution. -This change of measure uses the martingale component of the normalized SDF, whose -diffusion vector is $\alpha_s$. +It lowers the mean of expected growth and raises the mean of volatility. 
```{code-cell} ipython3 -def compute_rn_dynamics(p, α_s): - """Drift parameters under the one-period risk-neutral measure.""" - μ11, μ12, μ22 = p['μ11'], p['μ12'], p['μ22'] - ι1, ι2 = p['ι1'], p['ι2'] - σ1, σ2 = p['σ1'], p['σ2'] - - # Risk-neutral dynamics use the normalized SDF loading. - μ_rn_11 = μ11 - μ_rn_12 = μ12 + np.dot(σ1, α_s) - μ_rn_22 = μ22 + np.dot(σ2, α_s) - - # Rewrite the drift in mean-reversion form under P_bar. - ι_rn_2 = (μ22 / μ_rn_22) * ι2 - ι_rn_1 = (ι1 - + (1.0/μ11) * (μ12*ι2 - μ_rn_12*ι_rn_2)) +v1, v2 = solve_value_function(lrr_params) +e1, e2, η_lrr, α_s = solve_pf_lrr(lrr_params, v1, v2) +dyn_hat = recovered_lrr_dynamics(lrr_params, e1, e2, α_s) +dyn_bar = risk_neutral_lrr_dynamics(lrr_params, α_s) + +print(f"value slopes: v1 = {v1:.4f}, v2 = {v2:.4f}") +print(f"Perron coefficients: e1 = {e1:.4f}, e2 = {e2:.4f}") +print(f"log eigenvalue: eta = {η_lrr:.6f} " + f"(annualized {12 * η_lrr:.4f})") +print() +print("Long-run means under three measures") +print("measure iota_1 iota_2 mu_12 mu_22") +print("--------- -------- -------- -------- --------") +print(f"actual {lrr_params['ι1']:8.5f} {lrr_params['ι2']:8.5f}" + f" {lrr_params['μ12']:8.5f} {lrr_params['μ22']:8.5f}") +print(f"risk-neut. {dyn_bar['ι1']:8.5f} {dyn_bar['ι2']:8.5f}" + f" {dyn_bar['μ12']:8.5f} {dyn_bar['μ22']:8.5f}") +print(f"long-term {dyn_hat['ι1']:8.5f} {dyn_hat['ι2']:8.5f}" + f" {dyn_hat['μ12']:8.5f} {dyn_hat['μ22']:8.5f}") +``` - return dict( - μ_rn_11 = μ_rn_11, - μ_rn_12 = μ_rn_12, - μ_rn_22 = μ_rn_22, - ι_rn_1 = ι_rn_1, - ι_rn_2 = ι_rn_2, - ) +These numbers show the mechanism clearly. +The positive value slope $v_1$ says that the continuation value is very sensitive to +predictable consumption growth. -rn_dyn = compute_rn_dynamics(lrr_params, α_s) +The volatility slope $v_2$ is negative in this calibration, so higher volatility lowers +continuation value. 
-print("Dynamics of X under P_bar (risk-neutral):") -print(f" μ_rn_12 = {rn_dyn['μ_rn_12']:.6f}") -print(f" μ_rn_22 = {rn_dyn['μ_rn_22']:.5f}") -print(f" ι_rn_1 = {rn_dyn['ι_rn_1']:.5f}") -print(f" ι_rn_2 = {rn_dyn['ι_rn_2']:.5f}") -``` +The Perron coefficient $e_1$ has the opposite sign: the long-term change of measure +loads negatively on predictable growth. + +Thus the recovered measure tilts probability toward histories with lower expected +growth. + +The positive $e_2$ works in the other direction for volatility, tilting probability +toward higher-volatility states. + +The table translates those coefficients into state dynamics. + +Relative to the correctly specified law, both risk-neutral measures lower the long-run +mean of predictable growth and raise the long-run mean of volatility. + +The long-term risk-neutral measure moves further in that direction than the +instantaneous risk-neutral measure: $\iota_1$ falls from $0$ to about $-0.0027$, while +$\iota_2$ rises from $1$ to about $1.13$. + +The small negative log eigenvalue means that the Perron discount factor is slightly +below one; with the usual yield sign convention, $-\eta$ is the corresponding long-run +discount rate. + +### State probabilities + +Figure 1 in {cite:t}`BorovickaHansenScheinkman2016` is about forecasting after +treating the recovered measure as beliefs. + +It is the same message as the coefficient table above, but shown as a distribution +rather than as long-run means. + +The table said that the recovered measure lowers the long-run mean of predictable +growth $X_1$ and raises the long-run mean of volatility $X_2$. + +The figure shows the same distortion geometrically: probability mass moves down and to +the right. + +The left panel uses the correctly specified probability measure $\mathbf{P}$. + +The right panel uses the probability measure recovered from the Perron--Frobenius +calculation, $\hat{\mathbf{P}}$. 
-### Simulating and comparing stationary distributions +The main message is not just that the two densities differ. + +The recovered measure puts more probability on bad long-run-risk states. + +These are states with lower predictable growth $X_1$ and higher volatility $X_2$. + +It also makes low growth and high volatility occur together more often. + +The dashed contour adds the instantaneous risk-neutral distribution. In this calibration, +the risk-neutral and recovered stationary distributions are close to each other and both +are far from the correctly specified distribution. + +This means that the martingale likelihood ratio is responsible for much of the risk +adjustment. + +The plot below is drawn in three steps. + +First, we simulate the state process under each set of drift parameters: the correctly +specified dynamics, the recovered long-term risk-neutral dynamics, and the instantaneous +risk-neutral dynamics. + +Second, after discarding an initial burn-in, we estimate the stationary joint density of +$(X_2, X_1)$ with a two-dimensional kernel density estimator. + +Third, we draw density contours on the same axes. + +The horizontal line marks $X_1=0$ and the vertical line marks the correctly specified +mean of volatility, $X_2=\iota_2$. + +The code uses the paper's calibration but keeps the simulation and KDE choices simple. 
```{code-cell} ipython3 -def simulate_lrr(dyn, T=600_000, seed=42): - """Simulate stationary LRR paths by Euler-Maruyama.""" +def simulate_lrr(dyn, T=180_000, seed=123): + """Euler simulation of the LRR state process under one probability measure.""" rng = np.random.default_rng(seed) - μ11 = dyn.get('μ11', dyn.get('μ_hat_11')) - μ12 = dyn.get('μ12', dyn.get('μ_hat_12', 0.0)) - μ22 = dyn.get('μ22', dyn.get('μ_hat_22')) - ι1 = dyn.get('ι1', dyn.get('ι_hat_1')) - ι2 = dyn.get('ι2', dyn.get('ι_hat_2')) - σ1 = dyn['σ1'] - σ2 = dyn['σ2'] - X1 = np.zeros(T) - X2 = np.full(T, ι2) + X2 = np.full(T, dyn["ι2"]) + # Euler step with monthly time increment for t in range(1, T): - X2t = max(X2[t-1], 1e-9) - sq_X2 = np.sqrt(X2t) - - # Monthly Euler step with dt = 1. + X2_prev = max(X2[t-1], 1e-9) dW = rng.standard_normal(3) - - X1[t] = X1[t-1] + (μ11*(X1[t-1]-ι1) + μ12*(X2t-ι2)) + sq_X2*np.dot(σ1, dW) - X2[t] = max(X2[t-1] + μ22*(X2t-ι2) + sq_X2*np.dot(σ2, dW), 1e-9) + sqrt_X2 = np.sqrt(X2_prev) + + X1[t] = ( + X1[t-1] + + dyn["μ11"] * (X1[t-1] - dyn["ι1"]) + + dyn["μ12"] * (X2_prev - dyn["ι2"]) + + sqrt_X2 * np.dot(dyn["σ1"], dW) + ) + X2[t] = max( + X2_prev + + dyn["μ22"] * (X2_prev - dyn["ι2"]) + + sqrt_X2 * np.dot(dyn["σ2"], dW), + 1e-9, + ) burn = T // 5 return X1[burn:], X2[burn:] -X1_P, X2_P = simulate_lrr( - dict(μ11=lrr_params['μ11'], μ12=lrr_params['μ12'], - μ22=lrr_params['μ22'], ι1=lrr_params['ι1'], - ι2=lrr_params['ι2'], - σ1=lrr_params['σ1'], σ2=lrr_params['σ2']), - T=600_000 -) +def kde2d_contour(ax, X1, X2, label, levels=7, fill=True, + linestyle='solid', outer_only=False): + """Estimate the stationary density and draw its contours.""" + m = min(25_000, len(X1)) + idx = np.linspace(0, len(X1) - 1, m, dtype=int) + x1 = X1[idx] + x2 = X2[idx] -X1_Ph, X2_Ph = simulate_lrr( - dict(μ_hat_11=phat_dyn['μ_hat_11'], - μ_hat_12=phat_dyn['μ_hat_12'], - μ_hat_22=phat_dyn['μ_hat_22'], - ι_hat_1=phat_dyn['ι_hat_1'], - ι_hat_2=phat_dyn['ι_hat_2'], - σ1=lrr_params['σ1'], - 
σ2=lrr_params['σ2']), - T=600_000 -) + kde = gaussian_kde(np.vstack([x2, x1])) + x2_grid = np.linspace(0.6, 1.6, 140) + x1_grid = np.linspace(-0.006, 0.006, 140) + X2g, X1g = np.meshgrid(x2_grid, x1_grid) + Z = kde(np.vstack([X2g.ravel(), X1g.ravel()])).reshape(X2g.shape) -X1_RN, X2_RN = simulate_lrr( - dict(μ11=rn_dyn['μ_rn_11'], - μ12=rn_dyn['μ_rn_12'], - μ22=rn_dyn['μ_rn_22'], - ι1=rn_dyn['ι_rn_1'], - ι2=rn_dyn['ι_rn_2'], - σ1=lrr_params['σ1'], - σ2=lrr_params['σ2']), - T=600_000 + contour_levels = np.linspace(0.12 * Z.max(), 0.9 * Z.max(), levels) + if outer_only: + contour_levels = contour_levels[:1] + + if fill: + fill_levels = np.r_[contour_levels, Z.max()] + ax.contourf(X2g, X1g, Z, levels=fill_levels, cmap='Greys', + alpha=0.85) + ax.contour(X2g, X1g, Z, levels=contour_levels, colors='0.55', + linewidths=0.4) + ax.plot([], [], color='0.25', lw=1.5, label=label) + else: + ax.contour(X2g, X1g, Z, levels=contour_levels, colors='black', + linewidths=1.5, linestyles=linestyle) + ax.plot([], [], color='black', lw=1.5, ls=linestyle, label=label) + + +dyn_true = dict( + μ11=lrr_params["μ11"], + μ12=lrr_params["μ12"], + μ22=lrr_params["μ22"], + ι1=lrr_params["ι1"], + ι2=lrr_params["ι2"], + σ1=lrr_params["σ1"], + σ2=lrr_params["σ2"], ) -``` -```{code-cell} ipython3 -def kde2d_contour(ax, X1, X2, levels=8, color='k', alpha=1.0, lw=1.5, - bandwidth=None, linestyle='solid'): - """Add 2D KDE contours to an axis.""" - xy = np.vstack([X2, X1]) - kde = gaussian_kde(xy, bw_method=bandwidth) - x2g = np.linspace(X2.min()*0.9, X2.max()*1.1, 120) - x1g = np.linspace(X1.min()*0.9, X1.max()*1.1, 120) - X2g, X1g = np.meshgrid(x2g, x1g) - Z = kde(np.vstack([X2g.ravel(), X1g.ravel()])).reshape(X2g.shape) - ax.contour(X2g, X1g, Z, levels=levels, colors=color, alpha=alpha, - linewidths=lw, linestyles=linestyle) - -fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(13, 5), sharey=True) - -kde2d_contour(ax1, X1_P, X2_P, color='navy', levels=7) -ax1.set_xlabel('conditional volatility $X_2$', 
fontsize=11) -ax1.set_ylabel('mean growth rate $X_1$', fontsize=11) -ax1.set_title(r'physical measure $P$', fontsize=12) - -kde2d_contour(ax2, X1_Ph, X2_Ph, color='navy', levels=7) -kde2d_contour(ax2, X1_RN, X2_RN, color='black', levels=3, - alpha=0.65, lw=1.2, linestyle='--') -ax2.set_xlabel('conditional volatility $X_2$', fontsize=11) -ax2.set_title(r'long-term risk-neutral $\hat{P}$', fontsize=12) - -for ax in (ax1, ax2): - ax.axhline(0, color='grey', lw=0.8, ls='--') - ax.axvline(lrr_params['ι2'], color='grey', lw=0.8, ls='--') - -ax1.annotate(f"mean X1 ~ {X1_P.mean():.4f}", xy=(0.05, 0.92), - xycoords='axes fraction', fontsize=9, color='navy') -ax1.annotate(f"mean X2 ~ {X2_P.mean():.4f}", xy=(0.05, 0.85), - xycoords='axes fraction', fontsize=9, color='navy') -ax2.annotate(f"mean X1 ~ {X1_Ph.mean():.4f}", xy=(0.05, 0.92), - xycoords='axes fraction', fontsize=9, color='navy') -ax2.annotate(f"mean X2 ~ {X2_Ph.mean():.4f}", xy=(0.05, 0.85), - xycoords='axes fraction', fontsize=9, color='navy') -ax2.plot([], [], color='navy', lw=1.5, label=r'$\hat{P}$') -ax2.plot([], [], color='black', lw=1.2, ls='--', label=r'risk-neutral $\bar{P}$') -ax2.legend(fontsize=9, loc='lower right') - -plt.suptitle('stationary distributions of $(X_1, X_2)$ under $P$ and $\\hat{P}$\n' - '(based on Figure 1 of Borovička, Hansen & Scheinkman 2016)', - fontsize=12, y=1.02) -plt.tight_layout(); plt.show() +X1_P, X2_P = simulate_lrr(dyn_true, seed=1) +X1_H, X2_H = simulate_lrr(dyn_hat, seed=2) +X1_B, X2_B = simulate_lrr(dyn_bar, seed=3) + +fig, axes = plt.subplots(1, 2, figsize=(12, 4.8), sharex=True, sharey=True) +kde2d_contour(axes[0], X1_P, X2_P, label=r'correctly specified $\mathbf{P}$') +kde2d_contour(axes[1], X1_H, X2_H, + label=r'long-term risk-neutral $\hat{\mathbf{P}}$') +kde2d_contour(axes[1], X1_B, X2_B, + label=r'risk-neutral $\bar{\mathbf{P}}$', + fill=False, linestyle='--', outer_only=True) + +for ax in axes: + ax.axhline(0, lw=0.8, ls='--') + ax.axvline(lrr_params["ι2"], lw=0.8, 
ls='--') + ax.set_xlim(0.6, 1.6) + ax.set_ylim(-0.006, 0.006) + ax.set_xlabel(r"conditional volatility $X_2$") + ax.legend(fontsize=9) + +axes[0].set_ylabel(r"mean growth rate $X_1$") +plt.tight_layout() +plt.show() ``` -The recovered measure $\hat{P}$ concentrates around **lower mean growth** (more negative -$X_1$) and **higher conditional volatility** (larger $X_2$). +The movement below the horizontal line means lower expected growth, while movement to +the right of the vertical line means higher volatility. + +### Yield implications + +Figure 2 in {cite:t}`BorovickaHansenScheinkman2016` asks how the probability +difference affects yields. + +For a cash flow $G_t$, the yield compares a forecast of the payoff with its asset price: + +$$ +y_t[G](x) += \frac{1}{t}\log E[G_t \mid X_0=x] + - \frac{1}{t}\log E[S_tG_t \mid X_0=x]. +$$ + +The first term is a forecast of the cash flow. + +The second term is its price, written using the stochastic discount factor. + +If an analyst treats $\hat{\mathbf{P}}$ as investors' beliefs, the forecast term changes +while the observed price is held fixed. + +The left panel applies this comparison to a payoff equal to aggregate consumption at +maturity. -Forecasts made using $\hat{P}$ are systematically pessimistic compared to forecasts -based on the true distribution $P$. +The recovered measure treats adverse long-run-risk states as more likely. -## Measuring the martingale component +As a result, it removes much of the long-run risk compensation from aggregate +consumption cash flows. -### Entropy bounds +The resulting consumption yields are lower than the yields computed with the correctly +specified probability measure. -Even without observing the full array of Arrow prices, we can obtain **lower bounds** on -the size of the martingale component. +The bond panel gives the comparison case: for a zero-coupon payoff, changing the payoff +forecast does not change the numerator. 
It isolates the maturity-matched discounting +against which the aggregate-consumption cash flow is compared. -For a convex function $\phi_\theta(r) = [(r)^{1+\theta} - 1] / [\theta(1+\theta)]$, the -discrepancy between $\hat{P}$ and $P$ satisfies +The calculation below uses the affine formulas implied by the long-run risk model. + +If a multiplicative functional $M$ has log drift affine in $X$ and diffusion proportional +to $\sqrt{X_2}$, then $$ -\lambda_\theta = E\!\left[\phi_\theta\!\left(\frac{\hat{H}_{t+1}}{\hat{H}_t}\right)\right] -\geq 0, +E[M_t \mid X_0=x] += \exp\{\theta_0(t)+\theta_1(t)x_1+\theta_2(t)x_2\}, $$ -with equality if and only if the martingale is trivial. +where the coefficients solve Riccati equations. -Two special cases are: +The code below computes these affine expectations under the correctly specified +measure, recomputes only the consumption forecast under the recovered measure, and keeps +asset prices fixed. -- **$\theta = -1$**: $\phi_{-1}(r) = -\log r$, so $\lambda_{-1} = -E[\log(\hat{H}_{t+1}/\hat{H}_t)]$ is the **expected log-likelihood** (entropy). -- **$\theta = 1$**: $\lambda_1 = \tfrac{1}{2}\mathrm{Var}[\hat{H}_{t+1}/\hat{H}_t]$. +It then plots median and interquartile yield bands across the same simulated initial +states. ```{code-cell} ipython3 -def φ_θ(r, θ): - """Discrepancy function.""" - if abs(θ) < 1e-10: # θ -> 0: relative entropy r log r - return r * np.log(r) - if abs(θ + 1) < 1e-10: # θ -> -1: -log r - return -np.log(r) - return (r**(1 + θ) - 1) / (θ * (1 + θ)) - - -def martingale_entropy(Q, P, θ=-1): - """Stationary-average discrepancy E[φ_θ(h_hat)].""" - _, exp_η, e_hat, P_hat = perron_frobenius(Q) - H_incr = np.where(P > 0, P_hat / P, 1.0) - π = stationary_dist(P) - - disc = 0.0 - for i in range(P.shape[0]): - for j in range(P.shape[1]): - if P[i, j] > 0: - disc += π[i] * P[i, j] * φ_θ(H_incr[i, j], θ) - return disc - - -γs_ent = np.linspace(1.0, 10.0, 50) -entropies = {'θ=-1 (neg. log)': [], 'θ=0 (rel. 
entropy)': [], 'θ=1 (variance/2)': []} - -for γ_val in γs_ent: - v_g, _, S_g = solve_ez_finite(P_phys, c_levels, δ, γ_val, gc_ex) - Q_g = S_g * P_phys - for θ, key in [(-1, 'θ=-1 (neg. log)'), (0, 'θ=0 (rel. entropy)'), - (1, 'θ=1 (variance/2)')]: - entropies[key].append(martingale_entropy(Q_g, P_phys, θ=θ)) - -fig, ax = plt.subplots(figsize=(8, 4.5)) -colors_ent = ['firebrick', 'darkorange', 'steelblue'] -for (label, vals), col in zip(entropies.items(), colors_ent): - ax.plot(γs_ent, vals, label=label, color=col, lw=2) - -ax.set_xlabel('risk aversion γ') -ax.set_ylabel(r'$E[\phi_\theta(\hat{H}_{t+1}/\hat{H}_t)]$') -ax.set_title('discrepancy measures for the martingale component\n' - '(larger values <-> larger deviation from Ross recovery)') -ax.legend(fontsize=9) -plt.tight_layout(); plt.show() +--- +mystnb: + figure: + caption: >- + Yield implications of using recovered probabilities as beliefs. Dashed + consumption-yield bands use recovered payoff forecasts with prices fixed; bond + yields are unchanged because the zero-coupon payoff has no forecast term. 
+ name: fig-mr-lrr-figure-2 +--- +def affine_expectation_coeffs(dyn, β0, β1, β2, α, horizons): + """Riccati coefficients for log E[M_t | X_0=x].""" + μ11, μ12, μ22 = dyn["μ11"], dyn["μ12"], dyn["μ22"] + ι1, ι2 = dyn["ι1"], dyn["ι2"] + σ1, σ2 = dyn["σ1"], dyn["σ2"] + + def ode(_, θ): + θ0, θ1, θ2 = θ + θ0_dot = (β0 - β1 * ι1 - β2 * ι2 + - θ1 * (μ11 * ι1 + μ12 * ι2) + - θ2 * μ22 * ι2) + θ1_dot = β1 + μ11 * θ1 + θ2_dot = (β2 + μ12 * θ1 + μ22 * θ2 + + 0.5 * np.dot(α, α) + + θ1 * np.dot(σ1, α) + + θ2 * np.dot(σ2, α) + + 0.5 * θ1**2 * np.dot(σ1, σ1) + + θ1 * θ2 * np.dot(σ1, σ2) + + 0.5 * θ2**2 * np.dot(σ2, σ2)) + return [θ0_dot, θ1_dot, θ2_dot] + + sol = solve_ivp(ode, (0, horizons[-1]), np.zeros(3), + t_eval=horizons, rtol=1e-8, atol=1e-10) + if not sol.success: + raise ValueError("Riccati equation failed to solve") + return sol.y.T + + +def log_expectation(θ, X1, X2): + """Evaluate log E[M_t | X_0=x] on simulated states.""" + return θ[:, 0, None] + θ[:, 1, None] * X1[None, :] + θ[:, 2, None] * X2[None, :] + + +def yield_quantiles(log_num, log_den, horizons): + """Quartiles of annualized yields across initial states.""" + yields = 12 * (log_num - log_den) / horizons[:, None] + return np.quantile(yields, [0.25, 0.5, 0.75], axis=1) + + +def transform_functional(β0, β1, β2, α, dyn_old, dyn_new, α_h): + """Rewrite a multiplicative functional after changing probabilities.""" + # The drift changes because the recovered likelihood ratio changes the + # Brownian shock exposure used to forecast the cash flow. 
+ β_level = β0 - β1 * dyn_old["ι1"] - β2 * dyn_old["ι2"] + β2_new = β2 + np.dot(α, α_h) + β0_new = β_level + β1 * dyn_new["ι1"] + β2_new * dyn_new["ι2"] + return β0_new, β1, β2_new, α + + +def sdf_coefficients(p, v1, v2): + """SDF coefficients used in the affine expectation calculation.""" + δ, γ = p["δ"], p["γ"] + α_c, σ1, σ2 = p["α_c"], p["σ1"], p["σ2"] + + α_h_star = (1 - γ) * (α_c + σ1 * v1 + σ2 * v2) + α_s = -α_c + α_h_star + + β_s1 = -p["β_c1"] + β_s2 = -p["β_c2"] - 0.5 * np.dot(α_h_star, α_h_star) + β_s0 = -δ - p["β_c0"] - 0.5 * p["ι2"] * np.dot(α_h_star, α_h_star) + + return β_s0, β_s1, β_s2, α_s + + +quarters = np.arange(1, 101) +horizons = 3 * quarters + +β_c0, β_c1, β_c2 = (lrr_params["β_c0"], + lrr_params["β_c1"], + lrr_params["β_c2"]) +α_c = lrr_params["α_c"] + +β_s0, β_s1, β_s2, α_s = sdf_coefficients(lrr_params, v1, v2) + +# Numerators and denominators for yields under the correctly specified measure +θ_C_P = affine_expectation_coeffs(dyn_true, β_c0, β_c1, β_c2, α_c, horizons) +θ_S_P = affine_expectation_coeffs(dyn_true, β_s0, β_s1, β_s2, α_s, horizons) +θ_SC_P = affine_expectation_coeffs( + dyn_true, β_s0 + β_c0, β_s1 + β_c1, β_s2 + β_c2, + α_s + α_c, horizons +) + +# Recovered-belief numerator for the aggregate-consumption payoff +β_Ch0, β_Ch1, β_Ch2, α_Ch = transform_functional( + β_c0, β_c1, β_c2, α_c, dyn_true, dyn_hat, dyn_hat["α_h"] +) +θ_C_H = affine_expectation_coeffs(dyn_hat, β_Ch0, β_Ch1, β_Ch2, + α_Ch, horizons) + +log_C_P = log_expectation(θ_C_P, X1_P, X2_P) +log_C_H = log_expectation(θ_C_H, X1_P, X2_P) +log_S_P = log_expectation(θ_S_P, X1_P, X2_P) +log_SC_P = log_expectation(θ_SC_P, X1_P, X2_P) + +qC_P = yield_quantiles(log_C_P, log_SC_P, horizons) +qC_H = yield_quantiles(log_C_H, log_SC_P, horizons) +qB_P = yield_quantiles(np.zeros_like(log_S_P), log_S_P, horizons) +# A zero-coupon payoff has the same numerator, log E[1] = 0, under either belief. 
+qB_H = qB_P.copy() + +fig, axes = plt.subplots(1, 2, figsize=(12, 4.8), sharex=True) + +def plot_yield_band(ax, x, q, color, label, linestyle='solid', + alpha=0.35): + """Plot quartile band and quartile lines.""" + ax.fill_between(x, q[0], q[2], color=color, alpha=alpha, linewidth=0) + ax.plot(x, q[1], color=color, lw=2.4, ls=linestyle, label=label) + ax.plot(x, q[0], color=color, lw=1.3, ls=linestyle) + ax.plot(x, q[2], color=color, lw=1.3, ls=linestyle) + + +plot_yield_band(axes[0], quarters, qC_P, color='0.2', + label='correctly specified measure', alpha=0.45) +plot_yield_band(axes[0], quarters, qC_H, color='0.65', + label='recovered measure', linestyle='--', alpha=0.35) +plot_yield_band(axes[1], quarters, qB_P, color='0.2', + label='correctly specified measure', alpha=0.45) +plot_yield_band(axes[1], quarters, qB_H, color='0.65', + label='recovered measure', linestyle='--', alpha=0.25) + +axes[0].set_xlabel('maturity (quarters)') +axes[0].set_ylabel('consumption yield to maturity') +axes[1].set_xlabel('maturity (quarters)') +axes[1].set_ylabel('bond yield to maturity') + +axes[0].legend(fontsize=9) + +plt.tight_layout() +plt.show() ``` -All three discrepancy measures increase with risk aversion, confirming that a higher -$\gamma$ implies a larger -- and more economically significant -- martingale component. +The left panel is the key one: recovered beliefs put more mass on low-growth, +high-volatility states, so they forecast lower consumption and imply lower consumption +yields when prices are held fixed. -{cite:t}`AlvarezJermann2005` and {cite:t}`BakshiChabiYo2012` use analogous bounds with -long-maturity bond returns to find empirically large martingale components in U.S. data. +The bond panel is a check. -## Exercises +Since $\log E[1]=0$ under any measure, the solid and dashed +bond-yield bands coincide. 
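The decomposition behind this check can be illustrated in a toy finite-state setting. The two-state matrices `P`, `P_hat`, `S` and the growth vector `g` below are made-up illustrative numbers, not the lecture's calibration: swapping the forecast measure moves the cash-flow yield but leaves the bond yield untouched, because $\log E[1]=0$ under any probability measure.

```python
import numpy as np

# Hypothetical two-state illustration of the yield decomposition
#   y_t[G](x) = (1/t) log E[G_t | x] - (1/t) log E[S_t G_t | x].
P = np.array([[0.8, 0.2],
              [0.3, 0.7]])       # correctly specified transition matrix
P_hat = np.array([[0.6, 0.4],
                  [0.2, 0.8]])   # recovered ("belief") transition matrix
S = np.array([[0.99, 1.02],
              [0.97, 1.00]])     # one-period SDF realizations s_ij
g = np.array([1.01, 0.99])       # one-period gross cash-flow growth g(x')

t = 20
ones = np.ones(2)
F_P = P * g                      # forecast operator under P
F_hat = P_hat * g                # forecast operator under the recovered measure
Q = (S * P) * g                  # pricing operator for the cash flow
A = S * P                        # zero-coupon (bond) pricing operator

log_fP = np.log(np.linalg.matrix_power(F_P, t) @ ones)
log_fH = np.log(np.linalg.matrix_power(F_hat, t) @ ones)
log_price = np.log(np.linalg.matrix_power(Q, t) @ ones)
log_bond = np.log(np.linalg.matrix_power(A, t) @ ones)

y_P = (log_fP - log_price) / t    # cash-flow yield with P forecasts
y_hat = (log_fH - log_price) / t  # cash-flow yield with recovered forecasts
y_bond = -log_bond / t            # bond yield: forecast term is log E[1] = 0
```

Because both `P` and `P_hat` are stochastic matrices, the bond-yield forecast term vanishes identically under either measure, so only the cash-flow yield responds to the change of forecast measure while prices are held fixed.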
-```{exercise} -:label: ex_risk_neutral +## Additional state vector + +{cite:t}`BorovickaHansenScheinkman2016` then asks whether the recovery +problem can be fixed by enlarging the state vector. -**Verify risk-neutral probabilities.** +So far, the Perron eigenfunction has depended only on the Markov state $X_t$. -Consider a two-state Markov chain with physical transition matrix +But many models also contain a growing component $Y_t$, such as log consumption, with +increments driven by the same shocks: $$ -\mathbf{P} = \begin{pmatrix} 0.8 & 0.2 \\ 0.4 & 0.6 \end{pmatrix} +X_{t+1}=\phi_x(X_t,W_{t+1}), +\qquad +Y_{t+1}-Y_t=\phi_y(X_t,W_{t+1}). $$ -and Arrow price matrix +If we allow the eigenfunction to depend on both $(X_t,Y_t)$, then a natural candidate is $$ -\mathbf{Q} = \begin{pmatrix} 0.72 & 0.15 \\ 0.36 & 0.42 \end{pmatrix}. +\varepsilon(x,y)=\exp(\zeta \cdot y)e_\zeta(x). $$ -1. Compute the risk-neutral transition matrix $\bar{\mathbf{P}}$ and verify it is a - valid probability matrix. -2. Compute the one-period discount bond prices and the implied risk-free rates in each - state. -3. Show that the SDF $\bar{s}_{ij} = \bar{q}_i$ is independent of the next state $j$. -``` +This form is natural because $Y$ enters through increments. -```{solution-start} ex_risk_neutral -:class: dropdown -``` +Along a path, -```{code-cell} ipython3 -P2 = np.array([[0.8, 0.2], - [0.4, 0.6]]) -Q2 = np.array([[0.72, 0.15], - [0.36, 0.42]]) +$$ +\exp(\zeta \cdot Y_{t+1}) += \exp(\zeta \cdot Y_t) + \exp\{\zeta \cdot (Y_{t+1}-Y_t)\}. +$$ -P_bar2, q_bonds2 = risk_neutral_probs(Q2) +Since $Y_{t+1}-Y_t$ is a function of $(X_t,W_{t+1})$, the ratio +$\exp(\zeta \cdot Y_{t+1})/\exp(\zeta \cdot Y_t)$ is a one-period +multiplicative shock. 
-print("Risk-neutral P_bar:") -print(np.round(P_bar2, 4)) -print(f"\nRow sums: {P_bar2.sum(axis=1)}") -print(f"\nBond prices q_bar_i: {q_bonds2}") -print(f"Annualized risk-free rates: {(-np.log(q_bonds2)*12).round(4)}") +Thus multiplying the old eigenfunction by $\exp(\zeta \cdot y)$ does not destroy the +Perron structure; it simply changes the one-period pricing operator by the extra factor +$\exp\{\zeta \cdot (Y_{t+1}-Y_t)\}$. -S_bar2 = np.repeat(q_bonds2[:, np.newaxis], P_bar2.shape[1], axis=1) -print(f"\nRisk-neutral SDF matrix S_bar:") -print(np.round(S_bar2, 4)) -print("Check Q = S_bar * P_bar:", np.allclose(Q2, S_bar2 * P_bar2)) +For each choice of $\zeta$, the remaining $x$-dependent part solves a different Perron +problem: -S2 = Q2 / P2 -print(f"\nPhysical SDF matrix S = Q/P:") -print(np.round(S2, 4)) -``` +$$ +E\left[ + \frac{S_{t+1}}{S_t} + \exp\{\zeta \cdot (Y_{t+1}-Y_t)\} + e_\zeta(X_{t+1}) + \mid X_t=x +\right] +=\exp(\eta_\zeta)e_\zeta(x). +$$ -```{solution-end} -``` +Changing $\zeta$ changes how much long-run growth risk is loaded into the eigenfunction. -```{exercise} -:label: ex_gamma_sensitivity +Thus adding $Y_t$ can make the subjective probability law one possible solution, but it +also creates a family of possible solutions. -**Risk aversion and recovery distortion under recursive utility.** +The extra state variable therefore does not remove the identification problem; it +usually makes the selection problem more explicit. -Using the three-state Epstein--Zin example from the lecture (with $\exp(-\delta)=0.99$, -$g_c=0.001$, and consumption levels $c = [0.85, 1.00, 1.15]$), investigate how the -recovered probability vector $\hat{\boldsymbol{\pi}}$ depends on the risk aversion -parameter $\gamma$. +The paper also points out a related practical issue. -1. For each $\gamma \in \{1, 2, 5, 10, 15\}$, compute the long-term risk-neutral - stationary distribution $\hat{\boldsymbol{\pi}}$ using the recursive-utility SDF. -2. 
Plot all five distributions as grouped bar charts alongside the physical - distribution $\boldsymbol{\pi}$. -3. Does the recession probability under $\hat{\mathbf{P}}$ exceed $50\%$ for - $\gamma \leq 30$? +Highly persistent stationary processes can be hard to distinguish from processes with +stationary increments. -If not, report the maximum value on that range. -``` +A stationary approximation may have a unique Perron solution for each finite persistence +level, but as persistence becomes extreme, the limiting problem can have many +near-solutions. -```{solution-start} ex_gamma_sensitivity -:class: dropdown -``` +Numerically, this means recovery can become fragile exactly in the cases where a +stationary model is being used to approximate stochastic growth. -```{code-cell} ipython3 -γs_ex2 = [1, 2, 5, 10, 15] -all_π = [] - -for γ_val in γs_ex2: - _, _, S_g = solve_ez_finite(P_phys, c_levels, δ, γ_val, gc_ex) - Q_g = S_g * P_phys - _, _, _, Ph_g = perron_frobenius(Q_g) - all_π.append(stationary_dist(Ph_g)) - -fig, ax = plt.subplots(figsize=(10, 4.5)) -x = np.arange(3) -w = 0.13 -colors_g = plt.cm.Blues(np.linspace(0.3, 0.9, len(γs_ex2))) - -bars = ax.bar(x - 3*w, π_phys, width=w, color='grey', alpha=0.7, label='physical P') -for b_, v in zip(bars, π_phys): - ax.text(b_.get_x()+w/2, v+0.005, f'{v:.3f}', ha='center', va='bottom', fontsize=7) - -for k, (γ_val, π_g, col) in enumerate(zip(γs_ex2, all_π, colors_g)): - bars = ax.bar(x + (k-1.5)*w, π_g, width=w, color=col, - label=f'γ={γ_val}') - for b_, v in zip(bars, π_g): - ax.text(b_.get_x()+w/2, v+0.005, f'{v:.3f}', - ha='center', va='bottom', fontsize=7) - -ax.set_xticks(x); ax.set_xticklabels(state_names) -ax.set_ylabel('stationary probability') -ax.set_title(r'stationary distribution of $\hat{P}$ for varying risk aversion $\gamma$') -ax.legend(fontsize=8, loc='upper right') -plt.tight_layout(); plt.show() - -γs_fine = np.linspace(1, 30, 200) -rec_probs = [] -for γ_val in γs_fine: - _, _, S_g = 
solve_ez_finite(P_phys, c_levels, δ, γ_val, gc_ex) - Q_g = S_g * P_phys - _, _, _, Ph_g = perron_frobenius(Q_g) - rec_probs.append(stationary_dist(Ph_g)[0]) - -idx50 = np.where(np.array(rec_probs) > 0.5)[0] -if len(idx50) > 0: - print(f"\nRecession prob under P_hat exceeds 50% at approximately γ ~ {γs_fine[idx50[0]]:.1f}") -else: - print(f"\nRecession prob under P_hat does not exceed 50% for γ <= 30") - print(f" Maximum recession prob = {max(rec_probs):.4f} at γ = 30") -``` +There is, however, a structured way forward. -```{solution-end} -``` +If the analyst supplies a reference multiplicative functional $Y^r$ that is known to +contain the martingale component of the SDF, then one can restrict the enlarged +eigenfunction to the form + +$$ +(Y^r)^{-1}e(x). +$$ + +This restriction chooses which long-run martingale component is allowed into the +eigenfunction. + +With this extra structure, Arrow prices can again reveal subjective probabilities. + +But the key input is external: the long-run martingale component has been supplied by +the analyst, not recovered from Arrow prices alone. + +## Lessons + +The Perron--Frobenius calculation remains useful under misspecification, but it no +longer solves the belief-recovery problem by itself. + +It delivers a probability measure that may include long-horizon risk premia. + +That measure equals investors' beliefs only when the likelihood-ratio martingale is +constant. + +Recursive utility, permanent shocks, and long-run risk models give this martingale an +economically important role, so it should not be overlooked when assessing the +implications of transition independence for belief recovery. + +## Exercises ```{exercise} -:label: ex_lrr_gamma +:label: ex_misspecified_recovery_diagnostic -**Effect of risk aversion in the long-run risk model.** +**A two-state diagnostic.** -Repeat the long-run risk simulation from the lecture for $\gamma \in \{5, 10, 15\}$ -(keeping all other parameters fixed at their calibrated values). 
+Let -1. For each $\gamma$, compute $(\bar{e}_1, \bar{e}_2)$ and $\hat{\eta}$. -2. Plot $\hat{\iota}_1$ (long-run mean of $X_1$ under $\hat{P}$) as a function of - $\gamma$ and interpret the result in terms of long-run expected consumption growth. -3. Plot $\hat{\iota}_2$ (long-run mean of $X_2$ under $\hat{P}$) as a function of - $\gamma$ and interpret it in terms of long-run volatility. +$$ +\mathbf{P} = +\begin{pmatrix} +0.8 & 0.2 \\ +0.4 & 0.6 +\end{pmatrix}, +\qquad +\mathbf{Q} = +\begin{pmatrix} +0.72 & 0.15 \\ +0.36 & 0.42 +\end{pmatrix}. +$$ + +1. Compute the one-period risk-neutral transition matrix $\bar{\mathbf{P}}$. +2. Compute the recovered transition matrix $\hat{\mathbf{P}}$. +3. Compute $\hat h_{ij}=\hat p_{ij}/p_{ij}$ and decide whether recovery returns the + correctly specified transition matrix. ``` -```{solution-start} ex_lrr_gamma +```{solution-start} ex_misspecified_recovery_diagnostic :class: dropdown ``` -```{code-cell} ipython3 -γs_lrr = np.linspace(2.0, 18.0, 40) -ι_hat_1_vals = [] -ι_hat_2_vals = [] -η_hat_vals = [] - -p_copy = dict(lrr_params) - -for γ_val in γs_lrr: - p_copy['γ'] = γ_val - try: - v1g, v2g, A_g, _ = solve_value_function(p_copy) - e1g, e2g, η_g, α_sg = solve_pf_lrr(p_copy, v1g, v2g, A_g) - dyn_g = compute_phat_dynamics(p_copy, e1g, e2g, α_sg) - ι_hat_1_vals.append(dyn_g['ι_hat_1']) - ι_hat_2_vals.append(dyn_g['ι_hat_2']) - η_hat_vals.append(η_g) - except Exception: - ι_hat_1_vals.append(np.nan) - ι_hat_2_vals.append(np.nan) - η_hat_vals.append(np.nan) - -fig, axes = plt.subplots(1, 3, figsize=(14, 4)) - -axes[0].plot(γs_lrr, ι_hat_1_vals, color='steelblue', lw=2.5) -axes[0].axhline(lrr_params['ι1'], ls='--', color='grey', lw=1.5, - label=f"physical ι1 = {lrr_params['ι1']}") -axes[0].set_xlabel('risk aversion γ'); axes[0].set_ylabel(r'$\hat{\iota}_1$') -axes[0].set_title('long-run mean of $X_1$ under $\\hat{P}$\n(down = lower expected growth)') -axes[0].legend(fontsize=9) - -axes[1].plot(γs_lrr, ι_hat_2_vals, 
color='firebrick', lw=2.5) -axes[1].axhline(lrr_params['ι2'], ls='--', color='grey', lw=1.5, - label=f"physical ι2 = {lrr_params['ι2']}") -axes[1].set_xlabel('risk aversion γ'); axes[1].set_ylabel(r'$\hat{\iota}_2$') -axes[1].set_title('long-run mean of $X_2$ under $\\hat{P}$\n(up = higher expected volatility)') -axes[1].legend(fontsize=9) +Here is one solution: -axes[2].plot(γs_lrr, np.array(η_hat_vals)*12, color='purple', lw=2.5) -axes[2].set_xlabel('risk aversion γ'); axes[2].set_ylabel(r'annualized $\hat{\eta}$') -axes[2].set_title('long-run discount rate $\\hat{\\eta}$\n(more negative = higher long-run yield)') +```{code-cell} ipython3 +P2 = np.array([[0.8, 0.2], + [0.4, 0.6]]) +Q2 = np.array([[0.72, 0.15], + [0.36, 0.42]]) -plt.tight_layout(); plt.show() +Pbar2, qb2 = risk_neutral_probs(Q2) +H2, eta2, e2, Phat2 = martingale_increment(Q2, P2) +print("One-period risk-neutral transition matrix P_bar") +print(np.round(Pbar2, 4)) +print("\nRecovered transition matrix P_hat") +print(np.round(Phat2, 4)) +print("\nMartingale increment h_hat") +print(np.round(H2, 4)) +print("\nRecovery returns P:", np.allclose(H2[P2 > 0], 1)) ``` ```{solution-end} ``` ```{exercise} -:label: ex_recovery_test - -**Testing the Ross recovery condition.** - -Show algebraically and numerically that, for any $n$-state power-utility model with -trend-stationary consumption (as in Example 1 of -{cite}`BorovickaHansenScheinkman2016`), the martingale increment satisfies -$\hat{h}_{ij} \equiv 1$. - -1. Write the SDF as $s_{ij} = A \cdot (c_j/c_i)^{-\gamma}$ for some constant $A$, - show that the Perron-Frobenius eigenvector is $\hat{e}_j = c_j^\gamma$ (up to - scale), and find $\hat{\eta}$. -2. Compute $\hat{p}_{ij} = \exp(-\hat{\eta}) q_{ij} \hat{e}_j / \hat{e}_i$ and verify - it equals $p_{ij}$. -3. Confirm numerically for the three-state example with $\gamma = 5$ and - $c = [0.85, 1.00, 1.15]$. 
-``` +:label: ex_power_utility_success -```{solution-start} ex_recovery_test -:class: dropdown -``` +**Power utility benchmark.** -**Analytical derivation:** +For trend-stationary consumption and power utility, -With $s_{ij} = A \cdot (c_j/c_i)^{-\gamma}$ we have -$q_{ij} = A(c_j/c_i)^{-\gamma} p_{ij}$. +$$ +s_{ij}=A\left(\frac{c_j}{c_i}\right)^{-\gamma}. +$$ -Guess $\hat{e}_j = c_j^\gamma$. +Show that $\hat e_i=c_i^\gamma$ is the Perron eigenvector and that +$\hat{\mathbf{P}}=\mathbf{P}$. -Then +Then verify the result numerically using the three-state baseline in the lecture. +``` + +```{solution-start} ex_power_utility_success +:class: dropdown +``` + +The analytical check is: $$ -[\mathbf{Q} \hat{\mathbf{e}}]_i -= \sum_j q_{ij} \hat{e}_j -= A \sum_j \frac{c_j^{-\gamma}}{c_i^{-\gamma}} p_{ij} \cdot c_j^\gamma -= A c_i^\gamma \sum_j p_{ij} -= A \hat e_i. +[\mathbf{Q}\hat e]_i +=\sum_j A\left(\frac{c_j}{c_i}\right)^{-\gamma}p_{ij}c_j^\gamma +=A c_i^\gamma +=A\hat e_i. $$ -So $\mathbf{Q}\hat{\mathbf{e}} = A \hat{\mathbf{e}}$, confirming -$\hat{\mathbf{e}} = \{c_j^\gamma\}$ and $\exp(\hat{\eta}) = A$. - -Therefore +Thus $\exp(\hat\eta)=A$ and $$ -\hat{p}_{ij} -= \frac{1}{A} q_{ij} \frac{\hat{e}_j}{\hat{e}_i} -= \frac{1}{A} \cdot A \frac{c_j^{-\gamma}}{c_i^{-\gamma}} p_{ij} - \cdot \frac{c_j^\gamma}{c_i^\gamma} -= p_{ij}. +\hat p_{ij} +=\frac{1}{A}q_{ij}\frac{\hat e_j}{\hat e_i} +=p_{ij}. $$ -Hence $\hat{h}_{ij} = \hat{p}_{ij}/p_{ij} = 1$ for all $(i,j)$. +Below is the numerical check. 
```{code-cell} ipython3 -gc_ex4 = 0.002 -S_ts = np.zeros((3, 3)) -for i in range(3): - for j in range(3): - S_ts[i, j] = np.exp(-δ - γ*gc_ex4) * (c_levels[j]/c_levels[i])**(-γ) +H_power, _, e_power, P_hat_power = martingale_increment(Q_power, P_true) +e_theory = c_levels**γ_power +e_theory = e_theory / e_theory.sum() + +print("Perron eigenvector") +print(np.round(e_power, 6)) +print("\nNormalized c^gamma") +print(np.round(e_theory, 6)) +print("\nmax |P_hat - P|:", + np.max(np.abs(P_hat_power - P_true))) +print("max |h_hat - 1|:", + np.max(np.abs(H_power[P_true > 0] - 1))) +``` -Q_ts = S_ts * P_phys +```{solution-end} +``` + +```{exercise} +:label: ex_recursive_utility_distortion + +**Recursive utility and risk aversion.** + +Using the finite-state Epstein--Zin example with +$c=(0.85, 1.00, 1.15)$, compute the stationary distribution of +$\hat{\mathbf{P}}$ for $\gamma \in \{1, 5, 10, 15\}$. -_, exp_η_ts, e_hat_ts, P_hat_ts = perron_frobenius(Q_ts) +Which state receives the largest increase in stationary probability as $\gamma$ rises? +``` -e_theory = c_levels**γ -e_theory /= e_theory.sum() +```{solution-start} ex_recursive_utility_distortion +:class: dropdown +``` -print("e_hat:", np.round(e_hat_ts, 6)) -print("c^γ normalized:", np.round(e_theory, 6)) -print(f"Max discrepancy: {np.abs(e_hat_ts - e_theory).max():.2e}") +Here is one solution: -H_ts = np.where(P_phys > 0, P_hat_ts / P_phys, 0.0) -print("\nh_hat:") -print(np.round(H_ts, 6)) -print(f"Max |h_hat_ij - 1|: {np.abs(H_ts[P_phys>0] - 1).max():.2e}") +```{code-cell} ipython3 +for γ in [1, 5, 10, 15]: + _, _, S_g = solve_ez_unit_eis(P_true, c_recursive, δ, γ, g_c) + Q_g = S_g * P_true + _, _, _, P_hat_g = martingale_increment(Q_g, P_true) + π_g = stationary_dist(P_hat_g) + print(f"gamma={γ:2.0f}: {np.round(π_g, 4)}") + +print("\nCorrectly specified:", np.round(π_true, 4)) ``` +The recession state receives the largest increase. 
+ ```{solution-end} ``` diff --git a/lectures/ross_recovery.md b/lectures/ross_recovery.md index 4f62e0349..8e9ca0e58 100644 --- a/lectures/ross_recovery.md +++ b/lectures/ross_recovery.md @@ -28,8 +28,16 @@ kernelspec: ## Overview -Option prices reveal **risk-neutral probabilities**, the probabilities implied by asset -prices once risk adjustments have been folded in. +Asset prices are forward-looking: they encode investors' expectations about future +economic states and their valuations of different risks. + +A long-standing question in finance is whether one can *recover* the probability +distribution used by investors -- their subjective beliefs -- from observed asset +prices alone. + +Option prices reveal **state prices**; once these are normalized by the riskless +discount factor, the resulting probabilities are the **risk-neutral probabilities** +implied by asset prices after risk adjustments have been folded in. These are not the **natural probabilities** that investors actually assign to future states of the world. @@ -46,9 +54,10 @@ about the preferences of a representative investor. {cite:t}`Ross2015` showed otherwise. Under a structural restriction on the pricing kernel called **transition independence**, -the natural probability distribution and the pricing kernel can be uniquely recovered -from state prices alone with no historical return data and no assumed utility -function, provided the state-price system is Markov, irreducible, and sufficiently rich. +together with no-arbitrage and irreducibility of an identified finite-state Markov +Arrow–Debreu state-price transition matrix, the natural probability transition matrix +and the transition pricing kernel can be uniquely recovered from state prices alone +with no historical return data and no assumed utility function. This is the **Recovery Theorem**. @@ -125,6 +134,27 @@ $$ (eq:canon_ge) The key structural property this implies is **transition independence**. 
+### The identification challenge + +Before stating the restriction, it helps to see why one is needed at all. + +Given $P$, any pair $(\phi, f)$ satisfying $p_{ij} = \phi_{ij} f_{ij}$ for every +$(i,j)$ is consistent with observed state prices. + +The state-price matrix $P$ supplies $m^2$ equations. + +A natural transition matrix $F$ +contributes $m(m-1)$ free entries (rows sum to one), and an arbitrary kernel $\phi$ +contributes another $m^2$ -- a total of $2m^2 - m$ unknowns against only $m^2$ +equations. + +The system is under-identified by exactly $m^2 - m$ parameters, so some structural +restriction on the kernel is needed to pin down $\phi$ and $f$ separately. + +Transition independence below is one such restriction: it cuts $\phi$ from $m^2$ free +entries down to $m$ (a state function $h$ free up to scale plus a discount factor +$\beta$), closing the identification gap exactly. + ### Transition independence @@ -223,8 +253,9 @@ factor) and the Perron vector $z$ determines $D$ via $D_{ii} = 1/z_i$. The three assumptions in the theorem each carry a specific role. -No-arbitrage guarantees that $P$ has nonnegative entries and that the state prices -encode a well-defined pricing measure. +Assuming the Arrow–Debreu state prices are identified, no-arbitrage guarantees that +$P$ has nonnegative entries and that the state prices encode a well-defined pricing +measure. Irreducibility ensures the economy is not divided into disconnected sub-economies -- without it, the Perron–Frobenius theorem gives multiple candidate eigenvectors and @@ -244,10 +275,9 @@ Suppose prices provide no arbitrage opportunities, that the state price transition matrix $P$ is irreducible, and that the pricing kernel is transition independent. -Then there exists a *unique* positive solution $(\beta, z, F)$ to the recovery problem. - -That is, under these assumptions, the state prices imply a unique compatible natural -probability transition matrix and a unique transition pricing kernel. 
+Then there exists a positive solution $(\beta, z, F)$ to the recovery problem in which +$z$ is unique up to normalization, and the implied natural probability transition +matrix $F$ and transition pricing kernel are unique. ``` ```{prf:proof} @@ -294,9 +324,13 @@ where $h(\theta_i) = \beta/z_i$ follows from $D_{ii} = h(\theta_i)/\beta = 1/z_i Destination states with high $z_j$ have *low* kernel values: for a fixed origin $i$, the kernel $\beta z_i/z_j$ is decreasing in $z_j$. -This means the market assigns relatively less pricing weight per unit of probability to -high-$z_j$ outcomes -- consistent with those states being "good times" that require less -insurance. +When $h$ is interpreted as marginal utility and states are ordered by consumption or +payoff, larger $z_j$ corresponds to lower marginal utility -- "good times" that +require less insurance and so receive less pricing weight per unit of natural +probability. + +This monotonic interpretation is not guaranteed for an arbitrary ordering +of stock-market states. The same eigenvector argument also clarifies a useful limiting case. @@ -364,11 +398,13 @@ $$ Following Ross's Table I, we represent the distribution on a finite grid of states. -Ross's table uses a fixed future payoff distribution, so its rows of $F$ are -identical. +This example is Ross-inspired rather than an exact reproduction of Ross's Table I. + +Ross's Table I uses a fixed future payoff distribution, so its rows of $F$ are +identical. -Here we apply the same finite-grid construction to a Markov transition matrix with -lognormal-shaped rows. +Here the same CRRA/lognormal pricing logic is embedded in a finite Markov +transition matrix whose rows shift with the current state. Ross uses states from $-5$ to $+5$ standard deviations; we use the same range below. @@ -811,7 +847,7 @@ where $\sigma(M)$ is the standard deviation of the actual one-period stochastic factor projected on that filtration. 
Arbitrary orthogonal noise in a candidate kernel does not tighten this market-efficiency bound. -This follows from the {doc}`advacned:hansen_jagannathan_1991` {cite}`Hansen_Jagannathan_1991`. +This follows from the Hansen–Jagannathan bound {cite}`Hansen_Jagannathan_1991`. Equivalently, under the Recovery Theorem assumptions, the $R^2$ of return-forecasting regressions based on that information set is bounded above by the variance of the @@ -843,7 +879,7 @@ If the kernel is not transition independent, recovery is not guaranteed. long-run risk component of the kernel with the natural probability distribution, yielding an incorrect decomposition. -*Empirical estimationL* +*Empirical estimation:* Extracting reliable state prices from observed option prices requires careful interpolation and extrapolation. @@ -880,8 +916,9 @@ $$ (c) Verify that each row of $F$ sums to one and all entries are positive. -(d) Compute the relative kernel component $1/z_i$ for each state. For a transition from -state $i$ to state $j$, the full pricing kernel is $\beta z_i/z_j$. +(d) For destination state $j$, the relative kernel component is $1/z_j$; for a +transition from state $i$ to state $j$, the full pricing kernel is $\beta z_i/z_j$. +Compute $1/z_j$ for each state. Does the kernel decrease as we move from state 1 to state 3 (i.e., from bad to good states)? @@ -1005,7 +1042,7 @@ Write a function `tail_risk_ratio(γ, threshold, μ, σ, ρ, T)` that: 1. Constructs the state price matrix $P$ using `build_state_price_matrix` with the given parameters and `n_states=41`. 2. Applies `recover_natural_distribution` to obtain $F$. -3. Computes $P(\text{log-return} < \text{threshold})$ under both the natural +3. Computes $P(\text{log-return} \leq \text{threshold})$ under both the natural and risk-neutral distributions starting from the middle state. 4. Returns the ratio $p_\text{risk-neutral} / p_\text{natural}$. 
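Step 3 of the exercise can be sketched in isolation. The matrices and log-return grid below are hypothetical stand-ins, not outputs of `build_state_price_matrix` or `recover_natural_distribution`, so this only illustrates the tail-probability comparison, not the 41-state solution:

```python
import numpy as np

# Hypothetical 3-state stand-ins for the exercise's 41-state objects.
F = np.array([[0.5, 0.3, 0.2],        # natural transition matrix
              [0.3, 0.4, 0.3],
              [0.2, 0.3, 0.5]])
P_rn = np.array([[0.6, 0.3, 0.1],     # risk-neutral transition matrix
                 [0.4, 0.4, 0.2],
                 [0.3, 0.3, 0.4]])
log_ret = np.array([-0.2, 0.0, 0.2])  # log-return attached to each destination
threshold = -0.1
mid = 1                               # middle (current) state

in_tail = log_ret <= threshold
p_natural = F[mid, in_tail].sum()
p_risk_neutral = P_rn[mid, in_tail].sum()
ratio = p_risk_neutral / p_natural
print(ratio)
```

Because risk-neutral probabilities overweight bad states, the ratio exceeds one in this toy example; the exercise asks how the analogous ratio behaves under the model's parameters.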
From 5f9e369b476bb2c4b9ef5d9cb1a18ceb4f0270a7 Mon Sep 17 00:00:00 2001 From: HumphreyYang Date: Mon, 27 Apr 2026 15:32:44 +0800 Subject: [PATCH 20/26] updates Co-authored-by: Copilot --- lectures/misspecified_recovery.md | 586 ++++++++++++++++++++++-------- lectures/ross_recovery.md | 163 +++++---- 2 files changed, 520 insertions(+), 229 deletions(-) diff --git a/lectures/misspecified_recovery.md b/lectures/misspecified_recovery.md index aa56f8ca1..d24b34ecb 100644 --- a/lectures/misspecified_recovery.md +++ b/lectures/misspecified_recovery.md @@ -28,49 +28,106 @@ kernelspec: ## Overview -The lecture {doc}`ross_recovery` studies conditions under which recovery is valid. +The lecture {doc}`ross_recovery` studies the case in which recovery is valid. There, **transition independence** lets us use Arrow prices to separate investors' beliefs from the pricing kernel. -This lecture asks what the same Perron--Frobenius calculation delivers when transition -independence fails. +This lecture asks what the same Perron--Frobenius calculation delivers when that +restriction fails. -{cite:t}`BorovickaHansenScheinkman2016` show that the stochastic discount factor can be -decomposed into three pieces: a deterministic long-run discount component, a -state-dependent eigenfunction ratio, and a martingale likelihood ratio. +We will keep three probability laws separate. -The first two pieces are exactly what the Perron--Frobenius eigenpair can absorb. +The first is the correctly specified transition law, which is the law that actually +governs the Markov state in the model. -The martingale piece is different: it changes the probability measure. +In the paper, this can be interpreted as the actual law under rational expectations. -In their words, it produces a probability measure that "absorbs long-term risk -adjustments" {cite}`BorovickaHansenScheinkman2016`. +Interpreting it as investors' subjective beliefs requires additional assumptions. 
-Thus the probabilities recovered from Arrow prices need not be the correctly specified -transition probabilities for the state process. +The second is the one-period risk-neutral law, which comes from normalizing one-period +Arrow prices by bond prices. -Instead, they can already include compensation for long-run risk. +The third is the Perron, or recovered, law, which is the probability law produced by the +same eigenvector calculation used in Ross recovery. -The likelihood ratio between the recovered probabilities and the correctly specified -probabilities is the martingale component. +The central question is whether the recovered law equals the correctly specified law. -If that martingale is constant, Ross recovery returns the correctly specified +{cite:t}`BorovickaHansenScheinkman2016` show that, in general, the answer is no. + +A likelihood ratio is just a ratio of probabilities under two probability laws. + +The reason is that the stochastic discount factor can contain a likelihood-ratio term +that changes the probability measure. + +If that likelihood-ratio term is constant, Ross recovery returns the correctly specified transition probabilities. -If it is not constant, the recovered measure embeds long-horizon risk adjustments. +If it is not constant, the recovered law includes risk adjustments that matter for +long-horizon claims, because likelihood-ratio increments compound along histories. In the examples below, this typically shifts probability toward adverse long-run-risk -states, so the recovered measure looks more pessimistic than the correctly specified -probability law. +states, so the recovered law looks more pessimistic than the correctly specified law. 
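To make "compound along histories" concrete, here is a minimal sketch with two made-up transition laws for a two-state chain; the product of one-step increments along a path reproduces the ratio of the two path probabilities:

```python
import numpy as np

P = np.array([[0.9, 0.1],
              [0.5, 0.5]])      # correctly specified law (illustrative)
P_hat = np.array([[0.8, 0.2],
                  [0.4, 0.6]])  # recovered law (illustrative)

h = P_hat / P                   # one-step likelihood-ratio increments

# Along the history 0 -> 1 -> 1 the increments multiply...
path = [0, 1, 1]
L = 1.0
for i, j in zip(path[:-1], path[1:]):
    L *= h[i, j]

# ...and reproduce the ratio of the two path probabilities.
ratio = (P_hat[0, 1] * P_hat[1, 1]) / (P[0, 1] * P[1, 1])
print(L, ratio)
```

Even modest per-step distortions (here 2.0 and 1.2) multiply to a large path-level distortion, which is why long-horizon claims are the most affected.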
We will:

- use results from {doc}`ross_recovery` without re-proving them,
-- diagnose misspecification through the likelihood-ratio martingale,
+- diagnose misspecification through a likelihood-ratio term,
- show why recursive utility and permanent shocks break recovery,
- measure the difference in a long-run risk model.

+### The broader framework
+
+The paper's framework is more general than the finite-state matrices used first in this
+lecture.
+
+It starts with a Markov state $X_t$ and, when needed, an auxiliary process $Y_t$ with
+stationary increments.
+
+The auxiliary process lets the model record shocks or growth components that are not
+fully summarized by $X_t$ alone.
+
+The basic objects are **multiplicative functionals**.
+
+A positive process $M_t$ is a multiplicative functional when its log increments depend
+on the current state and the next shock.
+
+Stochastic discount factors, cash-flow growth processes, and likelihood-ratio
+martingales are all treated this way.
+
+In the paper, a stochastic discount factor $S_t$ prices bounded claims by
+
+$$
+\Pi_{\tau,t}(\Phi_t)
+= E\left[\frac{S_t}{S_\tau}\Phi_t \mid \mathcal F_\tau\right].
+$$
+
+For a payoff $f(X_t)$, this defines a pricing operator
+
+$$
+[Q_t f](x)
+= E[S_t f(X_t) \mid X_0=x].
+$$
+
+The Perron--Frobenius problem is therefore an operator problem:
+
+$$
+[Q_t \hat e](x) = \exp(\hat\eta t)\hat e(x).
+$$
+
+The associated likelihood-ratio martingale is
+
+$$
+\frac{\hat H_t}{\hat H_0}
+= \exp(-\hat\eta t) S_t
+ \frac{\hat e(X_t)}{\hat e(X_0)}.
+$$
+
+In a finite-state one-period model, $Q_t$ becomes a matrix power and $\hat e$ becomes a
+positive eigenvector.
+
+That is the special case we use below to make the mechanics transparent.
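The finite-state specialization can be checked directly. With a hypothetical Arrow price matrix, the horizon-$t$ pricing operator is the matrix power $\mathbf{Q}^t$, and the operator eigenfunction equation reduces to an ordinary eigenvector problem:

```python
import numpy as np

Q = np.array([[0.55, 0.40],     # hypothetical one-period Arrow price matrix
              [0.30, 0.65]])

# Positive (Perron) eigenpair of Q.
eigvals, eigvecs = np.linalg.eig(Q)
k = np.argmax(eigvals.real)
lam = eigvals.real[k]           # plays the role of exp(eta_hat)
e_hat = np.abs(eigvecs[:, k].real)

# [Q_t e](x) = exp(eta_hat t) e(x) becomes Q^t e = lam^t e.
t = 5
lhs = np.linalg.matrix_power(Q, t) @ e_hat
rhs = lam**t * e_hat
print(np.allclose(lhs, rhs))
```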
+ ```{code-cell} ipython3 import numpy as np import matplotlib.pyplot as plt @@ -79,8 +136,10 @@ from scipy.integrate import solve_ivp from scipy.stats import gaussian_kde ``` -The next cell contains code inherited from the previous lecture: row-normalizing Arrow -prices, finding a positive Perron pair, and computing stationary distributions. +The next cell contains code inherited from the previous lecture. + +It row-normalizes Arrow prices, finds the positive Perron eigenpair, and computes +stationary distributions. ```{code-cell} ipython3 :tags: [hide-input] @@ -93,7 +152,7 @@ def risk_neutral_probs(Q): def perron_frobenius(Q): - """Positive Perron pair and induced long-term risk-neutral transition matrix.""" + """Positive Perron pair and induced recovered transition matrix.""" eigenvalues, eigenvectors = linalg.eig(Q) eigenvalues = np.real_if_close(eigenvalues, tol=1000) eigenvectors = np.real_if_close(eigenvectors, tol=1000) @@ -143,13 +202,13 @@ def martingale_increment(Q, P): return H, eta, e, P_hat ``` -## One-period and long-term risk-neutral matrices +## Three transition matrices Let $\mathbf{P}=[p_{ij}]$ denote the correctly specified transition matrix and $\mathbf{Q}=[q_{ij}]$ the Arrow price matrix. -Here "correctly specified" means the transition law that actually governs the Markov -state in the model. +Here "correctly specified" means that $\mathbf{P}$ is the transition law that actually +governs the Markov state in the model. The one-period stochastic discount factor (SDF) satisfies @@ -157,11 +216,13 @@ $$ q_{ij} = s_{ij} p_{ij}. $$ -We will compare $\mathbf{P}$ with two probability matrices constructed from the same -Arrow price matrix $\mathbf{Q}$. +We will compare $\mathbf{P}$ with two probability matrices constructed from +$\mathbf{Q}$. + +The first one is the **one-period risk-neutral matrix**. 
-First, the **one-period risk-neutral matrix** divides each row of $\mathbf{Q}$ by the -price of a one-period discount bond in the current state: +It divides each row of $\mathbf{Q}$ by the price of a one-period discount bond in the +current state: $$ \bar p_{ij} @@ -170,8 +231,9 @@ $$ This matrix absorbs one-period risk adjustments into transition probabilities. -Second, the **long-term risk-neutral matrix** uses the positive Perron eigenpair of -$\mathbf{Q}$. +The second one is the **Perron recovered matrix**. + +It starts from the positive Perron eigenpair of $\mathbf{Q}$. Let $(\exp(\hat \eta), \hat e)$ solve @@ -186,16 +248,38 @@ $$ = \exp(-\hat \eta) q_{ij} \frac{\hat e_j}{\hat e_i}. $$ -This construction removes the state-dependent Perron eigenfunction from Arrow prices -and returns a stochastic matrix $\hat{\mathbf{P}}$. +The factor $\hat e_j/\hat e_i$ is chosen to cancel any SDF component of the form +$\exp(\hat \eta)\hat e_i/\hat e_j$. + +The result is a stochastic matrix $\hat{\mathbf{P}}$. + +This construction assumes that the relevant Arrow-price matrix has a positive Perron +pair that is unique up to scale. + +In the finite-state examples below this condition is satisfied, while in more general +state spaces the paper imposes additional stability and ergodicity conditions. + +In particular, positive eigenfunctions need not be unique in continuous state spaces. + +The paper's uniqueness result selects the Perron solution whose likelihood-ratio +martingale makes $X_t$ stationary and ergodic under the recovered probability measure. + +Following {cite:t}`BorovickaHansenScheinkman2016`, $\hat{\mathbf{P}}$ is called a +**long-term risk-neutral** transition matrix. + +The name means that the Perron eigenpair isolates the part of pricing that dominates +long-maturity Arrow claims. + +It is not the same object as the one-period risk-neutral matrix +$\bar{\mathbf{P}}$. 
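Here is a small numerical sketch of the distinction, using a made-up Arrow price matrix: both constructions yield stochastic matrices, but they generally differ.

```python
import numpy as np

Q = np.array([[0.50, 0.40],      # hypothetical Arrow price matrix
              [0.25, 0.70]])

# One-period risk-neutral matrix: divide each row by the bond price.
bond_price = Q.sum(axis=1)
P_bar = Q / bond_price[:, None]

# Perron recovered matrix: strip out the positive eigenpair.
eigvals, eigvecs = np.linalg.eig(Q)
k = np.argmax(eigvals.real)
lam = eigvals.real[k]
e_hat = np.abs(eigvecs[:, k].real)
P_hat = Q * e_hat[None, :] / e_hat[:, None] / lam

print(P_bar.sum(axis=1), P_hat.sum(axis=1))  # rows of both sum to one
print(np.allclose(P_bar, P_hat))             # False: different objects
```

When the row sums of $\mathbf{Q}$ are constant, the Perron vector is constant and the two matrices coincide; state-dependent bond prices are what drives them apart.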
In {doc}`ross_recovery`, transition independence pins down the split between $s_{ij}$ and $p_{ij}$. Here we drop transition independence. -The question is whether $\hat{\mathbf{P}}$ still -equals the correctly specified transition matrix $\mathbf{P}$. +The question is whether the Perron recovered matrix $\hat{\mathbf{P}}$ still equals the +correctly specified matrix $\mathbf{P}$. ### Where recovery works @@ -234,7 +318,8 @@ S_power = ( Q_power = S_power * P_true ``` -We now compute both risk-neutral matrices from the same Arrow price matrix. +We now compute the one-period risk-neutral matrix and the Perron recovered matrix from +the same Arrow price matrix. ```{code-cell} ipython3 P_bar, q_bonds = risk_neutral_probs(Q_power) @@ -244,32 +329,50 @@ P_bar, q_bonds = risk_neutral_probs(Q_power) π_hat = stationary_dist(P_hat) ``` -The one-period risk-neutral matrix differs from the correctly specified matrix because -it includes one-period risk adjustments. +These two matrices should not be expected to agree. + +The row-normalized matrix $\bar{\mathbf{P}}$ is a short-horizon risk-neutral change of +measure: it folds the one-period SDF into transition probabilities, so it generally +differs from the correctly specified matrix $\mathbf{P}$. -The long-term risk-neutral transition matrix coincides with the correctly specified -transition matrix because, after the Perron eigenfunction is removed, no -likelihood-ratio term remains. +The logic comes from the recovery formula in {doc}`ross_recovery`. -Here is the likelihood-ratio term explicitly. +In the transition-independent case, the pricing kernel has the form +$s_{ij}=\exp(\hat\eta)\hat e_i/\hat e_j$. -Define +Substituting this into the Perron formula gives $$ -\hat h_{ij} -= \frac{\hat p_{ij}}{p_{ij}} -= \exp(-\hat\eta)s_{ij}\frac{\hat e_j}{\hat e_i}. 
+\hat p_{ij} += \exp(-\hat\eta) q_{ij}\frac{\hat e_j}{\hat e_i} += \exp(-\hat\eta) + \left(\exp(\hat\eta)\frac{\hat e_i}{\hat e_j}p_{ij}\right) + \frac{\hat e_j}{\hat e_i} +=p_{ij}. $$ -For a path of states, the product +Thus the Perron matrix $\hat{\mathbf{P}}$ cancels the transition-independent part of +the SDF. + +In this power-utility benchmark, the whole SDF has exactly that form, so the remaining +likelihood-ratio term should be one and $\hat{\mathbf{P}}$ should coincide with +$\mathbf{P}$. + +The next calculation checks this by comparing the Perron eigenfunction with +$c_i^\gamma$ and then computing the ratio $\hat{\mathbf{P}}/\mathbf{P}$. + +Define the diagnostic ratio $$ -\hat H_t -= \prod_{\tau=1}^t \hat h_{X_{\tau-1}, X_\tau} +\hat h_{ij} += \frac{\hat p_{ij}}{p_{ij}} += \exp(-\hat\eta)s_{ij}\frac{\hat e_j}{\hat e_i}. $$ -is the likelihood-ratio martingale that changes probabilities from the correctly -specified measure to the recovered measure. +When $\hat h_{ij}=1$ for every transition, the recovered matrix and the correctly +specified matrix are the same. + +The next section explains why this ratio is also called a likelihood-ratio increment. In the power-utility example, write @@ -299,19 +402,6 @@ $$ =1. $$ -```{code-cell} ipython3 -matrices = [ - ("correctly specified P", P_true), - ("one-period risk-neutral P_bar", P_bar), - ("long-term risk-neutral P_hat", P_hat), -] - -for label, mat in matrices: - print(label) - print(np.round(mat, 3)) - print() -``` - ```{code-cell} ipython3 H_power = np.divide(P_hat, P_true, out=np.ones_like(P_true), where=P_true > 0) e_theory = c_levels**γ_power @@ -331,8 +421,8 @@ print(f"\nmax |h_hat - 1| = " f"{np.max(np.abs(H_power[P_true > 0] - 1)):.2e}") ``` -The output illustrates the difference between short-horizon and long-horizon risk -adjustments. +The output separates a short-horizon risk adjustment from the Perron recovery +calculation. 
The one-period risk-neutral matrix $\bar{\mathbf{P}}$ is close to, but not the same as, the correctly specified matrix $\mathbf{P}$. @@ -344,13 +434,15 @@ By contrast, the long-term risk-neutral matrix $\hat{\mathbf{P}}$ is exactly the as $\mathbf{P}$ in this example. The diagnostic confirms why: the likelihood-ratio increment $\hat h_{ij}$ is one for -every transition, so the martingale $\hat H_t$ is identically one. +every transition. This is the condition under which Ross recovery returns the correctly specified -transition matrix: after the Perron eigenfunction removes the state-dependent part of -the SDF, no likelihood-ratio martingale remains. +transition matrix. + +In this example, that cancellation exhausts the SDF, so no additional probability +distortion remains. -## The martingale diagnostic +## The likelihood-ratio diagnostic Let $(\hat \eta, \hat e)$ be the positive Perron pair of $\mathbf{Q}$: @@ -365,13 +457,22 @@ $$ = \exp(-\hat\eta) q_{ij} \frac{\hat e_j}{\hat e_i}. $$ -Compare $\hat{\mathbf{P}}$ with the correctly specified transition matrix -$\mathbf{P}$ by defining +To see whether recovery has changed the probability law, compare each recovered +transition probability with the corresponding correctly specified transition +probability. + +For feasible transitions with $p_{ij}>0$, define the one-period likelihood-ratio +increment $$ \hat h_{ij} = \frac{\hat p_{ij}}{p_{ij}}. $$ +If $\hat h_{ij}>1$, the recovered law assigns more probability to transition $(i,j)$ +than the correctly specified law. + +If $\hat h_{ij}<1$, it assigns less probability to that transition. + For a fixed current state $i$, the numbers $\hat h_{ij}$ average to one under the correctly specified transition probabilities: @@ -381,8 +482,11 @@ $$ Thus $\hat h_{ij}$ is a one-period likelihood-ratio increment. -Multiplying these -increments over time gives a martingale. +Multiplying these increments along a history of states gives the likelihood ratio for +the whole history. 
+ +That likelihood-ratio process is a martingale, which is why the last term in the +decomposition below is called a martingale component. The one-period SDF can be written as @@ -408,9 +512,10 @@ transition matrix. ```{prf:proposition} Recovery diagnostic :label: prop-misspecified-recovery-diagnostic -For a finite-state Markov model with correctly specified transition matrix $\mathbf{P}$ and Arrow -matrix $\mathbf{Q}$, Perron--Frobenius recovery returns the correctly specified transition matrix -if and only if $\hat h_{ij}=1$ for every transition with $p_{ij}>0$. +Under the finite-state assumptions used in this lecture, for a Markov model with +correctly specified transition matrix $\mathbf{P}$ and Arrow matrix $\mathbf{Q}$, +Perron--Frobenius recovery returns the correctly specified transition matrix if and only +if $\hat h_{ij}=1$ for every transition with $p_{ij}>0$. Equivalently, recovery returns the correctly specified transition matrix if and only if the SDF has no nonconstant likelihood-ratio martingale: @@ -436,17 +541,46 @@ This condition is the same as saying that the SDF can be written in the displaye with no extra likelihood-ratio term. ``` -The power-utility calculation above illustrates the proposition: the likelihood-ratio increment $\hat h_{ij}$ is a constant one. +This finite-state diagnostic is a special case of the paper's general identification +result. + +If a pair $(S,P)$ explains asset prices and $H$ is any positive martingale, then the +same asset prices are also explained by the changed probability measure $P^H$ together +with the adjusted stochastic discount factor + +$$ +S_t^H = S_t\frac{H_0}{H_t}. +$$ + +Thus Arrow prices alone cannot usually distinguish a change in beliefs from a change in +the SDF. + +Ross recovery becomes an identification result only after imposing a restriction such +as + +$$ +S_t = \exp(-\delta t)\frac{m(X_t)}{m(X_0)}, +$$ + +which rules out a nontrivial martingale component. 
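This non-identification is easy to verify in the finite-state setting. Below, a likelihood-ratio increment with conditional mean one is built from an arbitrary positive vector (all numbers are hypothetical), and the changed pair $(S^H, P^H)$ reprices the same Arrow matrix exactly:

```python
import numpy as np

P = np.array([[0.7, 0.3],        # hypothetical correctly specified law
              [0.4, 0.6]])
S = np.array([[0.96, 0.90],      # hypothetical one-period SDF values
              [0.92, 0.97]])
Q = S * P                        # the observed Arrow prices

# Any positive vector g yields an increment with conditional mean one.
g = np.array([1.0, 2.0])
h = g[None, :] / (P @ g)[:, None]

P_H = P * h                      # changed probability law
S_H = S / h                      # adjusted stochastic discount factor

print(np.allclose(P_H.sum(axis=1), 1.0))  # P_H is a transition matrix
print(np.allclose(S_H * P_H, Q))          # same Arrow prices as (S, P)
```

Only a restriction that forces the increment to be constant, such as transition independence, removes this degree of freedom.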
+ +The power-utility calculation above illustrates the proposition. + +In that benchmark, the likelihood-ratio increment $\hat h_{ij}$ is a constant one. ## Recursive utility +We now use the diagnostic to see how recovery can fail. + The previous example worked because all risk adjustment in the SDF could be written as a ratio of a function of today's state to a function of tomorrow's state. -The Perron eigenfunction removes exactly that kind of term. +The Perron formula cancels exactly that kind of term. -Recursive utility usually adds something else: a continuation-value term that behaves -like a likelihood ratio. +Recursive utility usually adds something else. + +The extra object is a continuation-value term, and the key point is that it behaves like +the likelihood-ratio increment defined above. For the unit-EIS Epstein--Zin case in {cite:t}`BorovickaHansenScheinkman2016`, with $C_t=\exp(g_c t)c(X_t)$, write the translated continuation value as $V_t=g_c t+v(X_t)$, @@ -464,8 +598,19 @@ s_{ij} \frac{v_j^*}{\sum_k p_{ik}v_k^*}. $$ -The denominator is the conditional expectation of $v_j^*$ given current state $i$, so -the last fraction has conditional mean one under $\mathbf{P}$. +In this unit-EIS example, the Perron eigenfunction is $\hat e_j=c_j$ and +$\hat\eta=-(\delta+g_c)$. + +Applying the Perron formula therefore leaves + +$$ +\hat p_{ij} += p_{ij}\frac{v_j^*}{\sum_k p_{ik}v_k^*}. +$$ + +The denominator is the conditional expectation of $v_j^*$ given current state $i$. + +Therefore the last fraction has conditional mean one under $\mathbf{P}$. It is therefore a likelihood-ratio increment. @@ -512,7 +657,7 @@ def solve_ez_unit_eis(P, c, δ, γ, g_c, tol=1e-12, max_iter=10_000): return v, v_star, S ``` -At log utility, $v^*$ is constant and the martingale disappears. +At log utility, $v^*$ is constant and the likelihood-ratio term disappears. As risk aversion rises, continuation values matter more. 
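The conditional-mean-one property of the continuation-value increment does not depend on the particular fixed point: any positive vector standing in for $v^*$ produces a valid likelihood-ratio increment, as this check with hypothetical numbers shows.

```python
import numpy as np

P = np.array([[0.6, 0.3, 0.1],      # hypothetical transition matrix
              [0.2, 0.6, 0.2],
              [0.1, 0.3, 0.6]])
v_star = np.array([0.7, 1.0, 1.4])  # stand-in for the continuation-value term

# Increment: v*_j divided by its conditional expectation given state i.
h = v_star[None, :] / (P @ v_star)[:, None]

# Each row of P * h sums to one, so h is a likelihood-ratio increment.
print((P * h).sum(axis=1))
```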
@@ -562,7 +707,7 @@ rec_prob_gain = 100 * (rec_prob - π_true[0]) fig, axes = plt.subplots(1, 2, figsize=(12, 4.5)) bound = np.max(np.abs(H_dev)) -im = axes[0].imshow(H_dev, cmap='coolwarm', vmin=-bound, vmax=bound, aspect='auto') +im = axes[0].imshow(H_dev, cmap='Blues', vmin=-bound, vmax=bound) axes[0].set_xticks(range(3)) axes[0].set_yticks(range(3)) axes[0].set_xticklabels(state_names, rotation=20) @@ -589,6 +734,18 @@ plt.tight_layout() plt.show() ``` +It is clear that recursive utility tilts the recovered law toward worse future +states. + +At $\gamma=10$, transitions into recession receive more probability under the recovered +law, while transitions into expansion receive less. + +As risk aversion rises, this distortion becomes stronger and the stationary recession +probability under the recovered law moves further above its correctly specified value. + +Thus, as the continuation-value term creates a nonconstant $\hat h_{ij}$, the Perron +recovered matrix no longer equals the correctly specified transition matrix. + ## Permanent shocks Recursive utility is one way to generate a nonconstant likelihood ratio. @@ -614,14 +771,12 @@ $$ The middle term depends only on the current and next Markov states. -It is a ratio of -state functions, so the Perron eigenfunction can absorb it. +It is a ratio of state functions, so the Perron formula can cancel it. The permanent shock term depends on the new shock $\varepsilon_{t+1}$. -Because that -shock is not summarized by the finite Markov state in this calculation, it cannot be -removed by a state eigenfunction. +Because that shock is not summarized by the finite Markov state in this calculation, +there is no state function whose ratio can cancel it. After dividing by its conditional mean, the shock term becomes a likelihood-ratio increment: @@ -634,14 +789,27 @@ $$ Thus permanent consumption shocks can break belief recovery, even under ordinary power utility. 
+This statement is relative to the Markov state used in the recovery calculation. + +Enlarging the state or information structure to account for the shock can accommodate +it, but doing so creates the identification problem discussed in {ref}`mr_additional_state`. + ## Long-run risk -We now use the Bansal--Yaron long-run risk model, in the calibration reported by +We now move from small finite-state examples to a standard continuous-time +macro-finance model. + +The model is the Bansal--Yaron long-run risk model, using the calibration reported by {cite:t}`BorovickaHansenScheinkman2016`. The point is to see how different the recovered measure can look in a standard macro-finance model. +The calculation has the same structure as before. + +We first write the correctly specified state dynamics, then compute the probability law +implied by the Perron recovery calculation. + The state vector $X_t=(X_{1t},X_{2t})'$ follows $$ @@ -660,12 +828,23 @@ Here $X_1$ is predictable consumption growth and $X_2$ is stochastic volatility. The representative agent has Epstein--Zin utility with unit elasticity of intertemporal substitution. -The continuation value introduces a martingale $H^*$ into the SDF: +The continuation value introduces the continuous-time analogue of the likelihood-ratio +process above. + +We denote that process by $H^*$, and the SDF satisfies $$ d\log S_t = -\delta dt - d\log C_t + d\log H_t^*. $$ +Here $H^*$ is the continuation-value martingale entering the Epstein--Zin SDF. + +The Perron--Frobenius likelihood-ratio martingale $\hat H$ is obtained only after also +incorporating the Perron eigenfunction. + +In models with martingale components in consumption growth, $H^*$ and $\hat H$ need not +coincide. + The next cell sets the calibration. ```{code-cell} ipython3 @@ -686,23 +865,65 @@ lrr_params = dict( ) ``` -In this affine model, the continuation value and the Perron eigenfunction have a simple -exponential-affine form. 
+The next code block computes how the different probability measures change the drift of +the state vector. + +The first object is the continuation value. + +In this affine model, the translated continuation value is linear in the state: + +$$ +v(x) = v_0 + v_1 x_1 + v_2 x_2. +$$ + +This is why we call $v_1$ and $v_2$ slopes. + +They are the derivatives of the continuation value with respect to predictable growth +and volatility. + +These slopes enter the continuation-value martingale $H^*$. + +In the code, this martingale has shock exposure + +$$ +\alpha_{H^*} += (1-\gamma)(\alpha_c + \sigma_1 v_1 + \sigma_2 v_2). +$$ + +Since the SDF is $d\log S_t=-\delta dt-d\log C_t+d\log H_t^*$, its shock exposure is + +$$ +\alpha_S = -\alpha_c + \alpha_{H^*}. +$$ + +This vector $\alpha_S$ drives the one-period risk-neutral change of measure. + +The second object is the Perron eigenfunction. + +It is exponential-affine: + +$$ +\hat e(x) = \exp(e_0 + e_1 x_1 + e_2 x_2). +$$ + +Thus $e_1$ and $e_2$ are slopes of the log eigenfunction. -Think of the translated continuation value as having slopes $(v_1, v_2)$ with respect -to predictable growth and volatility. +Because $X_1$ and $X_2$ have shock loadings $\sigma_1$ and $\sigma_2$, the Perron +eigenfunction contributes the additional shock exposure -The Perron eigenfunction has analogous slopes $(e_1, e_2)$. +$$ +\sigma_1 e_1 + \sigma_2 e_2. +$$ -Once those slopes are known, changing probabilities is a drift adjustment: the -instantaneous risk-neutral measure uses the SDF shock exposure, while the long-term -risk-neutral measure also includes the Perron eigenfunction shock exposure. +Therefore the one-period risk-neutral dynamics use only $\alpha_S$, while the Perron +recovered dynamics use -The next functions compute these pieces in that order. +$$ +\alpha_S + \sigma_1 e_1 + \sigma_2 e_2. 
+$$ -This is why the code below is organized as value-function coefficients, Perron -coefficients, and then the drift of $X$ under the recovered and risk-neutral probability -measures. +The functions below follow this order: compute $(v_1, v_2)$, compute $\alpha_S$ and +$(e_1, e_2)$, and then translate these shock exposures into drifts for $X$. ```{code-cell} ipython3 def solve_value_function(p): @@ -713,9 +934,11 @@ def solve_value_function(p): β_c1, β_c2 = p["β_c1"], p["β_c2"] α_c = p["α_c"] + # v1 is the coefficient on predictable growth in v(x). v1 = β_c1 / (δ - μ11) - # The volatility slope solves a scalar quadratic. + # v2 is the coefficient on volatility. + # In the affine model it is the stable root of a scalar quadratic. A_vec = α_c + σ1 * v1 B_vec = σ2 @@ -740,15 +963,19 @@ def solve_pf_lrr(p, v1, v2): α_c = p["α_c"] β_c0, β_c1, β_c2 = p["β_c0"], p["β_c1"], p["β_c2"] + # Continuation-value martingale exposure and SDF exposure. α_h_star = (1 - γ) * (α_c + σ1 * v1 + σ2 * v2) α_s = -α_c + α_h_star + # Drift coefficients of log S before the Perron factorization. β_s11 = -β_c1 β_s12 = -β_c2 - 0.5 * np.dot(α_h_star, α_h_star) β_s0 = -δ - β_c0 - 0.5 * ι2 * np.dot(α_h_star, α_h_star) + # e1 and e2 are coefficients in log e(x) = e0 + e1 x1 + e2 x2. e1 = -β_s11 / μ11 + # e2 solves the remaining quadratic from the Perron eigenvalue equation. const = (β_s12 + 0.5 * np.dot(α_s, α_s) + e1 * (μ12 + np.dot(σ1, α_s)) + 0.5 * e1**2 * np.dot(σ1, σ1)) @@ -776,12 +1003,15 @@ def recovered_lrr_dynamics(p, e1, e2, α_s): ι1, ι2 = p["ι1"], p["ι2"] σ1, σ2 = p["σ1"], p["σ2"] + # The recovered measure uses the SDF exposure plus the Perron exposure. α_h = α_s + σ1 * e1 + σ2 * e2 + # A diffusion change of measure shifts each drift by sigma_i dot alpha_h. μ_hat_11 = μ11 μ_hat_12 = μ12 + np.dot(σ1, α_h) μ_hat_22 = μ22 + np.dot(σ2, α_h) + # Rewrite the shifted drift in mean-reversion form. 
ι_hat_2 = (μ22 / μ_hat_22) * ι2 ι_hat_1 = ι1 + (μ12 * ι2 - μ_hat_12 * ι_hat_2) / μ11 @@ -803,10 +1033,12 @@ def risk_neutral_lrr_dynamics(p, α_s): ι1, ι2 = p["ι1"], p["ι2"] σ1, σ2 = p["σ1"], p["σ2"] + # The one-period risk-neutral measure uses only the SDF exposure. μ_bar_11 = μ11 μ_bar_12 = μ12 + np.dot(σ1, α_s) μ_bar_22 = μ22 + np.dot(σ2, α_s) + # Rewrite the shifted drift in mean-reversion form. ι_bar_2 = (μ22 / μ_bar_22) * ι2 ι_bar_1 = ι1 + (μ12 * ι2 - μ_bar_12 * ι_bar_2) / μ11 @@ -880,57 +1112,40 @@ discount rate. ### State probabilities -Figure 1 in {cite:t}`BorovickaHansenScheinkman2016` is about forecasting after -treating the recovered measure as beliefs. - -It is the same message as the coefficient table above, but shown as a distribution -rather than as long-run means. - -The table said that the recovered measure lowers the long-run mean of predictable -growth $X_1$ and raises the long-run mean of volatility $X_2$. +The coefficient table gives one summary of the distortion created by recovery. -The figure shows the same distortion geometrically: probability mass moves down and to -the right. +A probability plot gives another. -The left panel uses the correctly specified probability measure $\mathbf{P}$. +It shows not only that the means of $X_1$ and $X_2$ move, but also which combinations of +growth and volatility become more likely. -The right panel uses the probability measure recovered from the Perron--Frobenius -calculation, $\hat{\mathbf{P}}$. +This matters because treating the recovered law as beliefs changes the whole forecast +distribution, not just a pair of long-run averages. -The main message is not just that the two densities differ. - -The recovered measure puts more probability on bad long-run-risk states. +Under the recovered law, probability mass shifts toward bad long-run-risk states. These are states with lower predictable growth $X_1$ and higher volatility $X_2$. -It also makes low growth and high volatility occur together more often. 
- -The dashed contour adds the instantaneous risk-neutral distribution. In this calibration, -the risk-neutral and recovered stationary distributions are close to each other and both -are far from the correctly specified distribution. - -This means that the martingale likelihood ratio is responsible for much of the risk -adjustment. +The dashed contour adds the one-period risk-neutral law. -The plot below is drawn in three steps. - -First, we simulate the state process under each set of drift parameters: the correctly -specified dynamics, the recovered long-term risk-neutral dynamics, and the instantaneous -risk-neutral dynamics. +In this calibration, the one-period risk-neutral and Perron recovered stationary +distributions are close to each other, and both are far from the correctly specified +distribution. -Second, after discarding an initial burn-in, we estimate the stationary joint density of -$(X_2, X_1)$ with a two-dimensional kernel density estimator. +Thus the likelihood-ratio component accounts for much of the risk adjustment in the +state dynamics. -Third, we draw density contours on the same axes. +The plot below simulates the state process under each probability law and estimates the +stationary joint density of $(X_2, X_1)$. The horizontal line marks $X_1=0$ and the vertical line marks the correctly specified mean of volatility, $X_2=\iota_2$. -The code uses the paper's calibration but keeps the simulation and KDE choices simple. - ```{code-cell} ipython3 def simulate_lrr(dyn, T=180_000, seed=123): - """Euler simulation of the LRR state process under one probability measure.""" + """ + Euler simulation of the LRR state process under one probability measure. + """ rng = np.random.default_rng(seed) X1 = np.zeros(T) X2 = np.full(T, dyn["ι2"]) @@ -1029,8 +1244,16 @@ the right of the vertical line means higher volatility. ### Yield implications -Figure 2 in {cite:t}`BorovickaHansenScheinkman2016` asks how the probability -difference affects yields. 
+The probability distortion matters for asset-pricing interpretation because yields mix +two objects: a payoff forecast and an asset price. + +The recovered measure is called long-term risk-neutral because it absorbs +the martingale component that prices long-horizon risk. + +For stochastically growing cash flows, long-term risk premia vanish when yields are +computed under this recovered measure. + +Under the correctly specified law, those same long-term risk premia need not vanish. For a cash flow $G_t$, the yield compares a forecast of the payoff with its asset price: @@ -1044,23 +1267,42 @@ The first term is a forecast of the cash flow. The second term is its price, written using the stochastic discount factor. -If an analyst treats $\hat{\mathbf{P}}$ as investors' beliefs, the forecast term changes -while the observed price is held fixed. +Arrow prices determine the second term. + +The question here is what happens to the first term if an analyst treats the recovered +law $\hat{\mathbf{P}}$ as investors' beliefs. -The left panel applies this comparison to a payoff equal to aggregate consumption at -maturity. +For an aggregate-consumption payoff, the answer is substantial. -The recovered measure treats adverse long-run-risk states as more likely. +The recovered law assigns more probability to low-growth, high-volatility states, so it +forecasts lower future consumption. -As a result, it removes much of the long-run risk compensation from aggregate -consumption cash flows. +Holding prices fixed, that lower forecast translates into lower consumption yields. -The resulting consumption yields are lower than the yields computed with the correctly -specified probability measure. +The zero-coupon bond is the comparison case. -The bond panel gives the comparison case: for a zero-coupon payoff, changing the payoff -forecast does not change the numerator. It isolates the maturity-matched discounting -against which the aggregate-consumption cash flow is compared. 
+Its payoff is one, so the forecast term is always $\log E[1]=0$. + +Changing beliefs therefore does not move the bond-yield panel. + +The same Perron object also appears in long-bond and forward-measure limits. + +The limiting one-period return on a very long bond is + +$$ +R^\infty_{t,t+1} += \exp(-\hat\eta)\frac{\hat e(X_{t+1})}{\hat e(X_t)}. +$$ + +The martingale increment satisfies + +$$ +\frac{\hat H_{t+1}}{\hat H_t} += \frac{S_{t+1}}{S_t} R^\infty_{t,t+1}. +$$ + +Thus the limiting one-period transition from forward measures coincides with the +Perron recovered transition. The calculation below uses the affine formulas implied by the long-run risk model. @@ -1231,6 +1473,7 @@ The bond panel is a check. Since $\log E[1]=0$ under any measure, the solid and dashed bond-yield bands coincide. +(mr_additional_state)= ## Additional state vector {cite:t}`BorovickaHansenScheinkman2016` then asks whether the recovery @@ -1307,7 +1550,7 @@ stationary model is being used to approximate stochastic growth. There is, however, a structured way forward. If the analyst supplies a reference multiplicative functional $Y^r$ that is known to -contain the martingale component of the SDF, then one can restrict the enlarged +have the same martingale component as the SDF, then one can restrict the enlarged eigenfunction to the form $$ @@ -1322,6 +1565,37 @@ With this extra structure, Arrow prices can again reveal subjective probabilitie But the key input is external: the long-run martingale component has been supplied by the analyst, not recovered from Arrow prices alone. +## Measuring the martingale component + +The paper also asks how large the martingale component is in asset-market data. + +This matters because a small martingale component would make the recovered law close to +beliefs, while a large one would make the recovered law mainly a long-term +risk-neutral object. + +One family of measures applies a convex function to the martingale increment +$\hat H_{t+1}/\hat H_t$. 
+ +For example, conditional relative entropy uses + +$$ +E\left[ + \frac{\hat H_{t+1}}{\hat H_t} + \log\frac{\hat H_{t+1}}{\hat H_t} + \mid X_t=x +\right]. +$$ + +This expression is zero only when the martingale increment is identically one. + +With incomplete asset-market data, the full martingale increment is not observed. + +The paper therefore uses pricing restrictions and long-bond return approximations to +derive lower bounds on such discrepancy measures. + +These bounds are a way to test whether the martingale component is economically small +without requiring a full set of Arrow prices. + ## Lessons The Perron--Frobenius calculation remains useful under misspecification, but it no diff --git a/lectures/ross_recovery.md b/lectures/ross_recovery.md index 8e9ca0e58..16589457f 100644 --- a/lectures/ross_recovery.md +++ b/lectures/ross_recovery.md @@ -53,11 +53,17 @@ about the preferences of a representative investor. {cite:t}`Ross2015` showed otherwise. -Under a structural restriction on the pricing kernel called **transition independence**, -together with no-arbitrage and irreducibility of an identified finite-state Markov -Arrow–Debreu state-price transition matrix, the natural probability transition matrix -and the transition pricing kernel can be uniquely recovered from state prices alone -with no historical return data and no assumed utility function. +Ross's theorem says that, in a finite-state Markov economy, state prices can be +enough. + +Suppose the Arrow–Debreu state-price transition matrix is arbitrage-free and +irreducible. + +If the pricing kernel also satisfies a structural restriction called **transition +independence**, then state prices uniquely determine both the natural probability +transition matrix and the transition pricing kernel. + +No historical return data or assumed utility function is needed. This is the **Recovery Theorem**. @@ -132,7 +138,12 @@ $$ = \frac{\beta U'(c(\theta_j))}{U'(c(\theta_i))}. 
$$ (eq:canon_ge) -The key structural property this implies is **transition independence**. +This formula has a special structure: the kernel can be written as a ratio of two +state-specific terms. + +Ross calls this property **transition independence**. + +We will say more about it soon. ### The identification challenge @@ -148,12 +159,10 @@ contributes $m(m-1)$ free entries (rows sum to one), and an arbitrary kernel $\phi$ contributes another $m^2$ -- a total of $2m^2 - m$ unknowns against only $m^2$ equations. -The system is under-identified by exactly $m^2 - m$ parameters, so some structural +The system is under-identified by $m^2 - m$ parameters, so some structural restriction on the kernel is needed to pin down $\phi$ and $f$ separately. -Transition independence below is one such restriction: it cuts $\phi$ from $m^2$ free -entries down to $m$ (a state function $h$ free up to scale plus a discount factor -$\beta$), closing the identification gap exactly. +The transition independence restriction does the job, as we will see in the next section. ### Transition independence @@ -179,10 +188,11 @@ intertemporally additive separable utility (where $h = U'$). In particular, this holds for {eq}`eq:canon_ge`. -Ross also notes that some Epstein--Zin specifications can produce transition-independent -kernels {cite}`Epstein_Zin1989`, although {doc}`misspecified_recovery` shows that -recursive utility with nontrivial continuation-value martingales need not satisfy the -Ross restriction. +Transition independence helps because it ties all $m^2$ entries of $\phi$ together: +once the $m$ state-specific values are known, the whole kernel is pinned down. + +It therefore cuts $\phi$ from $m^2$ free entries down to $m$, so the system becomes +exactly identified.
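A minimal numerical sketch of this exact identification, with made-up three-state primitives (none of these numbers come from the lecture): once state prices are generated by a transition-independent kernel, a single positive eigenvector pins down everything.

```python
import numpy as np

# Made-up primitives: natural transitions F, state function z, discount β
F = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3],
              [0.1, 0.3, 0.6]])
z = np.array([0.8, 1.0, 1.3])
β = 0.95

# Transition-independent kernel φ_ij = β z_i / z_j gives state prices
P = β * (z[:, None] / z[None, :]) * F

# z is a Perron eigenvector of P with eigenvalue β, since (P z)_i = β z_i
vals, vecs = np.linalg.eig(P)
k = np.argmax(vals.real)
β_rec = vals[k].real
z_rec = np.abs(vecs[:, k].real)
z_rec *= z[1] / z_rec[1]            # normalize the middle state

# The m numbers in z_rec pin down all m² kernel entries, so F is identified
F_rec = (z_rec[None, :] / z_rec[:, None]) * P / β_rec
print(np.isclose(β_rec, β), np.allclose(F_rec, F))   # True True
```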
Under transition independence, the state-price equation becomes @@ -225,7 +235,7 @@ In principle every eigenvalue-eigenvector pair of $P$ is a formal solution, but one with a strictly positive eigenvector is economically valid: $D_{ii} = 1/z_i$ must be positive (so $z_i > 0$), and $F$ must have nonnegative entries. -The theorem below guarantees that exactly one such pair exists. +The Perron–Frobenius theorem guarantees that exactly one such pair exists. ```{prf:theorem} Perron--Frobenius :label: thm-perron-frobenius @@ -241,7 +251,7 @@ If $A$ is a nonnegative irreducible matrix, then Other eigenvalues can have the same modulus when the matrix is imprimitive, but the strictly positive eigenvector is unique up to scale. -See Section 1.2.3 of {cite}`Sargent_Stachurski_2024` for details. +See Section 1.2.3 of {cite:t}`Sargent_Stachurski_2024` for details. See also the full statement in {doc}`intro:eigen_II`. @@ -267,6 +277,8 @@ It says the pricing kernel factors as $\beta h(\theta_j)/h(\theta_i)$, so the entire kernel is pinned down by a single vector $h$ (or equivalently $z$). +With these in mind, the Recovery Theorem follows from the Perron–Frobenius theorem. + ```{prf:theorem} Recovery Theorem :label: thm-ross-recovery @@ -324,15 +336,12 @@ where $h(\theta_i) = \beta/z_i$ follows from $D_{ii} = h(\theta_i)/\beta = 1/z_i Destination states with high $z_j$ have *low* kernel values: for a fixed origin $i$, the kernel $\beta z_i/z_j$ is decreasing in $z_j$. -When $h$ is interpreted as marginal utility and states are ordered by consumption or +When $h$ represents marginal utility and states are ordered by consumption or payoff, larger $z_j$ corresponds to lower marginal utility -- "good times" that require less insurance and so receive less pricing weight per unit of natural probability. -This monotonic interpretation is not guaranteed for an arbitrary ordering -of stock-market states. - -The same eigenvector argument also clarifies a useful limiting case. 
+The same eigenvector argument also yields a useful limiting case. If the one-period bond price is identical in every current state, then the vector of ones is already the @@ -372,7 +381,7 @@ on a finite grid of log payoff states $s_1, \ldots, s_m$. On this grid we choose three primitives: -1. a row-stochastic natural transition matrix $F$, +1. a row-stochastic irreducible natural transition matrix $F$, 2. a subjective discount factor $\beta = e^{-\rho T}$, and 3. a CRRA transition pricing kernel $\phi_{ij} = \beta e^{-\gamma(s_j-s_i)}$. @@ -385,12 +394,12 @@ $$ This means the Recovery Theorem assumptions hold by construction: $P$ is nonnegative, $F$ is a Markov transition matrix, and the kernel is transition independent with -$z_i \propto e^{\gamma s_i}$. This benchmark therefore provides a strict test of -whether the eigenvector recovery calculation returns the objects used to construct -prices. +$z_i \propto e^{\gamma s_i}$. To keep the example close to Ross's Section IV, we choose $F$ to have lognormal-shaped -rows. In the unbounded continuous model one would write +rows. + +In the unbounded continuous model one would write $$ \log(S_T/S_0) \sim \mathcal{N}\!\left((\mu - \tfrac{1}{2}\sigma^2)T, \sigma^2 T\right). @@ -409,8 +418,7 @@ transition matrix whose rows shift with the current state. Ross uses states from $-5$ to $+5$ standard deviations; we use the same range below. -The truncation is an essential part of the finite-state model, not a cosmetic -detail: it is what brings the example into the Perron--Frobenius setting. +The truncation is an essential part of the finite-state model: it is what brings the example into the Perron--Frobenius setting. In the unbounded continuous lognormal growth model, Ross shows that recovery is not unique. @@ -481,7 +489,9 @@ print(np.round(P.sum(axis=1), 4)) print(f"Middle-state risk-free rate: {-np.log(P[5].sum()):.4f}") ``` -The row sums are the model-implied one-period bond prices in each current state. 
They +The row sums are the model-implied one-period bond prices in each current state. + +They vary near the boundaries because the finite grid truncates and renormalizes the conditional transition probabilities. @@ -550,35 +560,20 @@ def recover_natural_distribution(P, tol=1e-10): return F, z, β_recovered, φ_relative ``` -There are two normalizations to keep separate. - -Ross's Table I reports the kernel shape -with the middle state normalized to one, which is $1/z_j$ under our normalization -$z_{\text{mid}}=1$. +The Perron vector also recovers the shape of the pricing kernel. -The actual one-period stochastic discount factor for a transition -from the middle state to state $j$ is $\beta/z_j$. +Ross's Table I reports this shape with the middle state normalized to one, which is +$1/z_j$ under our normalization $z_{\text{mid}}=1$. ```{code-cell} ipython3 F, z, β_rec, φ_relative = recover_natural_distribution(P) -ρ_rec = -np.log(β_rec) -φ_middle = β_rec * φ_relative -print(f"Recovered discount factor β = {β_rec:.6f} (true: {np.exp(-ρ):.6f})") -print(f"Recovered discount rate ρ = {ρ_rec:.6f} (true: {ρ:.6f})") print("Ross-normalized kernel 1/z (middle state = 1):") print(np.round(φ_relative, 4)) -print("Actual one-period kernel from the middle state β × (1/z):") -print(np.round(φ_middle, 4)) ``` -Because we know the data-generating natural transition matrix and pricing kernel -used to construct $P$, we can use them to verify that recovery works in this -simulation. - -In real data the natural transition matrix is unobserved, so these checks become -internal diagnostics combined with an assessment of the recovery assumptions. - +Because we know the data-generating natural transition matrix used to construct +$P$, we can verify that recovery works in this simulation. 
```{code-cell} ipython3 def true_lognormal_transition_matrix(states, μ, σ, T): @@ -599,15 +594,10 @@ def true_lognormal_transition_matrix(states, μ, σ, T): return F_true -mid = len(states) // 2 F_true = true_lognormal_transition_matrix(states, μ, σ, T) -φ_middle_true = np.exp(-ρ * T) * np.exp(-γ * (states - states[mid])) P_reconstructed = β_rec * (z[:, None] / z[None, :]) * F print("Recovery diagnostics") -print(f"max |β_rec - exp(-ρT)| = {abs(β_rec - np.exp(-ρ * T)):.2e}") -print(f"max |φ_middle - true kernel| = " - f"{np.max(np.abs(φ_middle - φ_middle_true)):.2e}") print(f"max |F - true F| = {np.max(np.abs(F - F_true)):.2e}") print(f"max |P - recovered kernel times F| = " f"{np.max(np.abs(P - P_reconstructed)):.2e}") @@ -732,7 +722,7 @@ plt.show() ``` Because the states are ordered from low to high payoff, the plots show the -single-crossing property from Theorem 3 of {cite}`Ross2015`: for returns below some +single-crossing property from Theorem 3 of {cite:t}`Ross2015`: for returns below some threshold $v$, risk-neutral probability exceeds natural probability; above $v$ the natural probability dominates. @@ -835,28 +825,53 @@ We will say more in {ref}`rt_ex3`. ## Testing efficient markets -{cite:t}`Ross2015` shows that once the pricing kernel is recovered, one obtains an *upper -bound on the Sharpe ratio* for strategies based on the stock-market filtration used in -recovery: +The recovered pricing kernel can also be used to test market efficiency. + +If a trading strategy has a very high Sharpe ratio, then some pricing kernel must be +volatile enough to price that payoff. + +The Hansen--Jagannathan bound {cite}`Hansen_Jagannathan_1991` says that, for any excess +return with mean $\mu_\text{excess}$ and standard deviation $\sigma_\text{asset}$, $$ \frac{|\mu_\text{excess}|}{\sigma_\text{asset}} \leq e^{rT}\, \sigma(M), $$ -where $\sigma(M)$ is the standard deviation of the actual one-period stochastic discount -factor projected on that filtration. 
Arbitrary orthogonal noise in a candidate kernel -does not tighten this market-efficiency bound. +where $M$ is the one-period stochastic discount factor. + +Ross's point is that recovery gives an estimate of the relevant volatility +$\sigma(M)$. + +Hence it gives an upper bound on the Sharpe ratio of any strategy based on the same +stock-market information used in recovery. -This follows from the Hansen–Jagannathan bound {cite}`Hansen_Jagannathan_1991`. +If such a strategy has a Sharpe ratio above the bound, then it is too profitable to be +consistent with efficiency, under the assumptions of the Recovery +Theorem. -Equivalently, under the Recovery Theorem assumptions, the $R^2$ of return-forecasting -regressions based on that information set is bounded above by the variance of the -pricing kernel: +The same logic gives a bound on return predictability. + +Suppose excess returns are decomposed as $$ -R^2 \leq e^{2rT} \, \mathrm{Var}(M). +x_{t+1} = \mu(I_t) + \epsilon_{t+1}, $$ +where $I_t$ is the stock-market information set and $\epsilon_{t+1}$ is unpredictable +from $I_t$. + +Then the $R^2$ of a forecasting regression based on $I_t$ is bounded above by the +variance of the recovered kernel: + +$$ +R^2 \leq e^{2rT} \, \sigma^2(M). +$$ + +Only the component of the kernel projected on this information set is relevant. + +Adding unrelated noise to a candidate pricing kernel would raise its variance, but it +would not justify stronger return predictability from stock-market information. + ## Limitations and extensions The Recovery Theorem is a remarkable theoretical result, but several caveats apply in @@ -879,6 +894,8 @@ If the kernel is not transition independent, recovery is not guaranteed. long-run risk component of the kernel with the natural probability distribution, yielding an incorrect decomposition. +We will discuss this later in a sequal to this lecture {doc}`misspecified_recovery`. 
+ *Empirical estimation:* Extracting reliable state prices from observed option prices requires careful @@ -905,18 +922,18 @@ P = \begin{pmatrix} \end{pmatrix}. $$ -(a) Compute the Perron eigenvalue $\beta$ and the corresponding eigenvector $z$ of +1. Compute the Perron eigenvalue $\beta$ and the corresponding eigenvector $z$ of $P$. -(b) Use $z$ to recover the natural probability transition matrix $F$ via +2. Use $z$ to recover the natural probability transition matrix $F$ via $$ f_{ij} = \frac{1}{\beta} \frac{z_j}{z_i} p_{ij}. $$ -(c) Verify that each row of $F$ sums to one and all entries are positive. +3. Verify that each row of $F$ sums to one and all entries are positive. -(d) For destination state $j$, the relative kernel component is $1/z_j$; for a +4. For destination state $j$, the relative kernel component is $1/z_j$; for a transition from state $i$ to state $j$, the full pricing kernel is $\beta z_i/z_j$. Compute $1/z_j$ for each state. @@ -978,13 +995,13 @@ print(f"Decreasing: {φ_relative_ex[0] > φ_relative_ex[1] > φ_relative_ex[2]}" Using the recovered $F$ and the normalised risk-neutral matrix $Q = P / \text{row sums}$ from the exercise above: -(a) Compute the one-step marginal distributions $f_j = F_{2,j}$ and $q_j = Q_{2,j}$ +1. Compute the one-step marginal distributions $f_j = F_{2,j}$ and $q_j = Q_{2,j}$ starting from state 2 (index 1 in Python). -(b) Compute the CDFs $\hat F_k = \sum_{j \leq k} f_j$ and +2. Compute the CDFs $\hat F_k = \sum_{j \leq k} f_j$ and $\hat Q_k = \sum_{j \leq k} q_j$ for each state. -(c) Verify numerically that $\hat F_k \leq \hat Q_k$ for every $k$, confirming stochastic +3. Verify numerically that $\hat F_k \leq \hat Q_k$ for every $k$, confirming stochastic dominance in this ordered three-state example. 
``` From 5949a12b2084f4f5fff458f824bc403be016ebc5 Mon Sep 17 00:00:00 2001 From: HumphreyYang Date: Mon, 27 Apr 2026 21:41:57 +0800 Subject: [PATCH 21/26] updates --- lectures/ross_recovery.md | 267 ++++++++++++++++++++++++++++++++++++-- 1 file changed, 254 insertions(+), 13 deletions(-) diff --git a/lectures/ross_recovery.md b/lectures/ross_recovery.md index 16589457f..b3ea3b008 100644 --- a/lectures/ross_recovery.md +++ b/lectures/ross_recovery.md @@ -77,11 +77,13 @@ It has several important implications: This lecture covers -* the Arrow–Debreu framework linking state prices, the pricing kernel, and natural - probabilities, +* the Arrow–Debreu framework linking state prices, risk-neutral probabilities, + the pricing kernel, and natural probabilities, * Ross's Recovery Theorem and its proof via the Perron–Frobenius theorem, -* an implementation that recovers the natural distribution from a simulated - state-price matrix, and +* an implementation that recovers the natural distribution from a simulated + state-price matrix, +* how option prices and forward equations can be used to estimate transition + state prices, and * comparisons between risk-neutral and recovered natural densities. Let's import the packages we'll need. @@ -117,6 +119,62 @@ $$ As in {doc}`ge_arrow`, the row sums give the state-dependent riskless discount factor: $\sum_j p(\theta_i, \theta_j) = e^{-r(\theta_i)}$. +Here $r(\theta_i)$ is the one-period continuously compounded riskless rate in +current state $\theta_i$. + +More generally, if an asset pays $g(\theta_j)$ next period, then its price in +state $\theta_i$ is + +$$ +p_g(\theta_i) + = \sum_j p(\theta_i, \theta_j) g(\theta_j). +$$ + +Let + +$$ +b(\theta_i) \equiv \sum_j p(\theta_i, \theta_j) = e^{-r(\theta_i)} +$$ + +be the price of a one-period riskless bond in state $\theta_i$. + +Normalizing Arrow prices by this bond price gives the **risk-neutral transition +probabilities** + +$$ +q^*(\theta_i, \theta_j) + = \frac{p(\theta_i, \theta_j)}{b(\theta_i)} + = e^{r(\theta_i)} p(\theta_i, \theta_j).
+$$ + +Thus the same asset price can be written as + +$$ +p_g(\theta_i) + = b(\theta_i) \sum_j q^*(\theta_i, \theta_j) g(\theta_j) + = e^{-r(\theta_i)} E_i^*[g(\theta_{t+1})]. +$$ + +Here $E_i^*$ denotes conditional expectation under +$q^*(\theta_i,\cdot)$. + +The asterisk marks the risk-neutral, or martingale, probability measure. + +It is useful to separate this one-period normalization from the dynamic +transition structure. + +If $Q(\theta_i,\theta_j,T)$ denotes the risk-neutral probability of moving from +$\theta_i$ to $\theta_j$ over $T$ periods, and $0 R_f. +$$ + ## Numerical example We now demonstrate the Recovery Theorem numerically. @@ -399,7 +511,43 @@ $z_i \propto e^{\gamma s_i}$. To keep the example close to Ross's Section IV, we choose $F$ to have lognormal-shaped rows. -In the unbounded continuous model one would write +The continuous benchmark is a lognormal payoff with CRRA utility: + +$$ +U(S_T) = \frac{S_T^{1-\gamma}}{1-\gamma}, +\qquad +S_T = S_0 + \exp\!\left((\mu-\tfrac{1}{2}\sigma^2)T + + \sigma \sqrt{T} \xi\right), +$$ + +where $\xi \sim N(0,1)$, $\mu$ is the expected growth-rate parameter, +$\sigma$ is volatility, $T$ is the horizon, $\gamma$ is the CRRA coefficient, +and $\rho$ is the continuously compounded subjective discount rate. + +The $T$-period pricing kernel is + +$$ +\phi_T + = e^{-\rho T}\left(\frac{S_T}{S_0}\right)^{-\gamma}. +$$ + +Equivalently, if $s=\log S_0$ and $s_T=\log S_T$, then the state-price density +with respect to the future log state $s_T$ is + +$$ +p_T(s,s_T) + = e^{-\rho T} e^{-\gamma(s_T-s)} + \frac{1}{\sigma \sqrt{T}} + n\!\left( + \frac{s_T-s-(\mu-\frac{1}{2}\sigma^2)T} + {\sigma \sqrt{T}} + \right), +$$ + +where $n$ is the standard normal density. + +Thus the natural log return satisfies $$ \log(S_T/S_0) \sim \mathcal{N}\!\left((\mu - \tfrac{1}{2}\sigma^2)T, \sigma^2 T\right). 
@@ -597,7 +745,7 @@ def true_lognormal_transition_matrix(states, μ, σ, T): F_true = true_lognormal_transition_matrix(states, μ, σ, T) P_reconstructed = β_rec * (z[:, None] / z[None, :]) * F -print("Recovery diagnostics") +print("Recovery numerical checks") print(f"max |F - true F| = {np.max(np.abs(F - F_true)):.2e}") print(f"max |P - recovered kernel times F| = " f"{np.max(np.abs(P - P_reconstructed)):.2e}") @@ -610,10 +758,11 @@ Indeed, the discrepancies are at the level of numerical roundoff. A key insight of {cite:t}`Ross2015` is that the natural distribution can differ systematically from the risk-neutral one. -In this CRRA example, where states are ordered from low to high payoff, Theorem 3 of -{cite:t}`Ross2015` implies that the natural marginal density **first-order -stochastically dominates** the risk-neutral density: the CDF of the natural distribution -lies *below* that of the risk-neutral distribution. +In this CRRA example, where states are ordered from low to high payoff, the +single-crossing argument in {ref}`ross-recovery-single-crossing` implies that +the natural marginal density **first-order stochastically dominates** the +risk-neutral density: the CDF of the natural distribution lies *below* that of +the risk-neutral distribution. Because the pricing kernel is declining (investors fear bad outcomes), risk-neutral probabilities overweight bad states and underweight good states relative to the natural @@ -722,9 +871,9 @@ plt.show() ``` Because the states are ordered from low to high payoff, the plots show the -single-crossing property from Theorem 3 of {cite:t}`Ross2015`: for returns below some -threshold $v$, risk-neutral probability exceeds natural probability; above $v$ the -natural probability dominates. +single-crossing property discussed in {ref}`ross-recovery-single-crossing`: for +returns below some threshold $v$, risk-neutral probability exceeds natural +probability; above $v$ the natural probability dominates. 
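A small self-contained check of the single-crossing and dominance claims, using an illustrative grid and natural pmf rather than the lecture's objects:

```python
import numpy as np

# Illustrative ordered log-payoff grid and natural one-step pmf
s = np.linspace(-0.3, 0.3, 7)
f = np.exp(-0.5 * ((s - 0.05) / 0.12) ** 2)
f /= f.sum()
γ = 4.0

# Risk-neutral pmf reweights by the declining CRRA kernel e^{-γ s_j}
q = f * np.exp(-γ * s)
q /= q.sum()

# q/f is proportional to e^{-γ s}, so it is strictly decreasing and
# q - f changes sign exactly once: the single-crossing property
crossings = np.sum(np.diff(np.sign(q - f)) != 0)
print(crossings)                                      # 1

# Single crossing implies the natural CDF lies below the risk-neutral CDF
print(np.all(np.cumsum(f) <= np.cumsum(q) + 1e-12))   # True
```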
A higher $\gamma$ amplifies this wedge. @@ -823,6 +972,66 @@ faster than the recovered natural crash probability. We will say more in {ref}`rt_ex3`. +## From option prices to transition prices + +The numerical example above starts from a known state-price transition matrix +$P$. + +Empirically, Ross starts one step earlier: option prices reveal state-price +densities at different maturities from the current state, and the transition +matrix must be inferred from those maturity-by-maturity state prices. + +Let $C(K,T)$ be the price of a call option with strike $K$ and maturity $T$. + +If $p(S,T)$ is the state-price density for terminal index level $S$, then + +$$ +C(K,T) + = \int_K^\infty (S-K) p(S,T) \, dS. +$$ + +Differentiating twice with respect to the strike gives the +{cite:t}`BreedenLitzenberger1978` formula + +$$ +p(K,T) = \frac{\partial^2 C(K,T)}{\partial K^2}. +$$ + +After discretizing strikes and maturities, let + +$$ +p_t(c) = \big(p_t(c,1), \ldots, p_t(c,m)\big) +$$ + +be the vector of state prices at horizon $t$ observed from today's state $c$. + +Here $c$ indexes the current state and $t$ counts discrete maturity steps. + +The first one-period vector $p_1(c)$ is the row of $P$ corresponding to the +current state. + +If the one-period state-price transition matrix $P$ is time homogeneous, these +vectors satisfy the forward recursion + +$$ +p_{t+1}(c) = p_t(c) P, +\qquad t=1,\ldots,m-1. +$$ + +Componentwise, + +$$ +p_{t+1}(c,j) = \sum_k p_t(c,k) p(k,j). +$$ + +Thus $m$ maturity vectors supply the $m^2$ equations needed to estimate the +$m^2$ transition prices $p(k,j)$. + +In practice this step is numerically delicate because the second derivative in +the option-price formula amplifies measurement error, and because additional +shape restrictions such as positivity or unimodality may be needed to obtain a +reasonable transition matrix. + ## Testing efficient markets The recovered pricing kernel can also be used to test market efficiency. 
@@ -837,7 +1046,8 @@ $$ \frac{|\mu_\text{excess}|}{\sigma_\text{asset}} \leq e^{rT}\, \sigma(M), $$ -where $M$ is the one-period stochastic discount factor. +where $M$ is the one-period stochastic discount factor and $r$ is the +continuously compounded riskless rate over horizon $T$. Ross's point is that recovery gives an estimate of the relevant volatility $\sigma(M)$. @@ -884,6 +1094,37 @@ The theorem requires a bounded, irreducible Markov chain. In continuous, unbounded state spaces (e.g., a lognormal diffusion), uniqueness fails because any exponential $e^{\alpha x}$ satisfies the characteristic equation. +To see the issue, consider the continuous lognormal growth state-price density +above. + +The natural continuous-space analogue of the Perron--Frobenius problem is + +$$ +\int p_T(s,y) v(y) \, dy = \lambda v(s). +$$ + +Here $y$ is a possible future log state, $v$ is a candidate positive +eigenfunction, and $\lambda$ is its eigenvalue. + +For every real $\alpha$, the exponential function $v_\alpha(s)=e^{\alpha s}$ +solves this equation with eigenvalue + +$$ +\lambda(\alpha) + = + \exp\!\left( + -\rho T + +(\alpha-\gamma)(\mu-\tfrac{1}{2}\sigma^2)T + +\tfrac{1}{2}\sigma^2T(\alpha-\gamma)^2 + \right). +$$ + +The positive eigenfunction is therefore not unique. + +This is why truncation or boundedness assumptions matter: they turn the +continuous operator problem back into a Perron--Frobenius problem with a +unique positive eigenvector. + {cite:t}`CarrYu2012` establish recovery with a bounded diffusion. *Transition independence:* @@ -894,7 +1135,7 @@ If the kernel is not transition independent, recovery is not guaranteed. long-run risk component of the kernel with the natural probability distribution, yielding an incorrect decomposition. -We will discuss this later in a sequal to this lecture {doc}`misspecified_recovery`. +We discuss this in the sequel lecture {doc}`misspecified_recovery`. 
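The continuum of positive eigenfunctions described above can be verified by direct numerical integration. In the sketch below (parameter values are again illustrative, not from the lecture), each exponential $v_\alpha(s)=e^{\alpha s}$ is checked against the eigenfunction equation with the displayed eigenvalue $\lambda(\alpha)$.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# Illustrative parameters (not from the lecture)
μ, σ, γ, ρ, T = 0.05, 0.2, 3.0, 0.02, 1.0
m, v = (μ - 0.5 * σ**2) * T, σ * np.sqrt(T)

def p_T(s, y):
    """Lognormal-growth state-price density over the future log state y."""
    return (np.exp(-ρ * T) * np.exp(-γ * (y - s))
            * norm.pdf((y - s - m) / v) / v)

def eigenvalue(α):
    """Closed-form eigenvalue λ(α) of the exponential eigenfunction e^{αs}."""
    return np.exp(-ρ * T + (α - γ) * m + 0.5 * v**2 * (α - γ)**2)

# ∫ p_T(s, y) e^{αy} dy should equal λ(α) e^{αs} for every real α
s = 0.3
for α in (-1.0, 0.0, 2.0, 3.0):
    lhs, _ = quad(lambda y: p_T(s, y) * np.exp(α * y), s - 3, s + 3)
    print(α, lhs, eigenvalue(α) * np.exp(α * s))
```

Every $\alpha$ passes the check, so the operator really does have a one-parameter family of positive eigenfunctions, which is exactly the non-uniqueness problem described above.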
*Empirical estimation:* From 1dc53c89ab58148600b452e164ff914cdbc57686 Mon Sep 17 00:00:00 2001 From: HumphreyYang Date: Mon, 27 Apr 2026 22:23:37 +0800 Subject: [PATCH 22/26] updates Co-authored-by: Copilot --- lectures/misspecified_recovery.md | 1154 +++++++++++++++++++++-------- 1 file changed, 845 insertions(+), 309 deletions(-) diff --git a/lectures/misspecified_recovery.md b/lectures/misspecified_recovery.md index d24b34ecb..f9288e250 100644 --- a/lectures/misspecified_recovery.md +++ b/lectures/misspecified_recovery.md @@ -33,100 +33,54 @@ The lecture {doc}`ross_recovery` studies the case in which recovery is valid. There, **transition independence** lets us use Arrow prices to separate investors' beliefs from the pricing kernel. -This lecture asks what the same Perron--Frobenius calculation delivers when that -restriction fails. +This lecture asks what the same Perron--Frobenius approach delivers when that +restriction is not imposed. -We will keep three probability laws separate. +We will keep three probability measures separate. -The first is the correctly specified transition law, which is the law that actually -governs the Markov state in the model. +The first is the correctly specified probability measure, which governs the Markov +state in the model. -In the paper, this can be interpreted as the actual law under rational expectations. +In the paper, this can be interpreted as the actual probability measure under rational +expectations. Interpreting it as investors' subjective beliefs requires additional assumptions. -The second is the one-period risk-neutral law, which comes from normalizing one-period -Arrow prices by bond prices. +The second is the one-period risk-neutral probability measure, which comes from +normalizing one-period Arrow prices by bond prices. -The third is the Perron, or recovered, law, which is the probability law produced by the -same eigenvector calculation used in Ross recovery. 
+The third is the probability measure recovered by Perron--Frobenius Theory, also called +the long-term risk-neutral measure. -The central question is whether the recovered law equals the correctly specified law. +The central question is whether the recovered probability measure equals the correctly +specified probability measure. {cite:t}`BorovickaHansenScheinkman2016` show that, in general, the answer is no. -A likelihood ratio is just a ratio of probabilities under two probability laws. +The paper studies the ratio of the recovered probability measure to the correctly +specified probability measure. -The reason is that the stochastic discount factor can contain a likelihood-ratio term +The reason is that the stochastic discount factor can contain a martingale component that changes the probability measure. -If that likelihood-ratio term is constant, Ross recovery returns the correctly specified -transition probabilities. +If that martingale component is identically one, Ross recovery returns the correctly +specified transition probabilities. -If it is not constant, the recovered law includes risk adjustments that matter for -long-horizon claims, because likelihood-ratio increments compound along histories. +If it is not identically one, the recovered probability measure absorbs long-term risk +adjustments, because martingale increments compound along histories. -In the examples below, this typically shifts probability toward adverse long-run-risk -states, so the recovered law looks more pessimistic than the correctly specified law. +In the examples below, the recovered probability measure assigns more probability to +adverse long-run-risk states than the correctly specified probability measure. 
We will: - use results from {doc}`ross_recovery` without re-proving it, -- diagnose misspecification through a likelihood-ratio term, -- show why recursive utility and permanent shocks break recovery, +- study misspecification through the martingale component, +- show why recursive utility and permanent shocks make the recovered probability + measure differ from the correctly specified probability measure, - measure the difference in a long-run risk model. -### The broader framework - -The paper's framework is more general than the finite-state matrices used first in this -lecture. - -It starts with a Markov state $X_t$ and, when needed, an auxiliary process $Y_t$ with -stationary increments. - -The auxiliary process lets the model record shocks or growth components that are not -fully summarized by $X_t$ alone. - -The basic objects are **multiplicative functionals**. - -A positive process $M_t$ is a multiplicative functional when its log increments depend -on the current state and the next shock. - -Stochastic discount factors, cash-flow growth processes, and likelihood-ratio -martingales are all treated this way. - -In the paper, a stochastic discount factor $S_t$ prices bounded claims by - -$$ -\Pi_{\tau,t}(\Phi_t) -= E\left[\frac{S_t}{S_\tau}\Phi_t \mid \mathcal F_\tau\right]. -$$ - -For a payoff $f(X_t)$, this defines a pricing operator - -$$ -[Q_t f](x) -= E[S_t f(X_t) \mid X_0=x]. -$$ - -The Perron--Frobenius problem is therefore an operator problem: - -$$ -[Q_t \hat e](x) = \exp(\hat\eta t)\hat e(x). -$$ - -The associated likelihood-ratio martingale is - -$$ -\frac{\hat H_t}{\hat H_0} -= \exp(-\hat\eta t) S_t - \frac{\hat e(X_t)}{\hat e(X_0)}. -$$ - -In a finite-state one-period model, $Q_t$ becomes a matrix power and $\hat e$ becomes a -positive eigenvector. - -That is the special case we use below to make the mechanics transparent. +We will use the following imports. 
```{code-cell} ipython3 import numpy as np @@ -138,8 +92,8 @@ from scipy.stats import gaussian_kde The next cell contains code inherited from the previous lecture. -It row-normalizes Arrow prices, finds the positive Perron eigenpair, and computes -stationary distributions. +It row-normalizes Arrow prices, finds the Perron--Frobenius eigenvalue and positive +right eigenvector, and computes stationary distributions. ```{code-cell} ipython3 :tags: [hide-input] @@ -152,7 +106,7 @@ def risk_neutral_probs(Q): def perron_frobenius(Q): - """Positive Perron pair and induced recovered transition matrix.""" + """Perron-Frobenius eigenpair and associated transition matrix.""" eigenvalues, eigenvectors = linalg.eig(Q) eigenvalues = np.real_if_close(eigenvalues, tol=1000) eigenvectors = np.real_if_close(eigenvectors, tol=1000) @@ -169,7 +123,7 @@ def perron_frobenius(Q): if exp_eta > 0 and np.all(e > 0): break else: - raise ValueError("No strictly positive Perron eigenvector found") + raise ValueError("No strictly positive Perron-Frobenius eigenvector found") e = e / e.sum() eta = np.log(exp_eta) @@ -194,7 +148,7 @@ def stationary_dist(P): def martingale_increment(Q, P): - """Likelihood-ratio increment from actual to recovered probabilities.""" + """Martingale increment for the recovered probability measure.""" eta, exp_eta, e, P_hat = perron_frobenius(Q) H = np.ones_like(P) mask = P > 0 @@ -207,14 +161,16 @@ def martingale_increment(Q, P): Let $\mathbf{P}=[p_{ij}]$ denote the correctly specified transition matrix and $\mathbf{Q}=[q_{ij}]$ the Arrow price matrix. -Here "correctly specified" means that $\mathbf{P}$ is the transition law that actually +Here "correctly specified" means that $\mathbf{P}$ is the transition matrix that governs the Markov state in the model. The one-period stochastic discount factor (SDF) satisfies -$$ +```{math} +:label: eq-mr-arrow-price-finite + q_{ij} = s_{ij} p_{ij}. 
-$$ +``` We will compare $\mathbf{P}$ with two probability matrices constructed from $\mathbf{Q}$. @@ -231,57 +187,71 @@ $$ This matrix absorbs one-period risk adjustments into transition probabilities. -The second one is the **Perron recovered matrix**. +The second one is the transition matrix associated with the **long-term risk-neutral +probability**. -It starts from the positive Perron eigenpair of $\mathbf{Q}$. +It starts from the Perron--Frobenius eigenvalue and positive right eigenvector of +$\mathbf{Q}$. Let $(\exp(\hat \eta), \hat e)$ solve -$$ +```{math} +:label: eq-mr-pf-finite + \mathbf{Q}\hat e = \exp(\hat \eta)\hat e. -$$ +``` Then define -$$ +```{math} +:label: eq-mr-phat-finite + \hat p_{ij} = \exp(-\hat \eta) q_{ij} \frac{\hat e_j}{\hat e_i}. -$$ +``` The factor $\hat e_j/\hat e_i$ is chosen to cancel any SDF component of the form $\exp(\hat \eta)\hat e_i/\hat e_j$. The result is a stochastic matrix $\hat{\mathbf{P}}$. -This construction assumes that the relevant Arrow-price matrix has a positive Perron -pair that is unique up to scale. +This construction assumes that $\mathbf{Q}$ has a unique positive right eigenvector up +to scale. -In the finite-state examples below this condition is satisfied, while in more general -state spaces the paper imposes additional stability and ergodicity conditions. +For a finite irreducible non-negative matrix, the Perron--Frobenius theorem guarantees +this uniqueness directly. -In particular, positive eigenfunctions need not be unique in continuous state spaces. +In general state spaces this guarantee does not carry over: multiple positive +eigenfunctions may exist, and an additional selection condition is needed to pin down +the long-term risk-neutral measure. -The paper's uniqueness result selects the Perron solution whose likelihood-ratio -martingale makes $X_t$ stationary and ergodic under the recovered probability measure. +The general framework in the next section makes that selection condition explicit. 
Following {cite:t}`BorovickaHansenScheinkman2016`, $\hat{\mathbf{P}}$ is called a **long-term risk-neutral** transition matrix. -The name means that the Perron eigenpair isolates the part of pricing that dominates -long-maturity Arrow claims. +The name means that the Perron--Frobenius eigenvalue and eigenvector isolate the part +of pricing that dominates long-maturity Arrow claims. -It is not the same object as the one-period risk-neutral matrix +It is not the same transition matrix as the one-period risk-neutral matrix $\bar{\mathbf{P}}$. -In {doc}`ross_recovery`, transition independence pins down the split between $s_{ij}$ -and $p_{ij}$. +In {doc}`ross_recovery`, transition independence restricts the SDF to + +$$ +s_{ij}=\exp(-\delta)\frac{m_j}{m_i} +$$ + +for a positive vector $m$ and scalar $\delta$, which pins down the split between +$s_{ij}$ and $p_{ij}$. -Here we drop transition independence. +Here we drop that restriction. -The question is whether the Perron recovered matrix $\hat{\mathbf{P}}$ still equals the -correctly specified matrix $\mathbf{P}$. +The question is whether the transition matrix associated with the long-term +risk-neutral probability, $\hat{\mathbf{P}}$, still equals the correctly specified +matrix $\mathbf{P}$. -### Where recovery works +### Degenerate Martingale Component We start with a three-state economy: recession, normal, and expansion. @@ -318,8 +288,8 @@ S_power = ( Q_power = S_power * P_true ``` -We now compute the one-period risk-neutral matrix and the Perron recovered matrix from -the same Arrow price matrix. +We now compute the one-period risk-neutral matrix and the transition matrix associated +with the long-term risk-neutral probability from the same Arrow price matrix. 
```{code-cell} ipython3 P_bar, q_bonds = risk_neutral_probs(Q_power) @@ -335,12 +305,12 @@ The row-normalized matrix $\bar{\mathbf{P}}$ is a short-horizon risk-neutral cha measure: it folds the one-period SDF into transition probabilities, so it generally differs from the correctly specified matrix $\mathbf{P}$. -The logic comes from the recovery formula in {doc}`ross_recovery`. +The logic comes from the Perron--Frobenius construction in {doc}`ross_recovery`. In the transition-independent case, the pricing kernel has the form $s_{ij}=\exp(\hat\eta)\hat e_i/\hat e_j$. -Substituting this into the Perron formula gives +Substituting this into the Perron--Frobenius transition formula gives $$ \hat p_{ij} @@ -351,17 +321,17 @@ $$ =p_{ij}. $$ -Thus the Perron matrix $\hat{\mathbf{P}}$ cancels the transition-independent part of -the SDF. +Thus the transition matrix $\hat{\mathbf{P}}$ associated with the long-term +risk-neutral probability cancels the transition-independent part of the SDF. In this power-utility benchmark, the whole SDF has exactly that form, so the remaining -likelihood-ratio term should be one and $\hat{\mathbf{P}}$ should coincide with +martingale increment should be one and $\hat{\mathbf{P}}$ should coincide with $\mathbf{P}$. -The next calculation checks this by comparing the Perron eigenfunction with +The next calculation checks this by comparing the Perron--Frobenius eigenfunction with $c_i^\gamma$ and then computing the ratio $\hat{\mathbf{P}}/\mathbf{P}$. -Define the diagnostic ratio +Define the one-period martingale increment $$ \hat h_{ij} @@ -369,10 +339,10 @@ $$ = \exp(-\hat\eta)s_{ij}\frac{\hat e_j}{\hat e_i}. $$ -When $\hat h_{ij}=1$ for every transition, the recovered matrix and the correctly -specified matrix are the same. +When $\hat h_{ij}=1$ for every transition, $\hat{\mathbf P}$ and $\mathbf P$ are the +same. -The next section explains why this ratio is also called a likelihood-ratio increment. 
+The next section explains why this ratio is the one-period martingale increment. In the power-utility example, write @@ -406,12 +376,12 @@ $$ H_power = np.divide(P_hat, P_true, out=np.ones_like(P_true), where=P_true > 0) e_theory = c_levels**γ_power -print("Perron eigenfunction: numerical vs c^gamma") +print("Perron-Frobenius eigenfunction: numerical vs c^gamma") for name, e_num, e_th in zip(state_names, e_hat / e_hat[1], e_theory / e_theory[1]): print(f"{name:9s}: {e_num:.6f} {e_th:.6f}") -print("\nlikelihood-ratio increment h_hat = P_hat / P") +print("\nmartingale increment h_hat = P_hat / P") print(np.round(H_power, 6)) print("\nconditional means under P") @@ -421,8 +391,8 @@ print(f"\nmax |h_hat - 1| = " f"{np.max(np.abs(H_power[P_true > 0] - 1)):.2e}") ``` -The output separates a short-horizon risk adjustment from the Perron recovery -calculation. +The output separates a short-horizon risk adjustment from the Perron--Frobenius +approach. The one-period risk-neutral matrix $\bar{\mathbf{P}}$ is close to, but not the same as, the correctly specified matrix $\mathbf{P}$. @@ -433,18 +403,19 @@ one-period risk adjustments. By contrast, the long-term risk-neutral matrix $\hat{\mathbf{P}}$ is exactly the same as $\mathbf{P}$ in this example. -The diagnostic confirms why: the likelihood-ratio increment $\hat h_{ij}$ is one for -every transition. +The calculation confirms why: the martingale increment $\hat h_{ij}$ is one for every +transition. This is the condition under which Ross recovery returns the correctly specified transition matrix. -In this example, that cancellation exhausts the SDF, so no additional probability -distortion remains. +In this example, that cancellation exhausts the SDF, so the martingale component is +degenerate. 
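For contrast, a minimal finite-state sketch with made-up numbers (an SDF deliberately chosen so that it is *not* transition independent) shows that the increment $\hat h_{ij}$ need not equal one, even though each row of $\hat h_{ij} p_{ij}$ still sums to one.

```python
import numpy as np
from scipy import linalg

# Made-up 2-state example: the SDF below is NOT transition independent
# because s_11 * s_22 ≠ s_12 * s_21, so it cannot equal exp(-δ) m_j / m_i
P = np.array([[0.8, 0.2],
              [0.3, 0.7]])
S = np.array([[0.97, 1.10],
              [0.95, 0.96]])
Q = S * P                        # Arrow prices q_ij = s_ij p_ij

# Perron-Frobenius eigenvalue and positive right eigenvector of Q
eigvals, eigvecs = linalg.eig(Q)
k = np.argmax(eigvals.real)
exp_η = eigvals.real[k]
e = np.abs(eigvecs[:, k].real)

# Long-term risk-neutral transition matrix and martingale increment
P_hat = Q * e[None, :] / e[:, None] / exp_η
h_hat = P_hat / P

print(np.round(h_hat, 4))        # not identically one
print((h_hat * P).sum(axis=1))   # each row sums to one
```

Here $\hat{\mathbf P}\neq\mathbf P$, previewing the misspecification studied in the rest of the lecture.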
-## The likelihood-ratio diagnostic +## Martingale Component -Let $(\hat \eta, \hat e)$ be the positive Perron pair of $\mathbf{Q}$: +Let $(\hat \eta, \hat e)$ be the Perron--Frobenius eigenvalue exponent and positive +right eigenvector of $\mathbf{Q}$: $$ \mathbf{Q} \hat e = \exp(\hat\eta) \hat e. @@ -457,19 +428,20 @@ $$ = \exp(-\hat\eta) q_{ij} \frac{\hat e_j}{\hat e_i}. $$ -To see whether recovery has changed the probability law, compare each recovered +To see whether recovery has changed the probability measure, compare each recovered transition probability with the corresponding correctly specified transition probability. -For feasible transitions with $p_{ij}>0$, define the one-period likelihood-ratio -increment +For feasible transitions with $p_{ij}>0$, define the one-period martingale increment + +```{math} +:label: eq-mr-hhat-finite -$$ \hat h_{ij} = \frac{\hat p_{ij}}{p_{ij}}. -$$ +``` -If $\hat h_{ij}>1$, the recovered law assigns more probability to transition $(i,j)$ -than the correctly specified law. +If $\hat h_{ij}>1$, the recovered probability measure assigns more probability to +transition $(i,j)$ than the correctly specified probability measure. If $\hat h_{ij}<1$, it assigns less probability to that transition. @@ -480,45 +452,52 @@ $$ \sum_j \hat h_{ij} p_{ij}=1. $$ -Thus $\hat h_{ij}$ is a one-period likelihood-ratio increment. +Thus $\hat h_{ij}$ is a one-period martingale increment. -Multiplying these increments along a history of states gives the likelihood ratio for -the whole history. +Multiplying these increments along a history of states gives the ratio of the recovered +probability measure to the correctly specified probability measure for the whole +history. -That likelihood-ratio process is a martingale, which is why the last term in the -decomposition below is called a martingale component. +That ratio process is a martingale, which is why the last term in +{eq}`eq-mr-finite-sdf-decomposition` is called a martingale component. 
-The one-period SDF can be written as +Using {eq}`eq-mr-arrow-price-finite`, {eq}`eq-mr-phat-finite`, and +{eq}`eq-mr-hhat-finite`, the one-period SDF can be written as + +```{math} +:label: eq-mr-finite-sdf-decomposition -$$ s_{ij} = \exp(\hat\eta) \frac{\hat e_i}{\hat e_j} \hat h_{ij}. -$$ +``` -The Perron calculation therefore separates the SDF into: +The Perron--Frobenius approach therefore separates the SDF into: | Part | Role | |---|---| | $\exp(\hat\eta)$ | deterministic long-run discounting | | $\hat e_i / \hat e_j$ | state-dependent long-run term | -| $\hat h_{ij}$ | likelihood ratio that changes probabilities | +| $\hat h_{ij}$ | martingale increment that changes probabilities | -If $\hat h_{ij}=1$ for every feasible transition, then the recovered transition matrix -and the correctly specified transition matrix are the same. +If $\hat h_{ij}=1$ for every feasible transition, then the transition matrix associated +with the recovered probability measure and the correctly specified transition matrix +are the same. This is the condition under which Ross recovery returns the correctly specified transition matrix. -```{prf:proposition} Recovery diagnostic -:label: prop-misspecified-recovery-diagnostic +```{prf:proposition} Finite-state martingale component +:label: prop-misspecified-recovery-martingale-component Under the finite-state assumptions used in this lecture, for a Markov model with correctly specified transition matrix $\mathbf{P}$ and Arrow matrix $\mathbf{Q}$, -Perron--Frobenius recovery returns the correctly specified transition matrix if and only -if $\hat h_{ij}=1$ for every transition with $p_{ij}>0$. +the probability measure recovered by Perron--Frobenius Theory returns the correctly +specified transition matrix if and only if $\hat h_{ij}=1$ for every transition with +$p_{ij}>0$. 
Equivalently, recovery returns the correctly specified transition matrix if and only if -the SDF has no nonconstant likelihood-ratio martingale: +the SDF in {eq}`eq-mr-finite-sdf-decomposition` has no nonconstant martingale +component: $$ s_{ij}=\exp(\hat\eta)\frac{\hat e_i}{\hat e_j}. @@ -537,11 +516,11 @@ $$ Thus $\hat{\mathbf{P}}=\mathbf{P}$ if and only if $\hat h_{ij}=1$ on every feasible transition. -This condition is the same as saying that the SDF can be written in the displayed form -with no extra likelihood-ratio term. +This condition is the same as saying that the SDF can be written as +{eq}`eq-mr-finite-sdf-decomposition` with no extra martingale increment. ``` -This finite-state diagnostic is a special case of the paper's general identification +This finite-state implication is a special case of the paper's general identification result. If a pair $(S,P)$ explains asset prices and $H$ is any positive martingale, then the @@ -564,23 +543,535 @@ $$ which rules out a nontrivial martingale component. -The power-utility calculation above illustrates the proposition. +The power-utility example above illustrates the proposition. + +In that benchmark, the martingale increment $\hat h_{ij}$ is identically one. + +## From matrices to the general framework + +The finite-state calculation has three objects: + +1. the correctly specified transition probabilities $p_{ij}$, +2. the SDF increments $s_{ij}$, +3. the Arrow prices $q_{ij}=s_{ij}p_{ij}$. + +It also has one diagnostic object: + +$$ +\hat h_{ij} += \frac{\hat p_{ij}}{p_{ij}} += \exp(-\hat\eta)s_{ij}\frac{\hat e_j}{\hat e_i}. +$$ + +The numbers $\hat h_{ij}$ are one-period martingale increments. + +They change the probability of a one-period transition from $p_{ij}$ to +$\hat p_{ij}=\hat h_{ij}p_{ij}$. + +The general framework in {cite:t}`BorovickaHansenScheinkman2016` does the same thing +without assuming that states are finite. 
+ +The transition matrix becomes a Markov probability measure, the Arrow price matrix +becomes a family of pricing operators, and the one-period ratios $\hat h_{ij}$ become +a positive martingale process. + +The point of this section is to build that dictionary. + +### Probability space and state + +Start with a probability space $(\Omega,\mathcal F,P)$. + +Here $P$ is the correctly specified probability measure. + +In the rational-expectations interpretation of the paper, this is the actual, or +original, probability measure governing the state. + +In the finite-state section, $P$ was represented by the transition matrix +$\mathbf P=[p_{ij}]$. -In that benchmark, the likelihood-ratio increment $\hat h_{ij}$ is a constant one. +The index set is either discrete time, $\mathbb T=\{0,1,2,\ldots\}$, or continuous +time, $\mathbb T=\mathbb R_+$. -## Recursive utility +The main state process is $X=\{X_t:t\in\mathbb T\}$, which is stationary and Markov +under $P$. -We now use the diagnostic to see how recovery can fail. +A second process $W=\{W_t:t\in\mathbb T\}$ records shocks that drive $X$ and other +economic quantities. -The previous example worked because all risk adjustment in the SDF could be written as -a ratio of a function of today's state to a function of tomorrow's state. +In discrete time, the shock increment between dates $t$ and $t+1$ is -The Perron formula cancels exactly that kind of term. +$$ +\Delta W_{t+1}=W_{t+1}-W_t. +$$ + +The known function $\phi_x$ maps today's state and the next shock increment into +tomorrow's state. + +The discrete-time state evolution is + +$$ +X_{t+1}=\phi_x(X_t,\Delta W_{t+1}). +$$ + +The conditional law generated by this equation is the general-state replacement for +row $i$ of the finite matrix $\mathbf P$. + +````{prf:assumption} Markov state and shock increments +:label: assumption-mr-markov-shocks + +The process $X$ is ergodic under $P$. 
+ +The conditional distribution of $\Delta W_{t+1}$ given $X_t$ is time invariant and +independent of past shock histories conditioned on $X_t$. +```` + +The filtration $\{\mathcal F_t\}$ is generated by the initial condition $X_0$ and by +the history of shocks through date $t$. + +### Information and $Y$ + +The Markov state and the information that reveals shocks need not coincide. + +The state $X_t$ is observed at date $t$. + +The next shock $\Delta W_{t+1}$ need not be directly observed from the pair +$(X_t,X_{t+1})$. + +If $(X_t,X_{t+1})$ does reveal $\Delta W_{t+1}$, then $X$ alone carries the relevant +shock information. + +If it does not, introduce an auxiliary process $Y=\{Y_t\}$ with stationary increments. + +The known function $\phi_y$ maps today's state and the next shock increment into the +increment of $Y$. + +The discrete-time evolution for the auxiliary increment is + +$$ +Y_{t+1}-Y_t=\phi_y(X_t,\Delta W_{t+1}). +$$ + +The pair $(X_{t+1},Y_{t+1}-Y_t)$ is then rich enough, together with $X_t$, to recover +the shock increment $\Delta W_{t+1}$. + +This device lets the model handle shocks or growth components that affect payoffs and +SDFs but are not fully summarized by the next Markov state alone. + +Write the enlarged process as $Z=(X,Y)$. + +The process $Z$ is Markov with a triangular structure: the conditional distribution of +$(X_{t+1},Y_{t+1}-Y_t)$ depends on the past only through $X_t$. + +This is why the next Perron--Frobenius problem can first be posed with eigenfunctions +of $X$ alone. + +The section {ref}`mr_additional_state` later returns to what changes when the +eigenfunction is allowed to depend on $Y$ as well. + +### Multiplicative functionals + +The general framework needs a way to describe objects that compound over time. + +This is the role of a positive multiplicative functional $M=\{M_t\}$. + +Its log increment is a function of today's state and the next shock increment: + +$$ +\log \frac{M_{t+1}}{M_t} += \kappa(X_t,\Delta W_{t+1}). 
+$$ + +Equivalently, + +$$ +\frac{M_{t+1}}{M_t} += \exp\{\kappa(X_t,\Delta W_{t+1})\}. +$$ + +Thus $M_t/M_0$ is a product of positive one-period increments. + +Under {prf:ref}`assumption-mr-markov-shocks`, the logarithm of $M$ has stationary +increments. + +Products and reciprocals of positive multiplicative functionals are again positive +multiplicative functionals. + +Exponential functions of linear combinations of the components of $Y$ are examples. + +Stochastic discount factors, stochastic growth factors, and positive multiplicative +martingales are all modeled this way. + +In the finite-state model, the SDF increment $s_{ij}$ is one example of a +multiplicative-functional increment. + +The ratio $h_{ij}$ that changes probabilities is another. + +### Stochastic discount factors and pricing operators + +A stochastic discount factor $S=\{S_t\}$ is a positive multiplicative functional with +$S_0=1$ and finite first moments conditional on $X_0$. + +Let $\Phi_t$ be a bounded payoff measurable with respect to the date-$t$ information. + +The date-$\tau$ price of $\Phi_t$ is + +$$ +\Pi_{\tau,t}(\Phi_t) += E\left[\frac{S_t}{S_\tau}\Phi_t\mid\mathcal F_\tau\right]. +$$ + +The ratio $S_t/S_\tau$ is the stochastic discount factor from date $t$ back to date +$\tau$. + +If the payoff is a bounded function $f(X_t)$ of the future Markov state, this pricing +formula defines a horizon-$t$ operator $Q_t$ by + +$$ +[Q_t f](x) += E[S_t f(X_t)\mid X_0=x]. +$$ + +This operator is the general-state analogue of multiplying a payoff vector by the +Arrow-price matrix $\mathbf Q$. + +To see the connection, suppose again that the state space is finite and $t=1$. + +Then + +$$ +[Q_1 f]_i += \sum_j s_{ij}p_{ij}f_j += \sum_j q_{ij}f_j. +$$ + +Thus $Q_1$ is exactly the matrix $\mathbf Q$. + +In discrete time, the multiplicative property of $S$ implies that $Q_t$ is obtained +by applying the one-period operator $Q_1$ repeatedly. 
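In the finite-state case this multiplicative property is easy to verify directly. The sketch below (with an arbitrary made-up SDF and transition matrix) prices a two-period payoff both by summing over paths and by applying the one-period operator twice.

```python
import numpy as np

# Made-up 3-state example
rng = np.random.default_rng(0)
P = rng.random((3, 3))
P /= P.sum(axis=1, keepdims=True)   # transition probabilities p_ij
S = 0.9 + 0.2 * rng.random((3, 3))  # one-period SDF increments s_ij
Q = S * P                           # Arrow prices q_ij = s_ij p_ij
f = np.array([1.0, 2.0, 0.5])       # payoff f(X_2)

# Two-period price by summing over paths i -> j -> k
direct = np.einsum('ij,jk,k->i', Q, Q, f)

# ...equals applying the one-period operator twice: Q_2 = Q_1 Q_1
print(direct, np.linalg.matrix_power(Q, 2) @ f)
```

The two computations coincide, which is the matrix statement that $\{Q_t\}$ is generated by repeated application of $Q_1$.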
+ +In continuous time, the family $\{Q_t:t\geq0\}$ is a semigroup of pricing operators. + +### Martingales and equivalent probability measures + +Different stochastic discount factor / probability pairs can produce the same pricing +operators, and that flexibility is the source of the identification problem. + +Let $H=\{H_t\}$ be a strictly positive martingale with $E[H_0]=1$ under $P$. + +For an event $A$ observable by date $\tau$, the changed probability measure $P^H$ is +defined by + +$$ +P^H(A)=E[1_A H_\tau]. +$$ -Recursive utility usually adds something else. +The law of iterated expectations makes this definition consistent across dates. -The extra object is a continuation-value term, and the key point is that it behaves like -the likelihood-ratio increment defined above. +When $H$ is also a multiplicative functional, making it a multiplicative martingale, +the change of probability preserves the Markov structure of $Z$. + +The SDF that represents the same prices under $P^H$ is + +$$ +S_t^H=S_t\frac{H_0}{H_t}. +$$ + +Thus the same pricing operators can be represented by the pair $(S,P)$ or by the +pair $(S^H,P^H)$. + +In the finite-state model, this is just + +$$ +p^H_{ij}=h_{ij}p_{ij}, +\qquad +s^H_{ij}=\frac{s_{ij}}{h_{ij}}, +$$ + +so that + +$$ +s^H_{ij}p^H_{ij}=s_{ij}p_{ij}=q_{ij}. +$$ + +The same Arrow prices can therefore be explained by changing the probability measure +and offsetting that change in the SDF. + +This is also the precise sense in which Arrow prices alone do not identify beliefs. + +### What Perron--Frobenius recovers + +Now return to the Perron--Frobenius step. + +The finite-state equation was {eq}`eq-mr-pf-finite`. + +The general-state replacement is an eigenfunction problem for the pricing operators: +find a scalar $\hat\eta$ and a positive function $\hat e$ such that, for every horizon +$t$, + +```{math} +:label: eq-mr-pf-general + +[Q_t\hat e](x) +=\exp(\hat\eta t)\hat e(x). 
+``` + +The positive function $\hat e$ is the general-state counterpart of the +Perron--Frobenius eigenvector. + +The scalar $\hat\eta$ is the log eigenvalue. + +In finite states, $\hat e$ is just a positive vector with one entry for each state. + +In general state spaces, $\hat e(x)$ is a positive function of the current state. + +Its job is to record the state-dependent part of long-horizon valuation. + +Equation {eq}`eq-mr-pf-general` says that a future payoff equal to $\hat e(X_t)$ has +date-0 price $\exp(\hat\eta t)\hat e(X_0)$. + +Thus $\hat\eta$ gives the common growth or discount rate, while $\hat e$ gives the +state-dependent scaling. + +Since eigenfunctions are defined only up to scale, uniqueness always means uniqueness +up to multiplication by a positive constant. + +The eigenfunction equation implies the conditional moment restriction + +$$ +E[S_t\hat e(X_t)\mid\mathcal F_\tau] +=\exp((t-\tau)\hat\eta)S_\tau\hat e(X_\tau), +\qquad t\geq \tau. +$$ + +Use this restriction to define + +```{math} +:label: eq-mr-hhat-process + +\frac{\hat H_t}{\hat H_0} +=\exp(-\hat\eta t)S_t + \frac{\hat e(X_t)}{\hat e(X_0)}. +``` + +This process is a martingale because, for $t\geq \tau$, + +$$ +\begin{aligned} +E\left[\frac{\hat H_t}{\hat H_0}\mid\mathcal F_\tau\right] +&= +\frac{\exp(-\hat\eta t)}{\hat e(X_0)} +E[S_t\hat e(X_t)\mid\mathcal F_\tau] \\ +&= +\frac{\exp(-\hat\eta t)}{\hat e(X_0)} +\exp((t-\tau)\hat\eta)S_\tau\hat e(X_\tau) \\ +&= +\exp(-\hat\eta \tau)S_\tau +\frac{\hat e(X_\tau)}{\hat e(X_0)} +=\frac{\hat H_\tau}{\hat H_0}. +\end{aligned} +$$ + +The process is positive because $S$ and $\hat e$ are positive. + +Its one-period increment is + +```{math} +:label: eq-mr-hhat-increment-general + +\frac{\hat H_{t+1}}{\hat H_t} += \exp(-\hat\eta)\frac{S_{t+1}}{S_t} + \frac{\hat e(X_{t+1})}{\hat e(X_t)}. 
+``` + +In the finite-state model, when $X_t=i$ and $X_{t+1}=j$, +{eq}`eq-mr-hhat-increment-general` becomes + +$$ +\frac{\hat H_{t+1}}{\hat H_t} += \exp(-\hat\eta)s_{ij}\frac{\hat e_j}{\hat e_i} += \hat h_{ij} += \frac{\hat p_{ij}}{p_{ij}}. +$$ + +This is the same three-component decomposition as +{eq}`eq-mr-finite-sdf-decomposition`. + +In finite states, {eq}`eq-mr-finite-sdf-decomposition` is equivalent to + +$$ +\frac{S_{t+1}}{S_t} += \exp(\hat\eta) + \frac{\hat e(X_t)}{\hat e(X_{t+1})} + \frac{\hat H_{t+1}}{\hat H_t}. +$$ + +The only change is notation: $\hat h_{ij}$ is the one-period density ratio in a finite +Markov chain, while $\hat H_{t+1}/\hat H_t$ is the corresponding one-period density +ratio in the general Markov setting. + +Therefore $\hat H_t/\hat H_0$ is the density of the recovered probability measure +relative to the correctly specified probability measure through date $t$. + +If $\hat H_{t+1}/\hat H_t$ is not identically one, the recovered probability measure differs +from the correctly specified probability measure. + +We show the restriction that rules out this difference in the next section. + +### Selection and recovery + +In finite irreducible matrix problems, Perron--Frobenius theory gives a unique positive +eigenvector up to scale, so the recovered transition matrix is pinned down by +$\mathbf Q$. + +In general state spaces, multiple positive eigenfunctions may solve the same pricing +operator problem. + +The paper therefore imposes a selection condition on the probability measure induced by +the candidate eigenfunction. + +````{prf:assumption} Ergodicity of the recovered measure +:label: assumption-mr-ergodicity + +The process $X$ is stationary and ergodic under $P^{\hat H}$, the probability measure +induced by the multiplicative martingale $\hat H$ defined in the previous section. 
+```` + +````{prf:proposition} Uniqueness of the Perron--Frobenius solution +:label: prop-mr-uniqueness + +There is at most one solution $(\hat e, \hat\eta)$ to the Perron--Frobenius problem +such that $X$ is stationary and ergodic under the induced probability measure +$P^{\hat H}$. +```` + +This selected solution, when it exists, identifies the long-term risk-neutral measure. + +It does not by itself identify subjective beliefs. + +To make recovery identify beliefs, an additional restriction is needed on the SDF. + +The restriction used by Ross recovery is the paper's Condition 4: + +````{prf:assumption} +:label: assumption-mr-condition-4 + +Let + +$$ +S_t=\exp(-\delta t)\frac{m(X_t)}{m(X_0)} +$$ + +for some positive function $m$ and real number $\delta$. +```` + +This is the transition-independence restriction from {doc}`ross_recovery`. + +Under this restriction, setting $\hat e=1/m$ and $\hat\eta=-\delta$ gives + +$$ +\frac{\hat H_t}{\hat H_0} += \exp(\delta t) + \left[\exp(-\delta t)\frac{m(X_t)}{m(X_0)}\right] + \frac{1/m(X_t)}{1/m(X_0)} +=1. +$$ + +Thus the martingale component is identically one after normalization. + +This is the general-state version of the finite-state condition +$\hat h_{ij}=1$ for every feasible transition. + +If this martingale is not identically one, the recovered probability measure absorbs it +and generally differs from the correctly specified probability measure. + +We will see a few important examples of this in the next section. + +The section {ref}`mr_additional_state` returns to what happens when the eigenfunction +is allowed to depend on the auxiliary process $Y$. + +### Continuous-time version + +Let's briefly introduce the model in continuous time before discussing examples where recovery fails. + +We introduce the diffusion notation because the long-run risk example below is +written in continuous time. 
+ +The objects are the same as before: + +- $X$ is the Markov state, +- $Y$ records additional growing or shock-revealing components, +- $M$ is a positive multiplicative functional, such as an SDF, a cash-flow growth + process, or a martingale used to change probabilities. + +In the continuous-time version, $W$ is a Brownian motion. + +The state, auxiliary process, and multiplicative functional satisfy + +$$ +\begin{aligned} +dX_t &= \mu_x(X_t)dt+\sigma_x(X_t)dW_t,\\ +dY_t &= \mu_y(X_t)dt+\sigma_y(X_t)dW_t,\\ +d\log M_t &= \beta(X_t)dt+\alpha(X_t)\cdot dW_t. +\end{aligned} +$$ + +Here $\mu_x$ and $\mu_y$ are drift functions, while $\sigma_x$ and $\sigma_y$ are +shock-exposure matrices. + +The function $\beta$ is the drift of $\log M$, and $\alpha$ is the Brownian shock +exposure of $\log M$. + +The invertibility assumption on the stacked shock-exposure matrix is the continuous-time +counterpart of the discrete-time condition that $(X_{t+1},Y_{t+1}-Y_t)$ reveals the +shock increment. + +It lets the history of $Z=(X,Y)$ reveal the Brownian information. + +For $M$ to be a local martingale, its drift must satisfy + +$$ +\beta(x)=-\frac{1}{2}\alpha(x)\cdot\alpha(x). +$$ + +This follows from Ito's formula: + +$$ +\frac{dM_t}{M_t} += \left(\beta(X_t)+\frac{1}{2}\alpha(X_t)\cdot\alpha(X_t)\right)dt + + \alpha(X_t)\cdot dW_t. +$$ + +A local martingale has zero drift in $dM_t/M_t$, which gives the displayed restriction. + +Additional integrability conditions then ensure that this local martingale is a true +martingale. + +Under the probability change induced by such a martingale, Brownian drifts shift. + +This is the continuous-time analogue of replacing $p_{ij}$ by $h_{ij}p_{ij}$ in the +finite-state model. + +The Markov and triangular structure of $Z$ is preserved, which is why the same +Perron--Frobenius decomposition can be applied. 
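As a quick sanity check on this drift restriction, the next cell runs a Monte Carlo with a constant exposure $\alpha$, a simplification for the sketch since in the model $\alpha$ varies with $X_t$. Setting $\beta=-\tfrac12\alpha\cdot\alpha$ keeps the simulated mean of $M_T$ close to $M_0=1$.

```{code-cell} ipython3
import numpy as np

rng = np.random.default_rng(0)

# Constant shock exposure for the sketch; in the model α depends on X_t
α, T, n = 0.3, 1.0, 200_000
β = -0.5 * α**2                 # the local-martingale drift restriction

# With constant α, the SDE for log M integrates exactly:
# log M_T = β T + α W_T, where W_T ~ N(0, T)
W_T = rng.normal(0.0, np.sqrt(T), size=n)
M_T = np.exp(β * T + α * W_T)

# Zero drift in dM_t / M_t, so E[M_T] should be close to M_0 = 1
print(M_T.mean())
```

Dropping the $-\tfrac12\alpha\cdot\alpha$ term from $\beta$ would push the sample mean visibly above one.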


## When the recovery fails

Now let's discuss a few examples where the recovered probability measure differs from the correctly specified probability measure.

### Recursive utility

The martingale component is the diagnostic: whenever it is not identically one, the
recovered and correctly specified probability measures differ.

In the previous example, all risk adjustment in the SDF could be written as a ratio of
a function of today's state to a function of tomorrow's state.

The Perron--Frobenius transition formula cancels exactly that kind of term.

Recursive utility adds a continuation-value term.

The key point is that this term behaves like the martingale increment defined above.

For the unit-EIS Epstein--Zin case in {cite:t}`BorovickaHansenScheinkman2016`, with
$C_t=\exp(g_c t)c(X_t)$, write the translated continuation value as $V_t=g_c t+v(X_t)$,
@@ -598,10 +1089,10 @@ s_{ij}
\frac{v_j^*}{\sum_k p_{ik}v_k^*}.
$$

-In this unit-EIS example, the Perron eigenfunction is $\hat e_j=c_j$ and
+In this unit-EIS example, the Perron--Frobenius eigenfunction is $\hat e_j=c_j$ and
$\hat\eta=-(\delta+g_c)$.

-Applying the Perron formula therefore leaves
+Applying the Perron--Frobenius transition formula therefore leaves

$$
\hat p_{ij}
@@ -612,11 +1103,12 @@ The denominator is the conditional expectation of $v_j^*$ given current state $i

Therefore the last fraction has conditional mean one under $\mathbf{P}$.

-It is therefore a likelihood-ratio increment.
+It is therefore a martingale increment.

-When $v^*$ is not constant, that likelihood ratio varies across next-period states.
+When $v^*$ is not constant, that ratio varies across next-period states.

-That variation is why recovery no longer returns the correct transition matrix.
+That variation is why the probability measure recovered by Perron--Frobenius theory no
+longer gives the correctly specified transition matrix.

The next cell solves the finite-state continuation-value equation and builds the SDF.
@@ -657,11 +1149,11 @@ def solve_ez_unit_eis(P, c, δ, γ, g_c, tol=1e-12, max_iter=10_000): return v, v_star, S ``` -At log utility, $v^*$ is constant and the likelihood-ratio term disappears. +At log utility, $v^*$ is constant and the martingale increment is one. As risk aversion rises, continuation values matter more. -The recovered probability measure then moves farther away from the correctly specified +The recovered probability measure then differs more from the correctly specified probability measure. To make the mechanism visible in a small three-state example, the figure below uses @@ -671,11 +1163,11 @@ $$ c=(0.85, 1.00, 1.15). $$ -The heatmap reports percentage deviations of the likelihood-ratio increment from one: +The heatmap reports percentage deviations of the martingale increment from one: $100(\hat h_{ij}-1)$. Positive entries are transitions that receive more probability under the recovered -measure than under the correctly specified probability measure. +probability measure than under the correctly specified probability measure. The right panel reports the increase in the recovered recession probability, measured in percentage points. @@ -684,7 +1176,7 @@ in percentage points. --- mystnb: figure: - caption: Recursive utility generates a nonconstant likelihood-ratio increment that distorts recovery. + caption: Recursive utility generates a nonconstant martingale increment. 
name: fig-mr-recursive-martingale --- c_recursive = np.array([0.85, 1.00, 1.15]) @@ -714,7 +1206,7 @@ axes[0].set_xticklabels(state_names, rotation=20) axes[0].set_yticklabels(state_names) axes[0].set_xlabel('next state') axes[0].set_ylabel(r'current state') -axes[0].set_title(r'likelihood-ratio distortion, $\gamma=10$') +axes[0].set_title(r'martingale increment, $\gamma=10$') for i in range(3): for j in range(3): @@ -727,32 +1219,33 @@ axes[1].plot(γ_grid, rec_prob_gain, lw=2.5) axes[1].axhline(0, ls='--', lw=1.5, color='0.5') axes[1].set_xlabel(r"risk aversion $\gamma$") axes[1].set_ylabel('increase in recession probability\n(percentage points)') -axes[1].set_title('recession probability distortion') +axes[1].set_title('recovered recession probability') axes[1].set_ylim(0, rec_prob_gain.max() * 1.08) plt.tight_layout() plt.show() ``` -It is clear that recursive utility tilts the recovered law toward worse future -states. +Recursive utility makes the recovered probability measure assign more probability to +recession transitions. At $\gamma=10$, transitions into recession receive more probability under the recovered -law, while transitions into expansion receive less. +probability measure, while transitions into expansion receive less. -As risk aversion rises, this distortion becomes stronger and the stationary recession -probability under the recovered law moves further above its correctly specified value. +As risk aversion rises, the stationary recession probability under the recovered +probability measure moves further above its correctly specified value. -Thus, as the continuation-value term creates a nonconstant $\hat h_{ij}$, the Perron -recovered matrix no longer equals the correctly specified transition matrix. +Thus, as the continuation-value term creates a nonconstant $\hat h_{ij}$, the transition +matrix associated with the long-term risk-neutral probability no longer equals the +correctly specified transition matrix. 
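The distortion mechanism depends on $v^*$ only through the ratio $v_j^*/\sum_k p_{ik}v_k^*$. The following self-contained check uses an arbitrary positive vector as a stand-in for $v^*$ (illustrative numbers, not the solved continuation value) to confirm that this ratio has conditional mean one under $\mathbf P$, so $\hat h_{ij}p_{ij}$ is again a transition matrix, and that it collapses to one when $v^*$ is constant.

```{code-cell} ipython3
import numpy as np

# Illustrative chain; not the calibration used above
P = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.5, 0.2],
              [0.2, 0.3, 0.5]])

# Arbitrary positive vector standing in for the transformed continuation value v*
v_star = np.array([0.8, 1.0, 1.3])

# h_ij = v*_j / E_i[v*]: the continuation-value martingale increment
h = v_star / (P @ v_star)[:, None]

# Conditional mean one under P, so h * P is again a transition matrix
print((P * h).sum(axis=1))

# With a constant v*, the increment is identically one
h_const = np.ones(3) / (P @ np.ones(3))[:, None]
print(h_const)
```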
-## Permanent shocks
+### Permanent shocks

-Recursive utility is one way to generate a nonconstant likelihood ratio.
+Recursive utility gives one nonconstant martingale component.

Permanent shocks provide another.

-Suppose consumption has a permanent multiplicative shock,
+Suppose consumption has a permanent shock,

$$
\log C_{t+1}-\log C_t
@@ -771,30 +1264,31 @@ $$

The middle term depends only on the current and next Markov states.

-It is a ratio of state functions, so the Perron formula can cancel it.
+It is a ratio of state functions, so the Perron--Frobenius transition formula can cancel
+it.

The permanent shock term depends on the new shock $\varepsilon_{t+1}$.

-Because that shock is not summarized by the finite Markov state in this calculation,
+Because that shock is not summarized by the finite Markov state in this construction,
there is no state function whose ratio can cancel it.

-After dividing by its conditional mean, the shock term becomes a likelihood-ratio
-increment:
+After dividing by its conditional mean, the shock term becomes a martingale increment:

$$
\frac{\exp(-\gamma\sigma\varepsilon_{t+1})}
{E[\exp(-\gamma\sigma\varepsilon_{t+1})]}.
$$

-Thus permanent consumption shocks can break belief recovery, even under ordinary power
-utility.
+Thus permanent consumption shocks can make the recovered probability measure differ
+from investors' beliefs, even under ordinary power utility.

-This statement is relative to the Markov state used in the recovery calculation.
+This statement is relative to the Markov state used in the recovery procedure.

Enlarging the state or information structure to account for the shock can accommodate
-it, but doing so creates the identification problem discussed in {ref}`mr_additional_state`.
+it, but doing so leads to the identification problem discussed in
+{ref}`mr_additional_state`.

-## Long-run risk
+### Long-run risk

We now move from small finite-state examples to a standard continuous-time
macro-finance model.
@@ -802,13 +1296,13 @@ macro-finance model. The model is the Bansal--Yaron long-run risk model, using the calibration reported by {cite:t}`BorovickaHansenScheinkman2016`. -The point is to see how different the recovered measure can look in a standard -macro-finance model. +The point is to compare the recovered probability measure with the correctly specified +probability measure in a standard macro-finance model. -The calculation has the same structure as before. +The construction has the same structure as before. -We first write the correctly specified state dynamics, then compute the probability law -implied by the Perron recovery calculation. +We first write the correctly specified state dynamics, then compute the probability +measure implied by the Perron--Frobenius approach. The state vector $X_t=(X_{1t},X_{2t})'$ follows @@ -828,8 +1322,8 @@ Here $X_1$ is predictable consumption growth and $X_2$ is stochastic volatility. The representative agent has Epstein--Zin utility with unit elasticity of intertemporal substitution. -The continuation value introduces the continuous-time analogue of the likelihood-ratio -process above. +The continuation value introduces the continuous-time analogue of the martingale +component above. We denote that process by $H^*$, and the SDF satisfies @@ -839,8 +1333,8 @@ $$ Here $H^*$ is the continuation-value martingale entering the Epstein--Zin SDF. -The Perron--Frobenius likelihood-ratio martingale $\hat H$ is obtained only after also -incorporating the Perron eigenfunction. +The multiplicative martingale $\hat H$ associated with the Perron--Frobenius problem is +obtained only after also incorporating the Perron--Frobenius eigenfunction. In models with martingale components in consumption growth, $H^*$ and $\hat H$ need not coincide. @@ -868,7 +1362,7 @@ lrr_params = dict( The next code block computes how the different probability measures change the drift of the state vector. -The first object is the continuation value. 
+The first quantity is the continuation value. In this affine model, the translated continuation value is linear in the state: @@ -898,7 +1392,7 @@ $$ This vector $\alpha_S$ drives the one-period risk-neutral change of measure. -The second object is the Perron eigenfunction. +The second quantity is the Perron--Frobenius eigenfunction. It is exponential-affine: @@ -908,15 +1402,15 @@ $$ Thus $e_1$ and $e_2$ are slopes of the log eigenfunction. -Because $X_1$ and $X_2$ have shock loadings $\sigma_1$ and $\sigma_2$, the Perron -eigenfunction contributes the additional shock exposure +Because $X_1$ and $X_2$ have shock loadings $\sigma_1$ and $\sigma_2$, the +Perron--Frobenius eigenfunction contributes the additional shock exposure $$ \sigma_1 e_1 + \sigma_2 e_2. $$ -Therefore the one-period risk-neutral dynamics use only $\alpha_S$, while the Perron -recovered dynamics use +Therefore the one-period risk-neutral dynamics use only $\alpha_S$, while the dynamics +under the long-term risk-neutral measure use $$ \alpha_S + \sigma_1 e_1 + \sigma_2 e_2. @@ -955,7 +1449,7 @@ def solve_value_function(p): def solve_pf_lrr(p, v1, v2): - """Perron eigenfunction slopes and the SDF diffusion loading.""" + """Perron-Frobenius eigenfunction slopes and the SDF diffusion loading.""" δ, γ = p["δ"], p["γ"] μ11, μ12, μ22 = p["μ11"], p["μ12"], p["μ22"] ι1, ι2 = p["ι1"], p["ι2"] @@ -967,7 +1461,7 @@ def solve_pf_lrr(p, v1, v2): α_h_star = (1 - γ) * (α_c + σ1 * v1 + σ2 * v2) α_s = -α_c + α_h_star - # Drift coefficients of log S before the Perron factorization. + # Drift coefficients of log S before the Perron-Frobenius decomposition. β_s11 = -β_c1 β_s12 = -β_c2 - 0.5 * np.dot(α_h_star, α_h_star) β_s0 = -δ - β_c0 - 0.5 * ι2 * np.dot(α_h_star, α_h_star) @@ -975,7 +1469,7 @@ def solve_pf_lrr(p, v1, v2): # e1 and e2 are coefficients in log e(x) = e0 + e1 x1 + e2 x2. e1 = -β_s11 / μ11 - # e2 solves the remaining quadratic from the Perron eigenvalue equation. 
+ # e2 solves the remaining quadratic from the Perron-Frobenius eigenvalue equation. const = (β_s12 + 0.5 * np.dot(α_s, α_s) + e1 * (μ12 + np.dot(σ1, α_s)) + 0.5 * e1**2 * np.dot(σ1, σ1)) @@ -992,7 +1486,7 @@ def solve_pf_lrr(p, v1, v2): - e1 * (μ11 * ι1 + μ12 * ι2) - e2 * μ22 * ι2) candidates.append((eta, e2)) - # Choose the stable Perron root used for the long-term factorization. + # Choose the solution that gives the smaller eigenvalue exponent. eta, e2 = min(candidates) return e1, e2, eta, α_s @@ -1003,7 +1497,7 @@ def recovered_lrr_dynamics(p, e1, e2, α_s): ι1, ι2 = p["ι1"], p["ι2"] σ1, σ2 = p["σ1"], p["σ2"] - # The recovered measure uses the SDF exposure plus the Perron exposure. + # The long-term risk-neutral measure uses the SDF exposure plus the eigenfunction exposure. α_h = α_s + σ1 * e1 + σ2 * e2 # A diffusion change of measure shifts each drift by sigma_i dot alpha_h. @@ -1028,7 +1522,7 @@ def recovered_lrr_dynamics(p, e1, e2, α_s): def risk_neutral_lrr_dynamics(p, α_s): - """State dynamics under the instantaneous risk-neutral measure.""" + """State dynamics under the one-period risk-neutral measure.""" μ11, μ12, μ22 = p["μ11"], p["μ12"], p["μ22"] ι1, ι2 = p["ι1"], p["ι2"] σ1, σ2 = p["σ1"], p["σ2"] @@ -1053,8 +1547,8 @@ def risk_neutral_lrr_dynamics(p, α_s): ) ``` -For the calibration used here, the recovered measure changes the long-run state -distribution. +For the calibration used here, the recovered probability measure changes the long-run +state distribution. It lowers the mean of expected growth and raises the mean of volatility. 
@@ -1065,7 +1559,7 @@ dyn_hat = recovered_lrr_dynamics(lrr_params, e1, e2, α_s) dyn_bar = risk_neutral_lrr_dynamics(lrr_params, α_s) print(f"value slopes: v1 = {v1:.4f}, v2 = {v2:.4f}") -print(f"Perron coefficients: e1 = {e1:.4f}, e2 = {e2:.4f}") +print(f"eigenfunction coefficients: e1 = {e1:.4f}, e2 = {e2:.4f}") print(f"log eigenvalue: eta = {η_lrr:.6f} " f"(annualized {12 * η_lrr:.4f})") print() @@ -1074,7 +1568,7 @@ print("measure iota_1 iota_2 mu_12 mu_22") print("--------- -------- -------- -------- --------") print(f"actual {lrr_params['ι1']:8.5f} {lrr_params['ι2']:8.5f}" f" {lrr_params['μ12']:8.5f} {lrr_params['μ22']:8.5f}") -print(f"risk-neut. {dyn_bar['ι1']:8.5f} {dyn_bar['ι2']:8.5f}" +print(f"one-period {dyn_bar['ι1']:8.5f} {dyn_bar['ι2']:8.5f}" f" {dyn_bar['μ12']:8.5f} {dyn_bar['μ22']:8.5f}") print(f"long-term {dyn_hat['ι1']:8.5f} {dyn_hat['ι2']:8.5f}" f" {dyn_hat['μ12']:8.5f} {dyn_hat['μ22']:8.5f}") @@ -1088,55 +1582,56 @@ predictable consumption growth. The volatility slope $v_2$ is negative in this calibration, so higher volatility lowers continuation value. -The Perron coefficient $e_1$ has the opposite sign: the long-term change of measure -loads negatively on predictable growth. +The eigenfunction coefficient $e_1$ has the opposite sign: the long-term change of +measure loads negatively on predictable growth. -Thus the recovered measure tilts probability toward histories with lower expected -growth. +Thus the recovered probability measure assigns more probability to histories with lower +expected growth. -The positive $e_2$ works in the other direction for volatility, tilting probability -toward higher-volatility states. +The positive $e_2$ has the opposite implication for volatility, assigning more +probability to higher-volatility states. The table translates those coefficients into state dynamics. 
-Relative to the correctly specified law, both risk-neutral measures lower the long-run
-mean of predictable growth and raise the long-run mean of volatility.
+Relative to the correctly specified probability measure, both risk-neutral measures
+lower the long-run mean of predictable growth and raise the long-run mean of
+volatility.

-The long-term risk-neutral measure moves further in that direction than the
-instantaneous risk-neutral measure: $\iota_1$ falls from $0$ to about $-0.0027$, while
-$\iota_2$ rises from $1$ to about $1.13$.
+The long-term risk-neutral measure moves further in that direction than the one-period
+risk-neutral measure: $\iota_1$ falls from $0$ to about $-0.0027$, while $\iota_2$
+rises from $1$ to about $1.13$.

-The small negative log eigenvalue means that the Perron discount factor is slightly
-below one; with the usual yield sign convention, $-\eta$ is the corresponding long-run
-discount rate.
+The small negative log eigenvalue means that $\exp(\eta)$ is slightly below one; with
+the usual yield sign convention, $-\eta$ is the corresponding long-run discount rate.

-### State probabilities
+#### Stationary densities

-The coefficient table gives one summary of the distortion created by recovery.
+The coefficient table gives one summary of the difference between probability measures.

-A probability plot gives another.
+A stationary-density plot gives another.

It shows not only that the means of $X_1$ and $X_2$ move, but also which combinations
of growth and volatility become more likely.

-This matters because treating the recovered law as beliefs changes the whole forecast
-distribution, not just a pair of long-run averages.
+This matters because treating the recovered probability measure as beliefs changes the
+whole forecast distribution, not just a pair of long-run averages.

-Under the recovered law, probability mass shifts toward bad long-run-risk states.
+Under the recovered probability measure, probability mass shifts toward adverse +long-run-risk states. These are states with lower predictable growth $X_1$ and higher volatility $X_2$. -The dashed contour adds the one-period risk-neutral law. +The dashed contour adds the one-period risk-neutral probability measure. -In this calibration, the one-period risk-neutral and Perron recovered stationary +In this calibration, the one-period risk-neutral and long-term risk-neutral stationary distributions are close to each other, and both are far from the correctly specified distribution. -Thus the likelihood-ratio component accounts for much of the risk adjustment in the +Thus the martingale component accounts for much of the risk adjustment in the state dynamics. -The plot below simulates the state process under each probability law and estimates the -stationary joint density of $(X_2, X_1)$. +The plot below simulates the state process under each probability measure and estimates +the stationary joint density of $(X_2, X_1)$. The horizontal line marks $X_1=0$ and the vertical line marks the correctly specified mean of volatility, $X_2=\iota_2$. @@ -1242,40 +1737,54 @@ plt.show() The movement below the horizontal line means lower expected growth, while movement to the right of the vertical line means higher volatility. -### Yield implications +#### Yield implications -The probability distortion matters for asset-pricing interpretation because yields mix -two objects: a payoff forecast and an asset price. +The difference between probability measures matters for asset-pricing interpretation +because yields mix two quantities: a payoff forecast and an asset price. -The recovered measure is called long-term risk-neutral because it absorbs +The recovered probability measure is called long-term risk-neutral because it absorbs the martingale component that prices long-horizon risk. 
-For stochastically growing cash flows, long-term risk premia vanish when yields are
-computed under this recovered measure.
+For stochastically growing cash flows, the paper's long-horizon result is that risk
+premia relative to maturity-matched bonds vanish under the recovered probability
+measure, subject to the stability and moment conditions used for the limit.

-Under the correctly specified law, those same long-term risk premia need not vanish.
+Under the correctly specified probability measure, those same long-term risk premia need
+not vanish.

-For a cash flow $G_t$, the yield compares a forecast of the payoff with its asset price:
+For a cash flow $G_t$, write expectations under the correctly specified probability
+measure as $E_P$ and expectations under the recovered probability measure as
+$E_{\hat P}$.
+
+The yield computed under the correctly specified probability measure is

$$
-y_t[G](x)
-= \frac{1}{t}\log E[G_t \mid X_0=x]
-  - \frac{1}{t}\log E[S_tG_t \mid X_0=x].
+y_t^P[G](x)
+= \frac{1}{t}\log E_P[G_t \mid X_0=x]
+  - \frac{1}{t}\log E_P[S_tG_t \mid X_0=x].
$$

-The first term is a forecast of the cash flow.
+The first term is the payoff forecast.

-The second term is its price, written using the stochastic discount factor.
+The second term is the asset price, written using the original SDF representation.

Arrow prices determine the second term.

The question here is what happens to the first term if an analyst treats the recovered
-law $\hat{\mathbf{P}}$ as investors' beliefs.
+probability measure $\hat P$ as investors' beliefs.
+
+In that comparison, prices are held fixed and only the forecast term is recomputed:
+
+$$
+y_t^{\hat P}[G](x)
+= \frac{1}{t}\log E_{\hat P}[G_t \mid X_0=x]
+  - \frac{1}{t}\log E_P[S_tG_t \mid X_0=x].
+$$

For an aggregate-consumption payoff, the answer is substantial.

-The recovered law assigns more probability to low-growth, high-volatility states, so it
-forecasts lower future consumption.
+The recovered probability measure assigns more probability to low-growth, +high-volatility states, so it forecasts lower future consumption. Holding prices fixed, that lower forecast translates into lower consumption yields. @@ -1285,7 +1794,8 @@ Its payoff is one, so the forecast term is always $\log E[1]=0$. Changing beliefs therefore does not move the bond-yield panel. -The same Perron object also appears in long-bond and forward-measure limits. +The same solution to the Perron--Frobenius problem also appears in long-bond and +forward-measure limits. The limiting one-period return on a very long bond is @@ -1302,7 +1812,7 @@ $$ $$ Thus the limiting one-period transition from forward measures coincides with the -Perron recovered transition. +transition associated with the long-term risk-neutral probability. The calculation below uses the affine formulas implied by the long-run risk model. @@ -1317,8 +1827,8 @@ $$ where the coefficients solve Riccati equations. The code below computes these affine expectations under the correctly specified -measure, recomputes only the consumption forecast under the recovered measure, and keeps -asset prices fixed. +measure, recomputes only the consumption forecast under the recovered probability +measure, and keeps asset prices fixed. It then plots median and interquartile yield bands across the same simulated initial states. @@ -1328,9 +1838,10 @@ states. mystnb: figure: caption: >- - Yield implications of using recovered probabilities as beliefs. Dashed - consumption-yield bands use recovered payoff forecasts with prices fixed; bond - yields are unchanged because the zero-coupon payoff has no forecast term. + Yield implications of using the recovered probability measure as beliefs. + Dashed consumption-yield bands use payoff forecasts under the recovered + probability measure with prices fixed; bond yields are unchanged because the + zero-coupon payoff has no forecast term. 
name: fig-mr-lrr-figure-2 --- def affine_expectation_coeffs(dyn, β0, β1, β2, α, horizons): @@ -1374,7 +1885,7 @@ def yield_quantiles(log_num, log_den, horizons): def transform_functional(β0, β1, β2, α, dyn_old, dyn_new, α_h): """Rewrite a multiplicative functional after changing probabilities.""" - # The drift changes because the recovered likelihood ratio changes the + # The drift changes because the martingale component changes the # Brownian shock exposure used to forecast the cash flow. β_level = β0 - β1 * dyn_old["ι1"] - β2 * dyn_old["ι2"] β2_new = β2 + np.dot(α, α_h) @@ -1415,7 +1926,7 @@ horizons = 3 * quarters α_s + α_c, horizons ) -# Recovered-belief numerator for the aggregate-consumption payoff +# Numerator for the aggregate-consumption payoff under the recovered probability measure β_Ch0, β_Ch1, β_Ch2, α_Ch = transform_functional( β_c0, β_c1, β_c2, α_c, dyn_true, dyn_hat, dyn_hat["α_h"] ) @@ -1447,11 +1958,11 @@ def plot_yield_band(ax, x, q, color, label, linestyle='solid', plot_yield_band(axes[0], quarters, qC_P, color='0.2', label='correctly specified measure', alpha=0.45) plot_yield_band(axes[0], quarters, qC_H, color='0.65', - label='recovered measure', linestyle='--', alpha=0.35) + label='recovered probability measure', linestyle='--', alpha=0.35) plot_yield_band(axes[1], quarters, qB_P, color='0.2', label='correctly specified measure', alpha=0.45) plot_yield_band(axes[1], quarters, qB_H, color='0.65', - label='recovered measure', linestyle='--', alpha=0.25) + label='recovered probability measure', linestyle='--', alpha=0.25) axes[0].set_xlabel('maturity (quarters)') axes[0].set_ylabel('consumption yield to maturity') @@ -1464,11 +1975,11 @@ plt.tight_layout() plt.show() ``` -The left panel is the key one: recovered beliefs put more mass on low-growth, -high-volatility states, so they forecast lower consumption and imply lower consumption -yields when prices are held fixed. 
+The left panel is the key one: treating the recovered probability measure as beliefs +assigns more probability to low-growth, high-volatility states, so the implied forecast +for consumption is lower and consumption yields fall when prices are held fixed. -The bond panel is a check. +The bond panel verifies the zero-coupon comparison. Since $\log E[1]=0$ under any measure, the solid and dashed bond-yield bands coincide. @@ -1476,21 +1987,36 @@ bond-yield bands coincide. (mr_additional_state)= ## Additional state vector -{cite:t}`BorovickaHansenScheinkman2016` then asks whether the recovery -problem can be fixed by enlarging the state vector. +{cite:t}`BorovickaHansenScheinkman2016` then asks whether enlarging the state vector +changes the recovery problem. -So far, the Perron eigenfunction has depended only on the Markov state $X_t$. +So far, the Perron--Frobenius eigenfunction has depended only on the Markov state +$X_t$. But many models also contain a growing component $Y_t$, such as log consumption, with -increments driven by the same shocks: +increments driven by the same shock increments. + +Here $\Delta W_{t+1}$ denotes the shock increment between dates $t$ and $t+1$. + +The map $\phi_x$ sends today's state and the next shock increment into tomorrow's +state. + +The map $\phi_y$ sends today's state and the next shock increment into the increment +of $Y$. $$ -X_{t+1}=\phi_x(X_t,W_{t+1}), +X_{t+1}=\phi_x(X_t,\Delta W_{t+1}), \qquad -Y_{t+1}-Y_t=\phi_y(X_t,W_{t+1}). +Y_{t+1}-Y_t=\phi_y(X_t,\Delta W_{t+1}). $$ -If we allow the eigenfunction to depend on both $(X_t,Y_t)$, then a natural candidate is +Let $\varepsilon$ denote an eigenfunction candidate that is allowed to depend on both +the stationary state $X_t$ and the growing component $Y_t$. + +Let $\zeta$ be a vector of loadings on $Y$, and let $e_\zeta$ be a positive function +of $X$. + +Then a natural candidate is $$ \varepsilon(x,y)=\exp(\zeta \cdot y)e_\zeta(x). 
@@ -1506,16 +2032,17 @@ $$ \exp\{\zeta \cdot (Y_{t+1}-Y_t)\}. $$ -Since $Y_{t+1}-Y_t$ is a function of $(X_t,W_{t+1})$, the ratio -$\exp(\zeta \cdot Y_{t+1})/\exp(\zeta \cdot Y_t)$ is a one-period -multiplicative shock. +Since $Y_{t+1}-Y_t$ is a function of $(X_t,\Delta W_{t+1})$, the ratio +$\exp(\zeta \cdot Y_{t+1})/\exp(\zeta \cdot Y_t)$ is a one-period positive +multiplicative functional increment. -Thus multiplying the old eigenfunction by $\exp(\zeta \cdot y)$ does not destroy the -Perron structure; it simply changes the one-period pricing operator by the extra factor +For a fixed $\zeta$, this factor tilts the one-period pricing operator by $\exp\{\zeta \cdot (Y_{t+1}-Y_t)\}$. -For each choice of $\zeta$, the remaining $x$-dependent part solves a different Perron -problem: +The $x$-dependent term is therefore not simply the earlier eigenfunction reused. + +For each choice of $\zeta$, the remaining $x$-dependent part solves a different +Perron--Frobenius problem: $$ E\left[ @@ -1529,8 +2056,8 @@ $$ Changing $\zeta$ changes how much long-run growth risk is loaded into the eigenfunction. -Thus adding $Y_t$ can make the subjective probability law one possible solution, but it -also creates a family of possible solutions. +Thus adding $Y_t$ can make the subjective probability measure one possible solution, but +it also creates a family of possible solutions. The extra state variable therefore does not remove the identification problem; it usually makes the selection problem more explicit. @@ -1540,12 +2067,13 @@ The paper also points out a related practical issue. Highly persistent stationary processes can be hard to distinguish from processes with stationary increments. -A stationary approximation may have a unique Perron solution for each finite persistence -level, but as persistence becomes extreme, the limiting problem can have many -near-solutions. 
+A stationary approximation may have a unique solution to the Perron--Frobenius problem +for each finite persistence level, but as persistence becomes extreme, the limiting +problem can have many approximate solutions. -Numerically, this means recovery can become fragile exactly in the cases where a -stationary model is being used to approximate stochastic growth. +Numerically, this means the solution to the Perron--Frobenius problem can be sensitive +exactly in the cases where a stationary model is being used to approximate stochastic +growth. There is, however, a structured way forward. @@ -1569,9 +2097,16 @@ the analyst, not recovered from Arrow prices alone. The paper also asks how large the martingale component is in asset-market data. -This matters because a small martingale component would make the recovered law close to -beliefs, while a large one would make the recovered law mainly a long-term -risk-neutral object. +Under rational expectations, this measures how important long-term risk adjustments are +for valuation. + +Under a subjective-beliefs interpretation, it measures the discrepancy between +subjective beliefs and the correctly specified probability measure only after imposing +that the subjective SDF itself has no martingale component. + +With that extra restriction in place, a small martingale component would make the +recovered probability measure close to beliefs, while a large one would make long-term +risk adjustments more important for the recovered probability measure. One family of measures applies a convex function to the martingale increment $\hat H_{t+1}/\hat H_t$. @@ -1598,13 +2133,13 @@ without requiring a full set of Arrow prices. ## Lessons -The Perron--Frobenius calculation remains useful under misspecification, but it no +The Perron--Frobenius approach remains useful under misspecification, but it no longer solves the belief-recovery problem by itself. It delivers a probability measure that may include long-horizon risk premia. 
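To make the convex-function magnitude measures mentioned above concrete, here is a minimal sketch with invented two-state numbers (illustration only, not the lecture's calibration); it uses the convex function $\phi(h)=h\log h$, which gives conditional relative entropy:

```python
import numpy as np

# Invented two-state transition matrices (illustration only)
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])       # correctly specified measure
P_hat = np.array([[0.6, 0.4],
                  [0.3, 0.7]])   # recovered probability measure

# One-period martingale increments: E_P[h | state] = 1 row by row
h = P_hat / P
assert np.allclose((P * h).sum(axis=1), 1.0)

# Applying the convex function phi(h) = h log h gives conditional
# relative entropy, which is zero exactly when h is identically one
ent = (P * h * np.log(h)).sum(axis=1)
print(ent)   # strictly positive entries: a nontrivial martingale component

h_trivial = np.ones_like(P)
print((P * h_trivial * np.log(h_trivial)).sum(axis=1))   # all zeros
```

By Jensen's inequality each entry of `ent` is nonnegative, and it vanishes only when the martingale increment is identically one.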
-That measure equals investors' beliefs only when the likelihood-ratio martingale is -constant. +That measure equals investors' beliefs only when the martingale component is +identically one. Recursive utility, permanent shocks, and long-run risk models give this martingale an economically important role, so it should not be overlooked when assessing the @@ -1613,9 +2148,9 @@ implications of transition independence for belief recovery. ## Exercises ```{exercise} -:label: ex_misspecified_recovery_diagnostic +:label: ex_misspecified_recovery_martingale_component -**A two-state diagnostic.** +**A two-state martingale component.** Let @@ -1634,12 +2169,13 @@ $$ $$ 1. Compute the one-period risk-neutral transition matrix $\bar{\mathbf{P}}$. -2. Compute the recovered transition matrix $\hat{\mathbf{P}}$. +2. Compute the transition matrix $\hat{\mathbf{P}}$ associated with the recovered + probability measure. 3. Compute $\hat h_{ij}=\hat p_{ij}/p_{ij}$ and decide whether recovery returns the correctly specified transition matrix. ``` -```{solution-start} ex_misspecified_recovery_diagnostic +```{solution-start} ex_misspecified_recovery_martingale_component :class: dropdown ``` @@ -1656,7 +2192,7 @@ H2, eta2, e2, Phat2 = martingale_increment(Q2, P2) print("One-period risk-neutral transition matrix P_bar") print(np.round(Pbar2, 4)) -print("\nRecovered transition matrix P_hat") +print("\nTransition matrix P_hat associated with the recovered probability measure") print(np.round(Phat2, 4)) print("\nMartingale increment h_hat") print(np.round(H2, 4)) @@ -1677,7 +2213,7 @@ $$ s_{ij}=A\left(\frac{c_j}{c_i}\right)^{-\gamma}. $$ -Show that $\hat e_i=c_i^\gamma$ is the Perron eigenvector and that +Show that $\hat e_i=c_i^\gamma$ is the Perron--Frobenius eigenvector and that $\hat{\mathbf{P}}=\mathbf{P}$. Then verify the result numerically using the three-state baseline in the lecture. 
@@ -1711,7 +2247,7 @@ H_power, _, e_power, P_hat_power = martingale_increment(Q_power, P_true) e_theory = c_levels**γ_power e_theory = e_theory / e_theory.sum() -print("Perron eigenvector") +print("Perron-Frobenius eigenvector") print(np.round(e_power, 6)) print("\nNormalized c^gamma") print(np.round(e_theory, 6)) @@ -1725,7 +2261,7 @@ print("max |h_hat - 1|:", ``` ```{exercise} -:label: ex_recursive_utility_distortion +:label: ex_recursive_utility_martingale_component **Recursive utility and risk aversion.** @@ -1736,7 +2272,7 @@ $\hat{\mathbf{P}}$ for $\gamma \in \{1, 5, 10, 15\}$. Which state receives the largest increase in stationary probability as $\gamma$ rises? ``` -```{solution-start} ex_recursive_utility_distortion +```{solution-start} ex_recursive_utility_martingale_component :class: dropdown ``` From e6ecf60a878b2ff08f43f5394f22a8d104044a67 Mon Sep 17 00:00:00 2001 From: HumphreyYang Date: Mon, 27 Apr 2026 22:42:22 +0800 Subject: [PATCH 23/26] updates --- lectures/misspecified_recovery.md | 73 +++++++++++++++++++++---------- 1 file changed, 51 insertions(+), 22 deletions(-) diff --git a/lectures/misspecified_recovery.md b/lectures/misspecified_recovery.md index f9288e250..71a073a65 100644 --- a/lectures/misspecified_recovery.md +++ b/lectures/misspecified_recovery.md @@ -41,11 +41,6 @@ We will keep three probability measures separate. The first is the correctly specified probability measure, which governs the Markov state in the model. -In the paper, this can be interpreted as the actual probability measure under rational -expectations. - -Interpreting it as investors' subjective beliefs requires additional assumptions. - The second is the one-period risk-neutral probability measure, which comes from normalizing one-period Arrow prices by bond prices. @@ -218,8 +213,12 @@ The result is a stochastic matrix $\hat{\mathbf{P}}$. This construction assumes that $\mathbf{Q}$ has a unique positive right eigenvector up to scale. 
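A minimal numerical sketch of this eigenvector twist (the Arrow-price matrix below is invented for illustration, not taken from the lecture's calibration): find the dominant eigenpair $(\lambda, e)$ of $\mathbf{Q}$ and set $\hat p_{ij} = q_{ij} e_j / (\lambda e_i)$.

```python
import numpy as np

# Invented 3x3 matrix of one-period Arrow prices (illustration only)
Q = np.array([[0.30, 0.20, 0.10],
              [0.15, 0.40, 0.20],
              [0.10, 0.25, 0.35]])

# Perron--Frobenius eigenpair: dominant eigenvalue and positive eigenvector
eigvals, eigvecs = np.linalg.eig(Q)
k = np.argmax(eigvals.real)
λ = eigvals[k].real
e = np.abs(eigvecs[:, k].real)     # positive, defined only up to scale

# Twist each row of Q by the eigenvector: p_hat_ij = q_ij e_j / (λ e_i)
P_hat = Q * e[np.newaxis, :] / (λ * e[:, np.newaxis])
print(np.round(P_hat, 4))
print("row sums:", P_hat.sum(axis=1))   # each row sums to one

# Rescaling e by any positive constant leaves P_hat unchanged,
# so the scale indeterminacy of the eigenvector is harmless
P_hat_scaled = Q * (2 * e)[np.newaxis, :] / (λ * (2 * e)[:, np.newaxis])
assert np.allclose(P_hat, P_hat_scaled)
```

Because $\mathbf{Q}e=\lambda e$, each row of `P_hat` sums to one automatically; with all entries of $\mathbf{Q}$ strictly positive, irreducibility holds and the positive eigenvector is unique up to scale.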
-For a finite irreducible non-negative matrix, the Perron--Frobenius theorem guarantees -this uniqueness directly. +For a finite irreducible nonnegative matrix, the Perron root has a strictly positive +right eigenvector unique up to scale. + +For long-horizon dominance and convergence, one typically imposes a stronger condition +such as primitivity or aperiodicity; the paper uses a positivity condition on +$\sum_{t=0}^{\infty}\lambda^t\mathbf Q^t$. In general state spaces this guarantee does not carry over: multiple positive eigenfunctions may exist, and an additional selection condition is needed to pin down @@ -523,14 +522,17 @@ This condition is the same as saying that the SDF can be written as This finite-state implication is a special case of the paper's general identification result. -If a pair $(S,P)$ explains asset prices and $H$ is any positive martingale, then the -same asset prices are also explained by the changed probability measure $P^H$ together -with the adjusted stochastic discount factor +If a pair $(S,P)$ explains asset prices and $H$ is a positive multiplicative +martingale, then the same asset prices are also explained by the changed probability +measure $P^H$ together with the adjusted stochastic discount factor $$ S_t^H = S_t\frac{H_0}{H_t}. $$ +More generally, any strictly positive martingale can change probability measures, but +multiplicativity preserves the Markov structure used here. + Thus Arrow prices alone cannot usually distinguish a change in beliefs from a change in the SDF. @@ -573,7 +575,7 @@ without assuming that states are finite. The transition matrix becomes a Markov probability measure, the Arrow price matrix becomes a family of pricing operators, and the one-period ratios $\hat h_{ij}$ become -a positive martingale process. +increments of a positive multiplicative martingale. The point of this section is to build that dictionary. @@ -614,7 +616,7 @@ X_{t+1}=\phi_x(X_t,\Delta W_{t+1}). 
$$ The conditional law generated by this equation is the general-state replacement for -row $i$ of the finite matrix $\mathbf P$. +the finite-matrix row indexed by the current state $x$. ````{prf:assumption} Markov state and shock increments :label: assumption-mr-markov-shocks @@ -662,6 +664,9 @@ Write the enlarged process as $Z=(X,Y)$. The process $Z$ is Markov with a triangular structure: the conditional distribution of $(X_{t+1},Y_{t+1}-Y_t)$ depends on the past only through $X_t$. +Histories of $Z$, together with $X_0$, generate the same information as the shock +history. + This is why the next Perron--Frobenius problem can first be posed with eigenfunctions of $X$ alone. @@ -690,6 +695,11 @@ $$ Thus $M_t/M_0$ is a product of positive one-period increments. +This is the Condition-1 version of a multiplicative functional used in the paper. + +The formal definition is slightly broader, but this form covers the models studied +below. + Under {prf:ref}`assumption-mr-markov-shocks`, the logarithm of $M$ has stationary increments. @@ -756,7 +766,11 @@ In continuous time, the family $\{Q_t:t\geq0\}$ is a semigroup of pricing operat Different stochastic discount factor / probability pairs can produce the same pricing operators, and that flexibility is the source of the identification problem. -Let $H=\{H_t\}$ be a strictly positive martingale with $E[H_0]=1$ under $P$. +For the Markov setting used here, the probability changes of interest are generated by +positive multiplicative martingales. + +At the level of probability changes, let $H=\{H_t\}$ be a strictly positive martingale +with $E[H_0]=1$ under $P$. For an event $A$ observable by date $\tau$, the changed probability measure $P^H$ is defined by @@ -835,6 +849,9 @@ state-dependent scaling. Since eigenfunctions are defined only up to scale, uniqueness always means uniqueness up to multiplication by a positive constant. 
+In general state spaces, existence of a positive eigenfunction is also a substantive +condition. + The eigenfunction equation implies the conditional moment restriction $$ @@ -909,11 +926,14 @@ The only change is notation: $\hat h_{ij}$ is the one-period density ratio in a Markov chain, while $\hat H_{t+1}/\hat H_t$ is the corresponding one-period density ratio in the general Markov setting. -Therefore $\hat H_t/\hat H_0$ is the density of the recovered probability measure -relative to the correctly specified probability measure through date $t$. +Conditional on $X_0$, the likelihood ratio for histories through date $t$ is +$\hat H_t/\hat H_0$. -If $\hat H_{t+1}/\hat H_t$ is not identically one, the recovered probability measure differs -from the correctly specified probability measure. +For the unconditional measure on $\mathcal F_t$, the Radon--Nikodym density is +$\hat H_t$, where $\hat H_0$ adjusts the initial distribution. + +If $\hat H_{t+1}/\hat H_t$ is not identically one, the recovered probability measure +differs from the correctly specified probability measure. We show the restriction that rules out this difference in the next section. @@ -923,8 +943,8 @@ In finite irreducible matrix problems, Perron--Frobenius theory gives a unique p eigenvector up to scale, so the recovered transition matrix is pinned down by $\mathbf Q$. -In general state spaces, multiple positive eigenfunctions may solve the same pricing -operator problem. +In general state spaces, a positive eigenfunction need not exist, and multiple +positive eigenfunctions may solve the same pricing operator problem when one does. The paper therefore imposes a selection condition on the probability measure induced by the candidate eigenfunction. @@ -964,7 +984,8 @@ $$ for some positive function $m$ and real number $\delta$. ```` -This is the transition-independence restriction from {doc}`ross_recovery`. 
+This is the transition-independence restriction from {doc}`ross_recovery`, imposed on +the SDF representation whose probability measure one wants to recover. Under this restriction, setting $\hat e=1/m$ and $\hat\eta=-\delta$ gives @@ -1046,7 +1067,12 @@ A local martingale has zero drift in $dM_t/M_t$, which gives the displayed restr Additional integrability conditions then ensure that this local martingale is a true martingale. -Under the probability change induced by such a martingale, Brownian drifts shift. +Under the probability measure induced by a martingale $H$ with this exposure +$\alpha$, $\widetilde W_t=W_t-\int_0^t\alpha(X_s)ds$ is Brownian. + +With this sign convention, the drift of $X$ changes from $\mu_x$ to +$\mu_x+\sigma_x\alpha$, and the drift of $Y$ changes from $\mu_y$ to +$\mu_y+\sigma_y\alpha$. This is the continuous-time analogue of replacing $p_{ij}$ by $h_{ij}p_{ij}$ in the finite-state model. @@ -1199,7 +1225,7 @@ rec_prob_gain = 100 * (rec_prob - π_true[0]) fig, axes = plt.subplots(1, 2, figsize=(12, 4.5)) bound = np.max(np.abs(H_dev)) -im = axes[0].imshow(H_dev, cmap='Blues', vmin=-bound, vmax=bound) +im = axes[0].imshow(H_dev, cmap='RdBu_r', vmin=-bound, vmax=bound) axes[0].set_xticks(range(3)) axes[0].set_yticks(range(3)) axes[0].set_xticklabels(state_names, rotation=20) @@ -1630,6 +1656,9 @@ distribution. Thus the martingale component accounts for much of the risk adjustment in the state dynamics. +The paper's Figure 1 reports model-implied stationary densities; the simulation below +is a numerical approximation to those densities. + The plot below simulates the state process under each probability measure and estimates the stationary joint density of $(X_2, X_1)$. 
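As a schematic finite-state analogue of that comparison (the transition matrices below are invented for illustration and are not the lecture's continuous-time calibration), one can check simulated visit frequencies against the exact stationary distribution under each probability measure:

```python
import numpy as np

rng = np.random.default_rng(0)

def stationary(P):
    """Stationary distribution: left Perron eigenvector of P, normalized."""
    vals, vecs = np.linalg.eig(P.T)
    v = np.abs(vecs[:, np.argmax(vals.real)].real)
    return v / v.sum()

def visit_frequencies(P, T=50_000):
    """Empirical state frequencies along one simulated path."""
    n, x = P.shape[0], 0
    counts = np.zeros(n)
    for _ in range(T):
        x = rng.choice(n, p=P[x])
        counts[x] += 1
    return counts / T

# Invented matrices: P_hat shifts mass toward the third (bad) state
P_true = np.array([[0.6, 0.3, 0.1],
                   [0.2, 0.6, 0.2],
                   [0.1, 0.3, 0.6]])
P_hat = np.array([[0.5, 0.3, 0.2],
                  [0.1, 0.6, 0.3],
                  [0.1, 0.2, 0.7]])

for name, P in [("correctly specified", P_true), ("recovered", P_hat)]:
    print(f"{name:20s} exact {np.round(stationary(P), 3)}"
          f"  simulated {np.round(visit_frequencies(P), 3)}")
```

With these made-up numbers the recovered matrix puts visibly more long-run mass on the bad state, which is the finite-state counterpart of the reweighted stationary density.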
From 898ce4a76d5a39c45e4a47d9f33cf6adf6c6ba15 Mon Sep 17 00:00:00 2001 From: HumphreyYang Date: Tue, 28 Apr 2026 12:57:42 +0800 Subject: [PATCH 24/26] updates --- lectures/_static/quant-econ.bib | 224 +++++++++++++++---------------- lectures/chow_business_cycles.md | 2 +- lectures/lagrangian_lqdp.md | 4 +- 3 files changed, 115 insertions(+), 115 deletions(-) diff --git a/lectures/_static/quant-econ.bib b/lectures/_static/quant-econ.bib index f26738561..2701fe981 100644 --- a/lectures/_static/quant-econ.bib +++ b/lectures/_static/quant-econ.bib @@ -780,7 +780,7 @@ @book{Sargent_Stachurski_2025 place={Cambridge}, title={Dynamic Programming: Finite States}, publisher={Cambridge University Press}, - author={Sargent, Thomas J and Stachurski, John}, + author={Sargent, Thomas J. and Stachurski, John}, year={2025} } @@ -790,28 +790,24 @@ @book{Sargent_Stachurski_2024 title={Economic Networks: Theory and Computation}, publisher={Cambridge University Press}, author={Sargent, Thomas J. and Stachurski, John}, - year={2024}, - collection={Structural Analysis in the Social Sciences} + year={2024} } -@incollection{slutsky:1927, - address = {Moscow}, - author = {Slutsky, Eugen}, - booktitle = {Problems of Economic Conditions}, - date-added = {2021-02-16 14:44:03 -0600}, - date-modified = {2021-02-16 14:44:03 -0600}, - publisher = {The Conjuncture Institute}, - title = {The Summation of Random Causes as the Source of Cyclic Processes}, - volume = {3}, - year = {1927} +@article{slutsky1937, + author = {Slutzky, Eugen}, + title = {The Summation of Random Causes as the Source of Cyclic Processes}, + journal = {Econometrica}, + volume = {5}, + number = {2}, + pages = {105--146}, + year = {1937}, + doi = {10.2307/1907241} } @incollection{frisch33, - author = {Ragar Frisch}, + author = {Ragnar Frisch}, booktitle = {Economic Essays in Honour of Gustav Cassel}, - date-added = {2015-01-09 21:08:15 +0000}, - date-modified = {2015-01-09 21:08:15 +0000}, - pages = {171-205}, + pages = 
{171--203}, publisher = {Allen and Unwin}, title = {Propagation Problems and Impulse Problems in Dynamic Economics}, year = {1933} @@ -991,8 +987,6 @@ @incollection{Hurwicz:1962 address = {Stanford, CA}, author = {Hurwicz, Leonid}, booktitle = {Logic, Methodology and Philosophy of Science}, - date-added = {2014-12-26 17:45:57 +0000}, - date-modified = {2022-01-09 19:40:37 -0600}, pages = {232-239}, publisher = {Stanford University Press}, title = {On the Structural Form of Interdependent Systems}, @@ -1032,8 +1026,8 @@ @article{wecker1979predicting } @book{Chadhuri_Mukerjee_88, - title = {Randomized Response: Theory and Technique}, - author = {A Chadhuri and R Mukerjee}, + title = {Randomized Response: Theory and Techniques}, + author = {Chaudhuri, A. and Mukerjee, R.}, year = {1988}, publisher = {Marcel Dekker}, address = {New York} @@ -1156,8 +1150,8 @@ @article{apostolakis1990 } @unpublished{Greenfield_Sargent_1993, - author = {Moses A Greenfield and Thomas J Sargent}, - title = {A Probabilistic Analysis of a Catastrophic Transuranic Waste Hoise Accident at the WIPP}, + author = {Greenfield, Moses A. 
and Sargent, Thomas J.}, + title = {A Probabilistic Analysis of a Catastrophic Transuranic Waste Hoist Accident at the WIPP}, year = {1993}, month = {June}, note = {Environmental Evaluation Group, Albuquerque, New Mexico}, @@ -1184,12 +1178,12 @@ @article{Groves_73 } @article{Clarke_71, - author = {Clarke, E.}, - year = { 1971}, + author = {Clarke, Edward H.}, + year = {1971}, title = {Multipart pricing of public goods}, journal = {Public Choice}, - volume = {8}, - pages = {19-33} + volume = {11}, + pages = {17--33} } @article{Vickrey_61, @@ -1243,8 +1237,6 @@ @article{tu_Rowley @book{Knight:1921, author = {Knight, Frank H.}, - date-added = {2020-08-20 10:29:34 -0500}, - date-modified = {2020-08-20 11:10:35 -0500}, keywords = {climate,modeling}, publisher = {Houghton Mifflin}, title = {{Risk, Uncertainty, and Profit}}, @@ -1253,12 +1245,10 @@ @book{Knight:1921 @article{MaccheroniMarinacciRustichini:2006b, author = {Maccheroni, Fabio and Marinacci, Massimo and Rustichini, Aldo}, - date-added = {2021-05-19 08:04:27 -0500}, - date-modified = {2021-05-19 08:04:27 -0500}, journal = {Econometrica}, keywords = {*file-import-17-01-11}, number = {6}, - pages = {1147--1498}, + pages = {1447--1498}, title = {{Ambiguity Aversion, Robustness, and the Variational Representation of Preferences}}, volume = {74}, year = {2006} @@ -1266,8 +1256,6 @@ @article{MaccheroniMarinacciRustichini:2006b @article{GilboaSchmeidler:1989, author = {Gilboa, Itzhak and Schmeidler, David}, - date-added = {2020-08-10 09:11:02 -0500}, - date-modified = {2020-08-10 09:11:02 -0500}, journal = {Journal of Mathematical Economics}, keywords = {climate,modeling}, mendeley-groups = {nsfbib}, @@ -1402,8 +1390,8 @@ @book{Galichon_2016 } @book{DMD_book, - title = {Dynamic mode decomposition: data-driven modeling of complex systems}, - author = {J. N. Kutz and S. L. Brunton and B. W, Brunton and J. L. Proctor}, + title = {Dynamic Mode Decomposition: Data-Driven Modeling of Complex Systems}, + author = {Kutz, J. 
Nathan and Brunton, Steven L. and Brunton, Bingni W. and Proctor, Joshua L.}, year = {2016}, publisher = {SIAM} } @@ -1417,14 +1405,14 @@ @book{DDSE_book } @book{bertsimas_tsitsiklis1997, - author = {Bertsimas, D. & Tsitsiklis, J. N.}, - title = {{Introduction to linear optimization}}, + author = {Bertsimas, Dimitris and Tsitsiklis, John N.}, + title = {{Introduction to Linear Optimization}}, publisher = {Athena Scientific}, year = {1997} } @book{hu_guo2018, - author = {Hu, Y. & Guo, Y.}, + author = {Hu, Yunquan and Guo, Yaohuang}, title = {{Operations research}}, publisher = {Tsinghua University Press}, edition = {5th}, @@ -1433,9 +1421,7 @@ @book{hu_guo2018 @article{definetti, author = {Bruno de Finetti}, - date-added = {2014-12-26 17:45:57 +0000}, - date-modified = {2014-12-26 17:45:57 +0000}, - journal = {Annales de l'Institute Henri Poincare'}, + journal = {Annales de l'Institut Henri Poincaré}, note = {English translation in Kyburg and Smokler (eds.), {\it Studies in Subjective Probability}, Wiley, New York, 1964}, pages = {1 - 68}, title = {La Prevision: Ses Lois Logiques, Ses Sources Subjectives}, @@ -1682,11 +1668,12 @@ @article{benhabib2018skewed year = {2018} } -@article{pareto1896cours, +@book{pareto1896cours, title = {Cours d'{\'e}conomie politique}, - author = {Vilfredo, Pareto}, - journal = {Rouge, Lausanne}, - volume = {2}, + author = {Pareto, Vilfredo}, + publisher = {F. 
Rouge}, + address = {Lausanne}, + volume = {1}, year = {1896} } @@ -1839,7 +1826,7 @@ @article{Samuelson1939 title = {Interactions Between the Multiplier Analysis and the Principle of Acceleration}, author = {Samuelson, Paul A.}, - journal = {Review of Economic Studies}, + journal = {The Review of Economics and Statistics}, volume = {21}, number = {2}, year = {1939}, @@ -1874,7 +1861,7 @@ @incollection{Koopmans title = {On the Concept of Optimal Economic Growth}, booktitle = {The Economic Approach to Development Planning}, address = { Chicago}, - publilsher = {Rand McNally}, + publisher = {Rand McNally}, pages = {225-287} } @@ -1966,29 +1953,33 @@ @article{Jovanovic1979 publisher = {The University of Chicago Press} } -@article{Deneckere1992, - title = {Cyclical and chaotic behavior in a dynamic equilibrium model, with implications for fiscal policy}, - author = {Deneckere, Raymond J and Judd, Kenneth L}, - journal = {Cycles and chaos in economic equilibrium}, +@incollection{Deneckere1992, + title = {Cyclical and Chaotic Behavior in a Dynamic Equilibrium Model, with Implications for Fiscal Policy}, + author = {Deneckere, Raymond J. 
and Judd, Kenneth L.},
+  editor = {Benhabib, Jess},
+  booktitle = {Cycles and Chaos in Economic Equilibrium},
   pages = {308--329},
   year = {1992},
-  publisher = {Princeton University Press}
+  publisher = {Princeton University Press},
+  address = {Princeton}
 }
 
 @article{Judd1985,
   title = {On the performance of patents},
   author = {Judd, Kenneth L},
   journal = {Econometrica},
+  volume = {53},
+  number = {3},
   pages = {567--585},
-  year = {1985},
-  publisher = {JSTOR}
+  year = {1985}
 }
 
 @book{Helpman1985,
-  title = {Market structure and international trade},
-  author = {Helpman, Elhanan and Krugman, Paul},
-  year = {1985},
-  publisher = {MIT Press Cambridge}
+  title = {Market Structure and Foreign Trade: Increasing Returns, Imperfect Competition, and the International Economy},
+  author = {Helpman, Elhanan and Krugman, Paul R.},
+  year = {1985},
+  publisher = {MIT Press},
+  address = {Cambridge, MA}
 }
 
 @article{LettLud2004,
@@ -2014,13 +2005,13 @@ @article{LettLud2001
 }
 
 @article{CampbellShiller88,
-  author = {John Y. Campbell, Robert J. Shiller},
+  author = {Campbell, John Y. 

and Shiller, Robert J.}, title = {{The Dividend-Price Ratio and Expectations of Future Dividends and Discount Factors}}, journal = {Review of Financial Studies}, year = 1988, volume = {1}, number = {3}, - pages = {195-228} + pages = {195--228} } @book{Friedman98, @@ -2054,9 +2045,9 @@ @book{Kreps88 } @book{Bertsekas75, - author = {Dmitri Bertsekas}, + author = {Bertsekas, Dimitri P.}, title = {Dynamic Programming and Stochastic Control}, - year = {1975}, + year = {1976}, publisher = {Academic Press}, address = {New York} } @@ -2178,9 +2169,9 @@ @book{Orfanidisoptimum1988 } @book{Athanasios1991, - title = {Probability, random variables, and stochastic processes}, - author = {Athanasios, Papoulis and Pillai, S Unnikrishna}, - publisher = {Mc-Graw Hill}, + title = {Probability, Random Variables, and Stochastic Processes}, + author = {Papoulis, Athanasios}, + publisher = {McGraw-Hill}, year = {1991} } @@ -2210,7 +2201,7 @@ @article{PhelanStacchetti2001 year = 2001, volume = {69}, number = {6}, - pages = {1491-1518}, + pages = {1491--1518}, month = {November} } @@ -2355,9 +2346,12 @@ @article{arellano2008default } @article{davis2006flow, - title = {The flow approach to labor markets: New data sources, micro-macro links and the recent downturn}, - author = {Davis, Steven J and Faberman, R Jason and Haltiwanger, John}, + title = {The Flow Approach to Labor Markets: New Data Sources and Micro-Macro Links}, + author = {Davis, Steven J. and Faberman, R. Jason and Haltiwanger, John}, journal = {Journal of Economic Perspectives}, + volume = {20}, + number = {3}, + pages = {3--26}, year = {2006} } @@ -2403,11 +2397,9 @@ @article{Rust1996 } @book{AKR1990, - author = {Amman, H. M. and Kendrick, D.A. and Rust, J.}, - address = {Burlington, MA}, - publisher = {Elsevier}, + editor = {Amman, H. M. and Kendrick, D. A. 
and Rust, John}, title = {{Handbook of Computational Economics}}, - year = {1990} + year = {1996} } @book{AndersonMoore2005, @@ -2601,10 +2593,12 @@ @article{Hall1978 } @article{HallMishkin1982, - author = {Hall, Robert E and Mishkin, Frederic S}, - journal = {National Bureau of Economic Research Working Paper Series}, + author = {Hall, Robert E. and Mishkin, Frederic S.}, + journal = {Econometrica}, title = {{The Sensitivity of Consumption to Transitory Income: Estimates from Panel Data on Households}}, - volume = {No. 505}, + volume = {50}, + number = {2}, + pages = {461--481}, year = {1982} } @@ -2782,23 +2776,23 @@ @article{Kuhn2013 } @article{KydlandPrescott1977, - author = {Kydland, Finn E., and Edward C. Prescott}, + author = {Kydland, Finn E. and Prescott, Edward C.}, journal = {Journal of Political Economy}, - pages = {867-896}, + pages = {473--492}, title = {Rules Rather than Discretion: The Inconsistency of Optimal Plans}, - volume = {106}, - number = {5}, + volume = {85}, + number = {3}, year = {1977} } @article{KydlandPrescott1980, - author = {Kydland, Finn E., and Edward C. Prescott}, - journal = {Econometrics}, - pages = {1345-2370}, + author = {Kydland, Finn E. and Prescott, Edward C.}, + journal = {Econometrica}, + pages = {1345--1370}, title = {Time to Build and Aggregate Fluctuations}, volume = {50}, number = {6}, - year = {1980} + year = {1982} } @book{LasotaMackey1994, @@ -2828,9 +2822,9 @@ @article{Lucas1978 } @article{LucasStokey1983, - author = {Lucas, Jr., Robert E and Stokey, Nancy L}, - journal = {Journal of monetary Economics}, - number = {3}, + author = {Lucas, Jr., Robert E. 
and Stokey, Nancy L.}, + journal = {Journal of Monetary Economics}, + number = {1}, pages = {55--93}, title = {{Optimal Fiscal and Monetary Policy in an Economy without Capital}}, volume = {12}, @@ -2838,13 +2832,13 @@ @article{LucasStokey1983 } @article{MarcetMarimon1994, - author = {Albert Marcet and Ramon Marimon}, - title = {{Recursive contracts}}, - year = 1994, - institution = {Department of Economics and Business, Universitat Pompeu Fabra}, - type = {Economics Working Papers}, - url = {http://ideas.repec.org/p/upf/upfgen/337.html}, - number = {337} + author = {Marcet, Albert and Marimon, Ramon}, + title = {{Recursive Contracts}}, + journal = {Econometrica}, + volume = {87}, + number = {5}, + pages = {1589--1631}, + year = {2019} } @article{MarcetSargent1989, @@ -2960,12 +2954,13 @@ @article{Pearlman1992 } @article{PearlmanCurrieLevine1986, - author = {Pearlman, J.G. and Currie, D.A. and Levine, P.L.}, - title = {Rational expectations with partial information}, - journal = {Economic Modeling}, + author = {Pearlman, Joseph and Currie, David and Levine, Paul}, + title = {Rational expectations models with partial information}, + journal = {Economic Modelling}, volume = {3}, - pages = {90-105}, - year = {1992} + number = {2}, + pages = {90--105}, + year = {1986} } @book{Popper1992, @@ -2980,9 +2975,9 @@ @article{Prescott1977 author = {Prescott, Edward C.}, year = {1977}, title = {Should Control Theory Be Used for Economic Stabilization?}, - journal = {Journal of Monetary Economics}, + journal = {Carnegie-Rochester Conference Series on Public Policy}, volume = {7}, - pages = {13-38} + pages = {13--38} } @article{Rabault2002, @@ -3018,12 +3013,13 @@ @article{Reiter2009 } @article{Sargent1979, - author = {Sargent, T J}, + author = {Sargent, Thomas J.}, year = {1979}, title = {A Note On Maximum Likelihood Estimation of The Rational Expectations Model of The Term Structure}, journal = {Journal of Monetary Economics}, - volume = {35}, - pages = {245-274} + volume = 
{5}, + number = {1}, + pages = {133--143} } @book{Sargent1987, @@ -3254,13 +3250,15 @@ @article{barro2006rare publisher={MIT Press} } -@article{Brock1982, +@incollection{Brock1982, title={Asset prices in a production economy}, author={Brock, William A}, - journal={The Economics of Information and Uncertainty}, - pages={1--43}, + booktitle={The Economics of Information and Uncertainty}, + editor={McCall, John J.}, + pages={1--46}, year={1982}, - publisher={University of Chicago Press} + publisher={University of Chicago Press}, + address={Chicago} } @article{PrescottMehra1980, @@ -3548,12 +3546,14 @@ @book{Hans_Sarg_book_2016 } @article{Neyman_Pearson, - author = {Neyman, J. and Pearson, E. S}, + author = {Neyman, J. and Pearson, E. S.}, year = {1933}, title = {On the problem of the most efficient tests of statistical hypotheses}, - journal = {Phil. Trans. R. Soc. Lond. A. 231 (694–706)}, - pages = {289–337} + journal = {Philosophical Transactions of the Royal Society of London}, + volume = {231}, + number = {694--706}, + pages = {289--337} } @article{ma2020income, @@ -3858,4 +3858,4 @@ @article{grossman1976 number = {2}, pages = {573--585}, year = {1976} -} \ No newline at end of file +} diff --git a/lectures/chow_business_cycles.md b/lectures/chow_business_cycles.md index e7bfb9648..245f04291 100644 --- a/lectures/chow_business_cycles.md +++ b/lectures/chow_business_cycles.md @@ -844,7 +844,7 @@ The peak appears at $\omega/\pi \approx 0.10$, which corresponds to a cycle leng ### The Slutsky connection -Chow connects this result to Slutsky's {cite}`slutsky:1927` finding that moving averages of a random series have recurrent cycles. +Chow connects this result to Slutsky's {cite}`slutsky1937` finding that moving averages of a random series have recurrent cycles. 
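Slutsky's experiment is easy to reproduce in a few lines; the window length and sample size below are arbitrary illustrative choices, not values from the lecture:

```python
import numpy as np

rng = np.random.default_rng(42)

# An m-term moving average of pure white noise (Slutsky's experiment)
T, m = 200_000, 8
ε = rng.standard_normal(T + m)
y = np.convolve(ε, np.ones(m) / m, mode="valid")[:T]

def sample_autocorr(x, k):
    """Sample autocorrelation of x at lag k."""
    x = x - x.mean()
    return (x[:-k] * x[k:]).mean() / x.var()

# For lags k < m the theoretical autocorrelation is (m - k)/m, so the
# smoothed series is strongly positively correlated at short lags and
# displays recurrent, cycle-like swings even though the input is noise
for k in (1, 4, 7):
    print(f"lag {k}: sample {sample_autocorr(y, k):.3f}, "
          f"theory {(m - k) / m:.3f}")
```

The same mechanism is at work in the VAR(1) case: a weighted sum of past shocks inherits serial correlation from the weights alone.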
The VAR(1) model can be written as an infinite moving average: diff --git a/lectures/lagrangian_lqdp.md b/lectures/lagrangian_lqdp.md index 5cce10050..4d80bb632 100644 --- a/lectures/lagrangian_lqdp.md +++ b/lectures/lagrangian_lqdp.md @@ -451,10 +451,10 @@ solves. See {cite}`Ljungqvist2012`, ch 12. ## Application -Here we demonstrate the computation with the deterministic permanent-income example from this [quantecon lecture](https://python.quantecon.org/lqcontrol.html). +Here we demonstrate the computation with the deterministic permanent-income example from this {doc}`lqcontrol`. Because that model is discounted, we apply the invariant-subspace method to the -equivalent **undiscounted** system obtained from the transformed matrices +equivalent *undiscounted* system obtained from the transformed matrices $\hat A = \beta^{1/2} A$ and $\hat B = \beta^{1/2} B$. ```{code-cell} ipython3 From d9d3cad44bfcc92d85bca4fa63cd1a2fbef69d66 Mon Sep 17 00:00:00 2001 From: HumphreyYang Date: Tue, 28 Apr 2026 16:54:20 +0800 Subject: [PATCH 25/26] updates Co-authored-by: Copilot --- lectures/information_market_equilibrium.md | 153 +++++++++++++-------- lectures/ross_recovery.md | 41 ++++-- 2 files changed, 126 insertions(+), 68 deletions(-) diff --git a/lectures/information_market_equilibrium.md b/lectures/information_market_equilibrium.md index bfb19ac3f..2fb1f9257 100644 --- a/lectures/information_market_equilibrium.md +++ b/lectures/information_market_equilibrium.md @@ -40,8 +40,8 @@ answered by {cite:t}`kihlstrom_mirman1975`. signal correlated with an unknown state of the world and adjusts demand accordingly. - Equilibrium prices shift. - - Under what conditions can an outside observer *infer* the - insider's private signal from the equilibrium price? + - Under what conditions can an outside observer *infer* the insider's + posterior distribution from the equilibrium price? 2. 
*Do Bayesian price expectations converge?* - In a stationary stochastic exchange @@ -166,6 +166,11 @@ Agents maximize expected utility subject to their budget constraints. A **competitive equilibrium** is a price $\hat{p}$ that clears both markets simultaneously. +Under the maintained convexity assumptions equilibrium exists, and following +{cite:t}`kihlstrom_mirman1975` we assume the equilibrium price is unique, so +that we can write $\hat p = p(\mu)$ as a well-defined function of the informed +agent's posterior. + For most of what follows, the production side matters only through the induced equilibrium price map, so when we turn to numerical illustrations we will suppress production and use a pure-exchange / portfolio interpretation to keep @@ -331,8 +336,8 @@ Good 1 is a risky asset with random return $\bar{a}$; good 2 is "money". An insider's demand reveals private information about the return. If the invertibility condition holds, outside observers can read the insider's -signal from -the equilibrium stock price. +posterior distribution -- the useful information the insider's signal carries +about $\bar a$ -- from the equilibrium stock price. #### Price as a quality signal @@ -745,8 +750,8 @@ In each period $t$: The endowment vectors $\{\tilde{\omega}^t\}$ are **i.i.d.** with density $f(\omega^t \mid \lambda)$, where $\lambda = (\lambda_1, \ldots, \lambda_K)$ is -a -**structural parameter vector** (of dimension $K$) that is *fixed but unknown*. +a **structural parameter vector** (of dimension $K$) that is *fixed but +unknown*. The equilibrium price at time $t$ is a deterministic function of $\omega^t$, so $\{p^t\}$ is also i.i.d. @@ -775,12 +780,13 @@ structure from price data alone. ### The identification problem -Because the map $\omega \mapsto p(\omega)$ is many-to-one, observing prices -loses -information relative to observing endowments. 
+Because price observations identify only the induced price density +$g(\cdot \mid \lambda)$, and because the structural-to-reduced-form map +$\lambda \mapsto g(\cdot \mid \lambda)$ may be many-to-one, price data may +identify only a reduced-form class rather than the exact structure. -In particular, it may be impossible to -recover $\lambda$ from $g(p \mid \lambda)$ even with infinite price data. +In particular, it may be impossible to recover $\lambda$ from +$g(p \mid \lambda)$ even with infinite price data. To handle this, partition $\Lambda$ into equivalence classes $\mu$ such that $\lambda \in \mu$ and $\lambda' \in \mu$ whenever $g(p \mid \lambda) = g(p \mid @@ -1040,9 +1046,9 @@ We now verify that the observer's price expectations converge to the rational-expectations distribution $g(p \mid \bar\mu)$. -We continue to use the parameterization of the "easy-to-learn" example above -($\bar{p}_{\text{true}} = 2.0$, $\bar{p}_{\text{alt}} = 1.2$, $\sigma_p = 0.4$), -now extending to $T = 1{,}000$ periods with a single simulated path and prior $h_0 = 0.5$ +We use the parameterization of the "hard-to-learn" example above +($\bar{p}_{\text{true}} = 2.0$, $\bar{p}_{\text{alt}} = 1.8$, $\sigma_p = 0.4$), +extending to $T = 1{,}000$ periods with a single simulated path and prior $h_0 = 0.5$ ```{code-cell} ipython3 --- @@ -1059,7 +1065,7 @@ def price_expectation(h_t, p_bar_true, p_bar_alt, σ_p, p_grid): ) -p_bar_true, p_bar_alt = 2.0, 1.2 +p_bar_true, p_bar_alt = 2.0, 1.8 σ_p = 0.4 n_paths = 1 T_long = 1000 @@ -1072,7 +1078,7 @@ p_grid = np.linspace(0.0, 3.5, 300) re_density = norm.pdf(p_grid, loc=p_bar_true, scale=σ_p) fig, ax = plt.subplots(figsize=(8, 5)) -snapshots = [0, 1, 3, 5, 10] +snapshots = [0, 25, 100, 300, 1000] palette = plt.cm.Blues(np.linspace(0.3, 1.0, len(snapshots))) for t_snap, col in zip(snapshots, palette): @@ -1118,9 +1124,10 @@ $\mu_1 = \{\lambda^{(1)}, \lambda^{(2)}\}$ and $\mu_2 = \{\lambda^{(3)}\}$ (because $\lambda^{(1)}$ and $\lambda^{(2)}$ 
generate the same price distribution). -The three structures have price means $\bar{p}_1 = \bar{p}_2 = 2.0$ and -$\bar{p}_3 = 1.2$, with common standard deviation $\sigma_p = 0.4$, a -uniform prior $h_0 = (1/3, 1/3, 1/3)$, and $T = 400$ periods over $30$ paths. +We continue with the hard-to-learn parameterization, so the three structures +have price means $\bar{p}_1 = \bar{p}_2 = 2.0$ and $\bar{p}_3 = 1.8$, with +common standard deviation $\sigma_p = 0.4$, a uniform prior +$h_0 = (1/3, 1/3, 1/3)$, and $T = 400$ periods over $30$ paths. The true structure is $\lambda^{(1)}$. @@ -1152,7 +1159,7 @@ def simulate_learning_3struct( # Structures 0 and 1 share the same reduced form -p_bar_vec = np.array([2.0, 2.0, 1.2]) +p_bar_vec = np.array([2.0, 2.0, 1.8]) h0_vec = np.array([1 / 3, 1 / 3, 1 / 3]) σ_p = 0.4 T = 400 @@ -1199,10 +1206,10 @@ $\bar\mu$. ```{exercise} :label: km_ex1 -Consider a two-state economy ($a_1 = 2$, -$a_2 = 0.5$) where the informed agent has **CARA** (constant absolute risk -aversion) -preferences over portfolio wealth: +**CARA portfolio utility and the stock-market interpretation.** + +Consider a two-state economy ($a_1 = 2$, $a_2 = 0.5$) where the informed agent has +**CARA** (constant absolute risk aversion) preferences over portfolio wealth: $$ u(W) = -e^{-\gamma W}, \quad W = x_2 + \bar{a}\, x_1. @@ -1221,17 +1228,18 @@ Total supply of good 1 is $X_1 = 1$. 1. Derive the first-order condition for the informed agent's optimal $x_1$. 1. Use the market-clearing condition $x_1 = 1$ (the informed agent absorbs the - entire -supply) to obtain an implicit equation for the equilibrium price $p^*(q)$. -Solve it -numerically for $q \in (0,1)$ and several values of $\gamma$. - -1. Show numerically that $p^*(q)$ is monotone in $q$, so the invertibility - condition -holds in this example. Explain why this is economically similar to the $\sigma > -1$ case in -{prf:ref}`ime_theorem_invertibility_conditions`, but not a direct application of -that theorem. 
+ entire supply) to obtain an implicit equation for the equilibrium price + $p^*(q)$, and solve it numerically for $q \in (0,1)$ and several values of + $\gamma$. + +1. Show *analytically* that $p^*(q)$ admits the closed form + + $$ + p^*(q) = \frac{a_2 + R(q,\gamma)\, a_1}{1 + R(q,\gamma)}, + \qquad R(q,\gamma) = \frac{q}{1-q}\, e^{-\gamma(a_1-a_2)}, + $$ + + and verify that $p^*(q)$ is strictly increasing in $q$. ``` ```{solution-start} km_ex1 @@ -1294,17 +1302,41 @@ plt.show() The price is strictly increasing in $q$ for every $\gamma > 0$. -The reason is that portfolio utility $u(x_2 + \bar{a}\,x_1)$ treats the two -goods as perfect substitutes in creating wealth, so a higher posterior -probability of the high-return state raises the marginal value of the risky -asset and pushes the equilibrium price upward. +For the closed form, start from the FOC at $x_1 = 1$, divide both sides by +$(a_1 - p)(p - a_2)$, and combine the exponentials: + +$$ +\frac{q\,(a_1 - p)}{(1-q)\,(p - a_2)} = e^{\gamma(a_1 - a_2)}. +$$ -This behavior is similar in spirit to the $\sigma > 1$ case in -{prf:ref}`ime_theorem_invertibility_conditions`, but it is not a direct -consequence of that theorem because CARA utility over wealth is not homothetic -in the two-good representation used in the theorem. +Rearranging gives -Here monotonicity is verified directly from the specific first-order condition. +$$ +\frac{p - a_2}{a_1 - p} = \frac{q}{1-q}\, e^{-\gamma(a_1 - a_2)} +\equiv R(q,\gamma), +$$ + +and solving the resulting linear equation in $p$ yields + +$$ +p^*(q) = \frac{a_2 + R(q,\gamma)\, a_1}{1 + R(q,\gamma)}. +$$ + +Since $R(q,\gamma)$ is strictly increasing in $q$ and +$dp^*/dR = (a_1 - a_2)/(1 + R)^2 > 0$, the equilibrium price $p^*(q)$ is +strictly increasing in $q$. + +This exercise uses the stock-market interpretation emphasized by +{cite:t}`kihlstrom_mirman1975`. + +Portfolio wealth is $W = x_2 + \bar{a}\, x_1$, so $a x_1$ and $x_2$ are perfect +substitutes in each state. 
+ +Hence the elasticity of substitution between the two arguments of +$u(a x_1, x_2)$ is infinite, corresponding to the $\sigma > 1$ side of +{prf:ref}`ime_theorem_invertibility_conditions`. + +The difference is that this example is not the full equilibrium model the theorem analyzes, but rather a partial equilibrium model with a single informed agent and a fixed supply of the risky asset. ```{solution-end} ``` @@ -1317,15 +1349,16 @@ convergence to rational expectations is determined by the **Kullback-Leibler divergence** between the two reduced forms. -The KL divergence from $g(\cdot \mid \mu_2)$ to $g(\cdot \mid \mu_1)$, for two -normal -distributions with means $\bar{p}_1$ and $\bar{p}_2$ and common variance -$\sigma_p^2$, is +The KL divergence $D_{KL}(\mu_1 \| \mu_2)$ from $g(\cdot \mid \mu_1)$ to +$g(\cdot \mid \mu_2)$, for two normal distributions with means $\bar{p}_1$ and +$\bar{p}_2$ and common variance $\sigma_p^2$, is $$ -D_{KL}(\mu_1 \| \mu_2) = \frac{(\bar{p}_1 - \bar{p}_2)^2}{2\sigma_p^2}. +D_{KL}(\mu_1 \| \mu_2) = \frac{(\bar{p}_1 - \bar{p}_2)^2}{2\sigma_p^2}, $$ +which is symmetric in the two means under equal variances. + 1. For the "easy" case ($\bar{p}_1 = 2.0$, $\bar{p}_2 = 1.2$) and the "hard" case ($\bar{p}_1 = 2.0$, $\bar{p}_2 = 1.8$), compute $D_{KL}$ for $\sigma_p = 0.4$. @@ -1377,8 +1410,8 @@ for ax, (name, p1, p2) in zip(axes, cases): label=fr"Median $T_{{0.99}} = {median_T:.0f}$") ax.set_title( f"{name}: $D_{{KL}} = {kl:.4f}$, " - fr"$C/D_{{KL}} \approx {median_T*kl:.1f}$", - fontsize=11 + fr"$\widehat C = T_{{0.99}} D_{{KL}} \approx {median_T * kl:.1f}$", + fontsize=11 ) ax.set_xlabel(r"$T_{0.99}$", fontsize=12) ax.set_ylabel("count", fontsize=11) @@ -1399,18 +1432,18 @@ $D_{KL}$). ```{exercise} :label: km_ex3 -{prf:ref}`ime_theorem_bayesian_convergence` -assumes the true -distribution $g(\cdot \mid \bar\lambda)$ is in the support of the prior (i.e., -$h(\bar\lambda) > 0$). 
+{prf:ref}`ime_theorem_bayesian_convergence` requires the prior to assign +positive probability to the true reduced-form class $\bar\mu$, equivalently to +some structure that generates the true price distribution +$g(\cdot \mid \bar\mu)$. -Investigate what happens when the true model is *not* in the -prior support. +In this exercise the true reduced form itself is excluded from the prior +support, so we investigate what happens when no model in the prior generates the +true price distribution. Simulate $T = 1,000$ periods of prices from $N(2.0, 0.4^2)$ but use a prior - that - places equal weight on two *wrong* models: $N(1.5, 0.4^2)$ and $N(2.3, - 0.4^2)$. +that places equal weight on two *wrong* models: $N(1.5, 0.4^2)$ and +$N(2.3, 0.4^2)$. Plot the posterior weight on each model over time. diff --git a/lectures/ross_recovery.md b/lectures/ross_recovery.md index b3ea3b008..bbc1bd79e 100644 --- a/lectures/ross_recovery.md +++ b/lectures/ross_recovery.md @@ -63,7 +63,11 @@ If the pricing kernel also satisfies a structural restriction called **transitio independence**, then state prices uniquely determine both the natural probability transition matrix and the transition pricing kernel. -No historical return data or assumed utility function is needed. +No historical return data or assumed utility function is needed -- but this +identification is conditional on the maintained assumptions of the Recovery +Theorem: no arbitrage, a finite/discretized irreducible Markov state space, +transition independence of the pricing kernel, and accurate recovery of state +prices. This is the **Recovery Theorem**. @@ -391,6 +395,13 @@ $$ where $h(\theta_i) = \beta/z_i$ follows from $D_{ii} = h(\theta_i)/\beta = 1/z_i$. +It is useful to distinguish the **full transition kernel** $\phi_{ij} = \beta z_i/z_j$, +which depends on both origin and destination states, from the **relative kernel +component** $1/z_j$, which depends only on the destination state. 
+ +Ross's Table I reports the destination-state shape $1/z_j$, normalized so that the +middle state equals one. + Destination states with high $z_j$ have *low* kernel values: for a fixed origin $i$, the kernel $\beta z_i/z_j$ is decreasing in $z_j$. @@ -882,7 +893,14 @@ A higher $\gamma$ amplifies this wedge. A useful by-product of the Recovery Theorem is the *recovered subjective discount factor* $\beta$, which equals the Perron–Frobenius eigenvalue of $P$. -The corresponding continuously compounded discount rate is $\rho = -\log \beta$. +If the horizon is $T$, the corresponding continuously compounded subjective discount +rate is + +$$ +\rho = -\frac{\log \beta}{T}. +$$ + +In the numerical examples below, $T=1$, so this reduces to $\rho = -\log \beta$. Corollary 1 of {cite:t}`Ross2015` states that $\beta$ is bounded above by the largest state-dependent one-period discount factor — equivalently, the maximum row sum of $P$: @@ -970,6 +988,9 @@ In this CRRA simulation, increasing risk aversion makes the risk-neutral crash probability rise faster than the recovered natural crash probability. +This is a simulation illustrating Ross's decomposition, not a replication of +Ross's S&P 500 empirical Table V. + We will say more in {ref}`rt_ex3`. ## From option prices to transition prices @@ -1007,8 +1028,8 @@ be the vector of state prices at horizon $t$ observed from today's state $c$. Here $c$ indexes the current state and $t$ counts discrete maturity steps. -The first one-period vector $p_1(c)$ is the row of $P$ corresponding to the -current state. +The first one-period vector $p_1(c)$ identifies the row of $P$ corresponding to +the current state $c$, supplying $m$ equations. If the one-period state-price transition matrix $P$ is time homogeneous, these vectors satisfy the forward recursion @@ -1024,8 +1045,10 @@ $$ p_{t+1}(c,j) = \sum_k p_t(c,k) p(k,j). $$ -Thus $m$ maturity vectors supply the $m^2$ equations needed to estimate the -$m^2$ transition prices $p(k,j)$. 
+The remaining $m-1$ forward equations $p_{t+1}(c)=p_t(c)P$, each with $m$ +components, supply the remaining $m(m-1)$ equations. + +Together these give $m^2$ equations for the $m^2$ transition prices $p(k,j)$. In practice this step is numerically delicate because the second derivative in the option-price formula amplifies measurement error, and because additional @@ -1034,7 +1057,8 @@ reasonable transition matrix. ## Testing efficient markets -The recovered pricing kernel can also be used to test market efficiency. +The recovered pricing kernel can also be used to test market efficiency, under the +maintained assumptions of the Recovery Theorem. If a trading strategy has a very high Sharpe ratio, then some pricing kernel must be volatile enough to price that payoff. @@ -1089,7 +1113,8 @@ practice. *Finite state space:* -The theorem requires a bounded, irreducible Markov chain. +Ross's theorem is proved for a finite-state irreducible Markov chain; bounded +continuous-state recovery requires additional results. In continuous, unbounded state spaces (e.g., a lognormal diffusion), uniqueness fails because any exponential $e^{\alpha x}$ satisfies the characteristic equation. From 9c26975a4cb676506b0274898f2b3390c2a50ad2 Mon Sep 17 00:00:00 2001 From: HumphreyYang Date: Tue, 28 Apr 2026 16:58:22 +0800 Subject: [PATCH 26/26] update Co-authored-by: Copilot --- lectures/ross_recovery.md | 17 ++++++----------- 1 file changed, 6 insertions(+), 11 deletions(-) diff --git a/lectures/ross_recovery.md b/lectures/ross_recovery.md index bbc1bd79e..3f879d069 100644 --- a/lectures/ross_recovery.md +++ b/lectures/ross_recovery.md @@ -63,11 +63,8 @@ If the pricing kernel also satisfies a structural restriction called **transitio independence**, then state prices uniquely determine both the natural probability transition matrix and the transition pricing kernel. 
-No historical return data or assumed utility function is needed -- but this -identification is conditional on the maintained assumptions of the Recovery -Theorem: no arbitrage, a finite/discretized irreducible Markov state space, -transition independence of the pricing kernel, and accurate recovery of state -prices. +No historical return data or assumed utility function is needed if some assumptions +about the structure of the pricing kernel hold. This is the **Recovery Theorem**. @@ -981,6 +978,8 @@ ax.legend() plt.show() ``` +This is a simulation illustrating Ross's decomposition. + The risk-neutral density assigns higher probability to large drops than the recovered natural density. @@ -988,9 +987,6 @@ In this CRRA simulation, increasing risk aversion makes the risk-neutral crash probability rise faster than the recovered natural crash probability. -This is a simulation illustrating Ross's decomposition, not a replication of -Ross's S&P 500 empirical Table V. - We will say more in {ref}`rt_ex3`. ## From option prices to transition prices @@ -1057,8 +1053,7 @@ reasonable transition matrix. ## Testing efficient markets -The recovered pricing kernel can also be used to test market efficiency, under the -maintained assumptions of the Recovery Theorem. +The recovered pricing kernel can also be used to test market efficiency, under the assumptions of the Recovery Theorem. If a trading strategy has a very high Sharpe ratio, then some pricing kernel must be volatile enough to price that payoff. @@ -1114,7 +1109,7 @@ practice. *Finite state space:* Ross's theorem is proved for a finite-state irreducible Markov chain; bounded -continuous-state recovery requires additional results. +continuous-state recovery requires additional results in {doc}`misspecified_recovery`. In continuous, unbounded state spaces (e.g., a lognormal diffusion), uniqueness fails because any exponential $e^{\alpha x}$ satisfies the characteristic equation.
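As a standalone cross-check of the closed form derived in the `km_ex1` solution above (a sketch appended outside the diff, using the exercise's parameters $a_1 = 2$, $a_2 = 0.5$), the following verifies numerically that $p^*(q) = (a_2 + R\,a_1)/(1 + R)$ with $R = \frac{q}{1-q} e^{-\gamma(a_1 - a_2)}$ solves the market-clearing first-order condition at $x_1 = 1$ and is strictly increasing in $q$:

```python
import numpy as np

def p_star(q, γ, a1=2.0, a2=0.5):
    """Closed-form equilibrium price from the CARA exercise."""
    R = q / (1 - q) * np.exp(-γ * (a1 - a2))
    return (a2 + R * a1) / (1 + R)

def foc_residual(p, q, γ, a1=2.0, a2=0.5):
    """Market-clearing FOC at x1 = 1:
    q (a1 - p) e^{-γ(a1 - p)} - (1 - q)(p - a2) e^{γ(p - a2)} = 0."""
    return (q * (a1 - p) * np.exp(-γ * (a1 - p))
            - (1 - q) * (p - a2) * np.exp(γ * (p - a2)))

q_grid = np.linspace(0.05, 0.95, 19)
for γ in (0.5, 1.0, 2.0):
    p = p_star(q_grid, γ)
    assert np.all(np.abs(foc_residual(p, q_grid, γ)) < 1e-10)  # solves FOC
    assert np.all(np.diff(p) > 0)                              # monotone in q
```

In exact arithmetic the residual is identically zero, so the tolerance only absorbs floating-point noise.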
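The KL formula and the "time-to-learn" scaling in `km_ex2` can also be checked with a minimal simulation (the seed and sample size here are illustrative, not taken from the lecture's code cells): with equal variances, the per-period drift of the log likelihood ratio equals $D_{KL}$, so the hard pair ($D_{KL} = 0.125$) learns roughly $16$ times more slowly than the easy pair ($D_{KL} = 2.0$), yet both posteriors are essentially at $1$ by $T = 1{,}000$:

```python
import numpy as np

def kl_normal_equal_var(m1, m2, σ):
    """D_KL(N(m1, σ²) || N(m2, σ²)) for a common variance σ²."""
    return (m1 - m2) ** 2 / (2 * σ ** 2)

def posterior_true_weight(prices, m1, m2, σ, h0=0.5):
    """Posterior weight on model 1 after observing `prices`, computed
    via the log posterior odds for numerical stability."""
    llr = ((prices - m2) ** 2 - (prices - m1) ** 2).sum() / (2 * σ ** 2)
    log_odds = np.log(h0 / (1 - h0)) + llr
    return 1.0 / (1.0 + np.exp(-log_odds))

rng = np.random.default_rng(0)
σ_p, m_true = 0.4, 2.0
prices = m_true + σ_p * rng.standard_normal(1_000)

for m_alt, label in [(1.2, "easy"), (1.8, "hard")]:
    D = kl_normal_equal_var(m_true, m_alt, σ_p)
    h_T = posterior_true_weight(prices, m_true, m_alt, σ_p)
    print(f"{label}: D_KL = {D:.3f}, posterior on truth at T=1000: {h_T:.4f}")
```

Working in log odds matters here: the easy pair accumulates roughly $2{,}000$ nats of evidence over $1{,}000$ periods, which would overflow a naive `exp` of the likelihood ratio.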
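Finally, the recovery step in the `ross_recovery.md` hunks — Perron–Frobenius eigenvalue $\beta$, positive eigenvector $z$, and natural transition matrix $F_{ij} = P_{ij}\, z_j / (\beta z_i)$ — can be sketched end to end. The $3 \times 3$ state-price matrix below is hypothetical, chosen with constant row sums $0.95$ so that $\beta = 0.95$ and $z \propto \mathbf{1}$ by construction; it is not Ross's data:

```python
import numpy as np

# Hypothetical state-price transition matrix (constant row sums = 0.95)
P = np.array([[0.45, 0.30, 0.20],
              [0.25, 0.45, 0.25],
              [0.20, 0.30, 0.45]])

# Perron–Frobenius eigenvalue and positive right eigenvector of P
eigvals, eigvecs = np.linalg.eig(P)
k = np.argmax(eigvals.real)
β = eigvals[k].real              # recovered subjective discount factor
z = np.abs(eigvecs[:, k].real)   # Perron eigenvector, normalized positive

# Recovered natural transition probabilities: F_ij = P_ij z_j / (β z_i)
F = P * z[np.newaxis, :] / (β * z[:, np.newaxis])

assert np.allclose(F.sum(axis=1), 1.0)   # F is a stochastic matrix
assert β <= P.sum(axis=1).max() + 1e-12  # Corollary 1: β ≤ max row sum
ρ = -np.log(β)                           # subjective discount rate (T = 1)
```

Because the row sums are equal here, the bound in Corollary 1 holds with equality and $F = P/\beta$; with unequal row sums the destination-state weights $z_j/z_i$ reshape $P$ state by state.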