From 1a4f0523d9b38ac0f62c1dce87cd8e740c1849c9 Mon Sep 17 00:00:00 2001 From: thomassargent30 Date: Sun, 12 Apr 2026 12:59:29 -0600 Subject: [PATCH 01/10] Tom's edits of a new Kihlstrom lecture --- .../quantecon-lecture-writing.instructions.md | 87 ++ .github/prompts/quantecon-lecture.prompt.md | 209 +++ lectures/_static/quant-econ.bib | 53 + lectures/_toc.yml | 1 + lectures/information_market_equilibrium.md | 1185 +++++++++++++++++ 5 files changed, 1535 insertions(+) create mode 100644 .github/instructions/quantecon-lecture-writing.instructions.md create mode 100644 .github/prompts/quantecon-lecture.prompt.md create mode 100644 lectures/information_market_equilibrium.md diff --git a/.github/instructions/quantecon-lecture-writing.instructions.md b/.github/instructions/quantecon-lecture-writing.instructions.md new file mode 100644 index 000000000..ee3b607f7 --- /dev/null +++ b/.github/instructions/quantecon-lecture-writing.instructions.md @@ -0,0 +1,87 @@ +--- +applyTo: "lectures/**/*.md" +description: "MyST markdown and QuantEcon lecture writing conventions. Applied when editing or creating files in the lectures/ directory." +--- + +# QuantEcon Lecture Writing Conventions + +## Equation Spacing (Critical) + +Display equations **must** have a blank line before `$$` and after `$$`: + +``` +text before + +$$ +equation here +$$ + +text after +``` + +Never place `$$` immediately adjacent to text lines. + +## File Frontmatter + +Every lecture `.md` file starts with: + +```yaml +--- +jupytext: + text_representation: + extension: .md + format_name: myst + format_version: 0.13 + jupytext_version: 1.17.1 +kernelspec: + display_name: Python 3 (ipykernel) + language: python + name: python3 +--- +``` + +## Cross-Reference Label + +Immediately after frontmatter, before the title: + +``` +(lecture_label)= +```{raw} jupyter +
...
+``` + +# Title +``` + +## Code Cells + +All executable Python uses `` ```{code-cell} ipython3 ``. +Non-executable code uses `` ```python ``. + +## Citations and References + +- Cite with `{cite}` `` `BibKey` `` +- Check `lectures/_static/quant-econ.bib` for existing keys before adding new ones +- New references go in a separate `_extra.bib` file alongside the lecture + +## Exercises + +Use the paired directives: + +``` +```{exercise} +:label: label_ex1 +... +``` + +```{solution-start} label_ex1 +:class: dropdown +``` +... +```{solution-end} +``` +``` + +## Preferred Python Libraries + +`numpy`, `scipy`, `matplotlib`, `quantecon`, `jax` (for computationally intensive work), `numba` diff --git a/.github/prompts/quantecon-lecture.prompt.md b/.github/prompts/quantecon-lecture.prompt.md new file mode 100644 index 000000000..453651434 --- /dev/null +++ b/.github/prompts/quantecon-lecture.prompt.md @@ -0,0 +1,209 @@ +--- +name: "QuantEcon Lecture from Paper" +description: "Convert a scientific paper (PDF or .tex) into a QuantEcon lecture in MyST markdown. Attach the paper file before invoking. Produces a .md lecture file and a supplementary .bib file." +argument-hint: "Attach the paper PDF or .tex file, then optionally specify the desired output filename (e.g. 'my_topic.md')" +agent: "agent" +--- + +You are helping Thomas Sargent convert a scientific paper into a QuantEcon lecture written in the MyST dialect of markdown, following the style and conventions of [lectures/likelihood_ratio_process.md](../lectures/likelihood_ratio_process.md). + +## Your Task + +1. **Read the attached paper** (PDF or .tex). Understand its core economic/mathematical content, key results, key intuitions, and analytical techniques. + +2. **Draft a complete QuantEcon lecture** as a `.md` file in `lectures/`. The lecture should: + - Explain the paper's ideas accessibly to a graduate student audience + - Lead the reader through the theory step by step, not just summarize + - Include substantial Python code cells that illustrate, compute, and visualize the paper's key results + - End with exercises (with full solutions in dropdown blocks) + +3. **Produce a supplementary `.bib` file** for any references not already in [lectures/_static/quant-econ.bib](../lectures/_static/quant-econ.bib). + +--- + +## MyST / Jupyter Book Format Rules + +Follow these rules exactly. Study [lectures/likelihood_ratio_process.md](../lectures/likelihood_ratio_process.md) as the canonical example. + +### File Frontmatter (required, verbatim structure) + +``` +--- +jupytext: + text_representation: + extension: .md + format_name: myst + format_version: 0.13 + jupytext_version: 1.17.1 +kernelspec: + display_name: Python 3 (ipykernel) + language: python + name: python3 +--- +``` + +### Required Header Block + +Immediately after the frontmatter, add a cross-reference label and the QuantEcon logo block: + +``` +(my_lecture_label)= +```{raw} jupyter +
+ + QuantEcon + +
+``` + +# Lecture Title + +```{contents} Contents +:depth: 2 +``` +``` + +### Equations — CRITICAL SPACING RULE + +Every display equation block **must** have a blank line before the opening `$$` and a blank line after the closing `$$`. This is mandatory. + +**Correct:** + +``` +some text before + +$$ +E[x] = \mu +$$ + +some text after +``` + +**Wrong (will break the build):** + +``` +some text before +$$ +E[x] = \mu +$$ +some text after +``` + +Inline math uses single dollars: $\mu$, $\sigma^2$. + +Multi-line aligned equations use: + +``` +$$ +\begin{aligned} +a &= b \\ +c &= d +\end{aligned} +$$ +``` + +### Code Cells + +Use ` ```{code-cell} ipython3 ` for all executable Python. For the `pip install` cell at the top (if needed): + +``` +```{code-cell} ipython3 +:tags: [hide-output] +!pip install --upgrade quantecon +``` +``` + +### Citations + +Use `{cite}` with the BibTeX key: `{cite}` `` `Author_Year` ``. Example: `{cite}` `` `Neyman_Pearson` ``. + +Check [lectures/_static/quant-econ.bib](../lectures/_static/quant-econ.bib) first. Add only truly missing references to the new `.bib` file. + +### Cross-references + +- Link to other lectures: `{doc}` `` `likelihood_ratio_process` `` +- Label a section: `(my_label)=` on the line before the heading +- Reference a label: `{ref}` `` `my_label` `` + +### Admonitions + +``` +```{note} +... +``` + +```{warning} +... +``` +``` + +### Exercises with Solutions + +``` +```{exercise} +:label: ex_label1 + +Exercise text here. +``` + +```{solution-start} ex_label1 +:class: dropdown +``` + +Full solution here, including code cells if needed. + +```{solution-end} +``` +``` + +--- + +## Lecture Structure Template + +Follow this section order: + +1. **Overview** — What is this lecture about? What will the reader learn? List bullets. +2. **Setup** — Imports code cell (all needed libraries). If non-standard packages are needed, add the `pip install` cell first. +3. **Theory sections** — Walk through mathematical content. Alternate prose, equations, and code cells. Each major concept gets its own `##` section. +4. **Computational/Simulation sections** — Python code that replicates or extends the paper's numerical results. +5. **Exercises** — 2–4 exercises ranging from straightforward to challenging, each with a full solution. +6. **References** — at the end, just add: `` ```{bibliography} `` on its own if references were cited (the global bib handles this automatically via `_config.yml`). + +--- + +## Python Code Guidelines + +- Use `numpy`, `scipy`, `matplotlib`, `quantecon` as the default stack +- Prefer `jax.numpy` / JAX for computationally intensive sections (this repo already has JAX installed) +- Every figure should call `plt.show()` or `plt.tight_layout(); plt.show()` +- Write clean, readable code with short docstrings on functions +- Simulate and plot the paper's key theoretical results rather than just describing them + +--- + +## Supplementary BibTeX File + +Name it `lectures/_static/_extra.bib`. Format example: + +```bibtex +@article{Author_Year, + author = {Last, First and Last2, First2}, + title = {Full Title of the Paper}, + journal = {Journal Name}, + volume = {XX}, + number = {Y}, + pages = {1--30}, + year = {YYYY} +} +``` + +Only include references **not already found** in `lectures/_static/quant-econ.bib`. + +--- + +## Output + +Produce the complete lecture as a single MyST markdown file. After completing it, also report: +- The name and path of the output file (e.g. 
`lectures/my_topic.md`) +- The name and path of the supplementary bib file (if any new references were needed) +- A brief (3–5 bullet) summary of what the lecture covers diff --git a/lectures/_static/quant-econ.bib b/lectures/_static/quant-econ.bib index 82f5cc7ec..547006f33 100644 --- a/lectures/_static/quant-econ.bib +++ b/lectures/_static/quant-econ.bib @@ -3631,3 +3631,56 @@ @article{shwartz_ziv_tishby2017 journal = {arXiv preprint arXiv:1703.00810}, year = 2017 } + +@article{kihlstrom_mirman1975, + author = {Kihlstrom, Richard E. and Mirman, Leonard J.}, + title = {Information and Market Equilibrium}, + journal = {The Bell Journal of Economics}, + volume = {6}, + number = {1}, + pages = {357--376}, + year = {1975}, + publisher = {The RAND Corporation} +} + +@article{muth1961, + author = {Muth, John F.}, + title = {Rational Expectations and the Theory of Price Movements}, + journal = {Econometrica}, + volume = {29}, + number = {3}, + pages = {315--335}, + year = {1961} +} + +@article{radner1972, + author = {Radner, Roy}, + title = {Existence of Equilibrium Plans, Prices, and Price Expectations + in a Sequence of Markets}, + journal = {Econometrica}, + volume = {40}, + number = {2}, + pages = {289--304}, + year = {1972} +} + +@article{arrow1964, + author = {Arrow, Kenneth J.}, + title = {The Role of Securities in the Optimal Allocation of Risk-Bearing}, + journal = {Review of Economic Studies}, + volume = {31}, + number = {2}, + pages = {91--96}, + year = {1964} +} + +@article{grossman1976, + author = {Grossman, Sanford J.}, + title = {On the Efficiency of Competitive Stock Markets Where Trades Have + Diverse Information}, + journal = {Journal of Finance}, + volume = {31}, + number = {2}, + pages = {573--585}, + year = {1976} +} diff --git a/lectures/_toc.yml b/lectures/_toc.yml index 28999d83f..a24169f91 100644 --- a/lectures/_toc.yml +++ b/lectures/_toc.yml @@ -43,6 +43,7 @@ parts: - file: exchangeable - file: likelihood_bayes - file: blackwell_kihlstrom + - file: information_market_equilibrium - file: mix_model - file: navy_captain - file: merging_of_opinions diff --git a/lectures/information_market_equilibrium.md b/lectures/information_market_equilibrium.md new file mode 100644 index 000000000..00063a9a1 --- /dev/null +++ b/lectures/information_market_equilibrium.md @@ -0,0 +1,1185 @@ +--- +jupytext: + text_representation: + extension: .md + format_name: myst + format_version: 0.13 + jupytext_version: 1.17.1 +kernelspec: + display_name: Python 3 (ipykernel) + language: python + name: python3 +--- + +(information_market_equilibrium)= +```{raw} jupyter + +``` + +# Information and Market Equilibrium + +```{contents} Contents +:depth: 2 +``` + +## Overview + +This lecture studies two questions about the **informational role of prices** posed and +answered by {cite:t}`kihlstrom_mirman1975`. + +1. **When do prices transmit inside information?** An informed insider observes a private + signal correlated with an unknown state of the world and adjusts demand accordingly. + Equilibrium prices shift. Under what conditions can an outside observer *infer* the + insider's private signal from the equilibrium price? + +2. **Do Bayesian price expectations converge?** In a stationary stochastic exchange + economy, an uninformed observer uses the history of market prices and Bayes' Law to form + expectations about the economy's structure. Do those expectations eventually + agree with those of a fully informed observer? 
Kihlstrom and Mirman's answers rely on two classical ideas from statistics:

- **Blackwell sufficiency**: a random variable $\tilde{y}$ is *sufficient* for
  $\tilde{y}'$ with respect to an unknown state if knowing $\tilde{y}$ gives all the
  information about the state that $\tilde{y}'$ contains.
- **Bayesian consistency**: as the sample grows, the posterior concentrates on the true
  parameter value (even when the underlying economic structure is not globally identified from prices alone).

Important findings of {cite:t}`kihlstrom_mirman1975` are:

- Equilibrium prices transmit inside information **if and only if** the map from the
  insider's posterior distribution to the equilibrium price vector is invertible
  (one-to-one).
- For a two-state pure exchange economy with CES preferences, invertibility holds whenever the
  elasticity of substitution $\sigma \neq 1$. With Cobb-Douglas preferences ($\sigma = 1$)
  the equilibrium price is independent of the insider's posterior, so information is never
  transmitted.
- In the dynamic economy, as information accumulates, Bayesian price expectations converge to **rational expectations**, even when the deep structure of the economy is not identified.

```{note}
{cite:t}`kihlstrom_mirman1975` use the terms ''reduced form'' and ''structural'' models in the same
way that careful econometricians do. These two objects come in pairs. To each structure or structural model
there is a reduced form, or collection of reduced forms traced out by different possible regressions.
```

The lecture is organized as follows.

1. Set up the static two-commodity model and define equilibrium.
2. State the price-revelation theorem (Theorem 1 of the paper) and the invertibility
   conditions (Theorem 2).
3. Illustrate invertibility — and its failure — with numerical examples using CES and
   Cobb-Douglas preferences.
4. Introduce the dynamic stochastic economy and derive the Bayesian convergence result.
5. Simulate Bayesian learning from price observations.

This lecture builds on ideas in {doc}`blackwell_kihlstrom` and {doc}`likelihood_bayes`.

## Setup

```{code-cell} ipython3
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import brentq
from scipy.stats import norm
```

## A Two-Commodity Economy with an Informed Insider

### Preferences, Endowments, and the Unknown State

The economy has two goods.

Good 2 is the numeraire (price normalized to 1); good 1 trades
at price $p > 0$.

An unknown parameter $\bar{a}$ affects the value of good 1.

Agent $i$'s expected utility
from a bundle $(x_1^i, x_2^i)$ is

$$
U^i(x_1^i, x_2^i)
  = \sum_{s=1}^{S} u^i(a_s x_1^i,\, x_2^i)\, PR^i(\bar{a} = a_s),
$$

where $PR^i$ is agent $i$'s subjective probability distribution over the finite state space
$A = \{a_1, \ldots, a_S\}$.

Each agent starts with an endowment $w^i$ of good 2 and a share $\theta^i$ of the
representative firm.

The firm's profit $\pi$ is determined by profit maximization.

Agent
$i$'s **budget constraint** is

$$
p x_1^i + x_2^i = w^i + \theta^i \pi.
$$

Agents maximize expected utility subject to their budget constraints.

A **competitive
equilibrium** is a price $\hat{p}$ that clears both markets simultaneously.

### The Informed Agent's Problem

Suppose **agent 1** (the insider) observes a private signal $\tilde{y}$ correlated with
$\bar{a}$ before trading.
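
For concreteness, here is a minimal numerical sketch of such a signal structure
with two states and two signal values (all numbers hypothetical); the Bayes'-rule
update it performs is stated formally just below:

```{code-cell} ipython3
# Hypothetical likelihoods phi_a(y) = PR(y | a), one row per state
phi = np.array([[0.8, 0.2],    # PR(y | a = a_1)
                [0.3, 0.7]])   # PR(y | a = a_2)
prior = np.array([0.5, 0.5])   # agent 1's prior over (a_1, a_2)

joint = prior[:, None] * phi   # PR(a, y)
p_y = joint.sum(axis=0)        # marginal PR(y)
posterior = joint / p_y        # column y holds mu_y = PR(a | y)
print(posterior)
```
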
+ +Upon observing $\tilde{y} = y$, agent 1 updates their prior +$\mu = PR^1$ to a **posterior** $\mu_y = (\mu_{y1}, \ldots, \mu_{yS})$ via Bayes' rule: + +$$ +\mu_{ys} = PR(\bar{a} = a_s \mid \tilde{y} = y). +$$ + +Because agent 1's demand depends on $\mu_y$, the new equilibrium price satisfies + +$$ +\hat{p} = p(\mu_y). +$$ + +Outside observers who see $\hat{p}$ but not $\tilde{y}$ can try to *back out* the +insider's posterior from the price. + +This is possible when the map $\mu \mapsto p(\mu)$ +is **invertible** on the relevant domain. + +(price_revelation_theorem)= +## Price Revelation: Theorem 1 + +### Blackwell Sufficiency + +The price variable $p(\mu_{\tilde{y}})$ *accurately transmits* the insider's private +information if observing the equilibrium price is just as informative about $\bar{a}$ as +observing the signal $\tilde{y}$ directly. + +In Blackwell's language ({cite}`blackwell1951` and {cite}`blackwell1953`), this means +$p(\mu_{\tilde{y}})$ is **sufficient** for $\tilde{y}$. + +**Definition.** A random variable $\tilde{y}$ is *sufficient* for $\tilde{y}'$ (with +respect to $\bar{a}$) if there exists a conditional distribution $PR(y' \mid y)$, +**independent of $\bar{a}$**, such that + +$$ +\phi'_a(y') = \sum_{y \in Y} PR(y' \mid y)\, \phi_a(y) +\quad \text{for all } a \text{ and all } y', +$$ + +where $\phi_a(y) = PR(\tilde{y} = y \mid \bar{a} = a)$. + +Thus, once $\tilde{y}$ is known, $\tilde{y}'$ provides no additional information +about $\bar{a}$. + +**Lemma 1** ({cite:t}`kihlstrom_mirman1975`). The posterior distribution $\mu_{\tilde{y}}$ +is sufficient for $\tilde{y}$. + +*Proof sketch.* The posterior $\mu_{\tilde{y}}$ satisfies + +$$ +PR(\bar{a} = a_s \mid \mu_{\tilde{y}} = \mu_y,\; \tilde{y} = y) = \mu_{ys} + = PR(\bar{a} = a_s \mid \mu_{\tilde{y}} = \mu_y). +$$ + +Because the posterior itself *encodes* what $\tilde{y}$ says about $\bar{a}$, observing +$\tilde{y}$ directly would add no information. $\square$ + +**Theorem 1** ({cite:t}`kihlstrom_mirman1975`). In the economy described above, the price +random variable $p(\mu_{\tilde{y}})$ is sufficient for $\tilde{y}$ **if and only if** the +function $p(PR^1)$ is **invertible** on the set + +$$ +P \equiv \bigl\{\, p(\mu_y) : y \in Y,\; + PR(\tilde{y} = y) = \sum_{a \in A} \phi_a(y)\,\mu(a) > 0 \bigr\}. +$$ + +The "only if" direction follows because if $p$ were not one-to-one, two different posteriors +would generate the same price; an observer could not distinguish them, so the price would +not transmit all information that resides in the signal. + +### Two Interpretations + +**Insider trading in a stock market.** Good 1 is a risky asset with random return $\bar{a}$; +good 2 is ''money''. An insider's demand reveals private information about the return. +If the invertibility condition holds, outside observers can read the insider's signal from +the equilibrium stock price. + +**Price as a quality signal.** Good 1 has uncertain quality $\bar{a}$. Experienced +consumers (who have sampled the good) observe a signal correlated with quality and buy +accordingly. Uninformed consumers can infer quality from the market price, provided +invertibility holds. + +(invertibility_conditions)= +## Invertibility and the Elasticity of Substitution (Theorem 2) + +When does $p(PR^1)$ fail to be invertible? + +Theorem 2 of {cite:t}`kihlstrom_mirman1975` +shows that for a two-state economy ($S = 2$), the answer turns on the **elasticity of +substitution** $\sigma$ of agent 1's utility function. 
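
Before specializing to two states, here is a quick numerical illustration of the
sufficiency definition above (likelihoods hypothetical): if $\tilde{y}'$ is produced
from $\tilde{y}$ by a garbling kernel $PR(y' \mid y)$ that does not depend on
$\bar{a}$, then $\tilde{y}$ is sufficient for $\tilde{y}'$ by construction.

```{code-cell} ipython3
# phi_a(y) = PR(y | a); K[y, y'] = PR(y' | y), independent of the state a
phi = np.array([[0.8, 0.2],
                [0.3, 0.7]])
K = np.array([[0.9, 0.1],
              [0.2, 0.8]])

# phi'_a(y') = sum_y PR(y' | y) phi_a(y) -- the defining identity
phi_prime = phi @ K
print(phi_prime)
```
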
### The Two-State First-Order Condition

With $S = 2$ and $\mu = (q,\, 1-q)$, the first-order condition for agent 1's demand
(equation (12a) in the paper) reduces to

$$
p(q) = \frac{\alpha_1 q + \alpha_2 (1-q)}{\beta_1 q + \beta_2 (1-q)},
$$

where

$$
\alpha_s = a_s\, u^1_1(a_s x_1,\, x_2), \qquad
\beta_s = u^1_2(a_s x_1,\, x_2), \qquad s = 1, 2.
$$

The equilibrium consumption $(x_1, x_2)$ itself depends on $p$, so this is an implicit
equation in $p$.

**Theorem 2** ({cite:t}`kihlstrom_mirman1975`). Assume $u^1$ is quasi-concave and
homothetic with continuous first partials. Assume agent 1 always consumes positive
quantities of both goods. For $S = 2$:

- If $\sigma < 1$ for all feasible allocations, $p(PR^1)$ is **invertible** on $P$.
- If $\sigma > 1$ for all feasible allocations, $p(PR^1)$ is **invertible** on $P$.
- If $u^1$ is **Cobb-Douglas** ($\sigma = 1$), $p(PR^1)$ is **constant** on $P$
  (no information is transmitted).

Thus, when $\sigma = 1$ the income and substitution effects exactly cancel,
making agent 1's demand for good 1 independent of information about $\bar{a}$.

So the market price cannot reveal that information.

### CES Utility

For concreteness we work with the **constant-elasticity-of-substitution** (CES) utility
function

$$
u(c_1, c_2) = \bigl(c_1^{\rho} + c_2^{\rho}\bigr)^{1/\rho}, \qquad \rho \in (-\infty,0) \cup (0,1),
$$

whose elasticity of substitution is $\sigma = 1/(1-\rho)$.

- $\rho \to 0$: Cobb-Douglas ($\sigma = 1$).
- $\rho < 0$: $\sigma < 1$ (complements).
- $0 < \rho < 1$: $\sigma > 1$ (substitutes).

Pertinent partial derivatives are

$$
u_1(c_1,c_2) = \bigl(c_1^\rho + c_2^\rho\bigr)^{1/\rho - 1}\, c_1^{\rho-1}, \qquad
u_2(c_1,c_2) = \bigl(c_1^\rho + c_2^\rho\bigr)^{1/\rho - 1}\, c_2^{\rho-1}.
$$

### Equilibrium Price as a Function of the Posterior

We focus on agent 1 as the *only* informed trader who absorbs one unit of good 1 at
equilibrium (i.e., $x_1 = 1$).

Agent 1's budget constraint then reduces to
$x_2 = W^1 - p$, and the equilibrium price is the unique $p \in (0, W^1)$ satisfying
the first-order condition

$$
p \bigl[q\, u_2(a_1,\, W^1-p) + (1-q)\, u_2(a_2,\, W^1-p)\bigr]
= q\, a_1\, u_1(a_1,\, W^1-p) + (1-q)\, a_2\, u_1(a_2,\, W^1-p).
$$

For Cobb-Douglas utility ($\sigma = 1$), the first-order condition reduces to $p = W^1 - p$,
giving $p^* = W^1/2$ regardless of the posterior $q$—confirming that no information
is transmitted through the price in the Cobb-Douglas case.

We solve the first-order condition numerically below.

```{code-cell} ipython3
def ces_derivatives(c1, c2, rho):
    """
    Returns (u1, u2) for u(c1,c2) = (c1^rho + c2^rho)^(1/rho).
    Uses Cobb-Douglas limit for |rho| < 1e-4 to avoid numerical overflow.
    """
    if abs(rho) < 1e-4:
        # Cobb-Douglas limit u = sqrt(c1*c2)
        u1 = 0.5 * np.sqrt(c2 / c1)
        u2 = 0.5 * np.sqrt(c1 / c2)
    else:
        common = (c1**rho + c2**rho)**(1/rho - 1)
        u1 = common * c1**(rho - 1)
        u2 = common * c2**(rho - 1)
    return u1, u2


def eq_price(q, a1, a2, W1, rho):
    """
    Solve for the equilibrium price when the informed agent absorbs one unit
    of good 1. With x1 = 1 and budget constraint x2 = W1 - p, the FOC

    p [q u2(a1, x2) + (1-q) u2(a2, x2)] = q a1 u1(a1, x2) + (1-q) a2 u1(a2, x2)

    has a unique root p* in (0, W1).
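
    The residual function below is the LHS minus the RHS of this FOC;
    since x2 = W1 - p must stay positive, the root is bracketed in
    (0, W1) and located with scipy's brentq.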
+ + Parameters + ---------- + q : posterior probability on state 1 (high state) + a1 : state-1 productivity value (a1 > a2) + a2 : state-2 productivity value + W1 : informed agent's wealth + rho : CES parameter (rho=0 → Cobb-Douglas; analytical p* = W1/2) + + Returns + ------- + p_star : equilibrium price, or nan if solver fails + """ + def residual(p): + x2 = W1 - p # x1 = 1 absorbed at equilibrium + u1_s1, u2_s1 = ces_derivatives(a1, x2, rho) + u1_s2, u2_s2 = ces_derivatives(a2, x2, rho) + lhs = p * (q * u2_s1 + (1 - q) * u2_s2) + rhs = q * a1 * u1_s1 + (1 - q) * a2 * u1_s2 + return lhs - rhs + + try: + return brentq(residual, 1e-6, W1 - 1e-6, xtol=1e-10) + except ValueError: + return np.nan +``` + +```{code-cell} ipython3 +# ── Economy parameters ────────────────────────────────────────────────────── +a1, a2 = 2.0, 0.5 # state values (a1 > a2) +W1 = 4.0 # informed agent's wealth; equilibrium x2 = W1 - p + +# Posterior grid +q_grid = np.linspace(0.05, 0.95, 200) + +# rho values to compare: complements (<0), Cobb-Douglas (=0), substitutes (>0) +rho_values = [-0.5, 0.0, 0.5] +rho_labels = [r"$\rho = -0.5$ ($\sigma = 0.67$, complements)", + r"$\rho = 0$ ($\sigma = 1$, Cobb-Douglas)", + r"$\rho = 0.5$ ($\sigma = 2$, substitutes)"] +colors = ["steelblue", "crimson", "forestgreen"] + +fig, ax = plt.subplots(figsize=(8, 5)) + +for rho, label, color in zip(rho_values, rho_labels, colors): + prices = [eq_price(q, a1, a2, W1, rho) for q in q_grid] + ax.plot(q_grid, prices, label=label, color=color, lw=2) + +ax.set_xlabel(r"Posterior probability $q = \Pr(\bar{a} = a_1)$", fontsize=12) +ax.set_ylabel("Equilibrium price $p^*(q)$", fontsize=12) +ax.set_title("Equilibrium price as a function of the informed agent's posterior", + fontsize=12) +ax.legend(fontsize=10) +ax.grid(alpha=0.3) +plt.tight_layout() +plt.show() +``` + +The plot confirms Theorem 2. + +- **CES with $\sigma \neq 1$**: the equilibrium price is **strictly monotone** in $q$. + An outside observer who knows the equilibrium map $p^*(\cdot)$ can uniquely invert the + price to recover $q$—inside information is fully transmitted. +- **Cobb-Douglas ($\sigma = 1$)**: the price is *flat* in $q$—information is never + transmitted through the market. + +```{code-cell} ipython3 +# ── Verify that rho=0 (exact Cobb-Douglas) gives a flat line ───────────────── +p_cd = [eq_price(q, a1, a2, W1, rho=0.0) for q in q_grid] + +print(f"Cobb-Douglas (rho=0): min p* = {min(p_cd):.6f}, " + f"max p* = {max(p_cd):.6f}, " + f"range = {max(p_cd)-min(p_cd):.2e}") +print(f"Analytical CD price = W1/2 = {W1/2:.6f}") +``` + +Every entry equals $W^1/2 = 2.0$ exactly, confirming analytically that the Cobb-Douglas +equilibrium price is independent of $q$ and of the state values $a_1, a_2$. + +(price_monotonicity)= +### Why Monotonicity Depends on $\sigma$ + +The derivative $\partial p / \partial q$ has the sign of $\alpha_1 \beta_2 - \alpha_2 \beta_1$ +(from differentiating the FOC formula). + +Using + +$$ +\frac{\alpha_s}{\beta_s} + = \frac{a_s\, u_1(a_s x_1, x_2)}{u_2(a_s x_1, x_2)} + = a_s^{(\sigma-1)/\sigma}\,\Bigl(\frac{x_2}{x_1}\Bigr)^{1/\sigma}, +$$ + +one can show + +$$ +\frac{\partial}{\partial a}\,\frac{\alpha}{\beta} + = \frac{(\sigma - 1)}{\sigma}\, a^{-1/\sigma}\, + \Bigl(\frac{x_2}{x_1}\Bigr)^{1/\sigma}. +$$ + +This is positive when $\sigma > 1$, negative when $\sigma < 1$, and **zero when $\sigma = 1$** +(Cobb-Douglas). 
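
A quick finite-difference check of these signs, reusing `ces_derivatives` from
above at a fixed bundle $(x_1, x_2) = (1, 1)$:

```{code-cell} ipython3
a0, h = 1.5, 1e-5   # evaluation point and step for the finite difference

for rho in [-0.5, 1e-6, 0.5]:
    # bind the current rho via a default argument
    def ratio(a, rho=rho):
        u1, u2 = ces_derivatives(a * 1.0, 1.0, rho)
        return a * u1 / u2
    deriv = (ratio(a0 + h) - ratio(a0 - h)) / (2 * h)
    sigma = 1 / (1 - rho)
    print(f"sigma = {sigma:.2f}: d(alpha/beta)/da ≈ {deriv:+.6f}")
```
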
+ +The vanishing derivative means the marginal rate of substitution is +independent of $a_s$, so the informed agent's demand—and hence the equilibrium price—does +not respond to changes in beliefs. + +Let us visualize the ratio $\alpha_s / \beta_s$ as a function of $a_s$ for different +values of $\sigma$: + +```{code-cell} ipython3 +a_vals = np.linspace(0.3, 3.0, 300) +x1_fix, x2_fix = 1.0, 1.0 # fix consumption bundle for illustration + +fig, ax = plt.subplots(figsize=(7, 4)) +for rho, color in zip([-0.5, -1e-6, 0.5], ["steelblue", "crimson", "forestgreen"]): + sigma = 1 / (1 - rho) if abs(rho) > 1e-8 else 1.0 + ratios = [] + for a in a_vals: + u1, u2 = ces_derivatives(a * x1_fix, x2_fix, rho) + ratios.append(a * u1 / u2) + ax.plot(a_vals, ratios, + label=rf"$\sigma = {sigma:.2f}$", color=color, lw=2) + +ax.set_xlabel(r"State value $a_s$", fontsize=12) +ax.set_ylabel(r"$\alpha_s / \beta_s = a_s u_1 / u_2$", fontsize=12) +ax.set_title(r"Marginal rate of substitution $\alpha_s/\beta_s$ vs.\ $a_s$", fontsize=12) +ax.axhline(y=1.0, color="black", lw=0.8, ls="--") +ax.legend(fontsize=10) +ax.grid(alpha=0.3) +plt.tight_layout() +plt.show() +``` + +When $\sigma = 1$ (red line) the ratio is constant across all $a_s$ values—information +about the state has no effect on the marginal rate of substitution. + +For $\sigma < 1$ the +ratio is decreasing in $a_s$, and for $\sigma > 1$ it is increasing, making the +equilibrium price strictly monotone in the posterior $q$ in both cases. + +(bayesian_price_expectations)= +## Bayesian Price Expectations in a Dynamic Economy + +We now turn to the **dynamic** question of Section 3 in {cite:t}`kihlstrom_mirman1975`. + +### A Stochastic Exchange Economy + +Time is discrete: $t = 1, 2, \ldots$ In each period $t$: + +1. Consumer $i$ receives a random endowment $\omega_i^t$. +2. Markets open; competitive prices $p^t = p(\omega^t)$ clear all markets. +3. Consumers trade and consume. + +The endowment vectors $\{\tilde{\omega}^t\}$ are **i.i.d.** with density +$f(\omega^t \mid \lambda)$, where $\lambda = (\lambda_1, \ldots, \lambda_n)$ is a +**structural parameter vector** that is *fixed but unknown*. + +The equilibrium price at time $t$ is a deterministic function of $\omega^t$, so +$\{p^t\}$ is also i.i.d. with density + +$$ +g(p^t \mid \lambda) = \int f(\omega^t \mid \lambda)\, + \mathbf{1}\bigl[p(\omega^t) = p^t\bigr]\, d\omega^t. +$$ + +Following econometric convention, {cite:t}`kihlstrom_mirman1975` call $g(p \mid \lambda)$ +the **reduced form** and $f(\omega \mid \lambda)$ the **structure**. + +### The Identification Problem + +Because the map $\omega \mapsto p(\omega)$ is many-to-one, observing prices loses +information relative to observing endowments. + +In particular, it may be impossible to +recover $\lambda$ from $g(p \mid \lambda)$ even with infinite price data. + +To handle this, partition $\Lambda$ into equivalence classes $\mu$ such that +$\lambda \in \mu$ and $\lambda' \in \mu$ whenever $g(p \mid \lambda) = g(p \mid \lambda')$ +for all $p$. + +The equivalence class $\mu$ containing the true $\lambda$ is the **reduced +form** (with respect to data on prices). + +An observer who knows the infinite price history learns +$\mu$ but not necessarily $\lambda$. + +### Bayesian Updating + +An uninformed observer begins with a prior $h(\lambda)$ over $\lambda \in \Lambda$. 
+After observing the price sequence $(p^1, \ldots, p^t)$, the observer's Bayesian +posterior is + +$$ +h(\lambda \mid p^1, \ldots, p^t) + = \frac{h(\lambda)\, \prod_{\tau=1}^{t} g(p^\tau \mid \lambda)} + {\displaystyle\sum_{\lambda' \in \Lambda} + h(\lambda')\, \prod_{\tau=1}^{t} g(p^\tau \mid \lambda')}. +$$ + +At time $t$, the observer's price expectations for the next period are + +$$ +g(p^{t+1} \mid p^1, \ldots, p^t) + = \sum_{\lambda \in \Lambda} g(p^{t+1} \mid \lambda)\, + h(\lambda \mid p^1, \ldots, p^t). +$$ + +### The Convergence Theorem + +**Theorem** ({cite:t}`kihlstrom_mirman1975`, Section 3). Let $\bar\lambda$ be the true +structural parameter and $\bar\mu$ the reduced form that contains $\bar\lambda$. Then: + +$$ +\lim_{t \to \infty} h(\mu \mid p^1, \ldots, p^t) + = \begin{cases} 1 & \text{if } \mu = \bar\mu, \\ 0 & \text{otherwise,} \end{cases} +$$ + +with probability one. Consequently, + +$$ +\lim_{t \to \infty} g(p^{t+1} \mid p^1, \ldots, p^t) = g(p \mid \bar\mu), +$$ + +which equals the rational-expectations price distribution for a fully informed observer. + +The convergence uses the **Bayesian consistency** result of {cite:t}`degroot1962`: as +long as $g(\cdot \mid \mu)$ and $g(\cdot \mid \mu')$ generate mutually singular measures +(which holds here generically), the posterior concentrates on the true reduced form. + +**Key insight.** Price observers converge to **rational expectations** even if they +never identify the underlying structure $\bar\lambda$. It is the reduced form +$g(p \mid \bar\mu)$ that governs equilibrium price expectations, and the Bayesian +observer learns the reduced form from prices alone. + +(bayesian_simulation)= +## Simulating Bayesian Learning from Prices + +We illustrate the theorem with a two-state example. + +**Setup.** Two possible reduced forms $\mu_1$ and $\mu_2$ generate prices +$p^t \sim N(\bar{p}_i, \sigma_p^2)$ for $i = 1, 2$ respectively. The observer knows +the two possible price distributions (the reduced forms) but not which one governs the +data. + +This is a standard **Bayesian model selection** problem. With a prior $h_0$ on $\mu_1$ +and the observed price $p^t$, the posterior weight on $\mu_1$ after period $t$ is + +$$ +h_t = \frac{h_{t-1}\, g(p^t \mid \mu_1)}{h_{t-1}\, g(p^t \mid \mu_1) + + (1-h_{t-1})\, g(p^t \mid \mu_2)}. +$$ + +```{code-cell} ipython3 +def simulate_bayesian_learning(p_bar_true, p_bar_alt, sigma_p, T, h0, n_paths, + seed=42): + """ + Simulate Bayesian learning about which price distribution is true. 
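
    Each path draws T i.i.d. prices from the *true* model and applies the
    recursive posterior update for h_t given in the text.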
+ + Parameters + ---------- + p_bar_true : mean of the true reduced form + p_bar_alt : mean of the alternative reduced form + sigma_p : common standard deviation of price distributions + T : number of periods + h0 : initial prior probability on the true model + n_paths : number of simulation paths + seed : random seed + + Returns + ------- + h_paths : array of shape (n_paths, T+1) with posterior beliefs on true model + """ + rng = np.random.default_rng(seed) + h_paths = np.zeros((n_paths, T + 1)) + h_paths[:, 0] = h0 + + for path in range(n_paths): + h = h0 + prices = rng.normal(p_bar_true, sigma_p, size=T) + for t, p in enumerate(prices): + g_true = norm.pdf(p, loc=p_bar_true, scale=sigma_p) + g_alt = norm.pdf(p, loc=p_bar_alt, scale=sigma_p) + denom = h * g_true + (1 - h) * g_alt + h = h * g_true / denom + h_paths[path, t + 1] = h + + return h_paths + + +def plot_bayesian_learning(h_paths, p_bar_true, p_bar_alt, ax): + """Plot posterior beliefs over time.""" + T = h_paths.shape[1] - 1 + t_grid = np.arange(T + 1) + + for path in h_paths: + ax.plot(t_grid, path, alpha=0.25, lw=0.8, color="steelblue") + + median_path = np.median(h_paths, axis=0) + ax.plot(t_grid, median_path, color="navy", lw=2.5, label="Median posterior") + + ax.axhline(y=1.0, color="black", ls="--", lw=1.2, label="True model weight = 1") + ax.set_xlabel("Period $t$", fontsize=12) + ax.set_ylabel(r"$h_t$ = posterior weight on true model", fontsize=12) + ax.set_title( + rf"Bayesian learning: $\bar p_{{\\rm true}}={p_bar_true:.1f}$, " + rf"$\bar p_{{\\rm alt}}={p_bar_alt:.1f}$, $\sigma_p={sigma_p:.2f}$", + fontsize=11, + ) + ax.legend(fontsize=10) + ax.set_ylim(-0.05, 1.08) + ax.grid(alpha=0.3) +``` + +```{code-cell} ipython3 +T = 300 +h0 = 0.5 # diffuse prior +n_paths = 40 +sigma_p = 0.4 + +fig, axes = plt.subplots(1, 2, figsize=(12, 5)) + +# Case 1: distinct reduced forms (easy to learn) +p_bar_true, p_bar_alt = 2.0, 1.2 +h_paths = simulate_bayesian_learning(p_bar_true, p_bar_alt, sigma_p, T, h0, n_paths) +plot_bayesian_learning(h_paths, p_bar_true, p_bar_alt, axes[0]) +axes[0].set_title("Easy case: means far apart", fontsize=12) + +# Case 2: similar reduced forms (harder to learn) +p_bar_true, p_bar_alt = 2.0, 1.8 +h_paths_hard = simulate_bayesian_learning(p_bar_true, p_bar_alt, sigma_p, T, h0, n_paths) +plot_bayesian_learning(h_paths_hard, p_bar_true, p_bar_alt, axes[1]) +axes[1].set_title("Hard case: means close together", fontsize=12) + +plt.tight_layout() +plt.show() +``` + +In both panels the posterior weight on the true model converges to 1 with probability one, +though convergence is slower when the two price distributions are similar (right panel). + +### Price Expectations vs. Rational Expectations + +We now verify that the observer's price expectations converge to the rational-expectations +distribution $g(p \mid \bar\mu)$. + +```{code-cell} ipython3 +def price_expectation(h_t, p_bar_true, p_bar_alt, p_grid): + """ + Compute the observer's predictive price density at posterior weight h_t. + Mixture: h_t * N(p_bar_true, ...) + (1-h_t) * N(p_bar_alt, ...) 
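
    Note: sigma_p is read from the enclosing scope rather than passed as an
    argument, which keeps the signature short for the plotting loop below.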
+ """ + return (h_t * norm.pdf(p_grid, loc=p_bar_true, scale=sigma_p) + + (1 - h_t) * norm.pdf(p_grid, loc=p_bar_alt, scale=sigma_p)) + + +p_bar_true, p_bar_alt = 2.0, 1.2 +sigma_p = 0.4 +T_long = 1000 +n_paths = 1 +h_paths_long = simulate_bayesian_learning( + p_bar_true, p_bar_alt, sigma_p, T_long, h0=0.5, n_paths=n_paths, seed=7 +) + +p_grid = np.linspace(0.0, 3.5, 300) +re_density = norm.pdf(p_grid, loc=p_bar_true, scale=sigma_p) + +fig, ax = plt.subplots(figsize=(8, 5)) +snapshots = [0, 10, 50, 200, T_long] +palette = plt.cm.Blues(np.linspace(0.3, 1.0, len(snapshots))) + +for t_snap, col in zip(snapshots, palette): + h_t = h_paths_long[0, t_snap] + dens = price_expectation(h_t, p_bar_true, p_bar_alt, p_grid) + ax.plot(p_grid, dens, color=col, lw=2, + label=rf"$t = {t_snap}$, $h_t = {h_t:.3f}$") + +ax.plot(p_grid, re_density, "k--", lw=2.5, + label=r"Rational expectations $g(p \mid \bar\mu)$") +ax.set_xlabel("Price $p$", fontsize=12) +ax.set_ylabel("Density", fontsize=12) +ax.set_title("Observer's price distribution converges to rational expectations", fontsize=12) +ax.legend(fontsize=9) +ax.grid(alpha=0.3) +plt.tight_layout() +plt.show() +``` + +The sequence of predictive densities (shades of blue) converges to the rational-expectations +density (dashed black line) as experience accumulates. + +This illustrates the main theorem of +Section 3 of {cite:t}`kihlstrom_mirman1975`. + +(km_extension_nonidentification)= +### Learning the Reduced Form without Identifying the Structure + +The convergence result is particularly striking because the observer converges to +*rational expectations* even when the underlying **structure** $\lambda$ is +*not identified* by prices. + +To illustrate this, consider a case with *three* possible structures +$\lambda^{(1)}, \lambda^{(2)}, \lambda^{(3)}$ but only *two* reduced forms +$\mu_1 = \{\lambda^{(1)}, \lambda^{(2)}\}$ and $\mu_2 = \{\lambda^{(3)}\}$ +(because $\lambda^{(1)}$ and $\lambda^{(2)}$ generate the same price distribution). + +```{code-cell} ipython3 +def simulate_learning_3struct(T, h0_vec, p_bar_vec, sigma_p, true_idx, n_paths, seed=0): + """ + Bayesian learning with 3 structures, 2 reduced forms. 
+ h0_vec : length-3 array of initial prior weights on each structure + p_bar_vec: length-3 array of price means for each structure + (structures 0 and 1 share the same reduced form if p_bar_vec[0]==p_bar_vec[1]) + true_idx: index (0,1,2) of the true structure + Returns : array (n_paths, T+1, 3) posterior weights on each structure + """ + rng = np.random.default_rng(seed) + h_paths = np.zeros((n_paths, T + 1, 3)) + h_paths[:, 0, :] = h0_vec + + for path in range(n_paths): + h = np.array(h0_vec, dtype=float) + prices = rng.normal(p_bar_vec[true_idx], sigma_p, size=T) + for t, p in enumerate(prices): + likelihoods = norm.pdf(p, loc=p_bar_vec, scale=sigma_p) + h = h * likelihoods + h /= h.sum() + h_paths[path, t + 1, :] = h + + return h_paths + + +# Structures 0 and 1 have the same reduced form (same price mean) +p_bar_vec = np.array([2.0, 2.0, 1.2]) +h0_vec = np.array([1/3, 1/3, 1/3]) +sigma_p = 0.4 +T = 400 +true_idx = 0 # True structure is 0 (indistinguishable from 1) + +h_paths_3 = simulate_learning_3struct(T, h0_vec, p_bar_vec, sigma_p, true_idx, n_paths=30) +t_grid = np.arange(T + 1) + +fig, axes = plt.subplots(1, 3, figsize=(13, 4), sharey=True) +struct_labels = [r"$\lambda^{(1)}$", + r"$\lambda^{(2)}$ (same reduced form as $\lambda^{(1)}$)", + r"$\lambda^{(3)}$"] + +for k, (ax, label) in enumerate(zip(axes, struct_labels)): + for path in h_paths_3: + ax.plot(t_grid, path[:, k], alpha=0.25, lw=0.8, color="steelblue") + ax.plot(t_grid, np.median(h_paths_3[:, :, k], axis=0), + color="navy", lw=2.5, label="Median") + ax.set_title(f"Structure {label}", fontsize=10) + ax.set_xlabel("Period $t$", fontsize=11) + ax.grid(alpha=0.3) + ax.legend(fontsize=9) + +axes[0].set_ylabel("Posterior weight", fontsize=11) +fig.suptitle( + r"Non-identification: weights on $\lambda^{(1)}$ and $\lambda^{(2)}$ stabilize at " + r"non-degenerate values; $\lambda^{(3)}$ is eliminated", + fontsize=10, y=1.02 +) +plt.tight_layout() +plt.show() +``` + +The observer correctly rules out $\lambda^{(3)}$ (the wrong reduced form) with probability +one, but cannot distinguish $\lambda^{(1)}$ from $\lambda^{(2)}$ because they generate an +identical price distribution. + +Nevertheless, the observer's **price expectations** converge +to rational expectations because both structures imply the same reduced form $\bar\mu$. + +## Exercises + +```{exercise} +:label: km_ex1 + +**Invertibility with CARA preferences.** Consider a two-state economy ($a_1 = 2$, +$a_2 = 0.5$) where the informed agent has **CARA** (constant absolute risk aversion) +preferences over portfolio wealth: + +$$ +u(W) = -e^{-\gamma W}, \quad W = x_2 + \bar{a}\, x_1. +$$ + +The agent chooses $x_1$ to maximize + +$$ +q\,u(W_1) + (1-q)\,u(W_2), \quad W_s = w - p\,x_1 + a_s\,x_1, +$$ + +subject to the budget constraint $p\,x_1 + x_2 = w$. Total supply of good 1 is $X_1 = 1$. + +(a) Derive the first-order condition for the informed agent's optimal $x_1$. + +(b) Use the market-clearing condition $x_1 = 1$ (the informed agent absorbs the entire +supply) to obtain an implicit equation for the equilibrium price $p^*(q)$. Solve it +numerically for $q \in (0,1)$ and several values of $\gamma$. + +(c) Show numerically that $p^*(q)$ is monotone in $q$, so the invertibility condition +holds. Explain intuitively why CARA preferences always lead to an invertible price map +(the elasticity of substitution of portfolio utility is $\sigma = \infty$). 
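
*Hint (for part b):* with $x_1 = 1$ fixed by market clearing, the FOC is a
scalar equation in $p$ whose residual changes sign on the bracket $(a_2, a_1)$,
so `scipy.optimize.brentq` can solve it directly.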
+``` + +```{solution-start} km_ex1 +:class: dropdown +``` + +**(a) First-order condition.** + +Define $W_s = w + (a_s - p)\,x_1$ for $s=1,2$. The FOC is + +$$ +q\,(a_1 - p)\,\gamma\, e^{-\gamma W_1} += (1-q)\,(p - a_2)\,\gamma\, e^{-\gamma W_2}, +$$ + +or equivalently (dividing by $\gamma$ and rearranging) + +$$ +q\,(a_1 - p)\, e^{-\gamma(a_1-p) x_1} + = (1-q)\,(p - a_2)\, e^{-\gamma(p-a_2) x_1}. +$$ + +**(b) Market-clearing equilibrium price.** + +Setting $x_1 = 1$ (all supply absorbed by informed agent), the equation becomes +a scalar root-finding problem in $p$: + +$$ +F(p;\,q,\gamma) \equiv + q\,(a_1-p)\,e^{-\gamma(a_1-p)} - (1-q)\,(p-a_2)\,e^{-\gamma(p-a_2)} = 0. +$$ + +```{code-cell} ipython3 +from scipy.optimize import brentq + +def F_cara(p, q, a1, a2, gamma, x1=1.0): + """Residual of CARA market-clearing condition.""" + return (q * (a1-p) * np.exp(-gamma*(a1-p)*x1) + - (1-q) * (p-a2) * np.exp(-gamma*(p-a2)*x1)) + +a1, a2 = 2.0, 0.5 +q_grid = np.linspace(0.05, 0.95, 200) +gammas = [0.5, 1.0, 2.0, 5.0] +colors_sol = plt.cm.plasma(np.linspace(0.15, 0.85, len(gammas))) + +fig, ax = plt.subplots(figsize=(8, 5)) +for gamma, color in zip(gammas, colors_sol): + p_eq = [brentq(F_cara, a2+1e-4, a1-1e-4, + args=(q, a1, a2, gamma)) + for q in q_grid] + ax.plot(q_grid, p_eq, lw=2, color=color, + label=rf"$\gamma = {gamma}$") + +ax.set_xlabel(r"Posterior $q = \Pr(\bar a = a_1)$", fontsize=12) +ax.set_ylabel("Equilibrium price $p^*(q)$", fontsize=12) +ax.set_title("CARA preferences: equilibrium prices", fontsize=12) +ax.legend(fontsize=10) +ax.grid(alpha=0.3) +plt.tight_layout() +plt.show() +``` + +**(c) Invertibility for CARA.** + +The price is strictly increasing in $q$ for every $\gamma > 0$. Intuitively, portfolio +utility $u(x_2 + \bar{a}\,x_1)$ treats the two goods as **perfect substitutes** in +creating wealth, giving an elasticity of substitution $\sigma = \infty \neq 1$. By +Theorem 2 of {cite:t}`kihlstrom_mirman1975`, the price map is therefore always invertible. + +```{solution-end} +``` + +```{exercise} +:label: km_ex2 + +**Convergence rate and KL divergence.** In the Bayesian learning simulation, the speed of +convergence to rational expectations is determined by the **Kullback-Leibler divergence** +between the two reduced forms. + +The KL divergence from $g(\cdot \mid \mu_2)$ to $g(\cdot \mid \mu_1)$, for two normal +distributions with means $\bar{p}_1$ and $\bar{p}_2$ and common variance $\sigma_p^2$, is + +$$ +D_{KL}(\mu_1 \| \mu_2) = \frac{(\bar{p}_1 - \bar{p}_2)^2}{2\sigma_p^2}. +$$ + +(a) For the "easy" case ($\bar{p}_1 = 2.0$, $\bar{p}_2 = 1.2$) and the "hard" case +($\bar{p}_1 = 2.0$, $\bar{p}_2 = 1.8$), compute $D_{KL}$ for $\sigma_p = 0.4$. + +(b) Re-run the simulations from the lecture for both cases with $n=100$ paths. For each +path compute the first period $T_{0.99}$ at which $h_t \geq 0.99$. Plot histograms of +$T_{0.99}$ for both cases. + +(c) How does the median $T_{0.99}$ scale with $D_{KL}$? Verify numerically that +roughly $T_{0.99} \approx C / D_{KL}$ for some constant $C$. 
+``` + +```{solution-start} km_ex2 +:class: dropdown +``` + +```{code-cell} ipython3 +sigma_p = 0.4 + +def kl_normal(p1, p2, sigma): + """KL divergence between N(p1,sigma^2) and N(p2,sigma^2).""" + return (p1 - p2)**2 / (2 * sigma**2) + +cases = [("Easy", 2.0, 1.2), ("Hard", 2.0, 1.8)] +for name, p1, p2 in cases: + kl = kl_normal(p1, p2, sigma_p) + print(f"{name} case: D_KL = {kl:.4f}") + +n_paths = 100 + +fig, axes = plt.subplots(1, 2, figsize=(11, 4)) +for ax, (name, p1, p2) in zip(axes, cases): + kl = kl_normal(p1, p2, sigma_p) + paths = simulate_bayesian_learning(p1, p2, sigma_p, T=2000, + h0=0.5, n_paths=n_paths, seed=42) + # First period where posterior >= 0.99 + T99 = [] + for path in paths: + idx = np.where(path >= 0.99)[0] + T99.append(idx[0] if len(idx) > 0 else 2001) + + median_T = np.median(T99) + ax.hist(T99, bins=20, color="steelblue", edgecolor="white", alpha=0.8) + ax.axvline(median_T, color="crimson", lw=2, + label=fr"Median $T_{{0.99}} = {median_T:.0f}$") + ax.set_title( + f"{name}: $D_{{KL}} = {kl:.4f}$, " + fr"$C/D_{{KL}} \approx {median_T*kl:.1f}$", + fontsize=11 + ) + ax.set_xlabel(r"$T_{0.99}$", fontsize=12) + ax.set_ylabel("Count", fontsize=11) + ax.legend(fontsize=10) + ax.grid(alpha=0.3) + +plt.tight_layout() +plt.show() +``` + +The median $T_{0.99}$ scales as approximately $C/D_{KL}$, confirming that learning is +faster when the two reduced forms are more easily distinguished (large $D_{KL}$). + +```{solution-end} +``` + +```{exercise} +:label: km_ex3 + +**Failure of invertibility—counterexample for $S > 2$.** The paper constructs a +counterexample showing that for $S = 3$ states, even if the elasticity of substitution +of $u^1$ is everywhere greater than one, $p(PR^1)$ need **not** be invertible. + +Consider the marginal rate of substitution for the portfolio utility +$u^1(a_s x_1 + x_2)$ (infinite elasticity of substitution) and three states +$a_1 > a_2 > a_3$. The MRS is + +$$ +m(\mu) += \frac{a_1\beta_1\mu(a_1) + a_2\beta_2\mu(a_2) + a_3\beta_3\mu(a_3)} + {\beta_1\mu(a_1) + \beta_2\mu(a_2) + \beta_3\mu(a_3)}, +$$ + +where $\beta_s = u^{1\prime}(a_s x_1 + x_2)$. + +(a) For the parameterization used by {cite:t}`kihlstrom_mirman1975`—let +$\mu(a_3) = q$, $\mu(a_2) = r$, $\mu(a_1) = 1-r-q$—write $m$ as a function of $(q, r)$. +Compute $\partial m / \partial r$ and show that its sign depends on +$\beta_1\beta_2(a_1-a_2)$ and $\beta_2\beta_3(a_2-a_3)$. + +(b) Choose $a_1 = 3$, $a_2 = 2$, $a_3 = 0.5$ and $u'(c) = c^{-\gamma}$ (CRRA with risk +aversion $\gamma$). Fix $x_1 = 1$, $x_2 = 0.5$. For $\gamma = 2$, verify numerically +that $\partial m/\partial r$ changes sign (i.e., $m$ is *not* globally monotone in $r$), +giving a counterexample to invertibility. + +(c) Explain why this non-monotonicity does *not* arise in the two-state case $S = 2$. +``` + +```{solution-start} km_ex3 +:class: dropdown +``` + +**(a)** Rewrite the MRS with $\mu_1 = 1-r-q$: + +$$ +m(q,r) = \frac{a_1\beta_1(1-r-q) + a_2\beta_2 r + a_3\beta_3 q} + {\beta_1(1-r-q) + \beta_2 r + \beta_3 q}. +$$ + +Differentiating using the quotient rule (denominator $D$): + +$$ +\frac{\partial m}{\partial r} += \frac{(a_2\beta_2 - a_1\beta_1)D - (a_1\beta_1(1-r-q)+a_2\beta_2 r+a_3\beta_3 q)(\beta_2-\beta_1)}{D^2}. +$$ + +After simplification this reduces to a signed combination of +$\beta_1\beta_2(a_1-a_2)({\cdot})$ and $\beta_2\beta_3(a_2-a_3)({\cdot})$ terms +whose sign is parameter-dependent. 

**(b) Numerical verification.**

For fixed $q$ the MRS is monotone in $r$, but the posterior now lives in a
*two-dimensional* simplex while the price is a scalar: every level curve of
$m(q, r)$ collects a continuum of distinct posteriors that generate the same
price, so $p(PR^1)$ cannot be inverted.

```{code-cell} ipython3
def mrs_3state(q, r, a1, a2, a3, x1, x2, gamma):
    """MRS with mu(a3)=q, mu(a2)=r, mu(a1)=1-r-q, portfolio utility u'(c)=c^{-gamma}."""
    mu1, mu2, mu3 = 1 - r - q, r, q
    beta1 = (a1 * x1 + x2)**(-gamma)
    beta2 = (a2 * x1 + x2)**(-gamma)
    beta3 = (a3 * x1 + x2)**(-gamma)
    num = a1*beta1*mu1 + a2*beta2*mu2 + a3*beta3*mu3
    den = beta1*mu1 + beta2*mu2 + beta3*mu3
    return num / den

a1, a2, a3 = 3.0, 2.0, 0.5
x1, x2 = 1.0, 0.5
gamma = 2.0

# Level curves of the MRS over the interior of the simplex q + r < 1
q_grid = np.linspace(0.01, 0.95, 300)
r_grid = np.linspace(0.01, 0.95, 300)
Q, R = np.meshgrid(q_grid, r_grid)
M = np.where(Q + R < 0.99, mrs_3state(Q, R, a1, a2, a3, x1, x2, gamma), np.nan)

fig, ax = plt.subplots(figsize=(6.5, 5))
cs = ax.contour(Q, R, M, levels=12, cmap="viridis")
ax.clabel(cs, inline=True, fontsize=8)
ax.set_xlabel(r"$q = \mu(a_3)$", fontsize=12)
ax.set_ylabel(r"$r = \mu(a_2)$", fontsize=12)
ax.set_title("Level curves of the MRS $m(q, r)$: each curve is one price", fontsize=11)
plt.tight_layout()
plt.show()

# Two distinct posteriors on the same level curve produce the same MRS
target = mrs_3state(0.10, 0.30, a1, a2, a3, x1, x2, gamma)
r_match = brentq(lambda r: mrs_3state(0.09, r, a1, a2, a3, x1, x2, gamma) - target,
                 1e-3, 0.90)
print(f"m(q=0.10, r=0.30)      = {target:.6f}")
print(f"m(q=0.09, r={r_match:.4f}) = "
      f"{mrs_3state(0.09, r_match, a1, a2, a3, x1, x2, gamma):.6f}")
```

The contour plot shows non-degenerate level curves: two different posteriors, here
$(q, r) = (0.10, 0.30)$ and $(q, r) \approx (0.09, 0.57)$, generate exactly the same
MRS and hence the same equilibrium price, so invertibility fails for $S = 3$.

**(c)** In the two-state case $S = 2$, the prior is parameterized by a single scalar $q$
and the MRS is a ratio of two functions that are affine in $q$, hence monotone in $q$;
the map from posteriors to prices is therefore one-to-one. With three states the
posterior is two-dimensional while the price remains a scalar, so the price map
necessarily collapses whole level curves of posteriors into a single price.

```{solution-end}
```

```{exercise}
:label: km_ex4

**Bayesian learning with misspecified models.** The convergence theorem assumes the true
distribution $g(\cdot \mid \bar\lambda)$ is in the support of the prior (i.e.,
$h(\bar\lambda) > 0$). Investigate what happens when the true model is **not** in the
prior support.

(a) Simulate $T = 1,000$ periods of prices from $N(2.0, 0.4^2)$ but use a prior that
    places equal weight on two *wrong* models: $N(1.5, 0.4^2)$ and $N(2.5, 0.4^2)$.
    Plot the posterior weight on each model over time.

(b) Show that the **predictive** (mixture) price distribution converges to the *closest*
    model in KL divergence terms—which by symmetry is the equal mixture, with mean 2.0.
    Verify this numerically by computing the predictive mean over time.

(c) Relate this finding to the Bayesian consistency literature: when is the limit
    distribution a good approximation to the true distribution even under misspecification?
```

```{solution-start} km_ex4
:class: dropdown
```

```{code-cell} ipython3
def simulate_misspecified(T, p_bar_true, p_bar_wrong, sigma_p, h0, n_paths, seed=0):
    """
    Misspecified Bayesian learning: two wrong models with means p_bar_wrong[0,1].
    True model has mean p_bar_true (not in prior support).
    Returns (n_paths, T+1, 2) array of posterior weights.
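    Note: the data-generating mean p_bar_true lies outside the prior's
    support, so no posterior weight can be placed on the true model.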
+ """ + rng = np.random.default_rng(seed) + h_paths = np.zeros((n_paths, T + 1, 2)) + h_paths[:, 0, :] = h0 + + for path in range(n_paths): + h = np.array(h0, dtype=float) + prices = rng.normal(p_bar_true, sigma_p, size=T) + for t, price in enumerate(prices): + likes = norm.pdf(price, loc=p_bar_wrong, scale=sigma_p) + h = h * likes + h /= h.sum() + h_paths[path, t + 1, :] = h + + return h_paths + + +T = 1000 +p_true = 2.0 +p_wrong = np.array([1.5, 2.5]) +sigma_p = 0.4 +h0 = np.array([0.5, 0.5]) +n_paths = 30 + +h_misspec = simulate_misspecified(T, p_true, p_wrong, sigma_p, h0, n_paths) + +t_grid = np.arange(T + 1) +fig, axes = plt.subplots(1, 2, figsize=(12, 4)) + +for ax, k, label in zip(axes, [0, 1], [r"$N(1.5, \sigma^2)$", r"$N(2.5, \sigma^2)$"]): + for path in h_misspec: + ax.plot(t_grid, path[:, k], alpha=0.2, lw=0.8, color="steelblue") + ax.plot(t_grid, np.median(h_misspec[:, :, k], axis=0), + color="navy", lw=2.5, label="Median") + ax.axhline(0.5, color="crimson", lw=1.5, ls="--", label="0.5 (symmetric limit)") + ax.set_title(f"Posterior weight on {label}", fontsize=11) + ax.set_xlabel("Period $t$", fontsize=11) + ax.set_ylabel("Posterior weight", fontsize=11) + ax.legend(fontsize=9) + ax.grid(alpha=0.3) + +plt.tight_layout() +plt.show() + +# Predictive mean = h[:,0]*1.5 + h[:,1]*2.5 +pred_mean = np.median( + h_misspec[:, :, 0] * p_wrong[0] + h_misspec[:, :, 1] * p_wrong[1], axis=0 +) +print(f"True mean: {p_true}") +print(f"Predictive mean at T={T}: {pred_mean[-1]:.4f}") +print("(Symmetry implies equal weight on 1.5 and 2.5 → predictive mean = 2.0)") +``` + +By symmetry, the two wrong models are equidistant from the true distribution in KL +divergence. The posterior therefore converges to the 50-50 mixture, and the predictive mean +converges to $0.5 \times 1.5 + 0.5 \times 2.5 = 2.0$—coinciding with the true mean +despite misspecification. This is an instance of the general result that under +misspecification, Bayesian posteriors converge to the distribution in the model class that +minimizes KL divergence from the model actually generating the data. + +```{solution-end} +``` + From a9d6c474703dabce25eb9d89da7d49922e6e97fa Mon Sep 17 00:00:00 2001 From: thomassargent30 Date: Mon, 20 Apr 2026 18:45:35 -0600 Subject: [PATCH 02/10] Tom's April 20 edits of several lectures --- lectures/information_market_equilibrium.md | 24 +- lectures/multivariate_normal.md | 400 +++++++++++++++++++- lectures/prob_matrix.md | 417 ++++++++++++++++++++- 3 files changed, 811 insertions(+), 30 deletions(-) diff --git a/lectures/information_market_equilibrium.md b/lectures/information_market_equilibrium.md index 00063a9a1..222f9bccb 100644 --- a/lectures/information_market_equilibrium.md +++ b/lectures/information_market_equilibrium.md @@ -43,11 +43,11 @@ answered by {cite:t}`kihlstrom_mirman1975`. Kihlstrom and Mirman's answers rely on two classical ideas from statistics: -- **Blackwell sufficiency**: a random variable $\tilde{y}$ is *sufficient* for +- **Blackwell sufficiency**: a random variable $\tilde{y}$ is said to be *sufficient* for a random variable $\tilde{y}'$ with respect to an unknown state if knowing $\tilde{y}$ gives all the information about the state that $\tilde{y}'$ contains. -- **Bayesian consistency**: as the sample grows, the posterior concentrates on the true - parameter value (even when the underlying economic ßstructure is not globally identified from prices alone). 
+- **Bayesian consistency**: as the sample grows, a Bayesian statistician's posterior probability distribution concentrates on the true + parameter value *even when the underlying economic structure is not globally identified from prices alone*. Important findings of {cite:t}`kihlstrom_mirman1975` are: @@ -58,12 +58,12 @@ Important findings of {cite:t}`kihlstrom_mirman1975` are: elasticity of substitution $\sigma \neq 1$. With Cobb-Douglas preferences ($\sigma = 1$) the equilibrium price is independent of the insider's posterior, so information is never transmitted. -- In the dynamic economy, as information accumulates, Bayesian price expectations converge to **rational expectations**, even when the deep structure of the economy is notß identified. +- In the dynamic economy, as information accumulates, Bayesian price expectations converge to **rational expectations**, even when the deep structure of the economy is not identified. ```{note} -{cite:t}`kihlstrom_mirman1975` use the terms ''reduced form'' and ''structural'' models in the same -way that careful econometricians do. These two objects come in pairs. To each structure or structural model -there is a reduced form, or collection of reduced forms traced out by different possible regressions. +{cite:t}`kihlstrom_mirman1975` use the terms ''reduced form'' and ''structural'' models in a +way that careful econometricians do. Reduced-form and structural models come in pairs. To each structure or structural model +there is a reduced form, or collection of reduced forms, underlying different possible regressions. ``` The lecture is organized as follows. @@ -164,7 +164,7 @@ $p(\mu_{\tilde{y}})$ is **sufficient** for $\tilde{y}$. **Definition.** A random variable $\tilde{y}$ is *sufficient* for $\tilde{y}'$ (with respect to $\bar{a}$) if there exists a conditional distribution $PR(y' \mid y)$, -**independent of $\bar{a}$**, such that +**independent of**$\bar{a}$, such that $$ \phi'_a(y') = \sum_{y \in Y} PR(y' \mid y)\, \phi_a(y) @@ -468,7 +468,7 @@ equilibrium price strictly monotone in the posterior $q$ in both cases. (bayesian_price_expectations)= ## Bayesian Price Expectations in a Dynamic Economy -We now turn to the **dynamic** question of Section 3 in {cite:t}`kihlstrom_mirman1975`. +We now turn to a question addressed in Section 3 of {cite:t}`kihlstrom_mirman1975`. ### A Stochastic Exchange Economy @@ -550,13 +550,13 @@ $$ which equals the rational-expectations price distribution for a fully informed observer. -The convergence uses the **Bayesian consistency** result of {cite:t}`degroot1962`: as +Establishing convergence relies on appealing to the **Bayesian consistency** result of {cite:t}`degroot1962`: as long as $g(\cdot \mid \mu)$ and $g(\cdot \mid \mu')$ generate mutually singular measures (which holds here generically), the posterior concentrates on the true reduced form. **Key insight.** Price observers converge to **rational expectations** even if they -never identify the underlying structure $\bar\lambda$. It is the reduced form -$g(p \mid \bar\mu)$ that governs equilibrium price expectations, and the Bayesian +never identify the underlying structure $\bar\lambda$. The reduced form +$g(p \mid \bar\mu)$ statistical model is used to form equilibrium price expectations, and the Bayesian observer learns the reduced form from prices alone. 
(bayesian_simulation)= diff --git a/lectures/multivariate_normal.md b/lectures/multivariate_normal.md index 7aacee6eb..e1353ec0b 100644 --- a/lectures/multivariate_normal.md +++ b/lectures/multivariate_normal.md @@ -3,8 +3,10 @@ jupytext: text_representation: extension: .md format_name: myst + format_version: 0.13 + jupytext_version: 1.17.1 kernelspec: - display_name: Python 3 + display_name: Python 3 (ipykernel) language: python name: python3 --- @@ -60,7 +62,7 @@ We apply our Python class to some examples. We use the following imports: -```{code-cell} ipython +```{code-cell} ipython3 import matplotlib.pyplot as plt import numpy as np from numba import jit @@ -95,7 +97,7 @@ def f(z, μ, Σ): μ: ndarray(float, dim=1 or 2) the mean of z, N by 1 Σ: ndarray(float, dim=2) - the covarianece matrix of z, N by 1 + the covariance matrix of z, N by N """ z = np.atleast_2d(z) @@ -186,7 +188,7 @@ class MultivariateNormal: μ: ndarray(float, dim=1) the mean of z, N by 1 Σ: ndarray(float, dim=2) - the covarianece matrix of z, N by 1 + the covariance matrix of z, N by N Arguments --------- @@ -1093,8 +1095,8 @@ for indices, IQ, conditions in [([*range(2*n), 2*n], 'θ', 'y1, y2, y3, y4'), f'{μ_hat[0]:1.2f} and {Σ_hat[0, 0]:1.2f} respectively') ``` -Evidently, math tests provide no information about $\mu$ and -language tests provide no information about $\eta$. +Evidently, math tests provide no information about $\eta$ and +language tests provide no information about $\theta$. ## Univariate Time Series Analysis @@ -1688,7 +1690,7 @@ plt.show() In the above graph, the green line is what the price of the stock would be if people had perfect foresight about the path of dividends while the -green line is the conditional expectation $E p_t | y_t, y_{t-1}$, which is what the price would +red line is the conditional expectation $E p_t | y_t, y_{t-1}$, which is what the price would be if people did not have perfect foresight but were optimally predicting future dividends on the basis of the information $y_t, y_{t-1}$ at time $t$. @@ -1895,7 +1897,7 @@ G = np.array([[1., 3.]]) R = np.array([[1.]]) x0_hat = np.array([0., 1.]) -Σ0 = np.array([[1., .5], [.3, 2.]]) +Σ0 = np.array([[1., .5], [.5, 2.]]) μ = np.hstack([x0_hat, G @ x0_hat]) Σ = np.block([[Σ0, Σ0 @ G.T], [G @ Σ0, G @ Σ0 @ G.T + R]]) @@ -2300,3 +2302,385 @@ Pjk = P[:, :2] Σy_hat = Pjk @ Σεjk @ Pjk.T print('Σy_hat = \n', Σy_hat) ``` + +## Exercises + +```{exercise} +:label: mv_normal_ex1 + +**Verify conditional mean and variance by simulation** + +For the bivariate normal with + +$$ +\mu = \begin{bmatrix} 0.5 \\ 1.0 \end{bmatrix}, \quad +\Sigma = \begin{bmatrix} 1 & 0.5 \\ 0.5 & 1 \end{bmatrix} +$$ + +fix $z_2 = 2$. + +(a) Use `MultivariateNormal` to compute the analytical conditional mean +$\hat{\mu}_1$ and variance $\hat{\Sigma}_{11}$ of $z_1 \mid z_2 = 2$. + +(b) Draw $10^6$ samples from the joint distribution. Retain only those +for which $|z_2 - 2| < 0.05$. Compute the sample mean and variance of +the retained $z_1$ values. + +(c) Confirm that the sample estimates are close to the analytical values. 
+``` + +```{solution-start} mv_normal_ex1 +:class: dropdown +``` + +```{code-cell} python3 +import numpy as np +import statsmodels.api as sm + +μ = np.array([.5, 1.]) +Σ = np.array([[1., .5], [.5, 1.]]) + +# (a) analytical conditional distribution +mn = MultivariateNormal(μ, Σ) +mn.partition(1) +μ1_hat, Σ11_hat = mn.cond_dist(0, np.array([2.])) +print(f"Analytical μ̂₁ = {μ1_hat[0]:.4f}, Σ̂₁₁ = {Σ11_hat[0,0]:.4f}") + +# (b) simulation +n = 1_000_000 +data = np.random.multivariate_normal(μ, Σ, size=n) +z1_all, z2_all = data[:, 0], data[:, 1] + +mask = np.abs(z2_all - 2.) < 0.05 +z1_cond = z1_all[mask] +print(f"Sample size in band: {mask.sum()}") +print(f"Sample μ̂₁ = {np.mean(z1_cond):.4f}, Σ̂₁₁ = {np.var(z1_cond, ddof=1):.4f}") +``` + +```{solution-end} +``` + +```{exercise} +:label: mv_normal_ex2 + +**Product of regression slopes equals squared correlation** + +For a bivariate normal with standard deviations $\sigma_1 = \sigma_2 = 1$ and +correlation $\rho$, show analytically that $b_1 b_2 = \rho^2$, where +$b_1$ is the slope of $z_1$ on $z_2$ and $b_2$ is the slope of $z_2$ +on $z_1$. + +Then verify numerically for $\rho \in \{0.2, 0.5, 0.9\}$ that +`βs[0] * βs[1]` $= \rho^2$ by constructing the appropriate +`MultivariateNormal` instances. +``` + +```{solution-start} mv_normal_ex2 +:class: dropdown +``` + +The regression slopes are + +$$ +b_1 = \frac{\Sigma_{12}}{\Sigma_{22}} = \frac{\rho \sigma_1 \sigma_2}{\sigma_2^2} += \rho \frac{\sigma_1}{\sigma_2}, \qquad +b_2 = \frac{\Sigma_{21}}{\Sigma_{11}} = \rho \frac{\sigma_2}{\sigma_1} +$$ + +so $b_1 b_2 = \rho^2$. + +```{code-cell} python3 +import numpy as np + +for rho in [0.2, 0.5, 0.9]: + Σ = np.array([[1., rho], [rho, 1.]]) + mn = MultivariateNormal(np.zeros(2), Σ) + mn.partition(1) + product = float(mn.βs[0]) * float(mn.βs[1]) + print(f"ρ={rho:.1f}: b1*b2 = {product:.4f}, ρ² = {rho**2:.4f}, match: {np.isclose(product, rho**2)}") +``` + +```{solution-end} +``` + +```{exercise} +:label: mv_normal_ex3 + +**IQ inference: effect of the signal-to-noise ratio** + +Using the one-dimensional IQ model with $n = 50$ test scores and +$\mu_\theta = 100$, $\sigma_\theta = 10$: + +(a) Vary the test-score noise $\sigma_y \in \{1, 5, 10, 20, 50\}$. +For each value, plot the posterior standard deviation +$\hat{\sigma}_\theta$ as a function of the number of test scores +included (from 1 to 50), with all curves on the same axes. + +(b) Explain intuitively why a larger $\sigma_y$ leads to a slower +decline of posterior uncertainty. +``` + +```{solution-start} mv_normal_ex3 +:class: dropdown +``` + +```{code-cell} python3 +import numpy as np +import matplotlib.pyplot as plt + +n_max = 50 +μθ_val, σθ_val = 100., 10. + +fig, ax = plt.subplots() +for σy_val in [1., 5., 10., 20., 50.]: + σθ_hat_arr = np.empty(n_max) + for i in range(1, n_max + 1): + μ_i, Σ_i, _ = construct_moments_IQ(i, μθ_val, σθ_val, σy_val) + mn_i = MultivariateNormal(μ_i, Σ_i) + mn_i.partition(i) + _, Σθ_i = mn_i.cond_dist(1, np.zeros(i)) # conditioning value doesn't affect variance + σθ_hat_arr[i - 1] = np.sqrt(Σθ_i[0, 0]) + ax.plot(range(1, n_max + 1), σθ_hat_arr, label=f'σy={σy_val:.0f}') + +ax.set_xlabel('number of test scores') +ax.set_ylabel(r'posterior $\hat{\sigma}_\theta$') +ax.legend() +plt.show() +``` + +When $\sigma_y$ is large each test score is a noisy signal about $\theta$, +so many more observations are required before the posterior variance falls +appreciably. In the limit $\sigma_y \to 0$ a single observation pins down +$\theta$ exactly. 

```{solution-end}
```

```{exercise}
:label: mv_normal_ex4

**Prior vs. likelihood in IQ inference**

Using the one-dimensional IQ model with $n = 20$ test scores and
$\mu_\theta = 100$, $\sigma_y = 10$:

(a) Fix $\sigma_y = 10$ and vary the prior spread
$\sigma_\theta \in \{1, 5, 10, 50, 500\}$. For each value compute the
posterior mean $\hat{\mu}_\theta$ given the same set of $n = 20$ test
scores and plot $\hat{\mu}_\theta$ against $\sigma_\theta$.

(b) Show analytically (or verify numerically) that as $\sigma_\theta \to \infty$
the posterior mean converges to the sample mean $\bar{y}$, and as
$\sigma_y \to \infty$ the posterior mean converges to the prior mean
$\mu_\theta$.
```

```{solution-start} mv_normal_ex4
:class: dropdown
```

```{code-cell} python3
import numpy as np
import matplotlib.pyplot as plt

n_scores = 20
μθ_val, σy_val = 100., 10.

# draw one set of test scores from a fixed "true" θ
np.random.seed(42)
true_θ = 108.
y_obs = true_θ + σy_val * np.random.randn(n_scores)
y_bar = np.mean(y_obs)

σθ_vals = [1., 5., 10., 50., 500.]
μθ_hat_vals = []

for σθ_val in σθ_vals:
    μ_i, Σ_i, _ = construct_moments_IQ(n_scores, μθ_val, σθ_val, σy_val)
    mn_i = MultivariateNormal(μ_i, Σ_i)
    mn_i.partition(n_scores)
    μθ_hat, _ = mn_i.cond_dist(1, y_obs)
    μθ_hat_vals.append(float(μθ_hat))

fig, ax = plt.subplots()
ax.semilogx(σθ_vals, μθ_hat_vals, 'o-', label=r'$\hat{\mu}_\theta$')
ax.axhline(y_bar, ls='--', color='r', label=f'sample mean ȳ = {y_bar:.1f}')
ax.axhline(μθ_val, ls=':', color='g', label=f'prior mean μθ = {μθ_val:.0f}')
ax.set_xlabel(r'$\sigma_\theta$')
ax.set_ylabel(r'posterior mean $\hat{\mu}_\theta$')
ax.legend()
plt.show()

print(f"ȳ = {y_bar:.4f}")
print(f"Large σθ posterior mean ≈ {μθ_hat_vals[-1]:.4f}")
```

For part (b), the conditional mean of the bivariate normal specializes to the
precision-weighted average

$$
\hat{\mu}_\theta
= \frac{\mu_\theta/\sigma_\theta^2 + n\bar{y}/\sigma_y^2}
       {1/\sigma_\theta^2 + n/\sigma_y^2},
$$

so the prior's weight vanishes as $\sigma_\theta \to \infty$, giving
$\hat{\mu}_\theta \to \bar{y}$, while the data's weight vanishes as
$\sigma_y \to \infty$, giving $\hat{\mu}_\theta \to \mu_\theta$.

```{solution-end}
```

```{exercise}
:label: mv_normal_ex5

**Kalman filter convergence**

Using the `iterate` function from the Filtering Foundations section with

$$
A = \begin{bmatrix} 0.9 & 0 \\ 0 & 0.5 \end{bmatrix}, \quad
C = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \quad
G = \begin{bmatrix} 1 & 0 \end{bmatrix}, \quad
R = \begin{bmatrix} 1 \end{bmatrix}
$$

and initial conditions $\hat{x}_0 = [0, 0]'$, $\Sigma_0 = I_2$:

(a) Simulate $T = 60$ periods of $\{x_t, y_t\}$ and run the filter.

(b) Plot the sequences of conditional variances $\Sigma_t[0,0]$ and
$\Sigma_t[1,1]$ over time. Verify that they converge to a steady state.

(c) Plot the filtered state estimates $\hat{x}_t[0]$ together with the
true $x_t[0]$ and the raw observations $y_t$ on a single figure. 
+``` + +```{solution-start} mv_normal_ex5 +:class: dropdown +``` + +```{code-cell} python3 +import numpy as np +import matplotlib.pyplot as plt + +A_ex = np.array([[0.9, 0.], [0., 0.5]]) +C_ex = np.array([[1.], [1.]]) +G_ex = np.array([[1., 0.]]) +R_ex = np.array([[1.]]) + +T_ex = 60 +x0_hat_ex = np.zeros(2) +Σ0_ex = np.eye(2) + +# simulate true states and observations +np.random.seed(7) +x_true = np.zeros((T_ex + 1, 2)) +y_seq_ex = np.zeros(T_ex) +for t in range(T_ex): + x_true[t + 1] = A_ex @ x_true[t] + C_ex[:, 0] * np.random.randn() + y_seq_ex[t] = G_ex @ x_true[t] + np.random.randn() + +# run filter +x_hat_seq, Σ_hat_seq = iterate(x0_hat_ex, Σ0_ex, A_ex, C_ex, G_ex, R_ex, y_seq_ex) + +# (b) conditional variances +fig, ax = plt.subplots() +ax.plot(Σ_hat_seq[:, 0, 0], label=r'$\Sigma_t[0,0]$') +ax.plot(Σ_hat_seq[:, 1, 1], label=r'$\Sigma_t[1,1]$') +ax.set_xlabel('t') +ax.set_ylabel('conditional variance') +ax.legend() +plt.show() + +# (c) filtered state vs. truth vs. observations +fig, ax = plt.subplots() +ax.plot(x_true[1:, 0], label='true $x_t[0]$', alpha=0.7) +ax.plot(x_hat_seq[1:, 0], label=r'filtered $\hat{x}_t[0]$', ls='--') +ax.plot(y_seq_ex, label='observations $y_t$', alpha=0.4, lw=0.8) +ax.set_xlabel('t') +ax.legend() +plt.show() +``` + +```{solution-end} +``` + +```{exercise} +:label: mv_normal_ex6 + +**PCA vs. factor analysis** + +In the classic factor analysis model at the end of the lecture the true +covariance is $\Sigma_y = \Lambda \Lambda' + D$. + +(a) Set $\sigma_u = 2$ (instead of $0.5$). Recompute the fraction of +variance explained by the first two principal components and compare +it with the $\sigma_u = 0.5$ result. Explain the change. + +(b) Show that the conditional expectation $E[f \mid Y] = BY$ with +$B = \Lambda' \Sigma_y^{-1}$ is **not** equal to the two-component PCA +projection $\hat{Y} = P_{:,1:2}\,\epsilon_{1:2}$. Plot both on the same +axes. + +(c) In one or two sentences, explain why PCA is misspecified for +factor-analytic data. 
+``` + +```{solution-start} mv_normal_ex6 +:class: dropdown +``` + +```{code-cell} python3 +import numpy as np +import matplotlib.pyplot as plt + +N_fa = 10 +k_fa = 2 + +Λ_fa = np.zeros((N_fa, k_fa)) +Λ_fa[:N_fa//2, 0] = 1 +Λ_fa[N_fa//2:, 1] = 1 + +results_table = {} +for σu_val in [0.5, 2.0]: + D_fa = np.eye(N_fa) * σu_val ** 2 + Σy_fa = Λ_fa @ Λ_fa.T + D_fa + + λ_fa, P_fa = np.linalg.eigh(Σy_fa) + ind_fa = sorted(range(N_fa), key=lambda x: λ_fa[x], reverse=True) + P_fa = P_fa[:, ind_fa] + λ_fa = λ_fa[ind_fa] + + frac = λ_fa[:2].sum() / λ_fa.sum() + results_table[σu_val] = frac + print(f"σu={σu_val}: fraction explained by first 2 PCs = {frac:.4f}") + +# (b) comparison using σu=0.5 +σu_b = 0.5 +D_b = np.eye(N_fa) * σu_b ** 2 +Σy_b = Λ_fa @ Λ_fa.T + D_b + +μz_b = np.zeros(k_fa + N_fa) +Σz_b = np.block([[np.eye(k_fa), Λ_fa.T], [Λ_fa, Σy_b]]) +z_b = np.random.multivariate_normal(μz_b, Σz_b) +f_b = z_b[:k_fa] +y_b = z_b[k_fa:] + +# factor-analytic E[f|y] +B_b = Λ_fa.T @ np.linalg.inv(Σy_b) +Efy_b = B_b @ y_b + +# PCA projection +λ_b, P_b = np.linalg.eigh(Σy_b) +ind_b = sorted(range(N_fa), key=lambda x: λ_b[x], reverse=True) +P_b = P_b[:, ind_b] +ε_b = P_b.T @ y_b +y_hat_b = P_b[:, :2] @ ε_b[:2] + +fig, ax = plt.subplots(figsize=(8, 4)) +ax.scatter(range(N_fa), Λ_fa @ Efy_b, label=r'Factor-analytic $\Lambda E[f\mid y]$') +ax.scatter(range(N_fa), y_hat_b, marker='x', label=r'PCA projection $\hat{y}$') +ax.scatter(range(N_fa), Λ_fa @ f_b, marker='^', alpha=0.6, label=r'True signal $\Lambda f$') +ax.set_xlabel('observation index') +ax.legend() +plt.show() +``` + +PCA is misspecified for factor-analytic data because it imposes no +structure on the residual covariance: it decomposes $\Sigma_y$ into +eigenvectors that need not align with the factor loadings $\Lambda$. +The factor model, by contrast, correctly separates the covariance into a +low-rank systematic part $\Lambda\Lambda'$ and a diagonal idiosyncratic +part $D$, so its conditional expectation $E[f\mid Y]$ is the minimum-variance +linear estimator of the factors. + +```{solution-end} +``` diff --git a/lectures/prob_matrix.md b/lectures/prob_matrix.md index b142b9e39..3708b9599 100644 --- a/lectures/prob_matrix.md +++ b/lectures/prob_matrix.md @@ -1,10 +1,10 @@ --- jupytext: text_representation: - extension: .myst + extension: .md format_name: myst format_version: 0.13 - jupytext_version: 1.13.8 + jupytext_version: 1.17.1 kernelspec: display_name: Python 3 (ipykernel) language: python @@ -465,7 +465,7 @@ $$ An associated conditional distribution is $$ -\textrm{Prob}\{Y=i\vert X=j\} = \frac{\rho_{ij}}{ \sum_{j}\rho_{ij}} +\textrm{Prob}\{Y=j\vert X=i\} = \frac{\rho_{ij}}{ \sum_{j}\rho_{ij}} = \frac{\textrm{Prob}\{Y=j, X=i\}}{\textrm{Prob}\{ X=i\}} $$ @@ -491,7 +491,7 @@ The first row is the probability that $Y=j, j=0,1$ conditional on $X=0$. The second row is the probability that $Y=j, j=0,1$ conditional on $X=1$. Note that -- $\sum_{j}\rho_{ij}= \frac{ \sum_{j}\rho_{ij}}{ \sum_{j}\rho_{ij}}=1$, so each row of the transition matrix $P$ is a probability distribution (not so for each column). +- $\sum_{j}p_{ij}= \frac{ \sum_{j}\rho_{ij}}{ \sum_{j}\rho_{ij}}=1$, so each row of the transition matrix $P$ is a probability distribution (not so for each column). 
@@ -891,11 +891,6 @@ $$ f(x,y) =(2\pi\sigma_1\sigma_2\sqrt{1-\rho^2})^{-1}\exp\left[-\frac{1}{2(1-\rho^2)}\left(\frac{(x-\mu_1)^2}{\sigma_1^2}-\frac{2\rho(x-\mu_1)(y-\mu_2)}{\sigma_1\sigma_2}+\frac{(y-\mu_2)^2}{\sigma_2^2}\right)\right] $$ - -$$ -\frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}\exp\left[-\frac{1}{2(1-\rho^2)}\left(\frac{(x-\mu_1)^2}{\sigma_1^2}-\frac{2\rho(x-\mu_1)(y-\mu_2)}{\sigma_1\sigma_2}+\frac{(y-\mu_2)^2}{\sigma_2^2}\right)\right] -$$ - We start with a bivariate normal distribution pinned down by $$ @@ -1199,7 +1194,7 @@ $$ \mu_{0}= (1-q)(1-r)+(1-q)r & =1-q\\ \mu_{1}= q(1-r)+qr & =q\\ \nu_{0}= (1-q)(1-r)+(1-r)q& =1-r\\ -\mu_{1}= r(1-q)+qr& =r +\nu_{1}= r(1-q)+qr& =r \end{aligned} $$ @@ -1488,3 +1483,405 @@ print(c2_ymtb) We have verified that both joint distributions, $c_1$ and $c_2$, have identical marginal distributions of $X$ and $Y$, respectively. So they are both couplings of $X$ and $Y$. + +**Gaussian Copula Example** + +A **Gaussian copula** uses the bivariate normal distribution to induce dependence between +arbitrary marginal distributions. + +The construction has three steps: + +1. Draw $(Z_1, Z_2)$ from a bivariate standard normal with correlation $\rho$. +2. Apply the standard normal CDF: $U_k = \Phi(Z_k)$. The pair $(U_1, U_2)$ has uniform marginals but retains the dependence structure of $(Z_1, Z_2)$ — this is the copula. +3. Apply the inverse CDF of any desired marginal: $X_k = F_k^{-1}(U_k)$. + +The following code illustrates this with exponential marginals. + +```{code-cell} ipython3 +from scipy import stats + +# Gaussian copula parameters +rho_cop = 0.8 +n_cop = 100_000 + +# Step 1: draw from bivariate standard normal with correlation rho_cop +z = np.random.multivariate_normal( + [0, 0], [[1, rho_cop], [rho_cop, 1]], n_cop +) + +# Step 2: apply normal CDF -> uniform marginals (the copula itself) +u1 = stats.norm.cdf(z[:, 0]) +u2 = stats.norm.cdf(z[:, 1]) + +# Step 3: apply inverse CDFs of desired marginals (here: Exponential) +x1 = stats.expon.ppf(u1, scale=1.0) # Exp with mean 1 +x2 = stats.expon.ppf(u2, scale=0.5) # Exp with mean 0.5 + +fig, axes = plt.subplots(1, 2, figsize=(10, 4)) +axes[0].scatter(u1[:3000], u2[:3000], alpha=0.2, s=2) +axes[0].set_xlabel('$u_1$') +axes[0].set_ylabel('$u_2$') +axes[0].set_title(f'Copula (uniform marginals, ρ={rho_cop})') +axes[1].scatter(x1[:3000], x2[:3000], alpha=0.2, s=2) +axes[1].set_xlabel('$x_1$ (Exp, mean=1)') +axes[1].set_ylabel('$x_2$ (Exp, mean=0.5)') +axes[1].set_title('Exponential marginals via Gaussian copula') +plt.tight_layout() +plt.show() + +print(f"Sample correlation of (x1, x2): {np.corrcoef(x1, x2)[0, 1]:.3f}") +print(f"Sample correlation of (u1, u2): {np.corrcoef(u1, u2)[0, 1]:.3f}") +``` + +The left panel shows the copula itself — the dependence structure in uniform coordinates. +The right panel shows the same dependence translated to exponential marginals. +Changing $\rho$ controls the strength of dependence while the marginals remain unchanged. + +## Exercises + +```{exercise} +:label: prob_matrix_ex1 + +**Independence Test** + +Consider the joint distribution + +$$ +F = \begin{bmatrix} 0.3 & 0.2 \\ 0.1 & 0.4 \end{bmatrix} +$$ + +where $X \in \{0,1\}$ and $Y \in \{10, 20\}$. + +(a) Compute the marginal distributions $\mu_i = \text{Prob}\{X=i\}$ and $\nu_j = \text{Prob}\{Y=j\}$. + +(b) Form the independence matrix $f^{\perp}_{ij} = \mu_i \nu_j$ (the outer product of the two marginal vectors). + +(c) Compare $F$ with $f^{\perp}$ and determine whether $X$ and $Y$ are independent. 
+ +(d) Verify your conclusion by computing $\text{Prob}\{X=0|Y=10\}$ and checking whether it equals $\text{Prob}\{X=0\}$. +``` + +```{solution-start} prob_matrix_ex1 +:class: dropdown +``` + +```{code-cell} ipython3 +import numpy as np + +F = np.array([[0.3, 0.2], + [0.1, 0.4]]) + +# (a) marginals +mu = F.sum(axis=1) # sum over columns -> marginal for X +nu = F.sum(axis=0) # sum over rows -> marginal for Y +print("mu (marginal of X):", mu) +print("nu (marginal of Y):", nu) + +# (b) independence matrix +F_indep = np.outer(mu, nu) +print("\nIndependence matrix (outer product):\n", F_indep) +print("\nActual joint F:\n", F) + +# (c) test independence +print("\nIndependent (F == mu ⊗ nu)?", np.allclose(F, F_indep)) + +# (d) conditional vs. marginal +prob_X0_given_Y10 = F[0, 0] / nu[0] +print(f"\nProb(X=0 | Y=10) = {prob_X0_given_Y10:.4f}") +print(f"Prob(X=0) = {mu[0]:.4f}") +``` + +```{solution-end} +``` + +```{exercise} +:label: prob_matrix_ex2 + +**Covariance and Correlation** + +Using the same joint distribution $F$ and values $X \in \{0,1\}$, $Y \in \{10, 20\}$ as in Exercise 1: + +(a) Compute $\mathbb{E}[X]$, $\mathbb{E}[Y]$, and $\mathbb{E}[XY] = \sum_i \sum_j x_i y_j f_{ij}$. + +(b) Compute $\text{Cov}(X,Y) = \mathbb{E}[XY] - \mathbb{E}[X]\mathbb{E}[Y]$. + +(c) Compute $\text{Cor}(X,Y) = \text{Cov}(X,Y) / (\sigma_X \sigma_Y)$. + +(d) Show analytically that $X \perp Y$ implies $\text{Cov}(X,Y) = 0$. +``` + +```{solution-start} prob_matrix_ex2 +:class: dropdown +``` + +```{code-cell} ipython3 +import numpy as np + +xs = np.array([0, 1]) +ys = np.array([10, 20]) +F = np.array([[0.3, 0.2], + [0.1, 0.4]]) + +mu = F.sum(axis=1) +nu = F.sum(axis=0) + +# (a) +E_X = xs @ mu +E_Y = ys @ nu +E_XY = sum(xs[i] * ys[j] * F[i, j] for i in range(2) for j in range(2)) +print(f"E[X] = {E_X}, E[Y] = {E_Y}, E[XY] = {E_XY}") + +# (b) +cov_XY = E_XY - E_X * E_Y +print(f"Cov(X,Y) = {cov_XY:.4f}") + +# (c) +var_X = ((xs - E_X)**2) @ mu +var_Y = ((ys - E_Y)**2) @ nu +cor_XY = cov_XY / np.sqrt(var_X * var_Y) +print(f"Cor(X,Y) = {cor_XY:.4f}") +``` + +For part (d): if $X \perp Y$ then $f_{ij} = \mu_i \nu_j$, so + +$$ +\mathbb{E}[XY] = \sum_i \sum_j x_i y_j \mu_i \nu_j += \left(\sum_i x_i \mu_i\right)\!\left(\sum_j y_j \nu_j\right) += \mathbb{E}[X]\,\mathbb{E}[Y] +$$ + +and therefore $\text{Cov}(X,Y) = \mathbb{E}[XY] - \mathbb{E}[X]\mathbb{E}[Y] = 0$. + +```{solution-end} +``` + +```{exercise} +:label: prob_matrix_ex3 + +**Sum of Two Dice (Convolution)** + +Let $X$ and $Y$ each be uniformly distributed on $\{1,2,3,4,5,6\}$, and let $Z = X + Y$. + +(a) Use the convolution formula $h_k = \sum_i f_i g_{k-i}$ to compute the distribution of $Z$. + +(b) Plot the theoretical distribution. + +(c) Simulate $10^6$ rolls and overlay the empirical histogram on the plot. + +(d) Compute $\mathbb{E}[Z]$ and $\text{Var}(Z)$ both from the theoretical distribution and from the simulation. 
+``` + +```{solution-start} prob_matrix_ex3 +:class: dropdown +``` + +```{code-cell} ipython3 +import numpy as np +import matplotlib.pyplot as plt + +# (a) convolution +f = np.ones(6) / 6 +h = np.convolve(f, f) # Z takes values 2,...,12 +z_vals = np.arange(2, 13) + +# (b & c) plot theory and simulation +n = 1_000_000 +z_sim = np.random.randint(1, 7, n) + np.random.randint(1, 7, n) +counts = np.bincount(z_sim, minlength=13)[2:] + +fig, ax = plt.subplots() +ax.bar(z_vals - 0.2, h, 0.4, alpha=0.7, label='Theoretical') +ax.bar(z_vals + 0.2, counts / n, 0.4, alpha=0.7, label='Empirical') +ax.set_xlabel('Z = X + Y') +ax.set_ylabel('Probability') +ax.legend() +plt.show() + +# (d) moments +E_Z = z_vals @ h +Var_Z = ((z_vals - E_Z)**2) @ h +print(f"Theory: E[Z] = {E_Z:.2f}, Var(Z) = {Var_Z:.4f}") +print(f"Simulation: E[Z] = {np.mean(z_sim):.2f}, Var(Z) = {np.var(z_sim):.4f}") +``` + +```{solution-end} +``` + +```{exercise} +:label: prob_matrix_ex4 + +**Multi-Step Transition Probabilities** + +Consider a two-state Markov chain with transition matrix + +$$ +P = \begin{bmatrix} 0.9 & 0.1 \\ 0.2 & 0.8 \end{bmatrix} +$$ + +where $p_{ij} = \text{Prob}\{X(t+1)=j \mid X(t)=i\}$. + +(a) Starting from $\psi_0 = [1, 0]$, compute $\psi_n = \psi_0 P^n$ for $n = 1, 5, 20, 100$. + +(b) Find the stationary distribution $\psi^*$ satisfying $\psi^* P = \psi^*$ and $\sum_i \psi^*_i = 1$. + +(c) Verify numerically that $\psi_n \to \psi^*$ as $n$ grows. +``` + +```{solution-start} prob_matrix_ex4 +:class: dropdown +``` + +```{code-cell} ipython3 +import numpy as np + +P = np.array([[0.9, 0.1], + [0.2, 0.8]]) +psi0 = np.array([1.0, 0.0]) + +# (a) +for n in [1, 5, 20, 100]: + print(f"psi_{n:3d} = {psi0 @ np.linalg.matrix_power(P, n)}") + +# (b) stationary: solve (P^T - I) psi = 0 with sum = 1 +A = np.vstack([P.T - np.eye(2), np.ones(2)]) +b = np.array([0.0, 0.0, 1.0]) +psi_star, *_ = np.linalg.lstsq(A, b, rcond=None) +print(f"\nStationary distribution: {psi_star}") + +# (c) verify +psi_100 = psi0 @ np.linalg.matrix_power(P, 100) +print(f"psi_100 close to stationary? {np.allclose(psi_100, psi_star, atol=1e-6)}") +``` + +```{solution-end} +``` + +```{exercise} +:label: prob_matrix_ex5 + +**Fréchet–Hoeffding Bounds** + +Let $X \in \{0,1\}$ and $Y \in \{0,1\}$ with marginals $\mu = [0.5,\, 0.5]$ and $\nu = [0.4,\, 0.6]$. + +(a) Construct the **comonotone** (upper Fréchet) coupling that puts as much mass as possible on the diagonal $\{X=i, Y=i\}$. + +(b) Construct the **counter-monotone** (lower Fréchet) coupling that puts as much mass as possible on the anti-diagonal. + +(c) Construct the **independent** coupling $f^{\perp}_{ij} = \mu_i \nu_j$. + +(d) Verify that all three have the correct marginals. + +(e) For each coupling compute $\text{Cor}(X,Y)$. Which maximises / minimises the correlation? 
+``` + +```{solution-start} prob_matrix_ex5 +:class: dropdown +``` + +```{code-cell} ipython3 +import numpy as np + +xs = np.array([0, 1]) +ys = np.array([0, 1]) +mu = np.array([0.5, 0.5]) +nu = np.array([0.4, 0.6]) + +# (a) upper Fréchet: maximise P(X=i, Y=i) +F_upper = np.array([[0.4, 0.1], + [0.0, 0.5]]) + +# (b) lower Fréchet: maximise P(X=i, Y=1-i) +F_lower = np.array([[0.0, 0.5], + [0.4, 0.1]]) + +# (c) independent +F_indep = np.outer(mu, nu) + +# (d) check marginals +for F, name in [(F_upper, "Upper Fréchet"), + (F_lower, "Lower Fréchet"), + (F_indep, "Independent ")]: + print(f"{name}: row sums = {F.sum(axis=1)}, col sums = {F.sum(axis=0)}") + +# (e) correlations +def correlation(F, xs, ys): + mu_x = F.sum(axis=1) + nu_y = F.sum(axis=0) + E_X = xs @ mu_x + E_Y = ys @ nu_y + E_XY = sum(xs[i]*ys[j]*F[i,j] for i in range(2) for j in range(2)) + cov = E_XY - E_X * E_Y + sig_X = np.sqrt(((xs - E_X)**2) @ mu_x) + sig_Y = np.sqrt(((ys - E_Y)**2) @ nu_y) + return cov / (sig_X * sig_Y) + +print(f"\nCor upper Fréchet = {correlation(F_upper, xs, ys):.4f} (maximum)") +print(f"Cor lower Fréchet = {correlation(F_lower, xs, ys):.4f} (minimum)") +print(f"Cor independent = {correlation(F_indep, xs, ys):.4f}") +``` + +```{solution-end} +``` + +```{exercise} +:label: prob_matrix_ex6 + +**Bayes' Law with a Discrete Prior** + +A coin has unknown bias $\theta \in \{0.2,\, 0.5,\, 0.8\}$ with prior $\pi = [0.25,\, 0.50,\, 0.25]$. + +(a) After observing $k = 7$ heads in $n = 10$ flips, compute the likelihood + +$$ +\mathcal{L}(\theta \mid \text{data}) = \binom{10}{7}\,\theta^7\,(1-\theta)^3 +$$ + +for each $\theta$. + +(b) Apply equation {eq}`eq:condprobbayes` to compute the posterior $\pi(\theta \mid \text{data})$. + +(c) Plot the prior and posterior side by side. + +(d) Repeat for $k = 3$ heads and describe how the posterior shifts. 
+``` + +```{solution-start} prob_matrix_ex6 +:class: dropdown +``` + +```{code-cell} ipython3 +import numpy as np +import matplotlib.pyplot as plt +from scipy.special import comb + +thetas = np.array([0.2, 0.5, 0.8]) +prior = np.array([0.25, 0.50, 0.25]) + +def compute_posterior(k, n, thetas, prior): + likelihood = comb(n, k) * thetas**k * (1 - thetas)**(n - k) + unnorm = likelihood * prior + return unnorm / unnorm.sum(), likelihood + +post7, lik7 = compute_posterior(7, 10, thetas, prior) +post3, lik3 = compute_posterior(3, 10, thetas, prior) + +print("k=7: likelihood =", lik7.round(4), " posterior =", post7.round(4)) +print("k=3: likelihood =", lik3.round(4), " posterior =", post3.round(4)) + +x = np.arange(len(thetas)) +w = 0.3 +fig, axes = plt.subplots(1, 2, figsize=(10, 4)) +for ax, post, title in zip(axes, [post7, post3], ['k=7 heads', 'k=3 heads']): + ax.bar(x - w/2, prior, w, label='Prior', alpha=0.7) + ax.bar(x + w/2, post, w, label='Posterior', alpha=0.7) + ax.set_xticks(x) + ax.set_xticklabels([f'θ={t}' for t in thetas]) + ax.set_ylabel('Probability') + ax.set_title(title) + ax.legend() +plt.tight_layout() +plt.show() +``` + +```{solution-end} +``` From cc8d3dcc5db8944f35e2f9736cc050f587fa0de0 Mon Sep 17 00:00:00 2001 From: Humphrey Yang Date: Tue, 21 Apr 2026 15:46:35 +1000 Subject: [PATCH 03/10] updates --- .../quantecon-lecture-writing.instructions.md | 87 ----- .github/prompts/quantecon-lecture.prompt.md | 209 ----------- lectures/information_market_equilibrium.md | 352 +++++++++++------- lectures/lagrangian_lqdp.md | 60 ++- lectures/multivariate_normal.md | 160 ++++---- lectures/prob_matrix.md | 222 ++++++----- 6 files changed, 431 insertions(+), 659 deletions(-) delete mode 100644 .github/instructions/quantecon-lecture-writing.instructions.md delete mode 100644 .github/prompts/quantecon-lecture.prompt.md diff --git a/.github/instructions/quantecon-lecture-writing.instructions.md b/.github/instructions/quantecon-lecture-writing.instructions.md deleted file mode 100644 index ee3b607f7..000000000 --- a/.github/instructions/quantecon-lecture-writing.instructions.md +++ /dev/null @@ -1,87 +0,0 @@ ---- -applyTo: "lectures/**/*.md" -description: "MyST markdown and QuantEcon lecture writing conventions. Applied when editing or creating files in the lectures/ directory." ---- - -# QuantEcon Lecture Writing Conventions - -## Equation Spacing (Critical) - -Display equations **must** have a blank line before `$$` and after `$$`: - -``` -text before - -$$ -equation here -$$ - -text after -``` - -Never place `$$` immediately adjacent to text lines. - -## File Frontmatter - -Every lecture `.md` file starts with: - -```yaml ---- -jupytext: - text_representation: - extension: .md - format_name: myst - format_version: 0.13 - jupytext_version: 1.17.1 -kernelspec: - display_name: Python 3 (ipykernel) - language: python - name: python3 ---- -``` - -## Cross-Reference Label - -Immediately after frontmatter, before the title: - -``` -(lecture_label)= -```{raw} jupyter -
...
-``` - -# Title -``` - -## Code Cells - -All executable Python uses `` ```{code-cell} ipython3 ``. -Non-executable code uses `` ```python ``. - -## Citations and References - -- Cite with `{cite}` `` `BibKey` `` -- Check `lectures/_static/quant-econ.bib` for existing keys before adding new ones -- New references go in a separate `_extra.bib` file alongside the lecture - -## Exercises - -Use the paired directives: - -``` -```{exercise} -:label: label_ex1 -... -``` - -```{solution-start} label_ex1 -:class: dropdown -``` -... -```{solution-end} -``` -``` - -## Preferred Python Libraries - -`numpy`, `scipy`, `matplotlib`, `quantecon`, `jax` (for computationally intensive work), `numba` diff --git a/.github/prompts/quantecon-lecture.prompt.md b/.github/prompts/quantecon-lecture.prompt.md deleted file mode 100644 index 453651434..000000000 --- a/.github/prompts/quantecon-lecture.prompt.md +++ /dev/null @@ -1,209 +0,0 @@ ---- -name: "QuantEcon Lecture from Paper" -description: "Convert a scientific paper (PDF or .tex) into a QuantEcon lecture in MyST markdown. Attach the paper file before invoking. Produces a .md lecture file and a supplementary .bib file." -argument-hint: "Attach the paper PDF or .tex file, then optionally specify the desired output filename (e.g. 'my_topic.md')" -agent: "agent" ---- - -You are helping Thomas Sargent convert a scientific paper into a QuantEcon lecture written in the MyST dialect of markdown, following the style and conventions of [lectures/likelihood_ratio_process.md](../lectures/likelihood_ratio_process.md). - -## Your Task - -1. **Read the attached paper** (PDF or .tex). Understand its core economic/mathematical content, key results, key intuitions, and analytical techniques. - -2. **Draft a complete QuantEcon lecture** as a `.md` file in `lectures/`. The lecture should: - - Explain the paper's ideas accessibly to a graduate student audience - - Lead the reader through the theory step by step, not just summarize - - Include substantial Python code cells that illustrate, compute, and visualize the paper's key results - - End with exercises (with full solutions in dropdown blocks) - -3. **Produce a supplementary `.bib` file** for any references not already in [lectures/_static/quant-econ.bib](../lectures/_static/quant-econ.bib). - ---- - -## MyST / Jupyter Book Format Rules - -Follow these rules exactly. Study [lectures/likelihood_ratio_process.md](../lectures/likelihood_ratio_process.md) as the canonical example. - -### File Frontmatter (required, verbatim structure) - -``` ---- -jupytext: - text_representation: - extension: .md - format_name: myst - format_version: 0.13 - jupytext_version: 1.17.1 -kernelspec: - display_name: Python 3 (ipykernel) - language: python - name: python3 ---- -``` - -### Required Header Block - -Immediately after the frontmatter, add a cross-reference label and the QuantEcon logo block: - -``` -(my_lecture_label)= -```{raw} jupyter - -``` - -# Lecture Title - -```{contents} Contents -:depth: 2 -``` -``` - -### Equations — CRITICAL SPACING RULE - -Every display equation block **must** have a blank line before the opening `$$` and a blank line after the closing `$$`. This is mandatory. - -**Correct:** - -``` -some text before - -$$ -E[x] = \mu -$$ - -some text after -``` - -**Wrong (will break the build):** - -``` -some text before -$$ -E[x] = \mu -$$ -some text after -``` - -Inline math uses single dollars: $\mu$, $\sigma^2$. 
- -Multi-line aligned equations use: - -``` -$$ -\begin{aligned} -a &= b \\ -c &= d -\end{aligned} -$$ -``` - -### Code Cells - -Use ` ```{code-cell} ipython3 ` for all executable Python. For the `pip install` cell at the top (if needed): - -``` -```{code-cell} ipython3 -:tags: [hide-output] -!pip install --upgrade quantecon -``` -``` - -### Citations - -Use `{cite}` with the BibTeX key: `{cite}` `` `Author_Year` ``. Example: `{cite}` `` `Neyman_Pearson` ``. - -Check [lectures/_static/quant-econ.bib](../lectures/_static/quant-econ.bib) first. Add only truly missing references to the new `.bib` file. - -### Cross-references - -- Link to other lectures: `{doc}` `` `likelihood_ratio_process` `` -- Label a section: `(my_label)=` on the line before the heading -- Reference a label: `{ref}` `` `my_label` `` - -### Admonitions - -``` -```{note} -... -``` - -```{warning} -... -``` -``` - -### Exercises with Solutions - -``` -```{exercise} -:label: ex_label1 - -Exercise text here. -``` - -```{solution-start} ex_label1 -:class: dropdown -``` - -Full solution here, including code cells if needed. - -```{solution-end} -``` -``` - ---- - -## Lecture Structure Template - -Follow this section order: - -1. **Overview** — What is this lecture about? What will the reader learn? List bullets. -2. **Setup** — Imports code cell (all needed libraries). If non-standard packages are needed, add the `pip install` cell first. -3. **Theory sections** — Walk through mathematical content. Alternate prose, equations, and code cells. Each major concept gets its own `##` section. -4. **Computational/Simulation sections** — Python code that replicates or extends the paper's numerical results. -5. **Exercises** — 2–4 exercises ranging from straightforward to challenging, each with a full solution. -6. **References** — at the end, just add: `` ```{bibliography} `` on its own if references were cited (the global bib handles this automatically via `_config.yml`). - ---- - -## Python Code Guidelines - -- Use `numpy`, `scipy`, `matplotlib`, `quantecon` as the default stack -- Prefer `jax.numpy` / JAX for computationally intensive sections (this repo already has JAX installed) -- Every figure should call `plt.show()` or `plt.tight_layout(); plt.show()` -- Write clean, readable code with short docstrings on functions -- Simulate and plot the paper's key theoretical results rather than just describing them - ---- - -## Supplementary BibTeX File - -Name it `lectures/_static/_extra.bib`. Format example: - -```bibtex -@article{Author_Year, - author = {Last, First and Last2, First2}, - title = {Full Title of the Paper}, - journal = {Journal Name}, - volume = {XX}, - number = {Y}, - pages = {1--30}, - year = {YYYY} -} -``` - -Only include references **not already found** in `lectures/_static/quant-econ.bib`. - ---- - -## Output - -Produce the complete lecture as a single MyST markdown file. After completing it, also report: -- The name and path of the output file (e.g. 
`lectures/my_topic.md`) -- The name and path of the supplementary bib file (if any new references were needed) -- A brief (3–5 bullet) summary of what the lecture covers diff --git a/lectures/information_market_equilibrium.md b/lectures/information_market_equilibrium.md index 222f9bccb..851560d7b 100644 --- a/lectures/information_market_equilibrium.md +++ b/lectures/information_market_equilibrium.md @@ -28,42 +28,51 @@ kernelspec: ## Overview -This lecture studies two questions about the **informational role of prices** posed and +This lecture studies two questions about the **informational role of prices** posed and answered by {cite:t}`kihlstrom_mirman1975`. -1. **When do prices transmit inside information?** An informed insider observes a private +1. *When do prices transmit inside information?* + - An informed insider observes a private signal correlated with an unknown state of the world and adjusts demand accordingly. - Equilibrium prices shift. Under what conditions can an outside observer *infer* the + - Equilibrium prices shift. + - Under what conditions can an outside observer *infer* the insider's private signal from the equilibrium price? -2. **Do Bayesian price expectations converge?** In a stationary stochastic exchange - economy, an uninformed observer uses the history of market prices and Bayes' Law to form - expectations about the economy's structure. Do those expectations eventually - agree with those of a fully informed observer? +2. *Do Bayesian price expectations converge?* + - In a stationary stochastic exchange + economy, an uninformed observer uses the history of market prices and Bayes' Law to form + expectations about the economy's structure. + - Do those expectations eventually + agree with those of a fully informed observer? Kihlstrom and Mirman's answers rely on two classical ideas from statistics: -- **Blackwell sufficiency**: a random variable $\tilde{y}$ is said to be *sufficient* for a random variable +- **Blackwell sufficiency**: a random variable $\tilde{y}$ is said to be *sufficient* for a random variable $\tilde{y}'$ with respect to an unknown state if knowing $\tilde{y}$ gives all the information about the state that $\tilde{y}'$ contains. - **Bayesian consistency**: as the sample grows, a Bayesian statistician's posterior probability distribution concentrates on the true - parameter value *even when the underlying economic structure is not globally identified from prices alone*. + parameter value, even when the underlying economic structure is not globally identified from prices alone. Important findings of {cite:t}`kihlstrom_mirman1975` are: -- Equilibrium prices transmit inside information **if and only if** the map from the +- Equilibrium prices transmit inside information *if and only if* the map from the insider's posterior distribution to the equilibrium price vector is invertible (one-to-one). - For a two-state pure exchange economy with CES preferences, invertibility holds whenever the - elasticity of substitution $\sigma \neq 1$. With Cobb-Douglas preferences ($\sigma = 1$) + elasticity of substitution $\sigma \neq 1$. + - With Cobb-Douglas preferences ($\sigma = 1$) the equilibrium price is independent of the insider's posterior, so information is never transmitted. -- In the dynamic economy, as information accumulates, Bayesian price expectations converge to **rational expectations**, even when the deep structure of the economy is not identified. 
+- In the dynamic economy, as information accumulates, Bayesian price expectations converge to **rational expectations**, even when the deep structure of the economy is not identified. ```{note} -{cite:t}`kihlstrom_mirman1975` use the terms ''reduced form'' and ''structural'' models in a -way that careful econometricians do. Reduced-form and structural models come in pairs. To each structure or structural model -there is a reduced form, or collection of reduced forms, underlying different possible regressions. +{cite:t}`kihlstrom_mirman1975` use the terms "reduced form" and "structural" models in a +way that careful econometricians do. + +Reduced-form and structural models come in pairs. + +To each structure or structural model +there is a reduced form, or collection of reduced forms, underlying different possible regressions. ``` The lecture is organized as follows. @@ -71,7 +80,7 @@ The lecture is organized as follows. 1. Set up the static two-commodity model and define equilibrium. 2. State the price-revelation theorem (Theorem 1 of the paper) and the invertibility conditions (Theorem 2). -3. Illustrate invertibility — and its failure — with numerical examples using CES and +3. Illustrate invertibility and its failure with numerical examples using CES and Cobb-Douglas preferences. 4. Introduce the dynamic stochastic economy and derive the Bayesian convergence result. 5. Simulate Bayesian learning from price observations. @@ -87,9 +96,9 @@ from scipy.optimize import brentq from scipy.stats import norm ``` -## A Two-Commodity Economy with an Informed Insider +## A two-commodity economy with an informed insider -### Preferences, Endowments, and the Unknown State +### Preferences, endowments, and the unknown state The economy has two goods. @@ -115,7 +124,7 @@ representative firm. The firm's profit $\pi$ is determined by profit maximization. Agent -$i$'s **budget constraint** is +$i$'s budget constraint is $$ p x_1^i + x_2^i = w^i + \theta^i \pi. @@ -126,7 +135,7 @@ Agents maximize expected utility subject to their budget constraints. A **competitive equilibrium** is a price $\hat{p}$ that clears both markets simultaneously. -### The Informed Agent's Problem +### The informed agent's problem Suppose **agent 1** (the insider) observes a private signal $\tilde{y}$ correlated with $\bar{a}$ before trading. @@ -151,9 +160,9 @@ This is possible when the map $\mu \mapsto p(\mu)$ is **invertible** on the relevant domain. (price_revelation_theorem)= -## Price Revelation: Theorem 1 +## Price revelation -### Blackwell Sufficiency +### Blackwell sufficiency The price variable $p(\mu_{\tilde{y}})$ *accurately transmits* the insider's private information if observing the equilibrium price is just as informative about $\bar{a}$ as @@ -162,9 +171,12 @@ observing the signal $\tilde{y}$ directly. In Blackwell's language ({cite}`blackwell1951` and {cite}`blackwell1953`), this means $p(\mu_{\tilde{y}})$ is **sufficient** for $\tilde{y}$. -**Definition.** A random variable $\tilde{y}$ is *sufficient* for $\tilde{y}'$ (with +```{prf:definition} Sufficiency +:label: ime_def_sufficiency + +A random variable $\tilde{y}$ is *sufficient* for $\tilde{y}'$ (with respect to $\bar{a}$) if there exists a conditional distribution $PR(y' \mid y)$, -**independent of**$\bar{a}$, such that +**independent of** $\bar{a}$, such that $$ \phi'_a(y') = \sum_{y \in Y} PR(y' \mid y)\, \phi_a(y) @@ -175,11 +187,17 @@ where $\phi_a(y) = PR(\tilde{y} = y \mid \bar{a} = a)$. 
Thus, once $\tilde{y}$ is known, $\tilde{y}'$ provides no additional information about $\bar{a}$. +``` + +```{prf:lemma} Posterior Sufficiency +:label: ime_lemma_posterior_sufficiency -**Lemma 1** ({cite:t}`kihlstrom_mirman1975`). The posterior distribution $\mu_{\tilde{y}}$ -is sufficient for $\tilde{y}$. +({cite:t}`kihlstrom_mirman1975`) The posterior distribution $\mu_{\tilde{y}}$ +is sufficient for $\tilde{y}$. +``` -*Proof sketch.* The posterior $\mu_{\tilde{y}}$ satisfies +```{prf:proof} (Sketch) +The posterior $\mu_{\tilde{y}}$ satisfies $$ PR(\bar{a} = a_s \mid \mu_{\tilde{y}} = \mu_y,\; \tilde{y} = y) = \mu_{ys} @@ -187,9 +205,13 @@ PR(\bar{a} = a_s \mid \mu_{\tilde{y}} = \mu_y,\; \tilde{y} = y) = \mu_{ys} $$ Because the posterior itself *encodes* what $\tilde{y}$ says about $\bar{a}$, observing -$\tilde{y}$ directly would add no information. $\square$ +$\tilde{y}$ directly would add no information. +``` + +```{prf:theorem} Price Revelation +:label: ime_theorem_price_revelation -**Theorem 1** ({cite:t}`kihlstrom_mirman1975`). In the economy described above, the price +In the economy described above, the price random variable $p(\mu_{\tilde{y}})$ is sufficient for $\tilde{y}$ **if and only if** the function $p(PR^1)$ is **invertible** on the set @@ -197,25 +219,34 @@ $$ P \equiv \bigl\{\, p(\mu_y) : y \in Y,\; PR(\tilde{y} = y) = \sum_{a \in A} \phi_a(y)\,\mu(a) > 0 \bigr\}. $$ +``` The "only if" direction follows because if $p$ were not one-to-one, two different posteriors would generate the same price; an observer could not distinguish them, so the price would -not transmit all information that resides in the signal. +not transmit all information that resides in the signal. + +### Two interpretations + +#### Insider trading in a stock market -### Two Interpretations +Good 1 is a risky asset with random return $\bar{a}$; good 2 is "money". + +An insider's demand reveals private information about the return. -**Insider trading in a stock market.** Good 1 is a risky asset with random return $\bar{a}$; -good 2 is ''money''. An insider's demand reveals private information about the return. If the invertibility condition holds, outside observers can read the insider's signal from the equilibrium stock price. -**Price as a quality signal.** Good 1 has uncertain quality $\bar{a}$. Experienced -consumers (who have sampled the good) observe a signal correlated with quality and buy -accordingly. Uninformed consumers can infer quality from the market price, provided -invertibility holds. +#### Price as a quality signal + +Good 1 has uncertain quality $\bar{a}$. + +Experienced consumers (who have sampled the good) observe a signal correlated with quality +and buy accordingly. + +Uninformed consumers can infer quality from the market price, provided invertibility holds. (invertibility_conditions)= -## Invertibility and the Elasticity of Substitution (Theorem 2) +## Invertibility and the elasticity of substitution When does $p(PR^1)$ fail to be invertible? @@ -223,7 +254,7 @@ Theorem 2 of {cite:t}`kihlstrom_mirman1975` shows that for a two-state economy ($S = 2$), the answer turns on the **elasticity of substitution** $\sigma$ of agent 1's utility function. -### The Two-State First-Order Condition +### The two-state first-order condition With $S = 2$ and $\mu = (q,\, 1-q)$, the first-order condition for agent 1's demand (equation (12a) in the paper) reduces to @@ -242,21 +273,25 @@ $$ The equilibrium consumption $(x_1, x_2)$ itself depends on $p$, so this is an implicit equation in $p$. 
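
Although the price is defined only implicitly, both directions of the map are easy to
compute numerically: given $q$, a root-finder delivers $p$, while given an observed $p$,
the first-order condition is *linear* in $q$ and inverts in closed form.

Here is a minimal sketch of both steps, with market clearing imposed as in the next
subsection ($x_1 = 1$, $x_2 = W^1 - p$). The values $a_1 = 2$, $a_2 = 0.5$, $W^1 = 4$,
$\rho = 0.5$ repeat the lecture's parameterization; the posterior $q = 0.7$ is an
arbitrary illustrative choice, and the helper functions are stand-ins for the
`ces_derivatives` routine defined later:

```{code-cell} ipython3
from scipy.optimize import brentq

def u1_ces(c1, c2, ρ):
    return (c1**ρ + c2**ρ)**(1/ρ - 1) * c1**(ρ - 1)

def u2_ces(c1, c2, ρ):
    return (c1**ρ + c2**ρ)**(1/ρ - 1) * c2**(ρ - 1)

a1_, a2_, W1_, ρ_ = 2.0, 0.5, 4.0, 0.5

def foc(p, q):
    "Residual of the first-order condition at x1 = 1, x2 = W1 - p."
    α = lambda a: a * u1_ces(a, W1_ - p, ρ_)   # α_s
    β = lambda a: u2_ces(a, W1_ - p, ρ_)       # β_s
    return q*α(a1_) + (1-q)*α(a2_) - p*(q*β(a1_) + (1-q)*β(a2_))

# forward map: the insider's posterior q pins down the price p
p_star = brentq(foc, 1e-8, W1_ - 1e-8, args=(0.7,))

# inverse map: an observer recovers q from p, using linearity of the FOC in q
α1, α2 = a1_*u1_ces(a1_, W1_ - p_star, ρ_), a2_*u1_ces(a2_, W1_ - p_star, ρ_)
β1, β2 = u2_ces(a1_, W1_ - p_star, ρ_), u2_ces(a2_, W1_ - p_star, ρ_)
q_hat = (α2 - p_star*β2) / ((α2 - p_star*β2) - (α1 - p_star*β1))

print(f"p*(q=0.7) = {p_star:.4f}, recovered q = {q_hat:.4f}")
```

Recovering $q$ exactly from the observed $p$ is the invertibility property that the next
theorem characterizes.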
-**Theorem 2** ({cite:t}`kihlstrom_mirman1975`). Assume $u^1$ is quasi-concave and -homothetic with continuous first partials. Assume agent 1 always consumes positive -quantities of both goods. For $S = 2$: +```{prf:theorem} Invertibility Conditions +:label: ime_theorem_invertibility_conditions + +Assume $u^1$ is quasi-concave and +homothetic with continuous first partials. Assume agent 1 always consumes positive +quantities of both goods. For $S = 2$: - If $\sigma < 1$ for all feasible allocations, $p(PR^1)$ is **invertible** on $P$. - If $\sigma > 1$ for all feasible allocations, $p(PR^1)$ is **invertible** on $P$. - If $u^1$ is **Cobb-Douglas** ($\sigma = 1$), $p(PR^1)$ is **constant** on $P$ (no information is transmitted). +``` Thus, when $\sigma = 1$ the income and substitution effects exactly cancel, making agent 1's demand for good 1 independent of information about $\bar{a}$. So the market price cannot reveal that information. -### CES Utility +### CES utility For concreteness we work with the **constant-elasticity-of-substitution** (CES) utility function @@ -278,10 +313,10 @@ u_1(c_1,c_2) = \bigl(c_1^\rho + c_2^\rho\bigr)^{1/\rho - 1}\, c_1^{\rho-1}, \qqu u_2(c_1,c_2) = \bigl(c_1^\rho + c_2^\rho\bigr)^{1/\rho - 1}\, c_2^{\rho-1}. $$ -### Equilibrium Price as a Function of the Posterior +### Equilibrium price as a function of the posterior We focus on agent 1 as the *only* informed trader who absorbs one unit of good 1 at -equilibrium (i.e.~$x_1 = 1$). +equilibrium (i.e., $x_1 = 1$). Agent 1's budget constraint then reduces to $x_2 = W^1 - p$, and the equilibrium price is the unique $p \in (0, W^1)$ satisfying @@ -292,11 +327,11 @@ p \bigl[q\, u_2(a_1,\, W^1-p) + (1-q)\, u_2(a_2,\, W^1-p)\bigr] = q\, a_1\, u_1(a_1,\, W^1-p) + (1-q)\, a_2\, u_1(a_2,\, W^1-p). $$ -For Cobb-Douglas utility ($\sigma = 1$), first-order-necessary conditions (FOC) become $p = W^1 - p$, +For Cobb-Douglas utility ($\sigma = 1$), the first-order condition becomes $p = W^1 - p$, giving $p^* = W^1/2$ regardless of the posterior $q$—confirming that no information is transmitted through the price in the Cobb-Douglas case. -We compute first-order-necessary conditions numerically below. +We compute first-order conditions numerically below. ```{code-cell} ipython3 def ces_derivatives(c1, c2, rho): @@ -351,6 +386,12 @@ def eq_price(q, a1, a2, W1, rho): ``` ```{code-cell} ipython3 +--- +mystnb: + figure: + caption: equilibrium price vs posterior + name: fig-eq-price-posterior +--- # ── Economy parameters ────────────────────────────────────────────────────── a1, a2 = 2.0, 0.5 # state values (a1 > a2) W1 = 4.0 # informed agent's wealth; equilibrium x2 = W1 - p @@ -371,17 +412,15 @@ for rho, label, color in zip(rho_values, rho_labels, colors): prices = [eq_price(q, a1, a2, W1, rho) for q in q_grid] ax.plot(q_grid, prices, label=label, color=color, lw=2) -ax.set_xlabel(r"Posterior probability $q = \Pr(\bar{a} = a_1)$", fontsize=12) -ax.set_ylabel("Equilibrium price $p^*(q)$", fontsize=12) -ax.set_title("Equilibrium price as a function of the informed agent's posterior", - fontsize=12) +ax.set_xlabel(r"posterior probability $q = \Pr(\bar{a} = a_1)$", fontsize=12) +ax.set_ylabel("equilibrium price $p^*(q)$", fontsize=12) ax.legend(fontsize=10) ax.grid(alpha=0.3) plt.tight_layout() plt.show() ``` -The plot confirms Theorem 2. +The plot confirms {prf:ref}`ime_theorem_invertibility_conditions`. - **CES with $\sigma \neq 1$**: the equilibrium price is **strictly monotone** in $q$. 
An outside observer who knows the equilibrium map $p^*(\cdot)$ can uniquely invert the @@ -390,7 +429,7 @@ The plot confirms Theorem 2. transmitted through the market. ```{code-cell} ipython3 -# ── Verify that rho=0 (exact Cobb-Douglas) gives a flat line ───────────────── +# Verify that rho=0 (exact Cobb-Douglas) gives a flat line p_cd = [eq_price(q, a1, a2, W1, rho=0.0) for q in q_grid] print(f"Cobb-Douglas (rho=0): min p* = {min(p_cd):.6f}, " @@ -403,7 +442,7 @@ Every entry equals $W^1/2 = 2.0$ exactly, confirming analytically that the Cobb- equilibrium price is independent of $q$ and of the state values $a_1, a_2$. (price_monotonicity)= -### Why Monotonicity Depends on $\sigma$ +### Why monotonicity depends on $\sigma$ The derivative $\partial p / \partial q$ has the sign of $\alpha_1 \beta_2 - \alpha_2 \beta_1$ (from differentiating the FOC formula). @@ -435,6 +474,12 @@ Let us visualize the ratio $\alpha_s / \beta_s$ as a function of $a_s$ for diffe values of $\sigma$: ```{code-cell} ipython3 +--- +mystnb: + figure: + caption: marginal rate of substitution + name: fig-mrs-alpha-beta +--- a_vals = np.linspace(0.3, 3.0, 300) x1_fix, x2_fix = 1.0, 1.0 # fix consumption bundle for illustration @@ -448,9 +493,8 @@ for rho, color in zip([-0.5, -1e-6, 0.5], ["steelblue", "crimson", "forestgreen" ax.plot(a_vals, ratios, label=rf"$\sigma = {sigma:.2f}$", color=color, lw=2) -ax.set_xlabel(r"State value $a_s$", fontsize=12) +ax.set_xlabel(r"state value $a_s$", fontsize=12) ax.set_ylabel(r"$\alpha_s / \beta_s = a_s u_1 / u_2$", fontsize=12) -ax.set_title(r"Marginal rate of substitution $\alpha_s/\beta_s$ vs.\ $a_s$", fontsize=12) ax.axhline(y=1.0, color="black", lw=0.8, ls="--") ax.legend(fontsize=10) ax.grid(alpha=0.3) @@ -466,13 +510,15 @@ ratio is decreasing in $a_s$, and for $\sigma > 1$ it is increasing, making the equilibrium price strictly monotone in the posterior $q$ in both cases. (bayesian_price_expectations)= -## Bayesian Price Expectations in a Dynamic Economy +## Bayesian price expectations in a dynamic economy We now turn to a question addressed in Section 3 of {cite:t}`kihlstrom_mirman1975`. -### A Stochastic Exchange Economy +### A stochastic exchange economy + +Time is discrete: $t = 1, 2, \ldots$ -Time is discrete: $t = 1, 2, \ldots$ In each period $t$: +In each period $t$: 1. Consumer $i$ receives a random endowment $\omega_i^t$. 2. Markets open; competitive prices $p^t = p(\omega^t)$ clear all markets. @@ -493,7 +539,7 @@ $$ Following econometric convention, {cite:t}`kihlstrom_mirman1975` call $g(p \mid \lambda)$ the **reduced form** and $f(\omega \mid \lambda)$ the **structure**. -### The Identification Problem +### The identification problem Because the map $\omega \mapsto p(\omega)$ is many-to-one, observing prices loses information relative to observing endowments. @@ -511,9 +557,10 @@ form** (with respect to data on prices). An observer who knows the infinite price history learns $\mu$ but not necessarily $\lambda$. -### Bayesian Updating +### Bayesian updating An uninformed observer begins with a prior $h(\lambda)$ over $\lambda \in \Lambda$. + After observing the price sequence $(p^1, \ldots, p^t)$, the observer's Bayesian posterior is @@ -532,45 +579,57 @@ g(p^{t+1} \mid p^1, \ldots, p^t) h(\lambda \mid p^1, \ldots, p^t). 
$$

### The convergence theorem

```{prf:theorem} Bayesian Convergence
:label: ime_theorem_bayesian_convergence

Let $\bar\lambda$ be the true
structural parameter and $\bar\mu$ the reduced form that contains $\bar\lambda$.

Then

$$
\lim_{t \to \infty} h(\mu \mid p^1, \ldots, p^t) =
\begin{cases} 1 & \text{if } \mu = \bar\mu, \\ 0 & \text{otherwise,} \end{cases}
$$

with probability one.

Consequently,

$$
\lim_{t \to \infty} g(p^{t+1} \mid p^1, \ldots, p^t) = g(p \mid \bar\mu),
$$

which equals the rational-expectations price distribution for a fully informed observer.
```

Establishing convergence relies on the **Bayesian consistency** result of {cite:t}`degroot1962`: as
long as $g(\cdot \mid \mu)$ and $g(\cdot \mid \mu')$ generate mutually singular measures
(which holds here generically), the posterior concentrates on the true reduced form.

Price observers converge to **rational expectations** even if they never identify the
underlying structure $\bar\lambda$.

The reduced form $g(p \mid \bar\mu)$ is the statistical model that governs equilibrium price
expectations, and the Bayesian observer learns it from prices alone.

(bayesian_simulation)=
## Simulating Bayesian learning from prices

We illustrate the theorem with a two-state example.

Two possible reduced forms $\mu_1$ and $\mu_2$ generate prices
$p^t \sim N(\bar{p}_i, \sigma_p^2)$ for $i = 1, 2$ respectively.

The observer knows the two possible price distributions (the reduced forms) but not which
one governs the data.

This is a standard **Bayesian model selection** problem. 
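
A useful way to see why the updating below must settle on the truth is to work in
log-odds form: writing $\ell_t = \log\bigl(h_t/(1-h_t)\bigr)$, Bayes' Law adds one
log-likelihood ratio per observation,

$$
\ell_t = \ell_{t-1} + \log \frac{g(p^t \mid \mu_1)}{g(p^t \mid \mu_2)},
$$

and when $\mu_1$ generates the data each increment has mean
$D_{KL}(\mu_1 \| \mu_2) > 0$, so $\ell_t$ drifts to $+\infty$ and $h_t \to 1$ with
probability one; the same divergence governs the speed of convergence studied in the
exercises.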
+ +With a prior $h_0$ on $\mu_1$ and the observed price $p^t$, the posterior weight on $\mu_1$ +after period $t$ is $$ h_t = \frac{h_{t-1}\, g(p^t \mid \mu_1)}{h_{t-1}\, g(p^t \mid \mu_1) @@ -623,22 +682,23 @@ def plot_bayesian_learning(h_paths, p_bar_true, p_bar_alt, ax): ax.plot(t_grid, path, alpha=0.25, lw=0.8, color="steelblue") median_path = np.median(h_paths, axis=0) - ax.plot(t_grid, median_path, color="navy", lw=2.5, label="Median posterior") + ax.plot(t_grid, median_path, color="navy", lw=2, label="median posterior") - ax.axhline(y=1.0, color="black", ls="--", lw=1.2, label="True model weight = 1") - ax.set_xlabel("Period $t$", fontsize=12) + ax.axhline(y=1.0, color="black", ls="--", lw=1.2, label="true model weight = 1") + ax.set_xlabel("period $t$", fontsize=12) ax.set_ylabel(r"$h_t$ = posterior weight on true model", fontsize=12) - ax.set_title( - rf"Bayesian learning: $\bar p_{{\\rm true}}={p_bar_true:.1f}$, " - rf"$\bar p_{{\\rm alt}}={p_bar_alt:.1f}$, $\sigma_p={sigma_p:.2f}$", - fontsize=11, - ) ax.legend(fontsize=10) ax.set_ylim(-0.05, 1.08) ax.grid(alpha=0.3) ``` ```{code-cell} ipython3 +--- +mystnb: + figure: + caption: bayesian learning across paths + name: fig-bayesian-learning +--- T = 300 h0 = 0.5 # diffuse prior n_paths = 40 @@ -650,13 +710,11 @@ fig, axes = plt.subplots(1, 2, figsize=(12, 5)) p_bar_true, p_bar_alt = 2.0, 1.2 h_paths = simulate_bayesian_learning(p_bar_true, p_bar_alt, sigma_p, T, h0, n_paths) plot_bayesian_learning(h_paths, p_bar_true, p_bar_alt, axes[0]) -axes[0].set_title("Easy case: means far apart", fontsize=12) # Case 2: similar reduced forms (harder to learn) p_bar_true, p_bar_alt = 2.0, 1.8 h_paths_hard = simulate_bayesian_learning(p_bar_true, p_bar_alt, sigma_p, T, h0, n_paths) plot_bayesian_learning(h_paths_hard, p_bar_true, p_bar_alt, axes[1]) -axes[1].set_title("Hard case: means close together", fontsize=12) plt.tight_layout() plt.show() @@ -665,12 +723,18 @@ plt.show() In both panels the posterior weight on the true model converges to 1 with probability one, though convergence is slower when the two price distributions are similar (right panel). -### Price Expectations vs. Rational Expectations +### Price expectations vs. rational expectations We now verify that the observer's price expectations converge to the rational-expectations distribution $g(p \mid \bar\mu)$. ```{code-cell} ipython3 +--- +mystnb: + figure: + caption: price distribution convergence + name: fig-price-convergence +--- def price_expectation(h_t, p_bar_true, p_bar_alt, p_grid): """ Compute the observer's predictive price density at posterior weight h_t. @@ -701,11 +765,10 @@ for t_snap, col in zip(snapshots, palette): ax.plot(p_grid, dens, color=col, lw=2, label=rf"$t = {t_snap}$, $h_t = {h_t:.3f}$") -ax.plot(p_grid, re_density, "k--", lw=2.5, - label=r"Rational expectations $g(p \mid \bar\mu)$") -ax.set_xlabel("Price $p$", fontsize=12) -ax.set_ylabel("Density", fontsize=12) -ax.set_title("Observer's price distribution converges to rational expectations", fontsize=12) +ax.plot(p_grid, re_density, "k--", lw=2, + label=r"rational expectations $g(p \mid \bar\mu)$") +ax.set_xlabel("price $p$", fontsize=12) +ax.set_ylabel("density", fontsize=12) ax.legend(fontsize=9) ax.grid(alpha=0.3) plt.tight_layout() @@ -715,11 +778,10 @@ plt.show() The sequence of predictive densities (shades of blue) converges to the rational-expectations density (dashed black line) as experience accumulates. -This illustrates the main theorem of -Section 3 of {cite:t}`kihlstrom_mirman1975`. 
+This illustrates {prf:ref}`ime_theorem_bayesian_convergence`. (km_extension_nonidentification)= -### Learning the Reduced Form without Identifying the Structure +### Learning the reduced form without identifying the structure The convergence result is particularly striking because the observer converges to *rational expectations* even when the underlying **structure** $\lambda$ is @@ -731,6 +793,12 @@ $\mu_1 = \{\lambda^{(1)}, \lambda^{(2)}\}$ and $\mu_2 = \{\lambda^{(3)}\}$ (because $\lambda^{(1)}$ and $\lambda^{(2)}$ generate the same price distribution). ```{code-cell} ipython3 +--- +mystnb: + figure: + caption: learning with non-identification + name: fig-nonidentification +--- def simulate_learning_3struct(T, h0_vec, p_bar_vec, sigma_p, true_idx, n_paths, seed=0): """ Bayesian learning with 3 structures, 2 reduced forms. @@ -775,18 +843,12 @@ for k, (ax, label) in enumerate(zip(axes, struct_labels)): for path in h_paths_3: ax.plot(t_grid, path[:, k], alpha=0.25, lw=0.8, color="steelblue") ax.plot(t_grid, np.median(h_paths_3[:, :, k], axis=0), - color="navy", lw=2.5, label="Median") - ax.set_title(f"Structure {label}", fontsize=10) - ax.set_xlabel("Period $t$", fontsize=11) + color="navy", lw=2, label=f"median weight on {label}") + ax.set_xlabel("period $t$", fontsize=11) ax.grid(alpha=0.3) ax.legend(fontsize=9) -axes[0].set_ylabel("Posterior weight", fontsize=11) -fig.suptitle( - r"Non-identification: weights on $\lambda^{(1)}$ and $\lambda^{(2)}$ stabilize at " - r"non-degenerate values; $\lambda^{(3)}$ is eliminated", - fontsize=10, y=1.02 -) +axes[0].set_ylabel("posterior weight", fontsize=11) plt.tight_layout() plt.show() ``` @@ -819,13 +881,13 @@ $$ subject to the budget constraint $p\,x_1 + x_2 = w$. Total supply of good 1 is $X_1 = 1$. -(a) Derive the first-order condition for the informed agent's optimal $x_1$. +1. Derive the first-order condition for the informed agent's optimal $x_1$. -(b) Use the market-clearing condition $x_1 = 1$ (the informed agent absorbs the entire +1. Use the market-clearing condition $x_1 = 1$ (the informed agent absorbs the entire supply) to obtain an implicit equation for the equilibrium price $p^*(q)$. Solve it numerically for $q \in (0,1)$ and several values of $\gamma$. -(c) Show numerically that $p^*(q)$ is monotone in $q$, so the invertibility condition +1. Show numerically that $p^*(q)$ is monotone in $q$, so the invertibility condition holds. Explain intuitively why CARA preferences always lead to an invertible price map (the elasticity of substitution of portfolio utility is $\sigma = \infty$). ``` @@ -834,7 +896,7 @@ holds. Explain intuitively why CARA preferences always lead to an invertible pr :class: dropdown ``` -**(a) First-order condition.** +**1. First-order condition.** Define $W_s = w + (a_s - p)\,x_1$ for $s=1,2$. The FOC is @@ -847,17 +909,17 @@ or equivalently (dividing by $\gamma$ and rearranging) $$ q\,(a_1 - p)\, e^{-\gamma(a_1-p) x_1} - = (1-q)\,(p - a_2)\, e^{-\gamma(p-a_2) x_1}. + = (1-q)\,(p - a_2)\, e^{\gamma(p-a_2) x_1}. $$ -**(b) Market-clearing equilibrium price.** +**2. Market-clearing equilibrium price.** Setting $x_1 = 1$ (all supply absorbed by informed agent), the equation becomes a scalar root-finding problem in $p$: $$ F(p;\,q,\gamma) \equiv - q\,(a_1-p)\,e^{-\gamma(a_1-p)} - (1-q)\,(p-a_2)\,e^{-\gamma(p-a_2)} = 0. + q\,(a_1-p)\,e^{-\gamma(a_1-p)} - (1-q)\,(p-a_2)\,e^{\gamma(p-a_2)} = 0. 
$$ ```{code-cell} ipython3 @@ -866,7 +928,7 @@ from scipy.optimize import brentq def F_cara(p, q, a1, a2, gamma, x1=1.0): """Residual of CARA market-clearing condition.""" return (q * (a1-p) * np.exp(-gamma*(a1-p)*x1) - - (1-q) * (p-a2) * np.exp(-gamma*(p-a2)*x1)) + - (1-q) * (p-a2) * np.exp(gamma*(p-a2)*x1)) a1, a2 = 2.0, 0.5 q_grid = np.linspace(0.05, 0.95, 200) @@ -875,14 +937,14 @@ colors_sol = plt.cm.plasma(np.linspace(0.15, 0.85, len(gammas))) fig, ax = plt.subplots(figsize=(8, 5)) for gamma, color in zip(gammas, colors_sol): - p_eq = [brentq(F_cara, a2+1e-4, a1-1e-4, + p_eq = [brentq(F_cara, a2, a1, args=(q, a1, a2, gamma)) for q in q_grid] ax.plot(q_grid, p_eq, lw=2, color=color, label=rf"$\gamma = {gamma}$") -ax.set_xlabel(r"Posterior $q = \Pr(\bar a = a_1)$", fontsize=12) -ax.set_ylabel("Equilibrium price $p^*(q)$", fontsize=12) +ax.set_xlabel(r"posterior $q = \Pr(\bar a = a_1)$", fontsize=12) +ax.set_ylabel("equilibrium price $p^*(q)$", fontsize=12) ax.set_title("CARA preferences: equilibrium prices", fontsize=12) ax.legend(fontsize=10) ax.grid(alpha=0.3) @@ -890,12 +952,12 @@ plt.tight_layout() plt.show() ``` -**(c) Invertibility for CARA.** +**3. Invertibility for CARA.** The price is strictly increasing in $q$ for every $\gamma > 0$. Intuitively, portfolio utility $u(x_2 + \bar{a}\,x_1)$ treats the two goods as **perfect substitutes** in -creating wealth, giving an elasticity of substitution $\sigma = \infty \neq 1$. By -Theorem 2 of {cite:t}`kihlstrom_mirman1975`, the price map is therefore always invertible. +creating wealth, giving an elasticity of substitution $\sigma = \infty \neq 1$. By +{prf:ref}`ime_theorem_invertibility_conditions`, the price map is therefore always invertible. ```{solution-end} ``` @@ -903,7 +965,7 @@ Theorem 2 of {cite:t}`kihlstrom_mirman1975`, the price map is therefore always i ```{exercise} :label: km_ex2 -**Convergence rate and KL divergence.** In the Bayesian learning simulation, the speed of +In the Bayesian learning simulation, the speed of convergence to rational expectations is determined by the **Kullback-Leibler divergence** between the two reduced forms. @@ -914,14 +976,14 @@ $$ D_{KL}(\mu_1 \| \mu_2) = \frac{(\bar{p}_1 - \bar{p}_2)^2}{2\sigma_p^2}. $$ -(a) For the "easy" case ($\bar{p}_1 = 2.0$, $\bar{p}_2 = 1.2$) and the "hard" case +1. For the "easy" case ($\bar{p}_1 = 2.0$, $\bar{p}_2 = 1.2$) and the "hard" case ($\bar{p}_1 = 2.0$, $\bar{p}_2 = 1.8$), compute $D_{KL}$ for $\sigma_p = 0.4$. -(b) Re-run the simulations from the lecture for both cases with $n=100$ paths. For each +1. Re-run the simulations from the lecture for both cases with $n=100$ paths. For each path compute the first period $T_{0.99}$ at which $h_t \geq 0.99$. Plot histograms of $T_{0.99}$ for both cases. -(c) How does the median $T_{0.99}$ scale with $D_{KL}$? Verify numerically that +1. How does the median $T_{0.99}$ scale with $D_{KL}$? Verify numerically that roughly $T_{0.99} \approx C / D_{KL}$ for some constant $C$. ``` @@ -964,7 +1026,7 @@ for ax, (name, p1, p2) in zip(axes, cases): fontsize=11 ) ax.set_xlabel(r"$T_{0.99}$", fontsize=12) - ax.set_ylabel("Count", fontsize=11) + ax.set_ylabel("count", fontsize=11) ax.legend(fontsize=10) ax.grid(alpha=0.3) @@ -997,24 +1059,24 @@ $$ where $\beta_s = u^{1\prime}(a_s x_1 + x_2)$. -(a) For the parameterization used by {cite:t}`kihlstrom_mirman1975`—let +1. For the parameterization used by {cite:t}`kihlstrom_mirman1975`—let $\mu(a_3) = q$, $\mu(a_2) = r$, $\mu(a_1) = 1-r-q$—write $m$ as a function of $(q, r)$. 
Compute $\partial m / \partial r$ and show that its sign depends on $\beta_1\beta_2(a_1-a_2)$ and $\beta_2\beta_3(a_2-a_3)$. -(b) Choose $a_1 = 3$, $a_2 = 2$, $a_3 = 0.5$ and $u'(c) = c^{-\gamma}$ (CRRA with risk +1. Choose $a_1 = 3$, $a_2 = 2$, $a_3 = 0.5$ and $u'(c) = c^{-\gamma}$ (CRRA with risk aversion $\gamma$). Fix $x_1 = 1$, $x_2 = 0.5$. For $\gamma = 2$, verify numerically that $\partial m/\partial r$ changes sign (i.e., $m$ is *not* globally monotone in $r$), giving a counterexample to invertibility. -(c) Explain why this non-monotonicity does *not* arise in the two-state case $S = 2$. +1. Explain why this non-monotonicity does *not* arise in the two-state case $S = 2$. ``` ```{solution-start} km_ex3 :class: dropdown ``` -**(a)** Rewrite the MRS with $\mu_1 = 1-r-q$: +**1.** Rewrite the MRS with $\mu_1 = 1-r-q$: $$ m(q,r) = \frac{a_1\beta_1(1-r-q) + a_2\beta_2 r + a_3\beta_3 q} @@ -1032,7 +1094,7 @@ After simplification this reduces to a signed combination of $\beta_1\beta_2(a_1-a_2)({\cdot})$ and $\beta_2\beta_3(a_2-a_3)({\cdot})$ terms whose sign is parameter-dependent. -**(b) Numerical verification.** +**2. Numerical verification.** ```{code-cell} ipython3 def mrs_3state(q, r, a1, a2, a3, x1, x2, gamma): @@ -1080,7 +1142,7 @@ print("Sign changes in dm/dr:", The derivative $\partial m / \partial r$ changes sign, confirming that the MRS (and hence the equilibrium price) is **not** monotone in $r$ for $S = 3$. -**(c)** In the two-state case $S = 2$, the prior is parameterized by a single scalar $q$ +**3.** In the two-state case $S = 2$, the prior is parameterized by a single scalar $q$ and the MRS is a function of $q$ alone. One can show directly that $\partial m / \partial q$ has a definite sign determined entirely by whether $a_1 > a_2$ and whether $\sigma > 1$ or $\sigma < 1$ hold—there is no room for sign changes. With three states, @@ -1093,20 +1155,23 @@ can reverse the sign of the derivative. ```{exercise} :label: km_ex4 -**Bayesian learning with misspecified models.** The convergence theorem assumes the true +{prf:ref}`ime_theorem_bayesian_convergence` +assumes the true distribution $g(\cdot \mid \bar\lambda)$ is in the support of the prior (i.e., $h(\bar\lambda) > 0$). Investigate what happens when the true model is **not** in the prior support. -(a) Simulate $T = 1,000$ periods of prices from $N(2.0, 0.4^2)$ but use a prior that +1. Simulate $T = 1,000$ periods of prices from $N(2.0, 0.4^2)$ but use a prior that places equal weight on two *wrong* models: $N(1.5, 0.4^2)$ and $N(2.5, 0.4^2)$. - Plot the posterior weight on each model over time. -(b) Show that the **predictive** (mixture) price distribution converges to the *closest* + - Plot the posterior weight on each model over time. + +2. Show that the **predictive** (mixture) price distribution converges to the *closest* model in KL divergence terms—which by symmetry is the equal mixture, with mean 2.0. - Verify this numerically by computing the predictive mean over time. -(c) Relate this finding to the Bayesian consistency literature: when is the limit + - Verify this numerically by computing the predictive mean over time. + +3. Relate this finding to the Bayesian consistency literature: when is the limit distribution a good approximation to the true distribution even under misspecification? 
``` @@ -1153,11 +1218,11 @@ for ax, k, label in zip(axes, [0, 1], [r"$N(1.5, \sigma^2)$", r"$N(2.5, \sigma^2 for path in h_misspec: ax.plot(t_grid, path[:, k], alpha=0.2, lw=0.8, color="steelblue") ax.plot(t_grid, np.median(h_misspec[:, :, k], axis=0), - color="navy", lw=2.5, label="Median") + color="navy", lw=2, label="median") ax.axhline(0.5, color="crimson", lw=1.5, ls="--", label="0.5 (symmetric limit)") ax.set_title(f"Posterior weight on {label}", fontsize=11) - ax.set_xlabel("Period $t$", fontsize=11) - ax.set_ylabel("Posterior weight", fontsize=11) + ax.set_xlabel("period $t$", fontsize=11) + ax.set_ylabel("posterior weight", fontsize=11) ax.legend(fontsize=9) ax.grid(alpha=0.3) @@ -1174,12 +1239,15 @@ print("(Symmetry implies equal weight on 1.5 and 2.5 → predictive mean = 2.0)" ``` By symmetry, the two wrong models are equidistant from the true distribution in KL -divergence. The posterior therefore converges to the 50-50 mixture, and the predictive mean +divergence. + +The posterior therefore converges to the 50-50 mixture, and the predictive mean converges to $0.5 \times 1.5 + 0.5 \times 2.5 = 2.0$—coinciding with the true mean -despite misspecification. This is an instance of the general result that under +despite misspecification. + +This is an instance of the general result that under misspecification, Bayesian posteriors converge to the distribution in the model class that minimizes KL divergence from the model actually generating the data. ```{solution-end} ``` - diff --git a/lectures/lagrangian_lqdp.md b/lectures/lagrangian_lqdp.md index f1e680cc6..5cce10050 100644 --- a/lectures/lagrangian_lqdp.md +++ b/lectures/lagrangian_lqdp.md @@ -451,11 +451,16 @@ solves. See {cite}`Ljungqvist2012`, ch 12. ## Application -Here we demonstrate the computation with an example which is the deterministic version of an example borrowed from this [quantecon lecture](https://python.quantecon.org/lqcontrol.html). +Here we demonstrate the computation with the deterministic permanent-income example from this [quantecon lecture](https://python.quantecon.org/lqcontrol.html). + +Because that model is discounted, we apply the invariant-subspace method to the +equivalent **undiscounted** system obtained from the transformed matrices +$\hat A = \beta^{1/2} A$ and $\hat B = \beta^{1/2} B$. ```{code-cell} ipython3 # Model parameters r = 0.05 +β = 1 / (1 + r) c_bar = 2 μ = 1 @@ -468,7 +473,7 @@ B = [[-1], [0]] # Construct an LQ instance -lq = LQ(Q, R, A, B) +lq = LQ(Q, R, A, B, beta=β) ``` Given matrices $A$, $B$, $Q$, $R$, we can then compute $L$, $N$, and $M=L^{-1}N$. @@ -476,7 +481,7 @@ Given matrices $A$, $B$, $Q$, $R$, we can then compute $L$, $N$, and $M=L^{-1}N$ ```{code-cell} ipython3 def construct_LNM(A, B, Q, R): - n, k = lq.n, lq.k + n = A.shape[0] # construct L and N L = np.zeros((2*n, 2*n)) @@ -496,7 +501,10 @@ def construct_LNM(A, B, Q, R): ``` ```{code-cell} ipython3 -L, N, M = construct_LNM(lq.A, lq.B, lq.Q, lq.R) +A_bar = lq.A * lq.beta ** (1/2) +B_bar = lq.B * lq.beta ** (1/2) + +L, N, M = construct_LNM(A_bar, B_bar, lq.Q, lq.R) ``` ```{code-cell} ipython3 @@ -517,7 +525,7 @@ M @ J @ M.T - J We can compute the eigenvalues of $M$ using `np.linalg.eigvals`, arranged in ascending order. 
```{code-cell} ipython3 -eigvals = sorted(np.linalg.eigvals(M)) +eigvals = sorted(np.linalg.eigvals(M), key=lambda z: (abs(z), z.real, z.imag)) eigvals ``` @@ -529,18 +537,14 @@ When we apply Schur decomposition such that $M=V W V^{-1}$, we want To get what we want, let's define a sorting function that tells `scipy.schur` to sort the corresponding eigenvalues with modulus smaller than 1 to the upper left. ```{code-cell} ipython3 -stable_eigvals = eigvals[:n] +tol = 1e-10 def sort_fun(x): - "Sort the eigenvalues with modules smaller than 1 to the top-left." - - if x in stable_eigvals: - stable_eigvals.pop(stable_eigvals.index(x)) - return True - else: - return False + "Sort the eigenvalues with modulus smaller than 1 to the top-left." + return abs(x) < 1 - tol -W, V, _ = schur(M, sort=sort_fun) +W, V, stable_dim = schur(M, sort=sort_fun) +stable_dim ``` ```{code-cell} ipython3 @@ -584,25 +588,24 @@ def stable_solution(M, verbose=True): The matrix represents the linear difference equations system. """ n = M.shape[0] // 2 - stable_eigvals = list(sorted(np.linalg.eigvals(M))[:n]) + tol = 1e-10 def sort_fun(x): - "Sort the eigenvalues with modules smaller than 1 to the top-left." - - if x in stable_eigvals: - stable_eigvals.pop(stable_eigvals.index(x)) - return True - else: - return False - - W, V, _ = schur(M, sort=sort_fun) + "Sort the eigenvalues with modulus smaller than 1 to the top-left." + return abs(x) < 1 - tol + + W, V, stable_dim = schur(M, sort=sort_fun) + if stable_dim != n: + raise ValueError( + f"Expected {n} stable eigenvalues inside the unit circle, found {stable_dim}." + ) if verbose: print('eigenvalues:\n') print(' W11: {}'.format(np.diag(W[:n, :n]))) print(' W22: {}'.format(np.diag(W[n:, n:]))) - # compute V21 V11^{-1} - P = V[n:, :n] @ np.linalg.inv(V[:n, :n]) + # compute V21 V11^{-1} without forming the inverse explicitly + P = np.linalg.solve(V[:n, :n].T, V[n:, :n].T).T return W, V, P @@ -761,11 +764,6 @@ For example, when $\beta=\frac{1}{1+r}$, we can solve for $P$ with $\hat{A}=\bet These settings are adopted by default in the function `stationary_P` defined above. -```{code-cell} ipython3 -β = 1 / (1 + r) -lq.beta = β -``` - ```{code-cell} ipython3 stationary_P(lq) ``` diff --git a/lectures/multivariate_normal.md b/lectures/multivariate_normal.md index e1353ec0b..2b0d292df 100644 --- a/lectures/multivariate_normal.md +++ b/lectures/multivariate_normal.md @@ -37,7 +37,7 @@ In this lecture, you will learn formulas for * marginal distributions for all subvectors of $x$ * conditional distributions for subvectors of $x$ conditional on other subvectors of $x$ -We will use the multivariate normal distribution to formulate some useful models: +We will use the multivariate normal distribution to formulate some useful models: * a factor analytic model of an intelligence quotient, i.e., IQ * a factor analytic model of two independent inherent abilities, say, mathematical and verbal. @@ -46,7 +46,7 @@ We will use the multivariate normal distribution to formulate some useful model * time series generated by linear stochastic difference equations * optimal linear filtering theory -## The Multivariate Normal Distribution +## The multivariate normal distribution This lecture defines a Python class `MultivariateNormal` to be used to generate **marginal** and **conditional** distributions associated @@ -60,7 +60,7 @@ For a multivariate normal distribution it is very convenient that We apply our Python class to some examples. 
-We use the following imports: +We use the following imports: ```{code-cell} ipython3 import matplotlib.pyplot as plt @@ -75,11 +75,11 @@ multivariate normal probability density. This means that the probability density takes the form $$ -f\left(z;\mu,\Sigma\right)=\left(2\pi\right)^{-\left(\frac{N}{2}\right)}\det\left(\Sigma\right)^{-\frac{1}{2}}\exp\left(-.5\left(z-\mu\right)^{\prime}\Sigma^{-1}\left(z-\mu\right)\right) +f\left(z;\mu,\Sigma\right)=\left(2\pi\right)^{-\left(\frac{N}{2}\right)}\det\left(\Sigma\right)^{-\frac{1}{2}}\exp\left(-.5\left(z-\mu\right)^\top\Sigma^{-1}\left(z-\mu\right)\right) $$ where $\mu=Ez$ is the mean of the random vector $z$ and -$\Sigma=E\left(z-\mu\right)\left(z-\mu\right)^\prime$ is the +$\Sigma=E\left(z-\mu\right)\left(z-\mu\right)^\top$ is the covariance matrix of $z$. The covariance matrix $\Sigma$ is symmetric and positive definite. @@ -157,7 +157,7 @@ $$ and covariance matrix $$ -\hat{\Sigma}_{11}=\Sigma_{11}-\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}=\Sigma_{11}-\beta\Sigma_{22}\beta^{\prime} +\hat{\Sigma}_{11}=\Sigma_{11}-\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}=\Sigma_{11}-\beta\Sigma_{22}\beta^\top $$ where @@ -264,7 +264,7 @@ squares regressions. We’ll compare those linear least squares regressions for the simulated data to their population counterparts. -## Bivariate Example +## Bivariate example We start with a bivariate normal distribution pinned down by @@ -298,7 +298,7 @@ Let's illustrate the fact that you _can regress anything on anything else_. We have computed everything we need to compute two regression lines, one of $z_2$ on $z_1$, the other of $z_1$ on $z_2$. -We'll represent these regressions as +We'll represent these regressions as $$ z_1 = a_1 + b_1 z_2 + \epsilon_1 @@ -322,7 +322,7 @@ $$ E \epsilon_2 z_1 = 0 $$ -Let's compute $a_1, a_2, b_1, b_2$. +Let's compute $a_1, a_2, b_1, b_2$. ```{code-cell} python3 @@ -358,7 +358,12 @@ Now let's plot the two regression lines and stare at them. ```{code-cell} python3 - +--- +mystnb: + figure: + caption: two regressions + name: fig-two-regressions +--- z2 = np.linspace(-4,4,100) @@ -385,14 +390,13 @@ ax.xaxis.set_ticks_position('bottom') ax.yaxis.set_ticks_position('left') plt.ylabel('$z_1$', loc = 'top') plt.xlabel('$z_2$,', loc = 'right') -plt.title('two regressions') plt.plot(z2,z1, 'r', label = "$z_1$ on $z_2$") plt.plot(z2,z1h, 'b', label = "$z_2$ on $z_1$") plt.legend() plt.show() ``` -The red line is the expectation of $z_1$ conditional on $z_2$. +The red line is the expectation of $z_1$ conditional on $z_2$. The intercept and slope of the red line are @@ -412,7 +416,7 @@ print("1/b2 = ", 1/b2) We can use these regression lines or our code to compute conditional expectations. -Let's compute the mean and variance of the distribution of $z_2$ +Let's compute the mean and variance of the distribution of $z_2$ conditional on $z_1=5$. After that we'll reverse what are on the left and right sides of the regression. @@ -504,9 +508,9 @@ Thus, in each case, for our very large sample size, the sample analogues closely approximate their population counterparts. A Law of Large -Numbers explains why sample analogues approximate population objects. +Numbers explains why sample analogues approximate population objects. -## Trivariate Example +## Trivariate example Let’s apply our code to a trivariate example. @@ -570,7 +574,7 @@ multi_normal.βs[0], results.params Once again, sample analogues do a good job of approximating their populations counterparts. 
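
+As an aside, one way to see why these sample regressions recover their population
+counterparts is the residual-variance identity: the conditional variance
+$\hat{\Sigma}_{11}$ equals the variance of the population least squares residual.
+
+Here is a small self-contained check; the covariance matrix below is made up for
+illustration, and any positive definite choice would do.
+
+```{code-cell} python3
+import numpy as np
+
+# made-up bivariate covariance matrix, for illustration only
+Σ_check = np.array([[2.0, 0.8],
+                    [0.8, 1.0]])
+β_check = Σ_check[0, 1] / Σ_check[1, 1]   # population coefficient of z1 on z2
+cond_var = Σ_check[0, 0] - β_check * Σ_check[1, 1] * β_check
+
+z_check = np.random.multivariate_normal([0, 0], Σ_check, size=500_000)
+resid_check = z_check[:, 0] - β_check * z_check[:, 1]
+print(cond_var, resid_check.var())
+```
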
-## One Dimensional Intelligence (IQ)
+## One dimensional intelligence (IQ)

 Let’s move closer to a real-life example, namely, inferring a
 one-dimensional measure of intelligence called IQ from a list of test
 scores.

@@ -725,7 +729,7 @@ conditional normal distribution of the IQ $\theta$.

 In the following code, `ind` sets the variables on the right side of the regression.

-Given the way we have defined the vector $X$, we want to set `ind=1` in order to make $\theta$ the left side variable in the
+Given the way we have defined the vector $X$, we want to set `ind=1` in order to make $\theta$ the left side variable in the
 population regression.

 ```{code-cell} python3
@@ -811,9 +815,9 @@ Thus, each $y_{i}$ adds information about $\theta$.

 If we were to drive the number of tests $n \rightarrow + \infty$, the
 conditional standard deviation $\hat{\sigma}_{\theta}$ would
-converge to $0$ at rate $\frac{1}{n^{.5}}$.
+converge to $0$ at rate $\frac{1}{n^{.5}}$.

-## Information as Surprise
+## Information as surprise

 By using a different representation, let’s look at things from a
 different perspective.

@@ -828,13 +832,13 @@ where $C$ is a lower triangular **Cholesky factor** of $\Sigma$ so that

 $$
-\Sigma \equiv DD^{\prime} = C C^\prime
+\Sigma \equiv DD^\top = C C^\top
 $$

 and

 $$
-E \epsilon \epsilon' = I .
+E \epsilon \epsilon^\top = I .
 $$

 It follows that

@@ -928,13 +932,13 @@ np.max(np.abs(μθ_hat_arr - μθ_hat_arr_C)) < 1e-10
 np.max(np.abs(Σθ_hat_arr - Σθ_hat_arr_C)) < 1e-10
 ```

-## Cholesky Factor Magic
+## Cholesky factor magic

 Evidently, the Cholesky factorizations automatically computes the
 population **regression coefficients** and associated statistics
 that are produced by our `MultivariateNormal` class.

-The Cholesky factorization computes these things **recursively**.
+The Cholesky factorization computes these things **recursively**.

 Indeed, in formula {eq}`mnv_1`,

@@ -944,7 +948,7 @@ Indeed, in formula {eq}`mnv_1`,

 - the coefficient $c_i$ is the simple population regression coefficient of $\theta - \mu_\theta$ on $\epsilon_i$

-## Math and Verbal Intelligence
+## Math and verbal intelligence

 We can alter the preceding example to be more realistic.

@@ -1098,7 +1102,7 @@ for indices, IQ, conditions in [([*range(2*n), 2*n], 'θ', 'y1, y2, y3, y4'),

 Evidently, math tests provide no information about $\eta$ and language tests provide no
 information about $\theta$.

-## Univariate Time Series Analysis
+## Univariate time series analysis

 We can use the multivariate normal distribution and a little matrix
 algebra to present foundations of univariate linear time series
 analysis.

@@ -1110,7 +1114,7 @@ Consider the following model:

 $$
 \begin{aligned}
-x_0 & \sim N\left(0, \sigma_0^2\right) \\
+x_0 & \sim N\left(0, \sigma_0^2\right) \\
 x_{t+1} & = a x_{t} + b w_{t+1}, \quad w_{t+1} \sim N\left(0, 1\right), t \geq 0 \\
 y_{t} & = c x_{t} + d v_{t}, \quad v_{t} \sim N\left(0, 1\right), t \geq 0
 \end{aligned}
 $$

@@ -1166,7 +1170,7 @@ $c$ and $d$ as diagonal respectively.
Consequently, the covariance matrix of $Y$ is $$ -\Sigma_{y} = E Y Y^{\prime} = C \Sigma_{x} C^{\prime} + D D^{\prime} +\Sigma_{y} = E Y Y^\top = C \Sigma_{x} C^\top + D D^\top $$ By stacking $X$ and $Y$, we can write @@ -1181,8 +1185,8 @@ $$ and $$ -\Sigma_{z} = EZZ^{\prime}=\left[\begin{array}{cc} -\Sigma_{x} & \Sigma_{x}C^{\prime}\\ +\Sigma_{z} = EZZ^\top=\left[\begin{array}{cc} +\Sigma_{x} & \Sigma_{x}C^\top\\ C\Sigma_{x} & \Sigma_{y} \end{array}\right] $$ @@ -1263,7 +1267,7 @@ x = z[:T+1] y = z[T+1:] ``` -### Smoothing Example +### Smoothing example This is an instance of a classic `smoothing` calculation whose purpose is to compute $E X \mid Y$. @@ -1297,7 +1301,7 @@ print(" E [ X | Y] = ", ) multi_normal_ex1.cond_dist(0, y) ``` -### Filtering Exercise +### Filtering exercise Compute $E\left[x_{t} \mid y_{t-1}, y_{t-2}, \dots, y_{0}\right]$. @@ -1340,7 +1344,7 @@ sub_y = y[:t] multi_normal_ex2.cond_dist(0, sub_y) ``` -### Prediction Exercise +### Prediction exercise Compute $E\left[y_{t} \mid y_{t-j}, \dots, y_{0} \right]$. @@ -1380,10 +1384,10 @@ sub_y = y[:t-j+1] multi_normal_ex3.cond_dist(0, sub_y) ``` -### Constructing a Wold Representation +### Constructing a Wold representation Now we’ll apply Cholesky decomposition to decompose -$\Sigma_{y}=H H^{\prime}$ and form +$\Sigma_{y}=H H^\top$ and form $$ \epsilon = H^{-1} Y. @@ -1414,7 +1418,7 @@ y This example is an instance of what is known as a **Wold representation** in time series analysis. -## Stochastic Difference Equation +## Stochastic difference equation Consider the stochastic second-order linear difference equation @@ -1476,8 +1480,8 @@ We have $$ \begin{aligned} \mu_{y} = A^{-1} \mu_{b} \\ -\Sigma_{y} &= A^{-1} E \left[\left(b - \mu_{b} + u \right) \left(b - \mu_{b} + u \right)^{\prime}\right] \left(A^{-1}\right)^{\prime} \\ - &= A^{-1} \left(\Sigma_{b} + \Sigma_{u} \right) \left(A^{-1}\right)^{\prime} +\Sigma_{y} &= A^{-1} E \left[\left(b - \mu_{b} + u \right) \left(b - \mu_{b} + u \right)^\top\right] \left(A^{-1}\right)^\top \\ + &= A^{-1} \left(\Sigma_{b} + \Sigma_{u} \right) \left(A^{-1}\right)^\top \end{aligned} $$ @@ -1495,7 +1499,7 @@ $$ $$ \Sigma_{b}=\left[\begin{array}{cc} -C\Sigma_{\tilde{y}}C^{\prime} & \boldsymbol{0}_{N-2\times N-2}\\ +C\Sigma_{\tilde{y}}C^\top & \boldsymbol{0}_{N-2\times N-2}\\ \boldsymbol{0}_{N-2\times2} & \boldsymbol{0}_{N-2\times N-2} \end{array}\right],\quad C=\left[\begin{array}{cc} \alpha_{2} & \alpha_{1}\\ @@ -1531,7 +1535,7 @@ T = 160 ``` ```{code-cell} python3 -# construct A and A^{\prime} +# construct A and A^\top A = np.zeros((T, T)) for i in range(T): @@ -1567,7 +1571,7 @@ C = np.array([[𝛼2, 𝛼1], [0, 𝛼2]]) Σy = A_inv @ (Σb + Σu) @ A_inv.T ``` -## Application to Stock Price Model +## Application to stock price model Let @@ -1604,7 +1608,7 @@ we have $$ \begin{aligned} \mu_{p} = B \mu_{y} \\ -\Sigma_{p} = B \Sigma_{y} B^{\prime} +\Sigma_{p} = B \Sigma_{y} B^\top \end{aligned} $$ @@ -1641,7 +1645,7 @@ $$ $$ $$ -\Sigma_{z}=D\Sigma_{y}D^{\prime} +\Sigma_{z}=D\Sigma_{y}D^\top $$ ```{code-cell} python3 @@ -1695,7 +1699,7 @@ be if people did not have perfect foresight but were optimally predicting future dividends on the basis of the information $y_t, y_{t-1}$ at time $t$. 
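
+Before moving on, here is a quick consistency check on the present value matrix $B$
+constructed earlier in this section; the horizon, discount factor, and dividend draw
+below are arbitrary illustrative choices.
+
+```{code-cell} python3
+import numpy as np
+
+# B stacks powers of β so that p = B y gives p_t = Σ_{j≥t} β**(j-t) y_j;
+# it therefore satisfies the recursion p_t = y_t + β p_{t+1}
+T_pv, β_pv = 5, 0.96
+B_pv = np.array([[β_pv**(j - t) if j >= t else 0.0 for j in range(T_pv)]
+                 for t in range(T_pv)])
+y_pv = np.random.default_rng(1).normal(size=T_pv)
+p_pv = B_pv @ y_pv
+print(np.allclose(p_pv[:-1], y_pv[:-1] + β_pv * p_pv[1:]))
+```
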
-## Filtering Foundations
+## Filtering foundations

 Assume that $x_0$ is an $n \times 1$ random vector and that $y_0$ is a
 $p \times 1$ random vector determined by the

@@ -1713,7 +1717,7 @@ We consider the problem of someone who

 * *observes* $y_0$
 * does not observe $x_0$,
-* knows $\hat x_0, \Sigma_0, G, R$ and therefore the joint probability distribution of the vector $\begin{bmatrix} x_0 \cr y_0 \end{bmatrix}$
+* knows $\hat x_0, \Sigma_0, G, R$ and therefore the joint probability distribution of the vector $\begin{bmatrix} x_0 \cr y_0 \end{bmatrix}$
 * wants to infer $x_0$ from $y_0$ in light of what he knows about that joint probability distribution.

@@ -1730,7 +1734,7 @@ $$
 G \Sigma_0 & G \Sigma_0 G' + R \end{bmatrix}
 $$

-By applying an appropriate instance of the above formulas for the mean vector $\hat \mu_1$ and covariance matrix
+By applying an appropriate instance of the above formulas for the mean vector $\hat \mu_1$ and covariance matrix
 $\hat \Sigma_{11}$ of $z_1$ conditional on $z_2$, we find that the probability distribution of $x_0$
 conditional on $y_0$ is ${\mathcal N}(\tilde x_0, \tilde \Sigma_0)$ where

@@ -1860,7 +1864,7 @@ $$
 \Sigma_{t+1}= C C' + A \Sigma_t A' - A \Sigma_t G' (G \Sigma_t G' +R)^{-1} G \Sigma_t A' .
 $$

-This is a matrix Riccati difference equation that is closely related to another matrix Riccati difference equation that appears in a quantecon lecture on the basics of linear quadratic control theory.
+This is a matrix Riccati difference equation that is closely related to another matrix Riccati difference equation that appears in a quantecon lecture on the basics of linear quadratic control theory.

 That equation has the form

@@ -1876,7 +1880,7 @@ P_{t-1} =R + A' P_t A - A' P_t B

 Stare at the two preceding equations for a moment or two, the first being a matrix difference equation for a conditional covariance matrix, the second being a matrix difference equation in the matrix appearing in a quadratic form for an intertemporal cost of value function.

-Although the two equations are not identical, they display striking family resemblences.
+Although the two equations are not identical, they display striking family resemblances.

 * the first equation tells dynamics that work **forward** in time
 * the second equation tells dynamics that work **backward** in time

@@ -1931,7 +1935,7 @@ x1_cond = A @ μ1_hat

 x1_cond, Σ1_cond
 ```

-### Code for Iterating
+### Code for iterating

 Here is code for solving a dynamic filtering problem by iterating on our
 equations, followed by an example.

@@ -1972,10 +1976,10 @@ iterate(x0_hat, Σ0, A, C, G, R, [2.3, 1.2, 3.2])

 The iterative algorithm just described is a version of the celebrated **Kalman filter**.

-We describe the Kalman filter and some applications of it in {doc}`A First Look at the Kalman Filter <kalman>`
+We describe the Kalman filter and some applications of it in {doc}`A First Look at the Kalman Filter <kalman>`


-## Classic Factor Analysis Model
+## Classic factor analysis model

 The factor analysis model widely used in psychology and other fields
 can be represented as

 $$
 Y = \Lambda f + U
 $$

 where

 1. $Y$ is $n \times 1$ random vector,
-   $E U U^{\prime} = D$ is a diagonal matrix,
+   $E U U^\top = D$ is a diagonal matrix,
 1. $\Lambda$ is $n \times k$ coefficient matrix,
 1. $f$ is $k \times 1$ random vector,
-   $E f f^{\prime} = I$,
+   $E f f^\top = I$,
 1. $U$ is $n \times 1$ random vector, and $U \perp f$ (i.e., $E U f' = 0 $ )
 1. 
It is presumed that $k$ is small relative to $n$; often $k$ is only $1$ or $2$, as in our IQ examples. @@ -1999,15 +2003,15 @@ This implies that $$ \begin{aligned} -\Sigma_y = E Y Y^{\prime} = \Lambda \Lambda^{\prime} + D \\ -E Y f^{\prime} = \Lambda \\ -E f Y^{\prime} = \Lambda^{\prime} +\Sigma_y = E Y Y^\top = \Lambda \Lambda^\top + D \\ +E Y f^\top = \Lambda \\ +E f Y^\top = \Lambda^\top \end{aligned} $$ Thus, the covariance matrix $\Sigma_Y$ is the sum of a diagonal matrix $D$ and a positive semi-definite matrix -$\Lambda \Lambda^{\prime}$ of rank $k$. +$\Lambda \Lambda^\top$ of rank $k$. This means that all covariances among the $n$ components of the $Y$ vector are intermediated by their common dependencies on the @@ -2026,9 +2030,9 @@ the covariance matrix of the expanded random vector $Z$ can be computed as $$ -\Sigma_{z} = EZZ^{\prime}=\left(\begin{array}{cc} -I & \Lambda^{\prime}\\ -\Lambda & \Lambda\Lambda^{\prime}+D +\Sigma_{z} = EZZ^\top=\left(\begin{array}{cc} +I & \Lambda^\top\\ +\Lambda & \Lambda\Lambda^\top+D \end{array}\right) $$ @@ -2115,7 +2119,7 @@ multi_normal_factor.cond_dist(0, y) We can verify that the conditional mean $E \left[f \mid Y=y\right] = B Y$ where -$B = \Lambda^{\prime} \Sigma_{y}^{-1}$. +$B = \Lambda^\top \Sigma_{y}^{-1}$. ```{code-cell} python3 B = Λ.T @ np.linalg.inv(Σy) @@ -2136,7 +2140,7 @@ $\Lambda I^{-1} f = \Lambda f$. Λ @ f ``` -## PCA and Factor Analysis +## PCA and factor analysis To learn about Principal Components Analysis (PCA), please see this lecture {doc}`Singular Value Decompositions `. @@ -2158,7 +2162,7 @@ governs the data on $Y$ we have generated. So we compute the PCA decomposition $$ -\Sigma_{y} = P \tilde{\Lambda} P^{\prime} +\Sigma_{y} = P \tilde{\Lambda} P^\top $$ where $\tilde{\Lambda}$ is a diagonal matrix. @@ -2172,7 +2176,7 @@ $$ and $$ -\epsilon = P^\prime Y +\epsilon = P^\top Y $$ Note that we will arrange the eigenvectors in $P$ in the @@ -2319,14 +2323,14 @@ $$ fix $z_2 = 2$. -(a) Use `MultivariateNormal` to compute the analytical conditional mean +1. Use `MultivariateNormal` to compute the analytical conditional mean $\hat{\mu}_1$ and variance $\hat{\Sigma}_{11}$ of $z_1 \mid z_2 = 2$. -(b) Draw $10^6$ samples from the joint distribution. Retain only those +1. Draw $10^6$ samples from the joint distribution. Retain only those for which $|z_2 - 2| < 0.05$. Compute the sample mean and variance of the retained $z_1$ values. -(c) Confirm that the sample estimates are close to the analytical values. +1. Confirm that the sample estimates are close to the analytical values. ``` ```{solution-start} mv_normal_ex1 @@ -2411,12 +2415,12 @@ for rho in [0.2, 0.5, 0.9]: Using the one-dimensional IQ model with $n = 50$ test scores and $\mu_\theta = 100$, $\sigma_\theta = 10$: -(a) Vary the test-score noise $\sigma_y \in \{1, 5, 10, 20, 50\}$. +1. Vary the test-score noise $\sigma_y \in \{1, 5, 10, 20, 50\}$. For each value, plot the posterior standard deviation $\hat{\sigma}_\theta$ as a function of the number of test scores included (from 1 to 50), with all curves on the same axes. -(b) Explain intuitively why a larger $\sigma_y$ leads to a slower +1. Explain intuitively why a larger $\sigma_y$ leads to a slower decline of posterior uncertainty. ``` @@ -2464,12 +2468,12 @@ $\theta$ exactly. Using the one-dimensional IQ model with $n = 20$ test scores and $\mu_\theta = 100$, $\sigma_y = 10$: -(a) Fix $\sigma_y = 10$ and vary the prior spread +1. Fix $\sigma_y = 10$ and vary the prior spread $\sigma_\theta \in \{1, 5, 10, 50, 500\}$. 
For each value compute the posterior mean $\hat{\mu}_\theta$ given the same set of $n = 20$ test scores and plot $\hat{\mu}_\theta$ against $\sigma_\theta$. -(b) Show analytically (or verify numerically) that as $\sigma_\theta \to \infty$ +1. Show analytically (or verify numerically) that as $\sigma_\theta \to \infty$ the posterior mean converges to the sample mean $\bar{y}$, and as $\sigma_y \to \infty$ the posterior mean converges to the prior mean $\mu_\theta$. @@ -2534,12 +2538,12 @@ $$ and initial conditions $\hat{x}_0 = [0, 0]'$, $\Sigma_0 = I_2$: -(a) Simulate $T = 60$ periods of $\{x_t, y_t\}$ and run the filter. +1. Simulate $T = 60$ periods of $\{x_t, y_t\}$ and run the filter. -(b) Plot the sequences of conditional variances $\Sigma_t[0,0]$ and +1. Plot the sequences of conditional variances $\Sigma_t[0,0]$ and $\Sigma_t[1,1]$ over time. Verify that they converge to a steady state. -(c) Plot the filtered state estimates $\hat{x}_t[0]$ together with the +1. Plot the filtered state estimates $\hat{x}_t[0]$ together with the true $x_t[0]$ and the raw observations $y_t$ on a single figure. ``` @@ -2601,16 +2605,16 @@ plt.show() In the classic factor analysis model at the end of the lecture the true covariance is $\Sigma_y = \Lambda \Lambda' + D$. -(a) Set $\sigma_u = 2$ (instead of $0.5$). Recompute the fraction of +1. Set $\sigma_u = 2$ (instead of $0.5$). Recompute the fraction of variance explained by the first two principal components and compare it with the $\sigma_u = 0.5$ result. Explain the change. -(b) Show that the conditional expectation $E[f \mid Y] = BY$ with -$B = \Lambda' \Sigma_y^{-1}$ is **not** equal to the two-component PCA +1. Show that the conditional expectation $E[f \mid Y] = BY$ with +$B = \Lambda^\top \Sigma_y^{-1}$ is **not** equal to the two-component PCA projection $\hat{Y} = P_{:,1:2}\,\epsilon_{1:2}$. Plot both on the same axes. -(c) In one or two sentences, explain why PCA is misspecified for +1. In one or two sentences, explain why PCA is misspecified for factor-analytic data. ``` diff --git a/lectures/prob_matrix.md b/lectures/prob_matrix.md index 3708b9599..8cbb50f40 100644 --- a/lectures/prob_matrix.md +++ b/lectures/prob_matrix.md @@ -17,7 +17,7 @@ kernelspec: This lecture uses matrix algebra to illustrate some basic ideas about probability theory. -After introducing underlying objects, we'll use matrices and vectors to describe probability distributions. +After introducing underlying objects, we'll use matrices and vectors to describe probability distributions. Among concepts that we'll be studying include @@ -29,13 +29,13 @@ Among concepts that we'll be studying include - couplings - copulas - the probability distribution of a sum of two independent random variables - - convolution of marginal distributions + - convolution of marginal distributions - parameters that define a probability distribution - sufficient statistics as data summaries We'll use a matrix to represent a bivariate or multivariate probability distribution and a vector to represent a univariate probability distribution -This {doc}`companion lecture ` describes some popular probability distributions and describes how to use Python to sample from them. +This {doc}`companion lecture ` describes some popular probability distributions and describes how to use Python to sample from them. 
In addition to what's in Anaconda, this lecture will need the following libraries: @@ -59,14 +59,14 @@ set_matplotlib_formats('retina') ``` -## Sketch of Basic Concepts +## Sketch of basic concepts We'll briefly define what we mean by a **probability space**, a **probability measure**, and a **random variable**. For most of this lecture, we sweep these objects into the background ```{note} -Nevertheless, they'll be lurking beneath **induced distributions** of random variables that we'll focus on here. These deeper objects are essential for defining and analysing the concepts of stationarity and ergodicity that underly laws of large numbers. For a relatively +Nevertheless, they'll be lurking beneath **induced distributions** of random variables that we'll focus on here. These deeper objects are essential for defining and analysing the concepts of stationarity and ergodicity that underly laws of large numbers. For a relatively nontechnical presentation of some of these results see this chapter from Lars Peter Hansen and Thomas J. Sargent's online monograph titled "Risk, Uncertainty, and Values":. ``` @@ -76,18 +76,18 @@ Let $\Omega$ be a set of possible underlying outcomes and let $\omega \in \Omega Let $\mathcal{G} \subset \Omega$ be a subset of $\Omega$. -Let $\mathcal{F}$ be a collection of such subsets $\mathcal{G} \subset \Omega$. +Let $\mathcal{F}$ be a collection of such subsets $\mathcal{G} \subset \Omega$. -The pair $\Omega,\mathcal{F}$ forms our **probability space** on which we want to put a probability measure. +The pair $\Omega,\mathcal{F}$ forms our **probability space** on which we want to put a probability measure. -A **probability measure** $\mu$ maps a set of possible underlying outcomes $\mathcal{G} \in \mathcal{F}$ into a scalar number between $0$ and $1$ +A **probability measure** $\mu$ maps a set of possible underlying outcomes $\mathcal{G} \in \mathcal{F}$ into a scalar number between $0$ and $1$ - this is the "probability" that $X$ belongs to $A$, denoted by $ \textrm{Prob}\{X\in A\}$. A **random variable** $X(\omega)$ is a function of the underlying outcome $\omega \in \Omega$. -The random variable $X(\omega)$ has a **probability distribution** that is induced by the underlying probability measure $\mu$ and the function +The random variable $X(\omega)$ has a **probability distribution** that is induced by the underlying probability measure $\mu$ and the function $X(\omega)$: $$ @@ -98,34 +98,34 @@ where ${\mathcal G}$ is the subset of $\Omega$ for which $X(\omega) \in A$. We call this the induced probability distribution of random variable $X$. -Instead of working explicitly with an underlying probability space $\Omega,\mathcal{F}$ and probability measure $\mu$, -applied statisticians often proceed simply by specifying a form for an induced distribution for a random variable $X$. +Instead of working explicitly with an underlying probability space $\Omega,\mathcal{F}$ and probability measure $\mu$, +applied statisticians often proceed simply by specifying a form for an induced distribution for a random variable $X$. -That is how we'll proceed in this lecture and in many subsequent lectures. +That is how we'll proceed in this lecture and in many subsequent lectures. -## What Does Probability Mean? +## What does probability mean? Before diving in, we'll say a few words about what probability theory means and how it connects to statistics. -We also touch on these topics in the quantecon lectures and . 
+We also touch on these topics in {doc}`prob_meaning` and {doc}`navy_captain`. -For much of this lecture we'll be discussing fixed "population" probabilities. +For much of this lecture we'll be discussing fixed "population" probabilities. These are purely mathematical objects. To appreciate how statisticians connect probabilities to data, the key is to understand the following concepts: * A single draw from a probability distribution -* Repeated independently and identically distributed (i.i.d.) draws of "samples" or "realizations" from the same probability distribution -* A **statistic** defined as a function of a sequence of samples -* An **empirical distribution** or **histogram** (a binned empirical distribution) that records observed **relative frequencies** -* The idea that a population probability distribution is what we anticipate **relative frequencies** will be in a long sequence of i.i.d. draws. Here the following mathematical machinery makes precise what is meant by **anticipated relative frequencies** +* Repeated independently and identically distributed (i.i.d.) draws of "samples" or "realizations" from the same probability distribution +* A **statistic** defined as a function of a sequence of samples +* An **empirical distribution** or **histogram** (a binned empirical distribution) that records observed **relative frequencies** +* The idea that a population probability distribution is what we anticipate **relative frequencies** will be in a long sequence of i.i.d. draws. Here the following mathematical machinery makes precise what is meant by **anticipated relative frequencies** - **Law of Large Numbers (LLN)** - - **Central Limit Theorem (CLT)** + - **Central Limit Theorem (CLT)** -**Scalar example** +#### Scalar example Let $X$ be a scalar random variable that takes on the $I$ possible values $0, 1, 2, \ldots, I-1$ with probabilities @@ -147,12 +147,12 @@ $$ as a short-hand way of saying that the random variable $X$ is described by the probability distribution $ \{{f_i}\}_{i=0}^{I-1}$. -Consider drawing a sample $x_0, x_1, \dots , x_{N-1}$ of $N$ independent and identically distributoed draws of $X$. +Consider drawing a sample $x_0, x_1, \dots , x_{N-1}$ of $N$ independent and identically distributed draws of $X$. -What do the "identical" and "independent" mean in IID or iid ("identically and independently distributed")? +What do "identical" and "independent" mean in IID or iid ("identically and independently distributed")? - "identical" means that each draw is from the same distribution. -- "independent" means that joint distribution equal products of marginal distributions, i.e., +- "independent" means that the joint distribution equals the product of marginal distributions, i.e., $$ \begin{aligned} @@ -161,9 +161,9 @@ $$ \end{aligned} $$ -We define an **empirical distribution** as follows. +We define an **empirical distribution** as follows. -For each $i = 0,\dots,I-1$, let +For each $i = 0,\dots,I-1$, let $$ \begin{aligned} @@ -174,35 +174,30 @@ N & = \sum^{I-1}_{i=0} N_i \quad \text{total number of draws},\\ $$ -Key concepts that connect probability theory with statistics are laws of large numbers and central limit theorems +Key concepts that connect probability theory with statistics are laws of large numbers and central limit theorems. -**LLN:** +A Law of Large Numbers (LLN) states that $\tilde {f_i} \to f_i$ as $N \to \infty$. 
-- A Law of Large Numbers (LLN) states that $\tilde {f_i} \to f_i \text{ as } N \to \infty$ +A Central Limit Theorem (CLT) describes a **rate** at which $\tilde {f_i} \to f_i$. -**CLT:** +See {doc}`lln_clt` for a detailed treatment of both results. -- A Central Limit Theorem (CLT) describes a **rate** at which $\tilde {f_i} \to f_i$ +For "frequentist" statisticians, **anticipated relative frequency** is **all** that a probability distribution means. +But for a Bayesian it means something else -- something partly subjective and purely personal. -**Remarks** +We say "partly" because a Bayesian also pays attention to relative frequencies. -- For "frequentist" statisticians, **anticipated relative frequency** is **all** that a probability distribution means. -- But for a Bayesian it means something else -- something partly subjective and purely personal. - - * we say "partly" because a Bayesian also pays attention to relative frequencies +## Representing probability distributions - -## Representing Probability Distributions - -A probability distribution $\textrm{Prob} (X \in A)$ can be described by its **cumulative distribution function (CDF)** +A probability distribution $\textrm{Prob} (X \in A)$ can be described by its **cumulative distribution function (CDF)** $$ F_{X}(x) = \textrm{Prob}\{X\leq x\}. $$ -Sometimes, but not always, a random variable can also be described by **density function** $f(x)$ +Sometimes, but not always, a random variable can also be described by a **density function** $f(x)$ that is related to its CDF by $$ @@ -215,7 +210,7 @@ $$ Here $B$ is a set of possible $X$'s whose probability of occurring we want to compute. -When a probability density exists, a probability distribution can be characterized either by its CDF or by its density. +When a probability density exists, a probability distribution can be characterized either by its CDF or by its density. For a **discrete-valued** random variable @@ -231,7 +226,7 @@ Doing this enables us to confine our tool set basically to linear algebra. Later we'll briefly discuss how to approximate a continuous random variable with a discrete random variable. -## Univariate Probability Distributions +## Univariate probability distributions We'll devote most of this lecture to discrete-valued random variables, but we'll say a few things about continuous-valued random variables. @@ -281,15 +276,19 @@ $$ where $\theta $ is a vector of parameters that is of much smaller dimension than $I$. -**Remarks:** +A **statistical model** is a joint probability distribution characterized by a list of **parameters**. + +The concept of **parameter** is intimately related to the notion of **sufficient statistic**. + +A **statistic** is a nonlinear function of a data set. + +**Sufficient statistics** summarize all **information** that a data set contains about parameters of a statistical model. + +Note that a sufficient statistic corresponds to a particular statistical model. + +Sufficient statistics are key tools that AI uses to summarize or compress a **big data** set. -- A **statistical model** is a joint probability distribution characterized by a list of **parameters** -- The concept of **parameter** is intimately related to the notion of **sufficient statistic**. -- A **statistic** is a nonlinear function of a data set. -- **Sufficient statistics** summarize all **information** that a data set contains about parameters of statistical model. - * Note that a sufficient statistic corresponds to a particular statistical model. 
- * Sufficient statistics are key tools that AI uses to summarize or compress a **big data** set. -- R. A. Fisher provided a rigorous definition of **information** -- see +R. A. Fisher provided a rigorous definition of **information** -- see . @@ -323,7 +322,7 @@ $$ \textrm{Prob}\{X\in \tilde{X}\} =1 $$ -## Bivariate Probability Distributions +## Bivariate probability distributions We'll now discuss a bivariate **joint distribution**. @@ -357,7 +356,7 @@ $$ \sum_{i}\sum_{j}f_{ij}=1 $$ -## Marginal Probability Distributions +## Marginal probability distributions The joint distribution induce marginal distributions @@ -391,7 +390,7 @@ $$ \end{aligned} $$ -**Digression:** If two random variables $X,Y$ are continuous and have joint density $f(x,y)$, then marginal distributions can be computed by +As a digression, if two random variables $X,Y$ are continuous and have joint density $f(x,y)$, then marginal distributions can be computed by $$ \begin{aligned} @@ -400,7 +399,7 @@ f(y)& = \int_{\mathbb{R}} f(x,y) dx \end{aligned} $$ -## Conditional Probability Distributions +## Conditional probability distributions Conditional probabilities are defined according to @@ -426,7 +425,7 @@ $$ =\frac{ \sum_{i}f_{ij} }{ \sum_{i}f_{ij}}=1 $$ -**Remark:** The mathematics of conditional probability implies: +The mathematics of conditional probability implies: $$ \textrm{Prob}\{X=i|Y=j\} =\frac{\textrm{Prob}\{X=i,Y=j\}}{\textrm{Prob}\{Y=j\}}=\frac{\textrm{Prob}\{Y=j|X=i\}\textrm{Prob}\{X=i\}}{\textrm{Prob}\{Y=j\}} @@ -446,7 +445,7 @@ $$ $$ -## Transition Probability Matrix +## Transition probability matrix Consider the following joint probability distribution of two random variables. @@ -495,7 +494,7 @@ Note that -## Application: Forecasting a Time Series +## Application: forecasting a time series Suppose that there are two time periods. @@ -519,11 +518,10 @@ A conditional distribution is $$\text{Prob} \{X(1)=j|X(0)=i\}= \frac{f_{ij}}{ \sum_{j}f_{ij}}$$ -**Remark:** -- This formula is a workhorse for applied economic forecasters. +This formula is a workhorse for applied economic forecasters. -## Statistical Independence +## Statistical independence Random variables X and Y are statistically **independent** if @@ -550,7 +548,7 @@ $$ $$ -## Means and Variances +## Means and variances The mean and variance of a discrete random variable $X$ are @@ -571,7 +569,7 @@ $$ \end{aligned} $$ -## Matrix Representations of Some Bivariate Distributions +## Matrix representations of some bivariate distributions Let's use matrices to represent a joint distribution, conditional distribution, marginal distribution, and the mean and variance of a bivariate random variable. @@ -590,12 +588,9 @@ $$ \textrm{Prob}(X=i)=\sum_j{f_{ij}}=u_i $$ $$ \textrm{Prob}(Y=j)=\sum_i{f_{ij}}=v_j $$ -**Sampling:** +Let's write some Python code that lets us draw some long samples and compute relative frequencies. -Let's write some Python code that let's us draw some long samples and compute relative frequencies. - -The code will let us check whether the "sampling" distribution agrees with the "population" distribution - confirming that -the population distribution correctly tells us the relative frequencies that we should expect in a large sample. +The code lets us check whether the "sampling" distribution agrees with the "population" distribution -- confirming that the population distribution correctly tells us the relative frequencies that we should expect in a large sample. 
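
+As a univariate warm-up for the bivariate sampling code that follows, here is a minimal
+sketch with made-up probabilities; it shows relative frequencies settling down on the
+population distribution, just as the Law of Large Numbers discussed above leads us to
+anticipate.
+
+```{code-cell} ipython3
+import numpy as np
+
+# made-up univariate distribution, for illustration only
+f_pop = np.array([0.2, 0.5, 0.3])
+draws = np.random.default_rng(3).choice(len(f_pop), size=1_000_000, p=f_pop)
+f_sample = np.bincount(draws, minlength=len(f_pop)) / draws.size
+print("population:", f_pop)
+print("sample:    ", f_sample)
+```
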
@@ -844,7 +839,7 @@ class discrete_bijoint: Let's apply our code to some examples. -**Example 1** +#### Example 1 ```{code-cell} ipython3 # joint @@ -863,7 +858,7 @@ d.marg_dist() d.cond_dist() ``` -**Example 2** +#### Example 2 ```{code-cell} ipython3 xs_new = np.array([10, 20, 30]) @@ -882,7 +877,7 @@ d_new.marg_dist() d_new.cond_dist() ``` -## A Continuous Bivariate Random Vector +## A continuous bivariate random vector A two-dimensional Gaussian distribution has joint density @@ -929,9 +924,9 @@ y = np.linspace(-10, 10, 1_000) x_mesh, y_mesh = np.meshgrid(x, y, indexing="ij") ``` -**Joint Distribution** +#### Joint distribution -Let's plot the **population** joint density. +Let's plot the **population** joint density. ```{code-cell} ipython3 # %matplotlib notebook @@ -967,7 +962,7 @@ x = data[:, 0] y = data[:, 1] ``` -**Marginal distribution** +#### Marginal distribution ```{code-cell} ipython3 plt.hist(x, bins=1_000, alpha=0.6) @@ -987,7 +982,7 @@ plt.hist(y_sim, bins=1_000, density=True, alpha=0.4, histtype="step") plt.show() ``` -**Conditional distribution** +#### Conditional distribution For a bivariate normal population distribution, the conditional distributions are also normal: @@ -1074,7 +1069,7 @@ print(μy, σy) print(μ2 + ρ * σ2 * (1 - μ1) / σ1, np.sqrt(σ2**2 * (1 - ρ**2))) ``` -## Sum of Two Independently Distributed Random Variables +## Sum of two independently distributed random variables Let $X, Y$ be two independent discrete random variables that take values in $\bar{X}, \bar{Y}$, respectively. @@ -1159,8 +1154,6 @@ $$ Given two marginal distribution, $\mu$ for $X$ and $\nu$ for $Y$, a joint distribution $f_{ij}$ is said to be a **coupling** of $\mu$ and $\nu$. -**Example:** - Consider the following bivariate example. $$ @@ -1229,10 +1222,9 @@ But the joint distributions differ. Thus, multiple joint distributions $[f_{ij}]$ can have the same marginals. -**Remark:** -- Couplings are important in optimal transport problems and in Markov processes. Please see this {doc}`lecture about optimal transport ` +Couplings are important in optimal transport problems and in Markov processes. Please see this {doc}`lecture about optimal transport `. -## Copula Functions +## Copula functions Suppose that $X_1, X_2, \dots, X_n$ are $N$ random variables and that @@ -1260,9 +1252,9 @@ Thus, for given marginal distributions, we can use a copula function to determi Copula functions are often used to characterize **dependence** of random variables. -**Discrete marginal distribution** +#### Discrete marginal distribution -As mentioned above, for two given marginal distributions there can be more than one coupling. +As mentioned above, for two given marginal distributions there can be more than one coupling. For example, consider two random variables $X, Y$ with distributions @@ -1484,7 +1476,7 @@ We have verified that both joint distributions, $c_1$ and $c_2$, have identical So they are both couplings of $X$ and $Y$. -**Gaussian Copula Example** +### Gaussian copula example A **Gaussian copula** uses the bivariate normal distribution to induce dependence between arbitrary marginal distributions. @@ -1498,6 +1490,12 @@ The construction has three steps: The following code illustrates this with exponential marginals. 
```{code-cell} ipython3 +--- +mystnb: + figure: + caption: gaussian copula with exponential marginals + name: fig-gaussian-copula +--- from scipy import stats # Gaussian copula parameters @@ -1521,11 +1519,9 @@ fig, axes = plt.subplots(1, 2, figsize=(10, 4)) axes[0].scatter(u1[:3000], u2[:3000], alpha=0.2, s=2) axes[0].set_xlabel('$u_1$') axes[0].set_ylabel('$u_2$') -axes[0].set_title(f'Copula (uniform marginals, ρ={rho_cop})') axes[1].scatter(x1[:3000], x2[:3000], alpha=0.2, s=2) axes[1].set_xlabel('$x_1$ (Exp, mean=1)') axes[1].set_ylabel('$x_2$ (Exp, mean=0.5)') -axes[1].set_title('Exponential marginals via Gaussian copula') plt.tight_layout() plt.show() @@ -1533,8 +1529,10 @@ print(f"Sample correlation of (x1, x2): {np.corrcoef(x1, x2)[0, 1]:.3f}") print(f"Sample correlation of (u1, u2): {np.corrcoef(u1, u2)[0, 1]:.3f}") ``` -The left panel shows the copula itself — the dependence structure in uniform coordinates. +The left panel shows the copula itself -- the dependence structure in uniform coordinates, drawn from a bivariate normal with correlation $\rho = 0.8$. + The right panel shows the same dependence translated to exponential marginals. + Changing $\rho$ controls the strength of dependence while the marginals remain unchanged. ## Exercises @@ -1552,13 +1550,13 @@ $$ where $X \in \{0,1\}$ and $Y \in \{10, 20\}$. -(a) Compute the marginal distributions $\mu_i = \text{Prob}\{X=i\}$ and $\nu_j = \text{Prob}\{Y=j\}$. +1. Compute the marginal distributions $\mu_i = \text{Prob}\{X=i\}$ and $\nu_j = \text{Prob}\{Y=j\}$. -(b) Form the independence matrix $f^{\perp}_{ij} = \mu_i \nu_j$ (the outer product of the two marginal vectors). +1. Form the independence matrix $f^{\perp}_{ij} = \mu_i \nu_j$ (the outer product of the two marginal vectors). -(c) Compare $F$ with $f^{\perp}$ and determine whether $X$ and $Y$ are independent. +1. Compare $F$ with $f^{\perp}$ and determine whether $X$ and $Y$ are independent. -(d) Verify your conclusion by computing $\text{Prob}\{X=0|Y=10\}$ and checking whether it equals $\text{Prob}\{X=0\}$. +1. Verify your conclusion by computing $\text{Prob}\{X=0|Y=10\}$ and checking whether it equals $\text{Prob}\{X=0\}$. ``` ```{solution-start} prob_matrix_ex1 @@ -1601,13 +1599,13 @@ print(f"Prob(X=0) = {mu[0]:.4f}") Using the same joint distribution $F$ and values $X \in \{0,1\}$, $Y \in \{10, 20\}$ as in Exercise 1: -(a) Compute $\mathbb{E}[X]$, $\mathbb{E}[Y]$, and $\mathbb{E}[XY] = \sum_i \sum_j x_i y_j f_{ij}$. +1. Compute $\mathbb{E}[X]$, $\mathbb{E}[Y]$, and $\mathbb{E}[XY] = \sum_i \sum_j x_i y_j f_{ij}$. -(b) Compute $\text{Cov}(X,Y) = \mathbb{E}[XY] - \mathbb{E}[X]\mathbb{E}[Y]$. +1. Compute $\text{Cov}(X,Y) = \mathbb{E}[XY] - \mathbb{E}[X]\mathbb{E}[Y]$. -(c) Compute $\text{Cor}(X,Y) = \text{Cov}(X,Y) / (\sigma_X \sigma_Y)$. +1. Compute $\text{Cor}(X,Y) = \text{Cov}(X,Y) / (\sigma_X \sigma_Y)$. -(d) Show analytically that $X \perp Y$ implies $\text{Cov}(X,Y) = 0$. +1. Show analytically that $X \perp Y$ implies $\text{Cov}(X,Y) = 0$. ``` ```{solution-start} prob_matrix_ex2 @@ -1642,7 +1640,7 @@ cor_XY = cov_XY / np.sqrt(var_X * var_Y) print(f"Cor(X,Y) = {cor_XY:.4f}") ``` -For part (d): if $X \perp Y$ then $f_{ij} = \mu_i \nu_j$, so +For part 4: if $X \perp Y$ then $f_{ij} = \mu_i \nu_j$, so $$ \mathbb{E}[XY] = \sum_i \sum_j x_i y_j \mu_i \nu_j @@ -1662,13 +1660,13 @@ and therefore $\text{Cov}(X,Y) = \mathbb{E}[XY] - \mathbb{E}[X]\mathbb{E}[Y] = 0 Let $X$ and $Y$ each be uniformly distributed on $\{1,2,3,4,5,6\}$, and let $Z = X + Y$. 
-(a) Use the convolution formula $h_k = \sum_i f_i g_{k-i}$ to compute the distribution of $Z$. +1. Use the convolution formula $h_k = \sum_i f_i g_{k-i}$ to compute the distribution of $Z$. -(b) Plot the theoretical distribution. +1. Plot the theoretical distribution. -(c) Simulate $10^6$ rolls and overlay the empirical histogram on the plot. +1. Simulate $10^6$ rolls and overlay the empirical histogram on the plot. -(d) Compute $\mathbb{E}[Z]$ and $\text{Var}(Z)$ both from the theoretical distribution and from the simulation. +1. Compute $\mathbb{E}[Z]$ and $\text{Var}(Z)$ both from the theoretical distribution and from the simulation. ``` ```{solution-start} prob_matrix_ex3 @@ -1720,11 +1718,11 @@ $$ where $p_{ij} = \text{Prob}\{X(t+1)=j \mid X(t)=i\}$. -(a) Starting from $\psi_0 = [1, 0]$, compute $\psi_n = \psi_0 P^n$ for $n = 1, 5, 20, 100$. +1. Starting from $\psi_0 = [1, 0]$, compute $\psi_n = \psi_0 P^n$ for $n = 1, 5, 20, 100$. -(b) Find the stationary distribution $\psi^*$ satisfying $\psi^* P = \psi^*$ and $\sum_i \psi^*_i = 1$. +1. Find the stationary distribution $\psi^*$ satisfying $\psi^* P = \psi^*$ and $\sum_i \psi^*_i = 1$. -(c) Verify numerically that $\psi_n \to \psi^*$ as $n$ grows. +1. Verify numerically that $\psi_n \to \psi^*$ as $n$ grows. ``` ```{solution-start} prob_matrix_ex4 @@ -1763,15 +1761,15 @@ print(f"psi_100 close to stationary? {np.allclose(psi_100, psi_star, atol=1e-6)} Let $X \in \{0,1\}$ and $Y \in \{0,1\}$ with marginals $\mu = [0.5,\, 0.5]$ and $\nu = [0.4,\, 0.6]$. -(a) Construct the **comonotone** (upper Fréchet) coupling that puts as much mass as possible on the diagonal $\{X=i, Y=i\}$. +1. Construct the **comonotone** (upper Fréchet) coupling that puts as much mass as possible on the diagonal $\{X=i, Y=i\}$. -(b) Construct the **counter-monotone** (lower Fréchet) coupling that puts as much mass as possible on the anti-diagonal. +1. Construct the **counter-monotone** (lower Fréchet) coupling that puts as much mass as possible on the anti-diagonal. -(c) Construct the **independent** coupling $f^{\perp}_{ij} = \mu_i \nu_j$. +1. Construct the **independent** coupling $f^{\perp}_{ij} = \mu_i \nu_j$. -(d) Verify that all three have the correct marginals. +1. Verify that all three have the correct marginals. -(e) For each coupling compute $\text{Cor}(X,Y)$. Which maximises / minimises the correlation? +1. For each coupling compute $\text{Cor}(X,Y)$. Which maximises / minimises the correlation? ``` ```{solution-start} prob_matrix_ex5 @@ -1830,7 +1828,7 @@ print(f"Cor independent = {correlation(F_indep, xs, ys):.4f}") A coin has unknown bias $\theta \in \{0.2,\, 0.5,\, 0.8\}$ with prior $\pi = [0.25,\, 0.50,\, 0.25]$. -(a) After observing $k = 7$ heads in $n = 10$ flips, compute the likelihood +1. After observing $k = 7$ heads in $n = 10$ flips, compute the likelihood $$ \mathcal{L}(\theta \mid \text{data}) = \binom{10}{7}\,\theta^7\,(1-\theta)^3 @@ -1838,11 +1836,11 @@ $$ for each $\theta$. -(b) Apply equation {eq}`eq:condprobbayes` to compute the posterior $\pi(\theta \mid \text{data})$. +1. Apply equation {eq}`eq:condprobbayes` to compute the posterior $\pi(\theta \mid \text{data})$. -(c) Plot the prior and posterior side by side. +1. Plot the prior and posterior side by side. -(d) Repeat for $k = 3$ heads and describe how the posterior shifts. +1. Repeat for $k = 3$ heads and describe how the posterior shifts. 
``` ```{solution-start} prob_matrix_ex6 From 41fba896f97d1ec23ec6f0b5da43a1615460502a Mon Sep 17 00:00:00 2001 From: Humphrey Yang Date: Tue, 21 Apr 2026 20:45:23 +1000 Subject: [PATCH 04/10] updates --- lectures/information_market_equilibrium.md | 1143 ++++++++++++++------ lectures/multivariate_normal.md | 31 +- lectures/prob_matrix.md | 213 ++-- 3 files changed, 901 insertions(+), 486 deletions(-) diff --git a/lectures/information_market_equilibrium.md b/lectures/information_market_equilibrium.md index 851560d7b..475df4d1f 100644 --- a/lectures/information_market_equilibrium.md +++ b/lectures/information_market_equilibrium.md @@ -15,7 +15,10 @@ kernelspec: ```{raw} jupyter ``` @@ -28,66 +31,85 @@ kernelspec: ## Overview -This lecture studies two questions about the **informational role of prices** posed and +This lecture studies two questions about the **informational role of prices** +posed and answered by {cite:t}`kihlstrom_mirman1975`. 1. *When do prices transmit inside information?* - An informed insider observes a private - signal correlated with an unknown state of the world and adjusts demand accordingly. + signal correlated with an unknown state of the world and adjusts demand + accordingly. - Equilibrium prices shift. - Under what conditions can an outside observer *infer* the insider's private signal from the equilibrium price? 2. *Do Bayesian price expectations converge?* - In a stationary stochastic exchange - economy, an uninformed observer uses the history of market prices and Bayes' Law to form - expectations about the economy's structure. + economy, an uninformed observer uses the history of market prices and + Bayes' Law to form + beliefs about the economy's structure and hence about its induced price + distribution. - Do those expectations eventually agree with those of a fully informed observer? Kihlstrom and Mirman's answers rely on two classical ideas from statistics: -- **Blackwell sufficiency**: a random variable $\tilde{y}$ is said to be *sufficient* for a random variable - $\tilde{y}'$ with respect to an unknown state if knowing $\tilde{y}$ gives all the +- **Blackwell sufficiency**: a random variable $\tilde{y}$ is said to be + *sufficient* for a random variable + $\tilde{y}'$ with respect to an unknown state if knowing $\tilde{y}$ gives + all the information about the state that $\tilde{y}'$ contains. -- **Bayesian consistency**: as the sample grows, a Bayesian statistician's posterior probability distribution concentrates on the true - parameter value, even when the underlying economic structure is not globally identified from prices alone. +- **Bayesian consistency**: as the sample grows, posterior beliefs eliminate + models that imply the wrong **price distribution**, so even when structure is + not identified from prices the posterior mass on the true **reduced form** + still converges to one. Important findings of {cite:t}`kihlstrom_mirman1975` are: -- Equilibrium prices transmit inside information *if and only if* the map from the - insider's posterior distribution to the equilibrium price vector is invertible - (one-to-one). -- For a two-state pure exchange economy with CES preferences, invertibility holds whenever the - elasticity of substitution $\sigma \neq 1$. - - With Cobb-Douglas preferences ($\sigma = 1$) - the equilibrium price is independent of the insider's posterior, so information is never - transmitted. 
-- In the dynamic economy, as information accumulates, Bayesian price expectations converge to **rational expectations**, even when the deep structure of the economy is not identified. +- Equilibrium prices transmit inside information *if and only if* the map from + the + insider's posterior distribution to the equilibrium price is one-to-one on + the set of + posteriors that can actually arise from the signal. +- In the paper's two-state theorem, invertibility holds when the informed + agent's utility is homothetic and the elasticity of substitution is everywhere + either below one or above one, with CES preferences providing a convenient + illustration and Cobb-Douglas preferences ($\sigma = 1$) giving the opposite + case in which the equilibrium price is independent of the insider's posterior. +- In the dynamic economy, as information accumulates, Bayesian price + expectations converge to **rational expectations**, even when the deep + structure is not identified from prices alone. ```{note} -{cite:t}`kihlstrom_mirman1975` use the terms "reduced form" and "structural" models in a +{cite:t}`kihlstrom_mirman1975` use the terms "reduced form" and "structural" +models in a way that careful econometricians do. Reduced-form and structural models come in pairs. To each structure or structural model -there is a reduced form, or collection of reduced forms, underlying different possible regressions. +there is a reduced form, or collection of reduced forms, underlying different +possible regressions. + +In this lecture, a **structure** is a parameterization of the underlying +endowment process. ``` The lecture is organized as follows. 1. Set up the static two-commodity model and define equilibrium. -2. State the price-revelation theorem (Theorem 1 of the paper) and the invertibility - conditions (Theorem 2). -3. Illustrate invertibility and its failure with numerical examples using CES and +2. State the price-revelation theorem and the invertibility conditions. +3. Illustrate invertibility and its failure with numerical examples using CES + and Cobb-Douglas preferences. -4. Introduce the dynamic stochastic economy and derive the Bayesian convergence result. +4. Introduce the dynamic stochastic economy and derive the Bayesian convergence + result. 5. Simulate Bayesian learning from price observations. -This lecture builds on ideas in {doc}`blackwell_kihlstrom` and {doc}`likelihood_bayes`. +This lecture builds on ideas in {doc}`blackwell_kihlstrom` and +{doc}`likelihood_bayes`. -## Setup +We start by importing some Python packages. ```{code-cell} ipython3 import numpy as np @@ -96,7 +118,8 @@ from scipy.optimize import brentq from scipy.stats import norm ``` -## A two-commodity economy with an informed insider + +## Setup ### Preferences, endowments, and the unknown state @@ -112,16 +135,28 @@ from a bundle $(x_1^i, x_2^i)$ is $$ U^i(x_1^i, x_2^i) - = \sum_{s=1}^{S} u^i(a_s x_1^i,\, x_2^i)\, PR^i(\bar{a} = a_s), + = \sum_{s=1}^{S} u^i(a_s x_1^i,\, x_2^i)\, P^i(\bar{a} = a_s), $$ -where $PR^i$ is agent $i$'s subjective probability distribution over the finite state space +where $P^i$ is agent $i$'s subjective probability distribution over the finite +state space $A = \{a_1, \ldots, a_S\}$. -Each agent starts with an endowment $w^i$ of good 2 and a share $\theta^i$ of the +Each agent starts with an endowment $w^i$ of good 2 and a share $\theta^i$ of +the representative firm. -The firm's profit $\pi$ is determined by profit maximization. 
+In the paper's formal model, a single firm transforms good 2 into good 1 +according to +$y_1 = f(y_2)$ with $f' < 0$ and chooses production to maximize + +$$ +\pi(p) = \max_{y_2 \leq 0} \{p f(y_2) + y_2\}. +$$ + +The firm's profit $\pi$ is then distributed to households according to the +shares +$\theta^i$. Agent $i$'s budget constraint is @@ -135,16 +170,25 @@ Agents maximize expected utility subject to their budget constraints. A **competitive equilibrium** is a price $\hat{p}$ that clears both markets simultaneously. +For most of what follows, the production side matters only through the induced +equilibrium price map, so when we turn to numerical illustrations we will +suppress production and use a pure-exchange / portfolio interpretation to keep +the calculations transparent. + ### The informed agent's problem -Suppose **agent 1** (the insider) observes a private signal $\tilde{y}$ correlated with +Suppose **agent 1** (the insider) observes a private signal $\tilde{y}$ +correlated with $\bar{a}$ before trading. -Upon observing $\tilde{y} = y$, agent 1 updates their prior -$\mu = PR^1$ to a **posterior** $\mu_y = (\mu_{y1}, \ldots, \mu_{yS})$ via Bayes' rule: +Before the signal arrives, agent 1 has prior beliefs +$\mu_0 = P^1$. + +Upon observing $\tilde{y} = y$, agent 1 updates to the +**posterior** $\mu_y = (\mu_{y1}, \ldots, \mu_{yS})$ via Bayes' rule: $$ -\mu_{ys} = PR(\bar{a} = a_s \mid \tilde{y} = y). +\mu_{ys} = P(\bar{a} = a_s \mid \tilde{y} = y). $$ Because agent 1's demand depends on $\mu_y$, the new equilibrium price satisfies @@ -153,46 +197,60 @@ $$ \hat{p} = p(\mu_y). $$ -Outside observers who see $\hat{p}$ but not $\tilde{y}$ can try to *back out* the +Outside observers who see $\hat{p}$ but not $\tilde{y}$ can try to *back out* +the insider's posterior from the price. -This is possible when the map $\mu \mapsto p(\mu)$ -is **invertible** on the relevant domain. +Define the set of realized posteriors + +$$ +M = \{\mu_y : y \in Y,\; P(\tilde y = y) > 0\}. +$$ + +The key question is whether the map $\mu \mapsto p(\mu)$ is one-to-one on $M$. + +To answer that question, we now translate "information in prices" into +Blackwell's language of sufficiency. (price_revelation_theorem)= ## Price revelation ### Blackwell sufficiency -The price variable $p(\mu_{\tilde{y}})$ *accurately transmits* the insider's private -information if observing the equilibrium price is just as informative about $\bar{a}$ as +The price variable $p(\mu_{\tilde{y}})$ *accurately transmits* the insider's +private +information if observing the equilibrium price is just as informative about +$\bar{a}$ as observing the signal $\tilde{y}$ directly. -In Blackwell's language ({cite}`blackwell1951` and {cite}`blackwell1953`), this means +In Blackwell's language ({cite:t}`blackwell1951` and {cite:t}`blackwell1953`), +this means $p(\mu_{\tilde{y}})$ is **sufficient** for $\tilde{y}$. ```{prf:definition} Sufficiency :label: ime_def_sufficiency A random variable $\tilde{y}$ is *sufficient* for $\tilde{y}'$ (with -respect to $\bar{a}$) if there exists a conditional distribution $PR(y' \mid y)$, +respect to $\bar{a}$) if there exists a conditional distribution $P(y' \mid y)$, **independent of** $\bar{a}$, such that $$ -\phi'_a(y') = \sum_{y \in Y} PR(y' \mid y)\, \phi_a(y) +\phi'_a(y') = \sum_{y \in Y} P(y' \mid y)\, \phi_a(y) \quad \text{for all } a \text{ and all } y', $$ -where $\phi_a(y) = PR(\tilde{y} = y \mid \bar{a} = a)$. +where $\phi_a(y) = P(\tilde{y} = y \mid \bar{a} = a)$. 
Thus, once $\tilde{y}$ is known, $\tilde{y}'$ provides no additional information about $\bar{a}$. ``` +{cite:t}`kihlstrom_mirman1975` show that + ```{prf:lemma} Posterior Sufficiency :label: ime_lemma_posterior_sufficiency -({cite:t}`kihlstrom_mirman1975`) The posterior distribution $\mu_{\tilde{y}}$ +The posterior distribution $\mu_{\tilde{y}}$ is sufficient for $\tilde{y}$. ``` @@ -200,30 +258,76 @@ is sufficient for $\tilde{y}$. The posterior $\mu_{\tilde{y}}$ satisfies $$ -PR(\bar{a} = a_s \mid \mu_{\tilde{y}} = \mu_y,\; \tilde{y} = y) = \mu_{ys} - = PR(\bar{a} = a_s \mid \mu_{\tilde{y}} = \mu_y). +P(\bar{a} = a_s \mid \mu_{\tilde{y}} = \mu_y,\; \tilde{y} = y) = \mu_{ys} + = P(\bar{a} = a_s \mid \mu_{\tilde{y}} = \mu_y). $$ -Because the posterior itself *encodes* what $\tilde{y}$ says about $\bar{a}$, observing -$\tilde{y}$ directly would add no information. +This identity says that once the posterior is known, conditioning on the +original signal +$\tilde y$ does not change beliefs about $\bar a$. + +Equivalently, the conditional law of $\tilde y$ given $\mu_{\tilde y}$ is +independent of +$\bar a$, so $\mu_{\tilde y}$ is sufficient for $\tilde y$ in Blackwell's sense. ``` +Now let's think about the mapping from +belief to price. + ```{prf:theorem} Price Revelation :label: ime_theorem_price_revelation In the economy described above, the price -random variable $p(\mu_{\tilde{y}})$ is sufficient for $\tilde{y}$ **if and only if** the -function $p(PR^1)$ is **invertible** on the set +random variable $p(\mu_{\tilde{y}})$ is sufficient for $\tilde{y}$ **if and only +if** the +belief-to-price map is one-to-one on the realized posterior set $M$, +equivalently if its +inverse is well defined on the price set $$ P \equiv \bigl\{\, p(\mu_y) : y \in Y,\; - PR(\tilde{y} = y) = \sum_{a \in A} \phi_a(y)\,\mu(a) > 0 \bigr\}. + P(\tilde{y} = y) = \sum_{a \in A} \phi_a(y)\,\mu_0(a) > 0 \bigr\}. $$ ``` -The "only if" direction follows because if $p$ were not one-to-one, two different posteriors -would generate the same price; an observer could not distinguish them, so the price would -not transmit all information that resides in the signal. +The logic is + +$$ +\tilde y \quad \longrightarrow \quad \mu_{\tilde y} \quad \longrightarrow \quad +p(\mu_{\tilde y}). +$$ + +The first arrow loses no information about $\bar a$ by +{prf:ref}`ime_lemma_posterior_sufficiency`, and the theorem asks when the second +arrow also loses no information. + +The proof has two parts. + +If $p(\cdot)$ is one-to-one on $M$, then observing the price is equivalent to +observing the +posterior itself because + +$$ +P(\mu_{\tilde y} = \mu \mid p(\mu_{\tilde y}) = p) += \begin{cases} +1 & \text{if } \mu = p^{-1}(p), \\ +0 & \text{otherwise.} +\end{cases} +$$ + +This conditional distribution is independent of the state, so price is +sufficient for the +posterior; together with {prf:ref}`ime_lemma_posterior_sufficiency`, price is +therefore +sufficient for the signal. + +Conversely, if two different posteriors in $M$ generated the same price, an +observer of the price could not tell which posterior had occurred, and the paper +shows formally that in this case the conditional distribution of the posterior +given price would depend on the state, so price could not be sufficient. + +Before turning to invertibility itself, it helps to keep in mind the two +economic interpretations emphasized in the paper. ### Two interpretations @@ -233,56 +337,137 @@ Good 1 is a risky asset with random return $\bar{a}$; good 2 is "money". 

An insider's demand reveals private information about the return.
 
-If the invertibility condition holds, outside observers can read the insider's signal from
+If the invertibility condition holds, outside observers can read the insider's
+signal from
 the equilibrium stock price.
 
 #### Price as a quality signal
 
 Good 1 has uncertain quality $\bar{a}$.
 
-Experienced consumers (who have sampled the good) observe a signal correlated with quality
+Experienced consumers (who have sampled the good) observe a signal correlated
+with quality
 and buy accordingly.
 
-Uninformed consumers can infer quality from the market price, provided invertibility holds.
+Uninformed consumers can infer quality from the market price, provided
+invertibility holds.
 
 (invertibility_conditions)=
 ## Invertibility and the elasticity of substitution
 
-When does $p(PR^1)$ fail to be invertible?
+The price-revelation theorem reduces the economic problem to a narrower one:
+when is the belief-to-price map actually one-to-one?
 
-Theorem 2 of {cite:t}`kihlstrom_mirman1975`
-shows that for a two-state economy ($S = 2$), the answer turns on the **elasticity of
+{prf:ref}`ime_theorem_invertibility_conditions`
+shows that for a two-state economy ($S = 2$), the answer depends on the
+**elasticity of
 substitution** $\sigma$ of agent 1's utility function.
 
+Before stating the theorem, it helps to see the two intermediate steps in the
+paper's
+argument.
+
+```{prf:lemma} Same Price Implies Same Allocation
+:label: ime_lemma_same_price_same_allocation
+
+Fix the beliefs of all agents except agent 1.
+
+If two posterior beliefs $\mu$ and $\mu'$
+both generate the same equilibrium price $p$, then they generate the same
+equilibrium
+allocation for every trader.
+```
+
+```{prf:proof}
+(Sketch.) All uninformed agents face the same price $p$ and keep the same
+beliefs, so their demands are unchanged.
+
+The firm's supply is also unchanged because it depends only on $p$.
+
+Market clearing then pins down agent 1's demand as the residual, so agent 1 must
+consume the
+same bundle under $\mu$ and $\mu'$ as well.
+```
+
+This lemma lets us define the informed agent's equilibrium bundle as a function
+of price
+alone:
+
+$$
+x(p) = (x_1(p), x_2(p)).
+$$
+
+Whenever the informed agent consumes positive amounts of both goods, optimality
+of $x(p)$
+under posterior $\mu$ gives the interior first-order condition
+
+$$
+p = \frac{\sum_{s=1}^S a_s u_1^1(a_s x_1(p), x_2(p))\, \mu(a_s)}
+         {\sum_{s=1}^S u_2^1(a_s x_1(p), x_2(p))\, \mu(a_s)}.
+$$
+
+For a fixed price $p$, the bundle $x(p)$ is fixed too, so invertibility boils
+down to
+whether this equation admits a unique posterior $\mu$.
+
+```{prf:lemma} Unique Posterior at a Given Price
+:label: ime_lemma_unique_posterior
+
+If, for each price $p \in P$, the first-order condition above has a unique
+solution
+$\mu \in M$, then the price map is invertible on $P$.
+```
+
+This is Lemma 3 in the paper: if two different posteriors gave the same price,
+then by
+{prf:ref}`ime_lemma_same_price_same_allocation` they would share the same bundle
+$x(p)$,
+contradicting uniqueness of the posterior that solves the first-order condition
+at that
+price.
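+
+To see {prf:ref}`ime_lemma_unique_posterior` at work before the formal theorem,
+here is a minimal numerical sketch.
+
+It specializes to the two-state case developed in the next subsection and
+freezes an illustrative bundle in place of $x(p)$; the CES parameter and all
+numbers below are our assumptions, not values taken from the paper.
+
+```{code-cell} ipython3
+# Illustrative CES primitives with a frozen bundle standing in for x(p)
+ρ = 0.5
+a_hi, a_lo = 2.0, 0.5
+x1_fix, x2_fix = 1.0, 0.8
+
+def u_partials(c1, c2):
+    """CES marginal utilities for u = (c1^ρ + c2^ρ)^(1/ρ)."""
+    common = (c1**ρ + c2**ρ)**(1 / ρ - 1)
+    return common * c1**(ρ - 1), common * c2**(ρ - 1)
+
+# With the bundle frozen, the FOC weights are constants
+A = np.array([a * u_partials(a * x1_fix, x2_fix)[0] for a in (a_hi, a_lo)])
+B = np.array([u_partials(a * x1_fix, x2_fix)[1] for a in (a_hi, a_lo)])
+
+def foc_price(q):
+    """Right-hand side of the FOC at posterior (q, 1 - q)."""
+    return (A[0] * q + A[1] * (1 - q)) / (B[0] * q + B[1] * (1 - q))
+
+def foc_posterior(p):
+    """Invert the linear-fractional FOC: the unique q consistent with p."""
+    return (A[1] - p * B[1]) / ((A[1] - p * B[1]) - (A[0] - p * B[0]))
+
+q_vals = np.linspace(0.05, 0.95, 10)
+print(np.allclose([foc_posterior(foc_price(q)) for q in q_vals], q_vals))
+```
+
+The round trip returns `True`: at this frozen bundle each price is consistent
+with exactly one posterior weight, which is the uniqueness that
+{prf:ref}`ime_lemma_unique_posterior` converts into invertibility.
+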
+ ### The two-state first-order condition -With $S = 2$ and $\mu = (q,\, 1-q)$, the first-order condition for agent 1's demand -(equation (12a) in the paper) reduces to +With $S = 2$ and $\mu = (q,\, 1-q)$, define $$ -p(q) = \frac{\alpha_1 q + \alpha_2 (1-q)}{\beta_1 q + \beta_2 (1-q)}, +\alpha_s(p) = a_s\, u^1_1(a_s x_1(p),\, x_2(p)), \qquad +\beta_s(p) = u^1_2(a_s x_1(p),\, x_2(p)), \qquad s = 1, 2. $$ -where +Then the first-order condition becomes $$ -\alpha_s = a_s\, u^1_1(a_s x_1,\, x_2), \qquad -\beta_s = u^1_2(a_s x_1,\, x_2), \qquad s = 1, 2. +p = \frac{\alpha_1(p)\, q + \alpha_2(p)\, (1-q)} + {\beta_1(p)\, q + \beta_2(p)\, (1-q)}. $$ -The equilibrium consumption $(x_1, x_2)$ itself depends on $p$, so this is an implicit -equation in $p$. +At a fixed price $p$, the quantities $\alpha_s(p)$ and $\beta_s(p)$ are +constants, so +uniqueness of the posterior is the same as uniqueness of the scalar $q$ solving +this +equation. ```{prf:theorem} Invertibility Conditions :label: ime_theorem_invertibility_conditions Assume $u^1$ is quasi-concave and -homothetic with continuous first partials. Assume agent 1 always consumes positive -quantities of both goods. For $S = 2$: +homothetic with continuous first partials. + +Assume agent 1 always consumes positive +quantities of both goods. + +For $S = 2$: -- If $\sigma < 1$ for all feasible allocations, $p(PR^1)$ is **invertible** on $P$. -- If $\sigma > 1$ for all feasible allocations, $p(PR^1)$ is **invertible** on $P$. -- If $u^1$ is **Cobb-Douglas** ($\sigma = 1$), $p(PR^1)$ is **constant** on $P$ +- If $\sigma < 1$ for all feasible allocations, the price map is **invertible** + on $P$. +- If $\sigma > 1$ for all feasible allocations, the price map is **invertible** + on $P$. +- If $u^1$ is **Cobb-Douglas** ($\sigma = 1$), the price map is **constant** on + $P$ (no information is transmitted). ``` @@ -291,13 +476,18 @@ making agent 1's demand for good 1 independent of information about $\bar{a}$. So the market price cannot reveal that information. +The general theorem is abstract, so we now specialize to CES utility to make the +mechanism concrete. + ### CES utility -For concreteness we work with the **constant-elasticity-of-substitution** (CES) utility +For concreteness we work with the **constant-elasticity-of-substitution** (CES) +utility function $$ -u(c_1, c_2) = \bigl(c_1^{\rho} + c_2^{\rho}\bigr)^{1/\rho}, \qquad \rho \in (-\infty,0) \cup (0,1), +u(c_1, c_2) = \bigl(c_1^{\rho} + c_2^{\rho}\bigr)^{1/\rho}, \qquad \rho \in +(-\infty,0) \cup (0,1), $$ whose elasticity of substitution is $\sigma = 1/(1-\rho)$. @@ -309,17 +499,26 @@ whose elasticity of substitution is $\sigma = 1/(1-\rho)$. Pertinent partial derivatives are $$ -u_1(c_1,c_2) = \bigl(c_1^\rho + c_2^\rho\bigr)^{1/\rho - 1}\, c_1^{\rho-1}, \qquad +u_1(c_1,c_2) = \bigl(c_1^\rho + c_2^\rho\bigr)^{1/\rho - 1}\, c_1^{\rho-1}, +\qquad u_2(c_1,c_2) = \bigl(c_1^\rho + c_2^\rho\bigr)^{1/\rho - 1}\, c_2^{\rho-1}. $$ +This CES example is only an illustration, because the theorem itself covers any +homothetic utility with elasticity everywhere above one or everywhere below one. + +With that example in hand, we can compute the equilibrium price directly as a +function of the posterior. + ### Equilibrium price as a function of the posterior -We focus on agent 1 as the *only* informed trader who absorbs one unit of good 1 at +We focus on agent 1 as the *only* informed trader who absorbs one unit of good 1 +at equilibrium (i.e., $x_1 = 1$). 
Agent 1's budget constraint then reduces to -$x_2 = W^1 - p$, and the equilibrium price is the unique $p \in (0, W^1)$ satisfying +$x_2 = W^1 - p$, and the equilibrium price is the unique $p \in (0, W^1)$ +satisfying the first-order condition $$ @@ -327,54 +526,37 @@ p \bigl[q\, u_2(a_1,\, W^1-p) + (1-q)\, u_2(a_2,\, W^1-p)\bigr] = q\, a_1\, u_1(a_1,\, W^1-p) + (1-q)\, a_2\, u_1(a_2,\, W^1-p). $$ -For Cobb-Douglas utility ($\sigma = 1$), the first-order condition becomes $p = W^1 - p$, -giving $p^* = W^1/2$ regardless of the posterior $q$—confirming that no information +For Cobb-Douglas utility ($\sigma = 1$), the first-order condition becomes $p = +W^1 - p$, +giving $p^* = W^1/2$ regardless of the posterior $q$—confirming that no +information is transmitted through the price in the Cobb-Douglas case. We compute first-order conditions numerically below. ```{code-cell} ipython3 -def ces_derivatives(c1, c2, rho): +def ces_derivatives(c1, c2, ρ): """ - Returns (u1, u2) for u(c1,c2) = (c1^rho + c2^rho)^(1/rho). - Uses Cobb-Douglas limit for |rho| < 1e-4 to avoid numerical overflow. + Return CES marginal utilities. + + Use the Cobb-Douglas limit near rho = 0. """ - if abs(rho) < 1e-4: - # Cobb-Douglas limit u = sqrt(c1*c2) + if abs(ρ) < 1e-4: u1 = 0.5 * np.sqrt(c2 / c1) u2 = 0.5 * np.sqrt(c1 / c2) else: - common = (c1**rho + c2**rho)**(1/rho - 1) - u1 = common * c1**(rho - 1) - u2 = common * c2**(rho - 1) + common = (c1**ρ + c2**ρ)**(1 / ρ - 1) + u1 = common * c1**(ρ - 1) + u2 = common * c2**(ρ - 1) return u1, u2 -def eq_price(q, a1, a2, W1, rho): - """ - Solve for the equilibrium price when the informed agent absorbs one unit - of good 1. With x1 = 1 and budget constraint x2 = W1 - p, the FOC - - p [q u2(a1, x2) + (1-q) u2(a2, x2)] = q a1 u1(a1, x2) + (1-q) a2 u1(a2, x2) - - has a unique root p* in (0, W1). 
- - Parameters - ---------- - q : posterior probability on state 1 (high state) - a1 : state-1 productivity value (a1 > a2) - a2 : state-2 productivity value - W1 : informed agent's wealth - rho : CES parameter (rho=0 → Cobb-Douglas; analytical p* = W1/2) - - Returns - ------- - p_star : equilibrium price, or nan if solver fails - """ +def eq_price(q, a1, a2, W1, ρ): + """Return the equilibrium price for posterior q.""" def residual(p): - x2 = W1 - p # x1 = 1 absorbed at equilibrium - u1_s1, u2_s1 = ces_derivatives(a1, x2, rho) - u1_s2, u2_s2 = ces_derivatives(a2, x2, rho) + x2 = W1 - p + u1_s1, u2_s1 = ces_derivatives(a1, x2, ρ) + u1_s2, u2_s2 = ces_derivatives(a2, x2, ρ) lhs = p * (q * u2_s1 + (1 - q) * u2_s2) rhs = q * a1 * u1_s1 + (1 - q) * a2 * u1_s2 return lhs - rhs @@ -392,45 +574,44 @@ mystnb: caption: equilibrium price vs posterior name: fig-eq-price-posterior --- -# ── Economy parameters ────────────────────────────────────────────────────── a1, a2 = 2.0, 0.5 # state values (a1 > a2) -W1 = 4.0 # informed agent's wealth; equilibrium x2 = W1 - p +W1 = 4.0 -# Posterior grid q_grid = np.linspace(0.05, 0.95, 200) -# rho values to compare: complements (<0), Cobb-Douglas (=0), substitutes (>0) -rho_values = [-0.5, 0.0, 0.5] -rho_labels = [r"$\rho = -0.5$ ($\sigma = 0.67$, complements)", - r"$\rho = 0$ ($\sigma = 1$, Cobb-Douglas)", - r"$\rho = 0.5$ ($\sigma = 2$, substitutes)"] -colors = ["steelblue", "crimson", "forestgreen"] +ρ_values = [-0.5, 0.0, 0.5] +ρ_labels = [ + r"$\rho = -0.5$ ($\sigma = 0.67$, complements)", + r"$\rho = 0$ ($\sigma = 1$, Cobb-Douglas)", + r"$\rho = 0.5$ ($\sigma = 2$, substitutes)", +] fig, ax = plt.subplots(figsize=(8, 5)) -for rho, label, color in zip(rho_values, rho_labels, colors): - prices = [eq_price(q, a1, a2, W1, rho) for q in q_grid] - ax.plot(q_grid, prices, label=label, color=color, lw=2) +for ρ, label in zip(ρ_values, ρ_labels): + prices = [eq_price(q, a1, a2, W1, ρ) for q in q_grid] + ax.plot(q_grid, prices, label=label, lw=2) ax.set_xlabel(r"posterior probability $q = \Pr(\bar{a} = a_1)$", fontsize=12) ax.set_ylabel("equilibrium price $p^*(q)$", fontsize=12) ax.legend(fontsize=10) -ax.grid(alpha=0.3) plt.tight_layout() plt.show() ``` The plot confirms {prf:ref}`ime_theorem_invertibility_conditions`. -- **CES with $\sigma \neq 1$**: the equilibrium price is **strictly monotone** in $q$. - An outside observer who knows the equilibrium map $p^*(\cdot)$ can uniquely invert the +- *CES with $\sigma \neq 1$*: the equilibrium price is **strictly monotone** in + $q$. + + - An outside observer who knows the equilibrium map $p^*(\cdot)$ can uniquely + invert the price to recover $q$—inside information is fully transmitted. -- **Cobb-Douglas ($\sigma = 1$)**: the price is *flat* in $q$—information is never +- *Cobb-Douglas ($\sigma = 1$)*: the price is *flat* in $q$—information is never transmitted through the market. 
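+
+To make the first bullet concrete, here is a small sketch of the inversion an
+outside observer could carry out; the hidden posterior $q = 0.7$ is an assumed
+value chosen for illustration, and everything else reuses objects defined
+above.
+
+```{code-cell} ipython3
+q_true = 0.7    # hidden posterior (an assumption for this illustration)
+p_obs = eq_price(q_true, a1, a2, W1, ρ=0.5)
+
+# Invert the monotone map q ↦ p*(q) by root-finding on the residual
+q_hat = brentq(lambda q: eq_price(q, a1, a2, W1, ρ=0.5) - p_obs,
+               1e-3, 1 - 1e-3)
+print(f"observed price = {p_obs:.4f}, recovered posterior = {q_hat:.4f}")
+```
+
+The recovered posterior matches the hidden one to solver precision; at
+$\rho = 0$ the same inversion is hopeless, as the next cell confirms.
+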

```{code-cell} ipython3
-# Verify that rho=0 (exact Cobb-Douglas) gives a flat line
-p_cd = [eq_price(q, a1, a2, W1, rho=0.0) for q in q_grid]
+p_cd = [eq_price(q, a1, a2, W1, ρ=0.0) for q in q_grid]
 
 print(f"Cobb-Douglas (rho=0): min p* = {min(p_cd):.6f}, "
       f"max p* = {max(p_cd):.6f}, "
@@ -438,14 +619,38 @@ print(f"Cobb-Douglas (rho=0): min p* = {min(p_cd):.6f}, "
 print(f"Analytical CD price = W1/2 = {W1/2:.6f}")
 ```
 
-Every entry equals $W^1/2 = 2.0$ exactly, confirming analytically that the Cobb-Douglas
+Every entry equals $W^1/2 = 2.0$ exactly, confirming analytically that the
+Cobb-Douglas
 equilibrium price is independent of $q$ and of the state values $a_1, a_2$.
 
+The numerical plot shows monotonicity for $\sigma \neq 1$, and the next
+subsection connects that
+pattern back to the proof of {prf:ref}`ime_theorem_invertibility_conditions`.
+
 (price_monotonicity)=
 ### Why monotonicity depends on $\sigma$
 
-The derivative $\partial p / \partial q$ has the sign of $\alpha_1 \beta_2 - \alpha_2 \beta_1$
-(from differentiating the FOC formula).
+To obtain the key derivative, the paper fixes a price $p$ and treats
+$\alpha_s(p)$ and $\beta_s(p)$ as constants, so that the right-hand side of the
+first-order condition,
+
+$$
+\frac{\alpha_1(p)\, q + \alpha_2(p)\, (1-q)}
+     {\beta_1(p)\, q + \beta_2(p)\, (1-q)},
+$$
+
+is a function of $q$ alone, whose derivative is
+
+$$
+\frac{\partial}{\partial q}
+\frac{\alpha_1 q + \alpha_2 (1-q)}
+     {\beta_1 q + \beta_2 (1-q)}
+= \frac{\alpha_1 \beta_2 - \alpha_2 \beta_1}
+       {\bigl[\beta_1 q + \beta_2 (1-q)\bigr]^2}.
+$$
+
+So the sign is determined by $\alpha_1 \beta_2 - \alpha_2 \beta_1$, and if that
+sign is constant then for each fixed price there is at most one posterior weight
+$q$ consistent with the first-order condition, which is exactly what
+{prf:ref}`ime_theorem_invertibility_conditions` requires.
 
 Using
 
@@ -463,14 +668,23 @@ $$
 \Bigl(\frac{x_2}{x_1}\Bigr)^{1/\sigma}.
 $$
 
-This is positive when $\sigma > 1$, negative when $\sigma < 1$, and **zero when $\sigma = 1$**
-(Cobb-Douglas).
+For the CES specification, this derivative is positive when $\sigma > 1$,
+negative when
+$\sigma < 1$, and *zero when $\sigma = 1$*.
+
+In other words, for CES utility the ratio $\alpha_s / \beta_s$ moves
+monotonically with the state value $a_s$ unless $\sigma = 1$, which makes the
+fixed-price first-order-condition expression monotone in $q$ and in turn
+delivers invertibility.
 
-The vanishing derivative means the marginal rate of substitution is
-independent of $a_s$, so the informed agent's demand—and hence the equilibrium price—does
+The vanishing derivative in the Cobb-Douglas case means the marginal rate of
+substitution is
+independent of $a_s$, so the informed agent's demand, and hence the equilibrium
+price, does
 not respond to changes in beliefs.
-Let us visualize the ratio $\alpha_s / \beta_s$ as a function of $a_s$ for different +Let us visualize the ratio $\alpha_s / \beta_s$ as a function of $a_s$ for +different values of $\sigma$: ```{code-cell} ipython3 @@ -481,38 +695,41 @@ mystnb: name: fig-mrs-alpha-beta --- a_vals = np.linspace(0.3, 3.0, 300) -x1_fix, x2_fix = 1.0, 1.0 # fix consumption bundle for illustration +x1_fix, x2_fix = 1.0, 1.0 fig, ax = plt.subplots(figsize=(7, 4)) -for rho, color in zip([-0.5, -1e-6, 0.5], ["steelblue", "crimson", "forestgreen"]): - sigma = 1 / (1 - rho) if abs(rho) > 1e-8 else 1.0 +for ρ in [-0.5, -1e-6, 0.5]: + σ = 1 / (1 - ρ) if abs(ρ) > 1e-8 else 1.0 ratios = [] for a in a_vals: - u1, u2 = ces_derivatives(a * x1_fix, x2_fix, rho) + u1, u2 = ces_derivatives(a * x1_fix, x2_fix, ρ) ratios.append(a * u1 / u2) - ax.plot(a_vals, ratios, - label=rf"$\sigma = {sigma:.2f}$", color=color, lw=2) + ax.plot(a_vals, ratios, label=rf"$\sigma = {σ:.2f}$", lw=2) ax.set_xlabel(r"state value $a_s$", fontsize=12) ax.set_ylabel(r"$\alpha_s / \beta_s = a_s u_1 / u_2$", fontsize=12) ax.axhline(y=1.0, color="black", lw=0.8, ls="--") ax.legend(fontsize=10) -ax.grid(alpha=0.3) plt.tight_layout() plt.show() ``` -When $\sigma = 1$ (red line) the ratio is constant across all $a_s$ values—information +When $\sigma = 1$ the ratio is constant across all $a_s$ values—information about the state has no effect on the marginal rate of substitution. For $\sigma < 1$ the ratio is decreasing in $a_s$, and for $\sigma > 1$ it is increasing, making the equilibrium price strictly monotone in the posterior $q$ in both cases. +The static analysis asks whether a current price reveals current private +information, whereas the next section asks what a whole history of prices +reveals over time. + (bayesian_price_expectations)= ## Bayesian price expectations in a dynamic economy -We now turn to a question addressed in Section 3 of {cite:t}`kihlstrom_mirman1975`. +We now turn to a question addressed in Section 3 of +{cite:t}`kihlstrom_mirman1975`. ### A stochastic exchange economy @@ -525,41 +742,73 @@ In each period $t$: 3. Consumers trade and consume. The endowment vectors $\{\tilde{\omega}^t\}$ are **i.i.d.** with density -$f(\omega^t \mid \lambda)$, where $\lambda = (\lambda_1, \ldots, \lambda_n)$ is a +$f(\omega^t \mid \lambda)$, where $\lambda = (\lambda_1, \ldots, \lambda_n)$ is +a **structural parameter vector** that is *fixed but unknown*. The equilibrium price at time $t$ is a deterministic function of $\omega^t$, so -$\{p^t\}$ is also i.i.d. with density +$\{p^t\}$ is also i.i.d. + +For any measurable price set $P$, let + +$$ +W(P) = \{\omega^t : p(\omega^t) \in P\}. +$$ + +Then $$ -g(p^t \mid \lambda) = \int f(\omega^t \mid \lambda)\, - \mathbf{1}\bigl[p(\omega^t) = p^t\bigr]\, d\omega^t. +P_\lambda(p^t \in P) = P_\lambda(\omega^t \in W(P)) += \int_{W(P)} f(\omega^t \mid \lambda)\, d\omega^t. $$ -Following econometric convention, {cite:t}`kihlstrom_mirman1975` call $g(p \mid \lambda)$ -the **reduced form** and $f(\omega \mid \lambda)$ the **structure**. +The induced price density is denoted by $g(p^t \mid \lambda)$. + +For a given structure $\lambda$, this density is the observable implication of +the model, and when several structures imply the same density we group them +into a single reduced-form class. + +The next issue is therefore what an observer can and cannot infer about the +structure from price data alone. 
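+
+Before treating that issue formally, a toy calculation may help fix ideas; all
+ingredients below are our own illustrative assumptions, with endowments
+$\omega \sim N(\lambda, 1)$ and price function $p(\omega) = \omega^2$, so the
+structures $\lambda$ and $-\lambda$ induce the same price density.
+
+```{code-cell} ipython3
+rng = np.random.default_rng(0)
+n = 200_000
+
+for λ in (1.5, -1.5):
+    ω = rng.normal(λ, 1.0, size=n)
+    p_sim = ω**2    # assumed price map p(ω)
+    print(f"λ = {λ:+.1f}: mean = {p_sim.mean():.3f}, "
+          f"quartiles = {np.percentile(p_sim, [25, 50, 75]).round(3)}")
+```
+
+The two simulated price distributions agree up to sampling noise, so no amount
+of price data could separate these two structures: they belong to the same
+reduced-form class.
+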
### The identification problem -Because the map $\omega \mapsto p(\omega)$ is many-to-one, observing prices loses +Because the map $\omega \mapsto p(\omega)$ is many-to-one, observing prices +loses information relative to observing endowments. In particular, it may be impossible to recover $\lambda$ from $g(p \mid \lambda)$ even with infinite price data. To handle this, partition $\Lambda$ into equivalence classes $\mu$ such that -$\lambda \in \mu$ and $\lambda' \in \mu$ whenever $g(p \mid \lambda) = g(p \mid \lambda')$ +$\lambda \in \mu$ and $\lambda' \in \mu$ whenever $g(p \mid \lambda) = g(p \mid +\lambda')$ for all $p$. The equivalence class $\mu$ containing the true $\lambda$ is the **reduced -form** (with respect to data on prices). +form** relevant for price data. An observer who knows the infinite price history learns $\mu$ but not necessarily $\lambda$. +Once that distinction is clear, Bayesian updating can be written down directly. + ### Bayesian updating -An uninformed observer begins with a prior $h(\lambda)$ over $\lambda \in \Lambda$. +An uninformed observer begins with a prior $h(\lambda)$ over $\lambda \in +\Lambda$. + +If the observer could see endowments directly, the posterior would be + +$$ +h(\lambda \mid \omega^1, \ldots, \omega^t) + = \frac{h(\lambda)\, \prod_{\tau=1}^{t} f(\omega^\tau \mid \lambda)} + {\displaystyle\sum_{\lambda' \in \Lambda} + h(\lambda')\, \prod_{\tau=1}^{t} f(\omega^\tau \mid \lambda')}, +$$ + +and the paper appeals to a Bayesian consistency result to conclude that this +posterior concentrates on the true structure $\bar \lambda$. After observing the price sequence $(p^1, \ldots, p^t)$, the observer's Bayesian posterior is @@ -571,6 +820,21 @@ h(\lambda \mid p^1, \ldots, p^t) h(\lambda')\, \prod_{\tau=1}^{t} g(p^\tau \mid \lambda')}. $$ +Price data cannot distinguish structures inside the same reduced-form class. + +Indeed, if +$\lambda$ and $\lambda'$ belong to the same class $\mu$, then +$g(\cdot \mid \lambda) = g(\cdot \mid \lambda')$, so + +$$ +\frac{h(\lambda \mid p^1, \ldots, p^t)} + {h(\lambda' \mid p^1, \ldots, p^t)} += \frac{h(\lambda)}{h(\lambda')} +$$ + +for every sample history, so the relative odds within an observationally +equivalent class never change. + At time $t$, the observer's price expectations for the next period are $$ @@ -579,6 +843,9 @@ g(p^{t+1} \mid p^1, \ldots, p^t) h(\lambda \mid p^1, \ldots, p^t). $$ +With the posterior and predictive density defined, we can state the paper's +convergence result. + ### The convergence theorem ```{prf:theorem} Bayesian Convergence @@ -587,11 +854,31 @@ $$ Let $\bar\lambda$ be the true structural parameter and $\bar\mu$ the reduced form that contains $\bar\lambda$. +Assume the prior assigns positive probability to $\bar\lambda$ (equivalently, +positive +probability to the class $\bar\mu$). + +Define the posterior mass on a reduced-form class by + +$$ +H_t(\mu) = \sum_{\lambda \in \mu} h(\lambda \mid p^1, \ldots, p^t). +$$ + +Because all structures inside a class imply the same $g(\cdot \mid \lambda)$, +the +predictive density can equivalently be written as + +$$ +g(p^{t+1} \mid p^1, \ldots, p^t) + = \sum_{\mu} g(p^{t+1} \mid \mu)\, H_t(\mu). +$$ + Then $$ -\lim_{t \to \infty} h(\mu \mid p^1, \ldots, p^t) - = \begin{cases} 1 & \text{if } \mu = \bar\mu, \\ 0 & \text{otherwise,} \end{cases} +\lim_{t \to \infty} H_t(\mu) + = \begin{cases} 1 & \text{if } \mu = \bar\mu, \\ 0 & \text{otherwise,} + \end{cases} $$ with probability one. 
@@ -602,18 +889,29 @@ $$ \lim_{t \to \infty} g(p^{t+1} \mid p^1, \ldots, p^t) = g(p \mid \bar\mu), $$ -which equals the rational-expectations price distribution for a fully informed observer. +which equals the rational-expectations price distribution for a fully informed +observer. ``` -Establishing convergence relies on appealing to the **Bayesian consistency** result of {cite:t}`degroot1962`: as -long as $g(\cdot \mid \mu)$ and $g(\cdot \mid \mu')$ generate mutually singular measures -(which holds here generically), the posterior concentrates on the true reduced form. +The important distinction is that price observers need not learn $\bar \lambda$ +itself. + +They only learn which reduced-form class is correct. -Price observers converge to **rational expectations** even if they never identify the -underlying structure $\bar\lambda$. +That is enough for forecasting because every $\lambda \in \bar \mu$ generates +the same price density $g(\cdot \mid \bar \mu)$. -The reduced form $g(p \mid \bar\mu)$ statistical model is used to form equilibrium price -expectations, and the Bayesian observer learns the reduced form from prices alone. +This is exactly the paper's point: rational price expectations emerge from +learning the +reduced form, not from identifying every structural detail of the economy. + +Here "rational expectations" means that the observer's predictive distribution +for next +period's price matches the objective price distribution generated by the true +reduced form. + +The theorem is easiest to absorb in a stripped-down example, so we now turn to a +simple simulation. (bayesian_simulation)= ## Simulating Bayesian learning from prices @@ -623,12 +921,14 @@ We illustrate the theorem with a two-state example. Two possible reduced forms $\mu_1$ and $\mu_2$ generate prices $p^t \sim N(\bar{p}_i, \sigma_p^2)$ for $i = 1, 2$ respectively. -The observer knows the two possible price distributions (the reduced forms) but not which +The observer knows the two possible price distributions (the reduced forms) but +not which one governs the data. -This is a standard **Bayesian model selection** problem. +This is a **Bayesian model selection** problem. -With a prior $h_0$ on $\mu_1$ and the observed price $p^t$, the posterior weight on $\mu_1$ +With a prior $h_0$ on $\mu_1$ and the observed price $p^t$, the posterior weight +on $\mu_1$ after period $t$ is $$ @@ -637,37 +937,22 @@ h_t = \frac{h_{t-1}\, g(p^t \mid \mu_1)}{h_{t-1}\, g(p^t \mid \mu_1) $$ ```{code-cell} ipython3 -def simulate_bayesian_learning(p_bar_true, p_bar_alt, sigma_p, T, h0, n_paths, - seed=42): - """ - Simulate Bayesian learning about which price distribution is true. 
- - Parameters - ---------- - p_bar_true : mean of the true reduced form - p_bar_alt : mean of the alternative reduced form - sigma_p : common standard deviation of price distributions - T : number of periods - h0 : initial prior probability on the true model - n_paths : number of simulation paths - seed : random seed - - Returns - ------- - h_paths : array of shape (n_paths, T+1) with posterior beliefs on true model - """ +def simulate_bayesian_learning( + p_bar_true, p_bar_alt, σ_p, T, h0, n_paths, seed=42 +): + """Simulate posterior learning between two Gaussian reduced forms.""" rng = np.random.default_rng(seed) h_paths = np.zeros((n_paths, T + 1)) h_paths[:, 0] = h0 for path in range(n_paths): h = h0 - prices = rng.normal(p_bar_true, sigma_p, size=T) + prices = rng.normal(p_bar_true, σ_p, size=T) for t, p in enumerate(prices): - g_true = norm.pdf(p, loc=p_bar_true, scale=sigma_p) - g_alt = norm.pdf(p, loc=p_bar_alt, scale=sigma_p) - denom = h * g_true + (1 - h) * g_alt - h = h * g_true / denom + g_true = norm.pdf(p, loc=p_bar_true, scale=σ_p) + g_alt = norm.pdf(p, loc=p_bar_alt, scale=σ_p) + denom = h * g_true + (1 - h) * g_alt + h = h * g_true / denom h_paths[path, t + 1] = h return h_paths @@ -684,12 +969,16 @@ def plot_bayesian_learning(h_paths, p_bar_true, p_bar_alt, ax): median_path = np.median(h_paths, axis=0) ax.plot(t_grid, median_path, color="navy", lw=2, label="median posterior") - ax.axhline(y=1.0, color="black", ls="--", lw=1.2, label="true model weight = 1") + ax.axhline( + y=1.0, + color="black", + ls="--", + lw=1.2, + label="true model weight = 1", + ) ax.set_xlabel("period $t$", fontsize=12) ax.set_ylabel(r"$h_t$ = posterior weight on true model", fontsize=12) ax.legend(fontsize=10) - ax.set_ylim(-0.05, 1.08) - ax.grid(alpha=0.3) ``` ```{code-cell} ipython3 @@ -702,30 +991,38 @@ mystnb: T = 300 h0 = 0.5 # diffuse prior n_paths = 40 -sigma_p = 0.4 +σ_p = 0.4 fig, axes = plt.subplots(1, 2, figsize=(12, 5)) -# Case 1: distinct reduced forms (easy to learn) +# Distinct reduced forms. p_bar_true, p_bar_alt = 2.0, 1.2 -h_paths = simulate_bayesian_learning(p_bar_true, p_bar_alt, sigma_p, T, h0, n_paths) +h_paths = simulate_bayesian_learning(p_bar_true, p_bar_alt, σ_p, T, h0, n_paths) plot_bayesian_learning(h_paths, p_bar_true, p_bar_alt, axes[0]) -# Case 2: similar reduced forms (harder to learn) +# Similar reduced forms. p_bar_true, p_bar_alt = 2.0, 1.8 -h_paths_hard = simulate_bayesian_learning(p_bar_true, p_bar_alt, sigma_p, T, h0, n_paths) +h_paths_hard = simulate_bayesian_learning( + p_bar_true, p_bar_alt, σ_p, T, h0, n_paths +) plot_bayesian_learning(h_paths_hard, p_bar_true, p_bar_alt, axes[1]) plt.tight_layout() plt.show() ``` -In both panels the posterior weight on the true model converges to 1 with probability one, -though convergence is slower when the two price distributions are similar (right panel). +In both panels the posterior weight on the true model converges to 1 with +probability one, +though convergence is slower when the two price distributions are similar (right +panel). + +This first simulation tracks posterior mass, and the next one tracks the +predictive density itself. ### Price expectations vs. rational expectations -We now verify that the observer's price expectations converge to the rational-expectations +We now verify that the observer's price expectations converge to the +rational-expectations distribution $g(p \mid \bar\mu)$. 
```{code-cell} ipython3 @@ -735,62 +1032,72 @@ mystnb: caption: price distribution convergence name: fig-price-convergence --- -def price_expectation(h_t, p_bar_true, p_bar_alt, p_grid): - """ - Compute the observer's predictive price density at posterior weight h_t. - Mixture: h_t * N(p_bar_true, ...) + (1-h_t) * N(p_bar_alt, ...) - """ - return (h_t * norm.pdf(p_grid, loc=p_bar_true, scale=sigma_p) - + (1 - h_t) * norm.pdf(p_grid, loc=p_bar_alt, scale=sigma_p)) +def price_expectation(h_t, p_bar_true, p_bar_alt, sigma_p, p_grid): + """Return the predictive price density at posterior weight h_t.""" + return ( + h_t * norm.pdf(p_grid, loc=p_bar_true, scale=sigma_p) + + (1 - h_t) * norm.pdf(p_grid, loc=p_bar_alt, scale=sigma_p) + ) p_bar_true, p_bar_alt = 2.0, 1.2 -sigma_p = 0.4 -T_long = 1000 +σ_p = 0.4 n_paths = 1 +T_long = 1000 + h_paths_long = simulate_bayesian_learning( - p_bar_true, p_bar_alt, sigma_p, T_long, h0=0.5, n_paths=n_paths, seed=7 + p_bar_true, p_bar_alt, σ_p, T_long, h0=0.5, n_paths=n_paths, seed=7 ) p_grid = np.linspace(0.0, 3.5, 300) -re_density = norm.pdf(p_grid, loc=p_bar_true, scale=sigma_p) +re_density = norm.pdf(p_grid, loc=p_bar_true, scale=σ_p) fig, ax = plt.subplots(figsize=(8, 5)) -snapshots = [0, 10, 50, 200, T_long] +snapshots = [0, 1, 3, 5, 10] palette = plt.cm.Blues(np.linspace(0.3, 1.0, len(snapshots))) for t_snap, col in zip(snapshots, palette): h_t = h_paths_long[0, t_snap] - dens = price_expectation(h_t, p_bar_true, p_bar_alt, p_grid) - ax.plot(p_grid, dens, color=col, lw=2, - label=rf"$t = {t_snap}$, $h_t = {h_t:.3f}$") + dens = price_expectation(h_t, p_bar_true, p_bar_alt, σ_p, p_grid) + ax.plot( + p_grid, + dens, + color=col, + lw=2, + label=rf"$t = {t_snap}$, $h_t = {h_t:.3f}$", + ) ax.plot(p_grid, re_density, "k--", lw=2, - label=r"rational expectations $g(p \mid \bar\mu)$") + label=r"rational expectations $g(p \mid \bar{\mu})$") ax.set_xlabel("price $p$", fontsize=12) ax.set_ylabel("density", fontsize=12) ax.legend(fontsize=9) -ax.grid(alpha=0.3) plt.tight_layout() plt.show() ``` -The sequence of predictive densities (shades of blue) converges to the rational-expectations +The sequence of predictive densities (shades of blue) converges to the +rational-expectations density (dashed black line) as experience accumulates. This illustrates {prf:ref}`ime_theorem_bayesian_convergence`. +We can now sharpen the point by looking at a case in which the reduced form is +learned but the underlying structure is not. + (km_extension_nonidentification)= ### Learning the reduced form without identifying the structure -The convergence result is particularly striking because the observer converges to +The convergence result is particularly striking because the observer converges +to *rational expectations* even when the underlying **structure** $\lambda$ is *not identified* by prices. To illustrate this, consider a case with *three* possible structures $\lambda^{(1)}, \lambda^{(2)}, \lambda^{(3)}$ but only *two* reduced forms $\mu_1 = \{\lambda^{(1)}, \lambda^{(2)}\}$ and $\mu_2 = \{\lambda^{(3)}\}$ -(because $\lambda^{(1)}$ and $\lambda^{(2)}$ generate the same price distribution). +(because $\lambda^{(1)}$ and $\lambda^{(2)}$ generate the same price +distribution). ```{code-cell} ipython3 --- @@ -799,24 +1106,19 @@ mystnb: caption: learning with non-identification name: fig-nonidentification --- -def simulate_learning_3struct(T, h0_vec, p_bar_vec, sigma_p, true_idx, n_paths, seed=0): - """ - Bayesian learning with 3 structures, 2 reduced forms. 
- h0_vec : length-3 array of initial prior weights on each structure - p_bar_vec: length-3 array of price means for each structure - (structures 0 and 1 share the same reduced form if p_bar_vec[0]==p_bar_vec[1]) - true_idx: index (0,1,2) of the true structure - Returns : array (n_paths, T+1, 3) posterior weights on each structure - """ +def simulate_learning_3struct( + T, h0_vec, p_bar_vec, σ_p, true_idx, n_paths, seed=0 +): + """Simulate learning with three structures and two reduced forms.""" rng = np.random.default_rng(seed) h_paths = np.zeros((n_paths, T + 1, 3)) h_paths[:, 0, :] = h0_vec for path in range(n_paths): h = np.array(h0_vec, dtype=float) - prices = rng.normal(p_bar_vec[true_idx], sigma_p, size=T) + prices = rng.normal(p_bar_vec[true_idx], σ_p, size=T) for t, p in enumerate(prices): - likelihoods = norm.pdf(p, loc=p_bar_vec, scale=sigma_p) + likelihoods = norm.pdf(p, loc=p_bar_vec, scale=σ_p) h = h * likelihoods h /= h.sum() h_paths[path, t + 1, :] = h @@ -824,20 +1126,24 @@ def simulate_learning_3struct(T, h0_vec, p_bar_vec, sigma_p, true_idx, n_paths, return h_paths -# Structures 0 and 1 have the same reduced form (same price mean) +# Structures 0 and 1 share the same reduced form. p_bar_vec = np.array([2.0, 2.0, 1.2]) -h0_vec = np.array([1/3, 1/3, 1/3]) -sigma_p = 0.4 -T = 400 -true_idx = 0 # True structure is 0 (indistinguishable from 1) +h0_vec = np.array([1 / 3, 1 / 3, 1 / 3]) +σ_p = 0.4 +T = 400 +true_idx = 0 # Structure 0 is observationally equivalent to 1. -h_paths_3 = simulate_learning_3struct(T, h0_vec, p_bar_vec, sigma_p, true_idx, n_paths=30) +h_paths_3 = simulate_learning_3struct( + T, h0_vec, p_bar_vec, σ_p, true_idx, n_paths=30 +) t_grid = np.arange(T + 1) fig, axes = plt.subplots(1, 3, figsize=(13, 4), sharey=True) -struct_labels = [r"$\lambda^{(1)}$", - r"$\lambda^{(2)}$ (same reduced form as $\lambda^{(1)}$)", - r"$\lambda^{(3)}$"] +struct_labels = [ + r"$\lambda^{(1)}$", + r"$\lambda^{(2)}$ (same reduced form as $\lambda^{(1)}$)", + r"$\lambda^{(3)}$", +] for k, (ax, label) in enumerate(zip(axes, struct_labels)): for path in h_paths_3: @@ -845,7 +1151,6 @@ for k, (ax, label) in enumerate(zip(axes, struct_labels)): ax.plot(t_grid, np.median(h_paths_3[:, :, k], axis=0), color="navy", lw=2, label=f"median weight on {label}") ax.set_xlabel("period $t$", fontsize=11) - ax.grid(alpha=0.3) ax.legend(fontsize=9) axes[0].set_ylabel("posterior weight", fontsize=11) @@ -853,20 +1158,25 @@ plt.tight_layout() plt.show() ``` -The observer correctly rules out $\lambda^{(3)}$ (the wrong reduced form) with probability -one, but cannot distinguish $\lambda^{(1)}$ from $\lambda^{(2)}$ because they generate an +The observer correctly rules out $\lambda^{(3)}$ (the wrong reduced form) with +probability +one, but cannot distinguish $\lambda^{(1)}$ from $\lambda^{(2)}$ because they +generate an identical price distribution. Nevertheless, the observer's **price expectations** converge -to rational expectations because both structures imply the same reduced form $\bar\mu$. +to rational expectations because both structures imply the same reduced form +$\bar\mu$. 
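+
+A quick check makes the last claim tangible: reusing the objects from the
+simulation above, the predictive price density is the same for *any* split of
+posterior mass between $\lambda^{(1)}$ and $\lambda^{(2)}$ once
+$\lambda^{(3)}$ is ruled out (the splits below are arbitrary illustrative
+choices).
+
+```{code-cell} ipython3
+p_check = np.linspace(0.5, 3.5, 5)
+
+for h1 in (0.2, 0.5, 0.8):    # hypothetical splits, zero weight on λ3
+    w = np.array([h1, 1 - h1, 0.0])
+    predictive = sum(
+        w_k * norm.pdf(p_check, loc=m, scale=σ_p)
+        for w_k, m in zip(w, p_bar_vec)
+    )
+    print(np.round(predictive, 6))
+```
+
+All three rows coincide, so price expectations are pinned down by the reduced
+form alone, exactly as {prf:ref}`ime_theorem_bayesian_convergence` asserts.
+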
+ ## Exercises ```{exercise} :label: km_ex1 -**Invertibility with CARA preferences.** Consider a two-state economy ($a_1 = 2$, -$a_2 = 0.5$) where the informed agent has **CARA** (constant absolute risk aversion) +Consider a two-state economy ($a_1 = 2$, +$a_2 = 0.5$) where the informed agent has **CARA** (constant absolute risk +aversion) preferences over portfolio wealth: $$ @@ -879,17 +1189,24 @@ $$ q\,u(W_1) + (1-q)\,u(W_2), \quad W_s = w - p\,x_1 + a_s\,x_1, $$ -subject to the budget constraint $p\,x_1 + x_2 = w$. Total supply of good 1 is $X_1 = 1$. +subject to the budget constraint $p\,x_1 + x_2 = w$. + +Total supply of good 1 is $X_1 = 1$. 1. Derive the first-order condition for the informed agent's optimal $x_1$. -1. Use the market-clearing condition $x_1 = 1$ (the informed agent absorbs the entire -supply) to obtain an implicit equation for the equilibrium price $p^*(q)$. Solve it +1. Use the market-clearing condition $x_1 = 1$ (the informed agent absorbs the + entire +supply) to obtain an implicit equation for the equilibrium price $p^*(q)$. +Solve it numerically for $q \in (0,1)$ and several values of $\gamma$. -1. Show numerically that $p^*(q)$ is monotone in $q$, so the invertibility condition -holds. Explain intuitively why CARA preferences always lead to an invertible price map -(the elasticity of substitution of portfolio utility is $\sigma = \infty$). +1. Show numerically that $p^*(q)$ is monotone in $q$, so the invertibility + condition +holds in this example. Explain why this is economically similar to the $\sigma > +1$ case in +{prf:ref}`ime_theorem_invertibility_conditions`, but not a direct application of +that theorem. ``` ```{solution-start} km_ex1 @@ -898,7 +1215,9 @@ holds. Explain intuitively why CARA preferences always lead to an invertible pr **1. First-order condition.** -Define $W_s = w + (a_s - p)\,x_1$ for $s=1,2$. The FOC is +Define $W_s = w + (a_s - p)\,x_1$ for $s=1,2$. + +The FOC is $$ q\,(a_1 - p)\,\gamma\, e^{-\gamma W_1} @@ -915,6 +1234,7 @@ $$ **2. 
Market-clearing equilibrium price.** Setting $x_1 = 1$ (all supply absorbed by informed agent), the equation becomes + a scalar root-finding problem in $p$: $$ @@ -925,39 +1245,46 @@ $$ ```{code-cell} ipython3 from scipy.optimize import brentq -def F_cara(p, q, a1, a2, gamma, x1=1.0): - """Residual of CARA market-clearing condition.""" - return (q * (a1-p) * np.exp(-gamma*(a1-p)*x1) - - (1-q) * (p-a2) * np.exp(gamma*(p-a2)*x1)) +def F_cara(p, q, a1, a2, γ, x1=1.0): + """Residual for the CARA equilibrium condition.""" + return (q * (a1 - p) * np.exp(-γ * (a1 - p) * x1) + - (1 - q) * (p - a2) * np.exp(γ * (p - a2) * x1)) -a1, a2 = 2.0, 0.5 -q_grid = np.linspace(0.05, 0.95, 200) -gammas = [0.5, 1.0, 2.0, 5.0] -colors_sol = plt.cm.plasma(np.linspace(0.15, 0.85, len(gammas))) +a1, a2 = 2.0, 0.5 +q_grid = np.linspace(0.05, 0.95, 200) +γ_values = [0.5, 1.0, 2.0, 5.0] +colors_sol = plt.cm.plasma(np.linspace(0.15, 0.85, len(γ_values))) fig, ax = plt.subplots(figsize=(8, 5)) -for gamma, color in zip(gammas, colors_sol): +for γ, color in zip(γ_values, colors_sol): p_eq = [brentq(F_cara, a2, a1, - args=(q, a1, a2, gamma)) + args=(q, a1, a2, γ)) for q in q_grid] ax.plot(q_grid, p_eq, lw=2, color=color, - label=rf"$\gamma = {gamma}$") + label=rf"$\gamma = {γ}$") ax.set_xlabel(r"posterior $q = \Pr(\bar a = a_1)$", fontsize=12) ax.set_ylabel("equilibrium price $p^*(q)$", fontsize=12) ax.set_title("CARA preferences: equilibrium prices", fontsize=12) ax.legend(fontsize=10) -ax.grid(alpha=0.3) plt.tight_layout() plt.show() ``` **3. Invertibility for CARA.** -The price is strictly increasing in $q$ for every $\gamma > 0$. Intuitively, portfolio -utility $u(x_2 + \bar{a}\,x_1)$ treats the two goods as **perfect substitutes** in -creating wealth, giving an elasticity of substitution $\sigma = \infty \neq 1$. By -{prf:ref}`ime_theorem_invertibility_conditions`, the price map is therefore always invertible. +The price is strictly increasing in $q$ for every $\gamma > 0$, because +portfolio utility $u(x_2 + \bar{a}\,x_1)$ treats the two goods as **perfect +substitutes** in creating wealth, so a higher posterior probability of the +high-return state raises the marginal value of the risky asset and pushes the +equilibrium price upward. + +This behavior is similar in spirit to the $\sigma > 1$ case in +{prf:ref}`ime_theorem_invertibility_conditions`, but it is *not* a direct +consequence of that theorem because CARA utility over wealth is not homothetic +in the two-good representation used in the theorem. + +Here monotonicity is verified directly from the specific first-order condition. ```{solution-end} ``` @@ -966,21 +1293,27 @@ creating wealth, giving an elasticity of substitution $\sigma = \infty \neq 1$. :label: km_ex2 In the Bayesian learning simulation, the speed of -convergence to rational expectations is determined by the **Kullback-Leibler divergence** +convergence to rational expectations is determined by the **Kullback-Leibler +divergence** between the two reduced forms. -The KL divergence from $g(\cdot \mid \mu_2)$ to $g(\cdot \mid \mu_1)$, for two normal -distributions with means $\bar{p}_1$ and $\bar{p}_2$ and common variance $\sigma_p^2$, is +The KL divergence from $g(\cdot \mid \mu_2)$ to $g(\cdot \mid \mu_1)$, for two +normal +distributions with means $\bar{p}_1$ and $\bar{p}_2$ and common variance +$\sigma_p^2$, is $$ D_{KL}(\mu_1 \| \mu_2) = \frac{(\bar{p}_1 - \bar{p}_2)^2}{2\sigma_p^2}. $$ -1. For the "easy" case ($\bar{p}_1 = 2.0$, $\bar{p}_2 = 1.2$) and the "hard" case +1. 
For the "easy" case ($\bar{p}_1 = 2.0$, $\bar{p}_2 = 1.2$) and the "hard" + case ($\bar{p}_1 = 2.0$, $\bar{p}_2 = 1.8$), compute $D_{KL}$ for $\sigma_p = 0.4$. -1. Re-run the simulations from the lecture for both cases with $n=100$ paths. For each -path compute the first period $T_{0.99}$ at which $h_t \geq 0.99$. Plot histograms of +1. Re-run the simulations from the lecture for both cases with $n=100$ paths. + For each +path compute the first period $T_{0.99}$ at which $h_t \geq 0.99$. Plot +histograms of $T_{0.99}$ for both cases. 1. How does the median $T_{0.99}$ scale with $D_{KL}$? Verify numerically that @@ -992,25 +1325,25 @@ roughly $T_{0.99} \approx C / D_{KL}$ for some constant $C$. ``` ```{code-cell} ipython3 -sigma_p = 0.4 +σ_p = 0.4 -def kl_normal(p1, p2, sigma): - """KL divergence between N(p1,sigma^2) and N(p2,sigma^2).""" - return (p1 - p2)**2 / (2 * sigma**2) +def kl_normal(p1, p2, σ): + """Return the KL divergence for N(p1, sigma^2) and N(p2, sigma^2).""" + return (p1 - p2)**2 / (2 * σ**2) cases = [("Easy", 2.0, 1.2), ("Hard", 2.0, 1.8)] for name, p1, p2 in cases: - kl = kl_normal(p1, p2, sigma_p) + kl = kl_normal(p1, p2, σ_p) print(f"{name} case: D_KL = {kl:.4f}") n_paths = 100 fig, axes = plt.subplots(1, 2, figsize=(11, 4)) for ax, (name, p1, p2) in zip(axes, cases): - kl = kl_normal(p1, p2, sigma_p) - paths = simulate_bayesian_learning(p1, p2, sigma_p, T=2000, + kl = kl_normal(p1, p2, σ_p) + paths = simulate_bayesian_learning(p1, p2, σ_p, T=2000, h0=0.5, n_paths=n_paths, seed=42) - # First period where posterior >= 0.99 + # First period with posterior >= 0.99. T99 = [] for path in paths: idx = np.where(path >= 0.99)[0] @@ -1028,14 +1361,15 @@ for ax, (name, p1, p2) in zip(axes, cases): ax.set_xlabel(r"$T_{0.99}$", fontsize=12) ax.set_ylabel("count", fontsize=11) ax.legend(fontsize=10) - ax.grid(alpha=0.3) plt.tight_layout() plt.show() ``` -The median $T_{0.99}$ scales as approximately $C/D_{KL}$, confirming that learning is -faster when the two reduced forms are more easily distinguished (large $D_{KL}$). +The median $T_{0.99}$ scales as approximately $C/D_{KL}$, confirming that +learning is +faster when the two reduced forms are more easily distinguished (large +$D_{KL}$). ```{solution-end} ``` @@ -1043,13 +1377,17 @@ faster when the two reduced forms are more easily distinguished (large $D_{KL}$) ```{exercise} :label: km_ex3 -**Failure of invertibility—counterexample for $S > 2$.** The paper constructs a -counterexample showing that for $S = 3$ states, even if the elasticity of substitution -of $u^1$ is everywhere greater than one, $p(PR^1)$ need **not** be invertible. +The paper constructs a +counterexample showing that for $S = 3$ states, even if the elasticity of +substitution +of $u^1$ is everywhere greater than one, the price map need **not** be +invertible. Consider the marginal rate of substitution for the portfolio utility $u^1(a_s x_1 + x_2)$ (infinite elasticity of substitution) and three states -$a_1 > a_2 > a_3$. The MRS is +$a_1 > a_2 > a_3$. + +The MRS is $$ m(\mu) @@ -1060,16 +1398,21 @@ $$ where $\beta_s = u^{1\prime}(a_s x_1 + x_2)$. 1. For the parameterization used by {cite:t}`kihlstrom_mirman1975`—let -$\mu(a_3) = q$, $\mu(a_2) = r$, $\mu(a_1) = 1-r-q$—write $m$ as a function of $(q, r)$. +$\mu(a_3) = q$, $\mu(a_2) = r$, $\mu(a_1) = 1-r-q$—write $m$ as a function of +$(q, r)$. Compute $\partial m / \partial r$ and show that its sign depends on $\beta_1\beta_2(a_1-a_2)$ and $\beta_2\beta_3(a_2-a_3)$. -1. 
Choose $a_1 = 3$, $a_2 = 2$, $a_3 = 0.5$ and $u'(c) = c^{-\gamma}$ (CRRA with risk -aversion $\gamma$). Fix $x_1 = 1$, $x_2 = 0.5$. For $\gamma = 2$, verify numerically -that $\partial m/\partial r$ changes sign (i.e., $m$ is *not* globally monotone in $r$), +1. Choose $a_1 = 3$, $a_2 = 2$, $a_3 = 0.5$ and $u'(c) = c^{-\gamma}$ (CRRA with + risk +aversion $\gamma$). Fix $x_1 = 1$, $x_2 = 0.5$. For $\gamma = 2$, verify +numerically +that $\partial m/\partial r$ changes sign (i.e., $m$ is *not* globally monotone +in $r$), giving a counterexample to invertibility. -1. Explain why this non-monotonicity does *not* arise in the two-state case $S = 2$. +1. Explain why this non-monotonicity does *not* arise in the two-state case $S = + 2$. ``` ```{solution-start} km_ex3 @@ -1087,7 +1430,8 @@ Differentiating using the quotient rule (denominator $D$): $$ \frac{\partial m}{\partial r} -= \frac{(a_2\beta_2 - a_1\beta_1)D - (a_1\beta_1(1-r-q)+a_2\beta_2 r+a_3\beta_3 q)(\beta_2-\beta_1)}{D^2}. += \frac{(a_2\beta_2 - a_1\beta_1)D - (a_1\beta_1(1-r-q)+a_2\beta_2 r+a_3\beta_3 +q)(\beta_2-\beta_1)}{D^2}. $$ After simplification this reduces to a signed combination of @@ -1097,40 +1441,41 @@ whose sign is parameter-dependent. **2. Numerical verification.** ```{code-cell} ipython3 -def mrs_3state(q, r, a1, a2, a3, x1, x2, gamma): - """MRS with mu(a3)=q, mu(a2)=r, mu(a1)=1-r-q, portfolio utility u'(c)=c^{-gamma}.""" - mu1, mu2, mu3 = 1 - r - q, r, q - beta1 = (a1 * x1 + x2)**(-gamma) - beta2 = (a2 * x1 + x2)**(-gamma) - beta3 = (a3 * x1 + x2)**(-gamma) - num = a1*beta1*mu1 + a2*beta2*mu2 + a3*beta3*mu3 - den = beta1*mu1 + beta2*mu2 + beta3*mu3 +def mrs_3state(q, r, a1, a2, a3, x1, x2, γ): + """Return the three-state MRS at (q, r).""" + μ1, μ2, μ3 = 1 - r - q, r, q + β1 = (a1 * x1 + x2)**(-γ) + β2 = (a2 * x1 + x2)**(-γ) + β3 = (a3 * x1 + x2)**(-γ) + num = a1 * β1 * μ1 + a2 * β2 * μ2 + a3 * β3 * μ3 + den = β1 * μ1 + β2 * μ2 + β3 * μ3 return num / den -a1, a2, a3 = 3.0, 2.0, 0.5 -x1, x2 = 1.0, 0.5 -gamma = 2.0 -q_fix = 0.1 # fix q, vary r -r_grid = np.linspace(0.05, 0.80, 200) +a1, a2, a3 = 3.0, 2.0, 0.5 +x1, x2 = 1.0, 0.5 +γ = 2.0 +q_fix = 0.1 +r_grid = np.linspace(0.05, 0.80, 200) -# Filter valid (q+r <= 1) +# Valid region: q + r <= 1. 
r_valid = r_grid[r_grid + q_fix <= 0.95] -m_vals = [mrs_3state(q_fix, r, a1, a2, a3, x1, x2, gamma) for r in r_valid] -dm_dr = np.gradient(m_vals, r_valid) +m_vals = [mrs_3state(q_fix, r, a1, a2, a3, x1, x2, γ) for r in r_valid] +dm_dr = np.gradient(m_vals, r_valid) fig, axes = plt.subplots(1, 2, figsize=(11, 4)) axes[0].plot(r_valid, m_vals, color="steelblue", lw=2) axes[0].set_xlabel(r"$r = \mu(a_2)$", fontsize=12) -axes[0].set_ylabel(r"$m(q, r)$ — MRS", fontsize=12) -axes[0].set_title(fr"MRS is non-monotone in $r$ (CRRA $\gamma={gamma}$)", fontsize=12) -axes[0].grid(alpha=0.3) +axes[0].set_ylabel("MRS m(q, r)", fontsize=12) +axes[0].set_title(f"MRS is non-monotone in r (CRRA gamma={γ})", fontsize=12) axes[1].plot(r_valid, dm_dr, color="crimson", lw=2) axes[1].axhline(0, color="black", lw=1, ls="--") axes[1].set_xlabel(r"$r = \mu(a_2)$", fontsize=12) axes[1].set_ylabel(r"$\partial m / \partial r$", fontsize=12) -axes[1].set_title("Derivative changes sign — non-invertibility for $S=3$", fontsize=12) -axes[1].grid(alpha=0.3) +axes[1].set_title( + "Derivative changes sign - non-invertibility for $S=3$", + fontsize=12, +) plt.tight_layout() plt.show() @@ -1139,15 +1484,19 @@ print("Sign changes in dm/dr:", np.sum(np.diff(np.sign(dm_dr)) != 0)) ``` -The derivative $\partial m / \partial r$ changes sign, confirming that the MRS (and hence +The derivative $\partial m / \partial r$ changes sign, confirming that the MRS +(and hence the equilibrium price) is **not** monotone in $r$ for $S = 3$. -**3.** In the two-state case $S = 2$, the prior is parameterized by a single scalar $q$ -and the MRS is a function of $q$ alone. One can show directly that $\partial m / \partial q$ -has a definite sign determined entirely by whether $a_1 > a_2$ and whether -$\sigma > 1$ or $\sigma < 1$ hold—there is no room for sign changes. With three states, -the two-dimensional prior $(q, r)$ allows richer interactions between $\beta_s$ values that -can reverse the sign of the derivative. +**3.** In the two-state case $S = 2$, the prior is parameterized by a single +scalar $q$ and the MRS is a function of $q$ alone. + +One can show directly that $\partial m / \partial q$ has a definite sign +determined entirely by whether $a_1 > a_2$ and whether $\sigma > 1$ or $\sigma < +1$ hold, so there is no room for sign changes. + +With three states, the two-dimensional prior $(q, r)$ allows richer interactions +between $\beta_s$ values that can reverse the sign of the derivative. ```{solution-end} ``` @@ -1158,21 +1507,33 @@ can reverse the sign of the derivative. {prf:ref}`ime_theorem_bayesian_convergence` assumes the true distribution $g(\cdot \mid \bar\lambda)$ is in the support of the prior (i.e., -$h(\bar\lambda) > 0$). Investigate what happens when the true model is **not** in the +$h(\bar\lambda) > 0$). + +Investigate what happens when the true model is **not** in the prior support. -1. Simulate $T = 1,000$ periods of prices from $N(2.0, 0.4^2)$ but use a prior that - places equal weight on two *wrong* models: $N(1.5, 0.4^2)$ and $N(2.5, 0.4^2)$. +1. Simulate $T = 1,000$ periods of prices from $N(2.0, 0.4^2)$ but use a prior + that + places equal weight on two *wrong* models: $N(1.5, 0.4^2)$ and $N(2.3, + 0.4^2)$. - Plot the posterior weight on each model over time. -2. Show that the **predictive** (mixture) price distribution converges to the *closest* - model in KL divergence terms—which by symmetry is the equal mixture, with mean 2.0. +2. 
Show that the **predictive** (mixture) price distribution converges to the + *closest* + model in KL divergence terms. - - Verify this numerically by computing the predictive mean over time. + - Compute the KL divergence from the true model to each wrong model. + - Verify numerically that the posterior concentrates on the closer wrong + model and that + the predictive mean converges to that model's mean. 3. Relate this finding to the Bayesian consistency literature: when is the limit - distribution a good approximation to the true distribution even under misspecification? + distribution a good approximation to the true distribution even under + misspecification? + Why is the symmetric pair $N(1.5, 0.4^2)$ and $N(2.5, 0.4^2)$ a + knife-edge case rather + than a setting with a deterministic 50-50 posterior limit? ``` ```{solution-start} km_ex4 @@ -1180,12 +1541,10 @@ prior support. ``` ```{code-cell} ipython3 -def simulate_misspecified(T, p_bar_true, p_bar_wrong, sigma_p, h0, n_paths, seed=0): - """ - Misspecified Bayesian learning: two wrong models with means p_bar_wrong[0,1]. - True model has mean p_bar_true (not in prior support). - Returns (n_paths, T+1, 2) array of posterior weights. - """ +def simulate_misspecified( + T, p_bar_true, p_bar_wrong, sigma_p, h0, n_paths, seed=0 +): + """Simulate learning under a misspecified two-model prior.""" rng = np.random.default_rng(seed) h_paths = np.zeros((n_paths, T + 1, 2)) h_paths[:, 0, :] = h0 @@ -1202,52 +1561,116 @@ def simulate_misspecified(T, p_bar_true, p_bar_wrong, sigma_p, h0, n_paths, seed return h_paths -T = 1000 -p_true = 2.0 -p_wrong = np.array([1.5, 2.5]) -sigma_p = 0.4 -h0 = np.array([0.5, 0.5]) -n_paths = 30 +def predictive_density(weights, means, sigma_p, p_grid): + """Return the predictive density under the current posterior weights.""" + density = np.zeros_like(p_grid) + for weight, mean in zip(weights, means): + density += weight * norm.pdf(p_grid, loc=mean, scale=sigma_p) + return density + + +T = 1000 +p_true = 2.0 +p_wrong = np.array([1.5, 2.3]) +sigma_p = 0.4 +h0 = np.array([0.5, 0.5]) +n_paths = 30 h_misspec = simulate_misspecified(T, p_true, p_wrong, sigma_p, h0, n_paths) +kl_vals = (p_true - p_wrong)**2 / (2 * sigma_p**2) +for mean, kl in zip(p_wrong, kl_vals): + print(f"KL(true || N({mean:.1f}, sigma^2)) = {kl:.4f}") + t_grid = np.arange(T + 1) fig, axes = plt.subplots(1, 2, figsize=(12, 4)) -for ax, k, label in zip(axes, [0, 1], [r"$N(1.5, \sigma^2)$", r"$N(2.5, \sigma^2)$"]): +labels = [r"$N(1.5, \sigma^2)$", r"$N(2.3, \sigma^2)$"] +for ax, k, label in zip(axes, [0, 1], labels): for path in h_misspec: ax.plot(t_grid, path[:, k], alpha=0.2, lw=0.8, color="steelblue") ax.plot(t_grid, np.median(h_misspec[:, :, k], axis=0), color="navy", lw=2, label="median") - ax.axhline(0.5, color="crimson", lw=1.5, ls="--", label="0.5 (symmetric limit)") ax.set_title(f"Posterior weight on {label}", fontsize=11) ax.set_xlabel("period $t$", fontsize=11) ax.set_ylabel("posterior weight", fontsize=11) ax.legend(fontsize=9) - ax.grid(alpha=0.3) plt.tight_layout() plt.show() -# Predictive mean = h[:,0]*1.5 + h[:,1]*2.5 +# Predictive density and mean along the median posterior path. 
+median_path = np.median(h_misspec, axis=0) +p_grid = np.linspace(0.0, 3.5, 300) +closer_idx = np.argmin(kl_vals) + +fig, ax = plt.subplots(figsize=(8, 4)) +colors = plt.cm.Blues(np.linspace(0.3, 1.0, 4)) +for t_snap, color in zip([0, 10, 100, T], colors): + dens = predictive_density(median_path[t_snap], p_wrong, sigma_p, p_grid) + ax.plot(p_grid, dens, color=color, lw=2, label=f"t = {t_snap}") + +ax.plot( + p_grid, + norm.pdf(p_grid, loc=p_wrong[closer_idx], scale=sigma_p), + "k--", + lw=2, + label="KL-best wrong model", +) +ax.set_xlabel("price $p$", fontsize=11) +ax.set_ylabel("density", fontsize=11) +ax.legend(fontsize=9) +plt.tight_layout() +plt.show() + pred_mean = np.median( h_misspec[:, :, 0] * p_wrong[0] + h_misspec[:, :, 1] * p_wrong[1], axis=0 ) print(f"True mean: {p_true}") print(f"Predictive mean at T={T}: {pred_mean[-1]:.4f}") -print("(Symmetry implies equal weight on 1.5 and 2.5 → predictive mean = 2.0)") +print(f"Closer misspecified mean: {p_wrong[np.argmin(kl_vals)]:.1f}") ``` -By symmetry, the two wrong models are equidistant from the true distribution in KL -divergence. +Here + +$$ +D_{KL}\bigl(N(2.0, 0.4^2)\,\|\,N(2.3, 0.4^2)\bigr) +< +D_{KL}\bigl(N(2.0, 0.4^2)\,\|\,N(1.5, 0.4^2)\bigr), +$$ -The posterior therefore converges to the 50-50 mixture, and the predictive mean -converges to $0.5 \times 1.5 + 0.5 \times 2.5 = 2.0$—coinciding with the true mean -despite misspecification. +so the model with mean $2.3$ is the unique KL-best approximation among the two +wrong models, and in the simulation posterior weight concentrates on that model +while the predictive mean converges to $2.3$, not to the true mean $2.0$. This is an instance of the general result that under -misspecification, Bayesian posteriors converge to the distribution in the model class that +misspecification, Bayesian posteriors converge to the distribution in the model +class that minimizes KL divergence from the model actually generating the data. +The connection is that posterior odds are cumulative likelihood ratios. + +If we compare the two wrong Gaussian models $f$ and $g$, then under the true +distribution $h$ the average log likelihood ratio satisfies + +$$ +\frac{1}{t} E_h[\log L_t] = K(h,g) - K(h,f). +$$ + +So if $f$ is KL-closer to $h$ than $g$ is, $\log L_t$ has positive drift and +posterior odds tilt toward $f$. + +That is exactly the mechanism emphasized in {doc}`Likelihood Ratio Processes +`. + +The lecture {doc}`likelihood_bayes` gives the Bayesian version of the same +argument by showing how the posterior is a monotone transform of the likelihood +ratio process. + +The symmetric pair $N(1.5, 0.4^2)$ and $N(2.5, 0.4^2)$ is different because both +wrong models are equally far from the truth in KL terms, so there is no unique +pseudo-true model and that knife-edge symmetry does **not** imply a +deterministic 50-50 posterior limit. + ```{solution-end} ``` diff --git a/lectures/multivariate_normal.md b/lectures/multivariate_normal.md index 2b0d292df..55c513e3a 100644 --- a/lectures/multivariate_normal.md +++ b/lectures/multivariate_normal.md @@ -326,13 +326,13 @@ Let's compute $a_1, a_2, b_1, b_2$. ```{code-cell} python3 -beta = multi_normal.βs +β = multi_normal.βs -a1 = μ[0] - beta[0]*μ[1] -b1 = beta[0] +a1 = μ[0] - β[0]*μ[1] +b1 = β[0] -a2 = μ[1] - beta[1]*μ[0] -b2 = beta[1] +a2 = μ[1] - β[1]*μ[0] +b2 = β[1] ``` Let's print out the intercepts and slopes. @@ -2339,18 +2339,15 @@ the retained $z_1$ values. 
```{code-cell} python3 import numpy as np -import statsmodels.api as sm μ = np.array([.5, 1.]) Σ = np.array([[1., .5], [.5, 1.]]) -# (a) analytical conditional distribution mn = MultivariateNormal(μ, Σ) mn.partition(1) μ1_hat, Σ11_hat = mn.cond_dist(0, np.array([2.])) print(f"Analytical μ̂₁ = {μ1_hat[0]:.4f}, Σ̂₁₁ = {Σ11_hat[0,0]:.4f}") -# (b) simulation n = 1_000_000 data = np.random.multivariate_normal(μ, Σ, size=n) z1_all, z2_all = data[:, 0], data[:, 1] @@ -2396,12 +2393,12 @@ so $b_1 b_2 = \rho^2$. ```{code-cell} python3 import numpy as np -for rho in [0.2, 0.5, 0.9]: - Σ = np.array([[1., rho], [rho, 1.]]) +for ρ in [0.2, 0.5, 0.9]: + Σ = np.array([[1., ρ], [ρ, 1.]]) mn = MultivariateNormal(np.zeros(2), Σ) mn.partition(1) product = float(mn.βs[0]) * float(mn.βs[1]) - print(f"ρ={rho:.1f}: b1*b2 = {product:.4f}, ρ² = {rho**2:.4f}, match: {np.isclose(product, rho**2)}") + print(f"ρ={ρ:.1f}: b1*b2 = {product:.4f}, ρ² = {ρ**2:.4f}, match: {np.isclose(product, ρ**2)}") ``` ```{solution-end} @@ -2442,7 +2439,7 @@ for σy_val in [1., 5., 10., 20., 50.]: μ_i, Σ_i, _ = construct_moments_IQ(i, μθ_val, σθ_val, σy_val) mn_i = MultivariateNormal(μ_i, Σ_i) mn_i.partition(i) - _, Σθ_i = mn_i.cond_dist(1, np.zeros(i)) # conditioning value doesn't affect variance + _, Σθ_i = mn_i.cond_dist(1, np.zeros(i)) σθ_hat_arr[i - 1] = np.sqrt(Σθ_i[0, 0]) ax.plot(range(1, n_max + 1), σθ_hat_arr, label=f'σy={σy_val:.0f}') @@ -2490,7 +2487,6 @@ import matplotlib.pyplot as plt n_scores = 20 μθ_val, σy_val = 100., 10. -# draw one set of test scores from a fixed "true" θ np.random.seed(42) true_θ = 108. y_obs = true_θ + σy_val * np.random.randn(n_scores) @@ -2564,7 +2560,6 @@ T_ex = 60 x0_hat_ex = np.zeros(2) Σ0_ex = np.eye(2) -# simulate true states and observations np.random.seed(7) x_true = np.zeros((T_ex + 1, 2)) y_seq_ex = np.zeros(T_ex) @@ -2572,10 +2567,8 @@ for t in range(T_ex): x_true[t + 1] = A_ex @ x_true[t] + C_ex[:, 0] * np.random.randn() y_seq_ex[t] = G_ex @ x_true[t] + np.random.randn() -# run filter x_hat_seq, Σ_hat_seq = iterate(x0_hat_ex, Σ0_ex, A_ex, C_ex, G_ex, R_ex, y_seq_ex) -# (b) conditional variances fig, ax = plt.subplots() ax.plot(Σ_hat_seq[:, 0, 0], label=r'$\Sigma_t[0,0]$') ax.plot(Σ_hat_seq[:, 1, 1], label=r'$\Sigma_t[1,1]$') @@ -2584,7 +2577,6 @@ ax.set_ylabel('conditional variance') ax.legend() plt.show() -# (c) filtered state vs. truth vs. 
observations
fig, ax = plt.subplots()
ax.plot(x_true[1:, 0], label='true $x_t[0]$', alpha=0.7)
ax.plot(x_hat_seq[1:, 0], label=r'filtered $\hat{x}_t[0]$', ls='--')

@@ -2633,7 +2625,6 @@ k_fa = 2
 Λ_fa[:N_fa//2, 0] = 1
 Λ_fa[N_fa//2:, 1] = 1
 
-results_table = {}
 for σu_val in [0.5, 2.0]:
     D_fa = np.eye(N_fa) * σu_val ** 2
     Σy_fa = Λ_fa @ Λ_fa.T + D_fa
@@ -2644,10 +2635,8 @@ for σu_val in [0.5, 2.0]:
     λ_fa = λ_fa[ind_fa]
 
     frac = λ_fa[:2].sum() / λ_fa.sum()
-    results_table[σu_val] = frac
     print(f"σu={σu_val}: fraction explained by first 2 PCs = {frac:.4f}")
 
-# (b) comparison using σu=0.5
 σu_b = 0.5
 D_b = np.eye(N_fa) * σu_b ** 2
 Σy_b = Λ_fa @ Λ_fa.T + D_b
@@ -2658,11 +2647,9 @@ z_b = np.random.multivariate_normal(μz_b, Σz_b)
 f_b = z_b[:k_fa]
 y_b = z_b[k_fa:]
 
-# factor-analytic E[f|y]
 B_b = Λ_fa.T @ np.linalg.inv(Σy_b)
 Efy_b = B_b @ y_b
 
-# PCA projection
 λ_b, P_b = np.linalg.eigh(Σy_b)
 ind_b = sorted(range(N_fa), key=lambda x: λ_b[x], reverse=True)
 P_b = P_b[:, ind_b]
diff --git a/lectures/prob_matrix.md b/lectures/prob_matrix.md
index 8cbb50f40..bc5ac21f4 100644
--- a/lectures/prob_matrix.md
+++ b/lectures/prob_matrix.md
@@ -66,35 +66,45 @@ We'll briefly define what we mean by a **probability space**, a **probability me
 
 For most of this lecture, we sweep these objects into the background
 
 ```{note}
-Nevertheless, they'll be lurking beneath **induced distributions** of random variables that we'll focus on here. These deeper objects are essential for defining and analysing the concepts of stationarity and ergodicity that underly laws of large numbers. For a relatively
-nontechnical presentation of some of these results see this chapter from Lars Peter Hansen and Thomas J. Sargent's online monograph titled "Risk, Uncertainty, and Values":.
+Nevertheless, they'll be lurking beneath **induced distributions** of random variables that we'll focus on here.
+
+These deeper objects are essential for defining and analysing the concepts of stationarity and ergodicity that underlie laws of large numbers.
+
+For a relatively
+nontechnical presentation of some of these results see this chapter from Lars Peter Hansen and Thomas J. Sargent's online monograph titled [*Risk, Uncertainty, and Values*](https://lphansen.github.io/QuantMFR/book/1_stochastic_processes.html).
 ```
 
-Let $\Omega$ be a set of possible underlying outcomes and let $\omega \in \Omega$ be a particular underlying outcomes.
+Let $\Omega$ be a set of possible underlying outcomes and let $\omega \in \Omega$ be a particular underlying outcome.
 
-Let $\mathcal{G} \subset \Omega$ be a subset of $\Omega$.
+Let $\mathcal{F}$ be a collection of subsets of $\Omega$ that we call **events**.
 
-Let $\mathcal{F}$ be a collection of such subsets $\mathcal{G} \subset \Omega$.
+(Technically, $\mathcal{F}$ is a [$\sigma$-algebra](https://en.wikipedia.org/wiki/Sigma-algebra).)
 
-The pair $\Omega,\mathcal{F}$ forms our **probability space** on which we want to put a probability measure.
+A **probability measure** $\mu$ maps each event $\mathcal{G} \in \mathcal{F}$ into a scalar number $\mu(\mathcal{G})$ between $0$ and $1$, with $\mu(\Omega)=1$.
 
-A **probability measure** $\mu$ maps a set of possible underlying outcomes $\mathcal{G} \in \mathcal{F}$ into a scalar number between $0$ and $1$
+The triple $\Omega,\mathcal{F},\mu$ forms our **probability space**.
 
-- this is the "probability" that $X$ belongs to $A$, denoted by $ \textrm{Prob}\{X\in A\}$.
+A **random variable** $X(\omega)$ is a function of the underlying outcome $\omega \in \Omega$ that assigns a value in some set of possible values.
 
-A **random variable** $X(\omega)$ is a function of the underlying outcome $\omega \in \Omega$.
+If $A$ is a set of possible values of $X$, then the event that $X$ lies in $A$ is
+
+$$
+\mathcal{G} = \{\omega \in \Omega : X(\omega) \in A\}.
+$$
+
-The random variable $X(\omega)$ has a **probability distribution** that is induced by the underlying probability measure $\mu$ and the function
-$X(\omega)$:
+The random variable $X(\omega)$ has a **probability distribution** induced by the probability measure $\mu$:
 
 $$
-\textrm{Prob} (X \in A ) = \int_{\mathcal{G}} \mu(\omega) d \omega
-$$ (eq:CDFfromdensity)
+\textrm{Prob}(X \in A) = \mu(\mathcal{G}).
+$$
+
-where ${\mathcal G}$ is the subset of $\Omega$ for which $X(\omega) \in A$.
+If $\mu$ has a density $p(\omega)$, then we can also write
+
+$$
+\textrm{Prob}(X \in A) = \int_{\mathcal{G}} p(\omega)\, d \omega
+$$ (eq:CDFfromdensity)
 
 We call this the induced probability distribution of random variable $X$.
 
@@ -124,6 +134,7 @@ To appreciate how statisticians connect probabilities to data, the key is to und
 - **Law of Large Numbers (LLN)**
 - **Central Limit Theorem (CLT)**
 
+### A discrete random variable example
 
 #### Scalar example
 
@@ -156,7 +167,7 @@ What do "identical" and "independent" mean in IID or iid ("identically and indep
 
 $$
 \begin{aligned}
-\textrm{Prob}\{x_0 = i_0, x_1 = i_1, \dots , x_{N-1} = i_{N-1}\} &= \textrm{Prob}\{x_0 = i_0\} \cdot \dots \cdot \textrm{Prob}\{x_{I-1} = i_{I-1}\}\\
+\textrm{Prob}\{x_0 = i_0, x_1 = i_1, \dots , x_{N-1} = i_{N-1}\} &= \textrm{Prob}\{x_0 = i_0\} \cdot \dots \cdot \textrm{Prob}\{x_{N-1} = i_{N-1}\}\\
 &= f_{i_0} f_{i_1} \cdot \dots \cdot f_{i_{N-1}}\\
 \end{aligned}
 $$
 
@@ -182,13 +193,14 @@ A Central Limit Theorem (CLT) describes a **rate** at which $\tilde {f_i} \to f_
 
 See {doc}`lln_clt` for a detailed treatment of both results.
 
+### Understanding probability: frequentist vs. Bayesian
+
 For "frequentist" statisticians, **anticipated relative frequency** is **all** that a probability distribution means.
 
 But for a Bayesian it means something else -- something partly subjective and purely personal.
 
 We say "partly" because a Bayesian also pays attention to relative frequencies.
 
-
 ## Representing probability distributions
 
 A probability distribution $\textrm{Prob} (X \in A)$ can be described by its **cumulative distribution function (CDF)**
 
@@ -216,7 +228,7 @@ For a **discrete-valued** random variable
 
 * the number of possible values of $X$ is finite or countably infinite
 * we replace a **density** with a **probability mass function**, a non-negative sequence that sums to one
-* we replace integration with summation in the formula like {eq}`eq:CDFfromdensity` that relates a CDF to a probability mass function
+* when a density exists, we replace integration with summation in formulas like {eq}`eq:CDFfromdensity`
 
 In this lecture, we mostly discuss discrete random variables.
 
@@ -297,7 +309,7 @@ An example of a parametric probability distribution is a **geometric distributi
 
 It is described by
 
 $$
-f_{i} = \textrm{Prob}\{X=i\} = (1-\lambda)\lambda^{i},\quad \lambda \in [0,1], \quad i = 0, 1, 2, \ldots
+f_{i} = \textrm{Prob}\{X=i\} = (1-\lambda)\lambda^{i},\quad \lambda \in [0,1), \quad i = 0, 1, 2, \ldots
$$
 
 Evidently, $\sum_{i=0}^{\infty}f_i=1$.
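Both the normalization and the implied mean are easy to confirm numerically; for this pmf the mean is the standard $\mathbb{E}[X] = \lambda/(1-\lambda)$. A minimal sketch, where the value $\lambda = 0.25$ and the truncation point are arbitrary illustrative choices rather than values used in the lecture:

```python
# Quick check of the geometric pmf f_i = (1 - λ) * λ**i;
# λ = 0.25 and the truncation point are arbitrary illustrations.
import numpy as np

λ = 0.25
i = np.arange(2_000)                     # truncate the infinite support
f = (1 - λ) * λ**i

print(np.isclose(f.sum(), 1.0))          # probabilities sum to one
print(np.isclose(f @ i, λ / (1 - λ)))    # matches the mean λ / (1 - λ)
```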
@@ -310,7 +322,7 @@ $$ ### Continuous random variable -Let $X$ be a continous random variable that takes values $X \in \tilde{X}\equiv[X_U,X_L]$ whose distributions have parameters $\theta$. +Let $X$ be a continuous random variable that takes values in a set $\tilde{X} \subseteq \mathbb{R}$ and whose distribution has parameters $\theta$. $$ \textrm{Prob}\{X\in A\} = \int_{x\in A} f(x;\theta)\,dx; \quad f(x;\theta)\ge0 @@ -432,7 +444,7 @@ $$ $$ (eq:condprobbayes) ```{note} -Formula {eq}`eq:condprobbayes` is also what a Bayesian calls **Bayes' Law**. A Bayesian statistician regards marginal probability distribution $\textrm{Prob}({X=i}), i = 1, \ldots, J$ as a **prior** distribution that describes his personal subjective beliefs about $X$. +Formula {eq}`eq:condprobbayes` is also what a Bayesian calls **Bayes' Law**. A Bayesian statistician regards marginal probability distribution $\textrm{Prob}({X=i}), i = 0, \ldots, I-1$ as a **prior** distribution that describes his personal subjective beliefs about $X$. He then interprets formula {eq}`eq:condprobbayes` as a procedure for constructing a **posterior** distribution that describes how he would revise his subjective beliefs after observing that $Y$ equals $j$. ``` @@ -839,6 +851,8 @@ class discrete_bijoint: Let's apply our code to some examples. +### Numerical examples + #### Example 1 ```{code-cell} ipython3 @@ -924,6 +938,8 @@ y = np.linspace(-10, 10, 1_000) x_mesh, y_mesh = np.meshgrid(x, y, indexing="ij") ``` +### Joint, marginal, and conditional distributions + #### Joint distribution Let's plot the **population** joint density. @@ -987,9 +1003,9 @@ plt.show() For a bivariate normal population distribution, the conditional distributions are also normal: $$ -\begin{aligned} \\ -[X|Y &= y ]\sim \mathbb{N}\bigg[\mu_X+\rho\sigma_X\frac{y-\mu_Y}{\sigma_Y},\sigma_X^2(1-\rho^2)\bigg] \\ -[Y|X &= x ]\sim \mathbb{N}\bigg[\mu_Y+\rho\sigma_Y\frac{x-\mu_X}{\sigma_X},\sigma_Y^2(1-\rho^2)\bigg] +\begin{aligned} +X \mid Y = y &\sim \mathbb{N}\bigg[\mu_X+\rho\sigma_X\frac{y-\mu_Y}{\sigma_Y},\sigma_X^2(1-\rho^2)\bigg] \\ +Y \mid X = x &\sim \mathbb{N}\bigg[\mu_Y+\rho\sigma_Y\frac{x-\mu_X}{\sigma_X},\sigma_Y^2(1-\rho^2)\bigg] \end{aligned} $$ @@ -997,30 +1013,33 @@ $$ Please see this {doc}`quantecon lecture ` for more details. ``` -Let's approximate the joint density by discretizing and mapping the approximating joint density into a matrix. +Let's approximate the joint density by discretizing and mapping the approximating joint density into a matrix. + +On an evenly spaced grid, we can approximate the conditional distribution by assigning probability weights proportional to a slice of the joint density. -We can compute the discretized marginal density by just using matrix algebra and noting that +For fixed $y$, this means that $$ -\textrm{Prob}\{X=i|Y=j\}=\frac{f_{ij}}{\sum_{i}f_{ij}} +z_i +\equiv \frac{f(x_i,y)}{\sum_k f(x_k,y)} $$ Fix $y=0$. 
```{code-cell} ipython3 -# discretized marginal density +# discretized conditional distribution of X given Y = 0 x = np.linspace(-10, 10, 1_000_000) z = func(x, y=0) / np.sum(func(x, y=0)) plt.plot(x, z) plt.show() ``` -The mean and variance are computed by +The conditional mean and variance are then approximated by $$ \begin{aligned} -\mathbb{E}\left[X\vert Y=j\right] & =\sum_{i}iProb\{X=i\vert Y=j\}=\sum_{i}i\frac{f_{ij}}{\sum_{i}f_{ij}} \\ -\mathbb{D}\left[X\vert Y=j\right] &=\sum_{i}\left(i-\mu_{X\vert Y=j}\right)^{2}\frac{f_{ij}}{\sum_{i}f_{ij}} +\mathbb{E}\left[X\vert Y=y\right] & \approx \sum_i x_i z_i \\ +\mathbb{D}\left[X\vert Y=y\right] & \approx \sum_i\left(x_i-\mu_{X\vert Y=y}\right)^{2} z_i \end{aligned} $$ @@ -1042,14 +1061,14 @@ plt.show() Fix $x=1$. ```{code-cell} ipython3 -y = np.linspace(0, 10, 1_000_000) +y = np.linspace(-10, 10, 1_000_000) z = func(x=1, y=y) / np.sum(func(x=1, y=y)) plt.plot(y,z) plt.show() ``` ```{code-cell} ipython3 -# discretized mean and standard deviation +# discretized conditional mean and standard deviation μy = np.dot(y,z) σy = np.sqrt(np.dot((y - μy)**2, z)) @@ -1226,7 +1245,7 @@ Couplings are important in optimal transport problems and in Markov processes. P ## Copula functions -Suppose that $X_1, X_2, \dots, X_n$ are $N$ random variables and that +Suppose that $X_1, X_2, \dots, X_N$ are $N$ random variables and that * their marginal distributions are $F_1(x_1), F_2(x_2),\dots, F_N(x_N)$, and @@ -1238,12 +1257,15 @@ $$ H(x_1,x_2,\dots,x_N) = C(F_1(x_1), F_2(x_2),\dots,F_N(x_N)). $$ -We can obtain +If the marginal distributions are continuous, then the copula is unique. +In that case, we can recover it from the marginal inverses: $$ -C(u_1,u_2,\dots,u_n) = H[F^{-1}_1(u_1),F^{-1}_2(u_2),\dots,F^{-1}_N(u_N)] +C(u_1,u_2,\dots,u_N) = H(F^{-1}_1(u_1),F^{-1}_2(u_2),\dots,F^{-1}_N(u_N)) $$ +When marginal distributions are not continuous, one uses generalized inverses, and the copula is uniquely determined only on $\textrm{Ran}(F_1)\times \cdots \times \textrm{Ran}(F_N)$. + In a reverse direction of logic, given univariate **marginal distributions** $F_1(x_1), F_2(x_2),\dots,F_N(x_N)$ and a copula function $C(\cdot)$, the function $H(x_1,x_2,\dots,x_N) = C(F_1(x_1), F_2(x_2),\dots,F_N(x_N))$ is a **coupling** of $F_1(x_1), F_2(x_2),\dots,F_N(x_N)$. @@ -1252,6 +1274,8 @@ Thus, for given marginal distributions, we can use a copula function to determi Copula functions are often used to characterize **dependence** of random variables. +### Bivariate examples with discrete and continuous distributions + #### Discrete marginal distribution As mentioned above, for two given marginal distributions there can be more than one coupling. @@ -1272,9 +1296,8 @@ For these two random variables there can be more than one coupling. Let's first generate X and Y. 
```{code-cell} ipython3 -# define parameters -mu = np.array([0.6, 0.4]) -nu = np.array([0.3, 0.7]) +μ = np.array([0.6, 0.4]) +ν = np.array([0.3, 0.7]) # number of draws draws = 1_000_000 @@ -1285,10 +1308,10 @@ p = np.random.rand(draws) # generate draws of X and Y via uniform distribution x = np.ones(draws) y = np.ones(draws) -x[p <= mu[0]] = 0 -x[p > mu[0]] = 1 -y[p <= nu[0]] = 0 -y[p > nu[0]] = 1 +x[p <= μ[0]] = 0 +x[p > μ[0]] = 1 +y[p <= ν[0]] = 0 +y[p > ν[0]] = 1 ``` ```{code-cell} ipython3 @@ -1499,12 +1522,12 @@ mystnb: from scipy import stats # Gaussian copula parameters -rho_cop = 0.8 +ρ_cop = 0.8 n_cop = 100_000 -# Step 1: draw from bivariate standard normal with correlation rho_cop +# Step 1: draw from bivariate standard normal with correlation ρ_cop z = np.random.multivariate_normal( - [0, 0], [[1, rho_cop], [rho_cop, 1]], n_cop + [0, 0], [[1, ρ_cop], [ρ_cop, 1]], n_cop ) # Step 2: apply normal CDF -> uniform marginals (the copula itself) @@ -1569,24 +1592,20 @@ import numpy as np F = np.array([[0.3, 0.2], [0.1, 0.4]]) -# (a) marginals -mu = F.sum(axis=1) # sum over columns -> marginal for X -nu = F.sum(axis=0) # sum over rows -> marginal for Y -print("mu (marginal of X):", mu) -print("nu (marginal of Y):", nu) +μ = F.sum(axis=1) +ν = F.sum(axis=0) +print("μ (marginal of X):", μ) +print("ν (marginal of Y):", ν) -# (b) independence matrix -F_indep = np.outer(mu, nu) +F_indep = np.outer(μ, ν) print("\nIndependence matrix (outer product):\n", F_indep) print("\nActual joint F:\n", F) -# (c) test independence -print("\nIndependent (F == mu ⊗ nu)?", np.allclose(F, F_indep)) +print("\nIndependent (F == μ ⊗ ν)?", np.allclose(F, F_indep)) -# (d) conditional vs. marginal -prob_X0_given_Y10 = F[0, 0] / nu[0] +prob_X0_given_Y10 = F[0, 0] / ν[0] print(f"\nProb(X=0 | Y=10) = {prob_X0_given_Y10:.4f}") -print(f"Prob(X=0) = {mu[0]:.4f}") +print(f"Prob(X=0) = {μ[0]:.4f}") ``` ```{solution-end} @@ -1620,22 +1639,19 @@ ys = np.array([10, 20]) F = np.array([[0.3, 0.2], [0.1, 0.4]]) -mu = F.sum(axis=1) -nu = F.sum(axis=0) +μ = F.sum(axis=1) +ν = F.sum(axis=0) -# (a) -E_X = xs @ mu -E_Y = ys @ nu +E_X = xs @ μ +E_Y = ys @ ν E_XY = sum(xs[i] * ys[j] * F[i, j] for i in range(2) for j in range(2)) print(f"E[X] = {E_X}, E[Y] = {E_Y}, E[XY] = {E_XY}") -# (b) cov_XY = E_XY - E_X * E_Y print(f"Cov(X,Y) = {cov_XY:.4f}") -# (c) -var_X = ((xs - E_X)**2) @ mu -var_Y = ((ys - E_Y)**2) @ nu +var_X = ((xs - E_X)**2) @ μ +var_Y = ((ys - E_Y)**2) @ ν cor_XY = cov_XY / np.sqrt(var_X * var_Y) print(f"Cor(X,Y) = {cor_XY:.4f}") ``` @@ -1677,12 +1693,10 @@ Let $X$ and $Y$ each be uniformly distributed on $\{1,2,3,4,5,6\}$, and let $Z = import numpy as np import matplotlib.pyplot as plt -# (a) convolution f = np.ones(6) / 6 -h = np.convolve(f, f) # Z takes values 2,...,12 +h = np.convolve(f, f) z_vals = np.arange(2, 13) -# (b & c) plot theory and simulation n = 1_000_000 z_sim = np.random.randint(1, 7, n) + np.random.randint(1, 7, n) counts = np.bincount(z_sim, minlength=13)[2:] @@ -1695,7 +1709,6 @@ ax.set_ylabel('Probability') ax.legend() plt.show() -# (d) moments E_Z = z_vals @ h Var_Z = ((z_vals - E_Z)**2) @ h print(f"Theory: E[Z] = {E_Z:.2f}, Var(Z) = {Var_Z:.4f}") @@ -1734,21 +1747,18 @@ import numpy as np P = np.array([[0.9, 0.1], [0.2, 0.8]]) -psi0 = np.array([1.0, 0.0]) +ψ0 = np.array([1.0, 0.0]) -# (a) for n in [1, 5, 20, 100]: - print(f"psi_{n:3d} = {psi0 @ np.linalg.matrix_power(P, n)}") + print(f"ψ_{n:3d} = {ψ0 @ np.linalg.matrix_power(P, n)}") -# (b) stationary: solve (P^T - I) psi = 0 with sum = 1 A = 
np.vstack([P.T - np.eye(2), np.ones(2)]) b = np.array([0.0, 0.0, 1.0]) -psi_star, *_ = np.linalg.lstsq(A, b, rcond=None) -print(f"\nStationary distribution: {psi_star}") +ψ_star, *_ = np.linalg.lstsq(A, b, rcond=None) +print(f"\nStationary distribution: {ψ_star}") -# (c) verify -psi_100 = psi0 @ np.linalg.matrix_power(P, 100) -print(f"psi_100 close to stationary? {np.allclose(psi_100, psi_star, atol=1e-6)}") +ψ_100 = ψ0 @ np.linalg.matrix_power(P, 100) +print(f"ψ_100 close to stationary? {np.allclose(ψ_100, ψ_star, atol=1e-6)}") ``` ```{solution-end} @@ -1781,37 +1791,32 @@ import numpy as np xs = np.array([0, 1]) ys = np.array([0, 1]) -mu = np.array([0.5, 0.5]) -nu = np.array([0.4, 0.6]) +μ = np.array([0.5, 0.5]) +ν = np.array([0.4, 0.6]) -# (a) upper Fréchet: maximise P(X=i, Y=i) F_upper = np.array([[0.4, 0.1], [0.0, 0.5]]) -# (b) lower Fréchet: maximise P(X=i, Y=1-i) F_lower = np.array([[0.0, 0.5], [0.4, 0.1]]) -# (c) independent -F_indep = np.outer(mu, nu) +F_indep = np.outer(μ, ν) -# (d) check marginals for F, name in [(F_upper, "Upper Fréchet"), (F_lower, "Lower Fréchet"), (F_indep, "Independent ")]: print(f"{name}: row sums = {F.sum(axis=1)}, col sums = {F.sum(axis=0)}") -# (e) correlations def correlation(F, xs, ys): - mu_x = F.sum(axis=1) - nu_y = F.sum(axis=0) - E_X = xs @ mu_x - E_Y = ys @ nu_y + μ_x = F.sum(axis=1) + ν_y = F.sum(axis=0) + E_X = xs @ μ_x + E_Y = ys @ ν_y E_XY = sum(xs[i]*ys[j]*F[i,j] for i in range(2) for j in range(2)) cov = E_XY - E_X * E_Y - sig_X = np.sqrt(((xs - E_X)**2) @ mu_x) - sig_Y = np.sqrt(((ys - E_Y)**2) @ nu_y) - return cov / (sig_X * sig_Y) + σ_X = np.sqrt(((xs - E_X)**2) @ μ_x) + σ_Y = np.sqrt(((ys - E_Y)**2) @ ν_y) + return cov / (σ_X * σ_Y) print(f"\nCor upper Fréchet = {correlation(F_upper, xs, ys):.4f} (maximum)") print(f"Cor lower Fréchet = {correlation(F_lower, xs, ys):.4f} (minimum)") @@ -1852,28 +1857,28 @@ import numpy as np import matplotlib.pyplot as plt from scipy.special import comb -thetas = np.array([0.2, 0.5, 0.8]) -prior = np.array([0.25, 0.50, 0.25]) +θ_vals = np.array([0.2, 0.5, 0.8]) +π = np.array([0.25, 0.50, 0.25]) -def compute_posterior(k, n, thetas, prior): - likelihood = comb(n, k) * thetas**k * (1 - thetas)**(n - k) - unnorm = likelihood * prior +def compute_posterior(k, n, θ_vals, π): + likelihood = comb(n, k) * θ_vals**k * (1 - θ_vals)**(n - k) + unnorm = likelihood * π return unnorm / unnorm.sum(), likelihood -post7, lik7 = compute_posterior(7, 10, thetas, prior) -post3, lik3 = compute_posterior(3, 10, thetas, prior) +post7, lik7 = compute_posterior(7, 10, θ_vals, π) +post3, lik3 = compute_posterior(3, 10, θ_vals, π) print("k=7: likelihood =", lik7.round(4), " posterior =", post7.round(4)) print("k=3: likelihood =", lik3.round(4), " posterior =", post3.round(4)) -x = np.arange(len(thetas)) +x = np.arange(len(θ_vals)) w = 0.3 fig, axes = plt.subplots(1, 2, figsize=(10, 4)) for ax, post, title in zip(axes, [post7, post3], ['k=7 heads', 'k=3 heads']): - ax.bar(x - w/2, prior, w, label='Prior', alpha=0.7) + ax.bar(x - w/2, π, w, label='Prior', alpha=0.7) ax.bar(x + w/2, post, w, label='Posterior', alpha=0.7) ax.set_xticks(x) - ax.set_xticklabels([f'θ={t}' for t in thetas]) + ax.set_xticklabels([f'θ={t}' for t in θ_vals]) ax.set_ylabel('Probability') ax.set_title(title) ax.legend() From 142fa51d1f1473197f71d1fbb710deda6b29dafe Mon Sep 17 00:00:00 2001 From: Humphrey Yang Date: Tue, 21 Apr 2026 22:33:59 +1000 Subject: [PATCH 05/10] updates --- lectures/multivariate_normal.md | 12 ++++++------ 1 file changed, 6 
insertions(+), 6 deletions(-) diff --git a/lectures/multivariate_normal.md b/lectures/multivariate_normal.md index 55c513e3a..a3be75575 100644 --- a/lectures/multivariate_normal.md +++ b/lectures/multivariate_normal.md @@ -2346,7 +2346,7 @@ import numpy as np mn = MultivariateNormal(μ, Σ) mn.partition(1) μ1_hat, Σ11_hat = mn.cond_dist(0, np.array([2.])) -print(f"Analytical μ̂₁ = {μ1_hat[0]:.4f}, Σ̂₁₁ = {Σ11_hat[0,0]:.4f}") +print(f"Analytical μ1_hat = {μ1_hat[0]:.4f}, Σ11_hat = {Σ11_hat[0,0]:.4f}") n = 1_000_000 data = np.random.multivariate_normal(μ, Σ, size=n) @@ -2355,7 +2355,7 @@ z1_all, z2_all = data[:, 0], data[:, 1] mask = np.abs(z2_all - 2.) < 0.05 z1_cond = z1_all[mask] print(f"Sample size in band: {mask.sum()}") -print(f"Sample μ̂₁ = {np.mean(z1_cond):.4f}, Σ̂₁₁ = {np.var(z1_cond, ddof=1):.4f}") +print(f"Sample μ1_hat = {np.mean(z1_cond):.4f}, Σ11_hat = {np.var(z1_cond, ddof=1):.4f}") ``` ```{solution-end} @@ -2398,7 +2398,7 @@ for ρ in [0.2, 0.5, 0.9]: mn = MultivariateNormal(np.zeros(2), Σ) mn.partition(1) product = float(mn.βs[0]) * float(mn.βs[1]) - print(f"ρ={ρ:.1f}: b1*b2 = {product:.4f}, ρ² = {ρ**2:.4f}, match: {np.isclose(product, ρ**2)}") + print(f"ρ={ρ:.1f}: b1*b2 = {product:.4f}, ρ^2 = {ρ**2:.4f}, match: {np.isclose(product, ρ**2)}") ``` ```{solution-end} @@ -2504,15 +2504,15 @@ for σθ_val in σθ_vals: fig, ax = plt.subplots() ax.semilogx(σθ_vals, μθ_hat_vals, 'o-', label=r'$\hat{\mu}_\theta$') -ax.axhline(y_bar, ls='--', color='r', label=f'sample mean ȳ = {y_bar:.1f}') +ax.axhline(y_bar, ls='--', color='r', label=f'sample mean y_bar = {y_bar:.1f}') ax.axhline(μθ_val, ls=':', color='g', label=f'prior mean μθ = {μθ_val:.0f}') ax.set_xlabel(r'$\sigma_\theta$') ax.set_ylabel(r'posterior mean $\hat{\mu}_\theta$') ax.legend() plt.show() -print(f"ȳ = {y_bar:.4f}") -print(f"Large σθ posterior mean ≈ {μθ_hat_vals[-1]:.4f}") +print(f"y_bar = {y_bar:.4f}") +print(f"Large σθ posterior mean approx {μθ_hat_vals[-1]:.4f}") ``` ```{solution-end} From b353c4e024830b5523bda20d099a9ff66eb6cc96 Mon Sep 17 00:00:00 2001 From: Humphrey Yang Date: Tue, 21 Apr 2026 22:35:55 +1000 Subject: [PATCH 06/10] update --- lectures/information_market_equilibrium.md | 26 +++++++++++----------- 1 file changed, 13 insertions(+), 13 deletions(-) diff --git a/lectures/information_market_equilibrium.md b/lectures/information_market_equilibrium.md index 475df4d1f..fdba548c8 100644 --- a/lectures/information_market_equilibrium.md +++ b/lectures/information_market_equilibrium.md @@ -1032,11 +1032,11 @@ mystnb: caption: price distribution convergence name: fig-price-convergence --- -def price_expectation(h_t, p_bar_true, p_bar_alt, sigma_p, p_grid): +def price_expectation(h_t, p_bar_true, p_bar_alt, σ_p, p_grid): """Return the predictive price density at posterior weight h_t.""" return ( - h_t * norm.pdf(p_grid, loc=p_bar_true, scale=sigma_p) - + (1 - h_t) * norm.pdf(p_grid, loc=p_bar_alt, scale=sigma_p) + h_t * norm.pdf(p_grid, loc=p_bar_true, scale=σ_p) + + (1 - h_t) * norm.pdf(p_grid, loc=p_bar_alt, scale=σ_p) ) @@ -1542,7 +1542,7 @@ prior support. 
```{code-cell} ipython3 def simulate_misspecified( - T, p_bar_true, p_bar_wrong, sigma_p, h0, n_paths, seed=0 + T, p_bar_true, p_bar_wrong, σ_p, h0, n_paths, seed=0 ): """Simulate learning under a misspecified two-model prior.""" rng = np.random.default_rng(seed) @@ -1551,9 +1551,9 @@ def simulate_misspecified( for path in range(n_paths): h = np.array(h0, dtype=float) - prices = rng.normal(p_bar_true, sigma_p, size=T) + prices = rng.normal(p_bar_true, σ_p, size=T) for t, price in enumerate(prices): - likes = norm.pdf(price, loc=p_bar_wrong, scale=sigma_p) + likes = norm.pdf(price, loc=p_bar_wrong, scale=σ_p) h = h * likes h /= h.sum() h_paths[path, t + 1, :] = h @@ -1561,24 +1561,24 @@ def simulate_misspecified( return h_paths -def predictive_density(weights, means, sigma_p, p_grid): +def predictive_density(weights, means, σ_p, p_grid): """Return the predictive density under the current posterior weights.""" density = np.zeros_like(p_grid) for weight, mean in zip(weights, means): - density += weight * norm.pdf(p_grid, loc=mean, scale=sigma_p) + density += weight * norm.pdf(p_grid, loc=mean, scale=σ_p) return density T = 1000 p_true = 2.0 p_wrong = np.array([1.5, 2.3]) -sigma_p = 0.4 +σ_p = 0.4 h0 = np.array([0.5, 0.5]) n_paths = 30 -h_misspec = simulate_misspecified(T, p_true, p_wrong, sigma_p, h0, n_paths) +h_misspec = simulate_misspecified(T, p_true, p_wrong, σ_p, h0, n_paths) -kl_vals = (p_true - p_wrong)**2 / (2 * sigma_p**2) +kl_vals = (p_true - p_wrong)**2 / (2 * σ_p**2) for mean, kl in zip(p_wrong, kl_vals): print(f"KL(true || N({mean:.1f}, sigma^2)) = {kl:.4f}") @@ -1607,12 +1607,12 @@ closer_idx = np.argmin(kl_vals) fig, ax = plt.subplots(figsize=(8, 4)) colors = plt.cm.Blues(np.linspace(0.3, 1.0, 4)) for t_snap, color in zip([0, 10, 100, T], colors): - dens = predictive_density(median_path[t_snap], p_wrong, sigma_p, p_grid) + dens = predictive_density(median_path[t_snap], p_wrong, σ_p, p_grid) ax.plot(p_grid, dens, color=color, lw=2, label=f"t = {t_snap}") ax.plot( p_grid, - norm.pdf(p_grid, loc=p_wrong[closer_idx], scale=sigma_p), + norm.pdf(p_grid, loc=p_wrong[closer_idx], scale=σ_p), "k--", lw=2, label="KL-best wrong model", From 76d1a159eb6b22d7634d7015b08f38212a014d48 Mon Sep 17 00:00:00 2001 From: Humphrey Yang Date: Wed, 22 Apr 2026 14:21:59 +1000 Subject: [PATCH 07/10] updates --- lectures/information_market_equilibrium.md | 237 +++------------------ 1 file changed, 33 insertions(+), 204 deletions(-) diff --git a/lectures/information_market_equilibrium.md b/lectures/information_market_equilibrium.md index fdba548c8..5dffc3b90 100644 --- a/lectures/information_market_equilibrium.md +++ b/lectures/information_market_equilibrium.md @@ -71,11 +71,9 @@ Important findings of {cite:t}`kihlstrom_mirman1975` are: insider's posterior distribution to the equilibrium price is one-to-one on the set of posteriors that can actually arise from the signal. -- In the paper's two-state theorem, invertibility holds when the informed - agent's utility is homothetic and the elasticity of substitution is everywhere - either below one or above one, with CES preferences providing a convenient - illustration and Cobb-Douglas preferences ($\sigma = 1$) giving the opposite - case in which the equilibrium price is independent of the insider's posterior. + - Invertibility holds when the informed + agent's utility is homothetic and the elasticity of substitution is everywhere + either below one or above one. 
- In the dynamic economy, as information accumulates, Bayesian price expectations converge to **rational expectations**, even when the deep structure is not identified from prices alone. @@ -230,8 +228,8 @@ $p(\mu_{\tilde{y}})$ is **sufficient** for $\tilde{y}$. ```{prf:definition} Sufficiency :label: ime_def_sufficiency -A random variable $\tilde{y}$ is *sufficient* for $\tilde{y}'$ (with -respect to $\bar{a}$) if there exists a conditional distribution $P(y' \mid y)$, +A random variable $\tilde{y}$ is *sufficient* for $\tilde{y}'$ with +respect to $\bar{a}$ if there exists a conditional distribution $P(y' \mid y)$, **independent of** $\bar{a}$, such that $$ @@ -285,7 +283,7 @@ equivalently if its inverse is well defined on the price set $$ -P \equiv \bigl\{\, p(\mu_y) : y \in Y,\; +\mathcal{P} \equiv \bigl\{\, p(\mu_y) : y \in Y,\; P(\tilde{y} = y) = \sum_{a \in A} \phi_a(y)\,\mu_0(a) > 0 \bigr\}. $$ ``` @@ -360,7 +358,7 @@ when is the belief-to-price map actually one-to-one? When does the belief-to-price map fail to be invertible? -Theorem {prf:ref}`ime_theorem_invertibility_conditions` +{prf:ref}`ime_theorem_invertibility_conditions` shows that for a two-state economy ($S = 2$), the answer depends on the **elasticity of substitution** $\sigma$ of agent 1's utility function. @@ -372,25 +370,21 @@ argument. ```{prf:lemma} Same Price Implies Same Allocation :label: ime_lemma_same_price_same_allocation -Fix the beliefs of all agents except agent 1. +Assume that $u^i$ has continuous first partial derivatives +and that $u^i$ is quasi-concave. Let $p\in\mathcal{P}$. If there exist two measures $\mu^*$ and $\mu'$ in $M$ such that $p(\mu*, P^2, . . . ,P^n), = p(\mu',P^2, ... ,P^n)=p$, then + +$$ +x^i(\mu^*, P^2, \dots, P^n) = x^i(\mu', P^2, \dots, P^n), \quad +i = 1, \dots, n +$$ +``` + +This lemma says that fix the beliefs of all agents except agent 1. If two posterior beliefs $\mu$ and $\mu'$ both generate the same equilibrium price $p$, then they generate the same equilibrium allocation for every trader. -``` - -```{prf:proof} (Sketch) -All uninformed agents face the same price $p$ and keep the same beliefs, so -their demands -are unchanged. - -The firm's supply is also unchanged because it depends only on $p$. - -Market clearing then pins down agent 1's demand as the residual, so agent 1 must -consume the -same bundle under $\mu$ and $\mu'$ as well. -``` This lemma lets us define the informed agent's equilibrium bundle as a function of price @@ -421,7 +415,7 @@ solution $\mu \in M$, then the price map is invertible on $P$. ``` -This is Lemma 3 in the paper: if two different posteriors gave the same price, +If two different posteriors gave the same price, then by {prf:ref}`ime_lemma_same_price_same_allocation` they would share the same bundle $x(p)$, @@ -476,9 +470,6 @@ making agent 1's demand for good 1 independent of information about $\bar{a}$. So the market price cannot reveal that information. -The general theorem is abstract, so we now specialize to CES utility to make the -mechanism concrete. - ### CES utility For concreteness we work with the **constant-elasticity-of-substitution** (CES) @@ -528,7 +519,7 @@ $$ For Cobb-Douglas utility ($\sigma = 1$), the first-order condition becomes $p = W^1 - p$, -giving $p^* = W^1/2$ regardless of the posterior $q$—confirming that no +giving $p^* = W^1/2$ regardless of the posterior $q$, confirming that no information is transmitted through the price in the Cobb-Douglas case. 
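The Cobb-Douglas case is easy to check directly, because the posterior multiplies expected utility only through a constant factor and therefore drops out of the demand for good 1. Here is a minimal self-contained sketch of that claim; the parameter values and the brute-force optimizer are illustrative assumptions, not the lecture's own equilibrium code:

```python
# With Cobb-Douglas utility, expected utility is
#   q*sqrt(a1*x1*x2) + (1-q)*sqrt(a2*x1*x2) = sqrt(x1*x2) * const(q),
# so the optimal x1 (hence the market-clearing price) cannot depend on q.
import numpy as np
from scipy.optimize import minimize_scalar

a1, a2, W, p = 2.0, 0.5, 1.0, 0.7        # arbitrary illustrative values

def neg_expected_utility(x1, q):
    x2 = W - p * x1                      # budget constraint p*x1 + x2 = W
    return -(q * np.sqrt(a1 * x1 * x2) + (1 - q) * np.sqrt(a2 * x1 * x2))

demands = [minimize_scalar(neg_expected_utility,
                           bounds=(1e-9, W / p - 1e-9),
                           args=(q,), method="bounded").x
           for q in np.linspace(0.05, 0.95, 5)]

print(np.ptp(demands) < 1e-4)                          # demand is flat in q ...
print(np.isclose(demands[0], W / (2 * p), atol=1e-4))  # ... at x1* = W/(2p)
```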
@@ -601,13 +592,13 @@ plt.show() The plot confirms {prf:ref}`ime_theorem_invertibility_conditions`. -- *CES with $\sigma \neq 1$*: the equilibrium price is **strictly monotone** in +- *CES with $\sigma \neq 1$*: the equilibrium price is *strictly monotone* in $q$. - An outside observer who knows the equilibrium map $p^*(\cdot)$ can uniquely invert the - price to recover $q$—inside information is fully transmitted. -- *Cobb-Douglas ($\sigma = 1$)*: the price is *flat* in $q$—information is never + price to recover $q$, that is, inside information is fully transmitted. +- *Cobb-Douglas ($\sigma = 1$)*: the price is *flat* in $q$, that is, information is never transmitted through the market. ```{code-cell} ipython3 @@ -714,7 +705,7 @@ plt.tight_layout() plt.show() ``` -When $\sigma = 1$ the ratio is constant across all $a_s$ values—information +When $\sigma = 1$ the ratio is constant across all $a_s$ values, information about the state has no effect on the marginal rate of substitution. For $\sigma < 1$ the @@ -843,9 +834,6 @@ g(p^{t+1} \mid p^1, \ldots, p^t) h(\lambda \mid p^1, \ldots, p^t). $$ -With the posterior and predictive density defined, we can state the paper's -convergence result. - ### The convergence theorem ```{prf:theorem} Bayesian Convergence @@ -1141,7 +1129,7 @@ t_grid = np.arange(T + 1) fig, axes = plt.subplots(1, 3, figsize=(13, 4), sharey=True) struct_labels = [ r"$\lambda^{(1)}$", - r"$\lambda^{(2)}$ (same reduced form as $\lambda^{(1)}$)", + r"$\lambda^{(2)}$", r"$\lambda^{(3)}$", ] @@ -1377,166 +1365,25 @@ $D_{KL}$). ```{exercise} :label: km_ex3 -The paper constructs a -counterexample showing that for $S = 3$ states, even if the elasticity of -substitution -of $u^1$ is everywhere greater than one, the price map need **not** be -invertible. - -Consider the marginal rate of substitution for the portfolio utility -$u^1(a_s x_1 + x_2)$ (infinite elasticity of substitution) and three states -$a_1 > a_2 > a_3$. - -The MRS is - -$$ -m(\mu) -= \frac{a_1\beta_1\mu(a_1) + a_2\beta_2\mu(a_2) + a_3\beta_3\mu(a_3)} - {\beta_1\mu(a_1) + \beta_2\mu(a_2) + \beta_3\mu(a_3)}, -$$ - -where $\beta_s = u^{1\prime}(a_s x_1 + x_2)$. - -1. For the parameterization used by {cite:t}`kihlstrom_mirman1975`—let -$\mu(a_3) = q$, $\mu(a_2) = r$, $\mu(a_1) = 1-r-q$—write $m$ as a function of -$(q, r)$. -Compute $\partial m / \partial r$ and show that its sign depends on -$\beta_1\beta_2(a_1-a_2)$ and $\beta_2\beta_3(a_2-a_3)$. - -1. Choose $a_1 = 3$, $a_2 = 2$, $a_3 = 0.5$ and $u'(c) = c^{-\gamma}$ (CRRA with - risk -aversion $\gamma$). Fix $x_1 = 1$, $x_2 = 0.5$. For $\gamma = 2$, verify -numerically -that $\partial m/\partial r$ changes sign (i.e., $m$ is *not* globally monotone -in $r$), -giving a counterexample to invertibility. - -1. Explain why this non-monotonicity does *not* arise in the two-state case $S = - 2$. -``` - -```{solution-start} km_ex3 -:class: dropdown -``` - -**1.** Rewrite the MRS with $\mu_1 = 1-r-q$: - -$$ -m(q,r) = \frac{a_1\beta_1(1-r-q) + a_2\beta_2 r + a_3\beta_3 q} - {\beta_1(1-r-q) + \beta_2 r + \beta_3 q}. -$$ - -Differentiating using the quotient rule (denominator $D$): - -$$ -\frac{\partial m}{\partial r} -= \frac{(a_2\beta_2 - a_1\beta_1)D - (a_1\beta_1(1-r-q)+a_2\beta_2 r+a_3\beta_3 -q)(\beta_2-\beta_1)}{D^2}. -$$ - -After simplification this reduces to a signed combination of -$\beta_1\beta_2(a_1-a_2)({\cdot})$ and $\beta_2\beta_3(a_2-a_3)({\cdot})$ terms -whose sign is parameter-dependent. - -**2. 
Numerical verification.** - -```{code-cell} ipython3 -def mrs_3state(q, r, a1, a2, a3, x1, x2, γ): - """Return the three-state MRS at (q, r).""" - μ1, μ2, μ3 = 1 - r - q, r, q - β1 = (a1 * x1 + x2)**(-γ) - β2 = (a2 * x1 + x2)**(-γ) - β3 = (a3 * x1 + x2)**(-γ) - num = a1 * β1 * μ1 + a2 * β2 * μ2 + a3 * β3 * μ3 - den = β1 * μ1 + β2 * μ2 + β3 * μ3 - return num / den - -a1, a2, a3 = 3.0, 2.0, 0.5 -x1, x2 = 1.0, 0.5 -γ = 2.0 -q_fix = 0.1 -r_grid = np.linspace(0.05, 0.80, 200) - -# Valid region: q + r <= 1. -r_valid = r_grid[r_grid + q_fix <= 0.95] -m_vals = [mrs_3state(q_fix, r, a1, a2, a3, x1, x2, γ) for r in r_valid] -dm_dr = np.gradient(m_vals, r_valid) - -fig, axes = plt.subplots(1, 2, figsize=(11, 4)) -axes[0].plot(r_valid, m_vals, color="steelblue", lw=2) -axes[0].set_xlabel(r"$r = \mu(a_2)$", fontsize=12) -axes[0].set_ylabel("MRS m(q, r)", fontsize=12) -axes[0].set_title(f"MRS is non-monotone in r (CRRA gamma={γ})", fontsize=12) - -axes[1].plot(r_valid, dm_dr, color="crimson", lw=2) -axes[1].axhline(0, color="black", lw=1, ls="--") -axes[1].set_xlabel(r"$r = \mu(a_2)$", fontsize=12) -axes[1].set_ylabel(r"$\partial m / \partial r$", fontsize=12) -axes[1].set_title( - "Derivative changes sign - non-invertibility for $S=3$", - fontsize=12, -) - -plt.tight_layout() -plt.show() - -print("Sign changes in dm/dr:", - np.sum(np.diff(np.sign(dm_dr)) != 0)) -``` - -The derivative $\partial m / \partial r$ changes sign, confirming that the MRS -(and hence -the equilibrium price) is **not** monotone in $r$ for $S = 3$. - -**3.** In the two-state case $S = 2$, the prior is parameterized by a single -scalar $q$ and the MRS is a function of $q$ alone. - -One can show directly that $\partial m / \partial q$ has a definite sign -determined entirely by whether $a_1 > a_2$ and whether $\sigma > 1$ or $\sigma < -1$ hold, so there is no room for sign changes. - -With three states, the two-dimensional prior $(q, r)$ allows richer interactions -between $\beta_s$ values that can reverse the sign of the derivative. - -```{solution-end} -``` - -```{exercise} -:label: km_ex4 - {prf:ref}`ime_theorem_bayesian_convergence` assumes the true distribution $g(\cdot \mid \bar\lambda)$ is in the support of the prior (i.e., $h(\bar\lambda) > 0$). -Investigate what happens when the true model is **not** in the +Investigate what happens when the true model is *not* in the prior support. -1. Simulate $T = 1,000$ periods of prices from $N(2.0, 0.4^2)$ but use a prior +Simulate $T = 1,000$ periods of prices from $N(2.0, 0.4^2)$ but use a prior that places equal weight on two *wrong* models: $N(1.5, 0.4^2)$ and $N(2.3, 0.4^2)$. - - Plot the posterior weight on each model over time. - -2. Show that the **predictive** (mixture) price distribution converges to the - *closest* - model in KL divergence terms. +Plot the posterior weight on each model over time. - - Compute the KL divergence from the true model to each wrong model. - - Verify numerically that the posterior concentrates on the closer wrong - model and that - the predictive mean converges to that model's mean. - -3. Relate this finding to the Bayesian consistency literature: when is the limit - distribution a good approximation to the true distribution even under - misspecification? - Why is the symmetric pair $N(1.5, 0.4^2)$ and $N(2.5, 0.4^2)$ a - knife-edge case rather - than a setting with a deterministic 50-50 posterior limit? +Discuss your findings. 
 ```
 
-```{solution-start} km_ex4
+```{solution-start} km_ex3
 :class: dropdown
 ```
 
@@ -1599,7 +1446,7 @@ for ax, k, label in zip(axes, [0, 1], labels):
 plt.tight_layout()
 plt.show()
 
-# Predictive density and mean along the median posterior path.
+# Predictive density and mean along the median posterior path
 median_path = np.median(h_misspec, axis=0)
 p_grid = np.linspace(0.0, 3.5, 300)
 closer_idx = np.argmin(kl_vals)
@@ -1639,16 +1486,10 @@ D_{KL}\bigl(N(2.0, 0.4^2)\,\|\,N(2.3, 0.4^2)\bigr)
 D_{KL}\bigl(N(2.0, 0.4^2)\,\|\,N(1.5, 0.4^2)\bigr),
 $$
 
-so the model with mean $2.3$ is the unique KL-best approximation among the two
-wrong models, and in the simulation posterior weight concentrates on that model
-while the predictive mean converges to $2.3$, not to the true mean $2.0$.
+so the model with mean $2.3$ is the KL-best approximation among the two
+wrong models, and in the simulation posterior weight concentrates on that model.
 
-This is an instance of the general result that under
-misspecification, Bayesian posteriors converge to the distribution in the model
-class that
-minimizes KL divergence from the model actually generating the data.
-
-The connection is that posterior odds are cumulative likelihood ratios.
+Posterior odds are cumulative likelihood ratios (see {doc}`likelihood_ratio_process`).
 
 If we compare the two wrong Gaussian models $f$ and $g$, then under the true
 distribution $h$ the average log likelihood ratio satisfies
@@ -1660,17 +1501,5 @@ $$
 So if $f$ is KL-closer to $h$ than $g$ is, $\log L_t$ has positive drift and
 posterior odds tilt toward $f$.
 
-That is exactly the mechanism emphasized in {doc}`Likelihood Ratio Processes
-<likelihood_ratio_process>`.
-
-The lecture {doc}`likelihood_bayes` gives the Bayesian version of the same
-argument by showing how the posterior is a monotone transform of the likelihood
-ratio process.
-
-The symmetric pair $N(1.5, 0.4^2)$ and $N(2.5, 0.4^2)$ is different because both
-wrong models are equally far from the truth in KL terms, so there is no unique
-pseudo-true model and that knife-edge symmetry does **not** imply a
-deterministic 50-50 posterior limit.
-
 ```{solution-end}
 ```

From c866955be617b055ec79f41ae031756897a15313 Mon Sep 17 00:00:00 2001
From: Humphrey Yang
Date: Thu, 23 Apr 2026 11:45:28 +1000
Subject: [PATCH 08/10] updates

---
 lectures/information_market_equilibrium.md | 163 ++++++++++----------
 lectures/multivariate_normal.md            |  16 +-
 lectures/prob_matrix.md                    | 170 +++++++--------------
 3 files changed, 146 insertions(+), 203 deletions(-)

diff --git a/lectures/information_market_equilibrium.md b/lectures/information_market_equilibrium.md
index 5dffc3b90..5dd40fc49 100644
--- a/lectures/information_market_equilibrium.md
+++ b/lectures/information_market_equilibrium.md
@@ -248,8 +248,8 @@ about $\bar{a}$.
 
 ```{prf:lemma} Posterior Sufficiency
 :label: ime_lemma_posterior_sufficiency
 
-The posterior distribution $\mu_{\tilde{y}}$
-is sufficient for $\tilde{y}$.
+The posterior distribution $\mu_{\tilde{y}}$ is a sufficient statistic for
+$\tilde{y}$.
 ```
 
 ```{prf:proof} (Sketch)
@@ -275,16 +275,13 @@ belief to price.
```{prf:theorem} Price Revelation :label: ime_theorem_price_revelation -In the economy described above, the price -random variable $p(\mu_{\tilde{y}})$ is sufficient for $\tilde{y}$ **if and only -if** the -belief-to-price map is one-to-one on the realized posterior set $M$, -equivalently if its -inverse is well defined on the price set +In the model outlined above, the price random variable $p(\mu_{\tilde{y}})$ is +sufficient for the random variable $\tilde{y}$ if and only if the function +$p(P^1)$ is invertible on the set of prices $$ -\mathcal{P} \equiv \bigl\{\, p(\mu_y) : y \in Y,\; - P(\tilde{y} = y) = \sum_{a \in A} \phi_a(y)\,\mu_0(a) > 0 \bigr\}. +\mathcal{P} = \Bigl\{\, p(\mu_y) : y \in Y,\; + P(\tilde{y} = y) = \sum_{a \in A} \phi_a(y)\,\mu_0(a) > 0 \Bigr\}. $$ ``` @@ -370,25 +367,32 @@ argument. ```{prf:lemma} Same Price Implies Same Allocation :label: ime_lemma_same_price_same_allocation -Assume that $u^i$ has continuous first partial derivatives -and that $u^i$ is quasi-concave. Let $p\in\mathcal{P}$. If there exist two measures $\mu^*$ and $\mu'$ in $M$ such that $p(\mu*, P^2, . . . ,P^n), = p(\mu',P^2, ... ,P^n)=p$, then +Assume that $u^i$ has continuous first partial derivatives and that $u^i$ is +quasi-concave. + +Let $p \in \mathcal{P}$. + +If there exist two measures $\mu^*$ and $\mu'$ in $M$ such that +$p(\mu^*, P^2, \ldots, P^n) = p(\mu', P^2, \ldots, P^n) = p$, then $$ -x^i(\mu^*, P^2, \dots, P^n) = x^i(\mu', P^2, \dots, P^n), \quad -i = 1, \dots, n +x^i(\mu^*, P^2, \ldots, P^n) = x^i(\mu', P^2, \ldots, P^n), \quad +i = 1, \ldots, n. $$ ``` -This lemma says that fix the beliefs of all agents except agent 1. +Fix the beliefs of all agents except agent 1. -If two posterior beliefs $\mu$ and $\mu'$ -both generate the same equilibrium price $p$, then they generate the same -equilibrium -allocation for every trader. +The lemma says that if two posterior beliefs $\mu^*$ and $\mu'$ for agent 1 +both support the same equilibrium price $p$, then they support the same +equilibrium allocation for every trader. + +The intuition is that when the price is unchanged, the demands of the +uninformed traders are unchanged too, so market clearing forces the informed +agent's bundle to be unchanged as well. This lemma lets us define the informed agent's equilibrium bundle as a function -of price -alone: +of price alone: $$ x(p) = (x_1(p), x_2(p)). @@ -410,18 +414,24 @@ whether this equation admits a unique posterior $\mu$. ```{prf:lemma} Unique Posterior at a Given Price :label: ime_lemma_unique_posterior -If, for each price $p \in P$, the first-order condition above has a unique -solution -$\mu \in M$, then the price map is invertible on $P$. +Assume that the first partial derivatives of $u^1$ exist and that $u^1$ is +quasi-concave. + +Also assume that agent 1 always consumes positive quantities of both goods. + +Then $p(P^1)$ is invertible on $\mathcal{P}$ if for each $p \in \mathcal{P}$ +there exists a unique probability measure $\mu \in M$ such that + +$$ +\frac{\sum_{s=1}^S a_s\, u^1_1(a_s x_1(p), x_2(p))\, \mu(a_s)} + {\sum_{s=1}^S u^1_2(a_s x_1(p), x_2(p))\, \mu(a_s)} = p. +$$ ``` -If two different posteriors gave the same price, -then by +If two different posteriors gave the same price, then by {prf:ref}`ime_lemma_same_price_same_allocation` they would share the same bundle -$x(p)$, -contradicting uniqueness of the posterior that solves the first-order condition -at that -price. 
+$x(p)$, contradicting uniqueness of the posterior that solves the first-order +condition at that price. ### The two-state first-order condition @@ -448,27 +458,26 @@ equation. ```{prf:theorem} Invertibility Conditions :label: ime_theorem_invertibility_conditions -Assume $u^1$ is quasi-concave and -homothetic with continuous first partials. +Assume that the first partial derivatives of $u^1$ exist and that $u^1$ is +quasi-concave and homothetic. -Assume agent 1 always consumes positive -quantities of both goods. +Also suppose that the informed agent always consumes positive quantities of +both goods in all equilibrium allocations. -For $S = 2$: +If $S = 2$ and the elasticity of substitution of $u^1$ is either always less +than one or always greater than one, then $p(P^1)$ is invertible on +$\mathcal{P}$. -- If $\sigma < 1$ for all feasible allocations, the price map is **invertible** - on $P$. -- If $\sigma > 1$ for all feasible allocations, the price map is **invertible** - on $P$. -- If $u^1$ is **Cobb-Douglas** ($\sigma = 1$), the price map is **constant** on - $P$ - (no information is transmitted). +If $u^1$ is Cobb-Douglas (elasticity of substitution constant and equal to +one), then $p(P^1)$ is constant on $\mathcal{P}$. ``` -Thus, when $\sigma = 1$ the income and substitution effects exactly cancel, -making agent 1's demand for good 1 independent of information about $\bar{a}$. +When $\sigma = 1$ the income and substitution effects exactly cancel, so +agent 1's demand for good 1 does not respond to changes in beliefs about +$\bar{a}$. -So the market price cannot reveal that information. +Because the demand is unchanged, the market-clearing price is unchanged too, +and the price reveals nothing about the insider's signal. ### CES utility @@ -592,14 +601,14 @@ plt.show() The plot confirms {prf:ref}`ime_theorem_invertibility_conditions`. -- *CES with $\sigma \neq 1$*: the equilibrium price is *strictly monotone* in - $q$. +For CES with $\sigma \neq 1$, the equilibrium price is strictly monotone in $q$. + +An outside observer who knows the equilibrium map $p^*(\cdot)$ can therefore +invert the price uniquely to recover $q$, so the inside information is fully +transmitted. - - An outside observer who knows the equilibrium map $p^*(\cdot)$ can uniquely - invert the - price to recover $q$, that is, inside information is fully transmitted. -- *Cobb-Douglas ($\sigma = 1$)*: the price is *flat* in $q$, that is, information is never - transmitted through the market. +For Cobb-Douglas ($\sigma = 1$), the price is flat in $q$, so information is +never transmitted through the market. ```{code-cell} ipython3 p_cd = [eq_price(q, a1, a2, W1, ρ=0.0) for q in q_grid] @@ -620,15 +629,16 @@ pattern back to the proof of {prf:ref}`ime_theorem_invertibility_conditions`. (price_monotonicity)= ### Why monotonicity depends on $\sigma$ -The key derivative in the paper fixes a price $p$, treats $\alpha_s(p)$ and -$\beta_s(p)$ as constants, and then differentiates the right-hand side of +Fix a price $p$ and treat $\alpha_s(p)$ and $\beta_s(p)$ as constants. 
+ +The right-hand side of the two-state first-order condition $$ \frac{\alpha_1(p)\, q + \alpha_2(p)\, (1-q)} {\beta_1(p)\, q + \beta_2(p)\, (1-q)} $$ -is a function of $q$ whose derivative is +is then a function of $q$ alone, with derivative $$ \frac{\partial}{\partial q} @@ -705,12 +715,12 @@ plt.tight_layout() plt.show() ``` -When $\sigma = 1$ the ratio is constant across all $a_s$ values, information -about the state has no effect on the marginal rate of substitution. +When $\sigma = 1$ the ratio is constant across all $a_s$ values, so +information about the state has no effect on the marginal rate of substitution. -For $\sigma < 1$ the -ratio is decreasing in $a_s$, and for $\sigma > 1$ it is increasing, making the -equilibrium price strictly monotone in the posterior $q$ in both cases. +For $\sigma < 1$ the ratio is decreasing in $a_s$, and for $\sigma > 1$ it is +increasing, making the equilibrium price strictly monotone in the posterior $q$ +in both cases. The static analysis asks whether a current price reveals current private information, whereas the next section asks what a whole history of prices @@ -1201,11 +1211,10 @@ that theorem. :class: dropdown ``` -**1. First-order condition.** +For the first-order condition, define $W_s = w + (a_s - p)\,x_1$ for +$s = 1, 2$. -Define $W_s = w + (a_s - p)\,x_1$ for $s=1,2$. - -The FOC is +Then the FOC is $$ q\,(a_1 - p)\,\gamma\, e^{-\gamma W_1} @@ -1219,11 +1228,8 @@ q\,(a_1 - p)\, e^{-\gamma(a_1-p) x_1} = (1-q)\,(p - a_2)\, e^{\gamma(p-a_2) x_1}. $$ -**2. Market-clearing equilibrium price.** - -Setting $x_1 = 1$ (all supply absorbed by informed agent), the equation becomes - -a scalar root-finding problem in $p$: +Setting $x_1 = 1$ (the informed agent absorbs all supply), this becomes a +scalar root-finding problem in $p$: $$ F(p;\,q,\gamma) \equiv @@ -1259,16 +1265,15 @@ plt.tight_layout() plt.show() ``` -**3. Invertibility for CARA.** +The price is strictly increasing in $q$ for every $\gamma > 0$. -The price is strictly increasing in $q$ for every $\gamma > 0$, because -portfolio utility $u(x_2 + \bar{a}\,x_1)$ treats the two goods as **perfect -substitutes** in creating wealth, so a higher posterior probability of the -high-return state raises the marginal value of the risky asset and pushes the -equilibrium price upward. +The reason is that portfolio utility $u(x_2 + \bar{a}\,x_1)$ treats the two +goods as perfect substitutes in creating wealth, so a higher posterior +probability of the high-return state raises the marginal value of the risky +asset and pushes the equilibrium price upward. This behavior is similar in spirit to the $\sigma > 1$ case in -{prf:ref}`ime_theorem_invertibility_conditions`, but it is *not* a direct +{prf:ref}`ime_theorem_invertibility_conditions`, but it is not a direct consequence of that theorem because CARA utility over wealth is not homothetic in the two-good representation used in the theorem. @@ -1486,10 +1491,10 @@ D_{KL}\bigl(N(2.0, 0.4^2)\,\|\,N(2.3, 0.4^2)\bigr) D_{KL}\bigl(N(2.0, 0.4^2)\,\|\,N(1.5, 0.4^2)\bigr), $$ -so the model with mean $2.3$ is the KL-best approximation among the two -wrong models, and in the simulation posterior weight concentrates on that model. +so the model with mean $2.3$ is the KL-best approximation among the two wrong +models, and in the simulation posterior weight concentrates on that model. -Since posterior odds are cumulative {doc}`likelihood ratios`. +Posterior odds are cumulative {doc}`likelihood ratios`. 
If we compare the two wrong Gaussian models $f$ and $g$, then under the true distribution $h$ the average log likelihood ratio satisfies diff --git a/lectures/multivariate_normal.md b/lectures/multivariate_normal.md index a3be75575..70f361ae9 100644 --- a/lectures/multivariate_normal.md +++ b/lectures/multivariate_normal.md @@ -1003,7 +1003,7 @@ w_{6} $$ where -$w \begin{bmatrix} w_1 \cr w_2 \cr \vdots \cr w_6 \end{bmatrix}$ +$w = \begin{bmatrix} w_1 \cr w_2 \cr \vdots \cr w_6 \end{bmatrix}$ is a standard normal random vector. We construct a Python function `construct_moments_IQ2d` to construct @@ -1066,7 +1066,7 @@ multi_normal_IQ2d.partition(k) multi_normal_IQ2d.cond_dist(1, [*y1, *y2]) ``` -Now let’s compute distributions of $\theta$ and $\mu$ +Now let’s compute distributions of $\theta$ and $\eta$ separately conditional on various subsets of test scores. It will be fun to compare outcomes with the help of an auxiliary function @@ -1423,7 +1423,7 @@ This example is an instance of what is known as a **Wold representation** in tim Consider the stochastic second-order linear difference equation $$ -y_{t} = \alpha_{0} + \alpha_{1} y_{y-1} + \alpha_{2} y_{t-2} + u_{t} +y_{t} = \alpha_{0} + \alpha_{1} y_{t-1} + \alpha_{2} y_{t-2} + u_{t} $$ where $u_{t} \sim N \left(0, \sigma_{u}^{2}\right)$ and @@ -1518,7 +1518,6 @@ $$ ```{code-cell} python3 # set parameters -T = 80 T = 160 # coefficients of the second order difference equation 𝛼0 = 10 @@ -1526,7 +1525,6 @@ T = 160 𝛼2 = -.9 # variance of u -σu = 1. σu = 10. # distribution of y_{-1} and y_{0} @@ -1840,7 +1838,7 @@ of $x_t$ conditional on $y_0, y_1, \ldots , y_{t-1} = y^{t-1}$ is $$ -x_t | y^{t-1} \sim {\mathcal N}(A \tilde x_t , A \tilde \Sigma_t A' + C C' ) +x_t | y^{t-1} \sim {\mathcal N}(A \tilde x_{t-1} , A \tilde \Sigma_{t-1} A' + C C' ) $$ where $\{\tilde x_t, \tilde \Sigma_t\}_{t=1}^\infty$ can be @@ -2015,7 +2013,7 @@ $\Lambda \Lambda^\top$ of rank $k$. This means that all covariances among the $n$ components of the $Y$ vector are intermediated by their common dependencies on the -$k<$ factors. +$k$ factors. Form @@ -2277,8 +2275,8 @@ $Y$ on the first two principal components does a good job of approximating $Ef \mid y$. We confirm this in the following plot of $f$, -$E y \mid f$, $E f \mid y$, and $\hat{y}$ on the -coordinate axis versus $y$ on the ordinate axis. +$E y \mid f$, $E f \mid y$, and $\hat{y}$ against the +observation index on the horizontal axis. ```{code-cell} python3 plt.scatter(range(N), Λ @ f, label='$Ey|f$') diff --git a/lectures/prob_matrix.md b/lectures/prob_matrix.md index bc5ac21f4..3a63a54d3 100644 --- a/lectures/prob_matrix.md +++ b/lectures/prob_matrix.md @@ -53,6 +53,8 @@ As usual, we'll start with some imports import numpy as np import matplotlib.pyplot as plt import prettytable as pt +from scipy import stats +from scipy.special import comb from mpl_toolkits.mplot3d import Axes3D from matplotlib_inline.backend_inline import set_matplotlib_formats set_matplotlib_formats('retina') @@ -300,7 +302,7 @@ Note that a sufficient statistic corresponds to a particular statistical model. Sufficient statistics are key tools that AI uses to summarize or compress a **big data** set. -R. A. Fisher provided a rigorous definition of **information** -- see . +R. A. Fisher provided a rigorous definition of **information** -- see [Fisher information](https://en.wikipedia.org/wiki/Fisher_information). 
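+As a concrete illustration, the following sketch uses two assumed Bernoulli
+samples to check that the likelihood depends on the data only through the
+number of successes.
+
+Two samples with the same count yield identical likelihoods at every
+parameter value, which is what sufficiency of the count means here.
+
+```{code-cell} ipython3
+θ_grid = np.linspace(0.05, 0.95, 7)
+
+x_a = np.array([1, 1, 0, 0, 1, 0, 1, 0])   # assumed sample, 4 successes
+x_b = np.array([0, 1, 1, 0, 0, 1, 0, 1])   # different order, same count
+
+def bernoulli_likelihood(x, θ):
+    "Likelihood of an i.i.d. Bernoulli(θ) sample x."
+    k = x.sum()
+    return θ**k * (1 - θ)**(len(x) - k)
+
+# identical likelihoods, so only the count matters
+print(np.allclose(bernoulli_likelihood(x_a, θ_grid),
+                  bernoulli_likelihood(x_b, θ_grid)))
+```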
@@ -370,7 +372,7 @@ $$ ## Marginal probability distributions -The joint distribution induce marginal distributions +The joint distribution induces marginal distributions $$ \textrm{Prob}\{X=i\}= \sum_{j=0}^{J-1}f_{ij} = \mu_i, \quad i=0,\ldots,I-1 @@ -433,7 +435,7 @@ where $i=0, \ldots,I-1, \quad j=0,\ldots,J-1$. Note that $$ -\sum_{i}\textrm{Prob}\{X_i=i|Y_j=j\} +\sum_{i}\textrm{Prob}\{X=i|Y=j\} =\frac{ \sum_{i}f_{ij} }{ \sum_{i}f_{ij}}=1 $$ @@ -444,7 +446,11 @@ $$ $$ (eq:condprobbayes) ```{note} -Formula {eq}`eq:condprobbayes` is also what a Bayesian calls **Bayes' Law**. A Bayesian statistician regards marginal probability distribution $\textrm{Prob}({X=i}), i = 0, \ldots, I-1$ as a **prior** distribution that describes his personal subjective beliefs about $X$. +Formula {eq}`eq:condprobbayes` is also what a Bayesian calls **Bayes' Law**. + +A Bayesian statistician regards marginal probability distribution $\textrm{Prob}({X=i}), i = 0, \ldots, I-1$ as a **prior** distribution that describes his personal subjective beliefs about $X$. + + He then interprets formula {eq}`eq:condprobbayes` as a procedure for constructing a **posterior** distribution that describes how he would revise his subjective beliefs after observing that $Y$ equals $j$. ``` @@ -491,8 +497,8 @@ where $$ \left[ \begin{matrix} - p_{11} & p_{12}\\ - p_{21} & p_{22} + p_{00} & p_{01}\\ + p_{10} & p_{11} \end{matrix} \right] $$ @@ -519,7 +525,7 @@ Suppose that $$ \begin{aligned} -\text{Prob} \{X(0)=i,X(1)=j\} &=f_{ij}≥0,i=0,\cdots,I-1\\ +\text{Prob} \{X(0)=i,X(1)=j\} &=f_{ij}\geq 0, \quad i=0,\cdots,I-1, \quad j=0,\cdots,J-1\\ \sum_{i}\sum_{j}f_{ij}&=1 \end{aligned} $$ @@ -545,8 +551,8 @@ where $$ \begin{aligned} -\textrm{Prob}\{X=i\} &=f_i\ge0, \sum{f_i}=1 \cr -\textrm{Prob}\{Y=j\} & =g_j\ge0, \sum{g_j}=1 +\textrm{Prob}\{X=i\} &=f_i\ge 0, \quad \sum_{i}{f_i}=1 \cr +\textrm{Prob}\{Y=j\} & =g_j\ge 0, \quad \sum_{j}{g_j}=1 \end{aligned} $$ @@ -572,7 +578,7 @@ $$ \end{aligned} $$ -A continuous random variable having density $f_{X}(x)$) has mean and variance +A continuous random variable having density $f_{X}(x)$ has mean and variance $$ \begin{aligned} @@ -1136,10 +1142,10 @@ Start with a joint distribution $$ \begin{aligned} f_{ij} & =\textrm{Prob}\{X=i,Y=j\}\\ -i& =0, \cdots,I-1\\ -j& =0, \cdots,J-1\\ -& \text{stacked to an }I×J\text{ matrix}\\ -& e.g. \quad I=1, J=1 +i& =0, \cdots, I-1\\ +j& =0, \cdots, J-1\\ +& \text{stacked to an }I\times J\text{ matrix}\\ +& e.g. \quad I=2, J=2 \end{aligned} $$ @@ -1148,8 +1154,8 @@ where $$ \left[ \begin{matrix} - f_{11} & f_{12}\\ - f_{21} & f_{22} + f_{00} & f_{01}\\ + f_{10} & f_{11} \end{matrix} \right] $$ @@ -1158,7 +1164,7 @@ From the joint distribution, we have shown above that we obtain **unique** marg Now we'll try to go in a reverse direction. -We'll find that from two marginal distributions, can we usually construct more than one joint distribution that verifies these marginals. +We'll find that from two marginal distributions we can usually construct more than one joint distribution that satisfies these marginals. Each of these joint distributions is called a **coupling** of the two marginal distributions. @@ -1171,7 +1177,7 @@ $$ \end{aligned} $$ -Given two marginal distribution, $\mu$ for $X$ and $\nu$ for $Y$, a joint distribution $f_{ij}$ is said to be a **coupling** of $\mu$ and $\nu$. +Given two marginal distributions, $\mu$ for $X$ and $\nu$ for $Y$, a joint distribution $f_{ij}$ is said to be a **coupling** of $\mu$ and $\nu$. 
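+A candidate coupling can be checked mechanically.
+
+Here is a small helper, added purely as a convenience, that tests whether a
+joint matrix has the required marginals; the examples below can be verified
+with it.
+
+```{code-cell} ipython3
+def is_coupling(f, μ, ν, tol=1e-12):
+    "Check that joint matrix f has row marginals μ and column marginals ν."
+    return (np.all(f >= 0)
+            and np.allclose(f.sum(axis=1), μ, atol=tol)
+            and np.allclose(f.sum(axis=0), ν, atol=tol))
+```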
Consider the following bivariate example. @@ -1187,7 +1193,7 @@ $$ We construct two couplings. -The first coupling if our two marginal distributions is the joint distribution +The first coupling of our two marginal distributions is the joint distribution $$f_{ij}= \left[ @@ -1223,7 +1229,7 @@ f_{ij}= \right] $$ -The verify that this is a coupling, note that +To verify that this is a coupling, note that $$ \begin{aligned} @@ -1258,6 +1264,7 @@ H(x_1,x_2,\dots,x_N) = C(F_1(x_1), F_2(x_2),\dots,F_N(x_N)). $$ If the marginal distributions are continuous, then the copula is unique. + In that case, we can recover it from the marginal inverses: $$ @@ -1365,7 +1372,7 @@ draws1 = 1_000_000 # generate draws from uniform distribution p = np.random.rand(draws1) -# generate draws of first copuling via uniform distribution +# generate draws of first coupling via uniform distribution c1 = np.vstack([np.ones(draws1), np.ones(draws1)]) # X=0, Y=0 c1[0, p <= f1_cum[0]] = 0 @@ -1440,7 +1447,7 @@ draws2 = 1_000_000 # generate draws from uniform distribution p = np.random.rand(draws2) -# generate draws of first coupling via uniform distribution +# generate draws of second coupling via uniform distribution c2 = np.vstack([np.ones(draws2), np.ones(draws2)]) # X=0, Y=0 c2[0, p <= f2_cum[0]] = 0 @@ -1464,7 +1471,7 @@ f2_10 = sum((c2[0, :] == 1)*(c2[1, :] == 0))/draws2 f2_11 = sum((c2[0, :] == 1)*(c2[1, :] == 1))/draws2 # print output of second joint distribution -print("first joint distribution for c2") +print("second joint distribution for c2") c2_mtb = pt.PrettyTable() c2_mtb.field_names = ['c2_x_value', 'c2_y_value', 'c2_prob'] c2_mtb.add_row([0, 0, f2_00]) @@ -1507,7 +1514,8 @@ arbitrary marginal distributions. The construction has three steps: 1. Draw $(Z_1, Z_2)$ from a bivariate standard normal with correlation $\rho$. -2. Apply the standard normal CDF: $U_k = \Phi(Z_k)$. The pair $(U_1, U_2)$ has uniform marginals but retains the dependence structure of $(Z_1, Z_2)$ — this is the copula. +2. Apply the standard normal CDF: $U_k = \Phi(Z_k)$. + - The pair $(U_1, U_2)$ has uniform marginals but retains the dependence structure of $(Z_1, Z_2)$ --- this is the copula. 3. Apply the inverse CDF of any desired marginal: $X_k = F_k^{-1}(U_k)$. The following code illustrates this with exponential marginals. @@ -1519,22 +1527,21 @@ mystnb: caption: gaussian copula with exponential marginals name: fig-gaussian-copula --- -from scipy import stats # Gaussian copula parameters ρ_cop = 0.8 n_cop = 100_000 -# Step 1: draw from bivariate standard normal with correlation ρ_cop +# Draw from bivariate standard normal with correlation ρ_cop z = np.random.multivariate_normal( [0, 0], [[1, ρ_cop], [ρ_cop, 1]], n_cop ) -# Step 2: apply normal CDF -> uniform marginals (the copula itself) +# Apply normal CDF -> uniform marginals (the copula itself) u1 = stats.norm.cdf(z[:, 0]) u2 = stats.norm.cdf(z[:, 1]) -# Step 3: apply inverse CDFs of desired marginals (here: Exponential) +# Apply inverse CDFs of desired marginals (here: Exponential) x1 = stats.expon.ppf(u1, scale=1.0) # Exp with mean 1 x2 = stats.expon.ppf(u2, scale=0.5) # Exp with mean 0.5 @@ -1545,7 +1552,6 @@ axes[0].set_ylabel('$u_2$') axes[1].scatter(x1[:3000], x2[:3000], alpha=0.2, s=2) axes[1].set_xlabel('$x_1$ (Exp, mean=1)') axes[1].set_ylabel('$x_2$ (Exp, mean=0.5)') -plt.tight_layout() plt.show() print(f"Sample correlation of (x1, x2): {np.corrcoef(x1, x2)[0, 1]:.3f}") @@ -1587,8 +1593,6 @@ where $X \in \{0,1\}$ and $Y \in \{10, 20\}$. 
 ```

 ```{code-cell} ipython3
-import numpy as np
-
 F = np.array([[0.3, 0.2],
               [0.1, 0.4]])
@@ -1601,7 +1605,7 @@ F_indep = np.outer(μ, ν)
 print("\nIndependence matrix (outer product):\n", F_indep)

 print("\nActual joint F:\n", F)
-print("\nIndependent (F == μ ⊗ ν)?", np.allclose(F, F_indep))
+print("\nIndependent (F == μ times ν)?", np.allclose(F, F_indep))

 prob_X0_given_Y10 = F[0, 0] / ν[0]
 print(f"\nProb(X=0 | Y=10) = {prob_X0_given_Y10:.4f}")
@@ -1632,8 +1636,6 @@ Using the same joint distribution $F$ and values $X \in \{0,1\}$, $Y \in \{10, 2
 ```

 ```{code-cell} ipython3
-import numpy as np
-
 xs = np.array([0, 1])
 ys = np.array([10, 20])
 F = np.array([[0.3, 0.2],
               [0.1, 0.4]])
@@ -1672,17 +1674,17 @@ and therefore $\text{Cov}(X,Y) = \mathbb{E}[XY] - \mathbb{E}[X]\mathbb{E}[Y] = 0

 ```{exercise}
 :label: prob_matrix_ex3

-**Sum of Two Dice (Convolution)**
+**Sum of Two Dice**

 Let $X$ and $Y$ each be uniformly distributed on $\{1,2,3,4,5,6\}$, and let
 $Z = X + Y$.

 1. Use the convolution formula $h_k = \sum_i f_i g_{k-i}$ to compute the
 distribution of $Z$.

-1. Plot the theoretical distribution.
+1. Plot the result generated by the formula.

 1. Simulate $10^6$ rolls and overlay the empirical histogram on the plot.

-1. Compute $\mathbb{E}[Z]$ and $\text{Var}(Z)$ both from the theoretical distribution and from the simulation.
+1. Compute $\mathbb{E}[Z]$ and $\text{Var}(Z)$ from the two calculations.
 ```

 ```{solution-start} prob_matrix_ex3
 :class: dropdown
 ```

 ```{code-cell} ipython3
-import numpy as np
-import matplotlib.pyplot as plt
-
 f = np.ones(6) / 6
-h = np.convolve(f, f)
+g = np.ones(6) / 6
+h = [
+    sum(f[i]*g[k-i] for i in range(
+        max(0, k-len(g)+1),  # ensures k-i < len(g), so g_{k-i} exists
+        min(len(f), k+1))    # ensures i < len(f) and k-i >= 0
+    )
+    for k in range(len(f) + len(g) - 1)]

 z_vals = np.arange(2, 13)
 n = 1_000_000
@@ -1743,10 +1748,8 @@ where $p_{ij} = \text{Prob}\{X(t+1)=j \mid X(t)=i\}$.
 ```

 ```{code-cell} ipython3
-import numpy as np
-
-P = np.array([[0.9, 0.1],
-             [0.2, 0.8]])
+P = np.array([[0.9, 0.1],
+              [0.2, 0.8]])

 ψ0 = np.array([1.0, 0.0])
 for n in [1, 5, 20, 100]:
@@ -1767,68 +1770,6 @@ print(f"ψ_100 close to stationary? {np.allclose(ψ_100, ψ_star, atol=1e-6)}")

 ```{exercise}
 :label: prob_matrix_ex5

-**Fréchet–Hoeffding Bounds**
-
-Let $X \in \{0,1\}$ and $Y \in \{0,1\}$ with marginals $\mu = [0.5,\, 0.5]$ and $\nu = [0.4,\, 0.6]$.
-
-1. Construct the **comonotone** (upper Fréchet) coupling that puts as much mass as possible on the diagonal $\{X=i, Y=i\}$.
-
-1. Construct the **counter-monotone** (lower Fréchet) coupling that puts as much mass as possible on the anti-diagonal.
-
-1. Construct the **independent** coupling $f^{\perp}_{ij} = \mu_i \nu_j$.
-
-1. Verify that all three have the correct marginals.
-
-1. For each coupling compute $\text{Cor}(X,Y)$. Which maximises / minimises the correlation?
-``` - -```{solution-start} prob_matrix_ex5 -:class: dropdown -``` - -```{code-cell} ipython3 -import numpy as np - -xs = np.array([0, 1]) -ys = np.array([0, 1]) -μ = np.array([0.5, 0.5]) -ν = np.array([0.4, 0.6]) - -F_upper = np.array([[0.4, 0.1], - [0.0, 0.5]]) - -F_lower = np.array([[0.0, 0.5], - [0.4, 0.1]]) - -F_indep = np.outer(μ, ν) - -for F, name in [(F_upper, "Upper Fréchet"), - (F_lower, "Lower Fréchet"), - (F_indep, "Independent ")]: - print(f"{name}: row sums = {F.sum(axis=1)}, col sums = {F.sum(axis=0)}") - -def correlation(F, xs, ys): - μ_x = F.sum(axis=1) - ν_y = F.sum(axis=0) - E_X = xs @ μ_x - E_Y = ys @ ν_y - E_XY = sum(xs[i]*ys[j]*F[i,j] for i in range(2) for j in range(2)) - cov = E_XY - E_X * E_Y - σ_X = np.sqrt(((xs - E_X)**2) @ μ_x) - σ_Y = np.sqrt(((ys - E_Y)**2) @ ν_y) - return cov / (σ_X * σ_Y) - -print(f"\nCor upper Fréchet = {correlation(F_upper, xs, ys):.4f} (maximum)") -print(f"Cor lower Fréchet = {correlation(F_lower, xs, ys):.4f} (minimum)") -print(f"Cor independent = {correlation(F_indep, xs, ys):.4f}") -``` - -```{solution-end} -``` - -```{exercise} -:label: prob_matrix_ex6 - **Bayes' Law with a Discrete Prior** A coin has unknown bias $\theta \in \{0.2,\, 0.5,\, 0.8\}$ with prior $\pi = [0.25,\, 0.50,\, 0.25]$. @@ -1848,15 +1789,11 @@ for each $\theta$. 1. Repeat for $k = 3$ heads and describe how the posterior shifts. ``` -```{solution-start} prob_matrix_ex6 +```{solution-start} prob_matrix_ex5 :class: dropdown ``` ```{code-cell} ipython3 -import numpy as np -import matplotlib.pyplot as plt -from scipy.special import comb - θ_vals = np.array([0.2, 0.5, 0.8]) π = np.array([0.25, 0.50, 0.25]) @@ -1868,13 +1805,16 @@ def compute_posterior(k, n, θ_vals, π): post7, lik7 = compute_posterior(7, 10, θ_vals, π) post3, lik3 = compute_posterior(3, 10, θ_vals, π) -print("k=7: likelihood =", lik7.round(4), " posterior =", post7.round(4)) -print("k=3: likelihood =", lik3.round(4), " posterior =", post3.round(4)) +print("k=7: likelihood =", lik7.round(4), + " posterior =", post7.round(4)) +print("k=3: likelihood =", lik3.round(4), + " posterior =", post3.round(4)) x = np.arange(len(θ_vals)) w = 0.3 fig, axes = plt.subplots(1, 2, figsize=(10, 4)) -for ax, post, title in zip(axes, [post7, post3], ['k=7 heads', 'k=3 heads']): +for ax, post, title in zip( + axes, [post7, post3], ['k=7 heads', 'k=3 heads']): ax.bar(x - w/2, π, w, label='Prior', alpha=0.7) ax.bar(x + w/2, post, w, label='Posterior', alpha=0.7) ax.set_xticks(x) From ba24b95c41fdfbabd7c8cb061c00fda3a4589f60 Mon Sep 17 00:00:00 2001 From: HumphreyYang Date: Thu, 23 Apr 2026 18:54:31 +0800 Subject: [PATCH 09/10] update --- lectures/_static/quant-econ.bib | 12 +++++------- lectures/information_market_equilibrium.md | 3 --- 2 files changed, 5 insertions(+), 10 deletions(-) diff --git a/lectures/_static/quant-econ.bib b/lectures/_static/quant-econ.bib index 547006f33..6cf319fcd 100644 --- a/lectures/_static/quant-econ.bib +++ b/lectures/_static/quant-econ.bib @@ -3655,18 +3655,17 @@ @article{muth1961 @article{radner1972, author = {Radner, Roy}, - title = {Existence of Equilibrium Plans, Prices, and Price Expectations - in a Sequence of Markets}, + title = {Existence of Equilibrium of Plans, Prices, and Price Expectations in a Sequence of Markets}, journal = {Econometrica}, volume = {40}, number = {2}, - pages = {289--304}, + pages = {289--303}, year = {1972} } @article{arrow1964, author = {Arrow, Kenneth J.}, - title = {The Role of Securities in the Optimal Allocation of Risk-Bearing}, + title = {The Role of 
Securities in the Optimal Allocation of Risk-bearing}, journal = {Review of Economic Studies}, volume = {31}, number = {2}, @@ -3676,11 +3675,10 @@ @article{arrow1964 @article{grossman1976, author = {Grossman, Sanford J.}, - title = {On the Efficiency of Competitive Stock Markets Where Trades Have - Diverse Information}, + title = {On the Efficiency of Competitive Stock Markets Where Trades Have Diverse Information}, journal = {Journal of Finance}, volume = {31}, number = {2}, pages = {573--585}, year = {1976} -} +} \ No newline at end of file diff --git a/lectures/information_market_equilibrium.md b/lectures/information_market_equilibrium.md index 5dd40fc49..d9b4f95cc 100644 --- a/lectures/information_market_equilibrium.md +++ b/lectures/information_market_equilibrium.md @@ -88,9 +88,6 @@ Reduced-form and structural models come in pairs. To each structure or structural model there is a reduced form, or collection of reduced forms, underlying different possible regressions. - -In this lecture, a **structure** is a parameterization of the underlying -endowment process. ``` The lecture is organized as follows. From 5d97aad0f85e5c6c4dd633faeff838af18897719 Mon Sep 17 00:00:00 2001 From: HumphreyYang Date: Fri, 24 Apr 2026 01:53:14 +0800 Subject: [PATCH 10/10] updates --- lectures/information_market_equilibrium.md | 104 ++++-- lectures/multivariate_normal.md | 379 +++++++++++++++------ lectures/prob_matrix.md | 70 ++-- 3 files changed, 382 insertions(+), 171 deletions(-) diff --git a/lectures/information_market_equilibrium.md b/lectures/information_market_equilibrium.md index d9b4f95cc..bfb19ac3f 100644 --- a/lectures/information_market_equilibrium.md +++ b/lectures/information_market_equilibrium.md @@ -71,7 +71,7 @@ Important findings of {cite:t}`kihlstrom_mirman1975` are: insider's posterior distribution to the equilibrium price is one-to-one on the set of posteriors that can actually arise from the signal. - - Invertibility holds when the informed + - For the two-state case ($S = 2$), invertibility holds when the informed agent's utility is homothetic and the elasticity of substitution is everywhere either below one or above one. - In the dynamic economy, as information accumulates, Bayesian price @@ -120,8 +120,9 @@ from scipy.stats import norm The economy has two goods. -Good 2 is the numeraire (price normalized to 1); good 1 trades -at price $p > 0$. +Good 2 is the numeraire (price normalized to 1). + +Good 1 trades at price $p > 0$. An unknown parameter $\bar{a}$ affects the value of good 1. @@ -174,7 +175,7 @@ the calculations transparent. Suppose **agent 1** (the insider) observes a private signal $\tilde{y}$ correlated with -$\bar{a}$ before trading. +$\bar{a}$ before trading, where $\tilde{y}$ takes values in a finite set $Y$. Before the signal arrives, agent 1 has prior beliefs $\mu_0 = P^1$. @@ -347,9 +348,6 @@ invertibility holds. (invertibility_conditions)= ## Invertibility and the elasticity of substitution -The price-revelation theorem reduces the economic problem to a narrower one: -when is the belief-to-price map actually one-to-one? - When does the belief-to-price map fail to be invertible? {prf:ref}`ime_theorem_invertibility_conditions` @@ -395,6 +393,9 @@ $$ x(p) = (x_1(p), x_2(p)). $$ +Throughout, $u^i_j$ denotes the partial derivative of $u^i$ with respect to its +$j$-th argument. 
+ Whenever the informed agent consumes positive amounts of both goods, optimality of $x(p)$ under posterior $\mu$ gives the interior first-order condition @@ -478,7 +479,7 @@ and the price reveals nothing about the insider's signal. ### CES utility -For concreteness we work with the **constant-elasticity-of-substitution** (CES) +For concreteness we work with a simplified example with the **constant-elasticity-of-substitution** (CES) utility function @@ -513,19 +514,22 @@ We focus on agent 1 as the *only* informed trader who absorbs one unit of good 1 at equilibrium (i.e., $x_1 = 1$). +Let $W_1 = w^1 + \theta^1 \pi$ denote agent 1's total wealth (endowment plus +profit share). + Agent 1's budget constraint then reduces to -$x_2 = W^1 - p$, and the equilibrium price is the unique $p \in (0, W^1)$ +$x_2 = W_1 - p$, and the equilibrium price is the unique $p \in (0, W_1)$ satisfying the first-order condition $$ -p \bigl[q\, u_2(a_1,\, W^1-p) + (1-q)\, u_2(a_2,\, W^1-p)\bigr] -= q\, a_1\, u_1(a_1,\, W^1-p) + (1-q)\, a_2\, u_1(a_2,\, W^1-p). +p \bigl[q\, u_2(a_1,\, W_1-p) + (1-q)\, u_2(a_2,\, W_1-p)\bigr] += q\, a_1\, u_1(a_1,\, W_1-p) + (1-q)\, a_2\, u_1(a_2,\, W_1-p). $$ For Cobb-Douglas utility ($\sigma = 1$), the first-order condition becomes $p = -W^1 - p$, -giving $p^* = W^1/2$ regardless of the posterior $q$, confirming that no +W_1 - p$, +giving $p^* = W_1/2$ regardless of the posterior $q$, confirming that no information is transmitted through the price in the Cobb-Douglas case. @@ -616,7 +620,7 @@ print(f"Cobb-Douglas (rho=0): min p* = {min(p_cd):.6f}, " print(f"Analytical CD price = W1/2 = {W1/2:.6f}") ``` -Every entry equals $W^1/2 = 2.0$ exactly, confirming analytically that the +Every entry equals $W_1/2 = 2.0$ exactly, confirming analytically that the Cobb-Douglas equilibrium price is independent of $q$ and of the state values $a_1, a_2$. @@ -740,9 +744,9 @@ In each period $t$: 3. Consumers trade and consume. The endowment vectors $\{\tilde{\omega}^t\}$ are **i.i.d.** with density -$f(\omega^t \mid \lambda)$, where $\lambda = (\lambda_1, \ldots, \lambda_n)$ is +$f(\omega^t \mid \lambda)$, where $\lambda = (\lambda_1, \ldots, \lambda_K)$ is a -**structural parameter vector** that is *fixed but unknown*. +**structural parameter vector** (of dimension $K$) that is *fixed but unknown*. The equilibrium price at time $t$ is a deterministic function of $\omega^t$, so $\{p^t\}$ is also i.i.d. @@ -849,9 +853,7 @@ $$ Let $\bar\lambda$ be the true structural parameter and $\bar\mu$ the reduced form that contains $\bar\lambda$. -Assume the prior assigns positive probability to $\bar\lambda$ (equivalently, -positive -probability to the class $\bar\mu$). +Assume the prior assigns positive probability to the reduced-form class $\bar\mu$. Define the posterior mass on a reduced-form class by @@ -888,6 +890,16 @@ which equals the rational-expectations price distribution for a fully informed observer. ``` +```{note} +Note that the theorem only requires the prior to assign positive probability to the reduced-form class $\bar\mu$ that contains the true structure $\bar\lambda$. + +This is implied by, but weaker than, assigning positive probability to the true +structural parameter $\bar\lambda$ itself. + +A prior could place zero mass on $\bar\lambda$ +while still placing positive mass on other structures inside $\bar\mu$. +``` + The important distinction is that price observers need not learn $\bar \lambda$ itself. @@ -896,7 +908,7 @@ They only learn which reduced-form class is correct. 
That is enough for forecasting because every $\lambda \in \bar \mu$ generates the same price density $g(\cdot \mid \bar \mu)$. -This is exactly the paper's point: rational price expectations emerge from +Rational price expectations emerge from learning the reduced form, not from identifying every structural detail of the economy. @@ -905,8 +917,7 @@ for next period's price matches the objective price distribution generated by the true reduced form. -The theorem is easiest to absorb in a stripped-down example, so we now turn to a -simple simulation. +Let's now turn to a simple simulation. (bayesian_simulation)= ## Simulating Bayesian learning from prices @@ -920,7 +931,7 @@ The observer knows the two possible price distributions (the reduced forms) but not which one governs the data. -This is a **Bayesian model selection** problem. +This is a **Bayesian model selection** problem we have seen in {doc}`likelihood_bayes`. With a prior $h_0$ on $\mu_1$ and the observed price $p^t$, the posterior weight on $\mu_1$ @@ -931,6 +942,8 @@ h_t = \frac{h_{t-1}\, g(p^t \mid \mu_1)}{h_{t-1}\, g(p^t \mid \mu_1) + (1-h_{t-1})\, g(p^t \mid \mu_2)}. $$ +We consider a numerical example with two normal distributions with different means + ```{code-cell} ipython3 def simulate_bayesian_learning( p_bar_true, p_bar_alt, σ_p, T, h0, n_paths, seed=42 @@ -976,6 +989,16 @@ def plot_bayesian_learning(h_paths, p_bar_true, p_bar_alt, ax): ax.legend(fontsize=10) ``` +We consider two cases, one that is easy to learn and another one that is harder to learn, +using $T = 300$ periods, $n = 40$ simulated paths, a diffuse prior $h_0 = 0.5$, and +common standard deviation $\sigma_p = 0.4$. + +- *Easy case*: true model $N(2.0,\, 0.4^2)$, alternative $N(1.2,\, 0.4^2)$. +- *Hard case*: true model $N(2.0,\, 0.4^2)$, alternative $N(1.8,\, 0.4^2)$. + +Whether easy or hard to learn depends on "how close" the true distribution is compared to the +alternative hypothesis. + ```{code-cell} ipython3 --- mystnb: @@ -983,19 +1006,19 @@ mystnb: caption: bayesian learning across paths name: fig-bayesian-learning --- -T = 300 -h0 = 0.5 # diffuse prior +T = 300 +h0 = 0.5 # diffuse prior n_paths = 40 σ_p = 0.4 fig, axes = plt.subplots(1, 2, figsize=(12, 5)) -# Distinct reduced forms. +# Distinct reduced forms p_bar_true, p_bar_alt = 2.0, 1.2 h_paths = simulate_bayesian_learning(p_bar_true, p_bar_alt, σ_p, T, h0, n_paths) plot_bayesian_learning(h_paths, p_bar_true, p_bar_alt, axes[0]) -# Similar reduced forms. +# Similar reduced forms p_bar_true, p_bar_alt = 2.0, 1.8 h_paths_hard = simulate_bayesian_learning( p_bar_true, p_bar_alt, σ_p, T, h0, n_paths @@ -1011,15 +1034,16 @@ probability one, though convergence is slower when the two price distributions are similar (right panel). -This first simulation tracks posterior mass, and the next one tracks the -predictive density itself. - ### Price expectations vs. rational expectations We now verify that the observer's price expectations converge to the rational-expectations distribution $g(p \mid \bar\mu)$. +We continue to use the parameterization of the "easy-to-learn" example above +($\bar{p}_{\text{true}} = 2.0$, $\bar{p}_{\text{alt}} = 1.2$, $\sigma_p = 0.4$), +now extending to $T = 1{,}000$ periods with a single simulated path and prior $h_0 = 0.5$ + ```{code-cell} ipython3 --- mystnb: @@ -1094,6 +1118,12 @@ $\mu_1 = \{\lambda^{(1)}, \lambda^{(2)}\}$ and $\mu_2 = \{\lambda^{(3)}\}$ (because $\lambda^{(1)}$ and $\lambda^{(2)}$ generate the same price distribution). 
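+To make the grouping explicit, here is a minimal sketch in which the map from
+a structural label to its implied mean price is an assumed stand-in for the
+full structural model.
+
+Structures fall in the same reduced-form class exactly when they imply the
+same price distribution; the concrete parameter values used in the simulation
+follow below.
+
+```{code-cell} ipython3
+from collections import defaultdict
+
+# assumed map from structural label λ to its implied mean price
+price_mean = {"λ1": 2.0, "λ2": 2.0, "λ3": 1.2}
+
+classes = defaultdict(list)
+for label, p_bar in price_mean.items():
+    classes[p_bar].append(label)
+
+print(dict(classes))   # {2.0: ['λ1', 'λ2'], 1.2: ['λ3']}
+```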
+The three structures have price means $\bar{p}_1 = \bar{p}_2 = 2.0$ and +$\bar{p}_3 = 1.2$, with common standard deviation $\sigma_p = 0.4$, a +uniform prior $h_0 = (1/3, 1/3, 1/3)$, and $T = 400$ periods over $30$ paths. + +The true structure is $\lambda^{(1)}$. + ```{code-cell} ipython3 --- mystnb: @@ -1121,12 +1151,12 @@ def simulate_learning_3struct( return h_paths -# Structures 0 and 1 share the same reduced form. +# Structures 0 and 1 share the same reduced form p_bar_vec = np.array([2.0, 2.0, 1.2]) h0_vec = np.array([1 / 3, 1 / 3, 1 / 3]) σ_p = 0.4 T = 400 -true_idx = 0 # Structure 0 is observationally equivalent to 1. +true_idx = 0 # Structure 0 is observationally equivalent to 1 h_paths_3 = simulate_learning_3struct( T, h0_vec, p_bar_vec, σ_p, true_idx, n_paths=30 @@ -1314,11 +1344,13 @@ roughly $T_{0.99} \approx C / D_{KL}$ for some constant $C$. :class: dropdown ``` +Here is one solution: + ```{code-cell} ipython3 σ_p = 0.4 def kl_normal(p1, p2, σ): - """Return the KL divergence for N(p1, sigma^2) and N(p2, sigma^2).""" + """Return the KL divergence for N(p1, σ^2) and N(p2, σ^2).""" return (p1 - p2)**2 / (2 * σ**2) cases = [("Easy", 2.0, 1.2), ("Hard", 2.0, 1.8)] @@ -1333,7 +1365,7 @@ for ax, (name, p1, p2) in zip(axes, cases): kl = kl_normal(p1, p2, σ_p) paths = simulate_bayesian_learning(p1, p2, σ_p, T=2000, h0=0.5, n_paths=n_paths, seed=42) - # First period with posterior >= 0.99. + # First period with posterior >= 0.99 T99 = [] for path in paths: idx = np.where(path >= 0.99)[0] @@ -1389,6 +1421,8 @@ Discuss your findings. :class: dropdown ``` +Here is one solution: + ```{code-cell} ipython3 def simulate_misspecified( T, p_bar_true, p_bar_wrong, σ_p, h0, n_paths, seed=0 @@ -1429,7 +1463,7 @@ h_misspec = simulate_misspecified(T, p_true, p_wrong, σ_p, h0, n_paths) kl_vals = (p_true - p_wrong)**2 / (2 * σ_p**2) for mean, kl in zip(p_wrong, kl_vals): - print(f"KL(true || N({mean:.1f}, sigma^2)) = {kl:.4f}") + print(f"KL(true || N({mean:.1f}, σ^2)) = {kl:.4f}") t_grid = np.arange(T + 1) fig, axes = plt.subplots(1, 2, figsize=(12, 4)) diff --git a/lectures/multivariate_normal.md b/lectures/multivariate_normal.md index 70f361ae9..96adcd647 100644 --- a/lectures/multivariate_normal.md +++ b/lectures/multivariate_normal.md @@ -67,6 +67,8 @@ import matplotlib.pyplot as plt import numpy as np from numba import jit import statsmodels.api as sm + +rng = np.random.default_rng(0) ``` Assume that an $N \times 1$ random vector $z$ has a @@ -474,7 +476,7 @@ of $\epsilon$ will converge to $\hat{\Sigma}_1$. n = 1_000_000 # sample size # simulate multivariate normal random vectors -data = np.random.multivariate_normal(μ, Σ, size=n) +data = rng.multivariate_normal(μ, Σ, size=n) z1_data = data[:, 0] z2_data = data[:, 1] @@ -517,8 +519,8 @@ Let’s apply our code to a trivariate example. We’ll specify the mean vector and the covariance matrix as follows. ```{code-cell} python3 -μ = np.random.random(3) -C = np.random.random((3, 3)) +μ = rng.random(3) +C = rng.random((3, 3)) Σ = C @ C.T # positive semi-definite multi_normal = MultivariateNormal(μ, Σ) @@ -545,7 +547,7 @@ z2 = np.array([2., 5.]) ```{code-cell} python3 n = 1_000_000 -data = np.random.multivariate_normal(μ, Σ, size=n) +data = rng.multivariate_normal(μ, Σ, size=n) z1_data = data[:, :k] z2_data = data[:, k:] ``` @@ -714,7 +716,7 @@ $\theta$ conditional on our test scores. Let’s do that and then print out some pertinent quantities. 
```{code-cell} python3 -x = np.random.multivariate_normal(μ_IQ, Σ_IQ) +x = rng.multivariate_normal(μ_IQ, Σ_IQ) y = x[:-1] # test scores θ = x[-1] # IQ ``` @@ -1044,7 +1046,7 @@ n = 2 ```{code-cell} python3 # take one draw -x = np.random.multivariate_normal(μ_IQ2d, Σ_IQ2d) +x = rng.multivariate_normal(μ_IQ2d, Σ_IQ2d) y1 = x[:n] y2 = x[n:2*n] θ = x[2*n] @@ -1261,7 +1263,7 @@ This is going to be very useful for doing the conditioning to be used in the fun exercises below. ```{code-cell} python3 -z = np.random.multivariate_normal(μz, Σz) +z = rng.multivariate_normal(μz, Σz) x = z[:T+1] y = z[T+1:] @@ -1660,7 +1662,7 @@ conditional mean $E \left[p_{t} \mid y_{t-1}, y_{t}\right]$ using the `MultivariateNormal` class. ```{code-cell} python3 -z = np.random.multivariate_normal(μz, Σz) +z = rng.multivariate_normal(μz, Σz) y, p = z[:T], z[T:] ``` @@ -1979,7 +1981,7 @@ We describe the Kalman filter and some applications of it in {doc}`A First Look ## Classic factor analysis model -The factor analysis model widely used in psychology and other fields can +The factor analysis model can be represented as $$ @@ -1990,11 +1992,11 @@ where 1. $Y$ is $n \times 1$ random vector, $E U U^\top = D$ is a diagonal matrix, -1. $\Lambda$ is $n \times k$ coefficient matrix, -1. $f$ is $k \times 1$ random vector, +2. $\Lambda$ is $n \times k$ coefficient matrix, +3. $f$ is $k \times 1$ random vector, $E f f^\top = I$, -1. $U$ is $n \times 1$ random vector, and $U \perp f$ (i.e., $E U f^\top = 0 $ ) -1. It is presumed that $k$ is small relative to $n$; often +4. $U$ is $n \times 1$ random vector, and $U \perp f$ (i.e., $E U f^\top = 0 $ ) +5. It is presumed that $k$ is small relative to $n$; often $k$ is only $1$ or $2$, as in our IQ examples. This implies that @@ -2097,7 +2099,7 @@ $Z$. ``` ```{code-cell} python3 -z = np.random.multivariate_normal(μz, Σz) +z = rng.multivariate_normal(μz, Σz) f = z[:k] y = z[k:] @@ -2148,8 +2150,9 @@ model. -Technically, this means that the PCA model is misspecified. (Can you -explain why?) +Technically, this means that the PCA model is misspecified. + +(Can you explain why?) Nevertheless, this exercise will let us study how well the first two principal components from a PCA can approximate the conditional @@ -2250,51 +2253,144 @@ Let’s look at them, after which we’ll look at $E f | y = B y$ B @ y ``` -The fraction of variance in $y_{t}$ explained by the first two -principal components can be computed as below. +```{note} +The two largest eigenvalues are both $5.25$ in this example. + + +When an +eigenvalue is repeated, the associated principal components are not +individually pinned down: any orthonormal basis for the same +two-dimensional eigenspace is valid. + +For that reason, it is not meaningful to compare $\epsilon_1$ and +$\epsilon_2$ component-by-component with $E[f \mid Y]$. + +The PC scores +live in a PCA coordinate system, while $E[f \mid Y]$ lives in factor +space. + +Even within the common two-dimensional subspace, the PCA basis can +be rotated or sign-flipped, and its coordinates need not use the same +scaling as the factor coordinates. + +What is uniquely determined is the two-dimensional subspace spanned by +the first two columns of $P$. + +In this symmetric example, that subspace is +exactly the column space of $\Lambda$. 
+``` + +The fraction of variance in $y_t$ explained by the first two principal +components is ```{code-cell} python3 𝜆_tilde[:2].sum() / 𝜆_tilde.sum() ``` -Compute +To compare PCA with the factor model in observation space, compute $$ \hat{Y} = P_{j} \epsilon_{j} + P_{k} \epsilon_{k} $$ -where $P_{j}$ and $P_{k}$ correspond to the largest two -eigenvalues. +where $P_j$ and $P_k$ are the eigenvectors associated with the two +largest eigenvalues. ```{code-cell} python3 y_hat = P[:, :2] @ ε[:2] ``` -In this example, it turns out that the projection $\hat{Y}$ of -$Y$ on the first two principal components does a good job of -approximating $Ef \mid y$. +$\hat{Y}$ is the rank-2 PCA approximation to $Y$ in observation space, +so it is a 10-vector rather than a 2-vector. + +The natural observation-space +counterpart from the factor model is $\Lambda E[f \mid Y]$, which is +also a 10-vector. + +In this symmetric example, both vectors lie in the same two-dimensional +subspace, namely the column space of $\Lambda$. + +They are therefore close, +but not identical. + +The PCA reconstruction uses the block means directly, +while $\Lambda E[f \mid Y]$ shrinks those block means toward zero by the +factor $5/(5+\sigma_u^2) \approx 0.952$. + +The next plot makes this comparison concrete. + +The two scatter plots, $E[Y \mid f] = \Lambda f$ and $\hat{Y}$, are both +10-vectors in observation space, so they can be compared directly. + +The horizontal lines show the factor values $f_1$ and $f_2$, together +with their posterior means $E[f_i \mid Y]$. -We confirm this in the following plot of $f$, -$E y \mid f$, $E f \mid y$, and $\hat{y}$ against the -observation index on the horizontal axis. +These are 2-dimensional +factor-space quantities, drawn over the relevant half of the index set to +match the block structure of $\Lambda$. + +This uses the same idea as the earlier formula +$E[Y \mid f] = \Lambda f$: the matrix $\Lambda$ maps a 2-vector in factor +space into a 10-vector in observation space. + +In our example, + +$$ +\Lambda a += +\begin{bmatrix} +a_1 \\ +\vdots \\ +a_1 \\ +a_2 \\ +\vdots \\ +a_2 +\end{bmatrix} +\quad \text{for any } a = \begin{bmatrix} a_1 \\ a_2 \end{bmatrix}, +$$ + +because the first five rows of $\Lambda$ are $(1,0)$ and the last five +rows are $(0,1)$. + +Therefore, once we observe $Y=y$, the posterior mean +$E[f \mid Y=y] = \begin{bmatrix} E[f_1 \mid y] \\ E[f_2 \mid y] \end{bmatrix}$ +is converted into the observation-space vector + +$$ +\Lambda E[f \mid Y=y] += +\begin{bmatrix} +E[f_1 \mid y] \\ +\vdots \\ +E[f_1 \mid y] \\ +E[f_2 \mid y] \\ +\vdots \\ +E[f_2 \mid y] +\end{bmatrix}. +$$ + +So the horizontal line at height $E[f_1 \mid y]$ over the first five +indices, together with the horizontal line at height $E[f_2 \mid y]$ +over the last five indices, is exactly a picture of +$\Lambda E[f \mid Y=y]$. 
```{code-cell} python3 -plt.scatter(range(N), Λ @ f, label='$Ey|f$') -plt.scatter(range(N), y_hat, label=r'$\hat{y}$') +plt.scatter(range(N), Λ @ f, label=r'$E[Y \mid f]$') +plt.scatter(range(N), y_hat, label=r'$\hat{Y}$') plt.hlines(f[0], 0, N//2-1, ls='--', label='$f_{1}$') plt.hlines(f[1], N//2, N-1, ls='-.', label='$f_{2}$') Efy = B @ y -plt.hlines(Efy[0], 0, N//2-1, ls='--', color='b', label='$Ef_{1}|y$') -plt.hlines(Efy[1], N//2, N-1, ls='-.', color='b', label='$Ef_{2}|y$') +plt.hlines(Efy[0], 0, N//2-1, ls='--', color='b', label=r'$E[f_1 \mid y]$') +plt.hlines(Efy[1], N//2, N-1, ls='-.', color='b', label=r'$E[f_2 \mid y]$') plt.legend() plt.show() ``` -The covariance matrix of $\hat{Y}$ can be computed by first -constructing the covariance matrix of $\epsilon$ and then use the -upper left block for $\epsilon_{1}$ and $\epsilon_{2}$. +To compute the covariance matrix of $\hat{Y}$, first form the covariance +matrix of $\epsilon$ and then extract the upper-left block corresponding +to $\epsilon_1$ and $\epsilon_2$. ```{code-cell} python3 Σεjk = (P.T @ Σy @ P)[:2, :2] @@ -2324,9 +2420,11 @@ fix $z_2 = 2$. 1. Use `MultivariateNormal` to compute the analytical conditional mean $\hat{\mu}_1$ and variance $\hat{\Sigma}_{11}$ of $z_1 \mid z_2 = 2$. -1. Draw $10^6$ samples from the joint distribution. Retain only those -for which $|z_2 - 2| < 0.05$. Compute the sample mean and variance of -the retained $z_1$ values. +1. Draw $10^6$ samples from the joint distribution. + + Retain only those for which $|z_2 - 2| < 0.05$. + + Compute the sample mean and variance of the retained $z_1$ values. 1. Confirm that the sample estimates are close to the analytical values. ``` @@ -2335,9 +2433,9 @@ the retained $z_1$ values. :class: dropdown ``` -```{code-cell} python3 -import numpy as np +Here is one solution: +```{code-cell} python3 μ = np.array([.5, 1.]) Σ = np.array([[1., .5], [.5, 1.]]) @@ -2347,7 +2445,7 @@ mn.partition(1) print(f"Analytical μ1_hat = {μ1_hat[0]:.4f}, Σ11_hat = {Σ11_hat[0,0]:.4f}") n = 1_000_000 -data = np.random.multivariate_normal(μ, Σ, size=n) +data = rng.multivariate_normal(μ, Σ, size=n) z1_all, z2_all = data[:, 0], data[:, 1] mask = np.abs(z2_all - 2.) < 0.05 @@ -2389,14 +2487,13 @@ $$ so $b_1 b_2 = \rho^2$. ```{code-cell} python3 -import numpy as np - for ρ in [0.2, 0.5, 0.9]: Σ = np.array([[1., ρ], [ρ, 1.]]) mn = MultivariateNormal(np.zeros(2), Σ) mn.partition(1) - product = float(mn.βs[0]) * float(mn.βs[1]) - print(f"ρ={ρ:.1f}: b1*b2 = {product:.4f}, ρ^2 = {ρ**2:.4f}, match: {np.isclose(product, ρ**2)}") + product = mn.βs[0].item() * mn.βs[1].item() + print(f"ρ={ρ:.1f}: b1*b2 = {product:.4f}") + print(f"ρ^2 = {ρ**2:.4f}, match: {np.isclose(product, ρ**2)}") ``` ```{solution-end} @@ -2411,11 +2508,12 @@ Using the one-dimensional IQ model with $n = 50$ test scores and $\mu_\theta = 100$, $\sigma_\theta = 10$: 1. Vary the test-score noise $\sigma_y \in \{1, 5, 10, 20, 50\}$. -For each value, plot the posterior standard deviation + +- For each value, plot the posterior standard deviation $\hat{\sigma}_\theta$ as a function of the number of test scores included (from 1 to 50), with all curves on the same axes. -1. Explain intuitively why a larger $\sigma_y$ leads to a slower +2. Explain intuitively why a larger $\sigma_y$ leads to a slower decline of posterior uncertainty. ``` @@ -2423,10 +2521,9 @@ decline of posterior uncertainty. 
:class: dropdown ``` -```{code-cell} python3 -import numpy as np -import matplotlib.pyplot as plt +Here is one solution: +```{code-cell} python3 n_max = 50 μθ_val, σθ_val = 100., 10. @@ -2449,13 +2546,15 @@ plt.show() When $\sigma_y$ is large each test score is a noisy signal about $\theta$, so many more observations are required before the posterior variance falls -appreciably. In the limit $\sigma_y \to 0$ a single observation pins down +appreciably. + +In the limit $\sigma_y \to 0$ a single observation pins down $\theta$ exactly. ```{solution-end} ``` -```{exercise} +````{exercise} :label: mv_normal_ex4 **Prior vs. likelihood in IQ inference** @@ -2464,30 +2563,41 @@ Using the one-dimensional IQ model with $n = 20$ test scores and $\mu_\theta = 100$, $\sigma_y = 10$: 1. Fix $\sigma_y = 10$ and vary the prior spread -$\sigma_\theta \in \{1, 5, 10, 50, 500\}$. For each value compute the -posterior mean $\hat{\mu}_\theta$ given the same set of $n = 20$ test -scores and plot $\hat{\mu}_\theta$ against $\sigma_\theta$. +$\sigma_\theta \in \{1, 5, 10, 50, 500\}$. + + - For each value compute the + posterior mean $\hat{\mu}_\theta$ given the same set of $n = 20$ test + scores and plot $\hat{\mu}_\theta$ against $\sigma_\theta$. -1. Show analytically (or verify numerically) that as $\sigma_\theta \to \infty$ -the posterior mean converges to the sample mean $\bar{y}$, and as -$\sigma_y \to \infty$ the posterior mean converges to the prior mean -$\mu_\theta$. +1. Show analytically (or verify numerically) that + + - as $\sigma_\theta \to \infty$ the posterior mean converges to the + sample mean $\bar{y}$ (the data dominate the prior), and + - as $\sigma_\theta \to 0$ the posterior mean converges to the prior + mean $\mu_\theta$ (the prior dominates the data). + +```{hint} +The posterior mean formula is +$\hat{\mu}_\theta = \bigl(\mu_\theta/\sigma_\theta^2 + n\bar{y}/\sigma_y^2\bigr) +\big/ \bigl(1/\sigma_\theta^2 + n/\sigma_y^2\bigr)$. ``` +Examine each limit by letting $\sigma_\theta$ go to $\infty$ or $0$. +```` + ```{solution-start} mv_normal_ex4 :class: dropdown ``` -```{code-cell} python3 -import numpy as np -import matplotlib.pyplot as plt +Here is one solution: +```{code-cell} python3 n_scores = 20 μθ_val, σy_val = 100., 10. -np.random.seed(42) +rng = np.random.default_rng(42) true_θ = 108. -y_obs = true_θ + σy_val * np.random.randn(n_scores) +y_obs = true_θ + σy_val * rng.standard_normal(n_scores) y_bar = np.mean(y_obs) σθ_vals = [1., 5., 10., 50., 500.] 
@@ -2498,19 +2608,34 @@ for σθ_val in σθ_vals: mn_i = MultivariateNormal(μ_i, Σ_i) mn_i.partition(n_scores) μθ_hat, _ = mn_i.cond_dist(1, y_obs) - μθ_hat_vals.append(float(μθ_hat)) + μθ_hat_vals.append(μθ_hat.item()) + +def posterior_mean(σθ_val): + μ_i, Σ_i, _ = construct_moments_IQ(n_scores, μθ_val, σθ_val, σy_val) + mn_i = MultivariateNormal(μ_i, Σ_i) + mn_i.partition(n_scores) + μθ_hat, _ = mn_i.cond_dist(1, y_obs) + return μθ_hat.item() fig, ax = plt.subplots() -ax.semilogx(σθ_vals, μθ_hat_vals, 'o-', label=r'$\hat{\mu}_\theta$') -ax.axhline(y_bar, ls='--', color='r', label=f'sample mean y_bar = {y_bar:.1f}') -ax.axhline(μθ_val, ls=':', color='g', label=f'prior mean μθ = {μθ_val:.0f}') +ax.semilogx(σθ_vals, μθ_hat_vals, 'o-', + label=r'$\hat{\mu}_\theta$') +ax.axhline(y_bar, ls='--', color='r', + label=f'sample mean y_bar = {y_bar:.1f}') +ax.axhline(μθ_val, ls=':', color='g', + label=f'prior mean μθ = {μθ_val:.0f}') ax.set_xlabel(r'$\sigma_\theta$') ax.set_ylabel(r'posterior mean $\hat{\mu}_\theta$') ax.legend() plt.show() +σθ_small = 1e-2 +σθ_large = 1e4 + print(f"y_bar = {y_bar:.4f}") -print(f"Large σθ posterior mean approx {μθ_hat_vals[-1]:.4f}") +print(f"Posterior mean with σθ={σθ_large:.0e}: {posterior_mean(σθ_large):.4f}") +print(f"Posterior mean with σθ={σθ_small:.0e}: {posterior_mean(σθ_small):.4f}") +print(f"Prior mean μθ = {μθ_val:.4f}") ``` ```{solution-end} @@ -2535,9 +2660,11 @@ and initial conditions $\hat{x}_0 = [0, 0]'$, $\Sigma_0 = I_2$: 1. Simulate $T = 60$ periods of $\{x_t, y_t\}$ and run the filter. 1. Plot the sequences of conditional variances $\Sigma_t[0,0]$ and -$\Sigma_t[1,1]$ over time. Verify that they converge to a steady state. +$\Sigma_t[1,1]$ over time. -1. Plot the filtered state estimates $\hat{x}_t[0]$ together with the + Verify that they converge to a steady state. + +1. Plot the filtered state estimates $\tilde{x}_t[0]$ together with the true $x_t[0]$ and the raw observations $y_t$ on a single figure. ``` @@ -2545,10 +2672,9 @@ true $x_t[0]$ and the raw observations $y_t$ on a single figure. :class: dropdown ``` -```{code-cell} python3 -import numpy as np -import matplotlib.pyplot as plt +Here is one solution: +```{code-cell} python3 A_ex = np.array([[0.9, 0.], [0., 0.5]]) C_ex = np.array([[1.], [1.]]) G_ex = np.array([[1., 0.]]) @@ -2558,26 +2684,44 @@ T_ex = 60 x0_hat_ex = np.zeros(2) Σ0_ex = np.eye(2) -np.random.seed(7) +rng = np.random.default_rng(7) x_true = np.zeros((T_ex + 1, 2)) y_seq_ex = np.zeros(T_ex) for t in range(T_ex): - x_true[t + 1] = A_ex @ x_true[t] + C_ex[:, 0] * np.random.randn() - y_seq_ex[t] = G_ex @ x_true[t] + np.random.randn() + x_true[t + 1] = A_ex @ x_true[t] + C_ex[:, 0] * rng.standard_normal() + y_seq_ex[t] = (G_ex @ x_true[t]).item() + rng.standard_normal() -x_hat_seq, Σ_hat_seq = iterate(x0_hat_ex, Σ0_ex, A_ex, C_ex, G_ex, R_ex, y_seq_ex) +x_hat_seq, Σ_hat_seq = iterate( + x0_hat_ex, Σ0_ex, A_ex, C_ex, G_ex, R_ex, y_seq_ex) +# x_hat_seq[t] = E[x_t | y^{t-1}] (one-step-ahead prediction) +# Σ_hat_seq[t] = corresponding prediction-error covariance fig, ax = plt.subplots() ax.plot(Σ_hat_seq[:, 0, 0], label=r'$\Sigma_t[0,0]$') ax.plot(Σ_hat_seq[:, 1, 1], label=r'$\Sigma_t[1,1]$') ax.set_xlabel('t') -ax.set_ylabel('conditional variance') +ax.set_ylabel('prediction-error variance') ax.legend() plt.show() +# The `iterate` function stores one-step-ahead predictions. +# We recover the filtered estimates E[x_t | y^t] by re-applying +# the measurement-update step at each t. 
+n_state = 2 +x_filt_seq = np.empty((T_ex, n_state)) +for t in range(T_ex): + xt_hat = x_hat_seq[t] + Σt = Σ_hat_seq[t] + μ_k = np.hstack([xt_hat, G_ex @ xt_hat]) + Σ_k = np.block([[Σt, Σt @ G_ex.T ], + [G_ex @ Σt, G_ex @ Σt @ G_ex.T + R_ex]]) + mn_k = MultivariateNormal(μ_k, Σ_k) + mn_k.partition(n_state) + x_filt_seq[t], _ = mn_k.cond_dist(0, y_seq_ex[t:t+1]) + fig, ax = plt.subplots() -ax.plot(x_true[1:, 0], label='true $x_t[0]$', alpha=0.7) -ax.plot(x_hat_seq[1:, 0], label=r'filtered $\hat{x}_t[0]$', ls='--') +ax.plot(x_true[:-1, 0], label='true $x_t[0]$', alpha=0.7) +ax.plot(x_filt_seq[:, 0], label=r'filtered $\tilde{x}_t[0]$', ls='--') ax.plot(y_seq_ex, label='observations $y_t$', alpha=0.4, lw=0.8) ax.set_xlabel('t') ax.legend() @@ -2595,14 +2739,22 @@ plt.show() In the classic factor analysis model at the end of the lecture the true covariance is $\Sigma_y = \Lambda \Lambda' + D$. -1. Set $\sigma_u = 2$ (instead of $0.5$). Recompute the fraction of -variance explained by the first two principal components and compare -it with the $\sigma_u = 0.5$ result. Explain the change. +1. Set $\sigma_u = 2$ (instead of $0.5$). + + - Recompute the fraction of + variance explained by the first two principal components and compare + it with the $\sigma_u = 0.5$ result. + - Explain the change. -1. Show that the conditional expectation $E[f \mid Y] = BY$ with -$B = \Lambda^\top \Sigma_y^{-1}$ is **not** equal to the two-component PCA -projection $\hat{Y} = P_{:,1:2}\,\epsilon_{1:2}$. Plot both on the same -axes. +1. Show that the observation-space factor-analytic posterior + $\Lambda E[f \mid Y] = \Lambda B Y$ (an $N$-vector) is **not** equal to + the two-component PCA reconstruction + $\hat{Y} = P_{:,1:2}\,\epsilon_{1:2}$ (also an $N$-vector). + - Plot both on the same axes. + + *Note:* $E[f \mid Y] = BY$ is a $k$-vector and $\hat{Y}$ is an + $N$-vector, so they cannot be compared directly; the comparison must be + made in observation space via $\Lambda E[f \mid Y]$. 1. In one or two sentences, explain why PCA is misspecified for factor-analytic data. @@ -2612,9 +2764,10 @@ factor-analytic data. 
:class: dropdown
```

+Here is one solution:
+
 ```{code-cell} python3
-import numpy as np
-import matplotlib.pyplot as plt
+rng = np.random.default_rng(42)

 N_fa = 10
 k_fa = 2
@@ -2636,40 +2789,50 @@ for σu_val in [0.5, 2.0]:
     print(f"σu={σu_val}: fraction explained by first 2 PCs = {frac:.4f}")

 σu_b = 0.5
-D_b = np.eye(N_fa) * σu_b ** 2
+D_b = np.eye(N_fa) * σu_b ** 2
 Σy_b = Λ_fa @ Λ_fa.T + D_b

 μz_b = np.zeros(k_fa + N_fa)
 Σz_b = np.block([[np.eye(k_fa), Λ_fa.T],
                  [Λ_fa, Σy_b]])
-z_b = np.random.multivariate_normal(μz_b, Σz_b)
-f_b = z_b[:k_fa]
-y_b = z_b[k_fa:]
+z_b = rng.multivariate_normal(μz_b, Σz_b)
+f_b = z_b[:k_fa]
+y_b = z_b[k_fa:]

-B_b = Λ_fa.T @ np.linalg.inv(Σy_b)
+B_b = Λ_fa.T @ np.linalg.inv(Σy_b)
 Efy_b = B_b @ y_b

 λ_b, P_b = np.linalg.eigh(Σy_b)
-ind_b = sorted(range(N_fa), key=lambda x: λ_b[x], reverse=True)
-P_b = P_b[:, ind_b]
-ε_b = P_b.T @ y_b
-y_hat_b = P_b[:, :2] @ ε_b[:2]
+ind_b = sorted(range(N_fa), key=lambda x: λ_b[x], reverse=True)
+P_b = P_b[:, ind_b]
+ε_b = P_b.T @ y_b
+y_hat_b = P_b[:, :2] @ ε_b[:2]

 fig, ax = plt.subplots(figsize=(8, 4))
-ax.scatter(range(N_fa), Λ_fa @ Efy_b, label=r'Factor-analytic $\Lambda E[f\mid y]$')
-ax.scatter(range(N_fa), y_hat_b, marker='x', label=r'PCA projection $\hat{y}$')
-ax.scatter(range(N_fa), Λ_fa @ f_b, marker='^', alpha=0.6, label=r'True signal $\Lambda f$')
+ax.scatter(range(N_fa), Λ_fa @ Efy_b,
+           label=r'Factor-analytic $\Lambda E[f\mid y]$')
+ax.scatter(range(N_fa), y_hat_b,
+           marker='x', label=r'PCA reconstruction $\hat{y}$')
+ax.scatter(range(N_fa), Λ_fa @ f_b,
+           marker='^', alpha=0.6, label=r'True signal $\Lambda f$')
 ax.set_xlabel('observation index')
 ax.legend()
 plt.show()
 ```

-PCA is misspecified for factor-analytic data because it imposes no
-structure on the residual covariance: it decomposes $\Sigma_y$ into
-eigenvectors that need not align with the factor loadings $\Lambda$.
-The factor model, by contrast, correctly separates the covariance into a
-low-rank systematic part $\Lambda\Lambda'$ and a diagonal idiosyncratic
-part $D$, so its conditional expectation $E[f\mid Y]$ is the minimum-variance
-linear estimator of the factors.
+In this symmetric example, PCA does recover the same two-dimensional
+observation-space subspace as the factor model, namely the column space
+of $\Lambda$. But PCA is still misspecified for factor-analytic data,
+because it treats the covariance matrix as an arbitrary matrix to be
+approximated and does not use the special decomposition
+$\Sigma_y = \Lambda \Lambda^\top + D$ into a common part and an
+idiosyncratic noise part.
+
+So the two methods solve different problems. PCA forms $\hat{Y}$ by
+projecting $Y$ onto the span of the two leading principal components,
+which in this example amounts to taking the block means. The factor
+model instead computes $\Lambda E[f \mid Y]$, the conditional mean of the
+latent common component $\Lambda f$ given the data, and because it
+accounts for noise it shrinks those block means toward zero.
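+
+As a numerical check on this interpretation, the next cell verifies two
+facts that hold whenever $D$ is a scalar multiple of the identity (as
+here, where $D = \sigma_u^2 I$) and $\Lambda$ has full column rank: the
+PCA reconstruction coincides with the orthogonal projection of $y$ onto
+the column space of $\Lambda$, and the factor-analytic fit is a shrunken
+version of that projection. This is only a quick sketch; it reuses
+`Λ_fa`, `y_b`, `y_hat_b`, and `Efy_b` from above, and `proj_b` is our
+own name.
+
+```{code-cell} python3
+# orthogonal projection of y_b onto col(Λ); with block-indicator
+# loadings this is exactly the "block means" operation
+proj_b = Λ_fa @ np.linalg.solve(Λ_fa.T @ Λ_fa, Λ_fa.T @ y_b)
+
+# PCA keeps this projection unchanged ...
+print(np.allclose(y_hat_b, proj_b))
+
+# ... while the factor-analytic fit shrinks it toward zero
+print(np.linalg.norm(Λ_fa @ Efy_b) <= np.linalg.norm(y_hat_b))
+```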
```{solution-end} ``` diff --git a/lectures/prob_matrix.md b/lectures/prob_matrix.md index 3a63a54d3..b29820c20 100644 --- a/lectures/prob_matrix.md +++ b/lectures/prob_matrix.md @@ -58,6 +58,8 @@ from scipy.special import comb from mpl_toolkits.mplot3d import Axes3D from matplotlib_inline.backend_inline import set_matplotlib_formats set_matplotlib_formats('retina') + +rng = np.random.default_rng(0) ``` @@ -620,7 +622,7 @@ f = np.array([[0.3, 0.2], [0.1, 0.4]]) f_cum = np.cumsum(f) # draw random numbers -p = np.random.rand(1_000_000) +p = rng.random(1_000_000) x = np.vstack([xs[1]*np.ones(p.shape), ys[1]*np.ones(p.shape)]) # map to the bivariate distribution @@ -777,7 +779,7 @@ class discrete_bijoint: xs = self.xs ys = self.ys f_cum = np.cumsum(self.f) - p = np.random.rand(n) + p = rng.random(n) x = np.empty([2, p.shape[0]]) lf = len(f_cum) lx = len(xs)-1 @@ -979,7 +981,7 @@ Next we can use a built-in `numpy` function to draw random samples, then calc μ= np.array([0, 5]) σ= np.array([[5, .2], [.2, 1]]) n = 1_000_000 -data = np.random.multivariate_normal(μ, σ, n) +data = rng.multivariate_normal(μ, σ, n) x = data[:, 0] y = data[:, 1] ``` @@ -990,7 +992,7 @@ y = data[:, 1] plt.hist(x, bins=1_000, alpha=0.6) μx_hat, σx_hat = np.mean(x), np.std(x) print(μx_hat, σx_hat) -x_sim = np.random.normal(μx_hat, σx_hat, 1_000_000) +x_sim = rng.normal(μx_hat, σx_hat, 1_000_000) plt.hist(x_sim, bins=1_000, alpha=0.4, histtype="step") plt.show() ``` @@ -999,7 +1001,7 @@ plt.show() plt.hist(y, bins=1_000, density=True, alpha=0.6) μy_hat, σy_hat = np.mean(y), np.std(y) print(μy_hat, σy_hat) -y_sim = np.random.normal(μy_hat, σy_hat, 1_000_000) +y_sim = rng.normal(μy_hat, σy_hat, 1_000_000) plt.hist(y_sim, bins=1_000, density=True, alpha=0.4, histtype="step") plt.show() ``` @@ -1059,7 +1061,7 @@ Let's draw from a normal distribution with above mean and variance and check how σx = np.sqrt(np.dot((x - μx)**2, z)) # sample -zz = np.random.normal(μx, σx, 1_000_000) +zz = rng.normal(μx, σx, 1_000_000) plt.hist(zz, bins=300, density=True, alpha=0.3, range=[-10, 10]) plt.show() ``` @@ -1079,7 +1081,7 @@ plt.show() σy = np.sqrt(np.dot((y - μy)**2, z)) # sample -zz = np.random.normal(μy,σy,1_000_000) +zz = rng.normal(μy, σy, 1_000_000) plt.hist(zz, bins=100, density=True, alpha=0.3) plt.show() ``` @@ -1187,7 +1189,7 @@ $$ \text{Prob} \{X=1\}=& q =\mu_{1}\\ \text{Prob} \{Y=0\}=& 1-r =\nu_{0}\\ \text{Prob} \{Y=1\}= & r =\nu_{1}\\ -\text{where } 0 \leq q < r \leq 1 +\text{where } 0 \leq q \leq r \leq 1 \end{aligned} $$ @@ -1309,16 +1311,17 @@ Let's first generate X and Y. 
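+# Note: X and Y are independent here, so we need two separate uniform
+# samples (p_x and p_y below); reusing a single uniform sample for both
+# would instead produce a perfectly dependent (comonotone) coupling.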
# number of draws draws = 1_000_000 -# generate draws from uniform distribution -p = np.random.rand(draws) +# generate independent draws from uniform distribution for X and Y +p_x = rng.random(draws) +p_y = rng.random(draws) -# generate draws of X and Y via uniform distribution +# generate draws of X and Y via independent uniform draws x = np.ones(draws) y = np.ones(draws) -x[p <= μ[0]] = 0 -x[p > μ[0]] = 1 -y[p <= ν[0]] = 0 -y[p > ν[0]] = 1 +x[p_x <= μ[0]] = 0 +x[p_x > μ[0]] = 1 +y[p_y <= ν[0]] = 0 +y[p_y > ν[0]] = 1 ``` ```{code-cell} ipython3 @@ -1370,7 +1373,7 @@ f1_cum = np.cumsum(f1) draws1 = 1_000_000 # generate draws from uniform distribution -p = np.random.rand(draws1) +p = rng.random(draws1) # generate draws of first coupling via uniform distribution c1 = np.vstack([np.ones(draws1), np.ones(draws1)]) @@ -1445,7 +1448,7 @@ f2_cum = np.cumsum(f2) draws2 = 1_000_000 # generate draws from uniform distribution -p = np.random.rand(draws2) +p = rng.random(draws2) # generate draws of second coupling via uniform distribution c2 = np.vstack([np.ones(draws2), np.ones(draws2)]) @@ -1533,7 +1536,7 @@ mystnb: n_cop = 100_000 # Draw from bivariate standard normal with correlation ρ_cop -z = np.random.multivariate_normal( +z = rng.multivariate_normal( [0, 0], [[1, ρ_cop], [ρ_cop, 1]], n_cop ) @@ -1592,6 +1595,8 @@ where $X \in \{0,1\}$ and $Y \in \{10, 20\}$. :class: dropdown ``` +Here is one solution: + ```{code-cell} ipython3 F = np.array([[0.3, 0.2], [0.1, 0.4]]) @@ -1635,6 +1640,8 @@ Using the same joint distribution $F$ and values $X \in \{0,1\}$, $Y \in \{10, 2 :class: dropdown ``` +Here is one solution: + ```{code-cell} ipython3 xs = np.array([0, 1]) ys = np.array([10, 20]) @@ -1676,7 +1683,7 @@ and therefore $\text{Cov}(X,Y) = \mathbb{E}[XY] - \mathbb{E}[X]\mathbb{E}[Y] = 0 **Sum of Two Dice** -Let $X$ and $Y$ each be uniformly distributed on $\{1,2,3,4,5,6\}$, and let $Z = X + Y$. +Let $X$ and $Y$ be **independent** random variables, each uniformly distributed on $\{1,2,3,4,5,6\}$, and let $Z = X + Y$. 1. Use the convolution formula $h_k = \sum_i f_i g_{k-i}$ to compute the distribution of $Z$. @@ -1691,6 +1698,8 @@ Let $X$ and $Y$ each be uniformly distributed on $\{1,2,3,4,5,6\}$, and let $Z = :class: dropdown ``` +Here is one solution: + ```{code-cell} ipython3 f = np.ones(6) / 6 g = np.ones(6) / 6 @@ -1703,7 +1712,7 @@ h = [ z_vals = np.arange(2, 13) n = 1_000_000 -z_sim = np.random.randint(1, 7, n) + np.random.randint(1, 7, n) +z_sim = rng.integers(1, 7, n) + rng.integers(1, 7, n) counts = np.bincount(z_sim, minlength=13)[2:] fig, ax = plt.subplots() @@ -1747,6 +1756,8 @@ where $p_{ij} = \text{Prob}\{X(t+1)=j \mid X(t)=i\}$. :class: dropdown ``` +Here is one solution: + ```{code-cell} ipython3 P = np.array([[0.9, 0.1], [0.2, 0.8]]) @@ -1774,25 +1785,29 @@ print(f"ψ_100 close to stationary? {np.allclose(ψ_100, ψ_star, atol=1e-6)}") A coin has unknown bias $\theta \in \{0.2,\, 0.5,\, 0.8\}$ with prior $\pi = [0.25,\, 0.50,\, 0.25]$. +Assume that, conditional on $\theta$, the coin flips are i.i.d. Bernoulli($\theta$). + 1. After observing $k = 7$ heads in $n = 10$ flips, compute the likelihood -$$ -\mathcal{L}(\theta \mid \text{data}) = \binom{10}{7}\,\theta^7\,(1-\theta)^3 -$$ + $$ + \mathcal{L}(\theta \mid \text{data}) = \binom{10}{7}\,\theta^7\,(1-\theta)^3 + $$ -for each $\theta$. + for each $\theta$. -1. Apply equation {eq}`eq:condprobbayes` to compute the posterior $\pi(\theta \mid \text{data})$. +2. 
Apply equation {eq}`eq:condprobbayes` to compute the posterior $\pi(\theta \mid \text{data})$. -1. Plot the prior and posterior side by side. +3. Plot the prior and posterior side by side. -1. Repeat for $k = 3$ heads and describe how the posterior shifts. +4. Repeat for $k = 3$ heads and describe how the posterior shifts. ``` ```{solution-start} prob_matrix_ex5 :class: dropdown ``` +Here is one solution: + ```{code-cell} ipython3 θ_vals = np.array([0.2, 0.5, 0.8]) π = np.array([0.25, 0.50, 0.25]) @@ -1822,7 +1837,6 @@ for ax, post, title in zip( ax.set_ylabel('Probability') ax.set_title(title) ax.legend() -plt.tight_layout() plt.show() ```
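+
+As a quick numerical check of part 4, we can recompute the $k = 3$
+posterior directly (a sketch reusing `θ_vals`, `π`, and `comb` from
+above; the names `lik_3` and `post_3` are ours):
+
+```{code-cell} ipython3
+# likelihood of k = 3 heads in n = 10 flips for each candidate θ
+lik_3 = comb(10, 3) * θ_vals**3 * (1 - θ_vals)**7
+
+# Bayes' rule: posterior is proportional to prior times likelihood
+post_3 = π * lik_3
+post_3 /= post_3.sum()
+print(post_3)
+```
+
+Relative to the $k = 7$ case, posterior mass shifts from
+$\theta \in \{0.5, 0.8\}$ to $\theta \in \{0.2, 0.5\}$: with $k = 3$,
+$\theta = 0.8$ receives almost no weight, while $\theta = 0.2$ and
+$\theta = 0.5$ end up with roughly equal posterior probability.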