diff --git a/lectures/_toc.yml b/lectures/_toc.yml index cc420c3e..c9f39454 100644 --- a/lectures/_toc.yml +++ b/lectures/_toc.yml @@ -86,6 +86,7 @@ parts: chapters: - file: simple_linear_regression - file: mle + - file: bayes_intro - caption: Other numbered: true chapters: diff --git a/lectures/bayes_intro.md b/lectures/bayes_intro.md new file mode 100644 index 00000000..adf2066b --- /dev/null +++ b/lectures/bayes_intro.md @@ -0,0 +1,789 @@ +--- +jupytext: + text_representation: + extension: .md + format_name: myst + format_version: 0.13 + jupytext_version: 1.17.2 +kernelspec: + display_name: Python 3 (ipykernel) + language: python + name: python3 +--- + +# Introduction to Bayesian Methods + +## Overview + +In this lecture we study one of the most important ideas in statistics: how to update our beliefs about an unknown quantity as new data arrives. + +The technique we will use is called **Bayesian updating**. + +We start with a belief about some unknown number. + +As we observe data, we revise that belief in a way that is mathematically precise. + +We will develop these ideas through an example drawn from development +finance: estimating the default rate on a new type of loan. + +Along the way we will meet conditional probability, Bayes' law, the Bernoulli +and binomial distributions, and the beta distribution. + +Let's begin by importing the libraries we need. + +```{code-cell} ipython3 +import numpy as np +import matplotlib.pyplot as plt +from scipy.stats import beta, binom +``` + +## Conditional probability + +Before we can talk about updating beliefs, we need to recall the idea of **conditional probability**. + +### Definition + +Suppose $A$ and $B$ are two events with $P(B) > 0$. + +The probability of $A$ given that $B$ has occurred is written $P(A \mid B)$ and defined by + +$$ + P(A \mid B) = \frac{P(A \cap B)}{P(B)}. +$$ (eq:cond_prob) + +Here $A \cap B$ is the event that both $A$ and $B$ occur. + +The intuition is that learning $B$ has occurred restricts us to the world in which $B$ is true. + +Within that restricted world, we ask how likely $A$ is. + +### Example: risky borrowing + +A bank classifies its borrowers as either *low risk* or *high risk*. + +Among all borrowers, 80% are low risk and 20% are high risk. + +The conditional default probabilities are: + +* low-risk borrowers: 0.05 +* high-risk borrowers: 0.40 + +Suppose we pick a borrower at random and find that they defaulted. + +What is the probability that the borrower was high risk? + +Let $H = $ high risk, $L = $ low risk and $D = $ default. + +We know that $P(H) = 0.2$, $P(L) = 0.8$, $P(D \mid H) = 0.40$ and $P(D \mid L) = 0.05$. + +The probability that a borrower is both high risk and defaults is + +$$ + P(H \cap D) = P(D \mid H)\, P(H) = 0.40 \times 0.2 = 0.08. +$$ + +The overall probability of default can be computed from the [law of total probability](https://en.wikipedia.org/wiki/Law_of_total_probability): + +$$ + P(D) = P(D \mid H)\, P(H) + P(D \mid L)\, P(L) = 0.40 \times 0.2 + 0.05 \times 0.8 = 0.08 + 0.04 = 0.12. +$$ + +Applying {eq}`eq:cond_prob`, we get + +$$ + P(H \mid D) = \frac{P(H \cap D)}{P(D)} = \frac{0.08}{0.12} \approx 0.667. +$$ + +So observing a default raises our assessment that the borrower is high risk from 20% to about 67%. + + +### Bayes' law + +The last computation was, in fact, a typical Bayesian calculation. + +Let's clarify further by returning to a setting with two abstract events $A$ and $B$. + +Notice that the event $A \cap B$ (both $A$ and $B$ occur) is the same as $B \cap A$. + +Hence, applying definition {eq}`eq:cond_prob` two ways, we have + +$$ + P(A \mid B)\, P(B) = P(A \cap B) = P(B \mid A)\, P(A). +$$ + +Dividing through by $P(B)$ gives **Bayes' law**: + +$$ + P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}. +$$ (eq:bayes_law) + +In this setting, we call + +* $P(A)$ the **prior** — our belief about $A$ before seeing data, +* $P(B \mid A)$ the **likelihood** — how probable the data $B$ is when $A$ is true, and +* $P(A \mid B)$ the **posterior** — our updated belief about $A$ after seeing $B$. + +The denominator $P(B)$ is a normalizing constant that makes the posterior probabilities sum to one. + +```{prf:example} +:label: bayes_ex_cond +The borrower calculation we completed above was an application of Bayes' law with $A = H$ and $B = D$. + +Bayes' law tells us how to convert $P(D \mid H)$, which was assumed to be known, into $P(H \mid D)$, which we want. +``` + +## A microloan default problem + +Now we turn to our main example. + +A development bank is entering a new lending market — say, smallholder farmers in a region where no historical lending data exists. + +The bank makes a series of small loans (microloans). + +Each loan either **defaults** or is **repaid**. + +We encode the outcome of loan $i$ as a random variable $Y_i$, where + +$$ + Y_i = + \begin{cases} + 1 & \text{if loan } i \text{ defaults}, \\ + 0 & \text{if loan } i \text{ is repaid}. + \end{cases} +$$ + +Let $\theta$ be the probability that any given loan defaults. + +We assume that, given $\theta$, the outcomes $Y_1, Y_2, \ldots$ are independent draws with + +$$ + P(Y_i = 1 \mid \theta) = \theta, + \qquad + P(Y_i = 0 \mid \theta) = 1 - \theta. +$$ + +A random variable of this form is called a **Bernoulli** random variable with parameter $\theta$. + +The catch is that the bank does **not** know $\theta$. + +Since the market is new, $\theta$ is uncertain. + +At the same time, the bank has experience in similar markets, so it does not start from total ignorance. + +The Bayesian approach is to treat $\theta$ as a random quantity and describe our beliefs about it with a probability density $\pi$ on the interval $[0, 1]$. + +This density is called the **prior**. + +One possible option for $\pi$ is to use a [beta distribution](https://en.wikipedia.org/wiki/Beta_distribution), whose density on $[0,1]$ is + +$$ + \pi(\theta) = \frac{\theta^{a - 1} (1 - \theta)^{b - 1}}{B(a, b)}, + \qquad + B(a, b) = \int_0^1 t^{a-1}(1-t)^{b-1}\, dt, +$$ + +for parameters $a > 0$ and $b > 0$. + +The denominator $B(a, b)$ is called the **beta function**, and it is simply the +constant that makes the density integrate to one. + +For our purposes the important part is the shape, $\theta^{a-1}(1-\theta)^{b-1}$, which is all that depends on $\theta$. + +One nice property of the beta distribution is that, by varying $a$ and $b$, we can represent a wide range of beliefs. + +The four examples below range from "no idea at all" to at least some opinion as to whether $\theta$ is +likely to take a low or high value. + +```{code-cell} ipython3 +θ_grid = np.linspace(0, 1, 500) +plot_grid = np.linspace(0.001, 0.999, 500) # avoids the spikes at the endpoints +params = [(0.5, 0.5), (1, 1), (2, 5), (8, 3)] + +fig, axes = plt.subplots(2, 2, figsize=(10, 6)) +for (a, b), ax in zip(params, axes.flatten()): + ax.plot(plot_grid, beta.pdf(plot_grid, a, b), lw=2) + ax.set_title(f"Beta({a}, {b})") + ax.set_xlabel(r"$\theta$") +fig.tight_layout() +plt.show() +``` + +The shapes are strikingly different. + +$\text{Beta}(1, 1)$ is flat — it expresses complete ignorance, treating every default rate as equally likely. + +$\text{Beta}(0.5, 0.5)$ piles weight near 0 and 1, a belief that the market is probably either very safe or very risky. + +$\text{Beta}(2, 5)$ leans toward low default rates, while $\text{Beta}(8, 3)$ leans toward high ones. + + + +## A one-step update + +Now suppose the bank observes the outcome of a single loan. + +Call this outcome $y \in \{0, 1\}$. + +We want to update the prior $\pi(\theta)$ into a **posterior** $\pi(\theta \mid y)$. + +The probability of $y$ given $\theta$ is + +$$ + p(y \mid \theta) = \theta^{y} (1 - \theta)^{1 - y}. +$$ (eq:bernoulli_lik) + +This formula (which is called the Bernoulli distribution) gives us the right numbers: + +* The default outcome $y = 1$ yields $\theta^1 (1-\theta)^0 = \theta$, the probability of a default. +* The repayment outcome $y = 0$ yields $\theta^0 (1-\theta)^1 = 1 - \theta$, the probability of repayment. + +Bayes' law for a continuous parameter takes the form + +$$ + \pi(\theta \mid y) = \frac{p(y \mid \theta)\, \pi(\theta)}{\int_0^1 p(y \mid t)\, \pi(t)\, dt}. +$$ (eq:bayes_density) + +This is the exact analogue of {eq}`eq:bayes_law`. + +The numerator is likelihood times prior. + +The denominator is an integral that sums the numerator over all possible values of $\theta$, ensuring the posterior integrates to one. + +Substituting the Bernoulli likelihood {eq}`eq:bernoulli_lik` gives our complete one-step update rule: + +$$ + \pi(\theta \mid y) = \frac{\theta^{y}(1-\theta)^{1-y}\, \pi(\theta)}{\int_0^1 t^{y}(1-t)^{1-y}\, \pi(t)\, dt}. +$$ (eq:bayes_update) + +This rule is valid for any $\theta$. + +Hence we can think of it as taking any prior density $\pi(\cdot)$ and any single +observation $y$ and converting it into the posterior density $\pi(\cdot \mid y)$. + + +## Computing the update numerically + +How can we compute the posterior density? + +### Prior + +First let's set up the prior. + +For our development bank we will begin with the prior $\pi = \text{Beta}(2, 5)$. + +```{code-cell} ipython3 +a_0, b_0 = 2, 5 + +def pi(θ): + return beta.pdf(θ, a_0, b_0) +``` + +This prior puts most of its weight on default rates below 0.5, with a peak around 0.2, reflecting cautious optimism together with genuine uncertainty. + + +### Normalizing constant + +Next we need to compute the integral in the denominator of {eq}`eq:bayes_density`. + +One general approach is to compute it numerically, using a technique +such as the [trapezoidal rule](https://en.wikipedia.org/wiki/Trapezoidal_rule). + +We fix a grid of points across $[0, 1]$ and represent each density by its values at those grid points. + +Every integral then becomes a sum that `numpy` can evaluate for us. + +The idea of the trapezoidal rule is to join neighboring grid points by straight lines and sum the areas of the resulting trapezoids. + +The figure below illustrates this for the integrand $p(y \mid \theta)\, \pi(\theta)$ with $y = 1$, using a coarse grid so the trapezoids are visible. + +```{code-cell} ipython3 +:tags: [hide-input] + +fine = np.linspace(0, 1, 500) +integrand = fine * beta.pdf(fine, a_0, b_0) # θ^1 (1-θ)^0 × prior + +coarse = np.linspace(0, 1, 9) # eight trapezoids +heights = coarse * beta.pdf(coarse, a_0, b_0) + +fig, ax = plt.subplots() +ax.plot(fine, integrand, lw=2, label=r"$p(y \mid \theta)\,\pi(\theta)$") +ax.fill_between(coarse, heights, alpha=0.3, + label="trapezoidal approximation") +ax.plot(coarse, heights, 'o-', color='C1', lw=1, ms=4) +for x, h in zip(coarse, heights): # draw the trapezoid edges + ax.plot([x, x], [0, h], color='C1', lw=0.8, alpha=0.6) +ax.set_xlabel(r"$\theta$") +ax.set_ylabel("integrand") +ax.legend() +plt.show() +``` + +The finer the grid, the closer the shaded region gets to the true area under the curve. + +Let's build the update in two steps. + +First, recall that the denominator in {eq}`eq:bayes_density` is the integrated likelihood times the prior: + +$$ + \int_0^1 p(y \mid t)\, \pi(t)\, dt . +$$ + +The function below computes this constant on the grid, approximating the integral with `np.trapezoid`. + +```{code-cell} ipython3 +def normalizing_constant(y): + "Compute the denominator of Bayes' law: likelihood times prior, integrated." + t_grid = np.linspace(0, 1, 100) # linear grid of 100 points + likelihood_vals = t_grid**y * (1 - t_grid)**(1 - y) + prior_vals = pi(t_grid) + integrand = likelihood_vals * prior_vals + return np.trapezoid(integrand, t_grid) +``` + +### Calculating the posterior + +The second step divides the (unnormalized) product of likelihood and prior by this constant. + +The result is the posterior density, evaluated on the grid. + +```{code-cell} ipython3 +def update(y, θ_grid): + """ + Compute posterior values pi(θ, y) on the set of grid points θ_grid + after observing outcome y. + """ + likelihood_vals = θ_grid**y * (1 - θ_grid)**(1 - y) + prior_vals = pi(θ_grid) + return likelihood_vals * prior_vals / normalizing_constant(y) +``` + +Let's start from our Beta(2, 5) prior and suppose that we observe a single default ($y = 1$). + +```{code-cell} ipython3 +θ_grid = np.linspace(0, 1, 500) +prior_vals = pi(θ_grid) +posterior_vals = update(y=1, θ_grid=θ_grid) + +fig, ax = plt.subplots() +ax.plot(θ_grid, prior_vals, lw=2, label="prior") +ax.plot(θ_grid, posterior_vals, lw=2, label="posterior after one default") +ax.set_xlabel(r"$\theta$") +ax.set_ylabel("density") +ax.legend() +plt.show() +``` + +Observing a default shifts the posterior to the right, toward higher default probabilities. + +This makes sense: a default is evidence that $\theta$ might be larger than we thought. + +If instead the loan had been repaid, the posterior would shift left. + +```{code-cell} ipython3 +posterior_repaid = update(y=0, θ_grid=θ_grid) + +fig, ax = plt.subplots() +ax.plot(θ_grid, prior_vals, lw=2, label="prior") +ax.plot(θ_grid, posterior_repaid, lw=2, label="posterior after one repayment") +ax.set_xlabel(r"$\theta$") +ax.set_ylabel("density") +ax.legend() +plt.show() +``` + + +## Iterating the update + +Here is a key observation: the posterior after one step is itself a perfectly good prior for the next step. + +So we can repeat the update as each new loan resolves. + +Starting from a prior $\pi_0$, observing $Y_1$ gives a posterior $\pi_1$. + +Treating $\pi_1$ as the new prior and observing $Y_2$ gives $\pi_2$, and so on. + +This produces a sequence of densities $\pi_0, \pi_1, \pi_2, \ldots$ that captures our evolving beliefs. + +Let's simulate a stream of loan outcomes and watch the beliefs evolve. + +We will generate data from a "true" default rate $\theta^* = 0.15$, which the bank does not know. + +```{code-cell} ipython3 +θ_true = 0.15 +n = 100 +rng = np.random.default_rng(seed=42) +# Generate n draws from the Bernoulli distribution with θ = θ_true +outcomes = (rng.random(n) < θ_true).astype(int) +``` + +Now we iterate the grid update over these outcomes, recording the posterior at a few selected stages. + +```{code-cell} ipython3 +snapshots = [1, 5, 20, 100] + +# Set up the prior Beta(a_0, b_0) as values on a grid +current_vals = pi(θ_grid) + +# Plot the prior +fig, ax = plt.subplots() +ax.plot(θ_grid, current_vals, 'k-', lw=2, alpha=0.7, label="prior") + +# Update the density (on the grid) and plot +for i in range(1, n + 1): + y = outcomes[i - 1] + likelihood_vals = θ_grid**y * (1 - θ_grid)**(1 - y) + current_vals = likelihood_vals * current_vals + current_vals /= np.trapezoid(current_vals, θ_grid) + if i in snapshots: + ax.plot(θ_grid, current_vals, lw=2, label=f"posterior after {i} loans") + +ax.axvline(θ_true, color='k', ls=':', label=r"true $\theta^*$") +ax.set_xlabel(r"$\theta$") +ax.set_ylabel("density") +ax.legend() +plt.show() +``` + +Two things stand out. + +First, the posterior **concentrates** around the true value $\theta^* = 0.15$ as more loans resolve. + +Second, the posterior becomes **tighter** — our uncertainty about $\theta$ steadily shrinks. + +Early on, the prior has a strong influence on our beliefs. + +As data accumulates, that influence fades and the data takes over. + +## Further observations + +Let's make two additional observations regarding the model described above. + +The first shows that we can do the update steps above with pencil and paper in +the special case of a beta prior. + +The second observation concerns batch updating. + + +### A closed form: the beta prior + +The numerical updating rule provided above always works, regardless of our +choice of prior. + +But in the beta case we get a nice analytical approach as well. + +Recall that this prior is proportional to $\theta^{a-1}(1-\theta)^{b-1}$. + +When we multiply this expression by the Bernoulli likelihood $\theta^{y}(1-\theta)^{1-y}$, we get + +$$ + \theta^{y}(1-\theta)^{1-y} \cdot \theta^{a-1}(1-\theta)^{b-1} + = \theta^{(a + y) - 1}(1-\theta)^{(b + 1 - y) - 1}. +$$ + +The right-hand side has the form of another beta density. + +So if the prior is $\text{Beta}(a, b)$, the posterior is again a beta distribution — with updated parameters. + +We say that the beta distribution is a **conjugate prior** for the Bernoulli likelihood. + +The update rule for the parameters is very simple: + +- a default ($y = 1$) sends $(a, b) \mapsto (a + 1,\, b)$, +- a repayment ($y = 0$) sends $(a, b) \mapsto (a,\, b + 1)$. + +Let's verify that this closed form agrees with our numerical computation. + +We overlay the analytical $\text{Beta}(a_0 + 1, b_0)$ posterior on the grid posterior after one default. + +```{code-cell} ipython3 +closed_form = beta.pdf(θ_grid, a_0 + 1, b_0) + +fig, ax = plt.subplots() +ax.plot(θ_grid, posterior_vals, lw=4, alpha=0.5, label="numerical posterior") +ax.plot(θ_grid, closed_form, ls='--', lw=2, label=f"Beta({a_0 + 1}, {b_0})") +ax.set_xlabel(r"$\theta$") +ax.set_ylabel("density") +ax.legend() +plt.show() +``` + +The two curves lie exactly on top of one another, confirming that our numerics and the closed form agree. + + +### The batch update via the binomial likelihood + +There is a second, equally natural way to think about the problem of belief +updating given loan outcome data. + +Instead of processing outcomes one at a time, suppose we wait and observe all $n$ outcomes $Y_1, \ldots, Y_n$ together. + +Then we update directly from the prior $\pi_0$ to the posterior $\pi_n$ in a single step. + +Let the total number of defaults be given by + +$$ + k = \sum_{i=1}^n Y_i. +$$ + +The probability of observing exactly $k$ defaults out of $n$ loans is given by the **binomial** distribution: + +$$ + p(k \mid \theta) = \binom{n}{k} \theta^{k} (1 - \theta)^{n - k}. +$$ (eq:binom_lik) + +Bayes' law then gives the one-shot update + +$$ + \pi_n(\theta) = \frac{p(k \mid \theta)\, \pi_0(\theta)}{\int_0^1 p(k \mid t)\, \pi_0(t)\, dt}. +$$ + +With a $\text{Beta}(a, b)$ prior, conjugacy again gives a clean closed form. + +The binomial likelihood is proportional to $\theta^{k}(1-\theta)^{n-k}$, so the posterior is + +$$ + \text{Beta}(a + k,\ b + n - k). +$$ + +In words: add the number of defaults to $a$, and add the number of repayments to $b$. + + +## Sequential and batch updates agree + +We now have two routes from prior to posterior. + +One processes the $n$ outcomes one at a time; the other processes them all at once. + +Reassuringly, they give exactly the same answer. + +We can see why with a short algebraic argument. + +The likelihood of the full sequence $Y_1, \ldots, Y_n$, for given $\theta$, is the product of the individual Bernoulli likelihoods: + +$$ + \prod_{i=1}^n \theta^{Y_i}(1-\theta)^{1 - Y_i} + = \theta^{k}(1-\theta)^{n-k}. +$$ + +This is identical to the binomial likelihood {eq}`eq:binom_lik`, except for the factor $\binom{n}{k}$. + +But $\binom{n}{k}$ does not depend on $\theta$, so it cancels between the numerator and denominator when we normalize. + +Hence the two posteriors are equal. + +```{note} +It might appear that the sequential update never used the assumption that the outcomes are independent, while the batch update clearly did. + +In fact both rely on it. + +The honest update at step $i$ conditions on everything seen so far: + +$$ + \pi_i(\theta) \propto p(Y_i \mid \theta, Y_1, \ldots, Y_{i-1})\, \pi_{i-1}(\theta). +$$ + +When we replace $p(Y_i \mid \theta, Y_1, \ldots, Y_{i-1})$ by the single-outcome likelihood $p(Y_i \mid \theta)$, we are assuming that, given $\theta$, the new outcome is independent of the past. + +This is exactly the conditional independence that lets the batch likelihood factor into a product. + +So the assumption is doing the same work in both routes — it is just hidden in the sequential one. +``` + + +## From posterior to loan pricing + +Why does any of this matter for the bank? + +The whole point of estimating $\theta$ is to make better lending decisions. + +Let's discuss how this can be done. + +### Expected loss + +Suppose each loan has size 1, and the bank loses the full amount when a loan defaults. + +Then the **expected loss** on a new loan is just the probability of default. + +Given the data, our best estimate of that probability is the **posterior mean**. + +Recall that the posterior is $\text{Beta}(a + k,\ b + n - k)$. + +The [mean of a beta distribution](https://en.wikipedia.org/wiki/Beta_distribution) $\text{Beta}(\alpha, \beta)$ is $\alpha / (\alpha + \beta)$, so here + +$$ + \mathbb{E}[\theta \mid \text{data}] = \frac{a + k}{a + b + n}. +$$ + +This is the number the bank should plug into its pricing: to break even, the interest rate must at least cover the expected loss. + +Let's track how the expected loss, and our uncertainty about it, evolve as loans resolve. + +```{code-cell} ipython3 +cum_defaults = np.cumsum(outcomes) +loans = np.arange(1, n + 1) + +a_n = a_0 + cum_defaults +b_n = b_0 + loans - cum_defaults + +# Use standard formulas to compute the mean and standard deviation +# at every n (vectorized) +post_mean = a_n / (a_n + b_n) +post_std = np.sqrt(a_n * b_n / ((a_n + b_n)**2 * (a_n + b_n + 1))) + +fig, ax = plt.subplots() +ax.plot(loans, post_mean, lw=2, label="posterior mean (expected loss)") +ax.fill_between(loans, post_mean - post_std, post_mean + post_std, + alpha=0.2, label="± one posterior std. dev.") +ax.axhline(θ_true, color='k', ls=':', label=r"true $\theta^*$") +ax.set_xlabel("number of loans observed") +ax.set_ylabel(r"estimate of $\theta$") +ax.legend() +plt.show() +``` + +As the bank observes more loans, its estimate of the expected loss settles down near the true default rate. + +At the same time, the band of uncertainty narrows. + +This is the practical payoff of Bayesian updating: the bank can price cautiously when data is scarce, and sharpen its pricing as experience accumulates. + +### The break-even interest rate + +Let's turn the expected loss into an actual interest rate. + +Consider a loan of size 1, carrying interest rate $r$. + +If the borrower repays, the bank receives $1 + r$. + +If the borrower defaults, the bank receives nothing. + +Suppose the bank is risk neutral, faces no funding costs, and operates in a competitive market that drives expected profits to zero. + +The zero-profit condition sets the expected repayment equal to the amount lent: + +$$ + (1 - \theta)(1 + r) = 1 . +$$ + +Solving for $r$ gives the break-even interest rate + +$$ + r = \frac{\theta}{1 - \theta} . +$$ + +This rate has a clean interpretation: the expected interest income $(1 - \theta) r$ exactly equals the expected loss $\theta$. + +Of course the bank does not know $\theta$. + +Because the bank is risk neutral, only the mean of $\theta$ enters the calculation, so it replaces $\theta$ by its posterior mean $\hat\theta = \mathbb{E}[\theta \mid \text{data}]$: + +$$ + r = \frac{\hat\theta}{1 - \hat\theta} . +$$ + +Let's see how this break-even rate evolves as the loan book grows. + +```{code-cell} ipython3 +implied_rate = post_mean / (1 - post_mean) + +fig, ax = plt.subplots() +ax.plot(loans, implied_rate, lw=2, label="break-even rate") +ax.axhline(θ_true / (1 - θ_true), color='k', ls=':', + label=r"rate at true $\theta^*$") +ax.set_xlabel("number of loans observed") +ax.set_ylabel(r"interest rate $r$") +ax.legend() +plt.show() +``` + +Early on, with little data, the rate reflects the prior and is somewhat unstable. + +As loans resolve, it settles toward $\theta^* / (1 - \theta^*) \approx 0.176$, the rate the bank would charge if it knew the true default probability. + +## Exercises + +```{exercise} +:label: bayes_ex1 + +The prior matters a lot when data is scarce, but its influence should fade as data accumulates. + +Illustrate this by repeating the iteration of the previous sections from two very different priors: + +* an *optimistic* prior $\text{Beta}(2, 8)$, which expects a low default rate, and +* a *skeptical* prior $\text{Beta}(8, 2)$, which expects a high default rate. + +Use the same simulated data (with $\theta^* = 0.15$) for both. + +Plot the two posterior means as a function of the number of loans observed, and confirm that they converge as $n$ grows. +``` + +```{solution-start} bayes_ex1 +:class: dropdown +``` + +We can use the conjugate update rule directly, since both priors are beta distributions. + +```{code-cell} ipython3 +priors = {"optimistic Beta(2, 8)": (2, 8), + "skeptical Beta(8, 2)": (8, 2)} + +cum_defaults = np.cumsum(outcomes) +loans = np.arange(1, n + 1) + +fig, ax = plt.subplots() +for label, (a, b) in priors.items(): + a_seq = a + cum_defaults + b_seq = b + loans - cum_defaults + ax.plot(loans, a_seq / (a_seq + b_seq), lw=2, label=label) + +ax.axhline(θ_true, color='k', ls=':', label=r"true $\theta^*$") +ax.set_xlabel("number of loans observed") +ax.set_ylabel("posterior mean") +ax.legend() +plt.show() +``` + +The two estimates start far apart, reflecting the disagreement between the priors. + +As loans accumulate, both are pulled toward the true value $\theta^* = 0.15$ and the gap between them shrinks. + +With enough data, the choice of prior barely matters — the data dominates. + +```{solution-end} +``` + +```{exercise} +:label: bayes_ex2 + +Suppose the bank starts with a $\text{Beta}(2, 5)$ prior and then observes 200 loans, of which 26 default. + +1. Compute the posterior mean default rate. +2. Compute a 90% **credible interval** for $\theta$ — an interval that contains $\theta$ with posterior probability 0.90. + +For the credible interval, use the 5th and 95th percentiles of the posterior, available via `beta.ppf`. +``` + +```{solution-start} bayes_ex2 +:class: dropdown +``` + +The posterior is $\text{Beta}(a_0 + k,\ b_0 + n - k)$ with $a_0 = 2$, $b_0 = 5$, $n = 200$ and $k = 26$. + +```{code-cell} ipython3 +a_post = 2 + 26 +b_post = 5 + (200 - 26) + +post_mean = a_post / (a_post + b_post) +lower, upper = beta.ppf([0.05, 0.95], a_post, b_post) + +print(f"Posterior: Beta({a_post}, {b_post})") +print(f"Posterior mean default rate: {post_mean:.4f}") +print(f"90% credible interval: ({lower:.4f}, {upper:.4f})") +``` + +The bank's best estimate of the default rate is about 13.5%, and it is 90% sure that the true rate lies within the reported interval. + +These two numbers — a point estimate and a measure of uncertainty — are exactly what is needed to price the loans. + +```{solution-end} +```