From fd69f1370e8b0ead63562a88f71962459986201c Mon Sep 17 00:00:00 2001 From: John Stachurski Date: Wed, 27 May 2026 06:44:15 +1000 Subject: [PATCH 01/11] Fix typos, spelling, notation errors, and swapped bounds in prob_meaning MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Fix spelling: probabilties, probabililty (x6), statististian - Fix doubled word: "to to help" - Fix variance formula: remove erroneous factor of n (rho is Bernoulli, not binomial) - Fix notation: P_{k,i} → rho_{k,i} to match definition - Fix subject-verb agreement: "means converges" → "mean converges" - Fix swapped upper/lower bounds in part (e) ppf calls - Fix compare() to include k=0 - Fix LaTeX: replace * with \cdot for multiplication - Fix log(I) range: text said 2 to 7, code has 2 to 6 - Fix imprecise wording: f_k^I approximates Prob(X=k|θ), not θ - Clarify vague exercise pm_ex1 part 3 Co-Authored-By: Claude Opus 4.7 (1M context) --- lectures/prob_meaning.md | 46 ++++++++++++++++++++-------------------- 1 file changed, 23 insertions(+), 23 deletions(-) diff --git a/lectures/prob_meaning.md b/lectures/prob_meaning.md index dfde21873..935120d31 100644 --- a/lectures/prob_meaning.md +++ b/lectures/prob_meaning.md @@ -33,7 +33,7 @@ After you watch that video, please watch the following video on the Bayesian app ```{youtube} Pahyv9i_X2k ``` -After you are familiar with the material in these videos, this lecture uses the Socratic method to to help consolidate your understanding of the different questions that are answered by +After you are familiar with the material in these videos, this lecture uses the Socratic method to help consolidate your understanding of the different questions that are answered by * a frequentist confidence interval @@ -73,7 +73,7 @@ Empowered with these Python tools, we'll now explore the two meanings described Consider the following classic example. -The random variable $X $ takes on possible values $k = 0, 1, 2, \ldots, n$ with probabilties +The random variable $X $ takes on possible values $k = 0, 1, 2, \ldots, n$ with probabilities $$ \textrm{Prob}(X = k | \theta) = @@ -128,7 +128,7 @@ As usual, a law of large numbers justifies this answer. 2. Please use your code to compute $f_k^I, k = 0, \ldots , n$ and compare them to $\textrm{Prob}(X = k | \theta)$ for various values of $\theta, n$ and $I$ -3. With the Law of Large numbers in mind, use your code to say something +3. With the Law of Large Numbers in mind, use your code to describe the relationship between $f_k^I$ and $\textrm{Prob}(X = k | \theta)$ as $I$ grows ``` ```{solution-start} pm_ex1 @@ -189,10 +189,10 @@ class frequentist: comp = pt.PrettyTable() comp.field_names = ['k', 'Theoretical', 'Frequentist'] self.draw() - for i in range(n): - self.binomial(i+1) - self.compute_fk(i+1) - comp.add_row([i+1, self.P, self.f_kI]) + for i in range(n+1): + self.binomial(i) + self.compute_fk(i) + comp.add_row([i, self.P, self.f_kI]) print(comp) ``` @@ -283,7 +283,7 @@ plt.show() **Comparison with different $I$** -Now we fix $\theta=0.7, n=20, k=10$ and vary $\log(I)$ from $2$ to $7$. +Now we fix $\theta=0.7, n=20, k=10$ and vary $\log(I)$ from $2$ to $6$. ```{code-cell} ipython3 I_log_low, I_log_high, nI = 2, 6, 200 @@ -318,17 +318,17 @@ From the above graphs, we can see that **$I$, the number of independent sequence When $I$ becomes larger, the difference between theoretical probability and frequentist estimate becomes smaller. Also, as long as $I$ is large enough, changing $\theta$ or $n$ does not substantially change the accuracy of the observed fraction -as an approximation of $\theta$. +as an approximation of $\textrm{Prob}(X = k | \theta)$. The Law of Large Numbers is at work here. For each draw of an independent sequence, $\textrm{Prob}(X_i = k | \theta)$ is the same, so aggregating all draws forms an i.i.d sequence of a binary random variable $\rho_{k,i},i=1,2,...I$, with a mean of $\textrm{Prob}(X = k | \theta)$ and a variance of $$ -n \cdot \textrm{Prob}(X = k | \theta) \cdot (1-\textrm{Prob}(X = k | \theta)). +\textrm{Prob}(X = k | \theta) \cdot (1-\textrm{Prob}(X = k | \theta)). $$ -So, by the LLN, the average of $P_{k,i}$ converges to: +So, by the LLN, the average of $\rho_{k,i}$ converges to: $$ E[\rho_{k,i}] = \textrm{Prob}(X = k | \theta) = \left(\frac{n!}{k! (n-k)!} \right) \theta^k (1-\theta)^{n-k} @@ -378,7 +378,7 @@ a **beta distribution** with parameters $\alpha, \beta$. **f)** Please tell what question a Bayesian coverage interval answers. -**g)** Please compute the Posterior probabililty that $\theta \in [.45, .55]$ for various values of sample size $n$. +**g)** Please compute the Posterior probability that $\theta \in [.45, .55]$ for various values of sample size $n$. **h)** Please use your Python class to study what happens to the posterior distribution as $n \rightarrow + \infty$, again assuming that the true value of $\theta = .4$, though it is unknown to the person doing the updating via Bayes' Law. ``` @@ -518,8 +518,8 @@ plt.show() **e)** For various $n$'s, please describe and compute $.05$ and $.95$ quantiles for posterior probabilities. ```{code-cell} ipython3 -upper_bound = [ii.ppf(0.05) for ii in Bay_stat.posterior_list[:14]] -lower_bound = [ii.ppf(0.95) for ii in Bay_stat.posterior_list[:14]] +lower_bound = [ii.ppf(0.05) for ii in Bay_stat.posterior_list[:14]] +upper_bound = [ii.ppf(0.95) for ii in Bay_stat.posterior_list[:14]] interval_df = pd.DataFrame() interval_df['upper'] = upper_bound @@ -543,7 +543,7 @@ $$ F(a)=p_1,F(b)=p_2 $$ -**g)** Please compute the Posterior probabililty that $\theta \in [.45, .55]$ for various values of sample size $n$. +**g)** Please compute the Posterior probability that $\theta \in [.45, .55]$ for various values of sample size $n$. ```{code-cell} ipython3 left_value, right_value = 0.45, 0.55 @@ -561,15 +561,15 @@ ax.set_xlabel('Number of Observations', fontsize=11) plt.show() ``` -Notice that in the graph above the posterior probabililty that $\theta \in [.45, .55]$ typically exhibits a hump shape as $n$ increases. +Notice that in the graph above the posterior probability that $\theta \in [.45, .55]$ typically exhibits a hump shape as $n$ increases. Two opposing forces are at work. -The first force is that the individual adjusts his belief as he observes new outcomes, so his posterior probability distribution becomes more and more realistic, which explains the rise of the posterior probabililty. +The first force is that the individual adjusts his belief as he observes new outcomes, so his posterior probability distribution becomes more and more realistic, which explains the rise of the posterior probability. However, $[.45, .55]$ actually excludes the true $\theta =.4 $ that generates the data. -As a result, the posterior probabililty drops as larger and larger samples refine his posterior probability distribution of $\theta$. +As a result, the posterior probability drops as larger and larger samples refine his posterior probability distribution of $\theta$. The descent seems precipitous only because of the scale of the graph that has the number of observations increasing disproportionately. @@ -599,7 +599,7 @@ plt.show() As $n$ increases, we can see that the probability density functions _concentrate_ on $0.4$, the true value of $\theta$. -Here the posterior means converges to $0.4$ while the posterior standard deviations converges to $0$ from above. +Here the posterior mean converges to $0.4$ while the posterior standard deviation converges to $0$ from above. To show this, we compute the means and variances statistics of the posterior distributions. @@ -635,15 +635,15 @@ It is natural to extend the one-step Bayesian update to an $n$-step Bayesian upd $$ -\textrm{Prob}(\theta|k) = \frac{\textrm{Prob}(\theta,k)}{\textrm{Prob}(k)}=\frac{\textrm{Prob}(k|\theta)*\textrm{Prob}(\theta)}{\textrm{Prob}(k)}=\frac{\textrm{Prob}(k|\theta)*\textrm{Prob}(\theta)}{\int_0^1 \textrm{Prob}(k|\theta)*\textrm{Prob}(\theta) d\theta} +\textrm{Prob}(\theta|k) = \frac{\textrm{Prob}(\theta,k)}{\textrm{Prob}(k)}=\frac{\textrm{Prob}(k|\theta) \cdot \textrm{Prob}(\theta)}{\textrm{Prob}(k)}=\frac{\textrm{Prob}(k|\theta) \cdot \textrm{Prob}(\theta)}{\int_0^1 \textrm{Prob}(k|\theta) \cdot \textrm{Prob}(\theta) d\theta} $$ $$ -=\frac{{N \choose k} (1 - \theta)^{N-k} \theta^k*\frac{\theta^{\alpha - 1} (1 - \theta)^{\beta - 1}}{B(\alpha, \beta)}}{\int_0^1 {N \choose k} (1 - \theta)^{N-k} \theta^k*\frac{\theta^{\alpha - 1} (1 - \theta)^{\beta - 1}}{B(\alpha, \beta)} d\theta} +=\frac{{N \choose k} (1 - \theta)^{N-k} \theta^k \cdot \frac{\theta^{\alpha - 1} (1 - \theta)^{\beta - 1}}{B(\alpha, \beta)}}{\int_0^1 {N \choose k} (1 - \theta)^{N-k} \theta^k \cdot \frac{\theta^{\alpha - 1} (1 - \theta)^{\beta - 1}}{B(\alpha, \beta)} d\theta} $$ $$ -=\frac{(1 -\theta)^{\beta+N-k-1}* \theta^{\alpha+k-1}}{\int_0^1 (1 - \theta)^{\beta+N-k-1}* \theta^{\alpha+k-1} d\theta} +=\frac{(1 -\theta)^{\beta+N-k-1} \cdot \theta^{\alpha+k-1}}{\int_0^1 (1 - \theta)^{\beta+N-k-1} \cdot \theta^{\alpha+k-1} d\theta} $$ $$ @@ -687,7 +687,7 @@ plt.show() After observing a large number of outcomes, the posterior distribution collapses around $0.4$. -Thus, the Bayesian statististian comes to believe that $\theta$ is near $.4$. +Thus, the Bayesian statistician comes to believe that $\theta$ is near $.4$. As shown in the figure above, as the number of observations grows, the Bayesian coverage intervals (BCIs) become narrower and narrower around $0.4$. From dbfabac0b993e9f0df06664843ed78e581ad53ab Mon Sep 17 00:00:00 2001 From: John Stachurski Date: Wed, 27 May 2026 06:46:46 +1000 Subject: [PATCH 02/11] Improve code quality: naming, PEP 8, line length, grid resolution MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Rename class frequentist → Frequentist (PEP 8) - Rename Bay_stat → bayes (snake_case for instances) - Rename ii → i/post, num → n_obs, num_list → n_obs_list, kk → k, K → head_counts, comp → table, step_num → n_obs, npt → n_thetas, nn → n_ns, nI → n_Is - Replace (sample <= θ) * 1 with .astype(int) for consistency - Shorten docstrings to fit within 80 characters - Break long code lines (plot calls, list comprehensions, titles) - Increase θ grid from 100 to 1000 points for smoother density plots - Use f-strings with comma formatting for plot labels Co-Authored-By: Claude Opus 4.7 (1M context) --- lectures/prob_meaning.md | 226 +++++++++++++++++---------------------- 1 file changed, 96 insertions(+), 130 deletions(-) diff --git a/lectures/prob_meaning.md b/lectures/prob_meaning.md index 935120d31..50120bda6 100644 --- a/lectures/prob_meaning.md +++ b/lectures/prob_meaning.md @@ -138,68 +138,44 @@ As usual, a law of large numbers justifies this answer. Here is one solution: ```{code-cell} ipython3 -class frequentist: +class Frequentist: def __init__(self, θ, n, I): - - ''' - initialization - ----------------- - parameters: - θ : probability that one toss of a coin will be a head with Y = 1 - n : number of independent flips in each independent sequence of draws - I : number of independent sequence of draws - - ''' - self.θ, self.n, self.I = θ, n, I def binomial(self, k): - - '''compute the theoretical probability for specific input k''' - - θ, n = self.θ, self.n + '''Compute the theoretical probability.''' self.k = k - self.P = binom.pmf(k, n, θ) + self.P = binom.pmf(k, self.n, self.θ) def draw(self): - - '''draw n independent flips for I independent sequences''' - + '''Draw n independent flips for I sequences.''' θ, n, I = self.θ, self.n, self.I sample = np.random.rand(I, n) - Y = (sample <= θ) * 1 - self.Y = Y - - def compute_fk(self, kk): + self.Y = (sample <= θ).astype(int) - '''compute f_{k}^I for specific input k''' - - Y, I = self.Y, self.I - K = np.sum(Y, 1) - f_kI = np.sum(K == kk) / I - self.f_kI = f_kI - self.kk = kk + def compute_fk(self, k): + '''Compute f_k^I for a given k.''' + head_counts = np.sum(self.Y, axis=1) + self.f_kI = np.sum(head_counts == k) / self.I def compare(self): - - '''compute and print the comparison''' - + '''Compute and print the comparison.''' n = self.n - comp = pt.PrettyTable() - comp.field_names = ['k', 'Theoretical', 'Frequentist'] + table = pt.PrettyTable() + table.field_names = ['k', 'Theoretical', 'Frequentist'] self.draw() for i in range(n+1): self.binomial(i) self.compute_fk(i) - comp.add_row([i, self.P, self.f_kI]) - print(comp) + table.add_row([i, self.P, self.f_kI]) + print(table) ``` ```{code-cell} ipython3 θ, n, k, I = 0.7, 20, 10, 1_000_000 -freq = frequentist(θ, n, I) +freq = Frequentist(θ, n, I) freq.compare() ``` @@ -222,12 +198,12 @@ $$ We'll vary $\theta$ from $0.01$ to $0.99$ and plot outcomes against $\theta$. ```{code-cell} ipython3 -θ_low, θ_high, npt = 0.01, 0.99, 50 -thetas = np.linspace(θ_low, θ_high, npt) +θ_low, θ_high, n_thetas = 0.01, 0.99, 50 +thetas = np.linspace(θ_low, θ_high, n_thetas) P = [] f_kI = [] -for i in range(npt): - freq = frequentist(thetas[i], n, I) +for i in range(n_thetas): + freq = Frequentist(thetas[i], n, I) freq.binomial(k) freq.draw() freq.compute_fk(k) @@ -255,12 +231,12 @@ Now we fix $\theta=0.7, k=10, I=1,000,000$ and vary $n$ from $1$ to $100$. Then we'll plot outcomes. ```{code-cell} ipython3 -n_low, n_high, nn = 1, 100, 50 -ns = np.linspace(n_low, n_high, nn, dtype='int') +n_low, n_high, n_ns = 1, 100, 50 +ns = np.linspace(n_low, n_high, n_ns, dtype='int') P = [] f_kI = [] -for i in range(nn): - freq = frequentist(θ, ns[i], I) +for i in range(n_ns): + freq = Frequentist(θ, ns[i], I) freq.binomial(k) freq.draw() freq.compute_fk(k) @@ -286,13 +262,13 @@ plt.show() Now we fix $\theta=0.7, n=20, k=10$ and vary $\log(I)$ from $2$ to $6$. ```{code-cell} ipython3 -I_log_low, I_log_high, nI = 2, 6, 200 -log_Is = np.linspace(I_log_low, I_log_high, nI) +I_log_low, I_log_high, n_Is = 2, 6, 200 +log_Is = np.linspace(I_log_low, I_log_high, n_Is) Is = np.power(10, log_Is).astype(int) P = [] f_kI = [] -for i in range(nI): - freq = frequentist(θ, n, Is[i]) +for i in range(n_Is): + freq = Frequentist(θ, n, Is[i]) freq.binomial(k) freq.draw() freq.compute_fk(k) @@ -430,85 +406,64 @@ Now please pretend that the true value of $\theta = .4$ and that someone who doe class Bayesian: def __init__(self, θ=0.4, n=1_000_000, α=0.5, β=0.5): - """ - Parameters: + ''' + Parameters ---------- - θ : float, ranging from [0,1]. - probability that one toss of a coin will be a head with Y = 1 - - n : int. - number of independent flips in an independent sequence of draws - - α&β : int or float. - parameters of the prior distribution on θ - - """ + θ : Probability of heads on each flip. + n : Number of flips in the sequence. + α, β : Parameters of the beta prior on θ. + ''' self.θ, self.n, self.α, self.β = θ, n, α, β self.prior = st.beta(α, β) def draw(self): - """ - simulate a single sequence of draws of length n, given probability θ - - """ + '''Simulate a sequence of n coin flips.''' array = np.random.rand(self.n) self.draws = (array < self.θ).astype(int) - def form_single_posterior(self, step_num): - """ - form a posterior distribution after observing the first step_num elements of the draws - - Parameters - ---------- - step_num: int. - number of steps observed to form a posterior distribution - - Returns - ------ - the posterior distribution for sake of plotting in the subsequent steps + def form_single_posterior(self, n_obs): + '''Return the posterior after the first n_obs flips.''' + heads = self.draws[:n_obs].sum() + tails = n_obs - heads + return st.beta(self.α + heads, self.β + tails) - """ - heads_num = self.draws[:step_num].sum() - tails_num = step_num - heads_num - - return st.beta(self.α+heads_num, self.β+tails_num) - - def form_posterior_series(self,num_obs_list): - """ - form a series of posterior distributions that form after observing different number of draws. - - Parameters - ---------- - num_obs_list: a list of int. - a list of the number of observations used to form a series of posterior distributions. - - """ + def form_posterior_series(self, n_obs_list): + '''Form posteriors for each sample size in n_obs_list.''' self.posterior_list = [] - for num in num_obs_list: - self.posterior_list.append(self.form_single_posterior(num)) + for n_obs in n_obs_list: + self.posterior_list.append( + self.form_single_posterior(n_obs) + ) ``` **d)** Please plot the posterior distribution for $\theta$ as a function of $\theta$ as $n$ grows from $1, 2, \ldots$. ```{code-cell} ipython3 -Bay_stat = Bayesian() -Bay_stat.draw() +bayes = Bayesian() +bayes.draw() -num_list = [1, 2, 3, 4, 5, 10, 20, 30, 50, 70, 100, 300, 500, 1000, # this line for finite n - 5000, 10_000, 50_000, 100_000, 200_000, 300_000] # this line for approximately infinite n +n_obs_list = [1, 2, 3, 4, 5, 10, 20, 30, 50, 70, + 100, 300, 500, 1000, + 5000, 10_000, 50_000, 100_000, + 200_000, 300_000] -Bay_stat.form_posterior_series(num_list) +bayes.form_posterior_series(n_obs_list) -θ_values = np.linspace(0.01, 1, 100) +θ_values = np.linspace(0.01, 1, 1000) fig, ax = plt.subplots(figsize=(10, 6)) -ax.plot(θ_values, Bay_stat.prior.pdf(θ_values), label='Prior Distribution', color='k', linestyle='--') +ax.plot(θ_values, bayes.prior.pdf(θ_values), + label='Prior Distribution', color='k', + linestyle='--') -for ii, num in enumerate(num_list[:14]): - ax.plot(θ_values, Bay_stat.posterior_list[ii].pdf(θ_values), label='Posterior with n = %d' % num) +for i, n_obs in enumerate(n_obs_list[:14]): + posterior = bayes.posterior_list[i] + ax.plot(θ_values, posterior.pdf(θ_values), + label=f'Posterior with n = {n_obs}') -ax.set_title('P.D.F of Posterior Distributions', fontsize=15) +ax.set_title('P.D.F of Posterior Distributions', + fontsize=15) ax.set_xlabel(r"$\theta$", fontsize=15) ax.legend(fontsize=11) @@ -518,13 +473,13 @@ plt.show() **e)** For various $n$'s, please describe and compute $.05$ and $.95$ quantiles for posterior probabilities. ```{code-cell} ipython3 -lower_bound = [ii.ppf(0.05) for ii in Bay_stat.posterior_list[:14]] -upper_bound = [ii.ppf(0.95) for ii in Bay_stat.posterior_list[:14]] +lower_bound = [post.ppf(0.05) for post in bayes.posterior_list[:14]] +upper_bound = [post.ppf(0.95) for post in bayes.posterior_list[:14]] interval_df = pd.DataFrame() interval_df['upper'] = upper_bound interval_df['lower'] = lower_bound -interval_df.index = num_list[:14] +interval_df.index = n_obs_list[:14] interval_df = interval_df.T interval_df ``` @@ -548,14 +503,20 @@ $$ ```{code-cell} ipython3 left_value, right_value = 0.45, 0.55 -posterior_prob_list=[ii.cdf(right_value)-ii.cdf(left_value) for ii in Bay_stat.posterior_list] +posterior_prob_list = [ + post.cdf(right_value) - post.cdf(left_value) + for post in bayes.posterior_list +] fig, ax = plt.subplots(figsize=(8, 5)) ax.plot(posterior_prob_list) -ax.set_title('Posterior Probabililty that '+ r"$\theta$" +' Ranges from %.2f to %.2f'%(left_value, right_value), - fontsize=13) +ax.set_title( + r'Posterior Probability that $\theta$' + f' Ranges from {left_value:.2f}' + f' to {right_value:.2f}', + fontsize=13) ax.set_xticks(np.arange(0, len(posterior_prob_list), 3)) -ax.set_xticklabels(num_list[::3]) +ax.set_xticklabels(n_obs_list[::3]) ax.set_xlabel('Number of Observations', fontsize=11) plt.show() @@ -584,10 +545,10 @@ Using the Python class we made above, we can see the evolution of posterior dist ```{code-cell} ipython3 fig, ax = plt.subplots(figsize=(10, 6)) -for ii, num in enumerate(num_list[14:]): - ii += 14 - ax.plot(θ_values, Bay_stat.posterior_list[ii].pdf(θ_values), - label='Posterior with n=%d thousand' % (num/1000)) +for i, n_obs in enumerate(n_obs_list[14:]): + posterior = bayes.posterior_list[i + 14] + ax.plot(θ_values, posterior.pdf(θ_values), + label=f'Posterior with n = {n_obs:,}') ax.set_title('P.D.F of Posterior Distributions', fontsize=15) ax.set_xlabel(r"$\theta$", fontsize=15) @@ -604,21 +565,23 @@ Here the posterior mean converges to $0.4$ while the posterior standard deviat To show this, we compute the means and variances statistics of the posterior distributions. ```{code-cell} ipython3 -mean_list = [ii.mean() for ii in Bay_stat.posterior_list] -std_list = [ii.std() for ii in Bay_stat.posterior_list] +mean_list = [post.mean() for post in bayes.posterior_list] +std_list = [post.std() for post in bayes.posterior_list] fig, ax = plt.subplots(1, 2, figsize=(14, 5)) ax[0].plot(mean_list) -ax[0].set_title('Mean Values of Posterior Distribution', fontsize=13) +ax[0].set_title('Mean of Posterior Distribution', + fontsize=13) ax[0].set_xticks(np.arange(0, len(mean_list), 3)) -ax[0].set_xticklabels(num_list[::3]) +ax[0].set_xticklabels(n_obs_list[::3]) ax[0].set_xlabel('Number of Observations', fontsize=11) ax[1].plot(std_list) -ax[1].set_title('Standard Deviations of Posterior Distribution', fontsize=13) +ax[1].set_title('Std Dev of Posterior Distribution', + fontsize=13) ax[1].set_xticks(np.arange(0, len(std_list), 3)) -ax[1].set_xticklabels(num_list[::3]) +ax[1].set_xticklabels(n_obs_list[::3]) ax[1].set_xlabel('Number of Observations', fontsize=11) plt.show() @@ -669,17 +632,20 @@ According to the Law of Large Numbers, for a large number of observations, obser Consequently, the mean of the posterior distribution converges to $0.4$ and the variance withers to zero. ```{code-cell} ipython3 -upper_bound = [ii.ppf(0.95) for ii in Bay_stat.posterior_list] -lower_bound = [ii.ppf(0.05) for ii in Bay_stat.posterior_list] +upper_bound = [post.ppf(0.95) for post in bayes.posterior_list] +lower_bound = [post.ppf(0.05) for post in bayes.posterior_list] fig, ax = plt.subplots(figsize=(10, 6)) -ax.scatter(np.arange(len(upper_bound)), upper_bound, label='95 th Quantile') -ax.scatter(np.arange(len(lower_bound)), lower_bound, label='05 th Quantile') +ax.scatter(np.arange(len(upper_bound)), + upper_bound, label='95th Quantile') +ax.scatter(np.arange(len(lower_bound)), + lower_bound, label='5th Quantile') ax.set_xticks(np.arange(0, len(upper_bound), 2)) -ax.set_xticklabels(num_list[::2]) +ax.set_xticklabels(n_obs_list[::2]) ax.set_xlabel('Number of Observations', fontsize=12) -ax.set_title('Bayesian Coverage Intervals of Posterior Distributions', fontsize=15) +ax.set_title('Bayesian Coverage Intervals of ' + 'Posterior Distributions', fontsize=15) ax.legend(fontsize=11) plt.show() From 4608774e0549e32872802ceaf5b217aecfafaaa7 Mon Sep 17 00:00:00 2001 From: John Stachurski Date: Wed, 27 May 2026 06:47:50 +1000 Subject: [PATCH 03/11] Restructure Bayesian section: derive posterior before exercise MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Add back-reference to prob_matrix lecture for Bayes' Law intro - Derive the n-step posterior Beta(α+k, β+n-k) before the exercise, so the exercise solution code no longer precedes its own derivation - Replace the duplicated derivation after the exercise with a concise summary referencing the formula above - Remove duplicate "Now pretend..." sentence before part (c) - Replace "this quantecon lecture" cross-references with actual titles for better PDF rendering Co-Authored-By: Claude Opus 4.7 (1M context) --- lectures/prob_meaning.md | 61 +++++++++++++++++----------------------- 1 file changed, 26 insertions(+), 35 deletions(-) diff --git a/lectures/prob_meaning.md b/lectures/prob_meaning.md index 50120bda6..d1cffb7aa 100644 --- a/lectures/prob_meaning.md +++ b/lectures/prob_meaning.md @@ -339,6 +339,22 @@ $$ where $B(\alpha, \beta)$ is a **beta function** , so that $P(\theta)$ is a **beta distribution** with parameters $\alpha, \beta$. +We can update this prior after observing data using Bayes' Law (see {doc}`Probability with Matrices ` for an introduction). + +For a sample of $n$ coin flips that yields $k$ heads, the **likelihood function** is the binomial probability + +$$ +L(k | \theta) = {n \choose k} \theta^k (1-\theta)^{n-k} +$$ + +Applying Bayes' Law with our beta prior, the **posterior distribution** is + +$$ +\textrm{Prob}(\theta | k) = \frac{L(k | \theta) \cdot P(\theta)}{\int_0^1 L(k | \theta) \cdot P(\theta) \, d\theta} = \textrm{Beta}(\alpha + k, \, \beta + n - k) +$$ + +So the posterior is also a beta distribution — a consequence of the beta prior being **conjugate** to the binomial likelihood. + ```{exercise} :label: pm_ex2 @@ -398,8 +414,6 @@ $$ \textrm{Prob}(\theta | Y) \sim \textrm{Beta}(\alpha + Y, \beta + (1-Y)) $$ -Now please pretend that the true value of $\theta = .4$ and that someone who doesn't know this has a beta prior with $\beta = \alpha = .5$. - **c)** Now pretend that the true value of $\theta = .4$ and that someone who doesn't know this has a beta prior distribution with parameters with $\beta = \alpha = .5$. Please write a Python class to simulate this person's personal posterior distribution for $\theta$ for a _single_ sequence of $n$ draws. ```{code-cell} ipython3 @@ -592,44 +606,21 @@ plt.show() How shall we interpret the patterns above? -The answer is encoded in the Bayesian updating formulas. - -It is natural to extend the one-step Bayesian update to an $n$-step Bayesian update. - - -$$ -\textrm{Prob}(\theta|k) = \frac{\textrm{Prob}(\theta,k)}{\textrm{Prob}(k)}=\frac{\textrm{Prob}(k|\theta) \cdot \textrm{Prob}(\theta)}{\textrm{Prob}(k)}=\frac{\textrm{Prob}(k|\theta) \cdot \textrm{Prob}(\theta)}{\int_0^1 \textrm{Prob}(k|\theta) \cdot \textrm{Prob}(\theta) d\theta} -$$ - -$$ -=\frac{{N \choose k} (1 - \theta)^{N-k} \theta^k \cdot \frac{\theta^{\alpha - 1} (1 - \theta)^{\beta - 1}}{B(\alpha, \beta)}}{\int_0^1 {N \choose k} (1 - \theta)^{N-k} \theta^k \cdot \frac{\theta^{\alpha - 1} (1 - \theta)^{\beta - 1}}{B(\alpha, \beta)} d\theta} -$$ - -$$ -=\frac{(1 -\theta)^{\beta+N-k-1} \cdot \theta^{\alpha+k-1}}{\int_0^1 (1 - \theta)^{\beta+N-k-1} \cdot \theta^{\alpha+k-1} d\theta} -$$ - -$$ -={Beta}(\alpha + k, \beta+N-k) -$$ - -A beta distribution with $\alpha$ and $\beta$ has the following mean and variance. - -The mean is $\frac{\alpha}{\alpha + \beta}$ +The answer is encoded in the Bayesian updating formula derived above. -The variance is $\frac{\alpha \beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)}$ +Recall that after observing $k$ heads in $N$ flips, the posterior is $\textrm{Beta}(\alpha + k, \, \beta + N - k)$. -* $\alpha$ can be viewed as the number of successes +A beta distribution with parameters $\alpha$ and $\beta$ has -* $\beta$ can be viewed as the number of failures +* mean $\frac{\alpha}{\alpha + \beta}$ -The random variables $k$ and $N-k$ are governed by Binomial Distribution with $\theta=0.4$. +* variance $\frac{\alpha \beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)}$ -Call this the true data generating process. +Here $\alpha + k$ can be viewed as the number of successes (prior pseudo-count plus observed heads) and $\beta + N - k$ as the number of failures. -According to the Law of Large Numbers, for a large number of observations, observed frequencies of $k$ and $N-k$ will be described by the true data generating process, i.e., the population probability distribution that we assumed when generating the observations on the computer. (See {ref}`pm_ex1`). +Since the data are generated with $\theta = 0.4$, the Law of Large Numbers tells us that, as $N$ grows, $k/N \to 0.4$ (see {ref}`pm_ex1`). -Consequently, the mean of the posterior distribution converges to $0.4$ and the variance withers to zero. +Consequently, the posterior mean converges to $0.4$ and the posterior variance shrinks to zero. ```{code-cell} ipython3 upper_bound = [post.ppf(0.95) for post in bayes.posterior_list] @@ -682,7 +673,7 @@ To be argumentative, one could ask, why should the form of the likelihood functi A dignified response to that question is, well, it shouldn't, but if you want to compute a posterior easily you'll just be happier if your prior is conjugate to your likelihood. -Otherwise, your posterior won't have a convenient analytical form and you'll be in the situation of wanting to apply the Markov chain Monte Carlo techniques deployed in {doc}`this quantecon lecture `. +Otherwise, your posterior won't have a convenient analytical form and you'll be in the situation of wanting to apply the Markov chain Monte Carlo techniques deployed in {doc}`Non-Conjugate Priors `. We also apply these powerful methods to approximating Bayesian posteriors for non-conjugate priors in -{doc}`this quantecon lecture ` and {doc}`this quantecon lecture ` +{doc}`Posterior Distributions for AR(1) Parameters ` and {doc}`Forecasting an AR(1) Process `. From 556e31a1f41d161255399a093598ddcc79e219da Mon Sep 17 00:00:00 2001 From: John Stachurski Date: Wed, 27 May 2026 07:21:07 +1000 Subject: [PATCH 04/11] Replace prettytable with pandas DataFrame in compare() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Drop the prettytable dependency — pandas is already imported and renders nicely in Jupyter notebooks. The compare() method now returns a DataFrame instead of printing a PrettyTable. Co-Authored-By: Claude Opus 4.7 (1M context) --- lectures/prob_meaning.md | 26 +++++++++----------------- 1 file changed, 9 insertions(+), 17 deletions(-) diff --git a/lectures/prob_meaning.md b/lectures/prob_meaning.md index d1cffb7aa..c6f83bc07 100644 --- a/lectures/prob_meaning.md +++ b/lectures/prob_meaning.md @@ -49,19 +49,11 @@ We provide our own answers as the lecture unfolds, but you'll learn more if you **Code for answering questions:** -In addition to what’s in Anaconda, this lecture will deploy the following library: - -```{code-cell} ipython3 -:tags: [hide-output] -pip install prettytable -``` - -To answer our coding questions, we'll start with some imports +To answer our coding questions, we’ll start with some imports ```{code-cell} ipython3 import numpy as np import pandas as pd -import prettytable as pt import matplotlib.pyplot as plt from scipy.stats import binom import scipy.stats as st @@ -161,15 +153,15 @@ class Frequentist: def compare(self): '''Compute and print the comparison.''' - n = self.n - table = pt.PrettyTable() - table.field_names = ['k', 'Theoretical', 'Frequentist'] self.draw() - for i in range(n+1): - self.binomial(i) - self.compute_fk(i) - table.add_row([i, self.P, self.f_kI]) - print(table) + rows = [] + for k in range(self.n + 1): + self.binomial(k) + self.compute_fk(k) + rows.append([k, self.P, self.f_kI]) + return pd.DataFrame( + rows, columns=['k', 'Theoretical', 'Frequentist'] + ).set_index('k') ``` ```{code-cell} ipython3 From 53bd65a01f092f386c4746d71d37ad319325eae1 Mon Sep 17 00:00:00 2001 From: John Stachurski Date: Wed, 27 May 2026 08:05:55 +1000 Subject: [PATCH 05/11] Fix exercise pm_ex2 parts (a) and (b) to match their solutions The question asked for the likelihood of "a sample of length n from a binomial" but the solution gave the single-flip Bernoulli case. Reword both the questions and solution headers so parts (a) and (b) are explicitly about a single coin flip. The general n-step case is already derived in the lecture text before the exercise. Co-Authored-By: Claude Opus 4.7 (1M context) --- lectures/prob_meaning.md | 15 +++++---------- 1 file changed, 5 insertions(+), 10 deletions(-) diff --git a/lectures/prob_meaning.md b/lectures/prob_meaning.md index c6f83bc07..593eacaab 100644 --- a/lectures/prob_meaning.md +++ b/lectures/prob_meaning.md @@ -350,9 +350,9 @@ So the posterior is also a beta distribution — a consequence of the beta prior ```{exercise} :label: pm_ex2 -**a)** Please write down the **likelihood function** for a sample of length $n$ from a binomial distribution with parameter $\theta$. +**a)** Please write down the **likelihood function** for a single coin flip with outcome $Y \in \{0, 1\}$. -**b)** Please write down the **posterior** distribution for $\theta$ after observing one flip of the coin. +**b)** Please write down the **posterior** distribution for $\theta$ after observing that single flip. **c)** Now pretend that the true value of $\theta = .4$ and that someone who doesn't know this has a beta prior distribution with parameters with $\beta = \alpha = .5$. Please write a Python class to simulate this person's personal posterior distribution for $\theta$ for a _single_ sequence of $n$ draws. @@ -372,18 +372,13 @@ So the posterior is also a beta distribution — a consequence of the beta prior :class: dropdown ``` -**a)** Please write down the **likelihood function** and the **posterior** distribution for $\theta$ after observing one flip of our coin. - -Suppose the outcome is __Y__. - -The likelihood function is: +**a)** The **likelihood function** for a single coin flip with outcome $Y \in \{0, 1\}$ is $$ -L(Y|\theta)= \textrm{Prob}(X = Y | \theta) = -\theta^Y (1-\theta)^{1-Y} +L(Y|\theta) = \theta^Y (1-\theta)^{1-Y} $$ -**b)** Please write the **posterior** distribution for $\theta$ after observing one flip of our coin. +**b)** The **posterior** distribution for $\theta$ after observing that single flip: The prior distribution is From 220f45f3062192dca514dfb92ddeb0541e26b8df Mon Sep 17 00:00:00 2001 From: John Stachurski Date: Wed, 27 May 2026 08:14:47 +1000 Subject: [PATCH 06/11] Add reproducible random seeds using modern NumPy API Both classes now accept an rng parameter and use rng.random() instead of np.random.rand(). Each code cell passes a seeded np.random.default_rng() for reproducible output across builds. Also remove "typically" from the hump-shape sentence, since with fixed seeds the behavior is deterministic. Co-Authored-By: Claude Opus 4.7 (1M context) --- lectures/prob_meaning.md | 29 +++++++++++++++++++---------- 1 file changed, 19 insertions(+), 10 deletions(-) diff --git a/lectures/prob_meaning.md b/lectures/prob_meaning.md index 593eacaab..f7fa9476b 100644 --- a/lectures/prob_meaning.md +++ b/lectures/prob_meaning.md @@ -132,8 +132,9 @@ Here is one solution: ```{code-cell} ipython3 class Frequentist: - def __init__(self, θ, n, I): + def __init__(self, θ, n, I, rng=None): self.θ, self.n, self.I = θ, n, I + self.rng = rng or np.random.default_rng() def binomial(self, k): '''Compute the theoretical probability.''' @@ -143,7 +144,7 @@ class Frequentist: def draw(self): '''Draw n independent flips for I sequences.''' θ, n, I = self.θ, self.n, self.I - sample = np.random.rand(I, n) + sample = self.rng.random((I, n)) self.Y = (sample <= θ).astype(int) def compute_fk(self, k): @@ -165,9 +166,10 @@ class Frequentist: ``` ```{code-cell} ipython3 +rng = np.random.default_rng(123) θ, n, k, I = 0.7, 20, 10, 1_000_000 -freq = Frequentist(θ, n, I) +freq = Frequentist(θ, n, I, rng=rng) freq.compare() ``` @@ -190,12 +192,13 @@ $$ We'll vary $\theta$ from $0.01$ to $0.99$ and plot outcomes against $\theta$. ```{code-cell} ipython3 +rng = np.random.default_rng(234) θ_low, θ_high, n_thetas = 0.01, 0.99, 50 thetas = np.linspace(θ_low, θ_high, n_thetas) P = [] f_kI = [] for i in range(n_thetas): - freq = Frequentist(thetas[i], n, I) + freq = Frequentist(thetas[i], n, I, rng=rng) freq.binomial(k) freq.draw() freq.compute_fk(k) @@ -223,12 +226,13 @@ Now we fix $\theta=0.7, k=10, I=1,000,000$ and vary $n$ from $1$ to $100$. Then we'll plot outcomes. ```{code-cell} ipython3 +rng = np.random.default_rng(345) n_low, n_high, n_ns = 1, 100, 50 ns = np.linspace(n_low, n_high, n_ns, dtype='int') P = [] f_kI = [] for i in range(n_ns): - freq = Frequentist(θ, ns[i], I) + freq = Frequentist(θ, ns[i], I, rng=rng) freq.binomial(k) freq.draw() freq.compute_fk(k) @@ -254,13 +258,14 @@ plt.show() Now we fix $\theta=0.7, n=20, k=10$ and vary $\log(I)$ from $2$ to $6$. ```{code-cell} ipython3 +rng = np.random.default_rng(456) I_log_low, I_log_high, n_Is = 2, 6, 200 log_Is = np.linspace(I_log_low, I_log_high, n_Is) Is = np.power(10, log_Is).astype(int) P = [] f_kI = [] for i in range(n_Is): - freq = Frequentist(θ, n, Is[i]) + freq = Frequentist(θ, n, Is[i], rng=rng) freq.binomial(k) freq.draw() freq.compute_fk(k) @@ -406,20 +411,23 @@ $$ ```{code-cell} ipython3 class Bayesian: - def __init__(self, θ=0.4, n=1_000_000, α=0.5, β=0.5): + def __init__(self, θ=0.4, n=1_000_000, α=0.5, β=0.5, + rng=None): ''' Parameters ---------- θ : Probability of heads on each flip. n : Number of flips in the sequence. α, β : Parameters of the beta prior on θ. + rng : NumPy random generator. ''' self.θ, self.n, self.α, self.β = θ, n, α, β + self.rng = rng or np.random.default_rng() self.prior = st.beta(α, β) def draw(self): '''Simulate a sequence of n coin flips.''' - array = np.random.rand(self.n) + array = self.rng.random(self.n) self.draws = (array < self.θ).astype(int) def form_single_posterior(self, n_obs): @@ -440,7 +448,8 @@ class Bayesian: **d)** Please plot the posterior distribution for $\theta$ as a function of $\theta$ as $n$ grows from $1, 2, \ldots$. ```{code-cell} ipython3 -bayes = Bayesian() +rng = np.random.default_rng(567) +bayes = Bayesian(rng=rng) bayes.draw() n_obs_list = [1, 2, 3, 4, 5, 10, 20, 30, 50, 70, @@ -523,7 +532,7 @@ ax.set_xlabel('Number of Observations', fontsize=11) plt.show() ``` -Notice that in the graph above the posterior probability that $\theta \in [.45, .55]$ typically exhibits a hump shape as $n$ increases. +Notice that in the graph above the posterior probability that $\theta \in [.45, .55]$ exhibits a hump shape as $n$ increases. Two opposing forces are at work. From 9b9307efe979cda2e7741056a38bc002f9c2e7bf Mon Sep 17 00:00:00 2001 From: John Stachurski Date: Wed, 27 May 2026 08:19:10 +1000 Subject: [PATCH 07/11] Clean up posterior PDF plots - Rename title from "P.D.F" to "PDF" - Simplify legend labels: "n = 0 (prior)", "n = 1", etc. - Remove n = 30, 70, 300, 500 from observation list to reduce clutter in the first PDF plot Co-Authored-By: Claude Opus 4.7 (1M context) --- lectures/prob_meaning.md | 26 +++++++++++++------------- 1 file changed, 13 insertions(+), 13 deletions(-) diff --git a/lectures/prob_meaning.md b/lectures/prob_meaning.md index f7fa9476b..fee99baee 100644 --- a/lectures/prob_meaning.md +++ b/lectures/prob_meaning.md @@ -452,8 +452,8 @@ rng = np.random.default_rng(567) bayes = Bayesian(rng=rng) bayes.draw() -n_obs_list = [1, 2, 3, 4, 5, 10, 20, 30, 50, 70, - 100, 300, 500, 1000, +n_obs_list = [1, 2, 3, 4, 5, 10, 20, 50, + 100, 1000, 5000, 10_000, 50_000, 100_000, 200_000, 300_000] @@ -464,15 +464,15 @@ bayes.form_posterior_series(n_obs_list) fig, ax = plt.subplots(figsize=(10, 6)) ax.plot(θ_values, bayes.prior.pdf(θ_values), - label='Prior Distribution', color='k', + label='n = 0 (prior)', color='k', linestyle='--') -for i, n_obs in enumerate(n_obs_list[:14]): +for i, n_obs in enumerate(n_obs_list[:10]): posterior = bayes.posterior_list[i] ax.plot(θ_values, posterior.pdf(θ_values), - label=f'Posterior with n = {n_obs}') + label=f'n = {n_obs}') -ax.set_title('P.D.F of Posterior Distributions', +ax.set_title('PDF of Posterior Distributions', fontsize=15) ax.set_xlabel(r"$\theta$", fontsize=15) @@ -483,13 +483,13 @@ plt.show() **e)** For various $n$'s, please describe and compute $.05$ and $.95$ quantiles for posterior probabilities. ```{code-cell} ipython3 -lower_bound = [post.ppf(0.05) for post in bayes.posterior_list[:14]] -upper_bound = [post.ppf(0.95) for post in bayes.posterior_list[:14]] +lower_bound = [post.ppf(0.05) for post in bayes.posterior_list[:10]] +upper_bound = [post.ppf(0.95) for post in bayes.posterior_list[:10]] interval_df = pd.DataFrame() interval_df['upper'] = upper_bound interval_df['lower'] = lower_bound -interval_df.index = n_obs_list[:14] +interval_df.index = n_obs_list[:10] interval_df = interval_df.T interval_df ``` @@ -555,12 +555,12 @@ Using the Python class we made above, we can see the evolution of posterior dist ```{code-cell} ipython3 fig, ax = plt.subplots(figsize=(10, 6)) -for i, n_obs in enumerate(n_obs_list[14:]): - posterior = bayes.posterior_list[i + 14] +for i, n_obs in enumerate(n_obs_list[10:]): + posterior = bayes.posterior_list[i + 10] ax.plot(θ_values, posterior.pdf(θ_values), - label=f'Posterior with n = {n_obs:,}') + label=f'n = {n_obs:,}') -ax.set_title('P.D.F of Posterior Distributions', fontsize=15) +ax.set_title('PDF of Posterior Distributions', fontsize=15) ax.set_xlabel(r"$\theta$", fontsize=15) ax.set_xlim(0.3, 0.5) From 3663efe9f9bb6511d564063518c2be26d17c23ea Mon Sep 17 00:00:00 2001 From: John Stachurski Date: Wed, 27 May 2026 08:20:49 +1000 Subject: [PATCH 08/11] Remove repeated question text from exercise solutions MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Solutions for parts (c)-(h) no longer duplicate the question text as a header — they just use the part label. This follows the QuantEcon convention. Co-Authored-By: Claude Opus 4.7 (1M context) --- lectures/prob_meaning.md | 16 ++++++---------- 1 file changed, 6 insertions(+), 10 deletions(-) diff --git a/lectures/prob_meaning.md b/lectures/prob_meaning.md index fee99baee..909db225e 100644 --- a/lectures/prob_meaning.md +++ b/lectures/prob_meaning.md @@ -406,7 +406,7 @@ $$ \textrm{Prob}(\theta | Y) \sim \textrm{Beta}(\alpha + Y, \beta + (1-Y)) $$ -**c)** Now pretend that the true value of $\theta = .4$ and that someone who doesn't know this has a beta prior distribution with parameters with $\beta = \alpha = .5$. Please write a Python class to simulate this person's personal posterior distribution for $\theta$ for a _single_ sequence of $n$ draws. +**c)** ```{code-cell} ipython3 class Bayesian: @@ -445,7 +445,7 @@ class Bayesian: ) ``` -**d)** Please plot the posterior distribution for $\theta$ as a function of $\theta$ as $n$ grows from $1, 2, \ldots$. +**d)** ```{code-cell} ipython3 rng = np.random.default_rng(567) @@ -480,7 +480,7 @@ ax.legend(fontsize=11) plt.show() ``` -**e)** For various $n$'s, please describe and compute $.05$ and $.95$ quantiles for posterior probabilities. +**e)** ```{code-cell} ipython3 lower_bound = [post.ppf(0.05) for post in bayes.posterior_list[:10]] @@ -496,9 +496,7 @@ interval_df As $n$ increases, we can see that Bayesian coverage intervals narrow and move toward $0.4$. -**f)** Please tell what question a Bayesian coverage interval answers. - -The Bayesian coverage interval tells the range of $\theta$ that corresponds to the [$p_1$, $p_2$] quantiles of the cumulative probability distribution (CDF) of the posterior distribution. +**f)** The Bayesian coverage interval tells the range of $\theta$ that corresponds to the [$p_1$, $p_2$] quantiles of the cumulative probability distribution (CDF) of the posterior distribution. To construct the coverage interval we first compute a posterior distribution of the unknown parameter $\theta$. @@ -508,7 +506,7 @@ $$ F(a)=p_1,F(b)=p_2 $$ -**g)** Please compute the Posterior probability that $\theta \in [.45, .55]$ for various values of sample size $n$. +**g)** ```{code-cell} ipython3 left_value, right_value = 0.45, 0.55 @@ -548,9 +546,7 @@ When the number of observations becomes large enough, our Bayesian becomes so co That is why we see a nearly horizontal line when the number of observations exceeds 500. -**h)** Please use your Python class to study what happens to the posterior distribution as $n \rightarrow + \infty$, again assuming that the true value of $\theta = .4$, though it is unknown to the person doing the updating via Bayes' Law. - -Using the Python class we made above, we can see the evolution of posterior distributions as $n$ approaches infinity. +**h)** Using the Python class we made above, we can see the evolution of posterior distributions as $n$ approaches infinity. ```{code-cell} ipython3 fig, ax = plt.subplots(figsize=(10, 6)) From b50cd4a6bbf0c48bdbf0d3cbf9ab80c178ee14ec Mon Sep 17 00:00:00 2001 From: John Stachurski Date: Wed, 27 May 2026 08:23:42 +1000 Subject: [PATCH 09/11] =?UTF-8?q?Use=20p(=CE=B8)=20for=20density=20notatio?= =?UTF-8?q?n=20and=20break=20up=20solution=20derivation?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Change P(θ) to p(θ) throughout and note it is a density - Replace the single aligned equation block in the solution for part (b) with three separate display equations, each introduced by explanatory text (Bayes' Law, substitution, collecting powers) Co-Authored-By: Claude Opus 4.7 (1M context) --- lectures/prob_meaning.md | 35 ++++++++++++++++++----------------- 1 file changed, 18 insertions(+), 17 deletions(-) diff --git a/lectures/prob_meaning.md b/lectures/prob_meaning.md index 909db225e..8c68ebba2 100644 --- a/lectures/prob_meaning.md +++ b/lectures/prob_meaning.md @@ -327,14 +327,14 @@ Instead, the probability distribution of $\theta$ is now a summary of our views * **before** we have seen **any** data at all, or * **before** we have seen **more** data, after we have seen **some** data -Thus, suppose that, before seeing any data, you have a personal prior probability distribution saying that +Thus, suppose that, before seeing any data, you have a personal prior probability distribution with density $$ -P(\theta) = \frac{\theta^{\alpha-1}(1-\theta)^{\beta -1}}{B(\alpha, \beta)} +p(\theta) = \frac{\theta^{\alpha-1}(1-\theta)^{\beta -1}}{B(\alpha, \beta)} $$ -where $B(\alpha, \beta)$ is a **beta function** , so that $P(\theta)$ is -a **beta distribution** with parameters $\alpha, \beta$. +where $B(\alpha, \beta)$ is a **beta function** , so that $p(\theta)$ is +the density of a **beta distribution** with parameters $\alpha, \beta$. We can update this prior after observing data using Bayes' Law (see {doc}`Probability with Matrices ` for an introduction). @@ -344,10 +344,10 @@ $$ L(k | \theta) = {n \choose k} \theta^k (1-\theta)^{n-k} $$ -Applying Bayes' Law with our beta prior, the **posterior distribution** is +Applying Bayes' Law with our beta prior, the **posterior density** is $$ -\textrm{Prob}(\theta | k) = \frac{L(k | \theta) \cdot P(\theta)}{\int_0^1 L(k | \theta) \cdot P(\theta) \, d\theta} = \textrm{Beta}(\alpha + k, \, \beta + n - k) +p(\theta | k) = \frac{L(k | \theta) \cdot p(\theta)}{\int_0^1 L(k | \theta) \cdot p(\theta) \, d\theta} = \textrm{Beta}(\alpha + k, \, \beta + n - k) $$ So the posterior is also a beta distribution — a consequence of the beta prior being **conjugate** to the binomial likelihood. @@ -383,27 +383,28 @@ $$ L(Y|\theta) = \theta^Y (1-\theta)^{1-Y} $$ -**b)** The **posterior** distribution for $\theta$ after observing that single flip: +**b)** By Bayes' Law, the posterior density for $\theta$ after observing a single flip $Y$ is -The prior distribution is +$$ +p(\theta | Y) = \frac{L(Y | \theta) \cdot p(\theta)}{\int_{0}^{1} L(Y | \theta) \cdot p(\theta) \, d\theta} +$$ + +Substituting the likelihood from (a) and the beta prior density, this becomes $$ -\textrm{Prob}(\theta) = \frac{\theta^{\alpha - 1} (1 - \theta)^{\beta - 1}}{B(\alpha, \beta)} +p(\theta | Y) = \frac{\theta^Y (1-\theta)^{1-Y} \cdot \theta^{\alpha - 1} (1 - \theta)^{\beta - 1} / B(\alpha, \beta)}{\int_{0}^{1} \theta^Y (1-\theta)^{1-Y} \cdot \theta^{\alpha - 1} (1 - \theta)^{\beta - 1} / B(\alpha, \beta) \, d\theta} $$ -We can derive the posterior distribution for $\theta$ via +Collecting powers of $\theta$ and $(1-\theta)$, we recognize the kernel of a beta density: -\begin{align*} - \textrm{Prob}(\theta | Y) &= \frac{\textrm{Prob}(Y | \theta) \textrm{Prob}(\theta)}{\textrm{Prob}(Y)} \\ - &=\frac{\textrm{Prob}(Y | \theta) \textrm{Prob}(\theta)}{\int_{0}^{1} \textrm{Prob}(Y | \theta) \textrm{Prob}(\theta) d \theta }\\ - &= \frac{\theta^Y (1-\theta)^{1-Y}\frac{\theta^{\alpha - 1} (1 - \theta)^{\beta - 1}}{B(\alpha, \beta)}}{\int_{0}^{1}\theta^Y (1-\theta)^{1-Y}\frac{\theta^{\alpha - 1} (1 - \theta)^{\beta - 1}}{B(\alpha, \beta)} d \theta } \\ - &= \frac{ \theta^{Y+\alpha - 1} (1 - \theta)^{1-Y+\beta - 1}}{\int_{0}^{1}\theta^{Y+\alpha - 1} (1 - \theta)^{1-Y+\beta - 1} d \theta} -\end{align*} +$$ +p(\theta | Y) = \frac{\theta^{Y+\alpha - 1} (1 - \theta)^{1-Y+\beta - 1}}{\int_{0}^{1} \theta^{Y+\alpha - 1} (1 - \theta)^{1-Y+\beta - 1} \, d\theta} +$$ which means that $$ -\textrm{Prob}(\theta | Y) \sim \textrm{Beta}(\alpha + Y, \beta + (1-Y)) +\theta | Y \sim \textrm{Beta}(\alpha + Y, \, \beta + (1-Y)) $$ **c)** From 731fdb20e6308aeca59b3b61acec7edf4e5f558d Mon Sep 17 00:00:00 2001 From: John Stachurski Date: Wed, 27 May 2026 08:35:34 +1000 Subject: [PATCH 10/11] Final review polish: notation, plotting style, consistency MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Use **bold** instead of __bold__ for binomial distribution - Use IID instead of i.i.d. - Fix double "with" in exercise (c) wording - Rename quantile variables from p_1/p_2 to q_1/q_2 to avoid clash with p(θ) density notation - Fix "means and variances statistics" → "mean and standard deviation" - Standardize N → n in post-exercise text to match pre-exercise - Update "exceeds 500" → "exceeds 1000" to match revised n_obs_list - Standardize frequentist plots to use ax. methods instead of plt. - Remove dead self.k assignment in Frequentist.binomial() Co-Authored-By: Claude Opus 4.7 (1M context) --- lectures/prob_meaning.md | 60 +++++++++++++++++++++------------------- 1 file changed, 31 insertions(+), 29 deletions(-) diff --git a/lectures/prob_meaning.md b/lectures/prob_meaning.md index 8c68ebba2..a5942ac14 100644 --- a/lectures/prob_meaning.md +++ b/lectures/prob_meaning.md @@ -19,7 +19,7 @@ kernelspec: This lecture illustrates two distinct interpretations of a **probability distribution** - * A frequentist interpretation as **relative frequencies** anticipated to occur in a large i.i.d. sample + * A frequentist interpretation as **relative frequencies** anticipated to occur in a large IID. sample * A Bayesian interpretation as a **personal opinion** (about a parameter or list of parameters) after seeing a collection of observations @@ -74,7 +74,7 @@ $$ where the fixed parameter $\theta \in (0,1)$. -This is called the __binomial distribution__. +This is called the **binomial distribution**. Here @@ -138,7 +138,6 @@ class Frequentist: def binomial(self, k): '''Compute the theoretical probability.''' - self.k = k self.P = binom.pmf(k, self.n, self.θ) def draw(self): @@ -211,11 +210,12 @@ fig, ax = plt.subplots(figsize=(8, 6)) ax.grid() ax.plot(thetas, P, 'k-.', label='Theoretical') ax.plot(thetas, f_kI, 'r--', label='Fraction') -plt.title(r'Comparison with different $\theta$', fontsize=16) -plt.xlabel(r'$\theta$', fontsize=15) -plt.ylabel('Fraction', fontsize=15) -plt.tick_params(labelsize=13) -plt.legend() +ax.set_title(r'Comparison with different $\theta$', + fontsize=16) +ax.set_xlabel(r'$\theta$', fontsize=15) +ax.set_ylabel('Fraction', fontsize=15) +ax.tick_params(labelsize=13) +ax.legend() plt.show() ``` @@ -245,11 +245,12 @@ fig, ax = plt.subplots(figsize=(8, 6)) ax.grid() ax.plot(ns, P, 'k-.', label='Theoretical') ax.plot(ns, f_kI, 'r--', label='Frequentist') -plt.title(r'Comparison with different $n$', fontsize=16) -plt.xlabel(r'$n$', fontsize=15) -plt.ylabel('Fraction', fontsize=15) -plt.tick_params(labelsize=13) -plt.legend() +ax.set_title(r'Comparison with different $n$', + fontsize=16) +ax.set_xlabel(r'$n$', fontsize=15) +ax.set_ylabel('Fraction', fontsize=15) +ax.tick_params(labelsize=13) +ax.legend() plt.show() ``` @@ -278,11 +279,12 @@ fig, ax = plt.subplots(figsize=(8, 6)) ax.grid() ax.plot(Is, P, 'k-.', label='Theoretical') ax.plot(Is, f_kI, 'r--', label='Fraction') -plt.title(r'Comparison with different $I$', fontsize=16) -plt.xlabel(r'$I$', fontsize=15) -plt.ylabel('Fraction', fontsize=15) -plt.tick_params(labelsize=13) -plt.legend() +ax.set_title(r'Comparison with different $I$', + fontsize=16) +ax.set_xlabel(r'$I$', fontsize=15) +ax.set_ylabel('Fraction', fontsize=15) +ax.tick_params(labelsize=13) +ax.legend() plt.show() ``` @@ -295,7 +297,7 @@ as an approximation of $\textrm{Prob}(X = k | \theta)$. The Law of Large Numbers is at work here. -For each draw of an independent sequence, $\textrm{Prob}(X_i = k | \theta)$ is the same, so aggregating all draws forms an i.i.d sequence of a binary random variable $\rho_{k,i},i=1,2,...I$, with a mean of $\textrm{Prob}(X = k | \theta)$ and a variance of +For each draw of an independent sequence, $\textrm{Prob}(X_i = k | \theta)$ is the same, so aggregating all draws forms an IID sequence of a binary random variable $\rho_{k,i},i=1,2,...I$, with a mean of $\textrm{Prob}(X = k | \theta)$ and a variance of $$ \textrm{Prob}(X = k | \theta) \cdot (1-\textrm{Prob}(X = k | \theta)). @@ -320,7 +322,7 @@ Instead, we think of it as a **random variable**. $\theta$ is described by a probability distribution. -But now this probability distribution means something different than a relative frequency that we can anticipate to occur in a large i.i.d. sample. +But now this probability distribution means something different than a relative frequency that we can anticipate to occur in a large IID. sample. Instead, the probability distribution of $\theta$ is now a summary of our views about likely values of $\theta$ either @@ -359,7 +361,7 @@ So the posterior is also a beta distribution — a consequence of the beta prior **b)** Please write down the **posterior** distribution for $\theta$ after observing that single flip. -**c)** Now pretend that the true value of $\theta = .4$ and that someone who doesn't know this has a beta prior distribution with parameters with $\beta = \alpha = .5$. Please write a Python class to simulate this person's personal posterior distribution for $\theta$ for a _single_ sequence of $n$ draws. +**c)** Now pretend that the true value of $\theta = .4$ and that someone who doesn't know this has a beta prior distribution with parameters $\beta = \alpha = .5$. Please write a Python class to simulate this person's personal posterior distribution for $\theta$ for a _single_ sequence of $n$ draws. **d)** Please plot the posterior distribution for $\theta$ as a function of $\theta$ as $n$ grows as $1, 2, \ldots$. @@ -497,14 +499,14 @@ interval_df As $n$ increases, we can see that Bayesian coverage intervals narrow and move toward $0.4$. -**f)** The Bayesian coverage interval tells the range of $\theta$ that corresponds to the [$p_1$, $p_2$] quantiles of the cumulative probability distribution (CDF) of the posterior distribution. +**f)** The Bayesian coverage interval tells the range of $\theta$ that corresponds to the [$q_1$, $q_2$] quantiles of the cumulative distribution function (CDF) of the posterior distribution. To construct the coverage interval we first compute a posterior distribution of the unknown parameter $\theta$. -If the CDF is $F(\theta)$, then the Bayesian coverage interval $[a,b]$ for the interval $[p_1,p_2]$ is described by +If the CDF is $F(\theta)$, then the Bayesian coverage interval $[a,b]$ for the interval $[q_1,q_2]$ is described by $$ -F(a)=p_1,F(b)=p_2 +F(a)=q_1,F(b)=q_2 $$ **g)** @@ -545,7 +547,7 @@ The descent seems precipitous only because of the scale of the graph that has t When the number of observations becomes large enough, our Bayesian becomes so confident about $\theta$ that he considers $\theta \in [.45, .55]$ very unlikely. -That is why we see a nearly horizontal line when the number of observations exceeds 500. +That is why we see a nearly horizontal line when the number of observations exceeds 1000. **h)** Using the Python class we made above, we can see the evolution of posterior distributions as $n$ approaches infinity. @@ -569,7 +571,7 @@ As $n$ increases, we can see that the probability density functions _concentrate Here the posterior mean converges to $0.4$ while the posterior standard deviation converges to $0$ from above. -To show this, we compute the means and variances statistics of the posterior distributions. +To show this, we compute the mean and standard deviation of the posterior distributions. ```{code-cell} ipython3 mean_list = [post.mean() for post in bayes.posterior_list] @@ -601,7 +603,7 @@ How shall we interpret the patterns above? The answer is encoded in the Bayesian updating formula derived above. -Recall that after observing $k$ heads in $N$ flips, the posterior is $\textrm{Beta}(\alpha + k, \, \beta + N - k)$. +Recall that after observing $k$ heads in $n$ flips, the posterior is $\textrm{Beta}(\alpha + k, \, \beta + n - k)$. A beta distribution with parameters $\alpha$ and $\beta$ has @@ -609,9 +611,9 @@ A beta distribution with parameters $\alpha$ and $\beta$ has * variance $\frac{\alpha \beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)}$ -Here $\alpha + k$ can be viewed as the number of successes (prior pseudo-count plus observed heads) and $\beta + N - k$ as the number of failures. +Here $\alpha + k$ can be viewed as the number of successes (prior pseudo-count plus observed heads) and $\beta + n - k$ as the number of failures. -Since the data are generated with $\theta = 0.4$, the Law of Large Numbers tells us that, as $N$ grows, $k/N \to 0.4$ (see {ref}`pm_ex1`). +Since the data are generated with $\theta = 0.4$, the Law of Large Numbers tells us that, as $n$ grows, $k/n \to 0.4$ (see {ref}`pm_ex1`). Consequently, the posterior mean converges to $0.4$ and the posterior variance shrinks to zero. From 400d8aba354ca172fa5221d8f4e9bd6bda6c251a Mon Sep 17 00:00:00 2001 From: John Stachurski Date: Thu, 28 May 2026 17:10:56 +1000 Subject: [PATCH 11/11] Adopt new style guide notation conventions Replace \textrm{Prob}(...) with \mathbb{P}{...} and E[...] with \mathbb{E}[...] following QuantEcon.manual#84. Co-Authored-By: Claude Opus 4.7 --- lectures/prob_meaning.md | 30 +++++++++++++++--------------- 1 file changed, 15 insertions(+), 15 deletions(-) diff --git a/lectures/prob_meaning.md b/lectures/prob_meaning.md index a5942ac14..6b06cfb27 100644 --- a/lectures/prob_meaning.md +++ b/lectures/prob_meaning.md @@ -68,7 +68,7 @@ Consider the following classic example. The random variable $X $ takes on possible values $k = 0, 1, 2, \ldots, n$ with probabilities $$ -\textrm{Prob}(X = k | \theta) = +\mathbb{P}\{X = k \mid \theta\} = \left(\frac{n!}{k! (n-k)!} \right) \theta^k (1-\theta)^{n-k} $$ @@ -106,7 +106,7 @@ f_k^I = \frac{\textrm{number of samples of length n for which } \sum_{h=1}^n y_h I} $$ -The probability $\textrm{Prob}(X = k | \theta)$ answers the following question: +The probability $\mathbb{P}\{X = k \mid \theta\}$ answers the following question: * As $I$ becomes large, in what fraction of $I$ independent draws of $n$ coin flips should we anticipate $k$ heads to occur? @@ -118,9 +118,9 @@ As usual, a law of large numbers justifies this answer. 1. Please write a Python class to compute $f_k^I$ 2. Please use your code to compute $f_k^I, k = 0, \ldots , n$ and compare them to - $\textrm{Prob}(X = k | \theta)$ for various values of $\theta, n$ and $I$ + $\mathbb{P}\{X = k \mid \theta\}$ for various values of $\theta, n$ and $I$ -3. With the Law of Large Numbers in mind, use your code to describe the relationship between $f_k^I$ and $\textrm{Prob}(X = k | \theta)$ as $I$ grows +3. With the Law of Large Numbers in mind, use your code to describe the relationship between $f_k^I$ and $\mathbb{P}\{X = k \mid \theta\}$ as $I$ grows ``` ```{solution-start} pm_ex1 @@ -293,20 +293,20 @@ From the above graphs, we can see that **$I$, the number of independent sequence When $I$ becomes larger, the difference between theoretical probability and frequentist estimate becomes smaller. Also, as long as $I$ is large enough, changing $\theta$ or $n$ does not substantially change the accuracy of the observed fraction -as an approximation of $\textrm{Prob}(X = k | \theta)$. +as an approximation of $\mathbb{P}\{X = k \mid \theta\}$. The Law of Large Numbers is at work here. -For each draw of an independent sequence, $\textrm{Prob}(X_i = k | \theta)$ is the same, so aggregating all draws forms an IID sequence of a binary random variable $\rho_{k,i},i=1,2,...I$, with a mean of $\textrm{Prob}(X = k | \theta)$ and a variance of +For each draw of an independent sequence, $\mathbb{P}\{X_i = k \mid \theta\}$ is the same, so aggregating all draws forms an IID sequence of a binary random variable $\rho_{k,i},i=1,2,...I$, with a mean of $\mathbb{P}\{X = k \mid \theta\}$ and a variance of $$ -\textrm{Prob}(X = k | \theta) \cdot (1-\textrm{Prob}(X = k | \theta)). +\mathbb{P}\{X = k \mid \theta\} \cdot (1-\mathbb{P}\{X = k \mid \theta\}). $$ So, by the LLN, the average of $\rho_{k,i}$ converges to: $$ -E[\rho_{k,i}] = \textrm{Prob}(X = k | \theta) = \left(\frac{n!}{k! (n-k)!} \right) \theta^k (1-\theta)^{n-k} +\mathbb{E}[\rho_{k,i}] = \mathbb{P}\{X = k \mid \theta\} = \left(\frac{n!}{k! (n-k)!} \right) \theta^k (1-\theta)^{n-k} $$ as $I$ goes to infinity. @@ -343,13 +343,13 @@ We can update this prior after observing data using Bayes' Law (see {doc}`Probab For a sample of $n$ coin flips that yields $k$ heads, the **likelihood function** is the binomial probability $$ -L(k | \theta) = {n \choose k} \theta^k (1-\theta)^{n-k} +L(k \mid \theta) = {n \choose k} \theta^k (1-\theta)^{n-k} $$ Applying Bayes' Law with our beta prior, the **posterior density** is $$ -p(\theta | k) = \frac{L(k | \theta) \cdot p(\theta)}{\int_0^1 L(k | \theta) \cdot p(\theta) \, d\theta} = \textrm{Beta}(\alpha + k, \, \beta + n - k) +p(\theta \mid k) = \frac{L(k \mid \theta) \cdot p(\theta)}{\int_0^1 L(k \mid \theta) \cdot p(\theta) \, d\theta} = \textrm{Beta}(\alpha + k, \, \beta + n - k) $$ So the posterior is also a beta distribution — a consequence of the beta prior being **conjugate** to the binomial likelihood. @@ -388,25 +388,25 @@ $$ **b)** By Bayes' Law, the posterior density for $\theta$ after observing a single flip $Y$ is $$ -p(\theta | Y) = \frac{L(Y | \theta) \cdot p(\theta)}{\int_{0}^{1} L(Y | \theta) \cdot p(\theta) \, d\theta} +p(\theta \mid Y) = \frac{L(Y \mid \theta) \cdot p(\theta)}{\int_{0}^{1} L(Y \mid \theta) \cdot p(\theta) \, d\theta} $$ Substituting the likelihood from (a) and the beta prior density, this becomes $$ -p(\theta | Y) = \frac{\theta^Y (1-\theta)^{1-Y} \cdot \theta^{\alpha - 1} (1 - \theta)^{\beta - 1} / B(\alpha, \beta)}{\int_{0}^{1} \theta^Y (1-\theta)^{1-Y} \cdot \theta^{\alpha - 1} (1 - \theta)^{\beta - 1} / B(\alpha, \beta) \, d\theta} +p(\theta \mid Y) = \frac{\theta^Y (1-\theta)^{1-Y} \cdot \theta^{\alpha - 1} (1 - \theta)^{\beta - 1} / B(\alpha, \beta)}{\int_{0}^{1} \theta^Y (1-\theta)^{1-Y} \cdot \theta^{\alpha - 1} (1 - \theta)^{\beta - 1} / B(\alpha, \beta) \, d\theta} $$ Collecting powers of $\theta$ and $(1-\theta)$, we recognize the kernel of a beta density: $$ -p(\theta | Y) = \frac{\theta^{Y+\alpha - 1} (1 - \theta)^{1-Y+\beta - 1}}{\int_{0}^{1} \theta^{Y+\alpha - 1} (1 - \theta)^{1-Y+\beta - 1} \, d\theta} +p(\theta \mid Y) = \frac{\theta^{Y+\alpha - 1} (1 - \theta)^{1-Y+\beta - 1}}{\int_{0}^{1} \theta^{Y+\alpha - 1} (1 - \theta)^{1-Y+\beta - 1} \, d\theta} $$ which means that $$ -\theta | Y \sim \textrm{Beta}(\alpha + Y, \, \beta + (1-Y)) +\theta \mid Y \sim \textrm{Beta}(\alpha + Y, \, \beta + (1-Y)) $$ **c)** @@ -656,7 +656,7 @@ So posterior and prior are both beta distributions, albeit ones with different p When a likelihood function and prior fit together like hand and glove in this way, we can say that the prior and posterior are **conjugate distributions**. -In this situation, we also sometimes say that we have **conjugate prior** for the likelihood function $\textrm{Prob}(X | \theta)$. +In this situation, we also sometimes say that we have **conjugate prior** for the likelihood function $\mathbb{P}\{X \mid \theta\}$. Typically, the functional form of the likelihood function determines the functional form of a **conjugate prior**.