From 84c84cb0c4f8c951b681c60d8115991b753e4fb1 Mon Sep 17 00:00:00 2001 From: Dennis Leung <43949365+dmhleung@users.noreply.github.com> Date: Thu, 5 Mar 2026 18:47:11 +1100 Subject: [PATCH] Some amendments by Dennis to 03 and 04 Script 4 is still not checked thru by Dennis. Will continue next time. --- quarto_scripts/03-Example3-1and3-11Caes.qmd | 7 +- quarto_scripts/04-Example3-2BTR.qmd | 322 +++++++++++--------- 2 files changed, 173 insertions(+), 156 deletions(-) diff --git a/quarto_scripts/03-Example3-1and3-11Caes.qmd b/quarto_scripts/03-Example3-1and3-11Caes.qmd index 7c7129b..709dc01 100644 --- a/quarto_scripts/03-Example3-1and3-11Caes.qmd +++ b/quarto_scripts/03-Example3-1and3-11Caes.qmd @@ -213,8 +213,7 @@ The codebase is in Fortran, and is very fast. We can use `glmnet` to fit the same multinomial model by setting `lambda = 0` (giving no weight to the regularisation step) to get the unregularized estimates. -If you look up `?glmnet`, pay attention to the multinomial parameterization, -since the returned coefficients are not on the same baseline coding as `multinom()`. +If you look up `?glmnet`, pay attention to what the documentation says about the symmetric multinomial model, which has a different parametrization; [see the section on "Multinomial Regression" in their vignette](https://glmnet.stanford.edu/articles/glmnet.html#multinomial-regression-family-multinomial) for the precise form of this parametrization. Because of this different parametrization, the returned fitted coefficients won't be the same as those from `multinom()`. ```{r} #| label: glmnet-multinomial-fit @@ -384,7 +383,7 @@ because it fits a wide variety of multivariate response regression models. It is not as fast as `glmnet` or `nnet` for large datasets, because it uses the Fisher scoring algorithm (like the book).
-Is is good practice to +It is good practice to [read the documentation](https://search.r-project.org/CRAN/refmans/VGAM/html/vglm.html) for the `vglm()` function with `?vglm` to understand the arguments and output structure. @@ -638,4 +637,4 @@ for (pkg in packages) { cat("- **", pkg, "**: ", cit_text, "\n\n", sep = "") } } -``` \ No newline at end of file +``` diff --git a/quarto_scripts/04-Example3-2BTR.qmd b/quarto_scripts/04-Example3-2BTR.qmd index 7c7598f..9102d24 100644 --- a/quarto_scripts/04-Example3-2BTR.qmd +++ b/quarto_scripts/04-Example3-2BTR.qmd @@ -1,13 +1,17 @@ --- title: "Ordinal Regression: Breathing Test Results." subtitle: "@fahrmeir_multivariate_2001 Examples 3.2 & 3.4" -date: today +editor: + markdown: + wrap: 72 --- ## TODO -- Look at 3.3.3 pg 31/40 for variable information on thetas and gammas. -- Write up more about the different link functions (2025-04-11 lecture 23 min) +- Look at 3.3.3 pg 31/40 for variable information on thetas and + gammas. +- Write up more about the different link functions (2025-04-11 lecture + 23 min) ## Data Loading and Setup @@ -21,13 +25,21 @@ library(readr) library(tidyr) ``` -We load the BTR dataset. -It contains breathing test results normal, borderline, and abnormal -(which we can consider ordered categories), -and considered the effects of the covariates age and smoking status. +We load the BTR dataset. It contains breathing test results of normal, +borderline, and abnormal (which we can consider ordered categories), and +considers the effects of the covariates age and smoking status. ```{r} #| label: load-btr +btr_dat <- read_csv( + "data/Example3-2BTR.csv" + ) +``` + +If we just call `read_csv()` as is, we can see that the variables `BTR`, `Age`, and `Smoking` are all read as characters. This is not what we want, so we will use the `col_types` argument to specify that we want them as factors.
+ +```{r} +#| label: load-btr-factor btr_dat <- read_csv( "data/Example3-2BTR.csv", col_types = cols( @@ -41,9 +53,9 @@ btr_dat <- read_csv( btr_dat ``` -We can see that `BTR`, `Age`, and `Smoking` are all factors. -We check if the breathing test result (`BTR`) is encoded as an ordered factor, -and if not, we convert it to an ordered factor. +We can see that `BTR`, `Age`, and `Smoking` are all factors now. We check if +the breathing test result (`BTR`) is encoded as an ordered factor, and +if not, we convert it to an ordered factor. ```{r} #| label: order-btr @@ -52,25 +64,28 @@ btr_dat$BTR <- as.ordered(btr_dat$BTR) ``` We can now confirm the conversion was successful. + ```{r} #| label: check-btr-ordered class(btr_dat$BTR) btr_dat$BTR ``` -The default ordering happens to be good for us, -but if we needed to change the order, we could specify it with +The default ordering happens to be good for us, but if we needed to +change the order, we could specify it with `factor(..., levels = c(...), ordered = TRUE)`. We now examine the current contrast coding scheme: + ```{r} #| label: get-contrasts-opt getOption("contrasts") ``` `"contr.treatment"` corresponds to dummy coding for unordered factors. -For ordered factors, the default is `"contr.poly"` (orthogonal polynomials). -We consider `Age` and `Smoking` as unordered factors, and we will use dummy coding. +For ordered factors, the default is `"contr.poly"` (orthogonal +polynomials). We consider `Age` and `Smoking` as unordered factors, and +we will use dummy coding. 
```{r} #| label: show-contrasts contrasts(btr_dat$Smoking) contrasts(btr_dat$Age) ``` -The default reference categories are `never-smoked` for `Smoking`, -and the `<40` for Age, -which matches @fahrmeir_multivariate_2001, -however they use **effect coding** rather than **dummy coding.** +The default reference categories are `never-smoked` for `Smoking`, and +`<40` for `Age`, which matches @fahrmeir_multivariate_2001; however, +they use **effect coding** rather than **dummy coding.** -::: {.callout-note} -- **Dummy coding**: Reference category is 0, other categories are 1; -coefficients represent difference from reference category -- **Effect coding**: Reference category is -1, other categories are 0 or 1; -coefficients represent difference from overall mean +::: callout-note +- **Dummy coding**: Reference category is coded 0; each other category + gets its own indicator coded 1; coefficients represent differences + from the reference category +- **Effect coding**: Reference category is coded -1, other categories + are 0 or 1; coefficients represent differences from the overall mean ::: ## Model Fitting: Proportional Odds Models -The `polr()` function fits cumulative logit (and other) models. -[Looking at the documentation](https://search.r-project.org/CRAN/refmans/MASS/html/polr.html) -using `?polr`, -the sign of coefficients in `polr()` is **opposite** to the textbook formula. +The `polr()` function fits cumulative logit (and other) models. [Looking +at the +documentation](https://search.r-project.org/CRAN/refmans/MASS/html/polr.html) +using `?polr`, we see that the sign of the coefficients in `polr()` is +**opposite** to the textbook formula. -We first fit a cumulative logistic/proportional odds model -with an intercept, age, smoking. -We also include an interaction effect between age and smoking. +We first fit a cumulative logistic/proportional odds model with an +intercept, age, and smoking. We also include an interaction effect between +age and smoking.
```{r} #| label: fit-logit-int @@ -115,21 +130,21 @@ btr_logit_inter <- polr( ::: {.callout-tip collapse="true"} ## `polr()` Function Syntax -- **`formula`**: Model structure `response ~ predictors`; -response must be an ordered factor -- **`data`**: Data frame containing the variables -- **`weights`**: Case weights (here, frequency counts for each covariate combination) -- **`Hess = TRUE`**: Return the Hessian matrix; -needed for `summary()` and confidence intervals -- **`method`**: Link function---`"logistic"` -(default, proportional odds), `"probit"`, `"cloglog"`, `"loglog"`, or `"cauchit"` +- **`formula`**: Model structure `response ~ predictors`; response + must be an ordered factor +- **`data`**: Data frame containing the variables +- **`weights`**: Case weights (here, frequency counts for each + covariate combination) +- **`Hess = TRUE`**: Return the Hessian matrix; needed for `summary()` + and confidence intervals +- **`method`**: Link function---`"logistic"` (default, proportional + odds), `"probit"`, `"cloglog"`, `"loglog"`, or `"cauchit"` ::: -The `polr()` function is limited. -It can model standard cumulative models -(see section 3.3.1 of @fahrmeir_multivariate_2001), -but doesn't allow for extended cumulative model with threshold variables -(see section 3.3.2 of @fahrmeir_multivariate_2001). +The `polr()` function is limited. It can model standard cumulative +models (see section 3.3.1 of @fahrmeir_multivariate_2001), but doesn't +allow for extended cumulative models with threshold variables (see +section 3.3.2 of @fahrmeir_multivariate_2001). ```{r} #| label: summary-logit-int summary(btr_logit_inter) ``` ::: {.callout-important collapse="true"} ## SELF TEST: Interpret Coefficients in `polr()` for a Former Smoker age 40 to 59 -A former smoker who is Age 40 to 59 then the cumulative odds of having a normal test -compared to not having a normal test is $\exp(2.8334)$ -times the odds for a person who has never smoked.
+For a former smoker aged 40 to 59, the cumulative odds of having a +normal test compared to not having a normal test are $\exp(2.8334)$ times +the odds for a person who has never smoked. -If smoking has no association with the breathing test results, -then we would expect this number to be 1. +If smoking has no association with the breathing test results, then we +would expect this number to be 1. ::: +There is no residual readily available, but we can observe the linear +predictor $\eta$ **as defined in the documentation page for the `polr()` +function, which does not include the intercept**. -There is no residual readily available, but we can observe the linear predictor -(represented as $\eta$ in the documentation, -but referred to in our subject with $\mathbf{x} \hat{\boldsymbol{\gamma}}$). ```{r} #| label: linear-predictor btr_logit_inter$lp ``` -The $\zeta$'s are our theshold parameters $\theta$ +The $\zeta$'s are the threshold parameters $\theta$'s as defined in our +textbook and slides. ```{r} #| label: threshold-zeta @@ -202,12 +218,14 @@ summary(btr_logit_main) Again the signs of the coefficients are the opposite to our model. -Since the interaction model is nested within the main effects model, -we can compare them with a likelihood ratio test. +Since the interaction model is nested within the main effects model, we +can compare them with a likelihood ratio test. ::: {.callout-important collapse="true"} ## SELF TEST: How Many Degrees of Freedom will the Likelihood Ratio Test have? + +The test has 2 degrees of freedom since we are adding two interaction +terms (`Age40to59:Smoking2Former` and `Age40to59:Smoking3Current`).
::: We can use `anova()`: @@ -220,26 +238,26 @@ anova(btr_logit_inter, btr_logit_main) ::: {.callout-important collapse="true"} ## SELF TEST: Interpret the Likelihood Ratio Test Results -We reject the null hypothesis that there is no interaction, -at all reasonable significance levels. +We reject the null hypothesis that there is no interaction, at all +reasonable significance levels. ::: -::: {.callout-note} -The LR statistic in the `anova()` output is the difference -between the deviances of the two models. -Each deviance reported is $-2 \times \log(\text{likelihood})$ for that model. +::: callout-note +The LR statistic in the `anova()` output is the difference between the +deviances of the two models. Each deviance reported is +$-2 \times \log(\text{likelihood})$ for that model. ::: ## Alternative Link Functions -The proportional odds assumption can be tested -by fitting models with different link functions. -The `polr()` function supports: +The proportional odds assumption can be tested by fitting models with +different link functions. The `polr()` function supports: -- **logistic**: cumulative logit model (default) -- **probit**: cumulative normal model -- **cloglog**: complementary log-log (grouped Cox/proportional hazards) -- **loglog**: extreme maximal-value distribution. +- **logistic**: cumulative logit model (default) +- **probit**: cumulative normal model +- **cloglog**: complementary log-log (grouped Cox/proportional + hazards) +- **loglog**: extreme maximal-value distribution. See Figure 3.1 of @fahrmeir_multivariate_2001 for some good diagrams demonstrating the shapes of these link functions. @@ -263,7 +281,8 @@ summary(btr_ph) ### Extreme Maximal-Value Distribution -This model is just a reparameterisation of the proportional hazard model. +This model is just a reparameterisation of the proportional hazard +model. 
```{r} #| label: fit-loglog-int @@ -280,9 +299,8 @@ summary(btr_extm) ## Matching our Results to @fahrmeir_multivariate_2001 Example 3.4 -By default, R uses dummy coding. -To match to the textbook, we need to use effect coding. -We switch the global contrast option and refit. +By default, R uses dummy coding. To match to the textbook, we need to +use effect coding. We switch the global contrast option and refit. ```{r} #| label: set-effect-coding @@ -293,8 +311,8 @@ contrasts(btr_dat$Smoking) contrasts(btr_dat$Age) ``` -We now have the final categories coded as -1, as required. -Therefore we refit both models: +We now have the final categories coded as -1, as required. Therefore we +refit both models: ```{r} #| label: fit-logit-eff @@ -318,7 +336,8 @@ summary(btr_logit_inter_eff) summary(btr_logit_main_eff) ``` -The values reported in the summaries now match the textbook (excluding rounding errors). +The values reported in the summaries now match the textbook (excluding +rounding errors). In addition, we can redo the likelihood ratio test with the new models to confirm we get the same result. @@ -332,20 +351,19 @@ As expected, the likelihood ratio test results are the same as before. ## Saturated Model and Deviance Calculation -In each of the regressed model summaries, -there is a value "Residual Deviance" reported. -We need to be careful to understand what this exactly means, -because as we have previously seen, -this can be highly package dependent. +In each of the regressed model summaries, there is a value "Residual +Deviance" reported. We need to be careful to understand what this +exactly means, because as we have previously seen, this can be highly +package dependent. -The deviance reported by `polr()` is -$-2 \times \log(\text{likelihood})$ for the fitted model. -We can manually compute the **residual deviance** (difference from saturated model). +The deviance reported by `polr()` is $-2 \times \log(\text{likelihood})$ +for the fitted model. 
We can manually compute the **residual deviance** (difference from saturated model). -We first compute the log-likelihood of the saturated model. -We create a matrix with each row corresponding to a unique covariate combination, -and each column counting the frequency of each response per category -for that covariate combination. +We first compute the log-likelihood of the saturated model. We create a +matrix with each row corresponding to a unique covariate combination, +and each column counting the frequency of each response per category for +that covariate combination. ```{r} #| label: make-count-matrix @@ -372,12 +390,11 @@ probs <- data_mat / rowSums(data_mat) probs ``` -This is the also the matrix of maximum likelihood estimates -for the saturated model. -We can manually calculate the log-likelihood of the saturated model -using the multinomial distribution -by first calculating the likelihood for each cell, (treating $0 \log(0) := 0$), -and summing over all cells to get the total log-likelihood for the saturated model. +This is also the matrix of maximum likelihood estimates for the +saturated model. We can manually calculate the log-likelihood of the +saturated model using the multinomial distribution by first calculating +the log-likelihood contribution of each cell (treating $0 \log(0) := 0$), and summing +over all cells to get the total log-likelihood for the saturated model. ```{r} #| label: loglike-matrix @@ -394,22 +411,19 @@ loglike_saturated ::: {.callout-note collapse="true"} ## SELF TEST: Explain Why We Compute the Saturated Log-Likelihood with this Sum -The above computation is under the ungrouped view of the data.
The +probability mass function for the multinomial distribution is hence: $$ \Pr(Y_1 = y_1, \ldots, Y_k = y_k) = \prod_{i = 1}^{6} p_1^{y_{i1}} p_2^{y_{i2}} p_3^{y_{i3}}, -$$ -where $y_i$ are the observed counts and $p_i$ are the probabilities for each category. -The log-likelihood is then: -$$\sum_{i,j} y_{ij} \log(p_{ij}),$$ -where we treat $0 \log(0) = 0$. +$$ where $y_{ij}$ are the observed counts and $p_{ij}$ are the probabilities +for category $j$ in covariate combination $i$. The log-likelihood is then: +$$\sum_{i,j} y_{ij} \log(p_{ij}),$$ where we treat $0 \log(0) = 0$. ::: -We can recover the deviance reported in Table 3.3 of @fahrmeir_multivariate_2001 -by computing the residual deviance as the difference -between the reported 'deviance' of the fitted model -and the deviance of the saturated model. +We can recover the deviance reported in Table 3.3 of +@fahrmeir_multivariate_2001 by computing the residual deviance as the +difference between the reported 'deviance' of the fitted model and the +deviance of the saturated model. ```{r} #| label: residual-dev-logit residual_dev_logit <- 2 * loglike_saturated + btr_logit_inter_eff$deviance residual_dev_logit ``` -This confirms that the so called 'deviance' reported by `polr()` -is actually just twice the negative of the log-likelihood of the fitted model. -This is very different from what we usually consider the deviance. +This confirms that the so-called 'deviance' reported by `polr()` is +actually just twice the negative of the log-likelihood of the fitted +model. This is very different from what we usually consider the +deviance.
### Returning the Deviance for Alternative Link Functions -We fit models with complementary log-log and loglog link functions using effect coding, -then summarize results in a comparison table: +We fit models with complementary log-log and loglog link functions using +effect coding, then summarize results in a comparison table: -::: {.callout-tip} +::: callout-tip The table is built in three steps: -1. Fit one model per link and extract coefficient estimates -with `coef(summary(...))[ ,"Value"]`. -2. Build two small tibbles (one for coefficients, one for residual deviance) -and stack them with `bind_rows()`. -3. Use `mutate(across(where(is.numeric), ~ round(., 4)))` -to round all numeric columns for cleaner display. +1. Fit one model per link and extract coefficient estimates with + `coef(summary(...))[ ,"Value"]`. +2. Build two small tibbles (one for coefficients, one for residual + deviance) and stack them with `bind_rows()`. +3. Use `mutate(across(where(is.numeric), ~ round(., 4)))` to round all + numeric columns for cleaner display. -`rownames(coef(...))` supplies the parameter labels -so each estimate aligns across link functions. +`rownames(coef(...))` supplies the parameter labels so each estimate +aligns across link functions. ::: ```{r} @@ -495,9 +510,10 @@ results_table ::: {.callout-tip collapse="true"} ## Alternative: Nested Data + Functional Mapping Workflow -This can be written in a nested/list-column style using functional programming. -The idea is to store one row per link function, fit each model via `map()`, -then map again to extract coefficients and residual deviance. +This can be written in a nested/list-column style using functional +programming. The idea is to store one row per link function, fit each +model via `map()`, then map again to extract coefficients and residual +deviance. 
```{r} #| label: fit-alt-links-eff-functional @@ -550,15 +566,14 @@ results_table_map <- bind_rows(coef_long, dev_long) |> results_table_map ``` - ::: -We have now matched Table 3.3 of @fahrmeir_multivariate_2001 -(with inverse signs for the coefficients). +We have now matched Table 3.3 of @fahrmeir_multivariate_2001 (with +inverse signs for the coefficients). -It is **highly recommended** that you read through Example 3.4 in the textbook -to get an idea of intepreting the results. It is worth considering here -that the proportional hazard has the lowest deviance, +It is **highly recommended** that you read through Example 3.4 in the +textbook to get an idea of how to interpret the results. It is worth +considering here that the proportional hazard model has the lowest deviance, and gives a much smaller threshold parameter. ## Other Tricks and Commands @@ -574,8 +589,8 @@ residuals(btr_logit_inter_eff) ### Fitted Values and Predictions -Fitted probabilities can be extracted directly -or using the generic `predict()` function: +Fitted probabilities can be extracted directly or using the generic +`predict()` function: ```{r} #| label: fitted-probs @@ -588,8 +603,8 @@ Both approaches yield identical results. ### Model Selection with AIC -The `step()` function performs automatic model selection using the AIC criterion. -(not directly covered in this course unless time allows) +The `step()` function performs automatic model selection using the AIC +criterion. (not directly covered in this course unless time allows) ```{r} #| label: model-selection-aic step(btr_logit_inter_eff) ``` ### Variance-Covariance Matrix and Standard Errors -We can extract the estimated variance-covariance matrix of the parameter estimates -using `vcov()`. -This should line up with the inverse of the Fisher information matrix. +We can extract the estimated variance-covariance matrix of the parameter +estimates using `vcov()`.
This should line up with the inverse of the +Fisher information matrix. ```{r} #| label: vcov-extraction @@ -608,9 +623,9 @@ logit_vcov <- vcov(btr_logit_inter_eff) logit_vcov ``` -From this we can match the standard errors reported in the summary output, -since they are the square root of the diagonal elements -of the variance-covariance matrix: +From this we can match the standard errors reported in the summary +output, since they are the square root of the diagonal elements of the +variance-covariance matrix: ```{r} #| label: vcov-comparison @@ -620,11 +635,11 @@ se_from_summary <- summary(btr_logit_inter_eff)$coefficients[, "Std. Error"] se_from_vcov == se_from_summary ``` -We should be able to recover the same confidence intervals -by using the variance-covariance matrix and a normal approximation (Wald intervals). +We should be able to recover the same confidence intervals by using the +variance-covariance matrix and a normal approximation (Wald intervals). -We can check the covariance matrix -against the inverse of the Hessian matrix (Fisher information) returned by `polr()`: +We can check the covariance matrix against the inverse of the Hessian +matrix (Fisher information) returned by `polr()`: ```{r} #| label: hessian-comparison @@ -633,6 +648,7 @@ hessian_inv ``` We can see that the matrices are slightly different. + ```{r} #| label: 04-example3-2btr-chunk-32 all.equal(hessian_inv, logit_vcov) @@ -640,16 +656,17 @@ all.equal(hessian_inv, logit_vcov) ### Confidence Intervals -R provides the generic method `confint()` -to compute confidence intervals for model parameters. +R provides the generic method `confint()` to compute confidence +intervals for model parameters. 
```{r} #| label: confint-profile confint(btr_logit_inter_eff) |> round(3) ``` -We can attempt to recovery the same confidence intervals using a Wald approximation -based on the coefficient estimates and their standard errors: +We can attempt to recover the same confidence intervals using a Wald +approximation based on the coefficient estimates and their standard +errors: ```{r} #| label: confint-wald-manual-upper wald_upper <- coeff_est + qnorm(0.975) * sqrt(diag(vcov(btr_logit_inter_eff))) wald_lower <- coeff_est + qnorm(0.025) * sqrt(diag(vcov(btr_logit_inter_eff))) tibble(wald_lower, wald_upper) ``` -These values are different from the confidence intervals computed by `confint()`. -This is because the `confint()` function uses the profile likelihood method, -which is more suitable for some non-linear regression problems. +These values are different from the confidence intervals computed by +`confint()`. This is because the `confint()` function uses the profile +likelihood method, which is more suitable for some non-linear regression +problems. -For more details see @venables_modern_2002, p. 220, -and run `profile(btr_logit_inter_eff)`. +For more details see @venables_modern_2002, p. 220, and run +`profile(btr_logit_inter_eff)`. ```{r} #| label: package-citations @@ -688,4 +706,4 @@ for (pkg in packages) { cat("- **", pkg, "**: ", cit_text, "\n\n", sep = "") } } -``` \ No newline at end of file +```