In this vignette, we will explain how to compute a Bayes factor for inequality-constrained hypotheses for multinomial models.
As example for an inequality-constrained hypothesis in multinomial
models, we will use the data set, lifestresses
, which is
included in the package multibridge
. The dataset is based
on a survey study by Uhlenhuth et al.
(1974) which among other things has dealt with experienced
negative life events and lifestresses. The subset of the data which we
use here summarizes when respondents have experienced a negative event
in their lives in the 18 months prior to the interview. In total, the
dataset contains the responses of 147 people. This subset was analyzed
by Haberman (1978) who wanted to show that
the participants listed mainly negative events of the recent past.
Furthermore, in the context of inequality-constrained hypotheses the
dataset was discussed in Sarafoglou et al.
(2021).
To get an overview, we will load and visualize the data before we start the data analysis.
library(multibridge)
data(lifestresses)
lifestresses
## month stress.freq stress.percentage
## 1 1 15 10.2
## 2 2 11 7.5
## 3 3 14 9.5
## 4 4 17 11.6
## 5 5 5 3.4
## 6 6 11 7.5
## 7 7 10 6.8
## 8 8 4 2.7
## 9 9 8 5.4
## 10 10 10 6.8
## 11 11 7 4.8
## 12 12 9 6.1
## 13 13 11 7.5
## 14 14 3 2.0
## 15 15 6 4.1
## 16 16 1 0.7
## 17 17 1 0.7
## 18 18 4 2.7
# visualize the data
op <- par(cex.main = 1.5, mar = c(5, 6, 4, 5) + 0.1, mgp = c(3.5, 1, 0), cex.lab = 1.5 , font.lab = 2, cex.axis = 1.3, bty = "n", las = 1)
plot(lifestresses$month, lifestresses$stress.freq, col = "black",
pch = 1, cex = 2, type="b",
xlim = c(0, 18), ylim = c(0, 20),
ylab = "", xlab = "", axes = FALSE)
axis(1, at = seq(0, 18, length.out = 10), label = seq(0, 18, length.out = 10))
axis(2, at = seq(0, 20, length.out = 5), label = seq(0, 20, length.out = 5), las = 1)
par(las = 0)
mtext("Months Before Interview", side = 1, line = 2.5, cex = 1.5)
mtext("Participants Reporting a\nNegative Life Event", side = 2, line = 3.0, cex = 1.5)
par(op)
The model that we will use assumes that the vector of observations \(x_1, \cdots, x_K\) in the \(K\) categories follow a multinomial distribution. The parameter vector of the multinomial model, \(\theta_1, \cdots, \theta_K\), contains the probabilities of observing a value in a particular category; here, it reflects the probabilities of reporting a negative life event or life stress during one of the 18 months. The parameter vector \(\theta_1, \cdots, \theta_K\) is drawn from a Dirichlet distribution with concentration parameters \(\alpha_1, \cdots, \alpha_K\). The model can be described as follows:
\[\begin{align} x_1, \cdots, x_K &\sim \text{Multinomial}(\sum_{k = 1}^K x_k, \theta_1, \cdots, \theta_K) \\ \theta_1, \cdots, \theta_K &\sim \text{Dirichlet}(\alpha_1, \cdots, \alpha_K) \\ \end{align}\]
Here, we test the inequality-constrained hypothesis \(\mathcal{H}_r\) that the number of reported negative life events decreases over time against the encompassing hypothesis \(\mathcal{H}_e\) without constraints:
\[\begin{align*} \mathcal{H}_r &: \theta_{1} > \theta_{2} > \cdots > \theta_{18} \\ \mathcal{H}_e &: \theta_1, \theta_2, \cdots, \theta_{18}. \end{align*}\]
To compute the Bayes factor in favor of the inequality-constrained hypothesis, \(\text{BF}_{re}\), we need to specify (1) a vector containing the number of observations, (2) the inequality-constrained hypothesis, (3) a vector with concentration parameters, and (4) the labels of categories of interest (i.e., months prior to the interview). .
x <- lifestresses$stress.freq
# Test the following restricted Hypothesis:
# Hr: month1 > month2 > ... > month18
Hr <- paste0(1:18, collapse=">")
# Prior specification
# We assign a uniform Dirichlet distribution, that is, we set all concentration parameters to 1
a <- rep(1, 18)
categories <- lifestresses$month
With this information, we can now conduct the analysis with the
function mult_bf_informed()
. Since we are interested in
quantifying evidence in favor of the restricted hypothesis, we set the
Bayes factor type to BFre
. For reproducibility, we are also
setting a seed with the argument seed
:
ineq_results <- multibridge::mult_bf_informed(x=x, Hr=Hr, a=a, factor_levels=categories,
bf_type = 'BFre', seed = 2020)
We can get a quick overview of the results by using the implemented
summary()
method:
m1 <- summary(ineq_results)
m1
## Bayes factor analysis
##
## Hypothesis H_e:
## All parameters are free to vary.
##
## Hypothesis H_r:
## 1 > 2 > 3 > 4 > 5 > 6 > 7 > 8 > 9 > 10 > 11 > 12 > 13 > 14 > 15 > 16 > 17 > 18
##
## Bayes factor estimate BFre:
##
## 162.06
##
## Based on 1 independent inequality-constrained hypothesis.
##
## Relative Mean-Square Error:
##
## 0.000193
##
## Posterior Median and Credible Intervals Of Marginal Beta Distributions:
## alpha beta 2.5% 50% 97.5%
## 1 1 1 + 15 17 + 132 0.05680 0.0953 0.1460
## 2 2 1 + 11 17 + 136 0.03840 0.0710 0.1170
## 3 3 1 + 14 17 + 133 0.05210 0.0893 0.1390
## 4 4 1 + 17 17 + 130 0.06640 0.1080 0.1610
## 5 5 1 + 5 17 + 142 0.01350 0.0345 0.0697
## 6 6 1 + 11 17 + 136 0.03840 0.0710 0.1170
## 7 7 1 + 10 17 + 137 0.03400 0.0649 0.1090
## 8 8 1 + 4 17 + 143 0.00997 0.0284 0.0613
## 9 9 1 + 8 17 + 139 0.02540 0.0528 0.0939
## 10 10 1 + 10 17 + 137 0.03400 0.0649 0.1090
## 11 11 1 + 7 17 + 140 0.02130 0.0467 0.0860
## 12 12 1 + 9 17 + 138 0.02960 0.0588 0.1020
## 13 13 1 + 11 17 + 136 0.03840 0.0710 0.1170
## 14 14 1 + 3 17 + 144 0.00668 0.0223 0.0525
## 15 15 1 + 6 17 + 141 0.01730 0.0406 0.0779
## 16 16 1 + 1 17 + 146 0.00148 0.0102 0.0335
## 17 17 1 + 1 17 + 146 0.00148 0.0102 0.0335
## 18 18 1 + 4 17 + 143 0.00997 0.0284 0.0613
Since we are dealing with many categories, one might want to assess the accuracy of the Bayes factor. This can be done, by inspecting the corresponding error of the estimate. The relative mean-square error of the Bayes factor estimate is about \(1.93299\times 10^{-4}\), which can be considered low.
More information on the accuracy can be extracted from the output
directly. Information about the Bayes factor are stored in
bf_list
under error_measures
.
ineq_results$bf_list$error_measures
## re2 cv percentage
## 1 0.000193299 0.0139032 1.3903%
This dataframe features the approximate relative mean-squared error for the Bayes factor, the approximate coefficient of variation, and the approximate percentage error.