Ex. 3 - Retrieving regression coefficients

Yuan-Ling Liaw and Waldir Leoncio

library(lsasim)
packageVersion("lsasim")
[1] '2.1.5'

With the arguments theta = TRUE, full_output = TRUE and family = "gaussian", the output of questionnaire_gen automatically contains the \(\beta\) vector and the \(R\) matrix (i.e., beta_gen is called from within questionnaire_gen).


We generate one latent trait and four covariates: two continuous, one binary, and one with three categories. The data are generated from a multivariate normal distribution. The logical argument full_output is set to TRUE.

set.seed(1234)
bg <- questionnaire_gen(n_obs = 1000, n_X = 2, n_W = list(2, 3), theta = TRUE, family = "gaussian",
    full_output = TRUE)
str(bg$bg)
'data.frame':   1000 obs. of  6 variables:
 $ subject: int  1 2 3 4 5 6 7 8 9 10 ...
 $ theta  : num  -1.732 0.707 0.911 1.509 -0.5 ...
 $ q1     : num  -0.491 0.16 -0.39 -1.307 0.602 ...
 $ q2     : num  0.5499 0.0669 0.087 -1.628 0.2559 ...
 $ q3     : Factor w/ 2 levels "1","2": 2 2 2 1 2 2 1 1 2 1 ...
 $ q4     : Factor w/ 3 levels "1","2","3": 1 2 3 2 1 2 1 1 1 3 ...

linear_regression is a list with two elements. The first element, betas, contains the true regression coefficients \(\beta\). The second element, vcov_YXW, contains the \(R\) matrix, i.e., the covariance matrix of theta and the covariates.

bg$linear_regression
$betas
     theta         q1         q2       q3.2       q4.2       q4.3 
-0.8174844 -0.5836818 -0.5402292 -0.1049199  0.8699444  1.7398887 

$vcov_YXW
            theta          q1          q2          q3.2          q4.2        q4.3
theta  1.00000000 -0.17564245 -0.26487195 -7.889541e-02  0.000000e+00  0.12421718
q1    -0.17564245  1.00000000 -0.28027150  3.541595e-02  0.000000e+00  0.14963273
q2    -0.26487195 -0.28027150  1.00000000  2.556079e-01  0.000000e+00  0.07965237
q3.2  -0.07889541  0.03541595  0.25560791  2.500000e-01 -2.775558e-17  0.06097692
q4.2   0.00000000  0.00000000  0.00000000 -2.775558e-17  2.400000e-01 -0.12000000
q4.3   0.12421718  0.14963273  0.07965237  6.097692e-02 -1.200000e-01  0.21000000

beta_gen uses the output from questionnaire_gen to generate linear regression coefficients.

beta_gen(bg)
     theta         x1         x2        w12        w22        w23 
-0.8174844 -0.5836818 -0.5402292 -0.1049199  0.8699444  1.7398887 
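As a consistency check on the coefficients above, the sketch below recomputes them in base R from the printed vcov_YXW matrix by solving the usual least-squares normal equations. This is an illustration only and not necessarily how beta_gen is implemented internally. The matrix entries are copied from the rounded output above (with -2.775558e-17 entered as 0). The covariate means are assumptions: 0 for theta, q1 and q2, and category proportions of 0.5, 0.4 and 0.3 for q3.2, q4.2 and q4.3, inferred from the printed variances and covariances of the dummy variables.

```r
# Covariance matrix of theta and the covariates, copied from the printed
# vcov_YXW output (rounded; the -2.775558e-17 entries are entered as 0).
nm <- c("theta", "q1", "q2", "q3.2", "q4.2", "q4.3")
R <- matrix(c(
   1.00000000, -0.17564245, -0.26487195, -0.07889541,  0.00,  0.12421718,
  -0.17564245,  1.00000000, -0.28027150,  0.03541595,  0.00,  0.14963273,
  -0.26487195, -0.28027150,  1.00000000,  0.25560791,  0.00,  0.07965237,
  -0.07889541,  0.03541595,  0.25560791,  0.25000000,  0.00,  0.06097692,
   0.00000000,  0.00000000,  0.00000000,  0.00000000,  0.24, -0.12000000,
   0.12421718,  0.14963273,  0.07965237,  0.06097692, -0.12,  0.21000000
), nrow = 6, byrow = TRUE, dimnames = list(nm, nm))

# Slopes: solve Cov(covariates) %*% beta = Cov(covariates, theta).
slopes <- solve(R[-1, -1], R[1, -1])

# Assumed means (not part of the printed output): 0 for the continuous
# variables, and proportions p for the dummies, chosen so that p * (1 - p)
# matches the printed variances 0.25, 0.24 and 0.21.
mu_covariates <- c(q1 = 0, q2 = 0, q3.2 = 0.5, q4.2 = 0.4, q4.3 = 0.3)
mu_theta <- 0

# Intercept: mean of theta minus the slopes applied to the covariate means.
intercept <- mu_theta - sum(slopes * mu_covariates)

print(round(c(theta = intercept, slopes), 4))
```

Under these assumptions, the result should agree with the betas printed above up to rounding, with the element labeled theta playing the role of the intercept.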

If the logical argument MC is TRUE in beta_gen, Monte Carlo simulation is used to estimate regression coefficients. If the logical argument rename_to_q is TRUE, the background variables are all labeled as q to match the default behavior of questionnaire_gen.

The first column, cov_matrix, contains the true \(\beta\), computed from the covariance matrix; it is always the same for the same data. The MC column reports the Monte Carlo estimates of \(\beta\), which are sample-dependent and change each time beta_gen is called. The next two columns give the bounds of the 99% confidence interval for these estimates. The cov_in_CI column indicates whether each cov_matrix value lies within its confidence interval (“1” corresponds to “yes” and “0” to “no”).

beta_gen(bg, MC = TRUE, MC_replications = 100, rename_to_q = TRUE)
      cov_matrix         MC       0.5%       99.5% cov_in_CI
theta -0.8174844 -0.7528447 -0.8380719 -0.64147366         1
q1    -0.5836818 -0.5627695 -0.6279581 -0.49666666         1
q2    -0.5402292 -0.4950472 -0.5571855 -0.44193911         1
q3.2  -0.1049199 -0.1876017 -0.3357658 -0.08231601         1
q4.2   0.8699444  0.8165117  0.7057719  0.92812704         1
q4.3   1.7398887  1.7389447  1.5867220  1.88949655         1
beta_gen(bg, MC = TRUE, MC_replications = 100, rename_to_q = TRUE)
      cov_matrix         MC       0.5%       99.5% cov_in_CI
theta -0.8174844 -0.7485083 -0.8550206 -0.63323259         1
q1    -0.5836818 -0.5650178 -0.6309527 -0.51627209         1
q2    -0.5402292 -0.4954573 -0.5544919 -0.43396311         1
q3.2  -0.1049199 -0.1793640 -0.3039317 -0.05262306         1
q4.2   0.8699444  0.8092411  0.6481412  0.95344844         1
q4.3   1.7398887  1.7205485  1.5569970  1.90153362         1
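To see the same sample dependence on a single dataset, the true coefficients can be compared with an ordinary least-squares fit to the generated background data. The sketch below is a minimal illustration using lm from base R on the same questionnaire_gen call as above. It is not the procedure beta_gen uses for its Monte Carlo estimates, and the object names fit and comparison are introduced here only for the example.

```r
library(lsasim)

# Regenerate the data exactly as in the example above.
set.seed(1234)
bg <- questionnaire_gen(n_obs = 1000, n_X = 2, n_W = list(2, 3), theta = TRUE,
                        family = "gaussian", full_output = TRUE)

# Regress the latent trait on all background variables; the factors q3 and q4
# are expanded into dummy variables by lm, with category 1 as the reference.
fit <- lm(theta ~ q1 + q2 + q3 + q4, data = bg$bg)

# Place the true coefficients next to the estimates from this one sample.
comparison <- cbind(true   = as.numeric(bg$linear_regression$betas),
                    sample = as.numeric(coef(fit)))
rownames(comparison) <- names(coef(fit))
print(round(comparison, 3))
```

The sample column depends on the particular 1000 observations drawn, so it will generally differ somewhat from the true values, much like the MC column in the tables above.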