Package “Tlasso”

Xiang Lyu, Will Wei Sun, Zhaoran Wang, Han Liu, Jian Yang, Guang Cheng.

2022-01-31

Basic Info

Methodology

Functions

Quick Start

Reference

Basic Info

This package considers the estimation and inference of sparse graphical models that characterize the dependency structure of high-dimensional tensor-valued data. Data are assumed to follow a tensor normal distribution whose covariance has a Kronecker product structure. For estimation, this package provides an alternating minimization algorithm, which iteratively estimates each sparse precision matrix while fixing the others, and attains an estimator with the optimal statistical rate of convergence. Notably, such an estimator achieves estimation consistency with only one tensor sample, which is unobserved in previous work. For inference, this package provides a large-scale multiple testing method for support recovery of sparse precision matrix. A FDR control procedure can be easily implmented into the inference. Test consistency and FDR convergence achieves with only two tensor samples.

Methodology

Estimation: Tlasso Algorithm. Lyu et al. (2019)

A tensor \({\cal T} \in \mathbb R^{m_1 \times m_2 \times \cdots \times m_K}\) follows the tensor normal distribution with zero mean and covariance matrices \(\bf{\Sigma}_1, \ldots, \bf{\Sigma}_K\), denoted as \({\cal T} \sim \textrm{TN}({\bf0}; \bf{\Sigma}_1, \ldots, \bf{\Sigma}_K)\), if its probability density function is \[ p({\cal T}| \bf{\Sigma}_1,\ldots,\bf{\Sigma}_K) = (2\pi)^{-m/2} \biggl\{ \prod_{k=1}^K |\bf{\Sigma}_k|^{-m/(2m_k)} \biggr\} \exp \big(- \|{\cal T} \times \bf{\Sigma}^{-1/2}\|_F^2/2 \big), \] where \(m = \prod_{k=1}^K m_k\) and \(\bf{\Sigma}^{-1/2} := \{\bf{\Sigma}_1^{-1/2},\ldots,\bf{\Sigma}_K^{-1/2}\}\).

A standard approach to estimate precision matrix \(\bf{\Omega}_k^*\), \(k=1,\ldots,K\), is to use the maximum likelihood method. Up to a constant, the negative log-likelihood function of the tensor normal distribution is \[ \ell(\bf{\Omega}_1, \ldots, \bf{\Omega}_K) := \frac{1}{2}\textrm{tr}[\bf{S} (\bf{\Omega}_K \otimes \cdots \otimes \bf{\Omega}_1)] - \frac{1}{2}\sum_{k=1}^K \frac{m}{m_k} \log |\bf{\Omega}_k|, \] where \(\bf{S} := \frac{1}{n} \sum_{i=1}^n \textrm{vec}({\cal T}_i) \textrm{vec}({\cal T}_i)^{\top}\). To encourage the sparsity of each precision matrix in the high-dimensional scenario, a penalized log-likelihood estimator is proposed which minimizes \[ q_n(\bf{\Omega}_1, \ldots, \bf{\Omega}_K) := \frac{1}{m}\textrm{tr}[\bf{S} (\bf{\Omega}_K \otimes \cdots \otimes \bf{\Omega}_1)] - \sum_{k=1}^K \frac{1}{m_k} \log |\bf{\Omega}_k| + \sum_{k=1}^K P_{\lambda_k}(\bf{\Omega}_k), \] where \(P_{\lambda_k}(\bf{\Omega}_k) = \lambda_k \sum_{i\ne j} |[\bf{\Omega}_{k}]_{i,j}|\) is penalty function and \(\lambda_k\) is tuning parameter.

\(q_n(\bf{\Omega}_1, \ldots, \bf{\Omega}_K)\) is jointly non-convex with respect to \(\bf{\Omega}_1, \ldots, \bf{\Omega}_K\). Nevertheless, \(q_n(\bf{\Omega}_1, \ldots, \bf{\Omega}_K)\) is bi-convex problem since \(q_n(\bf{\Omega}_1, \ldots, \bf{\Omega}_K)\) is convex in \(\bf{\Omega}_k\) when the rest \(K-1\) precision matrices are fixed. According to its bi-convex property, this package solves this non-convex problem by alternatively update one precision matrix with other matrices fixed. Note that, for any \(k = 1,\ldots, K\), minimizing \(q_n(\bf{\Omega}_1, \ldots, \bf{\Omega}_K)\) with respect to \(\bf{\Omega}_k\) while fixing the rest \(K-1\) precision matrices is equivalent to minimizing \[ L(\bf{\Omega}_k) := \frac{1}{m_k}\textrm{tr}(\bf{S}_k \bf{\Omega}_k) - \frac{1}{m_k} \log |\bf{\Omega}_k| + \lambda_k \|\bf{\Omega}_k\|_{1,\textrm{off}}. \] Here \(\bf{S}_k := \frac{m_k}{n m}\sum_{i=1}^n \bf{V}_i^k \bf{V}_i^{k\top}\), where \(\bf{V}_i^k := \big[ {\cal T}_i \times \big\{\bf{\Omega}_1^{1/2},\ldots,\bf{\Omega}_{k-1}^{1/2}, 1_{m_k}, \bf{\Omega}_{k+1}^{1/2},\ldots,\bf{\Omega}_{K}^{1/2} \big\} \big]_{(k)}\) with \(\times\) the tensor product operation and \([\cdot]_{(k)}\) the mode-\(k\) matricization operation defined in . Note that minimizing \(L(\bf{\Omega}_k)\) corresponds to estimating vector-valued Gaussian graphical model and can be solved efficiently via the glasso algorithm Friedman et al. (2008). See Lyu et al. (2019) for detailed algorithm, named Tlasso.

Inference:

Multiple testing method is established to testing the entries of precision matrix for each way of the tensor, Lyu et al. (2019). We focus on \(\bf{\Omega}_1^*\), and procedures on the rest \(K-1\) precision matrices are symmetric. Hypothesis for \(\bf{\Omega}_1^*\) is, \(\forall \,1 \le i < j \le m_1\), \[ H_{0 1 , i j } : \; [{\bf{\Omega}}_1^*]_{i,j} =0 \quad\quad \textrm{vs} \quad\quad H_{11 , ij} : \; [\bf{\Omega}_1^*]_{i,j} \ne 0 \]

Let index \(-i\) in \(k\)-th mode denote all elements of mode-\(k\) except the \(i\)-th one. From the distribution of \({\cal T }_{: , i_2 , \ldots , i_K}\), we have, for every \(i_k \in \{ 1 , \ldots , m_k\}\), \(k \in \{ 1 , \ldots , K\}\), \({\cal T}_{i_1 , i_2 , \ldots , i_K} | {\cal T}_{-i_1 , i_2 , \ldots , i_K} \sim \textrm{N} ( - [\bf{\Omega}_1^*]_{i_1,i_1}^{-1} [\bf{\Omega}_1^*]_{i_1, - i_1} {\cal T}_{-i_1 , i_2 , \ldots , i_K} ; [\bf{\Omega}_1^*]_{i_1, i_1 }^{-1} \prod_{k=2}^K[\bf{\Sigma}_k^*]_{i_k, i_k} )\). An equivalent linear model is \[ {\cal T }_{l ; i_1, i_2 , \ldots , i_K} = {\cal T }_{l ; - i_1, i_2 , \ldots , i_K}^\top \bf{\theta}_{i_1} + \xi_{l ; i_1, i_2 , \ldots , i_K}, \forall \, l \in \{ 1 , \ldots , n \} \] where \(\bf{\theta}_{i_1} = - [\bf{\Omega}_1^*]_{i_1,i_1}^{-1} [\bf{\Omega}_1^*]_{i_1, - i_1} \text{ , and }\, \xi_{l ;i_1, i_2 , \ldots , i_K} \sim \textrm{N} (0 \, ; [\bf{\Omega}_1^*]_{i_1, i_1 }^{-1} \prod_{k=2}^K [\bf{\Sigma}_k^*]_{i_k, i_k}).\) Note that intercept term is eliminated since zero mean of \({\cal T }_{: , i_2 , \ldots , i_K}\).

Test statistic is constructed by both bias and variance correction of sample covariance of residuals. Let \(\hat{\bf{\theta}}_{i_1} = (\hat{\theta}_{1, i_1} , \ldots , \hat{\theta}_{m_1-1, i_1})^\top = - [\hat{\bf{\Omega}}_1]_{i_1,i_1}^{-1} [\hat{\bf{\Omega}}_1]_{i_1, - i_1}\) be the estimator of \(\bf{\theta}_{i_1}\), where \(\hat{\bf{\Omega}}_1\) is the output of Tlasso Algorithm. Given \(\{ \hat{\bf{\theta}}_{i_1} \}_{i_1 = 1 }^{m_1}\), the residual of the linear model is defined as \[ \hat{\xi}_{l ; i_1, i_2 , \ldots , i_K} = {\cal T }_{l ; i_1, i_2 , \ldots , i_K} - \bar{{\cal T }}_{ i_1, i_2 , \ldots , i_K} - ( {\cal T }_{l ; - i_1, i_2 , \ldots , i_K} - \bar{{\cal T }}_{ - i_1, i_2 , \ldots , i_K} )^\top \hat{\bf{\theta}}_{i_1}, \] where \(\bar{{\cal T }} = \frac{1}{n}\sum \limits_{l=1}^{n} {\cal T }_{l }\). The sample covariance of residuals is
\[ \hat{\varrho}_{i,j}= \frac{m_1}{(n-1) m } \sum \limits_{l=1}^{n} \sum \limits_{i_2=1}^{m_2} \cdots \sum \limits_{i_K=1}^{m_K} \hat{\xi}_{l ; i, i_2 , \ldots , i_K} \hat{\xi}_{l ; j, i_2 , \ldots , i_K} . \] Test statistic is \[\tau_{i,j} = \frac{\hat{\varrho}_{i.j} + \mu_{i,j}}{\varpi}, \forall 1 \le i < j \le m_1.\] The bias correction term \(\mu_{i,j}=\hat{\varrho}_{i,i} \hat{\theta}_{i , j} + \hat{\varrho}_{j,j} \hat{\theta}_{j-1, i}\) translates the mean of \(\tau_{i,j}\) to zero under \(H_{01, ij}\). The variance correction term \[ \varpi^2 = \frac{m \cdot \|\widehat{\bf{S}}_2\|_F^2 \cdots \|\widehat{\bf{S}}_K\|_F^2}{m_1 \cdot (\textrm{tr}(\widehat{\bf{S}}_2))^2 \cdots (\textrm{tr}(\widehat{\bf{S}}_K))^2}, \] plays a significant role in rescaling \(\tau_{i,j}\) into an asymptotic standard normal distribution, where $ k := {i=1}^n _i _i^{} $ is the estimate of \(\bf{\Sigma}_k\) with \(\widehat{\bf{V}}_i := \big[ {\cal T}_i \times \bigl\{\widehat{\bf{\Omega}}_1^{1/2},\ldots,\widehat{\bf{\Omega}}_{k-1}^{1/2}, 1_{m_k}, \widehat{\bf{\Omega}}_{k+1}^{1/2},\ldots,\widehat{\bf{\Omega}}_{K}^{1/2} \bigr\} \big]_{(k)}\) and \(\widehat{\bf{\Omega}}_{k}\) from Tlasso Algorithm, \(k \in \{2 , \ldots , K \}\). Lyu et al. (2019) proves that, under certain conditions, \[ \sqrt{\frac{(n-1) m }{ m_1 \hat{\varrho}_{i,i} \hat{\varrho}_{j,j} }} \tau_{i,j} \rightarrow \textrm{N} ( 0 ;1 ) \] in distribution, as \(nm/m_1 \rightarrow \infty\).

Let \(\tilde{\tau}_{i,j}= \sqrt{(n-1) m/ (m_1\hat{\varrho}_{i,i} \hat{\varrho}_{j,j}) } \tau_{i,j}\). Given the values of \(\tilde{\tau}_{ij}\) and thresholding level \(\varsigma\), we define a test procedure \(\varphi_{\varsigma} (\tilde{\tau}_{i,j})= 1 \{ |\tilde{\tau}_{i,j}| \ge \varsigma \}\) and reject \(H_{01, ij}\) if \(\varphi_{\varsigma} (\tilde{\tau}_{i,j})=1\). The definition of FDR/FDP in our notation is \[ \textrm{FDP} = \frac{| \{ (i,j) \in {\cal H}_0 : \varphi_{\varsigma} (\tilde{\tau}_{i,j})=1 \} | }{ | \{ (i,j) : 1 \le i < j \le m_1, \varphi_{\varsigma} (\tilde{\tau}_{i,j})=1 \} | \vee 1 } \, \text{ , and } \,\textrm{FDR} = \bf{E} (\textrm{FDP}). \] where \({\cal H}_0 = \{ (i,j) : [\bf{\Omega}_1^*]_{i,j} = 0 , 1 \le i < j \le m_1 \}\). To recover the support of \(\bf{\Omega}_1^*\), \((m_1-1)m_1/2\) tests are simultaneously conducted. Hence, a small enough \(\varsigma\) that enhances power while controls FDP under a pre-specific level \(\upsilon \in (0,1)\) is an ideal choice. In particular, \(\varsigma_{*} = \inf \{ \varsigma > 0: \text{FDP} \le \upsilon \}\). However, \(\varsigma_*\) is oracle since we have no access to the information of \({\cal H}_0\). Due to sparsity, \(w_0 =|{\cal H}_0|\) can be approximated by \(w= m_1(m_1 -1 )/2\). \(P(\varphi_{\varsigma} (\tilde{\tau}_{i,j})=1)\) is close to \(2(1 - \Phi( \varsigma))\) by test consistency. The approximation of \(\varsigma_*\) is \[ \hat{\varsigma}= \inf \bigg \{ \varsigma > 0: \frac{2(1-\Phi( \varsigma )) w}{ | \{ (i,j) : 1 \le i < j \le m_1, \varphi_{\varsigma} (\tilde{\tau}_{i,j})=1 \} | \vee 1 } \le \upsilon \bigg \}. \]

FDR/FDP control procedure is to infer the support of \(\bf{\Omega}_1^*\) at thresholding level \(\hat{\varsigma}\). In particular, FDR and FDP in the procedure are \[ \textrm{FDP}_1 = \frac{| \{ (i,j) \in {\cal H}_0 : \varphi_{\hat{\varsigma}} (\tilde{\tau}_{i,j})=1 \} | }{ | \{ (i,j) : 1 \le i < j \le m_1, \varphi_{\hat{\varsigma}} (\tilde{\tau}_{i,j})=1 \} | \vee 1 } \text{ , and } \,\textrm{FDR}_1 = \bf{E} (\textrm{FDP}_1). \] Note that we set \(\tilde{\tau}_{i,j} = \tilde{\tau}_{j,i}\) for \(1 \le i < j \le m_1\). The inference of \(\textrm{supp}(\bf{\Omega}_k^*) , \, k \in \{2 , \ldots , K \}\), is a symmetric procedure.

Lyu et al. (2019) gives the asymptotic control of \(\text{FDP}_1\) and \(\text{FDR}_1\) for the support of \(\bf{\Omega}_{1}^*\): under certain conditions, \[\textrm{FDP}_1 w / \upsilon w_0 \rightarrow 1 , \; \textrm{FDR}_1 w / \upsilon w_0 \rightarrow 1 \] in probability as \(nm/m_1 \rightarrow \infty\).

Back to Top

Functions

This packages contains following functions:

  1. ChainOmega
    This function generates precision matrix of triangle graph (chain like network) following the set-up in Fan et al. (2009). It first constructs a covariance matrix \(\Sigma\) that its (i,j) entry is \(\exp (- | h_i - h_j |/2)\) with \(h_1 < h_2 < \ldots < h_p\). The difference \(h_i - h_{i+1}\) is generated i.i.d. from Unif(0.5,1). See Fan et al. (2009) for more details.
  2. NeighborOmega
    This function generates precision matrix of nearest neighbor network following the set-up in Li and Gui (2006) and Lee and Liu (2006). For a knn nearest-neighbor graph, this function first randomly picks p points from a unit square and computes all pairwise distances among the points. Then it searches for the knn nearest-neighbors of each point and a pair of symmetric entries in the precision matrix that has a random chosen value from \([-1, -0.5] \cup [0.5, 1]\). Finally, to ensure positive definite property, it normalizes the matrix as \(\Omega <- \Omega + (\lambda_{\min} (\Omega) + 0.2 ) 1_p\) where \(\lambda_{\min} (\cdot )\) refers to the samllest eigenvalue.
  3. Trnorm
    This function generates obeservations from separable tensor normal distribution and returns a \(m_1 * \ldots * m_K * n\) array. If Sigma.list is not given, default distribution is from either triangle graph or nearest-neighbor graph (depends on type).
  4. Tlasso.fit
    This function conducts an alternating optimization algorithm to precision matrices of sparse tensor graphical models. The output is optimal consistent even when \(T=1\), see Lyu et al. (2019) for details. There are two ternimation criteria, T and thres. Algorithm will be terminated if output in certain iteration change less than thres. Otherwise, T iterations will be fully operated.
  5. covres
    This function generates sample covariance matrix of residuals (includes diagnoal) and is the basis for support recovery procedure, see Lyu et al. (2019). Note that output matrix includes diagnoal while bias corrected matrix (output of biascor) for inference is off-diagnoal. Elements in Omega.list are true precision matrices or estimation of the true ones, the latter can be output of Tlasso.fit.
  6. biascor
    This function computes bias corrected sample covariance matrix of residuals (excludes diagnoal, diagnoal is zero vector). Note that output matrix excludes diagnoal while sample covariance of residuals includes diagnoal, see Lyu et al. (2019) for details. Elements in Omega.list are true precision matrices or estimation of the true ones, the latter can be output of Tlasso.fit.
  7. varcor
    This function computes variance correction term of sample covariance of residuals and is utilized to normalize test statistic into standord normal, see Lyu et al. (2019) for details.
  8. est.analysis
    This function generates a list of performance measures of optimazation for sparse tensor graphical models, i.e., estimation errors and model selection consistency. Errors are measured in Frobenius norm and Max norm. Model selection measures are TPR and TNR. All these measures are computed in each mode, average across all modes, and kronecker production of precision matrices.
  9. infer.analysis
    This function computes performance measures of inference for sparse tensor graphical models. False positive, false negative, discovery (number of rejected null hypothesis), non-discovery (number of non-rejected null hypothesis), and total non-zero entries of each true precision matrix is listed in output.
  10. graph.pattern
    This function draws an undirected graph based on presicion matrix to present connection among variables. If an entry is zero, then no edge is connected between corresponding pair of nodes.

Back to Top

Quick Start

The purpose of this section is to show users the basic usage of this package. We will briefly go through main functions, see what they can do and have a look at outputs. An detailed example of complete procedures of estimation and inference will be presented to give users a general sense of the pakcage.

First, we load Tlasso package:

library(Tlasso)

Then, we generate a list of precision matrices of triangle graph.

m.vec = c(5,5,5)  # dimensionality of a tensor 
# m1, m2, m3
n = 5   # sample size 

Omega.true.list = list()
for (k in 1:length(m.vec)) {
  Omega.true.list[[k]] = ChainOmega(m.vec[k], sd = k)
}

Omega.true.list[[1]]
##            [,1]       [,2]       [,3]       [,4]       [,5]
## [1,]  0.3168143 -0.2308893  0.0000000  0.0000000  0.0000000
## [2,] -0.2308893  0.4674875 -0.2123306  0.0000000  0.0000000
## [3,]  0.0000000 -0.2123306  0.4234692 -0.1841058  0.0000000
## [4,]  0.0000000  0.0000000 -0.1841058  0.3658496 -0.1499391
## [5,]  0.0000000  0.0000000  0.0000000 -0.1499391  0.2415995

ChainOmega returns a precision matrix of triangle graph with dimension m.vec[k] and seed number k. Given precision matrices, we generate obervations from corresponding tensor normal distribution.

Sigma.true.list = list()
for (k in 1:length(m.vec)) {
  Sigma.true.list[[k]] = solve(Omega.true.list[[k]])
} # generate covariance matrices list

DATA=Trnorm(n,m.vec,Sigma.list=Sigma.true.list) 
# obersavations from tensor normal distribution

Trnorm generates observations from separable tensor normal distribution with covariance matrix Sigma.list[[k]] for kth mode. DATA is a \(m_1 * m_2 * m_3 * n\) array, i.e, a \(10 \times 10 \times 10 \times 10\) array. Default distribution is from triangle graph or 4 near-neighbor graph (depends on type).

DATA2=Trnorm(n,m.vec)
# default is triangle graph 
# equivalent to DATA2 = Trnorm(n,m.vec, type='Chain', sd=1)
DATA3=Trnorm(n,m.vec,type='Neighbor')
# 4 nearest-neighbor graph
# equivalent to DATA3 = Trnorm(n,m.vec, type='Neighbor', sd=1, knn=4)

Given observations DATA, we use Tlasso.fit to conduct alternating optimization.

lambda.thm = 20*c( sqrt(log(m.vec[1])/(n*prod(m.vec))), 
                   sqrt(log(m.vec[2])/(n*prod(m.vec))), 
                   sqrt(log(m.vec[3])/(n*prod(m.vec))))
# lambda.thm is regularization parameter
out.tlasso = Tlasso.fit(DATA,T=1,lambda.vec = lambda.thm)   
# output is a list of estimation of precision matrices
out.tlasso[[1]]
##              [,1]        [,2]         [,3]        [,4]        [,5]
## [1,]  0.361422637 -0.23704270  0.002333856 -0.06126152  0.01539202
## [2,] -0.237046526  0.46822624 -0.192384949 -0.05154091  0.05256162
## [3,]  0.002304876 -0.19236551  0.424993048 -0.17817458 -0.07814616
## [4,] -0.061257515 -0.05155747 -0.178173408  0.37108685 -0.08295747
## [5,]  0.015405071  0.05255995 -0.078142842 -0.08296053  0.19265707

Tlasso.fit generates a list of precision matrices out.tlasso. Default is maximal iteration T=1 and termination thresholding level thres=1e-05. If estimation output changes less than thres, in terms of Frobenius norm, after certain iteration, Tlasso.fit will be terminated immediately (before Tth iteration). The performance of Tlasso.fit can be evaluated by est.analysis.

# compare out.tlasso and Omega.true.list
# main diagnoal is taken into consideration
est.analysis(out.tlasso,Omega.true.list,offdiag=FALSE)
## $error.kro
## [1] 0.3656762
## 
## $tpr.kro
## [1] 1
## 
## $tnr.kro
## [1] 0.4989574
## 
## $av.error.f
## [1] 0.1986636
## 
## $av.error.max
## [1] 0.08618703
## 
## $av.tpr
## [1] 1
## 
## $av.tnr
## [1] 0.3333333
## 
## $error.f
## [1] 0.2130124 0.0980819 0.2848964
## 
## $error.max
## [1] 0.07814616 0.05873962 0.12167531
## 
## $tpr
## [1] 1 1 1
## 
## $tnr
## [1] 0.0000000 0.3333333 0.6666667

est.analysis returns a list of estimation performance measures:

Argument Explanation
Out$error.kro error in Frobenius norm of kronecker product
Out$tpr.kro TPR of kronecker product
Out$tnr.kro TNR of kronecker product
Out$av.error.f averaged Frobenius norm error across all modes
Out$av.error.max averaged Max norm error across all modes
Out$av.tpr averaged TPR across all modes
Out$av.tnr averaged TNR across all modes
Out$error.f vector; error in Frobenius norm of each mode
Out$error.max vector; error in Max norm of each mode
Out$tpr vector; TPR of each mode
Out$tnr vector; TNR of each mode

Given DATA and out.tlasso, we next show how to compute test statistic.

mat.list=list() # list of matrices of test statistic value  
for ( k in 1:length(m.vec)) {
  rho=covres(DATA, out.tlasso, k = k) 
  # sample covariance matrix of residuals, including diagnoal 
  bias_rho=biascor(rho,out.tlasso,k=k)
  # bias corrected sample covariance of residuals, excluding diagnoal
  
  varpi2=varcor(DATA, out.tlasso, k = k)
  # variance correction term for kth mode's sample covariance of residuals

  tautest=matrix(0,m.vec[k],m.vec[k])
  for( i in 1:(m.vec[k]-1)) {
    for ( j in (i+1):m.vec[k]){
      tautest[j,i]=tautest[i,j]=sqrt((n-1)*prod(m.vec[-k]))*
                      bias_rho[i,j]/sqrt(varpi2*rho[i,i]*rho[j,j])
      # compute final test statistic 
    }
  }
  
  mat.list[[k]]=tautest
}
mat.list[[1]]
##            [,1]      [,2]      [,3]     [,4]       [,5]
## [1,]  0.0000000  2.761395 -0.231802 1.509672 -0.4881653
## [2,]  2.7613945  0.000000  2.411092 0.797850 -1.2962198
## [3,] -0.2318020  2.411092  0.000000 2.153166  1.2980072
## [4,]  1.5096719  0.797850  2.153166 0.000000  2.3900777
## [5,] -0.4881653 -1.296220  1.298007 2.390078  0.0000000

To compute test statistic, we first need to compute sample covariance of residuals via rho. rho returns a sample covariance matrix, including diagnoal. Then we conduct bias correction biascor and variance correction varcor. biascor returns a bias corrected sample covariance matrix. varcor returns a scalar to scale test statistic into standard normal. Given test statistic value mat.list, we turn to test hypothesis. The significant level we choose is \(0.95\).

# inference measures (off-diagnoal), critical value is 0.975 quantile of standard normal
infer.analysis(mat.list, qnorm(0.975), Omega.true.list, offdiag=TRUE)
## $fp
## [1] 0 0 2
## 
## $fn
## [1] 0 0 4
## 
## $d
## [1] 8 8 6
## 
## $nd
## [1] 12 12 14
## 
## $t
## [1] 8 8 8

infer.analysis returns a list of inference performance measures:

Argument Explanation
Out$fp vector; number of false positive of each mode
Out$fn vector; number of false negative of each mode
Out$d vector; number of all discovery of each mode
Out$nd vector; number of all non-discovery of each mode
Out$t vector; number of all true non-zero entries of each mode

Due to the fact that this inference procedure relies on multiple testing, FDR control is indispensible. Lyu et al. (2019) provides an easy-implemented and efficient FDR control procedure. This procedure asymptotically controls FDR via selecting the smallest critical value that contrains FDP under certain level.

k=1 # interested mode 
upsilon=0.1  # control level

# compute the difference between FDP and upsilon
fun=function(varsigma,mk,upsilon,tautest) {
  return((2*(1-pnorm(varsigma))*mk*(mk-1))/max(1,sum(sign(abs(tautest) > varsigma))) - upsilon)
}

# select a critical value in (0,6) that has the samllest difference 
diff=c();ind=1;inter=seq(0,6,0.0001)
for (varsigma in inter) {
    diff[ind]=fun(varsigma,mk=m.vec[k],upsilon=upsilon,tautest=mat.list[[k]])
    ind=ind+1
}
# the smallest critical value that constrains FDP under upsilon
critical=inter[min(which(diff < 0))]

# testing hypothesis with the critcal value 
# FDR will converge to the limit proved in Lyu et al. 2019.
inference.FDR=infer.analysis(mat.list, critical, Omega.true.list, offdiag=TRUE)

Finally, we would like to visualize graph structure of inference via graph.pattern.

k=1 # interested mode
# true graph structure. 
# set thres=0 in case true edge is eliminated
graph.pattern(Omega.true.list[[1]],main='True graph of mode 1',thres=0)

inf.mat=mat.list[[k]] > qnorm(0.975)
# set thres=0 (<1) since inf.mat is logical
graph.pattern(inf.mat,main='Inference of mode 1',thres=0)

From graphs we can see that inference result quite matches true graph.

Back to Top

Reference

  1. Fan J, Feng Y, Wu Y. Network exploration via the adaptive LASSO and SCAD penalties. The annals of applied statistics, 2009, 3(2): 521.
  2. Friedman J, Hastie T, Tibshirani R. Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 2008: 9.3: 432-441.
  3. Lee W, Liu Y. Joint estimation of multiple precision matrices with common structures. Journal of Machine Learning Research, 2015, 16: 1035-1062.
  4. Li H, Gui J. Gradient directed regularization for sparse Gaussian concentration graphs, with applications to inference of genetic networks. Biostatistics, 2006, 7(2): 302-317.
  5. Lyu, X., Sun, W. W., Wang, Z., Liu, H., Yang, J., Cheng, G. Tensor graphical model: Non-convex optimization and statistical inference. IEEE transactions on pattern analysis and machine intelligence, 2019: 42(8): 2024-2037.