First, we generate the spatial transcriptomics data with lattice neighborhood, i.e. ST platform by using the function gendata_RNAExp
in DR.SC
package, which is a Seurat object format. It is noted that the meta.data
must include spatial coordinates in columns named “row” (x coordinates) and “col” (y coordinates)!
This preprocessing includes Log-normalization and feature selection. Here we select highly variable genes for example first. The selected genes’ names are saved in “seu@assays$RNA@var.features”
For function DR.SC
, users can specify the number of clusters \(K\) or set K
to be an integer vector by using modified BIC(MBIC) to determine \(K\). First, we try using user-specified number of clusters. Then we show the version chosen by MBIC.
After finishing model fitting, we use ajusted rand index (ARI
) to check the performance of clustering
Next, we show the application of DR-SC in visualization. First, we can visualize the clusters from DR-SC on the spatial coordinates.
We can also visualize the clusters from DR-SC on the two-dimensional tSNE based on the extracted features from DR-SC.
Show the UMAP plot based on the extracted features from DR-SC.
Use MBIC to choose number of clusters:
First, we select the spatilly variable genes using funciton FindSVGs
.
### Given K
seu <- NormalizeData(seu, verbose=F)
# choose 400 spatially variable features using FindSVGs
seus <- FindSVGs(seu, nfeatures = 400, verbose = F)
seu2 <- DR.SC(seus, q=4, K=4, platform = 'ST', verbose=F)
Using ARI to check the performance of clustering
Show the spatial scatter plot for clusters
Show the tSNE plot based on the extracted features from DR-SC.
Show the UMAP plot based on the extracted features from DR-SC.
Use MBIC to choose number of clusters:
Conduct visualization of marker gene expression. ### Ridge plots Visualize single cell expression distributions in each cluster from Seruat.
dat <- FindAllMarkers(seu2)
suppressPackageStartupMessages(library(dplyr) )
# Find the top 1 marker genes, user can change n to access more marker genes
dat %>%group_by(cluster) %>%
top_n(n = 1, wt = avg_log2FC) -> top
genes <- top$gene
RidgePlot(seu2, features = genes, ncol = 2)
Visualize single cell expression distributions in each cluster
We extract tSNE based on the features from DR-SC and then visualize feature expression in the low-dimensional space
The size of the dot corresponds to the percentage of cells expressing the feature in each cluster. The color represents the average expression level
Single cell heatmap of feature expression
# standard scaling (no regression)
dat %>%group_by(cluster) %>%
top_n(n = 30, wt = avg_log2FC) -> top
### select the marker genes that are also the variable genes.
genes <- intersect(top$gene, seu2[['RNA']]@var.features)
## Change the HVGs to SVGs
# <- topSVGs(seu2, 400)
seu2 <- ScaleData(seu2, verbose = F)
DoHeatmap(subset(seu2, downsample = 500),features = genes, size = 5)