SCOPRO (SCOre PROjection) is an R package that assigns a projection score, ranging from 0 to 1, between a given in vivo stage and each cluster of an in vitro dataset. The score is the fraction of specific markers of the in vivo stage that are conserved in the in vitro cluster.
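As a minimal illustration of how such a score behaves (a toy sketch, not the package's own code), assume a hypothetical logical matrix `conserved` with one row per marker of an in vivo stage and one column per in vitro cluster, where `TRUE` means the marker is conserved in that cluster. The score of each cluster is then the fraction of `TRUE` entries in its column:

```r
# Hypothetical toy input: 4 markers of one in vivo stage x 3 in vitro clusters.
# TRUE = marker conserved in that cluster (values are made up for illustration).
conserved <- matrix(
  c(TRUE,  FALSE, TRUE,
    TRUE,  FALSE, FALSE,
    FALSE, FALSE, TRUE,
    TRUE,  TRUE,  TRUE),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("marker_", 1:4), paste0("cluster_", 0:2))
)

# Score per cluster = fraction of markers conserved, always between 0 and 1
score <- colMeans(conserved)
print(score)
```

Because the score is a fraction of markers, it is 0 when no marker is conserved and 1 when all of them are.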
You can install the development version from GitHub with:
devtools::install_github("ScialdoneLab/SCOPRO", ref = "master")
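The main function of the package is SCOPRO:
SCOPRO(norm_vitro, norm_vivo, cluster_vitro, cluster_vivo, name_vivo, marker_stages_filter, threshold = 0.1, number_link = 1, fold_change = 3, threshold_fold_change = 0.1, marker_stages, selected_stages)
It requires as input the normalized expression matrices of the in vitro and in vivo datasets (norm_vitro, norm_vivo), the cluster labels of the in vitro cells and the stage labels of the in vivo cells (cluster_vitro, cluster_vivo), the in vivo stage to project (name_vivo), the marker genes (marker_stages_filter, marker_stages) and the in vivo stages considered (selected_stages).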
The mean expression profile of the marker_stages_filter genes is computed for each cluster of the in vivo and in vitro datasets. For each cluster, a connectivity matrix is built whose numbers of rows and columns are both equal to the length of marker_stages_filter. Entry (i, j) of the matrix is 1 if the fold change between gene i and gene j is above fold_change, and 0 otherwise. Finally, the connectivity matrix of the name_vivo stage is compared with the connectivity matrices of all clusters in the in vitro dataset. A gene i is considered conserved between name_vivo and an in vitro cluster if the Jaccard index of the links of gene i is above threshold.
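The construction of the connectivity matrix can be sketched as follows. This is a minimal sketch, not SCOPRO's implementation: the helper `connectivity_matrix` is hypothetical, "fold change between gene i and gene j" is assumed to mean the ratio of their mean expression values, and the pseudocount `eps` is an assumption added only to avoid division by zero.

```r
# Hypothetical helper (not part of SCOPRO): builds a 0/1 connectivity matrix
# from the mean expression of the marker genes in one cluster.
# Assumption: entry (i, j) is 1 when mean_expr[i] / mean_expr[j] > fold_change.
connectivity_matrix <- function(mean_expr, fold_change = 3, eps = 1e-6) {
  # outer() forms every pairwise ratio mean_expr[i] / mean_expr[j];
  # eps is an assumed pseudocount that avoids division by zero
  ratios <- outer(mean_expr + eps, mean_expr + eps, "/")
  conn <- (ratios > fold_change) * 1L
  dimnames(conn) <- list(names(mean_expr), names(mean_expr))
  conn
}

# Toy mean expression of three markers in one cluster (made-up values)
mean_expr <- c(geneA = 6, geneB = 1.5, geneC = 0.2)
conn <- connectivity_matrix(mean_expr, fold_change = 3)
print(conn)
```

With fold_change = 3, geneA links to geneB (ratio 4) and to geneC (ratio 30), and geneB links to geneC (ratio 7.5). The diagonal is always 0 because a gene's ratio with itself is 1.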
Below is an example of input using the development version of SCOPRO from GitHub.
current_wd <- getwd()
url <- "https://hmgubox2.helmholtz-muenchen.de/index.php/s/EHQSnjMJxkR7QYT/download/SCOPRO.zip"
destfile <- paste0(current_wd, "/SCOPRO.zip")
download.file(url, destfile, quiet = FALSE)
unzip(destfile, exdir = current_wd)
Load the in vitro dataset (single-cell RNA-seq mouse data from Iturbe et al., 2021):
# Packages used in this example (CIARA provides cluster_analysis_integrate_rare)
library(SCOPRO)
library(CIARA)
library(Seurat)
setwd(paste0(current_wd, "/SCOPRO"))
load(file = "mayra_dati_raw_0.Rda")
mayra_seurat_0 <- cluster_analysis_integrate_rare(mayra_dati_raw_0, "Mayra_data_0", 0.1, 5, 30)
norm_es_vitro <- as.matrix(GetAssayData(mayra_seurat_0, slot = "data", assay = "RNA"))
cluster_es_vitro <- as.vector(mayra_seurat_0$RNA_snn_res.0.1)
Load the in vivo mouse dataset (single-cell RNA-seq data from Deng et al., 2014 and Mohammed et al., 2017):
setwd(paste0(current_wd, "/SCOPRO"))
load(file = "seurat_genes_published_mouse.Rda")
norm_vivo <- as.matrix(GetAssayData(seurat_genes_published_mouse, slot = "data", assay = "RNA"))
Compute markers for the selected in vivo stages using the CIARA function markers_cluster_seurat (based on the Seurat package):
DefaultAssay(seurat_genes_published_mouse) <- "RNA"
cluster_mouse_published <- as.vector(seurat_genes_published_mouse$stim)
relevant_stages <- c("Late_2_cell", "epiblast_4.5", "epiblast_5.5", "epiblast_6.5")
# Markers are computed only on the cells belonging to the relevant stages
markers_first_ESC_small <- CIARA::markers_cluster_seurat(
  seurat_genes_published_mouse[, cluster_mouse_published %in% relevant_stages],
  cluster_mouse_published[cluster_mouse_published %in% relevant_stages],
  names(seurat_genes_published_mouse$RNA_snn_res.0.2)[cluster_mouse_published %in% relevant_stages],
  10
)
markers_mouse <- as.vector(markers_first_ESC_small[[3]])
stages_markers <- names(markers_first_ESC_small[[3]])
## Keeping only the genes in common between in vitro and in vivo datasets
stages_markers <- stages_markers[markers_mouse %in% row.names(norm_es_vitro)]
markers_small <- markers_mouse[markers_mouse %in% row.names(norm_es_vitro)]
names(markers_small) <- stages_markers
For each in vivo stage, we keep only the markers whose median expression is above 0.1 in that stage and below 0.1 in all the other stages.
marker_result <- select_top_markers(relevant_stages, cluster_mouse_published, norm_vivo, markers_small, max_number = 100, threshold = 0.1)
marker_all <- marker_result[[1]]
marker_stages <- marker_result[[2]]
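The median-based selection rule described above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the code of select_top_markers: the function `select_specific_markers` and the toy matrix are hypothetical, genes are assumed to be rows and cells columns, and the max_number cap is not reproduced.

```r
# Hypothetical sketch of the selection rule (not the code of select_top_markers):
# keep the candidate markers of `stage` whose median expression is above
# `threshold` in that stage and below `threshold` in every other stage.
select_specific_markers <- function(expr, stages, stage, candidates, threshold = 0.1) {
  stage_levels <- unique(stages)
  # Median expression of each candidate gene within each stage (genes x stages)
  medians <- matrix(NA_real_, nrow = length(candidates), ncol = length(stage_levels),
                    dimnames = list(candidates, stage_levels))
  for (s in stage_levels) {
    medians[, s] <- apply(expr[candidates, stages == s, drop = FALSE], 1, median)
  }
  other <- setdiff(stage_levels, stage)
  keep <- medians[, stage] > threshold &
    rowSums(medians[, other, drop = FALSE] >= threshold) == 0
  candidates[keep]
}

# Toy expression matrix: 3 genes x 4 cells from two stages (made-up values)
expr <- matrix(c(1.0, 0.8, 0.0, 0.05,   # g1: high in A, low in B
                 0.5, 0.6, 0.4, 0.3,    # g2: expressed in both stages
                 0.0, 0.0, 0.9, 1.2),   # g3: high in B, absent in A
               nrow = 3, byrow = TRUE,
               dimnames = list(c("g1", "g2", "g3"), c("A1", "A2", "B1", "B2")))
stages <- c("A", "A", "B", "B")

markers_A <- select_specific_markers(expr, stages, "A", rownames(expr))
markers_B <- select_specific_markers(expr, stages, "B", rownames(expr))
```

Here g2 is discarded for both stages: its median is above 0.1 in stage A, but it is also above 0.1 in stage B, so it is not specific.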
We run SCOPRO between the clusters of the mouse ESC dataset and the in vivo stage "Late 2-cell".
The SCOPRO function first computes the mean expression profile of the marker_stages_filter genes for each cluster of the in vivo and in vitro datasets. For each cluster, it builds a connectivity matrix whose numbers of rows and columns are both equal to the length of marker_stages_filter. Entry (i, j) is 1 if the fold change between gene i and gene j is above fold_change, and 0 otherwise. The connectivity matrix of the late 2-cell stage is then compared with the connectivity matrices of all clusters in the in vitro dataset. A gene i is considered conserved between the late 2-cell stage and an in vitro cluster if the Jaccard index of the links of gene i is above threshold.
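The per-gene comparison can be illustrated on two toy connectivity matrices, one standing in for the late 2-cell stage and one for an in vitro cluster. This is a hedged sketch: `jaccard_per_gene` is a hypothetical helper, the "links" of gene i are assumed to be the genes j with entry (i, j) equal to 1, and a gene with no links in either matrix is assumed to get an index of 0.

```r
# Hypothetical sketch (not SCOPRO's code): per-gene Jaccard index between the
# links of gene i in the in vivo matrix and in an in vitro cluster matrix.
# Assumption: the "links" of gene i are the genes j with entry (i, j) == 1.
jaccard_per_gene <- function(conn_vivo, conn_vitro) {
  stopifnot(identical(dim(conn_vivo), dim(conn_vitro)))
  inter <- rowSums(conn_vivo == 1 & conn_vitro == 1)
  union <- rowSums(conn_vivo == 1 | conn_vitro == 1)
  # Assumption: a gene with no links in either matrix gets index 0
  ifelse(union > 0, inter / union, 0)
}

genes <- c("geneA", "geneB", "geneC")
conn_vivo  <- matrix(c(0, 1, 1,
                       0, 0, 1,
                       0, 0, 0), nrow = 3, byrow = TRUE,
                     dimnames = list(genes, genes))
conn_vitro <- matrix(c(0, 1, 0,
                       0, 0, 0,
                       0, 0, 0), nrow = 3, byrow = TRUE,
                     dimnames = list(genes, genes))

jac <- jaccard_per_gene(conn_vivo, conn_vitro)
conserved <- jac > 0.1      # threshold = 0.1, as in the example call
score <- mean(conserved)    # fraction of conserved markers
```

Only geneA shares a link (to geneB) in both matrices, giving a Jaccard index of 1/2. The toy score is therefore 1/3.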
There are 25 markers of the late 2-cell stage that are also expressed in the mouse ESC dataset. More than 75% of these 25 markers are conserved in cluster 2. This result is expected, since cluster 2 is made up of 2-cell-like cells (2CLC), a rare population of cells known to be transcriptionally similar to the late 2-cell stage of mouse embryonic development (typical markers of 2CLC are the Zscan4 genes, which are also highly expressed at the late 2-cell stage).
marker_stages_filter <- filter_in_vitro(norm_es_vitro, cluster_es_vitro, marker_all, fraction = 0.10, threshold = 0)
analysis_2cell <- SCOPRO(norm_es_vitro, norm_vivo, cluster_es_vitro, cluster_mouse_published, "Late_2_cell", marker_stages_filter, threshold = 0.1, number_link = 1, fold_change = 3, threshold_fold_change = 0.1, marker_stages, relevant_stages)
plot_score(analysis_2cell, marker_stages, marker_stages_filter, relevant_stages, "Late_2_cell", "Final score", "Cluster", "Late_2_cell")
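To translate the statement that more than 75% of the 25 late 2-cell markers are conserved in cluster 2 into a number of genes, the following arithmetic gives the smallest count whose fraction is strictly above 0.75. The variable names are illustrative only.

```r
n_markers <- 25
# Smallest number of conserved markers whose fraction is strictly above 0.75
min_conserved <- floor(0.75 * n_markers) + 1
min_conserved              # 19
min_conserved / n_markers  # 0.76, the corresponding lower bound on the score
```

This means at least 19 of the 25 markers are conserved in cluster 2, which corresponds to a score of at least 0.76.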
We can visualize which markers of the late 2-cell stage are conserved or not conserved in cluster 2. As expected, the Zscan4 family genes are conserved.
common_genes <- select_common_genes(analysis_2cell, marker_stages, relevant_stages, "Late_2_cell", cluster_es_vitro, "2")
no_common_genes <- select_no_common_genes(analysis_2cell, marker_stages, relevant_stages, "Late_2_cell", cluster_es_vitro, "2")
all_genes <- c(no_common_genes[1:4], common_genes[1:10])
all_genes_label <- c(paste0(no_common_genes[1:4], "-no_conserved"), paste0(common_genes[1:10], "-conserved"))
rabbit_plot <- plot_score_genes(all_genes, "Mouse ESC", "Mouse vitro", norm_es_vitro, norm_vivo[, cluster_mouse_published == "Late_2_cell"], cluster_es_vitro, cluster_mouse_published[cluster_mouse_published == "Late_2_cell"], all_genes_label, 7, 10, "Late_2_cell")
rabbit_plot
The following vignette is available and fully reproducible. It shows the projection performed between single-cell RNA-seq mouse data from Iturbe et al., 2021 and the in vivo mouse datasets from Deng et al., 2014 and Mohammed et al., 2017. It can be accessed within R with:
utils::vignette("SCOPRO_vignette")
Contributions in the form of feedback, comments, code and bug reports are welcome.
* For any contributions, feel free to fork the source code and submit a pull request.
* Please report any issues or bugs here: https://github.com/ScialdoneLab/SCOPRO/issues.
Any questions and requests for support can also be directed to the package maintainer (gabriele[dot]lubatti[at]helmholtz-muenchen[dot]de).