Help & FAQs
If you have any questions, please read this collection of frequently asked questions before contacting us. If
anything is still unclear, feel free to contact us.
Database content
Single-cell genome sequencing has greatly expanded our understanding of complex microbial ecosystems and variant cell states by isolating the contributions of distinct cellular populations. Within the tumor microenvironment, cells exhibit different behaviors driven by the fine-tuning of gene expression and regulation. Identifying cell-specific gene regulatory networks will help us understand disease pathology at the level of individual cells and further contribute to precision medicine.
To open a new gate to the personalized characterization of diseases based on the idea of "One Cell, One World", we describe a comprehensive database, LnCeCell, which documents cellular-specific lncRNA-associated ceRNA networks and biomarkers. Its content comes from high-quality manual curation of the published literature and from high-throughput identification in single-cell genomics data.
LnCeCell curated cellular-specific ceRNA regulations from thousands of cells across 25 types of cancers, including:
1. more than 9,000 experimentally supported lncRNA biomarkers associated with tumor metastasis, recurrence, prognosis, circulating, drug resistance, etc.;
2. cellular-specific ceRNA networks for each of the primary, malignant, metastatic cancer cells and immune cells;
3. detailed information on ceRNA sub-cellular locations, manually curated from the literature and related data sources;
4. clusters of distinct cellular populations that exhibit diverse behaviors such as angiogenesis, apoptosis, cell cycle, invasion, proliferation, stemness, etc.
First, we systematically collected cancer-related scRNA-seq datasets from CancerSEA (http://biocc.hrbmu.edu.cn/CancerSEA/), keeping those with mRNA and lncRNA expression profiles for more than 100 cells (1), which can be used to construct single-cell lncRNA-associated ceRNA networks. This yielded a total of 20 single-cell datasets from 12 cancer types. We also collected cancer-related scRNA-seq datasets from Gene Expression Omnibus (GEO) using the following keyword search: (‘single cell’ OR ‘single-cell’ OR ‘single cells’ OR ‘single-cells’) AND (‘transcriptomics’ OR ‘transcriptome’ OR ‘RNA-seq’ OR ‘RNA-sequencing’ OR ‘RNA sequencing’ OR ‘scRNA-seq’ OR ‘scRNA seq’) AND (‘tumor’ OR ‘tumour’ OR ‘cancer’ OR ‘carcinoma’ OR ‘neoplasm’ OR ‘neoplastic’). We required that each dataset contain more than 100 cancer cells after quality control, and that its expression profiles could be divided into mRNA and lncRNA expression profiles using the annotation from GENCODE (release 34, GRCh38). If the original paper indicated whether cells were malignant, we retained only the malignant cells.
Because single-cell expression profiles carry high technical noise, we applied quality control to the single cells: cells with fewer than 1,000 expressed genes were excluded, and genes with detectable expression in at least 1% of cells were retained. In the end, a total of 94,605 cancer cells from 40 single-cell datasets across 25 cancer types remained for the construction of single-cell ceRNA networks. For each dataset, we show the clustering map of cell populations, construct cellular-specific lncRNA-associated ceRNA networks for all cells in the dataset, show the sub-cellular localization of these ceRNAs, and characterize the functional states of each cell.
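The two quality-control thresholds described above can be sketched in a few lines of Python. This is only an illustration, not the pipeline used to build LnCeCell: the function name `quality_control`, the dictionary layout of the expression table, the rule that a value above zero counts as "expressed", and the order of the two filters (cells first, then genes) are all assumptions.

```python
def quality_control(expr, min_genes=1000, min_cell_fraction=0.01):
    """Filter a genes-by-cells expression table (hypothetical layout).

    expr maps gene name -> list of expression values, one per cell.
    Returns (indices of kept cells, filtered table restricted to kept cells).
    """
    if not expr:
        return [], {}
    n_cells = len(next(iter(expr.values())))
    # Assumption: a gene is "expressed" in a cell when its value is above zero.
    genes_per_cell = [
        sum(1 for values in expr.values() if values[c] > 0)
        for c in range(n_cells)
    ]
    # Exclude cells with fewer than `min_genes` expressed genes.
    kept_cells = [c for c in range(n_cells) if genes_per_cell[c] >= min_genes]
    filtered = {}
    for gene, values in expr.items():
        kept_values = [values[c] for c in kept_cells]
        detected = sum(1 for v in kept_values if v > 0)
        # Retain genes detected in at least `min_cell_fraction` of the kept cells.
        if kept_cells and detected / len(kept_cells) >= min_cell_fraction:
            filtered[gene] = kept_values
    return kept_cells, filtered
```

With the defaults, the function applies the 1,000-gene and 1% thresholds stated in the text; smaller values are convenient for trying it on a toy table.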
To distinguish the functional states of different cancer cells, we downloaded the characteristic gene sets corresponding to 14 functional states (stemness, invasion, metastasis, proliferation, EMT, angiogenesis, apoptosis, cell cycle, differentiation, DNA damage, DNA repair, hypoxia, inflammation and quiescence) from CancerSEA (1). Based on these signatures, the activities of the 14 functional states across single cancer cells in each dataset were evaluated using Gene Set Variation Analysis (GSVA) with the GSVA package in R (2). The sub-cellular and extracellular vesicle locations of lncRNAs, miRNAs and mRNAs were collected from related databases (3-7) and by manual curation of the published literature. A total of 9,306 experimentally supported lncRNA biomarkers associated with drug resistance, circulating, survival, immune, metastasis, recurrence, cell growth, EMT, apoptosis, and autophagy were manually curated from the literature and integrated into the LnCeCell database.
For pathway annotation, a total of 1,329 biological pathway gene sets from KEGG, BioCarta, Reactome, and other biological pathway databases were collected from MSigDB (8). For biological function annotation, a total of 5,917 gene sets representing functional terms were collected from Gene Ontology (9). Ten classic cancer hallmark processes (Self Sufficiency in Growth Signals, Insensitivity to Antigrowth Signals, Evading Apoptosis, Limitless Replicative Potential, Sustained Angiogenesis, Tissue Invasion and Metastasis, Genome Instability and Mutation, Tumor Promoting Inflammation, Reprogramming Energy Metabolism and Evading Immune Detection) were derived from a previous study (10). We manually curated gene sets for the ten cancer hallmark processes from the corresponding GO terms and mapped them to each of the cancer hallmarks.
We collected candidate ceRNA pairs from two databases, starBase v2.0 (11) and LncACTdb 2.0 (12), and used their union as the set of candidate ceRNA regulations. A total of 108,668 candidate ceRNA regulations were collected. To verify whether these ceRNAs were associated in a single cell, we used a published method for cell-specific network construction based on probability theory to identify ceRNA networks in single cells (Figure S1A) (13). In this work, we assume that a ceRNA pair may be associated in some cells but not in others because of differences between cell types.
We determined whether a lncRNA and an mRNA were related in a cell by testing the statistical independence of the expression values of the candidate ceRNA pair in that cell. For a candidate ceRNA pair with mRNA x and lncRNA y in cell k, we calculate the following statistic:
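Assuming the statistic is the normalized one defined by the cited cell-specific network method (13), it can be written as follows, with n_x^{(k)}, n_y^{(k)} and n_{xy}^{(k)} standing for nx(k), ny(k) and nxy(k) in the text:

```latex
\hat{\rho}_{xy}^{(k)} =
  \frac{\sqrt{n-1}\,\bigl(n \cdot n_{xy}^{(k)} - n_x^{(k)}\, n_y^{(k)}\bigr)}
       {\sqrt{n_x^{(k)}\, n_y^{(k)}\, \bigl(n - n_x^{(k)}\bigr)\bigl(n - n_y^{(k)}\bigr)}}
```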
where n is the total number of cells, and nx(k) and ny(k) are predetermined integers; we set nx(k) = ny(k) = 0.1n in this work. We first draw two boxes near xk and yk based on the predetermined nx(k) and ny(k); the third box is then simply the intersection of the first two (Figure S1B). The value of nxy(k) is obtained by counting the points in the third box.
If x and y are independent of each other, this statistic follows the standard normal distribution, with mean 0 and variance 1 over the n cells. We can therefore assess the significance of the association between x and y from the statistic. If P < 0.05, edgexy(k) is set to 1 in the network of cell k; only pairs that meet P < 0.05 in a single cell are retained for network construction. The algorithm requires that a single-cell dataset have both mRNA and lncRNA expression profiles and contain more than 100 cells. There is no strict requirement on the type of scRNA-seq data: the method is not sensitive to the normalization method and is suitable for various types of gene expression matrices.
In scRNA-seq data, most zeros may result from experimental problems; such zeros are biologically meaningless and may introduce errors into the data analysis. Hence, in this work, we treat zeros in the following way (13): (1) If we cannot distinguish whether a zero reflects true zero expression or an experimental problem, edgexy(k) is set to 0 when xk = 0 or yk = 0, regardless of the statistic. (2) If we know that the zeros reflect true zero expression, edgexy(k) is determined by the statistic.
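The per-cell test, including the treatment of zeros, can be sketched as below. This is a minimal illustration under several assumptions, not the implementation of the cited method (13): the statistic is taken to be sqrt(n-1)(n·nxy − nx·ny) / sqrt(nx·ny·(n−nx)(n−ny)), each box is approximated by the cells whose values are closest to that of cell k, the P value is one-sided, and all function and parameter names are hypothetical.

```python
import math

def nearest_indices(values, k, size):
    """Indices of the `size` cells whose values are closest to values[k].

    Assumption: this stands in for the 'box' drawn around cell k.
    Cell k itself is always ranked first among ties.
    """
    order = sorted(range(len(values)),
                   key=lambda i: (abs(values[i] - values[k]), i != k))
    return set(order[:size])

def cell_specific_edge(x, y, k, box_fraction=0.1, alpha=0.05, zeros_are_true=False):
    """Return (statistic, p_value, edge) for the pair (x, y) in cell k.

    Assumes a large number of cells (the text requires more than 100).
    """
    n = len(x)
    if not zeros_are_true and (x[k] == 0 or y[k] == 0):
        # Rule (1): ambiguous zeros give no edge, regardless of the statistic.
        return None, None, 0
    n_x = n_y = max(1, round(box_fraction * n))
    box_x = nearest_indices(x, k, n_x)
    box_y = nearest_indices(y, k, n_y)
    n_xy = len(box_x & box_y)  # cells that fall into the intersection box
    denom = math.sqrt(n_x * n_y * (n - n_x) * (n - n_y))
    stat = math.sqrt(n - 1) * (n * n_xy - n_x * n_y) / denom
    # One-sided upper-tail P value under the standard normal null (assumption).
    p_value = 0.5 * math.erfc(stat / math.sqrt(2))
    return stat, p_value, int(p_value < alpha)
```

Calling `cell_specific_edge` once per cell for a candidate pair gives the row of edge indicators for that pair; setting `zeros_are_true=True` corresponds to rule (2), where zeros are known to be real.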
Figure S1
Using the Seurat package in R (14,15), we clustered cells according to gene expression and according to ceRNA occurrence profiles, respectively. When clustering cells by gene expression, we merged the mRNA and lncRNA expression profiles and clustered the cells on the combined expression profile. When using ceRNA occurrence as the cell feature, if a ceRNA pair was significantly correlated in a cell, the log(p) value was used as the feature value; conversely, if the pair was not significantly correlated in the cell, the feature value was set to 0. This yielded the feature matrix used for clustering, whose rows are ceRNA pairs and whose columns are cells.
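The construction of the ceRNA-occurrence feature matrix can be illustrated with a short Python sketch. The function name, the dictionary layout, the use of the natural logarithm, and the floor that avoids log(0) are assumptions; the text only states that log(p) is used for significant pairs and 0 otherwise.

```python
import math

def cerna_feature_matrix(p_values, alpha=0.05, floor=1e-300):
    """Build the clustering features from per-cell P values.

    p_values maps a ceRNA pair -> list of P values, one per cell.
    Rows of the result are ceRNA pairs, columns are cells.
    """
    return {
        pair: [
            # Significant pair in this cell: use log(p); otherwise use 0.
            math.log(max(p, floor)) if p < alpha else 0.0
            for p in row
        ]
        for pair, row in p_values.items()
    }
```

The resulting matrix would then be handed to the clustering step, which the text performs with Seurat.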
Cancer cells have 14 crucial functional states: stemness, invasion, metastasis, proliferation, EMT, angiogenesis, apoptosis, cell cycle, differentiation, DNA damage, DNA repair, hypoxia, inflammation and quiescence (1). To distinguish the functional states of different cancer cells, we downloaded the characteristic gene sets corresponding to the 14 functional states from CancerSEA. Based on these signatures, the activities of the 14 functional states across single cancer cells in each dataset were evaluated using Gene Set Variation Analysis (GSVA) with the GSVA package in R (2). Finally, we obtained the enrichment scores of the 14 signatures across cells in all scRNA-seq datasets, which were used to characterize signature activity.
LnCeCell provides the CeRNA-Function and CeRNA-Hallmark sections to perform functional analysis of lncRNAs based on a “guilt-by-association” strategy. For a given lncRNA, the corresponding downstream mRNA targets are used to perform a functional enrichment analysis. LnCeCell performs a hypergeometric test to evaluate the significance of enrichment in different functional contexts. If there are a total of N genes in the genome, of which S are involved in the gene set under investigation, and there are a total of M target genes of interest, of which x are involved in the same functional gene set, then the P value can be calculated as:
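With these definitions, the standard upper-tail hypergeometric P value (assumed here to be the form intended) is:

```latex
P = 1 - \sum_{i=0}^{x-1} \frac{\binom{S}{i}\binom{N-S}{M-i}}{\binom{N}{M}}
```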
Significantly enriched functions were defined at the P<0.05 level and further illustrated as a bar graph based on –log10(P) values.
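As a worked illustration of the enrichment test, the sketch below computes the upper-tail hypergeometric P value directly from binomial coefficients and converts it to the bar height. It assumes the standard form P(X ≥ x); the function names are hypothetical and not part of LnCeCell.

```python
from math import comb, log10

def hypergeom_enrichment_p(N, S, M, x):
    """P(X >= x): chance of seeing at least x set members among M targets.

    N: genes in the genome, S: genes in the functional gene set,
    M: target genes analysed, x: targets that belong to the gene set.
    """
    upper = min(S, M)
    total = comb(N, M)
    tail = sum(comb(S, i) * comb(N - S, M - i) for i in range(x, upper + 1))
    return tail / total

def bar_height(p):
    """Height of the bar drawn for an enriched function: -log10(P)."""
    return -log10(p)
```

For example, `hypergeom_enrichment_p(10, 5, 5, 5)` equals 1/252, the probability that all five targets fall into a five-gene set drawn from ten genes.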
The CeRNA-Survival section performs Cox regression analysis and provides Kaplan-Meier survival curves for lncRNAs, miRNAs, mRNAs and the ceRNA triplets they form. LnCeCell derives clinical follow-up information for 10,141 patients from TCGA and performs a univariate Cox regression analysis to evaluate the association between survival status and the expression level of each lncRNA-miRNA-mRNA member of a ceRNA interaction. A risk score model, which takes into account both the strength and the direction (positive or negative) of the association between each competing RNA and the probability of survival, was developed to evaluate the association between survival and expression in a given cancer (12). For each patient, the risk score was calculated by linearly combining the ceRNA expression values weighted by the Cox regression coefficients:
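Written out from the description above and the symbol definitions that follow, the linear combination is:

```latex
\text{Risk score} = \sum_{i=1}^{n} \beta_i \cdot \mathrm{Exp}(c_i)
```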
where βi is the Cox regression coefficient of a lncRNA, miRNA or mRNA in a ceRNA interaction (indicated as ci), n is the number of competing RNAs (n=3 in this study) and Exp(ci) is the expression value of competing RNA ci in the corresponding sample. The median and mean risk scores were used as cut-off points to divide samples into high- and low-risk groups.
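The risk score and the median split can be sketched as follows. The coefficients are taken as given (they would come from the univariate Cox regressions), and the function names and the rule that a score strictly above the cut-off counts as high risk are assumptions.

```python
from statistics import median

def risk_score(betas, expression):
    """Linear combination of expression values weighted by Cox coefficients.

    betas and expression are ordered the same way, for example
    (lncRNA, miRNA, mRNA) for a ceRNA triplet.
    """
    return sum(b * e for b, e in zip(betas, expression))

def split_by_median(scores):
    """Label each patient 'high' or 'low' relative to the median risk score."""
    cutoff = median(scores)
    # Assumption: scores equal to the cut-off are placed in the low-risk group.
    return ["high" if s > cutoff else "low" for s in scores]
```

Replacing `median` with `statistics.mean` gives the mean-based cut-off that the text also mentions.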
LnCeCell provides a user-friendly searching and browsing interface. In addition, as an important supplement to the database, we have set up several flexible tools that facilitate retrieval and analysis of the data, including:
1. Cell-Map provides a global map of ceRNAs identified in distinct cellular populations;
2. Cell-Location provides sub-cellular locations of ceRNAs;
3. Cell-Network tool creates a visualization of a dysregulated ceRNA network in a single cell;
4. Cell-State tool provides a global view of cell behaviors such as angiogenesis, apoptosis, cell cycle, invasion, etc.;
5. CeRNA-Function tool identifies dysregulated functions of lncRNA-associated ceRNAs based on Gene Ontology and biological pathways;
6. CeRNA-Hallmark tool identifies ceRNA-related cancer hallmarks such as Insensitivity to Antigrowth Signals, Tissue Invasion and Metastasis, etc.;
7. LnCeCell-Survival performs Cox regression analysis and plots survival curves for ceRNAs.
Our team has developed several databases and web servers focusing on lncRNA-centric regulatory mechanisms in pan-cancer analysis:
1. LnCeVar: A comprehensive database that aims to infer genomic variations that disturb lncRNA-associated ceRNA regulation;
2. Lnc2Cancer 3.0: A database of experimentally supported lncRNA-cancer associations;
3. LncACTdb 2.0: A database that aims to integrate manually curated and predicted ceRNA interactions;
4. LincSNP 3.0: A database that aims to store and annotate disease-associated SNPs in human lncRNAs and their TFBSs;
5. Lnc2Meth: A database for clarifying the lncRNA-methylation regulatory relationships;
6. MSDD: A manually curated database that provides comprehensive experimentally supported associations among microRNAs (miRNAs), single nucleotide polymorphisms (SNPs) and human diseases.
Xia Li: lixia@hrbmu.edu.cn
Phone & Fax: +86-451-86615922
Address: 194 Xuefu Road, Harbin 150081, CHINA
Web interface
LnCeCell provides a user-friendly searching and browsing interface.
1. Main functions of the database are provided in menu bar form (boxed in red).
2. Click this circle to start a quick search.
3. Click the "GET STARTED" button to go to the Analysis tools panel in LnCeCell.
4. Click the "QUICK SEARCH" button to start a quick search.
5. Click the "GET HELP" button to view the help and FAQs for LnCeCell.
Figure 1-1
LnCeCell provides precise search for four types of datasets (the ceRNA search is shown as an example).
1. Input lncRNA ID or name.
2. Input mRNA ID or name.
3. Click and select the disease of interest.
4. Click and select the tissue of interest.
5. Scroll, drag, or type to set No.miRNAs or Pct.Cells.
Figure 2-1
On the browse page, four datasets are available for browsing: diseases, lncRNAs, mRNAs and biomarkers.
1. Choose a data type you want to browse.
2. Diseases result.
3. lncRNAs and mRNAs result.
4. Biomarkers result.
Figure 3-1
The result page for the ceRNAs dataset is displayed in Figure 4-1, and the result page for the Cells dataset is displayed in Figure 4-2.
1. Click to check the detailed information of the entry.
2. LncRNA basic information (ceRNAs) / cell name (Cells).
3. mRNA basic information (ceRNAs) / disease name (Cells).
4. Percentage and total number of cells (ceRNAs) / tissue name (Cells).
5. Number of related mRNAs (ceRNAs) / cell types (Cells).
6. Disease and tissue name (ceRNAs) / number of ceRNAs detected in this cell (Cells).
7. Various visual information in the entry.
8. Click to download the data, or search the table for the entries you are interested in.
Figure 4-1
Figure 4-2
For each ceRNA unit, we provide basic information, which is displayed in Figure 5-1.
1. You can go directly to the module of interest by clicking the axis.
2. The basic information of the ceRNA event you searched.
Figure 5-1
A global map of ceRNAs identified in distinct cellular populations (ceRNA example) is displayed in Figure 6-1.
1. Displays the number and proportion of ceRNAs in cells.
2. Click and select the lncRNA/mRNA/miRNA/disease of interest.
Figure 6-1
The sub-cellular locations of a ceRNA (ceRNA example) are displayed in Figure 7-1.
1. This section shows the specific organelles in which the ceRNA is located.
Figure 7-1
For the hit ceRNA, we provide functional annotation analysis in LnCeCell, which is displayed in Figure 8-1.
1. Click to select the top 5/10/20/30 enriched functions.
2. The enriched pathways of hit downstream genes.
3. The enriched GO terms of hit downstream genes.
Figure 8-1
For the hit ceRNA, we provide hallmark analysis in LnCeCell, which is displayed in Figure 9-1.
1. Dysregulated hallmarks of the ceRNA event.
Figure 9-1
For the hit ceRNA/Cell, its related network can be displayed using different numbers of neighbors, as shown in Figure 10-1.
1. Click and select one/two/three-step competing neighbors or the visual layout.
2. The hit lncRNA-centric ceRNA network / the hit cell network.
Figure 10-1
For the hit ceRNA, we provide a survival analysis service based on ceRNA expression information, which is displayed in Figures 11-1 and 11-2.
1. Click and select the lncRNA/mRNA/miRNA/disease of interest.
2. The Cox regression analysis result for the hit competing triplet.
3. Survival curves based on mean/median expression.
Figure 11-1
Figure 11-2