LnCeCell: Help

Help & FAQs

If you have any questions, please read this collection of frequently asked questions before contacting us. If
you are still unclear about something, feel free to contact us.

Database content

Background of LnCeCell.

Single-cell genome sequencing has greatly expanded our understanding of complex microbial ecosystems and variant cell states by isolating the contributions of distinct cellular populations. Within the tumor microenvironment, cells exhibit different behaviors driven by the fine-tuning of gene expression and regulation. Identifying cell-specific gene regulatory networks will help us understand disease pathology at the level of individual cells and will further contribute to precision medicine.

To open a new gateway to the personalized characterization of diseases based on the idea of "One Cell, One World", we describe LnCeCell, a comprehensive database that documents cellular-specific lncRNA-associated ceRNA networks and biomarkers. Its content combines high-quality manual curation of the published literature with high-throughput identification from single-cell genomics data.

Datasets in LnCeCell.

LnCeCell curated cellular-specific ceRNA regulations from thousands of cells across 25 types of cancers, including:

1. more than 9,000 experimentally supported lncRNA biomarkers associated with tumor metastasis, recurrence, prognosis, circulation, drug resistance, etc.;

2. cellular-specific ceRNA networks for each of the primary, malignant, metastatic cancer cells and immune cells;

3. detailed information on ceRNA sub-cellular locations, manually curated from the literature and related data sources;

4. clusters of distinct cellular populations that exhibit diverse behaviors such as angiogenesis, apoptosis, cell cycle, invasion, proliferation, stemness, etc.

ScRNA-seq data collection and processing.

First, we systematically collected cancer-related scRNA-seq datasets from CancerSEA (http://biocc.hrbmu.edu.cn/CancerSEA/) that include both mRNA and lncRNA expression profiles and contain more than 100 cells (1), so that they can be used to construct single-cell lncRNA-associated ceRNA networks. A total of 20 single-cell datasets from 12 cancer types were obtained. We also collected cancer-related scRNA-seq datasets from Gene Expression Omnibus (GEO) using the following keyword search: (‘single cell’ OR ‘single-cell’ OR ‘single cells’ OR ‘single-cells’) AND (‘transcriptomics’ OR ‘transcriptome’ OR ‘RNA-seq’ OR ‘RNA-sequencing’ OR ‘RNA sequencing’ OR ‘scRNA-seq’ OR ‘scRNA seq’) AND (‘tumor’ OR ‘tumour’ OR ‘cancer’ OR ‘carcinoma’ OR ‘neoplasm’ OR ‘neoplastic’). We required that the number of cancer cells be greater than 100 after quality control and that the expression profiles could be divided into mRNA and lncRNA expression profiles using annotation from GENCODE (release 34, GRCh38). If the original paper indicated whether cells were malignant, we retained only the malignant cells.

Considering the high technical noise of single-cell expression profiles, we carried out quality control on single cells. We excluded cells with fewer than 1,000 expressed genes and retained genes with detectable expression in at least 1% of cells. In total, 94,605 cancer cells derived from 40 single-cell datasets from 25 cancer types were retained for the construction of single-cell ceRNA networks.

For each dataset, we show the clustering map of cell populations, construct cellular-specific lncRNA-associated ceRNA networks for all cells in the dataset, show the sub-cellular localization of these ceRNAs, and characterize the functional states of each cell.
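The two quality-control thresholds above can be illustrated with a minimal Python sketch. It is not the LnCeCell pipeline: the function name `quality_control`, the genes x cells list-of-rows layout, the reading of "expressed" as a value greater than zero, and the choice to filter cells before genes are all assumptions made for illustration.

```python
def quality_control(matrix, min_genes=1000, min_cell_frac=0.01):
    """Filter a genes x cells expression matrix given as a list of rows.

    Assumptions (not stated in the text): a gene counts as "expressed" in a
    cell when its value is > 0, and cells are filtered before genes.
    Returns (kept_gene_indices, kept_cell_indices).
    """
    n_genes = len(matrix)
    n_cells = len(matrix[0]) if n_genes else 0

    # Step 1: keep cells with at least `min_genes` expressed genes.
    kept_cells = [
        c for c in range(n_cells)
        if sum(1 for g in range(n_genes) if matrix[g][c] > 0) >= min_genes
    ]

    # Step 2: keep genes detected in at least `min_cell_frac` of the kept cells.
    min_cells = min_cell_frac * len(kept_cells)
    kept_genes = [
        g for g in range(n_genes)
        if kept_cells
        and sum(1 for c in kept_cells if matrix[g][c] > 0) >= min_cells
    ]
    return kept_genes, kept_cells


# Toy example with deliberately small thresholds (3 genes, 4 cells).
toy = [
    [5, 0, 3, 0],  # gene 0
    [1, 0, 0, 0],  # gene 1
    [0, 4, 7, 1],  # gene 2
]
genes, cells = quality_control(toy, min_genes=2, min_cell_frac=0.75)
print(genes, cells)  # [0] [0, 2]
```

With the thresholds described in the text, the call would use the defaults `min_genes=1000` and `min_cell_frac=0.01`.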

Functional annotation data collection.

To distinguish the functional states of different cancer cells, we downloaded the characteristic gene sets corresponding to 14 functional states (stemness, invasion, metastasis, proliferation, EMT, angiogenesis, apoptosis, cell cycle, differentiation, DNA damage, DNA repair, hypoxia, inflammation and quiescence) from CancerSEA (1). Based on these signatures, the activities of the 14 functional states across cancer single cells in each dataset were evaluated using Gene Set Variation Analysis (GSVA) with the GSVA package in R (2). The sub-cellular and extracellular vesicle locations of lncRNAs, miRNAs and mRNAs were collected from related databases (3-7) and by manual curation of the published literature. A total of 9,306 experimentally supported lncRNA biomarkers associated with drug resistance, circulation, survival, immunity, metastasis, recurrence, cell growth, EMT, apoptosis and autophagy were manually curated from the literature and integrated into the LnCeCell database. For pathway annotation, a total of 1,329 biological pathway gene sets from KEGG, BioCarta, Reactome and other biological pathway databases were collected from MSigDB (8). For biological function annotation, a total of 5,917 gene sets representing functional terms were collected from Gene Ontology (9). Ten classic cancer hallmark processes, including Self Sufficiency in Growth Signals, Insensitivity to Antigrowth Signals, Evading Apoptosis, Limitless Replicative Potential, Sustained Angiogenesis, Tissue Invasion and Metastasis, Genome Instability and Mutation, Tumor Promoting Inflammation, Reprogramming Energy Metabolism and Evading Immune Detection, were derived from a previous study (10). We manually curated gene sets for the ten cancer hallmark processes from the corresponding GO terms and mapped them to each of the cancer hallmarks.

Construction of single cell ceRNA networks.

We collected candidate ceRNA pairs from two databases, starBase v2.0 (11) and LncACTdb 2.0 (12), and used their union as the set of candidate ceRNA regulations. A total of 108,668 candidate ceRNA regulations were collected. To verify whether these ceRNAs were associated in a single cell, we used a published method for cell-specific network construction based on probability theory to identify ceRNA networks in single cells (Figure S1A) (13). In this work, we assume that a ceRNA pair may be associated in some cells but not in others because of differences in cell type.

We determined whether a lncRNA and an mRNA were related in a cell by testing the statistical independence of their expression values in that cell. For a candidate ceRNA pair x (mRNA) and y (lncRNA) in cell k, we calculate the normalized statistic of the cell-specific network method (13):

$$\hat{\rho}_{xy}^{(k)} = \frac{\sqrt{n-1}\,\left(n \cdot n_{xy}^{(k)} - n_x^{(k)}\, n_y^{(k)}\right)}{\sqrt{n_x^{(k)}\, n_y^{(k)}\, \left(n - n_x^{(k)}\right)\left(n - n_y^{(k)}\right)}}$$

where n is the total number of cells, and n_x^(k) and n_y^(k) are predetermined integers. We set n_x^(k) = n_y^(k) = 0.1n in this work. We first draw the two boxes near x_k and y_k based on the predetermined n_x^(k) and n_y^(k); the third box is then simply the intersection of the previous two boxes (Figure S1B). The value of n_xy^(k) is obtained by counting the points in the third box.

If x and y are independent of each other, this statistic follows the standard normal distribution, with mean 0 and variance 1 over the n cells. We can therefore assess the significance of the association between x and y from the statistic. If P < 0.05, edge_xy^(k) is set to 1 in the network of cell k. We retained pairs that meet P < 0.05 in a single cell for network construction. The algorithm requires that a single-cell dataset have both mRNA and lncRNA expression profiles and contain more than 100 cells, but it places no strict requirement on the type of scRNA-seq data. The method is not sensitive to the normalization method and is suitable for various types of gene expression matrices.

In scRNA-seq data, most zeros may result from experimental problems; such zeros are biologically meaningless and may introduce errors into the data analysis. Hence, in this work, we treat zeros as follows (13): (1) If we cannot distinguish whether a zero results from true zero expression or from experimental problems, edge_xy^(k) is set to 0 when x_k = 0 or y_k = 0, regardless of the statistic. (2) If we know that the zeros result from true zero expression, edge_xy^(k) is determined by the statistic.
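The edge test for one candidate pair in one cell can be sketched in Python as follows. This is a simplified illustration rather than the implementation used by LnCeCell or by the cited method (13). The names `nearest_cells` and `csn_edge` are hypothetical, the "box" around x_k is approximated by the n_x cells whose values are closest to x_k, the statistic is written in the normalized form sqrt(n-1)(n*n_xy - n_x*n_y) / sqrt(n_x*n_y*(n-n_x)*(n-n_y)), and a one-sided upper-tail normal P value is assumed.

```python
import math


def nearest_cells(values, k, size):
    """Indices of the `size` cells whose value is closest to values[k].

    Cell k itself is always included; remaining ties are broken by index.
    """
    order = sorted(
        range(len(values)),
        key=lambda i: (abs(values[i] - values[k]), i != k, i),
    )
    return set(order[:size])


def csn_edge(x, y, k, box_fraction=0.1, alpha=0.05, zeros_are_ambiguous=True):
    """Return (edge, statistic, p_value) for genes x and y in cell k.

    x, y: expression values of the mRNA and the lncRNA across all n cells.
    """
    n = len(x)
    # Zero rule (1): if zeros cannot be told apart from technical dropouts,
    # the edge is 0 whenever either gene is zero in cell k.
    if zeros_are_ambiguous and (x[k] == 0 or y[k] == 0):
        return 0, None, None

    n_x = n_y = max(1, round(box_fraction * n))  # 0.1n in the text
    box_x = nearest_cells(x, k, n_x)
    box_y = nearest_cells(y, k, n_y)
    n_xy = len(box_x & box_y)  # cells falling in the intersection box

    stat = (math.sqrt(n - 1) * (n * n_xy - n_x * n_y)
            / math.sqrt(n_x * n_y * (n - n_x) * (n - n_y)))
    # Assumption: one-sided upper-tail P value under the standard normal.
    p_value = 0.5 * math.erfc(stat / math.sqrt(2))
    return (1 if p_value < alpha else 0), stat, p_value


# Toy example: 200 cells in which y tracks x exactly.
x = [float(i) for i in range(200)]
y = list(x)
print(csn_edge(x, y, 100)[0])  # 1 (the two boxes coincide)
print(csn_edge(x, y, 0)[0])    # 0 (x_0 = 0, so zero rule (1) applies)
```

Setting `zeros_are_ambiguous=False` corresponds to rule (2), where zeros are known to reflect true zero expression and the statistic alone decides the edge.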

Figure S1

Classification of cancer single cells.

Using the Seurat package in R (14,15), we clustered cells according to gene expression and ceRNA occurrence profiles, respectively. When clustering cells by gene expression values, we merged the mRNA and lncRNA expression profiles and clustered the cells on the combined expression profile. When using ceRNA occurrence as the cell characteristic, if a ceRNA pair was significantly correlated in a cell, its log(P) value was used as the characteristic value; otherwise, the characteristic value was set to 0. This yielded the characteristic matrix for clustering, in which rows are ceRNA pairs and columns are cells.
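As a small illustration of how such a ceRNA-occurrence matrix could be assembled before it is passed to a clustering tool, consider the Python sketch below. The function name `characteristic_matrix`, the dictionary layout, the use of the natural logarithm, and the floor applied to P values of exactly zero are assumptions; the text does not specify the base of the logarithm.

```python
import math


def characteristic_matrix(p_values, alpha=0.05, floor=1e-300):
    """Build the ceRNA-occurrence feature matrix (rows: pairs, columns: cells).

    p_values: dict mapping a ceRNA pair to its list of per-cell P values.
    Significant entries (P < alpha) become log(P); all others become 0.
    `floor` only guards against log(0) and is an illustrative choice.
    """
    return {
        pair: [math.log(max(p, floor)) if p < alpha else 0.0 for p in per_cell]
        for pair, per_cell in p_values.items()
    }


# Hypothetical pair names and P values for three cells.
features = characteristic_matrix({
    ("lncRNA_A", "mRNA_B"): [0.01, 0.50, 0.04],
    ("lncRNA_C", "mRNA_D"): [0.20, 0.001, 0.90],
})
```

Non-significant pairs contribute 0 for that cell, so cells are grouped by which ceRNA pairs are active in them and how strongly.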

Characterizing functional states of cancer single cells.

Cancer cells have 14 crucial functional states: stemness, invasion, metastasis, proliferation, EMT, angiogenesis, apoptosis, cell cycle, differentiation, DNA damage, DNA repair, hypoxia, inflammation and quiescence (1). To distinguish the functional states of different cancer cells, we downloaded the characteristic gene sets corresponding to the 14 functional states from CancerSEA. Based on these signatures, the activities of the 14 functional states across cancer single cells in each dataset were evaluated using Gene Set Variation Analysis (GSVA) with the GSVA package in R (2). This produced enrichment scores for the 14 signatures across cells in all scRNA-seq datasets, which were used to characterize signature activity.

Functional analysis of lncRNA-associated ceRNAs.

LnCeCell provides the CeRNA-Function and CeRNA-Hallmark sections to perform functional analysis of lncRNAs based on a “guilt-by-association” strategy. For a given lncRNA, the corresponding downstream mRNA targets are used to perform a functional enrichment analysis. LnCeCell performs a hypergeometric test to evaluate the enrichment significance in different functional contexts. If there are a total of N genes in the genome, of which S are involved in the gene set under investigation, and there are a total of M target genes of interest for analysis, of which x are involved in the same functional gene set, then the P value can be calculated as:

$$P = 1 - \sum_{i=0}^{x-1} \frac{\binom{S}{i}\binom{N-S}{M-i}}{\binom{N}{M}}$$

Significantly enriched functions were defined at the P < 0.05 level and further illustrated as a bar graph based on -log10(P) values.
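The hypergeometric enrichment test described above can be sketched in a few lines of Python. The function name `hypergeom_enrichment_p` is hypothetical, and the code is an illustration of the standard upper-tail test with the same symbols N, S, M and x, not the code behind the CeRNA-Function section.

```python
from math import comb


def hypergeom_enrichment_p(N, S, M, x):
    """Upper-tail hypergeometric P value, P(X >= x).

    N: genes in the genome; S: genes in the gene set;
    M: target genes analysed; x: target genes that fall in the gene set.
    Summing the upper tail directly equals 1 minus the sum over i = 0..x-1
    and avoids loss of precision when P is very small.
    """
    total = comb(N, M)
    upper_tail = sum(
        comb(S, i) * comb(N - S, M - i)
        for i in range(x, min(S, M) + 1)
    )
    return upper_tail / total


# Small worked example: N=10, S=4, M=3, x=2
# (C(4,2)*C(6,1) + C(4,3)*C(6,0)) / C(10,3) = (36 + 4) / 120 = 1/3
p = hypergeom_enrichment_p(10, 4, 3, 2)
```

A gene set would be reported as enriched when the returned value is below 0.05, and plotted at height -log10(P).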

Survival analysis of ceRNA regulations.

The CeRNA-Survival section performs Cox regression analysis and provides Kaplan-Meier survival curves for lncRNAs, miRNAs, mRNAs and the ceRNAs they form. LnCeCell derives clinical follow-up information for 10,141 patients from TCGA and performs a univariate Cox regression analysis to evaluate the association between survival state and the expression level of each lncRNA-miRNA-mRNA member in a ceRNA interaction. A risk score model, which takes into account both the strength and the direction (positive or negative) of the association between each competing RNA and the probability of survival, was developed to evaluate the association between survival and expression in a given cancer (12). For each patient, the risk score was calculated by linearly combining the ceRNA expression values weighted by the Cox regression coefficients:

$$\text{Risk score} = \sum_{i=1}^{n} \beta_i \cdot \mathrm{Exp}(c_i)$$

where β_i is the Cox regression coefficient of a lncRNA, miRNA or mRNA in a ceRNA interaction (indicated as c_i), n is the number of competing RNAs (n=3 in this study) and Exp(c_i) is the expression value of competing RNA c_i in the corresponding sample. The median and mean risk scores were used as cut-off points to divide samples into high- and low-risk groups.
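A minimal Python sketch of the risk score and the cut-off step follows. The function names, the toy coefficients and expression values, and the convention that a sample scoring exactly at the cut-off is placed in the low-risk group are illustrative assumptions. Fitting the Cox model itself is not shown.

```python
from statistics import mean, median


def risk_score(betas, expression):
    """Linear combination of expression values weighted by Cox coefficients."""
    return sum(b * e for b, e in zip(betas, expression))


def split_by_cutoff(scores, cutoff="median"):
    """Label each sample 'high' or 'low' relative to the median or mean score."""
    cut = median(scores) if cutoff == "median" else mean(scores)
    # Assumption: samples scoring exactly at the cut-off go to the low group.
    return ["high" if s > cut else "low" for s in scores]


# Hypothetical coefficients for one lncRNA-miRNA-mRNA triplet (n = 3)
betas = [0.5, -0.2, 0.1]
# Hypothetical expression values for four patients
patients = [
    [2.0, 1.0, 4.0],
    [0.5, 3.0, 1.0],
    [4.0, 0.5, 2.0],
    [1.0, 1.0, 1.0],
]
scores = [risk_score(betas, p) for p in patients]
groups = split_by_cutoff(scores, cutoff="median")
print(groups)  # ['high', 'low', 'high', 'low']
```

The resulting group labels are what a Kaplan-Meier comparison between high- and low-risk patients would be based on.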

Tools and Services in LnCeCell.

LnCeCell provides a user-friendly searching and browsing interface. In addition, as an important supplement to the database, we provide several flexible tools that facilitate retrieval and analysis of the data, including:

1. Cell-Map provides a global map of ceRNAs identified in distinct cellular populations;

2. Cell-Location provides sub-cellular locations of ceRNAs;

3. Cell-Network visualizes a dysregulated ceRNA network in a single cell;

4. Cell-State provides a global view of cell behaviors such as angiogenesis, apoptosis, cell cycle, invasion, etc.;

5. CeRNA-Function identifies dysregulated functions of lncRNA-associated ceRNAs based on Gene Ontology and biological pathways;

6. CeRNA-Hallmark identifies ceRNA-related cancer hallmarks such as Insensitivity to Antigrowth Signals, Tissue Invasion and Metastasis, etc.;

7. LnCeCell-Survival performs Cox regression analysis and plots survival curves for ceRNAs.

Related works of LnCeCell.

Our team has developed several databases and web servers focusing on lncRNA-centric regulatory mechanisms in pan-cancer analysis:

1. LnCeVar: A comprehensive database that aims to infer genomic variations that disturb lncRNA-associated ceRNA regulation;

2. Lnc2Cancer 3.0: A database of experimentally supported lncRNA-cancer associations;

3. LncACTdb 2.0: A database that aims to integrate manually curated and predicted ceRNA interactions;

4. LincSNP 3.0: A database that aims to store and annotate disease-associated SNPs in human lncRNAs and their TFBSs;

5. Lnc2Meth: A database for clarifying the lncRNA-methylation regulatory relationships;

6. MSDD: A manually curated database that provides comprehensive experimentally supported associations among microRNAs (miRNAs), single nucleotide polymorphisms (SNPs) and human diseases.

Contact the team of LnCeCell.

Xia Li: lixia@hrbmu.edu.cn

Phone & Fax: +86-451-86615922

Address: 194 Xuefu Road, Harbin 150081, CHINA

Web interface

Quick start in LnCeCell.

LnCeCell provides a user-friendly searching and browsing interface.

1. Main functions of the database are provided in menu bar form (boxed in red).

2. Click this circle to start a quick search.

3. Click the "GET STARTED" button to go to the analysis tools panel in LnCeCell.

4. Click the "QUICK SEARCH" button to start a quick search.

5. Click the "GET HELP" button to view the help and FAQs for LnCeCell.

Figure 1-1

Search various datasets in LnCeCell.

LnCeCell provides precise search for four types of datasets (example: ceRNA).

1. Input lncRNA ID or name.

2. Input mRNA ID or name.

3. Click and select the disease of interest.

4. Click and select the tissue of interest.

5. Scroll, drag, or type to set No.miRNAs or Pct.Cells.

Figure 2-1

Browse various datasets in LnCeCell.

On the browse page, four datasets are available for browsing: diseases, lncRNAs, mRNAs and biomarkers.

1. Choose a data type you want to browse.

2. Diseases result.

3. lncRNAs and mRNAs result.

4. Biomarkers result.

Figure 3-1

Read the search results for ceRNAs / cells.

The result page for ceRNA datasets is displayed in Figure 4-1, and the result page for cell datasets is displayed in Figure 4-2.

1. Click to check the detailed information of the entry.

2. LncRNA basic information (ceRNAs) / cell name (Cells).

3. mRNA basic information (ceRNAs) / disease name (Cells).

4. Percentage and total number of cells (ceRNAs) / tissue name (Cells).

5. Number of related mRNA (ceRNAs) / cell types (Cells).

6. Disease and tissue name (ceRNAs) / number of ceRNAs detected in this cell (Cells).

7. Various visual information in the entry.

8. Click to download the data, or search the table for entries of interest.

Figure 4-1

Figure 4-2

Basic information in LnCeCell.

For each ceRNA unit, we provide basic information, which is displayed in Figure 5-1.

1. You can go directly to the module of interest by clicking the axis.

2. The basic information of the ceRNA event you searched.

Figure 5-1

Cell map in LnCeCell.

A global map of ceRNAs identified in distinct cellular populations (example: ceRNA) is displayed in Figure 6-1.

1. Displays the number and proportion of ceRNAs in cells.

2. Click and select the lncRNA/mRNA/miRNA/disease of interest.

Figure 6-1

Cell location in LnCeCell.

The sub-cellular locations of a ceRNA (example: ceRNA) are displayed in Figure 7-1.

1. This section shows the specific organelles in which the ceRNA is located.

Figure 7-1

Functional annotation in LnCeCell.

For the hit ceRNA, we provide functional annotation analysis in LnCeCell, which is displayed in Figure 8-1.

1. Click to select the top 5/10/20/30 enriched functions.

2. The enriched pathways of hit downstream genes.

3. The enriched GO terms of hit downstream genes.

Figure 8-1

Hallmark annotation in LnCeCell.

For the hit ceRNA, we provide hallmark analysis in LnCeCell, which is displayed in Figure 9-1.

1. Dysregulated hallmarks of the ceRNA event.

Figure 9-1

Network visualization in LnCeCell.

For the hit ceRNA/cell, its related network can be displayed using different numbers of neighbors, as shown in Figure 10-1.

1. Click and select one/two/three-step competing neighbors or visual layout.

2. The hit lncRNA-centric ceRNA network / the hit cell network.

Figure 10-1

Survival analysis in LnCeCell.

For the hit ceRNA, we provide a survival analysis service based on ceRNA expression information, which is displayed in Figures 11-1 and 11-2.

1. Click and select the lncRNA/mRNA/miRNA/disease of interest.

2. The Cox regression analysis result of the hit competing triplet.

3. Survival curves based on mean/median expression.

Figure 11-1

Figure 11-2

References.

1. Yuan, H., Yan, M., Zhang, G., Liu, W., Deng, C., Liao, G., Xu, L., Luo, T., Yan, H., Long, Z. et al. (2019) CancerSEA: a cancer single-cell state atlas. Nucleic Acids Res, 47, D900-D908.

2. Hänzelmann, S., Castelo, R. and Guinney, J. (2013) GSVA: gene set variation analysis for microarray and RNA-seq data. BMC Bioinformatics, 14, 7.

3. Li, S., Li, Y., Chen, B., Zhao, J., Yu, S., Tang, Y., Zheng, Q., Li, Y., Wang, P., He, X. et al. (2018) exoRBase: a database of circRNA, lncRNA and mRNA in human blood exosomes. Nucleic Acids Res, 46, D106-D112.

4. Liu, T., Zhang, Q., Zhang, J., Li, C., Miao, Y.R., Lei, Q., Li, Q. and Guo, A.Y. (2019) EVmiRNA: a database of miRNA profiling in extracellular vesicles. Nucleic Acids Res, 47, D89-D93.

5. Mas-Ponte, D., Carlevaro-Fita, J., Palumbo, E., Hermoso Pulido, T., Guigo, R. and Johnson, R. (2017) LncATLAS database for subcellular localization of long noncoding RNAs. RNA, 23, 1080-1087.

6. Quek, X.C., Thomson, D.W., Maag, J.L., Bartonicek, N., Signal, B., Clark, M.B., Gloss, B.S. and Dinger, M.E. (2015) lncRNAdb v2.0: expanding the reference database for functional long noncoding RNAs. Nucleic Acids Res, 43, D168-173.

7. Xia, S., Feng, J., Chen, K., Ma, Y., Gong, J., Cai, F., Jin, Y., Gao, Y., Xia, L., Chang, H. et al. (2018) CSCD: a database for cancer-specific circular RNAs. Nucleic Acids Res, 46, D925-D929.

8. Liberzon, A., Birger, C., Thorvaldsdottir, H., Ghandi, M., Mesirov, J.P. and Tamayo, P. (2015) The Molecular Signatures Database (MSigDB) hallmark gene set collection. Cell Systems, 1, 417-425.

9. The Gene Ontology Consortium (2019) The Gene Ontology Resource: 20 years and still GOing strong. Nucleic Acids Res, 47, D330-D338.

10. Hanahan, D. and Weinberg, R.A. (2011) Hallmarks of cancer: the next generation. Cell, 144, 646-674.

11. Li, J.H., Liu, S., Zhou, H., Qu, L.H. and Yang, J.H. (2014) starBase v2.0: decoding miRNA-ceRNA, miRNA-ncRNA and protein-RNA interaction networks from large-scale CLIP-Seq data. Nucleic Acids Res, 42, D92-97.

12. Wang, P., Li, X., Gao, Y., Guo, Q., Wang, Y., Fang, Y., Ma, X., Zhi, H., Zhou, D., Shen, W. et al. (2019) LncACTdb 2.0: an updated database of experimentally supported ceRNA interactions curated from low- and high-throughput experiments. Nucleic Acids Res, 47, D121-D127.

13. Dai, H., Li, L., Zeng, T. and Chen, L. (2019) Cell-specific network constructed by single-cell RNA sequencing data. Nucleic Acids Res, 47, e62.

14. Butler, A., Hoffman, P., Smibert, P., Papalexi, E. and Satija, R. (2018) Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nature Biotechnology, 36, 411-420.

15. Stuart, T., Butler, A., Hoffman, P., Hafemeister, C., Papalexi, E., Mauck, W.M., 3rd, Hao, Y., Stoeckius, M., Smibert, P. and Satija, R. (2019) Comprehensive Integration of Single-Cell Data. Cell, 177, 1888-1902.e21.
