HTAN Single Cell/Single Nucleus RNA Sequencing Data Standard


This page describes the data levels, metadata attributes, and file structure for single cell and single nucleus RNA sequencing assays.

Description of Assay

Single cell RNA sequencing is an emerging technology used to investigate the expression profiles of individual cells and/or nuclei. This technique is becoming increasingly useful for investigating the tumor microenvironment, which is composed of a heterogeneous population of cancer cells and tumor-adjacent stromal cells. In these experiments, tissues are enzymatically dissociated, and individual cells are isolated via microfluidics using oil droplet emulsion. As in bulk RNA sequencing, individual transcriptomes are then uniquely tagged, reverse transcribed, amplified, and sequenced. While sc-RNA sequencing captures both cytoplasmic and nuclear transcripts, single nucleus RNA sequencing measures the transcriptome of individual nuclei. Advantages of sn-RNA sequencing include differentiating cell states and identifying rare or novel cell types in heterogeneous populations.

Metadata Levels

In alignment with The Cancer Genome Atlas and the NCI Genomic Data Commons, data are divided into four levels:

Level Number   Definition                   Example Data
1              Raw data                     FASTQs, unaligned BAMs
2              Aligned primary data         Aligned BAMs
3              Derived biomolecular data    Gene expression matrix files, VCFs
4              Sample level summary data    t-SNE plot coordinates
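
The level scheme above can be sketched as a small lookup that guesses a file's level from its name. This is only an illustration, not part of the standard: the numbering 1 to 4 follows the order in which the levels are listed, and the function name guess_level, the suffix lists, and the "_tsne.csv" naming convention are all hypothetical. Because a BAM file can be either unaligned (raw) or aligned, the caller has to say which one it is.

```python
from enum import IntEnum


class DataLevel(IntEnum):
    """Data levels, numbered in the order they are listed above (assumed 1 to 4)."""
    RAW = 1        # FASTQs, unaligned BAMs
    ALIGNED = 2    # aligned BAMs
    DERIVED = 3    # gene expression matrix files, VCFs
    SUMMARY = 4    # e.g. t-SNE plot coordinates


# Hypothetical suffix conventions; a real submission pipeline may use others.
FASTQ_SUFFIXES = (".fastq", ".fastq.gz", ".fq", ".fq.gz")
DERIVED_SUFFIXES = (".vcf", ".vcf.gz", ".mtx", ".mtx.gz", ".h5ad")
SUMMARY_SUFFIXES = ("_tsne.csv",)  # made-up naming rule for t-SNE coordinates


def guess_level(filename: str, bam_is_aligned: bool = True) -> DataLevel:
    """Guess the data level of a file from its name.

    The file name alone cannot tell an unaligned BAM from an aligned one,
    so the caller states which it is via bam_is_aligned.
    """
    name = filename.lower()
    if name.endswith(FASTQ_SUFFIXES):
        return DataLevel.RAW
    if name.endswith(".bam"):
        return DataLevel.ALIGNED if bam_is_aligned else DataLevel.RAW
    if name.endswith(DERIVED_SUFFIXES):
        return DataLevel.DERIVED
    if name.endswith(SUMMARY_SUFFIXES):
        return DataLevel.SUMMARY
    raise ValueError(f"cannot infer a data level for {filename!r}")


if __name__ == "__main__":
    for example in ("tumor_R1.fastq.gz", "tumor.bam", "matrix.mtx.gz", "tumor_tsne.csv"):
        print(example, "->", guess_level(example).name)
```

Raising an error for unknown names, rather than defaulting to a level, keeps a misnamed file from being silently filed under the wrong level.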