Last modified 2025-05-27
Use the Cell Ranger Pipeline
At a Glance
This document describes the HISE Cell Ranger pipeline, which converts Flex RNA-seq data from Chromium experiments into analysis-ready formats for computation, visualization, and exploration in a NextGen IDE. For details on the 10x Genomics Cell Ranger platform itself, see the Cell Ranger documentation and release notes.
Abbreviations Key
Abbreviation | Meaning |
BAM | binary alignment map |
bc | barcode |
csv | comma-separated values |
GEM | gel bead in emulsion |
GEX | gene expression |
h5, HDF5 | hierarchical data format, version 5 |
HISE | Human Immune System Explorer |
HTML | hypertext markup language |
MEX | market exchange format |
QC | quality control |
t-SNE | t-distributed stochastic neighbor embedding |
UMAP | uniform manifold approximation and projection |
UMI | unique molecular identifier |
V(D)J | variable, diversity, and joining (gene segments) |
Preprocessing
Cell Ranger v9.0.1 processes raw sequencing data from 10x Genomics Chromium experiments. This preprocessing occurs before the data is ingested into HISE. Nevertheless, it's helpful to understand what happens during this stage. The key steps are summarized in Table 1.
Table 1: Preprocessing steps
Step | Description |
Alignment and quantification | Aligns the reads in FASTQ files to a reference genome or probe set. Reads are assigned to cell barcodes and UMIs, and a feature-barcode matrix is generated using cellranger count (for individual samples) or cellranger multi (for multiplexed samples). |
Cell calling | Identifies true cells among all detected barcodes using an algorithm that distinguishes real cells from empty droplets. |
Feature barcoding | Outputs a feature-barcode matrix that includes gene expression counts and feature barcode counts for each cell. |
Quality control | Generates summary metrics and HTML reports for each run and sample, including key QC indicators like median UMI counts per cell, median genes per cell, and sequencing saturation. |
Ingestion
After preprocessing, HISE moves Cell Ranger outputs, such as filtered matrices, metrics, and the web summary, into a watchfolder. This automated step triggers ingestion of the files into the associated Project Store. During ingestion, HISE attaches sample metadata to the files for transparent downstream analysis and tracking.
Table 2 summarizes the output files HISE creates during ingestion. Not all files are generated on every run.
Table 2: Output files
File Type | Example File Name | Content/Purpose |
Web summary | web_summary.html | Interactive HTML report for QC and metrics review |
Gene expression matrix | filtered_feature_bc_matrix.h5 | Cell-by-gene expression matrix (the filtered matrix; the raw counterpart is typically named raw_feature_bc_matrix.h5) |
Feature-barcode matrix | filtered_feature_bc_matrix.h5 | Matrix that includes gene expression and feature barcodes |
Metrics summary | metrics_summary.csv | Run- and sample-level QC metrics |
BAM file | possorted_genome_bam.bam | Aligned reads with cell barcode and UMI tags |
Cloupe file | new-cloupe.cloupe | Visualization in the 10x Genomics Loupe Browser |
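Once a metrics file is available in a notebook, its values usually need light cleanup before they can be compared or plotted, because counts are written with thousands separators and rates with percent signs. The following is a minimal sketch of that cleanup using only the Python standard library. The two-column layout, the column names Metric Name and Metric Value, and all sample values are assumptions for illustration; check them against the header of your own metrics_summary.csv.

```python
import csv
import io

# Hypothetical excerpt of a metrics_summary.csv; the column names and
# values are assumptions for illustration, not real pipeline output.
SAMPLE_CSV = '''Metric Name,Metric Value
Estimated number of cells,"9,842"
Mean reads per cell,"61,250"
Sequencing saturation,92.4%
'''


def parse_metric_value(text):
    """Convert strings such as '9,842' or '92.4%' to floats.

    Percentages are returned as fractions (92.4% becomes 0.924).
    """
    cleaned = text.strip().replace(",", "")
    if cleaned.endswith("%"):
        return float(cleaned[:-1]) / 100.0
    return float(cleaned)


def load_metrics(handle):
    """Return a {metric name: numeric value} dict from an open CSV file."""
    reader = csv.DictReader(handle)
    return {row["Metric Name"]: parse_metric_value(row["Metric Value"])
            for row in reader}


# In a notebook, replace io.StringIO(...) with open("metrics_summary.csv").
metrics = load_metrics(io.StringIO(SAMPLE_CSV))
for name, value in metrics.items():
    print(f"{name}: {value}")
```

Returning percentages as fractions keeps every rate on the same 0 to 1 scale, which makes threshold checks less error-prone.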
Web summary file
Let's take a closer look at the Cell Ranger multi QC web summary, the most comprehensive of these output files. This detailed report includes sequencing metrics, mapping rates and distribution, cell calling metrics, and quality indicators, as shown in the accompanying sample report. Scroll down for an explanation of each item marked in the sample.
A: Web summary file: Cell calling quality
Metric | Description |
Estimated number of cells | The number of cell barcodes identified as true cells. This number should roughly match the expected cell recovery (for example, 10,000 cells for Chromium X). |
Confidently mapped reads in cells | Percentage of reads confidently mapped to the genome and associated with valid cell barcodes. Indicates successful cell capture and sequencing. |
Fraction of initial cell barcodes passing high occupancy GEM filtering | Fraction of barcodes retained after excluding high-occupancy GEMs, that is, GEMs that contain more probe barcodes than expected (likely cell clumps or multiplets). Measures sample quality and cell viability. |
B: Web summary file: GEX barcode rank plot
A GEX barcode rank plot visualizes the quality of cell calling by distinguishing cell-containing droplets from empty droplets or background noise. The X axis ranks barcodes by UMI count, and the Y axis shows the log-transformed UMI counts per barcode. Blue indicates barcodes classified as cells, whereas gray identifies background (non-cell) barcodes.
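The data behind a barcode rank plot can be reproduced from per-barcode UMI totals. The sketch below is an illustration only, not the Cell Ranger implementation: it sorts barcodes from highest to lowest UMI count, assigns ranks, and log-transforms the points. Log-transforming the rank as well as the count is an assumption about the plot's X axis, and the barcodes and counts are hypothetical.

```python
import math


def barcode_rank_points(umi_counts):
    """Return (rank, umi_count) pairs sorted from highest to lowest count.

    umi_counts maps barcode -> total UMI count. Barcodes with zero UMIs
    are dropped because they cannot be drawn on a log axis.
    """
    ordered = sorted((c for c in umi_counts.values() if c > 0), reverse=True)
    return [(rank, count) for rank, count in enumerate(ordered, start=1)]


def log10_points(points):
    """Log-transform both coordinates, as on a log-log rank plot."""
    return [(math.log10(rank), math.log10(count)) for rank, count in points]


# Hypothetical counts: three cell-like barcodes and three background ones.
counts = {"AAAC": 5200, "AAAG": 4800, "AACT": 4500,
          "AAGA": 12, "AAGC": 7, "AATT": 0}
points = barcode_rank_points(counts)
for rank, count in points:
    print(rank, count)
```

In a well-separated sample, the sharp drop between the third and fourth ranked barcodes in this toy data corresponds to the "knee" that separates cells from background.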
C: Web summary file: Mapping quality
Metric | Description | Value or Range |
Reads mapped to probe set | Percentage of reads that align to any probe in the probe set, regardless of confidence. | >90% rate of alignment |
Reads confidently mapped to probe set | Percentage of reads that confidently (unambiguously) align to a probe in the probe set. | 60%–70% of total reads |
Reads confidently mapped to filtered probe set | Percentage of reads that confidently align to a probe in the filtered probe set, the subset of probes that meet quality and specificity standards. | 50%–65% of total reads |
Reads half-mapped to probe set | Reads that partially align to a probe (one end/portion matches, but not the full read). | Typically low (<5%) |
Reads split-mapped to probe set | Reads that align to two non-contiguous regions of a probe, indicating possible splicing or artifacts. | Typically very low (<1%) |
D: Web summary file: Sequencing quality
Metric | Description | Value or Range |
Fastq ID | Identifier assigned to FASTQ files, which contain raw sequencing reads. | Variable |
Number of reads | Total number of reads generated by the sequencer. | Depends on the experiment |
Mean reads per cell | Average number of sequencing reads per cell. To calculate this figure, divide the total number of sequenced reads by the number of cells identified. Deep profiling requires more than 50,000 reads per cell. | 50,000–100,000 |
Sequencing saturation | Fraction of reads from duplicate UMIs, indicating sufficiency of sequencing depth. The higher the saturation, the lower the likelihood that additional sequencing will yield new information (unique transcripts). | 90%–95% |
Valid barcodes | Percentage of reads with recognized cell barcodes. A barcode is valid if it appears on a list of known, predefined barcode sequences for the specified assay, has no sequencing errors or ambiguous bases, and is the correct length and format. | 95%–99% |
Valid UMIs | Percentage of error-free UMIs that match expected patterns. | 80%–95% |
Valid GEM barcodes | Percentage of allow-listed, error-free GEM barcodes. | 90%–98% |
Valid probe barcodes | Percentage of error-free probe barcodes that match designated sequences. | 85%–95% |
Q30 barcodes | Percentage of barcode bases with a Phred quality score greater than or equal to 30 (99.9% accuracy). | >85% |
Q30 GEM barcodes | Percentage of GEM barcode bases with Q30 scores. | >85% |
Q30 probe barcodes | Percentage of probe barcode bases with Q30 scores. | >85% |
Q30 UMI | Percentage of UMI bases with Q30 scores. | >85% |
Q30 RNA read | Percentage of RNA read bases with Q30 scores. | >85% |
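The Q30 metrics in the table rest on the Phred scale, where a score Q corresponds to an error probability of 10^(-Q/10), so Q30 means one expected error in 1,000 base calls. The sketch below shows how a Q30 fraction could be computed from a single FASTQ quality string. It assumes the common Phred+33 ASCII encoding, and the function names and quality string are hypothetical; it is not how Cell Ranger computes the reported values.

```python
def phred_to_error_probability(q):
    """A Phred score Q corresponds to an error probability of 10^(-Q/10)."""
    return 10 ** (-q / 10)


def fraction_q30(quality_string, offset=33):
    """Fraction of bases with Phred >= 30 in a FASTQ quality string.

    Assumes the common Phred+33 ASCII encoding.
    """
    if not quality_string:
        return 0.0
    scores = [ord(ch) - offset for ch in quality_string]
    return sum(score >= 30 for score in scores) / len(scores)


# Hypothetical quality string: 'I' encodes Q40, '?' encodes Q30, '#' encodes Q2.
example = "IIII??##"
print(fraction_q30(example))
print(phred_to_error_probability(30))
```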
E: Web summary file: Metrics per probe barcode
Column | Description | Example Values |
Probe barcode ID | Unique identifier for each probe barcode sequence. Used to tag and identify specific probes. | PB0001, PB0002 |
Sample ID | Identifier for the biological sample or library from which the data was generated. | Sample_01, Patient_A |
UMIs per probe barcode | Number of UMIs for each probe barcode. Represents the number of unique transcript molecules detected by that probe. | 15, 120, 300 |
Cells per probe barcode | Number of cells in which each probe barcode was detected. | 10, 45, 200 |
F: Web summary file: Sequencing saturation visualization
The sequencing saturation visualization shows the relationship between sequencing depth and transcript detection efficiency. The curve rises steeply and then plateaus as saturation nears 100%.
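Sequencing saturation is commonly computed as 1 minus the ratio of unique molecules to total reads, where a unique molecule is a distinct combination of cell barcode, UMI, and gene. The following minimal sketch applies that formula to a handful of hypothetical reads. It assumes the input has already been restricted to confidently mapped reads with valid barcodes and UMIs.

```python
def sequencing_saturation(reads):
    """Return 1 - (unique molecules / total reads).

    reads is an iterable of (cell_barcode, umi, gene) tuples, one per
    confidently mapped read with a valid barcode and UMI.
    """
    reads = list(reads)
    if not reads:
        return 0.0
    unique_molecules = len(set(reads))
    return 1.0 - unique_molecules / len(reads)


# Hypothetical reads: 6 reads that collapse to 3 distinct molecules.
reads = [
    ("AAAC", "TTGCA", "CD3E"),
    ("AAAC", "TTGCA", "CD3E"),
    ("AAAC", "TTGCA", "CD3E"),
    ("AAAC", "GGATC", "CD3E"),
    ("AAAG", "TTGCA", "MS4A1"),
    ("AAAG", "TTGCA", "MS4A1"),
]
saturation = sequencing_saturation(reads)
print(saturation)
```

Evaluating the same function on progressively larger random subsamples of the reads yields the points of a saturation curve like the one described above.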
G: Web summary file: Median genes per cell visualization
This figure offers a visual and numeric summary of the median number of genes detected per cell-associated barcode, calculated across all cells in the dataset. This value reflects the complexity of the cell sample, with higher values indicating more genes detected per cell.
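To make the metric concrete, the sketch below counts, for each cell, the genes with at least one UMI and then takes the median across cells. The nested-dictionary matrix, the barcodes, and the counts are hypothetical stand-ins for a filtered feature-barcode matrix; real matrices are sparse and far larger.

```python
from statistics import median


def genes_per_cell(matrix):
    """Count genes with at least one UMI for each cell.

    matrix maps cell barcode -> {gene: UMI count}.
    """
    return {cell: sum(1 for n in genes.values() if n > 0)
            for cell, genes in matrix.items()}


# Hypothetical filtered matrix with three cells.
matrix = {
    "AAAC": {"CD3E": 4, "CD8A": 1, "MS4A1": 0},
    "AAAG": {"CD3E": 0, "CD8A": 0, "MS4A1": 7},
    "AACT": {"CD3E": 2, "CD8A": 3, "MS4A1": 1},
}
detected = genes_per_cell(matrix)
median_genes = median(detected.values())
print(detected, median_genes)
```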
H: Web summary file: UMIs from Genomic DNA
This visualization shows the molecular count and the number of unique UMIs, distinguishing biological molecules from PCR duplicates. After sequencing, reads with the same UMI that map to the same genomic location are considered PCR duplicates and are collapsed into a single consensus read. High UMI diversity indicates high library complexity and good genome sampling.
Exploration
After ingestion into HISE, you can analyze the Cell Ranger pipeline outputs interactively in a NextGen IDE (Jupyter Notebook). The accompanying image offers suggestions for exploring the data.
Related Resources
Configure a Pipeline (Tutorial)