Last modified 2025-10-13

Produce Searchable Output Files

At a Glance

HISE supports a variety of analysis pipelines that generate searchable output files. These files contain analysis results plus one or more reports. This document describes the specific files that each analysis pipeline produces. If you have questions, contact Support.

Abbreviations Key
Abbreviation	Meaning
ADT	antibody-derived tag
ARC	ATAC + RNA Chromium
archR	analysis of regulatory chromatin in R
BMMCs	bone marrow mononuclear cells
CITE-seq	cellular indexing of transcriptomes and epitopes by sequencing
CSV	comma-separated values
ELN	electronic laboratory notebook
FQ	FASTQ
GS	gating set
HTO	hashtag oligo
PB	probability binning
PBMCs	peripheral blood mononuclear cells
scATAC-seq	single-cell assay for transposase-accessible chromatin using sequencing
scRNA-seq	single-cell RNA sequencing
SLIMS	simplified laboratory information management system
TSS	transcription start site
tsv.gz	tab-separated values, gzip-compressed
UMI	unique molecular identifier
V(D)J	variable, (diversity), and joining [gene segments]

Methods

TEA-seq combinations

TEA-seq is a trimodal single-cell assay that simultaneously measures transcriptomics, protein epitopes, and chromatin accessibility. This assay identifies cell type–specific gene regulation and expression grounded in phenotypically defined cell types.

In the TEA-seq pipeline, we couple Cell Ranger ARC with a rigorous QC process that ensures the resulting cells are high quality, reduces the number of doublets, and makes the cells available in a variety of formats for downstream analysis.

In addition to TEA-seq, HISE supports the analysis of data generated by hashed TEA-seq and by CITE-seq + scRNA-seq. These methods integrate multiple layers of biological information at the single-cell level to yield a comprehensive view of cellular function.

These techniques produce the same searchable output files as scRNA-seq and scATAC-seq (see the scRNA-seq and scATAC-seq sections below), as well as the files listed in Table 3.

Searchable Output File Types Specific to TEA-seq Combinations
Output file type	Description
adt-batch-summary-report	A summary report of ADT data, including quality metrics and overall statistics.
adt-tea-seq-well-report	A detailed report for individual wells in a TEA-seq experiment, including per-well quality metrics and protein expression data.
tea-main-batch-summary-report	A summary report containing combined gene expression statistics, protein level metrics, and chromatin accessibility data.

scATAC-seq

In the scATAC-seq pipeline, we run Cell Ranger ATAC alignment, followed by a rigorous QC process that ensures the cells are high quality, reduces the number of doublets, and makes the cells available for downstream analysis in a variety of formats (.arrow, fragments.tsv.gz, and H5-formatted count matrices).
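To illustrate the fragments format, here is a minimal sketch that counts fragments per cell barcode in a fragments.tsv.gz file. It assumes the standard 10x Genomics layout (tab-separated chromosome, start, end, barcode, and read count, with header lines starting with `#`); check your own files before relying on it. The function name is hypothetical and is not part of HISE.

```python
import gzip
from collections import Counter


def count_fragments_per_barcode(source):
    """Count fragments per cell barcode in a gzip-compressed fragments file.

    source: a file path or a binary file object containing gzip data.
    Assumes 10x-style columns: chrom, start, end, barcode, read_count.
    Lines starting with '#' are header comments and are skipped.
    """
    counts = Counter()
    with gzip.open(source, "rt") as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue
            fields = line.rstrip("\n").split("\t")
            counts[fields[3]] += 1  # column 4 holds the cell barcode
    return counts
```

For example, `count_fragments_per_barcode("fragments.tsv.gz").most_common(5)` lists the five barcodes with the most fragments.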

Pipeline result files are not immediately available for further analysis; they must first be reviewed and approved by a dedicated team of scientists. Once the data are approved, a labeling process that uses archR assigns an initial cell-type label to each cell.

The specific searchable output files for the scATAC-seq pipeline are summarized in the following table.

scATAC-seq Searchable Output File Types
Output file type	Description
atac-archr-label-results	Results of cell-type labeling using archR's addGeneIntegrationMatrix against a Seurat scRNA-seq reference.
atac-assembly-archr-arrow	An `.arrow` file generated by archR that can be used as an input to archR projects for downstream analysis.
atac-assembly-filtered-fragments-tsv-gz	File containing unique fragment positions for cell barcodes that pass QC and doublet filtering.
atac-assembly-read-counts-gene-bodies-h5	A matrix in which each row represents a gene body and each column represents a cell. The values show how many ATAC-seq reads align to each gene body in each cell. This read count matrix is stored in HDF5 format, similar to 10x Genomics scRNA-seq outputs. Genes are defined by Ensembl v93 and filtered to match the scRNA-seq reference.
atac-assembly-read-counts-per-region-h5	A count matrix of TSS regions (±2 kb, rows) x cells (columns), stored in HDF5 format, similar to 10x Genomics scRNA-seq outputs. Genes are defined as for gene bodies, above.
atac-assembly-read-counts-per-window-h5	A whole-genome 5 kb window count matrix of windows (rows) x cells (columns). This matrix is stored in HDF5 format, similar to 10x Genomics scRNA-seq outputs.
cellranger-atac-possorted_genome	The `web_summary.html` report generated by Cell Ranger ATAC.
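The region and window definitions in the table above come down to simple interval arithmetic. The sketch below assumes 0-based, half-open coordinates and clips TSS regions at the chromosome start. How the pipeline assigns a read or fragment to a region (for example, by cut site or by any overlap) is not specified here, so the overlap test is an illustrative assumption, and the function names are hypothetical.

```python
TSS_FLANK = 2_000    # +/- 2 kb around each TSS (per-region matrix rows)
WINDOW_SIZE = 5_000  # 5 kb genome-wide windows (per-window matrix rows)


def tss_region(tss):
    """Return the half-open (start, end) interval of +/- 2 kb around a TSS, clipped at 0."""
    return max(0, tss - TSS_FLANK), tss + TSS_FLANK


def window_index(position):
    """Return the 0-based index of the 5 kb window that contains a genomic position."""
    return position // WINDOW_SIZE


def overlaps(start, end, region_start, region_end):
    """True if half-open intervals [start, end) and [region_start, region_end) overlap."""
    return start < region_end and region_start < end
```

For example, a TSS at position 10,000 yields the region 8,000 to 12,000, and position 12,345 falls in window 2.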

scRNA-seq

In a simple scRNA-seq pipeline, the core scientific analysis method is Cell Ranger alignment. This analysis produces a report and an output H5 file.

In the cell hashing pipeline, where multiple samples are barcoded and then pooled, Cell Ranger alignment is also performed. In addition, a barcode recognition and counting process identifies the sample of origin of each cell so that the results can be split into one output H5 file per sample. Both pipelines end with a labeling process that uses a Seurat-based normalization and labeling method.
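The exact barcode recognition and assignment method used by the pipeline is not described here. As a rough illustration of the idea only, the sketch below assigns a cell to the hashtag with the most counts when that hashtag accounts for a large enough share of the cell's HTO counts, and otherwise calls the cell a doublet or a negative. The function name and both thresholds are hypothetical, not pipeline settings.

```python
def assign_hashtag(hto_counts, min_total=10, min_fraction=0.8):
    """Assign one cell to a sample from its per-hashtag counts.

    hto_counts: dict mapping hashtag name -> count for a single cell barcode.
    Returns the winning hashtag name, "Negative", or "Doublet".
    min_total and min_fraction are illustrative assumptions.
    """
    total = sum(hto_counts.values())
    if total < min_total:
        return "Negative"  # too few hashtag counts to call a sample
    top_tag, top_count = max(hto_counts.items(), key=lambda item: item[1])
    if top_count / total >= min_fraction:
        return top_tag
    return "Doublet"  # no single hashtag dominates
```

Grouping barcodes by the returned value gives one barcode list per sample, which is the basis for writing one H5 file per sample.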

The specific searchable output file types for the scRNA-seq pipeline are summarized in the following table.

scRNA-seq Searchable Output File Types
Output file type	Description
scRNA-seq-FQ-file	scRNA-seq FASTQ file.
scRNA-seq-manifest	A file used to create a SLIMS manifest in the ELN.
filtered-feature-BC-matrix-H5	Compressed binary HDF5 file that stores gene expression count matrices, cell barcodes, and feature metadata.
molecule-info-H5	HDF5 file that contains detailed per-molecule information (such as cell barcode, UMI, feature assignment, and library/source metadata) for each molecule.
raw-feature-BC-matrix-H5	HDF5 file containing feature-by-barcode matrices that include every barcode with at least one read.
rna-add-metadata-report	A report that documents how sample and experiment metadata (such as cell type, experimental condition, or batch) are added to the RNA-seq data.
scRNA-seq-CellHashing-Main-QC-report	Pipeline result report file that contains cell hashing and sample multiplexing info.
scRNA-seq-labeled	A Seurat-based, labeled H5 file. It can be used to visualize sample and patient metadata and clinical features, either to investigate a single time point or to examine longitudinal shifts in a patient population.
scRNA-seq-merged	An H5 file containing scRNA-seq data from multiple batches.
scRNA-seq-tenx-report	A `web_summary.html` file generated by Cell Ranger containing QC metrics for 10x Genomics scRNA-seq data.
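Cell Ranger feature-barcode matrix H5 files (such as filtered-feature-BC-matrix-H5) store counts as a compressed sparse column (CSC) matrix: the `data`, `indices`, `indptr`, and `shape` datasets under the `matrix` group, with features as rows and barcodes as columns. Once those arrays are loaded (for example, with the h5py library, not shown), the minimal sketch below recovers one barcode's counts and the total counts per barcode. The function names are hypothetical.

```python
def csc_column(data, indices, indptr, col):
    """Return {feature_index: count} for one barcode (column) of a CSC matrix.

    indptr[col]:indptr[col + 1] delimits the stored values of that column;
    indices gives the feature (row) index of each stored value.
    """
    start, end = indptr[col], indptr[col + 1]
    return dict(zip(indices[start:end], data[start:end]))


def per_barcode_totals(data, indices, indptr, shape):
    """Return total counts per barcode; shape is (n_features, n_barcodes)."""
    _, n_barcodes = shape
    return [sum(csc_column(data, indices, indptr, col).values()) for col in range(n_barcodes)]
```

Only nonzero entries are stored, which keeps these files compact even though most genes are not detected in most cells.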

Fixed RNA

The Fixed RNA pipeline takes FASTQ files from short-read sequencers and applies a series of steps to generate a decorated H5 file. The initial processing is done by the 10x Genomics Cell Ranger tool, which produces a cell-by-gene matrix and performs demultiplexing if necessary. The pipeline then adds extra metadata to these H5 files and generates a QC report, which is used to determine whether any samples, wells, or pools should be excluded from downstream processing.

Once the samples are approved, the pipeline merges the multiplexed samples into a single H5 file per sample. This consolidation is followed by cell-type labeling, using one of the currently available references (PBMCs or BMMCs). This final step in the pipeline produces a decorated H5 file with additional metadata and cell-type labels.

The searchable output files are summarized in the following table.

Fixed RNA Searchable Output File Types
Output file type	Description
frna-labeled-h5	A Fixed RNA labeled H5 file for a given sample.
frna-qc-report	A Fixed RNA QC report.
celltypist-csv	A CSV file containing Fixed RNA CellTypist results.
celltypist-labeled-h5	A Fixed RNA H5 file labeled with CellTypist.
frna-Seq-tenx-report	A 10x Genomics pipeline result report file.
frna-sample-molecule-info-h5	A Fixed RNA molecule info H5 file for a sample.
frna-sample-raw-feature-bc-matrix-h5	A Fixed RNA raw feature-barcode matrix H5 file for a sample.
frna-sample-raw-probe-bc-matrix-h5	A Fixed RNA raw probe-barcode matrix H5 file for a sample.

CITE-seq

CITE-seq integrates both ADT and transcriptomic (RNA) workflows, generating results that include file groups from each pipeline. Because this dual approach profiles surface proteins and gene expression simultaneously, the output files combine both data types. The Tag Counts CSV (tag-counts-csv) file for CITE-seq data contains ADT counts for each cell barcode, letting researchers quantify relative surface protein abundance at single-cell resolution. The file is structured as a matrix, with cell barcodes as rows and antibody/epitope tags as columns. Each entry shows the UMI (or read) count for a given antibody in a given cell. For other files, see the scRNA-seq section.
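Given the matrix layout described above, a tag-counts CSV can be read with the standard library alone. This minimal sketch assumes the first column holds the cell barcodes and every remaining column is one antibody/epitope tag; the function names are hypothetical. It also scales each cell's counts to fractions, one simple way to look at relative surface protein abundance within a cell.

```python
import csv


def load_tag_counts(handle):
    """Read a tag-counts CSV into {barcode: {tag: count}}.

    Assumes column 1 is the cell barcode and each other column is one tag.
    """
    reader = csv.reader(handle)
    tags = next(reader)[1:]  # header row: skip the barcode column name
    table = {}
    for row in reader:
        table[row[0]] = {tag: float(value) for tag, value in zip(tags, row[1:])}
    return table


def relative_abundance(counts):
    """Scale one cell's tag counts to sum to 1; cells with no counts stay at 0."""
    total = sum(counts.values())
    return {tag: (count / total if total else 0.0) for tag, count in counts.items()}
```

To use it with a file on disk, open the file with `newline=""`, as the csv module documentation recommends, and pass the handle to `load_tag_counts`.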

V(D)J-seq

In the V(D)J-seq pipeline, T cell receptor (TCR) and B cell receptor (BCR) contig information is generated together with scRNA-seq data. The core scientific analysis method is again Cell Ranger alignment. Because this is a multimodal pipeline, Cell Ranger multi aligns the scRNA-seq and contig sequences together, which improves cell calling. The Cell Ranger alignment produces an H5 output file for the scRNA-seq data and CSV files for the TCR and BCR contig information.

In the cell hashing pipeline, scRNA-seq data are processed in the same way as in the simple scRNA-seq pipeline. The TCR/BCR contig CSV file is demultiplexed by HTO barcode and merged with each sample in the pool.

After the scRNA-seq data and contig files are demultiplexed (dehashed) and merged, the V(D)J pipeline adds the TCR/BCR contig information, arranged by cell, to the metadata of an H5 file. The V(D)J pipeline also produces TCR/BCR CSV files arranged by contig. Labeling works the same way as in the scRNA-seq pipeline, and the labels are based on the scRNA-seq data.
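The "arranged by cell" step can be pictured as a group-by on the cell barcode. This sketch assumes each contig row is a dict with at least `barcode`, `chain`, and `cdr3` keys (column names taken from Cell Ranger's contig annotation CSVs; verify them against your files) and a barcode-to-sample mapping from HTO demultiplexing. The function name and the `CHAIN:CDR3` summary string are hypothetical, not the format HISE writes into the H5 metadata.

```python
from collections import defaultdict


def contigs_by_cell(contig_rows, barcode_to_sample):
    """Group contig rows by cell barcode and split them by sample.

    contig_rows: iterable of dicts with 'barcode', 'chain', and 'cdr3' keys.
    barcode_to_sample: dict mapping cell barcode -> sample name.
    Returns {sample: {barcode: "CHAIN:CDR3;CHAIN:CDR3"}}, one string per cell.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for row in contig_rows:
        sample = barcode_to_sample.get(row["barcode"])
        if sample is None:
            continue  # cell not assigned to a sample (for example, a doublet)
        grouped[sample][row["barcode"]].append(f"{row['chain']}:{row['cdr3']}")
    return {
        sample: {barcode: ";".join(sorted(entries)) for barcode, entries in cells.items()}
        for sample, cells in grouped.items()
    }
```

The rows can come directly from `csv.DictReader` over a contig CSV file.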

The specific searchable output files are shown in the following table.

V(D)J Searchable Output File Types
Output file type	Description
vdj-main-batch-summary-report	A file containing all quality metrics from the Cell Ranger output, HTO QC, ADT QC, and scRNA QC, as well as basic QC of the TCR/BCR contigs.
scRNA HTO merge summary	A report generated by merging all the wells in the batch.
scRNA HTO count processing report	A report quantifying the HTO reads from multiplexed single-cell RNA sequencing experiments, including metrics on cell barcode identification and HTO assignment.
scRNA seq labeled	A Seurat-based, labeled H5 file for a sample. The contig information of TCR/BCR is stored in the metadata.
scRNA seq labeled report	A Seurat-based, labeled report for the entire batch.
scRNA CellRanger summary	The `web_summary.html` file generated by 10x Genomics Cell Ranger multi.
TCR contig	A CSV file containing the contig information of T cell receptors, arranged by chain type.
BCR contig	A CSV file containing the contig information of B cell receptors, arranged by chain type.

Flow Cytometry

Flow Cytometry searchable output files provide structured, standardized representations of experimental data and analysis results from flow cytometry assays to facilitate rapid data interpretation and downstream analysis. The files are typically formatted for compatibility with widely used platforms such as FlowJo and other cytometry analysis tools.

Flow Cytometry Searchable Output File Types
Output file type	Description
FlowCytometry-decoration-report-csv	CSV of decoration metrics and annotations per sample or population.
FlowCytometry-decoration-report-html	HTML visualization of the decoration metrics and annotations report.
FlowCytometry-labeled-expr-csv	CSV with expression values labeled by cell type or population.
FlowCytometry-prediction-report	Predicted labels or classifications for each event or cell.
FlowCytometry-summary-frequency-stats	Summary statistics reporting population frequency.
FlowCytometry-supervised-comp	Compensation matrix from supervised fluorescence spillover correction.
FlowCytometry-supervised-gating-set-gs	Gating hierarchy in GS format for analysis reproducibility.
FlowCytometry-supervised-gating-set-h5	Gating hierarchy saved in HDF5 format.
FlowCytometry-supervised-gating-set-pb	Gating set stored in PB format.
FlowCytometry-supervised-mfis	Mean fluorescence intensity statistics for supervised populations.
FlowCytometry-supervised-stats	Summary statistics from supervised population analysis and quantification steps.
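As background for the compensation matrix entry above, spillover correction is commonly computed as compensated = observed x S⁻¹, where row i of the spillover matrix S gives the fraction of fluorochrome i's signal that appears in each detector (the convention used by FlowJo and flowCore). The sketch below shows this for two channels with a hand-written 2x2 inverse; the exact implementation in the HISE pipeline is not described here, and the function names are hypothetical.

```python
def invert_2x2(matrix):
    """Invert a 2x2 matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = matrix
    det = a * d - b * c
    if det == 0:
        raise ValueError("spillover matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def compensate(event, spillover):
    """Compensate one event (row vector of 2 detector values): event x spillover^-1."""
    inverse = invert_2x2(spillover)
    return [sum(event[i] * inverse[i][j] for i in range(2)) for j in range(2)]
```

For example, with 10% spillover from channel 1 into channel 2 and 5% in the other direction, an observed event of [102.5, 60.0] compensates back to approximately [100, 50]. Real panels have many channels, so a general matrix inverse (for example, `numpy.linalg.inv`) would replace the 2x2 helper.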

CyAnno

The CyAnno pipeline is a machine learning framework that uses panel-specific models to label the cell types in a dataset. It requires no intermediate QC step.

The specific searchable output files for this pipeline are shown in the following table.

CyAnno Searchable Output File Types
Output file type	Description
FlowCytometry-labeled-expr-csv	A CSV report of each cell and its labeled cell type.
FlowCytometry-prediction-report	A collection of plots visualizing the cell population reports.
FlowCytometry-summary-frequency-stats	A CSV file containing cell population summaries.
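Population frequencies like those in the summary file can be derived from a labeled-expression CSV by counting cells per label. This sketch assumes one row per cell and a label column named `cell_type`, which is a hypothetical name; check the header of your file. It reports each population as a percentage of all labeled cells, whereas the pipeline's summary may use a different denominator (for example, a parent population).

```python
import csv
from collections import Counter


def population_frequencies(handle, label_column="cell_type"):
    """Return {label: percent of all cells} from a labeled-expression CSV.

    label_column is a hypothetical column name; adjust it to your file.
    """
    labels = Counter(row[label_column] for row in csv.DictReader(handle))
    total = sum(labels.values())
    if total == 0:
        return {}
    return {label: 100.0 * n / total for label, n in labels.items()}
```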

Olink Proteomics

In the Olink Proteomics pipeline, Olink provides a raw results file and a PDF report describing any data that are missing because of an analysis problem. The appropriate samples are associated with either the results file or the missing data report.

The specific searchable output files are shown in the following table.

Olink Searchable Output File Types
Output file type	Description
Olink-results	A PDF certificate of analysis for an Olink batch.

Xenium

The Xenium pipeline enables spatial transcriptomics analysis within HISE, converting raw Xenium data into formats that can be analyzed, explored, and visualized in HISE. For a list of Xenium output files, see Table 6 in the Xenium documentation.

Related Resources

Understand Automated Pipelines

Configure a Pipeline (Tutorial)