Produce Searchable Output Files

HISE supports a variety of analysis pipelines. Typically, these pipelines generate one or more searchable output files containing analysis results, plus one or more reports.

scRNA-seq

In a simple scRNA-seq pipeline, the core scientific analysis method is a CellRanger alignment. This analysis produces a report as well as an output h5 file.

In the cell hashing pipeline, where a number of samples are first barcoded and then mixed, CellRanger alignment is also needed. In addition, a barcode recognition and counting process identifies the sample of origin of each cell, after which the results are rearranged to produce one output h5 file per sample.
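The barcode recognition and counting step can be pictured with a minimal sketch. It is not the HISE implementation: the function name, the input layout, and the `min_ratio` cutoff are all assumptions made for illustration. Each cell is assigned to the sample whose hashtag (HTO) dominates its counts, and ambiguous cells are left unassigned.

```python
from collections import defaultdict


def assign_cells_to_samples(hto_counts, hto_to_sample, min_ratio=2.0):
    """Assign each cell barcode to a sample from its hashtag (HTO) counts.

    hto_counts:    {cell_barcode: {hto_name: count}}  (hypothetical layout)
    hto_to_sample: {hto_name: sample_id}
    A cell is assigned only when its top hashtag count is at least
    `min_ratio` times the runner-up (an illustrative cutoff, not a
    HISE parameter); otherwise it is reported as unassigned.
    """
    per_sample = defaultdict(list)
    unassigned = []
    for cell, counts in hto_counts.items():
        ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
        if not ranked:
            unassigned.append(cell)
            continue
        top_name, top = ranked[0]
        second = ranked[1][1] if len(ranked) > 1 else 0
        if top > 0 and top >= min_ratio * second:
            per_sample[hto_to_sample[top_name]].append(cell)
        else:
            unassigned.append(cell)
    return dict(per_sample), unassigned


# Toy data: two hashtags, three cells.
hto_counts = {
    "AAAC": {"HT1": 120, "HT2": 3},
    "AAAG": {"HT1": 4, "HT2": 95},
    "AAAT": {"HT1": 50, "HT2": 48},  # ambiguous, likely a doublet
}
hto_to_sample = {"HT1": "sample_A", "HT2": "sample_B"}
per_sample, unassigned = assign_cells_to_samples(hto_counts, hto_to_sample)
```

Grouping the assigned barcodes by sample is what makes it possible to write one output file per sample afterwards.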

Both pipelines end with a labeling process, using a Seurat-based normalization and labeling method. 

The specific searchable output files are:

scATAC-seq

In the scATAC-seq pipeline, we have implemented CellRanger alignment, followed by a rigorous quality control process. This process ensures that cells from the scATAC-seq pipeline are high quality, reduces the number of doublets, and makes the results available in a variety of formats for downstream analysis (.arrow, fragments.tsv.gz, and .h5-formatted count matrices).
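As an illustration of working with one of these formats, the sketch below counts fragments per cell barcode in a gzip-compressed fragments file. It assumes the usual 10x layout of tab-separated columns (chromosome, start, end, barcode, read support) with optional `#` comment lines; the function name is hypothetical and is not part of the HISE SDK.

```python
import gzip
import io
from collections import Counter


def fragments_per_barcode(fileobj):
    """Count fragments per cell barcode in a gzip-compressed fragments file.

    Assumes tab-separated columns: chrom, start, end, barcode, read support.
    Lines starting with '#' are treated as header comments and skipped.
    """
    counts = Counter()
    with gzip.open(fileobj, mode="rt") as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue
            fields = line.rstrip("\n").split("\t")
            counts[fields[3]] += 1  # column 4 holds the cell barcode
    return counts


# Build a tiny in-memory fragments file for demonstration.
raw = (
    "# id=demo\n"
    "chr1\t100\t250\tAAAC-1\t2\n"
    "chr1\t300\t420\tAAAC-1\t1\n"
    "chr2\t50\t190\tAAAG-1\t1\n"
)
buffer = io.BytesIO(gzip.compress(raw.encode()))
counts = fragments_per_barcode(buffer)
```

The same function accepts a path to a `fragments.tsv.gz` file on disk, since `gzip.open` takes either a filename or a file object.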

The result files of the pipeline are not immediately available for further analysis but require review and approval by a dedicated team of scientists. Once approved, this data will be sent through a labeling process using ArchR to provide an initial cell type label for each cell. Because this labeling process is independent of the ATAC quality control process, we will be able to independently improve our cell type labels over time utilizing new tools and references as they become available.

Follow-up analysis on this data can be done with the HISE IDE, where various reading methods are available in the SDK, such as "readSCATACSeqWindowFile", "readSCATACSeqRegionFile", and others (also see below).

The specific searchable output files are:

TEA-seq combinations

TEA-seq is a trimodal single-cell assay that simultaneously measures Transcriptomics, Protein Epitopes, and Chromatin Accessibility. This multimodal single-cell assay provides a novel technology to identify cell type-specific gene regulation and expression grounded in phenotypically defined cell types. We see TEA-seq as an essential tool to expand our view of the full picture of immune cell state in health, disease, and treatment.

In the TEA-seq pipeline, we have implemented CellRanger ARC alignment, followed by a rigorous quality control process. This process ensures that cells from the TEA-seq pipeline are high quality, reduces the number of doublets, and makes the results available in a variety of formats for downstream analysis.

We support TEA-seq, hashed TEA-seq, and the CITE-seq+scRNA-seq combination.

The specific searchable output files are the same as for scRNA-seq and scATAC-seq (see the sections above), plus the following files:

Supervised Gating 

OpenCyto

Supervised gating refers to a gating approach in which gates are applied automatically, in the same manner as a subject matter expert would apply them. First, ingested flow cytometry (FCS) files are examined, using an R package called FlowCut, for irregularities that arose while the data was being generated on the instrument. This step produces a new QC'd file with irregularities removed as well as a report on the QC findings.

Next, each QC'd file is sent through the supervised gating step itself, using an R package called OpenCyto. This step produces a report as well as cell population stats and MFI files.
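To make "cell population stats" and "MFI" concrete, here is a minimal sketch. It does not reproduce OpenCyto's data-driven gating: it substitutes a fixed threshold gate on one channel, and it assumes MFI means median fluorescence intensity. The function names, channel names, and thresholds are hypothetical.

```python
from statistics import median


def gate_population(events, channel, low, high):
    """Keep events whose value on `channel` falls in [low, high).

    A fixed interval gate stands in for the automated gates here.
    """
    return [e for e in events if low <= e[channel] < high]


def population_stats(parent, gated, mfi_channel):
    """Summarize a gated population relative to its parent population."""
    return {
        "count": len(gated),
        "percent_of_parent": 100.0 * len(gated) / len(parent) if parent else 0.0,
        # MFI taken as the median intensity (an assumption of this sketch).
        "mfi": median(e[mfi_channel] for e in gated) if gated else None,
    }


# Toy events with two fluorescence channels.
events = [
    {"CD3": 900, "CD4": 1200},
    {"CD3": 850, "CD4": 100},
    {"CD3": 40, "CD4": 60},
    {"CD3": 1000, "CD4": 1500},
]
t_cells = gate_population(events, "CD3", low=500, high=10000)
stats = population_stats(events, t_cells, mfi_channel="CD4")
```

In this toy example three of the four events pass the CD3 gate, so the population is 75% of its parent and its CD4 MFI is the median of the three gated values.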

These result files are not immediately available for further analysis but require review and approval by a dedicated team of scientists. Once approved, this data will be available to others. If, however, a pipeline run is rejected, the team of scientists will adjust this data first before making it available to others. 

The specific searchable output files are:

Cyanno (default)

The Cyanno pipeline is a machine learning framework that uses various models for each panel to label the cell types in a dataset. Unlike OpenCyto, there is no intermediate QC step.

The specific searchable output files are:

Olink Proteomics

In the pipeline for Olink Proteomics, Olink provides a raw results file as well as a PDF report on missing data. The appropriate samples are associated with either the results file or, if data is missing due to an analysis problem, the PDF report.

The specific searchable output files are:

5 Prime VDJ

In the 5 prime VDJ pipeline, the TCR/BCR contig information comes together with scRNA-seq data. The core scientific analysis method is again CellRanger alignment. This is a multimodal pipeline: we use cellranger multi to align scRNA and contig sequences at the same time, which improves cell calling. The CellRanger alignment produces an h5 output file for scRNA and CSV files for both TCR and BCR contig information.

In the cell hashing pipeline, scRNA-seq data are processed in the same way as in the simple scRNA-seq pipeline. The TCR/BCR contig CSV files are demultiplexed by the HTO barcodes and merged per sample in the pool.

After the scRNA and contig files are dehashed and merged, the 5 prime VDJ pipeline adds the TCR/BCR contig information, rearranged by cell, to the metadata of the h5 file. The pipeline also produces CSV files for both TCR and BCR that are arranged by contig.
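The difference between contig-arranged and cell-arranged information can be sketched as follows. This is an illustration rather than the pipeline's code: it assumes a contig CSV with `barcode`, `chain`, and `cdr3` columns, and the function name and the `;`-joined output layout are hypothetical.

```python
import csv
import io
from collections import defaultdict


def contigs_by_cell(csv_text):
    """Rearrange a one-row-per-contig table into one record per cell.

    Returns {barcode: {chain: "cdr3;cdr3;..."}}, a shape that could be
    attached to per-cell metadata. Column names are assumptions.
    """
    per_cell = defaultdict(lambda: defaultdict(list))
    for row in csv.DictReader(io.StringIO(csv_text)):
        per_cell[row["barcode"]][row["chain"]].append(row["cdr3"])
    return {
        cell: {chain: ";".join(cdr3s) for chain, cdr3s in chains.items()}
        for cell, chains in per_cell.items()
    }


# Toy contig table: two contigs for one cell, one contig for another.
contig_csv = """barcode,contig_id,chain,cdr3
AAAC-1,AAAC-1_contig_1,TRA,CAVSDF
AAAC-1,AAAC-1_contig_2,TRB,CASSLGF
AAAG-1,AAAG-1_contig_1,TRB,CASSPTF
"""
cell_metadata = contigs_by_cell(contig_csv)
```

The input keeps one row per contig, while `cell_metadata` holds one entry per cell barcode with its chains collected together.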

The labeling pipeline is the same as for scRNA-seq. The labels are based on the scRNA-seq data. The specific searchable output files are:

Vizgen 

The Vizgen MERSCOPE is a spatial transcriptomics platform that permits the detection of multiple RNA transcripts within single cells in tandem with cellular location in a tissue, allowing for a unique insight into the physical location of different cell types within tissues and potentially the functional interactions of these cells and their neighbors.

The Vizgen pipeline takes decoded transcripts and cell segmentation information from the Vizgen instrument and generates an AnnData object containing cell type annotations and marker genes, as well as SVG and CSV files describing transcripts’ spatial properties. To generate these result files, cell segmentation is run first. This step generates QC reports and filtered parquet files. Next, each QC’d file runs through a decomposition process that generates marker genes and a metric process that generates files describing transcripts’ spatial properties.

The specific searchable output files are:

Fixed RNA

The fixed-RNA pipeline takes FASTQ files from short-read sequencers and is composed of a number of steps that generate a decorated H5 file. The initial processing is done by 10x's CellRanger program, which generates a cell-by-gene matrix and performs demultiplexing if necessary. The pipeline then adds extra metadata to these H5 files and generates a QC report. This QC report is used to determine whether any samples, wells, or pools should be excluded from downstream processing.

Once the samples have been approved, the pipeline moves on to merging multiplexed samples into a single H5 file per sample. This is followed by cell type labeling against a reference of choice. References are currently available for PBMCs and BMMCs, and the list can be updated in the future. This is currently the final step in the pipeline, and it produces a decorated H5 file with additional metadata as well as cell type labels. Future versions of this pipeline will perform a "Doublet Finder" step and enable CITE-seq data in the H5.
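The exclusion and merge steps can be outlined with a small sketch. It is a simplification under stated assumptions: per-well results are represented as plain lists of cell barcodes instead of H5 files, and the function name, record layout, and well-prefixed barcode scheme are hypothetical.

```python
from collections import defaultdict


def merge_wells_by_sample(well_outputs, excluded_wells=frozenset()):
    """Collect each sample's cells across wells into one merged list.

    well_outputs: list of {"well": str, "sample": str, "barcodes": [str]}
    Wells listed in `excluded_wells` (for example, wells rejected during
    QC review) are skipped. Barcodes are prefixed with the well name so
    that identical barcodes from different wells stay distinct.
    """
    merged = defaultdict(list)
    for entry in well_outputs:
        if entry["well"] in excluded_wells:
            continue
        for barcode in entry["barcodes"]:
            merged[entry["sample"]].append(f'{entry["well"]}_{barcode}')
    return dict(merged)


# Toy data: sample S1 is spread over wells W1 and W2; W3 failed QC.
well_outputs = [
    {"well": "W1", "sample": "S1", "barcodes": ["AAAC", "AAAG"]},
    {"well": "W2", "sample": "S1", "barcodes": ["AAAC"]},
    {"well": "W2", "sample": "S2", "barcodes": ["AAAT"]},
    {"well": "W3", "sample": "S2", "barcodes": ["AACC"]},
]
merged = merge_wells_by_sample(well_outputs, excluded_wells={"W3"})
```

Prefixing barcodes with the well name is one common way to avoid collisions when the same barcode sequence appears in more than one well.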

The searchable output files are: