Produce Searchable Output Files
HISE supports a variety of analysis pipelines. Typically these pipelines generate one or more searchable output files containing analysis results plus one or more reports.
scRNA-seq
In a simple scRNA-seq pipeline, the core scientific analysis method is a CellRanger alignment. This analysis produces a report as well as an output h5 file.
In the cell hashing pipeline, multiple samples are first labeled with hashtag oligo (HTO) barcodes and then pooled. CellRanger alignment is still required. In addition, a barcode recognition and counting step determines the sample of origin for each cell, after which the results are rearranged to produce an output h5 file for each sample.
Both pipelines end with a labeling process, using a Seurat-based normalization and labeling method.
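The barcode recognition and counting step in the cell hashing pipeline can be pictured with a minimal Python sketch. It assumes HTO reads have already been matched to known hashtag barcodes. It then calls each cell with a simple top-count and ratio rule that uses hypothetical thresholds (`min_counts`, `min_ratio`). This illustrates the idea only and is not the HISE pipeline's actual demultiplexing algorithm.

```python
from collections import Counter

# Hypothetical thresholds, chosen for illustration only.
MIN_COUNTS = 10   # fewer HTO reads than this -> "Negative"
MIN_RATIO = 2.0   # top hashtag must exceed the runner-up by this factor


def assign_hashtags(hto_reads, min_counts=MIN_COUNTS, min_ratio=MIN_RATIO):
    """Call a sample of origin for each cell barcode.

    hto_reads: iterable of (cell_barcode, sample_name) pairs, one per HTO read
    that has already been matched to a known hashtag barcode.
    Returns {cell_barcode: sample_name, "Negative", or "Doublet"}.
    """
    counts = {}
    for cell, sample in hto_reads:
        counts.setdefault(cell, Counter())[sample] += 1

    calls = {}
    for cell, tally in counts.items():
        ranked = tally.most_common(2)
        top_sample, top_n = ranked[0]
        second_n = ranked[1][1] if len(ranked) > 1 else 0
        if top_n < min_counts:
            calls[cell] = "Negative"      # too little hashtag signal
        elif second_n and top_n < min_ratio * second_n:
            calls[cell] = "Doublet"       # two hashtags with similar support
        else:
            calls[cell] = top_sample
    return calls


def split_by_sample(calls):
    """Group singlet cell barcodes by their called sample."""
    by_sample = {}
    for cell, call in calls.items():
        if call not in ("Negative", "Doublet"):
            by_sample.setdefault(call, []).append(cell)
    return by_sample
```

`split_by_sample` returns the per-sample barcode lists that a step like "rearrange into one h5 file per sample" would consume.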
The specific searchable output files are:
- scRNA HTO merge summary. The HTO summary report generated by merging all the wells in the batch.
- scRNA HTO count processing report. The HTO processing report for each well.
- scRNA seq labeled. A Seurat-based labeled h5 file for a sample.
- scRNA seq labeled report. A Seurat-based labeled report for the entire batch.
- scRNA cellRanger summary. The web_summary.html file generated by 10x cellranger count.
scATAC-seq
In the scATAC-seq pipeline, we have implemented CellRanger alignment followed by a rigorous quality control process. This process keeps only high-quality cells, reduces the number of doublets, and makes the data available in a variety of formats for downstream analysis (.arrow, fragments.tsv.gz, and .h5-formatted count matrices).
The result files of the pipeline are not immediately available for further analysis but require review and approval by a dedicated team of scientists. Once approved, this data will be sent through a labeling process using ArchR to provide an initial cell type label for each cell. Because this labeling process is independent of the ATAC quality control process, we will be able to independently improve our cell type labels over time utilizing new tools and references as they become available.
Follow-up analysis on this data can be done in the HISE IDE, where the SDK provides various reading methods, such as "readSCATACSeqWindowFile", "readSCATACSeqRegionFile", and others (also see below).
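The h5 count matrices in this pipeline are described as similar to 10x Genomics outputs. If they follow the 10x HDF5 convention, a `matrix` group stores a sparse features × cells matrix in compressed sparse column (CSC) form as `data`, `indices`, `indptr`, and `shape`. That layout is an assumption here. The sketch below shows how one cell's counts and per-feature totals are recovered from those four arrays. Reading the arrays from disk is omitted, and the function names are hypothetical. In the HISE IDE, the SDK reading methods above are the supported route.

```python
def cell_counts(data, indices, indptr, shape, cell_index):
    """Return the dense count vector (one value per feature) for one cell.

    In CSC layout, the nonzero entries of column j live in
    data[indptr[j]:indptr[j + 1]], and indices holds their row (feature) numbers.
    """
    n_features, n_cells = shape
    if not 0 <= cell_index < n_cells:
        raise IndexError("cell_index out of range")
    dense = [0] * n_features
    for k in range(indptr[cell_index], indptr[cell_index + 1]):
        dense[indices[k]] = data[k]
    return dense


def feature_totals(data, indices, shape):
    """Sum counts across all cells for each feature (row sums)."""
    totals = [0] * shape[0]
    for value, row in zip(data, indices):
        totals[row] += value
    return totals
```

Keeping only nonzero entries is what makes these matrices compact: most windows or regions have zero reads in any given cell.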
The specific searchable output files are:
- ATAC archr label report. The label report generated using ArchR to provide an initial cell type label.
- ATAC archr label results. Results of cell type labeling using ArchR's addGeneIntegrationMatrix against a Seurat scRNA-seq reference.
- ATAC assembly archr arrow. An .arrow file generated by ArchR, which can be used as input to ArchR projects for downstream analysis.
- ATAC barcode QC report. Quality statistics report for each scATAC-seq well.
- ATAC assembly filtered fragments tsv gz. Unique fragment positions for cell barcodes that pass QC and doublet filtering.
- ATAC assembly read counts gene bodies h5. A gene body (rows) x cells (columns) count matrix stored in HDF5 format, similar to 10x Genomics scRNA-seq outputs. Genes are defined by Ensembl v93 and filtered to match the scRNA-seq reference.
- ATAC assembly read counts per region h5. A count matrix of TSS regions (+/- 2 kb, rows) x cells (columns) stored in HDF5 format, similar to 10x Genomics scRNA-seq outputs. Genes are defined as for gene bodies, above.
- ATAC assembly read counts per window h5. A whole-genome count matrix of 5 kb windows (rows) x cells (columns) stored in HDF5 format, similar to 10x Genomics scRNA-seq outputs.
- ATAC batch summary report. Joint QC metrics summary report for all wells in a batch.
- ATAC cellranger summary report. The web_summary.html report generated by cellranger-atac.
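The region and window matrices above rest on simple coordinate arithmetic: a TSS region spans 2 kb on either side of the TSS, and the genome is tiled into fixed 5 kb windows. The sketch below shows that arithmetic and counts per-cell insertions per window from fragment records. Two choices here are assumptions, because the source does not say how reads are counted. Each fragment's start and end coordinates are treated as two Tn5 insertion sites, and the tuple layout mirrors the first four columns of a fragments.tsv file.

```python
from collections import Counter

WINDOW_SIZE = 5_000   # whole-genome window width (5 kb)
TSS_FLANK = 2_000     # TSS region half-width (+/- 2 kb)


def tss_region(chrom, tss, flank=TSS_FLANK):
    """Return (chrom, start, end) for a TSS +/- flank region, clipped at 0."""
    return chrom, max(0, tss - flank), tss + flank


def window_of(chrom, position, size=WINDOW_SIZE):
    """Return (chrom, start, end) of the fixed-width window containing position."""
    start = (position // size) * size
    return chrom, start, start + size


def count_insertions_per_window(fragments, size=WINDOW_SIZE):
    """Count insertions per (cell barcode, window).

    fragments: iterable of (chrom, start, end, barcode) tuples.
    Both fragment ends are counted as insertion sites (an assumption).
    """
    counts = Counter()
    for chrom, start, end, barcode in fragments:
        for pos in (start, end):
            counts[(barcode, window_of(chrom, pos, size))] += 1
    return counts
```

A fragment that straddles a window boundary contributes one insertion to each window, which is why its two ends are binned independently.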
TEA-seq combinations
TEA-seq is a trimodal single cell assay to simultaneously measure Transcriptomics, Protein Epitopes and Chromatin Accessibility. This multimodal single-cell assay provides a novel technology to identify cell type-specific gene regulation and expression grounded in phenotypically defined cell types. We see TEA-seq as an essential tool to expand our view of the full picture of immune cell state in health, disease and treatment.
In the TEA-seq pipeline, we have implemented CellRanger ARC alignment followed by a rigorous quality control process. This process keeps only high-quality cells, reduces the number of doublets, and makes the data available in a variety of formats for downstream analysis.
We support TEA-seq, hashed TEA-seq, and the CITE-seq + scRNA-seq combination.
The specific searchable output files are the same as for scRNA-seq and ATAC-seq (see the sections above), plus the following files:
- TEA-seq adt well report. The ADT (antibody-derived tag) report for each TEA-seq well.
- TEA-seq cellranger arc summary report. The batch-level report generated by CellRanger ARC.
Supervised Gating
OpenCyto
Supervised gating refers to an approach in which gates are applied automatically, in the same way a subject matter expert would apply them. First, ingested flow cytometry (FCS) files are checked, using an R package called FlowCut, for irregularities that occurred while the data was acquired on the instrument. This step produces a new QC'd file with the irregularities removed, as well as a report on the QC findings.
Next, each QC'd file is sent through the supervised gating step itself, using an R package called OpenCyto. This step produces a report as well as cell population statistics and MFI files.
These result files are not immediately available for further analysis; they first require review and approval by a dedicated team of scientists. Once approved, the data becomes available to others. If a pipeline run is rejected, the team adjusts the data before making it available.
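OpenCyto performs the gating in R. To illustrate what the population count and MFI outputs contain, the Python sketch below applies a hypothetical rectangle gate on two channels. It then reports the gated population's count, its frequency relative to the parent population, and a per-channel MFI. The channel names, the gate bounds, and the use of the median as the MFI statistic are all assumptions. The pipeline's actual gates are drawn by OpenCyto, not by code like this.

```python
from statistics import median


def rectangle_gate(events, x, y, x_range, y_range):
    """Keep events whose x and y channel values fall inside the rectangle (inclusive)."""
    (x_lo, x_hi), (y_lo, y_hi) = x_range, y_range
    return [e for e in events if x_lo <= e[x] <= x_hi and y_lo <= e[y] <= y_hi]


def population_stats(parent, child, channels):
    """Return count, frequency of parent, and per-channel MFI for a gated population.

    MFI is computed as the median intensity here, an assumption for illustration.
    """
    stats = {
        "count": len(child),
        "freq_of_parent": len(child) / len(parent) if parent else 0.0,
    }
    for ch in channels:
        stats["mfi_" + ch] = median(e[ch] for e in child) if child else None
    return stats
```

In a real gating hierarchy, each gate's output becomes the parent of the next gate, so the same two functions would be applied level by level down the tree.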
The specific searchable output files are:
- FlowCytometry: Cytometry - QC FCS. These are the FCS files that have undergone quality control using FlowCut to remove bad events.
- FlowCytometry-decoration-report: Cytometry - QC Report. This is a QC report with plots of events that were cut out.
- FlowCytometry-supervised-stats: Cytometry - Supervised Gating Population Counts. This is a csv of the population counts for a kit generated with OpenCyto packages.
- FlowCytometry-supervised-report: Cytometry - Supervised Gating Gating Report. This is a report with plots that show where gates generated by OpenCyto were drawn.
- FlowCytometry-supervised-mfis: Cytometry - Supervised Gating MFI. This is a csv of the MFIs for a kit generated with OpenCyto packages.
- FlowCytometry-supervised-hierarchy-report: Cytometry - Supervised Gating Gating Hierarchy. This PNG image is a graph showing the hierarchical gating structure.
- FlowCytometry-supervised-gating-set-pb: Cytometry - Supervised Gating Set PB. One of a set of three files written by the save_gs function; together, they allow the gating set to be loaded into your IDE.
- FlowCytometry-supervised-gating-set-h5: Cytometry - Supervised Gating Set H5. One of a set of three files written by the save_gs function; together, they allow the gating set to be loaded into your IDE.
- FlowCytometry-supervised-gating-set-gs: Cytometry - Supervised Gating Set GS. One of a set of three files written by the save_gs function; together, they allow the gating set to be loaded into your IDE.
- FlowCytometry-supervised-comp: Cytometry - Supervised Gating Compensation. This file is a csv of the compensation that was used during the supervised gating process.
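Compensation removes spectral spillover between fluorescence channels. In the standard convention, which is the one used by flowCore's `compensate`, each event's observed intensities (a row vector) are multiplied by the inverse of the spillover matrix, where entry (i, j) is the fraction of dye i's signal detected in channel j. The sketch below implements that with a small Gauss-Jordan inversion. Whether the compensation CSV stores the matrix in exactly this orientation is an assumption.

```python
def invert(matrix):
    """Invert a small square matrix with Gauss-Jordan elimination and partial pivoting."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Pick the row with the largest absolute value in this column as pivot.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("spillover matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def compensate(events, spillover):
    """Return compensated events: each observed row vector times inverse(spillover)."""
    inv = invert(spillover)  # invert once, reuse for every event
    n = len(spillover)
    return [[sum(ev[k] * inv[k][j] for k in range(n)) for j in range(n)]
            for ev in events]
```

For example, with 10% of dye 1 spilling into channel 2 and 5% of dye 2 spilling into channel 1, an observed event of (1010, 300) compensates back to approximately (1000, 200).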
Cyanno (default)
The Cyanno pipeline is a machine learning framework that uses different models for each panel to label the cell types in a dataset. Unlike the OpenCyto pipeline, it has no intermediate QC step.
The specific searchable output files are:
- Cyanno - Labeled Expr CSV: FlowCytometry-labeled-expr-csv. A CSV report listing each cell and its labeled cell type.
- Cyanno - Prediction Report PDF: FlowCytometry-prediction-report. A collection of plots visualizing the labeled cell populations.
- Cyanno - Summary Frequency Stats CSV: FlowCytometry-summary-frequency-stats. A CSV containing summaries of cell populations.
- Cytometry - Live Logicle Transform CSV: FlowCytometry-decoration-report-csv. The input file for the Cyanno process.
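The relationship between the labeled expression CSV and the summary frequency statistics can be sketched as a simple tally: count the cells carrying each label and divide by the total. The column name `cell_type` and the output shape are hypothetical. The actual Cyanno files may use different column names and report additional statistics.

```python
import csv
import io
from collections import Counter


def label_frequencies(csv_text, label_column="cell_type"):
    """Return {label: (count, fraction_of_cells)} from a labeled-expression CSV.

    label_column is a hypothetical column name holding each cell's predicted label.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(row[label_column] for row in reader)
    total = sum(counts.values())
    return {label: (n, n / total) for label, n in counts.most_common()}
```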
Olink Proteomics
In the Olink Proteomics pipeline, Olink provides a raw results file as well as a PDF report on missing data. Each sample is associated with the results file or, if its data is missing because of an analysis problem, with the PDF report.
The specific searchable output files are:
- Olink: Olink results. An Excel file with the results for an Olink batch.
- OlinkReport: Olink results PDF. The certificate of analysis for an Olink batch.
5 Prime VDJ
In the 5 prime VDJ pipeline, TCR/BCR contig information is generated together with scRNA-seq data. The core scientific analysis method is again CellRanger alignment. Because this is a multimodal pipeline, we use cellranger multi to align the scRNA and contig sequences at the same time, which improves cell calling. The CellRanger alignment produces an h5 output file for scRNA and CSV files with the TCR and BCR contig information.
In the cell hashing pipeline, scRNA-seq data are processed in the same way as in the simple scRNA-seq pipeline. The TCR/BCR contig CSV files are demultiplexed by HTO barcode and merged for each sample in the pool.
After the scRNA and contig files are dehashed and merged, the 5 prime VDJ pipeline adds the TCR/BCR contig information, rearranged by cell, to the metadata of the h5 file. The pipeline also produces TCR and BCR CSV files arranged by contig.
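The contig rearrangement described above can be sketched as follows: split contig rows by the sample assigned to each cell barcode through HTO demultiplexing, then collapse each cell's contigs into a single metadata string. The sketch assumes columns named `barcode`, `chain`, `cdr3`, and `productive`, as in Cell Ranger's filtered_contig_annotations.csv, and it is an assumption that the HISE files use these names. The choices to keep only productive contigs, drop cells without a hashtag call, and use the "CHAIN:CDR3" summary format are also hypothetical.

```python
import csv
import io
from collections import defaultdict


def contigs_by_cell(contig_csv_text, barcode_to_sample):
    """Group contig rows by sample and cell, and summarize each cell's chains.

    barcode_to_sample: {cell_barcode: sample} from HTO demultiplexing.
    Non-productive contigs and cells without a sample call are skipped.
    Returns {sample: {barcode: "CHAIN:CDR3;CHAIN:CDR3"}}.
    """
    per_cell = defaultdict(lambda: defaultdict(list))
    for row in csv.DictReader(io.StringIO(contig_csv_text)):
        sample = barcode_to_sample.get(row["barcode"])
        if sample is None or row["productive"].lower() != "true":
            continue
        per_cell[sample][row["barcode"]].append(f'{row["chain"]}:{row["cdr3"]}')
    # Sort chains so each cell's summary string is deterministic.
    return {
        sample: {bc: ";".join(sorted(chains)) for bc, chains in cells.items()}
        for sample, cells in per_cell.items()
    }
```

The per-cell strings are what would be attached to the h5 metadata. The untouched per-contig rows, filtered by sample, correspond to the contig-level CSV outputs.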
Labeling is the same as in the scRNA-seq pipeline, and the labels are based on the scRNA-seq data. The specific searchable output files are:
- vdj-main-batch-summary-report. This report contains all quality metrics from the CellRanger output, HTO QC, ADT QC, scRNA QC, and some basic QC of the TCR/BCR contigs.
- scRNA HTO merge summary. The HTO summary report generated by merging all the wells in the batch.
- scRNA HTO count processing report. The HTO processing report for each well.
- scRNA seq labeled. A Seurat-based labeled h5 file for a sample. The TCR/BCR contig information is stored in the metadata.
- scRNA seq labeled report. A Seurat-based labeled report for the entire batch.
- scRNA cellRanger summary. The web_summary.html file generated by 10x cellranger multi.
- TCR contig. A CSV file with T cell receptor contig information, arranged by chain.
- BCR contig. A CSV file with B cell receptor contig information, arranged by chain.
Vizgen
The Vizgen MERSCOPE is a spatial transcriptomics platform that permits the detection of multiple RNA transcripts within single cells in tandem with cellular location in a tissue, allowing for a unique insight into the physical location of different cell types within tissues and potentially the functional interactions of these cells and their neighbors.
The Vizgen pipeline takes decoded transcripts and cell segmentation information from the Vizgen instrument and generates an AnnData object containing cell type annotations and marker genes, as well as SVG and CSV files describing the transcripts' spatial properties. To generate these result files, the cell segmentation step runs first, producing QC reports and filtered parquet files. Next, each QC'd file runs through a decomposition process that generates marker genes and a metrics process that generates files describing the transcripts' spatial properties.
The specific searchable output files are:
- Spatial-analysis-filter
- Spatial-analysis-qc
- Spatial-analysis-metrics-umap
- Spatial-analysis-metrics-moransl
- Spatial-analysis-metrics-view
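The `Spatial-analysis-metrics-moransl` output name suggests Moran's I, a standard statistic for spatial autocorrelation. That reading is an assumption, and so is the neighbor definition used here. The sketch computes global Moran's I, I = (N / W) · Σᵢⱼ wᵢⱼ zᵢ zⱼ / Σᵢ zᵢ², where zᵢ is cell i's deviation from the mean expression and W is the sum of all weights. It uses binary weights that link distinct cells within a fixed `radius` of each other. The pipeline may use a different weighting scheme, such as k-nearest neighbors.

```python
import math


def morans_i(coords, values, radius):
    """Global Moran's I for one gene across cells.

    coords: list of (x, y) cell centroids; values: expression per cell.
    Binary weights: w_ij = 1 if i != j and distance(i, j) <= radius, else 0.
    This O(n^2) loop is for illustration, not for large tissue sections.
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    denom = sum(d * d for d in z)
    num = 0.0
    w_sum = 0.0
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(coords[i], coords[j]) <= radius:
                num += z[i] * z[j]
                w_sum += 1.0
    if w_sum == 0 or denom == 0:
        raise ValueError("Moran's I is undefined: no neighbors or constant values")
    return (n / w_sum) * (num / denom)
```

Values near +1 indicate that similar expression levels cluster in space, values near -1 indicate an alternating pattern, and values near the null expectation of -1/(N - 1) indicate no spatial structure.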
Fixed RNA
The fixed-RNA pipeline takes FASTQ files from short-read sequencers and runs a series of steps to generate a decorated H5 file. The initial processing is done by 10x's CellRanger program, which generates a cell-by-gene matrix and performs demultiplexing if necessary. The pipeline then adds extra metadata to these H5 files and generates a QC report, which is used to determine whether any samples, wells, or pools should be excluded from downstream processing.
Once the samples have been approved, the pipeline merges multiplexed samples into a single H5 file per sample. This is followed by cell type labeling with a choice of reference. References are currently available for PBMCs and BMMCs and may be updated in the future. Labeling is currently the final step in the pipeline and produces a decorated H5 file with additional metadata and cell type labels. Future versions of this pipeline will add a "Doublet Finder" step and support CITE-seq data in the H5.
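The merge step can be pictured with in-memory dictionaries standing in for H5 files. In this sketch, wells flagged by the QC review are skipped, and so are samples flagged by that review. Each remaining sample's cells are then gathered from every well into one table per sample. Barcodes are prefixed with the well ID so identical barcodes from different wells stay distinct. The data layout, the prefix convention, and the function name are hypothetical. Pool-level exclusion is not modeled here.

```python
def merge_by_sample(wells, excluded=frozenset()):
    """Merge demultiplexed per-well counts into one cell table per sample.

    wells: {well_id: {sample_id: {barcode: {gene: count}}}}
    excluded: well or sample IDs that the QC review flagged for removal.
    Returns {sample_id: {"<well_id>_<barcode>": {gene: count}}}.
    """
    merged = {}
    for well_id, samples in wells.items():
        if well_id in excluded:
            continue
        for sample_id, cells in samples.items():
            if sample_id in excluded:
                continue
            target = merged.setdefault(sample_id, {})
            for barcode, gene_counts in cells.items():
                # Prefix with the well ID so identical barcodes from different wells stay distinct.
                target[f"{well_id}_{barcode}"] = dict(gene_counts)
    return merged
```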
The searchable output files are:
- FRNA-h5-labeled: A labeled h5 for a given sample