Last modified 2025-04-16


Use the Xenium Pipeline

Abbreviations Key
| Abbreviation | Meaning |
| --- | --- |
| AnnData | annotated data [Python/R package for storing spatial matrices] |
| CSV | comma-separated values |
| FFPE | formalin-fixed paraffin-embedded [tissue preservation method] |
| gz | gzipped |
| H5AD | .h5 AnnData [format] |
| HDF5 | Hierarchical Data Format, version 5 [open file format] |
| H&E/IF | hematoxylin and eosin / immunofluorescence [images] |
| HISE | Human Immune System Explorer |
| IDE | integrated development environment |
| ISD | instrument sensor data |
| MEX | market exchange [format] |
| MIP | maximum-intensity projection |
| OME | open microscopy environment |
| PCA | principal component analysis |
| QC | quality control |
| scRNA-seq | single-cell RNA sequencing |
| STalign | spatial transcriptomics alignment |
| UMAP | uniform manifold approximation and projection |
| VM | virtual machine |

At a Glance

The Xenium pipeline enables spatial transcriptomics analysis within HISE, converting raw Xenium data into formats that can be analyzed, explored, and visualized in HISE.

This document describes the pipeline in three broad stages or steps, summarized in Table 1 and depicted in the following image.

TABLE 1
| Step | Description | Locations |
| --- | --- | --- |
| Step 1: Preprocessing | Raw output from the Xenium instrument is preprocessed, quality checked, labeled with spatial and cell type data, and summarized. Some of this data is archived, and the remaining data is prepared for ingestion into HISE. | Output on Xenium, preprocessed on a VM or workstation, and uploaded to cloud storage |
| Step 2: Ingestion | Data is ingested into HISE, where it is further decorated (associated with metadata), analyzed, labeled, and made available for downstream analysis. | HISE Project Store (cloud storage and metadata management) |
| Step 3: Exploration | Data is explored, visualized, and further analyzed using advanced search queries, integration with other datasets or data types, and custom plots, such as spatial maps or dimensionality reduction visualizations. | HISE NextGen IDE (Jupyter Notebook environment) |

Preprocessing

It's beyond the scope of this document to cover all of the preprocessing functions in detail, but let's briefly explore how Xenium handles data during this stage. For each FFPE tissue sample, Xenium generates two forms of raw data:

  1. Initial raw data. First, the 10x instrument generates a large volume of raw ISD data. One Xenium slide with the entire imageable area selected produces its own directory of 7–60 GB, depending on the tissue and the panel. A single run typically contains four slides, for a total of 28–240 GB. (For examples of public datasets, see the 10x Genomics website.) The raw Xenium output files are listed in Table 2.

TABLE 2
| File description | File type | File size | Example file name |
| --- | --- | --- | --- |
| Web summary | HTML | 14,834 KB | analysis_summary.html |
| Gene expression metrics | CSV | 1 KB | metrics_summary.csv |
| Cell-feature matrix | MEX; HDF5; Zarr (zipped) | H5: 46,887 KB; Zarr: 67,433 KB | cell_feature_matrix.h5; cell_feature_matrix.zarr.zip |
| Transcript data | CSV (gzipped); Parquet; Zarr (zipped) | CSV: 3,985,959 KB; Parquet: 1,868,732 KB; Zarr: 2,477,239 KB | transcripts.csv.gz; transcripts.parquet |
| Cell summary file | CSV (gzipped); Parquet; Zarr (zipped) | CSV: 39,067 KB; Parquet: 16,840 KB; Zarr: 1,756,885 KB | cells.csv.gz |
| Panel file | JSON | 137 KB | gene_panel.json |
| Morphology | OME-TIF | 33,080,064 KB | morphology.ome.tif; morphology_focus.ome.tif; morphology_mip.ome.tif |
| Secondary analysis results | CSV; Zarr (zipped) | Zarr: 8,136 KB | metrics_summary.csv |
| Cell and nucleus segmentation files | Zarr (zipped); CSV (gzipped); Parquet | Zarr: 20,893 KB; CSV: 108,623 KB | cell_boundaries.csv.gz; nucleus_boundaries.csv.gz; nucleus_boundaries.parquet |
| Xenium experiment file | JSON | 2 KB | experiment.xenium |

  2. Preprocessed raw data. The instrument then parses this data into a smaller raw dataset that contains decoded transcript information. This transient data remains active only until cell segmentation, at which point it is also archived. A directory containing the resulting machine-processed data is then created. The Xenium preprocessing output directory includes the files listed in Table 3.

TABLE 3
| Stage | Output | Example |
| --- | --- | --- |
| Preprocessing | xenium_<tissue>_adata_filtered.h5ad | Not pictured |
| QC | PDF reports (for example, nucleus/cell area plots) | |
| Cell labeling | Cell-type predictions | |
| Neighborhood analysis | Spatial cluster plots created using CellCharter | |
| Summary report | <tissue>_pipeline_summary.html | |
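
The per-run storage range quoted in the first item above is simply the per-slide range multiplied by the number of slides. A minimal sketch of that arithmetic follows; the function name and its defaults are illustrative (taken from the typical four-slide case described above) and are not part of any Xenium or HISE tooling.

```python
def run_storage_gb(slides: int = 4, per_slide_gb: tuple = (7, 60)) -> tuple:
    """Return the (low, high) raw-data estimate in GB for one run."""
    if slides < 1:
        raise ValueError("a run needs at least one slide")
    low, high = per_slide_gb
    # Each slide produces its own directory, so the totals scale linearly.
    return slides * low, slides * high


low, high = run_storage_gb()
print(f"Plan for roughly {low}-{high} GB of raw data per run")
```

Passing a different slide count (for example, `run_storage_gb(slides=2)`) gives a quick planning figure for smaller runs.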

Ingestion

This is where HISE enters the picture. First, the necessary configuration files are ingested (Table 4). Then the preprocessed Xenium files (see Table 3), packaged as a tar file, are moved into a watchfolder, which triggers ingestion of the data into HISE. Follow this sequence exactly: the pipeline run fails if the tar file is ingested before the configuration files.

TABLE 4
| Config file | Description |
| --- | --- |
| xenium_pipeline_config-colon.json | Colon tissue-specific parameters |
| xenium_pipeline_config-default.json | Default settings for all tissues |
| xenium_pipeline_config-ln.json | Lymph node-specific parameters |
| xenium_pipeline_config-tonsil.json | Tonsil tissue-specific parameters |
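
The ordering rule above (configuration files first, tar file second) can be checked before anything is moved into the watchfolder. The sketch below is only an illustration: the file names come from Table 4, but the lookup logic, the fallback to the default file, and both function names are assumptions, not HISE's actual behavior.

```python
# File names are taken from Table 4; the selection logic is an assumption.
TISSUE_CONFIGS = {
    "colon": "xenium_pipeline_config-colon.json",
    "ln": "xenium_pipeline_config-ln.json",
    "tonsil": "xenium_pipeline_config-tonsil.json",
}
DEFAULT_CONFIG = "xenium_pipeline_config-default.json"


def config_for_tissue(tissue: str) -> str:
    """Pick the tissue-specific config, falling back to the default file."""
    return TISSUE_CONFIGS.get(tissue.strip().lower(), DEFAULT_CONFIG)


def safe_to_stage_tar(tissue: str, ingested_configs: set) -> bool:
    """Return True only if the config this tissue needs was already ingested."""
    return config_for_tissue(tissue) in ingested_configs


# Hypothetical state: only the lymph node config has been ingested so far.
already_ingested = {"xenium_pipeline_config-ln.json"}
print(safe_to_stage_tar("ln", already_ingested))
print(safe_to_stage_tar("tonsil", already_ingested))
```

A guard like this is cheap to run and avoids a failed pipeline run caused by staging the tar file too early.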

The data ingestion workflow, including preprocessing, is shown in the accompanying figure.



During ingestion, the tar file is unpacked into /processed_data/. The files in this new directory are listed in Table 5.

TABLE 5
| File type | Example file name | Content | Purpose |
| --- | --- | --- | --- |
| Binary HDF5 file | cell_feature_matrix.h5 | Cell-by-gene expression matrix | Serves as the primary quantitative input for downstream analysis and AnnData conversion |
| CSV (gzipped) | cells.csv.gz | Cell-level metadata | Supplies QC info for each cell |
| Zarr (zipped) | cells.zarr.zip | Segmentation masks and boundaries for cells and nuclei | Used for spatial mapping, cell segmentation, and morphology analysis |
| CSV | metrics_summary.csv | Run-level and sample-level metrics | Used to assess run quality and to fetch sample/region IDs for pipeline processing |
| TIF or OME-TIF | Xenium_FFPE_Human_Breast_Cancer_Rep1_he_image.tif; GSM7780153_Post-Xenium_HE_Rep1.ome.tif | Post-Xenium H&E/IF images | Visualization and spatial context |
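
As an illustration of how the cell-level metadata in Table 5 can supply QC information, the following sketch reads a gzipped CSV using only the Python standard library. The column name `transcript_counts` and the function name are assumptions; check the header of your own cells.csv.gz before relying on them.

```python
import csv
import gzip
from statistics import median


def summarize_cells(path: str, count_column: str = "transcript_counts") -> dict:
    """Read a gzipped cell-level CSV and report a few basic QC numbers."""
    # Text mode ("rt") lets csv.DictReader consume the decompressed stream.
    with gzip.open(path, mode="rt", newline="") as handle:
        counts = [float(row[count_column]) for row in csv.DictReader(handle)]
    if not counts:
        raise ValueError(f"no rows found in {path}")
    return {
        "n_cells": len(counts),
        "median_counts": median(counts),
        "n_empty_cells": sum(1 for c in counts if c == 0),
    }
```

Calling `summarize_cells("cells.csv.gz")` on an unpacked run directory returns a small dictionary that can be logged or compared across samples.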

Unlike other types of data, Xenium data doesn't require a sample or submission sheet. You can simply ingest the raw data into HISE, which handles organization, validation, and metadata extraction for you. A filename looks something like this:

202208311221_EXP-00422-LN-FFPE-NDGFKF_XETG00123_region_A1
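
HISE extracts the metadata for you, so the following is only a sketch of what can be read off a name like the one above. The field meanings are assumptions inferred from the example (a 12-digit timestamp, an experiment label, an instrument ID, and a region); they are not a documented naming specification.

```python
import re
from datetime import datetime

# Assumed layout: <timestamp>_<experiment label>_<instrument ID>_region_<region>
NAME_PATTERN = re.compile(
    r"^(?P<timestamp>\d{12})_(?P<experiment>[^_]+)"
    r"_(?P<instrument>[^_]+)_region_(?P<region>[^_]+)$"
)


def parse_run_name(name: str) -> dict:
    """Split a run name into its assumed fields."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized run name: {name!r}")
    fields = match.groupdict()
    # Assumption: the 12 digits encode YYYYMMDDHHMM.
    fields["timestamp"] = datetime.strptime(fields["timestamp"], "%Y%m%d%H%M")
    return fields


info = parse_run_name("202208311221_EXP-00422-LN-FFPE-NDGFKF_XETG00123_region_A1")
print(info["experiment"], info["instrument"], info["region"])
```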

Output of results

Table 6 contains a list of downloadable/servable result file types. 

TABLE 6
| File Type | Kind | Name |
| --- | --- | --- |
| control-xenium-tar-content | Wildcard | Control Xenium Tar Content |
| scvi-model | .PT | SCVI Model |
| xenium-10-x-report | HTML | Xenium 10X Report |
| xenium-analysis-zarr | Zarr | Xenium Analysis Zarr |
| xenium-cell-boundaries-csv | CSV-GZ | Xenium Cell Boundaries CSV |
| xenium-cell-boundaries-parquet | Parquet | Xenium Cell Boundaries Parquet |
| xenium-cell-composition-counts-csv | CSV | Xenium Cell Composition Counts Csv |
| xenium-cell-composition-fractions-csv | CSV | Xenium Cell Composition Fractions Csv |
| xenium-cell-feature-h5 | H5 | Xenium Cell Feature H5 |
| xenium-cell-feature-zarr | Zarr.Zip | Xenium Cell Feature Zarr |
| xenium-cellcharter-cluster-pdf | PDF | Xenium Cellcharter Cluster Pdf |
| xenium-cellcharter-h5ad | H5 | Xenium Cellcharter H5ad |
| xenium-cellcharter-predictions-joblib | JobLib | Xenium Cellcharter Predictions Joblib |
| xenium-cellcharter-stability-plot-pdf | PDF | Xenium Cellcharter Stability Plot Pdf |
| xenium-cells-csv | CSV | Xenium Cells Csv |
| xenium-cells-parquet | Parquet | Xenium Cells Parquet |
| xenium-cells-zarr | Zarr | Xenium Cells Zarr |
| xenium-celltypist-cluster-umap-pdf | PDF | Xenium Celltypist Cluster Map Umap Pdf |
| xenium-celltypist-predicted-labels-csv | CSV | Xenium Celltypist Predicted Labels Csv |
| xenium-celltypist-predictions-h5ad | H5 | Xenium Celltypist Predictions H5ad |
| xenium-celltypist-predictions-joblib | JobLib | Xenium Celltypist Predictions Joblib |
| xenium-experiment | .xenium | Xenium Experiment |
| xenium-filtered-h5ad | H5 | Xenium Filtered H5ad |
| xenium-filtered-qc-pdf | PDF | filtered_qc.pdf |
| xenium-gene-panel-json | JSON | Xenium Gene Panel Json |
| xenium-h5ad | H5 | Xenium H5ad |
| xenium-metrics-summary-csv | CSV | Xenium Metrics Summary CSV |
| xenium-morphology-0-tif | TIF | Xenium Morphology 0 Tif |
| xenium-morphology-1-tif | TIF | Xenium Morphology 1 Tif |
| xenium-morphology-2-tif | TIF | Xenium Morphology 2 Tif |
| xenium-morphology-3-tif | TIF | Xenium Morphology 3 Tif |
| xenium-morphology-ome-tif | TIF | Xenium Morphology OME TIF |
| xenium-nucleus-boundaries-csv | CSV | Xenium Nucleus Boundaries CSV |
| xenium-nucleus-boundaries-parquet | Parquet | Xenium Nucleus Boundaries Parquet |
| xenium-qc-filtered-h5ad | H5 | Xenium QC Filtered H5ad |
| xenium-qc-pdf | PDF | Xenium QC PDF |
| xenium-raw-qc-pdf | PDF | Xenium Raw QC Pdf |
| xenium-tar-content | Wildcard | Xenium Tar Content |
| xenium-transcripts-csv | CSV | Xenium Transcripts CSV |
| xenium-transcripts-parquet | Parquet | Xenium Transcripts Parquet |
| xenium-transcripts-zarr | Zarr | Xenium Transcripts Zarr |
| xenium-zone-cell-type-counts-pdf | PDF | Xenium Zone Cell Type Counts PDF |
| xenium-zone-cell-type-fractions-csv | CSV | Xenium Zone Cell Type Fractions CSV |

Visualization of results

You can use an interactive visualization tool to understand your results. Examples of the types of visualizations you can create are listed in Table 7.

TABLE 7
| Visualization | Description |
| --- | --- |
| Spatial maps | Overlay gene expression or cell types on tissue images using sc.pl.spatial() or similar functions |
| Cluster plots | Visualize clusters or cell types in reduced dimensions |
| QC plots | Display metrics like total counts per cell, number of genes per cell, or cell/nucleus area |
| Bar charts, heat maps, or violin plots | Summarize cell composition, gene expression, or spatial domains |
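
A cell composition bar chart usually starts from per-type counts and fractions, similar in spirit to the cell composition CSV outputs in Table 6. The sketch below shows that summary step with the standard library only; the function name and the example labels are hypothetical, and plotting is left to whichever library you prefer.

```python
from collections import Counter


def cell_composition(labels):
    """Return ({cell_type: count}, {cell_type: fraction}) for a list of labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no cell labels supplied")
    fractions = {cell_type: n / total for cell_type, n in counts.items()}
    return dict(counts), fractions


# Hypothetical labels standing in for per-cell type predictions.
counts, fractions = cell_composition(["T cell", "B cell", "T cell", "T cell"])
print(counts)
print(fractions)
```

The two dictionaries map directly onto the bar heights of a counts chart and a fractions chart.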


Exploration

After ingestion, your data is ready for exploration and interactive analysis in a HISE NextGen IDE.

Interactive data analysis

The Jupyter Notebook/IDE environment is used for intersample analysis in either interactive mode (manually interacting with a Jupyter notebook) or batch mode (notebook jobs). You can load AnnData (.h5ad) files and other outputs for custom analysis using Python libraries such as Scanpy, Squidpy, or Seaborn. You can also perform dimensionality reduction (for example, UMAP, t-SNE, or PCA) and clustering. Another option is to run an advanced query to filter cells by type, spatial region, or gene expression.
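
To make the three filter criteria above concrete (cell type, spatial region, and gene expression), here is a plain-Python stand-in. It does not use HISE's query interface, which is not shown in this document; the `Cell` class, the `query_cells` function, and the example values are all hypothetical. With an AnnData object, the same idea is normally expressed as a boolean mask over the per-cell annotations.

```python
from dataclasses import dataclass, field


@dataclass
class Cell:
    cell_id: str
    cell_type: str
    x: float
    y: float
    counts: dict = field(default_factory=dict)  # gene name -> transcript count


def query_cells(cells, cell_type=None, region=None, gene=None, min_count=1):
    """Filter cells by type, bounding box (x_min, y_min, x_max, y_max), and gene."""
    selected = []
    for cell in cells:
        if cell_type is not None and cell.cell_type != cell_type:
            continue
        if region is not None:
            x_min, y_min, x_max, y_max = region
            if not (x_min <= cell.x <= x_max and y_min <= cell.y <= y_max):
                continue
        if gene is not None and cell.counts.get(gene, 0) < min_count:
            continue
        selected.append(cell)
    return selected


# Hypothetical cells for illustration only.
cells = [
    Cell("c1", "T cell", 10.0, 10.0, {"CD3E": 5}),
    Cell("c2", "T cell", 500.0, 500.0, {"CD3E": 2}),
    Cell("c3", "B cell", 12.0, 8.0, {"MS4A1": 7}),
]
hits = query_cells(cells, cell_type="T cell", region=(0, 0, 100, 100), gene="CD3E")
print([cell.cell_id for cell in hits])
```

Each criterion is optional, so the same function covers a type-only, region-only, or expression-only query.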

For deeper biological insights, you can combine Xenium data with other spatial transcriptomics datasets, such as Visium, or with scRNA-seq data. For cross-dataset analysis, batch correction, or spatial alignment, you can integrate tools like Scanpy, Squidpy, or STalign. Then export your results for further downstream analysis or publication.


Related Resources

Submit and Monitor Pipeline Batches (Tutorial)

Understand Automated Pipelines

Configure a Pipeline (Tutorial)