HTAN provides data from a wide range of imaging assays, including H&E, t-CyCIF and MxIF, clinical and multiplex IHC, SABER, IMC, MIBI, CODEX, GeoMx-DSP, and MERFISH. A core set of metadata attributes is relevant to all imaging assays, while additional extended metadata apply to only a subset (one or more) of them.

HTAN data levels for imaging

Borrowing from the TCGA and GDC, HTAN uses the concept of “data levels”. For imaging data, “Level 1” typically represents the raw output of the instrument. At this level, images may be in a variety of formats and may require reformatting or other pre-processing steps (such as stitching or intensity normalization) before they are considered “Level 2” image files.

Currently only Levels 1 and 2 have been defined:

Level 1: Raw image data – TIFF, SVS, IMS, CZI, MCD, RCPNL, and DICOM

Level 2: Pre-processed image data – OME-TIFF
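
As a rough illustration of the two levels listed above, the following Python sketch guesses a data level from a file name. The function name guess_imaging_level and the extension lists are assumptions made for this example and are not part of the HTAN data model; in practice the level reflects how a file was processed, and an extension is only a hint.

```python
from pathlib import Path

# Hypothetical extension lists derived from the formats named above.
# OME-TIFF must be checked first because ".ome.tiff" also ends in ".tiff".
LEVEL_2_SUFFIXES = (".ome.tif", ".ome.tiff")
LEVEL_1_SUFFIXES = (".tif", ".tiff", ".svs", ".ims", ".czi", ".mcd", ".rcpnl", ".dcm")


def guess_imaging_level(filename):
    """Return 2, 1, or None based only on the file extension (a heuristic)."""
    name = Path(filename).name.lower()
    if name.endswith(LEVEL_2_SUFFIXES):
        return 2
    if name.endswith(LEVEL_1_SUFFIXES):
        return 1
    return None  # not a format listed for Level 1 or Level 2


if __name__ == "__main__":
    for example in ("tumor_01.ome.tiff", "slide_A.svs", "notes.txt"):
        print(example, "->", guess_imaging_level(example))
```

A check like this could serve as a first-pass sanity test before submission, but it cannot tell a raw TIFF from a pre-processed one that was never converted to OME-TIFF.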

Metadata Attributes

As a starting point, HTAN followed the existing OME-XML metadata standard, as well as the extension recently proposed by the 4DN imaging standards working group. Since these existing standards are focused on fluorescence microscopy, they have been extended to meet the metadata needs of other types of assays used by HTAN centers.

Antibody-based methods are organized as shown in this figure:

Biospecimen Preparation and Experimental Metadata

For cyclic and/or multi-target methods, certain sets of attributes are necessary to describe each target within each cycle (and sub-cycle, where applicable).  For example, t-CyCIF image files contain multiple images, each representing one “channel” with a specific target, antibody, and fluorophore.  This type of channel-level metadata will be supplied in a separate “companion CSV” file for each image data file.  All other attributes described in this document are file-level metadata and can be supplied in a single spreadsheet for multiple similar data files.
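
To make the split between channel-level and file-level metadata more concrete, here is a minimal Python sketch that reads a companion CSV and groups targets by cycle. The column names (channel_id, cycle, target, antibody, fluorophore) and every row value are illustrative placeholders chosen for this example; they are not the attribute names or content defined by HTAN.

```python
import csv
import io

# Illustrative companion CSV for one image file: one row per channel.
# Column names and values are placeholders, not the HTAN schema.
COMPANION_CSV = """channel_id,cycle,target,antibody,fluorophore
0,1,DNA,,Hoechst 33342
1,1,CD45,anti-CD45,Alexa Fluor 488
2,2,DNA,,Hoechst 33342
3,2,Ki-67,anti-Ki-67,Alexa Fluor 647
"""


def load_channels(text):
    """Parse companion CSV text into a list of per-channel dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def targets_by_cycle(rows):
    """Group target names by imaging cycle, preserving channel order."""
    grouped = {}
    for row in rows:
        grouped.setdefault(int(row["cycle"]), []).append(row["target"])
    return grouped


if __name__ == "__main__":
    channels = load_channels(COMPANION_CSV)
    for cycle, targets in targets_by_cycle(channels).items():
        print(f"cycle {cycle}: {', '.join(targets)}")
```

Keeping one row per channel means a file with many cycles simply has more rows, while attributes shared by the whole file stay in the single file-level spreadsheet.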

For an H&E image, on the other hand, the two stains (hematoxylin and eosin) are applied to the tissue, which is then imaged using three channels (Red, Green, and Blue). Since there is no one-to-one relationship between a stain and a channel, the information about these stains properly belongs in the file-level metadata.