HTAN Mass Spectrometry

The current HTAN Mass Spectrometry Standard was developed with a focus on Proteomics data. The standard may be refined in the future to reflect other data types such as lipids and metabolites.

Level | Definition | Example Data
1 | Raw spectral data | mz5, dta, ms2, ms1, mzXML, mzML, mzData
2 | Spectrum match peaks | Peptide spectrum match (PSM) in csv format
3 | Peptide Group information | Combined peptide spectrum as csv or tsv files
4 | Protein Abundance | csv, tsv
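
The level definitions above can be read as a rough lookup from file extension to data level. The following is a minimal Python sketch, not part of the HTAN standard: the function name `guess_ms_level` is hypothetical, and it assumes that the file extension alone is informative. Because csv and tsv appear at several levels, the sketch reports them as ambiguous rather than guessing.

```python
from pathlib import Path

# Level 1 example formats from the level definitions, lowercased for comparison.
LEVEL_1_EXTENSIONS = {"mz5", "dta", "ms2", "ms1", "mzxml", "mzml", "mzdata"}

# csv is listed for Levels 2-4 and tsv for Levels 3-4, so the extension
# cannot identify a single level for these files.
AMBIGUOUS_EXTENSIONS = {
    "csv": "Level 2, 3 or 4",
    "tsv": "Level 3 or 4",
}


def guess_ms_level(filename: str) -> str:
    """Return a coarse guess at the MS data level from the file extension."""
    ext = Path(filename).suffix.lstrip(".").lower()
    if ext in LEVEL_1_EXTENSIONS:
        return "Level 1"
    if ext in AMBIGUOUS_EXTENSIONS:
        return AMBIGUOUS_EXTENSIONS[ext]
    return "unknown"
```

In practice the level is declared by the manifest a file is submitted with, so a check like this is only useful as a sanity test before filling in a manifest.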

Attributes

WARNING: Manifests provided on this page are for reference only. DO NOT USE THESE MANIFESTS FOR DATA SUBMISSION.

Directions

The interactive tables below are provided to help users understand the HTAN Data Model. The tables allow a user to view, search or download attributes either:

  1. in a specific manifest; or
  2. in all manifests represented on this page.

To view a specific manifest, click on the link in the Manifests tab. The manifest will appear in a new tab on the page. Navigate to the new tab to search for attributes or download the manifest.
To search for attributes among all manifests, navigate to the All Attributes tab and use the search box provided at the top of the tab. All attributes can also be downloaded as a csv file.
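
Once the attributes have been downloaded as a csv file, the same search can be done offline. The sketch below is illustrative only: it assumes the downloaded file has `Attribute` and `Description` columns matching the table headers on this page, which has not been verified against the actual export, and the function name `search_attributes` is hypothetical.

```python
import csv
import io


def search_attributes(csv_text: str, term: str) -> list[str]:
    """Return attribute names whose name or description contains the term."""
    term = term.lower()
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row["Attribute"]
        for row in reader
        if term in row["Attribute"].lower() or term in row["Description"].lower()
    ]


# Two rows copied from the attribute table on this page, used as sample input.
SAMPLE = (
    "Attribute,Description\n"
    "Polarity,The polarity of the mass analysis (positive or negative ion modes)\n"
    'LC Resin,"Details of the resin used for LC, including vendor, particle size, pore size"\n'
)

matches = search_attributes(SAMPLE, "resin")
```

For a real file, the text can be read with `open(path, newline="")` and passed to the same function.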

Manifest | Description
Mass Spectrometry Level 1 | Mass Spectrometry derived data that includes proteomics, metabolomics, and lipidomics, level 1
Mass Spectrometry Level 2 | Mass Spectrometry derived data that includes proteomics, metabolomics, and lipidomics, level 2
Mass Spectrometry Level 3 | Mass Spectrometry derived data that includes proteomics, metabolomics, and lipidomics, level 3
Mass Spectrometry Level 4 | Mass Spectrometry derived data that includes proteomics, metabolomics, and lipidomics, level 4
Mass Spectrometry Auxiliary File | Auxiliary software parameter file used in mass spectrometry data processing, recorded as synapse ID (syn12345).
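
Because the auxiliary file is recorded as a Synapse ID such as syn12345, a simple format check can catch pasted paths or bare numbers before submission. This is a minimal sketch assuming that a Synapse ID is the prefix `syn` followed by digits; the helper name `is_synapse_id` is hypothetical, and the check does not confirm that the entity exists.

```python
import re

# Assumption: a Synapse ID is the literal prefix "syn" followed by digits.
SYNAPSE_ID = re.compile(r"syn\d+")


def is_synapse_id(value: str) -> bool:
    """Return True if the value looks like a Synapse ID, ignoring outer whitespace."""
    return SYNAPSE_ID.fullmatch(value.strip()) is not None
```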
Attribute
Manifest Name
Description
Required
Conditional If
Data Type
Valid Values
Filename
- Mass Spectrometry Level 1
- Mass Spectrometry Level 2
- Mass Spectrometry Level 3
- Mass Spectrometry Level 4
- Mass Spectrometry Auxiliary File
Name of a file
True
String
File Format
- Mass Spectrometry Level 1
- Mass Spectrometry Level 2
- Mass Spectrometry Level 3
- Mass Spectrometry Level 4
- Mass Spectrometry Auxiliary File
Format of a file (e.g., txt, csv, fastq, bam)
True
String
- hdf5
- bedgraph
- idx
- idat
- bam
- bai
- excel
- powerpoint
- tif
- tiff
- ome-tiff
- png
- doc
- pdf
- fasta
- fastq
- sam
- vcf
- bcf
- maf
- bed
- chp
- cel
- sif
- tsv
- csv
- txt
- plink
- bigwig
- wiggle
- gct
- bgzip
- zip
- seg
- html
- mov
- hyperlink
- svs
- md
- flagstat
- gtf
- raw
- msf
- rmd
- bed narrowpeak
- bed broadpeak
- bed gappedpeak
- avi
- pzfx
- fig
- xml
- tar
- r script
- abf
- bpm
- dat
- jpg
- locs
- sentrix descriptor file
- python script
- sav
- gzip
- sdf
- rdata
- hic
- ab1
- 7z
- gff3
- json
- sqlite
- svg
- sra
- recal
- tranches
- mtx
- tagalign
- dup
- dicom
- czi
- mex
- cloupe
- am
- cell am
- mpg
- m
- mzml
- scn
- dcc
- rcc
- pkc
- sf
- bedpe
HTAN Parent Biospecimen ID
- Mass Spectrometry Level 1
- Mass Spectrometry Level 2
- Mass Spectrometry Level 3
- Mass Spectrometry Level 4
HTAN Biospecimen Identifier (e.g., HTANx_yyy_zzz) indicating the biospecimen(s) from which these files were derived; multiple parent biospecimens should be comma-separated
True
- Is lowest level is "Yes - Is lowest level"
String
HTAN Data File ID
- Mass Spectrometry Level 1
- Mass Spectrometry Level 2
- Mass Spectrometry Level 3
- Mass Spectrometry Level 4
- Mass Spectrometry Auxiliary File
Self-identifier for this data file: the HTAN ID of this file, as defined in the HTAN ID SOP (e.g., HTANx_yyy_zzz)
True
String
MS Batch ID
- Mass Spectrometry Level 1
Batch ID indicating a set of samples that were run together.
True
String
MS-based Assay Type
- Mass Spectrometry Level 1
Analytes are the target molecules being measured with the assay.
True
String
- lc-ms
- ms
- tmt
- lc-ms/ms
Analyte Type
- Mass Spectrometry Level 1
The kind of molecular specimen analyte: a molecular derivative (i.e., RNA, DNA, or protein lysate) obtained from a specimen
True
- Biospecimen is "Analyte"
String
- cfdna analyte
- dna analyte
- rna analyte
- total rna analyte
- tissue block analyte
- tissue section analyte
- pbmcs or plasma or serum analyte
- cdna libraries analyte
- pbmcs
- plasma
- serum analyte
- lipid
- protein
- metabolite
MS-based Targeted
- Mass Spectrometry Level 1
Specifies whether or not a specific molecule(s) is/are targeted for detection/measurement by the assay. Example: The MALDI Imaging analyte is lipids.
True
String
- targeted
- untargeted
MS Instrument Vendor and Model
- Mass Spectrometry Level 1
An acquisition instrument is the device that contains the signal detection hardware and signal processing software. Assays generate signals such as light of various intensities or color or signals representing the molecular mass.
True
String
MS Source
- Mass Spectrometry Level 1
The ion source type used for surface sampling (MALDI, MALDI-2, DESI, or SIMS) or LC-MS/MS data acquisition (nESI)
True
String
- maldi
- maldi-2
- desi
- sims
- nesi
- other
Polarity
- Mass Spectrometry Level 1
The polarity of the mass analysis (positive or negative ion modes)
True
String
- positive
- negative
Mass Range Low Value
- Mass Spectrometry Level 1
The low value of the scanned mass range for MS1 in m/z.
True
String
Mass Range High Value
- Mass Spectrometry Level 1
The high value of the scanned mass range for MS1 in m/z.
True
String
Data Collection Mode
- Mass Spectrometry Level 1
Mode of data collection in tandem MS assays. Either DDA (data-dependent acquisition) or DIA (data-independent acquisition).
True
String
- dda
- dia
- other
MS Scan Mode
- Mass Spectrometry Level 1
Indicates whether experiment is MS, MS/MS, or other (possibly MS3 for TMT)
True
String
- ms
- ms/ms
- ms3
- other
MS Labeling
- Mass Spectrometry Level 1
Indicates whether samples were labeled prior to MS analysis (e.g., TMT)
True
String
LC Instrument Vendor and Model
- Mass Spectrometry Level 1
The manufacturer of the instrument used for LC.
True
String
LC Column Vendor and Model
- Mass Spectrometry Level 1
The manufacturer and the model number/name of the LC column. If a custom self-packed, pulled tip capillary is used, enter 'Pulled tip capillary'.
True
String
LC Resin
- Mass Spectrometry Level 1
Details of the resin used for LC, including vendor, particle size, and pore size
True
String
LC Length Value
- Mass Spectrometry Level 1
LC column length in cm.
True
String
LC Temp Value
- Mass Spectrometry Level 1
LC temperature in °C.
True
String
LC ID Value
- Mass Spectrometry Level 1
LC column inner diameter in microns.
True
String
LC Flow Rate
- Mass Spectrometry Level 1
LC flow rate in nL/min.
True
String
LC Gradient
- Mass Spectrometry Level 1
The gradient program, which dictates the mobile phase solvent composition over the course of the chromatographic run.
True
String
LC Mobile Phase A
- Mass Spectrometry Level 1
Composition of mobile phase A
True
String
LC Mobile Phase B
- Mass Spectrometry Level 1
Composition of mobile phase B
True
String
Software and Version
- Mass Spectrometry Level 1
- Mass Spectrometry Level 2
- Mass Spectrometry Level 3
- Mass Spectrometry Level 4
Name of software used to generate expression values.
True
- Pseudo Alignment Used is "Yes - Pseudo Alignment Used"
String
MS Instrument Metadata File
- Mass Spectrometry Level 1
Additional file containing instrument metadata details. Use either synapse_path or entity_Id
False
String
HTAN Parent Data File ID
- Mass Spectrometry Level 2
- Mass Spectrometry Level 3
- Mass Spectrometry Level 4
- Mass Spectrometry Auxiliary File
HTAN Data File Identifier indicating the file(s) from which these files were derived
True
String
MS Assay Category
- Mass Spectrometry Level 2
- Mass Spectrometry Level 3
- Mass Spectrometry Level 4
Type of Mass Spectrometry performed.
True
String
Mass Spectrometry Auxiliary File
- Mass Spectrometry Level 2
- Mass Spectrometry Level 3
- Mass Spectrometry Level 4
Auxiliary software parameter file used in mass spectrometry data processing, recorded as synapse ID (syn12345).
False
String
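
The Valid Values lists for the Level 1 attributes above lend themselves to a pre-submission check. The sketch below is not the HTAN validator; it is a minimal illustration under several assumptions: a manifest row is available as a dictionary keyed by attribute name, values are compared case-insensitively, and the mass range values are numeric even though their data type is String. The rule that the low value must be below the high value is a sanity check added here, not a stated requirement, and the function name `validate_level1_row` is hypothetical.

```python
# Controlled vocabularies copied from the Valid Values lists on this page.
VALID_VALUES = {
    "MS-based Assay Type": {"lc-ms", "ms", "tmt", "lc-ms/ms"},
    "MS-based Targeted": {"targeted", "untargeted"},
    "MS Source": {"maldi", "maldi-2", "desi", "sims", "nesi", "other"},
    "Polarity": {"positive", "negative"},
    "Data Collection Mode": {"dda", "dia", "other"},
    "MS Scan Mode": {"ms", "ms/ms", "ms3", "other"},
}


def validate_level1_row(row: dict[str, str]) -> list[str]:
    """Return a list of human-readable problems found in one manifest row."""
    errors = []

    # Each of these attributes is marked Required and has a fixed vocabulary.
    for attribute, allowed in VALID_VALUES.items():
        value = row.get(attribute, "").strip()
        if not value:
            errors.append(f"{attribute}: required value is missing")
        elif value.lower() not in allowed:
            errors.append(f"{attribute}: '{value}' is not a valid value")

    # Assumed sanity check: both mass range bounds parse as numbers (m/z)
    # and the low bound is strictly below the high bound.
    try:
        low = float(row.get("Mass Range Low Value", ""))
        high = float(row.get("Mass Range High Value", ""))
    except ValueError:
        errors.append("Mass Range Low/High Value: both must be numeric")
    else:
        if low >= high:
            errors.append("Mass Range Low Value must be below Mass Range High Value")

    return errors
```

An empty list means the row passed these checks; it does not mean the row is complete, since free-text required attributes such as MS Batch ID are not examined here.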