Introduction

Within HTAN, we define research participants as individuals with precancerous or cancerous disease. From each research participant, we may obtain one or more biospecimens (biological samples), either at a single time point or over time. These biospecimens are likely to be further processed into derivative biospecimens or analytes, such as RNA, serum, or a tissue slide.

The Biospecimen data model captures the following:

  • Links from biospecimens and derivative biospecimens to research participants
  • Multiple biospecimens collected from a participant over time
  • Links from derivative biospecimens to their parent biospecimens
  • Links from biospecimen processing steps to external protocols hosted on Protocols.io
  • Acquisition method, e.g. autopsy, biopsy, or fine needle aspirate
  • Topography code, indicating the site within the body, e.g. based on ICD-O-3
  • Collection information, e.g. time, duration of ischemia, or temperature
  • Processing of the parent biospecimen, e.g. fresh or frozen
  • Clinical metadata for biospecimens and derivatives, i.e. the Histologic Morphology Code, e.g. based on ICD-O-3
  • Coordinates of derivative biospecimens within their parent biospecimen
  • Processing of derivative biospecimens for downstream analysis, e.g. dissociation, sectioning, or analyte isolation
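
To make the parent and derivative relationships listed above more concrete, the following is a minimal sketch in Python. It is not the HTAN schema: the class name, every field name, the identifiers (P1, B1, B2, B3), and the lineage helper are hypothetical and chosen only for illustration. It assumes that each biospecimen record stores the ID of its participant and, for a derivative, the ID of its parent biospecimen.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Biospecimen:
    # Hypothetical field names; the real HTAN attributes may differ.
    biospecimen_id: str
    participant_id: str                       # link to the research participant
    parent_id: Optional[str] = None           # None when taken directly from the participant
    acquisition_method: Optional[str] = None  # e.g. "biopsy"
    topography_code: Optional[str] = None     # e.g. an ICD-O-3 topography code
    collection_day: Optional[int] = None      # illustrative time point for tracking over time


def lineage(biospecimen_id: str, registry: Dict[str, Biospecimen]) -> List[str]:
    """Return the IDs from the given biospecimen up to its root parent."""
    chain: List[str] = []
    seen = set()
    current: Optional[str] = biospecimen_id
    while current is not None:
        if current in seen:
            # Guard against a malformed record set with circular parent links.
            raise ValueError(f"circular parent link at {current}")
        seen.add(current)
        chain.append(current)
        current = registry[current].parent_id
    return chain


# Illustrative records: a biopsy (B1), a section cut from it (B2),
# and an analyte isolated from that section (B3).
registry = {
    b.biospecimen_id: b
    for b in [
        Biospecimen("B1", "P1", acquisition_method="biopsy",
                    topography_code="C50.9", collection_day=0),
        Biospecimen("B2", "P1", parent_id="B1"),
        Biospecimen("B3", "P1", parent_id="B2"),
    ]
}

print(lineage("B3", registry))  # ['B3', 'B2', 'B1']
```

Storing only the immediate parent on each record keeps the sketch small; the full chain back to the original biospecimen, and from there to the participant, can be recovered by following the parent links.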

Tiers of Biospecimen Metadata

There are two tiers of Biospecimen Metadata:

Tier 1 – Covers base biospecimen data common to most assays and most HTAN centers.

Tier 2 – Covers assay-specific or center-specific extensions to the base model.