HTAN Clinical Data

HTAN clinical data consists of three tiers.

Tier 1 is based on the NCI Genomic Data Commons (GDC) clinical data model, while Tiers 2 and 3 are extensions to the GDC model.

Tier | Description
1 | Seven categories of clinical data, based on the GDC clinical data model. See GDC Table below.
2 | HTAN disease-agnostic extensions to the GDC clinical data model.
3 | HTAN disease-specific extensions to the GDC clinical data model.

Tier 1 Clinical Data

Tier 1 clinical data consists of seven categories of data from the GDC Data Model.

Category | Description
Demographics | Data for the characterization of the patient by means of segmenting the population (e.g., characterization by age, sex, or race).
Diagnosis | Data from the investigation, analysis and recognition of the presence and nature of disease, condition, or injury from expressed signs and symptoms; also, the scientific determination of any kind; the concise results of such an investigation.
Exposure | Clinically relevant patient information not immediately resulting from genetic predispositions.
Family History | Record of a patient's background regarding cancer events of blood relatives.
Follow-up | A visit by a patient or study participant to a medical professional. A clinical encounter that encompasses planned and unplanned trial interventions, procedures and assessments that may be performed on a subject. A visit has a start and an end, each described with a rule. The process by which information about the health status of an individual is obtained before and after a study has officially closed; an activity that continues something that has already begun or that repeats something that has already been done.
Molecular Test | Information pertaining to any molecular tests performed on the patient during a clinical event.
Therapy | Record of the administration and intention of therapeutic agents provided to a patient to alter the course of a pathologic process.

Tiers 2 and 3 Clinical Data

Tier 2 consists of disease-agnostic extensions to the GDC clinical data model.

Tier 3 consists of disease-specific extensions to the GDC clinical data model. This covers additional elements for Acute Lymphoblastic Leukemia (ALL), Brain Cancer, Breast Cancer, Lung Cancer, Melanoma, Ovarian Cancer, Pancreatic Cancer, Prostate Cancer and Sarcoma.

Attributes

WARNING: Manifests provided on this page are for reference only. DO NOT USE THESE MANIFESTS FOR DATA SUBMISSION.

Directions

The interactive tables below are provided to help users understand the HTAN Data Model. The tables allow a user to view, search or download attributes either:

  1. in a specific manifest; or
  2. in all manifests represented on this page.

To view a specific manifest, click on the link in the Manifests tab. The manifest will appear in a new tab on the page. Navigate to the new tab to search for attributes or download the manifest.
To search for attributes among all manifests, navigate to the All Attributes tab and use the search box provided at the top of the tab. All attributes can also be downloaded as a CSV file.
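As a rough illustration of the search described above, the following Python sketch filters an attributes file by keyword and, optionally, by manifest. The column names (`Attribute`, `Manifest Name`, `Description`, `Required`) mirror the table headers on this page, but the exact layout of the downloaded CSV is an assumption, and the sample rows are abbreviated stand-ins for the real file. The function name `search_attributes` is hypothetical.

```python
import csv
import io

# Stand-in for the downloaded CSV file. The real column layout is ASSUMED
# to match the table headers shown on this page.
SAMPLE_CSV = """Attribute,Manifest Name,Description,Required
Ethnicity,Demographics,Self-described social and cultural grouping,True
Vital Status,Demographics,The survival state of the person,True
Tumor Grade,Diagnosis,Degree of abnormality of cancer cells,False
"""


def search_attributes(csv_text, term, manifest=None):
    """Return rows whose attribute name or description contains `term`.

    If `manifest` is given, only rows from that manifest are returned.
    Matching is case-insensitive.
    """
    term = term.lower()
    matches = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if manifest and row["Manifest Name"].lower() != manifest.lower():
            continue
        haystack = (row["Attribute"] + " " + row["Description"]).lower()
        if term in haystack:
            matches.append(row)
    return matches


if __name__ == "__main__":
    for row in search_attributes(SAMPLE_CSV, "status", manifest="Demographics"):
        print(row["Attribute"], "-", row["Description"])
```

To run this against a real download, the contents of the file would be read into a string and passed in place of `SAMPLE_CSV`, after checking that the column names match.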

Manifest
Description
Demographic attributes
Disease diagnosis
Exposure to carcinogens
Family cancer history
Follow up clinical visits
Clinical molecular test data
Updates to a participant's vital status
Clinical therapy or treatment
Tier 2 Cancer Data
Acute Lymphoblastic Leukemia attributes in Clinical Data Tier 3
Breast cancer specific attributes in Clinical Data Tier 3
Colorectal cancer specific attributes in Clinical Data Tier 3
Lung cancer specific attributes in Clinical Data Tier 3
Melanoma specific attributes in Clinical Data Tier 3
Ovarian cancer specific attributes in Clinical Data Tier 3
Pancreatic cancer specific attributes in Clinical Data Tier 3
Prostate cancer specific attributes in Clinical Data Tier 3
Sarcoma specific attributes in Clinical Data Tier 3
Attribute | Manifest Name | Description | Required | Conditional If | Data Type | Valid Values
HTAN Participant ID
- Demographics
- Diagnosis
- Exposure
- Family History
- Follow Up
- Molecular Test
- Participant Vital Status Update
... (18 manifests in total)
HTAN ID associated with a patient based on HTAN ID SOP (e.g., HTANx_yyy)
True
String
Ethnicity
- Demographics
An individual's self-described social and cultural grouping, specifically whether an individual describes themselves as Hispanic or Latino. The provided values are based on the categories defined by the U.S. Office of Management and Budget and used by the U.S. Census Bureau.
True
String
- hispanic or latino
- not hispanic or latino
- unknown
- not reported
- not allowed to collect
Gender
- Demographics
Text designations that identify gender. Gender is described as the assemblage of properties that distinguish people on the basis of their societal roles. [Identification of gender is based upon self-report and may come from a form, questionnaire, interview, etc.]
True
String
- female
- male
- unknown
- unspecified
- not reported
Race
- Demographics
An arbitrary classification of a taxonomic group that is a division of a species. It usually arises as a consequence of geographical isolation within a species and is characterized by shared heredity, physical attributes and behavior, and in the case of humans, by common history, nationality, or geographic distribution.
True
String
- white
- american indian or alaska native
- black or african american
- asian
- native hawaiian or other pacific islander
- other
... (9 valid options in total)
Vital Status
- Demographics
- Participant Vital Status Update
The survival state of the person registered on the protocol.
True
String
- alive
- dead
- unknown
- not reported
Days to Birth
- Demographics
Number of days between the index date and the person's date of birth, represented as a calculated negative number of days. If not applicable, enter 'Not Applicable'.
False
String
Country of Residence
- Demographics
Country of Residence at enrollment
False
String
- afghanistan
- albania
- algeria
- andorra
- angola
- anguilla
- antigua and barbuda
- argentina
... (232 valid options in total)
Age Is Obfuscated
- Demographics
The age of the patient has been modified for compliance reasons. The actual age differs from what is reported. Other date intervals for this patient may also be modified.
False
String
- true
- false
Year Of Birth
- Demographics
Numeric value to represent the calendar year in which an individual was born.
False
String
Occupation Duration Years
- Demographics
The number of years a patient worked in a specific occupation.
False
String
Premature At Birth
- Demographics
The yes/no/unknown indicator used to describe whether the patient was premature (less than 37 weeks gestation) at birth.
False
String
- yes
- no
- unknown
- not reported
Weeks Gestation at Birth
- Demographics
Numeric value used to describe the number of weeks starting from the approximate date of the biological mother's last menstrual period and ending with the birth of the patient.
False
String
Dead
- Demographics
This indicates the participant is dead and defines further required metadata
False
String
Year of Death
- Demographics
Numeric value to represent the year of the death of an individual.
True
- Vital Status is "Dead"
String
Cause of Death
- Demographics
The cause of death
True
- Vital Status is "Dead"
String
- cancer related
- cardiovascular disorder nos
- end-stage renal disease
- infection
- not cancer related
- renal disorder nos
- spinal muscular atrophy
- surgical complications
... (12 valid options in total)
Cause of Death Source
- Demographics
The text term used to describe the source used to determine the patient's cause of death.
False
- Vital Status is "Dead"
String
- autopsy
- death certificate
- medical record
- social security death index
- obituary
- unknown
- not reported
Days to Death
- Demographics
Number of days between the index date and the person's date of death, represented as a calculated number of days. If not applicable, enter 'Not Applicable'.
False
- Vital Status is "Dead"
String
Age at Diagnosis
- Diagnosis
Age at the time of diagnosis expressed in number of days since birth.
True
String
Year of Diagnosis
- Diagnosis
Numeric value to represent the year of an individual's initial pathologic diagnosis of cancer.
False
String
Primary Diagnosis
- Diagnosis
Text term used to describe the patient's histologic diagnosis, as described by the World Health Organization's (WHO) International Classification of Diseases for Oncology (ICD-O).
True
String
- acinar cell carcinoma
- acute basophilic leukaemia
- acute leukemia burkitt type
- acute leukemia nos
- acute lymphatic leukemia
- acute lymphoblastic leukemia-lymphoma nos
- acute lymphoblastic
... (505 valid options in total)
Precancerous Condition Type
- Diagnosis
The classification of pre-cancerous cells found in a specific collection of data being studied by the Consortium for Molecular and Cellular Characterization of Screen-Detected Lesions (MCL).
False
String
- ductal carcinoma in situ
- pancreatic intraductal papillary-mucinous neoplasm
- atypical adenomatous lung hyperplasia
- other
- pancreatic intraepithelial neoplasia
... (41 valid options in total)
Site of Resection or Biopsy
- Diagnosis
The text term used to describe the anatomic site of the resection or biopsy of the patient's malignant disease, as described by the World Health Organization's (WHO) International Classification of Diseases for Oncology (ICD-O).
True
- Biospecimen is "Bone"
- Biospecimen is "Urine"
- Biospecimen is "Tissue"
String
- abdomen nos
- abdominal esophagus
- accessory sinus nos
- acoustic nerve
- adrenal gland nos
- ampulla of vater
- anal canal
- anterior 2/3 of tongue nos
... (333 valid options in total)
Tissue or Organ of Origin
- Diagnosis
The text term used to describe the anatomic site of origin, of the patient's malignant disease, as described by the World Health Organization's (WHO) International Classification of Diseases for Oncology (ICD-O).
True
String
- abdomen nos
- abdominal esophagus
- accessory sinus nos
- acoustic nerve
- adrenal gland nos
- ampulla of vater
- anal canal
- anterior 2/3 of tongue nos
... (333 valid options in total)
Morphology
- Diagnosis
The third edition of the International Classification of Diseases for Oncology, published in 2000, used principally in tumor and cancer registries for coding the site (topography) and the histology (morphology) of neoplasms. The study of the structure of the cells and their arrangement to constitute tissues and, finally, the association among these to form organs. In pathology, the microscopic process of identifying normal and abnormal morphologic characteristics in tissues, by employing various cytochemical and immunocytochemical stains. A system of numbered categories for representation of data.
True
String
Tumor Grade
- Diagnosis
Numeric value to express the degree of abnormality of cancer cells, a measure of differentiation and aggressiveness.
False
String
- g1
- g2
- g3
- g4
- gx
- gb
- high grade
- intermediate grade
... (12 valid options in total)
Progression or Recurrence
- Diagnosis
- Follow Up
Yes/No/unknown indicator to identify whether a patient has had a new tumor event after initial treatment.
True
String
- yes - progression or recurrence
- no
- unknown
- not reported
Last Known Disease Status
- Diagnosis
Text term that describes the last known state or condition of an individual's neoplasm.
True
String
- distant met recurrence/progression
- loco-regional recurrence/progression
- biochemical evidence of disease without structural correlate
- tumor free
... (9 valid options in total)
Days to Last Follow up
- Diagnosis
Time interval from the date of last follow up to the date of initial pathologic diagnosis, represented as a calculated number of days. If not applicable, enter 'Not Applicable'.
True
String
Days to Last Known Disease Status
- Diagnosis
Time interval from the date of last follow up to the date of initial pathologic diagnosis, represented as a calculated number of days. If not applicable, enter 'Not Applicable'.
True
String
Method of Diagnosis
- Diagnosis
Text term used to describe the method used to confirm the patient's malignant diagnosis.
False
String
- autopsy
- biopsy
- blood draw
- bone marrow aspirate
- core biopsy
- cytology
- cystoscopy
- debulking
... (25 valid options in total)
Prior Malignancy
- Diagnosis
The yes/no/unknown indicator used to describe the patient's history of prior cancer diagnosis.
False
String
- yes
- no
- unknown
- not reported
- not allowed to collect
Prior Treatment
- Diagnosis
A yes/no/unknown/not applicable indicator related to the administration of therapeutic agents received before the body specimen was collected.
False
String
- yes
- no
- unknown
- not reported
- not allowed to collect
Metastasis at Diagnosis
- Diagnosis
The text term used to describe the extent of metastatic disease present at diagnosis.
False
String
- distant metastasis
- metastasis nos
- no metastasis
- regional metastasis
- unknown
- not reported
Metastasis at Diagnosis Site
- Diagnosis
Text term to identify an anatomic site in which metastatic disease involvement is found.
False
String
- abdomen
- adrenal gland
- ascites
- bone
- bone marrow
- brain
- cerebrospinal fluid
- central nervous system
... (33 valid options in total)
First Symptom Prior to Diagnosis
- Diagnosis
Text term used to describe the patient's first symptom experienced prior to diagnosis and thought to be related to the disease.
False
String
- altered mental status
- headaches
- motor or movement changes
- seizures
- sensory changes
- visual changes
- unknown
- not reported
Days to Diagnosis
- Diagnosis
Number of days between the index date and the date the patient was diagnosed with the malignant disease. If not applicable, enter 'Not Applicable'.
False
String
Percent Tumor Invasion
- Diagnosis
The percentage of tumor cells spread locally in a malignant neoplasm through infiltration or destruction of adjacent tissue.
False
String
Residual Disease
- Diagnosis
- Therapy
Text terms to describe the status of a tissue margin following surgical resection.
False
String
- r0
- r1
- r2
- rx
- unknown
- not reported
Synchronous Malignancy
- Diagnosis
A yes/no/unknown indicator used to describe whether the patient had an additional malignant diagnosis at the same time the tumor used for sequencing was diagnosed. If both tumors were sequenced, both tumors would have synchronous malignancies.
False
String
- yes
- no
- unknown
- not reported
Tumor Confined to Organ of Origin
- Diagnosis
The yes/no/unknown indicator used to describe whether the tumor is confined to the organ where it originated and did not spread to a proximal or distant location within the body.
False
String
- yes
- no
- unknown
- not reported
Tumor Focality
- Diagnosis
The text term used to describe whether the patient's disease originated in a single location or multiple locations.
False
String
- multifocal
- unifocal
- unknown
- not reported
Tumor Largest Dimension Diameter
- Diagnosis
Numeric value used to describe the maximum diameter or dimension of the primary tumor, measured in centimeters.
False
String
Gross Tumor Weight
- Diagnosis
Numeric value used to describe the gross pathologic tumor weight, measured in grams.
False
String
Breslow Thickness
- Diagnosis
The number that describes the distance, in millimeters, between the upper layer of the epidermis and the deepest point of tumor penetration.
False
String
Vascular Invasion Present
- Diagnosis
The yes/no indicator to ask if large vessel or venous invasion was detected by surgery or presence in a tumor specimen.
False
String
- yes - vascular invasion present
- no
- unknown
- not reported
- not allowed to collect
Vascular Invasion Type
- Diagnosis
Text term that represents the type of vascular tumor invasion.
False
String
- extramural
- intramural
- macro
- micro
- no vascular invasion
- unknown
- not reported
Anaplasia Present
- Diagnosis
Yes/no/unknown/Not Reported indicator used to describe whether anaplasia was present at the time of diagnosis.
False
String
- yes - anaplasia present
- no
- unknown
- not reported
Anaplasia Present Type
- Diagnosis
The text term used to describe the morphologic findings indicating the presence of a malignant cellular infiltrate characterized by the presence of large pleomorphic cells, necrosis, and high mitotic activity in a tissue sample.
False
String
- absent
- diffuse
- equivocal
- focal
- present
- sclerosis
- unknown
- not reported
Laterality
- Diagnosis
For tumors in paired organs, designates the side on which the cancer originates.
False
String
- bilateral
- left
- midline
- right
- unilateral
- unknown
- not reported
Perineural Invasion Present
- Diagnosis
A yes/no indicator to ask if perineural invasion or infiltration of tumor or cancer is present.
False
String
- yes
- no
- unknown
- not reported
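The Required, Conditional If and Valid Values columns above can be read as validation rules. The sketch below checks a single Demographics record against a small, hand-copied subset of those rules. The `RULES` structure and the `validate_record` function are hypothetical names introduced for illustration and are not part of any HTAN tool. Case-insensitive matching of values is also an assumption, and the actual submission validation may behave differently.

```python
# Hand-copied subset of the Demographics rules listed on this page.
# "required_if" encodes the "Conditional If" column as (attribute, value).
RULES = {
    "Vital Status": {
        "required": True,
        "valid_values": {"alive", "dead", "unknown", "not reported"},
    },
    "Gender": {
        "required": True,
        "valid_values": {"female", "male", "unknown", "unspecified", "not reported"},
    },
    "Year of Death": {
        "required": False,
        "required_if": ("Vital Status", "dead"),
    },
}


def validate_record(record, rules=RULES):
    """Return a list of human-readable problems found in `record`."""
    problems = []
    for attribute, rule in rules.items():
        value = str(record.get(attribute, "")).strip()
        required = rule.get("required", False)
        condition = rule.get("required_if")
        if condition:
            other, expected = condition
            # The attribute becomes required only when the condition holds.
            required = str(record.get(other, "")).strip().lower() == expected
        if not value:
            if required:
                problems.append(f"{attribute}: missing required value")
            continue
        valid = rule.get("valid_values")
        # ASSUMPTION: values are compared case-insensitively.
        if valid and value.lower() not in valid:
            problems.append(f"{attribute}: '{value}' is not a valid value")
    return problems


if __name__ == "__main__":
    record = {"Vital Status": "Dead", "Gender": "F"}
    for problem in validate_record(record):
        print(problem)
```

In this example the record fails twice: `F` is not one of the listed Gender values, and Year of Death is missing even though Vital Status is "Dead".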
Attribute | Description | Required | Conditional If | Data Type | Valid Values
HTAN Participant ID
HTAN ID associated with a patient based on HTAN ID SOP (e.g., HTANx_yyy)
True
String
Start Days from Index
Number of days from the date of birth (index date) to the date of an event (e.g., exposure to an environmental factor, treatment start). If not applicable, enter 'Not Applicable'.
True
String
Timepoint Label
Label to identify the time point at which the clinical data or biospecimen was obtained (e.g. Baseline, End of Treatment, Overall survival, Final). NO PHI/PII INFORMATION IS ALLOWED.
True
String
Stop Days from Index
Number of days from the date of birth (index date) to the end date of the event (e.g. exposure to environmental factor, treatment start, etc.). Note: if the event occurs at a single time point, e.g. a diagnosis or a lab test, the value for this column is 'Not Applicable'.
False
String
Location Extent Extraprostatic Extension
Location and extent of extraprostatic extension
False
String
- left anterior
- left lateral
- left posterolateral
- left posterior
- left apex
- left mid
- left base
- left focal
... (20 valid options in total)
Location Nature Positive Margins
Location and nature of positive margins
False
String
- left anterior
- left lateral
- left posterolateral
- left posterior
- left apex
- left mid
- left base
- left focal
... (29 valid options in total)
Seminal Vesicle Invasion
An anatomic position identifying a side of the body where local spread of malignant neoplasm is found to infiltrate tissue in the saclike glandular diverticulum on the ductus deferens in a male.
False
String
- none
- left
- right
- both sides
- unknown
- not reported
Prostate Carcinoma Histologic Type
The diagnostic subclassification of an invasive prostate carcinoma.
False
String
- prostatic adenocarcinoma (conventional nos)
- prostatic duct adenocarcinoma
- acinar prostate mucinous (colloid) adenocarcinoma
- acinar prostate adenocarcinoma-signet-ring variant
- prostatic adenosquamous
... (10 valid options in total)
Prostate Cancer Local Extent
The response used to categorize the local extent of disease for prostate cancer.
False
String
- organ confined
- extraprostatic extension
- unknown
- not reported
Additonal Findings Uninvolved Prostate
Additional findings, uninvolved prostate
False
String
- high-grade prostatic intraepithelial neoplasia (pin)
- inflammation
- benign prostatic hyperplasia (bph)
- prostatic intraductal adenocarcinoma
- other
- unknown
- not reported
Prostate Cancer Cytologic Morphologic Subtypes
Text term that describes various morphological and cytological subtypes in prostate tumors.
False
String
- prostatic basal cell hyperplasia
- prostatic clear cell cribiform hyperplasia
- prostatic atypical adenomatous hyperplasia (adenosis or aah)
- low grade prostatic intraepithelial neoplasia (pin)/(pin i)
- high grade prostatic intraepithelial
... (8 valid options in total)
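Start Days from Index and Stop Days from Index are described above as day counts measured from the date of birth. A minimal sketch of that arithmetic follows. The function name `days_from_index` and the example dates are made up for illustration, and real submissions may apply additional rules (for example age obfuscation) that are not modeled here. Values are returned as strings because the Data Type column lists String for these attributes.

```python
from datetime import date

NOT_APPLICABLE = "Not Applicable"


def days_from_index(index_date, event_date):
    """Number of days from the index date to the event date, as a string.

    Returns 'Not Applicable' when there is no event date, e.g. the stop
    date of an event that occurs at a single time point.
    """
    if event_date is None:
        return NOT_APPLICABLE
    return str((event_date - index_date).days)


if __name__ == "__main__":
    birth = date(1960, 1, 1)             # index date (hypothetical)
    treatment_start = date(2020, 1, 1)   # hypothetical event start
    treatment_stop = date(2020, 1, 31)   # hypothetical event end

    print("Start Days from Index:", days_from_index(birth, treatment_start))
    print("Stop Days from Index:", days_from_index(birth, treatment_stop))
    # A single-time-point event such as a diagnosis has no stop date.
    print("Stop Days from Index:", days_from_index(birth, None))
```

The same subtraction yields a negative number when the event precedes the index date, which is consistent with Days to Birth being described as a calculated negative number of days.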