Clinical Data Submission File Specifications | International Cancer Genome Consortium


There are three required clinical and tissue annotation submission files, and five optional template files.

All data submissions to the DCC must include the three core clinical data files.

Core Clinical Data Files

  1. Donor Data File (donor)
    Mandatory information about the donor's age, gender and vital status.
  2. Specimen Data File (specimen)
    Mandatory information about a specimen that was obtained from a donor. There may be several specimens per donor that were obtained concurrently or at different times.
  3. Analyzed Sample Data File (sample)
    Mandatory information about an analyzed sample that was subjected to molecular analysis. There may be several analyzed samples per specimen, for example, when a tumour is used to derive xenografts and cell lines.
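The three core files form a one-to-many hierarchy: each donor can have several specimens, and each specimen can have several analyzed samples. A minimal sketch of a pre-submission consistency check follows. The record classes, field names (`donor_id`, `specimen_id`, `sample_id`) and the `find_orphans` helper are illustrative assumptions, not the actual ICGC column names or a DCC tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Specimen:
    specimen_id: str  # hypothetical field name
    donor_id: str     # the donor this specimen was obtained from


@dataclass(frozen=True)
class Sample:
    sample_id: str    # hypothetical field name
    specimen_id: str  # the specimen this analyzed sample derives from


def find_orphans(donor_ids, specimens, samples):
    """Return (specimen IDs, sample IDs) whose parent record is missing."""
    donor_set = set(donor_ids)
    specimen_set = {s.specimen_id for s in specimens}
    orphan_specimens = [s.specimen_id for s in specimens
                        if s.donor_id not in donor_set]
    orphan_samples = [a.sample_id for a in samples
                      if a.specimen_id not in specimen_set]
    return orphan_specimens, orphan_samples


if __name__ == "__main__":
    donors = ["D1"]
    specimens = [Specimen("SP1", "D1"), Specimen("SP2", "D2")]
    samples = [Sample("SA1", "SP1"), Sample("SA2", "SP9")]
    print(find_orphans(donors, specimens, samples))
```

Running a check of this kind before submission would flag specimens that point to an unknown donor and analyzed samples that point to an unknown specimen.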

A project may choose not to use one or more of the five template tables. The five templates can be extended by each project to describe disease-specific clinical and specimen attributes for the cancer in question.

Optional Template Files

  1. Donor Surgical Procedures (surgery)
    Template details about surgical procedures undergone by donor; allows for mapping a surgical procedure to a specimen.
  2. Donor Environmental Exposure (exposure)
    Template details about donor's antecedent environmental exposures, such as smoking history.
  3. Donor Family History (family)
    Template details about family history of the donor.
  4. Donor Biomarkers (biomarker)
    Template details about biomarkers present in donor's tumour.
  5. Donor Therapy (therapy)
    Template details about the type and duration of therapy the donor received.

Clinical Data Encoding Notes

Coding of donor IDs

The three mandatory data files contain donor, specimen and analyzed sample IDs respectively. These IDs are to be coded specifically for ICGC purposes and must follow these rules:

  • Only the submitting group keeps the key that links the data back to the individual donors.
  • The key must not be communicated to the data users.
  • IDs should not be derived from other IDs such as biobank or hospital identifiers. They are to be coded in such a way that they cannot be traced back to the individual donors, except by the submitting group.
  • IDs are assigned by each submitting group, and must be unique within all the data submitted by that group (i.e. no duplicate IDs allowed).

Coded donor IDs referring to the same donor should remain consistent across different submissions from the same submitting group.
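One way a submitting group might satisfy these rules is to assign random coded IDs and keep the mapping (the key) private. The sketch below is an illustration under stated assumptions: the `DonorIdCoder` class, the `DO` prefix and the 8-character random suffix are hypothetical and are not a format prescribed by the ICGC.

```python
import secrets


class DonorIdCoder:
    """Assigns random coded IDs; the key stays with the submitting group."""

    def __init__(self, prefix="DO", key=None):
        self.prefix = prefix
        # key maps a local identifier (e.g. a hospital ID) to its coded ID.
        # It must be stored privately and never shared with data users.
        self.key = dict(key or {})

    def code(self, local_id):
        # Reuse the existing coded ID so it stays consistent across submissions.
        if local_id in self.key:
            return self.key[local_id]
        used = set(self.key.values())
        while True:
            # Random suffix: not derived from the local identifier.
            candidate = f"{self.prefix}{secrets.token_hex(4).upper()}"
            if candidate not in used:  # enforce uniqueness within the group
                break
        self.key[local_id] = candidate
        return candidate


if __name__ == "__main__":
    coder = DonorIdCoder()
    first = coder.code("hospital-123")
    again = coder.code("hospital-123")
    other = coder.code("hospital-456")
    print(first == again, first != other)
```

Because the coded ID is random rather than a hash or transformation of the local identifier, it cannot be traced back to the donor without the key. Persisting the key between submissions (passed back in through the `key` argument here) is what keeps coded IDs consistent over time.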

Time intervals

To prevent potential identification of donors, the timing of all significant events in the patient history is given in terms of days counted from the date of primary diagnosis. The date of primary diagnosis is the date on which a definitive diagnostic procedure was performed, whether it be a fine needle aspiration, biopsy, or an unequivocal imaging procedure.
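The conversion from calendar dates to intervals can be sketched as follows. The function name `days_from_diagnosis` and the example dates are illustrative assumptions. This page does not state how events that precede diagnosis should be encoded, so the negative value below simply reflects plain date subtraction.

```python
from datetime import date


def days_from_diagnosis(diagnosis_date, event_date):
    """Number of days from the date of primary diagnosis to an event."""
    return (event_date - diagnosis_date).days


if __name__ == "__main__":
    diagnosis = date(2010, 3, 1)  # hypothetical date of primary diagnosis
    print(days_from_diagnosis(diagnosis, date(2010, 3, 15)))  # 14
    print(days_from_diagnosis(diagnosis, date(2010, 2, 27)))  # -2
```

Only the resulting interval would appear in a submission file; the calendar dates themselves stay with the submitting group.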