Data Submission FAQ

1. How do I upload my files to the submission system?

Before using the web-based submission system, you must upload your submission files to the SFTP server. You can log into the web submission system with your existing project SFTP account and password. For more details, visit the File Submission page.

2. Where can I find the latest DCC data file specification?

You can find the latest DCC data file specification and Release Notes here.

3. How do I put my submission files together?

Submission files must follow the rules in the DCC data file specification. More details about the submission file format can be found here.

4. How do I submit information about a simple somatic mutation?

Examples of the format required for mutations can be found here. Note that this is not a new format; it has always been enforced by ICGC.

5. Where do I find the Release Notes showing changes between the last DCC data file specification and the current DCC data file specification?

The latest dictionary Release Notes can be found here.

6. What assembly version do you use?

ICGC DCC uses hg19 (GRCh37).

7. How do I get the JSON format of the DCC data file specification?

Please refer to the Submission API, which documents how to retrieve the data specification in JSON format via REST web services.
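Retrieving the specification is an ordinary HTTP GET. The Python sketch below uses only the standard library. DICTIONARY_URL is a hypothetical placeholder, not the real endpoint, so take the actual URL (and any authentication it requires) from the Submission API documentation. The opener parameter exists only so the function can be exercised without a network connection.

```python
import json
from typing import Any, Callable
from urllib.request import Request, urlopen

# Hypothetical placeholder: replace with the URL given in the Submission API docs.
DICTIONARY_URL = "https://submissions.example.org/ws/dictionaries/latest"


def fetch_dictionary(url: str = DICTIONARY_URL,
                     opener: Callable[..., Any] = urlopen) -> Any:
    """GET the data specification and decode the JSON response body."""
    # Ask the server for JSON explicitly.
    request = Request(url, headers={"Accept": "application/json"})
    # The response is a file-like object; json.load reads and decodes it.
    with opener(request, timeout=30) as response:
        return json.load(response)


if __name__ == "__main__":
    spec = fetch_dictionary()
    print(type(spec))
```

Once the real URL is filled in, the returned object is plain Python data (dicts and lists) that can be inspected or saved with json.dump.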

8. What do -777 and -888 mean? How do I use them?

-777 and -888 are reserved codes accepted by ICGC for data elements whose value is verified to be unknown (-777) or is not applicable (-888). Details and examples can be found here.
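If you process your own files before submission, keep the reserve codes out of numeric calculations. This Python sketch assumes each field arrives as a text value; the helper parse_integer_field is hypothetical and not part of any DCC tool. It reports -777 as "verified unknown" and -888 as "not applicable" instead of returning them as numbers, and parses everything else as an integer.

```python
from typing import Optional, Tuple

# Reserve codes as described in this FAQ.
RESERVE_CODES = {
    "-777": "verified unknown",
    "-888": "not applicable",
}


def parse_integer_field(raw: str) -> Tuple[Optional[int], Optional[str]]:
    """Return (value, reserve_meaning); exactly one of the two is None.

    Hypothetical helper: reserve codes are reported by meaning rather than
    returned as numbers, so they cannot leak into sums or averages.
    """
    text = raw.strip()
    if text in RESERVE_CODES:
        return None, RESERVE_CODES[text]
    # Anything else must be a genuine integer; int() raises ValueError otherwise.
    return int(text), None


if __name__ == "__main__":
    column = ["60", "-888", "75", "-777"]
    parsed = [parse_integer_field(v) for v in column]
    known = [value for value, _ in parsed if value is not None]
    print(known)  # [60, 75]
```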

9. I got an error message during validation that I don't understand! How do I get more details?

Once validation is complete, you will be able to view a report generated by the submission system that gives details about errors. In addition, please refer to the "Script" section of the data file specification table for each data type, which explains the cross-field restrictions. If you encounter an error message you do not understand, please contact the DCC.

10. I can't log into the submission system. I think I forgot my username/password. How do I retrieve it?

The submission system will lock a user out if a password is entered incorrectly three times. Should you find yourself locked out, or if you have forgotten your password, please contact the DCC and we will reset it for you.

11. How do I obtain the accession number required for the "raw_data_accession" field?

When you submit your raw sequencing data to EBI's EGA repository, EGA will supply you with an accession number. Use this accession number to populate the raw_data_accession field in the submission files.
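A quick format check before populating raw_data_accession can catch copy-and-paste mistakes. This Python sketch assumes the usual EGA accession shape: the prefix "EGA", one uppercase letter for the object type (for example S for a study or D for a dataset), then 11 digits, as in EGAS00001000001. The helper name is hypothetical. It does not check which object type the field expects or whether the accession exists; the data file specification remains the authority on what the field accepts.

```python
import re

# Assumed EGA accession shape: "EGA" + one object-type letter + 11 digits.
EGA_ACCESSION = re.compile(r"EGA[A-Z][0-9]{11}")


def looks_like_ega_accession(value: str) -> bool:
    """Return True if value matches the assumed EGA accession shape."""
    # fullmatch rejects partial matches such as trailing characters.
    return EGA_ACCESSION.fullmatch(value.strip()) is not None


if __name__ == "__main__":
    for candidate in ["EGAS00001000001", "EGAS0001", "egas00001000001"]:
        print(candidate, looks_like_ega_accession(candidate))
```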

12. Why do I have to submit my raw data to EBI's EGA repository before submitting data to ICGC DCC?

ICGC member projects are required by ICGC policies to submit their raw sequencing reads, and other primary data, to controlled access public repositories. The official ICGC DCC-supported repository for ICGC sequencing reads is the European Bioinformatics Institute's (EBI) European Genome-phenome Archive (EGA).

13. What does open and controlled access mean?

Open access refers to all data that is publicly available in the portal without restrictions. Controlled access refers to protected data, such as germline SNPs, that requires user certification for access.

14. What language are the cross-field validation scripts written in?

The cross-field validation scripts are implemented in MVEL.

15. Can I submit just clinical data without experimental data to ICGC?

Yes, you can, but clinical data alone is not useful and will not be counted towards an ICGC project's commitment target. We strongly encourage projects to submit molecular data as well.

16. What are the "percentage_of_cellularity" and "level_of_cellularity" fields in the Specimen and Analyzed Sample data file specifications? Do I need to fill in both?

Cellularity is defined as the proportion of tumour nuclei to the total number of nuclei in a given specimen or analyzed sample. Cellularity reported at the specimen level (i.e., by a pathologist) can differ from cellularity reported at the sample level (e.g., determined using methods such as cell sorting or sequencing).

The two fields "percentage_of_cellularity" and "level_of_cellularity" record the same piece of information, but give the submitter the option of submitting it either as an integer percentage (percentage_of_cellularity) or as a range of percentages (level_of_cellularity). For example, if a pathologist can only specify that a specimen's cellularity lies within a certain range (say, 41-50%), then "level_of_cellularity" can be populated with "3" (which denotes 41-60% in the controlled vocabulary table), and "percentage_of_cellularity" can be populated with "-888" (not applicable).
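The range-based case can be sketched in Python. Only level 3 (41-60%) comes from this FAQ; the one-entry CELLULARITY_LEVELS table and both function names are hypothetical, so load the full controlled vocabulary table from the data file specification before relying on anything like this. The sketch picks the level whose range fully contains the reported range and sets percentage_of_cellularity to -888.

```python
from typing import Dict, Optional, Tuple

NOT_APPLICABLE = -888

# Hypothetical one-entry excerpt of the controlled vocabulary. Only level 3
# (41-60%) comes from this FAQ; load the full table from the specification.
CELLULARITY_LEVELS: Dict[str, Tuple[int, int]] = {"3": (41, 60)}


def level_for_range(
    low: int, high: int, levels: Dict[str, Tuple[int, int]] = CELLULARITY_LEVELS
) -> Optional[str]:
    """Return the level code whose inclusive range contains low..high, or None."""
    if low > high:
        raise ValueError(f"invalid range {low}-{high}")
    for code, (level_low, level_high) in levels.items():
        if level_low <= low and high <= level_high:
            return code
    return None


def encode_reported_range(low: int, high: int) -> Dict[str, object]:
    """Encode a pathologist-reported range as level_of_cellularity, with
    percentage_of_cellularity set to -888 (not applicable)."""
    level = level_for_range(low, high)
    if level is None:
        # Refuse to guess when the range does not fit inside one level.
        raise ValueError(f"{low}-{high}% does not fit inside a single level")
    return {
        "level_of_cellularity": level,
        "percentage_of_cellularity": NOT_APPLICABLE,
    }


if __name__ == "__main__":
    print(encode_reported_range(41, 50))
```

With this table, a range such as 41-50% maps to level 3; a range that does not fit inside a single level raises an error rather than being silently assigned.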