There are five major steps in the data submission process:
- Obtain a login/password account from the EBI's EGA repository
- Submit raw sequence data to the European Genome-phenome Archive
- Prepare the ICGC submission files according to DCC data format specifications
- Submit files to the DCC Secure SFTP server
- Validate submission files using the web-based submission system
Note: Since Release 15, all submitted data must be based on Human reference genome assembly GRCh37 and Ensembl gene set version 69. This requirement will remain in place until the end of the ICGC data submission process.
When submitting experimental data to the ICGC DCC, please make sure you have already deposited your raw data in the appropriate public data repository (e.g. sequencing reads in the EBI EGA). You will need to populate the data elements raw_data_repository and raw_data_accession with the correct repository and accession number, respectively.
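As a pre-submission sanity check, the two data elements named above can be verified programmatically. The following is a minimal sketch only, assuming the submission file is tab-delimited text with a header row; the function name `rows_missing_raw_data`, the `analysis_id` column, and the placeholder accession values are hypothetical and not part of the DCC specification.

```python
import csv
import io

# Data elements named in the submission guide.
REQUIRED_FIELDS = ("raw_data_repository", "raw_data_accession")


def rows_missing_raw_data(tsv_text):
    """Return (line_number, [empty_fields]) for each row with a blank required field.

    Assumption: the file is tab-delimited with a header on line 1.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    problems = []
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        empty = [f for f in REQUIRED_FIELDS if not (row.get(f) or "").strip()]
        if empty:
            problems.append((line_no, empty))
    return problems


# Hypothetical sample; the accession values are placeholders, not real accessions.
sample = (
    "analysis_id\traw_data_repository\traw_data_accession\n"
    "A1\tEGA\tACCESSION_PLACEHOLDER_1\n"
    "A2\t\tACCESSION_PLACEHOLDER_2\n"
    "A3\tEGA\t\n"
)

for line_no, fields in rows_missing_raw_data(sample):
    print(f"line {line_no}: missing {', '.join(fields)}")
```

A check like this only confirms that the fields are populated; it does not confirm that the accession actually exists in the repository.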
The DCC publishes submissions to the community in curated batches called Releases. Releases are generally published three times per year. Once a release is completed, a new one is created, and so on. Many releases exist at the same time, but only one of them is being prepared at any given time. The following diagram illustrates that releases happen sequentially and that only one is ever being prepared:
The process goes like this:
- A Release is created and is immediately set in the OPENED state. When the Release is in this state:
  - the data dictionary can be amended according to some constraints (see the DD spec)
  - changing the dictionary requires that already validated submissions be re-validated (probably done manually)
  - members can upload files for their projects (via SFTP) and obtain validation reports. VALID submissions can be "signed off", at which point the submission directory becomes read-only.
- After a period of time, the release is moved to the COMPLETED state, at which point:
  - this release is frozen: everything about it can be read back, but nothing can be modified
  - a new release is created and the process starts again.
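The lifecycle above can be sketched as a small state machine. This is an illustrative model only, not the DCC's implementation: the `Release` class, its method names, and the release and project names are all hypothetical, and only the OPENED and COMPLETED states and the sign-off rule come from the description above.

```python
from enum import Enum


class ReleaseState(Enum):
    OPENED = "OPENED"
    COMPLETED = "COMPLETED"


class Release:
    """Hypothetical model of a release; a new release always starts OPENED."""

    def __init__(self, name):
        self.name = name
        self.state = ReleaseState.OPENED
        self.signed_off = set()  # projects whose directories are read-only

    def can_upload(self, project):
        # Uploads are allowed only while the release is open and the
        # project has not yet signed off its submission.
        return self.state is ReleaseState.OPENED and project not in self.signed_off

    def sign_off(self, project, is_valid):
        if self.state is not ReleaseState.OPENED:
            raise RuntimeError("release is frozen")
        if not is_valid:
            raise ValueError("only VALID submissions can be signed off")
        self.signed_off.add(project)

    def complete(self, next_name):
        """Freeze this release and return the newly created one."""
        if self.state is not ReleaseState.OPENED:
            raise RuntimeError("release is already completed")
        self.state = ReleaseState.COMPLETED
        return Release(next_name)


current = Release("release-A")
current.sign_off("project-1", is_valid=True)
upcoming = current.complete("release-B")
print(current.state.value, upcoming.state.value)
```

Having `complete` return the next release mirrors the rule that only one release is ever being prepared: the old one is frozen in the same step that creates its successor.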
We propose that the whole process be relatively short, perhaps one or two months. Thus, if a project misses a release, it can simply submit for the next one, which will be just a few weeks later.
Please contact the DCC if you have any questions or comments about the data submission process.