The ICGC has joined forces with commercial and academic compute cloud partners to remove many of the barriers that prevent researchers from using the ICGC's vast genomic database. Because the ICGC datasets are so large, downloading them can take months, and analyzing them requires computing power that many research groups do not have. These partnerships will allow scientists to access and analyze ICGC datasets through multiple cloud computing platforms, enhancing collaboration and accelerating the development of new tools and treatments for cancer patients.
The Data Coordination Center has developed a unified interface for searching and accessing data across all supported clouds. Authorized users can now access ICGC BAM, VCF and other file types through the Repository search tool on the Portal, and retrieve manifest files that allow bulk file downloads from the cloud repositories. User authentication and authorization are handled through the standard ICGC DACO controlled-access mechanism, ensuring safe access to these datasets.
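As a rough illustration of how a downloaded manifest might be inspected before starting a bulk download, the minimal sketch below groups manifest entries by repository and totals their sizes. It assumes a tab-separated manifest with a header row; the column names (repo_code, file_id, file_size) and the sample rows are hypothetical placeholders, not the actual ICGC manifest layout, and should be adjusted to match the file retrieved from the Portal.

```python
import csv
import io
from collections import defaultdict

# Made-up sample data. The column names are hypothetical and do not
# reflect the real ICGC manifest format.
SAMPLE_MANIFEST = (
    "repo_code\tfile_id\tfile_size\n"
    "repo-a\tf1\t100\n"
    "repo-b\tf2\t50\n"
    "repo-a\tf3\t200\n"
)


def group_manifest_by_repo(manifest_text):
    """Return {repo_code: [(file_id, size_in_bytes), ...]} for a TSV manifest."""
    reader = csv.DictReader(io.StringIO(manifest_text), delimiter="\t")
    groups = defaultdict(list)
    for row in reader:
        # Assumed columns: repo_code, file_id, file_size (bytes).
        groups[row["repo_code"]].append((row["file_id"], int(row["file_size"])))
    return dict(groups)


def total_bytes(entries):
    """Sum the sizes of a list of (file_id, size_in_bytes) entries."""
    return sum(size for _, size in entries)


if __name__ == "__main__":
    for repo, entries in sorted(group_manifest_by_repo(SAMPLE_MANIFEST).items()):
        print(repo, len(entries), "files,", total_bytes(entries), "bytes")
```

Summarizing the manifest per repository in this way gives an estimate of how much data each cloud would serve before any transfer begins.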
ICGC Compute Cloud Partners
Amazon Web Services is a well-established commercial cloud that provides a highly reliable, scalable, low-cost infrastructure platform to users in 190 countries around the world. ICGC datasets are currently hosted at the US East (Northern Virginia) EC2 facility.
The Cancer Genome Collaboratory is an academic compute cloud resource built by the Ontario Institute for Cancer Research and hosted at Compute Canada facilities. This infrastructure is still under intensive development and currently stores only a small subset of the ICGC data for beta testing.