Today we're excited to make public the GDPx2 dataset! GDPx2 is the second functional genomics dataset released by Ginkgo Datapoints. You can access all of our current releases on the Ginkgo Datapoints data portal.
GDPx2 includes transcriptional profiles for 4 human cell lines:
| Melanocytes | Dermal fibroblasts |
| Aortic smooth muscle cells | Skeletal muscle myoblasts |
Each cell line has undergone 15 different treatments (10 test compounds and 5 controls) at a range of 6 concentrations. The effect of each drug treatment was measured by collecting an RNA-seq profile of about 2 million reads using our DRUG-seq assay for high-throughput transcriptomics.
| Treatments | | Controls |
| Corticosterone | Thapsigargin | DMSO |
| Idarubicin | Calcimycin | Dexamethasone |
| Mitoxantrone | Rigosertib | Trichostatin A |
| Beclomethasone | Nocodazole | Brefeldin A |
| Cycloheximide | Alisertib | Dabrafenib |
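To get a feel for the shape of the experiment, here is a minimal sketch that enumerates the design grid described above. It assumes a full cross of every cell line, treatment, and concentration level, which this post does not state explicitly (controls may use fewer concentrations, and replicates are not counted). The names `design_grid`, `dose_level`, and the dictionary keys are hypothetical and are not taken from the GDPx2 files.

```python
from itertools import product

# Names copied from the lists in this post.
CELL_LINES = [
    "Melanocytes",
    "Aortic smooth muscle cells",
    "Dermal fibroblasts",
    "Skeletal muscle myoblasts",
]
TEST_COMPOUNDS = [
    "Corticosterone", "Idarubicin", "Mitoxantrone", "Beclomethasone",
    "Cycloheximide", "Thapsigargin", "Calcimycin", "Rigosertib",
    "Nocodazole", "Alisertib",
]
CONTROLS = [
    "DMSO", "Dexamethasone", "Trichostatin A", "Brefeldin A", "Dabrafenib",
]
N_CONCENTRATIONS = 6  # "a range of 6 concentrations"


def design_grid():
    """Enumerate (cell line, treatment, dose level) combinations.

    Assumption: every treatment, controls included, is run at all six
    concentration levels in every cell line. Replicates are ignored.
    """
    treatments = TEST_COMPOUNDS + CONTROLS
    dose_levels = range(1, N_CONCENTRATIONS + 1)
    return [
        {"cell_line": cell, "treatment": drug, "dose_level": level}
        for cell, drug, level in product(CELL_LINES, treatments, dose_levels)
    ]


grid = design_grid()
print(len(grid))  # 4 cell lines * 15 treatments * 6 levels = 360
```

Under these assumptions the grid has 360 distinct conditions. The actual number of profiles in GDPx2 may differ depending on replicates and on how the controls were dosed.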
The cell-treatment pairs in GDPx2 represent only a portion of a much larger dataset characterizing 85 diverse pharmacologically active small molecules. If you find GDPx2 useful, the full 4 TB of data covering 12,216 total transcriptional profiles can be requested under the appropriate license and terms for research or commercial use.
We anticipate that GDPx2 will be of interest to teams exploring the range of pharmacologically relevant transcriptional responses that human cells exhibit and the relationships between them. Use GDPx2 for AI/ML-assisted target identification, or for exploring and modeling transcriptional co-regulation in heterologous settings.
The AI stack for biotech is advancing quickly. A community of developers and service providers is hard at work to bring online new tools for curating data, training models, running inference, and deriving biologically meaningful results. Much of this work is happening in public, supported by cutting-edge academic research and open-source efforts. By aligning our offerings with and contributing to emerging common standards, we seek to share our work in a way that benefits everyone.
At Ginkgo Datapoints, our role in the AI ecosystem is that of a data provider. The automation infrastructure of the Ginkgo foundry allows us to efficiently generate large datasets for functional genomics and other applications. The GDPx2 dataset is just a small sample of the scale of data we can bring to your AI project. If you're looking for a large dataset tailored to your application, get in touch!