Google and Ginkgo: Foundry-Scale Data Meets AI

Today Ginkgo Bioworks announced a new partnership with Google Cloud to build a generative AI platform for engineering biology and for biosecurity.


This is a moment to reflect on the transformations that AI is bringing to biology, and the role AI will play in Ginkgo’s mission to make biology easier to engineer. The scope of the AI transformation is so massive that this isn’t the first moment to reflect on it, and it won’t be the last.

Ginkgo’s strategy for making biology easier to engineer in the AI era can be summarized in three words: “Data is Queen.” Data is the essential and scarce resource that powers machine learning applications. Large amounts of high-quality data make the fundamental difference for biological developers who create technologies that work, scale, reach the market, and deliver for patients.

Biological data has always been one of Ginkgo’s strengths and a key differentiator. We’ve spent the past 15 years developing a foundry infrastructure for designing, building, testing, and learning from biological systems. Here are some quick numbers to illustrate the scale of Ginkgo’s data-generating capacity and data resources:

  • ~2 Billion protein sequences in our proprietary DNA database.

  • 5 Million+ enzyme designs built and tested in our foundry.

  • ~720,000 data points generated in a single recent CAR-T experiment.

  • Up to 1 Million strains screened in a single run of the ultra-high-throughput EncapS technology.

  • 100,000+ AAV capsids partially or extensively characterized for use in gene therapy.

  • 100 Million+ multiplex genome edits performed each year.

  • 100,000+ genes synthesized per year, making us the world’s largest single user of synthetic DNA.

  • 100+ active commercial programs as of the last quarter, with partners across industries ranging from pharmaceuticals and agriculture to industrial biotech.

Machine learning massively increases the power of all that biological data. We, like many in the biological world, were electrified in 2018 when Google DeepMind’s AlphaFold took first place in the CASP competition. The protein folding problem, previously considered among the hardest in biology, became easy for a large number of cases. That achievement was made possible by data, in this case the 200,000 protein structures deposited in the Protein Data Bank (PDB). The incredible value of that data for AI applications was, to some extent, a lucky coincidence of history. The PDB was not assembled with AI in mind. Going forward, many large new biological datasets will be.

The Ginkgo-Google partnership represents our conviction that AI tools and biological data should be developed in tandem.

Biological data will not be a passive resource that data scientists process to generate insights. Biological data will be designed and structured to provide maximum insights to AI tools, often with AI design algorithms directly in the loop. It will be necessary to closely integrate the design of the foundry, where data is generated, with the design of the algorithms targeted to particular biological problems.

Thomas Kurian, CEO of Google Cloud, put it like this: “Our strategic partnership with Ginkgo is a first-of-its-kind for Google Cloud, underscoring our confidence that Ginkgo will play a critical and pioneering role in the life sciences space, leveraging AI to reshape humanity’s understanding of biology.”

All of this is going to take a lot more of everything: more data-generating infrastructure in the Ginkgo foundry and across our biosecurity endeavors, more cloud storage for biological data in all its forms, more next-generation computing resources for model training. A key lesson of the AI era is that scale matters. Bigger models with more data deliver better results. If there is eventually a point of diminishing returns to scale, that point for most applications is still out of sight.

Biology is going to be a great proving ground for the power of AI, a space of incredibly hard problems and incredibly rich forms of data. Biology is also, in our view, the place where AI can deliver the most good for human beings. New therapeutics, green technologies, biosecurity tools, and a whole range of products are waiting to be built with biology. We’re incredibly excited about what biological product developers can do with AI, and what AI can do with data.

Let’s grow!

Posted by Jake Wintermute