The use of artificial intelligence to design proteins from scratch, rather than modifying existing ones. The field promises to revive the original vision of nanotechnology—useful molecule-size factories—which had dwindled into a marketing gimmick for sunscreen ingredients and tennis-racket frames.
How it works
Designing a protein from scratch requires three capabilities:
- Structure-to-function prediction — working out how a protein's three-dimensional structure affects what it does.
- Sequence design — devising a chain of amino acids (the building blocks of proteins) expected to fold into the desired structure.
- Computational validation — checking, before synthesising anything, that the designed chain will indeed assume the target shape.
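The three steps can be pictured as a generate-and-check loop. The Python sketch below is a toy built on stated assumptions. A "structure" is reduced to a string of secondary-structure states (H for helix, E for strand, L for loop). The residue preferences are hypothetical, loosely inspired by helix and strand propensity tables. `predict_structure` is a crude stand-in for a real validator such as RoseTTAFold. Step 1 is treated as already done, so the loop starts from a target structure.

```python
import random

# Hypothetical residue preferences for each structural state (an assumption).
# State "H" means helix; it is unrelated to histidine, whose letter is also H.
PREFERRED = {
    "H": "AELMQKR",  # residues assumed to favour helices
    "E": "VIYFWT",   # residues assumed to favour strands
    "L": "GPNDS",    # residues assumed to favour loops and turns
}
ALL_AA = "ACDEFGHIKLMNPQRSTVWY"


def design_sequence(target: str, rng: random.Random, noise: float = 0.2) -> str:
    """Step 2 stand-in: choose one residue per target position.

    With probability `noise` any amino acid may be picked, mimicking an
    imperfect designer whose proposals sometimes fail validation.
    """
    residues = []
    for state in target:
        pool = ALL_AA if rng.random() < noise else PREFERRED[state]
        residues.append(rng.choice(pool))
    return "".join(residues)


def predict_structure(seq: str) -> str:
    """Step 3 stand-in: give each residue the state whose preference list
    contains it; residues in no list (cysteine, histidine) default to loop."""
    return "".join(
        next((s for s, pool in PREFERRED.items() if aa in pool), "L")
        for aa in seq
    )


def design_loop(target: str, max_tries: int = 1000, seed: int = 0):
    """Propose sequences until one is predicted to fold into `target`.

    Returns (sequence, attempts), or (None, max_tries) if nothing passes.
    """
    rng = random.Random(seed)
    for attempt in range(1, max_tries + 1):
        seq = design_sequence(target, rng)
        if predict_structure(seq) == target:  # validation: exact match
            return seq, attempt
    return None, max_tries


seq, attempts = design_loop("HHHHLLEEEELL")
print(seq, attempts)
```

Real pipelines replace each stand-in with a learned model and judge success with structural similarity scores rather than an exact string match.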
Once a design passes these steps, scientists synthesise DNA encoding the designed sequence and insert it into a bacterium or yeast, which then produces the protein for testing.
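Turning a finished design into DNA means choosing a codon for each amino acid. The sketch below assumes one fixed codon per amino acid. Each choice is valid under the standard genetic code but was picked for illustration; real projects use codon-optimisation tools for the host organism. The code appends a stop codon and then translates the result back with the standard code as a check. Promoters, start-codon handling and the other parts of a real expression construct are left out.

```python
from itertools import product

# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# One valid codon per amino acid, chosen for illustration only (an assumption;
# real designs are codon-optimised for the host bacterium or yeast).
CHOSEN_CODON = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
}
STOP_CODON = "TAA"

# Sanity check: every chosen codon really encodes its amino acid.
assert all(STANDARD_CODE[c] == aa for aa, c in CHOSEN_CODON.items())
assert STANDARD_CODE[STOP_CODON] == "*"


def back_translate(protein: str) -> str:
    """Return a DNA coding sequence for `protein`, ending in a stop codon."""
    return "".join(CHOSEN_CODON[aa] for aa in protein) + STOP_CODON


def translate(dna: str) -> str:
    """Translate DNA codon by codon until the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = STANDARD_CODE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


design = "MKTAYIAKQRQISFVKSHFSRQ"  # arbitrary example sequence
dna = back_translate(design)
assert translate(dna) == design  # the round trip recovers the design
print(dna)
```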
Institute for Protein Design (IPD)
The leading centre in the field, based at the University of Washington in Seattle. It is run by David Baker, joint winner of the 2024 Nobel prize for chemistry. The IPD has developed three principal AI tools:
- RFdiffusion — links structure to function by generating protein structures expected to perform a desired task, using a method similar to that of image-generating diffusion models; it was trained on more than 200,000 natural proteins.
- ProteinMPNN — designs amino-acid sequences expected to fold into a given structure, drawing on databases of how amino acids interact with each other and with other molecules.
- RoseTTAFold — a machine-learning model that validates whether designed chains will fold correctly. Its precursor, written by Dr Baker in the mid-1990s, inspired the creation of AlphaFold.
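The diffusion idea behind RFdiffusion can be illustrated with the generic forward process used by image diffusion models. Clean data are progressively corrupted with Gaussian noise, and a network is trained to undo the corruption step by step. The sketch below applies that forward step to toy 3D coordinates, using the closed form x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 - ᾱ_t)·ε with the linear schedule from the original DDPM paper. The schedule, the straight-chain "backbone" and the function names are assumptions for illustration. RFdiffusion's published method is more elaborate: it is built on RoseTTAFold and noises residue orientations as well as positions.

```python
import math
import random
from itertools import accumulate
from operator import mul

# Linear noise schedule from the original DDPM paper (an assumption here).
T = 1000
BETAS = [1e-4 + (0.02 - 1e-4) * t / (T - 1) for t in range(T)]
# alpha_bar_t = product of (1 - beta_s) for s <= t: how much signal survives.
ALPHA_BARS = list(accumulate((1.0 - b for b in BETAS), mul))


def add_noise(coords, t, rng):
    """Sample x_t ~ q(x_t | x_0): shrink the clean coordinates and mix in
    Gaussian noise. Returns the noised coordinates and the noise used."""
    a = ALPHA_BARS[t]
    eps = [tuple(rng.gauss(0.0, 1.0) for _ in range(3)) for _ in coords]
    noised = [
        tuple(math.sqrt(a) * x + math.sqrt(1 - a) * e for x, e in zip(p, n))
        for p, n in zip(coords, eps)
    ]
    return noised, eps


def estimate_clean(noised, eps, t):
    """Invert the forward step given the noise. A trained denoiser would
    predict `eps` from `noised`; here the true noise checks the algebra."""
    a = ALPHA_BARS[t]
    return [
        tuple((y - math.sqrt(1 - a) * e) / math.sqrt(a) for y, e in zip(p, n))
        for p, n in zip(noised, eps)
    ]


rng = random.Random(0)
# Toy straight chain, points 3.8 units apart (roughly the alpha-carbon spacing in ångströms).
backbone = [(3.8 * i, 0.0, 0.0) for i in range(10)]
noised, eps = add_noise(backbone, t=500, rng=rng)
recovered = estimate_clean(noised, eps, t=500)
print(noised[0], recovered[0])
```

Passing the true noise to `estimate_clean` recovers the original coordinates, which confirms the formula. Generating a new structure instead means starting from pure noise and repeatedly applying a trained network's noise predictions.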
Applications
- SKYCovione — the IPD's covid-19 vaccine, which works by displaying synthetic copies of parts of the SARS-CoV-2 spike protein to attract the immune system's attention.
- Snake-bite treatment — synthetic proteins that lock onto and neutralise venom molecules in the blood; they are smaller and easier to make than the antibodies used in today's antivenoms.
- Alzheimer's disease — proteins designed to bind to the molecular precursors of neuronal plaques and tangles.
- Gene editing — nucleases designed to bind to chosen DNA sequences, widening the range of DNA that can be edited and reducing the risk of off-target edits.
- Biofuels — Nate Ennist of the IPD is redesigning photosynthetic machinery to absorb a broader range of light wavelengths and, in the longer term, to generate hydrocarbons rather than sugars.
- Novel materials — circular protein fibres that could be linked like mail armour to make fabrics; hybrid organic-inorganic materials; enzymes to digest plastics such as PET.
- Artificial noses — chip-based sensors that identify molecules by passing them through protein pores, extending technology that already exists for reading DNA and RNA to a far wider range of substances.
- Protein logic gates — protein equivalents of the logic gates in silicon chips, which could control gene expression in cells. Dr Baker believes such gates could be stacked in 3D arrays more easily than their silicon counterparts.
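The logic-gate idea can be made concrete with a textbook model of cooperative binding. Each input protein activates its target according to a Hill function. An AND gate multiplies the two activities, and an OR gate is off only when both inputs are. The sketch below assumes that model and uses hypothetical parameters: K = 1, Hill coefficient 2, low and high input levels of 0.1 and 10, and an ON threshold of 0.5. It is not a description of the IPD's gates.

```python
# Hypothetical parameters for illustration only.
K, N_HILL = 1.0, 2.0
LOW, HIGH, THRESHOLD = 0.1, 10.0, 0.5


def hill(x: float, k: float = K, n: float = N_HILL) -> float:
    """Fraction of the target activated at input concentration x."""
    return x**n / (k**n + x**n)


def and_gate(a: float, b: float) -> float:
    """Output is expressed only when both inputs are bound (activities multiply)."""
    return hill(a) * hill(b)


def or_gate(a: float, b: float) -> float:
    """Output is off only when neither input is bound."""
    return 1 - (1 - hill(a)) * (1 - hill(b))


def truth_table(gate):
    """Map each pair of logical inputs (0/1) to whether the output is ON."""
    return {
        (ia, ib): gate(HIGH if ia else LOW, HIGH if ib else LOW) > THRESHOLD
        for ia in (0, 1)
        for ib in (0, 1)
    }


print(truth_table(and_gate))
print(truth_table(or_gate))
```

The outputs here lie between 0 and 1, while the inputs were assumed to switch between 0.1 and 10. Stacking gates would therefore require each gate's output to be rescaled to the input range of the next; this sketch does not model that.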
Other players
Alphabet has two protein-design projects led by Sir Demis Hassabis, one of AlphaFold's Nobel-winning inventors:
- Isomorphic Labs, a London-based spin-out, has contracts with Eli Lilly and Novartis to test candidate drug molecules' interactions with target proteins.
- AlphaProteo, developed by Google DeepMind, designs proteins to bind to specified targets.
Profluent, based in Emeryville, California, builds protein-design AI models resembling large language models. Run by Ali Madani, it focuses on creating new CRISPR-Cas gene-editing tools, using models trained on a curated database of around 5m CRISPR-Cas protein complexes.
EvolutionaryScale, based in New York, takes the LLM approach further. Its model, ESM3, was trained on 2.8bn entries and accounts for a protein's structure and function as well as its amino-acid sequence. Alex Rives, the firm's chief scientist, is working towards a first approximation to a virtual cell. EvolutionaryScale licenses its model to firms planning to make protein-based drugs and materials.
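Both firms treat a protein's amino-acid sequence as text to be modelled. One common training recipe, used by the ESM family of models, hides some residues and asks the model to predict them. The sketch below shows only the data side of that idea: it tokenises a sequence into integer IDs and masks a fraction of positions. The vocabulary, the `<mask>` token and the 15% masking rate are assumptions borrowed from BERT-style training, not details of ESM3 or of Profluent's models.

```python
import random
from typing import Dict, List, Optional, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MASK = "<mask>"  # hypothetical special token
# Vocabulary: one ID per standard amino acid, plus one for the mask token.
VOCAB = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
VOCAB[MASK] = len(VOCAB)


def tokenise(seq: str) -> List[int]:
    """Map each amino-acid letter to its integer ID."""
    return [VOCAB[aa] for aa in seq]


def mask_tokens(
    tokens: List[int], rate: float = 0.15, rng: Optional[random.Random] = None
) -> Tuple[List[int], Dict[int, int]]:
    """Hide a fraction `rate` of positions (at least one).

    Returns the masked token list and {position: original ID}; during
    training, a model would be asked to predict those original IDs.
    """
    rng = rng or random.Random()
    n = max(1, round(rate * len(tokens)))
    positions = rng.sample(range(len(tokens)), n)
    masked = list(tokens)
    for p in positions:
        masked[p] = VOCAB[MASK]
    return masked, {p: tokens[p] for p in positions}


tokens = tokenise("MKTAYIAKQRQISFVKSHFSRQ")
masked, targets = mask_tokens(tokens, rng=random.Random(0))
print(masked)
print(targets)
```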