Machine learning can now be used to create synthetic light-emitting enzymes that work inside living cells.
Artificial intelligence has revolutionized our understanding of protein structures, as exemplified by DeepMind’s AlphaFold2, which has predicted the structures of 200 million proteins. Now, David Baker and his research team at the University of Washington have taken the field a step further. In a study published in *Nature* on February 22, they demonstrated how AI can be used to design custom functional proteins that can be produced within living cells, opening up new possibilities for protein engineering. Ali Madani, the founder and CEO of Profluent, a company specializing in AI-driven protein design, praised the study as a significant leap for the field that marks the "emergence of a new discipline."
Proteins are chains of amino acids that fold into complex three-dimensional shapes, and the diversity of those shapes is immense. Predicting a protein's 3D structure from its amino acid sequence alone is difficult because many factors influence folding, including interactions with other molecules and surface modifications such as glycosylation. Traditionally, scientists have determined protein structures experimentally, using methods such as X-ray crystallography, in which X-rays are diffracted through crystallized proteins to reveal their detailed atomic structure. These methods are costly, time-consuming, and require technical expertise. Nevertheless, X-ray crystallography has allowed researchers to map thousands of protein structures, providing a valuable dataset for training AI models. DeepMind’s AlphaFold system demonstrated that machine learning can predict protein structures with remarkable accuracy, and AlphaFold2, trained on 170,000 protein structures, achieved even greater precision.
On the same day that the AlphaFold2 paper was published, Baker and his team unveiled an independent, freely accessible tool called RoseTTAFold, which predicts protein structures with accuracy comparable to AlphaFold2.
Since then, Baker and his colleagues have explored using machine learning in reverse: generating amino acid sequences for new, hypothetical proteins that could have industrial or medical applications. Traditionally, protein engineering has relied on making incremental changes to existing proteins and evaluating their effects, for example by introducing random mutations into the gene that encodes a protein and then screening the resulting variants for desirable traits. Baker notes that with AI, "we can create even better designs" for these proteins "more rapidly than ever before."
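The conventional mutate-and-screen approach described above can be pictured as a simple loop: generate a library of random variants, measure each one, and keep the best. The sketch below is a toy illustration under stated assumptions, not anything taken from the study. It mutates the protein sequence directly rather than the underlying gene, and the hypothetical scoring function `toy_score` (which counts matches to a made-up `TARGET` sequence) stands in for a real laboratory assay.

```python
import random
from typing import Callable

# The 20 standard amino acids, as one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def mutate(sequence: str, rng: random.Random, n_mutations: int = 1) -> str:
    """Return a copy of `sequence` with random point substitutions.

    A substitution may occasionally pick the residue already present,
    leaving that position unchanged.
    """
    residues = list(sequence)
    for _ in range(n_mutations):
        pos = rng.randrange(len(residues))
        residues[pos] = rng.choice(AMINO_ACIDS)
    return "".join(residues)


def screen_round(
    parent: str,
    score: Callable[[str], int],
    rng: random.Random,
    library_size: int = 200,
) -> str:
    """Build a library of random variants and keep the best one.

    The parent is retained unless a variant scores strictly higher.
    """
    library = [mutate(parent, rng) for _ in range(library_size)]
    best = max(library, key=score)
    return best if score(best) > score(parent) else parent


# Hypothetical stand-in for a lab assay: more matches to TARGET = "better".
TARGET = "MKTAYIAKQR"


def toy_score(seq: str) -> int:
    return sum(a == b for a, b in zip(seq, TARGET))


rng = random.Random(0)
variant = "MAAAAAAAAA"
for _ in range(30):  # 30 rounds of mutation and screening
    variant = screen_round(variant, toy_score, rng)
print(variant, toy_score(variant))
```

Because each round keeps only improvements, the score can never decrease, but progress depends entirely on how many variants are screened, which is the bottleneck the AI-driven approach aims to shortcut.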
To evaluate their protein design approach, the researchers focused on a group of light-emitting enzymes known as luciferases—so named from the Latin word for "lightbearer." These enzymes emit light when they interact with small molecules called luciferins and are found in various organisms, such as fireflies and deep-sea marine life.
Unlike fluorescent proteins, luciferases do not need an external light source to glow, which makes them highly valuable for deep-tissue imaging. However, relatively few natural luciferases are known, and many of them are unstable and bind natural luciferins more effectively than synthetic luciferins designed to have more favorable properties. These limitations have hampered both the use of luciferases in scientific applications and efforts to engineer artificial versions of these enzymes.
Using a suite of AI systems, including AlphaFold2, ProteinMPNN, and trRosetta, the team set out to design the amino acid sequence of a luciferase that would bind synthetic luciferin effectively while remaining stable. Because natural luciferases bind synthetic luciferin poorly, the researchers used machine learning to compare the binding efficiency of 4,000 proteins known to bind small molecules. They identified a promising group: the nuclear transport factor 2 (NTF2)-like protein superfamily, whose members have a binding pocket suited to synthetic luciferin. However, NTF2-like proteins contain long loops that could cause synthetic versions to misfold. Because these loops are not essential for luciferase activity, the team used machine learning to replace them with more stable amino acid sequences.
The AI-driven approach produced 7,648 novel protein designs, which the researchers then tested to see which could produce light when combined with synthetic luciferin in cells. Only three of the designs (0.04 percent) succeeded.
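The 0.04 percent figure follows directly from the counts reported for the first screen. A minimal Python sketch of the arithmetic; the function name `hit_rate_percent` is hypothetical, chosen here for illustration:

```python
def hit_rate_percent(hits: int, designs: int) -> float:
    """Percentage of screened designs that showed the desired activity."""
    if designs <= 0:
        raise ValueError("designs must be a positive count")
    return 100.0 * hits / designs


# Counts reported for the first luciferase screen: 3 hits among 7,648 designs.
first_round = hit_rate_percent(hits=3, designs=7648)
print(f"First round: {first_round:.2f} percent")  # prints "First round: 0.04 percent"
print(f"Designs screened per hit: {7648 / 3:.0f}")  # prints "Designs screened per hit: 2549"
```

Put another way, roughly 2,500 designs had to be screened for each one that glowed, which is why the hit rate is a common yardstick when comparing protein design methods.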
Madani acknowledges that enzyme design is difficult because enzymes must function with great precision, and he calls any success "very impressive." Profluent’s AI tool, ProGen, has a hit rate above 50 percent, but Madani notes that comparing AI approaches is difficult because each is optimized for different types of protein design.
Building on their initial findings, the team refined its design process to create luciferases for a different synthetic luciferin, achieving a 4 percent success rate among 46 designs. Andy Hsien-Wei Yeh, a postdoctoral researcher in Baker’s lab, explains that the insights gained from the first round refined their understanding of the geometries needed for effective luciferase design. The work also contributed to the founding of Monod Bio, a company that has licensed the team's synthetic luciferases.
Despite these advances, protein design is not yet fully automated. Baker notes that there is still "room for improvement," as manual adjustments were needed to perfect the luciferase's active site. He envisions a future in which AI can design proteins "straight out of the box," though he acknowledges that more complex chemical reactions will pose additional challenges.
Looking ahead, Baker and his team are developing a new AI system, RFdiffusion, to further streamline protein design. They plan to use it to create a synthetic protein for a nasal spray that would prevent influenza by blocking the virus’s attachment to host cells. Because the designed proteins are expected to be highly stable, such a spray could offer long-lasting protection during flu season. Beyond influenza prevention, Baker anticipates that this AI-driven approach could eventually be used to design new biomaterials, enzymes that degrade plastics, and proteins that capture solar energy.
Source: https://www.the-scientist.com/now-ai-can-be-used-to-design-new-proteins-70997