On Thursday, October 30, more than 300 people from academia and industry gathered in an auditorium to attend the BoltzGen Seminar, hosted by the Abdul Latif Jameel Clinic for Machine Learning in Health (MIT Jameel Clinic). The event was headlined by Hannes Stark, an MIT PhD student and the first author of BoltzGen, who had announced the model just a few days earlier.
Building on Boltz-2, an open-source biomolecular structure prediction model that also predicts protein binding affinity and made waves over the summer, BoltzGen (officially released on Sunday, October 26) is the first model of its kind to go a step further by generating novel protein binders that are ready to enter the drug discovery pipeline.
Three key innovations make this possible. First, BoltzGen unifies protein design and structure prediction, handling a wide variety of tasks while maintaining state-of-the-art performance. Next, BoltzGen’s built-in constraints, designed with feedback from wetlab collaborators, ensure that the model produces functional proteins that do not violate the laws of physics or chemistry. Finally, a rigorous evaluation process tests the model on “undruggable” disease targets, pushing the limits of BoltzGen’s binder generation capabilities.
Most models used in industry or academia are capable of either structure prediction or protein design, but not both. Furthermore, they are limited to generating certain types of proteins that bind successfully to easy “targets.” Much like a student answering a test question that resembles their homework, these models often work as long as the training data looks similar to the target during binder design. But existing methods are almost always evaluated on targets for which structures with binders already exist, and their performance falters on more challenging targets.
“There are models that are trying to deal with binder design, but the problem is that these models are modality-specific,” explains Stark. “A general model does not only mean that we can address more tasks. Additionally, we get a better model for the individual task because the physics simulation is learned by example, and with a more general training scheme, we provide more such examples with generalized physics patterns.”
The BoltzGen researchers went out of their way to test the model on 26 targets, ranging from clinically relevant cases to cases explicitly chosen for their dissimilarity to the training data.
This extensive validation process, which took place in eight wetlabs across academia and industry, demonstrates the generality of the model and its potential to advance drug development.
Parabilis Medicines, one of the industry partners testing BoltzGen in a wetlab setting, praised BoltzGen’s potential: “We think that adapting BoltzGen to our existing Helicon peptide computational platform capabilities promises to accelerate our progress in delivering transformative medicines against major human diseases.”
While the open-source releases of Boltz-1, Boltz-2, and now BoltzGen (which was previewed at the 7th Molecular Machine Learning Conference on October 22) bring new opportunities and transparency to drug development, they also indicate that the biotech and pharmaceutical industries may need to reevaluate their offerings.
Amid the discussion of BoltzGen on the social media platform X, Justin Grace, principal machine learning scientist at LabGenius, raised a question. “The private-to-open performance time lag for chat AI systems is [seven] months and falling,” Grace wrote in a post. “It looks to be even shorter in the protein space. How will binder-as-a-service companies be able to [recoup] investment when we can just wait a few months for the free version?”
To those in academia, BoltzGen represents an expansion and acceleration of scientific possibility. “A question my students often ask me is, ‘Where can AI change the clinical game?’” says senior co-author and MIT professor Regina Barzilay, AI faculty lead of the Jameel Clinic and an affiliate of the Computer Science and Artificial Intelligence Laboratory (CSAIL). “Until we identify the undruggable targets and propose a solution, we won’t change the game,” she adds. “The emphasis here is on unsolved problems, which is what distinguishes Hannes’s work from that of others in the field.”
Senior co-author Tommi Jaakkola, the Thomas Siebel Professor of Electrical Engineering and Computer Science, who is affiliated with the Jameel Clinic and CSAIL, says that “models like BoltzGen that are released completely open-source enable broader community-wide efforts to accelerate drug design capabilities.”
Looking ahead, Stark believes the future of biomolecular design will be shaped by AI models. “I want to create tools that help manipulate biology to solve disease, or help perform tasks with molecular machines that we haven’t even imagined yet,” he says. “I want to provide these tools and enable biologists to imagine things they have never even thought about before.”