Artificial intelligence is already proving that it can accelerate drug development and improve our understanding of disease. But to turn AI into innovative treatments, we need to put the latest, most powerful models into the hands of scientists.
The problem is that most scientists are not machine-learning experts. Now the company OpenProtein.AI is helping scientists stay on the cutting edge of AI with a no-code platform that gives them access to powerful foundation models and a suite of tools for designing proteins, predicting protein structure and function, and training models.
The company, founded by Tristan Bepler PhD ’20 and former MIT Associate Professor Tim Lu PhD ’07, is already equipping researchers at pharmaceutical and biotech companies of all sizes with its tools, including internally developed foundation models for protein engineering. OpenProtein.AI also offers its platform free of charge to scientists in academia.
“It’s a really exciting time right now because these models can not only make protein engineering more efficient, which shortens development cycles for therapeutic and industrial use, they can also enhance our ability to design new proteins with specific traits,” says Bepler. “We’re also thinking about applying these approaches to non-protein modalities. The bigger picture is that we’re creating a language to describe biological systems.”
Advancing biology with AI
Bepler came to MIT in 2014 as part of the Computational and Systems Biology PhD program, studying under Bonnie Berger, MIT’s Simons Professor of Applied Mathematics. There he realized how little we understand about the molecules that form the building blocks of biology.
“We have not characterized biomolecules and proteins well enough to make good predictive models of what an entire genome circuit will do, or how a protein interaction network will behave,” recalls Bepler. “This got me interested in understanding proteins at a more granular level.”
Bepler began searching for ways to predict the sequence of amino acids that make up proteins by analyzing evolutionary data. This was before Google released AlphaFold, a powerful predictive model for protein structure. Bepler’s work gave rise to one of the first generative AI models for understanding and designing proteins, which the team calls a protein language model.
“I was really excited about the connections between the classical structures of proteins and their sequence, structure, and function. We don’t understand those links very well,” says Bepler. “So how can we use these foundation models to skip the ‘structure’ component and go straight from sequence to function?”
After earning his PhD in 2020, Bepler joined Lu’s laboratory in MIT’s Department of Biological Engineering as a postdoc.
“It was around that time that the idea of integrating AI with biology was starting to take hold,” Lu recalls. “Tristan helped us create better computational models for biological design. We also realized there was a gap between the most cutting-edge tools available and biologists who would love to use these things but don’t know how to code. OpenProtein came from the idea of broadening access to these tools.”
Bepler had worked at the forefront of AI as part of his PhD, and he knew the technology could help scientists speed up their work.
“We started with the idea of building a general-purpose platform for doing machine learning-in-the-loop protein engineering,” says Bepler. “We wanted to create something that was user-friendly because machine-learning ideas are kind of esoteric. They require implementation, GPUs, fine-tuning, designing libraries of sequences. Especially at that time, it was a lot for biologists to learn.”
In contrast, OpenProtein’s platform offers an intuitive web interface for biologists to upload data and perform protein engineering work with machine learning. It includes a range of open-source models, including PoET, OpenProtein’s flagship protein language model.
PoET, short for Protein Evolutionary Transformer, was trained on protein clusters to generate sets of related proteins. Bepler and colleagues showed that it can generalize about evolutionary constraints on proteins and incorporate new information on protein sequences without retraining, allowing other researchers to add experimental data to improve the model.
“Researchers can use their own data to train models and optimize protein sequences, and then they can use our other tools to analyze those proteins,” says Bepler. “People are generating libraries of protein sequences in silico [on computers] and then running them through predictive models to obtain validation and structural predictors. It’s basically a no-code front-end, but we also have APIs for those who want to access it with code.”
The models help researchers rapidly design proteins, then decide which candidates are promising enough for further laboratory testing. Researchers can also input proteins of interest, and the model can generate new proteins with similar properties.
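For readers who do want to see what such a workflow looks like in code, the following is a minimal Python sketch of the general idea: build a library of sequence variants on a computer, score each one with a predictive model, and keep the top candidates for the lab. It is not OpenProtein’s API or method. The parent sequence, the function names, and the scoring rule are all hypothetical, and the toy score (the fraction of hydrophobic residues) merely stands in for a trained model.

```python
# Illustrative sketch only: every name here is hypothetical and the scoring
# function is a stand-in for a real trained predictive model.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def single_site_library(parent: str) -> list[str]:
    """Return every variant that differs from `parent` at exactly one position."""
    variants = []
    for i, original in enumerate(parent):
        for aa in AMINO_ACIDS:
            if aa != original:
                variants.append(parent[:i] + aa + parent[i + 1:])
    return variants


def toy_score(sequence: str) -> float:
    """Placeholder predictor: fraction of hydrophobic residues in the sequence.

    A real workflow would call a model trained on experimental data instead.
    """
    hydrophobic = set("AILMFVW")
    return sum(residue in hydrophobic for residue in sequence) / len(sequence)


def rank_candidates(library: list[str], score_fn, top_k: int = 5) -> list[str]:
    """Sort the library by predicted score (highest first) and keep the top_k."""
    return sorted(library, key=score_fn, reverse=True)[:top_k]


parent = "MKTAYIAKQR"  # made-up 10-residue parent sequence
library = single_site_library(parent)
top = rank_candidates(library, toy_score, top_k=5)

for candidate in top:
    print(candidate, round(toy_score(candidate), 2))
```

Swapping `toy_score` for a model trained on a lab’s own measurements is the step the platform is described as handling for users who do not write code.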
Since its inception, the team at OpenProtein has continued to add tools to its platform for researchers regardless of the size or resources of the laboratory.
“We’ve worked really hard to make the platform an open-ended toolbox,” says Bepler. “It has specific workflows, but it’s not tied specifically to one protein function or class of protein. One of the nice things about these models is that they are very good at understanding proteins broadly. They learn about the entire space of potential proteins.”
Enabling the next generation of treatments
Large pharmaceutical company Boehringer Ingelheim began using OpenProtein’s platform as early as 2025. Recently, the companies announced an expanded collaboration, in which OpenProtein’s platform and models will be incorporated into Boehringer Ingelheim’s work as it engineers proteins to treat diseases such as cancer and autoimmune or inflammatory diseases.
Last year, OpenProtein also released a new version of its protein language model, PoET-2, which outperforms much larger models while using a small fraction of the computing resources and experimental data.
“We really want to solve the question of how we describe proteins,” says Bepler. “What is the meaningful, domain-specific language of protein constraints that we use when generating them? How can we introduce more evolutionary barriers? How can we describe the enzymatic reaction performed by a protein such that a model can generate the sequence to perform that reaction?”
Moving forward, the founders are hoping to create models that take into account the changing, interconnected nature of protein function.
“The area I’m excited about is going beyond protein binding events and using these models to predict and design dynamic properties, where proteins have to engage two, three or four biological mechanisms at the same time, or change their function after binding,” says Lu, who currently serves in an advisory role for the company.
As advances are made in AI, OpenProtein continues to see its mission as providing scientists with the best tools to develop new treatments faster.
“As the work becomes more complex, with approaches incorporating things like protein logic and dynamic therapy, existing experimental toolsets become limited,” says Lu. “Creating open ecosystems around AI and biology is really important. There is a risk that AI resources could become so concentrated that the average researcher can’t use them. Open access is extremely important for the progress of the scientific field.”