Many attempts have been made to harness the power of new artificial intelligence and large language models (LLMs) to try to predict the outcomes of new chemical reactions. These have had limited success, in part because until now they have not been grounded in an understanding of fundamental physical principles, such as the law of conservation of mass. Now, a team of MIT researchers has come up with a way to incorporate these physical constraints into a reaction prediction model, thereby improving the accuracy and reliability of its outputs.
The new work was reported Aug. 20 in the journal Nature, in a paper by recent postdoc Joonyoung Joung (now an assistant professor at Kookmin University, South Korea); former software engineer Mun Hong Fong (now at Duke University); chemical engineering graduate student Nicholas Casetti; postdoc Jordan Liles; physics undergraduate student Ne Dassanayake; and senior author Connor Coley, who is the Class of 1957 Career Development Professor in the MIT departments of Chemical Engineering and Electrical Engineering and Computer Science.
“The prediction of reaction outcomes is a very important task,” Joung explains. For example, if you want to make a new drug, “you need to know how to make it. So, this requires us to know what product is likely” to result from a given set of chemical inputs to a reaction. But most previous efforts to carry out such predictions look only at a set of inputs and a set of outputs, without looking at the intermediate steps or considering the constraints needed to ensure that no mass is gained or lost in the process, which is not possible in real reactions.
Joung points out that while large language models such as ChatGPT have been very successful in many areas of research, these models do not provide a way to limit their outputs to physically realistic possibilities, such as by requiring them to obey conservation of mass. These models use computational “tokens,” which in this case represent individual atoms, but “if you don’t conserve the tokens, the LLM model starts to make new atoms, or deletes atoms in the reaction.” Instead of being grounded in real scientific understanding, “this is kind of like alchemy,” Joung says. While many attempts at reaction prediction look only at the final products, “we want to track all the chemicals, and how the chemicals are transformed” throughout the reaction process from start to end, Joung says.
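The conservation requirement described above can be illustrated with a minimal sketch. This is not the team's code; the helper names `count_atoms` and `is_balanced` are hypothetical, and the parser assumes simple formulas without parentheses or charges. It only checks that every atom on the input side also appears on the output side.

```python
import re
from collections import Counter

# Matches an element symbol followed by an optional count, e.g. "Cl2" or "H".
# Assumption: simple formulas only (no parentheses, charges, or hydrates).
ELEMENT = re.compile(r"([A-Z][a-z]?)(\d*)")

def count_atoms(formulas):
    """Return the total number of atoms of each element across the formulas."""
    totals = Counter()
    for formula in formulas:
        for symbol, digits in ELEMENT.findall(formula):
            totals[symbol] += int(digits) if digits else 1
    return totals

def is_balanced(reactants, products):
    """True when no atom is created or deleted between the two sides."""
    return count_atoms(reactants) == count_atoms(products)

# H2 + Cl2 -> 2 HCl keeps every atom; dropping one HCl loses atoms.
print(is_balanced(["H2", "Cl2"], ["HCl", "HCl"]))
print(is_balanced(["H2", "Cl2"], ["HCl"]))
```

A model that emits atoms freely as tokens has nothing that forces this check to pass, which is the failure mode Joung describes.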
To address the problem, the team made use of a method developed in the 1970s by chemist Ivar Ugi, which uses a bond-electron matrix to represent the electrons in a reaction. They used this system as the basis for their new program, called FlowER (Flow matching for Electron Redistribution), which allows them to explicitly keep track of all the electrons in a reaction to ensure that none are spuriously added or deleted in the process.
The system uses a matrix to represent the electrons in a reaction, with nonzero values representing bonds or lone electron pairs and zeros representing a lack thereof. “That helps us to conserve both atoms and electrons at the same time,” Fong says. This representation, Fong says, was one of the key elements in building mass conservation into their prediction system.
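As a rough illustration of the bond-electron matrix idea, the sketch below follows the standard textbook convention: off-diagonal entries hold bond orders between pairs of atoms, and diagonal entries hold each atom's nonbonding valence electrons. It is not FlowER's implementation; the function names, the atom ordering, and the H2 + Cl2 example are illustrative assumptions.

```python
# Atom order is an illustrative choice: index 0 = H, 1 = H, 2 = Cl, 3 = Cl.

def be_matrix(n_atoms, bonds, lone_electrons):
    """Build a symmetric bond-electron matrix as a list of lists.

    bonds: {(i, j): bond_order}; lone_electrons: {i: nonbonding electron count}.
    """
    matrix = [[0] * n_atoms for _ in range(n_atoms)]
    for (i, j), order in bonds.items():
        matrix[i][j] = order
        matrix[j][i] = order
    for i, count in lone_electrons.items():
        matrix[i][i] = count
    return matrix

def total_electrons(matrix):
    """Sum of all entries: each bond is counted twice, once per bonded atom."""
    return sum(sum(row) for row in matrix)

def reaction_matrix(reactants, products):
    """Entry-wise difference describing how electrons are redistributed."""
    return [[p - r for r, p in zip(r_row, p_row)]
            for r_row, p_row in zip(reactants, products)]

# H-H and Cl-Cl on the left; two H-Cl bonds on the right.
# Each chlorine keeps six nonbonding electrons on both sides.
reactants = be_matrix(4, {(0, 1): 1, (2, 3): 1}, {2: 6, 3: 6})
products = be_matrix(4, {(0, 2): 1, (1, 3): 1}, {2: 6, 3: 6})
delta = reaction_matrix(reactants, products)

# Electrons are conserved when the redistribution sums to zero.
print(total_electrons(reactants), total_electrons(products))
print(sum(sum(row) for row in delta))
```

Because the atom list is fixed and only the matrix entries change, a prediction expressed as such a redistribution cannot add or remove atoms, and a zero-sum difference means no electrons appear or vanish.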
The system they developed is still at an early stage, Coley says. “The system as it stands is a demonstration, a proof of concept that this generative approach of flow matching is very well suited to the task of chemical reaction prediction.” While the team is excited about this promising approach, Coley says, “we’re aware that it does have specific limitations as far as the breadth of different chemistries that it’s seen.” Although the model was trained using data on more than a million chemical reactions, obtained from a U.S. Patent Office database, those data do not include certain metals and some kinds of catalytic reactions, Coley says.
“We’re incredibly excited about the fact that we can get such reliable predictions of chemical mechanisms” from the current system, Coley says. “It conserves mass, it conserves electrons, but we certainly acknowledge that there’s a lot more expansion and robustness to work on in the coming years as well.”
But even in its current form, which is being made freely available through the online platform GitHub, “we think it will make accurate predictions and be helpful as a tool for assessing reactivity and mapping out reaction pathways,” Coley says. “If we’re looking toward the future of really advancing the state of the art of mechanistic understanding and helping to invent new reactions, we’re not quite there. But we hope this will be a step toward that.”
“It’s all open source,” Fong says. “The models, the data, all of them are up there,” including a previous dataset developed by Joung that exhaustively lists the mechanistic steps of known reactions. “I think we are one of the pioneering groups making this dataset, and making it available open-source, and making this usable for everyone,” Fong says.
The FlowER model matches or outperforms existing approaches in finding standard mechanistic pathways, the team says, and makes it possible to generalize to previously unseen reaction types. They say the model could be relevant for predicting reactions in medicinal chemistry, materials discovery, combustion, atmospheric chemistry, and electrochemical systems.
In comparisons with existing reaction prediction systems, Coley says, “using the architecture choices that we’ve made, we get this massive increase in validity and conservation, and we get a matching or a little bit better accuracy in terms of performance.”
He adds that “what’s unique about our approach is that while we are using these textbook understandings of mechanisms to generate this dataset, we’re anchoring the reactants and products of the overall reaction in experimentally validated data from the patent literature.” They are inferring the underlying mechanisms, he says, rather than just making them up. “We’re imputing them from experimental data, and that’s not something that has been done and shared at this kind of scale before.”
As for the next step, he says, “we are quite interested in expanding the model’s understanding of metals and catalytic cycles. We’ve just scratched the surface in this first paper,” and most of the reactions included so far do not involve metals or catalysts, “so that’s a direction we’re quite interested in.”
In the long term, he says, “a lot of the excitement is in using this kind of system to help discover new complex reactions and help elucidate new mechanisms.”
The work was supported by the Machine Learning for Pharmaceutical Discovery and Synthesis consortium and the National Science Foundation.