
MIT Engineers Develop AI Model to Predict Molecular Solubility in Organic Solvents

A team of chemical engineers at MIT has built a machine learning model that could streamline the synthesis of chemical compounds, including pharmaceuticals. The model predicts how well a molecule will dissolve in various organic solvents, a crucial step in drug development. This could speed up the development of new treatments and help chemists find safer, more sustainable alternatives to common industrial solvents.

The Optimization of Solvent Selection

Choosing the right solvent is a central decision in chemical synthesis. Many organic solvents, such as ethanol and acetone, are available, and they differ in both environmental impact and effectiveness, so making an accurate and efficient choice matters. This is where the new model, developed in work led by graduate students Lucas Attia and Jackson Burns, comes into play. “Predicting solubility really is a rate-limiting step in synthetic planning and manufacturing of chemicals, especially drugs. There’s been a longstanding interest in being able to make better predictions of solubility,” explains Attia.

The model, named FastSolv, is freely available and is already being used by several research labs and companies. One significant benefit is its ability to identify less hazardous substitutes for commonly used industrial solvents. “There are some solvents which are known to dissolve most things. They’re useful, but they’re damaging to the environment and to people. Our model is extremely useful in identifying the next-best solvent, which is hopefully much less damaging,” Burns says.

Birth and Evolution of a Game-Changing Model

Interestingly, the model grew out of a class project at MIT that combined machine learning with chemical engineering. Until then, the Abraham Solvation Model was the standard way to estimate solubility from molecular structure, although its accuracy was limited.

To overcome these limits, MIT’s Green Lab introduced SolProp in 2022. Although it used thermodynamic properties to predict solubility, it faltered on unfamiliar molecules, which are common in drug development. The breakthrough came with the 2023 release of a comprehensive dataset, BigSolDB. Compiled from nearly 800 scientific papers, it covers close to 800 molecules and more than 100 solvents. Drawing on this dataset, Attia and Burns trained two models, FastProp and ChemProp, on more than 40,000 data points that also account for the effect of temperature.

Impressive Outcomes and Future Potential

The duo was pleasantly surprised to note that both models excelled in performance, offering predictions that were two to three times more accurate than SolProp, notably capturing temperature-dependent solubility changes. “We were blown away to see that the static and learned embeddings were statistically indistinguishable in performance. That indicates the data quality is the main bottleneck, not the model architecture,” Burns shared.

There is room for better results with more consistent experimental data, because differences in how labs run solubility tests add noise to the dataset. Reducing that variability could further improve the model. “One of the big limitations of using these kinds of compiled datasets is that different labs use different methods and experimental conditions,” Attia said. Despite these limitations, FastSolv, which is fast and easy to use, is already being applied in pharmaceutical development, materials science, and green chemistry. Burns adds, “There are applications throughout the drug discovery pipeline. We’re also excited to see, outside of formulation and drug discovery, where people may use this model.”
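The point about inconsistent lab measurements can be illustrated with a small simulation. The Python sketch below is not the researchers' code and does not use BigSolDB. It assumes made-up "true" log-solubility values and a hypothetical Gaussian measurement noise, and shows that even a model that predicts the true value perfectly is scored with an error roughly equal to the noise in the measurements it is compared against. The function names and the noise levels are illustrative assumptions.

```python
import math
import random


def rmse(predicted, observed):
    """Root-mean-square error between two equal-length sequences."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


def simulate_noise_floor(n_points=5000, lab_noise_sd=0.5, seed=0):
    """Score a hypothetical perfect model against noisy measurements.

    All values are synthetic: the 'true' log-solubilities are drawn
    uniformly from an arbitrary range, and each 'measurement' adds
    Gaussian noise standing in for lab-to-lab variability.
    """
    rng = random.Random(seed)
    true_values = [rng.uniform(-4.0, 1.0) for _ in range(n_points)]
    measured = [t + rng.gauss(0.0, lab_noise_sd) for t in true_values]
    # The "perfect model" predicts the true value exactly, so all of
    # the remaining error comes from the measurement noise.
    return rmse(true_values, measured)


if __name__ == "__main__":
    for sd in (0.1, 0.5, 1.0):
        floor = simulate_noise_floor(lab_noise_sd=sd)
        print(f"measurement noise sd={sd:.1f} -> RMSE of a perfect model={floor:.2f}")
```

Under these assumptions, the measured error tracks the noise level rather than anything about the model, which is consistent with the researchers' view that data quality, not architecture, is the main bottleneck.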

The research was funded by the U.S. Department of Energy. For more detail, see the original article at MIT News.

Max Krawiec
