Making solubility estimations for most organic compounds in a wide range of solvents freely available has always been a main long term objective for the Open Notebook Science Solubility Challenge. With current expertise and technology, it should be as easy to obtain a solubility estimate as it is now to get driving directions off the web.
Obviously this won't be attained purely by exhaustive measurements, although we have been focused on strategic measurements over the past two years. In parallel, we have been constantly evaluating the various solubility models out there for suitability.
Although there are several solubility models available for non-aqueous solvents, our additional requirement for transparent model building has proved surprisingly difficult to satisfy.
From this search, the Abraham solubility model [Abraham 2009] floated to the top, with an important factor being that Abraham has made available extensive compilations of descriptors for solutes and solvents. In addition, the algorithms used to convert solubility measurements to Abraham descriptors (a minimum of 5 different solvents per solute) have allowed us to generate our own Abraham descriptors automatically, simply by recording new measurements into our SolSum Google Spreadsheet. These can be obtained in real time as well.
This approach permitted us to provide predictions for a limited number of solutes in a wide range of solvents, and we have included these predictions in the past two editions (2nd and 3rd) of the ONS Challenge Solubility Book.
Coming at the problem from a different approach, Andrew Lang has also been trying to predict solubility using only open molecular descriptors, mainly relying on the CDK. Since our most commonly used solvent has been methanol, Andy recently generated a web service to predict solubility in that solvent.
By combining these two approaches, Andy has now created a modeling system that not only predicts solubility in a wide range (70+) of solvents for most compounds, but also provides related data that can be used for modeling other phenomena such as intestinal absorption of a drug or crossing the blood-brain barrier.[Stovall 2007]
The idea is to use a Random Forest approach using freely available descriptors to predict the Abraham descriptors for any solute. A separate service then generates predicted solubilities for a wide range of solvents based on these Abraham descriptors. I'm using the term "freely available" because - although the CDK descriptors and VCCLab services are open - the model requires 2 descriptors only available from ChemSpider (ultimately from ACD/Labs).
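The second step, going from Abraham descriptors to a solubility, can be sketched with the general form of the Abraham linear free energy relationship, log10(S_solvent / S_water) = c + eE + sS + aA + bB + vV. This is only an illustration of that equation, not the code behind Andy's service. The class and function names are hypothetical, every number in the example is invented rather than taken from a real solute or solvent, and how the real service obtains the water solubility term is an assumption here since the post does not describe it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SoluteDescriptors:
    """Abraham solute descriptors (E, S, A, B, V)."""
    E: float
    S: float
    A: float
    B: float
    V: float


@dataclass(frozen=True)
class SolventCoefficients:
    """Abraham solvent coefficients (c, e, s, a, b, v) for water-to-solvent transfer."""
    c: float
    e: float
    s: float
    a: float
    b: float
    v: float


def log_transfer(solute: SoluteDescriptors, solvent: SolventCoefficients) -> float:
    """Return log10(S_solvent / S_water) from the linear Abraham equation."""
    return (
        solvent.c
        + solvent.e * solute.E
        + solvent.s * solute.S
        + solvent.a * solute.A
        + solvent.b * solute.B
        + solvent.v * solute.V
    )


def predicted_solubility(
    solute: SoluteDescriptors, solvent: SolventCoefficients, log_s_water: float
) -> float:
    """Return the predicted molar solubility in the solvent.

    log_s_water is log10 of the molar aqueous solubility; supplying it as an
    input is an assumption of this sketch.
    """
    return 10 ** (log_s_water + log_transfer(solute, solvent))


# Invented placeholder values, chosen only to make the arithmetic easy to follow.
example_solute = SoluteDescriptors(E=1.0, S=1.0, A=0.5, B=0.5, V=1.0)
example_solvent = SolventCoefficients(c=0.2, e=0.3, s=-0.7, a=0.2, b=-3.0, v=3.5)

print(log_transfer(example_solute, example_solvent))
print(predicted_solubility(example_solute, example_solvent, log_s_water=-1.9))
```

Because the equation is linear in the descriptors, changing one descriptor at a time shows directly how much each term contributes to the predicted value for a given solvent.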
Here is an example with benzoic acid. As long as the common name resolves to a single entry on ChemSpider, it is enough to enter it and it automatically populates the rest of the fields, which are then used by the service to generate the Abraham descriptors.
Hitting the prediction link above will automatically populate the second service and generate predicted solubilities for over 70 solvents.
This approach of allowing people to access these components separately can be useful. It can be instructive to manually play with the Abraham descriptors directly to see how predicted solubilities are affected. There are also situations where one has experimentally determined Abraham descriptors for a solute and bypassing the descriptor prediction step is required.
However, for those who prefer to cut to the chase, a convenient web service is available where the common name (or SMILES) of the solute is entered and the list of available solvents appears as a drop down menu.
Now here is where I think the real payoff comes for accelerating science with openness. Andy has also created a web service that returns the predicted solubility in molar as a number from common names (or SMILES) for solute and solvent via the URL. For example click this for benzoic acid in methanol. The advantage here is that solubility prediction can be easily integrated as a web service call from intuitive interfaces such as a Google Spreadsheet to enable even non-programmers to make use of the data. Notice that the web service provided in the fourth column for the average of measured solubility values enables an easy way to explore the accuracy of specific predictions.
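For those who would rather call such a service from a script than from a spreadsheet, a minimal sketch might look like the following. The base URL and the `solute` and `solvent` parameter names are placeholders invented for illustration; the actual address and query format of Andy's service are not given here and should be substituted. The only behaviour taken from the description above is that the response body is a single number, the predicted solubility in molar.

```python
import math
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoint: replace with the real service address.
BASE_URL = "https://example.org/solubility/predict"


def build_prediction_url(solute: str, solvent: str, base_url: str = BASE_URL) -> str:
    """Build the request URL from a solute and solvent name (or SMILES)."""
    # urlencode escapes spaces and special characters found in names or SMILES.
    return base_url + "?" + urlencode({"solute": solute, "solvent": solvent})


def parse_molar_solubility(body: str) -> float:
    """Convert the plain-text response into a solubility in mol/L."""
    value = float(body.strip())
    if not math.isfinite(value) or value < 0:
        raise ValueError(f"not a valid solubility: {body!r}")
    return value


def fetch_predicted_solubility(solute: str, solvent: str, timeout: float = 10.0) -> float:
    """Call the service and return the predicted solubility in mol/L."""
    url = build_prediction_url(solute, solvent)
    with urlopen(url, timeout=timeout) as response:
        return parse_molar_solubility(response.read().decode("utf-8"))


# Building the URL needs no network access.
print(build_prediction_url("benzoic acid", "methanol"))
```

Keeping the URL construction and the response parsing in separate functions makes it easy to check both without touching the network, and to point the same code at a different service later.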
Such web services could also be integrated with data from ChemSpider or custom systems. If those who use these services feed back their processed data to the open web, it could take us a step closer to automated reaction design. For example consider the custom application to select solvents for the Ugi reaction. Model builders could also use the web services for predicted and measured solubility directly.
A while back we explored using Taverna for MyExperiment to create virtual libraries of SMILES. Unfortunately we ran into issues with getting the applications developed on Macs to run on our PCs. This might be worth revisiting as a means of filtering virtual libraries through different thresholds of predicted solubility.
Andy has described his model in detail in a fully transparent way - the model itself, how it was generated and the entire dataset can be found here. We would welcome improvements of the model as well as completely new models based on our dataset using only freely available tools.
It should be noted that when I use the term "general" it refers to the ability of the model to generate a number for most compounds listed in ChemSpider. Obviously compounds that most closely resemble the training set are more likely to generate better estimates. Because of our synthetic objectives using the Ugi reaction, we have mainly focused on collecting solubility data for carboxylic acids, aldehydes and amides, either from new measurements or from the literature.
Another important point concerns the main intended application of the model: organic synthesis. Generally the range of interest for such applications is about 0.01 - 3M. This might be very different for other applications - such as the aqueous solubility of a drug, where distinctions between much lower solubilities may be important.
For a typical organic synthesis, a solubility of 0.001M or 0.005M will probably translate as effectively insoluble. This might be a desired property for a product intended to be isolated by filtration. On the other end of the scale knowing that a solubility is either 4M or 6M will not usually have an impact on reaction design. It is enough to know that a reactant will have good solubility in a particular solvent.
Given the above considerations for intended applications and the likelihood that the current model is far from optimized, the predictions should be used cautiously. We suggest that the model is best used as a "flagging device". For example, if a reaction is to be carried out at 0.5M, one may place a threshold at 0.4M for the predicted values of reactants during solvent selection, with the recognition that a predicted 0.4M may be an actual 0.55M. A similar threshold approach can be used for the product, where in this case the lowest solubility is desired. A practical example of this is the shortlisting of solvent candidates for the Ugi reaction.
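The flagging logic described above can be sketched in a few lines. This is a hypothetical illustration rather than the code behind the Ugi solvent selector: the function name, the data layout and all of the predicted values are invented. Solvents are kept only if every reactant is predicted to dissolve at or above the chosen threshold, and the survivors are ranked by predicted product solubility, lowest first.

```python
from typing import Dict, List, Tuple


def shortlist_solvents(
    predicted: Dict[str, Dict[str, float]],
    reactants: List[str],
    product: str,
    reactant_threshold: float,
) -> List[Tuple[str, float]]:
    """Return (solvent, predicted product solubility) pairs, best candidates first.

    predicted maps solvent name -> compound name -> predicted solubility in mol/L.
    """
    kept = []
    for solvent, by_compound in predicted.items():
        # Flag: every reactant must clear the threshold in this solvent.
        if all(by_compound[name] >= reactant_threshold for name in reactants):
            kept.append((solvent, by_compound[product]))
    # The product should be as insoluble as possible so it can be filtered off.
    kept.sort(key=lambda pair: pair[1])
    return kept


# Invented predictions in mol/L, for illustration only.
example_predictions = {
    "methanol": {"acid": 2.0, "aldehyde": 1.5, "product": 0.05},
    "toluene": {"acid": 0.1, "aldehyde": 3.0, "product": 0.02},
    "acetonitrile": {"acid": 0.6, "aldehyde": 0.45, "product": 0.01},
}

shortlist = shortlist_solvents(
    example_predictions,
    reactants=["acid", "aldehyde"],
    product="product",
    reactant_threshold=0.4,  # for a reaction planned at 0.5 M
)
print(shortlist)
```

With these invented numbers, toluene is dropped because the acid falls below the 0.4M threshold, and acetonitrile ranks ahead of methanol because the product is predicted to be less soluble in it.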
Another example of flagging involves identifying the outliers in the model. These can be inspected for experimental errors and possibly remeasured. Alternatively outliers may shed light on the limitations of the model. For example we have found that the solubility of solutes with melting points near room temperature can be greatly underestimated by the current model. This may be an opportunity to develop other models which incorporate melting point or enthalpy of fusion.[Rohani 2008]
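One simple way to pull out candidates for inspection is to compare predicted and measured values on a log scale and flag the largest disagreements. The sketch below assumes that approach; the 0.5 log unit cutoff, the function name and the sample values are all invented for illustration and are not part of the model described here.

```python
import math
from typing import List, Tuple


def flag_outliers(
    records: List[Tuple[str, float, float]], cutoff_log_units: float = 0.5
) -> List[Tuple[str, float]]:
    """Return (name, residual) for entries whose prediction is far from the measurement.

    Each record is (name, measured mol/L, predicted mol/L). The residual is
    log10(predicted) - log10(measured), so a negative value means the model
    underestimates the solubility.
    """
    flagged = []
    for name, measured, predicted in records:
        residual = math.log10(predicted) - math.log10(measured)
        if abs(residual) > cutoff_log_units:
            flagged.append((name, residual))
    # Largest disagreements first, whatever their sign.
    flagged.sort(key=lambda item: abs(item[1]), reverse=True)
    return flagged


# Invented measured/predicted pairs in mol/L, for illustration only.
example_records = [
    ("solute 1", 1.0, 1.2),
    ("solute 2", 2.0, 0.1),
    ("solute 3", 0.5, 0.4),
]

for name, residual in flag_outliers(example_records):
    print(name, round(residual, 2))
```

Working in log units treats a tenfold error the same way whether the solubility is 0.01M or 1M, which suits a model meant to cover a broad range of concentrations.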
Although it is possible that better models and more data will improve the accuracy of the predictions, this can be true only if the training set is accurate enough. Based on conversations I've had with researchers who deal with solubility, on reading modeling papers and on our own experience with the ONS Challenge, I am starting to suspect that much of the available data just isn't accurate enough for high precision modeling. Models using data from the literature are especially vulnerable, I think. Take a look at this unsettling comparison between new measurements and literature values (not to mention the model) for common compounds.[Loftsson 2006] Here is a subset:
I have also made the point in detail for the aqueous solubility of EGCG. Could this be the reason that so many different solubility models using different physical chemistry principles have evolved and continue to co-exist?
The situation reminds me a lot of the discussions taking place in the molecular docking community.[Bissantz 2010] The differences in calculated binding energies are often small in comparison with the uncertainties involved. But docking can still be used as one tool among others to find drug candidates by flagging a collection of compounds above a certain threshold binding energy.