Rapid analysis of melting point trends and models using Google Apps Scripts

I recently reported on how Google Apps Scripts can be used to facilitate the recording and calculations associated with a chemistry laboratory notebook (also see the resource page).

I will demonstrate here how these scripts can be used to rapidly discover trends in the melting points of analogs for the curation of data and the evaluation of models. The two melting point services that Andrew Lang created under the gONS menu were used to keep track of the measured and predicted melting points for all reactants and products as part of a "dashboard view" of the reaction being performed.

For looking at melting point trends, the following template sheet can be used.

For reasons explained previously, the template sheet has no active scripts in the page (except for the images). The cells simply hold the values generated by running the scripts corresponding to the column headings on the common names. To use the template for another series of compounds, make a copy of the entire Google Spreadsheet (File->Make a Copy), enter the new list, and pick the desired script to run from the menus. Once the values are computed, remember to copy and paste them as values.

It is important to understand that our melting point service is not a "trusted source" - it simply reports the average of all recorded data sources, ignoring values marked as DONOTUSE. That means that not all data points are equal and it is up to the user to determine a threshold of some type to decide how to use a particular data point.
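The averaging rule described above can be sketched in a few lines of Python. This is only an illustration, not the code behind the service: the record layout, the function name and the exact spelling of the exclusion flag are assumptions, and source "D" is a made-up record added to show the exclusion at work. The three retained values are the methanol values quoted later in this post.

```python
from statistics import mean

# Assumed spelling of the exclusion flag; passed as a parameter so it is easy to change.
EXCLUDE_FLAG = "DONOTUSE"

def average_melting_point(records, exclude_flag=EXCLUDE_FLAG):
    """Average melting points (deg C), skipping records that carry the exclusion flag.

    Returns None when no usable record remains.
    """
    kept = [r["mp_c"] for r in records if r.get("flag") != exclude_flag]
    if not kept:
        return None
    return mean(kept)

# Hypothetical record layout: one dict per recorded value.
records = [
    {"source": "A", "mp_c": -98.0, "flag": None},
    {"source": "B", "mp_c": -97.6, "flag": None},
    {"source": "C", "mp_c": -97.53, "flag": None},
    {"source": "D", "mp_c": -60.0, "flag": EXCLUDE_FLAG},  # made-up excluded record
]

print(round(average_melting_point(records), 2))
```

Note that the average carries no information about how many sources agree, which is why a separate threshold is needed.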

In this investigation, I have marked in green averaged experimental values where at least 3 different values are clustered within a few degrees. A link in column H is automatically generated from the CSID to provide a very convenient way to evaluate the data sources. For example, the link for methanol shows 3 very close but different melting point values: -98 C, -97.6 C and -97.53 C. The -98 C value is repeated 7 times as a result of the automatic merging of several Open Collections.

In general we don't manually add values that are identical from different sources because it is likely that these all originate from the same source. We have to make that assumption because proper data provenance is usually lacking in chemical information sources today. A Google search will often return the same one or two melting points from dozens of sites, which may turn out to be an outlier when compared with other independent sources. (CAS numbers are generated in the template sheet because they are useful for searching Google for melting points - for example see here for methanol)

In another scenario where there are 3 or more different but close values and a few clearly marked outliers, I considered these averages as having passed my threshold and colored these green as well. A good example is ethanol, which I have previously used to illustrate our curation method.
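The threshold I applied by eye could be written down roughly as follows. This is a sketch under stated assumptions: the function name is hypothetical, "a few degrees" is taken to be 3 degrees, and identical values are counted once because they probably share a source. The methanol list uses the values quoted above; the second list is made up to show the case with a marked outlier.

```python
def passes_threshold(values, outliers=(), min_distinct=3, max_spread_c=3.0):
    """Return True when at least `min_distinct` different values agree within `max_spread_c`.

    Identical values are collapsed to one, and values listed in `outliers`
    (those already marked as outliers by a curator) are ignored.
    """
    excluded = set(outliers)
    distinct = sorted({v for v in values if v not in excluded})
    if len(distinct) < min_distinct:
        return False
    return distinct[-1] - distinct[0] <= max_spread_c

# Methanol values quoted in this post: -98 C appears 7 times after merging.
methanol = [-98.0] * 7 + [-97.6, -97.53]
print(passes_threshold(methanol))

# Made-up example: three close values plus one marked outlier.
with_outlier = [10.0, 10.5, 11.0, 40.0]
print(passes_threshold(with_outlier, outliers=[40.0]))
```

Counting distinct values rather than raw entries keeps seven copies of the same number from looking like seven independent confirmations.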

It turns out that for the series of n-alcohols from methanol to 1-decanol, I was able to mark in green every experimental melting point average, making the confidence level of the following plot about as high as it can get from current chemical information sources.

It is particularly gratifying to note that the predicted melting points based on Andrew Lang's random forest Model002 perform very well here, even predicting a melting point minimum at 3 carbons. Note that this model is Open Source and uses Open Descriptors derived from the CDK. It does not yet include the results of our most recent curation efforts. Any new models incorporating improved datasets will be listed here.

Extending the analysis to n-alkyl carboxylic acids from formic acid to decanoic acid provides the following plot, with the same confidence for the experimental averages.

For this series, the random forest model not only predicts that the lowest melting point is for the 5 carbon analog but it also appears to take the shape of a zig-zag pattern, especially for the first 6 acids. Since this alternating pattern has been attributed to the way that carboxylic acid dimer bilayers pack in 3D (Bond2004), it is hard to imagine how simple 2D descriptors from the CDK can predict this. We will have to investigate this in more detail.
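Both observations, the position of the minimum and the zig-zag shape, can be checked numerically for any series. The sketch below is one possible way to do it, with hypothetical function names; the numbers are made up purely to exercise the functions and are not the curated averages or the model output from the spreadsheet.

```python
def series_minimum(mps):
    """Return the 1-based position (carbon count) of the lowest melting point."""
    return min(range(len(mps)), key=lambda i: mps[i]) + 1

def alternation_fraction(mps):
    """Fraction of consecutive steps that reverse direction (1.0 = perfect zig-zag)."""
    diffs = [b - a for a, b in zip(mps, mps[1:])]
    if len(diffs) < 2:
        return 0.0
    flips = sum(1 for d1, d2 in zip(diffs, diffs[1:]) if d1 * d2 < 0)
    return flips / (len(diffs) - 1)

# Made-up values for a 10-member series, indexed by carbon count 1..10.
example = [5, 12, -20, -6, -35, -4, -9, 15, 11, 30]

print(series_minimum(example))
print(alternation_fraction(example))
```

Running the same two functions on the experimental and predicted columns would show whether the model reproduces the alternation or only the overall trend.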

More generally, molecular symmetry can greatly affect the melting point via the way that crystals pack in 3D (see Carnelley's Rule, Brown2000). At some point we would like to incorporate this factor in our models. The current model should not be able to make predictions based on symmetry or stereochemistry.

We can also explore the melting point patterns of cyclic systems. Going from cyclopropane to cyclohexane there is a large jump from a 5 to a 6 membered ring and this is roughly reflected in the model:

Cycloalkanones behave similarly to cycloalkanes, showing a jump from 5 to 6 membered rings which agrees well with the model going from cyclobutanone to cyclohexanone:

However, in going from methylcyclopropane to methylcyclohexane, the model diverges substantially from experimental results. It also starts to get harder to find corroborating melting points: only 2 values can be found for methylcyclobutane.

Going from cyclopropanecarboxylic acid to cyclohexanecarboxylic acid shows a U-type pattern and is not well matched by the model. However, there is additional uncertainty about the melting point of cyclopentanecarboxylic acid.

For the series from cyclopropylamine to cyclohexylamine, there initially appears to be a significant mismatch between the model and experiment. However, because we have retained the provenance information in the spreadsheet, it becomes clear that the cyclobutylamine number (in the orange square below) comes from a single source. There is actually a good match between the other 3 values. However, as demonstrated here, we do not yet have enough information about when the model is reliable to assign the source of the discrepancy.
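Keeping the number of sources next to each averaged value makes this kind of check easy to automate. The following is a minimal sketch with a hypothetical function name, a hypothetical row layout and an arbitrary 20 degree tolerance; the rows are placeholders and not values from the spreadsheet. A flag only says where to look and does not decide whether the model or the measurement is at fault.

```python
def compare_series(rows, tolerance_c=20.0):
    """Label each compound by model/experiment agreement and provenance.

    Each row is (name, experimental_mp_c, predicted_mp_c, n_sources).
    Returns a list of (name, residual, status) tuples.
    """
    report = []
    for name, experimental, predicted, n_sources in rows:
        residual = predicted - experimental
        if abs(residual) <= tolerance_c:
            status = "agrees"
        elif n_sources < 2:
            status = "mismatch, single source"
        else:
            status = "mismatch"
        report.append((name, residual, status))
    return report

# Placeholder rows, for illustration only.
rows = [
    ("compound A", -50.0, -45.0, 4),
    ("compound B", 10.0, -40.0, 1),
    ("compound C", -30.0, -70.0, 3),
]

for name, residual, status in compare_series(rows):
    print(name, residual, status)
```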

These examples show that provenance information is a critical dimension in the analysis of trends in melting point data. The Google Apps Scripts and associated Google Spreadsheet template presented here offer a quick and convenient way to access averaged values and to assess the confidence in each average. Performing these tasks manually is generally too time-consuming to encourage researchers to follow such a practice. This is perhaps the reason that the current peer-review process accepts a single "trusted source" in analyses of this kind, even though such a practice inevitably leads to misinterpretations and errors that cascade through the scientific literature.