New development in multi-fidelity machine learning methods opens up possibilities for the use of heterogeneous data for the prediction of quantum chemical properties

Vinod, V., & Zaspel, P. (2024). Assessing Non-Nested Configurations of Multifidelity Machine Learning for Quantum-Chemical Properties. arXiv preprint 2407.17087, http://arxiv.org/abs/2407.17087 

Multi-fidelity methods in machine learning (ML) of quantum chemistry (QC) properties have made high-accuracy, low-cost models more accessible to the community. They have been applied to a range of properties, including excitation energies. Most multi-fidelity methods require a nested configuration of the training data, that is, calculations for a given geometry must be made at the lower fidelities as well as at the higher fidelities.
In a recent work, available as a preprint, the authors Vivin Vinod and Peter Zaspel assess a non-nested configuration of the multi-fidelity machine learning (MFML) and optimized MFML (o-MFML) methods. Preliminary results suggest that while MFML still requires a nested data structure, o-MFML can generalize reasonably well over a non-nested training data structure. That is, o-MFML opens up avenues for the use of heterogeneous datasets, reducing the need to make costly calculations for high-fidelity data.
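The difference between a nested and a non-nested training configuration can be illustrated with a minimal sketch. The function names, the pool size, and the training-set sizes below are hypothetical and are not taken from the preprint. The non-nested case is shown in its most extreme form, with fully disjoint geometries at each fidelity, which is an assumption made for illustration only.

```python
import numpy as np

def nested_indices(n_pool, sizes, seed=0):
    """Pick training geometries so that each higher-fidelity set is a
    subset of every lower-fidelity set.

    `sizes` is ordered from lowest to highest fidelity and is expected
    to be non-increasing (fewer samples at more expensive fidelities).
    """
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_pool)
    # All fidelities draw from the start of the same permutation,
    # so a smaller set is always contained in a larger one.
    return [perm[:size] for size in sizes]

def non_nested_indices(n_pool, sizes, seed=0):
    """Pick training geometries so that no geometry is shared between
    fidelities (illustrative extreme case of a non-nested setup)."""
    if sum(sizes) > n_pool:
        raise ValueError("pool too small for disjoint training sets")
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_pool)
    # Consecutive, non-overlapping chunks of one permutation.
    bounds = np.cumsum([0] + list(sizes))
    return [perm[bounds[i]:bounds[i + 1]] for i in range(len(sizes))]

# Hypothetical setup: 3 fidelities, 200 candidate geometries.
sizes = [64, 32, 16]          # lowest to highest fidelity
nested = nested_indices(200, sizes)
non_nested = non_nested_indices(200, sizes)
```

In the nested case, the 16 high-fidelity geometries also need calculations at both lower fidelities. In the non-nested case, each geometry is calculated at one fidelity only, which is what makes heterogeneous datasets usable.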

Dataset of diverse quantum chemical properties to enable research and benchmarking of multifidelity machine learning models released!

Vinod, V., & Zaspel, P. (2024). CheMFi: A Multifidelity Dataset of Quantum Chemical Properties of Diverse Molecules. arXiv preprint arXiv:2406.14149 https://doi.org/10.48550/arXiv.2406.14149.

Vinod, V., & Zaspel, P. (2024). CheMFi: A Multifidelity Dataset of Quantum Chemical Properties of Diverse Molecules (1.0) [Data set]. Zenodo. https://zenodo.org/records/11636903.

With research booming in the field of multifidelity methods for quantum chemistry (QC), it becomes important to benchmark the various methods so that models can be compared meaningfully. Benchmarks expedite research by setting standards against which subsequent work can assess its own methodological developments. In the interest of such a uniform comparison, the quantum Chemistry MultiFidelity (CheMFi) dataset was distributed to the community under an open CC-BY-4.0 license. Containing 135k geometries of diverse and chemically complex molecules taken from the WS22 database, the CheMFi dataset includes QC properties ranging from excitation energies to molecular dipole moments. For each property, five fidelities are provided with DFT accuracy. The fidelities themselves differ in the choice of basis set. This dataset is a major step in the direction of research and development of multifidelity machine learning methods for QC.

The authors of this work are Vivin Vinod and Peter Zaspel. The dataset is available on Zenodo and the accompanying manuscript is available as a preprint.

Optimal Combination with Multifidelity Machine Learning Achieves Coupled Cluster Accuracy

Vinod, V., Kleinekathöfer, U., & Zaspel, P. (2024). Optimized multifidelity machine learning for quantum chemistry. Machine Learning: Science and Technology, 5(1), 015054 http://doi.org/10.1088/2632-2153/ad2cef.

Recent research in Multifidelity Machine Learning (MFML) has resulted in ML methods that reduce the cost of generating a training set without compromising the accuracy of the predictions. This is achieved by combining cheaper, less accurate data with high-accuracy (or high-fidelity), high-cost data. In this work, a novel methodological improvement of MFML is benchmarked for various quantum chemical (QC) properties. Optimized MFML (o-MFML) combines the different fidelities of data using an Optimal Combination method. With this improvement, it is shown that high-accuracy methods such as Coupled Cluster Singles Doubles (Triples), CCSD(T), are now more accessible than ever to the ML-QC community. The work is available in the Machine Learning: Science and Technology journal from IOPScience and is authored by Vivin Vinod, Ulrich Kleinekathöfer, and Peter Zaspel.
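One way to picture an optimized combination of models trained at different fidelities is as a least-squares fit of combination coefficients on a held-out validation set. The sketch below rests on that assumption and is not the authors' implementation: the function names are hypothetical, the data are synthetic, and the paper may use a different optimizer or objective.

```python
import numpy as np

def fit_combination(sub_preds, reference):
    """Fit combination coefficients by ordinary least squares.

    sub_preds : (n_val, n_sub) array, predictions of each sub-model
                on the validation geometries (hypothetical layout).
    reference : (n_val,) array, high-fidelity reference values.
    """
    coeffs, *_ = np.linalg.lstsq(sub_preds, reference, rcond=None)
    return coeffs

def combine(sub_preds, coeffs):
    """Combined prediction: weighted sum of sub-model predictions."""
    return sub_preds @ coeffs

# Synthetic illustration only: three sub-models, 50 validation points.
rng = np.random.default_rng(0)
val_preds = rng.normal(size=(50, 3))
# Hypothetical ground-truth weights used to generate the reference.
true_coeffs = np.array([1.0, -1.0, 1.0])
val_reference = val_preds @ true_coeffs

coeffs = fit_combination(val_preds, val_reference)
combined = combine(val_preds, coeffs)
```

Because the synthetic reference is an exact linear combination of the sub-model predictions, the fit recovers the generating weights. With real data the fitted coefficients would only approximate the best combination for the chosen validation set.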

Reducing Compute Costs of Generating Training Data for Excitation Energy Prediction Using Multifidelity Methods

Vinod, V., Maity, S., Zaspel, P., & Kleinekathöfer, U. (2023). Multifidelity machine learning for molecular excitation energies. Journal of Chemical Theory and Computation, 19(21), 7658-7670 https://doi.org/10.1021/acs.jctc.3c00882.

A major challenge to accurate predictions of quantum chemical (QC) properties with machine learning (ML) methods is the lack of high-accuracy data, since generating high-accuracy training data is computationally expensive. With the multifidelity machine learning (MFML) method, cheaper and less accurate data is used alongside very little high-accuracy data, resulting in a model with better accuracy in predicting high-fidelity data. In this work, the MFML method is benchmarked for vertical excitation energies, a QC property vital to understanding elementary life processes such as photosynthesis. Numerical results indicate a time benefit of over a factor of 30. This is a strong step towards the development of ML methods for QC that reduce the compute cost of generating a training set. This work is authored by Vivin Vinod, Sayan Maity, Peter Zaspel, and Ulrich Kleinekathöfer and has been published in the Journal of Chemical Theory and Computation.