Released: a dataset of diverse quantum chemical properties to enable research on and benchmarking of multifidelity machine learning models

Vinod, V., & Zaspel, P. (2024). CheMFi: A Multifidelity Dataset of Quantum Chemical Properties of Diverse Molecules. arXiv preprint arXiv:2406.14149.

Vinod, V., & Zaspel, P. (2024). CheMFi: A Multifidelity Dataset of Quantum Chemical Properties of Diverse Molecules (1.0) [Data set]. Zenodo.

With research on multifidelity methods for quantum chemistry (QC) booming, it becomes important to benchmark the various methods so that models can be compared meaningfully. Shared benchmarks expedite research by setting standards against which subsequent work can assess its own methodological developments. In the interest of such a uniform comparison, the quantum Chemistry MultiFidelity (CheMFi) dataset has been released to the community under the open CC-BY-4.0 license. It contains 135k geometries of diverse and chemically complex molecules taken from the WS22 database, with QC properties ranging from excitation energies to molecular dipole moments. For each property, five fidelities are provided at DFT accuracy; the fidelities differ in the choice of basis set. This dataset is a major step toward the research and development of multifidelity machine learning methods for QC.

The authors of this work are Vivin Vinod and Peter Zaspel. The dataset is available on Zenodo, and the accompanying manuscript is available as a preprint.