V. Vinod and P. Zaspel. Investigating Data Hierarchies in Multifidelity Machine Learning for Excitation Energies. J. Chem. Theory Comput., 21(6), 3077–3091, 2025. DOI: 10.1021/acs.jctc.4c01491; also available as arXiv:2410.11392.
Multifidelity machine learning (MFML) has been shown to reduce the time cost of generating training data for machine learning (ML) models that predict quantum chemistry (QC) properties. MFML achieves this by combining training data of different accuracies, or fidelities. In this work, Vivin Vinod and Peter Zaspel investigate the effect of multifidelity data hierarchies on model cost and accuracy. Using a new error metric, the error contours of MFML, the work systematically studies the impact of the different fidelities on the overall model error. Based on this outcome, a new multifidelity approach, the Γ-curve, is implemented and shown to be a highly efficient method that achieves low model error with as few as two training samples at the costliest fidelity.
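To illustrate the general idea of trading many cheap samples for a few expensive ones, the following is a minimal sketch of a generic two-fidelity additive-correction model built from kernel ridge regression. It is not the MFML combination scheme or the Γ-curve studied in the paper. The one-dimensional toy functions `low_fidelity` and `high_fidelity`, the kernel width, the regularization strength, and all helper names are assumptions made purely for illustration; no QC data are involved.

```python
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian kernel matrix between 1D sample arrays a and b."""
    d = a[:, None] - b[None, :]
    return np.exp(-d**2 / (2.0 * sigma**2))

def krr_fit(x, y, sigma=1.0, reg=1e-6):
    """Solve (K + reg * I) alpha = y for the regression coefficients."""
    K = gaussian_kernel(x, x, sigma)
    return np.linalg.solve(K + reg * np.eye(len(x)), y)

def krr_predict(x_train, alpha, x_new, sigma=1.0):
    """Evaluate a fitted kernel ridge regression model at x_new."""
    return gaussian_kernel(x_new, x_train, sigma) @ alpha

# Hypothetical toy fidelities (assumptions, not quantum chemistry data).
def low_fidelity(x):
    return np.sin(x)              # cheap, less accurate

def high_fidelity(x):
    return np.sin(x) + 0.1 * x    # costly, treated as the reference

x_lo = np.linspace(0.0, 2.0 * np.pi, 25)   # many cheap samples
x_hi = x_lo[::8]                           # only 4 costly samples (nested)
x_test = np.linspace(0.0, 2.0 * np.pi, 200)

# Step 1: fit the low-fidelity model on the large cheap training set.
alpha_lo = krr_fit(x_lo, low_fidelity(x_lo))

# Step 2: fit a correction on the few costly samples, using the residual
# between the high-fidelity labels and the low-fidelity model prediction.
residual = high_fidelity(x_hi) - krr_predict(x_lo, alpha_lo, x_hi)
alpha_corr = krr_fit(x_hi, residual)

def predict_two_fidelity(x):
    """Low-fidelity model plus learned correction."""
    return krr_predict(x_lo, alpha_lo, x) + krr_predict(x_hi, alpha_corr, x)

# Baseline: a single-fidelity model that sees only the 4 costly samples.
alpha_sf = krr_fit(x_hi, high_fidelity(x_hi))

y_ref = high_fidelity(x_test)
mae_mf = np.mean(np.abs(predict_two_fidelity(x_test) - y_ref))
mae_sf = np.mean(np.abs(krr_predict(x_hi, alpha_sf, x_test) - y_ref))

print(f"two-fidelity MAE:    {mae_mf:.4f}")
print(f"single-fidelity MAE: {mae_sf:.4f}")
```

In this toy setting the correction term is smoother than the target itself, so a handful of costly samples is enough to learn it, while the cheap samples carry most of the structure. Whether a similar benefit appears for real excitation-energy data depends on how the fidelities relate to one another, which is the question the paper examines.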