Kshema Shaju – Software for Data-Intensive Applications

Open PhD position in structure-preserving scientific machine learning for port-Hamiltonian ODEs and DAEs

Posted by Kshema Shaju in News on February 20, 2025 (updated March 12, 2025)

Are you interested in developing novel scientific machine learning models for a special class of ordinary differential equations (ODEs) and differential-algebraic equations (DAEs)? We are currently looking for a PhD candidate to help us develop Gaussian-process-based models that preserve the structure of port-Hamiltonian systems.

A PhD position is currently available in the Collaborative Research Center (CRC) 1701 “Port-Hamiltonian Systems” at the University of Wuppertal, Germany. Details on CRC 1701 and the position can be found here: https://phi.uni-wuppertal.de/en/port-hamiltonian-institute/crc-1701/

The underlying project is supervised by Prof. Dr. Peter Zaspel and Prof. Dr. Michael Günther. Prof. Zaspel's team is located at the Bergische Universität Wuppertal. This international team focuses on method development in machine learning, uncertainty quantification and high-performance computing in the context of applications from the natural sciences, engineering and beyond. It is embedded in the research group on Scientific Computing and High Performance Computing. For more details, see https://www.peter-zaspel.de/ and https://hpc.uni-wuppertal.de. Prof. Dr. Michael Günther is a professor in the Applied and Computational Mathematics group. His research focuses on time integration methods for all types of (coupled) dynamical systems, in particular port-Hamiltonian ODEs, DAEs and PDEs, with applications ranging from computational physics and computational finance to computational electronics. For more details, see https://acm.uni-wuppertal.de/de/.

A successful applicant is expected to hold a Master’s degree (or equivalent) in mathematics, computer science, physics, or a related field. Sound knowledge of (scientific) machine learning and knowledge of numerical analysis and numerical linear algebra are expected, as is programming experience in Python or C/C++. Knowledge of parallel programming is desirable. Prior knowledge of differential-algebraic equations, Gaussian processes or kernel-based methods is a plus. A good command of English is essential, both as the local working language and for international collaborations. We are looking for a competent personality with initiative and commitment who can work both independently and collaboratively.

We offer a three-year PhD position. The salary is paid in accordance with the collective agreement for the public service of the German Länder (Tarifvertrag des öffentlichen Dienstes der Länder, TV-L), at salary level 13 (75% of a full-time position). The place of employment is Wuppertal, Germany.

The position is available immediately, and applications are accepted until April 8, 2025. For further information and to apply, please visit the online jobs portal https://stellenausschreibungen.uni-wuppertal.de. There you will find the official job description and the link for submitting your application material under reference number 25063. For questions about the submission process or the position, please contact Prof. Peter Zaspel via zaspel(at)uni-wuppertal.de.

New development in multi-fidelity machine learning methods opens up possibilities for the use of heterogeneous data for the prediction of quantum chemical properties

Posted by Kshema Shaju in News on July 31, 2024 (updated February 20, 2025). Tags: CheMFi, excitation energies, heterogeneous data, machine learning, multi-fidelity, quantum chemistry

Vinod, V., & Zaspel, P. (2024). Assessing Non-Nested Configurations of Multifidelity Machine Learning for Quantum-Chemical Properties. arXiv preprint arXiv:2407.17087. http://arxiv.org/abs/2407.17087.

Multi-fidelity methods for machine learning (ML) of quantum chemistry (QC) properties have made high-accuracy, low-cost models more accessible to the community. They have been applied to a range of properties, including excitation energies. Most multi-fidelity methods require a nested configuration of the training data; that is, every geometry computed at a higher fidelity must also be computed at all lower fidelities.
In recent work, available as a preprint, Vivin Vinod and Peter Zaspel assess non-nested configurations of multi-fidelity machine learning (MFML) and optimized MFML (o-MFML). Preliminary results suggest that while MFML still requires a nested data structure, o-MFML generalizes reasonably well to non-nested training data. o-MFML thus opens up avenues for using heterogeneous datasets, reducing the need for costly high-fidelity calculations.
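To make the nested/non-nested distinction concrete, the following minimal sketch checks whether a multi-fidelity training set is nested. It assumes the training set is given as one set of geometry IDs per fidelity, ordered from the cheapest to the most accurate fidelity; this data layout and the function names are illustrative assumptions, not part of the authors' code.

```python
from __future__ import annotations

from typing import Hashable, Sequence


def nesting_violations(ids_by_fidelity: Sequence[set[Hashable]]) -> dict[int, set[Hashable]]:
    """For each fidelity level f >= 1, return the geometry IDs missing from level f - 1.

    Hypothetical helper: ids_by_fidelity[0] is the cheapest fidelity and
    ids_by_fidelity[-1] the most accurate one.
    """
    violations: dict[int, set[Hashable]] = {}
    for f in range(1, len(ids_by_fidelity)):
        missing = set(ids_by_fidelity[f]) - set(ids_by_fidelity[f - 1])
        if missing:
            violations[f] = missing
    return violations


def is_nested(ids_by_fidelity: Sequence[set[Hashable]]) -> bool:
    # If every level is a subset of the level directly below it, then by
    # transitivity it is a subset of all lower levels.
    return not nesting_violations(ids_by_fidelity)


nested = [set(range(8)), {0, 1, 2, 3}, {0, 1}]
non_nested = [set(range(8)), {2, 3, 8, 9}, {8}]
print(is_nested(nested))               # True
print(nesting_violations(non_nested))  # {1: {8, 9}}
```

In the non-nested example, geometries 8 and 9 have middle-fidelity labels but no lowest-fidelity labels, which is exactly the kind of heterogeneous data the preprint studies.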

Dataset of diverse quantum chemical properties to enable research and benchmarking of multifidelity machine learning models released!

Posted by Kshema Shaju in News on July 22, 2024 (updated September 12, 2024). Tags: benchmark, dataset, DFT, machine learning, multi-fidelity

Vinod, V., & Zaspel, P. (2024). CheMFi: A Multifidelity Dataset of Quantum Chemical Properties of Diverse Molecules. arXiv preprint arXiv:2406.14149. https://doi.org/10.48550/arXiv.2406.14149.

Vinod, V., & Zaspel, P. (2024). CheMFi: A Multifidelity Dataset of Quantum Chemical Properties of Diverse Molecules (1.0) [Data set]. Zenodo. https://zenodo.org/records/11636903.

With research booming in multifidelity methods for quantum chemistry (QC), it becomes important to benchmark the various methods so that models can be compared meaningfully. Common benchmarks accelerate research by setting standards against which subsequent methodological developments can be assessed. To enable such a uniform comparison, the quantum Chemistry MultiFidelity (CheMFi) dataset has been released to the community under the open CC-BY-4.0 license. Containing 135k geometries of diverse and chemically complex molecules taken from the WS22 database, the CheMFi dataset provides QC properties ranging from excitation energies to molecular dipole moments. For each property, five fidelities computed with DFT are provided; the fidelities differ in the choice of basis set. This dataset is a major step toward the research and development of multifidelity machine learning methods for QC.

The authors of this work are Vivin Vinod and Peter Zaspel. The dataset is available on Zenodo, and the accompanying manuscript is available as a preprint.

Optimal Combination with Multifidelity Machine Learning Achieves Coupled Cluster Accuracy

Posted by Kshema Shaju in News on July 22, 2024 (updated September 12, 2024). Tags: atomization energies, coupled cluster, electronic structure theory, excitation energies, kernel ridge regression, machine learning, multi-fidelity

Vinod, V., Kleinekathöfer, U., & Zaspel, P. (2024). Optimized multifidelity machine learning for quantum chemistry. Machine Learning: Science and Technology, 5(1), 015054. http://doi.org/10.1088/2632-2153/ad2cef.

Recent research in multifidelity machine learning (MFML) has produced ML methods that reduce the cost of generating a training set without compromising the accuracy of the predictions. This is achieved by combining cheaper, less accurate data with costly high-accuracy (or high-fidelity) data. In this work, a novel methodological improvement of MFML is benchmarked for various quantum chemical (QC) properties. Optimized MFML (o-MFML) combines the different fidelities of data using an Optimal Combination method. The authors show that, with this improvement, high-accuracy methods such as coupled cluster singles and doubles with perturbative triples (CCSD(T)) are now more accessible than ever to the ML-QC community. The work is published in the journal Machine Learning: Science and Technology (IOPscience) and is authored by Vivin Vinod, Ulrich Kleinekathöfer, and Peter Zaspel.
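As a rough illustration of an optimal combination step, the sketch below assumes that several sub-models (for example, models trained on different fidelities and training-set sizes) have already been trained and evaluated on a held-out validation set with high-fidelity reference values, and that the combination weights are then fitted by ordinary least squares. To our understanding, standard MFML fixes these weights to +1 or -1, while o-MFML learns them from data; the function names and toy numbers here are hypothetical, and the paper remains the authoritative description of the method.

```python
from __future__ import annotations

from typing import Sequence


def solve_linear(A: list[list[float]], b: list[float]) -> list[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: sub-model predictions are linearly dependent")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def optimal_weights(val_preds: Sequence[Sequence[float]], y_val: Sequence[float]) -> list[float]:
    """Least-squares weights beta minimising sum_i (sum_k beta_k * P[i][k] - y_i)^2.

    Hypothetical helper: val_preds[i][k] is the prediction of sub-model k on validation
    sample i and y_val[i] is the high-fidelity reference. Uses the normal equations
    (P^T P) beta = P^T y; a QR-based solver is preferable for ill-conditioned problems.
    """
    k = len(val_preds[0])
    PtP = [[sum(row[a] * row[c] for row in val_preds) for c in range(k)] for a in range(k)]
    Pty = [sum(row[a] * y for row, y in zip(val_preds, y_val)) for a in range(k)]
    return solve_linear(PtP, Pty)


def combine(preds: Sequence[Sequence[float]], beta: Sequence[float]) -> list[float]:
    """Weighted sum of the sub-model predictions for each sample."""
    return [sum(w * p for w, p in zip(beta, row)) for row in preds]


# Toy validation data in which the reference equals 2 * (sub-model 0) - 1 * (sub-model 1).
val_preds = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y_val = [2.0, -1.0, 1.0, 3.0]
beta = optimal_weights(val_preds, y_val)
print(beta)                         # [2.0, -1.0]
print(combine([[3.0, 1.0]], beta))  # [5.0]
```

Fitting the weights on a validation set rather than fixing them in advance lets the combination down-weight sub-models that add little information, at the cost of needing some high-fidelity validation data.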

Reducing Compute Costs of Generating Training Data for Excitation Energy Prediction Using Multifidelity Methods

Posted by Kshema Shaju in News on July 22, 2024 (updated September 12, 2024). Tags: electronic structure theory, excitation energies, kernel ridge regression, machine learning, multi-fidelity

Vinod, V., Maity, S., Zaspel, P., & Kleinekathöfer, U. (2023). Multifidelity machine learning for molecular excitation energies. Journal of Chemical Theory and Computation, 19(21), 7658-7670. https://doi.org/10.1021/acs.jctc.3c00882.

A major challenge for accurate predictions of quantum chemical (QC) properties with machine learning (ML) methods is the lack of high-accuracy data, because generating high-accuracy training data is computationally expensive. In the multifidelity machine learning (MFML) method, cheaper, less accurate data is combined with a small amount of high-accuracy data to obtain a model that predicts high-fidelity values more accurately than a model trained on the scarce high-accuracy data alone. In this work, the MFML method is benchmarked for vertical excitation energies, a QC property vital to understanding elementary life processes such as photosynthesis. Numerical results indicate a time benefit of more than a factor of 30. This is a strong step towards ML methods for QC that reduce the compute cost of generating a training set. This work is authored by Vivin Vinod, Sayan Maity, Peter Zaspel, and Ulrich Kleinekathöfer and has been published in the Journal of Chemical Theory and Computation.
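The combination idea behind MFML can be illustrated with a deliberately small sketch. It assumes two fidelities, a one-dimensional toy input instead of molecular geometries, hand-picked functions `low` and `high` instead of QC calculations, and a plain Gaussian-kernel kernel ridge regression (KRR); all names and parameters are hypothetical. To our understanding, with nested data a two-fidelity MFML model adds a low-fidelity model trained on many samples to the difference of a high-fidelity and a low-fidelity model trained on the few samples available at both fidelities; the published method generalizes this to more fidelities and training-set sizes.

```python
from __future__ import annotations

import math
from typing import Callable, Sequence


def gaussian_kernel(a: float, b: float, length: float) -> float:
    return math.exp(-((a - b) ** 2) / (2.0 * length**2))


def solve_linear(A: list[list[float]], b: list[float]) -> list[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


class KRR:
    """Toy kernel ridge regression on scalar inputs (hypothetical stand-in for a QC model)."""

    def __init__(self, xs: Sequence[float], ys: Sequence[float], lam: float = 1e-8, length: float = 0.5):
        self.xs = list(xs)
        self.length = length
        K = [
            [gaussian_kernel(a, b, length) + (lam if i == j else 0.0) for j, b in enumerate(self.xs)]
            for i, a in enumerate(self.xs)
        ]
        # Regression coefficients alpha = (K + lam * I)^{-1} y.
        self.alpha = solve_linear(K, [float(y) for y in ys])

    def predict(self, x: float) -> float:
        return sum(a * gaussian_kernel(x, xi, self.length) for a, xi in zip(self.alpha, self.xs))


def mfml_two_fidelity(x_large, y_low_large, x_small, y_low_small, y_high_small) -> Callable[[float], float]:
    """Combine three sub-models with coefficients +1, +1, -1 (x_small must be a subset of x_large)."""
    low_large = KRR(x_large, y_low_large)    # many cheap samples
    high_small = KRR(x_small, y_high_small)  # few expensive samples
    low_small = KRR(x_small, y_low_small)    # cheap labels on the same few samples
    return lambda x: low_large.predict(x) + high_small.predict(x) - low_small.predict(x)


def low(x: float) -> float:  # toy "cheap" fidelity
    return math.sin(x)


def high(x: float) -> float:  # toy "expensive" fidelity
    return math.sin(x) + 0.3 * x


xs_large = [0.5 * i for i in range(9)]  # 0.0, 0.5, ..., 4.0
xs_small = xs_large[::4]                # 0.0, 2.0, 4.0 (nested in xs_large)
model = mfml_two_fidelity(
    xs_large, [low(x) for x in xs_large],
    xs_small, [low(x) for x in xs_small], [high(x) for x in xs_small],
)
print(model(3.0), high(3.0))
```

Because KRR predictions are linear in the training labels, the last two sub-models together act like a single model trained on the differences `high - low`, i.e. a learned correction on top of the cheap model. The `low_small` sub-model also needs low-fidelity labels for every geometry in the small set, which is why this plain combination relies on nested training data.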