Research topic: Prediction of yields and rates of chemical reactions
Project description: The economic effectiveness of a reaction is reflected in its yield, which summarizes all the resources needed for synthesis, including reagent consumption and human labor. Low-yielding reactions are ineffective and have to be eliminated from possible retrosynthetic paths at the early stages of planning. During the research project, the ESR will develop quantitative and qualitative yield prediction models for several well-defined types of chemical reactions for which yield information is available at Enamine. The main objectives are: identification of the most frequent reaction types with yield information and collection of a diverse database; development of classification (go/no-go) and regression (0-100%) models for the selected types; and exploration of different representations of chemical reactions, including but not limited to SMILES, reaction fingerprints, physico-chemical properties of solvent and catalyst, and data-driven embeddings, in collaboration with other ESRs.
Personal Introduction: Early-stage researcher of the Advanced machine learning for Innovative Drug Discovery (AIDD) consortium, focused on the development of machine learning models for chemical reaction yield predictions.
My background is in chemistry and chemoinformatics. I received my BSc in Chemistry from National Taras Shevchenko University and a double MSc in Chemoinformatics from National Taras Shevchenko University and the University of Strasbourg, with a curriculum focused on a multidisciplinary understanding of modern problems in chemistry, biology, and physics, as well as organic chemistry and chemoinformatics. During the first year of my Master's studies, I did an industrial internship at Enamine, designing a small library of ligands and docking them to E3 ligase to select the most promising ones. My master's thesis project was done in collaboration with the Institut de Génétique et de Biologie Moléculaire et Cellulaire (IGBMC, France), where I spent 5 months on a molecular dynamics study of the interaction of small ligands with catechol-O-methyltransferase (COMT), an enzyme linked to Parkinson's disease. This project resulted in a publication in Molecules.
My current research focuses on developing machine learning models that predict the yields of organic reactions, and on identifying the reaction features that contribute most to the predictive power of these models. The aim of my work is to provide useful tools for decision-making in chemistry that allow chemists to choose economically profitable reaction paths.
Pre-prints and articles
- Andronov, M.; Voinarovska, V.; Andronova, N.; Wand, M.; Clevert, D.-A.; Schmidhuber, J. Reagent Prediction with a Molecular Transformer Improves Reaction Data Quality. Chem. Sci. 2023. https://doi.org/10.1039/D2SC06798F
Presentations at conferences and meetings
- Voinarovska, V.; Dudenko, D.; Torren-Peraire, P.; Tetko, I.; Genheden, S. Addressing the applicability domain in yield prediction. 23rd EuroQSAR, Heidelberg, Germany, September 26-30, 2022.
- Andronov, M.; Voinarovska, V.; Wand, M.; Schmidhuber, J. Reagent Prediction with a Molecular Transformer Improves Reaction Data Quality. 23rd EuroQSAR, Heidelberg, Germany, September 2022.
Helmholtz Munich, Germany, October 1st, 2021 - March 31st, 2023
AstraZeneca AB, Sweden, April 1st, 2023 - September 30th, 2024
Janssen Pharmaceutica NV, Belgium, February 2023