Paula Torren Peraire


Nationality: Spanish

Research topic: Prediction of chemical synthesis using NLP models

Project description: A retrosynthetic task can be formulated as a language translation problem and tackled with recurrent LSTM-like neural networks or hybrid Transformer architectures. The first promising results used simplified reactions from patents in which only reactants and products were kept, while all reagents and conditions were neglected. In this research project, the ESR will develop a model that takes into account all available data about a reaction without any simplification. The main objectives are: development of a reaction database by combining public and in-house databases; identification of the most appropriate neural network architectures, in collaboration with ESR12, to generate a set of reagents (“paths”) to synthesize a target molecule; and extension of the model by incorporating available information on conditions (reagents, catalyst, solvent, temperature, etc.), expert knowledge, and additional scoring filters based on yield estimation, in collaboration with other ESRs.

Personal Introduction: Early-stage researcher focusing on retrosynthesis prediction, in particular, the development of template-free approaches.

During my MPharm at the University of Barcelona, I explored the field of pharmaceutical sciences and became interested in the growing role of computer science within drug discovery. I then completed a Master’s at Pompeu Fabra University, exploring the different areas of Bioinformatics for Health Sciences. This culminated in a Master’s thesis in the Structural Bioinformatics and Network Biology group at the Institute for Research in Biomedicine Barcelona (IRB Barcelona), where I further developed my interest in machine learning applied to life science problems. At this point, I aimed to combine my background in pharmaceutical and computational sciences by focusing on the use of machine learning within cheminformatics.

My current research is focused on reaction prediction, in particular the prediction of retrosynthesis. With the onslaught of novel compounds being developed, it is crucial to quickly explore and produce their synthesis routes. Normally, synthesis routes are formulated from building blocks to products. With retrosynthesis, we instead break down a target molecule into less complex molecules that are easily synthesizable or purchasable. This allows us to reason backwards from the final molecule to produce the route. Importantly, I hope to focus on developing models that are not just quantitatively successful but also useful for chemists and other specialists in real-life applications.

Contact: GitHub LinkedIn Twitter GoogleScholar ORCID

Pre-prints and articles:

Presentations at conferences and meetings:

  • Torren-Peraire P. Enhancing Chemical Synthesis Planning through Combining Single-Step and Multi-Step Retrosynthesis Prediction Strategies. STB seminar (internal departmental seminar). May 26, 2023.
  • Torren-Peraire P. Mind the Retrosynthesis Gap: Bridging the divide between Single-step and Multi-step Retrosynthesis Prediction. AIDD on-line seminar. April 5, 2023.
  • Torren-Peraire P. AI in the Lab: How Machine Learning Can Transform Chemical Synthesis. Pint of Science Munich. May 22, 2023.
  • Hassen AK., Torren-Peraire P., Genheden S., Verhoeven J., Preuss M., Tetko I. Mind the Retrosynthesis Gap: Bridging the divide between Single-step and Multi-step Retrosynthesis Prediction. <interact> conference. March 30, 2023.
  • Hassen AK., Torren-Peraire P., Genheden S., Verhoeven J., Preuss M., Tetko I. Mind the Retrosynthesis Gap: Bridging the divide between Single-step and Multi-step Retrosynthesis Prediction. NeurIPS 2022 workshop AI for Science: Progress and Promises. December 2, 2022.
  • Hassen AK., Torren-Peraire P., Genheden S., Verhoeven J., Preuss M., Tetko I. Mind the Retrosynthesis Gap: Bridging the divide between Single-step and Multi-step Retrosynthesis Prediction. arXiv. December 12, 2022. arXiv:2212.11809
  • Voinarovska, V.; Dudenko, D.; Torren-Peraire, P.; Tetko, I.; Genheden, S. Addressing the applicability domain in yield prediction. 23rd EuroQSAR, Heidelberg, Germany. September 26-30, 2022.

Organizations:

Helmholtz Munich, Germany, February 1st, 2022 - June 30th, 2023 (tentative)

Janssen Pharmaceutica NV, Belgium, July 1st, 2023 (tentative) - December 31st, 2024


Secondments:

AstraZeneca AB, Sweden, August 22nd, 2022 - October 15th, 2022
