Yasmine Nahal | Advanced machine learning for Innovative Drug Discovery (AIDD)


Nationality: Tunisian

Research topic: Improve machine learning models for drug design with human input

Project description: Not all medicinal chemistry knowledge is explicit and currently usable for machine learning modelling in drug design. A new and emerging area in machine learning is knowledge elicitation from human experts to improve the prediction accuracy of models. Drug discovery projects start with a small number of active compounds, which limits the use of standard machine learning techniques and results in low-quality models. In this project the ESR will include domain experts as an additional information source. Knowledge elicitation will be formulated as a probabilistic inference process, where expert knowledge is sequentially queried to improve predictions. The major outcomes will be the development of a general methodology to elicit knowledge from human experts in drug discovery, an analysis of the available descriptors, and the application of the methodology to query experts and improve the models. The approach will be validated in synthesis optimization and biological assay inference in collaboration with other ESRs.
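As a rough illustration of the sequential querying idea described above, the sketch below runs a toy elicitation loop in Python. It is not the project's methodology: it assumes a Bayesian linear model over made-up molecular descriptors, a simulated "expert" who returns noisy property estimates, and a simple rule that always queries the candidate with the highest predictive variance. All names (`posterior`, `predictive_variance`, `run_elicitation`) and all parameter values are hypothetical.

```python
import numpy as np


def posterior(X, y, alpha=1.0, noise_var=0.25):
    """Gaussian posterior over the weights of a Bayesian linear model.

    alpha is the prior precision and noise_var the assumed feedback noise;
    both are illustrative values, not taken from the project.
    """
    d = X.shape[1]
    precision = alpha * np.eye(d) + X.T @ X / noise_var
    cov = np.linalg.inv(precision)
    mean = cov @ X.T @ y / noise_var
    return mean, cov


def predictive_variance(X, cov, noise_var=0.25):
    """Predictive variance x^T cov x + noise_var for each row x of X."""
    return np.einsum("ij,jk,ik->i", X, cov, X) + noise_var


def run_elicitation(n_queries=10, seed=0):
    """Query a simulated expert sequentially; return the posterior
    covariance trace after each update as a crude uncertainty measure."""
    rng = np.random.default_rng(seed)
    d = 5
    true_w = rng.normal(size=d)        # hidden ground truth (simulation only)
    pool = rng.normal(size=(200, d))   # hypothetical descriptor vectors

    def simulated_expert(x):
        # Stand-in for a chemist's answer: true value plus noise.
        return x @ true_w + rng.normal(scale=0.5)

    # Start from a small labelled set, mimicking an early-stage project.
    queried = [int(i) for i in rng.choice(len(pool), size=3, replace=False)]
    X = pool[queried]
    y = np.array([simulated_expert(x) for x in X])

    traces = []
    for _ in range(n_queries):
        _, cov = posterior(X, y)
        traces.append(float(np.trace(cov)))
        var = predictive_variance(pool, cov)
        var[queried] = -np.inf          # never ask about the same item twice
        q = int(np.argmax(var))         # most uncertain candidate
        queried.append(q)
        X = np.vstack([X, pool[q]])
        y = np.append(y, simulated_expert(pool[q]))

    _, cov = posterior(X, y)
    traces.append(float(np.trace(cov)))
    return traces


if __name__ == "__main__":
    for step, t in enumerate(run_elicitation()):
        print(f"after {step} queries: trace of posterior covariance = {t:.4f}")
```

In this toy setting the trace of the posterior covariance cannot increase as answers are added, which is one simple way to see how each expert query reduces model uncertainty. A real elicitation method would need a more careful model of expert feedback and of the cost of each query.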

Personal Introduction: As a multidisciplinary scientist with a background in Bioinformatics and Applied Data Science, I am interested in applying computational approaches to solve biological problems. While studying for my Bachelor's degree in Life Sciences, I took a keen interest in understanding biomolecular structures and their involvement in disease mechanisms. After completing my Bachelor's degree, I pursued a Master's degree in Bioinformatics, aiming to gain expertise in the field of computational drug design. During this Master's program, I had the opportunity to conduct two exciting internship projects at CNRS and Sanofi, both involving the use of computational approaches for different applications: from deciphering the structural polymorphism of a therapeutic target of interest with molecular dynamics simulation to identifying potential active compounds with docking and virtual screening.

I then became particularly interested in applying Machine Learning to Drug Discovery, so I decided to consolidate my training with a second Master's degree in Applied Data Science, where I gained more expertise in Python programming for Machine Learning. I completed my thesis project at Iktos, a start-up developing deep-learning-based de novo drug design technologies, where I contributed to the development of a proprietary API for predicting docking scores to accelerate computations in large-scale virtual screening campaigns, and to improving the model using active learning.

My current PhD project focuses on applying Bayesian and probabilistic modeling to develop novel and efficient human-in-the-loop machine learning methods. These methods can be used to elicit chemists' knowledge about small molecules and their physicochemical properties, then use this knowledge as additional data to infer and update the parameters of the machine learning models used for decision-making and the design of new drug molecules. A human-in-the-loop method could be particularly useful when insufficient training data are available or when model uncertainty is high, both of which are major bottlenecks in developing highly reliable models that can be deployed in production.

Contact: GitHub, LinkedIn, Twitter, Google Scholar

Articles and Pre-prints

Nahal, Y., Menke, J., Martinelli, J., Heinonen, M., Kabeshov, M., Janet, J. P., Nittinger, E., Engkvist, O. and Kaski, S. Human-in-the-loop active learning for goal-oriented molecule generation. Journal of Cheminformatics. 2024. https://doi.org/10.1186/s13321-024-00924-y
Nahal, Y. et al. Towards Interpretable Models of Chemist Preferences for Human-in-the-Loop Assisted Drug Discovery. In: Clevert, DA., Wand, M., Malinovská, K., Schmidhuber, J., Tetko, I.V. (eds) AI in Drug Discovery. ICANN 2024. Lecture Notes in Computer Science, vol 14894. Springer, Cham. 2024. https://doi.org/10.1007/978-3-031-72381-0_6

Presentations at conferences and meetings

Martinelli, J.; Nahal, Y.; Lê, D.; Engkvist, O.; Kaski, S. Leveraging expert feedback to align proxy and ground truth rewards in goal-oriented molecular generation. Poster at NeurIPS2023 Workshop AI4D3. 2023. https://openreview.net/forum?id=KWIM7ZNYxb
Nahal, Y., Heinonen, M., Engkvist, O., Kaski, S. Human-in-the-loop active learning to improve molecular design and optimization. In AstraZeneca Molecular Design meetings. April 13, 2023.
Nahal, Y. A Survey on Human-in-the-loop Machine Learning. Online AIDD lecture. March 16, 2022.
Nahal, Y. Learning from user feedback to improve recommender models and potential applications to molecular design. In Finnish Center of Artificial Intelligence Virtual Drug Design Lab seminars. March 1, 2022.
Nahal, Y., Heinonen, M., Engkvist, O., Kaski, S. Human-in-the-loop active learning to improve molecular design and optimization. In Finnish Center of Artificial Intelligence Virtual Drug Design Lab seminars. September 6, 2022.
Nahal, Y. et al. HITL active learning for goal-oriented molecule generation. At ELLIS Doctoral Symposium 2023, Helsinki. August 28, 2023.
Nahal, Y. et al. Leveraging expert feedback to align proxy and ground truth rewards in goal-oriented molecular generation. At NeurIPS 2023, New Frontiers of AI for Drug Discovery and Development Workshop, New Orleans, USA. December 12-16, 2023.
Nahal, Y. et al. Human-in-the-loop active learning for goal-oriented molecule generation. At Aalto CS Research Day 2024, Helsinki. October 9th, 2024.

Organizations:

AstraZeneca AB, Sweden - November 1st, 2021 - April 30, 2023

Aalto University, Finland - May 1st, 2023 - August 31st, 2024


Secondment: University of Vienna, Austria, June 2024