Yasmine Nahal


Nationality: Tunisian

Research topic: Improve machine learning models for drug design with human input

Project description: Not all medicinal chemistry knowledge is explicit or currently usable for machine learning modelling in drug design. An emerging area in machine learning is knowledge elicitation from human experts to improve the prediction accuracy of models. Drug discovery projects typically start with a small number of known active compounds. This limits the use of standard machine learning techniques and results in low-quality models. In this project, the ESR will include domain experts as an additional information source. Knowledge elicitation will be formulated as a probabilistic inference process, in which expert knowledge is queried sequentially to improve predictions. The major outcomes will be:
- the development of a general methodology to elicit knowledge from human experts in drug discovery;
- an analysis of the available descriptors;
- the application of the methodology to query experts and improve the models.
The approach will be validated in synthesis optimization and biological assay inference, in collaboration with other ESRs.

Personal Introduction: As a multidisciplinary scientist with a background in Bioinformatics and Applied Data Science, I am interested in applying computational approaches to solve biological problems. While studying for my Bachelor's degree in Life Sciences, I took a keen interest in understanding biomolecular structures and their involvement in disease mechanisms. After obtaining my Bachelor's degree, I pursued a Master's degree in Bioinformatics to gain expertise in computational drug design. During this Master's program, I had the opportunity to carry out two exciting internship projects, at CNRS and Sanofi. Both involved computational approaches for different applications, ranging from deciphering the structural polymorphism of a therapeutic target of interest with molecular dynamics simulations to identifying potential active compounds with docking and virtual screening.

I then became particularly interested in applying machine learning to drug discovery. I therefore decided to complement my training with a second Master's degree in Applied Data Science, where I gained further expertise in Python programming for machine learning. I completed my thesis project at Iktos, a start-up developing deep-learning-based de novo drug design technologies. There, I contributed to the development of a proprietary API that predicts docking scores to accelerate computations in large-scale virtual screening campaigns, and to improving the underlying model with active learning.

My current PhD project focuses on applying Bayesian and probabilistic modeling to develop novel, efficient human-in-the-loop machine learning methods. These methods elicit chemists' knowledge about small molecules and their physicochemical properties. This knowledge then serves as additional data to infer and update the parameters of the machine learning models used for decision-making and for designing new drug molecules. A human-in-the-loop method could be particularly useful when too little training data is available or when model uncertainty is high. Both situations are major bottlenecks for developing highly reliable models that can be deployed in production.

Contact: GitHub, LinkedIn, Twitter


AstraZeneca AB, Sweden - November 1, 2021 to April 30, 2023

Aalto University, Finland - May 1, 2023 to August 31, 2024


Secondment: University of Vienna, Austria, June 2024