Dr. Peter Hartog | Advanced machine learning for Innovative Drug Discovery (AIDD)

Peter Hartog

Nationality: Dutch

Research topic: One Chemistry: Unified and interpretable deep neural networks model for drug discovery

Project description: Statistical modelling in drug design projects reveals correlations between chemical compounds' features and their physicochemical and biological endpoints. Though the final models possess excellent statistical characteristics, the reasoning behind their predictions is limited by the training dataset. Combining different expert modules, each with specific internal knowledge, and then retraining them simultaneously on a particular problem can support the model's reasoning by linking together knowledge about designing new compounds, their properties, biological responses, and synthetic accessibility. The One-Chemistry model will thus explain why it has picked a particular drug candidate, relating its choice to different areas of chemistry and biology. The ESR will implement a complex system for AI drug discovery, with pluggable modules, to design new chemical compounds. The overall method and software will be experimentally validated in ESR16's research project, which aims at finding new effective drugs against prostate cancer.

Personal introduction: Early-stage researcher of the Advanced machine learning for Innovative Drug Discovery (AIDD) consortium, focussing on model interpretability and machine learning architectures to combine multiple models.

My experience in both pharmaceutical sciences and machine learning in a research setting, gained during my university internships at the Leiden Academic Centre for Drug Research (LACDR) and the National Institute for Public Health and the Environment of the Netherlands (RIVM), inspired me to apply for the doctoral position in AIDD. My bachelor's and master's degrees in biopharmaceutical science from Leiden University and previous lab research with data from machine learning models allow me to identify how machine learning models are applied in the field. My experience with machine learning has shown me both the promise and the pitfalls of machine learning for drug discovery.

My research focusses on model interpretability, specifically on making machine learning models more interpretable for lab researchers. Too often, lab researchers are asked to trust machine learning models based on their metrics, rather than their logic. Lab researchers need the chance to apply their molecule-based logic to the predictions made by machine learning models in order to assess the validity of a prediction. My work will help lab researchers critically assess specific predictions by analysing the model reasoning in a chemically understandable manner.

Additionally, my research will focus on combining the models from the rest of the consortium into one coherent “One Chemistry” model. Multiple tools exist with similar model reasoning. Combining the reasoning from these models should improve the predictive ability of the combined model. One Chemistry will allow users to make predictions for a variety of tasks, ranging from chemical synthesis to biological binding affinity, using the combined reasoning of models created by the AIDD consortium.

Contact: GitHub LinkedIn Twitter GoogleScholar ORCID

Presentations at conferences and meetings:

Svensson, E., Hartog, P. AIDD Codebase: a Framework for Model Integration, Collaboration and Sharing. AIDD on-line seminar (June 22, 2022)
Hartog, P., Genheden, S., Tetko, I. Two sides of the same coin: The effect of SMILES-based molecular representations on explainability. <Interact> Conference (March 31, 2023)
Hartog, P., Svensson, E., Mervin, L., Genheden, S., Engkvist, O., Tetko, I.V. Registries in Machine Learning-Based Drug Discovery: A Shortcut to Code Reuse. In: Clevert, DA., Wand, M., Malinovská, K., Schmidhuber, J., Tetko, I.V. (eds) AI in Drug Discovery. "AI in Drug Discovery" Workshop at ICANN2024 (Poster, 19th September 2024)

Articles and pre-prints:

Kopp, A.; Hartog, P.; Šícho, M.; Godin, G.; Tetko, I. The openOCHEM consensus model is the best-performing open-source predictive model in the First EUOS/SLAS Joint Compound Solubility Challenge. SLAS Discovery. 2024. https://doi.org/10.1016/j.slasd.2024.01.005
Hartog, P.; Krüger, F.; Genheden, S.; Tetko, I.V. Using test-time augmentation to investigate explainable AI: inconsistencies between method, model and human intuition. Journal of Cheminformatics. 2024. https://doi.org/10.1186/s13321-024-00824-1
Hartog, P., Svensson, E., Mervin, L., Genheden, S., Engkvist, O., Tetko, I.V. Registries in Machine Learning-Based Drug Discovery: A Shortcut to Code Reuse. In: Clevert, DA., Wand, M., Malinovská, K., Schmidhuber, J., Tetko, I.V. (eds) AI in Drug Discovery. ICANN 2024. Lecture Notes in Computer Science, vol 14894. Springer, Cham. 2024. https://doi.org/10.1007/978-3-031-72381-0_9
Hartog, P., Westerlund, A., Tetko, I.V., Genheden, S. Investigations into the efficiency of computer-aided synthesis planning. Journal of Chemical Information and Modeling. 2025. https://doi.org/10.1021/acs.jcim.4c01821

Organizations:

Helmholtz Munich, Germany, November 1st, 2021 - April 30th, 2023

AstraZeneca AB, Sweden, May 1st, 2023 - October 30th, 2024


Secondments:

Johannes Kepler Universität Linz, Austria, September 11th - October 14th
