Son Hà

Son Ha

Research topic: Integrating microscopy images from different sources to inform compound design

Project description: Machine learning techniques require a suitable chemical compound representation to build reliable predictive models. Although classical descriptor-based methods have proven effective, the number of adjustable parameters is usually many times larger than the number of active compounds, especially in early-stage drug development projects. This leads to low-quality models. Supporting the modelling with auxiliary data sources such as microscopy image data improves the robustness of the models. It also offers a different notion of similarity, in which structurally diverse compounds can be similar because of their biological effects. The ESR will work on integrating microscopy image representations to guide compound design. The major outcomes will be: (1) development of an open-source domain adaptation workflow and corresponding methodology that uses deep learning to integrate the data generated in the JUMP-CP project with other public datasets and proprietary datasets from the Max Planck Institute and/or Janssen Pharmaceutica; (2) development of a pipeline that combines computer vision techniques with convolutional neural networks to extract relevant features from microscopy images that can be used to anticipate the biological or toxicological effect of a compound, in collaboration with ESR8; (3) extension of state-of-the-art generative models (e.g. RNNs, GANs, VAEs) to combine the biological information contained in the images with the chemical structure, in order to automatically design compounds that mimic a morphological response at the cellular level, and validation of the proposed molecules in other ESR projects.

Personal introduction: Early-stage researcher in the Advanced machine learning for Innovative Drug Discovery (AIDD) consortium. My main research interest is developing machine learning methods that predict bioassay outcomes in a few-shot manner using microscopy image data.

Coming from a Maths and Statistics background at the University of Oxford, I have always enjoyed the applied side of mathematics. My internship at the Big Data Institute in Oxford and my time working with the Oxford Protein Informatics Group inspired me to apply my mathematical knowledge to the life sciences. It was thanks to these experiences that I applied for the doctoral position here.

My research is on the prediction of bioassay outcomes in a low-data regime, using microscopy images as the molecular representation. This is an important task in drug discovery because it helps identify hit compounds. In fact, predicting biological assays on the basis of high-throughput microscopy data has been shown to lead to a tremendous increase in hit rates over traditional screening methods in previous drug discovery campaigns. We hope to build on these promising results by incorporating new ideas from computer vision and few-shot learning.

We are working with high-throughput microscopy image data as the molecular representation for these models. Currently, our focus is on the Cell Painting assay developed by the Broad Institute. Specifically, we use the JUMP-CP dataset, which is the largest Cell Painting dataset of its kind for image-based drug discovery.

Contact: GitHub, LinkedIn, Twitter


TU Dortmund, Germany: October 1, 2021 - February 28, 2022

Johannes Gutenberg-Universität Mainz: March 1, 2022 - March 31, 2023

Janssen Pharmaceutica NV, Belgium: April 1, 2023 - September 30, 2024



Johannes Kepler Universität Linz, Austria: July and December 2022