Research topic: Integrating microscopy images from different sources to inform compound design
Project description: Machine learning techniques require a suitable representation of chemical compounds to build reliable predictive models. Although classical descriptor-based methods have shown their effectiveness, the number of adjustable parameters is usually many times larger than the number of active compounds, especially in early-stage drug development projects. This leads to low-quality models. Supplementing the modelling with auxiliary data sources such as microscopy image data improves the robustness of the models. It also offers a different notion of similarity, under which structurally diverse compounds can be similar because of their biological effects. The ESR will work on integrating microscopy image representations to guide compound design. The major expected outcomes are:
- development of an open-source domain adaptation workflow and corresponding methodology that uses deep learning to integrate the data generated in the JUMP-CP project with other public datasets and with proprietary datasets from the Max Planck Institute and/or Janssen Pharmaceutica;
- development, in collaboration with ESR8, of a pipeline that combines computer vision techniques with convolutional neural networks to extract relevant features from microscopy images, which can then be used to anticipate the biological or toxicological effect of a compound;
- extension of state-of-the-art generative models (e.g. RNNs, GANs, VAEs) by combining the biological information contained in the images with the chemical structure, in order to automatically design compounds that mimic a given morphological response at the cellular level, and validation of the proposed molecules in other ESR projects.
Personal introduction: Early-stage researcher (ESR) in the Advanced machine learning for Innovative Drug Discovery (AIDD) consortium. My main research interest is developing machine learning methods that predict bioassay outcomes in a few-shot manner using microscopy image data.
Coming from a Mathematics and Statistics background at the University of Oxford, I have always enjoyed the applied side of mathematics. My internship at the Big Data Institute in Oxford and my time working with the Oxford Protein Informatics Group inspired me to apply my mathematical knowledge to the life sciences. These experiences led me to apply for the doctoral position here.
My research focuses on predicting bioassay outcomes in a low-data regime, using microscopy images as the molecular representation. This is an important task in drug discovery because it helps identify hit compounds. In previous drug discovery campaigns, predicting biological assay outcomes from high-throughput microscopy data has been shown to substantially increase hit rates over traditional screening methods. We hope to build on these promising results by incorporating new ideas from computer vision and few-shot learning.
The microscopy data we use as molecular representation for these models are high-throughput images. Our current focus is the Cell Painting assay developed by the Broad Institute, and specifically the JUMP-CP dataset, the largest Cell Painting dataset of its kind for image-based drug discovery.
Presentations at conferences and meetings
- Ha, S.V. FSL-CP: Few-shot prediction of small molecule activity using cell microscopy images. AIDD online seminar. May 17th, 2023.
- Ha, S.V. Few-shot bioassay prediction with Cell Painting for drug discovery. RDKit UGM 2022. October 13th, 2022.
- Ha, S.V., Tandon, A., Czodrowski, P. Overview of Czodrowski Lab. AK-Symposium, Johannes Gutenberg University Mainz. November 10th, 2022.
TU Dortmund, Germany: October 1st, 2021 to February 28th, 2022
Johannes Gutenberg-Universität Mainz, Germany: March 1st, 2022 to March 31st, 2023
Janssen Pharmaceutica NV, Belgium: April 1st, 2023 to September 30th, 2024
Johannes Kepler Universität Linz, Austria: July and December 2022