Ana Sánchez Fernández


Nationality: Spanish

Research topic: Learning Representation for Molecules from Chemical Structures and Microscopy Images

Project description: Most representations of chemical compounds suitable for machine learning algorithms focus on structures and their corresponding descriptors rather than on phenotypic behavior. This may limit chemists in knowledge discovery, i.e., in discovering diverse series of compounds that have similar biological effects on the protein target. In this research project, the ESR will study machine learning algorithms that learn a representation of molecules from their chemical structures and corresponding microscopy image data. In this new representation, two molecules are close to each other if they have either similar structure or similar phenotypic behavior. The major objectives include the development of the methodology and corresponding publicly available software packages for data-driven representation of molecules based on their internal structural characteristics and microscopy images, as well as the development of the concept of chemical-phenotypic similarity and its validation in generative models and in models for identifying compounds likely to interfere with biological assays, in collaboration with other ESRs.

Personal Introduction: Early-stage researcher in the Advanced machine learning for Innovative Drug Discovery (AIDD) consortium, focusing on representation learning methods for molecules using microscopy imaging data.

Throughout my undergraduate studies in Biochemistry, I gained an understanding of the fundamental processes that maintain living systems and how they interrelate, from both a biological and a chemical point of view. Because cell signaling sparked my interest, I wrote my thesis on MAP kinase pathways in multi-drug resistant cancer cells. Afterward, my interest in how computer science can help solve biological problems led me to pursue an MSc in Bioinformatics. During this time, I had the opportunity to spend one year at the Barcelona Supercomputing Center, where I developed computational tools for drug discovery. Specifically, I built an API for predicting activities over compound libraries using different types of molecular fingerprints. I also developed PELEpharmacophore, a Python package that generates pharmacophore models based on PELE, a Monte Carlo protein simulation program. This was done in collaboration with the spin-off NostrumBiodiscovery, which gave me some insight into research in an industrial setting.

My work focuses on developing a contrastive learning method for image-based and structure-based representations of small molecules. Including the effects that a molecule has on a biological system early in the drug discovery process might help improve clinical success rates. Moreover, microscopy imaging has the advantage of being time- and cost-effective compared to standard activity measurements. Therefore, characterizing a small molecule by the morphological changes it induces in a cell, which is the aim of this project, is one of the upcoming challenges for accelerating the drug discovery process.
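
As a rough illustration of the contrastive idea described above, and not the project's actual implementation, the sketch below computes a symmetric InfoNCE-style loss between paired image and structure embeddings. It assumes that row i of each array belongs to the same molecule; the function names, the NumPy-only setup, and the temperature value are all hypothetical choices made for this example.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


def log_softmax(logits):
    """Row-wise log-softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def symmetric_contrastive_loss(image_emb, structure_emb, temperature=0.1):
    """Symmetric InfoNCE-style loss (illustrative assumption, not the project's method).

    Row i of image_emb and row i of structure_emb are treated as a matching
    pair; every other row in the batch serves as a negative.
    """
    img = l2_normalize(np.asarray(image_emb, dtype=float))
    mol = l2_normalize(np.asarray(structure_emb, dtype=float))
    logits = img @ mol.T / temperature           # pairwise similarities
    idx = np.arange(logits.shape[0])
    loss_img_to_mol = -log_softmax(logits)[idx, idx].mean()
    loss_mol_to_img = -log_softmax(logits.T)[idx, idx].mean()
    return 0.5 * (loss_img_to_mol + loss_mol_to_img)


# Toy check: correctly paired embeddings should score a lower loss
# than the same embeddings with the pairing shifted by one row.
paired = np.eye(4)
mismatched = np.roll(paired, 1, axis=0)
loss_paired = symmetric_contrastive_loss(paired, paired)
loss_mismatched = symmetric_contrastive_loss(paired, mismatched)
```

Minimizing a loss of this kind pulls the two views of the same molecule together in the shared embedding space and pushes apart views of different molecules, which is one common way to obtain a representation in which closeness reflects both structure and phenotype.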

Contact: GitHub LinkedIn Twitter


Johannes Kepler Universität Linz, Austria, September 1st, 2021 - February 28th, 2023

Janssen Pharmaceutica NV, Belgium, March 1st, 2023 - August 31st, 2024



Johannes Gutenberg-Universität Mainz, October 2022