!!! The ESR3 position has been re-opened !!!
ESR3: Prediction of chemical synthesis using NLP models
Chemical synthesis is critical to further improving quality of life by contributing to new medicines and new materials. An optimal synthesis can decrease costs as well as the amount of chemical waste produced. The cornerstone of chemical synthesis is the prediction of either the direct reaction (which new chemical compound results from mixing a set of reactants) or the retrosynthesis (which compounds are the starting materials needed to make a given product). ESR3 will develop a new method (based on the preliminary results [1,2]) to predict the outcome of reactions. The goal is to extend the published models by incorporating additional information about experiments (reagents, catalyst, solvent, temperature, etc.) and expert knowledge. The fellow will actively collaborate with ESR13 (QM models for reactivity prediction), ESR4 (prediction of the yield of chemical reactions), and ESR7 (multi-objective synthesis planning) and develop a solid theoretical foundation as well as practical intuition for how additional data and knowledge can improve the models.
1. Karpov P., Godin G., Tetko I.V.: A Transformer Model for Retrosynthesis. In: Artificial Neural Networks and Machine Learning – ICANN 2019: Workshop and Special Sessions: 17th-19th September 2019; Munich. Springer International Publishing: 817-830.
2. Tetko I.V., Karpov P., Van Deursen R., Godin G.: State-of-the-art augmented NLP transformer models for direct and single-step retrosynthesis. Nat Comm 2020, 11(1):1-11.
Send your application to firstname.lastname@example.org, preferably before July 30, 2021. The position will be filled as soon as a qualified candidate is found.
For all other positions we have selected candidates who have accepted offers.
The project offers 15 Ph.D. positions. The positions are expected to start in Summer - early Autumn. Please read the information on the page carefully before applying.
Eligibility and Mobility Rule
Early-Stage Researchers (ESRs) shall, at the time of recruitment by the host organization, be in the first four years (full-time equivalent research experience) of their research careers and not have been awarded a doctoral degree.
Date of Recruitment normally means the first day of the employment of the fellow for the purposes of the project (i.e. the starting date indicated in the employment contract or equivalent direct contract).
Full-Time Equivalent Research Experience is measured from the date when a researcher obtained the degree which would formally entitle him/her to embark on a doctorate, either in the country in which the degree was obtained or in the country in which the researcher is recruited or seconded, irrespective of whether or not a doctorate is or was ever envisaged.
At the time of recruitment by the host organization, researchers must not have resided or carried out their main activity (work, studies, etc.) in the country of their host organization for more than 12 months in the 3 years immediately prior to the reference date. Compulsory national service and/or short stays such as holidays are not taken into account. As far as international European interest organizations or international organizations are concerned, this rule does not apply to the hosting of eligible researchers. However, the appointed researcher must not have spent more than 12 months in the 3 years immediately prior to their recruitment at the host organization.
For refugees under the Geneva Convention (1951 Refugee Convention and the 1967 Protocol), the refugee procedure (i.e. before refugee status is conferred) will not be counted as ‘period of residence/activity in the country of the beneficiary’.
Eligibility and Mobility Rules are assessed only at the time of the first employment.
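As a rough self-check of the mobility rule above, the following minimal sketch counts the days spent in the host country during the 3 years before recruitment. It is only an illustration: the function names are hypothetical, "12 months" is approximated as 365 days (an assumption, not part of the official rule), and periods that do not count (compulsory national service, short stays such as holidays, the refugee procedure) must be left out of the input by the applicant. The official rule text remains authoritative.

```python
from datetime import date, timedelta


def days_in_host_country(stays, recruitment_date, window_years=3):
    """Count days spent in the host country inside the look-back window.

    `stays` is a list of (start, end) dates, end inclusive, assumed not to overlap.
    Exempt periods (holidays, national service, ...) must already be excluded.
    """
    try:
        window_start = recruitment_date.replace(
            year=recruitment_date.year - window_years
        )
    except ValueError:
        # Recruitment on 29 February and the target year is not a leap year.
        window_start = recruitment_date.replace(
            year=recruitment_date.year - window_years, day=28
        )
    last_day = recruitment_date - timedelta(days=1)
    total = 0
    for start, end in stays:
        # Clip each stay to the window [window_start, recruitment_date).
        clipped_start = max(start, window_start)
        clipped_end = min(end, last_day)
        if clipped_end >= clipped_start:
            total += (clipped_end - clipped_start).days + 1
    return total


def satisfies_mobility_rule(stays, recruitment_date, limit_days=365):
    """Assumption: 'not more than 12 months' is approximated as <= 365 days."""
    return days_in_host_country(stays, recruitment_date) <= limit_days


# Hypothetical example: a six-month stay in the host country in 2019.
short_stay = [(date(2019, 1, 1), date(2019, 6, 30))]
print(satisfies_mobility_rule(short_stay, date(2021, 9, 1)))
```

Because each position has two hosts in sequence, the check applies to the country of the first host, at the date of the first employment.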
More details are available here.
We are using the Code of Conduct for the Recruitment of Researchers https://euraxess.ec.europa.eu/jobs/charter/code
Common requirements for all ESRs
All the applicants are expected to:
- have a Master's degree in computer science, physics, chemistry, or engineering, with a sincere interest in biology and the life sciences;
- have some prior expertise in one or more of the following fields: machine learning, modeling and simulation;
- be excellent in oral and written English with good presentation skills;
- possess strong interpersonal skills, excellent written and verbal communication, and the ability to work effectively both independently and in cross-functional teams;
- be a highly creative person with outstanding problem-solving ability and the willingness to undertake challenging analysis tasks in a timely fashion.
Furthermore, the following software skills are required:
- Excellent software engineering skills are essential. Programming skills in Python must be top-notch.
- Experience with relevant libraries (TensorFlow/PyTorch, the Python scientific stack) is necessary.
- Good command of modern software development tools, from git to continuous integration pipelines, is an additional plus.
The successful candidate will also demonstrate a passion for driving scientific questions with a positive and problem-solving attitude and the willingness to undertake challenging analysis tasks in a timely fashion. Excellent English, both spoken and written, is required, as is the ability to work effectively both independently and in cross-functional teams. We also believe that you enjoy teamwork, have a collaborative nature, and will be an encouraging colleague to all.
Descriptions of individual ESRs
For each position, the academic and industrial hosts are listed in the order of employment. For example, ESR1 will start at HMGU (Germany) and then continue his/her work at AstraZeneca (Sweden). Check this order against the mobility rule.
ESR1: One Chemistry: Unified and interpretable deep neural network model for drug discovery
This is a great opportunity to potentially shape the future of drug discovery by designing and working on cutting-edge deep learning architectures. Recent years have seen an explosion of interest in deep neural networks for drug discovery applications. This project aims to take the architecture to the next level by creating an interpretable "One-Chemistry" model, which unifies multiple predictions of physicochemical and ADME properties and incorporates tasks from the other Ph.D. projects within the AIDD. Interpretability is crucial: understanding the reasoning of the model is what allows it to support decision making in drug discovery projects. Statistical modeling in drug design projects reveals correlations between chemical compounds' features and their physicochemical and biological endpoints. Though the final models possess excellent statistical characteristics, the reasoning behind their predictions is limited by the training dataset. Combining different expert modules, each with specific internal knowledge, and subsequently retraining them simultaneously on a particular problem can support the model's reasoning by linking together knowledge about designing new compounds, their properties, biological responses, and synthetic accessibility. The "One-Chemistry" model will thus explain why it has picked a particular drug candidate, relating its choice to different areas of chemistry and biology. The successful Ph.D. student will implement a complex system for AI drug discovery, with pluggable modules, to design new chemical compounds. The overall method and software will be experimentally validated in collaboration with a Ph.D. research project aiming at finding new effective drugs against prostate cancer.
ESR2: One Chemistry: Robust learning of modular AI systems for molecular generation, chemical reactions, and synthesis
Here is a great Ph.D. project opportunity to shape the future of drug discovery by creating the foundation for robust learning of "One-Chemistry" deep learning models. The aim is to create a modular and unified system for deep learning of molecular tasks, and the project's focus is to explore and deliver robust learning and normalization techniques that enable end-to-end training of modular AI systems. The models created from multiple self-normalizing modules will be tested and evaluated for their application in drug discovery in collaboration with other Ph.D. projects within the AIDD training network focusing on generative models as well as reaction and synthesis prediction. The successful Ph.D. candidate will first analyze and review published modular AI systems in order to identify systems where robust normalization could improve performance in molecular generation, retrosynthesis, and synthesis-aware molecular generation. Robust self-normalizing techniques will then be developed and benchmarked on architectures developed in collaboration with other Ph.D. students within the AIDD training network.
ESR3 has been re-opened; see above.
ESR4: Prediction of yield and rates of chemical reactions
A chemical reaction's economic effectiveness strongly depends on its yield, which summarizes all resources needed for the synthesis, including reagent consumption and human labor. Low-yield reactions are ineffective, and synthesis planning has to avoid them during retrosynthetic path analysis. The ESR will develop quantitative and qualitative yield prediction models for several well-defined types of chemical reactions, building on preliminary results [1,2]. The fellow will analyze how different representations of chemical reactions, such as SMILES, reaction fingerprints, and physicochemical properties, influence the models. Experimental partners will test the predictions in a laboratory.
1. Kravtsov A.A., Karpov P.V., Baskin I.I., Palyulin V.A., Zefirov N.S.: Prediction of rate constants of SN2 reactions by the multicomponent QSPR method. Dokl Chem 2011, 440(2):299-301.
2. Gimadiev T., Madzhidov T., Tetko I., Nugmanov R., Casciuc I., Klimchuk O., Bodrov A., Polishchuk P., Antipin I., Varnek A.: Bimolecular Nucleophilic Substitution Reactions: Predictive Models for Rate Constants and Molecular Reaction Pairs Analysis. Mol Inform 2019, 38(4):e1800104.
ESR5: PhD on reactivity simulations by combining quantum mechanics and machine learning
Description of the work to perform during the PhD:
The next frontier in molecular simulations is to be able to perform reactive simulations using fast machine learning potentials. Accurate prediction of the outcomes of an organic reaction is still an unsolved task and only experienced chemists can make reliable predictions based on underlying mechanistic and quantum chemical intuition. In this research project, the PhD candidate will develop a new methodology for the fast simulation of reactive molecules based on fast machine-learning-quantum computation. The major expected outcomes are:
- Selection of a set of relevant and simple systems to produce a database of accurate quantum mechanical data. The data will be generated using GPUGRID.net.
- Training of models and neural network potentials to be used for the simulation of chemical reactions. Validation of the models by comparing predicted and experimental yields, in collaboration with other PhD students participating in the project.
- Further validation on internal synthesis data at Bayer and expansion of the applicability domain to a larger set of synthesis routes.
The PhD candidate will perform research for 1.5 years at UPF (Barcelona, Spain) and for 1.5 years at Bayer (Berlin, Germany).
The successful candidate should demonstrate a passion for driving scientific questions with a positive and problem-solving attitude and the willingness to undertake challenging analysis tasks in a timely fashion. Excellent English, both spoken and written, is required, as is the ability to work effectively both independently and in cross-functional teams. We also believe that you enjoy teamwork, have a collaborative nature and will be an encouraging colleague to all. Female candidates are particularly encouraged to apply.
ESR6: Integrating microscopy images from different sources to inform the compound design
Machine learning techniques require a proper chemical compound representation to build reliable predictive models. Although classical descriptor-based methods have proven effective, the number of adjustable parameters is usually many times larger than the number of active compounds, especially in early-stage drug development projects. This leads to low-quality models or precludes machine learning altogether. Supporting modeling with auxiliary data sources such as microscopy image data improves the robustness of the models. It also offers a different notion of similarity, in which structurally diverse compounds can be similar because of their biological effects. This project will work on integrating microscopy image representations to guide compound design. The major outcomes are:
1. Development of an open source-based domain adaptation workflow and corresponding methodology by using deep learning to integrate large public datasets and proprietary datasets from the Max Planck Institute of Molecular Physiology and/or Janssen Pharmaceutica.
2. Development of a pipeline using computer vision techniques together with convolutional neural networks in order to extract relevant features from microscopy images that can be used to anticipate the biological or toxicological effect of a compound.
3. Expansion of state-of-the-art generative models (e.g., RNNs, GANs, VAEs) by combining the biological information contained in the images with the chemical structure, in order to automatically design compounds that can mimic a morphological response at the cellular level and validate the proposed.
ESR7: Fast and scalable multi-objective synthesis route optimization
The applicant must have some prior expertise in one or more of the following fields: machine learning, modeling and simulation, and multi-objective optimization.
Modern organic synthesis is subject to requirements on yield, reliability, safety, hazard analysis, control performance, environmental quality, etc., apart from the major goal of achieving economic efficiency. These outcomes are often measured on different scales and are non-commensurate; therefore, they cannot be combined into a single, meaningful scalar objective function suited to conventional optimization techniques. The ESR will work on developing multi-objective optimization approaches for simultaneously designing new compounds and planning their synthetic routes. The major outcomes are:
- Development of the methodology of multi-objective synthesis planning.
- Development of publicly available software for building and using models for retrosynthesis.
- Benchmarking of the method against state-of-the-art algorithms for retrosynthesis planning in collaboration with ESR3, ESR5, ESR9, ESR12.
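To illustrate why non-commensurate objectives call for multi-objective methods rather than a single scalar score, here is a minimal sketch of Pareto filtering over candidate synthesis routes. The route names, the three objectives (cost, one minus yield, hazard score), and all numbers are hypothetical and chosen only for illustration; this is not the methodology the project will develop.

```python
def dominates(a, b):
    """True if `a` is no worse than `b` on every objective and strictly
    better on at least one. All objectives are minimized."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(routes):
    """Keep only the routes that no other route dominates."""
    return {
        name: objectives
        for name, objectives in routes.items()
        if not any(
            dominates(other, objectives)
            for other_name, other in routes.items()
            if other_name != name
        )
    }


# Hypothetical candidates: (cost, 1 - yield, hazard score), all to be minimized.
routes = {
    "route_A": (120.0, 0.30, 2.0),
    "route_B": (90.0, 0.45, 3.0),
    "route_C": (130.0, 0.35, 2.5),  # worse than route_A on every objective
    "route_D": (200.0, 0.10, 1.0),
}

front = pareto_front(routes)
print(sorted(front))
```

Instead of one "best" route, the result is a set of trade-offs: each remaining route is better than the others on at least one objective, and the final choice is left to the chemist or to a later decision step.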
ESR8: Learning Representations for Molecules from Chemical Structures and Microscopy Images
Hereby we present an opportunity for a Ph.D. student to innovate drug design with novel representation learning techniques. The aim is to learn more powerful, novel, transferable, and informative molecule representations by combining structural information of molecules with information from microscopy imaging data. This innovative approach should allow us to identify new chemical scaffolds, reveal known and unknown secondary effects of drug candidates, and extend the applicability domain of predictive models, potentially resulting in a more efficient data-driven drug discovery process. Janssen has screened hundreds of thousands of small molecules using high-throughput microscopy imaging [1]. In this effort, a number of imaging datasets of unprecedented size have been generated. A successful Ph.D. candidate will have access to these datasets, learn to develop powerful deep learning architectures for representation learning, and use these to improve various tasks in drug design. The developed methods will then be complemented by few-shot and domain-adaptation techniques, facilitating efficient re-use of the knowledge contained in the learned representations. In all these efforts, the Ph.D. student will interact and cooperate with other Ph.D. students within the AIDD framework.
1. Simm et al., 2018, Cell Chem Bio
ESR9: Improve drug design with human-assisted AI
Not all relevant knowledge is explicitly accessible and usable for machine learning modeling in drug design. A new and emerging area in machine learning is knowledge elicitation from human experts to improve the prediction accuracy of models, so-called human-in-the-loop modeling. The goal of this Ph.D. studentship is to develop human-in-the-loop machine learning models applicable to drug design. The Ph.D. student will work in an already existing collaboration between AstraZeneca and Aalto University. The main task will be to develop human-in-the-loop modeling so that it can be used to guide deep learning-based de novo drug design. The student will develop a system that queries the drug designer in order to improve existing relevant machine learning models, in particular models that have been built on small data sets. The system will also be extended to elicit from the drug designer the optimal desirability function that should be used to generate the most relevant molecules. The developed system will be released as open source for the benefit of the scientific community.
ESR10: Improved uncertainty quantification of drug-target predictions through the utilization of auxiliary data
Drug-target interaction predictions provide valuable information on a molecule's potency, potential side effects, and the opportunities to repurpose the molecule for another disease. These predictions are of high importance, for instance, for de novo drug design based on deep learning, where finding specific and selective binders is the ultimate goal. Macau is a Bayesian matrix factorization model that was previously developed at KU Leuven and learns a latent representation of complex interactions from highly incomplete data. The Ph.D. student will develop novel deep learning models (DeepMacau) that extend this latent representation strategy and improve predictions through the use of auxiliary information, for instance, image and gene expression data. Auxiliary information beyond molecular fingerprints, such as high-content imaging data for compounds, expression data for compounds or targets, pathway information for targets, and single-dose high-content screening data (as opposed to dose-response data), will be included as side information to improve the predictive performance. Both Markov Chain Monte Carlo (MCMC) approaches (such as Stochastic Gradient Hamiltonian Monte Carlo) and bootstrap-like approaches will be used to provide reliable uncertainty modeling (confidence estimation) beyond point estimates.
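For readers unfamiliar with the bootstrap idea mentioned above, the following minimal sketch shows how resampling turns a point estimate into an interval. It uses a plain percentile bootstrap on the mean of a few made-up activity values; the function name, the numbers, and the choice of statistic are assumptions for illustration only and do not represent the DeepMacau approach.

```python
import random
import statistics


def bootstrap_interval(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    # Resample with replacement and recompute the statistic each time.
    means = sorted(
        statistics.fmean(rng.choices(values, k=len(values)))
        for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical measured activities (e.g., pIC50 values) for one compound.
activities = [6.1, 6.4, 5.9, 6.8, 6.3, 6.0, 6.6, 6.2]

point_estimate = statistics.fmean(activities)
lower, upper = bootstrap_interval(activities)
print(f"mean = {point_estimate:.2f}, 95% interval = [{lower:.2f}, {upper:.2f}]")
```

The width of the interval reflects how much the estimate would change under a different sample; in a modeling setting, the same resampling idea is typically applied to the training data of a model rather than to raw measurements.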
ESR11: Machine learning models for the identification of compounds likely to interfere with biological assays
High-throughput screening allows the testing of thousands of compounds per day. However, a substantial proportion of the initial hits can be artifacts related to aggregate formation, chemical reactivity, photoreactivity, decomposition, etc. Early drug discovery can benefit enormously from in silico approaches for the identification of assay artifacts and rejection of such hits, but the existing methods are still in their infancy and generally do not take into account the specifics of individual assay methods and interference mechanisms. During this research project, the doctoral student will develop robust machine learning approaches to accurately assess the risk of a compound causing assay interference, taking into account the individual assay designs, chemical reactivity, and/or the assay kinetics.
ESR12: Prediction of outcome of chemical reactions using new neural network architectures
The Swiss AI Lab IDSIA, affiliated with the University of Applied Sciences and Arts of Southern Switzerland (SUPSI) and the University of Southern Switzerland (USI), offers a Ph.D. student position within the EU H2020 project AIDD. We are seeking a highly qualified and motivated young scientist with cross-cutting interest/experience in machine learning and chemistry.
The successful applicant will be part of a large, interdisciplinary network of top-level partners aiming at empowering chemical and pharmaceutical research through the latest developments in machine learning. He or she will spend 18 months in the world-renowned research group of Prof. Jürgen Schmidhuber at the Swiss AI Lab IDSIA, followed by another 18 months with one of our industrial partners (e.g., Bayer, Germany). The concrete project will cover end-to-end learning of outcomes of organic chemical reactions and will be part of a concentrated research effort among the 15 AIDD partners, with frequent collaboration and a high-level network-wide training program.
ESR13: Quantum machine learning for reactivity
The generation of novel chemical matter with activity on relevant targets is the core business of the pharmaceutical industry. The ability to estimate the reactivity of such compounds would be valuable both for the initial chemical synthesis of these compounds and for assessing their stability under a variety of conditions. Quantum mechanical (QM) models enable simulations on reactivity and stability of these compounds through simulation of the bond-breaking and bond-forming events. However, these simulations come at a monumental computational expense. In recent years, quantum mechanical machine learning models have been developed which can provide accuracy comparable to quantum mechanical methods at a negligible computational cost. We will design and validate a neural network trained on reference QM data, which can then be used to predict reaction barriers and transition state conformations. This information can then be leveraged to inform chemistry decision-making.
ESR14: Decomposable latent representations for in-vivo toxicity prediction
To improve drug design with AI support, it is important to incorporate information on a compound's possible toxicity (tox) liability as early as possible in the design. While the most indicative data for compound toxicity is recorded as pre-clinical pathologies, or later as clinical indications, this data is very scarce. We aim to enhance that information by modeling it in the context of other heterogeneous and multi-fidelity data sources. In the project "Decomposable latent representations for in-vivo toxicity prediction", we aim to develop novel, state-of-the-art machine learning methods that learn chemical representations which factorize into intuitive chemical or pre-clinical pathologies from public and proprietary data. These models will be developed to assess the tox liability of compounds in order to support drug discovery at a very early stage. The goals of the project include learning under data scarcity, integration of heterogeneous and multi-fidelity data sources, and model interpretability.
ESR15: Deep Learning for protein simulation
In the drug design context, an important problem is the computation of statistical quantities such as protein-ligand binding free energies or dissociation rates. Molecular dynamics (MD) simulations are used for this purpose but suffer from the rare-event sampling problem, e.g., the fact that it takes a long time to dissociate the bound drug, and thus to estimate the dissociation rate or binding affinity by direct sampling. In this project, we aim to develop deep learning methods incorporating molecular physics to enhance the sampling of protein and protein-ligand systems. See Boltzmann Generators [1] for an example of preliminary work in this direction.
1. Noe et al., Boltzmann Generators: sampling equilibrium states of many-body systems with deep learning. Science 365 eaaw1147 (2019)
How to apply
Be sure that you satisfy eligibility and mobility rules!
- prepare your profile and provide sufficient details about your educational and work background, proofs of your education (or expected time of your MSc/diploma), your CV, and motivation letter;
- submit your application to recruit at ai-dd.eu before the deadline of April 18th, 2021 (the screening will start immediately; do not wait until the deadline to submit your application). Indicate the ESR number(s) in the title of the letter. Send your application to recruit at ai-dd.eu only and do not duplicate it to the individual PIs.
- indicate up to three ESR positions in your application (you can also rank them, but this is not important).
The screening procedure is as follows:
- Each application will be screened by the respective supervisors from the host organizations
- Prospective candidates will be contacted by the supervisors for individual interviews and the best ones will be shortlisted
- The shortlisted candidates will be interviewed by the recruitment commission either in person or by Skype/Zoom
- The candidates will be informed by e-mail about the results of their applications