BORIS Portal

Bern Open Repository and Information System

Exploring the GDB-13 chemical space using deep generative models

BORIS DOI
10.7892/boris.138445
Publisher DOI
10.1186/s13321-019-0341-z
PubMed ID
30868314
Description
Recent applications of recurrent neural networks (RNNs) enable training models that sample chemical space. In this study, we train RNNs on molecular string representations (SMILES) using a subset of the enumerated database GDB-13 (975 million molecules). We show that a model trained on 1 million structures (0.1% of the database) reproduces 68.9% of the entire database when 2 billion molecules are sampled from it. We also developed a method to assess the quality of the training process using negative log-likelihood plots. Furthermore, we use a mathematical model based on the “coupon collector problem” that compares the trained model to an upper bound, which allows us to quantify how much it has learned. We also suggest that this method can be used as a tool to benchmark the learning capabilities of any molecular generative model architecture. Additionally, an analysis of the generated chemical space shows that, mostly due to the syntax of SMILES, complex molecules with many rings and heteroatoms are more difficult to sample.
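The coupon-collector upper bound mentioned in the description can be illustrated with a short calculation. This is a minimal sketch, not the authors' code: it assumes an ideal model that draws every database molecule uniformly and with replacement, and the function name `expected_coverage` is hypothetical. Only the database size, the sample size, and the 68.9% figure come from the description; the paper's exact formulation may differ.

```python
import math


def expected_coverage(n_items: int, n_samples: int) -> float:
    """Expected fraction of n_items seen after n_samples uniform draws with replacement.

    Assumption (not from the record): an ideal generator samples each item
    with equal probability 1/n_items, independently on every draw.
    """
    if n_items < 1 or n_samples < 0:
        raise ValueError("n_items must be >= 1 and n_samples must be >= 0")
    if n_items == 1:
        # A single item is covered by the first draw.
        return 1.0 if n_samples > 0 else 0.0
    # (1 - 1/N)^k is the probability that one given item is never drawn in
    # k draws; log1p/expm1 keep the result accurate when N is very large.
    return -math.expm1(n_samples * math.log1p(-1.0 / n_items))


if __name__ == "__main__":
    database_size = 975_000_000   # GDB-13 size quoted in the description
    sample_size = 2_000_000_000   # number of sampled molecules quoted above
    reported_coverage = 0.689     # coverage reported for the trained model

    ideal = expected_coverage(database_size, sample_size)
    print(f"ideal uniform-sampler coverage: {ideal:.4f}")
    print(f"reported / ideal: {reported_coverage / ideal:.4f}")
```

Under the uniform-sampling assumption the ideal coverage comes out at roughly 0.87, so the ratio printed on the last line gives one rough way to read how close the trained model is to that bound.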
Date of Publication
2019
Publication Type
Article
Subject(s)
500 - Science::570 - Life sciences; biology
500 - Science::540 - Chemistry
Language(s)
en
Contributor(s)
Arus Pous, Josep
Departement für Chemie und Biochemie (DCB)
Blaschke, Thomas
Ulander, Silas
Reymond, Jean-Louis
Departement für Chemie und Biochemie (DCB)
Chen, Hongming
Engkvist, Ola
Additional Credits
Departement für Chemie und Biochemie (DCB)
Series
Journal of cheminformatics
Publisher
Springer
ISSN
1758-2946
Access(Rights)
open.access