Generative AI

Explore novel molecular space leveraging Generative Deep-Learning

Pending AI's capabilities are powered by artificial intelligence, an experimental technology whose outputs may occasionally be misleading or incorrect.

Pending AI's Molecule Generator is a state-of-the-art, transformer-based service engineered to cut the time and cost of early-stage drug discovery. It delivers ultra-high-throughput sampling of novel, diverse, and pharmacologically relevant compounds, enabling R&D teams to rapidly explore vast chemical space and construct highly effective virtual libraries.

Our advanced deep learning models perform unconditional molecule generation by sampling from the distribution learned from massive, quality-focused training datasets. The models are built with drug discovery as a primary concern:

  • Validity: Molecules are chemically sound, meeting standards for synthetic accessibility and structural integrity required for progression.

  • Novelty: Minimal structural overlap with known or patented compounds, maximising return on investment (ROI) by focusing on truly novel chemical entities.

  • Diversity: Compounds are structurally dissimilar from one another, reducing redundant screening and maximising the chemical space explored for a target.
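Validity, novelty and diversity are commonly quantified with a small set of batch-level metrics. The sketch below shows one common set of definitions, assuming the generated molecules have already been canonicalised to SMILES (for example with a cheminformatics toolkit such as RDKit) and that fingerprints are available as sets of on-bits. The function names, the use of `None` for unparseable samples, and the toy fingerprints are illustrative assumptions, not the definitions used by Pending AI's benchmarking suite.

```python
from itertools import combinations
from typing import Optional


def tanimoto(a: frozenset[int], b: frozenset[int]) -> float:
    """Tanimoto (Jaccard) similarity of two fingerprints stored as sets of on-bits."""
    union = len(a | b)
    # Convention chosen here: two empty fingerprints are treated as dissimilar.
    return len(a & b) / union if union else 0.0


def generation_metrics(
    generated: list[Optional[str]],
    training: set[str],
    fingerprints: dict[str, frozenset[int]],
) -> dict[str, float]:
    """Validity, uniqueness, novelty and internal diversity of a generated batch.

    `generated` holds canonical SMILES, with None marking a sample that failed to parse.
    """
    valid = [s for s in generated if s is not None]
    unique = set(valid)
    novel = unique - training  # unique molecules not seen in the training set
    ordered = sorted(unique)
    n_pairs = len(ordered) * (len(ordered) - 1) // 2
    total_sim = sum(
        tanimoto(fingerprints[a], fingerprints[b]) for a, b in combinations(ordered, 2)
    )
    mean_sim = total_sim / n_pairs if n_pairs else 0.0
    return {
        "validity": len(valid) / len(generated) if generated else 0.0,
        "uniqueness": len(unique) / len(valid) if valid else 0.0,
        "novelty": len(novel) / len(unique) if unique else 0.0,
        # Internal diversity: 1 minus the mean pairwise similarity within the batch.
        "internal_diversity": 1.0 - mean_sim if n_pairs else 0.0,
    }


# Toy data: hypothetical fingerprints, not real Morgan bits.
m = generation_metrics(
    generated=["CCO", "CCO", "c1ccccc1", None, "CCN"],
    training={"CCO"},
    fingerprints={
        "CCO": frozenset({1, 2, 3}),
        "c1ccccc1": frozenset({4, 5, 6}),
        "CCN": frozenset({1, 2, 7}),
    },
)
print(m)
```

The pairwise diversity term grows quadratically with batch size, so for very large batches it is usually estimated on a random subsample.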

See the PAI Generator page for more information about the service.


Applications

The Molecule Generator is built to address critical bottlenecks in the therapeutic development cycle and can be integrated into an existing drug discovery pipeline in several areas:

De Novo Drug Design

Rapid creation of entirely new molecular scaffolds to address novel or challenging therapeutic targets.

Efficiently enumerating and screening massive virtual libraries (in the billions) to identify promising lead candidates faster than traditional methods.
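Screening libraries of this size is only practical if candidates are streamed rather than held in memory. A minimal sketch, assuming a hypothetical `toy_sampler` standing in for the generator and a hypothetical `toy_score` standing in for a docking or property score: `heapq.nlargest` consumes the stream lazily and keeps only the best `k` candidates, so memory stays bounded by `k` however many molecules are sampled.

```python
import heapq
from typing import Callable, Iterable, Iterator


def toy_sampler(n: int) -> Iterator[str]:
    """Hypothetical stand-in for a generative model: lazily yields n carbon-chain SMILES."""
    for i in range(n):
        yield "C" * (i % 10 + 1)


def toy_score(smiles: str) -> float:
    """Hypothetical stand-in for a docking or property score (higher is better)."""
    return -abs(len(smiles) - 6)


def screen(
    candidates: Iterable[str],
    keep: Callable[[str], bool],
    score: Callable[[str], float],
    k: int,
) -> list[str]:
    """Return the k best-scoring candidates that pass the `keep` filter.

    heapq.nlargest reads the stream one item at a time and holds only k items,
    so memory does not grow with the size of the library being screened.
    """
    return heapq.nlargest(k, (c for c in candidates if keep(c)), key=score)


best = screen(toy_sampler(100_000), keep=lambda s: len(s) <= 8, score=toy_score, k=5)
print([(s, toy_score(s)) for s in best])
```

Cheap filters placed in `keep` run before the (typically expensive) scoring function, which is where most of the time in a real screening campaign goes.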

Novel Hit Identification

Pinpointing unique, high-potential compounds ready for experimental validation and subsequent lead optimisation.

Focused Library Construction

Targeted sampling to build custom libraries for specific drug targets, scaffolds, or physicochemical property profiles.
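The simplest way to approximate targeted sampling with an unconditional generator is rejection sampling: draw molecules and keep only those that match the desired property profile. The sketch below illustrates that swapped-in technique under stated assumptions; it is not a description of how Pending AI targets sampling (the service also mentions fine-tuning for this purpose). `Candidate`, `toy_sample` and the property window in `profile` are hypothetical.

```python
import itertools
import random
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Candidate:
    mol_id: str        # hypothetical identifier; a real pipeline would carry SMILES
    mol_weight: float  # Da
    logp: float


def build_focused_library(
    sample: Callable[[], Candidate],
    accept: Callable[[Candidate], bool],
    target_size: int,
    max_draws: int,
) -> tuple[list[Candidate], float]:
    """Rejection-sample until target_size distinct candidates match the profile.

    Returns the library and the observed acceptance rate (accepted / drawn).
    """
    library: list[Candidate] = []
    seen: set[str] = set()
    draws = 0
    while len(library) < target_size and draws < max_draws:
        cand = sample()
        draws += 1
        if cand.mol_id not in seen and accept(cand):
            seen.add(cand.mol_id)
            library.append(cand)
    return library, (len(library) / draws if draws else 0.0)


rng = random.Random(0)
counter = itertools.count()


def toy_sample() -> Candidate:
    """Hypothetical unconditional sampler producing random property values."""
    return Candidate(f"mol-{next(counter)}", rng.uniform(150, 650), rng.uniform(-2, 7))


def profile(c: Candidate) -> bool:
    # Illustrative target window, not a recommended profile.
    return 250 <= c.mol_weight <= 450 and 1 <= c.logp <= 3


library, rate = build_focused_library(toy_sample, profile, target_size=50, max_draws=10_000)
print(len(library), f"acceptance rate: {rate:.3f}")
```

When the acceptance rate is low, most samples are wasted, which is the usual motivation for steering the generator itself (for example by fine-tuning) rather than filtering after the fact.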


Features & Advantages

1. High-Performance Architecture for Scale

Pending AI's proprietary transformer models are fine-tuned for the unique demands of chemical generation, prioritising speed without compromising quality.

  • Industry-Leading Throughput: Generate millions of high-quality molecules, drastically compressing the timeline for virtual library creation (screening is also readily available).

  • Precision Customisation: Leverage fine-tuning capabilities to train the generator on in-house datasets, directing sampling toward specific scaffolds or therapeutically relevant ADMET profiles.

2. Rigorous Quality Control & Benchmarking

A comprehensive benchmarking suite validates every output against industry standards for drug-like quality.

  • Drug-Likeness Validation: Automated checks for validity, uniqueness, and adherence to established drug-like filters (e.g., Lipinski's Rule of Five).

  • Optimised Diversity: Continuous measurement of novelty and average dissimilarity to confirm the effective exploration of new chemical space.

  • Feature Consistency: Distributions of sampled physicochemical features (e.g., molecular weight, LogP, TPSA) are compared against optimal ranges to ensure molecules fall within the desired property profile.
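The drug-likeness and feature-range checks above can be sketched as follows, assuming descriptor values (molecular weight, LogP, hydrogen-bond donor and acceptor counts, TPSA) have already been computed by a cheminformatics toolkit. Lipinski's thresholds are the standard ones; the `Descriptors` container, the tolerance of one violation, the example ranges and the descriptor values in `batch` are illustrative assumptions rather than the exact filters Pending AI applies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    mol_weight: float  # molecular weight, Da
    logp: float        # calculated octanol/water partition coefficient
    hbd: int           # hydrogen-bond donors
    hba: int           # hydrogen-bond acceptors
    tpsa: float        # topological polar surface area, Å^2


def lipinski_violations(d: Descriptors) -> int:
    """Count Rule-of-Five violations: MW > 500, LogP > 5, HBD > 5, HBA > 10."""
    return sum([d.mol_weight > 500, d.logp > 5, d.hbd > 5, d.hba > 10])


def passes_rule_of_five(d: Descriptors, max_violations: int = 1) -> bool:
    # A common reading of the rule tolerates at most one violation.
    return lipinski_violations(d) <= max_violations


def fraction_in_range(values: list[float], low: float, high: float) -> float:
    """Share of sampled values that fall inside [low, high]."""
    return sum(low <= v <= high for v in values) / len(values) if values else 0.0


# Hypothetical descriptor values for three sampled molecules.
batch = [
    Descriptors(180.2, 1.2, 1, 4, 63.6),
    Descriptors(620.0, 6.1, 2, 8, 95.0),
    Descriptors(510.0, 4.0, 3, 7, 110.0),
]
print([lipinski_violations(d) for d in batch])             # [0, 2, 1]
print([passes_rule_of_five(d) for d in batch])             # [True, False, True]
print(fraction_in_range([d.tpsa for d in batch], 0, 140))  # 1.0
```

Reporting the in-range fraction per feature, rather than a single pass/fail flag, makes it easier to see which property is drifting when a model's sampled distribution shifts.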

3. Extensive and Vetted Chemical Data Foundation

The model is trained on a massive, diverse, and meticulously curated collection of molecular datasets drawn from both public and commercial sources. This foundation ensures the model has a vast embedded understanding of the complexity and rules of medicinal chemistry.

  • Data Integrity: A rigorous preparation process ensures the underlying data is high-quality and standardised, resulting in a robust model that generates reliable, chemically sound outputs.


Model Overview

There are several models available for sampling molecules. Each model covers a bounded region of chemical space, determined by the training distribution of the large compound libraries it was trained on.
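Sampling from a learned distribution with a transformer is typically autoregressive: the model scores every possible next token, the scores are converted to probabilities, and one token is drawn at a time until an end token appears. The sketch below shows that loop in isolation. `TOY_LOGITS` is a hypothetical hand-written table standing in for a trained model (it conditions only on the previous token, whereas a transformer conditions on the whole prefix), and `temperature` is a common sampling control, not a documented option of Pending AI's service.

```python
import math
import random

# Hypothetical next-token logits over single-atom SMILES tokens.
TOY_LOGITS: dict[str, dict[str, float]] = {
    "<bos>": {"C": 2.0, "N": 0.5, "O": 0.3},
    "C": {"C": 1.5, "N": 0.4, "O": 0.8, "<eos>": 0.6},
    "N": {"C": 1.2, "O": 0.1, "<eos>": 0.4},
    "O": {"C": 1.0, "<eos>": 0.9},
}


def softmax(logits: dict[str, float], temperature: float) -> dict[str, float]:
    """Temperature-scaled softmax; lower temperature sharpens the distribution."""
    scaled = {tok: v / temperature for tok, v in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(v - peak) for tok, v in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def sample_sequence(rng: random.Random, temperature: float = 1.0, max_len: int = 20) -> str:
    """Draw one token string autoregressively until <eos> or max_len tokens."""
    tokens: list[str] = []
    prev = "<bos>"
    for _ in range(max_len):
        probs = softmax(TOY_LOGITS[prev], temperature)
        nxt = rng.choices(list(probs), weights=list(probs.values()))[0]
        if nxt == "<eos>":
            break
        tokens.append(nxt)
        prev = nxt
    return "".join(tokens)


rng = random.Random(0)
print([sample_sequence(rng, temperature=0.8) for _ in range(5)])
```

Every sequence this toy table can emit is an acyclic chain of C, N and O atoms and therefore a syntactically valid SMILES string; a real model has a much richer vocabulary (rings, branches, charges), which is why validity has to be measured rather than assumed.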

Architecture components are specifically designed for the drug discovery domain: the advanced token embedding models, optimised for novelty and diversity, have been further improved to extract chemically relevant features for the early Design & Make stages.

Model utility is validated against Pending AI's own drug discovery pipeline, where the models have accelerated the early Hit Identification and refinement stages.


Diverse Small Transformer

This model excels at generating a broad range of general chemical structures, including those found in patents and reaction databases, natural products, and known drug molecules. It offers high diversity in its generated output.

Docking Tiny Transformer

This model specialises in generating molecules with high drug-likeness and is particularly well-suited for virtual screening applications. It offers a very high throughput for generating potentially relevant compounds for early-stage drug discovery.

Docking Small Transformer

Building on the strengths of its predecessor, this model also excels at virtual screening and generating drug-like molecules. It has the highest throughput and improved rates of generating valid and unique molecules compared to previous iterations.

Docking Medium Transformer

This model is the most robust for virtual screening molecule generation. It offers an excellent balance of valid, unique, and drug-like output, and its larger architecture explores the chemical space relevant to drug discovery more comprehensively, yielding a large number of novel, drug-like compounds.
