Generative AI
Explore novel molecular space with generative deep learning
Pending AI's Molecule Generator is a state-of-the-art, transformer-based service engineered to cut down the time and cost of early-stage drug discovery. It delivers ultra-high-throughput sampling of novel, diverse, and pharmacologically-relevant compounds, empowering R&D teams to rapidly explore vast chemical space and construct highly effective virtual libraries.
Our advanced deep learning models perform unconditional molecule generation based on the distribution of massive, quality-focused input datasets. Models are built with drug discovery as a primary concern:
Validity: Molecules are chemically sound, meeting standards for synthetic accessibility and structural integrity required for progression.
Novelty: Minimal structural overlap with known or patented compounds, maximising ROI by focusing on truly novel chemical entities.
Diversity: Compounds exhibit significant structural dissimilarity, reducing repeated chemical screens and maximising the chemical space explored for a target.
See the PAI Generator page for more information about the service.
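The novelty and diversity criteria above can be made concrete with a few set-based metrics. The following is a minimal sketch, not Pending AI's benchmarking code: it assumes molecules are already canonicalised SMILES strings (so string equality stands in for structural identity) and that fingerprints are supplied as sets of on-bit indices by a cheminformatics toolkit of your choice. All function names and sample values are hypothetical.

```python
from itertools import combinations


def uniqueness(smiles):
    """Fraction of generated strings that are distinct."""
    return len(set(smiles)) / len(smiles) if smiles else 0.0


def novelty(smiles, reference):
    """Fraction of distinct generated strings absent from a reference set."""
    distinct = set(smiles)
    if not distinct:
        return 0.0
    return len(distinct - set(reference)) / len(distinct)


def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def mean_dissimilarity(fingerprints):
    """Average pairwise Tanimoto distance (1 - similarity)."""
    pairs = list(combinations(fingerprints, 2))
    if not pairs:
        return 0.0
    return sum(1.0 - tanimoto(a, b) for a, b in pairs) / len(pairs)


# Illustrative inputs only (assumed canonical SMILES and toy fingerprints).
generated = ["CCO", "CCO", "c1ccccc1", "CC(=O)O"]
known = {"CCO"}
fingerprints = [{1, 2, 3}, {2, 3, 4}, {7, 8}]

print(f"uniqueness: {uniqueness(generated):.2f}")
print(f"novelty: {novelty(generated, known):.2f}")
print(f"mean dissimilarity: {mean_dissimilarity(fingerprints):.2f}")
```

Exact pairwise comparison scales quadratically with library size, so for very large libraries a random subsample of pairs is a common substitute.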
Applications
The Molecule Generator is built to address critical bottlenecks in the therapeutic development cycle. It can be integrated into an existing drug discovery pipeline in several areas:
De Novo Drug Design
Rapid creation of entirely new molecular scaffolds to address novel or challenging therapeutic targets.
Efficiently enumerating and screening massive virtual libraries (in the billions) to identify promising lead candidates faster than traditional methods.
Novel Hit-Identification
Pinpointing unique, high-potential compounds ready for experimental validation and subsequent lead optimisation.
Focused Library Construction
Targeted sampling to build custom libraries for specific drug targets, scaffolds, or physicochemical property profiles.
Features & Advantages
1. High-Performance Architecture for Scale
Pending AI's proprietary transformer models are fine-tuned for the unique demands of chemical generation, prioritising speed without compromising quality.
Industry-Leading Throughput: Generate millions of high-quality molecules, drastically compressing the timeline for virtual library creation and making the output readily available for screening.
Precision Customisation: Leverage fine-tuning capabilities to train the generator on in-house datasets, directing sampling toward specific scaffolds or therapeutically relevant ADMET profiles.
2. Rigorous Quality Control & Benchmarking
A comprehensive benchmarking suite validates every output against industry standards for drug-like quality.
Drug-Likeness Validation: Automated checks for validity, uniqueness, and adherence to established drug-like filters (e.g., Lipinski's Rule of Five).
Optimised Diversity: Continuous measurement of novelty and average dissimilarity to confirm the effective exploration of new chemical space.
Feature Consistency: Sampled physicochemical feature distributions (e.g., molecular weight) are compared against optimal ranges to confirm that generated molecules have the intended characteristics.
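As an illustration of the drug-likeness and feature-consistency checks listed above, here is a minimal sketch of a Lipinski Rule of Five filter and a simple range check. It is not the service's benchmarking suite: the `Descriptors` class, the function names, and the sample values are hypothetical, and the descriptors are assumed to have been computed beforehand with a cheminformatics toolkit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Precomputed physicochemical descriptors for one molecule (assumed input)."""
    mol_weight: float       # molecular weight in daltons
    log_p: float            # calculated octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def lipinski_violations(d):
    """Count how many Rule of Five thresholds the molecule exceeds."""
    return sum([
        d.mol_weight > 500,
        d.log_p > 5,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    ])


def passes_rule_of_five(d, max_violations=1):
    """The rule is commonly applied as 'no more than one violation'."""
    return lipinski_violations(d) <= max_violations


def fraction_in_range(values, low, high):
    """Share of sampled values that fall inside [low, high]."""
    values = list(values)
    if not values:
        return 0.0
    return sum(low <= v <= high for v in values) / len(values)


# Illustrative values only, not derived from real structures.
small = Descriptors(mol_weight=180.2, log_p=1.2, h_bond_donors=1, h_bond_acceptors=4)
large = Descriptors(mol_weight=720.0, log_p=6.3, h_bond_donors=2, h_bond_acceptors=9)

print(passes_rule_of_five(small), passes_rule_of_five(large))
print(fraction_in_range([180.2, 720.0, 350.0], 200, 500))
```

The `max_violations` parameter is exposed because teams differ on how strictly they apply the rule; the 200 to 500 window in the usage line is an arbitrary example range, not a recommended target.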

3. Extensive and Vetted Chemical Data Foundation
The model is trained on a massive, diverse, and meticulously curated collection of molecular datasets drawn from both public and commercial sources. This foundation ensures the model has a vast embedded understanding of the complexity and rules of medicinal chemistry.
Data Integrity: A rigorous preparation process ensures the underlying data is high-quality and standardised, resulting in a robust model that generates reliable, chemically sound outputs.
Model Overview
There are several models available for sampling molecules. Each model contains a semantically limited chemical space based on underlying training distributions sourced from large compound libraries.
Architecture components are designed specifically for the drug discovery domain: the token embedding models, optimised for novelty and diversity, have been refined to extract chemically relevant features for the early Design & Make stages.
Diverse Small Transformer
This model excels at generating a broad range of general chemical structures, including those found in patents and reaction databases, natural products, and known drug molecules. It offers high diversity in its generated output.
Docking Tiny Transformer
This model specialises in generating molecules with high drug-likeness and is particularly well-suited for virtual screening applications. It offers a very high throughput for generating potentially relevant compounds for early-stage drug discovery.
Docking Small Transformer
Building on the strengths of its predecessor, this model also excels at virtual screening and generating drug-like molecules. It has the highest throughput and improved rates of generating valid and unique molecules compared to previous iterations.
Docking Medium Transformer
This model is the most robust option for generating molecules for virtual screening. It offers an excellent balance of valid, unique, and drug-like output, and its larger architecture explores the chemical space relevant to drug discovery more comprehensively, yielding a large number of novel, drug-like compounds.