Foundation Models for Prenatal Genomics

We combine transformer-based foundation models with advanced synthetic data generation to deliver earlier, more accurate prenatal genetic screening.

Foundation Model Architecture

Our clinical prenatal testing platform is built on a transformer-based foundation model trained on large cell-free DNA (cfDNA) datasets. Unlike traditional statistical approaches developed in 2008, the model learns patterns jointly across multiple data modalities.

The model uses multi-head attention to capture long-range dependencies in genomic data, with the goal of detecting subtle signals at foetal fractions as low as 1-2%. The aim is to improve on z-score methods, which need a foetal fraction of 4% or more for reliable results.
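To make the foetal fraction dependence concrete, the following is a minimal sketch of the classic chromosome 21 z-score. It is not the platform's method. The function names and the reference mean and standard deviation are illustrative assumptions, not values from any dataset. The sketch relies on one fact: in a trisomy 21 pregnancy the foetal DNA carries three copies of chromosome 21 instead of two, so the chr21 read share grows with foetal fraction.

```python
def expected_chr21_fraction(p_euploid: float, foetal_fraction: float) -> float:
    """Expected chr21 read share in a T21 pregnancy.

    Foetal DNA contributes 3 copies of chr21 instead of 2, which adds
    p_euploid * f / 2 of extra chr21 material. The total amount of DNA
    grows by the same amount, so the share is renormalised.
    """
    extra = p_euploid * foetal_fraction / 2.0
    return (p_euploid + extra) / (1.0 + extra)


def z_score(observed: float, ref_mean: float, ref_sd: float) -> float:
    """Standard z-score of an observed chr21 share against euploid references."""
    return (observed - ref_mean) / ref_sd


# Illustrative reference values (assumptions, not measured data).
REF_MEAN = 0.0130   # mean chr21 read share in euploid pregnancies
REF_SD = 0.00006    # standard deviation of that share

for f in (0.01, 0.02, 0.04, 0.10):
    z = z_score(expected_chr21_fraction(REF_MEAN, f), REF_MEAN, REF_SD)
    print(f"foetal fraction {f:.0%}: expected z = {z:.2f}")
```

With these illustrative reference values, the expected z-score at 4% foetal fraction lies above 3 while the value at 2% does not, which is the kind of limit the paragraph above describes.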

Key technical innovations include:

  • Multi-modal input processing: fragment size, methylation patterns, coverage depth, and clinical metadata
  • Population-specific training for equitable performance across all ancestries
  • Uncertainty quantification distinguishing biological from technical limitations
  • Explainable AI providing confidence scores and clinical recommendations
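For readers unfamiliar with the mechanism, multi-head self-attention can be sketched in a few lines of NumPy. This is a generic illustration, not the platform's architecture. The dimensions, the random weights, and the idea of treating each genomic bin as one token are assumptions made for the example.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_self_attention(x, w_q, w_k, w_v, w_o, n_heads):
    """Scaled dot-product self-attention with n_heads heads.

    x has shape (seq_len, d_model); each weight matrix is (d_model, d_model).
    Returns the output (seq_len, d_model) and the attention weights
    (n_heads, seq_len, seq_len).
    """
    seq_len, d_model = x.shape
    d_head = d_model // n_heads

    def split(t):
        # (seq_len, d_model) -> (n_heads, seq_len, d_head)
        return t.reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(x @ w_q), split(x @ w_k), split(x @ w_v)
    # Every token scores every other token, however far apart they are.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)
    heads = weights @ v
    merged = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return merged @ w_o, weights


# Hypothetical input: 6 genomic bins, each embedded into 8 features.
rng = np.random.default_rng(0)
seq_len, d_model, n_heads = 6, 8, 2
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v, w_o = (rng.normal(size=(d_model, d_model)) * 0.1 for _ in range(4))
out, attn = multi_head_self_attention(x, w_q, w_k, w_v, w_o, n_heads)
print(out.shape, attn.shape)
```

Because the score matrix compares every position with every other position, distance along the sequence does not limit which bins can influence each other, which is what "long-range dependencies" refers to.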

Synthetic cfDNA Technology

Our synthetic data generation uses a conditional autoregressive model (AR v15) that produces biologically realistic cell-free DNA fragments. The model learns the statistical properties of real cfDNA and generates novel samples that are accepted by standard non-invasive prenatal testing (NIPT) detection pipelines.

The generation process provides precise control over:

  • Foetal fraction: 1-25% with continuous control
  • Karyotype: all common trisomies, sex chromosome aneuploidies, and microdeletions
  • Fragment characteristics: size distribution, GC content, coverage patterns
  • Sample depth: 1M to 16M fragments per sample
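The controls listed above can be pictured as parameters of a sampling function. The toy sketch below is not the AR v15 model: it only draws per-chromosome fragment counts, with the share of one chromosome raised according to foetal fraction. The names `SampleSpec`, `chromosome_weights`, and `simulate_counts`, and the baseline shares, are hypothetical.

```python
import random
from collections import Counter
from dataclasses import dataclass
from typing import Optional

# Illustrative euploid read shares (assumptions, not measured values).
BASELINE_SHARE = {"chr13": 0.037, "chr18": 0.026, "chr21": 0.013, "other": 0.924}


@dataclass
class SampleSpec:
    foetal_fraction: float          # e.g. 0.01 to 0.25
    trisomic_chrom: Optional[str]   # e.g. "chr21", or None for euploid
    n_fragments: int                # kept small here for speed


def chromosome_weights(spec: SampleSpec) -> dict:
    """Per-chromosome sampling probabilities for the requested karyotype."""
    weights = dict(BASELINE_SHARE)
    if spec.trisomic_chrom is not None:
        # Foetal DNA carries 3 copies instead of 2: scale by (1 + f/2).
        weights[spec.trisomic_chrom] *= 1.0 + spec.foetal_fraction / 2.0
    total = sum(weights.values())
    return {chrom: w / total for chrom, w in weights.items()}


def simulate_counts(spec: SampleSpec, seed: int = 0) -> Counter:
    """Draw n_fragments chromosome labels and count them."""
    rng = random.Random(seed)
    weights = chromosome_weights(spec)
    chroms = list(weights)
    draws = rng.choices(chroms, weights=[weights[c] for c in chroms],
                        k=spec.n_fragments)
    return Counter(draws)


spec = SampleSpec(foetal_fraction=0.10, trisomic_chrom="chr21", n_fragments=100_000)
print(simulate_counts(spec))
```

A real generator would also have to model fragment size, GC content, and coverage along each chromosome; this sketch covers only the karyotype and foetal fraction controls.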

This technology provides an effectively unlimited supply of training data for our clinical platform, and also serves as a standalone product for researchers and NIPT developers worldwide.

4-Level Validation Framework

Our rigorous validation framework ensures synthetic data meets quality standards across multiple dimensions.

Level 1

Distributional Accuracy

Statistical validation ensuring synthetic fragments match real cfDNA distributions for GC content, fragment size, and genomic coverage.

92.9% match

Level 2

Z-Score Detection

Functional validation confirming synthetic aneuploid samples are detected by standard NIPT z-score algorithms at clinical thresholds.

100% T21 sensitivity

Level 3

Pipeline Compatibility

Integration testing with standard NIPT analysis pipelines to ensure synthetic data behaves as expected in downstream workflows.

Full compatibility

Level 4

Downstream Utility

Validation that models trained on synthetic data perform equivalently when deployed on real patient samples.

+10% AUC improvement
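For reference, AUC can be computed directly from classifier scores using its rank interpretation: the probability that a randomly chosen affected sample scores higher than a randomly chosen unaffected one. The sketch below is a generic illustration with made-up scores. It does not reproduce the figure above, whose baseline is not specified here.

```python
def auc(scores_pos, scores_neg):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (positive, negative) pairs in which the
    positive sample has the higher score; ties count as half.
    """
    if not scores_pos or not scores_neg:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Made-up scores for affected and unaffected samples.
affected = [0.91, 0.78, 0.66, 0.40]
unaffected = [0.35, 0.52, 0.10, 0.22]
print(f"AUC = {auc(affected, unaffected):.3f}")
```

The pairwise loop is quadratic in the number of samples, which is fine for illustration; a sort-based implementation is the usual choice for large evaluation sets.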

Synthetic Data Validation Results

  • T21 sensitivity: 100%, at a clinical z-score threshold of 3.0
  • Specificity: 100%, zero false positives in euploid samples
  • Distribution match: 92.9%, statistical similarity to real cfDNA
  • Conditions modelled: 107, covering trisomies, SCAs, microdeletions, microduplications, monogenic, oncology, and repeat expansions
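Sensitivity and specificity at a z-score threshold follow from a simple count of calls against known labels. The sketch below shows the arithmetic with made-up z-scores. The function name and the choice to call a sample positive when its z-score is greater than or equal to the threshold are assumptions.

```python
Z_THRESHOLD = 3.0  # clinical threshold quoted above


def sensitivity_specificity(z_scores, is_t21, threshold=Z_THRESHOLD):
    """Return (sensitivity, specificity) for T21 calls at a z-score threshold.

    A sample is called positive when z >= threshold (an assumption).
    """
    tp = fp = tn = fn = 0
    for z, affected in zip(z_scores, is_t21):
        called = z >= threshold
        if affected and called:
            tp += 1
        elif affected:
            fn += 1
        elif called:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need both affected and unaffected samples")
    return tp / (tp + fn), tn / (tn + fp)


# Made-up z-scores: three T21 samples followed by three euploid samples.
z = [8.2, 5.1, 3.4, 0.3, -1.2, 2.1]
labels = [True, True, True, False, False, False]
sens, spec = sensitivity_specificity(z, labels)
print(f"sensitivity = {sens:.0%}, specificity = {spec:.0%}")
```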

Publications

Research publications are forthcoming in 2026. We are preparing manuscripts detailing our foundation model architecture, synthetic data validation, and clinical performance results.

Preprint expected H2 2026.

Commitment to Open Source

We believe in advancing prenatal genomics through open collaboration. Our synthetic data generation tools will be made available to the research community, enabling reproducible science and accelerating innovation worldwide.


Explore Our Technology

Learn more about our clinical platform or start using our synthetic data today.