Our Approach
Foundation Models for Prenatal Genomics
We combine transformer-based foundation models with advanced synthetic data generation to deliver earlier, more accurate prenatal genetic screening.
Clinical Platform
Foundation Model Architecture
Our clinical prenatal testing platform is built on a transformer-based foundation model trained on massive cfDNA datasets. Unlike the traditional statistical approaches first developed in 2008, our AI learns complex patterns across multiple data modalities.
The model uses multi-head attention to capture long-range dependencies in genomic data, with the goal of detecting subtle signals at foetal fractions as low as 1-2%. This would be a fundamental advance over z-score methods, which typically require a foetal fraction of 4% or more for reliable results.
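To show why foetal fraction limits z-score methods, the sketch below computes the classic chromosome-fraction z-score for a trisomic sample. It is not our model. The baseline chr21 read fraction, the 0.5% coefficient of variation and the z > 3 cut-off are assumed, illustrative values; the (1 + ff/2) scaling follows from the foetus contributing one extra chromosome copy to its share of cfDNA.

```python
# Minimal sketch of the count-based chromosome-fraction z-score.
# All reference statistics below are HYPOTHETICAL, illustrative values.


def expected_fraction(baseline: float, foetal_fraction: float, trisomic: bool) -> float:
    """Approximate read fraction for one chromosome.

    A foetal trisomy adds one extra copy to the foetal share of cfDNA, so the
    chromosome's read fraction rises by roughly a factor of (1 + ff / 2).
    Ignores renormalisation across the rest of the genome, which is small.
    """
    if trisomic:
        return baseline * (1.0 + foetal_fraction / 2.0)
    return baseline


def z_score(observed: float, ref_mean: float, ref_sd: float) -> float:
    """Standard score of a test sample against euploid reference samples."""
    return (observed - ref_mean) / ref_sd


BASELINE_CHR21 = 0.0135          # hypothetical chr21 read fraction in euploid samples
REF_SD = BASELINE_CHR21 * 0.005  # hypothetical 0.5% coefficient of variation
Z_THRESHOLD = 3.0                # commonly used calling cut-off

for ff in (0.02, 0.04, 0.08):
    obs = expected_fraction(BASELINE_CHR21, ff, trisomic=True)
    z = z_score(obs, BASELINE_CHR21, REF_SD)
    print(f"ff={ff:.0%}  z={z:.1f}  called={z > Z_THRESHOLD}")
```

Under these assumptions the z-score scales as ff / (2 × CV): about 2 at 2% foetal fraction, below the cut-off, and about 4 at 4%. Halving the foetal fraction halves the signal, which is the low-fraction regime the foundation model targets.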
Key technical innovations include:
- Multi-modal input processing: fragment size, methylation patterns, coverage depth, and clinical metadata
- Population-specific training for equitable performance across all ancestries
- Uncertainty quantification distinguishing biological from technical limitations
- Explainable AI providing confidence scores and clinical recommendations
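Separating sources of uncertainty is commonly done with an ensemble (or repeated stochastic passes) by splitting predictive entropy into an aleatoric part, the average entropy of each member's prediction, and an epistemic part, the mutual information that reflects disagreement between members. The sketch below shows that standard decomposition for a discrete call such as euploid versus trisomy. How our platform maps these components onto biological and technical limitations is not described here, so treat this as an assumed, generic technique rather than our method; the ensemble outputs are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy in nats; zero-probability classes contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def decompose_uncertainty(member_probs):
    """Split an ensemble's predictive uncertainty into two components.

    member_probs: one class-probability list per ensemble member.
    total     = entropy of the averaged prediction
    aleatoric = average entropy of the individual predictions
    epistemic = total - aleatoric (mutual information; member disagreement)
    """
    n = len(member_probs)
    n_classes = len(member_probs[0])
    mean_probs = [sum(m[k] for m in member_probs) / n for k in range(n_classes)]
    total = entropy(mean_probs)
    aleatoric = sum(entropy(m) for m in member_probs) / n
    return {"total": total, "aleatoric": aleatoric, "epistemic": total - aleatoric}


# Hypothetical [P(euploid), P(trisomy)] outputs from a two-member ensemble.
ambiguous = decompose_uncertainty([[0.5, 0.5], [0.5, 0.5]])
disagreeing = decompose_uncertainty([[0.99, 0.01], [0.01, 0.99]])
print(ambiguous)
print(disagreeing)
```

Two members that both say 50/50 give purely aleatoric uncertainty, while two confident members that disagree give mostly epistemic uncertainty.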
Data Generation
Synthetic cfDNA Technology
Our synthetic data generation uses a reference-conditioned autoregressive model (132.5M parameters) trained on 82 million real cfDNA fragments. The model generates sequences base-by-base, conditioned on the GRCh38 reference genome, producing reads that align to the reference with a 94.2% yield and pass standard NIPT detection pipelines.
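Base-by-base, reference-conditioned generation can be sketched as a sampling loop: at each position the model returns a distribution over A, C, G and T given the reference window and the bases generated so far, and one base is drawn from it. In this minimal sketch, `toy_next_base_probs` is a hypothetical, hand-written stand-in for the 132.5M-parameter network (it copies the reference with a small error rate that depends on the previous base), and the reference window is a toy sequence rather than GRCh38.

```python
import random

BASES = "ACGT"


def toy_next_base_probs(ref_window, prefix):
    """HYPOTHETICAL stand-in for the trained network.

    Returns p(next base | reference window, bases generated so far). It mostly
    copies the reference base at the current position; the error rate doubles
    right after a mismatch, so the output depends on the generated prefix.
    """
    pos = len(prefix)
    err = 0.01
    if pos > 0 and prefix[-1] != ref_window[pos - 1]:
        err *= 2
    probs = {b: err / 3 for b in BASES}
    probs[ref_window[pos]] = 1.0 - err
    return probs


def generate_fragment(ref_window, model, rng):
    """Autoregressively sample one base per reference position, left to right."""
    prefix = ""
    for _ in range(len(ref_window)):
        probs = model(ref_window, prefix)
        bases = list(probs)
        base = rng.choices(bases, weights=[probs[b] for b in bases], k=1)[0]
        prefix += base  # the next step conditions on everything sampled so far
    return prefix


reference = "ACGTTGCA" * 20  # toy 160 bp window, not real GRCh38 sequence
fragment = generate_fragment(reference, toy_next_base_probs, random.Random(42))
mismatches = sum(a != b for a, b in zip(fragment, reference))
print(f"length={len(fragment)} mismatches={mismatches}")
```

Swapping the toy function for a trained network leaves the loop unchanged; only the conditional distribution becomes learned rather than hand-written.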
The generation process provides precise control over:
- Foetal fraction: 1-25% with continuous control
- Karyotype: all common trisomies, sex chromosome aneuploidies, and microdeletions
- Fragment characteristics: size distribution, GC content, coverage patterns
- Sample depth: 1M to 16M fragments per sample
This technology supplies effectively unlimited training data for our clinical platform and also serves as a standalone product for researchers and NIPT developers worldwide.
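One way to picture the foetal fraction and karyotype controls is as a mixture: each fragment is foetal with probability equal to the foetal fraction, and a trisomy gives the affected chromosome three copies instead of two in the foetal component. The sketch below is a simplified illustration, not our generator: it samples only chromosome labels and fragment sizes for three chromosomes, weights them by approximate GRCh38 lengths, and assumes size modes of about 143 bp for foetal and 166 bp for maternal fragments with illustrative spreads.

```python
import random

# Approximate GRCh38 chromosome lengths in Mb, used as toy sampling weights.
# Only three chromosomes are modelled to keep the sketch short.
CHROM_MB = {"chr1": 248.96, "chr18": 80.37, "chr21": 46.71}

# Assumed fragment-size parameters (mean, sd) in bp; illustrative values only.
FOETAL_SIZE = (143, 15)
MATERNAL_SIZE = (166, 20)


def sample_fragments(n, foetal_fraction, trisomy=None, seed=0):
    """Return n (chromosome, size_bp, is_foetal) tuples from a two-component mixture."""
    rng = random.Random(seed)
    chroms = list(CHROM_MB)
    # Maternal genome: two copies of every chromosome.
    maternal_w = [2 * CHROM_MB[c] for c in chroms]
    # Foetal genome: three copies of the trisomic chromosome, two of the rest.
    foetal_w = [(3 if c == trisomy else 2) * CHROM_MB[c] for c in chroms]
    fragments = []
    for _ in range(n):
        is_foetal = rng.random() < foetal_fraction
        weights = foetal_w if is_foetal else maternal_w
        chrom = rng.choices(chroms, weights=weights, k=1)[0]
        mean, sd = FOETAL_SIZE if is_foetal else MATERNAL_SIZE
        size = max(50, round(rng.gauss(mean, sd)))  # clamp implausibly short draws
        fragments.append((chrom, size, is_foetal))
    return fragments


def chrom_share(fragments, chrom):
    """Fraction of fragments assigned to one chromosome."""
    return sum(c == chrom for c, _, _ in fragments) / len(fragments)


euploid = sample_fragments(20000, foetal_fraction=0.10, seed=1)
t21 = sample_fragments(20000, foetal_fraction=0.10, trisomy="chr21", seed=1)
print(f"chr21 share: euploid={chrom_share(euploid, 'chr21'):.4f}  T21={chrom_share(t21, 'chr21'):.4f}")
```

In this toy mixture, raising the foetal fraction or adding a trisomy shifts the chr21 share of fragments upward, which is the signal count-based NIPT tools look for.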
Validation
4-Level Validation Framework
Our validation framework tests synthetic data at four levels: distributional similarity, detection by a clinical-grade NIPT tool, train-synthetic/test-real performance, and augmentation value.
Distributional Similarity
Chromosome distribution correlation r = 0.999, GC content 41.2%, and 17 of 22 chromosomes pass Kolmogorov-Smirnov (KS) tests. A PCA variance ratio of 0.64 indicates no mode collapse.
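Both headline statistics can be computed with a few lines of standard-library Python: Pearson's r between per-chromosome read fractions of real and synthetic data, and the two-sample Kolmogorov-Smirnov statistic D, the largest gap between two empirical distribution functions. This is a sketch; the pass criterion shown (D below the asymptotic critical value at α = 0.05) is an assumption about what "pass" means here, and in practice `scipy.stats.ks_2samp` also returns a p-value.

```python
import math
from bisect import bisect_right


def pearson_r(x, y):
    """Pearson correlation, e.g. between real and synthetic per-chromosome fractions."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ks_statistic(a, b):
    """Two-sample KS statistic: max |ECDF_a(v) - ECDF_b(v)| over all sample points."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for v in a + b:  # the supremum is attained at one of the observed values
        fa = bisect_right(a, v) / len(a)
        fb = bisect_right(b, v) / len(b)
        d = max(d, abs(fa - fb))
    return d


def ks_critical_value(n, m, alpha=0.05):
    """Large-sample critical value c(alpha) * sqrt((n + m) / (n * m))."""
    c = math.sqrt(-0.5 * math.log(alpha / 2))
    return c * math.sqrt((n + m) / (n * m))


def ks_passes(a, b, alpha=0.05):
    """Assumed pass rule: the samples are not distinguishable at level alpha."""
    return ks_statistic(a, b) < ks_critical_value(len(a), len(b), alpha)
```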
r = 0.999
WisecondorX Detection
The clinical-grade NIPT tool WisecondorX detects all synthetic trisomies (T21, T18, T13) at 8% foetal fraction, with z-scores above 4.7.
100% detection
Train Synthetic, Test Real
A classifier trained on synthetic data detects real T21 in 26 clinical samples (Lun et al. 2014, PRJNA215135), including cases that standard NIPT misses.
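AUC equals the probability that a randomly chosen positive (T21) sample scores higher than a randomly chosen negative one, with ties counting half. The sketch below computes it directly from that definition and attaches a label-permutation p-value. The test behind p = 0.002 is not stated here, so the permutation test is an assumed, illustrative choice, and the example scores are hypothetical.

```python
import random


def auc(pos_scores, neg_scores):
    """AUC from its definition: P(positive scores higher), ties counted as 0.5."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def auc_with_permutation_p(scores, labels, n_perm=10000, seed=0):
    """Observed AUC plus a one-sided label-permutation p-value (assumed test)."""
    rng = random.Random(seed)

    def auc_for(lbls):
        pos = [s for s, y in zip(scores, lbls) if y]
        neg = [s for s, y in zip(scores, lbls) if not y]
        return auc(pos, neg)

    observed = auc_for(labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between scores and labels
        if auc_for(shuffled) >= observed:
            hits += 1
    # +1 in numerator and denominator keeps the estimate strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical classifier scores and labels (1 = T21), for illustration only.
scores = [0.91, 0.85, 0.40, 0.77, 0.30, 0.22, 0.65, 0.15]
labels = [1, 1, 1, 0, 0, 0, 1, 0]
print(auc_with_permutation_p(scores, labels, n_perm=2000))
```

With a cohort of 26 samples the pairwise O(n·m) count is instant; for large cohorts a rank-based formula is preferable.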
AUC = 0.87, p = 0.002
Augmentation Value
Adding synthetic data to a small set of real samples improves AUC by up to 19.7 percentage points. With only 3 real training samples, the augmented model outperforms the real-only model in 88% of splits.
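The augmentation figures compare two AUCs per train/test split: one from a model trained on real samples only and one from a model trained on the same real samples plus synthetic data. A percentage-point (pp) gain is the AUC difference times 100, and this sketch counts a split as a win when the augmented AUC is strictly higher. That protocol is an assumption; `summarise_augmentation` and the example AUC pairs are hypothetical.

```python
from statistics import mean


def summarise_augmentation(auc_pairs):
    """Summarise (real_only_auc, augmented_auc) pairs, one pair per split.

    Gains are reported in percentage points (AUC difference x 100); a split is
    counted as a win when the augmented AUC is strictly higher (assumed rule).
    """
    gains_pp = [(aug - real) * 100 for real, aug in auc_pairs]
    wins = sum(1 for g in gains_pp if g > 0)
    return {
        "mean_gain_pp": mean(gains_pp),
        "max_gain_pp": max(gains_pp),
        "win_rate": wins / len(gains_pp),
    }


# Hypothetical AUC pairs for three splits; not our results.
splits = [(0.61, 0.78), (0.70, 0.74), (0.82, 0.80)]
print(summarise_augmentation(splits))
```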
+19.7pp boost
Results
Synthetic Data Validation Results
Research
Publications
Research publications are forthcoming in 2026. We are preparing manuscripts detailing our foundation model architecture, synthetic data validation, and clinical performance results.
Preprint expected H2 2026.
Open Science
Commitment to Open Source
We believe in advancing prenatal genomics through open collaboration. Our synthetic data generation tools will be made available to the research community, enabling reproducible science and accelerating innovation worldwide.
Explore Our Technology
Learn more about our clinical platform or start using our synthetic data today.