Available Now

Synthetic cfDNA That Works

Unlimited biologically accurate cfDNA data for NIPT algorithm development, validation, and research. Off-the-shelf datasets or fully customised generation.

0.87 | AUC vs Real T21 | Train synthetic, test on karyotyped samples (0.98 excl. anomaly)
100% | WisecondorX Detection | T21, T18, T13 all detected at FF=8%
+19.7pp | Augmentation Boost | AUC improvement with only 3 real samples
p = 0.002 | Statistical Significance | Permutation test, 1,000 resamples

Validated against 26 real clinical cfDNA samples (6 T21 patients, 20 euploid) from Lun et al. 2014, PRJNA215135.

Detecting Real T21 With Synthetic Training Data

We trained a classifier entirely on synthetic cfDNA, with zero real patient data in training, then tested it blind on 26 real clinical samples from an independent dataset (Lun et al. 2014, PRJNA215135), including 6 karyotype-confirmed trisomy 21 cases. This is a train-synthetic, test-real (TSTR) evaluation.

Per-Patient Results

4 of 6 T21 samples fall below the standard clinical NIPT threshold (z-score < 3.0). Our synthetic-trained model ranks two of them, NIPD-07 and NIPD-50, above every euploid sample, and a third, NIPD-04, above 18 of the 20.

Patient | Z-Score (real ref) | Standard NIPT | TSTR Detection
NIPD-03 | 5.45 | Detected | Above all euploid
NIPD-60 | 3.43 | Borderline | Above all euploid
NIPD-07 | 2.47 | FAIL | Above all euploid
NIPD-50 | 1.57 | FAIL | Above all euploid
NIPD-04 | 1.29 | FAIL | Below 2 euploid
NIPD-66 | 0.86 | FAIL (anomaly) | Below 14 euploid
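
The z-score column follows the usual chromosome-fraction approach: a sample's chr21 read fraction is compared with the mean and standard deviation of a euploid reference set. A minimal sketch, assuming that conventional definition and the 3.0 cutoff quoted above; the function names are illustrative, and the "Borderline" label in the table is not modelled here.

```python
from statistics import mean, stdev

def chr21_z_score(sample_fraction, reference_fractions):
    """Z-score of a sample's chr21 read fraction against euploid references."""
    mu = mean(reference_fractions)
    sigma = stdev(reference_fractions)  # sample standard deviation
    return (sample_fraction - mu) / sigma

def below_threshold(z_scores, threshold=3.0):
    """Return the sample IDs whose z-score is under the detection cutoff."""
    return [sample for sample, z in z_scores.items() if z < threshold]

# Z-scores copied from the per-patient table above.
table_z = {
    "NIPD-03": 5.45, "NIPD-60": 3.43, "NIPD-07": 2.47,
    "NIPD-50": 1.57, "NIPD-04": 1.29, "NIPD-66": 0.86,
}
sub_threshold = below_threshold(table_z)

# Made-up reference fractions, only to show how chr21_z_score is called.
hypothetical_reference = [0.0130, 0.0131, 0.0129, 0.0130, 0.0132, 0.0128]
example_z = chr21_z_score(0.0135, hypothetical_reference)
```

Applied to the table, `sub_threshold` lists the four samples that a z < 3.0 rule would miss.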

Summary Metrics

Metric | Value | Notes
TSTR AUC | 0.867 | 6 T21 vs 20 euploid, trained entirely on synthetic
Adjusted AUC | 0.980 | Excluding NIPD-66 (genuine anomaly, FF ~1-2%)
Augmented AUC | 0.942 | Real + synthetic outperforms real-only (0.883) by +5.8pp
Permutation p-value | 0.002 | 2/1000 random label shuffles matched observed AUC
Sub-threshold detection | 4/4 | T21 cases that FAIL standard NIPT correctly ranked
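
A label-shuffling permutation test of this kind can be sketched as follows. This is an illustration under stated assumptions, not the exact procedure used here: it assumes a shuffle "matches" when its AUC is at least the observed one, and the demo scores are made up so that their rank order mirrors the per-patient table.

```python
import random

def auc(scores, labels):
    """Fraction of (T21, euploid) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def permutation_p_value(scores, labels, n_resamples=1000, seed=0):
    """Share of random label shuffles whose AUC reaches the observed AUC."""
    rng = random.Random(seed)
    observed = auc(scores, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(shuffled)
        if auc(scores, shuffled) >= observed:
            hits += 1
    return observed, hits / n_resamples

# Hypothetical classifier scores: 20 euploid, then 6 T21.
euploid_scores = [i / 100 for i in range(1, 21)]
t21_scores = [0.90, 0.80, 0.70, 0.60, 0.185, 0.065]
demo_scores = euploid_scores + t21_scores
demo_labels = [0] * 20 + [1] * 6

observed_auc, p_value = permutation_p_value(demo_scores, demo_labels)
```

Reading the result as hits divided by resamples is consistent with 2 matches in 1,000 shuffles giving p = 0.002.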

Key Finding

NIPD-66 is a genuine anomaly — chr21 fraction only 0.027pp above the euploid mean, implying fetal fraction ~1-2%. This sample would fail any chromosome-fraction NIPT method. Excluding it, the adjusted AUC is 0.980 and 4 of 5 T21 samples rank above all 20 euploid samples.
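
Both AUC figures can be reproduced from the per-patient table alone, because AUC equals the share of (T21, euploid) pairs in which the T21 sample ranks higher. A short worked check, assuming no tied scores:

```python
def auc_from_misranks(euploid_above_per_case, n_euploid):
    """AUC from the number of euploid samples ranked above each T21 case."""
    n_pairs = len(euploid_above_per_case) * n_euploid
    misranked = sum(euploid_above_per_case)
    return (n_pairs - misranked) / n_pairs

# "Below N euploid" counts from the TSTR Detection column; 0 means above all.
euploid_above = {
    "NIPD-03": 0, "NIPD-60": 0, "NIPD-07": 0,
    "NIPD-50": 0, "NIPD-04": 2, "NIPD-66": 14,
}

# All 6 cases: 120 pairs, 16 misranked.
full_auc = auc_from_misranks(list(euploid_above.values()), n_euploid=20)

# Without NIPD-66: 100 pairs, 2 misranked.
adjusted_auc = auc_from_misranks(
    [count for sample, count in euploid_above.items() if sample != "NIPD-66"],
    n_euploid=20,
)
```

Here `full_auc` is 104/120, which rounds to 0.867, and `adjusted_auc` is 98/100 = 0.980.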

WisecondorX Trisomy Detection

We ran synthetic trisomy samples through WisecondorX, the clinical-grade NIPT tool used in European laboratories, with 50 synthetic euploid samples forming the reference panel.

Condition | Samples | Z-Score Range | Detection
Trisomy 21 | 3 | 4.7 - 6.5 | Rank 1 (all)
Trisomy 18 | 3 | 6.2 - 8.2 | Rank 1 (all)
Trisomy 13 | 3 | 7.5 - 9.0 | Rank 1-2
Euploid controls | 50 | -1.2 - 1.1 | 100% specificity

Synthetic Data Rescues Low-Data Regimes

When real training samples are scarce, which is the reality for rare aneuploidies, adding synthetic data improves classifier AUC by up to 19.7 percentage points. Each condition was evaluated over 100 random stratified splits.

N Real Samples | Real-Only AUC | Augmented AUC | Improvement | Win Rate
3 | 0.661 | 0.858 | +0.197 | 88%
5 | 0.691 | 0.832 | +0.141 | 83%
7 | 0.725 | 0.887 | +0.163 | 87%
10 | 0.768 | 0.877 | +0.109 | 75%
20 | 0.898 | 0.944 | +0.046 | 26%

Practical Impact

With only 3 real training samples, synthetic augmentation improves AUC by +19.7 points (0.661 → 0.858) and wins 88% of random splits. Augmented models also have consistently lower variance — synthetic data stabilises training.
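
The improvement figures are simple differences between the augmented and real-only AUCs, expressed in percentage points. A small check against the table, with the caveat that differences recomputed from rounded AUCs can differ from the published column in the last digit:

```python
# (n_real, real_only_auc, augmented_auc, listed_improvement) from the table.
rows = [
    (3, 0.661, 0.858, 0.197),
    (5, 0.691, 0.832, 0.141),
    (7, 0.725, 0.887, 0.163),
    (10, 0.768, 0.877, 0.109),
    (20, 0.898, 0.944, 0.046),
]

def gain_pp(real_only, augmented):
    """AUC gain in percentage points, rounded to one decimal place."""
    return round((augmented - real_only) * 100, 1)

gains = {n: gain_pp(real_only, augmented) for n, real_only, augmented, _ in rows}

for n, _, _, listed in rows:
    print(f"N={n:>2}: recomputed {gains[n]:.1f} pp, listed {listed * 100:.1f} pp")
```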

Download a Free T21 Sample

1M fragments, 15% fetal fraction, paired-end FASTQ. Evaluate in your own pipeline — no sign-up required.

Paired-end FASTQ | ~252 MB | Trisomy 21
Download R1 Download R2
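
A quick sanity check after downloading is to count the records in each file and confirm that R1 and R2 agree. The sketch below assumes standard four-line FASTQ records; the file names in the usage note are placeholders, not the actual download names.

```python
import gzip

def count_records(lines):
    """Count FASTQ records in an iterable of lines (4 lines per record)."""
    n_lines = sum(1 for _ in lines)
    if n_lines % 4 != 0:
        raise ValueError(f"{n_lines} lines is not a multiple of 4")
    return n_lines // 4

def count_fastq_file(path):
    """Count records in a plain or gzip-compressed FASTQ file."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        return count_records(handle)

def mates_match(r1_path, r2_path):
    """True when both mate files hold the same number of records."""
    return count_fastq_file(r1_path) == count_fastq_file(r2_path)

# Tiny in-memory example with two records.
demo_lines = [
    "@frag1/1\n", "ACGTACGT\n", "+\n", "FFFFFFFF\n",
    "@frag2/1\n", "TTGCAAGC\n", "+\n", "FFFFFFFF\n",
]
demo_count = count_records(demo_lines)
```

For the download, a call such as `mates_match("sample_R1.fastq.gz", "sample_R2.fastq.gz")` should return True; if each fragment corresponds to one read pair, each file would hold about 1,000,000 records.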

Choose Your Data Package

Off-the-shelf datasets for immediate use, or custom generation to your exact specifications.

Standard Dataset

Ready-to-use cfDNA samples

Contact Us

  • 1M-10M fragments per sample
  • Common aneuploidies: T21, T18, T13 + euploid controls
  • Configurable fetal fractions
  • Delivered as paired-end FASTQ or aligned BAM
  • Ground truth labels for all samples

Research Partnership

For academic institutions

Contact Us

  • Everything in Custom Generation
  • Academic pricing available
  • Co-authorship opportunities
  • Technical collaboration
  • Publication support

Proof That It Works

Our synthetic cfDNA has been validated end-to-end: generated sequences are aligned to GRCh38, then tested against real karyotype-confirmed clinical samples.

01

Alignment & Realism

94.2% of generated reads align at MAPQ ≥ 30. Insert size peaks at 166bp (mono-nucleosome). Error rate 0.26%, within 2x of real cfDNA.
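
To reproduce these two metrics on your own alignment, one option is to scan SAM-format records. A minimal sketch, assuming the denominator is all primary reads (mapped or not) and that insert size is taken from the TLEN column of the leftmost mate; both choices are assumptions rather than the documented method.

```python
from collections import Counter

def alignment_summary(sam_lines, min_mapq=30):
    """Return (fraction of primary reads at MAPQ >= min_mapq, modal insert size)."""
    total = 0
    passing = 0
    insert_sizes = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        flag, mapq, tlen = int(fields[1]), int(fields[4]), int(fields[8])
        if flag & 0x900:  # skip secondary (0x100) and supplementary (0x800)
            continue
        total += 1
        if not (flag & 0x4) and mapq >= min_mapq:  # 0x4 = unmapped
            passing += 1
        if tlen > 0:  # positive TLEN appears once per pair
            insert_sizes[tlen] += 1
    fraction = passing / total if total else 0.0
    mode = insert_sizes.most_common(1)[0][0] if insert_sizes else None
    return fraction, mode

def make_pair(name, pos, mapq, tlen):
    """Build two toy SAM records for one properly paired fragment."""
    mate_pos = pos + tlen - 50
    first = f"{name}\t99\tchr1\t{pos}\t{mapq}\t50M\t=\t{mate_pos}\t{tlen}\t*\t*"
    second = f"{name}\t147\tchr1\t{mate_pos}\t{mapq}\t50M\t=\t{pos}\t{-tlen}\t*\t*"
    return [first, second]

toy_sam = (["@HD\tVN:1.6"] + make_pair("r1", 100, 60, 166)
           + make_pair("r2", 500, 10, 190) + make_pair("r3", 900, 60, 166))
toy_fraction, toy_mode = alignment_summary(toy_sam)
```

On real data the input lines could come from `samtools view -h` piped into the script.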

94.2% MAPQ ≥ 30

02

Train Synthetic, Test Real

Classifier trained entirely on synthetic data detects real karyotyped T21 samples. Tested on 26 real clinical samples from PRJNA215135 (Lun et al. 2014). Detects T21 cases that standard NIPT would miss.

AUC = 0.87, p = 0.002

03

Clinical-Grade Detection

WisecondorX analysis detects all synthetic trisomies (T21, T18, T13) at 8% fetal fraction. Augmenting real data with synthetic improves AUC by up to +19.7 points.

100% detection, all trisomies

04

Statistical Rigour

Permutation testing (1,000 resamples) and bootstrap CIs confirm results are not due to chance. Low-data augmentation tested across 100 random splits per condition.

95% CI: [0.60, 1.00]
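
A percentile bootstrap interval of this kind can be sketched as below. The text does not say which quantity the interval refers to, so this example assumes it is the AUC. It also resamples within each class so that every replicate keeps both T21 and euploid samples, which is a design choice of the sketch. The scores are made up.

```python
import random

def pair_auc(pos, neg):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc_ci(pos, neg, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for AUC, resampling each class with replacement."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        boot_pos = rng.choices(pos, k=len(pos))
        boot_neg = rng.choices(neg, k=len(neg))
        stats.append(pair_auc(boot_pos, boot_neg))
    stats.sort()
    lower = stats[int(n_boot * alpha / 2)]
    upper = stats[min(n_boot - 1, int(n_boot * (1 - alpha / 2)))]
    return lower, upper

# Hypothetical scores: 6 T21 cases and 20 euploid samples.
t21 = [0.90, 0.80, 0.70, 0.60, 0.185, 0.065]
euploid = [i / 100 for i in range(1, 21)]
ci_low, ci_high = bootstrap_auc_ci(t21, euploid)
```

With only six positive samples the interval is expected to be wide, which is typical for a cohort of this size.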

What You Can Build

Algorithm Development

Train and validate NIPT detection algorithms with unlimited labelled data. Test edge cases like low fetal fraction that are rare in real datasets.

Reference Panel Generation

Generate synthetic euploid reference panels for z-score based NIPT pipelines. WisecondorX achieves 100% trisomy detection using a purely synthetic reference.

Pipeline Validation

Drop synthetic BAMs directly into your existing bioinformatics pipeline. Validate end-to-end from FASTQ through alignment to variant calling.

Privacy-Compliant Research

Conduct research without patient data concerns. Synthetic data contains no identifiable information.

Education & Training

Train clinical scientists and bioinformaticians with realistic data. Perfect for courses and workshops.

Benchmark Creation

Create standardised benchmarks with known ground truth for comparing NIPT methods across laboratories.

What's Included

Parameter | Standard Dataset | Custom Generation
Fragments per sample | 1M - 10M | Configurable
Fragment length range | 50-250 bp | 50-250 bp
Fetal fraction | Configurable | 1% - 25%
Conditions | T21, T18, T13 + euploid controls | 107 including SCAs, microdeletions, microduplications, monogenic, oncology
Output format | Paired-end FASTQ or aligned BAM (GRCh38) | FASTQ, BAM, HDF5, or custom
Metadata | JSON per sample (FF, condition, params) | Full provenance tracking
Ground truth labels | Yes | Yes
Delivery | Secure download link | Secure download or cloud storage
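
The per-sample JSON metadata could be consumed along these lines. The actual schema is not published on this page, so every field name below (`sample_id`, `condition`, `fetal_fraction`, `params`) is a hypothetical placeholder chosen to match the "FF, condition, params" description.

```python
import json

# Hypothetical metadata document; real field names may differ.
EXAMPLE_METADATA = """
{
  "sample_id": "example-001",
  "condition": "T21",
  "fetal_fraction": 0.15,
  "params": {"n_fragments": 1000000, "reference": "GRCh38"}
}
"""

REQUIRED_FIELDS = ("sample_id", "condition", "fetal_fraction", "params")

def load_sample_metadata(text):
    """Parse one metadata document and run basic consistency checks."""
    meta = json.loads(text)
    missing = [field for field in REQUIRED_FIELDS if field not in meta]
    if missing:
        raise KeyError(f"missing metadata fields: {missing}")
    if not 0.0 <= meta["fetal_fraction"] <= 1.0:
        raise ValueError("fetal_fraction should be a proportion between 0 and 1")
    return meta

def ground_truth_label(meta):
    """Binary label for classifier training: 1 for any non-euploid condition."""
    return 0 if meta["condition"].lower() == "euploid" else 1

example_meta = load_sample_metadata(EXAMPLE_METADATA)
example_label = ground_truth_label(example_meta)
```

Keeping the label derivation in one helper makes it easy to adapt once the real field names are known.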

Book a 15-min Demo

See the data, ask questions, and find the right package for your needs.