Available Now
Synthetic cfDNA That Works
Unlimited biologically accurate cfDNA data for NIPT algorithm development, validation, and research. Off-the-shelf datasets or fully customised generation.
Validated against 26 real clinical cfDNA samples (6 T21 patients, 20 euploid) from Lun et al. 2014, PRJNA215135.
Train Synthetic, Test Real
Detecting Real T21 With Synthetic Training Data
We trained a classifier entirely on synthetic cfDNA — zero real patient data in training. Then tested it blind on 26 real clinical samples from an independent dataset (Lun et al. 2014, PRJNA215135), including 6 karyotype-confirmed trisomy 21 cases.
Per-Patient Results
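4 of 6 T21 samples would fail standard clinical NIPT (z-score < 3.0). Our synthetic-trained model ranks two of them (NIPD-07 and NIPD-50) above all 20 euploid samples; NIPD-04 falls below 2 euploid samples and NIPD-66 below 14.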
| Patient | Z-Score (real ref) | Standard NIPT | TSTR Detection |
|---|---|---|---|
| NIPD-03 | 5.45 | Detected | Above all euploid |
| NIPD-60 | 3.43 | Borderline | Above all euploid |
| NIPD-07 | 2.47 | FAIL | Above all euploid |
| NIPD-50 | 1.57 | FAIL | Above all euploid |
| NIPD-04 | 1.29 | FAIL | Below 2 euploid |
| NIPD-66 | 0.86 | FAIL (anomaly) | Below 14 euploid |
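The "Standard NIPT" column follows from a z-score cut-off. A minimal sketch, assuming the z-score is the sample's chr21 read fraction standardised against a euploid reference panel (the exact statistic behind the table is not stated here, and `chr21_z_score` is an illustrative name). The z-scores are copied from the table above.

```python
from statistics import mean, stdev

Z_THRESHOLD = 3.0  # cut-off quoted in the text above


def chr21_z_score(sample_fraction, reference_fractions):
    """Z-score of one sample's chr21 read fraction against euploid references.

    Assumption: sample standard deviation of the reference panel is used.
    """
    return (sample_fraction - mean(reference_fractions)) / stdev(reference_fractions)


# Z-scores (real reference) copied from the per-patient table.
table_z = {
    "NIPD-03": 5.45,
    "NIPD-60": 3.43,
    "NIPD-07": 2.47,
    "NIPD-50": 1.57,
    "NIPD-04": 1.29,
    "NIPD-66": 0.86,
}

# Samples that a z >= 3.0 rule would not call as T21.
sub_threshold = [patient for patient, z in table_z.items() if z < Z_THRESHOLD]
print(sub_threshold)
```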
Summary Metrics
| Metric | Value | Notes |
|---|---|---|
| TSTR AUC | 0.867 | 6 T21 vs 20 euploid, trained entirely on synthetic |
| Adjusted AUC | 0.980 | Excluding NIPD-66 (genuine anomaly, FF ~1-2%) |
| Augmented AUC | 0.942 | Real + synthetic outperforms real-only (0.883) by +5.8pp |
| Permutation p-value | 0.002 | 2/1000 random label shuffles matched or exceeded the observed AUC |
| Sub-threshold detection | 2/4 | T21 cases that fail standard NIPT (z < 3.0) yet rank above all 20 euploid samples (NIPD-07, NIPD-50) |
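The two TSTR AUC values can be recovered from the per-patient ranks, because AUC equals the fraction of (T21, euploid) pairs in which the T21 sample scores higher. A sketch, assuming no tied scores and reading "Above all euploid" as zero euploid samples ranked higher; `auc_from_ranks` is an illustrative helper, not part of any delivered tooling.

```python
N_EUPLOID = 20

# Number of euploid samples ranked above each T21 sample (per-patient table).
euploid_above = {
    "NIPD-03": 0,
    "NIPD-60": 0,
    "NIPD-07": 0,
    "NIPD-50": 0,
    "NIPD-04": 2,
    "NIPD-66": 14,
}


def auc_from_ranks(above_counts, n_negative):
    """AUC as the share of (positive, negative) pairs won by the positive."""
    counts = list(above_counts)
    wins = sum(n_negative - k for k in counts)
    return wins / (len(counts) * n_negative)


full_auc = auc_from_ranks(euploid_above.values(), N_EUPLOID)

# Adjusted AUC: same calculation with NIPD-66 left out.
without_66 = [k for patient, k in euploid_above.items() if patient != "NIPD-66"]
adjusted_auc = auc_from_ranks(without_66, N_EUPLOID)

print(round(full_auc, 3), round(adjusted_auc, 3))
```

Both values agree with the TSTR AUC and Adjusted AUC rows when rounded to three decimals.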
Key Finding
NIPD-66 is a genuine anomaly — chr21 fraction only 0.027pp above the euploid mean, implying fetal fraction ~1-2%. This sample would fail any chromosome-fraction NIPT method. Excluding it, the adjusted AUC is 0.980 and 4 of 5 T21 samples rank above all 20 euploid samples.
Clinical Pipeline
WisecondorX Trisomy Detection
Synthetic trisomy samples were run through WisecondorX, a clinical-grade NIPT tool used in European laboratories, with 50 synthetic euploid samples forming the reference panel.
| Condition | Samples | Z-Score Range | Detection |
|---|---|---|---|
| Trisomy 21 | 3 | 4.7 to 6.5 | Rank 1 (all) |
| Trisomy 18 | 3 | 6.2 to 8.2 | Rank 1 (all) |
| Trisomy 13 | 3 | 7.5 to 9.0 | Rank 1-2 |
| Euploid controls | 50 | -1.2 to 1.1 | 100% specificity |
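A run of this kind chains three WisecondorX subcommands: `convert` (BAM to NPZ), `newref` (build the reference panel) and `predict` (score a test sample). The sketch below only builds the command lines and executes nothing. File names and the `wisecondorx_commands` helper are hypothetical, and the options actually used for the results above are not stated, so check `WisecondorX --help` for your installed version.

```python
from pathlib import Path


def wisecondorx_commands(reference_bams, test_bam, workdir="wcx_out"):
    """Return argv lists for convert -> newref -> predict (not executed)."""
    work = Path(workdir)

    def npz(bam):
        # Hypothetical naming scheme: sample.bam -> <workdir>/sample.npz
        return str(work / (Path(bam).stem + ".npz"))

    commands = [
        ["WisecondorX", "convert", str(bam), npz(bam)]
        for bam in [*reference_bams, test_bam]
    ]
    reference = str(work / "synthetic_reference.npz")
    # --nipt marks the reference as intended for NIPT use.
    commands.append(
        ["WisecondorX", "newref", *[npz(b) for b in reference_bams], reference, "--nipt"]
    )
    # predict needs at least one output format; --bed writes tabular results.
    output_prefix = str(work / Path(test_bam).stem)
    commands.append(
        ["WisecondorX", "predict", npz(test_bam), reference, output_prefix, "--bed"]
    )
    return commands


for argv in wisecondorx_commands(["euploid_01.bam", "euploid_02.bam"], "t21_01.bam"):
    print(" ".join(argv))
```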
Augmentation Value
Synthetic Data Rescues Low-Data Regimes
When real training samples are scarce, as is typical for rare aneuploidies, adding synthetic data raises mean AUC at every sample size tested, with the largest gains at 7 or fewer real samples. Each condition was evaluated over 100 random stratified splits.
| N Real Samples | Real-Only AUC | Augmented AUC | Improvement | Win Rate |
|---|---|---|---|---|
| 3 | 0.661 | 0.858 | +0.197 | 88% |
| 5 | 0.691 | 0.832 | +0.141 | 83% |
| 7 | 0.725 | 0.887 | +0.163 | 87% |
| 10 | 0.768 | 0.877 | +0.109 | 75% |
| 20 | 0.898 | 0.944 | +0.046 | 26% |
Practical Impact
With only 3 real training samples, synthetic augmentation raises mean AUC by 0.197 (0.661 → 0.858) and wins 88% of random splits. Augmented models also have consistently lower variance: synthetic data stabilises training.
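The Improvement and Win Rate columns are paired-split summaries. A minimal sketch of how they could be computed from per-split AUCs, assuming a "win" means the augmented model has strictly higher AUC on the same split (the tie rule used for the table is not stated); the AUC values in the example are made up for illustration.

```python
from statistics import mean


def summarise_paired_splits(real_only_aucs, augmented_aucs):
    """Compare two models evaluated on the same random splits."""
    if len(real_only_aucs) != len(augmented_aucs):
        raise ValueError("both models must be scored on the same splits")
    diffs = [a - r for r, a in zip(real_only_aucs, augmented_aucs)]
    return {
        "mean_real_only": mean(real_only_aucs),
        "mean_augmented": mean(augmented_aucs),
        "mean_improvement": mean(diffs),
        # Assumption: ties do not count as wins.
        "win_rate": sum(d > 0 for d in diffs) / len(diffs),
    }


# Illustrative per-split AUCs only, not results from the study.
summary = summarise_paired_splits(
    real_only_aucs=[0.60, 0.70, 0.80, 0.90],
    augmented_aucs=[0.80, 0.75, 0.80, 0.85],
)
print(summary)
```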
Data Products
Choose Your Data Package
Off-the-shelf datasets for immediate use, or custom generation to your exact specifications.
Standard Dataset
Ready-to-use cfDNA samples
Contact Us
- 1M-10M fragments per sample
- Common aneuploidies: T21, T18, T13 + euploid controls
- Configurable fetal fractions
- Delivered as paired-end FASTQ or aligned BAM
- Ground truth labels for all samples
Custom Generation
Data to your exact specifications
Contact Us
- Custom fragment depth per sample
- Sample count tailored to your study
- Custom fetal fractions: 1%-25%
- 107 conditions available
- Output as FASTQ, BAM, or HDF5
- Priority support included
Research Partnership
For academic institutions
Contact Us
- Everything in Custom Generation
- Academic pricing available
- Co-authorship opportunities
- Technical collaboration
- Publication support
Validation
Proof That It Works
Our synthetic cfDNA has been validated end-to-end: generated sequences are aligned to GRCh38, then tested against real karyotype-confirmed clinical samples.
Alignment & Realism
94.2% of generated reads align at MAPQ ≥ 30. Insert size peaks at 166 bp (mono-nucleosome). Error rate is 0.26%, within 2× of real cfDNA.
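Two of these checks are simple to repeat on your own alignments. The sketch below assumes you have already extracted `(mapq, template_length)` pairs from a BAM, for example with a library such as pysam, and that the MAPQ fraction is taken over all records supplied; `summarise_alignment` is an illustrative helper and the toy records are not real data.

```python
from collections import Counter


def summarise_alignment(records, min_mapq=30):
    """Return (fraction of records with MAPQ >= min_mapq, modal insert size).

    records: iterable of (mapq, template_length); a template_length of 0
    means no insert size is available and is skipped for the mode.
    """
    total = 0
    high_quality = 0
    insert_sizes = Counter()
    for mapq, template_length in records:
        total += 1
        if mapq >= min_mapq:
            high_quality += 1
        if template_length:
            insert_sizes[abs(template_length)] += 1
    if total == 0:
        raise ValueError("no records supplied")
    modal_size = insert_sizes.most_common(1)[0][0] if insert_sizes else None
    return high_quality / total, modal_size


# Toy records for illustration only.
toy_records = [(60, 166), (60, -166), (12, 170), (40, 150), (0, 0)]
fraction_high_mapq, modal_insert = summarise_alignment(toy_records)
print(fraction_high_mapq, modal_insert)
```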
Train Synthetic, Test Real
Classifier trained entirely on synthetic data detects real karyotyped T21 samples. Tested on 26 real clinical samples from PRJNA215135 (Lun et al. 2014). Detects T21 cases that standard NIPT would miss.
Clinical-Grade Detection
WisecondorX analysis detects all nine synthetic trisomy samples (T21, T18, T13) at 8% fetal fraction. Augmenting real data with synthetic data improves AUC by up to 0.197.
Statistical Rigour
Permutation testing (1,000 label shuffles) and bootstrap CIs indicate the results are unlikely to be due to chance. Low-data augmentation was tested across 100 random splits per condition.
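A label-permutation test of this kind can be sketched in a few lines. The scores below are hypothetical and chosen only to reproduce the rank pattern in the per-patient table (four T21 samples above all euploid, one below 2, one below 14); the p-value is the plain fraction of shuffles whose AUC reaches the observed value, which is an assumption about how the reported figure was computed.

```python
import random


def auc(scores, labels):
    """Pairwise AUC; ties between a positive and a negative count as half."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def permutation_p_value(scores, labels, n_permutations=1000, seed=0):
    """Share of label shuffles with AUC >= the observed AUC."""
    rng = random.Random(seed)
    observed = auc(scores, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if auc(scores, shuffled) >= observed:
            hits += 1
    return observed, hits / n_permutations


# Hypothetical scores matching the published ranks, not the real model output.
euploid_scores = [float(i) for i in range(1, 21)]
t21_scores = [24.0, 23.0, 22.0, 21.0, 18.5, 6.5]
scores = euploid_scores + t21_scores
labels = [0] * len(euploid_scores) + [1] * len(t21_scores)

observed_auc, p_value = permutation_p_value(scores, labels)
print(round(observed_auc, 3), p_value)
```

The exact p-value depends on the random seed and on the number of shuffles, so a rerun will not necessarily land on the reported 0.002.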
Applications
What You Can Build
Algorithm Development
Train and validate NIPT detection algorithms with unlimited labelled data. Test edge cases like low fetal fraction that are rare in real datasets.
Reference Panel Generation
Generate synthetic euploid reference panels for z-score-based NIPT pipelines. In our tests, WisecondorX detected all nine synthetic trisomy samples using a purely synthetic reference panel.
Pipeline Validation
Drop synthetic BAMs directly into your existing bioinformatics pipeline. Validate end-to-end from FASTQ through alignment to variant calling.
Privacy-Compliant Research
Conduct research without patient data concerns. Synthetic data contains no identifiable information.
Education & Training
Train clinical scientists and bioinformaticians with realistic data. Perfect for courses and workshops.
Benchmark Creation
Create standardised benchmarks with known ground truth for comparing NIPT methods across laboratories.
Specifications
What's Included
| Parameter | Standard Dataset | Custom Generation |
|---|---|---|
| Fragments per sample | 1M - 10M | Configurable |
| Fragment length range | 50-250 bp | 50-250 bp |
| Fetal fraction | Configurable | 1% - 25% |
| Conditions | T21, T18, T13 + euploid controls | 107 including SCAs, microdeletions, microduplications, monogenic, oncology |
| Output format | Paired-end FASTQ or aligned BAM (GRCh38) | FASTQ, BAM, HDF5, or custom |
| Metadata | JSON per sample (FF, condition, params) | Full provenance tracking |
| Ground truth labels | Yes | Yes |
| Delivery | Secure download link | Secure download or cloud storage |
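Per-sample JSON metadata makes it easy to build a label table for training. The field names below (`sample_id`, `condition`, `fetal_fraction`) are hypothetical placeholders, since the actual schema is not documented here; adapt them to the files you receive.

```python
import json

# Hypothetical metadata record; the real schema may differ.
example_metadata = """
{
  "sample_id": "synthetic_0001",
  "condition": "T21",
  "fetal_fraction": 0.08
}
"""


def to_training_label(metadata_json):
    """Turn one metadata record into a flat label row."""
    meta = json.loads(metadata_json)
    return {
        "sample_id": meta["sample_id"],
        "is_trisomy_21": meta["condition"] == "T21",
        "fetal_fraction": float(meta["fetal_fraction"]),
    }


label_row = to_training_label(example_metadata)
print(label_row)
```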
Book a 15-min Demo
See the data, ask questions, and find the right package for your needs.