Every Condition, Mathematically Guaranteed
Our synthetic cfDNA platform generates conditions using deterministic mathematical relationships — not approximations. Here's how each condition category works and why you can trust the labels.
Synthetic genomic data is only as useful as the accuracy of the conditions it represents. Our generation pipeline doesn't estimate or approximate — it applies known mathematical relationships to produce signals that are correct by construction. Each of the 107 supported conditions across 6 categories maps to a precise genomic alteration with a predictable, verifiable effect on the sequencing output.
1. The Core Principle
Our pipeline separates maternal and foetal fragment generation. Maternal cfDNA is sampled at baseline coverage. Foetal cfDNA is sampled with condition-specific modifiers that adjust fragment representation across the genome. The two are mixed at a specified foetal fraction (FF).
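The resulting signal in the mixed sample follows directly from the mixing arithmetic. For a region present in $c$ foetal copies (2 in the euploid case) against a 2-copy maternal background, the expected coverage relative to baseline is:

$$\text{relative coverage} = (1 - \mathrm{FF}) + \mathrm{FF}\cdot\frac{c}{2} = 1 + \mathrm{FF}\cdot\frac{c - 2}{2}$$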
This means every condition produces a known, calculable effect. Detection algorithms applied to our synthetic data are recovering a signal whose magnitude is mathematically defined — not biologically uncertain.
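The mixing step described above can be sketched in a few lines of Python. This is an illustrative sketch rather than the platform's code: the function name `expected_relative_coverage` and its parameters are hypothetical, and it assumes a diploid (2-copy) maternal baseline.

```python
def expected_relative_coverage(foetal_fraction: float, foetal_copies: int,
                               maternal_copies: int = 2) -> float:
    """Expected coverage of a region relative to a euploid (2-copy) baseline.

    Hypothetical helper: maternal fragments make up (1 - FF) of the mix and
    foetal fragments make up FF, each weighted by copy number / 2.
    """
    if not 0.0 <= foetal_fraction <= 1.0:
        raise ValueError("foetal_fraction must be between 0 and 1")
    maternal_part = (1.0 - foetal_fraction) * maternal_copies / 2.0
    foetal_part = foetal_fraction * foetal_copies / 2.0
    return maternal_part + foetal_part


if __name__ == "__main__":
    # Compare euploid, trisomic and monosomic foetal genomes at 10% FF.
    for label, copies in [("euploid", 2), ("trisomy", 3), ("monosomy", 1)]:
        ratio = expected_relative_coverage(0.10, copies)
        print(f"{label}: {ratio:.3f}")
```

At 10% foetal fraction the three cases come out at 1.000, 1.050 and 0.950, matching the FF/2 shift described for trisomies and monosomies.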
2. How Each Condition Category Works
Aneuploidies
12 conditions: Trisomy 21 (Down syndrome), Trisomy 18 (Edwards), Trisomy 13 (Patau), Monosomy X (Turner), XXY (Klinefelter), XYY, XXX, and others.
The simplest case. A trisomy means the foetal genome carries 3 copies of a chromosome instead of 2, so the affected chromosome shows a relative coverage increase of exactly FF/2. At 10% foetal fraction, chromosome 21 in a T21 sample shows a 5% coverage increase, which at typical sequencing depths is sufficient to produce a z-score well above the clinical detection threshold of 3.0.
Monosomies work in reverse: one fewer copy produces a coverage decrease of FF/2.
This is arithmetic, not modelling. The signal magnitude is a direct consequence of copy number and mixing proportion.
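To see how an FF/2 shift translates into a z-score, here is a minimal sketch under a simplifying assumption: counting noise on the chromosome is purely Poisson, so the standard deviation of the euploid read count is its square root. Real detection pipelines typically estimate the spread from a reference cohort, so treat this as an upper bound on the achievable z-score. The function name and the read depth used in the example are hypothetical.

```python
import math


def expected_z_score(foetal_fraction: float, foetal_copies: int,
                     euploid_reads: int) -> float:
    """Expected z-score for a chromosome under a Poisson-only noise model.

    Assumption: a euploid sample yields `euploid_reads` reads on the
    chromosome, with standard deviation sqrt(euploid_reads).
    """
    if euploid_reads <= 0:
        raise ValueError("euploid_reads must be positive")
    # Relative coverage shift: +FF/2 for a trisomy, -FF/2 for a monosomy.
    relative_shift = foetal_fraction * (foetal_copies - 2) / 2.0
    expected_excess = relative_shift * euploid_reads
    return expected_excess / math.sqrt(euploid_reads)


if __name__ == "__main__":
    # 100,000 chromosome 21 reads is an illustrative depth, not a platform value.
    print(f"T21 at 10% FF: z = {expected_z_score(0.10, 3, 100_000):.1f}")
    print(f"T21 at 4% FF:  z = {expected_z_score(0.04, 3, 100_000):.1f}")
```

Under this model the z-score grows linearly with foetal fraction and with the square root of read depth, which is why low foetal fractions are the harder case.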
Microdeletions
18 conditions: 22q11.2 (DiGeorge), 5p− (Cri du Chat), 4p− (Wolf-Hirschhorn), 1p36, 15q11.2 (Prader-Willi/Angelman), 7q11.23 (Williams), and others.
The same mathematics as aneuploidies, applied to a defined genomic region rather than a whole chromosome. One copy of the specified region is absent from the foetal genome, producing a regional coverage decrease of FF/2 at the exact coordinates of the deletion. Flanking regions are unaffected.
Microduplications
7 conditions: 22q11.2, 7q11.23, 17p11.2 (Potocki-Lupski), 16p11.2, 15q11.2.
The inverse of microdeletions. An extra copy of the specified region produces a regional coverage increase of FF/2 at the exact coordinates of the duplication.
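The regional behaviour of microdeletions and microduplications can be illustrated with a binned coverage profile. This is a hedged sketch: the function name, the bin layout and the coordinates in the example are hypothetical and do not correspond to any real locus. Bins that only partly overlap the altered region receive a proportionally smaller shift.

```python
def regional_coverage_profile(n_bins: int, bin_size: int,
                              region_start: int, region_end: int,
                              foetal_fraction: float,
                              foetal_copies: int) -> list[float]:
    """Expected relative coverage per bin for a foetal-only copy number change.

    Bins outside [region_start, region_end) stay at the baseline of 1.0.
    """
    # -FF/2 for a one-copy deletion, +FF/2 for a one-copy duplication.
    shift = foetal_fraction * (foetal_copies - 2) / 2.0
    profile = []
    for i in range(n_bins):
        bin_start = i * bin_size
        bin_end = bin_start + bin_size
        # Number of bases in this bin that fall inside the altered region.
        overlap = max(0, min(bin_end, region_end) - max(bin_start, region_start))
        profile.append(1.0 + shift * overlap / bin_size)
    return profile


if __name__ == "__main__":
    # Ten 1 Mb bins with a made-up deletion spanning 3.0 Mb to 5.5 Mb at 10% FF.
    deletion = regional_coverage_profile(10, 1_000_000, 3_000_000, 5_500_000,
                                         foetal_fraction=0.10, foetal_copies=1)
    print([round(value, 3) for value in deletion])
```

Passing `foetal_copies=3` instead gives the mirror-image duplication profile, with flanking bins still at 1.0.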
Monogenic Conditions
50+ conditions: Cystic fibrosis, sickle cell anaemia, achondroplasia, Noonan syndrome, spinal muscular atrophy, Duchenne muscular dystrophy, beta-thalassaemia, and many more, covering skeletal dysplasias, RASopathies, metabolic disorders, connective tissue disorders, and oncology-relevant variants.
Rather than altering copy number, these conditions introduce specific variant alleles at known genomic positions. The variant appears only in foetal fragments at a deterministic frequency.
At 10% foetal fraction, a heterozygous achondroplasia variant (FGFR3 G380R) appears at approximately 5% VAF. A homozygous sickle cell variant (HBB Glu6Val) appears at approximately 10% VAF. These are not estimates — they are mathematical consequences of the mixing proportion and zygosity.
Maternal fragments never carry the foetal variant. There is no leakage between the two genomes.
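The variant allele frequency figures quoted above follow from a one-line calculation. The sketch below assumes a diploid locus and a maternal genome that carries no copy of the variant, as the text states; the function name and zygosity labels are hypothetical.

```python
def expected_vaf(foetal_fraction: float, zygosity: str) -> float:
    """Expected variant allele frequency of a foetal-only variant in the mix.

    Assumes a diploid locus where maternal fragments never carry the variant.
    """
    variant_copies = {"heterozygous": 1, "homozygous": 2}
    if zygosity not in variant_copies:
        raise ValueError(f"unknown zygosity: {zygosity!r}")
    # Fraction of all fragments at the locus that carry the variant allele.
    return foetal_fraction * variant_copies[zygosity] / 2.0


if __name__ == "__main__":
    print(f"heterozygous at 10% FF: {expected_vaf(0.10, 'heterozygous'):.3f}")
    print(f"homozygous at 10% FF:   {expected_vaf(0.10, 'homozygous'):.3f}")
```

The observed VAF in any finite sample will scatter around these values because of sampling noise.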
Repeat Expansions
4 conditions: Huntington disease (HTT CAG), Fragile X syndrome (FMR1 CGG), myotonic dystrophy type 1 (DMPK CTG), spinocerebellar ataxia type 1 (ATXN1 CAG).
Expanded repeat motifs are inserted into foetal fragments at the correct genomic locus, exceeding the pathogenic threshold for each condition. Maternal fragments retain normal repeat lengths.
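A check of this kind can be sketched as follows. Everything here is illustrative: the flanking sequences, repeat counts and the threshold of 40 are made-up values for demonstration, not clinical cut-offs or platform parameters, and the helper names are hypothetical.

```python
def build_allele(left_flank: str, motif: str, repeat_count: int,
                 right_flank: str) -> str:
    """Assemble a toy allele: flank + motif repeated `repeat_count` times + flank."""
    return left_flank + motif * repeat_count + right_flank


def longest_repeat_run(sequence: str, motif: str) -> int:
    """Largest number of back-to-back copies of `motif` found in `sequence`.

    Simple scan over every start position; adequate for short demo sequences.
    """
    best = 0
    for start in range(len(sequence)):
        run = 0
        pos = start
        while sequence.startswith(motif, pos):
            run += 1
            pos += len(motif)
        best = max(best, run)
    return best


def is_expanded(sequence: str, motif: str, threshold: int) -> bool:
    """True if the longest repeat run meets or exceeds the supplied threshold."""
    return longest_repeat_run(sequence, motif) >= threshold


if __name__ == "__main__":
    threshold = 40  # illustrative value only
    maternal = build_allele("TTGACC", "CAG", 20, "GGTTAA")
    foetal = build_allele("TTGACC", "CAG", 45, "GGTTAA")
    print("maternal expanded:", is_expanded(maternal, "CAG", threshold))
    print("foetal expanded:  ", is_expanded(foetal, "CAG", threshold))
```

Keeping the threshold as a caller-supplied parameter lets the same check serve each condition with its own pathogenic cut-off.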
Imprinting Disorders
2 conditions: Beckwith-Wiedemann syndrome, Russell-Silver syndrome.
These conditions involve parent-of-origin effects and uniparental disomy. The foetal genome reflects the correct parental copy number imbalance in the imprinted region, with associated fragment size signatures.
3. Verification Across All Categories
Every condition category is covered by automated verification tests that confirm:
- The affected region shows the mathematically expected signal change — and only the affected region.
- Unaffected chromosomes and genomic regions remain at baseline.
- Variant alleles appear at the correct frequency and only in foetal fragments.
- Foetal fraction in the generated sample matches the specified value within ±2%.
- Fragment-level properties — sequence composition, length distribution, GC content — remain biologically valid regardless of condition.
These tests run across multiple foetal fraction values (2% to 20%), confirming that signals scale correctly as foetal fraction changes.
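A foetal fraction sweep of this kind might look like the sketch below. It is not the platform's test suite: the function names are hypothetical, the fragment count is arbitrary, and the tolerance is interpreted as an absolute 0.02 (two percentage points), which is only one possible reading of "±2%". Each fragment is drawn as foetal with probability FF using a seeded generator so the run is reproducible.

```python
import random


def simulate_observed_ff(foetal_fraction: float, n_fragments: int,
                         rng: random.Random) -> float:
    """Draw fragments one by one and return the observed foetal share."""
    foetal = sum(1 for _ in range(n_fragments) if rng.random() < foetal_fraction)
    return foetal / n_fragments


def check_ff_sweep(ff_values, n_fragments: int = 50_000,
                   tolerance: float = 0.02, seed: int = 0) -> dict:
    """Map each requested FF to (observed FF, within-tolerance flag).

    Assumption: `tolerance` is an absolute difference in foetal fraction.
    """
    rng = random.Random(seed)
    results = {}
    for ff in ff_values:
        observed = simulate_observed_ff(ff, n_fragments, rng)
        results[ff] = (observed, abs(observed - ff) <= tolerance)
    return results


if __name__ == "__main__":
    sweep = check_ff_sweep([0.02, 0.05, 0.10, 0.15, 0.20])
    for requested, (observed, ok) in sweep.items():
        status = "PASS" if ok else "FAIL"
        print(f"requested {requested:.2f}  observed {observed:.4f}  {status}")
```

The same loop can be extended to assert that coverage shifts and variant allele frequencies scale linearly with the requested foetal fraction.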
Test Results: 65 / 65 Passed
| Category | Tests | What's Verified | Status |
|---|---|---|---|
| Aneuploidies | 19 | Coverage ratios, z-scores, FF scaling, unaffected chromosomes | PASS |
| Microdeletions | 7 | Regional coverage drop, flanking regions unaffected | PASS |
| Microduplications | 5 | Regional coverage increase, modifier precision | PASS |
| Monogenic / SNV | 12 | VAF correctness, zero maternal contamination, FF scaling | PASS |
| Repeat Expansions | 7 | Correct motif, foetal-only insertion, pathogenic threshold | PASS |
| Imprinting | 2 | Coverage modifier, fragment size shift | PASS |
| Cross-category | 7 | Euploid control (no false positive), specificity checks | PASS |
| Mathematical precision | 5 | Exact modifier values, sampling accuracy | PASS |
| Total | 65 | | ALL PASS |
4. Why This Matters
When you work with our synthetic cfDNA data, the labels are exact. A sample labelled "Trisomy 21 at 8% foetal fraction" contains precisely the signal that a real T21 pregnancy at 8% foetal fraction would produce. The only variability comes from finite sampling — the same statistical noise present in real sequencing data.
This means you can:
- Benchmark detection algorithms against known ground truth
- Study the limits of detection at specific foetal fractions
- Generate balanced training sets for rare conditions that have minimal real-world samples
- Test edge cases with confidence that the underlying signal is correct
Ground Truth Guarantee
The conditions in your samples are exactly what the labels say they are. Every fragment, every variant, every coverage profile is deterministically derived from the specified condition and foetal fraction.