Every Condition, Mathematically Guaranteed

Our synthetic cfDNA platform generates conditions using deterministic mathematical relationships — not approximations. Here's how each condition category works and why you can trust the labels.

Synthetic genomic data is only as useful as the accuracy of the conditions it represents. Rather than estimating signals, our generation pipeline applies known mathematical relationships, so the expected signal for each condition is correct by construction. Each of the 107 supported conditions across 6 categories maps to a defined genomic alteration with a predictable, verifiable effect on the sequencing output.

1. The Core Principle

Our pipeline separates maternal and foetal fragment generation. Maternal cfDNA is sampled at baseline coverage. Foetal cfDNA is sampled with condition-specific modifiers that adjust fragment representation across the genome. The two are mixed at a specified foetal fraction (FF).

The resulting signal in the mixed sample follows directly from the mathematics:

observed_signal = (1 - FF) × maternal_baseline + FF × foetal_modifier

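Here maternal_baseline is the relative coverage contributed by the maternal genome (1.0 in an unaffected region) and foetal_modifier is the foetal coverage relative to a normal two-copy region (for example, 1.5 for a trisomy or 0.5 for a deletion). Every condition therefore produces a known, calculable effect. Detection algorithms applied to our synthetic data are recovering a signal whose magnitude is mathematically defined, not biologically uncertain.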
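The mixing formula can be written out in a few lines of Python. This is an illustrative sketch, not the pipeline's implementation: the function name `observed_signal` and the default maternal baseline of 1.0 are assumptions made for the example.

```python
def observed_signal(ff: float, foetal_modifier: float,
                    maternal_baseline: float = 1.0) -> float:
    """Expected relative signal in a maternal/foetal mixture.

    ff              -- foetal fraction, between 0 and 1
    foetal_modifier -- foetal coverage relative to a two-copy region
                       (assumed convention: 1.0 = unaffected)
    """
    if not 0.0 <= ff <= 1.0:
        raise ValueError("foetal fraction must lie between 0 and 1")
    return (1.0 - ff) * maternal_baseline + ff * foetal_modifier


# An unaffected region stays at baseline regardless of foetal fraction.
print(observed_signal(ff=0.10, foetal_modifier=1.0))
# A foetal modifier of 1.5 at 10% foetal fraction gives roughly 1.05.
print(observed_signal(ff=0.10, foetal_modifier=1.5))
```

Because the function is linear in both arguments, doubling the foetal fraction doubles the deviation from baseline.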

2. How Each Condition Category Works

Aneuploidies

12 conditions

Trisomy 21 (Down syndrome), Trisomy 18 (Edwards), Trisomy 13 (Patau), Monosomy X (Turner), XXY (Klinefelter), XYY, XXX, and others.

The simplest case. A trisomy means the foetal genome carries 3 copies of a chromosome instead of 2, so the expected relative coverage on the affected chromosome increases by FF/2. At 10% foetal fraction, chromosome 21 in a T21 sample shows a 5% coverage increase, which at adequate sequencing depth produces a z-score well above the clinical detection threshold of 3.0.

coverage_change = (copy_number / 2 - 1) × FF

Monosomies work in reverse: one fewer copy produces a coverage decrease of FF/2.

This is arithmetic, not modelling. The signal magnitude is a direct consequence of copy number and mixing proportion.
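As a quick check of the arithmetic, the coverage formula can be evaluated directly. The sketch below assumes a two-copy baseline; the function name `coverage_change` simply mirrors the formula and is not taken from the pipeline.

```python
def coverage_change(copy_number: int, ff: float) -> float:
    """Expected relative coverage change for a foetal copy number,
    assuming a baseline of two copies."""
    return (copy_number / 2 - 1) * ff


ff = 0.10
for label, copies in [("trisomy", 3), ("euploid", 2), ("monosomy", 1)]:
    # Positive values are coverage gains, negative values are losses.
    print(f"{label:9s} copies={copies} change={coverage_change(copies, ff):+.3f}")
```

At 10% foetal fraction this gives a gain of about 0.05 for a trisomy, no change for a euploid chromosome, and a loss of about 0.05 for a monosomy.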

Microdeletions

18 conditions

22q11.2 (DiGeorge), 5p− (Cri du Chat), 4p− (Wolf-Hirschhorn), 1p36, 15q11.2 (Prader-Willi/Angelman), 7q11.23 (Williams), and others.

The same mathematics as aneuploidies, applied to a defined genomic region rather than a whole chromosome. One copy of the specified region is absent from the foetal genome, producing a regional coverage decrease of FF/2 at the exact coordinates of the deletion. Flanking regions are unaffected.

regional_coverage = (1 - FF) × 1.0 + FF × 0.5
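To illustrate how the regional signal sits against unaffected flanks, the sketch below builds an expected coverage profile over a row of genomic bins. The bin layout, the function name `regional_profile`, and the half-open interval convention are assumptions made for the example, not details of the pipeline.

```python
def regional_profile(n_bins: int, start: int, end: int, ff: float,
                     foetal_copies: int = 1) -> list[float]:
    """Expected relative coverage per bin.

    Bins in [start, end) carry the foetal alteration; all other bins
    stay at the baseline of 1.0.
    """
    affected = (1.0 - ff) * 1.0 + ff * (foetal_copies / 2)
    return [affected if start <= i < end else 1.0 for i in range(n_bins)]


# A heterozygous foetal deletion spanning bins 3 to 5 at 10% foetal fraction.
profile = regional_profile(n_bins=10, start=3, end=6, ff=0.10)
print([round(value, 3) for value in profile])
```

Only the three bins inside the deleted interval drop to about 0.95; every flanking bin remains exactly 1.0.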

Microduplications

7 conditions

22q11.2, 7q11.23, 17p11.2 (Potocki-Lupski), 16p11.2, 15q11.2, and others.

The inverse of microdeletions. An extra copy of the specified region produces a regional coverage increase of FF/2 at the exact coordinates of the duplication.

regional_coverage = (1 - FF) × 1.0 + FF × 1.5
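The deletion and duplication formulas are two instances of one relationship. Writing CN for the foetal copy number of the region (a symbol introduced here for illustration) and assuming a two-copy baseline, the derivation is:

```latex
\begin{aligned}
\text{regional\_coverage}
  &= (1 - FF)\times 1.0 + FF \times \frac{CN}{2} \\
  &= 1 + \left(\frac{CN}{2} - 1\right) FF \\[4pt]
CN = 1 &\;\Rightarrow\; 1 - \frac{FF}{2} \quad\text{(microdeletion)} \\
CN = 3 &\;\Rightarrow\; 1 + \frac{FF}{2} \quad\text{(microduplication)}
\end{aligned}
```

The second line is the aneuploidy coverage_change expression added to a baseline of 1, which is why the same arithmetic applies to whole chromosomes and to sub-chromosomal regions.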

Monogenic Conditions

50+ conditions

Cystic fibrosis, sickle cell anaemia, achondroplasia, Noonan syndrome, spinal muscular atrophy, Duchenne muscular dystrophy, beta-thalassaemia, and many more — covering skeletal dysplasias, RASopathies, metabolic disorders, connective tissue disorders, and oncology-relevant variants.

Rather than altering copy number, these conditions introduce specific variant alleles at known genomic positions. The variant appears only in foetal fragments, at an expected variant allele frequency (VAF) determined by zygosity and foetal fraction.

heterozygous VAF = FF × 0.5
homozygous VAF = FF

At 10% foetal fraction, a heterozygous achondroplasia variant (FGFR3 G380R) appears at approximately 5% VAF. A homozygous sickle cell variant (HBB Glu6Val) appears at approximately 10% VAF. These are not estimates — they are mathematical consequences of the mixing proportion and zygosity.

Maternal fragments never carry the foetal variant. There is no leakage between the two genomes.
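The VAF relationship, and the way finite read depth scatters observed counts around it, can be sketched as follows. The function names, the zygosity labels, and the per-read sampling model are assumptions made for illustration; the pipeline's own sampling procedure is not described here.

```python
import random


def expected_vaf(ff: float, zygosity: str) -> float:
    """Expected variant allele frequency for a foetal-only variant."""
    factor = {"heterozygous": 0.5, "homozygous": 1.0}[zygosity]
    return ff * factor


def sample_alt_reads(depth: int, vaf: float, seed: int = 0) -> int:
    """Draw `depth` reads; each carries the variant with probability `vaf`.
    A simple stand-in for finite sampling at one locus."""
    rng = random.Random(seed)
    return sum(1 for _ in range(depth) if rng.random() < vaf)


vaf = expected_vaf(ff=0.10, zygosity="heterozygous")
alt = sample_alt_reads(depth=2000, vaf=vaf)
print(f"expected VAF={vaf:.3f}, observed VAF={alt / 2000:.3f}")
```

The expected value is fixed by the formula; the observed value varies only through read sampling, and the spread shrinks as depth increases.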

Repeat Expansions

4 conditions

Huntington disease (HTT CAG), Fragile X syndrome (FMR1 CGG), myotonic dystrophy type 1 (DMPK CTG), spinocerebellar ataxia type 1 (ATXN1 CAG).

Expanded repeat motifs are inserted into foetal fragments at the correct genomic locus, exceeding the pathogenic threshold for each condition. Maternal fragments retain normal repeat lengths.
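A minimal sketch of how a repeat tract might be built and then checked against a threshold is shown below. Everything here is hypothetical: the helper names, the flanking sequences, and in particular the threshold value, which is a placeholder and not a clinical cut-off.

```python
def build_repeat_tract(motif: str, n_repeats: int) -> str:
    """Return `motif` repeated `n_repeats` times."""
    return motif * n_repeats


def longest_repeat_run(sequence: str, motif: str) -> int:
    """Length, in motif units, of the longest uninterrupted run of `motif`."""
    k, best, i = len(motif), 0, 0
    while i <= len(sequence) - k:
        run = 0
        while sequence[i + run * k: i + (run + 1) * k] == motif:
            run += 1
        best = max(best, run)
        # Skip past a matched run, otherwise advance one base.
        i += run * k if run else 1
    return best


PLACEHOLDER_THRESHOLD = 40  # illustrative only, not a clinical value

foetal = "ACGT" + build_repeat_tract("CAG", 45) + "TTGA"
maternal = "ACGT" + build_repeat_tract("CAG", 20) + "TTGA"
for name, fragment in [("foetal", foetal), ("maternal", maternal)]:
    n = longest_repeat_run(fragment, "CAG")
    print(name, n, "expanded" if n >= PLACEHOLDER_THRESHOLD else "normal")
```

A check of this shape covers the two properties described above: the expanded tract is present in the foetal fragment, and the maternal fragment stays below the threshold.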

Imprinting Disorders

2 conditions

Beckwith-Wiedemann syndrome, Russell-Silver syndrome.

These conditions involve parent-of-origin effects and uniparental disomy. The foetal genome reflects the correct parental copy number imbalance in the imprinted region, with associated fragment size signatures.

3. Verification Across All Categories

Every condition category is covered by automated verification tests that confirm:

  • The affected region shows the mathematically expected signal change — and only the affected region.
  • Unaffected chromosomes and genomic regions remain at baseline.
  • Variant alleles appear at the correct frequency and only in foetal fragments.
  • Foetal fraction in the generated sample matches the specified value within ±2%.
  • Fragment-level properties — sequence composition, length distribution, GC content — remain biologically valid regardless of condition.

These tests run across multiple foetal fraction values (2% to 20%), confirming that signals scale correctly as foetal fraction changes.
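A sweep of this kind might look like the sketch below. It is not the actual test suite: the fragment-level sampling model, the function names, and the reading of "±2%" as an absolute tolerance of 0.02 are all assumptions made for the example.

```python
import random


def simulate_foetal_fraction(n_fragments: int, ff: float, seed: int) -> float:
    """Label each fragment foetal with probability `ff`; return the observed share."""
    rng = random.Random(seed)
    n_foetal = sum(1 for _ in range(n_fragments) if rng.random() < ff)
    return n_foetal / n_fragments


def expected_trisomy_coverage(ff: float) -> float:
    """Relative coverage on a trisomic chromosome from the mixing formula."""
    return (1.0 - ff) * 1.0 + ff * 1.5


def run_ff_sweep(ff_values, n_fragments=50_000, abs_tol=0.02):
    results = []
    for i, ff in enumerate(ff_values):
        observed_ff = simulate_foetal_fraction(n_fragments, ff, seed=i)
        ff_ok = abs(observed_ff - ff) <= abs_tol
        # Scaling check: excess coverage must stay proportional to FF.
        scaling_ok = abs((expected_trisomy_coverage(ff) - 1.0) - ff / 2) < 1e-12
        results.append((ff, observed_ff, ff_ok and scaling_ok))
    return results


for ff, observed_ff, passed in run_ff_sweep([0.02, 0.05, 0.10, 0.15, 0.20]):
    print(f"FF={ff:.2f} observed={observed_ff:.4f} {'PASS' if passed else 'FAIL'}")
```

Fixing the seed per foetal fraction keeps the sweep reproducible, so a failure points to a change in the generator and not to random variation between runs.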

Test Results: 65 / 65 Passed

Category | Tests | What's Verified | Status
--- | --- | --- | ---
Aneuploidies | 19 | Coverage ratios, z-scores, FF scaling, unaffected chromosomes | PASS
Microdeletions | 7 | Regional coverage drop, flanking regions unaffected | PASS
Microduplications | 5 | Regional coverage increase, modifier precision | PASS
Monogenic / SNV | 12 | VAF correctness, zero maternal contamination, FF scaling | PASS
Repeat Expansions | 7 | Correct motif, foetal-only insertion, pathogenic threshold | PASS
Imprinting | 2 | Coverage modifier, fragment size shift | PASS
Cross-category | 7 | Euploid control (no false positive), specificity checks | PASS
Mathematical precision | 5 | Exact modifier values, sampling accuracy | PASS
Total | 65 | | ALL PASS

4. Why This Matters

When you work with our synthetic cfDNA data, the labels are exact. A sample labelled "Trisomy 21 at 8% foetal fraction" carries the signal expected for a T21 pregnancy at 8% foetal fraction: a 4% coverage increase on chromosome 21. The only variability comes from finite sampling, the same statistical noise present in real sequencing data.

This means you can:

  • Benchmark detection algorithms against known ground truth
  • Study the limits of detection at specific foetal fractions
  • Generate balanced training sets for rare conditions that have minimal real-world samples
  • Test edge cases with confidence that the underlying signal is correct
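For the limit-of-detection use case, a rough expectation for the trisomy z-score under pure counting noise can be sketched as follows. The binomial noise model, the function names, and the example read count and chromosome read fraction are illustrative assumptions; real assays add further sources of variance, so these figures are a best case and not a performance claim.

```python
import math


def expected_z(ff: float, total_reads: int, chrom_read_fraction: float) -> float:
    """Expected trisomy z-score when read counts on the chromosome are
    binomial and the mean shifts by a factor of FF/2."""
    expected_reads = total_reads * chrom_read_fraction
    sd = math.sqrt(expected_reads * (1.0 - chrom_read_fraction))
    return (ff / 2.0) * expected_reads / sd


def min_detectable_ff(total_reads: int, chrom_read_fraction: float,
                      z_threshold: float = 3.0) -> float:
    """Smallest foetal fraction whose expected z-score reaches the threshold."""
    expected_reads = total_reads * chrom_read_fraction
    sd = math.sqrt(expected_reads * (1.0 - chrom_read_fraction))
    return 2.0 * z_threshold * sd / expected_reads


# Placeholder inputs: 10 million reads, 1.3% of them on the chromosome of interest.
reads, fraction = 10_000_000, 0.013
for ff in (0.02, 0.04, 0.08, 0.10):
    print(f"FF={ff:.2f} expected z={expected_z(ff, reads, fraction):.1f}")
print(f"minimum FF for z >= 3: {min_detectable_ff(reads, fraction):.4f}")
```

Because the expected z-score grows with the square root of read count, halving the detectable foetal fraction takes roughly four times as many reads under this model.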

Ground Truth Guarantee

The conditions in your samples are exactly what the labels say they are. Every fragment, every variant, every coverage profile is deterministically derived from the specified condition and foetal fraction.
