
Test the platform with small synthetic genomes

If you'd like to test the platform, we recommend using the small HAPNEST dataset.

Synthetic HAPNEST data

Synthetic genomes don't contain any private or sensitive information but are compatible with bioinformatics tools like our platform.

A copy of the small HAPNEST dataset is available to download from:

https://ftp.ebi.ac.uk/pub/databases/spot/intervene/

There are three files to download:

  • hapnest.pgen
  • hapnest.pvar
  • hapnest.psam

This triplet of files contains genotypes (.pgen), variants (.pvar), and sample data (.psam) in PLINK 2 format.

When you work with these files, it's important to keep all three together.
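As a quick sanity check before uploading, the sketch below reports which of the three files are missing from a folder. It assumes the files sit side by side in one directory and share the `hapnest` prefix shown above; the function name `missing_fileset_members` is hypothetical and is not part of the platform or of PLINK 2.

```python
from pathlib import Path

# The three file extensions that make up a PLINK 2 fileset.
PLINK2_SUFFIXES = (".pgen", ".pvar", ".psam")


def missing_fileset_members(directory, prefix="hapnest"):
    """Return the names of the PLINK 2 files missing for `prefix` in `directory`.

    Hypothetical helper: assumes all three files share one prefix
    and live in the same directory.
    """
    base = Path(directory)
    return [
        f"{prefix}{suffix}"
        for suffix in PLINK2_SUFFIXES
        if not (base / f"{prefix}{suffix}").is_file()
    ]


if __name__ == "__main__":
    missing = missing_fileset_members(".")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All three HAPNEST files are present.")
```

An empty list means the triplet is complete in that directory.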

tip
  • These genomes are in genome build GRCh38.
  • Not all scores will work on the small HAPNEST dataset. Because the small dataset contains few variants, many scores will fail to pass the matching threshold (75%).
  • For testing the platform, these selections will produce results: scores PGS002299 and PGS004891; trait triple negative breast cancer (EFO_0005537).
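To illustrate the matching threshold mentioned in the tip, here is a minimal sketch of the arithmetic. It assumes the match rate is the fraction of a score's variants that are found in the target genomes, and that a rate of exactly 75% counts as a pass; the platform's actual calculation may differ. The function name and the variant counts are hypothetical.

```python
def passes_matching_threshold(n_matched, n_score_variants, threshold=0.75):
    """Return True if the share of matched variants meets the threshold.

    Assumption: match rate = matched variants / variants in the score,
    and a rate equal to the threshold is treated as passing.
    """
    if n_score_variants <= 0:
        raise ValueError("n_score_variants must be positive")
    return n_matched / n_score_variants >= threshold


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    print(passes_matching_threshold(80, 100))  # 80% matched
    print(passes_matching_threshold(60, 100))  # 60% matched
```

With these made-up counts, a score with 80 of 100 variants matched would pass, while one with 60 of 100 would not.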

Other genomes

caution
  • Please do not upload sensitive data, such as real sequenced genomes, to the platform.
  • It's OK to upload publicly available and permissively licensed data, like 1000 Genomes. Read more about publicly available and synthetic genomes, including the large HAPNEST dataset.
  • The platform and documentation are being actively developed and tested.