GenomeSync

Motivation

GenomeSync redefines genome availability. Instead of being available somewhere online, the genomes are now available on your hard disk, up-to-date and ready for analysis.

How does it work?

GenomeSync is a continuously synchronizing local collection of genomes. In a nutshell, it works like this: you install synchronization software and connect to our network. From this point, you have a collection of genome sequences that is always synchronized to the central repository.

This approach has two main advantages. First, you obtain a large set of genomes, which could otherwise be difficult or time-consuming to collect. Second, your local copy of the database remains up-to-date without you spending any time or effort.

Note that you can stop synchronization at any moment if you'd like to freeze your current database.

What is included?

The database contains thousands of complete (or nearly complete) genomes from genome databases. We aim to include every publicly available genome, except for large redundancies.

The database is accompanied by a matching subset of the NCBI Taxonomy database. All genome names are synchronized with it.

What is NOT included?

Thousands of non-reference human genomes. For animals and plants, only one genome per species is included.

Any non-public data, for example JGI genomes still under the restrictions of their data release policy. This means that you are free to use the included genomes in any way.

Annotation. As of now, only sequence data is included.

Database structure

Each genome is stored as a single bzip-compressed FASTA file.
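
Because each genome is one compressed FASTA file, it can be read directly without unpacking it to disk first. The following is a minimal sketch in Python, assuming that "bzip-compressed" means the bzip2 format handled by Python's standard bz2 module. The function name read_fasta_bz2 is hypothetical and is not part of GenomeSync.

```python
import bz2
from typing import Iterator, Tuple


def read_fasta_bz2(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a bzip2-compressed FASTA file.

    Assumption: the file is in bzip2 format and contains plain-text FASTA.
    """
    header = None
    chunks = []
    # "rt" decompresses on the fly and returns text lines.
    with bz2.open(path, "rt") as handle:
        for line in handle:
            line = line.rstrip("\r\n")
            if line.startswith(">"):
                # A new record starts; emit the previous one, if any.
                if header is not None:
                    yield header, "".join(chunks)
                header = line[1:]
                chunks = []
            elif header is not None and line:
                # Sequence lines are joined into one string per record.
                chunks.append(line)
    # Emit the last record in the file.
    if header is not None:
        yield header, "".join(chunks)
```

For example, iterating over read_fasta_bz2("genome.fa.bz2") (a hypothetical path) yields one pair per sequence in the genome, so large genomes can be processed one sequence at a time.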


© 2015-2017 Kirill Kryukov
Available under the CC BY 4.0 License