Connecting to GenomeSync
Disk space. The database is distributed in bzip2-compressed FASTA format. In this format it currently occupies about 400 GB, so this is the minimum amount of disk space required to store the complete database. In addition, you may need space for an analysis-ready format of your choice. In the typical use case you'll convert the genomes into BLAST format, which consumes about the same amount of space, i.e., another 400 GB. So, in this scenario you'll need about 800 GB of free space in total.
(It's possible to use a subset of the database, which requires much less space. For example, if you want to use only the genomes of Fungi, you'll need only about 20 GB.)
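As a quick sanity check before you start, the figures above can be compared against the free space on the target drive. The following is only a minimal sketch: the function names are made up for illustration, and the sizes are the approximate values quoted above, not exact requirements.

```python
import shutil

# Approximate sizes quoted in the text above, in GB.
COMPRESSED_GB = 400  # bzip2-compressed FASTA
BLAST_GB = 400       # BLAST-format copy, roughly the same size


def free_space_gb(path):
    """Return the free space on the filesystem holding `path`, in GB (10^9 bytes)."""
    return shutil.disk_usage(path).free / 1e9


def has_enough_space(free_gb, with_blast=True):
    """True if `free_gb` covers the compressed database, plus a BLAST copy if requested."""
    required = COMPRESSED_GB + (BLAST_GB if with_blast else 0)
    return free_gb >= required


if __name__ == "__main__":
    free = free_space_gb(".")  # replace "." with the intended folder location
    print(f"Free space: {free:.0f} GB")
    print("Enough for the compressed database:", has_enough_space(free, with_blast=False))
    print("Enough for the database plus BLAST:", has_enough_space(free))
```

If you plan to use only a subset (for example the roughly 20 GB of Fungi genomes), adjust the constants accordingly.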
Fast internet connection. Using the GenomeSync database requires a reasonably fast internet connection. Having a slow connection does not mean you can't use GenomeSync; it simply means that downloading and synchronizing will take longer. You can easily estimate the minimum download time if you know the bandwidth of your connection.
On the other hand, having a very fast connection does not guarantee a fast download, because the speed also depends on how fast the other nodes can upload to you, and on how many nodes are uploading and downloading at that time.
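The bandwidth calculation mentioned above is simple arithmetic. Here is a minimal sketch, assuming decimal units (1 GB = 8000 Mbit) and a connection that is fully saturated the whole time; the function name is hypothetical. The result is a best-case figure, since actual speed also depends on the other nodes.

```python
def best_case_hours(size_gb, bandwidth_mbit_s):
    """Hours needed to transfer `size_gb` gigabytes at a sustained rate in Mbit/s."""
    size_megabits = size_gb * 1000 * 8  # 1 GB = 1000 MB = 8000 Mbit
    seconds = size_megabits / bandwidth_mbit_s
    return seconds / 3600


if __name__ == "__main__":
    # 400 GB is the approximate size of the complete compressed database.
    for mbit in (10, 100, 1000):
        print(f"{mbit:>5} Mbit/s: at least {best_case_hours(400, mbit):.1f} hours")
```

For example, the full 400 GB over a 100 Mbit/s link needs at least about 8.9 hours.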
Unobstructed internet connection. This may be less obvious, but most workplaces provide only a very limited connection to the internet. The connection might be fast, but severely limited in other ways: no incoming connections are possible, only specific protocols and ports can be used, etc. This usually includes a ban on any and all p2p networking. If this is your situation, your options may include some or all of the following:
- Applying for firewall exceptions for your machine.
- Convincing the management to lift the p2p ban.
- Using GenomeSync outside of your workspace.
Connecting via Syncthing
- If you are behind a firewall or router, configure it to allow incoming connections to TCP port 22000 (help)
- Download and start Syncthing
- In the Syncthing web interface, click the "Add Device" button and add the device with this ID:
- Keep Device Name empty
- Keep Addresses as "dynamic"
- Check "Introducer" box
- Don't select any folders for sharing
- Wait until your device is added on the master node (this may take a day or two)
- You will then see a "New Folder" message saying that KIRR-PC wants to share the "GenomeSync" folder with you.
- Click "Add" and select the location for this folder ("Folder Path" field).
- The hard drive where you save the folder should have sufficient free space
- Change the rescan interval to 3600 seconds (one hour) or longer, instead of the default 1 minute.
- Leave the "Folder Master" checkbox clear.
- Click "Save"
- Syncthing will show a "Restart Needed" message. Click "Restart" (it only restarts Syncthing, not the whole machine).
- Wait for your machine to synchronize with the cluster. It may take under one day if you have a good connection and a fast CPU.
Note: Syncthing does not set itself to auto-start. It's OK to start it manually each time you restart the machine. Alternatively, you can use your OS's facility for auto-starting programs (such as the Startup folder in Windows).
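The first step above asks for incoming connections on TCP port 22000. One rough way to check this is to attempt a plain TCP connection to that port. The sketch below uses a hypothetical helper name. Run against 127.0.0.1 on your own machine, it only confirms that Syncthing is running and listening. To test the firewall or router itself, it would have to be run from a machine outside your network, against your public address.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable, or name resolution failed.
        return False


if __name__ == "__main__":
    # Local check only: is anything listening on the Syncthing port?
    print("Port 22000 open on localhost:", port_is_open("127.0.0.1", 22000))
```

A `False` result from outside your network, combined with `True` locally, suggests that the firewall or router is still blocking the port.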
Verifying the integrity of downloaded files
You can use the following sets of SHA hashes:
Example command: bzip2 -dc ~/tmp/all.fna.bz2.sha512.bz2 | sha512sum -c (execute in bz2 directory).
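On systems without sha512sum (for example, a plain Windows installation), the same check can be approximated in Python. This is a minimal sketch, not part of GenomeSync: the function names are hypothetical, and it assumes the compressed list uses the usual sha512sum layout, one "hash, whitespace, file name" entry per line, with names relative to the bz2 directory.

```python
import bz2
import hashlib
import os


def file_sha512(path, chunk_size=1 << 20):
    """Compute the SHA-512 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(checksum_list_bz2, base_dir="."):
    """Check every entry of a bzip2-compressed checksum list.

    Returns a list of (file name, status) pairs, where status is
    "OK", "FAILED", or "MISSING".
    """
    results = []
    with bz2.open(checksum_list_bz2, "rt") as listing:
        for line in listing:
            line = line.rstrip("\r\n")
            if not line.strip():
                continue
            expected, name = line.split(None, 1)
            if name.startswith("*"):  # sha512sum marks binary mode with "*"
                name = name[1:]
            path = os.path.join(base_dir, name)
            if not os.path.isfile(path):
                status = "MISSING"
            elif file_sha512(path) == expected.lower():
                status = "OK"
            else:
                status = "FAILED"
            results.append((name, status))
    return results


if __name__ == "__main__":
    # Same list file as in the example command; run from the bz2 directory.
    list_path = os.path.expanduser("~/tmp/all.fna.bz2.sha512.bz2")
    for name, status in verify(list_path):
        if status != "OK":
            print(f"{name}: {status}")
```

Only problem files are printed, which keeps the output short for a large database.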
Downloading part of the database
© 2015-2017 Kirill Kryukov
Available under the CC BY 4.0 License