Example: download, sketch, and search a collection of FASTA files
Download 64 genomes from Awad et al., 2017:
curl -L https://osf.io/download/vbhy5 -o podar-ref.tar.gz
tar xzf podar-ref.tar.gz
Sketch them all as DNA with default parameters, running the jobs concurrently with GNU parallel (see https://github.com/sourmash-bio/sourmash/issues/1796):
parallel -j 8 sourmash sketch dna {} -o {}.sig --name-from-first ::: *.fa
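If GNU parallel is not installed, roughly the same fan-out can be sketched with `xargs`. This is a minimal sketch, not part of the original example: the function name `sketch_all` and the `SOURMASH` override variable are hypothetical, and it assumes an `xargs` that supports `-0` and `-P` (both GNU and BSD versions do).

```shell
# Hypothetical helper: sketch every *.fa file in the current directory,
# running up to $1 jobs at once (default 8) via xargs instead of GNU parallel.
# SOURMASH is an assumed override hook; it defaults to the real sourmash CLI.
sketch_all() {
    njobs="${1:-8}"
    for f in *.fa; do
        [ -e "$f" ] || continue      # skip the literal pattern if nothing matches
        printf '%s\0' "$f"           # NUL-delimited, so odd filenames survive
    done | xargs -0 -P "$njobs" -I{} \
        "${SOURMASH:-sourmash}" sketch dna {} -o {}.sig --name-from-first
}
```

Running `sketch_all 8` in the directory that holds the unpacked genomes should write one `.sig` file per `.fa` file, with the same naming as the parallel command.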
Build a search database from all the signature files for k=31:
sourmash sig cat *.sig -k 31 -o podar-ref.zip
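If the database ends up with fewer sketches than expected, one way to find inputs that were never sketched is to compare the FASTA files against the signature files on disk. The helper below is a hypothetical sketch in plain shell; it does not call sourmash, and it assumes the `<name>.fa.sig` naming used by the sketch command above.

```shell
# Hypothetical helper: print each *.fa file in the current directory that has
# no non-empty <name>.fa.sig file next to it; return non-zero if any is missing.
missing_sigs() {
    missing=0
    for f in *.fa; do
        [ -e "$f" ] || continue      # skip the literal pattern if nothing matches
        if [ ! -s "$f.sig" ]; then   # -s: file exists and is not empty
            printf '%s\n' "$f"
            missing=1
        fi
    done
    return "$missing"
}
```

An empty listing from `missing_sigs` suggests every genome was sketched; any file it prints can be re-sketched and the `sig cat` step rerun.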
Search the database with one of the Shewanella genomes:
sourmash search 63.fa.sig podar-ref.zip
You should see output like the following:
similarity match
---------- -----
100.0% NC_011663.1 Shewanella baltica OS223, complete genome
32.1% NC_009665.1 Shewanella baltica OS185, complete genome