Example: use sourmash tax to classify a metagenome

sourmash-bio/sourmash-examples#3


First, download and sketch the 64 genomes from Awad et al., 2017 using the instructions in "Example: download, sketch, and search a collection of FASTA files". You'll need podar-ref.zip.

Next, you'll need podar-lineage.csv:

curl -L https://osf.io/4yhjw/download -o podar-lineage.csv

Next, we'll make a fake metagenome: a single signature created by merging two Shewanella baltica signatures.

First, extract signatures with Shewanella in the name from podar-ref.zip:

sourmash sig grep Shewanella podar-ref.zip -o shew-matches.sig

Then, use sourmash sig merge to merge them into one signature:

sourmash sig merge shew-matches.sig -o shew-merge.sig

This is our fake metagenome that we'll use to demonstrate sourmash tax.
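To illustrate what the merge step does conceptually, here is a minimal sketch that treats each sketch as a plain list of hash values and takes their union. The hash values and the variable names sketch_a, sketch_b, and merged are hypothetical and chosen only for illustration. This is not how sourmash stores signatures, and it ignores details such as abundance tracking.

```shell
# Hypothetical hash values standing in for two sketches (not real sourmash output).
sketch_a="11 42 97 130"
sketch_b="42 97 205"

# Merging keeps every distinct hash that appears in either sketch:
# print one hash per line, sort numerically, and drop duplicates.
merged=$(printf '%s\n' $sketch_a $sketch_b | sort -n -u | tr '\n' ' ' | sed 's/ $//')
echo "merged: $merged"
```

Hashes shared by both inputs appear once in the merged list, which is why a merge of two closely related genomes can be only slightly larger than either one.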

Now that we've got our fake metagenome, run sourmash gather to find the minimum metagenome cover:

sourmash gather shew-merge.sig podar-ref.zip -o out.csv

and then run sourmash tax metagenome to classify this as a "mixture" using the matching genomes from out.csv and their taxonomy in podar-lineage.csv:

sourmash tax metagenome -g out.csv -t podar-lineage.csv

and you should see:

,genus,1.000,Bacteria;Proteobacteria;Gammaproteobacteria;Alteromonadales;Shewanellaceae;Shewanella,491c0a81,,1.000,7886000
,species,1.000,Bacteria;Proteobacteria;Gammaproteobacteria;Alteromonadales;Shewanellaceae;Shewanella;Shewanella baltica,491c0a81,,1.000,7886000

which shows that this is 100% Shewanella, as, well, expected :).
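If you want to pull the key values out of rows like these, the following is a rough sketch using awk. It assumes that the second comma-separated field is the rank, the third is the matched fraction, and the fourth is the semicolon-separated lineage. That reading fits the two rows shown above, but check it against the header of your own output. The variable name summary is just for illustration.

```shell
# Feed the two summary rows from the output above to awk.
# Assumption: field 2 = rank, field 3 = fraction, field 4 = lineage.
summary=$(awk -F',' '{
    # Split the lineage on ";" and keep its last (most specific) element.
    n = split($4, taxa, ";")
    printf "%s\t%.1f%%\t%s\n", $2, $3 * 100, taxa[n]
}' <<'EOF'
,genus,1.000,Bacteria;Proteobacteria;Gammaproteobacteria;Alteromonadales;Shewanellaceae;Shewanella,491c0a81,,1.000,7886000
,species,1.000,Bacteria;Proteobacteria;Gammaproteobacteria;Alteromonadales;Shewanellaceae;Shewanella;Shewanella baltica,491c0a81,,1.000,7886000
EOF
)
echo "$summary"
```

This prints one tab-separated line per rank, giving the rank, the fraction as a percentage, and the most specific taxon name.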
