Example: analyze a metagenome using 66,000 GTDB genomic representatives


This example uses the metagenome signature prepared in Example: create a metagenome signature from shotgun sequencing reads.

You'll also need to download the GTDB database as in Example: download and search 66,000 GTDB genomes with a query genome.

Now, run sourmash gather:

sourmash gather SRR5950647.sig gtdb-rs207.genomic-reps.dna.k31.zip

This should take about 5 minutes.

The output should look like this:

overlap     p_query p_match avg_abund
---------   ------- ------- ---------
383.0 kbp      2.0%    7.8%       1.6    GCF_003697165.2 Escherichia coli DSM ...
187.0 kbp      1.0%    5.0%       1.6    GCF_015074785.1 Prevotella copri stra...
142.0 kbp      0.7%    2.8%       1.4    GCF_000012825.1 Bacteroides vulgatus ...
164.0 kbp      0.3%    1.4%       1.7    GCF_019127135.1 Prevotella copri stra...
found less than 50.0 kbp in common. => exiting

found 4 matches total;
the recovered matches hit 4.0% of the abundance-weighted query

This is a minimum metagenome cover for the metagenome, based on the genomes in the GTDB database. In brief, it is a shortest list of genomes that together contain all of the metagenome content that matches the database (in this case, about 4% of the abundance-weighted query).
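To make the idea of a cover concrete, here is a minimal sketch of a greedy set-cover loop over toy data. It is not sourmash's implementation: the function `greedy_cover`, the genome names, the integer "hashes", and the `threshold` value are all hypothetical stand-ins chosen for illustration, and a greedy pass only approximates a true minimum cover.

```python
def greedy_cover(query, database, threshold=1):
    """Repeatedly pick the genome that explains the most still-unmatched query hashes.

    Toy illustration only; `query` is a set of integers standing in for k-mer
    hashes and `database` maps hypothetical genome names to such sets.
    """
    remaining = set(query)
    cover = []
    while remaining:
        # Find the genome with the largest overlap with what is still unexplained.
        name, overlap = max(
            ((n, len(remaining & hashes)) for n, hashes in database.items()),
            key=lambda item: item[1],
            default=(None, 0),
        )
        # Stop once the best remaining match falls below the threshold,
        # analogous to the "found less than ... in common" message above.
        if name is None or overlap < threshold:
            break
        cover.append((name, overlap))
        remaining -= database[name]
    return cover, remaining


# Hypothetical toy data: 100 query hashes, three partially overlapping genomes.
query = set(range(100))
database = {
    "genome_A": set(range(0, 40)),
    "genome_B": set(range(30, 60)),
    "genome_C": set(range(35, 45)),
}

cover, unmatched = greedy_cover(query, database, threshold=5)
for name, overlap in cover:
    print(f"{name}: {overlap} hashes newly explained")
print(f"matched {1 - len(unmatched) / len(query):.0%} of the query")
```

In this toy run, `genome_C` is never reported because everything it shares with the query is already explained by `genome_A` and `genome_B`, which is why a cover can be much shorter than the list of all genomes that overlap the query.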

Note: more of the metagenome might be matched if you used a larger database, or a database that included eukaryotic and/or host sequences.

