Example: create a signature by downloading and sketching a genome sequence

sourmash-bio/sourmash-examples#11


First, download a genome:

curl -JLO https://osf.io/bjh2y/download

This will create a 1.4 MB file, GCF_000005845.2_ASM584v2_genomic.fna.gz, containing an E. coli K-12 genome for strain MG1655 (see GenBank entry).
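Before sketching, it can be worth confirming that the download completed. The following is a minimal check, assuming the standard gzip and head utilities are available; the helper name check_fasta_gz is hypothetical and not part of sourmash.

```shell
# check_fasta_gz is a hypothetical helper, not a sourmash command.
# It tests the gzip stream for corruption, then prints the first line,
# which for a FASTA file should be a header starting with ">".
check_fasta_gz() {
    gzip -t "$1" || return 1
    gzip -dc "$1" | head -n 1
}

genome=GCF_000005845.2_ASM584v2_genomic.fna.gz
# Only run the check if the file from the curl step is present.
if [ -f "$genome" ]; then
    check_fasta_gz "$genome"
fi
```

A non-zero exit status from gzip -t suggests a truncated or corrupt download, in which case re-running the curl command is the simplest fix.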

Next, calculate the signature using sourmash sketch dna:

sourmash sketch dna -p abund GCF_000005845.2_ASM584v2_genomic.fna.gz

Here, -p abund tells sourmash sketch to also retain abundance (frequency) information for the k-mers.
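To make "abundance" concrete: it is the number of times each k-mer occurs in the input. The toy pipeline below counts 3-mers in a made-up sequence with awk. It is only an illustration under simplifying assumptions: sourmash itself hashes canonical k-mers (k=31 by default for DNA) and keeps only a subset of the hashes, none of which is modelled here. The function name count_kmers is hypothetical.

```shell
# count_kmers is a hypothetical helper for illustration only;
# it does not reproduce the hashing or subsampling that sourmash performs.
# Usage: count_kmers K SEQUENCE  -> prints "kmer count" lines, sorted by k-mer.
count_kmers() {
    printf '%s\n' "$2" | awk -v k="$1" '
        {
            # Slide a window of width k along the sequence and tally each k-mer.
            for (i = 1; i <= length($0) - k + 1; i++)
                count[substr($0, i, k)]++
        }
        END { for (kmer in count) print kmer, count[kmer] }
    ' | sort
}

count_kmers 3 ATGATGA
# prints:
# ATG 2
# GAT 1
# TGA 2
```

Here ATG and TGA each occur twice and GAT once. Without abundance tracking, only the presence of each k-mer would be recorded, not these counts.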

This will produce a signature file, GCF_000005845.2_ASM584v2_genomic.fna.gz.sig, that is much smaller than the original genome file (86 KB vs. 1.4 MB).

You can view the metadata properties of this signature with sourmash sig describe:

sourmash sig describe GCF_000005845.2_ASM584v2_genomic.fna.gz.sig

This example was taken from Large-scale sequence comparisons with sourmash, Pierce et al., 2019.

Categories

This example belongs to the following categories: