Example: create a signature by downloading and sketching a genome sequence
sourmash-bio/sourmash-examples#11
First, download a genome:
curl -JLO https://osf.io/bjh2y/download
This will create a 1.4 MB file, GCF_000005845.2_ASM584v2_genomic.fna.gz,
containing an E. coli K-12 genome for strain MG1655 (see GenBank entry).
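Before sketching, it can be worth checking that the download is an intact gzip archive and actually contains FASTA records. The following is a minimal sketch using only standard tools; `count_fasta_records` is a hypothetical helper name introduced here for illustration, not part of sourmash.

```shell
# Hypothetical helper (not part of sourmash): verify that a gzipped FASTA file
# is intact, then print the number of sequence records (header lines) in it.
count_fasta_records() {
    # gzip -t tests archive integrity without writing any output
    gzip -t "$1" || return 1
    # every FASTA record starts with a header line beginning with '>'
    gzip -dc "$1" | grep -c '^>'
}

# Only run the check if the genome file from the previous step is present.
genome=GCF_000005845.2_ASM584v2_genomic.fna.gz
if [ -f "$genome" ]; then
    count_fasta_records "$genome"
fi
```

If `gzip -t` reports an error, the download was probably truncated and should be repeated before running the sketching step.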
Next, calculate the signature using sourmash sketch dna:
sourmash sketch dna -p abund GCF_000005845.2_ASM584v2_genomic.fna.gz
Here, -p abund tells sourmash sketch to also retain the abundance (frequency) information for k-mers.
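To make "abundance" concrete, here is a toy illustration that counts how often each k-mer occurs in a short sequence. It is only a sketch of the idea: `kmer_abundance` is a hypothetical helper, and sourmash itself works on hashed k-mers and keeps only a subset of them, so this is not its actual algorithm.

```shell
# Toy illustration only: count how often each k-mer occurs in a sequence.
# kmer_abundance is a hypothetical helper, not a sourmash command.
kmer_abundance() {
    # $1 = sequence, $2 = k-mer length
    printf '%s\n' "$1" | awk -v k="$2" '{
        # slide a window of length k across the sequence and tally each k-mer
        for (i = 1; i + k - 1 <= length($0); i++)
            count[substr($0, i, k)]++
    }
    END { for (kmer in count) print kmer, count[kmer] }' | sort
}

kmer_abundance ATGATGA 3
# prints:
# ATG 2
# GAT 1
# TGA 2
```

Without abundance tracking, a sketch records only which k-mers are present; with it, the counts (such as the 2 for ATG above) are kept as well.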
This will produce a signature file, GCF_000005845.2_ASM584v2_genomic.fna.gz.sig, that is much smaller than the original genome file (86 KB vs. 1.4 MB).
You can view the metadata properties of this signature with sourmash sig describe:
sourmash sig describe GCF_000005845.2_ASM584v2_genomic.fna.gz.sig
This example was taken from "Large-scale sequence comparisons with sourmash", Pierce et al., 2019.