Example: search collection manifests directly
sourmash-bio/sourmash-examples#5
This example follows from Example: use picklists and manifests to work with a small subset of a large database.
In #4, we see how to use a picklist to search a small subset of GTDB. But we can make this even simpler.
Build a manifest that you can search directly
Specifying the picklist is cumbersome if you want to search this database repeatedly. sourmash also lets you build a collection manifest that contains pointers into the database, using sig check
(doc link).
To create the collection manifest, first make a "pathlist" file, a text file containing the paths of sourmash databases you want to work with. In this case, the file will only contain one pathname, the GTDB database.
ls gtdb-rs207.genomic-reps.dna.k31.zip > pathlist.txt
Then, run:
sourmash sig check --picklist shew-picklist.csv:name:name \
pathlist.txt --save-manifest shew-mf.csv
which will create a manifest in shew-mf.csv
for all the entries that match the given picklist.
You can now search this manifest file directly:
sourmash search shew-query.sig shew-mf.csv
Categories
This example belongs to the following categories: