dig_sequence
dig_sequence provides targeted annotation of genomic sequences. The sequences can be as small as a single coding region or as large as an entire locus. The tool searches for the sequence that best matches a specified target sequence and annotates only that best match.
Options are available to annotate a single sequence in a FASTA file, a single GenBank ID (the sequence will be fetched from GenBank), or a list of sequences or GenBank IDs specified in a CSV file. For GenBank requests, an email address must be provided, as the GenBank API requires one.
Example usage is described in Targeted Annotation.
Annotate a genomic sequence representing the nominated receptor gene
usage: dig_sequence [-h] {fasta,single,multi,multi_seq} ...
Sub-commands
fasta
Annotate a single genomic sequence in a FASTA file
dig_sequence fasta [-h] [-align ALIGN] [-species SPECIES] [-motif_dir MOTIF_DIR] [-out_file OUT_FILE] [-debug] target germline_file query_file
Positional Arguments
- target
Name of nominated sequence in reference set
- germline_file
ungapped reference set containing the nominated sequence (FASTA)
- query_file
file containing the sequence to annotate (FASTA)
Named Arguments
- -align
gapped reference set to use for V gene alignments (required for V gene analysis)
- -species
use motifs for the specified species provided with the package
- -motif_dir
use motif probability files present in the specified directory
- -out_file
output file (CSV)
- -debug
produce parsing_errors file with debug information
Default: False
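The fasta sub-command can also be driven from a script. The following is a minimal sketch, not part of the package: build_fasta_command is a hypothetical helper, and the allele name, file names and species value are placeholders chosen for illustration. It only assembles the argument list, keeping the single-dash option names and the positional order shown in the usage line above.

```python
from typing import List, Optional


def build_fasta_command(
    target: str,
    germline_file: str,
    query_file: str,
    align: Optional[str] = None,
    species: Optional[str] = None,
    motif_dir: Optional[str] = None,
    out_file: Optional[str] = None,
    debug: bool = False,
) -> List[str]:
    """Assemble the argument list for `dig_sequence fasta` (hypothetical helper)."""
    cmd = ["dig_sequence", "fasta"]
    # Named arguments use a single leading dash, as in the usage line.
    for flag, value in (
        ("-align", align),
        ("-species", species),
        ("-motif_dir", motif_dir),
        ("-out_file", out_file),
    ):
        if value is not None:
            cmd += [flag, value]
    if debug:
        cmd.append("-debug")
    # Positional arguments follow in the documented order.
    cmd += [target, germline_file, query_file]
    return cmd


# All values below are placeholders, not files or names shipped with the tool.
cmd = build_fasta_command(
    "IGHV1-2*01",
    "germline_ungapped.fasta",
    "query.fasta",
    align="germline_gapped.fasta",
    species="human",
    out_file="annotation.csv",
)
print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) once dig_sequence is installed and the placeholder values are replaced with real ones.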
single
Annotate a single sequence given its GenBank accession number
dig_sequence single [-h] [-align ALIGN] [-species SPECIES] [-motif_dir MOTIF_DIR] [-out_file OUT_FILE] target germline_file genbank_acc email_addr
Positional Arguments
- target
Name of nominated sequence
- germline_file
ungapped reference set containing the nominated sequence (FASTA)
- genbank_acc
GenBank accession number of the sequence to annotate
- email_addr
email address to provide to GenBank
Named Arguments
- -align
gapped reference set to use for V gene alignments (required for V gene analysis)
- -species
use motifs for the specified species provided with the package
- -motif_dir
use motif probability files present in the specified directory
- -out_file
output file (CSV)
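As an illustration of the positional order for the single sub-command, the sketch below builds a shell-safe command line. Every value is a placeholder (the allele name, file names, accession, email address and species are assumptions made for the example), and nothing is executed or fetched.

```python
import shlex

# Placeholder values; substitute your own target, files, accession and email.
target = "IGHV1-2*01"
germline_file = "germline_ungapped.fasta"
genbank_acc = "ACCESSION_PLACEHOLDER"
email_addr = "user@example.org"

argv = [
    "dig_sequence", "single",
    "-species", "human",            # placeholder species name
    "-out_file", "annotation.csv",  # placeholder output file
    # Positional order taken from the usage line:
    target, germline_file, genbank_acc, email_addr,
]

# shlex.join quotes arguments that the shell would otherwise interpret,
# such as the '*' in the allele name.
command_line = shlex.join(argv)
print(command_line)
```

Quoting matters here because allele names commonly contain an asterisk, which an unquoted shell command would treat as a glob pattern.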
multi
Read allele names and corresponding GenBank accession numbers from a CSV file
dig_sequence multi [-h] [-align ALIGN] [-species SPECIES] [-motif_dir MOTIF_DIR] [-out_file OUT_FILE] locus germline_file query_file email_addr
Positional Arguments
- locus
Locus of nominated sequences
- germline_file
ungapped reference set containing the nominated sequence (FASTA)
- query_file
File containing list of targets and associated GenBank accession numbers (CSV)
- email_addr
email address to provide to GenBank
Named Arguments
- -align
gapped reference set to use for V gene alignments (required for V gene analysis)
- -species
use motifs for the specified species provided with the package
- -motif_dir
use motif probability files present in the specified directory
- -out_file
output file (CSV)
multi_seq
Read allele names and genomic sequences from a CSV file
dig_sequence multi_seq [-h] [-align ALIGN] [-species SPECIES] [-motif_dir MOTIF_DIR] [-out_file OUT_FILE] locus germline_file query_file
Positional Arguments
- locus
Locus of nominated sequences
- germline_file
ungapped reference set containing the nominated sequence (FASTA)
- query_file
File containing list of targets and genomic sequences (CSV)
Named Arguments
- -align
gapped reference set to use for V gene alignments (required for V gene analysis)
- -species
use motifs for the specified species provided with the package
- -motif_dir
use motif probability files present in the specified directory
- -out_file
output file (CSV)
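The four sub-commands differ mainly in their positional arguments: only single and multi take email_addr, and the multi variants take a locus rather than a target. The sketch below records those lists as they appear in the usage lines and checks a call against them. build_command is a hypothetical helper, not part of the package, and the example values are placeholders.

```python
from typing import Dict, List, Tuple

# Positional arguments per sub-command, copied from the usage lines.
POSITIONALS: Dict[str, Tuple[str, ...]] = {
    "fasta": ("target", "germline_file", "query_file"),
    "single": ("target", "germline_file", "genbank_acc", "email_addr"),
    "multi": ("locus", "germline_file", "query_file", "email_addr"),
    "multi_seq": ("locus", "germline_file", "query_file"),
}


def build_command(sub_command: str, **values: str) -> List[str]:
    """Hypothetical helper: order positionals and report missing or unknown ones."""
    expected = POSITIONALS[sub_command]
    missing = [name for name in expected if name not in values]
    unknown = [name for name in values if name not in expected]
    if missing or unknown:
        raise ValueError(f"{sub_command}: missing={missing}, unknown={unknown}")
    return ["dig_sequence", sub_command] + [values[name] for name in expected]


# Placeholder values for illustration only.
multi_cmd = build_command(
    "multi",
    locus="IGH",
    germline_file="germline_ungapped.fasta",
    query_file="targets.csv",
    email_addr="user@example.org",
)
print(multi_cmd)
```

Checking the arguments before launching the tool catches a common slip, such as passing an email address to multi_seq, which reads sequences from the CSV file and has no email_addr argument.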