Annotation format
This page describes the annotation file produced by digger / find_alignments.
Columns in the Annotation File
In addition to the columns in the first table, the file contains the columns in the second table, prefixed by the reference name, for each reference specified with a -ref argument.
Column Name | Meaning |
---|---|
contig | ID of the sequence in which the gene or pseudogene was found |
start | start co-ord of the coding region |
end | end co-ord of the coding region |
start_rev | start co-ord in the reverse-primed sequence |
end_rev | end co-ord in the reverse-primed sequence |
sense | sense (relative to the input sequence) |
gene_type | gene type (e.g. IGHV) |
gene_start | start co-ord of the entire gene including flanking regions |
gene_end | end co-ord of the entire gene including flanking regions |
gene_start_rev | start co-ord of the entire gene including flanking regions in the reverse-primed sequence |
gene_end_rev | end co-ord of the entire gene including flanking regions in the reverse-primed sequence |
likelihood | likelihood that the RSS is that of a functional gene (compared to a random sequence) |
l_part1 | leader part 1 sequence |
l_part2 | leader part 2 sequence |
v_heptamer | v-heptamer sequence |
v_nonamer | v-nonamer sequence |
j_heptamer | j-heptamer sequence |
j_nonamer | j-nonamer sequence |
j_frame | coding frame of the first nucleotide of the j region (0, 1 or 2) |
d_3_heptamer | 3-prime d-heptamer sequence |
d_3_nonamer | 3-prime d-nonamer sequence |
d_5_heptamer | 5-prime d-heptamer sequence |
d_5_nonamer | 5-prime d-nonamer sequence |
functional | functionality (see below) |
notes | annotation notes |
aa | amino acid translation of the coding region |
v-gene_aligned_aa | IMGT-gapped amino acid translation of the coding sequence (for V-genes) |
seq | sequence of the coding region |
seq_gapped | IMGT-gapped sequence of the coding region (V-genes only) |
5_rss_start | start co-ord of the 5-prime RSS |
5_rss_start_rev | start co-ord of the 5-prime RSS in the reverse-primed sequence |
5_rss_end | end co-ord of the 5-prime RSS |
5_rss_end_rev | end co-ord of the 5-prime RSS in the reverse-primed sequence |
3_rss_start | start co-ord of the 3-prime RSS |
3_rss_start_rev | start co-ord of the 3-prime RSS in the reverse-primed sequence |
3_rss_end | end co-ord of the 3-prime RSS |
3_rss_end_rev | end co-ord of the 3-prime RSS in the reverse-primed sequence |
l_part1_start | start co-ord of the leader part 1 |
l_part1_start_rev | start co-ord of the leader part 1 in the reverse-primed sequence |
l_part1_end | end co-ord of the leader part 1 |
l_part1_end_rev | end co-ord of the leader part 1 in the reverse-primed sequence |
l_part2_start | start co-ord of the leader part 2 |
l_part2_start_rev | start co-ord of the leader part 2 in the reverse-primed sequence |
l_part2_end | end co-ord of the leader part 2 |
l_part2_end_rev | end co-ord of the leader part 2 in the reverse-primed sequence |
matches | number of matches to this start/end region that were produced in the BLAST analysis |
blast_match | gene in the reference file with the highest match score in this start/end region |
blast_score | the highest BLAST match score in this start/end region |
blast_nt_diffs | the number of nucleotides differing from the most highly scoring reference sequence in this BLAST match |
evalue | evalue of the most highly scoring BLAST match in this start/end region |
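As an illustration of how these columns might be consumed, the following minimal sketch reads an annotation file and keeps the rows whose functional column is Functional. It assumes the file is comma-separated with a header row, and that the functional column holds the labels used in the Functionality section; neither is stated on this page. The function name and the sample rows are made up for the example.

```python
import csv
import io


def functional_genes(handle):
    """Yield (contig, start, end, gene_type) for rows marked Functional.

    Assumes a comma-separated file with a header row (an assumption,
    not something this page specifies).
    """
    reader = csv.DictReader(handle)
    for row in reader:
        # Compare case-insensitively in case the label casing differs.
        if row["functional"].strip().lower() == "functional":
            yield row["contig"], int(row["start"]), int(row["end"]), row["gene_type"]


# Made-up sample standing in for a real annotation file.
sample = io.StringIO(
    "contig,start,end,sense,gene_type,functional\n"
    "contig_1,100,395,+,IGHV,Functional\n"
    "contig_1,900,1190,-,IGHV,Pseudo\n"
)
hits = list(functional_genes(sample))
print(hits)
```

With a real file, the `io.StringIO` sample would be replaced by an open file handle, for example `open(path, newline="")`.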
Columns provided for each -ref:
Column Name | Meaning |
---|---|
_match | ID of the closest matching reference gene |
_score | score of the closest match |
_nt_diffs | number of nucleotides differing from the closest reference sequence |
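Because these columns are prefixed by the reference name, their full names depend on what was passed to -ref. The sketch below builds the three expected column names for a given reference and pulls them out of a row. The reference name `imgt`, the helper names, and the sample row are hypothetical, and the sketch assumes the prefix is joined directly to the suffixes listed above.

```python
REF_SUFFIXES = ("_match", "_score", "_nt_diffs")


def ref_columns(ref_name):
    """Return the per-reference column names for one reference."""
    return [f"{ref_name}{suffix}" for suffix in REF_SUFFIXES]


def closest_reference(row, ref_name):
    """Extract the per-reference fields from a row (a dict keyed by column name)."""
    match_col, score_col, diffs_col = ref_columns(ref_name)
    return {
        "match": row[match_col],
        "score": row[score_col],
        "nt_diffs": row[diffs_col],
    }


# Hypothetical row, as it might come from csv.DictReader (all values are strings).
row = {"contig": "contig_1", "imgt_match": "IGHV1-2*02", "imgt_score": "98.5", "imgt_nt_diffs": "3"}
print(ref_columns("imgt"))
print(closest_reference(row, "imgt"))
```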
Functionality
Functionality is assigned as follows:
Functional
RSS and leader meet or exceed position-weighted matrix threshold
Highly-conserved nucleotides agree with the definition for the locus, if a definition has been specified
If a V-gene, leader starts with ATG, and spliced leader has no stop codons
If a V-gene, coding region has no stop codons before the cysteine at IMGT position 104
If a V-gene, conserved nucleotides are at the expected locations
If a J-gene, donor splice is as expected and coding region has no stop codons
ORF
One or more of the above conditions are not met, but no stop codon has been detected
If a V-gene, leader starts with ATG
Pseudo
Coding region contains stop codon(s)
Leader does not start with ATG
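The rules above can be read as a small decision procedure. The sketch below is one simplified interpretation, not digger's actual implementation: it takes the outcome of the individual checks as booleans and returns one of the three labels. It assumes that either Pseudo condition alone is sufficient, and that the leader check only applies to V-genes; the function and parameter names are hypothetical.

```python
def assign_functionality(all_functional_conditions_met, has_stop_codon,
                         is_v_gene=False, leader_starts_with_atg=True):
    """Return 'Functional', 'ORF' or 'Pseudo' from pre-computed checks.

    all_functional_conditions_met: every condition listed under Functional holds.
    has_stop_codon: a stop codon was detected in the coding region.
    """
    if all_functional_conditions_met:
        return "Functional"
    # Assumption: either condition on its own makes the gene Pseudo.
    if has_stop_codon or (is_v_gene and not leader_starts_with_atg):
        return "Pseudo"
    # Some condition failed, but no stop codon and (for V-genes) the leader starts with ATG.
    return "ORF"


print(assign_functionality(True, False))                                  # Functional
print(assign_functionality(False, False, is_v_gene=True))                 # ORF
print(assign_functionality(False, True))                                  # Pseudo
print(assign_functionality(False, False, is_v_gene=True,
                           leader_starts_with_atg=False))                 # Pseudo
```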