make_igblast_ndm

Make IMGT-delineated igblast ndm file from an IMGT-gapped variable-region reference set or AIRR-C JSON format file.

usage: make_igblast_ndm [-h] [--cdr_coords CDR_COORDS] ref_file chain ndm_file

Positional Arguments

ref_file

gapped reference set (.fasta, .json)

chain

chain as required by igblast (eg VH)

ndm_file

output file (ndm)

Named Arguments

--cdr_coords, -c

comma-separated list of 1-based co-ordinates for CDR1, CDR2 and CDR3-start (default 79,114,166,195,313)

Notes

If the reference set is available in AIRR-C JSON format, it is recommended to use this format as it will ensure correct placement of the CDRs using coordinates for each sequence from the file. All germline sets on OGRDB are available in this format.

If the fasta format is used, V sequences must be IMGT gapped. By default, make_igblast_ndm assumes that the gapped alignment follows the IMGT numbering scheme with no inserted codons. Please note that some reference sets available from IMGT, such as those for rhesus macaque, do contain insertions in the alignment. In this case, you can use the --cdr_coords option to specify coordinates of the CDRs. If in doubt, please analyse one sequence from the gapped set in IMGT V-QUEST and review the codon alignment: if there are insertions, denoted by a codon number followed by A, B, C, etc., you will need to specify the CDR coordinates using the --cdr_coords option. An alternative is to review the sequences in the reference set and remove the insertions. This may be an option, depending upon the application, the number of germline sequences that have the insertion, and their usage in your samples. We offer a specific tool, fix_macaque_gaps to conduct this realignment for IMGT rhesus macaque sets.

Please note that downstream analysis tools may assume that IMGT-gapped sequences follow the default alignment with no inserted codons. If you use a tool that requires a gapped set of V sequences and there is no provision to specify CDR coordinates, this is likely to be the case.