create_alignment

Create a formatted alignment display from gap-aligned IG/TR alleles. This tool takes FASTA files containing IMGT-gapped nucleotide sequences and produces a human-readable alignment showing nucleotide sequences with amino acid translations and silent/nonsilent mutations. The tool should be used on a set of alleles from the same gene.

The tool supports V, D, and J sequence types with type-specific formatting:

  • V sequences: Display both nucleotides and amino acids with CDR region delineation

  • D sequences: Display only nucleotides (no amino acid translations)

  • J sequences: Display both nucleotides and amino acids without CDR regions (for translation, the alleles should be gap-aligned to a codon boundary at the 5’ end)

For V sequences, CDR regions are marked based on configurable codon coordinates.

Sequences are automatically sorted alphabetically by allele name for consistent output formatting.

Create a formatted alignment display from gapped sequences

usage: create_alignment [-h] [--codon_wrap CODON_WRAP] [--v_coords V_COORDS] [--filter FILTER] input_file {V,D,J} output_file

Positional Arguments

input_file

FASTA file containing gap-aligned nucleotide sequences

sequence_type

Possible choices: V, D, J

Type of sequence (V, D, or J) - currently only V is implemented

output_file

Output file for the formatted alignment

Named Arguments

--codon_wrap, -w

Number of codons per line before wrapping (default: 20)

Default: 20

--v_coords, -c

Comma-separated list of 1-based codon coordinates for CDR1, CDR2 and CDR3-start (default: 27,38,56,65,105)

--filter, -f

Only include sequences in the file that contain this substring (e.g. IGHV1-18)

CDR delineation

For V sequences, the CDR regions are marked using codon coordinates (because the input sequences are IMGT-aligned, the CDR positions occupy specific coordinates). By default, the tool uses the following IMGT codon positions, which are correct for IMGT-aligned human allele sequences: - CDR1 start: 27 - CDR1 end: 38 - CDR2 start: 56 - CDR2 end: 65 - CDR3 start: 105

Codon boundaries for other species may differ, and the user can specify alternative coordinates using the --v_coords option. As an example, the default coordinates would be specified as follows:

--v_coords "27,38,56,65,105"

Examples

Create an alignment of V sequences with default CDR coordinates:

create_alignment sequences.fasta V output.txt

Create an alignment of D sequences (nucleotides only):

create_alignment d_sequences.fasta D d_alignment.txt

Create an alignment with custom CDR coordinates and shorter line wrapping:

create_alignment sequences.fasta V output.txt --v_coords "25,30,54,60,100" --codon_wrap 15

Filter sequences containing a specific pattern:

create_alignment sequences.fasta V output.txt --filter "IGHV1-18"