Usage
ngila [options] file1.fasta file2.fasta
Ngila concatenates multiple input files so that they look like one big FASTA file. Use the --pairs option to control how Ngila picks pairs of sequences to align.
Options
--model [-m] keyword
Specifies the alignment-cost model: 'zeta', 'geo', or 'cost'. The zeta and geo models derive alignment costs from evolutionary parameters. They are identical except for the indel length distribution: zeta uses a biologically realistic power-law distribution, while geo uses the more common geometric distribution. The cost model lets you specify alignment costs directly.
--free-end-gaps [-e]
Enable free end gaps. With this option, gaps at the front and back of the shorter sequence cost less than other gaps, or nothing at all. This is useful when you expect that the end points of a sequence pair are not homologous.
--case-insensitivity [-I]
Treat sequences and substitution matrices as case insensitive.
--threshold-larger [-M] number (default = 100000)
--threshold-smaller [-N] number (default = 100000)
Thresholds for O(MN)-memory alignment. Ngila combines a divide-and-conquer algorithm with a full dynamic programming algorithm that needs O(MN) memory. These settings control the transition between them.
--remove-gaps [-G] string (default = -+=)
Characters to remove from sequences before alignment. If your sequences have already been aligned and contain gap characters, this option allows you to specify the gap characters to remove before Ngila aligns the sequences.
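As an illustration of what this option does, here is a minimal Python sketch that strips gap characters from a sequence. The function name strip_gap_characters is hypothetical and is not part of Ngila; only the default character set -+= comes from the option above.

```python
def strip_gap_characters(sequence: str, gap_chars: str = "-+=") -> str:
    """Return `sequence` with every character listed in `gap_chars` deleted.

    Hypothetical helper that mimics the effect of --remove-gaps;
    it is not Ngila's own code.
    """
    # str.maketrans with a third argument builds a table that deletes those characters.
    return sequence.translate(str.maketrans("", "", gap_chars))


if __name__ == "__main__":
    previously_aligned = "acgt--acgt+ac=gt"
    print(strip_gap_characters(previously_aligned))
```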
--pairs keyword (default = first)
Controls how sequences are paired: 'first', 'all', or 'each'. Ngila can align multiple pairs of sequences in one run. Option 'first' causes Ngila to align only the first two sequences. Option 'all' causes Ngila to align all possible pairs of sequences: if three sequences are passed to Ngila, it will align A & B, A & C, and B & C. Option 'each' causes Ngila to align consecutive, non-overlapping pairs: if four sequences are passed to Ngila, it will align A & B and C & D.
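The three pairing modes can be sketched in Python as follows. This is an illustration of the behavior described above, not Ngila's implementation; the function name select_pairs is hypothetical, and dropping an unpaired trailing sequence in 'each' mode is an assumption.

```python
from itertools import combinations


def select_pairs(names, mode="first"):
    """Return the sequence pairs to align under a given --pairs keyword.

    Hypothetical sketch of the documented behavior, not Ngila's code.
    """
    if mode == "first":
        # Only the first two sequences are aligned.
        return [tuple(names[:2])] if len(names) >= 2 else []
    if mode == "all":
        # Every unordered pair, in input order.
        return list(combinations(names, 2))
    if mode == "each":
        # Consecutive, non-overlapping pairs. An odd trailing sequence
        # is dropped here (assumption; the text does not cover this case).
        return [(names[i], names[i + 1]) for i in range(0, len(names) - 1, 2)]
    raise ValueError(f"unknown pairing mode: {mode!r}")


if __name__ == "__main__":
    for mode in ("first", "all", "each"):
        print(mode, select_pairs(["A", "B", "C", "D"], mode))
```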
--arg-file filename
Read command-line options from a file. This can simplify using common options by writing them to a file and directing Ngila to read the options from that file. Example file format:
    model zeta
    branch-length 0.2
    ratio 1.9
    indel-rate 0.06
    indel-slope 1.8
--input filename (default = -)
FASTA-formatted file from which to read sequences. It defaults to stdin and can be specified multiple times. Users can ignore this option because input files can be specified directly without using --input. Example file format:
    >Sequence_A
    acgtacgtacgtacgt
    >Sequence_B
    acgtacgtacgtacgtttat
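To illustrate how several inputs can be treated as one big FASTA file, here is a minimal Python sketch that reads records from any number of file-like objects and returns them as a single list. The names read_fasta and read_all are hypothetical, and the handling of blank lines and multi-line sequences is an assumption about typical FASTA input, not a description of Ngila's parser.

```python
import io


def read_fasta(handle):
    """Parse one FASTA stream into a list of (name, sequence) tuples."""
    records = []
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        elif name is not None:
            chunks.append(line)  # sequence data may span several lines
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def read_all(handles):
    """Concatenate the records of several FASTA streams, in order."""
    records = []
    for handle in handles:
        records.extend(read_fasta(handle))
    return records


if __name__ == "__main__":
    file1 = io.StringIO(">Sequence_A\nacgtacgtacgtacgt\n")
    file2 = io.StringIO(">Sequence_B\nacgtacgtacgt\nacgtttat\n")
    for name, seq in read_all([file1, file2]):
        print(name, seq)
```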
--quiet
Disable output of all warnings and error messages.
--version
Display version information.
--help
Display help information.
Evolutionary Model Options
--branch-length [-t] number (default = 0.1)
Specifies the separation time between sequences. Here separation time is measured by the expected number of substitutions per site.
--ratio [-k] number (default = 2)
Transition-transversion ratio.
--indel-rate [-r] number (default = 0.1)
Rate of insertion and deletion relative to the expected number of substitutions. Under this model the probability of an insertion and the probability of a deletion occurring at any particular site is
--indel-slope [-z] number (default = 1.7)
Slope parameter of the indel length distribution. Only used by the zeta model, for which the probability that an indel is of length g is P(g) = g^(-z) / zeta(z), where z is the slope and zeta is the Riemann zeta function.
--indel-mean [-q] number (default = 10)
Mean of the indel length distribution. Only used by the geo model, for which the probability that an indel is of length g is P(g) = (1/q) (1 - 1/q)^(g-1), where q is the mean and g = 1, 2, 3, ...
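To give a feel for how the two indel length distributions differ, the following Python sketch evaluates both at the default parameters. It assumes the standard parameterizations: a zeta (power-law) distribution with P(g) = g^(-z) / zeta(z), and a geometric distribution on g = 1, 2, ... with mean q. Whether Ngila uses exactly these forms internally is an assumption, and all function names are hypothetical.

```python
import math


def riemann_zeta(s: float, terms: int = 1000) -> float:
    """Approximate zeta(s) for s > 1 with a partial sum plus Euler-Maclaurin tail terms."""
    if s <= 1.0:
        raise ValueError("zeta(s) diverges for s <= 1")
    head = sum(n ** -s for n in range(1, terms))
    # Integral estimate of the remaining terms n >= `terms`, plus half the first of them.
    tail = terms ** (1.0 - s) / (s - 1.0) + 0.5 * terms ** -s
    return head + tail


def zeta_indel_pmf(g: int, slope: float = 1.7) -> float:
    """P(indel length = g) under the assumed power-law (zeta) model."""
    return g ** -slope / riemann_zeta(slope)


def geo_indel_pmf(g: int, mean: float = 10.0) -> float:
    """P(indel length = g) under the assumed geometric model with the given mean."""
    p = 1.0 / mean
    return p * (1.0 - p) ** (g - 1)


if __name__ == "__main__":
    for g in (1, 10, 100, 1000):
        print(f"g={g:5d}  zeta={zeta_indel_pmf(g):.3e}  geo={geo_indel_pmf(g):.3e}")
```

Under these assumptions the power-law model puts far more probability on very long indels than the geometric model does, which is the practical difference between --model zeta and --model geo.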
--avgaln [-l] number (default = 1000)
Average alignment skeleton length. Mostly a nuisance parameter to make the model a proper HMM. It should have little to no effect on the best alignment if it is kept on the same order of magnitude as the sequence lengths.
Generic Cost Model Options
--cost-match [-i] number (default = 0)
Residue match cost.
--cost-mismatch [-j] number (default = 1)
Residue mismatch cost.
--cost-matrix [-x] filename
Matrix from which to read residue costs. Example file format:
    A C G T
    0 2 1 2
    2 0 2 1
    1 2 0 2
    2 1 2 0
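The example matrix can be read as a header row of residues followed by one row of costs per residue, in the same order. The Python sketch below parses that layout into a nested dictionary. The function name parse_cost_matrix is hypothetical, and the assumption that rows follow the header order is inferred from the example, not confirmed by Ngila's parser.

```python
def parse_cost_matrix(text: str) -> dict:
    """Parse a whitespace-separated cost matrix with a header row of residues.

    Assumes row i holds the costs for the i-th residue in the header.
    """
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty matrix")
    residues, rows = lines[0], lines[1:]
    if len(rows) != len(residues) or any(len(r) != len(residues) for r in rows):
        raise ValueError("matrix must be square and match the header")
    return {
        a: {b: float(value) for b, value in zip(residues, row)}
        for a, row in zip(residues, rows)
    }


if __name__ == "__main__":
    example = """\
A C G T
0 2 1 2
2 0 2 1
1 2 0 2
2 1 2 0
"""
    cost = parse_cost_matrix(example)
    print("A/G:", cost["A"]["G"], " A/C:", cost["A"]["C"])
```

In the example, transitions (A/G and C/T) cost 1 and transversions cost 2.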
--cost-intersection [-a] number (default = 1)
--cost-linear [-b] number (default = 1)
--cost-logarithmic [-c] number (default = 0)
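Specifies the coefficients of the gap cost. The cost of a gap of length k is defined by the following equation: a + b*k + c*ln(k), where a, b, and c are the values of --cost-intersection, --cost-linear, and --cost-logarithmic.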
--cost-intersection-free [-f] number (default = 0)
--cost-linear-free [-g] number (default = 0)
--cost-logarithmic-free [-h] number (default = 0)
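Specifies the coefficients of the free end gap cost. These settings only make sense if --free-end-gaps is enabled. The cost of a free end gap of length k is defined by the following equation: f + g*k + h*ln(k), where f, g, and h are the values of --cost-intersection-free, --cost-linear-free, and --cost-logarithmic-free.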
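As a worked illustration of the gap cost coefficients, the Python sketch below assumes that a gap of length k costs intersection + linear * k + logarithmic * ln(k). That form is inferred from the option names. The function gap_cost is hypothetical, and its defaults mirror the documented defaults for regular gaps.

```python
import math


def gap_cost(length: int, intersection: float = 1.0,
             linear: float = 1.0, logarithmic: float = 0.0) -> float:
    """Cost of a gap of `length` residues under the assumed cost function."""
    if length < 1:
        raise ValueError("gap length must be at least 1")
    return intersection + linear * length + logarithmic * math.log(length)


if __name__ == "__main__":
    # Regular gap with the default coefficients (a=1, b=1, c=0).
    print(gap_cost(3))
    # Free end gap with the default free coefficients (f=0, g=0, h=0).
    print(gap_cost(3, intersection=0.0, linear=0.0, logarithmic=0.0))
    # A logarithmic term makes long gaps grow in cost more slowly than a linear one.
    print(gap_cost(100, intersection=1.0, linear=0.0, logarithmic=2.0))
```

With the default free coefficients every end gap costs nothing, which is what makes end gaps "free" when --free-end-gaps is enabled.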
