wiki:DawgManual

Examples

Dawg comes with several examples which can be found in the "example" directory.

  • example0.dawg - minimal
  • example1.dawg - typical usage
  • example2.dawg - simple indel formation
  • example3.dawg - robust indel formation
  • example4.dawg - recombination

Command Line Usage

dawg -[scubvh?] file1 [file2...]

  • -s: process files serially [default]
  • -c: process files combined together
  • -u: unbuffered output
  • -b: buffered output [default]
  • -v: display version information
  • -h: display help information
  • -?: same as -h

Dawg reads from stdin if a filename is "-".
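The usage line above can be driven from a script. The following is a minimal Python sketch, not part of Dawg itself: the helper names `build_dawg_command` and `run_dawg` are hypothetical, and it assumes a `dawg` executable is on the PATH and that results go to standard output when no "File" option is set. Flags are passed one at a time, since the manual does not say whether they may be combined into a single argument.

```python
import subprocess


def build_dawg_command(files, combined=False, unbuffered=False, exe="dawg"):
    """Build an argument list matching `dawg -[scubvh?] file1 [file2...]`.

    Hypothetical helper: only the flags documented above are used.
    """
    if not files:
        raise ValueError("at least one input file (or '-') is required")
    cmd = [exe]
    cmd.append("-c" if combined else "-s")    # -s (serial) is the documented default
    cmd.append("-u" if unbuffered else "-b")  # -b (buffered) is the documented default
    cmd.extend(files)
    return cmd


def run_dawg(files, **kwargs):
    """Run Dawg and return whatever it writes to stdout (assumes `dawg` is installed)."""
    cmd = build_dawg_command(files, **kwargs)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

For example, `build_dawg_command(["example0.dawg", "example1.dawg"], combined=True)` yields the argument list for processing both files combined together.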

File Format

The file format is a series of statements of the form "name = value", where "name" is alphanumeric and "value" can be a string, number, boolean, tree, or vector of values. A single value is equivalent to a vector with a single entry.

  • string: "[char-sequence]"
  • string: <<EOF [several lines] EOF
  • number: [sign]digits[.digits][(e|E)[sign]digits]
  • boolean: true|false
  • tree: Newick Format
  • vector: { value, value, ...}
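As an illustration of the value syntax listed above, here is a small Python sketch that checks the number grammar and writes "name = value" statements. It is not Dawg's own parser: the names `NUMBER_RE`, `is_dawg_number`, `format_value`, and `format_statement` are hypothetical, the regular expression is a literal transcription of the grammar as written (assuming "sign" means "+" or "-"), and trees, the "<<EOF" string form, and strings containing double quotes are deliberately left out because the manual does not describe how they are delimited or escaped.

```python
import re

# Literal transcription of: [sign]digits[.digits][(e|E)[sign]digits]
# Assumption: "sign" is either "+" or "-".
NUMBER_RE = re.compile(r"[+-]?\d+(\.\d+)?([eE][+-]?\d+)?")


def is_dawg_number(text):
    """Return True if text matches the number grammar exactly as written above."""
    return NUMBER_RE.fullmatch(text) is not None


def format_value(value):
    """Render a Python value using the string, number, boolean, and vector forms."""
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        text = repr(value)
        if not is_dawg_number(text):  # rejects inf and nan
            raise ValueError(f"not representable as a number: {text}")
        return text
    if isinstance(value, str):
        if '"' in value:
            raise ValueError("escaping of double quotes is not documented")
        return '"' + value + '"'
    if isinstance(value, (list, tuple)):
        return "{" + ", ".join(format_value(v) for v in value) + "}"
    raise TypeError(f"unsupported value type: {type(value).__name__}")


def format_statement(name, value):
    """Render one statement of the form name = value."""
    return f"{name} = {format_value(value)}"
```

Nested lists map onto vectors of vectors, so `format_statement("GapParams", [[1.0], [2.0, 0.5]])` produces a statement in the vector-of-vectors shape that "GapParams" accepts.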

Options

The following table lists the options recognized by Dawg.

Name           Type                          Description
-------------  ----------------------------  --------------------------------------------------------
Tree           vector of trees               phylogeny
TreeScale      number                        coefficient to scale branch lengths by
Sequence       vector of strings             root sequences
Length         vector of numbers             length of generated root sequences
Rates          vector of vectors of numbers  rate of evolution of each root nucleotide
Model          string                        model of evolution: GTR|JC|K2P|K3P|HKY|F81|F84|TN
Freqs          vector of numbers             nucleotide (ACGT) frequencies
Params         vector of numbers             parameters for the model of evolution
Width          number                        block width for indels and recombination
Scale          vector of numbers             block position scales
Gamma          vector of numbers             coefficients of variance for rate heterogeneity
Alpha          vector of numbers             shape parameters
Iota           vector of numbers             proportions of invariant sites
GapModel       vector of strings             models of indel formation: NB|PL|US
Lambda         vector of numbers             rates of indel formation
GapParams      vector of vectors of numbers  parameters for the indel model
Reps           number                        number of data sets to output
File           string                        output file
Format         string                        output format: Fasta|Nexus|Phylip|Clustal
GapSingleChar  boolean                       output gaps as a single character
GapPlus        boolean                       distinguish insertions from deletions in alignment
LowerCase      boolean                       output sequences in lowercase
Translate      boolean                       translate output sequences to amino acids
NexusCode      string                        text or file to include between datasets in Nexus format
Seed           vector of numbers             PRNG seed (integers)

Default Options

  TreeScale = 1.0
  Length = 100
  Model = "JC"
  Freqs = {0.25,0.25,0.25,0.25}
  Params = {1.0,1.0,1.0,1.0,1.0,1.0}
  Width = 1
  Scale = 1.0
  Gamma = 0.0
  Iota = 0.0
  GapModel = "US"
  GapParams = 1.0
  Reps = 1
  Format = "Fasta"
  GapSingleChar = false
  GapPlus = false
  LowerCase = false
  Translate = false

Notes

  1. The meaning of the "Params" vector is different for each substitution model.
    • GTR: Substitution rates A-C, A-G, A-T, C-G, C-T, G-T
    • JC: Ignored
    • K2P: Transition rate, Transversion rate
    • K3P: Alpha (Transitions), Beta (A-T & G-C), Gamma (A-C & G-T)
    • HKY: Transition rate, Transversion rate
    • F81: Ignored
    • F84: Kappa
    • TN: Alpha1 (A-G), Alpha2 (C-T), Beta (Transversions)
  2. Parameter "Freqs" is ignored by the models "JC", "K2P", and "K3P".
  3. If "Lambda" is a single value, it specifies the total rate of indel formation, split equally between insertions and deletions, e.g. "Lambda = 0.1" is the same as "Lambda = {0.05, 0.05}". If "Lambda" is a vector, the first parameter is the insertion rate and the second parameter is the deletion rate.
  4. The first parameter of "GapModel" specifies the distribution model of insertion sizes. The second parameter specifies the distribution model of deletion sizes. If only one parameter is given, it is the model for both insertions and deletions.
  5. The first parameter of "GapParams" is a vector specifying the parameters for the gap model of insertions. Likewise, the second parameter is a vector specifying the parameters for the gap model of deletions. If "GapParams" is not a vector of vectors, then it specifies the vector of parameters for both insertions and deletions.
  6. The meaning of the "GapParams" vector is different for each gap model.
    • US: The distribution of gap sizes.
    • NB: The number of failures (r), the probability of success (q).
    • PL: The rate parameter (a), the maximum gap size.
  7. To create a recombinant tree, you may need to specifically describe and label the inner nodes at which the recombination events occur. See example4.dawg.
  8. "Gamma" takes precedence over "Alpha".
  9. "Sequence" takes precedence over "Length".
  10. If "NexusCode" is the name of a file, the code is read from that file.
  11. The following vector parameters have a size of "Width": "Scale", "Alpha", "Gamma", and "Iota". If their size is less than "Width", then the first value in the vector is used to fill in the rest of the values, e.g. "Scale = 1.0" is the same as "Scale = {1.0,1.0,1.0}" when "Width = 3".
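Notes 3 and 11 describe two expansion rules that are easy to misread, so here is a short Python sketch of both. It is a literal reading of the notes rather than Dawg's implementation: the function names `split_lambda` and `fill_to_width` are hypothetical, and the handling of vectors longer than "Width" is an assumption because the manual does not cover that case.

```python
def split_lambda(lam):
    """Expand "Lambda" into (insertion_rate, deletion_rate) as described in note 3."""
    values = [lam] if isinstance(lam, (int, float)) else list(lam)
    if len(values) == 1:
        # A single value is the total indel rate, halved between the two kinds.
        return (values[0] / 2, values[0] / 2)
    if len(values) == 2:
        # First entry is the insertion rate, second is the deletion rate.
        return (values[0], values[1])
    raise ValueError("Lambda takes one or two values")


def fill_to_width(values, width):
    """Pad a "Scale", "Alpha", "Gamma", or "Iota" vector as described in note 11."""
    values = [values] if isinstance(values, (int, float)) else list(values)
    if not values:
        raise ValueError("at least one value is required")
    missing = width - len(values)
    if missing > 0:
        # The first value fills every missing position.
        values = values + [values[0]] * missing
    # Assumption: vectors already at least `width` long are returned unchanged.
    return values
```

Under this reading, `fill_to_width([1.0, 2.0], 3)` gives `[1.0, 2.0, 1.0]`: the padding repeats the first value, not the last one.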