Examples
Dawg comes with several examples which can be found in the "example" directory.
- example0.dawg - minimal
- example1.dawg - typical usage
- example2.dawg - simple indel formation
- example3.dawg - robust indel formation
- example4.dawg - recombination
Command Line Usage
dawg -[scubvh?] file1 [file2...]
- -s: process files serially [default]
- -c: process files combined together
- -u: unbuffered output
- -b: buffered output [default]
- -v: display version information
- -h: display help information
- -?: same as -h
Dawg will read stdin if filename is "-".
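For scripted runs, the usage line above can be assembled programmatically. The Python sketch below uses a hypothetical helper, `build_dawg_command`, that only builds the argument list and does not run anything. It assumes the executable is named `dawg` and that option letters may share a single dash, as the usage line suggests.

```python
def build_dawg_command(files, combined=False, unbuffered=False):
    """Return an argv list following `dawg -[scubvh?] file1 [file2...]`.

    Hypothetical helper: it only assembles arguments and never runs Dawg.
    """
    if not files:
        raise ValueError("at least one input file is required")
    # -s (serial) and -b (buffered) are the documented defaults; they are
    # spelled out here only to make the chosen mode explicit.
    letters = ("c" if combined else "s") + ("u" if unbuffered else "b")
    # Assumption: option letters can be grouped after one dash.
    return ["dawg", "-" + letters] + list(files)


if __name__ == "__main__":
    # A file name of "-" asks Dawg to read that input from stdin.
    print(build_dawg_command(["example1.dawg", "-"], combined=True))
```

The resulting list can be passed to `subprocess.run` on a machine where Dawg is installed.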
File Format
A Dawg input file is a series of statements of the form "name = value", where "name" is alphanumeric and "value" is a string, number, boolean, tree, or vector of values. A single value is equivalent to a vector with one entry.
- string: "[char-sequence]"
- string: <<EOF [several lines] EOF
- number: [sign]digits[.digits][(e|E)[sign]digits]
- boolean: true|false
- tree: Newick Format
- vector: { value, value, ...}
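The number grammar in the list above maps directly onto a regular expression. The following Python sketch is an illustration of that grammar, not Dawg's own parser. The names `DAWG_NUMBER` and `is_dawg_number` are made up for this example, and it assumes that "sign" means `+` or `-`.

```python
import re

# [sign]digits[.digits][(e|E)[sign]digits]
DAWG_NUMBER = re.compile(r"[+-]?[0-9]+(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?")


def is_dawg_number(text):
    """Return True if the whole string matches the number grammar."""
    return DAWG_NUMBER.fullmatch(text) is not None


if __name__ == "__main__":
    for token in ["100", "-0.25", "1.0e-3", ".5", "1.", "true"]:
        print(token, is_dawg_number(token))
```

Read literally, the grammar requires digits on both sides of the decimal point, so this sketch rejects ".5" and "1."; Dawg itself may be more lenient.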
Options
The following table lists the options recognized by Dawg.
| Name | Type | Description |
| Tree | vector of trees | phylogeny |
| TreeScale | number | coefficient to scale branch lengths by |
| Sequence | vector of strings | root sequences |
| Length | vector of numbers | length of generated root sequences |
| Rates | vector of vectors of numbers | rate of evolution of each root nucleotide |
| Model | string | model of evolution: GTR, JC, K2P, K3P, HKY, F81, F84, or TN |
| Freqs | vector of numbers | nucleotide (ACGT) frequencies |
| Params | vector of numbers | parameters for the model of evolution |
| Width | number | block width for indels and recombination |
| Scale | vector of numbers | block position scales |
| Gamma | vector of numbers | coefficients of variance for rate heterogeneity |
| Alpha | vector of numbers | shape parameters |
| Iota | vector of numbers | proportions of invariant sites |
| GapModel | vector of strings | models of indel formation: NB, PL, or US |
| Lambda | vector of numbers | rates of indel formation |
| GapParams | vector of vectors of numbers | parameters for the indel model |
| Reps | number | number of data sets to output |
| File | string | output file |
| Format | string | output format: Fasta, Nexus, Phylip, or Clustal |
| GapSingleChar | boolean | output gaps as a single character |
| GapPlus | boolean | distinguish insertions from deletions in alignment |
| LowerCase | boolean | output sequences in lowercase |
| Translate | boolean | translate output sequences to amino acids |
| NexusCode | string | text or file to include between datasets in Nexus format |
| Seed | vector of numbers | PRNG seed (integers) |
Default Options
TreeScale = 1.0
Length = 100
Model = "JC"
Freqs = {0.25,0.25,0.25,0.25}
Params = {1.0,1.0,1.0,1.0,1.0,1.0}
Width = 1
Scale = 1.0
Gamma = 0.0
Iota = 0.0
GapModel = "US"
GapParams = 1.0
Reps = 1
Format = "Fasta"
GapSingleChar = false
GapPlus = false
LowerCase = false
Translate = false
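When generating input files from a script, it can help to mirror these defaults in a small table and write out only complete "name = value" statements. The Python sketch below is one way to do that. The names `DEFAULTS`, `format_value`, `effective_options`, and `render` are hypothetical, the sketch assumes that user statements simply override the defaults, and it leaves out trees and quote escaping.

```python
# Mirror of the default options listed above.
DEFAULTS = {
    "TreeScale": 1.0,
    "Length": 100,
    "Model": "JC",
    "Freqs": [0.25, 0.25, 0.25, 0.25],
    "Params": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
    "Width": 1,
    "Scale": 1.0,
    "Gamma": 0.0,
    "Iota": 0.0,
    "GapModel": "US",
    "GapParams": 1.0,
    "Reps": 1,
    "Format": "Fasta",
    "GapSingleChar": False,
    "GapPlus": False,
    "LowerCase": False,
    "Translate": False,
}


def format_value(value):
    """Render a Python value as a boolean, number, string, or vector."""
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        if '"' in value:
            raise ValueError("quote escaping is not covered by this sketch")
        return '"' + value + '"'
    if isinstance(value, (list, tuple)):
        return "{" + ",".join(format_value(v) for v in value) + "}"
    raise TypeError("unsupported value type: " + type(value).__name__)


def effective_options(overrides):
    """Assumption: any user-supplied option replaces the default wholesale."""
    merged = dict(DEFAULTS)
    merged.update(overrides)
    return merged


def render(options):
    """Return one "name = value" statement per line."""
    return "\n".join(
        name + " = " + format_value(value) for name, value in options.items()
    )


if __name__ == "__main__":
    print(render(effective_options({"Model": "HKY", "Reps": 10})))
```

Tree values are omitted because their Newick syntax is not a plain string or number; they would need to be written by hand or by a separate helper.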
Notes
- The meaning of the "Params" vector is different for each substitution model.
  - GTR: Substitution rates A-C, A-G, A-T, C-G, C-T, G-T
  - JC: Ignored
  - K2P: Transition rate, Transversion rate
  - K3P: Alpha (Transitions), Beta (A-T & G-C), Gamma (A-C & G-T)
  - HKY: Transition rate, Transversion rate
  - F81: Ignored
  - F84: Kappa
  - TN: Alpha1 (A-G), Alpha2 (C-T), Beta (Transversions)
- Parameter "Freqs" is ignored by the models "JC", "K2P", and "K3P".
- If "Lambda" is a single value, it specifies the total rate of indel formation, which is split evenly between insertions and deletions, e.g. "Lambda = 0.1" is the same as "Lambda = {0.05, 0.05}". When two values are given, the first is the insertion rate and the second is the deletion rate.
- The first parameter of "GapModel" specifies the distribution model of insertion sizes. The second parameter specifies the distribution model of deletion sizes. If only one parameter is given, it is the model for both insertions and deletions.
- The first parameter of "GapParams" is a vector specifying the parameters for the gap model of insertions. Likewise, the second parameter is a vector specifying the parameters for the gap model of deletions. If "GapParams" is not a vector of vectors, then it specifies the vector of parameters for both insertions and deletions.
- The meaning of the "GapParams" vector is different for each gap model.
  - US: The distribution of gap sizes.
  - NB: The number of failures (r), the probability of success (q).
  - PL: The rate parameter (a), the maximum gap size.
- To create a recombinant tree, you may need to specifically describe and label the inner nodes at which the recombination events occur. See example4.dawg.
- "Gamma" takes precedence over "Alpha".
- "Sequence" takes precedence over "Length".
- If "NexusCode" is the name of a file, the code is read from that file.
- The following vector parameters have a size of "Width": "Scale", "Alpha", "Gamma", and "Iota". If such a vector has fewer than "Width" entries, its first value is used to fill in the remaining entries, e.g. "Scale = 1.0" is the same as "Scale = {1.0,1.0,1.0}" when "Width = 3".
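The shorthand rules in these notes for "Lambda", "GapModel", and the "Width"-sized vectors can be written out as small functions. This Python sketch follows the wording of the notes and is not Dawg's actual implementation. The function names are hypothetical, and the handling of vectors longer than "Width" is an assumption because the notes do not cover that case.

```python
def split_lambda(value):
    """Return (insertion_rate, deletion_rate) for a "Lambda" setting."""
    values = [value] if isinstance(value, (int, float)) else list(value)
    if len(values) == 1:
        # A single value is the total indel rate, shared equally.
        return (values[0] / 2, values[0] / 2)
    if len(values) == 2:
        return (values[0], values[1])
    raise ValueError("Lambda takes one or two values")


def pair_for_indels(value):
    """Expand a "GapModel" setting to (insertion_model, deletion_model)."""
    values = [value] if isinstance(value, str) else list(value)
    if len(values) == 1:
        # One model applies to both insertions and deletions.
        return (values[0], values[0])
    if len(values) == 2:
        return (values[0], values[1])
    raise ValueError("GapModel takes one or two values")


def fill_to_width(values, width):
    """Pad "Scale", "Alpha", "Gamma", or "Iota" to "Width" entries."""
    values = [values] if isinstance(values, (int, float)) else list(values)
    if not values:
        raise ValueError("at least one value is required")
    # Missing entries are filled with the first value, per the note above.
    # Assumption: longer vectors are returned unchanged, since the notes do
    # not say how Dawg treats them.
    return values + [values[0]] * max(0, width - len(values))


if __name__ == "__main__":
    print(split_lambda(0.1))
    print(pair_for_indels("NB"))
    print(fill_to_width(1.0, 3))
```

"GapParams" follows the same one-or-two pattern as "GapModel", except that each entry is itself a vector of parameters.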
