Frequently Asked Questions
- Running
  - How do I estimate parameters from my data?
  - What does Gamma mean?
  - What does GapParams mean for the negative binomial model ("NB")?
  - What does GapParams mean for the power law model ("PL")?
  - What should I use for gap parameters?
  - Can I output ancestral and internal sequences?
  - Can I use different parameters on different branches?
- Compiling
Running
How do I estimate parameters from my data?
Dawg would be rather ineffective if users could not estimate simulation parameters from their data. Users can use PAUP* or other phylogeny programs to estimate GTR trees and models from their alignments. Dawg comes with a Perl script, lambda.pl, that can estimate some of the indel formation parameters. To run, it needs a file containing a tree with branch lengths in substitution time. (Standard GTR trees satisfy this.) It also needs an alignment in FASTA format to go along with the tree. To estimate parameters, simply run
perl -w lambda.pl data.tree data.fasta
lambda.pl has not been fully tested with older versions of Perl, such as 5.001.
What does Gamma mean?
Gamma is the coefficient of variance for the G+I rate heterogeneity model. It is the inverse of the shape parameter, Alpha, of the distribution. If Gamma and Alpha are both specified, Dawg will use Gamma.
What does GapParams mean for the negative binomial model ("NB")?
GapParams = {r,q}. In this model indel lengths are distributed according to a negative binomial:
\[
f(g \mid r,q) = \binom{r+g-2}{g-1} (1-q)^r q^{g-1}.
\]
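To get a feel for how r and q shape the gap-length distribution, the formula above can be evaluated directly. The following is a minimal Python sketch, not part of Dawg; the function name `nb_gap_pmf` is hypothetical, and it assumes r is a positive integer (Dawg itself may accept other values).

```python
from math import comb  # requires Python 3.8+


def nb_gap_pmf(g, r, q):
    """Probability of a gap of length g >= 1 under the NB model.

    Assumes r is a positive integer and 0 <= q < 1 (an assumption of
    this sketch, not a statement about what Dawg accepts).
    """
    if g < 1:
        return 0.0
    return comb(r + g - 2, g - 1) * (1 - q) ** r * q ** (g - 1)


# With r = 1 the model reduces to a geometric distribution over g = 1, 2, ...
for g in range(1, 6):
    print(g, nb_gap_pmf(g, r=1, q=0.5))
```

Larger q puts more weight on long gaps, and r = 1 gives the familiar geometric gap model.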
What does GapParams mean for the power law model ("PL")?
GapParams = {a,m}. In this model indel lengths are distributed according to a truncated power law:
\[
f(g \mid a,m) = \frac{g^{-a}}{\sum_{n=1}^{m} n^{-a}}.
\]
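The truncated power law can be tabulated in the same way. This is a minimal Python sketch, not Dawg's own code; the function name `pl_gap_distribution` is hypothetical. It computes the normalizing sum once and returns the probabilities for gap lengths 1 through m.

```python
def pl_gap_distribution(a, m):
    """Return [f(1), ..., f(m)] for the truncated power law.

    a is the exponent and m the largest allowed gap length, matching
    GapParams = {a, m} in the text above.
    """
    weights = [n ** -a for n in range(1, m + 1)]
    total = sum(weights)
    return [w / total for w in weights]


# Small illustration: exponent 1.67, gaps truncated at length 10.
probs = pl_gap_distribution(a=1.67, m=10)
for g, p in enumerate(probs, start=1):
    print(g, round(p, 4))
```

Because the distribution is truncated at m, no gap longer than m can be drawn.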
What should I use for gap parameters?
Ideally, if you have enough data, you should estimate parameters from the data using lambda.pl. In lieu of that, I recommend
Lambda = {0.035, 0.105}
GapModel = "PL"
GapParams = {1.67, 10000}
Length = 1000
Make sure that the second number in GapParams is not smaller than Length.
Can I output ancestral and internal sequences?
Yes. Dawg will output the sequence for any labeled node in the tree. The only exceptions are labels that begin with an underscore. (Recombinant trees may require labels that users don't want reported.) In the tree
Tree = ((A:0.1,B:0.1)E:0.1,(C:0.1,D:0.1)_F:0.1)R;
A, B, C, D, E, and R will be reported. _F will not be reported.
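The reporting rule can be checked against a tree string before running a simulation. The sketch below is an illustration in Python, not Dawg's parser; the function name `reported_labels` is hypothetical, and it assumes simple unquoted Newick labels with no comments.

```python
import re


def reported_labels(newick):
    """List node labels that do not start with an underscore.

    Assumes plain, unquoted labels; this is a simplification for
    illustration and not how Dawg reads trees.
    """
    # Drop branch lengths (everything from ':' up to the next delimiter).
    without_lengths = re.sub(r":[^,();]*", "", newick)
    # Whatever remains between Newick punctuation is a node label.
    labels = re.findall(r"[^,();\s]+", without_lengths)
    return [label for label in labels if not label.startswith("_")]


tree = "((A:0.1,B:0.1)E:0.1,(C:0.1,D:0.1)_F:0.1)R;"
print(reported_labels(tree))  # ['A', 'B', 'E', 'C', 'D', 'R']
```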
Can I use different parameters on different branches?
Not directly, but it is possible to chain together several runs of Dawg to produce the effect. Let's say that we want to simulate the tree ((A:0.1,B:0.1)C:0.1)R; where the C-B branch has different parameters from everything else. In the first run of Dawg, use the tree ((A:0.1)C:0.1);, then use the sequence returned for C to run a simulation of the tree (B:0.1)C; with the parameter Sequence = "...", where ... is the sequence returned for C.
There is no automated implementation of this, but it is a feature request.
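Until that exists, the second configuration can be generated by a small script. The following Python sketch only builds the text for the second run from the example above; the function name `second_run_config` and the placeholder sequence are hypothetical. Extracting the sequence for C from the first run's output is left out, because it depends on the output format you choose, and any branch-specific parameters would need to be added to the generated text by hand.

```python
def second_run_config(sequence_for_c):
    """Build the Tree and Sequence lines for the second Dawg run.

    sequence_for_c is the sequence reported for node C by the first run.
    Only the two settings named in the text are written.
    """
    return 'Tree = (B:0.1)C;\nSequence = "{}"\n'.format(sequence_for_c)


# "ACGTACGT" is a made-up placeholder, not real Dawg output.
print(second_run_config("ACGTACGT"))
```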
Compiling
How do I compile the source code?
Dawg uses the GNU automake and autoconf tools to provide portability across multiple systems. To compile Dawg, you need to configure it for your system before you make it.
Commands
./configure
make
make install
How do I compile the source code for Windows?
You have three easy options if you want to compile the source code on Windows: Cygwin, MinGW, and Visual Studio. With Cygwin and MinGW you can use the above commands to compile the source code. Follow the instructions below to compile in Visual Studio .Net.
How do I compile in Visual Studio .Net?
The "vs" directory in the source code has project files for Visual Studio .Net 2003. You need to install ports of GNU Bison and Flex. I use the ports from the GnuWin32 Project.
How do I check out and compile from current?
To compile current you need the latest versions of Subversion, automake, and autoconf. First check out the source code from the current branch, then construct the make and configure files, then do the standard installation.
Commands
svn checkout https://scit.us/svn/dawg/current dawg-current
cd dawg-current
./autogen.sh
./configure
make
make install
Are there any distributions of current?
Distributions of current are rolled every week. Compiling current from a distribution saves some of the above steps. You can download the latest rolled current at http://scit.us/projects/files/dawg-current.tar.gz. Once you extract the source code simply use the standard configure and make commands.
