Revision 358, 10.9 KB (checked in by reed, 15 months ago)

DAWG VERSION 1.2-RELEASE

Copyright (c) 2004-2009 Reed A. Cartwright - All rights reserved.

DESCRIPTION

Dawg is an application that will simulate nucleotide evolution with
gaps.

ABSTRACT

DNA Assembly with Gaps (Dawg) is an application designed to simulate the
evolution of recombinant DNA sequences in continuous time based on the
robust general time reversible model with gamma and invariant rate
heterogeneity and a novel length-dependent model of gap formation. The
application accepts phylogenies in Newick format and can return the
sequence of any node, allowing for the exact evolutionary history to be
recorded at the discretion of users. Dawg records the gap history of
every lineage to produce the true alignment in the output. Many options
are available to allow users to customize their simulations and results.

Many tools and procedures exist for reconstructing alignments and
phylogenies and estimating evolutionary parameters from extant data.
True phylogenies and alignments are known only in very rare instances.
In the absence of known data with true phylogenies, we must rely on
simulations to test the accuracy of such procedures. Proper simulation
of sequence evolution should involve both nucleotide substitution and
indel formation. However, existing tools for simulating sequence
evolution either do not include indels, like Seq-Gen or evolver, or
include a rather inexact model of indel formation, like Rose. I
developed Dawg to fill in these gaps.

CONTACT

racartwr@ncsu.edu or reed@scit.us

Reed A. Cartwright, PhD
Postdoctoral Research Associate
Department of Genetics
Bioinformatics Research Center
North Carolina State University
Campus Box 7566
Raleigh, NC 27695-7566

Most work was done while I was a PhD student:

Department of Genetics
University of Georgia
Athens, GA

REFERENCE

Cartwright, R.A. (2005) DNA Assembly With Gaps (Dawg): Simulating Sequence
Evolution. Bioinformatics 21 (Suppl. 3): iii31-iii38

LICENSE

See COPYING for license information.

DOWNLOAD

Dawg can be downloaded from <http://scit.us/projects/dawg/>.

INSTALLATION

See Dawg's website for binary packages for Windows, Mac OS X, and other
systems.  Alternatively, you can compile Dawg from source.  Building Dawg
from source requires CMake 2.6 (http://www.cmake.org/).  Many Unix-like
operating systems can install CMake through their package systems.
Extract the Dawg source code and issue the following commands in the
extracted directory:

    cmake .
    make
    make install

The '-G' option to cmake specifies the build system to generate, e.g. Unix
Makefiles versus a KDevelop3 project.  The '-D' option to cmake sets cmake
variables from the command line:

    cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/usr .
    make
    make install

This will build an optimized version of Dawg and install it to '/usr/bin'.
To use your own build flags, set the environment variables CFLAGS and
LDFLAGS as necessary, and then run the following (an empty build type keeps
CMake from adding its own configuration-specific flags):

    cmake -DCMAKE_BUILD_TYPE= .

See CMake's manual for additional information.

If you want to build the source code on Windows, you will need to install Flex
and Bison from the GnuWin32 project <http://gnuwin32.sourceforge.net/> and make
sure that they are in your path.  You can then run the CMake GUI.  If you would
prefer to use the command-line version, open a command console through the
Visual Studio tools shortcut (or a similar shortcut).  This will add the
required compiler programs to your command console environment.  After
changing to the source code directory, issue the following commands:

    cmake -G "NMake Makefiles" .
    nmake

If successful, you should find dawg.exe in the "src" directory.

If you are trying to compile Dawg on a UNIX machine that does not have CMake
installed, and you can't install it from a package, then you may need to install
it locally.  After downloading and extracting CMake in your home directory,
change to its directory and issue the following commands:

    ./configure --prefix=$HOME
    make
    make install

This installs the cmake program into $HOME/bin, so make sure that directory is
in your PATH.  If "make" fails, try using "gmake" instead.

EXAMPLES

example0.dawg - minimal
example1.dawg - typical usage
example2.dawg - simple indel formation
example3.dawg - robust indel formation
example4.dawg - recombination

COMMAND LINE USAGE

dawg -[scubvhqew?] [-o outputfile] file1 [file2...]
  -s: process files serially [default]
  -c: process files combined together
  -u: unbuffered output
  -b: buffered output [default]
  -q: disable error and warning reports (quiet)
  -e: enable error reports [default]
  -w: enable warning reports [default]
  -v: display version information
  -h: display help information
  -?: same as -h
  -o outputfile: override the output filename in the configuration file

  Dawg will read stdin if the filename is "-".

FILE FORMAT

A Dawg configuration file consists of a series of statements of the form
"[name]" or "name = value", where "name" is alphanumeric (its parts may be
joined by periods, as in "Out.Block.Head") and "value" can be a string,
number, Boolean, tree, or vector of values.  A "[name]" statement sets a
heading that is prefixed to the names in the statements that follow it,
which can simplify variable assignment.  A single value is equivalent to
a vector with one entry.

When using headings, the following sets of statements are equivalent:

  Out.Block.Head = "A Comment"
  Out.Block.Tail = "B Comment"

  [Out.Block]
  Head = "A Comment"
  Tail = "B Comment"

  Out.Block.Head = "A Comment"
  [Out.Block]
  Tail = "B Comment"

  [Out.Block]
  Head = "A Comment"
  []
  Out.Block.Tail = "B Comment"

  [Out]
  Block.Head = "A Comment"
  [Out.Block]
  Tail = "B Comment"

  [Out]
  Block.Head = "A Comment"
  [.Block]
  Tail = "B Comment"

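The heading rules above amount to a small name-resolution pass.  The Python
function below is a hypothetical sketch, not Dawg's parser: "resolve_names" is
an illustrative name, it assumes one statement per line with no comments, and
it assumes "[.Name]" appends to whatever heading is current (the examples
above only show it after "[Out]").

```python
def resolve_names(lines):
    """Map heading-relative statements to fully qualified variable names.

    Hypothetical sketch: "[Name]" sets the heading, "[]" clears it, and
    "[.Name]" is assumed to extend the current heading.  Values are kept
    as raw, unparsed text.
    """
    heading = ""
    assignments = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            inner = line[1:-1].strip()
            if inner.startswith("."):
                # Relative heading: append to the current heading.
                heading = heading + inner if heading else inner[1:]
            else:
                # Absolute heading; "[]" gives "" and clears the heading.
                heading = inner
            continue
        name, value = (part.strip() for part in line.split("=", 1))
        full_name = f"{heading}.{name}" if heading else name
        assignments[full_name] = value
    return assignments


# The last pair of equivalent forms from the list above.
print(resolve_names(['[Out]', 'Block.Head = "A Comment"',
                     '[.Block]', 'Tail = "B Comment"']))
```

Keeping values as raw text separates name resolution from value parsing, so
each of the six forms can be checked against the same result.
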
Values can be specified using the following syntaxes:

string:  "[char-sequence]"
         '[char-sequence]'
         """[multi-line char-sequence]""" (removes initial and final newlines)
         '''[multi-line char-sequence]''' (preserves initial and final newlines)
number:  [sign]digits[.digits][(e|E)[sign]digits]
boolean: true|false
tree:    Newick format
vector:  { value, value, ... }

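As an illustration of the number, Boolean, and single-line string syntaxes,
here is a hypothetical Python sketch ("parse_scalar" is not part of Dawg).  It
covers scalars only; triple-quoted strings, trees, and vectors are left out,
and case-sensitive matching of "true" and "false" is an assumption.

```python
import re

# [sign]digits[.digits][(e|E)[sign]digits]
NUMBER = re.compile(r"[+-]?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?")
BOOLEAN = re.compile(r"true|false")
# Single-line strings in double or single quotes.
STRING = re.compile(r'"[^"\n]*"' + r"|'[^'\n]*'")


def parse_scalar(text):
    """Convert one scalar token to a Python value (hypothetical sketch)."""
    token = text.strip()
    if NUMBER.fullmatch(token):
        return float(token)
    if BOOLEAN.fullmatch(token):
        return token == "true"
    if STRING.fullmatch(token):
        return token[1:-1]  # drop the surrounding quotes
    raise ValueError(f"unsupported value: {token!r}")


print(parse_scalar("1.5e-3"), parse_scalar("true"), parse_scalar("'ACGT'"))
```

Because the grammar requires digits before the decimal point, a token such as
".5" is rejected by this sketch rather than read as 0.5.
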
OPTIONS

Types: N = number, S = string, B = Boolean, T = tree.  A leading V denotes a
vector of that type, e.g. VN is a vector of numbers and VVN is a vector of
vectors of numbers.

  Name             Type Description
--------------------------------------------------------------------------
  Tree             VT   phylogeny
  TreeScale        N    coefficient to scale branch lengths by
  Sequence         VS   root sequences
  Length           VN   length of generated root sequences
  Rates            VVN  rate of evolution of each root nucleotide
  Model            S    model of evolution: GTR|JC|K2P|K3P|HKY|F81|F84|TN
  Freqs            VN   nucleotide (ACGT) frequencies
  Params           VN   parameters for the model of evolution
  Width            N    block width for indels and recombination
  Scale            VN   block position scales
  Gamma            VN   coefficients of variance for rate heterogeneity
  Alpha            VN   shape parameters
  Iota             VN   proportions of invariant sites
  GapModel         VS   models of indel formation: NB|PL|US
  Lambda           VN   rates of indel formation
  GapParams        VVN  parameters for the indel models
  Reps             N    number of data sets to output
  File             S    output file
  Format           S    output format: Fasta|Nexus|Phylip|Clustal
  GapSingleChar    B    output gaps as a single character
  GapPlus          B    distinguish insertions from deletions in alignment
  KeepFlank        N    size (in nucleotides) of undeletable flanking regions
  KeepEmpty        B    preserve empty columns in final alignment
  LowerCase        B    output sequences in lowercase
  Translate        B    translate output sequences to amino acids
  Seed             VN   pseudo-random-number-generator seed (integers)
  Out.Block.Head   S    string to insert at the start of the output
  Out.Block.Tail   S    string to insert at the end of the output
  Out.Block.Before S    string to insert before a sequence set in the output
  Out.Block.After  S    string to insert after a sequence set in the output
  Out.Subst        B    do variable substitution in Out.Block.*

DEFAULTS

  TreeScale = 1.0
  Length = 100
  Model = "JC"
  Freqs = {0.25,0.25,0.25,0.25}
  Params = {1.0,1.0,1.0,1.0,1.0,1.0}
  Width = 1
  Scale = 1.0
  Gamma = 0.0
  Iota = 0.0
  GapModel = "US"
  GapParams = 1.0
  Reps = 1
  Format = "Fasta"
  GapSingleChar = false
  GapPlus = false
  LowerCase = false
  Translate = false
  Out.Subst = true

VARIABLE SUBSTITUTION

If Out.Subst is true (the default), then Dawg will perform variable substitution
in any Out.Block that it outputs.  Currently three variables are supported:
  %r is replaced by the current dataset number.
  %R is replaced by the total number of datasets.
  %% is replaced by a percent sign.

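A single left-to-right pass is enough to implement these three substitutions,
and it also turns "%%r" into a literal "%r".  The Python sketch below is
hypothetical ("substitute" is an illustrative name); whether Dawg numbers
datasets from 1 and how it treats other "%" sequences are assumptions, so this
sketch leaves unknown sequences unchanged.

```python
import re


def substitute(template, rep, total):
    """Expand %r, %R, and %% in an Out.Block string (hypothetical sketch)."""
    values = {"r": str(rep), "R": str(total), "%": "%"}
    # re.sub scans left to right without overlapping matches,
    # so "%%r" becomes "%r" rather than "%<rep>".
    return re.sub(r"%([rR%])", lambda m: values[m.group(1)], template)


print(substitute("Dataset %r of %R (100%%)", 2, 10))
```
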
OUTPUT FILE

Dawg can automatically detect the format of the output file from its extension.
The supported extensions and their formats are:

  Clustal: aln, poo, txt, out, Clustal
  Fasta:   fas, Fasta
  Nexus:   nex, Nexus
  Phylip:  phy, Phylip

Dawg also supports filenames of the form "ext:file", which write to "file"
using the format associated with extension "ext".  For example, "nex:-"
writes to stdout in Nexus format.

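The detection rule can be sketched in a few lines of Python.  This is a
hypothetical illustration, not Dawg's code: "output_target" is an illustrative
name, matching extensions case-insensitively is an assumption, and falling
back to "Fasta" (the default "Format") for unknown extensions is also an
assumption.  The "ext:" prefix is honored only when it is a known extension,
so a Windows drive letter such as "C:" is not mistaken for one.

```python
import os

# Extension (or format name) -> output format, from the table above.
FORMATS = {
    "aln": "Clustal", "poo": "Clustal", "txt": "Clustal", "out": "Clustal",
    "clustal": "Clustal",
    "fas": "Fasta", "fasta": "Fasta",
    "nex": "Nexus", "nexus": "Nexus",
    "phy": "Phylip", "phylip": "Phylip",
}


def output_target(spec, fallback="Fasta"):
    """Return (format, filename) for an output spec (hypothetical sketch)."""
    prefix, sep, rest = spec.partition(":")
    if sep and prefix.lower() in FORMATS:
        # "ext:file" form, e.g. "nex:-" for Nexus on stdout.
        return FORMATS[prefix.lower()], rest
    extension = os.path.splitext(spec)[1].lstrip(".").lower()
    return FORMATS.get(extension, fallback), spec


print(output_target("nex:-"))
print(output_target("results.phy"))
```
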
NOTES

The meaning of the "Params" vector is different for each substitution model:
  GTR: Substitution rates A-C, A-G, A-T, C-G, C-T, G-T
  JC:  Ignored
  K2P: Transition rate, Transversion rate
  K3P: Alpha (Transitions), Beta (A-T & G-C), Gamma (A-C & G-T)
  HKY: Transition rate, Transversion rate
  F81: Ignored
  F84: Kappa
  TN:  Alpha1 (A-G), Alpha2 (C-T), Beta (Transversions)

Parameter "Freqs" is ignored by the models "JC", "K2P", and "K3P".

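To make the "Params" table concrete, the hypothetical Python sketch below maps
a model's "Params" vector onto six GTR exchangeabilities (in the A-C, A-G, A-T,
C-G, C-T, G-T order used above) and builds the usual GTR rate matrix,
Q[i][j] = rate(i,j) * freq[j].  It is not Dawg's code: F84 is omitted because
its mapping depends on the purine and pyrimidine frequencies, and scaling Q to
one expected substitution per unit time is a common convention assumed here,
not something this README states.

```python
# Nucleotide order A, C, G, T; pairs in the GTR order used above.
PAIRS = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]  # A-C A-G A-T C-G C-T G-T


def gtr_rates(model, params):
    """Six exchangeabilities for a model's Params vector (hypothetical sketch)."""
    if model == "GTR":
        return list(params[:6])
    if model in ("JC", "F81"):
        return [1.0] * 6  # Params is ignored
    if model in ("K2P", "HKY"):
        ts, tv = params[0], params[1]  # transition, transversion
        return [tv, ts, tv, tv, ts, tv]
    if model == "K3P":
        alpha, beta, gamma = params[0], params[1], params[2]
        return [gamma, alpha, beta, beta, alpha, gamma]
    if model == "TN":
        alpha1, alpha2, beta = params[0], params[1], params[2]
        return [beta, alpha1, beta, beta, alpha2, beta]
    raise ValueError(f"model not covered by this sketch: {model}")


def rate_matrix(model, params, freqs):
    """4x4 rate matrix Q, scaled to a mean substitution rate of 1 (assumed)."""
    if model in ("JC", "K2P", "K3P"):
        freqs = [0.25] * 4  # Freqs is ignored by these models
    q = [[0.0] * 4 for _ in range(4)]
    for (i, j), rate in zip(PAIRS, gtr_rates(model, params)):
        q[i][j] = rate * freqs[j]
        q[j][i] = rate * freqs[i]
    for i in range(4):
        q[i][i] = -sum(q[i][j] for j in range(4) if j != i)
    mean_rate = -sum(freqs[i] * q[i][i] for i in range(4))
    return [[x / mean_rate for x in row] for row in q]


hky = rate_matrix("HKY", [2.0, 1.0], [0.1, 0.2, 0.3, 0.4])
```

Expressing every model as a special case of GTR keeps a single code path for
building Q; each model only decides how its "Params" fill the six rates.
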
If "Lambda" is a single value, then it specifies the total rate of indel
formation, which is split equally between insertions and deletions, e.g.
"Lambda = 0.1" is the same as "Lambda = {0.05, 0.05}".  If "Lambda" is a
vector, the first value is the insertion rate and the second value is the
deletion rate.

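The rule for "Lambda" can be written as a small helper.  This Python function
is a hypothetical sketch ("split_lambda" is an illustrative name); treating a
one-entry vector like a single value follows the file-format rule that a
single value is equivalent to a vector with one entry.

```python
def split_lambda(value):
    """Return (insertion_rate, deletion_rate) for a Lambda setting.

    Hypothetical sketch: a single value (or one-entry vector) is the total
    indel rate, split equally; otherwise the first two entries are used.
    """
    values = [value] if isinstance(value, (int, float)) else list(value)
    if len(values) == 1:
        half = values[0] / 2.0
        return half, half
    return float(values[0]), float(values[1])


print(split_lambda(0.1))
print(split_lambda([0.02, 0.08]))
```
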
The first entry of "GapModel" specifies the distribution model of insertion
sizes, and the second entry specifies the distribution model of deletion
sizes.  If only one model is given, it is used for both insertions and
deletions.

The first entry of "GapParams" is a vector specifying the parameters of the
gap model for insertions.  Likewise, the second entry is a vector specifying
the parameters of the gap model for deletions.  If "GapParams" is not a
vector of vectors, then it specifies the parameters for both insertions and
deletions.

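The sharing rules for "GapModel" and "GapParams" can be sketched as follows.
This Python function is hypothetical ("expand_gap_settings" is an illustrative
name), and sharing a one-entry vector of vectors between insertions and
deletions is an assumption that this README does not state.

```python
def expand_gap_settings(gap_model, gap_params):
    """Return ((ins_model, ins_params), (del_model, del_params)).

    Hypothetical sketch: one model or one parameter vector is shared by
    insertions and deletions; a vector of vectors is split by position.
    """
    models = [gap_model] if isinstance(gap_model, str) else list(gap_model)
    if len(models) == 1:
        models = [models[0], models[0]]

    if isinstance(gap_params, (int, float)):
        gap_params = [gap_params]  # a single value is a one-entry vector
    if gap_params and isinstance(gap_params[0], (list, tuple)):
        params = [list(p) for p in gap_params]  # vector of vectors
        if len(params) == 1:
            params = [params[0], list(params[0])]
    else:
        params = [list(gap_params), list(gap_params)]  # shared vector

    return (models[0], params[0]), (models[1], params[1])


print(expand_gap_settings("US", 1.0))
print(expand_gap_settings(["PL", "NB"], [[1.5, 100], [2, 0.3]]))
```
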
The meaning of the "GapParams" vector is different for each gap model:
  US: The distribution of gap sizes.
  NB: The number of failures (r), the probability of success (q).
  PL: The rate parameter (a), the maximum gap size.

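For the US and PL models, a hypothetical Python sketch of the gap-size
distributions might look like this.  Two assumptions are made because this
README does not spell them out: that entry i of the US vector is the relative
weight of gap size i, and that PL means P(k) proportional to k^-a for k = 1 up
to the maximum gap size.  NB is omitted because its exact parameterization is
not given here.

```python
import random


def user_pmf(weights):
    """US (assumed): entry i is the relative weight of gap size i + 1."""
    total = float(sum(weights))
    return [w / total for w in weights]


def power_law_pmf(a, max_size):
    """PL (assumed form): P(k) proportional to k**-a for k = 1..max_size."""
    weights = [k ** -a for k in range(1, int(max_size) + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def sample_gap_size(pmf, rng):
    """Draw a 1-based gap size by inverting the cumulative distribution."""
    u = rng.random()
    cumulative = 0.0
    for size, p in enumerate(pmf, start=1):
        cumulative += p
        if u < cumulative:
            return size
    return len(pmf)  # guards against floating-point round-off


rng = random.Random(42)
pl = power_law_pmf(1.5, 100)
sizes = [sample_gap_size(pl, rng) for _ in range(5)]
```

A linear scan over the cumulative sum is enough for short size ranges like
these; a binary search over precomputed cumulative sums would scale better.
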
To create a recombinant tree, you may need to explicitly describe and label
the inner nodes at which the recombination events occur.  See example4.dawg.

Gamma takes precedence over Alpha.

Sequence takes precedence over Length.

If the value of an Out.Block.* option is the name of a file, the text to
insert is read from that file.

The following vector parameters have a size of "Width": "Scale", "Alpha",
"Gamma", and "Iota".  If one of them has fewer than "Width" entries, then its
first value is used to fill in the remaining entries, e.g. "Scale = 1.0"
is the same as "Scale = {1.0,1.0,1.0}" when "Width = 3".
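
The fill rule for "Width"-sized vectors can be sketched in Python.  This is a
hypothetical helper ("fill_to_width" is an illustrative name); it assumes at
least one value is given and, since the README does not say what happens to
extra entries, leaves vectors that are already long enough unchanged.

```python
def fill_to_width(values, width):
    """Pad a Width-sized parameter vector with its first value.

    Hypothetical sketch: assumes at least one value and leaves vectors
    that already have Width or more entries unchanged.
    """
    vals = [values] if isinstance(values, (int, float)) else list(values)
    if len(vals) < width:
        vals += [vals[0]] * (width - len(vals))
    return vals


print(fill_to_width(1.0, 3))
print(fill_to_width([0.5, 2.0], 3))
```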