Cartwright, R.A. (2005) DNA Assembly With Gaps (Dawg): Simulating Sequence Evolution. Bioinformatics 21 (Suppl. 3): iii31-iii38 (reprint).

DNA Assembly with Gaps (Dawg) is an application designed to simulate the evolution of recombinant DNA sequences in continuous time based on the robust general time reversible model with gamma and invariant rate heterogeneity and a novel length-dependent model of gap formation. The application accepts phylogenies in Newick format and can return the sequence of any node, allowing for the exact evolutionary history to be recorded at the discretion of users. Dawg records the gap history of every lineage to produce the true alignment in the output. Many options are available to allow users to customize their simulations and results.
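The substitution side of this model can be illustrated in a few lines. The Python below is a minimal sketch under stated assumptions, not Dawg's own code. It builds a GTR rate matrix from six exchangeabilities and four equilibrium frequencies, rescales it to one expected substitution per unit time, and draws per-site rate multipliers under invariant-plus-gamma heterogeneity. The function names, the AC, AG, AT, CG, CT, GT ordering of exchangeabilities, and the rate normalization are all assumptions. Here variable sites get gamma rates with mean 1/(1 - p_inv), so the average over all sites is 1; programs differ on this convention.

```python
import random
from typing import List, Sequence

# Assumed ordering of the six GTR exchangeabilities: AC, AG, AT, CG, CT, GT.
PAIRS = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]


def gtr_matrix(exch: Sequence[float], freqs: Sequence[float]) -> List[List[float]]:
    """Build a GTR rate matrix with Q[i][j] = r_ij * pi_j, scaled to mean rate 1."""
    q = [[0.0] * 4 for _ in range(4)]
    for r, (i, j) in zip(exch, PAIRS):
        q[i][j] = r * freqs[j]  # symmetric exchangeability times target frequency
        q[j][i] = r * freqs[i]
    for i in range(4):
        q[i][i] = -sum(q[i][j] for j in range(4) if j != i)  # rows sum to zero
    # Expected substitutions per unit time at equilibrium: -sum_i pi_i * Q_ii.
    mu = -sum(freqs[i] * q[i][i] for i in range(4))
    return [[x / mu for x in row] for row in q]


def site_rates(n: int, alpha: float, p_inv: float, rng: random.Random) -> List[float]:
    """Draw per-site rate multipliers under a +I+Gamma model (assumed normalization)."""
    rates = []
    for _ in range(n):
        if rng.random() < p_inv:
            rates.append(0.0)  # invariant site: never substitutes
        else:
            # Gamma with shape alpha and mean 1/(1 - p_inv), so the overall mean is 1.
            rates.append(rng.gammavariate(alpha, 1.0 / (alpha * (1.0 - p_inv))))
    return rates


if __name__ == "__main__":
    rng = random.Random(1)
    Q = gtr_matrix([1.0, 2.0, 1.0, 1.0, 2.0, 1.0], [0.3, 0.2, 0.2, 0.3])
    for row in Q:
        print(["%.3f" % x for x in row])
    print(site_rates(10, alpha=0.5, p_inv=0.2, rng=rng))
```

Scaling Q so that the equilibrium substitution rate is 1 makes branch lengths read as expected substitutions per site. Multiplying Q by a site's rate then speeds up or slows down that site alone.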

Many tools and procedures exist for reconstructing alignments and phylogenies and for estimating evolutionary parameters from extant data. True phylogenies and alignments are known only in very rare instances. In the absence of real data with known phylogenies, we must rely on simulations to test the accuracy of such procedures. Proper simulation of sequence evolution should involve both nucleotide substitution and indel formation. However, existing tools for simulating sequence evolution either do not include indels, like Seq-Gen or evolver, or include a rather inexact model of indel formation, like ROSE. I developed Dawg to fill in these gaps.


2009/04/26 - Version 1.2 has been released, fixing a serious bug. (Thanks, WF and ZY) Dawg was not reporting correct alignments in cases where insertions were homoplasious. When constructing alignments, Dawg classified nucleotides into four states: root, insertion, deletion, and deleted insertion. The alignment algorithm required each column to be one of two types. The first type allowed root and deletion nucleotides to be aligned with one another. The second allowed insertions and deleted insertions to be aligned with one another, and inserted gaps into sequences that had no insertion or deleted insertion available at that position. However, if by chance the sequence set had two independent insertions at the same location, the alignment construction algorithm would place them in the same column even though they are not homologous. As a result, the output of Dawg versions 1.1.2 and below contains a low level of incorrectly aligned columns. The frequency of these incorrect columns increases with the insertion rate, the branch lengths, and the number of branches. Pairwise alignments are not affected. Unless you have simulated large alignments saturated with insertions, this bug should not be a serious problem for your data. The erroneous columns add only a small amount of noise to the alignments, which can affect trees and substitution-model parameters estimated from them.

Dawg now colors nucleotides and gaps according to the branch on which they originated, and requires every alignment column to consist of nucleotides or gaps of the same color. By default Dawg allows just over 8000 unique colors; if you are simulating a tree with more than 8000 branches, drop me an email and I'll tell you how to increase the limit at the cost of increased memory usage.
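To make the difference between the two column rules concrete, here is a toy Python sketch. It is not Dawg's alignment code. The (state, color) cell representation, the use of 0 as the root color, the "pad" state for gaps added only to fill out a column, the assumption that a deletion gap keeps the color of the residue it replaced, and both function names are hypothetical. It shows that the pre-1.2 rule, as described above, accepts a column holding two independent insertions, while the color rule rejects it.

```python
from typing import List, Optional, Tuple

# Hypothetical cell: (state, color). state is "root", "deletion", "insertion",
# "deleted insertion", or "pad" (a gap added only to fill a column). color is the
# branch on which the residue originated (0 = root); deletion gaps are assumed to
# keep the color of the residue they replaced.
Cell = Tuple[str, Optional[int]]

TYPE_ROOT = {"root", "deletion"}
TYPE_INS = {"insertion", "deleted insertion"}


def old_column_ok(column: List[Cell]) -> bool:
    """Simplified pre-1.2 check: a column may mix states of only one type."""
    states = {state for state, _ in column if state != "pad"}
    return states <= TYPE_ROOT or states <= TYPE_INS


def colored_column_ok(column: List[Cell]) -> bool:
    """1.2-style check: every non-padding cell must carry the same color."""
    colors = {color for state, color in column if state != "pad"}
    return len(colors) <= 1


if __name__ == "__main__":
    # Two independent insertions at the same location, on branches 3 and 5.
    homoplasy = [("insertion", 3), ("insertion", 5), ("pad", None)]
    print(old_column_ok(homoplasy), colored_column_ok(homoplasy))
    # The color rule forces them into separate columns, each padded elsewhere.
    print(colored_column_ok([("insertion", 3), ("pad", None), ("pad", None)]))
    print(colored_column_ok([("pad", None), ("insertion", 5), ("pad", None)]))
```

Because a root residue and any insertion necessarily differ in color, the color rule also subsumes the old requirement that root-type and insertion-type cells never share a column.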

2008/09/14 - Version 1.1.2 has been released, fixing a bug. (Thanks, JK)

2008/05/22 - Version 1.1.1 has been released, fixing a bug. (Thanks, BD)

2006/09/06 - Version 1.1 has been released

2006/09/05 - A STABLE branch has been created for updates to the 1.x code, because CURRENT is undergoing major changes for 2.x.

Download and Install

DawgDownload has download and install instructions.

Quick Links


Dawg currently ships with a brief manual and example files. A longer manual is under construction. You can also download the paper, which explains Dawg in more detail.


I can be emailed at or Please be sure to include "Dawg" in the subject line.


For a list of citations of Dawg click here.


Dawg has been presented at SEEC 2005 in Athens, GA and Evolution 2005 in Fairbanks, AK.


Copyright (c) 2004-2009 Reed A. Cartwight - All rights reserved.

Last modified on Mar 23, 2011, 11:00:24 PM