DnaSP v5
Julio Rozas et al.

DnaSP is a software package for Windows that performs extensive population genetics analyses from DNA sequence data. DnaSP estimates several measures of DNA sequence variation within and between populations (in noncoding, synonymous or nonsynonymous sites), as well as linkage disequilibrium, recombination, gene flow and gene conversion. DnaSP can also carry out several tests of neutrality: those of Hudson, Kreitman and Aguadé (1987), Tajima (1989), McDonald and Kreitman (1991), Fu and Li (1993), and Fu (1997). Additionally, DnaSP can estimate the confidence intervals of some test statistics by coalescent simulations.
 

System Requirements

Hardware:
    IBM-Compatible PC; 32 MB RAM
Operating System:
    Windows 95 / 98 / NT / XP / 2000 / Vista

DnaSP can also run on Apple Macintosh platforms (using VirtualBox, VMware Fusion, Parallels Desktop or Virtual PC), and on Linux and Unix-based operating systems (using VirtualBox, VMware or Wine).

 

DnaSP User Interface

DnaSP provides a standard Microsoft Windows user interface with several commands to import, export, view, print and save data files, results or graphs. DnaSP also provides a standard Microsoft Windows help file with instructions and descriptions of all programs and commands.
 

Input and Output

DnaSP can automatically read or write (export) five file formats: MEGA (Kumar et al. 1994), NBRF/PIR (Sidman et al. 1988), NEXUS (Maddison et al. 1997), FASTA, and PHYLIP (Felsenstein 1993). In all cases one or more homologous aligned nucleotide (DNA or RNA) sequences should be included in a text file. The total number of sequences and the sequence length that DnaSP can handle depend mainly on the available memory; data files with a large number (thousands) of sequences of thousands of nucleotides each can be analyzed. The output is displayed in windows with text, tables, grids (the output data are laid out in rows and columns as on a spreadsheet) and graphs. The output can either be sent to the printer (any Windows printer driver) or be saved in a file.
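
As an illustration of the kind of input described above, the following is a minimal sketch of reading aligned sequences from FASTA-formatted text. It is not DnaSP code; the function name read_fasta and the example records are assumptions made for this sketch.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a dict mapping names to sequences."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A header line starts a new record.
            name = line[1:].strip()
            records[name] = []
        elif name is not None:
            # Sequence data may span several lines; store it in upper case.
            records[name].append(line.upper())
    seqs = {n: "".join(parts) for n, parts in records.items()}
    # The analyses assume an alignment, so all sequences must share one length.
    if len({len(s) for s in seqs.values()}) > 1:
        raise ValueError("sequences are not aligned to the same length")
    return seqs


example = """>seq1
ATGCATGC
>seq2
atgaatgc
"""
print(read_fasta(example))
```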
 

DnaSP modules and commands

DnaSP allows the analysis of a subset of sites, or of a subset of sequences of the data file. DnaSP also allows analyses in synonymous and nonsynonymous sites (there are nine predefined genetic codes), or in various sorts of codon positions (in zero-, two- and four-fold degenerate codon positions; in the first, second and third codon positions). Additionally, DnaSP can perform several analyses by the sliding window method (the option can be used to obtain a graphic representation of the pattern of change of a specific parameter along the sequence).
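
The sliding window idea can be sketched as follows. This is only an illustration under simple assumptions (no gaps, windows measured in alignment columns); the function name sliding_windows and the parameters window and step are hypothetical, and the statistic computed per window is simply the number of polymorphic sites.

```python
def sliding_windows(seqs, window, step):
    """Yield (start, end, n_polymorphic) for each window along the alignment."""
    length = len(seqs[0])
    for start in range(0, length - window + 1, step):
        end = start + window
        # A site is polymorphic when its column holds more than one nucleotide.
        n_poly = sum(
            1 for i in range(start, end)
            if len({s[i] for s in seqs}) > 1
        )
        yield start, end, n_poly


alignment = ["ATGCATGC", "ATGAATGC", "TTGCATGA"]
for start, end, n_poly in sliding_windows(alignment, window=4, step=2):
    print(start, end, n_poly)
```

Plotting the third value against the window midpoint gives the kind of profile along the sequence that the option is meant to display.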

Polymorphic Sites
This command displays general information about the polymorphisms in the data file: the number of sites with alignment gaps (or missing data), the number of monomorphic sites, the number of polymorphic sites segregating for two, three, or four nucleotides, the number of parsimony informative sites, the number of synonymous and nonsynonymous polymorphisms, etc.
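
A rough sketch of this kind of site classification is given below. The function name summarize_sites, the category labels and the set of characters treated as gaps or missing data are assumptions for illustration, not DnaSP's own definitions.

```python
from collections import Counter


def summarize_sites(seqs):
    """Count site categories in an alignment given as equal-length strings."""
    summary = Counter()
    for column in zip(*seqs):
        # Assumption: '-', '?' and 'N' mark alignment gaps or missing data.
        if any(base in "-?N" for base in column):
            summary["gaps_or_missing"] += 1
            continue
        counts = Counter(column)
        if len(counts) == 1:
            summary["monomorphic"] += 1
            continue
        summary["polymorphic_%d_variants" % len(counts)] += 1
        # Parsimony informative: at least two variants, each seen at least twice.
        if sum(1 for c in counts.values() if c >= 2) >= 2:
            summary["parsimony_informative"] += 1
    return dict(summary)


alignment = ["ATGCA-", "ATGAAT", "TTGCAT", "TTGAAT"]
print(summarize_sites(alignment))
```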

DNA Polymorphism
This command computes several measures of the extent of DNA polymorphism and their variances. DnaSP estimates i) the average number of nucleotide differences per site between two sequences, or nucleotide diversity, Pi (Nei 1987, equations 10.5 or 10.6), and its sampling variance and standard error (Nei 1987, equation 10.7); ii) the nucleotide diversity using the Jukes and Cantor correction (Jukes and Cantor 1969; Lynch and Crease 1990, equations 1-2); iii) the nucleotide diversity by pairwise deletion; iv) the average number of nucleotide differences, k (Tajima 1983, equation A3), and its stochastic and sampling variances (Tajima 1993, equations 13-18); v) Theta = 4Nu, where N is the effective population size and u is the mutation rate per nucleotide (or per sequence) and per generation (Nei 1987, equation 10.3; Tajima 1993, equation 3), and its variance for free and for no recombination (Tajima 1993, equations 4 and 8); vi) Theta per nucleotide under the finite sites model (Tajima 1996, equations 9-10, 16).
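
To make the basic quantities concrete, here is a minimal sketch that computes k, Pi and a Theta estimate from the number of segregating sites S (Theta = S / a1, with a1 the sum of 1/i for i from 1 to n-1). It assumes a gap-free alignment and omits the variances and corrections listed above; the function name polymorphism_stats is hypothetical.

```python
from itertools import combinations


def polymorphism_stats(seqs):
    """Return k, Pi, S and Theta (from S) for a gap-free alignment."""
    n = len(seqs)
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    # k: average number of nucleotide differences between two sequences.
    total_diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    k = total_diffs / len(pairs)
    # Pi: the same quantity expressed per site.
    pi = k / length
    # S: number of segregating (polymorphic) sites.
    s = sum(1 for col in zip(*seqs) if len(set(col)) > 1)
    a1 = sum(1.0 / i for i in range(1, n))
    theta_seq = s / a1
    return {
        "k": k,
        "pi": pi,
        "S": s,
        "theta_per_sequence": theta_seq,
        "theta_per_site": theta_seq / length,
    }


alignment = ["ATGCATGC", "ATGAATGC", "TTGCATGA", "ATGCATGC"]
print(polymorphism_stats(alignment))
```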

DNA Divergence Between Populations
This module allows computation of several measures of the extent of DNA divergence between populations. DnaSP computes the nucleotide diversity of each population, the average number of nucleotide substitutions per site between populations, Dxy (Nei 1987, equation 10.20), and the number of net nucleotide substitutions per site between populations, Da (Nei 1987, equation 10.21). DnaSP can estimate these parameters and their variances using the Jukes and Cantor method (Nei 1987, equations 10.20 - 10.24).
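
The uncorrected versions of these measures can be sketched as below: Dxy is taken as the average proportion of differing sites over all between-population sequence pairs, and Da as Dxy minus the mean of the two within-population diversities. The Jukes and Cantor correction and the variances are left out, and the function names are assumptions for illustration.

```python
from itertools import combinations, product


def diff_per_site(a, b):
    """Proportion of sites at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def nucleotide_diversity(pop):
    """Average pairwise difference per site within one population."""
    pairs = list(combinations(pop, 2))
    return sum(diff_per_site(a, b) for a, b in pairs) / len(pairs)


def divergence(pop_x, pop_y):
    """Return (Dxy, Da) for two populations, without multiple-hit correction."""
    pairs = list(product(pop_x, pop_y))
    dxy = sum(diff_per_site(a, b) for a, b in pairs) / len(pairs)
    # Net divergence: subtract the average within-population diversity.
    da = dxy - (nucleotide_diversity(pop_x) + nucleotide_diversity(pop_y)) / 2
    return dxy, da


pop_x = ["AAAA", "AAAT"]
pop_y = ["TTAA", "TTAT"]
print(divergence(pop_x, pop_y))
```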

Synonymous and Nonsynonymous Substitutions
This program estimates Ka (the number of nonsynonymous substitutions per nonsynonymous site), and Ks (the number of synonymous substitutions per synonymous site) for any pair of sequences (Nei and Gojobori 1986, equations 1-3). DnaSP can estimate the nucleotide diversity for synonymous, nonsynonymous and silent (both synonymous and noncoding positions) sites. Nine predefined genetic codes can be used, including the universal nuclear code and the mitochondrial codes of Drosophila, mammals and yeast.

Polymorphism and Divergence
This module allows the analysis of the extent of DNA polymorphism and divergence in synonymous, nonsynonymous and silent (both synonymous and noncoding positions) sites. The analysis can be performed separately for noncoding, exonic or intronic regions (Jukes and Cantor 1969; Nei 1987; Nei and Gojobori 1986).

Codon Usage Bias
This module estimates some measures of the extent of the nonrandom usage of synonymous codons. DnaSP computes the RSCU, Relative Synonymous Codon Usage (Sharp et al. 1986), the ENC, Effective Number of Codons (Wright 1990), the CBI, Codon Bias Index (Morton 1993), and the Scaled Chi Square (Shields et al. 1988). Additionally, DnaSP can estimate the G+C content at coding and noncoding positions.
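
As an example of one of these measures, the sketch below computes RSCU values: for each codon, its observed count divided by the count expected if all synonymous codons for that amino acid were used equally. It assumes the universal nuclear code only and a sequence that starts in frame; the names CODON_TABLE and rscu are hypothetical.

```python
from collections import Counter

BASES = "TCAG"
# Universal nuclear genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def rscu(coding_seq):
    """Return RSCU values for the codon families observed in coding_seq."""
    codons = [coding_seq[i:i + 3] for i in range(0, len(coding_seq) - 2, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)
    values = {}
    for aa in set(CODON_TABLE.values()):
        family = [c for c, a in CODON_TABLE.items() if a == aa]
        total = sum(counts[c] for c in family)
        # Skip stop codons and amino acids that do not occur in the sequence.
        if aa == "*" or total == 0:
            continue
        for c in family:
            values[c] = counts[c] * len(family) / total
    return values


values = rscu("CTGCTGCTGCTTGAAGAG")
print(values["CTG"], values["CTT"], values["GAA"])
```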

Gene Conversion
DnaSP incorporates the algorithm developed by Betrán et al. (1997) to detect gene conversion tracts from two differentiated populations (or subpopulations). These subpopulations could be, for example, two different chromosomal gene arrangements (Rozas and Aguadé 1994), or two sets of paralogous sequences. DnaSP also estimates the parameter Psi (Betrán et al. 1997), which measures the probability per site of detecting a conversion event between two subpopulations; from this information the true number and length of the gene conversion tracts can be estimated (Betrán et al. 1997).

Gene Flow
DnaSP computes different measures of the extent of DNA divergence between populations, and from these measures it computes the average level of gene flow, assuming the island model of population structure (Wright 1951). DnaSP estimates the following measures: DeltaST, GammaST and Nm (Nei 1982), NST and Nm (Lynch and Crease 1990), FST and Nm (Hudson et al. 1992).
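
One of these estimators can be sketched as follows, assuming the sequence-based form FST = 1 - Hw/Hb, where Hw is the mean number of pairwise differences within populations and Hb the mean number between populations. The conversion to Nm uses the textbook island-model approximation Nm = (1/FST - 1)/4, which is an assumption here; the exact expressions used by DnaSP may differ (for example for haploid or sex-linked data). The function names are hypothetical.

```python
from itertools import combinations, product


def mean_diffs(pairs):
    """Mean number of differences over an iterable of sequence pairs."""
    pairs = list(pairs)
    return sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs) / len(pairs)


def fst_and_nm(pop1, pop2):
    """Return (FST, Nm) for two populations of aligned sequences."""
    # Hw: average of the two within-population mean pairwise differences.
    hw = (mean_diffs(combinations(pop1, 2)) + mean_diffs(combinations(pop2, 2))) / 2
    # Hb: mean pairwise differences between populations.
    hb = mean_diffs(product(pop1, pop2))
    fst = 1 - hw / hb
    # Assumed island-model approximation; undefined when FST <= 0.
    nm = (1 / fst - 1) / 4 if fst > 0 else float("inf")
    return fst, nm


pop1 = ["AAAA", "AAAT"]
pop2 = ["TTAA", "TTAT"]
print(fst_and_nm(pop1, pop2))
```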

Linkage Disequilibrium
DnaSP estimates the degree of linkage disequilibrium (or nonrandom association between variants of different polymorphic sites) with the following parameters: D (Lewontin and Kojima 1964), D' (Lewontin 1964), R and R2 (Hill and Robertson 1968). For the purposes of analysis, gametes with the most or the least common variants are considered in the coupling phase (Langley et al. 1974). Both the two-tailed Fisher's exact test and the chi-square test are computed to determine whether the associations between polymorphic sites are, or are not, significant.
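
For a single pair of biallelic polymorphic sites, these parameters can be sketched as below. The most common variant at each site is taken as the reference allele, D' is returned with the sign of D, and the significance tests are omitted; the function name linkage_disequilibrium is an assumption for illustration.

```python
from collections import Counter
from math import sqrt


def linkage_disequilibrium(seqs, i, j):
    """Return (D, D', R, R^2) for sites i and j, both assumed biallelic."""
    n = len(seqs)
    # Reference alleles: the most common variant at each site.
    major_i = Counter(s[i] for s in seqs).most_common(1)[0][0]
    major_j = Counter(s[j] for s in seqs).most_common(1)[0][0]
    p_a = sum(s[i] == major_i for s in seqs) / n
    p_b = sum(s[j] == major_j for s in seqs) / n
    p_ab = sum(s[i] == major_i and s[j] == major_j for s in seqs) / n
    d = p_ab - p_a * p_b
    # Largest magnitude D can reach given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    r = d / sqrt(p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r, r * r


haplotypes = ["AT", "AT", "AT", "AT", "GC", "GC", "AC", "GT"]
print(linkage_disequilibrium(haplotypes, 0, 1))
```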

Population Size Changes
This module analyzes the distribution of pairwise differences (mismatch distribution) and the frequency of segregating sites (frequency spectrum). DnaSP shows a graphic representation of the observed and expected values for expanding and stationary populations (Slatkin and Hudson 1991; Rogers and Harpending 1992; Harpending et al. 1993; Rogers 1994; Tajima 1989a; Tajima 1989b).
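
The observed mismatch distribution is simply the relative frequency of each number of pairwise differences. A minimal sketch, assuming a gap-free alignment and using the hypothetical function name mismatch_distribution (the expected curves for expanding or stationary populations are not computed here):

```python
from collections import Counter
from itertools import combinations


def mismatch_distribution(seqs):
    """Return a list whose d-th entry is the frequency of pairs differing at d sites."""
    diffs = Counter(
        sum(a != b for a, b in zip(s1, s2))
        for s1, s2 in combinations(seqs, 2)
    )
    n_pairs = sum(diffs.values())
    return [diffs[d] / n_pairs for d in range(max(diffs) + 1)]


alignment = ["ATGCATGC", "ATGAATGC", "TTGCATGA", "ATGCATGC"]
print(mismatch_distribution(alignment))
```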

Recombination
This module computes the recombination parameter R = 4Nr, where N is the population size and r is the recombination rate per sequence (or between adjacent sites) (Hudson 1987). DnaSP also includes the algorithm (the four-gametic test) described in Hudson and Kaplan (1985) to estimate RM, the minimum number of recombination events in the history of the sample.
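
The four-gametic test can be sketched as follows: a pair of biallelic sites at which all four two-site haplotypes are present implies at least one recombination event between them (under an infinite-sites model). To turn the resulting intervals into a minimum count, this sketch uses a greedy pass that selects intervals that do not overlap (shared endpoints allowed), as a stand-in for the published pruning steps; treat that equivalence, and the function names, as assumptions of this example.

```python
from itertools import combinations


def four_gamete_intervals(seqs):
    """Return site pairs (i, j) at which all four gametic types are present."""
    biallelic = [i for i, col in enumerate(zip(*seqs)) if len(set(col)) == 2]
    intervals = []
    for i, j in combinations(biallelic, 2):
        gametes = {(s[i], s[j]) for s in seqs}
        if len(gametes) == 4:
            intervals.append((i, j))
    return intervals


def minimum_recombination_events(seqs):
    """Count intervals with disjoint interiors, chosen greedily by right endpoint."""
    rm = 0
    last_end = -1
    for start, end in sorted(four_gamete_intervals(seqs), key=lambda iv: iv[1]):
        if start >= last_end:
            rm += 1
            last_end = end
    return rm


sample = ["AAAA", "ATTA", "TATT", "TTAT"]
print(four_gamete_intervals(sample), minimum_recombination_events(sample))
```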

Hudson, Kreitman and Aguadé's Test
The Hudson, Kreitman and Aguadé (1987) test (HKA test) is based on the neutral theory of molecular evolution (Kimura 1983), which predicts that, for a particular region of the genome, the rate of evolution is correlated with the level of polymorphism within species. The test requires data from at least two regions of the genome, both for an interspecific comparison and for the intraspecific polymorphism of at least one species. DnaSP performs the HKA test either i) using the sequence information included in the data files, or ii) using data (the number of nucleotide differences between species and the number of segregating sites within species) entered in a dialog box. The latter option allows autosomal and sex-linked regions to be compared, and allows the HKA test to be performed when sample sizes for the two regions being compared are different, or when the number of analyzed sites is different in the intraspecific and in the interspecific comparison.

Fu and Li's Tests
DnaSP computes the D, D*, F and F* test statistics proposed by Fu and Li (1993) to test various predictions made by the neutral theory of molecular evolution (Kimura 1983). The test statistics D and F require data from intraspecific polymorphism and from an outgroup (a sequence from a related species), whereas D* and F* only require intraspecific data. DnaSP uses the critical values obtained by Fu and Li (1993) to determine the statistical significance of the D, F, D* and F* test statistics. DnaSP can also compute the Fs test statistic (Fu 1997).

Tajima's Test
This command calculates the D test statistic proposed by Tajima (1989a) to test the neutral theory of molecular evolution (Kimura 1983). This test is based on the fact that under the neutral model estimates of the number of segregating sites and of the average number of nucleotide differences are correlated. DnaSP calculates the confidence limits of D (two-tailed test) assuming that this statistic follows a beta distribution (Tajima 1989a).
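
The statistic itself compares two estimates of Theta: k, the average number of nucleotide differences, and S/a1, based on the number of segregating sites. A minimal sketch using the standard constants from Tajima (1989a) follows; the confidence limits from the beta distribution are not computed, and the function name tajimas_d is hypothetical.

```python
from math import sqrt


def tajimas_d(n, s, k):
    """Tajima's D from sample size n, segregating sites s and mean pairwise differences k."""
    if n < 4 or s < 1:
        raise ValueError("need at least 4 sequences and 1 segregating site")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)
    # Difference between the two Theta estimates, scaled by its standard deviation.
    return (k - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


print(tajimas_d(n=4, s=3, k=1.5))
```

A negative value indicates an excess of low-frequency variants relative to the neutral expectation, and a positive value an excess of intermediate-frequency variants.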

McDonald and Kreitman Test
This command performs the test proposed by McDonald and Kreitman (1991). That test compares the synonymous and nonsynonymous variation within and between species. Under neutrality, the ratio of nonsynonymous to synonymous fixed substitutions between species should be the same as the ratio of nonsynonymous to synonymous polymorphism within species.
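
The comparison amounts to a test of independence on a 2 x 2 table (fixed versus polymorphic, nonsynonymous versus synonymous). The sketch below applies a two-tailed Fisher's exact test as one possible choice of test; the function name is hypothetical, and the example counts are the Adh values commonly quoted from McDonald and Kreitman (1991).

```python
from math import comb


def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher's exact test P-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(
        prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9)
    )


# Rows: fixed differences, polymorphisms; columns: nonsynonymous, synonymous.
p_value = fisher_exact_two_tailed(7, 17, 2, 42)
print(round(p_value, 4))
```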

Coalescent simulations
DnaSP can perform computer simulations based on the coalescent process for a neutral infinite-sites model assuming a large constant population size (Hudson 1990). DnaSP can perform the coalescent simulations for different levels of intragenic recombination (no recombination, intermediate levels and free recombination). DnaSP conducts computer simulations either (i) fixing the value of theta (i.e. assuming a value of theta), or (ii) fixing S, the number of segregating sites (mutations) on the genealogy.
DnaSP can generate the empirical distributions of some test statistics. From these distributions DnaSP can provide the confidence limits for a given interval. Both one-sided and two-sided tests can be conducted. DnaSP can generate the empirical distribution of the following statistics: haplotype diversity (Nei 1987), the number of haplotypes (Nei 1987), the nucleotide diversity (Nei 1987), theta (Watterson 1975), the ZnS test statistic for linkage disequilibrium (Kelly 1997), Rm, the minimum number of recombination events (Hudson and Kaplan 1985), Tajima's D (Tajima 1989), the D*, F*, D and F statistics (Fu and Li 1993), Fs (Fu 1997), and the raggedness statistic (Harpending 1994).
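
The general approach can be sketched for the simplest case: no recombination, a fixed value of theta per sequence, and the number of segregating sites as the simulated statistic. Coalescence times are drawn in units of 2N generations, mutations are placed on the total tree length as a Poisson process with rate theta/2, and the confidence limits are read from the sorted simulated values. All function names here are assumptions; this is an illustration, not DnaSP's simulator.

```python
import random


def poisson(rng, mean):
    """Draw a Poisson variate by counting unit-rate arrivals in [0, mean]."""
    count = 0
    total = rng.expovariate(1.0)
    while total < mean:
        count += 1
        total += rng.expovariate(1.0)
    return count


def simulate_segregating_sites(n, theta, rng):
    """Simulate S for a sample of n sequences under the neutral coalescent."""
    tree_length = 0.0
    for lineages in range(n, 1, -1):
        # With this many lineages, the next coalescence occurs at this rate.
        rate = lineages * (lineages - 1) / 2.0
        tree_length += lineages * rng.expovariate(rate)
    return poisson(rng, theta / 2.0 * tree_length)


def empirical_interval(values, level=0.95):
    """Two-sided limits taken from the sorted simulated values."""
    ordered = sorted(values)
    tail = (1.0 - level) / 2.0
    lower = ordered[int(tail * len(ordered))]
    upper = ordered[int((1.0 - tail) * len(ordered)) - 1]
    return lower, upper


rng = random.Random(1)
sims = [simulate_segregating_sites(10, 5.0, rng) for _ in range(2000)]
print(sum(sims) / len(sims), empirical_interval(sims))
```

An observed value of the statistic falling outside the simulated limits would be considered unusual under the neutral model with the assumed theta.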

... And more
 


References


DnaSP References

Rozas, J. and Rozas, R. 1995. DnaSP, DNA sequence polymorphism: an interactive program for estimating Population Genetics parameters from DNA sequence data. Comput. Applic. Biosci. 11: 621-625.

Rozas, J. and Rozas, R. 1997. DnaSP version 2.0: a novel software package for extensive molecular population genetics analysis. Comput. Applic. Biosci. 13: 307-311.

Rozas, J. and Rozas, R. 1999. DnaSP version 3: an integrated program for molecular population genetics and molecular evolution analysis. Bioinformatics 15: 174-175.

Rozas, J., Sánchez-DelBarrio, J. C., Messeguer, X. and Rozas, R. 2003.  DnaSP, DNA polymorphism analyses by the coalescent and other methods. Bioinformatics 19: 2496-2497.

Rozas, J. 2009. DNA Sequence Polymorphism Analysis using DnaSP. Pp. 337-350. In Posada, D. (ed.) Bioinformatics for DNA Sequence Analysis. Methods in Molecular Biology Series Vol. 537. Humana Press, NJ, USA.

Librado, P. and Rozas, J. 2009. DnaSP v5: A software for comprehensive analysis of DNA polymorphism data. Bioinformatics 25: 1451-1452. doi: 10.1093/bioinformatics/btp187.


February 13, 2009