DnaSP v5
Julio Rozas et al.
DnaSP is a software package for Windows that performs extensive population
genetics analyses from DNA sequence data. DnaSP estimates several measures
of DNA sequence variation within and between populations (in noncoding,
synonymous or nonsynonymous sites), linkage disequilibrium, recombination,
gene flow and gene conversion. DnaSP can also carry out several tests of
neutrality: those of Hudson, Kreitman and Aguadé (1987), Tajima
(1989), McDonald and Kreitman (1991), Fu and Li (1993), and Fu (1997).
Additionally, DnaSP can estimate the confidence intervals of some test-statistics
by the coalescent.
DnaSP User Interface
DnaSP provides a standard Microsoft Windows user interface with several commands to import, export, view, print and save data files, results or graphs. DnaSP also provides a standard Microsoft Windows help file with instructions and descriptions of all programs and commands.
Polymorphic Sites
This command displays general information about the polymorphisms in
the data file: the number of sites with alignment gaps (or missing data),
the number of monomorphic sites, the number of polymorphic sites segregating
for two, three, or four nucleotides, the number of parsimony informative
sites, the number of synonymous and nonsynonymous polymorphisms, etc.
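As a rough illustration of the site categories listed above, the following Python sketch classifies the columns of an alignment. It is not DnaSP's code: the function name `classify_sites` and the choice to treat `-`, `?` and `N` as gaps or missing data are assumptions made for this example.

```python
from collections import Counter

MISSING = set("-?N")  # assumption: symbols treated as gaps or missing data

def classify_sites(seqs):
    """Classify alignment columns; seqs are equal-length aligned strings."""
    counts = {"gaps_or_missing": 0, "monomorphic": 0,
              "polymorphic": 0, "parsimony_informative": 0}
    for column in zip(*seqs):
        if MISSING & set(column):
            counts["gaps_or_missing"] += 1
            continue
        variants = Counter(column)
        if len(variants) == 1:
            counts["monomorphic"] += 1
            continue
        counts["polymorphic"] += 1
        # Parsimony informative: at least two variants, each seen at least twice.
        if sum(1 for c in variants.values() if c >= 2) >= 2:
            counts["parsimony_informative"] += 1
    return counts

print(classify_sites(["ACGT-A", "ACGTAA", "ATGAAC", "ATGTAC"]))
```

Synonymous and nonsynonymous polymorphisms are left out of the sketch because they require a reading frame and a genetic code.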
DNA Polymorphism
This command computes several measures of the extent of DNA polymorphism
and their variances. DnaSP estimates i) the average number of nucleotide
differences per site between two sequences, or nucleotide diversity, Pi
(Nei 1987, equations 10.5 or 10.6), and its sampling variance and standard
error (Nei 1987, equation 10.7); ii) the nucleotide diversity using the
Jukes and Cantor correction (Jukes and Cantor 1969; Lynch and Crease 1990,
equations 1-2); iii) the nucleotide diversity by pairwise-deletion; iv)
the average number of nucleotide differences, k (Tajima 1983, equation
A3) and its stochastic and sampling variances (Tajima 1993, equations
13-18); v) Theta = 4Nu, where N is the effective population size, and u
is the mutation rate per nucleotide (or per sequence) and per generation
(Nei 1987, equation 10.3; Tajima 1993, equation 3) and its variance for
free and for no recombination (Tajima 1993, equations 4 and 8); vi) Theta
per nucleotide under the finite sites model (Tajima 1996, equations 9-10,
16).
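The core quantities above can be sketched in a few lines of Python. This is a minimal illustration, assuming an alignment without gaps: it computes k, the per-site nucleotide diversity Pi as k divided by the alignment length, and Theta per sequence estimated from the number of segregating sites S. The function name `diversity` is hypothetical, and variances and the Jukes and Cantor correction are not covered.

```python
from itertools import combinations

def diversity(seqs):
    """Return k, Pi per site, S and Theta (from S) for gap-free aligned strings."""
    n = len(seqs)
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    # k: average number of nucleotide differences between two sequences.
    k = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs) / len(pairs)
    # S: number of segregating (polymorphic) sites.
    s = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    # a1 = 1 + 1/2 + ... + 1/(n-1), the scaling factor for Theta based on S.
    a1 = sum(1.0 / i for i in range(1, n))
    return {"k": k, "pi": k / length, "S": s, "theta": s / a1}

print(diversity(["AAAA", "AAAT", "AATT", "ATTT"]))
```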
DNA Divergence Between Populations
This module allows computation of several measures of the extent of
DNA divergence between populations. DnaSP computes the nucleotide diversity
of each population, the average number of nucleotide substitutions per
site between populations, Dxy (Nei 1987, equation 10.20), and the number
of net nucleotide substitutions per site between populations, Da (Nei 1987,
equation 10.21). DnaSP can estimate these parameters and their variances
using the Jukes and Cantor method (Nei 1987, equations 10.20 - 10.24).
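A minimal Python sketch of the uncorrected measures, assuming gap-free alignments and at least two sequences per population: Dxy is taken as the mean per-site difference over all between-population sequence pairs, and Da as Dxy minus the mean of the two within-population diversities. The function names are hypothetical, and the Jukes and Cantor correction and variances are omitted.

```python
from itertools import combinations, product

def per_site_differences(s1, s2):
    """Proportion of sites at which two aligned sequences differ."""
    return sum(a != b for a, b in zip(s1, s2)) / len(s1)

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def divergence(pop_x, pop_y):
    """Return the diversity of each population, Dxy and Da."""
    pi_x = mean(per_site_differences(a, b) for a, b in combinations(pop_x, 2))
    pi_y = mean(per_site_differences(a, b) for a, b in combinations(pop_y, 2))
    dxy = mean(per_site_differences(a, b) for a, b in product(pop_x, pop_y))
    da = dxy - (pi_x + pi_y) / 2.0  # net substitutions per site
    return {"pi_x": pi_x, "pi_y": pi_y, "dxy": dxy, "da": da}

print(divergence(["AAAA", "AAAT"], ["TTAA", "TTAT"]))
```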
Synonymous and Nonsynonymous Substitutions
This program estimates Ka (the number of nonsynonymous substitutions
per nonsynonymous site), and Ks (the number of synonymous substitutions
per synonymous site) for any pair of sequences (Nei and Gojobori 1986,
equations 1-3). DnaSP can estimate the nucleotide diversity for synonymous,
nonsynonymous and silent (both synonymous and noncoding positions) sites.
Nine pre-defined genetic codes can be used, among others: the universal
nuclear code, and the mitochondrial code of Drosophila, mammals
and yeast.
Polymorphism and Divergence
This module allows the analysis of the extent of DNA polymorphism and
divergence in synonymous, nonsynonymous and silent (both synonymous and
noncoding positions) sites. The analysis can be performed separately for
noncoding, exonic or intronic regions (Jukes and Cantor 1969; Nei 1987;
Nei and Gojobori 1986).
Codon Usage Bias
This module estimates several measures of the extent
of the nonrandom usage of synonymous codons. DnaSP computes the
RSCU, Relative Synonymous Codon Usage (Sharp et al. 1986), ENC, the Effective
Number of Codons (Wright 1990), the CBI, Codon Bias Index (Morton 1993),
and the Scaled Chi Square (Shields et al. 1988). Additionally, DnaSP can
estimate the G+C content at coding and noncoding positions.
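To make the RSCU measure concrete, here is a small Python sketch, assuming the universal nuclear code and an in-frame coding sequence. RSCU for a codon is its observed count divided by the count expected if all synonymous codons for that amino acid were used equally. The names `CODE` and `rscu` are hypothetical, and the other indices (ENC, CBI, Scaled Chi Square) are not shown.

```python
from collections import Counter, defaultdict

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Universal nuclear code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
CODE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES),
                AMINO_ACIDS))

def rscu(cds):
    """Return RSCU per codon for amino acids that occur in the sequence."""
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    counts = Counter(c for c in codons if c in CODE)
    families = defaultdict(list)
    for codon, amino_acid in CODE.items():
        families[amino_acid].append(codon)
    result = {}
    for amino_acid, family in families.items():
        total = sum(counts[c] for c in family)
        if amino_acid == "*" or total == 0:
            continue  # skip stop codons and absent amino acids
        for codon in family:
            result[codon] = counts[codon] * len(family) / total
    return result

values = rscu("GCTGCTGCCGCAAAAAAG")
print(values["GCT"], values["GCG"], values["AAA"])
```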
Gene Conversion
DnaSP incorporates the algorithm developed by Betrán et al.
(1997) to detect gene conversion tracts from two differentiated populations
(or subpopulations). These subpopulations could be, for example, two different
chromosomal gene arrangements (Rozas and Aguadé 1994), or two sets
of paralogous sequences. DnaSP also estimates the parameter Psi (Betrán
et al. 1997), which measures the probability per site of detecting a conversion
event between two subpopulations; from this information the true number
and length of the gene conversion tracts can be estimated (Betrán
et al. 1997).
Gene Flow
DnaSP computes different measures of the extent of DNA divergence between
populations, and from these measures it computes the average level of gene
flow, assuming the island model of population structure (Wright 1951).
DnaSP estimates the following measures: dST, gST and Nm (Nei 1982), NST
and Nm (Lynch and Crease 1990), FST and Nm (Hudson et al. 1992).
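As one hedged example of how a differentiation measure translates into gene flow, the sketch below computes FST as 1 - Hw/Hb, where Hw and Hb are the mean numbers of differences within and between populations, and then converts it to Nm with FST = 1/(1 + 4Nm). That conversion assumes the island model and diploid autosomal data; the function names, the equal weighting of the two populations, and the requirement of at least two sequences per population are assumptions of this illustration, not a description of DnaSP's implementation.

```python
from itertools import combinations, product

def differences(s1, s2):
    """Number of sites at which two aligned sequences differ."""
    return sum(a != b for a, b in zip(s1, s2))

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def fst_and_nm(pop_x, pop_y):
    """Return (FST, Nm) for two populations of gap-free aligned sequences."""
    h_within = mean(
        mean(differences(a, b) for a, b in combinations(pop, 2))
        for pop in (pop_x, pop_y)
    )
    h_between = mean(differences(a, b) for a, b in product(pop_x, pop_y))
    fst = 1.0 - h_within / h_between
    # Island model, diploid autosomal locus: FST = 1 / (1 + 4Nm).
    nm = (1.0 / fst - 1.0) / 4.0 if fst > 0 else float("inf")
    return fst, nm

print(fst_and_nm(["AAAA", "AAAT"], ["TTAA", "TTAT"]))
```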
Linkage Disequilibrium
DnaSP estimates the degree of linkage disequilibrium (or nonrandom
association between variants of different polymorphic sites) with the following
parameters: D (Lewontin and Kojima 1964), D' (Lewontin 1964), R and R2
(Hill and Robertson 1968). For the purposes of analysis, gametes with the
most or the least common variants are considered in the coupling phase
(Langley et al. 1974). Both the two-tailed Fisher's exact test and the
chi-square test are computed to determine whether the associations between
polymorphic sites are, or are not, significant.
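The parameters D, D' and R2 can be sketched for a single pair of sites as follows. This is a minimal Python illustration, assuming both sites are polymorphic and biallelic and that the phase is known; following the convention described above, the most common variant at each site is used to define the coupling gamete. The function name `ld_stats` is hypothetical, and the significance tests are not included.

```python
from collections import Counter

def ld_stats(site_a, site_b):
    """Return (D, D', R2) for two biallelic sites given as per-sequence variants."""
    n = len(site_a)
    major_a = Counter(site_a).most_common(1)[0][0]
    major_b = Counter(site_b).most_common(1)[0][0]
    p_a = sum(x == major_a for x in site_a) / n
    p_b = sum(y == major_b for y in site_b) / n
    # Frequency of the gamete carrying the most common variant at both sites.
    p_ab = sum(x == major_a and y == major_b
               for x, y in zip(site_a, site_b)) / n
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2

print(ld_stats("AATT", "CCGG"))
```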
Population Size Changes
This module analyzes the distribution of pairwise differences (mismatch distribution)
and the frequency of segregating sites (frequency spectrum). DnaSP shows
a graphic representation of the observed and expected values for expanding
and stationary populations (Slatkin and Hudson 1991; Rogers and Harpending
1992; Harpending et al. 1993; Rogers 1994; Tajima 1989a; Tajima 1989b).
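The observed mismatch distribution is simply the relative frequency of each number of pairwise differences. A short Python sketch, assuming gap-free aligned sequences (the function name is hypothetical, and the expected curves for expanding or stationary populations are not computed):

```python
from collections import Counter
from itertools import combinations

def mismatch_distribution(seqs):
    """Return a list whose i-th entry is the fraction of pairs differing at i sites."""
    diffs = [sum(a != b for a, b in zip(s1, s2))
             for s1, s2 in combinations(seqs, 2)]
    counts = Counter(diffs)
    return [counts[i] / len(diffs) for i in range(max(diffs) + 1)]

print(mismatch_distribution(["AAAA", "AAAT", "AATT", "ATTT"]))
```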
Recombination
This module computes the recombination parameter R = 4Nr, where N is
the population size and r is the recombination rate per sequence (or between
adjacent sites) (Hudson 1987). DnaSP also includes the algorithm (the
four-gametic test) described in Hudson and Kaplan (1985) to estimate RM,
the minimum number of recombination events in the history of the sample.
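The idea behind the four-gametic test can be sketched as follows: two biallelic sites that show all four gametes cannot be explained without recombination (under the infinite-sites assumption), so each such pair defines an interval that must contain at least one event. The Python sketch below counts a largest set of non-overlapping intervals, which is a simplified reading of the Hudson and Kaplan (1985) bound rather than a reproduction of DnaSP's code. The function names are hypothetical, and sites with more than two variants are ignored.

```python
from itertools import combinations

def has_four_gametes(site_a, site_b):
    """True if the two biallelic sites show all four variant combinations."""
    return len(set(zip(site_a, site_b))) == 4

def min_recombination_events(seqs):
    """Lower bound on recombination events from non-overlapping intervals."""
    sites = [col for col in zip(*seqs) if len(set(col)) == 2]
    intervals = [(i, j) for i, j in combinations(range(len(sites)), 2)
                 if has_four_gametes(sites[i], sites[j])]
    # Greedy selection by right endpoint yields a largest disjoint set.
    intervals.sort(key=lambda interval: interval[1])
    events, last_end = 0, 0
    for start, end in intervals:
        if start >= last_end:
            events += 1
            last_end = end
    return events

print(min_recombination_events(["AAA", "ATA", "TAT", "TTT"]))
```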
Hudson, Kreitman and Aguadé's Test
The Hudson, Kreitman and Aguadé (1987) test (HKA test) is
based on the neutral theory of molecular evolution (Kimura 1983) which
predicts that for a particular region of the genome, its rate of evolution
is correlated with the levels of polymorphism within species. The test
requires data from at least two regions of the genome both for an interspecific
comparison and also data for the intraspecific polymorphism from at least
one species. DnaSP performs the HKA test either i) using the sequence information
included in the data files, or ii) using data entered in a dialog box
(the number of nucleotide differences between species and the number of
segregating sites within species). This latter option allows
autosomal and sex-linked regions to be compared, and the HKA
test to be performed when sample sizes for the two regions being compared are different,
or when the number of analyzed sites is different in the intraspecific
and in the interspecific comparison.
Fu and Li's Tests
DnaSP computes the D, D*, F and F* test statistics proposed by Fu and
Li (1993) to test various predictions made by the neutral theory of molecular
evolution (Kimura 1983). The test statistics D and F require data from
intraspecific polymorphism and from an outgroup (a sequence from a related
species), and D* and F* only require intraspecific data. DnaSP uses the
critical values obtained by Fu and Li (1993) to determine the statistical
significance of D, F, D* and F* test statistics. DnaSP can also compute
the Fs test statistic (Fu 1997).
Tajima's Test
This command calculates the D test statistic proposed by Tajima (1989a)
to test the neutral theory of molecular evolution (Kimura 1983). This test
is based on the fact that under the neutral model estimates of the number
of segregating sites and of the average number of nucleotide differences
are correlated. DnaSP calculates the confidence limits of D (two-tailed
test) assuming that this statistic follows a beta distribution (Tajima
1989a).
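A compact Python sketch of the D statistic, following the formula given by Tajima (1989): the difference between k (the average number of nucleotide differences) and S/a1 (the estimate based on segregating sites), divided by the estimated standard deviation of that difference. The function name and argument order are hypothetical, and the beta-distribution confidence limits are not computed here.

```python
from math import sqrt

def tajimas_d(n, k, s):
    """Tajima's D from sample size n, mean pairwise differences k, and S."""
    if s == 0:
        raise ValueError("D is undefined when there are no segregating sites")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)
    return (k - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))

# Example: 4 sequences, k = 10/6, S = 3.
print(tajimas_d(4, 10 / 6, 3))
```

A negative D indicates an excess of low-frequency variants relative to the neutral expectation, and a positive D an excess of intermediate-frequency variants.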
McDonald and Kreitman Test
This command performs the test proposed by McDonald and Kreitman (1991).
That test compares the synonymous and nonsynonymous variation within and
between species. Under neutrality, the ratio of nonsynonymous to synonymous
fixed substitutions between species should be the same as the ratio of
nonsynonymous to synonymous polymorphism within species.
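The comparison amounts to a test of independence on a 2 x 2 table (fixed versus polymorphic, nonsynonymous versus synonymous). The Python sketch below uses a two-sided Fisher's exact test as one reasonable choice for that table. The function name is hypothetical and the counts in the usage line are illustrative only.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(prob(x) for x in range(low, high + 1)
               if prob(x) <= p_observed * (1 + 1e-9))

# Rows: fixed differences, polymorphisms. Columns: nonsynonymous, synonymous.
print(fisher_exact_two_sided(7, 17, 2, 42))
```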
Coalescent simulations
DnaSP can perform computer simulations based on the coalescent process
for a neutral infinite-sites model assuming a large constant population
size (Hudson 1990). DnaSP can perform the coalescent simulations for different
levels of intragenic recombination (no recombination, intermediate levels
and free recombination). DnaSP conducts computer simulations (i) fixing
the value of theta (i.e. assuming a value of theta),
or (ii) fixing S, the number of segregating sites (mutations) on the genealogy.
DnaSP can generate the empirical distributions of some test statistics.
From these distributions DnaSP can provide the confidence limits for a given
interval. Both one-sided and two-sided tests can be conducted. DnaSP can
generate the empirical distribution of the following statistics: Haplotype
diversity (Nei 1987), the number of haplotypes (Nei 1987), the nucleotide
diversity (Nei 1987), theta (Watterson 1975), the ZnS test statistic for
linkage disequilibrium (Kelly 1997), RM, the minimum number of recombination
events (Hudson and Kaplan 1985), Tajima's D (Tajima 1989), the D*,
F*, D and F statistics (Fu and Li 1993), the Fs (Fu 1997), and the raggedness
statistic (Harpending 1994).
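To illustrate how an empirical distribution and its confidence limits can be obtained, here is a minimal Python sketch of the simplest case: no recombination, a fixed value of theta per sequence, and the number of segregating sites S as the statistic. Coalescence times are drawn in units of 2N generations, and mutations are placed on the genealogy as a Poisson number with mean theta/2 times the total branch length. All function names are hypothetical, and this is not a description of DnaSP's simulation code.

```python
import random

def poisson(rng, mean):
    """Draw a Poisson variate by counting unit-rate arrivals before time `mean`."""
    count, t = 0, rng.expovariate(1.0)
    while t < mean:
        count += 1
        t += rng.expovariate(1.0)
    return count

def simulate_s(n, theta, rng):
    """One replicate: number of segregating sites for n sequences."""
    total_length = 0.0
    for lineages in range(n, 1, -1):
        # Waiting time while `lineages` lineages remain (units of 2N generations).
        t = rng.expovariate(lineages * (lineages - 1) / 2.0)
        total_length += lineages * t
    return poisson(rng, theta / 2.0 * total_length)

def empirical_limits(values, level=0.95):
    """Two-sided limits read from the sorted simulated values."""
    ordered = sorted(values)
    tail = (1.0 - level) / 2.0
    lower = ordered[int(tail * len(ordered))]
    upper = ordered[int((1.0 - tail) * len(ordered)) - 1]
    return lower, upper

rng = random.Random(1)  # fixed seed so the run is repeatable
draws = [simulate_s(10, 5.0, rng) for _ in range(2000)]
print(sum(draws) / len(draws), empirical_limits(draws))
```

An observed statistic falling outside the simulated limits would be judged significant at the chosen level; the same scheme applies to other statistics once they are computed on each simulated sample.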
... And more
Rozas, J. and Rozas, R. 1995. DnaSP, DNA sequence polymorphism: an interactive program for estimating Population Genetics parameters from DNA sequence data. Comput. Applic. Biosci. 11: 621-625.
Rozas, J. and Rozas, R. 1997. DnaSP version 2.0: a novel software package for extensive molecular population genetics analysis. Comput. Applic. Biosci. 13: 307-311.
Rozas, J. and Rozas, R. 1999. DnaSP version 3: an integrated program for molecular population genetics and molecular evolution analysis. Bioinformatics 15: 174-175.
Rozas, J., Sánchez-DelBarrio, J. C., Messeguer, X. and Rozas, R. 2003. DnaSP, DNA polymorphism analyses by the coalescent and other methods. Bioinformatics 19: 2496-2497.
Rozas, J. 2009. DNA Sequence Polymorphism Analysis using DnaSP. Pp. 337-350. In Posada, D. (ed.) Bioinformatics for DNA Sequence Analysis; Methods. In Molecular Biology Series Vol. 537. Humana Press, NJ, USA.
Librado, P. and Rozas, J. 2009. DnaSP v5: A software for comprehensive analysis of DNA polymorphism data. Bioinformatics 25: 1451-1452. doi: 10.1093/bioinformatics/btp187.
February 13, 2009