Molecular Systematics

Molecular biology has revolutionized the field of systematics. DNA evolves as mutations arise and become fixed in populations, which leads to divergence of DNA sequences between species. Although diverged, two DNA sequences can still be referred to as homologous (just as we would for any morphological trait such as forelimbs). This nicely demonstrates descent with modification as a definition of evolution. For this reason, DNA should be an excellent tool for inferring phylogenies: it offers a large number of homologous characters that should, in principle, be less subject to convergent evolution than other characters, which might lead to a confusion of grade and clade.

To estimate phylogenies we first must estimate how much sequence divergence there has been between the taxa we want to study. There are several methods. The most direct is sequencing: elegant molecular methods are available, and although it is a lot of work it provides lots of data. Each nucleotide position is a character, and the nucleotide present at that site is the character state.

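A character is phylogenetically informative when a nucleotide change is shared by two or more taxa, that is, when at least two different nucleotides each occur in two or more taxa. A character is phylogenetically uninformative when all taxa have the same nucleotide, or when only a single taxon has a different nucleotide. In the alignment below (one taxon per row), I marks informative sites and U marks a variable but uninformative site.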

		I			U				I
A	C	T	C	G	A	C	T	A	G	A	T
A	C	T	C	G	T	C	T	A	G	A	T
A	C	A	C	G	T	C	T	A	G	A	T
A	C	A	C	G	T	C	T	A	C	A	T
A	C	A	C	G	T	C	T	A	C	A	T
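The I/U labels in the alignment above can be reproduced mechanically. The following is a minimal sketch, assuming the five rows are five taxa and using the rule stated above (a site is informative when at least two nucleotides each occur in two or more taxa). The function names `classify_site` and `classify_alignment` are made up for illustration.

```python
from collections import Counter

# The five aligned sequences from the table above, one string per taxon.
alignment = [
    "ACTCGACTAGAT",
    "ACTCGTCTAGAT",
    "ACACGTCTAGAT",
    "ACACGTCTACAT",
    "ACACGTCTACAT",
]


def classify_site(column):
    """Return 'I' (informative), 'U' (variable but uninformative) or '-' (constant)."""
    counts = Counter(column)
    if len(counts) == 1:
        return "-"  # every taxon has the same nucleotide
    # Informative: at least two states are each shared by two or more taxa.
    shared_states = sum(1 for n in counts.values() if n >= 2)
    return "I" if shared_states >= 2 else "U"


def classify_alignment(seqs):
    """Classify every column (site) of an alignment of equal-length sequences."""
    return "".join(classify_site(column) for column in zip(*seqs))


print(classify_alignment(alignment))  # --I--U---I--
```

Sites 3 and 10 come out as informative and site 6 as uninformative, matching the labels in the table.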

A less direct method of determining sequence divergence is restriction enzyme mapping. Restriction enzymes recognize specific sequences in the DNA and cut the DNA strands. Depending on the location of restriction recognition sites in the DNA, the digest will generate DNA fragments of various lengths, resulting in a restriction fragment pattern. One can also determine where various restriction enzymes cut a given piece of DNA and draw up a restriction map of that stretch of DNA. The extent to which two restriction maps (or restriction fragment patterns) are similar serves as an estimator of sequence similarity (or difference). Restriction enzymes recognize only a fraction of the entire DNA sequence, so one will not know all the differences between two stretches of DNA. The data serve as an estimate, and because the method usually involves less work it can be applied to many individuals.
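How a single nucleotide change alters a restriction fragment pattern can be sketched in a few lines. This is an illustration only: the two sequences are invented, the molecule is treated as linear and single-stranded, and the enzyme is assumed to be EcoRI (recognition site GAATTC, cutting after the first G). The function names are hypothetical.

```python
def site_positions(seq, site):
    """0-based start positions of every occurrence of the recognition site."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


def fragment_lengths(seq, site, cut_offset):
    """Fragment lengths from a complete digest of a linear molecule."""
    cuts = [p + cut_offset for p in site_positions(seq, site)]
    bounds = [0] + cuts + [len(seq)]
    return [right - left for left, right in zip(bounds, bounds[1:])]


ECORI_SITE = "GAATTC"  # EcoRI recognition sequence
ECORI_CUT = 1          # EcoRI cuts between G and AATTC

# Invented sequences: species_b differs from species_a by one nucleotide
# (A -> G) inside the second recognition site, which destroys that site.
species_a = "ATGCGAATTCTTAGGCGAATTCCCAT"
species_b = "ATGCGAATTCTTAGGCGAGTTCCCAT"

print(fragment_lengths(species_a, ECORI_SITE, ECORI_CUT))  # [5, 12, 9]
print(fragment_lengths(species_b, ECORI_SITE, ECORI_CUT))  # [5, 21]
```

The two species share the 5-base fragment, but the loss of one site merges the other two fragments into one. The digest reveals that a difference exists inside the recognition sequence, not which nucleotide changed.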

DNA-DNA hybridization is another indirect way of estimating DNA sequence divergence between two taxa. DNA strands are melted apart at high temperature and allowed to "reanneal" in the presence of the DNA of another species, one species' DNA having been labeled with a radioactive nucleotide. This forms heteroduplex DNA. The heteroduplex DNA is then gradually heated, and the amount of single-stranded DNA that has "melted" apart at each temperature step is determined from the radioactive label counted in the fraction collected at that step. Very similar DNAs melt at a high temperature; heteroduplexes between diverged DNAs melt at a lower temperature. Sequence divergence is therefore proportional to the reduction in melting temperature relative to perfectly matched DNA. See fig. 17.19, pg. 497.
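The comparison of melting curves can be sketched numerically. The example below assumes that the melting temperature is summarized as the temperature at which half of the labeled DNA has become single stranded, found by linear interpolation between temperature steps. Both curves are invented numbers, and the function name `melting_temperature` is hypothetical.

```python
def melting_temperature(curve):
    """Temperature at which 50% of the DNA is single stranded.

    curve: list of (temperature in C, cumulative fraction melted) pairs,
    sorted by temperature with non-decreasing fractions.
    """
    for (t0, f0), (t1, f1) in zip(curve, curve[1:]):
        if f0 <= 0.5 <= f1 and f1 > f0:
            # Linear interpolation between the two bracketing steps.
            return t0 + (0.5 - f0) * (t1 - t0) / (f1 - f0)
    raise ValueError("curve does not cross 50% melted")


# Invented curves: the labeled DNA reannealed with itself (homoduplex)
# and with the DNA of another species (heteroduplex).
homoduplex = [(75, 0.0), (80, 0.05), (85, 0.25), (90, 0.75), (95, 1.0)]
heteroduplex = [(70, 0.0), (75, 0.05), (80, 0.25), (85, 0.75), (90, 1.0)]

delta_tm = melting_temperature(homoduplex) - melting_temperature(heteroduplex)
print(delta_tm)  # 5.0
```

The larger the drop in melting temperature, the more diverged the two DNAs. Converting that drop into a percent sequence difference requires a calibration that is not given here.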

Phenetic approaches: DNA hybridization, sequence divergence from sequencing, restriction patterns or restriction maps. In each case the data are in the form of a single number indicating the similarity or difference between the DNAs of each pair of species in the study.
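For sequence data, the "single number" per pair of species can be as simple as the proportion of sites that differ. A minimal sketch follows, using the five-taxon alignment from earlier in these notes. The taxon names and the function names `p_distance` and `distance_table` are placeholders, and no correction for multiple changes at a site is applied.

```python
from itertools import combinations

# The five-taxon alignment from these notes, with placeholder taxon names.
taxa = {
    "taxon1": "ACTCGACTAGAT",
    "taxon2": "ACTCGTCTAGAT",
    "taxon3": "ACACGTCTAGAT",
    "taxon4": "ACACGTCTACAT",
    "taxon5": "ACACGTCTACAT",
}


def p_distance(a, b):
    """Proportion of aligned sites at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_table(seqs):
    """One distance per unordered pair of taxa."""
    return {
        (m, n): p_distance(seqs[m], seqs[n])
        for m, n in combinations(seqs, 2)
    }


for pair, d in distance_table(taxa).items():
    print(pair, round(d, 3))
```

The most different pair here is taxon1 and taxon4 (or taxon5), which differ at 3 of 12 sites, a distance of 0.25. A phenetic method would build a tree from this table alone, without asking which character states are derived.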

Cladistic approaches: direct sequencing, restriction maps and restriction fragments (fragments are less desirable). In each case one looks for shared derived character states (nucleotide, restriction recognition site) among taxa. With restriction sites, shared loss is unreliable as a uniting character because the nucleotide change that destroyed the site could have occurred anywhere in the recognition sequence, so the same loss can arise independently. Like the refrigerator/pizza/invertebrate example.

Molecular approaches to systematics force us to think about the rates of molecular evolution. If DNA or proteins evolved at a constant rate in all species, then one could use estimates of sequence divergence to build very reliable phylogenies. If there were a molecular clock we could determine the "true phylogeny". Fact is, there is no one molecular clock.

Different proteins and DNA sequences evolve at different rates. Why? Different functional constraints. Different proteins do different things, and some can do their structural or functional job with any of several different amino acids at many of the positions (fibrinopeptides). Other proteins will not function properly with "any" amino acid changes (histones: two amino acid differences between peas and cows!). Intron sequences are less constrained than coding exon sequences, and hence introns tend to diverge faster than exons. Synonymous sites evolve faster than non-synonymous sites, again due to different functional constraints (i.e., some form of selection against "incorrect" sequences). Nuclear DNA tends to evolve more slowly than mitochondrial DNA (in vertebrates). Unit evolutionary period: the time required to observe a given unit of divergence. A 1% divergence of vertebrate mitochondrial DNA takes about 250,000 to 500,000 years.
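The unit evolutionary period turns an observed divergence into a rough time estimate. The arithmetic can be sketched as below, using the figure quoted above for vertebrate mitochondrial DNA (250,000 to 500,000 years per 1% divergence). The sketch assumes divergence accumulates linearly with time, which holds only for small divergences, and the function name `divergence_time_range` is made up for illustration.

```python
# Years needed to accumulate 1% divergence in vertebrate mtDNA (from the notes).
MTDNA_YEARS_PER_PERCENT = (250_000, 500_000)


def divergence_time_range(percent_divergence, years_per_percent=MTDNA_YEARS_PER_PERCENT):
    """Return (minimum, maximum) years implied by an observed percent divergence.

    Assumes a linear relation between divergence and time (no saturation).
    """
    if percent_divergence < 0:
        raise ValueError("divergence cannot be negative")
    low, high = years_per_percent
    return (percent_divergence * low, percent_divergence * high)


# Two mtDNA sequences that differ at 4% of sites:
print(divergence_time_range(4.0))  # (1000000.0, 2000000.0)
```

A 4% difference would correspond to roughly one to two million years under these assumptions. A slowly evolving molecule such as a histone would need a far longer unit evolutionary period, which is why the choice of gene depends on the time scale of the question.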

What gene do I use? It depends on the taxa you are studying and the amount of divergence among them. Histones are good for "macrosystematics"; fibrinopeptides and mtDNA are good for "microsystematics" or population-level phylogenies. See fig. 17.15, pg. 489.

Note problems with tree building from data: unequal rates and convergence.

As if the choice of gene/protein were not problem enough: what if different lineages evolve at different rates? Test for this with the relative rate test. Compare the paths from two different taxa to a third, reference taxon. If the paths are the same length, the two taxa are evolving at the same rate; if not, they are evolving at different rates. Extreme rate fluctuations are a problem; slight ones are not, as they would not lead to regrouping taxa (depending on how "slight" is defined).
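The comparison at the heart of the relative rate test can be sketched as follows. The three sequences are invented, the reference is assumed to lie outside the two taxa being compared, and the function names are hypothetical. The sketch only computes the difference in path lengths; deciding whether that difference is larger than expected by chance requires a statistical test that is not shown.

```python
def count_differences(a, b):
    """Number of aligned sites at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def relative_rate_difference(taxon_a, taxon_b, reference):
    """Path length A-to-reference minus path length B-to-reference.

    A value near zero is consistent with equal rates in the two lineages.
    """
    return count_differences(taxon_a, reference) - count_differences(taxon_b, reference)


# Invented sequences for illustration only.
reference = "ACGTACGTACGT"
taxon_a = "ACCTACGTACTT"  # differs from the reference at 2 sites
taxon_b = "ACCTACGAAGGT"  # differs from the reference at 3 sites

print(relative_rate_difference(taxon_a, taxon_b, reference))  # -1
```

A difference of one site out of twelve is the kind of slight fluctuation that would not regroup taxa. A large, consistent difference would suggest that one lineage is evolving faster than the other.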

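Convergence over long stretches of DNA is unlikely, although it has been reported for lysozyme. Another kind of "convergence" can occur due to the limited number of character states in DNA. Back mutations: e.g., A changes to T, T changes to C, and C changes back to A. The return to the original nucleotide could occur in one step or many. Maximal random divergence: with only four nucleotides, two unrelated sequences are still expected to match at about 25% of sites, so observed divergence levels off near 75%.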
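Both points can be checked with a short calculation. The sketch below follows the back-mutation chain given above and then computes the similarity expected between two random sequences, assuming the four nucleotides are equally frequent and sites are independent. The function names are made up for illustration.

```python
def apply_changes(start, changes):
    """Follow a chain of substitutions at one site and return the final base."""
    base = start
    for old, new in changes:
        if base != old:
            raise ValueError("chain is not consistent with the current base")
        base = new
    return base


# The chain from the text: A -> T, T -> C, C -> A.
chain = [("A", "T"), ("T", "C"), ("C", "A")]
final_base = apply_changes("A", chain)
actual_changes = len(chain)                    # substitutions that happened
observed_difference = int(final_base != "A")   # what a comparison would see
print(actual_changes, observed_difference)     # 3 0


def expected_random_similarity(base_frequencies):
    """Chance that two independent random sites carry the same base."""
    return sum(f * f for f in base_frequencies.values())


equal_frequencies = {base: 0.25 for base in "ACGT"}
print(expected_random_similarity(equal_frequencies))  # 0.25
```

Three substitutions leave no observable difference, so counting differences underestimates the true number of changes, and the underestimate grows as sequences approach the 25% similarity floor.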

Nonetheless, molecular tools have allowed major leaps in our understanding of biological diversity. Bacterial evolution: three primary kingdoms (domains), not five kingdoms. Endosymbiont hypothesis: essentially proven. AIDS virus (HIV): rapid evolution is good for the virus, bad for us.
