lecture 8

GENETIC DRIFT

Genetic drift refers to random fluctuations in allele frequencies due to chance events (see figure 6.4, pg. 142). The previous lectures have all dealt with deterministic (predictable) evolutionary forces, often referred to as linear pressures. Genetic drift is a stochastic (random) force that can scramble the predictable effects of selection, mutation, and gene flow. While it might seem that a random force would be of little significance to evolutionary "progress" (we'll confront this loaded term later), genetic drift is an extremely important force in evolution. Its strength, however, depends on the size of the population, as a simple exercise in coin tossing will illustrate. In ten tosses you might easily get seven heads; in 1000 tosses, however, you would essentially never get 700 heads with a "fair" coin. The same sort of random fluctuation in allele frequencies occurs in small populations. Consider a bag full of red and green marbles in equal frequencies: pull out a small handful, and the frequency in your hand will probably not equal the frequency in the original bag. Let that handful determine the frequency in a new population that grows back to the original population size. A second small handful will randomly shift the frequency to yet another value. If you pulled out all the marbles in the bag (= large population), the frequency would be maintained exactly in the next generation. Genetic drift is therefore not a potent evolutionary force in very large, randomly mating populations.
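The coin-tossing argument can be checked with exact binomial probabilities. This is a minimal Python sketch; the helper name prob_at_least is hypothetical. It computes the chance of getting at least 70% heads with a fair coin for a small and a large number of tosses.

```python
from math import comb

def prob_at_least(k: int, n: int) -> float:
    """Exact probability of k or more heads in n tosses of a fair coin (hypothetical helper)."""
    favourable = sum(comb(n, j) for j in range(k, n + 1))
    return favourable / 2**n  # exact big-integer ratio, converted to a float

# 70% heads is unremarkable in a small sample but essentially impossible in a large one.
small = prob_at_least(7, 10)      # 176/1024, about 0.17
large = prob_at_least(700, 1000)  # roughly 1e-37

print(f"P(>= 7 heads in 10 tosses)     = {small:.3f}")
print(f"P(>= 700 heads in 1000 tosses) = {large:.1e}")
```

The same proportional deviation that happens about one time in six with 10 tosses essentially never happens with 1000, which is the sense in which drift is weak in large populations.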

To illustrate the consequences of genetic drift, we will consider what happens when drift alone alters allele frequencies in many small populations. This requires the concept of population structure, which describes how individuals (or allele frequencies) in breeding populations vary in time and space. This structure is determined by the combined effect of deterministic and stochastic forces. We will introduce population structure by showing how genetic drift and inbreeding can change the frequencies of genotypes in populations.

Consider a grid of small populations (e.g., ponds in Minnesota), all with the same small population size and all starting at time t = 0 with p = q = 0.5. Through time each population will experience genetic drift due to random sampling, and the allele frequencies in the populations will diverge. The distribution of frequencies changes over time from a tight distribution (all at 0.5), to a flat distribution (some populations at p = 0.1, some at 0.9, and all frequencies in between), and finally to fixation (p = 1.0) or loss (p = 0.0) of the A allele in every population (see figure below). Fixation means that every copy of the gene in the population is the A allele; this necessarily implies loss of the a allele ("fixation" and "loss" should only be used with reference to a specific allele). If each population starts at p = 0.5, then at the end, when all populations have lost their variation, 50% of the populations will be fixed for the A allele and 50% will be fixed for the a allele (the latter = "loss" of the A allele, get it?). If the initial frequency was p = 0.7, then about 70% of the populations would be fixed for the A allele (again, assuming no selection, migration, or mutation).

Main Points: 1) total variation does not change; variation goes from within populations (no variation between populations) to between populations (no variation within populations). 2) genetic divergence of populations entirely by chance! (no selection). This is why genetic drift can be an important force in evolution.
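The drift process on a grid of demes can be simulated directly. This minimal sketch assumes the Wright-Fisher idealization (not named in the lecture): each generation the 2N gene copies of a deme are drawn at random from the previous generation's allele frequency, with no selection, mutation, or migration. The function name simulate_demes and the parameter values (2000 demes of N = 10 individuals, p0 = 0.7) are hypothetical choices for illustration.

```python
import random

def simulate_demes(n_demes: int, n_individuals: int, p0: float, seed: int = 1) -> list[float]:
    """Pure drift in many isolated demes, run until each deme has fixed or lost allele A.

    Each generation the 2N gene copies of a deme are sampled at random from the
    previous generation's frequency of A (Wright-Fisher sampling)."""
    rng = random.Random(seed)
    two_n = 2 * n_individuals
    final = []
    for _ in range(n_demes):
        p = p0
        while 0.0 < p < 1.0:
            # count how many of the 2N sampled gene copies are A
            copies_of_a_allele = sum(rng.random() < p for _ in range(two_n))
            p = copies_of_a_allele / two_n
        final.append(p)  # always 0.0 (A lost) or 1.0 (A fixed)
    return final

final = simulate_demes(n_demes=2000, n_individuals=10, p0=0.7)
fixed_for_A = sum(p == 1.0 for p in final) / len(final)
print(f"fraction of demes fixed for A: {fixed_for_A:.3f}")  # close to p0 = 0.7
```

Because every deme ends at p = 0 or p = 1, all variation has moved from within demes to between demes (main point 1), and the fraction fixed for A tracks the starting frequency even though no selection is involved (main point 2).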

At the start of this drift process in our array of populations, p = 0.5 and there are 2pq = 0.5 = 50% heterozygotes. When all populations in the array have fixed or lost the allele, there can be no heterozygotes (i.e., 0%). This shows that the proportion of heterozygotes decreases as drift proceeds (this also occurs with inbreeding, which can also be thought of as a sampling-error phenomenon). We can quantify this process as follows: the proportion of heterozygotes in the "next" generation is a function of the proportion of heterozygotes in the present generation and the "rate" at which drift proceeds: $H_{t+1} = H_t\left[1 - \frac{1}{2N}\right]$, where H = the proportion of heterozygotes in the population (or in the array of populations) and N = population size. This can be extended over many generations as follows: $H_t = H_0\left[1 - \frac{1}{2N}\right]^t$, where t refers to the number of generations in the future and 0 refers to the present (or starting) generation. Looking at these equations, it is clear that with small population sizes heterozygosity will be lost quickly (drift will proceed quickly), whereas in large populations there will be little loss of heterozygosity.
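The two heterozygosity equations can be evaluated directly. This minimal sketch (function names hypothetical) implements both the one-generation recursion and the closed form, and compares population sizes after 50 generations starting from H0 = 0.5.

```python
def heterozygosity(h0: float, n: int, t: int) -> float:
    """Closed form: H_t = H_0 * (1 - 1/(2N))**t."""
    return h0 * (1 - 1 / (2 * n)) ** t

def heterozygosity_iterated(h0: float, n: int, t: int) -> float:
    """Same quantity, applying H_{t+1} = H_t * (1 - 1/(2N)) one generation at a time."""
    h = h0
    for _ in range(t):
        h *= 1 - 1 / (2 * n)
    return h

for n in (10, 100, 1000):
    # after 50 generations: roughly 0.038, 0.389, 0.488
    print(n, round(heterozygosity(0.5, n, 50), 3))
```

With N = 10 most of the initial heterozygosity is gone within 50 generations, while with N = 1000 almost all of it remains.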

If we consider our grid of populations again, we note that as drift proceeds and each deme becomes a bit different from every other deme, the variation among demes increases. Like the loss of heterozygosity due to drift, the increase in the variation among demes depends on the population size. This variation can be described as $V_t = p(1-p)\left[1 - \left(1 - \frac{1}{2N}\right)^t\right]$. Note that at t = 0, $V_t = 0$ because the term in brackets is 0. As the number of generations increases, the variation among populations ($V_t$) rises toward its maximum, p(1 - p), rapidly if N is small but slowly if N is large.
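The variance formula can be paired with the heterozygosity equation to check main point 1 numerically. Since $H_t = 2p(1-p)(1 - 1/(2N))^t$ and $V_t = p(1-p)[1 - (1 - 1/(2N))^t]$, the sum $H_t + 2V_t$ always equals the starting 2p(1 - p): heterozygosity lost within demes reappears as variance among demes. This minimal sketch (function name hypothetical, p = 0.5 and N = 10 chosen for illustration) prints both quantities.

```python
def variance_among_demes(p: float, n: int, t: int) -> float:
    """V_t = p(1 - p)[1 - (1 - 1/(2N))**t] for demes that all start at frequency p."""
    return p * (1 - p) * (1 - (1 - 1 / (2 * n)) ** t)

p, n = 0.5, 10
for t in (0, 10, 50, 200):
    v = variance_among_demes(p, n, t)
    h = 2 * p * (1 - p) * (1 - 1 / (2 * n)) ** t  # heterozygosity from the H_t equation
    # bookkeeping check: h + 2v stays at the initial 2p(1 - p) = 0.5
    print(t, round(v, 4), round(h, 4), round(h + 2 * v, 4))
```

$V_t$ starts at 0 and approaches p(1 - p) = 0.25 as every deme becomes fixed or lost.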

A general result as drift proceeds in small populations is a deficiency of heterozygotes and, reciprocally, an excess of homozygotes. This is also a common result of inbreeding (= mating between relatives). In fact, genetic drift and inbreeding are related phenomena. Comparing the expected (Hardy-Weinberg) and observed frequencies of heterozygotes allows us to determine the inbreeding coefficient, $F = (H_e - H_o)/H_e$ (subscripts e and o mean expected and observed, respectively).

One effect of inbreeding is to increase the frequency of homozygotes (and thus, necessarily, to decrease the frequency of heterozygotes). Note: while the frequencies of genotypes change with inbreeding, the frequencies of alleles remain the same (assuming no selection, migration, or mutation). Refer back to the data table presented on page 2 of Lecture 6 to convince yourself that those data could be a result of inbreeding: F = (0.343 - 0.14)/0.343 = 0.59. When both alleles are present (0 < p < 1) but heterozygotes are completely absent, F = 1. As an exercise, work through the data in table 5.2, pg. 98. Does this illustrate high or low inbreeding?
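The inbreeding coefficient is a one-line calculation. This minimal sketch (function name hypothetical) reproduces the Lecture 6 value and the two limiting cases mentioned above.

```python
def inbreeding_coefficient(h_expected: float, h_observed: float) -> float:
    """F = (He - Ho) / He, comparing expected (Hardy-Weinberg) and observed heterozygosity."""
    return (h_expected - h_observed) / h_expected

print(round(inbreeding_coefficient(0.343, 0.14), 2))  # Lecture 6 data: 0.59
print(inbreeding_coefficient(0.5, 0.0))                # no heterozygotes at all: 1.0
print(inbreeding_coefficient(0.5, 0.5))                # Hardy-Weinberg proportions: 0.0
```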

Genetic variation is generally "lost" by the action of genetic drift. This is true if we follow the fate of one deme over time. Note, however, that in our array of populations, variation is "lost" within demes, but the variation in the total system is preserved: the allele frequency in the entire metapopulation does not change (in expectation); only the genotype frequencies and the allele frequencies within individual demes change.

Inbreeding also has the effect of increasing the variance among the individual demes of a larger population. As such, drift and inbreeding are closely related evolutionary forces. The same logic applies to phenotypic variance within a population. Recall that the variance = $\frac{1}{N}\sum_{i=1}^{N}(\bar{X} - x_i)^2$. In a random mating population with p = 0.4, f(AA) = 0.16, f(Aa) = 0.48, and f(aa) = 0.36. If the alleles act in an additive manner, the heterozygotes will be intermediate and close to the mean in phenotype and will contribute little to the variance; with inbreeding, most individuals are homozygous, so they deviate from the mean and the variance is greater. In effect, inbreeding makes the distribution of phenotypes more "bimodal" by redistributing the alleles from all three genotypes into the two homozygotes (see figure 6.5, pg. 143).
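The p = 0.4 example can be worked through numerically. This sketch assumes the standard genotype frequencies under inbreeding, f(AA) = p² + Fpq, f(Aa) = 2pq(1 - F), f(aa) = q² + Fpq (a textbook result not stated in these notes), and a hypothetical additive scale on which aa = 0, Aa = 1, AA = 2. Function names are also hypothetical.

```python
def genotype_freqs(p: float, f: float) -> dict[str, float]:
    """Genotype frequencies for inbreeding coefficient F (standard result, assumed here)."""
    q = 1 - p
    return {"AA": p * p + f * p * q, "Aa": 2 * p * q * (1 - f), "aa": q * q + f * p * q}

def phenotypic_variance(freqs: dict[str, float]) -> float:
    """Variance of an additive trait scored aa = 0, Aa = 1, AA = 2 (hypothetical scale)."""
    score = {"aa": 0, "Aa": 1, "AA": 2}
    mean = sum(freqs[g] * score[g] for g in freqs)
    return sum(freqs[g] * (score[g] - mean) ** 2 for g in freqs)

for f in (0.0, 0.5, 1.0):
    freqs = genotype_freqs(0.4, f)
    allele_p = freqs["AA"] + freqs["Aa"] / 2  # frequency of A: unchanged by inbreeding
    rounded = {g: round(x, 2) for g, x in freqs.items()}
    print(f, rounded, round(allele_p, 2), round(phenotypic_variance(freqs), 2))
```

The allele frequency stays at 0.4 for every F, while the phenotypic variance rises from 0.48 (random mating) to 0.96 (complete inbreeding), because the heterozygotes near the mean are replaced by homozygotes at the two extremes.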

Population structure is usually quantified by a simple statistic known as Fst. This stands for the "fixation" index obtained by comparing subpopulations to the total population, and it quantifies the proportion of genetic variation that lies between subpopulations within the total population. An important way of thinking about this problem is to compare the mean heterozygosity averaged across all demes to the heterozygosity that would result if all demes were pooled into one big population. Heterozygosity is the (expected) proportion of heterozygotes in the population and is defined as H = 2pq. Note that heterozygosity is zero at "fixation", the case where only one allele exists (p = 0 or 1), and that heterozygosity is at a maximum when alleles are equally frequent (e.g., p = q = 0.5). [For completeness, $H = 1 - (p^2 + q^2)$, which follows from Hardy-Weinberg. With more than two alleles we can't just use p and q, so the following expression for heterozygosity works more generally: $H = 1 - \sum_i x_i^2$, where $x_i$ is the frequency of the i-th allele and the summation is across all alleles. The expression $1 - (p^2 + q^2)$ is identical to $1 - \sum_i x_i^2$ when there are only two alleles.]
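The general heterozygosity expression handles any number of alleles. This minimal sketch (function name hypothetical) computes $1 - \sum_i x_i^2$ and checks that it matches 2pq for two alleles.

```python
def expected_heterozygosity(freqs: list[float]) -> float:
    """H = 1 - sum(x_i**2) over all alleles at a locus."""
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(x * x for x in freqs)

p = 0.3
print(expected_heterozygosity([p, 1 - p]), 2 * p * (1 - p))  # two alleles: both about 0.42
print(expected_heterozygosity([0.25] * 4))                    # four equally common alleles: 0.75
print(expected_heterozygosity([1.0]))                         # fixation: 0.0
```

Note that with more alleles the maximum heterozygosity exceeds 0.5: four equally common alleles give 0.75.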

In our metapopulation example above, on the left all demes have p = 0.5, and the allele frequency for the entire array is also p = 0.5. Right away you can see that there is no differentiation among demes. To quantify this we use Fst, calculated as $F_{ST} = (H_T - \bar{H}_S)/H_T$, where $H_T = 2(\text{pooled } p)(\text{pooled } q)$ [note that pooled q = 1 - pooled p; T stands for "total" and is not to be confused with the generation index t used earlier], and $\bar{H}_S$ (meanHs) = the average of the H values for the individual demes (i.e., 2pq for each deme, averaged across all demes). For the left-hand metapopulation, $H_T = 2(0.5)(0.5) = 0.5$, and $\bar{H}_S = 0.5$ also. Thus Fst = (0.5 - 0.5)/0.5 = 0.0. What does this mean? In English, none of the variation in this system of demes lies between demes; the population structure is zero. Since there is variation (i.e., 0 < p < 1), all of the variation lies within demes.

Now consider the metapopulation system on the right above. The pooled allele frequency is 0.5 because half of the demes are fixed for the A allele (p = 1.0) and half are fixed for the a allele (p = 0.0). Hence $H_T = 2(\text{pooled } p)(\text{pooled } q) = 2(0.5)(0.5) = 0.5$. The $\bar{H}_S$ value is very different. Each deme has a heterozygosity of zero (either p or q is zero in every deme), so the average $H_S$ value is 0.0. Hence, for the metapopulation on the right, Fst = (0.5 - 0.0)/0.5 = 1.0. In English, all (100%) of the variation in the system lies between the demes. By definition, then, none of the variation lies within demes, which we know because the little circles are either filled or empty. By now it should be apparent that Fst values can range between the two extremes of zero and one that we have just illustrated. As genetic drift proceeds, Fst values will increase, but the balance between drift and "linear pressures" will determine the equilibrium value that Fst reaches (later lecture).
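The left- and right-hand metapopulations can be reproduced with a short Fst calculation. This minimal sketch (function name hypothetical) assumes all demes are the same size, so the pooled allele frequency is the simple average of the deme frequencies; the eight-deme layout is also a hypothetical stand-in for the figure.

```python
def fst(deme_freqs: list[float]) -> float:
    """Fst = (H_T - mean H_S) / H_T for one two-allele locus.

    Assumes equal-sized demes, so pooled p is the plain average of deme frequencies."""
    pooled_p = sum(deme_freqs) / len(deme_freqs)
    h_total = 2 * pooled_p * (1 - pooled_p)
    mean_h_sub = sum(2 * p * (1 - p) for p in deme_freqs) / len(deme_freqs)
    if h_total == 0:
        raise ValueError("no variation in the pooled population; Fst is undefined")
    return (h_total - mean_h_sub) / h_total

left = [0.5] * 8                 # every deme at p = 0.5
right = [1.0] * 4 + [0.0] * 4    # half fixed for A, half fixed for a
print(fst(left), fst(right))     # 0.0 1.0
print(round(fst([0.2, 0.4, 0.6, 0.8]), 3))  # an intermediate case: 0.2
```

The intermediate case shows that partially diverged demes give Fst values between the two extremes.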

A further clarification: the "speed" or "intensity" of genetic drift is actually determined by the effective population size (Ne). This may differ from the total population size (N) if some individuals do not breed. Two examples where Ne differs from N are mating systems in which the sexes contribute unequally and temporal fluctuations in population size. When the number of breeding males (Nm) ≠ the number of breeding females (Nf), the effective population size can be quite different from the actual population size. The relation is $N_e = \frac{4N_mN_f}{N_m + N_f}$. If Nm = Nf = N/2, then Ne = N. But if a single male does all the mating (approximated in elephant seals), then $N_e = 4N_f/(1 + N_f) \approx 4$, because $N_f/(1 + N_f) \approx 1$ however many females breed. So the population genetics of elephant seals and sage grouse (another species in which a few males do most of the mating) will be very different from that of large populations of insects.
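The sex-ratio formula is easy to explore numerically. This minimal sketch (function name hypothetical) shows that equal numbers of breeding males and females give Ne = N, while a single breeding male caps Ne near 4.

```python
def ne_sex_ratio(n_males: int, n_females: int) -> float:
    """Effective size with unequal numbers of breeders: Ne = 4*Nm*Nf / (Nm + Nf)."""
    return 4 * n_males * n_females / (n_males + n_females)

print(ne_sex_ratio(50, 50))              # equal sexes: Ne = N = 100.0
print(round(ne_sex_ratio(1, 100), 2))    # one male, 100 females: about 3.96
print(round(ne_sex_ratio(1, 10_000), 2)) # still about 4, however many females breed
```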

With population bottlenecks, where the population size drops to a small number for a generation, the effective population size is not just the average of the N's for each generation. To estimate Ne, one calculates the harmonic mean population size as follows:

$\frac{1}{N_e} = \frac{1}{t}\sum_{i=1}^{t}\frac{1}{N_i}$, where t = the number of generations and $N_i$ = the population size in generation i. Thus with population sizes of 100, 100, 20, 100, the arithmetic mean = 80, but 1/Ne = 1/4(1/100 + 1/100 + 1/20 + 1/100) = 1/4(0.08) = 0.02, so Ne = 1/0.02 = 50. The single small generation therefore has a disproportionate effect on the effective population size. This is a very important issue in conservation efforts concerning endangered species.
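The bottleneck example can be checked in a few lines. This minimal sketch implements the harmonic-mean formula by hand (function name hypothetical) and compares it with Python's standard-library statistics.harmonic_mean.

```python
from statistics import harmonic_mean

def ne_over_time(sizes: list[float]) -> float:
    """Ne across generations as the harmonic mean: 1/Ne = (1/t) * sum(1/N_i)."""
    return len(sizes) / sum(1 / n for n in sizes)

sizes = [100, 100, 20, 100]
print(sum(sizes) / len(sizes))        # arithmetic mean: 80.0
print(round(ne_over_time(sizes), 6))  # harmonic mean: 50.0
print(round(harmonic_mean(sizes), 6)) # standard-library equivalent: 50.0
```

A single generation at N = 20 pulls Ne down by 30 individuals relative to the arithmetic mean.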

DRIFT AND MUTATION

Drift tends to reduce heterozygosity (for our purposes, the proportion of heterozygotes), while mutation introduces new alleles and so increases heterozygosity. This provides yet another example of a "fight" between opposing evolutionary forces. When the mutation rate is close to the reciprocal of the effective population size, heterozygosity will be high (i.e., a considerable amount of variation will exist in the population). The balance point can be described by an equation for the equilibrium frequency of heterozygotes:

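$H \approx \frac{4N_eu}{4N_eu + 1}$, where u is the mutation rate per locus per generation. Plug in values of Ne and u to determine the balance: when Ne = 1/u, H = 4/5 = 0.8; when u > 1/Ne, heterozygosity will be higher; when u < 1/Ne, heterozygosity will be lower.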
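The mutation-drift balance can be tabulated for a few effective sizes. This sketch assumes the equilibrium formula H ≈ 4Neu/(4Neu + 1), the standard result that gives H = 0.8 when Ne = 1/u; the function name and the mutation rate u = 1e-4 are hypothetical choices for illustration.

```python
def equilibrium_heterozygosity(ne: float, u: float) -> float:
    """Mutation-drift balance, assuming H ~ 4*Ne*u / (4*Ne*u + 1)."""
    theta = 4 * ne * u
    return theta / (theta + 1)

u = 1e-4  # hypothetical mutation rate per locus per generation
for ne in (1_000, 10_000, 100_000):  # Ne*u = 0.1, 1, 10
    print(ne, round(equilibrium_heterozygosity(ne, u), 3))  # 0.286, 0.8, 0.976
```

Only the product Ne·u matters: raising either the population size or the mutation rate by the same factor shifts the balance toward higher heterozygosity in the same way.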

DRIFT AND GENE FLOW

Gene flow can counteract the loss of heterozygosity due to drift as well as the random divergence of allele frequencies among populations. The balance between these two opposing forces can be described by an equation for the equilibrium variance among populations:

$V_{\text{among pops.}} \approx \frac{\bar{p}(1 - \bar{p})}{4N_em + 1}$, where m is the migration rate (the fraction of gene copies in each population that arrive by migration each generation) and $\bar{p}$ is the mean allele frequency. As m increases, the variance decreases (faster homogenization); as Ne increases, the variance decreases (drift acts more slowly with larger Ne). The major conclusion is that very little gene flow is needed to keep two "populations" homogeneous: as little as one reproductive migrant between populations per generation! See figure 5.13, pg. 128.

A number of different methods have been developed to estimate migration rates from standing patterns of allele frequency variation. They rest on the fact that Fst values result from a "balance" between gene flow and drift. One can calculate Fst from molecular markers, and if an estimate of effective population size is available, migration can be estimated from the equilibrium relation $F_{ST} \approx \frac{1}{4N_em + 1}$. By rearranging the formula and plugging in trial values, you can see that low values of m result in high values of Fst, and high values of m give low values of Fst. This follows from the homogenizing effect of gene flow on allele frequency variation among populations.
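The relation between Fst and gene flow can be run in both directions. This minimal sketch (function names hypothetical) assumes the equilibrium relation Fst ≈ 1/(4Nem + 1) given above; rearranged, it gives Nem ≈ (1/Fst - 1)/4, the number of migrants per generation.

```python
def fst_from_nm(ne_m: float) -> float:
    """Equilibrium Fst ~ 1 / (4*Ne*m + 1)."""
    return 1 / (4 * ne_m + 1)

def nm_from_fst(fst: float) -> float:
    """Rearranged relation: Ne*m ~ (1/Fst - 1) / 4."""
    return (1 / fst - 1) / 4

for ne_m in (0.1, 1, 10):
    print(ne_m, round(fst_from_nm(ne_m), 3))  # 0.714, 0.2, 0.024
print(round(nm_from_fst(0.2), 6))             # about one migrant per generation
```

One migrant per generation already holds Fst at 0.2, and ten migrants push it close to zero, which is why little gene flow is needed to keep populations homogeneous. Note that Ne and m enter only as their product, so Fst alone yields Nem, not m by itself.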

The nature of population differentiation can depend on population structure. Typical scenarios include a Continent-Island model (one-way gene flow from a continent to an island), an Island model (equal probability of gene flow among several or many populations), a Stepping Stone model (gene flow occurs mainly between neighboring populations), and a Continuous model (e.g., a carpet of individuals spread over a landscape). In all cases except the Island model, there can be Isolation by Distance, which simply reflects incomplete homogenization of populations because genes do not flow equally among all populations.