Latest Update: 28 October 2017
These lecture notes provide you with a study guide for the Population Genetics and Evolution material from the second half of the class. This set of notes was originally written from the Fall 2015 lectures. I have left the lecture headings in place, but not with the original dates. Each year, the precise order and timing of the lectures may not match this sequence, but these notes should work well with the material from class.
evolution (general definition)
Sustained change in the
phenotype (= appearance) of a system through time; includes
non-biological phenomena such as the universe, culture, music
evolution (biological)
Change in the allele frequencies
of a population through time
genotype
Underlying genetic constitution of an
organism
phenotype
Physical appearance of an organism; its
observed traits
\[genotype \stackrel{\mbox{environment,development}}{\longrightarrow} phenotype\]
\[phenotype = genotype + environment\]
gene
A section of DNA on a chromosome that codes for a
specific trait (e.g., flower color)
locus
location on a chromosome where a gene occurs
allele
One of two or more alternative states that exist
for a gene (e.g., red, white, purple)
homozygote
Individual that has 2 identical alleles at a
locus (rr or RR)
heterozygote
Individual that has 2 different alleles at
a locus (Rr)
dominant allele
One whose phenotype is expressed in
either homozygous or heterozygous individuals
recessive allele
One whose phenotype is expressed only
in homozygous individuals
pleiotropy
A single gene affects more than one trait
epistasis
Gene-gene interactions; the expression of one
gene affects another
polygenic trait
Small additive effects of many genes on
a single trait, such as body mass
environmental effects
The same genotype can have a
different phenotype depending on the environment in which it is raised
(e.g. appearance of same plant clone raised in sun versus
shade)
The combination of environmental effects and a polygenic trait leads to a trait that is measured on a continuous scale, and often has a normal or bell-shaped distribution. For example, body mass or height is a continuous trait. But if it were inherited as a single Mendelian gene with only 2 alleles and no environmental effects, individuals would be of only two sizes: “tall” or “short”.
A Punnett square is a simple table to determine the genotypes and phenotypes produced from a parental cross. You need to know the genotypes of both parents to draw the square.
Along the top of the square, list the gamete types that could be produced by one of the parents.
Along the side of the square, list the gamete types that could be produced by the other parent.
Remember that each gamete type is equally likely, so divide the rows and columns of the square evenly.
Inside each cell of the table, write the resulting genotype and phenotype of the offspring. Each of these cells is equally probable.
Tabulate the relative frequencies of the different genotypes and the different phenotypes in the offspring.
R = red flower color
r = white flower color
Father’s genotype: rr
Father’s phenotype: white flower
Mother’s genotype: Rr
Mother’s phenotype: red flower
Gametes | r | r |
---|---|---|
R | Rr | Rr |
r | rr | rr |
Offspring genotypic frequencies: Rr:rr 0.5:0.5
Offspring phenotypic frequencies: red:white 0.5:0.5
R = red flower color
r = white flower color
Y = smooth seed coat
y = wrinkled seed coat
Father’s genotype: RrYy
Father’s phenotype: red flower, smooth seed coat
Mother’s genotype: RrYy
Mother’s phenotype: red flower, smooth seed coat
Gametes | RY | rY | Ry | ry |
---|---|---|---|---|
RY | RRYY | RrYY | RRYy | RrYy |
rY | RrYY | rrYY | RrYy | rrYy |
Ry | RRYy | RrYy | RRyy | Rryy |
ry | RrYy | rrYy | Rryy | rryy |
Offspring genotypes: RRYY: RRYy: RRyy: RrYY: RrYy: Rryy: rrYY: rrYy:
rryy
Offspring genotype counts: 1: 2: 1: 2: 4: 2: 1: 2: 1
Offspring genotypic frequencies: \(\frac{1}{16},\frac{2}{16},\frac{1}{16},\frac{2}{16},\frac{4}{16},\frac{2}{16},\frac{1}{16},\frac{2}{16},\frac{1}{16}\)
Offspring phenotypes: RedSmooth: Redwrinkled: whiteSmooth:
whitewrinkled
Offspring phenotype counts: 9: 3: 3: 1
Offspring phenotypic frequencies: \(\frac{9}{16},\frac{3}{16},\frac{3}{16},\frac{1}{16}\)
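The dihybrid square above can also be enumerated by machine. The following is a minimal sketch in R, assuming (as in the example) that R and Y are fully dominant; the helper name `phenotype` is hypothetical and is used only for illustration.

```r
# Gametes produced by an RrYy parent: one allele per gene, all four equally likely
gametes <- c("RY", "rY", "Ry", "ry")

# Hypothetical helper: combine two gametes and report the offspring phenotype.
# grepl() is case-sensitive, so "R" matches only the dominant allele.
phenotype <- function(g1, g2) {
  alleles <- paste0(g1, g2)
  color <- if (grepl("R", alleles)) "Red" else "white"
  seed  <- if (grepl("Y", alleles)) "Smooth" else "wrinkled"
  paste0(color, seed)
}

# Every cell of the 4 x 4 Punnett square (each cell is equally probable)
cross <- expand.grid(father = gametes, mother = gametes, stringsAsFactors = FALSE)
cross$phenotype <- mapply(phenotype, cross$father, cross$mother)

pheno_counts <- table(cross$phenotype)
print(pheno_counts)                      # counts out of 16 cells
print(pheno_counts / sum(pheno_counts))  # phenotypic frequencies
```

The counts reproduce the 9:3:3:1 ratio tabulated above.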
gene pool
The set of all alleles in an interbreeding
population
genotypic frequencies
The proportion of different
genotypes in the population
allelic frequencies
The proportion of different alleles
in the population (regardless of genotype)
The genotypic and allelic frequencies can always be calculated directly from data in which a sample of individuals from a population is genotyped (from direct sequencing, SNP analysis, or measures of protein diversity). This calculation of observed genotypic and allelic frequencies does not make any assumptions about evolution or genetic change; it is just a snapshot of genetic diversity that has been measured.
Here are some sample data in the form of counts of different genotypes for a single gene locus:
Genotype | AA | AB | BB | Sum |
---|---|---|---|---|
Number of individuals | 75 | 20 | 100 | 195 |
f(AA) = 75/195 = 0.385
f(AB) = 20/195 = 0.103
f(BB) = 100/195 = 0.512
The allelic frequency calculation is slightly more complicated.
Remember that each homozygote carries two copies of a particular allele,
but a heterozygote carries only a single copy. So, we use
0.5 x f(AB)
to get the contribution of the heterozygote to
the allelic frequency:
f(A) = f(AA) + 0.5*f(AB)
f(A) = 0.385 + 0.5*0.103 = 0.436
f(B) = f(BB) + 0.5*f(AB)
f(B) = 0.512 + 0.5*0.103 = 0.564
As a check on your work, make sure that the genotypic frequencies sum to 1.0 and the allelic frequencies sum to 1.0.
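The same book-keeping can be sketched in a few lines of R. The counts are the sample data from the table above; the variable names are only illustrative.

```r
# Observed genotype counts for a single locus with two alleles
counts <- c(AA = 75, AB = 20, BB = 100)

# Genotypic frequencies: each count divided by the total sample size
geno_freq <- counts / sum(counts)

# Allelic frequencies: homozygotes contribute fully, heterozygotes contribute half
p <- geno_freq[["AA"]] + 0.5 * geno_freq[["AB"]]  # f(A)
q <- geno_freq[["BB"]] + 0.5 * geno_freq[["AB"]]  # f(B)

print(round(geno_freq, 3))
print(round(c(A = p, B = q), 3))

# Check on the work: both sets of frequencies must sum to 1
stopifnot(isTRUE(all.equal(sum(geno_freq), 1)), isTRUE(all.equal(p + q, 1)))
```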
With more than two alleles, the calculation is slightly more complex, because you have to list out all of the possible genotypes, but the formulas are essentially the same. Here is an example of a single gene with 3 alleles: J, K, and L.
Genotype | JJ | JK | JL | KL | KK | LL | Sum |
---|---|---|---|---|---|---|---|
Number Of Individuals | 10 | 11 | 0 | 9 | 2 | 22 | 54 |
Here are the genotype frequencies:
f(JJ) = 10/54 = 0.185
f(JK) = 11/54 = 0.204
f(JL) = 0/54 = 0.000
f(KL) = 9/54 = 0.167
f(KK) = 2/54 = 0.037
f(LL) = 22/54 = 0.407
And here are the allelic frequencies. We often use small variables p,q,r… to indicate different alleles:
p = f(J) = f(JJ) + 0.5*f(JK) + 0.5*f(JL)
p = f(J) = 0.185 + 0.5*0.204 + 0.5*0.000 = 0.287
q = f(K) = f(KK) + 0.5*f(JK) + 0.5*f(KL)
q = f(K) = 0.037 + 0.5*0.204 + 0.5*0.167 = 0.222
r = f(L) = f(LL) + 0.5*f(JL) + 0.5*f(KL)
r = f(L) = 0.407 + 0.5*0.000 + 0.5*0.167 = 0.491
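For any number of alleles, the calculation amounts to counting allele copies: each individual carries two, so the gene pool holds 2N copies in total. A minimal sketch in R, assuming genotypes are written as two single-letter allele names (as in the table above):

```r
# Observed genotype counts for a locus with three alleles
counts  <- c(JJ = 10, JK = 11, JL = 0, KL = 9, KK = 2, LL = 22)
alleles <- c("J", "K", "L")

# copies[a, g] = number of copies of allele a carried by genotype g (0, 1, or 2)
copies <- sapply(names(counts), function(g) {
  sapply(alleles, function(a) sum(strsplit(g, "")[[1]] == a))
})

# Total copies of each allele, divided by the 2N copies in the gene pool
allele_freq <- as.vector(copies %*% counts) / (2 * sum(counts))
names(allele_freq) <- alleles

print(round(allele_freq, 3))
```

This is equivalent to adding each homozygote frequency to half of every heterozygote frequency that contains the allele.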
The Hardy-Weinberg equation (named after G. H. Hardy and Wilhelm Weinberg, who derived it independently in 1908) uses simple rules of probability to generate the expected genotypic frequencies in a population that is subject only to random mating (see assumptions). It is based on the idea that, with random mating, the alleles present in the gene pool are paired up randomly in the genotypes of the offspring. We present the equation, show how we use it with data, and then list the assumptions.
The reasoning behind the Hardy-Weinberg equation is that the frequencies of alleles in the gene pool can be interpreted as probabilities of an allele being present in a single individual. Because each individual has two alleles for a gene, we end up multiplying probabilities together to get the expected frequency of a particular genotype.
If we have allele frequencies in a population p,q,r… that add up to 1.0, a simple binomial expansion (or, with more than two alleles, the analogous multinomial expansion) gives us the expected frequencies of each genotype:
Let p = f(A) and q = f(B).
Because these are the only two alleles for this gene locus:
\[p + q = 1.0\]
\[(p + q)^2 = 1.0^2\]
\[p^2 + 2pq + q^2 = 1.0\]
\[f(AA) + f(AB) + f(BB) = 1.0\]
Using the population data given above for the A and B alleles
Observed f(A) = 0.436
Observed f(B) = 0.564
f(AA) in Hardy-Weinberg equilibrium = p^2 = (0.436)*(0.436) = 0.1901
f(AB) in Hardy-Weinberg equilibrium = 2*p*q = 2*(0.436)*(0.564) = 0.4918
f(BB) in Hardy-Weinberg equilibrium = q^2 = (0.564)*(0.564) = 0.3181
When there are 3 alleles present in a population at a gene locus, we can use p, q, and r to represent their frequencies:
\[p + q + r = 1.0\]
\[(p + q + r)^2 = 1.0^2\]
\[p^2 + 2pq + 2pr + 2qr + q^2 + r^2 = 1.0\]
\[f(JJ) + f(JK) + f(JL) + f(KL) + f(KK) + f(LL) = 1.0\]
Using the population data given above for the J, K, and L alleles
p = Observed f(J) = 0.287
q = Observed f(K) = 0.222
r = Observed f(L) = 0.491
f(JJ) in Hardy-Weinberg equilibrium = p^2 = (0.287)*(0.287) = 0.0824
f(JK) in Hardy-Weinberg equilibrium = 2*p*q = 2*(0.287)*(0.222) = 0.1274
f(JL) in Hardy-Weinberg equilibrium = 2*p*r = 2*(0.287)*(0.491) = 0.2818
f(KL) in Hardy-Weinberg equilibrium = 2*q*r = 2*(0.222)*(0.491) = 0.2180
f(KK) in Hardy-Weinberg equilibrium = q^2 = (0.222)*(0.222) = 0.0493
f(LL) in Hardy-Weinberg equilibrium = r^2 = (0.491)*(0.491) = 0.2411
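The expected Hardy-Weinberg frequencies follow the same pattern for any number of alleles: the square of the allele frequency for each homozygote, and twice the product of the two allele frequencies for each heterozygote. Here is a small sketch in R; `hw_expected` is a hypothetical helper name, not a function from any package.

```r
# Hypothetical helper: expected Hardy-Weinberg genotype frequencies
# freqs is a named vector of allele frequencies that sums to 1
hw_expected <- function(freqs) {
  stopifnot(abs(sum(freqs) - 1) < 1e-6)
  a <- names(freqs)
  out <- c()
  for (i in seq_along(freqs)) {
    for (j in i:length(freqs)) {
      # homozygote: p^2; heterozygote: 2pq (two ways to draw the pair)
      f <- if (i == j) freqs[[i]]^2 else 2 * freqs[[i]] * freqs[[j]]
      out[paste0(a[i], a[j])] <- f
    }
  }
  out
}

hw2 <- hw_expected(c(A = 0.436, B = 0.564))
hw3 <- hw_expected(c(J = 0.287, K = 0.222, L = 0.491))

print(round(hw2, 4))
print(round(hw3, 4))
print(sum(hw3))  # expected genotype frequencies sum to 1
```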
If the Hardy-Weinberg assumptions are met:
allelic frequencies never change
genotypic frequencies will change in a single generation of random mating from the observed frequencies to those predicted by the Hardy-Weinberg model
once the Hardy-Weinberg genotypic frequencies are achieved after a single generation of random mating, they will not change again in future generations
Remember that allelic frequencies can always be calculated from genotypic frequencies. This calculation involves no biological assumptions; it is just simple book-keeping.
However, in order to predict genotypic frequencies from allelic frequencies, we have to assume Hardy-Weinberg or some other kind of biological model that tells us what happens to allelic and genotypic frequencies each generation.
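The three Hardy-Weinberg predictions can be checked numerically. The sketch below starts from the observed A/B genotype frequencies used earlier and applies random mating twice; the function names `allele_A` and `random_mating` are hypothetical.

```r
# Frequency of the A allele, from genotype frequencies (pure book-keeping)
allele_A <- function(geno) geno[["AA"]] + 0.5 * geno[["AB"]]

# One generation of random mating: alleles pair up at random
random_mating <- function(geno) {
  p <- allele_A(geno)
  q <- 1 - p
  c(AA = p^2, AB = 2 * p * q, BB = q^2)
}

gen0 <- c(AA = 75, AB = 20, BB = 100) / 195  # observed frequencies
gen1 <- random_mating(gen0)                  # after one generation
gen2 <- random_mating(gen1)                  # after two generations

print(round(rbind(gen0, gen1, gen2), 4))
print(c(allele_A(gen0), allele_A(gen1), allele_A(gen2)))
```

Under these assumptions the allele frequency is the same in every row, the genotype frequencies shift between `gen0` and `gen1`, and `gen1` and `gen2` are identical.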
Mutation is the ultimate source of genetic variation in populations, but is it a strong evolutionary force by itself?
With 4 possible nucleotides, there are \(4^3 = 64\) possible 3-nucleotide codons. However, there are only 20 amino acids. Therefore, some substitutions (silent mutations) will code for an identical amino acid. Others (neutral mutations) will change the amino acid, but not alter the performance of the protein. Some codons signal the start or stop of protein production, and mutations that create or destroy these signals are usually detrimental. So are frame-shift mutations, in which an insertion or deletion causes the downstream codon sequence to be misread.
For eukaryotes, rates of mutation are on the order of \(10^{-4}\) to \(10^{-6}\) mutations/gene locus/generation.
Consider an allele A, with an initial allelic frequency of \(p_0\). Each generation A alleles mutate into B alleles at a mutation rate of \(u\). After \(t\) generations of time, we have
\[q_t = 1 - p_0 e^{-ut}\]
For example, suppose \(p_0\) = 0.95, \(u\) = \(10^{-6}\), and \(t\) = 100 generations. How much of an increase will occur in the frequency of the B allele, which is starting out at f(B) = \(1 - p = 0.05\)?
u = 10^-6
p0 = 0.95
t = 100
q(100) = 1 - p0e^{-ut}
q(100) = 1 - 0.95e^{-10^{-6}*100}
q(100) = 0.050095
Even after 1000 generations, the change is only
q(1000) = 0.0509
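These two numbers can be reproduced directly from the equation. A minimal sketch in R, where `mutation_q` is a hypothetical helper name:

```r
# Frequency of the B allele after t generations of one-way mutation (A -> B)
# p0 = initial frequency of A, u = mutation rate per generation
mutation_q <- function(p0, u, t) 1 - p0 * exp(-u * t)

q100  <- mutation_q(p0 = 0.95, u = 1e-6, t = 100)
q1000 <- mutation_q(p0 = 0.95, u = 1e-6, t = 1000)

print(round(q100, 6))
print(round(q1000, 4))
```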
Here is a graph illustrating the change through time
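# Frequency of the B allele under one-way mutation (A -> B): q_t = 1 - p0*exp(-u*t)
rm(list=ls())                  # start from a clean workspace
p0 <- 0.95                     # initial frequency of the A allele
t <- 1:100000                  # time in generations
marks <- c(1,50000,100000)     # x-axis tick positions
u <- 0.000001                  # mutation rate 10^-6 (red curve)
qt <- 1 - p0*exp(-u*t)
qt5 <- 1 - p0*exp(-0.00001*t)  # mutation rate 10^-5 (blue curve)
qt4 <- 1 - p0*exp(-0.0001*t)   # mutation rate 10^-4 (orange curve)
plot(x=t, y=qt, xlab="Time (Generations)", ylab="q (frequency of B allele)", type="l", col="red", xaxt="n", ylim=c(0,1))
axis(1,at=marks,labels=marks)
lines(x=t,y=qt5,col="blue")
lines(x=t,y=qt4,col="orange")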
At this rate, it would take nearly 3 million generations for the B allele to go from a frequency of 0.05 to 0.95: setting \(q_t = 0.95\) gives \(t = \ln(0.95/0.05)/u \approx 2.9 \times 10^6\) when \(u = 10^{-6}\). The increase is faster if the mutation rate is \(10^{-5}\) (blue curve), but even at the faster rate of \(10^{-4}\) (orange curve), it still takes almost 30,000 generations for the same increase in allele frequency caused only by mutation.
Finally, we note that this analysis assumes that mutation only occurs in one direction (from A to B). But if there is also back mutation from B to A occurring at rate v, then the allele will never go to fixation. Instead an equilibrium will be reached at:
\[\hat{q} = \frac{u}{u + v}\]
\[\hat{p} = \frac{v}{u + v}\]
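The equilibrium can be checked by iterating the mutation process one generation at a time. This sketch assumes the simple discrete recursion \(q_{t+1} = q_t + u(1 - q_t) - vq_t\) (gain of B from forward mutation, loss of B from back mutation); the rates are illustrative values chosen so that the loop converges quickly.

```r
u <- 1e-4   # forward mutation rate, A -> B (illustrative value)
v <- 5e-5   # back mutation rate, B -> A (illustrative value)
q <- 0.05   # starting frequency of the B allele

# Each generation: B gains u*(1 - q) from A and loses v*q back to A
for (gen in 1:100000) {
  q <- q + u * (1 - q) - v * q
}

q_hat <- u / (u + v)  # predicted equilibrium frequency of B
print(c(simulated = q, predicted = q_hat))
```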
What are the effects of migration on allelic frequency? By migration, we mean the arrival of individuals from another population.
p0 = initial allele frequency in resident population (changing)
pm = allelic frequency in migrant population (constant)
t = time, in number of generations
m = migrant fraction (proportion of the population that consists of new migrants each generation)
1 - m = resident fraction (proportion of population that consists of non-migrants each generation)
To calculate the allelic frequency in the resident population after one generation of migration, we have:
\[p_1 = (1-m)p_0 + (m)p_m\]
More generally, after t generations, the frequency of the allele in the resident population (pt) is:
\[p_t = (1-m)^t(p_0 - p_m) + p_m\]
Given an initial resident frequency p0 = 0.5, a migrant allele frequency pm = 0.9, a migrant fraction m = 0.1, and the passage of t = 10 generations, the new allelic frequency in the resident population (p10) is:
p10 = (1 - 0.1)^10 * (0.5 - 0.9) + 0.9 = 0.76
So, after only 10 generations, the allele frequency changes from p0 = 0.50 to p10 = 0.76. In order for this calculation to hold, all of the other Hardy-Weinberg assumptions need to be in place (no mutation, no selection, large population size, random mating, random segregation of alleles).
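The general formula can be compared against repeated use of the one-generation equation. A short sketch in R, using the values from the worked example (with m = 0.1); `migrate` is a hypothetical helper name.

```r
# Allele frequency in the resident population after t generations of migration
migrate <- function(p0, pm, m, t) (1 - m)^t * (p0 - pm) + pm

p10 <- migrate(p0 = 0.5, pm = 0.9, m = 0.1, t = 10)

# Same result, obtained by applying the one-generation equation ten times
p <- 0.5
for (gen in 1:10) {
  p <- (1 - 0.1) * p + 0.1 * 0.9
}

print(round(c(formula = p10, iterated = p), 4))
```

As t grows, the term \((1-m)^t\) shrinks toward zero, so the resident frequency approaches the migrant frequency pm.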
If we examine a random stretch of DNA in an organism’s genome, how much variation will be present, and how will it be structured?
Classical model
Very low genetic variation. Most
genes have two homozygous “wild type” alleles (+), with an occasional
recessive allele (r) showing up that is usually deleterious. Natural
selection operates mostly as purifying selection, removing recessive
alleles that are deleterious. This was the view in the early 1900s that
emerged from classical genetics, when the only way that “genotypes”
could be scored was on the basis of major mutations (which often were
deleterious).
Balance model
Low genetic variation, but some
variants are maintained through balancing selection, in
which the fitness of the heterozygote (AB) is superior to that of either
of the two homozygote genotypes (AA or BB). This can happen when the two
protein variants expressed in a heterozygous individual function optimally
at slightly different conditions, which can increase the fitness of an
individual in a variable or changing environment. Sickle-cell anemia and
resistance to malaria is the classic example.
Neutral model
High genetic variation, with many
different alleles in the population and many heterozygous loci in
different individuals. These kinds of alleles have no effect (good or
bad) on the fitness of the organism, although in different environments
or at some time in the past, they may have had fitness
consequences.
With modern sequencing methods revealing large amounts of genetic diversity in most species, the consensus view is that the balance and neutral models capture the typical pattern, although of course there are still many examples of deleterious recessive alleles that match the classical model.
Hardy-Weinberg assumes random mating, but there are a number of different possibilities for how individuals choose mates:
random mating
Mate choice is independent of genotype
or phenotype
positive assortative mating
More frequent matings
between similar phenotypes
negative assortative mating
more frequent matings
between dissimilar phenotypes
inbreeding
More frequent matings between
relatives
The degree of inbreeding can be quantified with the inbreeding coefficient, F
inbreeding coefficient
The fractional reduction in
heterozygosity relative to a randomly mating population.
\[F = \frac{H_0 - H}{H_0} = 1 - \frac{H}{H_0}\]
where H is the observed heterozygosity in the population and H0 is the expected heterozygosity in a Hardy-Weinberg population (2pq).
An equivalent definition comes from the pattern of an individual’s pedigree:
autozygous alleles
Two alleles in an individual that
are identical by descent from a single ancestor
allozygous alleles
Two alleles in an individual that
are not identical by descent (they trace back to different ancestral copies)
inbreeding coefficient (pedigree definition)
The
probability that two alleles in an individual are identical by descent
(=autozygous)
With these definitions, we can modify the Hardy-Weinberg equation to give the expected genotype frequencies with inbreeding:
Genotypes | Frequencies | F=0 | F=1 |
---|---|---|---|
AA | \(p^2(1 - F) + pF\) | \(p^2\) | \(p\) |
AB | \(2pq(1 - F)\) | \(2pq\) | 0 |
BB | \(q^2(1 - F) + qF\) | \(q^2\) | \(q\) |
For example, suppose the allele frequencies are p = 0.2, q = 0.8, and F = 0.5. We would have:
f(AA) = 0.2^2(1 - 0.5) + 0.2*0.5 = 0.12
f(AB) = 2*0.2*0.8*(1 - 0.5) = 0.16
f(BB) = 0.8^2(1 - 0.5) + 0.8*0.5 = 0.72
So, the primary effect of inbreeding is to reduce the frequency of heterozygotes. In the extreme case of full inbreeding, there are no heterozygotes, and we end up with two inbred homozygote lines.
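The modified equations are easy to tabulate for different levels of inbreeding. A minimal sketch in R; `inbred_freqs` is a hypothetical helper name, and the argument is called `f_inb` to avoid masking R's built-in `F` (shorthand for `FALSE`).

```r
# Expected genotype frequencies with inbreeding coefficient f_inb
inbred_freqs <- function(p, f_inb) {
  q <- 1 - p
  c(AA = p^2 * (1 - f_inb) + p * f_inb,
    AB = 2 * p * q * (1 - f_inb),
    BB = q^2 * (1 - f_inb) + q * f_inb)
}

half <- inbred_freqs(p = 0.2, f_inb = 0.5)  # the worked example
none <- inbred_freqs(p = 0.2, f_inb = 0)    # Hardy-Weinberg frequencies
full <- inbred_freqs(p = 0.2, f_inb = 1)    # complete inbreeding

print(rbind(none, half, full))

# Recover F from the reduction in heterozygosity: F = 1 - H/H0
print(1 - half[["AB"]] / none[["AB"]])
```

The allele frequencies are unchanged in all three rows; only the way alleles are packaged into genotypes differs.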
The costs of inbreeding are:
expression of deleterious recessive alleles (short-term)
loss of heterozygosity (long-term)
These problems are especially acute for small populations (which are often highly inbred) that may be facing novel environments due to climate change and other factors.
However, there is also an argument that inbreeding could benefit a
population by preserving particular genotypes that function well
together (= co-adapted gene complex).
This effect would be most beneficial for organisms living in stable environments whose offspring do not disperse very far from their parents. Accordingly, there are many examples of restricted plant populations with little or no genetic variability that seem perfectly healthy (at least until the climate changes).
Allele frequencies in a population can change from random effects caused by the segregation of alleles into gametes during meiosis. For example, imagine a cross between two heterozygous individuals that produce a total of 400 offspring:
Genotype | Frequency | Expected Number of Offspring |
---|---|---|
AA | 0.25 | 100 |
AB | 0.50 | 200 |
BB | 0.25 | 100 |
Allele | Expected Allele Frequency |
---|---|
A | 0.50 |
B | 0.50 |
Of course, by chance, we might not see precisely these numbers. Suppose the counts look like this:
Genotype | Observed Frequency | Observed Number of Offspring |
---|---|---|
AA | 0.2525 | 101 |
AB | 0.5000 | 200 |
BB | 0.2475 | 99 |
This deviation has a trivial effect on the allele frequencies:
Allele | Observed Allele Frequency |
---|---|
A | 0.5025 |
B | 0.4975 |
But now imagine the same scenario for a cross that produces only 4 offspring:
Genotype | Frequency | Expected Number of Offspring |
---|---|---|
AA | 0.25 | 1 |
AB | 0.50 | 2 |
BB | 0.25 | 1 |
Allele | Expected Allele Frequency |
---|---|
A | 0.50 |
B | 0.50 |
Look what happens this time if the genotype counts are shifted by just one individual:
Genotype | Observed Frequency | Observed Number of Offspring |
---|---|---|
AA | 0.5000 | 2 |
AB | 0.5000 | 2 |
BB | 0.0000 | 0 |
This deviation has a huge effect on the allele frequencies:
Allele | Observed Allele Frequency |
---|---|
A | 0.75 |
B | 0.25 |
In this case, the frequency of the A allele has changed from 0.50 to 0.75 in a single generation. Remember that this change will affect all of the descendants of this cross. Even if the population size should return to 400 individuals, this random change in allele frequencies will affect the subsequent evolution of this population.
genetic drift
random changes in allelic and genotypic
frequencies caused by small population size.
You can see from this example that genotypic and allelic changes from genetic drift are much more important in small populations than in large populations. Below a size of roughly 100 individuals, genetic drift becomes very important.
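The contrast between small and large populations can be illustrated with a simple simulation. This sketch assumes a binomial sampling model (each generation, 2N allele copies are drawn at random from the previous generation's gene pool), which is a standard simplification and is not derived in these notes; the function name `drift` and the seed are arbitrary.

```r
set.seed(1)  # arbitrary seed, only so that a run can be repeated

# Track the frequency of allele A in a population of N diploid individuals
drift <- function(N, p0 = 0.5, generations = 50) {
  p <- numeric(generations + 1)
  p[1] <- p0
  for (g in 1:generations) {
    # draw 2N allele copies at random from the current gene pool
    p[g + 1] <- rbinom(1, size = 2 * N, prob = p[g]) / (2 * N)
  }
  p
}

small <- drift(N = 4)    # very small population
large <- drift(N = 400)  # large population

# Largest single-generation change in allele frequency in each run
print(max(abs(diff(small))))
print(max(abs(diff(large))))
```

In the small population, the allele frequency can only take values in steps of 1/8, so single-generation jumps are necessarily coarse; in the large population the steps are 1/800.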
fixed allele
a gene locus that has only a single allele
in a population is “fixed” because the allele frequency is 1.00, and
every individual in the population is homozygous for the same
allele.
Suppose a single new allele arises in a population from mutation. The long-term probability of fixation of this allele is its frequency in the gene pool:
\[p(\mbox{fixation}) = \frac{1}{2N}\]
Remember that there are 2N alleles in the gene pool for a single gene locus.
However, with mutation rate u, each generation there will be 2Nu new copies of the mutant produced, each with a fixation probability of 1/(2N). Thus, the long-term probability of fixation is:
\[p(\mbox{fixation}) = \frac{2Nu}{2N} = u\]
Thus, with mutation and genetic drift, we have the interesting result that the long-term probability of fixation = u, the mutation rate of the allele.
As we discussed before, u is a small number, so the chances are slim.
In general, even if the probability of a single event is small, the overall probability of it occurring at least once over many trials can be surprisingly large:
\[p(\mbox{single event}) = z\]
\[p(\mbox{no event in one trial}) = 1 - z\]
\[p(\mbox{no events in t trials}) =(1 - z)^t\]
\[p(\mbox{at least 1 event in t trials}) = 1 - (1-z)^t\]
This formula applies not just to genetic drift, but to all chance events in life. Suppose, for example, that you estimate that the chances of getting a ticket for speeding are 1/100 each time you speed, and you only speed on Friday afternoons (to get home from work quickly to start your weekend).
What are the chances of getting caught speeding at least once during a year in which you work 50 weeks?
p(speeding ticket during one year) =
1 - (1 - 0.01)^50 = 0.39
So, a 39% chance of getting caught at some time during the year, even though the chance of getting caught each individual Friday is only 0.01.
And if you commute like this for 5 years in a row?
p(speeding ticket at least once during 5 years) =
1 - (1 - 0.39)^5 = 0.92
A 92% chance you will get caught!
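The speeding example can be written as a one-line function. In this R sketch, `p_at_least_once` is a hypothetical helper name; the five-year figure is computed two ways, from the rounded yearly value as above and directly from 250 Fridays.

```r
# Probability that an event with per-trial probability z occurs at least once in t trials
p_at_least_once <- function(z, t) 1 - (1 - z)^t

one_year <- p_at_least_once(z = 0.01, t = 50)

# Five years, treating each year as one trial with the rounded yearly probability
five_years_by_year <- p_at_least_once(z = 0.39, t = 5)

# Five years, treating each of the 250 Fridays as one trial
five_years_by_week <- p_at_least_once(z = 0.01, t = 250)

print(round(c(one_year, five_years_by_year, five_years_by_week), 2))
```

Both routes to the five-year figure agree to two decimal places.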
There is an important distinction between the observed population size (N) and the effective population size (NE):
effective population size
The size of an ideal, randomly mating
population that would show the same amount of genetic drift as the
observed population
In general:
\[N_E < N\]
Why should the effective population size be anything less than the observed population size? There are a number of forces at work that reduce the effective population size by preventing the complete mixing of alleles that we expect in a population that is mating at random. These factors include
founder effect
If a population is colonized by only
a few individuals (think of islands), the alleles carried by those
colonizers will be a small— and often non-random— subset of the larger
population they originated from.
bottleneck
If a population shrinks back to a small
size — even for a single generation — that will reduce the effective
population size more than we would expect by calculating a simple
arithmetic average of the observed population sizes in consecutive
generations. Thus, we can think of the founder effect as a special case
of a bottleneck that occurs during colonization.
unbalanced sex ratio
If the ratio of males:females
in a sexually reproducing population is different from 1:1, the alleles
represented by the rarer sex will be disproportionately represented in
the next generation.
limited natal dispersal
If individuals disperse only
a limited distance from where they were born, they will only encounter a
limited number of potential mates. Even if they mate randomly, the
allelic diversity in this limited spatial “neighborhood” is less than
that of the entire population.
Let’s look at some simple equations for calculating NE under these circumstances
If observed population size ni changes in each of t consecutive generations:
\[\frac{1}{N_E} = \frac{1}{t} (\frac{1}{n_1}+ \frac{1}{n_2}+ \frac{1}{n_3}+ \frac{1}{n_4}+\dots+ \frac{1}{n_t})\]
For example, suppose the observed population size of a population of orchids is 100,4,100,100,100, undergoing a bottleneck in generation 2, but then fully recovering in subsequent generations. For this sequence, NE is calculated as:
1/N_E = 1/5(1/100 + 1/4 + 1/100 + 1/100 + 1/100)
1/N_E = 1/5(1/100 + 25/100 + 1/100 + 1/100 + 1/100)
1/N_E = 1/5(29/100)
1/N_E = (29/500)
N_E = (500/29) = 17.24 individuals
Notice that this number (17.24) is much less than the simple average of these population sizes (80.8). This formula is actually a calculation of the harmonic mean of a series of numbers. The harmonic mean is strongly affected by small values and is never greater than the arithmetic mean of a series of positive numbers (the two are equal only when all of the numbers are identical).
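The harmonic mean is a one-liner in R. In this sketch the orchid population sizes are taken from the example, and `effective_size_harmonic` is a hypothetical helper name.

```r
# Effective population size across generations: harmonic mean of the observed sizes
effective_size_harmonic <- function(n) 1 / mean(1 / n)

n  <- c(100, 4, 100, 100, 100)    # observed sizes, with a bottleneck in generation 2
ne <- effective_size_harmonic(n)  # equals 500/29

print(round(ne, 2))   # harmonic mean
print(mean(n))        # arithmetic mean, for comparison

# Without the bottleneck, the two means coincide
print(effective_size_harmonic(rep(100, 5)))
```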
Alleles will be thoroughly mixed in a randomly mating population with equal numbers of males and females. However, if the sex ratio is skewed strongly from 1:1, the allelic diversity will be limited by the rarer sex and the alleles that it is collectively carrying. If the population consists of NM males and NF females, the effective population size (NE) is:
\[ N_E = \frac{4N_M N_F}{N_M + N_F} \]
For example, if the population consists of 100 females and only 10 males, the effective population size is:
N_E = (4)(100)(10)/(100 + 10)
N_E = 4000/110 = 36.4 individuals
Notice that although the observed population size (N) is 110 individuals, the effective population size (NE) is only 36.4, which is small enough for genetic drift to become important.
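The sex-ratio formula can be explored for different numbers of males and females. A minimal sketch in R; `effective_size_sex` is a hypothetical helper name.

```r
# Effective population size with n_males males and n_females females
effective_size_sex <- function(n_males, n_females) {
  4 * n_males * n_females / (n_males + n_females)
}

skewed   <- effective_size_sex(n_males = 10, n_females = 100)  # the worked example
balanced <- effective_size_sex(n_males = 55, n_females = 55)   # same N, 1:1 sex ratio

print(round(c(skewed = skewed, balanced = balanced), 1))
```

With a 1:1 sex ratio the effective size equals the observed size of 110; any departure from 1:1 lowers it.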
For complete mixing of alleles, an individual would need to be able to mate randomly with any other individual in its population. More realistically, an individual is much more likely to mate with neighboring individuals that are close by and much less likely to mate with individuals that are distant. Under these circumstances, the effective population size is calculated as:
\[ N_E = 4\pi d x^2 \]
where d is the population density (individuals/area), and x is the dispersal distance from where an individual is born to where it mates (so that \(x^2\) has units of area and NE is a number of individuals). With limited dispersal and/or a population that is at low density, individuals are likely to choose mates from only a limited “neighborhood” of nearby individuals. Even with random mating, the effect of this is to reduce the local genetic diversity in each of the neighborhoods. Limited dispersal introduces a kind of “viscosity” to the population that can make genetic drift important.
As a simple example, if the density of individuals is 10 per \(m^2\), but the dispersal distance is only 1 m, the effective population size is:
N_E = 4*pi*10*1^2
N_E = 125.7 individuals
In summary, genetic drift is an important force in changing allele frequencies when effective population sizes are less than 100, and there are a number of common features of populations (bottlenecks, biased sex ratios, and limited dispersal) that can lower NE below this threshold.
Mechanism | Change in Allele Frequency? | Change in Genotype Frequency? |
---|---|---|
Mutation | Yes (unlikely) | Yes (unlikely) |
Migration | Yes | Yes |
Non-random Mating | No (yes with recessive lethals) | Yes |
Genetic Drift | Yes (if NE < 100) | Yes (if NE < 100) |
Mechanism | Strength of Change? | Lead to Fixation? | Predictable? |
---|---|---|---|
Mutation | Weak | Yes (no with back mutation) | Yes |
Migration | Strong | Yes | Yes |
Non-random Mating | Weak | No (yes with recessive lethals) | Yes |
Genetic Drift | Strong (if NE < 100) | Yes (if NE < 100) | No |
natural selection (popular definition)
“survival of the
fittest”
natural selection (biological definition)
differential
reproduction and/or survival of individuals with heritable traits
tautology
a self-referencing definition
establish an experiment with replicate individuals of each genotype in the population (there will be 3 such genotypes AA AB BB for a single-gene two-allele system)
expose individuals to a selection pressure (e.g. heat shock, presence of a predator, disease)
calculate the number surviving after the selection pressure (or their reproduction in a fecundity experiment)
calculate absolute fitness as the proportion that survive (p1, p2, p3)
calculate relative fitness by dividing absolute fitness by the absolute fitness of the “best” genotype (w1, w2, w3)
assume the environment is constant, so that the relative fitness values are the same in each generation
absolute fitness
proportional survival or relative
reproduction of genotypes in a selection experiment
relative fitness
absolute fitness values scaled to the
largest absolute fitness measured for one of the genotypes in the
population
Genotype | Before Selection | After Selection | Absolute Fitness | Relative Fitness |
---|---|---|---|---|
AA | 50 | 20 | 20/50=0.4 | 0.4/0.4 = 1.0 |
AB | 100 | 35 | 35/100=0.35 | 0.35/0.40 = 0.875 |
BB | 25 | 5 | 5/25=0.20 | 0.20/0.40 = 0.50 |
w1 = 1.0 = relative fitness of AA
w2 = 0.875 = relative fitness of AB
w3 = 0.50 = relative fitness of BB
mean fitness
average fitness of individuals in the
population after random mating and selection \(= \bar{w}\)
\[\bar{w} = p^2w_1 + 2pqw_2 + q^2w_3\]
\[ \#AA \] \[ \#AB \] \[ \#BB \]
\[ w_1 = \frac{\mbox{absolute fitness of AA}}{\mbox{largest absolute fitness}} \]
\[ w_2 = \frac{\mbox{absolute fitness of AB}}{\mbox{largest absolute fitness}} \]
\[ w_3 = \frac{\mbox{absolute fitness of BB}}{\mbox{largest absolute fitness}} \]
AA = 50 AB = 50 BB = 100
w_1 = 0.40/0.40 = 1.00
w_2 = 0.35/0.40 = 0.875
w_3 = 0.20/0.40 = 0.50
\[ f(AA) = \frac{\#AA}{N} \; f(AB) = \frac{\#AB}{N} \; f(BB) = \frac{\#BB}{N} \]
\[ f(A) = f(AA) + \frac{1}{2}f(AB) = p_0 \]
\[ f(B) = f(BB) + \frac{1}{2}f(AB) = q_0 \]
f(AA) = 50/200 = 0.25
f(AB) = 50/200 = 0.25
f(BB) = 100/200 = 0.50
f(A) = 0.25 + (0.5)*(0.25) = 0.375 = p_0
f(B) = 0.50 + (0.5)*(0.25) = 0.625 = q_0
\[ f(AA)=p_0^2 \]
\[ f(AB)= 2p_0q_0 \]
\[ f(BB)=q_0^2 \]
f(AA) = (0.375)^2 = 0.141
f(AB) = 2(0.375)(0.625) = 0.469
f(BB) = (0.625)^2 = 0.391
\[ p_0^2w_1 + 2p_0q_0w_2 + q_0^2w_3 = \bar{w} \]
p_0^2*w_1 = (0.141)(1.0) = 0.141
2*p_0*q_0*w_2 = (0.469)(0.875) = 0.410
q_0^2*w_3 = (0.391)(0.50) = 0.196
mean fitness = 0.141 + 0.410 + 0.196 = 0.747
\[ f(AA)=\frac{p_0^2w_1}{\bar{w}} \]
\[ f(AB)= \frac{2p_0q_0w_2}{\bar{w}} \]
\[ f(BB)= \frac{q_0^2w_3}{\bar{w}} \]
f(AA) = 0.141/0.747 = 0.189
f(AB) = 0.410/0.747 = 0.549
f(BB) = 0.196/0.747 = 0.262
\[ f(A) = f(AA) + \frac{1}{2}f(AB) = p_1 \]
\[ f(B) = f(BB) + \frac{1}{2}f(AB) = q_1 \]
f(A) = 0.189 + 0.5*(0.549) = 0.464 = p_1
f(B) = 0.262 + 0.5*(0.549) = 0.536 = q_1
\[ f(AA)=p_1^2 \]
\[ f(AB)= 2p_1q_1 \]
\[ f(BB)=q_1^2 \]
f(AA) = (0.464)^2 = 0.215
f(AB) = 2(0.464)(0.536) = 0.497
f(BB) = (0.536)^2 = 0.287
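The whole cycle (random mating, selection, new allele frequency) can be collected into one function. This is a sketch in R using the starting frequency and relative fitnesses from the worked example; `select_one_generation` is a hypothetical helper name. Because the code does not round intermediate values, its results can differ from the hand calculation in the third decimal place.

```r
# One generation of random mating followed by selection
# p = frequency of A before mating; w = relative fitnesses of AA, AB, BB
select_one_generation <- function(p, w) {
  q <- 1 - p
  before <- c(AA = p^2, AB = 2 * p * q, BB = q^2)  # Hardy-Weinberg frequencies
  w_bar  <- sum(before * w)                        # mean fitness
  after  <- before * w / w_bar                     # frequencies after selection
  p_next <- after[["AA"]] + 0.5 * after[["AB"]]    # new frequency of A
  list(w_bar = w_bar, after = after, p_next = p_next)
}

res <- select_one_generation(p = 0.375, w = c(1.0, 0.875, 0.50))

print(round(res$w_bar, 4))
print(round(res$after, 4))
print(round(res$p_next, 4))
```

Calling the function again with `res$p_next` as the new starting frequency steps the population forward another generation.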
Selection coefficient s
measure of relative selection
against a genotype
\[ s = 1 - w \]
s = 0 (no relative loss to selection = best genotype = w = 1)
s = 1 (lethal genotype w = 0)
Selection Scenario | w1 | w2 | w3 | Result |
---|---|---|---|---|
against recessive | 1.0 | 1.0 | 1 - s3 | Slow elimination of (B) allele |
against recessive + mutation | 1.0 | 1.0 | 1 - s3 | equilibrium \(q \approx \sqrt{(u/s)}\) |
against dominant | 1 - s1 | 1 - s2 | 1.0 | rapid elimination of (A) allele |
favoring heterozygote | 1 - s1 | 1.0 | 1 - s3 | equilibrium \(q = s_1/(s_1 + s_3)\) |
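The heterozygote-advantage equilibrium in the last row can be checked by iterating selection over many generations. In this R sketch the selection coefficients are illustrative values (not from the notes), and q is the frequency of the B allele.

```r
s1 <- 0.2                       # selection against AA (illustrative value)
s3 <- 0.6                       # selection against BB (illustrative value)
w  <- c(1 - s1, 1.0, 1 - s3)    # relative fitnesses of AA, AB, BB

p <- 0.1                        # starting frequency of A (so q starts at 0.9)
for (gen in 1:500) {
  q     <- 1 - p
  w_bar <- p^2 * w[1] + 2 * p * q * w[2] + q^2 * w[3]  # mean fitness
  p     <- (p^2 * w[1] + p * q * w[2]) / w_bar         # frequency of A after selection
}

q_sim <- 1 - p
q_hat <- s1 / (s1 + s3)         # predicted equilibrium frequency of B

print(c(simulated = q_sim, predicted = q_hat))
```

Starting from a different initial frequency (for example `p <- 0.9`) should lead to the same equilibrium, which is what distinguishes balancing selection from the directional scenarios in the table.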
Fisher's Fundamental Theorem Of Natural Selection
Natural selection maximizes the average fitness \((\bar{w})\) in a population.
If there is selection against a dominant or against a recessive, this
leads to allele fixation with \(\bar{w} = 1\). If there is selection
against a recessive with ongoing mutation or selection favoring the
heterozygote (= heterosis or balancing selection), both alleles
are maintained at an equilibrium, and \((\bar{w})\) is maximized at a value less
than 1.0.
Evolutionary phenomena (genetic change in populations, adaptation, speciation) can be explained via mechanisms consistent with Mendelian genetics.
Evolution is gradual via small genetic changes; as these changes accumulate, species diverge.
Natural selection is the strongest mechanism of evolution, and may work in concert with genetic drift.
Genetic diversity in populations reflects past and current natural selection.
Microevolutionary change leads to macroevolutionary responses.
Most traits in nature are not controlled by a single gene with multiple alleles. Many traits are controlled by multiple loci and can also reflect phenotypic plasticity and environmental effects. These traits, such as body size, usually exhibit a bell-shaped normal distribution. In this lecture we describe a simple theoretical framework for thinking about the evolution of these traits, and then consider three experimental designs (artificial selection, common garden, reciprocal transplant) to tease apart environmental and genetic effects on continuous traits.
discrete trait
a trait that takes on only a limited
number of discrete values (e.g. red, purple, or white feathers). These
traits are often controlled by a single Mendelian gene with a few
alternative alleles.
continuous trait
a trait (such as body mass) that can
exhibit a continuous range of values. These traits are often controlled
by several genes with small additive effects, and are often
phenotypically plastic, so that their expression also depends on
environmental conditions.
directional selection
Individuals with the largest value
of a trait have higher fitness than individuals with intermediate or
small trait values. Over time, the average trait value will increase,
but there will probably be no change in the trait variance. Depending on
the particular trait, directional selection could work on the smallest
value of a trait in just the same way. Directional selection may be
important in changing environments, such as increasing global
temperatures, because it will favor those individuals with extreme
traits (e.g. heat tolerance) that confer higher fitness under new
conditions.
stabilizing selection
Individuals with intermediate
values of the trait have higher fitness than individuals with extremely
large or extremely small trait values. Over time, the average trait
value does not change, but the variance in the trait will decrease
because the individuals with the most extreme trait values will have
lower fitness. Stabilizing selection is common in stable environments,
where a single optimal phenotype (such as body size or offspring
production) maximizes individual fitness.
disruptive selection
Individuals with extreme trait
values (large or small) have higher fitness than individuals with
intermediate trait values. Over time, the average trait value will not
change, but the variance should increase, possibly leading to a bimodal
distribution of traits. This kind of selection is uncommon, but can be
seen when individuals colonize new habitats where they are rapidly
exposed to extreme conditions. One example (which we will discuss) is
the evolution of plant tolerances to heavy metals in contaminated
soils.
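The three modes of selection can be illustrated with a small simulation. This is only a sketch under stated assumptions: the trait is drawn from a normal distribution, the three fitness functions (`w_dir`, `w_stab`, `w_dis`) are arbitrary illustrative choices, and the helper `AfterSelection` is a hypothetical name. It reports the fitness-weighted mean and variance of the trait among survivors.

```r
set.seed(102)
# Trait values before selection (illustrative mean and standard deviation)
z <- rnorm(10000, mean=50, sd=10)

# Fitness-weighted mean and variance of the trait after selection
AfterSelection <- function(z, w) {
  w <- w/sum(w)
  m <- sum(w*z)
  v <- sum(w*(z - m)^2)
  return(c(mean=m, variance=v))
}

w_dir <- exp(0.05*(z - 50))            # fitness rises with trait value
w_stab <- exp(-(z - 50)^2/(2*15^2))    # fitness peaks at intermediate values
w_dis <- 1 - 0.9*w_stab                # fitness is lowest at intermediate values

SelectionModes <- rbind(before=c(mean=mean(z), variance=var(z)),
                        directional=AfterSelection(z, w_dir),
                        stabilizing=AfterSelection(z, w_stab),
                        disruptive=AfterSelection(z, w_dis))
print(round(SelectionModes, 2))
```

With these fitness functions, directional selection shifts the mean with little change in variance, stabilizing selection shrinks the variance, and disruptive selection inflates it. Other fitness functions (for example, a hard cutoff) would change the details.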
We humans have been conducting selective breeding experiments, both consciously and unconsciously, on plants and animals for thousands of years, and these have yielded everything from fast-growing varieties of wheat to noble dog breeds like the chow-chow. Artificial selection experiments also gave Darwin the critical insight that environmental conditions in nature could impose a similar kind of selection, natural selection, on populations, causing them to evolve and eventually form new species.
The method is simple:
\[\bar{x} = \mbox{mean of trait in parental stock (before selection and before breeding)}\] \[\bar{y} = \mbox{mean of trait in parental stock (after selection and before breeding)}\] \[\bar{z} = \mbox{mean of trait in offspring of the selected parents (after selection and after breeding)}\]
\[S = \mbox{Selection Differential} = \bar{y} - \bar{x}\] \[R = \mbox{Response To Selection} = \bar{z} - \bar{x}\] \[\mbox{heritability} = h^2 = \frac{R}{S}\]
heritability
the proportion of total variation among
individuals in a continuous trait that can be attributed to genetic
differences among individuals. Note that \(0.0
\le h^2 \le 1.0\).
x = 10
y = 20
z = 11
S = selection differential = 20 - 10 = 10
R = response to selection = 11 - 10 = 1
h^2 = heritability = R/S = 1/10 = 0.10
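Rearranging the definition \(h^2 = R/S\) gives \(R = h^2 S\), so a known heritability can be used to predict the response to selection. The sketch below reuses the numbers from the worked example; the function name `PredictResponse` is a hypothetical helper and not part of the course script.

```r
# Predict the response to selection from heritability
# x = trait mean before selection, y = trait mean of the selected parents
# h2 = heritability (assumed known, e.g. from an earlier experiment)
PredictResponse <- function(x=10, y=20, h2=0.10) {
  S <- y - x        # selection differential
  R <- h2*S         # predicted response to selection
  z <- x + R        # predicted trait mean in the offspring
  return(c(S=S, R=R, z=z))
}

PredictResponse()   # S = 10, R = 1, z = 11
```

With a low heritability of 0.10, strong selection on the parents (S = 10) shifts the offspring mean by only 1 unit.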
Common garden experiments are a method for teasing apart genetic and environmental influences on a trait. Typically, these experiments begin with observations of average trait values seen in the field. In the following set of boxplots, we see measures of a trait value for 15 individuals sampled randomly from each of 3 populations (P1, P2, and P3).
There are statistically significant differences in the trait values (the phenotype) between these populations, and we want to understand if these are caused by differences in the genotypes in the 3 populations, or differences in the environment in the 3 locations.
The organisms are transplanted into a common garden (either in the lab or in the field) so they all experience the same environmental conditions. We further assume that the trait is not influenced by maternal effects (such as the condition of the mother during the development of the embryo) or early environment effects (such as nutrition and diet of the organism during early growth).
In this case, it looks like there is a strong genetic component to the trait, because the differences observed in the natural populations are also found when the organisms from each population are raised in the common garden experiment:
When we find this result, we say that the different populations represent different ecotypes.
ecotype
Genetically distinct geographic varieties from
different local populations.
Another scenario is that the trait distributions in the common garden
experiment are not statistically different from one another. In this
case, the trait values are approximately the same for individuals
sampled from different populations. This result suggests that the
phenotypic differences observed in nature reflect differences in the
environment that each population experiences. The populations do not
show evidence for genetic differentiation (at least with respect to this
trait), and thus they do not represent different ecotypes. Note that the
results of the common garden experiment may depend on the kind of
environment that is used in the common garden. We explore this issue
more carefully with a reciprocal transplant experiment.
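The two common garden outcomes can be mimicked with simulated data and a one-way analysis of variance. This is a sketch under stated assumptions: the trait means, the standard deviation of 10, and the helper name `GardenP` are illustrative, and the use of `aov` is one reasonable way to test for differences among populations, not necessarily the test used in class.

```r
set.seed(102)
# 15 individuals from each of 3 donor populations, all raised in one garden
Treatment <- factor(rep(c("P1","P2","P3"), each=15))

# Scenario 1: field differences persist in the common garden (ecotypes)
Genetic <- rnorm(45, mean=rep(c(40,20,70), each=15), sd=10)
# Scenario 2: field differences disappear in the common garden
Environmental <- rnorm(45, mean=50, sd=10)

# Return the P-value for differences among populations
GardenP <- function(y, group) {
  fit <- aov(y ~ group)
  p <- summary(fit)[[1]][["Pr(>F)"]][1]
  return(p)
}

GardenP(Genetic, Treatment)         # very small: populations still differ
GardenP(Environmental, Treatment)   # usually large: no evidence of ecotypes
```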
A reciprocal transplant experiment is like a common garden experiment, but it is done in the field, and there are two transplant sites, representing the locations where the two populations originated. Two additional “control” treatments are established by transplanting individuals back into the site from which they were collected. This treatment controls for any effects of handling or transport on the expression of the trait. Because there are two populations from which individuals are transplanted, and two sites into which they are transplanted, this is a “crossed” or “orthogonal” experimental design.
In the diagrams below, the table and figure illustrate the mean value of the trait measured in each of the 4 treatments, and the table of statistical tests gives results for additive effects of genotype, additive effects of environment, and the genotype by environment interaction term.
| | Donor Population (Cold) | Donor Population (Warm) |
---|---|---|
Recipient Site (Cold) | 10 | 10 |
Recipient Site (Warm) | 10 | 10 |
Statistical Test | P-value |
---|---|
Genotype Effect (Additive) | N.S. |
Environment Effect (Additive) | N.S. |
Genotype x Environment (Interaction) | N.S. |
| | Donor Population (Cold) | Donor Population (Warm) |
---|---|---|
Recipient Site (Cold) | 20 | 20 |
Recipient Site (Warm) | 10 | 10 |
Statistical Test | P-value |
---|---|
Genotype Effect (Additive) | N.S. |
Environment Effect (Additive) | P < 0.05 |
Genotype x Environment (Interaction) | N.S. |
| | Donor Population (Cold) | Donor Population (Warm) |
---|---|---|
Recipient Site (Cold) | 20 | 10 |
Recipient Site (Warm) | 20 | 10 |
Statistical Test | P-value |
---|---|
Genotype Effect (Additive) | P < 0.05 |
Environment Effect (Additive) | N.S. |
Genotype x Environment (Interaction) | N.S. |
| | Donor Population (Cold) | Donor Population (Warm) |
---|---|---|
Recipient Site (Cold) | 20 | 5 |
Recipient Site (Warm) | 25 | 10 |
Statistical Test | P-value |
---|---|
Genotype Effect (Additive) | P < 0.05 |
Environment Effect (Additive) | P < 0.05 |
Genotype x Environment (Interaction) | N.S. |
| | Donor Population (Cold) | Donor Population (Warm) |
---|---|---|
Recipient Site (Cold) | 20 | 50 |
Recipient Site (Warm) | 15 | 10 |
Statistical Test | P-value |
---|---|
Genotype Effect (Additive) | N.S. |
Environment Effect (Additive) | N.S. |
Genotype x Environment (Interaction) | P < 0.05 |
When the interaction term is significant, the lines connecting the treatments are no longer parallel. We can describe this interaction by saying that the difference in trait values between genotypes depends on the environment. Equivalently, we could say that the difference in trait values between environments depends on the genotype. Statistical interactions of this sort make it hard to predict the joint effect of two factors (in this case environment and genotype) from knowing only how each one operates in isolation. Strong interactions between prescription drugs are a common example of this problem.
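One way to see whether the lines are parallel is to compute an interaction contrast from the four treatment means: the donor difference at the cold site minus the donor difference at the warm site. This is a minimal sketch; the function name `InteractionContrast` is hypothetical, and the two matrices reuse the means from the last two tables above. Whether a nonzero contrast is statistically significant depends on the variation among replicates, which the means alone do not show.

```r
# m is a 2 x 2 matrix of treatment means
# rows = recipient site (Cold, Warm); columns = donor population (Cold, Warm)
InteractionContrast <- function(m) {
  diff_cold_site <- m[1,1] - m[1,2]   # donor difference at the cold site
  diff_warm_site <- m[2,1] - m[2,2]   # donor difference at the warm site
  return(diff_cold_site - diff_warm_site)
}

# Additive genotype and environment effects (parallel lines)
Additive <- matrix(c(20,5,25,10), nrow=2, byrow=TRUE)
# Genotype x environment interaction (non-parallel lines)
Crossed <- matrix(c(20,50,15,10), nrow=2, byrow=TRUE)

InteractionContrast(Additive)   # (20 - 5) - (25 - 10) = 0
InteractionContrast(Crossed)    # (20 - 50) - (15 - 10) = -35
```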
species (biological definition)
Groups of actually or
potentially interbreeding populations that are reproductively
isolated from other such groups.
species (taxonomic definition)
Populations that can be
reliably distinguished on the basis of one or more morphological/genetic
characters.
| | Morphologically Distinct | Not Morphologically Distinct |
---|---|---|
Reproductively Isolated | “good” species | cryptic species |
Not Reproductively Isolated | variable populations | similar populations |
If post-mating isolating mechanisms are present, then natural selection will favor the evolution of pre-mating mechanisms.
| | Allopatric | Peripheral Isolates | Sympatric |
---|---|---|---|
Vertebrates | 80% | 15% | 5% |
Plants | 20% | 20% | 60% |
dichotomy
tree is drawn with single branch forks
polytomy
multiple simultaneous branch points (with more
data, these are usually resolved into a series of dichotomous
branches)
sister species
two species sharing a most recent common
ancestor
sister taxa
any groups on a tree sharing a most recent
common ancestor
monophyletic group
an ancestor and all of its
descendants
polyphyletic group
an incorrect classification or grouping
that is not monophyletic
synapomorphy
shared derived characters uniting monophyletic
groups
homoplasy
independent or convergent evolution of the same
character state in unrelated lineages
sex
The recombination of alleles with those of another
individual via meiosis and fertilization. Reproduction can occur without
sex (e.g. plant grafts, and parthenogenesis). Sex can even occur without
reproduction (e.g. Paramecium, which undergo conjugation and
exchange genetic material without reproducing). Some organisms like
aphids alternate between sexual and asexual reproduction.
Males produce a tiny gamete (sperm), whereas females produce a large gamete (egg). All other differences between males and females are secondary sexual characteristics. Although both sexes make equal genetic contributions to offspring, the energetic investment of females per gamete may be 100 to 1000 fold greater than for males.
male strategy
gametes are inexpensive so making a
“mistake” and choosing a mate with low-fitness alleles is not costly.
Strategy should be to maximize the number of matings.
female strategy
gametes are expensive so making a
“mistake” and choosing a mate with low-fitness alleles is costly.
Strategy should be to maximize the quality of matings.
Bateman's Principle
Sexual selection should be strongest
on males, who are competing for females.
The two predictions of Bateman’s Principle are:
Both predictions have been repeatedly confirmed. In pipefish, the pattern is opposite that of prediction (2), but that is because male pipefish carry the eggs and provide parental care of the young.
Males compete among themselves for resources, for females, and for selection as mates by females.
Females may choose males based on a number of factors that will enhance their fitness.
altruism
individuals enhance the fitness of others at
the expense of their own fitness.
| | Actor Benefits | Actor is Harmed |
---|---|---|
Recipient Benefits | Cooperation | Altruism |
Recipient is Harmed | Selfishness | Spite |
Because natural selection will favor individuals that successfully pass on their own alleles, how can altruistic behavior ever evolve?
group selection
Groups of individuals that engage in
altruistic behavior will outperform groups that engage in selfish
behavior.
This argument is invoked when it is claimed that altruism is favored “for the good of the species”. But an altruistic group is always vulnerable to invasion by a cheating individual, whose own fitness will always be higher than the others in a group of altruists. For similar reasons, it is difficult for sellers to maintain an economic cartel because rivals can always undercut them by selling at cheaper prices.
The evolutionary biologist William Hamilton proposed kin selection as a general solution to how altruism might evolve. The key to kin selection is recognizing that individuals in a population share alleles not only with their offspring, but with other relatives.
kin selection
Selection favoring the spread of allele
copies in related individuals.
Altruistic behavior can spread through a population when
\[ Br - c > 0\]
B = benefit to the recipient
c = cost to the actor
r = degree of relatedness
r(degree of relatedness)
The probability that 2 alleles
in 2 different individuals are shared by descent.
Note that this definition of r is very similar to the definition of the inbreeding coefficient F, which is the probability that two alleles in a single individual are shared by descent.
To calculate r, draw the arrows of relationship connecting the actor and recipient and indicate the proportion of shared alleles for each step. Next multiply the proportions for each unique path and sum the paths together.
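The path-counting recipe and the kin selection inequality can be sketched as two small functions. The names `Relatedness` and `HamiltonsRule`, the argument name `cost` (used in place of c to avoid confusion with R's `c()` function), and the benefit and cost values are assumptions for illustration. The examples assume a diploid organism, where each parent-offspring step carries a proportion of 1/2.

```r
# Each path is a vector of the proportions of shared alleles at each step;
# multiply within a path, then sum across paths
Relatedness <- function(paths) {
  r <- sum(sapply(paths, prod))
  return(r)
}

# Full siblings: one path through the mother and one through the father
r_fullsib <- Relatedness(list(c(0.5, 0.5), c(0.5, 0.5)))   # 0.5
# Half siblings: a single path through the one shared parent
r_halfsib <- Relatedness(list(c(0.5, 0.5)))                # 0.25

# Altruism can spread when B*r - cost > 0
HamiltonsRule <- function(B, cost, r) {
  return(B*r - cost > 0)
}

HamiltonsRule(B=3, cost=1, r=r_fullsib)   # TRUE:  3*0.5  - 1 =  0.5
HamiltonsRule(B=3, cost=1, r=r_halfsib)   # FALSE: 3*0.25 - 1 = -0.25
```

For the same benefit and cost, helping a full sibling is favored but helping a half sibling is not.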
These African birds nest in colonies of 40-150 individuals and interact within smaller subgroups of 3-17 birds. The individuals are highly cooperative. They share in colony defense, food-gathering, and cooperative rearing of young. Some females will postpone reproduction to care for the young of others. However, a careful analysis of genetic paternity revealed that altruistic behaviors are preferentially directed towards relatives:
r | Expected Frequency of Helping | Observed Frequency of Helping |
---|---|---|
1/2 | 0.17 | 0.41 |
1/4 | 0.30 | 0.40 |
0 | 0.35 | 0.15 |
These observations confirm the basic prediction of kin selection, which is that altruism should be directed towards individuals who are related and share alleles.
Here is a collection of R functions that calculate each of the formulas we use in BCOR 102. You can paste them into R or use them in the accompanying script file BCOR_LECTURE_NOTES.R. These are simple calculators that you can use to practice numerical problems and make sure you are doing your calculations correctly.
AlleleFreq_2A
takes as inputs the number or
frequencies of 3 genotypes (AA AB BB) and returns the frequencies of the
two alleles (A B)
AlleleFreq_3A
takes as inputs the number or
frequencies of 6 genotypes (JJ JK JL KL KK LL) and returns the
frequencies of the three alleles (J K L)
HardyWeinberg_2A
takes as inputs the frequencies of
two alleles (A B) and returns the frequencies of the three genotypes at
equilibrium (AA AB BB).
HardyWeinberg_3A
takes as inputs the frequencies of
three alleles (J K L) and returns the frequencies of the six genotypes
at equilibrium (JJ JK JL KL KK LL).
# FUNCTION to calculate observed allele frequencies for a single gene with 2 alleles
# NJG
# 21 November 2015
AlleleFreq_2A <- function(x=c(AA=100, AB=50, BB=50)) {
# Pull out counts of individual genotypes
AA <- x[1]
AB <- x[2]
BB <- x[3]
# Create a vector and divide by the sum; works for frequencies or raw counts as input
Gen_Freq <- c(AA,AB,BB)/sum(AA, AB, BB)
# Print genotype frequencies
cat("Observed genotypic frequencies:", "\n","freq(AA) = ", Gen_Freq[1], "\n", "freq(AB) = ", Gen_Freq[2], "\n", "freq(BB) = ", Gen_Freq[3],"\n")
cat("\n")
# Create vector for allele frequencies
Allele_Freq <- vector("numeric",2)
# Use genotypes to calculate allele frequencies
Allele_Freq[1] <- Gen_Freq[1] + 0.5*Gen_Freq[2]
Allele_Freq[2] <- Gen_Freq[3] + 0.5*Gen_Freq[2]
# Print allelic frequencies
cat("Observed allelic frequencies:", "\n", "freq(A) = ", Allele_Freq[1], "\n", "freq(B) = ", Allele_Freq[2], "\n")
cat("\n")
# Return the output vector
return(Allele_Freq)
}
# FUNCTION to calculate observed allele frequencies for a single gene with 3 alleles
# NJG
# 21 November 2015
AlleleFreq_3A <- function(x=c(JJ=100, JK=50, JL=50, KL=50, KK=50, LL=50)) {
# Convert input vector to individual genotypes
JJ <- x[1]
JK <- x[2]
JL <- x[3]
KL <- x[4]
KK <- x[5]
LL <- x[6]
# Create a vector and divide by the sum; works for frequencies or raw counts as input
Gen_Freq <- c(JJ, JK, JL, KL, KK, LL)/sum(JJ, JK, JL, KL, KK, LL)
# Print genotype frequencies
cat("Observed genotypic frequencies:", "\n","freq(JJ) = ", Gen_Freq[1], "\n", "freq(JK) = ", Gen_Freq[2], "\n", "freq(JL) = ", Gen_Freq[3],"\n", "freq(KL) = ", Gen_Freq[4], "\n", "freq(KK) = ", Gen_Freq[5], "\n", "freq(LL) = ", Gen_Freq[6],"\n")
cat("\n")
# Create vector for allele frequencies
Allele_Freq <- vector("numeric",3)
# Use genotypes to calculate allele frequencies
Allele_Freq[1] <- Gen_Freq[1] + 0.5*Gen_Freq[2] + 0.5*Gen_Freq[3]
Allele_Freq[2] <- Gen_Freq[5] + 0.5*Gen_Freq[2] + 0.5*Gen_Freq[4]
Allele_Freq[3] <- Gen_Freq[6] + 0.5*Gen_Freq[3] + 0.5*Gen_Freq[4]
# Print allelic frequencies
cat("Observed allelic frequencies:", "\n", "freq(J) = ", Allele_Freq[1], "\n", "freq(K) = ", Allele_Freq[2], "\n", "freq(L) = ", Allele_Freq[3], "\n")
cat("\n")
# Return the output vector
return(Allele_Freq)
}
# FUNCTION to calculate Hardy-Weinberg genotypic frequency for a single gene with 2 alleles
# NJG
# 21 November 2015
HardyWeinberg_2A <- function(x=c(p=0.7, q=0.3)){
# Convert input vector to individual frequencies
p <- x[1]
q <- x[2]
# Create a vector for genotypic frequencies
Genotype_Freq <-vector("numeric",3)
# Use Hardy-Weinberg equation to calculate genotypic frequencies from allelic frequencies
Genotype_Freq[1] <- p^2
Genotype_Freq[2] <- 2*p*q
Genotype_Freq[3] <- q^2
# Print allelic frequencies
cat("Observed allelic frequencies:", "\n", "f(A) = ", p, "\n", "f(B) = ", q, "\n")
cat("\n")
# Print expected Hardy-Weinberg genotypic frequencies
cat("Expected Hardy-Weinberg genotypic frequencies:", "\n", "H-W f(AA) = ", Genotype_Freq[1], "\n", "H-W f(AB) = ", Genotype_Freq[2], "\n", "H-W f(BB) = ", Genotype_Freq[3], "\n")
cat("\n")
# Return the output vector
return(Genotype_Freq)
}
# FUNCTION to calculate Hardy-Weinberg genotypic frequency for a single gene with 3 alleles
# NJG
# 21 November 2015
HardyWeinberg_3A <- function(x=c(p=0.7, q=0.2, r=0.1)){
# Convert input vector into individual allelic frequencies
p <- x[1]
q <- x[2]
r <- x[3]
# Create a vector for genotypic frequencies
Genotype_Freq <-vector("numeric",6)
# Use Hardy-Weinberg equation to calculate genotypic frequencies from allelic frequencies
Genotype_Freq[1] <- p^2
Genotype_Freq[2] <- 2*p*q
Genotype_Freq[3] <- 2*p*r
Genotype_Freq[4] <- 2*q*r
Genotype_Freq[5] <- q^2
Genotype_Freq[6] <- r^2
# Print allelic frequencies
cat("Observed allelic frequencies:", "\n", "f(J) = ", p, "\n", "f(K) = ", q, "\n","f(L) = ", r, "\n")
cat("\n")
# Print expected Hardy-Weinberg genotypic frequencies
cat("Expected Hardy-Weinberg genotypic frequencies:", "\n", "H-W f(JJ) = ", Genotype_Freq[1], "\n", "H-W f(JK) = ", Genotype_Freq[2], "\n", "H-W f(JL) = ", Genotype_Freq[3], "\n", "H-W f(KL) = ", Genotype_Freq[4], "\n", "H-W f(KK) = ", Genotype_Freq[5], "\n", "H-W f(LL) = ", Genotype_Freq[6], "\n")
cat("\n")
# Return the output vector
return(Genotype_Freq)
}
For the two-allele example, we started with these data:
Genotype | AA | AB | BB | Sum |
---|---|---|---|---|
Number of individuals | 75 | 20 | 100 | 195 |
First, we use AlleleFreq_2A
to get the initial genotypic
and allelic frequencies:
## Observed genotypic frequencies:
## freq(AA) = 0.3846154
## freq(AB) = 0.1025641
## freq(BB) = 0.5128205
##
## Observed allelic frequencies:
## freq(A) = 0.4358974
## freq(B) = 0.5641026
## [1] 0.4358974 0.5641026
Next we use the calculated allelic frequencies to plug into
HardyWeinberg_2A
to get the expected genotypic frequencies
for Hardy-Weinberg equilibrium:
## Observed allelic frequencies:
## f(A) = 0.4358974
## f(B) = 0.5641026
##
## Expected Hardy-Weinberg genotypic frequencies:
## H-W f(AA) = 0.1900065
## H-W f(AB) = 0.4917817
## H-W f(BB) = 0.3182117
## [1] 0.1900065 0.4917817 0.3182117
For the three-allele example, we started with these data:
Genotype | JJ | JK | JL | KL | KK | LL | Sum |
---|---|---|---|---|---|---|---|
Number Of Individuals | 10 | 11 | 0 | 9 | 2 | 22 | 54 |
First, we use AlleleFreq_3A
to get the initial genotypic
and allelic frequencies:
## Observed genotypic frequencies:
## freq(JJ) = 0.1851852
## freq(JK) = 0.2037037
## freq(JL) = 0
## freq(KL) = 0.1666667
## freq(KK) = 0.03703704
## freq(LL) = 0.4074074
##
## Observed allelic frequencies:
## freq(J) = 0.287037
## freq(K) = 0.2222222
## freq(L) = 0.4907407
## [1] 0.2870370 0.2222222 0.4907407
Next we use the calculated allelic frequencies to plug into
HardyWeinberg_3A
to get the expected genotypic frequencies
for Hardy-Weinberg equilibrium:
## Observed allelic frequencies:
## f(J) = 0.287037
## f(K) = 0.2222222
## f(L) = 0.4907407
##
## Expected Hardy-Weinberg genotypic frequencies:
## H-W f(JJ) = 0.08239024
## H-W f(JK) = 0.127572
## H-W f(JL) = 0.2817215
## H-W f(KL) = 0.218107
## H-W f(KK) = 0.04938271
## H-W f(LL) = 0.2408264
## [1] 0.08239024 0.12757199 0.28172148 0.21810696 0.04938271 0.24082643
To do this more elegantly and take advantage of the full power of R, we can chain these two functions together, so that the output from the allele frequency calculation forms the input for the Hardy-Weinberg calculation:
# Chaining functions together to get allelic frequencies and Hardy-Weinberg expected genotypic frequencies:
HardyWeinberg_2A(AlleleFreq_2A(x=c(AA=75, AB=20, BB=100)))
## Observed genotypic frequencies:
## freq(AA) = 0.3846154
## freq(AB) = 0.1025641
## freq(BB) = 0.5128205
##
## Observed allelic frequencies:
## freq(A) = 0.4358974
## freq(B) = 0.5641026
##
## Observed allelic frequencies:
## f(A) = 0.4358974
## f(B) = 0.5641026
##
## Expected Hardy-Weinberg genotypic frequencies:
## H-W f(AA) = 0.1900066
## H-W f(AB) = 0.4917817
## H-W f(BB) = 0.3182117
## [1] 0.1900066 0.4917817 0.3182117
HardyWeinberg_3A(AlleleFreq_3A(x=c(JJ=10, JK=11, JL=0, KL=9, KK=2, LL=22)))
## Observed genotypic frequencies:
## freq(JJ) = 0.1851852
## freq(JK) = 0.2037037
## freq(JL) = 0
## freq(KL) = 0.1666667
## freq(KK) = 0.03703704
## freq(LL) = 0.4074074
##
## Observed allelic frequencies:
## freq(J) = 0.287037
## freq(K) = 0.2222222
## freq(L) = 0.4907407
##
## Observed allelic frequencies:
## f(J) = 0.287037
## f(K) = 0.2222222
## f(L) = 0.4907407
##
## Expected Hardy-Weinberg genotypic frequencies:
## H-W f(JJ) = 0.08239026
## H-W f(JK) = 0.127572
## H-W f(JL) = 0.2817215
## H-W f(KL) = 0.218107
## H-W f(KK) = 0.04938272
## H-W f(LL) = 0.2408265
## [1] 0.08239026 0.12757202 0.28172154 0.21810700 0.04938272 0.24082647
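In the two-allele example, the observed frequency of heterozygotes (about 0.10) is far below the Hardy-Weinberg expectation (about 0.49). One way to summarize such a deficit is to solve the relationship f(AB) = 2pq(1 - F), the same one used in the `Inbreeding` function in these notes, for F. The sketch below does this; the function name `EstimateF` is hypothetical, and reading the result as an inbreeding coefficient assumes that inbreeding is the only cause of the deficit.

```r
# Estimate F from observed genotype counts (or frequencies) at one locus
# with two alleles, using f(AB) = 2pq(1 - F)
EstimateF <- function(x=c(AA=75, AB=20, BB=100)) {
  gen <- x/sum(x)               # observed genotype frequencies
  p <- gen[1] + 0.5*gen[2]      # frequency of allele A
  q <- 1 - p                    # frequency of allele B
  H_obs <- gen[2]               # observed heterozygote frequency
  H_exp <- 2*p*q                # Hardy-Weinberg expected heterozygote frequency
  F_hat <- 1 - H_obs/H_exp
  return(unname(F_hat))
}

EstimateF()   # about 0.79
```

A value near 0 would indicate heterozygotes at about the Hardy-Weinberg frequency. A value near 1 indicates that almost all heterozygotes are missing.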
Here is a short function that takes as input the initial frequency of the mutant allele, the mutation rate, and a vector of times. It returns a vector of the frequency of the mutant allele at each time point, which can then be used in a simple plot.
# FUNCTION to calculate the increase in the frequency of a mutant allele through time
# NJG
# 21 November 2015
Mutation <- function(qo=0.5,u=0.000001,t=1:10) {
qt = 1 - (1 - qo)*exp(-u*t)
return(qt)
}
## [1] 0.5000005 0.5000010 0.5000015 0.5000020 0.5000025 0.5000030 0.5000035
## [8] 0.5000040 0.5000045 0.5000050
Here is a function that takes as input the initial frequency of the allele in the resident population, the frequency of the allele in the migrant population, the fraction of the population each generation that consists of migrants, and a vector of time steps (in generations). The output is the frequency of the allele in the resident population at each time step.
# FUNCTION to calculate the change in allele frequency from migration
# NJG
# 21 November 2015
Migration <- function(p0=0.5, pm=0.9, m=0.1, t=1:10){
pt <- (1 - m)^t * (p0 - pm) + pm
return(pt)
}
Migration(p0=0.1)
## [1] 0.1800000 0.2520000 0.3168000 0.3751200 0.4276080 0.4748472 0.5173625
## [8] 0.5556262 0.5900636 0.6210572
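The closed-form expression in `Migration` can be checked against a generation-by-generation calculation. This sketch assumes the usual one-generation update, in which a fraction (1 - m) of the population is resident and a fraction m is migrant, so that the new frequency is (1 - m) times the old frequency plus m times pm. The function name `MigrationStep` is hypothetical.

```r
# Iterate the migration model one generation at a time
MigrationStep <- function(p0=0.5, pm=0.9, m=0.1, t=10) {
  p <- numeric(t)
  p_prev <- p0
  for (i in 1:t) {
    # residents contribute (1 - m), migrants contribute m
    p[i] <- (1 - m)*p_prev + m*pm
    p_prev <- p[i]
  }
  return(p)
}

# Closed-form solution for the same parameter values
ClosedForm <- (1 - 0.1)^(1:10)*(0.5 - 0.9) + 0.9

all.equal(MigrationStep(), ClosedForm)   # TRUE
```

Both approaches give the same trajectory, with the resident frequency approaching the migrant frequency pm over time.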
Here is a function that takes as input the initial frequency of one of the two alleles and the inbreeding coefficient F. The output is the expected frequency of the three genotypes with inbreeding.
# FUNCTION to calculate expected genotype frequencies with inbreeding
# NJG
# 21 November 2015
Inbreeding <- function(p=0.3, F = 0.5){
genotypes <- vector("numeric",3)
q <- 1 - p
genotypes[1] <- p^2*(1 - F) + p*F
genotypes[2] <- 2*p*q*(1- F)
genotypes[3] <- q^2*(1 - F) + q*F
return(genotypes)
}
## [1] 0.195 0.210 0.595
Here is a function that takes as input a series of sequential population sizes. The output is the effective population size, which in this case is the harmonic mean of the population sizes.
# FUNCTION to calculate effective population size with a bottleneck
# NJG
# 21 November 2015
Bottleneck <- function(N=1:5){
Ne <- 1/((1/length(N))*(sum(1/N)))
return(Ne)
}
## [1] 2.189781
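Because it is a harmonic mean, the effective population size is pulled strongly toward the smallest population sizes in the series. The sketch below compares it with the ordinary arithmetic mean for a population that crashes for a single generation; the helper name `HarmonicNe` and the population sizes are assumptions for illustration.

```r
# Harmonic mean of a series of population sizes
HarmonicNe <- function(N) {
  Ne <- length(N)/sum(1/N)
  return(Ne)
}

# Four generations at N = 1000 and a one-generation bottleneck at N = 10
N <- c(1000, 1000, 10, 1000, 1000)

mean(N)         # arithmetic mean = 802
HarmonicNe(N)   # effective population size, about 48.08
```

A single generation at small size dominates the result, which is why even a brief bottleneck can have a lasting effect on genetic drift.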
Here is a function that takes as input the number of males (m) and females (f) in the population and returns the effective population size.
# FUNCTION to calculate effective population size with a skewed sex ratio
# NJG
# 21 November 2015
SexRatio <- function(m=10, f=12){
Ne <- (4*m*f)/(m + f)
return(Ne)
}
## [1] 21.81818
Here is a function that takes as input the population density (d) and the dispersal distance (x) and returns the effective population size.
# FUNCTION to calculate effective population size with limited dispersal
# NJG
# 21 November 2015
NatalDispersal <- function(d=10, x=1){
Ne <- 4*pi*d*x
return(Ne)
}
## [1] 125.6637
Here is a function that takes as input the probability p of a single event and the number of independent trials n. It returns the probability of at least one event occurring among the set of n trials.
# FUNCTION to calculate probability of at least one occurrence with individual probability p and number of trials n
# NJG
# 21 November 2015
CompoundProb <- function(p=0.01, n=52){
Prob <- 1 - (1 - p)^n
return(Prob)
}
## [1] 0.4070336
# FUNCTION to Calculate and Print 7 steps of Natural Selection Equations
# NJG
# 21 November 2015
SevenSteps <- function(gen=c(50,50,100), w=c(0.4, 0.35, 0.2)){
# Step 1: Given Initial Genotype Counts AND Relative Fitness
w <- w/max(w)
cat("Step 1: Given Initial Genotype Counts AND Relative Fitness","\n")
cat(" #(AA) = ",gen[1], " #(AB) = ",gen[2]," #(BB) = ",gen[3],"\n")
cat(" w1 = ",w[1], " w2 = ",w[2]," w3 = ",w[3],"\n")
cat("\n")
# Step 2: Calculate Initial Genotype And Allelic Frequencies (p0, q0)
gen <- gen/sum(gen)
cat("Step 2: Calculate Initial Genotype And Allelic Frequencies (p0, q0)", "\n")
cat(" f(AA) = ",gen[1], " f(AB) = ",gen[2]," f(BB) = ",gen[3],"\n")
p0 = gen[1] + 0.5*gen[2]
q0 = gen[3] + 0.5*gen[2]
cat(" f(A) = ",p0, " f(B) = ",q0,"\n")
cat("\n")
# Step 3: Calculate Genotype Frequencies AFTER Random Mating
gen[1] <- p0^2
gen[2] <- 2*p0*q0
gen[3] <- q0^2
cat("Step 3: Calculate Genotype Frequencies AFTER Random Mating", "\n")
cat(" f(AA) = ",gen[1], " f(AB) = ",gen[2]," f(BB) = ",gen[3],"\n")
cat("\n")
# Step 4: Calculate Genotype Frequencies AFTER Selection
gen <- gen*w
wbar <- sum(gen)
cat("Step 4: Calculate Genotype Frequencies AFTER Selection", "\n")
cat(" f(AA) = ",gen[1], " f(AB) = ",gen[2]," f(BB) = ",gen[3],"\n")
cat("Mean fitness = ", wbar, "\n")
cat("\n")
# Step 5: Normalize Genotype Frequencies
gen <- gen/wbar
cat("Step 5: Normalize Genotype Frequencies", "\n")
cat(" f(AA) = ",gen[1], " f(AB) = ",gen[2]," f(BB) = ",gen[3],"\n")
cat("\n")
# Step 6: Calculate New Allelic Frequencies
p1 = gen[1] + 0.5*gen[2]
q1 = gen[3] + 0.5*gen[2]
cat("Step 6: Calculate New Allelic Frequencies", "\n")
cat(" f(A) = ",p1, " f(B) = ",q1,"\n")
cat("\n")
# Step 7: Calculate New Genotype Frequencies AFTER Random Mating
gen[1] <- p1^2
gen[2] <- 2*p1*q1
gen[3] <- q1^2
cat("Step 7: Calculate New Genotype Frequencies AFTER Random Mating", "\n")
cat(" f(AA) = ",gen[1], " f(AB) = ",gen[2]," f(BB) = ",gen[3],"\n")
cat("\n")
}
## Step 1: Given Initial Genotype Counts AND Relative Fitness
## #(AA) = 50 #(AB) = 50 #(BB) = 100
## w1 = 1 w2 = 0.875 w3 = 0.5
##
## Step 2: Calculate Initial Genotype And Allelic Frequencies (p0, q0)
## f(AA) = 0.25 f(AB) = 0.25 f(BB) = 0.5
## f(A) = 0.375 f(B) = 0.625
##
## Step 3: Calculate Genotype Frequencies AFTER Random Mating
## f(AA) = 0.140625 f(AB) = 0.46875 f(BB) = 0.390625
##
## Step 4: Calculate Genotype Frequencies AFTER Selection
## f(AA) = 0.140625 f(AB) = 0.4101562 f(BB) = 0.1953125
## Mean fitness = 0.7460938
##
## Step 5: Normalize Genotype Frequencies
## f(AA) = 0.1884817 f(AB) = 0.5497382 f(BB) = 0.2617801
##
## Step 6: Calculate New Allelic Frequencies
## f(A) = 0.4633508 f(B) = 0.5366492
##
## Step 7: Calculate New Genotype Frequencies AFTER Random Mating
## f(AA) = 0.214694 f(AB) = 0.4973137 f(BB) = 0.2879924
# FUNCTION Fisher engine to calculate changes in allelic frequency with selection in each generation
# NJG
# 21 November 2015
FisherEngine <- function(t=20,p0=0.1,w=c(1,1,0.5)){
# Create vectors for storing p, wbar, and the 3 genotypes
pvec <- vector(mode="numeric", length=(t + 1))
wbar <- vector(mode="numeric", length=(t))
gen <- vector(mode="numeric", length=3)
# Loop through the random mating and selection calculations
pvec[1] <- p0
for (i in 2:(t + 1)){
gen[1] <- pvec[i-1]^2*w[1]
gen[2] <- 2*(1 - pvec[i - 1])*pvec[i - 1]*w[2]
gen[3] <- (1 - pvec[i - 1])^2*w[3]
wbar[i-1] <- sum(gen)
gen <- gen/wbar[i-1]
pvec[i] <- gen[1] + 0.5*gen[2]
}
# Graph p and wbar as a function of time
par(mfrow=c(1,2))
plot(x=1:(t+1),y=pvec,xlab="Generation",ylab="p",type="l", ylim=c(0,1),las=1)
grid()
plot(x=1:(t),y=wbar,xlab="Generation",ylab="Mean Fitness",type="l", ylim=c(0,1), las=1)
grid()
return(list(pvec,wbar))
}
## [[1]]
## [1] 0.1000000 0.1680672 0.2570056 0.3549900 0.4482305 0.5287138 0.5947657
## [8] 0.6479688 0.6907710 0.7254560 0.7538671 0.7774156 0.7971628 0.8139061
## [15] 0.8282476 0.8406466 0.8514574 0.8609558 0.8693596 0.8768421 0.8835428
##
## [[2]]
## [1] 0.5950000 0.6539439 0.7239796 0.7919811 0.8477752 0.8889447 0.9178926
## [8] 0.9380370 0.9521887 0.9623128 0.9697093 0.9752281 0.9794285 0.9826845
## [15] 0.9852506 0.9873033 0.9889675 0.9903334 0.9914665 0.9924161
# FUNCTION For Calculating Heritability From A Selective Breeding Experiment
# 22 November
# NJG
Heritability <- function(x=10,y=20,z=11){
SelectionDifferential <- y - x
cat("Selection Differential = ",SelectionDifferential,"\n")
cat("\n")
ResponseToSelection <- z - x
cat("Response To Selection = ",ResponseToSelection, "\n")
cat("\n")
h2 <- ResponseToSelection/SelectionDifferential
cat("Heritability = ",h2, "\n")
cat("\n")
}
## Selection Differential = 10
##
## Response To Selection = 1
##
## Heritability = 0.1
# FUNCTION for plotting results of common garden experiments
# 21 November 2015
# NJG
CommonGarden <- function(TraitMeans=c(40,20,70)){
Pop1 <- rnorm(15,TraitMeans[1],10)
Pop2 <- rnorm(15,TraitMeans[2],10)
Pop3 <- rnorm(15,TraitMeans[3],10)
PopData <- c(Pop1,Pop2,Pop3)
Treatment <- rep(c("P1","P2","P3"),each=15)
par(mar=c(6,6,4,2))
BoxPlotDataFrame <- data.frame(x=Treatment,y=PopData)
boxplot(y~x,data=BoxPlotDataFrame,ylab="Trait Value",sub="Donor Source",cex.sub=2,cex.axis=1.5, cex.lab=2,col="bisque")
}
# FUNCTION RecipPlot generates a plot of reciprocal transplant data
# NJG
# 21 November 2015
# The user specifies the mean trait values for each of the 4 treatments
RecipPlot <- function(ydata=c(10,20,30,5)){
# set margins and set up axis locations
par(mar=c(9,4,4,2))
xdata <- c(1,1,2,2)
# set up an empty plot
plot(x=xdata,y=ydata,xlim=c(0.5,2.5),ylim=c(min(ydata)-5,max(ydata)+5),ann=F,axes=F,type="n")
grid(nx=0,ny=10)
# add x axis and labels
axis(side=1,labels=c("Cold","Warm"),at=c(1,2),tick=T, cex.axis=1.5)
mtext("Donor Population",side=1,cex=2,line=4)
mtext("Trait Value",side=2,cex=2, line=1)
box()
# add lines and points
lines(c(1,2),ydata[c(1,2)])
lines(c(1,2),ydata[c(3,4)])
points(c(1,2),ydata[c(1,2)],cex=4,pch=21,bg="skyblue")
points(c(1,2),ydata[c(3,4)],cex=3,pch=22,bg="salmon")
# add legend
legend("topright",legend=c("Cold Recipient Site","Warm Recipient Site"),pch=c(21,22),pt.cex=2,pt.bg=c("skyblue","salmon"))
}
RecipPlot()