content practice a dna and genetics

Learning Objectives

After successfully completing this section, the student will be able to:

  • Explain how DNA encodes genetic information and the role of messenger RNA and transfer RNA.
  • Explain how DNA directs protein synthesis and the roles of DNA and proteins in regulating cell function.
  • Define transcription and translation.
  • Define and explain the significance of “crossing over” and “random assortment” during meiosis.
  • Explain how the number of chromosomes changes during male and female gametogenesis and fertilization.
  • Explain the biological basis of Down’s Syndrome.
  • Define the following terms:
  1. Homozygous
  2. Heterozygous
  3. Dominant
  4. Recessive
  5. Co-dominance
  6. Sex-linked inheritance
  • Demonstrate how to predict the possible genotypes that could occur in an offspring, provided one knows the genotype of the two parents.
  • Explain what a mutation is and give examples of how it might occur.
  • Explain the contribution of mutations to evolution.
  • Explain and give examples of natural selection.

Chromosomes Contain Our Genetic Code

All living organisms have one or more chromosomes that contain the code directing the synthesis of the proteins essential for their structure and function. In bacteria, for example, some proteins are structural, while others are enzymes that carry out metabolic functions, such as breaking down nutrients to provide energy and the structural building blocks for growth and replication.

Each chromosome is, in fact, an enormous DNA molecule. Molecules are generally so small that they can’t be seen even with a microscope, but chromosomes can be seen with a microscope under certain circumstances, particularly when a cell is about to divide. The illustration below shows the 46 chromosomes that contain the human genome.

There are 22 homologous pairs and two sex chromosomes (the X and Y chromosomes). One chromosome in each pair is inherited from one’s mother and one from one’s father. Each chromosome is a single molecule of DNA. The illustration below depicts this by imagining that we have grabbed one end of a chromosome and pulled it out to reveal that it is an extremely long polymer consisting of a double helix. In fact, if we were to take a single human chromosome and stretch it out, it would be about 5 centimeters long (about 2 inches), and all 46 chromosomes would be about 2 meters long if they were stretched out and laid end to end. Our cells have all 46 chromosomes, but the DNA is wound around proteins and tightly coiled into the compact chromosomes seen in the illustration. The chromosomes of eukaryotes are contained within the membrane-bound nucleus.

DNA provides the essential genetic code for all living organisms, including bacteria. The bacterium E. coli has a single circular chromosome (DNA molecule), which is also coiled, supercoiled, and packaged with proteins, but in prokaryotes the chromosome is located in the cytoplasm instead of being contained in a membrane-bound nucleus.

Structure of DNA

DNA is an abbreviation for deoxyribonucleic acid, which is an extremely long polymer made from units called nucleotides. The illustration below shows the structure of both DNA and RNA (ribonucleic acid).

The backbone of each molecule is composed of alternating sugars (the pentagon with the “S”) and phosphate groups (shown with “P”), and each sugar is also covalently bonded to one of the following nucleotide bases:

  • adenine (A)
  • thymine (T)
  • cytosine (C)
  • guanine (G)
  • uracil (U)

A nucleotide “unit” (outlined by the red box in the illustration) consists of a sugar molecule, a phosphate, and one of the five bases. Consequently, one can think of DNA as an extremely long double-stranded polymer of nucleotides. Note also that the two strands of DNA are held together by hydrogen bonds between complementary bases on the two strands. The figure below demonstrates this complementarity. In DNA the base thymine always bonds to adenine, while cytosine always bonds to guanine because of their complementary chemical structure and “fit”. As a result of this complementary structure, if the base sequence of one strand is known, then the base sequence of the other strand can be deduced.
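Because each base has exactly one partner, deducing the second strand amounts to a simple lookup. The following is a minimal Python sketch of that idea; the function name `complementary_strand` and the example sequence are assumptions made for illustration, and strand direction (5′ to 3′) is ignored.

```python
# Base-pairing rules in double-stranded DNA: A pairs with T, C pairs with G.
DNA_PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complementary_strand(strand: str) -> str:
    """Return the partner base for every base in a DNA strand, position by position."""
    return "".join(DNA_PAIR[base] for base in strand.upper())

# Hypothetical sequence used only for illustration.
print(complementary_strand("ATGCCG"))  # TACGGC
```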

DNA and RNA differ in several ways:

  • DNA is double stranded, while RNA is single stranded (although RNA forms loops by hydrogen-bonding to itself).
  • DNA contains the sugar deoxyribose, while RNA has the sugar ribose.
  • RNA contains the base uracil in place of thymine.


Each of our cells has a complete set of our 46 chromosomes, i.e., our entire genome. Altogether our 46 chromosomes contain about 6 billion nucleotides, i.e., 3 billion base pairs. Each chromosome contains thousands of “genes.” The segments of DNA that contain genes (referred to as “coding areas”) take up only 3-5% of our DNA; the rest of the DNA consists of “non-coding areas.” Altogether our 23 pairs of chromosomes with their 3 billion base pairs carry the code for 20,000-25,000 genes. Most of the genes are transcribed into “messenger RNAs” (mRNA) that provide a template that is used to translate the code into specific proteins. However, about 100 genes are transcribed into “ribosomal RNAs” and “transfer RNAs” that also play a vital role in the synthesis of proteins, which will be described shortly.

Transcription and Translation of a Gene

The sequence of bases in DNA can be thought of as the “letters” that provide the basis for the genetic code for all of the proteins synthesized by our bodies, and these, in turn, provide the basis for the structure of all of our cells, all of our enzymes, and all of our inherited traits and characteristics. As noted above, the genetic code is contained in chromosomes which are gigantic molecules of DNA complexed with proteins and wound into a compact structure. Humans have 23 pairs of chromosomes, which carry our entire genome. In eukaryotes chromosomes are located in the cell nucleus, but prokaryotes (bacteria) have a more primitive cellular structure, and they do not have a true nucleus. Instead, the single bacterial chromosome is in the cytoplasm in an area sometimes referred to as the “nucleoid.” The production of cellular proteins requires two major processes.


First, cellular signals reaching the nucleus cause the TATA-binding protein to bind to the starting point of a particular gene. Additional transcription factors then bind, and an enzyme called RNA polymerase II then binds to the complex. The polymerase causes the strands of DNA to separate temporarily, and the enzyme uses the sequence of bases on one strand of DNA (the template strand) to synthesize a complementary strand of messenger RNA (mRNA).

By complementary we mean that the base sequence on mRNA has bases that are the complementary pairs of those on DNA. Guanine (G) dictates the insertion of its complement, cytosine (C), and cytosine dictates the insertion of guanine (G); thymine (T) dictates insertion of adenine (A) on mRNA, and adenine dictates the insertion of uracil (U). [Note that RNA uses uracil in place of thymine.] Once the strand of mRNA has been created, it leaves the nucleus through pores in the nuclear membrane. The video below gives a fairly detailed picture of the process of transcription.
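The pairing rules for transcription can be written as a small lookup table. This is a minimal sketch rather than a model of RNA polymerase: the function name `transcribe` and the second example sequence are assumptions chosen for the demonstration, and strand direction is ignored.

```python
# Each DNA base on the strand being read dictates one RNA base;
# adenine dictates uracil because RNA uses uracil in place of thymine.
DNA_TO_MRNA = {"G": "C", "C": "G", "T": "A", "A": "U"}

def transcribe(dna: str) -> str:
    """Return the mRNA bases dictated by a DNA strand, base by base."""
    return "".join(DNA_TO_MRNA[base] for base in dna.upper())

print(transcribe("CAT"))     # GUA
print(transcribe("TACCAT"))  # AUGGUA (hypothetical sequence for illustration)
```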


Once the mRNA is in the cytoplasm, it binds to a ribosome, which is composed of protein and a different type of RNA called ribosomal RNA (rRNA). One can think of the ribosome as the work bench where protein is synthesized by covalently bonding amino acids in the sequence specified by the code on the mRNA.

How is the code translated?

One can think of the sequence of bases on mRNA as a series of code letters that are read as a series of three letter “words.”

For example, if mRNA had a sequence of bases such as

This sequence would, in effect, be read as a series of three-letter words referred to as “codons”, each of which specified the insertion of a specific amino acid. In the example just above, the codons or “words” would be:

Each of these three letter words specifies the insertion of one of the 20 amino acids that make up human proteins. The amino acids are shuttled to the ribosome by a family of transfer RNAs (tRNA), and there are specific tRNAs for each amino acid. The tRNAs consist of a single strand of RNA, but the strand tends to fold back on itself and create loops that are held in place by hydrogen bonds between segments of the tRNA as shown in the illustration below.

In the illustration above the base sequence CAT on DNA was transcribed to become the codon GUA on messenger RNA. The mRNA left the nucleus and attached to a ribosome where protein synthesis (translation) was initiated. Each codon on mRNA specified a particular amino acid to be added to the growing protein chain. In this example, the first four amino acids are designated as “AA1-AA2-AA3-AA4”. The next codon on mRNA was “GUA.” The complement to GUA is “CAU” which is the anticodon on a transfer RNA that carries the amino acid valine. The anticodon CAU on the tRNA for valine bonded to the GUA codon on mRNA. This positioned valine as the next amino acid in sequence, and with the addition of cellular energy (ATP), valine became covalently bonded to AA4 in the amino acid chain.

In the section above on transcription, we focused on creating the mRNA for a specific gene; those events took place in the cell nucleus. The figure below illustrates the subsequent events that take place after mRNA leaves the nucleus and attaches to a ribosome and initiates translation.

Sixty-one codons specify an amino acid, and the remaining three act as stop signals for protein synthesis. For example, in the figure below the codon UGA signals an end to synthesis of the protein. The code for all possible three-letter codons on mRNA is shown in the blue table below. Note that there is some redundancy in the code. For example, there are four separate codons for the amino acid proline. Nevertheless, the code is unambiguous, because no triplet codes for more than one amino acid. In addition, with only a few minor exceptions, the same code is universally found in viruses, bacteria, protists, plants, fungi, and animals.
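The counts quoted above can be checked by building the codon table in code. This sketch assumes the conventional layout of the standard genetic code (bases ordered U, C, A, G, with one-letter amino acid symbols and "*" for stop); the variable names are illustrative only.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, ordered by first, second, then third base
# (each running U, C, A, G). "*" marks a stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
sense_codons = [c for c, aa in CODON_TABLE.items() if aa != "*"]
proline_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "P")

print(len(CODON_TABLE))    # 64
print(len(sense_codons))   # 61
print(stop_codons)         # ['UAA', 'UAG', 'UGA']
print(proline_codons)      # ['CCA', 'CCC', 'CCG', 'CCU']
```

Because the table is a dictionary keyed by codon, each triplet maps to exactly one entry, which mirrors the point that the code is redundant but unambiguous.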

Note that in the example below, UGA is a signal to STOP, meaning that the amino acid chain is complete and no more amino acids are to be added. Bear in mind that these illustrations include just short sequences of codons, and an actual protein would generally have a much longer sequence. Nevertheless, these examples illustrate how the code is transcribed from DNA to mRNA and how the mRNA is then translated in order to specify the sequence of amino acids in a particular protein, which is the product of that particular gene on a chromosome.

The figure above makes it clear that the order of the codons within a gene (a segment of DNA encoding for a specific protein) specifies the amino acid sequence in the protein. The start signal for protein synthesis is the codon AUG, which specifies incorporation of the amino acid methionine. When the mRNA attaches to a ribosome, enzymes look for the AUG codon, not only as a start signal, but also as a means of knowing exactly which is the first letter of each of the three-letter codons. For example, a messenger RNA might have the sequence of codons shown in the illustration above, i.e.,

However, if the signal to start was shifted by one nucleotide (e.g., starting at the first “U” instead of the “A”), the codons would be read as:

and this would result in synthesis of a very different sequence of amino acids. Errors in the sequence of amino acids can, in fact, result from mutations, as described below.
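The effect of the starting position on how codons are read can be seen by splitting the same string of bases from two different start points. The mRNA sequence below is hypothetical (it is not the sequence from the original illustration), and the helper name `codons` is an assumption.

```python
def codons(mrna: str, start: int = 0) -> list[str]:
    """Split an mRNA string into complete three-base codons, reading from `start`."""
    return [mrna[i:i + 3] for i in range(start, len(mrna) - 2, 3)]

# Hypothetical message that begins with the start codon AUG.
mrna = "AUGGUACCUUGA"

print(codons(mrna, start=0))  # ['AUG', 'GUA', 'CCU', 'UGA']
print(codons(mrna, start=1))  # ['UGG', 'UAC', 'CUU']
```

Shifting the start by a single base changes every three-letter word, even though the underlying bases are identical.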

The video below gives a detailed summary of the events that occur during translation of the mRNA template to a protein.

This next video is an excellent illustration of transcription and translation, but it illustrates these in a way that provides a real-time approximation.

An Interesting Variation by HIV

The human immunodeficiency virus (HIV) is known as a retrovirus. It consists of a single strand (molecule) of RNA inside a protein coat. When HIV binds to a T lymphocyte, it enters the lymphocyte and sheds its protein coat. A viral enzyme called reverse transcriptase then uses the strand of viral RNA as a template to create a molecule of DNA, which can become incorporated into the DNA of the infected host cell. In this case, RNA is being used to create a molecule of DNA, and the process has been dubbed “reverse transcription.”

Mitochondrial DNA

Most of the DNA in eukaryotic cells is contained in the chromosomes within the membrane-bound nucleus, but the mitochondria also have small amounts of DNA (mitochondrial DNA or mtDNA). As you will recall, mitochondria are membranous subcellular organelles within which there are chains of enzymes that generate cellular energy in the form of ATP (adenosine triphosphate) through a process called oxidative phosphorylation. In addition to their role in production of ATP, mitochondria also regulate apoptosis (programmed cell death) [for more on apoptosis see the section on Apoptosis in the module on the Biology of Cancer]. Mitochondria also play a role in the synthesis of cholesterol and heme (a component of hemoglobin, the oxygen-carrying molecule in red blood cells). Mitochondrial DNA consists of 37 genes. Thirteen of these provide the genetic code for synthesizing the enzymes involved in oxidative phosphorylation, and the rest encode the transfer RNAs (tRNA) and ribosomal RNA (rRNA) required for synthesis of the enzymes for oxidative phosphorylation.

Inherited mutations in mitochondrial DNA can also cause a variety of problems with growth, development, and function throughout the body as a result of impaired ability to generate ATP. These conditions can produce muscle weakness and wasting, diabetes, kidney failure, heart disease, dementia, hearing loss, and visual problems. In addition, mitochondria can also undergo somatic (non-inherited) mutations, which may contribute to aging and age-related diseases.

Proteins Perform Many Functions

The preceding pages describe how the genetic code in DNA specifies the assembly of proteins, which are polypeptides, meaning that their primary structure consists of a long linear assembly of amino acids. As the amino acids are linked together, the growing polypeptide begins to fold as a result of the interactions among the amino acids. Different segments of a protein can assume different shapes: random coils, helices, and zig-zagging segments that form sheet-like structures. These secondary structures are then folded in ways that establish the tertiary structure, and in proteins made of more than one polypeptide chain, the folded chains assemble to create the final quaternary structure, as shown in the graphic below.

Each protein has a specific shape that determines its function. The illustration below summarizes just some of the many functions carried out by proteins.

  • Many thousands of enzymes participate in synthesis and metabolism.
  • Antibodies are proteins that are specialized to bind to specific foreign proteins, such as those found on viruses and bacteria.
  • Proteins can act as receptor molecules to which signal molecules bind to trigger intracellular events.
  • Proteins act as signal molecules (e.g., insulin, which promotes glucose transport, or gastrin, which stimulates acid secretion in the stomach).
  • There are structural proteins (e.g., collagen and elastin).
  • Proteins provide the internal structure of all of our cells.
  • Contractile proteins produce movement in skeletal muscles and in the smooth muscle cells of our gastrointestinal tract, respiratory tract, urogenital tract, and blood vessels.
  • Some protein subunits self-associate into more elaborate structures that form conduits across cell membranes for transport of molecules in and out of a cell.
  • Proteins serve as shuttles to transport molecules, including oxygen.
  • Proteins are involved in pigmentation (e.g., the enzymes that synthesize melanin in our skin).

Genetic Variation: Mutations and SNPs

All humans have the same set of genes, and the sequence of our base pairs is remarkably similar. However, this doesn’t mean that we all have exactly the same nucleotide sequence in our genome. If this were the case, then all humans would be clones having exactly the same genetic code. The passage of the genetic code from generation to generation via the sperm and ova of our ancestors requires replication of DNA, and while replication is remarkably precise, errors occasionally occur and produce changes in the base sequence. It is these changes that produce the variation that makes each of us genetically unique (except for identical twins), and they also drive the evolution of species.


Mutations are random changes in the sequence of base pairs in DNA, and mutagens are factors that cause mutations, e.g., chemicals or radiation (UV light, x-rays, gamma radiation). Mutations fall into four patterns of alteration in the base sequence:

  1. Replacement (substitution) of a single base pair
  2. Addition of one or more base pairs
  3. Deletion of one or more base pairs
  4. Relocation of a segment of base pairs

The effects of these alterations depend on several factors.

Substitution of One Base Pair

Substitution of one base pair for another produces the smallest change in the code, but its effects can range from none to massive depending on the details.

The example below shows an initial sequence of base pairs on the left, the mRNA transcript, and the sequence of amino acids that would result. The right portion shows that substitution of thymine (T) for adenine (A) changes the third base in the mRNA transcript, but the amino acid sequence is unchanged, because GUU and GUA are both codons for the amino acid valine. In fact, there are four codons that specify insertion of the amino acid valine: GUU, GUC, GUA, and GUG. If a normal gene had a GUU codon (specifying insertion of valine) at a particular place, changing the third base from uracil (U) to adenine (A) would still produce a protein that had the amino acid valine in the correct position. Consequently, the protein would be unchanged, and the mutation would be “silent.”

In other cases, however, substitution of a single base pair leads to a change in one of the amino acids in the protein product as shown below.

Once again, an adenine (A) in DNA has been replaced by a thymine (T), but in this case the codon in the mRNA changes from AGU, which is a codon for the amino acid serine, to AGA, which codes for the amino acid arginine. Changing one amino acid in a protein may have no discernible effect on the protein’s function, but at times, changing even a single amino acid can have a massive effect. An example of this is seen with sickle cell anemia.
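The difference between the two substitutions described above comes down to whether the old and new codons specify the same amino acid. The sketch below uses only the codons named in the text (a full table has 64 entries); the function name is an assumption, and "missense" is the conventional label for a substitution that changes an amino acid, a term the text itself does not use.

```python
# Partial codon table: only the codons mentioned in the text.
CODONS = {
    "GUU": "Val", "GUC": "Val", "GUA": "Val", "GUG": "Val",
    "AGU": "Ser", "AGA": "Arg",
}

def classify_substitution(before: str, after: str) -> str:
    """Label a single-codon change as 'silent' or 'missense'."""
    if CODONS[before] == CODONS[after]:
        return "silent"
    return f"missense ({CODONS[before]} -> {CODONS[after]})"

print(classify_substitution("GUU", "GUA"))  # silent
print(classify_substitution("AGU", "AGA"))  # missense (Ser -> Arg)
```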

Sickle cell anemia is due to an inherited mutation in the gene that encodes the β-chain of hemoglobin. Hemoglobin is a protein found in red blood cells which plays a key role in transporting oxygen from the lungs. It is composed of four protein subunits: two α- and two β-chains. In sickle cell anemia the key defect is caused by a mutation that replaces the hydrophilic amino acid glutamic acid (glutamate) with the hydrophobic amino acid valine at the sixth position of the β-chains. This causes the hemoglobin molecules to stick together when oxygen levels are low. As a result, the hemoglobin molecules form long fiber-like chains that distort the shape of the red blood cells. Red blood cells are normally biconcave disks that flow easily through arteries and capillaries, but sickling causes red blood cells to stick to one another and occlude small arteries, causing ischemia in various tissues and organs.

Addition or Deletion of One Base Pair

In contrast to substitution of a single base, addition or deletion of a single base causes a substantial disruption of the sequence of amino acids in the protein product. This is because the mRNA transcript is read as three-letter codons, and insertion or deletion of a single base causes a frame shift in the sequence that throws off all of the downstream codons. Consider the following hypothetical sequence of DNA, the mRNA transcript it would produce, and the final amino acid sequence. For illustration, I begin with triplets with the same base, e.g., AAA, TTT, etc.



Amino Acids: Phe-Gly-Lys-Pro-Pro-Lys-Gly-Lys-Phe-Phe-etc.

Insertion of a single base might result in the following:



Amino Acids: Phe-Trp-Glu-Thr-Pro-Gln-Arg-Glu-Ile-Phe-etc.

The insertion of a base advances all of the letters by one position, but the codons are still read as triplets, so the code is thrown off from the point of the insertion, and most of the amino acids are changed, although occasionally the original amino acid is retained by chance such as the Pro (proline) and the terminal Phe (phenylalanine) in the example above.
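The two amino acid sequences above can be reproduced in code. The mRNA sequence used here is an inference, not the original figure: it is the sequence implied by the listed amino acids together with the statement that each starting triplet repeats a single base, and the choice to insert a U immediately after the first codon is likewise an assumption that happens to yield the second sequence.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in one-letter symbols ("*" = stop), bases ordered U, C, A, G.
ONE_LETTER = (
    "FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG"
)
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Stop",
}
CODON_TABLE = {
    "".join(triplet): THREE_LETTER[aa]
    for triplet, aa in zip(product(BASES, repeat=3), ONE_LETTER)
}

def translate(mrna: str) -> str:
    """Read complete codons from the first base and join the amino acid names."""
    return "-".join(CODON_TABLE[mrna[i:i + 3]] for i in range(0, len(mrna) - 2, 3))

# Inferred mRNA: UUU GGG AAA CCC CCC AAA GGG AAA UUU UUU
original = "UUUGGGAAACCCCCCAAAGGGAAAUUUUUU"
# Assumed mutation: one extra U inserted after the first codon.
inserted = original[:3] + "U" + original[3:]

print(translate(original))  # Phe-Gly-Lys-Pro-Pro-Lys-Gly-Lys-Phe-Phe
print(translate(inserted))  # Phe-Trp-Glu-Thr-Pro-Gln-Arg-Glu-Ile-Phe
```

The fifth codon (CCC) and the last codon (UUU) survive the shift only because the neighboring bases happen to be identical, which is why Pro and the terminal Phe are retained.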

Deletion of a single base has the same effect, i.e., it causes a frame shift that changes all of the codons downstream from the point of the deletion.

The short video below illustrates the effects of substitution, insertion, and deletion of single base pairs.

Relocation of a Segment of Nucleotides

The other mechanism that has been described for mutation is the relocation of an entire segment of nucleotides. This is illustrated below starting with the same hypothetical sequence of bases in DNA.



Amino Acids: Phe-Gly-Lys-Pro-Pro-Lys-Gly-Lys-Phe-Phe-etc.

Relocation of a segment of bases might result in:



Amino Acids: Phe-Gly-Lys-Pro-Leu-Phe-Pro-Pro-Asn-Phe-etc.

This kind of relocation can have variable effects depending on the size of the relocated segment and whether or not it also causes frame shift errors.

Single Nucleotide Polymorphisms (SNPs)

We noted earlier that an error in replication of DNA can result in substitution of one base for another. For example, the human genome might have the following segment of base pairs in double-stranded DNA:

However, a substitution of a single base pair (changing the bold-faced TA pair to GC) might result in a sequence that read as follows:

We saw earlier that if a substitution like this occurred within a gene, i.e., within a coding area, it could cause a disease like sickle cell anemia, but it is also possible that it might have no discernible effect. Remember, however, that coding areas (genes) occupy only 3-5% of our DNA, and substitutions like this occur more commonly in the much more extensive non-coding areas between the genes. The accumulation of these random substitutions over time has resulted in many small differences in the sequence of base pairs from person to person. These small differences are found only about once every 300 base pairs on average. However, since we have 3 billion base pairs, this means that there are about 10 million of these small differences in our genome. These small variations have been dubbed “single nucleotide polymorphisms” or SNPs (pronounced “snips”). Each SNP is a difference in a single base in DNA.
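The estimate of 10 million SNPs follows directly from the two figures quoted above, as this short calculation shows (the variable names are illustrative):

```python
base_pairs = 3_000_000_000   # size of the genome in base pairs, as stated above
spacing = 300                # roughly one SNP per 300 base pairs

snp_count = base_pairs // spacing
print(f"{snp_count:,}")      # 10,000,000
```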

In the discussion of mutations we saw that substitution of a single base (a SNP) within a coding area may or may not have an effect on phenotype. In contrast, SNPs occurring in non-coding areas of the genome usually have no effect on phenotype (characteristics). Nevertheless, each person has a unique SNP pattern, and this is potentially useful in a number of ways.

  • In order to look for genes that influence the likelihood of a disease, one can take blood samples from a group of individuals with the disease and look for similarities in their SNP patterns. If diseased individuals are found to have a particular SNP, one can then compare SNP patterns in diseased and non-diseased individuals to see if there is an association. Such a study is really just a particular type of case-control study. As a result, it is sometimes possible to establish certain SNP profiles that are associated with a greater likelihood of certain diseases.
  • SNPs may help predict an individual’s response to certain drugs and their susceptibility to environmental factors such as toxins.
  • SNPs can be used for forensic analysis to match DNA samples.
  • SNPs can also determine how closely two individuals are related.
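Since the first use above is described as a type of case-control study, one common way to summarize such a comparison is an odds ratio. The sketch below is an illustration only: the text does not specify how the association is measured, and the counts are invented for the example.

```python
def odds_ratio(cases_with: int, cases_without: int,
               controls_with: int, controls_without: int) -> float:
    """Odds of carrying the SNP among cases divided by the odds among controls."""
    return (cases_with * controls_without) / (cases_without * controls_with)

# Hypothetical counts: 100 diseased and 100 non-diseased individuals.
print(odds_ratio(cases_with=60, cases_without=40,
                 controls_with=30, controls_without=70))  # 3.5
```

An odds ratio above 1 would suggest that the SNP is more common among those with the disease; it does not by itself show that the SNP causes the disease.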

The sequence of bases in the human genome is remarkably similar from person to person, but over hundreds of thousands of years of evolution SNPs and other mutations have been introduced into the human gene pool. Some of these mutations produce alterations in gene products that are fatal, and these mutations are extinguished. However, other mutations in germ cells (sperm and eggs) can be passed along from generation to generation, and they provide the basis for the many variations in phenotype that make each of us unique. Over time, mutations have created variants of genes that are responsible for differences in the color of our hair, our eyes, and our skin. Mutations influence our intelligence, our height, our weight, our personalities, our blood pressure, our cholesterol levels, and how fast we can run. Mutations have introduced gene variants that encode slightly different proteins, which in turn influence all aspects of our phenotype. It is important to emphasize that an individual’s phenotype is not solely the result of their genome; instead, phenotype is the result of the interaction between an individual’s genome and their environment from the time of conception until death.

When SNPs and other mutations create variants or alternate types of a particular gene, the alternative gene forms are referred to as alleles. In other words, a given gene can have multiple alleles (i.e., alternate forms). Some genes have just a few alleles, but others have many.

Autosomes and Sex Chromosomes

Recall also that chromosomes come in pairs. Humans have 22 pairs of autosomal chromosomes (with the same genes in both members of a given pair) and one pair of sex chromosomes, which are designated XX in females and XY in males. The X and Y chromosomes are physically different from one another in that the Y chromosome is much shorter, and the Y chromosome only has about nine gene loci that match those on the X chromosome. This means that, except for the genes on an XY pair of chromosomes, we have two copies of each gene, one from each of our parents. The alleles that we receive from each parent might be the same (homozygous) or they might differ (heterozygous). The figure below schematically depicts a pair of chromosomes and shows three hypothetical genes: hair color, body height, and multiple lipoma formation.

Since there are two copies of each gene, there are two alleles, which may be the same or different. The figure below shows a hypothetical example in which there is an allele for red hair on one chromosome and an allele for brown hair on the other.

Note that there may be many alleles for some genes, but normally we each have two alleles for each gene on our autosomes. Note also that in the hypothetical illustration the alleles for the multiple lipoma trait are also different.

The obvious question that arises is, what happens when the two alleles that are present differ? What will the phenotype be? The answer depends on whether one allele is dominant over the other.

Dominant and Recessive Alleles

A dominant allele is one that is expressed to a greater degree than the other allele that is present. For example, one possible scenario for the differing lipoma alleles is shown below.

Mom is homozygous for the multiple lipoma trait (designated as “LL”), while Dad is homozygous for the absence of lipomas (designated “ll”). Mom can only contribute an “L” allele to her offspring, and Dad can only contribute the “l” allele, so all of their children will be heterozygous (“Ll”). In this particular case, heterozygous “Ll” individuals will all have multiple lipomas, because the multiple lipoma allele is dominant, while the alternate “l” allele is recessive.

What about another scenario in which the mom is heterozygous and the dad is homozygous recessive? In that case Mom can contribute either an “L” or an “l” allele, while Dad can still only contribute “l,” so on average half of their children will be heterozygous (“Ll”) and have multiple lipomas, and half will be homozygous recessive (“ll”) and have none.
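Predicting offspring genotypes amounts to pairing each allele from one parent with each allele from the other. The following is a minimal sketch of that enumeration; the function name `cross` is an assumption, and it treats each pairing as equally likely, as in simple Mendelian inheritance.

```python
from collections import Counter
from itertools import product

def cross(mother: str, father: str) -> Counter:
    """Count offspring genotypes from every pairing of one allele per parent."""
    offspring = Counter()
    for m, f in product(mother, father):
        # Write the uppercase (dominant) allele first, e.g. "Ll" rather than "lL".
        offspring["".join(sorted(m + f))] += 1
    return offspring

print(dict(cross("LL", "ll")))  # {'Ll': 4}
print(dict(cross("Ll", "ll")))  # {'Ll': 2, 'll': 2}
```

The first cross gives only heterozygous children; the second gives heterozygous and homozygous recessive children in equal proportions.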


For some alleles there is no dominance, and phenotype results from both alleles being expressed or from a blending of phenotype. The expression is an “average” or combination of the two traits.

Example: Major blood type in humans.

In humans, for example, there is a specific gene that codes for the protein that determines an individual’s major blood type, which can be A, B, AB, or O. This is determined by a single gene that has three alleles that can code for:

  • the A antigen on red blood cells
  • the B antigen on red blood cells
  • no major blood antigen on red blood cells

While there are three alleles, each of us has just two of them, so the possible combinations and the resulting blood types are those shown in the table below.

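Genotype (Alleles)      Phenotype (Blood Type)
AA or AO                A
BB or BO                B
AB                      AB
OO                      O

(Here “O” denotes the allele that codes for no major blood antigen.)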
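The relationship between the two inherited alleles and the resulting blood type can be expressed as a short function. This is a sketch under one labeling assumption: the allele that codes for no antigen is called "O" here, following common usage; the function name is illustrative.

```python
def blood_type(allele_1: str, allele_2: str) -> str:
    """Map two ABO alleles ('A', 'B', or 'O') to the major blood type."""
    antigens = {allele_1, allele_2} - {"O"}  # the O allele produces no antigen
    if not antigens:
        return "O"
    return "".join(sorted(antigens))         # 'A', 'B', or 'AB'

for pair in ["AA", "AO", "BB", "BO", "AB", "OO"]:
    print(pair, "->", blood_type(*pair))
```

The A and B alleles are both expressed when they occur together (type AB), while the "O" allele is masked by either of them.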

Mutations Drive Evolution

The 3 billion base pairs in the human genome are remarkably similar from one person to another, but over tens of thousands of years random mutations in genes have introduced many variants. We saw previously that mutations can result in:

  • inconsequential changes which do not alter the protein product
  • small changes that alter the protein product to some degree, e.g., an enzyme that is somewhat more efficient in catalyzing a biochemical reaction
  • small changes that alter phenotype markedly, e.g., the single base substitution that results in sickle cell disease
  • very large changes in the base sequence that arise from insertion or deletion of a base pair or relocation of a segment of nucleotides

Depending on the function of the gene and the magnitude of change, a mutation may or may not be compatible with life. A fetus with a critical mutation may be unable to survive, leading to a miscarriage (spontaneous abortion). Non-fatal mutations can result in protein alterations that alter characteristics like hair or eye color, or they can produce proteins that function better or worse than the usual protein. All of these accumulated differences in genes are what distinguish one person from another and make each of us unique. (Identical twins are born with exactly the same genome, but a host of environmental and epigenetic factors produce differences even in identical twins.)

These occasional random mutations are responsible for driving the evolution of species by sometimes conferring a survival advantage. Consider two brief examples:

  1. When penicillin was first introduced, Staphylococcus infections were quickly defeated by small doses, but at some point in time the bacterial DNA underwent a mutation that made a particular Staphylococcus resistant to penicillin. Treatment with penicillin killed all of the other Staphylococci, which were sensitive, but the resistant bacterium survived and continued to divide, giving rise to a growing colony of resistant bacteria. These mutant bacteria had a survival advantage, not only because they were resistant to penicillin, but also because they didn’t have to compete with the sensitive bacteria for space and nutrients.
  2. The light-colored peppered moth was the predominant form in England prior to the Industrial Revolution, but a dark-colored peppered moth appeared during the Industrial Revolution, and by 1895 98% of the moths in the industrialized areas of England were dark. Coal was the primary energy source during the Industrial Revolution, and the enormous increase in its use resulted in huge dark, sooty clouds in and around London and Manchester. These clouds were extensive enough to block sunlight, and they also deposited black coal dust on the bark of trees. This apparently put light-colored peppered moths at a disadvantage, because the black bark made them more visible to the birds that fed on the moths. It is believed that at some point a mutation occurred that resulted in dark-colored peppered moths, which blended in with the dark, soot-stained bark and were more difficult for birds to see. The dark moths were less likely to be eaten, and therefore they had more opportunity to reproduce than the light moths. Since then, reduced use of coal has slowly resulted in lighter bark, and the light-colored moths are now found in greater numbers.

Natural Selection

Charles Darwin postulated that if a mutation confers some advantage (e.g., resistance to penicillin, or better camouflage, or faster muscles, etc.) then those organisms will be better able to compete in an environment with limited resources or other environmental pressures (such as penicillin or predatory birds). These “fitter” organisms will therefore have more opportunity to thrive and reproduce, and their numbers will increase. As a result, the frequency of the mutant gene will increase among the organisms in the particular environment that gives those with the mutation an advantage.
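
The rise in frequency of an advantageous mutation can be illustrated with a minimal numerical sketch in Python. It assumes a very simple model of asexually dividing organisms (such as the bacteria in the penicillin example) in which each type leaves descendants in proportion to a "relative fitness". The function names, the fitness values, and the starting frequency are illustrative assumptions, not measurements.

```python
def next_frequency(p, w_mutant, w_wildtype):
    """Frequency of the mutant type after one generation of selection.

    p          -- current frequency of the mutant type (0 to 1)
    w_mutant   -- relative fitness of the mutant (assumed value)
    w_wildtype -- relative fitness of the wild type (assumed value)
    """
    mean_fitness = p * w_mutant + (1 - p) * w_wildtype
    return p * w_mutant / mean_fitness


def simulate(p0, w_mutant, w_wildtype, generations):
    """Return the list of mutant frequencies, starting with p0."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_frequency(freqs[-1], w_mutant, w_wildtype))
    return freqs


if __name__ == "__main__":
    # Hypothetical numbers: one resistant cell per million, and penicillin
    # lets sensitive cells leave only half as many descendants.
    history = simulate(p0=1e-6, w_mutant=1.0, w_wildtype=0.5, generations=30)
    for gen in (0, 10, 20, 30):
        print(f"generation {gen:2d}: resistant fraction = {history[gen]:.6f}")
```

Under these assumed values the resistant type starts out extremely rare but makes up nearly the whole population within a few dozen generations, which is the pattern the paragraph above describes.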

Darwin’s Postulates

  1. With limited resources, reproduction generates more individuals than are able to survive and reproduce.
  2. The disparity between resources and numbers generated by reproduction creates a competition for survival.
  3. Variants with advantageous features are more likely to survive and reproduce.
  4. Variants with survival advantages pass their traits to their offspring.

The Prevalence of Sickle Cell Trait

Having an allele for the sickle cell form of hemoglobin would seem to be a bad thing, but that depends on both the genotype and the environment. The gene for sickle cell disease follows a Mendelian pattern of inheritance (described briefly on page 11). The allele for sickle cell hemoglobin is designated HbS, and there are several genotypes that are possible:

  • HbA-HbA: normal
  • HbA-HbS: heterozygous; have sickle cell “trait”
  • HbS-HbS: homozygous HbS; have sickle cell disease
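
Because the gene follows a Mendelian pattern, the possible genotypes of a child can be predicted from the genotypes of the two parents. The short Python sketch below enumerates a Punnett square; the function name `cross` and the tuple representation of a genotype are choices made for this illustration only.

```python
from collections import Counter
from itertools import product


def cross(parent1, parent2):
    """Return the probability of each offspring genotype.

    Each parent is a tuple of two alleles, e.g. ("HbA", "HbS").
    Every allele of one parent is paired with every allele of the
    other, exactly as in a Punnett square.
    """
    counts = Counter(
        tuple(sorted(pair)) for pair in product(parent1, parent2)
    )
    total = sum(counts.values())
    return {"-".join(genotype): n / total for genotype, n in counts.items()}


if __name__ == "__main__":
    carrier = ("HbA", "HbS")  # a parent with sickle cell trait
    for genotype, prob in sorted(cross(carrier, carrier).items()):
        print(f"{genotype}: {prob:.2f}")
```

For two parents who both have sickle cell trait, the sketch gives a 25% chance of HbA-HbA, a 50% chance of HbA-HbS, and a 25% chance of HbS-HbS for each child.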

Those who are homozygous for HbS have significant health problems and poor outcomes, but those who are heterozygous only have clinical manifestations under certain circumstances when the oxygen concentration in blood dips, e.g., at high altitudes or with heavy physical activity. Without these stresses, heterozygous persons function normally. In fact, heterozygous HbA-HbS individuals have a survival advantage in locations where malaria is endemic, because their red blood cells are fragile and tend to lyse (fall apart) when infected with malaria. Given this survival advantage, the sickle cell allele tends to persist in environments where malaria is endemic, as demonstrated by the graphic below, which shows the prevalence of the sickle cell allele in the top panel and the prevalence of malaria below.

Source: Piel et al.: Nature Communications 1:104, 2010

Control of Gene Expression

By gene expression we mean the transcription of a gene into mRNA and its subsequent translation into protein. Gene expression is primarily controlled at the level of transcription, largely as a result of binding of proteins to specific sites on DNA. In 1965 Francois Jacob, Jacques Monod, and Andre Lwoff shared the Nobel Prize in Physiology or Medicine for their work supporting the idea that enzyme levels in cells are controlled through regulation of transcription, which can be either induced or repressed. These researchers proposed that production of an enzyme is controlled by an “operon,” which consists of a series of related elements on the chromosome: an operator, a promoter, a regulator gene, and structural genes.

  • The structural genes contain the code for the protein products that are to be produced. Regulation of protein production is largely achieved by modulating access of RNA polymerase to the structural gene being transcribed.
  • The promoter gene doesn’t encode anything; it is simply a DNA sequence that is the initial binding site for RNA polymerase.
  • The operator gene is also non-coding; it is just a DNA sequence that is the binding site for the repressor.
  • The regulator gene codes for synthesis of a repressor molecule that binds to the operator and blocks RNA polymerase from transcribing the structural genes.

  • Example of Inducible Transcription: The bacterium E. coli has three genes that encode enzymes that enable it to split and metabolize lactose (a sugar in milk). The promoter is the site on DNA where RNA polymerase binds in order to initiate transcription. However, the enzymes are usually present in very low concentrations, because their transcription is inhibited by a repressor protein produced by a regulator gene (see the top portion of the figure below). The repressor protein binds to the operator site and inhibits transcription. However, if lactose is present in the environment, it can bind to the repressor protein and inactivate it, effectively removing the blockade and enabling transcription of the messenger RNA needed for synthesis of these enzymes (lower portion of the figure below).
  • Example of Repressible Transcription: E. coli need the amino acid tryptophan, and the DNA in E. coli also has genes for synthesizing it. These genes are generally transcribed continuously, since the bacterium needs tryptophan. However, if tryptophan concentrations are high, tryptophan binds to a repressor protein and activates it, and the activated repressor turns transcription off, as illustrated below.
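
The difference between the two examples comes down to whether the small molecule switches the repressor off (inducible) or on (repressible). The Python sketch below reduces each operon to that on/off logic. It is a deliberate simplification: it ignores the low background level of transcription and any other regulatory influences, and the function names are invented for this illustration.

```python
def lac_operon_transcribed(lactose_present):
    """Inducible: the repressor is active unless lactose inactivates it."""
    repressor_active = not lactose_present
    return not repressor_active  # an active repressor blocks RNA polymerase


def trp_operon_transcribed(tryptophan_high):
    """Repressible: the repressor is inactive until tryptophan activates it."""
    repressor_active = tryptophan_high
    return not repressor_active  # an active repressor blocks RNA polymerase


if __name__ == "__main__":
    for present in (False, True):
        print(f"lactose present={present}: "
              f"lactose genes transcribed={lac_operon_transcribed(present)}")
    for high in (False, True):
        print(f"tryptophan high={high}: "
              f"tryptophan genes transcribed={trp_operon_transcribed(high)}")
```

In both cases the repressor does the same job once it is active; only the effect of the small molecule on the repressor differs.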

Control of Gene Expression in Eukaryotes

Eukaryotic cells have similar mechanisms for control of gene expression, but they are more complex. Consider, for example, that prokaryotic cells of a given species are all the same, but most eukaryotes are multicellular organisms with many cell types, so control of gene expression is much more complicated. Not surprisingly, gene expression in eukaryotic cells is controlled by a number of complex processes which are summarized by the following list.

  • After fertilization, the cells in the developing embryo become increasingly specialized, largely by turning on some genes and turning off many others. Some cells in the pancreas, for example, are specialized to synthesize and secrete digestive enzymes, while other pancreatic cells (β-cells in the islets of Langerhans) are specialized to synthesize and secrete insulin. Each type of cell has a particular pattern of expressed genes. This differentiation into specialized cells occurs largely as a result of turning off the expression of most genes in the cell; mature cells may only use 3-5% of the genes present in the cell’s nucleus.
  • Gene expression in eukaryotes may also be regulated through alterations in the packing of DNA, which modulates the access of the cell’s transcription enzymes (e.g., RNA polymerase) to DNA. The illustration below shows that chromosomes have a complex structure. The DNA helix is wrapped around special proteins called histones, and these are wrapped into tight helical fibers. These fibers are then looped and folded into increasingly compact structures, which, when fully coiled and condensed, give the chromosomes their characteristic appearance in metaphase.
  • Similar to the operons described above for prokaryotes, eukaryotes also use regulatory proteins to control transcription, but each eukaryotic gene has its own set of controls. In addition, there are many more regulatory proteins in eukaryotes and the interactions are much more complex.
  • In eukaryotes transcription takes place within the membrane-bound nucleus, and the initial transcript is modified before it is transported from the nucleus to the cytoplasm for translation at the ribosomes. The initial transcript in eukaryotes has coding segments (exons) alternating with non-coding segments (introns). Before the mRNA leaves the nucleus, the introns are removed from the transcript by a process called RNA splicing (see graphic & video below), and extra nucleotides are added to the ends of the transcript; these non-coding “caps” and “tails” protect the mRNA from attack by cellular enzymes and aid in recognition by the ribosomes.
  • Variation in the longevity of mRNA provides yet another opportunity for control of gene expression. Prokaryotic mRNA is very short-lived, but eukaryotic transcripts can last hours, or sometimes even weeks (e.g., mRNA for hemoglobin in the red blood cells of birds).
  • The process of translation offers additional opportunities for regulation by many proteins. For example, the translation of hemoglobin mRNA is inhibited unless iron-containing heme is present in the cell.
  • There are also opportunities for “post-translational” controls of gene expression in eukaryotes. Some translated polypeptides (proteins) are cut by enzymes into smaller, active final products, as illustrated in the figure below, which depicts post-translational processing of the hormone insulin. Insulin is initially translated as a large, inactive precursor; a signal sequence is removed from the head of the precursor, and a large central portion (the C-chain) is cut away, leaving two smaller peptide chains which are then linked to each other by disulfide bridges. The smaller final form is the active form of insulin.
  • Gene expression can also be modified by the breakdown of the proteins that are produced. For example, some of the enzymes involved in cell metabolism are broken down shortly after they are produced; this provides a mechanism for rapidly responding to changing metabolic demands.
  • Gene expression can also be influenced by signals from other cells. There are many examples in which a signal molecule (e.g., a hormone) from one cell binds to a receptor protein on a target cell and initiates a sequence of biochemical changes (a signal transduction pathway) that result in changes within the target cell. These changes can include increased or decreased transcription as illustrated in the figure below.
  • The RNA Interference system (RNAi) is yet another mechanism by which cells control gene expression by shutting off translation of mRNA. RNAi can also be used to shut down translation of viral proteins when a cell is infected by a virus. The RNAi system also has the potential to be exploited therapeutically.
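
The RNA splicing step in the list above (removing introns and adding a cap and tail) can be sketched in a few lines of Python. This is only an illustration: in a real cell the splicing machinery recognizes the intron boundaries itself, whereas here the exon positions are simply supplied, and the sequence, the `[cap]` placeholder, and the tail length are all made up for the example.

```python
def splice(pre_mrna, exons):
    """Join the exon segments of a transcript, discarding the introns.

    pre_mrna -- the initial transcript as a string of A, C, G, U
    exons    -- list of (start, end) index pairs, end exclusive
    """
    return "".join(pre_mrna[start:end] for start, end in exons)


def add_cap_and_tail(mrna, tail_length=10):
    """Add a placeholder 5' cap and a poly-A tail (length is illustrative)."""
    return "[cap]" + mrna + "A" * tail_length


if __name__ == "__main__":
    # Made-up transcript: exon, intron, exon, intron, exon
    pre_mrna = "AUGGCC" + "GUAAGUUUUCAG" + "UUUGGA" + "GUACGUCCCUAG" + "UAA"
    exons = [(0, 6), (18, 24), (36, 39)]  # assumed to be known in advance
    mature = splice(pre_mrna, exons)
    print(mature)
    print(add_cap_and_tail(mature))
```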

Some RNA viruses invade cells and introduce double-stranded RNA, which uses the cell’s machinery to make new copies of viral RNA and viral proteins. The cell’s RNA interference system (RNAi) can prevent the viral RNA from replicating. First, an enzyme nicknamed “Dicer” chops any double-stranded RNA it finds into pieces that are about 22 nucleotides long. Next, protein complexes called RISC (RNA-induced Silencing Complex) bind to the fragments of double-stranded RNA, unwind them, and then release one of the strands while retaining the other. The RISC-RNA complex will then bind to any other viral RNA with a nucleotide sequence complementary to that of the RNA attached to the complex. This binding blocks translation of viral proteins at least partially, if not completely. The RNAi system could potentially be used to develop treatments for defective genes that cause disease. The treatment would involve making a double-stranded RNA from the diseased gene and introducing it into cells to silence the expression of that gene. For an illustrated explanation of RNAi, see the short, interactive Flash module at

The RNA interference system is also explained more completely in the video below from Nature Video.
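
The Dicer and RISC steps described above can also be followed in a small Python sketch. It treats RNA as plain text and makes several simplifying assumptions: fragments are exactly 22 nucleotides long, RISC always keeps the strand complementary to the target, and only a perfect match counts as binding. The sequences and function names are invented for the illustration.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the strand that would base-pair with the given RNA strand."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def dice(sense_strand, fragment_length=22):
    """Cut one strand of a double-stranded RNA into consecutive pieces."""
    return [
        sense_strand[i:i + fragment_length]
        for i in range(0, len(sense_strand), fragment_length)
    ]


def guide_strands(sense_strand, fragment_length=22):
    """Strands retained by RISC (assumed here to be the complementary ones)."""
    return [
        reverse_complement(piece)
        for piece in dice(sense_strand, fragment_length)
        if len(piece) == fragment_length
    ]


def is_silenced(mrna, guides):
    """True if any guide can base-pair with a stretch of the mRNA."""
    return any(reverse_complement(guide) in mrna for guide in guides)


if __name__ == "__main__":
    viral_rna = "AUGGCUAGCUAGGAUCCGAUAC" + "GUUAGCCGAUAGCUAGGCUAAUCG"  # made up
    host_mrna = "AUGCCCGGGAAAUUUCCCGGGAAAUUUCCC"                        # made up
    guides = guide_strands(viral_rna)
    print("viral RNA silenced:", is_silenced(viral_rna, guides))
    print("host mRNA silenced:", is_silenced(host_mrna, guides))
```

Because the guides are derived from the viral RNA itself, they match the viral RNA but not the unrelated host sequence, which is what makes the system specific.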


Our genome is established when fertilization takes place, and the code remains unchanged throughout our life, except for mutations that may occur in individual cells. Nevertheless, the previous page outlined many internal mechanisms that operate to control the expression of specific genes. In addition, we now know that many external factors (epigenetics) can affect the timing of gene expression, the degree of expression, and the eventual phenotype that is expressed. These external factors can produce small modifications to DNA, such as addition of metal ions, or addition or removal of acetyl groups or methyl groups on DNA or on the histones that control the wrapping and packing of DNA. Attachment of methyl groups appears to reduce transcription or even shut it off; attachment of acetyl groups to histones turns genes on. These modifications are known as “epigenetic factors,” i.e., changes occurring above the level of the genome.

The methyl group

The acetyl group

Modification of DNA by the methyl group:

In essence, the DNA in our cells provides the code for making functional proteins, and the epigenetic factors act as switches which turn genes off and on. Epigenetic factors are likely to play many important roles, such as:

  • Turning genes off and on as differentiation proceeds in a growing embryo. As a result, some cells in the pancreas become specialized to produce digestive enzymes, while others synthesize the hormone insulin.
  • Epigenetic factors in utero may have a lasting influence on phenotype years later. If a pregnant agouti mouse is fed a folate-enriched diet that provides methyl groups, its offspring will have dark coats and be lean and healthy, because a particular gene is turned off by methylation; if the diet is poor in methyl groups, the offspring will be obese and have light coats. In the winter of 1944-45 the German army set up a blockade to prevent food and fuel from reaching the western part of the Netherlands, reducing food consumption to exceptionally low levels for the winter. Fifty years later researchers compared the children conceived by starved mothers during that period with the children of mothers who were already in their second or third trimester when the blockade began. They found that the offspring of the starved mothers weighed 14 pounds more on average, had waist circumferences that were 1.5 inches greater, and were more likely to have developed coronary heart disease.
  • Certain genes are known to predispose to certain types of cancer, such as breast and colon cancer. It is tantalizing to think about the possibility of turning such genes off epigenetically in order to reduce the risk of developing these cancers.

Epigenetics and the Influence of Our Genes

This is an 18:40 min video from TEDxOU by Courtney Griffin that provides an excellent explanation of the interaction among:

  • nature
  • nurture
  • epigenetics

In summary, all of our traits and characteristics (our phenotype) are the result of an interaction between our genome (all of the genes we inherit) and environmental factors. Some environmental factors (including our diet, our behaviors, and a myriad of environmental exposures) influence our phenotype through non-genetic mechanisms. For example, one might have a number of genes that predispose an individual to being lean; however, such an individual might still become overweight or obese despite their “lean genes” as a result of chronically overeating. Yet other epigenetic factors from the environment can modify the genome in subtle ways without actually changing the code.

For more information on epigenetics explore the following web site:

Pink hydrangeas can be made to turn blue by adding aluminum sulfate to the soil.


Binary Fission in Prokaryotes

Prokaryotes reproduce by the relatively simple process of binary fission. The single chromosome replicates and each copy attaches to a different location on the cell membrane. The cell membrane then begins to invaginate and eventually separates into two genetically identical bacteria. A similar process is used to replicate mitochondria within eukaryotic cells, but the overall process of cell replication in eukaryotes is more complicated (see below).



Mitosis is the process by which eukaryotic cells replicate by dividing into two genetically identical cells. It is the process by which new cells are formed in the growing embryo and after birth, and mitosis also replaces cells that have died or been shed. In humans some cells retain the capacity to divide throughout life. These “stem cells” divide by mitosis and produce daughter cells which then differentiate into a particular cell type. This provides a way of replacing cells, such as skin cells; the epithelial cells that line the respiratory, digestive, and urogenital tracts; and blood cells. Benign and malignant tumors also grow through mitosis.

The Cell Cycle

Cells normally follow a carefully controlled cell cycle, depicted below.

Many of our cells are mature functioning cells that are not actively dividing. These are cells in the G0 phase; this is sometimes called the “resting phase,” but these cells are actively functioning, and they are resting only in the sense that they aren’t replicating. The phases in dividing cells are as follows:

  • G1, when the cell grows in size in preparation for division
  • S, when synthesis of new DNA (replication) takes place
  • G2, when there is continued cell growth
  • M, which stands for mitosis, i.e., when the cell actually divides into two identical cells

The cell cycle is normally carefully controlled by a number of biochemical mechanisms. Loss of control mechanisms can result in abnormal cell division and a progression to tumor formation. This is discussed in greater detail in the online module on cancer.


Meiosis is the specialized process by which gametes (sperm and eggs) are produced for sexual reproduction in the ovaries and testes. Recall that humans have 22 pairs of homologous chromosomes and one pair of sex chromosomes; one member of each pair came from the mother, and the other from the father. The 46 chromosomes are referred to as the diploid (2n) number, because there are two of each. In order for the fertilized egg to end up with the correct diploid number, sperm cells and eggs must be produced such that each has only one chromosome from each pair. In other words, gametes have only 23 chromosomes (referred to as the haploid number, 1n). Meiosis, then, is the process by which specialized diploid stem cells in the ovaries (oogonia) and testes (spermatogonia) produce eggs and sperm which have a haploid number of chromosomes. Thus, each gamete has 23 chromosomes (one from each of the 22 homologous pairs + 1 sex chromosome).

The video below shows the differences between mitosis and meiosis.

Meiosis Shuffles The Deck

Meiosis produces sperm and eggs with novel mixtures of the original parental chromosomes due to:

  • “Random assortment”: Separation of homologous pairs of maternal and paternal chromosomes results in each gamete randomly getting some maternal chromosomes and some paternal. Random assortment of 23 pairs of chromosomes can produce > 8 million possible combinations.
  • “Crossing over”: After maternal and paternal chromosomes match up as homologous pairs, they exchange sections of DNA. This further shuffles the genetic deck.
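
The figure of more than 8 million combinations from random assortment follows from simple arithmetic: each of the 23 pairs contributes either its maternal or its paternal chromosome, so there are 2 multiplied by itself 23 times possibilities. The Python sketch below checks that number and draws one gamete at random. It deliberately ignores crossing over, which increases the variety further, and the function names are illustrative.

```python
import random

PAIRS = 23  # 22 homologous pairs plus the pair of sex chromosomes


def assortment_combinations(pairs=PAIRS):
    """Number of maternal/paternal combinations from random assortment alone."""
    return 2 ** pairs


def random_gamete(pairs=PAIRS, rng=random):
    """Pick the maternal ('M') or paternal ('P') chromosome from each pair."""
    return [rng.choice("MP") for _ in range(pairs)]


if __name__ == "__main__":
    print(f"{assortment_combinations():,} possible combinations")
    print("one randomly assorted gamete:", "".join(random_gamete()))
```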

Fertilization, Oogenesis, and Spermatogenesis
