BCMB 465 - Human Genetics
Spring Semester 2001


I.  A General Method for PCR Genotyping of Microsatellite Polymorphisms: A Powerful Tool for Mapping Mammalian Genomes


This tutorial is intended to introduce you to the use of microsatellites for genotyping in mice; the method is the same in humans.  In humans, however, one does not have the advantage of large families such as the mouse interspecific backcross described here.

You have been introduced to the concept that gene mapping requires polymorphisms (differences) among alleles and individuals.  Differences in skin color in humans, or in coat color and coat quality in mice, provide such polymorphisms, but there are relatively few of them.  Protein variants, such as isozymes, also provide polymorphism at given genetic loci, but again, these are not highly polymorphic loci.  Most useful are polymorphisms detected at the level of the DNA sequence itself.  Restriction fragment length polymorphisms, or RFLPs (due to presence or absence of a restriction site, or addition or deletion of sequences between two restriction sites), have been a useful source of the variation required for mapping and anchoring loci in mammalian genomes.  However, finding RFLPs in a cloned sequence is often not easy, and it is laborious to type large numbers of RFLP loci, since this involves multiple restriction digests and Southern blots.  Minisatellites, or variable number tandem repeat (VNTR) loci, have also been useful.  Such minisatellites are unit sequences that are repeated tens to thousands of times throughout the genome, so that any one minisatellite is located at a number of genetic loci.

We will see that a major goal of mapping projects is to saturate the genome with a large number of "anchor" loci, spaced at uniform distances throughout the genome, so that subsequently they can be used in genetic linkage analyses to map genes responsible for phenotypes of interest (such as obesity or manic-depressive illness).  Ideally, such anchor loci should be highly polymorphic (so that any two chromosomes in an individual have a high likelihood of carrying different alleles), should be easy to identify, and should be easy to type rapidly in large numbers of individuals (such as a series of human family pedigrees or a mouse mapping cross).

A veritable "magic bullet" for the intensive mapping efforts spurred by the Human Genome Project was the discovery of highly polymorphic microsatellite repeats interspersed throughout the genome.  Microsatellites are mono-, di-, tri-, or tetrameric sequences repeated multiple times in tandem array.  Microsatellites of all nucleotide compositions have been identified, but the class found most commonly in the mouse genome is a (CA)n-(GT)n dimer, called a CA repeat.  Many sites of CA repeats in the mouse genome are flanked by unique sequences; thus the CA repeat plus the flanking unique-sequence DNA become an anchor locus, characterized by high polymorphism (of the number of CA repeats) and a unique site (by virtue of the sequences that flank the repeat).  Eric Lander and associates of the Whitehead/MIT Genome Center laboriously screened mouse genomic libraries for CA repeat sequences embedded in unique-sequence flanking DNA; these markers, designated Mit markers, form anchor loci for constructing a framework for the map of the mouse genome.  Similar microsatellite repeats form anchor loci in the human genome.  The flanking DNA for each such marker has been sequenced and both forward and reverse PCR primers have been designed that allow amplification of the specific locus defined by the primer pairs.  Bear in mind that the population polymorphism is a result of the variation in the number of CA repeats between the flanking unique sequences, and is not due to variation in the unique sequence DNA.  This principle is illustrated here, where the line indicates the unique-sequence DNA flanking the CA repeat, and the forward and reverse arrows indicate the locus-specific forward and reverse PCR primers:

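[Figure: (a) Alleles #1, #2, and #3, each drawn as a line of unique-sequence flanking DNA with forward and reverse primer arrows on either side of the CA repeat.  (b) Pattern of various allele combinations (1/1, 2/2, 3/3, 1/2, 1/3, 2/3) on an electrophoretic gel after PCR amplification.]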


Note that you can read genotype directly from the gel!
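
Reading a genotype from the gel amounts to matching each band in a lane to one allele.  The following is a minimal Python sketch of that idea.  The product sizes, the matching tolerance, and the function name are assumptions made for illustration only; real sizes depend on the marker being typed.

```python
# Hypothetical PCR product sizes (bp) for alleles #1-#3 of one CA-repeat locus.
# These numbers are illustrative assumptions, not values from this handout.
ALLELE_SIZES = {1: 150, 2: 142, 3: 136}


def call_genotype(band_sizes, allele_sizes=ALLELE_SIZES, tolerance=2):
    """Return a genotype such as '1/3' from the band sizes seen in one lane.

    Returns 'U' (unscorable) if a band matches no allele, matches more than
    one allele, or if the lane has no bands or more than two alleles.
    """
    alleles = set()
    for band in band_sizes:
        matches = [a for a, size in allele_sizes.items()
                   if abs(size - band) <= tolerance]
        if len(matches) != 1:
            return "U"
        alleles.add(matches[0])
    ordered = sorted(alleles)
    if len(ordered) == 1:
        return f"{ordered[0]}/{ordered[0]}"   # one band: homozygote
    if len(ordered) == 2:
        return f"{ordered[0]}/{ordered[1]}"   # two bands: heterozygote
    return "U"


print(call_genotype([150]))        # a single band
print(call_genotype([142, 136]))   # two bands
```

A lane with one band is scored as a homozygote and a lane with two bands as a heterozygote, which is the sense in which the genotype is read directly from the gel.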

II.  The Mouse Interspecific Backcross

A powerful mapping tool in mammalian genetics is the use of PCR genotyping of CA-repeat loci to provide anchor loci in large interspecific backcrosses.

An interspecific backcross involves mating a male and female of two different mouse species to produce F1 progeny (all genetically identical to each other) and mating F1 females back to a male of one of the parental species.  An example is the Jackson Laboratory BSS and BSB backcrosses.  The first mating (the outcross) was of a C57BL/6J (designated B) mouse (Mus musculus musculus) to a Mus spretus (designated S) male.  Mus spretus is a wild-derived strain of mice.  The mating is almost always of the B6 (C57BL/6J, or lab mouse) female to the spretus male, since the B6 females are larger, can carry more progeny per pregnancy, and are better mothers.  Among the F1 progeny, the males are always sterile (an interesting problem in itself), but the female F1 can be mated either to a B6 male, where the progeny form the BSB backcross, or to a spretus male, where the progeny form the BSS backcross.  This is illustrated below for the BSS interspecific backcross:

Cross 1:  C57BL/6 x Mus spretus

Cross 2:  (BL/6 x spr)F1 x Mus spretus


Now, if you follow one microsatellite locus (defined by the unique-sequence flanking DNA) through this cross, and use "B" to designate the C57BL/6 allele at the locus and "S" to designate the Mus spretus allele at the locus, you get the following:

Cross 1:  B/B   x   S/S

Cross 2:  (B/S)F1   x   S/S

Progeny:   S/S and B/S
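
The two crosses above can be checked by enumerating the gametes from each parent.  This is a small Python sketch, not part of the original exercise; the function name `cross` is an illustrative assumption, and each parent is written as a pair of alleles at one locus.

```python
from collections import Counter
from itertools import product


def cross(parent1, parent2):
    """Return expected progeny genotype proportions for a one-locus cross.

    Each parent is a pair of alleles, e.g. ("B", "S"). Every allele of one
    parent is paired with every allele of the other with equal weight.
    """
    counts = Counter()
    for a, b in product(parent1, parent2):
        counts["/".join(sorted((a, b)))] += 1
    total = sum(counts.values())
    return {genotype: n / total for genotype, n in counts.items()}


f1 = cross(("B", "B"), ("S", "S"))    # Cross 1: the outcross
bss = cross(("B", "S"), ("S", "S"))   # Cross 2: F1 female x Mus spretus male
print("F1:", f1)
print("BSS progeny:", bss)
```

Every F1 animal is B/S, and the backcross progeny are expected to be half B/S and half S/S, so at any one locus each backcross animal reports which allele it received from its F1 mother.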

The total number of backcross progeny from the various F1 matings set up can be quite large (over 100, for example).  These crosses are useful for typing a large number of markers, by numerous investigators, over an extended period of time, since all the progeny were killed simultaneously and large amounts of DNA were made from various tissues of the mice.  This pooled DNA is stored under stable conditions and can be distributed to scientists who wish to use these backcrosses for genotyping.

Genotyping a microsatellite marker in such a backcross is very straightforward.  It involves amplifying the given locus (using the appropriate pair of forward and reverse PCR primers) from a small amount of DNA from each of the backcross progeny.  Genotype is determined directly from the gels of the PCR products, as illustrated previously.

Below are instructions (prepared for an advanced genetics laboratory course) for PCR genotyping.  This will give you an idea of the manipulations involved.  These may seem extensive, but this is MUCH less labor-intensive and time-consuming than the process required for detection of RFLPs, which requires restriction digest (overnight) --> purification of DNA (one day) --> electrophoresis (overnight) --> Southern blotting (one day) --> probing of Southern blot with radioactive probe (3 days)!

III.  Method for PCR Genotyping by Amplification of CA-Repeat Marker Sequences

1.  Select the desired backcross and the appropriate Mit marker(s) and obtain both backcross DNA and Mit-marker primer pairs. The PCR primer pairs can be selected from WWW resources.

2.  Run a test to make sure the primers work with both parental DNAs being used.  Sometimes, even when products are given in the WWW catalog, the primers will not amplify one or the other of the parental DNAs.  (This can be due to previously undetected polymorphism in the unique sequence used for the primer pair.)

3.  For the method presented here, the DNA samples should be at a concentration of 20 ng/µl; 2.5µl of this stock gives 50 ng of DNA per PCR reaction.

4.  Have available all pipets, tips, numbered (identified) PCR tubes & racks, reagents (on ice) and PCR machine.  WEAR GLOVES AT ALL TIMES WHILE DOING ANY STEP OF PCR!!!

5.  Make a “Master PCR Mix”
 a.  Determine the correct MgCl2 concentration and primer concentration from the test PCR.
 b.  Each reaction will be 25µl total, with 2.5µl DNA and 22.5µl reaction mix.  Therefore, multiply 22.5µl by one or two more than the number of desired reactions to determine the amount of Master Mix to prepare (the extra leaves some left over for ease of pipetting).
 c.  Master Mix per reaction (make up in this order of ingredients):

  dH2O                        (to total 22.5µl)
  10X PCR II Buffer           2.5µl
  25mM MgCl2                  (1µl for 1mM, 2µl for 2mM)
  Forward primer              0.5µl
  Reverse primer              0.5µl
  dNTPs (1.25mM each of 4)    5.0µl
  Diluted Taq polymerase      0.75µl

6.  Set up reactions
 a.  Aliquot 22.5µl Master Mix to each tube
 b.  Add 2.5µl mouse genomic DNA to each tube, cap tubes
 c.  Mix by brief vortex
 d.  Spin briefly in microfuge
 e.  With dropper bottle, layer 1-2 drops mineral oil on top of each sample; cap tightly
 f.  Put in PCR machine

7.  PCR Program:

 Start with:
  94°C  3 min.

 Followed by 30 cycles of:
  94°C  30 sec.  (melt)
  55°C  2 min.   (anneal)
  72°C  2 min.   (elongate)

 Ending with:
  72°C  7 min.
   4°C  “soak” (hold)

8.  Hold PCR samples at 4°C until time for electrophoresis.

9.  Pour 3% gel (1/2 agarose and 1/2 NuSieve low-melt agarose) with 3µl of 10 mg/ml stock ethidium bromide per 50ml of the gel.

10.  Prepare PCR samples for electrophoresis:

 a.  Stretch a piece of parafilm over an Eppendorf tube rack and press with a gloved finger to make "wells."
 b.  For each reaction, put 2µl loading dye into well.
 c.  One at a time, carefully pipet the PCR mix from under the oil layer into a well and use the same pipet tip to mix it with loading dye and then directly load into gel slot.
 d.  Each row of gel lanes should have φX174 marker DNA and the two parental DNAs.

11.  Run electrophoresis at 100 V for approximately 1 hour.

12.  View gels with ultraviolet lamp; photograph; and immediately determine genotype of each backcross animal from gel.
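
The Master Mix arithmetic in step 5 can be scripted so that the volumes scale with the number of reactions.  The sketch below is only an illustration: the function name and its parameters are assumptions, the per-reaction volumes are those listed in step 5c, and `extra=2` reflects the "one or two more" reactions suggested in step 5b.

```python
REACTION_MIX_UL = 22.5     # Master Mix per reaction (step 5b)
FINAL_VOLUME_UL = 25.0     # total reaction volume (step 5b)
MGCL2_STOCK_MM = 25.0      # MgCl2 stock concentration (step 5c)


def master_mix(n_reactions, mgcl2_final_mM=1.0, extra=2):
    """Return {component: volume in µl} for n_reactions plus `extra` spares."""
    # Volume of 25mM stock needed for the desired final MgCl2 concentration.
    mgcl2_ul = mgcl2_final_mM * FINAL_VOLUME_UL / MGCL2_STOCK_MM
    fixed = {
        "10X PCR II Buffer": 2.5,
        "25mM MgCl2": mgcl2_ul,
        "Forward primer": 0.5,
        "Reverse primer": 0.5,
        "dNTPs": 5.0,
        "Diluted Taq polymerase": 0.75,
    }
    water = REACTION_MIX_UL - sum(fixed.values())
    if water < 0:
        raise ValueError("components exceed 22.5 µl per reaction")
    per_reaction = {"dH2O": water, **fixed}   # water first, as in step 5c
    scale = n_reactions + extra
    return {name: vol * scale for name, vol in per_reaction.items()}


mix = master_mix(40)   # e.g. 40 backcross animals, 1mM MgCl2
for name, volume in mix.items():
    print(f"{name:<24}{volume:8.2f} µl")
print(f"{'Total':<24}{sum(mix.values()):8.2f} µl")

# DNA per reaction (step 3): 2.5 µl of a 20 ng/µl stock
dna_ng = 2.5 * 20
```

In practice the parental DNAs and any control lanes would also be counted in `n_reactions`.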

IV.  Mapping Microsatellite Markers: Constructing and Interpreting Strain Distribution Patterns

A strain distribution pattern, or SDP, is the set of phenotypes (information) for a single locus (marker) across all the progeny in a given genetic cross, or all inbred strains of a recombinant inbred (RI) set.  Experiments on PCR genotyping for a large number of markers generate SDPs.

For example, imagine that you have typed 40 progeny of the LXS backcross for two Mit markers, designated "12" and "27."  Mit markers are a large series of microsatellite markers already verified to be single-locus, polymorphic markers in the mouse genome.

This backcross involves different strains than the BSS one considered earlier, but the principle is the same.  (In considering human families, rarely do all families show the same microsatellite alleles or allele combinations!)  The LXS backcross involved first an outcross between a lab mouse strain, SB/Le (carrying coat markers satin and beige), and Mus spretus, with the (LXS) F1 females backcrossed to SB/Le:

(SB/Le x M. spretus) x SB/Le

You have typed 40 of the backcross progeny for each of the two Mit markers by the method above.  For genotype scoring of the PCR products, we will designate the alleles for each marker as "L" (derived from SB/Le) and "S" (derived from M. spretus).

For each marker you determine an SDP by reading your gels and scoring each individual as either L or S.  A score of L means the mouse shows only the L band: it received the L allele from the father (the backcross parent) and also the L allele from the mother (the F1 hybrid parent).  A score of S means the mouse received the L allele from the father and the S allele from the mother.  Remember, the L band on the gel is always present since it comes from the inbred backcross parent, so we simplify by scoring only the other allele as S or L.  If you have problems understanding this, refer to the earlier section describing how a backcross is made.  Your SDP for a single marker locus might look like this (for animals #1-40, in order from left to right):

Mit12   S S L L S   L L L L U   S S S S S   S L S S L   S L L L S   S S S S S   L U S L L   S L L S L

In this SDP, "U" refers to an animal that was not typed in the experiment or for which the results were ambiguous.  Hopefully, there will not be many such holes in the data, but it is always possible and missing or ambiguous data should ALWAYS be designated as such!
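
An SDP written this way is easy to handle as a list of scores.  The following Python sketch parses the Mit12 line shown above and tallies the L, S, and U scores; the function name `parse_sdp` is an illustrative assumption.

```python
from collections import Counter

# The Mit12 SDP exactly as written above (animals #1-40).
MIT12 = ("S S L L S   L L L L U   S S S S S   S L S S L   "
         "S L L L S   S S S S S   L U S L L   S L L S L")


def parse_sdp(text):
    """Split an SDP line into a list of scores, allowing only L, S, and U."""
    scores = text.split()
    unexpected = set(scores) - {"L", "S", "U"}
    if unexpected:
        raise ValueError(f"unexpected scores: {sorted(unexpected)}")
    return scores


scores = parse_sdp(MIT12)
counts = Counter(scores)
typed = counts["L"] + counts["S"]

print("animals:", len(scores))
print("L:", counts["L"], " S:", counts["S"], " U:", counts["U"])
print("fraction S among typed animals:", round(counts["S"] / typed, 2))
```

Because each backcross animal receives either the L or the S allele from its F1 mother, roughly equal numbers of L and S scores are expected among the typed animals.  Animals scored U are kept in the list so that positions still correspond to animal numbers, but they are left out of the fraction.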

Note that the SDP illustrated above is derived from PCR genotyping for a single microsatellite marker.  However, it could just as well be another microsatellite marker or the phenotype for a gene of interest, say a neurological mutation where the mouse is either "wobbler" (affected) or "normal" (unaffected).

Many experimental genetic crosses (i.e., done in laboratory animals, not in humans) are constructed to allow inference of genotype from phenotype.  For instance, in the mapping backcross we are using as an example, we know each animal received the L allele from its father, and so, from our typing, we know whether each animal is LL or LS (and, since the L allele is always present, we simplify in scoring by writing only the second allele).  In a cross scored for the phenotype of the neurological mutation, we could infer genotype only when one allele is recessive and the backcross is to a homozygous recessive individual.
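
The scoring shorthand can be expanded back into full genotypes mechanically.  This is a minimal sketch for the LXS backcross only; the function name is an illustrative assumption.

```python
def score_to_genotype(score):
    """Expand a one-letter SDP score into the full genotype of the animal.

    In the LXS backcross every animal carries an L allele from its SB/Le
    father, so the score records only the allele from the F1 mother.
    """
    if score == "L":
        return "L/L"
    if score == "S":
        return "L/S"
    if score == "U":
        return "unknown"   # untyped or ambiguous; never guess a genotype
    raise ValueError(f"unexpected score: {score!r}")


first_five = ["S", "S", "L", "L", "S"]   # animals #1-5 of the Mit12 SDP
print([score_to_genotype(s) for s in first_five])
```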

The complete genetic map dataset for the LXS backcross comprises many SDPs, arranged to correspond to chromosome of origin and linkage order along the chromosome.  These come from the compilation of many experiments such as described here, scoring all the backcross progeny for a number of markers (both anonymous markers, such as the Mit microsatellite markers described here, and markers for genes for enzymes, cell-surface antigens, etc.).  The sum total of such data can be compiled to form a genetic map, limited only by the number of marker loci and the number of animals.  These genetic maps are continually growing as more and more data are added.

For instance, here is the SDP for chromosome 18 markers in the LXS backcross data.

Chromosome 18 SDP in LXS backcross individuals:

D18Mit64   L L S S L   S L S L S   S L S S S   S S S L L   S S L L L   S L S S L   L L S L S   L S L L S

tg-mm      L L S S L   S L S L S   S L S S S   S S S L L   S S L L L   S L S S L   L L S L S   L S L L S

Evi-3      L L S S L   S L S L S   S L S S S   S S L L L   S S L L L   S L S S L   L L S L S   L S L L S

D18Mit20   L L S S L   S L S L S   S L S S S   S S L L L   S S L L L   S L S S L   L L S L S   L S L L S

D18Mit17   L L S S S   S L S L S   S S S S S   S S L L L   S S L L L   S L S S L   L L L L S   L S L L S

D18Mit9    S S S S S   S L S L S   S S S L S   L S L S L   S L L S L   S L S S L   L S L L S   L S L S L

D18Mit7    S S S S S   S L S L S   L S S L S   L S S S L   S L S S L   S L S S L   L S L L S   L S L S L

In this figure, the markers are listed down the left in order from the most centromeric to the most distal (all mouse chromosomes are highly acrocentric), and the individual animals in this large "family" (backcross) are the vertical columns to the right of the markers.

Note that, by and large, adjacent loci have a similar SDP.  This is key in construction of the map and relates to the fact that a crossover is less likely to occur between two close markers than between two markers at a distance.  In fact, the assignment of a new marker to a given locus (position on the chromosome) within such a data set is done by looking for a concordant segregation pattern (SDP).
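
The comparison of SDPs can be made concrete by counting, for each pair of markers, the animals whose scores differ.  The Python sketch below uses the chromosome 18 data shown above.  The function names are illustrative assumptions, and treating Evi-3 as if it were a newly typed marker is done purely to demonstrate the idea of placing a marker by concordance.

```python
# Chromosome 18 SDPs, listed from most centromeric to most distal.
CHR18 = {
    "D18Mit64": "L L S S L  S L S L S  S L S S S  S S S L L  S S L L L  S L S S L  L L S L S  L S L L S",
    "tg-mm":    "L L S S L  S L S L S  S L S S S  S S S L L  S S L L L  S L S S L  L L S L S  L S L L S",
    "Evi-3":    "L L S S L  S L S L S  S L S S S  S S L L L  S S L L L  S L S S L  L L S L S  L S L L S",
    "D18Mit20": "L L S S L  S L S L S  S L S S S  S S L L L  S S L L L  S L S S L  L L S L S  L S L L S",
    "D18Mit17": "L L S S S  S L S L S  S S S S S  S S L L L  S S L L L  S L S S L  L L L L S  L S L L S",
    "D18Mit9":  "S S S S S  S L S L S  S S S L S  L S L S L  S L L S L  S L S S L  L S L L S  L S L S L",
    "D18Mit7":  "S S S S S  S L S L S  L S S L S  L S S S L  S L S S L  S L S S L  L S L L S  L S L S L",
}
sdps = {name: text.split() for name, text in CHR18.items()}


def discordant(sdp_a, sdp_b):
    """Count animals scored differently at two markers, skipping any 'U'."""
    return sum(1 for a, b in zip(sdp_a, sdp_b)
               if "U" not in (a, b) and a != b)


def best_match(new_sdp, known):
    """Return the known marker whose SDP is most concordant with new_sdp."""
    return min(known, key=lambda name: discordant(new_sdp, known[name]))


# Discordant animals between each pair of adjacent markers.
names = list(sdps)
adjacent = [discordant(sdps[a], sdps[b]) for a, b in zip(names, names[1:])]
for (a, b), n in zip(zip(names, names[1:]), adjacent):
    print(f"{a:>9} - {b:<9} {n:2d} of {len(sdps[a])} animals differ")

# Pretend Evi-3 is newly typed and look for its closest neighbor.
others = {name: sdp for name, sdp in sdps.items() if name != "Evi-3"}
placed_next_to = best_match(sdps["Evi-3"], others)
print("Evi-3 is most concordant with", placed_next_to)
```

Each discordant animal between two adjacent markers reflects a crossover between them in the F1 mother, so the count divided by the number of animals typed gives an estimate of the recombination frequency for that interval.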