Genome Mapping
VK Tiwari, Kansas State University, Manhattan, KS, USA
JD Faris, USDA-ARS Cereal Crops Research Unit, Fargo, ND, USA
B Friebe, Kansas State University, Manhattan, KS, USA
BS Gill, Kansas State University, Manhattan, KS, USA
© 2016 Elsevier Ltd. All rights reserved.
Topic Highlights
• Molecular markers are important for genetic and genome
mapping studies.
• Next-generation sequencing-based marker genotyping,
such as genotyping by sequencing, is an important aid for
gene and genome mapping.
• Single-nucleotide polymorphism-based marker development
and detection.
• Genome mapping methods use recombination-dependent
and recombination-independent approaches.
• Comparative mapping is an important tool for genome
analysis in the crops where sequence information is not
available.
Learning Objective
• To achieve an understanding of the commonly used molec-
ular markers and approaches used for genome mapping
Introduction
Genome mapping is used to assign short DNA sequences
(molecular markers) or specific genes to particular regions of
chromosomes and to determine their relative linear orders and
distances. A map is an essential tool for scientists to navigate
across the genome. Genome maps can be divided into two
groups: genetic maps and physical maps. Genetic maps are
based on recombination frequencies between genetic markers
and genes, and linked markers/genes form linkage groups
showing their relative order. A physical map of a given chro-
mosome or a genome shows the physical locations of genes
and other DNA sequences of interest, and distances are typi-
cally measured in base pairs. Physical maps can be divided into
three general types: chromosomal or cytogenetic maps, radia-
tion hybrid (RH) maps, and sequence maps. The ultimate
physical map is the complete sequence itself.
Molecular Markers and Their Visualization
DNA-based genetic markers rely on differences in DNA
sequences (polymorphisms) between two parental lines. Poly-
morphisms can result from various factors that lead to either
nucleotide changes or differences in DNA segment lengths
such as mutations, errors in DNA replication, and insertions,
inversions, and deletions of DNA fragments.
There are several established approaches for the detection of
polymorphisms using molecular markers including restriction
fragment length polymorphism (RFLP), random amplified
polymorphic DNA (RAPD), amplified fragment length poly-
morphism (AFLP), sequence-tagged site (STS), microsatellites
or simple sequence repeats (SSRs), and single nucleotide poly-
morphism (SNP). Originating in the 1980s, RFLP markers
were the first type of DNA-based markers to be used. RFLPs
involve the use of a restriction enzyme, which cleaves DNA at
specific DNA sequence palindromes, and the hybridization of
a short, labeled DNA fragment, or probe, to the restriction
enzyme-cleaved DNA. The probe label reveals the restriction
fragment hybridized by the probe, and polymorphisms are
revealed when an insertion/deletion occurs between critical
restriction sites in one genotype compared to the other or
when a particular restriction site is abolished due to mutation
in one genotype and not the other. RFLP markers can be
applied to essentially any organism, and they are still
employed to a limited extent today due to their usefulness in
comparative mapping analysis and map-based cloning studies.
However, these markers are not amenable to high-throughput
analysis, and they are difficult and laborious to handle due to the
large amounts of DNA required, enzymatic digestions, Southern
blotting, and probe labeling techniques.
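The two sources of RFLP polymorphism described above can be illustrated with a minimal sketch. It models only the restriction digest, not the Southern blot or probe hybridization. The sequences are hypothetical and were invented for illustration; the only real-world fact used is that EcoRI recognizes GAATTC and cuts after the first G.

```python
def fragment_lengths(seq: str, site: str, cut_offset: int) -> list[int]:
    """Cut seq at every occurrence of site and return the fragment lengths."""
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0, *cuts, len(seq)]
    return [right - left for left, right in zip(bounds, bounds[1:])]


SITE = "GAATTC"  # EcoRI recognition sequence; the enzyme cuts after the first G
# Hypothetical flanking sequences, chosen so that they contain no extra sites.
LEFT, MIDDLE, RIGHT = "ATGCATGCATGC", "ACGTACGTACGTACGTACGT", "TTAATTAA"

parent_a = LEFT + SITE + MIDDLE + SITE + RIGHT      # two intact sites
parent_b = LEFT + SITE + MIDDLE + "GAGTTC" + RIGHT  # second site abolished by a point mutation

print(fragment_lengths(parent_a, SITE, 1))  # [13, 26, 13]
print(fragment_lengths(parent_b, SITE, 1))  # [13, 39]
```

A probe that hybridizes to the middle region would reveal a 26-base fragment in the first parent and a 39-base fragment in the second, which is the kind of length difference scored as an RFLP.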
Besides the RFLP marker technique, all the other types are
based on the use of polymerase chain reaction (PCR). PCR-
based markers require the development of an oligonucleotide
primer, which is a fragment of DNA typically 15–30 nucleotides
in length, to serve as a starting point for PCR amplification on
template DNA. In a PCR reaction, template DNA is mixed with
primers, nucleotides, and a specific enzyme called Taq
polymerase, which polymerizes DNA fragments. The mixture
is placed into a thermal cycler and subjected to repeated cycles
of different temperatures to allow the template DNA to dena-
ture, the oligonucleotide primers to anneal to complementary
sites on the template DNA, and the Taq polymerase to catalyze
the synthesis of new DNA strands leading to the generation of
billions of copies of the target sequence. After the completion of
the PCR reaction, the amplified product is electrophoresed
through an agarose or polyacrylamide gel and subsequently
visualized by DNA staining or other technologies.
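The scale of amplification can be checked with a short calculation. This is an idealized sketch: it assumes every template is copied in every cycle, and the `efficiency` parameter is a hypothetical knob added here to show how incomplete copying lowers the yield.

```python
def pcr_copies(initial_templates: int, cycles: int, efficiency: float = 1.0) -> float:
    """Expected copy number after PCR, assuming a constant per-cycle efficiency.

    efficiency = 1.0 means perfect doubling in every cycle (an idealization).
    """
    return initial_templates * (1.0 + efficiency) ** cycles


print(pcr_copies(1, 30))        # 1073741824.0, about one billion copies from a single template
print(pcr_copies(1, 30, 0.9))   # fewer copies when only 90% of templates are copied per cycle
```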
RAPD markers are DNA fragments from PCR-based ampli-
fication of random segments of genomic DNA with a primer of
arbitrary nucleotide sequences. RAPD markers were the first
PCR-based markers to be used but, today, have very limited
application in molecular biology and mapping studies due to
the unpredictability of short primers in PCR and low
repeatability.
AFLPs, which combine the use of restriction enzymes with
PCR, have been used extensively in a wide range of organisms.
Encyclopedia of Food Grains http://dx.doi.org/10.1016/B978-0-12-394437-5.00220-5
The AFLP technique uses restriction enzymes to digest the
genomic DNA followed by ligation of adapters to the sticky
ends of the restriction fragments to serve as priming sites for
PCR. Subsets of the restriction fragments are selected by using
primers with sequences complementary to the adapter
sequence and to one or two nucleotides within the restriction
fragments of the template DNA. The reactions often employ
end-labeled radioactive or fluorescent primers for the visuali-
zation of the amplified products on polyacrylamide gels. The
AFLP technology is also highly sensitive and reproducible and
has the capability to detect various polymorphisms in different
genomic regions simultaneously. AFLP has higher reproduc-
ibility, resolution, and sensitivity at the whole-genome level
compared to some of the other marker techniques, and it also
has the capability to amplify multiple fragments (50–100) in a
single PCR, which provides a high-throughput format.
STSs are short DNA sequences (200–500 bp) with known
genomic locations. STSs can be easily detected by the PCR
using specific primers. In complex genomes, STS markers
derived from the coding regions of genes, that is, the expressed
portion of genome referred to as expressed sequence tags
(ESTs), can be a very useful resource for mapping the locations
of expressed genes. These markers are usually codominant in
nature, which allows the identification of homozygous and
heterozygous individuals in a mapping population. The STS
sequences may contain repetitive elements with unique and
conserved sequences at both ends of the site, and in a broad
sense, an STS can serve as a site for markers such as microsatellites,
sequence-characterized amplified regions, cleaved amplified
polymorphic sequences, and inter-simple sequence repeats.
Microsatellite markers, also called SSRs, are widely used in
gene and genome mapping studies. These are simple sequence
tandem repeats and the repeat units are generally di-, tri-, tetra-,
or pentanucleotides. In a common repeat motif (e.g., the trinucleotide
motif (GAA)n in wheat), a unit built from the two nucleotides G and A
is repeated a variable number of times in a bead-like fashion
(n can range from 8 to 50). SSRs are usually found in noncoding
regions of DNA with a few exceptions. On both sides of
the repeat unit are flanking regions that contain unordered
DNA, and these flanking regions are essential for
developing locus-specific primers to amplify SSRs with PCR. The
number of repeats within a microsatellite tends to be highly
variable within a given species, which leads to a high frequency
of polymorphism even among closely related individuals.
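A minimal sketch of how an SSR polymorphism arises from repeat number is given here. The flanking sequences and the two alleles are hypothetical; only the (GAA)n motif and the lower bound of eight repeats come from the text.

```python
import re


def ssr_tracts(seq: str, motif: str, min_repeats: int = 8) -> list[tuple[int, int]]:
    """Return (start index, repeat count) for each tract of at least min_repeats motif units."""
    pattern = re.compile(f"(?:{re.escape(motif)}){{{min_repeats},}}")
    return [(m.start(), len(m.group()) // len(motif)) for m in pattern.finditer(seq)]


# Hypothetical flanking regions shared by both alleles (where primers would anneal).
LEFT_FLANK, RIGHT_FLANK = "ACGTTGCA", "TCCGATGA"
allele_a = LEFT_FLANK + "GAA" * 10 + RIGHT_FLANK
allele_b = LEFT_FLANK + "GAA" * 14 + RIGHT_FLANK

print(ssr_tracts(allele_a, "GAA"))    # [(8, 10)]
print(ssr_tracts(allele_b, "GAA"))    # [(8, 14)]
print(len(allele_b) - len(allele_a))  # 12
```

Because both alleles share the same flanking regions, one primer pair amplifies both, and the products differ in length by the extra repeat units (12 bases here).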
Many large and complex genomes, especially those of some
plants, are composed of only about 10–20% gene sequences,
whereas the vast majority (80–90%) is composed of transposable
element (TE)-related sequences or repeat-based sequences. These
repetitive elements or TEs are widespread throughout the genome and
therefore represent a useful resource for whole-genome mapping.
These elements have higher levels of tolerance for mutations or
rearrangements, which makes TEs highly polymorphic and a
good source of markers for genome mapping. Various
TE-based marker development approaches have been used,
and some of the most common repeat-based markers, which were
developed in wheat, belong to two classes: insertion site-based
polymorphism markers and repeat junction markers. These
markers are based on PCR with primers designed in conserved
regions of TEs. In general, repeat sequences in the genome are not
unique, but the insertion sites or repeat junctions are. Therefore,
by developing primers that are specific to particular insertion sites
or repeat junctions, it is possible to develop genome-specific
markers (Figure 1). After the identification of an insertion site or
repeat junction, the flanking sequences can be used to design the
primers. After the fragment is PCR-amplified, there are various
detection methods available for visualizing the marker polymorphisms,
including high-resolution melting analyses, temperature
gradient capillary electrophoresis, and fluorescent capillary
electrophoresis.
With advances in next-generation sequencing (NGS) tech-
nology, it is less expensive to determine the DNA sequence of a
fragment, and this has led to dramatic advances in high-
throughput marker technologies. With restriction site-
associated DNA (RAD) markers, the flanking DNA sequence
around each restriction site is an integral component for isolation
of restriction site-associated tags. The application of the
flanking DNA sequences in RAD tag techniques is referred to as a
reduced-representation method. The RAD tag isolation procedure
has been modified for use with high-throughput sequencing
on the Illumina sequencing platform, to reduce error rates
and make the process high throughput. Isolated RAD tags can
be used to identify and genotype DNA sequence-based polymorphisms
such as SNPs, and these polymorphic sites are
called RAD markers.
Figure 1 Types of repeat junctions in a given genomic DNA sequence that can be used for designing unique locus-specific markers: (a) A repeat junction between two different transposable elements (TEs). (b) Two repeat junctions with two different TEs (black and green) and an unknown sequence (pink). (c) Repeat junction with a TE on one side and a gene fragment or unknown sequence on the other side. (d) Two repeat junctions (nested) created by a TE inserting into another TE. Legend labels: gene or unknown sequence; transposable sequences; TE junction.
GENETICS OF GRAINS | Genome Mapping
The advent of automated Sanger sequencing and especially
recent advances in NGS technologies led to the development
of a second generation of markers based on sequence information.
SNPs differ by a single nucleotide (A, T, C, or G) at a given
locus between different individuals, populations, and parental
lines (Figure 2). If this variation occurs between the members
of the same population, these variations are considered alleles
(e.g., A or T), and most SNPs have only two alleles. SNPs have
emerged as the markers of choice because of their abundance
and high-throughput detection capacities. There are many
ways to identify SNPs, ranging from low-throughput methods
such as PCR amplification followed by electrophoresis, sequence
detection, and mass spectrometry to high-throughput NGS-based
SNP discovery. After generating sequences for SNP
discovery, the next step is to detect useful SNPs. Manual identification
of putative SNPs was once a major bottleneck for
high-throughput SNP calling, but now, there are numerous
software programs available for SNP discovery. These programs
(CASAVA, GS Amplicon, BioScope™, NextGENe®, GigaBayes, SNPdetector, PolyScan, etc.) are very important
for the development of accurate computational methods for
automated SNP calling. There are established approaches and
protocols for SNP discovery in many species, and for species
with reference genome sequences, NGS reads can be mapped
on the reference sequences and SNP discovery can be made.
However, SNP discovery can also be done in the species with-
out a reference sequence. There are many assays available for
SNP genotyping including Illumina GoldenGate, KASPar,
iPLEX Gold technology, and Illumina BeadChips, to name a
few (Figure 3(a) and 3(b)).
Figure 2 Identification of SNP sites in a DNA sequence: Two SNPs between genomic DNA of two genotypes are shown. The length of the sequence is 60 base pairs, and genotype 1 and genotype 2 show variants at positions 21 [C/T] and 44 [G/C] (asterisks mark identical positions).
Genotype1 TTGGCCTGATTTTAGTGGTACGGCCCCGTCACCCGTGATTGGTGAAGTTGGAATGGAGGA
Genotype2 TTGGCCTGATTTTAGTGGTATGGCCCCGTCACCCGTGATTGGTCAAGTTGGAATGGAGGA
          ******************** ********************** ****************
Figure 3 (a) Hybridization-based SNP genotyping method (Illumina Infinium assay): In this assay, the genomic DNA is captured by direct hybridization to array-bound target sequences (50 bases directly upstream of the SNP). Following hybridization, a single-base extension reaction with fluorescent dideoxynucleotides is used at the target SNP nucleotide. Differences in the relative intensity of fluorescent signals can be used to make genotyping calls. (b) PCR-based genotyping methods (Applied Biosystems’ TaqMan assay): For each locus, two common locus-specific primers are designed on each side of the SNP to amplify the fragment spanning the polymorphic site. Two fluorescence resonance energy transfer (FRET)-labeled oligonucleotides called TaqMan probes are then added to the PCR. Each probe is specific to one of the alleles and is designed to hybridize at the SNP site between the forward and reverse primers. By design, these have a reporter dye at their 5′ end (different for each allele) and a quencher (Q) at their 3′ end. If there is no reaction, the probes are intact and the reporter dye’s emission is suppressed by the quencher. During the PCR amplification, the Taq polymerase cleaves the probe that anneals to the template and separates reporter and quencher, resulting in the emission of fluorescence from the reporter. Genotype calling can then be made according to the fluorescent signal.
Exciting progress has been made in
sequencing technologies that are providing high-throughput
molecular marker information at low costs. Genotyping by
sequencing (GBS) provides marker polymorphisms using
NGS technologies followed by a bioinformatics pipeline. It is
a preferred method for several reasons including reduced cost
through an enzyme-based genomic complexity reduction step
and the use of barcoded adapters for multiplexing. Addition-
ally, it can be used for the discovery and identification of SNPs,
even for those species with complex genomes that lack a refer-
ence sequence. GBS has advantages when studying polyploid
species, which is a big challenge for any technology. It relies on
secondary genome-specific polymorphisms that are next to the
SNP, and it allows the assignment of a given sequence to a
specific genome so it becomes a single-locus marker.
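The SNP sites reported in the Figure 2 caption can be reproduced with a direct comparison of the two 60-base sequences shown there. This is a naive sketch for sequences that are already aligned and of equal length; it is not how the SNP-calling programs named above work, since those must also handle read alignment, sequencing errors, and coverage.

```python
genotype1 = "TTGGCCTGATTTTAGTGGTACGGCCCCGTCACCCGTGATTGGTGAAGTTGGAATGGAGGA"
genotype2 = "TTGGCCTGATTTTAGTGGTATGGCCCCGTCACCCGTGATTGGTCAAGTTGGAATGGAGGA"


def find_snps(seq_a: str, seq_b: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, base in seq_a, base in seq_b) for every mismatch."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        (position, a, b)
        for position, (a, b) in enumerate(zip(seq_a, seq_b), start=1)
        if a != b
    ]


print(find_snps(genotype1, genotype2))  # [(21, 'C', 'T'), (44, 'G', 'C')]
```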
Genetic Linkage Mapping
Markers are powerful tools for many diagnostic applications that
involve typing biological samples, such as determining the identity of
unknown samples or sample mixtures, use in the criminal justice system,
and the curation of biological collections, to name a few. High-density
genetic linkage maps facilitate map-based cloning,
quantitative trait mapping, marker-assisted breeding, and comparative
studies of genome evolution. Genetic mapping relies on the fact
that nuclear genomes are made up of chromosomes, which
contain both genes and noncoding DNA. When homologous
chromosomes pair at meiosis, they recombine at various posi-
tions along the chromosomes. Thus, recombination is the
basis for genetic linkage mapping and determining the order
of markers along the chromosome, that is, markers are sepa-
rated by genetic distances calculated based on the amount of
meiotic recombination that occurs between them.
An example of genetic linkage mapping of three linked
markers in 20 F2 progeny is presented in Figure 4. The markers
include two DNA markers (A and B) and one morphological
marker (disease resistance gene ‘R’). The DNA markers are
codominant, and therefore, all possible genotypes can be
determined in the F2 progeny (homozygous for parent A,
homozygous for parent B, and heterozygous). For the morpho-
logical marker, disease resistance is dominant, and therefore,
the genotypic classes of heterozygous and homozygous for the
resistant parent (parent A) cannot be distinguished (resistant
plants can have allelic compositions of ‘RR’ or ‘Rr,’ and suscep-
tible plants have ‘rr’). Inspection of Figure 4 indicates there are
three individuals (2, 6, and 12) with genotypes that differ
between markers A and B. Between A and R, there are two
individuals (6 and 12) with differing genotypes, and one indi-
vidual (2) has differing genotypes between markers B and R.
This suggests that marker R (disease resistance gene) lies
between markers A and B. The two recombination events
between markers A and R translate into ten map units
(2/20 × 100 = 10), and there are five map units between markers
B and R (1/20 × 100 = 5).
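The arithmetic of this example can be written out as a short sketch. It follows the simplified calculation used in the text (individuals with differing genotypes divided by progeny surveyed, times 100); the function and variable names are illustrative only, and real mapping software uses more elaborate estimators.

```python
def map_units(recombinants: int, progeny: int) -> float:
    """Genetic distance as the percentage of progeny with differing genotypes."""
    return 100 * recombinants / progeny


PROGENY = 20
# Individuals with differing genotypes for each marker pair, as counted in the text.
differing = {("A", "B"): 3, ("A", "R"): 2, ("B", "R"): 1}
distances = {pair: map_units(count, PROGENY) for pair, count in differing.items()}

# The two loci separated by the largest distance are the outer ones;
# the remaining locus lies between them.
outer = max(distances, key=distances.get)
middle = ({"A", "B", "R"} - set(outer)).pop()
order = (outer[0], middle, outer[1])

print(distances)  # {('A', 'B'): 15.0, ('A', 'R'): 10.0, ('B', 'R'): 5.0}
print(order)      # ('A', 'R', 'B')
```

The two inner intervals (10 and 5 map units) add up to the A to B distance of 15 map units, which is consistent with R lying between A and B.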
Figure 4 Genotypic data of two DNA markers (A and B) and phenotypic data for one morphological marker (disease resistance gene ‘R’) for two parents, the F1 plant derived from crossing the two parents, and 20 F2 individuals. The DNA markers are codominant; thus, all possible genotypes can be distinguished (homozygous for parent A, heterozygous, and homozygous for parent B). The morphological marker ‘R’ is dominant, and therefore, the genotypes of resistant F2 individuals cannot be distinguished (resistant plants can be either homozygous for parent A (RR) or heterozygous (Rr)). The resulting genetic linkage map of the three loci and genetic distances separating them are shown at the bottom.
Lanes: Parent A, Parent B, F1, F2 progeny 1–20
Phenotype row: R S R R R S R R R R R S R S R R R R S R R R R
Phenotype: R = resistant; S = susceptible
Genotypes: Parent A = RR; Parent B = rr; F1 = Rr; F2 progeny R = RR or Rr, S = rr
Linkage map: A, 10 map units, R gene, 5 map units, B
This type of analysis can be applied to hundreds, or even
thousands of markers to construct complete genetic linkage
maps of chromosomes. Fortunately, there are various com-
puter software programs available to handle such large data
sets and to determine the most likely marker orders and inter-
marker distances.
The number of individuals surveyed in a mapping popula-
tion determines the precision of the genetic distance measured.
In the example, only 20 individuals were surveyed, and if no
recombinants were identified between two markers, this would
translate to a genetic distance of 0 map units between the
markers. If 100 individuals were surveyed, then one or more
recombinants may be identified leading to a genetic distance of
one or more map units. Generally, initial genetic maps of plant
species are generated using 80–120 individuals, which allows
for the detection of recombination between markers one to
three map units apart. This level of precision is considered
acceptable, and, at the same time, the amount of labor and
cost is considered manageable. However, certain mapping
experiments such as map-based cloning of genes by chromo-
some walking require much higher resolution in order to
separate markers extremely close to the target gene. In these
experiments, it is not uncommon to survey 3000–5000 indi-
viduals to obtain the necessary level of precision.
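The link between population size and precision can be made concrete with a small sketch. It uses the same simplified calculation as the text (one recombinant individual out of N gives 100/N map units). The second function is an added assumption, not from the source: it treats each individual as an independent trial that shows a recombinant with probability equal to the map distance divided by 100.

```python
def smallest_nonzero_distance(progeny: int) -> float:
    """Map units implied by a single recombinant individual in the population."""
    return 100 / progeny


def prob_at_least_one_recombinant(map_units: float, progeny: int) -> float:
    """Chance of seeing one or more recombinants (assumes independent individuals)."""
    r = map_units / 100
    return 1 - (1 - r) ** progeny


for n in (20, 100, 5000):
    print(n, smallest_nonzero_distance(n))
# 20 5.0
# 100 1.0
# 5000 0.02

print(round(prob_at_least_one_recombinant(1, 100), 3))  # 0.634
```

Under these assumptions, two markers one map unit apart would still show no recombinants in roughly a third of populations of 100 individuals, which is why high-resolution experiments survey thousands of individuals.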
In plants, most populations are derived from crossing two
highly homozygous parents. The population shown in the
example in Figure 4 is an F2 population. While F2 populations
are commonly used and generally a good choice for chromo-
some mapping, other types of populations, such as backcross
(BC), doubled-haploid (DH), and recombinant inbred (RI),
are also commonly used. However, DH technology is not easily
accomplished in some crops, and it is currently impossible in
others. Each type of population has its advantages and disad-
vantages. F2, BC, and DH populations can be developed very
rapidly, while RI populations are developed by advancing each
line by single-seed descent for many generations with the goal
of selfing to homozygosity. F2 and BC populations are short-
lived and provide limited opportunity to obtain DNA and
phenotypic data, while DH and RI populations provide essen-
tially pure lines that may be tested for traits in replicated
experiments over several environments if desired. Thus, RI
and DH populations are preferred for mapping of quantitative
traits that may be affected by environmental influences. BC
and DH populations result from one cycle of meiosis, but an F2 population has undergone recombination in both male and
female gametes and, therefore, provides twice the recombina-
tion information. RI populations have undergone several
cycles of meiosis but contain two identical homologues and,
therefore, provide about the same amount of information
as an F2.
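The reason RI lines need many generations of single-seed descent can be shown with a short calculation. This sketch relies on a standard genetic expectation that is not stated in the text: at a locus that is heterozygous in the F1, each generation of selfing halves the expected proportion of heterozygotes.

```python
def expected_heterozygosity(generation: int) -> float:
    """Expected fraction of heterozygous lines at one locus in generation F<n> under selfing.

    Assumes the F1 is heterozygous at the locus and there is no selection.
    """
    if generation < 1:
        raise ValueError("generation must be 1 (F1) or later")
    return 0.5 ** (generation - 1)


for n in (1, 2, 6, 8):
    print(f"F{n}: {expected_heterozygosity(n):.4f}")
# F1: 1.0000
# F2: 0.5000
# F6: 0.0312
# F8: 0.0078
```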
The development and analysis of genetic linkage maps lead
to an abundance of information regarding genome structure.
From a more applied perspective, they provide knowledge
regarding the locations of genes and DNA markers associated
with them. In a segregating population, morphological
markers can be scored and analyzed in the same manner as
DNA markers. The difference in scoring for morphological
markers compared to DNA markers lies in the fact that, for
morphological markers, the genotype is determined based on
the visualization of the plant’s phenotype, while DNA markers
are scored at the DNA level. For example, a population segre-
gating for resistance to a particular disease would be scored
based on the reaction of each individual to the disease as being
one of either parental type. Inclusion of this phenotypic data
with genotypic DNA marker data for map generation might
reveal that the disease resistance gene is flanked by closely
linked DNA markers. Such markers are valuable tools that
can be employed by plant breeders who wish to move the
disease resistance gene into elite lines for the development of
new and improved varieties. Using the markers to make selec-
tions is known as marker-assisted selection (MAS). MAS has
advantages over selecting for the trait itself in that markers are
not affected by environmental factors as phenotypic traits
sometimes are. In addition, MAS allows breeders to make
selections in early generations and growth stages allowing
them to eliminate undesirable material early on.
In a nutshell, genetic mapping is a great resource for trait
mapping and map-based cloning studies; however, a genetic
map is not sufficient for sequencing a genome. The level of polymorphism
is often a limitation for genetic mapping; however,
advanced and low-cost NGS approaches have done much
to overcome this limitation. GBS has been widely
accepted and is now being used to map target traits in various
crops. In addition, sequence-based genome mapping, where
members (mostly RI lines) of a given mapping population are
sequenced to high coverage, is now gaining momentum as
well. This approach is very useful for generating a large number
of sequence tags, which can be assembled and anchored on the
genetic map in a chromosome-wise manner. However, precise
ordering of these tags/contigs will be an issue due to the
limitation of genetic mapping in terms of the number of
recombination events. The resolution of a genetic map
depends on the number of recombination events that have
been scored in a given population. Recombination events are
not uniformly distributed across the length of the chromosome
as recombination is suppressed around the centromeric
regions. Reduced or nearly absent recombination therefore
lowers the resolving power of linkage analysis, which means
that genes that are several kilobases to megabases apart may
appear at the same position on the genetic map.
Physical Mapping
In contrast to genetic mapping where distances between land-
marks are calculated based on recombination frequency, phys-
ical mapping determines the actual physical distance. Physical
mapping can be done cytologically by chemically staining and
viewing whole chromosomes using techniques such as in situ
hybridization (ISH) and C-banding. Such techniques have very
low resolution in terms of physical mapping because chromo-
somes are viewed at the cellular level usually at metaphase.
However, recent techniques, such as fiber-fluorescence in situ
hybridization (fiber-FISH), in which nuclei are lysed on a glass
slide and the released DNA is used for in situ mapping, can provide a much higher
resolution (see succeeding text). The highest-resolution physical
mapping is obtained by sequencing the DNA itself. It is
usually preceded by constructing local contiguous sequences
(contigs) of large-insert DNA clones and anchoring the contig
to a genetic map.
In Situ Hybridization
The ISH technique was developed about 45 years ago and
allows the localization of genes or DNA sequences directly on
chromosomes in cytological preparations. The ISH technique
uses probe DNA that is labeled with biotinylated dUTP or
digoxigenin-dUTP and the hybridization sites are detected by
enzymatic reporter molecules such as horseradish peroxidase
or alkaline phosphatase-conjugated avidin/streptavidin. ISH
has been used successfully to determine the physical location
and distribution of dispersed or tandemly repetitive DNA
sequences on individual chromosomes. For example, it has
been used to determine the physical location of multicopy
gene families such as the 5S and 18S–26S ribosomal genes.
Fluorescence in situ hybridization (FISH) uses fluorochromes for signal detection. The FISH
technique allows different DNA probes to be labeled with
different fluorochromes that emit different colors (multicolor
FISH). Thus, the physical order of two or more probes on a
chromosome can be determined simultaneously. Also, FISH
can allow more precise mapping of probes because the fluo-
rescence signals can be analyzed with special cameras and
digital imaging tools.
In humans, the order of two DNA probes can be deter-
mined by ISH on metaphase chromosomes only if the two
sequences are separated by at least 1 Mb. However, when ISH
is done using interphase nuclei, DNA sequences separated by
as little as 50 kb can be resolved. Plant metaphase chromo-
somes are more condensed than human metaphase
chromosomes, and this may be one reason why ISH using
low-copy probes is more difficult in some plant species. Thus,
it has been suggested that interphase nuclei can be exploited
for ISH mapping in plants. Subsequently, experiments where
DNA probes were hybridized to maize interphase nuclei sug-
gested that the resolving power of interphase FISH mapping
can be as little as 100 kb.
The FISH technique has been used successfully to determine the
physical location of bacterial artificial chromosome (BAC)
clones on interphase and metaphase chromosomes. Rice BAC
clones have been hybridized to rice (Oryza sativa L.) chromo-
somes revealing that the repetitive DNA sequences in the BAC
clones could be efficiently suppressed by using rice genomic
DNA as a competitor in the hybridization mixture. The suc-
cessful application of this technique to plants with very large
genomes may depend on the size of the genomic clones ana-
lyzed and the amount of repetitive sequences in the genome.
Fiber-FISH
The fiber-FISH technique uses chromatin DNA extended across a
glass slide. A probe is labeled as with standard FISH and
hybridized to the extended fibers, so that DNA sequences
that are only a few kilobases apart can be ordered. In
humans, fiber-FISH has been used to analyze overlapping
clones, detect chromosomal rearrangements, determine the
physical distances between genes, measure the sizes of long
DNA loci, and aid in the positional cloning of specific genes.
Fiber-FISH was used in Arabidopsis thaliana to measure clusters
of DNA repeats as long as 1.71 Mb, which is more than 1% of
the Arabidopsis genome. It was found that fiber-FISH signals
derived from small DNA fragments (<3 kb) were often
observed as single spots on extended DNA fibers, and thus,
sequences that are less than 5–10 kb apart cannot be ordered.
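The resolution limits quoted in this and the preceding section can be collected into a small lookup, sketched below. The thresholds are the figures given in the text (the conservative 10 kb end of the 5–10 kb range is used for fiber-FISH); the dictionary keys and function name are illustrative, and real limits vary with species, probe, and preparation.

```python
# Minimum separation (kb) at which two probes can be ordered, as quoted in the text.
MIN_SEPARATION_KB = {
    "metaphase ISH (human)": 1000,
    "interphase ISH (human)": 50,
    "interphase FISH (maize)": 100,
    "fiber-FISH": 10,
}


def techniques_that_resolve(separation_kb: float) -> list[str]:
    """List the techniques whose quoted limit is at or below the given separation."""
    return [name for name, limit in MIN_SEPARATION_KB.items() if separation_kb >= limit]


print(techniques_that_resolve(75))    # ['interphase ISH (human)', 'fiber-FISH']
print(techniques_that_resolve(2000))  # all four techniques
print(techniques_that_resolve(3))     # []
```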
Single-Copy Gene FISH
Single-copy gene FISH is an approach to develop a cytogenetic
map of a given chromosome using full-length cDNA (fl-cDNA)
probes. Because genes and gene syntenic blocks are conserved
between different grass species such as wheat, barley, rice, and
maize, single-copy FISH provides a rapid method for determin-
ing chromosome synteny for species for which little genetic or
cytogenetic mapping information is available. When
transferring important genes from wild relatives to bread wheat
(Triticum aestivum L., 2n = 6x = 42, AABBDD) by induced
homoeologous recombination, it is important to know the
chromosomal relationships of the species involved. Single-copy
FISH provides a powerful and rapid method for determining
the genetic relationships between relatively little-studied wild
relatives and wheat. Once identified from single-gene
markers, fl-cDNA probes are used for FISH and the respective
positions of these probes are determined to develop a
cytogenetic map. This technique can also be used to identify
structural changes between the homoeologous groups of chro-
mosomes, between the genomes of wheat, and other species
from the Triticeae tribe. This provides important information
on the strategies to be used for exploitation of those species for
wheat improvement.
Aneuploid Mapping
Wheat is a polyploid and can tolerate a high degree of aneuploidy
(abnormal chromosome numbers). There is a vast
array of aneuploid stocks such as nullisomic–tetrasomic
(NT) lines and ditelosomic (dt) lines. NT lines lack one
pair of chromosomes and carry an extra pair of homoeologous
chromosomes, and they allow mapping of genes to individual
chromosomes. Ditelosomic lines
lack one pair of chromosome arms and allow arm mapping of
genes.
With today’s molecular technology, the power and utility of
the wheat aneuploids have been even more fully realized. DNA
markers can be quickly located to a specific chromosome or
chromosome arm using a single hybridization or amplification
reaction without the need for polymorphism. Telocentric chromosomes
can be flow-sorted, and their DNA can be amplified and used for
NGS for marker development. Dense chromosomal arm maps
have been developed and genes identified and ordered to
specific chromosome arms. These maps are useful for gene
tagging, linkage and mapping of quantitative trait loci (QTL),
cytogenetic manipulations, estimation of genetic distance, and
evolutionary studies.
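The logic of locating a marker with aneuploid stocks can be sketched as follows: a marker gives no signal in the stock that lacks the chromosome or arm carrying it, so no polymorphism between parents is needed. The scores below are hypothetical, and the chromosome and arm labels are used purely for illustration.

```python
def missing_segments(signal_by_stock: dict[str, bool]) -> list[str]:
    """Return the chromosomes or arms whose absence abolishes the marker signal.

    Keys name the segment that is missing from each aneuploid stock;
    values record whether the marker was detected in that stock.
    """
    return [segment for segment, detected in signal_by_stock.items() if not detected]


# Hypothetical scores: stocks nullisomic for 1A, 1B, or 1D ...
nt_scores = {"1A": True, "1B": False, "1D": True}
# ... and ditelosomic stocks lacking the short (S) or long (L) arm of 1B.
dt_scores = {"1BS": True, "1BL": False}

print(missing_segments(nt_scores))  # ['1B']
print(missing_segments(dt_scores))  # ['1BL']
```

In this made-up case the marker would be assigned to chromosome 1B by the NT lines and then to the long arm of 1B by the ditelosomic lines.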
Chromosome Deletion Mapping
A unique system in wheat is the use of gametocidal (Gc) factors
to construct chromosome deletion lines. Gc chromosomes
were introduced into wheat by interspecific hybridization with
the related Aegilops species and backcrossing. Plants monoso-
mic for the Gc chromosome produce two types of gametes.
Only those gametes possessing the Gc chromosome are nor-
mal. Gametes lacking the Gc chromosome undergo structural
chromosome aberrations and, in most cases, are nonfunc-
tional. However, if the damage caused by the chromosome
breakage is not sufficient to kill the gamete, it may still function
and be transmitted to the offspring.
The Gc system has been used to develop wheat lines with
terminal chromosome deletions. These stocks have proved very
useful for the physical mapping of genes and DNA markers to
subarm locations and for the development of physical maps,
which have been constructed for all seven homoeologous
chromosome groups of wheat. In addition, chromosome bin
maps of most of the expressed genes in the wheat plant have
been constructed using a set of wheat aneuploid and deletion
lines (http://wheat.pw.usda.gov/wEST/binmaps/).
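Bin assignment with terminal deletion lines follows a simple interval argument, sketched below in Python. The line names, breakpoint values, and the convention of expressing each breakpoint as a fraction of arm length measured from the centromere are assumptions made for this example. A marker that is lost in a line lies distal to that line's breakpoint, and a marker that is retained lies proximal to it.

```python
def assign_bin(breakpoints, scores):
    """Return the (proximal, distal) bin boundaries for one marker.

    breakpoints: {line: fraction of arm length retained, 0.0 to 1.0}
    scores:      {line: bool}, True if the marker is detected in that line
    """
    lost_in = [fl for line, fl in breakpoints.items() if not scores[line]]
    kept_in = [fl for line, fl in breakpoints.items() if scores[line]]
    # The marker is distal to every breakpoint of a line that lost it ...
    proximal = max(lost_in, default=0.0)   # 0.0 stands for the centromere
    # ... and proximal to every breakpoint of a line that retained it.
    distal = min(kept_in, default=1.0)     # 1.0 stands for the telomere
    if proximal >= distal:
        raise ValueError("scores are inconsistent with terminal deletions")
    return proximal, distal


# Hypothetical deletion lines for one chromosome arm.
breakpoints = {"del-1": 0.29, "del-2": 0.59, "del-3": 0.75}
scores = {"del-1": False, "del-2": False, "del-3": True}
print(assign_bin(breakpoints, scores))
```

The width of the resulting bin, and hence the mapping resolution, depends only on how densely the available breakpoints are spaced along the arm.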
HAPPY Mapping
Another genome mapping approach known as HAPPY map-
ping has been used for genome mapping studies. This
approach is based on haploid DNA samples analyzed using
the polymerase chain reaction (HAPPY). HAPPY mapping
does not require marker polymorphism or time-consuming
population development. It is an in vitro approach for the
ordering of DNA markers directly on native genomic DNA
and is based on analyzing the segregation of markers amplified
from high-molecular-weight genomic DNA. It is a three-step
process. First, genomic DNA is broken into random fragments
using gamma irradiation or mechanical shearing. The DNA is
isolated and analyzed for quality and integrity, which is the
most important aspect of the technique. Various protocols
have been tested and used to avoid unwanted mechanical
breakage of the DNA molecules. It is usually done by embed-
ding the living cells in agarose gel; during DNA extraction, long
molecules of chromosomal DNA remain trapped and pro-
tected within the agarose. The high-quality DNA (DNA solu-
tion) is then subjected to random fragmentation using
mechanical shearing, gel melting, and x-ray treatments. The
average size of the broken fragments depends on the dosage
or mechanical shearing used. The next step involves the devel-
opment of a ‘mapping panel,’ and to achieve this, broken DNA
fragments are diluted to a very low concentration and ~100
samples from individual treatments usually get dispensed into
DNA collecting plates or tubes. Since these samples are very
small, each well or tube may represent a small incomplete set
of random fragments. The third and final step involves a highly
sensitive PCR followed by the scoring of markers as present or
absent in the HAPPY mapping panel.
Genotyping of large sets of markers and detailed analysis of
marker data can be used for the construction of maps and to
calculate precise locations of markers on a given chromosome
or genome. Because the samples in a mapping panel are so small,
each one contains only a randomly sampled subset of the markers
rather than the complete genome, and a given marker is therefore
present in only a subset of the panel members. If two marker loci
are close together, they will tend to remain on the same fragment
and be found in the same samples, whereas distant markers are
usually separated by a break and segregate independently. As the
distance between two markers increases, the frequency of random
breaks between them also increases. The
statistical analysis of the cosegregation frequencies and differ-
ent mapping software can be used to deduce a marker or map
order based on the data generated from the HAPPY mapping
panel. There are certain limitations attached to this approach.
The first is that it is difficult to prepare DNA fragments of more
than a few megabases in size, and therefore, intermarker dis-
tances of more than one megabase are difficult to measure.
Another major limitation is the very small amount of DNA in each
sample of the mapping panel, as all markers need to be mapped by PCR.
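The cosegregation idea behind HAPPY mapping can be illustrated with a short Python sketch. The panel below is invented, and the statistic used (the fraction of aliquots containing either marker that contain both) is a deliberately simple stand-in for the likelihood-based measures used by dedicated mapping software.

```python
from itertools import combinations


def cosegregation(a, b):
    """Fraction of aliquots containing either marker that contain both."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


# Hypothetical panel: 1 = marker amplified from that aliquot, 0 = not.
panel = {
    "M1": [1, 1, 0, 0, 1, 0, 1, 0],
    "M2": [1, 1, 0, 0, 1, 0, 0, 0],
    "M3": [0, 1, 1, 0, 0, 0, 1, 1],
}

# Score every marker pair; tightly linked markers share most aliquots.
scores = {(m, n): cosegregation(panel[m], panel[n])
          for m, n in combinations(panel, 2)}
closest = max(scores, key=scores.get)
print(closest, round(scores[closest], 2))
```

In this toy panel M1 and M2 cosegregate most often and would be placed closest together; a full analysis would convert such pairwise scores into an ordered map.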
RH Mapping
RH mapping has been exploited in animal genome mapping
projects and is a recombination-independent approach. It was
pioneered in the human genetics arena and uses radiation-
induced chromosome breakage rather than meiotic recombi-
nation for mapping. After fragmentation, samples containing
different subsets of the original chromosome or genome are
isolated and used for marker assays. In this method, any given
mapping panel member is assayed for the presence or absence
of a given marker, thus circumventing the need for marker
polymorphisms between genotypes.
Goss and Harris produced the first RHs by irradiating cultured
human cells with a high dose of x-rays and subsequently fusing
them to unirradiated hamster cells. The resulting RHs carried many
broken fragments of human chromosomes alongside the unfragmented
chromosomes of the hamster cells. The
approach was then modified and applied to a number of
animal species. In the modified approach, donor cells are
irradiated and then fused to unirradiated host cells, and RHs
containing donor chromosome fragments are identified using
selectable markers for a given species. Species-specific RHs can
be isolated, cultured, and saved as an immortal resource.
For genome (RH) mapping, the DNA of ~100 hybrid cell
lines (each containing a different set of donor fragments) can
be assembled as an RH panel. The assembled panel can be used
for marker genotyping and the order and distances of the
markers in a given genome can be inferred. Mapping resolu-
tion in an RH panel is a function of the size of the fragments
that are generated during the development of the mapping
panel. Therefore, the mapping resolution can be altered by
simply changing the level of chromosome fragmentation.
Additionally, in RHs, map distances better reflect the true
physical distance between markers than do recombination-
based maps, so maps constructed by the RH approach can
better approximate the physical layout of a given chromosome.
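A two-point RH distance can be estimated from presence/absence scores alone. The Python sketch below uses a commonly cited estimator of the breakage frequency between two markers and converts it to centiRays; neither the formula nor the data come from this article, so both should be read as assumptions for illustration.

```python
import math


def rh_distance(a, b):
    """Estimate breakage frequency (theta) and distance in centiRays.

    a, b: lists of 0/1 retention scores for two markers across the
          same RH panel members.
    """
    n = len(a)
    ra = sum(a) / n                      # retention frequency of marker A
    rb = sum(b) / n                      # retention frequency of marker B
    discordant = sum(1 for x, y in zip(a, b) if x != y)
    denom = n * (ra + rb - 2 * ra * rb)  # expected discordance if always broken
    if denom == 0:
        raise ValueError("markers are uninformative on this panel")
    theta = min(discordant / denom, 0.999)  # cap to keep the log finite
    return theta, -100 * math.log(1 - theta)


# Hypothetical scores for two markers on a ten-member panel.
marker_a = [1, 1, 0, 0, 1, 0, 1, 0, 1, 0]
marker_b = [1, 1, 0, 0, 1, 0, 0, 1, 1, 0]
theta, centirays = rh_distance(marker_a, marker_b)
print(round(theta, 2), round(centirays, 1))
```

Because breakage depends on the radiation dose, a distance of this kind is only meaningful for a stated dose, which is also why changing the dose changes the resolution of the panel.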
The RH approach has been used to map the human genome
along with various animal genomes; however, its application
in plants has been limited. RH mapping in plants was first
reported for a maize chromosome, and then, it was applied
to cotton, barley, and wheat. Recently, RH mapping was used
for genome mapping of hexaploid wheat (Figure 5). Figure 5
presents a scheme for the development of an RH panel for D-
genome chromosomes of hexaploid wheat. Pollen from the
reference hexaploid wheat Chinese Spring was irradiated using
gamma radiation, and the irradiated pollen was used to pollinate
the tetraploid wheat line Altar 84. The F1 seeds (pentaploid)
constitute the RH panel, and each plant grown from these seeds
represents a unique RH event. Chromosome lesions induced in the A and
B genomes of Chinese Spring are masked in these quasi-
pentaploids due to the presence of A and B genome chromo-
somes from the tetraploid parent, but the chromosomes from
the D genome are present in one copy and allow RH mapping
of all D-genome chromosomes simultaneously. It has been
found that using a small RH panel (~94 lines), map resolution
of up to ~300 kb can be achieved throughout the length of any
given chromosome in hexaploid wheat. The RH panel can be used to
anchor and order BAC contigs derived from flow-sorted, chromosome
arm-specific libraries on individual wheat chromosomes. RH panels
will also be highly useful for ordering sequence scaffolds in
ongoing wheat genome sequencing projects.
Large-Insert Clone Contigs
The construction of physical contig maps is important for
facilitating positional cloning of genes, sequencing of genomic
DNA, and detailed analysis of chromosome and genome struc-
ture. Physical contig mapping is the arrangement of large-insert
clones (YACs, BACs, and cosmids) in a linear array that
represents the DNA sequence along the chromosome. Clones
are selected by screening a library with DNA probes used to
detect genetic markers on a genetic linkage map of the organ-
ism. Several DNA probes that detect closely linked genetic loci
will hybridize to corresponding large-insert clones, and these
clones can then be arranged into a contig based on overlapping
segments and fingerprinting. BAC contigs are currently being
developed in many crop species. However, crops with complex
genomes pose major problems because of their large genome size,
polyploid nature, and very high percentage of repetitive sequences.
To address these issues in wheat, a sophisticated flow-
sorting technique was applied for isolation of individual
chromosomes or chromosome arms. The DNA from these
flow-sorted chromosomes and arms was used for the develop-
ment of BAC libraries. These BAC libraries laid the foundation
for the physical mapping of the wheat genomes. Once a phys-
ical contig map is complete, the structure and organization of
the genome, such as the distribution of repetitive and single-
copy sequences, can be discerned. A BAC-by-BAC approach has
been considered as the most suitable approach for generating
reference genome maps of barley and wheat. In this method, a
BAC library for an individual chromosome is the starting point
and BAC contigs are constructed from individual BACs by
identifying BACs containing overlapping fragments. Ideally
then, the BAC contigs are anchored onto a genetic or RH map
of the genome, so that the sequence data from the contig can
be checked and interpreted by looking for markers or genes
known to be present in a particular region. The BACs constituting
the minimum tiling path are then individually sequenced by the
shotgun method and assembled into a pseudomolecule providing a
sequence of each chromosome.
Figure 5 Development of the Chinese Spring D-genome radiation hybrid panel. The spikes of hexaploid wheat cultivar Chinese Spring (T. aestivum; 2n=6x=42, AABBDD) were used for γ-irradiation. Pollen from irradiated spikes was immediately used to pollinate the stigmas of emasculated florets (anthers removed) of tetraploid wheat variety Altar 84 (T. turgidum; 2n=4x=28, AABB). Seeds of F1 hybrids were harvested ~20 days after pollination. Each surviving F1 seed (RH1, pentaploid; 2n=5x=35, AABBD) on germination represents a unique RH event. DNA samples of the individual RH1 plants were then extracted and genotyped for RH mapping.
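Selecting a minimum tiling path from an assembled contig can be sketched as a greedy interval-cover problem. The Python example below assumes that each clone has already been placed on contig coordinates; the clone names and positions (in kb) are hypothetical, and real pipelines work from fingerprint or sequence overlaps rather than known coordinates.

```python
def minimum_tiling_path(clones):
    """Pick a small set of overlapping clones that spans the contig.

    clones: list of (name, start, end) in contig coordinates.
    Greedy rule: among clones starting within the covered region,
    always take the one that extends coverage furthest.
    """
    clones = sorted(clones, key=lambda c: c[1])
    target_end = max(c[2] for c in clones)
    path, covered, i = [], clones[0][1], 0
    while covered < target_end:
        best = None
        # Scan every clone that starts inside the region covered so far.
        while i < len(clones) and clones[i][1] <= covered:
            if best is None or clones[i][2] > best[2]:
                best = clones[i]
            i += 1
        if best is None or best[2] <= covered:
            raise ValueError("gap in contig: no clone extends coverage")
        path.append(best[0])
        covered = best[2]
    return path


# Hypothetical BAC placements (kb) along one contig.
bacs = [("BAC1", 0, 120), ("BAC2", 50, 160), ("BAC3", 100, 230),
        ("BAC4", 150, 260), ("BAC5", 220, 340)]
print(minimum_tiling_path(bacs))
```

Here three of the five clones are enough to span the contig, so only those three would need to be sequenced.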
Comparing Physical Distance to Genetic Distance
Physical maps have led to a wealth of information regarding
the physical locations of morphological traits and evolutionary
translocation breakpoints and genome-wide structure and
organization. Comparisons of the physical maps with genetic
linkage maps can reveal the physical distribution of genes and
recombination along the chromosome. For example, RFLP
probes derived from mRNA (called cDNA probes) represent
expressed genes, and thus, the physical mapping of cDNA
probes will reveal the physical locations of expressed genes.
Therefore, when sets of cDNA probes are mapped genetically as
well as physically, one can infer the relationship between
physical distances and genetic distances among the common
markers. In wheat, physical maps constructed using the chro-
mosome deletion lines have been compared extensively to
corresponding genetic maps of the same chromosomes. This
work has revealed that genes and DNA markers tend to be
clustered in small physical segments that undergo a high
degree of recombination (Figure 6). These gene-rich regions
are separated by large gene-poor segments that undergo very
little recombination. This work has facilitated BAC contig con-
struction of regions containing genes of interest for the purpose
of positional cloning.
In barley, physical maps generated based on translocation
breakpoints were compared to corresponding genetic linkage
maps. The results agreed with those found in wheat by deletion
mapping and showed that the barley genome consists of rela-
tively small gene-rich regions that are hot spots for recombina-
tion interspersed among large segments that are gene-poor and
undergo very little recombination. The information obtained
by physical mapping of translocation breakpoints has facili-
tated the construction of BAC contigs and positional cloning of
important genes by allowing researchers to focus on the gene-
rich regions of the genome. More intricate comparisons of
physical and genetic relationships can be obtained by compar-
ing local BAC contigs to genetic maps. The primary goal of such
experiments is to identify a large-insert clone containing a gene
of interest, but additional important information is obtained.
For example, once a physical contig map of the region is
developed, it can be compared to the genetic linkage map of
the corresponding region to calculate physical to genetic dis-
tance ratios. This is important information because recombi-
nation is known to be distributed nonrandomly throughout
the genomes of many plant species causing the physical to
genetic distance ratios to be highly variable depending on the
characteristics of the region.
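The physical to genetic distance ratio is a simple quotient per marker interval, as the following Python sketch shows. The marker names and positions are invented to mimic a gene-poor, low-recombination interval next to a gene-rich, high-recombination one.

```python
def physical_to_genetic_ratio(markers):
    """Return (interval, Mb per cM) for each pair of adjacent markers.

    markers: list of (name, physical position in Mb, genetic position
             in cM), ordered along the chromosome.
    """
    ratios = []
    for (n1, p1, g1), (n2, p2, g2) in zip(markers, markers[1:]):
        d_phys = abs(p2 - p1)
        d_gen = abs(g2 - g1)
        # No observed recombination gives an unbounded ratio.
        ratio = d_phys / d_gen if d_gen else float("inf")
        ratios.append((f"{n1}-{n2}", ratio))
    return ratios


# Hypothetical markers: a proximal interval followed by a distal one.
markers = [("mA", 0.0, 0.0), ("mB", 200.0, 2.0), ("mC", 210.0, 12.0)]
for interval, ratio in physical_to_genetic_ratio(markers):
    print(interval, ratio)
```

In this invented case the ratio drops from 100 Mb/cM to 1 Mb/cM between adjacent intervals, which is the kind of variation that makes map-based cloning far easier in some regions than in others.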
Comparative Mapping
Much effort has been put forth in comparing the genomic
relationships among grasses and among members of other
plant families. For example, comparative mapping experi-
ments among members of the Poaceae such as wheat, rice,
barley, rye, oat, and maize have revealed remarkable similari-
ties in gene content and marker synteny at the chromosome
level. It is well established that DNA probes cloned from these
related species commonly identify sets of orthologous loci that
lie at approximately the same positions relative to each other
and to the centromeres. GenomeZipper-based consensus
maps, which integrate ordered gene loci from homoeologous
wheat genomes and the corresponding chromosomes of bar-
ley, Ae. tauschii, T. monococcum, and rice, have been con-
structed. These experiments have shown that the genomes of
barley, Ae. tauschii, and T. monococcum are essentially colinear
with that of wheat. The genomes of more distantly related
cereals such as oat, rice, and maize can be divided into linkage
blocks that have homology to corresponding segments of the
wheat genome. The degree of genomic similarities observed at
the chromosome level among grass genomes led to the notion
that information from the small genome of rice could be
directly applied to the much larger genome of wheat. However,
even though a substantial degree of synteny is observed at the
chromosome level, studies of the degree of microcolinearity
between rice and wheat show less promise for gene discovery
in wheat. Genes whose order is conserved across grass species
with sequenced genomes can be used to predict the order of the
corresponding genes in other grass species using
synteny-based analysis.
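Synteny-based prediction of gene order amounts to sorting the genes of a target species by the positions of their orthologs in a sequenced reference. The Python sketch below shows only that core idea with hypothetical gene names; it is not the implementation of any published tool and ignores rearrangements and conflicting references.

```python
def predict_order(reference_order, orthologs):
    """Order target genes by the position of their reference orthologs.

    reference_order: reference genes listed in chromosomal order.
    orthologs:       {target_gene: reference_gene}
    Target genes whose ortholog is not on this reference chromosome
    are left unplaced.
    """
    position = {gene: i for i, gene in enumerate(reference_order)}
    placed = [(position[ref], tgt)
              for tgt, ref in orthologs.items() if ref in position]
    return [tgt for _, tgt in sorted(placed)]


# Hypothetical reference chromosome and ortholog assignments.
reference = ["r1", "r2", "r3", "r4"]
orthologs = {"t_c": "r3", "t_a": "r1", "t_x": "r9", "t_b": "r2"}
print(predict_order(reference, orthologs))
```

The result is a virtual gene order that still has to be confirmed by genetic or physical mapping in the target species.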
There have been exciting developments in genome map-
ping studies in grasses in terms of the development of high-
density genetic maps and physical maps. This was followed by
the generation of EST databases in cereals. In the recent past,
large-scale genome sequencing projects in grasses have been
successfully implemented, the list including rice, Brachypodium,
sorghum, maize, and foxtail millet. These studies provided
extensive information on the genome organization of major
cereals. Knowledge gained from the genome sequencing has
enhanced understanding of the structural and functional com-
ponents of the genome for its effective utilization in genetic
improvement of cereals. Genome maps (whole-genome
sequences) of the diploid model grass Brachypodium (genome
size 272 Mb) are available, and these provide a useful resource
to study the evolution of genomes across the grasses. Among
sequenced cereal crops, rice has a smaller genome (420 Mb) and a
higher gene density than other cereals; sorghum follows rice with
a genome size of ~730 Mb, whereas the maize genome is larger
(2.3 Gb), has undergone several rounds of genome duplication, and
is distinguishable from its close relative, sorghum. Reference
genome maps of
sorghum and foxtail millet are available, and altogether, these
reference genome maps provide a great resource to study com-
parative genomics in order to develop mapping information
about an orphan grass or cereals with no genomic information.
There are many software programs and databases devel-
oped to look at the syntenic relationship of the cereal genomes.
Recently, a GenomeZipper approach was developed to provide
an extensive database for studying syntenic relationships
among grass genomes (between wheat, Brachypodium, rice,
sorghum, and barley genomes). The GenomeZipper uses a
novel approach that allows systematic exploitation of con-
served synteny with model grasses. For example, it allowed
the assignment of 86% of the total estimated (~32,000) barley
genes to individual chromosome arms.
Future Mapping Prospects
The ultimate goal in map construction is the deciphering of the
linear DNA sequences of the full complement of chromosomes
of an organism and the utilization of map information in trait
mapping. The whole-genome sequence information available
in major cereals like rice, sorghum, maize, and foxtail millet
has revolutionized the understanding of the mechanisms
underlying genome evolution in these important cereal crops
as well as unraveling the important mechanisms in plant
growth and developmental processes and tolerance to various
biotic and abiotic stresses. The practical applications of the
genome maps and reference sequences are best realized only
when allelic diversity among diverse germplasm is better
understood. In crops where sequence information is not avail-
able, comparative genomics-based tools can be very useful for
providing a virtual gene order based on synteny. Sequence-ready
physical maps of diploid barley chromosomes, reference sequences
of wheat chromosome 3B, and sequence-ready physical maps of some
wheat chromosomes are available. These ongoing efforts in wheat
and barley are critical for developing amenable and high-yielding
crops to fight various challenges emerging in the form of new
diseases and changing environmental conditions.
Figure 6 Wheat chromosome 5B genetic linkage map (left) compared to the physical map (right). The genetic linkage map was constructed using a backcross population and the physical map was constructed using the chromosome deletion lines of wheat. On the genetic linkage map, map units separating markers are shown at the left, and markers are indicated on the right. On the physical map, hash marks on the left of the chromosome indicate deletion breakpoints; black and hatched regions on the chromosome represent dark and light C-bands, respectively; and DNA markers and their bin locations are shown to the right. Lines drawn between the maps indicate where deletion breakpoints occur relative to the genetic map. Notice that the centromeric region is nearly void of DNA markers and recombination, while more distal regions possess most of the DNA markers and recombination.
Exercises and Assignments for Revision
• What are molecular markers?
• What are the differences between genetic mapping and RH
mapping?
• What are the limitations of HAPPY mapping for developing
a genome map?
• Which cereal genomes have been sequenced?
• What is comparative genome mapping?
Exercises for Readers to Explore the Topic Further
• What is the status of cereal crop genome sequencing
projects?
• How many wheat chromosomes are sequenced to date?
See also: Genetics of Grains: Wheat Genetics; Wheat Genomics.
Further Reading
Appels R, Morris R, Gill B, and May C (1998) Chromosome Biology. Boston, MA: Kluwer Academic, p. 401.
Devos KM and Gale MD (2000) Genome relationships: The grass model in current research. Plant Cell 12: 637–646.
Faris JD, Friebe B, and Gill BS (2002) Wheat genomics: Exploring the polyploid model. Current Genomics 3: 577–591.
Feuillet C and Keller B (1999) High gene density is conserved at syntenic loci of small and large grass genomes. Proceedings of the National Academy of Sciences of the United States of America 96: 8265–8270.
Jiang J and Gill BS (1994) Nonisotopic in situ hybridization and plant genome mapping: The first 10 years. Genome 37: 717–725.
Jiang JM and Gill BS (2006) Current status and the future of fluorescence in situ hybridization (FISH) in plant genome research. Genome 49: 1057–1068.
Lander ES and Botstein D (1989) Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics 121: 185–199.
Liu BH (1997) Statistical Genomics: Linkage, Mapping and QTL Analysis. Boca Raton, FL: CRC Press.
McCarthy LC (1996) Whole genome radiation hybrid mapping. Trends in Genetics 12: 491–493.
Paterson AH (1996) Making genetic maps. In: Paterson AH (ed.) Genome Mapping in Plants, pp. 23–39. Austin, TX: R G Landes Company.
Paux E, Sourdille P, Mackay I, and Feuillet C (2012) Sequence-based marker development in wheat: Advances and applications to breeding. Biotechnology Advances 30: 1071–1088.
Redei GP (1999) Genetics Manual. Singapore: World Scientific, pp. 1141.
Tanksley SD, Ganal MW, and Martin GB (1995) Chromosome landing: A paradigm for map-based gene cloning in plants with large genomes. Trends in Genetics 11: 63–68.
Tanksley SD, Young ND, Paterson AH, and Bonierbale MW (1989) RFLP mapping in plant breeding: New tools for an old science. Biotechnology 7: 257–263.