
Genome Mapping
VK Tiwari, Kansas State University, Manhattan, KS, USA
JD Faris, USDA-ARS Cereal Crops Research Unit, Fargo, ND, USA
B Friebe, Kansas State University, Manhattan, KS, USA
BS Gill, Kansas State University, Manhattan, KS, USA

© 2016 Elsevier Ltd. All rights reserved.

Topic Highlights

• Molecular markers are important for genetic and genome mapping studies.
• Next-generation sequencing-based marker genotyping, such as genotyping by sequencing, is an important aid for gene and genome mapping.
• Single-nucleotide polymorphism-based markers: their development and detection.
• Genome mapping methods use recombination-dependent and recombination-independent approaches.
• Comparative mapping is an important tool for genome analysis in crops for which sequence information is not available.

Learning Objective

• To understand the commonly used molecular markers and the approaches used for genome mapping.

Introduction

Genome mapping is used to assign short DNA sequences (molecular markers) or specific genes to particular regions of chromosomes and to determine their relative linear order and the distances between them. A map is an essential tool for scientists to navigate across the genome. Genome maps can be divided into two groups: genetic maps and physical maps. Genetic maps are based on recombination frequencies between genetic markers and genes; linked markers and genes form linkage groups that show their relative order. A physical map of a given chromosome or genome shows the physical locations of genes and other DNA sequences of interest, and distances are typically measured in base pairs. Physical maps can be divided into three general types: chromosomal or cytogenetic maps, radiation hybrid (RH) maps, and sequence maps. The ultimate physical map is the complete sequence itself.

Molecular Markers and Their Visualization

DNA-based genetic markers rely on differences in DNA sequence (polymorphisms) between two parental lines. Polymorphisms can arise from various events, such as mutations, errors in DNA replication, and insertions, inversions, and deletions of DNA fragments, which lead either to nucleotide changes or to differences in the lengths of DNA segments.

There are several established approaches for detecting polymorphisms with molecular markers, including restriction fragment length polymorphism (RFLP), random amplified polymorphic DNA (RAPD), amplified fragment length polymorphism (AFLP), sequence-tagged sites (STSs), microsatellites or simple sequence repeats (SSRs), and single-nucleotide polymorphisms (SNPs). Originating in the 1980s, RFLP markers were the first type of DNA-based marker to be used. RFLP analysis involves digesting DNA with a restriction enzyme, which cleaves DNA at specific palindromic recognition sequences, and hybridizing a short labeled DNA fragment, or probe, to the digested DNA. The probe label reveals the restriction fragment to which the probe has hybridized, and polymorphisms are revealed when an insertion/deletion between critical restriction sites occurs in one genotype but not the other, or when a particular restriction site is abolished by mutation in one genotype and not the other. RFLP markers can be applied to essentially any organism, and they are still employed to a limited extent today because of their usefulness in comparative mapping analysis and map-based cloning studies. However, these markers are not amenable to high-throughput analysis, and they are difficult and laborious to handle because of the large amounts of DNA required and the enzymatic digestion, Southern blotting, and probe labeling steps involved.
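The RFLP principle can be illustrated with a minimal in-silico digest. The sketch below assumes EcoRI (recognition site GAATTC, cut between G and A) as the enzyme and uses two made-up sequences that differ only by a point mutation abolishing the second site; the function name `digest` is hypothetical.

```python
import re


def digest(seq: str, site: str = "GAATTC", cut_offset: int = 1) -> list[int]:
    """Return fragment lengths (bp, in sequence order) after cutting `seq`
    at every occurrence of `site`. `cut_offset` is where the top strand is
    cut within the site (EcoRI cuts G^AATTC, so the offset is 1)."""
    # A zero-width lookahead also catches overlapping occurrences of the site.
    starts = [m.start() for m in re.finditer(f"(?={re.escape(site)})", seq)]
    bounds = [0] + [s + cut_offset for s in starts] + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:]) if end > start]


# Two illustrative alleles: in genotype 2 the second EcoRI site carries a point mutation.
genotype1 = "A" * 100 + "GAATTC" + "T" * 200 + "GAATTC" + "C" * 50
genotype2 = "A" * 100 + "GAATTC" + "T" * 200 + "GATTTC" + "C" * 50

print(digest(genotype1))  # [101, 206, 55]
print(digest(genotype2))  # [101, 261]
```

A probe hybridizing to the central region would detect a 206-bp band in genotype 1 and a 261-bp band in genotype 2, which is the length polymorphism scored as an RFLP.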

Apart from RFLPs, all the other marker types listed are based on the polymerase chain reaction (PCR). PCR-based markers require the development of oligonucleotide primers, short DNA fragments typically 15–30 nucleotides in length, which serve as starting points for PCR amplification of template DNA. In a PCR reaction, template DNA is mixed with primers, nucleotides, and a heat-stable DNA polymerase called Taq polymerase, which synthesizes new DNA strands. The mixture is placed into a thermal cycler and subjected to repeated temperature cycles that allow the template DNA to denature, the oligonucleotide primers to anneal to complementary sites on the template DNA, and the Taq polymerase to catalyze the synthesis of new DNA strands, generating billions of copies of the target sequence. After the PCR is complete, the amplified product is electrophoresed through an agarose or polyacrylamide gel and visualized by DNA staining or other technologies.
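How primer positions define the PCR product, and why repeated cycles yield so many copies, can be sketched as follows. This assumes exact primer matches and ideal doubling per cycle; the primer sequences and function names are hypothetical.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def predict_amplicon(template: str, fwd: str, rev: str) -> Optional[str]:
    """Return the top-strand PCR product, assuming both primers match exactly.

    The forward primer's sequence appears as-is on the top strand; the reverse
    primer anneals to the top strand, so its reverse complement appears
    downstream of the forward site."""
    start = template.find(fwd)
    if start == -1:
        return None
    rev_site = template.find(reverse_complement(rev), start + len(fwd))
    if rev_site == -1:
        return None
    return template[start:rev_site + len(rev)]


def copies_after(cycles: int, initial: float = 1.0, efficiency: float = 1.0) -> float:
    """Ideal copy number: each cycle multiplies the target by (1 + efficiency)."""
    return initial * (1 + efficiency) ** cycles


fwd = "ACGTTGCAAGGCTTACGGAT"  # made-up 20-nt primers
rev = "TTGACCGTAGGCATCCTAGC"
template = "CCCC" + fwd + "A" * 30 + reverse_complement(rev) + "GGGG"

print(len(predict_amplicon(template, fwd, rev)))  # 70
print(copies_after(30))  # 1073741824.0
```

At perfect efficiency, 30 cycles give 2^30, or about 10^9 copies per starting template molecule; real reactions plateau below this ideal.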

RAPD markers are DNA fragments generated by PCR amplification of random segments of genomic DNA with a single primer of arbitrary nucleotide sequence. RAPD markers were the first PCR-based markers to be used, but today they have very limited application in molecular biology and mapping studies because of the unpredictable behavior of short primers in PCR and their low repeatability.
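Because a RAPD reaction uses one arbitrary primer, a product forms only where that primer anneals in both orientations within amplifiable distance. The sketch below assumes exact matches and an arbitrary maximum product size; the primer and template are made up and `rapd_products` is a hypothetical name.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def rapd_products(template: str, primer: str, max_len: int = 3000) -> list[int]:
    """Sizes of fragments a single arbitrary primer could amplify from `template`.

    A product needs the primer's own sequence on the top strand (product start)
    and its reverse complement further downstream (product end)."""
    k = len(primer)
    rc = reverse_complement(primer)
    starts = [i for i in range(len(template) - k + 1) if template.startswith(primer, i)]
    ends = [i + k for i in range(len(template) - k + 1) if template.startswith(rc, i)]
    # Keep products long enough to hold both primer sites and short enough to amplify.
    return sorted(e - s for s in starts for e in ends if 2 * k <= e - s <= max_len)


primer = "AGGCTTACGA"  # made-up 10-mer
template = primer + "C" * 100 + reverse_complement(primer) + "A" * 20 + reverse_complement(primer)
print(rapd_products(template, primer))  # [120, 150]
```

A single-base change in one of these binding sites in another genotype would remove a band, which is why RAPD polymorphisms are usually scored as band presence or absence.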

AFLPs, which combine the use of restriction enzymes with PCR, have been used extensively in a wide range of organisms.

Encyclopedia of Food Grains http://dx.doi.org/10.1016/B978-0-12-394437-5.00220-5 1

The AFLP technique uses restriction enzymes to digest genomic DNA, followed by ligation of adapters to the sticky ends of the restriction fragments to serve as priming sites for PCR. Subsets of the restriction fragments are selected by using primers whose sequences are complementary to the adapter sequence and that extend one or two nucleotides into the restriction fragments of the template DNA. The reactions often employ end-labeled radioactive or fluorescent primers for visualization of the amplified products on polyacrylamide gels. The AFLP technique is highly sensitive and reproducible and can detect polymorphisms in many genomic regions simultaneously. Compared with some other marker techniques, AFLP offers higher reproducibility, resolution, and sensitivity at the whole-genome level, and it can amplify many fragments (50–100) in a single PCR, which provides a high-throughput format.
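Selective amplification can be sketched in silico. This assumes the commonly used EcoRI (rare cutter) and MseI (frequent cutter) pair, measures fragments between cut positions while ignoring overhangs and adapters, and applies selective bases on the EcoRI side only; the sequence and the name `aflp_fragments` are illustrative.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def site_starts(seq: str, motif: str) -> list[int]:
    return [m.start() for m in re.finditer(f"(?={motif})", seq)]


def aflp_fragments(seq: str, selective: str, rare: str = "GAATTC", frequent: str = "TTAA") -> list[int]:
    """Lengths of rare/frequent-cutter fragments whose rare-cutter end is
    followed, reading inward, by the `selective` bases.

    Simplifications: both enzymes cut after the first base of their sites
    (EcoRI G^AATTC, MseI T^TAA); only the rare-cutter primer is selective."""
    cuts = sorted([(s + 1, "E", s) for s in site_starts(seq, rare)] +
                  [(s + 1, "M", s) for s in site_starts(seq, frequent)])
    n = len(selective)
    kept = []
    for (left, left_enz, left_site), (right, right_enz, right_site) in zip(cuts, cuts[1:]):
        if {left_enz, right_enz} != {"E", "M"}:
            continue  # E-E and M-M fragments are ignored in this sketch
        if left_enz == "E":
            # EcoRI end on the left: read inward on the top strand.
            inward = seq[left_site + len(rare):left_site + len(rare) + n]
        else:
            # EcoRI end on the right: read inward on the bottom strand.
            inward = seq[right_site - n:right_site].translate(COMPLEMENT)[::-1]
        if inward == selective:
            kept.append(right - left)
    return kept


seq = "C" * 20 + "GAATTC" + "A" + "C" * 40 + "TTAA" + "G" * 25 + "C" + "GAATTC" + "C" * 10
print(aflp_fragments(seq, "A"))  # [47]
print(aflp_fragments(seq, "G"))  # [30]
```

Each extra selective base cuts the number of amplified fragments roughly fourfold, which is how AFLP keeps 50–100 bands per lane.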

STSs are short DNA sequences (200–500 bp) with known genomic locations. STSs can be easily detected by PCR using specific primers. In complex genomes, STS markers derived from the coding regions of genes, that is, from the expressed portion of the genome represented by expressed sequence tags (ESTs), can be a very useful resource for mapping the locations of expressed genes. These markers are usually codominant, which allows homozygous and heterozygous individuals to be distinguished in a mapping population. An STS may contain repetitive elements as long as the sequences at both ends of the site are unique and conserved, and in a broad sense, an STS can provide a site for other markers such as microsatellites, sequence-characterized amplified regions, cleaved amplified polymorphic sequences, and inter-simple sequence repeats.

Microsatellite markers, also called SSRs, are widely used in gene and genome mapping studies. These are simple tandem sequence repeats, and the repeat units are generally di-, tri-, tetra-, or pentanucleotides. In a trinucleotide repeat motif such as (GAA)n in wheat, the three-nucleotide unit GAA is repeated a variable number of times in a bead-like fashion (n can range from 8 to 50). SSRs are usually found in noncoding regions of DNA, with a few exceptions. The repeat is flanked on both sides by unique DNA, and these flanking regions are essential for developing locus-specific primers to amplify SSRs by PCR. The number of repeats within a microsatellite tends to be highly variable within a given species, which leads to a high frequency of polymorphism even among closely related individuals.
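Finding SSRs in a sequence amounts to searching for a short motif repeated in tandem. The sketch below scans for perfect repeats of 2–5-bp motifs with a regular-expression backreference; the minimum copy number of 5 and the function names are assumptions, not a published SSR-mining standard.

```python
import re


def is_primitive(unit: str) -> bool:
    """True if `unit` is not itself a repeat of a shorter motif ('ATAT' is not)."""
    k = len(unit)
    return not any(k % d == 0 and unit == unit[:d] * (k // d) for d in range(1, k))


def find_ssrs(seq: str, min_unit: int = 2, max_unit: int = 5, min_repeats: int = 5):
    """Return (start, motif, copies) for perfect tandem repeats in `seq`."""
    hits = []
    for k in range(min_unit, max_unit + 1):
        # ([ACGT]{k}) captures one unit; \1{m,} requires at least m further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):  # skip e.g. 'AA' runs or 'ATAT' (found as 'AT')
                hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


print(find_ssrs("CTAG" + "GAA" * 10 + "TTCG"))  # [(4, 'GAA', 10)]
```

The unique sequences on either side of each hit (`seq[:start]` and the bases after the repeat) are where locus-specific primers would be designed.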

Many large and complex genomes, especially those of some plants, consist of only about 10–20% gene sequences, whereas the vast majority (80–90%) consists of transposable element (TE)-related or other repeat-based sequences. These repetitive elements, or TEs, are widespread throughout the genome and therefore represent a useful resource for whole-genome mapping. They have higher tolerance for mutations and rearrangements, which makes TEs highly polymorphic and a good source of markers for genome mapping. Various TE-based marker development approaches have been used, and some of the most common repeat-based markers, which were developed in wheat, belong to two classes: insertion site-based polymorphism markers and repeat junction markers. These markers are based on PCR with primers designed in conserved regions of TEs. In general, repeat sequences in the genome are not unique, but their insertion sites or repeat junctions are. Therefore, by developing primers that are specific to particular insertion sites or repeat junctions, it is possible to develop genome-specific markers (Figure 1). After an insertion site or repeat junction has been identified, its flanking sequences can be used to design the primers. Once the fragment has been PCR-amplified, various detection methods are available for visualizing the marker polymorphisms, including high-resolution melting analysis, temperature gradient capillary electrophoresis, and fluorescent capillary electrophoresis.
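Given TE annotations for a sequence, the junctions and the flanking windows used for primer design can be extracted as sketched below. The annotation intervals are a hypothetical input (0-based, half-open), and `junction_flanks` is an illustrative name; nested TEs as in Figure 1(d) simply contribute extra boundaries.

```python
def junction_flanks(seq: str, te_intervals: list[tuple[int, int]], flank: int = 25):
    """For each TE boundary inside `seq`, return (position, left, right), where
    `left` and `right` are the `flank` bases on either side of the junction."""
    boundaries = sorted({b for start, end in te_intervals for b in (start, end)
                         if 0 < b < len(seq)})
    return [(b, seq[max(0, b - flank):b], seq[b:b + flank]) for b in boundaries]


# Hypothetical layout: gene fragment | TE | unknown sequence (Figure 1(c)-like).
seq = "G" * 50 + "T" * 60 + "C" * 40
for pos, left, right in junction_flanks(seq, [(50, 110)], flank=10):
    print(pos, left, right)
```

A primer placed in the unique window on one side of a junction, paired with one in the TE, amplifies only that locus even though the TE body occurs many times in the genome.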

With advances in next-generation sequencing (NGS) technology, it has become less expensive to determine the DNA sequence of a fragment, and this has led to dramatic advances in high-throughput marker technologies. With restriction site-associated DNA (RAD) markers, the DNA sequence flanking each restriction site is an integral component for the isolation of restriction site-associated tags. Because only these flanking sequences are used, RAD tag techniques are referred to as a reduced-representation method. The RAD tag isolation procedure has been modified for high-throughput sequencing on the Illumina platform to reduce error rates and increase throughput. Isolated RAD tags can be used to identify and genotype DNA sequence-based polymorphisms such as SNPs, and these polymorphic sites are called RAD markers.
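The reduced-representation idea can be sketched by extracting a fixed-length tag on each side of every restriction site, so that only a small fraction of the genome is sequenced. The sketch assumes an SbfI-like 8-bp site (CCTGCA^GG) and a 20-bp tag length purely as examples; `rad_tags` is a hypothetical name.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def rad_tags(seq: str, site: str = "CCTGCAGG", cut_offset: int = 6, tag_len: int = 20):
    """Return (left_tag, right_tag) for each restriction site in `seq`.

    Both tags are read outward from the cut, as reads starting at the site
    would be; the left tag is therefore reverse-complemented."""
    tags = []
    for m in re.finditer(f"(?={site})", seq):
        cut = m.start() + cut_offset
        right = seq[cut:cut + tag_len]
        left = seq[max(0, cut - tag_len):cut].translate(COMPLEMENT)[::-1]
        tags.append((left, right))
    return tags


print(rad_tags("A" * 30 + "CCTGCAGG" + "T" * 30))
```

Comparing the tags recovered at the same site from different individuals is what reveals the SNPs scored as RAD markers.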

GENETICS OF GRAINS | Genome Mapping

[Figure 1 schematic; legend: gene or unknown sequence, transposable sequences, TE junction; panels (a)–(d)]

Figure 1 Types of repeat junctions in a given genomic DNA sequence that can be used for designing unique locus-specific markers: (a) A repeat junction between two different transposable elements (TEs). (b) Two repeat junctions with two different TEs (black and green) and an unknown sequence (pink). (c) Repeat junction with a TE on one side and a gene fragment or unknown sequence on the other side. (d) Two repeat junctions (nested) created by a TE inserting into another TE.

The advent of automated Sanger sequencing, and especially recent advances in NGS technologies, led to the development of a second generation of markers based on sequence information. SNPs are single-nucleotide differences (A, T, C, or G) at a given locus between different individuals, populations, or parental lines (Figure 2). When such variation occurs among members of the same population, the variants are considered alleles (e.g., A or T), and most SNPs have only two alleles. SNPs have emerged as the markers of choice because of their abundance and their suitability for high-throughput detection. SNPs can be identified in many ways, ranging from low-throughput methods such as PCR amplification followed by electrophoresis, sequence detection, and mass spectrometry to high-throughput NGS-based SNP discovery. After sequences have been generated for SNP discovery, the next step is to detect useful SNPs. Manual identification of putative SNPs was once a major bottleneck for high-throughput SNP calling, but numerous software programs are now available for SNP discovery. These programs (CASAVA, GS Amplicon, BioScope™, NextGENe®, GigaBayes, SNPdetector, PolyScan, etc.) have been very important in the development of accurate computational methods for automated SNP calling. Established approaches and protocols for SNP discovery exist for many species; in species with a reference genome sequence, NGS reads can be mapped onto the reference sequence and SNPs discovered from the alignments. However, SNP discovery is also possible in species without a reference sequence. Many assays are available for SNP genotyping, including Illumina GoldenGate, KASPar, iPLEX Gold technology, and Illumina BeadChips, to name a few (Figure 3(a) and 3(b)).

[Figure 2: alignment of a 60-bp genomic DNA sequence from two genotypes]
Genotype 1  TTGGCCTGATTTTAGTGGTACGGCCCCGTCACCCGTGATTGGTGAAGTTGGAATGGAGGA
Genotype 2  TTGGCCTGATTTTAGTGGTATGGCCCCGTCACCCGTGATTGGTCAAGTTGGAATGGAGGA

Figure 2 Identification of SNP sites in a DNA sequence: Two SNPs between the genomic DNA of two genotypes are shown. The sequence is 60 base pairs long, and genotypes 1 and 2 show variants at positions 21 [C/T] and 44 [G/C].

[Figure 3 schematics: (a) Illumina Infinium assay; (b) TaqMan assay]

Figure 3 (a) Hybridization-based SNP genotyping method (Illumina Infinium assay): In this assay, the genomic DNA is captured by direct hybridization to array-bound target sequences (50 bases directly upstream of the SNP). Following hybridization, a single-base extension reaction with fluorescent dideoxynucleotides is performed at the target SNP nucleotide. Differences in the relative intensity of the fluorescent signals are used to make genotype calls. (b) PCR-based genotyping method (Applied Biosystems’ TaqMan assay): For each locus, two locus-specific primers are designed, one on each side of the SNP, to amplify the fragment spanning the polymorphic site. Two fluorescence resonance energy transfer (FRET)-labeled oligonucleotides, called TaqMan probes, are then added to the PCR. Each probe is specific to one of the alleles and is designed to hybridize at the SNP site between the forward and reverse primers. By design, each probe has a reporter dye at its 5′ end (different for each allele) and a quencher (Q) at its 3′ end. While a probe is intact, the reporter dye’s emission is suppressed by the quencher. During PCR amplification, Taq polymerase cleaves the probe annealed to the template, separating the reporter from the quencher and resulting in fluorescence emission from the reporter. Genotypes can then be called according to the fluorescent signal.

Exciting progress has been made in sequencing technologies that provide high-throughput molecular marker information at low cost. Genotyping by sequencing (GBS) provides marker polymorphisms using NGS technologies followed by a bioinformatics pipeline. It is a preferred method for several reasons, including reduced cost through an enzyme-based genome complexity reduction step and the use of barcoded adapters for multiplexing. Additionally, it can be used for the discovery and identification of SNPs, even in species with complex genomes that lack a reference sequence. GBS also has advantages for studying polyploid species, which are a big challenge for any technology: it relies on secondary, genome-specific polymorphisms next to the SNP, which allow a given sequence to be assigned to a specific genome so that it behaves as a single-locus marker.
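At its simplest, the SNP identification shown in Figure 2 compares two already-aligned sequences base by base. The sketch below uses the two 60-bp sequences from Figure 2; it assumes gap-free alignment and ignores read depth and base quality, which real SNP callers must model. `find_snps` is a hypothetical name.

```python
def find_snps(seq1: str, seq2: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, base in seq1, base in seq2) for each mismatch
    between two equal-length, gap-free aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return [(i + 1, a, b) for i, (a, b) in enumerate(zip(seq1, seq2)) if a != b]


genotype1 = "TTGGCCTGATTTTAGTGGTACGGCCCCGTCACCCGTGATTGGTGAAGTTGGAATGGAGGA"
genotype2 = "TTGGCCTGATTTTAGTGGTATGGCCCCGTCACCCGTGATTGGTCAAGTTGGAATGGAGGA"

print(find_snps(genotype1, genotype2))  # [(21, 'C', 'T'), (44, 'G', 'C')]
```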

Genetic Linkage Mapping

Markers are powerful tools for many diagnostic applications, such as typing biological samples to determine the identity of unknown samples, analyzing sample mixtures, supporting the criminal justice system, and curating biological collections, to name a few. High-density genetic linkage maps facilitate map-based cloning, quantitative trait mapping, marker-assisted breeding, and studies of comparative genome evolution. Genetic mapping relies on the fact that nuclear genomes are made up of chromosomes, which contain both genes and noncoding DNA. When homologous chromosomes pair at meiosis, they recombine at various positions along their length. Recombination is thus the basis for genetic linkage mapping and for determining the order of markers along a chromosome: markers are separated by genetic distances calculated from the amount of meiotic recombination that occurs between them.

An example of genetic linkage mapping of three linked markers in 20 F2 progeny is presented in Figure 4. The markers include two DNA markers (A and B) and one morphological marker (disease resistance gene ‘R’). The DNA markers are codominant, and therefore, all possible genotypes can be determined in the F2 progeny (homozygous for parent A, homozygous for parent B, and heterozygous). For the morphological marker, disease resistance is dominant, and therefore, the heterozygous class and the class homozygous for the resistant parent (parent A) cannot be distinguished (resistant plants can have the allelic composition ‘RR’ or ‘Rr,’ and susceptible plants have ‘rr’). Inspection of Figure 4 indicates that three individuals (2, 6, and 12) have genotypes that differ between markers A and B. Between A and R, two individuals (6 and 12) have differing genotypes, and one individual (2) has differing genotypes between markers B and R. This suggests that marker R (the disease resistance gene) lies between markers A and B. The two recombination events between markers A and R translate into ten map units (2/20 × 100 = 10), and there are five map units between markers B and R (1/20 × 100 = 5).

[Figure 4 image: scores for marker A, marker B, and the disease resistance (R) gene in parent A, parent B, the F1, and F2 progeny 1–20, with the resulting linkage map (marker A, R gene, marker B)]
Phenotypes (parent A, parent B, F1): R, S, R
Phenotypes (F2 progeny 1–20): R R S R R R R R S R S R R R R S R R R R
Phenotype: R = resistant; S = susceptible
Genotypes: Parent A = RR; Parent B = rr; F1 = Rr; F2 progeny R = RR or Rr, S = rr

Figure 4 Genotypic data for two DNA markers (A and B) and phenotypic data for one morphological marker (disease resistance gene ‘R’) for two parents, the F1 plant derived from crossing the two parents, and 20 F2 individuals. The DNA markers are codominant; thus, all possible genotypes can be distinguished (homozygous for parent A, heterozygous, and homozygous for parent B). The morphological marker ‘R’ is dominant, and therefore, the genotypes of resistant F2 individuals cannot be distinguished (resistant plants can be either homozygous for parent A (RR) or heterozygous (Rr)). The resulting genetic linkage map of the three loci and the genetic distances separating them are shown at the bottom.


This type of analysis can be applied to hundreds, or even thousands, of markers to construct complete genetic linkage maps of chromosomes. Fortunately, various computer software programs are available to handle such large data sets and to determine the most likely marker orders and intermarker distances.
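The arithmetic of the Figure 4 example can be sketched as follows: distances are computed as recombinants/N × 100, and the locus not in the most distant pair is placed in the middle. This mirrors the simple counting in the text, not the likelihood-based estimation and mapping functions that mapping programs typically use; the function names are illustrative.

```python
def map_units(recombinants: int, n: int) -> float:
    """Distance as computed in the Figure 4 example: recombinants / N x 100."""
    return 100 * recombinants / n


def order_three(distances: dict[frozenset, float]) -> list[str]:
    """Order three loci by placing the one outside the most distant pair in the middle."""
    outer = max(distances, key=distances.get)
    loci = set().union(*distances)
    (middle,) = loci - outer
    a, b = sorted(outer)
    return [a, middle, b]


# Counts of individuals with differing genotypes, as given in the text for Figure 4.
N = 20
distances = {
    frozenset({"A", "B"}): map_units(3, N),  # individuals 2, 6, 12
    frozenset({"A", "R"}): map_units(2, N),  # individuals 6, 12
    frozenset({"B", "R"}): map_units(1, N),  # individual 2
}
print(order_three(distances))  # ['A', 'R', 'B']
```

Here the two short intervals (10 + 5) add up to the A–B distance (15), a quick consistency check on the inferred order.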

The number of individuals surveyed in a mapping population determines the precision of the measured genetic distance. In the example, only 20 individuals were surveyed; if no recombinants were identified between two markers, this would translate into a genetic distance of 0 map units between them. If 100 individuals were surveyed, one or more recombinants might be identified, leading to a genetic distance of one or more map units. Generally, initial genetic maps of plant species are generated using 80–120 individuals, which allows the detection of recombination between markers one to three map units apart. This level of precision is considered acceptable, and at the same time, the amount of labor and cost is considered manageable. However, certain mapping experiments, such as map-based cloning of genes by chromosome walking, require much higher resolution in order to separate markers extremely close to the target gene. In these experiments, it is not uncommon to survey 3000–5000 individuals to obtain the necessary level of precision.
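The link between population size and precision can be made concrete with a simple binomial model. The sketch assumes each individual contributes one informative meiosis (as in a backcross or DH population; an F2 individual carries two recombinant-informative gametes) and no scoring errors; function names are illustrative.

```python
import math


def p_detect(r: float, n: int) -> float:
    """Probability of seeing at least one recombinant among n informative meioses."""
    return 1 - (1 - r) ** n


def n_required(r: float, confidence: float = 0.95) -> int:
    """Smallest n with at least `confidence` chance of one or more recombinants."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - r))


for n in (20, 100):
    # 100 / n is the smallest nonzero distance that can be recorded with n individuals.
    print(n, "individuals:", 100 / n, "map units per recombinant;",
          "P(detect at 1 map unit) =", round(p_detect(0.01, n), 3))
print(n_required(0.01))  # 299
```

Under this model, even 100 individuals give only about a 63% chance of separating markers 1 map unit apart, which is why fine-mapping for chromosome walking uses thousands of individuals.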

In plants, most mapping populations are derived from crossing two highly homozygous parents. The population shown in the example in Figure 4 is an F2 population. While F2 populations are commonly used and are generally a good choice for chromosome mapping, other types of populations, such as backcross (BC), doubled-haploid (DH), and recombinant inbred (RI) populations, are also commonly used. However, DH technology is not easily accomplished in some crops, and it is currently impossible in others. Each type of population has its advantages and disadvantages. F2, BC, and DH populations can be developed very rapidly, whereas RI populations are developed by advancing each line by single-seed descent for many generations with the goal of selfing to homozygosity. F2 and BC populations are short-lived and provide limited opportunity to obtain DNA and phenotypic data, whereas DH and RI populations provide essentially pure lines that can be tested for traits in replicated experiments over several environments if desired. Thus, RI and DH populations are preferred for mapping quantitative traits that may be affected by environmental influences. BC and DH populations result from one cycle of meiosis, but an F2 population has undergone recombination in both male and female gametes and therefore provides twice the recombination information. RI populations have undergone several cycles of meiosis but contain two identical homologues and therefore provide about the same amount of information as an F2.
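Two standard results help quantify the RI discussion above: under selfing, the expected heterozygous fraction halves each generation, and, for RI lines selfed to homozygosity, the proportion of lines recombinant between two loci is R = 2r/(1 + 2r) (the classical Haldane and Waddington result), larger than the single-meiosis fraction r because of the extra meioses. The sketch assumes no selection or segregation distortion; function names are illustrative.

```python
def heterozygosity(generation: int) -> float:
    """Expected heterozygous fraction per locus in generation F_n under selfing (F2 = 0.5)."""
    if generation < 2:
        raise ValueError("defined from F2 onward")
    return 0.5 ** (generation - 1)


def ril_recombination(r: float) -> float:
    """Fraction of selfed RI lines recombinant between two loci: R = 2r / (1 + 2r)."""
    return 2 * r / (1 + 2 * r)


for g in (2, 4, 6, 8):
    print(f"F{g}: {heterozygosity(g):.4f} heterozygous")
print(round(ril_recombination(0.05), 4))  # 0.0909
```

By F8, less than 1% of loci remain heterozygous, which is why RI lines behave as essentially pure lines for replicated trials.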

The development and analysis of genetic linkage maps yield an abundance of information regarding genome structure. From a more applied perspective, they provide knowledge regarding the locations of genes and the DNA markers associated with them. In a segregating population, morphological markers can be scored and analyzed in the same manner as DNA markers. The difference is that, for morphological markers, the genotype is inferred from the plant’s visible phenotype, whereas DNA markers are scored at the DNA level. For example, individuals in a population segregating for resistance to a particular disease would each be scored as one of the two parental types based on their reaction to the disease. Including these phenotypic data with the genotypic DNA marker data for map generation might reveal that the disease resistance gene is flanked by closely linked DNA markers. Such markers are valuable tools for plant breeders who wish to move the disease resistance gene into elite lines to develop new and improved varieties. Using markers to make selections is known as marker-assisted selection (MAS). MAS has an advantage over selecting for the trait itself in that markers are not affected by environmental factors, as phenotypic traits sometimes are. In addition, MAS allows breeders to make selections in early generations and growth stages, allowing them to eliminate undesirable material early on.
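Why flanking markers are valuable for MAS can be shown with a back-of-the-envelope calculation. The sketch assumes no crossover interference and treats the Figure 4 map distances (10 and 5 map units) as recombination fractions of 0.10 and 0.05, an approximation that holds only at small distances; the function names are illustrative.

```python
def escape_prob_single(r: float) -> float:
    """Chance that one selected marker allele has separated from the gene."""
    return r


def escape_prob_flanking(r_left: float, r_right: float) -> float:
    """Chance that both flanking marker alleles are kept but the gene is lost:
    this needs a crossover on each side (no interference assumed)."""
    return r_left * r_right


print(escape_prob_single(0.05))                      # selecting on marker B alone
print(round(escape_prob_flanking(0.10, 0.05), 4))    # selecting on both A and B
```

Selecting on both flanking markers cuts the expected error from about 5% to about 0.5% in this example.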

In summary, genetic mapping is a great resource for trait mapping and map-based cloning studies; however, a genetic map alone is not sufficient for sequencing a genome. A low level of polymorphism is often a limitation for genetic mapping, but advanced, low-cost NGS approaches have greatly helped to overcome it. GBS has been widely adopted and is now being used to map target traits in various crops. In addition, sequence-based genome mapping, in which members (mostly RI lines) of a given mapping population are sequenced to high coverage, is gaining momentum. This approach is very useful for generating a large number of sequence tags, which can be assembled and anchored to the genetic map chromosome by chromosome. However, precise ordering of these tags or contigs will remain an issue because genetic mapping is limited by the number of recombination events. The resolution of a genetic map depends on the number of recombination events scored in a given population, and recombination events are not uniformly distributed along the chromosome because recombination is suppressed around the centromeric regions. Reduced or nearly absent recombination therefore limits the resolving power of linkage analysis, which means that genes several kilobases to megabases apart may appear at the same position on the genetic map.

Physical Mapping

In contrast to genetic mapping, where distances between landmarks are calculated based on recombination frequency, physical mapping determines the actual physical distance. Physical mapping can be done cytologically by staining and viewing whole chromosomes using techniques such as in situ hybridization (ISH) and C-banding. Such techniques have very low resolution in terms of physical mapping because chromosomes are viewed at the cellular level, usually at metaphase. However, more recent techniques such as fiber-fluorescence in situ hybridization (fiber-FISH), in which nuclei are lysed on a glass slide and the released DNA is used for in situ mapping, can provide much higher resolution (see succeeding text). The highest-resolution physical map is obtained by sequencing the DNA itself. Sequencing is usually preceded by constructing local contiguous sequences (contigs) of large-insert DNA clones and anchoring the contigs to a genetic map.


In Situ Hybridization

The ISH technique was developed about 45 years ago and allows genes or DNA sequences to be localized directly on chromosomes in cytological preparations. ISH uses probe DNA labeled with biotinylated dUTP or digoxigenin-dUTP, and the hybridization sites are detected by enzymatic reporter molecules such as horseradish peroxidase- or alkaline phosphatase-conjugated avidin/streptavidin. ISH has been used successfully to determine the physical location and distribution of dispersed or tandemly repeated DNA sequences on individual chromosomes. For example, it has been used to determine the physical location of multicopy gene families such as the 5S and 18S–26S ribosomal genes.

FISH uses fluorochromes for signal detection. The FISH technique allows different DNA probes to be labeled with different fluorochromes that emit different colors (multicolor FISH), so the physical order of two or more probes on a chromosome can be determined simultaneously. FISH also allows more precise mapping of probes because the fluorescence signals can be analyzed with special cameras and digital imaging tools.

In humans, the order of two DNA probes can be determined by ISH on metaphase chromosomes only if the two sequences are separated by at least 1 Mb. However, when ISH is done using interphase nuclei, DNA sequences separated by as little as 50 kb can be resolved. Plant metaphase chromosomes are more condensed than human metaphase chromosomes, which may be one reason why ISH using low-copy probes is more difficult in some plant species. It has therefore been suggested that interphase nuclei could be exploited for ISH mapping in plants. Subsequent experiments in which DNA probes were hybridized to maize interphase nuclei suggested that the resolving power of interphase FISH mapping can be as fine as 100 kb.

The FISH technique has been used successfully to determine the physical location of bacterial artificial chromosome (BAC) clones on interphase and metaphase chromosomes. Rice BAC clones have been hybridized to rice (Oryza sativa L.) chromosomes, revealing that the repetitive DNA sequences in the BAC clones could be efficiently suppressed by using rice genomic DNA as a competitor in the hybridization mixture. The successful application of this technique to plants with very large genomes may depend on the size of the genomic clones analyzed and the amount of repetitive sequence in the genome.

Fiber-FISH

The fiber-FISH technique uses chromatin DNA extended across a glass slide. A probe labeled as for standard FISH is hybridized to the extended fibers, and DNA sequences that are only a few kilobases apart can be ordered. In humans, fiber-FISH has been used to analyze overlapping clones, detect chromosomal rearrangements, determine the physical distances between genes, measure the sizes of long DNA loci, and aid in the positional cloning of specific genes. In Arabidopsis thaliana, fiber-FISH was used to measure clusters of DNA repeats as long as 1.71 Mb, which is more than 1% of the Arabidopsis genome. It was found that fiber-FISH signals derived from small DNA fragments (<3 kb) were often observed as single spots on extended DNA fibers, and thus, sequences that are less than 5–10 kb apart cannot be ordered.

Single-Copy Gene FISH

Single-copy gene FISH is an approach for developing a cytogenetic map of a given chromosome using full-length cDNA (fl-cDNA) probes. Because genes and syntenic gene blocks are conserved among grass species such as wheat, barley, rice, and maize, single-copy FISH provides a rapid method for determining chromosome synteny in species for which little genetic or cytogenetic mapping information is available. When important genes are transferred from wild relatives to bread wheat (Triticum aestivum L., 2n = 6x = 42, AABBDD) by induced homoeologous recombination, it is important to know the chromosomal relationships of the species involved. Single-copy FISH provides a powerful and rapid method for determining the genetic relationships of relatively little-studied wild relatives to wheat. Once identified as single-copy gene markers, fl-cDNA probes are used for FISH, and the respective positions of these probes are determined to develop a cytogenetic map. This technique can also be used to identify structural changes between the homoeologous groups of chromosomes, between the genomes of wheat, and between wheat and other species of the tribe Triticeae. This provides important information on the strategies to be used for exploiting those species for wheat improvement.

Aneuploid Mapping

Wheat is a polyploid and can tolerate a high degree of aneuploidy (abnormal chromosome numbers). A vast array of aneuploid stocks is available, such as nullisomic–tetrasomic (NT) lines and ditelosomic (dt) lines. NT lines lack one pair of chromosomes, which is compensated for by an extra pair of a homoeologous chromosome, and allow genes to be assigned to individual chromosomes. Ditelosomic lines lack one pair of chromosome arms and allow arm mapping of genes.

With today’s molecular technology, the power and utility of the wheat aneuploids have been realized even more fully. DNA markers can be quickly located to a specific chromosome or chromosome arm using a single hybridization or amplification reaction, without the need for polymorphism. Telocentric chromosomes can be flow-sorted, and their DNA amplified and used in NGS for marker development. Dense chromosome arm maps have been developed, and genes have been identified and ordered on specific chromosome arms. These maps are useful for gene tagging, linkage and mapping of quantitative trait loci (QTL), cytogenetic manipulations, estimation of genetic distance, and evolutionary studies.
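The logic of locating a marker with aneuploid stocks, as described above, can be sketched as two lookups: a band missing from exactly one nullisomic line assigns the marker to that chromosome, and a ditelosomic line then tells which arm carries it. The scoring dictionaries below are hypothetical inputs, and `locate_marker` is an illustrative name.

```python
from typing import Optional


def locate_marker(nulli_presence: dict, ditelo_presence: dict) -> Optional[str]:
    """Assign a marker band to a chromosome, then to an arm.

    nulli_presence: chromosome (e.g. '1A') -> True if the band is still present
        in the line nullisomic for that chromosome.
    ditelo_presence: arm retained by a ditelosomic line (e.g. '1AS') -> True if
        the band is present in that line.
    """
    missing = [chrom for chrom, present in nulli_presence.items() if not present]
    if len(missing) != 1:
        return None  # not lost in exactly one nullisomic: unassigned or multi-locus
    chrom = missing[0]
    for arm, present in ditelo_presence.items():
        if arm[:-1] == chrom:
            other_arm = chrom + ("L" if arm.endswith("S") else "S")
            # Present in the line carrying only this arm -> on this arm;
            # absent -> on the missing arm.
            return arm if present else other_arm
    return chrom  # no informative ditelosomic line: chromosome-level assignment only


# Hypothetical scores for one marker on homoeologous group 1.
nulli = {"1A": False, "1B": True, "1D": True}
print(locate_marker(nulli, {"1AS": False}))  # 1AL
```

No polymorphism between parents is needed: the marker only has to produce a band in the euploid stock.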

Chromosome Deletion Mapping

A unique system in wheat is the use of gametocidal (Gc) factors to construct chromosome deletion lines. Gc chromosomes were introduced into wheat by interspecific hybridization with related Aegilops species and backcrossing. Plants monosomic for the Gc chromosome produce two types of gametes. Only gametes possessing the Gc chromosome are normal; gametes lacking it undergo structural chromosome aberrations and, in most cases, are nonfunctional. However, if the damage caused by chromosome breakage is not sufficient to kill the gamete, it may still function and be transmitted to the offspring.

The Gc system has been used to develop wheat lines with terminal chromosome deletions. These stocks have proved very useful for physically mapping genes and DNA markers to subarm locations, and deletion-based physical maps have been constructed for all seven homoeologous chromosome groups of wheat. In addition, chromosome bin maps of thousands of expressed genes of the wheat plant have been constructed using a set of wheat aneuploid and deletion lines (http://wheat.pw.usda.gov/wEST/binmaps/).
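Assigning a marker to a deletion bin amounts to finding the interval between neighboring deletion breakpoints. The following is a minimal Python sketch, assuming that each terminal deletion line is described by the fraction length (FL) of the arm it retains and that the marker is single copy; the line names and FL values are hypothetical.

```python
def deletion_bin(scores):
    """Place a marker in a deletion bin on one chromosome arm.

    scores: (line_name, fl, present) tuples for terminal-deletion lines of
        the same arm, where fl is the fraction of the arm retained
        (0 = centromere, 1 = telomere) and present is True if the marker
        is detected in that line.
    Returns (proximal_fl, distal_fl), the boundaries of the marker's bin.
    """
    kept = [fl for _, fl, present in scores if present]
    lost = [fl for _, fl, present in scores if not present]
    # Lines that still carry the marker broke distal to it, so the
    # smallest such FL is the distal boundary of the bin.
    distal = min(kept, default=1.0)
    # Lines that lost the marker broke proximal to it, so the largest
    # such FL is the proximal boundary of the bin.
    proximal = max(lost, default=0.0)
    if proximal >= distal:
        raise ValueError("inconsistent scores for a single-copy marker")
    return proximal, distal


# Hypothetical deletion lines and scores for one marker.
scores = [("del-a", 0.29, False), ("del-b", 0.55, True),
          ("del-c", 0.75, True), ("del-d", 0.76, True)]
print(deletion_bin(scores))  # (0.29, 0.55)
```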

HAPPY Mapping

Another genome mapping approach, known as HAPPY mapping, has also been used in genome mapping studies. This approach is based on haploid DNA samples analyzed using the polymerase chain reaction (hence HAPPY). HAPPY mapping requires neither marker polymorphism nor time-consuming population development. It is an in vitro approach for ordering DNA markers directly on native genomic DNA and is based on analyzing the segregation of markers amplified from high-molecular-weight genomic DNA. It is a three-step process. First, genomic DNA is isolated and broken into random fragments by gamma irradiation or mechanical shearing. The quality and integrity of the isolated DNA are the most important aspect of the technique, and various protocols have been tested to avoid unwanted mechanical breakage of the DNA molecules. Usually, living cells are embedded in agarose gel; during DNA extraction, long molecules of chromosomal DNA remain trapped and protected within the agarose. The high-quality DNA is then randomly fragmented by treatments such as mechanical shearing, gel melting, or irradiation (gamma rays or x-rays). The average size of the broken fragments depends on the radiation dose or the degree of shearing used. Second, a ‘mapping panel’ is developed: the broken DNA fragments are diluted to a very low concentration, and about 100 samples are dispensed into the wells of plates or into tubes. Because these samples are very small, each well or tube contains only a small, incomplete, random set of fragments. The third and final step is a highly sensitive PCR, followed by scoring each marker as present or absent in every sample of the HAPPY mapping panel.
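The three steps can be mimicked in a toy simulation. This Python sketch assumes a simplified model, not a published protocol: each sample carries fragments of a single chromosome copy, breaks fall at random with a chosen mean fragment length, and after dilution each fragment is independently retained in the sample with probability 0.5. Marker positions, chromosome length, and fragment size are hypothetical.

```python
import bisect
import random


def simulate_happy_panel(marker_positions, chrom_length, mean_fragment,
                         n_aliquots=96, retention=0.5, seed=1):
    """Simulate presence/absence scores for markers in a HAPPY-style panel.

    Simplified model (an assumption): one chromosome copy per sample,
    breakpoints placed by a Poisson process with mean spacing
    `mean_fragment` bp, and each fragment kept with probability `retention`.
    """
    rng = random.Random(seed)
    panel = []
    for _ in range(n_aliquots):
        # Step 1: random fragmentation (stands in for irradiation/shearing).
        breaks = []
        pos = rng.expovariate(1.0 / mean_fragment)
        while pos < chrom_length:
            breaks.append(pos)
            pos += rng.expovariate(1.0 / mean_fragment)
        # Step 2: dilution, so only a random subset of fragments is dispensed.
        kept = [rng.random() < retention for _ in range(len(breaks) + 1)]
        # Step 3: PCR scoring; a marker is present if its fragment was kept.
        row = [int(kept[bisect.bisect_right(breaks, m)]) for m in marker_positions]
        panel.append(row)
    return panel


def discordance(panel, i, j):
    """Fraction of samples in which markers i and j are scored differently."""
    return sum(row[i] != row[j] for row in panel) / len(panel)


# Hypothetical example: two markers 10 kb apart and one about 2 Mb away.
# A large panel is used here only to make the comparison stable.
panel = simulate_happy_panel([100_000, 110_000, 2_100_000],
                             chrom_length=3_000_000, mean_fragment=500_000,
                             n_aliquots=2000)
print(discordance(panel, 0, 1), discordance(panel, 0, 2))
```

Under these assumed parameters, the two close markers are expected to be scored differently in only about 1% of samples, whereas the distant pair should differ in roughly half of them; this contrast is the cosegregation signal used for mapping.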

Genotyping large sets of markers and analyzing the marker data in detail can be used to construct maps and to calculate the precise locations of markers on a given chromosome or genome. Each sample in a mapping panel is so small that it contains only a randomly sampled subset of the genome rather than the complete genome, so a given marker is present in only a subset of the panel samples. If two marker loci are close together, they usually remain on the same fragment, with no break between them, and are therefore detected together in the same samples; distant markers, in contrast, are often separated and lost independently. As the distance between two markers increases, so does the frequency of random breaks between them. Statistical analysis of the cosegregation frequencies, using various mapping software packages, can then deduce marker order from the data generated on the HAPPY mapping panel. The approach has certain limitations. First, it is difficult to prepare DNA fragments larger than a few megabases, and therefore intermarker distances of more than one megabase are difficult to measure. Another major limitation is the small amount of DNA in each sample of the mapping panel, because every marker has to be scored by PCR on that same limited material.
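Deducing marker order from cosegregation data can be illustrated with a deliberately simple criterion: choose the order that minimizes the summed discordance between adjacent markers. This brute-force Python sketch is an assumption made for illustration and is feasible only for a handful of markers; dedicated mapping software uses more elaborate statistics and search heuristics. The panel scores below are hypothetical.

```python
from itertools import permutations


def discordance(rows, i, j):
    """Fraction of panel samples in which markers i and j are scored differently."""
    return sum(r[i] != r[j] for r in rows) / len(rows)


def order_markers(names, rows):
    """Return (order, cost): the order with the lowest summed adjacent discordance."""
    n = len(names)
    if n < 2:
        return list(names), 0.0
    d = [[discordance(rows, i, j) for j in range(n)] for i in range(n)]
    best_order, best_cost = None, float("inf")
    for perm in permutations(range(n)):
        if perm[0] > perm[-1]:
            continue  # an order and its reverse describe the same map
        cost = sum(d[a][b] for a, b in zip(perm, perm[1:]))
        if cost < best_cost:
            best_order, best_cost = perm, cost
    return [names[k] for k in best_order], best_cost


# Hypothetical panel: one row per sample, one column per marker (1 = present).
names = ["C", "A", "D", "B"]
rows = [[1, 1, 1, 1], [0, 0, 0, 0], [0, 1, 0, 1], [1, 0, 1, 0],
        [1, 1, 0, 1], [1, 0, 1, 1], [0, 1, 0, 0], [0, 0, 1, 0]]
order, cost = order_markers(names, rows)
print(order, cost)  # ['A', 'B', 'C', 'D'] 0.75
```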

RH Mapping

RH mapping has been exploited in animal genome mapping projects and is a recombination-independent approach. It was pioneered in the human genetics arena and uses radiation-induced chromosome breakage rather than meiotic recombination for mapping. After fragmentation, samples containing different subsets of the original chromosome or genome are isolated and used for marker assays. In this method, any given mapping panel member is assayed for the presence or absence of a given marker, thus circumventing the need for marker polymorphisms between genotypes.

Goss and Harris produced the first RHs by irradiating cultured human cells with a high dose of x-rays and subsequently fusing them to unirradiated hamster cells. The resulting RHs contained many broken fragments of human chromosomes alongside the unfragmented hamster chromosomes. The approach was then modified and applied to a number of animal species. In the modified approach, donor cells are irradiated and then fused to unirradiated host cells, and RHs containing donor chromosome fragments are identified using selectable markers for a given species. Species-specific RHs can be isolated, cultured, and saved as an immortal resource.

For RH mapping of a genome, the DNA of about 100 hybrid cell lines (each containing a different set of donor fragments) can be assembled as an RH panel. The assembled panel is used for marker genotyping, and the order of the markers in the genome and the distances between them can then be inferred. Mapping resolution in an RH panel is a function of the size of the fragments generated during panel development; therefore, resolution can be altered simply by changing the level of chromosome fragmentation (for example, the radiation dose). Additionally, distances on RH maps reflect the true physical distances between markers better than recombination-based maps do, so RH maps can better approximate the physical layout of a given chromosome.
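The relationship between fragment breakage and RH map distance is commonly expressed through a two-point estimate of the breakage probability θ between two markers, converted to centiRays (cR) as D = −100 ln(1 − θ). The Python sketch below is a minimal version that assumes both markers share a single retention frequency estimated from the panel; the score vectors are hypothetical.

```python
import math


def rh_two_point(a, b):
    """Two-point RH estimate: breakage probability theta and distance in cR.

    a, b: presence (1) / absence (0) scores of two markers across the same
    RH panel lines. Simplifying assumption: both markers share one retention
    frequency R, so an unlinked pair is discordant with probability 2R(1 - R)
    and a pair broken apart with probability theta with 2*theta*R(1 - R).
    """
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("score vectors must be non-empty and of equal length")
    r = (sum(a) + sum(b)) / (2 * n)                      # pooled retention frequency
    discordant = sum(x != y for x, y in zip(a, b)) / n   # observed discordance
    unlinked = 2 * r * (1 - r)                           # expected if always broken apart
    if unlinked == 0:
        raise ValueError("markers retained in all or in no lines; theta is undefined")
    theta = discordant / unlinked
    if theta >= 1:
        return 1.0, math.inf                             # no evidence of linkage
    return theta, -100 * math.log(1 - theta)             # distance in centiRays


# Hypothetical scores for two markers across ten RH lines.
a = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
b = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
theta, cr = rh_two_point(a, b)
print(round(theta, 3), round(cr, 1))
```

Because θ reflects how often radiation breaks the chromosome between two markers, a higher dose raises θ for the same physical interval, which is why resolution can be tuned through the level of fragmentation.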

The RH approach has been used to map the human genome along with various animal genomes; however, its application in plants has been limited. RH mapping in plants was first reported for a maize chromosome and was subsequently applied to cotton, barley, and wheat. More recently, RH mapping was used for genome mapping of hexaploid wheat (Figure 5). Figure 5 presents a scheme for developing an RH panel for the D-genome chromosomes of hexaploid wheat. Pollen from the reference hexaploid wheat Chinese Spring was gamma-irradiated and used to pollinate the tetraploid wheat line Altar 84. The resulting F1 (pentaploid) seeds constitute the RH panel, and each plant grown from these seeds represents a unique RH event. Chromosome lesions induced in the A and B genomes of Chinese Spring are masked in these quasi-pentaploids by the A- and B-genome chromosomes from the tetraploid parent, whereas the D-genome chromosomes are present in a single copy, which allows all D-genome chromosomes to be RH mapped simultaneously. A small RH panel (about 94 lines) has been found to achieve a map resolution of up to about 300 kb throughout the length of any given chromosome in hexaploid wheat. The RH panel can be used to anchor BAC contigs, derived from flow-sorted chromosome arm-specific libraries, to individual wheat chromosomes and to order them. RH panels will also be highly useful in ongoing wheat genome sequencing projects for ordering sequence scaffolds.

Large-Insert Clone Contigs

The construction of physical contig maps is important for facilitating positional cloning of genes, sequencing of genomic DNA, and detailed analysis of chromosome and genome structure. Physical contig mapping is the arrangement of large-insert clones (YACs, BACs, and cosmids) in a linear array that represents the DNA sequence along the chromosome. Clones are selected by screening a library with the DNA probes used to detect genetic markers on a genetic linkage map of the organism. Several DNA probes that detect closely linked genetic loci will hybridize to the corresponding large-insert clones, and these clones can then be arranged into a contig based on overlapping segments identified by fingerprinting. BAC contigs are currently being developed in many crop species. However, crops with complex genomes pose major problems because of their large genome size, polyploid nature, and very high proportion of repetitive sequences.
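Arranging clones into contigs by fingerprint overlap can be sketched as grouping clones that share enough restriction fragments. This Python example is a simplified illustration, not the scoring used by dedicated fingerprint-assembly software. It assumes that fingerprints have already been reduced to sets of band sizes matched exactly, that two clones overlap when they share at least half the bands of the smaller fingerprint (an arbitrary threshold), and it merges overlapping pairs into contigs with a union-find structure. Clone names and band sizes are hypothetical.

```python
def build_contigs(fingerprints, min_shared=0.5):
    """Group clones into contigs by single-linkage on shared fingerprint bands.

    fingerprints: {clone_name: set of band sizes (bp)}.
    Two clones are treated as overlapping if they share at least `min_shared`
    of the bands of the smaller fingerprint.
    """
    names = sorted(fingerprints)
    parent = {name: name for name in names}

    def find(x):
        # Follow parent links to the representative clone of x's contig.
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            fa, fb = fingerprints[a], fingerprints[b]
            smaller = min(len(fa), len(fb))
            if smaller and len(fa & fb) / smaller >= min_shared:
                parent[find(a)] = find(b)  # merge the two contigs

    contigs = {}
    for name in names:
        contigs.setdefault(find(name), []).append(name)
    return sorted(contigs.values())


# Hypothetical fingerprints (band sizes in bp, already binned).
fingerprints = {
    "bacA": {1100, 1450, 2100, 3300, 4000, 5200},
    "bacB": {3300, 4000, 5200, 6100, 7400, 8800},
    "bacC": {6100, 7400, 8800, 9500, 10200, 11800},
    "bacD": {1250, 2600, 3900, 4700},
    "bacE": {3900, 4700, 5600, 6900},
}
print(build_contigs(fingerprints))
```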

To address these issues in wheat, a sophisticated flow-sorting technique was applied to isolate individual chromosomes or chromosome arms. DNA from these flow-sorted chromosomes and arms was used to develop BAC libraries, which laid the foundation for physical mapping of the wheat genomes. Once a physical contig map is complete, the structure and organization of the genome, such as the distribution of repetitive and single-copy sequences, can be discerned. A BAC-by-BAC approach has been considered the most suitable approach for generating reference genome maps of barley and wheat. In this method, a BAC library for an individual chromosome is the starting point, and BAC contigs are constructed by identifying BACs that contain overlapping fragments. Ideally, the BAC contigs are then anchored onto a genetic or RH map of the genome, so that the sequence data from each contig can be checked and interpreted by looking for markers or genes known to be present in a particular region. The BACs constituting the minimum tiling path (the smallest set of overlapping clones that spans the contig) are then individually sequenced by the shotgun method and assembled into a pseudomolecule, providing the sequence of each chromosome.

Figure 5 Development of the Chinese Spring D-genome radiation hybrid panel. Spikes of the hexaploid wheat cultivar Chinese Spring (T. aestivum; 2n = 6x = 42, AABBDD) were γ-irradiated. Pollen from the irradiated spikes (n = 3x = 21, ABD) was immediately used to pollinate the stigmas of emasculated florets (male anthers removed) of the tetraploid wheat variety Altar 84 (T. turgidum; 2n = 4x = 28, AABB; egg n = 2x = 14, AB). Seeds of the F1 hybrids were harvested about 20 days after pollination. On germination, each surviving F1 seed (RH1, quasi-pentaploid; 2n = 5x = 35, AABBD) represents a unique RH event. RH1 plants were grown in the greenhouse, and DNA samples of the individual plants were extracted and genotyped for RH mapping.
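Selecting the BACs of the minimum tiling path can be framed as covering an interval with the fewest overlapping clones. Assuming each clone's start and end coordinates on the contig are already known (hypothetical values in kb below), the classic greedy interval-cover rule, which always extends coverage with the clone that reaches furthest, yields a minimal set. This is a sketch of that general technique, not of any specific project's pipeline.

```python
def minimum_tiling_path(clones):
    """Return the fewest clones whose intervals cover the contig end to end.

    clones: {name: (start, end)} positions of each clone along the contig.
    Raises ValueError if the clones leave a gap.
    """
    if not clones:
        return []
    items = sorted(clones.items(), key=lambda kv: kv[1][0])  # by start position
    covered = items[0][1][0]                                  # contig start
    contig_end = max(end for _, (_, end) in items)
    path, i = [], 0
    while covered < contig_end:
        best = None
        # Among clones starting inside the covered region, keep the one
        # whose end reaches furthest along the contig.
        while i < len(items) and items[i][1][0] <= covered:
            if best is None or items[i][1][1] > best[1][1]:
                best = items[i]
            i += 1
        if best is None or best[1][1] <= covered:
            raise ValueError(f"gap in clone coverage after position {covered}")
        path.append(best[0])
        covered = best[1][1]
    return path


# Hypothetical clone coordinates (kb) along one contig.
clones = {"b1": (0, 120), "b2": (20, 140), "b3": (100, 230),
          "b4": (110, 200), "b5": (210, 330), "b6": (225, 300)}
print(minimum_tiling_path(clones))  # ['b1', 'b3', 'b5']
```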

Comparing Physical Distance to Genetic Distance

Physical maps have yielded a wealth of information on the physical locations of genes controlling morphological traits, on evolutionary translocation breakpoints, and on genome-wide structure and organization. Comparing physical maps with genetic linkage maps can reveal the physical distribution of genes and of recombination along the chromosome. For example, RFLP probes derived from mRNA (called cDNA probes) represent expressed genes, so physically mapping cDNA probes reveals the physical locations of expressed genes. Therefore, when a set of cDNA probes is mapped both genetically and physically, the relationship between physical and genetic distances among the common markers can be inferred. In wheat, physical maps constructed using the chromosome deletion lines have been compared extensively with genetic maps of the same chromosomes. This work has revealed that genes and DNA markers tend to be clustered in small physical segments that undergo a high degree of recombination (Figure 6). These gene-rich regions are separated by large gene-poor segments that undergo very little recombination. This knowledge has facilitated BAC contig construction in regions containing genes of interest for positional cloning.

In barley, physical maps based on translocation breakpoints were compared with the corresponding genetic linkage maps. The results agreed with those obtained in wheat by deletion mapping and showed that the barley genome consists of relatively small gene-rich regions that are recombination hot spots, interspersed among large gene-poor segments that undergo very little recombination. Physical mapping of translocation breakpoints has facilitated BAC contig construction and positional cloning of important genes by allowing researchers to focus on the gene-rich regions of the genome. More detailed comparisons of physical and genetic relationships can be obtained by comparing local BAC contigs with genetic maps. The primary goal of such experiments is to identify a large-insert clone containing a gene of interest, but additional important information is also obtained. For example, once a physical contig map of the region has been developed, it can be compared with the genetic linkage map of the corresponding region to calculate physical-to-genetic distance ratios. This information is important because recombination is known to be distributed nonrandomly throughout the genomes of many plant species, causing physical-to-genetic distance ratios to vary widely depending on the characteristics of the region.
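Once markers are anchored on both maps, calculating physical-to-genetic distance ratios is simple arithmetic: divide each physical interval by the corresponding genetic interval. This Python sketch assumes markers sorted along the chromosome with hypothetical physical (kb) and genetic (cM) positions, and it reports intervals without observed recombination as infinite.

```python
import math


def physical_to_genetic_ratios(markers):
    """kb per cM for each interval between consecutive anchored markers.

    markers: (name, physical_kb, genetic_cM) tuples sorted along the chromosome.
    Returns (left, right, kb_per_cM) tuples; math.inf marks intervals in which
    no recombination was observed.
    """
    ratios = []
    for (n1, p1, g1), (n2, p2, g2) in zip(markers, markers[1:]):
        physical, genetic = abs(p2 - p1), abs(g2 - g1)
        ratios.append((n1, n2, physical / genetic if genetic > 0 else math.inf))
    return ratios


# Hypothetical anchored markers: a recombination-rich interval, then a large
# recombination-poor one, then a segment with no observed crossovers.
markers = [("m1", 0, 0.0), ("m2", 500, 2.0), ("m3", 8500, 3.0), ("m4", 9000, 3.0)]
for left, right, ratio in physical_to_genetic_ratios(markers):
    print(f"{left}-{right}: {ratio} kb/cM")
```

With these made-up values, the first interval gives 250 kb/cM and the second 8000 kb/cM, mirroring the contrast between gene-rich, recombination-rich segments and large recombination-poor ones.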

Comparative Mapping

Much effort has gone into comparing the genomic relationships among grasses and among members of other plant families. For example, comparative mapping experiments among members of the Poaceae, such as wheat, rice, barley, rye, oat, and maize, have revealed remarkable similarities in gene content and marker synteny at the chromosome level. It is well established that DNA probes cloned from these related species commonly identify sets of orthologous loci that lie at approximately the same positions relative to each other and to the centromeres. GenomeZipper-based consensus maps, which integrate ordered gene loci from the homoeologous wheat genomes and the corresponding chromosomes of barley, Ae. tauschii, T. monococcum, and rice, have been constructed. These experiments have shown that the genomes of barley, Ae. tauschii, and T. monococcum are essentially colinear with that of wheat. The genomes of more distantly related cereals, such as oat, rice, and maize, can be divided into linkage blocks that are homologous to corresponding segments of the wheat genome. The degree of genomic similarity observed at the chromosome level among grass genomes led to the notion that information from the small rice genome could be applied directly to the much larger wheat genome. However, even though substantial synteny is observed at the chromosome level, studies of microcolinearity between rice and wheat indicate that rice is less useful for gene discovery in wheat than this notion suggested. Genes whose order is conserved across species with sequenced genomes can be used to predict the order of the corresponding genes in other grass species by synteny-based analysis.
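Synteny-based prediction of gene order can be sketched as sorting the genes of a target species by the positions of their orthologs in a sequenced reference, then orienting the result using a few genetically mapped anchor genes. This Python example is a simplified illustration that assumes strict colinearity within a single syntenic block. It is not the GenomeZipper pipeline, and all gene identifiers and positions are hypothetical.

```python
def virtual_gene_order(reference_order, orthologs, anchors=None):
    """Predict a target gene order from orthologs in a sequenced reference.

    reference_order: reference gene IDs in chromosome order.
    orthologs: {target_gene: reference_gene or None}.
    anchors: optional {target_gene: genetic position (cM)} used only to
        orient the predicted order relative to the target genetic map.
    Returns (ordered_genes, unplaced_genes).
    """
    rank = {gene: i for i, gene in enumerate(reference_order)}
    # Genes with a reference ortholog are ordered by the ortholog's rank.
    placed = sorted((t for t, r in orthologs.items() if r in rank),
                    key=lambda t: rank[orthologs[t]])
    # Genes without a usable ortholog cannot be placed by synteny.
    unplaced = sorted(t for t, r in orthologs.items() if r not in rank)

    if anchors:
        # Compare anchor positions at the two ends of the syntenic block;
        # if the genetic map runs the opposite way, flip the predicted order.
        points = sorted((rank[orthologs[t]], cm) for t, cm in anchors.items()
                        if t in orthologs and orthologs[t] in rank)
        if len(points) >= 2 and points[-1][1] < points[0][1]:
            placed.reverse()
    return placed, unplaced


# Hypothetical reference order, ortholog assignments, and anchor positions.
reference_order = ["r1", "r2", "r3", "r4", "r5"]
orthologs = {"tA": "r4", "tB": "r1", "tC": "r5", "tD": "r2", "tE": None}
anchors = {"tB": 40.0, "tC": 5.0}
print(virtual_gene_order(reference_order, orthologs, anchors))
```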

There have been exciting developments in genome mapping in grasses, including the development of high-density genetic and physical maps, followed by the generation of EST databases for cereals. More recently, large-scale genome sequencing projects in grasses have been successfully implemented for rice, Brachypodium, sorghum, maize, and foxtail millet. These studies provided extensive information on the genome organization of major cereals, and the knowledge gained from genome sequencing has enhanced understanding of the structural and functional components of the genome for its effective use in the genetic improvement of cereals. Genome maps (whole-genome sequences) of the diploid model grass Brachypodium (genome size 272 Mb) are available and provide a useful resource for studying genome evolution across the grasses. Among the sequenced cereal crops, rice has a small genome (420 Mb) and a higher gene density than other cereals; sorghum follows rice with a genome size of about 730 Mb, whereas the maize genome is larger (2.3 Gb) and has undergone several rounds of genome duplication, which distinguishes it from its close relative sorghum. Reference genome maps of sorghum and foxtail millet are also available, and together these reference genome maps provide a great resource for comparative genomics aimed at developing mapping information for orphan grasses or cereals with no genomic information.

Many software programs and databases have been developed to examine the syntenic relationships of cereal genomes. Recently, a GenomeZipper approach was developed to provide an extensive database for studying syntenic relationships among grass genomes (wheat, Brachypodium, rice, sorghum, and barley). The GenomeZipper approach allows systematic exploitation of conserved synteny with model grasses; for example, it allowed 86% of the estimated total of about 32,000 barley genes to be assigned to individual chromosome arms.

Future Mapping Prospects

The ultimate goal of map construction is to decipher the linear DNA sequences of the full complement of an organism's chromosomes and to use the map information for trait mapping. The whole-genome sequence information available for major cereals such as rice, sorghum, maize, and foxtail millet has revolutionized the understanding of the mechanisms underlying genome evolution in these important crops, and has helped unravel key mechanisms of plant growth and development and of tolerance to various biotic and abiotic stresses. The practical applications of genome maps and reference sequences can be fully realized only when allelic diversity among diverse germplasm is better understood. In crops where sequence information is not available, comparative genomics-based tools can be very useful for providing a virtual gene order based on synteny. Sequence-ready physical maps of diploid barley chromosomes, the reference sequence of wheat chromosome 3B, and sequence-ready physical maps of some other wheat chromosomes are available. These ongoing efforts in wheat and barley are critical for developing adaptable, high-yielding crops that can meet emerging challenges such as new diseases and changing environmental conditions.

Figure 6 Wheat chromosome 5B genetic linkage map (left) compared with the physical map (right). The genetic linkage map was constructed using a backcross population, and the physical map was constructed using the chromosome deletion lines of wheat. On the genetic linkage map, map units separating markers are shown at the left, and markers are indicated on the right. On the physical map, hash marks on the left of the chromosome indicate deletion breakpoints; black and hatched regions on the chromosome represent dark and light C-bands, respectively; and DNA markers and their bin locations are shown to the right. Lines drawn between the maps indicate where deletion breakpoints occur relative to the genetic map. Notice that the centromeric region is nearly void of DNA markers and recombination, while more distal regions possess most of the DNA markers and recombination.

Exercises and Assignments for Revision

• What are molecular markers?

• What are the differences between genetic mapping and RH mapping?

• What are the limitations of HAPPY mapping for developing a genome map?

• Which cereal genomes have been sequenced?

• What is comparative genome mapping?

Exercises for Readers to Explore the Topic Further

• What is the status of cereal crop genome sequencing projects?

• How many wheat chromosomes have been sequenced to date?

See also: Genetics of Grains: Wheat Genetics; Wheat Genomics.

Further Reading

Appels R, Morris R, Gill B, and May C (1998) Chromosome Biology. Boston, MA: Kluwer Academic, p. 401.

Devos KM and Gale MD (2000) Genome relationships: The grass model in current research. Plant Cell 12: 637–646.

Faris JD, Friebe B, and Gill BS (2002) Wheat genomics: Exploring the polyploid model. Current Genomics 3: 577–591.

Feuillet C and Keller B (1999) High gene density is conserved at syntenic loci of small and large grass genomes. Proceedings of the National Academy of Sciences of the United States of America 96: 8265–8270.

Jiang J and Gill BS (1994) Nonisotopic in situ hybridization and plant genome mapping: The first 10 years. Genome 37: 717–725.

Jiang JM and Gill BS (2006) Current status and the future of fluorescence in situ hybridization (FISH) in plant genome research. Genome 49: 1057–1068.

Lander ES and Botstein D (1989) Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics 121: 185–199.

Liu BH (1997) Statistical Genomics: Linkage, Mapping and QTL Analysis. Boca Raton, FL: CRC Press.

McCarthy LC (1996) Whole genome radiation hybrid mapping. Trends in Genetics 12: 491–493.

Paterson AH (1996) Making genetic maps. In: Paterson AH (ed.) Genome Mapping in Plants, pp. 23–39. Austin, TX: R G Landes Company.

Paux E, Sourdille P, Mackay I, and Feuillet C (2012) Sequence-based marker development in wheat: Advances and applications to breeding. Biotechnology Advances 30: 1071–1088.

Redei GP (1999) Genetics Manual. Singapore: World Scientific, pp. 1141.

Tanksley SD, Ganal MW, and Martin GB (1995) Chromosome landing: A paradigm for map-based gene cloning in plants with large genomes. Trends in Genetics 11: 63–68.

Tanksley SD, Young ND, Paterson AH, and Bonierbale MW (1989) RFLP mapping in plant breeding: New tools for an old science. Biotechnology 7: 257–263.

