
GN3502: Bacterial Genetics

Ken Forbes

Medical Microbiology

Lecture synopsis

1. “Classical” bacterial genetics
2. New approaches
   Physical mapping of genomes
   Whole genome sequencing
   Functional analysis
3. New perspectives on bacterial genetics
   Origin of species
   Bacterial lifestyles

“Classical” view of bacteria

• Single chromosome
• May have plasmids and phage
• Simple gene structure
• Genes have recognisable phenotype
• Can do genetics in lab
  – gene transfer
    • transformation
    • transduction
    • conjugation
  – molecular biology

Classical methods are not adequate

• Bacteria live in many diverse habitats
• Much diversity within a species
• Most genes in most species have not yet been identified

Have most of the genes in any species been identified?

• Traditional genetic and molecular methods have identified a function for only half of the genes in E. coli

• Constraints from
  – methodologies
  – many genes will not be expressed in the lab
• New approaches needed
  – genome oriented
  – sequence oriented


Lecture synopsis: 2. New approaches

Physical mapping of genomes
  Methods: PFGE; clone libraries
  Discoveries: bacterial genomes (size, shape, replicons)
Whole genome sequencing
  Methods: sequencing strategies
  Discoveries: gene organisation; assigning function
Functional analysis
  Discoveries: new genes
  Methods: for individual genes; for whole genomes (DNA arrays, proteome)

Physical mapping of genomes

• Low resolution restriction enzyme maps of whole genome

• Locate genes on the map using DNA-based techniques

• PHYSICAL map of chromosome not a GENETIC map

• Restriction map whole chromosome with rare-cutting restriction enzymes (REs)
  – complete digests
  – partial digests
  – double digests
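The relationship between cut sites and fragment sizes in complete and double digests can be illustrated with a short Python sketch. It is not part of the lecture: the 4600 kb chromosome length, the site positions and the enzyme labels E and H are all made-up values for illustration.

```python
def digest_circular(genome_length, cut_sites):
    """Fragment lengths from a complete digest of a circular chromosome.

    cut_sites are positions (same units as genome_length) measured from
    an arbitrary origin. Returns fragment lengths, largest first.
    """
    sites = sorted(set(cut_sites))
    if not sites:
        return [genome_length]  # no site: the circle stays uncut
    # Fragments between neighbouring sites.
    fragments = [b - a for a, b in zip(sites, sites[1:])]
    # The last fragment wraps around the origin back to the first site.
    fragments.append(genome_length - sites[-1] + sites[0])
    return sorted(fragments, reverse=True)


# Hypothetical values (kb), chosen only to make the arithmetic easy to follow.
GENOME_KB = 4600
SITES_E = [200, 1500, 3100]
SITES_H = [900, 3900]

frags_e = digest_circular(GENOME_KB, SITES_E)                 # enzyme E alone
frags_h = digest_circular(GENOME_KB, SITES_H)                 # enzyme H alone
frags_double = digest_circular(GENOME_KB, SITES_E + SITES_H)  # double digest

print(frags_e)       # [1700, 1600, 1300]
print(frags_h)       # [3000, 1600]
print(frags_double)  # [1600, 900, 800, 700, 600]
```

In practice the mapping problem runs the other way: fragment sizes are read from the gel, and the single and double digest patterns are compared to work out the order of the sites.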

Pulsed-Field Gel Electrophoresis

[Figure: restriction map of a whole chromosome with sites labelled E, H and S; scale bar 1 Mb.]

Pulsed-Field Gel Electrophoresis

• Mix cultured cells with molten agarose to give embedded cells
• Incubate with Proteinase K, leaving trapped high molecular weight (HMW) DNA
• Inactivate Proteinase K and wash to remove cell debris
• Digest with rare-cutting restriction enzyme

Pulsed-Field Gel Electrophoresis

[Figure: gel with two electrode pairs (+ and -). Periodic switching (pulsing) between electrode pairs gives net migration of the DNA.]

Mapping genes on whole genome RE maps

[Figure: whole-genome restriction map (sites labelled E, H and S) with the positions of geneA, geneB and geneC marked.]

• Hybridize cloned-gene DNA fragment to PFGE fragments
  – locate gene on map

Ordered clone libraries: method

• Make clones of entire genome
  – Ø clones of whole genome
    • small (10s of kb) insert size means 1000s of clones are required to cover the whole chromosome
  – Bacterial Artificial Chromosomes (BAC)
    • clone in E. coli F plasmid
    • large (100s of kb) insert size means fewer clones needed
• Order the clones into contigs
  – overlapping clones will cross-hybridise
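The link between insert size and the number of clones needed can be estimated with the Clarke-Carbon formula, N = ln(1 - P) / ln(1 - f), where f is the insert size as a fraction of the genome and P is the probability that any given sequence is in the library. The lecture does not give this formula; the sketch below assumes it, together with a 4600 kb genome, 20 kb and 150 kb inserts, and P = 0.99.

```python
import math


def clones_needed(genome_kb, insert_kb, probability=0.99):
    """Clarke-Carbon estimate of the number of random clones needed so that
    any given sequence is present with the stated probability."""
    fraction = insert_kb / genome_kb  # share of the genome in one insert
    return math.ceil(math.log(1 - probability) / math.log(1 - fraction))


# Assumed example values, not taken from the lecture.
GENOME_KB = 4600
n_small_insert = clones_needed(GENOME_KB, insert_kb=20)   # inserts of 10s of kb
n_bac = clones_needed(GENOME_KB, insert_kb=150)           # inserts of 100s of kb

print(n_small_insert, n_bac)
```

This counts randomly picked clones. Once the clones have been ordered into contigs, a minimally redundant subset that still spans the chromosome is much smaller.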

Ordered clone libraries

• Disadvantages
  – not all regions clonable
  – labour intensive and expensive
• Advantages
  – immortalised source of genomic DNA
  – minimally redundant
  – easy to find and sub-clone a gene of interest
  – identify adjacent genes
  – use in genome sequencing projects

Ordered clone libraries: applications

• E. coli K12
  – widely used lab strain
• Mycobacterium leprae
  – obligate human pathogen
  – not cultivable in vitro
  – genetic analysis impossible
  – ordered clone library allowed molecular genetic analysis

Physical mapping

• Pros
  – only need DNA of organism
  – standard molecular biology methods used
• Cons
  – low resolution
  – no phenotypic information about genes


Bacterial genomes come in many different sizes

• Range 0.6 Mb – 9 Mb
• Bigger genomes encode more genes
• < 2 Mb: specialist species
  – restricted ecological niche (Mycoplasma)
  – fastidious growth (Haemophilus influenzae)
  – obligate intracellular parasites (Chlamydia)
• 3 – 5 Mb: generalist species
  – broad metabolic potential, few organic growth requirements (E. coli)
• > 5 Mb: species with developmental cycles
  – (Streptomyces: mycelial growth, spores, complex bioactive compounds)

Bacterial genomes come in different conformations

• Circular chromosomes
  – the traditional view: E. coli
• Linear chromosomes
  – Borrelia
• Plasmids
  – circular and linear forms

Bacterial genomes can have several chromosomes

• “Chromosomes must harbour some essential genes”
  – ribosomal RNA (rrn)
• “Plasmids should not be required for viability”
  – only encode supplementary functions
  – can be very large (1-2 Mb)

Bacterial genomes

• Most species have one chromosome
  – eg E. coli
    • 1x circular chromosome with rrn, housekeeping genes
• Some species have 2 chromosomes (a few 3)
  – eg Agrobacterium tumefaciens
    • 2x chromosomes each with rrn and housekeeping genes
      – 1x circular 3 Mb
      – 1x linear 2 Mb
    • 2x plasmids, circular 200 kb, 450 kb

Physical mapping: conclusions

• Bacterial genomes are very variable
  – chromosome size, conformation, number
  – plasmids often very important, but not essential
• Genomes have a large coding capacity
  – this reflects bacterial biodiversity
  – there are many genes of unknown function
  – laboratory analysis imposes constraints on understanding of many genes

• How can you identify all of the genes in a species?


Whole genome sequencing

• Whole genome sequences now available for
  – 300 bacterial species/ strains
  – most pathogens
  – representatives of most bacterial lineages

Haemophilus influenzae genome published 1995

Whole genome sequencing

• Advantages
  – inexpensive
  – all of genome sequence available
  – all genes identified
• Requirements
  – automated DNA sequencing machines
  – massive computing power

• “Factory sequencing”

Fluorescent sequencing

• DNA sequencing reaction
  – Sanger terminator chemistry
    • nt chain extension until blocked by terminator nt
  – terminator nt has fluorescent dye attached
    • each nt has different colour

Phases of sequencing project

• Primary sequencing phase
  – random accumulation of sequence into contigs
• Linking phase
  – contigs linked together using directed sequencing methods
• Polishing phase
  – removal of sequence ambiguities from the single contig
• Finished sequence
  – analyse, annotate

Genome sequencing strategies

• Total-genome shotgun sequencing
• Primer walking
• Mixed strategy

Total-genome shotgun sequencing

• Shotgun cloning
  – shear DNA into random fragments of 1-5 kb
  – clone into vector
• Sequencing primers in vector

[Figure: vector, cloned insert, sequencing primers.]

Total-genome shotgun sequencing: advantages

• Don’t require map of genome
• Sequencing machines at continuous full capacity
• Sequence polishing only done once
• Greater accuracy through multiple coverage
  – 6-10 fold genome equivalents
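What 6-10 fold coverage means in reads and in expected gaps can be estimated with a short Python sketch. It assumes reads land at random, so the fraction of the genome missed at coverage c is about e^(-c) (the Poisson approximation often attributed to Lander and Waterman). The 2 Mb genome and 500-base read length are illustrative assumptions, not figures from the lecture.

```python
import math


def reads_needed(genome_bp, read_bp, coverage):
    """Number of reads that together add up to `coverage` genome equivalents."""
    return math.ceil(coverage * genome_bp / read_bp)


def expected_uncovered_fraction(coverage):
    """Poisson estimate of the fraction of bases hit by no read at all."""
    return math.exp(-coverage)


# Assumed example values.
GENOME_BP = 2_000_000
READ_BP = 500

for fold in (6, 10):
    n_reads = reads_needed(GENOME_BP, READ_BP, fold)
    missed = expected_uncovered_fraction(fold) * GENOME_BP
    print(f"{fold}x: {n_reads} reads, about {missed:.0f} bases expected uncovered")
```

Under these assumptions about 0.25% of the genome is still missed at 6-fold coverage, which is one reason a directed linking phase follows the random phase.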

Total-genome shotgun sequencing: disadvantages

• Repeat coverage is wasteful
• Can’t clone some genomic regions
• Repetitive regions in genome
  – can’t map each to its correct genomic position
  – prevents contigs from being joined together
    • other methods required to span across each repeat
• Sequence assembly and analysis can only be done at end of sequencing phase

Primer walking

• Require ordered clone library
• Primer walk along each cloned fragment
  – first primer in vector
    • sequence into cloned DNA
  – next primer in new sequence
    • sequence further into cloned DNA
  – start at each end of cloned fragment
  – cycles of: sequencing, polishing, primer design, primer synthesis

Primer walking

• Advantages
  – high quality, useable sequence obtained from start
  – sequence produced in large contigs
  – no repeat coverage
  – both strands sequenced
• Disadvantages
  – many expensive primers needed
  – time lag between walks
  – little automation, sequencing machines often idle

Mixed strategy

• Most popular strategy
• Combine advantages of both methods
  – initial random-sequencing phase
    • on either whole genome or on set of ordered clones
    • typically 3-6 fold coverage
  – final primer-walking over gaps

Ultrahigh throughput sequencing

• Sequencing by Synthesis (SBS)
  – eg SOLEXA
  – generates short (18-35 base) reads

[Video of chemistry]

Ultrahigh throughput sequencing

• Sequencing by Synthesis (SBS)
  – template of tens of millions of individual, clonally amplified DNA fragments
  – yields up to 1 gigabase sequence in total
  – avoids cloning steps
  – inexpensive: £500/ bacterial genome
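As a rough check on these figures, the Python sketch below converts a 1 gigabase run into a read count and into fold coverage of one bacterial genome. It assumes the full 1 gigabase is obtained and uses the 4.6 Mb E. coli genome size quoted elsewhere in the lecture. The function names are illustrative.

```python
def read_count(total_bases, read_length):
    """Whole reads of a given length that fit into the total yield."""
    return total_bases // read_length


def fold_coverage(total_bases, genome_bp):
    """Genome equivalents represented by the total yield."""
    return total_bases / genome_bp


RUN_YIELD_BASES = 1_000_000_000  # "up to 1 gigabase", taken at its maximum
ECOLI_GENOME_BP = 4_600_000      # 4.6 Mb

reads_at_35 = read_count(RUN_YIELD_BASES, 35)
reads_at_18 = read_count(RUN_YIELD_BASES, 18)
coverage = fold_coverage(RUN_YIELD_BASES, ECOLI_GENOME_BP)

print(reads_at_35, reads_at_18)  # tens of millions of reads
print(round(coverage))           # roughly 200-fold coverage of one genome
```

A single run therefore holds far more sequence than the 6-10 fold needed for one genome, which is consistent with the low cost per genome.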


Genome organisation

• Can identify
  – all protein and RNA coding genes
  – organisation of genes
    • in genome
    • with respect to each other

E. coli genome

• Traditional genetic and molecular methods have identified 2220 genes in E. coli

E. coli genome

• Whole genome sequencing has identified 4288 protein coding genes in E. coli genome

E. coli genome

genetic map = 100 min
physical map = 4.6 Mb
1 min = 46 kb
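The conversion between the two maps is simple arithmetic, sketched below in Python. It treats 46 kb per minute as a uniform average, which is an approximation: positions on the genetic map are not exactly proportional to physical distance. The function names are illustrative.

```python
MAP_MINUTES = 100   # length of the E. coli genetic map
GENOME_KB = 4600    # length of the physical map (4.6 Mb)
KB_PER_MIN = GENOME_KB / MAP_MINUTES  # 46.0


def minutes_to_kb(minutes):
    """Approximate physical position (kb) for a genetic map position (min)."""
    return (minutes % MAP_MINUTES) * KB_PER_MIN


def kb_to_minutes(kb):
    """Approximate genetic map position (min) for a physical position (kb)."""
    return (kb % GENOME_KB) / KB_PER_MIN


print(KB_PER_MIN)         # 46.0
print(minutes_to_kb(50))  # 2300.0
print(kb_to_minutes(460)) # 10.0
```

The modulo keeps positions on the circular map, so 100 min and 0 min refer to the same point.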

Genome organisation

• >90% of genome codes for genes
• Genes
  – identified in genome sequence by
    • Open Reading Frame (ORF)
    • homology to known genes in other spp
• Regulation of gene expression
  – promoter and ribosome binding site sequences
  – operons and linked genes

Identifying genes: by phenotype

• Genes traditionally identified by genetic analysis
  – robust identification of gene by its function

Identifying genes: by DNA homology

• Identify gene by sequence homology
• Need previously characterised gene in another species
  – high homology between them
  – robust identification of the previously characterised gene
  – but new gene may have different biological role

Identifying genes: by Open Reading Frame

• ORF: “a DNA seq with no stop codons”
• Only genes coding for proteins
• Ends of the gene not easily defined
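A minimal Python sketch of ORF detection follows. It scans the three reading frames of one strand for a stretch that runs from ATG to the next in-frame stop codon. It is a simplification made for illustration: the `find_orfs` name, the `min_codons` cut-off and the test sequence are assumptions. Taking the first ATG as the start is also only a guess, which is one reason the ends of a gene are not easily defined (bacteria also use start codons such as GTG and TTG).

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=3):
    """Return (start, end, frame) tuples for ATG...stop stretches on one strand.

    start is 0-based, end is exclusive and includes the stop codon.
    min_codons counts codons from the ATG up to, but not including, the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate ORF at the first ATG
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # close it at the first in-frame stop
    return orfs


# Made-up test sequence containing one short ORF in frame 2.
dna = "CCATGAAACCCGGGTAGCC"
forward = find_orfs(dna)
reverse = find_orfs(reverse_complement(dna))  # scan the other strand too

print(forward)                           # [(2, 17, 2)]
print(dna[forward[0][0]:forward[0][1]])  # ATGAAACCCGGGTAG
```

On a real genome `min_codons` would be set much higher (for example around 100) to suppress the many short ORFs that occur by chance.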

Bacterial genomes have many genes with no known function

• 60% of genes have a recognisable function
  – but the specific role of many is unknown
• 40% of genes have no known function
  – 10% found in other species
    • conserved protein families
    • important housekeeping genes?
  – 30% unique to each species
    • determine pathogenicity, lifestyle


Assigning function to novel genes

• How do you determine the function of genes identified by seq’ rather than by phenotype?

• For individual genes use an appropriate molecular genetic technique
  – gene knockouts
  – conditional lethal mutations
  – control region probes

Assigning function to new genes

• Individual genes
  – gene knockouts
  – conditional lethal mutations
  – control region probes
• Whole genome
  – DNA arrays
  – proteome analysis

DNA arrays

• Macroarrays
  – DNA fragment probes (eg PCR product)
  – one per gene
  – array on membrane (1000s)
• Microarrays
  – oligonucleotide probes
  – several oligonucleotides per gene
  – array on glass (100,000s)

DNA arrays

• Colour = relative ORF expression
• Intensity = extent of ORF expression

[Figure: array image. Legend: Sample A, Sample B, expression in both samples.]
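A minimal Python sketch shows how the colour and intensity of a spot are commonly reduced to numbers. The log2 ratio, the pseudocount, the two-fold threshold and the intensity values are illustrative assumptions, not details given in the lecture.

```python
import math


def log2_ratio(sample_a, sample_b, pseudocount=1.0):
    """Relative expression ("colour"): positive means higher in sample A."""
    return math.log2((sample_a + pseudocount) / (sample_b + pseudocount))


def mean_intensity(sample_a, sample_b):
    """Overall signal ("intensity") of the spot across both samples."""
    return (sample_a + sample_b) / 2


def classify(sample_a, sample_b, threshold=1.0):
    """Label a spot using an assumed two-fold (log2 = 1) cut-off."""
    ratio = log2_ratio(sample_a, sample_b)
    if ratio >= threshold:
        return "higher in A"
    if ratio <= -threshold:
        return "higher in B"
    return "similar in both"


# Hypothetical spot intensities for three ORFs: (sample A, sample B).
spots = {"orf1": (7999, 999), "orf2": (999, 3999), "orf3": (1500, 1500)}

calls = {name: classify(a, b) for name, (a, b) in spots.items()}
print(calls)
```

The pseudocount avoids dividing by zero for spots with no signal in one sample. Real analyses also normalise the two channels before comparing them.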

DNA arrays: applications

• Gene expression (mRNA)
  – transcriptome
• Presence/ absence genes (DNA)
  – genome polymorphisms

Proteomics

• 2D electrophoresis of cellular proteins
  – separate by charge then by size
  – determine amino acid (AA) sequence of spot of interest
  – refer back to genome sequence
• Characterisation of all expressed proteins


Why have bacteria so many genes?

• 60% have recognisable function
  – specific role of many genes unknown
    • eg only to enzyme class
• 40% have no known function
  – 10% common, conserved gene families
  – 30% unique to each species

Some genes are common to many species

• Conserved gene families
• Presumably housekeeping genes
• Potential targets for novel antibacterials

Some genes are unique to one species

• These genes give a species its unique characteristics
• Allow adaptation to a particular lifestyle
• Virulence genes

How many genes does a pathogen need?

• Mycobacterium tuberculosis
  – mechanism of pathogenesis unknown
  – 4.4 Mb genome
  – 3994 genes
    • 1/3 known function
    • 1/3 similar proteins
    • 1/3 unknown
  – in vivo: 300 genes not required
  – in vitro: 3000 genes not required

Some species are apparently “missing genes”

• Many pathogens have complex growth requirements

• Some functions or pathways absent
  – genes for some pathways eliminated
    • nutrients supplied by host
  – adaptation to niche
    • H. pylori lives in acidic environment of stomach
      – does not ferment sugars (acidic products)
      – does ferment amino acids (alkaline products)

“Community Genomics Among Stratified Microbial Assemblages in the Ocean's Interior”

(2006) DeLong, et al Science 311, pp. 496-503

• Planktonic microbial communities in Pacific Ocean
  – sampled from ocean surface to sea floor
  – sequenced 64 million base pairs
  – thousands of new genes
• Variations in sequences at different depths
  – near the ocean surface
    • photosynthetic and mobile microorganisms
    • more genes for iron uptake
  – deeps
    • a predominance of "adhesive" microbes
    • antibiotic synthesis genes

Bacteria are diverse

• Organisms do not live in isolation
• Organisms interact with host/ environment
• Organisms often dependent on each other
  – nutrient flow through biological systems
• Use genomics to understand the interaction between spp at gene level

Bacteria are diverse

[Figure: stereo micrograph of dental plaque. Nutrient flow from cocci to filamentous bacteria.]
