mastering microbes with microchips fiona brinkman fiona brinkman department of molecular biology and...

Mastering Microbes with Microchips

Fiona Brinkman
Department of Molecular Biology and Biochemistry

Simon Fraser University, Greater Vancouver, British Columbia, Canada

What I won’t talk about!

1. Pseudomonas Genome Database: Model for continually updated genome annotation and analysis

2. Microarray analysis software development for the Pathogenomics (FPMI) Project

How can we best combat infectious disease-causing bacteria?

Rank | Name | Kills
1 | Fiona | 54
2 | Ryan | 0

Pathogens and The Art of War

“What is of supreme importance in war is to attack the enemy's strategy. Next best is to disrupt his alliances by diplomacy. The next best is to attack his army. And the worst policy is to attack cities.”

“And the worst policy is to attack cities.”

Infectious Diseases – There must be a better way…

Leading cause of productivity loss
Responsible for two thirds of deaths of persons under age 40

[Figure: Prevalence of Superbugs (MRSA, VRE), 1980 to 1994. Source: Clinical Infectious Diseases 24:S133 (1997)]

“What is of supreme importance in war is to attack the enemy's strategy.”

strategy = virulence factors

Pathogens and The Art of War

“Attack your enemy where he is unprepared”

Boost innate immune system

How can we best combat pathogens?

A. Identify pathogen proteins more likely to be…

1. …virulence factors

- VGS Database and IslandPath

2. …quickly accessible to drugs/immune system (cell surface)

- PSORT-B

B. Identify human genes involved in boosting our innate immune system

Summary of insights and lessons learned…

Virulence Gene Subset (VGS) Database

• Based on literature analysis

• Experimentally determined virulence factors

• Extensive information in separate fields
– Species information

– Gene/Protein information

– Gene knockout information relevant to virulence studies

– Infection assay information

– References

Horizontal Gene Transfer and Virulence Factors

Transposons: ST enterotoxin genes in E. coli

Prophages: Shiga-like toxins in EHEC; Diphtheria toxin gene, Cholera toxin; Botulinum toxins

Plasmids: Shigella, Salmonella, Yersinia

Pathogenicity Islands:

Uro/Enteropathogenic E. coli; Salmonella typhimurium; Yersinia spp.; Helicobacter pylori; Vibrio cholerae

Pathogenicity Islands

Associated with

– Atypical %G+C
– tRNA sequences
– Transposases, integrases and other mobility genes
– Flanking repeats

IslandPath: Aiding identification of Pathogenicity Islands and other Genomic Islands

Yellow circle = high %G+C

Pink circle = low %G+C

Region of unusual dinucleotide bias

tRNA gene lies between the two dots

rRNA gene lies between the two dots

Both tRNA and rRNA lie between the two dots

Dot is annotated as a transposase

Dot is annotated as an integrase

Hsiao et al. (2003) Bioinformatics 19: 418-420

Dinucleotide bias analysis

Genome divided into “ORF-clusters” of 6 consecutive ORFs

Dinucleotide relative abundance is calculated for the region as

$$\rho^*_{XY} = \frac{f^*_{XY}}{f^*_X \, f^*_Y}$$

where $f^*_X$ denotes the frequency of the mononucleotide X and $f^*_{XY}$ the frequency of the dinucleotide XY

For each ORF-cluster, the average absolute dinucleotide relative abundance difference over the 16 dinucleotides is

$$\delta^*(f,g) = \frac{1}{16} \sum_{XY} \left| \rho^*_{XY}(f) - \rho^*_{XY}(g) \right|$$

where f (fragment) is derived from sequences in an ORF-cluster and g (genome) is derived from all predicted ORFs in the genome

Hsiao et al. (2003) Bioinformatics 19: 418-420
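The relative-abundance and $\delta^*$ formulas for this slide can be sketched in Python as follows. This is a minimal illustration, not IslandPath's own code. Following Karlin's convention for the starred frequencies, the sketch assumes each sequence is counted together with its reverse complement; the slide does not state this, so treat it as an assumption. The function names `relative_abundance` and `delta_star` are hypothetical.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def relative_abundance(seqs: list[str]) -> dict[str, float]:
    """Return rho*_XY = f*_XY / (f*_X * f*_Y) for all 16 dinucleotides XY.

    Assumption: frequencies are symmetrized by also counting the reverse
    complement of every sequence (Karlin's convention for starred terms).
    """
    mono: Counter = Counter()
    di: Counter = Counter()
    for seq in seqs:
        seq = seq.upper()
        for strand in (seq, reverse_complement(seq)):
            mono.update(b for b in strand if b in BASES)
            di.update(a + b for a, b in zip(strand, strand[1:])
                      if a in BASES and b in BASES)
    n_mono = sum(mono.values())
    n_di = sum(di.values())
    rho = {}
    for x, y in product(BASES, repeat=2):
        f_x = mono[x] / n_mono
        f_y = mono[y] / n_mono
        f_xy = di[x + y] / n_di if n_di else 0.0
        # rho* is undefined when a base is absent; report 0.0 in that case.
        rho[x + y] = f_xy / (f_x * f_y) if f_x and f_y else 0.0
    return rho


def delta_star(fragment: list[str], genome: list[str]) -> float:
    """delta*(f, g): mean absolute rho* difference over the 16 dinucleotides."""
    rho_f = relative_abundance(fragment)
    rho_g = relative_abundance(genome)
    return sum(abs(rho_f[xy] - rho_g[xy]) for xy in rho_f) / 16
```

In use, `fragment` would hold the ORF sequences of one ORF-cluster and `genome` all predicted ORF sequences; identical inputs give a $\delta^*$ of 0.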

Dinucleotide bias analysis

“ORF-clusters” sampled in an overlapping manner (shift by one ORF at a time)

The mean $\delta^*$ is calculated by averaging the results from all ORF-clusters in the genome

Regions whose $\delta^*$ lies more than 1 standard deviation from the mean are marked on the IslandPath graphical display with strikethrough lines

Why did we use 6 ORFs per cluster?
- Not enough bp in a single ORF to get a good estimate
- 4.5 kb (corresponding to approximately 6-8 ORFs) is required for “reliable estimation of nucleotide composition” (Lawrence and Ochman, J Mol Evolution 1997 44:383-97)
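The overlapping ORF-cluster procedure described above can be sketched as follows. This is a hypothetical illustration: `score` stands in for any per-cluster $\delta^*(f,g)$ function, and two details are assumptions rather than IslandPath's documented behaviour: the population standard deviation is used, and “more than 1 standard deviation from the mean” is read as above the mean, since a high $\delta^*$ is what signals atypical composition.

```python
from statistics import mean, pstdev
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def orf_clusters(orfs: Sequence[T], size: int = 6) -> list[Sequence[T]]:
    """Overlapping clusters of `size` consecutive ORFs, shifted by one ORF."""
    return [orfs[i:i + size] for i in range(len(orfs) - size + 1)]


def flag_biased_clusters(
    orfs: Sequence[T],
    score: Callable[[Sequence[T]], float],
    size: int = 6,
    n_sd: float = 1.0,
) -> tuple[list[int], float, float]:
    """Return (indices of flagged clusters, mean score, standard deviation).

    A cluster is flagged when its score exceeds mean + n_sd * SD.
    Assumption: population SD and a one-sided (high) cutoff.
    """
    scores = [score(c) for c in orf_clusters(orfs, size)]
    mu = mean(scores)
    sd = pstdev(scores)
    flagged = [i for i, s in enumerate(scores) if s > mu + n_sd * sd]
    return flagged, mu, sd
```

A genome of N ORFs yields N - 5 overlapping clusters of 6, so every ORF except those at the ends falls into six clusters.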

[Figure: $\delta^*(f,g)$ along part of the Salmonella typhi genome]

Boxes: Known islands in the Salmonella typhi genome

What features best predict islands?

Examined prevalence of features in over 200 known islands

• 94% of islands have >25% dinucleotide bias coverage (the majority have >75% dinucleotide bias coverage)

• Mobility genes identified in >75% (but identification recently improved)

• Atypical %G+C (above the cutoff used in Brinkman et al., 2002) averages no more than 50% coverage, and tRNA genes are observed with no more than 50% of known islands

Boxes: “Insertions” in the Salmonella typhi genome versus Salmonella typhimurium

Properties of genes in these islands?

Defined a “putative island” as

– 8 or more genes in a row with dinucleotide bias
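A minimal sketch of the “putative island” rule, assuming each gene has already been labelled as biased or not (how that per-gene label is derived from the ORF-cluster scores is not specified here, so the boolean input is an assumption). `putative_islands` is a hypothetical name; it returns inclusive (first, last) gene indices.

```python
from itertools import groupby


def putative_islands(biased: list[bool], min_genes: int = 8) -> list[tuple[int, int]]:
    """Return (first, last) gene indices of runs of >= min_genes biased genes."""
    islands = []
    pos = 0
    # groupby yields consecutive runs of equal values: (True, run) or (False, run).
    for is_biased, run in groupby(biased):
        length = len(list(run))
        if is_biased and length >= min_genes:
            islands.append((pos, pos + length - 1))
        pos += length
    return islands
```

A run of 7 biased genes is ignored, while a run of 8 or more is reported, including one that reaches the end of the genome.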

Functional category analysis: Any difference for genes in islands versus genome?

P value of paired t-test (66 organisms): 4e-19

Hypothetical genes are more common in putative islands vs the rest of the genome

Why are hypothetical genes more common within putative islands/dinucleotide-biased regions?

1. Genes being horizontally acquired in bacteria come from a large pool of as yet unstudied genes?

2. Genes are being mispredicted within these regions because of the regions’ different genomic composition?

Testing hypothesis 2:
- Genes <300 bp in size are more likely to be false positives
- Therefore, remove genes less than 300 bp and reanalyze

P value of paired t-test (55 organisms): 0.027

P value of paired t-test (66 organisms): 3e-17
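The paired t-tests above compare, organism by organism, the proportion of hypothetical genes inside putative islands with the proportion in the rest of the genome. Below is a minimal standard-library sketch of the test statistic; the per-organism proportions are hypothetical inputs, and turning t into a p value needs the t distribution with n - 1 degrees of freedom (for example via SciPy's `scipy.stats.ttest_rel`), which is not shown. The same function applies after genes shorter than 300 bp are dropped; only the input proportions change.

```python
import math
from statistics import mean, stdev


def paired_t(in_islands: list[float], rest_of_genome: list[float]) -> tuple[float, int]:
    """Paired t statistic and degrees of freedom for per-organism pairs.

    Raises ZeroDivisionError if every paired difference is identical.
    """
    if len(in_islands) != len(rest_of_genome) or len(in_islands) < 2:
        raise ValueError("need at least two paired observations")
    diffs = [a - b for a, b in zip(in_islands, rest_of_genome)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1
```

Pairing by organism removes between-organism variation in overall annotation quality, which is why a paired rather than a two-sample test fits this comparison.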

Other categories more common in islands

COG functional category | Paired t-test p value | Hypothesis to test
Translation, ribosomal structure and biogenesis | 4.6e-8 | Ribosome operons are highly expressed and so have unusual base composition, and are falsely identified as islands
Cell motility | 6e-3 | Mix of the above and below hypotheses
Secretion | 0.02 | Reflects the nature of acquired subnetworks and how they must interact with the environment?

Acquiring genes = Acquiring subnetworks

Most functional categories involve cytoplasmic proteins. The Secretion category is more associated with subcellular localization and with possible subnetworks that would be easy to add to an existing cell network.

[Figure: bacterial cell]

What does all this mean?

1. Acquired genes may come from a large pool of genes, many of which are still uncharacterized?

2. Acquired genes = acquired subnetworks… that involve interactions that cross cell membranes?

3. The predicted gene dataset you use can have a significant effect on downstream analyses.

4. Analyzing correlations is difficult! Keep testing those hypotheses!

Future studies

1. Vary the analysis approach
- Same result with other functional category classification systems
- More precise criteria for identifying islands
- Different dinucleotide bias calculation?

2. Examine in the context of gene expression data

3. Statistical modeling of the data (Dana Aeschliman and Jenny Bryan)

How can we best combat pathogens?

A. Identify pathogen proteins more likely to be…

1. …virulence factors

- VGS Database and IslandPath

2. …quickly accessible to drugs/immune system (cell surface)

- PSORT-B

B. Identify human genes involved in boosting our innate immune system

Summary of insights and lessons learned…

Subcellular Localization Prediction

Annotation

Experimental design

Functions

Drug/vaccine targets

www.psort.org/psortb

• Web-based subcellular localization prediction tool

• Score for each of 5 primary Gram-negative localization sites
– PSORT I does not predict extracellular proteins
– Also returns “unknown” (PSORT I forces a prediction)

• Trained and tested using a dataset of proteins of experimentally verified subcellular localization
– Constructed manually through literature review
– Largest dataset of its kind

• Analyzes 6 biological features using 6 modules
– More comprehensive than existing tools

PSORT-B Modules

Signal peptides: Non-cytoplasmic

Amino acid composition/patterns: Cytoplasmic; All localizations
- Support Vector Machines trained with aa composition and subsequences

Transmembrane helices: Inner membrane
- HMMTOP

PROSITE motifs: All localizations

Outer membrane motifs: Outer membrane
- Association-rule mining to identify motifs characteristic of outer membrane proteins

Homology to proteins of experimentally known localization: All localizations
- “SCL-BLAST” against a database of proteins of known localization
- E=10e-10 and length restriction of 80-120% vs both subject and query
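The two SCL-BLAST criteria can be expressed as a simple hit filter. This is a hypothetical sketch, not PSORT-B code: the `Hit` fields, the reading of the 80-120% restriction as alignment length relative to both query and subject length, and taking the localization of the best passing hit are all assumptions. The E-value cutoff is kept exactly as written on the slide; note that `10e-10` equals 1e-9 in Python's float notation.

```python
from dataclasses import dataclass

E_CUTOFF = 10e-10  # as written on the slide; in Python 10e-10 == 1e-9


@dataclass
class Hit:
    subject_id: str
    evalue: float
    query_len: int
    subject_len: int
    align_len: int
    localization: str  # experimentally known localization of the subject


def within_length_window(ratio: float, low: float = 0.8, high: float = 1.2) -> bool:
    return low <= ratio <= high


def passes_scl_blast(hit: Hit) -> bool:
    """E-value cutoff plus 80-120% length restriction vs query and subject."""
    return (
        hit.evalue <= E_CUTOFF
        and within_length_window(hit.align_len / hit.query_len)
        and within_length_window(hit.align_len / hit.subject_len)
    )


def scl_blast_prediction(hits: list[Hit]) -> str:
    """Localization of the lowest-E-value passing hit, else 'unknown'."""
    for hit in sorted(hits, key=lambda h: h.evalue):
        if passes_scl_blast(hit):
            return hit.localization
    return "unknown"
```

The length restriction guards against hits that share only a single domain with the query, which could otherwise transfer the wrong localization.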

Integration with a Bayesian Network

Of Precision, Recall and Accuracy…

• PSORT-B designed for high precision (97% specificity, $\mathrm{TP}/(\mathrm{TP}+\mathrm{FP})$)
– PSORT I’s specificity measured at 59%

• However, recall is lower (75% sensitivity, $\mathrm{TP}/(\mathrm{TP}+\mathrm{FN})$), which affects the overall measure of accuracy
– PSORT I recall 60%

• New version to be released this year
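The two ratios above can be computed per localization site as follows. How PSORT-B's evaluation tallied its counts is not stated here, so this is a hypothetical sketch: a protein predicted as “unknown” is counted as a false negative for its true site but never as a false positive, which is one way a tool that may return “unknown” can keep precision high while recall drops.

```python
def tally(predicted: list[str], actual: list[str], site: str) -> tuple[int, int, int]:
    """Count (TP, FP, FN) for one localization site."""
    tp = fp = fn = 0
    for p, a in zip(predicted, actual):
        if p == site and a == site:
            tp += 1
        elif p == site:
            fp += 1  # predicted this site, but the protein is elsewhere
        elif a == site:
            fn += 1  # missed: predicted another site or "unknown"
    return tp, fp, fn


def precision(tp: int, fp: int) -> float:
    """TP / (TP + FP); 0.0 when nothing was predicted for the site."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp: int, fn: int) -> float:
    """TP / (TP + FN); 0.0 when no protein truly belongs to the site."""
    return tp / (tp + fn) if tp + fn else 0.0
```

For example, with predictions `["Cytoplasmic", "Cytoplasmic", "unknown", "OuterMembrane"]` against true sites `["Cytoplasmic", "OuterMembrane", "Cytoplasmic", "OuterMembrane"]`, the OuterMembrane tally is (1, 0, 1): precision 1.0, recall 0.5.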

Insights Gained During Development

• Localization is a highly evolutionarily conserved trait

– Conserved between Gram-positives and Gram-negatives (for localizations present in both classes)

– A reflection of:
the need for the cell to conserve subcellular networks?
the different environments of each localization?

Insights Gained During Development

• Identified motifs characteristic of outer membrane proteins through a data mining approach (Martin Ester, Ke Wang, and others)

– Motifs (~6 aa long) map primarily to periplasmic turn regions of known 3D structures

– May reflect the importance of periplasmic turns in a transmembrane beta-barrel structure vs. other similar non-membrane barrel structures

Periplasmic turns

Analysis of bacterial proteomes

• What proportion of proteins are of a particular subcellular localization?

• Investigating the hypothesis:
– The proportion of membrane proteins increases in those organisms inhabiting a greater variety of environments

• Analysis of the deduced proteomes from 77 bacterial genome projects.

[Figure: four panels plotting each localization against Proteome Size, with linear fits]
Cytoplasmic: y = 0.0781x + 15.018, R² = 0.8992
Cytoplasmic Membrane: y = 0.1601x - 4.0179, R² = 0.9704
Outer Membrane: y = 0.0132x - 6.8619, R² = 0.6356
Extracellular: y = 0.0041x - 1.1381, R² = 0.6703
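Each panel's fit is an ordinary least-squares line with its coefficient of determination. Below is a minimal sketch of how such a fit and R² can be computed; the inputs (one point per genome, with proteome size as x and, as an assumption, the number of proteins predicted at one site as y) are hypothetical, and this is not necessarily how the figure was produced. On Python 3.10+ `statistics.linear_regression` returns the same slope and intercept.

```python
def linear_fit(xs: list[float], ys: list[float]) -> tuple[float, float, float]:
    """Ordinary least squares y = slope * x + intercept, plus R^2."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1 - ss_res / ss_tot
    return slope, intercept, r_squared
```

A slope that is a roughly constant fraction of proteome size across genomes, together with a high R², is what a stable localization proportion looks like in these plots.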

PSORT-B prediction | Proportion of total predicted proteins (%) | St. dev. (%)
Cytoplasmic | 30 | 5.9
Cytoplasmic Membrane | 57 | 5.8
Periplasmic | 7.6 | 3.1
Outer Membrane | 3.8 | 1.9
Extracellular | 1.3 | 0.8
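The table reports, for each localization, a mean proportion across proteomes and its standard deviation. Because the five proportions sum to roughly 100%, this minimal sketch assumes the denominator is the number of proteins with a localization prediction (“unknown” excluded); that reading, the input format, the use of the sample standard deviation, and the name `localization_summary` are all assumptions.

```python
from statistics import mean, stdev

UNKNOWN = "unknown"


def localization_summary(proteomes: list[dict[str, int]]) -> dict[str, tuple[float, float]]:
    """Mean and sample SD (in %) of each site's share of predicted proteins.

    Each dict maps a localization to its protein count in one proteome;
    counts labelled "unknown" are excluded from the denominator.
    Needs at least two proteomes (stdev requires two data points).
    """
    sites = sorted({s for p in proteomes for s in p if s != UNKNOWN})
    shares: dict = {s: [] for s in sites}
    for counts in proteomes:
        predicted = sum(n for s, n in counts.items() if s != UNKNOWN)
        for s in sites:
            # A site missing from a proteome contributes a 0% share.
            shares[s].append(100 * counts.get(s, 0) / predicted)
    return {s: (mean(v), stdev(v)) for s, v in shares.items()}
```

Summarizing shares rather than raw counts is what allows proteomes of very different sizes to be compared in a single row.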

What does this mean?

1. Protein localization is very conserved

2. Increased genome size = increase in networks. Therefore, conservation in localization proportions indicates that new networks being added tend to traverse localizations

3. Note: Can’t discount biases in unpredicted proteins, but the new PSORT-B version will help confirm results

Summary

• Converting pathogens and boosting rapid defenses may be the way to win the war against pathogens

• Identifying virulence factors is critical

• Acquired genes, including virulence factors, may come from a large pool of genes that are predominantly uncharacterized.

• Acquired genes = acquired subnetworks that involve interactions that tend to traverse subcellular boundaries.

www.pathogenomics.sfu.ca/brinkman

The Brinkman Lab

Genome Prairie, Genome BC, Inimex, NSERC

Ray, Karsten, Geoff, Sébastien, Matt, Jenn, Will, Mike, Fiona, Anastasia, Alison, “The other Fiona”

Dana Aeschliman, Jenny Bryan

Martin Ester, Ke Wang, Rong She, Christopher Walsh

All software freely available and open source

FPMI, SFU

Government: Genome Canada, Genome Prairie, Genome BC, Govt of Saskatchewan

Functional Pathogenomics of Mucosal Immunity: www.pathogenomics.ca
