Power and weakness of data
Posted on 15-Jan-2016
TRANSCRIPT
![Page 1: Power and weakness of data Power: data + software + bioinformatician = answer. Weakness: Data errors. Data poorly understood. Poor software. Never enough](https://reader033.vdocument.in/reader033/viewer/2022051417/56649d255503460f949fbbf4/html5/thumbnails/1.jpg)
Power and weakness of data
Power: data + software + bioinformatician = answer.
Weakness: Data errors. Data poorly understood. Poor software. Never enough data. Few bioinformaticians available.
![Page 2](https://reader033.vdocument.in/reader033/viewer/2022051417/56649d255503460f949fbbf4/html5/thumbnails/2.jpg)
Laerte about structures:
“Use the Force, Luke” sequence, Gert
![Page 3](https://reader033.vdocument.in/reader033/viewer/2022051417/56649d255503460f949fbbf4/html5/thumbnails/3.jpg)
Signals in Sequences
The number of sequences available for analysis rapidly approaches infinity.
We need new ways to look at all this information.
![Page 4](https://reader033.vdocument.in/reader033/viewer/2022051417/56649d255503460f949fbbf4/html5/thumbnails/4.jpg)
The First Law:
First law of sequence analysis:
A conserved residue is important.
![Page 5](https://reader033.vdocument.in/reader033/viewer/2022051417/56649d255503460f949fbbf4/html5/thumbnails/5.jpg)
With thousands of aligned sequences:
Second law of sequence analysis:
A very conserved residue is very important.
![Page 6](https://reader033.vdocument.in/reader033/viewer/2022051417/56649d255503460f949fbbf4/html5/thumbnails/6.jpg)
Signals in sequences: Conserved, CMA, variable
QWERTYASDFGRGH
QWERTYASDTHRPM
QWERTNMKDFGRKC
QWERTNMKDTHRVW
Black = conserved
White = variable
Green = correlated mutations (CMA)
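The toy alignment above can be split into the three signal types with a few lines of Python. This is a minimal sketch under a simple assumed criterion, not the CMA method used in the talk: a column is "conserved" if it holds one residue type, "correlated" if it varies and divides the sequences into exactly the same groups as at least one other column, and "variable" otherwise. The names `partition`, `classify` and `SYMBOL` are hypothetical.

```python
from collections import Counter

# The four toy sequences from the slide, one per row; columns are positions.
ALIGNMENT = [
    "QWERTYASDFGRGH",
    "QWERTYASDTHRPM",
    "QWERTNMKDFGRKC",
    "QWERTNMKDTHRVW",
]

# Display symbols standing in for the slide's colours (black, green, white).
SYMBOL = {"conserved": "#", "correlated": "+", "variable": "."}


def partition(col):
    """Encode which sequences share a residue in this column.

    'YYNN' and 'AAMM' both become (0, 0, 1, 1): the residues differ, but they
    change across exactly the same sequences.
    """
    first_seen = {}
    return tuple(first_seen.setdefault(res, len(first_seen)) for res in col)


def classify(alignment):
    """Label every column as conserved, correlated or variable."""
    n_seq = len(alignment)
    cols = ["".join(seq[j] for seq in alignment) for j in range(len(alignment[0]))]
    # A column is informative for correlation if it varies, but not so much
    # that every sequence has its own residue (such columns all look alike).
    informative = [1 < len(set(c)) < n_seq for c in cols]
    key_counts = Counter(partition(c) for c, ok in zip(cols, informative) if ok)
    labels = []
    for c, ok in zip(cols, informative):
        if len(set(c)) == 1:
            labels.append("conserved")
        elif ok and key_counts[partition(c)] > 1:
            labels.append("correlated")
        else:
            labels.append("variable")
    return labels


if __name__ == "__main__":
    print("".join(SYMBOL[label] for label in classify(ALIGNMENT)))
```

Run as a script, this prints "#####+++#++#..": columns 6 to 8 (Y/N, A/M, S/K) change together, as do columns 10 and 11 (F/T, G/H), while the last two columns give every sequence a different residue. On a real alignment of thousands of sequences, exact co-partitioning is far too strict a test; the sketch only illustrates the idea.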
![Page 7](https://reader033.vdocument.in/reader033/viewer/2022051417/56649d255503460f949fbbf4/html5/thumbnails/7.jpg)
Sequence Signals
Three types of information from multiple sequence alignments:
1) Conservation
2) Correlation
3) Variability
![Page 8](https://reader033.vdocument.in/reader033/viewer/2022051417/56649d255503460f949fbbf4/html5/thumbnails/8.jpg)
Artefacts
Wrong sequence signals can result from:
Not enough sequences
Too conserved sequences
Too variable sequences
Over-alignment
Over-interpretation
![Page 9](https://reader033.vdocument.in/reader033/viewer/2022051417/56649d255503460f949fbbf4/html5/thumbnails/9.jpg)
Recalcitrant residues
![Page 10](https://reader033.vdocument.in/reader033/viewer/2022051417/56649d255503460f949fbbf4/html5/thumbnails/10.jpg)
Sequence Entropy
$$E = -\sum_{i=1}^{20} p_i \ln(p_i)$$
where $p_i$ is the fraction of sequences that have residue type $i$ at the alignment position.
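A minimal Python sketch of the entropy formula for a single alignment column, assuming the column is given as a string of one-letter residue codes with no gap characters (gap handling is not specified on the slide). Residue types that do not occur have $p_i = 0$ and contribute nothing, so only observed types are summed. The function name `sequence_entropy` is hypothetical.

```python
import math
from collections import Counter


def sequence_entropy(col):
    """Return E = -sum_i p_i ln(p_i) for one alignment column.

    p_i is the fraction of sequences with residue type i at this position.
    Unobserved types (p_i = 0) are skipped, since p ln p -> 0 as p -> 0.
    Assumes `col` contains only residue letters (no gaps).
    """
    n = len(col)
    return sum(-(c / n) * math.log(c / n) for c in Counter(col).values())


if __name__ == "__main__":
    print(sequence_entropy("AAAA"))  # fully conserved column
    print(sequence_entropy("AACC"))  # two types, equally common: ln 2
    print(sequence_entropy("ACDE"))  # four types, equally common: ln 4
```

The value ranges from 0 for a fully conserved position to ln 20 ≈ 3.0 when all 20 residue types are equally frequent.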
![Page 11](https://reader033.vdocument.in/reader033/viewer/2022051417/56649d255503460f949fbbf4/html5/thumbnails/11.jpg)
Sequence Variability
Sequence variability is the number of residue types that are present in more than 0.5% of the sequences.
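The variability definition translates directly into a count over one alignment column. The sketch below assumes the column is a string with one residue letter per sequence and no gaps; the function name `sequence_variability` and its `threshold` parameter are hypothetical, with the default set to the 0.5% from the definition.

```python
from collections import Counter


def sequence_variability(col, threshold=0.005):
    """Number of residue types present in more than `threshold` of the sequences.

    `col` holds one residue per sequence at a single alignment position.
    Assumes `col` contains only residue letters (no gaps).
    """
    n = len(col)
    return sum(1 for count in Counter(col).values() if count / n > threshold)


if __name__ == "__main__":
    # 1000 sequences: 'G' in 0.6% passes the cutoff, 'S' in 0.4% does not.
    print(sequence_variability("A" * 990 + "G" * 6 + "S" * 4))  # 2
    # Only 100 sequences: a single 'G' is already 1% and counts.
    print(sequence_variability("A" * 99 + "G"))  # 2
```

The second call shows why "not enough sequences" appears among the artefacts: with fewer than 200 sequences, any residue type seen even once exceeds 0.5% and adds to the variability.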
![Page 12](https://reader033.vdocument.in/reader033/viewer/2022051417/56649d255503460f949fbbf4/html5/thumbnails/12.jpg)
Entropy - Variability
Evolution = try everything (and keep what works well)
Variability = Chaos (try everything)
Entropy = Information (keep what works well)
![Page 13](https://reader033.vdocument.in/reader033/viewer/2022051417/56649d255503460f949fbbf4/html5/thumbnails/13.jpg)
Entropy - Variability
Variability is the result of DNA trying everything.
Entropy is the protein’s brake on evolutionary speed.
![Page 14](https://reader033.vdocument.in/reader033/viewer/2022051417/56649d255503460f949fbbf4/html5/thumbnails/14.jpg)
Ras Entropy - Variability
11 Red
12 Orange
22 Yellow
23 Green
33 Blue
![Page 15](https://reader033.vdocument.in/reader033/viewer/2022051417/56649d255503460f949fbbf4/html5/thumbnails/15.jpg)
Ras Location
11 Red
12 Orange
22 Yellow
23 Green
33 Blue
![Page 16](https://reader033.vdocument.in/reader033/viewer/2022051417/56649d255503460f949fbbf4/html5/thumbnails/16.jpg)
Protease Entropy - Variability
11 Red
12 Orange
22 Yellow
23 Green
33 Blue
![Page 17](https://reader033.vdocument.in/reader033/viewer/2022051417/56649d255503460f949fbbf4/html5/thumbnails/17.jpg)
Protease Location
11 Red
12 Orange
22 Yellow
23 Green
33 Blue
![Page 18](https://reader033.vdocument.in/reader033/viewer/2022051417/56649d255503460f949fbbf4/html5/thumbnails/18.jpg)
Globin Entropy - Variability
GPCR
11 Red
12 Orange
22 Yellow
23 Green
33 Blue
![Page 19](https://reader033.vdocument.in/reader033/viewer/2022051417/56649d255503460f949fbbf4/html5/thumbnails/19.jpg)
Globin Location
11 Red
12 Orange
22 Yellow
23 Green
33 Blue
![Page 20](https://reader033.vdocument.in/reader033/viewer/2022051417/56649d255503460f949fbbf4/html5/thumbnails/20.jpg)
And now for drug design: GPCR
11 Red
12 Orange
22 Yellow
23 Green
33 Blue
![Page 21](https://reader033.vdocument.in/reader033/viewer/2022051417/56649d255503460f949fbbf4/html5/thumbnails/21.jpg)
GPCRs (membrane-facing amino acids left out)
11 Red
12 Orange
22 Yellow
23 Green
33 Blue
![Page 22](https://reader033.vdocument.in/reader033/viewer/2022051417/56649d255503460f949fbbf4/html5/thumbnails/22.jpg)
Summary
Given many sequences:
Every residue’s role known.
Signaling paths detectable.
Two-step evolutionary model: first the main site, soon after the modulator site.
![Page 23](https://reader033.vdocument.in/reader033/viewer/2022051417/56649d255503460f949fbbf4/html5/thumbnails/23.jpg)
Beyond the summary
Sequence -> structure -> function is wrong. It should be:
Structure -> sequence -> function.
And, because active sites are at the surface, conserved residues are at or near the surface.
![Page 24](https://reader033.vdocument.in/reader033/viewer/2022051417/56649d255503460f949fbbf4/html5/thumbnails/24.jpg)
Beyond the summary
Why do all TIM-barrel enzymes have the functional residues at the C-terminal side of the strands?
![Page 25](https://reader033.vdocument.in/reader033/viewer/2022051417/56649d255503460f949fbbf4/html5/thumbnails/25.jpg)
Beyond the summary
11 Red: Main site
12 Orange: Around main site
22 Yellow: Core
23 Green: Modulator
[Figure: boxes 11, 12, 22, 23 and 33, with levels marked "Up to 4", "Up to 8", "Up to 14" and "Up to 18 residue types".]
![Page 26](https://reader033.vdocument.in/reader033/viewer/2022051417/56649d255503460f949fbbf4/html5/thumbnails/26.jpg)
The weakness of data
Data errors.
Poor software.
Data poorly understood.
Never enough data.
Few bioinformaticians around.
![Page 27](https://reader033.vdocument.in/reader033/viewer/2022051417/56649d255503460f949fbbf4/html5/thumbnails/27.jpg)
The weakness of data
Rob Hooft
WHAT_CHECK
www.cmbi.kun.nl/gv/servers/
www.cmbi.kun.nl/gv/pdbreport/
![Page 28](https://reader033.vdocument.in/reader033/viewer/2022051417/56649d255503460f949fbbf4/html5/thumbnails/28.jpg)
Structure validation
Everything that can go wrong, will go wrong, especially with things as complicated as protein structures.
![Page 29](https://reader033.vdocument.in/reader033/viewer/2022051417/56649d255503460f949fbbf4/html5/thumbnails/29.jpg)
Why?
Why does a sane (?) human being spend fourteen years searching for twelve million errors in the PDB?
![Page 30](https://reader033.vdocument.in/reader033/viewer/2022051417/56649d255503460f949fbbf4/html5/thumbnails/30.jpg)
Because:
All we know about proteins is derived from PDB files.
If a template is wrong the model will be wrong.
Errors become smaller when you know about them.
![Page 31](https://reader033.vdocument.in/reader033/viewer/2022051417/56649d255503460f949fbbf4/html5/thumbnails/31.jpg)
What do we check?
Administrative errors.
Crystal-specific errors.
NMR-specific errors.
Really wrong things.
Improbable things.
Things worth looking at.
Ad hoc things.
![Page 32](https://reader033.vdocument.in/reader033/viewer/2022051417/56649d255503460f949fbbf4/html5/thumbnails/32.jpg)
Error detection
Detecting errors is one thing, fixing them another…
We try not to say that the structure is wrong; instead, we try to say what is wrong with the structure.
Give hints on how to fix things.
![Page 33](https://reader033.vdocument.in/reader033/viewer/2022051417/56649d255503460f949fbbf4/html5/thumbnails/33.jpg)
How difficult can it be?
![Page 34](https://reader033.vdocument.in/reader033/viewer/2022051417/56649d255503460f949fbbf4/html5/thumbnails/34.jpg)
How difficult can it be?
![Page 35](https://reader033.vdocument.in/reader033/viewer/2022051417/56649d255503460f949fbbf4/html5/thumbnails/35.jpg)
Your best check:
![Page 36](https://reader033.vdocument.in/reader033/viewer/2022051417/56649d255503460f949fbbf4/html5/thumbnails/36.jpg)
Planarity
![Page 37](https://reader033.vdocument.in/reader033/viewer/2022051417/56649d255503460f949fbbf4/html5/thumbnails/37.jpg)
Little things hurt big
![Page 38](https://reader033.vdocument.in/reader033/viewer/2022051417/56649d255503460f949fbbf4/html5/thumbnails/38.jpg)
Improbable things
![Page 39](https://reader033.vdocument.in/reader033/viewer/2022051417/56649d255503460f949fbbf4/html5/thumbnails/39.jpg)
How wrong is wrong?
![Page 40](https://reader033.vdocument.in/reader033/viewer/2022051417/56649d255503460f949fbbf4/html5/thumbnails/40.jpg)
Our errors
Four sigma: 12,000 false positives.
Administrative errors misunderstood.
Improbable is not wrong.
Poor data makes errors unavoidable.
Bugs.
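The slide does not say how many checks lie behind the 12,000 false positives, so the following only illustrates why a fixed cutoff produces them. Assuming (loosely) that the scores of correct structures behave like independent, standard-normal Z-scores, the expected number of values flagged beyond four sigma grows linearly with the number of checks. The function names and the `n_checks` parameter are hypothetical, and real check scores need not be normally distributed.

```python
import math


def two_sided_tail(z):
    """P(|Z| > z) for a standard normal Z, using the complementary error function."""
    return math.erfc(z / math.sqrt(2.0))


def expected_false_positives(n_checks, z=4.0):
    """Expected flags beyond +/- z sigma when all n_checks values are actually fine.

    Assumes independent, standard-normal scores; a hypothetical model only.
    """
    return n_checks * two_sided_tail(z)


if __name__ == "__main__":
    print(f"P(|Z| > 4) = {two_sided_tail(4.0):.3e}")
    for n in (10_000, 1_000_000, 100_000_000):
        print(n, round(expected_false_positives(n)))
```

Under this model the four-sigma tail is about 6.3 × 10^-5, i.e. roughly 63 false alarms per million error-free values, so a database-wide run of many millions of checks inevitably flags thousands of things that are merely improbable.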
![Page 41](https://reader033.vdocument.in/reader033/viewer/2022051417/56649d255503460f949fbbf4/html5/thumbnails/41.jpg)
Contact Probability
![Page 42](https://reader033.vdocument.in/reader033/viewer/2022051417/56649d255503460f949fbbf4/html5/thumbnails/42.jpg)
Contact Probability
![Page 43](https://reader033.vdocument.in/reader033/viewer/2022051417/56649d255503460f949fbbf4/html5/thumbnails/43.jpg)
DACA
![Page 44](https://reader033.vdocument.in/reader033/viewer/2022051417/56649d255503460f949fbbf4/html5/thumbnails/44.jpg)
DACA
![Page 45](https://reader033.vdocument.in/reader033/viewer/2022051417/56649d255503460f949fbbf4/html5/thumbnails/45.jpg)
DACA
![Page 46](https://reader033.vdocument.in/reader033/viewer/2022051417/56649d255503460f949fbbf4/html5/thumbnails/46.jpg)
DACA
![Page 47](https://reader033.vdocument.in/reader033/viewer/2022051417/56649d255503460f949fbbf4/html5/thumbnails/47.jpg)
DACA
![Page 48](https://reader033.vdocument.in/reader033/viewer/2022051417/56649d255503460f949fbbf4/html5/thumbnails/48.jpg)
Contact probability box
![Page 49](https://reader033.vdocument.in/reader033/viewer/2022051417/56649d255503460f949fbbf4/html5/thumbnails/49.jpg)
Using contact probability
![Page 50](https://reader033.vdocument.in/reader033/viewer/2022051417/56649d255503460f949fbbf4/html5/thumbnails/50.jpg)
His, Asn, Gln ‘flips’
![Page 51](https://reader033.vdocument.in/reader033/viewer/2022051417/56649d255503460f949fbbf4/html5/thumbnails/51.jpg)
Where are the protons?
![Page 52](https://reader033.vdocument.in/reader033/viewer/2022051417/56649d255503460f949fbbf4/html5/thumbnails/52.jpg)
Hydrogen bond network
![Page 53](https://reader033.vdocument.in/reader033/viewer/2022051417/56649d255503460f949fbbf4/html5/thumbnails/53.jpg)
Hydrogen bond force field
![Page 54](https://reader033.vdocument.in/reader033/viewer/2022051417/56649d255503460f949fbbf4/html5/thumbnails/54.jpg)
Hydrogen bond force field
![Page 55](https://reader033.vdocument.in/reader033/viewer/2022051417/56649d255503460f949fbbf4/html5/thumbnails/55.jpg)
15% should be flipped
![Page 56](https://reader033.vdocument.in/reader033/viewer/2022051417/56649d255503460f949fbbf4/html5/thumbnails/56.jpg)
Summary
Everything that could go wrong has gone wrong.
Errors are on a ‘sliding scale’.
Error detection can detect a lot, but surely not everything (yet).
![Page 57](https://reader033.vdocument.in/reader033/viewer/2022051417/56649d255503460f949fbbf4/html5/thumbnails/57.jpg)
Beyond the summary, for drug design:
Forget: High throughput.
Forget: Docking.
Forget: Structure in the absence of many, many sequences.
First gather and digest all experimental data.
![Page 58](https://reader033.vdocument.in/reader033/viewer/2022051417/56649d255503460f949fbbf4/html5/thumbnails/58.jpg)
Beyond the summary, for drug design:
First know your enemy,
then defeat it.
![Page 59](https://reader033.vdocument.in/reader033/viewer/2022051417/56649d255503460f949fbbf4/html5/thumbnails/59.jpg)
Thanks to:
Laerte Oliveira, Sao Paulo
Florence Horn, San Francisco
Rob Hooft, Delft
Wilma Kuipers, Weesp
Bob Bywater, Copenhagen
Nora vd Wenden, The Hague
Mike Singer, Boston
Ad IJzerman, Leiden
Margot Beukers, Leiden
Amos Bairoch, Geneva
Fabien Campagne, San Diego