it & health 2009 summary thomas nordahl petersen
Post on 23-Jan-2016
![Page 1: It & Health 2009 Summary Thomas Nordahl Petersen](https://reader035.vdocument.in/reader035/viewer/2022062305/56649d4e5503460f94a2d6f3/html5/thumbnails/1.jpg)
It & Health 2009 Summary
Thomas Nordahl Petersen
![Page 2: It & Health 2009 Summary Thomas Nordahl Petersen](https://reader035.vdocument.in/reader035/viewer/2022062305/56649d4e5503460f94a2d6f3/html5/thumbnails/2.jpg)
Teachers
Thomas Nordahl Petersen
Rasmus Wernersson
Lisbeth Nielsen Fink
Anders Gorm Pedersen
Bent Petersen
Ramneek Gupta
Thomas Blicher
![Page 3: It & Health 2009 Summary Thomas Nordahl Petersen](https://reader035.vdocument.in/reader035/viewer/2022062305/56649d4e5503460f94a2d6f3/html5/thumbnails/3.jpg)
Outline of the course
• Topics will cover a general introduction to bioinformatics
– Evolution
– DNA / Protein
– Alignment and scoring matrices
  • How does it work & what are the numbers
– Visualization of multiple alignments
  • Phylogenetic trees and logo plots
– Commonly used databases
  • Uniprot/Genbank & Genome browsers
– Protein 3D-structure
– Artificial neural networks & case stories
– Practical use of bioinformatics tools
  • Preparation for exam
![Page 4: It & Health 2009 Summary Thomas Nordahl Petersen](https://reader035.vdocument.in/reader035/viewer/2022062305/56649d4e5503460f94a2d6f3/html5/thumbnails/4.jpg)
Topics covered - (some of them)
![Page 5: It & Health 2009 Summary Thomas Nordahl Petersen](https://reader035.vdocument.in/reader035/viewer/2022062305/56649d4e5503460f94a2d6f3/html5/thumbnails/5.jpg)
Information flow in biological systems
![Page 6: It & Health 2009 Summary Thomas Nordahl Petersen](https://reader035.vdocument.in/reader035/viewer/2022062305/56649d4e5503460f94a2d6f3/html5/thumbnails/6.jpg)
Amino Acids
Each amino acid has an amine group and a carboxyl group. The side chain ‘R’ is attached to the C-alpha carbon.
The amino acids found in living organisms are L-amino acids.
![Page 7: It & Health 2009 Summary Thomas Nordahl Petersen](https://reader035.vdocument.in/reader035/viewer/2022062305/56649d4e5503460f94a2d6f3/html5/thumbnails/7.jpg)
Amino Acids - peptide bond
N-terminal C-terminal
![Page 8: It & Health 2009 Summary Thomas Nordahl Petersen](https://reader035.vdocument.in/reader035/viewer/2022062305/56649d4e5503460f94a2d6f3/html5/thumbnails/8.jpg)
1 and 3-letter codes
1. There are 20 naturally occurring amino acids
2. Normally the one- or three-letter codes are used
Ala - A, Cys - C, Asp - D, Glu - E, Phe - F, Gly - G, His - H, Ile - I, Lys - K, Leu - L
Met - M, Asn - N, Pro - P, Gln - Q, Arg - R, Ser - S, Thr - T, Val - V, Trp - W, Tyr - Y
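The code table above maps directly onto a lookup dictionary. The following is a small sketch for converting between the two notations; the function names `to_one_letter` and `to_three_letter` are illustrative choices, not part of the course material.

```python
# Three-letter to one-letter codes for the 20 amino acids listed on the slide.
THREE_TO_ONE = {
    "Ala": "A", "Cys": "C", "Asp": "D", "Glu": "E", "Phe": "F",
    "Gly": "G", "His": "H", "Ile": "I", "Lys": "K", "Leu": "L",
    "Met": "M", "Asn": "N", "Pro": "P", "Gln": "Q", "Arg": "R",
    "Ser": "S", "Thr": "T", "Val": "V", "Trp": "W", "Tyr": "Y",
}
# Reverse lookup, built from the table above.
ONE_TO_THREE = {one: three for three, one in THREE_TO_ONE.items()}


def to_one_letter(three_letter_codes):
    """Convert a list of three-letter codes (any case) to a one-letter string."""
    return "".join(THREE_TO_ONE[code.capitalize()] for code in three_letter_codes)


def to_three_letter(sequence):
    """Convert a one-letter sequence string (any case) to three-letter codes."""
    return [ONE_TO_THREE[residue] for residue in sequence.upper()]


if __name__ == "__main__":
    print(to_one_letter(["Met", "Ala", "Lys"]))
    print(to_three_letter("NVIF"))
```

An unknown code raises a `KeyError`, which is probably the safest behaviour for a table that only covers the 20 standard residues.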
![Page 9: It & Health 2009 Summary Thomas Nordahl Petersen](https://reader035.vdocument.in/reader035/viewer/2022062305/56649d4e5503460f94a2d6f3/html5/thumbnails/9.jpg)
CENTER FOR BIOLOGICAL SEQUENCE ANALYSIS
Theory of evolution
Charles Darwin, 1809-1882
![Page 10: It & Health 2009 Summary Thomas Nordahl Petersen](https://reader035.vdocument.in/reader035/viewer/2022062305/56649d4e5503460f94a2d6f3/html5/thumbnails/10.jpg)
Phylogenetic tree
![Page 11: It & Health 2009 Summary Thomas Nordahl Petersen](https://reader035.vdocument.in/reader035/viewer/2022062305/56649d4e5503460f94a2d6f3/html5/thumbnails/11.jpg)
Global versus local alignments
Global alignment: align full length of both sequences. (The “Needleman-Wunsch” algorithm).
Local alignment: find best partial alignment of two sequences (the “Smith-Waterman” algorithm).
[Figure: Seq 1 and Seq 2 shown as a global alignment and as a local alignment]
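A local alignment can be sketched with the Smith-Waterman recurrence, in which cell scores are never allowed to drop below zero and the traceback starts at the highest-scoring cell. The scoring values used here (match +2, mismatch -1, linear gap -2) are assumptions chosen for illustration; in practice a substitution matrix would usually supply the match and mismatch scores.

```python
def smith_waterman(seq1, seq2, match=2, mismatch=-1, gap=-2):
    """Return (best score, aligned part of seq1, aligned part of seq2).

    Illustrative scoring only: the default values are assumptions.
    """
    n, m = len(seq1), len(seq2)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)

    # Fill the matrix; a cell is reset to 0 when every option is negative.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if seq1[i - 1] == seq2[j - 1] else mismatch
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + sub,  # align the two residues
                score[i - 1][j] + gap,      # gap in seq2
                score[i][j - 1] + gap,      # gap in seq1
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero-score cell is reached.
    i, j = best_pos
    aligned1, aligned2 = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if seq1[i - 1] == seq2[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            aligned1.append(seq1[i - 1])
            aligned2.append(seq2[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            aligned1.append(seq1[i - 1])
            aligned2.append("-")
            i -= 1
        else:
            aligned1.append("-")
            aligned2.append(seq2[j - 1])
            j -= 1
    return best, "".join(reversed(aligned1)), "".join(reversed(aligned2))


if __name__ == "__main__":
    print(smith_waterman("TTTTACGTTTTT", "GGGACGTGGG"))
```

Only the shared region is reported; the unrelated flanking residues of both sequences are left out of the alignment.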
![Page 12: It & Health 2009 Summary Thomas Nordahl Petersen](https://reader035.vdocument.in/reader035/viewer/2022062305/56649d4e5503460f94a2d6f3/html5/thumbnails/12.jpg)
Pairwise alignment: the solution
“Dynamic programming” (the Needleman-Wunsch algorithm)
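As a rough illustration of the dynamic programming idea, the sketch below fills a score matrix for a global alignment and then traces back from the bottom-right corner. The scoring scheme (match +1, mismatch -1, linear gap -2) is an assumption for the example and not taken from the slides.

```python
def needleman_wunsch(seq1, seq2, match=1, mismatch=-1, gap=-2):
    """Return (score, aligned seq1, aligned seq2) for a global alignment.

    Illustrative scoring only: the default values are assumptions.
    """
    n, m = len(seq1), len(seq2)
    score = [[0] * (m + 1) for _ in range(n + 1)]

    # First row and column: aligning a prefix against nothing costs only gaps.
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    # Each cell takes the best of: diagonal (pair), up (gap), left (gap).
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if seq1[i - 1] == seq2[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    aligned1, aligned2 = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and seq1[i - 1] == seq2[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            aligned1.append(seq1[i - 1])
            aligned2.append(seq2[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned1.append(seq1[i - 1])
            aligned2.append("-")
            i -= 1
        else:
            aligned1.append("-")
            aligned2.append(seq2[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(aligned1)), "".join(reversed(aligned2))


if __name__ == "__main__":
    print(needleman_wunsch("ACGT", "AGT"))
```

The matrix has (n+1) x (m+1) cells and each is computed once, so both time and memory grow with the product of the two sequence lengths.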
![Page 13: It & Health 2009 Summary Thomas Nordahl Petersen](https://reader035.vdocument.in/reader035/viewer/2022062305/56649d4e5503460f94a2d6f3/html5/thumbnails/13.jpg)
Sequence alignment - Blast
![Page 14: It & Health 2009 Summary Thomas Nordahl Petersen](https://reader035.vdocument.in/reader035/viewer/2022062305/56649d4e5503460f94a2d6f3/html5/thumbnails/14.jpg)
Sequence alignment - Blast
![Page 15: It & Health 2009 Summary Thomas Nordahl Petersen](https://reader035.vdocument.in/reader035/viewer/2022062305/56649d4e5503460f94a2d6f3/html5/thumbnails/15.jpg)
Blosum & PAM matrices
• Blosum matrices are the most commonly used substitution matrices.
• Blosum50, Blosum62, Blosum80
• PAM - Percent Accepted Mutations
• PAM-0 is the identity matrix.
• PAM-1: the diagonal has small deviations from 1, the off-diagonal has small deviations from 0
• PAM-250 is PAM-1 multiplied by itself 250 times.
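The relation between PAM-0, PAM-1 and PAM-250 can be illustrated with plain matrix powers. The 3-state matrix `TOY_PAM1` below is a made-up stand-in (a real PAM-1 matrix is 20 x 20 and estimated from data); it only mimics the property that the diagonal is close to 1 and the off-diagonal close to 0.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    size = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
        for i in range(size)
    ]


def mat_power(matrix, n):
    """Return matrix raised to the n-th power; n = 0 gives the identity."""
    size = len(matrix)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, matrix)
    return result


# Hypothetical 3-letter alphabet: each row sums to 1 (99% unchanged per step).
TOY_PAM1 = [
    [0.990, 0.005, 0.005],
    [0.005, 0.990, 0.005],
    [0.005, 0.005, 0.990],
]

if __name__ == "__main__":
    pam250 = mat_power(TOY_PAM1, 250)
    for row in pam250:
        print([round(value, 3) for value in row])
```

After 250 steps the rows still sum to 1, but the diagonal has dropped well below 0.99, reflecting the larger evolutionary distance.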
![Page 16: It & Health 2009 Summary Thomas Nordahl Petersen](https://reader035.vdocument.in/reader035/viewer/2022062305/56649d4e5503460f94a2d6f3/html5/thumbnails/16.jpg)
Sequence profiles (1J2J.B)
>1J2J.B mol:aa PROTEIN TRANSPORT
NVIFEDEEKSKMLARLLKSSHPEDLRAANKLIKEMVQEDQKRMEK
![Page 17: It & Health 2009 Summary Thomas Nordahl Petersen](https://reader035.vdocument.in/reader035/viewer/2022062305/56649d4e5503460f94a2d6f3/html5/thumbnails/17.jpg)
Log-odds scores
• BLOSUM is a log-likelihood matrix:
• Likelihood of observing j given you have i is
– $P(j|i) = P_{ij}/Q_i$
• The prior likelihood of observing j is
– $Q_j$, which is simply the frequency of j
• The log-likelihood score is
– $S_{ij} = 2\log_2\left(P(j|i)/Q_j\right) = 2\log_2\left(P_{ij}/(Q_i Q_j)\right)$
– where $\log_2(x) = \log_n(x)/\log_n(2)$
– S has been normalized to half bits, therefore the factor 2
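The log-odds score can be checked numerically. The sketch below computes the score in half bits from a pair frequency and two background frequencies; the numbers in the example are invented for illustration and are not taken from any real BLOSUM matrix.

```python
import math


def log_odds_half_bits(p_ij, q_i, q_j):
    """Score in half bits: 2 * log2(P_ij / (Q_i * Q_j))."""
    return 2 * math.log2(p_ij / (q_i * q_j))


def log_odds_via_conditional(p_ij, q_i, q_j):
    """Same score written with the conditional P(j|i) = P_ij / Q_i."""
    p_j_given_i = p_ij / q_i
    return 2 * math.log2(p_j_given_i / q_j)


if __name__ == "__main__":
    # Hypothetical frequencies: the pair is seen 4 times more often than chance.
    print(log_odds_half_bits(0.04, 0.1, 0.1))
    # A pair seen exactly as often as chance scores 0.
    print(log_odds_half_bits(0.01, 0.1, 0.1))
```

A positive score means the pair occurs more often than expected from the background frequencies, a negative score less often.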
![Page 18: It & Health 2009 Summary Thomas Nordahl Petersen](https://reader035.vdocument.in/reader035/viewer/2022062305/56649d4e5503460f94a2d6f3/html5/thumbnails/18.jpg)
BLAST Exercise
![Page 19: It & Health 2009 Summary Thomas Nordahl Petersen](https://reader035.vdocument.in/reader035/viewer/2022062305/56649d4e5503460f94a2d6f3/html5/thumbnails/19.jpg)
Genome browsers - UCSC
Intron - Exon structure
Single Nucleotide polymorphism - SNP
![Page 20: It & Health 2009 Summary Thomas Nordahl Petersen](https://reader035.vdocument.in/reader035/viewer/2022062305/56649d4e5503460f94a2d6f3/html5/thumbnails/20.jpg)
SNPs
![Page 21: It & Health 2009 Summary Thomas Nordahl Petersen](https://reader035.vdocument.in/reader035/viewer/2022062305/56649d4e5503460f94a2d6f3/html5/thumbnails/21.jpg)
Protein 3D-structure
![Page 22: It & Health 2009 Summary Thomas Nordahl Petersen](https://reader035.vdocument.in/reader035/viewer/2022062305/56649d4e5503460f94a2d6f3/html5/thumbnails/22.jpg)
Protein structure
Primary structure: Amino acid sequence
Secondary structure: Helix/Beta sheet
Tertiary structure: Fold, 3D coordinates
![Page 23: It & Health 2009 Summary Thomas Nordahl Petersen](https://reader035.vdocument.in/reader035/viewer/2022062305/56649d4e5503460f94a2d6f3/html5/thumbnails/23.jpg)
Protein structure - helix
3₁₀-helix: 3 residues/turn - few, but not uncommon
α-helix: 3.6 residues/turn - by far the most common helix
Pi-helix: 4.1 residues/turn - very rare
![Page 24: It & Health 2009 Summary Thomas Nordahl Petersen](https://reader035.vdocument.in/reader035/viewer/2022062305/56649d4e5503460f94a2d6f3/html5/thumbnails/24.jpg)
Protein structure - strand/sheet
![Page 25: It & Health 2009 Summary Thomas Nordahl Petersen](https://reader035.vdocument.in/reader035/viewer/2022062305/56649d4e5503460f94a2d6f3/html5/thumbnails/25.jpg)
Protein folds
Class: the 4th class is ‘few secondary structures’
Architecture: overall shape of a domain
Topology: shared secondary structure connectivity
![Page 26: It & Health 2009 Summary Thomas Nordahl Petersen](https://reader035.vdocument.in/reader035/viewer/2022062305/56649d4e5503460f94a2d6f3/html5/thumbnails/26.jpg)
Protein 3D-structure
![Page 27: It & Health 2009 Summary Thomas Nordahl Petersen](https://reader035.vdocument.in/reader035/viewer/2022062305/56649d4e5503460f94a2d6f3/html5/thumbnails/27.jpg)
Neural Networks
From knowledge to information
Protein sequence → Biological feature
![Page 28: It & Health 2009 Summary Thomas Nordahl Petersen](https://reader035.vdocument.in/reader035/viewer/2022062305/56649d4e5503460f94a2d6f3/html5/thumbnails/28.jpg)
Use of artificial neural networks
• A data-driven method to predict a feature, given a set of training data
• In biology, input features could be amino acid sequences or nucleotides
• Secondary structure prediction
• Signal peptide prediction
• Surface accessibility
• Propeptide prediction
[Figure: protein from N-terminus to C-terminus: signal peptide, propeptide, mature/active protein]
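One common way to present an amino acid sequence to a neural network is to encode a window of residues as a vector of zeros and ones. The slides do not specify an encoding, so the sketch below is only an assumption: a sparse (one-hot) encoding with a hypothetical window size of 3, where positions outside the sequence are encoded as all zeros.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 one-letter codes


def one_hot(residue):
    """20 values, with a single 1 at the residue's position (all 0 if unknown)."""
    return [1 if residue == aa else 0 for aa in AMINO_ACIDS]


def encode_windows(sequence, window=3):
    """Return one input vector per residue, built from a centred window.

    window is assumed to be odd; '-' padding at the ends encodes as zeros.
    """
    half = window // 2
    padded = "-" * half + sequence + "-" * half
    inputs = []
    for center in range(len(sequence)):
        vector = []
        for residue in padded[center:center + window]:
            vector.extend(one_hot(residue))
        inputs.append(vector)
    return inputs


if __name__ == "__main__":
    vectors = encode_windows("NVIF", window=3)
    print(len(vectors), len(vectors[0]))
```

Each vector would then be paired with the known feature of the central residue (for example buried or exposed) to form one training example.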
![Page 29: It & Health 2009 Summary Thomas Nordahl Petersen](https://reader035.vdocument.in/reader035/viewer/2022062305/56649d4e5503460f94a2d6f3/html5/thumbnails/29.jpg)
Prediction of biological features: surface accessibility
Predict surface accessibility from the amino acid sequence only.
![Page 30: It & Health 2009 Summary Thomas Nordahl Petersen](https://reader035.vdocument.in/reader035/viewer/2022062305/56649d4e5503460f94a2d6f3/html5/thumbnails/30.jpg)
Logo plots
Information content: how is it calculated, and what does it mean?
![Page 31: It & Health 2009 Summary Thomas Nordahl Petersen](https://reader035.vdocument.in/reader035/viewer/2022062305/56649d4e5503460f94a2d6f3/html5/thumbnails/31.jpg)
Logo plots - Information Content
Sequence-logo
Calculate Information Content
$I = \sum_a p_a \log_2 p_a + \log_2(4)$; the maximal value is 2 bits
• Total height at a position is the ‘Information Content’ measured in bits.
• Height of a letter is proportional to the frequency of that letter.
• A Logo plot is a visualization of a multiple alignment.
[Figure annotations: “~0.5 each”; “Completely conserved”]
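The information content formula can be tried on single alignment columns. The sketch below assumes a DNA alphabet of 4 letters (hence the log2(4) term) and, as is usual for logo plots, scales each letter's height as its frequency times the information content of the column; the function names are illustrative.

```python
import math
from collections import Counter


def information_content(column, alphabet_size=4):
    """I = sum_a p_a * log2(p_a) + log2(alphabet_size), in bits."""
    total = len(column)
    counts = Counter(column)
    # Letters that do not occur contribute nothing to the sum.
    weighted_logs = sum((c / total) * math.log2(c / total) for c in counts.values())
    return weighted_logs + math.log2(alphabet_size)


def letter_heights(column, alphabet_size=4):
    """Height of each observed letter: frequency times information content."""
    total = len(column)
    ic = information_content(column, alphabet_size)
    return {letter: ic * c / total for letter, c in Counter(column).items()}


if __name__ == "__main__":
    print(information_content("AAAA"))  # completely conserved column
    print(information_content("AACC"))  # two letters, 0.5 each
    print(information_content("ACGT"))  # all four letters equally frequent
    print(letter_heights("AACC"))
```

A completely conserved column reaches the maximum of 2 bits, while a column where all four nucleotides are equally frequent carries no information.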