![Page 1: It & Health 2009 Summary](https://reader036.vdocument.in/reader036/viewer/2022062421/56812b20550346895d8f1c84/html5/thumbnails/1.jpg)
It & Health 2009 Summary
Thomas Nordahl Petersen
![Page 2: It & Health 2009 Summary](https://reader036.vdocument.in/reader036/viewer/2022062421/56812b20550346895d8f1c84/html5/thumbnails/2.jpg)
Teachers
Thomas Nordahl Petersen
Rasmus Wernersson
Lisbeth Nielsen Fink
Anders Gorm Pedersen
Bent Petersen
Ramneek Gupta
Thomas Blicher
![Page 3: It & Health 2009 Summary](https://reader036.vdocument.in/reader036/viewer/2022062421/56812b20550346895d8f1c84/html5/thumbnails/3.jpg)
Outline of the course
• Topics will cover a general introduction to bioinformatics
  – Evolution
  – DNA / Protein
  – Alignment and scoring matrices
    • How do they work, and what do the numbers mean?
  – Visualization of multiple alignments
    • Phylogenetic trees and logo plots
  – Commonly used databases
    • UniProt/GenBank & genome browsers
  – Protein 3D-structure
  – Artificial neural networks & case stories
  – Practical use of bioinformatics tools
• Preparation for the exam
![Page 4: It & Health 2009 Summary](https://reader036.vdocument.in/reader036/viewer/2022062421/56812b20550346895d8f1c84/html5/thumbnails/4.jpg)
Topics covered (a selection)
![Page 5: It & Health 2009 Summary](https://reader036.vdocument.in/reader036/viewer/2022062421/56812b20550346895d8f1c84/html5/thumbnails/5.jpg)
Information flow in biological systems
![Page 6: It & Health 2009 Summary](https://reader036.vdocument.in/reader036/viewer/2022062421/56812b20550346895d8f1c84/html5/thumbnails/6.jpg)
Amino Acids
Each amino acid has an amine group and a carboxyl group. The side chain 'R' is attached to the alpha carbon (Cα).
The amino acids incorporated into proteins by living organisms are L-amino acids.
![Page 7: It & Health 2009 Summary](https://reader036.vdocument.in/reader036/viewer/2022062421/56812b20550346895d8f1c84/html5/thumbnails/7.jpg)
Amino Acids - peptide bond
[Figure: amino acids joined by peptide bonds, running from the N-terminal to the C-terminal end]
![Page 8: It & Health 2009 Summary](https://reader036.vdocument.in/reader036/viewer/2022062421/56812b20550346895d8f1c84/html5/thumbnails/8.jpg)
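One- and three-letter codes
1. There are 20 standard amino acids.
2. Normally the one- or three-letter codes are used.

| 3-letter | 1-letter | 3-letter | 1-letter |
|---|---|---|---|
| Ala | A | Met | M |
| Cys | C | Asn | N |
| Asp | D | Pro | P |
| Glu | E | Gln | Q |
| Phe | F | Arg | R |
| Gly | G | Ser | S |
| His | H | Thr | T |
| Ile | I | Val | V |
| Lys | K | Trp | W |
| Leu | L | Tyr | Y |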
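Converting between the two code systems is a simple lookup. The sketch below builds the mapping from the 20 codes listed above; the function names `to_one_letter` and `to_three_letter` are hypothetical helpers, not part of any library.

```python
# Three-letter -> one-letter codes for the 20 standard amino acids (from the list above).
THREE_TO_ONE = {
    "Ala": "A", "Cys": "C", "Asp": "D", "Glu": "E", "Phe": "F",
    "Gly": "G", "His": "H", "Ile": "I", "Lys": "K", "Leu": "L",
    "Met": "M", "Asn": "N", "Pro": "P", "Gln": "Q", "Arg": "R",
    "Ser": "S", "Thr": "T", "Val": "V", "Trp": "W", "Tyr": "Y",
}

# Reverse mapping, one-letter -> three-letter.
ONE_TO_THREE = {one: three for three, one in THREE_TO_ONE.items()}


def to_one_letter(residues):
    """Convert a list of three-letter codes (any capitalisation) to a one-letter string."""
    return "".join(THREE_TO_ONE[r.capitalize()] for r in residues)


def to_three_letter(sequence):
    """Convert a one-letter sequence to hyphen-separated three-letter codes."""
    return "-".join(ONE_TO_THREE[c] for c in sequence.upper())


print(to_one_letter(["Met", "ALA", "trp"]))  # MAW
print(to_three_letter("maw"))                # Met-Ala-Trp
```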
![Page 9: It & Health 2009 Summary](https://reader036.vdocument.in/reader036/viewer/2022062421/56812b20550346895d8f1c84/html5/thumbnails/9.jpg)
Center for Biological Sequence Analysis
Theory of evolution
Charles Darwin (1809-1882)
![Page 10: It & Health 2009 Summary](https://reader036.vdocument.in/reader036/viewer/2022062421/56812b20550346895d8f1c84/html5/thumbnails/10.jpg)
Phylogenetic tree
![Page 11: It & Health 2009 Summary](https://reader036.vdocument.in/reader036/viewer/2022062421/56812b20550346895d8f1c84/html5/thumbnails/11.jpg)
Global versus local alignments
Global alignment: align the full length of both sequences (the "Needleman-Wunsch" algorithm).
Local alignment: find the best partial alignment of two sequences (the "Smith-Waterman" algorithm).
[Figure: Seq 1 and Seq 2 shown under a global alignment and under a local alignment]
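What makes Smith-Waterman local is that every cell of the dynamic-programming matrix is floored at 0, so an alignment may start anywhere, and the traceback starts from the highest-scoring cell rather than the corner. A minimal sketch, assuming simple match/mismatch scores and a linear gap penalty (illustrative values, not a substitution matrix with affine gaps as BLAST uses):

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Local alignment with a linear gap penalty.

    Returns (best_score, aligned_a, aligned_b). The scoring values are
    illustrative defaults, not taken from any substitution matrix.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1] and b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The 0 lets an alignment start fresh anywhere: this is what makes it local.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero cell is reached.
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(smith_waterman("xxACGTyy", "zzACGTww"))  # (8, 'ACGT', 'ACGT')
```

The unrelated flanks (`xx`/`zz`, `yy`/`ww`) are simply left out, which a global alignment could not do.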
![Page 12: It & Health 2009 Summary](https://reader036.vdocument.in/reader036/viewer/2022062421/56812b20550346895d8f1c84/html5/thumbnails/12.jpg)
Pairwise alignment: the solution
"Dynamic programming" (the Needleman-Wunsch algorithm)
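The dynamic-programming idea can be sketched in a few lines: each cell holds the best score for aligning two prefixes, computed from its three neighbours, and the global score ends up in the bottom-right corner. This is a minimal sketch assuming a linear gap penalty and toy match/mismatch scores (+1/-1, gap -1) in place of a substitution matrix:

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score and one optimal alignment (linear gap penalty)."""
    rows, cols = len(a) + 1, len(b) + 1
    F = [[0] * cols for _ in range(rows)]
    # First column/row: aligning a prefix against nothing costs one gap per residue.
    for i in range(1, rows):
        F[i][0] = i * gap
    for j in range(1, cols):
        F[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                          F[i - 1][j] + gap,    # gap in b
                          F[i][j - 1] + gap)    # gap in a

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return F[len(a)][len(b)], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("ACGT", "AGT"))  # (2, 'ACGT', 'A-GT')
```

Filling the matrix takes time proportional to len(a) × len(b), instead of enumerating every possible alignment.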
![Page 13: It & Health 2009 Summary](https://reader036.vdocument.in/reader036/viewer/2022062421/56812b20550346895d8f1c84/html5/thumbnails/13.jpg)
Sequence alignment - BLAST
![Page 14: It & Health 2009 Summary](https://reader036.vdocument.in/reader036/viewer/2022062421/56812b20550346895d8f1c84/html5/thumbnails/14.jpg)
Sequence alignment - BLAST
![Page 15: It & Health 2009 Summary](https://reader036.vdocument.in/reader036/viewer/2022062421/56812b20550346895d8f1c84/html5/thumbnails/15.jpg)
BLOSUM & PAM matrices
• BLOSUM matrices are the most commonly used substitution matrices.
• BLOSUM50, BLOSUM62, BLOSUM80
• PAM - Percent Accepted Mutations
• PAM-0 is the identity matrix.
• PAM-1: diagonal elements deviate slightly from 1, off-diagonal elements deviate slightly from 0.
• PAM-250 is PAM-1 multiplied by itself 250 times.
• These statements describe the PAM mutation probability matrices; the PAM scoring matrices are log-odds transforms of them.
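The matrix-power relationship is easy to check numerically. The sketch below uses a made-up 4×4 "PAM-1-like" probability matrix (real PAM matrices are 20×20 and derived from observed substitutions); `numpy.linalg.matrix_power` raises it to the 0th and 250th power:

```python
import numpy as np

# Hypothetical 4x4 mutation probability matrix (rows sum to 1): diagonal
# entries slightly below 1, off-diagonal entries slightly above 0.
# Values are made up for illustration; this is not real PAM data.
pam1 = np.array([
    [0.990, 0.004, 0.003, 0.003],
    [0.004, 0.990, 0.003, 0.003],
    [0.003, 0.003, 0.990, 0.004],
    [0.003, 0.003, 0.004, 0.990],
])

pam0 = np.linalg.matrix_power(pam1, 0)      # PAM-0: the identity matrix
pam250 = np.linalg.matrix_power(pam1, 250)  # PAM-1 multiplied by itself 250 times

print(np.round(pam250, 3))
```

After 250 steps the diagonal has dropped well below 0.99, reflecting more evolutionary distance, while each row still sums to 1.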
![Page 16: It & Health 2009 Summary](https://reader036.vdocument.in/reader036/viewer/2022062421/56812b20550346895d8f1c84/html5/thumbnails/16.jpg)
Sequence profiles (1J2J.B)
>1J2J.B mol:aa PROTEIN TRANSPORT
NVIFEDEEKSKMLARLLKSSHPEDLRAANKLIKEMVQEDQKRMEK
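The 1J2J.B record above is in FASTA format: a header line starting with `>`, followed by one or more sequence lines. A minimal parser sketch (the function name `parse_fasta` is hypothetical; libraries such as Biopython provide full-featured readers):

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict {header: sequence}.

    The header is the text after '>' on the header line; the sequence
    lines that follow are joined until the next header.
    """
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records


fasta = """>1J2J.B mol:aa PROTEIN TRANSPORT
NVIFEDEEKSKMLARLLKSSHPEDLRAANKLIKEMVQEDQKRMEK
"""
for header, seq in parse_fasta(fasta).items():
    print(header.split()[0], len(seq))
```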
![Page 17: It & Health 2009 Summary](https://reader036.vdocument.in/reader036/viewer/2022062421/56812b20550346895d8f1c84/html5/thumbnails/17.jpg)
Log-odds scores
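• BLOSUM is a log-likelihood (log-odds) matrix.
• The likelihood of observing j given that you have i is
  – $P(j \mid i) = P_{ij} / Q_i$, where $P_{ij}$ is the frequency of the aligned pair (i, j) and $Q_i$ is the frequency of i
• The prior likelihood of observing j is
  – $Q_j$, which is simply the frequency of j
• The log-likelihood score is
  – $S_{ij} = 2\log_2\left(\frac{P(j \mid i)}{Q_j}\right) = 2\log_2\left(\frac{P_{ij}}{Q_i Q_j}\right)$
  – where $\log_2(x) = \ln(x)/\ln(2)$
  – S is expressed in half-bits, hence the factor 2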
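To make the formula concrete, the sketch below evaluates $S_{ij} = 2\log_2(P_{ij}/(Q_i Q_j))$ for hypothetical frequencies chosen so that the pair is observed twice as often as expected by chance. The frequencies are made up, not taken from the BLOSUM data:

```python
import math


def log_odds_half_bits(p_ij, q_i, q_j):
    """Log-odds score in half-bits: S_ij = 2 * log2(P_ij / (Q_i * Q_j))."""
    return 2 * math.log2(p_ij / (q_i * q_j))


# Hypothetical frequencies: expected by chance 0.05 * 0.04 = 0.002, observed 0.004.
q_i, q_j, p_ij = 0.05, 0.04, 0.004
s = log_odds_half_bits(p_ij, q_i, q_j)
print(round(s, 6))

# Same score via the conditional form 2 * log2(P(j|i) / Q_j), using ln(x)/ln(2).
p_j_given_i = p_ij / q_i
print(round(2 * math.log(p_j_given_i / q_j) / math.log(2), 6))
```

A ratio of 2 gives 1 bit, i.e. 2 half-bits; pairs seen less often than chance get negative scores.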
![Page 18: It & Health 2009 Summary](https://reader036.vdocument.in/reader036/viewer/2022062421/56812b20550346895d8f1c84/html5/thumbnails/18.jpg)
BLAST Exercise
![Page 19: It & Health 2009 Summary](https://reader036.vdocument.in/reader036/viewer/2022062421/56812b20550346895d8f1c84/html5/thumbnails/19.jpg)
Genome browsers - UCSC
Intron - Exon structure
Single nucleotide polymorphism (SNP)
![Page 20: It & Health 2009 Summary](https://reader036.vdocument.in/reader036/viewer/2022062421/56812b20550346895d8f1c84/html5/thumbnails/20.jpg)
SNPs
![Page 21: It & Health 2009 Summary](https://reader036.vdocument.in/reader036/viewer/2022062421/56812b20550346895d8f1c84/html5/thumbnails/21.jpg)
Protein 3D-structure
![Page 22: It & Health 2009 Summary](https://reader036.vdocument.in/reader036/viewer/2022062421/56812b20550346895d8f1c84/html5/thumbnails/22.jpg)
Protein structure
Primary structure: amino acid sequence
Secondary structure: helix/beta sheet
Tertiary structure: fold, 3D coordinates
![Page 23: It & Health 2009 Summary](https://reader036.vdocument.in/reader036/viewer/2022062421/56812b20550346895d8f1c84/html5/thumbnails/23.jpg)
Protein structure - helix
• 3₁₀-helix: 3 residues/turn - few, but not uncommon
• α-helix: 3.6 residues/turn - by far the most common helix
• π-helix: 4.1 residues/turn - very rare
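Residues per turn translates directly into the rotation between consecutive residues: one turn is 360°, shared equally among the residues in it. A small sketch using the values listed above (the helper name is hypothetical):

```python
def rotation_per_residue(residues_per_turn):
    """Degrees of rotation about the helix axis per residue."""
    return 360.0 / residues_per_turn


# Residues per turn as listed above.
for name, n in {"3-10 helix": 3.0, "alpha helix": 3.6, "pi helix": 4.1}.items():
    print(f"{name}: {rotation_per_residue(n):.1f} degrees per residue")
# 3-10 helix: 120.0, alpha helix: 100.0, pi helix: 87.8
```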
![Page 24: It & Health 2009 Summary](https://reader036.vdocument.in/reader036/viewer/2022062421/56812b20550346895d8f1c84/html5/thumbnails/24.jpg)
Protein structure - β-strand/sheet
![Page 25: It & Health 2009 Summary](https://reader036.vdocument.in/reader036/viewer/2022062421/56812b20550346895d8f1c84/html5/thumbnails/25.jpg)
Protein folds
Class - the 4th class is 'few secondary structures'
Architecture - overall shape of a domain
Topology - shared secondary-structure connectivity
![Page 26: It & Health 2009 Summary](https://reader036.vdocument.in/reader036/viewer/2022062421/56812b20550346895d8f1c84/html5/thumbnails/26.jpg)
Protein 3D-structure
![Page 27: It & Health 2009 Summary](https://reader036.vdocument.in/reader036/viewer/2022062421/56812b20550346895d8f1c84/html5/thumbnails/27.jpg)
Neural Networks: from knowledge to information
Protein sequence → biological feature
![Page 28: It & Health 2009 Summary](https://reader036.vdocument.in/reader036/viewer/2022062421/56812b20550346895d8f1c84/html5/thumbnails/28.jpg)
Use of artificial neural networks
• A data-driven method to predict a feature, given a set of training data
• In biology, the input is typically an amino acid or nucleotide sequence
• Secondary structure prediction
• Signal peptide prediction
• Surface accessibility
• Propeptide prediction
[Figure: protein from N- to C-terminus: signal peptide, propeptide, mature/active protein]
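A common way to present a sequence to such a network is a sliding window with sparse (one-hot) encoding: 20 inputs per residue, exactly one of them set to 1. The sketch below shows only the forward pass of a one-hidden-layer network. The window size (9), the 5 hidden units and the random, untrained weights are all assumptions for illustration; a real predictor learns its weights from training data (e.g. by backpropagation) and reads the output as a feature score for the central residue:

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard one-letter codes


def one_hot_window(window):
    """Sparse encoding: 20 inputs per residue, exactly one set to 1."""
    x = np.zeros(len(window) * len(AMINO_ACIDS))
    for pos, aa in enumerate(window):
        x[pos * len(AMINO_ACIDS) + AMINO_ACIDS.index(aa)] = 1.0
    return x


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """One hidden layer; the sigmoid output lies in (0, 1)."""
    hidden = sigmoid(w_hidden @ x + b_hidden)
    return sigmoid(w_out @ hidden + b_out)


# Hypothetical, untrained network: 9-residue window, 5 hidden units, random weights.
rng = np.random.default_rng(0)
window_size, n_hidden = 9, 5
n_inputs = window_size * len(AMINO_ACIDS)
w_hidden = rng.normal(scale=0.1, size=(n_hidden, n_inputs))
b_hidden = np.zeros(n_hidden)
w_out = rng.normal(scale=0.1, size=n_hidden)
b_out = 0.0

x = one_hot_window("KSKMLARLL")  # a 9-residue window from the 1J2J.B sequence
score = forward(x, w_hidden, b_hidden, w_out, b_out)
print(score)
```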
![Page 29: It & Health 2009 Summary](https://reader036.vdocument.in/reader036/viewer/2022062421/56812b20550346895d8f1c84/html5/thumbnails/29.jpg)
Prediction of biological features - surface accessibility
Predict surface accessibility from the amino acid sequence only.
![Page 30: It & Health 2009 Summary](https://reader036.vdocument.in/reader036/viewer/2022062421/56812b20550346895d8f1c84/html5/thumbnails/30.jpg)
Logo plots
Information content: how it is calculated and what it means.
![Page 31: It & Health 2009 Summary](https://reader036.vdocument.in/reader036/viewer/2022062421/56812b20550346895d8f1c84/html5/thumbnails/31.jpg)
Logo plots - Information Content
Sequence logo
Calculate the information content:
$I = \sum_a p_a \log_2 p_a + \log_2(4)$, where $p_a$ is the frequency of nucleotide a at the position. The maximal value is 2 bits.
• The total height at a position is the information content, measured in bits.
• The height of each letter is proportional to the frequency of that letter (its frequency times the information content of the position).
• A logo plot is a visualization of a multiple alignment.
[Figure annotations: "~0.5 each" and "Completely conserved"]
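The information content formula can be evaluated per alignment column. A minimal sketch for DNA (alphabet size 4), without the small-sample correction that logo tools often apply; letters absent from the column contribute 0, following the convention 0·log 0 = 0. The helper names are hypothetical:

```python
import math
from collections import Counter


def information_content(column, alphabet_size=4):
    """I = sum_a p_a * log2(p_a) + log2(alphabet_size), in bits."""
    n = len(column)
    entropy_term = sum((c / n) * math.log2(c / n) for c in Counter(column).values())
    return entropy_term + math.log2(alphabet_size)


def letter_heights(column, alphabet_size=4):
    """Logo letter heights: letter frequency times the column's information content."""
    ic = information_content(column, alphabet_size)
    n = len(column)
    return {letter: (c / n) * ic for letter, c in Counter(column).items()}


print(information_content("AAAA"))  # 2.0 (completely conserved)
print(information_content("AATT"))  # 1.0 (two letters at 0.5 each)
print(information_content("ACGT"))  # 0.0 (no conservation)
print(letter_heights("AATT"))       # {'A': 0.5, 'T': 0.5}
```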