Lecture 1 BNFO 135 Usman Roshan. Course overview Perl programming language (and some Unix basics)...
Post on 20-Dec-2015
![Page 1: Lecture 1 BNFO 135 Usman Roshan. Course overview Perl progamming language (and some Unix basics) –Unix basics –Intro Perl exercises –Programs for comparing](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d485503460f94a23d50/html5/thumbnails/1.jpg)
Lecture 1
BNFO 135
Usman Roshan
Course overview
• Perl programming language (and some Unix basics)
  – Unix basics
  – Intro Perl exercises
  – Programs for comparing DNA and protein sequences
• Sequence analysis
  – Pairwise and multiple sequence comparison
  – Sequence alignments
  – Application of alignments
  – Heuristic alignment (BLAST)
Overview (contd)
• Grade: 40% programming assignments, 30% mid-term and 30% final exam
• Recommended texts:
  – Perl for Bioinformatics by Arun Jagota
  – Introduction to Bioinformatics by Arthur Lesk
Nothing in biology makes sense, except in the light of evolution
[Figure: a DNA sequence evolving along a tree, on a timeline from 3 million years ago to today. The ancestral sequence AAGACTT gives rise to AAGGCTT and T_GACTT, then to _GGGCTT, TAGACCTT and A_CACTT, and finally to today's sequences. An underscore marks a gap.]
Today's sequences:
TAGGCCTT (Human)
TAGCCCTTA (Monkey)
ACCTT (Cat)
ACACTTC (Lion)
GGCTT (Mouse)
The same sequences with gaps inserted:
TAGGCCTT (Human)
TAGCCCTTA (Monkey)
A_C_CTT (Cat)
A_CACTTC (Lion)
_G_GCTT (Mouse)
Representing DNA in a format manipulatable by computers
• DNA is a double-helix molecule made up of four nucleotides:
  – Adenine (A)
  – Cytosine (C)
  – Thymine (T)
  – Guanine (G)
• Since A (adenine) always pairs with T (thymine) and C (cytosine) always pairs with G (guanine), knowing only one side of the ladder is enough.
• We represent DNA as a sequence of letters where each letter is A, C, G, or T.
• For example, we would represent the helix shown here as CAGT.
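The pairing rule means that one side of the ladder determines the other. A minimal sketch of that idea, written in Python rather than the course's Perl; the names `PAIRS` and `complement` are illustrative choices, not from the lecture:

```python
# Base-pairing rule from the slide: A pairs with T, C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(strand):
    """Return the opposite side of the ladder, letter by letter."""
    return "".join(PAIRS[base] for base in strand)

print(complement("CAGT"))  # prints GTCA
```

Applying `complement` twice gives back the original string, which is why storing a single strand loses no information.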
Transcription and translation
Amino acids
Proteins are chains of amino acids. There are twenty different amino acids that chain in different ways to form different proteins.
For example, FLLVALCCRFGH (this is how we could store it in a file)
This sequence of amino acids folds to form a 3-D structure.
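Because a protein is stored as a plain string, a program can check that every letter is one of the twenty standard one-letter amino acid codes. A small sketch in Python (the course itself uses Perl); `AMINO_ACIDS` and `is_protein` are hypothetical names introduced here for illustration:

```python
# The 20 standard amino acids, by their one-letter codes.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")

def is_protein(sequence):
    """True if the string is non-empty and every letter is one of the 20 codes."""
    return len(sequence) > 0 and all(letter in AMINO_ACIDS for letter in sequence)

print(is_protein("FLLVALCCRFGH"))  # prints True
print(is_protein("FLLVALCCRFGB"))  # prints False (B is not one of the 20)
```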
Protein folding
• The protein folding problem is to determine the 3-D protein structure from the sequence.
• Experimental techniques are very expensive.
• Computational methods are cheap, but the problem is difficult to solve.
• By comparing sequences we can deduce the evolutionarily conserved portions, which are also functional (most of the time).
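One simple way to look for conserved portions is to take sequences that have already been aligned and report the positions where all of them agree. The sketch below is in Python rather than the course's Perl, and it assumes the alignment is given; the function name `conserved_columns` and the toy sequences are hypothetical, chosen only for illustration:

```python
def conserved_columns(aligned):
    """Return the positions where every aligned sequence has the same letter."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")
    # A column counts as conserved only if it holds one distinct letter and no gap.
    return [i for i in range(length)
            if aligned[0][i] != "_" and len({seq[i] for seq in aligned}) == 1]

# Hypothetical toy alignment; "_" marks a gap.
toy = ["TAGGCCTT",
       "TAGCCCTT",
       "A_GACCTT"]
print(conserved_columns(toy))  # prints [2, 4, 5, 6, 7]
```

Positions are counted from 0, so the output says that the third letter and the last four letters are identical in all three sequences.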
Protein structure
• Primary structure: sequence of amino acids.
• Secondary structure: parts of the chain organize themselves into alpha helices, beta sheets, and coils. Helices and sheets are usually evolutionarily conserved and can aid sequence alignment.
• Tertiary structure: 3-D structure of the entire chain.
• Quaternary structure: complex of several chains.
Key points
• DNA can be represented as strings consisting of four letters: A, C, G, and T. These strings can be very long: thousands or even millions of letters.
• Proteins are also represented as strings, over an alphabet of 20 letters (each letter is an amino acid). Their 3-D structure determines their function to a large extent.