
Introduction to Bioinformatics

Lukas Mueller
Boyce Thompson Institute

What is bioinformatics?

● Bioinformatics /baɪ.oʊˌɪnfərˈmætɪks/ is the application of computer science and information technology to the fields of biology and medicine.

Bioinformatics deals with

● algorithms, databases and information systems, web technologies, artificial intelligence and soft computing, information and computation theory, software engineering, data mining, image processing, modeling and simulation, signal processing, discrete mathematics, control and system theory, circuit theory, and statistics,

● for generating new knowledge of biology and medicine, and improving & discovering new models of computation (e.g. DNA computing, neural computing, evolutionary computing, immuno-computing, swarm-computing, cellular-computing).

Bioinformatics can...

● Identify similar sequences
● Provide a putative function for a sequence
● Assemble sequences (genomes, transcriptomes)
● Annotate genomes
● Build networks of genes or metabolites
● Determine phylogenetic relationships
● Mine literature for biological information
● Uncover differences between two genomes
● Calculate how a protein folds

What can bioinformatics do for me?

● Speed up your research
● Enable you to ask new questions

● Majority of projects involve large datasets
● Basic knowledge of bioinformatics needed

● Extract information
● Transform information
● Run analyses
● Build hypotheses, etc.

● http://www.ncbi.nlm.nih.gov/Genbank/genbankstats.html

200 GB / run

The digital revolution

Increase in sequence data

L. Stein, Genome Biology, 2010

Web-based bioinformatics

The next step: Running “locally”

● Perform analyses on large datasets
● Analyses run faster
● Output easier to handle
● Chain analyses
● More flexible
● Better control of parameters
● Needs more knowledge about your computer and tools!

Highly cited bioinfo tools

● 1. BLAST (Altschul SF et al. 1990; 30,202 citations)
    ● Sequence search by homology/similarity.
● 2. CLUSTALW (Thompson JD et al. 1994; 32,681 citations)
    ● Multiple sequence alignment.
● 3. PAML (Yang ZH, 1997; 2,642 citations)
    ● “Maximum Likelihood” phylogenetic analysis.
● 4. GBROWSE (Stein LD et al. 2002; 428 citations)
    ● Genome visualization.
● 5. BLAST2GO (Conesa A et al. 2005; 363 citations)
    ● Sequence functional multi-annotation.

(continued)

● 6. VELVET (Zerbino DR et al. 2008; 323 citations)
    ● Sequence assembly using de Bruijn graphs.
● 7. SAMTOOLS (Li H et al. 2009; 172 citations)
    ● Multi-sequence alignment processing for NGS.
● 8. SOAP2 (Li RQ et al. 2009; 76 citations)
    ● Short-read alignment.
● 9. MAKER (Cantarel BL et al. 2008; 23 citations)
    ● Genome annotation pipeline.
● 10. GALAXY (Goecks J et al. 2010; 20 citations)
    ● Genomic analysis platform that integrates several scripts and tools.
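
The Velvet entry above mentions de Bruijn graphs. As a toy illustration only (this is not Velvet's implementation, and the reads and the function name are invented for this sketch), a de Bruijn graph can be built by cutting reads into k-mers and linking each k-mer's prefix to its suffix:

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Build a toy de Bruijn graph: nodes are (k-1)-mers, edges are distinct k-mers."""
    graph = defaultdict(list)
    seen_kmers = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer in seen_kmers:
                continue  # each distinct k-mer contributes one edge
            seen_kmers.add(kmer)
            # Edge from the k-mer's prefix to its suffix
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


# Invented overlapping reads, for illustration only
reads = ["ATGGC", "TGGCG", "GGCGT"]
graph = de_bruijn_graph(reads, k=3)
for node, successors in graph.items():
    print(node, "->", ", ".join(successors))
```

Walking the edges of this small graph from AT to GT spells out ATGGCGT, the sequence the three reads were taken from. Real assemblers also have to handle sequencing errors, repeats and reverse complements, which this sketch ignores.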

Running “pipelines”

Linux

● UNIX-like, free and open source operating system
● Very stable
● Adopted for most bioinformatics work
● Installed on laptops, clusters, supercomputers
● Can run on your computer!
    ● Virtualized or native

C, UNIX and Linux

• Ken Thompson and Dennis Ritchie, inventors of UNIX at Bell Labs, in front of a PDP-11 in the early 1970s.

• Linus Torvalds implemented an open source UNIX-like system (Linux) while a student in Finland in the early 1990s.

Linux

UNIX – the terminal

● Runs the “shell”
● Built-in scripting

shell commands

● Powerful, but text based (CLI)
● Automate tasks, combine commands
● Look like gobbledegook:

    grep Niben /var/log/ftp | grep -i sca | sort -u | wc -l
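
The pipeline on this slide reads left to right: keep the lines of /var/log/ftp that contain "Niben", keep those that also contain "sca" in any letter case, drop duplicate lines, and count what is left. A rough Python sketch of the same logic follows; the sample log lines and the function name are invented for illustration, and a real log would be read from a file instead.

```python
def count_unique_matches(lines):
    """Mimic: grep Niben | grep -i sca | sort -u | wc -l."""
    # grep Niben: case-sensitive match
    matches = [line for line in lines if "Niben" in line]
    # grep -i sca: case-insensitive match
    matches = [line for line in matches if "sca" in line.lower()]
    # sort -u | wc -l: count the distinct lines that remain
    return len(set(matches))


# Invented sample lines standing in for the contents of /var/log/ftp
sample_log = [
    "GET Niben_scaffolds.fasta",
    "GET Niben_scaffolds.fasta",
    "GET Niben_SCAFFOLDS.gff",
    "GET Niben_transcripts.fasta",
    "GET tomato_scaffolds.fasta",
]
print(count_unique_matches(sample_log))
```

Each step of the Python version corresponds to one command in the pipeline, which is the idea behind combining small shell commands with "|".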

Scripting

● Scripts: Small programs written by the end-user that control the execution of other programs or perform a simple algorithm

● Extremely flexible
● Written in Shell, Perl, Python
● You can write them yourself!!!
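
As a minimal sketch of what such a script might look like, the following Python example parses FASTA-formatted text and reports the length and GC fraction of each sequence. The function names and the two example sequences are assumptions made for illustration; they are not part of the course material.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a dict of {identifier: sequence}."""
    records = {}
    current_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: use the first word after ">" as the identifier
            parts = line[1:].split()
            current_id = parts[0] if parts else ""
            records[current_id] = ""
        elif current_id is not None:
            # Sequence line: append to the current record
            records[current_id] += line
    return records


def gc_fraction(sequence):
    """Return the fraction of G and C bases in a sequence (0.0 if empty)."""
    if not sequence:
        return 0.0
    seq = sequence.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


# Invented example data
example = """>seq1 example
ATGCGC
GGTA
>seq2
TTTTAA
"""
records = read_fasta(example)
for name, seq in records.items():
    print(name, len(seq), gc_fraction(seq))
```

In practice the text would come from a file, and an established library such as BioPerl (named later in these slides) would usually be preferred over a hand-written parser.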

Perl

● Versatile language
● Developed since the 1980s by Larry Wall
● Useful for bioinformatics and web development
● Support for objects
● Excellent integration of regular expressions (text handling language)
● Vast open source code library (http://cpan.org/)
● BioPerl
● Easy to learn
● http://www.perl.org/

Example

.....

R

● Language designed for statistics
● Support for matrix calculations, graphics
● Expression analysis, Next-Gen sequence analysis, graphics, genome annotation statistics, phylogeny
● Interactive
● Bioconductor package

Databases

● Biological data is highly structured
● Relational database systems (PostgreSQL, MySQL)
● Database schemas - normalization
● SQL
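
To make the idea of a normalized schema concrete, here is a small sketch. It uses Python's built-in sqlite3 module with an in-memory database instead of PostgreSQL or MySQL, purely so that it runs without a server. The table names, column names and gene names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Normalized schema: each species name is stored once in "organism",
# and "gene" rows refer to it by key instead of repeating the name.
conn.executescript("""
CREATE TABLE organism (
    organism_id INTEGER PRIMARY KEY,
    species     TEXT NOT NULL UNIQUE
);
CREATE TABLE gene (
    gene_id     INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    organism_id INTEGER NOT NULL REFERENCES organism(organism_id)
);
""")

conn.executemany(
    "INSERT INTO organism (organism_id, species) VALUES (?, ?)",
    [(1, "Solanum lycopersicum"), (2, "Nicotiana benthamiana")],
)
conn.executemany(
    "INSERT INTO gene (name, organism_id) VALUES (?, ?)",
    [("geneA", 1), ("geneB", 1), ("geneC", 2)],  # invented gene names
)

# SQL query: count genes per species by joining the two tables
rows = conn.execute("""
    SELECT o.species, COUNT(*)
    FROM gene AS g
    JOIN organism AS o ON o.organism_id = g.organism_id
    GROUP BY o.species
    ORDER BY o.species
""").fetchall()
print(rows)
```

Because the species name lives in one row only, correcting or renaming it is a single update rather than a change to every gene record.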

Transcriptomics and sequence assembly

● RNA-Seq technology and genome sequencing using next generation sequencing
● Experimental design, multiplexing
● Special tools developed
    ● Sequence preprocessing
    ● Aligners such as bwa, novoalign
    ● Assemblers such as newbler, mira, velvet
    ● Viewers
    ● File conversions
● Evaluation of assemblies
● Structural and functional annotation
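
The slide does not say which measures are used to evaluate assemblies. One widely used summary statistic is N50, shown here as an assumption about what such an evaluation could include: N50 is the contig length at which contigs of that length or longer contain at least half of all assembled bases. The contig lengths below are invented.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths."""
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        # First contig at which the running sum reaches half the assembly
        if running >= half_total:
            return length


# Invented contig lengths: the total is 300, so half is 150
contigs = [100, 80, 50, 30, 20, 20]
print(n50(contigs))
```

N50 rewards long contigs but says nothing about their correctness, so it is usually read alongside other checks.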

Phylogenetics and comparative genomics

● How do sequences/genomes relate to each other?
● Align sequences
    ● ClustalW
    ● Muscle
● Build phylogenetic trees
    ● Parsimony
    ● Neighbor joining
    ● Maximum likelihood
● Analyses
    ● Orthology
    ● Modes of selection
    ● Identification of SNP patterns
    ● Genome duplications
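
Distance-based tree methods such as neighbor joining start from pairwise distances between aligned sequences. The sketch below computes only that first step, using the simple p-distance (the fraction of differing sites); it does not build a tree. The aligned sequences and function names are invented for illustration, and real analyses would normally use a substitution model rather than raw p-distances.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Fraction of differing sites between two aligned sequences, skipping gaps."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    # Compare only alignment columns where neither sequence has a gap
    columns = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not columns:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in columns) / len(columns)


def distance_matrix(alignment):
    """Return {(name1, name2): p-distance} for every pair of sequences."""
    return {
        (n1, n2): p_distance(alignment[n1], alignment[n2])
        for n1, n2 in combinations(sorted(alignment), 2)
    }


# Invented alignment of three short sequences
alignment = {
    "seqA": "ATGCATGCAT",
    "seqB": "ATGCATGCAA",
    "seqC": "ATGAATG-TT",
}
distances = distance_matrix(alignment)
for pair, dist in distances.items():
    print(pair, round(dist, 3))
```

Here seqA and seqB differ at one site out of ten, so they are the closest pair and would be joined first by a distance-based method.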

Beyond this course

● BTI Perl Club

● If you have a bioinformatics question, please let us know!

● http://btiplantbioinfocourse.wordpress.com/