surya saha - wordpress.com · 2018-04-03 · surya saha sol genomics network (sgn) boyce thompson...

Surya Saha Sol Genomics Network (SGN) Boyce Thompson Institute, Ithaca, NY [email protected] // Twitter:@ SahaSurya BTI Plant Bioinformatics Course 2018 http:// www.acgt.me/blog/2015/3/7/next-generation-sequencing-must-die


Page 1: Surya Saha - WordPress.com · 2018-04-03 · Surya Saha Sol Genomics Network (SGN) Boyce Thompson Institute, ... So is genome annotation….. BTI Plant Bioinformatics Course 2018

Surya Saha

Sol Genomics Network (SGN)

Boyce Thompson Institute, Ithaca, NY

[email protected] // Twitter: @SahaSurya

BTI Plant Bioinformatics Course 2018

http://www.acgt.me/blog/2015/3/7/next-generation-sequencing-must-die

Sequencing over the Ages

1953: DNA structure discovery
1977: Sanger DNA sequencing by chain-terminating inhibitors
1984: Epstein-Barr virus (170 Kb)
1987: ABI 370 sequencer
1995: Haemophilus influenzae (1.83 Mb)
2001: Homo sapiens (3.0 Gb)
2005 to 2007: 454, Solexa, SOLiD
2011: Ion Torrent, PacBio
2014: Nanopore MinION, Pinus taeda (24 Gb)
2015: 10X Genomics

Other platforms shown: Illumina, Illumina HiSeq X. Remaining year markers: 2012, 2013.

Slide concept: Aureliano Bombarely

4/2/2018 BTI Plant Bioinformatics Course 2018


First generation sequencing


Sanger. Annu Rev Biochem. 1988;57:1-28.

Thanks to Nick Loman for the mention


Sanger method

Frederick Sanger (13 Aug 1918 – 19 Nov 2013)

Won the Nobel Prize for Chemistry in 1958 and 1980. Published the dideoxy chain termination method or “Sanger method” in 1977

http://dailym.ai/1f1XeTB


Sanger method


http://en.wikipedia.org/wiki/File:Sanger-sequencing.svg

http://en.wikipedia.org/wiki/File:Radioactive_Fluorescent_Seq.jpg


First generation sequencing

• Very high quality sequences (99.999% or Q50)

• Very very low throughput

Platform | Run Time | Read Length | Reads / Run | Total nucleotides sequenced | Cost / Mb
Capillary Sequencing (ABI3730xl) | 20m-3h | 400-900 bp | 96 or 384 | 1.9-84 Kb | $2400

http://www.hindawi.com/journals/bmri/2012/251364/tab1/


Next generation sequencing



Instead of “next generation sequencing”, name the specific technology used to generate the data

– Illumina HiSeq/MiSeq/NextSeq/NovaSeq

– Pacific Biosciences RS I/RS II/Sequel

– Ion Torrent Proton/PGM

– Oxford Nanopore


http://www.acgt.me/blog/2015/3/10/next-generation-sequencing-must-diepart-2


454 Pyrosequencing

One purified DNA fragment, to one bead, to one read.


http://www.genengnews.com/

GS FLX Titanium

https://mariamuir.com/wp-content/uploads/2013/04/rip.gif


Illumina

Instrument | MiSeq | NextSeq 500/550 | HiSeq 2500/3000/4000 | HiSeq X
Output | 15 Gb | 120 Gb | 1500 Gb | 1800 Gb
Max Number of Reads / Run | 25 Million | 400 Million | 5 Billion | 6 Billion
Max Read Length | 2x300 bp | 2x150 bp | 2x125-2x250 bp (RR mode) | 2x150 bp
Cost | $99K | $250K | $740K | $10M (10 units)

Source: Illumina


Illumina

Mardis 2008. Annu. Rev. Genomics Hum. Genet. 2008. 9:387–402



Pacific Biosciences SMRT sequencing

Single Molecule Real Time sequencing


http://smrt.med.cornell.edu/images/pacbio_library_prep-1.gif

RS II

Sequel

Pacific Biosciences SMRT sequencing

Error correction methods

PBcR Pipeline

Pacific Biosciences SMRT sequencing

Read Lengths


Oxford Nanopore


https://www.nanoporetech.com/

http://erlichya.tumblr.com/post/66376172948/hands-on-experience-with-oxford-nanopore-minion

http://halegrafx.com/vector-art/free-vector-despicable-me-minions/


http://lab.loman.net/2017/03/09/ultrareads-for-nanopore/

E. coli K-12 MG1655 on a standard FLO-MIN106 (R9.4) flowcell


Long range scaffolding



Hi-C Crosslinking



http://mms.businesswire.com/media/20150225005296/en/454639/5/GemCodePlatform.jpg

• Long read information from short reads using 14 bp barcodes

• Very low input DNA (as low as 0.625 ng)

• Short library preparation time

• 1 ng of DNA is split across 100,000 Gel Bead-in-Emulsion partitions (GEMs)

• Chromium instrument for single-cell RNAseq

GemCode


http://www.bionanogenomics.com/technology/why-genome-mapping/


Many Others..

• Ion Torrent Proton/PGM

• Dovetail

• Supporting technologies

– Nabsys

– OpGen

– Fluidigm


http://nextgenseek.com/2012/11/did-you-know-there-are-at-least-14-next-gen-sequence-technology-companies/


Real cost of Sequencing!!

Sboner, Genome Biology, 2011



So What Sequencer Do I Use??

Microbial genome

• Draft genome
  – Illumina Miseq (100-130X)
  – Illumina Hiseq (<200X)
• Complete genome
  – Pacific Biosciences (80-100X)
• Amplicons (16S, ITS)
  – Illumina Miseq

Eukaryotic genome

• De novo assembly
  – Pacific Biosciences (70-80X)
  – Illumina Hiseq (100X+)
  – 10X Genomics
  – Hi-C
• Genotyping (GBS)
  – Illumina Hiseq
• BACs
  – Pacific Biosciences
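The "X" figures above are depths of coverage. As a rough planning aid, average coverage can be estimated as total sequenced bases divided by genome size. The sketch below assumes that simple relationship; the function names and the 5 Mb genome in the usage note are illustrative and do not come from the slides.

```python
import math


def coverage(num_reads, read_length_bp, genome_size_bp):
    """Average depth of coverage: total sequenced bases / genome size."""
    return num_reads * read_length_bp / genome_size_bp


def reads_needed(target_coverage, read_length_bp, genome_size_bp):
    """Number of reads required to reach a target average coverage."""
    return math.ceil(target_coverage * genome_size_bp / read_length_bp)


if __name__ == "__main__":
    # Hypothetical 5 Mb microbial genome sequenced with 250 bp reads to 100X.
    n = reads_needed(100, 250, 5_000_000)
    print(n, "reads give", coverage(n, 250, 5_000_000), "X")
```

For a hypothetical 5 Mb microbial genome and 250 bp reads, 100X works out to 2 million reads, that is, 1 million read pairs on a paired-end run.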

4/3/2018 BTI Plant Bioinformatics Course 2018 26

$$$$ ????


Genome Assembly


http://biobeans.blogspot.com/2012/11/bioinformatics-genome-assembly.html


Slide credit: Torsten Seemann


Whole Genome Shotgun Sequencing

Slide credit: cbcb.umd.edu


Genome Sequencing Strategies

4/3/2018 Centre for Agricultural Bioinformatics, Pusa 30

International Human Genome Sequencing Consortium 2001

Overlap Layout Consensus

http://contig.wordpress.com/

cbcb.umd.edu

Long read sequencing


Overlap-Layout-Consensus


Slide source: Commins 2009

De Bruijn Graph
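The slide shows the graph only as a figure. As a minimal sketch of the idea, assuming the common textbook construction in which every k-mer in the reads becomes an edge from its (k-1)-base prefix to its (k-1)-base suffix, the graph can be built as below. The function name and the toy read are illustrative, and real assemblers add error correction and graph simplification that this omits.

```python
from collections import defaultdict


def de_bruijn_edges(reads, k):
    """Build a De Bruijn graph as {prefix (k-1)-mer: [suffix (k-1)-mers]}."""
    graph = defaultdict(list)
    for read in reads:
        # Slide a window of width k along the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


if __name__ == "__main__":
    for node, successors in de_bruijn_edges(["ACGTAC"], 3).items():
        print(node, "->", successors)
```

An assembler then looks for paths through this graph that use the edges, which is why short, highly redundant reads suit this approach.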


Ingredients for a Good Assembly


Slide credit: Mike Schatz



The diploid reference genome

Gene to Genome – The BIG picture

CHROMOSOMES

SCAFFOLDS

CONTIGS

CONTIG GAPS

SCAFFOLD GAPS

GENES

MAP (chr1), Ovate (chr1), TM (chr 9), L2 (chr 10)


State of the SL2.50 Build

[Chart: chromosomes 0 to 12 on the x-axis, 0% to 100% on the y-axis; series: Sequence, Scaffold gap length, Component gap length]

Length 823Mb

Sequence 737Mb

Contig gaps 43Mb (5.30%)

Scaffold gaps 42Mb (5.17%)

Total gaps 86Mb (10.47%)

Reference assembly but plenty of gaps!!
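The gap percentages can be re-derived from the lengths as a sanity check. The sketch below uses the Mb values exactly as rounded on the slide, so its results differ slightly from the percentages shown (for example, 43 Mb of 823 Mb is about 5.2%, against the 5.30% quoted), presumably because the slide's percentages were computed from unrounded lengths. The variable and function names are illustrative.

```python
# Lengths in Mb, as rounded on the slide.
TOTAL_MB = 823
SEQUENCE_MB = 737
CONTIG_GAPS_MB = 43
SCAFFOLD_GAPS_MB = 42
TOTAL_GAPS_MB = 86


def percent_of_build(part_mb, total_mb=TOTAL_MB):
    """Share of the build length taken by one component, in percent."""
    return 100.0 * part_mb / total_mb


if __name__ == "__main__":
    for label, value in [
        ("Sequence", SEQUENCE_MB),
        ("Contig gaps", CONTIG_GAPS_MB),
        ("Scaffold gaps", SCAFFOLD_GAPS_MB),
        ("Total gaps", TOTAL_GAPS_MB),
    ]:
        print(f"{label}: {value} Mb ({percent_of_build(value):.2f}%)")
```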


Summary

Any genome assembly:

• Is a hypothesis that needs to be refined

• Is a work in progress

• Can sometimes be misleading

So is genome annotation…..


Gene structure improvement example

Fusion of split genes (tracks: ITAG3.2, ITAG2.40, RNAseq XY plot)

UTR extension (tracks: ITAG3.2, ITAG2.40, RNAseq XY plot)

Required for 3’ RNAseq


Quality check - Annotation Edit Distance (AED)

Based on RNAseq data support

AED = 0: complete support

AED = 1: lack of support

AED provides a means to evaluate quality of annotations given RNAseq and ortholog evidence

[Plot: x-axis Annotation Edit Distance; y-axis Cumulative fraction of transcripts]
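The slide gives only the two end points of the AED scale. A minimal sketch of how a score in between could arise is shown below, assuming the definition commonly associated with the MAKER pipeline: AED = 1 - (SN + SP) / 2, where SN is the fraction of evidence bases covered by the annotation and SP is the fraction of annotation bases covered by evidence. That formula, the interval representation and the function names are assumptions for illustration, not something stated on the slide.

```python
def positions(intervals):
    """Expand 1-based inclusive (start, end) exon intervals into a set of bases."""
    covered = set()
    for start, end in intervals:
        covered.update(range(start, end + 1))
    return covered


def aed(annotation_exons, evidence_exons):
    """Annotation Edit Distance under the assumed 1 - (SN + SP) / 2 definition."""
    annotation = positions(annotation_exons)
    evidence = positions(evidence_exons)
    overlap = len(annotation & evidence)
    if overlap == 0:
        return 1.0  # no shared bases: no support at all
    sensitivity = overlap / len(evidence)
    specificity = overlap / len(annotation)
    return 1.0 - (sensitivity + specificity) / 2.0


if __name__ == "__main__":
    print(aed([(1, 100)], [(1, 100)]))     # identical: complete support
    print(aed([(1, 100)], [(51, 150)]))    # half of each overlaps
    print(aed([(1, 100)], [(201, 300)]))   # disjoint: lack of support
```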


Solanaceae Apollo annotation editor

Genomes available in Apollo

• Request access to Apollo by contacting SGN

• More organisms will be added as they become available.

For creating account: https://solgenomics.net/contact/form

Apollo: collaborative genome annotation editor

https://github.com/gmod/apollo

http://genomearchitect.org


Editing an existing gene model


Correction of predicted gene model

Information Editor

• DBXRefs (InterPro, Pfam)

• PubMed IDs

• Gene Ontology IDs (GO)

• Comments


Cornell Sequencing Core

• Illumina Hiseq 2500 (Rapid run and High output)

• Illumina Miseq

• Illumina Nextseq 500

• 10X Genomics GemCode


http://www.biotech.cornell.edu/brc/genomics/services/price-list#overlay-context=brc/genomics-facility/next-generation-sequencing

$$$


Library Types

Single end

Paired end (PE, 150-300 bp, Fwd:/1, Rev:/2)

Mate pair (MP, 2 Kb to 20 Kb)

[Diagram: forward (F) and reverse (R) read orientations for each library type; platform labels: 454/Roche, Illumina]

Slide credit: Aureliano Bombarely


Implications of Choice of Library

• Reads are assembled into a consensus sequence (contig)

• Pair read information joins contigs into a scaffold (or supercontig), with gaps written as NNNNN

• Genetic information (markers) or optical maps order scaffolds into a pseudomolecule (or ultracontig)

Slide credit: Aureliano Bombarely
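Because scaffold gaps are written as runs of N, the contigs inside a scaffold can be recovered by splitting on those runs. The sketch below assumes that any run of at least `min_gap` N characters marks a gap; the function name, the parameter and the toy sequence are illustrative.

```python
import re


def split_scaffold(scaffold, min_gap=1):
    """Split a scaffold sequence into contigs at runs of N of length >= min_gap."""
    pattern = r"N{%d,}" % min_gap
    return [contig for contig in re.split(pattern, scaffold.upper()) if contig]


if __name__ == "__main__":
    toy_scaffold = "ACGTNNNNNGGCCNNTT"
    print(split_scaffold(toy_scaffold))             # every N run is a gap
    print(split_scaffold(toy_scaffold, min_gap=3))  # short N runs stay inside contigs
```

Raising `min_gap` treats short N runs as ambiguous bases within a contig rather than as scaffold gaps, which is a judgment call that depends on how the assembly was built.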


Multiplexing Libraries

Use of different tags (4-6 nucleotides) to identify different samples in the same lane/sector.

[Diagram: fragments from two samples tagged with AGTCGT and TGAGCA, pooled, sequenced together, and separated again by tag]

Slide credit: Aureliano Bombarely
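Separating pooled reads by tag (demultiplexing) can be sketched as below. The sketch assumes the tag sits at the very start of each read and must match exactly; production tools usually tolerate mismatches and often read the tag from a separate index read. The sample names and function name are hypothetical, and the two tags are the ones shown on the slide.

```python
def demultiplex(reads, barcodes):
    """Assign reads to samples by an exact-match barcode at the read start.

    reads: iterable of sequence strings.
    barcodes: dict mapping barcode sequence -> sample name.
    Returns a dict of sample name -> list of reads with the barcode removed.
    """
    bins = {sample: [] for sample in barcodes.values()}
    bins["unassigned"] = []
    for seq in reads:
        for barcode, sample in barcodes.items():
            if seq.startswith(barcode):
                bins[sample].append(seq[len(barcode):])
                break
        else:
            # No barcode matched this read.
            bins["unassigned"].append(seq)
    return bins


if __name__ == "__main__":
    tags = {"AGTCGT": "sample_A", "TGAGCA": "sample_B"}
    pooled = ["AGTCGTAAAC", "TGAGCAGGGT", "CCCCCCTTTT"]
    print(demultiplex(pooled, tags))
```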


Data!!



File Formats

Fasta files:

It is a text-based format for representing either nucleotide sequences or peptide sequences, in which nucleotides or amino acids are represented using single-letter codes.

-Wikipedia

Slide credit: Aureliano Bombarely
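A minimal reader for this format might look like the sketch below. It relies on the standard FASTA convention that each record starts with a ">" header line followed by one or more sequence lines; taking the first word of the header as the record name is a simplifying choice made here, and the function name is illustrative.

```python
def parse_fasta(lines):
    """Parse FASTA text lines into a dict of {record name: sequence}."""
    records = {}
    name = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record.
            if name is not None:
                records[name] = "".join(chunks)
            header = line[1:].split()
            name = header[0] if header else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


if __name__ == "__main__":
    example = [">seq1 demo nucleotide", "ACGT", "TTGA", ">seq2", "MKV"]
    print(parse_fasta(example))
```

For real projects an established parser such as Biopython's SeqIO is usually a safer choice than a hand-written one.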


File Formats

Fastq files:

FASTQ format is a text-based format for storing both a biological sequence (usually nucleotide sequence) and its corresponding quality scores.

-Wikipedia

• Single-line ID with the at symbol (“@”) in the first column.

• The sequence can span multiple lines after the ID line.

• A single line with the plus symbol (“+”) in the first column separates the sequence from the quality values.

• The “+” line may repeat the ID.

• Quality values can span multiple lines after the “+” line, but their total length must equal the sequence length.

Slide credit: Aureliano Bombarely
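The rules in the bullets above can be turned into a small reader. The sketch below collects sequence lines until the "+" line, then reads quality lines until their combined length matches the sequence, which is what allows a quality line to start with "@" without being mistaken for a new record. The function name and the example records are illustrative.

```python
def parse_fastq(lines):
    """Parse FASTQ text lines into a list of (id, sequence, quality) tuples."""
    records = []
    it = iter(line.rstrip("\r\n") for line in lines)
    for header in it:
        if not header:
            continue
        if not header.startswith("@"):
            raise ValueError(f"expected '@' header, got: {header!r}")
        # Sequence lines run until the '+' separator line.
        seq_parts = []
        for line in it:
            if line.startswith("+"):
                break
            seq_parts.append(line)
        sequence = "".join(seq_parts)
        # Quality lines run until they match the sequence length.
        quality = ""
        while len(quality) < len(sequence):
            chunk = next(it, None)
            if chunk is None:
                raise ValueError(f"truncated quality for record {header!r}")
            quality += chunk
        if len(quality) != len(sequence):
            raise ValueError(f"quality/sequence length mismatch in {header!r}")
        records.append((header[1:], sequence, quality))
    return records


if __name__ == "__main__":
    example = ["@r1", "ACGT", "AC", "+", "IIII", "II", "@r2 desc", "GG", "+r2", "@#"]
    for record in parse_fastq(example):
        print(record)
```

Most modern instruments write each record on exactly four lines, so the multi-line handling mainly matters for older or reformatted files.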

Quality control: Encoding

Fastq files:

!"#$%&'()*+,-./0123456789 Offset by 33 (Phred+33)

KLMNOPQRSTUVWXYZ[\]^_`abcdefgh Offset by 64 (Phred+64)
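Decoding a quality string is a matter of subtracting the offset from each character's ASCII code. The sketch below shows this for both offsets; the function name is illustrative, and choosing the right offset for a given file is left to the caller.

```python
def decode_quality(quality_string, offset=33):
    """Convert a FASTQ quality string to a list of Phred scores."""
    scores = [ord(ch) - offset for ch in quality_string]
    if any(score < 0 for score in scores):
        raise ValueError("negative score: the offset is probably wrong for this file")
    return scores


if __name__ == "__main__":
    print(decode_quality("!5I"))             # Phred+33
    print(decode_quality("@Th", offset=64))  # Phred+64
```

A negative score after decoding with offset 64 is a common hint that the file is actually Phred+33.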



Quality control: Encoding

http://en.wikipedia.org/wiki/Phred_quality_score

Phred score of a base is: Qphred = -10 log10(e)

where e is the estimated error probability of a base
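The formula and its inverse are easy to check numerically. The sketch below implements both directions; the function names are illustrative. It also ties back to the Q50 figure quoted for Sanger reads earlier in the deck: Q50 corresponds to an error probability of 0.00001, that is, 99.999% accuracy.

```python
import math


def phred_from_error(error_probability):
    """Q = -10 * log10(e), for an error probability 0 < e <= 1."""
    return -10.0 * math.log10(error_probability)


def error_from_phred(q):
    """Inverse of the Phred formula: e = 10 ** (-Q / 10)."""
    return 10.0 ** (-q / 10.0)


if __name__ == "__main__":
    for q in (10, 20, 30, 50):
        e = error_from_phred(q)
        print(f"Q{q}: error probability {e:g}, accuracy {100 * (1 - e):.3f}%")
```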


Pre-processing: Tools

Quality check and trimming

• FastQC

• FASTX toolkit

• Trimmomatic

• Scythe

Joining paired-end reads

• fastq-join

• FLASH

• PANDAseq



Thank you!!
