Rice Sequence and Map Analysis Leonid Teytelman

Upload: colleen-page

Post on 11-Jan-2016


Page 1: Rice Sequence and Map Analysis Leonid Teytelman. Rice Genome Annotation Sequence Alignments Automation Comparative Maps Genetic Marker Correspondences

Rice Sequence and Map Analysis

Leonid Teytelman

Page 2:

Rice Genome Annotation

•Sequence Alignments

•Automation

Comparative Maps

•Genetic Marker Correspondences

•FPC Map

•FPC I-Map

EnsEMBL Pipeline

•Automated Annotation

•Compute Farms

Page 3:

Rice Genome Annotation

Page 4:

Aligned Data Sets:

Rice Coding Sequences

•Rice Complete CDSs

•Rice TIGR GIs

•Rice BGI EST Clusters

•Rice dbEST ESTs

•Rice BGI ESTs

Non-Rice Coding Sequences

•Maize Unigene Clusters

•Maize TIGR GIs

•Maize dbEST ESTs

•Barley dbEST ESTs

•Wheat dbEST ESTs

•Sorghum dbEST ESTs

Rice CUGI BAC ends

Rice JRGP/Cornell RFLP Markers

Rice Cornell SSRs

Page 5:

Alignment Tools:

BLAT: search & alignment

pslReps: filtering of low-quality matches

e-PCR: matches based on near-identity to the PCR primers, and correct order

(Figure labels: Queries, Target)

Page 7:

Alignment Methods:

Rice Coding Sequences:

•BLAT search & alignment

•pslReps filtering of repetitive matches

•Accept based on percent of EST length matched

Non-Rice Coding Sequences:

•BLAT search & alignment

•pslReps filtering of repetitive matches

•Accept based on hit length and hit frequency

Rice BAC ends:

•BLAT search & alignment

•Accept based on gap length, percent of BAC end length matched, percent identity, and hit frequency.
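The acceptance rule for rice coding sequences (percent of EST length matched) could be sketched roughly as follows. This is an illustration only, not the scripts behind the talk: the 90% cutoff, the `PslHit` helper type, and the choice to count `matches + repMatches` as matched bases are all assumptions. The only thing relied on is the tab-separated PSL column layout that BLAT writes.

```python
from typing import Iterable, List, NamedTuple


class PslHit(NamedTuple):
    """The few PSL columns this sketch needs (hypothetical helper type)."""
    matches: int
    rep_matches: int
    q_name: str
    q_size: int
    t_name: str


def parse_psl_line(line: str) -> PslHit:
    """Parse one tab-separated PSL data line (the caller skips header lines)."""
    f = line.rstrip("\n").split("\t")
    return PslHit(
        matches=int(f[0]),      # column 1: matching bases
        rep_matches=int(f[2]),  # column 3: matching bases inside repeats
        q_name=f[9],            # column 10: query (EST) name
        q_size=int(f[10]),      # column 11: query length
        t_name=f[13],           # column 14: target (BAC/PAC) name
    )


def fraction_matched(hit: PslHit) -> float:
    """Share of the query length covered by matching bases."""
    return (hit.matches + hit.rep_matches) / hit.q_size


def accept_hits(hits: Iterable[PslHit], min_fraction: float = 0.90) -> List[PslHit]:
    """Keep hits whose matched share of the query meets the assumed cutoff."""
    return [h for h in hits if h.q_size > 0 and fraction_matched(h) >= min_fraction]
```

The same shape would extend to the other data sets by swapping the predicate, for example adding a gap-length or percent-identity test for BAC ends.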

Page 8:

Alignment Methods:

Rice Markers:

•BLAT search & alignment

•Accept based on percent of marker length matched and the gap length in case of genomic markers.

•Utilize genetic map information; accept those whose genetic & physical chromosome assignment is concordant.

Rice SSRs:

•e-PCR with default parameters, allowing 0 mismatches in the primers
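To make the SSR criterion concrete (both primers found with 0 mismatches and in the correct order), here is a small sketch. It is not how e-PCR is implemented: it does plain exact string search on one strand only, and `max_product` is a hypothetical product-size limit introduced for the example.

```python
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_sts(target: str, fwd: str, rev: str, max_product: int = 1000) -> List[Tuple[int, int]]:
    """Return (start, end) spans where fwd occurs exactly and the reverse
    complement of rev occurs exactly downstream of it, with the whole
    product no longer than max_product bases."""
    target = target.upper()
    fwd = fwd.upper()
    rev_rc = reverse_complement(rev.upper())
    spans = []
    start = target.find(fwd)
    while start != -1:
        # The reverse primer site must begin after the forward primer ends
        # and finish before start + max_product.
        pos = target.find(rev_rc, start + len(fwd), start + max_product)
        if pos != -1:
            spans.append((start, pos + len(rev_rc)))
        start = target.find(fwd, start + 1)
    return spans
```

A marker would then count as mapped to a clone when `find_sts` returns at least one span for that clone's sequence.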

Page 9:

February 2002 BAC/PAC Dataset

•Total BACs/PACs: 1,847

•Total bp: 250,879,896 (about 250 Mb)

•Phase 1: 78

•Phase 2: 1,238

•Phase 3: 531

•Annotated Phase 3: 330

•Annotated Genes: 8,034

Page 10:

Alignment Totals

DATASET                         TOTAL COMPARED  TOTAL MAPPED  % MAPPED
Rice Complete CDSs                       1,358           505       37%
Rice TIGR GIs                           12,354         6,290       51%
Rice BGI EST Clusters                   24,179        12,135       50%
Rice dbEST ESTs                        104,549        49,773       48%
Rice BGI ESTs                           86,623        40,049       46%
Maize Unigene Clusters                  10,678         3,972       37%
Maize TIGR GIs                          27,642         6,941       25%
Maize dbEST ESTs                       147,657        38,718       26%
Barley dbEST ESTs                      148,651        50,579       34%
Wheat dbEST ESTs                       166,513        49,146       29%
Sorghum dbEST ESTs                      84,711        28,044       33%
Rice CUGI BAC ends                      88,053        18,260       21%
Rice JRGP/Cornell RFLP Markers           2,682         1,320       49%
Rice Cornell SSRs                          524           228       44%

Page 12:

Automating Alignments:

For each group of data sets, there is a script to automatically:

•Run pslReps

•Load results into the database

•Discard low-quality matches

•Update documentation
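The four automated steps could be wired together along these lines. This is a sketch under stated assumptions rather than the actual scripts: the SQLite table `hit`, the `Hit` tuple layout, the coverage cutoff, and the summary text are hypothetical. The only external detail relied on is the basic `pslReps in.psl out.psl out.psr` argument order.

```python
import sqlite3
from typing import Iterable, List, Tuple

# (query name, target name, matched bases, query length); hypothetical layout
Hit = Tuple[str, str, int, int]


def pslreps_command(in_psl: str, out_psl: str, out_psr: str) -> List[str]:
    """Step 1: build the pslReps call (run it with subprocess.run(cmd, check=True))."""
    return ["pslReps", in_psl, out_psl, out_psr]


def load_hits(conn: sqlite3.Connection, hits: Iterable[Hit]) -> None:
    """Step 2: load parsed alignments into the database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS hit "
        "(q_name TEXT, t_name TEXT, matched INTEGER, q_size INTEGER)"
    )
    conn.executemany("INSERT INTO hit VALUES (?, ?, ?, ?)", hits)
    conn.commit()


def discard_low_quality(conn: sqlite3.Connection, min_fraction: float) -> int:
    """Step 3: delete hits below the assumed coverage cutoff; returns rows removed."""
    cur = conn.execute(
        "DELETE FROM hit WHERE matched * 1.0 / q_size < ?", (min_fraction,)
    )
    conn.commit()
    return cur.rowcount


def summary_line(conn: sqlite3.Connection, dataset: str) -> str:
    """Step 4: produce a line that a documentation page could be updated with."""
    (kept,) = conn.execute("SELECT COUNT(*) FROM hit").fetchone()
    return f"{dataset}: {kept} alignments kept"
```

Keeping each step as its own function means one driver script per group of data sets only has to supply file names and cutoffs.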

Page 15:

Comparative Maps

Page 16:

Map Correspondences

Same marker on multiple mapping studies

•Name-identity

•Curated evidence

Sequence-based correspondences for JRGP and Cornell markers:

•BLAT search & alignment

•Utilize genetic mapping information, accepting matches on the same chromosome and less than 30 cM apart.
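The genetic-map filter for sequence-based correspondences (same chromosome, less than 30 cM apart) is simple enough to state as code. The `MapPosition` type and function name are hypothetical; only the two conditions and the 30 cM limit come from the slide.

```python
from typing import NamedTuple


class MapPosition(NamedTuple):
    """A marker's position on one genetic map (hypothetical helper type)."""
    chromosome: str
    cm: float


def is_plausible_correspondence(a: MapPosition, b: MapPosition, max_cm: float = 30.0) -> bool:
    """Accept a sequence match between two markers only when both maps place
    them on the same chromosome and less than max_cm centimorgans apart."""
    return a.chromosome == b.chromosome and abs(a.cm - b.cm) < max_cm
```

For example, markers at 42.0 cM and 55.5 cM on chromosome 1 would pass, while the same pair split across chromosomes 1 and 2 would be rejected regardless of distance.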

Page 17:

(Figure legend: curator, same name, sequence-based)

Page 18:

(Figure legend: curator, same name)

Page 19:

FPC data from CUGI, synchronized with the latest release.

Page 20:

Discordant

Page 21:

Cornell/JRGP markers mapped to sequenced clones were assigned positions on the FPC contigs.

Page 22:

Total: 2,272 4,417

Page 23:

EnsEMBL Pipeline in a Nutshell

Page 24:

EnsEMBL Pipeline Overview

•System for automated genome annotation

•Executes and keeps track of computational jobs

•Analysis job execution is serial, allowing stage dependencies

•Jobs are user-defined

•Can take advantage of a compute farm

RepeatMasker Genscan Blast GenomeBuilder Hmmer

RepeatMasker BLAT GeneWise Hmmer

Page 25:

Organization

•Utilizes and expands on the EnsEMBL-core modules and database schema

•Database stores:

    •analysis program names and parameters

    •analysis results

    •rules for job dependencies

    •progress status for each job

•Perl modules:

    •access the database

    •execute specified analysis programs

    •parse the analysis results and load them into the database
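The idea of stored dependency rules plus per-job progress status can be illustrated with a toy scheduler. This is not the EnsEMBL pipeline's schema or API (the pipeline itself is written in Perl): the `RULES` table, the particular dependencies in it, and both function names are assumptions made for the example.

```python
from typing import Callable, Dict, List, Set

# Hypothetical rule table: analysis name -> analyses that must finish first.
RULES: Dict[str, List[str]] = {
    "RepeatMasker": [],
    "Genscan": ["RepeatMasker"],
    "Blast": ["Genscan"],
}


def runnable(rules: Dict[str, List[str]], done: Set[str]) -> List[str]:
    """Analyses not yet done whose prerequisites have all completed."""
    return [
        name
        for name, needs in rules.items()
        if name not in done and all(n in done for n in needs)
    ]


def run_serially(rules: Dict[str, List[str]], run_job: Callable[[str], None]) -> List[str]:
    """Run every analysis once, in an order that respects the rules."""
    done: Set[str] = set()
    order: List[str] = []
    while len(done) < len(rules):
        ready = runnable(rules, done)
        if not ready:
            raise ValueError("unsatisfiable or circular dependencies")
        for name in ready:
            run_job(name)   # e.g. launch the analysis program for one contig
            done.add(name)  # stands in for the stored progress status
            order.append(name)
    return order
```

In a database-backed version, `done` would be read from the job status table, so an interrupted run could resume where it stopped.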

Page 26:

Cluster Utilization

•How to split up tasks?

•Load management and scheduling (LSF, PBS, etc.)

•Contig-by-contig approach

•How to execute jobs on slave nodes?

•Management of management:

    •Automatic job submission

    •Error/completion checking
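A contig-by-contig split with automatic submission and completion checking might look like the sketch below, assuming an LSF farm. The `run_analysis` wrapper program, its options, the log directory, and the status strings are hypothetical. `bsub` with `-J`, `-o`, and `-e` is standard LSF usage, and the commands are only built here, not executed.

```python
from typing import Dict, Iterable, List


def bsub_command(contig: str, analysis: str, log_dir: str = "logs") -> List[str]:
    """Build one LSF submission for one (contig, analysis) pair."""
    job = f"{analysis}.{contig}"
    return [
        "bsub",
        "-J", job,                     # job name
        "-o", f"{log_dir}/{job}.out",  # standard output log
        "-e", f"{log_dir}/{job}.err",  # standard error log
        # Hypothetical wrapper that runs a single analysis on a single contig:
        "run_analysis", "--contig", contig, "--analysis", analysis,
    ]


def plan_jobs(contigs: Iterable[str], analysis: str) -> List[List[str]]:
    """Contig-by-contig split: one independent job per contig."""
    return [bsub_command(c, analysis) for c in contigs]


def failed_contigs(status: Dict[str, str]) -> List[str]:
    """Error/completion check: contigs whose job reported failure and
    should be resubmitted (status values are assumed strings)."""
    return sorted(c for c, s in status.items() if s == "failed")
```

Each command list could be passed to `subprocess.run(cmd, check=True)` on the submission host, with `failed_contigs` driving a resubmission pass.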