Sept 2008

Ensembl Funcgen Perl API

Nathan Johnson, njohnson@ebi.ac.uk

EBI - Wellcome Trust Genome Campus, UK

Funcgen

What is Ensembl Funcgen/eFG?

A local data storage and analysis platform

OR

An Ensembl functional genomics database providing epigenomic and regulatory annotations

OR

Both

eFG Dataflow

[Diagram. Components shown: Experimental Data, Tab2MAGE, MAGE-ML, Import API, FuncGen DB, Analysis Pipeline, Annotated Features, Export API, GFF, DAS, Web API.]

eFG data

Experimental
• Processed
  - Peak calls, e.g. Mpeak, TileMap, ChIPOTLE, Nessie
  - Combinatorial analysis, e.g. Regulatory Build
  - Externally curated, e.g. cisRED, miRanda, VISTA
• Experimental meta data
• Raw & normalised data

Technology
• Arrays/Chips/Probes, e.g. tiling arrays
• Short reads, e.g. Solexa, SOLiD etc.

eFG data

Ensembl v50, July '08:
• >60 data sets (ChIP-chip, wiggle, bed, custom)
• 3 species
• 9 cell types
• 24 Histone modifications, DHSS, CTCF, RNAPolII …

Regulatory Build v3:

    Feature class                               Count
    Gene Associated                              1584
    Gene Associated - Cell type specific         5614
    Non-Gene Associated                           799
    Non-Gene Associated - Cell type specific      520
    Promoter Associated                         12022
    Promoter Associated - Cell type specific     1619
    Unclassified                                24814
    Unclassified - Cell type specific          127633
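
As a quick check on the Regulatory Build v3 counts listed above, the figures can be totalled with a few lines of Perl. This is only a sketch: it assumes the eight classes are disjoint (the slide does not say so), and the pairing of each class with its "Cell type specific" counterpart is read straight from the labels.

```perl
use strict;
use warnings;

# Counts copied from the Regulatory Build v3 figures above:
# [ class name, count, count for the "Cell type specific" variant ]
my @classes = (
    [ 'Gene Associated',     1584,  5614   ],
    [ 'Non-Gene Associated', 799,   520    ],
    [ 'Promoter Associated', 12022, 1619   ],
    [ 'Unclassified',        24814, 127633 ],
);

my ($general, $specific) = (0, 0);
for my $row (@classes) {
    my ($name, $n_general, $n_specific) = @$row;
    $general  += $n_general;
    $specific += $n_specific;
    # Per-class total, assuming the two variants do not overlap
    printf "%-20s %7d\n", $name, $n_general + $n_specific;
}

my $total = $general + $specific;
printf "%-20s %7d\n", 'Total', $total;
printf "Cell type specific: %d of %d (%.1f%%)\n",
    $specific, $total, 100 * $specific / $total;
```

Under that assumption the build holds 174605 features in total, of which 135386 carry a cell type specific label.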

eFG Display

[Screenshot. Tracks labelled: Methylation data, CTCF data, Regulatory Features, cisRED, miRanda, VISTA.]

How eFG fits in.

• ensembl-functgenomics API
  - Object-oriented Perl
  - Follows the Object/ObjectAdaptor paradigm

• Fully integrated with wider Ensembl family of MySQL DBs

• Multi-Assembly: eFG stores a registry of core coordinate information which allows data to be stored using different core DBs and different genome assemblies.

• Minimal maintenance: Designed to aid incremental updates to local installations. Patch and update rather than blow away and recreate.

• Fully automated data import API and analysis pipeline
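
The Object/ObjectAdaptor paradigm mentioned in the first bullet can be illustrated without a database. The sketch below is a toy, not the Ensembl classes: Toy::Array and Toy::ArrayAdaptor are hypothetical names, and an in-memory list stands in for the MySQL tables a real adaptor would query. The point is the division of labour: the object only holds data, while the adaptor is the only place that knows how objects are stored and fetched.

```perl
use strict;
use warnings;

# Toy data object: holds values only, no storage logic.
package Toy::Array;

sub new {
    my ($class, %args) = @_;
    return bless {
        name   => $args{name},
        vendor => $args{vendor},
        type   => $args{type},
    }, $class;
}
sub name   { return $_[0]->{name} }
sub vendor { return $_[0]->{vendor} }
sub type   { return $_[0]->{type} }

# Toy adaptor: builds objects from stored rows and answers fetch requests.
package Toy::ArrayAdaptor;

sub new {
    my ($class, @rows) = @_;
    # Hypothetical storage: a list of hashes instead of database tables.
    return bless { rows => [@rows] }, $class;
}

sub fetch_all {
    my ($self) = @_;
    return [ map { Toy::Array->new(%$_) } @{ $self->{rows} } ];
}

sub fetch_by_name_vendor {
    my ($self, $name, $vendor) = @_;
    my ($hit) = grep { $_->name eq $name && $_->vendor eq $vendor }
                @{ $self->fetch_all };
    return $hit;    # undef when nothing matches
}

package main;

my $adaptor = Toy::ArrayAdaptor->new(
    { name => '2005-05-10_HG17Tiling_Set', vendor => 'NIMBLEGEN', type => 'OLIGO' },
    { name => 'ENCODE3.1.1',               vendor => 'SANGER',    type => 'PCR'   },
);

my $array = $adaptor->fetch_by_name_vendor('ENCODE3.1.1', 'SANGER');
print $array->name, "\t", $array->type, "\n";
```

The method names fetch_all and fetch_by_name_vendor mirror those used for arrays later in these slides; everything else is invented for the illustration.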

eFG Schema

[Schema diagram. Table groups shown: Array, Experimental, Features, Sets.]

Features & Sets

Features: Probe > Annotated; External > Regulatory.

Sets - an abstract concept for manipulation of data collections:
• Logical association/combination
• Access and administration
• Supporting/Product

Set classes:
• ResultSet - Chips/Channels > Replicates
• ExperimentalSet - Feature-only import
• FeatureSet - e.g. peak calls > AnnotatedFeatures
• DataSet - Combines supporting Sets and product FeatureSet

eFG data flow

[Diagram. Labels shown: Raw Experimental Data, External DB, External Feature, ResultSet1-3, SupportingSet1-2, DataSet1-4, Product FeatureSet, Combined FeatureSet, HitList, Export API, GFF.]

Technology data

Array: a definitive collection of chips.
    name(), format(), vendor(), description(), type().
    fetch_by_name_vendor(), fetch_all_by_type().

ArrayChip: an individual chip in an array collection.
    name(), design_id().
    fetch_all_by_array_design_ids(), fetch_all_by_array_id(), fetch_all_by_ExperimentalChip().

Probe: a unique probe sequence within a given array or set of arrays.
    name(), class(), length().
    fetch_all_by_Array(), fetch_all_by_ArrayChip(), fetch_by_array_probe_probeset_name().

ProbeFeature: an alignment of a Probe against the genome.
    start(), end(), strand(), mismatches(), cigarline(), analysis().
    fetch_all_by_Probe(), fetch_all_by_Slice_ExperimentalChips().

DBAdaptor example code

    use strict;
    use Bio::EnsEMBL::Funcgen::DBSQL::DBAdaptor;
    use Bio::EnsEMBL::DBSQL::DBAdaptor;

    # Core database: supplies sequence and coordinate information
    my $dna_db = Bio::EnsEMBL::DBSQL::DBAdaptor->new(
        -user    => 'anonymous',
        -host    => 'ensembldb.ensembl.org',
        -species => 'Homo_sapiens',
        -dbname  => 'homo_sapiens_core_37_35j',
        -group   => 'core',
    );

    # Funcgen database, linked to the core database through -dnadb
    my $efg_db = Bio::EnsEMBL::Funcgen::DBSQL::DBAdaptor->new(
        -user    => 'anonymous',
        -host    => 'ensembldb.ensembl.org',
        -species => 'Homo_sapiens',
        -dbname  => 'homo_sapiens_funcgen_48_36j',
        -group   => 'funcgen',
        -dnadb   => $dna_db,
    );

Array example code

    use strict;
    use Bio::EnsEMBL::Registry;

    my $reg = "Bio::EnsEMBL::Registry";

    # Load all databases available on the public server
    $reg->load_registry_from_db(
        -host => 'ensembldb.ensembl.org',
        -user => 'anonymous',
    );

    my $efg_db        = $reg->get_DBAdaptor('Human', 'funcgen');
    my $array_adaptor = $efg_db->get_ArrayAdaptor;
    my @arrays        = @{ $array_adaptor->fetch_all };

    foreach my $array (@arrays) {
        print "\nArray:\t".$array->name."\n";
        print "Type:\t".$array->type."\n";
        print "Vendor:\t".$array->vendor."\n";
    }

Output:

    Array:  2005-05-10_HG17Tiling_Set
    Type:   OLIGO
    Vendor: NIMBLEGEN

    Array:  ENCODE3.1.1
    Type:   PCR
    Vendor: SANGER

ArrayChip example code

    my $array = $array_adaptor->fetch_by_name_vendor(
        '2005-05-10_HG17Tiling_Set', 'NIMBLEGEN');

    my @achips = @{ $array->get_ArrayChips };

    foreach my $ac (@achips) {
        print "ArrayChip:".$ac->name."\tDesignID:".$ac->design_id."\n";
    }

Output:

    ArrayChip:2005-05-10_HG17Tiling_Set31   DesignID:2061
    ArrayChip:2005-05-10_HG17Tiling_Set24   DesignID:2054
    ArrayChip:2005-05-10_HG17Tiling_Set12   DesignID:2042
    ArrayChip:2005-05-10_HG17Tiling_Set03   DesignID:2033
    ArrayChip:2005-05-10_HG17Tiling_Set04   DesignID:2034
    ArrayChip:2005-05-10_HG17Tiling_Set29   DesignID:2059
    ArrayChip:2005-05-10_HG17Tiling_Set13   DesignID:2043
    ArrayChip:2005-05-10_HG17Tiling_Set34   DesignID:2064
    ArrayChip:2005-05-10_HG17Tiling_Set07   DesignID:2037
    ArrayChip:2005-05-10_HG17Tiling_Set17   DesignID:2047
    ArrayChip:2005-05-10_HG17Tiling_Set23   DesignID:2053
    ArrayChip:2005-05-10_HG17Tiling_Set36   DesignID:2066
    ArrayChip:2005-05-10_HG17Tiling_Set08   DesignID:2038

Probe example code

    my $probe_adaptor    = $efg_db->get_ProbeAdaptor;
    my $pfeature_adaptor = $efg_db->get_ProbeFeatureAdaptor;

    my $probe = $probe_adaptor->fetch_by_array_probe_probeset_name(
        '2005-05-10_HG17Tiling_Set', 'chr22P38797630');

    print "Got ".$probe->class." probe ".$probe->get_probename."\n";

    my @pfeatures = @{ $pfeature_adaptor->fetch_all_by_Probe($probe) };

    print "Found ".scalar(@pfeatures)." ProbeFeatures\n";

    foreach my $pfeature (@pfeatures) {
        print "ProbeFeature found at:\t".$pfeature->feature_Slice->name."\n";
    }

Output:

    Got EXPERIMENTAL probe chr22P38797630
    Found 1 ProbeFeatures
    ProbeFeature found at:  chromosome:NCBI36:22:38803076:38803125:1

ExperimentalData1

Experiment provides a natural container for experimental meta data.
    name(), group(), mage_xml(), primary_design_type(), description(), get_ExperimentalChips().
    fetch_by_name(), fetch_all_by_group(), get_all_experiment_names().

ExperimentalChip represents a unique physical instance of an ArrayChip.
    unique_id(), cell_type(), feature_type(), biological_replicate(), technical_replicate().
    fetch_all_by_experiment(), fetch_by_unique_id_vendor().

Channel represents a control or experimental channel from an ExperimentalChip.
    dye(), type(), sample_id().
    fetch_all_by_ExperimentalChip(), fetch_all_type_experimental_chip_id().

ExperimentalData1 example code

    my $exp_adaptor = $efg_db->get_ExperimentAdaptor;

    my $exp = $exp_adaptor->fetch_by_name('ctcf_ren');

    my $num_chips = scalar(@{ $exp->get_ExperimentalChips });

    print $exp->name.' '.$exp->primary_design_type.
        " experiment contains $num_chips ExperimentalChips\n";

Output:

    ctcf_ren binding_site_identification experiment contains 36 ExperimentalChips

ExperimentalData2

ResultSet provides easy access to discrete sets of experimental data, e.g. replicates.
    name(), cell_type(), feature_type(), display_label(), get_ExperimentalChips(), get_ResultFeatures_by_Slice().
    fetch_all_by_name(), fetch_all_by_name_Analysis(), fetch_all_by_FeatureType(), fetch_all_by_Experiment().

ResultFeature is a special lightweight Feature optimised for display and analysis purposes.
    start(), end(), score().
    ResultSet::get_ResultFeatures_by_Slice().

ExperimentalData2 example code

    my $resultset_adaptor = $efg_db->get_ResultSetAdaptor;
    my $slice_adaptor     = $efg_db->get_SliceAdaptor;

    # fetch_all_by_name returns a list reference; take the first ResultSet
    my ($result_set) = @{ $resultset_adaptor->fetch_all_by_name('ctcf_ren_BR1') };

    my $slice = $slice_adaptor->fetch_by_region('chromosome', 'X');

    my @result_features = @{ $result_set->get_ResultFeatures_by_Slice($slice) };

    print "Chromosome X has ".scalar(@result_features)." results\n";

    foreach my $rf (@result_features) {
        print "Locus:\t".$rf->start.'-'.$rf->end."\tScore:".$rf->score."\n";
    }

Output:

    Chromosome X has 582133 results
    Locus:  429-478 Score:-0.1095
    Locus:  529-578 Score:-0.1155
    Locus:  629-678 Score:0.0135
    Locus:  729-778 Score:-0.1735
    Locus:  829-878 Score:0.256

More Sets

Experimental(Sub)Sets are special placeholder sets that facilitate feature import without any underlying data.
    name(), cell_type(), feature_type(), format(), get_subsets(), ExperimentalSubSet->name().
    fetch_by_name(), fetch_all_by_Experiment(), fetch_all_by_CellType(), fetch_all_by_FeatureType().

FeatureSet is a generic set for containing features of various types, e.g. AnnotatedFeatures, ExternalFeatures, RegulatoryFeatures.
    name(), cell_type(), feature_type(), analysis(), get_Features_by_Slice().
    fetch_by_name(), fetch_all_by_type(), fetch_all_by_CellType(), fetch_all_by_FeatureType().

More Sets

DataSet is the top-level container, which associates underlying data or ‘supporting sets’ with a product FeatureSet, i.e. the results of an analysis based on the underlying data. Supporting sets can be any other type of ‘Set’.

name(), cell_type(), feature_type(), product_FeatureSet(), get_supporting_sets().

fetch_by_name(), fetch_all_by_supporting_set(), fetch_all_by_product_FeatureSet().

Set example code 1

    my $dataset_adaptor = $efg_db->get_DataSetAdaptor;
    my $data_set = $dataset_adaptor->fetch_by_name('Nessie_NG_STD_2_ctcf_ren_BR1');

    my @supporting_sets = @{ $data_set->get_supporting_sets };

    foreach my $sset (@supporting_sets) {
        print 'Supporting set: '.$sset->name."\n";
        print 'Produced by analysis '.$sset->analysis->logic_name."\n";
    }

    my $pfset = $data_set->product_FeatureSet;
    print "\nProduct FeatureSet is ".$pfset->name."\n";
    print 'Produced by analysis '.$pfset->analysis->logic_name."\n";

Output:

    Supporting set: ctcf_ren_BR1_TR1
    Produced by analysis VSN_GLOG

    Product FeatureSet is Nessie_NG_STD_2_ctcf_ren_BR1
    Produced by analysis Nessie_NG_STD_2

Set example code 2

    my $featureset_adaptor = $efg_db->get_FeatureSetAdaptor;

    my @ext_fsets = @{ $featureset_adaptor->fetch_all_by_type('external') };

    foreach my $ext_fset (@ext_fsets) {
        print "External FeatureSet:\t".$ext_fset->name."\n";
    }

Output:

    External FeatureSet:    miRanda miRNA
    External FeatureSet:    cisRED group motifs
    External FeatureSet:    cisRED search regions
    External FeatureSet:    VISTA enhancer set

Features

ProbeFeature represents an individual alignment of a probe sequence.
    probe(), probeset(), probelength(), get_result_by_ResultSet().
    fetch_all_by_Probe(), fetch_all_by_Slice_ExperimentalChips().

AnnotatedFeature represents any feature based on experimental information, i.e. ResultSet or ExperimentalSet data.
    cell_type(), feature_type(), score(), display_label().

ExternalFeature represents an individual feature from an externally curated set.
    cell_type(), feature_type(), display_label().

Features

RegulatoryFeature represents a feature generated by the Regulatory Build, a combinatorial analysis based on DNase1 HSSs, CTCF and histone modifications.
    feature_type(), bound_start(), bound_end(), regulatory_attributes(), display_label(), stable_id().
    fetch_all_by_Slice(), fetch_by_stable_id().

Features example code 1

    my $featureset_adaptor = $efg_db->get_FeatureSetAdaptor;
    my $feature_set = $featureset_adaptor->fetch_by_name('miRanda miRNA');

    # $slice is the chromosome X slice fetched in the ExperimentalData2 example.
    # get_Features_by_Slice returns a list reference, so dereference it.
    my @features = @{ $feature_set->get_Features_by_Slice($slice) };

    foreach my $feat (@features) {
        print $feat->display_label."\t".$feat->feature_Slice->name."\n";
    }

Output:

    ENST00000390665:mmu-miR-712     chromosome:NCBI36:X:214111:214131:-1
    ENST00000390665:mmu-miR-673-5p  chromosome:NCBI36:X:214115:214136:-1
    ENST00000390665:hsa-miR-22      chromosome:NCBI36:X:214125:214146:-1
    ENST00000390665:hsa-miR-887     chromosome:NCBI36:X:214138:214159:-1
    ENST00000390665:mmu-miR-696     chromosome:NCBI36:X:214149:214165:-1
    ENST00000390665:hsa-miR-328     chromosome:NCBI36:X:214178:214200:-1
    ENST00000390665:mmu-miR-669b    chromosome:NCBI36:X:214228:214250:-1
    ENST00000390665:hsa-miR-197     chromosome:NCBI36:X:214264:214285:-1
    ENST00000390665:hsa-miR-220b    chromosome:NCBI36:X:214265:214286:-1
    ENST00000390665:hsa-miR-636     chromosome:NCBI36:X:214341:214362:-1
    ENST00000390665:mmu-miR-689     chromosome:NCBI36:X:214424:214445:-1

Features example code 2

    my $regfeat_adaptor = $efg_db->get_RegulatoryFeatureAdaptor;

    # fetch_all_by_Slice returns a list reference, so dereference it
    my @reg_feats = @{ $regfeat_adaptor->fetch_all_by_Slice($slice) };

    foreach my $reg_feat (@reg_feats) {
        print $reg_feat->stable_id.' '.$reg_feat->feature_type->name."\n";

        # Each attribute is a supporting feature used to build this RegulatoryFeature
        foreach my $attr_feat (@{ $reg_feat->regulatory_attributes }) {
            print 'AttributeFeature '.$attr_feat->feature_type->name."\n";
        }
    }

Output:

    ENSR00000175296 Promoter Associated - Cell type specific
    AttributeFeature H3K4me3
    AttributeFeature H3K4me3
    AttributeFeature DNase1
    AttributeFeature DNase1
    AttributeFeature H3K4me3

    ENSR00000092125 Unclassified - Cell type specific
    AttributeFeature DNase1

eFG Environments

eFG environments provide useful functions, configuration and administration utilities:

• efg
• efg_pipeline
• Coming soon…
  - Array mapping environment:
    Affy, Illumina, Codelink, Agilent, Nimblegen.
    Genomic & transcript mapping pipelines.

eFG Import

efg environment

• Arrays:
  - Nimblegen
  - Sanger ENCODE
• Simple:
  - GFF
  - BED
  - Wiggle
• External:
  - cisRED
  - miRanda
  - VISTA
  - redFLY

eFG Import

ChIP-chip
• Normalisation: VSN; TukeyBiweight.
• Bio::MAGE/Tab2Mage
• ResultSet nomenclature:
    EXP1
    EXP1_BR1
    EXP1_BR1_TR1
    EXP1_BR1_TR2

ChIP-Seq
• Pre/Post analysis
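
The ResultSet names listed above appear to follow the pattern experiment name, optional biological replicate (_BR), optional technical replicate (_TR), which matches names such as ctcf_ren_BR1 and ctcf_ren_BR1_TR1 used elsewhere in these slides. A minimal sketch of splitting such names follows; parse_result_set_name is a hypothetical helper, not part of the eFG API, and the pattern is an assumption read off the examples.

```perl
use strict;
use warnings;

# Split a ResultSet name into experiment name and optional replicate numbers.
# Assumed pattern: <experiment>[_BR<n>[_TR<n>]]
sub parse_result_set_name {
    my ($name) = @_;
    my ($experiment, $br, $tr) =
        $name =~ /^(.+?)(?:_BR(\d+)(?:_TR(\d+))?)?$/
        or return;    # nothing to parse, e.g. an empty name
    return {
        experiment           => $experiment,
        biological_replicate => $br,    # undef when absent
        technical_replicate  => $tr,    # undef when absent
    };
}

for my $name (qw(EXP1 EXP1_BR1 EXP1_BR1_TR1 EXP1_BR1_TR2 ctcf_ren_BR1)) {
    my $parts = parse_result_set_name($name);
    printf "%-14s experiment=%s BR=%s TR=%s\n",
        $name,
        $parts->{experiment},
        $parts->{biological_replicate} // '-',
        $parts->{technical_replicate}  // '-';
}
```

Read this way, EXP1 covers the whole experiment, EXP1_BR1 one biological replicate, and EXP1_BR1_TR1 and EXP1_BR1_TR2 two technical replicates within it.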

eFG Analysis

efg_pipeline environment

Pipeline - Ensembl gene build pipeline technology.

Analysis Runnables:
• ACME
• Chipotle
• Splitter
• TileMap
• Nessie (unpublished)
• SWEmbl (unpublished)

Regulatory Build

eFG Analysis

Regulatory Build - Feature construction:
• Anchor/Focus sets: DNase1; CTCF.
• Attribute sets: Histone modifications; Transcription factors.

Regulatory Annotation - Patterns associated with:
• Promoter regions
• Gene regions
• Non-Gene regions

[Diagram. Tracks labelled: DNase1, CTCF, H3K36me3, H3K4me3, H3K27me3.]
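
The slide names focus and attribute sets but does not spell out how they are combined. The toy sketch below makes two labelled assumptions that are not taken from the slides: attribute features are attached to a focus feature by simple overlap, and the outer bounds (compare bound_start() and bound_end() on RegulatoryFeature) stretch to cover the attached attributes. All coordinates are invented, and this is not the Regulatory Build algorithm itself.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Hypothetical coordinates, invented for illustration only.
my @focus = (
    { type => 'DNase1', start => 1000, end => 1200 },
    { type => 'CTCF',   start => 5000, end => 5100 },
);
my @attributes = (
    { type => 'H3K4me3',  start => 900,  end => 1500 },
    { type => 'H3K36me3', start => 1150, end => 1300 },
    { type => 'H3K27me3', start => 8000, end => 8400 },
);

# Two closed intervals overlap when each starts before the other ends.
sub overlaps {
    my ($x, $y) = @_;
    return $x->{start} <= $y->{end} && $y->{start} <= $x->{end};
}

for my $f (@focus) {
    # Assumption: an attribute belongs to a focus feature if it overlaps it.
    my @attached = grep { overlaps($f, $_) } @attributes;

    # Assumption: bounds extend over the attached attributes,
    # while the core region stays at the focus feature.
    my $bound_start = min($f->{start}, map { $_->{start} } @attached);
    my $bound_end   = max($f->{end},   map { $_->{end} }   @attached);

    printf "%s core %d-%d, bounds %d-%d, attributes: %s\n",
        $f->{type}, $f->{start}, $f->{end}, $bound_start, $bound_end,
        join(', ', map { $_->{type} } @attached) || 'none';
}
```

With these made-up inputs the DNase1 focus picks up H3K4me3 and H3K36me3 and its bounds widen to 900-1500, while the CTCF focus has no overlapping attributes and keeps its own coordinates.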

Getting More Information

Workshop material:
    http://www.ebi.ac.uk/~njohnson/courses/15.09.2008-GI-Hinxton

perldoc - viewer for inline API documentation:
    shell> perldoc Bio::EnsEMBL::Funcgen::RegulatoryFeature
    online at: http://www.ensembl.org/info/software/Pdoc/

eFG schema description:
    online at: http://www.ensembl.org/info/using/api/funcgen/funcgen_schema.html

eFG installation document:
    online at: http://www.ensembl.org/info/using/api/funcgen/efg_introduction.html

ensembl-dev mailing list: ensembl-dev@ebi.ac.uk
