Sept 2008
Ensembl Funcgen Perl API
Nathan Johnson [email protected]
EBI - Wellcome Trust Genome Campus, UK
Funcgen
What is Ensembl Funcgen/eFG?
A local data storage and analysis platform
OR
An Ensembl functional genomics database providing epigenomic and regulatory annotations
OR
Both
eFG Dataflow
[Dataflow diagram. Components: Experimental Data, Tab2MAGE, MAGE-ML, Import API, FuncGen DB, Analysis Pipeline, Annotated Features, Export API, Web API, DAS, GFF]
eFG data
Experimental
• Experimental meta data
• Raw & Normalised data
Technology
• Arrays/Chips/Probes, e.g. Tiling arrays
• Short reads, e.g. Solexa, SOLiD etc.
Processed
• Peak Calls, e.g. Mpeak, TileMap, ChIPOTLE, Nessie
• Combinatorial analysis, e.g. Regulatory Build
• Externally curated, e.g. cisRED, MiRanda, Vista
eFG data
Ensembl v50, July '08:
• >60 data sets (ChIP-chip, wiggle, bed, custom)
• 3 species
• 9 cell types
• 24 Histone modifications, DHSS, CTCF, RNAPolII …
Regulatory Build v3:
Gene Associated                                  1584
Gene Associated - Cell type specific             5614
Non-Gene Associated                               799
Non-Gene Associated - Cell type specific          520
Promoter Associated                             12022
Promoter Associated - Cell type specific         1619
Unclassified                                    24814
Unclassified - Cell type specific              127633
eFG Display
Methylation data
CTCF Data
Regulatory Features
cisRED, miRanda, Vista
How eFG fits in.
• ensembl-functgenomics API
  - Object Oriented Perl
  - Follows the Object/ObjectAdaptor paradigm
• Fully integrated with wider Ensembl family of MySQL DBs
• Multi-Assembly: eFG stores a registry of core coordinate information which allows data to be stored using different core DBs and different genome assemblies.
• Minimal maintenance: Designed to aid incremental updates to local installations. Patch and update rather than blow away and recreate.
• Fully automated data import API and analysis pipeline
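The Object/ObjectAdaptor paradigm listed above can be illustrated with a minimal, self-contained sketch. The package names My::Array and My::ArrayAdaptor are hypothetical, and an in-memory list stands in for the MySQL tables; this is not the eFG implementation, only the general shape of the pattern, in which data objects hold state and adaptors fetch and construct them.

```perl
use strict;
use warnings;

# Hypothetical data object: holds state only, knows nothing about storage.
package My::Array;
sub new {
    my ($class, %args) = @_;
    return bless { name => $args{name}, vendor => $args{vendor} }, $class;
}
sub name   { return $_[0]->{name} }
sub vendor { return $_[0]->{vendor} }

# Hypothetical adaptor: the only place that knows how objects are stored.
# An in-memory list of rows replaces the database a real adaptor would query.
package My::ArrayAdaptor;
sub new {
    my ($class, @rows) = @_;
    return bless { rows => [@rows] }, $class;
}
sub fetch_all {
    my ($self) = @_;
    return [ map { My::Array->new(%$_) } @{ $self->{rows} } ];
}
sub fetch_by_name_vendor {
    my ($self, $name, $vendor) = @_;
    my ($hit) = grep { $_->name eq $name && $_->vendor eq $vendor }
                @{ $self->fetch_all };
    return $hit;    # undef when nothing matches
}

package main;

my $adaptor = My::ArrayAdaptor->new(
    { name => '2005-05-10_HG17Tiling_Set', vendor => 'NIMBLEGEN' },
    { name => 'ENCODE3.1.1',               vendor => 'SANGER' },
);
my $array = $adaptor->fetch_by_name_vendor('ENCODE3.1.1', 'SANGER');
print $array->name, "\t", $array->vendor, "\n";
```

Calling code only ever talks to the adaptor, so the storage behind it can change without touching the data objects.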
eFG Schema
[Schema diagram. Table groups: Array, Experimental, Features, Sets]
Features & Sets
Features: Probe > Annotated; External > Regulatory.
Sets - An abstract concept for manipulation of data collections:
• Logical association/combination
• Access and administration
• Supporting/Product
Set classes:
• ResultSet - Chips/Channels > Replicates
• ExperimentalSet - Feature only import.
• FeatureSet - e.g. Peak calls > AnnotatedFeatures
• DataSet - Combines supporting Sets and product FeatureSet
eFG data flow
[Data flow diagram in three numbered steps. Labels: Raw Data, External DB, HitList, ResultSet1-3, SupportingSet1-2, DataSet1-4, Product FeatureSet, Combined FeatureSet, Experimental, External, Export API, GFF]
Technology data
Array: a definitive collection of chips.
name(), format(), vendor(), description(), type().
fetch_by_name_vendor(), fetch_all_by_type().
ArrayChip: an individual chip in an array collection.
name(), design_id().
fetch_all_by_array_design_ids(), fetch_all_by_array_id(), fetch_all_by_ExperimentalChip().
Probe: a unique probe sequence within a given array or set of arrays.
name(), class(), length().
fetch_all_by_Array(), fetch_all_by_ArrayChip(), fetch_by_array_probe_probeset_name().
ProbeFeature: an alignment of a Probe against the genome.
start(), end(), strand(), mismatches(), cigarline(), analysis().
fetch_all_by_Probe(), fetch_all_by_Slice_ExperimentalChips().
DBAdaptor example code
use strict;
use Bio::EnsEMBL::Funcgen::DBSQL::DBAdaptor;
use Bio::EnsEMBL::DBSQL::DBAdaptor;

# Core DB supplying the DNA and coordinate information
my $dna_db = Bio::EnsEMBL::DBSQL::DBAdaptor->new(
    -user    => 'anonymous',
    -host    => 'ensembldb.ensembl.org',
    -species => 'Homo_sapiens',
    -dbname  => 'homo_sapiens_core_37_35j',
    -group   => 'core',
);

# Funcgen DB, linked to the core DB through -dnadb
my $efg_db = Bio::EnsEMBL::Funcgen::DBSQL::DBAdaptor->new(
    -user    => 'anonymous',
    -host    => 'ensembldb.ensembl.org',
    -species => 'Homo_sapiens',
    -dbname  => 'homo_sapiens_funcgen_48_36j',
    -group   => 'funcgen',
    -dnadb   => $dna_db,
);
Array example code
use strict;
use Bio::EnsEMBL::Registry;

my $reg = "Bio::EnsEMBL::Registry";
$reg->load_registry_from_db(
    -host => 'ensembldb.ensembl.org',
    -user => 'anonymous',
);

my $efg_db        = $reg->get_DBAdaptor('Human', 'funcgen');
my $array_adaptor = $efg_db->get_ArrayAdaptor;
my @arrays        = @{ $array_adaptor->fetch_all };

foreach my $array (@arrays) {
    print "\nArray:\t".$array->name."\n";
    print "Type:\t".$array->type."\n";
    print "Vendor:\t".$array->vendor."\n";
}

Output:

Array:  2005-05-10_HG17Tiling_Set
Type:   OLIGO
Vendor: NIMBLEGEN

Array:  ENCODE3.1.1
Type:   PCR
Vendor: SANGER
ArrayChip example code
my $array = $array_adaptor->fetch_by_name_vendor(
    '2005-05-10_HG17Tiling_Set', 'NIMBLEGEN');
my @achips = @{ $array->get_ArrayChips };

foreach my $ac (@achips) {
    print "ArrayChip:".$ac->name."\tDesignID:".$ac->design_id."\n";
}

Output:
ArrayChip:2005-05-10_HG17Tiling_Set31    DesignID:2061
ArrayChip:2005-05-10_HG17Tiling_Set24    DesignID:2054
ArrayChip:2005-05-10_HG17Tiling_Set12    DesignID:2042
ArrayChip:2005-05-10_HG17Tiling_Set03    DesignID:2033
ArrayChip:2005-05-10_HG17Tiling_Set04    DesignID:2034
ArrayChip:2005-05-10_HG17Tiling_Set29    DesignID:2059
ArrayChip:2005-05-10_HG17Tiling_Set13    DesignID:2043
ArrayChip:2005-05-10_HG17Tiling_Set34    DesignID:2064
ArrayChip:2005-05-10_HG17Tiling_Set07    DesignID:2037
ArrayChip:2005-05-10_HG17Tiling_Set17    DesignID:2047
ArrayChip:2005-05-10_HG17Tiling_Set23    DesignID:2053
ArrayChip:2005-05-10_HG17Tiling_Set36    DesignID:2066
ArrayChip:2005-05-10_HG17Tiling_Set08    DesignID:2038
Probe example code
my $probe_adaptor    = $efg_db->get_ProbeAdaptor;
my $pfeature_adaptor = $efg_db->get_ProbeFeatureAdaptor;

my $probe = $probe_adaptor->fetch_by_array_probe_probeset_name(
    '2005-05-10_HG17Tiling_Set', 'chr22P38797630');
print "Got ".$probe->class." probe ".$probe->get_probename."\n";

my @pfeatures = @{ $pfeature_adaptor->fetch_all_by_Probe($probe) };
print "Found ".scalar(@pfeatures)." ProbeFeatures\n";

foreach my $pfeature (@pfeatures) {
    print "ProbeFeature found at:\t".$pfeature->feature_Slice->name."\n";
}

Output:
Got EXPERIMENTAL probe chr22P38797630
Found 1 ProbeFeatures
ProbeFeature found at:  chromosome:NCBI36:22:38803076:38803125:1
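The slice name printed above packs six colon-separated fields: coordinate system, assembly version, sequence region, start, end and strand. A small sketch for unpacking such a name without any database connection follows; parse_slice_name is a hypothetical helper written for this illustration, not part of the Ensembl API.

```perl
use strict;
use warnings;

# Hypothetical helper: split a slice name into its six colon-separated fields.
sub parse_slice_name {
    my ($name) = @_;
    my @fields = split /:/, $name, -1;
    die "Unexpected slice name: $name\n" unless @fields == 6;

    my %slice;
    @slice{qw(coord_system version seq_region start end strand)} = @fields;
    $slice{length} = $slice{end} - $slice{start} + 1;    # coordinates are inclusive
    return \%slice;
}

my $s = parse_slice_name('chromosome:NCBI36:22:38803076:38803125:1');
printf "%s %s:%d-%d (strand %d, %d bp)\n",
    @{$s}{qw(coord_system seq_region start end strand length)};
# prints: chromosome 22:38803076-38803125 (strand 1, 50 bp)
```

In real API code the same values are available directly from the Slice object, so a parser like this is mainly useful when post-processing printed output.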
ExperimentalData1
Experiment provides a natural container for experimental metadata.
name(), group(), mage_xml(), primary_design_type(), description(), get_ExperimentalChips().
fetch_by_name(), fetch_all_by_group(), get_all_experiment_names().
ExperimentalChip represents a unique physical instance of an ArrayChip.
unique_id(), cell_type(), feature_type(), biological_replicate(), technical_replicate().
fetch_all_by_experiment(), fetch_by_unique_id_vendor().
Channel represents a control or experimental channel from an ExperimentalChip.
dye(), type(), sample_id().
fetch_all_by_ExperimentalChip(), fetch_all_type_experimental_chip_id().
ExperimentalData1 example code
my $exp_adaptor = $efg_db->get_ExperimentAdaptor;
my $exp         = $exp_adaptor->fetch_by_name('ctcf_ren');
my $num_chips   = scalar(@{ $exp->get_ExperimentalChips });
print $exp->name.' '.$exp->primary_design_type." experiment contains $num_chips ExperimentalChips\n";

Output:
ctcf_ren binding_site_identification experiment contains 36 ExperimentalChips
ExperimentalData2
ResultSet provides easy access to discrete sets of experimental data, e.g. replicates.
name(), cell_type(), feature_type(), display_label(), get_ExperimentalChips(), get_ResultFeatures_by_Slice().
fetch_all_by_name(), fetch_all_by_name_Analysis(), fetch_all_by_FeatureType(), fetch_all_by_Experiment().
ResultFeature is a special lightweight Feature optimised for display and analysis purposes.
start(), end(), score().
ResultSet::get_ResultFeatures_by_Slice().
ExperimentalData2 example code
my $resultset_adaptor = $efg_db->get_ResultSetAdaptor;
my $slice_adaptor     = $efg_db->get_SliceAdaptor;

my ($result_set) = @{ $resultset_adaptor->fetch_all_by_name('ctcf_ren_BR1') };
my $slice        = $slice_adaptor->fetch_by_region('chromosome', 'X');

my @result_features = @{ $result_set->get_ResultFeatures_by_Slice($slice) };
print "Chromosome X has ".scalar(@result_features)." results\n";

foreach my $rf (@result_features) {
    print "Locus:\t".$rf->start.'-'.$rf->end."\tScore:".$rf->score."\n";
}

Output:
Chromosome X has 582133 results
Locus:  429-478    Score:-0.1095
Locus:  529-578    Score:-0.1155
Locus:  629-678    Score:0.0135
Locus:  729-778    Score:-0.1735
Locus:  829-878    Score:0.256
More Sets
Experimental(Sub)Sets are special placeholder sets that facilitate feature import without any underlying data.
name(), cell_type(), feature_type(), format(), get_subsets(), ExperimentalSubSet->name().
fetch_by_name(), fetch_all_by_Experiment(), fetch_all_by_CellType(), fetch_all_by_FeatureType().
FeatureSet is a generic set for containing features of various types, e.g. AnnotatedFeatures, ExternalFeatures, RegulatoryFeatures.
name(), cell_type(), feature_type(), analysis(), get_Features_by_Slice().
fetch_by_name(), fetch_all_by_type(), fetch_all_by_CellType(), fetch_all_by_FeatureType().
More Sets
DataSet is the top level container which associates underlying data or ‘supporting sets’ and a product FeatureSet i.e. the results of an analysis based on the underlying data. Supporting sets can be any other type of ‘Set’.
name(), cell_type(), feature_type(), product_FeatureSet(), get_supporting_sets().
fetch_by_name(), fetch_all_by_supporting_set(), fetch_all_by_product_FeatureSet().
Set example code 1
my $dataset_adaptor = $efg_db->get_DataSetAdaptor;
my $data_set = $dataset_adaptor->fetch_by_name('Nessie_NG_STD_2_ctcf_ren_BR1');

my @supporting_sets = @{ $data_set->get_supporting_sets };

foreach my $sset (@supporting_sets) {
    print 'Supporting set: '.$sset->name."\n";
    print 'Produced by analysis '.$sset->analysis->logic_name."\n";
}

my $pfset = $data_set->product_FeatureSet;
print "\nProduct FeatureSet is ".$pfset->name."\n";
print 'Produced by analysis '.$pfset->analysis->logic_name."\n";

Output:
Supporting set: ctcf_ren_BR1_TR1
Produced by analysis VSN_GLOG

Product FeatureSet is Nessie_NG_STD_2_ctcf_ren_BR1
Produced by analysis Nessie_NG_STD_2
Set example code 2
my $featureset_adaptor = $efg_db->get_FeatureSetAdaptor;
my @ext_fsets = @{$featureset_adaptor->fetch_all_by_type('external')};
foreach my $ext_fset (@ext_fsets) {
    print "External FeatureSet:\t".$ext_fset->name."\n";
}

Output:
External FeatureSet:    miRanda miRNA
External FeatureSet:    cisRED group motifs
External FeatureSet:    cisRED search regions
External FeatureSet:    VISTA enhancer set
Features
ProbeFeature represents an individual alignment of a probe sequence.
probe(), probeset(), probelength(), get_result_by_ResultSet().
fetch_all_by_Probe(), fetch_all_by_Slice_ExperimentalChips().
AnnotatedFeature represents any feature based on experimental information i.e. ResultSet or ExperimentalSet data.
cell_type(), feature_type(), score(), display_label().
ExternalFeature represents an individual feature from an externally curated set.
cell_type(), feature_type(), display_label().
Features
RegulatoryFeature represents a feature generated by the Regulatory Build, a combinatorial analysis based on DNase1 HSSs, CTCF and histone modifications.
feature_type(), bound_start(), bound_end(), regulatory_attributes(), display_label(), stable_id().
fetch_all_by_Slice(), fetch_by_stable_id().
Features example code 1
my $featureset_adaptor = $efg_db->get_FeatureSetAdaptor;
my $feature_set = $featureset_adaptor->fetch_by_name('miRanda miRNA');

# get_Features_by_Slice returns an array reference, so dereference it
my @features = @{ $feature_set->get_Features_by_Slice($slice) };

foreach my $feat (@features) {
    print $feat->display_label."\t".$feat->feature_Slice->name."\n";
}

Output:
ENST00000390665:mmu-miR-712       chromosome:NCBI36:X:214111:214131:-1
ENST00000390665:mmu-miR-673-5p    chromosome:NCBI36:X:214115:214136:-1
ENST00000390665:hsa-miR-22        chromosome:NCBI36:X:214125:214146:-1
ENST00000390665:hsa-miR-887       chromosome:NCBI36:X:214138:214159:-1
ENST00000390665:mmu-miR-696       chromosome:NCBI36:X:214149:214165:-1
ENST00000390665:hsa-miR-328       chromosome:NCBI36:X:214178:214200:-1
ENST00000390665:mmu-miR-669b      chromosome:NCBI36:X:214228:214250:-1
ENST00000390665:hsa-miR-197       chromosome:NCBI36:X:214264:214285:-1
ENST00000390665:hsa-miR-220b      chromosome:NCBI36:X:214265:214286:-1
ENST00000390665:hsa-miR-636       chromosome:NCBI36:X:214341:214362:-1
ENST00000390665:mmu-miR-689       chromosome:NCBI36:X:214424:214445:-1
Features example code 2
my $regfeat_adaptor = $efg_db->get_RegulatoryFeatureAdaptor;

# fetch_all_by_Slice returns an array reference, so dereference it
my @reg_feats = @{ $regfeat_adaptor->fetch_all_by_Slice($slice) };

foreach my $reg_feat (@reg_feats) {
    print $reg_feat->stable_id.' '.$reg_feat->feature_type->name."\n";

    foreach my $attr_feat (@{ $reg_feat->regulatory_attributes }) {
        print 'AttributeFeature '.$attr_feat->feature_type->name."\n";
    }
}

Output:
ENSR00000175296 Promoter Associated - Cell type specific
AttributeFeature H3K4me3
AttributeFeature H3K4me3
AttributeFeature DNase1
AttributeFeature DNase1
AttributeFeature H3K4me3
ENSR00000092125 Unclassified - Cell type specific
AttributeFeature DNase1
eFG Environments
eFG environments provide useful functions, configuration and administration utilities:
• efg
• efg_pipeline
• Coming soon…
  - Array mapping environment: Affy, Illumina, Codelink, Agilent, Nimblegen.
  - Genomic & transcript mapping pipelines.
eFG Import
efg environment
• Arrays:
  - Nimblegen
  - Sanger ENCODE
• Simple:
  - GFF
  - BED
  - Wiggle
• External:
  - cisRED
  - miRanda
  - VISTA
  - redFLY
eFG Import
ChIP-chip
• Normalisation: VSN; TukeyBiweight.
• Bio::MAGE/Tab2Mage
• ResultSet nomenclature:
  - EXP1
  - EXP1_BR1
  - EXP1_BR1_TR1
  - EXP1_BR1_TR2
ChIP-Seq Pre/Post analysis
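The ResultSet naming scheme on this slide can be unpacked mechanically. The sketch below assumes that the BR and TR suffixes stand for biological and technical replicate numbers, which is an inference from the biological_replicate() and technical_replicate() methods of ExperimentalChip rather than something the slide states. parse_result_set_name is a hypothetical helper, not an eFG API call.

```perl
use strict;
use warnings;

# Hypothetical helper: split a name of the form EXP[_BRn[_TRm]] into its parts.
# Assumes BR = biological replicate and TR = technical replicate.
sub parse_result_set_name {
    my ($name) = @_;
    my ($exp, $br, $tr) = $name =~ /^(.+?)(?:_BR(\d+)(?:_TR(\d+))?)?$/
        or die "Cannot parse ResultSet name: $name\n";
    return {
        experiment           => $exp,
        biological_replicate => $br,    # undef for an experiment-level name
        technical_replicate  => $tr,    # undef unless a _TRm suffix is present
    };
}

for my $name (qw(EXP1 EXP1_BR1 EXP1_BR1_TR1 EXP1_BR1_TR2)) {
    my $p = parse_result_set_name($name);
    printf "%-14s experiment=%s BR=%s TR=%s\n",
        $name,
        $p->{experiment},
        $p->{biological_replicate} // '-',
        $p->{technical_replicate}  // '-';
}
```

The experiment part is matched lazily, so names that themselves contain underscores, such as ctcf_ren_BR1_TR1, still split into ctcf_ren, 1 and 1.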
eFG Analysis
efg_pipeline environment
Pipeline - Ensembl gene build pipeline technology.
Analysis Runnables: ACME, Chipotle, Splitter, TileMap, Nessie (unpublished), SWEmbl (unpublished)
Regulatory Build
eFG Analysis
Regulatory Build - Feature construction:
• Anchor/Focus sets: DNase1; CTCF.
• Attribute sets: Histone Modifications; Transcription factors.
Regulatory Annotation - Patterns associated with:
• Promoter regions
• Gene regions
• Non-Gene regions
[Diagram tracks: DNase1, CTCF, H3K36me3, H3K4me3, H3K27me3]
Getting More Information
Workshop material http://www.ebi.ac.uk/~njohnson/courses/15.09.2008-GI-Hinxton
perldoc – Viewer for inline API documentation.
shell> perldoc Bio::EnsEMBL::Funcgen::RegulatoryFeature
online at: http://www.ensembl.org/info/software/Pdoc/
eFG schema description: online at: http://www.ensembl.org/info/using/api/funcgen/funcgen_schema.html
eFG installation document: online at: http://www.ensembl.org/info/using/api/funcgen/efg_introduction.html
ensembl-dev mailing list: [email protected]