Marco Brandizi
PhD Program in Computer Science, Univ. Milano Bicocca
XIX Cycle
Progress Report
Feb 2005
Agenda
Microarrays and Gene Expression overview
A Knowledge Management System for uA data management
Motivations
What to model and where to start from
First elaborations
Ongoing work and future
Gene Expression and Microarrays
DNA
gene
mRNA
protein
Genes Machine
Cell/Life
Microarray Data / Details
Microarray Data
Microarrays Data Mgmt Issues
Exp. data vs. seq. data:
Context dependent (living system, exp. conditions)
Lack of standard unit of measure
Several normalization methods
Multiple platforms and methods
No standard for data annotation
Vocabularies and terminology coherence
Details about: experiment, source, protocols, exp. conditions
Microarrays Data Mgmt Issues / 2
Evidence about data quality
What to store?
Raw Images
Computed values
Normalized values
How to find data
Systems aware of complex vocabularies (ontologies)
Data mining and exp. comparison tools
Data access control
Issues => MIAME/MAGE
MIAME Experiment Modelling
GCA DB
GCA DB
GCA DB
The need for a KMS for uA data management
The uA Experiment Cycle
“Closing the loop”
“Closing the loop”
uA KMS: What to model?
Knowledge management... what?
Genes
Textual annotations, literature
Interactions, pathways.
Genes collections (functional families, clusters)
Experiment and Experimental Conditions
Keyword/ontology based searches
Tested conditions searches
Expression Values
Navigation
Same transcriptome/trend/correlation/pattern
Knowledge management... what?
Chips
Keyword searches
Annotations about chip quality, protocols to be used, etc.
People
“Is expert in ...”
“Works with ...”
“Is studying ...”
Its ranking is X (based on publications, user preferences, etc.)
Knowledge management... what?
“Does IL-2 regulate something and under what conditions?”
Interactions of gene: IL2
Note added by Norman on Dec 10, 2003, deduced from text, not confirmed [see original text]:
IL2 --> UPREGULATES --> IL10, IL12
ON --> Cells --> Type: DC
UNDER CONDITION: LPS Stimulus
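The annotated interaction above could be stored as a small structured record, so that the question “Does IL-2 regulate something and under what conditions?” becomes a simple lookup. The following is only a minimal sketch: the class `Interaction`, its field names and the helper `interactions_of` are hypothetical and not part of the presented system.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Tuple


@dataclass(frozen=True)
class Interaction:
    """One annotated regulation statement (all field names are hypothetical)."""
    source: str                 # regulating gene
    relation: str               # e.g. "UPREGULATES"
    targets: Tuple[str, ...]    # regulated genes
    cell_type: str              # cells the statement applies to
    condition: str              # experimental condition
    author: str
    added_on: date
    confirmed: bool             # False when only deduced from text


def interactions_of(gene: str, notes: List[Interaction]) -> List[Interaction]:
    """Return every statement in which `gene` is the regulator."""
    return [n for n in notes if n.source == gene]


# The note from the slide, encoded as a record.
notes = [
    Interaction("IL2", "UPREGULATES", ("IL10", "IL12"), "DC", "LPS Stimulus",
                "Norman", date(2003, 12, 10), confirmed=False),
]

for n in interactions_of("IL2", notes):
    status = "confirmed" if n.confirmed else "not confirmed"
    print(f"{n.source} {n.relation} {', '.join(n.targets)} "
          f"on {n.cell_type} cells under {n.condition} ({n.author}, {status})")
```

Keeping author, date and confirmation status on the record itself is one way to preserve the evidence trail that the note carries.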
Knowledge management... what?
Knowledge management... what?
Knowledge management... what?
uA KMS: Where to start from?
What to do first?
Gene Expression Formal Model
Focused on GE measures
Oriented to “closing the loop” goal
Several things to start from
Ontologies and Inference Systems
Similar models already defined
Other similar systems
Defining a GE Model
Start Point: Ontologies and Inference Systems
XML->RDF->...->OWL, and related tools (e.g. Protégé, Racer, Jena)
Logics, particularly Description Logic
Inference systems and languages (e.g. Prolog)
Defining a GE Model
Start Point: Similar models already defined
"Modeling Gene Expression", Proceedings of NETTAB/2004, www.loa-cnr.it
A model of GE in Description Logic, but without a focus on microarrays and expression intensities
Defining a GE Model
Start Point: Similar models already defined
Very similar to previous work, but with tools for annotation/querying of microarray chips
Yet it does not seem focused on the annotation of data, assays, etc.
Defining GE Model
Start point: Other similar systems
Synapsia by Agilent, very similar, but not focused on uAs
Hybrow, www.hybrow.org, a computer-aided hypothesis evaluation tool
The Notebook Project, www.notebook.org, a bio-KMS based on SOAP and P2P
2004, Sarini, M., Blanzieri, E., Giorgini, P., and Moser, C., From actions to suggestions: supporting the work of biologists through laboratory notebooks
Defining GE Model
Start point: Other similar systems
uA KMS: Toward a GE Model
Defining GE Model
Gene Expression Formal Model
Basic elements: genes, hybridizations, experiments
Defining GE Model
Gene Expression Formal Model
Basic elements: annotated sets
Defining GE Model
Gene Expression Formal Model
Basic elements: annotated sets
Gene Expression Entities
Entities Grouping
EntityCollection ::= Cluster of DataSet | Cluster of Entity
Cluster of DataSet ::= Cluster of DataSet | GeneCluster of DataSet.GeneSet | HybCluster of DataSet.HybSet
Cluster of Entity ::= Cluster of Entity | Set of Entity
All entities in a cluster are of the same type. E.g., a cluster of genes contains hierarchically grouped sets of genes, and only genes. NOTE: the grammar used here is VERY informal!
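The recursive rule "Cluster of Entity ::= Cluster of Entity | Set of Entity" and the same-type constraint can be illustrated with a small tree structure. This is a sketch under assumptions: the classes `Gene`, `Hybridization` and `Cluster`, and the method names, are hypothetical and only mirror the informal grammar.

```python
from dataclasses import dataclass, field
from typing import List, Set, Union


@dataclass(frozen=True)
class Gene:
    name: str


@dataclass(frozen=True)
class Hybridization:
    name: str


@dataclass
class Cluster:
    """A node whose children are sub-clusters or flat sets of entities."""
    children: List[Union["Cluster", frozenset]] = field(default_factory=list)

    def entities(self) -> Set[object]:
        """Flatten the hierarchy into the set of all contained entities."""
        found: Set[object] = set()
        for child in self.children:
            found |= child.entities() if isinstance(child, Cluster) else child
        return found

    def is_homogeneous(self) -> bool:
        """True when every entity in the tree has the same type."""
        return len({type(e) for e in self.entities()}) <= 1


# A two-level cluster of genes, and an invalid cluster mixing entity types.
inner = Cluster([frozenset({Gene("IL10")}), frozenset({Gene("IL12")})])
genes = Cluster([inner, frozenset({Gene("IL2")})])
mixed = Cluster([frozenset({Gene("IL2")}), frozenset({Hybridization("hyb1")})])

print(genes.is_homogeneous(), mixed.is_homogeneous())
```

A flat set is simply a cluster with one level, which matches the later remark that a cluster includes the case of a flat set.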
Entities Grouping
GeneSet ::= Set of Gene
HybSet ::= Set of Hybridization
Set of X ::= { x : x IS-A X }
Singleton ( C ) ::= { S : S = Set of C AND #S = 1 }
Annotations
Annotation ::= EntityCollection => AnnotationSet
Annotations make it possible to attach useful information to Gene Expression data.
Annotations/Basics
Annotation ( EmptySet ) ::= EmptySet
Annotation ( Singleton ( Entity e ) ) ::= Attributes ( e ) U BaseAnnotation ( e )
Annotations/Basics
BaseAnnotation ( Any ) ::=
To be decided; the first idea is a set of:
Name/Value/Type, and Description as in MAGE
External Reference, with URI, or attachment
Graph attachment, "vectoring" values, e.g. PCA with component values, scatter plots with ...
Annotation Author
Annotation Date
Security/Access references
Similar to the classes Extendable, Describable, Identifiable of MAGE-OM
Entity annotates another Entity, e.g. Exp author
Annotations/Basics
Attributes ( Entity e ) ::= Set of < attrib, value, type > for each declared attribute of Entity
Attributes may be declared in JavaBean fashion, optionally providing a mapping for the type and semantics of the attribute
Annotation ( GeneSet GS ) ::= BaseAnnotation ( GS ) U Annotation ( g ) : g BELONGS GS U BiologicalAnnotation ( GS )
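The composition rules for annotations can be read as a recursive union. The sketch below follows the three rules given so far (empty set, single entity, gene set). The in-memory dictionaries, the example values and the function names are hypothetical stand-ins for whatever store a real system would use.

```python
from typing import Dict, FrozenSet, Set, Tuple

# An annotation item is modelled as a <name, value, type> triple.
Item = Tuple[str, str, str]

# Hypothetical in-memory stores; a real system would query a database.
attributes: Dict[str, Set[Item]] = {
    "IL2": {("public_id", "IL2", "string")},
    "IL10": {("public_id", "IL10", "string")},
}
base_annotation: Dict[object, Set[Item]] = {
    "IL2": {("note", "deduced from text", "string")},
    frozenset({"IL2", "IL10"}): {("author", "Norman", "string")},
}
biological_annotation: Dict[FrozenSet[str], Set[Item]] = {
    frozenset({"IL2", "IL10"}): {("pathway", "KEGG pathway about IL-2", "string")},
}


def annotate_gene(gene: str) -> Set[Item]:
    """Annotation(Singleton(e)) = Attributes(e) U BaseAnnotation(e)."""
    return attributes.get(gene, set()) | base_annotation.get(gene, set())


def annotate_gene_set(gene_set: FrozenSet[str]) -> Set[Item]:
    """Annotation(GS) = BaseAnnotation(GS) U Annotation(g) for g in GS
    U BiologicalAnnotation(GS)."""
    if not gene_set:
        return set()  # Annotation(EmptySet) = EmptySet
    result = set(base_annotation.get(gene_set, set()))
    for gene in gene_set:
        result |= annotate_gene(gene)
    result |= biological_annotation.get(gene_set, set())
    return result


gs = frozenset({"IL2", "IL10"})
for item in sorted(annotate_gene_set(gs)):
    print(item)
```

Because every rule is a plain set union, the result does not depend on the order in which genes are visited.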
Annotations/Biological Ann.
BiologicalAnnotation ( GS ) ::=
Allows tagging the gene set with a biological meaning, i.e. why the genes have been grouped. E.g.:
belonging to the functional family of apoptosis
in the KEGG pathway about IL-2
under GO ID #10234
Annotations/Data Sets
Annotation ( Cluster of DataSet ds ) ::= BaseAnnotation ( ds ) U Annotation ( < all entities in ds > )
Meaning of clustering
Clustering method / algorithm
Algorithm annotations, e.g. parameter values
Cluster includes the case of a flat set (not a tree), and the sub-cases:
gene/hybs filtering ( genes have been filtered in from another data set )
values transformation ( normalization, PCA, average on replicas )
Annotations/Examples
Types of annotations / searches:
Generic
<attribute> LIKE <pattern>
<value> BETWEEN ( <lo>, <hi> )
<author> IS author
Genes
public_id LIKE pattern
REGULATION ( g1, g2, ... gn )
g1 REGULATES | DOWN_REGULATES | UP_REGULATES | PROMOTE | INHIBITS ( g1, g2, ... gn )
geneX IN_PATHWAY ( p )
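The generic LIKE and BETWEEN searches can be sketched as predicates over annotated records. Everything here is an assumption made for illustration: the record layout, the `expression` values, and the use of shell-style wildcards (`*`, `?`) for LIKE are not specified in the slides.

```python
from fnmatch import fnmatchcase
from typing import Callable, Dict, List

Record = Dict[str, object]


def like(record: Record, attribute: str, pattern: str) -> bool:
    """<attribute> LIKE <pattern>, using shell-style wildcards (* and ?)."""
    return fnmatchcase(str(record.get(attribute, "")), pattern)


def between(record: Record, attribute: str, lo: float, hi: float) -> bool:
    """<value> BETWEEN ( <lo>, <hi> ), bounds included."""
    value = record.get(attribute)
    return isinstance(value, (int, float)) and lo <= value <= hi


def search(records: List[Record], predicate: Callable[[Record], bool]) -> List[Record]:
    """Return the records that satisfy the predicate, in their original order."""
    return [r for r in records if predicate(r)]


# Hypothetical annotated gene records.
records: List[Record] = [
    {"public_id": "IL2", "expression": 2.5, "author": "Norman"},
    {"public_id": "IL10", "expression": 0.4, "author": "Norman"},
    {"public_id": "IL12", "expression": 3.7, "author": "misterX"},
]

interleukins = search(records, lambda r: like(r, "public_id", "IL1*"))
mid_range = search(records, lambda r: between(r, "expression", 1.0, 3.0))
print([r["public_id"] for r in interleukins], [r["public_id"] for r in mid_range])
```

Predicates compose with ordinary boolean operators, so a query such as "public_id LIKE pattern AND author IS X" needs no extra machinery.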
Annotations/Examples
DataSet
geneSet1 SIMILAR_PROFILE geneSet2 IN_DATASET ds
hybSet1 SIMILAR_PROFILE hybSet2 IN_DATASET ds
Not necessarily computed; may simply be annotated.
CORRELATION ( dSet1, dSet2 ... dSetN, value )
annotates the correlation of the expression values
Annotations/Time Courses
Time profiles may be shifted relative to each other, or one gene may be up when another is down.
geneSet1 SIMILAR_TIME_PROFILE geneSet2, a particular case of SIMILAR_PROFILE
geneSet1 TIME_AFTER geneSet2 ...
geneSet1 TIME_BEFORE geneSet2 ...
geneSet1 TIME_SHIFT geneSet2 ...
geneSet1 TIME_OPPOSED geneSet2
geneSet1 TIME_OPPOSED_SHIFT geneSet2
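These time-course relations are meant as annotations and need not be computed. Still, a tool could suggest a label for the user to confirm. The sketch below is one possible heuristic and not the method of the presented model: it correlates two equally sampled profiles at increasing delays, and the function names, the 0.9 threshold and the maximum lag are all assumptions.

```python
from math import sqrt
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either profile is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return cov / (norm_x * norm_y)


def suggest_time_relation(p1: Sequence[float], p2: Sequence[float],
                          max_lag: int = 2, threshold: float = 0.9) -> str:
    """Suggest a label for two profiles sampled at the same time points.

    Only delays of p2 with respect to p1 are tried (hypothetical convention).
    """
    for lag in range(max_lag + 1):
        # Align p1 with p2 delayed by `lag` time points.
        r = pearson(p1[: len(p1) - lag], p2[lag:])
        if r >= threshold:
            return "SIMILAR_TIME_PROFILE" if lag == 0 else "TIME_SHIFT"
        if r <= -threshold:
            return "TIME_OPPOSED" if lag == 0 else "TIME_OPPOSED_SHIFT"
    return "NONE"


base = [0.0, 1.0, 2.0, 1.0, 0.0, 0.0]
delayed = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0]      # same shape, one time point later
mirrored = [2.0 - v for v in base]             # up where `base` is down
mirrored_delayed = [2.0 - v for v in delayed]  # mirrored and one point later

for other in (base, delayed, mirrored, mirrored_delayed):
    print(suggest_time_relation(base, other))
```

A suggested label would still be stored as an annotation with its author and source, like any other.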
Annotations/Comparisons
“+/-” graphs: common graphs of gene interactions that are evident from comparison experiments. Modeled via the previously shown constructs.
Annotations/Set Graphs
Looking at Euler-Venn diagrams is very common. Modeled via Aset theory operations.
Operators
Operations and relations with data
When storing the result of an operation, the source of the result may be annotated and the annotations composed coherently:
gset = gset1 U gset2
save ( gset, annotation )
gset is saved with:
further annotation provided by the user
SOURCE ( UNION, gset1, gset2 )
all annotations coming from gset1 and gset2 belong to gset too
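The union-with-provenance behaviour described here can be sketched as follows. The class `AnnotatedSet` and the function `union` are hypothetical names. The sketch only shows how the SOURCE annotation, the user annotation and the inherited annotations end up on the saved set.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Tuple

Annotation = Tuple[str, ...]


@dataclass(frozen=True)
class AnnotatedSet:
    """A named set of gene identifiers together with its annotations."""
    name: str
    members: FrozenSet[str]
    annotations: FrozenSet[Annotation] = frozenset()


def union(name: str, left: AnnotatedSet, right: AnnotatedSet,
          user_annotations: Iterable[Annotation] = ()) -> AnnotatedSet:
    """gset = gset1 U gset2, carrying provenance and inherited annotations."""
    source = ("SOURCE", "UNION", left.name, right.name)
    return AnnotatedSet(
        name=name,
        members=left.members | right.members,
        annotations=(frozenset(user_annotations) | {source}
                     | left.annotations | right.annotations),
    )


gset1 = AnnotatedSet("gset1", frozenset({"IL2"}),
                     frozenset({("author", "Norman")}))
gset2 = AnnotatedSet("gset2", frozenset({"IL10", "IL12"}),
                     frozenset({("pathway", "IL-2")}))
gset = union("gset", gset1, gset2, [("note", "merged by user")])

print(sorted(gset.members))
print(sorted(gset.annotations))
```

Recording the operation and its operands as an ordinary annotation means the origin of a derived set can be searched like any other annotation.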
Operators
Aset theory operations:
EntityCollection U EntityCollection ... U EntityCollection = EntityCollection
EntityCollection INTERSECTION EntityCollection ... INTERSECTION EntityCollection = EntityCollection
EntityCollection - EntityCollection = EntityCollection
Compositions:
new Cluster ( geneSet1, geneSet2, geneSet3 ... geneSetN, geneSetAnnotation )
new Cluster ( cluster1, cluster2, clusterAnnotation )
Relations on single entities
gene1 DOWN_REGULATES ( gene2, gene3 ) AUTHOR misterX
Ongoing and future...
What's next?
Refinements of GCA, study of BASE
Study of ontology tools and ontology reasoners
Better definition of GE Model
Review with biologists
Cooperation with Ontology Groups (proposals are welcome...)
To be continued...