Semantic Technologies for Big Sciences Including Astrophysics
DESCRIPTION
Amit Sheth with TK Prasad, "Semantic Technologies for Big Science and Astrophysics", Invited Plenary Presentation at the EarthCube Solar-Terrestrial End-User Workshop, NJIT, Newark, NJ, August 13, 2014.
Like many other fields of Big Science, Astrophysics and Solar Physics deal with the challenges of Big Data, including Volume, Variety, Velocity, and Veracity. There is already significant work on handling volume-related challenges, including the use of high-performance computing. In this talk, we will mainly focus on the other challenges from the perspective of collaborative sharing and reuse of a broad variety of data created by multiple stakeholders, large and small, along with tools that offer semantic variants of search, browsing, integration, and discovery capabilities. We will borrow examples of tools and capabilities from state-of-the-art work in supporting physicists (including astrophysicists) [1], life sciences [2], and material sciences [3], and describe the role of semantics and semantic technologies in making these capabilities possible or easier to realize. This applied, practice-oriented talk will complement more vision-oriented counterparts [4].
[1] Science Web-based Interactive Semantic Environment: http://sciencewise.info/
[2] NCBO BioPortal: http://bioportal.bioontology.org/ ; Kno.e.sis’s work on Semantic Web for Healthcare and Life Sciences: http://knoesis.org/amit/hcls
[3] MaterialWays (a Materials Genome Initiative related project): http://wiki.knoesis.org/index.php/MaterialWays
[4] From Big Data to Smart Data: http://wiki.knoesis.org/index.php/Smart_Data
TRANSCRIPT
Semantic Technologies for Big Science and Astrophysics
Invited presentation: EarthCube Solar-Terrestrial End-User Workshop
NJIT, Newark NJ, August 13-15, 2014
Amit Sheth, T. K. Prasad
Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing
Wright State University, Dayton, OH - 45435
Astrophysics
Lots of data
Complex
Heterogeneous
http://en.wikipedia.org/wiki/Astrophysics#mediaviewer/File:NGC_4414_%28NASA-med%29.jpg
Challenge
• How can we handle this vast, heterogeneous, and complex data space?
• Focus on complexity rather than raw processing: integration, collaboration, reuse
Can Semantic (Web) technologies ease the challenges and empower the scientists?
The Semantic Web vision: 1999-2001
• Sir Tim Berners-Lee, in his 1999 book “Weaving the Web”, emphasized the significance of metadata about Web documents.
• The well-known May 2001 Scientific American article presented an agent- and AI-based vision for the “next generation of the World Wide Web”, with content amenable to automation.
• With Taalee (later Voquette, then Semagix), which I founded in 1999, I pursued a highly practical realization through semantic search, browsing, and analysis products. It had commercial applications starting in 2000, and a patent was awarded in 2001.
Three components of the Semantic Web
1. Agreement and Knowledge: agreement about a common vocabulary/nomenclature, conceptual models, and domain knowledge (ontology)
– Codified as Schema + Knowledge Base.
– Agreement is what enables interoperability.
– Formal, machine-processable description is what leads to automation.
– Manual, semi-automated, or automated creation of ontologies
2. Semantic Annotation (Metadata Extraction): associating meaning with data, or labeling data so that it is more meaningful to the system and to people.
– Manual
– Semi-automatic (automatic with human verification)
– Automatic
3. Reasoning/Computation, Applications:
– Semantics-enabled search, browsing
– Data integration, collaboration
– Visualization
– Analyses including pattern discovery, mining, hypothesis validation
– Answering complex queries, making connections (paths, subgraphs), supporting discovery
How to integrate well? From Syntax to Semantics
Using Semantics to Climb Levels of Abstraction: an example
(Figure labels: SSN Ontology, Intellego)
Level 0: Raw Data [in TEXT], e.g., number: “150”
Level 1: Annotated Data [in RDF], e.g., label: “Systolic blood pressure of 150 mmHg”
Level 2: Interpreted data (deductive) [in OWL], e.g., threshold: “Elevated Blood Pressure”
Level 3: Interpreted data (abductive) [in OWL], e.g., diagnosis: “Hyperthyroidism”
Lower levels are less useful; higher levels are more useful.
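The climb from level 0 to level 3 can be sketched in a few lines of Python. This is a toy illustration, not the Intellego implementation: a dict stands in for RDF triples, the 140 mmHg threshold and the `EXPLANATIONS` table are hypothetical placeholders rather than clinical knowledge, and abduction is simplified to "keep every explanation that accounts for all observed qualities".

```python
"""Toy sketch of climbing levels of abstraction (0 raw -> 3 abductive).

The threshold and background knowledge are hypothetical illustrations,
not clinical knowledge and not the Intellego implementation.
"""

# Level 2 rule: assumed cut-off for an "elevated" systolic reading (illustrative only).
ELEVATED_SYSTOLIC_MMHG = 140.0

# Level 3 background knowledge (hypothetical): each explanation lists the
# abstract qualities it could account for.
EXPLANATIONS = {
    "Hyperthyroidism": {"ElevatedBloodPressure", "ElevatedHeartRate"},
    "Hypotension": {"LowBloodPressure"},
}


def annotate(raw):
    """Level 0 -> 1: attach a property and a unit to a bare number."""
    return {"property": "SystolicBloodPressure", "value": float(raw), "unit": "mmHg"}


def deduce(annotation):
    """Level 1 -> 2: apply a threshold rule to derive an abstract quality."""
    qualities = set()
    if (annotation["property"] == "SystolicBloodPressure"
            and annotation["unit"] == "mmHg"
            and annotation["value"] >= ELEVATED_SYSTOLIC_MMHG):
        qualities.add("ElevatedBloodPressure")
    return qualities


def abduce(qualities):
    """Level 2 -> 3: return explanations that account for every observed quality."""
    if not qualities:
        return []
    return sorted(name for name, explains in EXPLANATIONS.items()
                  if qualities <= explains)


level1 = annotate("150")
level2 = deduce(level1)
level3 = abduce(level2)
print(level1, level2, level3)
```

Each function moves the data one level up; a reading below the assumed threshold stops at level 1 and yields no explanation.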
Semantic Web technologies – in practice
● Ontologies to capture domain knowledge (sometimes taxonomy/nomenclature is good enough)
● Languages to represent/capture domain knowledge and data - OWL, RDF/RDFS.
● Data sharing and publishing online (e.g., Linked Open Data, LOD).
● Annotation, semantic search, semantic browsing
● Provenance, …
Widely used in biomedicine; quite a few applications in healthcare, growing use and explorations in geosciences and more…
In this talk, I will review/borrow from
• ScienceWISE at EPFL which uses semantic technology to serve Physicists including Astrophysicists: shared vocabulary, annotation, browsing for related concepts
• Semantic (web) technologies for health care and life sciences encompassing collaborative research, prototypes, open source tools and ontologies, deployed applications, commercialization,…
• MaterialWays: our project in the Materials Genome Initiative …
“Ontology” in physics domain – ScienceWISE
● ScienceWISE: Science Web-based Interactive Semantic Environment
● An interactive and crowdsourced tool to capture knowledge from scientists’ daily routine work.
● Its core is a community-built ontology.
● Literature gets annotated and bookmarked using the ontology.
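Ontology-based annotation of literature can be approximated by dictionary lookup of concept labels. The sketch below is not ScienceWISE's actual annotator; the mini-ontology `ONTOLOGY` and its concept identifiers are hypothetical, and matching is simple case-insensitive, longest-label-first regular-expression search.

```python
import re

# Hypothetical mini-ontology: concept identifier -> surface labels (synonyms).
ONTOLOGY = {
    "DarkMatter": ["dark matter"],
    "Galaxy": ["galaxy", "galaxies"],
    "GalaxyCluster": ["galaxy cluster", "galaxy clusters"],
    "Redshift": ["redshift", "red shift"],
}


def build_pattern(ontology):
    """Compile one regex; longer labels come first so the longest match wins."""
    label_to_concept = {label.lower(): concept
                        for concept, labels in ontology.items()
                        for label in labels}
    labels = sorted(label_to_concept, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, labels)) + r")\b",
                         re.IGNORECASE)
    return pattern, label_to_concept


def annotate(text, ontology=ONTOLOGY):
    """Return (start, end, matched text, concept) for each ontology mention."""
    pattern, label_to_concept = build_pattern(ontology)
    return [(m.start(), m.end(), m.group(0), label_to_concept[m.group(0).lower()])
            for m in pattern.finditer(text)]


text = "Dark matter dominates the mass of galaxy clusters and of most galaxies."
for start, end, surface, concept in annotate(text):
    print(f"{start:3d}-{end:3d} {surface!r} -> {concept}")
```

Putting longer labels first matters: without it, "galaxy clusters" would be tagged as the narrower concept `Galaxy`.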
(Figure: ScienceWISE ontology, annotation, and bookmarking & recommendations at http://sciencewise.info/)
Value Proposition
Associating machine-processable semantics with scientific and engineering data and documents can help overcome the challenges of data discovery, integration, and interoperability caused by data heterogeneity.
Benefits of using semantics for Astrophysicists (and other sciences)
• Challenges
– Massive volume
– Heterogeneity (i.e., data from many sources, in different formats/structures, text, images)
– Interoperability and data sharing
– Provenance and access control
• Need techniques beyond ScienceWISE
– Interest in data beyond scientific publications
– Data sharing (and credit/data citation for data sharing)
– Provenance and access control
– A framework to capture, search, and discover astrophysical data
Nature of Data and Documents
• Relational/Tabular Data
• XML document
• Image
• Technical Specs
• Irregular Tables
• Publications
Granularity of Semantics and Applications: Examples
• Synonyms
– Chemistry, Chemical Composition, Chemical Analysis, ...
– Bend Test, Bending, ...
– Delivery Condition, Process/Surface Finish, Temper, "as received by purchaser", ...
• Coreference vs. broadening/narrowing
– Tubing vs. welded tubing vs. flash-welded part
• Capturing characteristic-value pairs
– Recognize and normalize: “0.1 inch and under in nominal thickness” is translated to “Thickness <= 0.1 in”.
– Glean an elided characteristic: the controlled term “solution heat treated” implies the characteristic “heat treat type”.
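These kinds of semantics (synonyms, normalized characteristic-value pairs, and elided characteristics) can be sketched with lookup tables and one regular expression. The tables `SYNONYMS` and `IMPLIED_CHARACTERISTIC` and the single phrase pattern are hypothetical and cover only the examples on the slide; a real system would need a curated vocabulary and many more patterns.

```python
import re

# Hypothetical synonym table: surface term -> preferred (canonical) term.
SYNONYMS = {
    "chemistry": "Chemical Composition",
    "chemical composition": "Chemical Composition",
    "chemical analysis": "Chemical Composition",
    "bend test": "Bend Test",
    "bending": "Bend Test",
}

# Hypothetical: controlled value terms that imply an elided characteristic.
IMPLIED_CHARACTERISTIC = {"solution heat treated": "heat treat type"}

UNIT_ABBREVIATIONS = {"inch": "in", "inches": "in", "in": "in"}

# Matches "<number> <unit> and under in [nominal] <characteristic>".
AND_UNDER = re.compile(
    r"(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>inch(?:es)?|in)\.?\s+and\s+under\s+in\s+"
    r"(?:nominal\s+)?(?P<characteristic>\w+)",
    re.IGNORECASE,
)


def canonical_term(term):
    """Map a synonym to its preferred term; unknown terms pass through unchanged."""
    return SYNONYMS.get(term.strip().lower(), term)


def normalize_constraint(phrase):
    """Translate 'X unit and under in nominal Y' into 'Y <= X unit'."""
    m = AND_UNDER.search(phrase)
    if m is None:
        return None
    unit = UNIT_ABBREVIATIONS[m.group("unit").lower()]
    return f"{m.group('characteristic').capitalize()} <= {m.group('value')} {unit}"


def characteristic_value(controlled_term):
    """Recover the (characteristic, value) pair that a controlled term implies."""
    key = controlled_term.strip().lower()
    if key in IMPLIED_CHARACTERISTIC:
        return (IMPLIED_CHARACTERISTIC[key], controlled_term)
    return None


print(normalize_constraint("0.1 inch and under in nominal thickness"))
print(canonical_term("Bending"), characteristic_value("solution heat treated"))
```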
Granularity of Semantics and Associated Applications
• Lightweight semantics: File and document-level annotation to enable discovery and sharing
• Richer semantics: Data-level annotation and extraction for semantic search and summarization
• Fine-grained semantics: Data integration and interoperability.
Using Semantic Web Technologies
Machine-processable semantics achieved by addressing
• Syntactic Heterogeneity: using XML syntax and the RDF data model (labeled graph structure)
• Semantic Heterogeneity:
– Using “common” controlled vocabularies, taxonomies, and ontologies
– Using federated data sources, exchanges, querying, and services
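A minimal sketch of how a shared vocabulary resolves semantic heterogeneity once data is in triple form: two hypothetical sources name the same fields differently, per-source mappings translate them into common properties, and one pattern query then spans both. The `vocab:` properties, source names, and values are invented for illustration; a production system would use an RDF store and SPARQL rather than Python lists.

```python
from collections import namedtuple

Triple = namedtuple("Triple", "subject predicate object")

# Hypothetical mappings from each source's local field names to shared
# vocabulary properties ("vocab:" is an illustrative prefix, not a real namespace).
MAPPINGS = {
    "sourceA": {"obj": "vocab:observedObject", "band": "vocab:filterBand",
                "mag": "vocab:magnitude"},
    "sourceB": {"target_name": "vocab:observedObject", "filter": "vocab:filterBand",
                "magnitude": "vocab:magnitude"},
}


def to_triples(source, records):
    """Turn one source's records into triples that use the shared vocabulary."""
    mapping = MAPPINGS[source]
    triples = []
    for rec in records:
        subject = f"{source}:{rec['id']}"
        for field, value in rec.items():
            if field in mapping:  # unmapped fields (like "id") are skipped
                triples.append(Triple(subject, mapping[field], value))
    return triples


def match(graph, subject=None, predicate=None, obj=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [t for t in graph
            if (subject is None or t.subject == subject)
            and (predicate is None or t.predicate == predicate)
            and (obj is None or t.object == obj)]


graph = (to_triples("sourceA", [{"id": "a1", "obj": "obj-1", "band": "V", "mag": 10.1}])
         + to_triples("sourceB", [{"id": "b7", "target_name": "obj-1",
                                   "filter": "V", "magnitude": 10.3}]))

# One query over the shared vocabulary finds observations from both sources.
observations = [t.subject for t in match(graph, predicate="vocab:observedObject", obj="obj-1")]
magnitudes = {s: match(graph, subject=s, predicate="vocab:magnitude")[0].object
              for s in observations}
print(magnitudes)
```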
Ingredients for Semantics-based Cyber Infrastructure
• Use of community-ratified controlled vocabularies and lightweight ontologies (upper-level, hierarchies)
• Semi-automatic annotation of data and documents
• Support for provenance and access control
A proposed “light-weight semantics” approach
(for highly distributed communities, low start-up time, long-tail science)…
Our applications in the Materials Genome Initiative
MaterialWays (our project related to the Materials Genome Initiative): http://wiki.knoesis.org/index.php/MaterialWays
Matvocab home page features:
• Search and discovery
• Annotate documents
• Visualize the knowledge base
• Query vocabulary
• View, edit, and add
• Create process assertions
Search & Discovery
Annotate, search, and track provenance
• Vocabulary is used to annotate documents.
• Annotated documents can be indexed.
• Documents can be integrated reliably based on common terms of interest and provenance information.
Annotate documents using standard vocabulary
Create process assertions (OnCET)
• Add information about inputs to and outputs of a process as assertions in triple form using standard vocabulary.
• Add assertions about materials domain knowledge using vocabulary terms and the relationships among them, e.g., about process control parameters and performance characteristics.
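Process assertions in triple form might look like the following sketch. The predicates `hasInput`, `hasOutput`, `hasControlParameter`, and `influences` are hypothetical stand-ins for terms from the standard vocabulary, and the link between cure temperature and tensile strength is an illustrative assertion, not a claim from the source.

```python
# Toy triple store for process assertions; predicate names are hypothetical
# stand-ins for standard-vocabulary terms.
ASSERTIONS = set()


def assert_process(process, inputs, outputs, control_parameters):
    """Record a process's inputs, outputs, and control parameters as triples."""
    for material in inputs:
        ASSERTIONS.add((process, "hasInput", material))
    for material in outputs:
        ASSERTIONS.add((process, "hasOutput", material))
    for parameter in control_parameters:
        ASSERTIONS.add((process, "hasControlParameter", parameter))


def objects(subject, predicate):
    """Return the sorted objects of all triples (subject, predicate, ?o)."""
    return sorted(o for s, p, o in ASSERTIONS if s == subject and p == predicate)


def affected_characteristics(process):
    """Performance characteristics linked to any of the process's control parameters."""
    return sorted({c for param in objects(process, "hasControlParameter")
                   for c in objects(param, "influences")})


assert_process("generic autoclave cure",
               inputs=["generic PMC lay-up"], outputs=["generic PMC"],
               control_parameters=["cure temperature", "cure pressure"])
# Domain-knowledge assertion (hypothetical, for illustration only).
ASSERTIONS.add(("cure temperature", "influences", "tensile strength"))

print(objects("generic autoclave cure", "hasInput"))
print(affected_characteristics("generic autoclave cure"))
```

Keeping process facts and domain knowledge in the same triple form is what lets the last query join across them.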
Provenance Metadata
• Explains the origin of an artifact, such as
– How was it created?
– Who created it?
– When was it created?
• Example: for a given material X
– Which processes are involved in making the material, and what are the relevant performance properties?
– What are the inputs, control parameters, and outputs of a process?
– Which research/engineering team performed an experiment?
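Provenance questions such as how, who, and when can be answered from a handful of statements. The sketch borrows property names from the W3C PROV-O vocabulary (`prov:wasGeneratedBy`, `prov:used`, `prov:wasAttributedTo`, `prov:wasAssociatedWith`, `prov:generatedAtTime`); the source does not say which provenance model it uses, and every `ex:` resource and the timestamp are hypothetical.

```python
# Provenance statements using property names from W3C PROV-O (prov:...);
# all ex: resources and the timestamp are hypothetical.
PROVENANCE = [
    ("ex:materialX", "prov:wasGeneratedBy", "ex:cureRun42"),
    ("ex:materialX", "prov:wasAttributedTo", "ex:compositesTeam"),
    ("ex:materialX", "prov:generatedAtTime", "2014-06-02T09:30:00"),
    ("ex:cureRun42", "prov:used", "ex:layupY"),
    ("ex:cureRun42", "prov:wasAssociatedWith", "ex:compositesTeam"),
]


def value(subject, predicate):
    """Return the first object of (subject, predicate, ?o), or None."""
    for s, p, o in PROVENANCE:
        if s == subject and p == predicate:
            return o
    return None


def explain_origin(artifact):
    """Answer how / from what / who / when for one artifact."""
    activity = value(artifact, "prov:wasGeneratedBy")
    return {
        "how": activity,
        "from": value(activity, "prov:used") if activity else None,
        "who": value(artifact, "prov:wasAttributedTo"),
        "when": value(artifact, "prov:generatedAtTime"),
    }


print(explain_origin("ex:materialX"))
# Which team performed the experiment (the activity)?
print(value("ex:cureRun42", "prov:wasAssociatedWith"))
```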
Capturing provenance metadata - iExplore
(Figure: generic PMC prepreg is subjected to generic hand lay-up, which yields generic PMC lay-up; generic PMC lay-up is subjected to generic autoclave cure, which yields generic PMC.)
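Assuming the iExplore figure's edges run prepreg → hand lay-up → PMC lay-up → autoclave cure → PMC, the question “which processes are involved in making the material?” becomes a backward walk over the chain. The predicate spellings and the `lineage` function are illustrative, not iExplore's implementation.

```python
# Edges read off the iExplore figure (direction assumed); predicate spellings are ours.
EDGES = [
    ("generic PMC prepreg", "subjected to", "generic hand lay-up"),
    ("generic hand lay-up", "yields", "generic PMC lay-up"),
    ("generic PMC lay-up", "subjected to", "generic autoclave cure"),
    ("generic autoclave cure", "yields", "generic PMC"),
]


def lineage(material):
    """Walk backwards from a material; return (processes, upstream materials)."""
    processes, materials = [], []
    frontier = [material]
    while frontier:
        current = frontier.pop()
        # Find processes that yield the current material ...
        for process, pred, product in EDGES:
            if pred != "yields" or product != current or process in processes:
                continue
            processes.append(process)
            # ... then the materials that were subjected to that process.
            for source, pred2, target in EDGES:
                if pred2 == "subjected to" and target == process and source not in materials:
                    materials.append(source)
                    frontier.append(source)
    return processes, materials


print(lineage("generic PMC"))
```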
Vocabulary Provenance
Sources: ASM Handbook, MIL Handbook 5, MIL Handbook 17 → Vocabulary terms
Vocabulary term expressed in RDF and published online (http://knoesis.org/matvocab/A-basis)
Wiki-based crowd-sourcing of the vocabulary
Capturing Vocabulary Provenance - iExplore
(Figure: a vocabulary term shown with its Definition, Rights, and Source)
Our proposal - Astrophysics
• Tagging, annotation, search
• Knowledgebase -> Ontology
• Provenance at every data level
• Data access control
• Capture process flows
• Capture relationships between concept instances
• Visualization of process flows
ScienceWISE - Physics
• Tagging, annotation, search
• Ontology -> Knowledgebase
• Provenance
Our approach to help in Astrophysics
• Access control and provenance details at every data level, to handle the huge amount of astrophysics data.
• Create relationships between concepts and visualize them as a graph.
• Add facts or assertions about each concept.
Data Access
(Figure: personal desktops, lab notebooks, and databases brought together under single access)
Public-Private Data Sharing
• Enhance publicly available datasets while keeping intellectual property data private for businesses
– Private data and metadata (e.g., ongoing experimental processes, intellectual property data)
– Selectively shared data and metadata (e.g., with ongoing collaborators, licensed data)
– Public data and metadata (e.g., released products, material specifications)
Federated Architecture
(Figure: a Federal Endpoint performs 1. User Authentication, 2. Federated Semantic Query Processing, and 3. Semantic Mappings over three components: OEM partner A, OEM partner B, and OEM supplier C. Each component holds Private, Shared, and Public data and runs its own AC Processor and Semantic Query Processor.)
Principles of a Federation
• Each component controls access to its local data independently (local autonomy).
• A query is decomposed to multiple sub-queries, each sub-query is executed at one component.
• Results from sub-queries are combined by the federated query processor, which controls global access.
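The three federation principles can be sketched as follows, under simplifying assumptions: each hypothetical component keeps triples tagged with a visibility level and applies its own role-to-visibility policy (local autonomy); the federated processor sends each requested predicate only to the components that hold it (decomposition); and the answers are merged (combination). Component names follow the architecture figure, but the data, roles, and policies are invented, and real sub-queries would be SPARQL rather than single predicates.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """A federation member that enforces its own access policy (local autonomy)."""
    name: str
    triples: list   # (subject, predicate, object, visibility) tuples
    policy: dict    # role -> set of visibility levels this component releases

    def advertised_predicates(self):
        return {p for _, p, _, _ in self.triples}

    def execute(self, predicate, role):
        """Run one sub-query locally and return only triples the role may see."""
        allowed = self.policy.get(role, set())
        return [(s, p, o) for s, p, o, vis in self.triples
                if p == predicate and vis in allowed]


def federated_query(components, predicates, role):
    """Split a query into per-component sub-queries and merge their answers."""
    # Source selection: send each predicate only to components that hold it.
    sub_queries = [(c, p) for p in predicates for c in components
                   if p in c.advertised_predicates()]
    merged = set()
    for component, predicate in sub_queries:
        merged.update(component.execute(predicate, role))
    return sorted(merged)


partner_a = Component("OEM partner A", [
    ("ex:matA1", "ex:tensileStrength", "A1-value", "public"),
    ("ex:matA2", "ex:tensileStrength", "A2-value", "shared"),
    ("ex:matA1", "ex:processRecipe", "A1-recipe", "private"),
], policy={"public": {"public"}, "collaborator": {"public", "shared"}})

partner_b = Component("OEM partner B", [
    ("ex:matB1", "ex:tensileStrength", "B1-value", "shared"),
    ("ex:matB2", "ex:tensileStrength", "B2-value", "public"),
], policy={"public": {"public"}, "collaborator": {"public"}})

query = ["ex:tensileStrength", "ex:processRecipe"]
print(federated_query([partner_a, partner_b], query, "collaborator"))
```

Because each component decides locally, partner B withholds its "shared" triple from collaborators even though partner A releases its own.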
Can we choose any part of our Semantic Web data to share with the public community or with selected collaborators?
Different levels of granularity:
– Individual resources, e.g., a material product or a manufacturing process
– Individual triples, e.g., properties of a product or process
– Entire datasets
Enable flexible selection of any piece of data to be shared at any time.
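Sharing at different granularities can be sketched as grants attached to a dataset, a resource, or an individual triple, with the most specific grant taking precedence. The precedence rule, the `private` default, and all identifiers are assumptions for illustration; because the grants are plain lookups, changing what is shared is just an update to a table.

```python
# Hypothetical grants at three granularities; the most specific grant wins.
DATASET_GRANTS = {"ds:products": "private"}
RESOURCE_GRANTS = {"ex:productP": "public"}
TRIPLE_GRANTS = {
    ("ex:productP", "ex:processRecipe", "recipe-P"): "private",
    ("ex:productQ", "ex:density", "density-Q"): "shared",
}
AUDIENCES = {"public": {"public"}, "collaborator": {"public", "shared"}}

DATASET = ("ds:products", [
    ("ex:productP", "ex:density", "density-P"),
    ("ex:productP", "ex:processRecipe", "recipe-P"),
    ("ex:productQ", "ex:density", "density-Q"),
    ("ex:productQ", "ex:processRecipe", "recipe-Q"),
])


def visibility(dataset_id, triple):
    """Resolve a triple's visibility: triple grant > resource grant > dataset grant."""
    if triple in TRIPLE_GRANTS:
        return TRIPLE_GRANTS[triple]
    if triple[0] in RESOURCE_GRANTS:
        return RESOURCE_GRANTS[triple[0]]
    return DATASET_GRANTS.get(dataset_id, "private")  # default: keep private


def shared_view(dataset, audience):
    """Return the triples of a dataset that a given audience may see."""
    dataset_id, triples = dataset
    allowed = AUDIENCES.get(audience, set())
    return [t for t in triples if visibility(dataset_id, t) in allowed]


print(shared_view(DATASET, "public"))
print(shared_view(DATASET, "collaborator"))
```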
AC Processes
(Figure: at Local Component A, Manager Y of component A is involved in creating resources and granting permissions, and permissions are also inferred. At the Federal Endpoint, a query from User X of either the Public group or Collaborators goes through 1. Query Rewriting and 2. AC-embedded Query Execution.)
Various Policies
• Role-based Access Control (RBAC)
• Mandatory Access Control (MAC)
• Attribute-based Access Control (ABAC)
• Discretionary Access Control (DAC)
1. Which policy? It depends on the organization’s needs!
2. Our AC mechanism can be extended to support any of these policies.
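As one example of the policies listed, here is a hedged RBAC sketch: permissions are inferred along a hypothetical role hierarchy, and a query is rewritten by appending an access-control filter before it is executed. The `ac:visibility` property, the role names, and the textual rewrite (which assumes a single `WHERE { ... }` group that binds `?s`, with prefix declarations omitted) are all hypothetical; this is not the authors' AC mechanism.

```python
# Hypothetical RBAC configuration: role -> directly granted visibility levels,
# plus a role hierarchy (a senior role inherits its junior roles' permissions).
GRANTS = {
    "public": {"public"},
    "collaborator": {"shared"},
    "manager": {"private"},
}
INHERITS = {
    "collaborator": ["public"],
    "manager": ["collaborator"],
}


def permitted_levels(role):
    """Infer every visibility level a role may read by following the hierarchy."""
    levels, stack, seen = set(), [role], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        levels |= GRANTS.get(current, set())
        stack.extend(INHERITS.get(current, []))
    return levels


def rewrite_query(sparql, role):
    """Insert an access-control constraint before the final closing brace."""
    levels = ", ".join(f'"{lvl}"' for lvl in sorted(permitted_levels(role)))
    constraint = f"  ?s ac:visibility ?vis . FILTER(?vis IN ({levels}))\n"
    close = sparql.rfind("}")
    return sparql[:close] + constraint + sparql[close:]


QUERY = "SELECT ?s ?o WHERE {\n  ?s ex:density ?o .\n}"
print(rewrite_query(QUERY, "collaborator"))
```

An unknown role infers no levels, so its rewritten filter is `IN ()`, which matches nothing; denying by default keeps the rewrite safe.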
Advanced capability: semantic browsing
• Example: Scooner, http://wiki.knoesis.org/index.php/Scooner
• Demo: http://knoesis.wright.edu/library/demos/scooner-demo/
Take Away
Use of semantic web technologies can help overcome the challenges of data discovery, integration, and interoperability caused by data heterogeneity.
Use of provenance and access control information helps share and exchange data reliably.
Thank you, and please visit us at
http://knoesis.org/
http://wiki.knoesis.org/index.php/MaterialWays
Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing
Wright State University, Dayton, Ohio, USA
Special Thanks (MaterialWays team): Clare Paul (AFRL), Kalpa Gunaratna, Vinh Nguyen, Sarasi Lalithsena, Swapnil Soni, Nitisha Jayakumar, Siva Cheekula.