Semantic Technologies for Big Sciences including Astrophysics

Semantic Technologies for Big Science and Astrophysics. Invited presentation: EarthCube Solar-Terrestrial End-User Workshop, NJIT, Newark NJ, August 13-15, 2014. Amit Sheth, T. K. Prasad. Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing

Upload: knoesis-center-wright-state-university

Post on 18-May-2015


Category:

Science



DESCRIPTION

Amit Sheth with TK Prasad, "Semantic Technologies for Big Science and Astrophysics", Invited Plenary Presentation, at EarthCube Solar-Terrestrial End-User Workshop, NJIT, Newark, NJ, August 13, 2014.

Like many other fields of Big Science, Astrophysics and Solar Physics deal with the challenges of Big Data, including Volume, Variety, Velocity, and Veracity. There is already significant work on handling volume-related challenges, including the use of high performance computing. In this talk, we will mainly focus on the other challenges from the perspective of collaborative sharing and reuse of a broad variety of data created by multiple stakeholders, large and small, along with tools that offer semantic variants of search, browsing, integration and discovery capabilities. We will borrow examples of tools and capabilities from state-of-the-art work in supporting physicists (including astrophysicists) [1], life sciences [2], and material sciences [3], and describe the role of semantics and semantic technologies that make these capabilities possible or easier to realize. This applied and practice-oriented talk will complement more vision-oriented counterparts [4].

[1] Science Web-based Interactive Semantic Environment: http://sciencewise.info/
[2] NCBO Bioportal: http://bioportal.bioontology.org/ , Kno.e.sis’s work on Semantic Web for Healthcare and Life Sciences: http://knoesis.org/amit/hcls
[3] MaterialWays (a Materials Genome Initiative related project): http://wiki.knoesis.org/index.php/MaterialWays
[4] From Big Data to Smart Data: http://wiki.knoesis.org/index.php/Smart_Data

TRANSCRIPT

Page 1: Semantic Technologies for Big Sciences including Astrophysics

Semantic Technologies for Big Science and Astrophysics

Invited presentation: EarthCube Solar-Terrestrial End-User Workshop

NJIT, Newark NJ, August 13-15, 2014

Amit Sheth, T. K. Prasad

Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing

Wright State University, Dayton, OH - 45435

Page 2: Semantic Technologies for Big Sciences including Astrophysics


Astrophysics

Lots of data

Complex

Heterogeneous

http://en.wikipedia.org/wiki/Astrophysics#mediaviewer/File:NGC_4414_%28NASA-med%29.jpg

Page 3: Semantic Technologies for Big Sciences including Astrophysics

Challenge

• How can we handle this vast, heterogeneous, and complex data space?
• Focus on complexity rather than raw processing: integration, collaboration, reuse

Can Semantic (Web) technologies ease the challenges and empower the scientists?

Page 4: Semantic Technologies for Big Sciences including Astrophysics


The Semantic Web vision: 1999-2001

• Sir Tim Berners-Lee, in his 1999 book “Weaving the Web”, emphasized the significance of metadata about Web documents.

• A well-known May 2001 article presented an agent- and AI-based vision for the “next generation of the World Wide Web”, with content amenable to automation.

• With Taalee (later Voquette, then Semagix), which I founded in 1999, I pursued a highly practical realization with semantic search, browsing, and analysis products. It had commercial applications starting in 2000 and a patent awarded in 2001.

Page 5: Semantic Technologies for Big Sciences including Astrophysics

[Diagram: parts 1, 2, 3 of the Semantic Web]

Page 6: Semantic Technologies for Big Sciences including Astrophysics

1

• Agreement and Knowledge: agreement about a common vocabulary/nomenclature, conceptual models, and domain knowledge (an ontology)
– Codified as Schema + Knowledge Base.
– Agreement is what enables interoperability.
– Formal machine-processable description is what leads to automation.
– Manual, semi-automated, automated creation of ontologies
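A minimal sketch of why agreement enables interoperability: two hypothetical groups record the same quantities under different local names, and an agreed vocabulary lets their records line up. The field names, terms, and values below are invented for illustration and do not come from any real ontology.

```python
# Hypothetical shared vocabulary: each agreed term lists the local names
# that different groups use for it (all names here are illustrative).
SHARED_VOCABULARY = {
    "right_ascension": {"ra", "RA_deg", "right_asc"},
    "declination": {"dec", "DEC_deg", "decl"},
}

def to_shared_terms(record):
    """Rename the keys of a record to agreed terms; keep unknown keys as-is."""
    local_to_shared = {
        local: term
        for term, local_names in SHARED_VOCABULARY.items()
        for local in local_names
    }
    return {local_to_shared.get(key, key): value for key, value in record.items()}

# Two groups describe one source with different column names.
group_a = {"ra": 185.7, "dec": 31.2}
group_b = {"RA_deg": 185.7, "DEC_deg": 31.2}

merged = [to_shared_terms(group_a), to_shared_terms(group_b)]
```

Once both records use the agreed terms they can be compared or merged directly, which is the interoperability the slide refers to.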

Page 7: Semantic Technologies for Big Sciences including Astrophysics

2

• Semantic Annotation (Metadata Extraction): Associating meaning with data, or labeling data so it is more meaningful to the system and people.

– Manual
– Semi-automatic (automatic with human verification)
– Automatic
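As a rough illustration of the semi-automatic mode, the sketch below proposes annotations by matching a small vocabulary against text and then keeps only those a reviewer accepts. The vocabulary, concept identifiers, and the `accept` callback are assumptions made for this example, not part of any tool named in the talk.

```python
import re

# Hypothetical controlled vocabulary: surface form -> concept identifier.
VOCABULARY = {
    "dark matter": "concept:DarkMatter",
    "solar flare": "concept:SolarFlare",
    "redshift": "concept:Redshift",
}

def annotate(text):
    """Return (start, end, concept) spans for vocabulary terms found in text."""
    annotations = []
    for surface, concept in VOCABULARY.items():
        pattern = r"\b" + re.escape(surface) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            annotations.append((match.start(), match.end(), concept))
    return sorted(annotations)

def review(annotations, accept):
    """Semi-automatic step: keep only annotations a human reviewer accepts."""
    return [a for a in annotations if accept(a)]

text = "The solar flare spectrum shows a redshift."
suggested = annotate(text)
# Stand-in for a human decision: here the reviewer rejects one suggestion.
verified = review(suggested, accept=lambda a: a[2] != "concept:Redshift")
```

Dropping the `review` step gives the fully automatic mode; having a person produce the spans by hand gives the manual one.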

Page 8: Semantic Technologies for Big Sciences including Astrophysics

3

• Reasoning/Computation, Applications:
– Semantics-enabled search, browsing
– Data integration, collaboration
– Visualization
– Analyses including pattern discovery, mining, hypothesis validation
– Answering complex queries, making connections (paths, subgraphs), supporting discovery
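One way to picture "making connections (paths, subgraphs)" is a search for a chain of statements that links two items. The sketch below runs a breadth-first search over a handful of made-up triples; the identifiers and predicates are hypothetical.

```python
from collections import deque

# Illustrative triples (subject, predicate, object); not from a real dataset.
TRIPLES = [
    ("paper:1", "mentions", "concept:SolarFlare"),
    ("concept:SolarFlare", "relatedTo", "concept:CoronalMassEjection"),
    ("dataset:7", "observes", "concept:CoronalMassEjection"),
]

def find_path(triples, start, goal):
    """Breadth-first search for a shortest chain of triples linking two nodes.

    Edges are followed in both directions, since a connection is of interest
    regardless of which way the relationship was stated.
    """
    neighbors = {}
    for s, p, o in triples:
        neighbors.setdefault(s, []).append((o, (s, p, o)))
        neighbors.setdefault(o, []).append((s, (s, p, o)))
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for nxt, triple in neighbors.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [triple]))
    return None

path = find_path(TRIPLES, "paper:1", "dataset:7")
```

Here the returned path links a paper to a dataset through two shared concepts, the kind of connection that can support discovery.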

Page 9: Semantic Technologies for Big Sciences including Astrophysics


How to integrate well? From Syntax to Semantics

Page 10: Semantic Technologies for Big Sciences including Astrophysics

Using Semantics to Climb Levels of Abstraction: an example

Level 0: Raw data [in TEXT], e.g., number: “150”
Level 1: Annotated data [in RDF], e.g., label: Systolic blood pressure of 150 mmHg
Level 2: Interpreted data (deductive) [in OWL], e.g., threshold: Elevated Blood Pressure
Level 3: Interpreted data (abductive) [in OWL], e.g., diagnosis: Hyperthyroidism

Less useful … more useful (from level 0 to level 3)

Figure labels: SSN Ontology, Intellego
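The four levels on this slide can be sketched as three small steps applied to the raw value "150". This is only an illustration: the 140 mmHg cut-off and the explanation table are assumed values chosen to reproduce the slide's example, not medical guidance and not the logic of the tools named in the figure.

```python
# Level 0 -> 1: attach a label and unit to a raw reading (annotation).
def annotate_reading(raw, quantity, unit):
    return {"quantity": quantity, "value": float(raw), "unit": unit}

# Level 1 -> 2: deductive step, applying a threshold rule.
# The 140 mmHg cut-off is an assumed value for illustration only.
ELEVATED_SYSTOLIC_MMHG = 140.0

def interpret(observation):
    if (observation["quantity"] == "systolic blood pressure"
            and observation["value"] > ELEVATED_SYSTOLIC_MMHG):
        return "ElevatedBloodPressure"
    return None

# Level 2 -> 3: abductive step, listing conditions that could explain
# the finding. This toy table is hypothetical, not medical knowledge.
POSSIBLE_EXPLANATIONS = {
    "ElevatedBloodPressure": ["Hyperthyroidism"],
}

def explain(finding):
    return POSSIBLE_EXPLANATIONS.get(finding, [])

observation = annotate_reading("150", "systolic blood pressure", "mmHg")
finding = interpret(observation)
candidates = explain(finding)
```

Each step adds meaning that the previous level lacked: the bare number cannot be interpreted until it is labeled, and the label cannot suggest an explanation until a rule has been applied.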

Page 11: Semantic Technologies for Big Sciences including Astrophysics


Semantic Web technologies – in practice

● Ontologies to capture domain knowledge (sometimes taxonomy/nomenclature is good enough)

● Languages to represent/capture domain knowledge and data - OWL, RDF/RDFS.

● Data sharing and publishing online (e.g., LOD).

● Annotation, semantic search, semantic browsing

● Provenance,…

Widely used in biomedicine; quite a few applications in healthcare, growing use and explorations in geosciences and more…

Page 12: Semantic Technologies for Big Sciences including Astrophysics


In this talk, I will review/borrow from

• ScienceWISE at EPFL which uses semantic technology to serve Physicists including Astrophysicists: shared vocabulary, annotation, browsing for related concepts

• Semantic (web) technologies for health care and life sciences encompassing collaborative research, prototypes, open source tools and ontologies, deployed applications, commercialization,…

• MaterialWays: Our project in Materials Genome Initiatives …

Page 13: Semantic Technologies for Big Sciences including Astrophysics

“Ontology” in physics domain – ScienceWISE

● ScienceWISE: Science Web-based Interactive Semantic Environment

● An interactive and crowdsourced tool to capture knowledge from scientists’ daily routine work.

● Core consists of a community-built ontology.

● Literature gets annotated and bookmarked using the ontology.

Page 14: Semantic Technologies for Big Sciences including Astrophysics


ontology

annotation

bookmarking & recommendations

http://sciencewise.info/

Page 15: Semantic Technologies for Big Sciences including Astrophysics

Value Proposition

Associating machine-processable semantics with scientific and engineering data and documents can help overcome challenges associated with data discovery, integration, and interoperability caused by data heterogeneity.

Page 16: Semantic Technologies for Big Sciences including Astrophysics

Benefits of using semantics for Astrophysicists (and other sciences)

• Challenges
– Massive volume
– Heterogeneity (i.e., from many sources, format/structure, text, images)
– Interoperability and sharing data
– Provenance and access control
• Need techniques beyond ScienceWISE
– Interested in data beyond scientific publications
– Data sharing (and credit/data citation for data sharing)
– Provenance and access control
– A framework to capture, search, and discover astrophysical data

Page 17: Semantic Technologies for Big Sciences including Astrophysics

Nature of Data and Documents

Relational/Tabular Data

XML document

Image

Technical Specs

Irregular Tables

Publications

Page 18: Semantic Technologies for Big Sciences including Astrophysics

Granularity of Semantics and Applications: Examples

• Synonyms
– Chemistry, Chemical Composition, Chemical Analysis, ...
– Bend Test, Bending, ...
– Delivery Condition, Process/Surface Finish, Temper, "as received by purchaser", ...
• Coreference vs broadening/narrowing
– Tubing vs welded tubing vs flash-welded part
• Capturing characteristic-value pairs
– Recognize and Normalize: “0.1 inch and under in nominal thickness” is translated to “Thickness <= 0.1 in”.
– Glean elided characteristic: controlled term “solution heat treated” implies the characteristic “heat treat type”.
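The synonym and characteristic-value examples above can be sketched with a lookup table and one regular expression. This is an assumed, deliberately narrow implementation: the synonym table, the pattern, and the choice to map "and over" to ">=" are illustrative and were not stated in the talk.

```python
import re

# Hypothetical synonym table mapping surface phrases to a preferred term.
SYNONYMS = {
    "chemical composition": "Chemistry",
    "chemical analysis": "Chemistry",
    "bending": "Bend Test",
}

def preferred_term(phrase):
    """Map a phrase to its preferred vocabulary term, if one is known."""
    return SYNONYMS.get(phrase.strip().lower(), phrase.strip())

# Matches phrases such as "0.1 inch and under in nominal thickness".
PATTERN = re.compile(
    r"(?P<value>\d+(?:\.\d+)?)\s*inch(?:es)?\s+and\s+"
    r"(?P<bound>under|over)\s+in\s+nominal\s+(?P<characteristic>\w+)",
    re.IGNORECASE,
)

def normalize(text):
    """Turn a recognized phrase into 'Characteristic <op> value in'."""
    match = PATTERN.search(text)
    if match is None:
        return None
    # Assumption: "under" means <= and "over" means >= (both inclusive).
    operator = "<=" if match.group("bound").lower() == "under" else ">="
    characteristic = match.group("characteristic").capitalize()
    return f"{characteristic} {operator} {match.group('value')} in"

result = normalize("0.1 inch and under in nominal thickness")
```

A production extractor would need many more patterns and units; the point is only that a normalized pair such as `Thickness <= 0.1 in` can be searched and compared, while the original phrase cannot.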

Page 19: Semantic Technologies for Big Sciences including Astrophysics


Granularity of Semantics and Associated Applications

• Lightweight semantics: File and document-level annotation to enable discovery and sharing

• Richer semantics: Data-level annotation and extraction for semantic search and summarization

• Fine-grained semantics: Data integration and interoperability.

Page 20: Semantic Technologies for Big Sciences including Astrophysics


Using Semantic Web Technologies

Machine-processable semantics achieved by addressing

• Syntactic Heterogeneity: Using XML syntax and the RDF data model (labelled graph structure)

• Semantic Heterogeneity:
– Using “common” controlled vocabularies, taxonomies, and ontologies
– Using federated data sources, exchanges, querying, and services
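To make the "labelled graph" idea concrete, the sketch below stores statements as plain (subject, predicate, object) tuples and uses a small taxonomy to answer a query across differently labeled items. It uses plain Python rather than an RDF library, and the class and instance names are hypothetical.

```python
# Triples form a labelled graph: nodes are subjects/objects, edges are predicates.
# The identifiers below are illustrative, not taken from a published vocabulary.
TRIPLES = {
    ("WeldedTubing", "subClassOf", "Tubing"),
    ("FlashWeldedTubing", "subClassOf", "WeldedTubing"),
    ("part42", "type", "FlashWeldedTubing"),
    ("part7", "type", "Tubing"),
}

def superclasses(cls, triples):
    """All classes reachable from cls via subClassOf, including cls itself."""
    found = {cls}
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == "subClassOf" and o not in found:
                found.add(o)
                frontier.append(o)
    return found

def instances_of(cls, triples):
    """Instances typed as cls or as any of its subclasses."""
    return sorted(
        s for s, p, o in triples
        if p == "type" and cls in superclasses(o, triples)
    )

tubing_parts = instances_of("Tubing", TRIPLES)
```

Because the taxonomy is shared, a search for the broader term also finds items that a source labeled with a narrower one.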

Page 21: Semantic Technologies for Big Sciences including Astrophysics


Ingredients for Semantics-based Cyber Infrastructure

• Use of community-ratified controlled vocabularies and lightweight ontologies (upper-level, hierarchies)

• Semi-automatic annotation of data and documents

• Support for provenance and access control

Page 22: Semantic Technologies for Big Sciences including Astrophysics


A proposed “light-weight semantics” approach

(for highly distributed community, low start up time, long tail science)…

Page 23: Semantic Technologies for Big Sciences including Astrophysics


Our applications in Materials Genome Initiative

MaterialWays (our project related to the Materials Genome Initiative): http://wiki.knoesis.org/index.php/MaterialWays

Page 24: Semantic Technologies for Big Sciences including Astrophysics

Matvocab home page

Search and discovery

Annotate documents

Visualize the knowledge base

Query vocabulary

View, edit, and add

Create process assertions

Page 25: Semantic Technologies for Big Sciences including Astrophysics


Search & Discovery

Page 26: Semantic Technologies for Big Sciences including Astrophysics


Annotate, search, and track provenance

• Vocabulary is used to annotate documents.

• Annotated documents can be indexed.

• Documents can be integrated reliably based on common terms of interest and provenance information.
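A small sketch of the three bullets above, under invented data: documents carry vocabulary terms and a source note, an inverted index is built over the terms, and documents sharing a term are returned together with their provenance. Document identifiers, terms, and sources are hypothetical.

```python
from collections import defaultdict

# Hypothetical annotated documents: vocabulary terms plus a provenance note.
DOCUMENTS = {
    "spec-001": {"terms": {"Bend Test", "Chemistry"}, "source": "supplier A"},
    "report-17": {"terms": {"Bend Test"}, "source": "lab B"},
    "memo-3": {"terms": {"Temper"}, "source": "lab B"},
}

def build_index(documents):
    """Inverted index: vocabulary term -> set of document identifiers."""
    index = defaultdict(set)
    for doc_id, info in documents.items():
        for term in info["terms"]:
            index[term].add(doc_id)
    return index

def related(index, documents, term):
    """Documents sharing a term, each paired with its recorded source."""
    return sorted((d, documents[d]["source"]) for d in index.get(term, ()))

index = build_index(DOCUMENTS)
bend_test_docs = related(index, DOCUMENTS, "Bend Test")
```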

Page 27: Semantic Technologies for Big Sciences including Astrophysics


Annotate documents using standard vocabulary

Page 28: Semantic Technologies for Big Sciences including Astrophysics

Create process assertions (OnCET)

• Add information about inputs to and outputs of a process as assertions in triple form using standard vocabulary.

• Add assertions about materials domain knowledge using vocabulary terms and relationship among them, e.g., about process control parameters and performance characteristics.
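Assertions "in triple form" can be pictured as (subject, predicate, object) tuples. The sketch below records the inputs, outputs, and control parameters of one process; the predicate names and the parameter value are hypothetical stand-ins, not the vocabulary or interface of OnCET.

```python
# Assertions in (subject, predicate, object) form. The predicate names are
# hypothetical stand-ins for terms from a standard vocabulary.
assertions = []

def assert_process(process, inputs, outputs, parameters=None):
    """Record the inputs, outputs, and control parameters of one process."""
    for item in inputs:
        assertions.append((process, "hasInput", item))
    for item in outputs:
        assertions.append((process, "hasOutput", item))
    for name, value in (parameters or {}).items():
        assertions.append((process, "hasControlParameter", f"{name}={value}"))

assert_process(
    "autoclave cure",
    inputs=["PMC lay-up"],
    outputs=["PMC"],
    parameters={"temperature_C": 180},  # invented value for illustration
)

def objects(subject, predicate):
    """Look up everything asserted about a subject under one predicate."""
    return [o for s, p, o in assertions if s == subject and p == predicate]
```

For example, `objects("autoclave cure", "hasInput")` returns the recorded inputs of that process.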


Page 29: Semantic Technologies for Big Sciences including Astrophysics

Provenance Metadata

• Explains the origin of an artifact, such as
– How was it created?
– Who created it?
– When was it created?
• Example: for a given material X
– Which processes are involved in making the material and what are the relevant performance properties?
– What are the inputs, control parameters and outputs of a process?
– Which research/engineering team performed an experiment?

Page 30: Semantic Technologies for Big Sciences including Astrophysics

Capturing provenance metadata - iExplore

generic PMC prepreg -> (subjected to) -> generic hand lay-up -> (yields) -> generic PMC lay-up -> (subjected to) -> generic autoclave cure -> (yields) -> generic PMC
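The node and edge labels on this slide suggest a chain of materials and processes. The sketch below encodes one plausible reading of that chain (material, subjected to, process, yields, material) and walks it backwards to recover how a material was made. The ordering is an assumption read off the figure labels, and the code is not the iExplore implementation.

```python
# Edges read off the slide's labels; their ordering is an assumed reading
# of the figure: material -subjected to-> process -yields-> material.
SUBJECTED_TO = {
    "generic PMC prepreg": "generic hand lay-up",
    "generic PMC lay-up": "generic autoclave cure",
}
YIELDS = {
    "generic hand lay-up": "generic PMC lay-up",
    "generic autoclave cure": "generic PMC",
}

def lineage(material):
    """Walk backwards from a material to the processes and inputs behind it."""
    produced_by = {output: process for process, output in YIELDS.items()}
    consumed_by = {process: source for source, process in SUBJECTED_TO.items()}
    steps = []
    while material in produced_by:
        process = produced_by[material]
        source = consumed_by[process]
        steps.append((source, process, material))
        material = source
    return list(reversed(steps))

history = lineage("generic PMC")
```

Each tuple in `history` is (input material, process, output material), listed from the earliest step to the latest.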

Page 31: Semantic Technologies for Big Sciences including Astrophysics

Vocabulary Provenance

ASM Handbook

MIL Handbook 5

MIL Handbook 17

Vocabulary terms

Vocabulary term expressed in RDF and published online (http://knoesis.org/matvocab/A-basis)

Wiki-based Crowd-sourcing Vocabulary

Page 32: Semantic Technologies for Big Sciences including Astrophysics


Capturing Vocabulary Provenance - iExplore

Definition

Rights

Source

Vocabulary term

Page 33: Semantic Technologies for Big Sciences including Astrophysics

Our proposal - Astrophysics

• Tagging, annotation, search
• Knowledgebase -> Ontology
• Provenance at every data level
• Data access control
• Capture process flows
• Capture relationships between concept instances
• Visualization of process flows

ScienceWISE - Physics

• Tagging, annotation, search
• Ontology -> Knowledgebase
• Provenance

Page 34: Semantic Technologies for Big Sciences including Astrophysics

Our approach to help in Astrophysics

• Provide access control and provenance details at every data level, to handle the huge amount of astrophysics data.

• Create relationships between concepts and visualize them in graph format.

• Add facts or assertions about each concept.

Page 35: Semantic Technologies for Big Sciences including Astrophysics


Data Access

Page 36: Semantic Technologies for Big Sciences including Astrophysics


Personal desktops

Lab notebooks

Databases

Single Access

Page 37: Semantic Technologies for Big Sciences including Astrophysics

Public-Private Data Sharing

• Enhance publicly available datasets while businesses keep their intellectual-property data private

Private data and metadata (e.g., ongoing experimental processes, intellectual property data)

Selectively shared data and metadata (e.g., with ongoing collaborators, licensed data)

Public data and metadata (e.g., released products, material specifications)

Page 38: Semantic Technologies for Big Sciences including Astrophysics

Federated Architecture

Federal Endpoint
1. User Authentication
2. Federated Semantic Query Processor
3. Semantics Mappings

OEM partner A: Private, Shared, and Public data; AC Processor; Semantic Query Processor

OEM partner B: Private, Shared, and Public data; AC Processor; Semantic Query Processor

OEM supplier C: Private, Shared, and Public data; AC Processor; Semantic Query Processor

Page 39: Semantic Technologies for Big Sciences including Astrophysics

Principles of a Federation

• Each component controls access to its local data independently (local autonomy).

• A query is decomposed into multiple sub-queries; each sub-query is executed at one component.

• Results from sub-queries are combined by the federated query processor, which controls global access.
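The three principles can be sketched as follows: each component filters its own data (local autonomy), a sub-query goes to each component, and a federated function combines the results. As a simplification, this sketch sends the same sub-query to every component rather than decomposing by what each one holds. Component names, visibility labels, and data are invented.

```python
# Each component holds its own triples and decides what a user may see.
# Component names, groups, and data are invented for illustration.
class Component:
    def __init__(self, name, triples):
        self.name = name
        self.triples = triples  # list of (subject, predicate, object, visibility)

    def query(self, predicate, user_groups):
        """Local autonomy: filter by this component's own access rule."""
        return [
            (s, p, o) for s, p, o, visibility in self.triples
            if p == predicate and visibility in user_groups
        ]

def federated_query(components, predicate, user_groups):
    """Send one sub-query per component and combine the results."""
    combined = []
    for component in components:
        for triple in component.query(predicate, user_groups):
            combined.append((component.name, triple))
    return combined

partner_a = Component("A", [
    ("alloy1", "hasTensileStrength", "900 MPa", "public"),
    ("alloy2", "hasTensileStrength", "1100 MPa", "private"),
])
partner_b = Component("B", [
    ("alloy3", "hasTensileStrength", "750 MPa", "shared"),
])

public_view = federated_query([partner_a, partner_b], "hasTensileStrength", {"public"})
collaborator_view = federated_query(
    [partner_a, partner_b], "hasTensileStrength", {"public", "shared"}
)
```

The private triple held by component A never leaves it, whichever group asks, because the filter runs inside the component rather than at the federated level.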

Page 40: Semantic Technologies for Big Sciences including Astrophysics

Can we choose any part of our Semantic Web data to share with the public community, or with selected collaborators?

Page 41: Semantic Technologies for Big Sciences including Astrophysics

Different levels of granularity

– Individual resources
  • Example: a material product, a manufacturing process
– Individual triples
  • Example: properties of a product, or process
– Entire datasets

Enable flexible selection of any data piece to be shared at any time
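A minimal sketch of sharing at the three granularities listed above, assuming grants are stored as simple sets: a triple is readable if its dataset, its subject resource, or the triple itself has been granted to the requesting group. All identifiers, groups, and values are hypothetical, and in this sketch a group sees only what is granted to it by name.

```python
# Grants at three levels of granularity. All identifiers are hypothetical.
DATASET_GRANTS = {("dataset:specs", "public")}
RESOURCE_GRANTS = {("product:X", "collaborators")}
TRIPLE_GRANTS = {(("process:P", "hasDuration", "2 h"), "collaborators")}

def can_read(group, dataset, triple):
    """A triple is readable if any level grants the group access."""
    subject = triple[0]
    return (
        (dataset, group) in DATASET_GRANTS
        or (subject, group) in RESOURCE_GRANTS
        or (triple, group) in TRIPLE_GRANTS
    )

DATA = {
    "dataset:specs": [("product:Y", "hasGrade", "A-basis")],
    "dataset:lab": [
        ("product:X", "hasDensity", "1.6 g/cm3"),
        ("process:P", "hasDuration", "2 h"),
        ("process:P", "hasOperator", "team 4"),
    ],
}

def visible(group):
    """All triples, across datasets, that the group is allowed to read."""
    return [
        t for dataset, triples in DATA.items()
        for t in triples if can_read(group, dataset, t)
    ]
```

Changing what is shared then amounts to adding or removing one entry in a grant set, without touching the data itself.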

Page 42: Semantic Technologies for Big Sciences including Astrophysics

Local Component A

Creating Resources

Granting Permissions

Inferring Permissions

AC Processes

Federal Endpoint

2. AC-embedded Query Execution

User X of either Public group or Collaborators

Manager Y of component A

1. Query Rewriting

Page 43: Semantic Technologies for Big Sciences including Astrophysics

Various Policies

• Role-based Access Control (RBAC)
• Mandatory Access Control (MAC)
• Attribute-based Access Control (ABAC)
• Discretionary Access Control (DAC)

1. Which policy? Depends on the organization’s needs!

2. Our AC mechanism can be extended to support any of these policies.

Page 44: Semantic Technologies for Big Sciences including Astrophysics

Advanced capability: semantic browsing

• Example of Scooner: http://wiki.knoesis.org/index.php/Scooner

• Demo: http://knoesis.wright.edu/library/demos/scooner-demo/

Page 45: Semantic Technologies for Big Sciences including Astrophysics

Take Away

Use of semantic web technologies can help overcome challenges associated with data discovery, integration, and interoperability caused by data heterogeneity.

Use of provenance and access control information helps share/exchange data reliably.

Page 46: Semantic Technologies for Big Sciences including Astrophysics

Thank you, and please visit us at

http://knoesis.org/

http://wiki.knoesis.org/index.php/MaterialWays

Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing

Wright State University, Dayton, Ohio, USA

Special Thanks (MaterialWays team): Clare Paul (AFRL), Kalpa Gunaratna, Vinh Nguyen, Sarasi Lalithsena, Swapnil Soni, Nitisha Jayakumar, Siva Cheekula.