Page 1: Design and Creation of Ontologies for Environmental Information Retrieval Vipul Kashyap AOS Workshop, Rome, November 2001 vipul_kashyap@yahoo.com

Design and Creation of Ontologies for Environmental Information Retrieval

Vipul Kashyap
AOS Workshop, Rome, November 2001

vipul_kashyap@yahoo.com

Knowledge Acquisition Workshop – 2

Outline

Ontologies for Information Retrieval: The InfoSleuth System

Sources for Ontology Construction

The Ontology Design Process:
– “Reverse Engineering” from a database schema
– Ontology refinement based on user queries

Enhancing the ontology:
– Using a data dictionary
– Using a thesaurus

Conclusions and Future Work


Ontologies for Information Retrieval: The InfoSleuth System

[Figure: InfoSleuth architecture diagram. Labels: Resource Agent (three instances), User Agent (three instances), KQML/KIF, agents, Domain Ontology.]


Ontologies for Information Retrieval

Provide a concise, uniform, declarative description of semantic information

Independent of the syntactic representations and conceptual models of the underlying information bases

Domain models provide wider access by supporting multiple world views on the same underlying data

EDEN ontology defined in the context of the InfoSleuth system:
– crucial for capturing the elements of environmental information


Sources for Ontology Construction

Pre-existing database schemas
– data-directed component

Collection of a representative set of queries, possibly parameterized based on the application user interface
– application-directed component

Thesauri and vocabularies (e.g., EEA Thesaurus)
– knowledge-directed component

Ontology = knowledge-based middle ground between applications and data!


The Ontology Design Process

[Figure: flowchart of the ontology design process. The phase and step labels are listed below in the order they appear in the extracted text.]

Phases:
– Ontology from Database Schema
– Ontology from Queries

Steps:
– Choose new database schema
– Abstract details from database schema
– Determine entities and attributes
– Group information, analyze foreign keys and dependencies
– Determine relationships
– Evaluate ontology
– Implement and test
– Drop entities and attributes
– Add new entities and attributes
– Add new subclasses and superclasses
– Choose new query
– No more queries


Reverse Engineering from a Database Schema

Abstraction of details related to:
– data organization
– local keys

Grouping information in multiple tables

Identifying relationships

Incorporating new concepts suggested by a new schema


Environmental Databases

CERCLIS 3
– http://www.epa.gov/enviro/html/cerclis/cerclis_overview.html

ITT

HAZDAT
– http://www.atsdr.cdc.gov/hazdat.html

ERPIMS
– http://www.resdyn.com/erpims

Basel Convention Database
– http://www.unep.ch/basel


Abstracting out details related to local keys

Database Schema

Site
– site_id (PK)
– site_name
– site_ifms_ssid_code
– site_rcra_id
– site_epa_id

Site_Characteristic
– site_id (PK, FK to Site)
– rsic_code (PK, FK to Ref_Sic)
– sc_date

Ontology

Site
– Id
– date
– name
– code
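The abstraction step above can be sketched in Python. This is only an illustrative sketch, not the InfoSleuth or EDEN implementation: the names LOCAL_KEYS, PREFIXES, strip_prefix, and abstract_columns are hypothetical, and the choice of which columns count as local keys is assumed to be made by the ontology designer.

```python
# Hypothetical sketch: all helper names here are illustrative assumptions.

# Columns the designer flags as system-specific identifiers to abstract away.
LOCAL_KEYS = {"site_ifms_ssid_code", "site_rcra_id", "site_epa_id"}

# Table-specific column prefixes to strip when naming ontology attributes.
PREFIXES = ("site_", "sc_", "rsic_")


def strip_prefix(column: str) -> str:
    """Return the column name without a known table prefix."""
    for prefix in PREFIXES:
        if column.startswith(prefix):
            return column[len(prefix):]
    return column


def abstract_columns(tables: dict) -> list:
    """Collect ontology attribute names, dropping local keys and duplicates."""
    attributes = []
    for columns in tables.values():
        for column in columns:
            if column in LOCAL_KEYS:
                continue  # local key: not carried into the ontology
            name = strip_prefix(column)
            if name not in attributes:
                attributes.append(name)
    return attributes


schema = {
    "Site": ["site_id", "site_name", "site_ifms_ssid_code",
             "site_rcra_id", "site_epa_id"],
    "Site_Characteristic": ["site_id", "rsic_code", "sc_date"],
}
site_attributes = abstract_columns(schema)
print(site_attributes)
```

The remaining attribute names correspond to the Id, name, code, and date attributes shown for Site in the ontology.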


Grouping Information in Multiple Tables

Database Schema

Site
– site_id (PK)
– site_name
– site_ifms_ssid_code
– site_rcra_id
– site_epa_id

Site_Characteristic
– site_id (PK, FK to Site)
– rsic_code (PK, FK to Ref_Sic)
– sc_date

Ref_Sic
– rsic_code (PK)
– rsic_code_desc

Site_Alias
– site_id (PK, FK to Site)
– site_alias_id (PK)
– sa_name

Ontology

Site
– date
– name
– code
– alias_name
– description
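One possible way to automate the grouping shown above is sketched below. The grouping rule is an assumption made for illustration: a table is folded into the entity when one of its primary-key columns is a foreign key to the entity's table, and any reference table it points to is folded in as well. The SCHEMA structure, the group_into function, and the column-to-attribute renamings are hypothetical; the renamings are taken to be the designer's choice.

```python
# Hypothetical sketch: SCHEMA layout and group_into are illustrative only.
# "columns" maps a schema column to the ontology attribute the designer chose.
SCHEMA = {
    "Site": {
        "pk": ["site_id"],
        "fk": {},
        "columns": {"site_name": "name"},
    },
    "Site_Characteristic": {
        "pk": ["site_id", "rsic_code"],
        "fk": {"site_id": "Site", "rsic_code": "Ref_Sic"},
        "columns": {"sc_date": "date", "rsic_code": "code"},
    },
    "Ref_Sic": {
        "pk": ["rsic_code"],
        "fk": {},
        "columns": {"rsic_code_desc": "description"},
    },
    "Site_Alias": {
        "pk": ["site_id", "site_alias_id"],
        "fk": {"site_id": "Site"},
        "columns": {"sa_name": "alias_name"},
    },
}


def group_into(entity_table: str):
    """Return the tables grouped under one entity and the merged attributes."""
    grouped = [entity_table]
    for name, table in SCHEMA.items():
        # Dependent table: a primary-key column is a foreign key to the entity.
        if any(table["fk"].get(col) == entity_table for col in table["pk"]):
            if name not in grouped:
                grouped.append(name)
            # Also pull in reference tables the dependent table points to.
            for target in table["fk"].values():
                if target not in grouped:
                    grouped.append(target)
    attributes = []
    for name in grouped:
        for attribute in SCHEMA[name]["columns"].values():
            if attribute not in attributes:
                attributes.append(attribute)
    return grouped, attributes


grouped_tables, site_attributes = group_into("Site")
print(grouped_tables)
print(site_attributes)
```

In a larger schema this rule would likely group too much or too little, so the designer would still review the result.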


Identifying Relationships

Database Schema

Site
– site_id (PK)
– site_name
– site_ifms_ssid_code
– site_rcra_id
– site_epa_id

Action
– site_id (PK, FK to Site)
– rat_code (PK, FK to ref_action_type)
– act_code_id (PK)

Ref_action_type
– rat_code (PK)
– rat_name
– rat_def

Waste_Src_Media_Contaminated
– wsmrc_nmbr (PK)
– site_id (PK, FK to Action)
– rat_code (FK to Action)
– act_code_id (FK to Action)

Remedial_Response
– site_id
– act_code_id
– rat_code

Ontology (diagram labels)

– Site
– Contaminant
– RemedialResponse
– PerformedAt
– actionName
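As a rough illustration of how foreign keys can suggest relationships, the sketch below lists one candidate relationship per pair of tables linked by a foreign key. The FOREIGN_KEYS dictionary and candidate_relationships function are hypothetical names. The sketch only proposes candidates; naming them (for example PerformedAt) and deciding which ones enter the ontology is assumed to remain a design decision.

```python
# Hypothetical sketch: foreign keys copied from the schema on this slide.
FOREIGN_KEYS = {
    "Action": {"site_id": "Site", "rat_code": "Ref_action_type"},
    "Waste_Src_Media_Contaminated": {
        "site_id": "Action",
        "rat_code": "Action",
        "act_code_id": "Action",
    },
}


def candidate_relationships(foreign_keys: dict) -> list:
    """Return (referencing table, referenced table) pairs without duplicates."""
    pairs = []
    for table, links in foreign_keys.items():
        for target in links.values():
            if (table, target) not in pairs:
                pairs.append((table, target))
    return pairs


candidates = candidate_relationships(FOREIGN_KEYS)
for source, target in candidates:
    print(f"{source} -> {target}")
```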


Incorporation of new concepts from a different database schema

Database Schema

Sends_Export
– uncode
– toxics_code
– toxics_description
– exporter
– importer

Receives_Export
– uncode
– toxics_code
– toxics_description
– exporter
– importer

Ontology (diagram labels)

– Site
– Contaminant
– RemedialResponse
– PerformedAt
– ExportsTo
– ImportsFrom
– Country


Ontology refinement based on user queries

Addition of new attributes
– At NPL sites with a land use category of INDUSTRIAL, what is the cleanup level range for LEAD ….
– Add an attribute landUseCategory to the entity Site in the ontology

Addition of new relationships
– What is the range of concentrations where ARSENIC is a contaminant of concern in the SURFACE SOIL at NPL sites?
– Add a relationship HasContaminant between the entities Site and Contaminant in the ontology

Addition of class-subclass relationships and new entities
– How many Superfund sites are in Edison County, New Jersey?
– Add an entity SuperFundSite as a subclass of Site in the ontology
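The three kinds of refinement above can be sketched as operations on a small in-memory ontology. The Entity and Ontology classes below are hypothetical and are not the representation used by InfoSleuth; only the names landUseCategory, HasContaminant, Site, Contaminant, and SuperFundSite come from the slide.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Entity:
    """A hypothetical ontology entity with attributes and one superclass."""
    name: str
    superclass: Optional[str] = None
    attributes: set = field(default_factory=set)


class Ontology:
    """Minimal illustrative container for entities and relationships."""

    def __init__(self) -> None:
        self.entities = {}          # entity name -> Entity
        self.relationships = set()  # (relationship, source, target) triples

    def add_entity(self, name: str, superclass: Optional[str] = None) -> Entity:
        if superclass is not None and superclass not in self.entities:
            raise KeyError(f"unknown superclass: {superclass}")
        entity = self.entities.setdefault(name, Entity(name))
        if superclass is not None:
            entity.superclass = superclass
        return entity

    def add_attribute(self, entity: str, attribute: str) -> None:
        self.entities[entity].attributes.add(attribute)

    def add_relationship(self, name: str, source: str, target: str) -> None:
        for end in (source, target):
            if end not in self.entities:
                raise KeyError(f"unknown entity: {end}")
        self.relationships.add((name, source, target))


# Start from entities obtained from the database schema.
eden = Ontology()
eden.add_entity("Site")
eden.add_entity("Contaminant")

# Apply the refinements suggested by the three example queries.
eden.add_attribute("Site", "landUseCategory")
eden.add_relationship("HasContaminant", "Site", "Contaminant")
eden.add_entity("SuperFundSite", superclass="Site")
```

Each query contributes one small, local change, which is why the queries can be processed one at a time until none remain.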


Using a data dictionary (EDR) to enhance the ontology

[Figure: diagram linking the state attribute of Site to a Map table. Labels: Site, state; StateName, StateCode, StateAbbr; Map with columns coding_scheme1, coding_scheme2, coding_scheme3.]

select * from Site where state = 'TX' or state = 'California'

select coding_scheme1 from Map where coding_scheme3 = 'TX'

{ "Texas", "California" } { "TX", "CA" }
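A minimal sketch of this kind of translation, using Python's sqlite3 module with an in-memory Map table, is shown below. It assumes that coding_scheme1 holds state names, coding_scheme2 holds state codes (filled here with FIPS codes purely for illustration), and coding_scheme3 holds abbreviations. The to_scheme helper is hypothetical.

```python
import sqlite3

# Hypothetical in-memory copy of the Map table from the data dictionary.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Map (coding_scheme1 TEXT, coding_scheme2 TEXT, coding_scheme3 TEXT)"
)
conn.executemany(
    "INSERT INTO Map VALUES (?, ?, ?)",
    [("Texas", "48", "TX"), ("California", "06", "CA")],
)

SCHEMES = {"coding_scheme1", "coding_scheme2", "coding_scheme3"}


def to_scheme(value: str, target: str) -> str:
    """Translate a state value given in any coding scheme into `target`."""
    if target not in SCHEMES:
        raise ValueError(f"unknown coding scheme: {target}")
    # The column name is checked against SCHEMES above; the value is bound.
    row = conn.execute(
        f"SELECT {target} FROM Map "
        "WHERE ? IN (coding_scheme1, coding_scheme2, coding_scheme3)",
        (value,),
    ).fetchone()
    if row is None:
        raise KeyError(value)
    return row[0]


# Constants from the mixed query: state = 'TX' or state = 'California'.
constants = ("TX", "California")
names = [to_scheme(v, "coding_scheme1") for v in constants]
abbreviations = [to_scheme(v, "coding_scheme3") for v in constants]
print(names, abbreviations)
```

With the constants normalized, the query can be rewritten in whichever coding scheme a given resource uses for its state column.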


Enhancing the Ontology by using a Thesaurus

Thesaurus entry:

abandoned site
THEME POLLUTION
BT land setup
NT disused military site

[Figure: class hierarchy in the ontology. Node labels: LandSetup, Site, AbandonedSite, DisusedMilitarySite, SuperfundSite.]
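The mapping from a thesaurus entry to class-subclass links can be sketched as follows. The sketch assumes the common thesaurus convention that BT marks a broader term and NT a narrower term, and it treats each as a subclass link. The camel_case and subclass_pairs functions are hypothetical, and the sketch does not decide where existing classes such as Site or SuperfundSite attach to the new ones.

```python
ENTRY = """\
abandoned site
THEME POLLUTION
BT land setup
NT disused military site
"""


def camel_case(term: str) -> str:
    """Turn a thesaurus term such as 'land setup' into 'LandSetup'."""
    return "".join(word.capitalize() for word in term.split())


def subclass_pairs(entry: str) -> list:
    """Return (subclass, superclass) pairs implied by BT and NT lines."""
    lines = [line.strip() for line in entry.splitlines() if line.strip()]
    term = camel_case(lines[0])
    pairs = []
    for line in lines[1:]:
        tag, _, value = line.partition(" ")
        if tag == "BT":    # broader term: the entry is its subclass
            pairs.append((term, camel_case(value)))
        elif tag == "NT":  # narrower term: it is a subclass of the entry
            pairs.append((camel_case(value), term))
        # Other tags, such as THEME, carry no hierarchy and are skipped.
    return pairs


pairs = subclass_pairs(ENTRY)
for subclass, superclass in pairs:
    print(f"{subclass} is a subclass of {superclass}")
```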


Conclusions and Future Work

Role of semantic content in handling data/information overload
– Domain-specific ontologies: an approach for capturing semantic content

Design and construction of domain ontologies
– a labor-intensive, time-consuming, difficult endeavor

Reuse readily available information: schemas, queries, data dictionaries, thesauri
– minimize the involvement of the domain expert

Extrapolate this technique to other domains:
– telecommunication
– IP networks (use of the CIM information model by DMTF)

Apply these techniques to Knowledge Management and Acquisition