EcoTerm IV NBII/EioNet Demo of Federated KOS Search Mike Frame Vienna, Austria April 2007


Page 1

EcoTerm IV
NBII/EioNet Demo of Federated KOS Search
Mike Frame
Vienna, Austria
April 2007

Page 2

Discussion Topics…

• Project Background
• NBII Thesaurus
• GEMET Thesaurus
• Prototype Client
• Sample Query Results
  • Including no, one, or both thesauri
• Overall Findings

Page 3

Biocomplexity Thesaurus
http://thesaurus.nbii.gov

Page 4

EIONET GEMET Thesaurus
http://www.eionet.europa.eu/gemet/webservices?langcode=en

Page 5

NBII/EIONET Thesaurus Web Service

• Background: collaboration through the Ecoinformatics TWG
• Primary goal: access to distributed, multilingual thesauri
• Results: a SKOS web service and client
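The slide names SKOS as the exchange format between the web service and the client. A minimal sketch of the client side, assuming the service returns SKOS concepts serialized as RDF/XML: it uses only Python's standard library to pull each concept's preferred label and its broader, narrower, and related links. The sample document and its example.org URIs are hypothetical placeholders, not actual NBII or GEMET responses, and `parse_concepts` is an illustrative name.

```python
import xml.etree.ElementTree as ET

# Namespace URIs of the W3C RDF and SKOS Core vocabularies.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}
RDF_ABOUT = "{%s}about" % NS["rdf"]
RDF_RESOURCE = "{%s}resource" % NS["rdf"]

# Hypothetical service response; the URIs are placeholders.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="http://example.org/concept/1">
    <skos:prefLabel xml:lang="en">endangered species</skos:prefLabel>
    <skos:broader rdf:resource="http://example.org/concept/2"/>
    <skos:narrower rdf:resource="http://example.org/concept/3"/>
    <skos:related rdf:resource="http://example.org/concept/4"/>
  </skos:Concept>
</rdf:RDF>"""


def parse_concepts(rdf_xml):
    """Return {concept URI: {"label": str, "broader": [...], ...}}."""
    root = ET.fromstring(rdf_xml)
    concepts = {}
    for concept in root.findall("skos:Concept", NS):
        record = {"label": None, "broader": [], "narrower": [], "related": []}
        label = concept.find("skos:prefLabel", NS)
        if label is not None:
            record["label"] = label.text
        # Collect the URI targets of each SKOS semantic relation.
        for rel in ("broader", "narrower", "related"):
            for link in concept.findall("skos:" + rel, NS):
                record[rel].append(link.get(RDF_RESOURCE))
        concepts[concept.get(RDF_ABOUT)] = record
    return concepts


if __name__ == "__main__":
    for uri, rec in parse_concepts(SAMPLE).items():
        print(uri, rec["label"], rec["broader"])
```

A real client would also need to resolve the linked URIs to labels, which this sketch leaves out.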

Page 6

Latest Client & Service Capabilities
• Access to both NBII and GEMET
• Single-language capability
• Results are provided by source
• All documentation is completed

http://thesaurus.nbii.gov

Page 7

Demo Client

Page 8

Initial Challenges Identified

Thesaurus scope, intent, purpose, and coverage differ:
• NBII = a sub-discipline of the environment
  • Endangered species
    • Broader terms: Species, Special status species, Taxa
• EIONET = the broad environment
  • Broader terms: environmental protection
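Because each thesaurus places "endangered species" under different broader terms, the client returns results by source rather than blending them. A minimal sketch of such a federated lookup, assuming in-memory dictionaries as stand-ins for the two web services; the dictionaries hold only the broader terms listed on this slide, and `lookup_broader` is a hypothetical name.

```python
# Hypothetical in-memory stand-ins for the NBII and GEMET services.
# The broader terms are the ones listed on this slide.
THESAURI = {
    "NBII": {
        "endangered species": ["Species", "Special status species", "Taxa"],
    },
    "GEMET": {
        "endangered species": ["environmental protection"],
    },
}


def lookup_broader(term, sources=THESAURI):
    """Query every source and keep the answers grouped by source."""
    key = term.strip().lower()
    results = {}
    for name, vocabulary in sources.items():
        # A source with no entry contributes an empty list instead of being
        # dropped, so the user can see which vocabulary lacked the term.
        results[name] = vocabulary.get(key, [])
    return results


for source, broader in lookup_broader("Endangered species").items():
    print(f"{source}: {', '.join(broader) or '(no match)'}")
```

Keeping the per-source grouping leaves the scope differences visible to the user instead of hiding them in a merged list.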

Page 9

Current State

Users
• Most aren’t aware of the underlying vocabulary

Vocabularies are often unique to an organization and are designed more for “categorization” than for retrieval

Goal
• Include all vocabularies and let the search engine handle the results

Page 10

Demonstration Search Retrieval

Created demonstration datasets:
• NBII Cataloged Resources
  • ~30,000 web sites, publications, images, maps, etc.
  • XML-structured data with controlled subject terms
• NBII FGDC Metadata
  • ~22,000 resources on research studies
  • 150-200 elements
  • Semi-structured, with no controlled vocabulary

Page 11

NBII Catalog Records

• Based on Dublin Core + 18 elements, of which 10 are mandatory
• In place since 2002
• Used by distributed content managers

Page 12

NBII Metadata Clearinghouse (CH)

Page 13

Process

Added thesaurus capabilities to the development search engine for:
• NBII Thesaurus
• EIONET GEMET Thesaurus
• Used broader (BT), related (RT), and narrower (NT) term relationships, with weighting

Performed sample queries within the test repositories for:
• No thesaurus
• GEMET-only aided searching
• NBII-only aided searching
• GEMET+NBII aided searching (X)
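The slide describes expanding queries through BT, RT, and NT relationships with weighting, and running four thesaurus configurations. A minimal sketch of that idea, assuming one fixed weight per relationship type and merging expansions from several thesauri by keeping the highest weight per term. The weights, the toy vocabularies, and `expand_query` are illustrative assumptions, not the study's values or code; the BT entries come from the Initial Challenges slide, while the GEMET NT entries (taken from the GEMET control file) are labeled NT here only for illustration.

```python
# Assumed weight per relationship type (illustrative, not from the study).
REL_WEIGHTS = {"BT": 0.5, "NT": 0.8, "RT": 0.6}

# Toy vocabularies: term -> list of (relationship, related term).
NBII = {
    "endangered species": [
        ("BT", "species"),
        ("BT", "special status species"),
        ("BT", "taxa"),
    ],
}
GEMET = {
    "endangered species": [
        ("BT", "environmental protection"),
        ("NT", "endangered animal species"),  # relationship type assumed
        ("NT", "endangered plant species"),   # relationship type assumed
    ],
}


def expand_query(term, thesauri):
    """Return {term: weight}; the query term itself always gets 1.0."""
    key = term.lower()
    weighted = {key: 1.0}
    for vocabulary in thesauri:
        for rel, related in vocabulary.get(key, []):
            weight = REL_WEIGHTS[rel]
            # If several thesauri suggest the same term, keep the max weight.
            weighted[related] = max(weighted.get(related, 0.0), weight)
    return weighted


# The four configurations listed on the slide.
CONFIGS = {
    "No thesaurus": [],
    "GEMET only": [GEMET],
    "NBII only": [NBII],
    "GEMET+NBII": [GEMET, NBII],
}

for name, thesauri in CONFIGS.items():
    print(name, expand_query("endangered species", thesauri))
```

The weighted terms would then be passed to the search engine as boosted query clauses; how the actual engine consumed them is not described in the slides.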

Page 14

Test Repository 1

NBII Resource Catalog (Dublin Core)

Page 15

No Thesauri – “invasive species”

Page 16

NBII Thesaurus – “invasive species”

Page 17

GEMET Thesaurus – “invasive species”

Page 18

No Thesauri – “Endangered Species”

Page 19

NBII Thesaurus – “endangered species”

Page 20

GEMET Only – “endangered species”

Page 21

No Thesaurus – “rare species”

Page 22

NBII Thesaurus – “rare species”

Page 23

GEMET Thesaurus – “rare species”

Page 24

GEMET Thesaurus – “rare species” (expanded degrees of relevance)

Page 25

No Thesauri – “protected species”

Page 26

NBII Thesaurus – “protected species”

Page 27

GEMET Thesaurus – “protected species”

Page 28

Results – NBII Catalog Resources
Number of results returned, by thesaurus used for query expansion

Term                          None    NBII   GEMET
“invasive species”            2487   10802    2487
“endangered species”          1612    3532    1619
“rare species”                 249    7186     290
“rare species” (expanded)        –       –    5847
“protected species”            203    2345    1664
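To read the table as relative gains, divide each thesaurus-aided count by the no-thesaurus count for the same term. A short sketch of that arithmetic using the counts above; the "rare species" (expanded) row is left out because it has only a GEMET value, and `gains` is an illustrative helper name.

```python
# (None, NBII, GEMET) result counts from the NBII Catalog Resources table.
CATALOG = {
    "invasive species": (2487, 10802, 2487),
    "endangered species": (1612, 3532, 1619),
    "rare species": (249, 7186, 290),
    "protected species": (203, 2345, 1664),
}


def gains(counts):
    """Ratio of thesaurus-aided results to no-thesaurus results, per term."""
    out = {}
    for term, (none, nbii, gemet) in counts.items():
        out[term] = (round(nbii / none, 2), round(gemet / none, 2))
    return out


for term, (nbii_x, gemet_x) in gains(CATALOG).items():
    print(f"{term:20s} NBII x{nbii_x:<6} GEMET x{gemet_x}")
```

On these counts, NBII expansion returns about 28.86 times as many hits for “rare species”, while GEMET returns about 1.16 times as many; for “invasive species” GEMET adds nothing (ratio 1.0).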

Page 29

Results – NBII Resource Catalog

[Bar chart: number of results for “invasive species”, “endangered species”, “rare species”, and “protected species”, grouped by None, NBII, and GEMET; y-axis 0 to 12,000]

Page 30

Test Repository 2

NBII FGDC Metadata

Page 31

Sample Queries – No vocabularies
Metadata CH “invasive species”

Page 32

Sample Queries – NBII only
Metadata CH “invasive species”

Page 33

Sample Queries – GEMET only
Metadata CH “invasive species”

Page 34

Sample Queries – No vocabularies
Metadata CH “endangered species”

Page 35

Sample Queries – NBII only
Metadata CH “endangered species”

Page 36

Sample Queries – GEMET only
Metadata CH “endangered species”

Page 37

No Thesauri – Metadata CH “rare species”

Page 38

NBII Thesaurus – Metadata CH “rare species”

Page 39

GEMET Thesaurus – Metadata CH “rare species”

Page 40

Sample Queries – No vocabularies
Metadata CH “protected species”

Page 41

Sample Queries – NBII only
Metadata CH “protected species”

Page 42

Sample Queries – GEMET only
Metadata CH “protected species”

Page 43

Results – FGDC Metadata
Number of results returned, by thesaurus used for query expansion

Term                          None    NBII   GEMET
“invasive species”             302    7884     302
“endangered species”          1008    2690    1019
“rare species”                  59    4259      64
“protected species”             11    2152    1011

Page 44

Results – FGDC Metadata

[Bar chart: number of results for “invasive species”, “endangered species”, “rare species”, and “protected species”, grouped by None, NBII, and GEMET; y-axis 0 to 8,000]

Page 45

Overall Results: General Findings

The assumption that a thesaurus increases the “number” of results is valid
• The degree varies by term and by mapping

Since users search from many perspectives, backgrounds, and levels of expertise, multiple thesauri do increase the number of results
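The claim that thesauri increase the number of results, to a degree that varies by term and mapping, can be checked directly against the two results tables. A small sketch that flags every repository/term/thesaurus combination where expansion returned no more hits than the plain query; `no_gain` is an illustrative helper name, and the counts are copied from the tables.

```python
# (None, NBII, GEMET) result counts from the two results tables.
RESULTS = {
    "NBII Catalog": {
        "invasive species": (2487, 10802, 2487),
        "endangered species": (1612, 3532, 1619),
        "rare species": (249, 7186, 290),
        "protected species": (203, 2345, 1664),
    },
    "FGDC Metadata": {
        "invasive species": (302, 7884, 302),
        "endangered species": (1008, 2690, 1019),
        "rare species": (59, 4259, 64),
        "protected species": (11, 2152, 1011),
    },
}


def no_gain(results):
    """List (repository, term, thesaurus) where expansion added no hits."""
    flagged = []
    for repo, table in results.items():
        for term, (none, nbii, gemet) in table.items():
            for name, count in (("NBII", nbii), ("GEMET", gemet)):
                if count <= none:
                    flagged.append((repo, term, name))
    return flagged


print(no_gain(RESULTS))
```

With these counts, the only flagged cases are GEMET for “invasive species” in both repositories; every other combination returned more results than the plain query.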

Page 46

Overall Results: Using Only GEMET Terminology

GEMET terms not included in the NBII thesaurus improved search results

GEMET’s strength, its broad coverage, aided searches

In general, for the Metadata repository:
• Results varied somewhat, but the top 10 results were often the same
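“Often the same top 10 results” can be quantified as the overlap between two top-k result lists. A minimal sketch, assuming each search returns an ordered list of record identifiers; the identifiers below are made up for illustration and are not actual Metadata CH results, and `top_k_overlap` is a hypothetical name.

```python
def top_k_overlap(ranking_a, ranking_b, k=10):
    """Fraction of the top-k records shared by two ranked result lists."""
    top_a, top_b = set(ranking_a[:k]), set(ranking_b[:k])
    if not top_a and not top_b:
        return 1.0  # two empty result lists are trivially identical
    return len(top_a & top_b) / max(len(top_a), len(top_b))


# Hypothetical record IDs for one query without and with expansion.
plain = [f"rec{i}" for i in range(1, 13)]
expanded = ["rec2", "rec1", "rec3", "rec5", "rec4",
            "rec7", "rec6", "rec9", "rec8", "rec42", "rec10"]

print(top_k_overlap(plain, expanded))
```

Set overlap ignores order within the top k, so it captures "same top 10" but not how much the records were reshuffled.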

Page 47

Overall Results: General Findings

With “No thesaurus”, the tests produced poorer #1 (top-ranked) results

Thesaurus-aided searches reordered the results list more for the structured set (Catalog) than for the unstructured set (Metadata)
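One way to put a number on how much expansion reorders a result list is the average change in rank position of the records that both lists share, a simple displacement measure in the spirit of Spearman's footrule. This is an illustrative choice, not the measure used in the study; the record IDs and `mean_rank_shift` are hypothetical.

```python
def mean_rank_shift(ranking_a, ranking_b):
    """Average absolute change in position for records present in both lists.

    0.0 means every shared record kept its exact position; larger values
    mean more reordering. Records found in only one list are ignored.
    """
    position_b = {record: i for i, record in enumerate(ranking_b)}
    shifts = [abs(i - position_b[record])
              for i, record in enumerate(ranking_a)
              if record in position_b]
    return sum(shifts) / len(shifts) if shifts else 0.0


# Hypothetical record IDs: plain query vs. thesaurus-expanded query.
plain = ["r1", "r2", "r3", "r4"]
expanded = ["r3", "r1", "r2", "r4"]
print(mean_rank_shift(plain, expanded))
```

Comparing this value across the Catalog and Metadata repositories would make the "more differently" observation measurable.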

Page 48

Issues

“Integrating” thesauri with different scopes and purposes presents challenges:
• Can’t turn the effort into a thesaurus project
• Degrees of relevance between terms are an issue
• Concept matching, or concepts with different intent
• Differing classification (RT vs. NT) across thesauri
• Differing “weighting” algorithms

Page 49

Further Study Options

1.) Take multiple thesauri “as is”
2.) Do some “attempted” concept matching, e.g., “endangered animal species” – “endangered animal”
3.) If no match is present, add the term and relationship as is
4.) Obtain terms from XMDR
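Options 2 and 3 describe attempted concept matching, with unmatched terms added as is. A minimal sketch, assuming a deliberately naive rule: two labels match when every word of the shorter label occurs in the longer one, so “endangered animal” matches “endangered animal species”. The rule and the names `labels_match` and `merge_terms` are illustrative assumptions; the sketch carries over terms only, not their relationships.

```python
def tokens(label):
    """Lower-cased word set for a term label."""
    return set(label.lower().split())


def labels_match(a, b):
    """Naive match: every word of the shorter label occurs in the longer one."""
    ta, tb = tokens(a), tokens(b)
    shorter, longer = (ta, tb) if len(ta) <= len(tb) else (tb, ta)
    return bool(shorter) and shorter <= longer


def merge_terms(base, incoming):
    """Map each incoming term onto a matching base term, else add it as is."""
    merged = {term: [] for term in base}
    for term in incoming:
        match = next((b for b in base if labels_match(b, term)), None)
        if match is None:
            merged[term] = []           # option 3: no match, add unchanged
        else:
            merged[match].append(term)  # option 2: record as an equivalent
    return merged


print(merge_terms(["endangered animal species"],
                  ["endangered animal", "invasive species"]))
```

Under this rule a one-word label such as “species” would match almost any multi-word term, which shows why automatic matching of this kind still needs review.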

Page 50

Further Study Options – cont.

• Follow up with additional repositories
• Repeat with other query terms
• Re-examine weighting algorithms
• Run queries with a subset of terms
• Repeat with a completely integrated thesaurus, compared with repeating the queries using machine integration

Complete by June

Page 51

Questions, Comments,

Page 52

GEMET Control file

endangered species,category of endangered species[.2],endangered animal species[0.8],endangered plant species[0.8]

protected species,category of endangered species[0.2],endangered species [0.2]

rare species,category of endangered species[0.2],extinct species[0.2],vanished species[0.2]
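The control file appears to list, for each query term, its expansion terms with a weight in square brackets. A minimal parser sketch under that reading (the format interpretation is an assumption inferred from these three lines; `parse_control_file` is a hypothetical name). It accepts both “.2” and “0.2” and tolerates a space before the bracket, since both forms occur in the lines above.

```python
import re

# One expansion entry, e.g. "extinct species[0.2]" or "endangered species [0.2]".
ENTRY = re.compile(r"^\s*(?P<term>.+?)\s*\[(?P<weight>\d*\.?\d+)\]\s*$")

CONTROL_FILE = """\
endangered species,category of endangered species[.2],endangered animal species[0.8],endangered plant species[0.8]
protected species,category of endangered species[0.2],endangered species [0.2]
rare species,category of endangered species[0.2],extinct species[0.2],vanished species[0.2]
"""


def parse_control_file(text):
    """Return {query term: [(expansion term, weight), ...]}."""
    table = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        head, *entries = line.split(",")
        expansions = []
        for entry in entries:
            m = ENTRY.match(entry)
            if m is None:
                raise ValueError(f"unparseable entry: {entry!r}")
            expansions.append((m.group("term"), float(m.group("weight"))))
        table[head.strip()] = expansions
    return table


for term, expansions in parse_control_file(CONTROL_FILE).items():
    print(term, expansions)
```

The resulting dictionary has the same shape as a weighted query expansion: each query term maps to the terms to add and how strongly to weight them.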