Post on 29-Jun-2020


Page 1: Information Artifact Ontology: General Background · stids.c4i.gmu.edu/papers/STIDSPresentations/STIDS...

Information Artifact Ontology: General Background

Barry Smith

Slides

http://ncorwiki.buffalo.edu/index.php/STIDS_2013

Barry Smith – who am I?

Director: National Center for Ontological Research (Buffalo)
Founder: Ontology for the Intelligence Community (OIC, now STIDS) conference series

Ontology work for

•  NextGen (Next Generation) Air Transportation System
•  National Nuclear Security Administration, DoE
•  Joint-Forces Command Joint Warfighting Center
•  Army Net-Centric Data Strategy Center of Excellence
•  Army Intelligence and Information Warfare Directorate (I2WD)

and for many national and international biomedical research and healthcare agencies

I2WD Ontology Team

Ron Rudnicki, CUBRC, University at Buffalo

Dr. Tatiana Malyuta, NY City College of Technology of CUNY, Data Tactics Corp.

David Salmen, Data Tactics Corp.

LCOL Dr. William Mandrick, Data Tactics Corp.

In the olden days

people measured lengths using inches, ulnas, perches, king’s feet, Swiss feet, leagues of Paris, etc.

On June 22, 1799, in Paris, everything changed

International System of Units (SI)

Making data (re-)usable through standard terminologies

•  Standards provide
   – common structure and terminology
   – single data source for review (less redundant data)
•  Standards allow
   – use of common tools and techniques
   – common training
   – single validation of data

One successful part of the solution to this problem = Ontologies

controlled vocabularies (nomenclatures) plus definitions of terms in a logical language

Standardized (logically defined) terms in an ontology are the equivalent of standardized units in the SI

Ontologies

•  are computer-tractable representations of types in specific areas of reality
•  are more and less general (upper and lower ontologies)
   – upper = organizing ontologies
   – lower = domain ontology modules

Linked Open Data are not enough

Links are inconsistently defined; ontologies are full of redundancies

Towards coordination of modular non-redundant ontologies

The Open Biomedical Ontologies (OBO) Foundry

Columns (RELATION TO TIME): CONTINUANT (INDEPENDENT, DEPENDENT) and OCCURRENT
Rows (GRANULARITY):

ORGAN AND ORGANISM
   Independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
   Dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
   Occurrent: Biological Process (GO)

CELL AND CELLULAR COMPONENT
   Independent continuant: Cell (CL); Cellular Component (FMA, GO)
   Dependent continuant: Cellular Function (GO)

MOLECULE
   Independent continuant: Molecule (ChEBI, SO, RnaO, PrO)
   Dependent continuant: Molecular Function (GO)
   Occurrent: Molecular Process (GO)

Population-level ontologies

Columns (RELATION TO TIME): CONTINUANT (INDEPENDENT, DEPENDENT) and OCCURRENT
Rows (GRANULARITY):

COMPLEX OF ORGANISMS
   Independent continuant: Family, Community, Deme, Population (PCO)
   Dependent continuant: Population Phenotype
   Occurrent: Population Process

ORGAN AND ORGANISM
   Independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
   Dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
   Occurrent: Biological Process (GO)

CELL AND CELLULAR COMPONENT
   Independent continuant: Cell (CL); Cellular Component (FMA, GO)
   Dependent continuant: Cellular Function (GO)

MOLECULE
   Independent continuant: Molecule (ChEBI, SO, RnaO, PrO)
   Dependent continuant: Molecular Function (GO)
   Occurrent: Molecular Process (GO)

Environment Ontology (EnvO)

Columns (RELATION TO TIME): CONTINUANT (INDEPENDENT, DEPENDENT) and OCCURRENT
Rows (GRANULARITY):

ORGAN AND ORGANISM
   Independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
   Dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
   Occurrent: Biological Process (GO)

CELL AND CELLULAR COMPONENT
   Independent continuant: Cell (CL); Cellular Component (FMA, GO)
   Dependent continuant: Cellular Function (GO)

MOLECULE
   Independent continuant: Molecule (ChEBI, SO, RnaO, PrO)
   Dependent continuant: Molecular Function (GO)
   Occurrent: Molecular Process (GO)

Added alongside the grid: Environments (Environment Ontology, EnvO)

OBO Foundry approach extended into other domains

NIF Standard: Neuroscience Information Framework
IDO Consortium: Infectious Disease Ontology
cROP: Common Reference Ontologies for Plants
MilPortal.org: Military Ontology
AIRS Ontology Suite: Intelligence Ontology Suite


slide from Margaret Storey

Horizontal Integration of Big Intelligence Data

The Role of Ontology in the Era of Big Data

T. Malyuta, Ph.D., New York City College of Technology, NY, NY
B. Smith, Ph.D., University at Buffalo, Buffalo, NY
R. Rudnicki, CUBRC, Buffalo, NY

http://ncorwiki.buffalo.edu/index.php/Main_Page#Documents

Big Data Problem

•  Wikipedia defines Big Data as “…a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools.”
•  Gartner defines Big Data with three ‘V’s:
   – Volume
   – Velocity (of production and analysis)
   – Variety
   – Recently a fourth ‘V’, Veracity, was added
•  This means that Big Data are beyond our control (as opposed to those complex and big systems with diverse and changing data where the complexity is known)

Big Data Solution – Agility

•  Dimensions of agility
   – Storage paradigms that accommodate massive volumes of heterogeneous data
   – Data processing paradigms that can deal with the massive volumes of heterogeneous data coming onstream
   – Dynamic data stores that can easily accommodate diverse and a priori unknown data types and semantics
   – Methods and tools that leverage dynamic and diverse content

Agile Data Management

•  New highly distributed computing (MapReduce) and data processing (Bigtable) paradigms and technologies based on them (hadoop.apache.org/, hbase.apache.org/) help in solving data management problems:
   – Store and process Volume
   – Keep up with Velocity
   – Represent Variety
•  These technologies are not meant (and never were meant) to provide data interpretation
   – In data systems we have been dealing with data the meaning of which we knew (usually via data applications)
   – These technologies do not help in solving the problem of data integration and interoperability of systems

Agile Utilization

•  Today, the main problem of Big Data is how to use it
   – Utilization of ‘Variety’: diverse and a priori unknown types and semantics
   – Ability of Big Data systems to interoperate
   – Ability to integrate Big Data
   – The last two problems are inherently difficult and cannot be properly addressed by the data technology itself
•  Traditional data utilization and integration approaches fail
•  Relying on legacy data models and mappings (linked open data) fails: it creates forking and mapping degradation
•  Agile utilization and integration paradigms are needed

The Problem of Horizontal Integration of Big Intelligence Data

•  HI =def. the ability to exploit multiple data sources as if they are one
•  Recognized issues for HI with existing approaches
   – Data silos
   – Lexicon/semantics silos
•  Requirement for HI of Big Intelligence Data: Agile Semantic Interoperability
   – A strategy for HI must be agile in the sense that it can be quickly extended to new zones of emerging data according to need
   – Ontology allows an incremental approach: a big bang for the buck from the very start (as we showed on the project described below)
   – Ontology can provide the needed agility

Agile Semantic Interoperability

•  A good solution has to be
   – Able to grow incrementally
   – Able to be developed in a distributed manner
   – Without losing consistency
   – Independent of particular implementations, and data producers and consumers
   – Applicable to data in an agile manner
•  We call our solution: ‘semantic enhancement’ (SE) of data

SE Types: Explication vs. Annotation

•  Explication of general terms used in source intelligence artifacts and in data models, terminologies and doctrinal publications which provide typologies of intelligence-related IAs, to semantically enhance data in a way that enables computational integration and reasoning

•  Annotation of the instance-level information captured by such IAs, to aid retrieval of information about specific persons, groups, events, documents, images, and so forth
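The difference between the two SE types can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the source names, field names, record ids and ontology terms below are hypothetical, and the slides do not prescribe any particular data format.

```python
from dataclasses import dataclass

# Explication works at the type level: a general term (here, a field) in a
# source data model is associated with an ontology term.
# All names below are hypothetical.
@dataclass(frozen=True)
class Explication:
    source: str         # data source whose model is being explicated
    field_name: str     # general term used in that source's data model
    ontology_term: str  # term from the shared ontology vocabulary

# Annotation works at the instance level: a particular value in a particular
# record is tagged with the ontology term for the entity it is about.
@dataclass(frozen=True)
class Annotation:
    source: str
    record_id: str
    value: str
    ontology_term: str

explications = [
    Explication("reports_db", "SUBJ_NAME", "PersonName"),
    Explication("email_store", "sender", "PersonName"),
]

annotations = [
    Annotation("reports_db", "r-17", "J. Doe", "Person"),
    Annotation("email_store", "m-203", "jdoe@example.org", "EmailAddress"),
]

def fields_for(term):
    """Return every (source, field) pair explicated by the given term."""
    return [(e.source, e.field_name) for e in explications
            if e.ontology_term == term]

def records_about(term):
    """Return the ids of records annotated with the given term."""
    return [a.record_id for a in annotations if a.ontology_term == term]
```

Explications let a query reach the same kind of field in every source; annotations let it reach the specific records about a given kind of entity.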

SE

•  SE is realized with the help of ontologies that are used to explicate data models and annotate data instances
   – Vocabulary of ontologies used for explications and annotations provides agile horizontal integration
   – Ontologies, by virtue of their nature and organization, provide semantic enhancement of data

Example source table:

PersonID   Name   Description
111        Java   Programming
222        SQL    Database

Ontology terms shown in the diagram: SQL, Java, C++; ProgrammingSkill; ComputerSkill; Skill; Education; Technical Education

The Meaning of ‘Enhancement’

•  Semantic enhancement/enrichment of data = arm’s length approach (no change to data)
   – through simple explication we associate an entire knowledge system with a database field
   – enables analytics to process data, e.g. about computer skills, “vertically” along the Skill hierarchy, as well as “horizontally” via relations between Skill and Education
   – and further… while data in the database does not change, its analysis can be richer and richer as our understanding of the reality changes
•  For this richness to be leveraged by different communities, persons, and applications it needs to have the properties mentioned above and be constructed in accordance with the principles of the SE
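A minimal sketch of "vertical" and "horizontal" processing over the Skill example, in Python. The is_a edges are an assumed reading of the flattened diagram on the SE slide, and the acquired_through relation between Skill and Education is purely hypothetical; the source rows are left unchanged, in keeping with the arm's length approach.

```python
# Source rows stay untouched; only the Name field is explicated.
rows = [
    {"PersonID": 111, "Name": "Java", "Description": "Programming"},
    {"PersonID": 222, "Name": "SQL", "Description": "Database"},
]

# Assumed is_a edges (child -> parent), read off the flattened diagram.
IS_A = {
    "Java": "ProgrammingSkill",
    "C++": "ProgrammingSkill",
    "SQL": "ProgrammingSkill",
    "ProgrammingSkill": "ComputerSkill",
    "ComputerSkill": "Skill",
    "TechnicalEducation": "Education",
}

# Hypothetical "horizontal" relation between the Skill and Education branches.
ACQUIRED_THROUGH = {"ComputerSkill": "TechnicalEducation"}

def ancestors(term):
    """Walk is_a links upward and return all more general terms, nearest first."""
    out = []
    while term in IS_A:
        term = IS_A[term]
        out.append(term)
    return out

def people_with(skill_type):
    """Vertical query: PersonIDs whose Name value falls under skill_type."""
    return [r["PersonID"] for r in rows
            if skill_type == r["Name"] or skill_type in ancestors(r["Name"])]

def related_education(skill):
    """Horizontal query: education types linked to the skill or its ancestors."""
    return [ACQUIRED_THROUGH[t] for t in [skill, *ancestors(skill)]
            if t in ACQUIRED_THROUGH]
```

A query for `people_with("ComputerSkill")` finds both rows even though neither row mentions that term. Adding new edges to the hierarchy later makes the analysis richer without touching the table.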

SE Principles

– Create a Shared Semantic Resource (SSR) of ontologies to be used for explication and annotation
– Establish an agile strategy for building ontologies within this SSR, and apply and extend these ontologies to explicate and annotate new source data as they come onstream
– Problem: Given the immense and growing variety of data sources, the development methodology must be applied by multiple different groups
– How to manage collaboration?

Achieving the Goal

•  Methodology of incremental distributed ontology development
•  A common ontology architecture incorporating a common, domain-neutral, upper-level ontology (BFO)
•  A shared governance and change management process
•  A simple, repeatable process for ontology development
•  An ontology registry
•  A process of intelligence data capture through explication of source data models

Main Methodological Points

•  Ontological realism
   – Based on Doctrine / Science
   – Involves SMEs in label selection and definition
   – Thoroughly tested in many projects
•  Arm’s-length process, with minimal disturbance to existing data and data semantics
•  Reference ontologies
   – capture generic content and are designed for aggressive reuse in multiple different types of context
   – Single reference ontology for each domain of interest
•  Application ontologies
   – are tied to specific local applications
   – An application ontology is created by combining local content with generic content taken from relevant reference ontologies
   – Still interoperable because based on a common set of reference ontologies

* Barry Smith and Werner Ceusters, “Ontological Realism as a Methodology for Coordinated Evolution of Scientific Ontologies”, Applied Ontology, 5 (2010), 139–188.

Arm’s-length Process

SE ontology labels

•  Focusing on the terms (labels, acronyms, codes) used in our source data
•  Where multiple distinct terms {t1, …, tn} are used in separate data sources with one and the same meaning, they are associated with a single preferred label drawn from a standard set of such labels
•  All the separate data items associated with the {t1, …, tn} are thereby linked together through the corresponding preferred labels
•  Preferred labels form the basis for the ontologies we build

Diagram labels: Heterogeneous Contents (ABC, KLM, XYZ)
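The preferred-label step can be sketched as a simple lookup, shown here in Python. The source terms, preferred labels and data values are hypothetical, and treating ABC, KLM and XYZ as source names is an assumption based on the diagram labels.

```python
from collections import defaultdict

# Hypothetical source-specific terms that share one meaning, each mapped to a
# single preferred label. None of these terms come from the slides.
PREFERRED_LABEL = {
    "POB": "place of birth",
    "BirthPlace": "place of birth",
    "born_in": "place of birth",
    "DOB": "date of birth",
    "BirthDate": "date of birth",
}

# (source, record id, source term, value); the sources themselves are not modified.
items = [
    ("ABC", "a1", "POB", "Buffalo"),
    ("KLM", "k9", "BirthPlace", "Paris"),
    ("XYZ", "x4", "DOB", "1799-06-22"),
    ("XYZ", "x5", "born_in", "Geneva"),
]

def link_by_preferred_label(data_items):
    """Group data items from all sources under their preferred label."""
    linked = defaultdict(list)
    for source, record_id, term, value in data_items:
        label = PREFERRED_LABEL.get(term)
        if label is not None:  # unmapped terms are simply not linked yet
            linked[label].append((source, record_id, value))
    return dict(linked)

linked = link_by_preferred_label(items)
```

Because the mapping lives outside the sources, a new source is integrated by adding entries to the mapping rather than by changing any existing data.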

Reference and Application Ontologies

Reference Ontology

vehicle =def. an object used for transporting people or goods
tractor =def. a vehicle that is used for towing
crane =def. a vehicle that is used for lifting and moving heavy objects
vehicle platform =def. means of providing mobility to a vehicle
wheeled platform =def. a vehicle platform that provides mobility through the use of wheels
tracked platform =def. a vehicle platform that provides mobility through the use of continuous tracks

Application Definitions

artillery vehicle =def. a vehicle designed for the transport of one or more artillery weapons
wheeled tractor =def. a tractor that has a wheeled platform
tracked tractor =def. a tractor that has a tracked platform
artillery tractor =def. an artillery vehicle that is a tractor
wheeled artillery tractor =def. an artillery tractor that has a wheeled platform

Illustration of Ontology Types (Toy Example)

Diagram nodes: Vehicle; Tractor; Wheeled Tractor; Artillery Tractor; Wheeled Artillery Tractor; Artillery Vehicle

Black – reference ontologies
Red – application ontologies
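The toy vehicle example can be encoded as genus-plus-differentia definitions, sketched below in Python. The tuple encoding and the function name are illustrative assumptions, not the authors' formalism; the split between reference and application terms follows the two columns of the definitions slide.

```python
# Each term maps to (genus terms, differentia text).
REFERENCE = {
    "vehicle": ((), "used for transporting people or goods"),
    "tractor": (("vehicle",), "used for towing"),
}

# Application terms are composed from reference terms and from each other.
APPLICATION = {
    "artillery vehicle": (("vehicle",), "designed to transport artillery weapons"),
    "wheeled tractor": (("tractor",), "has a wheeled platform"),
    "artillery tractor": (("artillery vehicle", "tractor"), ""),
    "wheeled artillery tractor": (("artillery tractor",), "has a wheeled platform"),
}

DEFS = {**REFERENCE, **APPLICATION}

def superclasses(term):
    """Return every term that subsumes `term`, following asserted genus links."""
    seen = set()
    stack = list(DEFS[term][0])
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(DEFS[parent][0])
    return seen
```

This sketch follows only the asserted genus links. A logical reasoner working from the full definitions could also infer that a wheeled artillery tractor is a wheeled tractor, which the simple walk above does not do.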

Role of Reference Ontologies

•  Normalized
   – Maintains a set of consistent ontologies
   – Eliminates redundancy
•  Modular
   – A set of plug-and-play ontology modules
   – Enables distributed consistent development
•  Surveyable

SE Architecture

•  The Upper Level Ontology (ULO) in the SE hierarchy must be maximally general (no overlap with domain ontologies)
•  The Mid-Level Ontologies (MLOs) introduce successively less general and more detailed representations of types which arise in successively narrower domains until we reach the Lowest Level Ontologies (LLOs)
•  The LLOs are maximally specific representations of the entities in a particular one-dimensional domain
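One way to picture the layering is as a check over a small module registry, sketched here in Python. Both rules it enforces are assumptions made for illustration: that a module imports only from modules at its own level or a more general one, and that no term is defined in two modules. The module names and terms are hypothetical, apart from BFO, which the slides name as the upper-level ontology.

```python
LEVEL = {"ULO": 0, "MLO": 1, "LLO": 2}  # smaller number = more general

# Hypothetical registry: module name -> (level, imported modules, terms defined).
MODULES = {
    "bfo": ("ULO", [], {"continuant", "occurrent"}),
    "info-artifact": ("MLO", ["bfo"], {"information artifact"}),
    "email": ("LLO", ["info-artifact"], {"email message"}),
}

def check_architecture(modules):
    """Return a list of violations of the two assumed rules."""
    problems = []
    owner = {}  # term -> first module that defined it
    for name, (level, imports, terms) in modules.items():
        for imp in imports:
            # Rule 1 (assumed): imports may not point at a less general level.
            if LEVEL[modules[imp][0]] > LEVEL[level]:
                problems.append(f"{name} imports less general module {imp}")
        for term in sorted(terms):
            # Rule 2 (assumed): each term is defined in exactly one module.
            if term in owner:
                problems.append(f"{term!r} defined in both {owner[term]} and {name}")
            else:
                owner[term] = name
    return problems
```

An empty result means the registry satisfies both assumed rules; each violation is reported as a readable string.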

Architecture Illustration

Challenges to HI

•  Too many lexicons
•  The scope of the domain: signal, sensor, image, … intelligence about … the whole world
•  Difficult to conduct governance and management of ontology development to ensure consistent evolution
•  Lack of expertise
•  Complexity of the ontology development and application process

Preventing Failure

•  The method we use offers solutions to some of the common reasons for failure
•  Lack of Consensus
   – Realism offers an objective standard for settling disputes over terminology. Ontology development becomes an empirical science instead of an exercise in the publication of dialects
   – Governance helps to resolve conflicts and achieve consensus
•  High Maintenance
   – Arm’s length implementation places no additional overhead onto applications
•  Parochialism
   – Architecture and methodology prevent development of vocabularies that apply only to a single perspective
•  Poor Quality
   – Experience prevents common mistakes in vocabularies that cause downstream problems with search and analytics

Preventing Failure (cont.)

•  Agile ontology development
   – Methodology and architecture
   – Growing SSR
•  Agile ontology application
   – Incremental
   – Semi-automated where possible
   – Even if not as fast as some want it to be
•  It is still faster than creating a physical store, which will be just another silo and will still need to be integrated with the rest of the data
•  Once a data collection is semantically enhanced, it is integrated, without any additional effort, with all data that have been or will be semantically enhanced

What is Next…

– IAO-Intel: An Information Artifact Ontology for the Intelligence Community (BS)
– A Survey of DSGS-A Ontology Work and Explicating and Annotating Processes (R. Rudnicki)
– Email Ontology: illustration of the methodology of ontology design and of the IAO-Intel (D. Salmen and W. Mandrick)

References

•  Barry Smith, Tatiana Malyuta, William S. Mandrick, Chia Fu, Kesny Parent, Milan Patel, “Horizontal Integration of Warfighter Intelligence Data: A Shared Semantic Resource for the Intelligence Community”, STIDS Conference, 2012.
•  Barry Smith, Tatiana Malyuta, David Salmen, William Mandrick, Kesny Parent, Shouvik Bardhan, Jamie Johnson, “Ontology for the Intelligence Analyst”, Crosstalk: The Journal of Defense Software Engineering, 2012.
•  David Salmen, Tatiana Malyuta, Alan Hansen, Shaun Cronen, Barry Smith, “Integration of Intelligence Data through Semantic Enhancement”, STIDS Conference, 2011.