© keith g jeffery & anne assersoncerif course: implementation 20021024 1 cerif course session5:...

95
© Keith G Jeffery & Anne Asserson CERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC [email protected] Anne Asserson, University of Bergen [email protected]

Upload: marianna-arnold

Post on 16-Jan-2016

218 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024 1

CERIF COURSESession5:

ImplementationKeith G Jeffery, Director, IT CLRC [email protected]

Anne Asserson, University of Bergen [email protected]

Page 2: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024 2

Structure

• Language handling• Classification schemes and enumerated

lists, dictionaries, thesauri • Domain ontologies• Versions for Relational DBMS• Mapping CERIF to RDBMS• OODBMS and IR systems• Keys• Surrogate keys in linking relations

Page 3: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024 3

Structure

• Language handling• Classification schemes and enumerated

lists, dictionaries, thesauri • Domain ontologies• Versions for Relational DBMS• Mapping CERIF to RDBMS• OODBMS and IR systems• Keys• Surrogate keys in linking relations

Page 4: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024 4

Language handling Character / Language

Variants• Character sets

– Not only ‘Latin-1’– Can use escape codes technique but

only works in linear data streams– Better to use a rich code that can

handle any character from any language (including mathematics, financial currencies) - Unicode

– But it requires more storage

Page 5: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024 5

Language handling Character / Language

Variants• Language• CERIF has many text fields• Each field may exist in multiple languages• For retrieval or update need to know the

language (for text-matching)• So have within the logical record multiple sub-

records differentiated by language for each text field

• Example: Project.Abstract will usually exist in (US) English and original language and maybe language of country/region where stored

Page 6: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024 6

Equip Name

Relation_254

Relation_253

Serv ice Description

Serv ice Name

Ev ent Description

Ev ent Name

Per Res Interest

Equip Desc

Exp Skill Name

Exp Skill Desc

Pers Class

Pers Facility

Pers Equip

Per Serv ice

Pers Pub

Per Product

Per Patent

Pers Con

Per Exp Skill

Ev ent Per

CV Per

Person

Per IdPer Family NamesPer First NamesPer Other NamesPer SexPer Academic TitlePer QualificationsPer NationalitiesPer Prize AwardPer URI

CVCV IdCV URI

EventEvent IdEvent TypeEvent LocationEvent Start DateEvent End DateEvent Fee or FreeEvent URL

ContactCon IdCon Addrline1Con Addrline2Con Addrline3Con AddrLine4Con Addrline5Con City TownCon Province StateCon Postal codeCon TelephoneCon FaxCon EmailCon URI

ServiceService IdService URI

Expertise Skills

Exp Skill Id

Particular Equipment

Equip IdEquip Ow Inv IdEquip OEM Id

Result Publication

Pub IdPub DatePub ReferencePub URIPub URI Type

Result PatentPatent IdPatent NumberPatent CountriesPatent Reg DatePatent Approval datePatent URI

General FacilityFacility IDFacility URL

ClassificationClassClass codeClass URL

Result ProductProd IdProd Int IdProd URI

Level 3 Person Entity

Expertise Skills Name

Exp Skill Name LanguageExp Skill Name Trans TypeExp Skill Name

Service NameService Name LanguageService Name Trans TypeService Name

Equipment DescriptionEquip Desc LanguageEquip Desc Trans TypeEquip Description

Expertise Skills DescriptionExp Skill Desc LanguageExp Skill desc Trans TypeExp Skill Description

Service Description

Service Dersc LanguageSERVICE_TRANS_TYPEService Description

Research Interest

Res Int LanguageRes Int Trans TypeRes Int Keywords

Event NameEvent LangaugeEvent Trans TypeEvent Name

Event Description

Event Desc LanguageEvent Desc Trans TypeEvent Description

Patent Title

Patent Title LanguagePatent Title Trans TypePatent Title

Patent Abstract

Patent Abs LanguagePatent Abs Trans TypePatent Abstract

Equipment Name

Equip Name LanguageEquip Name Trans TypeEquip Name

Page 7: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024 7

Language handling Example:

Person_Research_Interests

  Translation char(1) m, pk(part) o(rig), h(uman), m(achine)

Person-Research interest

Person Id char(32) m, pk(part)  

  Language char(2) m, pk(part)  

  Keywords char(1024)

   

Page 8: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024 8

Structure

• Language handling• Classification schemes and enumerated

lists, dictionaries, thesauri • Domain ontologies• Versions for Relational DBMS• Mapping CERIF to RDBMS• OODBMS and IR systems• Keys• Surrogate keys in linking relations

Page 9: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024 9

Classification schemes …… Enumerated Lists,

Dictionaries, Thesauri, Ontologies

• Purpose– Higher quality data: data validation– More accurate retrieval: query

keywords limited and stored words (for any attribute) limited

– Classification – allowing grouping and ranking by value of attribute

Page 10: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024

10

Classification schemes …… Enumerated List

• Example: Country Code• There is an ISO standard list of valid

2-character and 3-character country codes

• On input can validate country code is from this list (commonly with a pull-down)

• If changes in countries, update the list in one place and whole system reconfigured

Page 11: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024

11

Reg Con

Country Contact

Prod Ty pe

Facility Ty pe

Patent Ty pe

Patent Status

Pub Ty pe

Per Honor Title

Equip Ty pe

Per LanguagePer Qualif ication

Per Academ Title

CV Multimedia

Pers Class

Pers Facility

Pers Equip

Per Serv ice

Pers Pub

Per Product

Per Patent

Pers Con

Per Exp Skill

Ev ent Per

CV Per

Person

Per IdPer Family NamesPer First NamesPer Other NamesPer SexPer Academic TitlePer QualificationsPer NationalitiesPer Prize AwardPer URI

CV

CV IdCV URI

Event

Event IdEvent TypeEvent LocationEvent Start DateEvent End DateEvent Fee or FreeEvent URL

ContactCon IdCon Addrline1Con Addrline2Con Addrline3Con AddrLine4Con Addrline5Con City TownCon Province StateCon Postal codeCon EmailCon FaxCon TelephoneCon URI

ServiceService IdService URI

Expertise Skills

Exp Skill Id

Particular EquipmentEquip IdEquip Ow Inv IdEquip OEM Id

Result Publication

Pub IdPub DatePub ReferencePub URIPub URI Type

Result PatentPatent IdPatent NumberPatent CountriesPatent Reg DatePatent Approval datePatent URI

General Facility

Facility IDFacility URL

Classification

ClassClass codeClass URL

Result Product

Prod IdProd Int IdProd URI

Level 4 Person Entity

Honorofic TitleHonorific TitleHonorific Title Full

Academic Title

ACADEMIC_TITLEAcademic Title Full

Qualification

QualificationQualification Full

LanguageLang CodeLang Full NameLang English Name

Multimedia TypeMultimedia TypeMultimedia Type FullMultimedia Format

Equipment Type

Equip TypeEquip Type Full

Publication Type

Pub TypePub Type FullPatent Status

Patent StatusPatent Status Full

Patent TypePatent TypePatent Type Full

Facility Type

FACILITY_TYPEFacility Type Full

Product Type

PROD_TYPEProd Type Full

Country

Country CodeCountry NameCountry Name English

NUTS Region

Region codeRegion NameRegion Name English

Page 12: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024

12

Classification schemes …… Lookup Table

Examples: Person-relatedPerson-honorific title

Honorific Title char (4) m, pk(part) Sir, Lady…

Person-honorific title

Title full char(32) m, pk(part)  

         

Person-academic title

Academic Title char (4) m, pk(part) Dr, Prof…

Person-academic title

Title full char(32) m, pk(part)  

         

Person-qualification

Qualification char(4) m, pk(part)  

Person-qualification

Qualification full

char(32) m, pk(part) ISCED (UNESCO International Classification of Education, level of education)

Page 13: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024

13

CON_ID = CON_ID

LANG_CODE = LANG_CODE

PER_ID = PER_ID

CV_ID = CV_ID

MULTIMEDIA_TYPE = MULTIMEDIA_TYPE

EXP_SKILL_ID = EXP_SKILL_ID

PER_ID = PER_ID

PER_ID = PER_ID

CV_ID = CV_ID

PER_ID = PER_ID

PER_ID = PER_ID

QUALIFICATION = QUALIFICATION

PER_ID = PER_ID

ACADEMIC_TITLE = ACADEMIC_TITLE

PER_ID = PER_ID

HONORIFIC_TITLE = HONORIFIC_TITLE

PERSON

PER_ID CHAR(32)PER_FAMILY_NAMES VARCHAR2(32)PER_FIRST_NAMES VARCHAR2(16)PER_OTHER_NAMES VARCHAR2(32)PER_SEX CHAR(1)PER_ACADEMIC_TITLE VARCHAR2(8)PER_QUALIFICATIONS VARCHAR2(24)PER_NATIONALITIES VARCHAR2(16)PER_PRIZE_AWARD VARCHAR2(64)PER_URI VARCHAR2(512)

EXPERTISE_SKILLS

EXP_SKILL_ID CHAR(32)

CV

CV_ID VARCHAR2(32)CV_URI VARCHAR2(512)

HONOROFIC_TITLE

HONORIFIC_TITLE CHAR(4)HONORIFIC_TITLE_FULL VARCHAR2(32)

ACADEMIC_TITLE

ACADEMIC_TITLE CHAR(4)ACADEMIC_TITLE_FULL VARCHAR2(32)

QUALIFICATIONQUALIFICATION CHAR(4)QUALIFICATION_FULL VARCHAR2(32)

LANGUAGE

LANG_CODE CHAR(2)LANG_FULL_NAME VARCHAR2(32)LANG_ENGLISH_NAME VARCHAR2(32)

MULTIMEDIA_TYPE

MULTIMEDIA_TYPE VARCHAR2(16)MULTIMEDIA_TYPE_FULL VARCHAR2(64)MULTIMEDIA_FORMAT VARCHAR2(32)

HONORIFIC_TITLE_PER

HONORIFIC_TITLE CHAR(4)PER_ID CHAR(32)

ACADEMIC_TITLE_PER

ACADEMIC_TITLE CHAR(4)PER_ID CHAR(32)

QUAL_PERSON

QUALIFICATION CHAR(4)PER_ID CHAR(32)

PERS_CONTACT

PER_ID CHAR(32)CON_ID CHAR(32)PER_CON_ROLE VARCHAR2(16)PER_CON_START DATEPER_CON_END DATE

PERS_CV

CV_ID CHAR(32)PER_ID CHAR(32)

PERS_EXPERT_SKILL

PER_ID CHAR(32)EXP_SKILL_ID CHAR(32)PER_EXP_ROLE VARCHAR2(16)PER_EXP_CONDITIONS VARCHAR2(1024)PER_EXP_PRICE NUMBER(12,2)PER_EXP_CURRENCY CHAR(3)PER_EXP_AVAILABILITY VARCHAR(64)PER_EXP_START DATEPER_EXP_END DATE

CV_MULTIMEDIA_TYPE

MULTIMEDIA_TYPE VARCHAR2(16)CV_ID CHAR(32)

PERS_LANG

PER_ID CHAR(32)LANG_CODE CHAR(2)SKILL_READING CHAR(1)SKILL_WRITING CHAR(1)SKILL_SPEAKING CHAR(1)

CONTACT

CON_ID CHAR(32)COUNTRY_CODE VARCHAR2(4)REGION_CODE CHAR(10)CON_ADDRLINE1 VARCHAR2(80)CON_ADDRLINE2 VARCHAR2(80)CON_ADDRLINE3 VARCHAR2(80)CON_ADDRLINE4 VARCHAR2(80)CON_ADDRLINE5 VARCHAR2(80)CON_CITY_TOWN VARCHAR2(32)CON_PROVINCE_STATE VARCHAR2(32)CON_POSTAL_CODE VARCHAR2(16)CON_TELEPHONE VARCHAR2(24)CON_FAX VARCHAR2(24)CON_EMAIL VARCHAR2(32)CON_URI VARCHAR2(512)

Level 5 Person Characteristics

Page 14: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024

14

Classification schemes …… Other Lookup Tables in

CERIF• 1.1.1       Language• The ISO 639 standard, two-letter code should be used.• This standard is applied to the entity Language in the CERIF 2000

data model.• 1.1.2       Country• The ISO 3166 standard, two-letter code should be used. • This standard is applied to entity Country in the CERIF 2000 data

model.• 1.1.3       Currency• The three-letter ISO 4217 code (SWIFT code) should be used. • This is applied to the currency values in the CERIF 2000 data

model.• 1.1.4       Address• The NUTS (territorial units EU) codes should be used for regions

in the EU. • This is applied to entity NUTS-Region in the CERIF 2000 data

model.

Page 15: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024

15

Classification schemes …… Other Lookup Tables in

CERIF

• 1.1.5       Role of a person in an organization• The ISCO[1] should be used. • This is applied to the role attribute in the Person-

OrgUnit entity of the CERIF 2000 data model. • 1.1.6       Qualification of a person• The ISCED[2] should be used. • This is applied to the entity Qualification of the CERIF

2000 data model.

Page 16: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024

16

Classification schemes …… Other Lookup Tables in

CERIF

• 1.1.7       Organization/Company size• The values from the harmonised information package for Fifth

Framework Programme should be used.         S1 : 0         S2 : 1-9         S3a : 10-49         S3b : 50-249         S4 : 250-499         S5 : 500-1999         S6 : 2000+ employees• Note: An SME (small and medium-sized enterprise) is defined as an

entity that has less than 250 full time equivalent employees, and has an annual turnover not exceeding EUR 40 million, or an annual balance sheet total not exceeding EUR 27 million, and is not owned by 25% or more by a non-SME.

• These values are applied to entity attribute Org_Headcount in entity OrgUnit in CERIF 2000 data model.

Page 17: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024

17

Classification schemes …… Other Lookup Tables in

CERIF

• 1.1.8       Type and status of a patent• The IPC[3] should be used. • This is applied to Entity Patent Type in CERIF 2000 data model. • 1.1.9       Type of publication• The UNIMARC Manual – Bibliographic Format 1994 and the updates

from 1996 and 1998 should be used. • This is applied to entity Publication Type in CERIF 2000 data model. • 1.1.10  Type of event • The following list of values should be used:         Conference         Cultural event         Exhibition         Political event         Sport event         Trade fair         Workshop• This is applied to the entity Event Type of the CERIF 2000 data model.

Page 18: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024

18

Classification schemes …… Other Lookup Tables in

CERIF

• 1.1.11  Type of multimedia item• Use the UNIMARC Manual – Bibliographic Format 1994 (IFLA-

UBCIM), and updates from 1996 and 1998 (fields 115, 125-128) (IFLA-UBCOM).

• This is applied to entity Multimedia type in CERIF 2000 data model.

• 1.1.12  Role of an organization in a project• The following definitions from the harmonised information

package for Fifth Framework programme should be used:         CO: Co-ordinator (=scientific, administrative and financial

co-ordinator)         CF: Only financial co-ordinator (if different from co-

ordinator)         AC: Associate contractor         CR: Contractor (other than the co-ordinator)• This is applied to attribute Proj_Org_Role of entity Proj_Org in

CERIF 2000 data model.

Page 19: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024

19

Classification schemes …… Other Lookup Tables in

CERIF

• 1.1.13  Role of a person in a publication• Use the UNIMARC Manual – Bibliographic Format 1994 (IFLA-

UBCIM), and the annex C from the updates from 1996 and 1998.

• This is applied to attribute Pers_Pub_Role of entity Pers_Pub in the CERIF 2000 data model.

• 1.1.14  Role of a person as an expert• Use the following list of values for linking a person to

expertise/skills:         Consultant         Evaluator         Referee         Reviewer

• Applied to attribute Pers_Exp_Role of entity Pers_Expert_Skill in CERIF 2000 Data Model.

Page 20: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024

20

Classification schemes …… Other Lookup Tables in

CERIF

• 1.1.15  Type of organization • Use the values from the harmonised information package of

the Fifth Framework Programme:           BES = enterprise sector including SMEs and individual consultants           HES = higher education establishments           RPR = private/commercial research centres including SMEs           RPN = private non-profit research centres           RPU = public research centres           JRC = joint research centre           PUS = non-research public sector           PNP = non-research private non-profit (? Sector)           INO = international organizations           OTH = others• This is applied to the attribute Org_Type_Full of entity

OrgUnit_Type in the CERIF 2000 Data Model.

Page 21: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024

21

Classification schemes …… Other Lookup Tables in

CERIF

• 1.1.16  Role of a person related to equipment• Use the following list of values:           Contact person           Maintenance technician           Operator/technician• It is applied to attribute Pers_Equip_Role of entity Pers_Equip

in the CERIF 2000 data model.• 1.1.17  Role of an organization related to a product• The following list of values should be used:           Ownership           Franchise           License           Purchase• This is applied to attribute Org_Prod_Role of OrgUnit_Product

in the CERIF 2000 data model.

Page 22: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024

22

Classification schemes …… Other Lookup Tables in

CERIF

• 1.1.18  Type of equipment• Use class 6 of UDC[4]. • This is applied to entity Equipment Type of the CERIF

2000 data model.

Page 23: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024

23

Classification schemes …… Other Lookup Tables in

CERIF

• [1] ISCO stands for "International Standard Classification of Occupations"

• [2] UNESCO International Standard Classification of Education, level of education

• http://www.econ.ucl.ac.be/IRES/BASE_DE_DONNEES/Nomenclatures/occupations/CITE-ISCED.html

• [3] IPC stands for "International Patent Classification" - http://www.wipo.org/eng/general/ipc

• [4] UDC stands for "Universal Decimal Classification"

Page 24: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024

24

Classification schemes …… Dictionaries

• Example: meaning of a word (term)– Used in ensuring correct use of a value in

an attribute– For explanation of result output

• Example: multilingual– Used in multilingual query (query in

language 1 and retrieve from records stored in languages 2….n)

– Used in result output – translate (crudely) to single language as required

Page 25: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024

25

Classification schemes …… Dictionaries and

CERIF• Not really used• Much discussed• Overtaken by Thesauri and

Domain Ontologies

Page 26: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024

26

Classification schemes …… Thesauri

• Provide the structural relationships of words (terms)– Synonym (different word same meaning)– Homonym (same word different meaning)– Antonym (word with opposite meaning)– Super-term (a word whose meaning

includes the word being used e.g. person includes {male|female})

– Sub-term (a word whose meaning is included in a Super-term)

Page 27: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024

27

Classification schemes …… Thesauri in CERIF

• Ortelius– Multilingual thesaurus – Created for higher education– Plan to use for CERIF2000– Problems (legal) in using Ortelius

and maintaining / extending it– Out of use now in CERIF context

Page 28: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024

28

Classification schemes …… Thesauri in CERIF

• Revision of CERIF91 Classification by Beat Sottas and his team

• http://www.aramis-research.ch/e/cerif.htm

• Well-balanced across subject areas

• Not too deep into any one subject area

Page 29: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024

29

Classification schemes …… Discussion

• There is a feeling that classification schemes and codes are NOT useful

• Some claim they make values of attributes and retrieval more precise

• Some claim they assist in sorting, grouping (i.e. classifying)

• Others claim that only full-text retrieval is useful and that classification codes are unnecessary

Page 30: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024

30

Classification schemes …… And so….

• The idea of Ontologies is appealing

• It allows dynamic representation of relationships between terms instead of rigid (usually hierarchic) classification schemes of allowed terms

Page 31: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024

31

Structure

• Language handling• Classification schemes and enumerated

lists, dictionaries, thesauri • Domain ontologies• Versions for Relational DBMS• Mapping CERIF to RDBMS• OODBMS and IR systems• Keys• Surrogate keys in linking relations

Page 32: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024

32

Domain ontologies

Ontologies

• Ontology: philosophical study of existence and nature of reality

• In practice a resource of terms, their definitions and their logical inter-relationships

• E.g. For a publication to exist it is necessary to have a title, at least 1 author

• Publication [title AND >=1 author]

Page 33: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024

33

Domain ontologies

Ontologies

• Domain Ontology: Ontology covering a domain (subject area of interest)

• Example Publication• Publication [title]• Publication [author]• Publication [editor]• Collection [title + >1 author +

editor]• If Publication has Title, > 1 author and

editor it is a collection

Page 34: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024

34

Domain ontologies Ontologies

• Domain Ontologies in IT• A representation in first order

logic allowing– Facts to be expressed– Relationships to be expressed– Constraints to be expressed– New facts and relationships to be

deduced or induced

Page 35: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024

35

Domain ontologies Ontologies

• Used– Data validation on input– Clarification and improvement of a

query– Resolving heterogeneity of terms to

homogeneity– Expanding super-terms to subterms and

vice-versa conditionally– Deducing or inducing new facts and

relationships from stored facts and relationships

Page 36: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024

36

Domain ontologies

Data Validation on Input

Data InputScreen Domain

Ontology

CERIFDatabase

Validation

UpdateOne item

Input Dialogue

One item

Page 37: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024

37

Domain ontologies CERIF Ontology Work

• Work of Andrei Lopatenko http://derpi.tuwien.ac.at/~andrei/Metadata_Science.htm

• Example• <daml:Class

rdf:ID="http://derpi.tuwien.ac.at/~andrei/ontology/programmes.daml#OrgUnit"> 

• <rdfs:label>OrgUnit</rdfs:label>   • <rdfs:comment/>  

• <oiled:creationDate>16:02:07 14.11.2001</oiled:creationDate>   • </daml:Class>

Page 38: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024

38

Domain ontologies Ontology in a Nutshell

• Much of the following comes from sources such as

• http://www.w3.org/2001/sw/WebOnt/ • http://www.daml.org/

Page 39: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024

39

Domain ontologies

Ontology in a Nutshell

Page 40: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024

40

RDFS: RDF SchemaClass

• RDF Schema allows you to define classes by direct declaration:

• <rdfs:Class rdf:ID="Product"> <rdfs:label>Product</rdfs:label> <rdfs:comment>An item sold by Super Sports Inc.</rdfs:comment> </rdfs:Class>

label of class Product is Product

Page 41: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024

41

RDFS: RDF SchemaProperty

You can make similar definitions of properties:

<rdfs:Property rdf:ID="productNumber"> <rdfs:label>Product Number</rdfs:label>

<rdfs:domain rdf:resource=“#Product"/> <rdfs:range rdf:resource="http://www.w3.org/2000/01/r df-schema#Literal"/>

</rdfs:Property> Product Number is a numeric literal

Page 42: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024

42

RDFS: RDF Schema Class Instances

You define instances of these classes by defining resources to be of the relevant RDF type, and then give them relevant properties:

<Product rdf:ID="WaterBottle"> <rdfs:label>Water Bottle</rdfs:label> <productNumber>38267</productNumber> </Product>

38276 is Product Number of Product ‘WaterBottle’

Page 43: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024

43

Property Values

DAML+OIL allows property values to be restricted to the data types defined in XSDL or to user-defined data types. One does this by using a specialization of RDF properties: DatatypeProperty.<daml:DatatypeProperty rdf:ID="productNumber">

<rdfs:label>Product Number</rdfs:label> <rdfs:domain rdf:resource=“#Product"/> <rdfs:range

rdf:resource="http://www.w3.org/2000/10/XMLSch ema#nonNegativeInteger"/> </daml:DatatypeProperty> constraint: Product Number is a non-negative integer

Page 44: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024

44

DAML + OIL: Class

The most important facilities provided by DAML+OIL are those that give designers more expressiveness in classifying resources.

The class daml:Class is defined as a subclass of rdfs:Class, and it adds many new facilities.

Page 45: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024

45

RDF Class Instances

<rdfs:Class rdf:ID="MaritalStatus"/> <MaritalStatus rdf:ID="Married"/> <MaritalStatus rdf:ID="Divorced"/> <MaritalStatus rdf:ID="Single"/> <MaritalStatus rdf:ID="Widowed"/>

The main problem is that the enumeration is not closed.

Page 46: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024

46

Using DAML beyond RDFS: Closed lists – ‘one

of’<daml:Class ID="Availability">

<daml:oneOf parseType="daml:collection"> <daml:Thing rdf:ID="InStock"> <rdfs:label>In stock</rdfs:label> </daml:Thing> <daml:Thing rdf:ID="BackOrdered"> <rdfs:label>Back ordered</rdfs:label> </daml:Thing> <daml:Thing rdf:ID="SpecialOrder"> <rdfs:label>Special order</rdfs:label> </daml:Thing>

</daml:oneOf> </daml:Class> 3 kinds of availability in the collection

Page 47: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024

47

Using DAML beyond RDFS: Set up Namespaces

<?xml version="1.0" encoding="UTF-8"?> <rdf:RDF

xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xmlns:daml="http://www.w3.org/2001/10/daml+oil#" xmlns:dt="http://rdfinference.org/eg/supersports/dt" xmlns:ss="http://rdfinference.org/eg/supersports/metadata" xmlns:xsd="http://www.w3.org/2000/10/XMLSchema#" xml:base="http://rdfinference.org/eg/supersports/metadata" >

Page 48: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024

48

Using DAML beyond RDFS: Primary

Entities Classes<daml:Ontology rdf:about="">

<daml:versionInfo>1.0</daml:versionInfo> <rdfs:comment>An ontology of Super Sports Inc. store products </rdfs:comment> <daml:imports rdf:resource="http://www.w3.org/2001/10/daml+oil"/>

</daml:Ontology> <daml:Class rdf:ID="Product">

<rdfs:label>Product</rdfs:label> <rdfs:comment>An item sold by Super Sports

Inc.</rdfs:comment> </daml:Class> <daml:Class rdf:ID="Department">

<rdfs:label>Department</rdfs:label> <rdfs:comment>A Super Sports Inc.

department</rdfs:comment> </daml:Class> set up classes Product and Department

Page 49: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024

49

Using DAML beyond RDFS: Secondary

Entities: Subclasses of Product<rdfs:subClassOf rdf:resource="#Product"/>

<daml:Class rdf:ID="Tool"><rdfs:label>Tool</rdfs:label> <rdfs:comment>Tools used in sports, ice axe for instance.</rdfs:comment> <rdfs:subClassOf rdf:resource="#Product"/>

</daml:Class> <daml:Class rdf:ID="Shoe">

<rdfs:label>Shoe</rdfs:label> <rdfs:subClassOf rdf:resource="#Product"/>

</daml:Class>set up subclasses of Product: Tool, Shoe

Page 50: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024

50

Using DAML beyond RDFS: Enumerated

List<daml:Class rdf:ID="Activity">

<rdfs:label>Activity</rdfs:label> <rdfs:comment>A sport activity</rdfs:comment> <daml:oneOf rdf:parseType="daml:collection">

<daml:Thing rdf:ID="Hiking"> <rdfs:label>Hiking</rdfs:label> </daml:Thing> <daml:Thing rdf:ID="Travel"> <rdfs:label>Travel</rdfs:label> </daml:Thing> <daml:Thing rdf:ID="Camping">

<rdfs:label>Camping</rdfs:label> </daml:Thing>

</daml:Class> Activity can take values Hiking, Travel, Camping

Page 51: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024

51

Using DAML beyond RDFS: Use of

classification (list)<daml:DatatypeProperty rdf:ID="productNumber"> <rdfs:label>Product Number</rdfs:label> <daml:samePropertyAs rdf:resource=“<a

href="http://rosettanet.org/FundamentalBusiness">http://rosettanet.org/FundamentalBusiness</a> DataEntities#ProprietaryProductIdentifier"/> <rdfs:domain rdf:resource="#Product"/> <rdfs:range rdf:resource="http://www.w3.org/2000/10/XMLSchema#nonNegativeInteger"/> <rdf:type rdf:resource="http://www.w3.org/2001/10/daml+oil#UniqueProperty"/>

</daml:DatatypeProperty>

Product Number within list identified of unique Product Ids

Page 52: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024

52

Using DAML beyond RDFS: Entity-Entity

Link<daml:ObjectProperty rdf:ID="usedFor">

<rdfs:label>usedFor</rdfs:label> <rdfs:comment>The activity for which a product is used</rdfs:comment> <daml:domain rdf:resource=“#Product"/> <daml:range rdf:resource=“#Activity"/>

</daml:ObjectProperty>

Product x used for Activity y

Page 53: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024

53

Using DAML beyond RDFS: An instance of

Link<ss:BackPack rdf:ID="ReadyRuck">

<rdfs:label>Ready Ruck back pack</rdfs:label> <rdfs:comment>The ideal pack for your most rugged adventures</rdfs:comment> <ss:productNumber>23456</ss:productNumber> <ss:packCapacity>45</ss:packCapacity> <ss:usedFor rdf:resource=“#Hiking"/>

</ss:BackPack>

Instance: ReadyRuck backpack used for Hiking

Page 54: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024

54

Summary

• DAML (extending RDF / RDFS / XML / XMLS) can represent

• at schema level (and at instance level)– Classes (primary entities)– Sub-classes (secondary entities)– Relationships (binary

relations)– Enumerated lists (lookup tables)– Value from classification scheme

• So congruent with CERIF

Page 55: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024

55

Summary

• Ontologies provide a means– To extend the relational domain– Further in first order logic– So allowing induction and deduction

• This is especially useful – Data validation– Query expansion / refinement– Homogeneous access over

heterogeneous data sources (schema reconciliation)

Page 56: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024

56

Domain ontologies

End of Ontology in a Nutshell

Page 57: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024

57

Ontologies and CERIF

• Work by Andrei Lopatenko and discussed within euroCRIS CERIF Task Group

• Based on use of – DAML + OIL– OilEd (editor for the languge)– With associated Description Logic

Classifier FaCT – which uses SHIQ Description Logics

Page 58: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024

58

Example Partial Ontology for

CERIF: ClassesClasses DefinitionActivity  Programme  EUProgramme

Programme which is financed-by EU or by any EU-country

ISTProgramme one of EU ProgrammesLocation Any location can be a part-of another locationCity  Continent  Europe  Country  EuropeanCountry A country which is a part-of continent EuropeEUCountry A country which is a part-of EUUnion Geopolitical or economical union of countriesEU One of unions, but which is a part of EuropeOrganization  FundingOrganization

This organization which finances some Activities or Projects

EUFundingOrganization Any funding organization which is situated-in EUCountryEuropeanFundingOrganization

Funding organization which is situated-in Europe

Proiect  FinancedProject Project which is-financed by some of FundingOganizationEUFinancedProject Project which is financed-by EUFundingOrganizationEuropeanFinancedProject

Project which is financed-by EuropeanFundingOrganization

ISTProject Project which is a part-of IST Programme

Page 59: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024

59

Example Partial Ontology for

CERIF: Relations and Axioms

Slots or Relations Properties

financed-byA financed-by Y reverse relation finances When A is financed-by Y then Y finances A

finances  

part-oftransitive relation. when X is a part of Y, and Y is a part of Z, then X is a part of Z

situated-in Geographical inclusion. transitive relation

Axioms Descriptions

Project which is a part-of any of EUProgrammes is financed-by EU

When it is known that a project is a part of EUProgramme then we are sure that this project is financed by one of EUFundingOrganizations

Page 60: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024

60

Example Partial Ontology for CERIF:

Assertion of new Statements

Statement Proof

ISTProject is an EUProject

1.             fact: ISTProject is a part of ISTProgramme, 2.             fact: ISTProgramme is a EUProgramme 3.             statement: ISTProject is a part of EUProgramme 4.             3 + Axiom -> ISTProject is financed by EU 5.             4 + definition of EUFinancedProject -> IST is an EUFinancedProject

EUFinancedProject is an EuropeanFinancedPtroject

1.             EUFinancedProject is financed by FundingOrganization which situated in EU 2.             EU is situated in Europe 3.             1 + 2 + transitiveness of situated-in EUFinancedProject is financed by organization which is situated in Europe 4.             3 + definition of EuropeanFinancedProject -> EUFinancedProject is an EuropeanFinancedProject

Page 61: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024

61

Structure

• Language handling• Classification schemes and enumerated

lists, dictionaries, thesauri • Domain ontologies• Versions for Relational DBMS• Mapping CERIF to RDBMS• OODBMS and IR systems• Keys• Surrogate keys in linking relations

Page 62: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024

62

Versions for Relational DBMS

• It is probably easiest to implement CERIF in (extended) relational technology

• Versions exist in– Oracle– SQLServer– Postgres– MySQL

Page 63: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024

63

Versions for Relational DBMS

• Between the different RDBMSs there are some differences:– Representation of certain data types

• Accuracy, precision

– Max length of character and varchar fields• Especially if coded in unicode

– Restrictions on primary key constraints and use of indexes

Page 64: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024

64

Structure

• Language handling• Classification schemes and enumerated

lists, dictionaries, thesauri • Domain ontologies• Versions for Relational DBMS• Mapping CERIF to RDBMS• OODBMS and IR systems• Keys• Surrogate keys in linking relations

Page 65: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024

65

Mapping CERIF to RDBMS

• The easiest way to map from CERIF Data Model to a RDBMS implementation is to use the supplied scripts on the euroCRIS CERIF Task Group Website

• Provides a full CERIF implementation• Even if do not use all of the structure

this ensures nothing is missed out

Page 66: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024

66

Mapping CERIF to RDBMS

• And then can add extensions as necessary / required locally– But check first the currently

proposed changes / extensions on www.eurocris.org CERIF Task Group website

• However, if wish to build up a RDBMS implementation of CERIF proceed as follows:

Page 67: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024

67

Mapping CERIF to RDBMS: Primary

Entities• Create table for each primary entity• Create link tables for the primary

entities• Create Language base tables for

the primary entities– And the required linking tables

• Create lookup tables for the primary entities– And the required linking tables

Page 68: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024

68

Mapping CERIF to RDBMS: Primary

Entities• This provides a CRIS implemented

as a RDBMS that handles very simply Project, Person, OrgUnit

• Their relationships• Their language variants for certain

text attributes• Their controlled (enumerated) lists

for attribute values

Page 69: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024

69

Mapping CERIF to RDBMS: Primary

Entities• Almost certainly the secondary

base table CONTACT will be required– And the required linking tables

• And possibly FUNDING PROGRAMME– And the required linking tables

Page 70: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024

70

Mapping CERIF to RDBMS: Secondary

Entities• Repeat as for primary entities

– Entities themselves– Linking relations– Language base tables

• And their link tables

– Lookup tables• And their link tables

• But beware – very easy to miss something and then find problems in using the system

Page 71: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024

71

Structure

• Language handling• Classification schemes and enumerated

lists, dictionaries, thesauri • Domain ontologies• Versions for Relational DBMS• Mapping CERIF to RDBMS• OODBMS and IR systems• Keys• Surrogate keys in linking relations

Page 72: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024

72

OODBMS and IR Systems

Introduction

• It is possible to implement CERIF in OODBMS, IR Systems

• And indeed in Hypermedia Systems

• Or even in a document system based on SGML or XML (i.e. a semistructured database system)

Page 73: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024

73

OODBMS and IR Systems OODBMS

• OODBMS possess the concept of objects with– Data (structure view)– Methods (process view)– Messages (event view)– (Note an object is a type; instances reside in a class)

• Which means that any process has to be coded specifically for any object– inheritance can help reduce the coding– but note problem of multiple conditional inheritance

• Generally OODBMS – have worse performance than RDBMS and have

poorer data representational capability

Page 74: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024

74

OODBMS and IR Systems IR Systems

• Information Retrieval Systems have advantages for databases with many textual attributes– Full inverted index

• Very fast retrieval• Very slow update• Little or no structural capability

(relating entities)• Little or no reporting capability (group,

sum, average…)

Page 75: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024

75

OODBMS and IR Systems Hypermedia Systems

• Provide good structure (relationship) handling through anchors and pointers

• Usually have good attribute type handling

• Require traversal / browsing rather than directed query – query relatively slow

• Update slow (pointer handling)• Usually poor reporting – group, sum,

average

Page 76: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024

76

OODBMS and IR Systems Document Systems

• Recent rise in popularity with XML as ‘semistructured databases’

• In fact since 80s SGML document systems• Query usually poor – query languages not

declarative predicates but procedural and navigational– May be fast if use IR technique of full

inverted indexes• Update slow if change to several entity

instances; fast if one document• Report capability variable – group, sum,

average

Page 77: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024

77

Structure

• Language handling• Classification schemes and enumerated

lists, dictionaries, thesauri • Domain ontologies• Versions for Relational DBMS• Mapping CERIF to RDBMS• OODBMS and IR systems• Keys• Surrogate keys in linking relations

Page 78: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024

78

Keys

• ‘The key’ to successful implementation

• Key attribute value identifies uniquely a tuple– Within base relation : primary

key– Within another relation : foreign key

Page 79: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024

79

Keys

Project

Person

Person

PROJECT PERSON

PK FK PK

Page 80: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024

80

Structure

• Language handling• Classification schemes and enumerated

lists, dictionaries, thesauri • Domain ontologies• Versions for Relational DBMS• Mapping CERIF to RDBMS• OODBMS and IR systems• Keys• Surrogate keys in linking relations

Page 81: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024

81

Surrogate keys in linking

relations The Problem

• Wish to link flexibly– An instance in an entity to a related

instance in another entity (relationship)– An instance in an entity to another

instance in the same entity (recursion)

• Examples– Person <-> Project e.g. x is leader of y– Person <-> Person e.g. x is boss of y

Page 82: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024

82

Surrogate keys in linking

relations RelationshipUsual Relation

Project

Person

Person

PROJECT PERSON

PK PKFK

Page 83: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024

83

Surrogate keys in linking

relations RelationshipProblem

• Supports only 1 (Project) to n (persons)• i.e. the persons on any 1 project, with

all their attributes (dependencies)

• In many cases need to indicate that– The same person works on several projects– In different roles (e.g. leader, programmer)– At different (or the same) time periods

Page 84: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024

84

Surrogate keys in linking

relations RelationshipBinary Relation

Project

Person

PROJECT PERSON

Project

Person

Page 85: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024

85

Surrogate keys in linking

relations The Problem with a Binary Relation

• How to identify uniquely each tuple• Have effectively 2 primary keys• And even in combination their

concatenated value may not be unique– Same relationship with different roles– Same relationship at different

date/time intervals Surrogate key

Page 86: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024

86

Surrogate keys in linking

relations RelationshipBinary Relation

Project

Person

PROJECT PERSON

Project

Person

Surrogate

Role

StartDate

EndDate

Surrogate key (primary, unique)

Page 87: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024

87

Surrogate keys in linking

relations Recursion Usual Relation

PK &FK

Person

PERSON

Person

Actually works like this

PERSON

Page 88: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024

88

Surrogate keys in linking

relations RecursionBinary Relation

Person

PERSON

Person

Person

Page 89: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024

89

Surrogate keys in linking

relations RecursionBinary Relation

Person

PERSON

Person

Person

How the tuples from Person are represented in the binary relation

Page 90: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024

90

Surrogate keys in linking

relations The Problem with a Binary Relation

• How to identify uniquely each tuple• Have effectively 2 primary keys• And even in combination their

concatenated value may not be unique– Same relationship with different roles– Same relationship at different

date/time intervals Surrogate key

Page 91: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024

91

Surrogate keys in linking

relations RecursionBinary Relation

Person

PERSON

Person

Surrogate

Role

StartDate

EndDate

Person

In practice usually have more attributes than Person / Person

Surrogate key (primary, unique)

Page 92: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024

92

Surrogate keys in linking

relations Binary Relation

• Flexible• Allows n : m• With added attributes e.g. role,

date/time• Thus permitting

– Conditional relationships– Temporal relationships– i.e. rich semantics

Page 93: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024

93

Conclusion

• Implementation of the CERIF datamodel as a RDBMS is fairly straightforward– Entity = table– Attribute = attribute– Type = type (may need some coded

extension)– Constraint = constraint (coded, mainly not

provided in RDBMS)– Relationship = binary relation table– Language and character set variants =

tables linked to associated entity

Page 94: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024

94

Conclusion

• CERIF can also be implemented in other systems / architectures

• CERIF provides assistance in data integrity– Enumerated lists (dictionaries)– Classification terms (thesauri)– Semantics (domain ontologies)

Page 95: © Keith G Jeffery & Anne AssersonCERIF Course: Implementation 20021024 1 CERIF COURSE Session5: Implementation Keith G Jeffery, Director, IT CLRC k.g.jeffery@.rl.ac.ukk.g.jeffery@.rl.ac.uk

© Keith G Jeffery & Anne Asserson

CERIF Course: Implementation 20021024

95

Conclusion

• CERIF provides consistently– Full CRIS data model: implement a

CRIS– Data exchange models: link CRISs– Metadata model: access to CRISs

• CERIF works