CLARIN-NL ISOcat workshop 2011 part 2 Ineke Schuurman Menzo Windhouwer


Page 1

CLARIN-NL ISOcat workshop 2011 part 2

Ineke Schuurman

Menzo Windhouwer

Page 2

Part A
• Issues brought up by participants
  – When (not) to adopt an existing DC
  – What about (CLARIN) standards?
  – What to do with ‘flagged’ DCs?
  – Relation between DCS and profile
  – What should be included in ISOcat (level of detail, abbreviations, …)?
  – What about TEI, metadata, webservice?
  – How to deal with larger amounts of data?

Page 3

Part B

• ISOcat and CLARIN: Do’s and don’ts (version 0.1)

– Introduction and discussion

Page 4

• Part A
  – When (not) to adopt an existing DC
  – What about (CLARIN) standards?
  – What to do with ‘flagged’ DCs?
  – Relation between DCS and profile
  – What should be included in ISOcat (level of detail, abbreviations, …)?
  – What about TEI, metadata, webservice?
  – How to deal with larger amounts of data?

Page 5

• When (not) to adopt an existing DC
  – It should ‘match’ the way you use a specific notion in your annotation scheme, application, …
  – It should come with the same profile
  – It should handle the same phenomenon: SpeakerID ≠ SingerID

Page 6

Speaker vs Singer

String → Name → Person → Singer → Opera → Opera singer → Tenor → Tenor in La Bohème

First: too generic; last: too specific. The others are candidates.

Note that SingerID and SpeakerID are siblings, whereas SingerID is a subclass of both Singer and ID (RELcat!)
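The Speaker/Singer relations can be made concrete as a small is-a graph. This Python sketch is illustrative only. The relation table, the helper names `ancestors` and `are_siblings`, and the choice to treat "sibling" as "shares a direct superclass" are our assumptions. They are not the RELcat data model or API.

```python
# Hypothetical is-a relations (DC -> set of direct superclasses), modelled on
# the Speaker/Singer slide; not taken from ISOcat or RELcat.
IS_A: dict[str, set[str]] = {
    "SingerID": {"Singer", "ID"},
    "SpeakerID": {"Speaker", "ID"},
    "Singer": {"Person"},
    "Speaker": {"Person"},
}


def ancestors(dc: str) -> set[str]:
    """All direct and indirect superclasses of a DC (iterative depth-first walk)."""
    seen: set[str] = set()
    stack = list(IS_A.get(dc, set()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(IS_A.get(parent, set()))
    return seen


def are_siblings(a: str, b: str) -> bool:
    """Two distinct DCs count as siblings here if they share a direct superclass."""
    return a != b and bool(IS_A.get(a, set()) & IS_A.get(b, set()))


# SingerID is a subclass of both Singer and ID ...
print(sorted(ancestors("SingerID")))          # ['ID', 'Person', 'Singer']
# ... and a sibling of SpeakerID (shared parent: ID), yet not the same DC.
print(are_siblings("SingerID", "SpeakerID"))  # True
```

Keeping such links in a separate relation table mirrors the idea of recording relations in RELcat rather than inside the DCs themselves.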


Page 8

Standards

• Within ISOcat there are currently few or no standards.

Therefore:

• CLARIN NL and VL will set up their own set of ‘standardized DCs’; Ineke will be in charge (in consultation with others).


Page 10

Flagged DCs

• Never link to ‘deprecated’ DCs!

(In case of doubt, consult Ineke or Menzo.)

• In other cases, the flags show whether the DC specification is correct from a technical point of view.

• Note that only DCs with a green marking qualify for standardization.
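The flag rules can be read as a small decision procedure. Here is a minimal Python sketch under two assumptions. First, it uses three hypothetical flag values (`DEPRECATED`, `GREEN`, `OTHER`); ISOcat's actual flag vocabulary may be richer. Second, it assumes that any DC that is not deprecated may be linked.

```python
from enum import Enum


class Flag(Enum):
    # Hypothetical flag values: the slide only names "deprecated" and a green marking.
    DEPRECATED = "deprecated"
    GREEN = "green"
    OTHER = "other"  # any other marking: technical issues in the specification


def assess(flag: Flag) -> tuple[bool, bool]:
    """Return (may_link, eligible_for_standardization) for a flagged DC."""
    if flag is Flag.DEPRECATED:
        return (False, False)  # never link to deprecated DCs
    # Assumption: non-deprecated DCs may be linked; only green ones qualify for standardization.
    return (True, flag is Flag.GREEN)


for f in Flag:
    print(f.value, assess(f))
```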


Page 12

DC/DCS and profile

• Profiles are not added automatically; a DCS may contain DCs with various profiles.

• If the profile you need is not yet available, contact Menzo and Ineke.
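A DCS is essentially a named collection of DCs, each carrying its own profiles. This Python sketch uses hypothetical class names and illustrates two points from the slide. Adding a DC to a DCS does not give it new profiles, and one DCS can mix profiles. The profile names `Morphosyntax` and `Metadata` and the DC identifiers are examples chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DataCategory:
    identifier: str
    profiles: set[str] = field(default_factory=set)


@dataclass
class DataCategorySelection:
    name: str
    members: list[DataCategory] = field(default_factory=list)

    def add(self, dc: DataCategory) -> None:
        # Adding a DC to a DCS leaves its profiles untouched: nothing is added automatically.
        self.members.append(dc)

    def profiles(self) -> set[str]:
        """Union of the member DCs' profiles; one DCS may mix several profiles."""
        result: set[str] = set()
        for dc in self.members:
            result |= dc.profiles
        return result

    def lacking(self, profile: str) -> list[str]:
        """Identifiers of member DCs that do not (yet) carry the given profile."""
        return [dc.identifier for dc in self.members if profile not in dc.profiles]


dcs = DataCategorySelection("my-annotation-scheme")  # hypothetical DCS name
dcs.add(DataCategory("partOfSpeech", {"Morphosyntax"}))
dcs.add(DataCategory("speakerID", {"Metadata"}))
print(sorted(dcs.profiles()))   # ['Metadata', 'Morphosyntax']
print(dcs.lacking("Metadata"))  # ['partOfSpeech']
```

`lacking` lists the DCs for which a needed profile would still have to be arranged.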


Page 14

What to include?

• Cf. the slide on SingerID/SpeakerID

• In general: all linguistically meaningful notions mentioned in your schema, manual, or definition (cf. Part B)

• Abbreviations (e.g. PST for /past tense/) are to be entered as the Data Element Name
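Recording the abbreviation as a data element name lets a tool map the tags in annotated data back to DCs. Here is a minimal Python sketch. It assumes a simplified, hypothetical record layout (`identifier`, `data_element_names`) rather than the real ISOcat export format, and `PRS` is an invented second entry.

```python
# Hypothetical, simplified DC records: the DC is identified by the full notion,
# and the abbreviation used in a tagset is recorded as a data element name.
RECORDS = [
    {"identifier": "pastTense", "data_element_names": ["PST"]},
    {"identifier": "presentTense", "data_element_names": ["PRS"]},
]


def index_by_element_name(records: list[dict]) -> dict[str, str]:
    """Map each data element name (abbreviation) to the identifier of its DC."""
    index: dict[str, str] = {}
    for rec in records:
        for name in rec["data_element_names"]:
            if name in index and index[name] != rec["identifier"]:
                raise ValueError(f"data element name {name!r} points to two DCs")
            index[name] = rec["identifier"]
    return index


INDEX = index_by_element_name(RECORDS)
print(INDEX["PST"])  # pastTense
```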


Page 16

TEI, metadata, webservice

• TEI: likely to be taken care of at a ‘higher level’; if not, YOU are to insert the TEI definitions you use.

• Metadata: is the element new in CMDI? In that case, its definition must be provided in ISOcat as well.

• Webservice: to be taken care of in CMDI.
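In a CMDI component, the link between an element and its ISOcat definition is typically expressed on the element itself. This Python sketch assumes a heavily simplified CMDI component specification in which elements carry a `ConceptLink` attribute. The component, the element names, and the `DC-xxxx` placeholder are hypothetical. The sketch lists the elements that still lack a concept link and would therefore need a definition in ISOcat.

```python
import xml.etree.ElementTree as ET

# Toy component in the style of a CMDI component specification (assumed, heavily
# simplified shape). The ConceptLink value is a placeholder, not a real ISOcat PID.
COMPONENT = """
<CMD_ComponentSpec isProfile="false">
  <CMD_Component name="Recording">
    <CMD_Element name="SpeakerID" ValueScheme="string"
                 ConceptLink="http://www.isocat.org/datcat/DC-xxxx"/>
    <CMD_Element name="RecordingStudio" ValueScheme="string"/>
  </CMD_Component>
</CMD_ComponentSpec>
"""


def elements_without_concept_link(xml_text: str) -> list[str]:
    """Names of CMD_Element nodes that have no (or an empty) ConceptLink attribute."""
    root = ET.fromstring(xml_text.strip())
    return [el.get("name", "") for el in root.iter("CMD_Element") if not el.get("ConceptLink")]


# Elements listed here still need a definition in ISOcat (and a link to it).
print(elements_without_concept_link(COMPONENT))  # ['RecordingStudio']
```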


Page 18

Larger amounts?

In such a case, contact Menzo Windhouwer.


Page 19

Part B: Do’s & don’ts

Do’s:
• Create a DCS for your scheme (named after the project, annotation scheme, …)
• Provide a clear definition (short, to the point) for your scheme, application, …
• Take care not to leave concepts used in your definition undefined or vague
• Use appropriate vocabulary (per profile)
• Check ‘adopted’ DCs regularly until standardization!
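Checking adopted DCs regularly means noticing when a DC you rely on has changed. Here is a minimal Python sketch. It assumes you keep local copies of the adopted DC records as plain dictionaries. Fetching the current versions from ISOcat is deliberately left out, and the record contents are placeholders.

```python
import hashlib
import json


def fingerprint(record: dict) -> str:
    """Stable SHA-256 hash of a DC record (keys sorted, so key order is irrelevant)."""
    return hashlib.sha256(json.dumps(record, sort_keys=True).encode("utf-8")).hexdigest()


def changed_dcs(snapshot: dict[str, str], current: dict[str, dict]) -> list[str]:
    """Identifiers of adopted DCs whose record changed or disappeared since the snapshot."""
    return sorted(
        identifier
        for identifier, old_hash in snapshot.items()
        if identifier not in current or fingerprint(current[identifier]) != old_hash
    )


# Hypothetical records; how you obtain the current versions is left open here.
adopted = {"pastTense": {"definition": "v1"}, "speakerID": {"definition": "v1"}}
snapshot = {dc_id: fingerprint(rec) for dc_id, rec in adopted.items()}
current = {"pastTense": {"definition": "v2"}, "speakerID": {"definition": "v1"}}
print(changed_dcs(snapshot, current))  # ['pastTense']
```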

Page 20

Do’s (continued)

When creating a DC, fill out:
• Justification: used in XYZ, part of tagset N
• Language section
  – Always an English language section
  – Strong recommendation: sections for the object language(s) and for the working language of the manual
  – Sections in the various languages should match (i.e. be more or less translations of each other)
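The required and recommended parts of a new DC can be checked mechanically before submission. Here is a minimal Python sketch over a hypothetical dictionary layout, with language sections keyed by language codes such as `en` and `nl`. Whether the sections really are translations of each other still needs a human check.

```python
def check_dc(dc: dict, object_languages: set[str], manual_language: str) -> tuple[list[str], list[str]]:
    """Return (errors, warnings) for a hypothetical DC record.

    Errors: a required part is missing. Warnings: a strongly recommended
    language section is missing.
    """
    errors: list[str] = []
    warnings: list[str] = []
    if not dc.get("justification"):
        errors.append("missing justification")
    sections = dc.get("language_sections", {})
    if "en" not in sections:
        errors.append("missing English language section")
    # Object language(s) and the manual's working language are strongly recommended.
    for lang in sorted(object_languages | {manual_language}):
        if lang != "en" and lang not in sections:
            warnings.append(f"recommended language section missing: {lang}")
    return errors, warnings


dc = {
    "justification": "used in XYZ, part of tagset N",
    "language_sections": {"en": {"name": "past tense"}},
}
print(check_dc(dc, object_languages={"nl"}, manual_language="nl"))
# ([], ['recommended language section missing: nl'])
```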

Page 21

Do’s (continued)

When creating a DC, fill out:

• Example section
  – Note that *negative* examples may be very helpful! (jongens ‘boys’, mannen ‘men’; not: gelovigen ‘believers’, which is a form of an ADJ)

Page 22

Example sections

Suppose you want to illustrate a German phenomenon:

• Example section in the EN language section
  – German example with translation into English

• Example section in the NL language section
  – German example with translation into Dutch

• Example section in the EN linguistic section
  – EN example

• Example section in the NL linguistic section
  – NL example with translation into English
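The four cases on the slide follow one pattern, which can be written down as a small function. This Python sketch is our own generalization, not a rule stated on the slide. Two parts are assumptions: the section-kind strings, and the case of a language section in the phenomenon's own language, which we give no translation.

```python
from typing import Optional


def expected_example(section_kind: str, section_lang: str, phenomenon_lang: str) -> tuple[str, Optional[str]]:
    """Return (language of the example, language of its translation or None).

    A language section shows an example of the phenomenon, translated into the
    section's language; a linguistic section shows an example in its own
    language, translated into English unless it already is English.
    """
    if section_kind == "language":
        translation = None if section_lang == phenomenon_lang else section_lang
        return phenomenon_lang, translation
    if section_kind == "linguistic":
        return section_lang, (None if section_lang == "en" else "en")
    raise ValueError(f"unknown section kind: {section_kind!r}")


# The slide's German phenomenon, for EN and NL sections of both kinds.
for kind in ("language", "linguistic"):
    for lang in ("en", "nl"):
        print(kind, lang, expected_example(kind, lang, "de"))
```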

Page 23

Don’ts

• Confuse the Language and Linguistic sections
  – The latter contains language-specific values for closed domains

• Be (too) language-specific in the definition

• Mention the scheme in the definition

• Use several definitions in one DC

• Give circular definitions

• Rely on authority

• Rely on standardized status
  – The definition should fit YOUR scheme, etc.
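Some of the don'ts can be caught with crude string checks before a human review. This Python sketch uses hypothetical heuristics: a word-boundary match of the term, substring matches of scheme names, a few authority phrases, and numbered senses. It will miss many problems and may flag harmless text. It does not attempt the harder don'ts, such as language-specific definitions.

```python
import re

# Hypothetical phrases that suggest a definition leans on authority.
AUTHORITY_MARKERS = ("according to", "as defined by")


def lint_definition(term: str, definition: str, scheme_names: list[str]) -> list[str]:
    """Flag some of the don'ts with crude string heuristics (not a substitute for review)."""
    text = definition.lower()
    problems: list[str] = []
    if re.search(r"\b" + re.escape(term.lower()) + r"\b", text):
        problems.append("circular: the definition uses the term itself")
    for scheme in scheme_names:
        if scheme.lower() in text:
            problems.append(f"mentions scheme {scheme!r}")
    for marker in AUTHORITY_MARKERS:
        if marker in text:
            problems.append(f"relies on authority ({marker!r})")
    # Numbered senses such as "1) ... 2) ..." or "(1) ... (2) ..." suggest several definitions.
    if re.search(r"(^|\s)\(?1[.)]", text) and re.search(r"\s\(?2[.)]", text):
        problems.append("several definitions in one DC")
    return problems


print(lint_definition("past tense", "A past tense form refers to the past.", ["XYZ"]))
```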
