CLARIN-NL ISOcat workshop 2012 part 2 (10-10-2012) Ineke Schuurman Menzo Windhouwer


Page 1

CLARIN-NL ISOcat workshop 2012, part 2 (10-10-2012)

Ineke Schuurman

Menzo Windhouwer

Page 2

• Issues brought up by participants
  – Which elements are to be included in ISOcat
  – (CLARIN) standards, TEI, etc.
  – Type of DC
  – When to create a new DC or adapt an existing one
  – When to create several DCSs
  – Name of DC; several DCs with the same name
  – How to deal with larger amounts of data

Page 3

What to include?

• ALL concepts dealing with linguistics/metadata
  – Van Dale EN-NE:

    include (overgankelijk werkwoord [transitive verb])
    1) omvatten [to comprise]
    2) (mede) opnemen [to include (as well)]

==> 'overgankelijk werkwoord' / 'transitive verb' is to be included; the same holds for 'overg.ww', 'trns.v.'

• One and the same DC!

Page 4

What to include?

‘transitive verb’

• Several entries in ISOcat
  – DC-1405: A verb which takes a direct object; that is, a verb that expresses an action which directly affects another person or thing.
  – DC-3532: A transitive verb is a verb that takes a direct object, and describes a relation between two participants [Crystal 1997: 397; Payne 1997: 171]
  – And several more, so... which one to select?

Page 5

• When (not) to adopt an existing DC
  – It should ‘match’ the way you use a specific notion in your annotation scheme, application, …
  – It should come with the same profile and type

• That being said
  – Reuse a CLARIN NL/VL DC when possible (contact Ineke when such a definition is incorrect)

Page 6

Same name

• Not really a problem when they are good DCs, not even when they come with the same profile

• PositivePolarity
  – In general, positive polarity refers to an assertion that contains no marker of negation [Crystal 1980: 299]. (DC-3405)
  – the property of a word or concept to express positive sentiment (myDC-xx)

• Whether you can reuse DC-3405 depends on your use of the concept!
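The point that one name may legitimately map to several DCs can be sketched in a few lines of Python. This is only an illustration: the `DataCategory` class and the `index_by_name` helper are hypothetical and are not part of ISOcat; the two entries are the ones quoted on this slide, with `myDC-xx` kept as the slide's own placeholder.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCategory:
    """Hypothetical, simplified stand-in for a DC record."""
    identifier: str  # the identifier, not the name, tells DCs apart
    name: str
    definition: str


def index_by_name(categories):
    """Group data categories by case-insensitive name."""
    index = defaultdict(list)
    for dc in categories:
        index[dc.name.lower()].append(dc)
    return index


categories = [
    DataCategory(
        "DC-3405",
        "PositivePolarity",
        "In general, positive polarity refers to an assertion that "
        "contains no marker of negation.",
    ),
    DataCategory(
        "myDC-xx",  # placeholder identifier taken from the slide
        "PositivePolarity",
        "The property of a word or concept to express positive sentiment.",
    ),
]

# One name, two candidates: the definitions decide which one fits your use.
candidates = index_by_name(categories)["positivepolarity"]
for dc in candidates:
    print(dc.identifier, "-", dc.definition)
```

Looking up by name only narrows the choice; picking between the candidates still means reading each definition against your own use of the concept.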

Page 7

Same name

• Do not avoid reuse of a name when it is the name commonly used!

• Another type of duplicate names where one concept entails the other one:

– meewerkend voorwerp – meewerkend en belanghebbend voorwerp

– event (also called 'eventuality', and including 'state')

– event (sister of 'state')

Page 8

What defines a good DC?

Reusable definition

NOT

conversation (DC-2661): Communication event with more than two participants

mother tongue (DC-2955): […] a speaker’s mother tongue

Page 9

What defines a good DC?

Correct definition

NOT (?)

Actor (DC-4146): a participant in an action or process

Question: is an addressee to be considered an actor? (used in DC-4158, no proper definition yet)

Page 10

What defines a good DC?

Meaningful definition

NOT

annotation format (DC-2562): Specifies the annotation format that is used …

source language (DC-2494): Indicates if a language is a source language
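A rough way to spot definitions like these two is to check whether the definition adds anything beyond the words of the name itself. The sketch below is a heuristic only, not an ISOcat feature: the function names and the stopword list are assumptions made for illustration, and a real check would still need a human reader.

```python
import re

# Hypothetical stopword list: words too generic to count as real content.
STOPWORDS = {
    "a", "an", "the", "is", "if", "of", "that", "to", "in", "or", "and",
    "whether", "used", "specifies", "indicates",
}


def content_words(text):
    """Return the lower-cased words of `text`, minus stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    return {w for w in words if w not in STOPWORDS}


def looks_tautological(name, definition):
    """Flag a definition that repeats the name and adds at most one word."""
    name_words = content_words(name)
    def_words = content_words(definition)
    if not name_words:
        return False
    return name_words <= def_words and len(def_words - name_words) <= 1


examples = [
    ("annotation format", "Specifies the annotation format that is used …"),
    ("source language", "Indicates if a language is a source language"),
    ("transitive verb", "A verb which takes a direct object"),
]
for name, definition in examples:
    print(name, "->", looks_tautological(name, definition))
```

With this word list, the two definitions from the slide are flagged and the 'transitive verb' definition is not; a different stopword list would shift the boundary.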

Page 11

Not-so-good examples

Mother tongue (DC-2955): Specifies whether the language is a speaker’s mother tongue

Mother’s language (DC-4516): […] NOT necessarily the mother tongue […]

- There is no definition of the concept ‘mother tongue’
  (Relation with /home language/, /primary language/, /heritage language/?)

- And why ‘speaker’?

Page 12

Rule

Make your definition
• as general as possible
• as specific as necessary

Page 13

Standards

• Within ISOcat there are currently few or no standards

Therefore

• CLARIN NL and VL will set up their own set of ‘standardized DCs’. Ineke will be in charge, selecting new flag “recommended by CLARIN NL/VL”

Page 14

Standards

Another issue with regard to standards 'included' in ISOcat

- Athens Core DCs (recommended by metadata/CMDI): we are currently adapting them in order to avoid tautologies and/or correct smaller ‘errors’

Target language: indicates if the language is the target language

Conversation: […] three or more participants

The same may be necessary for TEI headers, etc.

Page 15

DC/DCS and profile

• Profiles are not added automatically; a DCS may contain elements with various profiles (although you may decide to create several DCSs) (do select proper names!)

• In case the profile you need is not yet available, contact Menzo and Ineke

Page 16

Part B: do’s & don’ts

Do’s:
• Create a DCS for your scheme (name it after the project, annotation scheme, …)
• Provide a clear definition (short, to the point) for your scheme, application, …
• Take care not to leave concepts used in your definition undefined or vague
• Use appropriate vocabulary (per profile)
• Check ‘adopted’ DCs regularly until standardization!

Page 17

Do’s (continued)

When creating a DC, fill out
• Justification: used in XYZ, part of tagset N
• Language section
  – Always an English language section
  – Strong recommendation: sections for the object language(s) and for the working language of the manual
  – Sections in the various languages should match (+/- be translations of each other)
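The checklist above can be read as a small validation routine. The following is a minimal sketch under stated assumptions: `LanguageSection`, `DataCategoryDraft` and `check_draft` are hypothetical names invented here, the field layout is a simplification, and none of this reflects ISOcat's actual data model or interface.

```python
from dataclasses import dataclass, field


@dataclass
class LanguageSection:
    """Hypothetical language section of a draft DC."""
    language: str  # language code such as "en" or "nl"
    name: str
    definition: str


@dataclass
class DataCategoryDraft:
    """Hypothetical draft DC holding only the fields discussed here."""
    justification: str = ""
    language_sections: list = field(default_factory=list)


def check_draft(draft, object_languages=()):
    """Return warnings for a draft DC; an empty list means nothing flagged."""
    warnings = []
    if not draft.justification.strip():
        warnings.append("missing justification")
    present = {section.language for section in draft.language_sections}
    if "en" not in present:
        warnings.append("missing English language section")
    for lang in object_languages:
        if lang not in present:
            warnings.append(f"recommended: add a language section for '{lang}'")
    return warnings


draft = DataCategoryDraft(
    justification="used in XYZ, part of tagset N",
    language_sections=[
        LanguageSection("nl", "overgankelijk werkwoord", "(Dutch definition here)"),
    ],
)
print(check_draft(draft, object_languages=["nl"]))
```

The check that sections "match" across languages is deliberately left out: whether two definitions are translations of each other is a judgment this kind of script cannot make.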

Page 18

Do’s (continued)

When creating a DC, fill out

• Example section
  – Note that *negative* examples may be very helpful! (jongens [boys], mannen [men]; not: gelovigen [believers] (is a form of ADJ))

Page 19

Example sections

Suppose you want to illustrate a German phenomenon:

• Ex.sec. in EN language section
  – German example with translation in English
• Ex.sec. in NL language section
  – German example with translation in Dutch
• Ex.sec. in EN linguistic section
  – EN example
• Ex.sec. in NL linguistic section
  – NL example with translation in English

Page 20

Don’ts

• Confuse Language and Linguistic section
  – The latter contains language-specific values for closed domains
• Be (too) language specific in definition
• Mention scheme in definition
• Use several definitions in one DC
• Circular definitions
• Rely on authority
• Rely on standardized status
  – Definition should fit YOUR scheme, etc.

Page 21

Procedure - 1

Page 22

Procedure - 2
