Languages for aboutness


Upload: ramya

Post on 07-Jan-2016



TRANSCRIPT

Page 1: Languages for aboutness

How do we describe something? What is something about?

– What is the content of an object “about”?

Different methods (Wilson, 1968)
– Counting terms (objective method)
– Complete description/summarization
– Unifying thought(s)
– What stands out (main points)

Challenges
– Non-text
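
The "counting terms" method above can be sketched in a few lines of Python. This is only an illustration of the idea, not a method prescribed by the slides: the stopword list, the function name `top_terms`, and the sample sentence are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical, very small stopword list; real indexing tools use larger ones.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "for", "with"}

def top_terms(text: str, n: int = 3) -> list[tuple[str, int]]:
    """Return the n most frequent non-stopword terms in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return counts.most_common(n)

# Made-up sample text, used only to exercise the function.
sample = "Wine glass and beer glass: a glass for wine, a glass for beer."
print(top_terms(sample, 1))
```

Counting works only on text, which is one reason non-text objects are listed as a challenge.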

Page 2: Languages for aboutness

Languages for aboutness

Indexing languages:
– Terminological tools
  • Thesauri (CV – controlled vocabulary)
  • Subject headings lists (CV)
  • Authority files for named entities (people, places, structures, organizations)
– Classification
– Keyword lists
– Natural language systems (broad interpretation)

Page 3: Languages for aboutness

Aboutness: How to do it!

Read the document [Intellectual reading]
– Look for key features
– Many indexers mark up the items
– Indexers rarely have time to read the whole document

Determine aboutness [Conceptual analysis]

Translate aboutness into the vocabulary or scheme you are using
– In general:
  • Subject headings: 1-3 headings
  • Descriptors: 5-8 descriptors
  • Classification: 1 notation (should it be only one!?)

Page 4: Languages for aboutness

Features of indexing languages:

With the exception of a few general-domain tools, they are generally domain specific.
– MeSH
– NASA Thesaurus
– Astronomy Thesaurus
– ERIC thesaurus
http://www.darmstadt.gmd.de/~lutes/thesoecd.html

Concepts (or concept representations) are arranged in a discernible order

Page 5: Languages for aboutness

Language schema designs

Classified -- grouping
– Hierarchies and facets

MeSH Browser
http://www.nlm.nih.gov/mesh/MBrowser.html

Art and Architecture (Getty AAT)
http://www.getty.edu/research/conducting_research/vocabularies/aat/

Alphabetical -- horizontal
– Verbal/Alphabetical (ordering/filing challenges)

Page 6: Languages for aboutness

Controlled Vocabulary

Why do we have a controlled vocabulary?

Three of you independently identify a new human gene, and each of you gives it a different name.

How do we handle referencing, resolving, and using this concept when it has different names, let alone across languages?!

Page 7: Languages for aboutness

Controlled Vocabulary

A list or a database of subject terms in which each concept has a preferred term or phrase that will be used to represent it in the retrieval tool; the terms not used have references (syndetic structure), and often scope notes. There can be aliases for preferred terms (so that all three of your gene names get recorded and can be matched to the preferred term).
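
As a rough illustration of this definition, the sketch below stores one preferred term per concept and maps every alias back to it. The class name `ControlledVocabulary`, its methods, and the gene names are hypothetical placeholders invented for the example; they are not real gene symbols or an existing library API.

```python
from typing import Optional

class ControlledVocabulary:
    """Maps entry terms (preferred terms and aliases) to one preferred term."""

    def __init__(self) -> None:
        self._preferred: dict[str, str] = {}   # normalized entry -> preferred term
        self.scope_notes: dict[str, str] = {}  # preferred term -> scope note

    @staticmethod
    def _norm(term: str) -> str:
        # Ignore case and extra whitespace when matching entry terms.
        return " ".join(term.lower().split())

    def add(self, preferred: str, aliases: tuple[str, ...] = (),
            scope_note: Optional[str] = None) -> None:
        for entry in (preferred, *aliases):
            self._preferred[self._norm(entry)] = preferred
        if scope_note:
            self.scope_notes[preferred] = scope_note

    def resolve(self, term: str) -> Optional[str]:
        """Return the preferred term for an entry term, or None if unknown."""
        return self._preferred.get(self._norm(term))

cv = ControlledVocabulary()
# Placeholder names: three labs each coined a name; one becomes the preferred term.
cv.add("OFFICIAL-GENE", aliases=("labA-gene", "labB-gene", "labC-gene"))
print(cv.resolve("LabB-Gene"))
```

All three lab names stay recorded, so a search on any of them can be pointed at the preferred term.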

Page 8: Languages for aboutness

Example

For gene names, there is an authority, the HUGO Gene Nomenclature Committee, that designates an official curated name for each gene.

During the research process, however, there may have been multiple initial names.

Page 9: Languages for aboutness

More Examples

Most processes, however, do NOT have standardized naming.

For instance, genetic conditions are not named in one standard way. Doctors treating patients often propose the first name, but expert working groups often later revise it to a more appropriate name.

Page 10: Languages for aboutness

Cont’d

A condition name may be derived from one or more of the following:

The basic genetic or biochemical defect that causes the condition (for example, alpha-1 antitrypsin deficiency);

One or more major signs or symptoms of the disorder (for example, hypermanganesemia with dystonia, polycythemia, and cirrhosis);

The parts of the body affected by the condition (for example, craniofacial-deafness-hand syndrome);

The name of a physician or researcher, often the first person to describe the disorder (for example, Marfan syndrome, which was named after Dr. Antoine Bernard-Jean Marfan);

A geographic area (for example, familial Mediterranean fever, which occurs mainly in populations bordering the Mediterranean Sea); or

The name of a patient or family with the condition (for example, amyotrophic lateral sclerosis, which is also called Lou Gehrig disease after the famous baseball player who had the condition).


Page 11: Languages for aboutness

Thesaurus (structured thesaurus)

Lexical semantic relationships
Composed of indexing terms/descriptors
Descriptors = representations of concepts
Concepts = units of meaning (Svenonius)

Page 12: Languages for aboutness

Thesaurus

Preferred terms
Non-preferred terms
Semantic relations between terms
How to apply terms (guidelines, rules)
Scope notes
Adding terms (how to produce terms that are not listed explicitly in the thesaurus)

Page 13: Languages for aboutness

Preferred Terms

Control form of the term
• Spelling, grammatical form
• Theatre / Theater
• MLA / Modern Language Association

Choose preferred term between synonyms
• Brain cancer or Brain Neoplasms?

Page 14: Languages for aboutness

Common thesaural identifiers

SN Scope Note
– Instruction, e.g. don’t invert phrases
USE Use (another term in preference to this one)
UF Used For
BT Broader Term
NT Narrower Term
RT Related Term
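
One way to picture these identifiers is as fields of a single thesaurus record. The sketch below is an assumed, minimal layout (the `Record` class and its `display` method are hypothetical names); the sample terms come from the examples used elsewhere in these slides.

```python
from dataclasses import dataclass, field

@dataclass
class Record:
    """One preferred term with its thesaural references."""
    term: str
    scope_note: str = ""                                # SN
    used_for: list[str] = field(default_factory=list)   # UF
    broader: list[str] = field(default_factory=list)    # BT
    narrower: list[str] = field(default_factory=list)   # NT
    related: list[str] = field(default_factory=list)    # RT

    def display(self) -> str:
        """Render the record in the conventional SN/UF/BT/NT/RT layout."""
        lines = [self.term]
        if self.scope_note:
            lines.append(f"  SN {self.scope_note}")
        for label, values in (("UF", self.used_for), ("BT", self.broader),
                              ("NT", self.narrower), ("RT", self.related)):
            lines.extend(f"  {label} {value}" for value in values)
        return "\n".join(lines)

print(Record("Nuclear Energy", used_for=["Nuclear Power"]).display())
print(Record("Birds", narrower=["Robins"]).display())
```

USE does not appear as a field because it belongs to the non-preferred term's entry, which simply points at a record like these.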

Page 15: Languages for aboutness

Semantic Relationships

Hierarchy
Equivalence
Association

Page 16: Languages for aboutness


Hierarchies of Meaning

‘Glass’

‘Beer Glass’

‘Wine Glass’

‘Red wine glass’

‘White wine glass’

From: Controlled Vocabularies/ Paul Miller Interoperability Focus UKOLN

Page 17: Languages for aboutness

Hierarchy

Level of generality – both preferred terms

BT (broader term)
– Robins BT Birds

NT (narrower term)
– Birds NT Robins

– Inheritance, very specific rules
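
Because BT and NT are two views of the same link, a tool can store one and derive the other. The sketch below assumes a simple in-memory structure (the `Hierarchy` class and its method names are hypothetical) and also walks up the tree to collect every broader term. The nesting of the glass terms is an assumption read off the "Hierarchies of Meaning" slide.

```python
from collections import defaultdict

class Hierarchy:
    """Stores BT/NT links between preferred terms."""

    def __init__(self) -> None:
        self.bt: dict[str, set[str]] = defaultdict(set)  # term -> broader terms
        self.nt: dict[str, set[str]] = defaultdict(set)  # term -> narrower terms

    def add_bt(self, narrower: str, broader: str) -> None:
        # Recording "narrower BT broader" also records the reciprocal NT link.
        self.bt[narrower].add(broader)
        self.nt[broader].add(narrower)

    def all_broader(self, term: str) -> set[str]:
        """Collect every broader term reachable by following BT links upward."""
        found: set[str] = set()
        frontier = [term]
        while frontier:
            current = frontier.pop()
            for parent in self.bt.get(current, ()):
                if parent not in found:
                    found.add(parent)
                    frontier.append(parent)
        return found

h = Hierarchy()
h.add_bt("Robins", "Birds")
# Assumed nesting: Red wine glass -> Wine Glass -> Glass.
h.add_bt("Wine Glass", "Glass")
h.add_bt("Red wine glass", "Wine Glass")
print(h.nt["Birds"])
print(h.all_broader("Red wine glass"))
```

Following the links upward is what lets a search on a broad term also retrieve items indexed under its narrower terms.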

Page 18: Languages for aboutness


Equivalence

When two or more terms represent the same concept

One is the preferred term (descriptor), where all the information is collected

The other is the non-preferred term, which helps the user find the appropriate preferred term

Page 19: Languages for aboutness

Equivalence

Non-preferred term USE Preferred term
– Nuclear Power USE Nuclear Energy
– Periodicals USE Serials

Preferred term UF (used for) Non-preferred term
– Nuclear Energy UF Nuclear Power
– Serials UF Periodicals
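
USE and UF are reciprocal, so the UF entries can be generated from the USE references. The sketch below shows that inversion with the two pairs from this slide; the function name `invert_use` is a hypothetical choice for the example.

```python
def invert_use(use: dict[str, str]) -> dict[str, list[str]]:
    """Build the UF listing (preferred -> non-preferred terms) from USE references."""
    used_for: dict[str, list[str]] = {}
    for non_preferred, preferred in use.items():
        used_for.setdefault(preferred, []).append(non_preferred)
    return used_for

# USE references: non-preferred term -> preferred term.
use_refs = {
    "Nuclear Power": "Nuclear Energy",
    "Periodicals": "Serials",
}

for preferred, non_preferred in invert_use(use_refs).items():
    for term in non_preferred:
        print(f"{preferred} UF {term}")
```

Keeping only one direction as the source of truth avoids the two listings drifting out of sync.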

Page 20: Languages for aboutness

Association

One preferred term is related to another preferred term

Non-hierarchical “See also” function

In any large thesaurus, a significant number of terms will mean similar things or cover related areas, without necessarily being synonyms or fitting into a defined hierarchy

Page 21: Languages for aboutness

Association

Related Terms (RT) can be used to show these links within the thesaurus
– Bed RT Bedding
– Paint Brushes RT Painting
– Vandalism RT Hostility
– Programming RT Software
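
RT links are usually treated as symmetric: if Bed is related to Bedding, Bedding is related to Bed. The sketch below assumes that convention and records both directions from each pair on this slide; `build_related` is a hypothetical helper name.

```python
from collections import defaultdict

def build_related(pairs: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Record each RT pair in both directions ("see also" works both ways)."""
    related: dict[str, set[str]] = defaultdict(set)
    for first, second in pairs:
        related[first].add(second)
        related[second].add(first)
    return related

rt_pairs = [
    ("Bed", "Bedding"),
    ("Paint Brushes", "Painting"),
    ("Vandalism", "Hostility"),
    ("Programming", "Software"),
]

rt = build_related(rt_pairs)
print(rt["Bedding"])
print(rt["Software"])
```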