
Post on 31-Mar-2015


DIGITIZATION OF RARE LIBRARY MATERIALS

Metadata: Introduction

Mark-up

© Adolf Knoll, National Library of the Czech Republic

[Figure: vehicles marked with country codes (CZ, D, I, LV, LT, OK) and types (car, lorry, motorcycle, aircraft)]

Marked up categories of objects

All are means of transport.

The country is marked (CZ, LV, D, I, OK, LT).

The type is marked (car, motorcycle, lorry, aircraft).

However, the way of marking differs: if the type is a means of road transport, the Czech Republic is marked CZ, but if it is a flying object, it is marked OK.
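The rule on this slide, where a decisive property (road vehicle or aircraft) selects which mark the same country receives, can be sketched in Python. This is a minimal illustration under stated assumptions: the table contains only the Czech example from the slide, and the names `MARKS`, `KIND_OF_TRANSPORT` and `country_mark` are hypothetical.

```python
# Hypothetical lookup table: (country, kind of transport) -> mark.
# Only the Czech example from the slide is included.
MARKS = {
    ("Czech Republic", "road"): "CZ",
    ("Czech Republic", "air"): "OK",
}

# Assumed grouping of the slide's vehicle types into kinds of transport.
KIND_OF_TRANSPORT = {
    "car": "road",
    "lorry": "road",
    "motorcycle": "road",
    "aircraft": "air",
}


def country_mark(country: str, vehicle_type: str) -> str:
    """Return the country mark chosen by the decisive property (kind of transport)."""
    kind = KIND_OF_TRANSPORT[vehicle_type]
    return MARKS[(country, kind)]


print(country_mark("Czech Republic", "lorry"))     # CZ
print(country_mark("Czech Republic", "aircraft"))  # OK
```

The same country yields two different marks because the type, not the country alone, decides how it is marked.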

What drives the mark-up?

The same object can belong to different categories marked differently.

This is done on the basis of the object's properties.

The property considered decisive is taken as the basis for classification.

In many cases, we cannot foresee how many such decisive properties there will be.
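Because the decisive property is chosen per task and cannot always be foreseen, a classifier can take it as a parameter instead of hard-coding it. A minimal Python sketch; the vehicle records and the `classify` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical objects described by a few properties.
vehicles = [
    {"name": "vehicle 1", "country": "CZ", "type": "car"},
    {"name": "vehicle 2", "country": "LV", "type": "car"},
    {"name": "vehicle 3", "country": "CZ", "type": "lorry"},
    {"name": "vehicle 4", "country": "LT", "type": "motorcycle"},
]


def classify(objects, decisive_property):
    """Group objects into categories by the value of one decisive property."""
    groups = defaultdict(list)
    for obj in objects:
        groups[obj[decisive_property]].append(obj["name"])
    return dict(groups)


print(classify(vehicles, "country"))
# {'CZ': ['vehicle 1', 'vehicle 3'], 'LV': ['vehicle 2'], 'LT': ['vehicle 4']}
print(classify(vehicles, "type"))
# {'car': ['vehicle 1', 'vehicle 2'], 'lorry': ['vehicle 3'], 'motorcycle': ['vehicle 4']}
```

The same four objects fall into different categories depending only on which property is passed in as decisive.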

Text processing

To display text on a page, it must be arranged graphically.

If the text is electronic, its characters, organized into sequences separated by blanks, must be controlled by some tool so that they are displayed or printed where we want them.
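One simple tool of this kind is greedy line breaking: blank-separated words are placed on a line until the next word would exceed the line width. A minimal Python sketch; `break_lines` and the width of 20 characters are illustrative assumptions, not the method of any particular editor (Python's standard `textwrap.wrap` offers a ready-made variant).

```python
def break_lines(text: str, width: int) -> list[str]:
    """Greedily fill lines with blank-separated words, up to `width` characters.

    A single word longer than `width` is placed on a line of its own.
    """
    lines: list[str] = []
    current = ""
    for word in text.split():
        if not current:
            current = word                      # first word on a line
        elif len(current) + 1 + len(word) <= width:
            current += " " + word               # word fits after a blank
        else:
            lines.append(current)               # line is full: start a new one
            current = word
    if current:
        lines.append(current)
    return lines


for line in break_lines("To display text on a page, it must be arranged graphically.", 20):
    print(line)
```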

There are several ways to do this:


Page description languages, e.g. PostScript

Text editors:

In obsolete editors, a paragraph is made by breaking the line, adding an empty line and indenting.

In modern editors, a paragraph is made by declaring that a block of text is a ¶paragraph¶.
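The difference between the two kinds of editor can be shown by converting layout-only paragraphs into explicitly marked ones. A minimal Python sketch, assuming that an empty line always separates paragraphs; `mark_paragraphs` is a hypothetical helper, and the ¶ marks simply follow the slide.

```python
import re

# Text as an "obsolete" editor stores it: paragraphs exist only as
# line breaks, an empty line and an indent.
legacy = "   First paragraph of text,\ncontinued here.\n\n   Second paragraph."


def mark_paragraphs(text: str) -> list[str]:
    """Split on empty lines and wrap each block in explicit ¶ marks."""
    blocks = re.split(r"\n\s*\n", text)
    marked = []
    for block in blocks:
        # Join the block's lines and drop the layout indent: only the content stays.
        content = " ".join(line.strip() for line in block.splitlines())
        if content:
            marked.append(f"¶{content}¶")
    return marked


print(mark_paragraphs(legacy))
```

Once the paragraphs are marked, the indent and empty line are no longer part of the text; how a paragraph looks can be decided elsewhere.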

The paragraph is marked, but what should be done with it?

Text processing: Mark-up and behaviour

An object can be marked by a sequence of characters (a Latvian car marked LV) or by a symbol or sign (a Latvian man marked by a sign, a paragraph marked by ¶).

Under certain conditions, we may need to assign some behaviour to objects marked in a certain way. We therefore need behavioural information, stored somewhere, that tells identically marked objects how to behave:


During an ice-hockey championship, men marked by one sign will play against men marked by another.

Cars marked LV will undergo a different mandatory technical inspection than cars marked CZ.
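The key point is that all identically marked objects share one behaviour, and that behaviour is stored apart from the marks themselves. A minimal Python sketch; the rule texts and the `INSPECTION_RULES` table and `behaviour_for` helper are hypothetical.

```python
# Hypothetical behavioural information, kept apart from the marks themselves.
INSPECTION_RULES = {
    "LV": "Latvian mandatory technical inspection",
    "CZ": "Czech mandatory technical inspection",
}


def behaviour_for(mark: str) -> str:
    """Look up how objects carrying a given mark should behave."""
    return INSPECTION_RULES.get(mark, "no rule defined for this mark")


cars = ["LV", "CZ", "LV"]
for mark in cars:
    print(mark, "->", behaviour_for(mark))
```

Changing the rule for LV changes the behaviour of every car marked LV without touching the marks.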


In a good text editor, e.g. MS Word, the formatting of the marked object (the paragraph) is set separately: whether it is indented, how many points of space come before or after it, etc.
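In plain-text terms, this separation might look as follows: the marked paragraphs stay unchanged, and changing one style object changes how all of them are laid out. A minimal Python sketch; `ParagraphStyle` and its fields (counted in spaces and empty lines rather than typographic points) are assumptions, not the settings of MS Word.

```python
from dataclasses import dataclass


@dataclass
class ParagraphStyle:
    """Hypothetical formatting settings, defined once, apart from the text."""
    indent: int = 0         # spaces before the paragraph's text
    space_before: int = 0   # empty lines before the paragraph
    space_after: int = 1    # empty lines after the paragraph


def render(paragraphs: list[str], style: ParagraphStyle) -> str:
    """Apply one style to every marked paragraph."""
    out: list[str] = []
    for text in paragraphs:
        out.extend([""] * style.space_before)
        out.append(" " * style.indent + text)
        out.extend([""] * style.space_after)
    return "\n".join(out)


paragraphs = ["First paragraph.", "Second paragraph."]
print(render(paragraphs, ParagraphStyle(indent=4)))
print(render(paragraphs, ParagraphStyle(space_before=1, space_after=0)))
```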

In the web language HTML, this is analogous: the paragraph is marked as <P>paragraph</P>, while the web browser knows that it must display it on a separate line, after some blank space is left.
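A browser's handling of <P> can be imitated in a few lines with Python's standard `html.parser` module: the mark-up says only that something is a paragraph, and the program decides to show each one on its own line with blank space between. A minimal sketch; `ParagraphCollector` is a hypothetical name, and real browsers apply CSS margins rather than empty lines.

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collect the text inside each <p>...</p> element."""

    def __init__(self):
        super().__init__()
        self.paragraphs: list[str] = []
        self._inside = False

    def handle_starttag(self, tag, attrs):
        if tag == "p":                # tag names arrive lower-cased, so <P> matches too
            self._inside = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.paragraphs[-1] += data


parser = ParagraphCollector()
parser.feed("<P>First paragraph.</P><P>Second paragraph.</P>")
parser.close()
# Display each paragraph on its own line, separated by some blank space.
print("\n\n".join(parser.paragraphs))
```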

What is marked up?

We have seen that objects are marked up.

These objects can be objects from the real world or their representations.

The objects can be represented by their names, which, when written, are mere sequences of characters.

However, they can also be represented by images, symbols or sounds.

[Figure: the object "car"]

Properties of the object INSECT

It may also be necessary to mark other properties of the object that are relevant for grouping or classifying its concrete representations.

[Figure: examples labelled beetle, beetle, fly, butterfly, spider, spider]

Concrete INSECT

A concrete insect can be a beetle, i.e. a ladybird, a goldsmith beetle, a longicorn beetle, a may bug, etc.

This is its name, which differs between languages: in Czech, for example, these beetles are called beruška, zlatohlávek, tesařík and chroust.

However, differences in names do not affect the correctness of the content mark-up.
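Keeping the names apart from the content mark-up makes this independence explicit. A minimal Python sketch; the record layout and the `names_in` helper are hypothetical, and the English and Czech names are those given above.

```python
# Hypothetical records: the content mark-up is language-independent,
# while the names depend on the language.
beetles = [
    {"content": ("INSECT", "BEETLE"), "names": {"en": "ladybird", "cs": "beruška"}},
    {"content": ("INSECT", "BEETLE"), "names": {"en": "goldsmith beetle", "cs": "zlatohlávek"}},
    {"content": ("INSECT", "BEETLE"), "names": {"en": "longicorn beetle", "cs": "tesařík"}},
    {"content": ("INSECT", "BEETLE"), "names": {"en": "may bug", "cs": "chroust"}},
]


def names_in(language: str) -> list[str]:
    """List the beetles' names in one language."""
    return [b["names"][language] for b in beetles]


print(names_in("en"))
print(names_in("cs"))
# The content mark-up is the same whichever language the names are in.
print({b["content"] for b in beetles})  # {('INSECT', 'BEETLE')}
```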

Summing up

When marking an object, we should distinguish between:

the mark-up of the content,

the complementary properties of the marked object,

the names assigned to the object,

the information about how such an object should behave when activated (display, printing, projection, …).
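One way to keep these four kinds of information apart is to give each its own field. A minimal Python sketch; `MarkedObject`, its field names and the sample values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MarkedObject:
    """Hypothetical record keeping the four kinds of information apart."""
    content: list[str]                                        # mark-up of the content
    properties: dict[str, str] = field(default_factory=dict)  # complementary properties
    names: dict[str, str] = field(default_factory=dict)       # names, per language
    behaviour: dict[str, str] = field(default_factory=dict)   # per activation (display, printing, ...)


beetle = MarkedObject(
    content=["INSECT", "BEETLE", "TEXT"],
    properties={"legs": "6"},
    names={"en": "Colorado beetle", "cs": "mandelinka bramborová"},
    behaviour={"display": "bold, centred"},
)

# Changing the behaviour leaves the description of the content untouched.
beetle.behaviour["printing"] = "italic"
print(beetle.content)    # ['INSECT', 'BEETLE', 'TEXT']
print(beetle.behaviour)
```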

How to describe the object?

[Figure: a picture marked INSECT, BEETLE, PICTURE and the text "Colorado Beetle" marked INSECT, BEETLE, TEXT]
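In an XML-like notation, nested marks of this kind could be written as descriptive elements. A minimal Python sketch using the standard `xml.etree.ElementTree` module; the element names simply lower-case the labels in the figure and are not taken from any schema, and the file name is illustrative.

```python
import xml.etree.ElementTree as ET

# Hypothetical descriptive mark-up: the nesting says what the object is,
# not how it should look.
insect = ET.Element("insect")
beetle = ET.SubElement(insect, "beetle")
picture = ET.SubElement(beetle, "picture", src="Image_Colorado_Beetle.gif")
text = ET.SubElement(beetle, "text")
text.text = "Colorado Beetle"

print(ET.tostring(insect, encoding="unicode"))
```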

How to prescribe behaviour to the described object?

The behaviour is prescribed by special rules, in this case formatting rules.

This behaviour is separate from the mark-up of the content.

A formatting rule can only take the representations of the objects and prescribe how they behave when these representations are activated.
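A minimal Python sketch of a formatting rule that looks only at a representation (the text) and turns it into display mark-up. The element names, the `RULES` table and the chosen HTML are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical descriptive source.
source = ET.fromstring(
    "<insect><beetle><text>Colorado Beetle</text></beetle></insect>"
)

# Hypothetical formatting rules: they only see representations (here: text)
# and say how to present them when activated for display.
RULES = {"text": "<p><b>{}</b></p>"}


def format_for_display(root: ET.Element) -> str:
    """Apply a formatting rule to every element whose tag has one."""
    parts = []
    for elem in root.iter():
        rule = RULES.get(elem.tag)
        if rule is not None:
            parts.append(rule.format(escape(elem.text or "")))
    return "".join(parts)


output = format_for_display(source)
print(output)  # <p><b>Colorado Beetle</b></p>
```

The output contains no trace of the insect and beetle elements: the rule prescribed behaviour but did not carry the description of the content along.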

Among the images used in my PowerPoint presentation, there is also this one, marked up in HTML as follows:

<p><img src="Image_Colorado_Beetle.gif" alt="Image of a Colorado Beetle" height="196" width="166" align="center">Colorado Beetle<br>This beetle is very nice.</p>

The problems

In the formatted output here, we have lost the description of the content, which is necessary for other kinds of work.

It follows that such output cannot be taken as the only source data.

However, it can be accepted as one of the possible appearances of the source data.

Source data and access

[Figure: source data and its appearances no. 1, no. 2 and no. 3]

A direct and simple look inside the source data is desirable.
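The idea of several appearances generated from one source that stays readable on its own can be sketched in Python. The three appearance functions and the JSON form of the source are hypothetical choices, not a prescribed format.

```python
import json

# Hypothetical source data: the description of the content is kept intact.
SOURCE = {
    "content": ["INSECT", "BEETLE", "TEXT"],
    "text": "Colorado Beetle",
}


def appearance_html(src: dict) -> str:
    """Appearance no. 1: a web page fragment."""
    return f"<p>{src['text']}</p>"


def appearance_plain(src: dict) -> str:
    """Appearance no. 2: a plain-text heading."""
    return src["text"].upper()


def appearance_catalogue(src: dict) -> str:
    """Appearance no. 3: a catalogue line that still shows the categories."""
    return " / ".join(src["content"]) + ": " + src["text"]


for make_appearance in (appearance_html, appearance_plain, appearance_catalogue):
    print(make_appearance(SOURCE))

# A direct, simple look inside the source data itself:
print(json.dumps(SOURCE, indent=2))
```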
