Post on 13-Jan-2016


Practical Metadata

Kathryn Lybarger

<METADATA>

What is metadata?

“data about data”

Types of Metadata

Descriptive Metadata

Structural Metadata

Administrative Metadata: Preservation Metadata, Rights and Access Metadata, Technical Metadata

Examples

This space intentionally left blank.

Examples

Descriptive, Structural, Administrative

Figure labels: PRIVATE, PUBLIC; A - K, L - Z; PERSONAL, BUSINESS

What does metadata look like?

May be same format as data

Header added by Project Gutenberg

Ebook submitted

Metadata may have different format

WAV audio file

Text metadata

Not all metadata is text

“Okay, it's September the 21, 1987, I'm in Frankfort, Kentucky in the home of Clarence Gunther, who was a World War II veteran, served in the Navy, entering service on March 6, 1940, and separated March 8, 1946 as a boatswain's mate first class. He was at Pearl Harbor…”

Not all metadata is verbal

Metadata may have no structure

"This book I gave to Mary Baxter. After her death, I gave it to Mrs. Spruill. After her death, to Kate Wilson. She never read it, so on a visit to her, I took back for my own reading."

Metadata may have some structure

Word processors allow “document properties”

Anything can go in these fields

File names are metadata

cont-vocab.doc

Some indication of content

File type
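The two hints a file name carries can be pulled apart mechanically. This is a minimal sketch using the slide's own example name; the function name `describe_filename` and the dictionary keys are made up for illustration.

```python
from pathlib import PurePath


def describe_filename(name):
    """Split a file name into the two pieces of metadata it carries."""
    path = PurePath(name)
    return {
        # The stem gives some indication of content.
        "content_hint": path.stem,
        # The extension suggests the file type (it is only a hint).
        "file_type": path.suffix.lstrip("."),
    }


print(describe_filename("cont-vocab.doc"))
```

The extension is a convention, not a guarantee: nothing stops a file named `.doc` from holding some other format.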

Metadata may have rich structure

Example: MARC record

Requires expertise to read and create

Allows very detailed searching

XML: eXtensible Markup Language

Many rich metadata formats are encoded as XML

A schema or DTD specifies rules which a document must follow

Examples: XHTML, EAD, TEI, NDNP

XML: Example

<book isbn="978-0898713619">

  <title>Numerical Linear Algebra</title>

  <author>Lloyd N. Trefethen</author>

  <author>David Bau, III</author>

  <publisher>SIAM</publisher>

</book>

XML: Advantages

“self-describing”

Validation catches many errors

XML tools may be used for any XML language: searching, transformation, communication
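As a rough illustration of a general XML tool at work, the sketch below reads the book record from the XML example with Python's standard `xml.etree.ElementTree` module. The variable names are illustrative, and the same calls would work on any other well-formed XML document.

```python
import xml.etree.ElementTree as ET

# The book record from the XML example slide.
BOOK_XML = """<book isbn="978-0898713619">
  <title>Numerical Linear Algebra</title>
  <author>Lloyd N. Trefethen</author>
  <author>David Bau, III</author>
  <publisher>SIAM</publisher>
</book>"""

book = ET.fromstring(BOOK_XML)

# Attributes and child elements are found by name, which is what
# "self-describing" buys us: no knowledge of byte offsets is needed.
isbn = book.get("isbn")
title = book.findtext("title")
authors = [author.text for author in book.findall("author")]

print(isbn)
print(title)
print(authors)
```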

When is metadata created?

Who creates metadata?

Where is metadata?

Metadata may be inside the data

Physical: Title page, Table of contents, Index

Digital: Header information

Binary data

Header information in an image file

XML metadata
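To make "header information in an image file" concrete, here is a small sketch that reads the pixel dimensions from the first bytes of a PNG file. PNG is chosen only as an assumption for the example (the slides do not name an image format), and `make_png_header` is a hypothetical helper that builds a header in memory so the example needs no file on disk.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_png_header(width, height):
    """Build a PNG signature plus IHDR chunk (no pixel data) for the demo."""
    # IHDR body: width, height, bit depth, colour type, compression,
    # filter method, interlace method.
    body = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)
    chunk = b"IHDR" + body
    crc = struct.pack(">I", zlib.crc32(chunk) & 0xFFFFFFFF)
    return PNG_SIGNATURE + struct.pack(">I", len(body)) + chunk + crc


def read_png_size(data):
    """Return (width, height) stored in the IHDR chunk of PNG bytes."""
    if data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG header")
    return struct.unpack(">II", data[16:24])


header = make_png_header(4800, 6800)
print(read_png_size(header))
```

The width and height sit inside the same file as the pixels, so this metadata travels with the data wherever the file is copied.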

Metadata can be near the data

Title and author on the spine of a book

Associated .txt file with a .wav file

Alternate data streams (Windows)

Metadata can be gathered elsewhere

Card catalog

Index

Search engine

Metadata can be in multiple places

Bee S-50 Earlington, KY 98’ 1892 negative

microfilm

catalog

box lid

How is metadata different from normal data?

No clear distinction!

Metadata is also data

Metadata can have metadata

Meta-metadata?

How much metadata?

Too little metadata?
Different objects may have the same metadata

Too much metadata?
You may never get started
Collection may take too long
Collection may be inconsistent / incomplete

What is good metadata?

Metadata should be accessible

Easy to find

Readable
Physical: legible, permanent
Digital: standard, non-proprietary format

Metadata should be meaningful

Relationship to data should be clear

Digital
Encoded content should be parse-able
XML should be well-formed, valid
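A well-formedness check can be sketched in a few lines with Python's standard library. The function name `is_well_formed` is illustrative. Note that this tests only well-formedness: checking validity against a DTD or schema needs a validating parser, which the standard `xml.etree.ElementTree` module does not provide.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Tags are properly nested and closed.
print(is_well_formed("<book><title>Numerical Linear Algebra</title></book>"))

# The <title> element is never closed, so parsing fails.
print(is_well_formed("<book><title>Numerical Linear Algebra</book>"))
```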

Metadata should be accurate

Adds to recall and precision in searching

Not all metadata is apparent from looking at the data itself

False metadata may lead to false conclusions about the data!

False metadata: Example

Apparent from file:

4800 x 6800 pixels

Metadata: 400dpi

Conclusion:

Image: 12in x 17in

Paper: 11in x 16in

False metadata: Example

Apparent from file:

4800 x 6800 pixels

Metadata: 200dpi

Conclusion:

Image: 24in x 34in

Paper: 22in x 32in
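The arithmetic behind both conclusions is just pixels divided by dots per inch. The sketch below reproduces it; the function name `physical_size_inches` is made up for illustration.

```python
def physical_size_inches(width_px, height_px, dpi):
    """Convert pixel dimensions to inches using the recorded resolution."""
    return width_px / dpi, height_px / dpi


# Same pixel data, two different resolution values in the metadata.
print(physical_size_inches(4800, 6800, 400))  # (12.0, 17.0)
print(physical_size_inches(4800, 6800, 200))  # (24.0, 34.0)
```

The pixel counts are identical in both cases, so the file alone cannot say which size is right. Only the resolution metadata differs, and a wrong value doubles the inferred size of the original.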

OCR: Optical character recognition

An automated process of turning images of letters into (searchable) text

Very common metadata for images of books/newspapers

Often uncorrected, somewhat inaccurate

Uncorrected OCR: Example

THREE DS TRIUMPH

Zacaweista Easily Qutsprints T S

Jordan and Morsun

Stadium Purse Feature of Wash ¬

ington Park ProgramWeather

and Track Conditions Ideal

HOMEWOOD Ill June 2 Zacaweista the

good son of High Time in the Three Ds

Uncorrected OCR: Example

ucrendifi-chdlcogrdpboilli, cuiusmahm pr£chr& iUiusdrtis imprcffos

rie inuentorcsfucre, trddidifftm, utedcretur, exempkrcocilijTriburicrt

Why metadata?

Metadata can be used to identify data

Title / author on the spine of a book

File names

Labels

Metadata can be used to interpret data

Instructions for tax forms

Language / character encoding

XML DTD or schema

Metadata can be used to search data

Card catalog

Index

Search engine

Metadata can be used to manage data

Call numbers

“Burn by” dates on file boxes

Rights and access

Metadata can be used to communicate about data

Finding aids

Abstracts

OAI

Who uses metadata?

Internet users use metadata

Librarians use metadata

Reference

Cataloging

Collection Development

Children use metadata

What is metadata?

You already understand metadata!

</METADATA>

Any questions about metadata?
