
Janez Štebe DDI Experience in ADP (2002)

Arhiv družboslovnih podatkov (ADP)

University of Ljubljana

E-mail: [email protected]

URL: http://www.adp.fdv.uni-lj.si

MOST (UNESCO) and GESIS workshop, Berlin, 22-24 February 2002

Topics of the presentation

A brief history of technical standards and their influence on the organisation of data archives

The adoption of DDI in 1999

Advantages and disadvantages of using an existing but still emerging standard

What are XML and DDI?

Quick look inside DDI DTD document structure

DDI XML Codebooks production line in ADP

Discussion

A brief history of data archives' technical standards (Tanenbaum, Taylor 1990)

Late 1950s – IBM cards

Easily reproduced, recycled – the advent of DA

1960s – electronic computers – end of storage standards

A task of data conversion and interchange – DA matured

Beginning of the WWW era in the early 1990s (DDI Committee, 2001)

CSSDA electronic codebook specification

OSIRIS Codebook Dictionary (SRC, ICPSR)

Standard study description

But lack of coordination resulted in incompatible catalogues

“Midwife function” (Scheuch, 1990)

The role of ZA in the late 1960s, when five new archives were established in Europe:

“offers to share experiences, especially of past errors”

“technical information on data storage and retrieval”

The situation in 1997, when ADP was established

“Multiplicity of classificatory languages, search techniques and standards for documenting data” (DDI Committee, 2001)

Every organisation adopted its own dialect of the existing standards

The CESSDA IDC was the lone example of a still-active integration effort

But... DDI was under discussion

March 1999 – the DDI Beta version became operational

ADP applied for a grant, which secured six months of intensive learning and practice in producing its own XML codebooks

Results:

1. Successful implementation of the first ten XML codebooks

2. An enhanced production line for routine codebook production

2000 - 2001

Preparation of our own XSL for presenting XML codebooks on the internet

March 2000 – DDI DTD Version 1.0 was published

Machine conversion of DDI DTD Beta XML codebooks into version 1.0

Continuing production of XML codebooks

NESSTAR

Meanwhile the NESSTAR tool was being refined in parallel, promising to add functionality to a growing collection of XML codebooks

End of 2001 – configuration of the ADP NESSTAR server catalogue

Advantages and disadvantages of using an existing but still emerging standard

No need to (re)invent local catalogue rules

Cooperation in document production (sharing documents between sites)

A danger of being left alone if others do not adopt the same standard

Less scope to add specific emphasis according to local needs

+/ -

Use of existing and emerging software tools suitable for the standard environment

Virtual catalogue

Conversion tools from SPSS and CAI software files

Dependence on others' timetables for tool development

E.g. NESSTAR was late in fully adopting the UTF-8 convention, which was crucial for us

What is XML?

“XML is to a document’s intellectual content what HTML is to the physical structure of that document” (Thomas, Block 2001)

Why XML?

XML can be written without professional or expert knowledge (user-friendly)

One source document can be presented in multiple formats, e.g. printed book, internet etc.

It can be filled in by different authors, each with specialist knowledge of their subject area. All obey the same content structure.

DDI DTD <> XML?

DTD = XML Document Type Definition

DDI DTD = a specific Data Documentation Initiative definition for XML codebooks

A codebook XML document must be “well-formed” and “valid”

Well-formed

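Any XML document must be well-formed, i.e. in accordance with the XML syntax (HTML written as XHTML is one example)

Main features:

<tags> must be closed</tags>

Names are case-sensitive (UPPER vs. lower)

Each ID value, as in <tag-name ID="id-entry">, may occur only once per document (strictly speaking, this is a validity constraint checked against the DTD)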

Valid = Well formed +

Conforms to a specific DTD
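The well-formedness rules can be tried out with a few lines of Python. This is a minimal sketch using only the standard library; the function name `is_well_formed` and the sample strings are illustrative assumptions. Note that `xml.etree.ElementTree` checks syntax only: it does not validate against a DTD, so the "valid" half of the definition needs a validating parser or an XML editor.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the text parses as XML, False on any syntax error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Illustrative samples (not taken from a real ADP codebook).
closed = "<codeBook><stdyDscr>Study</stdyDscr></codeBook>"
unclosed = "<codeBook><stdyDscr>Study</codeBook>"  # stdyDscr never closed
wrong_case = "<codeBook><stdyDscr>Study</STDYDSCR></codeBook>"  # case mismatch

for sample in (closed, unclosed, wrong_case):
    print(is_well_formed(sample), sample)
```

Only the first sample parses; the other two break the "tags must be closed" and case-sensitivity rules respectively.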

Example: the path in the DOCTYPE declaration calls ...

<!DOCTYPE codeBook SYSTEM "CONFIG10/CODEBOOK-EN.DTD">

<codeBook>

<docDscr> ...

... the file "CONFIG10/CODEBOOK-EN.DTD" (content of the file):

...

<!ELEMENT codeBook (docDscr*, stdyDscr+, fileDscr*, dataDscr*, otherMat*)>

<!ATTLIST codeBook %a.global;>

...
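To make the content model above concrete: `*` means "zero or more" and `+` means "one or more", and the children must appear in the listed order. The sketch below checks only the top-level children of `codeBook` against that one declaration by translating it into a regular expression. It is an illustration of how the DTD line is read, not a substitute for a validating parser; the helper names are assumptions.

```python
import re
import xml.etree.ElementTree as ET

# Content model copied from the <!ELEMENT codeBook ...> declaration:
# each child name is paired with its occurrence indicator.
CONTENT_MODEL = [
    ("docDscr", "*"),
    ("stdyDscr", "+"),
    ("fileDscr", "*"),
    ("dataDscr", "*"),
    ("otherMat", "*"),
]


def model_to_regex(model):
    """Turn [(name, indicator), ...] into a regex over 'name name ...'."""
    parts = [f"(?:{re.escape(name)} ){occ}" for name, occ in model]
    return re.compile("".join(parts))


def children_match(root, model=CONTENT_MODEL):
    """True if the root's direct children follow the content model."""
    sequence = "".join(child.tag + " " for child in root)
    return model_to_regex(model).fullmatch(sequence) is not None


ok = ET.fromstring("<codeBook><docDscr/><stdyDscr/><dataDscr/></codeBook>")
no_study = ET.fromstring("<codeBook><docDscr/><dataDscr/></codeBook>")
bad_order = ET.fromstring("<codeBook><stdyDscr/><docDscr/></codeBook>")

for doc in (ok, no_study, bad_order):
    print(children_match(doc), [child.tag for child in doc])
```

The second sample fails because `stdyDscr+` requires at least one study description; the third fails because `docDscr` may not follow `stdyDscr`.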

What does it all mean?

You do not have to read the “machine-readable” “codebook.DTD” file to fill in an XML codebook:

An XML editor checks well-formedness and document validity

It helps you choose appropriate elements in accordance with the DTD while editing

A “human-readable” Tag Library consists of element definitions with practical examples. It gives you guidance on the type and form of information

Let’s look

Inside DDI DTD document structure...

Integrates different levels of information in the same document:

docDscr (XML document and sources description)

stdyDscr (overall study + study-level references)

fileDscr (physical data files)

dataDscr (variables)

otherMat (additional material for variables documentation)

It specifies both...

The content of a catalogue – suitable as input to a virtual catalogue spanning different sites and produced on various platforms.

The content of codebook (variables description) – suitable as input to “virtual library of all individual measurements in the studies in a collection”

The dilemma of the Library vs. Data service concept (Scheuch, 1990)

Library: the unit of storage is the “study”

Data service: the unit of storage is the variable

In a DDI DTD XML codebook you can integrate meta-information about...

Intellectual content of a study

Its scope

Methodological details

Retrieval and dissemination policies

File location and format

(+) References to accompanying documents, e.g.

Reports on methodology,

Publications,

Classification lists,

Questionnaires and similar,

Computer syntax files,

Tables of results, etc.

(+) Hyperlink cross-references inside and outside document

The use of ID and IDREFS attributes

The use of URI attributes
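Cross-references of this kind only work if every reference points at an ID that actually exists in the document. A validating parser checks this automatically; the sketch below shows the idea by hand. The attribute name `files` in the sample is used purely for illustration, so the real IDREFS attribute names should be taken from the Tag Library.

```python
import xml.etree.ElementTree as ET


def dangling_refs(root, ref_attrs):
    """List (tag, attribute, value) for references with no matching ID."""
    ids = {el.get("ID") for el in root.iter() if el.get("ID")}
    missing = []
    for el in root.iter():
        for attr in ref_attrs:
            # An IDREFS attribute holds a whitespace-separated list of IDs.
            for ref in el.get(attr, "").split():
                if ref not in ids:
                    missing.append((el.tag, attr, ref))
    return missing


# Illustrative fragment: V2 refers to a file description that does not exist.
sample = ET.fromstring(
    "<codeBook>"
    '<fileDscr ID="F1"/>'
    '<dataDscr><var ID="V1" files="F1"/><var ID="V2" files="F9"/></dataDscr>'
    "</codeBook>"
)

print(dangling_refs(sample, ["files"]))
```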

To sum up:

XML is similar to HTML in that it is:

Easy to use,

Broadly accessible,

Hyper-textual

In addition it has:

A structure of document content that both computers and humans can read and understand

DDI XML Codebooks production line in ADP

First step:

1. Basic information about the new data set file, the depositor, and accompanying material is first entered in the ADP Inventory book (an Access database)

2. After choosing the best-suited predefined XML DDI codebook template, we extract the information from the Access database into the draft XML codebook

3. The resulting codebook is moved to an Internet catalogue for quick information about the new study; viewing is supported by a referenced XSL stylesheet in IE5 or later.
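The template-filling in item 2 can be sketched as follows. This is not ADP's actual tool: the inventory record is a plain dictionary standing in for a database row, its field names and values are hypothetical, and the element paths reflect one reading of the DDI study citation that should be checked against the Tag Library.

```python
import xml.etree.ElementTree as ET

# A cut-down stand-in for a predefined codebook template.
TEMPLATE = """<codeBook>
  <stdyDscr>
    <citation>
      <titlStmt>
        <titl/>
        <IDNo/>
      </titlStmt>
      <distStmt>
        <depositr/>
      </distStmt>
    </citation>
  </stdyDscr>
</codeBook>"""

# Hypothetical inventory field -> element path inside the template.
FIELD_TO_PATH = {
    "title": "stdyDscr/citation/titlStmt/titl",
    "study_id": "stdyDscr/citation/titlStmt/IDNo",
    "depositor": "stdyDscr/citation/distStmt/depositr",
}


def draft_codebook(record):
    """Copy inventory fields into a fresh copy of the template."""
    root = ET.fromstring(TEMPLATE)
    for field, path in FIELD_TO_PATH.items():
        root.find(path).text = record[field]
    return root


# Hypothetical inventory record.
record = {"title": "Example study", "study_id": "EX01",
          "depositor": "Example depositor"}

draft = draft_codebook(record)
print(ET.tostring(draft, encoding="unicode"))
```

Keeping the field-to-path mapping in one table means a different template only needs a different mapping, not different code.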

Second step: Full Study description

1. The depositor is asked to fill in an MS Word form containing elements that correspond to the DDI DTD study description section

2. The draft XML codebook from the previous step is edited with the XMetaL® XML editor. Missing pieces of information are added manually

Third step: Codebook Data description generated from SPSS data file

1. The final SPSS data file, if fully labelled, is converted with the NSD XML Generator ® to an XML data description section of the DDI codebook and integrated with the previous study description
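What such a conversion produces can be illustrated with a small sketch. It does not reproduce the NSD XML Generator; it only shows how variable names, variable labels and value labels of the kind stored in a labelled SPSS file map onto a simplified `dataDscr` section. The input dictionaries are hypothetical, and the element names should be checked against the Tag Library.

```python
import xml.etree.ElementTree as ET


def build_data_dscr(variables):
    """Build a simplified dataDscr element from labelled variables."""
    data_dscr = ET.Element("dataDscr")
    for number, variable in enumerate(variables, start=1):
        var = ET.SubElement(data_dscr, "var",
                            ID=f"V{number}", name=variable["name"])
        ET.SubElement(var, "labl").text = variable["label"]
        # One catgry element per labelled value.
        for value, text in variable["values"].items():
            catgry = ET.SubElement(var, "catgry")
            ET.SubElement(catgry, "catValu").text = str(value)
            ET.SubElement(catgry, "labl").text = text
    return data_dscr


# Hypothetical labelled variables, as they might be read from a data file.
variables = [
    {"name": "sex", "label": "Sex of respondent",
     "values": {1: "Male", 2: "Female"}},
    {"name": "vote", "label": "Voted in last election",
     "values": {1: "Yes", 2: "No", 9: "No answer"}},
]

data_dscr = build_data_dscr(variables)
print(ET.tostring(data_dscr, encoding="unicode"))
```

A variable without value labels simply yields a `var` element with no `catgry` children, which is why the source stresses that the SPSS file should be fully labelled.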

Fourth step: Codebook Data description with full question text

1. For the most important data sets, the full question text is entered into the dataDscr section from the original questionnaire text file

or

2. by using a conversion tool from CAI computer-readable files to DDI XML files.

Finally NESSTAR ®

The two final documents, the Slovene and English language DDI XML codebooks, are converted into a NESSTAR-compliant format and published, together with the data file, in a NESSTAR catalogue.

[Diagram: the ADP codebook production flow. Labelled parts: original paper documents; free-text documents; stdyDscr form filled in by depositor; SPSS data + labels and CAI questionnaires with conversion tools; Codebook.xml (XML editor) with docDscr, stdyDscr, fileDscr, dataDscr, otherMat; Code-book.dtd and Tag Library; Codebook.xsl; IE view, printed codebook, NESSTAR Catalogue + Data Explorer. Layers are marked as computer readable, human + computer readable, and human readable.]

Common issues in DDI XML codebooks production

1. XML editors do not necessarily support Unicode

2. The use of entities in an XML document helps to standardise document production and makes it faster and easier to translate into English
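The entity mechanism in item 2 can be sketched like this: a recurring phrase is declared once in the document's internal DTD subset and referenced wherever it is needed, so an English version only has to change the declaration. The entity name `archive`, the element paths and the English wording are illustrative assumptions; the Slovene name also shows why Unicode support matters.

```python
import xml.etree.ElementTree as ET

# The archive name is declared once and referenced twice via &archive;.
DOCUMENT = """<!DOCTYPE codeBook [
  <!ENTITY archive "{name}">
]>
<codeBook>
  <docDscr><citation><prodStmt>
    <producer>&archive;</producer>
  </prodStmt></citation></docDscr>
  <stdyDscr><citation><distStmt>
    <distrbtr>&archive;</distrbtr>
  </distStmt></citation></stdyDscr>
</codeBook>"""


def expand(name):
    """Parse the document with the entity set to the given name."""
    root = ET.fromstring(DOCUMENT.format(name=name))
    producer = root.find("docDscr/citation/prodStmt/producer").text
    distributor = root.find("stdyDscr/citation/distStmt/distrbtr").text
    return producer, distributor


slovene = expand("Arhiv družboslovnih podatkov (ADP)")
english = expand("Social Science Data Archive (ADP)")  # assumed translation
print(slovene)
print(english)
```

The parser replaces each `&archive;` reference with the declared text, so both elements stay consistent without being edited separately.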

Conclusions:

The DDI DTD is receiving growing attention in the community, which guarantees the production of new tools that enhance its use

Despite continuing developments and overlapping archival standards, DDI 1.0 as today’s technology promises the longevity of XML Codebook 1.0 documents

The Slovene ADP has used its experience with DDI to guide its own organisation.

Main references

DDI Committee (2001): The Data Documentation Initiative (DDI) Version 1.1: The New Specification for Social Science Metadata. Project Description.

Data Documentation Initiative. A Project of a Social Science Community. (2002) http://www.icpsr.umich.edu/DDI

Scheuch, Erwin K. (1990): From a data archive to an infrastructure for the social sciences. International Social Science Journal, No. 123, pp. 93-111.

Tanenbaum, Eric and Marcia Taylor (1990): Developing social science archives. International Social Science Journal, No. 124?.

Thomas, Wendy L. and William C. Block (2001): An Introduction to the Data Documentation Initiative (DDI). ICPSR OR Meeting 2001. http://www.icpsr.umich.edu/DDI/PAPERS/