Transcript
Page 1: Part of the UK Data Archive and the Arts and Humanities Data Service. Funded by the Joint Information Systems Committee and the Arts and Humanities Research

Good Design - © History Data Service 1

Part of the UK Data Archive and the Arts and Humanities Data Service. Funded by the Joint Information Systems Committee and the Arts and Humanities Research Board.

Databases of Historical Sources: Principles of Good Design

Mark Merry

History Data Service

History as a product

“History may be thought of as either product or process. As a product, a piece of history consists of a representation of a past reality based upon the interpretation of a body of known facts. Such representations of past realities are always bounded: they treat a subject chosen by the historian which might be static (the situation at point x) or dynamic (how the situation changed between points x and y).”

Harvey and Press, Databases in Historical Research (1996)

Chapter 1

What is a database?

• “Basically a computerized record keeping system - that is, a system whose overall purpose is to maintain information and to make that information available on demand”

C.J. Date, An Introduction to Database Systems. Vol I. Seventh edition (1999)

• A Database Management System (DBMS) is the computer application built around a database to provide flexible ways of storing, manipulating, and examining the data

• A DBMS on a personal computer will provide facilities for:
  – inputting, sharing, modifying, retrieving and deleting data
  – querying the data (SQL)
  – producing reports based on the data
  – building ‘front-ends’ for users
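
As a rough illustration of these facilities, the sketch below uses Python's built-in sqlite3 module as a stand-in for a personal-computer DBMS. The table and column names are assumptions made for this example, and the misspelt occupation is inserted deliberately so that there is something to modify.

```python
import sqlite3

# An in-memory database; the 'person' table and its columns are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person ("
    "id INTEGER PRIMARY KEY, surname TEXT, forename TEXT, occupation TEXT)"
)

# Inputting data ('SAWYR' is a deliberate keying error, corrected below).
conn.executemany(
    "INSERT INTO person (surname, forename, occupation) VALUES (?, ?, ?)",
    [
        ("SNOOK", "GEORGE", "POLICEMAN"),
        ("DEAN", "JAMES", "SAWYR"),
        ("EDMONDS", "EMILY", "SERVANT"),
    ],
)

# Modifying data: correct the keying error.
conn.execute(
    "UPDATE person SET occupation = ? WHERE occupation = ?", ("SAWYER", "SAWYR")
)

# Deleting data.
conn.execute("DELETE FROM person WHERE surname = ?", ("EDMONDS",))

# Querying (retrieving) the data with SQL.
rows = conn.execute(
    "SELECT surname, forename, occupation FROM person ORDER BY surname"
).fetchall()

# Producing a very simple report based on the data.
report = "\n".join(f"{s}, {f}: {o}" for s, f, o in rows)
print(report)
```

A full DBMS adds sharing, form-based front-ends and report designers on top of these basic operations.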

What can databases do for us?

• Store and organise large amounts of information

• Provide a surrogate for the original (inaccessible, fragile) source

• Group physically dispersed material together at one virtual location

• Provide an environment for manipulating and analysing the content of the original source

• Search, filter and summarise complex information quickly
  – Analysis of large amounts of data
  – Analysis of complex interrelated data

• Downside: the time and effort needed to convert original sources into a database

Source → Digitisation → Resource

• The ‘input channels’ of digitisation (keyboard, scanner etc.) are narrow and can only capture a small proportion of the source’s information content

identify aspects of source to digitise

choose digitisation method

choose data model

Organising data

• The field is the basic unit of data in a database. A field stores a single piece of information of a particular data type

• Fields are combined to form a record. A record matches an entity

• A set of records with the same fields is collected together in a table
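
The sketch below shows one way to see fields, records and tables in practice, using Python's built-in sqlite3 module. The table name, the column names and the n_members field are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Each column is a field with a declared data type.
conn.execute("""
    CREATE TABLE household (
        hhold_n   TEXT PRIMARY KEY,  -- household identifier, e.g. 'A010'
        address   TEXT,
        n_members INTEGER            -- hypothetical numeric field
    )
""")

# Each inserted row is a record describing one entity (here, one household).
conn.execute("INSERT INTO household VALUES (?, ?, ?)", ("A010", "PINNER COMMON", 3))
conn.execute("INSERT INTO household VALUES (?, ?, ?)", ("A013", "PINNER WOOD", 4))

# The fields: (name, declared type) for each column of the table.
fields = [(col[1], col[2]) for col in conn.execute("PRAGMA table_info(household)")]

# One record: the fields that together describe household A010.
record = conn.execute(
    "SELECT * FROM household WHERE hhold_n = ?", ("A010",)
).fetchone()

# The table: every record that shares this set of fields.
table = conn.execute("SELECT * FROM household ORDER BY hhold_n").fetchall()

print(fields)
print(record)
print(table)
```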

Data models

• Data models are the abstract definitions of structures and relationships used to organise data

• A DBMS will implement a particular data model

• Data models can be characterised by how they organise the connections between different records:
  – flat file
  – relational
  – mark-up (hierarchical)

• Most DBMSs available for personal computers are either flat file or relational

Relationships

• One to one relationships connect one entity to one other entity

• One to many relationships connect one entity to one or more other entities

• Many to many relationships connect many entities to many other entities
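
In a relational DBMS these three kinds of relationship are usually expressed through keys. The sketch below is one possible arrangement, written with Python's built-in sqlite3 module; the 'will' and 'deed' tables are hypothetical entities invented for the example and are not part of the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE household (hhold_n TEXT PRIMARY KEY, address TEXT);

    -- One to many: one household has many persons (key held on the 'many' side).
    CREATE TABLE person (
        person_id INTEGER PRIMARY KEY,
        hhold_n   TEXT REFERENCES household(hhold_n),
        name      TEXT
    );

    -- One to one: UNIQUE stops a person having more than one will.
    CREATE TABLE will (
        will_id   INTEGER PRIMARY KEY,
        person_id INTEGER UNIQUE REFERENCES person(person_id)
    );

    -- Many to many: a link table pairs persons with deeds.
    CREATE TABLE deed (deed_id INTEGER PRIMARY KEY);
    CREATE TABLE deed_party (
        deed_id   INTEGER REFERENCES deed(deed_id),
        person_id INTEGER REFERENCES person(person_id),
        PRIMARY KEY (deed_id, person_id)
    );

    INSERT INTO household VALUES ('A010', 'PINNER COMMON');
    INSERT INTO person VALUES (1, 'A010', 'GEORGE SNOOK'), (2, 'A010', 'ANN SNOOK');
    INSERT INTO will VALUES (1, 1);
    INSERT INTO deed VALUES (1), (2);
    INSERT INTO deed_party VALUES (1, 1), (1, 2), (2, 1);
""")

# A second will for person 1 breaks the one-to-one rule and is rejected.
try:
    conn.execute("INSERT INTO will VALUES (2, 1)")
    second_will_allowed = True
except sqlite3.IntegrityError:
    second_will_allowed = False

# Many to many in both directions: parties per deed, and deeds per person.
parties_per_deed = conn.execute(
    "SELECT deed_id, COUNT(*) FROM deed_party GROUP BY deed_id ORDER BY deed_id"
).fetchall()
deeds_per_person = conn.execute(
    "SELECT person_id, COUNT(*) FROM deed_party GROUP BY person_id ORDER BY person_id"
).fetchall()

print(second_will_allowed, parties_per_deed, deeds_per_person)
```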

Relationships example

H'hold_n  ADDRESS
A010      PINNER COMMON
A011      PINNER COMMON
A012      PINNER COMMON
A013      PINNER WOOD
A014      PINNER GREEN TOLL

H'hold_n  SURNAME    FORENAME      OCCUNAME
A010      SNOOK      GEORGE        POLICEMAN
A010      SNOOK      ANN
A010      SNOOK      SARAH HANNAH
A011      DEAN       JAMES         SAWYER
A011      DEAN       MARGARET
A012      ROBERTSON  MARIA         INDEPENDENT LADY
A012      EDMONDS    EMILY         SERVANT
A013      CRAWLEY    GEORGE        AG LAB
A013      CRAWLEY    MARY ANN
A013      CRAWLEY    CAROLINE
A013      CRAWLEY    ELIZABETH

OCCUCODE  OCCUNAME
(blank)   SCHOLAR AT HOME
2PP13     SCHOOL MISTRESS
2PP13     SCHOOLMISTRESS
4DS1      SERVANT
4DS3      SERVANT AND GROOM
3AG1      SHEPHERD
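
A minimal sketch of how tables like these might be queried together, using Python's built-in sqlite3 module and a subset of the rows shown. The SQL table names, and the choice of hhold_n and occuname as the linking fields, are assumptions based on the column headings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE household (hhold_n TEXT PRIMARY KEY, address TEXT);
    CREATE TABLE person (
        hhold_n  TEXT REFERENCES household(hhold_n),
        surname  TEXT,
        forename TEXT,
        occuname TEXT
    );
    CREATE TABLE occupation (occuname TEXT PRIMARY KEY, occucode TEXT);
""")
conn.executemany("INSERT INTO household VALUES (?, ?)", [
    ("A010", "PINNER COMMON"),
    ("A012", "PINNER COMMON"),
    ("A013", "PINNER WOOD"),
])
conn.executemany("INSERT INTO person VALUES (?, ?, ?, ?)", [
    ("A010", "SNOOK", "GEORGE", "POLICEMAN"),
    ("A010", "SNOOK", "ANN", None),
    ("A012", "ROBERTSON", "MARIA", "INDEPENDENT LADY"),
    ("A012", "EDMONDS", "EMILY", "SERVANT"),
    ("A013", "CRAWLEY", "GEORGE", "AG LAB"),
])
conn.executemany("INSERT INTO occupation VALUES (?, ?)", [
    ("SERVANT", "4DS1"),
    ("SHEPHERD", "3AG1"),
])

# One to many: each person picks up the address of their household.
# The LEFT JOIN keeps people whose occupation has no code in the lookup table.
people = conn.execute("""
    SELECT p.surname, p.forename, h.address, o.occucode
    FROM person AS p
    JOIN household AS h ON h.hhold_n = p.hhold_n
    LEFT JOIN occupation AS o ON o.occuname = p.occuname
    ORDER BY p.hhold_n, p.rowid
""").fetchall()

# Number of recorded persons in each household.
household_sizes = conn.execute(
    "SELECT hhold_n, COUNT(*) FROM person GROUP BY hhold_n ORDER BY hhold_n"
).fetchall()

for row in people:
    print(row)
print(household_sizes)
```

Storing the address once per household, rather than once per person, is what separates this relational layout from a flat file.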

Databases of historical sources

• An historical source based database is a representation of the original source, but it is not an exact replica of the original source
  – information may be left out
  – extra information may be included

• An historical database should:
  – try to reflect the source accurately and completely
  – improve the usability of the source
  – integrate the source with other data (additional sources, coding etc.)

• NB: these are conflicting aims!

• An historical source based database mixes elements of a primary source with elements of a secondary source

Building databases from historical sources

• Historians work with information they do not control
  – incomplete, poorly structured information of varying quality
  – sources intended for a different purpose
  – multiple sources not intended to be used in an integrated way

• Nature of historical sources
  – ambiguity: the meaning of material may be unclear or dependent on its context
  – repetition: data is often repeated in different guises
  – variation: the same item can be referred to using a variety of terms and spellings
  – variable structure: even apparently well organised sources often have margin notes and other types of ‘random’ additional data

Source → information content

• Simplify the source
  – Ignore unwanted information
  – Exclude certain types of information
  – Select information directly from the source or define a set of summarised information based on the source

• Model the information content subset
  – Break information content into discrete elements of information
  – Describe the characteristics of each information element
  – Describe how information elements relate to each other

• Successful source analysis requires a good understanding of the source and of the purpose of the database

Source → database content

[Figure: a source page labelled with its features: fonts, columns, text, spacing, page size, date, issue, fold line, marginalia, headlines]

Software & hardware

• Technical decisions are often the least important

• Remember that there is nearly always more than one way of doing something with a computer

• Define what you need to do, then seek technical advice

• Seek a second opinion!
  – Technical support staff will often suggest what is most convenient for them, not necessarily you
  – Commercial companies obviously have their own motives

• Look for software that supports common standards

• Avoid little-used software with proprietary features

• Recognise that hardware may need to be replaced in 2 or 3 years

The ‘Three Layer Model’

Standardisation Layer
• provides a foundation for analysing the data
• codes and standardisation rules are applied

Source Layer
• an accurate digital representation of the source
• defines level of detail captured

Interpretation Layer
• incorporates researcher’s knowledge and judgement
• links records and forms aggregates

‘Three Layer’ design examples

Source                 Standardise            Interpretation
6 mnths                0.5                    infant
ag. lab.               agricultural labourer  farm occupations
J. Smith               J. ? Smith             John Smith, Baker
John A. Smith, baker   J. A. Smith
J.A. Smith             J. A. Smith
Smith & Son Bakers     ? ? Smith
Mdlborough             Middlesbrough          Middlesbrough, Yorkshire
Mdsbro
Meddlesbro
Medelsbro
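
The examples above can be sketched in Python as a set of lookup rules that carry the source value through all three layers without overwriting it. The lookup entries come from the examples; the function names, the age-parsing rule and the one-year threshold for 'infant' are assumptions made for this sketch.

```python
# Standardisation and interpretation rules held as lookup tables.
OCCUPATION_STANDARD = {"ag. lab.": "agricultural labourer"}
OCCUPATION_GROUP = {"agricultural labourer": "farm occupations"}
PLACE_STANDARD = {
    variant: "Middlesbrough"
    for variant in ("Mdlborough", "Mdsbro", "Meddlesbro", "Medelsbro")
}
PLACE_INTERPRETED = {"Middlesbrough": "Middlesbrough, Yorkshire"}


def standardise_age(source_age):
    """Turn an age such as '6 mnths' into a number of years (assumed format)."""
    number, unit = source_age.split()
    value = float(number)
    if unit.lower().startswith(("mn", "mo")):  # 'mnths', 'months', ...
        return value / 12
    return value


def interpret_age(age_years):
    """Assumed rule: anyone under one year old counts as an infant."""
    return "infant" if age_years < 1 else "other"


def three_layers(source_value, standardise, interpret):
    """Keep the source value and add the two derived layers beside it."""
    standard = standardise(source_value)
    return {
        "source": source_value,
        "standardised": standard,
        "interpretation": interpret(standard),
    }


age = three_layers("6 mnths", standardise_age, interpret_age)
occupation = three_layers("ag. lab.", OCCUPATION_STANDARD.get, OCCUPATION_GROUP.get)
place = three_layers("Mdsbro", PLACE_STANDARD.get, PLACE_INTERPRETED.get)

print(age)
print(occupation)
print(place)
```

Because each layer sits in its own field, a standardisation or interpretation decision can be revised later without returning to the original document.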

Some simple design hints

• The smallest unit of data should match the smallest unit of analysis
  – if you want to look at people by last name then have separate first and last name fields, not just a name field

• Don’t mix data types
  – separate numbers and words
  – identify numbers being used as words (addresses)

• Document everything you do, either in the database or with the database
  – data entry, data standardisation and coding, data transformations, limits of data, issues of plausibility/probability, coping with uncertainty etc
  – keep information that tracks the origin and history of the database.

• Add information, don’t delete information

• Have a backup procedure!
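
The first two hints can be illustrated with a small Python record type. The field names are assumptions, and the house numbers and ages are made-up values used only to show the data types.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Person:
    forename: str      # separate name fields allow analysis by surname
    surname: str
    house_number: str  # a number used as a word: kept as text, e.g. '12A'
    age: int           # a true number: can be counted and averaged


# Names follow the census-style examples; house numbers and ages are invented.
people = [
    Person("GEORGE", "CRAWLEY", "12A", 41),
    Person("CAROLINE", "CRAWLEY", "12A", 9),
    Person("JAMES", "DEAN", "7", 35),
]

# Analysis by last name works because the surname has its own field.
surname_counts = Counter(p.surname for p in people)

# Arithmetic only makes sense on the genuinely numeric field.
mean_age = sum(p.age for p in people) / len(people)

print(surname_counts)
print(mean_age)
```

Storing house_number as text avoids losing values such as '12A' and discourages meaningless arithmetic on addresses.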

Further Information

• History Data Service
  web - http://hds.essex.ac.uk
  email - [email protected]

• Michael J. Hernandez, Database Design for Mere Mortals: A Hands-On Guide to Relational Database Design, Addison-Wesley, 1997

• Charles Harvey & Jon Press, Databases in Historical Research, Macmillan Press, 1996

• C. J. Date, An Introduction to Database Systems, Addison-Wesley, 1999 (7th ed.)

• SearchDatabase.Com: http://searchdatabase.techtarget.com/

• Concordia University: http://www.cuaa.edu/computing/softrain/access/access15.shtml

• University of Newcastle Database Service: http://www.ncl.ac.uk/ucs/databases/

Source Layer

• Acts as the reference version of the original source.
  – An accurate representation of the source, including errors, omissions etc.
  – Contents determine the highest level of detail available about the source in the database
  – Includes a reference to the non-digital original source
  – Includes a unique identifier for each item

• Implementation:
  – as long text fields containing full text transcriptions
  – as ‘blob’ fields containing scanned images

Standardisation Layer

• Organises the information into discrete units with fully defined contents
  – Separates information in the source into separate fields according to data type and data content
  – Simplifies the data by standardising and coding it
  – Normalises the data
  – Includes links back to the source layer

• Implementation:
  – Possibly as additional columns in source layer tables
  – Probably as separate tables with, ideally, a one-to-one relationship to records in the source layer
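
One possible implementation of the 'separate tables' option is sketched below with Python's built-in sqlite3 module. The table names, column names, source reference and transcription are all hypothetical and chosen only to show the one-to-one link between the two layers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Source layer: the transcription exactly as found, with a unique
    -- identifier and a reference to the non-digital original.
    CREATE TABLE source_entry (
        entry_id      INTEGER PRIMARY KEY,
        source_ref    TEXT NOT NULL,
        transcription TEXT NOT NULL
    );

    -- Standardisation layer: at most one row per source record, because
    -- entry_id is both the primary key and the link back to the source layer.
    CREATE TABLE standard_entry (
        entry_id   INTEGER PRIMARY KEY REFERENCES source_entry(entry_id),
        occupation TEXT,
        place      TEXT
    );
""")

conn.execute(
    "INSERT INTO source_entry VALUES (?, ?, ?)",
    (1, "hypothetical register, folio 3", "J. Smith, ag. lab., Mdsbro"),
)
conn.execute(
    "INSERT INTO standard_entry VALUES (?, ?, ?)",
    (1, "agricultural labourer", "Middlesbrough"),
)

# The standardised values can always be traced back to the original wording.
row = conn.execute("""
    SELECT s.transcription, t.occupation, t.place
    FROM standard_entry AS t
    JOIN source_entry AS s USING (entry_id)
""").fetchone()

print(row)
```

Keeping the layers in separate tables leaves the source transcription untouched while the standardised values are revised.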

Interpretation Layer

• Creates historical entities from the data and the knowledge and expertise of the historian
  – Incorporates interpolations and extrapolations from the data in the standardisation layer
  – Selectively includes and excludes information from the standardisation layer
  – Links separate records to form entities such as ‘individuals’ or ‘households’

