Dr Anna Bülow, Head of Preservation
1 October 2011, Celebrating the Census
Preparing the 1911 census for digitisation
1911 Census (RG 14)
• Census of England and Wales of 2 April 1911
• 34,998 volumes
• Arranged according to geographical district
• Approximately 8 million schedules
• 530 x 315 mm (bigger than A3)
• Written on both sides – official address on one side, details of people at that address on the other side
• Enumerator’s Summary Books (RG 78) (2,015 pieces)
Background: how to go about scanning the 1911 census?
• Single supplier contract
o consortium possible, but 1 lead supplier
• 2 contracts
o 1 supplier to scan
o 1 supplier for online development and maintenance
• Total in-house development
o TNA to develop and manage all contracts for scanning, online service and support
• Service management contract
o TNA to let contract for development and management
Contracting a supplier
• OJEU notice (Official Journal of the European Union)
• Tender throughout Europe
• Conditions of performance (amongst others)
o scanning must cause absolute minimum of damage
o records must be kept safe and secure at all times
• Competitive dialogue
• Contract awarded to ScotlandOnline, later BrightSolid
o scanning subcontracted
o transcription subcontracted
Condition - appearance
• Extremely consistent
• Standard volume:
o 4 holes along the edge of the spine
o schedules held in place through 2 long green tags, with 2 bows on top
o soft linen spine
o hard cover
o belt riveted to cover
Condition - damage
• 1911 census was accessioned in 1966
• Closed volumes were stored off-site
• Not boxed
• Water damage and subsequent mould growth
• Unclear when damage occurred
• Damage had to be dealt with to
o ensure optimal image quality
o minimise risk during handling
o prevent health & safety risks
Surveying
• 4 staff surveyed between 19-23 July 2004
• Statistical sample: confidence level of 95%
• 403 volumes (every 87th volume)
• Focus on ease of scanning, image quality, and Health & Safety (mould)
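The sampling figures above can be checked with a short sketch. It assumes the survey took every 87th volume starting from the first volume, and it estimates a worst-case margin of error using the normal approximation with a finite population correction. Neither the starting point nor a margin of error is stated in the slides, and the function names are illustrative only.

```python
import math

TOTAL_VOLUMES = 34_998   # size of the RG 14 series (from the slides)
STEP = 87                # every 87th volume (from the slides)


def systematic_sample(total, step, start=1):
    """Return the volume positions picked by taking every `step`-th volume.

    `start=1` is an assumption; the slides do not say where sampling began.
    """
    return list(range(start, total + 1, step))


def margin_of_error(sample_size, population, z=1.96, p=0.5):
    """Worst-case margin of error (p = 0.5) at roughly 95% confidence (z = 1.96).

    Uses the normal approximation with a finite population correction.
    This is an assumed textbook formula, not one given in the slides.
    """
    standard_error = math.sqrt(p * (1 - p) / sample_size)
    fpc = math.sqrt((population - sample_size) / (population - 1))
    return z * standard_error * fpc


sample = systematic_sample(TOTAL_VOLUMES, STEP)
moe = margin_of_error(len(sample), TOTAL_VOLUMES)
print(f"Sample size: {len(sample)} volumes")
print(f"Worst-case margin of error: {moe:.1%}")
```

Under these assumptions the sample comes to 403 volumes, matching the slide, and the worst-case margin of error is a little under 5 percentage points.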
Survey results
• Damage distributed throughout the entire series
• Typical damage: tears, folds, curled edges
• 7% mould damage
• < 2% severe damage (521 volumes)
• 2 volumes missing
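As a rough illustration of what the survey percentages mean for the whole series, the sketch below scales the 7% mould rate to all 34,998 volumes and checks that 521 volumes is indeed under 2%. The proportional scaling is an assumption made for illustration; it ignores sampling error and is not a calculation shown in the slides.

```python
TOTAL_VOLUMES = 34_998     # size of the series (from the slides)

MOULD_RATE = 0.07          # "7% mould damage"
SEVERE_VOLUMES = 521       # "< 2% severe damage (521 volumes)"

# Simple proportional scaling of the sample rate to the full series.
estimated_mould_volumes = round(MOULD_RATE * TOTAL_VOLUMES)

# Share of the series represented by the severely damaged volumes.
severe_share = SEVERE_VOLUMES / TOTAL_VOLUMES

print(f"Estimated volumes with mould: {estimated_mould_volumes:,}")
print(f"Severe damage share: {severe_share:.2%}")
```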
Labels
• Original labels falling off the spines
• Re-label all 34,998 volumes
• Identify badly damaged volumes for preparation through Collection Care
Damage – folds
• Corners
• Across the schedules
• Obscure information
• To be dealt with by scanning team
Damage – minor tears
• Usually along the outer edges
• Where tears were smaller than 5 cm, scanning team would deal with them
Damage – major tears
• Minimise risk of schedules ripping apart during scanning
• Where schedules were in more than one piece, they were repaired through Collection Care
• Where schedule was still together, it was put in a polyester envelope by the scanning operator
Damage – crumpled edges
• Sleeved by scanning team unless heavily damaged
Damage – mould
• Presents health risks
• Trained scanning team to recognise and report
• Always cleaned through Collection Care within fume cabinet
Damage – stuck pages
• Due to previous water damage
• In a few cases whole volumes stuck together
• Schedules had to be separated; leaving them stuck was not an option
• Most time-consuming part of the preparation work
Damage – ‘castor oil goo’
• 2 volumes with black ‘goo’
• All pages stuck together
• Pages separated and sleeved
• Sleeves remained after scanning
Other issues – metal fastenings
• Metal fastenings getting rusty
• Difficult to remove as corroded metal would break
• Taken out in order to separate sheets
Other issues – inserts
• Some loose inserts within volumes
• Some fastened inserts: adhered, pinned, tagged,…
• Different size from schedules
• Ensure correct association and sequence
Other issues – institutional booklets
• Bound like schedules
• Booklets meant that sheets became double the size
• Spines were cut
Other issues – belts
• Belts had sharp buckles
• Complete removal considered
• Schedules were not retagged and bound
• Held together by cotton tapes
Other issues – binding
• 2 options:
o re-tag and bind
o cotton tapes and box
• Horizontal storage after digitisation
Dealing with damage – pilot studies
• 2 pilot studies
• First study involved 7 volumes resulting in inconclusive figures
• Second pilot study
o involved 3 conservators
o for 20 weeks
o from November 2005
• Just over 200 volumes were prepared during that time resulting in satisfactory figures on
o total time estimates
o cost estimates
o space requirements
Dealing with damage
• Focus on
o cost
o speed
o image quality
Supplier Conservator To be confirmed
1. Folds
2. Tears
3. Crumpled, curled
4. Damaged covers
5. Pages stuck
6. Mould
7. Metal fastenings
8. Small inserts
9. Belts
10. Booklets
Image quality
• Balance between: image quality / speed of capture / speed of downloading
• 24 bit colour uncompressed TIFF, 300 dpi
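To give a feel for why image quality had to be balanced against capture and download speed, the sketch below estimates the raw size of one uncompressed 24-bit image at 300 dpi. It assumes the full 530 x 315 mm schedule is captured with no border and ignores TIFF header overhead; the function name is illustrative, and the slides do not quote a file size.

```python
MM_PER_INCH = 25.4


def uncompressed_size_bytes(width_mm, height_mm, dpi=300, bits_per_pixel=24):
    """Estimate raw pixel data size for one scanned side.

    Assumes the scan area equals the document size exactly (no margins)
    and ignores file header and metadata overhead.
    """
    width_px = round(width_mm / MM_PER_INCH * dpi)
    height_px = round(height_mm / MM_PER_INCH * dpi)
    return width_px * height_px * bits_per_pixel // 8


# Schedule dimensions from the slides: 530 x 315 mm.
size = uncompressed_size_bytes(530, 315)
print(f"About {size / 1_000_000:.1f} MB per image")
```

Under these assumptions each side of a schedule comes to roughly 70 MB of pixel data before any compression or derivative creation.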
Scanning equipment
• AGFA S655 with modifications to accommodate historic documents
o semi-automated sheet feed
o straight path rather than drum
o tray at back of scanner to collect scanned schedules
Space requirement
• Six times as much space as the size of the document, to accommodate
o document: un-scanned material, scanned material, box
o equipment: computer, scanner
Scanning operation
• Within one of the Kew repositories
o secure
o fast in terms of production
o easy to monitor
• Some shelving was removed to accommodate the operation
• Space adjusted to accommodate IT requirements (sockets, cables, etc.)
Scanning operation
• Scanning was sub-contracted to third party
• 5 scanning stations for schedules
• 1 scanning station for book covers
• Space for pre-preparation
• Space for post-preparation
• Scanning took place 12 hours a day (Monday – Friday)
• 2 shifts
Scanning order
• How long does it take to prepare?
• How is the damage distributed?
• Most damaged volumes took between 1 and 4 hours, averaging at around 2 hours per volume
Chart: Distribution of Preparation Time (Mins); x-axis: Prep Time (Mins), less than 50 to 1000; y-axis: Number of Schedules, 0 to 90
Scanning order
• Scanning as stored
o starts with London, Surrey, Kent,…
o London was the most badly damaged
• Scanning according to population size
o starts with Lancashire, London, Yorkshire,…
o best for phased release
• Scanning according to ease
o starts with Nottinghamshire, Gloucestershire, Worcestershire,…
o maximise available preparation time
• Final decision
o scan in order
Condition Score (S): 2, 3, 4, 5, 6, 9, 10, 30
Registration County | Frequency of Condition (F), lowest score first | F | F*S
London | 1, 104, 6, 10, 170 | 291 | 5609
Surrey | 3, 55, 7, 195 | 260 | 6118
Kent | 8, 2 | 10 | 92
Sussex | 7, 3 | 10 | 118
Hampshire | 2, 1, 2 | 5 | 74
Berkshire | 1, 2 | 3 | 64
Middlesex | 1, 15 | 16 | 454
Hertfordshire | 8, 1, 29 | 38 | 908
Buckinghamshire | 1, 47 | 48 | 1414
Oxfordshire | 17 | 17 | 510
Northamptonshire | 2, 2 | 4 | 68
Huntingdonshire | - | 0 | 0
Bedfordshire | 1 | 1 | 4
Cambridgeshire | 3 | 3 | 18
Essex | 4, 11 | 15 | 346
Suffolk | - | 0 | 0
Norfolk | - | 0 | 0
Wiltshire | - | 0 | 0
Dorsetshire | - | 0 | 0
Devonshire | 1, 2 | 3 | 64
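The two totals in the table, F and F*S, can be illustrated with a small sketch: F is the number of affected volumes in a county and F*S weights each volume by its condition score. The placement of London's frequencies under particular scores is an assumption here, since the column alignment is not preserved in the transcript; this assignment was chosen because it reproduces both London totals. The function name is illustrative.

```python
# Condition scores (S) used as column headings in the table.
SCORES = (2, 3, 4, 5, 6, 9, 10, 30)


def summarise(freq_by_score):
    """Return (total F, sum of F*S) for one county."""
    total_f = sum(freq_by_score.values())
    weighted = sum(score * freq for score, freq in freq_by_score.items())
    return total_f, weighted


# ASSUMPTION: which score each London frequency belongs to is inferred,
# not stated; this mapping is consistent with the totals 291 and 5609.
london = {3: 1, 4: 104, 5: 6, 6: 10, 30: 170}
assert set(london) <= set(SCORES)

total_f, weighted = summarise(london)
print(f"London: F = {total_f}, F*S = {weighted}")
```

A weighted score of this kind makes counties with many severely affected volumes stand out even when the raw count of damaged volumes is similar.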
Scanning speed
• Target rate of 40,000 images per day
• ca. 1,000 images an hour per scanner
• Scanners allowed for scanning both recto and verso simultaneously
• Book covers were scanned separately
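The rates above can be related to each other with a little arithmetic. The sketch below assumes the 5 schedule scanning stations and the 12-hour day from the scanning operation slides, and uses the 18 million image total from the final statistics; treating all of those images as produced at the target rate is a simplification, and nothing here is a calculation shown in the slides.

```python
import math

TARGET_IMAGES_PER_DAY = 40_000        # from the slides
IMAGES_PER_HOUR_PER_SCANNER = 1_000   # "ca. 1,000 images an hour per scanner"
SCANNERS = 5                          # schedule scanning stations (from the slides)
HOURS_PER_DAY = 12                    # Monday to Friday, 2 shifts (from the slides)
TOTAL_IMAGES = 18_000_000             # final statistics (assumed to include all images)

# Theoretical ceiling if every scanner ran flat out for the whole day.
nominal_daily_capacity = SCANNERS * IMAGES_PER_HOUR_PER_SCANNER * HOURS_PER_DAY

# Scanner-hours actually needed each day to hit the target.
scanner_hours_for_target = TARGET_IMAGES_PER_DAY / IMAGES_PER_HOUR_PER_SCANNER

# Working days and five-day weeks needed at the target rate.
working_days_at_target = math.ceil(TOTAL_IMAGES / TARGET_IMAGES_PER_DAY)
working_weeks = working_days_at_target / 5

print(f"Nominal capacity: {nominal_daily_capacity:,} images/day")
print(f"Scanner-hours needed for target: {scanner_hours_for_target:.0f}")
print(f"Working days at target rate: {working_days_at_target}")
print(f"Working weeks at target rate: {working_weeks:.0f}")
```

On these assumptions the target sits at about two thirds of the nominal capacity, which leaves headroom for document handling, problem volumes and downtime.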
Working with scanning company
• Census was scanned through Advanced Data Services (ADS)
• Working together before scanning to agree on
o TNA security requirements (closed documents)
o scanning equipment
o lay-out of work space
o workflow
o scanning speed
o preparation of volumes before and after scanning
• Working together during scanning
o document handling training
o flagging up of problem documents
Timeline
July 2004 survey of 1911 census
November 2005 preparation for scanning started
June 2007 preparation for scanning finished (20 months)
July 2007 scanning started
April 2009 scanning finished (22 months)
13 January 2009 online service launched with majority of English counties
March – April 2009 further English counties added
June 2009 Welsh counties added
18 June 2009 launch complete
3 January 2012 full, un-redacted release
Final statistics
Total number of volumes prepared: 2,136 (6.1%)
of which 1,108 had damage codes
Total number of pages cleaned, separated, flattened, repaired: 53,128
Total number of schedules sleeved: 14,282
Time taken for preparation: 20 months
of which 5 months for pilot
Time spent on preparation through Collection Care: 255 days
Time spent on preparation through agency staff: 231 days
Total number of images: 18 million
Total number of people involved: > 350
of which 280 transcribed the census
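The percentage quoted for prepared volumes can be reproduced from the figures above, along with two derived ratios. The derived ratios (share of prepared volumes with damage codes, and treated pages per prepared volume) are illustrative calculations and do not appear in the slides.

```python
TOTAL_VOLUMES = 34_998              # size of the series
VOLUMES_PREPARED = 2_136            # volumes prepared before scanning
VOLUMES_WITH_DAMAGE_CODES = 1_108   # subset of prepared volumes
PAGES_TREATED = 53_128              # pages cleaned, separated, flattened, repaired

# Share of the whole series that needed preparation (quoted as 6.1%).
prepared_share = VOLUMES_PREPARED / TOTAL_VOLUMES

# Derived, not in the slides: share of prepared volumes carrying damage codes.
coded_share = VOLUMES_WITH_DAMAGE_CODES / VOLUMES_PREPARED

# Derived, not in the slides: average treated pages per prepared volume.
pages_per_prepared_volume = PAGES_TREATED / VOLUMES_PREPARED

print(f"Prepared: {prepared_share:.1%} of all volumes")
print(f"With damage codes: {coded_share:.1%} of prepared volumes")
print(f"Treated pages per prepared volume: {pages_per_prepared_volume:.1f}")
```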
Acknowledgements
…too many individuals to list, but in particular our commercial partners:
BrightSolid (www.brightsolid.com)
Advanced Data Services (www.ads.uk.com)
Data Capture (www.datacapture.com)