Page 1: The Reality of Digital Transfer @ArchivesNZ

Department of Internal Affairs

The Reality of Digital Transfer

@ArchivesNZ

Ross Spencer, Talei Masters

Archives New Zealand

Records Management Network Event,

Tuesday November 25 2014

Page 2: The Reality of Digital Transfer @ArchivesNZ

Background

Born Digital and Cultural Heritage Conference

Melbourne*: http://bit.ly/1utAqz0

Spencer, Braden, Hutar, Masters, Crouch, Mosely, Fly

Away Home: Pilot Transfer of Born-digital Records at Archives New Zealand

Collected our experiences from late 2013 through to early 2014. Royal Commission work through to GDAP Closure and beginning of eAccessions.

* http://playitagainproject.org/conference-report/

Page 3: The Reality of Digital Transfer @ArchivesNZ

A missing piece of the jigsaw…

• An appraisal of the technical challenges

• The first of a much bigger puzzle?

• We understood a minimal set of descriptive metadata e.g. transfer metadata file; mapping of EDRMS fields to that schema

• But the collection profile was missing – technical implications of digital preservation…

Page 4: The Reality of Digital Transfer @ArchivesNZ

And the numbers were/are huge!

Royal Commission on the Pike River Coal Mine Tragedy

Two EDRMS: Lotus Notes DMS and AccessData Summation

374,264 Files (200GB)

66,580 Directories

3,892 Unidentified Objects

15 Unidentified Extensions

87 Known Formats

55,425 Duplicates (Content)

Analysis time: 108 minutes

24,190 Files (5GB)

641 Directories

1,254 Unidentified Objects

8 Unidentified Extensions

62 Known Formats

6,200 Duplicates (Content)

Analysis time: 44 minutes
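
The "Duplicates (Content)" figures above come from comparing file checksums. As a rough illustration only, and not the tooling actually used for these transfers, content duplicates could be counted along these lines. The function names are hypothetical, SHA-256 is an assumed choice of algorithm, and counting every copy beyond the first in each group is an assumed definition of "duplicate".

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Checksum a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def content_duplicates(root):
    """Group files under root by checksum; keep groups with more than one file."""
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            groups[sha256_of(path)].append(path)
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}


def count_duplicates(groups):
    """Count every copy beyond the first in each group (assumed definition)."""
    return sum(len(paths) - 1 for paths in groups.values())
```

Checksumming reads every byte of every file, which is one reason analysis time grows with the size of a transfer.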

Page 5: The Reality of Digital Transfer @ArchivesNZ

There’s more…

The Canterbury Earthquakes Royal Commission (partial stats)

One EDRMS: Lotus Notes DMS… (but a different flavour!)

11,505 Files (57GB)

246 Directories

123 Unidentified Objects

2 Unidentified Extensions

55 Known Formats

2,468 Duplicates (Content)

Analysis time: stats not collected

Page 6: The Reality of Digital Transfer @ArchivesNZ

Performance of tools…

Just one (fairly profound?) example for you… Pike River metadata extraction and checksum generation… ‘triage’

2949m21.680s

49 Hours!
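
The elapsed time above is in the minutes-and-seconds form printed by the shell `time` command. A minimal sketch of the conversion to hours follows; `elapsed_to_seconds` is a hypothetical helper written for this illustration, not part of any tool named in the slides.

```python
import re


def elapsed_to_seconds(text):
    """Convert a string such as '2949m21.680s' into seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text)
    if match is None:
        raise ValueError(f"unrecognised elapsed time: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


hours = elapsed_to_seconds("2949m21.680s") / 3600
print(f"{hours:.2f} hours")  # prints "49.16 hours"
```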

Page 7: The Reality of Digital Transfer @ArchivesNZ

Questions already forming…

• How do we speed things up?

• How do we make reporting consistent?

• Where do we begin with this information?

• Some answers already appearing: the stats report is now generated by a Python script in response to these issues: https://github.com/exponential-decay/droid-sqlite-analysis

• Relies only on The National Archives’ DROID tool: a file listing, format identification, and checksumming utility
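
To give a feel for how the headline statistics could be derived from a DROID export, here is a small sketch. It is not the droid-sqlite-analysis script. It assumes a CSV export with `TYPE`, `EXT`, `PUID` and a hash column (the name `MD5_HASH` is an assumption and is passed as a parameter), it assumes "unidentified extensions" means distinct extensions among unidentified files, and the sample rows are invented purely for illustration.

```python
import csv
import io
from collections import Counter


def summarise(rows, hash_column="MD5_HASH"):
    """Derive headline counts from DROID-style rows (column names assumed)."""
    files = [r for r in rows if r.get("TYPE") in ("File", "Container")]
    folders = [r for r in rows if r.get("TYPE") == "Folder"]
    unidentified = [r for r in files if not r.get("PUID")]
    hashes = Counter(r[hash_column] for r in files if r.get(hash_column))
    return {
        "files": len(files),
        "directories": len(folders),
        "unidentified_objects": len(unidentified),
        # Assumed meaning: distinct extensions seen on unidentified files.
        "unidentified_extensions": len(
            {r["EXT"].lower() for r in unidentified if r.get("EXT")}
        ),
        "known_formats": len({r["PUID"] for r in files if r.get("PUID")}),
        # Copies beyond the first for each checksum value.
        "duplicates": sum(n - 1 for n in hashes.values() if n > 1),
    }


# Invented sample rows, for illustration only.
SAMPLE = """TYPE,NAME,EXT,PUID,MD5_HASH
Folder,reports,,,
File,a.pdf,pdf,fmt/18,aaa
File,b.pdf,pdf,fmt/18,aaa
File,c.xyz,xyz,,bbb
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
summary = summarise(rows)
print(summary)
```

Keeping the report as a function over exported rows means the same numbers can be regenerated consistently for each transfer.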

Page 8: The Reality of Digital Transfer @ArchivesNZ

eAccession One [e1]

Legacy accessions that give us the opportunity to apply lessons learned from the initial digital transfers…

175 Files (166.5 MB)

10 Directories

0 Unidentified Objects

0 Unidentified Extensions

7 Known Formats

0 Duplicates (content)

Page 9: The Reality of Digital Transfer @ArchivesNZ

eAccession Four [e4]

eAccessions were seen to be the least complex and allowed us to focus, primarily, on the challenge of ingest…

1295 Files (565.0 MB)

6 Directories

2 Unidentified Objects

1 Unidentified Extension

12 Known Formats

2 Duplicates (content)

Note: Obscured issue in original statistics… A number of false positives! System files were identified as something more generic. Thumbnail preview files and Serif PagePlus files can look like MS Office file-like objects.

Page 10: The Reality of Digital Transfer @ArchivesNZ

Technical Challenges in e1 and e4

• [Tools] Ability to handle multi-byte character encodings, e.g. Māori macrons such as ‘Ā’.

• [Tools] Unidentified files and false positives.

• [Tools] Recording of pre-conditioning actions on ingest into the digital preservation system.

• [Tools] Implementing a CSV ingest mechanism: configuration, code, and workflow.

• [Pre-conditioning / Tools] The digital preservation system’s (Rosetta) ability to handle contiguous spaces in filenames.

• [Pre-conditioning] One invalid JPEG, which required rearrangement of application marker segments.
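
Two of the challenges above, multi-byte characters and contiguous spaces in filenames, can be flagged before ingest. The sketch below is only an illustration: `filename_issues` is a hypothetical helper, the checks are assumptions about what a pre-conditioning report might look for, and nothing here reflects how Rosetta itself behaves.

```python
import re
import unicodedata


def filename_issues(name):
    """Return a list of potential problems with a filename (illustrative checks)."""
    issues = []
    if re.search(r" {2,}", name):
        issues.append("contiguous spaces")
    if any(ord(ch) > 127 for ch in name):
        # Characters such as 'Ā' take more than one byte in UTF-8.
        issues.append("non-ASCII characters")
    if unicodedata.normalize("NFC", name) != name:
        # A base letter plus a combining macron, rather than one composed character.
        issues.append("not NFC-normalised")
    return issues
```

A check like this only reports. Any renaming would still need to be recorded as a pre-conditioning action.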

Page 11: The Reality of Digital Transfer @ArchivesNZ

What next..?

• One step at a time. Accessions e1 and e4; develop capability further with e2 and e3.

• Incorporate the metadata extraction tool JHOVE into the process following experience with e1 and e4, possibly via FITS

• Refine current metrics and the presentation of statistics, e.g. make them more useful for Archivists working on the born-digital material we’re already in possession of…

• Ideal: Archivists’ knowledge (processes, analysis, diagnosis) becomes actuated.

Page 12: The Reality of Digital Transfer @ArchivesNZ

What next..?

• SCALE!

Thank you!
