Patricia Galloway School of Information University of Texas at Austin ARMA Austin/San Antonio Annual Seminar, 2015 Keeping everything digital: how much, how long, and how?


Post on 10-Jan-2016


Page 1: Keeping everything digital: how much, how long, and how?

Patricia Galloway
School of Information, University of Texas at Austin
ARMA Austin/San Antonio Annual Seminar, 2015

Page 2: Surely you don’t mean it?

• “The digital item created and made accessible as part of a digital preservation system is fundamentally different from an analogue item. Period.” --Edward M. Corrado and Heather Lea Moulaison
• What we thought we knew is wrong:
  • There isn’t enough room for all of it [how much did you pay for that 2 TB drive you have at home?]
  • Nobody wants it all [viz: the Long Tail, or Wikileaks, maybe?]
  • You can’t search it all [viz: Google]
• RMs are joining in [viz: Steve Bailey, Managing the Crowd]

Page 3: How much? How to spell “all”?

• Is the conventional records schedule “all”?
  • Can you trust automated archiving to do what the schedule says?
  • Do you know where everything is replicated?
  • What units of your organization are keeping transactional data? What are they doing with it?
• Is every keystroke “all”? Maybe, if valuable to the organization
  • Scientists whose research may be full of patentable material
  • Creatives whose work you have already paid for and not yet used
• Is there another way to think?

Page 4: Scientists and creatives

• Organizations have plenty of creatives on the payroll, most of their creation on the clock or contracted for
• Scientists
  • patentable material in paper lab notebooks? NOT any more!
  • yet digital lab notebooks haven’t caught on; scientists use email, preprint circulation of their work, other ways of communicating professionally
• Other creatives (that includes artists and programmers)
  • maybe you can capture cocktail napkins, but don’t count on it
  • much lives on the network; people get ideas at 2 AM
  • in-house developed mobile apps may be partly owned by others (anybody asked for a DUNS number?)

Page 5: Transactional data

• Do you have customers?
• Do you keep data on their transactions?
• Does this data have value for your organization?
• How long does that value last? (Amazon has a record of every book I have ever bought since 1996.) When will it stop?
• Does your organization share this data with others (read your bank statement)?

Page 6: Email/Communications

• What are your email policies? How do you enforce them?
  • From the desktop
  • From the mobile device
  • From home
• What are your BYOD policies? Or do you provide mobile devices?
• Do you have email bots drawing on your (customer) data mart to send emails and catalogs?
• My solution years ago: record everything crossing the network connection; it’s a bit harder now with wireless connections!

Page 7: Regulatory requirements

• Which of these is scarier?
  • Sarbanes-Oxley requirements around work product
  • NSA (cf. this week’s revelations of hacks buried in the firmware of hard drives)
  • Hackers-for-profit (striking health care, government, banking, Target)
• One defense: know what you have
  • DIY digital forensics on everything

Page 8: What’s left?

• Websites: do you think they are simple to preserve?
  • As front-ends for databases
  • Constantly updated
  • Dynamic: formally complex digital objects
• Social media
  • Twitter
  • Instagram
• Internet of Things (e.g. Jawbone, break room refrigerator)

Page 9: Keeping how long? A knotty question

• Keeping for operational interest (foresight of leaders? destroyed upon departure?)
• Keeping pending in-house forensic examination (which Enron didn’t do)
• Keeping for historical interest (would you rather be known for good internal recordkeeping than newspaper headlines?): thus longer than the life of the organization, longer than your own life, maybe

Page 10: How? First some alarming details

• IT best practices for backup may not be enough
• IT practices for corralling data may not be well understood: they may do things fine for backup, not fine for recordkeeping
• The whole of the systems (including their audits) needs to be understood and vetted
• Networking has opened in-house systems to the Outer World: where are the vulnerabilities?
• Timestamps (timestamps?!) may not be trustworthy, even within a closed system (the dominant operating systems don’t treat them the same)
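The timestamp caveat is easy to see for yourself. Below is a minimal Python sketch (the helper name `describe_timestamps` is illustrative, not something from the talk) that reads the timestamps the operating system reports for a file. It labels `st_ctime` by platform, on the assumption that Unix-like systems report the last metadata change there while Windows has historically reported the creation time.

```python
import os
import sys
import tempfile
from datetime import datetime, timezone


def describe_timestamps(path):
    """Return the OS-reported timestamps for path, labelled by meaning."""
    st = os.stat(path)
    # st_ctime does not mean the same thing everywhere: metadata-change time
    # on Unix-like systems, historically creation time on Windows.
    ctime_label = "created" if sys.platform.startswith("win") else "metadata_changed"

    def iso(seconds):
        return datetime.fromtimestamp(seconds, tz=timezone.utc).isoformat()

    return {
        "modified": iso(st.st_mtime),
        "accessed": iso(st.st_atime),
        ctime_label: iso(st.st_ctime),
    }


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as handle:
        handle.write(b"record")
    print(describe_timestamps(handle.name))
    os.remove(handle.name)
```

Recording which label applies, together with the operating system in use, is one hedge against misreading a timestamp later.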

Page 11: Data Warehouse: is this all?

• IBM, Bill Inmon
• Decision support systems
• 1980s, 1990s, present
• “central repositories of integrated data from one or more disparate sources…maintain an infinite history which is implemented through processes that periodically migrate data from the operational systems over to the data [warehouse].” --Wikipedia, “Data Warehouses”
• “A data warehouse is really an abstraction, a logical representation of clean, vetted data that executives can use to make decisions.” --Wayne Eckerson

Page 12: Big Data plus Data Warehouse: is this all?

• Source data coming from operational machines
• Data staging and processing “cleans” data before it is used by data warehouse or big data analytics

Page 13: Whoa! What is this “cleaning”?

• For data warehouses, data are “cleaned” to rid them of errors or contradictions with related data
• For big data analytics, data are even more likely to be cleaned, as they come from multiple very different sources, not a single organization
• But what does “cleaning” do to the genuineness of the data?
• Answer: cleaned data can only testify to what was input into specific decision-making or analytical tasks, not to the source’s genuineness
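One way to keep the distinction visible is to leave the source record untouched and log what cleaning changed. The Python sketch below is an illustration under assumptions: the function names, the sample record, and the whitespace-collapsing rule are all hypothetical stand-ins for a real cleaning pipeline.

```python
import hashlib
import json


def clean_record(raw):
    """Return a cleaned copy plus a log of changes; raw is left untouched."""
    cleaned = dict(raw)
    changes = []
    for field, value in raw.items():
        if isinstance(value, str):
            tidy = " ".join(value.split())  # collapse stray whitespace
            if tidy != value:
                cleaned[field] = tidy
                changes.append({"field": field, "from": value, "to": tidy})
    return cleaned, changes


def fingerprint(record):
    """SHA-256 over a canonical JSON form, to tell the versions apart."""
    canonical = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


raw = {"customer": "  Ann   Smith ", "city": "Austin"}  # hypothetical source record
cleaned, changes = clean_record(raw)
audit = {
    "source_sha256": fingerprint(raw),
    "cleaned_sha256": fingerprint(cleaned),
    "changes": changes,
}
print(json.dumps(audit, indent=2))
```

The audit entry shows that the cleaned record is a different object from its source, and says how it differs.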

Page 14: How to keep it all, then?

• Well, it’s tricky
• First, with everything connected, it’s hard to know where to stop (example: websites on the Internet Archive’s Wayback Machine with dead links)
• Let’s start with thinking in terms of a single desktop machine and the order(s) represented by digital objects placed in that machine

Page 15: Respect for Original Order

• Archival article of faith: part of the late 19th-century view of how to preserve physical archival materials
• Thought of as “internal order” by the French, Registraturprinzip by the Germans
• It is the individual-document counterpoint to the aggregate “external order” dictated by provenance
• It is also a means of stopping time and freezing groups of unique paper documents in a single state of relationship
• But are records managers and archivists compelled to preserve only a single state of the fonds? Do records have a single state and a single order?

Page 16: Original Order as visible material formalism

• Represents hierarchy (nested power containers): office within division within department within agency
• Represents sequence (record containers): document within folder within drawer within filing cabinet within room within office
• These formalisms have evolved to fit organizational governance and management of physical records

Page 17: Original Order as invisible logical formalism

• Hierarchy models bureaucracy and levels of power (this applies to aggregates of records)
• Sequence models linear filing system and custodianship-in-use (result of changes over whatever time different materials have been kept: filing errors may be preserved if system allows multiple orders)

Page 18: Problems with Original Order

• Theoretical stasis that never actually existed
• Inability to model active use (and messiness; and the causes of error)
• Problem of choosing an “original” state
• Questionable validity of “restoration” to some state as a standard procedure
• Any actual order is often resisted in practice as “messy” (like correcting George Washington’s spelling)
• Too much process (not enough product)!

Page 19: Respect for Order-as-Used?

• Part of the late 20th-century recognition that digital objects are actually persistent and have affordances additional to paper
• Explained (metaphorically by digital forensics experts) as due both to the “geological” activities of the digital environment’s programmed operation as well as to the “archaeological” expression of the creator’s intentional actions
• Important to realize that the “archaeology” may not be all that’s at work
• Preservation of “order-as-used” preserves (partial) multiple states of groups of digital documents

Page 20: How to “arrange” digital records?

• Purpose of paper filing system: predict document location by using few rules, so file using few rules
• Purpose of digital file system: predict document location by using as many rules as necessary (or search) so that the technology can run efficiently, without need for the user’s knowing what is going on
• Purpose of digital file system interface: allow the user a choice of ways of seeing the file system, each with few rules
  • Aids in sensemaking ordering(s) of documents
  • Hides technological construction to mimic paper affordances

Page 21: Virtual digital file arrangements

[Figures: Windows Explorer (XP); Mac Finder (OS X, Cover Flow) (courtesy of Aashish Sheshadri)]

Page 22: Logical order(s): “archaeology”

• System options for users to “arrange” digital documents visually and cognitively
• Most popular selections of attribute categories by individual users
  • Temporal
  • Naming (frustrations when system insists on alphabetical)

Page 23: Capturing logical order(s)

• Document the context of creation and use: processing and storage hardware and software stacks; may include forms, templates, etc.
• Document users’ mental models: accustomed filing systems, records schedules, etc. (note these are becoming more and more digital themselves)
• Capture representation(s) of directory order(s) in use
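Capturing a representation of directory order could look something like the sketch below. It walks a directory tree, records each file's relative path, size, and modification time, and derives two of the orders a user might have been looking at. The function names and the choice of attributes are assumptions made for illustration; a real capture would record whatever the system actually exposes.

```python
import os
import tempfile


def capture_directory_order(root):
    """List every file under root with its relative path, size, and mtime."""
    entries = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # make the traversal order deterministic
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            entries.append({
                "path": os.path.relpath(full, root),
                "size": st.st_size,
                "mtime": st.st_mtime,
            })
    return entries


def views(entries):
    """Two of the orders a user might have seen: by name and by time."""
    return {
        "by_name": [e["path"] for e in sorted(entries, key=lambda e: e["path"])],
        "by_time": [e["path"] for e in sorted(entries, key=lambda e: e["mtime"])],
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        for name in ("b.txt", "a.txt"):
            with open(os.path.join(root, name), "w") as handle:
                handle.write(name)
        print(views(capture_directory_order(root)))
```

Saving the manifest alongside the files keeps more than one order available after transfer.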

Page 24: Material order: “geology”

• Every hard drive is a palimpsest of active and erased (and remanent) files
  • Sectors are used to achieve speed of recall
  • Files may become fragmented to noncontiguous sectors during heavy use
  • Fragments are held together by a record kept by the system of where all the parts are
  • When a copy is made, the parts are pulled together sequentially (serialization), though they may have to be refragmented to fit their destination (deserialization)
• All this activity is carried out by the operating system and hard drive software without the attention or knowledge of the user
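A toy model can make the fragment bookkeeping concrete. In the Python sketch below, a "disk" is a dictionary of numbered blocks and a file is a list of block numbers. Both are simplifying assumptions; real file systems are far more elaborate, and the names are hypothetical.

```python
def serialize(disk, block_map):
    """Read a file's blocks in logical order, wherever they sit on the disk."""
    return b"".join(disk[index] for index in block_map)


def deserialize(data, free_blocks, block_size):
    """Write data into whatever free blocks the destination happens to have."""
    chunks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    if len(chunks) > len(free_blocks):
        raise ValueError("not enough free blocks at destination")
    placement = dict(zip(free_blocks, chunks))
    return placement, free_blocks[:len(chunks)]


# One file scattered over noncontiguous blocks of the source disk.
source_disk = {7: b"KEEP", 2: b"ING ", 9: b"ALL"}
source_map = [7, 2, 9]  # the system's record of where the parts are

data = serialize(source_disk, source_map)
dest_disk, dest_map = deserialize(data, free_blocks=[4, 0, 5, 1], block_size=4)
print(data, dest_map)
```

The content survives the copy, but the block layout at the destination differs from the one at the source, which is why a filewise copy says nothing about the original material order.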

Page 25: Actual digital file arrangement --Mike Dziekan, “An Inside Look at Hard Drives”

[Figures: HD platter with single file in contiguous sectors; HD platter with multiple files scattered among sectors]

Page 26: Capturing material order

• Option 1: Capture by serializing directory order of overt files (files available through the normal user interface) using “tar”-style filewise copy: this normalizes a single order and does not capture other activity (e.g. erasure)
• Option 2: Capture by serializing formatted order of overt and covert (erased) files as distributed on storage medium using “dd”-family bitwise sequential copy (or various digital forensic programs), aka imaging: all the bits are copied just as they stand on the disk, to be interpreted when viewed inside an appropriate system
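The two options can be sketched in Python with the standard library alone. This is an illustration under assumptions, not a forensic tool: `filewise_copy` uses the `tarfile` module as a stand-in for a tar-style copy, and `bitwise_copy` reads an ordinary file block by block as a stand-in for a dd-style read of a raw device (which in practice needs elevated privileges and, for evidence, a write blocker). Both function names are hypothetical.

```python
import hashlib
import tarfile


def filewise_copy(source_dir, archive_path):
    """Option 1: serialize the overt files in directory order (tar-style)."""
    with tarfile.open(archive_path, "w") as archive:
        archive.add(source_dir, arcname=".")
    with tarfile.open(archive_path) as archive:
        return archive.getnames()


def bitwise_copy(source_path, image_path, block_size=512):
    """Option 2: copy every block in sequence (dd-style) and hash the result."""
    digest = hashlib.sha256()
    with open(source_path, "rb") as source, open(image_path, "wb") as image:
        while True:
            block = source.read(block_size)
            if not block:
                break
            image.write(block)
            digest.update(block)
    return digest.hexdigest()
```

The filewise copy sees only what the file system lists. The bitwise copy carries over every byte of its source, including any that no longer belong to a listed file.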

Page 27: So let’s reconsider the cast of characters

• The computer system’s invisible activities (“geology”) create the material “reality” of digital files and their relationships, the complexity and interactions of which have been ill-studied
• The computer system’s visible interface allows the creator to choose ways to see and interact with the digital documents and relationships she chooses to consider (“archaeology”)
• All of these actors (human creator, computer hardware, computer software, emergent content) are constantly in flux
• That flux doesn’t stop just because the documents are consigned to a digital repository, because the repository itself is made out of hardware and software and it is constantly in motion doing its job
• Our task is to make sure that repository changes don’t affect the files

Page 28: The Open Archival Information System Model

Page 29: Looking closely at an OAIS fragment

• How the archival materials are transferred and accessioned as a Submission Information Package
• Incorporates material frameworks
  • Creator or other human participation
  • Means of preparation: actions on media, including actions conditioned by the materiality of the media (capturing and transmitting orders)
  • Media or network as route of deposit
• Occurs in temporal frameworks

Page 30: Digital Orders in Archival Ingest Process

Order can be preserved here in two ways: either inclusively (through forensic imaging of entire content of medium) or exclusively (by choosing to copy only overt files according to OS-applied structure).

Order is affected here by the creator’s activities over time; the order as last found is that which is formally captured through submission of removable media or the creator’s choice to self-archive a group of files as such.

Once captured into the repository, any order can be manipulated by mapping it to another structure, such as one or more finding-aid orders, one or more creator orders, or one or more user-created orders, all of which can be made available to the public. It may be of interest for these orders, especially successive finding-aid orders, to be archived in their turn.

An order may be ignored through granular capture, either by the creator submitting directly to the repository or by capture using a records management application (same result with a versioning system of another kind).

Page 31: Order as created

• Signifies a logical state
• Often not seen by archives unless HD is handed over or captured by backup or RMA (assuming the RMA actually preserves relationships among files other than artificially-imposed ones)
• If not saved, partial reconstruction requires
  • Capturing material context (data on technological stacks)
  • Capturing log files
  • Capturing logical context (narrative of use)

Page 32: Order as last found (as received by archives)

• Signifies a logical state
• Frequently seen by archives, depends on how files were handed over
• Preservation requires
  • Capturing material context (data on technological stacks)
  • Capturing logical context (narrative of use)

Page 33: Transfer order: As backed up to removable medium

• One of many choices for transfer
• Signifies a material state (“fixed to a medium”) dependent upon a material process
• Frequently seen when creating action is carried out for another purpose (such as backup)
• Preservation requires
  • Capturing material context (data on technological stacks)
  • Capturing logical context (customs of use)

Page 34: Order as captured into repository

• Copying (reserialization) of objects into repository in received sequence (should be grouped as received)
• Addition of metadata recording transfer event
• Mapping ingested metadata onto repository resource-discovery metadata
• Addition of metadata preserving identity inside repository
• Placement in filestore
• Generating fixity metadata
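Several of these ingest steps can be sketched together. The Python below builds one record per object, in received order, with a shared transfer identifier, a repository identifier, and a SHA-256 digest as fixity metadata. The field names, the use of UUIDs, and the choice of SHA-256 are assumptions made for illustration; an actual repository would follow its own metadata schema.

```python
import hashlib
import uuid
from datetime import datetime, timezone


def ingest(objects, source_label):
    """Build repository records for (name, bytes) pairs, in received order."""
    transfer_id = str(uuid.uuid4())
    received_at = datetime.now(timezone.utc).isoformat()
    records = []
    for sequence, (name, payload) in enumerate(objects, start=1):
        records.append({
            "repository_id": str(uuid.uuid4()),  # identity inside the repository
            "transfer_id": transfer_id,          # groups objects as received
            "received_at": received_at,          # transfer event
            "source": source_label,
            "sequence": sequence,                # received order
            "original_name": name,
            "sha256": hashlib.sha256(payload).hexdigest(),  # fixity
        })
    return records


def verify(record, payload):
    """Re-check fixity: True if the bytes still match the stored digest."""
    return hashlib.sha256(payload).hexdigest() == record["sha256"]
```

Running `verify` on a schedule is one way to notice when repository activity has altered a file.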

Page 35: Order as provided by repository

• Signifies potential material/logical states
• Multiple orderings possible, including order as last found, finding-aid order
• Multiple displays using emulation environments to restore original look-and-feel, transformation tools for current reuse
• User may be offered additional ways of arranging/ordering objects
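Offering several orderings does not require storing the objects more than once. In the Python sketch below, each named order is just a list of object identifiers over a single store. The `Collection` class, the identifiers, and the order names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Collection:
    """Objects stored once; each named order is a list of object ids."""
    objects: dict
    orders: dict = field(default_factory=dict)

    def add_order(self, name, ids):
        missing = [i for i in ids if i not in self.objects]
        if missing:
            raise KeyError(f"unknown object ids: {missing}")
        self.orders[name] = list(ids)

    def view(self, name):
        return [self.objects[i] for i in self.orders[name]]


collection = Collection({"o1": "memo.doc", "o2": "budget.xls", "o3": "notes.txt"})
collection.add_order("as_last_found", ["o3", "o1", "o2"])
collection.add_order("finding_aid", ["o2", "o1", "o3"])
print(collection.view("as_last_found"))
print(collection.view("finding_aid"))
```

Because orders are data, successive finding-aid orders or user-created orders can themselves be kept and archived.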

Page 36: So: how much, how long, how?

• How much: as much as can be feasibly kept, because as with paper, a little time makes our judgment and our ability to extract more information better
• How long: as long as reasonably valuable to the organization, though that may prove to be in the manipulated form of a data warehouse of some kind or may be handed over periodically for mandated deposit or for permanent historical value
• How: by recognizing that digital records are not only affected by the will of the user who created them, but also by those of people who designed the system in use and yes, dear audience, our own beliefs and activities

Page 37: Questions?

Contact information: [email protected]