![Page 1: Risk management and auditing](https://reader034.vdocument.in/reader034/viewer/2022051207/5445a0c4b1af9fdb068b460c/html5/thumbnails/1.jpg)
Risk management and auditing
Dorothea Salo
![Page 2: Risk management and auditing](https://reader034.vdocument.in/reader034/viewer/2022051207/5445a0c4b1af9fdb068b460c/html5/thumbnails/2.jpg)
Threat model
•“Preservation” means nothing unmodified.
• This is why it becomes such a bogeyman!
•Two things you need to know first:
• why you’re preserving what you’re preserving, and
• what you’re preserving it against.
•Libraries: your collection-development policy should inform the first question.
• Your coll-dev policy doesn’t include local born-digital or digitized materials? This is a problem. Fix it.
•The second question is your “threat model.”
![Page 3: Risk management and auditing](https://reader034.vdocument.in/reader034/viewer/2022051207/5445a0c4b1af9fdb068b460c/html5/thumbnails/3.jpg)
What is your threat model for print?
![Page 4: Risk management and auditing](https://reader034.vdocument.in/reader034/viewer/2022051207/5445a0c4b1af9fdb068b460c/html5/thumbnails/4.jpg)
Homelessness
![Page 5: Risk management and auditing](https://reader034.vdocument.in/reader034/viewer/2022051207/5445a0c4b1af9fdb068b460c/html5/thumbnails/5.jpg)
Water
![Page 6: Risk management and auditing](https://reader034.vdocument.in/reader034/viewer/2022051207/5445a0c4b1af9fdb068b460c/html5/thumbnails/6.jpg)
Flora and fauna
![Page 7: Risk management and auditing](https://reader034.vdocument.in/reader034/viewer/2022051207/5445a0c4b1af9fdb068b460c/html5/thumbnails/7.jpg)
Physical damage
![Page 8: Risk management and auditing](https://reader034.vdocument.in/reader034/viewer/2022051207/5445a0c4b1af9fdb068b460c/html5/thumbnails/8.jpg)
Loss or destruction
![Page 9: Risk management and auditing](https://reader034.vdocument.in/reader034/viewer/2022051207/5445a0c4b1af9fdb068b460c/html5/thumbnails/9.jpg)
Why did I just make you do that?
•I’m weird.
• I’m trying to destroy the myth that any given medium “preserves itself.”
•Media do not preserve themselves. People preserve media, or media get bizarrely lucky.
•We need not panic over digital preservation any more than we panic about print.
•Approach digital preservation the same way you approach print preservation.
![Page 10: Risk management and auditing](https://reader034.vdocument.in/reader034/viewer/2022051207/5445a0c4b1af9fdb068b460c/html5/thumbnails/10.jpg)
Now...
List important threats to digital data.
![Page 11: Risk management and auditing](https://reader034.vdocument.in/reader034/viewer/2022051207/5445a0c4b1af9fdb068b460c/html5/thumbnails/11.jpg)
Physical medium failure
![Page 12: Risk management and auditing](https://reader034.vdocument.in/reader034/viewer/2022051207/5445a0c4b1af9fdb068b460c/html5/thumbnails/12.jpg)
“Bitrot”
![Page 13: Risk management and auditing](https://reader034.vdocument.in/reader034/viewer/2022051207/5445a0c4b1af9fdb068b460c/html5/thumbnails/13.jpg)
File format obsolescence
![Page 14: Risk management and auditing](https://reader034.vdocument.in/reader034/viewer/2022051207/5445a0c4b1af9fdb068b460c/html5/thumbnails/14.jpg)
Forgetting what you have
![Page 15: Risk management and auditing](https://reader034.vdocument.in/reader034/viewer/2022051207/5445a0c4b1af9fdb068b460c/html5/thumbnails/15.jpg)
Forgetting what the stuff you have means
![Page 16: Risk management and auditing](https://reader034.vdocument.in/reader034/viewer/2022051207/5445a0c4b1af9fdb068b460c/html5/thumbnails/16.jpg)
Rights and DRM
![Page 17: Risk management and auditing](https://reader034.vdocument.in/reader034/viewer/2022051207/5445a0c4b1af9fdb068b460c/html5/thumbnails/17.jpg)
Lack (or disappearance) of organizational commitment
![Page 18: Risk management and auditing](https://reader034.vdocument.in/reader034/viewer/2022051207/5445a0c4b1af9fdb068b460c/html5/thumbnails/18.jpg)
One word: Geocities.
![Page 19: Risk management and auditing](https://reader034.vdocument.in/reader034/viewer/2022051207/5445a0c4b1af9fdb068b460c/html5/thumbnails/19.jpg)
Ignorance
•“It’s in Google, so it’s preserved.” (Not even “Google Books!”)
•“I make backups, so I’m fine.”
•“I have a graduate student who takes care of these things.”
•“Metadata? What’s that? I have to have it?”
•“Digital preservation is an unsolvable problem, so why even try?” (I’ve heard this one from librarians. I bet you have too.)
![Page 20: Risk management and auditing](https://reader034.vdocument.in/reader034/viewer/2022051207/5445a0c4b1af9fdb068b460c/html5/thumbnails/20.jpg)
Apathy
![Page 21: Risk management and auditing](https://reader034.vdocument.in/reader034/viewer/2022051207/5445a0c4b1af9fdb068b460c/html5/thumbnails/21.jpg)
Mitigating the risks: planning and auditing tools
![Page 22: Risk management and auditing](https://reader034.vdocument.in/reader034/viewer/2022051207/5445a0c4b1af9fdb068b460c/html5/thumbnails/22.jpg)
Audit frameworks
•Trustworthy Repositories Audit & Certification (TRAC) checklist
• (See “NARA/RLG” somewhere? That is the framework that evolved into TRAC. Long story.)
• You can get an actual formal TRAC audit from CRL! Who has? Portico, Hathi, “Chronicle of Life,” two or three others. This audit is HARSH. (So don’t write off a repo because it hasn’t had a TRAC audit.)
• If you hear the phrase “trusted digital repository,” it should mean that the repo has had (or is pursuing) a TRAC audit.
•DRAMBORA
• More flexible, less finger-shaking than TRAC.
• Less of this “designated community” nonsense.
• Less dependent on the OAIS model (which I consider a strength).
• Encourages archives to consider and document their individual situations and think hard about risk mitigation.
![Page 23: Risk management and auditing](https://reader034.vdocument.in/reader034/viewer/2022051207/5445a0c4b1af9fdb068b460c/html5/thumbnails/23.jpg)
Newer: SPOT model
•Even less clunky than DRAMBORA.
• I quite like this one.
•“Identifying Threats to Successful Digital Preservation: the SPOT Model for Risk Assessment”
• http://www.dlib.org/dlib/september12/vermaaten/09vermaaten.html
![Page 24: Risk management and auditing](https://reader034.vdocument.in/reader034/viewer/2022051207/5445a0c4b1af9fdb068b460c/html5/thumbnails/24.jpg)
So what do they audit?
•Mission (and adherence to it)
•Plans and policies
• including contingency plans
•Staff infrastructure
•Operations documentation
• including tech infrastructure, service infrastructure
•Sustainable funding
•“Doing the right things with the stuff.”
• identifiers, ingest, file format management, migration, etc.
•NOTICE WHAT’S FIRST ON THE LIST.
• Remember, the tech part is the easy part!
![Page 25: Risk management and auditing](https://reader034.vdocument.in/reader034/viewer/2022051207/5445a0c4b1af9fdb068b460c/html5/thumbnails/25.jpg)
TRAC, DRAMBORA, and DH
•TRAC, DRAMBORA, and SPOT are designed to audit repositories, not individual datasets, data files, or research projects.
• They assume a lot of infrastructure and (in TRAC’s case) a long-term time horizon that you probably don’t have.
•So if you’re trying to think through a project, where do you go?
• TRAC and DRAMBORA are probably overkill!
• (Though parts of DRAMBORA won’t hurt you.)
![Page 26: Risk management and auditing](https://reader034.vdocument.in/reader034/viewer/2022051207/5445a0c4b1af9fdb068b460c/html5/thumbnails/26.jpg)
Data Curation Profiles
•Research project out of Purdue’s Digital Data Curation Center (“D2C2”)
•“Toolkit:” interview instrument, user guide for interview instrument, worksheet.
•Small library of completed profiles
• Ignore the user guide. Grab the worksheet, and use the interview instrument for reference.
•http://datacurationprofiles.org
• You have to make a login to download the toolkit pieces.
![Page 27: Risk management and auditing](https://reader034.vdocument.in/reader034/viewer/2022051207/5445a0c4b1af9fdb068b460c/html5/thumbnails/27.jpg)
Mitigating specific risks
![Page 28: Risk management and auditing](https://reader034.vdocument.in/reader034/viewer/2022051207/5445a0c4b1af9fdb068b460c/html5/thumbnails/28.jpg)
Physical medium failure
•Gold CDs are not the panacea we thought.
• They’re not bad; they’re just hard to audit, so they fail (when they fail) silently. Silent failure is DEADLY.
•Current state of the art: get it on spinning disk.
•Back up often. Distribute your backups geographically. Test them now and then.
• Consider a LOCKSS cooperative agreement. Others have.
•Bitrot-detection techniques may help here too.
•Any physical medium WILL FAIL. Have a plan for when it does.
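One way to "test them now and then" is to compare each file in the primary store against its backup copy, so that a missing or altered copy does not stay silent. The sketch below is only an illustration under assumptions: the function names, the choice of SHA-256, and the idea that both copies are reachable as local directories are mine, not part of the original slides.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit_backup(primary: Path, backup: Path) -> dict:
    """Compare every file under `primary` with its counterpart under `backup`.

    Hypothetical helper: it reports files the backup lacks and files whose
    contents differ. It cannot tell which of the two copies went bad.
    """
    report = {"missing": [], "mismatched": []}
    for source in sorted(p for p in primary.rglob("*") if p.is_file()):
        relative = source.relative_to(primary)
        copy = backup / relative
        if not copy.is_file():
            report["missing"].append(relative.as_posix())
        elif sha256_of(source) != sha256_of(copy):
            report["mismatched"].append(relative.as_posix())
    return report
```

Run on a schedule, a non-empty report is the signal to go investigate. Deciding which copy is the good one needs a checksum recorded at ingest, which this sketch does not keep.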
![Page 29: Risk management and auditing](https://reader034.vdocument.in/reader034/viewer/2022051207/5445a0c4b1af9fdb068b460c/html5/thumbnails/29.jpg)
“Digital forensics”
•The art and science of investigating digital file formats and media.
• Reading obsolete ones.
• Reverse-engineering and/or documenting existing ones so they don’t go obsolete.
• Ensuring secure deletion, when necessary.
• Reconstructing what used to be on a physical storage medium. (Surprising how often this is possible!)
• Audit trails for legal and records-management purposes.
• AMAZING report (highly highly recommended!): “Digital Forensics and Born-Digital Content in Cultural Heritage Collections.” http://www.clir.org/pubs/abstract/pub149abst.html. Both computer-nerdy and humanities-nerdy in the best possible way.
![Page 30: Risk management and auditing](https://reader034.vdocument.in/reader034/viewer/2022051207/5445a0c4b1af9fdb068b460c/html5/thumbnails/30.jpg)
Avoiding “bitrot”
•Sometimes used for “file format obsolescence.”
• I use it for “the bits flipped unexpectedly.”
•Checking a file bit-by-bit against a backup copy is computationally impractical to do every day.
• Though on ingest it’s a good idea to verify bit-by-bit!
•Checksums
• A file is, fundamentally, a great big number.
• Do math on the number. Store the result as metadata.
• To check for bitrot, redo the math and check the answer against the stored result. If they’re different, scream.
• Several checksum algorithms; for our purposes, which one you use doesn’t matter much.
• “Hash collision:” it’s possible, but unlikely, for different files to have the same checksum. Potential hack vector!
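The checksum routine described above can be sketched in a few lines of Python. Everything named here is an assumption for illustration: the JSON manifest, the function names, and SHA-256 as the default algorithm (any algorithm that `hashlib` offers would slot in). The last function is the bit-by-bit comparison suggested for ingest.

```python
import filecmp
import hashlib
import json
from pathlib import Path


def checksum(path: Path, algorithm: str = "sha256") -> str:
    """Do the math on the file: return its digest as a hex string."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_checksum(path: Path, manifest: Path) -> None:
    """Store the result as metadata (here, a hypothetical JSON manifest)."""
    entries = json.loads(manifest.read_text()) if manifest.exists() else {}
    entries[path.name] = checksum(path)
    manifest.write_text(json.dumps(entries, indent=2))


def has_bitrot(path: Path, manifest: Path) -> bool:
    """Redo the math and compare against the stored result."""
    entries = json.loads(manifest.read_text())
    return checksum(path) != entries[path.name]


def identical_bit_by_bit(original: Path, copy: Path) -> bool:
    """Full content comparison, worth doing once at ingest."""
    return filecmp.cmp(original, copy, shallow=False)
```

A manifest keyed on bare file names is a simplification. A real repository would key on a persistent identifier and keep the manifest somewhere other than the disk it describes.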
![Page 31: Risk management and auditing](https://reader034.vdocument.in/reader034/viewer/2022051207/5445a0c4b1af9fdb068b460c/html5/thumbnails/31.jpg)
Migration vs. emulation: dealing with obsolescence
•Migration
• change the file to be usable in new software/hardware configurations
• risks: information loss (FONTS!), imperfect transfer, choosing the wrong migration path
• smart systems don’t throw away the old files!
•Emulation
• keep the file, train new software/hardware to behave like the old
• risks: imperfect emulation, impractical emulation
• makes more sense for software (games!), less for files
•Pragmatically: redigitization.
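As a small, hedged illustration of migration, the sketch below converts a text file from a legacy character encoding to UTF-8 while leaving the original where it is. The function name, the `.utf8` naming scheme, and the default of Latin-1 are assumptions made for this example. The round-trip check only catches a lossy conversion. It cannot tell you that you guessed the wrong source encoding, which is the "wrong migration path" risk.

```python
from pathlib import Path


def migrate_text_to_utf8(source: Path, legacy_encoding: str = "latin-1") -> Path:
    """Write a UTF-8 copy next to `source`; the original is never modified."""
    original_bytes = source.read_bytes()
    text = original_bytes.decode(legacy_encoding)

    # e.g. note.txt -> note.utf8.txt (hypothetical naming convention)
    target = source.with_name(source.stem + ".utf8" + source.suffix)
    target.write_bytes(text.encode("utf-8"))

    # Round-trip check: converting back must reproduce the original bytes.
    round_trip = target.read_bytes().decode("utf-8").encode(legacy_encoding)
    if round_trip != original_bytes:
        target.unlink()
        raise ValueError(f"lossy migration for {source}")
    return target
```

Keeping both files means a better migration path can still be tried later from the untouched original.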
![Page 32: Risk management and auditing](https://reader034.vdocument.in/reader034/viewer/2022051207/5445a0c4b1af9fdb068b460c/html5/thumbnails/32.jpg)
Finding tools
•Migration
• Current versions of the original software may be able to open old files.
• Open-source software in the same genre may be able to translate proprietary file formats (often imperfectly). Open-source projects tend to maintain translators longer than you’d think.
• Look on the web!
• MIGRATE FAST. Once it’s damaged or obsolete, it’s probably too late.
•Emulation
• Look for the gamers! It’s WILD what they’ll emulate!
• Look to the open-source community for operating-system, hardware-driver emulators.
• Frankly, there’s a lot of hype and vaporware here.
![Page 33: Risk management and auditing](https://reader034.vdocument.in/reader034/viewer/2022051207/5445a0c4b1af9fdb068b460c/html5/thumbnails/33.jpg)
When is a PDF not a PDF?
•When it’s a .doc with the wrong file extension
•When there’s no file extension on it at all
•When it’s so old it doesn’t follow the standardized PDF conventions
•When it’s otherwise malformed, made by a bad piece of software.
•How do you know whether you have a good PDF? (Or .doc, or .jpg, or .xml, or anything else.)
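A rough first answer to "what is this thing, really?" is to ignore the file extension and look at the file's opening bytes. The sketch below is a minimal illustration: the signature table covers only three formats, and the function names are made up for this example. It catches the first two cases above (wrong or missing extension). It does not say whether a file is a well-formed example of its format, which takes a real validation tool.

```python
from pathlib import Path
from typing import Optional

# (label, leading "magic" bytes, extensions that plausibly go with them)
SIGNATURES = [
    ("pdf", b"%PDF-", {".pdf"}),
    # OLE2 compound file: the container used by legacy .doc/.xls/.ppt
    ("ole2", bytes.fromhex("D0CF11E0A1B11AE1"), {".doc", ".xls", ".ppt"}),
    ("jpeg", bytes.fromhex("FFD8FF"), {".jpg", ".jpeg"}),
]


def sniff_format(path: Path) -> Optional[str]:
    """Guess the format from the first bytes, regardless of the file name."""
    with path.open("rb") as handle:
        head = handle.read(16)
    for label, magic, _extensions in SIGNATURES:
        if head.startswith(magic):
            return label
    return None


def extension_matches(path: Path) -> bool:
    """True only if the contents are recognized and fit the extension."""
    with path.open("rb") as handle:
        head = handle.read(16)
    for _label, magic, extensions in SIGNATURES:
        if head.startswith(magic):
            return path.suffix.lower() in extensions
    return False
```

For example, a Word document renamed to `report.pdf` sniffs as `ole2` and fails `extension_matches`, while an extensionless file that starts with `%PDF-` still sniffs as `pdf`.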
![Page 34: Risk management and auditing](https://reader034.vdocument.in/reader034/viewer/2022051207/5445a0c4b1af9fdb068b460c/html5/thumbnails/34.jpg)
File format registries and testing tools
•JHOVE: JSTOR/Harvard Object Validation Environment
• Java software intended to be pluggable into other software environments
• Answers “What format is this thing?” and “Is this thing a good example of the format?”
• Limited repertoire of formats
•PRONOM/DROID + GDFR = Unified Digital Formats Registry
•Wrapper tool: FITS, File Information Tool Set
• JHOVE + DROID + various other testers. State of the art.
![Page 35: Risk management and auditing](https://reader034.vdocument.in/reader034/viewer/2022051207/5445a0c4b1af9fdb068b460c/html5/thumbnails/35.jpg)
Thanks!
•Copyright 2011 by Dorothea Salo.
•This lecture and slide deck are licensed under a Creative Commons Attribution 3.0 United States License.