PIPING HOT: Little Bins in big workflows
Alex Garnett
Digital Preservation & Data Curation
SFU Library
Thesis: I am a terrible programmer
• 20% of you are thinking “no kidding!”
• The other 80% of you are thinking “uh huh. Stupid false-modest shmuck.”
• Who needs impostor syndrome when you have a bash shell?
• For the record, this is the payoff from all those colonoscopy jokes. Yep.
But how does it apply to libraries?
[If MJ Suhonos is here this year, this is his cue to groan audibly]
LIBRARY PROBLEM #1: PDF/A
• ProQuest wants PDF/A submissions from now on
• “now on” apparently = the past five years’ backlog
• We have to convert five years of theses!
• This is now also being used at the UofA.
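The slides don't name the tool used for the backlog conversion. As a hedged sketch, assuming Ghostscript's `pdfwrite` device (which accepts `-dPDFA`), a batch job over a folder of theses might look like this. The function names `pdfa_command` and `convert_backlog` and the folder names are hypothetical. Full conformance may also need an output-intent ICC profile (Ghostscript ships a template `PDFA_def.ps` for that, omitted here), and flag spellings such as `-sColorConversionStrategy` differ between Ghostscript versions, so check your installed version and run the results through a PDF/A validator (veraPDF, for example) before sending anything to ProQuest.

```python
import subprocess
from pathlib import Path


def pdfa_command(src, dst, level=2):
    """Build a Ghostscript argv that rewrites `src` as PDF/A-`level`.

    Assumption: Ghostscript does the conversion; the slides don't say.
    """
    return [
        "gs",
        f"-dPDFA={level}",
        "-dBATCH", "-dNOPAUSE",               # run non-interactively, then exit
        "-sDEVICE=pdfwrite",                  # write a new PDF
        "-sColorConversionStrategy=RGB",      # PDF/A needs consistent colour
        "-dPDFACompatibilityPolicy=1",        # drop non-conformant features
        f"-sOutputFile={dst}",
        str(src),
    ]


def convert_backlog(pdfs, outdir, level=2, dry_run=True):
    """Build (and optionally run) one conversion per input PDF.

    Outputs keep their original file names inside `outdir`.
    """
    out = Path(outdir)
    commands = [pdfa_command(src, out / Path(src).name, level) for src in pdfs]
    if not dry_run:
        out.mkdir(parents=True, exist_ok=True)
        for argv in commands:
            subprocess.run(argv, check=True)  # stop at the first failure
    return commands
```

Usage: `convert_backlog(sorted(Path("theses").glob("*.pdf")), "pdfa", dry_run=False)`. With `dry_run=True` (the default) nothing is executed, so you can eyeball the generated commands before letting the job loose on five years of theses.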
LIBRARY PROBLEM #2: ARCHIVES PROBLEM: LIBRARY HARDER, STARRING BRUCE WILLIS
CRAP, I USED UP THE WHOLE SLIDE ON THE TITLE
• Archives needed a GUI tool for creating restricted FTP accounts for donors.
LIBRARY PROBLEM #3: PDF REDACTION (IT’S LIKE THE FIRST ONE BECAUSE NO ONE LIKED THE SEQUEL, DOES ANYONE WANT TO WATCH TEMPLE OF DOOM LATER, OH HELL I’VE DONE IT AGAIN)
• We learned we had some poorly redacted PDFs
• Blackout meant to obscure text, but the text underneath was still selectable
• Solution:
  – Detect offending pages with ghostscript…
    • (this is the hard part; dumping PDF guts is appalling)
• … and then:
  – Snip offending pages with pdftk
  – Convert them to images with imagemagick
  – OCR back into PDF (minus obscured text) with tesseract and fix up the dimensions with gs again
  – Paste back in with pdftk
  – 5 lines, all free tools! Documentation & piping.
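The pipeline above can be sketched as a chain of commands. This is an illustration under stated assumptions, not the author's actual script: it drives the same tools from Python's `subprocess`, handles one offending page at a time, takes the page number and total page count as given (the ghostscript detection step is not shown), and assumes US Letter pages (612 × 792 pt) for the gs resize. The names `redaction_commands` and `run`, the `work/` directory, and the 300 dpi density are hypothetical choices. On ImageMagick 7 the `convert` entry point is called `magick`.

```python
import subprocess
from pathlib import Path

# Assumption: every page is US Letter (612 x 792 points).
LETTER_PTS = (612, 792)


def redaction_commands(src, page, num_pages, workdir="work",
                       dst="redacted.pdf", dpi=300, size_pts=LETTER_PTS):
    """Build argv lists that re-render one leaky page (1-based) of `src`.

    Detecting which page leaks is assumed to have happened already.
    """
    if not 1 <= page <= num_pages:
        raise ValueError(f"page {page} is outside 1..{num_pages}")
    w = Path(workdir)
    snip = w / f"page{page}.pdf"
    png = w / f"page{page}.png"
    ocr_base = w / f"page{page}_ocr"  # tesseract appends ".pdf" itself
    fixed = w / f"page{page}_fixed.pdf"

    # pdftk page ranges for the untouched pages before and after `page`.
    before = [f"A1-{page - 1}"] if page > 1 else []
    after = [f"A{page + 1}-end"] if page < num_pages else []

    return [
        # 1. Snip the offending page out with pdftk.
        ["pdftk", str(src), "cat", str(page), "output", str(snip)],
        # 2. Rasterize it with ImageMagick; the box and the text under it
        #    become the same black pixels. Flatten onto white for OCR.
        ["convert", "-density", str(dpi), str(snip),
         "-background", "white", "-alpha", "remove", str(png)],
        # 3. OCR the image back into a searchable PDF with tesseract.
        ["tesseract", str(png), str(ocr_base), "pdf"],
        # 4. Scale the OCR'd page back to the assumed page size with gs.
        ["gs", "-o", str(fixed), "-sDEVICE=pdfwrite", "-dFIXEDMEDIA",
         "-dPDFFitPage", f"-dDEVICEWIDTHPOINTS={size_pts[0]}",
         f"-dDEVICEHEIGHTPOINTS={size_pts[1]}", f"{ocr_base}.pdf"],
        # 5. Splice the cleaned page back in using pdftk handles A and B.
        ["pdftk", f"A={src}", f"B={fixed}", "cat", *before, "B", *after,
         "output", str(dst)],
    ]


def run(commands, workdir="work"):
    """Execute each step in order, stopping at the first failure."""
    Path(workdir).mkdir(parents=True, exist_ok=True)
    for argv in commands:
        subprocess.run(argv, check=True)
```

Usage: `run(redaction_commands("thesis.pdf", page=3, num_pages=10))` would write `redacted.pdf` with page 3 replaced by a rasterized, OCR'd copy. The rasterizing step is what actually removes the hidden text: once the black box and the words under it are the same pixels, tesseract can only recover what is visible on the page.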
Takeaway
• If you find yourself doing a very bad job of learning PHP and feeling like you have something to prove: it doesn’t have to be this way
• There is a huge amount of useful space you can occupy as a barely-programmer if you’re comfortable using a terminal for problem solving (less so on Windows). Stack Overflow and Google are your friends.
• Open-source command-line tools are really good these days! They are powerful, they are straightforward, and they are often cutting-edge.