Demystifying PDFs
![Page 1: Demystifying pd fs](https://reader034.vdocument.in/reader034/viewer/2022051514/54b8e09a4a7959df298b460e/html5/thumbnails/1.jpg)
Demystifying PDFs
Betsy Fanning
AIIM Nashville 2010
![Page 2: Demystifying pd fs](https://reader034.vdocument.in/reader034/viewer/2022051514/54b8e09a4a7959df298b460e/html5/thumbnails/2.jpg)
Agenda
Introduction to PDF
Overview of PDF Standards
Adoption of PDF Standards
![Page 3: Demystifying pd fs](https://reader034.vdocument.in/reader034/viewer/2022051514/54b8e09a4a7959df298b460e/html5/thumbnails/3.jpg)
How Many PDF Files Are There?
951,000,000 PDF pages on Google
![Page 4: Demystifying pd fs](https://reader034.vdocument.in/reader034/viewer/2022051514/54b8e09a4a7959df298b460e/html5/thumbnails/4.jpg)
Digital Documents
Introduction of PDF
◦ Portable Document Format
◦ Digital format for representing documents
PDF files are created
◦ Natively
◦ Converted from other electronic formats
◦ Digitized from paper, microform, or other format
A specification for electronic files representing documents
![Page 5: Demystifying pd fs](https://reader034.vdocument.in/reader034/viewer/2022051514/54b8e09a4a7959df298b460e/html5/thumbnails/5.jpg)
Portable Document Format
◦ Widely used worldwide: business, government, libraries and archives
◦ Information must be kept for long periods of time
◦ Must remain usable and accessible across multiple generations of technology
![Page 6: Demystifying pd fs](https://reader034.vdocument.in/reader034/viewer/2022051514/54b8e09a4a7959df298b460e/html5/thumbnails/6.jpg)
What is PDF?
Reliable, consistent viewing and printing
Mix text, raster images, line art, color
Basic unit is the page
Easy navigation, fast access to any page
Small file size
Dynamic
◦ Digital signatures
◦ Forms
![Page 7: Demystifying pd fs](https://reader034.vdocument.in/reader034/viewer/2022051514/54b8e09a4a7959df298b460e/html5/thumbnails/7.jpg)
ISO 32000-1:2008 (PDF)
ISO 32000-1:2008, Document management – Portable Document Format – Part 1: PDF 1.7
2007: Adobe contacted AIIM to request assistance in taking the PDF Specification to ISO
Exact replication of PDF Specification 1.7, including changes and amendments
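Since ISO 32000-1 corresponds to PDF 1.7, it can be useful to see which version a file declares. PDF files begin with a header of the form `%PDF-M.m`. The following is a minimal sketch only: the helper name `pdf_header_version` and the sample bytes are hypothetical, and the header is only part of the picture, because a file's document catalog may declare a later version than the header does.

```python
import re


def pdf_header_version(data: bytes):
    """Return (major, minor) from a '%PDF-M.m' header, or None if absent.

    Hypothetical helper for illustration; it is not part of any standard
    or library.
    """
    # Assumption of this sketch: search a small prefix rather than
    # requiring the header at byte 0, since some files carry leading bytes.
    match = re.search(rb"%PDF-(\d)\.(\d)", data[:1024])
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# Made-up sample bytes that imitate the start of a PDF 1.7 file.
sample = b"%PDF-1.7\n%\xe2\xe3\xcf\xd3\n1 0 obj\n<< >>\nendobj\n"
print(pdf_header_version(sample))  # (1, 7)
```

To inspect a real file, read its first kilobyte in binary mode and pass the bytes to the helper.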
![Page 8: Demystifying pd fs](https://reader034.vdocument.in/reader034/viewer/2022051514/54b8e09a4a7959df298b460e/html5/thumbnails/8.jpg)
ISO/CD 32000-2 (PDF 2.0)
Adds support for geospatial data
Supports Flash
Added collections (portfolios)
Allows for bar codes to be used with form fields
Added structure elements for MathML
Enhanced accessibility
Incorporated ETSI TS 102 778 for digital signatures
Future – Reader improvements and possible merging of PDF streams
![Page 9: Demystifying pd fs](https://reader034.vdocument.in/reader034/viewer/2022051514/54b8e09a4a7959df298b460e/html5/thumbnails/9.jpg)
Why Standardize a Version of PDF
PDF is powerful and flexible
May be too flexible for some applications
Restrict to a subset of PDF
Need higher degree of reliability
May want standard in hands of neutral non-commercial body – internationally recognized standards body such as ISO
Focus on archive needs of government, corporations, libraries
Resolve issues with font embedding and replacement
![Page 10: Demystifying pd fs](https://reader034.vdocument.in/reader034/viewer/2022051514/54b8e09a4a7959df298b460e/html5/thumbnails/10.jpg)
Role of AIIM and Partners
Joint sponsors of the US PDF/A and PDF/E committees
◦ AIIM, Association for Information and Image Management: Secretariat to ISO/TC 171 and ISO/TC 171/SC 2; Secretariat to US Technical Advisory Group (TAG) for ISO/TC 171
◦ NPES, The Association for Suppliers of Printing, Publishing, and Converting Technologies: Secretariat to ANSI Committee for Graphic Arts Technologies Standards (CGATS); Secretariat to US TAG for ISO/TC 130
Joint sponsors of PDF Healthcare committee
◦ ASTM International
![Page 11: Demystifying pd fs](https://reader034.vdocument.in/reader034/viewer/2022051514/54b8e09a4a7959df298b460e/html5/thumbnails/11.jpg)
Role of ISO
ISO Joint Working Groups (JWG) for PDF Standards
◦ ISO/TC 171/SC 2, Document management applications – Application issues
◦ ISO/TC 130, Graphic technology
◦ ISO/TC 46/SC 11, Information and documentation – Archives/records management
◦ ISO/TC 42, Photography
◦ ISO/TC 184/SC 4, Automation systems and integration, Industrial data
◦ ETSI, European Telecommunications Standards Institute
◦ PDF/A Competence Center
![Page 12: Demystifying pd fs](https://reader034.vdocument.in/reader034/viewer/2022051514/54b8e09a4a7959df298b460e/html5/thumbnails/12.jpg)
The PDF standard
Multi-part ISO International Standard
◦ ISO 19005-1:2005, Document management – Electronic document file format for long-term preservation – Part 1: Use of PDF 1.4 (PDF/A-1)
◦ Part 2 (19005-2) intended to bring PDF/A into conformance with ISO 32000
◦ Part 3 (19005-3) Embedded documents
◦ And additional future parts, as necessary
![Page 13: Demystifying pd fs](https://reader034.vdocument.in/reader034/viewer/2022051514/54b8e09a4a7959df298b460e/html5/thumbnails/13.jpg)
PDF Standards
PDF/X, ISO 15930 *
◦ Pre-press data exchange
PDF/A, ISO 19005 (Parts 1, 2 and 3)
◦ Archiving electronic documents
PDF/E (Engineering), ISO 24517-1
◦ For engineering, architectural, and GIS documents
PDF/E (Engineering), ISO/NWP 24517-2
◦ Archive engineering, architectural, and GIS documents
PDF/UA (Universal Access), ISO/CD 14289-1
◦ Intended to address Section 508 concerns
PDF Healthcare
◦ Exchange of electronic health records (CDA and CCR)
PDF, ISO 32000-1 (ISO/CD 32000-2)
PDF/VT, ISO 16612 (2 parts) *
◦ Variable data exchange
PRC, Product Representation Compact (ISO/CD 14739-1)
* Not AIIM Responsibility
![Page 14: Demystifying pd fs](https://reader034.vdocument.in/reader034/viewer/2022051514/54b8e09a4a7959df298b460e/html5/thumbnails/14.jpg)
ISO 15930 (PDF/X)
Graphic technology – Prepress digital data exchange – Use of PDF (PDF/X)
Specifies the use of PDF for the dissemination of complete digital data, in a single exchange, that contains all elements for final print reproduction.
![Page 15: Demystifying pd fs](https://reader034.vdocument.in/reader034/viewer/2022051514/54b8e09a4a7959df298b460e/html5/thumbnails/15.jpg)
ISO 16612 (PDF/VT)
Specifies how to use PDF to define and exchange all content elements and supporting metadata to produce predictable output for variable or transactional document content
![Page 16: Demystifying pd fs](https://reader034.vdocument.in/reader034/viewer/2022051514/54b8e09a4a7959df298b460e/html5/thumbnails/16.jpg)
ISO 19005-1:2005
“This International Standard specifies how to use the Portable Document Format (PDF) 1.4 for long-term preservation of electronic documents”
◦ Applicable to documents containing character, raster, and vector data
◦ The standard does not address: processes for generating PDF/A files; specific implementation details of rendering PDF/A files; methods for storing PDF/A files; hardware and software dependencies
![Page 17: Demystifying pd fs](https://reader034.vdocument.in/reader034/viewer/2022051514/54b8e09a4a7959df298b460e/html5/thumbnails/17.jpg)
Background for PDF/A: Judiciary Use Case
Court documents protect citizens’ rights
Access is assured in trial courts for 20 to 40 years for the Judiciary
Access is often time sensitive
On-site courthouse storage not cost effective
Court decisions are permanent records held “until the end of the republic” by the National Archives
Document format conveys critical information, which must be rendered accurately
Cases – New York Southern, Enron, etc.
20 years of filings are in PDF
![Page 18: Demystifying pd fs](https://reader034.vdocument.in/reader034/viewer/2022051514/54b8e09a4a7959df298b460e/html5/thumbnails/18.jpg)
Records Archive
Do you have electronic records that need to be retained for: (check all that apply)
[Bar chart, axis 0% to 80%. Categories: 6 years; 12 years; 20 years; 50 years; Person lifetimes; Life of legal business entity; Forever/historical]
Most organizations will be keeping some records for a very long time.
N=144, all respondents.
![Page 19: Demystifying pd fs](https://reader034.vdocument.in/reader034/viewer/2022051514/54b8e09a4a7959df298b460e/html5/thumbnails/19.jpg)
Archive File Types
How are the following content types mostly archived in your organization?

| Content type | Native format | PDF | PDF/A | TIFF | XML | XPS | JPEG | Digital video/audio | Print and archive in hard copy | Not archived at all |
|---|---|---|---|---|---|---|---|---|---|---|
| Scanned documents | 3% | 48% | 8% | 29% | 0% | 0% | 5% | 1% | 1% | 4% |
| Electronic documents | 48% | 27% | 7% | 6% | 1% | 0% | 2% | 0% | 4% | 5% |
| Photo images | 20% | 4% | 0% | 6% | 0% | 1% | 59% | 2% | 1% | 8% |
| Email | 64% | 7% | 4% | 2% | 4% | 0% | 0% | 1% | 4% | 14% |
| Video/CCTV recordings | 23% | 0% | 0% | 0% | 0% | 0% | 1% | 35% | 1% | 39% |
| Audio recordings | 23% | 0% | 0% | 0% | 1% | 0% | 0% | 35% | 1% | 40% |
| Web pages | 33% | 7% | 0% | 1% | 13% | 0% | 0% | 1% | 1% | 42% |
| Telephone recordings | 15% | 0% | 0% | 0% | 0% | 1% | 0% | 17% | 0% | 67% |
| Instant messages | 15% | 0% | 1% | 1% | 1% | 0% | 0% | 1% | 1% | 79% |

N=139, all
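The survey figures above are easier to read once the leading archive method is picked out for each content type. The sketch below does that with the percentages exactly as they appear on the slide. The names `COLUMNS`, `SURVEY`, and `dominant_method` are hypothetical and exist only for this illustration.

```python
# Column order as given on the slide.
COLUMNS = [
    "Native format", "PDF", "PDF/A", "TIFF", "XML", "XPS", "JPEG",
    "Digital video/audio", "Print and archive in hard copy",
    "Not archived at all",
]

# Percentages transcribed from the slide, one row per content type.
SURVEY = {
    "Scanned documents":     [3, 48, 8, 29, 0, 0, 5, 1, 1, 4],
    "Electronic documents":  [48, 27, 7, 6, 1, 0, 2, 0, 4, 5],
    "Photo images":          [20, 4, 0, 6, 0, 1, 59, 2, 1, 8],
    "Email":                 [64, 7, 4, 2, 4, 0, 0, 1, 4, 14],
    "Video/CCTV recordings": [23, 0, 0, 0, 0, 0, 1, 35, 1, 39],
    "Audio recordings":      [23, 0, 0, 0, 1, 0, 0, 35, 1, 40],
    "Web pages":             [33, 7, 0, 1, 13, 0, 0, 1, 1, 42],
    "Telephone recordings":  [15, 0, 0, 0, 0, 1, 0, 17, 0, 67],
    "Instant messages":      [15, 0, 1, 1, 1, 0, 0, 1, 1, 79],
}


def dominant_method(row):
    """Return (column name, percentage) for the largest value in a row."""
    best = max(range(len(row)), key=row.__getitem__)
    return COLUMNS[best], row[best]


for content_type, row in SURVEY.items():
    method, share = dominant_method(row)
    print(f"{content_type}: {method} ({share}%)")
```

Read this way, scanned documents are mostly archived as PDF (48%), while "Not archived at all" is the most common answer for five of the nine content types, from video recordings through instant messages.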
![Page 20: Demystifying pd fs](https://reader034.vdocument.in/reader034/viewer/2022051514/54b8e09a4a7959df298b460e/html5/thumbnails/20.jpg)
Sustainable Formats
NARA defines: “…the ability to access an electronic record throughout its lifecycle, regardless of the technology used when it was originally created”
Characteristics of Sustainable Formats
◦ Published documentation and open disclosure
◦ Widespread adoption and use
◦ Self-describing formats
◦ External Dependency
◦ Impact of Patents
◦ Technical Protection Mechanism
![Page 21: Demystifying pd fs](https://reader034.vdocument.in/reader034/viewer/2022051514/54b8e09a4a7959df298b460e/html5/thumbnails/21.jpg)
File Formats
TIFF
◦ Well known
◦ Difficult to create digitally born documents
◦ Indexing documents may be difficult
XML
◦ Many schema exist
◦ Preserves content not the structure
Native File Formats
◦ Several file formats
◦ May render differently depending on the device or platform used
PDF
◦ Widely adopted
◦ Feature rich
◦ Reliable and secure
![Page 22: Demystifying pd fs](https://reader034.vdocument.in/reader034/viewer/2022051514/54b8e09a4a7959df298b460e/html5/thumbnails/22.jpg)
PDF/A
PDF/A is intended to address three primary issues:
◦ Define a file format that preserves the static visual appearance of electronic documents over time
◦ Provide a framework for recording metadata about electronic documents
◦ Provide a framework for defining the logical structure and semantic properties of electronic documents
![Page 23: Demystifying pd fs](https://reader034.vdocument.in/reader034/viewer/2022051514/54b8e09a4a7959df298b460e/html5/thumbnails/23.jpg)
Why PDF/A?
Guarantees the secure reproduction of documents
◦ No technology requirements
Ensures a homogeneous archive
◦ Digital born and scanned documents in same archive
Valid throughout the world
◦ ISO maintained standard
Sustainable file format
◦ Standards exist, files are self-documenting, adoption
37% still have separate image and electronic archives
![Page 24: Demystifying pd fs](https://reader034.vdocument.in/reader034/viewer/2022051514/54b8e09a4a7959df298b460e/html5/thumbnails/24.jpg)
Records Archive
Do you store a significant proportion of your records in any of the following formats?
[Bar chart, axis 0% to 90%. Categories: Native (e.g., DOC, XLS); HTML (e.g., emails, web); TIFF; JPEG; PDF/A]
PDF/A making some ground at 30%.
Native formats still very prevalent.
N=144, all respondents.
![Page 25: Demystifying pd fs](https://reader034.vdocument.in/reader034/viewer/2022051514/54b8e09a4a7959df298b460e/html5/thumbnails/25.jpg)
PDF/A
What are the main reasons you are not using PDF/A?
[Bar chart, axis 0% to 80%. Categories: Still using PDF; Mostly using native formats; File size is too large; Files contain multimedia; Files contain digital signatures; Files contain XML; We don't have any documents worth archiving]
PDF/A benefits still not understood
N=102, Non-PDF/A Users.
![Page 26: Demystifying pd fs](https://reader034.vdocument.in/reader034/viewer/2022051514/54b8e09a4a7959df298b460e/html5/thumbnails/26.jpg)
ISO/DIS 19005-2
“This International Standard specifies how to use the Portable Document Format (PDF) ISO 32000-1 for long-term preservation of electronic documents”
◦ Applicable to documents containing character, raster, and vector data
◦ The standard does not address: processes for generating PDF/A files; specific implementation details of rendering PDF/A files; methods for storing PDF/A files; hardware and software dependencies
![Page 27: Demystifying pd fs](https://reader034.vdocument.in/reader034/viewer/2022051514/54b8e09a4a7959df298b460e/html5/thumbnails/27.jpg)
What is in PDF/A-2?
Additional features in ISO 32000-1 (PDF 1.7)
◦ PDF/A-1 based on PDF 1.4
JPEG 2000 Image Conversion
◦ Added compression process (PDF 1.5)
◦ Higher compression rates, better quality
Embedding PDF/A within Collection
◦ Compile PDF/A collections
Transparency
◦ Permitted in PDF/A-2
Digital Signatures
◦ Follow ETSI/PAdES Standard
PDF Layers (“Optional Content”)
◦ Helpful for technical drawings
◦ Multilingual content
![Page 28: Demystifying pd fs](https://reader034.vdocument.in/reader034/viewer/2022051514/54b8e09a4a7959df298b460e/html5/thumbnails/28.jpg)
PDF/A Conformance
Two Conformance Levels
◦ PDF/A-1a and PDF/A-2a: compliance with all requirements of 19005-1, including those regarding structural and semantic tagging
◦ PDF/A-1b and PDF/A-2b: compliance with all requirements of 19005-1 minimally necessary to preserve the visual appearance of a PDF/A file
◦ PDF/A-2u: compliance with all requirements of 19005-2 except those requirements for logical structure of the document; preserves the visual appearance of the file and ensures any text in the document can be reliably extracted as a series of Unicode code points
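A PDF/A file states its own part and conformance level in its XMP metadata, using the `pdfaid:part` and `pdfaid:conformance` properties from the PDF/A identification namespace `http://www.aiim.org/pdfa/ns/id/`. The sketch below reads that claim from an XMP packet that has already been extracted as text. The function name `pdfa_claim` and the sample packet are hypothetical, and the regular expressions are a shortcut rather than a full XMP parser. The result is only what the file claims about itself, so it is no substitute for running a validator.

```python
import re


def pdfa_claim(xmp: str):
    """Return a label such as 'PDF/A-2u' from XMP text, or None.

    Hypothetical helper: it accepts both the element form
    (<pdfaid:part>2</pdfaid:part>) and the attribute form
    (pdfaid:part="2") of the two identification properties.
    """
    part = re.search(r'pdfaid:part(?:="|>)\s*(\d)', xmp)
    level = re.search(r'pdfaid:conformance(?:="|>)\s*([A-Za-z])', xmp)
    if part is None or level is None:
        return None
    return f"PDF/A-{part.group(1)}{level.group(1).lower()}"


# Made-up XMP fragment for illustration only.
SAMPLE_XMP = """
<rdf:Description rdf:about=""
    xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/">
  <pdfaid:part>2</pdfaid:part>
  <pdfaid:conformance>U</pdfaid:conformance>
</rdf:Description>
"""

print(pdfa_claim(SAMPLE_XMP))  # PDF/A-2u
```

Returning `None` when either property is missing keeps an ordinary PDF from being mistaken for a PDF/A file.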
![Page 29: Demystifying pd fs](https://reader034.vdocument.in/reader034/viewer/2022051514/54b8e09a4a7959df298b460e/html5/thumbnails/29.jpg)
Considerations PDF/A-2
Will not replace or supersede PDF/A-1
Few tools will be available initially
Look at new features
Understand your requirements – then decide
PDF/A-1 is and will remain a valid file type
![Page 30: Demystifying pd fs](https://reader034.vdocument.in/reader034/viewer/2022051514/54b8e09a4a7959df298b460e/html5/thumbnails/30.jpg)
Backfile Conversion to PDF/A
How would you characterize your strategy to convert your existing documents to PDF/A?
[Bar chart, axis 0% to 60%. Categories: Centralized resource; Outsource service provider (BPO) onshore; Offshore service provider; Distributed to point of use/line of business; No plans to back-file convert]
32% driving back-conversion centrally
N=40, PDF/A users.
![Page 31: Demystifying pd fs](https://reader034.vdocument.in/reader034/viewer/2022051514/54b8e09a4a7959df298b460e/html5/thumbnails/31.jpg)
Backfile Conversion to PDF/A-2
How soon do you plan to converge to PDF/A-2, when it is published?
[Bar chart, axis 0% to 40%. Categories: Within 1 year; Within 2 years; Within 3 years; Within 5 years; Unlikely; I’ve not heard of PDF/A-2]
One third of PDF/A users have not heard of PDF/A-2
Another third will converge to PDF/A-2 in 3 years or less.
N=40, PDF/A users.
![Page 32: Demystifying pd fs](https://reader034.vdocument.in/reader034/viewer/2022051514/54b8e09a4a7959df298b460e/html5/thumbnails/32.jpg)
PDF/A-2 Tools
Would you reject a tool or application that was not tested to a conformance standard?
[Stacked bar chart, axis 0% to 100%. Responses: Always, Probably, Possibly, No. Rows: Software used to view documents; Software used to create documents; Electronic document files]
80% expect to use conformance certified creation tools.
N=40, PDF/A users.
![Page 33: Demystifying pd fs](https://reader034.vdocument.in/reader034/viewer/2022051514/54b8e09a4a7959df298b460e/html5/thumbnails/33.jpg)
ISO/NWI/CD 19005-3 (PDF/A-3)
Document management – Electronic document file format for long-term preservation including embedded files – Part 3: Use of ISO 32000-1 (PDF/A-3)
Specifies the use of PDF for preserving the static visual representation of page-based electronic documents over time, in addition to allowing any type of other content to be included as an embedded file or attachment
![Page 34: Demystifying pd fs](https://reader034.vdocument.in/reader034/viewer/2022051514/54b8e09a4a7959df298b460e/html5/thumbnails/34.jpg)
Accessibility
Does your content need to be accessible (able to be accessed and read by assistive technologies)?
[Bar chart, axis 0% to 45%. Categories: Always; Some of it; Should be but isn't; No]
There is a recognition of accessibility regulations.
N=144, all respondents.
![Page 35: Demystifying pd fs](https://reader034.vdocument.in/reader034/viewer/2022051514/54b8e09a4a7959df298b460e/html5/thumbnails/35.jpg)
ISO/CD 14289-1 (PDF/UA)
Document management applications – Electronic document file format enhancement for accessibility (PDF/UA) – Use of ISO 32000-1 (PDF/UA-1)
Specifies how to use PDF to produce electronic documents which are accessible
Does not specify:
◦ Processes for converting paper or electronic documents
◦ Storage of PDF/UA documents
◦ Specific design, user interface, implementation or other details for rendering
![Page 36: Demystifying pd fs](https://reader034.vdocument.in/reader034/viewer/2022051514/54b8e09a4a7959df298b460e/html5/thumbnails/36.jpg)
ISO 24517-1:2008 (PDF/E)
Document management – Engineering document format using PDF – Part 1: Use of PDF 1.6 (PDF/E-1)
Specifies the use of PDF for the creation of documents used in engineering workflows. It does not define:
◦ Method of electronic distribution
◦ Method of creation or conversion from paper or electronic documents to the PDF/E format
◦ Specific technical design, user interface, or implementation
◦ Required hardware or methods for validation
![Page 37: Demystifying pd fs](https://reader034.vdocument.in/reader034/viewer/2022051514/54b8e09a4a7959df298b460e/html5/thumbnails/37.jpg)
ISO 24517-1:2008 (PDF/E)
Addresses need for reliable exchange of engineering documentation
◦ Secure distribution of intellectual property
◦ Reliable exchange and change management (multiple application types and platforms)
◦ Reduces costs associated with paper (distribution as well as storage/archive)
Covers 3 primary areas:
◦ Compact, accurate printing of engineering drawings
◦ Support for exchanging/managing annotation and comment data
◦ Incorporation of complex data into PDF (3D, object level data, etc.)
Part 2 – Update to ISO 32000-1 and archive capabilities
![Page 38: Demystifying pd fs](https://reader034.vdocument.in/reader034/viewer/2022051514/54b8e09a4a7959df298b460e/html5/thumbnails/38.jpg)
ISO/CD 14739-1 (PRC)
Document management – 3D use of Product Representation Compact (PRC) format – Part 1: PRC 10001
Describes a file format for 3D content data for the purposes of 3D visualization and exchange.
Used for creating, viewing and distributing 3D data in a document exchange workflow
![Page 39: Demystifying pd fs](https://reader034.vdocument.in/reader034/viewer/2022051514/54b8e09a4a7959df298b460e/html5/thumbnails/39.jpg)
What is PDF Healthcare?
A “Best Practices Guide” describing attributes of the Portable Document Format (PDF) to facilitate the capture, exchange, preservation and protection of healthcare information
◦ Share data easily between healthcare institutions
◦ Ease the transition into digital health records for information exchange and sharing
◦ Bridge the gap between healthcare providers and consumers
![Page 40: Demystifying pd fs](https://reader034.vdocument.in/reader034/viewer/2022051514/54b8e09a4a7959df298b460e/html5/thumbnails/40.jpg)
PDF Healthcare Background
eHealthcare is a reality in today’s environment
PDF advantages in healthcare
◦ Long-standing success and adoption of PDF
◦ PDF provides a secure and universal container for multiple data types regardless of data source or destination
◦ PDF is platform- and system-neutral
◦ PDF allows for interoperability and bi-directional information exchange
◦ Selected records can be easily and quickly printed from PDF when necessary
![Page 41: Demystifying pd fs](https://reader034.vdocument.in/reader034/viewer/2022051514/54b8e09a4a7959df298b460e/html5/thumbnails/41.jpg)
Initial PDF Healthcare Offering
Best Practices Guide
◦ Describes the attributes of the Portable Document Format (PDF) that are relevant to facilitate the capture, exchange, preservation and protection of healthcare information
Implementation Guide / Use Cases
◦ Supplemental information that will provide examples of interoperability with existing healthcare standards such as ASTM’s Continuity of Care Record (CCR)
![Page 42: Demystifying pd fs](https://reader034.vdocument.in/reader034/viewer/2022051514/54b8e09a4a7959df298b460e/html5/thumbnails/42.jpg)
Additional PDF Healthcare Offering
PDF Healthcare Supporting the Clinical Document Architecture: White Paper
◦ Discusses the implementation of PDF Forms in support of the HL7 Clinical Document Architecture (CDA) to simplify, secure, and speed transactions between entities with varying levels of automation
Creating PDF Forms for the CDA: Implementation guide
◦ Supplemental information that will provide examples of various forms, e.g., the Emergency Information Form for Children with Special Needs, that support a subset of the CDA schema
![Page 43: Demystifying pd fs](https://reader034.vdocument.in/reader034/viewer/2022051514/54b8e09a4a7959df298b460e/html5/thumbnails/43.jpg)
Proposed Legislation – PDF/A
Alabama
Alaska
California (Repealed 10/19/2010)
Connecticut
Florida
Idaho
Kentucky
Missouri
Nevada
New York
Ohio
Wisconsin
![Page 44: Demystifying pd fs](https://reader034.vdocument.in/reader034/viewer/2022051514/54b8e09a4a7959df298b460e/html5/thumbnails/44.jpg)
PDF/A Adoption
Europe
◦ Standard eBilling (Organisation for Promotion of Automated Accounting)
◦ Germany, France, Austria, Switzerland, Poland, Norway
Brazil
China
MoReq2
U.S. Nuclear Regulatory Commission
U.S. District Courts
NARA
Library of Congress
![Page 45: Demystifying pd fs](https://reader034.vdocument.in/reader034/viewer/2022051514/54b8e09a4a7959df298b460e/html5/thumbnails/45.jpg)
NARA Guidelines
PDF/A-1 compliance is not enough
◦ Comply with NARA’s transfer instructions for records in PDF
◦ Provide transfer documentation
◦ Must comply with image quality specifications for transfer of permanent records
◦ Must use OCR processes that do not alter the original bit-mapped image
![Page 46: Demystifying pd fs](https://reader034.vdocument.in/reader034/viewer/2022051514/54b8e09a4a7959df298b460e/html5/thumbnails/46.jpg)
Opportunities
Conversion
◦ Paper based
◦ Electronic files to PDF subsets
Validation
◦ Isartor Test Suite
◦ Bavaria Report (PDFlib)
◦ Adobe Acrobat Preflight
Data cleanup
◦ Metadata
◦ Embedding fonts and images
◦ Tagging
Consulting and recommending use of PDF/A
Conversion of Healthcare records
![Page 47: Demystifying pd fs](https://reader034.vdocument.in/reader034/viewer/2022051514/54b8e09a4a7959df298b460e/html5/thumbnails/47.jpg)
Questions/Contact
Betsy Fanning
Ph: +1.301.755.2682
Skype: betsy.fanning
Email: [email protected]
Twitter: bfanning
LinkedIn: www.linkedin.com/in/betsyfanning
PDF Standards – www.aiim.org/standards
Get involved – Service Companies still needed for AIIM’s National Standards Council (NSC)
![Page 48: Demystifying pd fs](https://reader034.vdocument.in/reader034/viewer/2022051514/54b8e09a4a7959df298b460e/html5/thumbnails/48.jpg)
PDF Demonstration URL
http://www.mach2solutions.net/pdf/pdf.html