The National Archives and Records Administration (NARA)
Electronic Records Archives (ERA)
ERA Misconceptions and Facts
• ERA is not yet operational.
– NARA relies on ERA every day to preserve and provide access to electronic records, and ERA has been operating since 2008. See About the ERA Program for a description of the different instances of ERA that have been deployed since 2008.
• ERA is a paper digitization and scanning project.
– ERA is primarily an archive for "born-digital" electronic records created in the course of business by Federal agencies on their computer systems. However, ERA does have the capability to preserve and make available the electronic data resulting from scanning projects that digitize Federal records. As more and more analog Federal records are digitized by NARA and its Digitizing Partners, as well as other Federal agencies, they will be available using ERA's Online Public Access interface.
About the ERA Program
• Developing ERA will cost the American taxpayer a projected $1.4 Billion.
– The total appropriation for the development phase of the ERA project, including development, program management, and operations and maintenance costs from 2002 to 2011, will total $457 Million. Development will conclude on September 30, 2011. Operations and maintenance costs thereafter are expected to be $25 Million to $30 Million per year for at least the next couple of years.
• ERA is just an electronic data storage archive. Why couldn't NARA just buy storage devices at a technology superstore like everyone else does?
– The requirement to provide a true digital archive for the National Archives that is capable of applying all the laws and regulations that apply to Federal, Presidential, and Congressional records means that ERA is far more complicated than just a set of data storage devices. For example, ERA also provides workflow support for many of the transactions that occur between NARA and its agency customers, capabilities to process and preserve electronic records, and an access interface to make electronic records available to the public.
• ERA can't solve the problem of long-term preservation of electronic records as hardware and software technology changes over time.
– ERA allowed NARA to make a quantum leap forward in the preservation of electronic records and to build a flexible and adaptable framework that will let NARA evolve as electronic recordkeeping evolves. Without ERA, NARA's legacy systems and processes for electronic records would not be able to handle the increasing volumes we are beginning to encounter.
• The public will be able to access and conduct full-text searches against all of the electronic records in ERA.
– The Online Public Access (OPA) piece of ERA does provide access and full-text search capabilities for electronic records that have been determined to be free of any access restrictions, and that contain text-based content. Many of the transfers of electronic records NARA receives have some access restrictions that prohibit NARA from releasing them to the public. NARA is required to protect sensitive information about individuals, for example, and it must conduct time-consuming reviews of these records to determine which records can be released and which need to be withheld under applicable laws and regulations. Additionally, all Presidential electronic records are subject to the Presidential Records Act and the access provisions therein. Therefore, while OPA does provide the capability to search the content of electronic records, that capability will not apply to all electronic records in the ERA System.
• ERA will "normalize" all electronic records into one single format.
– While ERA is implementing a Transformation Framework that will allow for the conversion of electronic records from one format to another more persistent format, many of the electronic records will remain in their original format until such time as the format approaches obsolescence. Many of the formats we encounter are ubiquitous, and can be rendered easily by the most popular, and often freely available, software programs. NARA will continually evaluate the formats we receive from Federal agencies to make sure the essential characteristics of electronic records in our holdings are preserved.
ERA's history
• 1969-1993: Formation, stagnation, then rejuvenation
• 1993: Armstrong v. The Executive Office of the President
• 1998: Transition to e-Government supports an archives of the future
• 2000: Establishment of the ERA Program Management Office
• 2004: Seeking a development contractor
• 2008: Initial Operating Capability
• 2009-2011: Last phases of development
ERA
Electronic Records Archives
http://www.archives.gov/era/
The National Archives and Records Administration (NARA)
Archives II, College Park, MD
Hubert Wajs, Ph.D.
hubert.w@wp.pl
Toruń, 2007.12.06
• Project history (Timeline)
• Project organization
– Resources (funding)
– Partners
• Project goals
• Problems and challenges
Timeline I
• 1997.12: John W. Carlin, Archivist of the United States, established the Electronic Records Work Group.
• 1998: NARA received its first government funds, from the National Science Foundation, for this working group to begin research on electronic archives.
• 2000: the ERA Project Office was created at NARA to conduct research on the preservation of electronic records (design of the ERA system).
• 2002.10.31: NARA Directive 101, Part 3, Section 6. The ERA program officially began.
• 2005.08: NARA selected Lockheed Martin Corporation to build the ERA system.
• 2011: Full operating capability of the ERA system.
Funding
YEAR   USD MILLIONS   NOTES
1999   1.8            for 3 years
2001   20             President
2002   22.3           Congress
2003   12
2004   36             President
2005   308            Congress, through 2012
Partnerships in innovation - research partnerships
• Leading IT institutes and institutions
– National Institute of Standards and Technology,
– NASA,
– Army Research Laboratory,
– San Diego Supercomputer Center,
– Georgia Tech Research Institute,
– National Center for Supercomputing Applications, the University of Maryland
• And 'archival' ones
– INTERPARES,
– The Library of Congress
What is ERA?
ERA Vision (2004)
• ERA will authentically preserve and provide access to any kind of electronic record, free from dependency on any specific hardware or software, enabling NARA to carry out its mission into the future.
08.2006
• ERA will be a comprehensive system for preserving and providing continuing access to any type of electronic records created anywhere in the U.S. Federal Government, enabling NARA to carry out its mission into the future.
[Jarrellann Filsinger]
And what about "authentically preserve records"?
• Identification and Authentication
– A password
– A token
– A fingerprint
• Access control
– Roles to perform particular function
– No more privilege than necessary to perform a job
• Audit log
• Integrity
– Message digest
– Virus detection
– Encryption
• System assurance
– Policies, standards, procedures
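The "message digest" item under Integrity can be illustrated with a minimal sketch. The slide does not name a digest algorithm, so SHA-256 is an assumption here, and the function names are hypothetical rather than part of ERA.

```python
import hashlib


def message_digest(data: bytes) -> str:
    """Return the SHA-256 digest of a record's bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def verify_integrity(data: bytes, stored_digest: str) -> bool:
    """Recompute the digest and compare it with the one stored at ingest."""
    return message_digest(data) == stored_digest


# Illustrative record content (not taken from any real holding).
record = b"Thank you very much for your letter."
digest_at_ingest = message_digest(record)

unchanged = verify_integrity(record, digest_at_ingest)
altered = verify_integrity(record + b" ", digest_at_ingest)
print(unchanged, altered)
```

If even one byte of the record changes, the recomputed digest no longer matches the stored one, which is what makes a digest usable as an integrity check.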
ERA Objectives by David Lake 08.2006
• To preserve any type of electronic records,
• created using any type of application,
• on any computing platform,
• delivered on any digital media
• from any entity in the Federal Government and any donor,
• to provide discovery and delivery to anyone with an interest and legal right of access,
• Now and for the "Life of the Republic".
What for? In other words, why?
• U.S. administrative law adopted an official definition of a record (44 U.S.C. 3301), stating that, regardless of medium, all documents connected with the activities of the administration that are preserved or designated for preservation are evidence.
• 1998.10.22 - General Records Schedule 20 (GRS 20)
• 2000.03.20 - Supreme Court ruling
The Presidential Projects
• The Tennessee Valley Authority: Electricity for All
– 1933.05.18 FDR - Tennessee Valley Authority Act.
• TVA was to improve navigability on the Tennessee River.
• 1942 MED - Manhattan Engineer District
– The Manhattan Project
• The Apollo Project
– 1961.05.25 President Kennedy delivering his famous Moon speech
• "... I believe this nation should commit itself to achieving the goal, before this decade is out, of landing a man on the Moon and returning him safely to the Earth."
– 1969.07.20 Apollo 11
• The Independence Project
– 1973.11 President Richard M. Nixon prescribed antidote for the energy crisis
• ERA - 'the showcase of American research'
NARA - Experience gained
• The Electronic and Special Media Records Services Division (NARA) has been collecting electronic records for 25 years.
• Holdings
– 15 TB
– tapes
– DLT 8000
– IBM 3480
– also paper documentation:
• technical documentation of what is in the databases
• documentation of the records-handling process
1993.01.20 to 2001.01.20
http://www.clintonlibrary.gov/
• E-mails from the White House were taken in: 40 million objects.
• Cables (electronic diplomatic messages) from the Department of State: 25 million (not yet made available).
• Snapshots of the websites of the main government agencies (60% of all federal agencies) and
How to get this under control?
• How to
– describe it in the traditional way?
– check it and migrate it to new media and formats?
– fit it into the storage rooms?
• The approach used so far leads to a dead end, because the following are becoming ever more common:
– electronic mail and
– document workflow systems.
• And these will be joined by:
– film recordings (digital),
– high-definition television (HDTV),
– 3D models (especially VMR – Virtual Model Reality),
– computer-aided engineering documentation (CAD systems), GIS (Geographical Information Systems).
The Electronic Records Archives
• The ERA program is not a simple continuation of previous experience.
• The ERA system is meant to:
– accept ("Submission"),
– store ("Repository") and
– make available ("Dissemination") electronic records;
– it is also meant to support NARA's management of various types of electronic records throughout their entire lifecycle.
Reference Model for an Open Archival Information System (OAIS) - ISO 14721:2003
• Digital repository
– Submission Information Packages – SIPs
– Archival Information Packages – AIPs
– Dissemination Information Packages – DIPs
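The three OAIS package types can be sketched as a simple flow from producer to archive to user. This is a minimal illustration under stated assumptions: the class fields, the `ingest` and `disseminate` functions, and the use of SHA-256 for fixity are hypothetical and are not taken from ISO 14721 or from ERA.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SIP:
    """Submission Information Package: what the producer hands over."""
    producer: str
    files: Dict[str, bytes]  # file name -> content


@dataclass(frozen=True)
class AIP:
    """Archival Information Package: what the archive stores."""
    identifier: str
    producer: str
    files: Dict[str, bytes]
    digests: Dict[str, str]  # file name -> fixity digest (assumed SHA-256)


@dataclass(frozen=True)
class DIP:
    """Dissemination Information Package: what a user receives."""
    identifier: str
    files: Dict[str, bytes]


def ingest(sip: SIP, identifier: str) -> AIP:
    """Turn a SIP into an AIP, adding fixity information for each file."""
    digests = {name: hashlib.sha256(data).hexdigest()
               for name, data in sip.files.items()}
    return AIP(identifier, sip.producer, dict(sip.files), digests)


def disseminate(aip: AIP, wanted: List[str]) -> DIP:
    """Build a DIP containing only the requested files that exist."""
    return DIP(aip.identifier,
               {name: aip.files[name] for name in wanted if name in aip.files})


# Illustrative values only.
sip = SIP("Agency X", {"a.txt": b"hello", "b.txt": b"world"})
aip = ingest(sip, "aip-0001")
dip = disseminate(aip, ["a.txt"])
print(sorted(dip.files))
```

The point of the separation is that the stored package (AIP) carries preservation information the producer never supplied, while the delivered package (DIP) can be a subset tailored to the request.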
Challenges
• Geometric growth of the holdings
• Pace at which innovations are introduced
– Media
– Hardware
– Software
• Anticipating future (3-5 years) developments in:
– methods of communication
– media
Persistent Archives Project
• Grid
– NARA +
– University of Maryland +
– San Diego Supercomputer Center
• a conceptual space for studying data objects
– distribution of the holdings across different locations
– records management independent of the hardware platform
• IBM, DELL, SUN, APPLE
– and of the software platform
• SUN, UNIX, LINUX, WINDOWS
Authenticity: linking of identity metadata to a record
– Date record is made
– Date record is transmitted
– Date record is received
– Date record is filed
– Name of author (person or organization issuing the record)
– Name of addressee (person or organization for whom the record is intended)
– Name of writer (person or organization responsible for the record content)
– Name of originator (e-mail address of sender)
– Name of recipient(s) (person or organization to whom the record is sent)
– Name of creator (person or organization in whose archival fonds the record exists)
– Name of action matter (transaction or activities in the course of which the record is created)
– Name of documentary form (e-mail, report, memo etc.)
– Identification of digital components
– Identification of attachments (digital signature)
– Archival classification code
– Assertion about the creation of record
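A subset of the identity metadata listed above can be sketched as a small data structure with a completeness check. The field names, the choice of subset, and the sample values are assumptions made for illustration; they do not reproduce any ERA schema.

```python
from dataclasses import asdict, dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class IdentityMetadata:
    """A hypothetical subset of the identity metadata for one record."""
    date_made: str
    date_transmitted: str
    date_received: str
    author: str
    addressee: str
    writer: str
    documentary_form: str
    attachments: Tuple[str, ...] = ()
    classification_code: Optional[str] = None


def missing_fields(meta: IdentityMetadata) -> List[str]:
    """List the fields that are still empty, in declaration order."""
    return [name for name, value in asdict(meta).items()
            if value in ("", None)]


# Illustrative values only; empty strings mark information not yet known.
meta = IdentityMetadata(
    date_made="1990-03-27",
    date_transmitted="",
    date_received="",
    author="Example Office",
    addressee="Example Addressee",
    writer="Example Writer",
    documentary_form="letter",
)
gaps = missing_fields(meta)
print(gaps)
```

A check like this makes it visible which authenticity assertions cannot yet be made about a record, instead of silently storing incomplete metadata.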
Using a DATA GRID I
1. User asks for data from the data grid
2. The data is found and returned
– Where and how details are hidden
[Diagram: DATA GRID]
Using a DATA GRID II
1. User asks for data from the data grid
2. Data request goes to Storage Resource Broker (SRB)
3. Server looks up data in Metadata Catalog
4. Catalog tells which SRB server has data
– Data grid has arbitrary number of servers (addresses and logical file names)
– Heterogeneity of data (formats) is hidden from users
5. 1st server asks 2nd server for data
6. The data is found and returned
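The six steps above can be sketched with two toy classes. This is only an illustration of the lookup pattern: `MetadataCatalog` and `BrokerServer` are hypothetical names and do not reproduce the real SRB or MCat interfaces.

```python
from typing import Dict


class MetadataCatalog:
    """Maps a logical file name to the name of the server holding it."""

    def __init__(self) -> None:
        self._holder: Dict[str, str] = {}

    def register(self, logical_name: str, server_name: str) -> None:
        self._holder[logical_name] = server_name

    def locate(self, logical_name: str) -> str:
        return self._holder[logical_name]


class BrokerServer:
    """A toy stand-in for one broker server in the grid."""

    def __init__(self, name: str, catalog: MetadataCatalog,
                 peers: Dict[str, "BrokerServer"]) -> None:
        self.name = name
        self.catalog = catalog
        self.peers = peers  # shared registry: server name -> server
        self._store: Dict[str, bytes] = {}
        peers[name] = self

    def put(self, logical_name: str, data: bytes) -> None:
        """Store data locally and record its location in the catalog."""
        self._store[logical_name] = data
        self.catalog.register(logical_name, self.name)

    def get(self, logical_name: str) -> bytes:
        """Steps 2-6: ask the catalog, then fetch locally or from a peer."""
        holder = self.catalog.locate(logical_name)
        if holder == self.name:
            return self._store[logical_name]
        return self.peers[holder].get(logical_name)


catalog = MetadataCatalog()
peers: Dict[str, BrokerServer] = {}
first = BrokerServer("server-1", catalog, peers)
second = BrokerServer("server-2", catalog, peers)

second.put("collection/file-7", b"record bytes")
result = first.get("collection/file-7")  # user only talks to server-1
print(result)
```

The user never learns that the bytes came from the second server, which is the "where and how details are hidden" property from the first slide.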
Virtualization
• Logical arrangement of digital records
• Persistent identifier for the record is the logical file name.
• Arrangement hierarchy imposed on the logical file name as collection hierarchy (record group, record series, file, item) associated with Life Cycle Data Requirements Guide attributes with each level of the collection hierarchy.
• Information about all operations performed upon digital record are mapped to the logical file name.
• Logical file name is the link between authenticity information and the record.
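The role of the logical file name can be sketched as follows. The "/" separator, the sample identifiers, and the `OperationLog` class are assumptions for illustration; the slide only states that the hierarchy and the operation history are attached to the logical name.

```python
from typing import Dict, List

LEVELS = ("record group", "record series", "file", "item")


def logical_file_name(record_group: str, record_series: str,
                      file: str, item: str) -> str:
    """Build a logical name whose path segments mirror the hierarchy."""
    return "/".join((record_group, record_series, file, item))


def hierarchy(logical_name: str) -> Dict[str, str]:
    """Recover the collection hierarchy levels from a logical name."""
    return dict(zip(LEVELS, logical_name.split("/")))


class OperationLog:
    """Keeps every operation keyed by logical name, not by location."""

    def __init__(self) -> None:
        self._events: Dict[str, List[str]] = {}

    def record(self, logical_name: str, operation: str) -> None:
        self._events.setdefault(logical_name, []).append(operation)

    def history(self, logical_name: str) -> List[str]:
        return list(self._events.get(logical_name, []))


name = logical_file_name("RG-064", "series-12", "file-0003", "item-0001")
log = OperationLog()
log.record(name, "ingested")
log.record(name, "migrated to new media")
print(name, hierarchy(name), log.history(name))
```

Because the history is keyed by the logical name, moving the bytes to different storage does not break the link between the record and its authenticity information.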
Federation of five independent Data Grids
[Diagram: five sites (NARA I, NARA II, SDSC, GT, UMd), each with its own MCat and SRB; two elements labelled TL]
E-records
• Project PERPOS (Presidential Electronic Records PilOt System)
• E-mails from George Herbert Walker Bush's White House (1989-1993)
• William Underwood from Georgia Tech – Information Technology & Telecommunications Laboratory
– Semantic technologies
• Information extraction
• Named entity task
– Automation of search and 'description' of e-holdings
– metadata
White House Correspondence
March 27, 1990
Dear Mr. Allen
Thank you very much for your letter of March 15, 1990 which stated your concerns and suggestions regarding the Americans with Disabilities Act.
In order to fulfill President Bush's campaign promise of bringing Americans with handicaps into the mainstream of American life, the Bush Administration supports the objectives of the A.D.A.
As you may know, the bill is still in House Committee for consideration and change. You can be sure that your thoughts have been fully noted and are appreciated.
Sincerely,
Doug Wead
Special Assistant to the President for Public Liaison
Ray Allen, President
American Cultural Traditions
P.O. Box 1995
Washington, D.C. 20013
Named Entities Extracted from the Letter
<date>March 27, 1990</date>
<greeting>Dear</greeting><person>Mr. Allen</person>
<p>Thank you very much for your letter of <date>March 15, 1990</date> which stated your concerns and suggestions regarding the Americans with Disabilities Act.</p>
<p>In order to fulfill <name>Bush's</name> campaign promise of bringing Americans with handicaps into the mainstream of American life, the Bush Administration supports the objectives of the A.D.A.</p>
<p>As you may know, the bill is still in <organization>House Committee</organization> for consideration and change. You can be sure that your thoughts have been fully noted and are appreciated.</p>
<formula of respect>Sincerely</formula of respect><person>Doug Wead</person>
<job title>Assistant to the President for Public Liaison</job title>
<person>Ray Allen</person> <job title>President</job title>
<organization>American Cultural Traditions</organization>
<postal address>P.O. Box 1895</postal address>
<location>Washington, D.C.</location><zipcode>20013</zipcode>
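The kind of tagging shown above can be imitated for the two easiest entity types with regular expressions. This is a deliberately simple sketch and not the PERPOS method: the patterns, the `tag_entities` function, and the restriction to dates and ZIP codes are all assumptions made for illustration.

```python
import re

MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")

# Dates written like "March 15, 1990".
DATE = re.compile(rf"\b(?:{MONTHS}) \d{{1,2}}, \d{{4}}\b")
# Five-digit U.S. ZIP codes.
ZIPCODE = re.compile(r"\b\d{5}\b")


def tag_entities(text: str) -> str:
    """Wrap recognized dates and ZIP codes in entity tags."""
    text = DATE.sub(lambda m: f"<date>{m.group(0)}</date>", text)
    text = ZIPCODE.sub(lambda m: f"<zipcode>{m.group(0)}</zipcode>", text)
    return text


sample = ("Thank you very much for your letter of March 15, 1990. "
          "Washington, D.C. 20013")
tagged = tag_entities(sample)
print(tagged)
```

Persons, organizations, and job titles need more than fixed patterns (for example name lists or trained models), which is why they are left out of this sketch.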
Producer - Archive Workflow Network (PAWN)
• transfer of digital objects from the producer (from their current environment) to the archive (understood as in the OAIS model)
• METS standard (Metadata Encoding and Transmission Standard)
– 'push model': the producer prepares the data for transfer
– 'pull model': the archive itself takes the data over
– Data transfer uses a digital signature (PKI)
What has already been done?
• Preparing the system required a huge amount of preliminary work:
• groups of experts from each of 18 government agencies defining how the system should operate ('what?'),
– 'What?' should ERA do - not 'how?'
• detailed models, procedures and requirements (about 850 requirements were defined in total).
– Concept of Operations
• Project documentation is available at: http://www.archives.gov/era/about/documentation.html
• The system is to have an open, modular architecture based on open source for the U.S. federal administration
• Two-stage tender:
– (stage 1) selection of two companies, each working separately on the design (1 year)
– (stage 2) selection of the final winner, Lockheed Martin Corporation, which will build the ERA system (2005.09.05)
Life cycle of the records
[Diagram: a record's life cycle along a Time axis. Stages: Creation, Semi active, Appraisal and selection, ACCESS. Actors: Office, Records manager, Archivist, Archives]