
The HistoryMakers Digital Video Library – Final Report


The HistoryMakers Digital Video Library

Final Report to IMLS

Award: IMLS National Leadership Grant LG-03-03-0048-03
Institution: The HistoryMakers
Final Report Period covered: From October 1, 2003 to December 31, 2005
Project Director: Julieanna Richardson, Executive Director
Telephone: (312) 674-1900
E-mail: [email protected]


PURPOSE OF THE PROJECT

The purpose of the project funded by IMLS grant LG-03-03-0048-03 was the digitization, encoding, transcription, indexing and cataloging of 400 videotaped oral history interviews from The HistoryMakers' collection, using the video indexing technology developed by Carnegie Mellon University’s Informedia Digital Video Library.

BACKGROUND OF THE PROJECT

Oral History and Digital Technology

Until recently, oral history recordings have been considered very cumbersome research sources.1 Typically, since few copies existed, travel was necessary to access them. Furthermore, most collections have only collection-level or minimal interview-level cataloguing, so a researcher would be required to listen to hours of recordings to locate relevant passages. Even when a time-coded transcript existed, these linear recordings needed to be fast-forwarded and rewound to access particular passages. For this reason, the transcript often became the preferred or primary source of access to the interview’s contents.

In 1997, Donald A. Ritchie of the U.S. Senate Historical Office, in tracing the development of oral history theory and practice, noted the diverging opinions on the relative importance of original tapes vs. transcripts and described how the two forms of oral history, the media and the accompanying transcript, occupied almost separate realms:

“Over time, a consensus developed that tape and transcript were both important records of an interview and made for different uses. Transcripts facilitate the writing of books and articles. Easily scanned and photocopied, transcripts have allowed interviews to be cited as background information, paraphrased, quoted, or reproduced in full in a steadily increasing volume of literature. By contrast, audio and video recordings of interviews have been used extensively in museum exhibits, radio broadcasts, and documentary films.”2

The practical advantages of using transcripts alone have often been considered worth the sacrifice of being able to hear or experience nuances of emotion and speech rhythms and, in moving images, of facial expressions and gestures. Yet, the moving images add rich layers of meaning to the printed text, and sometimes, even alter its meaning. Documentaries provide a solely passive learning experience since the filmmaker controls the selection and presentation of clips. The same is generally true of museum exhibits, although, in recent years, some exhibits have tried to provide the visitor more control over the exhibit experience.

In the same 1997 article, Ritchie commented: "Now, new technology promises to reunite sound and print from their divergent paths."3 The past decade has seen this promise kept, as the digital revolution has opened unprecedented opportunities for access to oral histories in their original audiovisual format. Nonlinear media files allow users to move quickly from one point to another in the recordings. Linking the audiovisual content to the transcripts and metadata brings the convenience of the transcript to the original recordings. These changes make possible new ways of exploring the material, as noted in a recent article by historian Dr. Michael Frisch, Professor and Senior Research Scholar at the State University of New York–Buffalo and a member of The HistoryMakers' National Advisory Board:

“Oral history audio and video can now be placed in an environment in which rich annotation, cross-referencing codes, and other descriptive or analytic “meta-data” can be linked to specific passages of audio/video content. By searching or sorting by means of these reference tools, the audio and videotaped materials themselves…can be searched, browsed, accessed, studied, and selected for use at a high level of specificity.”4

1 Frisch, M., ‘Oral history and the digital revolution: toward a post-documentary sensibility’, in The Oral History Reader, 2nd Edition, ed. Robert Perks and Alistair Thomson (London: Routledge, in press)
2 Ritchie, D., ‘Oral history: from sound to print and back again’, OAH Magazine of History, 1997
3 Ibid.

As evidenced by the technology developed by Carnegie Mellon's Informedia Digital Video Library, these tools are particularly valuable for: 1) facilitating access to the recordings of large video oral history collections; 2) allowing a researcher greater facility to explore the collection; and 3) making greater use of video oral history recordings possible and allowing members of the public to conduct their own research, giving them greater control over what they want to see and hear or search and retrieve.

The HistoryMakers

The HistoryMakers, a 501(c)(3) non-profit institution operating as a special collection within the Illinois State library system, is a national African American video oral history archive. The HistoryMakers' purpose is to record the life histories of individual African Americans and to preserve these interviews and make them easily accessible, in order to create a new and more accurate historical record of African American achievement and culture that highlights the accomplishments of individual African Americans and African American-led groups or movements. A further goal is to demonstrate the broad range of African American responses to the historical events and trends of the 20th and 21st centuries. In its six years of existence, The HistoryMakers has grown into the nation’s largest African American video oral history archive, with a collection of 1,300 interviews and a goal of 5,000 by 2011. The HistoryMakers is committed to exposing its collection to the widest possible audience through collaborations with libraries, museums, academic institutions and community organizations; through public programs in various cities; and through the media, including PBS broadcasts of The HistoryMakers' events, home video releases and a website5 rated by Google as one of the top African American websites on the Internet.

At The HistoryMakers, interviews are recorded on Betacam SP videotapes, which are stored in an off-site climate-controlled media vault. Duplicate Beta SP masters and VHS and/or DVD access copies are stored in The HistoryMakers' archive room, where the temperature is maintained at 68-73°F and humidity at 35-45%. Paper records are also stored in the archive room, in half-Hollinger boxes. The interviews average 3 hours in length, with the shortest interview being 1½ hours and the longest being 15 hours.

Informedia Digital Video Library

The work done under this $163,800 Institute of Museum and Library Services (IMLS) grant was made possible by a unique collaboration between The HistoryMakers and the Informedia Digital Video Library6 project at Carnegie Mellon University’s School of Computer Science. Informedia Digital Video Library, under the direction of Dr. Howard Wactlar, has been working since 1994 as part of the Digital Libraries Initiative, with support from the National Science Foundation, the Defense Advanced Research Projects Agency, and the National Aeronautics and Space Administration. Informedia Digital Video Library’s goal is to develop a terabyte digital video library system. Currently, that system includes the following functions: Sphinx automatic speech recognition software; automatic speech alignment with text; dynamic time warping; geo-coded content for map displays and queries; noise reduction filtering; rapid retrieval system; video optical character recognition; automatic indexing and video paragraphing/segmentation; natural language processing; video skimming; face detection and tracking software; incorporation of existing databases, dictionaries and thesauri including a “named faces” database; use of artificial intelligence techniques to create metadata.

4 Frisch, M., ‘Oral history and the digital revolution: toward a post-documentary sensibility’, in The Oral History Reader, 2nd Edition, ed. Robert Perks and Alistair Thomson (in press)
5 http://www.thehistorymakers.com
6 http://www.Informedia.cs.cmu.edu/

Previously, Informedia Digital Video Library devoted much of its work to television news and public affairs video. More recently, it adapted its technology to a cultural/historical subject using the award-winning multimedia CD First Emperor of China.7 By collaborating with The HistoryMakers, Informedia Digital Video Library was able to apply its technology to a large video oral history collection. The goal is to make the collection’s content more accessible to scholars, documentary producers, teachers and students and members of the media as well as the general public.

PROJECT ACTIVITIES

I. EXPLORATION AND START-UP ACTIVITIES

EARLY MEETINGS AND CONSULTATIONS

In October 2003, The HistoryMakers formed a project advisory board to help guide its work on this project. The board was composed of Julieanna Richardson, Executive Director of The HistoryMakers; Howard Wactlar, director of the Informedia Digital Video Library; Bryan Maher, Manager of Systems Engineering at Carnegie Mellon University (“CMU”); Nancy John, University Librarian at the Richard J. Daley Library, University of Illinois at Chicago (“UIC”); Ellen Starkman, UIC Library Systems Coordinator; and members of The HistoryMakers' National Advisory Board and Scholar/Consultant Corps: Walter Hill (National Archives), Darlene Clark Hine (Northwestern University), David Levering Lewis (New York University) and Michael Frisch (State University of New York-Buffalo).

In November 2003, Executive Director Julieanna Richardson and Executive Assistant Joan Flintoft of The HistoryMakers, along with Melissa Keaton of CMU, attended the IMLS-sponsored training seminar in Washington, D.C., where they learned the appropriate methodology for outcome-based evaluation and its applicability to The HistoryMakers’ IMLS-funded digitization project. During the training session, “Inputs”, “Activities”, “Services”, “Outputs” and “Outcomes” were identified. The “Inputs” included: 1) the identification of an archivist; 2) the identification of processing equipment (2-3 encoding stations, 3 decks, 3 processing stations); 3) the donation of Informedia Digital Video Library technology to The HistoryMakers; and 4) the interviewing for 2 production assistants. “Activities” included: 1) the hiring of staff; 2) the purchasing of equipment; 3) the training of The HistoryMakers staff on the use of the Informedia Digital Video Library technology; 4) the contracting with a transcription service; 5) the purchasing of a data server for storage; and 6) the installation of Informedia Digital Video Library’s Segmentor software on 7 computers at The HistoryMakers. “Services” included: 1) the donation of the Informedia Digital Video Library software in order to produce a digital archive searchable by keyword, location, dates and images; and 2) producing a test site. “Outputs” included: 1) 400 transcribed interviews; 2) 400 interviews processed (digitized, encoded, proofread, catalogued, and segmented); 3) the appropriate staff trained; 4) equipment purchased; and 5) technology given to The HistoryMakers by Informedia Digital Video Library. “Outcomes” included the beginning of a test site or sites and the testing of the database by focus groups.

In January 2004, Julieanna Richardson and Edward Williams of The HistoryMakers traveled to CMU, where they were given demonstrations of how Informedia Digital Video Library technology would be applied to The HistoryMakers' content, received software training and discussed with Informedia Digital Video Library staff ways to make the Segmentor application for editing and cataloguing more appropriate for The HistoryMakers' corpus. This was done by adding: 1) a field for manually selecting index terms from a controlled vocabulary; 2) a toolbar allowing the marking of the last entry or index point in the transcript and video; and 3) the ability to record, capture and display video time codes.

7 Wactlar, H. and Chen, C., Enhanced perspectives for historical and cultural documentaries using Informedia technologies, in Proc. JCDL (2002), 338-339


The HistoryMakers held focus groups for potential users of the digital archive and spent significant time in consultation with taxonomists, cataloguers, oral historians, educators and video indexing specialists, including Dr. Gary Marchionini, University of North Carolina; Dr. Corinne Jörgensen, Florida State University; Dr. Abby Goodrum, Syracuse University; Dr. James Turner, University of Montreal, Canada; Sara Shatford Layne, UCLA Library Cataloguing Center; and Nancy John, UIC. This was done in order to gather feedback and expert advice regarding the types of metadata and classification systems for both topical and content-based indexing and retrieval techniques.

SELECTION OF INTERVIEWS FOR DIGITAL VIDEO LIBRARY

The 400 interviews selected [Appendix A] represented a broad range of careers, life experiences and geographical origins. Well known and lesser known HistoryMakers, both male and female, were equally represented. This list, originally compiled in early 2004, was revised during the grant period to include more recent interviews. The goal is to eventually include all 5,000 interviews in The HistoryMakers’ Digital Video Library.

ACQUIRING AND INSTALLING HARDWARE AND SOFTWARE

The HistoryMakers already owned several Sony UVW-1800 Betacam decks and PC workstations. During the course of the grant period, IBM donated a server-class machine, IBM tower model 86475BX, configured with Windows 2000 Server (version 5.0.2195, SP4, Build 2195). The HistoryMakers also purchased a Promise V-trak 15100 RAID 5 hard drive array (2 terabytes of storage) to store video files; 3 additional IBM Pentium 4 2.6 GHz PC workstations; 2 encoding stations (Insignia Pentium 4 2.8 GHz and Intel Pentium 4 2.8 GHz); and ULEAD Video Studio 8 software for video encoding. After Informedia Digital Video Library made the changes to its Segmentor application as discussed at the planning meeting, the Segmentor software was installed on 7 workstations at The HistoryMakers headquarters.

START-UP CHALLENGES

The HistoryMakers faced several start-up challenges. These challenges significantly delayed the start of the project, and after the initial six months The HistoryMakers therefore requested a one-year extension.

One of the challenges faced by The HistoryMakers was determining how its corpus could best be processed with the existing Informedia Digital Video Library system. Significant differences existed between The HistoryMakers’ video oral history interviews and the television news footage used by the Informedia Digital Video Library to develop its software platform. For example, The HistoryMakers' corpus consists primarily of “talking heads”, with the camera focused on the subject from the waist up or with closer shots of the interviewee’s face. Television news footage, while it incorporates “talking heads” of the news anchors, also includes a series of moving shots of places and people. These differences required changes to the Informedia Digital Video Library software and affected what had been described in The HistoryMakers' original IMLS grant application.

In that proposal, 100 interviews were to be transcribed using Informedia Digital Video Library's Sphinx automatic speech recognition (“ASR”) technology. The Sphinx system had shown a word error rate (“WER”) as low as 10% for news text spoken in the lab, 24% on 30-minute evening news broadcasts and higher in less controlled situations.8 It performed best on Caucasian, North American males speaking in a standard news anchor style. The spontaneous speech of The HistoryMakers' African American, predominantly elderly, interview subjects from various regions posed a much greater challenge for the Sphinx technology. Another crucial factor was The HistoryMakers' need for accurate written transcripts. Although ASR-generated transcription with a WER of 30-40% had been shown to permit fairly robust retrieval,9 The HistoryMakers' goal was not only retrieval within the text of the interview, but also retrieval of concepts or subjects that were implied but not specifically spoken. It became clear that an ASR-generated transcript would be insufficient for this goal. Furthermore, The HistoryMakers knew that a high quality, readable transcript could be used separately in other contexts and would become a part of its collection.

8 Christel, M., Speech Recognizer Results, http://www.Informedia.cs.cmu.edu/dli2/talks/Oct30_99/sld026.htm 1999.
9 Hauptmann, A.G. and Wactlar, H.D. Indexing and Search of Multimodal Information, International Conference on Acoustics, Speech and Signal Processing (ICASSP-97), Munich, Germany, April 21-24, 1997.

Another difference between processing television news video and oral history video lay in automatic segmentation technology, in which scene and shot changes are detected to create segments. A typical oral history interview gives no such visual cues, necessitating manual segmentation to create thematic segment divisions. However, Informedia Digital Video Library technologies translated well in other ways to The HistoryMakers' corpus. For example, while ASR was not used to generate transcripts, it was useful in speech alignment processing, as were other Informedia Digital Video Library features allowing improved retrieval and creative visualization. (See Outputs section, below.)
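The word error rate figures cited above can be made concrete with a short calculation. The sketch below uses the standard definition of WER (substitutions, deletions and insertions divided by the number of words in the reference transcript), computed with a word-level edit distance. It is only an illustration: it is not the scoring tool used by the Sphinx project, and the sample sentences are invented.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Invented example: one substituted word and one dropped word in ten.
reference = "my father was a pullman porter for almost thirty years"
hypothesis = "my father was a bowman porter for thirty years"
print(f"WER: {word_error_rate(reference, hypothesis):.0%}")  # prints "WER: 20%"
```

At a WER of 30-40%, roughly one word in three is wrong, which may still support keyword retrieval but would not yield a readable transcript.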

EARLY PROCESSING WORK

Informedia Digital Video Library processed 26 HistoryMaker interviews for a prototype database, installing the Informedia Digital Video Library software, video files and metadata on a laptop computer. Over the course of the grant period, this laptop was shown hundreds of times at conferences and meetings to elicit informal feedback.

To begin its work, The HistoryMakers had to make some decisions about its cataloging practices. However, it found a paucity of information available about the cataloging of oral history interviews on a passage or segment level. In fact, Nancy MacKay, a college librarian from California, had initiated a survey of oral history curators after having found “an amazing lack of information” about oral history cataloguing practices.10

In the summer of 2004, The HistoryMakers hired three part-time graduate students to begin processing interviews. However, when Melissa Keaton of Informedia Digital Video Library reviewed their initial cataloguing and found numerous errors, it became clear that it would be impossible to achieve either the quantity or quality of work desired with this level of staffing. A group of full-time cataloguers would need to be hired to complete the project.

II. PRIMARY PROCESSING ACTIVITIES

HIRING NEW STAFF

In the summer of 2004, The HistoryMakers hired Frederick Adams and Harvey Baker as video/archival technicians. They were responsible for making DVD and VHS copies of each interview, in addition to digitizing the interviews and encoding them as MPEG-1 files. The HistoryMakers contracted with Pittman Enterprises, Simmons-Lathan Media Group and an independent transcriber, Gloria Swanson, to transcribe 300 interviews. Also, five volunteers were recruited to proofread the 100 transcripts that had already been transcribed.

In October 2004, The HistoryMakers hired Cheri Pugh. Pugh had ten years' experience as archivist/historian for WPA Film Library, one of the largest archival film and video collections in the United States. She also had experience as a documentary producer, a degree in history from Northwestern University and knowledge of African American history. She began recruiting and interviewing for a team of six full-time project fellows. She also developed transcription, proofreading and indexing procedures, along with a comprehensive screening test for finalists [Appendix B] that measured candidates’ ability to audit and edit an oral history interview, skill at proofreading, research and abstract writing, and aptitude for content analysis. The team hired had diverse educational and professional experience (including oral history, journalism, library science, African American history and art history), varied interests and avocations, and different geographic and social backgrounds. This diversity combined into a group knowledge bank that was helpful in working with the extremely broad scope of subject matter covered in The HistoryMakers' corpus. Five project fellows were hired between December 2004 and February 2005, and another in April.

10 MacKay, N., Curating Oral Histories Survey Results, http://people.mills.edu/mackay/FinalSurvey%20results.htm


STANDARDS AND PROCEDURES

Procedures manuals

In 2002, The HistoryMakers developed a set of guidelines for formatting, proofreading and editing transcripts, based on standard oral history practices. One of the Project Manager’s first tasks in 2004 was to develop more detailed policies and procedures and to document these in separate manuals for: 1) transcribing interviews (Appendix C); 2) proofreading transcripts (Appendix D); and 3) cataloguing and indexing interviews using Informedia’s Segmentor application (Appendix E). Over the course of the project, the Project Manager and the team of project fellows continued to refine these standards based on their day-to-day implementation of the procedures, and the manuals were updated accordingly.

Encoding

The video/archival technicians were responsible for digitizing each analog 30-minute Betacam SP videotape from the 400 selected interviews (2,400 videotapes), encoding each videotape as a separate MPEG-1 video file and saving the file on the data server. The technicians also created other access copies on VHS videotape or DVD for transcription. They checked on problems reported by project fellows and made changes including audio boosting, color correction or re-encoding. If a problem was found to have originated on the Betacam SP submaster, they ordered the original Betacam SP master tape from the vault to check its condition and to make another Betacam SP submaster, if necessary. Informedia Digital Video Library staff checked the MPEG-1 files and worked with The HistoryMakers to optimize the quality of its digitizing, encoding and indexing process.
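As a rough cross-check of the figures in this report, the tape count, tape length and average interview length agree with one another, and the resulting footage plausibly fits on the 2-terabyte array. The sketch below works through the arithmetic. The MPEG-1 bitrate is an assumption (about 1.5 Mbit/s is a common figure for MPEG-1 video with audio); the report does not state the encoding settings actually used, nor whether the 2 terabytes refers to raw or usable capacity.

```python
# Figures taken from the report.
TAPES = 2400
MINUTES_PER_TAPE = 30
INTERVIEWS = 400
ARRAY_CAPACITY_GB = 2000  # "2 terabytes of storage", in decimal gigabytes

# ASSUMPTION: typical combined MPEG-1 video + audio bitrate, not from the report.
ASSUMED_MPEG1_BITS_PER_SECOND = 1_500_000


def recorded_hours(tapes: int = TAPES, minutes_per_tape: int = MINUTES_PER_TAPE) -> float:
    """Total hours of footage across all tapes."""
    return tapes * minutes_per_tape / 60


def storage_gigabytes(hours: float, bits_per_second: int = ASSUMED_MPEG1_BITS_PER_SECOND) -> float:
    """Estimated size of the encoded files in decimal gigabytes."""
    return hours * 3600 * bits_per_second / 8 / 1e9


hours = recorded_hours()
print(hours)                     # 1200.0 hours of footage
print(hours / INTERVIEWS)        # 3.0 hours per interview, matching the stated average
print(storage_gigabytes(hours))  # 810.0 GB under the assumed bitrate
```

Under that assumed bitrate, the 400 interviews would occupy well under half of the array, leaving room for transcripts, metadata and re-encoded files.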

Transcription

The 100 pre-existing transcripts (done prior to the grant period) were mostly of poor quality, reflecting a lack of subject matter knowledge and difficulty in understanding the speakers, and they required extensive proofreading and editing. During the grant period, The HistoryMakers contracted with three transcription sources, all familiar with African American subject matter, to transcribe the 300 remaining interviews. Simmons-Lathan Media Group of Los Angeles transcribed 40 interviews during 2004, while Pittman Enterprises and an independent transcriber, Gloria Swanson, were employed throughout the project. The Chicago transcribers came on a monthly basis to pick up 10-20 interviews. They then sent copies of written transcripts to The HistoryMakers via e-mail. Transcripts produced during 2004, while generally better than the pre-existing ones, were variable in quality. In January 2005, The HistoryMakers distributed the new Transcribers’ Manual and held meetings to explain the new standards and answer questions. Transcribers were also given direct feedback from the project fellows. This resulted in improved transcript quality, which reduced the subsequent time spent editing the transcripts.

Volunteer Proofreaders

The 100 poorly done transcripts were given to volunteer proofreaders to read. Their responsibilities included auditing (listening to the interview and editing to make sure that the text matched the spoken words); checking and correcting spellings, including those of named entities; and editing the text to conform to style standards. During the first year, 6 volunteers (a small group from UIC library and a few individuals) proofread 30 interviews, but some expressed confusion about the guidelines. The Project Manager addressed their questions and other issues in a Proofreaders’ Manual, which was distributed to volunteers in early 2005. At the same time, The HistoryMakers added 15 volunteers, including a team of eleven employees from Chicago-based Harris Bank and a few individuals. In addition, during the summer of 2005, three interns from the African and African American Studies Department at the University of Illinois at Urbana-Champaign assisted with proofreading. Quality varied greatly amongst the volunteer proofreaders, and the project fellows had to recheck the proofreaders’ work, but this required less time than if the interviews had not been proofread and edited. Proofreaders were given corrected copies of their early work for review so that they could improve the quality of their proofreading.

The HistoryMakers found it difficult to recruit and retain volunteer proofreaders. Often those who expressed initial interest wanted to support the organization and thought it would be interesting to watch interviews, but did not continue after doing one interview. For example, out of approximately 40 people who expressed interest at a major recruitment meeting, only 15 eventually proofread more than one interview. Two professional proofreaders were hired, and ultimately 202 transcripts, approximately 50% of the total number, were proofread by persons other than the project fellows.

Project Fellows

The project fellows’ responsibilities included: 1) preparing transcripts for processing with the Segmentor software; 2) creating tape-level metadata including abstracts; 3) segmenting the encoded video and the accompanying transcripts; 4) auditing, proofreading and editing the transcripts; 5) fact checking and adding information to the transcripts; 6) indexing the encoded segments using primarily LOC subject headings; 7) evaluating the quality of the interview; 8) sending the processed files to CMU for quality control review; 9) making corrections; and 10) sending the corrected files to CMU for post-production processing.

Preparing transcripts

The typewritten transcripts had to be prepared before they could be processed in the Segmentor application. Titles, time codes and other extraneous material were deleted. The transcript was also converted from a Microsoft Word document into a text document. Certain punctuation was changed, because the Segmentor was not able to recognize it. Representations of sounds such as “uh”, “um” etc. were also deleted in order to create a more readable transcript. Project fellows used the Find/Replace tool to correct common spelling or capitalization errors and then divided the transcript at the points where the tapes were changed by the videographer (approximately every 30 minutes), creating a corresponding text file for each MPEG-1 file.
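The preparation steps described above were carried out by hand with a word processor's Find/Replace tool. Purely as an illustration of the same steps, they could be sketched in a few lines of Python. The time code format, the `[TAPE CHANGE]` marker and the sample text are all assumptions made for this sketch; the report does not describe how tape changes or time codes were actually marked in the transcripts.

```python
import re

# ASSUMPTION: time codes look like [00:01:15] or 00:01:15:00.
TIME_CODE = re.compile(r"\[?\d{2}:\d{2}:\d{2}(?::\d{2})?\]?")
# Filler sounds such as "uh" and "um", with any comma or period that follows.
FILLER = re.compile(r"\b(?:uh|um)\b[,.]?\s*", re.IGNORECASE)
# ASSUMPTION: hypothetical marker placed where the videographer changed tapes.
TAPE_MARKER = "[TAPE CHANGE]"


def clean_transcript(text: str, replacements: dict[str, str]) -> str:
    """Delete time codes and fillers, then apply find/replace corrections."""
    text = TIME_CODE.sub("", text)
    text = FILLER.sub("", text)
    for wrong, right in replacements.items():
        text = text.replace(wrong, right)
    # Collapse the runs of spaces left behind by the deletions.
    text = re.sub(r"[ \t]{2,}", " ", text)
    return text.strip()


def split_by_tape(text: str) -> list[str]:
    """Return one transcript chunk per tape, in order."""
    return [part.strip() for part in text.split(TAPE_MARKER) if part.strip()]


raw = ("[00:01:15] Um, I was born in savannah. [TAPE CHANGE] "
       "[00:30:02] We moved um to Chicago.")
cleaned = clean_transcript(raw, {"savannah": "Savannah"})
for number, chunk in enumerate(split_by_tape(cleaned), start=1):
    print(f"tape {number}: {chunk}")
```

Each chunk would then be saved as a plain text file alongside the video file for the same tape. A filler that sits between two commas would leave a stray comma behind, which is one reason a human proofreading pass remains necessary.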

Creating tape-level records

To start the processing of an interview, the project fellow would open an MPEG-1 file and its corresponding transcript text file in Informedia Digital Video Library's Segmentor application (see Figure 1) and create a “Project” with tape-level metadata including the interview date and location, the interviewee’s name (using Library of Congress Name Authority Files), the interviewer’s and videographer’s names, and the length of the interview. The project fellow was responsible for writing an abstract summarizing the contents of each tape.

Figure 1: Basic interface of IDVL Segmentor application (labeled areas: segment titles list; segment transcript window; video viewing and editing controls; video play window; segment title window)


Segmenting videos

Segments were created for the dual purpose of defining annotation intervals and serving as retrieval units. Within each tape-level “Project”, project fellows used the Segmentor’s video editing controls to divide the MPEG-1 video at natural boundaries, creating thematic segments averaging four to six minutes in length. The corresponding portion of the edited transcript was then cut and pasted into the Segment Transcript Window, and the segment was titled. The segment transcript, title and codes defining segment boundaries for each segment in the video were saved under the “Project” metadata in one XML file. (See Appendix F. For all details on Segmentor functions, see the Cataloguing Manual, Appendix E.)
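To give a sense of what one such per-tape XML file might contain, the sketch below writes a “Project” with two titled segments. Every element and attribute name here is hypothetical, as are the interviewee name and segment contents; the actual Segmentor schema is documented in Appendix F and is not reproduced in this section.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Segment:
    title: str
    start_seconds: float  # segment boundary codes, expressed here as seconds
    end_seconds: float
    transcript: str


def build_project_xml(interviewee: str, tape_number: int, segments: list[Segment]) -> str:
    """Serialize tape-level metadata and its segments into one XML string.

    All tag and attribute names are illustrative placeholders, not the
    Segmentor's real schema.
    """
    project = ET.Element("project", {"interviewee": interviewee, "tape": str(tape_number)})
    for seg in segments:
        node = ET.SubElement(
            project,
            "segment",
            {"start": f"{seg.start_seconds:.1f}", "end": f"{seg.end_seconds:.1f}"},
        )
        ET.SubElement(node, "title").text = seg.title
        ET.SubElement(node, "transcript").text = seg.transcript
    return ET.tostring(project, encoding="unicode")


# Placeholder data for illustration only.
segments = [
    Segment("Childhood memories", 0.0, 285.0, "I was born in Savannah [Georgia]."),
    Segment("Moving north", 285.0, 590.0, "We moved to Chicago when I was ten."),
]
print(build_project_xml("Doe, Jane", 1, segments))
```

Keeping the boundaries, title and transcript together in one record per segment is what allows a search hit in the text to jump directly to the matching point in the video.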

Auditing/Proofreading/Editing

Project fellows were required to audit, proofread and edit the transcript as discussed previously, checking for errors in names, geographical locations and dates.

Annotation within transcript

In order to improve the search process for the end user, the project fellows added information, set off by brackets, for name completion (“Martin [Luther King, Jr.]”), locations (“Savannah [Georgia]”) and acronyms (“NCNW [National Council of Negro Women]”). To further improve user access, it was periodically necessary to insert information that may have been stated earlier in the interview, for example “I was the first black student there [at Tulane Law School, New

Project fellows also added other information (e.g. “Hiram Revels [first African American U.S. senator, who served from 1869-1871]” or “Jackie Robinson [baseball player]”). The intention that The HistoryMakers Digital Video Library be used by a broad spectrum of people, including students at all educational levels, raised questions about the amount of detail that needed to be inserted in the transcripts. For example, a notation like “We were so excited when we heard about Brown [v. Board of Education of Topeka, Kansas]” is understandable, but should the cataloguer also explain in the brackets what Brown v. Board is? Or, if an interviewee says “My father was a Pullman porter,” should there be a definition of a Pullman car and an explanation of the work of the Pullman porter or the significance of the Pullman porter in the African American community? Such extensive annotation might be quite useful to aid comprehension for elementary or secondary pupils or others who may have a limited knowledge of the subject matter. For a more knowledgeable researcher, however, it would interrupt the flow of the transcript with information that was not needed. In an example like “Jackie Robinson [baseball player]”, it might be argued that everyone knows Jackie Robinson. Determining what is too much or too little is a very subjective decision and the greatest obstacle to a consistent indexing process. The HistoryMakers, however, decided that its primary goal was to provide access to a large number of interviews, rather than a small number of extensively annotated interviews.
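One practical consequence of setting annotations off in brackets is that a single annotated transcript can, in principle, serve both audiences discussed above: a reading view with the additions hidden and a search view with them folded into the text. The sketch below illustrates that idea only; it is not a description of how the Informedia software handled bracketed text, and the sample sentence is invented.

```python
import re

# A bracketed annotation plus any whitespace immediately before it.
ANNOTATION = re.compile(r"\s*\[[^\[\]]*\]")


def reading_text(annotated: str) -> str:
    """Return the transcript as spoken, with bracketed additions removed."""
    return ANNOTATION.sub("", annotated)


def search_text(annotated: str) -> str:
    """Return the transcript with the additions kept as ordinary words."""
    return annotated.replace("[", "").replace("]", "")


sample = "We marched with Martin [Luther King, Jr.] in Savannah [Georgia]."
print(reading_text(sample))  # We marched with Martin in Savannah.
print(search_text(sample))   # We marched with Martin Luther King, Jr. in Savannah Georgia.
```

A keyword query for “Georgia” or “Luther King” would match the search view even though neither phrase was spoken in the segment.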

Indexing

Indexing processes The HistoryMakers initially discussed the development of a web-based interactive controlled vocabulary system that would operate under the supervision of a volunteer cataloguer from the University of Illinois library, Dolores Jungheim Barber. However, The HistoryMakers lacked the necessary funding, and this work fell outside of what was required under the grant. At first, indexing was done by all project fellows: each project fellow segmented, proofread, audited, edited, wrote topic headings and assigned metadata to each segment of the interview. To improve quality control, the project fellows were later divided into three groups. One group did the segmenting, another proofread, audited/edited and wrote topic headings, and the third assigned the metadata to each segment. The cataloguing project fellows selected indexing terms from a drop-down menu in the Segmentor. This menu was linked to a text file located on The HistoryMakers' server. Initially a flat, strictly alphabetical list was used. This proved too cumbersome, so the list was rearranged into a hierarchical structure, making it easier to locate the appropriate index terms.
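The move from a flat list to a hierarchical one can be illustrated with a short sketch. The file layout (one term per line, two spaces of indentation per level) and the sample terms are assumptions made for illustration; the report does not describe the actual format of the Segmentor's term file.

```python
from typing import Dict, List

# Hypothetical term file: indentation (two spaces per level) marks the hierarchy.
SAMPLE_TERMS = """\
Civil rights
  Civil rights demonstrations
  Civil rights workers
Education
  Education, Higher
"""

def parse_term_tree(text: str, indent: int = 2) -> Dict[str, List[str]]:
    """Map each term to the list of its direct child terms ("" is the root)."""
    children: Dict[str, List[str]] = {"": []}
    stack: List[str] = [""]  # path from the root to the most recent term
    for raw in text.splitlines():
        if not raw.strip():
            continue
        depth = (len(raw) - len(raw.lstrip(" "))) // indent
        term = raw.strip()
        del stack[depth + 1:]          # trim the path back to this term's parent
        children[stack[-1]].append(term)
        children.setdefault(term, [])
        stack.append(term)
    return children

tree = parse_term_tree(SAMPLE_TERMS)
```

A menu built from `tree[""]` would show only the top-level nodes, with narrower terms revealed as each node is opened.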

Choice of indexing schema The HistoryMakers chose to use manual indexing with a controlled vocabulary as an alternative or complement to keyword searches. The basic retrieval unit in Informedia Digital Video Library software is the segment (a “video paragraph” averaging 3-5 minutes), and interviews are indexed at this segment level. The HistoryMakers received various recommendations from indexing experts. No single existing controlled vocabulary met all the needs of this extremely broad collection, but the most appropriate was Lorene Byron Brown’s Subject Headings for African American Materials (“Brown’s”).11 Brown, an associate professor at Clark Atlanta School of Library and Information Studies, had assembled an extensive listing of over 3,000 terms derived from 14 standard sources, including African American history publications and the Library of Congress Subject Headings (LCSH). Brown’s headings were not radically different from LCSH; most either already existed in LCSH or were later adopted by the Library of Congress. Brown’s headings are presented in standard LCSH format and can easily be combined with LCSH. The clear advantage of LCSH was its widespread acceptance. However, only two of the project fellows had LCSH experience. Other advisors recommended using a much simpler controlled vocabulary based on post-coordination. One idea was to avoid both strings and double-concept terms. However, The HistoryMakers determined that this system would not provide the specificity needed to make the subject headings useful as search and browse tools. The HistoryMakers ended up using something in between, very similar to OCLC’s Faceted Application of Subject Terminology (FAST)12, a schema using LCSH vocabulary with a simplified syntax. This system uses topical subdivisions of topical terms but avoids geographical and chronological subdivisions of topical terms. Locations and dates were indexed separately.
Not only was this simpler than creating LCSH strings with locations and time periods, it could also be used with Informedia Digital Video Library's geographical and temporal query and search results visualization features. To facilitate such features, emphasis was placed on indexing by name, date and place whenever this information was available or could be approximated.
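The difference between a full LCSH string and the faceted approach described above can be sketched as follows. This is only an illustration: the place list, the date pattern and the sample heading are assumptions, and neither FAST nor the Segmentor is claimed to work this way internally.

```python
import re
from typing import Dict, List, Set

# Hypothetical place lookup; a real system would consult an authority file.
KNOWN_PLACES: Set[str] = {"Georgia", "Mississippi", "Chicago (Ill.)"}
# A subdivision is treated as chronological if it contains a year or the word "century".
DATE_LIKE = re.compile(r"\d{4}|\bcentury\b")

def split_facets(lcsh_string: str) -> Dict[str, List[str]]:
    """Separate an LCSH-style string into topical, geographic and chronological parts."""
    facets: Dict[str, List[str]] = {"topical": [], "geographic": [], "chronological": []}
    for part in (p.strip() for p in lcsh_string.split("--")):
        if part in KNOWN_PLACES:
            facets["geographic"].append(part)
        elif DATE_LIKE.search(part):
            facets["chronological"].append(part)
        else:
            facets["topical"].append(part)
    return facets

facets = split_facets("African Americans--Civil rights--Mississippi--History--20th century")
```

Here the topical parts would stay together as one subject heading, while the place and the period would go into the separately indexed location and date fields.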

Types of subject headings used LCSH/Brown’s headings (minus chronological and geographical subdivisions) were used wherever possible and make up the great majority of the list. The HistoryMakers initially selected a partial set of ca. 2000 of Brown’s African American subject headings, and additional terms were then added as needed. The list now numbers ca. 5400. (See Appendix G.)

A user study by the Survivors of the Shoah Visual History Foundation found that researchers also turn to oral history for personal reactions, experiences, thoughts and feelings about historical events, and that they wanted terms to help them access such passages. Descriptor suggestions included terms such as “guilt” and “courage”.13 With this in mind, The HistoryMakers included descriptors for emotions, states of mind and personal qualities, whether expressed during the interview or recalled as the interviewee discussed past experiences. Most of these terms were existing LCSH. In some cases, The HistoryMakers found it necessary to create new headings. Like the University of Southern Mississippi’s Digital Civil Rights in Mississippi Archive,14 The HistoryMakers saw that LCSH at times did not provide the specificity needed for its material. For example, both projects found the need to create more specific headings to distinguish between civil rights demonstrations. The HistoryMakers also

11 Brown, L. B. Subject Headings for African-American Materials. Libraries Unlimited, Englewood, NJ, 1995.
12 Dean, R., “FAST: Development of Simplified Headings for Metadata”, presented at Authority Control: Definition and International Experiences conference, Florence, Italy, 2003; http://www.oclc.org/research/projects/fast/
13 Soergel, et al., The many uses of digitized oral history collections: Implications for design. MALACH Technical Report, College of Information Studies, University of Maryland (2002)
14 Graham, S. and Ross, D., “Metadata and Authority Control in the Digital Civil Rights in Mississippi Archive”, Journal of Internet Cataloguing, Vol. 6, Issue 1, Haworth Press, Inc., 2003.

found that certain topics that were not represented in LCSH were regularly discussed in interviews —for example, skin color prejudice among African Americans—and that users might want terms helping them to find such sections. In creating new terms, The HistoryMakers sometimes consulted other existing thesauri including ERIC, HASSET, UNESCO and the Getty TGN, and was guided by the standards in ANSI/NISO Z39.19. Emphasis was placed on finding terms in use in published works.

Unable to find an existing model for fine-level indexing of autobiographical material, The HistoryMakers followed the suggestion of volunteer UIC librarians and created a special category of headings grouped under the node ‘Autobiographical’ as a means to distinguish, for example, between someone talking about his own job as opposed to work or careers in general. Other uses of this node include an interviewee talking about his/her childhood, describing his/her own personality, or expressing personal beliefs and ideas. Also indexed here are certain questions asked in each interview, such as “What are your hopes for the African American community?”

Names For proper names, Library of Congress Name Authorities was used where possible. Adding dates to personal names was permitted. For names with no Library of Congress name authority record (LCNAR), new headings were created based on established formats (in the case of personal names: Last, First, M.I., YYYY- and year of death if applicable). Authority control was maintained by the project fellows, who kept records of the sources used for new names added.
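A minimal sketch of the personal-name format described above follows. It takes the pattern “Last, First, M.I., YYYY-” literally as stated; the function name and the sample name are hypothetical and are not taken from The HistoryMakers' cataloguing manual.

```python
from typing import Optional

def name_heading(last: str, first: str, middle_initial: Optional[str] = None,
                 birth_year: Optional[int] = None,
                 death_year: Optional[int] = None) -> str:
    """Build a heading of the form 'Last, First, M.I., YYYY-' (death year appended if known)."""
    parts = [last, first]
    if middle_initial:
        parts.append(middle_initial.rstrip(".") + ".")
    if birth_year is not None:
        dates = f"{birth_year}-"
        if death_year is not None:
            dates += str(death_year)
        parts.append(dates)
    return ", ".join(parts)

living = name_heading("Doe", "Jane", "Q", 1931)
deceased = name_heading("Doe", "John", None, 1902, 1988)
```

Generating headings from structured fields, rather than typing them freehand, is one way to keep new names consistent with the established format.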

Interview Evaluations It was determined that, in addition to the work required under the grant, each project fellow would complete an evaluation form assessing the interviewer’s technique, the interview’s audio/video quality and its historical relevance. The evaluation form was later streamlined to reduce the additional time this added to processing. Since then, the form has been relied on to evaluate interviewers and to write finding aids. At The HistoryMakers' September 2005 training summit for interviewers and videographers, project fellows gave presentations to the oral history interviewers. The Informedia Digital Video Library Segmentor cataloguing application and the prototype of the client software were demonstrated, and areas needing improvement in the interview process were illustrated by showing relevant segments. Project fellows, using the evaluation forms, discussed how interviewers could improve their technique to assist in the cataloguing process. It was suggested that the interviewers standardize questions asking for locations and dates of events discussed and for the spellings of personal names, technical terms and other hard-to-understand references. Increased attention in this area decreased the time required for cataloguers to research this information. The interview teams and the participating scholars unanimously gave high ratings to this input from the team of project fellows.

Quality Control Assessment at CMU After a group of interviews was catalogued and indexed, The HistoryMakers transferred the interviews’ XML and MPEG-1 files to Informedia Digital Video Library via CMU’s FTP server. Informedia Digital Video Library staff then conducted a quality control check (i.e. audio and video quality, segment boundaries, correct style for titles, data filled in correct fields, spelling, style and informational notes) and noted problems or suggestions on a Quality Control form (See Appendix H.). Project fellows would make the metadata corrections while the Technical Assistants would make changes to video and audio levels.

"Post Production Processing" by CMU’s Informedia Digital Video Library After corrections were made, files were transferred back to CMU to be added to the library. These files were then processed by Informedia Digital Video Library in several different ways. The audio processing system separates the audio track from the MPEG-1 file, decodes the audio and down-samples it to 16 kHz, 16-bit samples, which are then processed by the large-vocabulary speech recognition system. The automatic speech alignment system then links the video recording to the text. Location information from the interviews is combined with data from external gazetteers to create geographical metadata with latitude/longitude for each segment, which can then be displayed in on-screen maps. Informedia Digital Video Library also extracts “named entities” from the text, such as people, locations or organizations, and these are used for search purposes.

The video is processed through Informedia Digital Video Library's Video Optical Character Recognition (VOCR) system. This system recognizes images of text that appear onscreen. While not applicable to the majority of the interview recordings, this feature may prove useful for interviewees’ photos in which signs or documents appear or for The HistoryMakers' edited videos of live events, in which the VOCR will capture all onscreen titles and credits.15 Informedia Digital Video Library's image matching and facial recognition technologies are likewise of limited use for The HistoryMakers' corpus, aside from locating images within the various photos. The data generated in “post processing” by Informedia Digital Video Library is then incorporated manually at The HistoryMakers, resulting in The HistoryMakers Digital Video Library.
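The gazetteer step in this post-processing can be pictured with a small sketch. The in-memory gazetteer, its approximate coordinates and the function name are assumptions for illustration only; the report does not say which gazetteers Informedia used or how the lookup was implemented.

```python
from typing import Dict, List, Tuple

# Hypothetical gazetteer; coordinates are approximate and for illustration only.
GAZETTEER: Dict[str, Tuple[float, float]] = {
    "Chicago, Illinois": (41.88, -87.63),
    "Savannah, Georgia": (32.08, -81.09),
}

def geocode_segment(locations: List[str]) -> List[Dict[str, object]]:
    """Attach latitude/longitude to each indexed location found in the gazetteer."""
    results: List[Dict[str, object]] = []
    for name in locations:
        coords = GAZETTEER.get(name)
        if coords is None:
            continue  # unknown places are skipped rather than guessed
        results.append({"name": name, "lat": coords[0], "lon": coords[1]})
    return results

points = geocode_segment(["Savannah, Georgia", "Unlisted Town, Georgia"])
```

Each resulting point could then be plotted on a map view for the segment in which the location was indexed.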

Tracking of Interview Processing The process of encoding the Betacam SP videotapes, transferring access copies to and from off-site transcribers and proofreaders, assigning interviews to project fellows for segmenting and addressing problems with video or transcripts was tracked on a Microsoft Excel spreadsheet stored on the server at The HistoryMakers. (See Appendix J.) Only interviews that were completely encoded and transcribed were assigned for further processing. In addition, priority was given to interviews that had already been audited/proofread by volunteers. Attempts were also made to balance the length and transcript quality of interviews assigned each week. From April to December 2005, the project fellows’ work was also tracked in more detail, by individual task, on a wall chart that could be seen by everyone.

ORGANIZATION OF WORK AND RATE OF PROCESSING

Individual vs. group processing Throughout the grant period, The HistoryMakers was constantly challenged to develop the best workflow for volume processing by human cataloguers while maintaining quality control. In April 2005, The HistoryMakers decided to change its processing methodology from a “craftsman” model, in which an individual project fellow was responsible for all the work (after transcription and encoding) on a given interview, to an “assembly line” system, in which each interview was processed by a group of project fellows. Initially, the “assembly line” approach had been rejected on the theory that its redundancy would increase processing time and processing errors. Under this approach, only the person auditing would need to listen to the entire interview, but others would have to read the transcript to segment or index. In addition, the project fellow assigned to segment the interview would need to listen to the interview as well in order to set segment boundaries. It was also thought that the “assembly line” system might be more likely to result in the project fellows’ “burning out” from repetitive work. Therefore, The HistoryMakers initially opted for the “craftsman” model. This model was used from December 2004 through March 2005, but the average processing rate was 10 hours of labor per hour of video (See Appendix J.). This rate of processing was clearly not adequate.

Therefore, in April 2005, The HistoryMakers opted to try the “assembly line” approach. The project fellows decided to divide the post-encoding and transcription work into six basic categories. Work was tracked via a wall chart listing interviews ready to be processed (up to ten at a time were added) and divided into columns by task. Project fellows volunteered for specific jobs on specific interviews, crossing off the columns as they completed the work; a final column was crossed off when the files were transferred to CMU for processing. This allowed each project fellow to specialize in the area(s) in which he or she was most skilled, resulting in increased efficiency and less burnout from repetitive tasks. In addition, the “assembly line” approach provided built-in quality control. Within the first month of the new system, processing time decreased from ten to eight hours of labor per hour of video and weekly group output increased from six interviews to eight. By the end of April, 305 interviews had been encoded, 260 transcribed, and 127 proofread; only 110 interviews had been fully segmented, edited, catalogued and indexed. One more project fellow was hired at this time.
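The wall chart described above amounts to a grid of interviews by tasks. The sketch below models it in a few lines; the six task names are an assumption (the report does not list the six categories here), and the interview identifiers are made up.

```python
from typing import Dict, List

# Assumed task columns; the report says there were six categories but does not name them here.
TASKS: List[str] = ["segment", "proofread", "audit", "edit", "topic headings", "metadata"]
FINAL_COLUMN = "sent to CMU"

Board = Dict[str, Dict[str, bool]]

def new_board(interview_ids: List[str]) -> Board:
    """Create a chart with one row per interview and one unchecked column per task."""
    return {iid: {col: False for col in TASKS + [FINAL_COLUMN]} for iid in interview_ids}

def cross_off(board: Board, interview_id: str, task: str) -> None:
    """Mark a task done; the final column requires every other task to be complete."""
    if task == FINAL_COLUMN and not all(board[interview_id][t] for t in TASKS):
        raise ValueError("all tasks must be complete before transfer")
    board[interview_id][task] = True

def ready_for_transfer(board: Board) -> List[str]:
    """List interviews whose tasks are all done but whose files have not yet been sent."""
    return [iid for iid, row in board.items()
            if all(row[t] for t in TASKS) and not row[FINAL_COLUMN]]
```

Because every column is visible for every interview, a skipped step shows up before transfer, which is one way to read the report's remark about built-in quality control.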

15 Wactlar, H., Christel, M., Gong, Y., and Hauptmann, A., “Lessons Learned from Building a Terabyte Digital Video Library”, IEEE Computer 32, 2, pp. 66-73

In June 2005, with 160 interviews processed, it was apparent that The HistoryMakers would not complete 400 interviews by the deadline of September 30, 2005, so a three-month extension from IMLS was requested. Additional attempts were made to further speed up the rate of processing. By September 2005, all 400 interviews had been encoded but 11 interviews were being re-encoded because of problems with the initial encoding process. Within the next three months, 4 more interviews had to be re-encoded, 347 interviews were transcribed, 309 interviews were segmented and catalogued, and the cataloguing time was reduced from 5:56 per hour of video to an average of 2:58 per thirty-minute tape.

Segmentation All of the project fellows were trained in the segmenting process. This involved reading the transcript for content and establishing natural segments based on that content. This function was easier for some fellows than others. The fellows were able to cut the time required for segmenting in half, from an average of 50 minutes per tape (1:40 per hour of video) in April-May to an average of 24 minutes per tape (48 minutes per hour of video) in August-September of 2005. This was done by using the Find/Replace function in Microsoft Word to make mass changes for common errors. The time spent writing abstracts and adding tape-level metadata was also dramatically reduced, from 13 minutes per tape to an average of 6 minutes per tape. In addition, some longer interviews (6 hours) were replaced with shorter interviews (1 ½ hours).

The most time-consuming part of the cataloguing process involved the tasks of auditing/editing and fact-checking. This included checking names, geographic locations and dates as well as adding more detailed information inside brackets (e.g. “Congressman [John] Lewis”). The amount of time spent on audit/editing and fact-checking varied greatly, based on the quality of the transcript and whether it had already been proofread; the number of names of entities, locations etc.; the quantity of information needing verification; the fame or obscurity of the personalities mentioned; and the speed and clarity of the interviewee's speech. Reducing the time spent involved: 1) reducing the notations to those required for comprehension or retrieval; and 2) limiting the cataloguer’s responsibility for checking the accuracy of statements made in interviews. These procedural changes resulted in a decrease from 1:55 per tape (3:50 per hour of video) in the spring of 2005 to 1:04 per tape (2:08 per hour of video) in the autumn of 2005.

During the final two months of processing, the project fellows and tasks were divided into two basic groups: two people preparing the text files and segmenting the videos, and five fellows doing the rest of the work. The number of tapes remaining to be processed in order to finish the 400 interviews was calculated and divided by the number of work days remaining. Each day the number of remaining tapes was announced to the entire group. The Cataloguing Director also maintained a chart on the server showing both individual and group progress. This helped to create team spirit while ensuring that each fellow was held accountable.
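The per-tape and per-hour figures in this section differ by a factor of two, since each tape holds thirty minutes of video. The sketch below makes that conversion explicit and adds up the three task times reported above. Treating those three tasks as the whole of the cataloguing time is an assumption, although the spring sum does match the 2:58 per thirty-minute tape quoted earlier. The daily quota function and its sample numbers are likewise only illustrative.

```python
import math

TAPE_MINUTES = 30  # each tape holds thirty minutes of video, per the report

def per_hour_of_video(minutes_per_tape: float) -> float:
    """Convert labor minutes per tape into labor minutes per hour of video."""
    return minutes_per_tape * 60 / TAPE_MINUTES

def hhmm(minutes: float) -> str:
    """Format a number of minutes as h:mm."""
    total = round(minutes)
    return f"{total // 60}:{total % 60:02d}"

# Per-tape labor minutes reported for spring and for late summer/autumn 2005.
spring = {"segmenting": 50, "abstracts and tape metadata": 13, "audit/edit and fact-check": 115}
autumn = {"segmenting": 24, "abstracts and tape metadata": 6, "audit/edit and fact-check": 64}

spring_total = sum(spring.values())  # labor minutes per tape
autumn_total = sum(autumn.values())

def daily_quota(tapes_remaining: int, work_days_remaining: int) -> int:
    """Tapes the group must finish each day to meet the deadline (rounded up)."""
    return math.ceil(tapes_remaining / work_days_remaining)
```

Under these assumptions, `hhmm(spring_total)` gives 2:58 per tape (5:56 per hour of video) and `hhmm(autumn_total)` gives 1:34 per tape (3:08 per hour of video).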

SEGMENTOR SOFTWARE REFINEMENT

During the early months of processing, software problems and loss of data interrupted the flow of work and caused delays. However, Informedia Digital Video Library's Brian Maher developed new versions to fix the “bugs” and to increase stability. An auto-save feature, a more intuitive user interface and warning messages worked to reduce user error. In Segmentor Version 1.0.9, a validation control was included to automatically check for segments that were unusually short or long, had overlapping time codes or were missing data. These problems had to be corrected before the project fellows could send the processed files to Informedia Digital Video Library for further processing.
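The kind of check performed by the validation control can be sketched as follows. This is not the Segmentor's code: the record layout, the field names and the length thresholds are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class SegmentRecord:
    title: str
    start: float  # seconds from the start of the tape
    end: float

# Hypothetical thresholds; the report gives only a 3-5 minute average segment length.
MIN_SECONDS = 30
MAX_SECONDS = 600

def validate(segments: List[SegmentRecord]) -> List[str]:
    """Return a list of human-readable problems found in a tape's segments."""
    problems: List[str] = []
    ordered = sorted(segments, key=lambda s: s.start)
    for i, seg in enumerate(ordered):
        length = seg.end - seg.start
        if not seg.title.strip():
            problems.append(f"segment {i}: missing title")
        if length < MIN_SECONDS:
            problems.append(f"segment {i}: unusually short ({length:.0f}s)")
        elif length > MAX_SECONDS:
            problems.append(f"segment {i}: unusually long ({length:.0f}s)")
        if i > 0 and seg.start < ordered[i - 1].end:
            problems.append(f"segment {i}: overlaps previous segment")
    return problems
```

An empty result would mean the file is ready to send; any message would have to be resolved first.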

LESSONS

The HistoryMakers and the project fellows learned numerous lessons during the course of the grant period. They are listed below:

Improving the interview process and the questions asked by the interviewer (e.g. asking for the spelling of hard-to-recognize names or geographical locations) would make the cataloguing process much easier.

Paying attention to transcript quality early on also improved the cataloguing process. In addition, subject matter familiarity was a key factor in transcription accuracy.

Dividing the cataloguing work for each interview among a group of cataloguers proved to be a better and quicker method, providing more quality control and resulting in less burnout.

Audit/editing and fact-checking were among the most time-consuming parts of the cataloguing process.

Establishing the balance between quality and quantity was difficult, especially given a limited budget.

The HistoryMakers still faces the challenge of funding its ongoing processing and must explore other means of doing so (e.g. grants, individual giving, an earned income model). In doing so, it must constantly weigh the cost of detailed processing against its benefits to users.

OUTPUTS OF PROJECT ACTIVITIES

The following outputs resulted from IMLS National Leadership Grant LG-03-03-0048-03:

Files for 400 interviews encoded in MPEG-1 format
400 proofread, edited interview transcripts
400 XML files with metadata with time-codes dividing the video into 18,254 segments
400 interview finding aids
Manual for Cataloguers
Manual for Transcribers
Manual for Proofreaders
List of subject headings
Improved version of Informedia Digital Video Library Segmentor cataloguing software (Version 1.09)
Interview evaluations for 400 interviews
Creation of The HistoryMakers Test Digital Video Library Database

TEST DIGITAL VIDEO LIBRARY DATABASE

The creation of The HistoryMakers Test Digital Video Library Database was the single most important result of the grant. The 400 interviews (ca. 1200 hours of video) are divided into 18,254 segments, the database’s basic retrieval units. Full text searching is complemented by segment-level indexing by topics, dates and broad HistoryMaker categories, providing a degree of granularity that will enable users to easily identify and access the exact sections of an interview containing the information that they need.
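To make the idea of segment-level retrieval concrete, the sketch below combines a full-text match with filters on subject heading, date and category. The record fields and sample data are assumptions; the actual database runs on Oracle and its schema is not described in this report.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple

@dataclass
class IndexedSegment:
    interview: str
    transcript: str
    subjects: Set[str] = field(default_factory=set)
    year: Optional[int] = None
    category: Optional[str] = None

def search(segments: List[IndexedSegment], keyword: Optional[str] = None,
           subject: Optional[str] = None, years: Optional[Tuple[int, int]] = None,
           category: Optional[str] = None) -> List[IndexedSegment]:
    """Return segments that satisfy every filter that was supplied."""
    hits = []
    for seg in segments:
        if keyword and keyword.lower() not in seg.transcript.lower():
            continue
        if subject and subject not in seg.subjects:
            continue
        if years and (seg.year is None or not years[0] <= seg.year <= years[1]):
            continue
        if category and seg.category != category:
            continue
        hits.append(seg)
    return hits
```

Because each hit is a single segment rather than a whole interview, a user lands on a few minutes of video instead of several hours.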

Figure 1: Advanced search with options for date, subject headings and HistoryMaker category

Figure 2: Search results grid

Figure 3: Segment play window

In viewing a segment, the user sees the video and the transcript simultaneously. (Figure 3) The audio track is aligned with words in the transcript text.16 Keyword search terms are highlighted in the text, each in a different color, corresponding to color-coded vertical bars on the time bar that show their positions within the segment. When a user moves the marker to one of those points on the time bar, playback will begin at the sentence in which that word is spoken. (The user can also use the horizontal time bar marker to jump to any other point in the video file.)
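The behavior of the time bar can be sketched from word-level alignment data. The input format below (a list of word and start-time pairs) and the sample timings are assumptions; the report does not describe how the aligner's output is stored.

```python
from typing import List, Tuple

# (word, start time in seconds) pairs, as a speech aligner might produce; values are made up.
AlignedWord = Tuple[str, float]

def sentence_start(words: List[AlignedWord], index: int) -> float:
    """Return the start time of the sentence containing words[index]."""
    i = index
    while i > 0 and not words[i - 1][0].endswith((".", "?", "!")):
        i -= 1
    return words[i][1]

def keyword_markers(words: List[AlignedWord], keyword: str,
                    duration: float) -> List[Tuple[float, float]]:
    """For each hit, return (position on the time bar from 0 to 1, playback start time)."""
    markers = []
    for i, (word, start) in enumerate(words):
        if word.strip('.,?!"').lower() == keyword.lower():
            markers.append((start / duration, sentence_start(words, i)))
    return markers
```

The first value places the colored bar on the time line; the second is where playback would begin, so the user hears the whole sentence rather than starting mid-phrase.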

16 Duygulu, P., Wactlar, H., “Associating Video Frames with Text,” 26th Annual International Association for Computing Machinery Special Interest Group in Information Retrieval Conference paper, Toronto, Canada, July 28-August 1, 2003, http://www.Informedia.cs.cmu.edu/documents/ACM-SIGIR-2003.pdf

Figure 4: Various browsing views

Informedia Digital Video Library's multimedia abstractions provide users with flexible views of keyframes, text and metadata. Search results may be viewed in a map, in a timeline or in a graphic which plots the occurrences of search terms. (See Figure 5.) The different views interact with an underlying XML representation of the set of segments, allowing the user to manipulate controls in the interface to see variations of the views. By dragging dynamic query sliders, the user can narrow or expand the geographical or chronological range or the relevance ranking to view a narrower or broader set of search results. (See Figure 6). Informedia Digital Video Library's Mike Christel explains:

“The same metadata that underlines these views can be used to support dynamic query previews. The user can explore the oral history archive through histogram breakdowns of the whole 18,254 segment space to see how the stories map to different geographic breakdowns, across the decades, according to Brown’s subject headings and according to 16 HistoryMakers general categories like ‘Lawmakers’.”17
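A histogram breakdown of this kind is straightforward to compute from segment metadata. The sketch below counts segments per decade and narrows a result set by year range, as a slider might; the sample years are made up and the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Optional

def decade(year: int) -> str:
    """Label a year with its decade, e.g. 1963 -> '1960s'."""
    return f"{year // 10 * 10}s"

def decade_histogram(years: Iterable[Optional[int]]) -> Counter:
    """Count segments per decade, ignoring segments with no indexed date."""
    return Counter(decade(y) for y in years if y is not None)

def within_range(years: Iterable[Optional[int]], low: int, high: int) -> List[int]:
    """Keep only the years inside the slider's current range (inclusive)."""
    return [y for y in years if y is not None and low <= y <= high]

segment_years = [1954, 1957, 1963, None, 1968]
histogram = decade_histogram(segment_years)
```

The same counting could be applied to places, subject headings or HistoryMaker categories to preview how a result set is distributed before any query is run.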

The Informedia Digital Video Library system components run on the Microsoft Windows family of operating systems and are accessible to each other via a Local Area Network (LAN). The system consists of three main elements:

1) Client application used for end-user video retrieval. The end-user application is the Informedia Digital Video Library client ("IDVL client"). The majority of the IDVL client code is written in C#, leveraging the .NET Framework and DirectX technologies. The target workstation must be running either the Oracle 9i or 10g client with the appropriate Oracle Data Provider for .NET (ODP.NET) installed.

2) A database server that houses the searchable metadata. [The database server system in use at The HistoryMakers' headquarters consists of a DELL Precision 410 Workstation, running Oracle 10g Server, and a CI 8-bay SCSI disk enclosure.]

3) A file server for storage of the video content. The file server's role is to store video content and make it accessible to IDVL clients. The IDVL client accesses the video by mounting the file directly via Windows file shares. [The HistoryMakers uses an IBM tower model 86475BX (configured with Windows 2000 Server 5.0.2195, SP4, Build 2195) with a Promise Vtrak 15100 RAID 5 hard drive array (2 terabytes of storage).]

17 Christel, M. et al. Facilitating Access to Large Oral History Archives through Informedia Technologies, paper to be presented at JCDL ’06, Chapel Hill, NC, June 11-15, 2006

In February 2006, the data server was transferred from CMU to The HistoryMakers' headquarters in Chicago, where Informedia Digital Video Library software was installed on fifteen client workstations. The database is now available for use by those visiting The HistoryMakers' Chicago office, but formal testing has not yet started.

OUTCOMES OF PROJECT ACTIVITIES

Working with Informedia Digital Video Library as a partner, The HistoryMakers has produced a collection of digitized video and metadata, retrievable at the segment level, which will be very useful in research and education in African American history and culture. The HistoryMakers has also learned a great deal for further development and deployment of The HistoryMakers Test Digital Video Library. The collaboration with The HistoryMakers fit well into Informedia Digital Video Library's projected broader legacy:

Broader accessibility to the historical video record being created across the globe.

Pervasive societal impact by fostering a better understanding of how events evolve and are correlated over time and geographically.

Increased use of video in domains previously dominated by text or speech. 18

THE HISTORYMAKERS’ TEST DIGITAL VIDEO LIBRARY: TESTING AND EVALUATION

The HistoryMakers has begun testing the client application to assess data retrievability, ease of navigation through search results and the visualization possibilities resulting from the cataloguing work done during the grant period. Staff from The HistoryMakers have subsequently travelled to CMU to work with Informedia Digital Video Library to improve the client application. In the fall of 2006, the database will be tested at several locations including New York Public Library’s Schomburg Center, Emory University, SUNY-Buffalo and Wright State University.

Researchers at the test sites will work with The HistoryMakers and Informedia Digital Video Library to establish plans for more formal testing of the database. The HistoryMakers has created a preliminary survey tool to assist in cross institutional testing and analysis. (See Appendix K.)

The automatic transaction log, which records all user actions, will be used to identify the time spent formulating queries, browsing, watching videos etc., and, of course, the results of those queries. During the early development of the Informedia Digital Video Library, Informedia used automatic transaction logging to make improvements in its software, for example by analyzing queries that produced no results.19 Automatic transaction logging will also be useful to The HistoryMakers as a method that can accompany interviews and surveys, or can substitute for them in the case of students who do not want to spend the time reporting their experience.
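The two analyses mentioned above, time spent per activity and queries that produced no results, can be sketched over a simple list of log records. The record format and the sample entries are assumptions; the actual Informedia transaction log format is not described in this report.

```python
from typing import Dict, List

# Hypothetical log records with timestamps in seconds since the session began.
LOG: List[dict] = [
    {"t": 0.0, "action": "query", "text": "pullman porter", "results": 12},
    {"t": 20.0, "action": "play", "segment": 101},
    {"t": 95.0, "action": "query", "text": "tulane law", "results": 0},
    {"t": 110.0, "action": "end"},
]

def zero_result_queries(log: List[dict]) -> List[str]:
    """Return the text of every query that found nothing."""
    return [e["text"] for e in log if e["action"] == "query" and e["results"] == 0]

def time_per_action(log: List[dict]) -> Dict[str, float]:
    """Total the seconds between each event and the next, grouped by action type."""
    totals: Dict[str, float] = {}
    for current, following in zip(log, log[1:]):
        elapsed = following["t"] - current["t"]
        totals[current["action"]] = totals.get(current["action"], 0.0) + elapsed
    return totals

failed = zero_result_queries(LOG)
durations = time_per_action(LOG)
```

A list of failed queries points to missing index terms or spelling variants, while the time totals show where users spend their sessions without requiring them to fill out a survey.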

GENERAL OUTCOMES

Indexing video oral histories at the passage or segment level presents many challenges, a few of which have been discussed in this report. In this database, indexing professionals, subject experts and oral historians from our test sites will be able to see how terms are applied to different segments, and discuss ways in which this can be improved. The HistoryMakers can work with this group and with interested persons from other institutions and groups to improve our system and learn more about the indexing of video oral history.

The human element that is lost when oral history is accessed only through transcripts--nuances of emotion expressed in tone of voice, facial expressions and gestures, the speech rhythms and rich dialects--will add meaning to the words for all groups of users. All users, maybe especially youth, will find

18 “Informedia II Digital Video Library: Auto Summarization and Visualization Across Multiple Video Documents and Libraries”, http://www.Informedia.cs.cmu.edu/dli2/
19 Wactlar, H., “Informedia Digital Video Library Technology Outreach”, D-Lib Magazine, July/August, 1996

that the availability of audio and video makes the stories more vital and memorable than the printed word. Young people in particular may find that seeing and hearing individual people talk about their own personal experiences of major events and historical time periods makes history more meaningful and more understandable at a human level. The HistoryMakers' collection includes the famous and powerful but also the unsung. This will give users a better understanding not only of well-known people who have made an impact on history but also of a much greater number of individuals who have contributed to society and their local communities. African American youth as well as those of other backgrounds will find inspiration and role models in these stories.

USE BY SCHOLARS

For scholars, easy access to both transcripts and the video recordings of primary source materials is invaluable. The ability to search across hundreds of long interviews divided into tens of thousands of segments, with detailed annotation providing multiple access points and Informedia Digital Video Library's query and visualization features, will enable rich exploration of the collection and greatly facilitate research within The HistoryMakers' oral history collection. It will also allow for further dissemination and analysis of the information it contains. The Notes field may also be used by scholars to supply additional background information about a person or entity referred to in a particular segment, further enhancing the cataloguing process. At first, each of the test sites will be a completely separate entity with its own data set, so notes added would be visible only at that institution. However, it would be possible to update the data sets to include notations from different sources, and at a later date there may be more possibilities for cooperation.

USE IN K-12 EDUCATION

The editing of The HistoryMakers' interviews into segments with natural topical boundaries, indexing at this segment level, and the visualization and sorting options available in the Informedia Digital Video Library will make this an extremely flexible resource for tailoring lessons for use in the classroom. Personal stories about the interviewees’ own childhood and youth experiences will be of interest to K-12 students and will provide a window onto different periods of 20th century American history, as well as knowledge of African American history. Using different subject heading combinations, teachers could select stories about childhood experiences of racism, about interviewees’ memories of the neighborhoods they grew up in, or about different foods eaten in different regions. Certain standard interview questions are also indexed, so that a teacher might choose a selection of interviewees’ thoughts about the future of the black community and ask students to compare those ideas and talk or write about how their own views are similar or different.

Teachers will be able to integrate selections from interviews into lessons to enrich the curriculum and may share lesson plans with other educators, as in projects such as the Kentucky Oral History Commission’s “Civil Rights in Kentucky” online resource. 20 The Notes field may be used in many ways within a local network, since notes can be added by users and searches can be limited to that field. For example, a teacher could recommend certain segments to students for independent research, or a school system could identify segments as appropriate for certain grade levels.

The database will be useful as a resource for learning about different careers. The interviewees are grouped into broad categories such as “Education Makers,” “ScienceMakers,” “CivicMakers,” or “Sports Makers,” and there are also more specific subject headings for different occupations, so that a student could view segments about African Americans in medicine, banking or journalism. Hearing about on-the-job experiences will give young people a greater appreciation of the challenges faced by older African Americans in their work and will also give them an “inside look” into various industries and professions that may help them make decisions about their own careers.

20 Civil Rights Movement in Kentucky Online Digital Media Database http://162.114.3.83/civil_rights_mvt/

USE IN DOCUMENTARIES

For video and audio documentary producers, the ability to search, browse and view the actual recordings is extremely useful. Many things important to a producer are often not apparent in a transcript: emotion, significant pauses, facial expressions and gestures add extra meaning to the material and make certain clips stand out. Other issues, such as how intelligible the speaker’s words are, how loud or soft they are, or whether there is background noise or overlap between speakers, might rule out the use of a certain clip, and it is easier to find this out immediately than after ordering tapes based on a reading of the transcript.

GENEALOGICAL RESEARCH

This database could have valuable applications for family history research. Interviewees talk in detail about their own memories and communities. They supply a wealth of detail about their families, giving names and dates and places of birth for themselves and their relatives, and relating what their parents and older relatives have told them about their origins and family names, sometimes including stories that date back to slavery and Reconstruction and have been passed down through many generations. While these accounts are all anecdotal, they provide many clues that a researcher could follow up. In addition, many interviewees show and identify family photos.

OTHER POSSIBLE FUTURE RESEARCH:

Thesaurus building

The HistoryMakers could also use thesaurus software to develop the list of subject terms used in The HistoryMakers Digital Video Library into an original thesaurus based on The HistoryMakers’ corpus. The HistoryMakers could carry out this work in cooperation with African American subject specialists from libraries and research centers around the country. The resulting thesaurus could be used as a tool by both cataloguers and researchers, and ideally might be incorporated into The HistoryMakers Digital Video Library database, enabling users to access data from thesaurus records directly.

Automatic technologies

The work on this database has also resulted in a wealth of data that may be used in future research and testing in the areas of automatic technologies for cataloguing, including automatic transcription, segmentation, indexing and summary features.

Automatic speech recognition: The HistoryMakers' recordings linked with the manually-transcribed text could be used as “training data” to improve automatic speech recognition systems’ understanding of African American speakers.

Automatic indexing: The HistoryMakers' segments have two sets of indexing: one by humans and one of automatically derived descriptors created by the Informedia Digital Video Library system. Accuracy of search results in the two systems could be compared, and possibly the data could be of use in improving automatic indexing of this type of material.
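One way such a comparison might be set up is sketched below, assuming a set of relevance judgments exists for each test query. The report does not specify an evaluation method; the segment IDs, the result lists and the `precision_recall` helper are all hypothetical and serve only to illustrate the idea of scoring both indexes against the same judgments.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for the results of a single query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical data: segments a reviewer judged relevant to one test query.
relevant = {"seg_012", "seg_047", "seg_101", "seg_230"}

# Hypothetical results for the same query from each set of indexing.
human_index_hits = ["seg_012", "seg_047", "seg_101", "seg_555"]
auto_index_hits = ["seg_012", "seg_230", "seg_777", "seg_888", "seg_999"]

human_scores = precision_recall(human_index_hits, relevant)
auto_scores = precision_recall(auto_index_hits, relevant)

print("human indexing (precision, recall):", human_scores)
print("automatic descriptors (precision, recall):", auto_scores)
```

In practice the scores would be averaged over many queries before any conclusion was drawn about either set of indexing.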

Automatic segmentation: The HistoryMakers' manually segmented interview videos could be used to measure an automatic segmentation system’s handling of the same material, in tests such as those performed by the MALACH project, which measured the agreement between computed and reference boundaries. 21
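A minimal sketch of one generic way to score boundary agreement follows. It is not necessarily the measure used by the MALACH project. It assumes boundaries are expressed as offsets in seconds and that a computed boundary counts as correct when it falls within a fixed tolerance of a manually placed one. The function name, the 5-second tolerance and the sample values are hypothetical.

```python
def boundary_agreement(computed, reference, tolerance=5.0):
    """Score computed segment boundaries against reference boundaries.

    Boundaries are offsets in seconds. A computed boundary matches the
    nearest still-unmatched reference boundary if the two lie within
    `tolerance` seconds of each other. Matching is greedy, so this is an
    approximation rather than an optimal assignment.
    Returns (precision, recall).
    """
    unmatched = sorted(reference)
    matched = 0
    for boundary in sorted(computed):
        nearest = min(unmatched, key=lambda r: abs(r - boundary), default=None)
        if nearest is not None and abs(nearest - boundary) <= tolerance:
            matched += 1
            unmatched.remove(nearest)  # each reference boundary is used once
    precision = matched / len(computed) if computed else 0.0
    recall = matched / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical boundaries for one interview tape, in seconds.
reference = [120.0, 305.0, 480.0, 710.0]        # placed by cataloguers
computed = [118.0, 300.0, 400.0, 715.0, 900.0]  # placed by an automatic system

precision, recall = boundary_agreement(computed, reference)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

A wider tolerance makes the measure more forgiving of boundaries that fall a few seconds early or late, which may suit conversational material where topic changes are gradual.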

CERTIFICATION

In submitting this report, I certify that all of the information is true and correct to the best of my knowledge.

Submitted by:

21 Franz, Ramabhadran, Ward and Picheny, Automated transcription and topic segmentation of large spoken archives. In Proceedings of EUROSPEECH, Geneva, September 2003.

LIST OF APPENDICES: Documents Generated in the Course of the Project

A. List of 400 Interviews in The HistoryMakers Digital Video Library, with short bios, 2005
B. The HistoryMakers Project Fellow Finalist Test Instructions, 2004-2005
C. The HistoryMakers/Informedia Digital Video Library Cataloguing Manual, 2004-2005
D. The HistoryMakers Transcribers’ Manual, 2004-2005
E. The HistoryMakers Proofreaders’ Manual, 2004-2005
F. Example of XML document created with Informedia Digital Video Library’s Segmentor application, with metadata entered by The HistoryMakers cataloguers, 2005
G. The HistoryMakers Digital Video Library Subject Heading List, 2004-2005
H. Informedia Digital Video Library/The HistoryMakers Quality Control form, 2005
I. Project Tracking Chart, 2004-2005
J. Interview Processing Time Tracking Chart, 2005
K. The HistoryMakers Digital Video Library Basic User Survey, 2006

CONFERENCES AND PRESENTATIONS

March 2004, Web-Wise Conference, Chicago, IL: Julieanna Richardson and Edward Williams attended the conference, where The HistoryMakers also presented in the session “The Road Ahead: Future Directions and Funding Considerations.”

September 2004, Mid-Atlantic Regional Archives Conference, Pittsburgh, PA: Julieanna Richardson presented in the session “Nature of Collaborations: Rewards and Pitfalls.”

November 2004, Drake University, Des Moines, Iowa: Julieanna Richardson presented a lecture, “The HistoryMakers - Oral History, Public History and New Technology.”

January 2005, Martin Luther King Day of Service Open House, The HistoryMakers' offices, Chicago, IL: Cheri Pugh gave demonstrations of the prototype client software and the “Segmentor” cataloguing software to groups of visitors throughout the day; this event was promoted in the Chicago Area Archivists listserv and was attended by a number of librarians from the region as well as ca. 200 other interested citizens.

June 2006 (upcoming): Michael Christel of Informedia Digital Video Library will present a paper on Informedia Digital Video Library’s collaboration with The HistoryMakers at JCDL ’06