
How to import diacritics into CONTENTdm from a library catalog using Excel and MarcEdit

Jill Strass
St. Olaf College

Upper Midwest Online CONTENTdm Conference
November 8-9, 2010

This talk was inspired by our struggles to digitize some Nordic Solo Songs as collected by Dan Dressen and bravely cataloged and uploaded by Kathy Blough.

The Challenge

• Shortcut to metadata: obtain MARC records containing diacritics from a library catalog as a tab-delimited file for easy import into CONTENTdm

The Method

• Export our records from the library catalog as a delimited file

• Use the tab-delimited file to generate metadata for CONTENTdm

• Upload as a compound object into CONTENTdm

The Challenge

• Uh oh, we have an export bug that won’t allow us to cleanly export fields with repeating values from the catalog to a delimited file.

The Workaround – Catalog to MarcEdit

• No worries, we’ll use MarcEdit

• Convert the tab-delimited file (.out) from the catalog into an (.mrc) format file using MarcEdit

• Take the (.mrc) file and export using MarcEdit’s tool for tab-delimited files.

• In MarcEdit, we choose which MARC fields we want for our metadata in digital collections.

The Trick to know in MarcEdit for diacritics

• Use the MarcEdit Characterset Translation tool, and while breaking the record, select UTF-8 as the format, so Excel can recognize diacritic characters.

Note that the box for Translate to UTF-8 is checked.

Yippee! If you look real close, you can see diacritics are showing up in the text editor in MarcEdit.
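If you would rather not rely on looking real close, a few lines of Python can confirm that the exported bytes really are UTF-8 and list which diacritics they contain. This is a sketch under assumptions: the function name is made up for the example, and the sample strings stand in for the contents of an exported file.

```python
def find_non_ascii(data: bytes):
    """Decode bytes as UTF-8 and return the distinct non-ASCII characters.

    Raises UnicodeDecodeError if the bytes are not valid UTF-8.
    """
    text = data.decode("utf-8")
    return sorted({ch for ch in text if ord(ch) > 127})


# Sample bytes standing in for a file exported with "Translate to UTF-8" checked.
utf8_bytes = "Sjöberg\tSånger".encode("utf-8")
print(find_non_ascii(utf8_bytes))

# The same kind of text saved in a single-byte encoding is not valid UTF-8.
latin1_bytes = "Sjöberg".encode("latin-1")
try:
    find_non_ascii(latin1_bytes)
    is_utf8 = True
except UnicodeDecodeError:
    is_utf8 = False
print(is_utf8)
```

For a real file, read it in binary mode (`open(path, "rb").read()`) and pass the result to the function.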

Trick for Diacritics in Excel

• Now we have our diacritics within a tab-delimited file, courtesy of MarcEdit.

• There is a trick you’ll need to use when you first open Excel.

When you first open your tab-delimited file from MarcEdit and Excel takes you through its wizard for importing the file, select 65001 Unicode (UTF-8) from the File Origin pull-down menu.

This will allow Excel to “see” the diacritics.
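Why the File Origin setting matters can be shown in a few lines of Python: the same bytes read with the wrong encoding turn each diacritic into two junk characters. This is a sketch, not a description of Excel's internals. The sample row is made up, and the idea that a byte-order mark helps some programs detect UTF-8 is offered as a commonly reported behavior, not something we tested in this workflow.

```python
import codecs

# Sample row standing in for a line of the tab-delimited file.
original = "Grieg\tVåren\tSjöberg"
utf8_bytes = original.encode("utf-8")

# What a reader shows if it assumes Windows-1252 instead of UTF-8.
misread = utf8_bytes.decode("cp1252")
print(misread)

# Reading with the matching encoding restores the text.
correct = utf8_bytes.decode("utf-8")
print(correct)

# "utf-8-sig" prefixes the bytes EF BB BF (a byte-order mark), which some
# programs use as a hint that a file is UTF-8.
with_bom = original.encode("utf-8-sig")
print(with_bom.startswith(codecs.BOM_UTF8))
```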

Generating Metadata from tab-delimited files

• We use a tricked-out spreadsheet that allows us to take a row from a tab-delimited file, copy and paste it into Excel, and then Excel generates a compound object template for easy upload into CONTENTdm.

• We do this to avoid manual data entry as much as possible.

• If you’d like a spreadsheet file and documentation on how to use it contact

• To convert the .xls file to .txt, we select, copy and paste from Excel into Notepad++.

• We do this so we can see exactly what characters are showing up in our text files.

• Note that Notepad++ is so cool, we don’t need any tricks to use it!
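Seeing exactly what characters are in a text file can also be done with a script. The sketch below lists each non-ASCII character with its code point and Unicode name, which shows whether a letter like ö is stored as one precomposed character or as a plain letter followed by a combining mark. The function name and sample word are assumptions for the example.

```python
import unicodedata


def describe_non_ascii(text):
    """Return (character, code point, Unicode name) for each non-ASCII character."""
    report = []
    for ch in text:
        if ord(ch) > 127:
            report.append((ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")))
    return report


composed = "Sjöberg"                                 # ö as a single character
decomposed = unicodedata.normalize("NFD", composed)  # o followed by a combining diaeresis

print(describe_non_ascii(composed))
print(describe_non_ascii(decomposed))
print(len(composed), len(decomposed))
```

Both forms look identical on screen, so a check like this can help when two files that appear the same behave differently on import.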

Uploading into CONTENTdm with Diacritics (CDM 5.3)

From Project Client, select Add Multiple Compound Objects, then select the Map Fields tab.

Click the Encoding button.

If only it were this simple…. For us, we had to select ANSI for this to work, but according to the documentation, UTF-8 as encoding is supposed to work.

We may never know why this is so for us. Please share your experiences.
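If you end up needing the ANSI setting as we did, it can help to check beforehand which characters would not survive it. The sketch below assumes that "ANSI" means Windows code page 1252, which is the usual case on Western European and US Windows systems but is an assumption here, not something the CONTENTdm documentation told us. The function name and sample strings are made up for the example.

```python
import unicodedata


def unencodable_chars(text, codepage="cp1252"):
    """Return the distinct characters in text that the code page cannot represent."""
    bad = []
    for ch in unicodedata.normalize("NFC", text):  # compose letter + combining mark first
        try:
            ch.encode(codepage)
        except UnicodeEncodeError:
            if ch not in bad:
                bad.append(ch)
    return bad


# Nordic letters are covered by code page 1252.
print(unencodable_chars("Sjöberg Våren Før"))

# Some other European letters are not, and would be lost in an ANSI file.
print(unencodable_chars("Dvořák"))

# Bytes as they would be written to an ANSI (cp1252) text file.
ansi_bytes = "Sjöberg".encode("cp1252")
print(ansi_bytes)
```

An empty list means the text can be saved as ANSI without losing anything. A non-empty list names the characters to watch for after upload.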

A Sample of Diacritics on CONTENTdm

And here we are, at journey’s end….

Summary of Diacritics on CONTENTdm

• Export MARC records from your catalog or source for text with diacritics.

• If you need to use MarcEdit in this process, select the UTF‐8 box in the Characterset Translation Tool.

• When first opening a tab‐delimited file in Excel, select 65001 Unicode (UTF‐8) from the File Origin pull‐down menu.

• When uploading to CONTENTdm, experiment with the UTF-8 vs ANSI setting under the Encoding button on the Map Fields tab of Add Multiple Compound Objects.

How to import Diacritics from a Library Catalog into CONTENTdm Using Excel and MarcEdit

Jill Strass

Digital Initiatives and Metadata Librarian

St. Olaf College

[email protected]