
Batch-Load Points Counter (MARCEdit project)

Amelia C. VanGundy
The University of Virginia’s College at Wise

Virginia SirsiDynix Library Users Group Meeting
Nov. 14, 2012


John Cook Wyllie Library
http://library.uvawise.edu/

• Ebook titles in OPAC & Ebook packages on web in finding aids

• Rate of e-book acquisition increased
– netLibrary: 3k titles per year
– EBSCOhost Ebook Academic Collection: 65k titles initial load
– 5-10k titles additional every quarter


Batch Loading Problems

• Existing procedures were difficult to follow
• Procedures were inconsistent
– especially for different vendors

• Didn't take advantage of MARCEdit Tools
• 949 holdings field now includes $a class#
– previously, files loaded with AUTO “call#”


Solution? Wish list?

• Determine quality of MARC records
– OCLC files vs. other vendor files

• Determine editing priorities
– required (001/949), recommended, optional

• Learn to construct Regular Expression Strings
– Batch Editing Tools & Find/Replace

• Streamlined format
– needed both an outline & more detailed info

• Make available on-line/web-page


MARCEdit proficiency

• Beginner

• Advanced Beginner
– Uses MARCEditor Tools window (Add/Delete field, Edit Subfield Data, Sort by...)
– Can apply Regular Expression Strings

• Intermediate
– Uses MARC Tools wizard (Extract Selected Records, MARCSplit, Extract selected records)
– Can construct Regular Expressions

• Expert


Batch-Load Points Counter (BLPC) people.uvawise.edu/acv6d/


Batch-Load Points Counter (BLPC) Webpage & Project link

people.uvawise.edu/acv6d/

1. Introduction
– project concept & desired outcomes

2. Checklist #
– outlines the batch-load procedures & steps
– points counter: “what to do” & “when to stop”

3. Processing Guidelines #
– procedures & how-tos & copy/paste info

4. 949 processing


BLPC Introduction & Outcomes

• Validation
– determine integrity of the file

• Processing
– determine quality of the records

• Statistics
– track vendor pkgs, record counts, 001 prefixes

• Points
– max. points = 150 (2.5 hours)
• STOP & contact vendor (request corrected file)
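The points idea can be sketched in a few lines of Python. This is only an illustration: it assumes one point is roughly one minute of work (150 points = 2.5 hours, as stated above) and that the stop rule applies once the running total exceeds the cap. The step names and point values are hypothetical, not the BLPC checklist's real numbers.

```python
MAX_POINTS = 150  # from the slide: 150 points = 2.5 hours, so about 1 point per minute


def tally(steps):
    """Add up points step by step and stop once the cap is exceeded.

    `steps` is a list of (step_name, points) pairs. The names and values
    used below are hypothetical examples, not the checklist's real numbers.
    Returns (total_points, decision).
    """
    total = 0
    for _name, points in steps:
        total += points
        if total > MAX_POINTS:
            return total, "STOP: contact vendor for a corrected file"
    return total, "OK: continue with the batch load"


good_file = [("validate", 10), ("add 506", 5), ("delete 029/938", 5), ("build 949", 20)]
bad_file = good_file + [("manual 245 $h edits", 60), ("fix broken URLs", 90)]

print(tally(good_file))
print(tally(bad_file))
```

The second file crosses the cap during the last step, which is the "when to stop" signal the checklist is meant to give.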


BLPC CheckList w/Time estimates

• Step 1 & 2: Preparation & validation
– number of records in file
– integrity of file
– valid URL links

• Step 3-4: Review & processing
– quality of records
– lists all processing/edits possible

• Step 5: 949 holdings

Print on one page (2 p. per sheet / front&back)


BLPC Processing Guidelines (Procedures)

• Gives details for CheckList
– Steps 1-2, Steps 3-4, Step 5

• Gives the regular expression strings (copy/paste)
– Finding/Replacing/Deleting
– MARCEditor Tools & MARCEdit Tools

• Always use along with Checklist
– includes information to process every field, BUT
– not every field needs processing

Do not print out


BLPC Step 1: Preparation & Reports

• MARC Validator
– Identify Invalid Records
– Validate Record (copy/paste into text file)

• Material Type Report

• Field Count
– verify vendor count against MARCEditor count (LDR/000)
– count early / count often

• Deduplicate (See Addt’l Instruct.)


Reports/MARC Validator: Identify Invalid Records


Reports/MARC Validator: Validate Records


Reports/Material Type


BLPC Step 2: Verify Field Counts

• Reports/FieldCount for error checking
– first field listed is 000 (corresponds to =LDR)
– last field listed is “numeric”
– 245 count

• Reports/MARCValidator errors
– open text file created in Step 1
– look for specific errors in error file

• Check URL links to make sure they work
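The count check can also be done outside MARCEdit. The Python sketch below is not the Field Count report itself. It assumes the file has been saved in MARCEdit's mnemonic text layout (one field per line starting with "=TAG", records separated by a blank line), and the sample records and function names are made up for illustration.

```python
import re
from collections import Counter

# Tiny hypothetical sample in the mnemonic text layout:
# one field per line, "=TAG" first, records separated by a blank line.
SAMPLE = r"""=LDR  00000nam a2200000Ia 4500
=001  ocn000000001
=245  10$aFirst title$h[electronic resource]
=856  40$uhttp://example.org/1

=LDR  00000nam a2200000Ia 4500
=001  ocn000000002
=245  10$aSecond title
=856  40$uhttp://example.org/2
"""


def field_counts(mrk_text):
    """Count how many times each tag occurs across the whole file."""
    tags = re.findall(r"^=(\w{3})", mrk_text, flags=re.MULTILINE)
    return Counter(tags)


def check_against_vendor(mrk_text, vendor_count):
    """True when both the LDR count and the 245 count equal the vendor's total."""
    counts = field_counts(mrk_text)
    return counts["LDR"] == vendor_count and counts["245"] == vendor_count


print(field_counts(SAMPLE))
print(check_against_vendor(SAMPLE, 2))
```

A mismatch between the LDR count and the vendor's stated count is the cue to look at the validator's error file before doing any editing.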


Reports/Field Count (vendor count = 8556)


Field Count Error & "bad field tag" (vendor count = 694)


Reports/Field Count: Detail (highlight field & right-click)


Review Validate Records report (saved as text file in Step 1.B)


BLPC: Review for processing
Checklist Step 3 workflow

• Check field counts
• Mark-up notes on the Checklist
– Track/count fields that need processing
• Track points for fields that need processing
• Track points for fields that need manual editing
– Each record to fix means extra points
• Rule of thumb: for more than 12 manual edits
– Treat as separate post-load maintenance project


BLPC Checklist Step 3: Review Fields
Examples of required processing

• Examine first record & check field count
– Title control# – 001 (prefer OCLC#)
– If lacking: use info. from 035 or create local 001
• Check field counts / subfield counts
– Title/GMD – 245 $h
– URL – 856 $3 $y $u
• Check Validate Record text file for errors
– “Invalid field format” / “Subfield cannot repeat”
• Check field counts / indicator counts
– Subject – 650 Ind2 = 4/7 or 5/6/8


BLPC Checklist Step 4: Review fields
Examples of optional processing

• Check field count & delete if present
– 029 / 583 / 584 / 938
• Check field data and delete
– Other vendor pkg names (netLibrary/ebrary/myiLibrary/24x7/Ebsco)
• Check field data & ignore/defer
– 300 lacks phrase: (1 electronic resource)


BLPC Checklist with mark-ups


BLPC Processing workflow
Step 3 - Step 4

• Review Field Count
• Review Field data
– Use Find/Sort window and review first/last field
• Add/Delete/Edit field
• Review Field data
– look at field in first record or Find/Sort window
– Mistake? Typo? Use the Edit/SpecialUndo
• Review FieldCount
• Save edited file / SaveAs new filename


MARCEditor / MARCEdit Tools
BLPC Checklist identifies fields to process

• MARCEditor Tools window
– adding/editing/deleting fields
– adding/editing/deleting subfields
• MARCEditor Edit/Find window
– editing/replacing field data
– displays sortable list
• MARCEdit Tools wizard
– for select & extract records
– extract tab-delimited records for Excel


BLPC Processing: Add std. Phrase
506 => Step 3.S

• Check Field Count for presence of 506
• Delete existing 506 field (if present)
• Consult Step 3.S in BLPC Procedures
– Determine that AddField Tool is needed for processing
– Copy Std.phrase from Step 3.S notes
– Paste into AddField Tool window and submit
• Review 506 data in first record
• Check field count
• Save file


MARCEditor Tools: Add std. Phrase
506 => Step 3.S


BLPC Processing: Delete specific fields
650 Ind2 = 5/6/8 (non-LC) => Step 3.V

• Check Field Count for presence of 650 Ind2 = 5/6/8
• Consult Step 3.V in BLPC Procedures
– Optional Review – FindAll(RegEx) instructions
– Determine that Tools/DeleteField tool is needed
– Copy RegEx pattern from Step 3.V
– Paste into Tools/DeleteField window
– Use Regular Expressions radio button option
– Submit using Delete button
• Check Field Count & Indicator count
• Save file
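The effect of this delete step can be previewed with a small Python sketch. It is not the DeleteField tool. It assumes the record is in MARCEdit's mnemonic text layout, where the tag is followed by two spaces and a blank indicator is shown as a backslash. The sample lines are invented for illustration.

```python
import re

# The pattern from the slides, with the two spaces after the tag written out.
NON_LC_650 = re.compile(r"(=650  )(.[568])(\$a)(.+)")

# Hypothetical record lines in mnemonic layout (a blank indicator shows as "\").
record = [
    r"=245  10$aSample title",
    r"=650  \0$aLibraries$xAutomation.",
    r"=650  \6$aBibliotheques$xAutomatisation.",
    r"=650  \7$aCOMPUTERS / General.$2bisacsh",
    r"=650  \8$aBibliotek.",
]

# Keep every line that does NOT match, i.e. drop 650 fields with Ind2 = 5, 6 or 8.
kept = [line for line in record if not NON_LC_650.match(line)]

for line in kept:
    print(line)
```

Counting the 650 lines before and after mirrors the "Check Field Count & Indicator count" step: here five lines go in and three come out.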


MARCEditor: Delete specific fields
650 Ind2 = 5/6/8 (non-LC) => Step 3.V


Regular expressions (RegEx)

• Finding/Editing patterns in strings (letters/numbers)
– Like learning another language

• Parentheses are used to group data
– Forces the computer to "store" data in "chunks"
– Data “chunks” are numbered for recall/retrieval/use
– Helps the programmer "read" the pattern

• Optional functionality, and not necessary

• Some punctuation is "reserved" (has a special meaning)

• BLPC uses consistent format for RegEx patterns


Reading RegEx Patterns
650 Ind2 = 5/6/8 (non-LC)

Pattern: (=650  )(.[568])(\$a)(.+)

(=650  ) look for 650 fields with two blank spaces

(.[568]) look for any Ind1 & listed Ind2 numbers

(\$a) look for subfield $a (used as "anchor chunk")

(.+) any letter/number to the end of the field

Use Edit/FindAll(RegEx) to verify pattern


Interpreting RegEx punctuation

Pattern: (=650  )(.[568])(\$a)(.+)

( ) Parentheses for data “chunks”
. Period for any single character (letter/number/punctuation)
[ ] Square brackets for a list using “OR”
\ Backslash before “reserved” punctuation, esp.: $ \ ( ) [ ]
+ Plus sign for one or more of the same

“Chunks” are stored as: $1$2$3$4
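To see the numbered chunks, the same pattern can be run in Python, whose regular expression syntax agrees with the punctuation listed above. This is a sketch, not MARCEdit's own engine, and the sample field is invented. Python numbers the chunks 1 to 4, matching $1 to $4.

```python
import re

pattern = re.compile(r"(=650  )(.[568])(\$a)(.+)")

# Hypothetical 650 field in mnemonic layout: blank Ind1 (shown as "\"), Ind2 = 6.
line = r"=650  \6$aBibliotheques$xAutomatisation."

m = pattern.match(line)
for number, chunk in enumerate(m.groups(), start=1):
    print(number, repr(chunk))
```

Chunk 1 is the tag plus its two spaces, chunk 2 the two indicators, chunk 3 the "$a" anchor, and chunk 4 everything after it.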


Creating RegEx patterns

• Start with known pattern:
For non-LC Subjects: (=650  )(.[568])(\$a)(.+)

FindAll(RegEx) for “local” Subjects (Ind2 = 4/7)
(=650  )(.[47])(\$a)(.+)

FindAll(RegEx) for “local” Genres (Ind2 = 4/7)
(=655  )(.[47])(\$a)(.+)


Editing with RegEx string pattern
650 BISAC subjects => 690

Start with known pattern: (=650  )(.[568])(\$a)(.+)

• Use Edit/Replace(RegEx): Change 650 to 690
• Identify “BISAC” subjects: Ind2 = 7 & $2 = bisacsh
• Determine which “chunks” change/stay the same

Find(RegEx): (=650  )(.[7])(\$a)(.+)(\$2bisacsh)

Replace(RegEx): (=690  )$2$3$4$5
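The same find/replace can be tried in Python before touching a real file. This is a sketch under two assumptions: the fields are in mnemonic text layout with two spaces after the tag, and the sample lines are invented. Note one difference from the replace string above: Python treats parentheses in a replacement as literal text and recalls chunks with \g<n> rather than $n, so the new tag is written without parentheses here.

```python
import re

find = re.compile(r"(=650  )(.[7])(\$a)(.+)(\$2bisacsh)")

# Chunk 1 (the old tag) is replaced; chunks 2-5 are carried over unchanged.
replace = r"=690  \g<2>\g<3>\g<4>\g<5>"

# Hypothetical sample fields.
lines = [
    r"=650  \7$aCOMPUTERS / General.$2bisacsh",
    r"=650  \7$aLibraries.$2fast",
    r"=650  \0$aLibraries.",
]

edited = [find.sub(replace, line) for line in lines]

for line in edited:
    print(line)
```

Only the first line changes: the other two either lack "$2bisacsh" or have a different second indicator, so the pattern leaves them alone.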


Reading RegEx Patterns
650 BISAC subjects => 690

Pattern: (=650  )(.[7])(\$a)(.+)(\$2bisacsh)

(=650  ) look for 650 fields with two blank spaces
(.[7]) look for any Ind1 & Ind2 = 7
(\$a) look for subfield $a (optional “anchor” text)
(.+) any letter/number to the next “chunk”
(\$2bisacsh) look for subfield & data at end of field

Can be shortened (which makes the pattern look complicated):
Find(RegEx): (=650)(.+\$2bisacsh)
Replace(RegEx): (=690)$2
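A quick Python comparison shows what the shortened form gives up. The sketch assumes mnemonic text layout and uses invented sample fields. As in any Python replacement, the tag is written without parentheses and chunks are recalled with \g<n>. The two forms agree on a typical BISAC heading, but the short form no longer checks Ind2, so it also converts a field whose second indicator is not 7.

```python
import re

long_find = re.compile(r"(=650  )(.[7])(\$a)(.+)(\$2bisacsh)")
short_find = re.compile(r"(=650)(.+\$2bisacsh)")


def long_edit(line):
    """Full pattern: requires Ind2 = 7 and a $a anchor."""
    return long_find.sub(r"=690  \g<2>\g<3>\g<4>\g<5>", line)


def short_edit(line):
    """Shortened pattern: only requires the tag and a trailing $2bisacsh."""
    return short_find.sub(r"=690\g<2>", line)


typical = r"=650  \7$aCOMPUTERS / General.$2bisacsh"
odd = r"=650  \4$aCOMPUTERS / General.$2bisacsh"  # hypothetical miscoded Ind2

print(long_edit(typical) == short_edit(typical))
print(long_edit(odd))
print(short_edit(odd))
```

Whether that looser match is acceptable depends on the file, which is one reason to test with FindAll(RegEx) first.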


MARCEditor: FindAll(RegEx)
Testing the pattern: 650 BISAC subjects


MARCEditor: Replace(RegEx)
650 BISAC subjects => 690


BLPC Step 5: 949 processing
Required processing

Policy: Include Class# in Unicorn Item record

949
$a -- Pull the call# from the 050
$a -- Insert the standard phrase: ' INTERNET'
$v -- Pull the 001/OCLC# as a unique no.
$w $h $t $x $z -- Add standard holdings data

• See Addt'l instruct.
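How the 949 pieces fit together can be sketched in Python. This is an illustration, not the BLPC procedure: it assumes mnemonic text layout, assumes the call number is the 050 subfields joined with spaces, and takes the standard holdings subfields as a caller-supplied string because their real values are not given here. The function name and the "$wW-VALUE" placeholder are hypothetical.

```python
import re


def build_949(record_lines, standard_holdings):
    """Build a 949 line from the record's 050 and 001, or return None if either is missing.

    `standard_holdings` is the ready-made "$w...$h...$t...$x...$z..." string;
    its contents are local policy and are not defined in this sketch.
    """
    call_number = None
    control_number = None
    for line in record_lines:
        if line.startswith("=050  "):
            # Skip the tag, two spaces and two indicators, then join the subfield data.
            subfields = re.findall(r"\$[a-z0-9]([^$]*)", line[8:])
            call_number = " ".join(s.strip() for s in subfields)
        elif line.startswith("=001  "):
            control_number = line[6:].strip()
    if not call_number or not control_number:
        return None
    return f"=949  \\\\$a{call_number} INTERNET$v{control_number}{standard_holdings}"


# Hypothetical record and placeholder holdings data.
record = [
    "=001  ocn000000001",
    r"=050  \4$aZ699.35.M28$bB97 2012",
    "=245  10$aSample title",
]

print(build_949(record, "$wW-VALUE"))
print(build_949(record[:1], "$wW-VALUE"))
```

Returning None when the 050 is missing gives a simple way to list the records that will need a call number added after the load.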


Batch-loading

• MARCEdit with files no larger than 10k records
– MARCEdit/Tool MARCSplit

• MARCEditor/File: Compile File into MARC

• Unicorn batch load rpt uses 001 match point
– 'o' for OCLC# o & 'g' for local vendor key

• Unicorn batch load rpt settings
– create new bibliographic records only

• Date cataloged -- back dated to prev. month
– prevents interference w/scheduled Authority reports
– max. load two files a day
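The arithmetic behind the 10k limit is simple to sketch in Python. This only illustrates how many files a split produces; it is not MARCSplit, and the function name is hypothetical.

```python
def split_records(records, max_per_file=10_000):
    """Cut a list of records into consecutive chunks of at most max_per_file."""
    return [records[i:i + max_per_file] for i in range(0, len(records), max_per_file)]


# Stand-in "records": plain integers in place of real MARC records.
initial_load = list(range(65_000))
chunks = split_records(initial_load)

print(len(chunks))
print([len(chunk) for chunk in chunks])
```

A 65k initial load becomes seven files, which at two loads a day takes four working days.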


Identifying records for Cleanup

Checklist finds problems to correct post-load

• Item maintenance projects
– 949 lacks call#

• Bibliographic record maintenance projects
– 245 lacks $h (if more than 5-12 records)
– URLs lacking

• Record reload/overlay project
– Record already in OPAC (P-N duplicates)


MARCEdit Tools: Select/Extract selected records

Step 3.F: 245 lacks $h
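The selection logic for this cleanup list can be sketched in Python. It is not the Select/Extract tool. It assumes the file is in mnemonic text layout with a blank line between records, and the sample records and function names are invented.

```python
# Hypothetical two-record sample; the second 245 has no $h.
SAMPLE = r"""=LDR  00000nam a2200000Ia 4500
=001  ocn000000001
=245  10$aFirst title$h[electronic resource]

=LDR  00000nam a2200000Ia 4500
=001  ocn000000002
=245  10$aSecond title
"""


def split_into_records(mrk_text):
    """Split mnemonic text into records, each a list of field lines."""
    return [block.splitlines() for block in mrk_text.strip().split("\n\n")]


def lacks_245_h(record_lines):
    """True when the record's 245 has no $h (or the record has no 245 at all)."""
    for line in record_lines:
        if line.startswith("=245  "):
            return "$h" not in line
    return True


to_fix = [rec for rec in split_into_records(SAMPLE) if lacks_245_h(rec)]
ids = [line[6:] for rec in to_fix for line in rec if line.startswith("=001  ")]

print(ids)
```

The resulting list of 001 values is the kind of worklist a post-load maintenance project would start from.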


MARCEdit Tools: Export Tab Delimited records


Help!

• MarcEdit Help
http://people.oregonstate.edu/~reeset/marcedit/html/help.html
– Click thru the Contents menu:
Contents / Using MARCEdit / Using the MARCEditor / Editing Functions / Using Regular Expressions.

• RegularExpressions.info
http://www.regular-expressions.info/

• MARCEDIT-L list
http://metis3.gmu.edu/cgi-bin/wa?A0=MARCEDIT-L

• BATCH list
http://listserv.vt.edu/cgi-bin/wa?A0=batch


Amelia C. VanGundy
The University of Virginia's College at Wise

John Cook Wyllie Library

276-328-0154
acv6d@uvawise.edu

http://people.uvawise.edu/acv6d/

Virginia SirsiDynix Library Users Group Meeting
Nov. 14, 2012


BLPC Project
Presentation revisions

Originally presented Nov. 14, 2012

• Additional Slides:
– BLPC Project web-page
– MARCEditor: FindAll(RegEx)
– MARCEdit Tools: Export Tab Delimited records
– BLPC Project: Presentation revisions
