Editing Records with the MarcEditor
kslibassoc.org/2012conf/handouts/marceditsession_three.pdf
EDITING RECORDS WITH THE MARCEDITOR
Terry Reese
Gray Family Chair for Innovative Library [email protected]
Key Points
  MarcEditor
    What is it?
    What do the properties mean?
    Preview mode? Paging mode?
  Editing Functions
  Field Count
  Task Automation
  Validation
  OAI Harvesting
Editing MARC
  MarcEditor
    Specialized TextPad designed specifically for MARC records.
    Is UTF-8 aware: can be used to generate records in MARC-8 (through mnemonics) or UTF-8 character sets.
MarcEdit Templates
  Templates work much like Microsoft Word templates
    Define a set of default data that will appear on a screen
    Templates exist for all material formats
    Can be customized to suit your needs.
MarcEdit’s Preview Mode
  One of the most confusing features
  Allows MarcEdit’s MarcEditor to address files over the allowed 2 GB Windows page file limit (though practical limits are closer to 300 MB)
  Reads a small snippet of the file into the editor, but edits are done to the entire file.
  Can be turned off.
MarcEdit Paging
  Paging Change Notes
    The preview page functionality is still present, but full page now defaults to the new paging functionality.
    Preview functionality: on load, the application reads the entire file to prep; this is where most of the loading time takes place. After that, pages are addressed directly.
Editing MARC
  MarcEditor supports a number of global editing functions:
    Find/Replace functionality
    Globally Add/Delete MARC fields
    Globally Edit Subfield data
    Conditionally add/remove field data
    Globally Edit Indicator data
    Globally Swap field data
    Record Deduplication
    Record Sorting
    Call Number Generator
    Macros
    Z39.50 Cataloging
Editing MARC – Find/Replace
  Works like a normal Find/Replace in most Textpad utilities.
  Unlike most Textpads, Replace supports UTF-8 (when working with UTF-8 files) and regular expressions.
Editing MARC – Find All
  The Find All function was designed for use with the Paging mode
  Allows users to find any text across all pages
  Generates a jump list that can be used to find individual records for edit
Jump List
  When using the jump list:
    Will jump to the page and record within the set
    Will save (temporarily) any items modified or pages automatically (though to set saved items, you need to actually save the page)
Jump to
  Jump to…record: Allows you to jump to any record
  Jump to…page: Allows you to jump to any page
Editing MARC – Global Add/Delete Field
  Globally add fields to all MARC records
    Allows users to set insertion position.
  Globally delete fields
    Allows global delete
    Allows conditional delete
    Supports Regular Expressions
Editing MARC – Modifying subfield data
  Allows for the modification of variable MARC field subfield data (MARC fields 010 and above)
  Allows for the modification of control field data by position or range of positions
  Allows users to prepend and append data to subfields.
  Allows users to change subfield tagging.
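Editing a control field "by position or range of positions" amounts to overwriting a character range in a fixed-length string. The Python sketch below illustrates only that idea. The function name `set_positions` and the sample 008 value are made up for illustration, and nothing here reflects how MarcEdit implements the feature.

```python
def set_positions(data, start, value):
    """Overwrite len(value) characters of a fixed-length field, starting at
    zero-based position `start`. The overall length is preserved."""
    end = start + len(value)
    if end > len(data):
        raise ValueError("value does not fit inside the field")
    return data[:start] + value + data[end:]


# Hypothetical 40-character 008 value; positions 35-37 hold the language code.
field_008 = "120101s2012" + " " * 4 + "ksu" + " " * 17 + "eng d"

# Change the language code from "eng" to "spa" without shifting other positions.
updated = set_positions(field_008, 35, "spa")
```

Because the replacement has the same length as the range it overwrites, every other position in the field keeps its meaning.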
Editing MARC – Modifying subfield data
  Allows users to insert new subfields and define subfield placement.
  Allows users to move field data from one field to another.
  Supports:
    UTF-8 with UTF-8 files
    Regular Expressions
    Adding new subfields.
Editing MARC – Swapping Fields
  Swap parts of MARC fields or entire MARC fields
  Define field, indicator and subfields to move.
  Can move field data and delete the original field, or clone the field data and move the clone to the new location.
  Can add data to an existing field.
Character Conversions within the MarcEditor
  MarcEditor allows users to convert character data between different character sets.
Sorting Fields
  MarcEdit provides multiple sorting types:
    Control Number: sorts record position within the file
    Title: sorts record position within the file
    Author: sorts record position within the file
    Call Number: sorts record position within the file
    0xx Fields: sorts the 0xx fields within individual records (does *not* change record position within a file)
    All Fields: sorts all fields within individual records (does *not* change record position within a file)
    Custom Sort: sorts all defined fields within individual records (does *not* change record position within a file)
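The two kinds of sort differ in what moves: whole records within the file, or fields within one record. The Python sketch below shows that distinction on mnemonic (.mrk) text. It assumes each field is one line of the form `=TAG`, two spaces, then data. The helper names are invented for illustration and this is not MarcEdit's own sorting code.

```python
def tag_of(field_line):
    """Return the three-character tag of a mnemonic line such as '=245  10$a...'."""
    return field_line[1:4]


def sort_fields(record):
    """Sort the fields inside one record by tag, keeping the leader first.
    The record's position in the file is not affected."""
    lines = record.splitlines()
    leader = [line for line in lines if tag_of(line) == "LDR"]
    rest = [line for line in lines if tag_of(line) != "LDR"]
    # sorted() is stable, so repeated tags keep their original relative order.
    return "\n".join(leader + sorted(rest, key=tag_of))


def sort_records(records, tag="001"):
    """Reorder whole records by the data of one field (default: control number)."""
    def key(record):
        for line in record.splitlines():
            if tag_of(line) == tag:
                return line[6:]  # skip '=', the tag, and the two spaces
        return ""
    return sorted(records, key=key)
```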
Record Deduplication
  MarcEdit provides a simple dedup tool that can:
    Dedup on a defined control field (any field)
    Dedup on a transaction field (or using an additional transaction field)
  Output
    Removes all duplications and saves the duplications to a file
    Prints just unique items within the file (i.e., those without a duplicate pair)
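The first output mode (keep one copy, set the duplicates aside) can be sketched in a few lines of Python. This is an illustration only. It assumes mnemonic (.mrk) text with records separated by blank lines, matches on the 001 by default, and uses invented helper names; MarcEdit's actual matching logic is not documented in these slides.

```python
def split_records(mrk_text):
    """Split mnemonic text into records (assumed to be separated by blank lines)."""
    text = mrk_text.replace("\r\n", "\n").strip()
    return [chunk.strip() for chunk in text.split("\n\n") if chunk.strip()]


def field_value(record, tag):
    """Return the data of the first field with the given tag, or None."""
    prefix = "=" + tag + "  "
    for line in record.splitlines():
        if line.startswith(prefix):
            return line[len(prefix):]
    return None


def dedup(records, tag="001"):
    """Keep the first record seen for each key; collect later ones as duplicates."""
    seen = set()
    unique, duplicates = [], []
    for record in records:
        key = field_value(record, tag)
        if key is not None and key in seen:
            duplicates.append(record)
        else:
            if key is not None:
                seen.add(key)
            unique.append(record)
    return unique, duplicates
```

The `duplicates` list corresponds to what would be saved to a separate file. Records with no value in the match field are kept as unique here, which is a design choice of this sketch.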
Field Counts
  Field Count
    Provides a quick count of fields
    Report of subfields used within a particular field
    Detailed reports of all fields/subfields used within a file set.
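A field count of this kind is easy to approximate outside MarcEdit. The Python sketch below tallies tags and subfield codes in mnemonic (.mrk) text. It assumes one field per line and that a literal dollar sign in the data never appears unescaped. The function names are illustrative and not part of MarcEdit.

```python
import re
from collections import Counter


def count_fields(mrk_text):
    """Count how many times each tag (including LDR) occurs in the text."""
    tags = Counter()
    for line in mrk_text.splitlines():
        match = re.match(r"=(\w{3})  ", line)
        if match:
            tags[match.group(1)] += 1
    return tags


def count_subfields(mrk_text, tag):
    """Count the subfield codes used within every occurrence of one tag."""
    codes = Counter()
    prefix = "=" + tag + "  "
    for line in mrk_text.splitlines():
        if line.startswith(prefix):
            codes.update(re.findall(r"\$([a-z0-9])", line))
    return codes
```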
Material Type Report
  Material Type Report
    Reports number of records by material type
    Breaks down material type by sub-types
    Utilizes the Leader, 008 and GMD to determine format types
In-Line Validation
  MarcValidator-lite
    Can access MarcValidator for quick validation of data elements found in the file set
    Validation can use any defined rules set.
Task Automation Tool
  New to MarcEdit 5.2: Task Automation
    Task automation provides a way for non-programmers to create defined task lists that can then be executed automatically
    The difference between a task and a macro is that MarcEdit tasks essentially function as if the user were calling specific functions within MarcEdit.
    Anything that you can do in the MarcEditor, you can automate as a task.
Task Automation
  Managing Tasks
    Task management works like macro management
    You can:
      Create new tasks
      Clone tasks
      Rename tasks
      Delete tasks
      Edit tasks
Task Automation Demo
  Additional Information:
    YouTube:
      Introduction to task automation: http://www.youtube.com/watch?v=gmqTGfTubU4
      Introduction to new task automation functions: http://www.youtube.com/watch?v=fnorN0MFFN0
OCLC Classify Service
  MarcEdit can leverage OCLC WorldCat to generate call numbers automatically for files
  Fields used:
    001
    010$a$z
    020$a$z
    022$a$z
    024$a$z
    1xx$a
    776$w$z
MarcEdit Regular Expression Support
  When processing regular expressions with MarcEdit, MarcEdit makes entire fields or subfields available for processing
    i.e., when processing a delete field function, all data from =[field number] are part of the field that can be queried.
  MarcEdit’s regular expression by default deals with one field at a time (i.e., regular expressions do not allow you to find data across fields by default)
  MarcEdit’s Regular Expression Support
    Pre-5.x was a custom regular expression engine.
    5.x+ is defined by Microsoft .NET’s Regular Expression object
      This object uses a syntax that looks Perl-like, but has some differences.
MarcEdit Regular Expression Support
  When working with regular expressions with the Replace Function, MarcEdit will remember the last 10 replacements. This should help with trial and error.
  When dealing with Regular Expressions or any global replacements, MarcEdit has a Special Undo function that will undo your last global update.
Microsoft’s Regular Expression language
  Concepts:
    Character escapes
    Anchors
    Character classes
    Grouping
    Quantifiers
    Substitutions
  Let’s open the net_regular_expressions.htm file.
How we use Regular Expressions in MarcEdit
  Your most important parts of the regular expression language are:
    1. Character escapes: \d \r \n \$ \x##
    2. Character classes: [] & [^]
    3. Grouping elements: ()
    4. Anchors: ^ $
    5. Quantifiers: * ? + {#}
    6. Substitutions: $#
Examples
  Looking at example.mrk using the replace function:
    Add a period to the 500 if it is missing
    Add a $h of cartographic resources between the $a and $c.
    Split the 856 into two fields, breaking on the $u.
Example 1
  Add a period to the 500 if it is missing
    Find What: (=500  ..)(.*[^.]$)
    Replace With: $1$2.
  Explanation:
    (=500  ..)
      Searches for the 500 field. We leave two blanks because there are always 2 blank characters after the tag in the mnemonic format. The two periods stand for any character (the indicators). If you want to search for exact indicators, place those values rather than the periods.
    (.*[^.]$)
      Take any characters, and match on a field where the last character in the field isn’t a period.
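For readers who want to try this expression outside MarcEdit, here is a rough Python equivalent. It is a sketch, not MarcEdit's engine: Python's `re` module stands in for the .NET regular expression object, the replacement is written `\g<1>` where MarcEdit uses `$1`, and the line-by-line loop only imitates MarcEdit's default of processing one field at a time. The function name is invented.

```python
import re

# Same expression as the slide: tag, two blanks, two indicator characters,
# then field data whose last character is not a period.
FIND = re.compile(r"(=500  ..)(.*[^.]$)")
REPLACE = r"\g<1>\g<2>."  # MarcEdit / .NET form: $1$2.


def add_missing_period(mrk_text):
    """Apply the replacement to each field (line) independently."""
    return "\n".join(FIND.sub(REPLACE, line) for line in mrk_text.split("\n"))


sample = "\n".join([
    r"=245  10$aTitle",
    r"=500  \\$aFirst note",
    r"=500  \\$aSecond note.",
])
result = add_missing_period(sample)
```

Only the first 500 changes. The second already ends in a period, so `[^.]$` cannot match and the field is left alone.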
Example 2
  Add a $h of cartographic resources between the $a and $c.
    Find What: (=245.{4})(\$a.*)(/\$c.*)
      (=245.{4})
        Match the 245 field with any value in the next 4 characters being valid.
      (\$a.*)
        Select everything within the subfield a
      (/\$c.*)
        Select the / value and the subfield c (and other data)
    Replace With: $1$2$$h[cartographic resource] $3
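A hedged Python rendering of this example follows, with `re` standing in for the .NET engine. It assumes the third group is written `(/\$c.*)`, as in the explanation. In Python's replacement syntax a dollar sign is already literal, so the `$$h` that .NET requires becomes a plain `$h`. The sample title is made up.

```python
import re

FIND = re.compile(r"(=245.{4})(\$a.*)(/\$c.*)")
# MarcEdit / .NET form: $1$2$$h[cartographic resource] $3
REPLACE = r"\g<1>\g<2>$h[cartographic resource] \g<3>"

field = "=245  10$aMap of Kansas /$cby A. Author."
result = FIND.sub(REPLACE, field)
```

Group 2 is greedy, so it backs off just far enough for `/$c` to match, which places the new subfield immediately before the slash.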
Example 3
  Split the 856 into two fields, breaking on the $u.
    Find What: (=856.{4})(\$u.*[^$])(\$u.*)
      (=856.{4})
        Matches the 856 field
      (\$u.*[^$])
        Match $u, but stop at the end of the subfield
      (\$u.*)
        Match remainder of field
    Replace With: $1$2\n=856  41$3
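The split can be reproduced in Python as a quick check of the expression. As before, this is an approximation using Python's `re` rather than MarcEdit's .NET-based engine, and the URLs are placeholders.

```python
import re

FIND = re.compile(r"(=856.{4})(\$u.*[^$])(\$u.*)")
# MarcEdit form: $1$2\n=856  41$3 (the new field gets indicators 41)
REPLACE = r"\g<1>\g<2>\n=856  41\g<3>"

field = "=856  41$uhttp://example.org/a$uhttp://example.org/b"
result = FIND.sub(REPLACE, field)
first, second = result.split("\n")
```

Because `.*` in the second group is greedy, the break falls before the last $u in the field. A field with three $u subfields would therefore need the replacement run twice to end up as three fields.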
Lcase/ucase
  MarcEdit’s regular expression engine includes two extension functions for dealing with case switching of characters: lcase & ucase
  Usage:
    Find What: (=450.{4})(\$a.)(.*)
    Replace With: $1$2lcase($3)
  Example: Find the 500 with all upper case characters and convert the case of all values but the first letter in the sentence to lower case.
Example (Lcase)
  Find the 500 with all upper case characters and convert the case of all values but the first letter in the sentence to lower case.
    Find What: (=500.{4})(\$a.)([A-Z .]*)
    Replace With: $1$2lcase($3)
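`lcase` and `ucase` are MarcEdit extensions, so other regex engines have no direct counterpart. In Python the closest analogue is a replacement function, sketched below under that assumption. The helper name `lowercase_rest` is invented.

```python
import re

FIND = re.compile(r"(=500.{4})(\$a.)([A-Z .]*)")


def lowercase_rest(match):
    """Keep the tag, indicators, '$a' and first letter; lower-case the rest.
    Plays the role of MarcEdit's replacement $1$2lcase($3)."""
    return match.group(1) + match.group(2) + match.group(3).lower()


field = r"=500  \\$aTHIS IS A NOTE."
result = FIND.sub(lowercase_rest, field)
```

The third group only accepts capital letters, spaces and periods, so it stops at the first character outside that set, such as a digit or a comma.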
Multi-Field Replacements
  By default, MarcEdit handles one field at a time when doing regular expressions.
  However, when you need to do evaluations against multiple fields, you can by adding /m to the end of your Find What expression in the Replace Function in the MarcEditor
    This is a special function added to the MarcEdit regular expression engine
Example
Using test.mrk
  The file has multiple 028 fields. The first field has a $a and $b, the second only a $a. Copy the $b to the second 028, but only if they are consecutive.
Multi-Line Example
  The file has multiple 028 fields. The first field has a $a and $b, the second only a $a. Copy the $b to the second 028, but only if they are consecutive.
    Find What: (=028.{4}\$a[^\$]+)(\$b[^\$]+)(\r?\n)(=028.{4}\$a[^\$\r\n]+)(\r?\n)/m
    Replace With: $1$2$3$4$2$3
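The effect of this expression can be checked in Python by running it against the whole text at once, which is roughly what the /m suffix asks MarcEdit to do. The /m itself is MarcEdit-specific and is dropped here. Python's `re` again stands in for the .NET engine, and the sample values are made up.

```python
import re

# Same expression as above, without MarcEdit's trailing /m.
FIND = re.compile(
    r"(=028.{4}\$a[^\$]+)(\$b[^\$]+)(\r?\n)(=028.{4}\$a[^\$\r\n]+)(\r?\n)"
)
# MarcEdit form: $1$2$3$4$2$3
REPLACE = r"\g<1>\g<2>\g<3>\g<4>\g<2>\g<3>"

consecutive = "=028  02$a123$bLabel\n=028  02$a456\n=245  10$aTitle\n"
separated = "=028  02$a123$bLabel\n=245  10$aTitle\n=028  02$a456\n"

copied = FIND.sub(REPLACE, consecutive)
untouched = FIND.sub(REPLACE, separated)
```

The captured line break between the two 028 fields is what enforces "only if they are consecutive": when another field sits in between, the fourth group cannot match and the text comes back unchanged.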