taking your customers to the cleaners: historical patron data cleanup and routine purge preparation
DESCRIPTION
Detailing how we did a major patron data cleanup
Presented at ELUNA 2009
TRANSCRIPT
![Page 1: Taking Your Customers to the Cleaners: Historical Patron Data Cleanup and Routine Purge Preparation](https://reader035.vdocument.in/reader035/viewer/2022081518/54c5d18c4a79590d2f8b45ad/html5/thumbnails/1.jpg)
Taking Your Customers to the Cleaners:
Historical Patron Data Cleanup and Routine Purge Preparation
Roy Zimmer
Western Michigan University
![Page 4: Taking Your Customers to the Cleaners: Historical Patron Data Cleanup and Routine Purge Preparation](https://reader035.vdocument.in/reader035/viewer/2022081518/54c5d18c4a79590d2f8b45ad/html5/thumbnails/4.jpg)
About 5 or 6 years ago…
No more SSN: switch to using WIN
Banner
New campus ID cards
WIN is our Western Identification Number
![Page 6: Taking Your Customers to the Cleaners: Historical Patron Data Cleanup and Routine Purge Preparation](https://reader035.vdocument.in/reader035/viewer/2022081518/54c5d18c4a79590d2f8b45ad/html5/thumbnails/6.jpg)
A few less years ago…
Rewrote the patron update process to use Banner
Started thinking about not being SSN-based
![Page 7: Taking Your Customers to the Cleaners: Historical Patron Data Cleanup and Routine Purge Preparation](https://reader035.vdocument.in/reader035/viewer/2022081518/54c5d18c4a79590d2f8b45ad/html5/thumbnails/7.jpg)
2007-2008
The WIN had become available in the data feeds for our patron update.
Needed to change Institution ID
interim step: arbitrary 14-digits -> WIN
final step: WIN -> BroncoNetID
Patron update was switched from being SSN-based to WIN-based.
BroncoNetID is our single sign-on ID
![Page 9: Taking Your Customers to the Cleaners: Historical Patron Data Cleanup and Routine Purge Preparation](https://reader035.vdocument.in/reader035/viewer/2022081518/54c5d18c4a79590d2f8b45ad/html5/thumbnails/9.jpg)
Summer 2008 – What we started with
Have data for about 74,000 patrons.
About 183,000 barcodes (less than half are active!).
Several thousand duplicate records,
one with SSN, one with WIN (in the SSAN field)
The older duplicate record typically had charges, amounts owed, etc.
![Page 10: Taking Your Customers to the Cleaners: Historical Patron Data Cleanup and Routine Purge Preparation](https://reader035.vdocument.in/reader035/viewer/2022081518/54c5d18c4a79590d2f8b45ad/html5/thumbnails/10.jpg)
2008: August – October
Most of my time was spent on the cleanup…
Dali
![Page 11: Taking Your Customers to the Cleaners: Historical Patron Data Cleanup and Routine Purge Preparation](https://reader035.vdocument.in/reader035/viewer/2022081518/54c5d18c4a79590d2f8b45ad/html5/thumbnails/11.jpg)
Patron duplicate detector – LB4020
foreign students
various errors
Sample follows…
August
![Page 12: Taking Your Customers to the Cleaners: Historical Patron Data Cleanup and Routine Purge Preparation](https://reader035.vdocument.in/reader035/viewer/2022081518/54c5d18c4a79590d2f8b45ad/html5/thumbnails/12.jpg)
(WINs & SSNs above are not real)
Sample output used one day
![Page 13: Taking Your Customers to the Cleaners: Historical Patron Data Cleanup and Routine Purge Preparation](https://reader035.vdocument.in/reader035/viewer/2022081518/54c5d18c4a79590d2f8b45ad/html5/thumbnails/13.jpg)
Our first run came up with 3489 duplicate patron records.
![Page 14: Taking Your Customers to the Cleaners: Historical Patron Data Cleanup and Routine Purge Preparation](https://reader035.vdocument.in/reader035/viewer/2022081518/54c5d18c4a79590d2f8b45ad/html5/thumbnails/14.jpg)
We created a program that used the LB4020 report as input to identify patron records that we wanted to alter – call it LB4020fix.
These records needed to be extracted from Voyager for modification and re-import.
Modify me with LB4020fix
![Page 16: Taking Your Customers to the Cleaners: Historical Patron Data Cleanup and Routine Purge Preparation](https://reader035.vdocument.in/reader035/viewer/2022081518/54c5d18c4a79590d2f8b45ad/html5/thumbnails/16.jpg)
Voyager has a patron extract utility, but it doesn’t extract all relevant data for a patron. We’d started using our own, patronsif.pl, years ago.
Voyager extract (Pptrnextr):
Up to 3 patron-barcode + group combinations
Similarly limited number of addresses
WMU extract (patronsif.pl):
Unlimited patron-barcode + group combinations
Unlimited number of addresses
![Page 17: Taking Your Customers to the Cleaners: Historical Patron Data Cleanup and Routine Purge Preparation](https://reader035.vdocument.in/reader035/viewer/2022081518/54c5d18c4a79590d2f8b45ad/html5/thumbnails/17.jpg)
For the patron cleanup we incorporated patronsif.pl into LB4020fix.
Patron notes field problem:
CR+LF stored if user pressed the RETURN key
creates unwanted extra lines within a record
drop_crlf utility replaces “CR+LF” with “space+space”
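The notes-field fix can be illustrated with a minimal sketch, written here in Python rather than the Perl used by the actual utility. The sample extract text is made up for illustration, and the sketch assumes that record terminators are bare LF characters, so only CR+LF pairs inside a record are touched.

```python
def drop_crlf(text: str) -> str:
    """Replace each CR+LF pair with two spaces; bare LF terminators are left alone."""
    return text.replace("\r\n", "  ")


# Hypothetical extract: the first record has a CR+LF embedded in its notes field.
extract = "PATRON1 note line one\r\nnote line two\nPATRON2 no embedded breaks\n"
cleaned = drop_crlf(extract)

# The embedded break no longer splits the first record across two lines.
print(cleaned.splitlines())
```

Because two characters are swapped for two characters, the length of each record is unchanged, which is presumably why the replacement is two spaces rather than one.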
![Page 22: Taking Your Customers to the Cleaners: Historical Patron Data Cleanup and Routine Purge Preparation](https://reader035.vdocument.in/reader035/viewer/2022081518/54c5d18c4a79590d2f8b45ad/html5/thumbnails/22.jpg)
The heart of the cleanup process
LB4020fix reads the duplicate report (LB4020) and extracts patron SIF format data for the duplicate records.
SIF-A: new WIN-based records
have current BroncoNetID
change expiredate to 1981.01.01
SIF-B: old SSN-based records
change InstitutionID to current BroncoNetID
SIF-C: new WIN-based records
have the current update, expire, and purge dates and BroncoNetID
Step 1 (SIF-A): update, key on SSN; purge on expiredate 1982.01.01 [remove new records]
Step 2 (SIF-B): update, key on SSN [prep old records to be “new”]
Step 3 (SIF-C): update, key on InstID [unify old records with new data]
This clean-up process, with variations, was repeated many times.
Details omitted here for the sake of brevity (and sanity).
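As a rough illustration of the three extract files described above, the following Python sketch builds SIF-A, SIF-B, and SIF-C from pairs of duplicate records. It is not the lb4020fix.pl code: the dictionary field names (`ssan`, `institution_id`, `expire_date`) and the sample values are hypothetical, and real patron SIF records are flat text rather than dictionaries.

```python
from copy import deepcopy

PURGE_MARKER_DATE = "1981.01.01"  # expire date given on the slide for SIF-A


def build_sif_sets(pairs):
    """pairs: iterable of (old_ssn_record, new_win_record) dictionaries.

    Returns three lists that mirror SIF-A, SIF-B, and SIF-C.
    """
    sif_a, sif_b, sif_c = [], [], []
    for old, new in pairs:
        # SIF-A: the new WIN-based record, back-dated so a purge will remove it.
        a = deepcopy(new)
        a["expire_date"] = PURGE_MARKER_DATE
        sif_a.append(a)

        # SIF-B: the old SSN-based record, given the current BroncoNetID.
        b = deepcopy(old)
        b["institution_id"] = new["institution_id"]
        sif_b.append(b)

        # SIF-C: the new record's current data, unchanged, for the final update.
        sif_c.append(deepcopy(new))
    return sif_a, sif_b, sif_c


# Made-up placeholder values, not real identifiers.
old = {"ssan": "SSN-PLACEHOLDER", "institution_id": "12345678901234", "expire_date": "2008.09.08"}
new = {"ssan": "WIN-PLACEHOLDER", "institution_id": "netid01", "expire_date": "2009.09.08"}

sif_a, sif_b, sif_c = build_sif_sets([(old, new)])
print(sif_a[0]["expire_date"], sif_b[0]["institution_id"], sif_c[0]["expire_date"])
```

The order of loading matters in this reading of the slide: once the old record carries the BroncoNetID from SIF-B, an update keyed on Institution ID can land the SIF-C data on the old record, which keeps its charges and amounts owed.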
![Page 24: Taking Your Customers to the Cleaners: Historical Patron Data Cleanup and Routine Purge Preparation](https://reader035.vdocument.in/reader035/viewer/2022081518/54c5d18c4a79590d2f8b45ad/html5/thumbnails/24.jpg)
Several things went awry along the way.
Not all records could be matched up with a WIN or SSN (as reported by LB4020), so those had to be handled by assigning temporary SSNs, WINs, and/or Institution IDs.
At another point, the interim records used in the process weren’t deleted during a purge. Those had to be detected, reassigned an older expiration date (1971.01.01), and carefully purged before proceeding.
![Page 25: Taking Your Customers to the Cleaners: Historical Patron Data Cleanup and Routine Purge Preparation](https://reader035.vdocument.in/reader035/viewer/2022081518/54c5d18c4a79590d2f8b45ad/html5/thumbnails/25.jpg)
We now had 1081 duplicate patron records.
![Page 29: Taking Your Customers to the Cleaners: Historical Patron Data Cleanup and Routine Purge Preparation](https://reader035.vdocument.in/reader035/viewer/2022081518/54c5d18c4a79590d2f8b45ad/html5/thumbnails/29.jpg)
We added the expiration date to the duplicate detector, LB4020.
Now we could see that all the SSN-based records were expired, or about to be.
At this time we discovered that new WIN-based records were coming in as duplicates to SSN-based records that were typically set to expire 2008.09.08.
This had to change!
And the semester was about to start…
![Page 32: Taking Your Customers to the Cleaners: Historical Patron Data Cleanup and Routine Purge Preparation](https://reader035.vdocument.in/reader035/viewer/2022081518/54c5d18c4a79590d2f8b45ad/html5/thumbnails/32.jpg)
Early September…
Yes, we did avert disaster. But we had more problems.
The duplicate detection report, which had grown to 60 pages, was now down to 1.
The next day it had grown to 3 pages.
Some records did not have all fields populated in the LB4020 duplicate report, which caused problems.
We also had to fix duplicate records where the SSAN field was null.
![Page 34: Taking Your Customers to the Cleaners: Historical Patron Data Cleanup and Routine Purge Preparation](https://reader035.vdocument.in/reader035/viewer/2022081518/54c5d18c4a79590d2f8b45ad/html5/thumbnails/34.jpg)
Mid September…
We removed several hundred obsolete records that had neither WIN nor SSN.
Discovered records that had no Institution ID – yet another problem.
We are now down to 1 SSN-based record.
This person’s assigned WIN was the same as the SSN. Not supposed to happen!
Identified 15 more such instances and submitted them to I.T. for correction.
![Page 35: Taking Your Customers to the Cleaners: Historical Patron Data Cleanup and Routine Purge Preparation](https://reader035.vdocument.in/reader035/viewer/2022081518/54c5d18c4a79590d2f8b45ad/html5/thumbnails/35.jpg)
Found some more SSN-based records – don’t know why they still existed – and converted them to being WIN-based.
October…
Flipped the “switch” so that we no longer get SSNs for our patron update.
![Page 36: Taking Your Customers to the Cleaners: Historical Patron Data Cleanup and Routine Purge Preparation](https://reader035.vdocument.in/reader035/viewer/2022081518/54c5d18c4a79590d2f8b45ad/html5/thumbnails/36.jpg)
Legacy data
Still had records from our NOTIS era – pre-Summer 1998
Purged them if they:
did not have life-time borrowing privileges
did not have an SSN recorded
did have an Institution ID
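The purge conditions for the NOTIS-era records can be read as a single filter in which all conditions must hold. The Python sketch below shows that reading; the field names and the exact cutoff date are assumptions, since the slide only says "pre Summer 1998".

```python
from datetime import date

# Assumption: a stand-in cutoff for "pre Summer 1998"; the slide gives no exact date.
NOTIS_CUTOFF = date(1998, 6, 1)


def is_legacy_purge_candidate(rec: dict) -> bool:
    """True when a record meets all of the purge conditions listed on the slide."""
    return (
        rec["create_date"] < NOTIS_CUTOFF      # NOTIS-era record
        and not rec["lifetime_privileges"]     # no life-time borrowing privileges
        and not rec["ssn"]                     # no SSN recorded
        and bool(rec["institution_id"])        # does have an Institution ID
    )


# Hypothetical sample records.
records = [
    {"id": 1, "create_date": date(1995, 3, 1), "lifetime_privileges": False, "ssn": "", "institution_id": "A1"},
    {"id": 2, "create_date": date(1995, 3, 1), "lifetime_privileges": True, "ssn": "", "institution_id": "A2"},
    {"id": 3, "create_date": date(2001, 9, 1), "lifetime_privileges": False, "ssn": "", "institution_id": "A3"},
]
to_purge = [r["id"] for r in records if is_legacy_purge_candidate(r)]
print(to_purge)
```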
![Page 38: Taking Your Customers to the Cleaners: Historical Patron Data Cleanup and Routine Purge Preparation](https://reader035.vdocument.in/reader035/viewer/2022081518/54c5d18c4a79590d2f8b45ad/html5/thumbnails/38.jpg)
Trouble ahead…
Multiple Active Barcodes
will NOT work with SelfCheck!
3M SelfCheck
![Page 42: Taking Your Customers to the Cleaners: Historical Patron Data Cleanup and Routine Purge Preparation](https://reader035.vdocument.in/reader035/viewer/2022081518/54c5d18c4a79590d2f8b45ad/html5/thumbnails/42.jpg)
3M SelfCheck requires 1 active barcode per patron.
We had 11058 patrons with multiple active barcodes.
Wrote a program to whittle that down.
Got them reduced to 300, but the next day, it was up to 1777!
Under control now, with patrononeactive.pl, running Monday – Friday. This keeps only the most current active barcode for a patron.
Forgot about those patron records without an Institution ID. Had 882 of them. Fixed them.
![Page 43: Taking Your Customers to the Cleaners: Historical Patron Data Cleanup and Routine Purge Preparation](https://reader035.vdocument.in/reader035/viewer/2022081518/54c5d18c4a79590d2f8b45ad/html5/thumbnails/43.jpg)
An eye towards the future…
We looked at records created before 2008 that had no SSN but did have an Institution ID.
We extracted these records and modified them:
expiredate = createdate
purgedate = expiredate + 4 years
We then reimported these records. They should disappear with future annual patron purges.
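The date changes described here (expire date set to the create date, purge date four years later) can be sketched in Python as follows. The field names are hypothetical, and the handling of a 29 February create date is an assumption the slides do not address.

```python
from datetime import date


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years, falling back to Feb 28 when Feb 29 does not exist."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def is_candidate(rec: dict) -> bool:
    """Created before 2008, no SSN, but does have an Institution ID."""
    return rec["create_date"] < date(2008, 1, 1) and not rec["ssn"] and bool(rec["institution_id"])


def schedule_for_purge(rec: dict) -> dict:
    """Return a copy with expiredate = createdate and purgedate = expiredate + 4 years."""
    updated = dict(rec)
    updated["expire_date"] = rec["create_date"]
    updated["purge_date"] = add_years(updated["expire_date"], 4)
    return updated


# Hypothetical record.
rec = {"create_date": date(2003, 8, 25), "ssn": "", "institution_id": "netid02",
       "expire_date": date(2010, 1, 1), "purge_date": date(2012, 1, 1)}
fixed = schedule_for_purge(rec) if is_candidate(rec) else rec
print(fixed["expire_date"], fixed["purge_date"])
```

With both dates already in the past, such records become eligible for removal at the next routine purge without any special handling.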
![Page 45: Taking Your Customers to the Cleaners: Historical Patron Data Cleanup and Routine Purge Preparation](https://reader035.vdocument.in/reader035/viewer/2022081518/54c5d18c4a79590d2f8b45ad/html5/thumbnails/45.jpg)
What we ended with
We still had 11,696 records with no SSN (nor WIN).
We expect most of these to be routinely purged in the future, leaving us with 456.
When we started, we had about 250,000 patron records.
We now have about 68,000.
Duplicate records are routinely dealt with.
We filter out all but the single most current active barcode for a patron.
We will have annual patron purges.
![Page 46: Taking Your Customers to the Cleaners: Historical Patron Data Cleanup and Routine Purge Preparation](https://reader035.vdocument.in/reader035/viewer/2022081518/54c5d18c4a79590d2f8b45ad/html5/thumbnails/46.jpg)
Worthwhile points…
Know what you’re starting with.
Keep your goal in mind.
Figure out a good solution.
Be flexible.
Be ready for mistakes.
Watch out for new/current data undoing your changes.
Know when you’re done.
![Page 47: Taking Your Customers to the Cleaners: Historical Patron Data Cleanup and Routine Purge Preparation](https://reader035.vdocument.in/reader035/viewer/2022081518/54c5d18c4a79590d2f8b45ad/html5/thumbnails/47.jpg)
Resources
patronsif.pl
drop_crlf
lb4020.pl
lb4020fix.pl
patrononeactive.pl
patrononeactive.ksh
Contact me if you would like to get any of the above.
![Page 48: Taking Your Customers to the Cleaners: Historical Patron Data Cleanup and Routine Purge Preparation](https://reader035.vdocument.in/reader035/viewer/2022081518/54c5d18c4a79590d2f8b45ad/html5/thumbnails/48.jpg)
Some details on the resources…
patronsif.pl
as listed, gets patron data and puts it in patron SIF format. Institution ID based. Gets all patron + barcode groupings. (not site-specific)
drop_crlf
shell script that contains this line:
perl -pi -e 's/\r\n/  /g' "$1"
replaces each CR+LF combination with two spaces.
(this is useful anytime you use patronsif.pl)
![Page 49: Taking Your Customers to the Cleaners: Historical Patron Data Cleanup and Routine Purge Preparation](https://reader035.vdocument.in/reader035/viewer/2022081518/54c5d18c4a79590d2f8b45ad/html5/thumbnails/49.jpg)
lb4020.pl
detects duplicate patron records. Shows: name, expired (Y/N), SSAN, expire date, modify date, institution ID. WMU-specific: indicates whether SSN or WIN in SSAN. Modification required for your institution.
lb4020fix.pl
control structure around patronsif.pl code that uses lb4020.pl output as the starting point for the fixing process. Creates one or more patron SIF files for fixing data. Use drop_crlf if necessary.
![Page 50: Taking Your Customers to the Cleaners: Historical Patron Data Cleanup and Routine Purge Preparation](https://reader035.vdocument.in/reader035/viewer/2022081518/54c5d18c4a79590d2f8b45ad/html5/thumbnails/50.jpg)
patrononeactive.pl
queries Voyager, checking patrons’ active barcodes. If more than one is found, changes all but the most recent active barcode to other. Check the code carefully as it may need modification for your use. (incorporates patronsif.pl code)
patrononeactive.ksh
combines patrononeactive.pl and drop_crlf in a script suitable for cron use
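The one-active-barcode rule can be sketched in a few lines of Python. This is only an illustration of the rule as described, not the patrononeactive.pl code: the record layout is hypothetical, and "most recent" is assumed to mean the latest `status_date`, which the slides do not specify.

```python
from datetime import date


def keep_most_recent_active(barcodes):
    """Return copies of the barcode rows with at most one left in 'active' status.

    All active barcodes except the one with the latest status_date are set to 'other'.
    """
    active = [b for b in barcodes if b["status"] == "active"]
    if len(active) <= 1:
        return [dict(b) for b in barcodes]
    newest = max(active, key=lambda b: b["status_date"])
    result = []
    for b in barcodes:
        row = dict(b)
        if b["status"] == "active" and b is not newest:
            row["status"] = "other"
        result.append(row)
    return result


# Hypothetical barcodes for one patron.
patron_barcodes = [
    {"barcode": "B001", "status": "active", "status_date": date(2005, 9, 1)},
    {"barcode": "B002", "status": "lost", "status_date": date(2006, 1, 15)},
    {"barcode": "B003", "status": "active", "status_date": date(2008, 8, 20)},
]
fixed_barcodes = keep_most_recent_active(patron_barcodes)
print([(b["barcode"], b["status"]) for b in fixed_barcodes])
```

Since new data kept reintroducing extra active barcodes, a function like this would need to run on a schedule rather than once, which matches the Monday to Friday runs mentioned earlier.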