Problems with Non-roman Character (Korean) Searching


Post on 09-Jan-2016


  • Problems with Non-roman Character (Korean) Searching

    Prepared by Young Ki Lee Senior Cataloging Specialist Korean/Chinese Team RCCD Library of Congress

  • Topics to be covered

    1. Non-roman script (Korean) searching in CJK data fields without spacing
    2. No unified index (normalization) between Hangul (Korean) and Hancha (Chinese characters)
    3. Microsoft Korean IME
    4. Display of search results
    5. CJK Compatibility Database

  • Title Word Search for [Korean term] (the border)

    - The ti: search returns 363 hits.
    - Only 13% of the hits are relevant (13 out of 99) in the first group (Books 1970-1993).
    - The system picks up every record that has the word in any position in the title fields, including matches that span subfields, such as [Korean title examples].
    - In Voyager (currently with spaces), the same search (tkey) retrieves only 9 hits.


  • Title Word Search for [Korean term] (the border)

    - The ti: search returns 360 hits.
    - Only 13% of the hits are relevant (13 out of 99) in the first group (Books 1970-1993).
    - Records that have the word in any position in the title fields, including matches that span subfields, are retrieved, such as [Korean title examples].
    - In the LC Online Catalog (currently with spaces), a title word search retrieves only 9 hits.
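The effect described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the indexing logic of any catalog named here: the sample titles, the helper names `substring_hits`, `word_hits` and `precision`, and the choice of 국경 ("border") as the search term are all assumptions made for the example. It contrasts matching against title text stored without spaces with matching against space-delimited words.

```python
# Hypothetical sample titles, written with word spacing.
TITLES = [
    "국경 의 밤",         # contains the word 국경 ("border")
    "한국 경제 의 이해",   # "Korean economy": no word 국경
    "중국 경제 개혁",      # "Chinese economy": no word 국경
]


def substring_hits(term, titles):
    """Match against titles with all spaces removed (no word boundaries)."""
    return [t for t in titles if term in t.replace(" ", "")]


def word_hits(term, titles):
    """Match only whole space-delimited words."""
    return [t for t in titles if term in t.split()]


def precision(relevant, retrieved):
    """Share of retrieved records that are relevant, as a percentage."""
    return 100.0 * relevant / retrieved


if __name__ == "__main__":
    print(substring_hits("국경", TITLES))  # all three titles match
    print(word_hits("국경", TITLES))       # only the first title matches
    print(round(precision(13, 99)))        # the slide's 13 out of 99 -> 13
```

Without spaces, 한국경제 and 중국경제 both contain the character sequence 국경 across a word boundary, which is the kind of false hit that lowers the ratio of relevant records.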

  • Title Word Search for [Korean term] (philology)

    - In OCLC, the ti: search returns 308 hits.
    - Only 37% of the hits are relevant (36 out of 95) in the first group (Books 1900-1991).
    - The hits include [Korean title examples].
    - In Voyager (currently with spaces), the same search (tkey) retrieves 32 hits.

  • Title Word Search for [Korean term] (name of an ancient Korean country)

    The search retrieves irrelevant records, such as [Korean title examples, some containing "CD-ROM"].

  • [Screenshots of search results: Kochoson, komunso]

  • Title Word Search for [Korean term] (Korean Economy): ti: search

    - Search [Korean term]: 300 hits
    - Search [Korean term]: 652 hits
    - Search [Korean term]: 3 hits
    - Search [Korean term]: 0 hits
    - Search Hanguk kyongje: 1,490 hits

    Title Phrase Search for [Korean term]: ti= search


  • Title Phrase Search for [Korean term] (Korean Economy): ti: search

    - Search [Korean term]: 295 hits
    - Search [Korean term]: 652 hits
    - Search [Korean term]: 3 hits
    - Search [Korean term]: 0 hits
    - Search Hanguk kyongje: 1,490 hits
    - Search # [Korean term]: 461 hits (ti: [Korean term] AND ti: [Korean term])

    Title Phrase Search for [Korean term]: ti= search
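The differing hit counts follow from each written form being indexed separately. A unified index could, in principle, map every form to one key before matching. The sketch below is only an illustration under stated assumptions: the four-entry reading table, the function names `index_key` and `search`, and the sample records are hypothetical, and a real table would need thousands of entries and a way to handle Hancha with more than one reading.

```python
# Toy reading table (an assumption for this sketch): each Hancha maps to a
# single Hangul reading.
HANCHA_TO_HANGUL = {"韓": "한", "國": "국", "經": "경", "濟": "제"}


def index_key(text):
    """Return a script-neutral key with Hancha replaced by Hangul readings."""
    return "".join(HANCHA_TO_HANGUL.get(ch, ch) for ch in text)


def search(term, records):
    """Return records whose normalized key contains the normalized term."""
    key = index_key(term)
    return [r for r in records if key in index_key(r)]


if __name__ == "__main__":
    # Hypothetical records: all Hancha, all Hangul, and a mixed form.
    records = ["韓國 經濟", "한국 경제", "韓國 경제"]
    print(search("한국 경제", records))  # all three forms are found
    print(search("韓國 經濟", records))  # same result from the Hancha form
```

With the key applied on both sides, a searcher no longer has to OR the variant forms together by hand.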

  • Search ti: nodongja or [Korean term] or [Korean term] or [Korean term]


  • Korean IME Problems

    1. Personal name search with an invalid character from the Korean IME
       - Search pn: with U+F9E1: 0 hits. U+F9E1 is an invalid (non-MARC21) character produced by the Korean IME.
       - Search pn: 李 (U+674E): 157 hits. U+674E is a valid MARC21 character.

    2. Title search with an invalid character from the Korean IME
       - Search ti: with U+F941: 0 hits. U+F941 is an invalid (non-MARC21) character produced by the Korean IME.
       - Search ti: 論 (U+8AD6): 21,393 hits. U+8AD6 is a valid MARC21 character.

    3. Korean family name: no MARC 21 equivalent
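Both invalid code points on this slide fall in Unicode's CJK Compatibility Ideographs block, and Unicode normalization maps each of them to the valid code point listed next to it. The following is a minimal sketch of how a query could be checked before searching. It uses Python's standard `unicodedata` module; the function names are hypothetical, and the assumption that the Unicode canonical equivalent is also the MARC21 equivalent is confirmed here only for the two characters on the slide.

```python
import unicodedata


def compatibility_ideographs(text):
    """Return characters in the CJK Compatibility Ideographs block (U+F900-U+FAFF)."""
    return [ch for ch in text if 0xF900 <= ord(ch) <= 0xFAFF]


def to_unified(text):
    """Apply NFC, which replaces characters that have a canonical
    single-character equivalent (as most of this block does)."""
    return unicodedata.normalize("NFC", text)


if __name__ == "__main__":
    for query in ("\uF9E1", "\uF941"):
        found = [f"U+{ord(ch):04X}" for ch in compatibility_ideographs(query)]
        fixed = f"U+{ord(to_unified(query)):04X}"
        print(found, "->", fixed)
    # ['U+F9E1'] -> U+674E
    # ['U+F941'] -> U+8AD6
```

A check like this would not help with the third problem, since a character with no MARC 21 equivalent has nothing to be mapped to.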

  • Display Order

    1. Browse search: sorted by Unicode value, in the order number, roman, Japanese, Hancha, Hangul
    2. Keyword search: sorted alphabetically by romanized form (numbers, then romanization)
    3. Display order: character by character on designated value

  • Sort example: Unicode value, total strokes, and radical (number: strokes)

    Character  Unicode  Total strokes  Radical (number: strokes)
    銀         9280     14             167 (gold): 8
    門         9580     8              169 (gate): 8
    養         990A     15             184 (eat): 6
    魂         9B42     14             194 (ghost): 10
    가         AC00


  • Display Order

    1. Browse search: sorted by Unicode value, in the order number, roman, Japanese, Hancha, Hangul
    2. Keyword search: sorted alphabetically by romanized form (numbers, then romanization)
    3. Display order: character by character on designated value, NOT word by word

  • Sort example (Hangul syllables with Unicode values): 진 (C9C4), 침 (CE68), 중 (C911), 인 (C778)
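Sorting by Unicode value is easy to reproduce, because Python compares strings code point by code point. The sketch below is an illustration only: the function name `browse_order` is hypothetical, the roman, Japanese and number samples are assumptions, and the Hancha and Hangul samples reuse code points from the sort examples.

```python
# One sample character per script group named on the Display Order slide.
SAMPLES = ["가", "銀", "あ", "a", "1"]


def browse_order(headings):
    """Sort headings by Unicode code point, character by character."""
    return sorted(headings)  # Python compares str values by code point


if __name__ == "__main__":
    print(browse_order(SAMPLES))
    # ['1', 'a', 'あ', '銀', '가']  (number, roman, Japanese, Hancha, Hangul)

    hangul = ["\uC9C4", "\uCE68", "\uC911", "\uC778"]
    print([f"{ord(ch):04X}" for ch in browse_order(hangul)])
    # ['C778', 'C911', 'C9C4', 'CE68']
```

The result follows the code charts rather than any reading or romanization, which is why a browse list and a keyword list of the same records can appear in different orders.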


  • CJK Compatibility Database

    The CJK Compatibility Database includes more than 450 non-MARC21 Chinese, Japanese and Korean characters, Hangul syllables and diacritic marks, matched with their MARC21 equivalents.

    The database is intended to enable catalogers to quickly and conveniently replace a non-MARC21 character with its MARC21 equivalent.

    The list of characters in the database was initially identified by LC staff, and was supplemented by entries in a similar database at Yale University.

    The database is a cooperative undertaking, and is intended for the use of all CJK catalogers. If you encounter a non-MARC21 character in the course of your work, please report it to us so that it can be added to the database. Notify Young Ki Lee, Senior Cataloging Specialist, Korean/Chinese Team, Library of Congress, at [email protected].
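The replacement step the database supports amounts to a table lookup. The sketch below shows the idea under stated assumptions: the database's actual format is not described in this presentation, so the dictionary layout and the function names are hypothetical, and the table holds only the two pairs given on the Korean IME slide rather than the 450-plus real entries.

```python
# Non-MARC21 code point -> MARC21 equivalent. Only the two pairs quoted on
# the Korean IME slide are included; the layout is an assumption.
NON_MARC21_TO_MARC21 = {
    0xF9E1: 0x674E,
    0xF941: 0x8AD6,
}


def replace_non_marc21(text, table=NON_MARC21_TO_MARC21):
    """Replace every character listed in the table with its equivalent."""
    return text.translate(table)


def replaced_code_points(text, table=NON_MARC21_TO_MARC21):
    """List the code points in text that the table would replace."""
    return [f"U+{ord(ch):04X}" for ch in text if ord(ch) in table]


if __name__ == "__main__":
    field = "\uF9E1 \uF941"
    print(replaced_code_points(field))  # ['U+F9E1', 'U+F941']
    cleaned = replace_non_marc21(field)
    print([f"U+{ord(ch):04X}" for ch in cleaned if ch != " "])
    # ['U+674E', 'U+8AD6']
```

A character missing from the table passes through unchanged, which corresponds to the case the slide asks catalogers to report.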

  • Thank you