
Posted on 01-Jan-2016


Indexing & retrieval

Approaches to indexing

Key word indexing

Concept indexing

Social indexing

Non-text indexing

Keyword Indexing

Keyword indexing (1)

Advantages:
• Quick
• Entity-oriented: terms are drawn from the entity itself
  Example title: How to succeed in graduate school
• Inexpensive
• No vocabulary lag
• Multiple access points
• Accuracy
• No intellectual effort needed
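Because keyword indexing draws its terms from the entity itself, it is easy to automate. A minimal Python sketch, assuming the title is the only source of terms; the `STOPWORDS` set and the `extract_keywords` name are hypothetical illustrations, not part of any particular system:

```python
import re

# Hypothetical stopword list; real systems use longer, tuned lists.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "the", "to"}

def extract_keywords(title: str) -> list[str]:
    """Return the indexable keywords of a title, in order of appearance."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return [w for w in words if w not in STOPWORDS]

print(extract_keywords("How to succeed in graduate school"))
# ['how', 'succeed', 'graduate', 'school']
```

No thesaurus or indexer judgment is involved, which is where the speed and low cost come from, and also where the disadvantages below originate.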

Keyword indexing (2)

Disadvantages:
• No control over synonyms, near synonyms
• No control over homographs

Keyword indexing (3)

Disadvantages:
• Dependent on authors for informative and accurate titles
  Examples: "Artificial metalloenzymes based on the biotin−avidin technology: enantioselective catalysis and beyond"; "The golden peaches of Samarkand"

Keyword indexing (4)

Disadvantages:
• No control over word forms
  Example: "Communicating in the library" or "Communications in libraries"

Keyword indexing (5)

Disadvantages:
• No cross-reference structure

Historical key word indexing methodologies

Uniterm cards

Edge-notched cards

Optical coincidence cards

Key word in context (KWIC)

Spatial indexing

Pre- versus post-coordinate indexing (Mortimer Taube)

Pre-coordinate (12 terms): China—Folklore, China—History, China—Politics, France—Folklore, France—History, France—Politics, Germany—Folklore, Germany—History, Germany—Politics, Russia—Folklore, Russia—History, Russia—Politics

Post-coordinate (7 terms): China, France, Germany, Russia, Folklore, History, Politics
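The two term counts follow from simple arithmetic: pre-coordination needs one heading per combination (4 places × 3 aspects = 12), while post-coordination needs one term per concept (4 + 3 = 7). A small Python sketch of that count, using the slide's own terms:

```python
from itertools import product

places = ["China", "France", "Germany", "Russia"]
aspects = ["Folklore", "History", "Politics"]

# Pre-coordinate: every place/aspect combination needs its own heading
# ("\u2014" is the em dash used in subject headings such as France—History).
pre_coordinate = [f"{p}\u2014{a}" for p, a in product(places, aspects)]

# Post-coordinate: each concept is a single term; combining happens at search time.
post_coordinate = places + aspects

print(len(pre_coordinate), len(post_coordinate))  # 12 7
```

Adding a fifth place would add three pre-coordinate headings but only one post-coordinate term, so the gap widens as the vocabulary grows.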

Post-coordinate index searching

History of France → France + History

Two sets of documents: France, History

Boolean AND search yields the intersection of the two sets: France AND History
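In software, post-coordinate searching is usually done with an inverted index that maps each term to the set of documents indexed under it; a Boolean AND is then a set intersection. A minimal Python sketch over a hypothetical four-document collection (the document numbers and the `boolean_and` name are illustrative):

```python
# Hypothetical mini-collection: document number -> index terms.
docs = {
    1: {"France", "History"},
    2: {"France", "Politics"},
    3: {"Germany", "History"},
    4: {"France", "History", "Folklore"},
}

# Build an inverted index: term -> set of document numbers.
index: dict[str, set[int]] = {}
for doc_id, terms in docs.items():
    for term in terms:
        index.setdefault(term, set()).add(doc_id)

def boolean_and(*terms: str) -> set[int]:
    """Documents indexed under every given term (set intersection)."""
    sets = [index.get(t, set()) for t in terms]
    return set.intersection(*sets) if sets else set()

print(sorted(boolean_and("France", "History")))  # [1, 4]
```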

Advantages of Taube's system

• No need to develop a list of authorized terms: terms are pulled from the documents themselves
• No need to articulate rules of punctuation for representing complex concepts (France—History)
• No need to delineate citation order (France—History v. History—France)
• No need to formulate rules for subheadings ("May subdivide geog.")

Uniterm cards: one card per term

Document no. 102: "Arrest statistics of the Arizona State Police"

state: 31 102 53 24 75 96 107 68 49 70 34 95 117 59 115 147 109

police: 11 102 23 85 96 87 68 49 60 91 115 107 79

Searching with uniterm cards

Query: looking for documents about state police

state: 31 102 53 24 75 96 107 68 49 70 34 95 117 59 115 147 109

police: 11 102 23 85 96 87 68 49 60 91 115 107 79

Documents posted on both cards include:
102 Arrest statistics of the Arizona State Police.
107 A short history of the Wisconsin State Police.
115 The modern police state.
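Searching uniterm cards means comparing the numbers on the "state" card with those on the "police" card and keeping the ones that appear on both. A minimal Python sketch of that comparison, using the document numbers from the cards and a sorted-list merge (the way many retrieval systems intersect posting lists); the function name is illustrative:

```python
state = [31, 102, 53, 24, 75, 96, 107, 68, 49, 70, 34, 95, 117, 59, 115, 147, 109]
police = [11, 102, 23, 85, 96, 87, 68, 49, 60, 91, 115, 107, 79]

def intersect_sorted(a: list[int], b: list[int]) -> list[int]:
    """Walk two sorted lists in step, keeping numbers present in both."""
    i = j = 0
    result = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return result

print(intersect_sorted(sorted(state), sorted(police)))
# [49, 68, 96, 102, 107, 115]
```

Document 115, "The modern police state", is retrieved alongside 102 and 107 although it is not about state police: post-coordinate matching cannot tell how the two terms relate to each other.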

Edge-notched cards: one card per bibliographic item

Whirdeaux, Ima. Caring for your pet pterodactyl / by Ima Whirdeaux. Call no. Q54321 .W45

Turner, Paige. Caring for your pet grizzly / by Paige Turner. Call no. Q12345 .T8

Hole labels along the card edge: bears, pet-care, pterodactyls
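On an edge-notched card each labeled hole is cut open (notched) when the item has that concept; pushing a needle through one hole position of the whole deck and lifting lets the notched cards fall out. A minimal Python sketch that models each card as a bit mask; the hole positions and which holes each sample card is notched at are assumptions for illustration:

```python
# Hypothetical assignment of concepts to hole positions along the card edge.
HOLES = {"bears": 0, "pet-care": 1, "pterodactyls": 2}

def notch(*concepts: str) -> int:
    """Encode a card's notched holes as a bit mask."""
    mask = 0
    for c in concepts:
        mask |= 1 << HOLES[c]
    return mask

# Assumed notching for the two sample cards.
deck = {
    "Caring for your pet pterodactyl": notch("pet-care", "pterodactyls"),
    "Caring for your pet grizzly": notch("pet-care", "bears"),
}

def needle_sort(cards: dict[str, int], concept: str) -> list[str]:
    """Cards notched at the concept's hole fall off the needle; return them."""
    bit = 1 << HOLES[concept]
    return [title for title, mask in cards.items() if mask & bit]

print(needle_sort(deck, "pet-care"))  # both cards drop
print(needle_sort(deck, "bears"))     # ['Caring for your pet grizzly']
```

Running the needle again on only the cards that dropped corresponds to an AND of two concepts.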

Pyramid coding for edge-notched cards

Coding the year 1947*: direct coding uses 20 dots (two fields of digits 0 to 9); pyramid coding uses 10 dots

[Figure: hole layouts for both codings and a grid of the two-digit values 00 to 99]

*They hadn't heard of the Y2K problem yet.

Optical coincidence cards: pre-printed cards with numbers for the entire database

Example term card: fleas
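Optical coincidence cards are term cards: a hole is drilled at the position of every document indexed under the term, and stacking cards against a light shows the documents that have all the terms. That is a bitwise AND of bit maps. A minimal Python sketch; the document numbers and the second term ("dogs") are hypothetical:

```python
def term_card(doc_numbers: list[int]) -> int:
    """Bit n is set (a hole is drilled) if document n is indexed under the term."""
    card = 0
    for n in doc_numbers:
        card |= 1 << n
    return card

fleas = term_card([2, 5, 11, 17])
dogs = term_card([5, 8, 11, 19])  # hypothetical second term

def light_passes(*cards: int) -> list[int]:
    """Stacking cards: light shows through only where every card has a hole."""
    stacked = cards[0]
    for c in cards[1:]:
        stacked &= c
    return [n for n in range(stacked.bit_length()) if stacked >> n & 1]

print(light_passes(fleas, dogs))  # [5, 11]
```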

Key Word in Context (KWIC) index

Doc 15 title: "A comparison of OCLC and WLN hit rates for monographs and an analysis of the types of records retrieved"

[Figure: KWIC index for Doc 15 with CONTEXT, KEY WORDS and POINTER columns. Each of the ten key words (analysis, comparison, hit, monographs, OCLC, rates, records, retrieved, types, WLN) is filed alphabetically with the surrounding words of the title wrapped around it, and every entry points to document 15. The stopwords "a", "of", "and", "for", "an" and "the" are not indexed.]
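A KWIC index can be generated mechanically: every non-stopword in the title becomes an entry, shown with its neighboring words and filed alphabetically by keyword. A minimal Python sketch, assuming the stopword list inferred from the Doc 15 example and a fixed context width; unlike the printed index, this version does not wrap the title around from end to start:

```python
STOPWORDS = {"a", "an", "and", "for", "of", "the"}

def kwic(title: str, doc_id: int, width: int = 30) -> list[tuple[str, str, int]]:
    """Build KWIC entries: (left context, keyword + right context, pointer)."""
    words = title.split()
    entries = []
    for i, word in enumerate(words):
        if word.lower() in STOPWORDS:
            continue  # stopwords are not used as access points
        left = " ".join(words[:i])[-width:]
        right = " ".join(words[i:])[:width]
        entries.append((left, right, doc_id))
    # File the entries alphabetically by keyword.
    return sorted(entries, key=lambda e: e[1].lower())

title = ("A comparison of OCLC and WLN hit rates for monographs "
         "and an analysis of the types of records retrieved")
for left, right, ptr in kwic(title, 15):
    print(f"{left:>30} | {right:<30} | {ptr}")
```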

Key Word Out of Context (KWOC) Index

aardvark 101
baggage 123
banyan 128, 159, 179
coconut 955, 654
driving 196, 488, 788
elementary 455, 785
elephant 128, 465, 783
garage 678, 398
hardware 849, 483, 399
meter 768
nadir 877
noxious 112
opium 289
opus 985, 159, 849
people 629, 458
quark 137, 492
radar 968, 295
radio 430, 206, 749
stereo 294, 837, 873
television 745, 727, 883
ultraviolet 958, 774
zebra 276
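A KWOC index lists each keyword once, out of its title, followed by the numbers of the documents that contain it. A minimal Python sketch; the three titles and the stopword list are hypothetical, chosen so that "banyan" reproduces the 128, 159, 179 entry above:

```python
from collections import defaultdict

STOPWORDS = {"a", "an", "and", "for", "in", "of", "the", "to"}

# Hypothetical titles keyed by document number.
titles = {
    128: "Elephant trails through the banyan forest",
    159: "Banyan roots and the opus of nature",
    179: "Growing banyan trees",
}

def kwoc(titles: dict[int, str]) -> dict[str, list[int]]:
    """Keyword Out of Context: each keyword listed once, followed by its documents."""
    index = defaultdict(set)
    for doc_id, title in titles.items():
        for word in title.lower().split():
            if word not in STOPWORDS:
                index[word].add(doc_id)
    return {kw: sorted(ids) for kw, ids in sorted(index.items())}

for kw, ids in kwoc(titles).items():
    print(kw, ", ".join(map(str, ids)))
```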

Vector space model (VSM)

Each document represented by a vector

[Figure: axes labeled technology, libraries and assistive, showing the vector for the document entitled "Assistive technology for libraries"]
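In the simplest form of the model, each axis is a vocabulary term and a document's coordinate on that axis records whether the term occurs. A minimal Python sketch, assuming a three-term vocabulary taken from the figure's axes and binary (0/1) weights:

```python
# Assumed vocabulary: the three axes in the figure.
VOCABULARY = ["technology", "libraries", "assistive"]

def to_vector(text: str) -> list[int]:
    """Binary term vector: 1 if the vocabulary term occurs in the text, else 0."""
    words = set(text.lower().split())
    return [1 if term in words else 0 for term in VOCABULARY]

print(to_vector("Assistive technology for libraries"))  # [1, 1, 1]
```

Note that `to_vector("Library technology")` gives `[1, 0, 0]`: without stemming, "library" does not match the "libraries" axis, the same word-form problem listed under keyword indexing.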

Vector space model matching: similarity between query and document vectors

[Figure: axes labeled technology, libraries and assistive, showing the vectors for document 1, document 2 and the query]
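A common way to measure the similarity between the query vector and each document vector is the cosine of the angle between them, which is 1 when they point the same way and 0 when they share no terms. A minimal Python sketch with hypothetical vectors over the same three axes; cosine is one widely used choice, not necessarily the one behind the slide:

```python
import math

def cosine(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two term vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Hypothetical vectors over the axes (technology, libraries, assistive).
query = [1, 0, 1]
doc1 = [1, 1, 1]
doc2 = [0, 1, 0]

ranked = sorted({"document 1": doc1, "document 2": doc2}.items(),
                key=lambda item: cosine(query, item[1]), reverse=True)
for name, vec in ranked:
    print(name, round(cosine(query, vec), 3))
```

Document 1 scores 2/√6 ≈ 0.816 and document 2 scores 0, so document 1 is ranked first.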

VSM term weighting: assign high weights to terms that appear frequently in the document but infrequently in the database

Query: "I'm looking for articles about assistive technology for the blind."

Term | Freq. within document | No. of documents with term
conclusion | low | high
information | high | high
blind | high | low
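The weighting rule above is usually implemented as tf-idf: term frequency in the document multiplied by the logarithm of (number of documents ÷ number of documents containing the term). A minimal Python sketch of one common tf-idf variant; the database size and counts are hypothetical, chosen only to match the low/high pattern in the table:

```python
import math

def tf_idf(term_freq: int, doc_freq: int, total_docs: int) -> float:
    """tf x idf: frequent in the document, rare in the database -> high weight."""
    return term_freq * math.log(total_docs / doc_freq)

N = 1000  # hypothetical database size
# Hypothetical (frequency in document, documents with term) counts.
terms = {
    "conclusion": (1, 900),
    "information": (8, 800),
    "blind": (8, 20),
}
weights = {term: tf_idf(tf, df, N) for term, (tf, df) in terms.items()}
for term, w in weights.items():
    print(term, round(w, 2))
```

"Blind" ends up with by far the highest weight: it is frequent in the document but rare in the database, while "information" is frequent everywhere.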

VSM refinements: adding semantic and syntactic parsing.

Bill is going to the store to make a purchase.

Bill is going to purchase the store.

Bill is going to store his purchase.
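Parsing is needed because a plain term vector ignores word order: once function words are dropped, the second and third sentences produce exactly the same set of terms despite meaning different things. A minimal Python sketch, assuming a small hypothetical stopword list:

```python
# Small hypothetical stopword list.
STOPWORDS = {"a", "his", "is", "the", "to"}

def bag_of_words(sentence: str) -> frozenset[str]:
    """Unordered set of content words, as a simple term vector would see them."""
    words = sentence.lower().rstrip(".").split()
    return frozenset(w for w in words if w not in STOPWORDS)

s2 = "Bill is going to purchase the store."
s3 = "Bill is going to store his purchase."
print(bag_of_words(s2) == bag_of_words(s3))  # True
```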

Concept indexing

Concept indexing: rather than pulling terms from documents, assign a concept identifier (e.g. France—History) to documents dealing with the history of France

Requires intellectual effort

Takes more time than keyword indexing, so it is less economical

Avoids problems of false coordination and synonymy through the use of vocabulary control

Vocabulary control (1)

One indexing term or phrase to represent a concept

– Unidentified flying objects not flying saucers

– Point user to correct term with "use" reference

– Reduces number of searches needed to find items about a particular topic
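A "use" reference is in effect a lookup table from non-preferred terms to the authorized term, applied before searching so that every query lands on the same heading. A minimal Python sketch; the table contents and function name are hypothetical:

```python
# Hypothetical entry vocabulary: non-preferred term -> preferred ("use") term.
USE_REFERENCES = {
    "flying saucers": "Unidentified flying objects",
}

def preferred_term(term: str) -> str:
    """Follow a 'use' reference if there is one; otherwise keep the term."""
    return USE_REFERENCES.get(term.lower(), term)

print(preferred_term("Flying saucers"))  # Unidentified flying objects
```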

Vocabulary control (2)

One form of a word to represent the concept

– Dictionaries not dictionary

Vocabulary control (3)

One usage of a homographic term

– Fault (geologic) not fault (responsibility for error)

– Usage identified through scope note
– Consistency among indexers as well as one indexer over time
– Helps user to avoid false drops

Vocabulary control (4)

Syndetic structure
– Broader terms
– Narrower terms
– Related terms (see also)
– User can navigate the structure to find the most appropriate term, as well as identify additional related terms of potential use in finding relevant documents
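The syndetic structure can be modeled as a small graph of broader (BT), narrower (NT) and related (RT) links that a user or a search system can follow. A minimal Python sketch over a hypothetical thesaurus fragment; the terms and the `expand` helper are illustrative, and the expansion assumes the NT links contain no cycles:

```python
# Hypothetical thesaurus fragment with broader (BT), narrower (NT) and related (RT) terms.
THESAURUS = {
    "Libraries": {"BT": [], "NT": ["Academic libraries", "Public libraries"], "RT": ["Librarians"]},
    "Academic libraries": {"BT": ["Libraries"], "NT": [], "RT": []},
    "Public libraries": {"BT": ["Libraries"], "NT": [], "RT": []},
    "Librarians": {"BT": [], "NT": [], "RT": ["Libraries"]},
}

def expand(term: str) -> list[str]:
    """The term plus everything reachable through NT links (assumes no cycles)."""
    result = [term]
    for narrower in THESAURUS.get(term, {}).get("NT", []):
        result.extend(expand(narrower))
    return result

print(expand("Libraries"))
# ['Libraries', 'Academic libraries', 'Public libraries']
print(THESAURUS["Academic libraries"]["BT"])  # ['Libraries']
```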

Social network indexing

• Tags

• Tag clouds

• User-created tags providing access to library resources
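A tag cloud typically displays frequently used tags in larger type. A minimal Python sketch that counts tag usage and scales font size linearly between two hypothetical pixel sizes; the tags are invented, and many sites use logarithmic rather than linear scaling:

```python
from collections import Counter

# Hypothetical tags that users have attached to library resources.
tags = ["history", "maps", "history", "photos", "history", "maps"]

def tag_cloud(tags: list[str], min_px: int = 12, max_px: int = 36) -> dict[str, int]:
    """Map each tag to a font size scaled linearly by how often it is used."""
    counts = Counter(tags)
    lo, hi = min(counts.values()), max(counts.values())
    span = hi - lo or 1  # avoid dividing by zero when all counts are equal
    return {tag: min_px + (n - lo) * (max_px - min_px) // span
            for tag, n in counts.items()}

print(tag_cloud(tags))  # {'history': 36, 'maps': 24, 'photos': 12}
```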

flickr

http://www.flickr.com/

Tags


Tag clouds

Geotagging

Librarian tagging

Library using flickr

Peace Palace Library (PPL)

Social bookmarking: http://www.delicious.com

http://www.delicious.com/mauicclibrary


The economic case for open access in academic publishing

technology

Portable software for USB drives

CU Researcher Finds 10,000-Year-Old Hunting Weapon in Melting Ice Patch

University of Pennsylvania: http://www.library.upenn.edu/

PennTags

Item list with PennTags

Adding a PennTag

Add to PennTags

Non-text indexing

Indexing Music

Indexing Music - melodic contour

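* R U R U R D

(* marks the first note; each following note is coded U if it is higher than the previous note, D if lower, and R if it repeats the same pitch)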
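The contour code (sometimes called the Parsons code) is derived purely from the direction between successive pitches, so exact intervals and rhythm are ignored, which makes it tolerant of imprecise singing. A minimal Python sketch that takes pitches as MIDI note numbers; the example melody is the opening of "Twinkle, Twinkle, Little Star" (C C G G A A G), chosen only for illustration:

```python
def contour(pitches: list[int]) -> str:
    """Melodic contour: '*' for the first note, then U (up), D (down) or R (repeat)."""
    if not pitches:
        return ""
    codes = ["*"]
    for prev, cur in zip(pitches, pitches[1:]):
        codes.append("U" if cur > prev else "D" if cur < prev else "R")
    return " ".join(codes)

# C C G G A A G as MIDI note numbers.
print(contour([60, 60, 67, 67, 69, 69, 67]))  # * R U R U R D
```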

Query by humming

Query by humming (2)

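[Figure: query-by-humming system. Hummed queries (digital audio) pass through a pitch tracker that produces a melodic contour; the query engine compares it with a melody database built from MIDI songs and returns a ranked list of matching melodies.]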

Source: Ghias, Asif; Logan, Jonathan; Chamberlin, David; and Smith, Brian C. 1995. Query by humming: musical information retrieval in an audio database. ACM Multimedia 95 - Electronic Proceedings. http://www.cs.cornell.edu/Info/Faculty/bsmith/query-by-humming.html
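Because a hummed contour will rarely match a stored one exactly, the query engine needs approximate matching and a ranked list. A minimal Python sketch that uses Levenshtein edit distance as a stand-in (not necessarily the matching algorithm of the cited paper) over a hypothetical three-melody database of contour strings:

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions and substitutions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]

# Hypothetical melody database: title -> contour string (without the leading '*').
MELODIES = {
    "Melody A": "RURURD",
    "Melody B": "UUDDRU",
    "Melody C": "RURUDD",
}

def rank(query: str) -> list[tuple[str, int]]:
    """Ranked list of melodies, closest contour first."""
    return sorted(((t, edit_distance(query, c)) for t, c in MELODIES.items()),
                  key=lambda pair: pair[1])

print(rank("RURRD"))  # a hummed query with one note missing
```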

Indexing Music - melodic contour

* R U R U R D

http://www.musipedia.org/


Indexing images

Source: Trust Territory archives.

Indexing images - chair (1)

Indexing images - ?

Indexing images - chair (2)

Biometrics - face

Biometrics - differences

Biometrics - similarities: look at ratios of distances between marker points
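Ratios are used rather than raw distances because they do not change when the same face appears larger or smaller in the image. A minimal Python sketch; the marker names, coordinates and the two particular ratios are hypothetical:

```python
import math

Point = tuple[float, float]

def dist(p: Point, q: Point) -> float:
    return math.hypot(p[0] - q[0], p[1] - q[1])

def face_signature(m: dict[str, Point]) -> tuple[float, float]:
    """Ratios of distances between marker points; unchanged by image scale."""
    eye_span = dist(m["left_eye"], m["right_eye"])
    return (dist(m["nose"], m["mouth"]) / eye_span,
            dist(m["left_eye"], m["mouth"]) / eye_span)

# Hypothetical markers for one face, and the same face at twice the size.
face = {"left_eye": (30.0, 40.0), "right_eye": (70.0, 40.0),
        "nose": (50.0, 60.0), "mouth": (50.0, 80.0)}
bigger = {k: (2 * x, 2 * y) for k, (x, y) in face.items()}

print(face_signature(face), face_signature(bigger))
```

Both signatures come out equal (about 0.5 and 1.118), so the two images can be matched even though every raw distance differs.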

Indexing images by color

http://www.hermitagemuseum.org/fcgi-bin/db2www/qbicColor.mac/qbic?selLang=English
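One common way to index images by color is a color histogram (the share of pixels falling into each color bin), with images compared by how much of their histograms overlap. A minimal Python sketch of histogram intersection; the bin count and pixel data are hypothetical, and this is not necessarily how the Hermitage search computes similarity:

```python
# Pixels as (r, g, b) tuples with 0-255 channels; 4 bins per channel -> 64 color bins.
BINS = 4

def color_histogram(pixels: list[tuple[int, int, int]]) -> list[float]:
    """Normalized color histogram: fraction of pixels falling in each color bin."""
    hist = [0.0] * BINS ** 3
    step = 256 // BINS
    for r, g, b in pixels:
        index = (r // step) * BINS * BINS + (g // step) * BINS + (b // step)
        hist[index] += 1
    total = len(pixels)
    return [h / total for h in hist]

def histogram_intersection(h1: list[float], h2: list[float]) -> float:
    """Similarity in [0, 1]: the share of color the two images have in common."""
    return sum(min(a, b) for a, b in zip(h1, h2))

red_image = [(250, 10, 10)] * 90 + [(10, 10, 250)] * 10
query = [(240, 20, 20)] * 100  # "mostly red" query
print(histogram_intersection(color_histogram(query), color_histogram(red_image)))
```

The query and the mostly red image share 90% of their color distribution, so the similarity is 0.9.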


Indexing images by layout

http://www.hermitagemuseum.org/fcgi-bin/db2www/qbicLayout.mac/qbic?selLang=English


Indexing images by shape

http://shape.cs.princeton.edu/search.html


Original

Search by Shape – Commercial Usage

http://www.youtube.com/watch?v=grShwnDXyUA

Search by Color Exercise

http://www.hermitagemuseum.org/fcgi-bin/db2www/qbicSearch.mac/qbic?selLang=English

Title?

Artist?

Title?

Artist?

1 2 3 4 5
