
Upload: mae-stephens

Post on 27-Dec-2015


Organization of Information and Access to Information

LIS510

Organization of KNOWLEDGE

• Libraries organize knowledge. Otherwise nothing that is in a library could ever be found.

• Traditional methods of doing this have been labor intensive. They cannot cope with the exploding amount of information.

• But the theoretical approaches and the tools developed by librarians remain very important for any attempt at organizing knowledge by computer.

Approaching knowledge

• There are things we want to know about. These are called subjects.

• And there are ways of looking at things. We can call these “disciplines”.

• Example: the subject is sex; ways of looking at it include biology, gender studies, and pornography.

Classification

• It is the act of organizing the universe of knowledge into some systematic order.

• Classification provides a descriptive and explanatory framework for ideas and a structure of the relationship among the ideas.

• Example: Science > Chemistry > Organic Chemistry

Classification schemes

• Generally schemes to classify subjects are discipline oriented, rather than subject oriented.

• Example
– Sports > Racing > Horse Racing
– Science > Biology > Zoology > Horses > Horse Racing

• The same thing can be viewed from different disciplines.

Use of classification schemes

• As a way to arrange items in a library. Items with the same classification are shelved next to each other. This encourages the patron to discover similar items.

• But sometimes, serials are kept apart. Individual articles in serials are not classified.

• Once a system is chosen, a library sticks with it.

• There are two common ones:
– Dewey Decimal Classification
– Library of Congress Classification

Dewey Decimal Classification

• Introduced by Melvil Dewey in 1876.

• Ten top classes
– 000 Generalities
– 100 Philosophy, parapsychology, occultism, psychology
– 200 Religion
– 300 Social Sciences
– 400 Language
– 500 Natural Sciences & Mathematics
– 600 Technology (Applied Sciences)
– 700 Arts
– 800 Literature & Rhetoric
– 900 Geography, history and auxiliary disciplines

Subclasses

• Example
– 640 is home economics
– 641 food and drink

• Another
– 795 games of chance
– 795.4 card games
– 795.41 card games where skill is involved
– 795.412 poker
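The way each extra digit narrows a class can be sketched in a few lines of Python. This is only an illustration, not part of any official DDC tooling: the function name `ddc_ancestors` is made up for this example, and it assumes plain numbers of the kind shown on the slide.

```python
def ddc_ancestors(number: str) -> list[str]:
    """Return the chain of broader classes for a DDC number, broadest first."""
    digits = number.replace(".", "")
    chain: list[str] = []
    for length in range(1, len(digits) + 1):
        prefix = digits[:length]
        if length <= 3:
            # The first three digits are padded with zeros: 7 -> 700, 79 -> 790.
            label = prefix.ljust(3, "0")
        else:
            # Digits after the third follow a decimal point: 7954 -> 795.4.
            label = prefix[:3] + "." + prefix[3:]
        # Skip repeats such as 640 appearing at both two and three digits.
        if not chain or chain[-1] != label:
            chain.append(label)
    return chain


print(ddc_ancestors("795.41"))
```

Reading the result from left to right moves from the top class (700 Arts) down to the specific subclass, which is also why items on related subjects end up next to each other on the shelf.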

Comments on DDC

• Mainly used in public libraries.

• Like any scheme, it needs updating. Such updates are cumbersome.

• Like any scheme, there is a significant cultural bias in it.

• Owned by OCLC and sold very dearly. OCLC sued the Library Hotel for using the scheme. This limits the uptake of the scheme and therefore its usefulness.

Library of Congress Classification

• Has twenty-one top-level classes, each identified by a letter.

• Looks at the world from an academic perspective.

• Therefore used in universities.

• Owned and maintained by the Library of Congress. Problems with restricted access are similar to those of DDC.

Library of Congress Classification Outline

A -- GENERAL WORKS
B -- PHILOSOPHY. PSYCHOLOGY. RELIGION
C -- AUXILIARY SCIENCES OF HISTORY
D -- HISTORY: GENERAL AND OLD WORLD
E -- HISTORY: AMERICA
F -- HISTORY: AMERICA
G -- GEOGRAPHY. ANTHROPOLOGY. RECREATION
H -- SOCIAL SCIENCES
J -- POLITICAL SCIENCE
K -- LAW
L -- EDUCATION

Library of Congress Classification Outline

M -- MUSIC AND BOOKS ON MUSIC
N -- FINE ARTS
P -- LANGUAGE AND LITERATURE
Q -- SCIENCE
R -- MEDICINE
S -- AGRICULTURE
T -- TECHNOLOGY
U -- MILITARY SCIENCE
V -- NAVAL SCIENCE
Z -- BIBLIOGRAPHY. LIBRARY SCIENCE. INFORMATION RESOURCES (GENERAL)

Controlled vocabulary

• Many words can be used to describe the same thing
– US, U.S., United States of America

• One approach to deal with this problem is to use only one term, consistently.

• Example: the yellow pages provide a consistent vocabulary for all professions.
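The idea of using one term consistently can be sketched as a lookup table from variant forms to a single authorized term. The table below is an assumption made for illustration: the choice of "United States" as the authorized form and the name `authorized_term` do not come from any real vocabulary.

```python
# Hypothetical table mapping variant forms (lowercased) to one authorized term.
AUTHORIZED = {
    "us": "United States",
    "u.s.": "United States",
    "united states of america": "United States",
}


def authorized_term(term: str) -> str:
    """Return the authorized form of a term, or the term itself if it is unknown."""
    # Normalize case and whitespace so "U.S." and " u.s. " find the same entry.
    key = " ".join(term.lower().split())
    return AUTHORIZED.get(key, term)


print(authorized_term("U.S."))
print(authorized_term("United States of America"))
```

Whatever the user or indexer types, every record ends up under the same term, so one search finds them all.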

Purpose of controlled vocabulary

Vocabulary control is the process of organizing a list of terms
– to indicate which of two or more synonymous terms is authorized for use
– to distinguish between homographs
– to indicate hierarchical and associative relationships between terms

Thesauri

• A thesaurus is a list of words. For each word, there is a list of related words and the type of relationship that the word has with each related word. Example:

• LIBRARIES
– Narrower Terms: Academic Libraries [+], Branch Libraries
– Related Terms: Information Centers

• Thesauri are search tools.
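As a rough sketch of how a thesaurus can act as a search tool, the LIBRARIES entry above can be stored as a small data structure and used to widen a query. The structure and the function name `expand_query` are assumptions for this example; real thesauri carry more relationship types.

```python
# The LIBRARIES entry from the slide, stored as typed relationships.
THESAURUS = {
    "Libraries": {
        "narrower": ["Academic Libraries", "Branch Libraries"],
        "related": ["Information Centers"],
    },
}


def expand_query(term: str, include_related: bool = False) -> list[str]:
    """Return the term plus its narrower (and optionally related) terms."""
    entry = THESAURUS.get(term, {})
    terms = [term] + entry.get("narrower", [])
    if include_related:
        terms += entry.get("related", [])
    return terms


print(expand_query("Libraries"))
print(expand_query("Libraries", include_related=True))
```

A search for "Libraries" can then also match records indexed only under the narrower terms.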

Subject headings

• These are controlled vocabularies of subjects that can be added to a record.

• They may also contain similar relationships between terms.

• But unlike thesauri, they are used when creating the bibliographic records. Thus they are indexing tools.

Subject Headings

• LC subject headings (LCSH) are very complete, but are not easy to use.

• Smaller libraries use Sears subject headings
– less complete
– easier to use

Controlled Vocabularies

• Subject heading lists
– LCSH (Library of Congress Subject Headings)
– Sears List of Subject Headings
– MeSH (Medical Subject Headings)

• Thesauri
– AAT (Art & Architecture Thesaurus)
– Thesaurus of ERIC Descriptors
– Many more...

Subject Heading Searches

• were scoffed at in the 1830s: Strout: “classified catalogs and indexes were not needed because living librarians were better than subject catalogs...[and] any intelligent man who was sufficiently interested in a subject to want to consult material on it could just as well use author entries as subject, for he would, of course, know the names of all the authors who had written in his field.”

Subject searching in the 20th century ...

• continued to be denigrated, even though Cutter had convinced American librarians to use subject headings in dictionary catalogs.

• Catalog use studies (in academic libraries) showed that most searching was for known items, or at least for known authors. (Studies in public libraries showing high use of subject searching tended to be ignored.)

Subject searching in the 1990s ...

• was shown by transaction logs to be the most popular kind of search in online catalogs (to the surprise of many)

• but such searches were difficult because users did not know how to construct exact subject heading strings

• so they compensated with title keyword searching

Subject searching in 2009 ...

• is often ignored in favor of keyword searching, which can now be used to search every word in a record in many OPACs ...

• and so has resulted in a suggestion that subject headings should be stripped from records in order to save gigabytes of space and to save the time required to create them

Remove Subject Headings?

• Some experienced librarians had observed that some keyword searches retrieved records in which the sought-after word(s) appeared only in a subject heading field.

• How often is this true?

The Taylor-Gross Study

• Research Question:
– What proportion of records retrieved by a keyword search has a keyword only in a subject heading field, and thus would not be retrieved if there were no subject headings?

Study Conclusions

• The study found that if subject headings were to be removed from catalog records (or no longer added to them), users performing keyword searches would lose more than one third of the hits they currently retrieve (35.9% on average).

• The loss of hits would be in addition to the loss of other functions and advantages provided by controlled vocabulary in general.
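The quantity the study measured can be illustrated with a toy calculation: run the same keyword search with and without the subject field and see what share of hits disappears. The records below are invented for this sketch and are not data from the study, and the function names are hypothetical. The toy result has nothing to do with the 35.9% figure.

```python
def keyword_hits(records, keyword, fields):
    """Return the ids of records where the keyword is a word in any given field."""
    keyword = keyword.lower()
    return {
        rec["id"]
        for rec in records
        if any(keyword in rec.get(field, "").lower().split() for field in fields)
    }


def share_lost_without_subjects(records, keyword):
    """Fraction of keyword hits that match only in the subject field."""
    with_subjects = keyword_hits(records, keyword, ["title", "subject"])
    without_subjects = keyword_hits(records, keyword, ["title"])
    if not with_subjects:
        return 0.0
    return len(with_subjects - without_subjects) / len(with_subjects)


# Made-up toy records, for illustration only.
RECORDS = [
    {"id": 1, "title": "Poker for beginners", "subject": "Card games"},
    {"id": 2, "title": "The big deal", "subject": "Poker"},
    {"id": 3, "title": "Horse racing history", "subject": "Sports"},
]

print(share_lost_without_subjects(RECORDS, "poker"))
```

Record 2 is found only because of its subject heading, so in this toy catalog half of the "poker" hits would vanish if subject headings were stripped.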

Access Points - Purposes

• To identify (e.g., an entity known to the user)

• To collocate (i.e., bring together related information packages)

• To aid in evaluating or selecting (e.g., Has this author written something newer on the subject? Which of several works with the same title do I want? What level of subject treatment is needed – a whole work on the subject? a chapter? a paragraph?)

• To locate a copy of the information package represented

Access Points for Names and Titles – Purposes

• To facilitate the retrieval of names and titles that are imperfectly remembered

• To facilitate the retrieval of names and titles that are expressed differently in different information packages

• To facilitate the retrieval of names and titles that have changed over time

• To collocate expressions and manifestations of works

• To collocate works that are related to other works

Access Points for Names and Titles – How Accomplished

• Name and Title Authority Control
– All access points (whether main or added entries) need to be under authority control so that (a) persons or entities with the same name can be distinguished from each other, (b) all names used by a person or body, or all manifestations of the name of a person or body, will be brought together, and (c) all differing titles of the same work can be brought together
– Therefore, current practice dictates either the establishment of a “heading” for each name or title as an access point or the provision of pointers to draw different representations of names or titles together

• Headings are kept track of in authority files

LC authority control

• LC maintains authority files. They are not free, but you can consult them on the web.

• Let us try this out now, see http://authorities.loc.gov/

• Look at the personal authority file and search for someone reasonably famous that you like.

Catalogs

• Catalogs are collections of records about a library’s holdings.

• In olden days, they were organized by author only.

• In more modern days you can approach by various “access points” such as title, author, subject.

Aims of catalog

• Cutter’s 1876 work is still pertinent here

• Catalogs
– enable a person to find a book of which either the author, title, or subject is known
– show what the library has by a given author, on a given subject, in a given type
– assist with the choice of a book by edition or by its character (literary or topical)

Catalogs

• Cutter’s vision is more from the user’s point of view, but from the library’s point of view it is also important to know:
– location
– physical characteristics (e.g. oversize)
– circulation properties

• “The convenience of the reader.”

Description - Purposes

• To present the characteristics of an information package

• To give enough information about an information package to identify it uniquely and to distinguish it from every other information package

• To aid in evaluating or selecting (e.g., Is the original manuscript of a book needed, or will a printed copy do? Is the 8th ed. as good as the 9th ed. for my purposes? Is a vinyl LP o.k., or can I only play a CD?)

• To provide a filter that serves as a surrogate for a full information package so that users do not have to examine a multitude of complete (e.g., full text) packages in order to find what is needed

Description – How Accomplished

• Determine the unit to be described
– “catalogable” unit
– finite vs. continuing resources
– work – expression – manifestation – item

• Create surrogate records by selecting important pieces of information from or about the information package

• Use rules or conventions created by different communities to determine which pieces of information will be included

Bibliographic record

• This is a record that describes an item in the library.

• The Anglo-American Cataloguing Rules (AACR) are a set of standardized rules for creating such records.

• These rules go back to the 19th century, but are being revised.

• Currently AACR2 is in use, last revised 2002.

Fields

• Parts of a record are called fields. A record can contain many fields. A field has a name and a value. Example:
– Title: Rick Block’s Home Page
– Author: Rick Block
– URL: http://www.columbia.edu/~rjb57

• This is a record with three fields. The first field is named “Title”, and its value is “Rick Block’s Home Page”.

MARC

• A MARC record is a record whose field names are numbers (tags); some fields also have subfields. The same example as previously (basically):
– 100 Block, Rick
– 245 Rick Block’s Home Page
– 856 http://www.columbia.edu/~rjb57

• There are gazillions of rules to learn before you can write a correct MARC record.
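To make the link between numbered tags and named fields concrete, here is a small sketch that turns the tag-based example back into the name/value record. It is a deliberate simplification: it ignores indicators and subfields, covers only the three tags used on the slide, and the names `TAG_NAMES` and `to_named_fields` are invented for this example.

```python
# Labels for the three tags in the example; real MARC defines many more.
TAG_NAMES = {
    "100": "Author",  # main entry, personal name
    "245": "Title",   # title statement
    "856": "URL",     # electronic location and access
}

# The slide's example as (tag, value) pairs, without indicators or subfields.
marc_like = [
    ("100", "Block, Rick"),
    ("245", "Rick Block’s Home Page"),
    ("856", "http://www.columbia.edu/~rjb57"),
]


def to_named_fields(fields):
    """Replace known numeric tags with readable names; keep unknown tags as-is."""
    return {TAG_NAMES.get(tag, tag): value for tag, value in fields}


for name, value in to_named_fields(marc_like).items():
    print(f"{name}: {value}")
```

Note that the author appears in inverted form ("Block, Rick") in the tagged version, one small example of the many rules mentioned above.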

Historical Overview

• Evidence of “organization” since ancient times…

• What caused the major change of events in the fifteenth century?

• Gutenberg has been called the most influential person

• Creation of “bibliographers”

More Evolution

• French national code of cataloging began (used playing cards…)

• Similarities with card catalogs still in use today!

• 19th Century, Panizzi’s 91 Rules

• 1850, Jewett’s code for the catalog of the Smithsonian

Standard Rule

• “Uniformity is then imperative; but among many laborers can only be secured by the adherence of all to rules embracing, as far as possible, the minutest details of the work”

20th Century (or Alphabet soup…)

• ALA’s involvement

• Library of Congress’ involvement

• 1968 OCLC

• 1967, AACR

• 1973 RLIN

• 1974 IFLA, International Standard Bibliographic Description

• 1978, AACR2 (incorporated ISBD)

• 1988, AACR2r

• 2008? RDA

Other tools: Indexes

• An index is a list of terms and, for each term, a list of locations where it can be found. Example, for these slides:
– catalog: 17,18,19,20
– subject: 3,5,15,16,17,18

• They have a crucial role in information retrieval.
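An index of this kind can be built mechanically from the text of each page. The sketch below assumes a simple word split and invented page texts; the function name `build_index` is hypothetical, and real indexing also involves choosing which terms are worth listing.

```python
from collections import defaultdict


def build_index(pages: dict[int, str]) -> dict[str, list[int]]:
    """Map each word to the sorted list of page numbers where it occurs."""
    index = defaultdict(set)
    for number, text in pages.items():
        for word in text.lower().split():
            # Strip surrounding punctuation so "subject." and "subject" match.
            index[word.strip(".,;:()?!\"'")].add(number)
    # Drop the empty key left behind by tokens that were pure punctuation.
    return {term: sorted(locations) for term, locations in index.items() if term}


# Made-up page texts, for illustration only.
pages = {
    1: "Catalogs and subject headings",
    2: "The subject index",
    3: "Catalogs today",
}

index = build_index(pages)
print(index["subject"])
print(index["catalogs"])
```

Looking up a term is then a single dictionary access rather than a scan of every page, which is why indexes matter so much for retrieval.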

Types of indexing

• Precoordinate indexing: an indexer (usually a person) selects all the indexing terms and decides how they are combined.

• Postcoordinate indexing: searchers can combine whatever indexing terms they like. For example, they can ask whether there are slides that have both “subject” and “catalog”.
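The postcoordinate question about slides that have both “subject” and “catalog” amounts to intersecting two lists of locations. The sketch below uses the slide numbers from the index example earlier in these slides; the function name `slides_with_all` is an assumption for this illustration.

```python
# Locations taken from the index example in these slides.
INDEX = {
    "catalog": {17, 18, 19, 20},
    "subject": {3, 5, 15, 16, 17, 18},
}


def slides_with_all(*terms: str) -> list[int]:
    """Return the slides on which every one of the given terms occurs."""
    postings = [INDEX.get(term, set()) for term in terms]
    if not postings:
        return []
    # The searcher, not the indexer, decides which terms to combine.
    return sorted(set.intersection(*postings))


print(slides_with_all("subject", "catalog"))
```

With precoordinate indexing, by contrast, a combined entry such as "subject catalog" would have to be created in advance by the indexer.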

Other tools

• Bibliography: list of materials or items usually restricted in some way, such as by subject, form (e.g. periodicals), or coverage (e.g. items published before 1900).

• Abstract: a brief, objective representation of the contents of a primary document or oral presentation.

What about the Internet???

• Can we really organize it?

• How?

• Current Initiatives—the Dublin Core

• Search Engines, Directories, Hybrids

• Disparate types of information? Sound, graphics, multimedia?

What about the OPAC: Challenges from a changing environment

• An expanding information universe

• OPAC one of many peer resources

• Multiple local collection catalogs: visual materials, GIS, archival collections, social science datasets, plus an OPAC (and lots of little databases)

• Licensed external services proliferating

• Plus internet engines, on-line book stores, etc., etc.

• The role and place of the OPAC is changing dramatically!

• Adaptability and integration becoming critical

OPAC attributes

• Hope that OPACs evolve to enable greater integration with the larger information environment

Better search systems

• Internet and the explosion of digital information generating tremendous research and innovation in search technology
– Faster
– Better results
– Assist the user
– Dealing with large retrieval sets

• Hope that OPAC will profit from Internet search innovation

Invaders in our domain

• Amazon.com: We’ve been hearing: “it’s so much easier to find books in Amazon – I go there first, than to the catalog”

• And now…Search Inside the Book

A Quote

• “On the other hand, there’s Amazon.com. I’m hardly the first to note that Amazon as a catalog or research tool is easier to use and significantly more productive than conventional academic library catalogs.”

• Tim Burke (Swarthmore): Burn the Catalog (2004)

Invaders in our domain

• Google Print and Google Scholar
– library metadata from OCLC and digital libraries
– search contents of e-journals (CrossSearch)
– search contents of books (GooglePrint)

• What next???

• (and there will be something next!!!)

Invaders in our domain

• “Why can’t I find journal articles along with books in the catalog?”

• (That’s what Google is doing…)

• A fear: OPAC increasingly ignored for more appealing and powerful services

• A hope?: Integrate OPAC information with other search services
– (search Google, then find the book location in your library)

An unstable environment

• Many, many more players in the information environment

• Enormous amount of experimentation, creativity

• Technology enables new models, services, and players

• Change enormously rapid

• Google is only 6 years old.

An unstable environment

• A fear: OPAC will stagnate and become irrelevant

• Why stagnation
– OPAC technical platform not flexible, unable to evolve rapidly
– OPAC developments tied to very long development timeframe
– underlying OPAC model 20 years old, interfaces 10 years old…
– ILS vendors turn their attention elsewhere and no longer invest resources in the OPAC

More quotes from Burn the Catalog

• “I think we’d be better off to just utterly erase our existing academic catalogs”

• “lock all the vendors and librarians and scholars together in a room, and make them hammer out electronic research tools that are Amazon-plus”

• (to create) “a catalog that is a partner rather than an obstacle in the making and tracking of knowledge”

Expectations are changing

• 20 years ago the Tim Burkes of the world were wildly enthusiastic about OPACs!

Problems with Existing Catalogs?

• Known item searching works pretty well (sometimes), but …

• Lots of topical searches and poor subject access
– keyword gives too many or too few results – leads to general distrust among users
– authority searching is under-utilized and misunderstood

• Relevance = system sort order

• Impossible to browse the collection

• Unforgiving on spelling errors, stemming

• Response time doesn’t meet expectations of web-savvy users

Features of Next Generation Catalogs

• The new catalog will not contain only bibliographic records. It will have reviews, cover art, citations within works, searchable text, and even commentary by users.

• It will be less work for the users, asking for fewer choices before searching, fewer layers to go through before seeing actual resources.

• The catalog will be able to recommend. It won't just give users a bibliographic record and let them sort it out. Recommendation may come in the form of ranking, or it could give users options based on what is most popular, what is used in current courses, or what items have been chosen as preferred by other users.

Features of Next Generation Catalogs

• It will be interactive. Interactive means that users can make use of information; they can overlay data onto maps and create their own links between resources.

• The catalog will be participatory. Some libraries are already experimenting with providing blogs or wikis, or creating "MY space"-type user profiles. Few today are willing to go so far as to suggest that the user could actually make changes in the library catalog, yet real participation requires this. In a small step in this direction, WorldCat is allowing users to add reviews, but it has a long way to go before it creates the kind of community feeling that readers have about Amazon.

• Reality necessitates that the catalog will be heterogeneous; it will be able to contain more than one kind of data. Not only must the catalog take in data from resources outside of the library, but it also has to be willing to link out to non-library data, such as to Wikipedia or Google Book Search.

Features of Next Generation Catalogs

• State-of-the-art Web interface

• Federated searching

• Enriched content

• Faceted navigation

• Keyword searching
– Did you mean …?
– Spell correction
– Stemming

• Relevancy

• RSS

• Tag clouds

• User contributed content
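A “Did you mean …?” feature can be approximated by comparing a query that finds nothing against the words the catalog already knows. The sketch below uses `difflib.get_close_matches` from the Python standard library; the vocabulary, the cutoff of 0.6, and the function name `did_you_mean` are assumptions for illustration, and production search systems typically use more elaborate methods.

```python
import difflib


def did_you_mean(query: str, vocabulary: list[str]) -> list[str]:
    """Suggest known words that closely resemble a query the catalog lacks."""
    if query in vocabulary:
        return []  # The word is known, so there is nothing to suggest.
    return difflib.get_close_matches(query, vocabulary, n=3, cutoff=0.6)


# A tiny, made-up vocabulary standing in for the words in a catalog.
VOCABULARY = ["catalog", "subject", "thesaurus", "index"]

print(did_you_mean("catlog", VOCABULARY))
```

A catalog that is unforgiving on spelling errors would simply return zero hits for "catlog"; a suggestion step like this gives the user a way forward.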

Questions…

• Does the competition matter?
– Let ’em use Google and Amazon if that suits their needs!

• Even if it matters, do we have the resources to hold our own in this environment?
– Google spent $200M in ’04 in R&D (not including stock options…)
– and expects to increase that by 50% this year

Questions…

• Should we shrink the role of the OPAC?
– locating items, organizing deeply complex parts of the collection
– and shrink the cost of creating it?

• Or separate it from the ILS and “modernize” it using a commercial search engine?
– possibly not “MARC aware”…

Questions…

• Do OPACs need expensive, complex metadata in the world of Amazon (simple metadata) and Google (full text searching)?
– is the world moving towards dumb data, smart engines?

Comment?

• “Bibliographic control is central to librarianship in a realistic and functional manner. It is impossible to imagine anything called ‘librarianship’ without the structure and patterns of thought we find in bibliographic control … it is vital that librarians think logically, understand the ways in which knowledge and information are organized for retrieval, and be able to communicate their knowledge of these structures to the library user.”