Greenstone Building your own collection

Upload: wilfrid-hicks

Post on 17-Jan-2016


Page 1: Greenstone Building your own collection. Overview Installation Usage Building a collection

Greenstone

Building your own collection

• Overview

• Installation

• Usage

• Building a collection

What is Greenstone?

A suite of software which has the ability to serve digital library collections and build new collections.

It provides a new way of organizing information and publishing it on the Internet or on CD-ROM.

Ways to find information

• Searching
  • Ex.: search for particular words in the text (“full-text search”)
  • Indexes are built from different parts of the document

• Browsing
  • Ex.: browse documents by title
  • Involves lists and classification

Metadata

Metadata are descriptive data associated with each document.

For example:

<Metadata name="PictureN">boon.jpg</Metadata>
<Metadata name="Height">137feet</Metadata>
<Metadata name="Date">1852</Metadata>
<Metadata name="State">Maine</Metadata>
<Metadata name="Title">Boon Island Light</Metadata>

• Can be used as a searchable index

• Used to generate the browsing structures (lists or hierarchical structures) through “classifiers”
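To make the idea of name/value metadata concrete, here is a minimal Python sketch that reads the Metadata elements shown above into a dictionary. The wrapping <Description> root element and the function name read_metadata are assumptions made for illustration only; this is not how Greenstone itself processes the file.

```python
import xml.etree.ElementTree as ET

# The Metadata elements from the example, wrapped in a hypothetical
# <Description> root so the fragment parses as a single XML document.
FRAGMENT = """
<Description>
  <Metadata name="PictureN">boon.jpg</Metadata>
  <Metadata name="Height">137feet</Metadata>
  <Metadata name="Date">1852</Metadata>
  <Metadata name="State">Maine</Metadata>
  <Metadata name="Title">Boon Island Light</Metadata>
</Description>
"""


def read_metadata(xml_text):
    """Return a dict mapping each metadata name to its text value."""
    root = ET.fromstring(xml_text)
    return {m.get("name"): (m.text or "").strip() for m in root.iter("Metadata")}


metadata = read_metadata(FRAGMENT)
print(metadata["Title"], "-", metadata["State"], metadata["Date"])
```

A dictionary like this is the kind of data a search index or a classifier could be built from: one entry per metadata name, one value per document.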

Greenstone Document Format

• XML format: Source documents are converted into a standard XML format by “plugins.” Plugins can process plain text, HTML, Word, and PDF documents, and email messages.

• Multimedia documents: Either linked to the textual document or accompanied by textual descriptions.

• Multilanguage documents: Unicode is used to represent the character sets, for consistency.

Why use Greenstone?

• No CGI programming needed
• Built-in server
• A GUI is provided
• Easy to use
• Making a large collection in a short time becomes possible

Installation

• Download from the www.greenstone.org page.

• Platform: Windows or Unix system.

• Local library or web library?
  • The local library has a built-in web server.
  • For the web library, configure the external web server, then point to the URL of Greenstone's library executable, like http://localhost/gsdl/cgi-bin/library.exe

• “Enter Library” or “Restricted Version”?
  • “Restricted Version” is used only when networking software has been installed incorrectly, for example when Windows keeps attempting to dial up your Internet service provider.
  • “Restricted Version” must use a Netscape web browser.

Using Greenstone

• Searching and browsing
  • Punctuation is ignored in search terms
  • Query types: “all” and “some”
  • Icon meanings

• Setting the preferences
  • Case sensitivity, stemming, Boolean queries
  • Change language
  • Change presentation
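The difference between the “all” and “some” query types, and the effect of ignoring punctuation, can be illustrated with a small Python sketch. This is only a toy model under stated assumptions: the functions tokenize and matches are hypothetical, matching is case-insensitive by assumption, and nothing here reflects Greenstone's actual search engine, which also handles stemming and ranking.

```python
import re


def tokenize(text):
    """Lower-case the text and keep only runs of letters and digits,
    so punctuation never becomes part of a term (an assumption of this sketch)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def matches(query, document, query_type="all"):
    """Return True if the document satisfies the query.

    "all":  every query term must occur in the document.
    "some": at least one query term must occur in the document.
    """
    terms = set(tokenize(query))
    words = set(tokenize(document))
    if not terms:
        return False
    if query_type == "all":
        return terms <= words
    if query_type == "some":
        return bool(terms & words)
    raise ValueError("query_type must be 'all' or 'some'")


doc = "Boon Island Light, Maine (built 1852)."
all_hit = matches("island, light!", doc, "all")
all_miss = matches("island florida", doc, "all")
some_hit = matches("island florida", doc, "some")
```

Here the query "island florida" fails as an “all” query because "florida" is absent, but succeeds as a “some” query because "island" is present.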

BUILDING A COLLECTION

Using “the Collector”

• Easy to use

• Builds collections based on an existing collection, with new content

• Not feasible to use the Collector alone to create collections with completely new structures

• Building from the command line is preferable

Step by step instructions

1. Change to the correct directory

> cd "C:\Program Files\gsdl"

2. Invoke setup.bat, which is needed for each new DOS session

> setup.bat

3. Make a collection

> perl -S mkcol.pl -creator [email protected] Lhouses

Lhouses is the collection name. Now you have a new collection directory called Lhouses.

4. Populate the collection

Copy documents into the Lhouses collection’s import directory. This can be done through copy and paste using Windows Explorer. Or, on the command line, type

> cd "%GSDLHOME%\collect\Lhouses"

> xcopy /s document_path\* import

If you have stored all the documents in C:\My Document\LHCollection, then document_path is C:\My Document\LHCollection.

5. Import the collection

> perl -S import.pl Lhouses

6. Edit collect.cfg file

It is the configuration file for the collection, which is in the collection’s etc directory.

• Give the collection a name through collectionmeta collectionname.

• Add a description of your collection through collectionmeta collectionextra "barabara…".

• Add a collection icon through collectionmeta iconcollection "_httpprefix_/collect/Lhouses/images/icon.gif", if the image is in the collection’s images directory.

=> collect.cfg
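As a rough illustration of what the three collectionmeta entries from step 6 might look like in collect.cfg, the following Python sketch renders them as text lines. The helper collectionmeta_lines, the display name and the description are hypothetical; only the icon path comes from the slide. The sketch rejects values containing double quotes instead of guessing at an escaping rule.

```python
def collectionmeta_lines(name, description, icon_url):
    """Build the three collectionmeta lines discussed in step 6."""

    def quoted(value):
        # Refuse embedded double quotes rather than assume an escaping rule.
        if '"' in value:
            raise ValueError("value must not contain double quotes")
        return '"' + value + '"'

    return [
        "collectionmeta collectionname " + quoted(name),
        "collectionmeta collectionextra " + quoted(description),
        "collectionmeta iconcollection " + quoted(icon_url),
    ]


lines = collectionmeta_lines(
    "Lighthouses",                                   # hypothetical display name
    "Photographs and descriptions of lighthouses.",  # hypothetical description
    "_httpprefix_/collect/Lhouses/images/icon.gif",  # icon path from the slide
)
print("\n".join(lines))
```

The printed lines could be pasted into the collection's etc/collect.cfg by hand; the script does not touch any file itself.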

7. Build the collection

> perl -S buildcol.pl Lhouses

8. Make the collection available over the web

Either select the contents of the building directory and drag them into the index directory.

Or, remove the index directory (and all its contents) by typing

rd /s index # on Windows NT/2000

deltree /Y index # on Windows 95/98

and then change the name of the building directory to index with

ren building index

Finally, recreate an empty building directory with

mkdir building
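The three commands of step 8 (remove index, rename building to index, recreate building) can also be sketched in Python. The function install_build and the file names old.idx and new.idx are hypothetical; the demonstration runs on a throwaway temporary directory, not on a real Greenstone collection.

```python
import shutil
import tempfile
from pathlib import Path


def install_build(collection_dir):
    """Replace index/ with the freshly built building/, then recreate building/."""
    collection_dir = Path(collection_dir)
    index = collection_dir / "index"
    building = collection_dir / "building"
    if not building.is_dir():
        raise FileNotFoundError(f"no building directory in {collection_dir}")
    if index.exists():
        shutil.rmtree(index)   # corresponds to: rd /s index
    building.rename(index)     # corresponds to: ren building index
    building.mkdir()           # corresponds to: mkdir building


# Demonstration on a temporary directory with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "Lhouses"
    (demo / "index").mkdir(parents=True)
    (demo / "index" / "old.idx").write_text("stale")
    (demo / "building").mkdir()
    (demo / "building" / "new.idx").write_text("fresh")

    install_build(demo)

    index_files = sorted(p.name for p in (demo / "index").iterdir())
    building_is_empty = not any((demo / "building").iterdir())
```

Checking that building exists before deleting index avoids wiping a working collection when no new build is available.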

Unix commands

cd ~/gsdl # assuming default Greenstone in home directory
source setup.bash # if you're running the BASH shell
source setup.csh # if you're running the C shell
mkcol.pl -creator [email protected] Lhouses
cd $GSDLHOME/collect/Lhouses
cp -r document_path/* import/
import.pl Lhouses
buildcol.pl Lhouses
rm -r index/*
mv building/* index

Import and Build processes

The import process converts documents of various formats into the Greenstone Archive Format. import.pl needs to know which plugins are to be used. Plugins parse the imported documents and extract metadata from them. See ex.

The build process compresses the text, builds full-text indexes according to collect.cfg, and precalculates the appearance of the collection.

Assigning metadata from a file and building search indexes

• Assign metadata from a single file, metadata.xml.

• Make sure the plugin RecPlug is included in collect.cfg and the use_metadata_files option is set.

• Add searching indexes; see collect.cfg.

• Move metadata.xml to the import directory.

• Import the collection again and rebuild.

> perl -S import.pl Lhouses

> perl -S buildcol.pl Lhouses

> rd /s index (or deltree /Y index)

> ren building index

> mkdir building
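For readers who have not seen a metadata.xml file, the Python sketch below generates one possible layout. It assumes the DirectoryMetadata / FileSet / FileName / Description nesting described in the Greenstone Developer's Guide; check that guide for the exact DTD before relying on it. The function build_metadata_xml is hypothetical, and the sample values are taken from the lighthouse example earlier in these slides.

```python
import xml.etree.ElementTree as ET


def build_metadata_xml(entries):
    """entries maps a file-name pattern to a dict of metadata name -> value.

    Assumed layout (verify against the Developer's Guide):
    DirectoryMetadata > FileSet > FileName + Description > Metadata.
    """
    root = ET.Element("DirectoryMetadata")
    for pattern, fields in entries.items():
        fileset = ET.SubElement(root, "FileSet")
        ET.SubElement(fileset, "FileName").text = pattern
        description = ET.SubElement(fileset, "Description")
        for name, value in fields.items():
            ET.SubElement(description, "Metadata", name=name).text = value
    return ET.tostring(root, encoding="unicode")


xml_text = build_metadata_xml({
    "boon.jpg": {"Title": "Boon Island Light", "State": "Maine", "Date": "1852"},
})
print(xml_text)
```

The generated text would be saved as metadata.xml in the import directory before re-running the import and build commands.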

Create Browsing Indexes Through Classifiers

• VList, HList, DateList

• Classifiers take a metadata argument, by which the documents are classified and sorted. See collect.cfg.

• The Hierarchy classifier needs a classification file, which defines the metadata hierarchy. Each entry has three parts: identifier, position in hierarchy, and name of the classification.

For example, subheight.txt and substat.txt.

• The classification files are put into the etc directory. Then rebuild: run perl -S buildcol.pl Lhouses, then rd /s index (or deltree /Y index), then ren building index, and finally mkdir building.
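The three-part entries of a classification file can be pictured with a short Python sketch. The sample entries, the identifiers (ME, FL, ME.north) and the function parse_classification are all hypothetical, and the sketch assumes one entry per line with the name in double quotes; the real subheight.txt and substat.txt files are not reproduced here.

```python
import shlex

# Hypothetical classification entries: identifier, position in hierarchy, name.
SAMPLE = '''
FL 2 "Florida"
ME 1 "Maine"
ME.north 1.1 "Northern Maine"
'''


def parse_classification(text):
    """Parse three-field lines and return entries ordered by hierarchy position."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        parts = shlex.split(line)  # keeps a quoted name as one field
        if len(parts) != 3:
            raise ValueError(f"expected three fields, got: {line!r}")
        identifier, position, name = parts
        entries.append({
            "identifier": identifier,
            "position": tuple(int(n) for n in position.split(".")),
            "name": name,
        })
    return sorted(entries, key=lambda e: e["position"])


hierarchy = parse_classification(SAMPLE)
for entry in hierarchy:
    indent = "  " * (len(entry["position"]) - 1)
    print(indent + entry["name"])
```

Sorting by the position tuple places each child directly after its parent, which is the ordering a hierarchical browsing list needs.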

Formatting Output

• Format the document

• Format the lists produced by classifiers and searches

Add format strings to collect.cfg

Then rebuild.

Another way of assigning metadata

Metadata can also be assigned from a file called index.txt, using the plugin IndexPlug; see collect.cfg.

Put index.txt in the import directory. Modify collect.cfg. Then re-import and rebuild the collection.

References

1. Greenstone Installation Guide

2. Greenstone Users’ Guide

3. Greenstone Developers’ Guide

4. Documentation from the “Light Houses” Group of CPSC 670, Fall 2001