WEB SEARCH CONTEXT MANAGEMENT USING JAVASCRIPT/COOKIE AND JSP/DATABASE TECHNOLOGIES

Hong Yin

Certificate of Approval:

Wen-Chen Hu, Chair
Assistant Professor
Department of Computer Science and Software Engineering

Gerry V. Dozier
Assistant Professor
Department of Computer Science and Software Engineering

Alvin S. Lim
Assistant Professor
Department of Computer Science and Software Engineering

Stephen L. McFarland
Dean, Graduate School

WEB SEARCH CONTEXT MANAGEMENT USING JAVASCRIPT/COOKIE AND

JSP/DATABASE TECHNOLOGIES

Hong Yin

A Technical Report

Submitted to

The Graduate Faculty of

Auburn University

In Partial Fulfillment of the

Requirements for the

Degree of

Master of Computer Science and Software Engineering

Auburn, Alabama

May 11, 2001

Table of Contents

1. Introduction
    1.1 Objectives
    1.2 Project Outline
    1.3 Report Organization

2. Technical Overview
    2.1 JavaScript
    2.2 Cookie Technology
    2.3 Java Server Pages
    2.4 MySQL

3. The JavaScript/Cookie Approach
    3.1 System Structure
    3.2 User Interface and Functionality
    3.3 Implementation

4. Experimental Results of the JavaScript/Cookie Approach
    4.1 Experiment I
    4.2 Experiment II

5. The JSP/Database Approach
    5.1 System Structure
    5.2 User Interface and Functionality
    5.3 Implementation

6. Experimental Results of the JSP/Database Approach
    6.1 Search Results
    6.2 Search-Result Management

7. Conclusions

References

TECHNICAL REPORT ABSTRACT

WEB SEARCH CONTEXT MANAGEMENT USING JAVASCRIPT/COOKIE AND JSP/DATABASE

TECHNOLOGIES

Hong Yin

Master of Computer Science and Software Engineering

62 Typed Pages

Directed by Dr. Wen-Chen Hu

Experienced web users may present complicated behaviors while searching on the web, such as opening

multiple windows for viewing, searching the same keyword with multiple search engines, and gathering

information over many sessions. It is therefore important to keep track of a user's search context

appropriately. In this report, we propose a method to manage a web user’s search context explicitly. Two

approaches are employed to facilitate the management of a web user's search context. One approach uses

JavaScript and cookie technology to save a user's search context as cookies on the client side computer.

Four web search engines (Google, GoTo, Hotbot, and MSN Search) are used to test this JavaScript/Cookie

approach. Search results can be shown in an integrated way after a ranking process. Another approach uses

Java Server Pages (JSP) to save a user's search context into a MySQL database for long-term storage

on the JSP server side. JavaBeans are used for data communication between different JSP programs. A

comparison is made between these two parallel approaches. The two-approach design of this project has

several desirable features. The JavaScript/Cookie approach has the advantage of portability across all major

platforms and browsers, and it needs no server side storage, and no additional client-server communication

overhead. The JSP/Database approach has three main advantages: (i) JSP implementations separate content

generation from presentation; (ii) JSP implementations support a Java programming language-based

scripting language, which provides inherent scalability and support for complex operations; and (iii)

database implementations have a larger storage capacity, thus enabling the storage of more detailed context

information.

Acknowledgements

The author would like to express her sincere gratitude to her major professor Dr. Wen-Chen Hu for his

advice, understanding and encouragement throughout the long course of project development. She also

would like to thank her committee members, Dr. Gerry Dozier and Dr. Alvin Lim, for their valuable time

and for all the knowledge they have given her during her graduate studies. Sincere thanks also go to

WebAppCabaret (http://www.webappcabaret.com) for providing free web application hosting.

Chapter 1

Introduction

The World Wide Web is a global, seamless environment in which all information (text, images, audio,

video, computational services) that is available on the Internet can be accessed in a consistent and simple

way by using a standard set of naming and access conventions. The number of web pages accessible via the

World Wide Web has grown rapidly in recent years. Web users often use various search engines to retrieve

information. However, even with the best search engines, which claim to have indexed millions of Web

pages, most users have problems with finding exactly what they want, unless they spend a lot of time

browsing through the numerous URLs suggested by different search engines.

1.1 Objectives

To obtain the most relevant search results from the Internet, experienced users of web search engines often

adopt the following search strategies [1]:

1. They often try the same query on several search services in parallel.

2. They often search for similar keywords, or for keywords in the same field as their specific interest.

3. They often look at more than one search-result page from each of these search services. When they find fairly useful results, they are often unsure whether to go on searching or whether the current results are the best ones available.

The problem with the above search strategies is that the user accumulates a lot of contextual information

over time and there is no convenient way to record it or make it explicit. Specifically, they need to

memorize the URLs of potentially useful search results as they look for more relevant results. Also they

need to memorize useful queries over a period of time. Both of these problems could be hard to solve. A

potential solution is to save the queries and results to the browser’s bookmark collection. However, this is

not very convenient because: (1) many users do not want to overload their bookmark list with intermediate

search results, wishing to save it for long-term storage of high quality web pages; and (2) as tentative result

pages from similar queries get bookmarked, they might become interleaved and hard to distinguish.

1.2 Project Outline

This project consists of two parallel, independent experiments. The first experiment comprises: (a) a Java Swing graphical user interface (GUI), where users can select search engines, type in search keywords, and submit their queries or view search results; (b) a ranking kernel, where downloaded web documents are integrated, so that URLs referenced by more search engines receive higher ranks in the integrated results page; and (c) a searching kernel, which searches the cookies file in the computer's local storage when the user presses the "View Search Results" button and shows the saved search results in a separate browser window. This approach embeds JavaScript code in the result HTML documents to dynamically set cookies in client-side storage; the cookies contain the relevant search context information, including the query term(s) and the user-preferred URLs.
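
The ranking rule described above (URLs referenced by more search engines receive higher ranks) can be sketched as follows. This is a minimal illustration, not the project's actual ranking kernel: the function name `rankByEngineCount`, the input shape, the placeholder engine names and URLs, and the tie-breaking rule (first-seen order) are all assumptions made for the example.

```javascript
// Hypothetical sketch: merge result lists from several engines and rank
// each URL by the number of engines that returned it.
function rankByEngineCount(resultsByEngine) {
    const counts = new Map(); // URL -> number of engines that listed it
    for (const urls of Object.values(resultsByEngine)) {
        // A Set ensures one engine cannot count the same URL twice.
        for (const url of new Set(urls)) {
            counts.set(url, (counts.get(url) || 0) + 1);
        }
    }
    // Sort by descending count. Array.prototype.sort is stable, so ties
    // keep first-seen order (an assumption, not taken from the report).
    return [...counts.entries()]
        .sort((a, b) => b[1] - a[1])
        .map(([url, engines]) => ({ url, engines }));
}

const ranked = rankByEngineCount({
    engineA: ["http://a.example/", "http://b.example/"],
    engineB: ["http://b.example/", "http://c.example/"],
    engineC: ["http://b.example/", "http://a.example/"],
});
console.log(ranked[0]); // { url: 'http://b.example/', engines: 3 }
```

Counting engines per URL keeps the sketch independent of each engine's own scoring, which matches the rule stated in the text but ignores the position of a URL within each result list.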

The second experiment keeps search context information in a database on the server side for explicit storage and efficient searching. It comprises: (a) a Java Server Page, which performs a function similar to that of the Java Swing GUI in the first experiment; and (b) a second Java Server Page, which first checks whether the underlying database contains record(s) with the same keyword. If such records exist, it lists the saved search results directly. If not, it launches its matching function and displays the search results in a reorganized form. The search context, that is, the queries recently issued by the user along with the hyperlinks of the result pages the user visited and liked in the context of each query, is saved in the database on the server side. The database table has the following attributes: keywords, URL, and date. This approach is implemented using Java Server Pages and the MySQL database.
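
The lookup-then-search behavior of the second page can be sketched as below. The report implements this with JSP and MySQL; this sketch only mirrors the control flow with an in-memory array standing in for the database table. The names `contextTable`, `lookupOrSearch`, `saveContext`, and the `searchFn` callback are hypothetical.

```javascript
// Hypothetical in-memory stand-in for the server-side table whose
// attributes are keywords, URL, and date.
const contextTable = [];

// Return saved rows for a keyword; run the search only on a miss.
// `searchFn` is an assumed callback that returns an array of URLs.
function lookupOrSearch(keywords, searchFn) {
    const saved = contextTable.filter(row => row.keywords === keywords);
    if (saved.length > 0) {
        return { fromDatabase: true, urls: saved.map(row => row.url) };
    }
    return { fromDatabase: false, urls: searchFn(keywords) };
}

// Save a URL the user liked in the context of a query.
function saveContext(keywords, url, date) {
    contextTable.push({ keywords, url, date });
}

const fakeSearch = () => ["http://x.example/", "http://y.example/"];
const first = lookupOrSearch("jsp cookie", fakeSearch);
saveContext("jsp cookie", first.urls[0], "2001-05-11");
const second = lookupOrSearch("jsp cookie", fakeSearch);
console.log(first.fromDatabase, second.fromDatabase); // false true
```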

Each of the two approaches used in this project has several desirable features:

(1) JavaScript/Cookie approach

• Portability across all major platforms and browsers.

• No server side storage.

• No additional client-server communication overhead.

(2) JSP/Database approach

• JSP implementations separate content generation from presentation.

• JSP implementations support a Java programming language-based scripting language,

which provides inherent scalability and support for complex operations.

• Database implementations have a larger storage capacity and are able to store more

detailed context information.

1.3 Report Organization

This report is organized as follows. Chapter 2 provides a highlighted technical overview of JavaScript,

cookie technology, Java Server Pages, and MySQL. The JavaScript/Cookie approach is described in

Chapter 3. Chapter 4 gives the experimental results of the JavaScript/Cookie approach. Chapter 5 presents

the JSP/Database approach and Chapter 6 shows its experimental results. Finally in Chapter 7, a

comparison of the two approaches will be given and conclusions will be drawn.

Chapter 2

Technical Overview

Numerous technologies are used in this project. This chapter introduces the four major techniques used: (1)

JavaScript, (2) cookie technology, (3) JSP, and (4) MySQL database.

2.1 JavaScript

JavaScript is a compact, object-based scripting language for developing client and server Internet

applications. JavaScript statements can be embedded directly in an HTML page. These statements can

recognize and respond to user events such as mouse clicks, form input, and page navigation.

There are two types of JavaScript [2]:

• Navigator JavaScript, also called client-side JavaScript

• LiveWire JavaScript, also called server-side JavaScript

Since this project uses client-side JavaScript, we will focus on client-side JavaScript only. For example, a JavaScript function can be written to verify that users enter valid information into a form requesting a telephone number or zip code. Without any network transmission, the HTML page with embedded JavaScript can check the entered data and alert the user with a dialog box if the input is invalid [3]. Below, we briefly discuss some important features of JavaScript that are used in this project.
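
As an illustration of the client-side validation mentioned above, the following sketch checks a zip code and a telephone number without any network round trip. The accepted formats (a 5-digit US zip code with an optional 4-digit extension, and a 10-digit US telephone number with optional separators) and the function names are assumptions for the example; in a browser these functions would typically be called from a form's event handler.

```javascript
// Assumed format: 5 digits, optionally followed by "-" and 4 digits.
function isValidZip(text) {
    return /^\d{5}(-\d{4})?$/.test(text.trim());
}

// Assumed format: 10 digits with optional parentheses and separators.
function isValidPhone(text) {
    return /^\(?\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}$/.test(text.trim());
}

console.log(isValidZip("12345"));            // true
console.log(isValidZip("1234"));             // false
console.log(isValidPhone("(555) 555-0100")); // true
console.log(isValidPhone("555-0100"));       // false
```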

(1) Scripting event handlers

JavaScript applications in the Navigator are largely event-driven. Events are actions that occur,

usually as a result of something the user does. For example, clicking a button is an event, as is changing a

text field or moving the mouse over a hyperlink. It is possible to define event handlers, such as onChange

and onClick, to make the script react to events.

(2) Using cookies with JavaScript

Cookies are a mechanism for storing persistent data concerning the client in a file called cookies.txt or

cookies. Details about cookies will be discussed in a later section of this chapter. To use cookies with

JavaScript, two functions are performed:

• Set a cookie value, optionally specifying an expiration date.

• Get a cookie value, given the cookie name.

It is convenient to define functions to perform these tasks. Here, for example, is a function that sets

cookie values and expiration:

// Sets a cookie value. The expiration date is optional.
function setCookie(name, value, expire) {
    document.cookie = name + "=" + escape(value)
        + ((expire == null) ? "" : ("; expires=" + expire.toGMTString()));
}

Notice the use of escape to encode special characters (semicolons, commas, spaces) in the value

string. This function assumes that cookie names do not have any special characters. The following function

returns a cookie value, given the name of the cookie:

// Returns the value of the named cookie, or null if it is not set.
function getCookie(name) {
    var search = name + "=";
    var cookies = document.cookie;
    if (cookies.length > 0) {                        // if there are any cookies
        var offset = cookies.indexOf(search);
        if (offset != -1) {                          // if cookie exists
            offset += search.length;                 // set index of beginning of value
            var end = cookies.indexOf(";", offset);  // set index of end of cookie value
            if (end == -1)
                end = cookies.length;
            return unescape(cookies.substring(offset, end));
        }
    }
    return null;                                     // no cookie with this name
}

Notice the use of unescape to decode special characters in the cookie value.
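
One caveat of a plain `indexOf` lookup is that a search for "name=" also matches inside "username=". A minimal sketch of a stricter lookup is shown below. It is an illustration, not the project's code: it takes the cookie string as a parameter so that it runs outside a browser, the function names are hypothetical, and it pairs `encodeURIComponent` with `decodeURIComponent` instead of `escape` and `unescape`.

```javascript
// Parse a cookie string of the form "a=1; b=2" into exact name/value
// pairs, so a lookup for "name" cannot match "username" by accident.
function parseCookies(cookieString) {
    const jar = {};
    for (const pair of cookieString.split(";")) {
        const eq = pair.indexOf("=");
        if (eq === -1) continue; // skip malformed fragments
        const name = pair.slice(0, eq).trim();
        jar[name] = decodeURIComponent(pair.slice(eq + 1).trim());
    }
    return jar;
}

// Return the value of the named cookie, or null if it is absent.
function getCookieFrom(cookieString, name) {
    const jar = parseCookies(cookieString);
    return Object.prototype.hasOwnProperty.call(jar, name) ? jar[name] : null;
}

const sample = "username=hong; name=" + encodeURIComponent("web search; jsp");
console.log(getCookieFrom(sample, "name"));    // web search; jsp
console.log(getCookieFrom(sample, "missing")); // null
```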

2.2 Cookie Technology

A cookie is a small piece of information stored on the client machine in the cookies file. Cookies can be

manipulated

• Explicitly, with a CGI program.

• Programmatically, with client-side JavaScript using the cookie property of the document object.

This will be the way we handle cookies in this project.

• Transparently, with the LiveWire client objects, when using client-cookie maintenance.

Syntax

A CGI program uses the following syntax to add cookie information to the HTTP header:

Set-Cookie:

name=value

[;EXPIRES=dateValue]

[;DOMAIN=domainName]

[;PATH=pathName]

[;SECURE]

Parameters

• name=value is a sequence of characters excluding semi-colons, commas and white space.

To place restricted characters in the name or value, use an encoding method such as

URL-style %XX encoding.

• EXPIRES=dateValue specifies a date string that defines the valid life time of that cookie.

Once the expiration date has been reached, the cookie will no longer be stored or given

out. If dateValue is not specified, the cookie expires when the user's session ends.

The date string is formatted as:

Wdy, DD-Mon-YY HH:MM:SS GMT

where Wdy is the day of the week (for example, Mon or Tue); DD is a two-digit

representation of the day of the month; Mon is a three-letter abbreviation for the month

(for example, Jan or Feb); YY is the last two digits of the year; HH:MM:SS are hours,

minutes, and seconds, respectively.

• DOMAIN=domainName specifies the domain attributes for a valid cookie. If no value is

specified for domainName, Navigator uses the host name of the server that generated the

cookie response.

• PATH=pathName specifies the path attributes for a valid cookie. If no value is specified

for pathName, Navigator uses the path of the document that created the cookie property

(or the path of the document described by the HTTP header, for CGI programming).

• SECURE specifies that the cookie is transmitted only if the communications channel with

the host is secure. Only HTTPS (HTTP over SSL) servers are currently secure. If

SECURE is not specified, the cookie may be sent over any channel.
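
The header syntax and date format described above can be assembled as in the following sketch. It is a hedged illustration only: `formatExpires` and `buildSetCookie` are hypothetical helper names, the options object is an assumed convenience, and the value is encoded with `encodeURIComponent` as one possible URL-style %XX encoding.

```javascript
const DAYS = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"];
const MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
                "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"];

function pad2(n) { return String(n).padStart(2, "0"); }

// Format a Date as "Wdy, DD-Mon-YY HH:MM:SS GMT".
function formatExpires(date) {
    return DAYS[date.getUTCDay()] + ", " +
        pad2(date.getUTCDate()) + "-" + MONTHS[date.getUTCMonth()] + "-" +
        pad2(date.getUTCFullYear() % 100) + " " +
        pad2(date.getUTCHours()) + ":" + pad2(date.getUTCMinutes()) + ":" +
        pad2(date.getUTCSeconds()) + " GMT";
}

// Build a Set-Cookie header line; everything except name/value is optional.
function buildSetCookie(name, value, options = {}) {
    let header = "Set-Cookie: " + name + "=" + encodeURIComponent(value);
    if (options.expires) header += "; EXPIRES=" + formatExpires(options.expires);
    if (options.domain) header += "; DOMAIN=" + options.domain;
    if (options.path) header += "; PATH=" + options.path;
    if (options.secure) header += "; SECURE";
    return header;
}

const headerLine = buildSetCookie("query", "web search", {
    expires: new Date(Date.UTC(2001, 4, 11, 12, 0, 0)), // 11 May 2001, 12:00 GMT
    path: "/",
});
console.log(headerLine);
// Set-Cookie: query=web%20search; EXPIRES=Fri, 11-May-01 12:00:00 GMT; PATH=/
```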

Description

A server sends cookie information to the client in the HTTP header when the server responds to a

request. Included in that information is a description of the range of URLs for which it is valid. Any future

HTTP requests made by the client which fall in that range will include a transmittal of the current value of

the state object from the client back to the server.

Many different application types can take advantage of cookies. For example, a shopping

application can store information about the currently selected items for use in the current session or a future

session, and other applications can store individual user preferences on the client machine [3].

Determining a valid cookie

When searching the cookie list for valid cookies, a comparison of the domain attributes of the

cookie will be made with the domain name of the host from which the URL is retrieved. If the domain

attribute matches the end of the fully qualified domain name of the host, then path matching is performed to

determine if the cookie should be sent. For example, a domain attribute of "auburn.edu" would match any host name that ends in "auburn.edu".

Only hosts within the specified domain can set a cookie for a domain. In addition, domain names must use at least two periods. Any domain in the "COM," "EDU," "NET," "ORG," "GOV," "MIL," and "INT" categories requires only two periods; all other domains require at least three periods.

PATH=pathName specifies the URLs in a domain for which the cookie is valid. If a cookie has already

passed domain matching, then the pathname component of the URL is compared with the path attribute,

and if there is a match, the cookie is considered valid and is sent along with the URL request. For example,

PATH=/foo matches "/foobar" and "/foo/bar.html". The path "/" is the most general path.
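
The two matching steps described above (tail matching on the domain, then prefix matching on the path) can be sketched as follows. The function names and the host name "www.auburn.edu" are assumptions for illustration; the sketch performs the literal tail and prefix matches the text describes and nothing more.

```javascript
// Tail-match the cookie's domain attribute against the request host.
function domainMatches(host, domainAttr) {
    return host.toLowerCase().endsWith(domainAttr.toLowerCase());
}

// Prefix-match the cookie's path attribute against the request path.
function pathMatches(requestPath, pathAttr) {
    return requestPath.startsWith(pathAttr);
}

// A cookie is valid for a request only if both matches succeed.
function cookieApplies(cookie, host, requestPath) {
    return domainMatches(host, cookie.domain) &&
        pathMatches(requestPath, cookie.path);
}

const cookie = { name: "query", domain: "auburn.edu", path: "/foo" };
console.log(cookieApplies(cookie, "www.auburn.edu", "/foobar"));       // true
console.log(cookieApplies(cookie, "www.auburn.edu", "/foo/bar.html")); // true
console.log(cookieApplies(cookie, "www.auburn.edu", "/bar"));          // false
console.log(cookieApplies(cookie, "www.example.com", "/foo"));         // false
```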

Syntax of the cookie HTTP request header

When requesting a URL from an HTTP server, the browser matches the URL against all existing

cookies. When a cookie matches the URL request, a line containing the name/value pairs of all matching

cookies is included in the HTTP request in the following format:

Cookie: NAME1=OPAQUE_STRING1; NAME2=OPAQUE_STRING2 ...

Saving cookies

A single server response can issue multiple Set-Cookie headers. Saving a cookie with the same

PATH and NAME values as an existing cookie overwrites the existing cookie. Saving a cookie with the

same PATH value but a different NAME value adds an additional cookie.

The EXPIRES value indicates when to purge the mapping. Navigator will also delete a cookie

before its expiration date arrives if the number of cookies exceeds its internal limits.

A cookie with a higher-level PATH value does not override a more specific PATH value. If there

are multiple matches with separate paths, all the matching cookies are sent.

A CGI script can delete a cookie by returning a cookie with the same PATH and NAME values,

and an EXPIRES value which is in the past. Because the PATH and NAME must match exactly, it is

difficult for scripts other than the originator of a cookie to delete a cookie.

Specifications for the client

When sending cookies to a server, all cookies with a more specific path mapping are sent before

cookies with less specific path mappings. For example, a cookie "name1=foo" with a path mapping of "/"

would be sent after a cookie "name1=foo2" with a path mapping of "/bar" if they are both to be sent.
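
The ordering rule and the request-header format can be combined in a short sketch. It assumes the cookies passed in have already passed domain and path matching, and it uses the length of the path string as a stand-in for "more specific", which is an assumption of this example; `buildCookieHeader` is a hypothetical name.

```javascript
// Build the Cookie request header from cookies that already passed
// domain and path matching. Longer (more specific) paths are sent first.
function buildCookieHeader(matchingCookies) {
    const ordered = [...matchingCookies]
        .sort((a, b) => b.path.length - a.path.length);
    return "Cookie: " + ordered.map(c => c.name + "=" + c.value).join("; ");
}

const requestHeader = buildCookieHeader([
    { name: "name1", value: "foo", path: "/" },
    { name: "name1", value: "foo2", path: "/bar" },
]);
console.log(requestHeader); // Cookie: name1=foo2; name1=foo
```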

The Navigator can receive and store the following:

• A total of 300 cookies.

• 4 kilobytes per cookie, where the name and the OPAQUE_STRING combine to form the 4

kilobyte limit.

• 20 cookies per server or domain. Completely specified hosts and domains are considered separate

entities, and each has a 20 cookie limitation.

When the 300-cookie limit or the 20 cookies per server limit is exceeded, Navigator deletes the least

recently used cookie. When a cookie larger than 4 kilobytes is encountered the cookie is trimmed to fit, but

the name should remain intact as long as it is less than 4 kilobytes.
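
The storage limits above can be pictured with a small cookie jar that evicts the least recently used cookie when a limit is exceeded. This is a simplified sketch under stated assumptions: the `CookieJar` class is hypothetical, cookies are keyed by domain and name only (paths are ignored), and the 4-kilobyte trimming rule is not modeled.

```javascript
// Hypothetical cookie jar enforcing a total limit and a per-domain limit.
class CookieJar {
    constructor(maxTotal = 300, maxPerDomain = 20) {
        this.maxTotal = maxTotal;
        this.maxPerDomain = maxPerDomain;
        this.cookies = []; // ordered from least to most recently used
    }

    set(domain, name, value) {
        // Replace an existing cookie with the same domain and name.
        this.cookies = this.cookies.filter(
            c => !(c.domain === domain && c.name === name));
        this.cookies.push({ domain, name, value });
        // Evict the least recently used cookie of this domain if needed.
        const sameDomain = this.cookies.filter(c => c.domain === domain);
        if (sameDomain.length > this.maxPerDomain) {
            this.cookies.splice(this.cookies.indexOf(sameDomain[0]), 1);
        }
        // Evict the least recently used cookie overall if needed.
        if (this.cookies.length > this.maxTotal) {
            this.cookies.shift();
        }
    }

    get(domain, name) {
        const i = this.cookies.findIndex(
            c => c.domain === domain && c.name === name);
        if (i === -1) return null;
        const [cookie] = this.cookies.splice(i, 1);
        this.cookies.push(cookie); // mark as most recently used
        return cookie.value;
    }
}

const jar = new CookieJar(3, 2); // tiny limits so eviction is visible
jar.set("a.example", "q1", "x");
jar.set("a.example", "q2", "y");
jar.get("a.example", "q1");      // q1 becomes most recently used
jar.set("a.example", "q3", "z"); // evicts q2, the least recently used
console.log(jar.cookies.map(c => c.name)); // [ 'q1', 'q3' ]
```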

2.3 Java Server Pages

Java Server Pages (JSP) allows its users to mix regular, static HTML with dynamically-generated HTML.

JSP is an important component of Java 2 Enterprise Edition (J2EE, http://java.sun.com/j2ee). J2EE is the

official Java framework for enterprise application development, and has the backing of Sun, IBM, Oracle,

BEA, the Apache Group and a host of other application server vendors.

J2EE provides: (a) a platform for enterprise applications, with full API support for enterprise code

and guarantees of portability between server implementations; and (b) a clear division between code which

deals with presentations, business logic, and data. J2EE consists of the following APIs, ordered roughly by

where in a three-tier application they are used:

• Java Server Pages

• Servlets

• Java support for XML

• Enterprise Java Beans (EJBs)

• Java Messaging

• Java Transaction support

• Java Mail

• Java Naming and Directory Interface (JNDI)

• JDBC

• Java support for CORBA

The Java Server Pages 1.1 specification provides web developers with a framework to create

dynamic content on the server side using HTML and XML templates, and Java code, which is more secure,

fast, and independent of server platform.

How JSP works

We will first discuss how a JSP file, Java classes, the web server, and the JSP engine interact, treating the

JSP engine as a black box. Then we will discuss the JSP engine in more detail.

Figure 2.1 describes the interaction between the client and the Java components (Java classes): requests cascade down from the client through the web server, the JSP engine, and the JSP file to the Java components, and responses return along the same path.

Figure 2.1 JSP program diagram

The diagram starts with the client, a web browser, requesting a JSP file. The web server receives the request and, recognizing that it is a JSP file (based on its file extension), asks the JSP engine to process it. Given the JSP file name, the JSP engine creates HTML code for the client to view by processing the JSP file and its associated tags.

The JSP engine is an add-on to the Servlet engine and is itself treated as a Servlet. When the server requests that a JSP file be processed, it calls this Servlet. The JSP engine Servlet then acquires the Java classes and the requested JSP file and creates a Servlet from them. It then runs the Servlet and returns the HTML code to the server, which sends it to the client. Creating a Servlet for every request would be quite time consuming, so the JSP engine applies some simple logic before it sends a response to the server.

Figure 2.2 shows the JSP engine logic. In essence, the JSP file will only be compiled into a Servlet

on the first request after the JSP file has been altered. As a result, the graphics designer is completely

independent of the compilation process as well as the dynamic generation code.
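
The engine logic described above (compile only on the first request after the JSP file has been altered) amounts to a cache keyed by modification time. The sketch below illustrates that idea in JavaScript rather than in the Java of a real JSP engine; `servletCache`, `compile`, `handleRequest`, and the page object are all hypothetical stand-ins.

```javascript
// Hypothetical sketch: recompile a page only when its source has changed
// since the cached servlet was built.
const servletCache = new Map(); // page name -> { compiledAt, servlet }
let compileCount = 0;

// `compile` stands in for the real JSP-to-Servlet translation step.
function compile(page) {
    compileCount += 1;
    const source = page.source; // capture the source at compile time
    return () => "<html>" + source + "</html>";
}

function handleRequest(page) {
    const cached = servletCache.get(page.name);
    if (!cached || cached.compiledAt < page.modifiedAt) {
        servletCache.set(page.name, {
            compiledAt: page.modifiedAt,
            servlet: compile(page),
        });
    }
    return servletCache.get(page.name).servlet();
}

const page = { name: "hello.jsp", source: "Hello", modifiedAt: 1 };
handleRequest(page);       // first request: compiles
handleRequest(page);       // unchanged: reuses the cached servlet
page.source = "Hello again";
page.modifiedAt = 2;       // file altered
handleRequest(page);       // recompiles
console.log(compileCount); // 2
```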

Figure 2.2 JSP engine logic

Advantages of JSP

JSP can be directly compared to the other major techniques which are available for the creation of dynamic

web pages:

(1) Active Server Pages (ASP). ASP is a similar technology from Microsoft. The advantages of JSP are

twofold. First, the dynamic part is written in Java, rather than Visual Basic or another MS-specific

language, so it is more powerful and easier to use. Second, it is portable and can thus be used on other

operating systems and non-Microsoft Web servers.

(2) Pure Servlets. In principle, everything JSP does can also be done using a Servlet. However, it is more

convenient to write (and to modify) regular HTML than to have a large number of println statements that

generate the HTML. Plus, by separating the look from the content, different people can work on

different tasks: Web page design experts can build the HTML, leaving places for the servlet

programmers to insert the dynamic content.

(3) Server-Side Includes (SSI). SSI is a widely supported technology for including externally defined

pieces into a static Web page. However, JSP is easier to use because it lets you use servlets instead of a

separate program to generate the dynamic parts of the page. Besides, SSI was designed for simple

inclusions, and it is not straightforward to use it for "real" programs that use form data, make database

connections, and the like.

The Syntax of JSP

JSP is still evolving. A descriptive listing of the current JSP syntax standard includes:

• Declaration: This is used to declare variables or methods.

Syntax: <%! declaration %>

• Expression: This defines a scripting language expression and casts the result as a String.

Syntax: <%= expression %>

• Scriptlet: This is used to handle declarations, expressions, or any other type of code fragment valid in the page scripting language.

Syntax: <% code fragment %>

Example:

<body>
    <%! String name; %>
    <% name = request.getParameter("name");
       if ( name == null )
           name = "World"; %>
    <H1>Hello, <%= name %>. </H1>
</body>

• HTML Comment: This adds a comment that is sent to the client and can be viewed in the page source.

Syntax: <!-- comment <%= expression %> -->

Example:

<!-- this is just a HTML comment -->
<!-- This page was loaded on <%= (new java.util.Date()).toLocaleString() %> -->

View Source:

<!-- this is just a HTML comment -->
<!-- This page was loaded on 01-Jan-00 3:50:12 PM -->

• Hidden Comment: This is similar to the HTML comment, except that the comment is not sent to the client and therefore does not appear in the page source.

Syntax: <%-- comment --%>

• Include Directive: This inserts a file of text or code into a JSP file when the file is compiled.

Syntax: <%@ include file="relativeURL" %>

Example:

main.jsp:

<html>
<body>
Current date and time is:
<%@ include file="date.jsp" %>
</body>
</html>

date.jsp:

<%@ page import="java.util.*" %>
<%= (new java.util.Date()).toLocaleString() %>

Output:

Current date and time is:
05-Mar-01 4:56:50 PM

• <jsp:forward>: This forwards a request to another file (HTML, JSP, or Servlet) for processing.

Syntax: <jsp:forward page="relativeURL" />

• <jsp:include>: This includes either a static or dynamic file in a JSP file.

Syntax: <jsp:include page="relativeURL" />

• <jsp:useBean>: This locates or instantiates a JavaBean component.

Syntax:

<jsp:useBean id="beanInstanceName"
    scope="page | request | session | application"
    class="package.class" />

Example:

<jsp:useBean id="calendar" scope="page" class="employee.Calendar" />

• <jsp:setProperty>: This is used to set the value of one or more properties in a Bean, using the Bean's setter methods.

Syntax:

<jsp:setProperty name="beanInstanceName"
    {
      property="*" |
      property="propertyName" |
      property="propertyName" value="{string | <%= expression %>}"
    }
/>

Example:

<jsp:useBean id="calendar" scope="page" class="employee.Calendar" />
<jsp:setProperty name="calendar" property="username" value="Steve" />

• <jsp:getProperty>: This is used to obtain a Bean property value, using the Bean's getter methods.

Syntax: <jsp:getProperty name="beanInstanceName" property="propertyName" />

Example:

<jsp:useBean id="calendar" scope="page" class="employee.Calendar" />
<H1> Calendar of <jsp:getProperty name="calendar" property="username" /> </H1>

2.4 MySQL

MySQL is a very fast, multi-threaded, multi-user, and robust SQL (Structured Query Language) database

server that is provided by MySQL AB. MySQL was originally developed to handle very large databases

and has been successfully used in highly demanding production environments for several years. Though

under constant development, its connectivity, speed, and security make MySQL highly suited for accessing

databases on the Internet. The following list describes some of its important characteristics:

• Fully multi-threaded using kernel threads. This means it can easily use multiple CPUs if

available.

• C, C++, Eiffel, Java, Perl, PHP, Python and Tcl APIs.

• Works on many different platforms.

• Many column types: signed/unsigned integers 1, 2, 3, 4, and 8 bytes long, FLOAT, DOUBLE,

CHAR, VARCHAR, TEXT, BLOB, DATE, TIME, DATETIME, TIMESTAMP, YEAR, SET,

and ENUM types.

• Very fast joins using an optimized one-sweep multi-join.

• Full operator and function support in the SELECT and WHERE parts of queries. For example:

mysql> SELECT CONCAT(first_name, " ", last_name) FROM tbl_name

WHERE income/dependents > 10000 AND age > 30;

• SQL functions are implemented through a highly optimized class library and are very fast.

Usually there is no memory allocation after the query initialization.

• Full support for SQL GROUP BY and ORDER BY clauses. Support for group functions

(COUNT(), COUNT(DISTINCT ...), AVG(), STD(), SUM(), MAX() and MIN()).

• Support for LEFT OUTER JOIN and RIGHT OUTER JOIN with ANSI SQL and ODBC

syntax.

• Tables from different databases can be combined in the same query.

• The privilege and password system is very flexible and secure, and allows host-based

verification. Passwords are secure because all password traffic is encrypted when connected to

a server.

• ODBC (Open-DataBase-Connectivity) support for Win32 (with source). All ODBC 2.5

functions are available, along with many others. For example, MS Access can be used to

connect to the user’s MySQL server.

• Very fast B-tree disk tables with index compression.

• Up to 32 indexes per table are allowed. Each index may consist of 1 to 16 columns or parts of

columns. The maximum index length is 500 bytes, although this may be changed when

compiling MySQL. An index may use a prefix of a CHAR or VARCHAR field.

• Fixed-length and variable-length records.

• In-memory hash tables which are used as temporary tables.

• Handles large databases with up to 50,000,000 records.

• All columns have default values. INSERT can be used to insert a subset of a table's columns;

any columns that are not explicitly given values are set to their default values.

• Uses GNU Automake, Autoconf, and Libtool for portability.

• Written in C and C++, and tested with a broad range of different compilers.

• A very fast thread-based memory allocation system.

• No memory leaks. MySQL has been tested with Purify, a commercial memory leakage

detector.

• Includes myisamchk, a very fast utility for table checking, optimization, and repair. All of the

functionality of myisamchk is also available through the SQL interface.

• Full support for several different character sets, including ISO-8859-1 (Latin1), big5, ujis, and

more.

• All data are saved in the chosen character set. All comparisons for normal string columns are

case insensitive.

• Sorting is done according to the chosen character set, using the Swedish set as the default. It is

possible to change this when the MySQL server is started up. MySQL supports many different

character sets that can be specified at compile and run time.

• Aliases on tables and columns are allowed, as in the SQL92 standard.

• DELETE, INSERT, REPLACE, and UPDATE return the number of rows that were changed

(affected). It is possible to return the number of rows matched instead by setting a flag when

connecting to the server.

• Function names do not clash with table or column names. For example, ABS is a valid column

name. The only restriction is that for a function call, no spaces are allowed between the function name and the '(' that follows it.

• All MySQL programs can be invoked with the --help or -? options to obtain online assistance.

• The server can provide error messages to clients in many languages.

• Clients may connect to the MySQL server using TCP/IP Sockets, Unix Sockets (Unixes), or

Named Pipes (NT).


Chapter 3

The JavaScript/Cookie Approach

As described in Chapter 2, two independent technologies are applied in this project. This chapter gives

details of the JavaScript/Cookie approach.

3.1 System Structure

Figure 3.1 shows an architectural diagram of the JavaScript/Cookie approach.

Figure 3.1 Architecture of JavaScript/Cookie approach

[Diagram components: User; Java Swing GUI Query Interface; Commercial Web Search Engine; Ranking Kernel; Result Interface; Cookie File. Data-flow labels: Query; Formatted Query; Downloaded Pages; Ranked Pages; User-Selected Pages.]


The JavaScript/Cookie approach works as follows. The user submits a query from a Java Swing GUI. The query is formatted according to the search engines selected, the keyword entered, and the number of results requested, and is then sent to the corresponding search engines, such as Google [4], Goto [5], HotBot [6], and MSN Search [7]. The search engines return their results pages. If the user chooses to view the results in separate windows, the results from each search engine are displayed individually. Alternatively, if the user wants the results from multiple search engines to be displayed in an integrated way, the ranking kernel takes the results of the multiple search engines as input, calculates the rank of each result, and outputs the combined results in descending order of each result URL’s rank. In either case, the results pages display the title and URL of each result item. When the user finds a result item worth keeping, it can be stored as a cookie by checking the checkbox associated with that item. The new cookie is then saved in the cookie file on the local computer.

3.2 User Interface and Functionality

A stand-alone Java application is implemented to generate a GUI for selecting search engines, submitting a

query, selecting the number of results to be shown each time and executing the program for downloading

results, ranking, displaying the reorganized results and/or searching previous queries from the cookie file.

3.2.1 GUI Components and Layout

Figure 3.2 shows the Java Swing look-and-feel interface. JFC’s Swing components are used for the GUI

design. The components used in the GUI include JPanel, JFrame, JLabel, JCheckBox, JRadioButton,

JButton, and JTextField. We chose the BoxLayout manager to produce the desired layout effects. The four

JCheckBox objects represent four commercial search engines – Google, Goto, HotBot, and MSN Search.

The JTextField object is used to type in the query term. The three JRadioButton objects allow the user to

choose the number of results to be displayed for each result page. There are four JButton objects in the

interface – “Submit Query,” “View Separate Results,” “View Integrated Results,” and “View Search

Results.”


Figure 3.2 Swing interface for JavaScript/Cookie approach

3.2.2 GUI Event Handling

By adding ActionListener to each of the Swing components in the GUI, the event handler will respond

differently according to the source of each action event.

• Submit Query

Once the program receives an event issued by the button “Submit Query,” the event handler will

check which search engine(s) have been selected, extract the query term from the key word text

field and the number of results the user wants to see each time, then construct a query according to

the predefined format unique to each search engine. The query is then sent to the selected search

engines and the results pages downloaded.

• View Separate Results

This button enables results from different search engines to be displayed individually in separate

browsers.

• View Integrated Results


Search results from different search engines will be collected and sent to the ranking software if

this button is selected. After ranking, the integrated results will be displayed to the user in the

descending order of result ranks.

• View Search Results

Pressing this button causes the searching software to take the query term as its input parameter and search the cookie file for cookies whose names contain the query term (the construction of cookies is described in Section 3.3.7). The search results are then displayed.

3.3 Implementation

Detailed implementation is described in the following eight subsections: (1) Extraction of query terms, (2)

Querying the existing web search engines, (3) Downloading results pages, (4) Parsing the search results, (5)

Ranking kernel, (6) Displaying the results, (7) Cookie setup, (8) Searching kernel.

3.3.1 Extraction of Query Terms

In the GUI's Java application, a class called “NewPanel” is defined to implement the DocumentListener

interface, which is designed to get the query string from the JTextField. The DocumentListener interface

contains three methods:

• void changedUpdate(DocumentEvent)

Called when the style of some of the text in the listened-to document changes. This sort of event is

fired only from a StyledDocument -- a PlainDocument does not fire these events.

• void insertUpdate(DocumentEvent)

Called when text is inserted into the listened-to document.

• void removeUpdate(DocumentEvent)

Called when text is removed from the listened-to document.


Each document event method has a single parameter: an instance of a class that implements the

DocumentEvent interface. To get the document that fired the event, DocumentEvent's getDocument

method can be used.
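The listener mechanism described above can be sketched as follows. This is a minimal illustration, not the report's NewPanel class: the class name QueryTracker and the helper simulateTyping are hypothetical, and a PlainDocument stands in for the document behind the JTextField.

```java
import javax.swing.event.DocumentEvent;
import javax.swing.event.DocumentListener;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.PlainDocument;

// Hypothetical stand-in for the listener logic of the report's NewPanel class.
class QueryTracker implements DocumentListener {
    private String query = "";

    // Re-read the whole document text whenever it changes.
    private void refresh(DocumentEvent e) {
        Document doc = e.getDocument();
        try {
            query = doc.getText(0, doc.getLength());
        } catch (BadLocationException ex) {
            query = "";
        }
    }

    @Override public void insertUpdate(DocumentEvent e) { refresh(e); }
    @Override public void removeUpdate(DocumentEvent e) { refresh(e); }
    // A PlainDocument does not fire style-change events, so nothing to do here.
    @Override public void changedUpdate(DocumentEvent e) { }

    String getQuery() { return query; }

    // Demo helper: simulate typing into a plain document and return what the listener saw.
    static String simulateTyping(String text) {
        QueryTracker tracker = new QueryTracker();
        PlainDocument doc = new PlainDocument();
        doc.addDocumentListener(tracker);
        try {
            doc.insertString(0, text, null);
        } catch (BadLocationException ex) {
            throw new IllegalStateException(ex);
        }
        return tracker.getQuery();
    }
}
```

In a Swing GUI, the same listener would be attached with textField.getDocument().addDocumentListener(tracker), so that getQuery() always reflects the current contents of the text field.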

3.3.2 Querying the Existing Web Search Engines

After receiving information from the GUI, queries will be formatted and sent to the corresponding web

search engines. For most commercial search engines, including the four search engines chosen for our

system, once the query term is submitted, a formatted query statement is constructed and sent through CGI.

The results pages whose URLs are built with embedded query terms are then returned. Usually, each web

search engine has its own format for the URL of the returned page. Therefore, the querying term can be

embedded in the predefined format of the individual search engine. For example, the format of a Google

query could be obtained through the observation of Google's URLs for returning pages after a query. If the

query has more than one word, they are usually connected with a "+" sign.

On Google's search page, for the query "computer" and selecting 50 results to be displayed per

page, the URL of the returning page that displays the results is

http://www.google.com/search?as_q=computer&num=50&btnG=Google+Search&as_epq=&as_oq=&as_eq=&as_occt=any&lr=&as_dt=i&as_sitesearch=&safe=active

For another example, using the advanced search page of MSN Search, again using the query

"computer" and selecting 50 results per page, the URL of the returning page will be

http://search.msn.com/results.asp?q=computer&FORM=SMCA&cfg=SMCINK&v=1&ba=0&f=any&co=50&RS=CHECKED&sort=&rgn=&lng=&dom=&depth=&d0=&d1=&cf=

Using the common URL formats for different commercial web search engines, we can obtain

pages for any query strings just by embedding the query term in the URL format without manually

submitting the query through each individual web search engine's web user interface.
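As an illustration of embedding a query term in a predefined URL format, the following sketch builds shortened versions of the two URLs quoted above. The class QueryUrlBuilder and its methods are hypothetical, and the templates keep only the query and result-count parameters; this is an assumption made for brevity, since the real engines accept more parameters and their formats change over time.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; the parameter names are taken from the URLs quoted above.
final class QueryUrlBuilder {
    // URL-encode the query; spaces between words become "+" signs.
    static String encode(String query) {
        return URLEncoder.encode(query.trim(), StandardCharsets.UTF_8);
    }

    // Shortened Google format: as_q carries the query, num the results per page.
    static String google(String query, int resultsPerPage) {
        return "http://www.google.com/search?as_q=" + encode(query) + "&num=" + resultsPerPage;
    }

    // Shortened MSN Search format: q carries the query, co the results per page.
    static String msn(String query, int resultsPerPage) {
        return "http://search.msn.com/results.asp?q=" + encode(query) + "&co=" + resultsPerPage;
    }
}
```

For the two-word query "computational chemistry" with 50 results per page, google(...) yields http://www.google.com/search?as_q=computational+chemistry&num=50, which matches the "+" convention noted above.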

3.3.3 Downloading Results Pages


The next step is to download the HTML source file of the results page to a local file for further processing.

This is achieved by Lynx, a text-based browser for the World Wide Web. It will display Hypertext Markup

Language (HTML) documents containing links to files on the local system, as well as files on remote

systems running http, gopher, ftp, or finger servers. Lynx can be used to access information on the World

Wide Web, or to build information systems intended primarily for local access. Our system uses Lynx

running on Unix. The syntax of Lynx is:

lynx [options] [path or URL]

For example, the -dump option writes the formatted output of the default document to standard output, which can be redirected to a file on the command line. Therefore, by calling

lynx -dump <URL> > <filename>

we can download the formatted text of the results page returned by the search engine for a pre-formatted query and dump the text to a specified local file.
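The report does not show how the Java application invokes Lynx. One possible way, sketched here under the assumption that a lynx executable is on the PATH, is to start it as an external process and redirect its standard output to the target file. The class LynxDump and its methods are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical wrapper around the command line shown above.
final class LynxDump {
    // Equivalent of the arguments in: lynx -dump <URL>
    static List<String> command(String url) {
        return Arrays.asList("lynx", "-dump", url);
    }

    // Runs lynx and redirects its standard output to the given file,
    // mirroring the shell redirection "> <filename>". Returns the exit code.
    static int dumpToFile(String url, File target) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command(url));
        pb.redirectOutput(target);
        pb.redirectError(ProcessBuilder.Redirect.INHERIT);
        return pb.start().waitFor();
    }
}
```

Passing the URL as a separate argument, rather than building one shell string, avoids problems with characters such as "&" that appear in the formatted query URLs.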

3.3.4 Parsing the Search Results

Each search engine has a different format for the results text. Besides the target results, the text usually

contains much other information that will be discarded. Thus the downloaded text should be parsed. In

this system, four programs (ParseGoogle.java, ParseGoto.java, ParseHotbot.java, ParseMsn.java) were

developed to specifically parse the documents downloaded from the four search engines, according to their

unique formats. After parsing, only the title and URL for each result item will be extracted and saved. Here

a wrapper class called “ResultSet” was defined that has three data members – title, url, and rank (which is

initialized to be 0). Corresponding access methods, helper methods such as toString() and print(), are also

defined in the class “ResultSet”. The extracted information will then be stored in an array of ResultSet

objects.

3.3.5 Ranking Kernel

The purpose of the ranking kernel is to take the collections of ResultSet objects from the different search

engines and calculate the rank of each ResultSet object using a simple ranking algorithm [8]. The basic

idea for the proposed ranking is that the more often a ResultSet object’s url is referred to by the selected


search engines, the higher rank it will receive. For example, when the query term “Java” is submitted, all

four of the search engines show

http://www.javaworld.com

as one of their result items. Therefore, the ResultSet object whose url is “http://www.javaworld.com” will

receive the highest rank. In this system, since there are at most four search engines, the highest rank should

be 4. After calculating the rank for each object in the collections of ResultSet arrays, the ranking kernel will

proceed to merge several arrays into one and eliminate the duplicates. Then it will sort and rearrange the

merged array in descending order of rank. To do this in a convenient way, the ResultSet class is allowed to

implement the Comparable interface by defining the compareTo(Object a) method in the ResultSet class.

When the current ResultSet object (“this”) calls the compareTo((ResultSet) X) method, if the rank of the current ResultSet object is larger than that of X, the method will return a positive number, otherwise it will return a negative number. In addition, the class called “GenericSorts” has been defined that contains a static method “selectionSort(Comparable[] a, int length)” to perform sorting using a selection sort algorithm.

After the selection sort, the integrated array of ResultSet objects will be rearranged in descending order of

rank.

Therefore if the user chooses to view integrated results at a certain number of results per page, say

20 results per page, the integrated results page will display the first 20 ResultSet objects including their

titles, URLs and ranks since they are supposed to rank as the top 20 results in the whole array.
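The ranking procedure described in this section can be sketched as follows. The classes RankedResult and RankingKernel are hypothetical stand-ins for the report's ResultSet and GenericSorts classes, whose source is not shown; the sketch assumes that URLs are compared as exact strings and that each engine counts at most once per URL.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the report's ResultSet wrapper class.
final class RankedResult implements Comparable<RankedResult> {
    final String title;
    final String url;
    int rank = 0;

    RankedResult(String title, String url) { this.title = title; this.url = url; }

    // Positive when this result has the larger rank.
    @Override public int compareTo(RankedResult other) {
        return Integer.compare(rank, other.rank);
    }
}

final class RankingKernel {
    // Merge per-engine lists, count how many engines returned each URL,
    // drop duplicates, and order the merged array by descending rank.
    static RankedResult[] merge(List<List<RankedResult>> perEngine) {
        Map<String, RankedResult> byUrl = new LinkedHashMap<>();
        for (List<RankedResult> engineResults : perEngine) {
            // Count each URL at most once per engine.
            Map<String, RankedResult> seen = new LinkedHashMap<>();
            for (RankedResult r : engineResults) seen.putIfAbsent(r.url, r);
            for (RankedResult r : seen.values()) {
                RankedResult merged = byUrl.computeIfAbsent(r.url, u -> new RankedResult(r.title, u));
                merged.rank++;
            }
        }
        RankedResult[] all = byUrl.values().toArray(new RankedResult[0]);
        selectionSortDescending(all, all.length);
        return all;
    }

    // Selection sort: repeatedly move the largest remaining element to the front.
    static <T extends Comparable<T>> void selectionSortDescending(T[] a, int length) {
        for (int i = 0; i < length - 1; i++) {
            int max = i;
            for (int j = i + 1; j < length; j++) {
                if (a[j].compareTo(a[max]) > 0) max = j;
            }
            T tmp = a[i]; a[i] = a[max]; a[max] = tmp;
        }
    }
}
```

With four engines selected, a URL returned by all of them ends up with rank 4 and is placed first. Because the comparison is on exact strings, variants such as a trailing slash would be counted as different URLs unless they are normalized beforehand.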

3.3.6 Displaying the Results

After the results documents are downloaded and parsed into arrays of ResultSet objects, the event handler in the Swing interface checks how the user has chosen to display the results: “View Separate Results,” “View Integrated Results,” or both. If “View Separate Results” is selected, the event handler launches new web browsers to display the results of each search engine individually. For example, to display the search results of Google, the following Java system call is made:

Runtime.getRuntime().exec("netscape http://www.eng.auburn.edu/users/yinhong/project/displayGoogle.html");


In displayGoogle.html, the array of ResultSet objects is listed in a form, with a checkbox associated with each record (title, url). If the user is interested in a particular URL and checks the corresponding checkbox, a new cookie is generated and stored in the cookie file. The formation of the cookie’s (key, value) pair is described in Section 3.3.7.

If the user chooses to view the integrated results, as mentioned above, all the results will first be

ranked, sorted and then displayed. The integrated results are displayed in the same way as the separate

results.

3.3.7 Cookie Setup

In this JavaScript/Cookie approach, JavaScript is used to set up cookies to keep track of the user’s search

context. The cookie is saved in the cookie file as a (key, value) pair in order to memorize some important

characteristics of the user search context, such as keywords and URLs. The key and value of cookies are

defined using the following special format:

Key: search engine name + “@” + query term + “@” + certain sequence number associated

with that URL

Value: the specific URL user selected from the search results

With this design, a unique (key, value) pair is generated for each item the user selects from the search results on the results page. The newly generated cookie is then saved into the cookie file in the local computer’s storage by the JavaScript SetCookie() function:

function SetCookie (name, value) {
  var argv = SetCookie.arguments;
  var argc = SetCookie.arguments.length;
  // Default lifetime: ten days from now.
  var expDays = 10;
  var exp = new Date();
  exp.setTime(exp.getTime() + (expDays*24*60*60*1000));
  // Optional arguments 3 to 6: expires, path, domain, secure.
  // An explicit expiry date, if given, overrides the ten-day default.
  var expires = (argc > 2 && argv[2] != null) ? argv[2] : exp;
  var path = (argc > 3) ? argv[3] : null;
  var domain = (argc > 4) ? argv[4] : 'eng.auburn.edu';
  var secure = (argc > 5) ? argv[5] : false;
  document.cookie = name + "=" + value +
    "; expires=" + expires.toGMTString() +
    ((path == null) ? "" : ("; path=" + path)) +
    ((domain == null) ? "" : ("; domain=" + domain)) +
    ((secure == true) ? "; secure" : "");
}


In the results page, once the user has clicked the checkbox associated with a result URL, the SetCookie(name, value) function enclosed in the <script>..</script> tags is called, and a new cookie is created and saved in local storage. The lifetime of the cookie is set to ten days, after which it no longer exists in the cookie file. Setting an expiration time keeps the information gathered in the user search context relatively fresh and up to date.
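The key format and the ten-day lifetime described in this section can be illustrated with a short sketch. The class CookieKey is hypothetical and is written in Java only for consistency with the rest of the system; in the report itself the key is assembled inside the JavaScript of the results page.

```java
// Hypothetical model of the cookie key: engine + "@" + query + "@" + sequence number.
final class CookieKey {
    static final String SEP = "@";
    static final long LIFETIME_MILLIS = 10L * 24 * 60 * 60 * 1000; // ten days

    final String engine;
    final String query;
    final int sequence;

    CookieKey(String engine, String query, int sequence) {
        this.engine = engine;
        this.query = query;
        this.sequence = sequence;
    }

    // Build the key string stored as the cookie name.
    String format() {
        return engine + SEP + query + SEP + sequence;
    }

    // Split a key back into its three parts, using the first and last separator
    // so that a query containing "@" is still recovered intact.
    static CookieKey parse(String key) {
        int first = key.indexOf(SEP);
        int last = key.lastIndexOf(SEP);
        if (first < 0 || first == last) {
            throw new IllegalArgumentException("not a search-context key: " + key);
        }
        return new CookieKey(key.substring(0, first),
                             key.substring(first + 1, last),
                             Integer.parseInt(key.substring(last + 1)));
    }

    // Expiry instant corresponding to the ten-day lifetime.
    static long expiresAt(long nowMillis) {
        return nowMillis + LIFETIME_MILLIS;
    }
}
```

For example, the third result selected from Google for the query "java" would be stored under the key Google@java@3, with the selected URL as its value.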

3.3.8 Searching Kernel

Note that in the Swing interface of the JavaScript/Cookie approach, there is a “View Search Results” button. This button calls a static method ToView(String keyword), which is defined in another program, “View.java.” In this method, the JavaScript function GetCookie(name) is called with the keyword the user just typed in as its parameter. It searches the cookie file on the local computer, checking each cookie to see whether the cookie’s name contains that keyword. Therefore, after View.ToView(name) is called, a new browser window opens with the search results grouped under another component of the cookie name, the search engine name.

function getCookieVal (offset) {
  var endstr = document.cookie.indexOf (";", offset);
  if (endstr == -1)
    endstr = document.cookie.length;
  return document.cookie.substring(offset, endstr);
}

function GetCookie (name) {
  var arg = name + "=";
  var alen = arg.length;
  var clen = document.cookie.length;
  var i = 0;
  while (i < clen) {
    var j = i + alen;
    if (document.cookie.substring(i, j) == arg)
      return getCookieVal (j);
    i = document.cookie.indexOf(" ", i) + 1;
    if (i == 0)
      break;
  }
  return null;
}
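GetCookie as listed returns the value of the cookie whose name matches its argument exactly, whereas the text describes a search for cookie names that contain the keyword. The following sketch shows what such a substring search could look like over a cookie string of the form "name=value; name2=value2". The class CookieSearch is hypothetical and is not taken from View.java.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical substring search over a document.cookie-style string.
final class CookieSearch {
    // Keep the (name, value) pairs whose name contains the keyword.
    static Map<String, String> findByKeyword(String cookieString, String keyword) {
        Map<String, String> hits = new LinkedHashMap<>();
        for (String pair : cookieString.split(";")) {
            int eq = pair.indexOf('=');
            if (eq < 0) continue; // skip malformed or empty fragments
            String name = pair.substring(0, eq).trim();
            String value = pair.substring(eq + 1).trim();
            if (name.contains(keyword)) hits.put(name, value);
        }
        return hits;
    }
}
```

Because the engine name is the first component of each key, the returned names can then be grouped by engine when the stored results are displayed.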


Chapter 4

Experimental Results of the JavaScript/Cookie Approach

This chapter provides sample results from some experiments. The experiments were conducted on a Sun

workstation running Solaris 2.6.

4.1 Experiment I

In this experiment, we used the one-word query term “Java,” selected all four of the search engines available, and set the number of results to be displayed to 20. Figure 4.1 shows the query interface.

Figure 4.1 Swing interface for the query “Java”

4.1.1 Separate Search Results

Figures 4.2 through 4.9 show the separate display results for the four search engines for the search keyword

“Java.” As we can see from the pictures below, a checkbox is associated with each record, which makes

marking of records and creating cookies more convenient.


Figure 4.2 Search results of the query “Java” from Google

Figure 4.3 Search results of the query “Java” from Google (continued)


Figure 4.4 Search results of the query “Java” from GoTo

Figure 4.5 Search results of the query “Java” from GoTo (continued)


Figure 4.6 Search results of the query “Java” from Hotbot

Figure 4.7 Search results of the query “Java” from Hotbot (continued)


Figure 4.8 Search results of the query “Java” from MSN search

Figure 4.9 Search results of the query “Java” from MSN search (continued)


4.1.2 Integrated Search Results

If the user chooses to view integrated results from the selected multiple search engines, the ranking kernel

will open the local files that contain the downloaded documents, parse them separately, run the ranking

software and finally display the results after ranking. In the test, again using “Java” as the query term and

setting number of results to be displayed each time as 20, after ranking, the result page is captured as

follows.

Figure 4.10 Integrated results of the query “Java” after ranking


Figure 4.11 Integrated results of the query “Java” after ranking (continued)

From Figures 4.10 and 4.11, it can be observed that http://www.javaworld.com/ has the highest rank value

4 and hence is displayed in the first place in the integrated results page. Another URL

http://www.java.sun.com/, has three search engines that refer to it and the rank value is therefore 3.


4.1.3 Stored Search Results

If the user wants to check whether the same query has been sent, and cookies placed in the cookie file, within the past ten days, he or she can click the “View Search Results” button in the GUI. The event handler then invokes the searching kernel, which is mainly composed of the GetCookie(name) function, to perform this check. Figure 4.12 shows the search results for the query “Java.”

Figure 4.12 Search results of the query “Java”

Note that even if the same query has been sent before, if that was more than ten days previously, the cookie will have been deleted automatically from the cookie file.


4.2 Experiment II

According to Search Engine Watch [9], multiple-keyword queries are becoming more popular because a

multiple-keyword query will usually return a higher number of more relevant pages. Since web users want

to get more accurate results, they usually tend to enter multiple-keyword queries to narrow down the results

obtained. The purpose of this experiment is therefore to determine the relevancy of the search results from

the selected search engines for multiple-keyword queries.

In the second experiment, we used the two-word query term “Computational Chemistry,” selected all four of the search engines provided, and set the number of results to be displayed to 20. Since the process for “View Separate Results” for this two-word query is very similar to that for a one-word query, the individual results pages from each search engine are not shown. Figure 4.13 shows the Swing interface for the query “computational chemistry.”

Figure 4.13 Swing interface for the query “computational chemistry”


4.2.1 Integrated Search Results

Figures 4.14 – 4.15 show the top 20 ranked pages for the query “Computational Chemistry.”

Figure 4.14 Integrated results for the query “computational chemistry”


Figure 4.15 Integrated results of the query “computational chemistry” (continued)

As shown in the results page, the top four URLs all have the highest rank, and results 5 through 13 have a rank of 3.

From the results above, it can be observed that multiple-keyword queries do return a higher number of

more relevant results.


4.2.2 Stored Search Results

Figure 4.16 shows the search results for the query “Computational Chemistry” stored in the cookie file.

Figure 4.16 Search results of query “Computational Chemistry”


4.2.3 Actual Cookie File

Figure 4.17 shows a snapshot of the actual cookie file. The cookies with the specific format of (key, value)

store the web user’s search context.

Figure 4.17 Snapshot of the cookie file


Chapter 5

The JSP/Database Approach

This chapter describes the other independent approach to manage a web user’s search context--the

JSP/Database approach.

5.1 System Structure

Figure 5.1 shows the architecture of the JSP/Database approach. This system works as follows. A query is

submitted by the user from a Java Server Page GUI query interface. After extracting the search keywords

from the GUI text field, the system will first check with the underlying database to see if the database keeps

a record of the user search context of these keywords. If there is no related record found in the database, a

new search will be initialized according to the search engine selected as well as the search keyword

entered. If the records are found in the database, the results will be shown to the user and no new search

will be initiated.
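The database-first lookup described above can be sketched as follows. This is an illustration only: the class SearchContextLookup is hypothetical, an in-memory map stands in for the MySQL table, and the live search is passed in as a function so that the control flow can be shown without network access.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical model of the "check the database first" workflow.
final class SearchContextLookup {
    // Stand-in for the database table: keyword -> stored URLs.
    private final Map<String, List<String>> stored = new HashMap<>();
    // Number of times a new web search had to be started.
    int liveSearches = 0;

    // Record a URL the user selected for a keyword.
    void remember(String keyword, String url) {
        stored.computeIfAbsent(keyword, k -> new ArrayList<>()).add(url);
    }

    // Return stored results when they exist; otherwise start a new search.
    List<String> resolve(String keyword, Function<String, List<String>> liveSearch) {
        List<String> hit = stored.get(keyword);
        if (hit != null && !hit.isEmpty()) {
            return hit; // records found: no new search is initiated
        }
        liveSearches++;
        return liveSearch.apply(keyword);
    }
}
```

The design choice illustrated here is that a stored search context short-circuits the web search entirely, which saves a round trip to the search engine at the cost of possibly showing older results.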


Figure 5.1 Architecture of JSP/Database approach

[Diagram components: Java Server Page Query Interface; MySQL Database; Commercial Web Search Engines; Filtering Kernel; Result Interface; Java Server Page Result Pages. Data-flow labels: Query; Query Results; Formatted Query; Downloaded Pages; Filtered Pages; User-Selected Pages. Decision: does the database contain recent user search context for these keywords? (Y/N)]


5.2 User Interface and Functionality

Figure 5.2 shows the user interface implemented using Java Server Pages. This interface looks similar to the interface for the JavaScript/Cookie approach. The difference is that the interface of the JavaScript/Cookie approach is a stand-alone Java program that mainly uses the Java 2 platform's Swing package: once a JButton object such as “Submit Query” is clicked, the corresponding ActionListener catches the ActionEvent and responds appropriately. The interface of the JSP/Database approach, on the other hand, is a JSP program written mainly in HTML. Programs are activated from myproj.jsp, which receives two parameters from its GUI: a query term and the search engine(s) selected.

<form method=post action = "step0.jsp">

Once the “submit” button inside the form is clicked, another program step0.jsp will be called.

Here Netscape [10], Yahoo [11], AltaVista [12] and AskJeeves [13] were chosen as the search engines. The

execution of the whole approach will be described in more detail in the following sections.

Figure 5.2 GUI for JSP/Database approach


5.3 Implementation

5.3.1 Execution Path of the JSP/Database Approach

Figure 5.3 shows the execution path of the JSP/Database approach.

Figure 5.3 Execution path of JSP/Database approach

5.3.2 Database Searches

The main function of this step is to get the user input from the system interface (the query term and the names of the search engines selected) and to perform database searches. JavaBeans are used to keep parameters accessible to the different programs.

<jsp:useBean id="bean2" scope="session" class="myexample.NameHandler1" />

<jsp:getProperty name="bean2" property="searchengine" />

<jsp:getProperty name="bean2" property="keyword" />

<jsp:useBean id="bean3" scope="session" class="myexample.ArrayHandler" />

<jsp:useBean id="bean4" scope="session" class="myexample.FavorHandler" />

[Figure 5.3 diagram boxes, in execution order: Web interface (myproj.jsp); searching in the database, then displaying results or forwarding queries to the next program (step0.jsp); downloading results pages and fitting filtered results into a new form (step1.jsp); inserting user-selected results into the database and displaying database contents (step2.jsp).]

We use jsp:getProperty to get the values of bean properties and jsp:setProperty to assign values to them. As an example, the following code fragment sets the searchengine and keyword properties of bean2 from the local variables engine and key.

<%

if ( request.getParameter("submit") != null ) {

key = request.getParameter("keyword");

engine = request.getParameter("searchengine");

bean2.setSearchengine(engine);

bean2.setKeyword(key);

} // end of if

%>

Here request is an implicit object. In a JSP page it implements javax.servlet.http.HttpServletRequest, which extends the javax.servlet.ServletRequest interface. The ServletRequest interface declares the methods that provide client request information to a servlet. The getParameter() method returns a String object containing the value of the specified parameter, or null if the parameter does not exist.

Once the query term is set, the following part of the program searches the database to see if there

are any historical results (with the same search keyword) already stored in the database. The database used

for the project is the MySQL database provided by WebAppCabaret [3].

The database connection expression is as follows:

Class.forName("org.gjt.mm.mysql.Driver");
Connection conn = DriverManager.getConnection("jdbc:mysql://10.0.0.1/yinhong?user=yinhong&password=yh");

If the query returns any rows, the program displays the stored results without forming a new query or connecting to the web, so the query is not sent to the corresponding commercial web search engine. If no record with the same keyword is found in the database, the program redirects execution to another program, step1.jsp. To forward the client request to another URL, whether it be an HTML file or a servlet, the following syntax is used:

<jsp:forward page="step1.jsp" />

5.3.3 Search-Result Display

In step1.jsp, the query is formatted and sent to the selected search engines, and the results are then collected, filtered, and fit into a new form. As before, each result URL is associated with a checkbox in the form. When the user sees a result URL worth keeping, he or she can click the associated checkbox. The associated URL, together with the search keyword, is then inserted into the database table, which is referred to as “bookmark.” At the time of insertion, the program gets the system time and stores this long value in the bookmark table as well.

Note that lynx is used to download the documents of results pages in the JavaScript/Cookie

approach. Here lynx is not supported, therefore Java’s URL class and its methods must be used to

download the HTML source files of results pages to local files.

Downloading the Result Pages

The following Java statements are embedded in the HTML-based file:

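URL url = new URL(s);
InputStream in = url.openStream();
File file = new File("searchfile");
FileOutputStream out = new FileOutputStream(file);
byte[] buffer = new byte[4096];  // buffer size is an assumption; the original declaration is not shown
int bytes_read;
while ((bytes_read = in.read(buffer)) != -1)
    out.write(buffer, 0, bytes_read);
out.close();
in.close();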

where s is a String type that represents the formatted query string ready to be submitted to an individual

web search engine. By using the above Java statements, the raw results page returned by the search engine

for a given query term can be downloaded to a local file called “searchfile.” Then this local file is

processed by the filtering kernel to filter out useful information.
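The download step can also be written so that the streams are always closed, even when an error occurs. The following is a sketch under that assumption, using try-with-resources from later Java versions than the one available to the original project; the class PageDownloader and its method names are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for saving a results page to a local file.
final class PageDownloader {
    // Copy all bytes from in to out; returns the number of bytes copied.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[4096];
        long total = 0;
        int bytesRead;
        while ((bytesRead = in.read(buffer)) != -1) {
            out.write(buffer, 0, bytesRead);
            total += bytesRead;
        }
        return total;
    }

    // Open the formatted query URL and stream the raw page into the target file.
    static long download(String formattedQueryUrl, Path target) throws IOException {
        try (InputStream in = URI.create(formattedQueryUrl).toURL().openStream();
             OutputStream out = Files.newOutputStream(target)) {
            return copy(in, out);
        }
    }
}
```

Separating copy from download keeps the byte-copying loop testable without a network connection.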


Filtering Kernel

The collection of downloaded pages is stored in a local file. If this file is examined, it can be seen that the

file contains not only the desired information closely related to the query term, but also a great deal of

unrelated information such as tags and various advertisements. The functionality of the filtering kernel is

that it will take the query term and the local file containing the raw results pages as its input, and filter out

the unrelated contents inside the raw results file. It also filters out some other embedded tags such as

<form>, </form>, etc. A web page will then be constructed to display the filtered results. Each result item

is associated with a checkbox, and when a user clicks on the checkbox, the corresponding result item will

be saved in a predefined database table.
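The report does not list the source of the filtering kernel. As a rough sketch of the idea, the following code extracts (title, URL) pairs from anchor tags in the raw page, strips any tags nested inside the title, and keeps only the links whose title mentions the query term. The class ResultFilter, the regular expressions, and the relevance rule are all assumptions made for illustration; a real results page would need an engine-specific parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, simplified filtering kernel.
final class ResultFilter {
    // Matches <a ... href="http...">title</a>, capturing the URL and the title.
    private static final Pattern ANCHOR = Pattern.compile(
        "<a\\s+[^>]*href=\"(http[^\"]+)\"[^>]*>(.*?)</a>",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    // Matches any remaining tag, such as <b> or </b>, inside a title.
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    // Returns {title, url} pairs for absolute links whose title mentions the query term.
    static List<String[]> filter(String rawHtml, String queryTerm) {
        List<String[]> kept = new ArrayList<>();
        String needle = queryTerm.toLowerCase();
        Matcher m = ANCHOR.matcher(rawHtml);
        while (m.find()) {
            String url = m.group(1);
            String title = TAG.matcher(m.group(2)).replaceAll("").trim();
            if (title.toLowerCase().contains(needle)) {
                kept.add(new String[] { title, url });
            }
        }
        return kept;
    }
}
```

Everything outside the matched anchors, including <form> tags and advertisement links whose titles do not mention the query term, is dropped by construction.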

5.3.4 Saving the Selected Results

step2.jsp provides the insertion and selection operation of the JDBC part of the JSP/Database approach.

First it will check in the form created in step1.jsp to see which checkboxes have been selected. For those

checkboxes chosen, it will insert the corresponding keyword, URL of the result page, and the current

system time in milliseconds into the database. The following code fragment implements this:

Statement stmt = conn.createStatement();
long curr_time1 = System.currentTimeMillis();
for (int i = 0; i < bCount; i++) {
    // Only checkboxes that were ticked arrive as request parameters.
    if (request.getParameter("favorite" + i) != null) {
        try {
            stmt.executeUpdate("insert into bookmark values('" + keyword + "', '" + cArray[i] + "', " + curr_time1 + ")");
        }
        catch (SQLException e) {
            out.println("An SQLException has occurred!");
        }
    }
}
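Because the fragment above concatenates the keyword and URL directly into the SQL text, a value containing a quote character would break the statement. A common alternative, sketched here and not part of the original program, is a parameterized statement. The class BookmarkWriter is hypothetical, and the column order (keyword, URL, time) is assumed from the insert statement above.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

// Hypothetical parameterized version of the bookmark insertion.
final class BookmarkWriter {
    // Positional insert; column order assumed to be keyword, url, time in milliseconds.
    static final String INSERT_SQL = "insert into bookmark values (?, ?, ?)";

    // Inserts one row per selected URL and returns the number of rows inserted.
    static int save(Connection conn, String keyword, List<String> selectedUrls, long nowMillis)
            throws SQLException {
        int inserted = 0;
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            for (String url : selectedUrls) {
                ps.setString(1, keyword);
                ps.setString(2, url);
                ps.setLong(3, nowMillis);
                inserted += ps.executeUpdate();
            }
        }
        return inserted;
    }
}
```

The driver then handles quoting of the keyword and URL values, so the statement text stays the same for every row.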

In the Java language, the System class provides several methods offering miscellaneous functionality, including getting the current time. The currentTimeMillis method returns the current time in milliseconds since 00:00:00 UTC, January 1, 1970. It is commonly used during performance tests: get the current time, perform the operation, then get the current time again. The difference between the two time samples is the amount of time the operation took.

When selecting all records in the database, the currentTimeMillis method can be called again to get the

current time. The difference between the current time and the time stored in the database, therefore, should

be the duration time for which this record has existed in the database. Figure 5.4 shows a schema of the

database table.

Figure 5.4 The schema of the database table bookmark
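The age calculation described above amounts to a single subtraction. The following sketch adds a freshness test on top of it; the class BookmarkAge is hypothetical, and the ten-day threshold is an assumption borrowed from the cookie lifetime of the JavaScript/Cookie approach, since the report does not state a threshold for the database.

```java
// Hypothetical helper for deciding whether a stored bookmark is still recent.
final class BookmarkAge {
    // Assumed freshness window: ten days, in milliseconds.
    static final long TEN_DAYS_MILLIS = 10L * 24 * 60 * 60 * 1000;

    // How long the record has existed, given its stored insertion time.
    static long ageMillis(long storedMillis, long nowMillis) {
        return nowMillis - storedMillis;
    }

    // True when the record is no older than the assumed freshness window.
    static boolean isRecent(long storedMillis, long nowMillis) {
        return ageMillis(storedMillis, nowMillis) <= TEN_DAYS_MILLIS;
    }
}
```

In use, nowMillis would come from System.currentTimeMillis() at query time and storedMillis from the time column of the bookmark table.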


Chapter 6

Experimental Results of the JSP/Database Approach

In this experiment, we used WebAppCabaret (http://www.webappcabaret.com/) [14] as the JSP/Servlet

engine and database provider. WebAppCabaret is a free web application hosting and collaboration service

that enables the users to run and share web applications. The Servlet/JSP Engine of WebAppCabaret is a

Lightweight Servlet/JSP WEB Server designed for supporting multiple hosts (contexts) with each host and

context having the capability to be insulated from each other via the Security Manager. The WEB Server

engine supports the HTTP1.1 protocol. The servlet engine (NGASI) implements the Java Servlet 2.2 API.

The JSP engine (NGASI) supports the JSP 1.1 specification. The Database supported by

www.webappcabaret.com is MySQL database. The driver for MySQL database is

“org.gjt.mm.mysql.Driver,” which is already set in the user’s classpath. The database Grants are SELECT,

INSERT, UPDATE, DELETE, CREATE, ALTER and DROP. INDEX and LOAD are not allowed.

6.1 Search Results

The following figures show the results pages when “java” is used as the query term and Netscape is chosen as the search engine. Figure 6.1 shows the web interface for the query “java.”

Figure 6.1 Web interface for the query “java”


Once the query has been submitted, the underlying program step0.jsp connects to the database and sends a query to check whether the bookmark table contains tuples with similar keywords. If it does, the query results are returned to the user. For example, for the query term “java,” there are records already in existence in the bookmark table, so the tuples whose keyword attribute is “java” are displayed. Figure 6.2 shows the search results of the query “java.”

Figure 6.2 Search results from the bookmark table for the query “java”

If, on the other hand, the database does not contain any records with a similar keyword, step0.jsp forwards the query to step1.jsp. Step1.jsp then forms a query in a predefined format, sends it to the selected search engine, downloads and filters the results, and finally displays the results in a reorganized form. The example here is the query “internet,” with Yahoo! selected as the search engine. Figure 6.3 shows the web interface for the query “internet.” Figure 6.4 and Figure 6.5 show the search results from Yahoo! for the query “internet.”
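The decision made by step0.jsp can be sketched as below. This is a simplified stand-in, not the project's JSP code: an in-memory map replaces the bookmark table, the class Step0Dispatch and its methods are hypothetical, and "similar keyword" is assumed here to mean a case-insensitive substring match.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class Step0Dispatch {
    // Stand-in for the bookmark table: keyword -> stored URLs.
    private final Map<String, List<String>> bookmarks = new LinkedHashMap<>();

    /** Adds one keyword/URL pair to the stand-in table. */
    void store(String keyword, String url) {
        bookmarks.computeIfAbsent(keyword.toLowerCase(), k -> new ArrayList<>()).add(url);
    }

    /** Returns the stored URLs whose keyword contains the query term. */
    List<String> lookup(String query) {
        String q = query.toLowerCase();
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : bookmarks.entrySet()) {
            if (entry.getKey().contains(q)) {
                hits.addAll(entry.getValue());
            }
        }
        return hits;
    }

    /** Page that handles the query: step0.jsp shows stored hits, otherwise step1.jsp searches. */
    String nextPage(String query) {
        return lookup(query).isEmpty() ? "step1.jsp" : "step0.jsp";
    }
}
```

In the two examples from this section, a stored keyword “java” would be answered from the table, while “internet,” having no stored match, would be forwarded to step1.jsp.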

Figure 6.3 Web interface for the query “internet”

Figure 6.4 Search results from Yahoo! for the query “internet”

Figure 6.5 Search results from Yahoo! for the query “internet” (continued)

6.2 Search-Result Management

After the results page has been displayed, step2.jsp collects the records whose checkboxes are checked and stores the corresponding keywords and URLs in the database. Figure 6.6 shows a snapshot of all the records in the bookmark table after sequential execution of step1.jsp and step2.jsp.

Figure 6.6 The bookmark table contents
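The collection step performed by step2.jsp can be sketched as follows. The sketch assumes that each checkbox carries its result URL as its value, so that the servlet API call request.getParameterValues returns the checked URLs (or null when nothing is checked). The class Step2Collect, the Bookmark type, and its field names are hypothetical, and the actual database INSERT is omitted.

```java
import java.util.ArrayList;
import java.util.List;

final class Step2Collect {
    /** One row destined for the bookmark table (field names are illustrative). */
    static final class Bookmark {
        final String keyword;
        final String url;
        final long storedMillis;

        Bookmark(String keyword, String url, long storedMillis) {
            this.keyword = keyword;
            this.url = url;
            this.storedMillis = storedMillis;
        }
    }

    /**
     * Builds the rows to store from the checked checkbox values.
     * checkedUrls mirrors the result of request.getParameterValues(...),
     * which is null when no checkbox was checked.
     */
    static List<Bookmark> collect(String keyword, String[] checkedUrls, long nowMillis) {
        List<Bookmark> rows = new ArrayList<>();
        if (checkedUrls == null) {
            return rows; // nothing was selected
        }
        for (String url : checkedUrls) {
            if (url != null && !url.trim().isEmpty()) {
                rows.add(new Bookmark(keyword, url.trim(), nowMillis));
            }
        }
        return rows;
    }
}
```

Each returned row pairs the query keyword with one selected URL and the time of storage, which is the value later used to compute how long the record has existed.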

Chapter 7

Conclusions

In this project, the web user's search context was tracked from both the client side and the server side. On the client side, a stand-alone Java Swing program was used as the graphical user interface, and a new web interface was generated to display the results pages of a user’s query. JavaScript was embedded in the result HTML files to set cookies that provide small-capacity, short-term storage of the user's search context. On the server side, JavaServer Pages (JSP) were used to build the web interfaces, and a MySQL database was created and connected to the servlet. The user's search context can then be stored in the database over the long term for later retrieval. Each solution has its own advantages and disadvantages.

For client-side search context processing, JavaScript can generate HTML dynamically on the client’s own computer. Generating cookies is very fast and does not require a server's large computing and storage resources. This is a useful capability, but it only handles situations where the dynamic information is based on the client's environment. Also, since it runs on the client side, JavaScript cannot access server-side resources such as databases, catalogs, and pricing information. JavaScript is also less secure, browser dependent, and less stable. This is not the case with JSP pages.

For server-side search context processing, JavaServer Pages can also generate web pages dynamically, since Java code can be embedded in the HTML. Also, because a database connected to the JSP server was chosen to store the search context, the storage capacity is vastly increased compared with cookies, which are limited to a maximum of 20 cookies per domain and 4 KB per cookie. Pages built using JSP technology are typically implemented using a translation phase that is performed once, the first time the page is called. The page is compiled into a Java Servlet class and remains in the server memory, so subsequent calls to the page have a very fast response time, whereas in ASP the page is recompiled for every request. JSP implementations support a Java programming language-based scripting language, which provides inherent scalability and support for complex operations. Most JSP pages rely on reusable, cross-platform components (JavaBeans or Enterprise JavaBeans components) to perform the more complex processing required of the application, instead of relying heavily on scripting within the page itself. Developers can share and exchange components that perform common operations, or make them available to larger customer communities. The component-based approach speeds overall development and lets organizations leverage their existing expertise and development efforts. However, each time the search context is stored in or retrieved from the database, a connection between client and server must be established, which increases network traffic and slows program execution. Server-side storage also increases the burden on the server's storage capacity.

References

[1] Krishna Bharat. “SearchPad: Explicit Capture of Search Context to Support Web Search.” In Proc. 9th WWW Conf., 1999.

[2] Arman Danesh and Wes Tatters. “JavaScript 1.1 Developer's Guide.” Sams.net Publishing, 1996.

[3] Netscape. Persistent Client State - HTTP Cookies. http://www.netscape.com/newsref/cookie_spec.html.

[4] Google. http://www.google.com/.

[5] GoTo. http://www.goto.com/.

[6] Hotbot. http://www.hotbot.com/.

[7] MSN Search. http://search.msn.com/.

[8] Wen-Chen Hu, Yining Chen, Mark S. Schmalz, and Gerhard X. Ritter. “An Overview of World Wide Web Search Technologies.” In Proceedings of the 5th World Multi-Conference on Systemics, Cybernetics and Informatics, SCI 2001, Orlando, Florida, July 22-25, 2001.

[9] Search Engine Watch. http://www.searchenginewatch.com/.

[10] Netscape. http://www.netscape.com/.

[11] Yahoo!. http://www.yahoo.com/.

[12] Alta Vista. http://www.altavista.com/.

[13] Ask Jeeves. http://www.ask.com/.

[14] WebAppCabaret. http://www.webappcabaret.com/.