artificial intelligence chat - penn state harrisburg math/computer


The Pennsylvania State University

The Graduate School Capital College

Artificial Intelligence Chat

A Master’s Paper in

Computer Science

By

Resham N. Mahadeo

©2004 Resham N. Mahadeo

Submitted in Partial Fulfillment

of the Requirements for the Degree of

Master of Science

October 2004


Table of Contents

I ABSTRACT
II ACKNOWLEDGEMENTS
1 INTRODUCTION
2 LOCATING THE BEST CHAT PROGRAM SHELL
3 INSTALLING AND CONFIGURING THE MYSQL DATABASE
4 DESIGN DETAILS
4.1 SELECTING THE RESPONSE ALGORITHM
4.2 ACQUIRING CONVERSATION
4.2.1 JMSN Main Functionality
4.2.2 Main Interface
4.3 CHAT INTERFACE
4.4 MAIN INTERFACE
4.4.1 ChatDialog
4.4.2 BuddyTree
4.4.3 BuddyList
4.4.4 AbstractProcessor
4.4.5 MSNMenuBar
4.4.6 SelectCommon
4.4.7 SelectResponse
4.4.8 AddFriendsDialog
4.4.9 ChatArea
4.4.10 MsnListener
4.4.11 MsnFriend
4.4.12 Main
4.4.13 MSNMessenger
4.4.14 SwitchboardSession
5 EFFECTIVENESS OF THE RESULTS
6 A COMPARISON OF OTHER AI CHAT CLIENTS
7 ISSUES AND PROBLEMS
8 FUTURE ENHANCEMENTS
9 CONCLUSION
10 REFERENCES


Table of Figures

FIGURE 1: TABLE OF DATABASE TABLES
FIGURE 2: SELECT RESPONSE ALGORITHM
FIGURE 3: ACQUIRING CONVERSATION
FIGURE 4: FLOW OF TRIANGULATION (SIMPLIFIED VERSION)
FIGURE 5: MAIN INTERFACE OPTIONS
FIGURE 6: CHAT DIALOG
FIGURE 7: MAIN INTERFACE
FIGURE 8: TABLE OF UNMATCHED PAIRS


I Abstract

Our contribution is the modification of an ordinary chat client to create an automated chat client. Several capabilities were added to achieve this goal: adding arbitrary users to the buddy list; collecting data from two users chatting, without the users’ awareness; sending a single response selectively to a user; responding to a selected user automatically without any input; a reliable algorithm that picks the appropriate response; and a mechanism that builds a response from a customized list of responses combined with regular words from the received message.


II Acknowledgements

I would like to express thanks to Dr. Pavel Naumov for extending his hours

to accommodate me during this project. It is his advice, guidance, encouragement,

and motivation that took me to the finish line. I would also like to say thanks to

Dr. Thang Bui for all his input, help and guidance during my course of study.

Finally, I would like to thank Dr. Linda Null, Dr. Qin Ding and Dr. Sukmoon Chang

for their input and guidance.


1 Introduction

There are programs such as Eliza, Chatbot, and Alicebot that have been

created for a similar purpose as the one in this project. These programs strive to

pass the standard test set by Alan Turing in 1950. This test states that if a person

cannot tell the difference between a computer and a human, both of which are in

separate rooms away from the person, then the computer is said to have passed the

Turing test.

Instant messaging is sending short text messages between people

electronically in real time. A key feature of IM software programs is the buddy list,

which tells whether or not a friend or colleague is online and available to chat

[11]. Another popular feature is the ability to create different groups. There are

several statuses that users can set, such as away, busy, or available. Actions

available to the users in these buddy lists usually include add, remove, block,

sort, search, and unblock. However, they can vary with different clients. Similar

actions can be taken on groups as well. It is also common to send or receive files

using this application. Some of the newer features that are available are the

ability to send or receive voice and video. Taking all of this into consideration, it is

easily seen why IM software programs are becoming more popular as time

progresses, particularly with the younger generation. However, the scope of

instant messaging has room for advancement. Soon the scope and use of IM will

parallel that of e-mail.


Instant messaging has become more popular than it was several years ago. It is very portable because it can be used on PDAs, some cell phones, and other thin clients. Some of the more popular IM clients on the Internet are MSN Messenger, Yahoo Instant Messenger, AOL Instant Messenger, and ICQ. These clients provide a free and convenient means of communication between users of computers and thin clients.

The goal of this project is to modify an existing chat client to create a chat

client that can respond independently. The responses are taken from a database

of stored responses. These stored responses have been collected from

conversations of other users.

Many IM clients have been developed by programmers around the world, either to work with existing servers or to work with their own servers. Some clients have been developed for use on intranets, whereas others have been created for use on the Internet.

The general idea is to send a message from one client to another client via a server using an agreed-upon protocol. Most commonly the clients and the server communicate through sockets over TCP/IP or UDP. After the server receives the message, it forwards it to the addressed client.

The server is generally a listener that waits on a given port or on an open socket. After a communication channel is established, initial messages are sent back and forth. The first message is automatic, and its purpose is to authenticate the user to the server. Subsequent automatic messages carry buddy lists, buddy statuses, and other information that the server keeps for the client. These exchanges take place while the user is logging on to the chat client.
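The login and routing sequence described above can be illustrated with a small in-memory model. This is only a sketch in Python, not the MSN protocol or the JMSN code: it uses no real sockets, and the class name `ChatServer`, its methods, and the plain-text password store are all hypothetical.

```python
class ChatServer:
    """Toy in-memory stand-in for an IM server (no real sockets or MSN protocol)."""

    def __init__(self, accounts, buddy_lists):
        self.accounts = accounts        # user -> password (hypothetical store)
        self.buddy_lists = buddy_lists  # user -> list of buddy names
        self.inboxes = {}               # logged-in user -> queued messages

    def login(self, user, password):
        # First automatic message: authenticate the user to the server.
        if self.accounts.get(user) != password:
            return None
        self.inboxes[user] = []
        # Follow-up automatic messages: the buddy list with each buddy's status.
        return {buddy: ("online" if buddy in self.inboxes else "offline")
                for buddy in self.buddy_lists.get(user, [])}

    def send(self, sender, recipient, text):
        # The server forwards a message only between logged-in clients.
        if sender not in self.inboxes or recipient not in self.inboxes:
            return False
        self.inboxes[recipient].append((sender, text))
        return True
```

In this model a client first calls `login`, receives its buddy statuses, and only then can `send` messages that the server queues for the addressed client.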

2 Locating the Best Chat Program Shell

There are many websites where open source software can be found. Two of the more popular ones that were found are http://sourceforge.net/ and http://freshmeat.net/. Several clients were considered and time was dedicated to each: the source was downloaded and compilation was attempted. It was very challenging to find a client that contained complete source, could be compiled, and had few errors, all of which could be resolved. One such client, Ebjava, was found at http://sourceforge.net/projects/ebjava/. Compiling Ebjava was successful. However, after working with Ebjava for some time, we found errors that could not be resolved. Several months were dedicated to making Ebjava work, but all attempts were unsuccessful.

Finally, a complete working client, JMSN Messenger, was found. This

client was written by Jang-Ho Hwang from Korea [5]. This client was found on

SourceForge’s website. The author was contacted through e-mail but no

response was received. Since his copyright statement allowed for modification or

redistribution, modifications were initiated.


3 Installing and configuring the MYSQL database

The two possible mediums that can be used to store conversation are files

or a database. Storing the conversation in files is not appropriate because of the

time it would take to search the files and the file access time. It was determined

that storing the responses, user names, people online and other necessary

information in a database was the ideal solution. We considered MYSQL

an ideal database for small-scale projects. The MYSQL database server 3.23

was downloaded from http://www.mysql.com/. The installation was done on an

HP Pavilion XH156 laptop. Documentation on installing and configuring MYSQL

was obtained from http://dev.mysql.com/doc/mysql/en/index.html. The table

space and database were created using the mysqlgui and mysql tools. Tables

were then created for Common_words, Conversation1, Conversation2,

Question, Code and Users_online. The structure of the tables is shown below.

Table            Columns
Common_words     words
Conversation1    Statement1
Conversation2    Statement1
Question         Question, Answer
Code             first
Users_online     Online

FIGURE 1: TABLE OF DATABASE TABLES

The Common_words table stores frequently used words that generally do not alter the primary meaning of a statement. The Conversation1 table stores responses from the initial person chatting. The Conversation2 table stores responses from the second person chatting. The Question table stores complete pairs of responses from both parties. The Code table stores internal codes for the application. The Users_online table stores users that have previously been added to the buddy list.
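For illustration, the six tables of Figure 1 could be created as follows. This is a minimal sketch, not the project's setup: the project used MySQL 3.23 through JDBC, while this example uses Python's built-in sqlite3 module as a stand-in so that it runs without a server. The table and column names come from Figure 1; the TEXT column types and the helper name `create_database` are assumptions.

```python
import sqlite3

# Table and column names follow Figure 1; column types are assumed.
SCHEMA = """
CREATE TABLE Common_words  (words TEXT);
CREATE TABLE Conversation1 (Statement1 TEXT);
CREATE TABLE Conversation2 (Statement1 TEXT);
CREATE TABLE Question      (Question TEXT, Answer TEXT);
CREATE TABLE Code          ("first" TEXT);
CREATE TABLE Users_online  (Online TEXT);
"""

def create_database(path=":memory:"):
    """Create the tables in a SQLite database (stand-in for MySQL)."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```

Calling `create_database()` with no argument builds the schema in memory, which is convenient for experimenting with the lookups described in the design section.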


4 Design Details

4.1 Selecting the Response Algorithm

Figure 2 demonstrates the actual steps taken in the algorithm.

In the first step, the received response is checked against the database. It is compared with the question column of the Question table for an exact match. If a single result is found, its answer column is chosen as the response. If the result set contains more than one row, each question entry is separated into common and regular words. The common words are identified by cross-referencing the Common_words table, which was constructed manually and fine-tuned to improve the performance of the system. The received response is separated in the same way. The common words on both sides are ignored, and the regular words of the received statement are compared with the regular words of each match from the database. The entry with the most matching words has its corresponding answer returned. If cross-referencing the question column produces no matches, the same steps are repeated against the answer column.
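The selection among several candidate rows can be sketched as follows. This is an illustrative Python sketch rather than the project's Java code: the function names `regular_words` and `best_answer` are hypothetical, words are split on whitespace only, and ties are broken by taking the first candidate, none of which is specified in the text.

```python
def regular_words(statement, common_words):
    """Return the words of a statement that are not in the common-word list."""
    return [w for w in statement.lower().split() if w not in common_words]

def best_answer(received, candidates, common_words):
    """Pick the answer whose question shares the most regular words with the input.

    candidates is a list of (question, answer) rows returned by a lookup.
    """
    if not candidates:
        return None
    if len(candidates) == 1:
        return candidates[0][1]          # a single result is used directly
    wanted = set(regular_words(received, common_words))

    def overlap(row):
        # Number of regular words shared between the input and this question.
        return len(wanted & set(regular_words(row[0], common_words)))

    return max(candidates, key=overlap)[1]
```

For example, with common words `{"can", "i", "we", "to", "the"}`, the input "can I go to the park" shares two regular words with "can we go to the park" and only one with "can i go to the store", so the answer stored with the first question is chosen.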

[Flowchart omitted. Recoverable flow: Input Text -> Database Lookup (question / answer) -> single response found: Return Response; multiple responses found: Find Best Response, Return Response; not found: Database Lookup (common_words), Remove Common Words, Add Wildcards [%] -> Database Lookup (question / answer) -> single or multiple response handled as before; not found: Remove a Regular Word -> Database Lookup (question / answer) -> single or multiple response handled as before; no response can be found: Find Significant Regular Word -> Create Automated Response -> Return Response.]

FIGURE 2: SELECT RESPONSE ALGORITHM

The second step is to replace the spaces between the words of the received statement with wildcards and check the database for any similar statements. The received response is first compared with the question column of the Question table. If the result set contains a single row, its answer column is sent as the response. If the result set contains more than one row, each statement is broken down into common and regular words. As previously discussed, the regular words from the input statement are compared with those of each entry in the result set. The entry whose question column has the highest number of matches has its corresponding answer column sent to the other person. If matching on the question column produces no matches, the same steps are repeated against the answer column.
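A wildcard lookup of this kind might look like the sketch below. It assumes a table named Question with columns Question and Answer, as in Figure 1, and uses Python with sqlite3 in place of the project's Java, JDBC and MySQL. Whether the original code also put wildcards at the start and end of the pattern is not stated, so this sketch replaces only the spaces between words; the function names are hypothetical.

```python
import sqlite3

def wildcard_pattern(statement):
    """Replace the spaces between words with the SQL '%' wildcard."""
    return "%".join(statement.split())

def wildcard_lookup(conn, statement):
    """Search the question column first, then fall back to the answer column."""
    pattern = wildcard_pattern(statement)
    rows = conn.execute(
        "SELECT Question, Answer FROM Question WHERE Question LIKE ?", (pattern,)
    ).fetchall()
    if not rows:
        rows = conn.execute(
            "SELECT Question, Answer FROM Question WHERE Answer LIKE ?", (pattern,)
        ).fetchall()
    return rows
```

With this pattern, "do you like pizza" becomes `do%you%like%pizza`, which also matches a stored statement such as "do you really like pizza". Passing the pattern as a bound parameter keeps the received text out of the SQL string itself.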

The next step is to remove the common words from the received response. One regular word is then removed from the right side of the remaining statement, and wildcards are added. The resulting statement is compared with the question column of the Question table. If no matches are found, a comparison is made with the answer column of the Question table. In either case, if several matches are found, the best one is determined by matching the regular words of the result against those of the response. If none of these produce any results, another regular word is removed and the previously described steps are repeated, and so on.

If no data is found after searching through the database, a constructed answer is returned. Several stock phrases have been stored as possible partial answers. The input statement is reduced to its regular words (words not in the Common_words table), and these words are added to an array. The most significant regular word is picked from the array; this is simply the longest one. This word is appended to one of the stored partial answers, selected at random, and the constructed response is returned. An example of a possible constructed response is "Tell me more of this + 'significant regular word'."

Common words are an important consideration because they can be interchanged without changing the basic meaning of the phrase. Consider the two phrases “Can I go to the park” and “Can we go to the park”: if “I” and “we” are eliminated, both statements have the same basic meaning. Since “to” and “the” do not really add meaning to the statement, they are considered common words as well. The same response would likely suit both statements, so they are treated as matches for one another. In view of this, the best course of action is to remove the common words from the responses before making more refined comparisons.
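The last two stages, dropping regular words from the right and then falling back to a constructed answer, can be sketched as below. This is a hedged Python illustration, not the project's Java implementation: the function names are hypothetical, the leading and trailing `%` in the patterns are an assumption (the text only says that wildcards are added), and the stock phrase list is supplied by the caller.

```python
import random

def shrinking_patterns(regular):
    """Yield LIKE patterns, dropping one regular word from the right each time."""
    words = list(regular)
    while len(words) > 1:
        words.pop()                           # remove the rightmost regular word
        yield "%" + "%".join(words) + "%"     # surrounding wildcards are assumed

def constructed_response(statement, common_words, templates, rng=random):
    """Fallback: attach the longest regular word to a randomly chosen stock phrase."""
    regular = [w for w in statement.lower().split() if w not in common_words]
    if not regular:
        return rng.choice(templates)          # nothing significant to attach
    significant = max(regular, key=len)       # "most significant" means longest
    return rng.choice(templates) + " " + significant
```

Each pattern produced by `shrinking_patterns` would be tried against the question column and then the answer column; only when every pattern fails is `constructed_response` used. With the single stock phrase "Tell me more of this" and the input "can we go to the playground", the regular words are "go" and "playground", so the reply ends with "playground".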

The Common_words table directly affects the response that is produced. If too many words are added to the common words list, the regular words list becomes smaller. This may cause larger result sets to be returned from the database, leaving more unrefined responses to process. In the extreme case, a poor selection of common words can cause every word in a statement to be considered common, so that no matches can be found because there are no words left to compare. For this reason the common words list was modified several times to produce the best results.


4.2 Acquiring Conversation

[Diagram omitted. Labels: Initial Statement; Sends Response; Receive Response; Insert Response; Thread 1 waits, selects and deletes; Thread 2 waits, selects and deletes; Insert Pairs of Responses; tables Conversation1, Conversation2, Question; User 3.]

FIGURE 3: ACQUIRING CONVERSATION

Figure 3 shows the flow of capturing data from a conversation triangle, including the actual flow of data between the database and the client. A simplified version can be seen in Figure 4. The boxes labeled ‘Client 2 & User3’ and ‘Client 1 & User3’ represent dialog windows in the AI JMSN application. ‘User 1’ and ‘User 2’ are two external users that have been added to the buddy list.

[Figure 3 box labels: AI JMSN Application; Client 1 & User3; Client 2 & User3; User 1; User 2.]

[Diagram omitted. Labels: User1, User2, User3, Conversation1, Conversation2.]

FIGURE 4: FLOW OF TRIANGULATION (SIMPLIFIED VERSION)

Conversation1 is used to store responses from User1, as shown in Figure 3. User3, which is the AI JMSN application, relays these responses to User2. Similarly, User3 relays the responses stored in Conversation2 to User1. The Question table is used to store pairs of responses from User1 and User2.

The question of how to populate the database was addressed in the following manner. First, two people who are online are found and a chat is initiated with person one. The response obtained from person one is then sent to person two. It appears to both people that they are chatting with person three; however, person three is just serving as a relay for messages between person one and person two. All the complete pairs of responses are then stored in the Question table (see Figure 1).

This task involved adding several modules to be referenced by the ChatDialog module. The BuddyTree module is used to display the users that are online. When a conversation is set up between two people, only person three posts complete pairs of responses to the Question table, so as to reduce redundancy. Both modules post their single responses to their respective tables, where the other person then sees them. The User 1 side creates a thread that checks Conversation2 for responses from person two, and the User 2 side creates a thread that checks Conversation1 for responses from person one (Figure 3).
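The relay can be modelled in a few lines. This is a simplified, single-threaded Python sketch of the idea, not the project's implementation: the project used polling threads and database tables, whereas this sketch uses in-memory queues, and the class name `Relay`, its methods, and the rule for pairing consecutive messages from different speakers are assumptions.

```python
from collections import deque

class Relay:
    """User3: passes messages between User1 and User2 and records the pairs."""

    def __init__(self):
        self.conversation1 = deque()  # statements from User1, awaiting relay
        self.conversation2 = deque()  # statements from User2, awaiting relay
        self.question = []            # completed (statement, reply) pairs
        self._last = None             # most recent (speaker, text)

    def receive(self, speaker, text):
        """Store an incoming message; pair it with the previous message
        if that one came from the other party."""
        queue = self.conversation1 if speaker == "user1" else self.conversation2
        queue.append(text)
        if self._last is not None and self._last[0] != speaker:
            self.question.append((self._last[1], text))
        self._last = (speaker, text)

    def forward(self, to):
        """Select and delete the pending messages destined for `to`."""
        queue = self.conversation2 if to == "user1" else self.conversation1
        pending = list(queue)
        queue.clear()
        return pending
```

Here `forward` plays the role of the "select and delete" step in Figure 3, and only the relay writes to `question`, mirroring the rule that only person three posts complete pairs.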

Another module was created to generate possible usernames and add them to the buddy list. A base name is entered into the input window, and text is appended to it to create different variations of the base name. The base names were obtained from a census list of the most popular names; first names, last names, and combinations of first and last names were used. If a name has been used before, it is not used again. This is verified by querying the database (Users_online table). The resulting names are added to the buddy list, and the new names are also stored in the database so that the same name cannot be used multiple times. After a name is added to the buddy list, it is sent to the MSN server, which lets the user know that he has been added to the buddy list. If the user accepts, he can be seen when online, and so on.
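The name generation step can be sketched as follows. This Python sketch is illustrative only: the function name `candidate_names` and the suffix list are hypothetical, and a plain set stands in for the Users_online table that the project queried through JDBC.

```python
def candidate_names(base, suffixes, used):
    """Return variations of a base name that have not been used before.

    `used` stands in for the Users_online table and is updated in place.
    """
    fresh = []
    for suffix in suffixes:
        name = f"{base}{suffix}"
        if name in used:
            continue          # already tried earlier, skip it
        used.add(name)        # remember it so it is never tried again
        fresh.append(name)
    return fresh
```

For instance, with `used = {"john1"}`, calling `candidate_names("john", ["1", "2", "_smith"], used)` yields `["john2", "john_smith"]`, and a second call with the same arguments yields an empty list.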

Several modules were created to select, delete, insert, and update data on

the tables. The connection is made through Java’s JDBC.


4.2.1 JMSN Main Functionality

4.2.2 Main Interface

[Image omitted.]

FIGURE 5: MAIN INTERFACE OPTIONS

The Main Interface contains the usual options that are common with other

chat clients. The general options were modified to include the option of adding

several users (Add several buddies).

[Figure 5 callout labels: General options; User Logged on; Groups.]

4.3 Chat Interface

FIGURE 6: CHAT DIALOG

The Response button returns one response based on the received message.

The Get Conv button sets up a conversation between two other users.

The auto-res button responds automatically to all incoming messages.


4.4 Main Interface

[Object diagram omitted. Classes shown: Main, MainFrame, MSNMessenger, MsnAdapter, MsnListener, BuddyTree, BuddyList, BuddyGroup, MsnFriend, LocalCopy, UserStatus, MSNMenuBar, EventViewer, AddConfirmDialog, NotificationProcessor, DispatchProcessor, LoginSplash, ActionGroup, JScrollPane, Hashtable.]

FIGURE 7: MAIN INTERFACE

Figure 7 shows the links between the objects in this project. The Main object starts the application. After the Main object is initiated, an instance of MainFrame is created. The MainFrame creates instances of listeners and the other objects shown in the diagram. These objects create the main interface that can be seen in Figure 7. The MSNMessenger object that is created initiates listeners such as the MsnListener. The MsnListener communicates with the MSN server to establish a steady communication channel. The diagram shows multiple instances of the same objects. However, these refer to the existing object: a new object may be created, but it is cast to the existing one. One way to explain this is that they are just images of the initial object.

Below are some of the main modules used in this project:

4.4.1 ChatDialog :

This module is responsible for parsing conversation received from the MsnListener. It formats the text sent out to the person involved in the chat. It provides the interface for producing an automated response to the response received. It is also used to relay conversations between two MSN users and to send files. Lastly, it takes part in determining the most appropriate response to a statement.

4.4.2 BuddyTree :

This module keeps track of the buddy list and communicates with the AbstractProcessor module. It receives or transmits updates to the buddy list. This object holds the actual MSN user objects for the task.


4.4.3 BuddyList:

This module stores all the buddies, groups, and the information

received from MSN. This structure also sorts the users in alphabetical order. This

sorting is done in the individual groups.

4.4.4 AbstractProcessor :

This module listens for information from the MSN server, such as user statuses (online, away, offline, etc.) and user information.

4.4.5 MSNMenuBar:

This module represents the menu options located on the main

application. This module actually transfers control to the actual application when

an option is chosen.

4.4.6 SelectCommon:

This class makes calls to the database to find common words.

4.4.7 SelectResponse:

This class makes calls to the database to find the response that is

similar to the buddy’s response.

4.4.8 AddFriendsDialog:

This module accepts input from an input dialog window and uses it as a base. Text is then appended to this base, and the resulting names are added to the buddy list. The BuddyTree object then sends the list to the MSN server, which checks the status of the newly added IDs.


4.4.9 ChatArea:

This class is responsible for displaying the chat area and keeping track of the responses between the parties chatting.

4.4.10 MsnListener:

This class is responsible for relaying all of the actions that are communicated with the MSN server.

4.4.11 MsnFriend:

This object contains all the information pertaining to the individual

users and buddies. This is the object that is used on the buddy tree to

represent all of the individual buddies.

4.4.12 Main:

This class is the main class that starts the application. It initiates the listener class and the classes that log the user in to the MSN server.

4.4.13 MSNMessenger :

This class is responsible for all the events that are part of the MSN set

of events. The common events are unread mail, add buddy failed, who added

me, who removed me, file sent, file received, instant message received, etc.

4.4.14 SwitchboardSession:


This module keeps track of all of the current activities. These activities

are relayed to the MSN server. Some of the activities are current conversations,

invitations, receipt of files, sending of files, processing of messages, who is

typing, who joined the conversation, etc.

5 Effectiveness of the results

The effectiveness of the resulting application depends on the conversation that is collected from the initial users. With simple questions and responses, the effectiveness of the application is evident. However, once a conversation with the AI chat client goes into depth, we found that some of the responses were unrelated. We chatted with some of the other AI chat clients online and found that this was a common problem. If the AI client does not understand a response or question, a vague or unrelated response is returned.

6 A Comparison of other AI Chat clients

Some of the other AI Chat clients are A.L.I.C.E created by A. L. I. C. E.

Artificial Intelligence Foundation, the Eliza program, Ella, the winner of the 2002

Loebner Prize Contest, and The Electronic Brain AI Bot. Later in this section,

comparisons will be made with some of these AI chat clients.

A.L.I.C.E. is an artificial intelligence natural language chat robot

based on an experiment specified by Alan M. Turing in 1950. The A.L.I.C.E.


software utilizes AIML, an XML language designed for creating stimulus-response chat robots. Some view A.L.I.C.E. and AIML as a simple extension of the old ELIZA psychiatrist program. The comparison is fair regarding the stimulus-response architecture. However, the A.L.I.C.E. bot has at present more than 40,000 categories of knowledge, whereas the original ELIZA had only about 200. Another innovation was provided by the web, which made natural language sample data collection possible on an unprecedented scale [6].

A.L.I.C.E was first implemented in 1995 using SETL, a language based on set theory and mathematical logic [6]. After chatting with A.L.I.C.E, we found that it shares common problems with this project. Common statements and questions are not a problem for A.L.I.C.E or this project. However, when the question or statement oversteps the boundary of being common, the response is vague and sometimes unrelated. On the technical side, A.L.I.C.E has a web user interface, whereas this project is a complete chat client with AI functionality. The A.L.I.C.E application was written in Java, the same as this project, and its questions or statements are stored in flat files. A.L.I.C.E has the flexibility of adding a database; the database that the website recommends is MYSQL. A.L.I.C.E also uses XML for logging data to the files. In terms of structure, A.L.I.C.E is a chat client that can be accessed through a browser or a local client, and it uses client-server technology similar to this project's. A.L.I.C.E is not connected in any way to a chat service such as ICQ, MSN, or Yahoo. In contrast, this project connects to MSN. Both A.L.I.C.E and this project go through a stage of acquiring data (questions or statements) to be used for actual conversations.

Joseph Weizenbaum described the original ELIZA in Communications of the

ACM in January 1966. ELIZA was one of the first programs that

attempted to communicate in natural language [3]. The snippets of

conversation between ELIZA and a person that were examined read like a conversation

between a psychiatrist and a patient. A good portion of the responses are a

manipulation of the actual question or statement, combined with other words.

However, this could have been due to the lack of saved responses for the parts of

conversation that were examined. The application finds keywords and patterns in

the statement and creates a response from the matches that are found. The

keywords are given weights, and based on these weights a response is selected.

ELIZA is very similar to this project in the way that the response is selected. The

versions of ELIZA that we researched were scripts that ran in a web browser.

Similar problems were found with the responses when an actual chat session

was started. However, what makes ELIZA a more rigorous chat client is the way

an answer is produced when there is no related material stored. As mentioned

before, in this case the input statement or question is manipulated to

produce the output response.
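
The ELIZA behavior described above can be illustrated with a minimal sketch. It is not Weizenbaum's script or the code of the versions we examined. The class name ElizaSketch, the rules, the weights, and the replies are all invented for illustration. The sketch assumes that the highest-weighted matching keyword selects the reply, and that when no keyword matches, the input is turned back on the speaker by swapping a few pronouns.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Illustrative ELIZA-style responder. All rules, weights, and replies are
// hypothetical; they are not taken from any real ELIZA script.
class ElizaSketch {

    // A keyword rule: when several keywords match, the highest weight wins.
    static final class Rule {
        final String keyword;
        final int weight;
        final String reply;

        Rule(String keyword, int weight, String reply) {
            this.keyword = keyword;
            this.weight = weight;
            this.reply = reply;
        }
    }

    static final Rule[] RULES = {
        new Rule("mother", 5, "Tell me more about your family."),
        new Rule("sad", 4, "Why do you feel sad?"),
        new Rule("always", 2, "Can you think of a specific example?")
    };

    // Word swaps used to manipulate the input when nothing is stored for it.
    static final Map<String, String> REFLECTIONS = new HashMap<>();
    static {
        REFLECTIONS.put("i", "you");
        REFLECTIONS.put("me", "you");
        REFLECTIONS.put("my", "your");
        REFLECTIONS.put("am", "are");
    }

    static String respond(String input) {
        // Normalize: lower case, strip punctuation, split on whitespace.
        String[] words = input.toLowerCase(Locale.ROOT)
                .replaceAll("[^a-z' ]", " ")
                .trim()
                .split("\\s+");

        // Select the matching rule with the greatest weight, if any.
        Rule best = null;
        for (String word : words) {
            for (Rule rule : RULES) {
                if (rule.keyword.equals(word)
                        && (best == null || rule.weight > best.weight)) {
                    best = rule;
                }
            }
        }
        if (best != null) {
            return best.reply;
        }

        // No keyword matched: reflect the input back as a question.
        StringBuilder reply = new StringBuilder("Why do you say");
        for (String word : words) {
            if (!word.isEmpty()) {
                reply.append(' ').append(REFLECTIONS.getOrDefault(word, word));
            }
        }
        return reply.append('?').toString();
    }

    public static void main(String[] args) {
        System.out.println(respond("I am always sad about my mother"));
        System.out.println(respond("I like my computer"));
    }
}
```

In this sketch the first input matches three keywords and the reply attached to the heaviest one ("mother") is returned, while the second input matches none and is echoed back with its pronouns swapped.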

“Ella is the winner of the 2002 Loebner Prize Contest for "Most Human

Computer". She is a charming on-line chatterbot with an interface using multiple

images and text display boxes. Ella can play full-featured Blackjack, tell


I Ching fortunes, and performs various useful functions, all with natural language

interaction. A lexical database with more than 120,000 entries is used to assist

her knowledge and usefulness.”[9] After we chatted with a version of Ella, it appeared

that the application is intended for learning purposes. There are versions available

for different purposes such as math, games, books, etc. These applications are

loaded with learning information and then distributed. They provide

an interactive way for people to learn materials on different topics. The source code is

not available for download, so an analysis of the architecture could not be done.

This application combines voice, images, and video in its responses. It uses

voice recognition software to handle voice conversations. Ella uses databases to

store the entries for the different books.

7 Issues and problems

Finding an appropriate chat client to work with was very difficult. The

sources that were considered were found to be incomplete or riddled with

problems. It took some time to understand the code and become familiar with the

clients. Some of the clients were found to be functionally inflexible and thus were not

considered. The client that was chosen was also found to be inflexible towards this

project in some ways. The only way to find out whether someone is logged on

is to add the person to the buddy list. The reason for this is that objects are

passed between the client and the MSN server for the entire buddy list so it can

be refreshed. This makes it impossible to query the status of a single user.

For this reason, random users are added to the buddy list, which


only has a capacity of approximately 100. Each added user must in turn add this

account to his or her own buddy list, which may or may not happen. After adding 100 users to the buddy

list and waiting a few days to be added in return, it was disappointing to discover that only

a few users were found to be online. This restricted the number of people

available for conversation. For this reason, only a limited amount of conversation

data was gathered. This may have been a restriction imposed by

MSN. Another possibility was to send a random message to a user without

finding out whether this person was logged on. After exploring this, it was found that this

would only work if the user object (an MsnFriend object) came from MSN. As a

result, the buddy list was the only way to communicate with random users.

The data acquired from the conversations was very raw and could have

been cleaned up to be more effective. Some users still use full words in their

online conversations, so good data can be acquired from these users. If one user

sends more than one question or statement in a row, or responds slowly to a statement,

a mismatch occurs when the data is saved. To expand on this,

consider Figure 8 below:

User1 User2

1 Hi How are you doing?

2 What did you do today?

3 Not too much

4 I am great

FIGURE 8: TABLE OF UNMATCHED PAIRS


Looking at Figure 8, it is not possible to accurately save the response that

applies to each question unless they arrive in order. Sometimes a user may also

respond to something unrelated to the previous statement, which can

cause strange or unrelated pairs in the database.
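
The pairing problem can be made concrete with a minimal sketch. The saving code of this project is not reproduced here, so the pairing rule is an assumption: each incoming message is stored as the response to the most recent message from the other user. The names PairingSketch, Message, sampleLog, and pairInArrivalOrder are hypothetical, and the sample log assumes that User1 sent both questions before User2 answered either of them.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: shows how pairing messages in arrival order can attach
// a response to the wrong statement. The pairing rule is an assumption.
class PairingSketch {

    static final class Message {
        final String sender;
        final String text;

        Message(String sender, String text) {
            this.sender = sender;
            this.text = text;
        }
    }

    // Hypothetical log: User1 asks two questions, then User2 answers both.
    static List<Message> sampleLog() {
        List<Message> log = new ArrayList<>();
        log.add(new Message("User1", "How are you doing?"));
        log.add(new Message("User1", "What did you do today?"));
        log.add(new Message("User2", "Not too much"));
        log.add(new Message("User2", "I am great"));
        return log;
    }

    // Pairs each message with the most recent message from the other user.
    // Each returned array holds {statement, response}.
    static List<String[]> pairInArrivalOrder(List<Message> log) {
        List<String[]> pairs = new ArrayList<>();
        for (int i = 1; i < log.size(); i++) {
            Message current = log.get(i);
            // Walk backwards to the latest message sent by the other user.
            for (int j = i - 1; j >= 0; j--) {
                Message earlier = log.get(j);
                if (!earlier.sender.equals(current.sender)) {
                    pairs.add(new String[] { earlier.text, current.text });
                    break;
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        for (String[] pair : pairInArrivalOrder(sampleLog())) {
            System.out.println(pair[0] + " -> " + pair[1]);
        }
    }
}
```

Under these assumptions, "I am great" is stored as a response to "What did you do today?", although it answers the earlier question. This is the kind of unrelated pair described above.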

Difficulty was met when trying to initiate conversation between two

strangers. Most of the unsuspecting users spent more time trying to find out whom they

were chatting with than actually chatting. After finding out that the other person

was just as clueless about who they were chatting with, they usually

ended the conversation. For this reason, in-depth conversation was very difficult to attain.

Collecting data from random users online has other setbacks as well. Users online

may use abbreviations instead of full words, which can cause

complications with the parsing algorithm. Since only some people online use

abbreviations, a good match would not be found unless there were entries in the

database with these abbreviations. Some of the common

abbreviations were added to the common_words table so as to exclude them

from the search.
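
The exclusion step can be sketched as follows. This is a minimal illustration and not the project's parsing code: an in-memory set stands in for the rows of the common_words table, its entries are invented, and the names CommonWordFilter and searchTerms are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustrative only: drops common words and common abbreviations from a
// statement so that only the remaining terms are used in the search.
class CommonWordFilter {

    // Stand-in for the common_words table; these entries are assumptions.
    static final Set<String> COMMON_WORDS = new HashSet<>(Arrays.asList(
            "the", "a", "is", "are", "you", "u", "r", "lol", "brb"));

    static List<String> searchTerms(String statement) {
        List<String> terms = new ArrayList<>();
        // Split on anything that is not a letter, digit, or apostrophe.
        String[] tokens = statement.toLowerCase(Locale.ROOT).split("[^a-z0-9']+");
        for (String token : tokens) {
            if (!token.isEmpty() && !COMMON_WORDS.contains(token)) {
                terms.add(token);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(searchTerms("lol r u going to the game?"));
    }
}
```

With the hypothetical entries above, the abbreviations "lol", "r", and "u" are ignored, so the search would be driven by the remaining words only.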

Gaining familiarity with MySQL from an administrator and developer

perspective was challenging but very rewarding. In this project we

gained knowledge of setting up a database, assigning privileges,

creating indexes, etc.


8 Future enhancements

Refining the acquired conversation data would make the responses more

effective. Finding a way to remove the responses that are not properly

paired would improve the responses. Creating a method to find only users who

are online would enable a greater accumulation of conversations. Refining the

database of common conversations would also increase the effectiveness of the

responses.

9 Conclusion

The effectiveness of this application depends on the data

(conversations) that is captured and stored as reference. Storing structured and

intelligent conversation that was deliberately created would increase the complexity and

intelligence level of the responses.

Several issues may have influenced the data captured. One of these

issues is that responses are not always directed to the last statement.

Some responses may be arbitrary or may answer an earlier statement.

However, when these responses were stored, they were incorrectly paired with the previous

or current response.

The method that was used to capture data was discussed in this paper.

This method may not have been the most effective way to capture data. The reason

is that most people chatting online who do not know each other will spend more time

figuring out who the other person is than actually having a meaningful


conversation. When they realize that they are talking to a stranger, they will end

the conversation. This method was successful in capturing basic conversation

responses but not in-depth conversation responses.

In this project, an attempt was made to capture natural conversation

responses. However, this data could be altered to make it more meaningful.

Some captured responses may be crude and unpleasant. These could be

removed, leaving only the acceptable responses. These are customization steps

that would be important if a more focused purpose were determined.

10 References

1. A. M. Turing (1950) Computing Machinery and Intelligence. Mind 59: 433-460.

2. Deitel and Deitel, Java™ How to Program, Third edition, Prentice Hall 1999, Upper Saddle River NJ 07458.

3. http://chayden.net/eliza/Eliza.shtml

4. http://freshmeat.net/

5. http://sourceforge.net/

6. http://www.alicebot.org/

7. http://www.botspot.com/search/s-chat.htm

8. http://www.codeproject.com/useritems/AI_Chatbot.asp

9. http://www.ellaz.com/AI/

10. http://www.realtor.org/WebIntell.nsf/0/9d9b6f0d9a364b7886256aa900510d4d?OpenDocument

11. Joseph Weizenbaum: ELIZA - a computer program for the study of natural language communication between man and machine. Commun. ACM 9(1): 36-45 (1966)

12. Vikram Vaswani, Pamela Smith, MySQL: The Complete Reference, McGraw-Hill Companies 2002, 2100 Powell Street, 10th floor, Emeryville, CA 94608.