Honours Project 2 - Carleton University (people.scs.carleton.ca/~arpwhite/documents/honours...)


Natural Language Processing Chatbot for Stock Quotes Honours Project Report Matt Austin - 264681


Abstract

This project involves an application that runs in the background, waits for incoming messages, and responds with an answer. Such an application is called a “chatbot” (short for “chatting robot”): a computer program that runs without human interaction and replies to messages that are sent to it. This chatbot responds to English questions about stock quotes. It combines the Jabber protocol for messaging, Alice for rule-based natural language processing, and Cocoa for the user interface. It runs under Mac OS X and has been verified to run on version 10.2.2 of the OS.

Acknowledgements

The author would like to thank Dr. Tony White for all of his advice and guidance throughout this project. The author would also like to thank all the people who worked on the Alice tool, especially those who created J-Alice.


Table of Contents:

1) Introduction
  1.1) Chatbot
  1.2) Jabber
  1.3) Alice
  1.4) Cocoa
2) Alice
  2.1) How Alice Works
  2.2) Rules
  2.3) AIML
3) Jabber
  3.1) Architecture
  3.2) Message Example
4) Chatbot
  4.1) Purpose
  4.2) Choice of Technology
5) User Interface
6) Program Flow
  6.1) Receiving a Question
    6.1.1) Connecting to the Jabber Server
    6.1.2) Receiving a Message
    6.1.3) Parsing the Message
  6.2) Processing the Question
    6.2.1) Interacting with Alice
    6.2.2) Stock Handler
    6.2.3) Rules for Stock Handler
  6.3) Replying With an Answer
7) Testing
  7.1) What was expected
  7.2) Connecting to the Jabber Server
  7.3) Responding to Messages
  7.4) Results
8) Conclusion
  8.1) Future Work
  8.2) Bugs
9) References
10) Licenses
11) Appendix A


List of Figures:

FIGURE 1 - Screen shot of the chatbot user interface.
FIGURE 2 - Screen shot of the chatbot after connecting to the server.
FIGURE 3 - Two messages received while connecting to the server.
FIGURE 4 - Screen shot of the chat session within Fire.
FIGURE 5 - A screen shot showing the received messages.
FIGURE B-1 - Simple system overview.
FIGURE B-2 - UML overview of the chatbot architecture.
FIGURE B-3 - Message Sequence Chart for Chatbot Connection
FIGURE B-4 - Message Sequence Chart for Incoming Messages
FIGURE B-5 - Message Sequence Chart for Alice Response


1) Introduction

This project is a chatbot that responds to queries about stock prices and returns the current price of the stock in question. It contains a User Interface (UI) that was written in Cocoa using Objective-C. In addition to the UI, the chatbot contains code that works with the Jabber Instant Messaging (IM) protocol, and Alice for natural language processing. A simple overview of the system can be seen in Fig. B-1.

1.1) Chatbot

A chatbot is a program that runs in the background on a computer connected to a

network that waits for messages to be sent to it. Once a message is received from a user,

the chatbot decides what to respond with, and sends a message back to the user. This

way, the program can run unattended, and it makes its own choices of what to respond

with without human interaction or supervision.

1.2) Jabber

Jabber is an open-source Instant Messaging protocol that is based on XML. Jabber has other attractive features: the server is free, it has transports that allow it to work with other IM schemes, and the protocol is simple. The transports for MSN, ICQ, etc. are not used in this project and therefore will not be discussed.


1.3) Alice

Alice (Artificial Linguistic Internet Computer Entity) is an open-source program that performs natural language processing using rules. It is mostly used as a chat robot. It uses AIML (Artificial Intelligence Markup Language), an XML-compliant language, for the rules. There is a version called J-Alice that is written in C++, which is the version used for this project.

1.4) Cocoa

Cocoa is a framework from Apple that runs under Mac OS X. Cocoa was developed from OpenStep and is therefore closely tied to Objective-C, an object-oriented language that is similar to C++. The other language that works with Cocoa is Java, but since the Jabber and Alice code is in C++, Objective-C is a good fit because it can work in unison with C++. In addition, the development tools for Cocoa are free and included with Mac OS X.

2) Alice

The next three sections describe how Alice works, the rules it matches against, and the AIML files that contain these rules.


2.1) How Alice Works

Internally, Alice is composed of two main items: the Kernel, and the Handlers that work with the additional XML tags. The Kernel is responsible for loading the AIML files, which contain the rules, and for processing statements. In the chatbot, a new Kernel is created when the application is launched. This Kernel reads in the specified AIML files. Once the files are read in, Alice is ready to match statements to rules and provide an answer. When a statement is passed into Alice, it is matched to a rule, and the response is built from the information in that rule.

2.2) Rules

Once the AIML files are read in, each rule is “learned”. Once Alice has learned a rule, it can match the rule against a message received from an outside user. Matching is done by checking whether the message fits the grammar of the rule. For example, the rule “_ school” matches the word “school” after the beginning of a sentence, so Alice would match it to the message “I am at school”, where the word “school” comes after the beginning of the sentence. By having multiple rules, Alice can give a meaningful answer to a wide range of questions about the same topic.

There are two main symbols that are used for most of the rules for the chatbot: ‘*’ and ‘_’. The ‘*’, or wildcard, symbol tells Alice to match any word in its place, so the rule “I like *” would match “I like computers” or “I like skiing”. The ‘_’ symbol can be used at the beginning or the end of a rule. If it is used at the beginning, like the rule “_ dogs”, Alice will match up any sentence that has the word “dogs” after the beginning of the sentence. If it is used at the end of a rule, like “_ school”, Alice will match up any sentence that ends with the word school. The ‘*’ and ‘_’ symbols can be used together in a rule, but Alice only supports one ‘*’ per rule.
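The wildcard matching described above can be illustrated with a small Python sketch. It is not Alice's actual matching algorithm, and it is written in Python rather than the project's C++. As a simplifying assumption, both ‘*’ and ‘_’ are treated as standing for one or more words, and the function names are hypothetical.

```python
import re

WILDCARDS = ("*", "_")

def _clean(text):
    """Upper-case the text and strip punctuation so only words remain."""
    return " ".join(re.sub(r"[^\w\s]", "", text.upper()).split())

def pattern_to_regex(pattern):
    """Translate a rule pattern into a compiled regular expression.

    Assumption: each wildcard stands for one or more whole words.
    """
    parts = []
    for token in pattern.split():
        if token in WILDCARDS:
            parts.append(r"\S+(?: \S+)*")
        else:
            parts.append(re.escape(_clean(token)))
    return re.compile("^" + " ".join(parts) + "$")

def matches(pattern, message):
    """Return True if the whole message fits the pattern."""
    return pattern_to_regex(pattern).match(_clean(message)) is not None
```

Under these assumptions, `matches("I LIKE *", "I like skiing")` and `matches("_ SCHOOL", "I am at school")` both succeed, while `matches("_ SCHOOL", "School is fun")` does not, because nothing precedes the word “school”.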

2.3) AIML

The files that contain the rules are called AIML files. The files are based on XML and contain information about how to handle specific questions. For use in the chatbot, there are 4 important tags in the AIML files. An example of an AIML file containing one rule, and the meaning of each tag, are as follows:

<aiml version="1.0">
<category>
<pattern>_ happy *</pattern>
<template>Why are you happy?</template>
</category>
</aiml>

<aiml> - This tag indicates that the file is of type AIML and the version is 1.0.


<category> - This tag tells Alice that everything inside of it is a unit of knowledge which

is also referred to as a rule.

<pattern> - This tag is what needs to be matched up for the rule to occur. In this case if

the question passed in contains the word “happy” somewhere other than the beginning of

the sentence, the result “Why are you happy?” will be passed back to the user.

<template> - This tag contains the answer that Alice will pass back to the user if the rule

is matched up with the question.

There are many more tags available to use, such as the <srai> tag for recursive pattern

matching, but since they were not in the scope of this project, they will not be discussed

here. More information can be found on the web at http://www.alicebot.org.
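As a rough illustration of how the four tags fit together, the following Python sketch reads an AIML document and collects each category as a (pattern, template) pair. It uses Python's standard XML parser rather than the J-Alice Kernel, and `load_rules` is a hypothetical name chosen for this example.

```python
import xml.etree.ElementTree as ET

AIML_TEXT = """<aiml version="1.0">
<category>
<pattern>_ happy *</pattern>
<template>Why are you happy?</template>
</category>
</aiml>"""

def load_rules(aiml_text):
    """Return a list of (pattern, template) pairs, one per <category>."""
    root = ET.fromstring(aiml_text)
    rules = []
    for category in root.iter("category"):
        pattern = category.findtext("pattern", default="").strip()
        template = category.findtext("template", default="").strip()
        rules.append((pattern, template))
    return rules

rules = load_rules(AIML_TEXT)
```

Here `rules` holds a single pair, the pattern “_ happy *” and its template. A template that contains nested tags would need more handling than `findtext` provides.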

3) Jabber

The next two sections describe how Jabber works and give an example of an instant message.

3.1) Architecture

Jabber uses a client-server architecture as opposed to the client-client architecture that some other IM systems use. This enables a Jabber user to message other users who are not on the same Jabber server. When a user is ready to log in to a server, a TCP/IP connection is made on port 5222. This connection stays alive until the user logs off. When a message arrives on the server for a user, the user’s client is sent the message. This means that the client does not have to poll the server to see if there are messages waiting, which reduces the amount of traffic flowing to the clients. Because each user could be on a different server, the user’s address is their username@servername. For example, it could be [email protected]. All messages that are exchanged between the server and clients, including connection, registration, and instant messages, use Jabber’s XML protocol.

If two users are communicating but are on different servers, Jabber ensures that the message goes to the right person. Suppose there are two users, user1 and user2, and they are on server1.com and server2.com respectively. If user1@server1.com sends a message to user2@server2.com, server1 will connect with server2 and deliver the message. server2 will then forward the message on to user2.
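The username@servername addressing scheme can be sketched in a few lines of Python. The helper names are hypothetical, and the optional “/resource” suffix is handled only because it appears in the sample messages later in this report.

```python
def split_address(address):
    """Split 'username@servername/resource' into its three parts.

    The resource part is returned as an empty string when absent.
    """
    bare, _, resource = address.partition("/")
    username, _, server = bare.partition("@")
    return username, server, resource

def needs_server_to_server(sender, recipient):
    """True when the two addresses live on different servers."""
    return split_address(sender)[1] != split_address(recipient)[1]
```

For the two users above, `needs_server_to_server("user1@server1.com", "user2@server2.com")` is true, which corresponds to the case where server1 must contact server2.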

3.2) Message Example

In the Jabber protocol, there are two different types of instant messages that can be passed between the server and client once the client has logged in. The first is a single instant message, and the second is a chat message. A single message is exactly what its name suggests: one message that is not part of a group of messages. The message is sent and the client waits for a response. A chat message, on the other hand, is part of a chat session, where each message is part of a group of messages that each user can see. The advantage of chatting is that you don’t need to fill out a new message each time you want to say something. The chat window stays open and all communications between users stay visible until the session is over.

Page 10: Honours Project 2 - Carleton Universitypeople.scs.carleton.ca/~arpwhite/documents/honours... · Natural Language Processing Chatbot for Stock Quotes Honours Project Report Matt Austin

Natural Language Processing Chatbot for Stock Quotes Honours Project Report Matt Austin - 264681

10

For the example, a single instant message will be used, since it is very similar to a chat message but contains a little more information. Here is a sample message, and a description of what it contains:

<message to='[email protected]' from='[email protected]'>
<subject>Hello</subject>
<body>How are you doing today?</body>
</message>

In this example, a message is being sent from [email protected] to [email protected]. Notice that the names of the sender and the recipient each contain the username followed by the address of the server. In this case, the addresses of the servers are different. The subject of the message is “Hello”, and the body of the message is “How are you doing today?”. The message is human-readable since it is in XML, which makes developing code for Jabber easier.
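A message of this shape could be assembled programmatically as in the following Python sketch. `build_message` is a hypothetical helper, the addresses in the usage line are the illustrative user1/user2 addresses from section 3.1, and the output uses double-quoted attributes, which are equivalent XML.

```python
import xml.etree.ElementTree as ET

def build_message(to_addr, from_addr, body, subject=None, chat=False):
    """Build a Jabber <message> element and return it as a string."""
    message = ET.Element("message", {"to": to_addr, "from": from_addr})
    if chat:
        # Chat-session messages are marked with type='chat'.
        message.set("type", "chat")
    if subject is not None:
        ET.SubElement(message, "subject").text = subject
    ET.SubElement(message, "body").text = body
    return ET.tostring(message, encoding="unicode")

sample = build_message("user2@server2.com", "user1@server1.com",
                       "How are you doing today?", subject="Hello")
```

Building the element through an XML library, rather than by string concatenation, means characters such as “<” or “&” in the body are escaped automatically.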

4) Chatbot

For this project, a chatbot was developed that uses Jabber for the IM protocol, Alice for the natural language processing and Cocoa for the UI. The following sections describe in detail the inner workings of this application. A UML overview of the different parts of the chatbot can be seen in Fig. B-2.


4.1) Purpose

The purpose of the chatbot is to allow a user to send it an IM asking about stock prices. The user sends a question about a particular stock, and the chatbot responds with the current price of that stock. For example, if the user sent the message “What price is ERICY at?”, the chatbot could respond with “ERICY is at $11.53” (ERICY is the stock ticker for Ericsson). There are two main benefits of the chatbot over a user manually going to a webpage and finding the price. The first is that the message that the user needs to send is small and only takes a few seconds to write. The second is that the user can ask in English what stock they would like to look at. This means they can ask in a variety of ways. Examples are: “How is Ericsson doing?”, “How is ERICY doing in the market?” or simply “ERICY”.

4.2) Choice of Technology

For the instant messaging part of the chatbot, Jabber was chosen for a number of reasons. The first is that the protocol is public; the second is that the server is free and runs under Mac OS X. The final reason is that there are numerous open-source clients, which made learning the Jabber protocol easier.

For the natural language processing, Alice was chosen because it is mature and there is a C++ version of the library. In addition, the C++ version (J-Alice) is free and open source, which made the integration into the chatbot easier.


For the UI, Cocoa was used because this project runs under Mac OS X and Cocoa

is a powerful and easy to use framework. This allowed the UI to be developed quickly,

so the focus could remain on the internal workings of the chatbot.

5) User Interface

The user interface for the chatbot is quite simple. There are text fields for the username and password of the Jabber account that the chatbot will be using, and for the address of the Jabber server. In addition, there is a button for connecting to the server and a button to refresh the rules that Alice uses.

FIGURE 1 – Screen shot of the chatbot user interface.

(1) This is the text field for the IP address of the Jabber server.

(2) This is the text field for the username that the chatbot will use to connect to the Jabber server.

(3) This is the text field for the password for the given username. Notice that it is a secure text field, in that the password is hidden from the user. This is to prevent someone who is looking at the screen from knowing the password.

(4) This is the button that the user must press to have the chatbot connect with the Jabber server.

(5) This button is used to have Alice re-read the AIML files that contain the rules.

(6) This is the area where incoming messages from the server are displayed.

To use the chatbot, the user must first fill in the username, password and server address text fields with the appropriate information. Once this is done, the user must press the “Connect” button. After this, the chatbot will connect with the Jabber server, and any incoming messages from the server will be displayed in the “Incoming Messages” text area.


FIGURE 2 – Screen shot of the chatbot after connecting to the server.

6) Program Flow

There are 3 major steps that the chatbot goes through from the time it receives a

question, to the time it responds. The first is communicating with Jabber to receive the

instant messages (questions). The second is processing the question with Alice to come

up with an answer. The last is returning this answer back to the user that sent the original

message. Each of these steps will be detailed in the sections below.

6.1) Receiving a Question

Before the chatbot can receive a question, it must first connect with the Jabber server and establish a connection. For this project, it is assumed that the user account that the chatbot is using has already been set up. Once the chatbot is connected to the server, it waits for an instant message (the question). After receiving a question, the chatbot parses it and prepares it for processing in Alice. Each of these three steps is detailed in the sections below.

6.1.1) Connecting to the Jabber Server

Connecting to the Jabber server is a fairly straightforward process since all the messages are in human-readable XML. To begin the connection, the user must first fill in the “Server IP”, “Username” and “Password” fields in the chatbot’s user interface. Once this is done, the user must click on the “Connect” button. This initiates the connection: a BSD socket is created and an attempt to connect to the Jabber server begins. Once the socket connection is made, the registration process begins. This process has 3 steps: connect, registration and presence. All of the sending and receiving with the Jabber server is done through BSD socket calls. A message sequence chart for the connection can be seen in Fig. B-3.

Now that a socket has been established, the first XML string is sent to the server. It

contains information telling the server that the chatbot would like to connect as a Jabber

user. This string contains the address of the server, and a message that identifies us as

type “client”. This string is:

"<?xml version=\"1.0\" encoding=\"UTF-8\" ?><stream:stream

to='24.42.217.7' xmlns='jabber:client'

xmlns:stream='http://etherx.jabber.org/streams'>"


The string also contains a web address which can be used for connections to AIM, ICQ, MSN, etc. This is not relevant in the scope of this project since only the Jabber protocol is being used. Therefore, it will not be discussed here.

The server receives the request for a connection, and it replies with an answer. This

response contains a unique identification number (id) that is assigned to all of the

messages in this connection coming from the server to the chatbot. This string is:

<?xml version='1.0'?><stream:stream

xmlns:stream='http://etherx.jabber.org/streams' id='3DF53FBB'

xmlns='jabber:client' from='24.42.217.7'>

The next step is to log in to the server as a user by sending a registration request. The request contains the username and password. In addition, the id of the message is “auth2”, which tells the server we are authenticating ourselves, and the <resource> tag is set to “client”, since we are a client of the server. The string sent is:

"<iq id='auth2' type='set'>

<query xmlns='jabber:iq:auth'>

<username>myUsername</username>

<password>myPassword</password>

<resource>client</resource>

</query>

</iq>"

The response from the server to this message is:

<iq id='auth2' type='result'/><stream:stream
xmlns:stream='http://etherx.jabber.org/streams' id='3DF53FBB'
xmlns='jabber:client' from='24.42.217.7'>

This message is close to the first message from the server, except that it begins with an <iq> element whose type is “result”. This indicates that the server has accepted our registration, and we must now announce our presence so that we are visible to other users. Once the chatbot is visible on the Jabber server, it can start to receive instant messages.

The last step is to send a presence message to the server, which is:

"<presence/>"

This is a simple message indicating that the chatbot is available and can accept messages

from other users.
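The three strings sent during the connection (stream header, authentication request and presence) can be generated as in the following Python sketch. It only builds the strings shown above and does not open a socket. The function names are hypothetical, and the server address and credentials in the usage line are the placeholder values from the examples above.

```python
from xml.sax.saxutils import escape

def stream_header(server):
    """First string sent: asks the server to open a client stream."""
    return ('<?xml version="1.0" encoding="UTF-8" ?>'
            "<stream:stream to='%s' xmlns='jabber:client' "
            "xmlns:stream='http://etherx.jabber.org/streams'>" % server)

def auth_request(username, password, resource="client"):
    """Second string sent: the jabber:iq:auth login request."""
    return ("<iq id='auth2' type='set'>"
            "<query xmlns='jabber:iq:auth'>"
            "<username>%s</username>"
            "<password>%s</password>"
            "<resource>%s</resource>"
            "</query></iq>"
            % (escape(username), escape(password), escape(resource)))

PRESENCE = "<presence/>"   # third string sent: announces availability

def handshake(server, username, password):
    """Return the three strings in the order they are sent."""
    return [stream_header(server), auth_request(username, password), PRESENCE]

steps = handshake("24.42.217.7", "myUsername", "myPassword")
```

In a real client each string would be written to the socket only after the server's reply to the previous one had been read; that exchange is omitted here.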

6.1.2) Receiving a Message

Now that the chatbot is connected to the server, it must wait for incoming messages (questions). This is accomplished by setting up a timer that polls the socket using the select() call every 0.10 seconds. By polling the socket, the chatbot can quickly see if an incoming message is waiting to be read. If select() reports that there is a message, the chatbot reads the message into a buffer, and the next step is to parse this data. A sample instant message could be:

<message type='chat' to='[email protected]' from='[email protected]/everybuddy'><body>What is the price of AAPL?</body></message>

Notice that the message has a minimal amount of data: the type of message, who it is from, who it is for, and the body. A message sequence chart for receiving a message can be seen in Fig. B-4.
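The polling step can be sketched with Python's `select` and `socket` modules, which wrap the same BSD calls. This is an illustration under stated assumptions rather than the chatbot's code: `read_if_ready` is a hypothetical helper, and a local socket pair stands in for the connection to the Jabber server.

```python
import select
import socket

def read_if_ready(sock, timeout=0.10, bufsize=4096):
    """Return pending bytes from sock, or None if none arrive in time."""
    readable, _, _ = select.select([sock], [], [], timeout)
    if not readable:
        return None
    return sock.recv(bufsize)

# A local socket pair stands in for the link to the Jabber server.
server_end, chatbot_end = socket.socketpair()

first = read_if_ready(chatbot_end)    # nothing has been sent yet
server_end.sendall(b"<message type='chat'>"
                   b"<body>What is the price of AAPL?</body></message>")
second = read_if_ready(chatbot_end)   # the message is now waiting

server_end.close()
chatbot_end.close()
```

The first call returns `None` because the socket is idle. A production reader would also have to cope with a message arriving split across several reads, which this sketch ignores.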


6.1.3) Parsing the Message

After receiving a message, it must be parsed to determine the specific information needed for a reply. This information includes who it is from, the subject, the body of the message, and whether it is a single instant message or part of a chat. The first, who it is from, is determined by examining the “from” property. It contains the username and the address of the sender. The subject, if there is one, is enclosed in the <subject></subject> tag; some messages only have a body. The body of the message is contained in the <body></body> tag. Finally, to determine if the message is a single instant message or part of a chat, the “type” property is examined. If it is equal to “chat” then the message is from a chat session, otherwise it is a single message. Here is an example using the sample message from the previous part:

<message type='chat' to='[email protected]' from='[email protected]/everybuddy'><body>What is the price of AAPL?</body></message>

From: “[email protected]”

To: “[email protected]”

Subject: “” (empty)

Body: “What is the price of AAPL?”
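The parsing step can be approximated in Python with the standard XML parser, as sketched below. `parse_message` is a hypothetical name, and the addresses in the sample stanza are made up for the demonstration (the addresses in the original sample are not recoverable).

```python
import xml.etree.ElementTree as ET

def parse_message(stanza):
    """Extract the fields needed for a reply from a <message> stanza."""
    root = ET.fromstring(stanza)
    return {
        "from": root.get("from", ""),
        "to": root.get("to", ""),
        "subject": root.findtext("subject", default=""),
        "body": root.findtext("body", default=""),
        # type='chat' marks a chat session; anything else is a single message.
        "is_chat": root.get("type") == "chat",
    }

# Hypothetical addresses, used only for illustration.
STANZA = ("<message type='chat' to='chatbot@server1.com' "
          "from='user1@server1.com/everybuddy'>"
          "<body>What is the price of AAPL?</body></message>")

fields = parse_message(STANZA)
```

For this stanza the subject comes back as an empty string and `is_chat` is true, which mirrors the field listing above.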


6.2) Processing the Question

Alice now needs to process the message so it can find an appropriate answer. This is done through a new handler called the StockHandler, whose job is to find the price of a specific stock. How the chatbot interacts with Alice, and how the StockHandler works, are described in the sections below. A message sequence chart for the processing of a message within Alice can be seen in Fig. B-5.

6.2.1) Interacting with Alice

Now that a question has been received from a user, it must be passed on to Alice for processing. Alice will determine the correct response to send back to the user based on the rules it previously learned and the context of the question. The chatbot does this by passing the body of the message to the Kernel class (which is part of Alice). Recall that the Kernel was created when the application was launched. The Kernel finds the appropriate handler, which in this case will be the StockHandler, and passes the information to it. So, for example, if the message body was “What is the price of Corel?”, then the StockHandler will end up receiving that sentence. The way the StockHandler works, and the rules involved, are detailed in the two sections below.


6.2.2) Stock Handler

Since the chatbot needs to respond with the stock price of a company, a new handler was added to Alice to deal with this. The job of this handler, called StockHandler, is to look up a ticker symbol on Yahoo Finance (http://finance.yahoo.com), parse the web page and return the price of that ticker at that moment. This is also known as “screen scraping”, which refers to the fact that the chatbot is only interested in one small part of the web page. To have this handler called when a stock price is needed, a new XML tag is used. This tag is defined as: <stock>ticker symbol</stock>. The ticker symbol that is enclosed in the tag is the stock that will be looked up. So when Alice matches the question to a rule, the rule has the stock ticker inside the <stock> tag. The StockHandler is then called with the ticker symbol. An example of this would be:

Question:

“What is the price of apple stock?”

Rule:

<category>

<pattern>APPLE STOCK *</pattern>

<template><stock>AAPL</stock></template>

</category>


Result:

Alice matches up the string “Apple Stock” to the ticker “AAPL”. The StockHandler

would then be called with AAPL as the ticker symbol since that is what is enclosed in the

<stock> tag. The price of Apple Computer (AAPL) would then be returned.
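The way a custom tag inside a template leads to a handler call can be sketched as follows. This is a Python illustration of the idea, not the J-Alice handler interface: `HANDLERS`, `render_template` and `fake_stock_handler` are hypothetical names, and the price table is made up so that the example runs without a network connection.

```python
import xml.etree.ElementTree as ET

def fake_stock_handler(ticker):
    """Stand-in for the StockHandler; a real one would look the price up."""
    prices = {"AAPL": "15.49"}   # made-up value for the demonstration
    return "%s is at $%s" % (ticker, prices.get(ticker, "unknown"))

# Maps a custom tag name to the function that handles it.
HANDLERS = {"stock": fake_stock_handler}

def render_template(template_xml):
    """Expand a <template>, calling a handler for each known child tag."""
    template = ET.fromstring(template_xml)
    pieces = [template.text or ""]
    for child in template:
        handler = HANDLERS.get(child.tag)
        inner = (child.text or "").strip()
        pieces.append(handler(inner) if handler else inner)
        pieces.append(child.tail or "")
    return "".join(pieces).strip()

reply = render_template("<template><stock>AAPL</stock></template>")
```

Keeping the tag-to-handler mapping in a table means further tags could be supported by registering another function, without changing the template expansion itself.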

The StockHandler is a fairly simple piece of code. It finds the stock price of a ticker symbol on the web and returns the current price. The steps involved are detailed below.

The first step was to build the correct web address for the given ticker symbol. This was done by taking a Yahoo Finance page address containing the placeholder <ticker symbol> and replacing the placeholder with the actual symbol. For example, if the stock ticker was “MTIB.TO”, then the address would contain MTIB.TO in that position. This page provides basic information for a stock, including the current stock price. Once this address was formed, the source for the page was downloaded. Within the source, a specific, unique string was searched for. This string is “</font></td><td nowrap><font face=arial size=-1><b>”. Immediately after this string is the price of the stock. Following the price is the string “</b>”, so it is easy to find where the price starts and ends. The following is a clipping of HTML source from the Yahoo Finance webpage, with the price enclosed in the <b> tag:

<font face=arial size=-1>Dec 11</font></td><td nowrap><font face=arial size=-1><b>15.49</b></font>


Once the price has been found, the string “<ticker symbol> is at <current price>” is

returned to the chatbot. An example of this could be “MTIB.TO is at $0.15”.
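
The search for the price can be sketched in a few lines. This is a minimal Python illustration of the string search described above, not the StockHandler's actual code: the function names are hypothetical, downloading the page is left out, and the ticker passed in at the end is only for demonstration. The marker strings and the sample HTML are taken from the text, and the dollar sign follows the example reply.

```python
# Marker strings quoted in the description of the StockHandler.
PRICE_PREFIX = "</font></td><td nowrap><font face=arial size=-1><b>"
PRICE_SUFFIX = "</b>"

def extract_price(page_source):
    """Return the text between the prefix marker and the next "</b>", or None."""
    start = page_source.find(PRICE_PREFIX)
    if start == -1:
        return None  # marker not found, e.g. the page layout changed
    start += len(PRICE_PREFIX)
    end = page_source.find(PRICE_SUFFIX, start)
    if end == -1:
        return None
    return page_source[start:end].strip()

def build_reply(ticker, page_source):
    """Format the reply string, or return None if no price was found."""
    price = extract_price(page_source)
    if price is None:
        return None  # assumption: the caller decides what to tell the user
    return f"{ticker} is at ${price}"

# The HTML clipping shown above, split over two string literals.
sample = ("<font face=arial size=-1>Dec 11</font></td><td nowrap>"
          "<font face=arial size=-1><b>15.49</b></font>")
print(build_reply("AAPL", sample))
```

Because the search depends on an exact string in the page source, any change to the page's markup would make `extract_price` return `None`, which is why the sketch checks for a missing marker instead of assuming it is present.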

6.2.3) Rules for Stock Handler

To match up the questions that users send to the chatbot, a special set of rules

for Alice needed to be made. These rules are contained in AIML files that Alice reads in

when the chatbot is launched.

Each stock ticker (ex: AAPL, CORL, etc.) has its own AIML file that Alice reads in on

startup. This way, when the user sends a question to the chatbot, Alice can try to match

the question to a rule. Since each rule corresponds to a stock ticker, Alice can send the

stock ticker to the StockHandler for further processing. The alternative would be to have

a list of every single stock ticker and company name, and have Alice check if the user

mentioned any of them along with certain keywords.

Each rule looks for a different way that the user could be asking the price of a stock.

Since there are many different variations, the most common were chosen. This amounts

to about 30 rules for each stock. There are different groups within the rules. There are

groups that look for the company’s name with specific keywords, and others that look for

the stock ticker. Examples of each group of rules for Apple Computer (AAPL):

Company name: _ STOCK MARKET, * APPLE COMPUTER *

_ APPLE COMPUTER'S PRICE *

Stock ticker: _ AAPL *

AAPL _

A complete list of the rules used for AAPL can be seen in Appendix A.
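
How such rules pick out a ticker can be illustrated with a simplified matcher. The sketch below is in Python and is not how Alice works internally: it assumes that `_` and `*` each stand for one or more words, it ignores the priority ordering that AIML gives the two wildcards, and it checks the rules one by one instead of using Alice's own matching structure. The names `normalize`, `pattern_to_regex`, `RULES` and `match_ticker` are hypothetical; the four patterns are the examples listed above.

```python
import re

# One or more whitespace-separated words.
WILDCARD = r"\S+(?: \S+)*"

def normalize(text):
    """Uppercase the text and drop punctuation other than straight apostrophes."""
    text = re.sub(r"[^A-Za-z0-9' ]+", " ", text.upper())
    return " ".join(text.split())

def pattern_to_regex(pattern):
    """Turn a rule such as "_ AAPL *" into a compiled regular expression."""
    parts = []
    for token in pattern.split():
        if token in ("_", "*"):
            parts.append(WILDCARD)
        else:
            word = normalize(token)  # e.g. "MARKET," becomes "MARKET"
            if word:
                parts.append(re.escape(word))
    return re.compile("^" + " ".join(parts) + "$")

# Example rules for Apple Computer, each paired with its ticker.
RULES = [(pattern_to_regex(p), "AAPL") for p in (
    "_ STOCK MARKET, * APPLE COMPUTER *",
    "_ APPLE COMPUTER'S PRICE *",
    "_ AAPL *",
    "AAPL _",
)]

def match_ticker(question):
    """Return the ticker of the first rule that matches, or None."""
    normalized = normalize(question)
    for regex, ticker in RULES:
        if regex.match(normalized):
            return ticker
    return None

print(match_ticker("What is AAPL trading at?"))
print(match_ticker("How is the weather?"))
```

With only these four rules, a question that ends in the ticker (for example "What is the price of AAPL?") is not matched, which shows why each stock needs a larger set of rules covering the common phrasings.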

6.3) Replying With an Answer

The final step in processing a question from a user is to respond with a valid

answer. Since the chatbot deals with stock prices, it would be logical to send a message

back to the user with the stock price they were asking about. Once the chatbot has the

answer from Alice, it sends a message with the answer back to the user. This message

will either be part of a chat, or a single instant message.

An example of a chat message is:

<message type='chat' to='[email protected]' from='[email protected]/Chatbot'><body>ERICY is at $8.93</body></message>

An example of a single instant message is:

<message to='[email protected]' from='[email protected]/Chatbot'><subject>Hello</subject><html xmlns='http://www.w3.org/1999/xhtml'><body>ERICY is at $8.93</body></html></message>
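
The two message forms can also be built programmatically. The following Python sketch uses the standard library's `xml.etree.ElementTree` and mirrors the element layout of the two examples above; it is an illustration only, not the chatbot's Jabber code. The function names and the `example.org` addresses are hypothetical, and the serializer writes attribute values with double quotes rather than single quotes, which is equivalent XML.

```python
import xml.etree.ElementTree as ET

def build_chat_message(to_jid, from_jid, text):
    """Build a <message type='chat'> element with a plain <body>."""
    message = ET.Element("message", {"type": "chat", "to": to_jid, "from": from_jid})
    ET.SubElement(message, "body").text = text
    return ET.tostring(message, encoding="unicode")

def build_single_message(to_jid, from_jid, subject, text):
    """Build a single instant message with a <subject> and an XHTML <body>."""
    message = ET.Element("message", {"to": to_jid, "from": from_jid})
    ET.SubElement(message, "subject").text = subject
    html = ET.SubElement(message, "html", {"xmlns": "http://www.w3.org/1999/xhtml"})
    ET.SubElement(html, "body").text = text
    return ET.tostring(message, encoding="unicode")

# Hypothetical addresses; "/Chatbot" is the resource used in the examples above.
print(build_chat_message("user@example.org", "chatbot@example.org/Chatbot",
                         "ERICY is at $8.93"))
print(build_single_message("user@example.org", "chatbot@example.org/Chatbot",
                           "Hello", "ERICY is at $8.93"))
```

Building the message as an element tree rather than by string concatenation means that characters such as `<` or `&` in an answer are escaped automatically.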

So, an example of a sequence of messages could be:

User: “What is the price of Ericsson?”

Chatbot: “ERICY is at $9.10”

7) Testing

There are two parts to the testing phase. The first was to ensure that the chatbot

connected to the Jabber server properly. The second was to use an external Jabber client,

in this case Fire, and have it initiate a chat session with the chatbot. These steps, along

with what was expected and the actual results, are detailed in the sections below.

7.1) What Was Expected

It was expected that the chatbot would be able to successfully connect to the

Jabber server as well as respond with the correct answer to incoming messages from an

outside Jabber client.

7.2) Connecting to the Jabber Server

First off, the chatbot was launched. Once the user interface appeared, the

username was set to “maustin” and the password to “test”. This is an account that was

created earlier on the Jabber server for the chatbot to use. The server address was set to

“24.42.217.7” since this is my home machine’s address (the server was running on it).

Then the “Connect” button was pressed so that the chatbot would begin to connect to the

server. Two messages appeared in the chatbot’s “Incoming Messages” text box which

confirmed that it had connected successfully to the server.

FIGURE 3 - Two messages received while connecting to the server.

In addition, debug output from the Jabber server shows that there is now 1 user, which is

the chatbot. This text is: “Sat Dec 14 17:02:53 2002 usercount 1 total users”

7.3) Responding to Messages

The next step in the testing was to ensure that the chatbot could respond properly

to incoming messages. The external Jabber client used is called “Fire” and was used to

connect to the server with username “user” and password “user”. This account was set

up earlier and used for testing purposes. Messages asking for stock quotes were sent to

the chatbot. The chatbot responded to each question.

FIGURE 4 - Screen shot of the chat session within Fire. Notice that the same question

was asked using different sentences, but the answer was always the same.

FIGURE 5 - A screen shot showing the received messages.

7.4) Results

The result of the testing was that the chatbot performed exactly as expected. It

was able to connect to the Jabber server and respond correctly to the messages it

received. In addition it was able to screen scrape the price of AAPL (Apple Computer)

and figure out the correct answer using Alice.

8) Conclusion

Overall, the chatbot project was a success. Integrating the Cocoa UI, Jabber and

Alice was a challenge considering the UI was developed in Objective-C, the Jabber

routines were in C/Objective-C and the Alice code was in C++. The fact that it all works

together is nice to see, and the end result is a functioning chatbot that is able to give stock

quotes from English questions sent to it.

This project allowed me to learn about topics I didn’t even know about. Jabber

was new to me, and after seeing how easy it is to implement a client I was impressed.

Learning about natural language processing was a bit of a challenge since I had never

researched anything on this topic. After finding an open source version of Alice that was

written in C++, I found it was fairly easy after a few modifications to have it working

with the chatbot. Learning AIML took a while since it was new, but after learning it, the

rules were easy to create. Putting it all together gave me the experience of working with

multiple packages, integrating them, and having a user interface to show the user what

was happening.

Because the chatbot can be easily extended to handle other topics, such as weather

reports, how many items are in stock, or even someone’s telephone number, it lends itself

to being useful in a business setting. For example, it could be used in e-commerce to

quickly find out how much a product costs from a particular store. This would be quick

given that the user could ask the question in English, and have a rapid response. It would

save the user the time of going to the store’s webpage, searching for the item, and then

finding the price.

Perhaps asking questions in a natural manner will be the future of computer

interactions. Certain web pages could be replaced by, or complemented with, a chatbot that

would respond to simple user questions. Either way, this project is an excellent start for a

full-fledged chatbot, or a specialized, custom chatbot.

8.1) Future Work

There are a few things that could be added to the chatbot to make it even more

useful. They are:

a) Have an intermediate server with a static IP that the chatbot connects to. This means

that any user wishing to communicate with the chatbot only has to know the address of

the intermediate server, which is static.

b) Add a “disconnect” feature that gracefully logs off from the Jabber server.

c) Add functionality to create a new Jabber user through the chatbot.

d) Have the chatbot keep a log of all the incoming/outgoing messages.

e) Add new handlers for Alice to respond to topics other than stock prices. This could

include weather reports, or current news.

f) Allow the user to save or print the incoming/outgoing messages.

g) Add rules for different stock tickers.

8.2) Bugs

a) The chatbot crashes in the Alice code after numerous messages are sent to the chatbot.

Because of time constraints, this bug has not been fixed. Since the Alice code is 3rd

party, perhaps an update of this code from the author might fix the problem.

b) If the Jabber server is not available, the application might crash when it tries to

connect.
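
One way to guard against bug (b) is to check that the server accepts a connection before starting the login, and to report a failure instead of continuing. The Python sketch below shows the idea only; it is not taken from the chatbot's code. The function name `server_reachable` is hypothetical, and port 5222 is an assumption (it is the usual Jabber client port, but the port used in this project is not stated).

```python
import socket

def server_reachable(host, port=5222, timeout=5.0):
    """Return True if a TCP connection to the server can be opened."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    # Hypothetical usage: show a message in the user interface instead of crashing.
    if not server_reachable("127.0.0.1"):
        print("Could not reach the Jabber server; please check the address.")
```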

9) References

Jabber Software Foundation – “Jabber :: About”, 2002. [On-line]

http://www.jabber.org/about/overview.html

Horn, Max & Moore, Jason – “JabberFoX – A Jabber Client for Mac OS X”, 2002. [On-line]

http://jabberfox.sourceforge.net

Func@all - “Func@ll : How to build Jabber 1.4.1 on Apple Mac OS X”, 2002. [On-line]

http://www.funcall.com/Documentation/Jabber/BuildingJabberOnMacOSX.html

The A.L.I.C.E. AI Foundation – “A.L.I.C.E. AI Foundation”, 2001-2002. [On-line]

http://www.alicebot.org

SourceForge.net – “J-Alice :: Home”, 2002. [On-line]

http://j-alice.sourceforge.net

Ringate, Thomas – “ALICE AIML Primer”, 2001. [On-line]

http://www.comp.mq.edu.au/courses/comp248/Resources/aiml-primer.html

Apple Computer, Inc. – “Cocoa Developer Documentation”, 2002. [On-line]

http://developer.apple.com/techpubs/macosx/Cocoa/CocoaTopics.html

10) Licenses

License for J-Alice (from http://www.opensource.org/licenses/mit-license.php):

Permission is hereby granted, free of charge, to any person

obtaining a copy of this software and associated

documentation files (the "Software"), to deal in the

Software without restriction, including without limitation

the rights to use, copy, modify, merge, publish,

distribute, sublicense, and/or sell copies of the Software,

and to permit persons to whom the Software is furnished to

do so, subject to the following conditions:

The above copyright notice and this permission notice shall

be included in all copies or substantial portions of the

Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY

KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE

WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR

PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS

OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR

OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR

OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE

SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Appendix A: Rules for the StockHandler

The file stocks.aiml contains the rules that the StockHandler uses to find

prices for AAPL. There are 3 different groups of rules that are in this file.

One looks for the stock ticker (AAPL), one looks for the company’s name

with specific keywords, and the other does the same but uses an abbreviated

version of the company’s name. Here are the rules that are used and which

group they belong in:

Stock Ticker Group:

AAPL

AAPL _

_ AAPL

_ AAPL *

Company’s Name with Keywords Group:

APPLE COMPUTER STOCK

_ APPLE COMPUTER STOCK *

_ APPLE COMPUTER STOCK

_ STOCK MARKET, * APPLE COMPUTER *

_ APPLE COMPUTER * STOCK MARKET

_ MARKET * APPLE COMPUTER *

_ APPLE COMPUTER * MARKET *

_ MARKET, * APPLE COMPUTER *

_ PRICE * APPLE COMPUTER *

_ PRICE * APPLE COMPUTER

_ APPLE COMPUTER'S PRICE *

_ APPLE COMPUTER'S PRICE

Company’s Abbreviated Name with Keywords Group:

APPLE STOCK

_ APPLE STOCK *

_ APPLE STOCK

_ STOCK MARKET * APPLE *

_ STOCK MARKET, * APPLE *

_ APPLE * STOCK MARKET *

_ APPLE * STOCK MARKET

_ MARKET * APPLE *

_ APPLE * MARKET *

_ MARKET, * APPLE *

_ PRICE * APPLE *

_ PRICE * APPLE

_ APPLES * PRICE *

_ APPLES PRICE *

_ APPLE'S * PRICE *

_ APPLE'S PRICE

_ APPLE'S PRICE *