

QA for the Web

Language Computer Corporation

www.languagecomputer.com

Dallas, Texas

PI: Dan Moldovan

moldovan@languagecomputer.com

Motivation

In the US alone, there are more than 100 million Internet users per day

Each user asks 5 questions on average

Each user spends about half an hour finding answers

Tasks

Task 1 – Adapt the QA technology to the universality of the Web hypertexts

Task 2 – Interface the QA system with the emerging Semantic Web technologies

Task 1 – Adapt QA Technology to the Web

Two approaches:

1. Use available search engines

2. Gather documents from the Web and form a local collection

QA on Top of a Search Engine

[Diagram] Components: Question Processing, Search Engine, Format Manager, Paragraph Retrieval, Answer Processing

Data passed between components: Keywords, Documents, Normalized Documents
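The components named on this slide can be wired together roughly as follows. This is a minimal sketch, not the actual LCC system: the order of the stages is inferred from the diagram labels, the search engine is replaced by a scan over an in-memory list, and every function name, the stopword list, and the scoring rule are assumptions made for illustration.

```python
import re
from html import unescape

# Hypothetical stopword list; a real system would use a much larger one.
STOPWORDS = {"a", "an", "the", "is", "of", "who", "what", "where", "when",
             "how", "does", "do", "i", "can"}


def question_processing(question):
    """Reduce a question to lowercase keywords, dropping stopwords."""
    words = re.findall(r"[a-z0-9]+", question.lower())
    return [w for w in words if w not in STOPWORDS]


def search_engine(keywords, corpus):
    """Stand-in for an external search engine: raw documents with any keyword."""
    return [doc for doc in corpus if any(k in doc.lower() for k in keywords)]


def format_manager(raw_html):
    """Normalize one raw HTML document into a list of plain-text paragraphs."""
    paragraphs = []
    for chunk in re.split(r"</p>|<br\s*/?>", raw_html, flags=re.IGNORECASE):
        text = re.sub(r"<[^>]+>", " ", chunk)               # drop remaining tags
        text = re.sub(r"\s+", " ", unescape(text)).strip()  # collapse whitespace
        if text:
            paragraphs.append(text)
    return paragraphs


def paragraph_retrieval(paragraphs, keywords):
    """Keep paragraphs that contain a keyword, best keyword overlap first."""
    scored = []
    for paragraph in paragraphs:
        tokens = set(re.findall(r"[a-z0-9]+", paragraph.lower()))
        score = sum(1 for k in keywords if k in tokens)
        if score:
            scored.append((score, paragraph))
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable for ties
    return [paragraph for _, paragraph in scored]


def answer_processing(ranked_paragraphs):
    """Placeholder: return the top paragraph instead of extracting an exact answer."""
    return ranked_paragraphs[0] if ranked_paragraphs else None


def answer_question(question, corpus):
    """Run the full chain: question -> keywords -> documents -> paragraphs -> answer."""
    keywords = question_processing(question)
    documents = search_engine(keywords, corpus)
    paragraphs = [p for doc in documents for p in format_manager(doc)]
    return answer_processing(paragraph_retrieval(paragraphs, keywords))
```

Calling `answer_question(question, corpus)` with a list of raw HTML strings returns the best-matching paragraph, or `None` when no document contains a keyword.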

QA on Top of a Database Engine

[Diagram] Components: Question Processing, Query Builder, Database Engine, Format Manager, Paragraph Retrieval, Answer Processing

Data passed between components: Keywords, Query, Database Records, Normalized Documents
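The Query Builder step, which turns keywords into a database query, might look like the sketch below. The slide does not say which database or query language was used, so SQLite, the `records` table, and the `body` column are all assumptions chosen to keep the example self-contained.

```python
import sqlite3


def build_query(keywords, table="records", column="body"):
    """Build a parameterized SQL query matching records that contain any keyword.

    The table and column names are hypothetical and must come from trusted
    code, since they are interpolated into the SQL text. Keyword values are
    passed as bound parameters.
    """
    if not keywords:
        raise ValueError("at least one keyword is required")
    where = " OR ".join(f"{column} LIKE ?" for _ in keywords)
    sql = f"SELECT id, {column} FROM {table} WHERE {where}"
    params = [f"%{keyword}%" for keyword in keywords]
    return sql, params


def fetch_records(connection, keywords):
    """Run the built query and return the matching (id, body) rows."""
    sql, params = build_query(keywords)
    return connection.execute(sql, params).fetchall()
```

The rows returned by `fetch_records` play the role of the Database Records in the diagram and would then go through the Format Manager like any other document.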

Technical challenges

Different formats: pdf, html, doc, ps

Document layout

Pages dynamically generated

Password protection

Subscription required

Cookies

Build local collections of documents

Gather documents from a specific site, and cache locally

Transform into a canonical text form, then index the documents

Maintain document collection: constantly update, avoid redundant documents, garbage collection, etc.
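The steps on this slide (cache, canonicalize, index, update, avoid redundant documents) can be sketched with an in-memory collection. The class below is illustrative only: `LocalCollection`, its methods, the hash-based duplicate check, and the tag-stripping canonical form are assumptions, not a description of the system used in the experiments.

```python
import hashlib
import re
from collections import defaultdict


class LocalCollection:
    """Toy local document collection with an inverted index."""

    def __init__(self):
        self.documents = {}             # url -> canonical text
        self.hashes = {}                # content hash -> url that owns it
        self.index = defaultdict(set)   # term -> set of urls

    @staticmethod
    def canonical_text(raw):
        """Strip markup and collapse whitespace (assumed canonical form)."""
        text = re.sub(r"<[^>]+>", " ", raw)
        return re.sub(r"\s+", " ", text).strip()

    @staticmethod
    def _digest(text):
        return hashlib.sha256(text.lower().encode("utf-8")).hexdigest()

    @staticmethod
    def _terms(text):
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    def add(self, url, raw):
        """Add or update a document; return False for a redundant copy."""
        text = self.canonical_text(raw)
        if url in self.documents:
            self.remove(url)            # update: drop the stale version first
        digest = self._digest(text)
        if digest in self.hashes:
            return False                # same content already stored elsewhere
        self.documents[url] = text
        self.hashes[digest] = url
        for term in self._terms(text):
            self.index[term].add(url)
        return True

    def remove(self, url):
        """Garbage-collect a document and its index entries."""
        text = self.documents.pop(url, None)
        if text is None:
            return
        self.hashes.pop(self._digest(text), None)
        for term in self._terms(text):
            self.index[term].discard(url)
            if not self.index[term]:
                del self.index[term]

    def search(self, term):
        """Return the sorted urls of documents containing the term."""
        return sorted(self.index.get(term.lower(), set()))
```

Hashing the canonical text rather than the raw bytes means two pages that differ only in markup or spacing count as the same document.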

Experiments

Business: InterVoice Brite Product Manuals

Community: City of Irving

NEWS: cnn.com, abcnews.com, dallasnews.com, time.com, washingtonpost.com

InterVoiceBrite

Collection: product manuals

size: 38MB

files: 802

format: PDF

layout: specific to manuals

changes occur at large time intervals

PECULIARITIES OF THEIR NEEDS

The Question is in the form of a problem description

The expected answer is a solution to the problem

The answer is compiled from different parts of documents and given in the form of a procedure to be followed

Follow-up questions frequently lead to a dialogue

An Example

Question: “I would like to have the caller be able to control the playback of a long set of instructions with speech recognition. While the message is playing, the caller may say ‘stop’, ‘go back’, ‘forward’, ‘start over’ and have the system respond appropriately. Can this be done? The SpeechAccess engine is Nuance.”

Answer: “Yes, this can be done. Play a lead-in message to tell the caller to say ‘next’, ‘backup’ or ‘done’. Then with the loop play the first instruction you want the caller to hear in keyover mode. To obtain line balancing procedure and the required files please visit the continuing engineering web page.”

Our Demo

Q: How can I obtain line balancing information?

A: READ DSLAC Request AI1 DSLAC line balancing information

Q: How can I modify a message?

A: Your Voice The feature that enables a voice mail user to change specific voice messages

Q: What is the runtime engine?

A: ISINIT, the runtime engine,

Our Demo

Q: What type of error is HH?

A: Hardware Handler (HH) error

Q: What causes telephony connection problems?

A: Telephony connection problems can be caused by the InterSoft system or by the telephony equipment (PBX)

Q: What does FUSE mean?

A: FUSE Indicates a problem with the fuse

City of Irving

Collection: heterogeneous, city information

size: 96MB

files: 1097

format: HTML, PDF, DOC

layout: WWW space

small daily changes

Examples

Q: When does the Farmer’s Market take place?

A: Irving Farmers’ Market: 1st and 3rd Saturdays in Downtown Irving

Q: What is Irving’s news source?

A: Irving’s news source is the City Spectrum

Q: Where does Irving’s water supply come from?

A: The City of Irving purchases its entire water supply from the City of Dallas

Examples

Q: Where can I pay traffic fines?

A: Irving Municipal Court Criminal Justice Center 305 N. O’Connor Rd

Q: How do I apply for a job with the City?

A: Applications are accepted from 8 a.m. to 5 p.m. Monday – Friday at the Civic Center Complex, 825 W. Irving Blvd. Job listings are available on the city’s Web site, www.ci.irving.tx.us, or by calling the city’s 24-hour job line at (972) 721 3773

NEWS

Collection:

sources: CNN.COM, TIME.COM, ABCNEWS.COM, DALLASNEWS.COM, WASHINGTONPOST.COM

size: 531MB

files: 55880

format: HTML, PDF, DOC

frequent changes

Issues

broken links

garbage collection for obsolete files

cumulative NEWS updates depending on the type of source (TIME.COM - weekly)
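Source-dependent updates could be driven by a per-source refresh interval, as in the sketch below. Only the weekly interval for TIME.COM comes from the slide; the daily interval for CNN.COM, the default interval, and the function name `sources_due` are assumptions made for illustration.

```python
from datetime import datetime, timedelta

# Refresh interval per source. Only the TIME.COM value is from the slide.
REFRESH_INTERVAL = {
    "time.com": timedelta(weeks=1),
    "cnn.com": timedelta(days=1),      # assumed
}
DEFAULT_INTERVAL = timedelta(days=1)   # assumed for sources not listed


def sources_due(last_fetched, now):
    """Return the sorted names of sources whose refresh interval has elapsed.

    last_fetched maps a source name to the datetime of its last crawl.
    """
    due = []
    for source, fetched_at in last_fetched.items():
        interval = REFRESH_INTERVAL.get(source, DEFAULT_INTERVAL)
        if now - fetched_at >= interval:
            due.append(source)
    return sorted(due)
```

A crawler would call `sources_due` on each run and re-fetch only the listed sources, which keeps a weekly source from being crawled as often as a daily one.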

Examples

Q: How many soldiers died in Afghanistan?

A: The US military has opened an investigation into last week’s friendly fire incident in Afghanistan that killed four Canadian soldiers and injured eight others

Q: How much did President Bush increase aid for poor countries?

A: Bush said the US will increase its initial pledge of $200 million only after the fund proves successful

Q: Who is the owner of Dallas Mavericks?

A: Mark Cuban, Internet entrepreneur and owner of the NBA’s Dallas Mavericks

QA and Semantic Web

QA technology can contribute to the development of the Semantic Web

Possible architectures:

1. QA as an interface between Intelligent Agent and the Semantic Web

[Diagram: Human, Agent, QA, Web]

QA and Semantic Web

2. QA works on a local collection

[Diagram: Human, Agent, QA, Web, Local Collection]

[Diagram: Human, Agent, QA, Local Collection]

Technical Challenges to be Addressed

1. Make the QA system compatible with Semantic Web languages (e.g. XML, RDF, DAML, OIL)

2. Make QA ontologies compatible with the Semantic Web ontology

3. Interface the QA system with Intelligent Agents
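As one illustration of the first challenge, a question/answer pair could be exported as RDF statements in N-Triples syntax. This is only a sketch of the idea: the `http://example.org/qa` namespace, the `question` and `answer` properties, and both function names are hypothetical, and the slides do not describe how the QA system would actually represent its output.

```python
def ntriples_literal(text):
    """Quote a string as an N-Triples literal, escaping special characters."""
    escaped = (text.replace("\\", "\\\\")
                   .replace('"', '\\"')
                   .replace("\n", "\\n")
                   .replace("\r", "\\r"))
    return f'"{escaped}"'


def qa_pair_to_ntriples(pair_id, question, answer, base="http://example.org/qa"):
    """Return two N-Triples lines describing one question/answer pair.

    The base namespace and property names are placeholders, not a real
    vocabulary.
    """
    subject = f"<{base}/pair/{pair_id}>"
    return [
        f"{subject} <{base}#question> {ntriples_literal(question)} .",
        f"{subject} <{base}#answer> {ntriples_literal(answer)} .",
    ]
```

Each returned line is one subject, predicate, object statement, so the output can be concatenated into a file that any RDF parser supporting N-Triples should be able to read.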

Thank you!