watson computer
Post on 04-Aug-2015
WATSON COMPUTER
INTRODUCTION
WATSON is an artificially intelligent computer system capable of answering questions posed in natural language, developed in IBM's DeepQA project.
QA technology takes a question expressed in natural language, seeks to understand it in detail, and returns a precise answer to the question.
Watson applies advanced natural language processing, information retrieval, knowledge representation, automated reasoning, and machine learning technologies for this purpose.
It incorporates a local corpus (database) of information.
JEOPARDY!
Watson was initially developed to answer questions on Jeopardy!, a quiz show known for its tricky questions.
Watson competed in 2011 against former champions Brad Rutter and Ken Jennings and defeated them.
REQUIREMENTS
• 90 IBM Power 750 servers
• 2,880 POWER7 cores
• POWER7 3.55 GHz chips
• 500 GB per second on-chip bandwidth
• 10 Gb Ethernet network
• 16 terabytes of memory
• 20 terabytes of clustered disk
• Can operate at 80 teraflops
• Runs IBM DeepQA software
• Scales out to search vast amounts of unstructured information with UIMA & Hadoop open-source components
• Linux provides a scalable, open platform, optimized to exploit POWER7 performance
• 10 racks hold the servers, networking, shared disk system, and cluster controllers
ALGORITHMS USED
1. SVM (Support Vector Machines) Classifier
• SVM is a supervised learning model that analyzes data and recognizes patterns
• Given a set of training examples, each marked as belonging to one of two categories, it builds a model that assigns new examples into one category or the other
• It is a non-probabilistic binary linear classifier.
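The bullets above can be made concrete with a minimal sketch of a linear SVM trained by the Pegasos-style sub-gradient method on the hinge loss. The toy data, hyperparameters, and function names are illustrative, not taken from Watson's actual implementation:

```python
def train_linear_svm(data, labels, lam=0.01, epochs=200):
    """Train a linear SVM with a Pegasos-style sub-gradient method.

    Minimizes lam/2 * ||w||^2 + hinge loss. Labels must be +1 or -1.
    """
    dim = len(data[0])
    w = [0.0] * dim
    b = 0.0
    t = 0
    for _ in range(epochs):
        for x, y in zip(data, labels):
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # sub-gradient of the regularized hinge loss at a violation
                w = [(1 - eta * lam) * wi + eta * y * xi for wi, xi in zip(w, x)]
                b += eta * y
            else:
                # only the regularizer contributes to the sub-gradient
                w = [(1 - eta * lam) * wi for wi in w]
    return w, b

def predict(w, b, x):
    """Non-probabilistic binary decision: sign of the linear score."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Toy training set: two linearly separable 2-D clusters.
X = [[1.0, 1.2], [1.5, 0.8], [0.9, 1.0], [3.0, 3.2], [3.5, 2.8], [2.9, 3.0]]
y = [-1, -1, -1, 1, 1, 1]
w, b = train_linear_svm(X, y)
print([predict(w, b, x) for x in X])
```

The learned hyperplane assigns any new example to one of the two categories, matching the description above.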
2. Naïve Bayes’ Classifier
• It is a family of classifiers based on applying Bayes' theorem with strong (naive) independence assumptions between the features.
• So it is a conditional probability model.
• Particularly suited when the dimensionality of the inputs is high.
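A minimal sketch of such a conditional probability model, applied to the answer-type question Watson faces (is the answer a PERSON or a PLACE?). The tiny training set and Laplace smoothing constant are illustrative assumptions:

```python
import math
from collections import Counter, defaultdict

def train_nb(samples):
    """samples: list of (tokens, label). Returns priors and per-label word counts."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in samples:
        label_counts[label] += 1
        for tok in tokens:
            word_counts[label][tok] += 1
            vocab.add(tok)
    return label_counts, word_counts, vocab

def classify(tokens, label_counts, word_counts, vocab):
    """Pick the label maximizing log P(label) + sum log P(token | label)."""
    total = sum(label_counts.values())
    best, best_lp = None, float("-inf")
    for label, lc in label_counts.items():
        lp = math.log(lc / total)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokens:
            # Laplace smoothing keeps unseen words from zeroing the product
            lp += math.log((word_counts[label][tok] + 1) / denom)
        if lp > best_lp:
            best, best_lp = label, lp
    return best

train = [
    ("who wrote hamlet".split(), "PERSON"),
    ("who invented the telephone".split(), "PERSON"),
    ("where is the eiffel tower".split(), "PLACE"),
    ("where was napoleon born".split(), "PLACE"),
]
model = train_nb(train)
print(classify("who painted the mona lisa".split(), *model))
```

The "naive" independence assumption is what lets the per-word probabilities simply multiply, which is also why the model scales well to high-dimensional inputs.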
3. Word Sense Disambiguation
• An open problem of natural language processing and ontology. WSD is the task of identifying which sense of a word (i.e. meaning) is used in a sentence when the word has multiple meanings.
• It requires two inputs: a dictionary that specifies the senses and a corpus of language data to be disambiguated. WordNet is used as the dictionary in this context.
3. Word Sense Disambiguation (contd.)
• The sentence as well as the query forms an ordered set of words. We then compute the sense network between every pair of words from the query and the sentence.
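The dictionary-plus-context idea described above can be sketched with a simplified Lesk algorithm. The gloss dictionary below is a toy stand-in for WordNet, and the overlap heuristic is far cruder than Watson's sense network, but it shows the mechanism:

```python
def lesk(word, sentence, glosses):
    """Simplified Lesk: pick the sense whose gloss overlaps the context most."""
    context = set(sentence.lower().split())
    best_sense, best_overlap = None, -1
    for sense, gloss in glosses[word].items():
        overlap = len(context & set(gloss.lower().split()))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

# Toy gloss dictionary standing in for WordNet.
GLOSSES = {
    "bank": {
        "finance": "an institution that accepts deposits and lends money",
        "river": "sloping land beside a body of water such as a river",
    }
}
print(lesk("bank", "he sat on the bank of the river fishing", GLOSSES))
```

Here the context words "of" and "river" overlap with the river-bank gloss, so that sense wins.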
PROCESS
The basic working of the Watson computer is based on four steps –
1. Question Analysis
2. Document Retrieval
3. Hypotheses Generation
4. Answer Extraction (Result)
1. Question Analysis
Step 1 – Determining the answer type
• Uses machine learning techniques such as SVM and Naïve Bayes classifiers
• These techniques are applied to a tagged corpus of information
Step 2 – Query formation
• Assume the question is a valid IR query
• Remove stop words from the question
Example: In 1897 Swiss climber Matthias Zurbriggen became the first to scale this Argentinean peak.
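The query-formation step above can be sketched as follows. The stop-word list is a small illustrative one (it deliberately includes low-content words like "became" and "first"); Watson's real list is far larger:

```python
import string

# Illustrative stop-word list, not Watson's actual one.
STOP_WORDS = {"a", "an", "and", "the", "in", "to", "of", "this",
              "is", "was", "became", "first"}

def to_ir_query(question):
    """Treat the question as an IR query: lowercase, strip punctuation, drop stop words."""
    cleaned = question.lower().translate(str.maketrans("", "", string.punctuation))
    return [t for t in cleaned.split() if t not in STOP_WORDS]

q = "In 1897 Swiss climber Matthias Zurbriggen became the first to scale this Argentinean peak."
print(to_ir_query(q))
```

Only the content-bearing terms (1897, swiss, climber, matthias, zurbriggen, scale, argentinean, peak) survive to be used as search terms.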
2. Document Retrieval
• The task of the document retrieval module is to select a small set of documents from the collection that can be practically handled in the later stages.
• Using important terms from the question, Watson performs a search over millions of documents to find relevant passages.
• Data can be stored in a local corpus or accessed from the Internet.
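A minimal sketch of retrieval by important terms, using TF-IDF scoring over a three-document toy corpus. The corpus and the exact scoring formula are illustrative assumptions, not Watson's retrieval stack:

```python
import math
from collections import Counter

def tfidf_rank(query_terms, docs):
    """Score each document by the summed TF-IDF of the query terms it contains."""
    n = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter()  # document frequency of each term
    for toks in tokenized:
        for term in set(toks):
            df[term] += 1
    scores = []
    for i, toks in enumerate(tokenized):
        tf = Counter(toks)
        # Smoothed IDF so unseen terms do not divide by zero
        score = sum(tf[t] * math.log((n + 1) / (df[t] + 1)) for t in query_terms)
        scores.append((score, i))
    return sorted(scores, reverse=True)

docs = [
    "aconcagua is the highest peak in argentina first climbed by matthias zurbriggen in 1897",
    "the matterhorn is a famous peak in the swiss alps",
    "ken jennings holds the record for consecutive jeopardy wins",
]
query = ["zurbriggen", "argentinean", "peak", "1897"]
best = tfidf_rank(query, docs)[0][1]
print(docs[best])
```

Rare, question-specific terms like "zurbriggen" and "1897" dominate the score, pulling the relevant passage to the top.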
3. Hypotheses Generation
• Extracts important entities – so-called "candidate answers" – from the documents.
• WordNet is used as a sense/semantic dictionary.
• Obtains statistics for a particular word from a large corpus by assigning probabilities based on occurrences of the target concept.
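Candidate extraction can be sketched with a deliberately crude heuristic: treat runs of capitalized words in the retrieved passages as entities and rank them by occurrence count. Watson's real extractors are far more sophisticated; the passages below are illustrative:

```python
import re
from collections import Counter

def candidate_answers(passages):
    """Crude entity spotting: runs of capitalized words become candidates,
    ranked by how often they occur across the retrieved passages."""
    counts = Counter()
    for text in passages:
        for match in re.findall(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*", text):
            counts[match] += 1
    return counts.most_common()

passages = [
    "Aconcagua was first climbed by Matthias Zurbriggen in 1897.",
    "The summit of Aconcagua lies in Argentina.",
    "Matthias Zurbriggen also climbed in the Alps.",
]
print(candidate_answers(passages))
```

Frequently recurring entities such as "Aconcagua" surface at the top of the candidate list, and the scoring stage then decides among them.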
4. Answer Extraction
Step 1 – Answer Scoring
• Candidate answers are scored by a large number of answer-scoring analytics running in parallel.
• Algorithms such as the Type Coercion scorer and Temporal Match are used.
Step 2 – Analysing Scores
• The scores are grouped into meaningful groups, or evidence dimensions.
• A plot of these yields the evidence profile for the candidate.
• Watson statistically combines the scores to produce a final confidence score.
For the example above, the highest-confidence candidate is Aconcagua.
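The statistical combination of evidence dimensions can be sketched as a weighted sum squashed through a logistic function. The dimension names echo the scorers mentioned above, but the weights and per-candidate scores here are invented for illustration (in the real system the weights are learned from past questions):

```python
import math

def confidence(scores, weights):
    """Combine per-dimension evidence scores into one confidence in (0, 1)
    via a weighted sum passed through the logistic function."""
    z = sum(weights[dim] * score for dim, score in scores.items())
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative weights over evidence dimensions (assumed, not Watson's).
WEIGHTS = {"type_coercion": 2.0, "temporal_match": 1.5, "passage_support": 1.0}

candidates = {
    "Aconcagua":  {"type_coercion": 0.9, "temporal_match": 0.8, "passage_support": 0.9},
    "Matterhorn": {"type_coercion": 0.9, "temporal_match": 0.1, "passage_support": 0.2},
}
ranked = sorted(candidates, key=lambda c: confidence(candidates[c], WEIGHTS), reverse=True)
print(ranked[0], round(confidence(candidates[ranked[0]], WEIGHTS), 3))
```

Both candidates pass type coercion (both are peaks), but only Aconcagua matches the temporal evidence (1897) and the passage support, so it wins on overall confidence.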
EXAMPLE
(The example slides walked through Watson's pipeline on the Aconcagua question; the slide images are not reproduced in this transcript.)
EXISTING CHALLENGES
Healthcare – Medical information doubles every three years; physicians cannot stay up to date; complex decision making
Retail – Fulfilling customers' high expectations of satisfaction and effectively analysing a growing mountain of data
Finance – Huge volumes of financial information are generated each day and are difficult to harness
Public Sector – Efficient analysis of enormous volumes of unstructured, unverified data
APPLICATIONS
• Health Care – Memorial Sloan Kettering, MD Anderson, WellPoint
• Finance – DBS (Development Bank of Singapore)
• Retail – The North Face, Genesys (engaging shoppers!)
• Public Sector – decision-making, policy and performance, public security
FUTURE SCOPE
Recipe generating platform
Pharmaceutical industry
Publishing
Biotechnology
Research or inventions
LIMITATIONS
• Requires a huge database of prior knowledge and information
• Has trouble responding to short clues
• Incapable of coming up with fresh ideas
• More than base knowledge, clues may require thought, an area where humans still have an edge over the Watson computer
BIBLIOGRAPHY
• https://researcher.ibm.com/researcher/viewpage.php?id=2121
• The Science Behind an Answer: http://www-03.ibm.com/innovation/us/watson/what-is-watson/science-behind-an-answer.html
• Jeopardy! IBM Watson Day 1 (Feb 14, 2011): http://www.youtube.com/watch?v=seNkjYyG3gI&feature=related
• Tom M. Mitchell. 1997. Machine Learning. Computer Science Series. McGraw-Hill.
• Corpora for Question Answering Task, Cognitive Computation Group, Department of Computer Science, University of Illinois at Urbana-Champaign.
• Dell Zhang and Wee Sun Lee. 2003. Question Classification using Support Vector Machines. In Proceedings of the 26th ACM International Conference on Research and Development in Information Retrieval (SIGIR'03), pages 26–32, Toronto, Canada.
• www.google.com
• www.wikipedia.com
• www.ibm.com
QUESTIONS?