Andres Dorado - Finding Relevant Data in the Cloud
© CGI GROUP INC. All rights reserved
_experience the commitment TM
The Cloud: Searching for Meaning
Finding Relevant Data in the Cloud for Actionable Decisions
APRIL 2012
Confidential
Agenda
• Information Retrieval
• The “ABC” Formula
• Some of the Challenges
• Example 1: The Right Profile
• Example 2: Like it
• Example 3: Promote it
• Conclusions
• Q&A
Information Retrieval goes beyond databases
[Diagram: a DBMS over Enterprise Data, queried with “> SELECT * FROM”]
Information Retrieval*, aka Search, is finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need from within large collections (usually stored on computers).
* Manning, C. D., Raghavan, P. and Schütze, H. An Introduction to Information Retrieval. 2009
“An Information Need* is the topic about which the user desires to know more, and is differentiated from a query, which is what the user conveys to the computer in an attempt to communicate the information need.”
[Diagram: a Search box with a Go button]
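As a minimal sketch of the contrast drawn on this slide (not taken from the deck), the Python below builds a small inverted index over free text and answers a keyword query with boolean AND matching. The three sample documents, the function names, and the choice of AND semantics are all assumptions made for illustration; a real search engine would add ranking, stemming, and more.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    return cleaned.split()


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every query term (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical three-document collection, used only for illustration.
DOCS = {
    1: "Quarterly sales report for the cloud services unit",
    2: "Meeting notes: cloud migration risks and costs",
    3: "Sales training schedule",
}
INDEX = build_index(DOCS)

print(search(INDEX, "cloud sales"))
```

Here the query is the user's attempt to express the information need; the index only sees terms, which is why the same need phrased differently can return different documents.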
Volume, variety and velocity… Big Data
Big Data refers to fast-growing, large data sets that cannot be managed with “traditional” Database Management Systems.
[Diagram: The “Cloud”, with a Search box and Go button]
The consumer market is already there, and organizations can learn from it
[Diagram: Personal Data in The “Cloud”, searched from an iPhone with Siri: “Searching for…”]
Analytics is enabling these capabilities
[Diagram: The “Cloud” and Big Data alongside a DBMS over Enterprise Data (“> SELECT * FROM”) and a Search box]
Information Retrieval applies analytic techniques such as clustering and classification to support users in browsing or filtering document collections or further processing a set of retrieved documents.*
* Manning, C. D., Raghavan, P. and Schütze, H. An Introduction to Information Retrieval. 2009
The “ABC” Formula
[Diagram: Analytics, Big Data and The “Cloud” alongside a DBMS over Enterprise Data (“> SELECT * FROM”) and a Search box]
Analytics + Big Data + The “Cloud” = Enhanced Business Operations
Some of the Challenges
Finding relevant data
Large-scale data sets
Quality of search results
• “A document is relevant* if it is one that the user perceives as containing information of value with respect to their personal information need.”
• “Something (A) is relevant** to a task (T) if it increases the likelihood of accomplishing the goal (G), which is implied by T.”
* Manning, C. D., Raghavan, P. and Schütze, H. An Introduction to Information Retrieval. 2009
** Hjørland, B. and Christensen, F. S. Work tasks and socio-cognitive relevance: A specific example. 2002
Some of the Challenges: Large-scale data sets
• Personal Information Retrieval: The system searches operating systems, e-mail, and other device applications.
• Enterprise, Institutional, and domain-specific search: Documents are typically stored on centralized file systems and/or dedicated servers.
• Web Search: The system has to provide search over billions of documents stored on millions of computers.
Some of the Challenges: Quality of search results
• To assess the effectiveness of an Information Retrieval system (i.e., the quality of its search results), a user will usually want to know two key statistics about the results the system returns for a query:
  • Precision: What fraction of the returned results are relevant to the information need?
  • Recall: What fraction of the relevant documents in the collection were returned by the system?
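The two statistics can be sketched in a few lines of Python. This is an illustrative example only: the document ids and the function name `precision_recall` are assumptions, and the relevance judgments are taken as given rather than derived.

```python
def precision_recall(returned, relevant):
    """Compute precision and recall for one query.

    returned: ids of the documents the system returned
    relevant: ids of the documents judged relevant to the information need
    """
    returned, relevant = set(returned), set(relevant)
    hits = len(returned & relevant)  # relevant documents that were returned
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: 4 documents returned, 5 relevant in the collection.
RETURNED = ["d1", "d2", "d3", "d4"]
RELEVANT = ["d1", "d3", "d5", "d6", "d7"]

PRECISION, RECALL = precision_recall(RETURNED, RELEVANT)
print(f"precision={PRECISION:.2f} recall={RECALL:.2f}")
```

In this made-up case two of the four returned documents are relevant (precision 0.5), and two of the five relevant documents were found (recall 0.4). Returning more documents tends to raise recall at the cost of precision.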
Example 1: The Right Profile
• Analytics: Text Mining
• Big Data: LinkedIn (150 million professionals)
• The “Cloud”
[Diagram: a DBMS over Pipeline Data (“> SELECT * FROM”) and a Search box]
Analytics + Big Data + The “Cloud” = Enhanced Recruitment Process
Example 2: Like it
• Analytics: Sentiment Analysis
• Big Data: Twitter (340 million tweets/day)
• The “Cloud”
[Diagram: a DBMS over Pipeline Data (“> SELECT * FROM”) and a Search box]
Analytics + Big Data + The “Cloud” = Enhanced Customer Satisfaction
Public Relations using “Twitter Earth”
Case: Tracking tweets and displaying them by location
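The deck does not say how sentiment is computed or how tweets are grouped, so the following is only a rough sketch of the general idea: score each tweet against a small word list and total the scores per location. The lexicon, the sample tweets, and the function names are all hypothetical, and this does not represent how “Twitter Earth” or any Twitter API works.

```python
from collections import defaultdict

# Hypothetical sentiment lexicon; real systems use far richer models.
POSITIVE = {"love", "great", "good", "happy"}
NEGATIVE = {"hate", "bad", "slow", "broken"}


def sentiment_score(text):
    """Count positive words minus negative words in the text."""
    words = [w.strip(".,!?:;\"'").lower() for w in text.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def sentiment_by_location(tweets):
    """Sum the sentiment scores of (location, text) pairs per location."""
    totals = defaultdict(int)
    for location, text in tweets:
        totals[location] += sentiment_score(text)
    return dict(totals)


# Made-up tweets, for illustration only.
TWEETS = [
    ("Montreal", "Love the new service, great support!"),
    ("Toronto", "The app is slow and the login is broken."),
    ("Montreal", "Checkout was bad today"),
]

SCORES = sentiment_by_location(TWEETS)
print(SCORES)
```

A per-location total like this is the kind of value that could then be placed on a map, which is the display step the case describes.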
Example 3: Promote it
• Analytics: “Wisdom”
• Big Data: Facebook (800 million users)
• The “Cloud”
[Diagram: a DBMS over Pipeline Data (“> SELECT * FROM”) and a Search box]
Analytics + Big Data + The “Cloud” = Enhanced Marketing Effectiveness
Social Intelligence using “Wisdom”
Case: Analyzing 10 million Facebook users to promote Engineering
Conclusions
• Analytics adds capabilities to information retrieval systems that facilitate finding relevant data in the “cloud”.
• Analytics enables information retrieval systems to deal with large-scale data sets and is therefore well suited to working with Big Data.
• Analytics provides advanced techniques for more effective browsing and filtering of Big Data.
How are you driving business value with the data assets accessible to your organization?
Consider the “ABC” formula
Q&A
Our commitment to you
CGI delivers outcomes your business can count on.