Optimizing Business by Unleashing Big
Data in the Enterprise (Technion Computer Engineering Conference, May 2013)
Aya Soffer, Director, Big Data Analytics, IBM Research – Haifa
Agenda
• Big data in the Enterprise, why now
• Big Data Research in IBM
• “Deep Dive” – Customer Analyst
“A report by the World
Economic Forum in
Davos, Switzerland,
declared data a new
class of economic asset,
like currency or gold.”
“Companies are being
inundated with data—
from information on
customer-buying habits
to supply-chain
efficiency. But many
managers struggle to
make sense of the
numbers.”
“Data is the new oil.”
Clive Humby
Big Data in the Press and on Business Leaders’ Minds
Big Data Is Hollywood's
New Rising Star -
Hollywood has discovered
Big Data’s talents to
determine how to
distribute and promote
movies
1 in 3 Business leaders make critical decisions without the information they need
53% Business leaders say they don’t have access to the information they need to do their jobs
From the Press From Our Surveys
2.2X Organizations leveraging analytics more likely to outperform their industry peers
Big Data is Big, Fast, and Diverse
Cost efficiently processing the
growing Volume
50x growth from 2010 to 2020 (35 ZB)
Responding to the
increasing Velocity
30 Billion RFID
sensors and counting
Collectively analyzing
the broadening Variety
80% of the world’s data
is unstructured
In Order to Realize New Opportunities, Companies are Thinking Beyond Traditional Sources of Data
Transactional and Application Data
Machine Data Social Data Enterprise
Content
Volume
Structured
Throughput
Velocity
Semi-structured
Ingestion
Variety
Highly unstructured
Veracity
Big Data Success Stories are emerging – Combining Organizational Data with Public sources
Retailer reduces time to run queries by 80% to optimize
inventory
Stock Exchange cuts queries from 26 hours to 2
minutes on 2 PB
Government cuts security analysis from hours to
70 Milliseconds
Utility avoids power failures by analyzing
10 PB of data in minutes
Telco analyses streaming network data to reduce hardware costs by 90%
Hospital analyses streaming vitals to detect illness
24 hours earlier
© 2011 IBM Corporation
An Example - Retail Data Asset Landscape
Finance Inventory Suppliers Shipments Orders
Sales Stores Products Employees Customers
eCommerce Marketing Social Mobile Third Party
Video
Technology like Hadoop and commercial Big Data platforms make it possible to cost-effectively analyze all available data
Visualize and Experiment
Predict
Integrate and Govern
Hadoop
System
Stream
Computing
Data
Warehouse
Analyze Real-time
Search and Discover
From http://www.ebizq.net/blogs/enterprise/
IBM Big Data Platform
Log analytics and event monitoring Enterprise knowledge management Contact centers Customer acquisition and retention Digital Marketing Effectiveness Decision Analytics and Operations
Agenda
• Big data in the Enterprise, why now
• Big Data Research in IBM
• “Deep Dive” – Customer Analyst
Big Data Research in IBM (Not an exhaustive list)
• New varieties of data
– Text / Social Media
– Networks
– Multimedia
– Machine Data / Sensors
• Visual Analytics
- Navigating through data
- Visualizing and interacting with analytics
• Big Data Performance
– In memory
– HW acceleration (FPGA)
– Benchmarks
– New Architectures
• Information Integration
– Integrating Enterprise and public data
– Linking data / context
– Entity Extraction and integration
• Industry Applications
- Healthcare
- Telco
- Retail & Marketing
- Smarter Workforce
- Energy
- Water / Agriculture
- Public Safety
- …..
Legend: blue = work in Haifa
Building an Environment for Analyzing Data
We are creating a “plug-and-play” environment for exploring massive data
– A collaborative and exploration-focused user experience
– Rich collection of analytics and tools for analysis
– Powerful infrastructure for data management and analytics
– Pre-integrated data sets to provide context
– Expertise in all aspects of the process
Lets the domain expert focus on their strengths; we handle the data challenges
Big Data
Analysis
Traditional
Data
Analysis
Application Layer: Models, Analytics, Applications
User Services: Visualization, Reporting, Collaboration
Data
Preparation
& Ingestion
Data and Analytic Services & Tools: Libraries, Catalogs
Data
Management
Systems Infrastructure Domain Informatics Researchers
Human-Computer Interaction
Analytics and Mathematical Sci.
Data scientists
Information Mgmt Researchers
Information retrieval Researchers
Computer systems Researchers
IT operations support
Customer Care
Telco Monetization
Personalized Medicine
Advanced Discovery Lab
Data Sets
Other Projects
Smarter Workforce Analytics
Expertise Location
Expertise Building
Engaging Experts
Social Pulse Expertise
Predictive and Social Analytics for
identifying employee retention
propensity
Retention
Social Pulse
Derive insight into employees’ sentiment
from social media, to refine policies,
focus communications, and drive culture
Customers
Partners
Employees
Smarter Workforce Analytics applications leverage the Enterprise Social Graph, which
combines transactional, social, and business data, to perform analyses such as
influence, social proximity, reputation, impact, expertise, and more
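As an illustration of one analysis named above, social proximity can be read as the number of hops between two people in the graph. The sketch below is a minimal example under that assumption; the function name and the sample graph are hypothetical and do not come from the talk.

```python
from collections import deque

def social_proximity(graph, source, target):
    """Number of hops on the shortest path between two people, or None."""
    if source == target:
        return 0
    visited, queue = {source}, deque([(source, 0)])
    while queue:
        person, hops = queue.popleft()
        for neighbour in graph.get(person, ()):
            if neighbour == target:
                return hops + 1
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append((neighbour, hops + 1))
    return None  # the two people are not connected

# Hypothetical social graph: each person maps to the people they interact with.
sample_graph = {
    "ann": {"bob"},
    "bob": {"ann", "carl"},
    "carl": {"bob"},
    "dana": set(),
}
ann_to_carl = social_proximity(sample_graph, "ann", "carl")
ann_to_dana = social_proximity(sample_graph, "ann", "dana")
```

Breadth-first search is used because it finds the shortest hop count in an unweighted graph; a weighted notion of proximity would need a different algorithm.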
Agenda
• Big data in the Enterprise, why now
• Big Data Research in IBM
• “Deep Dive” – Customer Analyst
Customer Analyst – analyzes customer behavior and digital traces to build rich customer profiles
Input:
– Documents people read
– Things people write and respond
to in social media
– Searches people do on the web
– Transactional data
– Organizational data (e.g., product
catalogs, demographics)
Analysis:
– Evolving personal interests and
preferences
– Life events
– Topical Influence
– Local/Global communities to
which users belong
Applications:
– Customer segmentation
– Marketing promotions and advertisement
– Product recommendations
– Churn prediction
– Demand prediction
– ….
© 2009 IBM Corporation
Customer Analyst High Level View
Infrastructure IBM BigInsights IBM Streams
Fusion
Telco
Communities Influencers
Hadoop
SDA
BoardReader
Data Layer
Accelerators Layer
BigIndex
Data Collection GNIP T4J SFC SyndicationHub ETL
Wiki
Categorizer
Content Analytics & Discovery Layer
Social Analytics Layer
User Profiler
Data Parsers
Metrics
Life
Event
Detection Sentiments
Personalization Layer
URL Analyzer
Targeting Recommendation
Industry Solutions Layer
Retail CP HC Marketing
Wikipedia
Index
ODP
Index Blogs Facebook Twitter Browsing Mobile Transactions
FB API
SMA
Detecting Interests and Taste based on Mobile Data Usage
URL/App Analysis: for each user, report the
most meaningful interests
to describe her profile.
Large-scale analysis to map pages to a clear
and well-defined taxonomy
Update users
profiles Consume
Browsing activity on mobile devices
Data Cleansing
Userid Category Strength
012013a474 Sports/Football 22
012013a474 Shopping/Vehicles 15
012013a474 Sports/Swim 14
Microsegmentation
Tiered pricing plans
Promotions
Churn
Mobile
Gateway
Logs
URLs are transformed into concepts
{docid: d1, wwpokec.azet.sk}
{docid:d2, http://news.yahoo.com/recall-news-215006441.htm}
Concepts (categories) Selection
{docid: d3, www.youtube.com}
ODP-
Business/Marketing_and_Advertising/News_and_Media
Concepts Aggregation
(Top-k concepts per user)
WIKIPEDIA
Product recalls
URL Parsing (Types)
Userid Category Strength
012013a474 Sports/Football 22
012013a474 Shopping/Vehicles 15
012013a474 Sports/Swim 14
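The three steps named on this slide (URL parsing, concept selection, top-k aggregation per user) can be sketched roughly as follows. This is a minimal illustration, not the pipeline used in the talk: the host-to-category table, the example hosts, and the use of raw visit counts as "strength" are all assumptions.

```python
from collections import Counter, defaultdict
from urllib.parse import urlparse

# Hypothetical host-to-category lookup; a real system would consult a
# taxonomy index (the slides mention ODP and Wikipedia indexes).
HOST_CATEGORIES = {
    "news.yahoo.com": "News/Media",
    "www.youtube.com": "Arts/Video",
    "football.example.com": "Sports/Football",
}

def host_of(url):
    """URL parsing step: extract the host, tolerating a missing scheme."""
    if "://" not in url:
        url = "http://" + url
    return urlparse(url).netloc.lower()

def top_k_concepts(browsing_log, k=3):
    """Concept aggregation step: count categories per user, keep the top k."""
    counts = defaultdict(Counter)
    for user_id, url in browsing_log:
        category = HOST_CATEGORIES.get(host_of(url))
        if category is not None:  # URLs with no known category are skipped
            counts[user_id][category] += 1
    return {user: c.most_common(k) for user, c in counts.items()}

# Hypothetical browsing log of (user id, URL) pairs.
log = [
    ("012013a474", "football.example.com/scores"),
    ("012013a474", "http://football.example.com/news"),
    ("012013a474", "www.youtube.com"),
    ("012013a474", "unknown.example.org"),
]
profiles = top_k_concepts(log)
```

Here the visit count stands in for the Strength column; the slides do not say how strength is actually computed.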
Demographic Analysis Example
Top-level browsing behaviour does not vary widely by age group
25-34 year olds concentrate a higher proportion of
their browsing in the “top categories”
Male Female
News & Media Online Shopping
Sports Health & Medicine
Football Cinemas
Autotrader Personal Finance
Adult Content
Mobile Gaming
Analysing only the top 100
browsing categories, it is
possible to identify clear
preferences among male and
female customers
Top ten categories remain
the same for Men and
Women, though the
ordering varies slightly
Those categories for which
there are significant
differences between men
and women:
There is a correlation between browsing diversity and churn propensity in Prepay customers
Each MSISDN in Consumer PrePay has been allocated a Churn Percentile score
Comparing each percentile group’s top categories shows that Churn Percentile
seems to be positively correlated with the variety of categories browsed
Further findings indicated a higher propensity to churn for heavy users of social media
sites and for soccer fans (the reason: a competitor offering SMS updates with
match scores)
[Chart: Browsing Diversity Index (y-axis, 0 to 0.3) by Churn Propensity Percentile (x-axis, 90-100th down to 0-10th)]
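The slides do not define how the Browsing Diversity Index is computed. As one possible reading, the sketch below measures variety with the Shannon entropy of a user's category visits; this choice of measure, the function name, and the sample data are assumptions, so its values will not match the chart's scale.

```python
import math
from collections import Counter

def browsing_diversity(categories):
    """Shannon entropy (in bits) of a user's visited categories.

    Assumption: entropy is used here as a common measure of variety;
    it is not the index definition from the talk.
    """
    counts = Counter(categories)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Each term is p * log2(1/p), which is zero when a single category dominates.
    return sum((n / total) * math.log2(total / n) for n in counts.values())

# Hypothetical users: one browses a single category, one spreads across four.
focused_user = ["Sports/Football"] * 4
varied_user = ["Sports/Football", "News/Media", "Shopping/Vehicles", "Sports/Swim"]
focused_score = browsing_diversity(focused_user)
varied_score = browsing_diversity(varied_user)
```

A user who browses only one category scores zero, and the score grows as visits spread more evenly over more categories.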
Example: Localized community analysis for marketing
Understanding the marketing potential of particular locations
Understanding the potential of viral marketing
Identifying promising community types and targeting marketing to them
Lowering marketing costs by targeting earned media
Extended community
of people that talk about some subject
Location 1 Location 2 Location 3
Geographical Analytics – How it works
• GPS Geotagging (<5% of tweets)
• Even if explicit in profile – disambiguation might be needed:
– E.g., “Springfield” by itself can refer to 30 different cities in the USA.
• Techniques used
– Rule-based
• E.g., “I live in ..”, “let’s meet at ..”
– Machine learning (supervised):
• Statistical methods: find the most characteristic terms of people that report they live in some location.
• E.g., “The Strip”, “Bellagio fountains”, “Fremont St.”… -> Las Vegas
– Based on social network
• i.e., learn the location of people based on the locations of their friends
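Two of the techniques above, the rule-based patterns and the friends-based inference, can be sketched as follows. The regular expressions and the majority-vote rule are illustrative assumptions, not the rules or models used in the talk.

```python
import re
from collections import Counter

# Rule-based step: simple patterns in the spirit of "I live in ..".
# A capitalized word sequence after the trigger phrase is taken as the place.
LOCATION_PATTERNS = [
    re.compile(r"\bI live in ([A-Z][a-zA-Z]+(?: [A-Z][a-zA-Z]+)*)"),
    re.compile(r"\b[Ll]et'?s meet at ([A-Z][a-zA-Z]+(?: [A-Z][a-zA-Z]+)*)"),
]

def location_from_text(message):
    """Return the first place name matched by a rule, or None."""
    for pattern in LOCATION_PATTERNS:
        match = pattern.search(message)
        if match:
            return match.group(1)
    return None

def location_from_friends(friend_locations):
    """Social-network step: guess the most common known friend location."""
    known = [loc for loc in friend_locations if loc is not None]
    if not known:
        return None
    return Counter(known).most_common(1)[0][0]

home = location_from_text("I live in Las Vegas now")
venue = location_from_text("lets meet at Bellagio tonight")
guess = location_from_friends(["Haifa", "Haifa", None, "Tel Aviv"])
```

A matched venue such as "Bellagio" would still need a gazetteer lookup to resolve to a city, and ambiguous names such as "Springfield" would still need disambiguation.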
Community Analytics - How it works:
How we build the communities:
– Build a social graph based on the data flow in the social media. For example, in Twitter, using the @Reply tag.
– Extend the connections with friends, followers, following, etc.
– Then use a clustering-based approach
Features of a community:
– Content (profile + messages of participants): what do participants talk about? which topics? how much?
– Topological (structure, level of activity): how is the community organized? how fast do messages spread? how many people are connected?
– Role of participants (structure, level of activity): are there community leaders? influencers? advocates? connectors to other communities?
– Type (e.g. religious congregation, school, teen friends, reading club, yoga): what is the type of the community? is it possible to market to all communities of the same type?
– Dynamics: what makes communities grow/shrink? how to influence a community? Which features have commercial significance? Which features can be acted upon?
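The community-building steps (a social graph from @replies, then clustering) can be sketched as below. The slides do not name the clustering method, so connected components are used here as the simplest stand-in; the function names and sample tweets are hypothetical.

```python
import re
from collections import defaultdict

MENTION = re.compile(r"@(\w+)")

def build_reply_graph(tweets):
    """Undirected graph: an edge links an author to each user they @-mention."""
    graph = defaultdict(set)
    for author, text in tweets:
        graph[author]  # make sure authors with no mentions still appear
        for mentioned in MENTION.findall(text):
            if mentioned != author:
                graph[author].add(mentioned)
                graph[mentioned].add(author)
    return graph

def communities(graph):
    """Group users into connected components (a stand-in for clustering)."""
    seen, result = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            user = stack.pop()
            if user in component:
                continue
            component.add(user)
            stack.extend(graph[user] - component)
        seen |= component
        result.append(component)
    return result

# Hypothetical (author, text) pairs.
sample_tweets = [
    ("ann", "@bob see you at the game"),
    ("bob", "@carl did you watch it?"),
    ("dana", "@eve hello"),
    ("frank", "no mentions here"),
]
found = communities(build_reply_graph(sample_tweets))
```

On real data a single connected component can span most of the network, so a modularity-based or label-propagation method would likely be needed to obtain communities of useful size.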
Using the same technology with events instead of people: managing natural disasters
Event 1 – 10:10 – river water surging (from accumulation of tweets)
Event 2 – 11:15 – fast moving water (from accumulation of mobile messages)
Event 3 – 11:15 – flood, major road blocked (from accumulation of tweets and mobile messages)
Event 4 – 12:30 – flood (from accumulation of tweets and mobile messages)
Event 5 – 12:30 – traffic accident (from accumulation of mobile messages)
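Detecting an event "from accumulation" of messages can be illustrated by counting keyword mentions in fixed time windows and reporting windows that cross a threshold. This is a minimal sketch under that assumption; the window size, threshold, keywords, and sample messages are all hypothetical.

```python
from collections import defaultdict

def detect_events(messages, keywords, window_minutes=15, threshold=3):
    """Return sorted (window_start_minute, keyword) pairs whose mention
    count within a fixed time window reaches the threshold."""
    counts = defaultdict(int)
    for minute_of_day, text in messages:
        window_start = (minute_of_day // window_minutes) * window_minutes
        lowered = text.lower()
        for keyword in keywords:
            if keyword in lowered:
                counts[(window_start, keyword)] += 1
    return sorted(key for key, n in counts.items() if n >= threshold)

# Hypothetical messages as (minutes since midnight, text); 675 is 11:15.
sample_messages = [
    (610, "river water surging near the bridge"),
    (676, "Flood on the main road"),
    (680, "flood!! road blocked"),
    (689, "FLOOD near the bridge"),
    (700, "is there still a flood?"),
]
events = detect_events(sample_messages, ["flood", "surging"])
```

Only the 11:15 window collects three "flood" mentions, so it is the only event reported; the single "surging" message stays below the threshold.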
Summary
• The wealth of available data affords many opportunities
– To do better science, improve our world, make more money
• Getting insight (let alone foresight) from data is still too hard
– Must handle the 4 Vs of data
– Requires multiple skills – data science, systems, computer science, math
– Requires data and tooling
– Requires significant computing resources
• There is a lot of exciting research being done in this space
– User experience, Data and semantics, Analytics and modeling, Application of Big Data, Systems research
• For more information, see the recently released IBM Journal of Research and Development, Issue 3 / 4, May – July 2013, Massive-Scale Analytics, Guest Editor Aya Soffer
– http://ieeexplore.ieee.org/xpl/tocresult.jsp?isnumber=6517282
Thank You