Social Data Analytics using IBM Big Data Technologies

© 2012 IBM Corporation October 21, 2013 Social Data Analytics using IBM Big Data technologies Vijay Bommireddipalli, Development Manager, Social Data Accelerator, IBM Big Data

Upload: nicolas-morales

Post on 18-Nov-2014

Category:

Technology


DESCRIPTION

Distilling Insights from Social Media Using Big Data Technologies. Have you ever wondered what your customers are saying about you in social media, and what impact it might be having on your business? This session focuses on how BigInsights and Big Data technologies can be used to glean useful, actionable insights from social media data. You'll see how data can be ingested and prepped, and how text analytics can be run on social data in real time. Using Hadoop, we'll show how you can store and analyze large volumes of historical social media data and reference data. This talk and demo provide an introduction to text analytics and how it is used within the IBM Big Data platform for a social media solution.

TRANSCRIPT

Page 1: Social Data Analytics using IBM Big Data Technologies

© 2012 IBM Corporation October 21, 2013

Social Data Analytics

using

IBM Big Data technologies

Vijay Bommireddipalli

Development Manager, Social Data Accelerator

IBM Big Data

Page 2: Social Data Analytics using IBM Big Data Technologies

© 2011 IBM Corporation 2

Please note

IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice at IBM’s sole discretion.

Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision. The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.

Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user’s job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.

Page 3: Social Data Analytics using IBM Big Data Technologies

Before we begin …

Page 4: Social Data Analytics using IBM Big Data Technologies

Tag ! You’re it ! - Micro-segmentation

Page 5: Social Data Analytics using IBM Big Data Technologies

Social Data Analytics - Using social media as a rich source of information

Sample posts:

• Maybe our politicians should take a playbook out of the rivalry between duke/unc and take it to the courts http://ity.com/wfUsir
• I'm at Mickey's Irish Pub Downtown (206 3rd St, Court Ave, Raleigh) w/ 2 others http://4sq.com/gbsaYR
• @silliesylvia good!!! U shouldnt! Think about the important stuff, like ur 43rd birthday ;) btw happy birthday Sylvia ;)
• @silliesylvia I <3 your leather leggings!! Its so katniss!!
• @silliesylvia $10 dollars says matthew & mary get married next season :) #downtownabbey
• @bamagirl can’t wait to watch sherlock with you! Oh, robert downey jr, I still love you but bbc is so amazing
• OMG OMG. just dropped my new ipad3 crappola!!!
• dear redbox please have kings speech for my new tv colin firth movie marathon

Callout labels attached to the posts: Location, Intent to consume, Age, Behavior, Interest, Consumption, Prediction

360 degree profile

Personal Attributes
• Sylvia Campbell, Female, In a Relationship
• 32 years old, birthday on 7/17
• Lives near Raleigh, NC
• College graduate; Income of 80-120k

Buzz/Sentiment
• Retweets BF’s comments
• Interest in BBC shows: Downton Abbey, Sherlock, Fringe, (P&P?)
• Sherlock Holmes, Robert Downey, Jr.
• Hunger Games, Katniss/J. Lawrence

Interests/Behavior
• Watch movies, tv shows
• Romance plots, “hero types”, strong women
• Uses iPad 3, Redbox, Hulu
• Shopping, interest in sales/deals
• Duke/UNC basketball

Page 6: Social Data Analytics using IBM Big Data Technologies

Social Data Analytics - Comprehensive Entity Extraction and Integration

Example profiles:

• Name: Jane Doe, Cava; Address: Tampa, Fl; Twitter: @maryguida; Blog Topic: politics; Hobbies: running, yoga, …; Relationships: Tony C (brother)…
• Name: Jane Doe; Id: jaydee; Address: Home of the Buccaneers; Interests: running, yoga, football…
• Name: jane; Address: Tampa, FL; Relationships: Tony C (brother)., …

Entity Integration

• Name: Jane Doe; Address: Tampa, FL; Twitter: jaydee; Blog Topic: food; Hobbies: running, yoga, …; Relationships: Tony C (brother)…
• Name: J Doe; Blog Topic: food

Challenges:

• Scale: 1000s of sites, 100s of millions of users
• Complex matching decisions
• Partial, noisy and incomplete profile attributes
• Only 3% of consumers have sufficient attribute information in their profiles

All names are fictitious
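The matching problem on this slide can be illustrated with a small Python sketch. It is not the accelerator's algorithm: the name rule, the choice of comparison attributes, and the field names (`name`, `address`, `blog_topic`, `relationships`) are all assumptions made for illustration, using the fictitious profiles from the slide.

```python
import re
from itertools import combinations

# Fictitious profiles from the slide, reduced to a few hypothetical fields.
PROFILES = [
    {"name": "Jane Doe", "id": "jaydee", "address": "Home of the Buccaneers"},
    {"name": "jane", "address": "Tampa, FL", "relationships": "Tony C (brother)"},
    {"name": "J Doe", "blog_topic": "food"},
    {"name": "Jane Doe", "address": "Tampa, FL", "blog_topic": "food"},
]

def normalize(value):
    """Lower-case and drop punctuation so 'Tampa, Fl' equals 'Tampa, FL'."""
    return re.sub(r"[^a-z0-9 ]", "", value.lower()).strip()

def name_compatible(a, b):
    """Assumed rule: same tokens, same first token, or same surname plus initial."""
    ta, tb = normalize(a).split(), normalize(b).split()
    if not ta or not tb:
        return False
    if ta == tb:
        return True
    if len(ta) == 1 or len(tb) == 1:
        return ta[0] == tb[0]
    return ta[-1] == tb[-1] and ta[0][0] == tb[0][0]

def same_person(p, q, keys=("address", "relationships", "blog_topic")):
    """Require a compatible name plus at least one identical supporting attribute."""
    if not name_compatible(p["name"], q["name"]):
        return False
    return any(k in p and k in q and normalize(p[k]) == normalize(q[k]) for k in keys)

def resolve(profiles):
    """Cluster profile indexes with union-find over pairwise matches."""
    parent = list(range(len(profiles)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(profiles)), 2):
        if same_person(profiles[i], profiles[j]):
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(profiles)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

print(resolve(PROFILES))
```

With these toy rules the profile whose address is "Home of the Buccaneers" stays unmatched, because linking it to Tampa needs outside reference knowledge. That is one concrete instance of the "complex matching decisions" and "partial, noisy and incomplete" attributes listed under Challenges.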

Page 7: Social Data Analytics using IBM Big Data Technologies

Consumer Intelligence

Social Media based 360-degree Consumer Profiles

Personal Attributes
• Identifiers: name, address, age, gender, occupation…
• Interests: sports, pets, cuisine…
• Life Cycle Status: marital, parental

Relationships
• Personal relationships: family, friends and roommates…
• Business relationships: co-workers and work/interest network…

Product Interests
• Personal preferences of products
• Product purchase history

Life Events
• Life-changing events: relocation, having a baby, getting married, getting divorced, buying a house…

Timely Insights
• Intent to buy various products
• Current Location

Example posts

Monetizable intent to buy products
• I need a new digital camera for my food pictures, any recommendations around 300?
• What should I buy?? A mini laptop with Windows 7 OR a Apple MacBook!??!

Life Events
• Intent to buy a house: I'm thinking about buying a home in Buckingham Estates per a recommendation. Anyone have advice on that area? #atx #austinrealestate #austin
• Looks like we'll be moving to New Orleans sooner than I thought.
• College: Off to Stanford for my MBA! Bbye chicago!

Location announcements
• I'm at Starbucks Parque Tezontle http://4sq.com/fYReSj
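As a rough illustration of how posts like these might be tagged, here is a minimal rule-based sketch in Python. The labels and trigger phrases are hypothetical and chosen only to cover the example posts on this slide. A production extractor would need far richer rules.

```python
import re

# Hypothetical label -> trigger-phrase rules, not the accelerator's actual rule set.
INTENT_RULES = [
    ("intent_to_buy",
     re.compile(r"\b(i need a new|what should i buy|thinking about buying)\b", re.I)),
    ("life_event_relocation",
     re.compile(r"\b(moving to|off to)\b", re.I)),
    ("location_announcement",
     re.compile(r"\bi'm at\b", re.I)),
]

def classify(post):
    """Return every label whose pattern occurs somewhere in the post."""
    return [label for label, pattern in INTENT_RULES if pattern.search(post)]

if __name__ == "__main__":
    for post in [
        "I need a new digital camera for my food pictures, any recommendations around 300?",
        "Looks like we'll be moving to New Orleans sooner than I thought.",
        "I'm at Starbucks Parque Tezontle http://4sq.com/fYReSj",
    ]:
        print(classify(post), "<-", post)
```

Keeping the rules in a list of (label, pattern) pairs means new insight types can be added without touching the matching loop.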

Page 8: Social Data Analytics using IBM Big Data Technologies

Social Data Analytics - Profile construction

Page 9: Social Data Analytics using IBM Big Data Technologies

Social Data Analytics - Profile construction

Page 10: Social Data Analytics using IBM Big Data Technologies

Big Data Platform and Accelerators - Summary

• Software components that accelerate development and/or implementation of specific solutions or use cases on top of the Big Data platform
• Provide business logic, data processing, and UI/visualization, tailored for a given use case
• Bundled with Big Data platform components – InfoSphere BigInsights and InfoSphere Streams

Key Benefits
• Time to value
• Leverage best practices around implementation of a given use case

[Diagram: IBM Big Data Platform. Analytic Applications: BI / Reporting, Exploration / Visualization, Functional App, Industry App, Predictive Analytics, Content Analytics. Platform components: Systems Management, Applications & Development, Visualization & Discovery, Accelerators, Hadoop System, Stream Computing, Data Warehouse, Contextual Search, Information Integration & Governance. Banner: Cloud | Mobile | Security.]

Page 11: Social Data Analytics using IBM Big Data Technologies

Social Media Analytics Architecture

Input: Social Media

Online flow: Data-in-motion analysis (Stream Computing and Analytics)
• Data Ingest and Prep
• Extract Buzz, Intent, Sentiment
• Entity Analytics: Profile Resolution
• Real time analytics. Pre-defined views and charts
• Dashboard

Offline flow: Data-at-rest analysis (BigInsights System and Analytics)
• Social Media Data
• Extract Buzz, Intent, Sentiment and Consumer Profiles
• Entity Analytics and Integration
• Comprehensive Social Media Customer Profiles
• Pre-defined Workbooks and Dashboards

Optional: Indexed Search
• Index using Push API
• Data Explorer
• Ad hoc access

Page 12: Social Data Analytics using IBM Big Data Technologies

SDA 1.2

Social Media Sources Supported
– Gnip, Boardreader
– Tweets, Boards, Blogs

Analyze Streaming data as well as data at rest
– Streams for processing of streaming data
– BigInsights/Hadoop for input, output and configuration data

Key Micro-segmentation Attributes (out-of-box)
– Personal Info: Gender, Location, Parental status, Marital status, Employment
– Interests: Movie interest, Comic book fan, Product interest, Current customer of, Products owned
– ** Attributes can be added in (requires some development effort)

Entity resolution across the different social media sources

Page 13: Social Data Analytics using IBM Big Data Technologies

SDA 1.2

Outputs/Measures (out-of-box)
– Buzz
– Sentiment
– Intent to buy/start service
– Intent to attend/see

Example use cases
– Retail: Lead generation, Brand management
– Financial: Lead generation and Brand management
– Media & Entertainment: Brand management
– Generic

Visualization using BigSheets

Extendable/Customizable Solution
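To make the Buzz and Sentiment measures concrete, here is a toy Python sketch that counts mentions per topic and sums a word-list sentiment score. The word lists, the topic names, and the scoring scheme are assumptions for illustration only. The accelerator's own measures come from its text analytics extractors, not from a lookup like this.

```python
from collections import Counter, defaultdict

# Hypothetical sentiment lexicons; tiny on purpose.
POSITIVE = {"love", "amazing", "good", "great"}
NEGATIVE = {"crappola", "hate", "awful", "broken"}

def score(text):
    """Positive-word count minus negative-word count for one post."""
    words = [w.strip(".,!?;:()").lower() for w in text.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def buzz_and_sentiment(posts, topics):
    """Buzz = number of posts mentioning a topic; sentiment = summed scores of those posts."""
    buzz = Counter()
    sentiment = defaultdict(int)
    for post in posts:
        lowered = post.lower()
        for topic in topics:
            if topic in lowered:
                buzz[topic] += 1
                sentiment[topic] += score(post)
    return {t: {"buzz": buzz[t], "sentiment": sentiment[t]} for t in topics}

POSTS = [
    "@bamagirl can't wait to watch sherlock with you! Oh, robert downey jr, "
    "I still love you but bbc is so amazing",
    "OMG OMG. just dropped my new ipad3 crappola!!!",
    "dear redbox please have kings speech for my new tv colin firth movie marathon",
]

if __name__ == "__main__":
    print(buzz_and_sentiment(POSTS, ["sherlock", "ipad3", "redbox"]))
```

The per-topic rows this produces are the kind of granular output that could then be charted in a spreadsheet-style tool.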

Page 14: Social Data Analytics using IBM Big Data Technologies

SDA - Acting on the insights

Metrics-based understanding of feedback in social media
– And, more importantly, feedback from whom!

Comprehensive (social media) profiles with micro-segmentation information

Campaign execution can be done in social media

Entity resolution across the different social media sources

External (social media) to internal (CRM) linkage **coming

Page 15: Social Data Analytics using IBM Big Data Technologies

SDA Outputs

Pre-defined Workbooks

Dashboards

Granular outputs for further slicing and dicing by Data Scientists

Page 16: Social Data Analytics using IBM Big Data Technologies

SDA Conceptual Flow

Page 17: Social Data Analytics using IBM Big Data Technologies

BigInsights & Streams Text Analytics

High Performance rule based Information Extraction Engine

Highly scalable solution available for at-rest and in-motion analytics

Pre-built extractors, and toolkit to build custom Extractors

• Rich Extractor library supports multiple languages

• Declarative Information Extraction (IE) system based on an algebraic framework

Sophisticated tooling to help build, test, and refine rules

Developed at IBM Research since 2004

Embedded in several IBM products
• BigInsights, Streams
• Lotus Notes
• Cognos Consumer Insights

Slide navigation: What is TA | How is TA Deployed & used | Dev. tools | Why BigInsights TA

Page 18: Social Data Analytics using IBM Big Data Technologies

Applications of Text analytics

Broad range of applications in many industries

• CRM Analytics
  – Voice of customer
  – Product and Services gap analysis
  – Customer churn
• Social Media Analytics
  – Purchase intent
  – Customer churn prediction
  – Reputational Risk
• Digital Piracy
  – Illegal broadcast of streaming and video content
• Log Analytics
  – Failure analysis and root cause identification
  – Availability assurance
• Regulatory Compliance
  – Data Redaction
  – Identify and protect sensitive information

Page 19: Social Data Analytics using IBM Big Data Technologies

Performance Comparison (with ANNIE open source **)

[Chart: throughput (KB/sec, axis 0 to 700) against average document size (KB, axis 0 to 100), comparing SystemT with ANNIE, an open source entity tagger]

Task: Named Entity Recognition

Dataset: Different document collections from the Enron corpus obtained by randomly sampling 1000 documents for each size

Callouts: >10x faster; < 60% memory

Performance comparison with GATE 5

** http://dl.acm.org/citation.cfm?id=1858681.1858695

Page 20: Social Data Analytics using IBM Big Data Technologies

Text Analytics Development Flow

Development Tooling
• Declarative language for extractor logic
• Rule based language: Annotator Query Language (AQL), with familiar SQL-like syntax
• Specify annotator semantics declaratively

Optimization and deployment to scalable runtime
• Text Analytics Optimizer: choose an efficient execution plan (Compiled Operator Graph)
• Text Analytics Runtime: highly scalable, embeddable Java runtime

Flow: Sample Input Documents → Extractor → Extracted Information

Page 21: Social Data Analytics using IBM Big Data Technologies

Invoking Text Analytics within BigInsights

Jaql runtime coordinates a multi-stage map-reduce flow.

Diagram components: JAQL Function Wrapper (Input Adapter, SystemT Runtime, Output Adapter); AQL, SystemT Optimizer, Compiled Plan; Dictionaries

Input Record (document encoded as JSON record):

    {
      label: "http://www.ibm ...",
      text: "<html>\n<head> …"
    }

Output Record (annotations added as additional attributes to JSON record):

    {
      label: "http://www.ibm ...",
      text: "<html>\n<head> …",
      Person:
      [
        { firstName: [10, 15], lastName: [16, 25] },
        { firstName: [1042, 1045], lastName: [1046, 1050] }
      ],
      Hyperlink:
      [
        { anchorText: [25, 33] },
        { anchorText: [990, 997] }
      ],
      H1: …
    }
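The record-in, record-out contract on this slide can be mimicked in a few lines of Python. This is only a sketch of the idea, not Jaql or SystemT: the first-name dictionary, the regular expression, the `annotate` function, and the example URL and text are all hypothetical stand-ins for a compiled extractor. It does follow the slide's convention of storing each annotation as a `[begin, end]` character span and adding it as an extra attribute on the record.

```python
import json
import re

# Hypothetical first-name dictionary, playing the role of an extractor dictionary.
FIRST_NAMES = ("Jane", "Tony")

# Assumed rule: a dictionary first name followed by one capitalised token.
PERSON = re.compile(
    r"\b(?P<first>" + "|".join(FIRST_NAMES) + r") (?P<last>[A-Z][a-z]+)\b"
)

def annotate(record):
    """Return a copy of the record with Person spans as [begin, end] offsets."""
    text = record["text"]
    people = [
        {"firstName": list(m.span("first")), "lastName": list(m.span("last"))}
        for m in PERSON.finditer(text)
    ]
    # The input attributes are kept; annotations are added alongside them.
    return {**record, "Person": people}

if __name__ == "__main__":
    doc = {
        "label": "http://example.com/doc1",  # hypothetical label
        "text": "Contact Jane Doe or email Tony Cava today.",
    }
    print(json.dumps(annotate(doc), indent=2))
```

Storing offsets rather than copied strings keeps the output compact and lets downstream steps recover the matched text, or its surrounding context, from the original `text` attribute.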

Page 22: Social Data Analytics using IBM Big Data Technologies

Additional Advantages of IBM Text Analytics

Quality: Drives effectiveness of entire application
• Enables high accuracy and coverage

Performance: Dominant cost is CPU
• Process large documents and large numbers of documents with high throughput

Explainability
• Determine the cause of errors and fix them without affecting the remaining correct results

Reusability: easily adaptable for a different domain
• The development platform must enable layers of abstraction to be built and easily reused in a different domain

Expressivity
• Rule language with a rich set of operators available to enable complex extraction tasks

Page 23: Social Data Analytics using IBM Big Data Technologies

BigInsights Text Analytics Development

Page 24: Social Data Analytics using IBM Big Data Technologies

AQL editor with content assist

Page 25: Social Data Analytics using IBM Big Data Technologies

Understanding the lineage of results

• Click to drill down and see the rules that triggered inclusion of results
• Explain and search through the results

Page 26: Social Data Analytics using IBM Big Data Technologies

IBM Text Analytics for Big Data

High Performance Information Extraction Engine

Analysis can be applied to data at-rest and in-motion

• Build extractor once and use with BigInsights or Streams

Parallel execution scales to Big Data volumes

• Linearly scalable to extremely high volumes

Highly customizable to a variety of domains and languages

• Pre-built extractors available out of the box

Sophisticated tooling enables ease of development and refinement of results
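The "build extractor once" point can be pictured with a short Python sketch in which one extractor function is reused by a batch runner and a streaming runner. The function names and the hashtag rule are hypothetical. In the product the two runners correspond to BigInsights and Streams, which this sketch does not call.

```python
def extract_hashtags(text):
    """Toy extractor: lower-cased hashtags with trailing punctuation removed."""
    return [
        w.strip(".,!?").lower()
        for w in text.split()
        if w.startswith("#") and len(w) > 1
    ]

def run_at_rest(extractor, documents):
    """Batch mode: apply the extractor to a stored collection and keep all results."""
    return [extractor(doc) for doc in documents]

def run_in_motion(extractor, stream):
    """Streaming mode: yield one result per arriving item, without buffering."""
    for item in stream:
        yield extractor(item)

POSTS = [
    "Anyone have advice on that area? #atx #austinrealestate #austin",
    "$10 dollars says matthew & mary get married next season :) #downtownabbey",
]

if __name__ == "__main__":
    print(run_at_rest(extract_hashtags, POSTS))
    for tags in run_in_motion(extract_hashtags, iter(POSTS)):
        print(tags)
```

Because the extractor is a plain function with no knowledge of where its input comes from, both runners produce the same results for the same posts.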

Page 27: Social Data Analytics using IBM Big Data Technologies

Thank you