Text Analytics for Unlocking the Potential of Big Data

Upload: emlyn

Post on 10-Jan-2016

TRANSCRIPT

Page 1: Text Analytics for Unlocking the Potential of Big Data

Text Analytics for Unlocking the Potential of Big Data

Bhavani Raskutti @ Pacific Brands

1 Text analytics & big data

2 New opportunities with text analytics

3 Challenges when mining text

4 Solutions to overcome challenges

5 Wrap-up

Page 3: Text Analytics for Unlocking the Potential of Big Data

Text Analytics & Big Data

Data used for Analytics Now (linear growth)

Customer Data
• Demographics
• Usage summary
• Product Usage

Transactional data
• Usage records
• Sales receipts
• Outputs from sensors
• Service assurance

Product Data
• Mix & usage

…

Other Data Available (exponential growth)

Traditional customer feedback
• Surveys
• Customer complaints
• Inbound emails

Social media data
• Facebook discussions
• Twitter feeds
• Blogs
• Youtube videos

Access Device Data
• GPS & locale data

…

Page 9: Text Analytics for Unlocking the Potential of Big Data

New Opportunities with Text Analytics

Mine freely available social media data for:
• Understanding customer sentiment
• Identifying major customer concerns
• Tracking sentiment/issues over time

Business implications:
• Ability to act on negative sentiments quickly
• Respond to customer concerns in a timely manner
• Target initiatives appropriately by continuous tracking

Superior market research & focus group outcomes

Page 10: Text Analytics for Unlocking the Potential of Big Data

Sentiment Analysis

Methodology:
• Score based on positive & negative sentiment words
• OR use supervised learning with labelled examples

New Opportunities

No sarcasm detection
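The word-scoring approach can be sketched in a few lines. This is a minimal illustration, not the speaker's implementation: the two word lists and the function name `sentiment_score` are hypothetical, and a real lexicon would be far larger.

```python
import re

# Hypothetical mini-lexicons, for illustration only.
POSITIVE = {"good", "great", "thanks", "love", "worth"}
NEGATIVE = {"ridiculous", "frustrating", "slow", "nightmare", "poor"}

def sentiment_score(text):
    """Return (number of positive words) - (number of negative words)."""
    words = re.findall(r"[a-z']+", text.lower())
    positives = sum(word in POSITIVE for word in words)
    negatives = sum(word in NEGATIVE for word in words)
    return positives - negatives

tweets = [
    "@telstra what's going on with bigpond tonight? It's terribly slow",
    "It is worth to pay the few extra dollars to stay away from "
    "@Vodafone_AU and go with @telstra. You get what you pay for.",
    "@Telstra Cannot connect to call centres now. Why close more?",
]

for tweet in tweets:
    print(sentiment_score(tweet), tweet)

# A made-up sarcastic tweet: the word "great" makes the score positive,
# which is the "no sarcasm detection" limitation in action.
print(sentiment_score("Oh great, no service again"))
```

A score above zero is read as positive sentiment and below zero as negative; the third tweet scores zero only because none of its words are in the toy lexicon.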

Page 11: Text Analytics for Unlocking the Potential of Big Data

Topic Detection

Methodology:
1. Create term frequency matrix from text sequences
2. Use unsupervised learning to create clusters
3. Create cluster descriptions

New Opportunities

Concerns, with examples of tweets in the cluster

Change plan
• “@Telstra I don't normally do this but ridiculous service today. Can't change my plan instore.Also urgent service issue 1 full week no call.”
• “@Telstra if i sign up on the $50 Every Day Connect BYO plan, can I upgrade to a similarly priced plan when the next iPhone is released?”
• “@Telstra Hey guys, need to change my plan. Due to constant drop outs in CBD I keep going over my cap. Can you help?”

Tech Support
• “On the phone to @telstra (bigpond) support and I really think I'm being punked. Tech keeps asking the same question. #frustrating”
• “@Telstra Thanks Greg. Tech support didn't really have an answer for me :\ V frustrating!”

Bigpond
• “@telstra what's going on with bigpond tonight? It's terribly slow”
• “@Telstra Hi. I'm trying to access my bigpond music account but keep getting directed to the mog trial page. Is this a glitch?”
• “@Telstra Hi! I have emails stuck in my outbox... getting an error code. Receiving ok. Are there any problems with Bigpond at the moment?”

Call centre
• “Telstra sending its call centre offshore. Don't think i will renew my contract with them now. @telstra”
• “@Telstra Cannot connect to call centres now. Why close more?”
• “@Telstra what will Russ be able to do that your call centre can't, for a telecommunications company your customer service is pretty poor!”

Pay Bill
• “@Telstra - Paying a bill is a nightmare, you guys need to centralise things.. jeez, i have to login 3 times to pay one bill”
• “@Telstra Can't pay my bill if you didn't send me one. This is your mistake and not the first time it's happened.”
• “It is worth to pay the few extra dollars to stay away from @Vodafone_AU and go with @telstra. You get what you pay for.”

Job Cuts
• “Very sad day for the @Telstra staff losing their job, it's never good when people lose their job here at home”
• “Sending good vibes to the @Telstra folk impacted by the recent job cuts. DM me for details for #jobs @auspost”
• “So help me out can you please? You cut jobs move services off shore pay less in wages an somehow your plans are more expensive”
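The three methodology steps can be sketched as follows. The slides do not name a clustering algorithm, so this assumes a k-means-style loop with cosine similarity; the function names, the fixed seed entries, and the four short entries are all hypothetical.

```python
import re
from collections import Counter

def term_counts(text):
    """Step 1: case-insensitive word counts for one entry."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = sum(c * c for c in a.values()) ** 0.5
    norm_b = sum(c * c for c in b.values()) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def cluster(vectors, seed_indices, iterations=10):
    """Step 2: assign each entry to its most similar centroid, then update."""
    centroids = [Counter(vectors[i]) for i in seed_indices]
    assignment = []
    for _ in range(iterations):
        assignment = [
            max(range(len(centroids)), key=lambda k: cosine(v, centroids[k]))
            for v in vectors
        ]
        for k in range(len(centroids)):
            members = [v for v, a in zip(vectors, assignment) if a == k]
            if members:
                # Cosine similarity ignores scale, so a sum works as a mean.
                centroids[k] = sum(members, Counter())
    return assignment, centroids

def describe(centroid, stopwords, n=3):
    """Step 3: describe a cluster by its most frequent non-stopword terms."""
    terms = [t for t in centroid if t not in stopwords]
    return sorted(terms, key=lambda t: (-centroid[t], t))[:n]

# Hypothetical short entries standing in for tweets.
entries = ["change my plan", "upgrade plan change",
           "pay my bill", "bill pay nightmare"]
vectors = [term_counts(e) for e in entries]
assignment, centroids = cluster(vectors, seed_indices=[0, 2])
for k, centroid in enumerate(centroids):
    print(k, describe(centroid, stopwords={"my"}, n=2))
```

Seeding from fixed entries keeps the sketch deterministic; a practical run would more likely use random or smarter initialisation and try several cluster counts.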

Page 13: Text Analytics for Unlocking the Potential of Big Data

Challenges in Text Analytics

1. Creating term frequency matrix for machine learning
– One row for each entry
– One column for each term/feature describing the entries

[Figure: example term frequency matrix, one row per entry. Column terms: 3, a, bill, can, cap, cbd, centralise, change, constant, drop, due, going, guys, have, help, hey, i, in, is, jeez, keep, my, need, nightmare, one, outs, over, pay, paying, plan, sign, things, times, to, you]

• Treat non-alpha as white space
• Case-insensitive
• Term = word
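A minimal sketch of building such a matrix, using two of the tweets quoted earlier in the deck. The function names are hypothetical. One assumption: digits are kept as terms, since "3" appears among the example columns, so the tokeniser treats non-alphanumeric characters as white space.

```python
import re

def tokenize(text):
    """Lower-case the text and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())

def term_frequency_matrix(entries):
    """Return (vocabulary, matrix): one row per entry, one column per term."""
    tokenized = [tokenize(entry) for entry in entries]
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    column = {term: j for j, term in enumerate(vocabulary)}
    matrix = [[0] * len(vocabulary) for _ in entries]
    for i, tokens in enumerate(tokenized):
        for term in tokens:
            matrix[i][column[term]] += 1
    return vocabulary, matrix

entries = [
    "@Telstra Hey guys, need to change my plan. Due to constant drop outs "
    "in CBD I keep going over my cap. Can you help?",
    "@Telstra - Paying a bill is a nightmare, you guys need to centralise "
    "things.. jeez, i have to login 3 times to pay one bill",
]
vocabulary, matrix = term_frequency_matrix(entries)
print(vocabulary)
for row in matrix:
    print(row)
```

Even with two entries most cells are zero, which previews the sparsity problem discussed under the feature-space challenge.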

Page 14: Text Analytics for Unlocking the Potential of Big Data

1. Term Frequency Matrix

Challenges

• Presence of non-informative words

• Different forms of the same words

• Spelling errors & typos

• Synonyms

• Homonyms

Page 15: Text Analytics for Unlocking the Potential of Big Data

2. Very Large Feature Space

Challenges

• Many different terms within a single entry
– 104 features with just 50 to 100 entries
– Sparse entries: many zeros in the matrix

• Unsupervised learning
– Hard to form cohesive clusters with sparse entries

• Supervised learning
– Traditional statistical learning techniques need at least 10 labelled examples for each uncorrelated feature

Page 17: Text Analytics for Unlocking the Potential of Big Data

1. Term Frequency Matrix

Solutions

• Presence of non-informative words
– Create a list of stopwords
– Remove them from consideration

• Different forms of the same words
– Use rule-based stemming to remove suffixes

• Spelling errors & typos
– Use a spell-checker OR
– Use n-grams (character sequences) as features
– 5-grams for 'single bill': 'singl', 'ingle', 'ngle ', 'gle b', 'le bi', 'e bil', ' bill'

• Synonyms
– Use a thesaurus (manual or statistical)

• Homonyms
– Provide context by using word pairs or triplets as features
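Several of these fixes are small enough to sketch directly. The stopword list and the two suffix rules below are hypothetical toys (a practical system would more likely use an established stemmer such as Porter's), and the function names are illustrative only.

```python
import re

# Hypothetical short stopword list; real lists hold hundreds of words.
STOPWORDS = {"a", "i", "in", "is", "to", "you", "my", "have"}

# Toy suffix rules, tried in order.
SUFFIXES = ("ing", "s")

def stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def words(text):
    """Tokenise, drop stopwords, then stem what remains."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [stem(t) for t in tokens if t not in STOPWORDS]

def char_ngrams(text, n=5):
    """All overlapping character sequences of length n."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def word_pairs(tokens):
    """Adjacent word pairs, which give a word some context."""
    return [" ".join(pair) for pair in zip(tokens, tokens[1:])]

tokens = words("Paying a bill is a nightmare")
print(tokens)
print(word_pairs(tokens))
print(char_ngrams("single bill"))
```

The last call reproduces the seven 5-grams listed for 'single bill'. Synonym handling is left out because it needs a thesaurus resource rather than a rule.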

Page 18: Text Analytics for Unlocking the Potential of Big Data

2. Very Large Feature Space

Solutions

• Use feature selection to identify significant features

• Features are of 3 types:
– Very frequent, low information content (e.g., stopwords)
– Infrequent, low information content (occur once or twice in the set)
– Significant middle-frequency features

• Many statistical techniques
– Inverse document frequency weight
– Signal-noise ratio
– Average discrimination value
– …

Unsupervised learning: hard to form cohesive clusters with sparse entries
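As one illustration of the first technique, the sketch below computes the common form of the inverse document frequency weight, idf(t) = log(N / df(t)), and keeps only middle-frequency terms. The cut-offs `min_df` and `max_df_ratio`, the function names, and the four token lists are hypothetical.

```python
import math

def document_frequency(documents):
    """Count, for each term, how many documents contain it."""
    df = {}
    for tokens in documents:
        for term in set(tokens):
            df[term] = df.get(term, 0) + 1
    return df

def inverse_document_frequency(documents):
    """idf(t) = log(N / df(t)); terms found in every document get weight 0."""
    n = len(documents)
    return {t: math.log(n / d) for t, d in document_frequency(documents).items()}

def select_features(documents, min_df=2, max_df_ratio=0.8):
    """Drop terms that are too rare or too common; keep the middle band."""
    n = len(documents)
    df = document_frequency(documents)
    return sorted(t for t, d in df.items() if d >= min_df and d / n <= max_df_ratio)

# Hypothetical tokenised entries.
documents = [
    ["my", "plan", "change"],
    ["my", "bill", "pay"],
    ["my", "plan", "upgrade"],
    ["my", "bill", "late"],
]
idf = inverse_document_frequency(documents)
selected = select_features(documents)
print(selected)
```

Here "my" occurs in every entry and is dropped as very frequent, the terms seen only once are dropped as infrequent, and "bill" and "plan" survive as the middle-frequency features.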

Page 19: Text Analytics for Unlocking the Potential of Big Data

2. Very Large Feature Space (Cont’d)

Solutions

• Use new techniques based on maximal margin separators that can handle a large feature space

• Support Vector Machines

Supervised learning: traditional statistical learning techniques need at least 10 labelled examples for each uncorrelated feature

Page 20: Text Analytics for Unlocking the Potential of Big Data

Support Vector Machines

Solutions

Customers who churned to other providers

Customers who are loyal

Objective: to learn a separator to identify people likely to churn before they do

Page 21: Text Analytics for Unlocking the Potential of Big Data

Support Vector Machines

Solutions

What is a good separator?

Maximises margin between two parallel supporting hyperplanes

Separator depends on support vectors
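In symbols, and assuming the two classes are linearly separable, the standard hard-margin formulation reads as follows. The notation is not from the slides: entries are x_i, labels y_i are +1 or -1, and the separator is the hyperplane w·x + b = 0.

```latex
\min_{w,\,b}\ \tfrac{1}{2}\lVert w \rVert^{2}
\quad \text{subject to} \quad
y_i\,(w^{\top} x_i + b) \ge 1, \qquad i = 1, \dots, n
```

The two supporting hyperplanes are w·x + b = +1 and w·x + b = -1, and the distance between them is 2/‖w‖, so minimising ‖w‖ maximises the margin. The support vectors are the entries for which the constraint holds with equality; removing any other entry leaves the solution unchanged.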

Page 22: Text Analytics for Unlocking the Potential of Big Data

Support Vector Machines

Solutions

Why does maximising the margin work?

Small margin means more choice of separators & overfits the data

Large margin means less choice & no overfitting

Page 23: Text Analytics for Unlocking the Potential of Big Data

2. Very Large Feature Space (Cont’d)

Solutions

• Use new techniques based on maximal margin separators that can handle a large feature space

• Support Vector Machines
– Maximises margin between two classes
– Separator depends only on support vectors
– Separator obtained using quadratic programming

• Available in some statistical packages

Supervised learning: traditional statistical learning techniques need at least 10 labelled examples for each uncorrelated feature
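For readers without such a package, a linear separator of this kind can be approximated in plain Python. Note the substitution: instead of quadratic programming, this sketch minimises the soft-margin hinge loss by sub-gradient descent, which is simpler to write but is not the method named above. The churn data, feature meanings, and parameter values are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def train_linear_svm(X, y, lam=0.01, lr=0.01, epochs=200):
    """Sub-gradient descent on lam/2 * ||w||^2 + max(0, 1 - y * (w.x + b))."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            violated = yi * (dot(w, xi) + b) < 1
            # The regulariser always shrinks w a little.
            w = [wj * (1 - lr * lam) for wj in w]
            if violated:
                # The entry is inside the margin: move the separator away from it.
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b

def predict(w, b, x):
    """+1 = likely to churn, -1 = likely to stay loyal."""
    return 1 if dot(w, x) + b >= 0 else -1

# Hypothetical features per customer: [complaint-term count, praise-term count].
X = [[3, 0], [4, 1], [0, 3], [1, 4]]
y = [1, 1, -1, -1]  # +1 = churned, -1 = loyal

w, b = train_linear_svm(X, y)
print(w, b)
print([predict(w, b, x) for x in X])
```

Once the entries lie outside the margin, only the small shrinkage step changes w, which mirrors the point that the separator depends only on the entries nearest the boundary.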

Page 24: Text Analytics for Unlocking the Potential of Big Data

Wrap-up

• Text analytics creates new opportunities for businesses to understand their customers
– Understanding customer sentiment
– Identifying major customer concerns
– Tracking sentiment/issues over time

• A few challenges in implementing text analytics
– Creating term frequency matrix from text sequence
– Large number of features in matrix

• Many techniques to overcome these challenges

Now is the time to use text analytics to unlock the potential of big data in your business!!