
Post on 24-Jun-2020


Page 1: Unstructured Big Data - IHBI webinar B…

chp.cmich.edu/ihbi

UNSTRUCTURED BIG DATA
2016-03-24

Customers: Gender/age, Socioeconomic Status, Geography, Weather/Culture/Terrain, Satisfaction
Products/Services: Features, Benefit, Irritant, Undesirable
Cost, Purchase, Environmental
Influencers, Promotion, Competitors, Antagonists
Producers: Reputation, Innovation, Quality, Social Contract

Page 2

Big Data: Internet of Things

Internet of Things (IoT): the world through the eyes of sensors

Page 3

Big Data: Unstructured Communications

Unstructured Data (Texts): the world through the eyes of humans

In this talk, we focus on text, recognizing that audio, image, and gesturing are active research areas.

Page 4

Topics
• Business Value
• Data Sources
• Data Selection
• Taxonomy & Ontology
• Analytics
• Knowledge Extraction
• Knowledge Presentation
• RoadMap

Page 5

Business Value

• Competition is forcing companies to be much quicker at detecting changing customer values & expectations:
  - Performance
  - Environmental impact
  - Safety
  - Aesthetics
  - Cost
• Text Mining promises rapid, continuing, detailed identification of market niches and recent technology developments:
  - Geographic and weather related
  - Age-group, gender
  - Ethnic background
  - Socio-economic background

Page 6

Requirements

• To obtain and maintain a capability to keep abreast of these dynamics, the successful organization must:
  - Monitor the communications between members of the market segments
  - Identify individuals (persons and organizations) with high impact
  - Be able to extract and quantify changes in the values and sentiments relevant to the organization's goals
  - Identify new solutions for your business problems
  - Inform the appropriate business decision makers about important changes.
• This implies hardware, communications and software investments matched by people with advanced IT and business analytics skills

Page 7

Organizational Implications

• Deployment of Capabilities (Skills & Hardware)
• Obtaining and using data from outside the organization
• Data Security (Knowing your sources – hacking)
• Data Quality Assessment methods (Contamination)
• Retention Policies (Where to keep what data)
• Data Stewardship

Page 8

Deployment

• Outsourcing
• Funding projects
  - Investments to learn how to extract value
• Centralization of the capability
  - Until tools are available to reduce the learning curves
  - Requires project prioritization to pursue realistic business value

Page 9

Data Sources
Attributes / Collection Factors

• Primary Data Sources
• Data Products from 3rd parties
• Extractors
• Filters
• Text Cleansing/Transforms
• Synchronize, harmonize, and integrate
• Data staging

• Language (Varies by 'Mother tongue')
  - Spelling accuracy
  - Grammar
  - Translation
• Source purpose
  - Observations
  - Opinion
  - Analysis
  - Emotional reaction
• Timeline
• Granularity
• Target audience
• Author and Author Affiliations

Page 10

Data Collections

Diagram elements:
• Crawlers, Filters & Cleaners
• Topical Repositories (Geography, Time frame)
• Enhanced Data Warehouse
• Meta Data Directory

Page 11

Data Sources
• Social Media
• Blogs
• Emails
• Search Engines
• Web-pages
• Click-streams
• RSS feeds
• Newspapers, Trade journals, Magazines
• Patents, Reports
• Peer-reviewed journals
• Government/Industry data sources

Collection Management
• Meta-data of specific sources
  - Language, Frequency, Filter details
• Crawlers
• Third-party extractors
• Scheduler

How / Who will manage this?

Page 12

Data Selection Criteria
• Subject of interest
• Granularity
  - Time (When)
  - Geography (Where)
  - Population segment (Who)
  - Technology (What)
• Values (Why)
  - Performance
  - Cost
  - Sustainability
  - Environmental impact
  - etc.
• Keywords
  - Connected documents (links, citations, site-maps, ..)
• Tags
  - Explicit (Annotators)
  - Inferred
• Fuzzy Logic (convert quantitative data to words)
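The "Fuzzy Logic (convert quantitative data to words)" item can be illustrated with a minimal sketch. The variable (air temperature in degrees Celsius), the three labels, and the triangular breakpoints are all assumptions chosen for illustration; none of them come from the slides.

```python
def triangular(x, left, peak, right):
    """Membership degree in [0, 1] for a triangular fuzzy set."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

# Hypothetical fuzzy sets (left, peak, right) for a temperature reading.
TEMPERATURE_SETS = {
    "cold": (-40.0, 0.0, 15.0),
    "mild": (5.0, 18.0, 28.0),
    "hot": (22.0, 35.0, 60.0),
}

def to_words(value, fuzzy_sets=TEMPERATURE_SETS):
    """Return (label, degree) pairs with non-zero membership, strongest first."""
    degrees = {label: triangular(value, *abc) for label, abc in fuzzy_sets.items()}
    return sorted(((l, d) for l, d in degrees.items() if d > 0),
                  key=lambda pair: pair[1], reverse=True)

def best_word(value, fuzzy_sets=TEMPERATURE_SETS):
    """Pick the single best-matching word, or None if no set applies."""
    ranked = to_words(value, fuzzy_sets)
    return ranked[0][0] if ranked else None
```

A reading such as 10 belongs partly to "cold" and partly to "mild", so `to_words(10)` returns both labels with their degrees. The resulting words can then be stored as tags next to the text data.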

Page 13

Search Logic
• Generate new extraction processes for new subjects
• Ongoing Maintenance of data collections for specific business purposes
• Modification of previous projects' data collection processes
• Shut down of obsolete data collections

• Meta-Data of Past Studies
  - Sources
  - Search logic
  - Search engine identification
  - Post-project Assessment
• Granularity transforms
  - Time, Geography, ..
• Keyword lists for specific concepts
• Previously generated Annotators

How / Who will manage this?

Page 14

Data Extraction for Business Goals

Diagram elements:
• Topical Repositories (Geography, Time frame)
• Unique requirements
• Previous selection logic
• When, Where, Who, What, Why
• Taxonomy & Ontology
• Clusters
• Annotators
  - Time frame
  - Geography
  - Author
  - Inheritance
  - Sentiment
  - Concept titles
  - …
• Harmonization
  - Part of speech
  - Synonyms
  - Extractors
  - Start/Stop lists
  - …

Page 15

Taxonomy, Ontology and Annotators

• Taxonomy: set of unique concepts (with tags) that cover the subject of interest
• Ontology: relationships between terms of the taxonomy:
  - Contained within / is part of
  - Sequence in time
• Annotators: generation of tags to specify non-obvious attributes
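One way to picture how these three terms fit together is the small sketch below. The concept names, tags, relation labels (`part_of`, `precedes`), and the helper functions are hypothetical and only illustrate the definitions above; they are not the presenters' implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Concept:
    """One taxonomy entry: a unique concept plus the tags that identify it."""
    name: str
    tags: set = field(default_factory=set)

# Taxonomy: unique concepts covering the subject (hypothetical entries).
taxonomy = {
    c.name: c
    for c in [
        Concept("vehicle", {"car", "automobile"}),
        Concept("engine", {"motor", "powertrain"}),
        Concept("purchase", {"buy", "order"}),
        Concept("delivery", {"shipping", "arrival"}),
    ]
}

# Ontology: typed relationships between taxonomy terms.
ontology = [
    ("engine", "part_of", "vehicle"),      # contained within / is part of
    ("purchase", "precedes", "delivery"),  # sequence in time
]

def related(term, relation):
    """Return the terms that `term` points to through `relation`."""
    return [obj for subj, rel, obj in ontology if subj == term and rel == relation]

def annotate(text):
    """Tag a text with matched concepts plus parents inferred via part_of."""
    words = set(text.lower().split())
    direct = {name for name, c in taxonomy.items() if words & (c.tags | {name})}
    inferred = {parent for name in direct for parent in related(name, "part_of")}
    return sorted(direct | inferred)
```

Here `annotate("The motor arrived after shipping")` tags the text with "engine" and "delivery" directly, and adds "vehicle" as a non-obvious tag because the ontology says an engine is part of a vehicle.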

Page 16

Taxonomy & Ontology Development
• What is a document?
  - Sentence? Paragraph?
• Word x Document matrix
  - Parse & Stem
• Taxonomy Generation
• Clustering
• Naming the Clusters
• Sentiment Assignment
• Residuals (not clearly interesting)

• Synonyms
• Concepts
• Start words/concepts
• Stop words/concepts
• Sentiment words & phrases
• Annotators
  - Parts of Speech
  - UIMA-rules (Unstructured Information Management applications)
  - Inference (products without a component of interest might be tagged as 'component-free')
  - Inheritance
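The "Word x Document matrix" and "Parse & Stem" steps can be sketched with the standard library alone. The stop list, the suffix-stripping `stem` function, and the sample documents are assumptions for illustration; a production pipeline would normally use an established stemmer rather than this crude one.

```python
import re
from collections import Counter

# Hypothetical stop list; real stop lists are longer and domain-specific.
STOP_WORDS = {"the", "a", "an", "is", "are", "and", "of", "to", "in", "it"}

def stem(word):
    """Crude suffix stripper for illustration only; not a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def parse(document):
    """Lowercase, split on non-letters, drop stop words, then stem."""
    tokens = re.findall(r"[a-z]+", document.lower())
    return [stem(t) for t in tokens if t not in STOP_WORDS]

def word_document_matrix(documents):
    """Return (vocabulary, matrix); matrix[i][j] counts word i in document j."""
    counts = [Counter(parse(doc)) for doc in documents]
    vocabulary = sorted(set().union(*counts)) if counts else []
    matrix = [[c[word] for c in counts] for word in vocabulary]
    return vocabulary, matrix

docs = ["The battery is charging", "Battery charged in the cold", "Cold starts"]
vocab, matrix = word_document_matrix(docs)
```

Each row of `matrix` is one stemmed word and each column is one document, so "charging" and "charged" land in the same row. Whether a "document" is a sentence, a paragraph, or a whole post only changes what goes into `docs`.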

Page 17

Normalization

• Define the base for counts:
  - Put in terms of communication intensity
    - Segment population
    - School in session, or not
  - Number of 'touches'
    - Followers, readers, subscribers
  - Responses
    - Likes, replies, forum thread length, …
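A minimal sketch of choosing a base for counts follows. The segment names and all figures are invented for illustration, and the per-1,000 scaling is an assumption rather than something the slides specify.

```python
def normalize(count, base, per=1000):
    """Express a raw count as a rate per `per` units of the chosen base."""
    if base <= 0:
        raise ValueError("base must be positive")
    return count * per / base

# Hypothetical weekly figures for one concept in two market segments.
segments = {
    "students": {"mentions": 120, "population": 40_000, "followers": 8_000, "replies": 300},
    "retirees": {"mentions": 90, "population": 15_000, "followers": 2_000, "replies": 45},
}

rates = {
    name: {
        "mentions_per_1k_population": normalize(s["mentions"], s["population"]),
        "mentions_per_1k_touches": normalize(s["mentions"], s["followers"]),
        "replies_per_mention": s["replies"] / s["mentions"],
    }
    for name, s in segments.items()
}
```

In this made-up data the students produce more raw mentions, yet the retirees mention the concept twice as often per 1,000 people, which is the kind of reversal normalization is meant to expose.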

Page 18

Clustering Process
• Iterative clustering and cluster/document naming by subject
• Multiple dimensions:
  - Verbs
  - Nouns
• Hierarchic clustering
  - Features, Benefits
  - Values
• Self-organizing, k-means, ..
• Organized lists of:
  - Synonyms
  - Stop Lists
  - Start Lists
  - Common concepts
• Sentiment Rules for weights
• Annotators & Data Supplements

How / Who will manage this?
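Of the methods named above, k-means is the simplest to sketch. The version below is a plain-Python illustration on invented term-count vectors. Seeding with the first k vectors and naming each cluster after its heaviest term are simplifying assumptions, not the presenters' procedure.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]

def k_means(vectors, k, iterations=20):
    """Plain k-means; seeds with the first k vectors so the run is repeatable."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: squared_distance(v, centroids[c]))
                  for v in vectors]
        for c in range(k):
            members = [v for v, l in zip(vectors, labels) if l == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = mean(members)
    return labels, centroids

# Hypothetical document vectors: counts of the terms listed in `terms`.
terms = ["battery", "charge", "price", "discount"]
doc_vectors = [
    [3, 2, 0, 0],
    [0, 0, 2, 3],
    [2, 3, 0, 1],
    [0, 1, 3, 2],
]
labels, centroids = k_means(doc_vectors, k=2)

# Name each cluster after the term with the largest centroid weight.
names = [terms[max(range(len(terms)), key=lambda i: c[i])] for c in centroids]
```

In practice the naming step is where the iteration in the first bullet comes in: an analyst reviews the proposed names, adjusts synonym and stop lists, and reruns the clustering.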

Page 19

Venn Diagrams - Relationships

• Universe of documents
• Classes of interest in hierarchical fashion
• Classes that have high chronological correlation
• Classes that are 'competitors'
  - Within same higher level
  - In different higher levels of hierarchy
  - Differences in Feature frequencies between 'competitors'
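The Venn-diagram view maps naturally onto set operations. In the sketch below the document ids, class memberships, and feature tags are all invented, and the Jaccard ratio is one possible overlap measure chosen here as an assumption.

```python
# Hypothetical document ids assigned to two 'competitor' classes.
universe = set(range(1, 11))
class_a = {1, 2, 3, 4, 5}
class_b = {4, 5, 6, 7}

both = class_a & class_b                  # documents in both classes
only_a = class_a - class_b                # documents unique to class A
neither = universe - (class_a | class_b)  # documents outside both classes

def jaccard(x, y):
    """Overlap between two classes: |intersection| / |union|."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0

# Hypothetical feature tags per document, used to compare feature frequencies.
features = {1: {"range"}, 2: {"range", "price"}, 3: {"price"}, 4: {"range"},
            5: {"price"}, 6: {"price"}, 7: {"price", "design"}}

def feature_share(doc_ids, feature):
    """Fraction of the class's documents that mention the feature."""
    return sum(feature in features.get(d, set()) for d in doc_ids) / len(doc_ids)

difference = feature_share(class_a, "range") - feature_share(class_b, "range")
```

A positive `difference` means the feature is discussed more often in class A than in its competitor, which is the comparison the last bullet describes.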

Page 20

Modeling
• Map classes together to define interest in differences
• Analysis of existing data in paired clusters over time
• Implement SPC to trigger alerts when outside control intervals

• Mapping tool to show classes and relationships
• Process to convert maps to statistical comparisons of frequencies within classes
• SPC triggers:
  - Big change in frequency of {concept} in {class}
  - Emerging/fading {concepts} in {class}
  - Big change in relationship between {class1} and {class2}
• Find high impact documents that have/will likely affect {concept} frequency in {classes} in future
• Business user feedback module to identify false positives/negatives for model improvements
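The first SPC (statistical process control) trigger can be sketched as a simple control chart. The baseline data, the three-sigma limits, and the use of the population standard deviation are assumptions for illustration; the slides do not say which control rules were intended.

```python
from statistics import mean, pstdev

def control_limits(baseline, sigmas=3.0):
    """Shewhart-style limits: baseline mean plus/minus `sigmas` std deviations."""
    center = mean(baseline)
    spread = pstdev(baseline)
    return center - sigmas * spread, center + sigmas * spread

def spc_alerts(baseline, new_values, concept, doc_class, sigmas=3.0):
    """Return one alert message per new value outside the control interval."""
    low, high = control_limits(baseline, sigmas)
    alerts = []
    for period, value in enumerate(new_values, start=1):
        if value < low or value > high:
            alerts.append(
                f"Big change in frequency of {concept} in {doc_class}: "
                f"period {period} value {value} outside [{low:.1f}, {high:.1f}]"
            )
    return alerts

# Hypothetical weekly share (%) of documents in a class that mention a concept.
baseline = [10, 12, 11, 9, 10, 11, 12, 9, 10, 11]
alerts = spc_alerts(baseline, [11, 10, 19], "battery life", "owner forums")
```

Only the third new value falls outside the interval, so a single alert is produced. The business user feedback module mentioned above would then mark that alert as a true or false positive and, if needed, adjust `sigmas`.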

Page 21

Analytics

Diagram elements:
• When, Where, Who, What, Why
• Clusters
• Class Relationships
  - Competitive
  - Parent-Child
  - Sibling
  - ..
• Classifiers: Neural Networks, Support Vector Machines
• Frequency: Ratios across Classes, Ratios within Classes
• SPC over time
• Networks: Influencers
• New Docs
• Decision Trees: Scoring Leaves
• Sequence Analysis: Events that are followed shortly by shifts in distributions: Event → Impacts (duration)
• Feature Correlations: Within a Class, Between Classes
• Sentiment: Pos/Neg/Neutral

Page 22

Analytics
• Classification
• Trends within classes
• Organizing Classes
  - Competing (Sentiment, with/without, ..)
  - Independent
  - Hierarchical
  - Chronological
• Sentiment measure for classes
• Statistical differences in related class contents
• Emerging/dying concepts
• Influence Tracking/Measurement

• Neural Networks (classifiers)
• Normalization
• SPC
• Integration with structured data
• Network Analysis
• Sequence Analysis
• False positives/negatives
• Sentiment measures
• Issues:
  - Ambiguity
  - Sarcasm
  - Analogies
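A sentiment measure for a class can be sketched with a small word-list scorer. The word lists, the one-word negation rule, and the Pos/Neg/Neutral thresholds below are assumptions for illustration; real sentiment lexicons and rules are considerably richer.

```python
import re

# Hypothetical sentiment word lists; a real lexicon would be far larger.
POSITIVE = {"good", "great", "love", "reliable", "fast"}
NEGATIVE = {"bad", "poor", "hate", "slow", "broken"}
NEGATORS = {"not", "never", "no"}

def sentiment(text):
    """Label a text Pos, Neg, or Neutral from word counts with one-word negation."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, token in enumerate(tokens):
        polarity = (token in POSITIVE) - (token in NEGATIVE)
        if polarity and i > 0 and tokens[i - 1] in NEGATORS:
            polarity = -polarity  # "not good" counts as negative
        score += polarity
    if score > 0:
        return "Pos"
    if score < 0:
        return "Neg"
    return "Neutral"

def class_sentiment(texts):
    """Share of Pos, Neg, and Neutral documents within one class."""
    labels = [sentiment(t) for t in texts]
    return {label: labels.count(label) / len(labels)
            for label in ("Pos", "Neg", "Neutral")}
```

The listed issues show up immediately: a sarcastic "Great, it broke again" is scored Pos by this sketch, which is one reason the feedback on false positives and negatives matters.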

Page 23

Knowledge Extraction
• Dynamics of past to current Classes
  - Size of subsets
  - Relationships of competitors
  - Relationships between peer classes
• Role of specific authors or groups of authors
  - Age, gender
  - Geography, organizational association
• Relationship of Class statistics to events (chronology)
  - What events change perceptions?

• Important events
  - With impacts
• Important sources/authors
  - Influence
• Strong associations
  - Classes that change synchronously
• Sequence rules
  - Possible cause-effects
• SPC
  - Dynamics of the field
• Insight Delivery: Alerting sub-system to route specified insights to roles → people (email addresses)
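The "roles → people" routing in the Insight Delivery item can be sketched as two lookup tables. The insight types, role names, and example.com addresses are hypothetical, and actual email sending is deliberately left out.

```python
# Hypothetical routing tables: insight type -> roles, role -> email addresses.
INSIGHT_ROLES = {
    "emerging_concept": ["product_manager", "market_analyst"],
    "sentiment_shift": ["brand_manager", "market_analyst"],
}
ROLE_PEOPLE = {
    "product_manager": ["pm@example.com"],
    "market_analyst": ["analyst1@example.com", "analyst2@example.com"],
    "brand_manager": ["brand@example.com"],
}

def recipients(insight_type):
    """Resolve an insight type to a de-duplicated, ordered list of addresses."""
    seen, result = set(), []
    for role in INSIGHT_ROLES.get(insight_type, []):
        for address in ROLE_PEOPLE.get(role, []):
            if address not in seen:
                seen.add(address)
                result.append(address)
    return result

def route(insight_type, message):
    """Build (address, message) pairs; sending is left to the mail system."""
    return [(address, message) for address in recipients(insight_type)]
```

Keeping roles separate from people means a staffing change only touches `ROLE_PEOPLE`, while the decision about which insights matter to which role stays in `INSIGHT_ROLES`.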

Page 24

Knowledge Presentation
• Flow charts of data through processing steps
• Texts (Examples or summaries)
• Responsibilities
• Class Relationships
• Timelines
• Related Class attributes
• Document and Agent relationships

• PPTX of Procedures
• Narratives of conclusions
• Swimlanes
• Venn Diagrams
• Line graphs (Timelines)
• Tables of Class Attributes
  - Pie charts, Gauges
• Network Diagrams
• Alerts

Page 25

Alerts

Diagram elements:
• Classifiers: Neural Networks, Support Vector Machines
• Updated Knowledge Bases
• Business Analysts

Page 26

High Impact Agents

• People: who, how identified, when
• Organizations
• Events
• Document types
• Source Types
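One simple way to identify high-impact people is to count how many distinct agents respond to each author. The interaction data below is invented, and using this in-degree style score as the impact measure is an assumption; the slides do not specify a method.

```python
# Hypothetical interactions: (author, responded_to) pairs from replies or citations.
interactions = [
    ("ann", "raj"), ("bob", "raj"), ("cy", "raj"),
    ("raj", "dee"), ("ann", "dee"), ("bob", "ann"),
]

def influence_scores(edges):
    """Score each agent by the number of distinct agents who responded to them."""
    responders = {}
    for source, target in edges:
        responders.setdefault(target, set()).add(source)
    return {agent: len(who) for agent, who in responders.items()}

def high_impact(edges, top_n=2):
    """Return the top_n agents by score, ties broken alphabetically."""
    scores = influence_scores(edges)
    return sorted(scores, key=lambda a: (-scores[a], a))[:top_n]

top_agents = high_impact(interactions)
```

The same scoring works for organizations, document types, or source types by changing what the edge endpoints represent.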

Page 27

Analysis → Prediction → Prescription

• If sub-markets are unhappy with your 'product':
• Use predictive models to estimate value involved.
  - If customer education is the answer ("you have an appropriate product"), send education messages via the Influencers showing evidence of mis-education
  - If you evolve your product features/costs, do that and then educate your potential customers.

Page 28

RoadMap Swimlanes

• Executive Mgt: Sponsor Sets Budget + Timeline for PoC → Set Budget + Timeline for Production Env → Monitor Budget & Returns of Big Data investments
• Knowledge Mgt: Find Case Histories → Establish methodology for Taxonomy & Ontology → Manage Taxonomy & Ontology growth
• Bus. Process Mgt: Specify PoC scope and goals → Prioritize projects of scope & goals → Deploy capability as capacity grows
• IT & Procurement: Find Trusted Consultants & Vendors → Identify & train resources; procure hardware → Maintain skills & hardware to support capability

Page 29

Topics Discussed
• Business Value
• Data Sources
• Data Selection
• Taxonomy & Ontology
• Analytics
• Knowledge Extraction
• Knowledge Presentation
• RoadMap


Page 31

Contacts

• Dr. Imad Haidar, Sr. Researcher & Data Scientist, IHBI, [email protected]
• Chunxia (Shar) Tang, Senior Research Analyst, [email protected]
• James Mentele, Senior Research Fellow, [email protected]