Jeroen Kleinhoven (Treparel), Turn Big Content into Business Insights - Data Donderdag

Treparel, Delftechpark 26, 2628 XH Delft, The Netherlands, www.treparel.com. Turn Big Content into Business Insights. Jeroen Kleinhoven, CEO, September 4, 2014

Upload: cre-aid

Post on 28-Nov-2014


Category: Data & Analytics



DESCRIPTION

Presentation by Jeroen Kleinhoven, CEO of Treparel, on Big Content and Big Data at Data Donderdag on 4 September 2014.

TRANSCRIPT

Page 1: Jeroen Kleinhoven (Treparel), Turn Big Content into Business Insights - Data donderdag

Treparel
Delftechpark 26, 2628 XH Delft, The Netherlands
www.treparel.com

Turn Big Content into Business Insights

Jeroen Kleinhoven, CEO

September 4, 2014

Page 2

Gartner Hype Cycle, Emerging Technologies, July 2014: Where are Content Analytics and Big Data?

Treparel KMX – All Rights Reserved 2014 www.treparel.com 2

Mainstream adoption:
•  Content Analytics is 2 to 5 years away.
•  Big Data is 5 to 10 years away.

Page 3

About Treparel

•  Company
–  HQ in Delft (The Netherlands)
–  R&D in ecosystem: Delft University of Technology, Univ. of Paris and Sao Paulo
–  Founded by a Data Scientist, a Visualization Prof and Search/Machine Learning engineers. Managed by a Gartner VP since 2013

•  Treparel is a solution provider:
–  Rooted in Patent Analytics & Visualization, evolved into Big Content Solutions
–  Big Content and KMX: content type agnostic Search, Text Analytics & Visualization
–  KMX (Knowledge Mapping eXploration) provides fast and accurate insights in Big Content (email, patents, literature, web, social) for making better informed decisions

•  3 types of clients:
–  End Users: Client/Server application (Download, Install, Run)
–  Partners: Client/Server + Developer API (Download, Install, Run + Integrate)
–  Independent: Developers, Researchers: Developer API + C/S… OpenSource (tbd)

Page 4

KMX - extract, analyze & visualize patterns in large content collections

1.  Landscaping / Clustering: Examine a content cluster and extract entities or references to people, products, locations, and other concepts

2. Categorization/Classification: Group similar information together

Page 5

Value from Big Content in Publishing

•  Examples of added value for Publishers:
1.  Content dashboarding: offering Business Intelligence style Search, Reporting, Analytics and Visualization
3.  Exploration of content that will not show up in a standard search query
4.  Interactive Content Navigation
As well as:
4.  Article recommendations, Smart collections, Group tagging

Page 6

1. Content Dashboard: ease-of-use navigation in large sets of content (Report – Search – Analyse – Visualize)

Ease of Use access to Research, Patents, Business News, Legislation

Recorded Demo: http://treparel.com/next-gen-ip-rd-dashboard/

Page 7

2. Enhance users' ability to visually explore relevant (hidden) content - 2

Interactive taxonomy with multiple coupled views incl. integrated visualizations and search in large sets of documents

Page 8

2. Enhance users' ability to visually explore content (example: search in research on Ebola)

[Figure: cluster landscape shown at zoom levels 1, 2 and 3]

Clustering: Automatic annotation and zooming on large sets of documents

Page 9

3. Exploration through classification of content (that will not show up in a standard search query)

[Diagram: funnel from the Publishing Database through Queries, Ranking and Filtering stages: 10,000 documents, then 1,000 documents, then 10 documents; Present Final Results in the Content Dashboard]
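The narrowing shown on this slide (10,000, then 1,000, then 10 documents) can be sketched as a chain of rank-and-filter stages. This is only an illustration: the `funnel` helper, the synthetic `database` and both score fields are assumptions made for the example, not part of KMX or its API.

```python
def funnel(documents, stages):
    """Run ranking-and-filtering stages in order.

    Each stage is a (score_fn, keep) pair: documents are ranked by score_fn
    (highest first) and only the top `keep` survive to the next stage.
    Returns the final documents and the collection size after every stage.
    """
    current = list(documents)
    sizes = [len(current)]
    for score_fn, keep in stages:
        current = sorted(current, key=score_fn, reverse=True)[:keep]
        sizes.append(len(current))
    return current, sizes


# Synthetic stand-in for a publishing database; both scores are made up.
database = [
    {"id": i, "query_score": i % 100, "classifier_score": (i * 7) % 1000}
    for i in range(10_000)
]

final_results, sizes = funnel(
    database,
    [
        (lambda doc: doc["query_score"], 1_000),    # stage 1: rank by query match
        (lambda doc: doc["classifier_score"], 10),  # stage 2: rank by classifier
    ],
)
print(sizes)
```

Each stage only sees what the previous stage kept, which is why a cheap query ranking can come first and a more selective ranking last.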

Page 10

Key Takeaways

Treparel is interested in partnering to empower content-rich, search-driven solutions.

•  Mail me your details at [email protected] if you’re interested in:

1.  Getting a 30-day free trial
2.  Test driving the KMX API in your content application
or
3.  Being part of the pre-launch group for… KMX OpenSource.

Page 11

APPENDIX  

Page 12

How to position KMX in Big Content Analytics

KMX & Developer API

Content Dashboard

Developer Partnerships

Key Solutions:
1.  Intellectual Property
2.  eDiscovery
3.  Publishing: Law, IP & Science
4.  Risk & Compliance
5.  Fraud & Forensics

Today’s topic

Page 13

KMX Text Analytics Application overview

[Diagram labels: Acquire documents; Text Preprocessing and Indexing; Query & Search Tools; Clustering; Classification; Semantic Analysis; Taxonomies, Ontologies; Visualization; Present Results]

KMX unique functions:
•  Extract concepts in context using clustering and classification of documents
•  Use classification to create ranked lists and to tag subsets
•  Support of binary and multi-class Classification
•  Enterprise edition (server/cloud) & Professional edition (desktop)
•  Integration with other applications through KMX API

Page 14

Clustering: User Unsupervised Analytics

Benefits: Get quick insights through automated visual clusters with annotations to enhance the discovery process
1.  Analyze the clusters and the relationships in the data
2.  Explore outliers in the data
3.  Find documents of interest

What it does: A visualization of clusters where the documents are displayed as points and the distance between them shows their similarity.

What KMX delivers: Use KMX to:
1.  Perform text preprocessing (stemming, tokenization, etc.)
2.  Calculate a similarity measure between all documents
3.  Calculate the visualization (landscape) with automatic annotation
4.  Create the visualization
–  As a static image
–  Or provide interaction where the user can zoom in/out with support for adaptive annotation
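The first two steps on this slide (text preprocessing, then a similarity measure between all documents) can be sketched in plain Python. The slides do not say which preprocessing or similarity measure KMX uses, so the stop list, the crude plural-stripping "stemmer" and the cosine similarity over term counts below are all assumptions chosen for illustration; the landscape layout and annotation steps are not shown.

```python
import math
import re
from collections import Counter

# Hypothetical stop list; a real one would be much longer.
STOP_WORDS = {"the", "a", "of", "and", "in", "for", "to", "is"}


def preprocess(text):
    """Lowercase, tokenize on letters, drop stop words, strip a trailing 's'.

    Stripping the final 's' is a deliberately crude stand-in for stemming.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    return [
        t[:-1] if t.endswith("s") and len(t) > 3 else t
        for t in tokens
        if t not in STOP_WORDS
    ]


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def similarity_matrix(docs):
    """Pairwise similarity between all documents."""
    vectors = [Counter(preprocess(d)) for d in docs]
    return [[cosine(u, v) for v in vectors] for u in vectors]


docs = [
    "Ebola virus vaccine trials",
    "Vaccine trial for the Ebola virus",
    "Patent filing for battery electrodes",
]
sim = similarity_matrix(docs)
for row in sim:
    print([round(value, 2) for value in row])
```

A layout step would then place documents with high pairwise similarity close together, which is what the slide describes as distance showing similarity.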

Page 15

Classification: User Supervised Analytics

Benefits: Finding fast, accurate and precise small result sets and enabling trend reporting and alerting by reusing predefined categorization models.
1.  Obtain a ranked list of the most relevant documents
2.  Separate the important documents from the irrelevant documents (noise)

How it works: A list of the relevant documents defined from a user's perspective.

What KMX delivers: Use KMX to:
1.  Tag (label) a small number of relevant and irrelevant documents
–  Use search to identify documents that need to be tagged
–  Perform manual tagging
–  Select documents interactively from the visualization
2.  Create a Classifier (categorizer) using the tagged documents
3.  Automatically perform the classification on all documents
4.  Obtain a ranking in which the important documents rank high and the irrelevant documents rank low
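The tag, train and rank workflow on this slide can be sketched with a very small learn-by-example scorer. The slides do not name the classifier KMX uses, so this example swaps in a simple centroid (Rocchio-style) scorer as a stand-in; the function names and sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Term-count vector for one document."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def centroid(texts):
    """Summed term counts of the tagged examples (cosine ignores scale)."""
    total = Counter()
    for text in texts:
        total.update(vectorize(text))
    return total


def rank(documents, relevant, irrelevant):
    """Score each document by closeness to the relevant examples minus
    closeness to the irrelevant ones, and return (score, document) pairs
    sorted from most to least relevant."""
    pos, neg = centroid(relevant), centroid(irrelevant)
    scored = [
        (cosine(vectorize(d), pos) - cosine(vectorize(d), neg), d)
        for d in documents
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Step 1: a handful of manually tagged examples.
relevant = ["ebola vaccine trial", "ebola outbreak treatment"]
irrelevant = ["football match results", "stock market results"]

# Steps 2 to 4: build the scorer from the tags and rank all documents.
documents = [
    "new ebola treatment study",
    "football league table",
    "vaccine trial for ebola outbreak",
    "market report",
]
ranked = rank(documents, relevant, irrelevant)
for score, doc in ranked:
    print(f"{score:+.2f}  {doc}")
```

Documents resembling the relevant examples get positive scores and rise to the top, while documents resembling the irrelevant examples get negative scores and sink, which mirrors the ranked list described in step 4.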

Page 16

KMX API: Embed Advanced Text Analytics functions

Clustering: Provides users with unsupervised analytics and automatically identifies inherent themes or information clusters. Through a dynamic hierarchical topic view into search results, it enables users to quickly focus on annotated subjects rather than scrolling through long result lists.

Classification: Supervised analytics to help users automatically categorize large sets of documents. The classification process can use a small number of document sets for learn-by-example categorization. By sorting the content of documents by topic, relevancy and keywords, users can apply their own models or rules for classification.

Visualization: Advanced visual knowledge discovery for displaying, exporting and sharing data results, ranked document lists, labeled and enriched data or interactive visualizations. Terms can be extracted to use in building thesauri or taxonomies.

KMX API: XML-RPC and REST (JSON); Python Pickle protocol

Server: User / Tenant management; User objects management (datasets, work spaces, classifiers, stop lists, ...)

Databases: Oracle, PostgreSQL

Client Application: Native Windows (for creating Analysis pipelines); using Qt for GUI; using OpenGL for visualizations

Page 17

Industry Thought Leaders about KMX

“Treparel KMX’s visualization capabilities around its auto-categorization and clustering offer immediate insight into unstructured data sets and appear to be adaptable and customizable to customer needs. Its approach to auto-categorization utilizes statistical principles and machine learning that require significantly less training and tuning on the part of customers than other approaches.” David Schubmehl, IDC

“As we acquire more and more information, we need tools that will guide us through the data maze. Analysts need tools to help them understand patterns and define clusters. Users need to explore data to uncover relationships from scattered sources. Treparel’s KMX serves both these needs with its ability to cluster and categorize collections of data with a high degree of accuracy, and its interactive visualization tools that enable exploration of large data sets.” Sue Feldman, Synthexis.com (author: The Answer Machine)
