Text Analysis with SAP HANA

Upload: christian-lechner

Post on 14-Apr-2017

October 2015 | Text Analysis with SAP HANA - SAP Inside Track Munich

1. Motivation

2. Text Analysis with SAP HANA

3. Enhancement Options - Dictionaries and Rules

Why do we need Text Analysis?

• According to Merrill Lynch, 80-90% of all potentially usable business information may originate in unstructured form (Structure, Models and Meaning: Is "unstructured" data merely unmodeled?, Intelligent Enterprise, March 1, 2005).

• The data might originate from:

- Social networks

- “Letters” from customers

- ...

• What is the problem with unstructured data?

• It is unstructured!

- Not organized

- No pre-defined data model

- No metadata, or a mix of data and metadata

We have a lot of information that is relevant for the business, but we cannot access it.

How can we solve that issue?

• Text Analysis: Extracting high quality information from texts

• Typical process of a text analysis:

- Parsing of the text

- Adding features like linguistic information

- Entity recognition: Is it an organization, a person, or a place? This includes domain facts like requests.

- Sentiment analysis: What attitudinal information is “hidden” in the text?

- Insertion of the information into the database in a structured manner
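The process steps above can be sketched in a few lines of Python. This is a toy illustration only, not how SAP HANA implements text analysis: the two lexicons, the `Annotation` record, and the `analyze` function are all invented for this example, and lower-casing stands in for real linguistic processing.

```python
import re
from dataclasses import dataclass

# Hypothetical mini-lexicons, invented for this illustration.
ENTITY_LEXICON = {"sap": "ORGANIZATION", "munich": "LOCALITY"}
SENTIMENT_LEXICON = {"great": "POSITIVE", "awful": "NEGATIVE"}


@dataclass(frozen=True)
class Annotation:
    doc_id: int
    token: str
    kind: str    # entity type or sentiment label
    offset: int  # character offset of the token in the text


def analyze(doc_id, text):
    """Turn free text into structured annotation rows."""
    rows = []
    # Step 1: parse the text into tokens, keeping their offsets.
    for match in re.finditer(r"\w+", text):
        token = match.group()
        # Step 2: lower-casing is a crude stand-in for linguistic features.
        key = token.lower()
        # Step 3: entity recognition by dictionary lookup.
        if key in ENTITY_LEXICON:
            rows.append(Annotation(doc_id, token, ENTITY_LEXICON[key], match.start()))
        # Step 4: sentiment analysis by word list.
        if key in SENTIMENT_LEXICON:
            rows.append(Annotation(doc_id, token, SENTIMENT_LEXICON[key], match.start()))
    # Step 5: the rows are structured and could be inserted into a table.
    return rows


for row in analyze(1, "The SAP event in Munich was great"):
    print(row)
```

Each returned row has a fixed shape (document, token, type, position), which is what makes the information accessible to ordinary queries afterwards.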

What has this to do with SAP HANA?

© SAP SE

Fulltext Index - Basics

• Starting point: database table containing the text (types like TEXT, NVARCHAR, BLOB …)

• Create a fulltext index, including its options (see system view SYS.FULLTEXT_INDEXES)

Entity Extraction

• To get valuable information out of the data, SAP delivers several configurations

• These configurations focus on entity and fact extraction under specific aspects

• Types of Extraction:

- EXTRACTION_CORE

- EXTRACTION_CORE_ENTERPRISE

- EXTRACTION_CORE_PUBLIC_SECTOR

- EXTRACTION_CORE_VOICEOFCUSTOMER
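To make the link between the fulltext index and the extraction configuration concrete, here is a minimal Python sketch that only assembles the SQL text. The table, column, and index names are hypothetical, and the helper functions are invented for this example. The statement shape (`CONFIGURATION '...' TEXT ANALYSIS ON`) and the `$TA_<index name>` result table reflect my understanding of the HANA SQL syntax and should be checked against the SQL reference for your revision.

```python
import re

# The four configuration names listed above.
DELIVERED_CONFIGURATIONS = {
    "EXTRACTION_CORE",
    "EXTRACTION_CORE_ENTERPRISE",
    "EXTRACTION_CORE_PUBLIC_SECTOR",
    "EXTRACTION_CORE_VOICEOFCUSTOMER",
}

# Only plain upper-case identifiers are accepted, to keep quoting trivial.
_SIMPLE_IDENTIFIER = re.compile(r"[A-Z_][A-Z0-9_]*")


def _check_identifier(name):
    if not _SIMPLE_IDENTIFIER.fullmatch(name):
        raise ValueError(f"unexpected identifier: {name!r}")


def create_fulltext_index_sql(index, table, column, configuration):
    """Build a CREATE FULLTEXT INDEX statement with text analysis enabled."""
    for name in (index, table, column):
        _check_identifier(name)
    if configuration not in DELIVERED_CONFIGURATIONS:
        raise ValueError(f"unknown configuration: {configuration!r}")
    return (
        f'CREATE FULLTEXT INDEX "{index}" ON "{table}" ("{column}") '
        f"CONFIGURATION '{configuration}' TEXT ANALYSIS ON"
    )


def results_query_sql(index):
    """Build a query on the $TA_ table that holds the extracted entities."""
    _check_identifier(index)
    return f'SELECT TA_TOKEN, TA_TYPE FROM "$TA_{index}"'


# Hypothetical table CUSTOMER_LETTERS with a text column CONTENT.
print(create_fulltext_index_sql(
    "IDX_LETTERS", "CUSTOMER_LETTERS", "CONTENT",
    "EXTRACTION_CORE_VOICEOFCUSTOMER",
))
print(results_query_sql("IDX_LETTERS"))
```

The sketch rejects any configuration name outside the delivered set; a custom configuration would need a different check.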

DEMO

Custom Dictionary

• In many use cases you need to extend the dictionary to cover your business domain

• Structure of a dictionary

© SAP SE
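As a rough illustration of the dictionary structure, the following Python sketch generates a small dictionary file. The element and attribute names (`dictionary`, `entity_category`, `entity_name` with `standard_form`, `variant`) and the namespace are given as I recall them from the Extraction Customization Guide and should be treated as assumptions to verify there. The category name and the CrossFit terms are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed namespace of the dictionary format; verify against the guide.
TA_NAMESPACE = "http://www.sap.com/ta/4.0"


def build_dictionary(category, entries):
    """Return dictionary XML for one entity category.

    entries maps a standard form to the list of its variants.
    """
    root = ET.Element("dictionary", {"xmlns": TA_NAMESPACE})
    cat = ET.SubElement(root, "entity_category", {"name": category})
    for standard_form, variants in entries.items():
        entity = ET.SubElement(cat, "entity_name", {"standard_form": standard_form})
        for variant in variants:
            ET.SubElement(entity, "variant", {"name": variant})
    return ET.tostring(root, encoding="unicode")


# Hypothetical category and entries, chosen only for illustration.
xml_text = build_dictionary(
    "CROSSFIT_TERM",
    {"Workout of the Day": ["WOD", "workout of the day"]},
)
print(xml_text)
```

The idea is that every variant found in the text is reported under its standard form and under the entity type named by the category.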

Text Analysis with HANA – Workflow of Enhancement

1. Find the extraction configuration that fits your use case best

2. Copy the configuration into the target folder

3. Create a new custom dictionary

4. Reference the dictionary in your configuration copy

5. Recreate the fulltext index using your custom configuration

DEMO

Text Analysis with HANA – What’s next?

• Assume that we are in an “industry”-specific context or mining for “slang”-like facts and entities

• Sports are a good example of this!

• We use the example of CrossFit® … as there are some funny facts to extract

• Question: How can we extract complex entities from a text?

• Examples:

- Did somebody attend a CrossFit training?

- Does somebody want to join a CrossFit box?

Text Analysis with HANA – Text Analysis Extraction Rules

• Extraction rules (CGUL rules): pattern-based language for pattern matching using character or token-based regular expressions combined with linguistic attributes to define custom entity types.

• Goal of the rule sets:

- Extract complex facts based on relations between entities and predicates.

- Identify entities in domain-specific language and capture facts expressed in new, popular “slang”.
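The idea behind such rules, matching a pattern over tokens and their linguistic attributes rather than over raw characters, can be sketched in plain Python. This is not CGUL syntax and not the HANA rule engine: the `Token` type, the hand-assigned stems and part-of-speech tags, and the rule itself are all invented for this illustration.

```python
from typing import Callable, List, NamedTuple


class Token(NamedTuple):
    text: str  # surface form as written
    stem: str  # base form (hand-assigned here)
    pos: str   # part-of-speech tag (hand-assigned here)


Predicate = Callable[[Token], bool]


def match_rule(tokens: List[Token], pattern: List[Predicate]) -> List[str]:
    """Return the text of every token window that satisfies the pattern."""
    hits = []
    width = len(pattern)
    for start in range(len(tokens) - width + 1):
        window = tokens[start:start + width]
        if all(pred(tok) for pred, tok in zip(pattern, window)):
            hits.append(" ".join(tok.text for tok in window))
    return hits


# Hypothetical rule: any form of "attend", a determiner,
# the word "CrossFit", then "training" or "class".
ATTENDED_TRAINING = [
    lambda t: t.stem == "attend",
    lambda t: t.pos == "DET",
    lambda t: t.text.lower() == "crossfit",
    lambda t: t.stem in {"training", "class"},
]

sentence = [
    Token("Yesterday", "yesterday", "ADV"),
    Token("I", "i", "PRON"),
    Token("attended", "attend", "VERB"),
    Token("a", "a", "DET"),
    Token("CrossFit", "crossfit", "NOUN"),
    Token("training", "training", "NOUN"),
]

print(match_rule(sentence, ATTENDED_TRAINING))
```

Because the first predicate tests the stem, the same rule would also match "attends" or "attending", which a plain character-based pattern would have to spell out.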

Extraction Rule: Regular Expressions, Tokens, Dictionaries, Luck

DEMO

Text Analysis with HANA – “Lessons Learned”

• Text Analysis on SAP HANA is extremely powerful

• Besides the delivered content, you have a lot of options to adapt the text analysis to extract the entities and facts that you need

• This also means you have a lot of options that you can set the wrong way

• Since SP09, rules get compiled upon activation (no separate compilation necessary)

• The documentation is mostly OK but has room for improvement when it comes to extraction rules

• Creating custom dictionaries and text rules is cumbersome; finding an error (e.g. a typo) is hell

- No support in the IDE

- You can usually activate all objects and create the index … but the index remains empty

Q&A

Dr. Christian Lechner

Principal IT Consultant

+49 (0) 171 7617190

[email protected]

http://scn.sap.com/people/christian.lechner

@lechnerc77

Text Analysis with HANA – Resources

• SAP HANA Search Developer Guide (Fulltext Index Options): help.sap.com -> Search Developer Guide

• SAP HANA Text Analysis Developer Guide: help.sap.com -> TA Developer Guide

• SAP HANA Text Analysis Language Reference Guide: help.sap.com -> TA Language Reference Guide

• SAP HANA Text Analysis Extraction Customization Guide: help.sap.com -> TA Extraction Customization Guide

• YouTube Playlist of SAP HANA Academy: Text Analysis and Search