.consulting .solutions .partnership
Text Analysis with SAP HANA
October 2015 | Text Analysis with SAP HANA - SAP Inside Track Munich
1. Motivation
2. Text Analysis with SAP HANA
3. Enhancement Options - Dictionaries and Rules
1. Motivation
Why do we need Text Analysis?
• According to Merrill Lynch, 80-90% of all potentially usable business information may originate in unstructured form (Structure, Models and Meaning: Is "unstructured" data merely unmodeled?, Intelligent Enterprise, March 1, 2005.)
• The data might originate from:
- Social networks
- “Letters” from customers
- ...
• What is the problem with unstructured data?
• It is unstructured!
- Not organized
- No predefined data model
- No metadata, or a mix of data and metadata
We have a lot of information that is relevant for the business, but we cannot access it
How can we solve that issue?
• Text Analysis: extracting high-quality information from texts
• Typical process of a text analysis:
- Parsing of the text
- Adding features like linguistic information
- Entity recognition: Is it an organization, a person, or a place? This includes domain facts like requests.
- Sentiment analysis: What attitudinal information is “hidden” in the text?
- Insertion of the information into the database in a structured manner
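To make this sequence concrete, here is a toy Python sketch of the same steps: tokenize, look up entities and sentiment words in small dictionaries, and emit structured rows. It is not how SAP HANA implements text analysis; the dictionaries, type names, and the `analyze` function are hypothetical and only illustrate the flow from text to rows.

```python
import re

# Hypothetical mini-resources; in SAP HANA this knowledge comes from the
# delivered (or custom) dictionaries and extraction configurations.
ENTITY_DICTIONARY = {"sap": "ORGANIZATION", "munich": "LOCALITY"}
SENTIMENT_LEXICON = {"great": "PositiveSentiment", "slow": "NegativeSentiment"}


def analyze(doc_id, text):
    """Return (doc_id, position, token, type) rows for recognized tokens."""
    rows = []
    # Parsing: split the text into word tokens.
    tokens = re.findall(r"\w+", text)
    for position, token in enumerate(tokens):
        # Adding features: a lowercase normal form as a stand-in for linguistic info.
        normalized = token.lower()
        # Entity recognition and sentiment analysis by simple dictionary lookup.
        ta_type = ENTITY_DICTIONARY.get(normalized) or SENTIMENT_LEXICON.get(normalized)
        if ta_type is not None:
            # Insertion: a structured row, ready to be written to a table.
            rows.append((doc_id, position, token, ta_type))
    return rows


if __name__ == "__main__":
    for row in analyze(1, "The SAP Inside Track in Munich was great"):
        print(row)
```

Real analyzers work on lemmas, part-of-speech tags, and multi-word expressions rather than single lowercase tokens; that linguistic layer is what the delivered configurations add.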
2. Text Analysis with SAP HANA
What does this have to do with SAP HANA?
© SAP SE
Fulltext Index - Basics
• Starting point: a database table containing the text (types like TEXT, NVARCHAR, BLOB …)
• Create a fulltext index including its options (see system view SYS.FULLTEXT_INDEXES)
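A minimal sketch of the statement behind the second bullet, generated from Python. It assumes the SAP HANA SQL form `CREATE FULLTEXT INDEX ... CONFIGURATION '...' TEXT ANALYSIS ON` and an `INDEX_NAME` column in SYS.FULLTEXT_INDEXES; the schema, table, column, and index names and the helper functions are hypothetical, so check the exact options in the Search Developer Guide.

```python
def quote_ident(name: str) -> str:
    """Quote an SQL identifier, doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value: str) -> str:
    """Quote an SQL string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_fulltext_index_sql(schema: str, table: str, column: str,
                             index: str, configuration: str) -> str:
    """CREATE FULLTEXT INDEX statement with text analysis switched on."""
    return (
        f"CREATE FULLTEXT INDEX {quote_ident(index)} "
        f"ON {quote_ident(schema)}.{quote_ident(table)} ({quote_ident(column)}) "
        f"CONFIGURATION {quote_literal(configuration)} "
        "TEXT ANALYSIS ON"
    )


def build_index_lookup_sql(index: str) -> str:
    """Query the options recorded for the index in SYS.FULLTEXT_INDEXES."""
    return f"SELECT * FROM SYS.FULLTEXT_INDEXES WHERE INDEX_NAME = {quote_literal(index)}"


if __name__ == "__main__":
    # Hypothetical names; EXTRACTION_CORE is one of the delivered configurations.
    print(build_fulltext_index_sql("DEMO", "FEEDBACK", "TEXT",
                                   "IDX_FEEDBACK", "EXTRACTION_CORE") + ";")
    print(build_index_lookup_sql("IDX_FEEDBACK") + ";")
```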
Entity Extraction
• To get valuable information out of the data, SAP delivers several predefined configurations
• Each configuration focuses on entity and fact extraction for a specific purpose
• Types of extraction:
- EXTRACTION_CORE
- EXTRACTION_CORE_ENTERPRISE
- EXTRACTION_CORE_PUBLIC_SECTOR
- EXTRACTION_CORE_VOICEOFCUSTOMER
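Whichever configuration is chosen, SAP HANA writes the extracted entities into a separate table named `$TA_<index name>`, with the entity text in `TA_TOKEN` and its category in `TA_TYPE`. Under that assumption, the sketch below builds a query that counts entities per type, and shows the same aggregation on rows already fetched into Python. The function names, schema and index names, and the sample rows are hypothetical.

```python
from collections import Counter


def quote_ident(name: str) -> str:
    """Quote an SQL identifier, doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_entity_summary_sql(schema: str, index: str) -> str:
    """Count extracted entities per type in the $TA_ table of a fulltext index."""
    ta_table = f"{quote_ident(schema)}.{quote_ident('$TA_' + index)}"
    return (f"SELECT TA_TYPE, COUNT(*) AS CNT FROM {ta_table} "
            "GROUP BY TA_TYPE ORDER BY CNT DESC")


def summarize_types(rows):
    """Same aggregation client-side, for (TA_TOKEN, TA_TYPE) rows already fetched."""
    return Counter(ta_type for _token, ta_type in rows).most_common()


if __name__ == "__main__":
    print(build_entity_summary_sql("DEMO", "IDX_FEEDBACK"))
    # Hypothetical rows, shaped like a fetch of TA_TOKEN, TA_TYPE from the $TA_ table.
    sample = [("SAP", "ORGANIZATION/COMMERCIAL"),
              ("Munich", "LOCALITY"),
              ("Walldorf", "LOCALITY")]
    print(summarize_types(sample))
```

Comparing these counts across the four configurations on the same texts is a quick way to see which one fits a use case.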
DEMO
3. Enhancement Options - Dictionaries and Rules
Custom Dictionary
• In many use cases you need to extend the dictionary with terms from your business domain
• Structure of a dictionary
© SAP SE
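The dictionary structure can be sketched by generating a custom dictionary document in Python. The layout assumed here follows the SAP HANA Text Analysis Extraction Customization Guide as we understand it: a `dictionary` root in the `http://www.sap.com/ta/4.0` namespace, `entity_category` elements, and `entity_name` entries with a `standard_form` plus alternative `variant` spellings. Verify element names and namespace against the guide for your release; the `CROSSFIT_TERM` category, its terms, and `build_dictionary` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace of the text analysis dictionary format (assumption; verify for your release).
TA_NS = "http://www.sap.com/ta/4.0"


def build_dictionary(category: str, entries: dict) -> str:
    """Serialize {standard_form: [variants]} as a custom dictionary XML document."""
    # The namespace is written as a plain xmlns attribute on the root element.
    root = ET.Element("dictionary", xmlns=TA_NS)
    cat = ET.SubElement(root, "entity_category", name=category)
    for standard_form, variants in entries.items():
        entity = ET.SubElement(cat, "entity_name", standard_form=standard_form)
        for variant in variants:
            ET.SubElement(entity, "variant", name=variant)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical category and terms from the CrossFit domain.
    print(build_dictionary("CROSSFIT_TERM",
                           {"Workout of the Day": ["WOD", "workout of the day"]}))
```

Keeping the terms in a Python dict and generating the XML avoids hand-edited typos, which are hard to track down in dictionary files.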
Text Analysis with HANA – Workflow of Enhancement
1. Find the extraction configuration that best fits your use case
2. Copy the configuration into the target folder
3. Create a new custom dictionary
4. Reference the dictionary in your configuration copy
5. Recreate the fulltext index using your custom configuration
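Steps 2 to 4 happen in the repository; step 5 is plain SQL. A minimal sketch of step 5, assuming the copied configuration is referenced by a package-qualified name of the form `<package>::<name>` (an assumption; check the TA Developer Guide). The package `demo.ta`, the name `CROSSFIT_CONFIG`, the table and index names, and the helper function are hypothetical. The final count query on the `$TA_` table is a cheap check for an index that activates fine but stays empty.

```python
def quote_ident(name: str) -> str:
    """Quote an SQL identifier, doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value: str) -> str:
    """Quote an SQL string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def recreate_index_statements(schema: str, table: str, column: str,
                              index: str, config_ref: str) -> list:
    """Drop the index, create it with the custom configuration, then count results."""
    ta_table = f"{quote_ident(schema)}.{quote_ident('$TA_' + index)}"
    create_sql = (
        f"CREATE FULLTEXT INDEX {quote_ident(index)} "
        f"ON {quote_ident(schema)}.{quote_ident(table)} ({quote_ident(column)}) "
        f"CONFIGURATION {quote_literal(config_ref)} TEXT ANALYSIS ON"
    )
    return [
        f"DROP FULLTEXT INDEX {quote_ident(index)}",
        create_sql,
        # A count of zero means the custom configuration extracted nothing.
        f"SELECT COUNT(*) FROM {ta_table}",
    ]


if __name__ == "__main__":
    for stmt in recreate_index_statements("DEMO", "FEEDBACK", "TEXT", "IDX_FEEDBACK",
                                          "demo.ta::CROSSFIT_CONFIG"):
        print(stmt + ";")
```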
DEMO
Text Analysis with HANA – What’s next?
• Assume that we are in an “industry”-specific context or mining for “slang”-like facts and entities
• Sports are a good example of this!
• We use the example of CrossFit® … as there are some funny facts to extract
• Question: How can we extract complex entities from a text?
• Examples:
- Did somebody attend a CrossFit training?
- Does somebody want to join a CrossFit box?
Text Analysis with HANA – Text Analysis Extraction Rules
• Extraction rules (CGUL rules, Custom Grouper User Language): a pattern-based language that uses character- or token-based regular expressions combined with linguistic attributes to define custom entity types.
• Goals of the rule sets:
- Extract complex facts based on relations between entities and predicates
- Identify entities in domain-specific language and capture facts expressed in new, popular “slang”
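CGUL syntax itself is defined in the TA Language Reference Guide and is not reproduced here. As a language-neutral illustration of the core idea (token patterns combined with linguistic attributes), the Python toy below matches a sequence of token specifications against tokens that carry a lemma and a part-of-speech tag, using the CrossFit question "Did somebody attend a CrossFit training?". The `Token` and `TokenSpec` types, the tag names, and the hand-tagged sentence are hypothetical; this is not the CGUL engine.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Token:
    surface: str  # text as written
    lemma: str    # base form, e.g. "attended" -> "attend"
    pos: str      # part-of-speech tag (hypothetical tag set)


@dataclass(frozen=True)
class TokenSpec:
    lemma: Optional[str] = None  # None means "any lemma"
    pos: Optional[str] = None    # None means "any part of speech"
    optional: bool = False       # element may be skipped, like "?" in a regex

    def matches(self, tok: Token) -> bool:
        return ((self.lemma is None or tok.lemma == self.lemma)
                and (self.pos is None or tok.pos == self.pos))


def match_at(tokens: List[Token], start: int, pattern: List[TokenSpec]) -> Optional[int]:
    """Return the end index if the pattern matches at start, else None."""
    if not pattern:
        return start
    spec, rest = pattern[0], pattern[1:]
    if start < len(tokens) and spec.matches(tokens[start]):
        end = match_at(tokens, start + 1, rest)
        if end is not None:
            return end
    if spec.optional:
        return match_at(tokens, start, rest)
    return None


def find_facts(tokens: List[Token], pattern: List[TokenSpec]) -> List[str]:
    """Scan left to right and return the surface text of non-overlapping matches."""
    facts, i = [], 0
    while i < len(tokens):
        end = match_at(tokens, i, pattern)
        if end is not None and end > i:
            facts.append(" ".join(t.surface for t in tokens[i:end]))
            i = end
        else:
            i += 1
    return facts


# Hypothetical fact: "somebody attends a CrossFit training".
ATTEND_TRAINING = [
    TokenSpec(lemma="attend", pos="VERB"),
    TokenSpec(pos="DET", optional=True),
    TokenSpec(lemma="crossfit"),
    TokenSpec(lemma="training", pos="NOUN"),
]

if __name__ == "__main__":
    # Hand-tagged tokens; in SAP HANA the linguistic analysis supplies these attributes.
    sentence = [Token("She", "she", "PRON"), Token("attended", "attend", "VERB"),
                Token("a", "a", "DET"), Token("CrossFit", "crossfit", "PROPN"),
                Token("training", "training", "NOUN")]
    print(find_facts(sentence, ATTEND_TRAINING))
```

Matching on the lemma `attend` covers "attend", "attends", and "attended" with one pattern element, whereas a purely character-based regular expression would have to spell out every inflection.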
Text Analysis with HANA – Text Analysis Extraction Rules
Figure: Extraction Rule (regular expressions, tokens, dictionaries)
DEMO
Text Analysis with HANA – “Lessons Learned”
• Text Analysis on SAP HANA is extremely powerful
• Besides the delivered content, you have a lot of options to adapt the text analysis so that it extracts the entities and facts you need
• This also means there are a lot of options you can set the wrong way
• Since SPS09, rules are compiled upon activation (no separate compilation step necessary)
• The documentation is mostly OK, but has room for improvement regarding extraction rules
• Creating custom dictionaries and text rules is cumbersome, and finding an error (e.g. a typo) is hell:
- No support in the IDE
- You can usually activate all objects, create the index … but the index remains empty
Q&A
Dr. Christian Lechner
Principal IT Consultant
+49 (0) 171 7617190
http://scn.sap.com/people/christian.lechner
@lechnerc77
Text Analysis with HANA – Resources
• SAP HANA Search Developer Guide (Fulltext Index Options): help.sap.com -> Search Developer Guide
• SAP HANA Text Analysis Developer Guide: help.sap.com -> TA Developer Guide
• SAP HANA Text Analysis Language Reference Guide: help.sap.com -> TA Language Reference Guide
• SAP HANA Text Analysis Extraction Customization Guide: help.sap.com -> TA Extraction Customization Guide
• YouTube playlist of SAP HANA Academy: Text Analysis and Search