finding the needle in an unstructured big data …...unstructured data more prevalent (computer...

16
#AnalyticsX Copyright © 2016, SAS Institute Inc. All rights reserved. Finding the Needle in an Unstructured Big Data Haystack Feng Ye Senior Software Developer Teradata

Upload: others

Post on 22-May-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

#AnalyticsXC o p y r ig ht © 201 6, SAS In st i tute In c. A l l r ig hts r ese rve d.

Finding the Needle in an Unstructured Big Data Haystack

Feng YeSenior Software DeveloperTeradata

C o p y r ig ht © 201 6, SAS In st i tute In c. A l l r ig hts r ese rve d.

#analyticsx

C o p y r ig ht © 201 6, SAS In st i tute In c. A l l r ig hts r ese rve d.

Why search is relevant?

Information age with exploding amount of data

» Business data doubles every 1.2 years per SanDisk

Unstructured data more prevalent (Computer World: 80% of all data)

Critical step to filter out noise, gain insights and enable analytics

SAS DNA

Every SAS solution will include a search capability

Highly distributed solution to index and search business data

Viya Search

C o p y r ig ht © 201 6, SAS In st i tute In c. A l l r ig hts r ese rve d.

#analyticsx

C o p y r ig ht © 201 6, SAS In st i tute In c. A l l r ig hts r ese rve d.

Leveraging Viya Capabilities Cloud-ready, elastic and scalable

Open analytics coding environment (Java, Python, LUA, SAS)

Fast, distributed in-memory processing

Resilient architecture with guaranteed failover

Out of box in Viya, no need for separate install

C o p y r ig ht © 201 6, SAS In st i tute In c. A l l r ig hts r ese rve d.

#analyticsx

C o p y r ig ht © 201 6, SAS In st i tute In c. A l l r ig hts r ese rve d.

Architecture Diagram

C o p y r ig ht © 201 6, SAS In st i tute In c. A l l r ig hts r ese rve d.

#analyticsx

C o p y r ig ht © 201 6, SAS In st i tute In c. A l l r ig hts r ese rve d.

Flexible Search Index Easy to Populate Records/Documents of Various Fields

from database records and observation tables

from structured data like HTML, XML - tags suggest fields

from unstructured data - free text fills long fields

from data mining output - fields take entities of many types

title body_text date facets etc

C o p y r ig ht © 201 6, SAS In st i tute In c. A l l r ig hts r ese rve d.

#analyticsx

C o p y r ig ht © 201 6, SAS In st i tute In c. A l l r ig hts r ese rve d.

Top Features High performance cloud based indexing/search over big data

Support for both unstructured (text) and structured data

Full text searching: tokenized string field search with wildcard and proximity support or exact field match

Multi-language support (30+)

Flexible query: nested Boolean logic (AND, OR and NOT) and range filters

Flexible/high performance aggregations: metric and bucket based aggregations

Geo support: data tied to location; filtering based on geo shapes

Incremental indexing: support streaming data input

C o p y r ig ht © 201 6, SAS In st i tute In c. A l l r ig hts r ese rve d.

#analyticsx

C o p y r ig ht © 201 6, SAS In st i tute In c. A l l r ig hts r ese rve d.

Key Advantages Powerful aggregation: post-search statistics over user-defined

record groupings

Multiple metrics per aggregation as opposed to one per aggregation

Terms aggregation over multiple fields in one shot

Sub-second response time of such aggregation for 30 fields over a million documents, on shared cloud

search group metricsmean, sum

mean, sum

metricssearch group

C o p y r ig ht © 201 6, SAS In st i tute In c. A l l r ig hts r ese rve d.

#analyticsx

C o p y r ig ht © 201 6, SAS In st i tute In c. A l l r ig hts r ese rve d.

Key Advantages (cont’d) Unified search query and filters

Easy integration with other SAS analytics capabilities

Text mining, categorization, sentiment analysis

Recommendation on top of search

Comprehensive support for 30 languages

Committed to backward compatibility (SAS standard)

C o p y r ig ht © 201 6, SAS In st i tute In c. A l l r ig hts r ese rve d.

#analyticsx

C o p y r ig ht © 201 6, SAS In st i tute In c. A l l r ig hts r ese rve d.

Geo-aware Search Data includes geo location information

Geo filters for various shapes

Rectangle

Circle

Polygon (City, County, State, Country)

Geo aggregations

Geo Distance (circle based)

Geo Bounds

C o p y r ig ht © 201 6, SAS In st i tute In c. A l l r ig hts r ese rve d.

#analyticsx

C o p y r ig ht © 201 6, SAS In st i tute In c. A l l r ig hts r ese rve d.

Timeline Time based aggregation/chronicle trend

Drill down/zoom in

C o p y r ig ht © 201 6, SAS In st i tute In c. A l l r ig hts r ese rve d.

#analyticsx

C o p y r ig ht © 201 6, SAS In st i tute In c. A l l r ig hts r ese rve d.

Facet Search/Aggregation Report counts for multiple facets in one shot

High performance facet report calculation

Multiple valued fields support

C o p y r ig ht © 201 6, SAS In st i tute In c. A l l r ig hts r ese rve d.

#analyticsx

C o p y r ig ht © 201 6, SAS In st i tute In c. A l l r ig hts r ese rve d.

Text Analytics Flow

C o p y r ig ht © 201 6, SAS In st i tute In c. A l l r ig hts r ese rve d.

#analyticsx

C o p y r ig ht © 201 6, SAS In st i tute In c. A l l r ig hts r ese rve d.

A3S integration: REST API

Dynamic schema support

Nested document search and aggregation

Future

C o p y r ig ht © 201 6, SAS In st i tute In c. A l l r ig hts r ese rve d.

#analyticsx

C o p y r ig ht © 201 6, SAS In st i tute In c. A l l r ig hts r ese rve d.

Demo

C o p y r ig ht © 201 6, SAS In st i tute In c. A l l r ig hts r ese rve d.

#analyticsx

C o p y r ig ht © 201 6, SAS In st i tute In c. A l l r ig hts r ese rve d.

Viya Search offers flexible and powerful search and aggregation capabilities

It provides geo awareness and enables timeline drawing

Performance/scalability and fault tolerance

Take Aways

C o p y r ig ht © 201 6, SAS In st i tute In c. A l l r ig hts r ese rve d.

#AnalyticsX