Spark and Elasticsearch for real-time data analysis

Costin Leau, @costinl, Elastic

Posted on 02-Jul-2018

What is Elasticsearch?

Scalable, real-time search and analytics engine

Open source (on GitHub, Apache 2 License)

Unstructured search

Sorting

Pagination

Enrichment

Suggestions

Structured search

Aggregations
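
The features listed above can all appear in a single search request body. The following is a minimal sketch of such a body, built as a plain Python dictionary. The field names (message, level, timestamp) and the helper name build_search_body are hypothetical and chosen only for illustration; they do not come from the slides.

```python
import json


def build_search_body(text, page=0, page_size=10):
    """Build a search request body combining the features listed above.

    The field names (message, level, timestamp) are hypothetical.
    """
    return {
        # Unstructured (full-text) search combined with a structured filter
        "query": {
            "bool": {
                "must": [{"match": {"message": text}}],
                "filter": [{"term": {"level": "error"}}],
            }
        },
        # Sorting: newest documents first
        "sort": [{"timestamp": {"order": "desc"}}],
        # Pagination: skip `from` hits, then return `size` hits
        "from": page * page_size,
        "size": page_size,
        # Aggregations: count the matching documents per level
        "aggs": {"per_level": {"terms": {"field": "level"}}},
    }


body = build_search_body("connection timeout", page=2)
print(json.dumps(body, indent=2))
```

The printed JSON is what a client would send as the body of a search request; only the index it targets is left out here.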

Elasticsearch for Apache Hadoop

Map/Reduce integration

Scala API

Java API

Spark SQL support

Spark SQL Data Sources
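
As a rough illustration of the Spark SQL Data Sources integration, the sketch below collects connector settings and loads an index as a DataFrame from PySpark. It assumes an existing SparkSession with the elasticsearch-hadoop connector on its classpath; the helper names es_options and read_index, the host localhost, and the query string are assumptions made for the example.

```python
# Data source name registered by the elasticsearch-hadoop connector
ES_SPARK_SQL_FORMAT = "org.elasticsearch.spark.sql"


def es_options(nodes, port=9200, query=None):
    """Collect connector settings as string options."""
    options = {"es.nodes": nodes, "es.port": str(port)}
    if query is not None:
        # Optional query pushed down to Elasticsearch
        options["es.query"] = query
    return options


def read_index(spark, index, options):
    """Load an Elasticsearch index as a DataFrame.

    `spark` is assumed to be an existing SparkSession that has the
    elasticsearch-hadoop connector jar available.
    """
    return spark.read.format(ES_SPARK_SQL_FORMAT).options(**options).load(index)


# Hypothetical host and query, for illustration only
opts = es_options("localhost", query="?q=level:error")
print(opts)
```

Calling read_index(spark, "logs", opts) would then return a DataFrame for a hypothetical index named logs, which can be queried with Spark SQL like any other data source.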

Partition-to-Partition Architecture

Dynamic Runtime Matching

Failure Handling

Co-location
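
The partition-to-partition idea with co-location can be sketched as follows. This is not the connector's scheduling code, only an illustration under two assumptions: each shard becomes one partition, and a worker on the same host as the shard is preferred when one exists. The Shard type, the host names, and plan_partitions are hypothetical.

```python
from collections import namedtuple

# Hypothetical description of a shard and the host that stores it
Shard = namedtuple("Shard", ["index", "shard_id", "node"])


def plan_partitions(shards, workers):
    """Create one partition per shard and pick a preferred worker for it.

    A worker on the same host as the shard is preferred (co-location);
    otherwise the remaining shards are spread over workers round-robin.
    """
    plan = []
    fallback = 0
    for shard in shards:
        if shard.node in workers:
            worker = shard.node
        else:
            worker = workers[fallback % len(workers)]
            fallback += 1
        plan.append((shard, worker))
    return plan


shards = [
    Shard("logs", 0, "host-a"),
    Shard("logs", 1, "host-b"),
    Shard("logs", 2, "host-c"),
]
workers = ["host-a", "host-b"]

for shard, worker in plan_partitions(shards, workers):
    locality = "local" if worker == shard.node else "remote"
    print(f"{shard.index}[{shard.shard_id}] -> {worker} ({locality})")
```

Shards whose host also runs a worker are read locally; the shard on host-c has no co-located worker and falls back to a remote read.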

Reacting to streaming data

Live loops

Data keeps on changing

Adapt the set of rules

Improves reaction time

Build a model for fast decision making

Keeps the prevention rate high

Categorize data on the fly
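
One simple way to picture a rule that adapts as data keeps changing is a threshold derived from running statistics. The slides do not say which technique is used, so the sketch below is a stand-in: it keeps a running mean and variance (Welford's algorithm) and labels a value as suspicious when it lies more than k standard deviations above the mean. The class name, the labels, and the sample stream are hypothetical.

```python
import math


class AdaptiveThreshold:
    """Flag values far above the running mean; the rule adapts as data arrives."""

    def __init__(self, k=3.0, warmup=5):
        self.k = k            # how many standard deviations count as unusual
        self.warmup = warmup  # observations required before flagging anything
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0         # running sum of squared deviations (Welford)

    def categorize(self, value):
        """Label the value, then fold it into the running statistics."""
        label = "normal"
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            if value > self.mean + self.k * std:
                label = "suspicious"
        # Welford's online update of mean and squared deviations
        self.n += 1
        delta = value - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (value - self.mean)
        return label


rule = AdaptiveThreshold(k=3.0, warmup=5)
stream = [10, 12, 11, 9, 10, 11, 50, 10]
labels = [rule.categorize(v) for v in stream]
print(list(zip(stream, labels)))
```

Each event is categorized in constant time and memory, which is what makes this kind of rule usable on a stream. Note that this sketch also folds flagged values into the statistics; a real system might choose to exclude them.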

Finding interesting data: basic approach

Finding interesting data: analytics

Finding interesting data: through an ML model

MLlib integration (work in progress)

Hashing and featurize functions

Expose the Elasticsearch engine data structures

term vectors

term frequency

document frequency

(vectorize API in the works)
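
To make the terms above concrete, the sketch below computes term vectors, term frequency, and document frequency for a few tiny documents, then hashes the terms into a fixed-size feature vector. It is plain Python and does not use the Elasticsearch or MLlib APIs; the function names, the sample documents, the CRC32 hash, and the smoothed IDF formula are assumptions made for the example.

```python
import math
import zlib
from collections import Counter


def term_frequencies(text):
    """Term vector for one document: term -> number of occurrences."""
    return Counter(text.lower().split())


def document_frequencies(term_vectors):
    """Term -> number of documents that contain it."""
    df = Counter()
    for tv in term_vectors:
        df.update(tv.keys())
    return df


def hashed_tfidf(tv, df, n_docs, n_features=16):
    """Hash each term into a fixed-size vector, weighted by TF-IDF."""
    vec = [0.0] * n_features
    for term, tf in tv.items():
        # Smoothed inverse document frequency (one common form)
        idf = math.log((n_docs + 1) / (df[term] + 1))
        # CRC32 gives a hash that is stable across runs
        bucket = zlib.crc32(term.encode("utf-8")) % n_features
        vec[bucket] += tf * idf
    return vec


docs = [
    "spark reads from elasticsearch",
    "elasticsearch indexes data",
    "spark streams data",
]
tvs = [term_frequencies(d) for d in docs]
df = document_frequencies(tvs)
features = [hashed_tfidf(tv, df, len(docs)) for tv in tvs]

print(df)
print(features[0])
```

Hashing keeps the vector size fixed no matter how large the vocabulary grows, at the cost of occasional collisions where two terms share a bucket.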

Thank you!

@costinl github.com/elastic

elastic.co