How to Integrate Splunk with Any Data Solution

Post on 26-May-2015

Category:

Technology

DESCRIPTION

A presentation Julian Hyde gave to the Splunk 2012 User conference in Las Vegas, Tue 2012/9/11. Julian demonstrated a new technology called Optiq, described how it could be used to integrate data in Splunk with other systems, and demonstrated several queries accessing data in Splunk via SQL and JDBC.

TRANSCRIPT

Copyright © 2012 Splunk Inc.

How to Integrate Splunk with any Data Solution

Julian Hyde (Optiq) @julianhyde

http://github.com/julianhyde/optiq

http://github.com/julianhyde/optiq-splunk

Splunk Worldwide Users' Conference 2012

Why are we here?

I'm going to explain how to use Splunk to access all of the data in your enterprise.

And also to let people in your enterprise use data in Splunk.

This isn't easy. We'll be showing some raw technology – the new Optiq project and its Splunk adapter.

But it's open source, so you can all get your hands on it. :)

About me

Database hacker

Open source hacker

Author of Mondrian (Pentaho Analysis)

Startup fiend

http://www.flickr.com/photos/torkildr/3462606643

http://www.flickr.com/photos/sylvar/31436961/

"Big Data"

Right data, right time

Diverse data sources / Performance / Suitable format

Example

Accessing Splunk data via SQL

Sqlline (a standard JDBC client)

How to do it (wrong)

SELECT "source", "product_id"
FROM "splunk"."splunk"
WHERE "action" = 'purchase'

Splunk: "search"

Optiq: filter (action = 'purchase')

How to do it (right)

SELECT "source", "product_id"
FROM "splunk"."splunk"
WHERE "action" = 'purchase'

Splunk: "search action=purchase"

Optiq: no separate filter step; the condition is part of the Splunk search
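The difference between the two slides can be sketched in a few lines of Python. This is only an illustration, not the Optiq or Splunk API: `EVENTS`, `run_search`, `query_wrong` and `query_right` are made-up names, and the toy search parser only understands `field=value` terms without spaces.

```python
# Hypothetical in-memory stand-in for events stored in Splunk.
EVENTS = [
    {"source": "web1.log", "product_id": "p1", "action": "purchase"},
    {"source": "web1.log", "product_id": "p2", "action": "view"},
    {"source": "web2.log", "product_id": "p1", "action": "purchase"},
    {"source": "web2.log", "product_id": "p3", "action": "addtocart"},
]


def run_search(search):
    """Toy search engine: 'search' followed by optional field=value terms."""
    terms = search.split()[1:]  # drop the leading 'search' keyword
    wanted = dict(term.split("=", 1) for term in terms)
    return [e for e in EVENTS if all(e.get(k) == v for k, v in wanted.items())]


def query_wrong(columns, filters):
    """Fetch every event, then apply the filter on the client side."""
    fetched = run_search("search")
    rows = [e for e in fetched if all(e.get(k) == v for k, v in filters.items())]
    return [tuple(e[c] for c in columns) for e in rows], len(fetched)


def query_right(columns, filters):
    """Fold the filter into the search string so the source does the filtering."""
    search = " ".join(["search"] + [f"{k}={v}" for k, v in filters.items()])
    fetched = run_search(search)
    return [tuple(e[c] for c in columns) for e in fetched], len(fetched)


rows_wrong, fetched_wrong = query_wrong(["source", "product_id"], {"action": "purchase"})
rows_right, fetched_right = query_right(["source", "product_id"], {"action": "purchase"})
```

Both functions return the same rows; the second value in each result counts how many events had to leave the toy source, which is where the two approaches differ.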

Example #2

Combining data from 2 sources (Splunk & MySQL)

Also possible: 3 or more sources; 3-way joins; unions

Sources: MySQL, Splunk

Expression tree

SELECT p."product_name", COUNT(*) AS c
FROM "splunk"."splunk" AS s
  JOIN "mysql"."products" AS p ON s."product_id" = p."product_id"
WHERE s."action" = 'purchase'
GROUP BY p."product_name"
ORDER BY c DESC

Operators in the tree:

scan (Table: splunk)

scan (Table: products)

join (Key: product_id)

filter (Condition: action = 'purchase')

group (Key: product_name; Agg: count)

sort (Key: c DESC)
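One plausible reading of this tree is to run the operators bottom-up in the order scan, join, filter, group, sort. The sketch below does that over two small in-memory tables. The data and the function names (`scan`, `join`, `filter_rows`, `group_count`, `sort_desc`) are invented for illustration and are not Optiq operators.

```python
from collections import Counter

# Hypothetical sample data standing in for the two sources.
SPLUNK_EVENTS = [
    {"product_id": "p1", "action": "purchase"},
    {"product_id": "p2", "action": "view"},
    {"product_id": "p1", "action": "purchase"},
    {"product_id": "p2", "action": "purchase"},
    {"product_id": "p3", "action": "view"},
]
MYSQL_PRODUCTS = [
    {"product_id": "p1", "product_name": "Widget"},
    {"product_id": "p2", "product_name": "Gadget"},
    {"product_id": "p3", "product_name": "Gizmo"},
]


def scan(table):
    """Read every row of a table."""
    return list(table)


def join(left, right, key):
    """Inner hash join on a single key column."""
    index = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def filter_rows(rows, column, value):
    """Keep rows whose column equals value."""
    return [r for r in rows if r[column] == value]


def group_count(rows, key):
    """GROUP BY key with COUNT(*), as (key, count) pairs."""
    return list(Counter(r[key] for r in rows).items())


def sort_desc(pairs):
    """ORDER BY count DESC."""
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


joined = join(scan(SPLUNK_EVENTS), scan(MYSQL_PRODUCTS), "product_id")
result = sort_desc(group_count(filter_rows(joined, "action", "purchase"), "product_name"))
```

Note that every Splunk event is scanned and joined before the filter discards the non-purchase rows.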

Expression tree (optimized)

SELECT p."product_name", COUNT(*) AS c
FROM "splunk"."splunk" AS s
  JOIN "mysql"."products" AS p ON s."product_id" = p."product_id"
WHERE s."action" = 'purchase'
GROUP BY p."product_name"
ORDER BY c DESC

Operators in the tree:

Splunk: scan (Table: splunk)

MySQL: scan (Table: products)

join (Key: product_id)

filter (Condition: action = 'purchase')

group (Key: product_name; Agg: count)

sort (Key: c DESC)
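A typical optimization for a query like this is to move the filter below the join so it runs next to the scan it applies to, in the same spirit as the earlier "search action=purchase" example. Here is a minimal sketch of such a rewrite rule, assuming an inner join and a filter that refers to a single table. The `Scan`, `Join` and `Filter` classes and `push_filter_below_join` are hypothetical, not Optiq's rule API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scan:
    source: str  # system that owns the table, e.g. "splunk" or "mysql"
    table: str


@dataclass(frozen=True)
class Join:
    left: object
    right: object
    key: str


@dataclass(frozen=True)
class Filter:
    child: object
    table: str   # table whose column the condition refers to
    column: str
    value: str


def tables(node):
    """Set of table names reachable under a plan node."""
    if isinstance(node, Scan):
        return {node.table}
    if isinstance(node, Join):
        return tables(node.left) | tables(node.right)
    return tables(node.child)


def push_filter_below_join(node):
    """Rewrite Filter(Join(a, b)) to Join(Filter(a), b) or Join(a, Filter(b)).

    Only safe for inner joins; any other node is returned unchanged.
    """
    if isinstance(node, Filter) and isinstance(node.child, Join):
        j = node.child
        if node.table in tables(j.left):
            return Join(Filter(j.left, node.table, node.column, node.value), j.right, j.key)
        if node.table in tables(j.right):
            return Join(j.left, Filter(j.right, node.table, node.column, node.value), j.key)
    return node


before = Filter(
    Join(Scan("splunk", "splunk"), Scan("mysql", "products"), "product_id"),
    "splunk", "action", "purchase",
)
after = push_filter_below_join(before)
```

After the rewrite, the filter sits directly on top of the Splunk scan, so an adapter could translate the pair into a single filtered search.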

Optiq is not a database.

http://www.flickr.com/photos/torkildr/3462606643

http://www.flickr.com/photos/telstra-corp/5069403309/

Conventional database architecture

Components: JDBC client; JDBC server; SQL parser / validator; Query optimizer; Metadata; Data-flow operators; Data

Optiq architecture

Components: JDBC client; JDBC server; SQL parser / validator; Query optimizer; Metadata SPI; Pluggable rules; 3rd party ops; 3rd party data

Legend: Core; Pluggable; Optional

What is Optiq?

A really, really smart JDBC driver

Framework

Potential core of a data management system

Writing an adapter

Driver – if you want a vanity URL like "jdbc:splunk:"

Schema – describes what tables exist (Splunk has just one)

Table – what are the columns, and how to get the data. (Splunk's table has any column you like... just ask for it.)

Operators (optional) – non-relational operations

Rules (optional, but recommended) – improve efficiency by changing the question

Parser (optional) – to query via a language other than SQL
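The two required pieces, Schema and Table, can be pictured with a small sketch. It is not the Optiq SPI (which is Java): the class names `SingleTableSchema` and `AnyColumnTable` and their methods are invented here to show the shape of the idea, including a table that accepts any column name you ask for.

```python
class AnyColumnTable:
    """Table that accepts any column name and yields None where an event lacks it."""

    def __init__(self, events):
        self._events = events

    def scan(self, columns):
        """Yield one tuple per event, containing only the requested columns."""
        for event in self._events:
            yield tuple(event.get(c) for c in columns)


class SingleTableSchema:
    """Schema that exposes exactly one table."""

    def __init__(self, name, table):
        self._tables = {name: table}

    def table_names(self):
        return sorted(self._tables)

    def get_table(self, name):
        return self._tables[name]


# Hypothetical sample events.
schema = SingleTableSchema(
    "splunk",
    AnyColumnTable([
        {"source": "a.log", "action": "purchase"},
        {"source": "b.log", "action": "view"},
    ]),
)
rows = list(schema.get_table("splunk").scan(["source", "no_such_field"]))
```

Asking for a column that no event carries is not an error in this sketch; the value simply comes back empty.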

Splunk Adapter

Rules for pushing down filters, projections

The tricky bit: changed the validator to allow tables to have any column

To be written: rules for pushing down aggregations, joins

(What you've seen today is in github.)
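As a rough picture of what pushing down filters and projections could produce, the sketch below builds a search string from a list of equality conditions and a list of columns. The function name `build_search` is hypothetical, values are assumed to contain no spaces or quotes, and the exact string the real adapter generates may differ; the `fields` command is used here as one way Splunk's search language can restrict the returned fields.

```python
def build_search(filters, columns):
    """Build a search string from (field, value) filters and a column list.

    filters: list of (field, value) equality conditions to push down.
    columns: fields the query actually needs; empty means no projection.
    """
    search = " ".join(["search"] + [f"{field}={value}" for field, value in filters])
    if columns:
        search += " | fields " + ", ".join(columns)
    return search


pushed = build_search([("action", "purchase")], ["source", "product_id"])
```

Here `pushed` is the string `search action=purchase | fields source, product_id`, so both the WHERE condition and the SELECT list travel to the source instead of being applied afterwards.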

Would be really nice if... Splunk pushed down filters, projections, aggregations from its search pipeline to the MySQL connector. (Currently you have to hand-write a SQL statement.)

http://www.flickr.com/photos/walkercarpenter/4697637143/

Optiq roadmap ideas

Mondrian: use Optiq to read from data sources such as Splunk

Kettle integration (read/write SQL to ETL)

Adapters: Cascading, MongoDB, HBase, Apache Drill, …?

Front-ends: linq4j, Scala SLICK, Java8 streams

Contributions

Conclusions

Liberate your data!

Optiq is a framework

Build & share Optiq adapters

Questions?

@julianhyde

http://julianhyde.blogspot.com

http://github.com/julianhyde/optiq

http://github.com/julianhyde/optiq-splunk

Additional material: The following queries were used in the demo

select s."source", s."sourcetype" from "splunk"."splunk" as s;

select s."source", s."sourcetype", s."action" from "splunk"."splunk" as s

where s."action" = 'purchase';

select s."action", count(*)

from "splunk"."splunk" as s

group by s."action";

select s."action", s."method", count(*)

from "splunk"."splunk" as s

group by s."action", s."method";

select * from "mysql"."products";

select p."product_name", s."action"

from "splunk"."splunk" as s

join "mysql"."products" as p

on s."product_id" = p."product_id";

select p."product_name", s."action", COUNT(*) AS c

from "splunk"."splunk" AS s

join "mysql"."products" AS p

on s."product_id" = p."product_id"

where s."action" = 'purchase'

group by p."product_name", s."action"

order by c desc;
