Building a data driven search application with LucidWorks SiLK

Confidential and Proprietary © Copyright 2013 Building a Data-Driven Log Application with SILK April 21, 2014 Search | Discover | Analyze

Upload: lucidworks-archived

Post on 27-Jan-2015

Category:

Technology


DESCRIPTION

LucidWorks SiLK is an open source stack that combines Lucene/Solr with best-in-class open source data ingestion and analytics tools such as Flume, LogStash and Kibana. This webinar will explore the features of SiLK and show attendees how they can benefit from the following:

- A powerful UI to analyze time series data stored in Lucene/Solr
- Creating and sharing visualizations, dashboards and reports
- Discovery and analysis of data coming from servers, applications, devices and more
- Exploration of click, geospatial and social data in ways previously unimaginable

TRANSCRIPT

Page 1: Building a data driven search application with LucidWorks SiLK

Building a Data-Driven Log Application with SILK

April 21, 2014

Search | Discover | Analyze

Page 2: Building a data driven search application with LucidWorks SiLK

Agenda

• Introduction to LucidWorks
• The Continuum of Search
• LucidWorks SILK
  – Enabling Big Data Search
  – 360-degree view of customers and systems
  – Breakthrough ROI
• Solution Components
• Demonstration
• Summary and Q&A

Page 3: Building a data driven search application with LucidWorks SiLK

Speakers

• Chief Product Officer at LucidWorks
• 15 years of product, marketing and BD experience
• Prior to LW, 8 years @Splunk (Employee ~9)
• Proud Search Snob

• Leads LucidWorks’ newly created Solutions team
• 16-year track record of data-driven solutions
  – Customer analytics/nano-targeting
  – Improving product development operations
  – Video processing and transmission
• Establishing search as the paradigm for solving the "last mile problem" of big data

Page 4: Building a data driven search application with LucidWorks SiLK

Who is LucidWorks

Commercial entity behind Lucene/Solr, the industry-leading open search engine:

• 300+ enterprise customers
• Consulting, training, SLAs and “Pro-Active Support” for open source

LucidWorks platform provides advanced search capabilities directly on Solr:

• Connectors, entity extraction, security, pipelines, rules and more…

Solutions (e.g. SiLK & LucidWorks App for Splunk) to help streamline use case adoption.

Page 5: Building a data driven search application with LucidWorks SiLK

Continuum of Search

‘Enterprise Search’
• Application Innovation: Intranet Search, Knowledge Base
• Index Characteristics: Gigabyte scale, Single instance, Full-text

‘Intelligent Search’
• Application Innovation: E-Discovery, E-Commerce
• Index Characteristics: Terabyte scale, Cluster-ready, Structured/Unstructured Data, Near real-time

‘Big Data Search’
• Application Innovation: Search on Hadoop, Log Analysis, Fraud Detection
• Index Characteristics: Unlimited scale, Cloud-ready, Handles any data type, Real-time, NoSQL Alternative

Page 6: Building a data driven search application with LucidWorks SiLK

Solr is the Choice

• Creates the data access layer leveraged by best-in-class data-driven applications
• Solr is the choice of those building data-driven applications at massive scale

Page 7: Building a data driven search application with LucidWorks SiLK

Big Data Search

A Big Data Search index:

• Unlimited scale
• Cloud-ready
• Handles any data type
• Real-time
• NoSQL alternative

Creates the data access layer (ad-hoc discovery, personalization, context) that developers & users demand in their Big Data applications

is the partner of choice to deliver next generation search by the leading Big Data vendors

Page 8: Building a data driven search application with LucidWorks SiLK

Big Data Ecosystem WITHOUT LucidWorks Search

Data flow (diagram labels): Input Data Stream; Traditional RDBMS/EDW; Doc Stores; HDFS; NoSQL; Hadoop; Real-time Processing

Platform for Data Storage and Machine Learning

Difficult Getting Value from Data

1. Opaque
2. Narrow views into data
3. Out-of-date
4. Not actionable
5. Accessible mostly to expert users
6. Expensive, ineffective translation to broader set of users

Users (diagram labels): Data Scientist; BI Analyst; IT; Product Managers; Business Users; Rest of Org

Page 9: Building a data driven search application with LucidWorks SiLK

Solving the Last Mile Problem of Big Data

Data flow (diagram labels): Input Data Stream; Traditional RDBMS/EDW; Doc Stores; HDFS; NoSQL; Hadoop; Real-time Processing; Lucene/Solr

Directly Access Data and Insights to Drive Actions:

• Predictive
• Relevant
• Actionable
• Timely

Breakthrough ROI

Page 10: Building a data driven search application with LucidWorks SiLK

Solution Components

• Data Sources: Data Warehouse, Clickstream, Networking, Servers
• Connectors: JDBC Connector, Web/File System Crawl, Hadoop Connectors
• Gateway

Page 11: Building a data driven search application with LucidWorks SiLK

Events from App/Server/Web Logs, etc.

• Application Logs
  – 2013-12-18 01:37:20,637 INFO core.SolrCore - [collection1] webapp= path=/browse params={fl=lucid_facet&facet.query={!tag%3Done_day}dateCreated:[NOW-1DAY/DAY+TO+NOW/DAY]&facet.query={!tag%3Done_year}dateCreated:[NOW-365DAYS/DAY+TO+NOW/DAY]&start=260&q=faceting&f.project.facet.limit=20&role=DEFAULT&req_type=main&hl.simple.post=</span>&facet.field={!ex%3Dsource}source&facet.field={!ex%3Dsource}list_type&facet.field={!ex%3Dsource}issue_status&facet.field={!ex%3Dsource}lucid_facet&facet.field={!ex%3Dproject}project&facet.field={!ex%3Dauthor_display}author_display} hits=6761 status=0 QTime=14
• Firewall Logs
  – Apr 07 2014 10:14:56 eventid='1278457197410173971' severity=severe category="Penetrate/ArpPoisoning" hostId=r signature=3201-2 description="Unix Password File Access Attempt" attacker=110.236.0.15 target=27.96.128.0 target=141.146.8.66 gc_score="-5" gc_riskdelta="3" gc_riskrating="false" gc_deny_packet="true" gc_deny_attacker="false"
• Web Logs
  – 50.17.233.225 - - [09/Mar/2014:06:26:50 -0700] "GET / HTTP/1.1" 200 24442 "-" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.7) Gecko/20060909 Firefox/1.5.0.7"
• Syslogs
  – Apr 17 07:00:42 Lucids-MacBook-Pro-25.local Microsoft Outlook[2461]: CGSCopyDisplayUUID: Invalid display 0x18d88a81
• Other: Database Logs, Click Data, Conversions, Social Media (Tweets…), Financial Data, Product Catalogs, Knowledge Base, etc.
• Volume, Variety and Velocity
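To make the application-log example concrete, here is a minimal Python sketch that pulls the query text, hit count and QTime out of a Solr request-log line. It is not the LogStash/grok configuration used in the webinar; the regular expression is an assumption fitted to the sample line on this slide, the field names are illustrative, and the sample below uses a shortened params string.

```python
import re
from urllib.parse import parse_qs

# Regex fitted to the Solr request-log layout of the sample line on this slide.
# Group names are illustrative, not part of any Solr or LogStash schema.
SOLR_REQUEST = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+)\s+(?P<logger>\S+) - "
    r"\[(?P<collection>[^\]]+)\] webapp=(?P<webapp>\S*) path=(?P<path>\S+) "
    r"params=\{(?P<params>.*)\} hits=(?P<hits>\d+) "
    r"status=(?P<status>\d+) QTime=(?P<qtime>\d+)\s*$"
)


def parse_solr_request(line):
    """Return a dict of fields for a Solr request-log line, or None if it does not match."""
    match = SOLR_REQUEST.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # Numeric fields become ints so they can be filtered and aggregated later.
    for name in ("hits", "status", "qtime"):
        fields[name] = int(fields[name])
    # The user's query text is the q parameter of the request.
    fields["query"] = parse_qs(fields["params"]).get("q", [""])[0]
    return fields


# Shortened version of the slide's sample line (params trimmed for readability).
sample = (
    "2013-12-18 01:37:20,637 INFO core.SolrCore - [collection1] webapp= "
    "path=/browse params={start=260&q=faceting&role=DEFAULT} "
    "hits=6761 status=0 QTime=14"
)
record = parse_solr_request(sample)
print(record["query"], record["hits"], record["qtime"])
```

Lines that do not follow this layout return None, so a real pipeline would need one pattern per log type (firewall, web, syslog).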

Page 12: Building a data driven search application with LucidWorks SiLK

Application Development Process

• Understand your Users
• Know your Data
• Prepare and Ingest Data into Solr
• Build Visualizations
• Iterate

Page 13: Building a data driven search application with LucidWorks SiLK

Search Analytics—Understand your Users

• Who will use this application?
  – Business User (eCommerce or KM), IT and Search Administrators
• What are they interested in?
  – What are people searching for?
  – Which queries are returning zero hits?
  – Which searches are providing slow response times?
  – What is my memory & CPU usage, JVM metrics, etc.?
  – Is there a trend in my slow searches?
  – Is the cache warm-up time very large?
• The first three are of interest to Business Users; Search Admins/Developers are interested in all six.
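The first three questions reduce to simple aggregations over parsed search-log records. The Python sketch below illustrates them on a handful of invented records; the record layout (query, hits, qtime), the 1000 ms slow-query threshold and the data are assumptions for illustration, not output from the webinar's dashboards.

```python
from collections import Counter

# Hypothetical parsed request records; the values are invented for illustration.
records = [
    {"query": "faceting", "hits": 6761, "qtime": 14},
    {"query": "faceting", "hits": 6761, "qtime": 9},
    {"query": "silk dashbord", "hits": 0, "qtime": 3},
    {"query": "join query", "hits": 42, "qtime": 2300},
]


def summarize(records, slow_ms=1000, top_n=3):
    """Answer the three business-user questions from a list of request records."""
    # What are people searching for?
    searched = Counter(r["query"] for r in records)
    # Which queries are returning zero hits?
    zero_hits = Counter(r["query"] for r in records if r["hits"] == 0)
    # Which searches are slow? (threshold in milliseconds is an assumption)
    slow = sorted(
        (r for r in records if r["qtime"] >= slow_ms),
        key=lambda r: r["qtime"],
        reverse=True,
    )
    return {
        "top_queries": searched.most_common(top_n),
        "zero_hit_queries": zero_hits.most_common(top_n),
        "slow_queries": [(r["query"], r["qtime"]) for r in slow],
    }


report = summarize(records)
for name, rows in report.items():
    print(name, rows)
```

In SiLK these aggregations would normally be expressed as Solr facets and range queries behind dashboard panels; the in-memory version only shows what each panel computes.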

Page 14: Building a data driven search application with LucidWorks SiLK

Search Analytics–Know your Data

• Where is the data available?
  – Core Logs
  – Core Request Logs
  – Connector Logs
  – MBeans API
  – Log4j
• Data Connectors
  – LogStash (for this example)
  – Hadoop Job Jar

Page 15: Building a data driven search application with LucidWorks SiLK

Centralized Logging Infrastructures

• Can be built using a combination of LogStash, Apache Flume, Lumberjack, RabbitMQ, Apache Kafka, etc.
• Today’s example uses LogStash; extensive documentation is at http://logstash.net/docs/1.4.0

Diagram: Shippers → Broker → Indexer
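The shipper, broker and indexer roles can be illustrated with a small in-process Python sketch. This is not LogStash itself: a queue.Queue stands in for the broker (RabbitMQ, Kafka, etc.), and the function names, host names and event fields are hypothetical.

```python
import queue


def ship(source, lines, broker):
    """Shipper: tag each raw log line with its source and hand it to the broker."""
    for line in lines:
        broker.put({"source": source, "message": line})


def index(broker):
    """Indexer: drain the broker and return events ready to be sent to the search index."""
    events = []
    while True:
        try:
            event = broker.get_nowait()
        except queue.Empty:
            break
        # Stand-in for real parsing and enrichment of the raw message.
        event["length"] = len(event["message"])
        events.append(event)
    return events


# The broker decouples shippers from the indexer; here it is an in-memory queue.
broker = queue.Queue()
ship("web-01", ["GET / HTTP/1.1 200"], broker)
ship("app-01", ["INFO core.SolrCore - hits=6761", "INFO core.SolrCore - hits=0"], broker)
events = index(broker)
print(len(events), "events indexed")
```

The point of the broker is buffering: shippers keep running when the indexer is slow or down, and the indexer can be scaled independently of the machines producing logs.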

Page 16: Building a data driven search application with LucidWorks SiLK

Solr/Solr Cloud

Search Analytics—Data Ingestion & Visualization

Diagram labels:

• Search Logs
• LogStash
• Hadoop Connector
• GrokIngestMapper
• Solr Output Writer for LogStash (HTTP)
• Gateway (Reverse Proxy)
• Visualization: Configurable Dashboards
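The HTTP ingestion step can be sketched as a JSON post to Solr's update handler. This is a minimal Python illustration, not the Solr output writer for LogStash: the base URL, the collection name "logs" and the field names (query_s, hits_i, qtime_i) are assumptions, and the send function is defined but not called here because it needs a running Solr instance.

```python
import json
import urllib.request


def build_update_request(docs, base_url="http://localhost:8983/solr", collection="logs"):
    """Build a POST request that adds docs to a Solr collection and commits.

    base_url and collection are placeholders; adjust them to your deployment.
    """
    url = f"{base_url}/{collection}/update?commit=true"
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request and return the HTTP status (requires a reachable Solr)."""
    with urllib.request.urlopen(request) as response:
        return response.status


# Hypothetical document built from a parsed search-log event.
docs = [{"id": "evt-1", "query_s": "faceting", "hits_i": 6761, "qtime_i": 14}]
request = build_update_request(docs)
print(request.get_method(), request.full_url)
```

Committing on every request is convenient for a demo but costly at log volumes; batching documents and relying on Solr's automatic commit settings is the usual choice for sustained ingestion.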

Page 17: Building a data driven search application with LucidWorks SiLK

DEMO

Page 18: Building a data driven search application with LucidWorks SiLK

Q & A

• Contacts
  – Will Hayes, Chief Product Officer, [email protected], twitter: @iamwillhayes
  – Ravi Krishnamurthy, Director of Solutions, [email protected]
• Links
  – http://www.lucidworks.com/silk