Building a Data-Driven Search Application with LucidWorks SiLK
DESCRIPTION
LucidWorks SiLK is an open source stack that combines Lucene/Solr with best-in-class open source data ingestion and analytics tools such as Flume, LogStash and Kibana. This webinar explores the features of SiLK and shows attendees how they can benefit from the following:
- A powerful UI to analyze time series data stored in Lucene/Solr
- Creating and sharing visualizations, dashboards and reports
- Discovery and analysis of data coming from servers, applications, devices and more
- Exploration of click, geospatial and social data in ways previously unimaginable
TRANSCRIPT
Confidential and Proprietary © Copyright 2013
Building a Data-Driven Log Application with SILK
April 21, 2014
Search | Discover | Analyze
Agenda
• Introduction to LucidWorks
• The Continuum of Search
• LucidWorks SILK
– Enabling Big Data Search
– 360-degree view of customers and systems
– Breakthrough ROI
• Solution Components
• Demonstration
• Summary and Q&A
Speakers
Will Hayes
• Chief Product Officer at LucidWorks
• 15 years of product, marketing and BD experience
• Prior to LucidWorks, 8 years at Splunk (employee ~9)
• Proud Search Snob
Ravi Krishnamurthy
• Leads LucidWorks’ newly created Solutions team
• 16-year track record of data-driven solutions
– Customer analytics/nano-targeting
– Improving product development operations
– Video processing and transmission
• Establishing search as the paradigm for solving the "last mile problem" of big data
Who is LucidWorks
Commercial entity behind Lucene/Solr, the industry-leading open source search engine:
• 300+ enterprise customers
• Consulting, training, SLAs and “Pro-Active Support” for open source
The LucidWorks platform provides advanced search capabilities directly on Solr:
• Connectors, entity extraction, security, pipelines, rules and more…
Solutions (e.g. SiLK and the LucidWorks App for Splunk) help streamline use case adoption.
Continuum of Search
‘Enterprise Search’
• Application innovation: Intranet Search, Knowledge Base
• Index characteristics: gigabyte scale, single instance, full-text
‘Intelligent Search’
• Application innovation: E-Discovery, E-Commerce
• Index characteristics: terabyte scale, cluster-ready, structured/unstructured data, near real-time
‘Big Data Search’
• Application innovation: Search on Hadoop, Log Analysis, Fraud Detection
• Index characteristics: unlimited scale, cloud-ready, handles any data type, real-time, NoSQL alternative
Solr is the Choice
Solr creates the data access layer leveraged by best-in-class data-driven applications.
Solr is the choice of those building data-driven applications at massive scale.
Big Data Search
A Big Data Search index:
• Unlimited scale
• Cloud-ready
• Handles any data type
• Real-time
• NoSQL alternative
Creates the data access layer for the ad-hoc discovery, personalization and context that developers and users demand in their Big Data applications.
LucidWorks is the partner of choice of the leading Big Data vendors for delivering next-generation search.
Big Data Ecosystem WITHOUT LucidWorks Search
Input Data Stream
Traditional RDBMS/EDW; Doc Stores
Platform for Data Storage and Machine Learning
HDFS; NoSQL; Hadoop
Real-time Processing
Users: Data Scientist, BI Analyst, IT, Product Mgrs, Business Users, Rest of Org
Difficult Getting Value from Data
1. Opaque
2. Narrow views into data
3. Out-of-date
4. Not actionable
5. Accessible mostly to expert users
6. Expensive, ineffective translation to broader set of users
Solving the Last Mile Problem of Big Data
Input Data Stream
Traditional RDBMS/EDW; Doc Stores
HDFS; NoSQL; Hadoop
Real-time Processing
Lucene/Solr
Directly Access Data and Insights to Drive Actions:
• Predictive
• Relevant
• Actionable
• Timely
Breakthrough ROI
Solution Components
Gateway
JDBC Connector
Web/File System Crawl
Data Warehouse
Hadoop Connectors
Clickstream Networking
Data Sources
Connectors
Servers
Events from App/Server/Web Logs, etc.
• Application Logs
– 2013-12-18 01:37:20,637 INFO core.SolrCore - [collection1] webapp= path=/browse params={fl=lucid_facet&facet.query={!tag%3Done_day}dateCreated:[NOW-1DAY/DAY+TO+NOW/DAY] &facet.query={!tag%3Done_year}dateCreated:[NOW-365DAYS/DAY+TO+NOW/DAY]&start=260&q=faceting&f.project.facet.limit=20&role=DEFAULT&req_type=main&hl.simple.post=</span>&facet.field={!ex%3Dsource}source&facet.field={!ex%3Dsource}list_type&facet.field={!ex%3Dsource}issue_status&facet.field={!ex%3Dsource}lucid_facet&facet.field={!ex%3Dproject}project&facet.field={!ex%3Dauthor_display}author_display} hits=6761 status=0 QTime=14
• Firewall Logs
– Apr 07 2014 10:14:56 eventid='1278457197410173971' severity=severe category="Penetrate/ArpPoisoning" hostId=r signature=3201-2 description="Unix Password File Access Attempt" attacker=110.236.0.15 target=27.96.128.0 target=141.146.8.66 gc_score="-5" gc_riskdelta="3" gc_riskrating="false" gc_deny_packet="true" gc_deny_attacker="false"
• Web Logs
– 50.17.233.225 - - [09/Mar/2014:06:26:50 -0700] "GET / HTTP/1.1" 200 24442 "-" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.7) Gecko/20060909 Firefox/1.5.0.7"
• Syslogs
– Apr 17 07:00:42 Lucids-MacBook-Pro-25.local Microsoft Outlook[2461]: CGSCopyDisplayUUID: Invalid display 0x18d88a81
• Other: Database Logs, Click Data, Conversions, Social Media (Tweets…), Financial Data, Product Catalogs, Knowledge Base, etc.
• Volume, Variety and Velocity
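As an illustration of turning such an event into structured fields before indexing, here is a minimal Python sketch that parses a Solr request-log line of the shape shown under Application Logs. The regular expression, the function name parse_solr_log and the output field names are assumptions made for this example, and the sample line is an abbreviated version of the one on the slide. In the webinar's setup this kind of field extraction is handled by the ingestion tooling (LogStash), not by hand-written code.

```python
import re
from urllib.parse import parse_qs

# Assumed pattern for the Solr request-log layout in the Application Logs example.
SOLR_LOG = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+) +(?P<logger>\S+) - \[(?P<collection>[^\]]+)\] "
    r"webapp=(?P<webapp>\S*) path=(?P<path>\S+) "
    r"params=\{(?P<params>.*)\} "
    r"hits=(?P<hits>\d+) status=(?P<status>\d+) QTime=(?P<qtime>\d+)$"
)


def parse_solr_log(line):
    """Return a dict of fields for one request-log line, or None if it does not match."""
    match = SOLR_LOG.match(line.strip())
    if match is None:
        return None
    event = match.groupdict()
    # Numeric fields are converted so they can be aggregated later.
    for key in ("hits", "status", "qtime"):
        event[key] = int(event[key])
    # The user's search terms live in the q= request parameter.
    event["query"] = parse_qs(event["params"]).get("q", [""])[0]
    return event


# Abbreviated version of the slide's example line (most params omitted).
sample = (
    "2013-12-18 01:37:20,637 INFO core.SolrCore - [collection1] "
    "webapp= path=/browse params={start=260&q=faceting&role=DEFAULT} "
    "hits=6761 status=0 QTime=14"
)
event = parse_solr_log(sample)
print(event["query"], event["hits"], event["qtime"])
```

Once every line is reduced to fields such as query, hits and qtime, the events can be indexed and aggregated like any other structured data.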
Application Development Process
• Understand your Users
• Know your Data
• Prepare and Ingest Data into Solr
• Build Visualizations
• Iterate
Search Analytics—Understand your Users
• Who will use this application?
– Business users (eCommerce or KM), IT and search administrators
• What are they interested in?
– What are people searching for?
– Which queries are returning zero hits?
– Which searches are providing slow response times?
– What is my memory and CPU usage, JVM metrics, etc.?
– Is there a trend in my slow searches?
– Is the cache warm-up time very large?
• The first three are of interest to business users; search admins and developers are interested in all six.
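The first three questions reduce to simple aggregations once each search request is available as a structured event. The Python sketch below shows one way to compute them in memory. The sample events, the field names (query, hits, qtime) and the 500 ms slow-query threshold are hypothetical; in SiLK these answers would come from dashboards over data indexed in Solr, not from code like this.

```python
from collections import Counter

# Hypothetical parsed search events (field names are assumptions).
events = [
    {"query": "faceting", "hits": 6761, "qtime": 14},
    {"query": "solr cloud", "hits": 120, "qtime": 950},
    {"query": "flux capacitor", "hits": 0, "qtime": 8},
    {"query": "faceting", "hits": 6761, "qtime": 11},
]

SLOW_MS = 500  # assumed threshold for a "slow" search, in milliseconds


def top_queries(events, n=10):
    """What are people searching for? Most frequent queries first."""
    return Counter(e["query"] for e in events).most_common(n)


def zero_hit_queries(events):
    """Which queries are returning zero hits?"""
    return sorted({e["query"] for e in events if e["hits"] == 0})


def slow_queries(events, threshold_ms=SLOW_MS):
    """Which searches have slow response times (QTime at or above the threshold)?"""
    return [e["query"] for e in events if e["qtime"] >= threshold_ms]


print(top_queries(events, 1))
print(zero_hit_queries(events))
print(slow_queries(events))
```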
Search Analytics–Know your Data
• Where is the data available?
– Core Logs
– Core Request Logs
– Connector Logs
– MBeans API
– Log4j
• Data Connectors
– LogStash (for this example)
– Hadoop Job Jar
Centralized Logging Infrastructures
• Can be built using a combination of LogStash, Apache Flume, Lumberjack, RabbitMQ, Apache Kafka, etc.
• Today’s example uses LogStash; extensive documentation is at http://logstash.net/docs/1.4.0
Shipper(s) → Broker → Indexer
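To make the shipper, broker and indexer roles concrete, the following Python sketch models them with an in-process queue: shippers push raw log lines from each host to a broker that buffers them, and an indexer drains the broker into a store. This is only a conceptual stand-in under stated assumptions. The function names, host names and log lines are hypothetical, and none of this is LogStash's API; a real deployment would put a message broker such as RabbitMQ or Kafka between separate processes.

```python
import queue


def shipper(host, lines, broker):
    """Forward raw log lines from one host to the broker, tagged with their source."""
    for line in lines:
        broker.put({"host": host, "message": line})


def indexer(broker, index):
    """Drain the broker and store each event (a list stands in for the search index)."""
    while True:
        try:
            event = broker.get_nowait()
        except queue.Empty:
            break
        index.append(event)


broker = queue.Queue()  # buffers events between shippers and the indexer
index = []

# Two shippers, as in the diagram, feeding one broker.
shipper("web-01", ["GET / 200", "GET /search 200"], broker)
shipper("app-01", ["INFO service started"], broker)
indexer(broker, index)

print(len(index), [e["host"] for e in index])
```

The buffer in the middle is the point of the design: shippers stay lightweight, and the indexer can fall behind or restart without losing events that are still queued.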
Solr/Solr Cloud
Search Analytics—Data Ingestion & Visualization
Gateway (Reverse Proxy)
Solr Output Writer for LogStash (HTTP)
Search Logs
Visualization
Configurable Dashboards
Hadoop Connector
GrokIngestMapper
LogStash
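The ingestion step in this diagram sends events to Solr over HTTP. As a rough sketch of what such a request can look like, the Python below builds (but does not send) a JSON update request for Solr's /update handler, which accepts a JSON array of documents. The host, port, collection name and document field names are assumptions for illustration and would have to match an actual Solr instance and schema; this is not the implementation of the LogStash output writer.

```python
import json
import urllib.request

# Assumed Solr location and collection; adjust to the actual deployment.
SOLR_UPDATE_URL = "http://localhost:8983/solr/collection1/update?commit=true"


def build_update_request(docs, url=SOLR_UPDATE_URL):
    """Build an HTTP POST carrying a JSON array of documents for Solr's update handler."""
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical documents; field names must exist in the target schema.
docs = [
    {"id": "evt-1", "query": "faceting", "hits": 6761, "qtime": 14},
    {"id": "evt-2", "query": "flux capacitor", "hits": 0, "qtime": 8},
]
request = build_update_request(docs)
print(request.get_method(), request.full_url)

# To actually index the documents against a running Solr:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```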
5 DEMO
Search | Discover | Analyze
Q & A
• Contacts
– Will Hayes, Chief Product Officer, [email protected], twitter: @iamwillhayes
– Ravi Krishnamurthy, Director of Solutions, [email protected]
• Links
– http://www.lucidworks.com/silk