search and analyze data in real time
![Page 1: Search and analyze data in real time](https://reader035.vdocument.in/reader035/viewer/2022062711/55c59e5dbb61ebf16a8b47ed/html5/thumbnails/1.jpg)
Search and Analyze Data in Real Time
Prashant Shewale and Rohit Kalsarpe
![Page 2: Search and analyze data in real time](https://reader035.vdocument.in/reader035/viewer/2022062711/55c59e5dbb61ebf16a8b47ed/html5/thumbnails/2.jpg)
Agenda
1 Problem in validating logs
2 How Logstash can help
3 ELK Stack (Elasticsearch, Logstash, Kibana)
4 A hands-on demo
5 How we used ELK stack in our automation framework
6 World beyond
![Page 3: Search and analyze data in real time](https://reader035.vdocument.in/reader035/viewer/2022062711/55c59e5dbb61ebf16a8b47ed/html5/thumbnails/3.jpg)
Problem in validating logs
Follow active log files.
Logs keep growing and are rotated.
Collating multiline logs into a single event is a difficult task.
We have different kinds of applications and hence different kinds of logs, each with its own format.
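To make the multiline problem concrete, here is a minimal Python sketch of collating physical lines into logical events. It assumes (this is not from the slides) that every new event starts with a `YYYY-MM-DD` date and that any other line, such as a stack-trace line, continues the previous event. The pattern, the function name `collate_multiline`, and the sample lines are all hypothetical.

```python
import re
from typing import Iterable, Iterator, List

# Assumption: a new event starts with an ISO-style date; anything else
# (for example a stack-trace line) continues the previous event.
EVENT_START = re.compile(r"^\d{4}-\d{2}-\d{2} ")


def collate_multiline(lines: Iterable[str]) -> Iterator[str]:
    """Group physical log lines into logical (possibly multiline) events."""
    buffer: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        # A line that looks like an event start flushes the buffered event.
        if EVENT_START.match(line) and buffer:
            yield "\n".join(buffer)
            buffer = []
        buffer.append(line)
    # Flush whatever is left once the input ends.
    if buffer:
        yield "\n".join(buffer)


# Hypothetical sample: one error with a two-line traceback, then one info line.
sample = [
    "2015-01-01 10:00:00 ERROR Something failed",
    "Traceback (most recent call last):",
    '  File "app.py", line 1, in <module>',
    "2015-01-01 10:00:01 INFO Recovered",
]
events = list(collate_multiline(sample))
```

A real log follower also has to cope with rotation and with events that are still being written, which this sketch ignores.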
![Page 4: Search and analyze data in real time](https://reader035.vdocument.in/reader035/viewer/2022062711/55c59e5dbb61ebf16a8b47ed/html5/thumbnails/4.jpg)
192.168.198.92 - - [22/Dec/2002:23:08:37 -0400] "GET / HTTP/1.1" 200 6394 www.yahoo.com "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1...)" "-"
192.168.198.92 - - [22/Dec/2002:23:08:38 -0400] "GET /images/logo.gif HTTP/1.1" 200 807 www.yahoo.com "http://www.some.com/" "Mozilla/4.0 (compatible; MSIE 6...)" "-"
192.168.72.177 - - [22/Dec/2002:23:32:14 -0400] "GET /news/sports.html HTTP/1.1" 200 3500 www.yahoo.com "http://www.some.com/" "Mozilla/4.0 (compatible; MSIE ...)" "-"
192.168.72.177 - - [22/Dec/2002:23:32:14 -0400] "GET /favicon.ico HTTP/1.1" 404 1997 www.yahoo.com "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.7.3)..." "-"
192.168.72.177 - - [22/Dec/2002:23:32:15 -0400] "GET /style.css HTTP/1.1" 200 4138 www.yahoo.com "http://www.yahoo.com/index.html" "Mozilla/5.0 (Windows..." "-"
192.168.72.177 - - [22/Dec/2002:23:32:16 -0400] "GET /js/ads.js HTTP/1.1" 200 10229 www.yahoo.com "http://www.search.com/index.html" "Mozilla/5.0 (Windows..." "-"
192.168.72.177 - - [22/Dec/2002:23:32:19 -0400] "GET /search.php HTTP/1.1" 400 1997 www.yahoo.com "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; ...)" "-"
Sample Apache Log
![Page 5: Search and analyze data in real time](https://reader035.vdocument.in/reader035/viewer/2022062711/55c59e5dbb61ebf16a8b47ed/html5/thumbnails/5.jpg)
Feb 4 06:10:09 techy sendmail[5392]: o140e90B005392: from=, size=2434, class=0, nrcpts=1, msgid=<[email protected]>, proto=ESMTP, daemon=MTA, relay=localhost [127.0.0.1]
Feb 4 06:10:09 techy sendmail[5380]: o140e9Mi005380: to=root, ctladdr=root (0/0), delay=00:00:00, xdelay=00:00:00, mailer=relay, pri=32168, relay=[127.0.0.1] [127.0.0.1], dsn=2.0.0, stat=Sent (o140e90B005392 Message accepted for delivery)
Sample SendMail Log
![Page 6: Search and analyze data in real time](https://reader035.vdocument.in/reader035/viewer/2022062711/55c59e5dbb61ebf16a8b47ed/html5/thumbnails/6.jpg)
Oct 20 03:45:50 hostname kernel: iptables denied: IN=eth0 OUT= MAC=xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx SRC=x.x.x.x DST=x.x.x.x LEN=1059 TOS=0x00 PREC=0x00 TTL=115 ID=31368 DF PROTO=TCP SPT=17992 DPT=80 WINDOW=16477 RES=0x00 ACK PSH URGP=0
Oct 20 03:46:02 hostname kernel: iptables denied: IN=eth0 OUT= MAC=xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx SRC=x.x.x.x DST=x.x.x.x LEN=52 TOS=0x00 PREC=0x00 TTL=52 ID=763 DF PROTO=TCP SPT=20229 DPT=22 WINDOW=15588 RES=0x00 ACK URGP=0
Oct 20 03:46:14 hostname kernel: iptables denied: IN=eth0 OUT= MAC=xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx SRC=x.x.x.x DST=x.x.x.x LEN=324 TOS=0x00 PREC=0x00 TTL=49 ID=64245 PROTO=TCP SPT=47237 DPT=80 WINDOW=470 RES=0x00 ACK PSH URGP=0
Oct 20 03:46:26 hostname kernel: iptables denied: IN=eth0 OUT= MAC=xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx SRC=x.x.x.x DST=x.x.x.x LEN=52 TOS=0x00 PREC=0x00 TTL=45 ID=2010 PROTO=TCP SPT=48322 DPT=80 WINDOW=380 RES=0x00 ACK URGP=0
Sample iptables Log
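Much of an iptables line is already `KEY=VALUE` pairs, so a rough Python sketch can pull those out with one small pattern. The function name `parse_iptables` is hypothetical, and the sketch deliberately ignores the syslog prefix and bare flags such as `DF` or `ACK`.

```python
import re
from typing import Dict

# Matches KEY=VALUE tokens; the value may be empty (as in "OUT= ").
KEY_VALUE = re.compile(r"(\w+)=(\S*)")


def parse_iptables(line: str) -> Dict[str, str]:
    """Return the KEY=VALUE pairs of one iptables log line as a dict."""
    return dict(KEY_VALUE.findall(line))


# Second line of the sample log shown on this slide.
line = (
    "Oct 20 03:46:02 hostname kernel: iptables denied: IN=eth0 OUT= "
    "MAC=xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx SRC=x.x.x.x DST=x.x.x.x "
    "LEN=52 TOS=0x00 PREC=0x00 TTL=52 ID=763 DF PROTO=TCP SPT=20229 DPT=22 "
    "WINDOW=15588 RES=0x00 ACK URGP=0"
)
fields = parse_iptables(line)
```

All values stay strings here; converting ports or lengths to integers would be a separate step.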
![Page 7: Search and analyze data in real time](https://reader035.vdocument.in/reader035/viewer/2022062711/55c59e5dbb61ebf16a8b47ed/html5/thumbnails/7.jpg)
Use RegEx to parse data
Source: xkcd.com
![Page 8: Search and analyze data in real time](https://reader035.vdocument.in/reader035/viewer/2022062711/55c59e5dbb61ebf16a8b47ed/html5/thumbnails/8.jpg)
Actual RegEx to parse Apache log
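The regular expression on the slide itself is not in the transcript. As a stand-in, here is a hedged Python sketch of a pattern that fits the sample Apache lines shown earlier in this deck. The group names and the extra `vhost` field (for the `www.yahoo.com` column) are illustrative choices, not the authors' pattern.

```python
import re
from typing import Dict, Optional

# Assumption: the layout follows the sample lines in this deck:
# ip ident auth [timestamp] "verb request HTTP/x" status bytes vhost "referrer" "agent"
APACHE_LINE = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+) HTTP/(?P<httpversion>[\d.]+)" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-) '
    r'(?P<vhost>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_apache_line(line: str) -> Optional[Dict[str, str]]:
    """Return the named fields of one log line, or None if it does not match."""
    match = APACHE_LINE.match(line)
    return match.groupdict() if match else None


# Fourth line of the sample Apache log from the earlier slide.
line = (
    '192.168.72.177 - - [22/Dec/2002:23:32:14 -0400] '
    '"GET /favicon.ico HTTP/1.1" 404 1997 www.yahoo.com "-" '
    '"Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.7.3)..." "-"'
)
fields = parse_apache_line(line)
```

Even this simplified pattern is hard to read and breaks as soon as the log format changes, which is the point the slide is making.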
![Page 9: Search and analyze data in real time](https://reader035.vdocument.in/reader035/viewer/2022062711/55c59e5dbb61ebf16a8b47ed/html5/thumbnails/9.jpg)
Source: xkcd.com
![Page 10: Search and analyze data in real time](https://reader035.vdocument.in/reader035/viewer/2022062711/55c59e5dbb61ebf16a8b47ed/html5/thumbnails/10.jpg)
How Logstash can help
Logstash is a data pipeline that helps you process logs from a variety of systems.
Logstash allows you to parse data and converge on a common format.
Logstash provides a fast and convenient way to write custom logic for parsing these logs.
Support for multiple plugins
![Page 11: Search and analyze data in real time](https://reader035.vdocument.in/reader035/viewer/2022062711/55c59e5dbb61ebf16a8b47ed/html5/thumbnails/11.jpg)
Logstash
Input Section
• File
• Stdin
• Syslog
• SNMP Traps
• TCP/UDP
• and many more
Filter Section
• Grok
• Mutate
• Geoip
• Drop
• and many more
Output Section
• Elasticsearch
• File
• Email
• and many more
![Page 12: Search and analyze data in real time](https://reader035.vdocument.in/reader035/viewer/2022062711/55c59e5dbb61ebf16a8b47ed/html5/thumbnails/12.jpg)
Logstash Config File
input {
...
}
filter {
...
}
output {
...
}
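The three sections of the config file map onto a three-stage pipeline. The Python sketch below is only an analogy for that input, filter, output flow; it is not Logstash's API or configuration syntax, and every name in it (`line_input`, `split_filter`, `json_output`, `run_pipeline`) is hypothetical.

```python
import json
from typing import Callable, Dict, Iterable, Iterator, List, Optional

Event = Dict[str, str]
Filter = Callable[[Event], Optional[Event]]


def line_input(lines: Iterable[str]) -> Iterator[Event]:
    """Input stage: wrap each raw line in an event dict."""
    for line in lines:
        yield {"message": line.rstrip("\n")}


def split_filter(event: Event) -> Optional[Event]:
    """Filter stage: add structured fields, or drop the event by returning None."""
    parts = event["message"].split(" ", 1)
    if len(parts) != 2:
        return None  # unparseable line: drop it
    event["level"], event["text"] = parts
    return event


def json_output(events: Iterable[Event], sink: List[str]) -> None:
    """Output stage: serialize each surviving event as one JSON line."""
    for event in events:
        sink.append(json.dumps(event, sort_keys=True))


def run_pipeline(lines: Iterable[str], filters: List[Filter], sink: List[str]) -> None:
    """Chain the stages: input -> each filter in order -> output."""
    def filtered() -> Iterator[Event]:
        for event in line_input(lines):
            current: Optional[Event] = event
            for apply_filter in filters:
                current = apply_filter(current)
                if current is None:
                    break  # a filter dropped the event
            else:
                yield current

    json_output(filtered(), sink)


sink: List[str] = []
run_pipeline(["ERROR disk full\n", "garbage\n"], [split_filter], sink)
```

Here the second input line is dropped by the filter stage, so `sink` ends up holding a single JSON document.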
![Page 13: Search and analyze data in real time](https://reader035.vdocument.in/reader035/viewer/2022062711/55c59e5dbb61ebf16a8b47ed/html5/thumbnails/13.jpg)
Logstash-forwarder
A tool to collect logs locally for processing elsewhere
Secure, low latency, low resource usage, and reliable.
Another option: Log-courier
Logstash-forwarder
Logstash
![Page 14: Search and analyze data in real time](https://reader035.vdocument.in/reader035/viewer/2022062711/55c59e5dbb61ebf16a8b47ed/html5/thumbnails/14.jpg)
ELK Stack
Elasticsearch, Logstash and Kibana
End-to-end stack that delivers actionable insights in real time from almost any type of structured and unstructured data source
I. Logstash is used for cooking data
II. Elasticsearch is used for storing this cooked data
III. Kibana gives shape to your data
Each one is packaged as a fully self-contained jar and is easy to use
![Page 15: Search and analyze data in real time](https://reader035.vdocument.in/reader035/viewer/2022062711/55c59e5dbb61ebf16a8b47ed/html5/thumbnails/15.jpg)
What is ELK?
Shipper
Shipper
Shipper
![Page 16: Search and analyze data in real time](https://reader035.vdocument.in/reader035/viewer/2022062711/55c59e5dbb61ebf16a8b47ed/html5/thumbnails/16.jpg)
![Page 17: Search and analyze data in real time](https://reader035.vdocument.in/reader035/viewer/2022062711/55c59e5dbb61ebf16a8b47ed/html5/thumbnails/17.jpg)
Elasticsearch
Real time search and indexing tool
Easy to set up; RESTful API
Easy to cluster and scale
High Availability
Schema-Free
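As a small illustration of the RESTful API point, the Python sketch below builds (but does not send) a search request using only the standard library. The host `http://localhost:9200`, the index name `logstash-2015.01.01`, and the field `response` are assumptions for the example; the `_search` endpoint and the `match` query are part of Elasticsearch's query DSL.

```python
import json
import urllib.request


def build_search_request(
    base_url: str, index: str, field: str, value: str
) -> urllib.request.Request:
    """Build a POST request for <base_url>/<index>/_search with a match query."""
    body = {"query": {"match": {field: value}}}
    return urllib.request.Request(
        url=f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical node address, index name, and field name.
request = build_search_request(
    "http://localhost:9200", "logstash-2015.01.01", "response", "404"
)

# Sending it requires a running Elasticsearch node, so it is left commented out:
# with urllib.request.urlopen(request) as reply:
#     hits = json.load(reply)["hits"]["hits"]
```

Because the interface is plain HTTP and JSON, the same request can be issued from any language or from a command-line HTTP client.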
![Page 18: Search and analyze data in real time](https://reader035.vdocument.in/reader035/viewer/2022062711/55c59e5dbb61ebf16a8b47ed/html5/thumbnails/18.jpg)
![Page 19: Search and analyze data in real time](https://reader035.vdocument.in/reader035/viewer/2022062711/55c59e5dbb61ebf16a8b47ed/html5/thumbnails/19.jpg)
Kibana
Seamless Integration with Elasticsearch
Give Shape to Your Data
Sophisticated Analytics
Easy Setup
Simple Data Export
![Page 20: Search and analyze data in real time](https://reader035.vdocument.in/reader035/viewer/2022062711/55c59e5dbb61ebf16a8b47ed/html5/thumbnails/20.jpg)
![Page 21: Search and analyze data in real time](https://reader035.vdocument.in/reader035/viewer/2022062711/55c59e5dbb61ebf16a8b47ed/html5/thumbnails/21.jpg)
Demo
![Page 22: Search and analyze data in real time](https://reader035.vdocument.in/reader035/viewer/2022062711/55c59e5dbb61ebf16a8b47ed/html5/thumbnails/22.jpg)
How we used ELK stack in our automation framework
![Page 23: Search and analyze data in real time](https://reader035.vdocument.in/reader035/viewer/2022062711/55c59e5dbb61ebf16a8b47ed/html5/thumbnails/23.jpg)
Automation Box 1
Automation Box 2
Automation Box n
Mail Server
Mail Server
Mail Server
Logstash
Cook
Correlate
Elasticsearch
Index
Store
Logs
Structured data
Structured data
![Page 24: Search and analyze data in real time](https://reader035.vdocument.in/reader035/viewer/2022062711/55c59e5dbb61ebf16a8b47ed/html5/thumbnails/24.jpg)
World Beyond
Analytics - count things and summarize your data.
Crawling and Document Processing
1. For crawling, people are using both Scrapy and Nutch together with Elasticsearch.
A variety of companies are using the ELK stack to power their search infrastructure.
1. Wikimedia
2. GitHub, which empowers its 4 million members by providing search across 8 million+ code repositories.
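The "count things and summarize your data" idea from the Analytics point can be pictured with a few lines of Python. This is a local, in-memory sketch over already-parsed events, not an Elasticsearch call; in Elasticsearch the same kind of grouping and counting would typically be done server-side with aggregations. The function name `count_by_field` is hypothetical, and the sample events reuse request paths and status codes from the Apache log earlier in the deck.

```python
from collections import Counter
from typing import Dict, Iterable


def count_by_field(events: Iterable[Dict[str, str]], field: str) -> Counter:
    """Count how often each value of `field` occurs, skipping events without it."""
    return Counter(event[field] for event in events if field in event)


# Parsed events, with values taken from the sample Apache log.
events = [
    {"request": "/", "response": "200"},
    {"request": "/favicon.ico", "response": "404"},
    {"request": "/style.css", "response": "200"},
    {"request": "/search.php", "response": "400"},
]
counts = count_by_field(events, "response")
top_status, top_count = counts.most_common(1)[0]
```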
![Page 25: Search and analyze data in real time](https://reader035.vdocument.in/reader035/viewer/2022062711/55c59e5dbb61ebf16a8b47ed/html5/thumbnails/25.jpg)
Thank You