Finding the needle in the haystack with ELK
Elasticsearch for Incident Handlers and Forensic Analysts
Whoami
• Working for the Belgian Government and for my own company
• Incident Handling
• Malware analysis
• Forensics (network + system)
• Open Source minded
• Creator of MISP (Malware Information Sharing Platform)
• Creator of pystemon, a pastebin monitoring tool
• Core organizer of the FOSDEM conference for many years
• Contact me: [email protected]
image by James Lumb
What tools do you use?
• Text logs
• Notepad
• grep
• awk / sed / cut
• MS Excel / OOo Calc
image by velorichard.wordpress.com
Optimizing
• grep -F "needle" log.txt
• zgrep -F "needle" log.txt
• zgrep -f patterns.txt -F log.txt
• find "$LOGS_DIR" -iname "*.gz" -print0 | parallel --gnu -0 -n1 -P8 zgrep -f patterns.txt -F > result-all.txt
• Fast for a single search, but no lookup by column!
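The parallel zgrep pipeline above can be mirrored in Python. The following is a minimal sketch that assumes UTF-8 logs and one fixed-string pattern per line in patterns.txt. It runs several files at once with threads, where GNU parallel uses processes. Each hit is still just a raw line, which shows the "no lookup by column" limitation.

```python
import gzip
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def load_patterns(path):
    """One fixed-string pattern per line, like `zgrep -f patterns.txt -F`; blank lines are skipped."""
    with open(path, encoding="utf-8") as fh:
        return [line.rstrip("\r\n") for line in fh if line.strip()]


def search_gz(path, patterns):
    """Return (file, line) pairs for lines that contain any pattern as a plain substring."""
    hits = []
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        for line in fh:
            if any(p in line for p in patterns):
                hits.append((str(path), line.rstrip("\n")))
    return hits


def search_dir(logs_dir, patterns, workers=8):
    """Search every *.gz file (case-insensitive, like find -iname), up to `workers` files at once."""
    files = sorted(p for p in Path(logs_dir).rglob("*")
                   if p.is_file() and p.name.lower().endswith(".gz"))
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for hits in pool.map(lambda f: search_gz(f, patterns), files):
            results.extend(hits)
    return results
```

Usage: `search_dir("/cases/case42/logs", load_patterns("patterns.txt"))`, where the directory path is hypothetical.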
Optimizing
• MySQL / MS Access
• Splunk (free = 500 MB/day)
• ELSA (Enterprise Log Search and Archive): limited number of columns
• ${COMMERCIAL_TOOL}
Trick for Splunk Addicts
• Limit is 500 MB/day
• 3 license violations allowed per month
• Set the system date/time to 00:01 AM
• Index as much as possible, 24h/day, for 3 days (while loops are your friend)
• Enjoy searching
Trick for all = ELK
• Elasticsearch, Logstash, Kibana
• Index as much as you want
• No limit on volume, speed or position of the moon
• Open source, free to use, commercial support available
Configurations
• https://github.com/cvandeplas/ELK-forensics
  • Repository with Logstash and Kibana configurations
  • Mactime, BlueCoat, Mail IMSS, IWSVA, IIS, SuperTimeline, Plaso, …
• http://christophe.vandeplas.com/2014/06/setting-up-single-node-elk-in-20-minutes.html
• Our focus today:
  • Forensics and Incident Handling
  • Batch import
How does it work?
Inputs
• Inputs & codecs
  • Inputs: collectd, drupal_dblog, elasticsearch, eventlog, exec, file, ganglia, gelf, gemfire, generator, graphite, heroku, imap, invalid_input, irc, jmx, log4j, lumberjack, pipe, puppet_facter, rabbitmq, rackspace, redis, relp, s3, snmptrap, sqlite, sqs, stdin, stomp, syslog, tcp, twitter, udp, unix, varnishlog, websocket, wmi, xmpp, zenoss, zeromq
  • Codecs: cloudtrail, collectd, compress_spooler, dots, edn, edn_lines, fluent, graphite, json, json_lines, json_spooler, line, msgpack, multiline, netflow, noop, oldlogstashjson, plain, rubydebug, spool
• Outputs
• Filters
Input Example
• I usually don't use "file" as input
  • It keeps track of the read position in each file (sincedb), so a file it has already read is not simply read again
• A TCP socket is the easiest for me
  • ncat log01.lab.internal 18001 < logfile.log
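If ncat is not available on the analysis box, the same "pipe a file into a TCP socket" step can be done from Python. This minimal sketch assumes that a Logstash tcp input with a line-oriented codec is listening on the given host and port. It streams the file in chunks and then half-closes the connection so that the receiver sees end-of-stream.

```python
import socket


def send_file(host, port, path, chunk_size=64 * 1024):
    """Stream a log file to a TCP listener, like `ncat HOST PORT < path`.

    Assumes a Logstash tcp input (line-oriented codec) listens on HOST:PORT.
    Returns the number of bytes sent.
    """
    sent = 0
    with socket.create_connection((host, port)) as sock, open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                break
            sock.sendall(chunk)
            sent += len(chunk)
        # Half-close: tell the receiver no more data is coming.
        sock.shutdown(socket.SHUT_WR)
    return sent
```

Usage: `send_file("log01.lab.internal", 18001, "logfile.log")`.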
Outputs
• Inputs & codecs
• Outputs
  • boundary, circonus, cloudwatch, csv, datadog, datadog_metrics, elasticsearch, elasticsearch_http, elasticsearch_river, email, exec, file, ganglia, gelf, gemfire, google_bigquery, google_cloud_storage, graphite, graphtastic, hipchat, http, irc, jira, juggernaut, librato, loggly, lumberjack, metriccatcher, mongodb, nagios, nagios_nsca, null, opentsdb, pagerduty, pipe, rabbitmq, rackspace, redis, redmine, riak, riemann, s3, sns, solr_http, sqs, statsd, stdout, stomp, syslog, tcp, udp, websocket, xmpp, zabbix, zeromq
• Filters
Output Example
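The elasticsearch output sends events in batches, and by default it writes to a daily index named logstash-%{+YYYY.MM.dd}, with the date taken from @timestamp in UTC. This is a minimal Python sketch of that idea using the Elasticsearch _bulk API. It assumes a local node on port 9200. Elasticsearch 1.x also expected a `_type` in each action line, which this sketch leaves out.

```python
import json
import urllib.request
from datetime import datetime, timezone


def default_index(event):
    """Daily index name from @timestamp, mirroring logstash-%{+YYYY.MM.dd} (in UTC)."""
    ts = datetime.fromisoformat(event["@timestamp"].replace("Z", "+00:00"))
    if ts.tzinfo is None:  # assume UTC when the timestamp carries no offset
        ts = ts.replace(tzinfo=timezone.utc)
    return "logstash-" + ts.astimezone(timezone.utc).strftime("%Y.%m.%d")


def bulk_body(events):
    """NDJSON body for _bulk: one action line plus one source line per event."""
    lines = []
    for event in events:
        lines.append(json.dumps({"index": {"_index": default_index(event)}}))
        lines.append(json.dumps(event))
    return "\n".join(lines) + "\n"  # the bulk API requires a trailing newline


def post_bulk(body, url="http://localhost:9200/_bulk"):
    """Send the body to Elasticsearch and return the parsed JSON response."""
    req = urllib.request.Request(url, data=body.encode("utf-8"), method="POST",
                                 headers={"Content-Type": "application/x-ndjson"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```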
Filters
• Inputs & codecs
• Outputs
• Filters
  • advisor, alter, anonymize, checksum, cidr, cipher, clone, collate, csv, date, dns, drop, elapsed, elasticsearch, environment, extractnumbers, fingerprint, gelfify, geoip, grep, grok, grokdiscovery, i18n, json, json_encode, kv, metaevent, metrics, multiline, mutate, noop, prune, punct, railsparallelrequest, range, ruby, sleep, split, sumnumbers, syslog_pri, throttle, translate, unique, urldecode, useragent, uuid, wms, wmts, xml, zeromq
Filter Example
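To illustrate what a csv filter does with a mactime body file, the Python sketch below splits one line into named fields. The column names are an assumption based on the `mactime -d` header (Date, Size, Type, Mode, UID, GID, Meta, File Name), so adjust them to your data. Extra columns get generic `columnN` names, similar in spirit to what Logstash does.

```python
import csv
import io

# Assumed columns of `mactime -d` (comma-delimited) output; adjust to your data.
MACTIME_COLUMNS = ["date", "size", "type", "mode", "uid", "gid", "meta", "file_name"]


def parse_csv_line(line, columns=MACTIME_COLUMNS):
    """Split one CSV line into named fields, honouring quotes (commas in file names)."""
    values = next(csv.reader(io.StringIO(line)), [])  # empty line -> no fields
    event = dict(zip(columns, values))
    # Keep unexpected extra columns under generic names instead of dropping them.
    for i, extra in enumerate(values[len(columns):], start=len(columns) + 1):
        event[f"column{i}"] = extra
    return event
```

The sample line in the check below is illustrative, not real evidence.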
Grok
• Named regular expressions to match patterns and extract data
• Logstash ships with lots of patterns: https://github.com/elasticsearch/logstash/tree/master/patterns
• Test app: http://grokdebug.herokuapp.com
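The grok idea (reusable named patterns referenced as %{PATTERN:field}) can be reproduced in a few lines of Python. This sketch expands such references into regex named groups. The small pattern table is a simplified stand-in written for this example and is not Logstash's real pattern definitions.

```python
import re

# Simplified, hypothetical pattern library (NOT the patterns shipped with Logstash).
PATTERNS = {
    "INT": r"[+-]?\d+",
    "WORD": r"\w+",
    "NOTSPACE": r"\S+",
    "IPV4": r"(?:\d{1,3}\.){3}\d{1,3}",
}

_REF = re.compile(r"%\{(\w+)(?::(\w+))?\}")


def grok_compile(expr, patterns=PATTERNS):
    """Turn %{NAME} / %{NAME:field} references into a regex with named groups."""
    def expand(m):
        name, field = m.group(1), m.group(2)
        body = patterns[name]
        return f"(?P<{field}>{body})" if field else f"(?:{body})"
    return re.compile(_REF.sub(expand, expr))


def grok(expr, line):
    """Return the captured fields as a dict, or None when the line does not match."""
    m = grok_compile(expr).search(line)
    return m.groupdict() if m else None
```

Usage: `grok(r"%{IPV4:client} %{WORD:method} %{NOTSPACE:request} %{INT:status}", line)`.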
Testing complex Groks
Data Enrichment with Filters
• Extract fields: csv, grok, kv
• Extract date
• Modify using mutate
• Enrich with
  • Geoip
  • User-agent
  • Urldecode
  • Translate
  • …
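The "extract fields, extract date, modify" steps above can be sketched in plain Python. The functions below are illustrative stand-ins for the kv, date and mutate/rename filters and are not their actual implementations. The kv splitter does not handle quoted values that contain spaces, and the default date format is an Apache-style timestamp chosen as an example.

```python
from datetime import datetime, timezone


def kv(text, field_split=" ", value_split="="):
    """Extract key=value pairs (simplified: quoted values must not contain spaces)."""
    fields = {}
    for token in text.split(field_split):
        key, sep, value = token.partition(value_split)
        if sep and key:
            fields[key] = value.strip('"')
    return fields


def parse_date(value, fmt="%d/%b/%Y:%H:%M:%S %z"):
    """Parse a log timestamp and return ISO 8601 in UTC, as a date filter would set @timestamp."""
    return datetime.strptime(value, fmt).astimezone(timezone.utc).isoformat()


def mutate_rename(event, old, new):
    """Rename a field, in the spirit of mutate { rename => ... }."""
    if old in event:
        event[new] = event.pop(old)
    return event
```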
Geoip
User-Agent
Translate
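Urldecode and translate are both simple lookups at heart. In this Python sketch, the HTTP status dictionary is a hypothetical example. The translate filter would normally load such a mapping from a dictionary file and can use a fallback value for unknown keys, which the `fallback` argument imitates here.

```python
from urllib.parse import unquote

# Hypothetical dictionary; in Logstash this would typically come from a dictionary file.
HTTP_STATUS = {"200": "OK", "403": "Forbidden", "404": "Not Found"}


def urldecode(value):
    """Decode %XX escapes in a URL or query string."""
    return unquote(value)


def translate(event, field, destination, dictionary, fallback=None):
    """Look up event[field] in a dictionary and store the result in event[destination]."""
    value = event.get(field)
    if value in dictionary:
        event[destination] = dictionary[value]
    elif fallback is not None:
        event[destination] = fallback
    return event
```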
Ruby as last resort
* There might be a better way to do this, but Ruby and I are not really friends yet
Elasticsearch
• Wikipedia: "Elasticsearch is a search server based on Lucene. It provides a distributed, multitenant-capable full-text search engine with a RESTful web interface and schema-free JSON documents. Elasticsearch is developed in Java and is released as open source under the terms of the Apache License."
• Very, very fast
• Adding a node is easier than extremely easy
Elasticsearch
• Be cautious
  • No security by default
  • Auto-discovery and auto-distribution if another node is present
• Elastic HQ plugin
  • cd /usr/share/elasticsearch/bin
  • ./plugin -install royrusso/elasticsearch-HQ
Kibana
• Fancy GUI
• Extremely easy to build a dashboard
• Gives a good overview of the data
• Powerful, but limited in capability
• For more: write a Python script or use the REST API
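One "write a Python script" example is a top-N count that a dashboard panel would otherwise show. This sketch sends a query_string search with a terms aggregation to the _search endpoint of a local node, which is assumed to be at http://localhost:9200. Depending on the Elasticsearch version and mapping, you may need to aggregate on a non-analyzed variant of the field, such as `host.raw` or `host.keyword`, rather than `host`.

```python
import json
import urllib.request


def top_values_query(field, lucene_query="*", size=10):
    """Search body: filter with a Lucene query string and count the most frequent values of `field`."""
    return {
        "size": 0,  # only the aggregation is needed, not the hits themselves
        "query": {"query_string": {"query": lucene_query}},
        "aggs": {"top": {"terms": {"field": field, "size": size}}},
    }


def top_values(index, field, lucene_query="*", size=10, base_url="http://localhost:9200"):
    """Run the query and return (value, count) pairs."""
    body = json.dumps(top_values_query(field, lucene_query, size)).encode("utf-8")
    req = urllib.request.Request(f"{base_url}/{index}/_search", data=body, method="POST",
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        result = json.load(resp)
    return [(b["key"], b["doc_count"]) for b in result["aggregations"]["top"]["buckets"]]
```

Usage: `top_values("logstash-case42-*", "host", "status:404")`, where the index pattern and field names are hypothetical.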
DO NOT PRESS THIS BUTTON
Search syntax
• Apache Lucene search syntax
  • title:foo
  • title:"foo bar"
  • title:"foo bar" AND body:"quick fox"
  • (title:"foo bar" AND body:"quick fox") OR title:fox
  • title:foo -title:bar
  • title:foo*bar
  • time_taken:[10000 TO 999999999]
http://www.lucenetutorial.com/lucene-query-syntax.html
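When building these queries from an indicator list (for example, the patterns.txt used with zgrep earlier), quoting each value as a phrase avoids having ":" or "/" in URLs and paths interpreted as syntax. Inside a quoted phrase, only `"` and `\` need escaping. A minimal Python sketch:

```python
def phrase(field, value):
    """field:"value" with backslashes and double quotes escaped for Lucene."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


def any_of(field, values):
    """OR together one phrase clause per value."""
    return "(" + " OR ".join(phrase(field, v) for v in values) + ")"
```

Usage: `any_of("host", ["a.example", "b.example"])` gives `(host:"a.example" OR host:"b.example")`.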
Load dashboards
Filter
Performance
Performance goals
• Focus: Incident Handling and Forensics
• Maximum indexing speed
• Maximum search speed
  • Searches may be slow while indexing is running
• No need for redundancy
• So don't use this advice for live production operations
Performance Logstash
• Memory setting (/etc/default/logstash)
  • LS_HEAP_SIZE="500m"
• Command-line flag
  • -w or --filterworkers AMOUNT_OF_CORES (default: 1)
• Each extra filter slows it down
  • Grok (regex) is slow
  • Prefer csv, kv
  • Use as few wildcards (* or +) as possible
  • Geoip is slow but very practical
  • User-agent is slow, often practical
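The "prefer csv over grok" advice is easy to check on your own log format before choosing a filter. This Python sketch parses the same hypothetical comma-separated line with a plain split and with a regex, then times both. Python timings only illustrate the principle and say nothing definitive about Logstash's own filters.

```python
import re
import timeit

# Hypothetical sample line and field names.
LINE = "2014-06-01,12:00:00,10.0.0.5,GET,/index.html,200"
FIELDS = ["date", "time", "client", "method", "path", "status"]

# Anchored regex with narrow per-field classes instead of greedy .* wildcards.
REGEX = re.compile(r"^(?P<date>[^,]+),(?P<time>[^,]+),(?P<client>[^,]+),"
                   r"(?P<method>[^,]+),(?P<path>[^,]+),(?P<status>[^,]+)$")


def by_split(line):
    return dict(zip(FIELDS, line.split(",")))


def by_regex(line):
    m = REGEX.match(line)
    return m.groupdict() if m else None


if __name__ == "__main__":
    for fn in (by_split, by_regex):
        print(fn.__name__, timeit.timeit(lambda: fn(LINE), number=10_000))
```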
Performance Elasticsearch
• Memory setting (/etc/default/elasticsearch)
  • ES_HEAP_SIZE=12g: set to half of the RAM, but stay below 32 GB
• Disable redundancy (/etc/elasticsearch/elasticsearch.yml)
  • index.number_of_replicas: 0
• Match the number of shards to the number of nodes (/etc/elasticsearch/elasticsearch.yml)
  • index.number_of_shards: 1
• Increase the indexing memory buffer
  • indices.memory.index_buffer_size: 50%
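The heap rule of thumb is simple arithmetic, and the shard/replica choice can also be passed per index at creation time instead of globally. The number of shards is fixed once an index exists, while the number of replicas can be changed later. In this sketch, the 31 GB ceiling is an assumption that keeps the heap just under the 32 GB compressed-pointer limit.

```python
def es_heap_gb(ram_gb, ceiling_gb=31):
    """Half of the RAM in whole GB, capped just below 32 GB."""
    return min(ram_gb // 2, ceiling_gb)


def forensic_index_settings(nodes=1):
    """Body for PUT /<index>: one shard per node and no replicas (no redundancy)."""
    return {"settings": {"number_of_shards": nodes, "number_of_replicas": 0}}
```

For a 24 GB machine, `f"ES_HEAP_SIZE={es_heap_gb(24)}g"` gives the `ES_HEAP_SIZE=12g` shown above.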
Perf. Elasticsearch Indexes
• Open index = memory usage + disk usage; closed index = disk usage only, so close indexes when not needed
• New indexes per case (this separates investigations); put similar logs in the same index and use a "host" field to differentiate them
  • System timelines: logstash-%{[case]}-%{[type]}
  • Mail logs: logstash-%{[case]}-%{[type]}-%{+YYYY.MM}
  • Proxy logs: logstash-%{[case]}-%{[type]}-%{+YYYY.MM.dd}
• curl -XPOST "localhost:9200/logstash-${case}*/_close"
• curl -XPOST "localhost:9200/logstash-${case}*/_open"
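The per-case naming scheme and the close/open calls can be scripted together. This Python sketch renders index names in the same shape as the Logstash sprintf patterns above and builds the same POST requests as the curl commands. Index names are lowercased because Elasticsearch requires lowercase index names. Recent Elasticsearch versions may refuse wildcard closes unless `action.destructive_requires_name` is set to false.

```python
import urllib.request
from datetime import timezone


def case_index(case, log_type, ts=None, granularity=None):
    """logstash-<case>-<type>, optionally suffixed -YYYY.MM ("month") or -YYYY.MM.dd ("day")."""
    name = f"logstash-{case}-{log_type}".lower()
    if granularity == "month":    # e.g. mail logs
        name += ts.astimezone(timezone.utc).strftime("-%Y.%m")
    elif granularity == "day":    # e.g. proxy logs
        name += ts.astimezone(timezone.utc).strftime("-%Y.%m.%d")
    return name


def index_state_request(case, action, base_url="http://localhost:9200"):
    """POST /logstash-<case>*/_close or /_open; send it with urllib.request.urlopen()."""
    if action not in ("_close", "_open"):
        raise ValueError("action must be '_close' or '_open'")
    return urllib.request.Request(f"{base_url}/logstash-{case.lower()}*/{action}", method="POST")
```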
Performance Kibana
• Each block/graph is an extra search
• So 10 graphs means 10 simultaneous searches
1. First select small date/time window
2. Test your search on small data set
3. Add filters
4. Zoom out on date/time
5. Dig deeper
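The same workflow applies when querying from a script: start with a narrow time range, and only widen it once the query returns what you expect. This Python sketch builds such a query body. It uses the bool/filter form of Elasticsearch 2.x and later. The 1.x releases used a "filtered" query for the same purpose. The 15-minute default and zoom factor are arbitrary choices.

```python
from datetime import timedelta


def windowed_query(lucene_query, start, minutes=15, field="@timestamp"):
    """Query body limited to [start, start + minutes] on the timestamp field."""
    end = start + timedelta(minutes=minutes)
    return {
        "query": {
            "bool": {
                "must": {"query_string": {"query": lucene_query}},
                "filter": {"range": {field: {"gte": start.isoformat(), "lte": end.isoformat()}}},
            }
        }
    }


def zoom_out(start, minutes, factor=4):
    """Widen the window around its centre; returns the new (start, minutes)."""
    centre = start + timedelta(minutes=minutes / 2)
    new_minutes = minutes * factor
    return centre - timedelta(minutes=new_minutes / 2), new_minutes
```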
Keep in mind
• Logstash is (relatively) SLOW
• Finished? Close the index, do NOT delete it
• Or save the events as JSON to files (Logstash file output) and re-index them later
• Node++ = Speed++
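Keeping one JSON document per line makes the "re-index later" step cheap, because the file can be read back in fixed-size batches. Each batch can then be sent in a single bulk request. This is a minimal Python sketch; the file name and batch size are arbitrary.

```python
import json
from itertools import islice


def save_events(events, path):
    """Write one JSON document per line; returns the number of events written."""
    count = 0
    with open(path, "w", encoding="utf-8") as fh:
        for event in events:
            fh.write(json.dumps(event, ensure_ascii=False) + "\n")
            count += 1
    return count


def load_batches(path, batch_size=1000):
    """Yield lists of events, batch_size at a time, for later re-indexing."""
    with open(path, encoding="utf-8") as fh:
        events = (json.loads(line) for line in fh if line.strip())
        while True:
            batch = list(islice(events, batch_size))
            if not batch:
                return
            yield batch
```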
Forensic analysis
Plaso
• Plaso = the new log2timeline, and more
  • log2timeline.py win7-64-nfury-10.3.58.6.dump /path/to/disk/image
  • psort.py -o elastic win7-64-nfury-10.3.58.6.dump
ELK-forensics
• https://github.com/cvandeplas/ELK-forensics
• Logstash configs
• Kibana dashboards
• Mactime, Log2timeline CSV, BlueCoat, Mail IMSS, IWSVA, IIS
• More to come
Other interesting projects using Elasticsearch
• Moloch: open source, large-scale IPv4 full PCAP capturing, indexing and database system. https://github.com/aol/moloch
• MozDef: PoC to automate the IH process and facilitate real-time activities. https://github.com/jeffbryner/MozDef
• Suricata: exports data in EVE format (JSON). Great for visualizing malware activity from a sandbox
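Because EVE output is one JSON object per line, a quick triage of a sandbox run needs only a few lines of Python. This sketch counts alert signatures using the `event_type` and `alert.signature` fields of EVE alert events. The sample lines in the check below are illustrative.

```python
import json
from collections import Counter


def alert_signatures(lines):
    """Count alert signatures in Suricata EVE JSON lines (one event per line)."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if event.get("event_type") == "alert":
            counts[event["alert"]["signature"]] += 1
    return counts
```

Usage: `with open("eve.json") as fh: print(alert_signatures(fh).most_common(10))`.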
Places to be?
• https://github.com/cvandeplas/ELK-forensics
• http://www.elasticsearch.org/overview/elkdownloads/
• http://logstash.net/
• https://groups.google.com/forum/#!forum/logstash-users