tb3248 - syslog connector performance tuning - girish mantry moehadi liang(1)
TRANSCRIPT
-
8/10/2019 TB3248 - Syslog Connector Performance Tuning - Girish Mantry Moehadi Liang(1)
1/28
Copyright 2014 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
Syslog Connector Performance Tuning
Girish Mantry, Moehadi Liang
Technical Solutions Consultants
-
Agenda
In this session we will take a look at
Syslog connector variants
Connector components and operation
Stages in the event flow
Performance bottlenecks and tuning at each stage
Out of memory problems and tuning
Customer cases
General recommendations
-
Syslog connector variants, components, operation and event flow
-
Syslog Connector Variants
Network Listeners
  Syslog Daemon
    UDP, Raw TCP
    Default port 514
  Syslog NG Daemon
    UDP, Raw TCP, TLS
    Default port 1999
  ArcSight CEF Encrypted Syslog (UDP)
    UDP
    Symmetric Key Encryption
    Default port 514
    Only CEF format
  Supported on all platforms
  Configurable interfaces and ports
File Readers
  Syslog Pipe
    Unix Pipe
  Supported only ...
  Work in conjunction with ...
-
Syslog Connector Components
[Diagram: Device Type 1, Device Type 2 ... Device Type N send Raw Events into the Queue; Subagents turn them into Parsed Events; the Main Flow and the Destination Flows (components C1, C2) turn them into Processed Events, which leave through the ESM Transport and the Logger Transport]
Note: Queuing applies only to network listeners, not to file readers
-
Event Flow
Event Reception
  Receives network packets on UDP/TCP sockets
  Extracts human-readable raw syslog events from network packets
Event Queuing
  Raw events are written to a queue of files on the file system in the order in which they are received
Event Parsing
  Raw events are picked up from the file queue in FIFO order and parsed using regular expressions
  Information from device log formats is normalized into the ArcSight event format
Event Processing
  Normalized events are categorized and processed in many ways useful for correlation and asset modeling
  Events are batched, filtered or aggregated as required for efficiency
-
Performance Bottlenecks in the Event Flow and Tuning
-
Event Reception
Choice of Transport Protocol
  UDP performs better on reliable networks
  Use Raw TCP on unreliable networks
  Use TLS for encrypted transport with Syslog NG
Bottleneck (when dealing with Raw TCP or TLS)
  Java applications do not know when a client closes the connection with a FIN
  Connections remain idle in the CLOSE_WAIT state until the application closes them explicitly
  Idle connections accumulate over time and can exceed the connector limit or the OS limit
  This happens faster with a large number of devices or with devices that create new connections frequently
Tuning
  Parameter: tcppeerclosedchecktimeout
    Default: -1
    Recommendation: Set it to 30000 msec or higher to tell the connector to check for connections closed by peers and close them on the connector side as well
  Parameter: tcpmaxsockets
    Default: 1000
    Recommendation: Increase it as required to accommodate simultaneous connections from a large n...
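The CLOSE_WAIT bottleneck above can be illustrated outside the connector. The following is a minimal sketch in Python, not connector code: `peer_has_closed` and `close_dead_connections` are hypothetical helpers that show how an application can notice that a peer has closed its side (a read that returns zero bytes) and then close its own socket, which is the kind of periodic check that tcppeerclosedchecktimeout enables.

```python
import socket


def peer_has_closed(sock: socket.socket) -> bool:
    """Return True if the peer has closed its side of the connection.

    Hypothetical helper for illustration. It peeks at the socket without
    consuming data; an empty result means the peer has sent FIN.
    """
    sock.setblocking(False)
    try:
        data = sock.recv(1, socket.MSG_PEEK)
    except BlockingIOError:
        # Nothing to read and no FIN seen: the connection is still open.
        return False
    except ConnectionError:
        # Reset or aborted connections are treated as closed as well.
        return True
    finally:
        sock.setblocking(True)
    return data == b""


def close_dead_connections(connections: list) -> list:
    """Close sockets whose peer is gone and return the ones still alive."""
    alive = []
    for sock in connections:
        if peer_has_closed(sock):
            sock.close()  # frees the descriptor instead of idling in CLOSE_WAIT
        else:
            alive.append(sock)
    return alive
```

Running `close_dead_connections` on a timer is one way to keep the number of open sockets below a limit such as tcpmaxsockets; how the connector implements its own check is not described in the slides.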
-
Event Parsing
Inspection and Device Type Detection
  There are multiple subagents, one per device type, each with a parser whose regex matches something unique in the log
  Subagent parsers are ordered so that specific regexes come ahead of generic ones, to detect device types accurately
  The connector inspects messages from senders, applying the regexes in order to detect the device type, and associates the subagent when a match is found. A single sender can be associated with multiple device types and subagents
  Associated subagent parsers are used to parse messages from a sender, and the inspection process is not reapplied unless a message of a different type is encountered from the same sender
  Syslog senders and their associated subagent types can be seen in current/user/agent/syslog.properties
Bottleneck
  The inspection process involves regex matching and can be expensive because the connector has more than 100 subagents
Tuning
  If you are sure of the device types in your environment, you can restrict the subagent list using the following properties
  Parameter: usecustomsubagentlist
    Default: false
    Recommendation: Set it to true to make the connector consider the customized subagent list
  Parameter: customsubagentlist
    Default: List of subagents (>100)
    Recommendation: Set it to the restricted subagent list based on the device types in your environment. Keep the original relative order of subagents so as not to affect the accuracy of subagent detection
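The inspection process above can be sketched as follows. This is a toy model in Python under stated assumptions, not the connector's implementation: the subagent names, the regexes and the `Inspector` class are all hypothetical. It shows why ordering matters, how a sender is associated with subagents after the first match, and how a shorter subagent list reduces the number of regex checks.

```python
import re

# Hypothetical subagents, ordered from more specific to more generic.
SUBAGENTS = [
    ("cisco_asa", re.compile(r"%ASA-\d-\d+")),
    ("sshd", re.compile(r"\bsshd\[\d+\]:")),
    ("sudo", re.compile(r"\bsudo:")),
]


class Inspector:
    def __init__(self, subagents):
        self.subagents = list(subagents)
        self.sender_map = {}    # sender -> names of associated subagents
        self.regex_checks = 0   # counts how much regex work was done

    def _matches(self, pattern, message):
        self.regex_checks += 1
        return pattern.search(message) is not None

    def detect(self, sender, message):
        known = self.sender_map.setdefault(sender, [])
        # First try only the subagents already associated with this sender.
        for name, pattern in self.subagents:
            if name in known and self._matches(pattern, message):
                return name
        # Otherwise inspect the remaining subagents in their original order.
        for name, pattern in self.subagents:
            if name not in known and self._matches(pattern, message):
                known.append(name)
                return name
        return None
```

With the full list, a message from a new sshd host costs two regex checks here; with a list restricted to the sshd subagent it costs one. The same effect, scaled to more than 100 subagents, is what the custom subagent list is meant to achieve.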
-
Event Parsing - continued
Regular expressions in parsers
Bottleneck
  A badly written regular expression in the parser can be a big performance hit on the connector
Optimization
  For supported device types, development went through optimizing the regular expressions in the respective parsers. If you are writing syslog flex connector parsers, consider the following guidelines
  Make your regexes only as generic as needed. Specific regular expressions perform better than generic ones
  Use generic greedy expressions like .* and .+ at the end, not at the beginning or in the middle of a regular expression. Replace them with non-greedy equivalents like .*? and .+? followed by a clear character or token marking the boundary
  Greedy expressions with more specific characters or meta characters are fine, for example \s+ for a continuous string of whitespace, \d+ for a continuous string of numerals or \w+ for a continuous string of alphanumerics
Maximum number of devices
Bottleneck
  The connector allows up to a maximum of 5000 devices and does not process events from newer devices once this limit is reached
Tuning
  Parameter: syslog.max.device.count
    Default: 5000
    Recommendation: Increase it as required to match the number of devices in your environment
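The regular expression guidelines above can be tried out with a small experiment. This is a sketch using Python's `re` module rather than the connector's Java regex engine, and the log line and both patterns are made up for illustration. `GENERIC` uses greedy `.*` at the start and in the middle; `SPECIFIC` uses `\d+`, `\s+`, `\w+` and one non-greedy `.*?` bounded by the `user=` token.

```python
import re
import timeit

# Hypothetical log line, for illustration only.
LINE = "Dec 05 11:35:29 fw01 src=10.1.1.5 dst=10.2.2.9 msg=Connection denied user=alice"

# Generic: greedy .* everywhere, which forces the engine to backtrack.
GENERIC = re.compile(r".*src=(.*) dst=(.*) msg=(.*) user=(.*)")

# Specific: character classes plus a non-greedy group with a clear boundary.
SPECIFIC = re.compile(
    r"src=(\d+\.\d+\.\d+\.\d+)\s+dst=(\d+\.\d+\.\d+\.\d+)\s+msg=(.*?)\s+user=(\w+)"
)


def extract(pattern, line):
    """Return the captured groups, or None when the line does not match."""
    match = pattern.search(line)
    return match.groups() if match else None


def seconds_per_10k(pattern, line):
    """Time 10,000 searches; useful for comparing two candidate patterns."""
    return timeit.timeit(lambda: pattern.search(line), number=10_000)
```

Both patterns extract the same four fields from this line, so the comparison is purely about cost. Calling `seconds_per_10k` on each pattern with representative lines from your own devices, including lines that do not match, gives a rough idea of the difference; results depend on the regex engine and the input.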
-
Event Processing
Agent Batching
  Batch size controls how many events go together from component to component in the event flow and eventually to the destination
  Doubling or tripling the default size of 100 can improve performance internally as well as over networks with latency
  Do not increase it beyond that, because larger batches need more memory to hold and can have a negative impact
Categorization
  Categorization files for different device types are loaded into memory, and some of them can be big
  Connector base memory usage can be high when dealing with a large number of device types
  Java heap space may need to be increased
External Map File Processing
  The external map file query is executed for every batch of events
  If you are using this feature, make sure the query is simple and returns fast
Connector Filtering
  Make sure that the filter condition is optimized and not extremely complex
-
Event Processing - continued
Field Based Aggregation
  Groups events with the same values in specified fields into buckets and produces aggregated events on time interval expiry or on reaching the event threshold
  Restrict the field set to the minimum required and choose an optimal event threshold value to keep the number of event field comparisons low
  Choose an optimal time interval so that the event flow is not blocked for too long
  Avoid using the preserve common fields setting in a high event volume environment
Name Resolution
  Name resolutions are done in background threads and the event flow is not normally blocked waiting for the answers to come back
  If the Wait For Name Resolution feature is enabled, the event flow is blocked for a certain timeout period waiting for the answers
  Do not enable the Wait For Name Resolution feature in an environment requiring frequent resolutions
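The field based aggregation described above can be sketched in a few lines. This is an illustrative Python model, not the connector's implementation: the `FieldAggregator` class, its parameter names and the `count` key on the aggregated event are assumptions made for the example. It shows the two triggers from the slide, reaching the event threshold and expiry of the time interval.

```python
class FieldAggregator:
    """Toy model: bucket events by selected fields, emit on threshold or expiry."""

    def __init__(self, fields, threshold, interval_sec):
        self.fields = tuple(fields)
        self.threshold = threshold
        self.interval_sec = interval_sec
        self.buckets = {}  # key -> {"first_seen", "count", "values"}

    def _emit(self, key):
        bucket = self.buckets.pop(key)
        aggregated = dict(bucket["values"])
        aggregated["count"] = bucket["count"]  # hypothetical field name
        return aggregated

    def flush(self, now):
        """Emit every bucket whose time interval has expired."""
        expired = [key for key, bucket in self.buckets.items()
                   if now - bucket["first_seen"] >= self.interval_sec]
        return [self._emit(key) for key in expired]

    def add(self, event, now):
        """Add one event; return any aggregated events produced by it."""
        emitted = self.flush(now)
        key = tuple(event.get(name) for name in self.fields)
        bucket = self.buckets.setdefault(
            key,
            {"first_seen": now, "count": 0, "values": dict(zip(self.fields, key))},
        )
        bucket["count"] += 1
        if bucket["count"] >= self.threshold:
            emitted.append(self._emit(key))
        return emitted
```

Every extra field in `fields` makes the key longer and splits events across more buckets, and a long interval holds events back for longer, which is the trade-off the recommendations above are about.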
-
Event Transport
Event caching can occur for a number of reasons. Network latency and problems in the destination are the common causes
Bottleneck
  Excessive caching can cause delays in events reaching their destination
  When the cache becomes full, the connector starts dropping events
Tuning
  Enable transport multithreading (except when the root cause is a problem in the destination)
  For the logger smart message transport, turn on the HTTPS persistent connection feature
  Increase the cache size to hold events for longer in the cache and prevent loss of events
  Parameter: http.transport.threadcount
    Default: 1
    Recommendation: Applies only to the ESM transport. Increase it by small increments as required
  Parameter: transport.loggersecure.threads
    Default: 1
    Recommendation: Applies only to the logger secure transport. Increase it by small increments
  Parameter: transport.loggersecure.connection.persistent
    Default: false
    Recommendation: Applies only to the logger secure transport. Change it to true to reuse HTTPS connections and not tear them down for every batch of events
  Parameter: Cache Size
    Default: 1GB
    Recommendation: Increase it as required up to a limit of 50GB. This is a destination setting configured using ESM console, connector appliance GUI or local connec...
-
Out of Memory Problems and Tuning
-
Java Process Memory and Management
Memory allocated to a Java process consists of Heap Space and Native Memory
  Heap space is allocated as instructed by Java run time parameters
    -Xms (initial heap size), -Xmx (maximum heap size), 256 MB by default on connectors
  Native memory size = Process memory size - Size of heap space
Garbage Collection reclaims the memory of unused objects
  Minor collections (GC) reclaim memory in the YOUNG generation and move survivors into OLD
  Major collections (Full GC) reclaim memory in all of the heap space and take much longer
  The JVM stops the application threads during GC or Full GC
  Frequent Full GCs affect application performance severely
    They are a clear indicator of the need to increase the maximum heap size
Out of memory errors can happen in any of these memory areas
Memory limitations in the 32 bit connector build
  Total addressable space is 4GB; kernel space ranges from 1GB to 2GB depending on OS
  User space available to the process is 2GB to 3GB depending on OS
  Limits exist on max heap space: 1GB (connector appliances), 1.5 GB (Windows), 2 GB (Unix)
  Use the 64 bit connector build for higher memory
[Diagram: Process Memory. YOUNG Generation (newly created objects), OLD Generation (old objects surviving m...), PERMANENT Generation (classes, methods ...), Code Generation, Socket Buffers, Thread Stacks, Direct Memory Space, JNI Code, Garbage Collection, JNI Allocated Memory]
-
Dealing with Java out-of-memory errors
Error: java.lang.OutOfMemoryError: Java heap space
Error: java.lang.OutOfMemoryError: Requested array size exceeds VM limit
  Root cause: Garbage collection is unable to free up more space and memory could not be allocated for new objects
  Recommendation: Increase the maximum heap size using the -Xmx option, in increments as required, up to the limit
  If this still does not help, there could be a memory leak or a bug. Open a support incident with logs and heap dumps
Error: java.lang.OutOfMemoryError: PermGen space
  Root cause: The permanent generation area has become full due to loading many classes statically or creating dy... creating too many interned strings
  Recommendation: The default max size of PermGen space is 64 MB. Increase it in small increments using -XX:MaxPermSize
Error: java.lang.OutOfMemoryError: Unable to create a new native thread
  Root cause: The JVM is low on native memory and unable to create a new VM thread
  Recommendation: Make more native memory available by
    Reducing the heap space using the -Xms and -Xmx options
    Reducing the stack space using the -Xss option
Error: Out of Memory Error (allocation.cpp:211), pid=16950, tid=1855142800
  Root cause: Displayed in the fatal error logs when the JVM crashes due to a malloc failure. The system could be out of RAM or swap space, or the process size limit was hit on a 32 bit system
  Recommendation: Take one or more of the following actions
    Reduce memory load on the system or increase physical memory or swap space
    Decrease the number of application threads, reduce the Java heap space and stack space
-
Adjusting memory options
On a software connector, add or edit settings in a file under the current/user/agent folder
  agent.wrapper.conf when running as a service
  setmem.sh (Unix) or setmem.bat (Windows) when running as a standalone application. This file may have to be created if it does not exist
    set ARCSIGHT_MEMORY_OPTIONS="-Xms256m -Xmx256m" (Example only. Add or remove options as required inside the double quotes)
    export ARCSIGHT_MEMORY_OPTIONS (only on Unix)
On a connector appliance
  Only heap space can be changed using the container command Configure Memory Settings
  Other settings can be changed using SSH or the diagnostic tools file editor, using the same mechanisms as for a software connector
Memory type: Heap Space
  Running as a service
    wrapper.java.initmemory=256 (initial heap size)
    wrapper.java.maxmemory=256 (maximum heap size)
  Running as a standalone application
    -Xms256m -Xmx1024m
  Note: It is recommended to increase only the max heap ...
Memory type: Perm Gen Space
  Running as a service: add additional Java parameters with adjusted indexes
    wrapper.java.additional.7=-XX:PermSize=64m
    wrapper.java.additional.8=-XX:MaxPermSize=128m
  Running as a standalone application
    -XX:PermSize=64m -XX:MaxPermSize=128m
  Note: It is recommended to increase only the max perm ...
Memory type: Stack Space
  Running as a service: add an additional Java parameter with adjusted index
    wrapper.java.additional.9=-Xss64k
  Running as a standalone application
    -Xss64k
  Note: Default stack size is OS dependent. A... observe. Too low a value can cause StackOverfl...
-
Customer Cases
-
Customer Case 1
[Diagram: Source Connector with a CEF Syslog TLS destinat... 200eps, while ESM and ... cache for the same even...; Syslog NG connector also receiving from Syslog NG Source 1 and Syslog NG Source 2]
Troubleshooting
  Changing the transport protocol to UDP or Raw TCP did not help
  Could not reproduce the problem in house
  Customer captured packets with tcpdump and analyzed them using Wireshark
    Large number of TCP Window Full messages
    SEQ/ACK analysis showed that at times there is more than 10KB of data in flight, indicating that the receiver is too slow to process the incoming flood of packets
    TCP receive buffer and window sizes got reduced over time, which contributed to the slow reception
  Further enquiries revealed that the Syslog NG connector was receiving TLS data from 2 other sources
  With this new discovery about the customer environment, the problem could also be reproduced in house
  Observed high memory usage and increased the heap space to 1024 MB, but it did not help
Root Cause
  The destination Syslog NG connector did not close TCP connections when sources closed connections
  The growing number of TCP connections forces the receive buffer size to be reduced, causing slower reception
Solution
  Set the tcppeerclosedchecktimeout parameter to 30000 msec (half a minute)
  This parameter tells the connector to proactively check and close any TCP sockets
-
Customer Case 2
Huge difference of ...
Fortigate Firewall a...
agent.out.wrapper.log:INFO | jvm 1 | 2012/12/05 11:35:29 | [Full GC
agent.out.wrapper.log:INFO | jvm 1 | 2012/12/05 11:37:08 | [Full GC
agent.out.wrapper.log:INFO | jvm 1 | 2012/12/05 11:38:52 | [Full GC
agent.out.wrapper.log:INFO | jvm 1 | 2012/12/05 11:40:30 | [Full GC
agent.out.wrapper.log:INFO | jvm 1 | 2012/12/05 11:42:06 | [Full GC
agent.out.wrapper.log:INFO | jvm 1 | 2012/12/05 11:43:47 | [Full GC
agent.out.wrapper.log:INFO | jvm 1 | 2012/12/05 11:45:32 | [Full GC
agent.out.wrapper.log:INFO | jvm 1 | 2012/12/05 11:47:10 | [Full GC
[Charts: Queue Rate(SLC) vs Events/Sec(SLC); Queue Drop Count; Memory usage (Total vs Used); Events/Sec(SLC) vs Throughput(SLC); Cache Size and Current Drop count; Frequent Full GCs]
Observations
  The incoming event rate was much higher than the processing rate and the connector was queuing heavily
  During peak hours, the queue exceeded its size limit and a huge number of events were dropped
  Caching was observed during peak hours and some events were dropped when the cache size limit was exceeded
  High memory usage and frequent Full GCs were observed, affecting the performance of the connector
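The frequency of Full GCs in this case can be read directly from the wrapper log lines shown on the slide. The sketch below, in Python, parses those eight lines and computes the time between consecutive Full GCs. The regular expression is an assumption based only on the truncated lines shown here, so it may need adjusting for complete log lines.

```python
import re
from datetime import datetime

# The eight (truncated) log lines from the slide.
LOG_LINES = [
    "agent.out.wrapper.log:INFO | jvm 1 | 2012/12/05 11:35:29 | [Full GC",
    "agent.out.wrapper.log:INFO | jvm 1 | 2012/12/05 11:37:08 | [Full GC",
    "agent.out.wrapper.log:INFO | jvm 1 | 2012/12/05 11:38:52 | [Full GC",
    "agent.out.wrapper.log:INFO | jvm 1 | 2012/12/05 11:40:30 | [Full GC",
    "agent.out.wrapper.log:INFO | jvm 1 | 2012/12/05 11:42:06 | [Full GC",
    "agent.out.wrapper.log:INFO | jvm 1 | 2012/12/05 11:43:47 | [Full GC",
    "agent.out.wrapper.log:INFO | jvm 1 | 2012/12/05 11:45:32 | [Full GC",
    "agent.out.wrapper.log:INFO | jvm 1 | 2012/12/05 11:47:10 | [Full GC",
]

# Captures the timestamp of lines that report a Full GC.
FULL_GC = re.compile(r"\|\s*(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\s*\|\s*\[Full GC")


def full_gc_times(lines):
    """Return the timestamps of all Full GC entries, in log order."""
    times = []
    for line in lines:
        match = FULL_GC.search(line)
        if match:
            times.append(datetime.strptime(match.group(1), "%Y/%m/%d %H:%M:%S"))
    return times


def intervals_seconds(times):
    """Return the gaps, in seconds, between consecutive timestamps."""
    return [int((later - earlier).total_seconds())
            for earlier, later in zip(times, times[1:])]


gaps = intervals_seconds(full_gc_times(LOG_LINES))
print(gaps)                   # [99, 104, 98, 96, 101, 105, 98]
print(sum(gaps) / len(gaps))  # about 100 seconds between Full GCs
```

A Full GC roughly every 100 seconds, each one stopping the application threads, matches the observation above that frequent Full GCs were affecting connector performance.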
-
Customer Case 2 Solution
Machines hosting the connectors were very powerful (64 bit Linux, 48 core CPU, 128 GB RAM, 600 GB hard disk)
Actions Taken
  Increased the Java heap size to 2048 MB to reduce the frequency of Full GCs
  Enabled syslog parser multi-threading to keep up with the queuing rate
  Increased the file queue size from 100 to 2000 files of 10MB, equivalent to 20 GB in total size, to prevent dropping of events from the file queue
  Increased the cache size from 1GB to 10GB to prevent dropping of events from the cache during peak hours
The above measures improved the performance of the connector significantly
Where they did not solve the problem completely, we asked the customer to split the event volume among multiple syslog connectors
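The file queue sizing in this case is simple arithmetic, sketched below in Python. The file counts and the 10MB file size come from the slide; the event rates and the average event size in the usage example are hypothetical numbers chosen only to show how long a larger queue can absorb a peak, and 1 MB is taken as 1,000,000 bytes.

```python
def queue_capacity_mb(file_count, file_size_mb):
    """Total size of the file queue in MB."""
    return file_count * file_size_mb


def seconds_until_full(capacity_mb, incoming_eps, processed_eps, avg_event_bytes):
    """Time for an empty queue to fill while events arrive faster than they are parsed."""
    surplus_eps = incoming_eps - processed_eps
    if surplus_eps <= 0:
        return float("inf")  # the parser keeps up, so the queue never fills
    return capacity_mb * 1_000_000 / (surplus_eps * avg_event_bytes)


print(queue_capacity_mb(100, 10))    # 1000 MB before the change
print(queue_capacity_mb(2000, 10))   # 20000 MB, that is 20 GB, after the change

# Hypothetical peak: 15000 eps arriving, 10000 eps parsed, 500 bytes per event.
print(seconds_until_full(20000, 15000, 10000, 500))  # 8000.0 seconds
```

A bigger queue only buys time during peaks. If the incoming rate stays above the parsing rate, the queue still fills eventually, which is why parser multi-threading and splitting the load were part of the solution.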
-
Customer Case 3
Customer had logg... events to an ESM m... loggers were exper... caching
[Diagram: Onboard Connecto...; USA Logger: Time per Batch ~ 40 msec; UK Logger: Time per Batch > 500 msec]
Observations
  Time per Batch = round-trip time taken for a batch of events to travel from logger to ESM and for the acknowledgment for the batch to come back from ESM to logger
  The US logger took an average of 40 msec/batch and the UK logger took an average of over 500 msec/batch
  This large difference in round-trip time is indicative of network latency due to geographical distance and is the root cause of caching in the UK logger
Solution
  Enabled multithreading on the ESM transport with a thread count of 2; this showed an improvement in throughput
  Increased the thread count to 7 (number of processors in the CPU) and caching went away completely
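The effect of round-trip time and thread count in this case can be estimated with a back-of-the-envelope calculation. The Python sketch below assumes that each transport thread sends one batch and waits for its acknowledgment before sending the next, that threads scale linearly, and that the batch size is 100 events (the default agent batch size mentioned earlier; the actual batch size in this case is not given). It is an idealized upper bound, not a measurement.

```python
def max_events_per_second(batch_size, time_per_batch_ms, threads=1):
    """Idealized upper bound on throughput for a send-and-wait transport."""
    return threads * batch_size * 1000.0 / time_per_batch_ms


print(max_events_per_second(100, 40))              # US logger: 2500.0 eps
print(max_events_per_second(100, 500))             # UK logger: 200.0 eps
print(max_events_per_second(100, 500, threads=7))  # UK logger, 7 threads: 1400.0 eps
```

Under these assumptions a single thread on the high-latency link can carry less than a tenth of what the low-latency link can, and adding threads recovers throughput because several batches are in flight at once. That is consistent with caching going away once the thread count was raised.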
-
Recommendations
-
Some Recommendations
  Evaluate the number of devices, device types and event volume early in your deployment cycle
  Split the load among multiple connectors when the incoming event rate exceeds the achievable rate, which varies based on the underlying platform and environment
  When splitting the load, consider grouping devices of the same type on one connector and devices of another type on a different connector
  Evaluate the total capacity of your machine and the other processes running on it to determine the number of connectors to install on a single machine
  The cumulative heap size allocated to connectors and other Java processes should be well below the memory available on the system
  Use 64 bit syslog connector builds to overcome out of memory errors
-
Session TB3248 Speakers Girish Mantry, Moehadi Liang