SNYPR 6.3.1 On-Prem
Architecture Guide
Date Published: 7/9/2020
Table of Contents

Introduction
SNYPR Architecture
  Hadoop Components
  High Availability
Data Ingestion
  Phase 1: Collect and Publish
  Phase 2: Enrichment
  Phase 3: Processing
  Indexing Incoming Events
Deployment Alternatives
  Dedicated SNYPR Deployment
  SNYPR Deployment with Existing Hadoop Infrastructure
Deployment Assumptions
  SNYPR Kafka Topic Partitioning Reference
  SNYPR Search Shard Allocation Reference
  SNYPR YARN Resource Allocation Reference
Search Deployment Options
  Search Index Storage Estimates
Disaster Recovery Alternatives
  Alternatives
  Considerations
Network Bandwidth
  Network Bandwidth Characteristics by Tier
  Network Bandwidth Requirements from RIN Collection Tier to Messaging Tier
Virtual Infrastructure
  Considerations for Virtual Deployments
SNYPR Cloud Deployment
  Deployment Architecture
  Considerations
    Amazon EC2
    Microsoft Azure
    Google Cloud
SNYPR Reference Hardware
Reference Server Specifications
  Hardware Specifications
  Server Mount Point
  Alternatives for Limiting the Size of the Infrastructure
Sizing and Capacity Planning
  Server Types
  Assumptions
  Deployment Option 1: Dedicated SNYPR Data Lake
  Deployment Option 2: Existing Data Lake
Spark Jobs Configuration for Kerberized Kafka
Network Tuning Recommendations
RIN Syslog Configuration
Hadoop Cluster Tuning Recommendations
  Hadoop Cluster Performance
  Hadoop Cluster Log Configuration
Remote Ingestion Node Tuning
  Reference Server Used Is a VM
  Server Preparation
  Recommended Tools for Network Statistics
  Tune Server Network Parameters
  Performance Scenarios
  Best Practices
  Common Errors
  References
Introduction
SNYPR is a big data security analytics platform built on Hadoop that utilizes Securonix machine-learning-based anomaly detection techniques and threat models to detect sophisticated cyber and insider attacks. SNYPR uses Hadoop both as its distributed security analytics engine and long-term data retention engine. Hadoop nodes can be added as needed, allowing the solution to scale horizontally to support hundreds of thousands of events per second (EPS).
SNYPR features:
- Supports a rich variety of security data, including security event logs, user identity data, access privileges, threat intelligence, asset metadata, and netflow data.
- Normalizes, indexes, and correlates security event logs, network flows, and application transactions.
- Utilizes machine learning-based anomaly detection techniques, including behavior profiling, peer group analytics, pattern analysis, and event rarity, to detect advanced threats.
- Provides out-of-the-box threat and risk models for detection and prioritization of insider threat, cyber threat, and fraud.
- Risk-ranks entities involved in threats to enable an entity-centric (user or device) approach to mitigating threats.
- Provides Spotter, a blazing-fast search feature with normalized search syntax that enables investigators to investigate today's threats and track advanced persistent threats over long periods of time, with all data available at all times.
Documentation Conventions
There are different font styles used throughout the SNYPR documentation to indicate specific information. The table below describes the common formatting conventions used in the documentation:
Bold font
  Words in bold can indicate the following:
  - Buttons that you need to click
  - Fields in the user interface (UI)
  - Menu options in the UI
  - Information you need to type or select

  Indicates commands or code.

Menu navigation
  The navigation path to reach a specific screen in the UI is separated by a greater than symbol (>). For example, Menu > Administration.

UPPERCASE FONT
  All uppercase words are acronyms.

Folders and folder paths
  Quotation marks are used around a folder name or folder path. For example, "C:\Documents\UserGuide".
The following documents are available for SNYPR, along with the intended audience for each:

Installation Guide
  System administrators, system integrators, and deployment teams who need to install the application.

RIN Installation Guide
  On-boarding team and deployment engineers who need to install the RIN to connect to the SNYPR application to ingest data.

Data Integration Guide
  Data integrators who need to import activity and enrichment datasources to support existing and custom use cases.

Content Guide
  - Data integrators and deployment engineers who need to use existing connectors to import data and deploy available content.
  - Content developers who need to use the out-of-the-box content to detect the threats to your organization.

Analytics Guide
  Content developers who need to use the existing content and custom analytics available in the SNYPR platform to develop use cases to detect the threats to your organization.

Security Analyst Guide
  - Information security professionals and security analysts who need to detect and manage threats.
  - Risk and compliance officers and IT specialists who need to use SNYPR reporting capabilities to monitor and remediate compliance.

Access Analytic Guide
  - Information security professionals and security analysts who need to detect and remediate high-risk access due to orphaned accounts, privilege creep, or account compromise.
  - Compliance officers and data owners who need to review and remediate access for privilege creep, SOD violations, and orphaned accounts.

Administrator Guide
  - System administrators and service providers who need information about how to monitor and administer the platform at a systems level.
  - Business managers and other users in a supervisory role who need information about how to use SNYPR to grant employees and partners access to applications, check for policy violations, and manage cases.
SNYPR Architecture
Hadoop Components
SNYPR uses a Hadoop cluster for processing all data. The core Hadoop components include the following services:
- HDFS (Hadoop Distributed File System): Used to store security events and violations. Data is stored in compressed Parquet format.
- YARN (Yet Another Resource Negotiator): Provides resource management capabilities for jobs.
- Spark Streaming: Processing framework for live streaming data.
- HBase: Distributed NoSQL data store on HDFS used to store the results of the analytics.
- Kafka: Horizontally scalable message bus used to manage the delivery of incoming security events.
- Impala (CDH) or Hive (HDP): Provides a SQL interface to the data stored in HDFS.
- ZooKeeper: Cluster management software that maintains configurations and synchronization services across nodes within a cluster.
Note: The Hadoop cluster is configured for high availability based on Hadoop deployment best practices.
High Availability
The SNYPR solution includes high availability for all components of the infrastructure. The Hadoop cluster is configured for high availability based on Hadoop deployment best practices. This includes, at a minimum, high availability of the HDFS NameNodes and YARN ResourceManagers, a minimum of three ZooKeeper servers, and a minimum of three Kafka brokers. High availability for the SNYPR servers that leverage the Hadoop cluster is described below.

SNYPR Application Server
High availability of the SNYPR Console is provided with an HA configuration of two nodes, with the user interface active on one of the two nodes during normal operation. MySQL replication and a Redis cluster are configured, as well as backup of the file system where the configuration data is stored (referred to as SECURONIX_HOME). A load balancer is configured for access to the user interface.
SNYPR-EYE Server
High availability of the SNYPR-EYE Server is provided with an HA configuration of two nodes, with the user interface active on one of the two nodes during normal operation. MySQL replication, as well as backup of the file system where the configuration data is stored (referred to as SNYPR-EYE_HOME), is configured on these servers for high availability, and a load balancer is configured for access to the user interface.
SNYPR Search Server
High availability of the SNYPR Search Servers is configured for each SNYPR Search cell in the deployment. A SNYPR Search cell includes a Local Event Indexer (LEI) as well as multiple search instances. A search cell with high availability includes at least two SNYPR Search servers. The LEI process runs on the primary server, indexing the incoming event data from the Enriched topic on Kafka. A search server provides a replica of all indexed data on another server. During a failover, the LEI is started on the second search server to enable active indexing on that server.
SNYPR Remote Ingestion Nodes
At least two SNYPR Remote Ingestion Nodes (RINs) are recommended for high availability in each location where they are deployed. RINs are typically installed in each major data center in close proximity to the logs being collected. The data collected by the RINs and forwarded to the Kafka brokers is sent in compressed batches that reduce the network transfer by roughly 90%. The RINs also encrypt the payload and support SSL and mutual authentication as well as Kerberos authentication.
The RINs collect data through two different methods: the push method and the pull method. The push method uses the embedded syslog server to collect and forward data to the Kafka topics. The pull method uses the Securonix connectors installed on the RIN to connect to the APIs, gather the logs, and forward them to the Kafka topic. High availability is provided on the Kafka brokers by having three separate Kafka brokers and replication of the topics for availability.
A sticky load balancer is recommended for incoming syslog traffic to the Remote Ingestion Nodes.
Hadoop Cluster Guidance for High Availability
The Hadoop infrastructure services are configured for high availability. The recommended settings are as follows:
- At least three Kafka brokers with in-sync replicas (ISR) = 3
- HDFS replication factor = 3
- Kafka message retention = 2 days
- HA NameNode
- HA ResourceManager
- Minimum of three ZooKeeper servers
- If security is required:
  - Kerberos authentication of all services in the Hadoop cluster
  - Encryption of HDFS folders with HDFS encryption for sensitive resource data
  - Authorization to protect access to data in the Hadoop cluster, using the native tools (Ranger for Hortonworks, Sentry for Cloudera)
  - The SNYPR edge nodes for ingestion and the Console user interface interact with the Hadoop services and support Kerberos

Note: This is not a complete list. It is recommended that you follow Hadoop deployment best practices.
In addition to the storage required for the data, the compute and memory required for running the SNYPR jobs must be available in the Hadoop cluster. The SNYPR solution includes several jobs that run in the cluster, and YARN is used to schedule the resources. The primary SNYPR jobs and their resource allocations are listed in the SNYPR YARN Resource Allocation Reference section of this guide.

The specific infrastructure required is based on the required peak ingestion rate. Request specific deployment guidance from Securonix.
Data Ingestion
SNYPR includes a data ingestion pipeline that performs normalization, context enrichment, and correlation.
All event data in SNYPR is stored in a super-enriched format. The Open Event Format (OEF) is a self-describing format capable of supporting information from heterogeneous data sources, while also adding enrichment data sets like user identity data, threat intelligence feeds, asset information, and others. This format enables events to be contextually enriched at ingestion time, ensuring that historical changes to the enrichment data are captured with the event at the time it occurred. The original source event is always maintained in the OEF event. (See https://openeventformat.org for details.) The three phases of the SNYPR event ingestion pipeline are described below.
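As an illustration of the idea, here is a minimal sketch of a self-describing enriched event. The field names and values are hypothetical, not the actual OEF schema; the point is that the raw event and the enrichment captured at ingestion time travel together.

```python
import json

# Hypothetical enriched event; field names are illustrative only.
raw_event = "Oct 10 13:55:36 fw01 DENY TCP 10.1.2.3:4433 -> 203.0.113.9:443"

enriched_event = {
    "rawevent": raw_event,               # original source event, always preserved
    "resourcename": "fw01",              # tagged at ingestion
    "eventtime": "2020-10-10T13:55:36Z",
    # Enrichment captured at ingestion time, so later changes to identity
    # or threat-intel data do not rewrite history:
    "user": {"employeeid": "E1001", "department": "Finance"},
    "geolocation": {"dst_country": "US"},
    "threatintel": {"dst_ip_listed": False},
}

print(json.dumps(enriched_event, indent=2))
```

Because the enrichment is stored with the event, a later search for the user's department at the time of the event does not depend on the current state of the HR feed.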
Phase 1: Collect and Publish
In this phase, events are collected and a SNYPR publisher on the Remote Ingestion Node (RIN) forwards the messages to the Kafka raw topic. There are multiple types of SNYPR publishers, including the ingestion node that uses the SNYPR Connector Library and the syslog publisher that forwards messages directly to the Kafka raw topic. The SNYPR publishers forward all events to the raw topic in the SNYPR transport format. This transport format adds metadata to the source events to describe the event source and tag the events for processing in the enrichment job. The SNYPR publishers also support batching, compression, and encryption of the published events, which minimizes the bandwidth required for transmission to the centralized Kafka brokers.
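The bandwidth effect of batching and compressing similar log lines can be sketched with standard zlib compression. This is an illustration of the principle only, not the actual SNYPR transport format: syslog lines in a batch share most of their structure, so the batch compresses very well.

```python
import zlib

# A batch of similar syslog lines (illustrative data).
batch = "\n".join(
    f"Oct 10 13:55:{i:02d} fw01 ACCEPT TCP 10.1.2.{i}:4433 -> 203.0.113.9:443"
    for i in range(60)
).encode()

compressed = zlib.compress(batch, level=6)
print(f"raw: {len(batch)} bytes, compressed: {len(compressed)} bytes")
print(f"reduction: {1 - len(compressed) / len(batch):.0%}")
```

The exact ratio depends on the log source; highly repetitive machine data is what makes reductions on the order the document describes plausible.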
Phase 2: Enrichment
The SNYPR Enrichment Spark Streaming job is responsible for event filtering, normalization, and context enrichment of the raw logs. During context enrichment, context is added to the incoming log data, including enrichment from user HR sources, geolocation information, threat intelligence data, and other lookup data like internal network maps and asset data. Additionally, the raw event log message is stored in its original format as one of the columns in the normalized schema.
Single Pipeline
During data ingestion at the enrichment phase, either a single ingestion pipeline or Pipeline Orchestration can be configured. In a single pipeline configuration, all data sent from the Remote Ingestion Nodes is processed by one enrichment job. This is used for small deployments with similar data sources.
Pipeline Orchestration
SNYPR includes a Pipeline Orchestration feature that allows the enrichment process to intelligently distribute the enrichment phase across multiple enrichment pipelines. In addition to parallel ingestion at the enrichment phase, this feature ensures that resources that take longer to process each event are split off from the main enrichment pipeline into an alternate pipeline. Those resources can then be analyzed and optimized outside the main enrichment pipeline without affecting the performance of the other resources being ingested.
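The routing idea can be sketched as follows: resources whose per-event processing cost exceeds a threshold are assigned to an alternate pipeline so they cannot slow down the main one. The resource names, costs, and threshold below are hypothetical, not actual SNYPR configuration values.

```python
# Hypothetical per-resource average processing cost in milliseconds per event.
resource_cost_ms = {
    "windows-security": 0.4,
    "proxy": 0.6,
    "custom-xml-app": 9.5,   # expensive parsing/enrichment
    "dns": 0.3,
}

SLOW_THRESHOLD_MS = 5.0  # illustrative cutoff

def assign_pipelines(costs, threshold=SLOW_THRESHOLD_MS):
    """Split resources between the main and an alternate enrichment pipeline."""
    main = [r for r, ms in costs.items() if ms <= threshold]
    alternate = [r for r, ms in costs.items() if ms > threshold]
    return {"main": main, "alternate": alternate}

print(assign_pipelines(resource_cost_ms))
# custom-xml-app lands in the alternate pipeline; the rest stay on main.
```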
Phase 3: Processing
The third phase of the event ingestion pipeline is a parallel phase in which multiple Spark Streaming jobs subscribe to the enriched topic and perform indexing, store enriched events in HDFS, and analyze the events for threats.

The ingested data is stored for long-term retention in HDFS as Parquet files and made accessible as Hive database tables that are partitioned by resource, year, and day.
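Partitioning by resource, year, and day means each event lands under a predictable directory, so queries filtering on those columns scan only the matching partitions. The sketch below uses a hypothetical Hive-style path layout, not the actual SNYPR directory scheme.

```python
from datetime import date

def partition_path(resource: str, event_date: date, base="/data/snypr/enriched"):
    """Hypothetical Hive-style partition path for an enriched event."""
    return (f"{base}/resource={resource}"
            f"/year={event_date.year}/day={event_date.timetuple().tm_yday}")

print(partition_path("fw01", date(2020, 7, 9)))
# A query such as
#   SELECT ... WHERE resource = 'fw01' AND year = 2020 AND day = 191
# reads only this partition instead of the whole table.
```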
The solution also indexes the data and stores it in SNYPR Search Solr collections. The solution creates additional index collections as the data size passes a configurable threshold, and maintains a control index for executing parallel queries across the entire set of collections. The index files are maintained on local storage on the dedicated SNYPR Search servers. This configuration provides parallel query execution across all the collections for deterministic response times for interactive use by the SNYPR user interface.
The log compliance data is stored in a read-only format that cannot be modified. SNYPR supports strong authentication, authorization, and encryption of the Hadoop infrastructure. SNYPR also provides application-layer encryption and masking that can be enabled selectively.
SNYPR uses edge nodes for the user interface and for the SNYPR Search nodes. All processing and long-term storage of data is done within the Hadoop cluster. SNYPR provides a feature called Spotter as an integral part of the solution. This feature provides online searching and visualization of event data for the configured index retention period.
The SNYPR Remote Ingestion Node includes the connectors that are used to ingest the log data. The connectors leverage the specific log source APIs or files to access the log data. The incoming log messages are associated with a Job ID and a Resource ID before they are submitted to Kafka so that they can be processed by the Spark Streaming enrichment job. The connectors also perform offset management of the log data source to ensure that all log messages are obtained and, in some cases, pre-processing of the source data. An example of pre-processing is the IronPort syslog connector, which converts multi-line messages into a single line for publishing to Kafka.
Indexing Incoming Events
SNYPR includes dedicated SNYPR Search servers. These servers are edge nodes in the Hadoop cluster that consume the enriched messages from the Kafka topic and perform local indexing on the search servers. The search indexes are designed to optimize search performance by parallelizing searches across multiple sub-indexes, or SNYPR Search collections. Each collection is further distributed across a configured number of shards to ensure distribution of the workload. Each Solr server in the cluster is allocated CPU and memory to allow the SNYPR Search server to perform optimally.

The indexed events are ingested in real time by the solution. The SNYPR indexing job is a distributed Spark Streaming job that runs within the Hadoop cluster. The compute and memory resources used for indexing are reserved capacity to ensure that events are ingested at the rate at which they arrive. This allows the indexing of ingested events to be parallelized across the cluster to meet the deployment requirements of the solution.
An index control core collection is used to track the number of collections that the solution is hosting. The solution maintains a maximum-documents-per-collection threshold and dynamically creates additional collections as more events are imported into the environment. The solution also provides the ability to deduplicate redundant event data from the indexes during ingestion.
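The rollover behavior can be sketched as follows: a control record tracks the active collection's document count, and once it would pass the configured threshold, a new collection is created and made active. The threshold value and collection names here are illustrative, not SNYPR defaults.

```python
MAX_DOCS_PER_COLLECTION = 100_000_000  # illustrative threshold

class CollectionManager:
    """Sketch of threshold-based index collection rollover."""
    def __init__(self, max_docs=MAX_DOCS_PER_COLLECTION):
        self.max_docs = max_docs
        self.collections = {"activity_1": 0}  # control index: name -> doc count
        self.active = "activity_1"

    def index(self, n_docs):
        if self.collections[self.active] + n_docs > self.max_docs:
            # Roll over: create the next collection and make it active.
            next_name = f"activity_{len(self.collections) + 1}"
            self.collections[next_name] = 0
            self.active = next_name
        self.collections[self.active] += n_docs

mgr = CollectionManager(max_docs=1000)
for _ in range(25):
    mgr.index(100)              # 2,500 docs at 1,000 docs per collection
print(sorted(mgr.collections))  # three collections after rollover
```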
Searching
The Spotter search interface allows users to search across all events. Interactive and deterministic response times for searches are obtained by executing parallel searches across the collections. This approach ensures that the size of each index is optimized and that the infrastructure can grow to support larger indexes without impacting the user experience. The search results are returned incrementally to the user interface and displayed as they arrive to ensure the responsiveness of the Spotter interface.
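The fan-out pattern behind this can be sketched with a thread pool: the same query is issued to every collection in parallel and partial results are surfaced as each collection responds. The query function here is a stand-in, not the Spotter or Solr API.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def search_collection(collection, query):
    """Stand-in for a per-collection Solr query."""
    # Pretend each collection returns a list of matching event IDs.
    return [f"{collection}:{query}:{i}" for i in range(2)]

def spotter_style_search(collections, query):
    """Fan the query out across collections; yield results as they arrive."""
    with ThreadPoolExecutor(max_workers=len(collections)) as pool:
        futures = {pool.submit(search_collection, c, query): c for c in collections}
        for future in as_completed(futures):
            yield from future.result()   # incremental results to the UI

hits = list(spotter_style_search(["activity_1", "activity_2", "activity_3"], "DENY"))
print(len(hits))  # 6 partial results merged from 3 collections
```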
Deployment Alternatives
The SNYPR solution utilizes services in a Hadoop cluster. SNYPR provides the following deployment options:
- SNYPR UEBA: SNYPR User and Entity Behavior Analytics (UEBA) provides security analytics on security events. Events are stored only during the processing of the analytics.
- SNYPR Security Analytics Data Lake: This solution provides security analytics on security events. Events are stored for historical purposes, and a high-performance threat hunting solution is provided for searching and visualization of events.
Dedicated SNYPR Deployment
The Securonix SNYPR solution shown in the diagram above illustrates the services that are used within SNYPR. In this deployment diagram, SNYPR is deployed with a dedicated Security Analytics Data Lake. In this configuration, the Master nodes include the SNYPR Console and the Cloudera Manager service, as well as other services like the HDFS NameNode, the YARN ResourceManager, ZooKeeper, and other services used by the Hadoop cluster.
Based on the size of the deployment (events per second (EPS), analytics processed, retention period) and the features being supported (UEBA, Security Analytics Platform, Data Lake), the SNYPR architecture scales to meet the deployment requirements. For a small UEBA deployment, a limited number of servers is deployed and a dedicated SNYPR Search server is used for index storage. The deployment includes between 3 and 6 Hadoop servers along with a dedicated SNYPR Search server. The SNYPR application and the Redis service are collocated with the Hadoop master services.
For a medium UEBA deployment, full high availability of all services is configured for the servers that are deployed, and two dedicated SNYPR Search Servers are used for index storage. The deployment includes between 6 and 10 Hadoop servers along with two dedicated SNYPR Search servers and two dedicated SNYPR Application Servers.
SNYPR Deployment with Dedicated Security Analytics Data Lake – Medium – UEBA
For a large Security Analytics Data Lake deployment, full high availability of all services is configured for all servers that are deployed, and at least two dedicated SNYPR Search Servers are used for index storage. The deployment includes between 6 and 10 Hadoop servers along with three dedicated Kafka brokers and two dedicated SNYPR Search servers.
SNYPR Deployment with Dedicated Security Analytics Data Lake – Large – Security Analytics Data Lake
SNYPR Deployment with Existing Hadoop Infrastructure
The SNYPR solution shown in the following diagram (Figure 5) illustrates the services for SNYPR that are added to an existing Hadoop cluster. The SNYPR Application, SNYPR Search, and SNYPR-EYE nodes are shown on the top, and the existing Hadoop cluster is shown in the box on the bottom. For the supported Hadoop distributions, please see the SNYPR Installation Guide.
Logical SNYPR Architecture – Existing Hadoop Cluster
Deployment Assumptions
Deploying a SNYPR environment requires many considerations for each of the components of the solution.
For a standard deployment architecture, the following is recommended:
- Fast network access for the Hadoop cluster and edge nodes: 10 Gigabit Ethernet with jumbo frames configured on all switches and network interfaces (MTU=9000)
- All services running in a single data center
- A balanced SNYPR cluster with similar nodes (CPU, memory, storage, network)
- Securonix SNYPR using standard Securonix connectors for data ingestion; the exact sources of event data are deployment specific
- The log event data made available to the SNYPR environment (Ingestion Nodes), or direct connector access to log sources, based on the connector used
- Recommended storage bandwidth: 1,000 IOPS per Hadoop and SNYPR Search server
- Purging online event data after the retention period to minimize required storage, unless there is a business need for long-term historical searching; violation and behavior data is not purged
- Java 8 used by the cluster
For Hadoop tuning, see the Hadoop Cluster Tuning Recommendations section in this guide.
SNYPR Kafka Topic Partitioning Reference

Kafka Topics (10,000 - 20,000 EPS)

Topic                  Partitions  Replication
tenantid-Raw           75          2
tenantid-Enriched      75          2
tenantid-Ops           1           2
tenantid-Tiertwo       75          2
tenantid-Control       1           2
tenantid-IndexerCount  1           2
tenantid-Violations    75          2
tenantid-User          1           2
tenantid-Count         1           2
tenantid-Preview       1           2
SNYPR Search Shard Allocation Reference

Solr Collections (10,000 - 20,000 EPS, 9 servers)

Collection                      Shards  Replication
tenantid-activity               12      2
tenantid-violation              12      2
tenantid-whitelist              1       2
tenantid-entitymetadata         1       2
tenantid-tpi                    1       2
tenantid-eeocontrolcore         1       2
tenantid-lookup                 1       2
tenantid-ipmapping              1       2
tenantid-watchlist              1       2
tenantid-dailyviolationsummary  1       2
tenantid-users                  1       2
tenantid-riskscorecard          1       2
tenantid-entityrelation         1       2
tenantid-access                 1       2
SNYPR YARN Resource Allocation Reference
The SNYPR Spark applications are configured based on the ingestion rate that must be supported. The table below is an example of the resource allocation for a deployment that supports 20,000 events per second with a typical workload. There are many variables affecting a deployment and the specific sizing recommended; contact Securonix for specific information.
Spark Streaming YARN Resources (10,000 - 20,000 EPS)

                      Driver           Executors
Job                   vCPU  Mem (GB)   Number  vCPU  Mem (GB)
Event Enrichment      6     2          80      1     3
Event Ingestion       6     2          20      1     2
Behavior Analytics    1     2          10      1     4
Policy Engine IEE     1     2          40      1     2
Policy Engine AEE     1     2          10      1     3
Risk Generation       1     2          10      2     2
Traffic Analyzer      1     2          10      1     4
Behavior Profile      1     2          6       1     2
Robotic Behavior      1     2          10      1     3
Event Archiver        1     2          10      1     1
Phishing              1     2          1       1     4

YARN resources subtotal: drivers 21 vCPU, 22 GB; executors 217 vCPU, 546 GB (executor totals are the number of executors multiplied by the per-executor vCPU and memory).
Total YARN resources: 238 vCPU, 568 GB.
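The totals in the table follow directly from the per-job allocations: total executor vCPU and memory are the number of executors multiplied by the per-executor figures, added to the driver resources. A quick check:

```python
# (driver_vcpu, driver_gb, num_executors, exec_vcpu, exec_gb) per job,
# copied from the table above.
jobs = {
    "Event Enrichment":   (6, 2, 80, 1, 3),
    "Event Ingestion":    (6, 2, 20, 1, 2),
    "Behavior Analytics": (1, 2, 10, 1, 4),
    "Policy Engine IEE":  (1, 2, 40, 1, 2),
    "Policy Engine AEE":  (1, 2, 10, 1, 3),
    "Risk Generation":    (1, 2, 10, 2, 2),
    "Traffic Analyzer":   (1, 2, 10, 1, 4),
    "Behavior Profile":   (1, 2, 6, 1, 2),
    "Robotic Behavior":   (1, 2, 10, 1, 3),
    "Event Archiver":     (1, 2, 10, 1, 1),
    "Phishing":           (1, 2, 1, 1, 4),
}

driver_vcpu = sum(j[0] for j in jobs.values())
driver_gb = sum(j[1] for j in jobs.values())
exec_vcpu = sum(j[2] * j[3] for j in jobs.values())
exec_gb = sum(j[2] * j[4] for j in jobs.values())

print(driver_vcpu, driver_gb, exec_vcpu, exec_gb)    # 21 22 217 546
print(driver_vcpu + exec_vcpu, driver_gb + exec_gb)  # 238 568
```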
Search Deployment Options
SNYPR Search is a high-performance indexing and search solution that stores all activity events in the environment that are accessed by the user interface.
SNYPR Search is deployed on an edge node in the Hadoop cluster. It requires access tothe SNYPR Console on the application server and the Kafka Brokers. These serversperform event indexing as well as storage of all violation data and related informationused by the SNYPR user interface.
Embedded
  Description: Limited search server for small UEBA deployments. Limited to one search cell.
  Indexing rate per Search Cell: 3k average EPS / 5k peak EPS
  Retention: 7 days

Dedicated
  Description: Dedicated search server for small UEBA or Security Analytics Data Lake deployments.
  Indexing rate per Search Cell: Multiple Search Cells are supported (configured for increased performance); each cell supports 10k average EPS / 15k peak EPS. Redundancy of search indexes with replication can be configured for high availability and faster search performance.
  Retention: 30 days or more
Search Embedded
An embedded deployment of SNYPR Search is collocated with the SNYPR Application and shares the resources on that server. The resources required for an embedded deployment of SNYPR Search are:
- 10 CPU
- 16 GB RAM
- 1 TB usable storage
An embedded SNYPR Search server is for small UEBA deployments and is limited to 3,000 EPS average and 5,000 EPS peak, with 7 days of retention. For deployment scenarios with greater requirements, SNYPR Search Dedicated servers are used.
SNYPR Search Embedded Mode
Search Dedicated
The SNYPR Search Dedicated deployment options are shown in the diagram below. A SNYPR Search Standard deployment uses a single dedicated server for indexing and searching. A SNYPR Search High Performance Cell includes separate servers for indexing and searching. In a high-performance cell, the indexes are replicated across servers for redundancy and to isolate the indexing workload from the search workload.
SNYPR Search Dedicated
Search Index Storage Estimates

Assumptions: average message size 600 bytes. Embedded retains 7 days with 1 replica; Premium 30 Days retains 30 days with 1 replica; Premium 30 Days with Replica retains 30 days with 2 replicas.

EPS     Events/Day     GB/Day  Embedded 7 Days (GB)  Premium 30 Days (GB)  Premium 30 Days w/ Replica (GB)
1,000   86,400,000     48      169                   724                   1,448
2,500   216,000,000    121     422                   1,810                 3,621
5,000   432,000,000    241     845                   3,621                 7,242
7,500   648,000,000    362     N/A                   5,431                 10,863
10,000  864,000,000    483     N/A                   7,242                 14,484
15,000  1,296,000,000  724     N/A                   10,863                21,726
20,000  1,728,000,000  966     N/A                   14,484                28,968
Assumptions: average message size 600 bytes. Premium 60 Days retains 60 days (1 replica, or 2 with replica); Premium 90 Days retains 90 days (1 replica, or 2 with replica).

EPS     Events/Day     GB/Day  60 Days (GB)  60 Days w/ Replica (GB)  90 Days (GB)  90 Days w/ Replica (GB)
1,000   86,400,000     48      1,448         2,897                    2,173         4,345
2,500   216,000,000    121     3,621         7,242                    5,431         10,863
5,000   432,000,000    241     7,242         14,484                   10,863        21,726
7,500   648,000,000    362     10,863        21,726                   16,294        32,589
10,000  864,000,000    483     14,484        28,968                   21,726        43,452
15,000  1,296,000,000  724     21,726        43,452                   32,589        65,178
20,000  1,728,000,000  966     28,968        57,936                   43,452        86,904
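The figures in these tables are consistent with a simple model: raw enriched data per day is EPS multiplied by the average message size and 86,400 seconds, and index storage per replica comes out to roughly half the raw daily volume. The 0.5 index-to-raw ratio is inferred from the tables themselves, not a published Securonix constant:

```python
def daily_gb(eps, avg_msg_bytes=600):
    """Raw enriched data generated per day, in GB (binary gigabytes)."""
    return eps * avg_msg_bytes * 86_400 / 2**30

def index_storage_gb(eps, days, replicas, index_ratio=0.5):
    """Estimated index storage; index_ratio is inferred from the tables."""
    return daily_gb(eps) * days * replicas * index_ratio

# 1,000 EPS examples matching the tables above:
print(round(daily_gb(1_000)))                 # 48 GB/day
print(round(index_storage_gb(1_000, 30, 1)))  # 724 GB
print(round(index_storage_gb(1_000, 90, 2)))  # 4345 GB
```

This makes it easy to interpolate for ingestion rates or retention periods not listed in the tables, though actual sizing should still come from Securonix.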
SNYPR Extra Large Deployments
The sizing guidelines in this document are references for deployment of SNYPR. The solution will support much larger deployments based on customer requirements.

For large deployments, the search servers are dedicated servers rather than being collocated on the Compute/Storage nodes. This allows the search indexers to scale as needed without impacting other services. This includes Solr and a dedicated ZooKeeper configuration to avoid contention.
There is no upper limit to the deployment size. The deployment architecture for extra-large deployments will be determined based on the specific deployment requirements. Contact Securonix for details.
The major variables that dictate the deployment recommendations include:
- Ingestion rate (events per second) of security event data
- Number of users interacting with the application interactively
- The data retention requirements for online data
- The data retention requirements for log data
- The disaster recovery strategy
Disaster Recovery Alternatives
SNYPR can be deployed to meet several disaster recovery objectives. Because of the size of the solution and the costs associated with disaster recovery, several DR alternative strategies are available. Since SNYPR can be deployed with an existing Hadoop environment, the disaster recovery strategy must align with the DR strategy for the Hadoop infrastructure being used for SNYPR. The alternatives in this document assume a dedicated Hadoop infrastructure for SNYPR and describe the disaster recovery considerations for the entire solution, including Hadoop. If an existing Hadoop environment is used, the same considerations are relevant, but the actual configuration of the Hadoop disaster recovery is assumed to be part of the existing Hadoop infrastructure.
Alternatives
The SNYPR disaster recovery alternatives include:

1. Advanced DR with full infrastructure: identical infrastructure with data replication from the primary site to the DR site, with the ability to continue processing in-flight messages from the Kafka brokers at the DR site.
2. Full DR with full infrastructure: identical infrastructure with select data replication from the primary site to the DR site, with the ability to rebuild search indexes after a DR event from the historical enriched event data, and the ability to process new activity events at the DR site.
3. Limited DR with limited infrastructure: limited infrastructure with violation, summary, and configuration data only, and the ability to process new activity events.
Considerations
The considerations for disaster recovery must be made for each service included in the solution. The primary considerations for each of the node types are described as follows:
- SNYPR Console Nodes: The SNYPR Console nodes include the SNYPR user interface and the SNYPR configuration database.
- SNYPR Search Servers: These are dedicated search nodes that include a local event indexer and multiple search instances for distributed searches. These servers are edge nodes in a Hadoop cluster that read data from Kafka and index the data to local storage on the search servers. The SNYPR Search servers include optimizations for maximum search performance and density on a physical server. Apache Solr is used for the underlying search server.
- SNYPR-EYE Server: This is a SNYPR monitoring and alerting server that is used for the configuration and operational health monitoring of all SNYPR services, including all the servers in the Hadoop cluster, the processes on the SNYPR Console, the SNYPR Spark Streaming applications running in the YARN cluster (including the data ingestion performance of all resources), and the performance and health of the SNYPR Search processes. The SNYPR-EYE solution installs and manages SNYPR-EYE agents on the servers in the environment for local monitoring.
- SNYPR Remote Ingestion Nodes: These include the ingestion servers with the connectors, the incoming activity log files, and the Kafka brokers with the in-flight messages.
- Hadoop Master: These nodes include the Hadoop administration services, such as Cloudera Manager and ZooKeeper, when Hadoop is deployed as part of the solution. The considerations for disaster recovery at this tier include file system replication with rsync or a backup and restore strategy, as well as MySQL database replication for the SNYPR configuration database and the Hive metastore.
- Compute / Storage Nodes: The SNYPR Compute / Storage nodes include HDFS and all the files stored by the system in HDFS for Hive / Impala table access, Solr indexes, and HBase tables. The considerations for disaster recovery at this tier include replication (using distcp) or backup and recovery of the HDFS data, HBase replication (using the WALs), and replication of the Solr collection schema data.
- Kafka Brokers: The considerations for disaster recovery at this tier include Kafka MirrorMaker for the in-flight Kafka messages.
The exact disaster recovery strategy implemented should be in alignment with the business continuity requirements for each deployment. The following table shows the alternatives for disaster recovery configuration and their impact on business continuity.
                             Advanced DR with      Full DR with          Limited DR with
                             Full Infrastructure   Full Infrastructure   Limited Infrastructure
DR Target                    1 day                 1 week                1 week (violations,
                                                                        behavior and data only)
Configuration Data           X                     X                     X
Violation Data               X                     X                     X
Case Management              X                     X                     X
Behavior Summaries           X                     X                     X
Historical Enriched Events   X                     X                     X
Search Indexes               X                     Rebuild search        X
                                                   indexes after DR
                                                   initiation
Kafka In-Flight Messages     X                     X                     X
Unprocessed Event Files      X                     X                     X
The availability of the data that SNYPR needs at the disaster site, as well as network failover and end-user access to the disaster recovery infrastructure, must also be considered. The typical services that are needed at the disaster site to continue processing are shown in the diagram below. This includes user and access data, as well as event logs that are ingested by the solution. For details, refer to Cloudera Backup and Disaster Recovery at: https://www.cloudera.com/documentation/enterprise/5-8-x/topics/cm_bdr_about.html
Network Bandwidth

A SNYPR deployment includes network transfer of several types of data into the solution. This includes user, access, TPI, event log, network map, and other types of data for a typical deployment. Due to the potential sensitivity of some of this data, a virtual private cloud may be required for each deployment. In addition to the security considerations, the infrastructure requires sufficient network bandwidth. The types of network traffic used by the solution are:
- End user access to the Securonix user interface
- Import of user, access, and TPI data into the master nodes
- Cluster communication and synchronization between the cluster nodes
- Import of event log data into the child nodes
The largest network traffic requirement is the transfer of event log data from the sources to the child nodes for import through the solution's connectors. The network traffic rate from the event log sources to the child nodes can be calculated by multiplying the events per second by the average message size.

For example, 5,000 events per second (EPS) to two ingestion nodes in the deployment, with an average message size of 500 bytes, will require 2.5 MB per second, or roughly 25 Mb per second, of bandwidth.
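The calculation above can be sketched in a few lines. Note that the ~25% protocol-overhead allowance used here is an assumption introduced to match the "roughly 25 Mb per second" figure; it is not a value stated by the guide.

```python
# Back-of-the-envelope bandwidth estimate for event log transfer.
def ingest_bandwidth_mbps(eps, avg_msg_bytes, overhead=1.25):
    """Required network bandwidth in megabits per second.

    overhead: assumed ~25% allowance for protocol framing and
    retransmission (an illustrative assumption, not a Securonix figure).
    """
    bytes_per_sec = eps * avg_msg_bytes          # 5,000 x 500 = 2.5 MB/s
    return bytes_per_sec * 8 * overhead / 1_000_000

print(ingest_bandwidth_mbps(5_000, 500))  # 25.0 (Mb/s)
```

Without the overhead allowance, the raw figure is exactly 20 Mb/s; the extra headroom is why the guide rounds up to 25 Mb/s.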
Network Bandwidth Characteristics by Tier
Admin Tier

Description: This tier is where end users log in to the user interface (traffic on port 443). This tier also includes all management services for the cluster and connects to the Compute / Storage / Search tier and Messaging tier for various services. Incoming connections include web services on port 443, MySQL configuration for Spark jobs, Redis, ZooKeeper, and other Hadoop cluster services. This tier hosts the management services that the agents on the Admin, Compute / Storage, and Messaging tiers communicate with.
Network requirements: 10 Gb Ethernet, MTU = 9000, centralized data center for the Admin, Compute / Storage / Search, and Messaging tiers.

Compute / Storage / Search Tier

Description: Network traffic to these servers includes Spark, Impala, HDFS, and HBase services. Outbound traffic to services in the Admin tier and the Messaging tier is also required.
Network requirements: 10 Gb Ethernet, MTU = 9000, centralized data center for the Admin, Compute / Storage / Search, and Messaging tiers.

SNYPR Search Tier

Description: Network traffic to these servers includes SNYPR Search (Solr). Outbound traffic to services in the Kafka Messaging tier is required.
Network requirements: 10 Gb Ethernet, MTU = 9000, centralized data center for the Admin, Compute / Storage / Search, and Messaging tiers.

Messaging Tier

Description: This tier includes incoming traffic to the Kafka brokers (SSL traffic to port 9093, and ZooKeeper traffic on port 2181).
Network requirements: 10 Gb Ethernet, MTU = 9000, centralized data center for the Admin, Compute / Storage / Search, and Messaging tiers.

Collection Tier

Description: This server collects logs and provides a syslog server on port 514. The connectors on the server also collect logs with native protocols. The primary network traffic from this tier is to the Admin tier on port 443 for web services and to the Kafka brokers in the Messaging tier on port 9093 (SSL).
Network requirements: Remote data center with outbound network access to the centralized data center.
If 10 gigabit Ethernet is not available and gigabit Ethernet is used in the deployment, then the performance of the deployment will be limited by the network performance.
Network Bandwidth Requirements from RIN Collection Tier to Messaging Tier

The table below displays the network bandwidth requirements from the Remote Ingestion Node (RIN) collection tier to the messaging tier (Kafka brokers).
Average EPS                                  20,000 EPS
Number of RINs                               1 RIN
Average message size                         600 bytes
Transferred to Kafka after compression (%)   30%
Total traffic to Kafka                       36 Mbits/s
Traffic per RIN to Kafka
(assuming equal distribution)                36 Mbits/s
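The 36 Mbits/s figure above can be reproduced as follows. The 25% protocol-overhead allowance is an assumption used here to match the table's figure (20,000 EPS x 600 bytes x 8 bits x 30% = 28.8 Mbits/s before overhead); it is not stated explicitly in the table.

```python
# RIN-to-Kafka bandwidth estimate after compression.
def kafka_bandwidth_mbps(eps, avg_msg_bytes, compressed_ratio, overhead=1.25):
    """Bandwidth in Mbits/s for compressed event traffic to Kafka.

    compressed_ratio: fraction of raw volume actually transferred
    (0.30 in the table above). overhead is an assumed allowance for
    protocol framing, chosen to match the published 36 Mbits/s figure.
    """
    raw_bits_per_sec = eps * avg_msg_bytes * 8
    return raw_bits_per_sec * compressed_ratio * overhead / 1_000_000

total = kafka_bandwidth_mbps(20_000, 600, 0.30)
print(round(total))  # 36 (Mbits/s); with 1 RIN, per-RIN traffic is the same
```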
Virtual Infrastructure

Due to the high-performance requirements of the solution, physical servers or dedicated cloud instances are recommended. A virtual infrastructure can be considered for small deployments or non-production environments.
Considerations for virtual deployments

- These are VMs that can be deployed as needed on the vSphere cluster, without over-subscription of either CPU or memory resources. Configure CPUs along physical socket boundaries. According to VMware, one VM per NUMA node is advisable.
- These nodes house the Cloudera Master services and serve as the gateway/edge device that connects the rest of the customer's network to the Cloudera cluster.
- Care should be taken to ensure that automated movement of VMs is disabled. There should be no DRS or vMotion of VMs allowed in this deployment model. This is critical because VMs are tied to physical disks, and movement of VMs within the cluster will result in data loss.
- Configure Distributed Resource Scheduler (DRS) rules so that there is strong negative affinity between the master node VMs. This ensures that no two master nodes are provisioned or migrated to the same physical vSphere host.
- A key configuration parameter to consider is the MTU size: ensure that the same MTU size is set at the physical switches, guest OS, ESXi VMNIC, and vSwitch layers. This is relevant when enabling jumbo frames (9000 MTU), which are recommended for Hadoop environments.
- Set up virtual disks in "independent persistent" mode for optimal performance. Eager Zeroed Thick virtual disks provide the best performance.
- Each provisioned disk is mapped to one vSphere datastore (which in turn contains one VMDK or virtual disk).
- The VMXNET3 NIC should be configured.
- Disable or minimize anonymous paging by setting vm.swappiness=0 or 1.
- VMs on the same physical host are affected by the same hardware failure. To match the reliability of a physical deployment, avoid replicating data across two virtual machines on the same host.
SNYPR Cloud Deployment

The SNYPR solution can be deployed in a cloud environment. Several considerations must be addressed when deploying SNYPR in the cloud, including the following:
- Infrastructure selection: The infrastructure used should provide equivalent resources (CPU, memory, and storage capacity and bandwidth) to the physical server recommendations listed in this document.
- Deployment architecture: SNYPR can be deployed exclusively in the cloud or as a hybrid cloud / on-site topology.
- Network access: The infrastructure must have access to the data (user, access, event log, TPI, etc.) that will be used. A virtual private cloud may be required for transmission of sensitive data.
Infrastructure Selection

SNYPR can be deployed in public or private cloud environments. Based on the deployment requirements of the solution, the specific infrastructure used for each cloud environment should be selected to ensure that the appropriate resources are available. This includes selection of the appropriate virtual instance types to support the CPU, memory, storage, and network bandwidth requirements of the solution.
Deployment Architecture

The deployment of SNYPR includes a Hadoop cluster as well as servers for the user interface and for event ingestion. When SNYPR is deployed in a cloud environment, there are two primary deployment alternatives. The first is a Securonix Cloud deployment where all servers in the cluster are hosted in the cloud.
The second is a Securonix Cloud / On-Premise deployment where the console nodesare deployed in the cloud and the ingestion nodes are deployed on-premise.
Considerations

This section contains considerations for the following topics:

- Amazon EC2
- Microsoft Azure
- Google Cloud
Amazon EC2

There are several Amazon EC2 instance types that are a good fit for deploying Securonix. The M4 general purpose instances are recommended. These are defined by Amazon as: "M4 instances are the latest generation of General-Purpose Instances. This family provides a balance of compute, memory, and network resources, and it is a good choice for many applications."
Features

- 2.4 GHz Intel Xeon® E5-2676 v3 (Haswell) processors
- EBS-optimized by default at no additional cost
- Support for Enhanced Networking
- Balance of compute, memory, and network resources
                     Hadoop       Compute /     Kafka        SNYPR        SNYPR
                     Master       Storage                    Search       Search
Amazon EC2
Instance Type        R5.4xlarge   M4.16xlarge   M5.2xlarge   M5.4xlarge   M4.16xlarge
RAM (GB)             128          256           32           64           256
vCPU                 16           64            8            16           64
Storage (GB, split
into multiple
EBS volumes)         10,000       10,000        3,000        3,000        10,000
Amazon provides several alternatives for the instance types used, like the R3.8XL and the D2.8XL, which are also good options. The storage chosen should provide adequate bandwidth to the volume used. This is the equivalent of 1,000 IOPS per instance to the selected storage type.
In addition to standard Amazon AWS EC2 instances, the guidance for deploying Cloudera in Amazon Web Services is recommended. See the following link: https://www.cloudera.com/partners/solutions/amazon-web-services.html.
Microsoft Azure

Several Azure virtual machine instance types (https://azure.microsoft.com/en-us/pricing/details/virtual-machines/linux/) are a good fit for deploying Securonix. A G4 (East US 2), D15 v2 (East US 2), or H16m (South Central US) instance type is recommended.

The Dv2 instances are recommended. These are defined by Microsoft as:

"D11-15 v2 instances are based on the 2.4 GHz Intel Xeon® E5-2673 v3 (Haswell) processor, and can achieve 3.1 GHz with Intel Turbo Boost Technology 2.0. D11-15 v2 are ideal for memory-intensive enterprise applications. D15 v2 instance is isolated to hardware dedicated to a single customer.

For persistent storage, use the variant "Dsv2" VMs and purchase Premium Storage separately."
                     Hadoop   Compute /   Kafka    SNYPR    SNYPR
                     Admin    Storage     Broker   Search   Console
Microsoft Azure
Instance Type        E16 v3   D64 v3      E16 v3   D64 v3   E16 v3
RAM (GB)             128      256         128      256      128
vCPU                 16       64          16       64       16
Storage (GB, split
into multiple
volumes)             3,000    10,000      5,000    10,000   5,000
Microsoft provides several alternatives for the storage for the instances used. The storage chosen should provide adequate bandwidth to the volume used. This is the equivalent of 1,000 IOPS per instance to the selected storage type.
In addition to standard Azure instances, the following guidance for deploying Cloudera in Microsoft Azure is recommended: https://www.cloudera.com/more/news-and-blogs/press-releases/2015-09-24-cloudera-enterprise-data-hub-edition-provides-enterprise-ready-hadoop-for-microsoft-azure.html.
Google Cloud

The table below shows an example configuration for a Google Cloud SNYPR architecture with 10,000 EPS and 30 days of search index storage.
Type                  Quantity   Instance Type    CPU       Memory   Storage
Master Servers        3          N1-Highmem-16    16 vCPU   104 GB   1 x /root 500 GB (SSD),
                                                                     1 x /zookeeper 250 GB (SSD)
SNYPR Console         1          N1-Standard-16   16 vCPU   60 GB    1 x /root 500 GB (SSD),
Servers                                                              8 x /data 500 GB (standard)
Compute / Storage     6          N1-Standard-64   64 vCPU   240 GB   1 x /root 128 GB (SSD),
Servers                                                              5 x /search[1-10] 1000 GB (standard)
Search / Storage      1          N1-Highmem-64    64 vCPU   416 GB   1 x /root 128 GB (SSD),
Servers                                                              10 x /search[1-10] 5500 GB (SSD)
Kafka Ingestion       3          N1-Standard-8    8 vCPU    30 GB    1 x /root 128 GB (SSD),
Servers                                                              1 x /zookeeper 256 GB (SSD),
                                                                     3 x /data 1024 GB (standard)
Remote Ingestion      1          N1-Standard-8    8 vCPU    30 GB    1 x /root 128 GB (SSD),
Nodes                                                                3 x /data 2000 GB (standard)
SNYPR Reference Hardware

The SNYPR architecture includes the following nodes that integrate with the Hadoop services:
SNYPR Application Server (Console) - User interface, configuration db, Redis

These are edge nodes in a Hadoop cluster that are used for the SNYPR user interface and the configuration repository for all components used by the solution. Each Console node performs the following tasks:

- Provide visualizations for monitoring events, threat management dashboards, investigations, and incident response
- Build custom dashboards with visualizations for viewing violation and event data
- Configure all ingestion jobs - user identities, access privileges, threat intelligence, security events, and others
- Provide an administration interface for application support personnel and administrators
- Configure all policies and analytics, including behavior-based anomaly detection, peer-based analytics, threat modeling, and risk analytics
SNYPR EYE - SNYPR EYE interface, configuration db

The SNYPR EYE server is a SNYPR monitoring and alerting server that is used for the configuration and operational health monitoring of all SNYPR services, including all the servers in the Hadoop cluster, the processes on the SNYPR Console, the SNYPR Spark Streaming applications running in the YARN cluster (including the data ingestion performance of all resources), and the performance and health of the SNYPR Search processes. The SNYPR EYE solution installs and manages SNYPR-EYE agents on the servers in the environment for local monitoring.
SNYPR Remote Ingestion Node

These nodes are edge nodes in a Hadoop cluster that are used to ingest security event log data into the environment with the Securonix connectors. Each SNYPR ingestion node performs the following tasks:

- Import events from log sources
- Publish events to the Kafka message bus with batching, compression, and encryption
- Accept incoming log files on syslog
- Cache in-transit messages
Hadoop Master - Hadoop cluster management services

These are the master servers in the Hadoop cluster.
Hadoop Compute / Storage Nodes

These are the main nodes in a Hadoop cluster that are used to store compressed data and process all the jobs associated with SNYPR. Each SNYPR Compute / Storage node performs the following tasks:

- Fetch data from the ingestion nodes.
- Perform all the jobs associated with SNYPR based on the configuration stored in the Master node, including parsing, indexing, analytics, and storage.
- Store data with 90% compression in structured JSON format.
- Pass processed data to the SNYPR Search indexes that are used by the SNYPR Console for review by the end user.
Hadoop Kafka Broker - Kafka broker, dedicated ZooKeeper

Kafka broker servers for in-transit messages, with ZooKeeper servers dedicated to Kafka. These servers use local storage for in-transit messages.
Reference Server Specifications

This section contains recommendations for the following topics:

- Hardware Specifications
- Server Mount Point
Hardware Specifications

The hardware specifications for the infrastructure are listed in the following table:
Configuration        SNYPR-M1:         SNYPR-M2:         SNYPR-M3:
                     Hadoop Master     Hadoop Master     Hadoop Master with
                                       with SNYPR        SNYPR and Kafka
Server Model         Dell R640         Dell R640         Dell R640
CPU                  2 x Intel Xeon    2 x Intel Xeon    2 x Intel Xeon
                     Gold 5120 2.2G,   Gold 5120 2.2G,   Gold 5120 2.2G,
                     14C/28T           14C/28T           14C/28T
Memory               256GB RDIMM,      256GB RDIMM,      256GB RDIMM,
                     2666MT/s          2666MT/s          2666MT/s
Boot Storage         2 x 1.6TB SSD     2 x 1.6TB SSD     2 x 1.6TB SSD
                     SATA Mix Use      SATA Mix Use      SATA Mix Use
                     12Gbps 512e       12Gbps 512e       12Gbps 512e
Additional Storage   4 x 2.4TB 10K     6 x 2.4TB 10K     8 x 2.4TB 10K
                     RPM SAS 12Gbps    RPM SAS 12Gbps    RPM SAS 12Gbps
                     4Kn               4Kn               4Kn
Network              10GE              10GE              10GE
Power                2 x 1100W         2 x 1100W         2 x 1100W
Rack Units           1RU               1RU               1RU
Configuration        SNYPR-C1:          SNYPR-C2:         SNYPR-C3:
                     Standard Density   High Density      Maximum Density
                     Compute/Storage    Compute/Storage   Compute/Storage
Server Model         Dell R640          Dell R740xd       Dell R740xd
CPU                  2 x Intel Xeon     2 x Intel Xeon    2 x Intel Xeon
                     Gold 5120 2.2G,    Gold 5120 2.2G,   Gold 5120 2.2G,
                     14C/28T            14C/28T           14C/28T
Memory               256GB RDIMM,       256GB RDIMM,      256GB RDIMM,
                     2666MT/s           2666MT/s          2666MT/s
Boot Storage         2 x 1.6TB SSD      2 x 1.6TB SSD     2 x 1.6TB SSD
                     SATA Mix Use       SATA Mix Use      SATA Mix Use
                     12Gbps 512e        12Gbps 512e       12Gbps 512e
Additional Storage   10 x 2.4TB 10K     24 x 2.4TB 10K    30 x 2.4TB 10K
                     RPM SAS 12Gbps     RPM SAS 12Gbps    RPM SAS 12Gbps
                     4Kn                4Kn               4Kn
Network              10GE               10GE              10GE
Power                2 x 1100W          2 x 1100W         2 x 1100W
Rack Units           1RU                2RU               2RU
Configuration        SNYPR-SEARCH1:          SNYPR-SEARCH3:
                     Standard Density        Maximum Density
                     Compute/Storage         Compute/Storage
Server Model         Dell R640               Dell R740xd
CPU                  2 x Intel Xeon Gold     2 x Intel Xeon Gold
                     5120 2.2G, 14C/28T      5120 2.2G, 14C/28T
Memory               256GB RDIMM, 2666MT/s   256GB RDIMM, 2666MT/s
Boot Storage         2 x 1.6TB SSD SATA      2 x 1.6TB SSD SATA
                     Mix Use 12Gbps 512e     Mix Use 12Gbps 512e
Additional Storage   10 x 2.4TB 10K RPM      30 x 2.4TB 10K RPM
                     SAS 12Gbps 4Kn          SAS 12Gbps 4Kn
Network              10GE                    10GE
Power                2 x 1100W               2 x 1100W
Rack Units           1RU                     2RU
Configuration        SNYPR-K3:         SNYPR-R1:          SNYPR-S3:
                     Kafka Brokers     Remote Ingestion   SNYPR Console
                                       Node
Server Model         Dell R640         Dell R640          Dell R640
CPU                  2 x Intel Xeon    2 x Intel Xeon     2 x Intel Xeon
                     Gold 5120 2.2G,   Gold 5120 2.2G,    Gold 5120 2.2G,
                     14C/28T           14C/28T            14C/28T
Memory               128GB RDIMM,      64GB RDIMM,        128GB RDIMM,
                     2666MT/s          2666MT/s           2666MT/s
Boot Storage         2 x 1.6TB SSD     2 x 1.6TB SSD      2 x 1.6TB SSD
                     SATA Mix Use      SATA Mix Use       SATA Mix Use
                     12Gbps 512e       12Gbps 512e        12Gbps 512e
Additional Storage   10 x 2.4TB 10K    4 x 2.4TB 10K      4 x 2.4TB 10K
                     RPM SAS 12Gbps    RPM SAS 12Gbps     RPM SAS 12Gbps
                     4Kn               4Kn                4Kn
Network              10GE              10GE               10GE
Power                2 x 1100W         2 x 1100W          2 x 1100W
Rack Units           1RU               1RU                1RU
Alternate hardware configurations can be used, but equivalent specifications are required for CPU, memory, network bandwidth, and storage capacity and bandwidth.
Server Mount Point

The storage mount point configuration for each of the servers is listed in the table below:
Mount Point    SNYPR-M1:   SNYPR-M2:       SNYPR-M3:       Comments
               Hadoop      Hadoop Master   Hadoop Master
               Master      with SNYPR      with SNYPR
                                           and Kafka
/              100 GB      100 GB          100 GB          RAID 1 (1.6 TB mixed use SSD drives), xfs
/boot          2 GB        2 GB            2 GB            RAID 1 (1.6 TB mixed use SSD drives), xfs
swap           10 GB       10 GB           10 GB           RAID 1 (1.6 TB mixed use SSD drives), xfs
/zookeeper     100 GB      100 GB          100 GB          RAID 1 (1.6 TB mixed use SSD drives), xfs
/var           800 GB      800 GB          800 GB          RAID 1 (1.6 TB mixed use SSD drives), xfs
/dfs           200 GB      200 GB          200 GB          RAID 1 (1.6 TB mixed use SSD drives), xfs
/securonix     4.2 TB      6.3 TB          8.4 TB          RAID 10, xfs; if syslog is used locally, use a higher storage amount
/snyprsearch   -           -               -               RAID 6
/data1         -           -               2.1 TB          JBOD, xfs, noatime
/data2         -           -               2.1 TB          JBOD, xfs, noatime
/data3         -           -               2.1 TB          JBOD, xfs, noatime
/data4         -           -               2.1 TB          JBOD, xfs, noatime
Mount Point       SNYPR-C1:          SNYPR-C2:         SNYPR-C3:         Comments
                  Standard Density   High Density      Maximum Density
                  Compute/Storage    Compute/Storage   Compute/Storage
/                 100 GB             100 GB            100 GB            RAID 1 (1.6 TB mixed use SSD drives), xfs
/boot             2 GB               2 GB              2 GB              RAID 1 (1.6 TB mixed use SSD drives), xfs
swap              10 GB              10 GB             10 GB             RAID 1 (1.6 TB mixed use SSD drives), xfs
/zookeeper        100 GB             100 GB            100 GB            RAID 1 (1.6 TB mixed use SSD drives), xfs
/var              800 GB             800 GB            800 GB            RAID 1 (1.6 TB mixed use SSD drives), xfs
/dfs              200 GB             200 GB            200 GB            RAID 1 (1.6 TB mixed use SSD drives), xfs
/securonix        -                  -                 -                 RAID 10, xfs; if syslog is used locally, use a higher storage amount
/snyprsearch      -                  -                 -                 RAID 6
/data1-/data10    2.1 TB each        2.1 TB each       2.1 TB each       JBOD, xfs, noatime
/data11-/data24   -                  2.1 TB each       2.1 TB each       JBOD, xfs, noatime
/data25-/data30   -                  -                 2.1 TB each       JBOD, xfs, noatime
Mount Point    SNYPR-SEARCH1:     SNYPR-SEARCH3:    Comments
               Standard Density   Maximum Density
               Compute/Storage    Compute/Storage
/              100 GB             100 GB            RAID 1 (1.6 TB mixed use SSD drives), xfs
/boot          2 GB               2 GB              RAID 1 (1.6 TB mixed use SSD drives), xfs
swap           10 GB              10 GB             RAID 1 (1.6 TB mixed use SSD drives), xfs
/zookeeper     100 GB             100 GB            RAID 1 (1.6 TB mixed use SSD drives), xfs
/var           800 GB             800 GB            RAID 1 (1.6 TB mixed use SSD drives), xfs
/dfs           200 GB             200 GB            RAID 1 (1.6 TB mixed use SSD drives), xfs
/securonix     -                  -                 RAID 10, xfs; if syslog is used locally, use a higher storage amount
/snyprsearch   17 TB              60 TB             RAID 6
Mount Point      SNYPR-K3:       SNYPR-R1:          SNYPR-S3:       Comments
                 Kafka Brokers   Remote Ingestion   SNYPR Console
                                 Node
/                100 GB          100 GB             100 GB          RAID 1 (1.6 TB mixed use SSD drives), xfs
/boot            2 GB            2 GB               2 GB            RAID 1 (1.6 TB mixed use SSD drives), xfs
swap             10 GB           10 GB              10 GB           RAID 1 (1.6 TB mixed use SSD drives), xfs
/zookeeper       100 GB          -                  -               RAID 1 (1.6 TB mixed use SSD drives), xfs
/var             800 GB          1000 GB            1000 GB         RAID 1 (1.6 TB mixed use SSD drives), xfs
/dfs             200 GB          200 GB             200 GB          RAID 1 (1.6 TB mixed use SSD drives), xfs
/securonix       -               4.2 TB             4.2 TB          RAID 10, xfs; if syslog is used locally, use a higher storage amount
/snyprsearch     -               -                  -               RAID 6
/data1-/data10   2.1 TB each     -                  -               JBOD, xfs, noatime
Alternatives for Limiting the Size of the Infrastructure

The recommended architecture assumes full functionality and full access to indexed data and source data for the duration of the retention period.

Other factors may reduce the size of the recommended infrastructure, such as a reduction in the volume of log data or filtering of some log data to avoid storage of unneeded events.

You can also configure the Hadoop compute and storage nodes to use very dense storage per node. The following table shows an example configuration with dense storage.
Model           vCPU   Memory (GB)   Storage (TB)
SNYPR-S3        56     256           3
SNYPR-SEARCH1   56     256           17
SNYPR-R3        32     64            3
SNYPR-M3        56     256           9
SNYPR-C1        56     256           21
SNYPR-K3        56     64            15
The following table shows search index storage estimates for the Premium retention options (60 or 90 days of search retention, with 1 or 2 replicas):

EPS      Avg       Events per      GB/   60 Days,    60 Days,     90 Days,    90 Days,
         Message   Day             Day   1 Replica   2 Replicas   1 Replica   2 Replicas
         Size                            (GB)        (GB)         (GB)        (GB)
1,000    600       86,400,000      48    1,448       2,897        2,173       4,345
2,500    600       216,000,000     121   3,621       7,242        5,431       10,863
5,000    600       432,000,000     241   7,242       14,484       10,863      21,726
7,500    600       648,000,000     362   10,863      21,726       16,294      32,589
10,000   600       864,000,000     483   14,484      28,968       21,726      43,452
15,000   600       1,296,000,000   724   21,726      43,452       32,589      65,178
20,000   600       1,728,000,000   966   28,968      57,936       43,452      86,904
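The search storage figures above can be reproduced with the sketch below. The 0.5 index ratio (search index occupying roughly half the raw ingest volume) is inferred from the published figures, not a documented Securonix constant, and GB here means GiB (bytes divided by 2^30).

```python
# Search index storage estimate, matching the table above.
def search_storage_gb(eps, avg_msg_bytes=600, days=60, replicas=1,
                      index_ratio=0.5):
    """Estimated search index storage in GB (GiB).

    index_ratio: assumed fraction of raw ingest volume occupied by the
    index; 0.5 is inferred from the table, not an official figure.
    """
    events_per_day = eps * 86_400
    gb_per_day = events_per_day * avg_msg_bytes / 2**30  # raw ingest, GiB/day
    return gb_per_day * days * replicas * index_ratio

print(round(search_storage_gb(1_000, days=60)))              # 1448
print(round(search_storage_gb(1_000, days=90, replicas=2)))  # 4345
```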
Sizing and Capacity Planning

Securonix provides sizing and capacity planning. The considerations and examples of the sizing recommendations are provided here. (Contact Securonix Support for more specific deployment recommendations.)
Server Types

Type                     Description                            Deployment
SNYPR Application        SNYPR Console user interface and       Dedicated SNYPR server
                         SNYPR EYE monitoring server
SNYPR Search             SNYPR Search and indexer servers       Dedicated SNYPR server
                         with local storage of search indexes
Remote Ingestion Nodes   Remote ingestion servers for log       Dedicated SNYPR server
                         collection
Hadoop Master            Hadoop management server               Dedicated SNYPR server or
                                                                existing Hadoop cluster
Compute / Storage        Hadoop compute and storage server      Dedicated SNYPR server or
                                                                existing Hadoop cluster
Kafka Brokers            Kafka brokers for transient messages   Dedicated SNYPR server or
                                                                existing Hadoop cluster
Assumptions

Several assumptions are made when providing a sizing recommendation. The following list is an example of input assumptions and related sizing recommendations:

- EPS input is pre-filtered EPS (for UEBA, 40% filtering is assumed; for SDL, no filtering)
- If EPS is greater than 2,500 EPS, dedicated SNYPR Search servers are required
- YARN will use 50% of the memory on the compute nodes
- In a small cluster (fewer than 6 nodes), the master will have 20 vCPUs for YARN and 64 GB RAM
- Kafka compression from the RIN = 3x (30% of source)
- Additional adjustment considerations:
  - HDFS data nodes - resources required: 4 vCPUs of compute on data node servers, 8 GB
  - HBase region servers - resources required: 5 percent of compute on region servers, 16 GB
  - Search - if 7200 RPM drives are used, reduce EPS by 20 percent
  - Custom hardware requires a YARN memory-to-vCPU ratio of 3.2 GB per vCPU or higher
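The memory-to-vCPU rule above can be checked as a quick calculation. The node figures and YARN percentages below are illustrative, taken from the reference compute node (256 GB, 56 threads) and the sample sizing inputs (60% memory, 70% vCPU given to YARN); they are not requirements of the check itself.

```python
# Checks the rule of thumb stated above: the memory allocated to YARN
# should be at least 3.2 GB per YARN vCPU.
def yarn_ratio_ok(node_mem_gb, node_vcpu, mem_pct=0.60, vcpu_pct=0.70,
                  min_ratio=3.2):
    yarn_mem_gb = node_mem_gb * mem_pct    # memory handed to YARN
    yarn_vcpu = node_vcpu * vcpu_pct       # vCPUs handed to YARN
    return yarn_mem_gb / yarn_vcpu >= min_ratio

# 256 GB / 56 vCPU node: 153.6 GB / 39.2 vCPU ~= 3.9 GB per vCPU
print(yarn_ratio_ok(256, 56))  # True
```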
Type                        Sample Input Values
Deployment Type             SDL
Average Events Per Second   20,000
Peak EPS                    20,000
Peak Filtered EPS           20,000
Average Message Size        600 bytes
Identities                  20,000
Analytics Complexity        Medium
Search Retention            30 days
Long Term Storage           180 days
Kafka Retention             2 days
HDFS Replication            2
Solr HA                     No
Excess Capacity             None
RIN HA                      No
SNYPR Console HA            No
SNYPR Console Dedicated     Yes
YARN vCPU Percent           70
YARN Memory Percent         60
Type | Output Calculated Values
Peak EPS | 20,000 EPS
Peak Filtered EPS | 20,000 EPS
Daily Events | 1,728,000,000 events per day
Daily Ingestion Size | 966 GB/day
Total Search Events | 52 billion events
Total Search Storage | 14,484 GB
Total Long Term Events | 311 billion events
Total Long Term (HDFS) Storage | 109,499 GB
Identities | 20,000
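The calculated values follow directly from the sample inputs. A minimal sketch of the arithmetic, assuming binary (1024-based) GB; the variable names are illustrative, not product settings:

```shell
#!/bin/sh
# Reproduce the calculated sizing values from the sample inputs (illustrative).
EPS=20000            # peak filtered events per second
MSG_BYTES=600        # average message size in bytes
SEARCH_DAYS=30       # search retention in days
LONGTERM_DAYS=180    # long-term (HDFS) retention in days

DAILY_EVENTS=$((EPS * 86400))                      # 1,728,000,000 events per day
SEARCH_EVENTS=$((DAILY_EVENTS * SEARCH_DAYS))      # ~52 billion events
LONGTERM_EVENTS=$((DAILY_EVENTS * LONGTERM_DAYS))  # ~311 billion events
DAILY_GB=$(awk "BEGIN {printf \"%.0f\", $DAILY_EVENTS * $MSG_BYTES / 1024 ^ 3}")

echo "Daily events:     $DAILY_EVENTS"
echo "Daily ingestion:  ${DAILY_GB} GB/day"
echo "Search events:    $SEARCH_EVENTS"
echo "Long-term events: $LONGTERM_EVENTS"
```

With the sample inputs this reproduces the table above: 1,728,000,000 daily events and roughly 966 GB of daily ingestion.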
Deployment Option 1: Dedicated SNYPR Data Lake

Server Type | Quantity | Model
SNYPR Application | 1 | SNYPR-S3
SNYPR Search | 2 | SNYPR-SEARCH1
Remote Ingestion Nodes | 2 | SNYPR-R3
Hadoop Master | 3 | SNYPR-M2
Compute / Storage | 8 | SNYPR-C1
Kafka Brokers | 3 | SNYPR-K2
Total Servers | 19 |
Deployment Option 2: Existing Data Lake

Existing Hadoop Recommendation

SNYPR Edge Node Server Type | Quantity | Model
SNYPR Application | 1 | SNYPR-S3
SNYPR Search | 2 | SNYPR-SEARCH1
Remote Ingestion Nodes | 2 | SNYPR-R3
Total SNYPR Edge Node Servers | 5 |

Existing Hadoop Capacity Required

Resource | Capacity
YARN Capacity | 320 vCPU, 1,229 GB RAM
HDFS Storage (3x replication) | 107 TB
Kafka Storage (2 days retention) | 45 TB
Spark Jobs Configuration for Kerberized Kafka

When running the SNYPR Spark applications in a Kerberized cluster, add the parameters below to the Spark job scripts so the jobs can connect to secure Kafka:
--driver-java-options "-
Djava.security.auth.login.config=/opt/keytabs/jaas.conf -
Djute.maxbuffer=50000000 -Dspark.driver.userClassPathFirst=true -
Dspark.executor.userClassPathFirst=true" \
--conf "spark.executor.extraJavaOptions=-
Djava.security.auth.login.config=/opt/keytabs/jaas.conf -
XX:+UseConcMarkSweepGC -
Dlog4j.configuration=./conf/log4j.properties -
Djute.maxbuffer=50000000 -Xss1G" \
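In context, the options above attach to a full spark-submit invocation. A hedged sketch of how they might be wired in; the master/deploy-mode flags, class name, and jar path are placeholders (not actual SNYPR job names), while the jaas.conf path and Java options come from the guide:

```shell
# Illustrative spark-submit wrapper for a Kerberized cluster.
# com.example.SparkJob and example-job.jar are placeholders.
spark-submit \
  --master yarn --deploy-mode client \
  --driver-java-options "-Djava.security.auth.login.config=/opt/keytabs/jaas.conf -Djute.maxbuffer=50000000 -Dspark.driver.userClassPathFirst=true -Dspark.executor.userClassPathFirst=true" \
  --conf "spark.executor.extraJavaOptions=-Djava.security.auth.login.config=/opt/keytabs/jaas.conf -XX:+UseConcMarkSweepGC -Dlog4j.configuration=./conf/log4j.properties -Djute.maxbuffer=50000000 -Xss1G" \
  --class com.example.SparkJob /opt/snypr/jobs/example-job.jar
```

The same -Djava.security.auth.login.config setting appears in both the driver and executor options because every JVM that talks to secure Kafka needs the JAAS configuration.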
Network Tuning Recommendations

The network configuration can have a dramatic performance impact on the environment. The network tuning guidance in this section can be used to optimize the network configuration for the Linux servers in the environment.
Modify Network Kernel Settings

Edit the network tuning parameters in the /etc/sysctl.conf file:
# vi /etc/sysctl.conf
Edit the following values:
# allow testing with buffers up to 128MB
net.core.rmem_max = 134217728
net.core.wmem_max = 134217728
# increase Linux autotuning TCP buffer limit to 64MB
net.ipv4.tcp_rmem = 4096 87380 67108864
net.ipv4.tcp_wmem = 4096 65536 67108864
# recommended default congestion control is htcp
net.ipv4.tcp_congestion_control=htcp
# recommended for hosts with jumbo frames enabled (only relevant for systems with 10G interfaces)
net.ipv4.tcp_mtu_probing=1
# recommended for CentOS7
net.core.default_qdisc = fq
For the above changes to take effect, reboot the server, or apply them immediately with sysctl -p.
Increase the Transmit Queue Length

Set the txqueuelen permanently:
vi /etc/rc.local
Add the following (this is the interface where you will receive data):
/sbin/ifconfig em1 txqueuelen 10000
To validate:
# ifconfig em1 | grep txque
ether 90:b1:1c:1f:e6:1b txqueuelen 10000 (Ethernet)
Location | Value
/etc/sysctl.conf | vm.swappiness = 10
/etc/security/limits.conf | hdfs - nofile 32768
/etc/security/limits.conf | mapred - nofile 32768
/etc/security/limits.conf | hbase - nofile 32768
/etc/security/limits.conf | yarn - nofile 32768
/etc/security/limits.conf | solr - nofile 32768
/etc/security/limits.conf | sqoop2 - nofile 32768
/etc/security/limits.conf | spark - nofile 32768
/etc/security/limits.conf | hive - nofile 32768
/etc/security/limits.conf | impala - nofile 32768
/etc/security/limits.conf | hue - nofile 32768
/etc/security/limits.conf | kafka - nofile 32768
/etc/security/limits.conf | hdfs - nproc 32768
/etc/security/limits.conf | mapred - nproc 32768
/etc/security/limits.conf | hbase - nproc 32768
/etc/security/limits.conf | yarn - nproc 32768
/etc/security/limits.conf | solr - nproc 32768
/etc/security/limits.conf | sqoop2 - nproc 32768
/etc/security/limits.conf | spark - nproc 32768
/etc/security/limits.conf | hive - nproc 32768
/etc/security/limits.conf | impala - nproc 32768
/etc/security/limits.conf | hue - nproc 32768
/etc/security/limits.conf | kafka - nproc 32768
/etc/security/limits.d/20-nproc.conf | hdfs - nproc 32768
/etc/security/limits.d/20-nproc.conf | mapred - nproc 32768
/etc/security/limits.d/20-nproc.conf | hbase - nproc 32768
/etc/security/limits.d/20-nproc.conf | yarn - nproc 32768
/etc/security/limits.d/20-nproc.conf | solr - nproc 32768
/etc/security/limits.d/20-nproc.conf | sqoop2 - nproc 32768
/etc/security/limits.d/20-nproc.conf | spark - nproc 32768
/etc/security/limits.d/20-nproc.conf | hive - nproc 32768
/etc/security/limits.d/20-nproc.conf | impala - nproc 32768
/etc/security/limits.d/20-nproc.conf | hue - nproc 32768
/etc/security/limits.d/20-nproc.conf | kafka - nproc 32768
/sys/kernel/mm/transparent_hugepage/defrag | echo never
/sys/kernel/mm/transparent_hugepage/enabled | echo never
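Since every service account in the table gets identical nofile/nproc limits, the entries can be generated with a short loop rather than typed by hand. A sketch; the account list is taken from the table, so trim it to the services actually installed:

```shell
#!/bin/sh
# Emit the limits.conf entries from the table for each Hadoop service account.
USERS="hdfs mapred hbase yarn solr sqoop2 spark hive impala hue kafka"
for u in $USERS; do
  echo "$u - nofile 32768"   # max open file descriptors
  echo "$u - nproc 32768"    # max processes/threads
done
# Review the output, then append it to /etc/security/limits.conf
# (and the nproc lines to /etc/security/limits.d/20-nproc.conf).
```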
Proposed Configuration Tuning

jetty.conf for SNYPR Search: increase the default timeout from 50K ms to 180K ms.
/etc/sysctl.conf
# --------------------------------------------------------------------
# The following allow the server to handle lots of connection requests
# --------------------------------------------------------------------
# Increase number of incoming connections that can queue up
# before dropping
net.core.somaxconn = 50000
# Handle SYN floods and large numbers of valid HTTPS connections
net.ipv4.tcp_max_syn_backlog = 30000
# Increase the length of the network device input queue
net.core.netdev_max_backlog = 20000
# Increase system file descriptor limit so we will (probably)
# never run out under lots of concurrent requests.
# (Per-process limit is set in /etc/security/limits.conf)
fs.file-max = 100000
# Widen the port range used for outgoing connections
net.ipv4.ip_local_port_range = 10000 65000
# If your servers talk UDP, also up these limits
net.ipv4.udp_rmem_min = 8192
net.ipv4.udp_wmem_min = 8192
# --------------------------------------------------------------------
# The following help the server efficiently pipe large amounts of data
# --------------------------------------------------------------------
# Disable source routing and redirects
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.all.accept_source_route = 0
# Disable packet forwarding.
net.ipv4.ip_forward = 0
net.ipv6.conf.all.forwarding = 0
# Disable TCP slow start on idle connections
net.ipv4.tcp_slow_start_after_idle = 0
# Turn on the tcp_window_scaling
net.ipv4.tcp_window_scaling = 1
# Turn on the tcp_timestamps
net.ipv4.tcp_timestamps = 1
# Turn on the tcp_sack
net.ipv4.tcp_sack = 1
# Change Congestion Control (default: reno)
net.ipv4.tcp_congestion_control=htcp
# Increase Linux autotuning TCP buffer limits
# Set max to 16MB for 1GE and 32M (33554432) or 54M (56623104) for 10GE
# Don't set tcp_mem itself! Let the kernel scale it based on RAM.
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.core.rmem_default = 16777216
net.core.wmem_default = 16777216
net.core.optmem_max = 40960
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 87380 16777216
# --------------------------------------------------------------------
# The following allow the server to handle lots of connection churn
# --------------------------------------------------------------------
# Disconnect dead TCP connections after 1 minute
net.ipv4.tcp_keepalive_time = 60
# Wait a maximum of 5 * 2 = 10 seconds in the TIME_WAIT state after a FIN, to handle
# any remaining packets in the network.
net.netfilter.nf_conntrack_tcp_timeout_time_wait = 10
# How long to keep ESTABLISHED connections in conntrack table
# Should be higher than tcp_keepalive_time + tcp_keepalive_probes * tcp_keepalive_intvl
net.netfilter.nf_conntrack_tcp_timeout_established = 300
net.netfilter.nf_conntrack_generic_timeout = 300
# Allow a high number of timewait sockets
net.ipv4.tcp_max_tw_buckets = 2000000
# Timeout broken connections faster (amount of time to wait for FIN)
net.ipv4.tcp_fin_timeout = 10
# Let the networking stack reuse TIME_WAIT connections when it thinks it's safe to do so
net.ipv4.tcp_tw_reuse = 1
# Determines the wait time between isAlive interval probes (reduce from 75 sec to 15)
net.ipv4.tcp_keepalive_intvl = 15
# Determines the number of probes before timing out (reduce from 9 to 5)
net.ipv4.tcp_keepalive_probes = 5
# -------------------------------------------------------------
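The conntrack guidance above requires the established-connection timeout to exceed tcp_keepalive_time + tcp_keepalive_probes * tcp_keepalive_intvl. With the values in this file, the check is simple arithmetic:

```shell
#!/bin/sh
# Verify the conntrack established timeout exceeds the keepalive detection window.
KEEPALIVE_TIME=60          # net.ipv4.tcp_keepalive_time
KEEPALIVE_INTVL=15         # net.ipv4.tcp_keepalive_intvl
KEEPALIVE_PROBES=5         # net.ipv4.tcp_keepalive_probes
CONNTRACK_ESTABLISHED=300  # net.netfilter.nf_conntrack_tcp_timeout_established

WINDOW=$((KEEPALIVE_TIME + KEEPALIVE_PROBES * KEEPALIVE_INTVL))  # 60 + 5*15 = 135 s
if [ "$CONNTRACK_ESTABLISHED" -gt "$WINDOW" ]; then
  echo "OK: established timeout (${CONNTRACK_ESTABLISHED}s) > keepalive window (${WINDOW}s)"
else
  echo "WARNING: raise nf_conntrack_tcp_timeout_established above ${WINDOW}s"
fi
```

With the recommended settings the keepalive window is 135 seconds, comfortably below the 300-second established timeout.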
RIN Syslog Configuration

When the NDB (Data Broker) is used in the design, the value of the parameter below needs to be 0:
net.ipv4.conf.enp94s0f1.rp_filter = 0
The NDB is a one-way device and will not acknowledge the packets it receives, regardless of what the OS may send. For this reason, the kernel will drop packets if the above is set to 1.
Hadoop Cluster Tuning Recommendations

Table 1 lists the tuning parameters for each of the services in the Hadoop cluster that optimize Hadoop cluster performance for the SNYPR workloads.
Hadoop Cluster Performance
Service | Scope | Setting | Value | Conservative | Optimal
Yarn | All | YARN container memory | 60 GB | 60 GB | 70 GB
Yarn | All | YARN container memory maximum | 4 GB | 4 GB | 4 GB
Yarn | All | Java heap size of NodeManager | 850 MB | 1 GB | 850 MB
Yarn | All | Java heap size of ResourceManager | 2 GB | 2 GB | 2 GB
Yarn | All | ZooKeeper client timeout (zkClientTimeout) | 1 min | 1 min | 1 min
HBase | All | Java heap size of HBase Master in bytes | 1 GB | 1 GB | 1 GB
HBase | All | Java heap size of HBase Thrift in bytes | 1 GB | 1 GB | 1 GB
HBase | All | Java heap size of HBase RegionServer in bytes | 15 GB | 20 GB | 20 GB
HBase | Cloudera | hbase.rpc.timeout | 15 min | 10 min | 15 min
HBase | Cloudera | HBase client scanner timeout | 15 min | 10 min | 15 min
HBase | Cloudera | RegionServer lease period | 15 min | 10 min | 15 min
HBase | All | zookeeper.session.timeout | 90000 | 90000 | 90000
HBase | Cloudera | HBase Service Advanced Configuration Snippet (Safety Valve) for hbase-site.xml | - | name: hbase.ipc.warn.response.time, value: 500 | name: hbase.ipc.warn.response.time, value: 500
HDFS | All | Java heap size of NameNode in bytes | 8 GB | 8 GB | 16 GB
HDFS | All | Java heap size of DataNode in bytes | 8 GB | 8 GB | 8 GB
HDFS | All | Maximum concurrent moves | 300 | 300 | 300
HDFS | All | DataNode balancing bandwidth | 10 MB default, 1 GB optional | 10 MB default, 1 GB optional | 10 MB default, 1 GB optional
HDFS | All | Maximum memory used for caching | 2 GB | 2 GB | 2 GB
HDFS | All | Maximum number of transfer threads | 16000 | 16000 | 16000
Impala | Cloudera | Java heap size of Catalog Server in bytes | 2 GB | 4 GB | 2 GB
Impala | Cloudera | Impala Daemon memory limit | 12 GB | 20 GB | 12 GB
Spark | All | Java heap size of History Server in bytes | 512 MB | 512 MB | 512 MB
Spark 2 | All | Java heap size of History Server in bytes | 512 MB | 512 MB | 512 MB
Hive | All | Spark executor maximum Java heap size | 256 MB | 256 MB | 256 MB
Hive | All | Spark driver maximum Java heap size | 256 MB | 256 MB | 256 MB
Hive | All | Spark driver memory overhead | 26 MB | 256 MB | 26 MB
Hive | All | Spark executor memory overhead | 26 MB | 256 MB | 26 MB
Hive | All | Java heap size of Hive Metastore Server in bytes | 4 GB | 4 GB | 4 GB
Kafka | All | Maximum message size (message.max.bytes) | 10 MiB | 10 MiB | 10 MiB
Kafka | All | Replica maximum fetch size (replica.fetch.max.bytes) | 15 MiB | 15 MiB | 15 MiB
Kafka | All | Kafka Broker logging level | - | ERROR | ERROR
Kafka | All | Kafka MirrorMaker logging threshold | - | ERROR | ERROR
Kafka | All | ZooKeeper session timeout (zookeeper.session.timeout.ms) | 6 s | 6 s | 6 s
Kafka | All | Number of replica fetchers | 2 | 2 | 4
Kafka | All | Open file limit (maximum file descriptors) | 100000 | 100000 | 100000
Kafka | All | Java heap size of Broker | 8 GB | 8 GB | 8 GB
Kafka | All | Data retention hours (log.retention.hours) | 7 days | 7 days | 7 days
Kafka | All | Kafka Broker Advanced Configuration Snippet (Safety Valve) for kafka.properties | - | see property list below | see property list below
HDFS | All | Blocks with corrupt replicas monitoring thresholds | warning: 0.5, critical: 1 | warning: 0.5, critical: 1 | warning: 0.5, critical: 1
HDFS | All | Under-replicated block monitoring thresholds | warning: 10, critical: 40 | warning: 10, critical: 40 | warning: 10, critical: 40
HDFS | All | Replication factor | 2 | 2 | 2
HDFS | All | Minimal block replication | 1 | 1 | 1
HDFS | All | Maximal block replication | 512 | 512 | 512
HDFS | Cloudera | Safemode threshold percentage | 0.999 | 0.999 | 0.999
Impala | Cloudera | Dump heap when out of memory | disabled | disabled | disabled
Kafka | Cloudera | Dump heap when out of memory | disabled | disabled | disabled
Spark | Cloudera | Dump heap when out of memory | disabled | disabled | disabled
Yarn | Cloudera | Dump heap when out of memory | disabled | disabled | disabled
ZooKeeper | Cloudera | Dump heap when out of memory | disabled | disabled | disabled
ZooKeeper-Kafka | Cloudera | Dump heap when out of memory | disabled | disabled | disabled
HBase | Cloudera | Dump heap when out of memory | disabled | disabled | disabled
Kafka | All | Minimum number of replicas in ISR (min.insync.replicas) | - | 1 | 1
HBase | All | Java configuration options for HBase RegionServer | - | see JVM options below | see JVM options below
ZooKeeper | Cloudera | Jute max buffer | 90 MB | 90 MB | 90 MB
ZooKeeper | All | Java heap size of ZooKeeper Server in bytes | 6 GB | 8 GB | 6 GB
ZooKeeper | All | Minimum session timeout | 8000 | 8000 | 8000
ZooKeeper | All | Maximum session timeout | 90000 | 90000 | 90000
ZooKeeper | All | Canary connection timeout | 20 seconds | 20 seconds | 20 seconds
ZooKeeper | All | Tick time (tickTime) | 4000 | 4000 | 4000
ZooKeeper | All | Maximum client connections (maxClientCnxns) | 8000 | 8000 | 8000
ZooKeeper-Kafka | All | Java heap size of ZooKeeper Server in bytes | 8 GB | 8 GB | 8 GB
ZooKeeper-Kafka | All | Tick time (tickTime) | 4000 | 2000 | 4000
ZooKeeper-Kafka | Cloudera | Jute max buffer | 50 MB | 50 MB | 50 MB
ZooKeeper-Kafka | All | Maximum client connections (maxClientCnxns) | 6000 | 6000 | 6000
ZooKeeper-Kafka | All | minSessionTimeout | 4000 | 4000 | 4000
ZooKeeper-Kafka | All | maxSessionTimeout | 90000 | 60000 | 90000
YARN | All | yarn.resourcemanager.am.max-retries, yarn.resourcemanager.am.max-attempts | 20 | 20 | 20
Impala | All | Impala Daemon safety valve | --enable_partitioned_aggregation=true --enable_partitioned_hash_join=true

Kafka Broker safety valve property list for kafka.properties (Conservative and Optimal):

num.network.threads=16
socket.send.buffer.bytes=1048576
socket.receive.buffer.bytes=1048576
socket.request.max.bytes=104857600
replica.fetch.wait.max.ms=500
replica.socket.timeout.ms=30000
replica.socket.receive.buffer.bytes=65536
replica.high.watermark.checkpoint.interval.ms=5000
controller.socket.timeout.ms=30000
controller.message.queue.size=10
zookeeper.sync.time.ms=2000
queued.max.requests=16
fetch.purgatory.purge.interval.requests=100
producer.purgatory.purge.interval.requests=100
authorizer.class.name=kafka.security.auth.SimpleAclAuthorizer
allow.everyone.if.no.acl.found=true

HBase RegionServer JVM options (Conservative and Optimal):

-XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled -XX:ParallelGCThreads=20 -XX:ConcGCThreads=15 -XX:+UnlockExperimentalVMOptions -XX:G1MixedGCLiveThresholdPercent=85 -XX:G1HeapWastePercent=2 -XX:InitiatingHeapOccupancyPercent=35 -XX:+PrintReferenceGC -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=20M -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime -Xloggc:/var/log/hbase/gc.log
Hadoop Cluster Log Configuration
Service | Level | Property
HBase | ERROR | Gateway Logging Threshold
HBase | ERROR | HBase REST Server Logging Threshold
HDFS | ERROR | DataNode Logging Threshold
HDFS | ERROR | Failover Controller Logging Threshold
HDFS | ERROR | Gateway Logging Threshold
HDFS | ERROR | HttpFS Logging Threshold
HDFS | ERROR | JournalNode Logging Threshold
HDFS | ERROR | NFS Gateway Logging Threshold
HDFS | ERROR | NameNode Block State Change Logging Threshold
HDFS | ERROR | NameNode Logging Threshold
HDFS | ERROR | SecondaryNameNode Logging Threshold
Hive | ERROR | Gateway Logging Threshold
Hive | ERROR | Hive Metastore Server Logging Threshold
Hive | ERROR | HiveServer2 Logging Threshold
Hive | ERROR | WebHCat Server Logging Threshold
Impala | ERROR | Impala Catalog Server Logging Threshold
Impala | ERROR | Impala Daemon Logging Threshold
Impala | ERROR | Impala Llama ApplicationMaster Logging Threshold
Impala | ERROR | Impala StateStore Logging Threshold
Kafka | ERROR | Gateway Logging Threshold
Kafka | ERROR | Kafka Broker Logging Threshold
Kafka | ERROR | Kafka MirrorMaker Logging Threshold
Key Value Store | ERROR | Lily HBase Indexer Logging Threshold
Oozie | ERROR | Oozie Server Logging Threshold
Spark | ERROR | Shell Logging Threshold
Spark | ERROR | Gateway Logging Threshold
YARN | ERROR | History Server Logging Threshold
YARN | ERROR | Gateway Logging Threshold
YARN | ERROR | JobHistory Server Logging Threshold
YARN | ERROR | NodeManager Logging Threshold
YARN | ERROR | ResourceManager Logging Threshold
Zookeeper | ERROR | Server Logging Threshold
Cloudera Manager | ERROR | Activity Monitor Logging Threshold
Cloudera Manager | ERROR | Alert Publisher Logging Threshold
Cloudera Manager | ERROR | Event Server Logging Threshold
Cloudera Manager | ERROR | Host Monitor Logging Threshold
Cloudera Manager | ERROR | Service Monitor Logging Threshold
Remote Ingestion Node Tuning

The configurations below have been tested to support TCP connections from 3,000 to 5,000 hosts providing a continuous stream of data. The NIC on the server is 10G to support the increased load.
If the data is forwarded from a SIEM to the RIN, the number of TCP connections made is minimal (fewer than 50). In that scenario, a high number of connections is not a bottleneck and aggressive tuning is not suggested. For high-EPS environments, dedicated resources are required.
Reference Server Used is a VM

Compare your server to the reference server. Reference lscpu output:

- Architecture: x86_64
- CPU op-mode(s): 32-bit, 64-bit
- Byte Order: Little Endian
- CPU(s): 8
- On-line CPU(s) list: 0-7
- Thread(s) per core: 1
- Core(s) per socket: 2
- Socket(s): 4
- NUMA node(s): 1
- Vendor ID: GenuineIntel
- CPU family: 6
- Model: 58
- Model name: Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz
- Stepping: 0
- CPU MHz: 2700.000
- BogoMIPS: 5400.00
- Hypervisor vendor: VMware
- Virtualization type: full
- L1d cache: 32K
- L1i cache: 32K
- L2 cache: 256K
- L3 cache: 30720K
- NUMA node0 CPU(s): 0-7
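Comparing a candidate VM to this reference can be scripted by parsing the same lscpu fields. A sketch; the reference value is taken from the list above:

```shell
#!/bin/sh
# Compare the local VM's CPU count against the reference RIN server (8 vCPUs).
REF_CPUS=8
cpus=$(lscpu | awk -F: '/^CPU\(s\):/ {gsub(/ /, "", $2); print $2; exit}')
echo "This server: ${cpus} vCPUs (reference: ${REF_CPUS})"
if [ "${cpus:-0}" -lt "$REF_CPUS" ]; then
  echo "WARNING: fewer vCPUs than the reference server; expect lower sustainable EPS"
fi
```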
Server Preparation

Adding network monitoring tools is recommended for gathering statistics and debugging when errors are present. Requesting dedicated resources ensures optimal performance for the collectors.
Dedicated resources, and VM configurations that set the latency sensitivity to High (Edit Settings > VM Options > Latency Sensitivity), are essential in any of the following scenarios:

- Unfiltered EPS > 10K
- Inbound TCP connections > 10
- Complex filters
Recommended Tools for Network Statistics

- Install netstat: rpm -ivh net-tools-1.60-114.el6.x86_64.rpm. For more rpm packages: https://rpmfind.net/linux/rpm2html/search.php?query=%2Fbin%2Fnetstat
- Install ethtool: rpm -ivh ethtool-3.5-6.el6.x86_64.rpm. For additional rpm packages: http://fr2.rpmfind.net/linux/rpm2html/search.php?query=ethtool
Tune Server Network Parameters

Edit the sysctl.conf file:
vi /etc/sysctl.conf
Add the following parameters to sysctl.conf:
# --------------------------------------------------------------------
# The following allow the server to handle lots of connection requests
# --------------------------------------------------------------------
# Increase number of incoming connections that can queue up
# before dropping
net.core.somaxconn = 50000
# Handle SYN floods and large numbers of valid HTTPS connections
net.ipv4.tcp_max_syn_backlog = 30000
# Increase the length of the network device input queue
net.core.netdev_max_backlog = 20000
# Increase system file descriptor limit so we will (probably)
# never run out under lots of concurrent requests.
# (Per-process limit is set in /etc/security/limits.conf)
fs.file-max = 100000
# Widen the port range used for outgoing connections
net.ipv4.ip_local_port_range = 10000 65000
# If your servers talk UDP, also up these limits
net.ipv4.udp_rmem_min = 8192
net.ipv4.udp_wmem_min = 8192
# --------------------------------------------------------------------
# The following help the server efficiently pipe large amounts of data
# --------------------------------------------------------------------
# Disable TCP slow start on idle connections
net.ipv4.tcp_slow_start_after_idle = 0
# Keep tcp_window_scaling on
net.ipv4.tcp_window_scaling = 1
# Turn off the tcp_timestamps
net.ipv4.tcp_timestamps = 0
# Turn off the tcp_sack
net.ipv4.tcp_sack = 0
# Increase Linux autotuning TCP buffer limits
# Don't set tcp_mem itself! Let the kernel scale it based on RAM.
# allow testing with buffers up to 128MB
net.core.rmem_max = 134217728
net.core.wmem_max = 134217728
# increase Linux autotuning TCP buffer limit to 64MB
net.ipv4.tcp_rmem = 4096 87380 67108864
net.ipv4.tcp_wmem = 4096 65536 67108864
# recommended default congestion control is htcp
net.ipv4.tcp_congestion_control=htcp
# recommended for hosts with jumbo frames enabled (only relevant for systems with 10G interfaces)
net.ipv4.tcp_mtu_probing=1
Reload sysctl configurations:
sysctl -p
Increase the transmit queue length for 10G NICs:

/sbin/ifconfig <interface where you will receive data> txqueuelen 10000
Set the txqueuelen permanently:

vi /etc/rc.local

Add the following (this is the interface where you will receive data):

/sbin/ifconfig <interface where you will receive data> txqueuelen 10000
Syslog-NG Configurations
High EPS Environment
## For improving performance with lots of connections:
## max_connections = active_connections
## log_iw_size = number of active_connections * EPS
## log_fetch_limit = 10000
## flush_lines = 10000
## log_fifo_size = log_iw_size * 10
##
## For improving performance with a few connections but a high amount of traffic:
## log_iw_size = number of active_connections * 100,000,
##   or number of active_connections * EPS, whichever is greater
## log_fetch_limit = number of active_connections * 100,000,
##   or number of active_connections * EPS, whichever is greater
## log_fifo_size = log_iw_size * 10
## flush_lines = 10,000 or greater

options {
    ## Specifies how many lines are flushed to a destination at a time.
    ## The syslog-ng OSE application waits for this number of lines to
    ## accumulate and sends them off in a single batch. Increasing this number
    ## increases throughput as more messages are sent in a single batch, but
    ## also increases message latency.
    flush_lines (10000);

    ## Enable syslog-ng OSE to run in multithreaded mode and use multiple CPUs
    threaded(yes);

    ## The time to wait in seconds before an idle destination file is closed
    time-reap(3);

    ## The time to wait in seconds before a dead connection is reestablished
    time-reopen(2);

    ## MARK messages are generated when there is no message traffic, to inform
    ## the receiver that the connection is still alive. The destination driver
    ## drops all MARK messages. If an explicit mark-mode() is not given to the
    ## drivers where none is the default value, then none will be used.
    mark-mode(none);

    ## STATS are log messages sent by syslog-ng, containing statistics about
    ## dropped log messages. Set to 0 to disable the STATS messages.
    stats-freq(0);

    ## If a client sends the log message directly to the syslog-ng server, the
    ## chain-hostnames() option is enabled on the server, and the client sends
    ## a hostname in the message that is different from its DNS hostname (as
    ## resolved from DNS by the syslog-ng server), then the server can append
    ## the resolved hostname to the hostname in the message (separated with a
    ## / character) when the message is written to the destination.
    chain_hostnames (off);

    ## Use DNS for name resolution
    use_dns (no);
    dns_cache(no);
    use_fqdn (no);

    ## Create directories and set permissions for files getting generated.
    ## Set the permission for directories where the file will be read from.
    create_dirs (yes);
    keep_hostname (yes);
    log_msg_size(10000);
    dir_owner(securonix);
    dir_group(securonix);
    owner(securonix);
    group(securonix);
    dir_perm(0775);
    perm(0775);

    ## From TCP and unix-stream sources, syslog-ng reads a maximum of
    ## log-fetch-limit() messages from every connection of the source
    log-fetch-limit(1000);
};

## Sizing and setting buffer sizes: the number of connections to the source is
## set using the max-connections() parameter.
source s_network { network(transport("tcp") ip(0.0.0.0) port(10517) max-connections(10000) keep-alive(yes) so_rcvbuf(161920000) log-iw-size(161920000)); };

## Every destination has an output buffer (log-fifo-size()).
destination d_file { file("/opt/Ingester/import/in/windows/windows_$R_YEAR$R_MONTH$R_DAY$R_HOUR$R_MIN.log" log-fifo-size(1619200000000)); };

## Flow-control uses a control window to determine if there is free space in
## the output buffer for new messages. Every source has its own control
## window; the log-iw-size() parameter sets the size of the control window.
## Add in filters as per requirements.
log {
    source(s_network);
    destination(d_file);
    ## If the output buffer becomes full and flow-control is not used,
    ## messages may be lost. Comment out if causing impact on the source
    ## systems' buffer.
    flags(flow-control);
};
Low EPS Environment
options {
    flush_lines (1000);
    threaded(yes);
    time-reap(3);
    time-reopen(2);
    mark-mode(none);
    stats-freq(0);
    chain_hostnames (off);
    use_dns (no);
    dns_cache(no);
    use_fqdn (no);
    create_dirs (yes);
    keep_hostname (yes);
    log_msg_size(10000);
    dir_owner(securonix);
    dir_group(securonix);
    owner(securonix);
    group(securonix);
    dir_perm(0775);
    perm(0775);
    log-fetch-limit(1000);
};

source s_network { network(transport("tcp") ip(0.0.0.0) port(10517) max-connections(1000) keep-alive(yes)); };

## Every destination has an output buffer (log-fifo-size()).
destination d_file { file("/opt/Ingester/import/in/windows/windows_$R_YEAR$R_MONTH$R_DAY$R_HOUR$R_MIN.log"); };

## Flow-control uses a control window to determine if there is free space in
## the output buffer for new messages. Every source has its own control
## window; the log-iw-size() parameter sets the size of the control window.
## Add in filters as per requirements.
log {
    source(s_network);
    destination(d_file);
    ## If the output buffer becomes full and flow-control is not used,
    ## messages may be lost. Comment out if causing impact on the source
    ## systems' buffer.
    flags(flow-control);
};
Performance Scenarios

For improving performance with a lot of connections:
max_connections = active_connections
log_iw_size = number of active_connections * EPS
log_fetch_limit = 10000
flush_lines = 10000
log_fifo_size = log_iw_size * 10
For improving performance with few connections, but a high amount of traffic:
log_iw_size = number of active connections * 100,000, or number of active connections * EPS, whichever is greater
log_fetch_limit = number of active connections * 100,000, or number of active connections * EPS, whichever is greater
log_fifo_size = log_iw_size * 10
flush_lines = 10,000 or greater
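The sizing rules above can be wrapped in a small calculator. A sketch for the few-connections, high-traffic case; the input numbers are examples only:

```shell
#!/bin/sh
# Compute syslog-ng buffer sizes for few connections with high traffic.
ACTIVE_CONNECTIONS=50
EPS=20000

a=$((ACTIVE_CONNECTIONS * 100000))   # connections * 100,000
b=$((ACTIVE_CONNECTIONS * EPS))      # connections * EPS
# Take whichever is greater, per the rule above.
if [ "$b" -gt "$a" ]; then log_iw_size=$b; else log_iw_size=$a; fi
log_fifo_size=$((log_iw_size * 10))

echo "log_iw_size   = $log_iw_size"
echo "log_fifo_size = $log_fifo_size"
```

With 50 connections at 20,000 EPS the connections * 100,000 term dominates, giving log_iw_size = 5,000,000 and log_fifo_size = 50,000,000.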
Best Practices
Data Collection

The fastest way the syslog-ng application can receive log messages from the network is using plain TCP transport with the network() source driver. By default, syslog-ng runs in multithreaded mode to scale to multiple CPUs or cores for increased performance.
A TCP-based network source will scale based on the number of active connections.
This means that if there are 10 incoming TCP connections all coming to the samenetwork source, then that source can use 10 threads, one thread for each connection.
A higher stats_level decreases performance. For example, stats_level(2) costs about 10% in performance.
Data Processing and Filtering

Message processors (such as filters, rewrite rules, and parsers) are executed sequentially by the reader thread.
Simple filtering (for example, filtering on facility or tag) has no impact on performance at all. However, regular expressions, even simple ones, significantly decrease the message-processing rate, by about 40-45%.
Use the simplest filters possible when filtering incoming messages. If a message can be filtered with several types of filters, check the measured data: when a message is filtered with a regexp, syslog-ng performance can drop to 55-60% of the original level, whereas tag or facility filters cause no decrease in performance.
When using multiple filters one after the other, or connecting filters with the logical AND/OR operators, the order of filters has a significant impact on performance. Place the filters that are most likely to match the incoming log messages at the top of the configuration file.
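As an illustration of this ordering, the sketch below applies a cheap facility filter before a more expensive message match. The filter names (f_auth, f_pattern) and the pattern are hypothetical examples, not part of the SNYPR configuration:

```
filter f_auth    { facility(auth); };              ## cheap: evaluated first
filter f_pattern { message("Failed password"); };  ## more expensive match

log {
    source(s_network);
    filter(f_auth);      ## most messages are rejected here, cheaply
    filter(f_pattern);   ## the costly match only runs on the remainder
    destination(d_file);
};
```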
Data Connections
If there are several thousand active connections simultaneously, it is advised to place relay syslog-ng instances on another computer in front of the syslog-ng server. Switching between active connections is time-consuming, while the amount of incoming messages per connection is usually not significant. Relays solve this problem because they consolidate the logs: syslog-ng can easily handle a large volume of log messages sent over a few connections.
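A relay tier can be sketched as a syslog-ng instance that accepts the many client connections and forwards them to the central server over a small number of connections. The address 10.0.0.10 and port 514 below are placeholders:

```
## On the relay: fan in thousands of client connections,
## forward to the central syslog-ng server over one connection.
source s_clients { network(transport("tcp") port(514)); };
destination d_central { network("10.0.0.10" transport("tcp") port(514)); };
log { source(s_clients); destination(d_central); };
```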
For the system to support a large number of TCP connections, irrespective of the EPS, a dedicated 10G NIC is preferred. NIC bonding can be carried out if the VM cannot provide a dedicated 10G NIC.
Debugging
Obtain a pcap file at the relevant interface and port to identify the source of the problem:
tcpdump -i eth0 -nnAs0 tcp dst port 517 -w /Securonix/tcpdump-06052018-183200.pcap
Common Errors
Rx Drops
Rx drops indicate a network issue. This could mean a faulty network, a faulty cable, or a bad interface.
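Rx drops can be confirmed from the interface statistics. The snippet below parses a captured sample of `ip -s link` output so the extraction step is reproducible; the interface counters shown are illustrative. On a live node, replace the embedded sample with the real command output.

```shell
# Sketch: extract the 'dropped' counter from `ip -s link show eth0` output.
# The sample text is embedded for illustration; on a live node use:
#   stats=$(ip -s link show eth0 | grep -A1 'RX:')
stats='    RX:  bytes packets errors dropped  missed   mcast
    123456789  987654      0      42       0       0'

# The counters appear on the line after the RX: header; 'dropped' is field 4.
rx_dropped=$(printf '%s\n' "$stats" | awk 'NR==2 {print $4}')
echo "rx_dropped=$rx_dropped"
```

A steadily increasing counter across repeated samples, rather than a single nonzero reading, is the signal worth investigating.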
Interface Not Sending ACK
This scenario implies that there is contention on the NIC and the existing NIC is unable to handle the load.
Files Not Getting Created
This scenario implies either a configuration error in syslog-ng or that the file-handle limit for the user creating the files has been reached.
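The open-file limit can be checked with ulimit, run as the user that owns the syslog-ng process. A low limit (for example, 1024) is easily exhausted when many destination files and network connections are open at once:

```shell
# Sketch: show the per-process open-file limit for the current shell.
# Run as the user that owns the syslog-ng process.
nofile=$(ulimit -n)
echo "open-file limit: $nofile"
```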
References
Reason to turn off tcp_sack: http://rtodto.net/effect-of-tcp-sack-on-throughput/
Reason to turn off tcp_timestamps: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux_for_real_time/7/html/tuning_guide/reduce_tcp_performance_spikes
Result of modifying Latency Sensitivity: https://www.brianjgraf.com/2016/10/12/enabling-latency-sensitivity-option-on-vms-should-i-do-it/
Results of tuning syslog-ng: https://syslog-ng.com/documents/html/syslog-ng-pe-6.0-guides/en/syslog-ng-pe-v6.0-performance-whitepaper/pdf/syslog-ng-pe-v6.0-performance-whitepaper.pdf
syslog-ng flow-control guidelines: https://syslog-ng.com/documents/html/syslog-ng-ose-latest-guides/en/syslog-ng-ose-guide-admin/html/configuring-flow-control.html