Harnessing the Power of Splunk and Google Cloud: Deploy, Ingest, and Beyond

Alex Cain, Senior Product Manager | Splunk
Roy Arsan, Partner Engineer | Google Cloud
© 2019 SPLUNK INC.
During the course of this presentation, we may make forward-looking statements regarding future events or
the expected performance of the company. We caution you that such statements reflect our current
expectations and estimates based on factors currently known to us and that actual events or results could
differ materially. For important factors that may cause actual results to differ from those contained in our
forward-looking statements, please review our filings with the SEC.
The forward-looking statements made in this presentation are being made as of the time and date of its live
presentation. If reviewed after its live presentation, this presentation may not contain current or accurate
information. We do not assume any obligation to update any forward-looking statements we may make. In
addition, any information about our roadmap outlines our general product direction and is subject to change
at any time without notice. It is for informational purposes only and shall not be incorporated into any contract
or other commitment. Splunk undertakes no obligation either to develop the features or functionality
described or to include any such feature or functionality in a future release.
Splunk, Splunk>, Listen to Your Data, The Engine for Machine Data, Splunk Cloud, Splunk Light and SPL are trademarks and registered trademarks of Splunk Inc. in
the United States and other countries. All other brand names, product names, or trademarks belong to their respective owners. © 2019 Splunk Inc. All rights reserved.
Forward-Looking Statements
Deploying Splunk - GCP: Architecture first. The Splunk Validated Architectures White Paper

▶ This White Paper is an excellent starting point for a Splunk deployment, regardless of the underlying infrastructure.
• Use this document to map requirements to architectures and best practices.
▶ https://www.splunk.com/pdfs/technical-briefs/splunk-validated-architectures.pdf
Deploying Splunk - GCP: GCP basics. The GCP Splunk Deployment Tech Brief

▶ This Tech Brief contains GCP-specific best practices and recommendations.
▶ https://www.splunk.com/pdfs/technical-briefs/deploying-splunk-enterprise-on-google-cloud-platform.pdf
Google Cloud Infrastructure: Largest network of any public cloud provider

[World map slide: Google network with >100 edge points of presence and >800 global cache edge nodes.]

Network sea cable investments:
• Unity (US, JP) 2010
• SJC (JP, HK, SG) 2013
• FASTER (US, JP, TW) 2016
• Monet (US, BR) 2017
• Tannat (BR, UY, AR) 2017
• Junior (Rio, Santos) 2017
• PLCN (HK, LA) 2018
• Indigo (SG, ID, AU) 2019

Regions (and number of zones): Oregon (3), Los Angeles (3), Iowa (4), S Carolina (3), N Virginia (3), Montreal (3), São Paulo (3), London (3), Belgium (3), Netherlands (3), Frankfurt (3), Zurich (3), Finland (3), Mumbai (3), Singapore (3), Hong Kong (3), Taiwan (3), Tokyo (3), Osaka (3), Sydney (3)

https://peering.google.com
https://cloud.google.com/about/locations
https://cloud.google.com/compute/docs/regions-zones/regions-zones
GCP Networking

▶ VPC network is global
▶ Subnet spans entire region
▶ VM private IP address is regional
▶ Routing table is global

[Diagram: one Project containing one VPC. Region X has Subnet X1 (10.0.0.0/24) with VMs in Zone A and Zone B; Region Y has Subnet Y1 (172.16.0.0/24) with VMs in Zone A and Zone B. Each VM has a private IP from its subnet (10.x / 172.16.x) and a public IP (203.x).]

Routing table:

Destination     Next hop         Network
172.16.0.0/24   Virtual network  default
10.0.0.0/24     Virtual network  default
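The subnet layout above can be sketched in a few lines of code. This is an illustrative model only (the subnet names and "internet gateway" fallback are assumptions, not GCP API objects): it shows how the global routing table resolves a destination IP to the regional subnet that contains it.

```python
import ipaddress

# Hypothetical route table mirroring the slide: one subnet per region,
# both reachable via the same global VPC routing table.
ROUTES = {
    ipaddress.ip_network("10.0.0.0/24"): "subnet-x1 (Region X)",
    ipaddress.ip_network("172.16.0.0/24"): "subnet-y1 (Region Y)",
}

def lookup(ip: str) -> str:
    """Return the route entry whose destination contains ip, else the default."""
    addr = ipaddress.ip_address(ip)
    for net, target in ROUTES.items():
        if addr in net:
            return target
    return "default (internet gateway)"

print(lookup("10.0.0.5"))    # resolves to the Region X subnet
print(lookup("172.16.0.9"))  # resolves to the Region Y subnet
```

Because the VPC and its routing table are global, a VM in Region X reaches a VM in Region Y over private IPs with no extra peering or VPN configuration.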
GCP Compute: GCP offers 4 kinds of scalable computing

Google Compute Engine (GCE): Virtual machines, networks (IaaS)
Google Kubernetes Engine (GKE): Managed Docker containers (CaaS)
Google App Engine (GAE): Serverless app platform (PaaS)
Google Cloud Functions: Serverless functions (FaaS)
Google Compute Engine: Some notable features…

▶ VMs
• Predefined and custom machine types
• Live migration
• Managed instance group (zonal or regional)
▶ Billing
• Per-second billing
• Sustained use discounts (up to 30%)
• Committed use discounts (up to 57% or 70%)
Google Compute Engine - Machine Types for Splunk Enterprise workload

Indexers:
Instance Type    Daily Volume (GB)
n1-standard-16   Up to 100
n1-standard-32   100-250

Search Heads:
Instance Type    Concurrent Users   Performance
n1-standard-16   8                  Good
n1-standard-32   16                 Better

Deployment Server, License or Cluster Master:
Instance Type    Performance
n1-highcpu-8     Good
n1-highcpu-16    Better
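The indexer table above maps naturally to a small sizing helper. This is a sketch under the table's assumptions only (one n1-standard-32 per 250 GB/day beyond the single-indexer range); real sizing also depends on search load and replication.

```python
import math

def recommend_indexer(daily_gb: float) -> str:
    """Map daily ingest volume to a machine type, per the table above."""
    if daily_gb <= 100:
        return "n1-standard-16"
    if daily_gb <= 250:
        return "n1-standard-32"
    # Beyond 250 GB/day, scale horizontally with more n1-standard-32 indexers
    # (a simplifying assumption; clustering and search load also matter).
    count = math.ceil(daily_gb / 250)
    return f"{count} x n1-standard-32"

print(recommend_indexer(80))   # small deployment
print(recommend_indexer(500))  # horizontal scale-out
```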
GCP Storage: Storage and databases

Cloud Storage, Persistent Disk, Cloud Filestore, Cloud Bigtable, Cloud Datastore, Cloud SQL, Cloud Spanner, Cloud Memorystore
GCP Storage: Google Compute Engine – Storage types

▶ Block Storage
• Local SSD vs Persistent Disk
• Standard vs SSD Persistent Disks
• Zonal vs Regional Persistent Disks
▶ Object Storage
• Google Cloud Storage (GCS)
• Standard vs Nearline vs Coldline
• Regional vs Multi-Regional
GCP Storage: Some notable features…

▶ Persistent Disk (PD)
• High performance, low latency (single-digit ms for SSD)
• Durable. Persists if instance dies - can be re-attached
• Up to 64 TB per disk – no RAID required
• Online or live resizing with no downtime
• Regional PD for added redundancy and HA
▶ Local SSD
• Very high throughput, lowest latency
• Data is lost when instance terminates
• 375 GB each - can attach up to 8 for a total of 3 TB
▶ Cloud Storage (GCS)
• Snapshots (backups) are global

Storage Type      Cost per GB/Month
PD Standard       $0.04
Regional PD       $0.08
PD SSD            $0.17
Regional PD SSD   $0.34
Local SSD         $0.08
GCS Standard      $0.02

Listed pricing (us-central1) does not include any discounts, and is subject to change. See latest pricing at https://cloud.google.com/pricing/list
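The list prices above make back-of-the-envelope cost comparisons easy. A minimal sketch, using only the prices in the table (no sustained-use or committed-use discounts), for a hypothetical indexer with SSD PD hot/warm storage and Standard PD cold storage:

```python
# Listed us-central1 prices from the table above, USD per GB-month.
PRICE_PER_GB_MONTH = {
    "pd-standard": 0.04,
    "pd-ssd": 0.17,
    "local-ssd": 0.08,
    "gcs-standard": 0.02,
}

def monthly_cost(gb_by_type: dict) -> float:
    """Sum list-price monthly cost for a mix of storage types (no discounts)."""
    return sum(PRICE_PER_GB_MONTH[t] * gb for t, gb in gb_by_type.items())

# e.g. 4 TB PD SSD for hot/warm plus 16 TB PD Standard for cold:
cost = monthly_cost({"pd-ssd": 4096, "pd-standard": 16384})
print(round(cost, 2))
```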
Storage Best Practices

▶ Consider local SSD only with clustering
• Limited to 3 TB / indexer
• Must manage striping of local SSDs
▶ Use PD (SSD or Standard) for all other cases
• Peak* IOPS & throughput at only 4 TB PD SSD
• Can dynamically resize indexer storage
• Can use SSD PD for hot/warm and Standard PD for cold storage
▶ Use PD SSD for boot device
• At least 50 GB in size for performance

*Assumes higher core-count (32+) VMs. See latest performance at https://cloud.google.com/compute/docs/disks/performance
Best Practices for HA/DR: Regional High Availability

▶ Regional Managed Instance Group
• For clustered nodes
• Search Head cluster, Indexer cluster
• Health checking, consistency, automatic failover
▶ Failover using Regional PD
• For single-node roles
• Cluster Master, Deployer, etc.
• Fast RPO & RTO, better than snapshots; automatic failover
▶ Set up a snapshot schedule for PDs
• For non-clustered nodes especially
• Prevents data loss due to user error
• Disaster recovery
Customer Example

Goal: Reliable global forensics analytics, real-time event threat detection, resilient and easy to scale with demand
▶ 20 TB/day
▶ 90-day retention
▶ Splunk Enterprise + ES + UBA

Why migrate:
• Existing infrastructure expensive and unreliable
• Limited on-prem capacity, lack of agility
• Did not fit HW spec
• Leverage ML capabilities of Google Cloud

Deployment size:
▶ Multi-site HA
▶ 240 indexers
▶ 15 search heads
▶ 1.6 PB storage

Results:
• Lower TCO (40% lower cost)
• Deployment in days/hours vs months
• Ability to easily scale – now 25 TB/day
• Resilient to disk, VM and zonal failure
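A deployment like this can be sanity-checked with simple retention arithmetic. The compression factor and replication factor below are illustrative assumptions, not figures from the customer example; actual ratios vary by data type and index settings.

```python
def indexed_storage_tb(daily_tb: float, retention_days: int,
                       compression: float = 0.5) -> float:
    """Estimate indexed storage: daily volume x retention x compression ratio.

    The ~0.5 compression factor is an assumption, not a Splunk guarantee.
    """
    return daily_tb * retention_days * compression

# 20 TB/day for 90 days, single copy:
base = indexed_storage_tb(20, 90)
# With an assumed replication factor of 2 across sites:
total = base * 2
print(base, total)  # rough TB figures before overhead
```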
Real-world deployment

[Architecture diagram: a multi-site Splunk Enterprise deployment spanning three zones (Site A - us-west2-a, Site B - us-west2-b, Site C - us-west2-c) on a single subnet (10.0.0.0/24). Each site runs Indexer Nodes and Search Heads on Compute Engine behind Cloud Load Balancing, forming a Search Head Cluster and an Indexer Cluster. Single-node roles on Compute Engine: Indexer Cluster Master*, Search Head Cluster Deployer*, Deployment Server*, and License Master*. Cloud Interconnect links the VPC to the on-prem network.]

* VMs use Regional Persistent Disk to provide zonal redundancy
How do I get started? Splunk Enterprise Terraform scripts

Now open-sourced on GitHub:
https://github.com/GoogleCloudPlatform/terraform-google-splunk-enterprise
or bit.ly/splunk-on-gcp
Use Cases & Questions: We all have questions, but how do I know where to start?

Security:
• Are buckets secure? Do they contain sensitive data?
• What assets are being modified?
• Are we following our access policies?
• Is there any unusual activity or threat?

IT Ops:
• How many servers do I have?
• Are services meeting SLAs?
• Are there any perf bottlenecks?
• What's the usage over time?
• How many events/requests processed per second?

Business:
• Where is most of my cloud spend?
• What are areas to optimize cost?
• Is infrastructure properly sized?
• Are we using what we've paid for?
Data Coverage: Google Cloud offers mountains of data, so get to know it

Security:
• Cloud Security Command Center
• Cloud Asset Inventory
• Cloud Audit Logs
• G Suite Admin Audit Logs

IT Ops:
• Stackdriver Logs
• Stackdriver Metrics
• GKE & GKE On-Prem Metrics, Logs, Metadata

Business:
• Billing Reports
GDI Service Map

[Diagram: data paths into Splunk Enterprise via DB Connect (DBX), the GCP Add-on (TA), and HTTP Event Collector (HEC). Sources include BigQuery, Cloud Storage, Stackdriver Monitoring, Stackdriver Logging, Cloud Security Command Center, Cloud Asset Inventory, GKE + GKE On-Prem, and all GCP-monitored services & resources (Compute, Storage, DB, Networking), flowing through Cloud Pub/Sub and Cloud Dataflow.]
Cloud Dataflow

• Fully managed, no-ops data processing
• Unified batch and streaming processing
• Open source programming model
• Intelligently scales to millions of QPS
Cloud Dataflow: Data Sources and Sinks

Data Sources: Cloud Storage, Cloud Bigtable, Cloud Datastore, BigQuery, Cloud Pub/Sub, Third-Party DB
Sinks: Cloud Storage, Cloud Bigtable, BigQuery, Cloud Pub/Sub

See Google-provided Dataflow templates for common use cases:
https://github.com/GoogleCloudPlatform/DataflowTemplates
Pub/Sub to Splunk Dataflow template: Streaming data to Splunk HEC

● In the Splunk-GCP world, Dataflow can be used to stream data from Pub/Sub to Splunk
● Use the "Pub/Sub to Splunk" template pipeline from the Google-provided templates:
○ https://github.com/GoogleCloudPlatform/DataflowTemplates
○ Supports a dead-letter queue into a Pub/Sub topic (fallback) and a secondary sink to GCS (archive)
○ Supports JavaScript user-defined functions (UDFs) to transform events before sending to Splunk
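To illustrate what such a transform does: the real template UDFs are written in JavaScript, but the logic is easy to sketch. The Python version below is illustrative only, and the specific enrichment (a static field, dropping one key) is a made-up example, not part of the template.

```python
import json

def transform(event_json: str) -> str:
    """Illustrative event transform, mirroring what a Dataflow UDF might do.

    (Real Pub/Sub-to-Splunk template UDFs are JavaScript, not Python.)
    """
    event = json.loads(event_json)
    # Enrich with a static field and drop a noisy key, if present.
    event["environment"] = "prod"
    event.pop("insertId", None)
    return json.dumps(event)

out = transform('{"insertId": "abc", "message": "hello"}')
print(out)
```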
[Diagram: Cloud Pub/Sub → Cloud Dataflow → Splunk HEC, with Cloud Storage as a secondary sink. The Dataflow template transforms/enriches data before pushing to Splunk HEC.]
Dataflow vs Add-on: How do these ingestion methods compare?

Dataflow:
• Send data to Splunk via HEC
− Normal HEC limitations
• Cloud-native streaming (simplicity, security, scale)
• Wide coverage: Pub/Sub, GCS, BigQuery, etc.
• Simplifies collecting:
− Asset inventory
− G Suite

Add-on:
• Collect data via Pub/Sub in batches
• Some predefined source types
• Also collects:
− Stackdriver Metrics
− Billing
GCP Stackdriver Logs: GCP GDI Pattern

● GCP logs (audit, etc.) end up in Stackdriver Logs
○ Also referred to as GCP Logging
● Stackdriver logs can be configured to have Pub/Sub as a sink destination
○ Remember - Splunk can scalably pull from Pub/Sub
○ Alternatively, can use GCP Dataflow to stream directly from Pub/Sub to Splunk HEC

[Diagram: GCP services export logs to Stackdriver Logging; a Stackdriver Logging export sets Pub/Sub as a sink for incoming logs; Splunk pulls from Pub/Sub, or (alternate path) Cloud Dataflow streams to Splunk HEC.]
GCP Stackdriver Metrics: GCP GDI Pattern

● ALL Stackdriver Metrics are supported by the Splunk Add-on for GCP
○ For detailed VM instance metrics, the Stackdriver agent must be installed
○ List of GCP service metrics: https://cloud.google.com/monitoring/api/metrics_gcp

[Diagram: GCP services export metrics to Stackdriver Monitoring; Splunk pulls in specific metrics with scheduled API calls.]
GCP Billing Data: GCP GDI Pattern

● GCP Cloud billing reports can be configured to be pushed daily to a GCS bucket (File Export)
● The Splunk Add-on for Google Cloud Platform comes with an input for pulling reports from a GCS bucket
● Alternative: billing data exported to BigQuery
○ Export to GCS, then use the existing Billing input - you need to automate this process
○ BigQuery billing data is in a different format and more verbose than the supported file (GCS) export approach

[Diagram: the Cloud Billing API exports billing reports to a GCS bucket; Splunk pulls in billing reports with scheduled API calls.]
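Once billing rows are in Splunk, a common first question is spend per service. A minimal sketch of that rollup in plain code (the field names below are illustrative, not the exact billing-export schema):

```python
from collections import defaultdict

# Hypothetical rows in the shape of a GCS file-export billing report.
rows = [
    {"service": "Compute Engine", "cost": 120.50},
    {"service": "Cloud Storage", "cost": 30.25},
    {"service": "Compute Engine", "cost": 79.50},
]

def cost_by_service(rows):
    """Total spend per service, the kind of rollup Splunk would chart."""
    totals = defaultdict(float)
    for r in rows:
        totals[r["service"]] += r["cost"]
    return dict(totals)

print(cost_by_service(rows))
```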
Google Cloud Storage: Other than billing data

● Other GCS data?
○ Option 1: Use Dataflow templates to stream or batch to Pub/Sub, then pull via Add-on
○ Option 2: Use the Pub/Sub to Splunk Dataflow template, and set the source connector to GCS
○ For low-bandwidth data use scheduled batch jobs ($), otherwise streaming jobs ($$$)

[Diagram, pulling via Add-on: Cloud Storage → Cloud Dataflow → Cloud Pub/Sub; the Dataflow template transforms/enriches data before pushing to Pub/Sub, and Splunk pulls from Pub/Sub. Streaming to HEC: Cloud Storage → Cloud Dataflow → Splunk HEC; the Dataflow template transforms/enriches data before pushing to Splunk HEC.]
G Suite: GCP GDI Pattern

▶ Can we stream to Pub/Sub and use the GCP Add-on?
• G Suite audit logs can be exported to BigQuery
• BigQuery -> Dataflow -> Splunk, OR
• BigQuery -> Export to GCS -> Dataflow -> Splunk
• Latency becomes a consideration

[Diagram: BigQuery → Cloud Dataflow → Splunk, with an optional alternate path through Cloud Storage; the Dataflow template transforms/enriches data before pushing to Splunk HEC.]
Getting Data In – the Splunk side of things: The Splunk Add-on for Google Cloud Platform

▶ This Add-on supports data collection for a number of GCP sourcetypes out of the box
• https://splunkbase.splunk.com/app/3088/
Getting Data In – GCP side: GCP Organization structure. Simplified logging across projects

▶ Multi-tiered organization structure allows for separation of projects, products, departments, etc. within GCP
▶ GCP Stackdriver can export aggregated logs from all or a subset of projects, folders, etc.
▶ Configure in one place, log everywhere
Getting Data In: Listing it out

Data Source                               Mechanism
Stackdriver Logs (includes Cloud Audit)   Splunk GCP Add-on Mod Input, OR streaming via Pub/Sub to Splunk Dataflow pipeline
Stackdriver Metrics                       Splunk GCP Add-on Mod Input
Cloud Storage – Billing Reports           Splunk GCP Add-on Mod Input
Cloud Asset Inventory                     GCP streaming pipeline + Splunk GCP Add-on Mod Input
Cloud Security Command Center             GCP streaming pipeline + Splunk GCP Add-on Mod Input
GKE & GKE On-Prem (Anthos)                Splunk Connect for K8s (and Splunk App for Infrastructure)
BigQuery                                  Splunk DB Connect using BigQuery JDBC drivers
Check out Session FN2132 for a deeper dive on GCP. It will cover: Asset Inventory, Cloud Security Command Center, Anthos & GKE, VPC Flow, Stackdriver Query Library, and running Splunk on GCP.
GCP Side Best Practices: What levers to pull on the GCP side

▶ Enable Data Access logs for select or all services
• Best practice: configure at the organization level
• Admin Activity logs are enabled by default
▶ Configure logging export to a Pub/Sub topic
• Best practice: set up aggregated export for the organization
• Filtering: can include/exclude logs for specific resources and types
▶ Set IAM policy permission for the Pub/Sub topic
• Grant the service account (SA) permission to publish to the topic
▶ Configure the Splunk Add-on for GCP to pull from the Pub/Sub topic
• Use a dedicated SA for the Add-on, following the least-privilege principle
Splunk Side Best Practices: What levers to pull on the Splunk side

▶ Scaling data collection
• Set up more inputs for the same Pub/Sub topic (no, this won't cause duplication)
• Create more Pub/Sub topics split by use case (e.g. security-centric logs in their own topic)
• Add more instances to collect from Pub/Sub, or increase the number of HEC listeners
• Pub/Sub to HEC: bump the timeout from 10s to 50s
▶ Management
• Centralized vs multiple Pub/Sub topics (more upstream routing)
• Organization vs Projects
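The HEC-scaling points above rely on batching: HEC accepts many events in one POST body as newline-delimited JSON envelopes, which cuts per-request overhead. A minimal sketch of building such a payload (the default source and sourcetype strings here are illustrative placeholders, not prescribed values):

```python
import json

def hec_batch(events, source="gcp:pubsub", sourcetype="google:gcp:pubsub:message"):
    """Build a newline-delimited HEC payload from a list of event dicts.

    Each event is wrapped in an envelope with an "event" key; the whole
    string is sent as one HTTP POST body to the HEC endpoint.
    """
    return "\n".join(
        json.dumps({"event": e, "source": source, "sourcetype": sourcetype})
        for e in events
    )

payload = hec_batch([{"msg": "a"}, {"msg": "b"}])
print(payload)
```

Larger batches mean fewer requests per second against the HEC listeners, which is exactly why adding listeners and raising the Pub/Sub-to-HEC timeout helps at high volume.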