Harnessing the Power of Splunk and Google Cloud: Deploy, Ingest, and Beyond

Alex Cain, Senior Product Manager | Splunk

Roy Arsan, Partner Engineer | Google Cloud

© 2019 SPLUNK INC.

Forward-Looking Statements

During the course of this presentation, we may make forward-looking statements regarding future events or the expected performance of the company. We caution you that such statements reflect our current expectations and estimates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-looking statements, please review our filings with the SEC.

The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, this presentation may not contain current or accurate information. We do not assume any obligation to update any forward-looking statements we may make. In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionality described or to include any such feature or functionality in a future release.

Splunk, Splunk>, Listen to Your Data, The Engine for Machine Data, Splunk Cloud, Splunk Light and SPL are trademarks and registered trademarks of Splunk Inc. in the United States and other countries. All other brand names, product names, or trademarks belong to their respective owners. © 2019 Splunk Inc. All rights reserved.

Agenda
Here’s what is up

▶ Deploy on GCP
▶ Ingest from GCP
▶ Insights!

Deploying Splunk on GCP

The basics, best practices, and a demo (part 1…)

Deploying Splunk - GCP
Architecture first. The Splunk Validated Architectures White Paper

▶ This White Paper is an excellent starting point for a Splunk deployment, regardless of the underlying infrastructure.
• Use this document to map requirements to architectures and best practices.

▶ https://www.splunk.com/pdfs/technical-briefs/splunk-validated-architectures.pdf

Deploying Splunk - GCP
GCP basics. The GCP Splunk Deployment Tech Brief

▶ This Tech Brief contains GCP-specific best practices and recommendations.

▶ https://www.splunk.com/pdfs/technical-briefs/deploying-splunk-enterprise-on-google-cloud-platform.pdf

Splunk Enterprise on GCP

GCP Primer

1. Networking
2. Compute
3. Storage

Google Cloud Infrastructure
Largest network of any public cloud provider

[Map: Google Cloud regions, each labeled with its number of zones (3 for most regions shown), together with network edge points of presence and subsea cable investments.]

Regions shown: Taiwan, Frankfurt, Singapore, S Carolina, N Virginia, Belgium, London, Mumbai, Sydney, Oregon, Iowa, São Paulo, Finland, Tokyo, Montreal, Los Angeles, Netherlands, Zurich, Hong Kong, Osaka

Edge points of presence: >100
Google global cache edge nodes: >800 (https://peering.google.com)

Network sea cable investments:
• Unity (US, JP) 2010
• SJC (JP, HK, SG) 2013
• FASTER (US, JP, TW) 2016
• Monet (US, BR) 2017
• Tannat (BR, UY, AR) 2017
• Junior (Rio, Santos) 2017
• PLCN Unity (HK, LA) 2018
• Indigo (SG, ID, AU) 2019

https://cloud.google.com/about/locations
https://cloud.google.com/compute/docs/regions-zones/regions-zones

GCP Networking

▶ VPC network is global
▶ Subnet spans entire region
▶ VM private IP address is regional
▶ Routing table is global

[Diagram: one Project containing a single VPC that spans Region X and Region Y. Each region has Zone A and Zone B, with VMs in each region. Subnet X1 (10.0.0.0/24) covers Region X and Subnet Y1 (172.16.0.0/24) covers Region Y. VMs in Region X have private addresses in 10... and VMs in Region Y in 172...; every VM also has a public address in 203...]

Destination      Next hop           Network
172.16.0.0/24    Virtual network    default
10.0.0.0/24      Virtual network    default
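The subnet and routing entries on this slide can be explored with a few lines of Python. This is only an illustrative sketch built on the standard `ipaddress` module: the two sample VM addresses and the helper names `subnet_of` and `next_hop` are hypothetical, and nothing here calls a GCP API.

```python
import ipaddress

# Subnets and routes exactly as listed on the slide.
SUBNETS = {
    "Subnet X1 (Region X)": ipaddress.ip_network("10.0.0.0/24"),
    "Subnet Y1 (Region Y)": ipaddress.ip_network("172.16.0.0/24"),
}
ROUTES = [
    (ipaddress.ip_network("172.16.0.0/24"), "Virtual network"),
    (ipaddress.ip_network("10.0.0.0/24"), "Virtual network"),
]


def subnet_of(address):
    """Return the name of the subnet that contains the address, or None."""
    ip = ipaddress.ip_address(address)
    for name, network in SUBNETS.items():
        if ip in network:
            return name
    return None


def next_hop(address):
    """Return the next hop of the most specific matching route, or None."""
    ip = ipaddress.ip_address(address)
    matches = [(network, hop) for network, hop in ROUTES if ip in network]
    if not matches:
        return None
    return max(matches, key=lambda match: match[0].prefixlen)[1]


# Hypothetical private addresses, one VM per region.
vm_x = "10.0.0.5"
vm_y = "172.16.0.7"
print(vm_x, "is in", subnet_of(vm_x))
print(vm_x, "reaches", vm_y, "via", next_hop(vm_y))
```

Because the routing table is global to the VPC, the lookup for the Region Y address succeeds without any per-region configuration in this model.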

GCP Compute
GCP offers 4 kinds of scalable computing

▶ Google Compute Engine (GCE): Virtual machines, networks (IaaS)
▶ Google Kubernetes Engine (GKE): Managed Docker containers (CaaS)
▶ Google App Engine (GAE): Serverless app platform (PaaS)
▶ Google Cloud Functions: Serverless app platform (FaaS)

Google Compute Engine
Some notable features…

▶ VMs
• Predefined and custom machine types
• Live migration
• Managed instance group (zonal or regional)

▶ Billing
• Per-second billing
• Sustained use discounts (up to 30%)
• Committed use discounts (up to 57% or 70%)

Google Compute Engine - Machine Types
for Splunk Enterprise workload

Indexers:

Instance Type     Daily Volume (GB)
n1-standard-16    Up to 100
n1-standard-32    100-250

Search Heads:

Instance Type     Concurrent Users    Performance
n1-standard-16    8                   Good
n1-standard-32    16                  Better

Deployment Server, License or Cluster Master:

Instance Type     Performance
n1-highcpu-8      Good
n1-highcpu-16     Better

GCP Storage
Storage and databases

▶ Cloud Storage
▶ Cloud Bigtable
▶ Cloud Datastore
▶ Cloud SQL
▶ Cloud Spanner
▶ Persistent Disk
▶ Cloud Memorystore
▶ Cloud Filestore

GCP Storage
Google Compute Engine – Storage types

▶ Block Storage
• Local SSD vs Persistent Disk
• Standard vs SSD Persistent Disks
• Zonal vs Regional Persistent Disks

▶ Object Storage
• Google Cloud Storage (GCS)
• Standard vs Nearline vs Coldline
• Regional vs Multi-Regional

GCP Storage
Some notable features…

▶ Persistent Disk (PD)
• High performance, low latency (single-digit ms for SSD)
• Durable: persists if the instance dies and can be re-attached
• Up to 64 TB per disk – no RAID required
• Online or live resizing with no downtime
• Regional PD for added redundancy and HA

▶ Local SSD
• Very high throughput, lowest latency
• Data is lost when instance terminates
• 375 GB each; can attach up to 8 for a total of 3 TB

▶ Cloud Storage (GCS)
• Snapshots (backups) are global

Storage Type       Cost per GB/Month
PD Standard        $0.04
Regional PD        $0.08
PD SSD             $0.17
Regional PD SSD    $0.34
Local SSD          $0.08
GCS Standard       $0.02

Listed pricing (us-central1) does not include any discounts, and is subject to change. See latest pricing at https://cloud.google.com/pricing/list
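To make the price table concrete, the sketch below multiplies the listed per-GB prices by a volume size. The 4,000 GB volume is a hypothetical example, the `monthly_cost` helper is not part of any GCP tooling, and the prices are the 2019 list prices from the slide, so treat the output as an illustration rather than a quote.

```python
# List prices per GB per month (us-central1), copied from the slide.
# They exclude discounts and are subject to change.
PRICE_PER_GB_MONTH = {
    "PD Standard": 0.04,
    "Regional PD": 0.08,
    "PD SSD": 0.17,
    "Regional PD SSD": 0.34,
    "Local SSD": 0.08,
    "GCS Standard": 0.02,
}


def monthly_cost(storage_type, size_gb):
    """Return the list-price monthly cost in USD, rounded to cents."""
    return round(PRICE_PER_GB_MONTH[storage_type] * size_gb, 2)


# Hypothetical 4,000 GB volume, priced on each storage type.
size_gb = 4000
for storage_type in PRICE_PER_GB_MONTH:
    print(f"{storage_type:<16} ${monthly_cost(storage_type, size_gb):,.2f}")
```

On these numbers the regional variants cost twice their zonal counterparts, which is the trade-off behind using Regional PD only where the extra redundancy matters.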

Best Practices for Splunk Deployment

1. Storage
2. HA/DR

Storage Best Practices

▶ Consider local SSD only with clustering
• Limited to 3 TB / indexer
• Must manage striping of local SSDs

▶ Use PD (SSD or Standard) for all other cases
• Peak* IOPS & throughput at only 4 TB PD SSD
• Can dynamically resize indexer(s) storage
• Can use SSD PD and Standard PD for hot/warm vs cold storage

▶ Use PD SSD for boot device
• At least 50 GB in size for performance

*Assumes higher core (32+) VMs. See latest performance at https://cloud.google.com/compute/docs/disks/performance
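The local SSD limit follows directly from the device figures quoted earlier in the deck (375 GB per device, up to 8 per VM). The sketch below turns that arithmetic into a rough decision helper; `local_ssds_needed` and `pick_storage` are hypothetical names, and the rule encoded is only the slide's guidance (local SSD only with clustering, PD otherwise), not an official sizing tool.

```python
import math

LOCAL_SSD_GB = 375   # size of one local SSD device, from the slides
MAX_LOCAL_SSDS = 8   # maximum local SSD devices per VM, from the slides


def local_ssds_needed(required_gb):
    """Return how many local SSDs to stripe, or None if more than 8 are needed."""
    count = math.ceil(required_gb / LOCAL_SSD_GB)
    return count if count <= MAX_LOCAL_SSDS else None


def pick_storage(required_gb, clustered):
    """Apply the slide's rule of thumb for one indexer's storage."""
    if clustered:
        count = local_ssds_needed(required_gb)
        if count is not None:
            return f"local SSD x{count} (striped)"
    return "persistent disk"


print(MAX_LOCAL_SSDS * LOCAL_SSD_GB, "GB is the local SSD ceiling per indexer")
print(pick_storage(2000, clustered=True))
print(pick_storage(2000, clustered=False))
print(pick_storage(6000, clustered=True))
```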

Best Practices for HA/DR

▶ Regional Managed Instance Group
• For clustered nodes
• Search Head cluster, Indexer cluster

▶ Failover using Regional PD
• For single-node roles
• Cluster Master, Deployer, etc.

▶ Set up snapshot schedule for PDs
• For non-clustered nodes especially
• Prevents data loss due to user error

Callouts:
• Regional High Availability: Health Checking, Consistency
• Regional High Availability: Fast RPO & RTO (better than snapshot), Automatic Failover
• Disaster Recovery: Automatic Failover

Customer Example

Goal:
Reliable global forensics analytics, real-time event threat detection, resilient and easy to scale with demand
▶ 20 TB/day
▶ 90-day retention
▶ Splunk Enterprise + ES + UBA

Why migrate:
• Existing infrastructure expensive and unreliable
• Limited on-prem capacity, lack of agility
• Did not fit HW spec
• Leverage ML capabilities of Google Cloud

Results:
• Lower TCO (40% lower cost)
• Deployment in days/hours vs months
• Ability to easily scale – now 25 TB/day
• Resilient to disk, VM and zonal failure

Deployment size:
▶ Multi-site HA
▶ 240 indexers
▶ 15 search heads
▶ 1.6 PB storage
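A quick back-of-the-envelope check relates the customer's figures to each other. The sketch below uses only numbers from the slide and assumes decimal units (1 PB = 1,000 TB); it says nothing about compression or replication, which the slide does not state, so the gap between retained raw volume and provisioned storage is left unexplained.

```python
# Figures from the customer example slide.
daily_ingest_tb = 20
retention_days = 90
indexers = 240
provisioned_storage_pb = 1.6

# Assumption: decimal units (1 PB = 1000 TB, 1 TB = 1000 GB).
raw_retained_pb = daily_ingest_tb * retention_days / 1000
ingest_per_indexer_gb_day = daily_ingest_tb * 1000 / indexers
storage_per_indexer_tb = provisioned_storage_pb * 1000 / indexers

print(f"Raw data retained over {retention_days} days: {raw_retained_pb} PB")
print(f"Ingest per indexer: {ingest_per_indexer_gb_day:.1f} GB/day")
print(f"Provisioned storage per indexer: {storage_per_indexer_tb:.2f} TB")
```

At roughly 83 GB/day per indexer, the load sits inside the "Up to 100" GB/day row of the machine-type table, and the per-indexer storage is above the 3 TB local SSD ceiling mentioned under storage best practices.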

Real-world deployment

[Architecture diagram: three sites on subnet 10.0.0.0/24, Site A (us-west2-a), Site B (us-west2-b) and Site C (us-west2-c). Each site runs Indexer Nodes and Search Heads on Compute Engine, forming an Indexer Cluster and a Search Head Cluster. Two Cloud Load Balancing components are shown. Management nodes on Compute Engine are labeled Deployer, Master, Deployment and License, three of them marked *. On-prem connects through Interconnect.]

* VMs use Regional Persistent Disk to provide zonal redundancy

How do I get started?

Splunk Enterprise Terraform scripts
Now open-sourced on GitHub

https://github.com/GoogleCloudPlatform/terraform-google-splunk-enterprise
or
bit.ly/splunk-on-gcp

Splunk Enterprise on GCP

Demo

Getting GCP data into Splunk

The basics, best practices, and a demo (part 2…)

Use Cases & Questions
We all have questions, but how do I know where to start?

Security
• Are buckets secure? Do they contain sensitive data?
• What assets are being modified?
• Are we following our access policies?
• Is there any unusual activity or threat?

IT Ops
• How many servers do I have?
• Are services meeting SLAs?
• Are there any perf bottlenecks?
• What’s the usage over time?
• How many events/requests processed per second?

Business
• Where is most of my cloud spend?
• What are areas to optimize cost?
• Is infrastructure properly sized?
• Are we using what we’ve paid for?

Data Coverage
Google Cloud offers mountains of data, so get to know it

Security
• Cloud Security Command Center
• Cloud Asset Inventory
• Cloud Audit Logs
• G Suite Admin Audit Logs

IT Ops
• Stackdriver Logs
• Stackdriver Metrics
• GKE & GKE On-Prem Metrics, Logs, Metadata

Business
• Billing Reports

GDI Service Map

[Diagram: data paths from GCP into Splunk Enterprise. Splunk-side entry points: DBX, GCP TA, HEC. GCP-side components: BigQuery, Cloud Storage, Stackdriver Monitoring, Stackdriver Logging, Cloud Security Command Center, Cloud Pub/Sub, Cloud Dataflow, GKE + GKE On-Prem, Cloud Asset Inventory. Sources: Compute, Storage, DB, Networking Services; All GCP-Monitored Services & Resources.]

Cloud Dataflow

▶ Intelligently scales to millions of QPS
▶ Open source programming model
▶ Unified batch and streaming processing
▶ Fully managed, no-ops data processing

Cloud Dataflow
Data Sources and Sinks

Data Sources: Cloud Storage, Cloud Bigtable, Cloud Datastore, BigQuery, Cloud Pub/Sub
Sinks: Cloud Storage, Cloud Bigtable, BigQuery, Cloud Pub/Sub
Also shown: Third-Party DB

See Google-provided Dataflow templates for common use cases:
https://github.com/GoogleCloudPlatform/DataflowTemplates

Pub/Sub to Splunk Dataflow template
Streaming data to Splunk HEC

● In the Splunk-GCP world, Dataflow can be used to stream data from Pub/Sub to Splunk
● Use the “Pub/Sub to Splunk” template pipeline from the Google-provided templates:
○ https://github.com/GoogleCloudPlatform/DataflowTemplates
○ Supports a dead-letter queue into a Pub/Sub topic (fallback) and a secondary sink to GCS (archive)
○ Supports JavaScript user-defined functions (UDFs) to transform events before sending to Splunk

[Diagram: Cloud Pub/Sub -> Cloud Dataflow -> Splunk HEC, with Cloud Storage alongside. The Dataflow template transforms/enriches data before pushing to Splunk HEC.]
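For orientation, this is roughly what a single event posted to Splunk HEC looks like. The sketch is not the Dataflow template's implementation; it only builds (and does not send) a request against HEC's JSON event endpoint using the standard library. The host name, token and sourcetype are placeholders, and `build_hec_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_hec_request(hec_url, token, event, sourcetype=None, time=None):
    """Build a POST request carrying one event in HEC's JSON envelope."""
    payload = {"event": event}
    if sourcetype is not None:
        payload["sourcetype"] = sourcetype
    if time is not None:
        payload["time"] = time  # epoch seconds
    return urllib.request.Request(
        url=hec_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Splunk {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_hec_request(
    "https://splunk.example.com:8088/services/collector/event",  # placeholder host
    "00000000-0000-0000-0000-000000000000",                      # placeholder token
    {"message": "hello from Pub/Sub"},
    sourcetype="example:sourcetype",                             # placeholder sourcetype
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# urllib.request.urlopen(request) would actually send the event.
```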

Dataflow vs Add-on
How do these ingestion methods compare?

Dataflow
• Send data to Splunk via HEC
− Normal HEC limitations
• Cloud-native streaming (simplicity, security, scale)
• Wide coverage: Pub/Sub, GCS, BigQuery, etc.
• Simplifies collecting:
− Asset inventory
− G Suite

Add-on
• Collect data via Pub/Sub in batches
• Some predefined source-types
• Also collects:
− Stackdriver Metrics
− Billing

GCP Stackdriver Logs
GCP GDI Pattern

● GCP logs (audit, etc.) end up in Stackdriver Logs
○ Also referred to as GCP Logging
● Stackdriver logs can be configured to have Pub/Sub as a sink destination
○ Remember: Splunk can scalably pull from Pub/Sub
○ Alternatively, can use GCP Dataflow to stream directly from Pub/Sub to Splunk HEC

[Diagram: GCP services export logs to Stackdriver Logging -> Stackdriver Logging export sets Pub/Sub as a sink for incoming logs -> Splunk pulls from Cloud Pub/Sub. Alternate path: Cloud Pub/Sub -> Cloud Dataflow -> stream to Splunk HEC.]
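Whichever path is used, the consumer receives each exported log entry as a Pub/Sub message whose `data` field is base64-encoded JSON. The sketch below shows that decoding step with the standard library only. The sample audit entry is hypothetical and heavily trimmed, and `decode_log_entry` is an illustrative helper, not part of the Add-on or the Dataflow template.

```python
import base64
import json


def decode_log_entry(pubsub_message):
    """Decode the base64 `data` field of a Pub/Sub message into a log entry dict."""
    raw = base64.b64decode(pubsub_message["data"])
    return json.loads(raw.decode("utf-8"))


# Hypothetical, trimmed audit log entry as it might be exported by a sink.
sample_entry = {
    "logName": "projects/example-project/logs/cloudaudit.googleapis.com%2Factivity",
    "timestamp": "2019-10-22T17:00:00Z",
    "resource": {"type": "gce_instance"},
    "protoPayload": {"methodName": "v1.compute.instances.insert"},
}

# Wrap it the way it appears in a Pub/Sub message over the REST API.
sample_message = {
    "data": base64.b64encode(json.dumps(sample_entry).encode("utf-8")).decode("ascii"),
    "messageId": "1",
}

entry = decode_log_entry(sample_message)
print(entry["timestamp"], entry["logName"])
print(entry["protoPayload"]["methodName"])
```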

GCP Stackdriver Metrics
GCP GDI Pattern

● All Stackdriver Metrics are supported by the Splunk Add-on for GCP
○ For detailed VM instance metrics, the Stackdriver agent must be installed
○ List of GCP service metrics here:
• https://cloud.google.com/monitoring/api/metrics_gcp

[Diagram: GCP services export metrics to Stackdriver Monitoring -> Splunk pulls in specific metrics with scheduled API calls.]

GCP Billing Data
GCP GDI Pattern

● GCP Cloud Billing reports can be configured to be pushed daily to a GCS bucket (File Export)
● The Splunk Add-on for Google Cloud Platform comes with an input for pulling reports from a GCS bucket
● Alternative: Billing data exported to BigQuery
○ Export to GCS, then use the existing Billing input; this process needs to be automated
○ BigQuery billing data is in a different format and is more verbose than the supported file (GCS) export approach

[Diagram: Cloud Billing API -> GCP exports billing reports to a GCS bucket (Cloud Storage) -> Splunk pulls in billing reports with scheduled API calls.]

Google Cloud Storage
Other than billing data

● Other GCS data?
○ Option 1: Use Dataflow templates to stream or batch to Pub/Sub, then pull via Add-on
○ Option 2: Use the Pub/Sub to Splunk Dataflow template, and set the source connector to GCS
○ For low-bandwidth data use scheduled batch jobs ($), otherwise streaming jobs ($$$)

[Diagram, pulling via Add-on: Cloud Storage -> Cloud Dataflow -> Cloud Pub/Sub -> Splunk pulls from Pub/Sub. The Dataflow template transforms/enriches data before pushing to Pub/Sub.]

[Diagram, streaming to HEC: Cloud Storage -> Cloud Dataflow -> Splunk HEC. The Dataflow template transforms/enriches data before pushing to Splunk HEC.]

G Suite
GCP GDI Pattern

▶ Can we stream to Pub/Sub and use the GCP Add-on?
• G Suite audit logs can be exported to BigQuery
• BigQuery -> Dataflow -> Splunk
OR
• BigQuery -> Export to GCS -> Dataflow -> Splunk
• Latency becomes a consideration

[Diagram: BigQuery -> Cloud Dataflow -> Splunk HEC, with an optional alternate path through Cloud Storage. The Dataflow template transforms/enriches data before pushing to Splunk HEC.]

Getting Data In – Splunk side
The Splunk side of things… The Splunk Add-on for Google Cloud Platform

▶ This Add-on supports data collection for a number of GCP sourcetypes out of the box
• https://splunkbase.splunk.com/app/3088/

Getting Data In – GCP side
GCP Organization structure. Simplified logging across projects

▶ Multi-tiered organization structure allows for separation of projects, products, departments, etc. within GCP
▶ GCP Stackdriver can export aggregated logs from all or a subset of projects, folders, etc.
▶ Configure in one place, log everywhere

Getting Data In
Listing it out

Data Source                                Mechanism
Stackdriver Logs (includes Cloud Audit)    Splunk GCP Add-on Mod Input, OR streaming via Pub/Sub to Splunk Dataflow pipeline
Stackdriver Metrics                        Splunk GCP Add-on Mod Input
Cloud Storage – Billing Reports            Splunk GCP Add-on Mod Input
Cloud Asset Inventory                      GCP streaming pipeline + Splunk GCP Add-on Mod Input
Cloud Security Command Center              GCP streaming pipeline + Splunk GCP Add-on Mod Input
GKE & GKE On-Prem (Anthos)                 Splunk Connect for K8s (and Splunk App for Infrastructure)
BigQuery                                   Splunk DB Connect using BigQuery JDBC drivers

Check out Session FN2132 for a deeper dive on GCP.
Will cover: Asset Inventory, Cloud Security Command Center, Anthos & GKE, VPC Flow, Stackdriver Query Library, and running Splunk on GCP

GCP Side Best Practices
What levers to pull on the GCP side

▶ Enable Data Access logs for select or all services
• Best practice: configure at the organization level
• Admin Activity logs enabled by default

▶ Configure logging export to Pub/Sub topic
• Best practice: Set up aggregated export for the organization
• Filtering: Can include/exclude logs for specific resources and types

▶ Set IAM policy permission for Pub/Sub topic
• Grant the service account (SA) permission to publish to the topic

▶ Configure Splunk Add-on for GCP to pull from Pub/Sub topic
• Use a dedicated SA for the Add-on, following the least-privilege principle
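The export settings above boil down to a small sink definition plus two IAM grants. The sketch below assembles those pieces as plain Python data so the shape is visible; it does not call any Google API. The project, topic and sink names are hypothetical, the field names follow the Cloud Logging sink resource (`destination`, `filter`, `includeChildren`), and the audit-log filter is just one example of include-style filtering.

```python
def pubsub_destination(project_id, topic_id):
    """Return the sink destination string for a Pub/Sub topic."""
    return f"pubsub.googleapis.com/projects/{project_id}/topics/{topic_id}"


def aggregated_sink(name, project_id, topic_id, log_filter):
    """Describe an organization-level sink that also covers child projects."""
    return {
        "name": name,
        "destination": pubsub_destination(project_id, topic_id),
        "filter": log_filter,
        "includeChildren": True,  # aggregated export across folders/projects
    }


# Hypothetical names; the filter keeps only Cloud Audit Logs.
sink = aggregated_sink(
    name="splunk-export",
    project_id="example-project",
    topic_id="splunk-logs",
    log_filter='logName:"cloudaudit.googleapis.com"',
)

# IAM roles for the two service accounts mentioned on the slide.
iam_grants = {
    "sink writer identity": "roles/pubsub.publisher",    # publishes to the topic
    "Splunk Add-on service account": "roles/pubsub.subscriber",  # pulls from it
}

print(sink)
print(iam_grants)
```

Keeping the publisher and subscriber roles on separate service accounts mirrors the least-privilege bullet: neither identity can do the other's job.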

Splunk Side Best Practices
What levers to pull on the Splunk side

▶ Scaling data collection
• Set up more inputs for the same Pub/Sub topic (no, this won't cause duplication)
• Create more Pub/Sub topics split by use case (security-centric logs in their own topic)
• Add more instances to collect from Pub/Sub or increase the number of HEC listeners
• Pub/Sub to HEC: bump timeout from 10s to 50s

▶ Management
• Centralized vs multiple Pub/Sub topics (more upstream routing)
• Organization vs Projects
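The "no duplication" point holds when the extra inputs read from the same subscription, because Pub/Sub then splits messages among them; separate subscriptions on one topic would each receive a full copy. The sketch below is a local simulation of that behavior with a shared queue and threads, not Pub/Sub client code, and `run_inputs` is a hypothetical helper. Real Pub/Sub delivery is at-least-once, so occasional redelivery is still possible.

```python
import queue
import threading


def run_inputs(messages, num_inputs):
    """Simulate several inputs draining one shared subscription (a queue)."""
    subscription = queue.Queue()
    for message in messages:
        subscription.put(message)

    collected = [[] for _ in range(num_inputs)]

    def worker(index):
        # Each input takes the next available message until none are left.
        while True:
            try:
                message = subscription.get_nowait()
            except queue.Empty:
                return
            collected[index].append(message)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_inputs)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return collected


messages = [f"event-{i}" for i in range(1000)]
per_input = run_inputs(messages, num_inputs=4)
all_seen = [message for batch in per_input for message in batch]

print("messages collected:", len(all_seen))
print("distinct messages:", len(set(all_seen)))
```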

Demo

Thank You!

RATE THIS SESSION

Go to the .conf19 mobile app to