Harnessing the Power of Splunk and Google Cloud: Deploy, Ingest, and Beyond

Alex Cain, Senior Product Manager | Splunk
Roy Arsan, Partner Engineer | Google Cloud
© 2019 SPLUNK INC.
During the course of this presentation, we may make forward-looking statements regarding future events or
the expected performance of the company. We caution you that such statements reflect our current
expectations and estimates based on factors currently known to us and that actual events or results could
differ materially. For important factors that may cause actual results to differ from those contained in our
forward-looking statements, please review our filings with the SEC.
The forward-looking statements made in this presentation are being made as of the time and date of its live
presentation. If reviewed after its live presentation, this presentation may not contain current or accurate
information. We do not assume any obligation to update any forward-looking statements we may make. In
addition, any information about our roadmap outlines our general product direction and is subject to change
at any time without notice. It is for informational purposes only and shall not be incorporated into any contract
or other commitment. Splunk undertakes no obligation either to develop the features or functionality
described or to include any such feature or functionality in a future release.
Splunk, Splunk>, Listen to Your Data, The Engine for Machine Data, Splunk Cloud, Splunk Light and SPL are trademarks and registered trademarks of Splunk Inc. in
the United States and other countries. All other brand names, product names, or trademarks belong to their respective owners. © 2019 Splunk Inc. All rights reserved.
Forward-Looking Statements
Deploying Splunk - GCP: Architecture first. The Splunk Validated Architectures White Paper

▶ This White Paper is an excellent starting point for a Splunk deployment, regardless of the underlying infrastructure.
• Use this document to map requirements to architectures and best practices.
▶ https://www.splunk.com/pdfs/technical-briefs/splunk-validated-architectures.pdf
Deploying Splunk - GCP: GCP basics. The GCP Splunk Deployment Tech Brief

▶ This Tech Brief contains GCP-specific best practices and recommendations.
▶ https://www.splunk.com/pdfs/technical-briefs/deploying-splunk-enterprise-on-google-cloud-platform.pdf
Google Cloud Infrastructure: Largest network of any public cloud provider

[World map slide: Google network with >100 edge points of presence and >800 global cache edge nodes.]

Network sea cable investments:
• Unity (US, JP) 2010
• SJC (JP, HK, SG) 2013
• FASTER (US, JP, TW) 2016
• Monet (US, BR) 2017
• Tannat (BR, UY, AR) 2017
• Junior (Rio, Santos) 2017
• PLCN (HK, LA) 2018
• Indigo (SG, ID, AU) 2019

Regions (and number of zones): Oregon (3), Los Angeles (3), Iowa (4), S Carolina (3), N Virginia (3), Montreal (3), São Paulo (3), London (3), Belgium (3), Netherlands (3), Frankfurt (3), Zurich (3), Finland (3), Mumbai (3), Singapore (3), Hong Kong (3), Taiwan (3), Tokyo (3), Osaka (3), Sydney (3)

https://peering.google.com
https://cloud.google.com/about/locations
https://cloud.google.com/compute/docs/regions-zones/regions-zones
GCP Networking

▶ VPC network is global
▶ Subnet spans entire region
▶ VM private IP address is regional
▶ Routing table is global

[Diagram: one Project containing one VPC. Region X has Subnet X1 (10.0.0.0/24) with VMs in Zone A and Zone B; Region Y has Subnet Y1 (172.16.0.0/24) with VMs in Zone A and Zone B. Each VM has a private IP from its subnet (10.x / 172.16.x) and a public IP (203.x).]

Routing table:

Destination     Next hop         Network
172.16.0.0/24   Virtual network  default
10.0.0.0/24     Virtual network  default
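The subnet layout above can be sketched in a few lines of code. This is an illustrative model only (the subnet names and "internet gateway" fallback are assumptions, not GCP API objects): it shows how the global routing table resolves a destination IP to the regional subnet that contains it.

```python
import ipaddress

# Hypothetical route table mirroring the slide: one subnet per region,
# both reachable via the same global VPC routing table.
ROUTES = {
    ipaddress.ip_network("10.0.0.0/24"): "subnet-x1 (Region X)",
    ipaddress.ip_network("172.16.0.0/24"): "subnet-y1 (Region Y)",
}

def lookup(ip: str) -> str:
    """Return the route entry whose destination contains ip, else the default."""
    addr = ipaddress.ip_address(ip)
    for net, target in ROUTES.items():
        if addr in net:
            return target
    return "default (internet gateway)"

print(lookup("10.0.0.5"))    # resolves to the Region X subnet
print(lookup("172.16.0.9"))  # resolves to the Region Y subnet
```

Because the VPC and its routing table are global, a VM in Region X reaches a VM in Region Y over private IPs with no extra peering or VPN configuration.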
GCP Compute: GCP offers 4 kinds of scalable computing

Google Compute Engine (GCE): Virtual machines, networks (IaaS)
Google Kubernetes Engine (GKE): Managed Docker containers (CaaS)
Google App Engine (GAE): Serverless app platform (PaaS)
Google Cloud Functions: Serverless functions (FaaS)
Google Compute Engine: Some notable features…

▶ VMs
• Predefined and custom machine types
• Live migration
• Managed instance group (zonal or regional)
▶ Billing
• Per-second billing
• Sustained use discounts (up to 30%)
• Committed use discounts (up to 57% or 70%)
Google Compute Engine - Machine Types for Splunk Enterprise workload

Indexers:
Instance Type    Daily Volume (GB)
n1-standard-16   Up to 100
n1-standard-32   100-250

Search Heads:
Instance Type    Concurrent Users   Performance
n1-standard-16   8                  Good
n1-standard-32   16                 Better

Deployment Server, License or Cluster Master:
Instance Type    Performance
n1-highcpu-8     Good
n1-highcpu-16    Better
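The indexer table above maps naturally to a small sizing helper. This is a sketch under the table's assumptions only (one n1-standard-32 per 250 GB/day beyond the single-indexer range); real sizing also depends on search load and replication.

```python
import math

def recommend_indexer(daily_gb: float) -> str:
    """Map daily ingest volume to a machine type, per the table above."""
    if daily_gb <= 100:
        return "n1-standard-16"
    if daily_gb <= 250:
        return "n1-standard-32"
    # Beyond 250 GB/day, scale horizontally with more n1-standard-32 indexers
    # (a simplifying assumption; clustering and search load also matter).
    count = math.ceil(daily_gb / 250)
    return f"{count} x n1-standard-32"

print(recommend_indexer(80))   # small deployment
print(recommend_indexer(500))  # horizontal scale-out
```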
GCP Storage: Storage and databases

Cloud Storage, Persistent Disk, Cloud Filestore, Cloud Bigtable, Cloud Datastore, Cloud SQL, Cloud Spanner, Cloud Memorystore
GCP Storage: Google Compute Engine – Storage types

▶ Block Storage
• Local SSD vs Persistent Disk
• Standard vs SSD Persistent Disks
• Zonal vs Regional Persistent Disks
▶ Object Storage
• Google Cloud Storage (GCS)
• Standard vs Nearline vs Coldline
• Regional vs Multi-Regional
GCP Storage: Some notable features…

▶ Persistent Disk (PD)
• High performance, low latency (single-digit ms for SSD)
• Durable. Persists if instance dies - can be re-attached
• Up to 64 TB per disk – no RAID required
• Online or live resizing with no downtime
• Regional PD for added redundancy and HA
▶ Local SSD
• Very high throughput, lowest latency
• Data is lost when instance terminates
• 375 GB each - can attach up to 8 for a total of 3 TB
▶ Cloud Storage (GCS)
• Snapshots (backups) are global

Storage Type      Cost per GB/Month
PD Standard       $0.04
Regional PD       $0.08
PD SSD            $0.17
Regional PD SSD   $0.34
Local SSD         $0.08
GCS Standard      $0.02

Listed pricing (us-central1) does not include any discounts, and is subject to change. See latest pricing at https://cloud.google.com/pricing/list
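The list prices above make back-of-the-envelope cost comparisons easy. A minimal sketch, using only the prices in the table (no sustained-use or committed-use discounts), for a hypothetical indexer with SSD PD hot/warm storage and Standard PD cold storage:

```python
# Listed us-central1 prices from the table above, USD per GB-month.
PRICE_PER_GB_MONTH = {
    "pd-standard": 0.04,
    "pd-ssd": 0.17,
    "local-ssd": 0.08,
    "gcs-standard": 0.02,
}

def monthly_cost(gb_by_type: dict) -> float:
    """Sum list-price monthly cost for a mix of storage types (no discounts)."""
    return sum(PRICE_PER_GB_MONTH[t] * gb for t, gb in gb_by_type.items())

# e.g. 4 TB PD SSD for hot/warm plus 16 TB PD Standard for cold:
cost = monthly_cost({"pd-ssd": 4096, "pd-standard": 16384})
print(round(cost, 2))
```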
Storage Best Practices

▶ Consider local SSD only with clustering
• Limited to 3 TB / indexer
• Must manage striping of local SSDs
▶ Use PD (SSD or Standard) for all other cases
• Peak* IOPS & throughput at only 4 TB PD SSD
• Can dynamically resize indexer storage
• Can use SSD PD for hot/warm and Standard PD for cold storage
▶ Use PD SSD for boot device
• At least 50 GB in size for performance

*Assumes higher core-count (32+) VMs. See latest performance at https://cloud.google.com/compute/docs/disks/performance
Best Practices for HA/DR: Regional High Availability

▶ Regional Managed Instance Group
• For clustered nodes
• Search Head cluster, Indexer cluster
• Health checking, consistency, automatic failover
▶ Failover using Regional PD
• For single-node roles
• Cluster Master, Deployer, etc.
• Fast RPO & RTO, better than snapshots; automatic failover
▶ Set up a snapshot schedule for PDs
• For non-clustered nodes especially
• Prevents data loss due to user error
• Disaster recovery
Customer Example

Goal: Reliable global forensics analytics, real-time event threat detection, resilient and easy to scale with demand
▶ 20 TB/day
▶ 90-day retention
▶ Splunk Enterprise + ES + UBA

Why migrate:
• Existing infrastructure expensive and unreliable
• Limited on-prem capacity, lack of agility
• Did not fit HW spec
• Leverage ML capabilities of Google Cloud

Deployment size:
▶ Multi-site HA
▶ 240 indexers
▶ 15 search heads
▶ 1.6 PB storage

Results:
• Lower TCO (40% lower cost)
• Deployment in days/hours vs months
• Ability to easily scale – now 25 TB/day
• Resilient to disk, VM and zonal failure
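A deployment like this can be sanity-checked with simple retention arithmetic. The compression factor and replication factor below are illustrative assumptions, not figures from the customer example; actual ratios vary by data type and index settings.

```python
def indexed_storage_tb(daily_tb: float, retention_days: int,
                       compression: float = 0.5) -> float:
    """Estimate indexed storage: daily volume x retention x compression ratio.

    The ~0.5 compression factor is an assumption, not a Splunk guarantee.
    """
    return daily_tb * retention_days * compression

# 20 TB/day for 90 days, single copy:
base = indexed_storage_tb(20, 90)
# With an assumed replication factor of 2 across sites:
total = base * 2
print(base, total)  # rough TB figures before overhead
```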
Real-world deployment

[Architecture diagram: a multi-site Splunk Enterprise deployment spanning three zones (Site A - us-west2-a, Site B - us-west2-b, Site C - us-west2-c) on a single subnet (10.0.0.0/24). Each site runs Indexer Nodes and Search Heads on Compute Engine behind Cloud Load Balancing, forming a Search Head Cluster and an Indexer Cluster. Single-node roles on Compute Engine: Indexer Cluster Master*, Search Head Cluster Deployer*, Deployment Server*, and License Master*. Cloud Interconnect links the VPC to the on-prem network.]

* VMs use Regional Persistent Disk to provide zonal redundancy
How do I get started? Splunk Enterprise Terraform scripts

Now open-sourced on GitHub:
https://github.com/GoogleCloudPlatform/terraform-google-splunk-enterprise
or bit.ly/splunk-on-gcp
Use Cases & Questions: We all have questions, but how do I know where to start?

Security:
• Are buckets secure? Do they contain sensitive data?
• What assets are being modified?
• Are we following our access policies?
• Is there any unusual activity or threat?

IT Ops:
• How many servers do I have?
• Are services meeting SLAs?
• Are there any perf bottlenecks?
• What's the usage over time?
• How many events/requests processed per second?

Business:
• Where is most of my cloud spend?
• What are areas to optimize cost?
• Is infrastructure properly sized?
• Are we using what we've paid for?
Data Coverage: Google Cloud offers mountains of data, so get to know it

Security:
• Cloud Security Command Center
• Cloud Asset Inventory
• Cloud Audit Logs
• G Suite Admin Audit Logs

IT Ops:
• Stackdriver Logs
• Stackdriver Metrics
• GKE & GKE On-Prem Metrics, Logs, Metadata

Business:
• Billing Reports
GDI Service Map

[Diagram: data paths into Splunk Enterprise via DB Connect (DBX), the GCP Add-on (TA), and HTTP Event Collector (HEC). Sources include BigQuery, Cloud Storage, Stackdriver Monitoring, Stackdriver Logging, Cloud Security Command Center, Cloud Asset Inventory, GKE + GKE On-Prem, and all GCP-monitored services & resources (Compute, Storage, DB, Networking), flowing through Cloud Pub/Sub and Cloud Dataflow.]
Cloud Dataflow

• Fully managed, no-ops data processing
• Unified batch and streaming processing
• Open source programming model
• Intelligently scales to millions of QPS
Cloud Dataflow: Data Sources and Sinks

Data Sources: Cloud Storage, Cloud Bigtable, Cloud Datastore, BigQuery, Cloud Pub/Sub, Third-Party DB
Sinks: Cloud Storage, Cloud Bigtable, BigQuery, Cloud Pub/Sub

See Google-provided Dataflow templates for common use cases:
https://github.com/GoogleCloudPlatform/DataflowTemplates
Pub/Sub to Splunk Dataflow template: Streaming data to Splunk HEC

● In the Splunk-GCP world, Dataflow can be used to stream data from Pub/Sub to Splunk
● Use the "Pub/Sub to Splunk" template pipeline from the Google-provided templates:
○ https://github.com/GoogleCloudPlatform/DataflowTemplates
○ Supports a dead-letter queue into a Pub/Sub topic (fallback) and a secondary sink to GCS (archive)
○ Supports JavaScript user-defined functions (UDFs) to transform events before sending to Splunk
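To illustrate what such a transform does: the real template UDFs are written in JavaScript, but the logic is easy to sketch. The Python version below is illustrative only, and the specific enrichment (a static field, dropping one key) is a made-up example, not part of the template.

```python
import json

def transform(event_json: str) -> str:
    """Illustrative event transform, mirroring what a Dataflow UDF might do.

    (Real Pub/Sub-to-Splunk template UDFs are JavaScript, not Python.)
    """
    event = json.loads(event_json)
    # Enrich with a static field and drop a noisy key, if present.
    event["environment"] = "prod"
    event.pop("insertId", None)
    return json.dumps(event)

out = transform('{"insertId": "abc", "message": "hello"}')
print(out)
```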
[Diagram: Cloud Pub/Sub → Cloud Dataflow → Splunk HEC, with Cloud Storage as a secondary sink. The Dataflow template transforms/enriches data before pushing to Splunk HEC.]
Dataflow vs Add-on: How do these ingestion methods compare?

Dataflow:
• Send data to Splunk via HEC
− Normal HEC limitations
• Cloud-native streaming (simplicity, security, scale)
• Wide coverage: Pub/Sub, GCS, BigQuery, etc.
• Simplifies collecting:
− Asset inventory
− G Suite

Add-on:
• Collect data via Pub/Sub in batches
• Some predefined source types
• Also collects:
− Stackdriver Metrics
− Billing
GCP Stackdriver Logs: GCP GDI Pattern

● GCP logs (audit, etc.) end up in Stackdriver Logs
○ Also referred to as GCP Logging
● Stackdriver logs can be configured to have Pub/Sub as a sink destination
○ Remember - Splunk can scalably pull from Pub/Sub
○ Alternatively, can use GCP Dataflow to stream directly from Pub/Sub to Splunk HEC

[Diagram: GCP services export logs to Stackdriver Logging; a Stackdriver Logging export sets Pub/Sub as a sink for incoming logs; Splunk pulls from Pub/Sub, or (alternate path) Cloud Dataflow streams to Splunk HEC.]
GCP Stackdriver Metrics: GCP GDI Pattern

● ALL Stackdriver Metrics are supported by the Splunk Add-on for GCP
○ For detailed VM instance metrics, the Stackdriver agent must be installed
○ List of GCP service metrics: https://cloud.google.com/monitoring/api/metrics_gcp

[Diagram: GCP services export metrics to Stackdriver Monitoring; Splunk pulls in specific metrics with scheduled API calls.]
GCP Billing Data: GCP GDI Pattern

● GCP Cloud billing reports can be configured to be pushed daily to a GCS bucket (File Export)
● The Splunk Add-on for Google Cloud Platform comes with an input for pulling reports from a GCS bucket
● Alternative: billing data exported to BigQuery
○ Export to GCS, then use the existing Billing input - you need to automate this process
○ BigQuery billing data is in a different format and more verbose than the supported file (GCS) export approach

[Diagram: the Cloud Billing API exports billing reports to a GCS bucket; Splunk pulls in billing reports with scheduled API calls.]
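Once billing rows are in Splunk, a common first question is spend per service. A minimal sketch of that rollup in plain code (the field names below are illustrative, not the exact billing-export schema):

```python
from collections import defaultdict

# Hypothetical rows in the shape of a GCS file-export billing report.
rows = [
    {"service": "Compute Engine", "cost": 120.50},
    {"service": "Cloud Storage", "cost": 30.25},
    {"service": "Compute Engine", "cost": 79.50},
]

def cost_by_service(rows):
    """Total spend per service, the kind of rollup Splunk would chart."""
    totals = defaultdict(float)
    for r in rows:
        totals[r["service"]] += r["cost"]
    return dict(totals)

print(cost_by_service(rows))
```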
Google Cloud Storage: Other than billing data

● Other GCS data?
○ Option 1: Use Dataflow templates to stream or batch to Pub/Sub, then pull via Add-on
○ Option 2: Use the Pub/Sub to Splunk Dataflow template, and set the source connector to GCS
○ For low-bandwidth data use scheduled batch jobs ($), otherwise streaming jobs ($$$)

[Diagram, pulling via Add-on: Cloud Storage → Cloud Dataflow → Cloud Pub/Sub; the Dataflow template transforms/enriches data before pushing to Pub/Sub, and Splunk pulls from Pub/Sub. Streaming to HEC: Cloud Storage → Cloud Dataflow → Splunk HEC; the Dataflow template transforms/enriches data before pushing to Splunk HEC.]
G Suite: GCP GDI Pattern

▶ Can we stream to Pub/Sub and use the GCP Add-on?
• G Suite audit logs can be exported to BigQuery
• BigQuery -> Dataflow -> Splunk, OR
• BigQuery -> Export to GCS -> Dataflow -> Splunk
• Latency becomes a consideration

[Diagram: BigQuery → Cloud Dataflow → Splunk, with an optional alternate path through Cloud Storage; the Dataflow template transforms/enriches data before pushing to Splunk HEC.]
Getting Data In – the Splunk side of things: The Splunk Add-on for Google Cloud Platform

▶ This Add-on supports data collection for a number of GCP sourcetypes out of the box
• https://splunkbase.splunk.com/app/3088/
Getting Data In – GCP side: GCP Organization structure. Simplified logging across projects

▶ Multi-tiered organization structure allows for separation of projects, products, departments, etc. within GCP
▶ GCP Stackdriver can export aggregated logs from all or a subset of projects, folders, etc.
▶ Configure in one place, log everywhere
Getting Data In: Listing it out

Data Source                               Mechanism
Stackdriver Logs (includes Cloud Audit)   Splunk GCP Add-on Mod Input, OR streaming via Pub/Sub to Splunk Dataflow pipeline
Stackdriver Metrics                       Splunk GCP Add-on Mod Input
Cloud Storage – Billing Reports           Splunk GCP Add-on Mod Input
Cloud Asset Inventory                     GCP streaming pipeline + Splunk GCP Add-on Mod Input
Cloud Security Command Center             GCP streaming pipeline + Splunk GCP Add-on Mod Input
GKE & GKE On-Prem (Anthos)                Splunk Connect for K8s (and Splunk App for Infrastructure)
BigQuery                                  Splunk DB Connect using BigQuery JDBC drivers
Check out Session FN2132 for a deeper dive on GCP. It will cover: Asset Inventory, Cloud Security Command Center, Anthos & GKE, VPC Flow, Stackdriver Query Library, and running Splunk on GCP.
GCP Side Best Practices: What levers to pull on the GCP side

▶ Enable Data Access logs for select or all services
• Best practice: configure at the organization level
• Admin Activity logs are enabled by default
▶ Configure logging export to a Pub/Sub topic
• Best practice: set up aggregated export for the organization
• Filtering: can include/exclude logs for specific resources and types
▶ Set IAM policy permission for the Pub/Sub topic
• Grant the service account (SA) permission to publish to the topic
▶ Configure the Splunk Add-on for GCP to pull from the Pub/Sub topic
• Use a dedicated SA for the Add-on, following the least-privilege principle
Splunk Side Best Practices: What levers to pull on the Splunk side

▶ Scaling data collection
• Set up more inputs for the same Pub/Sub topic (no, this won't cause duplication)
• Create more Pub/Sub topics split by use case (e.g. security-centric logs in their own topic)
• Add more instances to collect from Pub/Sub, or increase the number of HEC listeners
• Pub/Sub to HEC: bump the timeout from 10s to 50s
▶ Management
• Centralized vs multiple Pub/Sub topics (more upstream routing)
• Organization vs Projects
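The HEC-scaling points above rely on batching: HEC accepts many events in one POST body as newline-delimited JSON envelopes, which cuts per-request overhead. A minimal sketch of building such a payload (the default source and sourcetype strings here are illustrative placeholders, not prescribed values):

```python
import json

def hec_batch(events, source="gcp:pubsub", sourcetype="google:gcp:pubsub:message"):
    """Build a newline-delimited HEC payload from a list of event dicts.

    Each event is wrapped in an envelope with an "event" key; the whole
    string is sent as one HTTP POST body to the HEC endpoint.
    """
    return "\n".join(
        json.dumps({"event": e, "source": source, "sourcetype": sourcetype})
        for e in events
    )

payload = hec_batch([{"msg": "a"}, {"msg": "b"}])
print(payload)
```

Larger batches mean fewer requests per second against the HEC listeners, which is exactly why adding listeners and raising the Pub/Sub-to-HEC timeout helps at high volume.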