5 Key Considerations for Monitoring Google Cloud

Post on 15-Jun-2020


What’s inside

Introduction: Software is taking over the world

Chapter 1: Hybrid, multi-cloud is the norm

Chapter 2: Microservices and containers require real-time visibility

Chapter 3: AI empowers, but not all AI is equal

Chapter 4: DevOps and Continuous Delivery: Innovation's soulmate

Chapter 5: Digital experiences matter

As companies push to transform digitally, they are moving workloads to the cloud at an accelerating pace to get the technology platform they need to release better software faster and ensure it works perfectly across every customer interaction. But the dynamic nature of cloud environments makes them complex, and that complexity can threaten an organization’s business and end-user experience.

This eBook gives you five key tips for monitoring Google Cloud so that development, operations/SRE, and business teams can get fast feedback on application performance. That feedback helps you modify and improve applications quickly and keep increasing the value you deliver to business teams.

Introduction

5 Key Considerations for Google Cloud ©2020 Dynatrace


Introduction: Software is taking over the world

Speed and scale: a double-edged sword

You invested in Google Cloud Platform to build and run your software at a speed and scale that will transform your business; that’s where Google Cloud excels. But are you prepared for the complexity that comes with speed and scale? As software development transitions to a cloud native approach that employs microservices, containers, and software-defined cloud infrastructure, the complexity you will face in the immediate future is greater than the human mind can envision.

You also invested in monitoring tools, lots of them over the years. But traditional monitoring tools don’t work in the dynamic world of speed and scale that Google Cloud enables. That’s why many analysts and industry leaders predict that more than 50% of enterprises will entirely replace their traditional monitoring tools in the next few years. Which brings us to why we’ve written this guide. We understand how important your software is, and why choosing the right monitoring platform is mandatory if you want to live by speed and scale, and not die by speed and scale.

Manual effort
Slow, manual deployment and configuration, coupled with manual upgrades and rework when environments change, mean that a maximum of just 5% of apps are monitored.

Monitoring tool proliferation
Multiple monitoring tools serve different purposes, with siloed teams looking at myopic data sets.

Agent complexity
A complex mix of agents covers diverse technology types, each with different deployment, installation, and configuration processes.

Just a bunch of charts
Data from multiple agents and different sources looks great, but it's just a bunch of charts on a dashboard with no answers.


We worked with your industry peers to arrive at our insights and conclusions

Dynatrace works with the world’s most recognized brands, helping to automate their operations and release better software faster. We have experience monitoring the largest Google Cloud implementations, helping enterprises manage the significant complexity challenges of speed and scale. Some examples include:

• A large retailer managing 2,000,000 transactions a second

• An airline with 9,200 agents on 550 hosts capturing 300,000 measurements per minute and more than 3,000,000 events per minute

• A large health insurer with 2,200 agents on 350 hosts, with 900,000 events per minute and 200,000 measurements per minute

Read on to reveal five critical factors that dictate the right monitoring platform for Google Cloud.

At Dynatrace, we went through our own digital transformation and achieved our goal of becoming a DevOps-driven, cloud-native, cloud-centric software company led by best practices. Today, in terms of continuous automation, we enjoy being named furthest for Completeness of Vision and highest for Ability to Execute in the Gartner Magic Quadrant for Application Performance Monitoring.

Dynatrace Transformation Report

• Speed: 26 releases per year

• Quality: 93% reduction in production bugs

• Customers: Ecstatic

• Agility: 5,000 cloud deployments per day

• Innovation: Hundreds of developers, no operations


Chapter 1: Hybrid, multi-cloud is the norm

Insight

Enterprises are rapidly adopting cloud infrastructure as a service (IaaS), platform as a service (PaaS), and function as a service (FaaS) to increase agility and accelerate innovation. Cloud adoption is so widespread that hybrid multi-cloud is now the norm. According to RightScale, 81% of enterprises are executing a multi-cloud strategy,¹ while 451 Research predicts that over two-thirds of enterprises will operate a hybrid multi-cloud environment by 2020.²

Hybrid cloud

As enterprises migrate applications to the cloud or build new cloud native applications, they are also maintaining traditional applications and infrastructure. Over time, the balance will shift from the traditional tech stack to the new stack, but new and old will continue to coexist and interact.

Multi-cloud

Different cloud platforms have different features and benefits, technologies, levels of abstraction, prices, and geographic footprints that make them suitable for specific services. Enterprises started with a single cloud provider but quickly embraced multiple clouds, resulting in highly distributed application and infrastructure architectures.

¹ RightScale: Cloud Computing Trends: 2018 State of the Cloud Survey
² 451 Research: Voice of the Enterprise: Cloud Transformation


Challenge

The result of hybrid multi-cloud is bimodal IT: the practice of building and running two distinctly different application and infrastructure environments. Enterprises need to continue to enhance and maintain existing, relatively static environments while also building and running new applications on scalable, dynamic, software-defined infrastructure in the cloud.

Putting traditional IT to one side for a moment to focus solely on multi-cloud platforms, the frequent outcome is monitoring-tool proliferation, which results from teams operating in silos despite critical interdependencies between services running across clouds.

The challenge of multiple monitoring tools across clouds is compounded further when we bring traditional IT back into focus, along with the need to monitor and manage a range of existing technologies that also have service interdependencies with cloud environments.

Key consideration

Simplicity and cost saving drove early cloud adoption. But today, enterprise cloud use has evolved into a complex and dynamic landscape that spans multiple clouds as well as traditional on-premises technologies. The ability to seamlessly monitor the full technology stack across multiple clouds while also monitoring traditional on-premises technology stacks is critical to automating operations, no matter how highly distributed the applications and infrastructure being monitored.
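As a rough illustration of what monitoring the full stack across clouds and on-premises systems implies, the sketch below maps samples from two differently shaped sources into one common record so they can be compared side by side. The source names, payload fields, and the `Metric` type are all hypothetical and invented for this example; real platform APIs use their own schemas, and this is not how any particular product implements it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Metric:
    """Common shape that every platform-specific sample is mapped into."""
    source: str   # which platform reported the sample
    entity: str   # host, VM, or service the sample belongs to
    name: str     # normalized metric name
    value: float


# Hypothetical raw payload shapes, invented for illustration only.
def from_cloud_a(raw: dict) -> Metric:
    return Metric("cloud-a", raw["instance"], raw["metric"].lower(), float(raw["val"]))


def from_on_prem(raw: dict) -> Metric:
    # This made-up source reports CPU as a 0..1 fraction; convert it to percent.
    return Metric("on-prem", raw["host"], "cpu_percent", float(raw["cpu_fraction"]) * 100.0)


ADAPTERS: Dict[str, Callable[[dict], Metric]] = {
    "cloud-a": from_cloud_a,
    "on-prem": from_on_prem,
}


def normalize(samples: List[Tuple[str, dict]]) -> List[Metric]:
    """Map (source, raw_payload) pairs into the common Metric shape."""
    return [ADAPTERS[source](raw) for source, raw in samples]


def hottest(metrics: List[Metric], name: str) -> Metric:
    """Return the sample with the highest value for one metric, across all sources."""
    return max((m for m in metrics if m.name == name), key=lambda m: m.value)


samples = [
    ("cloud-a", {"instance": "vm-1", "metric": "CPU_PERCENT", "val": "42.5"}),
    ("on-prem", {"host": "db-1", "cpu_fraction": 0.75}),
]
metrics = normalize(samples)
print(hottest(metrics, "cpu_percent").entity)
```

Once every source lands in the same shape, a question such as "which entity is busiest right now" no longer depends on which cloud or datacenter the entity runs in.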

[Figure: per-platform monitoring tools: AWS CloudWatch, Azure Monitoring, Google Stackdriver, PCF Metrics, OpenShift Metrics, VMware Metrics]

69% of enterprises will have a hybrid, multi-cloud environment by 2020 (451 Research)


Chapter 2: Microservices and containers require real-time visibility

Insight

Microservices and containers are revolutionizing the way applications are built and deployed, providing tremendous benefits in terms of speed, agility, and scale.³ In fact, 98% of enterprise development teams expect microservices to become their default architecture, and IDC predicts that by 2022, 90% of all apps will feature a microservices architecture.⁴

Challenge

Seventy-two percent of CIOs say that monitoring containerized microservices in real time is almost impossible.⁵ Moving to microservices running in containers makes it harder to get visibility into environments. Each container acts like a tiny server, multiplying the number of points you need to monitor, and containers live, scale, and die based on health and demand. As enterprises scale their Google Cloud environments from on-premises to cloud to multi-cloud, the number of dependencies and the volume of data generated increase exponentially, making it impossible to understand the system as a whole.

The traditional approach to instrumenting applications involves the manual deployment of multiple agents. When environments consist of thousands of containers with orchestrated scaling, manual instrumentation becomes impossible and severely limits the ability to innovate.

Key consideration

A manual approach to instrumenting, discovering, and monitoring microservices and containers will not work. For dynamic, scalable platforms like Google Cloud, a fully automated approach to agent deployment, continuous discovery of containers, and monitoring of the applications and services running within them is mandatory.
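To make "continuous discovery" concrete, here is a minimal sketch of the idea as a reconciliation loop: each pass compares the containers that are currently running with the set already being monitored, then plans what to attach and what to retire. The container names and the `plan_actions` and `run_discovery` helpers are hypothetical, and the snapshots stand in for whatever an orchestrator would report; this is an illustration of the pattern, not a description of how any vendor's agent works.

```python
from typing import Iterable, List, Set


def plan_actions(running: Set[str], monitored: Set[str]) -> List[str]:
    """Compare what is running with what is monitored and plan the difference."""
    # Containers that appeared since the last pass need monitoring attached.
    actions = [f"attach:{name}" for name in sorted(running - monitored)]
    # Monitors whose container has gone away should be retired.
    actions += [f"detach:{name}" for name in sorted(monitored - running)]
    return actions


def run_discovery(snapshots: Iterable[Set[str]]) -> List[List[str]]:
    """Run one reconciliation pass per snapshot and record the planned actions."""
    monitored: Set[str] = set()
    history: List[List[str]] = []
    for running in snapshots:
        history.append(plan_actions(running, monitored))
        monitored = set(running)  # after the pass, monitoring matches reality
    return history


# Hypothetical snapshots: a service scales out, then the old container dies.
snapshots = [{"cart-1"}, {"cart-1", "cart-2"}, {"cart-2"}]
history = run_discovery(snapshots)
for step in history:
    print(step)
```

Because every pass works from the observed state rather than from a hand-maintained list, short-lived containers are picked up and released without anyone editing configuration.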

³ Dimensional Research
⁴ IDC FutureScape
⁵ Dynatrace CIO Complexity Report

72% of CIOs say monitoring containerized microservices in real time is almost impossible (Dynatrace CIO Complexity Report)


Chapter 3: AI empowers, but not all AI is equal

Insight

Gartner predicts 30% of IT organizations that fail to adopt AI will no longer be operationally viable by 2022.⁶ As enterprises embrace a hybrid multi-cloud environment, the sheer volume of data created and the massive environmental complexity will make it impossible for humans to monitor, comprehend, and take action. This critical need for machines to solve data volume and speed challenges has resulted in Gartner creating a new category: AIOps (AI for IT operations).

Challenge

AI is a buzzword across many industries, and making sense of the market noise is difficult. Many enterprises are trying to add an AIOps solution on top of the 10 to 25+ monitoring tools they already have. While this approach may bring limited benefits such as alert noise reduction, it can only marginally deliver root cause analysis and auto-remediation, because it lacks the full context of the environment.

To help, here are three key AI use cases to keep in mind when considering how to monitor your Google Cloud Platform and applications:

AI and root cause analysis
The biggest benefit of AI to monitoring is its ability to automate root cause analysis, which makes it possible to identify and resolve problems at speed. An AI engine that has access to more complete data (including third-party data) will provide faster, contextual insights.

AI and alert storms
AI is well suited to real-time monitoring and analysis of large data sets to provide the most probable reason for a performance issue. AI recognizes when related anomalies occur within your environment (for example, when thresholds are broken), thereby preventing alert storms.

AI and auto-remediation
AI should be a part of your CI/CD pipeline, deployment, and remediation processes. Problems can be detected quickly, and problem builds can be identified earlier, so you can automatically remediate or roll back to a previous state.
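The first two use cases can be illustrated with a small, deliberately simplified sketch: alerting components that are linked through dependencies are merged into one problem instead of raising one alert each, and the likely root cause is the member of the group whose own dependencies are healthy. The topology, the component names, and both helper functions are hypothetical. This is a plain graph traversal used for illustration, not the algorithm of any specific AIOps product.

```python
from typing import Dict, List, Set

# Hypothetical topology: each component maps to the components it depends on.
DEPENDS_ON: Dict[str, List[str]] = {
    "web": ["checkout"],
    "checkout": ["payments-db"],
    "payments-db": [],
    "search": [],
}


def group_alerts(alerting: Set[str], depends_on: Dict[str, List[str]]) -> List[Set[str]]:
    """Group alerting components that are connected through dependencies."""
    # Build an undirected neighbor map restricted to alerting components.
    neighbors: Dict[str, Set[str]] = {c: set() for c in alerting}
    for component in alerting:
        for dep in depends_on.get(component, []):
            if dep in alerting:
                neighbors[component].add(dep)
                neighbors[dep].add(component)

    groups: List[Set[str]] = []
    seen: Set[str] = set()
    for start in sorted(alerting):
        if start in seen:
            continue
        # Depth-first walk collects everything reachable from this component.
        group: Set[str] = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(neighbors[node] - group)
        seen |= group
        groups.append(group)
    return groups


def probable_root_causes(group: Set[str], depends_on: Dict[str, List[str]]) -> List[str]:
    """Members of a group that do not depend on any other alerting member."""
    return sorted(
        c for c in group
        if not any(dep in group for dep in depends_on.get(c, []))
    )


alerting = {"web", "checkout", "payments-db", "search"}
groups = group_alerts(alerting, DEPENDS_ON)
for group in groups:
    print(sorted(group), "->", probable_root_causes(group, DEPENDS_ON))
```

In this toy topology, four alerts collapse into two problems, and the three related alerts point at the one component that nothing else in the group explains. The sketch only works because the dependency map is available, which is the point the chapter makes about needing the full context of the environment.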

⁶AI (in a box) for IT Ops—The AIOps 101 you’ve been looking for


You will also find there are many different approaches to AI. Here are a few of the more popular ones you are likely to encounter as you move towards an AIOps strategy:

Deterministic AI
This gives you the ability to discover the topology of your environment and the metrics produced by all components. It works immediately and adapts to changes without having to relearn patterns. It is also excellent at event noise reduction (alert storms), dependency detection, root cause analysis, and business impact analysis.

Machine-learning AI
This is a metrics-based approach. It takes time to build a data set against which it can compare previous states. Its strongest feature is event noise reduction. However, it does not offer root cause or business impact analysis.

Anomaly-based AI
This form of AI provides satisfactory event noise reduction and dependency detection. One of its major drawbacks is that it takes a lot of time to build a metrics model that would show a correlation for root cause analysis.

Key consideration

So while AI empowers, not all AI is created equal. Attempting to enhance existing monitoring tools with AI, such as machine-learning and anomaly-based AI, will provide limited value. AI needs to be inherent in all aspects of the monitoring platform and see everything in real time, including the topology of the architecture, dependencies, and service flow. AI should also be able to ingest additional data sources for inclusion in the AI algorithms, as opposed to correlating data via charts and graphs.

30% of IT organizations that fail to adopt AI will no longer be operationally viable by 2022 (Gartner)

[Figure: topology layers: Applications, Services, Processes, Hosts, Datacenters]


Chapter 4: DevOps and Continuous Delivery: Innovation's soulmate

Insight

DevOps and Continuous Delivery are perhaps the most critical considerations when maximizing an investment in Google Cloud and other cloud technologies. Implemented and executed correctly, they can enhance an enterprise’s ability to innovate with speed, scale, and agility. Last year Dynatrace ran a survey that included two important dimensions: MTTR (the length of time to remediate and resolve an issue) and mean time to innovation (the time it takes to build and test functionality and push it to end users). These measures tell a lot about the maturity of a company and its level of automation. Surprisingly, the results showed that only about 5% of the people we surveyed are achieving top performance; 95% of companies today are not leveraging the full potential of cloud native technology.⁷

Challenge

As enterprises scale across multiple teams, there will be hundreds or thousands of changes a day, resulting in code pushes every few minutes. While CI/CD tooling helps mitigate error-prone manual tasks through automated build, test, and deployment, bad code can still make it into production. The complexity of a highly dynamic and distributed cloud environment like Google Cloud, along with thousands of deployments a day, will only exacerbate this risk.

As the software stakes get higher, shifting performance checks left (that is, earlier in the pipeline) to enable faster feedback loops becomes critical. But that can’t be accomplished easily with a multi-tool approach to monitoring. To be effective, a monitoring solution needs a holistic view of every component and every change, and a contextual understanding of the impact each change has on the system as a whole.

Key consideration

To go fast and not break things, AI and automation should be a part of your DevOps monitoring strategy.

• Use monitoring strategically as a feature of the end-to-end pipeline to help automate, and to democratize data for tighter collaboration across teams.

• Shift left: automate quality checks and stop bad code changes before they reach production.

• Shift right: automate deployments and release higher-quality applications more frequently.

• Automate operations so that you can auto-mitigate and self-heal bad deployments in production.
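The shift-left point in the list above can be sketched as a performance gate kept in code next to the pipeline: a candidate build is compared with a baseline, and any metric that regresses beyond its allowed margin blocks promotion. The metric names, the thresholds, the sample numbers, and the `evaluate_gate` helper are assumptions made for this example, and the sketch assumes that a higher value is worse for every metric listed.

```python
from typing import Dict, List

# Hypothetical gate definition: metric name -> maximum allowed regression
# relative to the baseline. Higher values are assumed to be worse.
GATE: Dict[str, float] = {
    "response_time_ms": 0.10,  # at most 10% slower than the baseline
    "error_rate": 0.00,        # no increase allowed
}


def evaluate_gate(
    baseline: Dict[str, float],
    candidate: Dict[str, float],
    gate: Dict[str, float] = GATE,
) -> List[str]:
    """Return a list of violations; an empty list means the build may be promoted."""
    violations: List[str] = []
    for metric, max_regression in gate.items():
        allowed = baseline[metric] * (1.0 + max_regression)
        if candidate[metric] > allowed:
            violations.append(
                f"{metric}: {candidate[metric]} exceeds allowed {allowed:.2f}"
            )
    return violations


# Made-up measurements for a baseline and two candidate builds.
baseline = {"response_time_ms": 200.0, "error_rate": 0.01}
build_a = {"response_time_ms": 210.0, "error_rate": 0.01}
build_b = {"response_time_ms": 260.0, "error_rate": 0.01}

print(evaluate_gate(baseline, build_a))
print(evaluate_gate(baseline, build_b))
```

Keeping the gate as data means the thresholds are versioned and reviewed together with the code they protect, and the same check can run in staging before approval and again in production as a trigger for rollback.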

[Figure: Strategically Integrate Intelligent Monitoring into the Software Delivery Process. Builds #17 and #18 move through Staging and Production stages (Deploy / Test, Validate, Approval), with notifications about tests and canaries, comparison and validation results, and Performance-Gates-as-Code, Validation-as-Code, and Autoremediation-as-Code.]


Chapter 5: Digital experiences matter

Insight

Enterprises are striving to accelerate innovation without putting customer experiences at risk, but it’s not just the traditional end-customer experiences of web and mobile apps that are at risk. Apps built on Google Cloud support a broad range of services and audiences that rely on the emerging paradigm of machine-to-machine (M2M) and Internet of Things (IoT) connections.

• The consumerization of IT has evolved to include wearables, smart homes, smart cars, and life-critical health devices.

• Corporate employees are increasingly working remotely and need access to systems both in the corporate datacenter and in the cloud.

• Employees using office workspaces rely on smart office features for lighting, temperature, safety, and security.

What was simply regarded as user experience has evolved and grown into digital experience, encapsulating end-users, employees, and IoT.

The rise of the machines

Machines are used in previously unimaginable areas worldwide and, across all industries, are increasingly being connected to the Internet, creating a colossal communication network at global scale. Gartner estimates connected devices in use worldwide will top 20 billion by 2020.


Challenge

Enterprise IT departments face mounting pressure to accelerate their speed of innovation, while user expectations for speed, usability, and availability of applications and services increase unabated. Combined with the explosion of IoT devices and the increasingly vast array of technologies involved, managing and optimizing digital experiences while embracing high-frequency software release cycles and operating complex hybrid cloud environments presents significant challenges.

If digital experiences aren’t measured, how can enterprises prioritize and react when problems occur? Are they even aware there are problems? And if experiences are quantified, is it in the context of the supporting applications, services, and infrastructure that permit rapid root-cause analysis and remediation? Only enterprises able to deliver extraordinary digital customer experiences will stay relevant and prosper.

Key consideration

Enterprises need confidence that they’re delivering, or on the path to delivering, exceptional digital experiences in increasingly complex environments. To achieve this, they require real-time monitoring and 100% visibility across all types of customers, employees, and machine-based experiences. Key things to look for include:

Ability to visualize and prioritize impact
Understand how specific issues or overall performance impact every single user session or device, and prioritize by magnitude.

Visibility from the edge to the core
A single view across your entire multi-cloud ecosystem, from the performance of users and edge devices to your applications and cloud platforms, all in context.

Single source of truth for all
Ensure stakeholders, from IT to marketing, have access to the same data to avoid silos, finger pointing, and war rooms.

[Figure labels: Performance, Root Cause, Impact, Revenue]

53% of mobile users abandon a session if it takes longer than 3 seconds to load

75% of customers expect online help resolution within 5 minutes

79% of users will not return after a negative experience

74% of CIOs fear IoT performance problems could derail operations and significantly damage revenue

2019 Global IT Complexity Report

76% of CIOs say multi-cloud deployments make monitoring user experience difficult (Dynatrace CIO Complexity Report)


Dynatrace's all-in-one platform provides intelligence into the performance of your apps, underlying infrastructure, experience of users, and more, so you can automate IT operations, release better software faster, and deliver unrivaled digital experiences.

Dynatrace + Google Cloud: A powerful partnership

Dynatrace is an all-in-one enterprise cloud monitoring solution that provides real-time answers and insights for all teams. AI-powered, full stack, and completely automated — all you need to transform faster and compete more effectively in the digital age.

Full Stack

Understand all the relationships and interdependencies, top to bottom, for your complex enterprise cloud ecosystem.

AI Powered

Deterministic, causation-based AI for real-time insights, actions, and actionable answers, not just more data.

Automated

Zero-touch configuration, continuous discovery and mapping, effortless problem identification, and root cause determination.

Web-scale

Scale-out cloud native architecture, role-based governance for large global teams and automatic, enterprise-wide deployment.


[Figure: 360º actionable monitoring with one-click deployment. Intelligent enterprise cloud monitoring (full-stack, hyper-scale, hybrid cloud) addressing complexity, innovation, speed, scale, productivity, and security. Dynatrace helps you build better software faster.]


Spend your time innovating, not monitoring

Leveraging Dynatrace enables enterprises to innovate faster, automate IT operations, and provide perfect software experiences to customers. Dynatrace is built to support innovation at scale, minimize risk, and reduce cloud complexity. Utilizing its AI capabilities, Dynatrace provides real-time, high-fidelity data to operations, development, and business teams.

This helps organizations lay the foundation for a more collaborative organizational structure. It opens the door to even greater agility and flexibility to innovate at scale through automation and autonomous cloud operations.

Dynatrace by the numbers

1 solution for all performance management across on-premises, cloud, hybrid, and multi-cloud environments

0 configuration required

600+ R&D experts ensure the industry’s broadest technology coverage

100,000+ hosts: Scales for the world’s largest environments

5 minutes: Start monitoring in just minutes with a software-as-a-service, managed, or on-premises solution

360° monitoring

1-click deployment


The bottom line: Transform your digital business faster with Dynatrace & Google Cloud

With Dynatrace and Google Cloud Platform, organizations can:

• Release better software, faster. Build an unbreakable delivery pipeline and enable self-healing, so you can focus on innovation, not troubleshooting.

• Automate and modernize cloud operations. Ensure enterprise cloud success while optimizing resources and rationalizing tools with automated, AI-powered monitoring.

• Deliver perfect software experiences. Ensure perfect experiences by seeing every customer’s journey from their perspective, in the context of your app and infrastructure performance.

Power and scale: Native Google Cloud integration

Dynatrace natively embeds OneAgent technology through Google Cloud virtual machine extensions, making it the most powerful enterprise cloud monitoring solution available for this cloud platform.

One-click deployment delivers the full picture of how Dynatrace builds on top of the productivity, intelligence, and hybrid capabilities of Google Cloud.

[Figure: Dynatrace OneAgent on Google Cloud. Topology layers: Applications, Services, Processes, Hosts, Datacenters. Kubernetes on GKE and GCP Compute: a control plane (Scheduler, Controller Manager, API Server) and worker nodes (VMs, pods, containers) with the OneAgent Operator, alongside multi-cloud compute and digital experience.]

• OneAgent deploys automatically to all layers and technologies in your environment

• Monitor, analyze, and optimize every digital interaction

• Real-time auto discovery through OneAgent Operator

• Injection of containers without code or image changes

• Automatic and continuous deployment of Dynatrace OneAgent to all components

• Full integration with all major cloud platforms


Enterprises use Google Cloud to fundamentally transform how they build and run applications at speed and scale in highly distributed, multi-cloud environments.

We hope this 5 Key Considerations for Monitoring Google Cloud e-book has provided helpful advice and guidance to you on your Google Cloud journey. Dynatrace is committed to providing enterprises the data and intelligence needed to be successful with their Google Cloud deployments, no matter how complex.

If you are ready to learn more, please visit www.dynatrace.com/trial for assets, resources, and a free 15-day trial.

About Dynatrace

Dynatrace provides software intelligence to simplify enterprise cloud complexity and accelerate digital transformation. With AI and complete automation, our all-in-one platform provides answers, not just data, about the performance of applications, the underlying infrastructure, and the experience of all users. That’s why many of the world’s largest enterprises, including 72 of the Fortune 100, trust Dynatrace to modernize and automate enterprise cloud operations, release better software faster, and deliver unrivaled digital experiences.

Learn more

04.20.20 8586_EBK_Agency/jw ©2020 Dynatrace