Reimagining Application Performance Management in the AIOps Era

By Charles Araujo

+1-617-517-4999 | [email protected] | www.intellyx.com

© 2019 Intellyx LLC | +1-617-517-4999 | [email protected] | www.intellyx.com

Table of Contents

Traditional APM: Built for a Different Time

The Evolving Nature of APM

AI-powered APM for the Rest of the Stack

The Intellyx Take

About the Author

Enterprise IT organizations have historically turned to application performance management (APM) systems to instrument, monitor, and manage critical applications. But as the rate and impact of failures are now showing, these traditional approaches are beginning to come up short as complexity increases. Enterprise organizations are left looking for another option.

On a busy April afternoon last year, thousands of shoppers were lined up all across Australia waiting to buy their groceries.

And they waited.

In some cases, they were abruptly told that the store had closed, to leave their shopping carts, and to exit.

It was not a good day for Woolworths, Australia's largest chain of supermarkets. In total, the cash registers at nearly five hundred of the company's stores had failed for at least thirty minutes.

Customers were perturbed—and they lashed out on social media to express their frustration.

“Down Down, the cash registers are down.... #woolworths new slogan,” wrote @wolfcat on Twitter.

The culprit? A software “glitch.”

The outage caused widespread impact to both Woolworths’ top line (lost revenue) and to its reputation.

While I doubt the company will take solace from this, it is far from alone.

Throughout the world, enterprise organizations are suffering massive and systemic failures at an increasing rate and, as Woolworths' situation demonstrates, the impact of these failures is measured not only in hard costs (often in the millions), but also in reputational damage that can be far more costly in the long run.

Ironically, one reason these failures are increasing is that enterprises are aggressively pursuing digital transformation initiatives, which expand the complexity and transiency of the technology stack as they continually grow and change it.

As complexity and transiency increase, however, so does the risk of a service disruption: various studies show that changes cause as many as 70-80% of outages. Moreover, as transformational initiatives continue to embed technology into nearly every facet of the customer experience, the impact of service failures increases dramatically.

Enterprise IT organizations historically turned to application performance management (APM) systems to instrument, monitor, and manage critical applications and avoid these types of issues. But as the rate and impact of these failures are showing, these traditional approaches are coming up short as complexity increases.

Enterprise organizations are left looking for another option.

Traditional APM: Built for a Different Time

The operational gap that this increased complexity is causing is catching many enterprise leaders off guard.

Having made significant investments in traditional APM solutions, they thought they were covered. Why is it that these solutions, into which they have invested both significant money and organizational resources, are no longer getting the job done?

The reason is that traditional APM solutions were built for another time—a time in which the technology stack was less complex and more predictable.

When technology companies began introducing the idea that organizations could instrument and monitor critical applications, it was a well-received breakthrough. Yes, it would be involved and invasive (requiring code-level instrumentation), but the value of being able to monitor critical applications was worth the trouble—and the cost.

This investment was made possible, however, because three things were also true.

• The application stack was relatively static. There was little worry that, after the organization invested in instrumenting an application, the application would suddenly change or be decommissioned.

• Organizations could easily distinguish between their critical and non-critical applications. This clarity meant that they would only need to invest resources instrumenting a small subset of their application stack.

• The mostly monolithic nature of the technology stack at this early stage meant that it was also a reasonably simple task to identify and map the infrastructure elements supporting those critical applications.

Today, however, none of this is true.

Instead, the modern enterprise technology stack is in a constant state of change and is becoming ever-more interconnected and intertwined—at both the application and infrastructure level. The result is that it has become almost impossible to separate critical from non-critical.

It is, therefore, not difficult to see why the resource-intensive instrumentation architectures of APM solutions are having trouble delivering on their promise to organizations—and why the enterprise understanding of APM must evolve.

The Evolving Nature of APM

While it is clear that the broader idea of APM must evolve, that’s not to say that traditional APM solutions have no place in the modern enterprise.

Traditional solutions still provide significant value in those instances in which an enterprise has critical applications that require deep, code-level telemetry. In most cases, organizations will already have those applications instrumented and monitored using traditional APM solutions. There’s no reason for that to change.

The point, however, is that this view of APM is no longer enough.

Many enterprises are covering as little as five percent of their application stack with traditional APM solutions. The problem, as we’ve discussed, is that today’s complex, interconnected technology stack means that a non-critical, non-instrumented application or infrastructure element can cause a massive disruption to a critical system. Yet organizations have little to no visibility into anything beyond the limited subset of the stack they have deemed critical.

The reality today is that anything that connects to a critical system is now an extension of it and is, at least from one perspective, critical as well.

At the same time, as technology now powers so much of the customer engagement and experience, systems that organizations would not have traditionally considered critical may now, in fact, be critical at various points of the customer’s journey.

Call this phenomenon criticality creep.
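One way to picture criticality creep is as a walk over a dependency map. The sketch below is purely illustrative: the system names and the `DEPENDS_ON` map are hypothetical, and it assumes an organization already has some record of which systems rely on which others. Starting from the applications an organization has declared critical, it follows every dependency, direct or transitive, and reports the systems that have become critical by connection alone.

```python
from collections import deque

# Hypothetical dependency map: each key relies on the systems in its list.
# All names are illustrative and not taken from any real environment.
DEPENDS_ON = {
    "checkout": ["payments", "inventory"],
    "payments": ["auth", "message-queue"],
    "inventory": ["warehouse-db"],
    "auth": ["ldap"],
    "reporting": ["warehouse-db"],
    "intranet-wiki": ["ldap"],
}


def effective_critical_set(depends_on, declared_critical):
    """Return the declared-critical systems plus everything they rely on,
    directly or transitively (breadth-first walk of the dependency map)."""
    seen = set(declared_critical)
    queue = deque(declared_critical)
    while queue:
        system = queue.popleft()
        for dependency in depends_on.get(system, []):
            if dependency not in seen:
                seen.add(dependency)
                queue.append(dependency)
    return seen


declared = {"checkout"}
effective = effective_critical_set(DEPENDS_ON, declared)

# Systems nobody declared critical, but which a critical system depends on.
crept_in = sorted(effective - declared)
print(crept_in)
# ['auth', 'inventory', 'ldap', 'message-queue', 'payments', 'warehouse-db']
```

In this toy example a single declared-critical application pulls six undeclared systems into scope. The sketch only follows downstream dependencies; a real environment would also have to account for shared infrastructure and for a map that changes continually, which is exactly why a one-time classification exercise falls short.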

This creep is leading enterprises to recognize that in this complex, interconnected reality, they need the ability to monitor everything—all applications and all of their underlying infrastructure in real-time.

Enterprise leaders must, therefore, evolve the way they think about APM to reflect this more complete and holistic view of instrumenting, monitoring, and managing their entire technology stack (although perhaps not in the traditional ways)—and to do so from a business perspective.

There is no longer room for gaps that leave an organization open to a failure that originates from some unexpected source yet results in a significant impact.

AI-powered APM for the Rest of the Stack

The challenge for both application and operational leaders is how to accomplish this monitor-everything approach without crushing the organization under the weight of the overhead that traditional APM typically involves.

Organizations can find the answer in a related evolution occurring within IT operations, which has also struggled under the weight of IT's growing complexity. This challenge has led to a transformation in how organizations manage IT operations by leveraging various forms of automation and artificial intelligence—what some call AIOps.

The same thing that underpins the transition to AIOps will enable the AI-powered evolution of APM: operational patterns.

Despite the growing complexity of the technology stack, each interaction between elements of the stack leaves digital breadcrumbs that represent patterns of interaction.

While these bits of data are too voluminous and arcane for a human operator to make sense of and leverage operationally, they are a goldmine for machine learning algorithms that can use them to identify critical patterns, particularly when they are consolidated into an operational data lake. This approach gives organizations a single consolidated source for analyzing operational data and identifying operational patterns.
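To make the idea of an operational pattern concrete, here is a deliberately simple sketch. It is not the machine learning that any particular product uses; it is a basic statistical stand-in that learns a rolling baseline from recent samples and flags values that deviate sharply from it. The function name, the window size, the threshold, and the latency figures are all assumptions made for illustration.

```python
from statistics import mean, stdev


def flag_deviations(samples, window=5, threshold=3.0):
    """Return the indexes of samples that sit more than `threshold` standard
    deviations away from the mean of the preceding `window` samples."""
    flagged = []
    for i in range(window, len(samples)):
        baseline = samples[i - window:i]
        mu = mean(baseline)
        sigma = stdev(baseline)
        # Skip perfectly flat baselines to avoid dividing by zero.
        if sigma > 0 and abs(samples[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


# Illustrative response times in milliseconds (made-up data).
latency_ms = [102, 98, 101, 99, 100, 103, 97, 100, 450, 101, 99]

print(flag_deviations(latency_ms))
# [8]
```

Only the 450 ms sample is flagged, because it alone breaks from the pattern established by the samples before it. The same principle, applied by far more capable algorithms across millions of data points in a consolidated data store, is what allows a monitoring system to surface patterns that no human operator could spot.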

The ability to identify patterns is becoming even more critical as organizations embrace more componentized approaches to develop and deploy applications—leveraging technologies such as containers and microservices. While essential as organizations modernize and transform, these approaches also result in even greater transiency, which makes it increasingly difficult to instrument and monitor modern applications.

Most critically, however, organizations can also use machine learning and methods borrowed from AIOps to identify contextual relationships between these operational patterns and the business outcomes they support. This ability to transform technical indicators into business-relevant signals is essential as organizations seek to instrument and monitor the entirety of the technology stack without having it overwhelm their operational capacity.

The trick in accomplishing all of this, however, is leveraging hybrid management systems, such as ScienceLogic SL1, that use a wide variety of approaches to collect operational data and then leverage machine learning and other forms of AI to discern these operational patterns and connect them to business outcomes.

The data-centric and technology-agnostic approaches that these types of hybrid systems employ enable organizations to close the operational gap left by traditional APM solutions by making it possible to instrument and monitor the entire technology stack without overburdening the organization's operational capability.

The Intellyx Take

As a member of the IT leadership team, I vividly remember having numerous conversations and debates to decide which of our applications were critical and which were not.

In many cases, it was a straightforward discussion. But with the cost and high management overhead of instrumenting applications using our APM tools, we could only instrument a small portion of our application portfolio.

There were always those applications at the edge of criticality where the debates ensued. It often became very political, sometimes including backroom deals to ensure that your application would make the cut.

Those days, however, are long gone.

I can't imagine having that kind of conversation today. With the complexity of the technology stack and the fact that it is now embedded into every facet of business operations, trying to determine what is critical and what is not is pure folly.

Even if you could somehow draw that line, however, it would still be meaningless as the pace of change in today's market means you'd be having those conversations and debates on a continual basis.

No one is arguing that application performance management is no longer critical. It’s just the opposite: APM is more critical than ever.

The issue, in fact, is that it has become so vital that organizations must monitor and manage the performance of everything. Everything is critical, and you can no longer predict from where the next risk of a major outage may emerge.

The traditional approaches to APM, however, are just too cumbersome and expensive to make it economically and operationally feasible to use them for this purpose.

The only answer, therefore, is for organizations to evolve their view of APM into a more holistic and encompassing approach that includes not only traditional, code-instrumentation approaches to APM, but also more modern hybrid approaches, such as that taken by ScienceLogic.

It is this more encompassing view that makes it possible to extend the value and benefits of APM across the entirety of the stack.

It’s important to recognize that this is less of a technical issue and more of a mindset shift. Enterprise leaders must move beyond attempting to categorize their technology stack as they have in the past—deeming some elements critical and others not and then managing them accordingly.

Instead, they must accept that today’s complex environment and ever-changing market require a new approach to application performance management, one that assumes everything is critical, leverages AI to manage complexity, employs automation to keep pace with change, and focuses on business outcomes above all else.

About the Author

Charles Araujo is an industry analyst, internationally recognized authority on the Digital Enterprise and author of The Quantum Age of IT: Why Everything You Know About IT is About to Change. He is Principal Analyst with Intellyx, the first and only industry analyst firm focused on agile digital transformation. He has authored three books and published over 100 articles. He has been a regular contributor to both InformationWeek and CIO Insight Magazine and has been quoted or published in magazines, blogs and websites including Time, CIO, CIO & Leader, IT Business Edge, TechRepublic, Computerworld, USA Today, and Forbes.

He is the founder of The Institute for Digital Transformation and a sought-after keynote speaker, having addressed over 10,000 business and IT leaders in 10 countries over the last several years. He is passionate about the power of technology to deliver competitive and transformational advantage to organizations and about the critical need to develop next-generation “digital leaders” who can transform their organizations into Digital Enterprises. He is presently at work on a new book entitled Thinking Digital: How to Thrive and Win in the Digital Era, which will explore this topic in detail.

Prior to joining Intellyx, Charles served as an advisor and consultant for nearly twenty years, leading numerous large-scale transformation programs for Fortune 1000 organizations and government institutions involving as many as 10,000 program participants. In his early career, he spent many years working in and with IT organizations in the healthcare, financial services, and aerospace industries, directly leading teams of more than 100 members.

About Intellyx

Intellyx is the first and only industry analysis, advisory, and training firm focused on agile digital transformation. Intellyx works with enterprise digital professionals to cut through technology buzzwords and connect the dots between the customer and the technology in order to provide the vision, the business case, and the architecture for agile digital transformation initiatives.

About ScienceLogic

ScienceLogic is a leader in IT Operations Management, providing modern IT operations with actionable insights to predict and resolve problems faster in a digital, ephemeral world. Its solution sees everything across cloud and distributed architectures, contextualizes data through relationship mapping, and acts on this insight through integration and automation. Trusted by thousands of organizations, ScienceLogic’s technology was designed for the rigorous security requirements of the United States Department of Defense, proven for scale by the world’s largest service providers, and optimized for the needs of large enterprises. https://sciencelogic.com/

© Intellyx LLC. As of the time of writing, ScienceLogic is an Intellyx customer. Intellyx retains final editorial control of this paper.