testing and measurement in devops: find solutions—not more problems
TRANSCRIPT
DW7 DevOps 11/16/2016 4:15:00 PM
Testing and Measurement in DevOps: Find Solutions—Not More Problems
Presented by:
Andreas Grabner
Dynatrace
Brought to you by:
350 Corporate Way, Suite 400, Orange Park, FL 32073
888-‐268-‐8770 ·∙ 904-‐278-‐0524 - [email protected] - http://www.stareast.techwell.com/
Andreas Grabner Dynatrace
Performance enthusiast Andreas Grabner has been a developer, tester, architect, and product evangelist for the past fifteen years for several testing and diagnostics companies including Segue, Borland, Compuware, and Dynatrace. Andreas now helps organizations find performance, scalability, and architectural problems in their applications. Speaking at meetups, user groups, and international conferences, he shares his knowledge and teaches others how to recognize and avoid the formerly-mentioned problems.
11/1/2016
1
Testing and Measurement in DevOpsMeasurement in DevOpsFind Solutions – Not More ProblemsAndreas Grabner: @grabnerandi, [email protected]: http://www.slideshare.net/grabnerandiPodcast: https://www.spreaker.com/show/pureperformance
The Story started in 2009
@grabnerandi
11/1/2016
2
@grabnerandi
“The stuff we did h St t Uwhen we were a Start Up
and we All were
@grabnerandi
Devs, Testers and Ops”Quote from Andreas Grabner back in 2013 @ DevOps Boston
11/1/2016
3
@grabnerandi
Utmost goal: minimize cycle time (= Lead Time)
This is where you
minimize Users
This is where youcreate value!
timefeature cycle time
11/1/2016
4
24 “Features in a Box” Ship the whole box!
Very late feedback
„1 Feature at a Time“
Continuous Innovation and Optimization
„Optimize before Deploy“„Immediate Customer Feedback“
11/1/2016
5
DevOps Adoption
700 deployments / YEAR
Innovators (aka Unicorns): Deliver value at the speed of business
10 + deployments / DAY
50 60 d l t / DAY50 – 60 deployments / DAY
Every 11.6 SECONDS
11/1/2016
6
DevOps @ Targetpresented at Velocity, DOES and more …
“We increased from monthly to 80 deployments per week
only 10 incidents per month
@grabnerandihttp://apmblog.dynatrace.com/2016/07/07/measure‐frequent‐successful‐software‐releases/
… only 10 incidents per month …
… over 96% successful! ….”
11/1/2016
7
“We Deliver High Quality Software,Faster and Automated using New Stack“
„Shift-Left Performance to Reduce Lead Time“
Adam Auerbach, Sr. Dir DevOps
https://github.com/capitalone/Hygieia & https://www.spreaker.com/user/pureperformance
“… deploy some of our most critical production workloads on the AWS platform …”, Rob Alexander, CIO
2 major releases/yearcustomers deploy & operate on-prem
26 major releases/year170 prod deployments/dayself-service online sales
2011 2016
SaaS & Managed
11/1/2016
8
“In Your Face” Data!
@grabnerandihttps://dynatrace.github.io/ufo/
#1: Availability -> Brand Impact
@grabnerandi
Availability dropped to 0%Availability dropped to 0%
11/1/2016
9
New Deployment + Mkt PushNew Deployment + Mkt Push
Overall increase of Users!Overall increase of Users!
#2: User Experience -> Conversion
Increase # of unhappy users!Increase # of unhappy users!
Overall increase of Users!Overall increase of Users!
Spikes in FRUSTRATED Users!Spikes in FRUSTRATED Users!
@grabnerandi
Decline in Conversion Rate
#3: Resource Cons -> Cost per Feature
4x $$$ to IaaS
@grabnerandi
11/1/2016
10
#4: Performance -> Behavior
@grabnerandi
Not every Sprint ends without bruises!
@grabnerandi
11/1/2016
11
Richard DominguezDeveloper in OperationsPrep Sportswear
„In 2013 business demanded to go from monthly to daily deployments“
@grabnerandi
„80% failed!“
Understanding Code Complexity• 4 Millions Lines of Monolith Code• Partially coded and commented in
Russian
From Monolith to Microservice• Initial devs no longer with company• What to extract withouth breaking it?
Shift Left Quality & Performance• No automated testing in the pipeline• Bad builds just made it into production
Cross Application Impacts• Shared Infrastructure between Apps• No consolidated monitoring strategy
11/1/2016
12
Scaling an Online Sports Club Search ServiceResponse Time
4) Performance Slows Growth Users
1) 2-Man Project 2) Limited Success
3) Start Expansion
@grabnerandi
2015201420xx 2016+
5) Potential Decline?
Early 2015: Monolith Under Pressure
April: 0 52s
May: 2.68s 94.09% CPU 94.09% CPU
April: 0.52s
@grabnerandi
Can‘t scale vertically endlessly!
Bound
11/1/2016
13
From Monolith to Services in a Hybrid-Cloud
@grabnerandi
Front Endto Cloud
Scale Backendin Containers!
Go live – 7:00 a.m.
@grabnerandi
11/1/2016
14
Go live – 12:00 p.m.
@grabnerandi
What Went Wrong?
11/1/2016
15
Architecture ViolationDirect access to DB from frontend serviceArchitecture ViolationDirect access to DB from frontend service
Single search query end-to-end
Direct access to DB from frontend serviceDirect access to DB from frontend service
@grabnerandi
26.7s Load Time
5kB Payload
26.7s Load Time
5kB Payload
33! Service Calls
99kB - 3kB for each call!
33! Service Calls
99kB - 3kB for each call!
171! Total SQL Count171! Total SQL Count
Understanding Code Complexity• Existing 10 year old code & 3rd party• Skills: Not everyone is a perf expert or born architect
From Monolith to Microservice• Service usage in the End-to-End Scenarios?• Will it scale? Or is it just a new monolith?
Understand Deployment Complexity• When moving to Cloud/Virtual: Costs, Latency …• Old & new patterns, e.g: N+1 Query, Data
Understand Your End Users• What they like and what they DONT like!• Its priority list & input for other teams, e.g: testing
11/1/2016
16
The fixed end-to-end use case“Re-architect” vs. “Migrate” to Service-Orientation
@grabnerandi
2.5s (vs 26.7) 5kB Payload
2.5s (vs 26.7) 5kB Payload
1! (vs 33!) Service Call
5kB (vs 99) Payload!
1! (vs 33!) Service Call
5kB (vs 99) Payload!
3! (vs 177) Total SQL Count
3! (vs 177) Total SQL Count
@grabnerandi
11/1/2016
17
You measure it! from Dev (to) Ops
@grabnerandi
Build # Use Case Stat # APICalls # SQL Payload CPU
Use Case Tests and Monitors Service & App Metrics Ops
#ServInst Usage RT
Continuous Innovation and OptimizationScenario: Monolithic App with 2 Key FeaturesScenario: Monolithic App with 2 Key Features
Build 17 testNewsAlert OK
testSearch OK
1 5 2kb 70ms
1 35 5kb 120ms
B ild 26 t tN Al t OK
Build 25 testNewsAlert OK
testSearch OK
1 4 1kb 60ms
34 171 104kb 550ms
1 0.5% 7.2s
1 63% 5.2s
1 1kb 60 1 %
Re‐architecture into „Services“ + Performance FixesRe‐architecture into „Services“ + Performance Fixes
@grabnerandi
Build 26 testNewsAlert OK
testSearch OK
1 4 1kb 60ms
2 3 10kb 150ms
1 0.6% 3.2s
6 75% 2.5s
Build 35 testNewsAlert -
testSearch OK
- ‐ - -
2 3 7kb 100ms
- ‐ ‐
4 80% 2.0s
11/1/2016
18
@grabnerandi
„Always seek to Increase Flow“
„Understand and Respond to Outcome“
@grabnerandi
„Culture on Continual Experimentation“
11/1/2016
19
Increased Flow of High Quality Value
B k th M lithBreak the MonolithInfrastructure as CodeMigrate to Virtual/Cloud/PaaS
@grabnerandi
Test Driven DevelopmentAutomated DeploymentsShift-Left PerformanceRemove Bottlenecks
Fast Response to Outcome: Address Deployment Impact
C t d Effi iAvailability
@grabnerandiUser Experience, Conversion Rate
Costs and Efficiencyy
11/1/2016
20
Real User Feedback: Building the RIGHT thing RIGHT!
Removing what nobody needs
@grabnerandi
Experiment & innovate on new ideas
Optimizing what is not perfect
Testing: Ensure Success in The First Way
„Always seek to Increase Flow“
@grabnerandi
(1) Removing Bottlenecks
(2) Eliminating Technical Debt
(3) Successful Miroservices Migration(3) Successful Cloud Migration
(1) Shift-Left Performance
(2) Reduce Code Complexity
11/1/2016
21
@grabnerandi
@grabnerandiAND MANY MORE
11/1/2016
22
Remove Application Code Bottlenecks
Remove Architectural Bottlenecks
11/1/2016
23
Remove Service Bottlenecks
Remove Database Bottlenecks
11/1/2016
24
Remove Deployment Bottlenecks
“To Deliver High Quality Working Software Faster“
„We have to Shift‐Left Performance to Optimize Pipelines“http://apmblog.dynatrace.com/2016/10/04/scaling‐continuous‐delivery‐shift‐left‐performance‐to‐improve‐lead‐time‐pipeline‐flow/
11/1/2016
25
= Functional Result (passed/failed)+ Web Performance Metrics (# of Images, # of JavaScript, Page Load Time, ...)+ App Performance Metrics (# of SQL, # of Logs, # of API Calls, # of Exceptions ...)
Fail the build early!
Reduce Lead Time: Stop 80% of Performance Issues in your Integration Phase
CI/CD: Test Automation (Selenium, Appium, Cucumber, Silk, ...) to detect functional and
architectural (performance, scalabilty) regressions
Perf: Performance Test (JMeter, LoadRunner, Neotys, Silk, ...) to detect tough performance issues
11/1/2016
26
Shift‐Left Performance results in Reduced Lead Timepowered by Dynatrace Test Automation
http://apmblog.dynatrace.com/2016/10/04/scaling‐continuous‐delivery‐shift‐left‐performance‐to‐improve‐lead‐time‐pipeline‐flow/
Faster Lead Times to User Value!Results in Business Success!
11/1/2016
27
QuestionsSlides: slideshare.net/grabnerandiGet Tools: bit ly/dtpersonalGet Tools: bit.ly/dtpersonalWatch: bit.ly/dttutorialsFollow Me: @grabnerandiRead More: blog.dynatrace.comListen: http://bit.ly/pureperfMail: [email protected]
Andreas GrabnerDynatrace Developer AdvocateDynatrace Developer Advocate
@grabnerandi@grabnerandi
http://blog.dynatrace.comhttp://blog.dynatrace.com