INTEGRATION TESTING AS VALIDATION AND MONITORING
Melissa Benua, Senior Backend Engineer, PlayFab, Inc.
STARWEST 2015
The challenge: Monitoring SaaS products
Software as a service is exploding, and so is testing complexity:
1. It is no longer enough to run tests only at build time; you now also need deploy-time integration tests and continuous network monitoring
2. Every layer of tests adds complexity and maintenance cost
3. There is a limited number of engineer-hours in the day
4. Engineers want to use their time with maximum efficiency
Time spent writing the same tests over again is time that could be spent doing more interesting and important stuff!
EXISTING OPTIONS
Commercial products you can buy now!
Cloud Monitoring Services
Providers:
• Keynote
• Gomez
• Pingdom
Pros:
• Lightweight
• Integrated alerting
• Public vs. private status pages
Cons:
• Difficult to manage multiple contributors
• Can’t do complex checks easily (e.g., log in a user and verify inventory)
• Can get expensive or require enterprise contracts
Hosted Monitoring Services
Providers:
• Sensu
• System Center Operations Manager (SCOM)
• Nagios
Pros:
• Extremely powerful
• Mature, well-established technology
Cons:
• Complex to set up
• Single centralized server
• Overkill for many services hosted in the cloud
OUR APPROACH
Do it the PlayFab way!
May 3, 2023 PlayFab Confidential 6
Our Solution
1. Author one set of HTTP-level tests
• Connect the same way clients do
• Self-contained and self-initializing
• Repeatable and reliable
2. Deploy the tests both within the build environment and within the monitoring cloud
3. Collect data from the tests into one central location
4. Present the data for use by both devops and customers
Pros:
• Efficient use of engineering resources
• VM hosting bill is very small
• Can run complex tests without worrying about maintainability
Cons:
• Pipeline requires some maintenance
• Requires knowing how to use two different clouds
• Must be able to do test setup from within a different ecosystem
Our solution, cont’d
Goals:
• Minimize the lines of code duplicated per functional piece
• Reliable and trustworthy reporting
• Affordable cost
• Adequate geographic coverage
• Very low maintenance time cost
• Easy to access
• More free time for engineering!
Limitations:
• Fewer monitoring leaf nodes (~10 instead of ~100 or ~1000)
• Vulnerable to gaps in dev logic
• Not as straightforward to set up
• Monitoring is only as good as your testing!
TESTING SCENARIOS
One of these may look familiar!
Scenario A – RESTful API
Sample characteristics:
• Custom service in Java layered on Apache
• Private hosting
• Tests via JUnit
• Authenticates using private login
• Connects to several different backend services (MongoDB, SQL, analytics, queueing, etc.)
Scenario B – MVC Website
Sample characteristics:
• Built on ASP.NET MVC
• Hosted in Azure
• Testing via custom harness
• Authenticates using OAuth and Facebook
• Backed by a locally hosted SQL Server
Scenario C – PlayFab
Characteristics:
• JSON API built on C#, plus a management website
  • https://api.playfab.com/documentation
• Hosted on Windows in AWS
• Tests via VSTest
• Many moving parts:
  • Game server hosting
  • Client versus server authentication
  • Third-party purchasing and auth providers
  • Various backend data sources
IMPLEMENTING OUR SOLUTION
How to wire up the pipeline!
Architecture
[Architecture diagram: a Developer writes tests and submits code; the Build Server compiles the code, runs the tests, and deploys to Production; monitoring VMs in Microsoft Azure and Amazon Web Services regions (Europe, US-West, US-East, Asia) communicate over HTTP; a Web Server collects the data and a Web Site displays it.]
Technology Used
Test Framework
• VSTest, JUnit, or a custom executor
• Must output a predictable, machine-readable format (.TRX from VSTest comes with an XSD for easy parsing)
Execution + Communication Layer
• Consul or custom cross-DC chatter
• Consul has API clients in many languages, is easy to secure, and is simple to configure
• Regularly executes the test executable
• Shares test results as ‘service health checks’ across DCs
Custom Data Bridge
• Transforms test framework output into Consul input
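The first half of the Custom Data Bridge, reading VSTest output, might look like the minimal Python sketch below. It assumes the standard TRX layout (`TestRun/Results/UnitTestResult` elements with `testName`, `outcome`, and `duration` attributes in the TeamTest 2010 namespace); the mapping of TRX outcomes onto Consul's pass/warn/fail states (`OUTCOME_TO_STATUS`) is an illustrative assumption, not PlayFab's actual rule.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Namespace used by VSTest .trx files (TeamTest 2010 schema).
TRX_NS = {"t": "http://microsoft.com/schemas/VisualStudio/TeamTest/2010"}

# Assumed mapping from TRX outcomes to Consul check states; adjust to taste.
OUTCOME_TO_STATUS = {"Passed": "pass", "Inconclusive": "warn", "Warning": "warn"}


@dataclass
class TrxResult:
    name: str
    outcome: str        # raw TRX outcome, e.g. "Passed" or "Failed"
    duration_ms: float  # parsed from the hh:mm:ss.fffffff duration attribute

    @property
    def status(self) -> str:
        """Consul state for this result; anything unmapped counts as a failure."""
        return OUTCOME_TO_STATUS.get(self.outcome, "fail")


def parse_duration_ms(value: str) -> float:
    """Convert a TRX duration such as '00:00:01.2500000' to milliseconds."""
    hours, minutes, seconds = value.split(":")
    return (int(hours) * 3600 + int(minutes) * 60 + float(seconds)) * 1000.0


def parse_trx(xml_text: str) -> list:
    """Read every UnitTestResult element from a TRX document."""
    root = ET.fromstring(xml_text)
    return [
        TrxResult(
            name=node.get("testName", ""),
            outcome=node.get("outcome", "Unknown"),
            duration_ms=parse_duration_ms(node.get("duration", "00:00:00")),
        )
        for node in root.findall("t:Results/t:UnitTestResult", TRX_NS)
    ]
```

Keeping the duration alongside the status means the latency can later travel in the Consul note field without re-reading the file.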
Picking Monitoring Tests
[Diagram: test suite layers, from the Full App Integration Test Suite down through the Internal Service A and Internal Service B Test Suites (each with its own Integration Suite) to the Library Unit Test Suite.]
Picking Monitoring Tests, cont’d
Must-haves:
• Run at the same layer clients access (generally HTTP)
• Cover key ‘P0’ functionality areas
• Cover areas with lots of ‘moving parts’
Nice-to-haves:
• All exposed APIs
• Third-party integrations
• Full success-testing run
Ideal world:
• Full integration test suite
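If tests are already tagged by priority and area, the must-have criteria can be applied mechanically. The Python sketch below is hypothetical: the `CandidateTest` record, the `layer` values, and the tag names `"P0"` and `"moving-parts"` stand in for whatever categories your framework exposes (for example MSTest `TestCategory` attributes or JUnit 5 `@Tag` values).

```python
from dataclasses import dataclass, field

# Hypothetical tags standing in for your framework's test categories.
MUST_HAVE_TAGS = frozenset({"P0", "moving-parts"})


@dataclass
class CandidateTest:
    name: str
    layer: str                      # "http", "service", "unit", ...
    tags: set = field(default_factory=set)


def pick_monitoring_tests(candidates, must_have_tags=MUST_HAVE_TAGS):
    """Keep tests that run at the client-facing HTTP layer and carry a must-have tag."""
    return [
        c for c in candidates
        if c.layer == "http" and c.tags & must_have_tags
    ]
```

Unit-level tests are excluded even when tagged P0, because they do not exercise the path a client actually takes.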
Scenario Must-Have Test Cases
REST API
• Login/Authenticate
• Logout
• One test per downstream service
• Stretch: one test per API
MVC Website
• One test per login method (OAuth, Facebook)
• Key pages
• Basic SQL coverage
Deployment Pipeline
The fewer manual steps the better!
Sample flow:
1. Submit code to repo
2. CI runs build
3. CI runs tests
4. Deployment packages created
5. Tests deployed into monitor cloud storage
6. Cloud storage distributes to VMs
Monitoring Cloud
Any cloud will do!
Number of regions is important:
• Azure: https://azure.microsoft.com/en-us/regions/#services
• AWS: http://docs.aws.amazon.com/general/latest/gr/rande.html#ec2_region
VMs can be teeny – no need for heavy compute or memory usage
Test Execution Frequency
How complex is it to run your tests?
• Run a simple executable?
• Have to download a lot of data?
• Long setup phase?
• How long does a full test pass take?
Periodic execution (every N seconds)
Faster is better! Pingdom’s ‘free’ tier runs each check every 15 minutes
Ideal range is between 30 seconds and 5 minutes
Be careful not to drown out your ‘real’ traffic
• Test traffic hiding problems with real users is a legitimate issue!
• Try to keep test traffic under 10% of total traffic if possible
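The 10% guideline translates directly into a lower bound on the run interval. If each of `monitor_nodes` VMs sends `requests_per_pass` requests every `interval` seconds, test traffic is requests_per_pass × monitor_nodes / interval, and keeping it at or below 10% of the total means it may be at most real_rps × 0.1 / 0.9. The Python sketch below assumes steady real traffic measured in requests per second; the parameter names are illustrative.

```python
def min_interval_seconds(requests_per_pass: int, monitor_nodes: int,
                         real_rps: float, max_test_share: float = 0.10,
                         floor_s: float = 30.0) -> float:
    """Shortest period between test passes that keeps test traffic under budget.

    Each node sends requests_per_pass requests every interval seconds, so
    test_rps = requests_per_pass * monitor_nodes / interval.
    Requiring test_rps / (test_rps + real_rps) <= max_test_share gives
    test_rps <= real_rps * max_test_share / (1 - max_test_share).
    """
    if real_rps <= 0:
        raise ValueError("real_rps must be positive")
    if not 0 < max_test_share < 1:
        raise ValueError("max_test_share must be between 0 and 1")
    max_test_rps = real_rps * max_test_share / (1.0 - max_test_share)
    interval = requests_per_pass * monitor_nodes / max_test_rps
    # Never go below the floor (30 s is the low end of the ideal range).
    return max(interval, floor_s)
```

For example, 10 monitoring nodes each sending 20 requests per pass against 10 real requests per second allow at most about 1.11 test requests per second, so passes should run no more often than every 200 / 1.11 = 180 seconds, comfortably inside the 30 second to 5 minute range. A result above 300 seconds suggests trimming the pass or the node count instead.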
Collecting Results
Execute tests
Put machine-readable test results into a collator
• Consul accepts Datacenter, CaseName, Pass/Warn/Fail, and a Note (we store latency there)
• Agents may be updated using an SDK or directly through the HTTP interface
• Example: http://localhost:8500/v1/agent/check/pass/mytestcase
• Full HTTP API: https://www.consul.io/docs/agent/http.html
A small adapter program reads the test results and outputs them to the Consul agent (SDK or HTTP)
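The HTTP half of that adapter can be a few lines of standard-library Python. This sketch assumes a local agent at `http://localhost:8500` and checks that are already registered as TTL checks (the pass/warn/fail endpoints only apply to TTL checks). It sends `PUT`, which current Consul versions require for these endpoints, and formats latency into the `note` query parameter; the `latency=...ms` note format is an illustrative choice.

```python
import urllib.parse
import urllib.request

# Local Consul agent address, as used on the slides.
CONSUL_AGENT = "http://localhost:8500"
VALID_STATES = ("pass", "warn", "fail")


def check_update_url(check_id: str, state: str, latency_ms=None,
                     agent: str = CONSUL_AGENT) -> str:
    """Build /v1/agent/check/<state>/<check_id>, carrying latency in the note."""
    if state not in VALID_STATES:
        raise ValueError(f"state must be one of {VALID_STATES}")
    url = f"{agent}/v1/agent/check/{state}/{urllib.parse.quote(check_id, safe='')}"
    if latency_ms is not None:
        url += "?" + urllib.parse.urlencode({"note": f"latency={latency_ms:.0f}ms"})
    return url


def push_result(check_id: str, state: str, latency_ms=None,
                agent: str = CONSUL_AGENT) -> int:
    """Send one result to the agent; returns the HTTP status code."""
    request = urllib.request.Request(
        check_update_url(check_id, state, latency_ms, agent), method="PUT")
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status
```

A run loop would call `push_result("mytestcase", "pass", 250.0)` once per test case after each pass; building the URL separately keeps the formatting testable without a running agent.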
Output!
Alerting
Ideally, hear about outages as a push rather than a pull
Determine what ‘failure’ means to you
• Balance false alarms against missed real alarms
Many options!
• Post alerts into VictorOps for paging
• Send email from the monitoring website
• Send push notifications through your cloud
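One way to decide what ‘failure’ means is to require agreement across time and across datacenters before paging anyone. The Python sketch below is a hypothetical policy, not PlayFab’s: it alerts only when a check has failed `consecutive` runs in a row in at least `min_datacenters` datacenters. Raising either threshold trades faster detection for fewer false alarms caused by a single flaky node or network blip.

```python
from collections import defaultdict


class AlertPolicy:
    """Alert when a check fails N runs in a row in at least M datacenters."""

    def __init__(self, consecutive: int = 3, min_datacenters: int = 2):
        self.consecutive = consecutive
        self.min_datacenters = min_datacenters
        # (check, datacenter) -> current run of consecutive failures
        self._streaks = defaultdict(int)

    def record(self, check: str, datacenter: str, passed: bool) -> bool:
        """Record one result and report whether the check should alert now."""
        key = (check, datacenter)
        self._streaks[key] = 0 if passed else self._streaks[key] + 1
        return self.should_alert(check)

    def should_alert(self, check: str) -> bool:
        failing = sum(
            1 for (name, _dc), streak in self._streaks.items()
            if name == check and streak >= self.consecutive
        )
        return failing >= self.min_datacenters
```

A single success resets that datacenter’s streak, so intermittent failures never accumulate into a page.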
Questions?
Melissa Benua
http://www.linkedin.com/in/mbenua
http://www.slideshare.net/MelissaBenua
APPENDIX
Technical Details and Sample Config
Partial Consul Configuration
{
  "datacenter": "prd-uswest1",
  "retry_join_wan": [ "west.cloudapp.net", "east.cloudapp.net" ],
  "server": true,
  "service": {
    "name": "pfmonitor",
    "checks": [
      {
        "script": "C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe -file c:\\runtests.ps1",
        "interval": "120s"
      }
    ]
  }
}
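With a script check like the one above, Consul reads the check state from the script’s exit code: 0 means passing, 1 means warning, and any other code means critical. Whatever `runtests.ps1` runs therefore has to collapse a whole test pass into one exit code. The Python sketch below shows that reduction; which TRX outcomes count as warnings (`WARN_OUTCOMES`), and treating an empty result set as critical, are assumptions.

```python
# Consul script checks map exit codes to states:
# 0 -> passing, 1 -> warning, any other code -> critical.
PASSING, WARNING, CRITICAL = 0, 1, 2

# Assumed classification of VSTest outcome strings.
WARN_OUTCOMES = {"Inconclusive", "Warning"}


def consul_exit_code(outcomes) -> int:
    """Collapse a test pass into the single exit code Consul expects."""
    outcomes = list(outcomes)
    if not outcomes:
        return CRITICAL  # no results at all is treated as a failure (assumption)
    if any(o != "Passed" and o not in WARN_OUTCOMES for o in outcomes):
        return CRITICAL
    if any(o in WARN_OUTCOMES for o in outcomes):
        return WARNING
    return PASSING
```

The wrapper script would finish with the equivalent of `sys.exit(consul_exit_code(outcomes))`. Note that one script check reports a single state for the whole pass, whereas pushing per-test results through the pass/warn/fail endpoints keeps each case visible separately.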
Consul Commands
Full HTTP API: https://www.consul.io/docs/agent/http.html
Add a health check:
# A hashtable serialized to JSON; a bare { } would be a PowerShell script block, not a request body
$body = @{
    ID    = "mypath"
    Name  = "Path Works"
    Notes = "Checking uptime and latency"
    HTTP  = "http://my.service.com/path"
    TTL   = "45s"   # the check goes critical if no update arrives within 45 seconds
} | ConvertTo-Json
• Invoke-WebRequest -Method Put -Uri http://localhost:8500/v1/agent/check/register -Body $body
List the health checks:
• Invoke-WebRequest http://localhost:8500/v1/health/checks/myservice
[
  {
    "Node": "somenode",
    "CheckID": "mypath",
    "Name": "Path Works",
    "Status": "passing"
  }
]
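A monitoring website needs to turn that JSON array into something readable. The Python sketch below counts checks by `Status` (Consul reports `passing`, `warning`, or `critical`) and lists the unhealthy ones; the `node/check: status` display format is an illustrative choice. The payload would come from something like `urllib.request.urlopen("http://localhost:8500/v1/health/checks/myservice").read()`, and Consul’s health endpoints also accept a `?dc=` query parameter for looking at another datacenter.

```python
import json
from collections import Counter


def summarize_checks(payload: str):
    """Count checks by status and list the ones that are not passing.

    payload is the JSON array returned by GET /v1/health/checks/<service>.
    """
    checks = json.loads(payload)
    counts = Counter(check.get("Status", "unknown") for check in checks)
    unhealthy = sorted(
        f'{check.get("Node", "?")}/{check.get("CheckID", "?")}: {check.get("Status", "unknown")}'
        for check in checks
        if check.get("Status") != "passing"
    )
    return counts, unhealthy
```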
Consul Commands, cont’d
Update a health check:
• Can add ?note=foo to pass details such as latency
• Invoke-WebRequest -Method Put http://localhost:8500/v1/agent/check/pass/mypath
• Invoke-WebRequest -Method Put http://localhost:8500/v1/agent/check/warn/mypath
• Invoke-WebRequest -Method Put http://localhost:8500/v1/agent/check/fail/mypath