velocity2015 talk
TRANSCRIPT
Know before your customer knows:
Monitoring trends to predict failures in the cloud
Rakesh Ranjan, Sriram Srinivasan
Cloud Data Services, IBM Silicon Valley Lab
It used to be simple (… at least for me, the developer)
§ I built a “product” - sometimes they were innovative, engineering marvels - and somebody sold them..
[Diagram labels: Dev, the “product”; Cu$tomer, the delivered “product”]
the release party (and promotions + bonuses).
(Confessions of a product developer)
It usually ended up in one of two ways..
Read the manuals, whitepapers & “best” practices
Got mad & sad
Provided feedback & got help..
or hired “consultants”
Multiple releases later.. satisfaction!
Heroes spoke about the “successful” rollout at conferences
Cu$tomer (the operator)
Dumped the product & somebody got the boot..
What we do now..
§ Service Delivery
§ Monitoring & Operations
§ Service reliability engineering
§ Security & Compliance
§ Developer Advocacy
§ Support
Managed Services !
http://bluemix.net
https://cloudant.com/
+ …
Many moving parts..
§ Bare metal servers in softlayer.com
§ Virtual Machines (Cloud Compute Instances)
§ Network complexity: firewalls & VLANs
§ Storage: SAN, iSCSI, Object store & backups
§ HTTP servers, proxy/reverse proxy servers, Bluemix apps
§ Software: proprietary, open-source
§ Delivery & Ops: Automation, Test, Monitoring & Control
§ Customers: free tiers, trial users, pay-as-you-go users & subscribers
- High Availability (near 100% uptime requirement)
26k+ instances
55k+ users
1800+ servers in bluemix.net
1250 instances in cloudant.com
multiple data centers/Bluemix regions
Managed Services need smarter operations…
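To put the "near 100% uptime" requirement above in concrete terms, the arithmetic can be sketched as below. The availability targets (99%, 99.9%, 99.99%) are illustrative assumptions; the talk does not state a specific figure, and `downtime_budget_minutes` is a hypothetical helper name.

```python
# Minimal sketch: convert an availability target into a yearly downtime budget.
# The targets in the loop are illustrative assumptions, not figures from the talk.

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def downtime_budget_minutes(availability_pct: float) -> float:
    """Minutes per year a service may be down while still meeting the target."""
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability must be a percentage between 0 and 100")
    return MINUTES_PER_YEAR * (1.0 - availability_pct / 100.0)


for target in (99.0, 99.9, 99.99):
    print(f"{target}% -> {downtime_budget_minutes(target):.1f} minutes/year")
```

Each extra "nine" cuts the allowed downtime by a factor of ten, which is one reason a purely reactive operations model gets hard to sustain at this scale.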
Management requires lots of Metrics & Visualizations
[Charts, 2014 to 2015: System Perf, User activity, Forums]
Can’t just be in reactive mode..
§ eNurture:
- Understand your users
- And their apps & the technology they employ in the cloud.
- Guide them when they need it and even before they realize they need help
- Business imperative: Sell them more/upgrade them to better plans as and when they grow; retain and gain customers (keep users happy!)
§ Predictive analytics for Operational Efficiency
- better planned growth
- better forecast system failures
- improve on Mean Time To Recovery
- minimize cost of operations
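As one illustration of "forecasting system failures" from a monitored trend, here is a minimal sketch that fits a straight line to recent samples of a metric and estimates how long until it reaches a limit. This is not the speakers' method: the linear model, the helper names `fit_line` and `intervals_until`, the disk-usage samples and the 90% limit are all assumptions made for the example.

```python
from typing import Optional, Sequence, Tuple


def fit_line(ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares slope and intercept for samples taken at t = 0, 1, 2, ..."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    t_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    cov = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(ys))
    var = sum((t - t_mean) ** 2 for t in range(n))
    slope = cov / var
    return slope, y_mean - slope * t_mean


def intervals_until(ys: Sequence[float], limit: float) -> Optional[float]:
    """Sampling intervals after the last sample until the fitted line reaches `limit`.

    Returns None when the fitted trend is flat or falling, and 0.0 when the
    fitted line has already reached the limit.
    """
    slope, intercept = fit_line(ys)
    if slope <= 0:
        return None
    t_cross = (limit - intercept) / slope
    return max(0.0, t_cross - (len(ys) - 1))


# Hypothetical daily disk usage (percent) for one server.
disk_used = [60.0, 62.0, 64.0, 66.0, 68.0]
days_left = intervals_until(disk_used, limit=90.0)
print(days_left)  # 11.0: usage grows 2 points/day, 22 points remain
```

A straight-line fit is the simplest possible choice; metrics with seasonality or bursts would need a richer model, but the operational idea is the same: alert on the projected crossing, not on the crossing itself.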