![Page 1: LinkedIn SRE: From Inception to Global Scale · 24/7 deployments move to service oriented architecture dev driven deployments replace java serialized objects over RPC with REST APIs](https://reader034.vdocument.in/reader034/viewer/2022042915/5f5061e8a041f3044863c156/html5/thumbnails/1.jpg)
SRE
Bruno Connelly
LinkedIn SRE:
From Inception to Global Scale
Bruno Connelly & Viji Nair
![Page 2: LinkedIn SRE: From Inception to Global Scale · 24/7 deployments move to service oriented architecture dev driven deployments replace java serialized objects over RPC with REST APIs](https://reader034.vdocument.in/reader034/viewer/2022042915/5f5061e8a041f3044863c156/html5/thumbnails/2.jpg)
About Us
Bruno Connelly, VP, Engineering
Joined LinkedIn in April 2010
Background in ISP and Consumer Internet
Initial experience team building in Asia in 2005
![Page 3: LinkedIn SRE: From Inception to Global Scale · 24/7 deployments move to service oriented architecture dev driven deployments replace java serialized objects over RPC with REST APIs](https://reader034.vdocument.in/reader034/viewer/2022042915/5f5061e8a041f3044863c156/html5/thumbnails/3.jpg)
About Us
Viji Nair, Director, SRE
Joined LinkedIn in January 2012
16 Years as an SRE + Startups
Open source evangelist and contributor
Working with global teams since 2006
![Page 4: LinkedIn SRE: From Inception to Global Scale · 24/7 deployments move to service oriented architecture dev driven deployments replace java serialized objects over RPC with REST APIs](https://reader034.vdocument.in/reader034/viewer/2022042915/5f5061e8a041f3044863c156/html5/thumbnails/4.jpg)
OUR VISION
Create economic opportunity for every member of the global workforce
![Page 5: LinkedIn SRE: From Inception to Global Scale · 24/7 deployments move to service oriented architecture dev driven deployments replace java serialized objects over RPC with REST APIs](https://reader034.vdocument.in/reader034/viewer/2022042915/5f5061e8a041f3044863c156/html5/thumbnails/5.jpg)
ECONOMIC GRAPH
MEMBERS
COMPANIES
JOBS
SKILLS
SCHOOLS
KNOWLEDGE
![Page 6: LinkedIn SRE: From Inception to Global Scale · 24/7 deployments move to service oriented architecture dev driven deployments replace java serialized objects over RPC with REST APIs](https://reader034.vdocument.in/reader034/viewer/2022042915/5f5061e8a041f3044863c156/html5/thumbnails/6.jpg)
1M+ MEMBERS
80K+ COMPANIES
15K+ JOBS
22K SKILLS
8K HIGHER EDU ORGS
12M+ CONVERSATIONS
![Page 7: LinkedIn SRE: From Inception to Global Scale · 24/7 deployments move to service oriented architecture dev driven deployments replace java serialized objects over RPC with REST APIs](https://reader034.vdocument.in/reader034/viewer/2022042915/5f5061e8a041f3044863c156/html5/thumbnails/7.jpg)
Growing Global Network
500M+ Members
100K Articles published weekly
40% yr/yr increase in engaged feed sessions weekly
2 New sign-ups per second
50% Active members use LinkedIn Messaging weekly
100M+ Monthly Unique Visitors
![Page 8: LinkedIn SRE: From Inception to Global Scale · 24/7 deployments move to service oriented architecture dev driven deployments replace java serialized objects over RPC with REST APIs](https://reader034.vdocument.in/reader034/viewer/2022042915/5f5061e8a041f3044863c156/html5/thumbnails/8.jpg)
Numbers Behind the Scenes
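[Slide graphic: eight statistics; the layout pairing each figure with its label did not survive extraction]
Figures: 340K, 770K, 1.5K, 60B, 4T, 600TB, 2T, 12M
Labels: Graph QPS, Graph Edges, Edge QPS, Services in production, Kafka Messages consumed/day, Kafka Messages published/day, Data storage, Peak Data QPS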
![Page 9: LinkedIn SRE: From Inception to Global Scale · 24/7 deployments move to service oriented architecture dev driven deployments replace java serialized objects over RPC with REST APIs](https://reader034.vdocument.in/reader034/viewer/2022042915/5f5061e8a041f3044863c156/html5/thumbnails/9.jpg)
Edge POP Footprint
15 Edge Locations
HONG KONG
SINGAPORE
SYDNEY
MUMBAI
SEATTLE
SAN JOSE
LOS ANGELES
DALLAS
CHICAGO
MIAMI
ASHBURN
SÃO PAULO
DUBLIN
LONDON
FRANKFURT
![Page 10: LinkedIn SRE: From Inception to Global Scale · 24/7 deployments move to service oriented architecture dev driven deployments replace java serialized objects over RPC with REST APIs](https://reader034.vdocument.in/reader034/viewer/2022042915/5f5061e8a041f3044863c156/html5/thumbnails/10.jpg)
Production Application Footprint
4 Active Data Centers
OREGON
TEXAS
VIRGINIA
SINGAPORE
![Page 11: LinkedIn SRE: From Inception to Global Scale · 24/7 deployments move to service oriented architecture dev driven deployments replace java serialized objects over RPC with REST APIs](https://reader034.vdocument.in/reader034/viewer/2022042915/5f5061e8a041f3044863c156/html5/thumbnails/11.jpg)
Engineering Footprint
SAN FRANCISCO
NEW YORK
SUNNYVALE
BANGALORE
![Page 12: LinkedIn SRE: From Inception to Global Scale · 24/7 deployments move to service oriented architecture dev driven deployments replace java serialized objects over RPC with REST APIs](https://reader034.vdocument.in/reader034/viewer/2022042915/5f5061e8a041f3044863c156/html5/thumbnails/12.jpg)
Looking back 2010 . . .
![Page 13: LinkedIn SRE: From Inception to Global Scale · 24/7 deployments move to service oriented architecture dev driven deployments replace java serialized objects over RPC with REST APIs](https://reader034.vdocument.in/reader034/viewer/2022042915/5f5061e8a041f3044863c156/html5/thumbnails/13.jpg)
Entire Production Footprint
2010
2 Data Centers
1 Active Data Center
CHICAGO
LOS ANGELES
![Page 14: LinkedIn SRE: From Inception to Global Scale · 24/7 deployments move to service oriented architecture dev driven deployments replace java serialized objects over RPC with REST APIs](https://reader034.vdocument.in/reader034/viewer/2022042915/5f5061e8a041f3044863c156/html5/thumbnails/14.jpg)
LinkedIn Operations
● Classical, stratified model: Systems, Networks, Applications, DBA
● Heavy-weight processes driven by tickets and heroes
● Culture of not trusting developers in any deployed environments
● Huge wall and growing frustration between Dev and Ops teams (and in ops itself)
● 7 engineers in total made up NOC, SRE, Release Operations: “Site Operations”
● On-call was horrible
2010
![Page 15: LinkedIn SRE: From Inception to Global Scale · 24/7 deployments move to service oriented architecture dev driven deployments replace java serialized objects over RPC with REST APIs](https://reader034.vdocument.in/reader034/viewer/2022042915/5f5061e8a041f3044863c156/html5/thumbnails/15.jpg)
Member Growth
[Chart: LinkedIn members by year, 2003 to 2017; y-axis from 0 to 500,000,000]
Chart annotations: We were here; 7 Years of Tech Debt; 32% YOY Growth
![Page 16: LinkedIn SRE: From Inception to Global Scale · 24/7 deployments move to service oriented architecture dev driven deployments replace java serialized objects over RPC with REST APIs](https://reader034.vdocument.in/reader034/viewer/2022042915/5f5061e8a041f3044863c156/html5/thumbnails/16.jpg)
Is the Site Up?
● Peak traffic periods Mon-Wed ~ 6-10am
● Regular capacity related outages Mon-Wed ~ 6-10am
● Zero tolerance for failure in the application stack
● Near zero instrumentation
● Bi-weekly downtime maintenances
2010
![Page 17: LinkedIn SRE: From Inception to Global Scale · 24/7 deployments move to service oriented architecture dev driven deployments replace java serialized objects over RPC with REST APIs](https://reader034.vdocument.in/reader034/viewer/2022042915/5f5061e8a041f3044863c156/html5/thumbnails/17.jpg)
Let’s make a few changes
● change software development model
● active/active serving model
● cheaper datacenters
● remove monolithic databases
● graceful degradation
● remove hardware load balancers
● more data centers
● move to service oriented architecture
● 24/7 deployments
● dev driven deployments
● replace java serialized objects over RPC with REST APIs
● modernize our application stack
● move faster
● self service everything
● code contributions to the main application stack
● 3x3 deployments
● auto escalation
● auto remediation
● automated datacenter buildout
![Page 18: LinkedIn SRE: From Inception to Global Scale · 24/7 deployments move to service oriented architecture dev driven deployments replace java serialized objects over RPC with REST APIs](https://reader034.vdocument.in/reader034/viewer/2022042915/5f5061e8a041f3044863c156/html5/thumbnails/18.jpg)
Core Principles
1. Site Up
2. Empower Developer Ownership
3. Operations is an Engineering Problem
![Page 19: LinkedIn SRE: From Inception to Global Scale · 24/7 deployments move to service oriented architecture dev driven deployments replace java serialized objects over RPC with REST APIs](https://reader034.vdocument.in/reader034/viewer/2022042915/5f5061e8a041f3044863c156/html5/thumbnails/19.jpg)
Everyone should be able to deploy code [safely]
![Page 20: LinkedIn SRE: From Inception to Global Scale · 24/7 deployments move to service oriented architecture dev driven deployments replace java serialized objects over RPC with REST APIs](https://reader034.vdocument.in/reader034/viewer/2022042915/5f5061e8a041f3044863c156/html5/thumbnails/20.jpg)
Self-service Deployments
1. Promote to a single production data center
2. “Canary” to a single production instance
3. EKG: automated metrics-based validation
4. Ramp features slowly to the member base
5. Promote to remaining production data centers
15K+ Successful commits/day
200+ Code promotions/day
600+ Feature ramps/day
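The deployment steps on this slide can be read as a gated pipeline in which an automated metrics comparison decides whether a change moves forward. The sketch below is only an illustration under stated assumptions: `ekg_check`, `Stage`, `run_pipeline`, the metric names, and the 10% tolerance are all hypothetical and are not taken from the talk or from LinkedIn's tooling.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Metrics = Dict[str, float]


def ekg_check(baseline: Metrics, canary: Metrics, tolerance: float = 0.10) -> List[str]:
    """Return the names of metrics where the canary is worse than the baseline.

    Assumption: every metric is "lower is better" (error rate, latency).
    """
    regressions = []
    for name, base_value in baseline.items():
        canary_value = canary.get(name)
        if canary_value is None:
            regressions.append(name)  # a missing metric is treated as a failure
            continue
        if canary_value > base_value * (1.0 + tolerance):
            regressions.append(name)
    return regressions


@dataclass
class Stage:
    name: str
    run: Callable[[], bool]  # returns True when the stage succeeded


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order, stop at the first failure, return completed stage names."""
    completed = []
    for stage in stages:
        if not stage.run():
            break
        completed.append(stage.name)
    return completed


# Hypothetical numbers: the canary's error rate is 20% above the baseline.
baseline = {"error_rate": 0.010, "p99_latency_ms": 200.0}
canary = {"error_rate": 0.012, "p99_latency_ms": 205.0}

stages = [
    Stage("promote to one data center", lambda: True),
    Stage("canary to one instance", lambda: True),
    Stage("EKG validation", lambda: not ekg_check(baseline, canary)),
    Stage("ramp features", lambda: True),
    Stage("promote to remaining data centers", lambda: True),
]
completed = run_pipeline(stages)
print(completed)
```

In this run the validation stage blocks the rollout, so the change never reaches the feature ramp or the remaining data centers. Keeping the gate as an ordinary stage means any developer can run the same pipeline without a separate approval path.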
![Page 21: LinkedIn SRE: From Inception to Global Scale · 24/7 deployments move to service oriented architecture dev driven deployments replace java serialized objects over RPC with REST APIs](https://reader034.vdocument.in/reader034/viewer/2022042915/5f5061e8a041f3044863c156/html5/thumbnails/21.jpg)
Create a culture of operational metrics
“What gets measured gets fixed.”
![Page 22: LinkedIn SRE: From Inception to Global Scale · 24/7 deployments move to service oriented architecture dev driven deployments replace java serialized objects over RPC with REST APIs](https://reader034.vdocument.in/reader034/viewer/2022042915/5f5061e8a041f3044863c156/html5/thumbnails/22.jpg)
Self-service Instrumentation and Monitoring
Diagram components: java applications, non-java applications, REST API, metrics collectors, metrics api, alerting, visualization, IRIS
23K Graph dashboards
10M Metrics ingested/sec
340K Alerts processed/min
600M+ Total metrics
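The components named on this slide suggest a flow in which applications emit metrics, collectors ingest them, and alerting reads them back through a metrics API. A minimal in-memory sketch of that idea follows. Every name in it (`MetricsStore`, `AlertRule`, `evaluate`, the metric and team names) is a hypothetical stand-in, and the REST transport shown on the slide is not modeled.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class AlertRule:
    metric: str       # metric name the rule watches
    threshold: float  # fire when the latest value exceeds this
    owner: str        # team that registered the rule (self-service)


class MetricsStore:
    """Stand-in for the collectors plus the metrics API."""

    def __init__(self) -> None:
        self._series: Dict[str, List[Tuple[float, float]]] = defaultdict(list)

    def ingest(self, name: str, timestamp: float, value: float) -> None:
        self._series[name].append((timestamp, value))

    def latest(self, name: str) -> Optional[float]:
        points = self._series.get(name)
        return points[-1][1] if points else None


def evaluate(store: MetricsStore, rules: List[AlertRule]) -> List[str]:
    """Return one message per rule whose latest value is over its threshold."""
    fired = []
    for rule in rules:
        value = store.latest(rule.metric)
        if value is not None and value > rule.threshold:
            fired.append(f"{rule.owner}: {rule.metric}={value} exceeds {rule.threshold}")
    return fired


store = MetricsStore()
store.ingest("feed.error_rate", 1.0, 0.01)
store.ingest("feed.error_rate", 2.0, 0.07)

rules = [
    AlertRule(metric="feed.error_rate", threshold=0.05, owner="feed-team"),
    AlertRule(metric="feed.queue_depth", threshold=100.0, owner="feed-team"),
]
fired = evaluate(store, rules)
print(fired)
```

The second rule stays silent because no data has been ingested for its metric; a real system would likely treat missing data as its own alert condition.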
![Page 23: LinkedIn SRE: From Inception to Global Scale · 24/7 deployments move to service oriented architecture dev driven deployments replace java serialized objects over RPC with REST APIs](https://reader034.vdocument.in/reader034/viewer/2022042915/5f5061e8a041f3044863c156/html5/thumbnails/23.jpg)
We don’t want a traditional NOC [permanently]
![Page 24: LinkedIn SRE: From Inception to Global Scale · 24/7 deployments move to service oriented architecture dev driven deployments replace java serialized objects over RPC with REST APIs](https://reader034.vdocument.in/reader034/viewer/2022042915/5f5061e8a041f3044863c156/html5/thumbnails/24.jpg)
Self-service Remediation and Escalation
Diagram components: Alerts, Deployment, Metrics, Correlation Engine, Nurse, Salt, Notify (IRIS, JIRA, etc.), Feedback
15K Remediation Plans
9K Escalation Plans
17K Executions/day
![Page 25: LinkedIn SRE: From Inception to Global Scale · 24/7 deployments move to service oriented architecture dev driven deployments replace java serialized objects over RPC with REST APIs](https://reader034.vdocument.in/reader034/viewer/2022042915/5f5061e8a041f3044863c156/html5/thumbnails/25.jpg)
Unexpected outcomes from a different model
● Tiered escalation systems did not scale and incentivized undesirable outcomes
● Introduced “Production SRE”; SREs solving NOC problems via software
● Created NOC to SRE transition program
Flow: Alerts → Auto-remediation → Fixed? (Y / N); N → Escalation
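The alert flow on this slide (try auto-remediation first, escalate only when it does not fix the problem) can be sketched as below. This is an illustration under assumptions, not the design of Nurse or IRIS: `Alert`, `Responder`, the plan dictionaries, and the `default-oncall` fallback are hypothetical names introduced for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Alert:
    service: str
    symptom: str


# A remediation plan takes an alert and reports whether it fixed the problem.
RemediationPlan = Callable[[Alert], bool]


@dataclass
class Responder:
    remediation_plans: Dict[str, RemediationPlan]  # keyed by symptom
    escalation_plans: Dict[str, List[str]]         # keyed by service
    notifications: List[str] = field(default_factory=list)

    def handle(self, alert: Alert) -> str:
        plan = self.remediation_plans.get(alert.symptom)
        if plan is not None and plan(alert):
            return "fixed"
        # No plan, or the plan did not fix it: notify the owning team.
        contacts = self.escalation_plans.get(alert.service, ["default-oncall"])
        for contact in contacts:
            self.notifications.append(f"{contact}: {alert.symptom} on {alert.service}")
        return "escalated"


responder = Responder(
    remediation_plans={
        "disk_full": lambda alert: True,      # pretend log cleanup succeeded
        "high_latency": lambda alert: False,  # pretend a restart did not help
    },
    escalation_plans={"messaging": ["messaging-sre"]},
)

results = [
    responder.handle(Alert("messaging", "disk_full")),
    responder.handle(Alert("messaging", "high_latency")),
    responder.handle(Alert("feed", "unknown_symptom")),
]
print(results)
print(responder.notifications)
```

Because both kinds of plan are plain data keyed by symptom or service, teams can register their own without going through a central operations group, which is the self-service property the slides emphasize.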
![Page 26: LinkedIn SRE: From Inception to Global Scale · 24/7 deployments move to service oriented architecture dev driven deployments replace java serialized objects over RPC with REST APIs](https://reader034.vdocument.in/reader034/viewer/2022042915/5f5061e8a041f3044863c156/html5/thumbnails/26.jpg)
Scaling the team into India
![Page 27: LinkedIn SRE: From Inception to Global Scale · 24/7 deployments move to service oriented architecture dev driven deployments replace java serialized objects over RPC with REST APIs](https://reader034.vdocument.in/reader034/viewer/2022042915/5f5061e8a041f3044863c156/html5/thumbnails/27.jpg)
Member Growth
[Chart: LinkedIn members by year, 2003 to 2017; y-axis from 0 to 500,000,000]
Chart annotations: SRE Established; Bangalore Office Online; Bangalore SRE
![Page 28: LinkedIn SRE: From Inception to Global Scale · 24/7 deployments move to service oriented architecture dev driven deployments replace java serialized objects over RPC with REST APIs](https://reader034.vdocument.in/reader034/viewer/2022042915/5f5061e8a041f3044863c156/html5/thumbnails/28.jpg)
Ground Rules
1. Follow the sun is not the goal
2. Start simple
3. Same culture, same hiring bar, same everything
![Page 29: LinkedIn SRE: From Inception to Global Scale · 24/7 deployments move to service oriented architecture dev driven deployments replace java serialized objects over RPC with REST APIs](https://reader034.vdocument.in/reader034/viewer/2022042915/5f5061e8a041f3044863c156/html5/thumbnails/29.jpg)
Armed w/ our principles & playbook...
404: Resources not found
We made mistakes and learned from them
![Page 30: LinkedIn SRE: From Inception to Global Scale · 24/7 deployments move to service oriented architecture dev driven deployments replace java serialized objects over RPC with REST APIs](https://reader034.vdocument.in/reader034/viewer/2022042915/5f5061e8a041f3044863c156/html5/thumbnails/30.jpg)
Lesson 1: Bootstrapping Remote Teams Is Hard
1. Applying US model did not work: invest in potential & new grads
2. Negative association of the “SRE” title: standardize SRE title
3. Preconceived notion of US-based companies: evangelize SRE role
![Page 31: LinkedIn SRE: From Inception to Global Scale · 24/7 deployments move to service oriented architecture dev driven deployments replace java serialized objects over RPC with REST APIs](https://reader034.vdocument.in/reader034/viewer/2022042915/5f5061e8a041f3044863c156/html5/thumbnails/31.jpg)
OUTCOMES
Lesson 1: Bootstrapping Remote Teams Is Hard
● Don’t expect hiring patterns from other markets to necessarily work
● Leverage the community to help; you’re likely not alone
● Leverage other sources: College Grads, High-Potential junior engineers
![Page 32: LinkedIn SRE: From Inception to Global Scale · 24/7 deployments move to service oriented architecture dev driven deployments replace java serialized objects over RPC with REST APIs](https://reader034.vdocument.in/reader034/viewer/2022042915/5f5061e8a041f3044863c156/html5/thumbnails/32.jpg)
Lesson 2: Create Your Own Identity
1. Mirroring teams does not work: invest based on value addition & local needs
2. Work patterns are different: focus on opportunity and strengths
3. Need engaging work to sustain teams: provide quality work for the teams (projects: FINDER, DC MANAGER, ALERT CORRELATION)
![Page 33: LinkedIn SRE: From Inception to Global Scale · 24/7 deployments move to service oriented architecture dev driven deployments replace java serialized objects over RPC with REST APIs](https://reader034.vdocument.in/reader034/viewer/2022042915/5f5061e8a041f3044863c156/html5/thumbnails/33.jpg)
OUTCOMES
Lesson 2: Create Your Own Identity
● Don’t map teams 1:1 across offices
● Find the right balance, quality vs quantity
● Use lulls in workload to give them hard problems they can own
![Page 34: LinkedIn SRE: From Inception to Global Scale · 24/7 deployments move to service oriented architecture dev driven deployments replace java serialized objects over RPC with REST APIs](https://reader034.vdocument.in/reader034/viewer/2022042915/5f5061e8a041f3044863c156/html5/thumbnails/34.jpg)
Lesson 3: One Size Does Not Fit All
1. Every team is unique: curate to specific team need
2. Equal ownership is critical to success: equal partnership in running a product
3. Right leadership is essential: alignment on leadership commitment & investment
![Page 35: LinkedIn SRE: From Inception to Global Scale · 24/7 deployments move to service oriented architecture dev driven deployments replace java serialized objects over RPC with REST APIs](https://reader034.vdocument.in/reader034/viewer/2022042915/5f5061e8a041f3044863c156/html5/thumbnails/35.jpg)
OUTCOMES
Lesson 3: One Size Does Not Fit All
● Focus on teams that have shown that they can collaborate
● Mistakes will happen; own them and fix them quickly
![Page 36: LinkedIn SRE: From Inception to Global Scale · 24/7 deployments move to service oriented architecture dev driven deployments replace java serialized objects over RPC with REST APIs](https://reader034.vdocument.in/reader034/viewer/2022042915/5f5061e8a041f3044863c156/html5/thumbnails/36.jpg)
Bangalore Today
Building region-specific products:
● Performance focused mobile client
● New college grad placements
● Filling India’s blue-collar labor gap
Strong team presence:
● 60+ SREs across 10 teams
● 100+ application developers
● Owns LinkedIn core messaging platform
![Page 37: LinkedIn SRE: From Inception to Global Scale · 24/7 deployments move to service oriented architecture dev driven deployments replace java serialized objects over RPC with REST APIs](https://reader034.vdocument.in/reader034/viewer/2022042915/5f5061e8a041f3044863c156/html5/thumbnails/37.jpg)
SRE Globally Today
SF San Francisco
SNV Sunnyvale
BLR Bangalore
NYC New York City
Learnings from BLR led to other regional expansions
300+ SREs across four global offices
![Page 38: LinkedIn SRE: From Inception to Global Scale · 24/7 deployments move to service oriented architecture dev driven deployments replace java serialized objects over RPC with REST APIs](https://reader034.vdocument.in/reader034/viewer/2022042915/5f5061e8a041f3044863c156/html5/thumbnails/38.jpg)
Takeaways from our Journey
● Unique office identities leveraged for other engineering presences
● Directionally good ideas haven’t failed us yet
● It really does take a village
● This isn’t easy, but it is possible
![Page 39: LinkedIn SRE: From Inception to Global Scale · 24/7 deployments move to service oriented architecture dev driven deployments replace java serialized objects over RPC with REST APIs](https://reader034.vdocument.in/reader034/viewer/2022042915/5f5061e8a041f3044863c156/html5/thumbnails/39.jpg)
Thanks!