Maintenance Challenges
Mike Hughes <[email protected]>
The Challenge
Significant amounts of maintenance on both LANs
  Complete upgrade of Extreme LAN hardware
  Major upgrade of Foundry LAN (RX16s)
  Reconfiguration of backbone rings in both LANs
    Deploying MRP2 on Foundry, EAPSv2 on Extreme
“Chains” of dependencies
  Would be a total of 14 to 16 slots
It would take ages!
Maintenance is usually one, occasionally two slots per week, with a minimum of 7 days' notice
Plus, only one LAN at once to minimise risk
Would take between 3 and 4 months
  Not accounting for failures/backouts
  Each backout would cause a delay of an additional week
Oh, and we needed to get it done sooner!
  High demand for 10G ports!
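
The "3 to 4 months" figure follows from simple arithmetic on the numbers above. A minimal sketch, assuming the usual rate of one slot per week and one extra week per backout; the function name and the example of two backouts are illustrative only, not figures from the talk:

```python
import math

def weeks_needed(slots, slots_per_week=1, backouts=0):
    """Calendar weeks needed to work through `slots` maintenance windows.

    Assumes each backout costs one additional week, as stated on the slide.
    """
    return math.ceil(slots / slots_per_week) + backouts

for slots in (14, 16):
    # backouts=2 is a hypothetical value chosen only to show the effect
    print(f"{slots} slots: {weeks_needed(slots)} weeks with no backouts, "
          f"{weeks_needed(slots, backouts=2)} weeks with 2 backouts")
```

At one slot per week, 14 to 16 slots is 14 to 16 weeks, which is roughly the 3 to 4 months quoted, before any backouts are counted.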
Intense Approach
Block out whole weeks for maintenance
  Allowing between 1 and 4 slots each week
Alternate between Foundry and Extreme
  Allowing changes to settle in
Ability to rapidly reschedule (<24 hours) for maintenance which overran or had to be backed out
Hire in external hands to assist
  Reduce direct stress on LINX staff
  Ensure “backup” crew available next day
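
The benefit of rapid rescheduling can be shown as a date calculation. A minimal sketch, assuming the usual 7 days' notice for a normal retry and treating "<24 hours" as a next-day retry; the constant names, the function, and the example date are hypothetical:

```python
from datetime import date, timedelta

NORMAL_NOTICE_DAYS = 7   # minimum notice under the usual process
INTENSE_NOTICE_DAYS = 1  # "<24 hours" rescheduling inside a blocked-out week

def earliest_retry(backed_out_on, notice_days):
    """Earliest date a backed-out maintenance could be attempted again."""
    return backed_out_on + timedelta(days=notice_days)

backed_out_on = date(2000, 1, 4)  # hypothetical date, for illustration only
normal = earliest_retry(backed_out_on, NORMAL_NOTICE_DAYS)
intense = earliest_retry(backed_out_on, INTENSE_NOTICE_DAYS)
print(f"Normal process retry:  {normal}")
print(f"Blocked-out week retry: {intense}")
print(f"Slip avoided: {(normal - intense).days} days")
```

Under these assumptions each backout costs about a day rather than a week, which is what makes it practical to attempt several slots in one blocked-out week.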
Breaking the News
First get your colleagues on side
Then, break it to your members!
  Surprisingly well received
  Especially given the “otherwise we’ll still be finishing this in September” situation
It seemed it would only cause significant problems for one member (using DNS-based load distribution)
  We even managed to mitigate some of that
Scheduling
6 new switches
  All complete box swaps
  One hard cutover, 5 staggered migrations
2 different software upgrades
  Involving 6 different switches
New inter-site fibre delivery
Backbone reconfiguration
Little opportunity to build “greenfield”
Weekly Breakdown
[Chart: week-by-week maintenance calendar. The position of each item within the weeks is not recoverable from the transcript.]
Weeks: 1st May, 8th May, 15th May, 22nd May, 29th May, 5th June, 12th June, 19th June, 26th June, 3rd July
Items on the chart:
  LINX 53 Meeting
  Euro-IX Meeting
  Migrate 100M ports to RX @ Telehouse East
  Install RX16 @ Telehouse East
  Insert 3x BD8800s into EAPS ring
  UKNOF Meeting
  Migrate all member ports on 2x BD8800s (appears twice)
  Migrate 1G ports to RX @ Telehouse East
  NANOG Meeting, San Jose
  Upgrade MG8 s/w
  Upgrade Jetcore s/w 4x BI8000s
  Retry Failed Jetcore s/w upgrade on one switch only
  Retry Failed Jetcore s/w upgrade
  Migrate entire MG8 to new RX16 @ THN - 7hr maint!
  EAPSv2 pre-deploy and build (Using new intersite fibre)
  Activate New EAPSv2 Backbone Topology
  Deploy MRP2 and upgrade interswitch trunks
  Emergency Maintenance on a BD8800 MSM
  SLACK WEEK (appears twice)
  New Staff Member
Legend: Foundry Network, Extreme Network, External Event, LINX Meeting, Emergency Maintenance Window
We broke the “news” to the members here
This window had to be backed out
We retried using different code next day
…and the rest two days later
Afterthoughts
Blocking out whole weeks for maintenance worked
  Especially when we could quickly reschedule maintenance work that got backed out
  Otherwise, we’d have had at least 7 days of “slip”
The intensive approach worked
  We haven’t had to raise a scheduled maintenance ticket until this week!
  Prevented the network operating for extended periods in a “transitional state”
What we started with
What we ended up with - Foundry
What we ended up with - Extreme
Any questions?