© 2008 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
How the Disaster Proof OpenVMS Cluster Recovered So Fast, and How Yours Can, Too
Keith Parris, Systems/Software Engineer, HP
Monday, May 19 and Wednesday, May 21
Story of the OpenVMS Cluster in the Disaster Proof Video
4 30 July 2015
Disaster Proof Demonstration and Video
Camden, Arkansas, NTS
The Failover Datacenter
The original “green” datacenter
Nature gets in on the act!
KABOOM! Arkansas on the ground
OpenVMS Disaster-Proof configuration & application
Diagram: XP12000 and XP24000 storage arrays holding the members of a shadow set, receiving a stream of I/Os from three nodes:
− KABOOM:: Alpha ES40
− QUORUM:: Integrity rx2620
− SDBOOM:: Integrity Superdome
• All I/Os need to complete to all spindles before an I/O is considered done.
• When a spindle drops out, the shadow set is reduced.
• I/Os “in flight” wait for the shadow set to be reduced.
• The longest outstanding request for an I/O during the DP demo was 13.71 seconds.
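To make the shadowing rule above concrete, here is a minimal Python model of one mirrored write, written under stated assumptions rather than taken from the shadowing driver: the helper name `write_completion_time`, the member names, and their latencies are hypothetical, and the model ignores everything else that stalls I/O during a cluster state transition, so it will not reproduce the 13.71-second figure.

```python
from typing import Dict, List, Optional, Tuple


def write_completion_time(
    issue_time: float,
    member_latency: Dict[str, Optional[float]],
    member_removal_time: float,
) -> Tuple[float, List[str]]:
    """Return (completion time, surviving members) for one shadow-set write.

    member_latency maps a member name to the seconds it takes to complete
    the write, or to None if the member never answers (its site is gone).
    member_removal_time is the absolute time at which non-answering members
    are dropped, i.e. when the shadow set is reduced.
    """
    finish_times: List[float] = []
    survivors: List[str] = []
    for name, latency in member_latency.items():
        if latency is None:
            # The in-flight write waits until the shadow set is reduced.
            finish_times.append(max(issue_time, member_removal_time))
        else:
            finish_times.append(issue_time + latency)
            survivors.append(name)
    # The write is done only when every remaining member has completed it.
    return max(finish_times), survivors


if __name__ == "__main__":
    # Hypothetical two-member shadow set; the remote member never answers.
    done, members = write_completion_time(
        issue_time=0.0,
        member_latency={"local_member": 0.005, "remote_member": None},
        member_removal_time=8.0,  # e.g. SHADOW_MBR_TMO = 8 seconds after the failure
    )
    print(done, members)
```

With every member answering, the write simply finishes when the slowest member does; the non-answering case shows why an in-flight write can take as long as the member timeout.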
GQB ready for a ride!
Disaster Proof Demo OpenVMS Cluster
How the Disaster Proof OpenVMS Cluster Recovered So Fast, and How Yours Can, Too
OpenVMS Cluster Failure Detection Mechanisms and Cluster State Transitions
OpenVMS Cluster Connection Manager and Transient Failures
• Some failures are temporary and transient
− Especially in a LAN environment
• To avoid the disruption of removing a node from the cluster unnecessarily, the Connection Manager, on detecting a communications failure, waits for a time in the hope that the problem will go away by itself
− This time is called the Reconnection Interval
• SYSGEN parameter RECNXINTERVAL
− RECNXINTERVAL is dynamic and may thus be temporarily raised if needed for something like a scheduled LAN outage
OpenVMS Cluster Connection Manager and Communications or Node Failures
• If the Reconnection Interval passes without connectivity being restored, or if the node has “gone away”, the cluster cannot continue without a reconfiguration
• This reconfiguration is called a State Transition, and one or more nodes will be removed from the cluster
Failure and Repair/Recovery within Reconnection Interval
• Failure occurs
• Failure detected (virtual circuit broken); RECNXINTERVAL starts
• Problem fixed
• Fixed state detected (virtual circuit re-opened) before RECNXINTERVAL expires
Hard Failure
• Failure occurs
• Failure detected (virtual circuit broken); RECNXINTERVAL starts
• RECNXINTERVAL expires without the problem being fixed
• State transition (node removed from cluster)
Late Recovery
• Failure occurs
• Failure detected (virtual circuit broken); RECNXINTERVAL starts
• RECNXINTERVAL expires: state transition (node removed from cluster)
• Problem fixed
• Fix detected
• Node learns it has been removed from the cluster
• Node does a CLUEXIT bugcheck
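The three timelines (repair within the Reconnection Interval, hard failure, late recovery) reduce to one comparison between when a repair is detected and when RECNXINTERVAL runs out. A minimal Python sketch, assuming the caller already knows the time the fix is detected (in reality, detecting the repair takes its own time); `classify` and `Outcome` are hypothetical names:

```python
from enum import Enum
from typing import Optional, Tuple


class Outcome(Enum):
    RECOVERED = "virtual circuit re-opened; no state transition"
    HARD_FAILURE = "state transition; node removed from cluster"
    LATE_RECOVERY = "node removed; it later does a CLUEXIT bugcheck"


def classify(
    detected_at: float, recnxinterval: float, fix_detected_at: Optional[float]
) -> Tuple[Outcome, Optional[float]]:
    """Return the outcome and, if one happens, the time the state transition starts.

    detected_at     -- time the virtual circuit was declared broken
    recnxinterval   -- RECNXINTERVAL, in seconds
    fix_detected_at -- time the repair is detected, or None if never repaired
    """
    deadline = detected_at + recnxinterval
    if fix_detected_at is not None and fix_detected_at <= deadline:
        return Outcome.RECOVERED, None
    if fix_detected_at is None:
        return Outcome.HARD_FAILURE, deadline
    # Repaired, but only after the node was already removed.
    return Outcome.LATE_RECOVERY, deadline


if __name__ == "__main__":
    for fix in (6.0, None, 20.0):
        outcome, transition = classify(3.5, 10.0, fix)
        print(fix, outcome.name, transition)
```

Treating a fix detected exactly at the deadline as a recovery is a modeling choice, not documented behavior.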
Failure Detection Mechanisms
• Mechanisms to detect a node or communications failure
− Last-Gasp Datagram
− Periodic checking
• Multicast Hello packets on LANs
• Polling on CI and DSSI
• TIMVCFAIL check
PEDRIVER Hello Packet Timing
• Hello packet Transmit Interval
− Default is 3 seconds
− Dithered by reducing it by as much as half, to avoid forming “packet trains”
• So Hellos could be spaced as close as 1.5 seconds, or as far apart as 3 seconds
• Hello packet Listen Timeout
− Default is 8 seconds
− Allows detection of a failure within 8 to 9 seconds
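A small Python model of the Hello timing above, under loudly stated assumptions: gaps are drawn uniformly between half and all of the transmit interval, and the listen timer is modeled as a 1-second clock tick that declares the virtual circuit broken on the first tick at which the listen timeout has elapsed since the last Hello received. That tick assumption is what yields the 8-to-9-second window; it is not taken from PEDRIVER source. Function names are hypothetical.

```python
import math
import random
from typing import List, Optional

HELLO_INTERVAL = 3.0  # default Hello transmit interval, seconds
LISTEN_TIMEOUT = 8.0  # default Hello listen timeout, seconds
TICK = 1.0            # ASSUMED listen-timer clock granularity, seconds


def hello_times(start: float, end: float, interval: float = HELLO_INTERVAL,
                rng: Optional[random.Random] = None) -> List[float]:
    """Hello transmit times in [start, end), each gap dithered down to as little as half."""
    rng = rng if rng is not None else random.Random()
    times: List[float] = []
    t = start
    while t < end:
        times.append(t)
        # Dithering: shorten each gap by up to half to avoid "packet trains".
        t += rng.uniform(interval / 2, interval)
    return times


def vc_broken_at(last_hello: float, listen_timeout: float = LISTEN_TIMEOUT,
                 tick: float = TICK) -> float:
    """First clock tick at which listen_timeout has elapsed since the last Hello received."""
    return math.ceil((last_hello + listen_timeout) / tick) * tick


if __name__ == "__main__":
    sent = hello_times(0.0, 60.0, rng=random.Random(42))
    node_dies = 30.0
    last = max(t for t in sent if t < node_dies)
    broken = vc_broken_at(last)
    print(f"last Hello at {last:.2f}s; VC broken at {broken:.0f}s, "
          f"{broken - last:.2f}s after that Hello and {broken - node_dies:.2f}s after the failure")
```

Because the last Hello before a failure is at most one interval old, the delay measured from the failure itself is shorter than the delay measured from the last Hello.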
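Failure Detection on LAN interconnects
Diagram: the remote node sends Hello packets to the local node at t=0, 3, 6, and 9 seconds; the Hello at t=6 is lost. The local node's listen timer counts clock ticks and is reset to 0 whenever a Hello arrives, so it climbs only to 6 before the Hello at t=9 resets it, and the virtual circuit stays open.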
Failure Detection on LAN interconnects
Diagram: the remote node sends Hello packets at t=0, 3, and 6 seconds; the Hellos at t=3 and t=6 are lost. With no Hello arriving after t=0, the local node's listen timer counts clock ticks 1 through 8, and when it reaches 8 the virtual circuit is broken.
TIMVCFAIL Mechanism
Diagram (healthy remote node): along a timeline marked at t=0, t=1/3 of TIMVCFAIL, and t=2/3 of TIMVCFAIL, the local node periodically sends a Request to the remote node, and each Request gets a Response.
TIMVCFAIL Mechanism
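Diagram (remote node fails): timeline marked at t=0, t=1/3 of TIMVCFAIL, t=2/3 of TIMVCFAIL, and t=TIMVCFAIL. The first Request gets a Response; the remote node fails some time during the period between points 1 and 2, so the next Request gets no Response, and the virtual circuit is broken at t=TIMVCFAIL.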
Sequence of Events During a State Transition
• Determine new cluster configuration
• If quorum is lost:
− QUORUM capability bit removed from all CPUs
− No process can be scheduled to run
− Disks all put into mount verification
• If quorum is not lost, continue…
• Rebuild lock database
− Stall lock requests
− I/O synchronization
− Do rebuild work
− Resume lock handling
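The quorum decision at the top of this sequence can be sketched in Python. The quorum formula, (EXPECTED_VOTES + 2) / 2 with the fraction dropped, is the standard OpenVMS rule; everything else is a simplification that ignores quorum disks and the dynamic adjustments the real connection manager makes. The one-vote-per-node allocation in the usage example is an assumption, not the demo's actual VOTES settings, and the helper names are hypothetical.

```python
from typing import Dict, List, Set


def quorum(expected_votes: int) -> int:
    """OpenVMS quorum value: (EXPECTED_VOTES + 2) / 2, with the fraction dropped."""
    return (expected_votes + 2) // 2


def state_transition_steps(votes: Dict[str, int], surviving: Set[str],
                           expected_votes: int) -> List[str]:
    """Simplified walk through the state-transition sequence."""
    steps = ["Determine new cluster configuration"]
    present = sum(v for node, v in votes.items() if node in surviving)
    if present < quorum(expected_votes):
        # Quorum lost: the surviving nodes stop doing work.
        return steps + [
            "QUORUM capability bit removed from all CPUs",
            "No process can be scheduled to run",
            "Disks all put into mount verification",
        ]
    # Quorum kept: rebuild the lock database and carry on.
    return steps + [
        "Stall lock requests",
        "I/O synchronization",
        "Do rebuild work",
        "Resume lock handling",
    ]


if __name__ == "__main__":
    votes = {"KABOOM": 1, "QUORUM": 1, "SDBOOM": 1}  # ASSUMED: one vote per node
    for survivors in ({"QUORUM", "SDBOOM"}, {"SDBOOM"}):
        print(sorted(survivors), state_transition_steps(votes, survivors, expected_votes=3)[-1])
```

With three one-vote nodes, quorum is 2, so losing any single node (for example, KABOOM::) still leaves the cluster with quorum; losing two does not.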
Measuring State Transition Effects
• Determine the type of the last lock rebuild:
$ ANALYZE/SYSTEM
SDA> READ SYS$LOADABLE_IMAGES:SCSDEF
SDA> EVALUATE @(@CLU$GL_CLUB + CLUB$B_NEWRBLD_REQ) & FF
Hex = 00000002 Decimal = 2 ACP$V_SWAPPRV
• Rebuild type values:
1. Merge (locking not disabled)
2. Partial
3. Directory
4. Full
Measuring State Transition Effects
• Determine the duration of the last lock request stall period:
SDA> DEFINE TOFF = @(@CLU$GL_CLUB+CLUB$L_TOFF)
SDA> DEFINE TON = @(@CLU$GL_CLUB+CLUB$L_TON)
SDA> EVALUATE TON-TOFF
Hex = 0000026B Decimal = 619 PDT$Q_COMQH+00003
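If you capture these SDA results in a log, a few lines of Python can turn them into the values the slides describe. This is a hypothetical helper that parses only the `Hex = ... Decimal = ...` shape shown above; the trailing symbol name SDA prints (such as ACP$V_SWAPPRV) appears to be SDA symbolizing the number and is ignored. The slides do not state the units of TON-TOFF, so the sketch reports the raw count.

```python
import re

# Rebuild type values as listed on the slide.
REBUILD_TYPES = {1: "Merge (locking not disabled)", 2: "Partial", 3: "Directory", 4: "Full"}

_EVALUATE_RE = re.compile(r"Hex\s*=\s*([0-9A-Fa-f]+)\s+Decimal\s*=")


def evaluate_value(line: str) -> int:
    """Pull the numeric value out of an SDA EVALUATE result line."""
    match = _EVALUATE_RE.search(line)
    if match is None:
        raise ValueError(f"not an EVALUATE result line: {line!r}")
    return int(match.group(1), 16)


def rebuild_type(line: str) -> str:
    """Name the lock rebuild type from the CLUB$B_NEWRBLD_REQ evaluation."""
    # Mask to the low byte, matching the '& FF' in the SDA expression.
    return REBUILD_TYPES.get(evaluate_value(line) & 0xFF, "unknown")


if __name__ == "__main__":
    print(rebuild_type("Hex = 00000002   Decimal = 2   ACP$V_SWAPPRV"))
    stall = evaluate_value("Hex = 0000026B   Decimal = 619   PDT$Q_COMQH+00003")
    print(stall)  # TON-TOFF as a raw count; units not stated on the slide
```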
Minimizing Impact of State Transitions
• Configuration issues:
− Few (e.g. exactly 3) nodes
− Quorum node; no quorum disk
− Set up LAN cluster interconnect to minimize length of time packet-forwarding is blocked
• Original IEEE 802.1d Spanning Tree algorithm could take 35-40 seconds to converge and start forwarding packets again
− Two completely independent spanning trees could help avoid communications being blocked on both at once
• Newer IEEE 802.1w Rapid Spanning Tree (and IEEE 802.1s Multiple Spanning Tree) protocols can be configured to recover in less than 1 second
Disaster Proof Demonstration Settings and Behavior
OpenVMS System Parameter Settings for the Disaster Proof Demonstration
• SHADOW_MBR_TMO lowered from default of 120 down to 8 seconds
• RECNXINTERVAL lowered from default of 20 down to 10 seconds
• TIMVCFAIL lowered from the default of 1600 (16 seconds) to 400 (4 seconds; the units are 10 milliseconds), to detect node failure in 4 seconds, worst case (detecting failure at the SYSAP level)
• LAN_FLAGS bit 12 set to enable Fast LAN Transmit Timeout (give up on a failed packet transmit in 1.25 seconds, worst case, instead of an order of magnitude more in some cases)
• PE4 set to hexadecimal 0703 (Hello transmit interval of 0.7 seconds, nominal; Listen Timeout of 3 seconds), to detect node failure in 3-4 seconds at the PEDRIVER level
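The parameter arithmetic above can be checked with a short Python sketch. TIMVCFAIL's 10-millisecond unit comes from the slide; the PE4 decoding is an assumption inferred solely from the single example 0703 (high byte read as the Hello interval in tenths of a second, low byte as the listen timeout in seconds), so verify it against HP documentation before relying on it. The dithering range assumes the same up-to-half reduction described for the default Hello interval.

```python
from typing import Tuple


def timvcfail_seconds(timvcfail: int) -> float:
    """TIMVCFAIL is expressed in 10-millisecond units."""
    return timvcfail / 100


def decode_pe4(pe4: int) -> Tuple[float, int]:
    """ASSUMED layout, inferred only from the 0703 example:
    bits 8-15 = Hello transmit interval in tenths of a second,
    bits 0-7  = Hello listen timeout in whole seconds."""
    hello_interval_s = ((pe4 >> 8) & 0xFF) / 10
    listen_timeout_s = pe4 & 0xFF
    return hello_interval_s, listen_timeout_s


if __name__ == "__main__":
    print("TIMVCFAIL default:", timvcfail_seconds(1600), "s; demo:", timvcfail_seconds(400), "s")
    hello, listen = decode_pe4(0x0703)
    # With dithering, Hellos would go out every hello/2 .. hello seconds.
    print(f"Hello every {hello / 2:.2f}-{hello:.2f} s, listen timeout {listen} s")
```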
Disaster Proof Demo Timeline
• Time = 0: Explosion occurs
• Time around 3.5 seconds: Node failure detected, via either PEDRIVER Hello Listen Timeout or TIMVCFAIL mechanism. VC closed; Reconnection Interval starts.
• Time = 8 seconds: Shadow Member Timeout expires; shadowset members removed.
• Time around 13.5 seconds: Reconnection Interval expires; State Transition begins.
• Time = 13.71 seconds: Recovery complete; Application processing resumes.
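The demo timeline follows directly from the parameter settings; this Python sketch rebuilds it from those values. The 3.5-second detection time is the slide's approximate figure, and, as on the slides, the shadow member timeout is measured from the explosion. The derived gap between the start of the state transition and application resumption (about 0.21 seconds) is simple subtraction of the slide's numbers, not a separate measurement.

```python
from typing import List, Tuple

DETECTION_S = 3.5            # "around 3.5 seconds" (Hello Listen Timeout or TIMVCFAIL)
SHADOW_MBR_TMO_S = 8.0       # demo setting
RECNXINTERVAL_S = 10.0       # demo setting
RECOVERY_COMPLETE_S = 13.71  # measured in the demo


def demo_timeline(detection: float = DETECTION_S,
                  shadow_mbr_tmo: float = SHADOW_MBR_TMO_S,
                  recnxinterval: float = RECNXINTERVAL_S) -> List[Tuple[float, str]]:
    """Event times, in seconds after the explosion, derived from the settings."""
    return sorted([
        (0.0, "Explosion"),
        (detection, "Node failure detected; VC closed; Reconnection Interval starts"),
        (shadow_mbr_tmo, "Shadow Member Timeout expires; shadowset members removed"),
        (detection + recnxinterval, "Reconnection Interval expires; State Transition begins"),
    ])


if __name__ == "__main__":
    events = demo_timeline()
    for t, what in events:
        print(f"{t:6.2f} s  {what}")
    transition_start = events[-1][0]
    print(f"State transition until application resumes: about "
          f"{RECOVERY_COMPLETE_S - transition_start:.2f} s")
```

Passing different settings to `demo_timeline` shows how each parameter moves the start of the state transition, under the same simplifying assumptions.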
Disaster Proof Demo Timeline
T = 0: Explosion
T = about 3.5 seconds: failure detected (PEDRIVER Hello Listen Timeout or TIMVCFAIL timeout)
Disaster Proof Demo Timeline
T = 0: Explosion
T = 8 seconds: Shadow Member Timeout expires; failed shadowset members removed
Disaster Proof Demo Timeline
T = 0: Explosion
T = about 3.5 seconds: PEDRIVER Hello Listen Timeout or TIMVCFAIL timeout; Reconnection Interval starts
T = about 13.5 seconds: Reconnection Interval ends; State Transition begins
Disaster Proof Demo Timeline
T = 0: Explosion
T = about 13.5 seconds: State Transition begins (cluster state transition and lock database rebuild; node removed from cluster)
T = 13.71 seconds: Application resumes
Simulation and Testing of Long Distance DR/DT Configurations
Trends
Trends
• Increase in disasters
• Longer inter-site distances for better protection
• Business pressures for shorter distances for performance
• Increasing pressure not to bridge LANs between sites
Trends: Increase in Disasters
“Natural disasters have quadrupled over the last two decades, from an average of 120 a year in the early 1980s to as many as 500 today.”
Continuity Insights Magazine
Nov./Dec. 2007 issue, page 10
“There has been a six-fold increase in floods since 1980. The number of floods and wind-storms have increased from 60 in 1980 to 240 last year.”
Continuity Insights Magazine
Nov./Dec. 2007 issue, page 10
Increase in Disasters
http://www.oxfam.org/en/files/bp108_climate_change_alarm_0711.pdf/download
Trends: Longer inter-site distances for better protection
“Some CIOs are imagining potential disasters that go well beyond the everyday hiccups that can disrupt applications and networks. Others, recognizing how integral IT is to business today, are focusing on the need to recover instantaneously from any unforeseen event.” …“It's a different world. There are so many more things to consider than the traditional fire, flood and theft.”
“Redefining Disaster”
Mary K. Pratt, Computerworld, June 20, 2005
http://www.computerworld.com/hardwaretopics/storage/story/0,10801,102576,00.html
Northeast US Before Blackout
Source: NOAA/DMSP
Northeast US After Blackout
Source: NOAA/DMSP
“The blackout has pushed many companies to expand their data center infrastructures to support data replication between two or even three IT facilities -- one of which may be located on a separate power grid.”
Computerworld, August 2, 2004
http://www.computerworld.com/securitytopics/security/recovery/story/0,10801,94944,00.html
“You have to be far enough apart to make sure that conditions in one place are not likely to be duplicated in the other.” … “A useful rule of thumb might be a minimum of about 50 km, the length of a MAN, though the other side of the continent might be necessary to play it safe.”
“Disaster Recovery Sites: How Far Away is Far Enough?”
Drew Robb, Datamation, October 4, 2005
http://www.enterprisestorageforum.com/continuity/features/article.php/3552971
Trends: Longer inter-site distances for better protection
• In the past, protection focused on risks like fires, floods, and tornadoes; 1 to 5 miles between sites was fine.
• Right after 9/11, 60 to 100 miles looked much better.
• Then came the Northeast Blackout of 2003, and the possibility of a terrorist group obtaining a nuclear device and wiping out an entire metropolitan area is no longer inconceivable.
− Resulting pressure is for inter-site distances of 1,000 to 1,500 miles
• Challenges:
− Telecommunications links
− Latency due to the speed of light adversely affects performance
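The speed-of-light challenge can be quantified with a back-of-the-envelope Python calculation. It assumes light in fiber travels roughly 200 km per millisecond (about two-thirds of its speed in vacuum) over a path equal to the site separation; real fiber routes are longer and network equipment adds delay, so these are lower bounds. Each remote shadow-set write needs at least one such round trip.

```python
KM_PER_MILE = 1.609344
FIBER_KM_PER_MS = 200.0  # ASSUMED: light in fiber covers roughly 200 km per millisecond


def min_round_trip_ms(miles: float) -> float:
    """Lower bound on round-trip time over a fiber path of the given length."""
    return 2 * miles * KM_PER_MILE / FIBER_KM_PER_MS


if __name__ == "__main__":
    # Distances taken from the trend bullets: 1-5, 60-100, and 1,000-1,500 miles.
    for miles in (5, 100, 1000, 1500):
        print(f"{miles:>5} miles: at least {min_round_trip_ms(miles):5.2f} ms per round trip")
```

At 1,000 miles the floor is already about 16 milliseconds per round trip, which every synchronous remote write pays.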
Trends: Business pressures for shorter distances for performance
“A 1-millisecond advantage in trading applications can be worth $100 million a year to a major brokerage firm, by one estimate.”
Richard Martin, InformationWeek,
April 23, 2007
“The fastest systems, running from traders' desks to exchange data centers, can execute transactions in a few milliseconds -- so fast, in fact, that the physical distance between two computers processing a transaction can slow down how fast it happens.”
Richard Martin, InformationWeek,
April 23, 2007
“This problem is called data latency --delays measured in split seconds. To overcome it, many high-frequency algorithmic traders are moving their systems as close to the Wall Street exchanges as possible.”
Richard Martin, InformationWeek,
April 23, 2007
Trends: Increasing pressure not to bridge LANs between sites
Trends: Increasing Resistance to LAN Bridging
• In the past, setting up a VLAN spanning sites for an OpenVMS disaster-tolerant cluster was common
• Networks are now IP-centric
• IP network mindset sees LAN bridging as “bad,” sometimes even “totally unacceptable”
• Alternatives:
− Separate, private link for OpenVMS Multi-site Cluster
− Metropolitan Area Networks (MANs) using MPLS
− Ethernet-over-IP (EoIP)
− SCS-over-IP support planned for OpenVMS 8.4
Site Selection and Inter-Site Distance
Planning for DT: Site Selection
Sites must be carefully selected:
• Avoid hazards
− Especially hazards common to both sites (and the resulting loss of both datacenters at once)
• Make them a “safe” distance apart
• Select site separation in a “safe” direction
Planning for DT: What is a “Safe Distance”
Analyze likely hazards of proposed sites:
• Natural hazards
− Fire (building, forest, gas leak, explosive materials)
− Storms (Tornado, Hurricane, Lightning, Hail, Ice)
− Flooding (excess rainfall, dam breakage, storm surge, broken water pipe)
− Earthquakes, Tsunamis
Planning for DT: What is a “Safe Distance”
Analyze likely hazards of proposed sites:
• Man-made hazards
− Nearby transportation of hazardous materials (highway, rail)
− Terrorist with a bomb
− Disgruntled customer with a weapon
− Enemy attack in war (nearby military or industrial targets)
− Civil unrest (riots, vandalism)
Former Atlas E Missile Silo Site in Kimball, Nebraska
Planning for DT: Site Separation Distance
• Make sites a “safe” distance apart
• This must be a compromise. Factors:
− Risks
− Performance (inter-site latency)
− Interconnect costs
− Ease of travel between sites
− Availability of workforce
Planning for DT: Site Separation Distance
• Select site separation distance:
− 1-3 miles: protects against most building fires, natural gas leaks, armed intruders, terrorist bombs
− 10-30 miles: protects against most tornadoes, floods, hazardous material spills, release of poisonous gas, non-nuclear military bomb strike
− 100-300 miles: protects against most hurricanes, earthquakes, tsunamis, forest fires, most biological weapons, most power outages, suitcase-sized nuclear bomb
− 1,000-3,000 miles: protects against “dirty” bombs, major region-wide power outages, and possibly military nuclear attacks
Diagram: Threat Radius
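The separation bands above can be encoded as a small lookup table. The Python sketch below is illustrative only. The band figures come from the slide, but two choices are assumptions: a band counts as covered only once the separation reaches its upper bound (a conservative reading), and the helper name `hazards_covered` is hypothetical.

```python
# Separation bands from the slide: (lower miles, upper miles, hazards mostly covered).
SEPARATION_BANDS = [
    (1, 3, "building fires, natural gas leaks, armed intruders, terrorist bombs"),
    (10, 30, "tornadoes, floods, hazardous material spills, poisonous gas, "
             "non-nuclear military bomb strike"),
    (100, 300, "hurricanes, earthquakes, tsunamis, forest fires, most biological "
               "weapons, most power outages, suitcase-sized nuclear bomb"),
    (1_000, 3_000, "dirty bombs, major region-wide power outages, "
                   "possibly military nuclear attacks"),
]


def hazards_covered(separation_miles: float) -> list[str]:
    """Hazard groups whose whole band the given separation clears.

    Assumption: a band counts only once the separation reaches its upper bound.
    """
    return [hazards for _low, high, hazards in SEPARATION_BANDS
            if separation_miles >= high]


for miles in (2, 25, 150, 1_250):
    print(f"{miles:>5} miles:", hazards_covered(miles) or "no band fully cleared")
```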
“You have to be far enough away to be beyond the immediate threat you are planning for.” … “At the same time, you have to be close enough for it to be practical to get to the remote facility rapidly.”
“Disaster Recovery Sites: How Far Away is Far Enough?” By Drew Robb
Enterprise Storage Forum, September 30, 2005
http://www.enterprisestorageforum.com/continuity/features/article.php/3552971
“A Watertight Plan” By Penny Lunt Crosman, IT Architect, Sept. 1, 2005
http://www.itarchitect.com/showArticle.jhtml?articleID=169400810
“Survivors of hurricanes, floods, and the London terrorist bombings offer best practices and advice on disaster recovery planning.”
Source: “A Watertight Plan” By Penny Lunt Crosman, IT Architect, Sept. 1, 2005
Planning for DT: Site Separation Direction
• Select site separation direction:
− Not along same earthquake fault-line
− Not along likely storm tracks
− Not in same floodplain or downstream of same dam
− Not on the same coastline
− Not in line with prevailing winds (that might carry hazardous materials or radioactive fallout)
Long-Distance Disaster Tolerance Using OpenVMS Clusters
Background
Historical Context
Example: New York City, USA
• 1993 World Trade Center bombing raised awareness of DR and prompted some improvements
• Sept. 11, 2001 has had dramatic and far-reaching effects
− Scramble to find replacement office space
− Many datacenters moved off Manhattan Island, some out of NYC entirely
− Increased distances to DR sites
− Induced regulatory responses (in USA & abroad)
Trends and Driving Forces in the US
• BC, DR and DT in a post-9/11 world:
− Recognition of greater risk to datacenters
• Particularly in major metropolitan areas
− Push toward greater distances between redundant datacenters
• It is no longer inconceivable that, for example, terrorists might obtain a nuclear device and destroy the entire NYC metropolitan area
Trends and Driving Forces in the US
• “Draft Interagency White Paper on Sound Practices to Strengthen the Resilience of the U.S. Financial System”
− http://www.sec.gov/news/studies/34-47638.htm
• Agencies involved:
− Federal Reserve System
− Department of the Treasury
− Securities & Exchange Commission (SEC)
• Applies to:
− Financial institutions critical to the US economy
US Draft Interagency White Paper
The early “concept release” inviting input made mention of a 200-300 mile limit (only as part of an example when asking for feedback as to whether any minimum distance value should be specified or not):
“Sound practices. Have the agencies sufficiently described expectations regarding out-of-region back-up resources? Should some minimum distance from primary sites be specified for back-up facilities for core clearing and settlement organizations and firms that play significant roles in critical markets (e.g., 200-300 miles between primary and back-up sites)? What factors should be used to identify such a minimum distance?”
US Draft Interagency White Paper
This induced panic in several quarters:
• NYC feared additional economic damage of companies moving out
• Some pointed out the technology limitations of certain synchronous mirroring products, and of Fibre Channel at the time, which typically limited them to a distance of 100 miles or 100 km
Revised draft contained no specific distance numbers; just cautionary wording
Ironically, that same non-specific wording now often results in DR datacenters 1,000 to 1,500 miles away
US Draft Interagency White Paper
“Maintain sufficient geographically dispersed resources to meet recovery and resumption objectives.”
“Long-standing principles of business continuity planning suggest that back-up arrangements should be as far away from the primary site as necessary to avoid being subject to the same set of risks as the primary location.”
US Draft Interagency White Paper
“Organizations should establish back-up facilities a significant distance away from their primary sites.”
“The agencies expect that, as technology and business processes … continue to improve and become increasingly cost effective, firms will take advantage of these developments to increase the geographic diversification of their back-up sites.”
Ripple effect of Regulatory Activity Within the USA
• National Association of Securities Dealers (NASD):
− Rule 3510 & 3520
• New York Stock Exchange (NYSE):
− Rule 446
Ripple effect of Regulatory Activity Outside the USA
• United Kingdom: Financial Services Authority:
− Consultation Paper 142 – Operational Risk and Systems Control
• Europe:
− Basel II Accord
• Australian Prudential Regulation Authority:
− Prudential Standard for business continuity management APS 232 and guidance note AGN 232.1
• Monetary Authority of Singapore (MAS):
− “Guidelines on Risk Management Practices – Business Continuity Management” affecting “Significantly Important Institutions” (SIIs)
Resiliency Maturity Model project
• The Financial Services Technology Consortium (FSTC) has begun work on a Resiliency Maturity Model
− Taking inspiration from the Carnegie Mellon Software Engineering Institute’s Capability Maturity Model (CMM) and Networked Systems Survivability Program
− Intent is to develop industry standard metrics to evaluate an institution’s business continuity, disaster recovery, and crisis management capabilities
Long-distance Effects: Inter-site Latency
Long-distance Cluster Issues
• Latency due to speed of light becomes significant at higher distances. Rules of thumb:
− About 1 ms per 100 miles, one-way
− About 1 ms per 50 miles round-trip latency
• Actual circuit path length can be longer than highway mileage between sites
• Latency can adversely affect performance of
− Remote I/O operations
− Remote locking operations
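These rules of thumb reduce to simple arithmetic. The Python sketch below assumes the input is the actual circuit path length, not highway mileage, since the circuit can be longer. The name `estimated_latency_ms` is hypothetical.

```python
def estimated_latency_ms(circuit_miles: float) -> tuple[float, float]:
    """Rule-of-thumb inter-site latency from circuit path length.

    Uses the slide's figures: about 1 ms per 100 miles one way,
    which is about 1 ms per 50 miles round trip.
    Returns (one_way_ms, round_trip_ms).
    """
    one_way_ms = circuit_miles / 100.0
    round_trip_ms = 2 * one_way_ms
    return one_way_ms, round_trip_ms


one_way, round_trip = estimated_latency_ms(250)
print(f"250 circuit miles: ~{one_way} ms one way, ~{round_trip} ms round trip")
```

For example, a 250-mile circuit gives roughly 2.5 ms one way and 5 ms round trip, before any equipment delays are added.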
OpenVMS Lock Request Latencies (microseconds)
− Gigabit Ethernet, zero distance: 200
− Fast Ethernet, zero distance: 240
− ATM, 30 miles: 400
− DS-3, 250 miles: 4,400
− OC-3, 1,400 miles: 23,000
Inter-site Latency: Actual Customer Measurements

| Highway mileage | Link | Latency (ms) | Est. circuit path length |
|---|---|---|---|
| 5 miles | ATM OC-3 | 0.5 | 30 miles |
| 35 miles | | 1.5 | 95 miles |
| 25 to 35 miles | IP DLSW link | 3 to 4 | 190-250 miles (effective) |
| 130 miles | DS-3 | 4.4 | 275 miles |
| “Over 150” miles | | 5.5 | 350 miles |
| 1,250 miles | DS-3 | 30 | 1,875 miles |
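The estimated circuit path lengths in these measurements appear to follow one conversion: each millisecond of measured latency corresponds to about 62.5 miles of path. For example, 30 ms maps to 1,875 miles and 4.4 ms to 275 miles. That is consistent with two assumptions: the measurement is round-trip time, and light in optical fiber covers roughly 125 miles (about 200 km) per millisecond each way. The sketch below reproduces the estimates using that inferred factor. It is a reading of the data, not a formula stated in the source, and `estimated_path_miles` is a hypothetical name.

```python
# Miles of circuit per millisecond of ROUND-TRIP latency (inferred from the
# measurements: light in fiber covers ~125 miles per ms one way, so ~62.5
# miles of path per ms of round trip).
MILES_PER_RTT_MS = 62.5


def estimated_path_miles(round_trip_ms: float) -> float:
    """Estimate circuit path length from a measured round-trip latency."""
    return round_trip_ms * MILES_PER_RTT_MS


measurements = [("5 miles", 0.5), ("35 miles", 1.5), ("130 miles", 4.4),
                ("Over 150 miles", 5.5), ("1,250 miles", 30.0)]
for highway, rtt_ms in measurements:
    print(f"{highway:>15}: {rtt_ms:4.1f} ms -> about "
          f"{estimated_path_miles(rtt_ms):,.0f} circuit miles")
```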
Differentiate between latency and bandwidth
• Can’t get around the speed of light and its latency effects over long distances
− Higher-bandwidth link doesn’t mean lower latency
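A quick calculation shows why more bandwidth does not reduce latency. The time for a write to reach the far site is the propagation delay plus the serialization time, and only the second term shrinks as link speed rises. The Python sketch below uses a simplified model that ignores protocol round trips, queuing, and equipment delay. The 8 KB write and the 1,000-mile distance (about 10 ms one way by the 1 ms per 100 miles rule of thumb) are hypothetical values.

```python
def transfer_time_ms(size_bytes: int, bandwidth_bits_per_s: float,
                     one_way_latency_ms: float) -> float:
    """Time until the last byte arrives: propagation plus serialization."""
    serialization_ms = size_bytes * 8 / bandwidth_bits_per_s * 1000.0
    return one_way_latency_ms + serialization_ms


# Hypothetical 8 KB write sent ~1,000 miles (~10 ms one way) over two link speeds.
for label, bps in (("100 Mb/s", 100e6), ("10 Gb/s", 10e9)):
    print(f"{label}: {transfer_time_ms(8192, bps, 10.0):.3f} ms")
```

With these numbers, a 100-fold bandwidth increase only cuts the delivery time from about 10.66 ms to about 10.01 ms. The speed-of-light term dominates.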
Long-distance Techniques: SAN Extension
SAN Extension
• Fibre Channel distance over fiber is limited to about 100 kilometers
− Shortage of buffer-to-buffer credits adversely affects Fibre Channel performance above about 50 kilometers
• Various vendors provide “SAN Extension” boxes to connect Fibre Channel SANs over an inter-site link
• See SAN Design Reference Guide Vol. 4 “SAN extension and bridging”:
− http://h20000.www2.hp.com/bc/docs/support/SupportManual/c00310437/c00310437.pdf
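The buffer-to-buffer credit limit follows from keeping the link full. A sender needs enough credits to cover every full-size frame that can be in flight during one round trip. The Python sketch below estimates that number under these assumptions:
• Full 2,148-byte frames
• 8b/10b encoding (10 bits per byte on the wire, as used by 1, 2, and 4 Gb/s Fibre Channel)
• About 5 µs per kilometer of fiber each way
• No switch processing delay

It is a rough sizing aid, not a vendor formula.

```python
import math

FIBER_US_PER_KM = 5.0        # approx. one-way propagation delay in optical fiber
FULL_FRAME_BYTES = 2148      # assumed max FC frame incl. header, CRC, SOF/EOF
BITS_PER_BYTE_ON_WIRE = 10   # 8b/10b encoding (1, 2, 4 Gb/s Fibre Channel)


def bb_credits_needed(distance_km: float, line_rate_gbaud: float) -> int:
    """Credits needed so the sender never stalls waiting for credit returns."""
    bits_per_us = line_rate_gbaud * 1e3          # 1 Gbaud = 1,000 bits per µs
    frame_time_us = FULL_FRAME_BYTES * BITS_PER_BYTE_ON_WIRE / bits_per_us
    round_trip_us = 2 * distance_km * FIBER_US_PER_KM
    return math.ceil(round_trip_us / frame_time_us)


for km in (10, 50, 100):
    print(f"{km:>3} km: 1 Gb/s needs {bb_credits_needed(km, 1.0625)}, "
          f"2 Gb/s needs {bb_credits_needed(km, 2.125)} credits")
```

At 1 Gb/s (1.0625 Gbaud) this gives about 25 credits for 50 km and about 50 for 100 km. A port with fewer credits must pause for credit returns, which causes the performance loss described above.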
Long-distance Data Replication
Disk Data Replication
• Data mirroring schemes
− Synchronous
• Slower, but no chance of data loss in conjunction with a site loss
− Asynchronous
• Faster, and works for longer distances, but can lose seconds’ or minutes’ worth of data (more under high loads) in a site disaster
Continuous Access Synchronous Replication
Diagram: two sites, each with a node, an FC switch, and an EVA. A mirrorset spans both EVAs, and one controller is in charge of the mirrorset. The slide builds up in steps:
• The node sends a write to the controller in charge of the mirrorset
• That controller sends the write to the EVA at the other site
• The remote EVA returns success status
• The controller returns success status to the node
• The application continues
Continuous Access Asynchronous Replication
Diagram: two sites, each with a node, an FC switch, and an EVA. A mirrorset spans both EVAs, and one controller is in charge of the mirrorset. The slide builds up in steps:
• The node sends a write to the controller in charge of the mirrorset
• The controller immediately returns success status to the node
• The application continues
• Afterward, the controller sends the write to the EVA at the other site
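The difference between these two write sequences can be captured in a toy model. In this Python sketch, a synchronous write is applied at both sites before success is returned. An asynchronous write returns success immediately and queues the remote update. Whatever is still queued when the primary site is lost is the data loss described earlier for asynchronous mirroring. This is a simplification, not how Continuous Access is implemented, and `MirroredVolume` and its methods are hypothetical names.

```python
from collections import deque


class MirroredVolume:
    """Toy two-site mirrorset (not the Continuous Access implementation)."""

    def __init__(self, synchronous: bool) -> None:
        self.synchronous = synchronous
        self.local: list[str] = []          # blocks written at the primary site
        self.remote: list[str] = []         # blocks safely stored at the remote site
        self.pending: deque[str] = deque()  # async writes not yet sent remotely

    def write(self, block: str) -> str:
        self.local.append(block)
        if self.synchronous:
            # Remote copy is written (and acknowledged) before success is returned.
            self.remote.append(block)
        else:
            # Success is returned now; the remote write is deferred.
            self.pending.append(block)
        return "success"

    def drain(self, n: int = 1) -> None:
        """Ship up to n deferred writes to the remote site, oldest first."""
        for _ in range(min(n, len(self.pending))):
            self.remote.append(self.pending.popleft())

    def lose_primary_site(self) -> list[str]:
        """Simulate a primary-site disaster; return what survives remotely."""
        self.pending.clear()  # queued writes are lost with the primary site
        return list(self.remote)


sync_vol = MirroredVolume(synchronous=True)
async_vol = MirroredVolume(synchronous=False)
for i in range(5):
    sync_vol.write(f"blk{i}")
    async_vol.write(f"blk{i}")
async_vol.drain(3)  # only three writes reached the remote site in time
print(len(sync_vol.lose_primary_site()), len(async_vol.lose_primary_site()))  # 5 3
```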
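Synchronous versus Asynchronous Replication and Link Bandwidth
Chart: application write bandwidth (MB/sec) over one day, from midnight through 8 am, 12 noon, and 5 pm back to midnight, compared against three link bandwidth levels:
− Synchronous – RPO = 0
− Asynchronous – RPO 2 hrs. max
− Asynchronous – RPO many hrs.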
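The trade-off in this chart can be explored with a simple fluid model: when an asynchronous link is sized below peak write bandwidth, writes above the link rate accumulate as a backlog that drains when load falls. This Python sketch makes three assumptions:
• Hourly average write rates
• A link that always runs at its rated speed
• Worst-case data exposure approximated as the time needed to ship the largest backlog

The hourly profile and the function names are hypothetical.

```python
def replication_backlog(write_mb_s: list[float], link_mb_s: float,
                        interval_s: float = 3600.0) -> list[float]:
    """Not-yet-replicated data (MB) at the end of each interval.

    Fluid model: the link ships up to link_mb_s; excess writes queue and are
    drained whenever the write rate falls below the link rate.
    """
    backlog = 0.0
    history = []
    for rate in write_mb_s:
        backlog = max(0.0, backlog + (rate - link_mb_s) * interval_s)
        history.append(backlog)
    return history


def worst_case_lag_hours(write_mb_s: list[float], link_mb_s: float,
                         interval_s: float = 3600.0) -> float:
    """Hours needed to ship the largest backlog at full link speed."""
    peak_mb = max(replication_backlog(write_mb_s, link_mb_s, interval_s), default=0.0)
    return peak_mb / link_mb_s / 3600.0


# Hypothetical hourly profile (MB/s), midnight to midnight:
# quiet overnight, busy 8 am to 5 pm, moderate evening.
profile = [2] * 8 + [10] * 9 + [3] * 7
for link in (10, 8, 6):
    print(f"link {link} MB/s: worst-case lag about "
          f"{worst_case_lag_hours(profile, link):.2f} h")
```

With that profile, a 10 MB/s link never falls behind and could support synchronous replication. An 8 MB/s link reaches about 2.25 hours of lag. A 6 MB/s link reaches about 6 hours, with backlog still pending at midnight.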
Data Replication and Long Distances
• Some vendors claim synchronous mirroring is impossible at a distance over 100 kilometers, 100 miles, or 200 miles, because their product cannot support synchronous mirroring over greater distances
• OpenVMS Volume Shadowing does synchronous mirroring
− Acceptable application performance is the only limit found so far on inter-site distance for HBVS
Long-distance Synchronous Host-based Mirroring Software Tests
• OpenVMS Host-Based Volume Shadowing (HBVS) software (host-based mirroring software)
• SAN Extension used to extend SAN using FCIP boxes
• AdTech box used to simulate distance via introduced packet latency
• No OpenVMS Cluster involved across this distance (no OpenVMS node at the remote end; just “data vaulting” to a “distant” disk controller)
Long-distance HBVS Test Results

| Delay, 1-way (milliseconds) | Throughput (bytes/second) | Distance (kilometers) | Distance (miles) |
|---|---|---|---|
| 0 ms | 11 megabytes | 0 km | 0 miles |
| 10 ms | 226 kilobytes | 2,000 km | 1,250 miles |
| 50 ms | 45 kilobytes | 10,000 km | 6,250 miles |
| 100 ms | 24 kilobytes | 20,000 km | 12,500 miles |
| 200 ms | 15 kilobytes | 40,000 km | 25,000 miles |
| 300 ms | 9 kilobytes | 60,000 km | 37,500 miles |
| 400 ms | 8 kilobytes | 80,000 km | 50,000 miles |
| 485 ms | 6.5 kilobytes | 97,000 km | 60,625 miles |
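Two patterns in these test results can be checked with arithmetic. First, the distance columns follow from about 200 km of fiber per millisecond of one-way delay and 1.6 km per mile. Second, multiplying throughput by round-trip time (taken as twice the listed one-way delay) gives roughly 4.5 to 6.4 kilobytes moved per round trip across the whole range, while throughput itself falls more than 30-fold. That suggests the tests were bound by latency rather than link bandwidth, though this is an interpretation, not a conclusion stated in the source. The function names are hypothetical.

```python
KM_PER_MS_ONE_WAY = 200.0  # light in fiber: roughly 200 km per millisecond
KM_PER_MILE = 1.6          # conversion factor the table appears to use

# (one-way delay in ms, measured throughput in kilobytes/second) from the table
RESULTS = [(10, 226), (50, 45), (100, 24), (200, 15), (300, 9), (400, 8), (485, 6.5)]


def delay_to_distance(delay_ms: float) -> tuple[float, float]:
    """Simulated distance (km, miles) for a given one-way delay."""
    km = delay_ms * KM_PER_MS_ONE_WAY
    return km, km / KM_PER_MILE


def kb_per_round_trip(delay_ms: float, kb_per_s: float) -> float:
    """Data moved per round trip, taking round trip = 2 x one-way delay."""
    return kb_per_s * (2 * delay_ms / 1000.0)


for delay_ms, kb_per_s in RESULTS:
    km, miles = delay_to_distance(delay_ms)
    print(f"{delay_ms:>3} ms  {km:>7,.0f} km  {miles:>7,.0f} mi  "
          f"~{kb_per_round_trip(delay_ms, kb_per_s):.1f} KB per round trip")
```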
Mitigating the Effects of Long Inter-site Distances
Minimizing Round Trips Between Sites
• Some vendors have Fibre Channel SCSI-3 protocol tricks to do writes in 1 round trip vs. 2
− e.g. Brocade’s “FastWrite” or Cisco’s “Write Acceleration”
• Application design can also affect number of round-trips required between sites
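A Fibre Channel SCSI write normally costs two round trips. The command goes out and a transfer-ready reply comes back; then the data goes out and the status comes back. Write-acceleration features answer the transfer-ready locally, so only one round trip crosses the inter-site link. The Python sketch below shows the effect using the 1 ms per 50 miles round-trip rule of thumb. The 500-mile distance and the function name are hypothetical, and the model ignores equipment and disk service time.

```python
def remote_write_ms(circuit_miles: float, round_trips: int) -> float:
    """Inter-site latency per write: ~1 ms per 50 miles, per round trip."""
    return circuit_miles / 50.0 * round_trips


for trips, label in ((2, "standard SCSI write"), (1, "write acceleration")):
    ms = remote_write_ms(500, trips)
    print(f"{label}: ~{ms:.0f} ms per write, "
          f"~{1000 / ms:.0f} writes/s for one outstanding write at a time")
```

At 500 miles this is roughly 20 ms versus 10 ms per write. A stream that issues one write at a time therefore goes from about 50 to about 100 writes per second.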
Mitigating Impact of Inter-Site Latency
How applications are distributed across a multi-site OpenVMS cluster can affect performance
This represents a trade-off among performance, availability, and resource utilization
Application Scheme 1: Hot Primary/Cold Standby
• All applications normally run at the primary site
− Second site is idle, except for data replication work, until primary site fails, then it takes over processing
• Performance will be good (all-local locking)
• Fail-over time will be poor, and risk high (standby systems not active and thus not being tested)
• Wastes computing capacity at the remote site
Application Scheme 2: Hot/Hot but Alternate Workloads
• All applications normally run at one site or the other, but not both; data is mirrored between sites, and the opposite site takes over upon a failure
• Performance will be good (all-local locking)
• Fail-over time will be poor, and risk moderate (standby systems in use, but specific applications not active and thus not being tested from that site)
• Second site’s computing capacity is actively used
Application Scheme 3: Uniform Workload Across Sites
• All applications normally run at both sites simultaneously. (This would be considered the “norm” for most OpenVMS clusters.)
• Surviving site takes all load upon failure
• Performance may be impacted (some remote locking) if inter-site distance is large
• “Fail-over” time will be excellent, and risk low (all systems are already in use running the same applications, thus constantly being tested)
• Both sites’ computing capacity is actively used
Work-arounds being used today
• Multi-hop replication
− Synchronous to nearby site
− Asynchronous to far-away site
• Transaction-based replication
− e.g. replicate transaction (a few hundred bytes) with Reliable Transaction Router instead of having to replicate all the database page updates (often 8 kilobytes or 64 kilobytes per page) and journal log file writes behind a database
Data Replication over Long Distances: Multi-Hop Replication
• It may be desirable to synchronously replicate data to a nearby “short-haul” site, and asynchronously replicate from there to a more-distant site
− This is sometimes called “cascaded” data replication
Diagram: Primary → (synchronous, short-haul, 100 miles) → Secondary → (asynchronous, long-haul, 1,000 miles) → Tertiary
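The benefit of the cascaded layout is that applications wait only on the synchronous short-haul leg. This Python sketch compares the per-write penalty for the distances in the diagram, assuming the 1 ms per 50 miles round-trip rule of thumb and one round trip per write. It also lists which copy is most current after different site losses. The function names are hypothetical.

```python
def write_penalty_ms(sync_leg_miles: float, round_trips: int = 1) -> float:
    """Latency added to each application write by a synchronous leg."""
    return sync_leg_miles / 50.0 * round_trips  # ~1 ms per 50 miles, round trip


def surviving_copy(primary_lost: bool, secondary_lost: bool) -> str:
    """Most current copy left after a disaster in the cascaded layout."""
    if not primary_lost:
        return "primary (current)"
    if not secondary_lost:
        return "secondary (current: synchronous copy)"
    return "tertiary (may lag by the asynchronous backlog)"


print(write_penalty_ms(100), write_penalty_ms(1_000))  # 2.0 20.0
print(surviving_copy(primary_lost=True, secondary_lost=False))
```

With these figures, cascading adds about 2 ms per write, instead of about 20 ms if the 1,000-mile leg were mirrored synchronously.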
Testing & Simulation of Long Distances
Testing / Simulation
• Before incurring the risk and expense of site selection, datacenter construction, and inter-site link procurement:
− Test within a single-datacenter test environment, with distance simulated by introducing packet latency, and bandwidth simulated by throttling traffic flow
• Techniques for simulating distance with latency:
− Hardware Network Emulators
− Software Network Emulators
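What a network emulator does to each packet can be modeled in a few lines: throttle the packet to a configured bandwidth, then hold it for a fixed delay that stands in for distance. This Python sketch is a generic model, not the internals of any product named here. It assumes FIFO delivery and no packet loss, and `emulate` is a hypothetical name. A delay for a given distance could be chosen with the 1 ms per 100 miles one-way rule of thumb.

```python
def emulate(packets: list[tuple[float, int]], delay_s: float,
            bandwidth_bps: float) -> list[float]:
    """Delivery time of each (arrival_time_s, size_bytes) packet, FIFO, no loss."""
    link_free_at = 0.0
    deliveries = []
    for arrival_s, size_bytes in packets:
        start = max(arrival_s, link_free_at)                    # queue if link is busy
        link_free_at = start + size_bytes * 8 / bandwidth_bps  # throttled serialization
        deliveries.append(link_free_at + delay_s)              # simulated distance delay
    return deliveries


# Hypothetical: ~500 miles (about 5 ms one way) over a 100 Mb/s throttled link.
burst = [(0.0, 1500)] * 3
print(emulate(burst, delay_s=0.005, bandwidth_bps=100e6))
```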
Hardware Network Emulators
• A couple of vendors / products:
− Shunra STORM Network Emulator
− Spirent AdTech
Software Network Emulators
• A couple of examples:
− NIST Net from the National Institute of Standards and Technology
• http://snad.ncsl.nist.gov/nistnet/
− D4 (Dick’s Dynamic Delay Device) in OpenVMS
D4
• Capability added to OpenVMS Gigabit Ethernet LAN drivers
• Packets can be:
− Delayed
− Lost
• Bandwidth can be throttled/limited
D4
• Controlled by LAN SDA Extension:
− SDA> LAN DELAY PARAM /qualifiers
− SDA> LAN DELAY STATUS /qualifiers
D4
• D4 acts on LAN packets passing between a pair of Gigabit Ethernet NICs: each packet received on one NIC is retransmitted on the other
• One non-Primary CPU recommended per pair of NICs
−Use Fast_Path to move the interrupts for both NICs off the Primary CPU onto the same non-Primary CPU
• So a quad-CPU OpenVMS system with 6 Gigabit Ethernet NICs can handle 3 LAN traffic streams (3 non-Primary CPUs, each serving one of the 3 NIC pairs)
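The sizing rule above reduces to simple arithmetic: the Primary CPU is excluded, each remaining CPU serves one NIC pair, and each pair carries one stream. A minimal Python sketch, assuming exactly that one-pair-per-non-Primary-CPU rule (the function name `d4_stream_capacity` is hypothetical, not part of OpenVMS):

```python
def d4_stream_capacity(cpu_count: int, nic_count: int) -> int:
    """Estimate how many D4 traffic streams a system can host.

    Assumes the rule of thumb from the slide: one non-Primary CPU per
    pair of Gigabit Ethernet NICs, with the Primary CPU left out of D4 work.
    """
    if cpu_count < 1 or nic_count < 0:
        raise ValueError("need at least one CPU and a non-negative NIC count")
    non_primary_cpus = cpu_count - 1   # the Primary CPU is excluded
    nic_pairs = nic_count // 2         # each stream needs two NICs
    return min(non_primary_cpus, nic_pairs)


# The quad-CPU, 6-NIC example: 3 non-Primary CPUs and 3 NIC pairs
print(d4_stream_capacity(4, 6))  # 3
```

Whichever resource runs out first, spare CPUs or NIC pairs, sets the limit: adding two more NICs to the same quad-CPU system would still give 3 streams.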
D4
• OpenVMS 8.3 or later, plus a LAN patch kit:
−8.3 on Alpha: VMS83A_LAN-V0300 (or later)
−8.3 on Integrity: VMS83I_LAN-V0700 (or later)
−8.3-1H1: VMS831I_LAN-V0100 (or later)
• Functionality is contained in the _MON driver images. Either set the SYSTEM_CHECK system parameter to 1, or:
−Copy SYS$LOADABLE_IMAGES:SYS$EI1000_MON.EXE to SYS$LOADABLE_IMAGES:SYS$EI1000.EXE
−Copy SYS$LOADABLE_IMAGES:SYS$EW5700_MON.EXE to SYS$LOADABLE_IMAGES:SYS$EW5700.EXE
Example D4_SETUP.COM

```
$ !
$ ! Configure RX4640 system for LAN Delay Function using EIC/EID, EIE/EIF, EWA/EWB
$ !
$ set noon
$ !
$ ! Keep all other devices (Fibre Channel and non-D4 LAN ports) on CPU 0, the Primary CPU
$ !
$ set dev fga0/pref=0
$ set dev fgb0/pref=0
$ set dev eia/pref=0
$ set dev eib/pref=0
$ set dev eig/pref=0
$ set dev eih/pref=0
$ set dev ewc/pref=0
$ !
$ ! D4 pair on CPU 1: the AB465A Broadcom ports (Ruchba combo)
$ !
$ set dev ewa/pref=1
$ set dev ewb/pref=1
$ !
$ ! D4 pair on CPU 2: the A7012A Intel ports
$ !
$ set dev eic/pref=2
$ set dev eid/pref=2
$ !
$ ! D4 pair on CPU 3: the AB545A Intel ports (quad card)
$ !
$ set dev eie/pref=3
$ set dev eif/pref=3
$ !
$ ! Turn off LAN driver tracing on all devices
$ !
$ mc lancp set dev/notrace/all
$ !
$ ! Turn on LAN driver tracing on interesting devices, excluding fork begin/end entries
$ ! (commented out here; remove the leading "! " on a line to enable it)
$ !
$ ! mc lancp set dev/trace=(mask=(%xFFFFFFF3,-1),size=2048) ewa
$ ! mc lancp set dev/trace=(mask=(%xFFFFFFF3,-1),size=2048) ewb
$ ! mc lancp set dev/trace=(mask=(%xFFFFFFF3,-1),size=2048) eic
$ ! mc lancp set dev/trace=(mask=(%xFFFFFFF3,-1),size=2048) eid
$ ! mc lancp set dev/trace=(mask=(%xFFFFFFF3,-1),size=2048) eie
$ ! mc lancp set dev/trace=(mask=(%xFFFFFFF3,-1),size=2048) eif
```
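Before running a procedure like D4_SETUP.COM, it can help to sanity-check the CPU plan against the rules stated earlier: both devices of a pair on the same CPU, that CPU not the Primary, and no CPU shared by two pairs. A minimal Python sketch using the assignments from the procedure; it assumes CPU 0 is the Primary CPU (as the procedure implies by parking all other devices there), and `check_d4_affinity` is a hypothetical helper, not an OpenVMS tool:

```python
# Preferred-CPU assignments copied from D4_SETUP.COM
PREFERRED_CPU = {
    "FGA0": 0, "FGB0": 0, "EIA": 0, "EIB": 0,
    "EIG": 0, "EIH": 0, "EWC": 0,
    "EWA": 1, "EWB": 1,
    "EIC": 2, "EID": 2,
    "EIE": 3, "EIF": 3,
}

# The D4 device pairs named in the procedure's header comment
D4_PAIRS = [("EWA", "EWB"), ("EIC", "EID"), ("EIE", "EIF")]

PRIMARY_CPU = 0  # assumption: CPU 0 is the Primary CPU on this system


def check_d4_affinity(pairs, preferred, primary=PRIMARY_CPU):
    """Return a list of problems with a D4 CPU-affinity plan (empty if OK)."""
    problems = []
    cpu_owner = {}  # CPU number -> the pair already using it
    for dev1, dev2 in pairs:
        cpu1, cpu2 = preferred.get(dev1), preferred.get(dev2)
        if cpu1 is None or cpu2 is None:
            problems.append(f"{dev1}/{dev2}: no preferred CPU set")
            continue
        if cpu1 != cpu2:
            problems.append(f"{dev1}/{dev2}: devices on different CPUs ({cpu1}, {cpu2})")
            continue
        if cpu1 == primary:
            problems.append(f"{dev1}/{dev2}: assigned to the Primary CPU")
        if cpu1 in cpu_owner:
            problems.append(f"{dev1}/{dev2}: shares CPU {cpu1} with {cpu_owner[cpu1]}")
        else:
            cpu_owner[cpu1] = f"{dev1}/{dev2}"
    return problems


print(check_d4_affinity(D4_PAIRS, PREFERRED_CPU))  # [] means no problems found
```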
SDA> LAN commands
• SDA> LAN DELAY PARAM /DEVICE=(device1,device2) /AGE=value /BANDWIDTH=value /BUFFER=value /DELAY=value /LOSS=value /TLOSS=value
SDA> LAN commands
−/DEVICE=(device1,device2) specifies the two LAN devices to use. They must both be assigned to the same secondary CPU.
−/DELAY=value specifies the amount of delay in microseconds to be imposed on each received packet before it is transmitted on the other device. Zero is the default.
−/BANDWIDTH=value specifies the maximum bandwidth allowed in megabits per second. Zero (default) means there is no bandwidth limit.
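To emulate a given inter-site distance, a /DELAY value can be estimated from the propagation speed of light in optical fiber, roughly 200,000 km/s or about 5 microseconds per kilometer. That figure is an approximation: real circuits follow longer paths than the straight-line distance and add equipment latency. Because D4 delays each packet as it crosses from one device to the other, a round trip should see roughly twice the configured value. A minimal sketch that builds the PARAM command using only the qualifiers documented above; the helper names and the 5 µs/km constant are assumptions:

```python
# Approximate one-way propagation delay of light in optical fiber.
# Assumption: ~200,000 km/s, i.e. about 5 microseconds per kilometer.
FIBER_USEC_PER_KM = 5.0


def delay_usec_for_distance(km: float) -> int:
    """One-way fiber delay in microseconds for a circuit of `km` kilometers."""
    if km < 0:
        raise ValueError("distance must be non-negative")
    return round(km * FIBER_USEC_PER_KM)


def lan_delay_param(dev1: str, dev2: str, delay_usec: int, bandwidth_mbit: int = 0) -> str:
    """Build an SDA LAN DELAY PARAM command line (0 bandwidth means no limit)."""
    return (f"LAN DELAY PARAM /DEVICE=({dev1},{dev2}) "
            f"/DELAY={delay_usec} /BANDWIDTH={bandwidth_mbit}")


# Emulate a 100 km inter-site link between EIC and EID
print(lan_delay_param("EIC", "EID", delay_usec_for_distance(100)))
# LAN DELAY PARAM /DEVICE=(EIC,EID) /DELAY=500 /BANDWIDTH=0
```

By the same estimate, the 5000-microsecond delay used in the LAN DELAY STATUS example corresponds to roughly 1,000 km of fiber.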
SDA> LAN commands
−/AGE=value specifies the packet age limit to be imposed, in microseconds. Packets older than this age are discarded. Zero (default) means there is no age limit.
−/BUFFER=value specifies the maximum amount of data in bytes to be buffered. Incoming packets that would cause this limit to be exceeded are discarded. Zero (default) means there is no buffering limit.
−/LOSS=value specifies the packet loss rate to be imposed, as the number of packets to be discarded each second. Zero (default) is no intentional packet loss.
−/TLOSS=value specifies the total number of packets to be discarded. Zero (default) means there is no limit to the number of packets that will be discarded.
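The qualifiers interact. With delay and bandwidth both set, the data held inside the emulator is presumably close to the bandwidth-delay product once the link is busy, so a /BUFFER limit well below that figure would start discarding packets. Likewise, because /LOSS is a rate in packets per second, the share of traffic it removes depends on how many packets per second are flowing. A minimal sketch under those assumptions (function names hypothetical):

```python
def bdp_bytes(bandwidth_mbit: int, delay_usec: int) -> int:
    """Bandwidth-delay product in bytes.

    1 Mbit/s sustained for 1 microsecond is exactly 1 bit, so
    Mbit/s x usec gives bits directly; divide by 8 (rounding up) for bytes.
    """
    bits = bandwidth_mbit * delay_usec
    return (bits + 7) // 8


def loss_fraction(loss_per_sec: float, packets_per_sec: float) -> float:
    """Share of packets discarded when /LOSS drops `loss_per_sec` packets each second."""
    if packets_per_sec <= 0:
        return 0.0
    return min(loss_per_sec / packets_per_sec, 1.0)


# 50 Mbit/s with 5000 usec of delay keeps about 31,250 bytes in flight
print(bdp_bytes(50, 5000))               # 31250
# /LOSS=2 against a stream of ~180 packets/sec drops about 1.1% of packets
print(f"{loss_fraction(2, 180):.1%}")    # 1.1%
```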
SDA> LAN commands
• SDA> LAN DELAY STATUS /DEVICE=(device1,device2) /CONTINUOUS=value /HISTOGRAM /RESET
SDA> LAN commands
−/DEVICE=(device1,device2) specifies the two LAN devices to use. They must both be assigned to the same secondary CPU. If no devices are specified, status will be displayed for all device pairs.
−/CONTINUOUS=value specifies that the status display is to be repeated every value seconds. The default is no repetitions.
SDA> LAN commands
−/HISTOGRAM specifies that histogram data should be displayed, which includes:
• Delay Variance (not a true statistical variance) – the difference between the time a transmit was expected to be issued and the time it actually was. For example, if the specified delay was 50 microseconds and a packet was transmitted 55 microseconds after it was received, the bucket for 5 microseconds is incremented. This gives you an idea of how accurate the delay function is. There are 64 buckets of 1024 CPU cycles each, so on a 1000 MHz processor each bucket is 1.024 microseconds wide. Note that this does not capture any additional delay incurred after the transmit is issued, for example when the transmit queue on the device backs up because of load or flow control.
• Packets Outstanding – the number of packets outstanding to the other device for transmit. There are 16 buckets of 64 packets each, so the first bucket is for 0-63 packets outstanding, and so on.
• Bytes Outstanding – the number of bytes outstanding to the other device for transmit. There are 16 buckets of 64 KB each, so the first bucket is for 0-65535 bytes, and so on.
• Packet Length – the length of each received packet, counted in 16 buckets whose ranges are given in the display (64..127, 128..191, and so on).
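The bucket arithmetic in the /HISTOGRAM description can be checked with a few lines of Python. This sketch assumes the bucket sizes stated above and treats the last Packets Outstanding and Bytes Outstanding buckets as open-ended, as the "960+" and "960k+" labels in the display suggest; the function names are hypothetical. For the 1.30 GHz rx4640 used in the examples, it reproduces the "0..49+ usec" range shown in the Delay Variance heading.

```python
CYCLES_PER_VARIANCE_BUCKET = 1024  # from the /HISTOGRAM description
VARIANCE_BUCKETS = 64
COUNT_BUCKETS = 16                 # Packets/Bytes Outstanding histograms


def variance_bucket_usec(cpu_mhz: float) -> float:
    """Width of one Delay Variance bucket in microseconds (MHz = cycles per usec)."""
    return CYCLES_PER_VARIANCE_BUCKET / cpu_mhz


def variance_bucket_index(variance_usec: float, cpu_mhz: float) -> int:
    """0-based Delay Variance bucket for a given lateness, clamped to the last bucket."""
    cycles = variance_usec * cpu_mhz
    return min(int(cycles // CYCLES_PER_VARIANCE_BUCKET), VARIANCE_BUCKETS - 1)


def packets_outstanding_bucket(packets: int) -> int:
    """64 packets per bucket; the last bucket collects everything above."""
    return min(packets // 64, COUNT_BUCKETS - 1)


def bytes_outstanding_bucket(nbytes: int) -> int:
    """64 KB (65536 bytes) per bucket; the last bucket collects everything above."""
    return min(nbytes // 65536, COUNT_BUCKETS - 1)


print(variance_bucket_usec(1000))              # 1.024 (1000 MHz CPU)
print(round(variance_bucket_usec(1300), 3))    # 0.788 (1.30 GHz rx4640)
print(int((VARIANCE_BUCKETS - 1) * variance_bucket_usec(1300)))  # 49: last bucket starts near 49 usec
```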
SDA> LAN commands
−/RESET – clears the counters before the display (you can also use LAN DELAY PARAM /DEVICE=(device1,device2) to clear the counters).
LAN DELAY STATUS Example

```
WAN$SDA(X-1) Extension on VLAN4 (HP rx4640 (1.30GHz/3.0MB)) at 9-JUL-2006 13:02:10.96
---------------------------------------------------------------------------------------
Device 1: EIC (Active) Device 2: EID (Active) CPU affinity: 2
Delay (usec): 5000 Max packet age (usecs): 0 Loss rate (pk/sec): 0
Bandwidth (mbits/sec): 50 Max buffering (bytes): 0 Total loss (pks): 0
EIC Xmt (pk) 1668495 (by) 13668246768 (mpk) 8 (mby) 1264 Lost (age) 0
EIC Rcv (pk) 1668228 (by) 13666059504 (mpk) 8 (mby) 1264 Lost (buffering) 0
EIC MBits/sec (128 pk) Xmt 0.00 Rcv 0.00 X+R 0.00 Lost (intentional) 0
EIC MBits/sec (512 pk) Xmt 0.00 Rcv 0.00 X+R 0.01 Lost (pool) 0
EIC MBits/sec (4096 pk) Xmt 0.04 Rcv 0.04 X+R 0.08 Current xmt (pk) 0/8
EIC MBits/sec (All pk) Xmt 11.91 Rcv 11.91 X+R 23.83 Current xmt (by) 0/57344
EIC Failures: Link 1 Xmt 0 Rcv 0 Elapsed time (sec) 9178
EID Xmt (pk) 1668228 (by) 13666059504 (mpk) 8 (mby) 1264 Lost (age) 0
EID Rcv (pk) 1668594 (by) 13669057776 (mpk) 8 (mby) 1264 Lost (buffering) 0
EID MBits/sec (128 pk) Xmt 0.00 Rcv 0.00 X+R 0.00 Lost (intentional) 0
EID MBits/sec (512 pk) Xmt 0.00 Rcv 0.00 X+R 0.01 Lost (pool) 0
EID MBits/sec (4096 pk) Xmt 0.04 Rcv 0.04 X+R 0.08 Current xmt (pk) 100/483
EID MBits/sec (All pk) Xmt 11.91 Rcv 11.91 X+R 23.83 Current xmt (by) 819200/3956736
EID Failures: Link 1 Xmt 0 Rcv 0 Elapsed time (sec) 9178
SDA>
```
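The "(All pk)" rate in this display appears to be the long-run average of the byte counter over the elapsed time, in decimal megabits per second. A quick check in Python using the EIC transmit counters; this reading of the columns is an inference from the numbers, not documented behavior:

```python
# Counters copied from the EIC lines of the LAN DELAY STATUS example
eic_xmt_packets = 1_668_495
eic_xmt_bytes = 13_668_246_768
elapsed_sec = 9178


def average_mbit_per_sec(nbytes: int, seconds: float) -> float:
    """Average throughput in decimal megabits per second (1 Mbit = 10**6 bits)."""
    return nbytes * 8 / seconds / 1_000_000


def average_packet_bytes(nbytes: int, packets: int) -> float:
    """Mean packet size over the whole measurement interval."""
    return nbytes / packets


print(f"{average_mbit_per_sec(eic_xmt_bytes, elapsed_sec):.2f} Mbit/s")            # 11.91 Mbit/s
print(f"{average_packet_bytes(eic_xmt_bytes, eic_xmt_packets):.0f} bytes/packet")  # 8192 bytes/packet
```

The computed 11.91 Mbit/s matches the displayed Xmt rate. The roughly 8192-byte average, together with the EID "Current xmt" figures (819200 bytes for 100 packets), suggests the test traffic consisted of 8 KB packets.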
LAN DELAY STATUS/HISTOGRAM Example
```
WAN$SDA(X-1) Extension on VLAN4 (HP rx4640 (1.30GHz/3.0MB)) at 27-AUG-2006 13:32:33.17
---------------------------------------------------------------------------------------
Device 1: EIC (Active) Device 2: EID (Active) CPU affinity: 2
Delay (usec): 0 Max packet age (usecs): 0 Loss rate (pk/sec): 0
Bandwidth (mbits/sec): 0 Max buffering (bytes): 0 Total loss (pks): 0
EIC Delay Variance (0..49+ usec): - - - 23% 44% 19% 10% - - 1% 3% - - - - -
- - - - - - - - - - - - - - - -
- - - - - - - - - - - - - - - -
- - - - - - - - - - - - - - - -
EIC Packets Outstanding (0..960+): 100% - - - - - - - - - - - - - - -
EIC Bytes Outstanding (0..960k+) : 100% - - - - - - - - - - - - - - -
EIC Packet Length: 64+ 128+ 192+ 256+ 384+ 448+ 512+ 756+ 1024 1280 1519 2048 3072 4096 6144 8192
EIC Packets: 33% 1% - 32% - - - - 17% 13% - - - - - 4%
EID Delay Variance (0..49+ usec): - - - 25% 46% 23% 2% - - 1% 3% - - - - -
- - - - - - - - - - - - - - - -
- - - - - - - - - - - - - - - -
- - - - - - - - - - - - - - - -
EID Packets Outstanding (0..960+): 100% - - - - - - - - - - - - - - -
EID Bytes Outstanding (0..960k+) : 100% - - - - - - - - - - - - - - -
EID Packet Length: 64+ 128+ 192+ 256+ 384+ 448+ 512+ 756+ 1024 1280 1519 2048 3072 4096 6144 8192
EIC Packets: 33% 1% - 32% - - - - 17% 13% - - - - - 4%
```
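The Delay Variance rows can be condensed into a single rough figure by weighting each bucket's lower bound by its percentage. This is a summary statistic chosen for illustration, not something the SDA extension reports; it slightly understates lateness (by up to one bucket width) and inherits the rounding of the displayed percentages. The sketch assumes 1024-cycle buckets on the 1.30 GHz CPU named in the header, and the function names are hypothetical:

```python
def parse_histogram_row(row: str) -> list[int]:
    """Convert a row such as '- - 23% 44%' to integer percentages ('-' means 0)."""
    return [0 if tok == "-" else int(tok.rstrip("%")) for tok in row.split()]


def mean_bucket_start_usec(percentages: list[int], bucket_usec: float) -> float:
    """Percentage-weighted mean of bucket lower bounds, in microseconds."""
    total = sum(percentages)
    if total == 0:
        return 0.0
    weighted = sum(i * p for i, p in enumerate(percentages))
    return weighted / total * bucket_usec


# First row (buckets 0-15) of the EIC Delay Variance histogram; the other rows are empty
eic_row = "- - - 23% 44% 19% 10% - - 1% 3% - - - - -"
bucket_usec = 1024 / 1300  # 1.30 GHz rx4640: about 0.79 usec per bucket

pct = parse_histogram_row(eic_row)
print(sum(pct))                                                # 100
print(f"{mean_bucket_start_usec(pct, bucket_usec):.1f} usec")  # 3.5 usec
```

By this estimate, with no delay configured, EIC packets were forwarded about 3.5 microseconds later than scheduled, and 96% of them fell in buckets 3 to 6 (roughly 2.4 to 5.5 microseconds).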
Real-Life Examples
Real-Life Example: Credit Lyonnais, Paris
• Credit Lyonnais fire in May 1996
• OpenVMS multi-site cluster with data replication between sites (Volume Shadowing) saved the data
• The fire occurred over a weekend, and the DR site plus quick procurement of replacement hardware allowed the bank to reopen on Monday
Source: Metropole Paris
“ In any disaster, the key is to protect the data. If you lose your CPUs, you can replace them. If you lose your network, you can rebuild it. If you lose your data, you are down for several months. In the capital markets, that means you are dead. During the fire at our headquarters, the DIGITAL VMS Clusters were very effective at protecting the data.”
Patrick Hummel, IT Director, Capital Markets Division, Credit Lyonnais
Headquarters for Manhattan's Municipal Credit Union (MCU) were across the street from the World Trade Center, and were devastated on Sept. 11.
"It took several days to salvage critical data from hard-drive arrays and back-up tapes and bring the system back up" ... "During those first few chaotic days after Sept. 11, MCU allowed customers to withdraw cash from its ATMs, even when account balances could not be verified. Unfortunately, up to 4,000 people fraudulently withdrew about $15 million."
Ann Silverthorn, Network World Fusion, 10/07/2002
http://www.nwfusion.com/research/2002/1007feat2.html
Real-Life Examples: Commerzbank on 9/11
• Datacenter near WTC towers
• Generators took over after power failure, but dust & debris eventually caused A/C units to fail
• Data replicated to remote site 30 miles away
• One AlphaServer continued to run despite 104° F temperatures, running off the copy of the data at the opposite site after the local disk drives had succumbed to the heat
• See http://h71000.www7.hp.com/openvms/brochures/commerzbank/
“Because of the intense heat in our datacenter, all systems crashed except for our AlphaServer GS160... OpenVMS wide-area clustering and volume-shadowing technology kept our primary system running off the drives at our remote site 30 miles away.”
Werner Boensch, Executive Vice President, Commerzbank, North America
Real-Life Examples of OpenVMS: International Securities Exchange
• All-electronic stock derivatives (options) exchange
• First new stock exchange in the US in 26 years
• Went from nothing to majority market share in 3 years
• OpenVMS Disaster-Tolerant Cluster at the core, surrounded by other OpenVMS systems
• See http://h71000.www7.hp.com/openvms/brochures/ise/
“OpenVMS is a proven product that’s been battle tested in the field. That’s why we were extremely confident in building the technology architecture of the ISE on OpenVMS AlphaServer systems.”
Danny Friel, Sr. Vice President, Technology / Chief Information Officer, International Securities Exchange
“ We just had a disaster at one of our 3 sites 4 hours ago. Both the site's 2 nodes and 78 shadow members dropped when outside contractors killed all power to the computer room during maintenance. Fortunately the mirrored site 8 miles away and a third quorum site in another direction kept the cluster up after a minute of cluster state transition.”
Lee Mah, Capital Health Authority
writing in comp.os.vms, Aug. 20, 2004
“I have lost an entire data center due to a combination of a faulty UPS combined with a car vs. powerpole, and again when we needed to do major power maintenance. Both times, the remaining half of the cluster kept us going.”
Ed Wilts, Merrill Corporation
writing in comp.os.vms, July 22, 2005
Business Continuity
Business Continuity: Not Just IT
• The goal of Business Continuity is the ability for the entire business, not just IT, to continue operating despite a disaster.
• Not just computers and data:
−People
−Facilities
−Communications: Data networks and voice
−Transportation
−Supply chain, distribution channels
−etc.
Useful Resources
Business Continuity Resources
• Disaster Recovery Journal:
− http://www.drj.com/
• Continuity Insights Magazine:
− http://www.continuityinsights.com//
• Contingency Planning & Management Magazine
− http://www.contingencyplanning.com/
• All are high-quality journals. The first two are available free to qualified subscribers
• All hold conferences as well
Multi-OS Disaster-Tolerant Reference Architectures Whitepaper
• Entitled “Delivering high availability and disaster tolerance in a multi-operating-system HP Integrity server environment”
• Describes DT configurations across all of HP’s platforms: HP-UX, OpenVMS, Linux, Windows, and NonStop
• http://h71028.www7.hp.com/ERC/downloads/4AA0-6737ENW.pdf
Tabb Research Report
• "Crisis in Continuity: Financial Markets Firms Tackle the 100 km Question"
−available from https://h30046.www3.hp.com/campaigns/2005/promo/wwfsi/index.php?mcc=landing_page&jumpid=ex_R2548_promo/fsipaper_mcc%7Clanding_page
Draft Interagency White Paper
• "Draft Interagency White Paper on Sound Practices to Strengthen the Resilience of the U.S. Financial System"
−http://www.sec.gov/news/studies/34-47638.htm
• Agencies involved: Federal Reserve System, Department of the Treasury, Securities & Exchange Commission (SEC)
• Applies to: Financial institutions critical to the US economy
• But many other agencies around the world are adopting similar rules
Business Continuity and Disaster Tolerance Services from HP
Web resources:
• BC Services:
− http://h20219.www2.hp.com/services/cache/10107-0-0-225-121.aspx
• DT Services:
− http://h20219.www2.hp.com/services/cache/10597-0-0-225-121.aspx
OpenVMS Disaster-Tolerant Cluster Resources
• OpenVMS Documentation at OpenVMS website:
− OpenVMS Cluster Systems
− HP Volume Shadowing for OpenVMS
− Guidelines for OpenVMS Cluster Configurations
• OpenVMS High-Availability and Disaster-Tolerant Cluster information at the HP corporate website: http://h71000.www7.hp.com/availability/index.html and http://h18002.www1.hp.com/alphaserver/ad/disastertolerance.html
• More-detailed seminar and workshop notes at http://www2.openvms.org/kparris/ and http://www.geocities.com/keithparris/
• Book “VAXcluster Principles” by Roy G. Davis, Digital Press, 1993, ISBN 1-55558-112-9
Questions?
Speaker Contact Info:
•Keith Parris
•Web: http://www2.openvms.org/kparris/