Human and Organization Factors: Quality & Reliability of Engineered Systems- CE290/2008/1
Lake Sarez: Interactive Crisis Management on the Highest Dam in the World
Marc F. Muller
Abstract: This paper evaluates the reliability of a natural dam safety system in a crisis situation. The system assessed
in this work is the early warning and population evacuation plan linked to the Usoi natural landslide dam on Lake Sarez in Tajikistan. Based on experts’ opinions, both the capacity of the system to issue flash flood early warning
alarms and its capacity to safely evacuate the population are found satisfactory in most crisis scenarios. Yet situations involving the malfunction of key hardware components are found to create crises that may compromise the reliability
of the system. Recommendations are issued to favor a proper interactive management of these crises by improving the performance of the human components of the system. Such recommendations include the requirement of High
Reliability Organization standards as well as an emphasis on training and selection. Finally, the suggestion is made to put the emphasis on the education, training, local knowledge and judgment of the local population, rather than solely
relying on the technology embedded in the system’s hardware, for such a crucial matter as flood safety.
Subject Headings: Interactive Systems, Dam Safety, Risk Management, System Reliability
I. Background
A. Introduction
This paper is written to apply the principles of interactive approaches to assess and mitigate the risks
linked to human and organizational factors in the
management of a rapidly evolving crisis. Specifically, the study will be conducted in the context of the safety
of the highest dam in the world (Palmieri, 1999), the Usoi natural landslide dam on Lake Sarez, in the Pamir
Mountains of Tajikistan.
Fig. 1. Satellite Image of Lake Sarez, Tajikistan
[http://veimages.gsfc.nasa.gov/2388/ISS002-ESC-7771_lrg.jpg]
Although recent studies (Droz 2007) revealed a
probability of failure comparable to that accepted for engineered man-made dams, the
consequences that a hypothetical failure would have for the whole region, due to its size, are so high that a state-of-the-art
early warning system was installed between 2000
and 2006 with international funding. The system is designed to issue several alarm levels on detection
of an early sign of failure, and includes a direct evacuation alarm in the downstream villages in case a
flash flood is detected. Although the system is highly automated, human factors remain important
to its reliability, among other things
by ensuring proper maintenance and operation of the automated elements, and by ensuring the interactive
management of unknown unknowable events in crisis situations. A meta-study of more than 600 engineering
failures showed that merely 20% of the accidents
were directly due to an intrinsic failure of the engineered system (Bea 2008). The remaining 80%
could be linked to a human malfunction. These individual human malfunctions can be traced to
several deeper roots, including organizational malfunctions, procedural flaws, dysfunctional
hardware or an inadequate working environment. They either directly cause the system failure (extrinsic
cause), or create a situation prone to an intrinsic failure. In the case of Lake Sarez, due to the extreme
remoteness of the location, correct maintenance is especially critical to prevent an intrinsic failure of the
system.
Since informal concerns have been expressed about
decreases in the number of qualified local operating personnel (Droz 2008, personal communication), this
paper intends to evaluate the robustness of the interactive management of the system in the event of a
crisis caused, directly or indirectly, by a human malfunction.
B. Interactive Crisis Management
A crisis can be defined as “a developing sequence of
events in which the risks associated with the system
increases to a hazardous state […] and occur when
improbable events are joined and produce an
evolutionary and interactive complexity in the
performance of a system.” (Bea 2008). Such situations
are especially likely to occur in high-risk, high-uncertainty systems such as those intended to mitigate
natural hazards. Although such systems are designed to correctly mitigate the targeted natural hazard (in
which case no crisis occurs as such), there is a chance that the standard procedures and processes will be
disturbed, leading to an unexpected abnormal situation
where the system is unable to deliver the outcome for
which it was designed. Such a state is a crisis situation. As mentioned above, the disturbance leading to a crisis
is most of the time a combination of events. These events are generally characterized by a high
unpredictability (hence a low probability of occurrence) and high potential consequences. They are
often fundamentally the result of “human operators
‘pushing the envelope’ and thereby breaching the
safety defenses of an otherwise safe system” (Bea 2008). Such human malfunctions are often violations
whereby people “do what they should not be doing” (Dougherty 1995, quoted in Bea 2008). Such
violations could include the failure to properly accomplish the required maintenance tasks of the
safety systems. Another common source of disturbing events can be classified as unknown unknowables, which
include the occurrence of unpredictable damaging
events, such as the simultaneous occurrence of a third natural hazard.
Such a chain of events creates a crisis situation which, in a natural hazard context, often turns out to be a
rapidly evolving crisis, in which time is a critical factor. In the case of Lake Sarez, numerical models
predict that a massive flood would reach Barchidiv, the uppermost village of the valley, less than 30 minutes
after its generating event (Zaninetti 2000, quoted by Schuster 2004). Indeed, rapidly evolving crises are
characterized by the time factor and the urge to rescue the system as quickly as possible. Such a situation
places considerable stress on the human operators, which often degrades their cognitive
performance, eventually leading to vagabonding, retreating and cognitive blackouts in extreme cases
(Bea 2008). As a matter of fact, rapidly evolving crises lead to a situation where the cognitive abilities of the
human operators are often at their lowest at the very moment they are most needed. Furthermore, if the
occurrence of the crisis situation itself can be linked to the already poor overall reliability of the human factor
(e.g. due to limited manpower, as may be the case here), a dangerous situation ensues,
one highly prone to disaster.
Three complementary approaches are known to
mitigate the risks linked to a system (Bea 2008). A proactive approach encompasses the actions taken to
prevent the occurrence of a crisis, whereas a reactive
approach applies the lessons learned from past mishaps and near failures to prevent
future crises. An interactive approach to risk management comprises
the actions taken to restore the system to its operational state once a crisis is occurring. In other
words, an interactive approach consists of strategies to turn potential catastrophes into near misses. It is
impossible to totally control, through proactive and reactive measures alone, the chains of events, such as
unknown unknowables and human malfunctions, that are prone to lead to a crisis. Fortunately, not all crises
lead to failures, and several examples can be given of systems that successfully and interactively manage
periodically occurring unexpected crises, including medical emergency services, commercial aviation, and
natural hazard mitigation. According to Bea (2008),
behind each accident are on the order of 10 to 100 near
misses. Several authors have studied the common characteristics of these resilient systems (Bea, Weick,
Lagadec, Klein, Miller, Pidgeon: all quoted in Bea 2008) and the strategies designed to successfully
“engineer and manage the unexpected”. The current risk mitigation strategy for Lake Sarez is
proactive by nature. The strategy consists of limiting the consequences of a potential future failure through
early detection instrumentation and an efficient evacuation plan on one side, while a long-term risk
mitigation component is designed to eventually reduce the likelihood of future risks. Yet the interactive
aspect of risk mitigation is a key aspect where human factors are of the highest importance. Indeed, given a
crisis situation, the ultimate success of its adaptive management almost always relies on human
performance. This parameter is often insufficiently taken
into account in mitigation strategies, perhaps because it is difficult to quantify and to address (humans are
difficult to engineer). Since the proactive component of the Lake Sarez risk mitigation plan is designed and
operational, this paper studies the interactive aspects of the mitigation strategy by focusing on its
four key sequential components, the so-called OODA loop (Orr 1983, quoted in Bea 2008): observation and
crisis detection, orientation and sense making, decision, and action.
Therefore, the system’s ability to successfully manage an occurring natural hazard will be evaluated, in the
event of an unexpected disturbance leading to a crisis situation. The system’s operational robustness to the
unknown will be assessed.
C. Lake Sarez Case Background
The following section gives the background for the studied case. Although not all of the information given is
strictly linked to the management of rapidly evolving crises, broad background knowledge of the
institutional and local context is required to understand the bigger picture and properly address the
topic in focus.
1. Situation
Lake Sarez is located in the semi-autonomous Gorno-Badakhshan
region, in the southeastern part of the former Soviet Republic of Tajikistan in Central Asia.
The dam is located in the Bartang valley in the Pamir mountain range, which is counted among the highest
and least accessible mountain ranges in the world. Among the common problems faced by the local
population of such remote and mountainous areas is an extreme social, economic and political isolation
that is exacerbated by the difficulties arising from the transition from Soviet rule (Schuster 2004). Moreover,
the inaccessibility of the region is notorious, as a two-day trip over ill-maintained mountain trails is
necessary to link the region to Dushanbe, the country’s capital city. In addition to its isolation and
inaccessibility, the area displays extremely high seismic activity, coupled with a harsh continental
climate. As a result, the region is a natural disaster
prone area, where earthquakes, floods, landslides and
stone avalanches are common.
The lake was formed in 1911 in the aftermath of a magnitude 7.6
earthquake that caused an enormous landslide, blocking the Murgrab river valley (Schuster 2004).
The river waters rose to form a 17 cubic kilometer lake, comparable to half of the size of Lake Geneva,
flooding the upstream valley over 60 km, with a water level at 3260 meters above sea level. With a height of
550 meters to the lowest point of its crest, Usoi dam is the highest dam in the world. The dam is twice as
high as Nurek dam, the tallest man-made dam in the world, also located in Tajikistan.
Fig. 2. Lake Sarez location map [http://www.acig.org/artman/uploads/map_tajikistan.jpg]
Being a non-engineered gigantic dam, the safety of
Usoi dam is not a recent preoccupation. Until recently, the little information that passed beyond the borders of
the Soviet Union described a “colossal dam of
questionable stability which retained a vast reservoir
of water. […] Impact projections suggested that a
flood could affect roughly 5 million people living
along the Bartang, Panj and Amu Darya rivers, a path
traversing Tajikistan, Afghanistan, Uzbekistan and
Turkmenistan” (Palmieri 1999) to the Aral Sea, thus
reaching large-scale, international proportions.
2. Chronology of events
On February 18th 1911, a magnitude 7.6
earthquake produced a 2.2 billion cubic meter landslide, burying the village of Usoi and blocking the
course of the Murgrab River. The water gradually rose, drowning the village of Sarez and forming
a 200-meter-deep lake. The water level was ultimately stabilized in 1914 by seepage through the dam,
with the formation of 57 streams that carved a large erosion canyon on the downstream side of
the dam.
In the following years, several Russian expeditions
were mounted to evaluate the stability of the dam. The information yielded by these expeditions is scarce
and mostly unpublished (Schuster 2004). Opinions on the stability of the dam were very diverse,
but consideration of the high consequences of failure evidently led the Russians to install a
first early warning system in 1988 (Palmieri 2000).
This early warning system was based on the
combination of hydrometric measurements and visual observations for flood detection. The information was
transmitted through cable and satellite connection to Moscow and Dushanbe, where the decision could be
taken to evacuate the highly populated lower Amu Darya basin. Yet no plan was designed to alert and
evacuate the 19,000 inhabitants of the Bartang valley, who were considered the most at risk (Palmieri 2000).
Moreover, at the fall of the Soviet Union in the early 1990s, the technology was aging and the whole
system was considered inefficient due to the lack of proper maintenance.
In 1991, the country achieved its independence from the Soviet Union. This was followed by a five-year
civil war that was particularly intense in the semi-autonomous region of Gorno-Badakhshan. As a result,
the isolation of the region increased further and the
installed early warning system became even more obsolete.
In 1997, the newly formed government of Tajikistan brought the situation to the attention of the United
Nations International Decade for Natural Disaster Reduction (UN/IDNDR) secretariat, to “lead an effort
to raise international awareness on this problem and
develop a program to reduce this threat” (Palmieri
1999). As a result, a UN/IDNDR Interagency Risk
Assessment Mission was sent to Lake Sarez to assess the situation. The mission confirmed the inefficiency
of the existing early warning system and acknowledged the problems caused by the very low
accessibility of the region, which prevented the
implementation of a heavily engineered structural
solution to stabilize the dam. The mission also noted the very low probability of occurrence of a major
disaster. Yet, due to the high consequences a failure would have, it recommended the design of an up-to-date early warning
system that would enable the safe evacuation of the nearby population most at risk.
In 2000, the Lake Sarez Risk Mitigation Plan (LSRMP) was approved by the World Bank, with the
expressed objective of decreasing the “proportion of
vulnerable communities in the Bartang and Murgrab
valleys with disaster management plans as well as
responsibilities and procedures agreed upon by
community leaders and villagers, responsible
government authorities and interested non
governmental organizations (NGO)” (Palmieri 2000).
The four-component plan includes technical consulting and the installation of an up-to-date monitoring and
early warning system (component A), social training and safety-related supplies for the local population
(component B), the study of a long-term solution through intensive monitoring and consulting
(component C), and institutional strengthening and capacity building of the local government (component
D). The Swiss Government, the United States Agency for International Development (USAID), the
Government of Tajikistan, the Aga Khan Development Network (AKDN) and a credit from the
World Bank shared the financial burden of the implementation.
The expected five-year implementation period was
extended by one year. In 2006, the LSRMP was fully implemented. In 2007, the final reports on the
implementation and the project evaluation were completed (Stucky Ltd, World Bank). The mandate of
the implementing agencies has expired, and responsibility for the operation and
maintenance of the early warning and monitoring system has now passed to the Tajik authorities.
Namely, the Usoi Department, part of the Ministry of Emergencies
and Civil Defense in Dushanbe, is formally in charge of the LSRMP.
3. Lake Sarez Risk Mitigation Plan
3.1 Alternatives
One could argue that the installation of a monitoring and early warning system to mitigate a risk linked to a
natural structure that is expected to fail someday, in the near or distant future (Papyrin 2007), is a
rather timid and non-durable strategy at the very least. Indeed, several more structural proactive
measures have been considered by Soviet scientists to mitigate the risk in a more ‘durable’ manner. Such
strategies include “controlled 100-150 m water level
drawdown in the lake to eliminate overtopping by high
wave through construction of a tunnel spillway on the
left bank for irrigation in dry years and power
generation” or to “raise the crest of the lowest part of
the dam by moving the boulder material over the
obstruction using construction machinery or by the
blast fill method from the exposed scarps located
above” (Zolotarev 1986, quoted in Schuster 2004).
Yet, these most ‘rational’ solutions are difficult to implement due to the extreme remoteness of the area
and its difficulty of access (Fig. 3). Indeed, Periotto (2000, quoted in Palmieri 2000) estimates the
construction cost of the road required to transport the infrastructure needed to carry out such a project at over
US$300,000 per kilometer, which heavily compromises the economic feasibility of such a project. However,
component C of the implemented risk mitigation strategy includes monitoring and documentation
toward a possible durable long-term solution that would be economically feasible.
Fig. 3. Typical bridge in the Bartang Valley [27]
Another efficient prevention strategy would consist of
evacuating the local population as a preventive measure, and labeling the area a “non-habitable zone due
to natural hazards” to discourage further resettlement. By humbly acknowledging the power
of nature, such an approach would offer a safe and
durable solution. Yet the social and political cost of
displacing 19,000 rural people who are attached to their land is immense and difficult to pay. Indeed,
the sense of belonging to their homeland is very strong among the rural population, and people prefer to stay
in their mountain valleys, despite the remoteness, the lack of supplies and the occurrence of natural hazards
(Palmieri 1999). The local population has lived with the constant threat of natural hazards for
generations and would certainly not contemplate
leaving.
Therefore, a monitoring and early warning system can
be seen as a compromise: an economically and socially feasible short- to medium-term risk mitigation strategy.
Indeed, the local population is deeply attached to its land, is remarkably well educated and is
ready to accept capacity building for community participation in disaster mitigation and response
(Palmieri 2000). Thus, given the remoteness and inaccessibility of the setting, and given the alertness,
understanding and willingness to respond of the local population, the installation of an early warning and
monitoring system has been selected as the most suitable mitigation strategy for the area (Palmieri
2000).
3.2 Components
The Lake Sarez Risk Mitigation Plan consists of four
complementary components. Although this study will focus on component A (Early Warning and
Monitoring System), which is described in detail in a following section, the three other components are
described here in order to give a better understanding of the setting of the project.
Component B consists of social training and the supply of safety-related materials to the local
population. It was implemented by FOCUS Humanitarian Assistance, a non-governmental
organization (NGO) with extensive work experience in the region in the fields of natural hazard
relief and mitigation (Palmieri 2000), and funded by USAID and AKDN. The objective of this component
was to “make the early warning system community-based” (World Bank 2007) by raising awareness. This
was done by providing information and emergency training, and by involving the communities in
the preparation and supply of safe havens on higher
ground. Despite implementation delays, the World Bank evaluated the implementation of this component
as “satisfactory” (El-Hanbali 2007). All the vulnerable communities have been identified, equipped with safe
havens and organized into response groups that received disaster mitigation training.
Despite some initial concerns about the competency of the implementing NGO to maintain a high quality of
preparedness (World Bank mid-term review 2003), and given the current local awareness and involvement of the
population and the long history of engagement of the
local NGOs, the sustainability of component B is not called into question by the World Bank’s final evaluation
(El-Hanbali 2007).
Component C consists of the study of long term
solutions to mitigate the risk linked to Lake Sarez. The study is based on the data produced by the monitoring
system provided by component A. Like component A, component C was financed by the Government of
Switzerland and implemented by Stucky Consulting Engineers Ltd (STUCKY), a Swiss
company, under the guidance of an international Panel of Experts (POE). It consists of complementary desk
studies including the assessment of different routes to the lake, digital terrain modeling, inundation studies,
wave generation and propagation in the lake, the mechanism of dam overtopping, the sediment
accumulation rate, seepage, etc. (El-Hanbali 2007).
The World Bank has evaluated this component as “satisfactory”. Although no further field research has
been scheduled for the near future, local experts and decision makers have been provided with complete
and up to date data on the situation, towards the design of a sustainable long term solution (El-Hanbali 2007).
Because it aims to strengthen local institutions so that they can efficiently take over the management, operation and
maintenance of the Lake Sarez Risk Mitigation Plan, component D is of particular relevance
to the evaluation of the reliability of the human component of the project. Component D has been
implemented by the Government of Tajikistan (GoT) with funding from both a World Bank credit and the
GoT. A new government agency, the Sarez Agency (SA) has been created within the Ministry of
Emergency Situation and Civil Defense (MESCD), with the mandate of managing the operation and
maintenance phase of the LSRMP. Consultancy and funding from the World Bank were provided to
strengthen this institution in its capability to conduct this task. Yet by February 2002, due to unsatisfactory
financial management, the SA was dismissed and
replaced by the Usoi Department, an existing
department within the MESCD, directed by Mr. Kadam Maksaev, which had operated the original early
warning system. Ultimately, the capacity building and training were thus directed towards the Usoi
Department, which is currently in charge of the
operation and maintenance of the early detection and monitoring system. The performance and capacity of
the Usoi Department were evaluated as
satisfactory by the World Bank at the end of the project implementation phase. However, although the
department has shown a capability to mobilize and coordinate operations with other government and
research institutions, this cooperation has not been institutionalized to ensure that it occurs on a regular
and formalized basis (El-Hanbali 2007). Moreover, a
recent decrease in the department’s qualified staff has
raised some informal concerns about its future
performance (Droz 2008, personal communication).
3.3 Monitoring and Early Warning System
The implementation of a monitoring and early warning system constituted the first component (A) of
the LSRMP. The system was designed by STUCKY and funded by the Swiss government. Fela Planungs
AG, a Swiss construction company, was awarded the supply and installation of the system (FELA Planungs
AG 2004). The relevance, in the context of Lake Sarez, of such a “light” proactive mitigation system with a limited structural
component, as opposed to a heavier and perhaps more durable engineering solution, has been discussed in section 3.1. The purpose
here is to accurately describe the key components of the system, in order to allow a further analysis and
evaluation of the system’s operational robustness to the unknown. Where not specified, all the information
in the following section comes from STUCKY documentation.
a. Risks assessment
As part of components A and C, a thorough and quantitative analysis of the risks linked to the
occurrence of natural hazards on Lake Sarez has been conducted by STUCKY.
Fault tree analysis
A fault tree analysis was conducted, whereby the possible causes of a given threatening event are
assessed and analyzed. In the case of Lake Sarez, the threatening event considered is a condensed flash
flood with a flow increase greater than 400 cubic meters per second, as basic estimates consider events
above this magnitude to cause significant damage to downstream villages. The fault tree is displayed in the
following figure.
[Figure: fault tree for the top event “flood > 400 m3/s”. Node labels recoverable from the source: internal erosion, external erosion, clogging, global instability, superficial slide, overtopping, huge wave, pressure wave, earthquake, extreme flooding, internal disturbance, landslide, water level variation.]
Fig. 4. Fault tree Analysis [6]
The analysis of the hazards identified by the fault tree is summarized in the following paragraphs;
it allowed the most probable scenarios to be singled out.
Lake level increase
A steady lake level increase of about 20 cm/yr (Droz
2006) has been noticed. Yet a period of 200 years would be necessary for the lake level to reach the
lowest part of the crest of the dam. Moreover, the higher permeability of the upper layers of the dam is
expected to allow a higher seepage flow rate and thus slow the level increase. A dam overflow due to the
steady increase of lake level is thus considered a very unlikely event (Droz 2006).
Global Dam Instability
The stability of the natural dam has also been
considered as a possible hazard source. The external erosion due to the formation of springs
on the downstream side of the dam has not been considered important enough to compromise the
global stability of the dam. Yet, obstructions of the Murgrab River due to limited external erosion are
considered possible (Droz 2007) and may lead to sudden floods.
The global stability of the dam in the event of an earthquake has also been considered, especially given
the high seismic activity of the region. The stability of the slopes is considered high, as the
expected displacement under high accelerations (> 0.4 g) is of the order of 10 cm, which is considered
negligible given the size of the dam.
As a result, although the global stability of the dam is
considered high, sudden flash floods could be caused by obstructions of the river due to external
erosion.
Seepage
The seepage of the dam has been closely observed as a
potential source of hazard. The currently estimated low hydraulic gradient and low seepage velocity,
as well as the low turbidity of the outflow water (Schuster 2004), are considered to indicate a low risk
linked to seepage in the current condition of the dam. Yet slow modifications of the seepage regime
have been noted and attributed to the geological immaturity
of the dam; they will require further monitoring. The clogging hazard is considered low, due to the high
number of springs and the heterogeneity of the dam material. Piping hazard in the current situation is also
considered as low, yet an earthquake, or the impact of a surge wave can cause sudden modifications of the
dam’s internal structure. Such a modification could allow the formation of natural pipes within the dam,
which would result in a sudden increase in the discharge flow rate and eventual flooding.
Thus, although no alarming risk linked to seepage is to be expected in the current situation, the evolution of the
discharge flow rate must be monitored with special attention in the event of an earthquake or a surge
wave.
Overtopping surge wave
The risk of a huge wave arises from the presence of a massive, slowly moving
slope instability on the mountainside on the
right bank of the lake. A massive landslide into the
lake could thus be triggered by an event such as an earthquake, possibly resulting in a tsunami that would
overtop the dam. The height of the wave is a function of the speed and volume of the hypothetical landslide.
A numerical simulation of a worst-case scenario, given the parameters as known today, yields a wave
overtopping the dam by 50 meters above the lowest point of its crest, resulting in a flood of about 800,000
cubic meters (Droz 2006). But the occurrence of a landslide with a volume of 0.5 cubic kilometers moving at 20 m/s,
which would be required to overtop the dam, is considered
very improbable.
However, a more thorough characterization of the
right bank landslide remains to be done. Until the parameters of the possible landslide are better
known, the “real possibility of a wave overtopping the
dam and the knowledge of the effects of such a wave
on the downstream valley will remain unclear” (Droz 2007).
Expert Assessment Meeting
The discussion above leads to the conclusion that a major threatening event such as a flood greater
than 400 cubic meters per second is highly improbable in the absence of a major triggering event such as an
earthquake. Therefore, the expert assessment meeting, which was conducted to evaluate and quantify the
probability of occurrence of the considered flood, was
based on the probability of occurrence of a major
earthquake. The chains of events considered,
leading from a major earthquake to the flood, and their
estimated probabilities of occurrence are displayed in the hazard analysis tree given in the annex (Droz 2007).
The two most likely scenarios are (a) a piping failure as a consequence of toe instability of the dam due to
the pressure wave caused by the earthquake; and (b) an overtopping wave caused by a large landslide into the
reservoir, triggered by the earthquake. The probabilities of occurrence of both chains of events are of the order
of 10^-5, which is the probability of failure ordinarily accepted for man-made dams (Droz 2007). Yet both
of these phenomena have occurred at Italian dams
in the twentieth century, with enormous consequences in terms of human lives lost: the tailings
dams of the Stava valley failed due to piping (National Geographic Channel 2008), while the Vajont dam was
overtopped by a surge wave (Genevois 2005).
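To make the order of magnitude concrete, the two scenario probabilities can be combined through the OR gate of a fault tree. The following is a minimal sketch and not the computation performed at the expert meeting: the function `or_gate` is hypothetical, the two scenarios are treated as independent (a strong simplification, since both are triggered by the same earthquake), and reading 10^-5 as an annual probability is an assumption.

```python
from math import prod

# Order-of-magnitude values from the text (Droz 2007). Treating them as
# annual probabilities is an assumption of this sketch.
P_PIPING = 1e-5        # scenario (a): piping failure after an earthquake
P_OVERTOPPING = 1e-5   # scenario (b): landslide-generated overtopping wave


def or_gate(probabilities):
    """Probability that at least one event occurs, assuming independence."""
    return 1.0 - prod(1.0 - p for p in probabilities)


p_flood = or_gate([P_PIPING, P_OVERTOPPING])
print(f"P(flood through either scenario) ~ {p_flood:.1e}")
```

For probabilities this small the result is practically the sum of the two terms, so the combined figure stays at the same order of magnitude as each individual scenario.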
Risk Analysis
Despite its relatively low probability of
occurrence, a flood resulting from a failure of Usoi dam would have severe consequences in terms of
damage to the downstream population. Indeed, the 19,000 inhabitants of the Bartang valley are considered
directly exposed, while a total of 132,000 inhabitants of the upper Panj valley are liable to be
affected (Palmieri 2000). Between 1,000 and 10,000 probable casualties are estimated (Droz 2007). Indeed,
the existing obsolete warning system would take an
estimated 54 minutes via television satellite to
reach Barchidiv, the uppermost village of the valley, and another estimated two hours to evacuate the
residents to higher ground (Schuster 2004). Given that a flood triggered by a landslide-induced surge
wave (scenario b) would take approximately 25 to 30 minutes to reach Barchidiv (Zaninetti 2000, quoted
by Schuster 2004), the consequences of such an event in terms of casualties are sadly obvious.
Therefore, despite the low probability of occurrence of a threatening event, the high consequences in terms of
casualties cause the risk to be high. The following figure compares the risk level with the risk
currently accepted by various dam owners.
Fig. 5. Risk Diagram [6]
Therefore, the objective of the updated early warning system is to reduce the risk by reducing the
consequences of a threatening event. Preliminary
studies by the World Bank (2000) showed that an estimated 18,500 casualties would be avoided. The
cost analysis conducted by the same institution indicated an estimated cost per statistical life saved of
US$ 232. The justification for investment would thus be “very strong” by the standard practice of the U.S. Federal
Government for financing life-protective programs (Palmieri 2000).
b. System design
Requirements
Following the recommendation issued by the UN/IDNDR mission and in the framework of the
LSRMP, an early warning and monitoring system was designed by STUCKY. A high degree of automation
was required because of the volume of data to be collected, the inaccessibility of the sensor
locations and the necessity to automatically trigger alarms for civil protection purposes once
pre-established values of significant parameters are detected (Palmieri 2000). Moreover, an emphasis on
high quality data transmission systems was required due to the obvious time constraint in the event of a
flash flood, and the high degree of coordination
needed between the central managing unit in Dushanbe, the observation station at the dam, the
remote measuring units and the village alarms (Palmieri 1999).
Hardware and measuring device
The early warning and monitoring system includes sensors placed in nine distinct monitoring units (MU),
scattered over identified sensitive points across the whole area. Measuring devices include flash flood sensors
downstream of the dam and in the village of Barchidiv, seismic accelerometers on the dam toe and
the lake shore, Global Positioning System (GPS) measuring devices scattered on the dam and the right bank
unstable slope, pressure cells for lake level and wave height in the vicinity of the dam,
river gauging stations in the village of Barchidiv and
automatic weather stations. Moreover, an observation center (dam house, CU) is located on high ground on
the left bank near the dam, with visual contact with the right bank landslide. The dam house is to serve as a
base for periodic observation expeditions, in particular GPS campaigns and turbidity
measurements, as well as a relay in the transmission of the monitored data to Dushanbe, where the
Supervisory Control and Data Acquisition (SCADA) system is located. There, the operational control and
management of the system is conducted by the Usoi Department. Finally, the SCADA as well as both the
dam house and the village of Barchidiv are equipped with a direct manual alarm trigger.
Data transmission
As discussed above, an efficient data transmission system is a requirement for the efficiency of the
warning system. For transmission between the monitoring units and the
central unit at the dam house, a bidirectional very high frequency (VHF) radio system is generally used,
with the exception of the two most remote units, which are linked to the central unit via satellite.
There, Very Small Aperture Terminal (VSAT) systems are used over the International Maritime
Satellite Organization (Inmarsat D+) network.
Bidirectional transmissions between the central unit at the dam house and the SCADA at Dushanbe are
provided by VSAT technology using the International Telecommunications Satellite Consortium (Intelsat IS-704)
network. Finally, the link between the SCADA, the dam house, the
alarms in the villages and the rescue unit in the downstream village of Khorog is provided by the
Inmarsat D+ network, allowing the priority transmission of alarm signals. Furthermore, the link is
bidirectional, so that the alarm equipment in the villages can be monitored and a possible
breakdown detected quickly.
Fig. 6. Early Warning and Monitoring System situation [7]
Fig. 7. Warning levels chart [7]
Procedures
Depending on the value
measured for the parameters considered, the monitoring devices can issue five warning levels. The different warning levels are
described in the warning levels chart (Fig. 7). Once a warning is issued, several procedures are
prescribed for the SCADA to follow, depending on the
warning level and parameter. Following these procedures, several alarm levels can be issued,
including alarm level 1 (observation) and alarm level 3 (immediate evacuation). The complete procedure list
is given in annex. Several features are nevertheless worth noting. There are three critical parameters that can activate
a direct automated evacuation alarm in the villages of
the most exposed zone. These parameters are the
detection of a flash flood, the detection of a large and persistent decrease in the discharge
flow rate, and the detection of a surge wave higher than 50 m. In these cases, no direct human interaction is
involved in the alarm process, apart from the proper maintenance of the devices.
In the event of a massive earthquake, no direct interactive measure is prescribed. However, a
proactive increase of attention to the evolution of the critical parameters mentioned above (level 1 alarm) is
ordered. Level one warnings on the critical parameters
mentioned above are “correlated”, in that a warning on one of them calls for increased observation of the others.
Other monitoring parameters, considered
less critical (that is, less subject to a short-term time
constraint), are also monitored. An unusual change in these parameters also issues a level
one alarm, resulting in increased awareness and more frequent and targeted measurements.
Furthermore, as mentioned above, all data is transmitted to the central unit by automated
transmissions. From there, the data is transmitted to the SCADA at Dushanbe, where it is analyzed and where
the procedures are applied. However, level three warnings issued for flash floods and large waves result
in the direct transmission of a level three alarm (evacuation) to the most vulnerable villages (Fig. 8).
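The alarm logic described in this section can be summarized in a short Python sketch. It is a simplification under stated assumptions: the complete procedure list is given in annex and is not reproduced here, the class and function names are hypothetical, and the value 0 for "no alarm" is an illustrative convention rather than part of the system's documentation. Only the 50 m surge wave threshold and the alarm levels 1 (observation) and 3 (immediate evacuation) are taken from the text.

```python
from dataclasses import dataclass

# Threshold taken from the text: a surge wave higher than 50 m is critical.
SURGE_WAVE_EVACUATION_M = 50.0


@dataclass
class Readings:
    """Hypothetical snapshot of the monitored parameters."""
    flash_flood_detected: bool = False
    persistent_discharge_drop: bool = False
    wave_height_m: float = 0.0
    strong_earthquake: bool = False
    unusual_secondary_parameter: bool = False


def alarm_level(r: Readings) -> int:
    """Return 3 (immediate evacuation), 1 (observation) or 0 (no alarm).

    The 0 value is an assumption made for this sketch only.
    """
    # The three critical parameters trigger the evacuation alarm directly,
    # without human interaction.
    if (r.flash_flood_detected
            or r.persistent_discharge_drop
            or r.wave_height_m > SURGE_WAVE_EVACUATION_M):
        return 3
    # An earthquake, or an unusual change in a less critical parameter,
    # only increases the level of attention.
    if r.strong_earthquake or r.unusual_secondary_parameter:
        return 1
    return 0


print(alarm_level(Readings(strong_earthquake=True)))  # 1
print(alarm_level(Readings(wave_height_m=60.0)))      # 3
```

The ordering of the two tests reflects the priority given to the critical parameters: an earthquake accompanied by a detected flash flood still results in a level three alarm.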
Fig. 8. Equipped alarm house in a Bartang Valley village [27]
3.4 Current situation and concerns
The monitoring and early detection system is to date fully operational and under local Tajik steering
and management. As mentioned earlier, the Usoi Department (UD), as part of the Ministry of
Emergency and Civil Defense, is the implementing agency. Capacity has been built within the
implementing agency as part of component D of the
LSRMP. Specifically, technical, managerial and organizational training has been given to the staff.
Operation and maintenance of the system are also the responsibility of the UD. It must be noted that the
advanced technology of the system, coupled with the harsh Pamiri environment, makes maintenance
especially critical to the sustainability of an efficient system (El-Hanbali 2006). The maintenance
instructions provided by STUCKY are given in annex. One can notice the importance of maintaining the
energy supply of the monitoring units, especially the flood sensor unit, whose constant
functioning is absolutely critical to the efficiency of the system.
Although the overall performance of the delivered project has been judged “satisfactory” by the World
Bank (2006), the financing institution has raised
several concerns about the sustainability of the efficiency of the project.
The first two concerns are rather institutional and financial. First, the teamwork capability among the
local agencies is questioned. Indeed, although the implementing agency (the Usoi Department) seems to
have shown a capability to mobilize and coordinate cooperation from other government and research
institutions, no formalized scheme exists, and some
concerns have been raised about the long-term
sustainability of such informal collaboration. The second concern relates to the possible lack of
sustained long-term funding for operation and maintenance if international funding were to stop.
The last two concerns are much more likely to directly compromise the quality and efficiency of the early
warning and monitoring system in a much shorter time and are, in my opinion, more worrying.
First, a gradual decrease in the quality of the
maintenance of the system is suspected (El-Hanbali
2006). Indeed, although training has
been given to produce “Operation and Maintenance”
and “Control Yearbook” reports yearly, a slight decrease in the quality of these publications has
lately been noticed by experts (Droz 2008, personal communication). Furthermore, the World Bank fears a
seasonal decrease in maintenance quality due to the difficulty of access and the harsh environmental
conditions. However, as earthquakes can occur all
year long, critical maintenance tasks, including periodic battery and solar panel checks, must be
carried out regularly. A pertinent illustration of the significance of the task can be found in the
maintenance of MU7. MU7 is the measuring unit upstream from Barchidiv that is designed to detect
flash floods and issue level three evacuation warnings to 15 villages (FELA planungs AG 2004). It is thus a
critical component of the system. Yet its control unit, GPS and solar panel are located at a remote site
about 100 meters above the river (Fig. 9). Accessing them for maintenance is not a trivial task, especially in
winter.
The decrease in maintenance quality is a critical issue that could produce a crisis situation and will thus
be taken seriously in this work. Finally, the lack of long-term national technical
capacity to operate and maintain the system, as well as the lack of ability to recruit and retain qualified
staff during and after the project, causes concern (El-Hanbali 2006). Indeed, an out-migration
of local skilled staff has lately been noticed through a significant decrease in the number of personnel
with proper technical and organizational training who are still working in the Usoi Department
(Droz 2008, personal communication).
In summary, the probable decrease in the quality of maintenance of the system produces a crisis-prone
situation, whose interactive management
may be compromised by the lack of qualified manpower. This paper intends to evaluate this uncomfortable
situation and give recommendations to manage it.
D. Scope and methodology
1. Goal
Given the concerns mentioned in the last paragraph,
the goal of the project is to assess and evaluate the reliability of the early warning and population
evacuation systems of Lake Sarez, in the setting of a rapidly evolving crisis situation. The focus is set on
the human factor in the interactive management of
these crises.
2. Methodology
2.1 System definition
The system considered in this paper is the set of
procedures, hardware, organization and human operators that take part in the management and
transmission of information. The considered time window stretches from the occurrence of the
triggering natural hazard to the completed evacuation of the villagers to safe havens. I therefore
chose not to include the management of possible
rescue missions and the subsistence of the population
in the safe havens. The quality attributes of the system considered here
are serviceability and safety, whereby a success is recorded when the villagers can be safely
evacuated with sufficient notice before the flood reaches them.
2.2 Demand and capacity components of the safety system
The assessment of the reliability of the system
described above will be based on the system’s capacity to meet the demand in the context of a rapidly
evolving crisis. The demand on the system is imposed by the natural
hazard scenario that has been attributed the most risk by the expert risk assessment of Lake Sarez. This
scenario will be detailed in a later section, but leads to a high intensity <2000m3/s flash flood of
concentrated mud in the Bartang valley. Quantitatively, the demand on the system is defined as
the time available to evacuate the population from
the instant the flood can first be detected by the
uppermost flood sensor of the system. The capacity of the LSRMP that would be called upon
when this demand occurs can be
separated into two distinct components: first, the time
needed to detect and signal the threat in the form of an alarm issued by the engineered system; then, the time
needed to actually evacuate the local population to
safe zones. Thus the capacity meets the demand when the sum of
the alarm transmission and evacuation times is smaller than the flood travel time. The reliability
of the system in a crisis situation will be analyzed in this paper in that light.
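As a minimal illustration of this criterion, the comparison can be written as a short Python sketch. The function name and the numerical values in the usage lines are hypothetical; they only show how the sum of the alarm transmission and evacuation times is compared with the flood travel time, and are not results of the expert assessment.

```python
def capacity_meets_demand(alarm_min: float,
                          evacuation_min: float,
                          flood_travel_min: float) -> bool:
    """Return True when alarm transmission plus evacuation time is
    strictly smaller than the flood travel time (all in minutes)."""
    return alarm_min + evacuation_min < flood_travel_min


# Illustrative values only (assumptions, not figures from the study).
print(capacity_meets_demand(2.0, 9.0, 12.0))   # True
print(capacity_meets_demand(2.0, 11.0, 12.0))  # False
```

The strict inequality follows the wording of the criterion: a population that reaches the safe zone at the very moment the flood arrives is not counted as a success.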
2.3. Crises
A crisis is defined as a situation where “improbable
events are joined and produce an evolutionary and
interactive complexity in the performance of a system”
(Bea 2008). In other words, I will define a rapidly evolving crisis as a situation where unexpected events
force the system to function beyond the setting it was designed for, under critical time constraints, and
therefore compromise its ability to perform with the required quality. Therefore, the mere occurrence of the
large-scale natural hazard considered in the demand analysis does not create a crisis
situation per se, because the system is designed to manage such a situation. The elements actually
triggering a crisis are additional concomitant events that compromise the capacity of the system to
meet the demand. Three principal types of such events can be mentioned:
- Unexpected and “unexpectable” external events
attacking the system, typical examples of which include meteorological surprises.
- Internal malfunctions within the system, which often turn out to be linked to human violations.
Such events include ill-performed maintenance operations.
Fig. 9. Remoteness of MU7 control unit [27]
- Finally, instead of abruptly decreasing its
capacity, events may create a crisis by increasing the demand on the system. In the considered case,
such an event would involve an increase in the flood speed.
This paper focuses on human factors and therefore
considers crisis-triggering events from the first two categories.
2.4. Evaluation Sources and Validation
The sources for the analysis presented in this paper can
be classified in two categories: expert assessment and literature review. The redundancy of the information
between and among these two categories constitutes the validation system for the analysis. The principal
experts who were contacted are listed below:
- Mr. Patrice Droz is the technical director of
STUCKY, Ltd, Switzerland. He was in charge of designing and implementing components A and C
of the LSRMP. Specifically, he designed the organization and procedures of the early warning
and monitoring system and is thus knowledgeable
about its expected behavior. Mr. Droz was willing to take part in the analysis.
- Mr. Kadam Maskaev is the head of the Usoi Department, the implementing agency. He is
currently in charge of the management of the LSRMP. Specifically, he is responsible for the
operation and maintenance of the early warning and monitoring system, manages the SCADA,
and is thus knowledgeable about its current behavior. Although repeatedly contacted, Mr.
Maskaev did not respond positively to the analysis.
- FOCUS humanitarian is the non-governmental
organization in charge of the implementation and management of the
evacuation plan of the Bartang valley villages. It is thus knowledgeable about the local context
and the current behavior of the local population. Specifically, Mr. Mustafa Karim, the country
director, was willing to collaborate and provided contact with Mr. Abdulhamid Gayosov
for flood routing information and Mr. Rahim Balsara for information on local populations.
- Dr. Robert Bea is an expert in risk management at the University of California at Berkeley and is
thus knowledgeable about the methodologies to be employed in the analysis and evaluation.
The demand is assessed by collecting flood routing
data from a combination of industry reports
(Stucky Ltd 2007, United Nations Mission to Lake Sarez 2001, USAid 2006), academic literature
(Schuster 2004), and personal communication with knowledgeable experts (Droz, Karim and Gayosov
2008). Both components of the capacity were assessed by
generating crisis situations on the basis of concerns found in the literature (El Hanbali 2007), personal
communication (Droz 2008) and relevant case studies (Bea 2008). Each situation was then matched
to the specific context of Lake Sarez by being validated or rejected by the experts mentioned above. The
experts were then consulted on the probable behavior of the system in the event of the crisis situation that they
considered most risky in terms of likelihood and consequences. Subsequently, one or several possible
interactive crisis management strategies were generated and submitted to the experts for validation.
Fig. 10. MU7 Flood sensor situation, showing the dam, MU7 and Barchidiv [7]
II. Analysis
A. Demand
1. Baseline scenario
As discussed above, a crisis only occurs when unexpected events compromise the ability of the
system to mitigate a given natural hazard with the expected quality. Thus, in order to define a crisis
situation, a non-crisis situation must first be considered. There is therefore a need for a baseline natural
hazard scenario that ought to be correctly managed by the system if there were no crisis. In this study, we
will consider the natural hazard scenario that was
attributed the highest probability of occurrence by the
expert assessment meeting mentioned in an earlier paragraph.
A massive earthquake of 0.5g acceleration (magnitude
7.3 on the Richter scale) occurs. The ensuing gravity
wave causes a structural disturbance at the dam’s toe,
significantly modifies its internal structure and
disturbs its seepage regime. The disturbance is such
that, as the situation evolves, a piping phenomenon
occurs within the dam, whereby the pressing water
forms large underground channels. Eventually, after
a period of about two hours during which a strong
discharge drop can be observed downstream, these
channels reach the downstream slope of the dam and
cause a flash flood. The resulting flood is massive and
powerful enough (more than 400 m3/s above the
normal discharge, flowing at 40 km/h) to cause
significant consequences in the downstream valley.
Furthermore, during the piping process the flow
would erode loose deposits and debris, forming a
hyper-concentrated flow in which the volume of water
will progressively represent only one fifth of the total
flow volume, before being diluted at the confluence
forming the Bartang River.
2. Flood routing analysis
2.1. Flood routing simulation
Given the hazard described above, a flood routing simulation was performed by Stucky Ltd, on the
basis of which the early warning system and evacuation plan were designed. The objective was to
identify the potential effects of a sudden flood in the Bartang Valley due to a sudden discharge from Usoi
dam. Two flooding scenarios were therefore simulated, in which 1000 m3/s and 5000
m3/s, respectively, were reached in a period of six hours. These parameters were selected to simulate the timing of a
piping event in Usoi dam, where the flood is initiated by a disturbance in the internal structure of the dam
following an earthquake. The simulation was conducted using the Saint-Venant hydrodynamic equations,
with an approximate terrain roughness of 25 (Droz 2007).
According to the simulation results, an increase of 10% in the flow rate can be expected in Barchidiv 1.5 hours
after the beginning of the phenomenon. The maximum
discharge is expected after 6.5 hours, which corresponds to 30 minutes after the peak source
discharge is reached at the dam. Therefore, if we take into account the 20 km distance separating Barchidiv
from Usoi dam, we can expect an average flood velocity of 40 km/h (see footnote 1).
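The velocity estimate follows from simple arithmetic, which can be sketched as follows. The helper name is hypothetical; the 20 km distance and the 30 minute lag between the peak discharge at the dam and the peak at Barchidiv are taken from the text, and the footnote's caveat about hyper-concentrated flow still applies.

```python
def average_flood_velocity_kmh(distance_km: float, lag_h: float) -> float:
    """Average velocity of the flood peak between two points, in km/h."""
    return distance_km / lag_h


# 20 km between Usoi dam and Barchidiv, peak delayed by 30 minutes (0.5 h).
print(average_flood_velocity_kmh(20.0, 0.5))  # 40.0
```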
2.2. Flood hazard mapping
Given the flood routing simulation results, flood hazard intensity maps have been established by Stucky
Ltd using the Swiss standards (Fig. 11), which take into account the level and velocity of the flow.
While the two uppermost villages of Barchidiv and
Nisur are partly located in low to medium danger zones, the hazard intensity mapping places most of
the other villages of the Bartang valley entirely in high risk zones for floods of 1000 m3/s and above.
Furthermore, in the absence
of an efficient risk mitigation system, an estimated 5000 human casualties would be added to the 4000
houses, 44 schools, 180 community hospitals and 18,000 ha of cultivated land that would be destroyed in
the Bartang valley as a result of such floods from Lake Sarez (Droz 2007). The implementation of a risk
mitigation strategy involving the evacuation of all the villages of the Bartang valley is therefore considered a
necessity.
Fig. 12. 5000m3/s flood risk intensity mapping of
the village of Barchidiv.[7]
1. Assuming that the flood is water. If the fact that the
flow is a hyper-concentrated mud flood is taken into account, the velocity is much lower (but the flood is much more destructive).
Water height (h)      or   Water height x velocity (h x v)   Intensity level
h < 0.5 m                  h x v < 0.5 m2/s                  low
0.5 m < h < 2 m            0.5 m2/s < h x v < 2 m2/s         medium
h > 2 m                    h x v > 2 m2/s                    high
Fig. 11. Swiss flood risk intensity standards. [7]
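The classification of Fig. 11 can be expressed as a small Python sketch. Two points are assumptions made for this illustration and are not stated in the table: the two criteria are combined by retaining the more severe of the two classes, and values falling exactly on a threshold (0.5 or 2) are assigned to the medium class. The function names are hypothetical.

```python
LEVELS = ("low", "medium", "high")


def _classify(value: float, lower: float = 0.5, upper: float = 2.0) -> int:
    """Index into LEVELS for a single criterion (h in m, or h x v in m2/s)."""
    if value < lower:
        return 0
    if value <= upper:  # boundary values treated as medium (assumption)
        return 1
    return 2


def flood_intensity(height_m: float, velocity_ms: float) -> str:
    """Intensity level from water height and from height x velocity.

    Assumption: the more severe of the two criteria governs.
    """
    by_height = _classify(height_m)
    by_height_velocity = _classify(height_m * velocity_ms)
    return LEVELS[max(by_height, by_height_velocity)]


print(flood_intensity(0.3, 1.0))  # low
print(flood_intensity(0.4, 2.0))  # medium
print(flood_intensity(1.0, 3.0))  # high
```

In the second example the water is shallow (0.4 m) but fast, so the height x velocity criterion (0.8 m2/s) raises the class to medium.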
2.3. Quantifying the demand
The parameters resulting from the flood routing simulation described above enable the estimation of
the demand to be met by an effective risk mitigation system in the Bartang valley. Since the mitigation strategy
considered for Lake Sarez is an
early warning system coupled with a population
evacuation plan, the demand parameter to be
considered is the evacuation time (i.e. the time
available to alert and evacuate the population in a given village). Because their location makes the available
evacuation time shortest, the demand is highest on the villages of Barchidiv and Nisur, which will thus henceforth be
considered critical and focused on in this analysis. The village of Barchidiv is located 20 km downstream
of Lake Sarez and has 30 households with 186 inhabitants. According to the assessment of the
geologists, a flash flood from Lake Sarez would reach
the village in 23 to 27 minutes. In the case of the village of Nisur, a flood would reach the 40
households (242 inhabitants) of the village 31 to 38 minutes after being generated at the lake,
29 km upstream (Gayosov, quoted by Karim 2008). Considering that the uppermost flood
detectors (MU7) are located 10 km downstream of the lake, the time between the moment the
flood can first be detected by the system and the moment it
reaches the village of Barchidiv is 12 to 14 minutes,
and 16 to 19 minutes for the village of Nisur (ibid). Therefore, an evacuation time of 12 to 14 minutes in
the village of Barchidiv is the critical demand to be considered in the present system. A failure to meet
this demand is thus considered a sufficient failure condition.
B. Capacity
1. Early Warning System
1.1. Baseline Procedure
The standard procedure planned for the system to
manage the baseline scenario described in the
preceding section is given as follows (Droz 2007).
The earthquake is detected by the strong motion
accelerometers in both the measuring units on the lake
shore (MU2) and at the dam toe (MU8), and the level
1 warning is transmitted by satellite to the SCADA at
Dushanbe, which must ask for visual confirmation by
radio from the dam house. Once the earthquake is
confirmed, a visual inspection of the dam is conducted
from the dam house and awareness is increased to detect
discharge drops, flash floods and high waves.
The strong discharge drop (of about half the normal
discharge) that could be caused by the dam toe
instability and the internal disturbance can then be
detected by visual observation and/or the river
gauging system located in the village of Barchidiv
(MU9), about 16 km downstream from the dam. The
detection of the discharge drop issues a level one
warning to the SCADA at Dushanbe. The awareness is
increased towards the detection of a flash flood.
Assuming it occurs less than two hours after the
detection of the discharge drop, the flash flood is
detected by the MU7 flood sensors located about 10
km upstream of Barchidiv at a flow rate of 400 m3/s
(level 1 alarm). If a flow rate exceeding 2000 m3/s is
detected, a level three alarm is automatically sent to
both the SCADA and the villages within two minutes.
The evacuation alarm is thus triggered in the villages
about 10 to 12 minutes before the flood reaches
Barchidiv.
If the flash flood occurs more than two hours
after the discharge drop is detected, the villages will
already have been evacuated, as a level three
evacuation alarm is automatically issued by MU9
once the discharge drop has lasted two hours.
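The 10 to 12 minute figure can be reproduced with a short sketch that subtracts the alarm transmission delay from the detection-to-arrival window. The function name is hypothetical; the 12 to 14 minute window between MU7 and Barchidiv and the two minute alarm delay are the values quoted in the text, and the two minutes are treated here as the full delay, which is the least favorable reading of "within two minutes".

```python
def warning_lead_time(window_min: tuple, alarm_delay_min: float) -> tuple:
    """Time left for evacuation once the alarm has sounded, in minutes.

    window_min is the (shortest, longest) time between flood detection
    at the sensor and flood arrival at the village.
    """
    shortest, longest = window_min
    return (shortest - alarm_delay_min, longest - alarm_delay_min)


# Barchidiv: flood arrives 12 to 14 minutes after detection at MU7,
# and the level three alarm is issued within 2 minutes.
print(warning_lead_time((12, 14), 2))  # (10, 12)
```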
1.2. Crisis Situations
Several crisis situations with different crisis triggering events have been generated on the basis of the
concerns identified in the various
sources (Droz 2008, El Hanbali 2007, Bea 2008
personal communication).
- The first three situations are caused by direct and
indirect (i.e. maintenance related) human violations, which can be linked to existing
concerns about the future supply of funding and trained staff.
Situation 1
Due to insufficient funding and out-migration of
trained staff, there is a lack of qualified personnel to
operate the system on a continuous basis.
Therefore, the remaining sufficiently
trained operating personnel have to work
longer shifts, which raises their level of stress and
fatigue.
The earthquake occurs around 4 a.m., and neither
the SCADA operator nor the personnel at the dam
house are able to react appropriately because of
stress, fatigue and a concomitant health problem
(say, severe diarrhea). As a result, as the
baseline natural hazard scenario unfolds, no
human operator is available to manage the
system.
Situation 2
Due to the remoteness of their location and their
difficulty of access, no maintenance operation has
been performed on measuring units 7 (flood
sensor), 4 and 8 (strong motion accelerometers)
during the past winter season, which has been
particularly harsh.
As a result, when the earthquake occurs in
mid-April, the flood sensors and strong motion
accelerometers have not been accessed for either
routine maintenance or testing since mid-October.
Situation 3
Due to the decrease in funding and trained staff,
the overall maintenance quality of the
system has decreased dramatically, except for the
alarm units in the villages, which are maintained in a
satisfactory way because of the strong local
involvement.
As a result, at the occurrence of the earthquake,
no maintenance operation has been performed on
any part of the system (except the village alarm
units) for a period of two years.
- The next three situations are caused by unpredicted and unpredictable events whose
occurrence has a relatively low probability and high consequences. Such features have been
identified as typical of crisis triggering events (Bea 2008).
Situation 4
Due to a huge snowstorm, all access to and from
the dam house is cut off for two weeks when the
earthquake occurs. In addition, the outside
visibility is reduced to less than 10 meters for the
whole period.
As a result, no site access or visual observation
can be made for a period of two weeks following
the earthquake.
Situation 5
A stone avalanche triggered by the earthquake
completely destroys measuring unit 7. As a result,
the flood sensors at MU7 are not operational when
the flash flood occurs.
Situation 6
A snow avalanche triggered by the earthquake
completely destroys the dam house.
- Finally, the last two situations illustrate specific concerns found in the literature (Schuster 2004,
Papyrin 2007) about the reliability of the Lake Sarez early warning and monitoring system.
Situation 7
Due to global warming, the hydrogeological
setting of the area changes. Specifically, the
melting of the permafrost and the progressive rise
of the water level due to glacier melting may
change the natural hazard probabilities and
decrease the capacity of the early warning system
to mitigate the considered scenario.
Situation 8
A massive landslide on the unstable right bank
slope is triggered by the earthquake and
generates a massive wave that does not
overtop the dam but destroys the dam house.
1.3. Crisis selection
Validation was sought for the eight situations described above by submitting them to an expert, who
was asked to qualitatively evaluate their probability of occurrence and their consequences in terms of human lives
at risk by describing them as “low”, “medium” or
“high”. Both Kadam Maskaev (the current chief
operator) and Patrice Droz (the system designer) were contacted and requested to evaluate the situations, yet
only Mr. Droz responded positively. The following section is thus solely based on Mr. Droz’s knowledge
of the system and its environment, as its designer.
Situation 1
The first situation involves the activation of a level
one emergency in the event of an earthquake. The procedure does not require any direct and immediate
operator intervention other than a visual evaluation of the earthquake impacts, conducted at dawn. If a
flash flood happened to occur before dawn, the flash flood detection and alarm system is fully
automatic and remains operational unless it is destroyed during the earthquake (situation 5). Therefore, although the
probability of an earthquake occurring at night during the
shift of an insufficiently trained operator is considered “medium”, its consequences are considered “low”.
Situations 2, 3 and 5
Situations two, three and five all involve a malfunction of the MU7 flood detector, either because
of a lack of maintenance or because of its destruction during the earthquake. According to the expert, there is a
“medium” probability of a physical destruction of MU7 due to consequences of the earthquake (say a
stone avalanche). Yet, due to the difficulty of access to the device, combined with a decrease in funding
and staff training, the probability of a malfunction of MU7 due to a lack of maintenance (typically a
failure to periodically check its power supply) is considered “high”. The flood sensors located at MU7
are the only flash flood detection devices upstream from the village of Barchidiv and are thus a crucial
component of the system’s ability to detect the flood early enough to allow a safe evacuation of the valley.
Yet the advantageous configuration of the system and a proper correlation with the MU9 measuring unit
located in Barchidiv would allow the evacuation of most of the downstream population. However, if
a particular crisis sequence unfolds, the villages of Barchidiv and Nisur may not be evacuated in time.
This point will be pursued in a following section. The consequences of a malfunction of MU7 are thus
considered “medium”.
Situation 4
The occasional lack of accessibility and visibility described in situation four, although likely to happen in this elevated and mountainous area, does not compromise the direct efficiency of the evacuation alarm triggering system, which is entirely automatic and does not depend on visual confirmation in the case of a flash flood. Yet it may influence the evacuation time of the population and delay the rescue and/or observation missions. This point will be considered in the assessment of the evacuation plan conducted in a later section. On a side note, according to Mr. Droz, the only MI8 helicopter in Tajikistan that could transport 15 people crashed in March 2008, which may temporarily complicate emergency access to the area. The probability of such a situation occurring is thus considered “high”, while its consequences for the ability of the system to detect the danger and transmit the alarm are considered “low”.
Situations 6 and 8
Situations six and eight involve the complete destruction of the dam house. The probability of occurrence of situation six, which involves a snow avalanche, is considered very close to zero by the expert, due to the chosen location of the house. Situation eight involves the destruction of the dam house by a tsunami in Lake Sarez. The occurrence of a tsunami caused by a massive landslide of the unstable right bank slope of the lake in the event of an earthquake was considered “very unlikely” by the panel of experts in charge of the risk survey undertaken in the frame of the LSRMP. Yet if the tsunami does occur, the destruction of the dam house is possible. Indeed, the dam house was built by the Usoi department at an altitude 30 meters lower than advised by Stucky Ltd, to “spare their men”.
Situation 7
Finally, situation seven involves possible effects of global warming on the region, namely a melting of the permafrost and an increase in river water levels due to glacier melting. According to the expert, the melting of the permafrost will merely produce superficial landslides and local block collapses, but no significant large-scale movements. Moreover, a hydrological survey conducted in 2001 showed that, despite global warming, no significant change in the natural flow rates in the area had occurred in the preceding years. Therefore, despite the quasi-certainty of global warming, its consequences on the early warning system of Lake Sarez are considered “low”.
The expert opinion on the effects of the considered crises on the system gives important insights into its functioning and behavior. The qualitative risk evaluation linked to each situation is summarized in figure 13. According to the expert, the riskiest situations concerning the alarm system are situations two, three and five, which all involve a malfunction of the main flood sensors 10 km upstream from Barchidiv (MU7). The malfunction of MU7 is thus selected as the crisis-triggering situation to be investigated further in the following section.
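The qualitative ranking summarized in figure 13 can be reproduced with a small sketch. The numeric scores attached to “Low”, “Medium” and “High”, and the use of their product as a risk index, are illustrative assumptions and not part of the expert assessment; only the likelihood and consequence ratings are taken from figure 13.

```python
# Ordinal scores: an illustrative assumption, not part of the expert assessment.
LEVEL = {"Low": 1, "Medium": 2, "High": 3}

# (likelihood, consequences) for each situation, as reported in figure 13.
FIG13 = {
    1: ("Medium", "Low"),
    2: ("High", "Medium"),
    3: ("High", "Medium"),
    4: ("High", "Low"),
    5: ("Medium", "Medium"),
    6: ("Low", "Low"),
    7: ("Medium", "Low"),
    8: ("Low", "Low"),
}


def risk_score(likelihood, consequences):
    """Hypothetical risk index: product of the two ordinal scores."""
    return LEVEL[likelihood] * LEVEL[consequences]


def rank_situations(table):
    """Sort situation numbers by decreasing risk index, ties by number."""
    return sorted(table, key=lambda s: (-risk_score(*table[s]), s))


ranking = rank_situations(FIG13)
top_three = ranking[:3]
print(top_three)
```

Under these assumed scores, situations 2, 3 and 5 rank highest, which is consistent with the expert's selection of an MU7 malfunction.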
1.4. Crisis Analysis
A malfunction of MU7 having been identified as the riskiest crisis-triggering situation by the expert assessment presented above, the aim of the following paragraphs is to estimate in more depth the effect of such a crisis on the functionality of the early warning system. A quantitative likelihood estimate of such a crisis was provided by Mr. Droz (2008). The likelihood of the destruction of MU7 as a consequence of the earthquake was estimated at 5%, while the likelihood of a malfunction due to a lack of maintenance was estimated to be as high as 30%. These crisis likelihoods are high enough to deserve attention.
However, the effects of an MU7 malfunction are mitigated by the presence of the downstream measuring unit MU9. Indeed, MU9 is very likely to be working properly because of its location in the village of Barchidiv, which facilitates access for maintenance and decreases the probability of it being destroyed in the aftermath of the earthquake (the villages being generally located in “safer” zones). This measuring unit is equipped with a gauging unit to detect the significant flow rate decrease that precedes a piping failure flash flood. In addition, and similarly to MU7, MU9 is equipped with automatic flood sensors. Therefore, if the flash flood occurs after a period of two hours of significant flow rate decrease, an evacuation alarm will be automatically triggered by MU9. The response to such a scenario would thus not be affected by a malfunction of MU7.
Situation | Description | Likelihood | Consequences
1 | Earthquake at night on unprepared staff | Medium | Low
2 | MU7 flood sensors (FS) malfunction due to physical inaccessibility for maintenance | High | Medium
3 | MU7 FS malfunction due to long-term decrease in maintenance quality | High | Medium
4 | No site access and no visibility | High | Low
5 | MU7 FS destruction resulting from earthquake | Medium | Medium
6 | Dam house destruction resulting from snow avalanche | Low | Low
7 | Global warming effects | Medium | Low
8 | Dam house destruction resulting from a tsunami | Low | Low
Fig. 13. Expert situation selection.
Yet if the flash flood occurs within two hours after the flow rate decreases, it will only be automatically detected when a 2000 m3/s flow rate passes through the village of Barchidiv. According to Mr. Droz, the system has been designed to trigger the evacuation alarm within 2 minutes from the moment the flood is detected. Therefore, in the worst-case scenario where the flood is only detected at MU9, the evacuation is triggered about 15 minutes after the flood has passed MU7 (i.e. the fifteen minutes needed for the flood to cover the 10 km separating MU7 and MU9). If no proper action were taken to decrease this 15-minute delay in detection time, the two uppermost villages of Barchidiv and Nisur would thus be threatened by a lack of sufficient evacuation time. Mr. Droz estimates the consequences of this delay in terms of human lives at about ten casualties. This low number can probably be explained by the fact that both Barchidiv and Nisur are mostly located in low to medium risk zones, as revealed by the risk mapping established from the flood routing simulations. Moreover, according to Mr. Droz, the villagers are likely to spontaneously evacuate to the safe havens in the event of an earthquake. This assertion will be investigated in a following section concerning the evacuation plan.
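The timing argument above can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted in the text (10 km between MU7 and MU9, about 15 minutes of travel time, and an alarm triggered within 2 minutes of detection). The function names are hypothetical, and adding the full 2-minute trigger time to the travel time is a conservative assumption.

```python
MU7_TO_MU9_KM = 10.0     # distance between the two measuring units (from the text)
TRAVEL_TIME_MIN = 15.0   # flood travel time from MU7 to MU9 (from the text)
TRIGGER_TIME_MIN = 2.0   # upper bound on alarm triggering after detection (from the text)


def implied_front_speed_m_s(distance_km, travel_time_min):
    """Average speed of the flood front implied by distance and travel time."""
    return distance_km * 1000.0 / (travel_time_min * 60.0)


def alarm_delay_after_mu7_min(mu7_working):
    """Minutes between the flood passing MU7 and the evacuation alarm.

    If MU7 works, only the trigger time applies; otherwise the flood must
    first travel to MU9 before it is detected (assumed worst case).
    """
    travel = 0.0 if mu7_working else TRAVEL_TIME_MIN
    return travel + TRIGGER_TIME_MIN


front_speed = implied_front_speed_m_s(MU7_TO_MU9_KM, TRAVEL_TIME_MIN)
nominal_delay = alarm_delay_after_mu7_min(True)
degraded_delay = alarm_delay_after_mu7_min(False)
lost_warning_time = degraded_delay - nominal_delay
print(round(front_speed, 1), nominal_delay, degraded_delay, lost_warning_time)
```

The difference between the two delays is the 15 minutes of warning time that crisis management would need to recover.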
1.5. Crisis management suggestion
However, given the configuration of the early warning system, I argue that the baseline delay of 15 minutes imposed by the automatic reaction time described above can be decreased through proper management of an MU7 malfunction crisis. Such crisis management involves a proper implementation by the SCADA operating crew of what Reason (1990, quoted by Bea 2008) describes as an Observe, Orient, Decide and Act (OODA) loop. In the case of the described crisis, an efficient OODA loop can be described as follows:
- Observe: The goal of the observation phase is to detect the symptoms and abnormal system behavior that reveal the presence of a crisis. In the considered crisis, a level 1 (400 m3/s) flood alarm from MU9 without the corresponding alarm from the upstream MU7 flood sensors is a possible early sign of the crisis induced by a malfunction of MU7.
- Orient: The goal of the orientation phase is to establish and confirm a causal link between the observed symptoms and the crisis-triggering event. In the considered crisis, the malfunction of MU7 has to be inferred by the operating crew on the basis of the absence of the expected level one warning. This inference must ultimately be confirmed by launching a remote testing protocol on MU7.
- Decide: The goal of the decision phase is to take the appropriate decision on the basis of the previously established causal links. The outcomes of an appropriate decision are to mitigate the crisis and return the system to its normal functioning state. In the considered crisis, the decision to evacuate the potentially threatened villages must be taken. Given the uncertainty about the actual occurrence of a flood (due to the acknowledged malfunction of MU7), such a decision may not be easy to take, knowing that a “false” evacuation would be dangerous and costly, and could encourage a “crying wolf” effect (i.e. decrease the awareness of the population in the event of future alarms). On the other hand, the consequences of not evacuating the villagers in the event of a flash flood are obvious.
- Act: The decision taken in the previous step must then be implemented, which in the considered case consists in manually triggering the level three alarms in the threatened villages from the SCADA.
- Observe: Finally, the results of the crisis management strategies must be observed, and the decision taken to reiterate the OODA loop if the crisis is not solved. In the considered case, feedback and visual confirmation from the dam house staff and/or the evacuated population are expected.
If the OODA loop is successfully conducted, an evacuation alarm can be given shortly after a level one 400 m3/s flow rate is detected in Barchidiv, which, depending on the hydrograph of the flood, would leave enough time for a successful evacuation of the threatened villages. This management strategy has been submitted to and validated by Mr. Droz (2008), with the remark that its efficient implementation would be compromised in the event of too sharp a hydrograph peak.
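The Orient and Decide steps of the loop described above can be sketched as simple decision logic. This is a minimal illustration under stated assumptions, not the actual SCADA procedure: the function names, the returned action strings and the "keep observing" branch (for the case where the remote test shows MU7 to be working) are all hypothetical.

```python
from enum import Enum


class Assessment(Enum):
    NORMAL = "normal"
    FLOOD_SEEN_AT_MU7 = "flood detected at MU7"
    SUSPECTED_MU7_MALFUNCTION = "suspected MU7 malfunction"


def orient(mu7_alarm, mu9_level_one_alarm):
    """Interpret the observed alarms (Orient step)."""
    if mu7_alarm:
        return Assessment.FLOOD_SEEN_AT_MU7
    if mu9_level_one_alarm:
        # Level one alarm at MU9 without the expected upstream alarm.
        return Assessment.SUSPECTED_MU7_MALFUNCTION
    return Assessment.NORMAL


def decide(assessment, mu7_remote_test_passed):
    """Choose an action (Decide step); action names are hypothetical."""
    if assessment is Assessment.FLOOD_SEEN_AT_MU7:
        return "automatic alarm, no manual action"
    if assessment is Assessment.SUSPECTED_MU7_MALFUNCTION:
        if mu7_remote_test_passed:
            # Assumed branch: MU7 appears healthy, so repeat the loop.
            return "keep observing"
        return "manually trigger level three alarms"
    return "no action"


state = orient(mu7_alarm=False, mu9_level_one_alarm=True)
action = decide(state, mu7_remote_test_passed=False)
print(state.value, "->", action)
```

In practice the decision also weighs the cost of a false evacuation, which this sketch deliberately leaves out.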
2. Evacuation plan
2.1. Baseline performance
In addition to triggering the alarms, the successful mitigation of the natural hazard described in the baseline scenario involves the efficient evacuation of the villages. The evacuation plan consists of the rapid evacuation of the villagers towards equipped and maintained safety havens in the nearby mountains once the local alarm sirens are activated. A “Disaster Response Team” is nominated in each village to organize and coordinate these evacuations, as well as to supply and maintain the safety havens. This is done in collaboration with FOCUS humanitarian, an NGO that works to raise awareness and provide training among the communities. In that frame, evacuation exercises were ordered in April 2007 by the Usoi department (the LSRMP management authority) in the most critical villages of Barchidiv and Nisur. According to FOCUS humanitarian (Karim 2008, personal correspondence), an evacuation drill was conducted in the village of Barchidiv on April 19th 2007 and yielded an evacuation time of ten minutes, in which the population of 186 was successfully transferred to the safety haven located 500 meters above the village. On the next day, in the downstream village of Nisur, 17 minutes were required for the 242 inhabitants to cover the 800 meters to the designated safety haven.
Assuming the alarm is given by the early warning system within two minutes after the flood is detected at MU7 (see section II.B.1.1), these evacuation times hardly fit within the totals of 12 to 14 minutes and 16 to 19 minutes imposed by the flash flood demand on the villages of Barchidiv and Nisur respectively (see section II.A.2.2). As a matter of fact, in the case of Nisur, the total capacity performance of
19 minutes (i.e. two minutes detection time and 17 minutes evacuation) does not meet the demand. Furthermore, it is important to mention that the evacuation exercises were conducted in favorable conditions (by day and in spring) and on a warned population, to prevent “shocking” and “excessive stress” on the local population (Karim 2008, personal correspondence). Yet several authors have shown (UrbanikII 2000, Duclos 1987, Aboelata 2004, Graham 1999, Sime 1995, Colonna 2001) that the evacuation time performance of a community is shaped by numerous factors that are not taken into account if the exercise is conducted on a warned population. Such factors include the level of local awareness and compliance, the isolation of the community in a crisis situation, environmental factors such as night time and bad weather, the lack of sufficient warning time, and the congestion of the evacuation paths due to panic, confusion and the lack of excess capacity of the local communication system.
Thus, given the limited excess capacity in the population evacuation performance times, and taking into account the evacuation performance shaping factors mentioned in the literature, several crisis situations were generated in which the ability of the evacuation plan to meet the required demand may be compromised by unfavorable factors added to the baseline exercise conditions under which the evacuation times were measured.
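The baseline comparison between capacity and demand can be made explicit as follows. The figures are those quoted in this section (drill times of 10 and 17 minutes, a 2-minute detection time, and demands of 12 to 14 and 16 to 19 minutes); the function names and the definition of margin as demand minus capacity are assumptions made for illustration.

```python
DETECTION_MIN = 2  # alarm triggered within two minutes of detection (from the text)

# Drill evacuation times in minutes (April 2007 exercises).
EVACUATION_MIN = {"Barchidiv": 10, "Nisur": 17}

# Demand imposed by the flash flood, as (lower, upper) bounds in minutes.
DEMAND_MIN = {"Barchidiv": (12, 14), "Nisur": (16, 19)}


def capacity(village):
    """Total time needed: detection plus evacuation."""
    return DETECTION_MIN + EVACUATION_MIN[village]


def margins(village):
    """Margin (demand minus capacity) against the lower and upper demand."""
    low, high = DEMAND_MIN[village]
    needed = capacity(village)
    return (low - needed, high - needed)


for village in EVACUATION_MIN:
    print(village, capacity(village), margins(village))
```

A negative margin means that the time needed exceeds the time available; a zero margin means that there is no excess capacity.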
2.2. Crisis situations
According to previous reports (Palmieri 1999), and thanks to the awareness-raising work of local NGOs among the population, the lack of local knowledge and of perceived threat, which has been mentioned in the literature (Duclos 1987, Colonna 2001) as a key evacuation time performance shaping factor, is not of concern in the present case. Moreover, the low population density of the area also decreases the concern about evacuation delays due to the congestion of evacuation routes (Sime 1995). Yet several other factors may increase the evacuation time to a point where the demand imposed by the flood exceeds the capacity of the system to evacuate the population in time.
Thus, several potential crisis-triggering situations were generated on the basis of the factors that were found in the literature to affect the population evacuation time. The relevance of each scenario was then tested through its submission to an expert’s judgment. As the FOCUS humanitarian staff member responsible for the LSRMP evacuation plan in the Bartang valley, Mr. Rahim Balsara was consulted here. He was asked to evaluate the likelihood (P()) of each scenario, as well as its potential consequences (Cons) in terms of human lives lost. For each situation, his evaluation and comments are given in the table that follows it.
- The first factor to consider is the surprise effect (Colonna 2001). As Colonna states, “With no announced warning, occupants might demonstrate behaviors that could be dangerous under actual emergency conditions”. Therefore, even without taking into consideration the other aggravating factors mentioned in the following paragraphs, the surprise factor alone may be of significant importance and is thus a potential crisis-triggering situation:
Situation 1
The baseline scenario occurs with no aggravating factor other than the fact that the evacuation occurs on a surprise basis.
Remarks
P(): Med. The surprise effect can be mitigated through the crisis mitigation strategy presented further down, where training in the detection of “natural” early signs will be suggested.
Cons: High. Consequences have been estimated by Mr. Balsara as a delay of the evacuation time from 10 to 15-17 minutes for Barchidiv, and from 17 to 20-25 minutes for Nisur.
Fig. 14. Situation 1
Mr. Balsara estimated the effect of the surprise factor on the evacuation time shown in figure 14, with the comment that actual training sessions or drills have not yet been conducted during winter. The figures given above are thus rough estimates. Yet the fact remains that the upper values of 14 and 19 minutes, imposed by the demand on the villages of Barchidiv and Nisur respectively, are exceeded.
- The next two situations may be linked to the surprise factor as well, whereby subsequent panic
or the lack of appropriate and available leadership may significantly affect the evacuation
performance (Duclos 1987, Colonna 2001):
Situation 2
A general panic situation occurs among the
population, as a result of the simultaneous
occurrence of the earthquake and the flood
sirens.
Remarks
P(): Low. Villagers are used to natural hazards around them and would most likely react and respond in a calm manner.
Cons: Low. Once the gravity of the situation is realized, evacuation times would not be impacted. These villagers have received training and general education related to these hazards as well as the alert system.
Fig. 15. Situation 2
Situation 3
Heavy casualties are to be deplored among the
population due to the earthquake. As a result,
the key members of the disaster response team in
the villages are missing when the flood sirens
ring.
Remarks
P(): Low. Members of Community Support Teams (CST) are spread across the villages and unlikely to all be affected at the same time.
Cons: Low. Dependence on CST for evacuation is fairly low, as general community knowledge and training are at play during such a crisis.
Fig. 15. Situation 3
- Furthermore, the obvious effects of external environmental factors affecting orientation and movement (Graham 1999, UrbanikII 2000) are taken into account in the next two situations:
Situation 4
The principal evacuation routes are severely
damaged due to snow or stone avalanches
triggered by the earthquake.
Remarks
P(): Medium. Evacuation paths are designed to be safe from these natural hazards during the planning and design phase.
Cons: High. If this were the case, villagers would need to pursue alternate routes, which might delay reaching the destination.
Fig. 16. Situation 4
Situation 5
The baseline scenario occurs at night in a bad
snowstorm. Therefore visibility is reduced below
10 meters in the whole area.
Remarks
P(): Low. Villagers are familiar with the routes and used to severe weather conditions.
Cons: Low. Less likely to impact evacuation times in a significant manner, as villagers are familiar with these routes and weather conditions.
Fig. 17. Situation 5
- Finally, the criticality of providing a sufficient warning time for the population to evacuate
(Graham 1999, Aboelata 2004) is illustrated in the two last situations, where the alarm could
simply not be transmitted in time by the early warning system.
Situation 6
The earthquake destroys the alarm stations in the
villages. As a result, no evacuation alarm is
given.
Remarks
P(): Medium. These are wireless systems connected via satellite and less likely to be affected by on-the-ground damage.
Cons: High. If these systems do fail to function due to unforeseen circumstances, the evacuation times will be significantly longer, as the warnings would have to come from a neighboring village or from the actual event itself.
Fig. 18. Situation 6
Situation 7
Due to a decrease in maintenance quality,
the alarm equipment in the villages is not
functioning.
Remarks
P(): High. System malfunction probability is high as these equipments are not managed and maintained by Committee of Emergency Situation, USOI department and villagers. Due to joint responsibility its upkeep, regular testing and functioning
Cons: High. In case of system failures, evacuation times will be significantly longer.
Fig. 19. Situation 7
The qualitative risk evaluation linked to each situation is summarized in figure 20. According to the expert, the highest risk with regard to the evacuation of the population can be found in situation 4, where the evacuation routes are made useless as a consequence of the earthquake, as well as in situations 6 and 7, where the alarm is not given at all. Although the possible occurrence of situation 4 would certainly be worrying enough to deserve further study, this paper will focus on the crisis triggered by the failure of the alarm system to order the evacuation of a village. Indeed, the malfunction likelihood of the early warning system (either at MU7 or at the village sirens level) has been considered high by both consulted experts (Droz 2008 and Balsara 2008). The importance of the potential consequences linked to the total absence of a flash flood alarm in the Bartang Valley being obvious, the risk linked to such a crisis is thus maximal.
Situation | Description | Likelihood | Consequences
1 | Evacuation on a surprise basis | Med | High
2 | General panic | Low | Low
3 | No surviving leadership | Low | Low
4 | Evacuation path destroyed by the earthquake | Medium | High
5 | Very low visibility and extremely bad weather conditions | Low | Low
6 | Village alarm destroyed by the earthquake | Medium | High
7 | Village alarm not in function due to poor maintenance | High | High
Fig. 20. Expert situation selection.
2.3. Crisis Analysis
The risk evaluation by expert judgment has led to the selection of the crisis-triggering situation in which the village alarm system does not function properly and fails to issue the evacuation alarm before the passage of the flash flood. According to the expert, the likelihood of an alarm malfunction in the village of Barchidiv or Nisur is “very high”, that is, higher than 85%.
The malfunction of the village alarm units may result in the passage of a super concentrated flash flood through unwarned villages. The consequences of such a crisis in terms of human casualties are considered “medium” in Barchidiv but “high” in Nisur.
Yet, in a similar manner to the other crisis detailed earlier in this paper, I argue that a proper interactive management of the crisis can mitigate the consequences of these malfunctions. Indeed, given the failure of the engineered system to issue the expected early warning, the most obvious crisis management strategy would consist in searching for alternate warning signs. In that light, two possibilities have been identified and considered, in collaboration with Mr. Balsara (2008).
Reported alarm
The first strategy was mentioned by Mr. Balsara as a response to one of the crisis-triggering situations described in the previous section (situation 6), and consists of the evacuation message being transmitted from a neighboring village.
Considering the critical villages of Barchidiv and Nisur, and given the remoteness of the location and the available means of transportation, the alarm message can be transmitted either via radio communication or by a running messenger. These possibilities were submitted to the expert’s judgment, in order to evaluate their probability of occurrence and success. A warning is considered successful when the evacuation of the village is completed before the passage of the flash flood.
Given the distance and the lack of quick means of communication in an earthquake aftermath, the expert considered both the occurrence and the success of a warning transmitted over land negligible. However, the chances of occurrence and success of a radio warning were considered high (60%-85%). Thus, training the villagers to systematically transmit the warning signal to the neighboring village by radio can mitigate the risk linked to a malfunction of the alarm equipment.
However, this strategy is based on the assumption that the alarm malfunctions are not correlated among the villages. In other words, the occurrence of a malfunction in a given village does not affect the likelihood of one occurring in a neighboring village as well. Yet, since maintenance is currently the responsibility of the same entity (the Usoi department) in all the villages, a decrease in the
quality of the maintenance would affect all the villages equally. The likelihood that a given alarm will not function is thus higher if the surrounding alarms are also not functioning due to a lack of maintenance. Therefore, the reported alarm strategy would only be an efficient crisis management option if the procedure of warning the surrounding villages by radio is formalized and the villagers are trained accordingly. Furthermore, the task and responsibility of maintaining the village alarm equipment should be handed over to the villagers themselves. That would decrease the correlation between the alarm malfunctions. In addition, assigning the alarm maintenance task to the people these alarms are designed to save may increase the likelihood that the maintenance is properly performed. Finally, it would build local capacity and empower the communities, which was one of the main goals of the project as stated by the World Bank (see section I.C.3.2 and ElHanbali 2007).
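The effect of correlated alarm malfunctions on the reported alarm strategy can be illustrated with a simple probability sketch. The model and its parameter names are assumptions made for illustration: a village is warned either by its own alarm or, if that alarm fails, by a radio message from a neighbor whose alarm works. The 0.85 failure probability is the lower bound of the expert's “very high” rating, and the 0.7 radio success probability is an assumed value within the quoted 60%-85% range.

```python
def p_warned(p_fail, p_radio, p_neighbor_fail_given_fail):
    """Probability that a village receives a warning.

    p_fail: probability that the village's own alarm malfunctions.
    p_radio: probability that a radio warning is sent and succeeds.
    p_neighbor_fail_given_fail: probability that the neighboring alarm
        also malfunctions, given that this village's alarm has failed.
    """
    own_alarm_ok = 1.0 - p_fail
    relayed = p_fail * (1.0 - p_neighbor_fail_given_fail) * p_radio
    return own_alarm_ok + relayed


P_FAIL = 0.85   # lower bound of "very high" (from the text)
P_RADIO = 0.7   # assumed value within the quoted 60%-85% range

# Independent malfunctions: the neighbor fails with the same probability.
p_independent = p_warned(P_FAIL, P_RADIO, P_FAIL)

# Fully correlated malfunctions: the neighbor always fails as well.
p_correlated = p_warned(P_FAIL, P_RADIO, 1.0)

print(round(p_independent, 3), round(p_correlated, 3))
```

In this simplified model, full correlation removes the benefit of the radio relay entirely, and even with independent malfunctions the gain stays limited as long as the malfunction probability itself remains this high, which is why improved maintenance matters alongside decorrelation.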
Direct Observation
The second strategy was suggested by Mr. Droz (2008), and is based on the capacity of the villagers to notice and react to the chain of natural events that would precede the flash flood in the baseline scenario. Namely, the life-saving strategy relies on the reflex of the Community Support Teams in the villages to spontaneously take the appropriate decision to conduct a safety evacuation once a strong earthquake and/or significant flow rate variations in the river are observed.
A spontaneous decision of the villagers to evacuate the village shortly after the occurrence of the earthquake, and thus before the occurrence of the flash flood, would remove the negative consequences of a malfunction of the alarm system by giving more than enough time for a successful journey to the safety havens. Such a reaction is thus a priori desirable. Yet the occurrence and success of such a spontaneous decision depend on a chain of likelihoods that frame the decision-making process within the community. In the same manner as in a preceding paragraph, these crisis management stages are described here in the frame of Reason’s (1990) OODA loop:
- Observation: The first likelihood concerns the probability that these preceding phenomena are observed and noticed by the villagers, despite possibly difficult environmental conditions resulting in low visibility. In the considered case, these phenomena are an earthquake and a strong river flow rate variation. The likelihood that the villagers would notice these events is considered “medium” (40%-60%) and “very high” (>85%) for the earthquake and the flow rate variation respectively. Yet it must be noted that these probabilities depend greatly on local conditions, mainly environmental, that could affect the detection ability of the villagers.
- Orientation: An inference must then be made between the detected phenomenon and the imminent threat of a flash flood. There is thus a certain likelihood that these events are properly interpreted by the Community Support Teams (CST), which may then issue an evacuation order. This likelihood is considered “high” (60%-80%) by the expert.
- Decision: The evacuation order from the CST must then be followed by the villagers, despite the absence of a tangible and unmistakable evacuation signal, such as alarm sirens. Therefore, the likelihood must be considered that the whole village will actually follow the evacuation order from their fellow villagers of the CST in the absence of an alarm. According to the expert, an issue can be found here in the absence of a formalized backup village alarm system that can be unmistakably activated and followed in case the main system fails to work. Such a system can be as simple as a village gong, but must be implemented.
- Action: The evacuation order must then be transmitted to the entire population of the village in a minimum time and, again, in the absence of an unmistakable alarm siren. However, if the entire population is reached and warned, the chances of a successful evacuation are “very high” (>85%) according to the expert.
- Observation: Finally, the observation of the effectiveness of the intended actions is critical. Specifically, in the present case, the piping phenomenon and the resulting flash flood can occur up to hours after the triggering earthquake. Therefore, it is critical that the entire population stays in the safety haven until the risk is confirmed to have decreased to an acceptable level.
If the OODA loop is successfully conducted, the villagers themselves can give an evacuation alarm shortly after the occurrence of the early phenomenon, that is, several hours before the occurrence of the flood. Furthermore, this warning mechanism would be totally unaffected by any of the hardware malfunctions of the system that are considered here. Finally, if we consider the OODA loops for the two considered warning signs (earthquake and flow rate variation) as a parallel configuration of two series systems, the overall chance of success estimated by the expert is “high”, which provides internal validation for this strategy.
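The “parallel configuration of two series systems” mentioned above can be written out as a short reliability calculation. The stage probabilities below are placeholders chosen within or at the edge of the ranges quoted in the text (midpoints of 40%-60% and 60%-80%, and the 85% lower bound of “very high”); the decision-stage probability is not quantified in the text and is purely an assumption, as is the independence of the two chains. The result is therefore an illustration of the method and not a reproduction of the expert's estimate.

```python
from math import prod


def series(probabilities):
    """All stages must succeed: product of stage probabilities."""
    return prod(probabilities)


def parallel(probabilities):
    """At least one chain must succeed (chains assumed independent)."""
    return 1.0 - prod(1.0 - p for p in probabilities)


P_DECIDE = 0.8  # assumption: the text gives no figure for the decision stage

# Stages: observation, orientation, decision, action.
earthquake_chain = [0.5, 0.7, P_DECIDE, 0.85]
flow_variation_chain = [0.85, 0.7, P_DECIDE, 0.85]

p_earthquake = series(earthquake_chain)
p_flow = series(flow_variation_chain)
overall = parallel([p_earthquake, p_flow])

print(round(p_earthquake, 3), round(p_flow, 3), round(overall, 3))
```

The overall value is quite sensitive to the assumed stage probabilities, in particular to the unquantified decision stage, which is consistent with the expert's call for a formalized backup village alarm.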
III. Evaluation
A qualitative analysis of the robustness of the Lake Sarez Risk Mitigation Project (LSRMP) has been undertaken in this work. The evaluation consisted in inquiring whether the capacity depicted by an expert assessment meets the demand in the considered situations. In the absence of a crisis, the baseline demand of at least 12 and 16 minutes is hardly met by the baseline capacity of 12 and 19 minutes in the critical villages of Barchidiv and Nisur respectively. However, knowing that most of both villages is located in relatively safe zones, the consequences of this lack of excess capacity have been estimated at fewer than 10 casualties (Droz 2008). As a matter of fact, the LSRMP has led to a significant decrease of the risk in the Bartang valley, to orders of magnitude similar to those nowadays accepted for engineered dams (Droz 2007), which must be considered a success given the remoteness and lack of infrastructure of the area. Furthermore, the effects of most of the considered potential crisis situations on the reliability of the LSRMP were considered “low” by the experts in terms of likelihood or consequences. This denotes a sufficient robustness of the system with regard to most of the considered crises, due to its configuration, correlation, and local excess capacity.
However, the expert assessments have also revealed that the probability of occurrence of the riskiest crisis-triggering situations is highly dependent on the quality of the maintenance of the infrastructure, for both the engineered early warning system and the population evacuation plan.
In that light, given (1) the non-negligible level of natural hazard risk linked to Lake Sarez (fig. 5) and (2) the concerns expressed about the future staffing and funding of the system, as well as the decrease in the quality of maintenance (section I.C.3.4 and expert assessments), a crisis similar to those analyzed in this paper is likely to happen in the medium to long term, and will thus require efficient interactive management to mitigate its effects.
The following recommendations are therefore given as possible paths to address this issue and thus limit the related risk through an optimized crisis mitigation and management strategy.
IV. Recommendations
The following recommendations were issued on the basis of the theories and methods drafted by Professor
Robert Bea, from the University of California at Berkeley, as part of his research on interactive
crisis management strategies. The fact that Professor Bea based his work on 500+ accident cases (Bea
2008) itself provides external validation for these recommendations. Additionally, an internal validation
in the specific context of Lake Sarez will be sought through the submission of this paper to the local
experts.
A. Proactive measures: towards a High Reliability Organization
The term “High Reliability Organization” (HRO) describes “organization[s] that have operated nearly error
free for a long period of time” (Bea 2008). Such organizations include nuclear aircraft carriers, nuclear
power plants and air traffic control systems; the common characteristic of all these systems is the extreme
consequences of a failure. In that light, and given that its very purpose is to be reliable
enough to decrease the risk linked to the Usoi dam, it is my opinion that the LSRMP should strive towards
being an HRO as a proactive measure to prevent crises. This section highlights the HRO principles and
characteristics (Roberts 1990, Bea 2008) that are particularly relevant to the LSRMP.
1. Safety oriented organizational culture
The first relevant common characteristic of HROs is that they place safety at the very root of their
organizational culture. In other words, such
organizations are primarily preoccupied with failures and crises, at every level of management.
Such a state of mind is crucial among the actors involved in the LSRMP, as it would allow a quick
detection and remediation of would-be crises, as well as a more efficient management of the crises that do
occur. In that light, the decision of the Usoi department to build the dam house at a lower
altitude than advised by the expert, in order to “spare their men” (Droz 2008), is a worrying
detail. Indeed, although not necessarily significant in terms of direct risk increase, this
decision may reveal a deeper and more worrying
mindset, where safety requirements are
compromised for an economy of labor, even before the operational phase of the project.
2. Open communication and extensive process
auditing
According to Merry (1998), a safety oriented
organizational culture involves, among other things, ongoing safety performance
measurements and openness of communications within the organization. Moreover, extensive
process auditing procedures and actions are
mentioned by Roberts and Libuser (1993) as key aspects of an HRO. Therefore, the operating
authority’s lack of willingness to be audited is worrying at the very least. Indeed,
despite numerous attempts, no relevant answer was obtained from the Usoi department in
the course of preparing this paper. Furthermore, Droz (2008) mentioned a decrease in the quality of the safety
reports issued by the organization, as well as a
break in the communication between himself (the
designer) and the current operators of the project.
3. Migrating decision making
Migrating decision making is mentioned by
Roberts (1989) as another key aspect of HROs, whereby “authority is pushed to the
lower levels of the organization [and…] decision making responsibility allowed to migrate to the
persons with the most expertise to make the decision when the situation arises (employee
empowerment)” (Bea 2008). Yet in the case of the Usoi Department, most operational decisions
concerning maintenance (Droz 2008), real time management of crises (Droz 2008) and training
(Karim 2008) are reported to be left to Kadam Maksaev, the Usoi Department
head, in Dushanbe (see footnote 2). Such a decision making
scheme does not optimally allocate the available cognitive resources of the organization to
improve its safety and may eventually compromise its ability to efficiently resolve an
occurring crisis.
Therefore, maintaining the reliability of the LSRMP on a long-term basis would involve a deep shift in the
organizational mindset towards a safety oriented culture; a better match between decision
making power, responsibility and relevant knowledge; and the restoration of communications with potentially
useful parties outside the organization, coupled with an internal willingness to be audited. In addition, several
other HRO characteristics could help decrease the likelihood and
optimize the management of crises in the case of the LSRMP. Such principles include a commitment to
resilience, an emphasis on training and selection, and a proper management of incentives, all of which will
be further described in the following paragraphs.
B. Interactive measures: staying awake!
The importance of properly managing the real time occurrence of a crisis, in order to limit its effect and
restore a proper functioning of the system, has been described at the beginning of this paper (section I.B).
Considering the selected crisis situations, a specific interactive crisis management strategy has been
suggested to minimize the time required to both warn and evacuate the population (sections II.B.1.5. and
II.B.2.3.). The goal of the following section is to walk through the OODA loops of both strategies to identify
and summarize the main system characteristics necessary for their successful application.
Footnote 2: Dushanbe is located more than 400 km from the Bartang Valley.
Furthermore, the only available helicopter was reported to have crashed in May 2008 (Droz 2008).
1. Early Warning System Crisis Management
Observation
The symptom detection time depends on the proper functioning of the system’s sensory abilities. In
the population evacuation case, these sensory abilities are the villagers themselves, who should
be able to quickly detect any potential crisis-forecasting event in their environment. This goal
is achieved by raising awareness through training and education. Fortunately, risk assessments
(Droz 2007) have revealed that the trigger of any natural hazard linked to Lake Sarez is almost
certainly a strong earthquake, which is rather obvious to detect. In the early warning system
case, the detection time is influenced by the flood sensors at MU9, as well as the ability of the
system to properly transmit the alarm to the
operations headquarters in Dushanbe, by avoiding unfavorably correlated components (e.g.
components in a parallel configuration that are threatened by the same source of risk). In the case
of Lake Sarez this is achieved through the low correlation between the probabilities of failure of MU7 and
MU9 (section II.B.1.5). However, the symptom detection time also depends on an uncontrollable
characteristic of the “demand” on the system, that is, the morphology of the flood. Indeed, the
strategy would only be effective for progressively increasing floods (Droz 2008), where sufficient
time is available between the passage of a 400 m³/s flow rate and that of a level three 2000 m³/s flow.
If the flood is too sudden, the available time is too
short to effectively perform the prescribed
actions.
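The remark on unfavorably correlated components can be illustrated numerically. The sketch below is a minimal Python example, not part of the expert assessment: it applies the standard identity for two binary failure events, P(both fail) = p1·p2 + ρ·sqrt(p1(1−p1)p2(1−p2)), to two redundant components. The function name `prob_both_fail` and the failure probabilities are hypothetical and do not represent measured values for MU7 or MU9.

```python
import math


def prob_both_fail(p1, p2, rho):
    """Probability that two components fail together.

    p1, p2: individual failure probabilities (hypothetical values).
    rho: correlation coefficient between the two failure events.
    """
    joint = p1 * p2 + rho * math.sqrt(p1 * (1 - p1) * p2 * (1 - p2))
    # Clamp to the range that is feasible for a joint probability.
    return min(max(joint, max(0.0, p1 + p2 - 1.0)), min(p1, p2))


# Hypothetical failure probabilities for two redundant monitoring units.
p_a, p_b = 0.05, 0.05
for rho in (0.0, 0.5, 1.0):
    print(f"rho={rho}: P(both fail) = {prob_both_fail(p_a, p_b, rho):.4f}")
```

With these illustrative numbers, the probability of losing both units grows from the product of the individual probabilities (no correlation) to the probability of a single failure (full correlation), which is why a low correlation between redundant units matters.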
Orientation
The orientation mainly depends on the ability of
the involved persons to decode the symptoms and to attribute them to their actual cause or
consequence. This ability is described by Weick (1995) as “sensemaking” and is a critical
component to the success of the crisis management strategy.
Decision
The decision time depends on the ability of the
responsible persons to apply a clear and efficient decision protocol, whereby the decision power in
a given situation is attributed to the person best able to exercise it. This ability
includes a deep enough experience and knowledge of the specific field context to be able
to appropriately visualize the situation (i.e. “to step in the victim’s shoes”), the ability to generate
and browse numerous decision options, and the ability to instantly be aware of the probable
consequences of any possible solution. In other words, the decision maker must be able to
exercise empathy, improvisation and mental simulation in a stressful and time limited situation
(Bea 2008).
Action
Finally the minimization of the action time requires a good coordination within the
communities and the crisis management crew, as well as the system’s ability to efficiently transmit
information to the village alarms.
Most importantly, it should be noted that the
success of the described strategy mainly depends on human and organizational qualities. It is thus once
again the responsibility of the top management authority to sustain these qualities by nourishing a
safety-based organizational culture that includes what
Weick et al (1998) call a “commitment to resilience”, which involves a formal support for improvisation in
crisis situations, and the active building of the organization’s cognitive resources. Yet this critical
aspect can only be achieved through an appropriate training and selection policy.
2. Population Evacuation Crisis Management
The strategy suggested in the following section is
based on the strategies suggested by or submitted to the consulted experts (Mr. Balsara and Mr.
Droz). It aims to evacuate the villages at risk in time through a triple
barrier strategy. The first barrier is a spontaneous evacuation of the population at the observation of
natural early warning signs. If the villagers fail to evacuate, the second barrier is the siren alarm from the
engineered warning system. If the villagers still fail to evacuate due to a system malfunction, the last barrier
is a reported alarm via radio communication from a neighboring village.
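The benefit of layering three barriers can be sketched with a simple calculation. The following minimal Python example assumes, for illustration only, that the three barriers succeed independently of each other; the function name `prob_at_least_one_barrier` and all probabilities are hypothetical and are not taken from the expert assessments.

```python
def prob_at_least_one_barrier(barrier_success):
    """Probability that at least one barrier succeeds, assuming independence."""
    p_all_fail = 1.0
    for p in barrier_success:
        p_all_fail *= (1.0 - p)
    return 1.0 - p_all_fail


# Hypothetical success probabilities, for illustration only.
barriers = {
    "spontaneous evacuation": 0.6,
    "siren alarm": 0.9,
    "reported alarm via radio": 0.5,
}
print(round(prob_at_least_one_barrier(barriers.values()), 3))
```

If the barriers share a common cause of failure, the independence assumption no longer holds and the combined probability of success is lower than this sketch suggests.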
2.1. Spontaneous evacuation
Despite the presence of an engineered early warning
system, I argue that the success of the evacuation plan must primarily be based on the evacuees themselves
for the following reasons. First, I believe that their own security must ultimately rely on themselves, rather
than on an unfamiliar and complex engineered system that is far from infallible, as the experts’
assessments have shown. Secondly, I believe that the extremely valuable capital of local knowledge among
the villagers must be used to mitigate their risk situation. Finally, the possibility that the villagers
would spontaneously evacuate towards the mountains at the occurrence of a major natural hazard has
been considered by both Mr. Droz and Mr. Balsara.
Therefore, the training and formalization of such a
process through the OODA loop described in section II.B.2.3 is likely to increase its effectiveness. Each
step of this loop is considered in the following paragraphs, with specific implementation
recommendations that were issued in collaboration with Mr. Balsara.
- Observation:
The local capacity to observe and notice specific early warning signs may be increased through
training. According to Mr. Balsara, it is important that the training be adapted to the multiple layers
of the population, including an awareness
campaign for children in schools. Yet, most
importantly, it must keep the people alert to natural signs by helping to avoid the pitfalls of
overconfidence (i.e. being “used” to the danger) and overreliance on the technology alone.
- Orientation:
The level of awareness of the Community Support Teams (CST) and their ability to interpret
and react to early natural signs must be ensured through frequent checks and testing by an outside
implementing agency (e.g. FOCUS Humanitarian or the Usoi Department).
- Decision:
The capacity of the CST to decide on an evacuation in a crisis situation must be tested and improved
through frequent and targeted drills and training.
- Action:
Finally, a formalized protocol must be established to alert the whole village in case of an alarm
malfunction. Such a protocol may involve the use of traditional alarm techniques such as a village
gong.
2.2. Village alarm system
The second barrier relies on the capability of the early
warning system to detect the danger and trigger the village alarm systems. This system and the linked
crisis management strategies and OODA loop have been described and detailed in sections II.B.1 and
IV.B.1.
2.3. Reported alarms
Finally, in the event of failure of the first two evacuation
strategies, the ultimate crisis management strategy on which to rely is the collaboration among the villagers
in a crisis situation (Balsara 2008). This solution was investigated in section II.B.2.3. The two principal
elements that were found to potentially increase the
likelihood of success of the strategy are summarized here:
- The reflex to warn the neighboring villagers upon
reception of a level 3 evacuation alarm must be formalized and specifically trained. Moreover,
villages must be provided with efficient and
suitable inter-village alarm transmission means. Such means may include radio communications,
but also more traditional devices (e.g. visual signals) in the event of a radio
malfunction.
- Finally, responsibility for maintaining the elements of
the early warning system that are within the village must be given to the villagers. In addition
to decreasing the correlation between the village alarm systems, it would perhaps increase the
maintenance level of the device, given the villagers’ obvious motivation. It
would also help build and value local capacity.
C. Training: Empowering the human factor
Maintaining awareness, sensemaking, empathy, improvisation, mental simulation and coordination
capabilities among the human components of the system requires the active sustaining of the
organization’s cognitive resources, which must involve training and selection. Moreover, both
consulted experts have mentioned training and proper staff selection as the key measures needed to sustain
the long-term reliability of the system. Therefore, although the consulted experts have expressed no
concerns about the current preparedness level of both the operators and the local population, the following
section addresses the issue of training, since “the cognitive skills developed for crisis management
degrade rapidly if they are not maintained and used” (Bea 2008). According to Bea (2008), training should
occur on three levels:
1. Normal Situations
First, the operators should be trained in the
system’s normal operation in order to bring
all the commonly performed tasks into their skill
based cognitive realm. In other words, the result
of the training must be a skill-based performance of the routine tasks that does not mobilize
excessive cognitive resources. In the case of Lake Sarez, training for normal operations involves the
proper maintenance of the hardware and infrastructure parts of the system, including the
critical maintenance operations on the flood sensors and the village alarm stations.
2. Abnormal Situations
Secondly, operators should be trained to handle abnormal situations, whereby known but
unexpected threatening events are imposed. Such training is to enable the prescribed restoration
procedure to be performed on a rule-based basis, rather than on a slower knowledge basis. As a
result, the OODA loop reasoning, which would occur in the management of an untrained
situation, could be cut short to bypass the orientation and decision stages (see footnote 3).
In the case of the
early warning system, such a trained rule would include the direct launch of a testing procedure on
MU7 if a level one flood alarm were given by MU9 alone. Furthermore, the local population can
be trained toward rule-based reasoning through multiple evacuation exercises and perfect
knowledge of the evacuation procedure.
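The idea of a trained rule that links an observed symptom directly to a prescribed action can be sketched as a simple lookup. The following is a minimal Python illustration under stated assumptions: the encoding of alarm levels as integers (0 meaning no alarm), the function name `trained_rule` and the action string are all hypothetical, and only the single rule quoted in the text is encoded.

```python
from typing import Optional


def trained_rule(mu9_level: int, mu7_level: int) -> Optional[str]:
    """Return the prescribed action for a known abnormal symptom, if any.

    Levels are flood alarm levels reported by each unit (0 = no alarm).
    """
    # Rule from the text: a level one alarm given solely by MU9
    # leads directly to a testing procedure on MU7.
    if mu9_level == 1 and mu7_level == 0:
        return "launch testing procedure on MU7"
    # No trained rule matches: fall back to the full OODA loop.
    return None


print(trained_rule(1, 0))
print(trained_rule(1, 1))
```

When a rule matches, the operator moves straight from observation to action; when none matches, the return value `None` stands for the slower, knowledge-based reasoning through all four OODA stages.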
Footnote 3: The procedure to adopt is directly linked to the observed symptoms by a trained rule.
3. Unbelievable situations
Finally, training for extraordinary “unknown unknowable” situations is crucial to sharpen
people’s cognitive skills so that they react properly when real time crisis management is needed. Indeed,
being exposed to unexpected and unforeseeable situations is the best way to train the cognitive
qualities that are mentioned above as critical to
accelerating the OODA loop sequence. In the case
of the early warning system, such training may
for example focus on the need to manually
trigger the evacuation alarm, in a situation of high cognitive stress, as soon as major irreversible
malfunctions are detected and confirmed (such as the malfunction of MU7 in the example started
above). Furthermore, training for unexpected
situations should also be applied to the local
population, through unexpected evacuation drills and the building of the reflex to flee towards
the safe haven upon any serious suspicion of a threatening event. The National Fire Protection
Association (NFPA) Life Safety Code (2001) states that “Fire is always unexpected. If the drill is
always held in the same way at the same time, it loses much of its value. When, for some reason
during an actual fire, it is not possible to follow the usual routine of the emergency egress and
relocation drill to which occupants have become accustomed, confusion and panic might ensue.
Drills should be carefully planned to simulate actual fire conditions.” Due to their unexpected nature and the necessity to evacuate the
population in a limited given time, crises on the LSRMP must be managed in a similar manner.
Yet, as mentioned in a previous section, the Usoi Department applies the policy of not conducting
any unexpected evacuation drills with the local population, precisely to avoid the panic and
emotion that would sharpen the cognitive capabilities of the people.
In addition to the crucial importance of training, a
proper staff selection process is essential to ensure that the key powers and responsibilities are attributed to
the right persons, that is, people possessing the needed skills and cognitive potential to be efficient in a crisis
situation. Yet such a selection can only be undertaken if the proper incentives are put in place to attract and
retain sufficiently talented and trained staff.
V. Conclusion
A qualitative assessment of the reliability of the Lake Sarez Risk Mitigation Project (LSRMP) and its interactive
management has been conducted on the basis of expert judgment. As a result, several potential
improvement areas have been brought to light and recommendations issued. Such recommendations
include the need to strive towards High Reliability Organization (HRO) standards through a safety
oriented organizational culture, improved communication and decision making migration; the
need to improve interactive crisis management skills such as maintaining awareness, sensemaking,
empathy, improvisation, mental simulation and coordination capabilities; and the key importance of
training, in normal, abnormal and extraordinary conditions.
The goal of the qualitative assessment was deliberately
limited to bringing these potential improvement areas to light, yet a quantitative analysis of the
situation would be the logical next step in the documentation of the reliability of the LSRMP in a
crisis situation. However, more theoretical and methodological research would then be required,
notably on a way to quantify the effect of cognitive and environmental factors (such as surprise and night
time) on the evacuation time of the villages.
Furthermore, it is evident that the implementation of
all the suggested measures will only effectively take place if the proper framework of incentives and means
is provided. Indeed, incentives are needed at the operator level to prevent the exodus of skilled staff
and thus ensure the durability of the system through
proper maintenance, as well as sustained cognitive
skills and local knowledge. Furthermore, incentives are needed at the management level to effectively promote
a safety oriented organizational culture and lead the organization towards becoming an HRO. Finally, incentives are
needed at the government level to provide the project with sufficient means to achieve its goals, once
international funding has ceased. Although a variety of means can be thought of to create incentives based
on human wants and needs, including regulation and peer recognition, the incentives that have been
recognized as most efficient (Bea 2008) in the
implementation of reliability measures over the long term are positive (i.e. rewards rather than
punishment) and financial, and are often rooted in a societal increase of safety and reliability requirements.
Therefore, the most efficient incentives to provide long term funding sustainability to the LSRMP are likely to
be found in the realms of international commerce and politics, and public opinion. Thus, a proper financial,
social and economic study of the long-term sustainability of this endeavor is yet to be undertaken.
As a conclusion, the main principle to bear in mind at
the end of this paper is the importance of not relying on the technology alone when managing a risk
mitigation system. Technology can fail, especially in the given context of harsh environmental conditions
and decreasing maintenance quality. As a matter of fact, technology will fail; and by failing will create a
crisis if no robust measure is taken. Moreover, the engineered alarm system shows in the present case a
very limited excess capacity of a few minutes at most, while the first signs of the phenomenon (i.e. the
earthquake) would occur, and could thus be interpreted, with a time margin of several hours. Therefore,
although the engineered early warning system is here clearly necessary to ensure the warning of the entire
valley with an acceptable probability in the event
of a flash flood, the potential effect of the grass-roots capacity building of the communities through
intensive training and education, and based on their extensive knowledge of the location, must not be
neglected and may result in a complementary and much more efficient means to mitigate the risk in the
Bartang valley.
Finally and more fundamentally, it is most important to acknowledge the impossibility of entirely controlling
the risk linked to natural phenomena of the magnitude of Lake Sarez; and perhaps someday, in the
dreaded aftermath of a catastrophe, to have the
humility to consider the perhaps only reasonable
reactive approach to a near miss, given the amplitude of the consequences: withdrawing, migrating and
resettling, and ultimately leaving to Mother Nature the conclusion of the story.
VI. Acknowledgement
The author is indebted to the consulted experts, Mr. Patrice Droz (Stucky Ltd, Switzerland) and Mr. Rahim Balsara (FOCUS Humanitarian, Tajikistan) for their invaluable contribution. Furthermore, Ms Michèle Itten, Mr. Mustafa
Karim, Dr Robert Bea and Mr. Rune Storesund are acknowledged for their valuable advice.
VII. References
[1] Aboelata M et al, “Transportation model for evacuation in estimating dam failure life loss”, Proceedings of the Australian Committee on Large Dams Conference, 2004
[2] Balsara R, Personal Correspondence, April 2008
[3] Bea R, “Human and Organizational Factors: Quality and Reliability of Engineered Systems”, Course Reader, 2008
[4] Bea R, “Managing the Unpredictable”, ASME feature article, 2008
[5] Colonna G, “Introduction to Employee Fire and Life Safety”, National Fire Protection Association, 2001
[6] Droz P, “Abstracts Related to Risk Mitigation of the LSRMP Final Report”, Stucky Ltd, 2007
[7] Droz P, Spasic-Gril L, “Lake Sarez Risk Mitigation Project: A Global Risk Analysis”, ICOLD, 2006
[8] Droz P, Personal Correspondence, February - April 2008
[9] Duclos P et al, “Community evacuation following a chlorine release, Mississippi”, Disasters 11 (4), 1987
[10] El-Hanbali U, “Implementation Completion and Results Report […] for Lake Sarez Risk Mitigation Project”, World Bank, 2007
[11] “Flood on Stava valley”, Seconds from Disaster, National Geographic Channel, 2008
[12] Gayosov A, Personal Correspondence, April 2008
[13] Genevois R, Ghirotti M, “The 1963 Vaiont Landslide”, Giornale di Geologia Applicata, 2005
[14] Graham W, “A Simple Procedure for Estimating Loss of Life from Dam Failure”, Dam Safety Office, US Bureau of Reclamation, 2001
[15] Karim M, Personal Correspondence, April 2008
[16] Merry M, “Assessing the Safety Culture of an Organization”, J. Safety and Reliability Society, 1998
[17] Life Safety Code, National Fire Protection Association, 2001
[18] Palmieri A, “Project Appraisal Document […] for Lake Sarez Risk Mitigation Project”, World Bank, 2000
[19] Palmieri A, “UN/IDNDR Interagency Risk Assessment Mission to Lake Sarez”, UN-OCHA, 1999
[20] Papyrin L, “Myths on Lake Sarez risk mitigation and realities”, Ferghana information agency, 2007
[21] Risley et al, “Usoi Dam Wave Overtopping and Flood Routing in the Bartang and Panj Rivers, Tajikistan”, USAid Water Resource Investigation Report, 2006
[22] Roberts K, “New Challenges in Organizational Research: High Reliability Organizations”, Industrial Crisis Quarterly, 1989
[23] Roberts K and Libuser C, “From Bhopal to Banking, Organizational Design Can Mitigate Risk”, Organizational Dynamics, 1993
[24] Roberts K, “Some Characteristics of High Reliability Organizations”, Organization Science, 1990
[25] Schuster R, Alford D, “Usoi Landslide Dam and Lake Sarez, Pamir Mountains, Tajikistan”, Environmental Engineering and Geoscience, 2004
[26] Sime J, “Crowd Psychology and Engineering”, Safety Science, 1995
[27] “Tajikistan Lake Sarez”, Fela Planungs AG Website, 2004 (http://www.fela.ch) (02/25/08)
[28] Urbanik II T, “Evacuation time estimates for nuclear power plants”, Journal of Hazardous Materials, 2000
[29] Weick K, “Sensemaking in Organizations”, Thousand Oaks, CA: Sage, 1995
[30] Weick K et al, “Organizing for High Reliability: Processes of Collective Mindfulness”, Research in Organizational Behavior, 1998.