Human and Organization Factors: Quality & Reliability of Engineered Systems- CE290/2008/1

Lake Sarez: Interactive Crisis Management on the Highest Dam in the World

Marc F. Muller

Abstract: This paper evaluates the reliability of a natural dam safety system in a crisis situation. The system assessed

in this work is the early warning and population evacuation plan linked to the Usoi natural landslide dam on Lake Sarez in Tajikistan. Based on experts’ opinions, both the capacity of the system to issue flash flood early warning

alarms and its capacity to safely evacuate the population are found satisfactory in most crisis scenarios. Yet situations involving the malfunction of key hardware components are found to create crises that may compromise the reliability

of the system. Recommendations are issued to favor a proper interactive management of these crises by improving the performance of the human components of the system. Such recommendations include the requirement of High

Reliability Organization standards as well as an emphasis on training and selection. Finally, the suggestion is made to put the emphasis on the education, training, local knowledge and judgment of the local population, rather than solely

relying on the technology embedded in the system’s hardware, for such a crucial matter as flood safety.

Subject Headings: Interactive Systems, Dam Safety, Risk Management, System Reliability

I. Background

A. Introduction

This paper is written to apply the principles of interactive approaches to assess and mitigate the risks

linked to human and organizational factors in the

management of a rapidly evolving crisis. Specifically, the study will be conducted in the context of the safety

of the highest dam in the world (Palmieri, 1999), the Usoi natural landslide dam on Lake Sarez, in the Pamir Mountains of Tajikistan.

Fig. 1. Satellite Image of Lake Sarez, Tajikistan

[http://veimages.gsfc.nasa.gov/2388/ISS002-ESC-7771_lrg.jpg]

Although recent studies (Droz 2007) revealed a probability of failure comparable to that accepted for engineered man-made dams, the consequences that a hypothetical failure would have for the whole region, owing to its size, are so high that a state-of-the-art early warning system was installed between 2000 and 2006 with international funding. The system is designed to issue several alarm levels upon detection of an early sign of failure and includes a direct evacuation alarm in the downstream villages in case a flash flood is detected. Although the system is highly automated, human factors remain important components of its reliability: among other things, they ensure the proper maintenance and operation of the automated elements and the interactive management of unknown unknowable events in crisis situations. A meta-study of more than 600 engineering failures showed that merely 20% of the accidents were directly due to an intrinsic failure of the engineered system (Bea 2008). The remaining 80% could be linked to a human malfunction. These individual human malfunctions can be traced to several deeper roots, including organizational malfunctions, procedure flaws, dysfunctional hardware or an inadequate working environment. They either directly cause the system failure (extrinsic cause) or create a situation prone to an intrinsic failure. In the case of Lake Sarez, owing to the extreme remoteness of the location, correct maintenance is especially critical to prevent an intrinsic failure of the system.

Since some informal concerns have been expressed about a decrease in the number of qualified local operating personnel (Droz 2008, personal communication), this paper intends to evaluate the robustness of the interactive management of the system in the event of a crisis caused, directly or indirectly, by a human malfunction.

B. Interactive Crisis Management

A crisis can be defined as “ a developing sequence of

events in which the risks associated with the system

increases to a hazardous state […] and occur when

improbable events are joined and produce an

evolutionary and interactive complexity in the

performance of a system.” (Bea 2008). Such situations

are especially likely to occur in high-risk, high-uncertainty systems such as those intended to mitigate natural hazards. Although such systems are designed to correctly mitigate the targeted natural hazard (in which case no crisis occurs as such), there is a chance that the standard procedures and processes are disturbed, leading to an unexpected abnormal situation


where the system is unable to deliver the outcome for which it was designed. Such a state is a crisis situation. As mentioned above, the disturbance leading to a crisis is most often a combination of events. These events are generally characterized by high unpredictability (hence a low probability of occurrence) and high potential consequences. They are often fundamentally the result of “human operators ‘pushing the envelope’ and thereby breaching the safety defenses of an otherwise safe system” (Bea 2008). Such human malfunctions are often violations whereby people “do what they should not be doing” (Dougherty 1995, quoted in Bea 2008). Such violations could include the failure to properly accomplish the required maintenance tasks of the safety systems. Another common source of disturbing events can be classified as unknown unknowables, which include unpredictable damaging events such as the simultaneous occurrence of another natural hazard.

Such a chain of events creates a crisis situation which, in a natural hazard context, often turns out to be a rapidly evolving crisis in which time is a critical factor. In the case of Lake Sarez, numerical models expect a massive flood to reach Barchidiv, the uppermost village of the valley, less than 30 minutes after its generating event (Zaninetti 2000, quoted by Schuster 2004). Indeed, rapidly evolving crises are characterized by the time factor and the urge to rescue the system as quickly as possible. Such a situation places considerable stress on the human operators, which often degrades their cognitive performance, eventually leading to vagabonding, retreating and cognitive blackouts in extreme cases (Bea 2008). As a matter of fact, rapidly evolving crises lead to a situation where the cognitive abilities of the human operators are often at their lowest at the very moment they are most needed. Furthermore, if the occurrence of the crisis situation itself can be linked to the already poor overall reliability of the human factor (e.g. due to limited manpower, as may be the case here), a dangerous, disaster-prone situation ensues.

Three complementary approaches are known to mitigate the risks linked to a system (Bea 2008). A proactive approach encompasses the actions taken to prevent the occurrence of a crisis, whereas a reactive approach includes the application of lessons learned from past mishaps and near failures to prevent future crises. An interactive approach to risk management includes

the group of actions taken to restore the system to its operational state, given an occurring crisis. In other words, an interactive approach consists of strategies to turn potential catastrophes into near misses. It is impossible to totally control, through proactive and reactive measures alone, the chains of events, such as unknown unknowables and human malfunctions, that are prone to lead to a crisis. Fortunately, not all crises lead to failures, and several examples can be given of systems that successfully manage periodically occurring unexpected crises in an interactive way, including medical emergency services, commercial aviation, and natural hazard mitigation. According to Bea (2008), behind each accident are on the order of 10 to 100 near misses. Several authors have studied the common characteristics of these resilient systems (Bea, Weick, Lagadec, Klein, Miller, Pidgeon: all quoted in Bea 2008) and the strategies designed to successfully “engineer and manage the unexpected”.

The current risk mitigation strategy for Lake Sarez is

proactive by nature. The strategy consists of limiting the consequences of a potential future failure through early detection instrumentation and an efficient evacuation plan on the one hand, while a long-term risk mitigation component is designed to eventually reduce the likelihood of future risks on the other. Yet the interactive aspect of risk mitigation is a key aspect in which human factors are of the highest importance. Indeed, given a crisis situation, the ultimate success of its adaptive management almost always relies on human performance. This parameter is often insufficiently taken into account in mitigation strategies, perhaps because it is difficult to quantify and to address (humans are difficult to engineer). Since the proactive component of the Lake Sarez risk mitigation plan is designed and operational, this paper studies the interactive aspects of the mitigation strategy by focusing on its four key sequential components, the so-called OODA loop (Orr 1983, quoted in Bea 2008): observation and crisis detection, orientation and sense making, decision, and action.

Therefore, the system’s ability to successfully manage an occurring natural hazard will be evaluated, in the

event of an unexpected disturbance leading to a crisis situation. The system’s operational robustness to the

unknown will be assessed.
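The four sequential components of the OODA loop can be pictured as a simple control cycle. The sketch below is only an illustration of that sequence and does not describe the actual Lake Sarez procedures: the `Observation` type, the `orient`, `decide` and `ooda_cycle` functions, the three assessment labels, the actions, and the threshold passed in by the caller are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Phase(Enum):
    OBSERVE = "observation and crisis detection"
    ORIENT = "orientation and sense making"
    DECIDE = "decision"
    ACT = "action"


@dataclass
class Observation:
    flow_m3s: float   # hypothetical sensor reading
    sensor_ok: bool   # False models a hardware malfunction


def orient(obs: Observation, threshold_m3s: float) -> str:
    """Make sense of the observation; a failed sensor yields 'uncertain'."""
    if not obs.sensor_ok:
        return "uncertain"
    return "flood" if obs.flow_m3s > threshold_m3s else "normal"


def decide(assessment: str) -> str:
    """Map the assessment to a (hypothetical) action."""
    return {
        "flood": "evacuate",
        "uncertain": "send observer and raise alert level",
        "normal": "keep monitoring",
    }[assessment]


def ooda_cycle(obs: Observation, threshold_m3s: float) -> list:
    """Run one pass through the loop and return a trace of each phase."""
    assessment = orient(obs, threshold_m3s)
    action = decide(assessment)
    return [
        (Phase.OBSERVE, f"flow={obs.flow_m3s} m3/s, sensor_ok={obs.sensor_ok}"),
        (Phase.ORIENT, assessment),
        (Phase.DECIDE, action),
        (Phase.ACT, f"executing: {action}"),
    ]
```

In this toy version a sensor failure does not stop the loop; it is routed to an "uncertain" assessment that calls for human judgment, which mirrors the emphasis placed here on human performance when hardware malfunctions.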

C. Lake Sarez Case Background

The following section gives background for the studied case. Although not all of the information given is strictly linked to the management of rapidly evolving crises, broad background knowledge of the institutional and local context is required to understand the bigger picture and properly address the topic in focus.

1. Situation

Lake Sarez is located in the semi-autonomous Gorno-Badakhshan region, in the southeastern part of the former Soviet Republic of Tajikistan in Central Asia. The dam is located in the Bartang valley in the Pamir mountain range, which is counted among the highest and least accessible mountain ranges in the world. Among the common problems faced by the local population of such remote and mountainous areas is an extreme social, economic and political isolation that is exacerbated by the difficulties arising from the transition from Soviet rule (Schuster 2004). Moreover, the inaccessibility of the region is notorious, as a two-day trip over ill-maintained mountain trails is necessary to link the region to Dushanbe, the country’s capital city. In addition to its isolation and inaccessibility, the area displays extremely high seismic activity, coupled with a harsh continental climate. As a result, the region is a natural disaster


prone area, where earthquakes, floods, landslides and stone avalanches are common.

The lake was formed in 1911 in the aftermath of a magnitude 7.6 earthquake that caused an enormous landslide, which blocked the Murgrab river valley (Schuster 2004). The river waters rose to form a 17 cubic kilometer lake, comparable to half the size of Lake Geneva, flooding the upstream valley over 60 km, with a water level at 3260 meters above sea level. With a height of 550 meters to the lowest point of its crest, Usoi dam is the highest dam in the world. The dam is twice as high as Nurek dam, the tallest man-made dam in the world, also located in Tajikistan.

Fig. 2. Lake Sarez location map [http://www.acig.org/artman/uploads/map_tajikistan.jpg]

Because Usoi dam is a gigantic, non-engineered structure, its safety is not a recent preoccupation. Until recently, the little information that passed beyond the borders of

the Soviet Union described a “colossal dam of

questionable stability which retained a vast reservoir

of water. […] Impact projections suggested that a

flood could affect roughly 5 million people living

along the Bartang, Panj and Amu Darya rivers, a path

traversing Tajikistan, Afghanistan, Uzbekistan and

Turkmenistan” (Palmieri 1999) to the Aral Sea, thus reaching large-scale, international proportions.

2. Chronology of events

On February 18th 1911, a gigantic magnitude 7.6 earthquake produced a 2.2 billion cubic meter landslide, burying the village of Usoi and blocking the course of the Murgrab River. The rising water gradually drowned the village of Sarez, forming a 200-meter deep lake. The water level was ultimately stabilized in 1914 by seepage through the dam, with the formation of 57 streams that form an important erosion canyon on the downstream side of the dam.

In the following years, several Russian expeditions were mounted to evaluate the stability of the dam. The information yielded by these expeditions is scarce and mostly unpublished (Schuster 2004). Opinions on the stability of the dam were very diverse, but consideration of the high consequences of failure led the Russians to install a first early warning system in 1988 (Palmieri 2000).

This early warning system was based on the

combination of hydrometric measurements and visual observations for flood detection. The information was

transmitted through cable and satellite connection to Moscow and Dushanbe, where the decision could be

taken to evacuate the highly populated lower Amu Darya basin. Yet no plan was designed to alert and

evacuate the 19,000 inhabitants of the Bartang valley, who were considered the most at risk (Palmieri 2000).

Moreover, at the fall of the Soviet Union in the early 1990s, the technology was aging and the whole

system was considered inefficient due to the lack of proper maintenance.

In 1991, the country achieved its independence from the Soviet Union. This was followed by a five-year civil war that was particularly intense in the semi-autonomous region of Gorno-Badakhshan. As a result, the isolation of the region increased further and the installed early warning system became even more obsolete.

In 1997, the newly formed government of Tajikistan brought the situation to the attention of the United Nations International Decade for Natural Disaster Reduction (UN/IDNDR) secretariat, to “lead an effort

to raise international awareness on this problem and

develop a program to reduce this threat” (Palmieri

1999). As a result, a UN/IDNDR Interagency Risk

Assessment Mission was sent to Lake Sarez to assess the situation. The mission confirmed the inefficiency of the existing early warning system and acknowledged the problems caused by the very low accessibility of the region, which prevented the implementation of a heavily engineered structural solution to stabilize the dam. The mission also stated the very low probability of occurrence of a major disaster. Yet, due to the high consequences a failure would have, it recommended the design of an up-to-date early warning system that would enable the safe evacuation of the nearby population most at risk.

In 2000, the Lake Sarez Risk Mitigation Plan (LSRMP) was approved by the World Bank, with the

expressed objective of decreasing the “proportion of

vulnerable communities in the Bartang and Murgrab

valleys with disaster management plans as well as

responsibilities and procedures agreed upon by

community leaders and villagers, responsible

government authorities and interested non

governmental organizations (NGO)” (Palmieri 2000).

The four-component plan includes technical consulting and the installation of an up-to-date monitoring and early warning system (component A), social training and safety-related supplies for the local population (component B), the study of a long-term solution through intensive monitoring and consulting (component C), and institutional strengthening and capacity building of the local government (component D). The Swiss Government, the United States Agency for International Development (USAID), the Government of Tajikistan, the Aga Khan Development Network (AKDN) and a World Bank credit shared the financial burden of the implementation.



The expected five-year implementation period was extended by one year. In 2006, the LSRMP was fully implemented. In 2007, the final reports on the implementation and the project evaluation were completed (Stucky Ltd, World Bank). The mandate of the implementing agencies has expired, and responsibility for the operation and maintenance of the early warning and monitoring system has now passed to the Tajik authorities. Namely, the Usoi Department is formally in charge of the LSRMP, as part of the Ministry of Emergencies and Civil Defense, in Dushanbe.

3. Lake Sarez Risk Mitigation Plan

3.1 Alternatives

One could argue that the installation of a monitoring and early warning system to mitigate a risk linked to a natural structure that is expected to fail someday, in the near or distant future (Papyrin 2007), is a rather timid and non-durable strategy at the very least. Indeed, several more structural proactive measures have been considered by Soviet scientists to mitigate the risk in a more ‘durable’ manner. Such

strategies include “controlled 100-150 m water level

drawdown in the lake to eliminate overtopping by high

wave through construction of a tunnel spillway on the

left bank for irrigation in dry years and power

generation” or to “raise the crest of the lowest part of

the dam by moving the boulder material over the

obstruction using construction machinery or by the

blast fill method from the exposed scarps located

above” (Zolotarev 1986, quoted in Schuster 2004).

Yet these more ‘rational’ solutions are difficult to implement due to the extreme remoteness of the area and its difficulty of access (Fig. 3). Indeed, Periotto (2000, quoted in Palmieri 2000) estimates the construction cost of the road required to transport the infrastructure needed to realize such a project at over $300,000 per kilometer, which heavily compromises the economic feasibility of such a project. However, component C of the implemented risk mitigation strategy includes monitoring and documentation toward a possible durable long-term solution that would be economically feasible.

Fig. 3. Typical bridge in the Bartang Valley [27]

Another efficient prevention strategy would consist of evacuating the local population as a preventive measure and labeling the area a “non-habitable zone due to natural hazards” to discourage further resettlement. By humbly acknowledging the power of nature, such a solution would be safe and durable. Yet the social and political cost of displacing 19,000 rural people who are attached to their land is immense and difficult to pay. Indeed, the sense of belonging to their homeland is very strong among the rural population, and people prefer to stay in their mountain valleys despite the remoteness, the lack of supplies and the occurrence of natural hazards (Palmieri 1999). The local population has lived with the constant threat of natural hazards for generations and would certainly not envisage the prospect of leaving.

Therefore, a monitoring and early warning system can be seen as a compromise: an economically and socially feasible short- to medium-term risk mitigation strategy. Indeed, the local population is deeply attached to its land, is remarkably well educated and is ready to accept capacity building for community participation in disaster mitigation and response (Palmieri 2000). Thus, given the remoteness and inaccessibility of the setting, and given the alertness, understanding and willingness to respond of the local population, the installation of an early warning and monitoring system has been selected as the most suitable mitigation strategy for the area (Palmieri 2000).

3.2 Components

The Lake Sarez Risk Mitigation Plan is formed of four complementary components. Although this study will focus on component A (Early Warning and Monitoring System), which is described in detail in a following section, the three other components are described here to give a better understanding of the setting of the project.

Component B consists of social training and the supply of safety-related materials to the local population. It has been implemented by FOCUS Humanitarian Assistance, a non-governmental organization (NGO) with extensive work experience in the region in the fields of natural hazard relief and mitigation (Palmieri 2000), and funded by USAID and AKDN. The objectives of this component were to “make the early warning system community-based” (World Bank 2007) by raising awareness. This was done by providing information and emergency training and by involving the communities in the preparation and supply of safe havens on higher ground. Despite implementation delays, the World Bank evaluated the implementation of this component as “satisfactory” (El-Hanbali 2007). All the vulnerable communities have been identified, equipped with safe havens and organized into response groups that received disaster mitigation training.

Despite some initial concerns about the competency of the implementing NGO to maintain high-quality preparedness (World Bank mid-term review 2003), and given the current local awareness and involvement of the


population and the long history of engagement of the local NGOs, the sustainability of component B is not called into question by the World Bank’s final evaluation (El-Hanbali 2007).

Component C consists of the study of long-term solutions to mitigate the risk linked to Lake Sarez. The study is based on the data provided by the monitoring system of component A. Like component A, component C was financed by the Government of Switzerland and implemented by Stucky Consulting Engineers Ltd (STUCKY), a Swiss company, under the guidance of an international Panel of Experts (POE). It consists of complementary desk studies including the assessment of different routes to the lake, digital terrain modeling, inundation studies, wave generation and propagation in the lake, the mechanism of dam overtopping, sediment accumulation rate, seepage, etc. (El-Hanbali 2007). The World Bank has evaluated this component as “satisfactory”. Although no further field research has been scheduled for the near future, local experts and decision makers have been provided with complete and up-to-date data on the situation, towards the design of a sustainable long-term solution (El-Hanbali 2007).

Because it aims to strengthen local institutions so that they can efficiently take over the management, operation and maintenance of the Lake Sarez Risk Mitigation Plan, component D is of particular relevance to the evaluation of the reliability of the human component of the project. Component D has been implemented by the Government of Tajikistan (GoT) with funding from both a World Bank credit and the GoT. A new government agency, the Sarez Agency (SA), was created within the Ministry of Emergency Situations and Civil Defense (MESCD), with the mandate of managing the operation and maintenance phase of the LSRMP. Consultancy and funding from the World Bank were provided to strengthen this institution in its capability to conduct this task. Yet by February 2002, due to unsatisfactory financial management, the SA was dismissed and replaced by the Usoi Department, an existing department within the MESCD, directed by Mr. Kadam Maksaev, which had operated the original early warning system. Ultimately, the capacity building and training were thus directed towards the Usoi Department, which is currently in charge of the operation and maintenance of the early detection and monitoring system. The performance and capacity of the Usoi Department were evaluated as satisfactory by the World Bank at the end of the project implementation phase. Yet although the department has shown a capability to mobilize and coordinate operations with other government and research institutions, this cooperation has not been institutionalized to ensure that it occurs on a regular and formalized basis (El-Hanbali 2007). Moreover, a recent decrease in the department’s qualified staff has raised some informal concerns about its future performance (Droz 2008, personal communication).

3.3 Monitoring and Early Warning System

The implementation of a monitoring and early warning system constituted the first component (A) of

the LSRMP. The system was designed by STUCKY and funded by the Swiss government. Fela Planungs AG, a Swiss construction company, was awarded the supply and installation of the system (FELA Planungs AG 2004). The relevance, in the context of Lake Sarez, of such a “light” proactive mitigation system with a small structural component, as opposed to a heavier and perhaps more durable engineering solution, has been discussed in section 3.1. The purpose here is to accurately describe the key components of the system, in order to allow further analysis and evaluation of the system’s operational robustness to the unknown. Where not specified, all the information in the following section comes from STUCKY documentation.

a. Risk assessment

As part of components A and C, a thorough and quantitative analysis of the risks linked to the

occurrence of natural hazards on Lake Sarez has been conducted by STUCKY.

Fault tree analysis

A fault tree analysis was conducted, whereby the possible causes of a considered threatening event are assessed and analyzed. In the case of Lake Sarez, the threatening event considered is a condensed flash flood with a flow increase exceeding 400 cubic meters per second, as basic estimates consider events above this magnitude to cause significant damage to downstream villages. The fault tree is displayed in the following figure.

[Fault tree diagram. Top event: flood > 400 m3/s. Node labels: overtopping, internal erosion, external erosion, clogging, global instability, superficial slide, huge wave, pressure wave, earthquake, extreme flooding, internal disturbance, landslide, water level variation.]

Fig. 4. Fault tree Analysis [6]
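To illustrate how a fault tree combines causes into a top event, the sketch below evaluates a small tree with OR and AND gates, assuming that all basic events are statistically independent. It is not the tree of Fig. 4: the gate structure, the event descriptions and every probability value are hypothetical placeholders chosen for the example, and only the labels and the 400 m3/s top event are borrowed from the text. For independent events, an AND gate multiplies the child probabilities, and an OR gate gives one minus the product of the complements.

```python
from dataclasses import dataclass, field
from math import prod
from typing import List, Union


@dataclass
class Event:
    name: str
    probability: float  # hypothetical value, NOT taken from the study


@dataclass
class Gate:
    name: str
    kind: str  # "OR" or "AND"
    children: List[Union["Gate", Event]] = field(default_factory=list)


def probability(node: Union[Gate, Event]) -> float:
    """Probability of a node, assuming all basic events are independent."""
    if isinstance(node, Event):
        return node.probability
    child_probs = [probability(c) for c in node.children]
    if node.kind == "AND":
        return prod(child_probs)
    if node.kind == "OR":
        return 1.0 - prod(1.0 - p for p in child_probs)
    raise ValueError(f"unknown gate kind: {node.kind}")


# Illustrative structure only; the real tree is the one shown in Fig. 4.
overtopping = Gate("overtopping", "AND", [
    Event("landslide into the lake", 1e-2),
    Event("huge wave reaches the crest", 1e-1),
])
top = Gate("flood > 400 m3/s", "OR", [
    overtopping,
    Event("internal erosion", 1e-4),
    Event("external erosion and clogging", 1e-3),
])

print(f"P({top.name}) = {probability(top):.6f}")
```

The independence assumption is a strong one: an earthquake, for instance, could trigger several branches at once, in which case a simple product of probabilities would understate the risk.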


The analysis of the several hazards revealed by the fault tree is summarized in the following paragraphs and allowed the most probable scenarios to be highlighted.

Lake level increase

A steady lake level increase of about 20 cm/yr (Droz 2006) has been observed. Yet a period of 200 years would be necessary for the lake level to reach the lowest part of the crest of the dam. Moreover, the higher permeability of the upper layers of the dam is expected to allow a higher seepage flow rate and thus slow the level increase. A dam overflow due to the steady increase of the lake level is thus considered a very unlikely event (Droz 2006).

Global Dam Instability

The stability of the natural dam has also been considered as a possible hazard source. The external erosion due to the formation of springs on the downstream side of the dam has not been considered significant enough to compromise the global stability of the dam. Yet obstructions of the Murgrab River due to limited external erosion are considered possible (Droz 2007) and may lead to sudden floods.

The global stability of the dam in the event of an earthquake has also been considered, especially given the high seismic activity of the region. The stability of the slopes is considered high, as the expected displacement attributed to high accelerations (> 0.4 g) is of the order of 10 cm, which is negligible given the size of the dam. As a result, although the global stability of the dam is considered high, sudden flash floods could be caused by obstructions of the river due to external erosion.

Seepage

The seepage of the dam has been closely observed as a potential source of hazard. The current estimated low hydraulic gradient and low speed of the seepage, as well as the low turbidity of the outflow water (Schuster 2004), were considered to indicate a low seepage-related risk in the current condition of the dam. Yet slow modifications of the seepage regime are noted and attributed to the geological immaturity of the dam, which will require further monitoring. Clogging hazard is considered low, due to the high number of springs and the heterogeneity of the dam material. Piping hazard in the current situation is also considered low, yet an earthquake or the impact of a surge wave can cause sudden modifications of the dam’s internal structure. Such a modification could allow the formation of natural pipes within the dam, which would result in a sudden increase in the discharge flow rate and eventual flooding.

Thus, although no alarming risk linked to seepage is to be expected in the current situation, the evolution of the discharge flow rate must be monitored with special attention in the event of an earthquake or a surge wave.

Overtopping surge wave

The risk of a huge wave arises from the presence of a massive, slowly moving slope instability on the mountain side located on the right bank of the lake. A massive landslide into the lake could thus be triggered by an event such as an earthquake, possibly resulting in a tsunami that would overtop the dam. The height of the wave is a function of the speed and volume of the hypothetical landslide. A numerical simulation of a worst case scenario, given the parameters as known today, yields a wave overtopping the dam by 50 meters above the lowest point of its crest, resulting in a flood of about 800,000 cubic meters (Droz 2006). But the occurrence of the landslide of 0.5 cubic kilometers in volume moving at 20 m/s that would be required to overtop the dam is considered very improbable.

However, a more thorough characterization of the right bank landslide remains to be done. Until the parameters of the possible landslide are better known, the “real possibility of a wave overtopping the dam and the knowledge of the effects of such a wave on the downstream valley will remain unclear” (Droz 2007).

Expert Assessment Meeting

The discussion above leads to the conclusion that a major threatening event such as a flood greater than 400 cubic meters per second is highly improbable in the absence of a major triggering event such as an earthquake. Therefore, the expert assessment meeting, which was conducted to evaluate and quantify the probability of occurrence of the considered flood, is based on the probability of occurrence of a major earthquake. The several chains of events considered, leading from a major earthquake to the flood, and their estimated probabilities of occurrence are displayed in the hazard analysis tree given in annex (Droz 2007).

The two most likely scenarios are (a) a piping failure as a consequence of toe instability of the dam due to the pressure wave caused by the earthquake; and (b) an overtopping wave caused by a large landslide into the reservoir, triggered by the earthquake. The probability of occurrence of each chain of events is on the order of 10^-5, which is the probability of failure ordinarily admitted for man-made dams (Droz 2007). Yet both of these phenomena have occurred on Italian dams in the twentieth century, with enormous consequences in terms of human lives lost: the tailings dams of the Stava valley failed due to piping (National Geographic Channel 2008), while the Vajont dam was overtopped by a surge wave (Genevois 2005).

Risk Analysis

However, despite its relatively low probability of occurrence, a flood resulting from a failure of Usoi dam would have severe consequences for the downstream population. Indeed, the 19000 inhabitants of the Bartang valley are considered directly exposed, while a total of 132’000 inhabitants of the upper Panj valley are liable to be affected (Palmieri 2000). Between 1000 and 10000 probable casualties are estimated (Droz 2007). Indeed, a warning from the existing obsolete system would take approximately 54 minutes via television satellite to reach Barchidiv, the uppermost village of the valley, and it would take another estimated two hours to evacuate the residents to higher ground (Schuster 2004). Given that a flood triggered by a landslide-induced surge wave (scenario b) would take approximately 25 to 30 minutes to reach Barchidiv (Zaninetti 2000, quoted by Schuster 2004), the consequences of such an event in terms of casualties are sadly obvious.
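The gap between the old system's response time and the flood arrival time can be made explicit with a short calculation. The sketch below only re-uses the figures quoted above; the function and variable names are illustrative and do not come from the source.

```python
# Timing figures quoted in the text (Schuster 2004; Zaninetti 2000)
ALERT_DELAY_MIN = 54          # old system: time for the warning to reach Barchidiv
EVACUATION_MIN = 120          # time to move residents to higher ground
FLOOD_ARRIVAL_MIN = (25, 30)  # surge-wave flood travel time to Barchidiv


def time_deficit_min(alert: float, evacuation: float, flood_arrival: float) -> float:
    """Minutes by which the total response time exceeds the flood arrival time.

    A positive value means the flood arrives before evacuation is complete.
    """
    return (alert + evacuation) - flood_arrival


for arrival in FLOOD_ARRIVAL_MIN:
    deficit = time_deficit_min(ALERT_DELAY_MIN, EVACUATION_MIN, arrival)
    print(f"Flood arrival after {arrival} min: response is late by {deficit} min")
```

With these figures the response would be completed well over two hours after the flood has passed the village.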

Therefore, despite the low probability of occurrence of a threatening event, the high consequences in terms of casualties cause the risk to be high. The following figure represents the risk level as compared to the risk currently accepted by various dam owners.

Fig. 5. Risk Diagram [6]
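The way a low probability and high consequences combine into a significant risk can be illustrated numerically. The sketch below assumes the common definition of risk as probability multiplied by consequence, which the text implies but does not state as a formula; the reference period to which the 10^-5 probability applies is not specified here, so the result is expressed per reference period. All names are illustrative.

```python
# Figures quoted in the text (Droz 2007)
FAILURE_PROBABILITY = 1e-5        # order of magnitude for each failure scenario
CASUALTY_RANGE = (1_000, 10_000)  # estimated probable casualties


def expected_casualties(probability: float, casualties: float) -> float:
    """Risk expressed as probability multiplied by consequence (assumed definition)."""
    return probability * casualties


low, high = (expected_casualties(FAILURE_PROBABILITY, c) for c in CASUALTY_RANGE)
print(f"Expected casualties per reference period: {low:.2f} to {high:.2f}")
```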

Therefore, the objective of the updated early warning system is to reduce the risk by reducing the consequences of a threatening event. Preliminary studies by the World Bank (2000) showed that an estimated 18’500 casualties would be avoided. The cost analysis conducted by the same institution indicated an estimated cost per statistical life saved of 232 US$. The justification for investment would thus be “very strong” by U.S. Federal Government standard practice in financing life-protective programs (Palmieri 2000).

b. System design

Requirements

Following the recommendation issued by the UN/IDNDR mission and in the framework of the LSRMP, an early warning and monitoring system was designed by STUCKY. The high degree of automation of the system was required because of the volume of data to be collected, the inaccessibility of the sensor locations and the necessity to automatically trigger alarms for civil protection purposes once pre-established values of significant parameters are detected (Palmieri 2000). Moreover, an emphasis on high quality data transmission systems was required due to the obvious time constraint in the event of a flash flood, and the high degree of coordination needed between the central managing unit in Dushanbe, the observation station at the dam, the remote measuring units and the village alarms (Palmieri 1999).

Hardware and measuring device

The early warning and monitoring system includes sensors placed in nine distinct monitoring units (MU), scattered over identified sensitive points in the whole area. The instruments include flash flood sensors downstream of the dam and in the village of Barchidiv, seismic accelerometers on the dam toe and the lake shore, global positioning system (GPS) devices scattered on the dam and the right bank unstable slope, pressure cells for lake level and wave height in the vicinity of the dam, river gauging stations in the village of Barchidiv and automatic weather stations. Moreover, an observation center (dam house, CU) is located on high ground on the left bank near the dam, with visual contact with the right bank landslide. The dam house is to serve as a base for periodic observation expeditions, in particular GPS campaigns and turbidity measurements, as well as a relay in the transmission of the monitored data to Dushanbe, where the Supervisory Control And Data Acquisition (SCADA) system is located. There, the operational control and management of the system is conducted by the Usoi Department. Finally, the SCADA as well as both the dam house and the village of Barchidiv are equipped with a direct manual alarm trigger.

The general layout of the early detection and warning system is displayed in figure 6.

Data transmission

As discussed above, an efficient data transmission system is a requirement for the efficiency of the warning system.

For transmission between the monitoring units and the central unit at the dam house, a bidirectional very high frequency (VHF) radio system is generally used, with the exception of the two most remote units, which are linked to the central unit via satellite technology. There, Very Small Aperture Terminal (VSAT) systems are used with the International Maritime Satellite Organization (Inmarsat D+) network.

Bidirectional transmissions between the central unit at the dam house and the SCADA at Dushanbe are provided by VSAT technology using the International Telecommunications Satellite Consortium (Intelsat IS-704) network.

Finally, the link between the SCADA, the dam house, the alarms in the villages and the rescue unit in the downstream village of Khorog is provided by the Inmarsat D+ network, allowing the priority transmission of alarm signals. Furthermore, the link is bidirectional to allow the monitoring of the alarm equipment in the villages and the quick detection of a possible breakdown.
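The transmission architecture described above can be summarized as a small lookup structure, which makes it easy to see which nodes share more than one channel. This is only an illustrative sketch: the dictionary layout, the node labels and the `channels` helper are hypothetical and are not part of the STUCKY design documentation.

```python
# Each network is mapped to the set of nodes it connects, as described in the text.
NETWORKS = {
    "VHF radio": {"monitoring units", "dam house"},
    "Inmarsat D+ (remote units)": {"remote monitoring units", "dam house"},
    "Intelsat IS-704": {"dam house", "SCADA"},
    "Inmarsat D+ (alarm network)": {
        "SCADA", "dam house", "village alarms", "Khorog rescue unit",
    },
}


def channels(node_a: str, node_b: str) -> list[str]:
    """Return the networks that directly connect two nodes, in alphabetical order."""
    return sorted(
        name for name, nodes in NETWORKS.items()
        if node_a in nodes and node_b in nodes
    )


print(channels("dam house", "SCADA"))         # two independent channels
print(channels("monitoring units", "SCADA"))  # no direct link: data is relayed
```

In this representation the dam house and the SCADA are linked by two networks, whereas the monitoring units reach the SCADA only through the dam house relay.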


Fig. 6. Early Warning and Monitoring System situation [7]

Fig. 7. Warning levels chart [7]

Procedures

The monitoring devices, according to the value measured for the considered parameters, can issue five warning levels. The different warning levels are described in the preceding chart (Fig. 7).

Once a warning is issued, several procedures are prescribed for the SCADA to follow, depending on the warning level and parameter. Following these procedures, several alarm levels can be issued, including alarm level 1 (observation) and alarm level 3 (immediate evacuation). The complete procedure list is given in annex. Yet several features can be noted.

There are three critical parameters that can trigger a direct automated evacuation alarm in the villages of the most exposed zone. These parameters are the detection of a flash flood, the detection of a significant and persistent decrease of the discharge flow rate, or the detection of a surge wave higher than 50 m. In these cases, no direct human interaction is involved in the alarm process, with the exception of proper maintenance of the devices.
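The three automated triggers listed above amount to a simple "any of three" rule. The following sketch illustrates that logic only; the `Readings` structure and its field names are hypothetical, and the actual detection criteria for "flash flood" and "persistent decrease" are defined in the procedures given in annex, not here.

```python
from dataclasses import dataclass

SURGE_WAVE_THRESHOLD_M = 50.0  # surge wave height quoted in the text


@dataclass
class Readings:
    """Simplified snapshot of the three critical parameters (hypothetical layout)."""
    flash_flood_detected: bool
    persistent_discharge_drop: bool
    surge_wave_height_m: float


def automated_evacuation(r: Readings) -> bool:
    """True if any of the three critical parameters triggers the level 3 alarm."""
    return (
        r.flash_flood_detected
        or r.persistent_discharge_drop
        or r.surge_wave_height_m > SURGE_WAVE_THRESHOLD_M
    )


print(automated_evacuation(Readings(False, False, 12.0)))  # no trigger
print(automated_evacuation(Readings(False, False, 60.0)))  # surge wave trigger
```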

In the event of a massive earthquake, no direct interactive measure is prescribed. However, a proactive increase of attention to the evolution of the critical parameters mentioned above (level 1 alarm) is ordered. Level one warnings on the critical parameters mentioned above are “correlated”, in that a warning on one of them calls for an increased observation of the others.

Other monitoring parameters, considered less critical (that is, less subject to a short-term time constraint), are also monitored. An unusual modification of these parameters also issues a level one alarm, resulting in increased awareness and more frequent and directed measurements.

Furthermore, as mentioned above, all data is transmitted to the central unit by automated transmissions. From there, the data is transmitted to the SCADA at Dushanbe, where it is analyzed and where the procedures are applied. However, level three warnings issued for flash floods and large waves result in the direct transmission of a level three alarm (evacuation) to the most vulnerable villages (Fig. 8).

Fig. 8. Equipped alarm house in a Bartang Valley village [27]

3.4 Current situation and concerns

The monitoring and early detection system is at this date fully operational and under local Tajik steering and management. As mentioned earlier, the Usoi Department (UD), as part of the Ministry of Emergency and Civil Defense, is the implementing agency. Capacity has been built within the implementing agency as part of component D of the LSRMP. Specifically, technical, managerial and organizational training has been given to the staff. Operation and maintenance of the system are likewise the responsibility of the UD. It must be noted that the advanced technology of the system, coupled with the harsh Pamiri environment, makes maintenance especially critical to the sustainability of an efficient system (El-Hanbali 2006). The maintenance instructions provided by STUCKY are given in annex. One can notice the importance of maintaining the power supply of the monitoring units, especially the flood sensor unit, whose constant functioning is absolutely critical to the efficiency of the system.

Although the global performance of the delivered project has been judged “satisfactory” by the World Bank (2006), the financing institution has raised several concerns about the sustainability of the project’s efficiency.

The first two concerns are rather institutional and financial. First, the teamwork capability among the local agencies is questioned. Indeed, although the implementing agency (the Usoi Department) seems to have shown a capability to mobilize and coordinate cooperation from other government and research institutions, no formalized scheme exists, and some concerns have been raised about the long-term sustainability of such informal collaboration. The second concern is the possible lack of sustained long-term funding for operation and maintenance if international funding were to stop.

The last two concerns are much more likely to directly compromise the quality and efficiency of the early warning and monitoring system in a much shorter time and are, in my opinion, more worrying.

First, a gradual decrease in the quality of the maintenance of the system is suspected (El-Hanbali 2006). Indeed, although training has been given to produce “Operation and Maintenance” and “Control Yearbook” reports yearly, a slight decrease in the quality of these publications has lately been noticed by experts (Droz 2008, personal communication). Furthermore, the World Bank fears a seasonal decrease in maintenance quality due to the difficulty of access and the harsh environmental conditions. However, as earthquakes can occur all year long, critical maintenance tasks including periodic battery and solar panel checks must be done regularly. A pertinent illustration of the significance of the task can be found in the maintenance of MU7. MU7 is the measuring unit upstream from Barchidiv that is designed to detect flash floods and issue level three evacuation warnings to 15 villages (FELA planungs AG 2004). It is thus a critical component of the system. Yet its control unit, GPS and solar panel are located at a remote site about 100 meters above the river (Fig. 9). Accessing them for maintenance is not a trivial task, especially in winter.


The decrease of the maintenance quality is a critical issue that could produce a crisis situation and will thus be taken seriously in this work.

Finally, the lack of long-term national technical capacity to operate and maintain the system, as well as the lack of ability to recruit and retain qualified staff during and after the project, causes concern (El-Hanbali 2006). Indeed, an out-migration of local skilled staff has lately been noticed through a significant decrease in the number of personnel still working in the Usoi Department who have received proper technical and organizational training (Droz 2008, personal communication).

In summary, the probable decrease in the quality of maintenance of the system produces a crisis-prone situation, whose interactive management may be compromised by the lack of qualified manpower. This paper intends to evaluate this uncomfortable situation and give recommendations to manage it.

D. Scope and methodology

1. Goal

Given the concerns mentioned in the last paragraph, the goal of the project is to assess and evaluate the reliability of the early warning and population evacuation systems of Lake Sarez in the setting of a rapidly evolving crisis situation. The focus is set on the human factor in the interactive management of these crises.

2. Methodology

2.1 System definition

The system considered in this paper is the set of procedures, hardware, organization and human operators that take part in the management and transmission of information. The considered time window stretches from the occurrence of the triggering natural hazard to the completed evacuation of the villagers to safe havens. I therefore chose not to include the management of possible rescue missions and the subsistence of the population in the safe havens.

The quality attributes of the system considered here are serviceability and safety, whereby a success is recorded when the villagers can be safely evacuated with sufficient notice before the flood reaches them.

2.2 Demand and capacity components of the safety system

The assessment of the reliability of the system described above will be based on the system’s capacity to meet the demand in the context of a rapidly evolving crisis.

The demand on the system is imposed by the natural hazard scenario that has been attributed the most risk by the expert risk assessment of Lake Sarez. This scenario will be detailed in a further section, but leads to a high intensity <2000m3/s flash flood of concentrated mud in the Bartang valley. Quantitatively, the demand on the system is defined as the time available to evacuate the population from the instant the flood can first be detected by the uppermost flood sensor of the system.

The capacity of the LSRMP that would be called upon when the demand described above occurs can be separated into two distinct components: first, the time needed to detect and signal the threat in the form of an alarm issued by the engineered system; then, the time needed to actually evacuate the local population to safe zones.

Thus the capacity meets the demand when the sum of the alarm transmission and evacuation times is smaller than the flood travel time. The reliability of the system in a crisis situation will be analyzed in this paper in that light.
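The success criterion stated above reduces to a single inequality. The sketch below expresses it as a function; the function name is illustrative, and the numbers in the usage lines are purely hypothetical values chosen to show both outcomes, not estimates from the source.

```python
def capacity_meets_demand(
    alarm_min: float, evacuation_min: float, flood_travel_min: float
) -> bool:
    """True when alarm transmission plus evacuation time is shorter than the
    flood travel time from the uppermost flood sensor to the village."""
    return alarm_min + evacuation_min < flood_travel_min


# Hypothetical values, for illustration only
print(capacity_meets_demand(alarm_min=2, evacuation_min=9, flood_travel_min=12))
print(capacity_meets_demand(alarm_min=2, evacuation_min=11, flood_travel_min=12))
```

Note that the inequality is strict: a response that finishes exactly when the flood arrives is counted as a failure.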

2.3. Crises

A crisis is defined as a situation where “improbable events are joined and produce an evolutionary and interactive complexity in the performance of a system” (Bea 2008). In other words, I will define a rapidly evolving crisis as a situation where unexpected events force the system to function beyond the setting it was designed for, under critical time constraints, and therefore compromise its ability to perform with the required quality. Therefore, the mere occurrence of the large-scale natural hazard considered in the demand analysis does not create a crisis situation per se, because the system is designed to manage such a situation. The elements actually triggering a crisis are additional concomitant events that compromise the ability of the system’s capacity to meet the demand. Three principal types of such events can be mentioned:

- Unexpected, “unexpectable” external events attacking the system, typical examples of which include meteorological surprises.
- Internal malfunctions within the system, which often turn out to be linked to human violations. Such events include ill-performed maintenance operations.
- Finally, instead of abruptly decreasing its capacity, events may create a crisis by increasing the demand on the system. In the considered case, such an event would involve an increase in the flood speed.

Fig. 9. Remoteness of MU7 control unit [27]

This paper focuses on human factors and therefore considers crisis-triggering events from the first two categories.

2.4. Evaluation Sources and Validation

The sources for the analysis presented in this paper can be classified in two categories: expert assessment and literature review. The redundancy of the information between and among these two categories constitutes the validation of the analysis. The principal experts who were contacted are listed below:

- Mr. Patrice Droz is the technical director of STUCKY, Ltd, Switzerland. He was in charge of designing and implementing components A and C of the LSRMP. Specifically, he designed the organization and procedures of the early warning and monitoring system and is thus knowledgeable about its expected behavior. Mr. Droz was willing to take part in the analysis.
- Mr. Kadam Maskaev is the head of the Usoi Department, the implementing agency. He is currently in charge of the management of the LSRMP. Specifically, he is responsible for the operation and maintenance of the early warning and monitoring system and manages the SCADA, and is thus knowledgeable about its current behavior. Although repeatedly contacted, Mr. Maskaev did not respond positively to requests to take part in the analysis.
- FOCUS humanitarian is the Non-Governmental Organization in charge of the implementation and management of the evacuation plan of the Bartang valley villages. Its staff are thus knowledgeable about the local context and the current behavior of the local population. Specifically, Mr. Mustafa Karim, the country director, was willing to collaborate and provided contact with Mr. Abdulhamid Gayosov for flood routing information and Mr. Rahim Balsara for information on local populations.
- Dr. Robert Bea is an expert in risk management at the University of California at Berkeley and is thus knowledgeable about the methodologies to be employed in the analysis and evaluation.

The demand is assessed by collecting flood routing data from a combination of industry reports (Stucky Ltd 2007, United Nation Mission to Lake Sarez 2001, USAid 2006), academic literature (Schuster 2004), and personal communication with knowledgeable experts (Droz, Karim and Gayosov 2008).

Both components of the capacity were assessed by generating crisis situations on the basis of concerns found in the literature (El Hanbali 2007), personal communication (Droz 2008) and relevant case studies (Bea 2008). Each situation was then matched to the specific context of Lake Sarez by being validated or rejected by the experts mentioned above. The experts were then asked about the probable behavior of the system in the event of the crisis situation that they considered most risky in terms of likelihood and consequences. Subsequently, one or several possible interactive crisis management strategies were generated and submitted to the experts for validation.

Fig. 10. MU7 Flood sensor situation [7] (map labels: Barchidiv, Dam, MU7)


II. Analysis

A. Demand

1. Baseline scenario

As discussed above, a crisis only occurs when unexpected events compromise the ability of the system to mitigate a given natural hazard with the expected quality. Thus, in order to define a crisis situation, a non-crisis situation must be considered. There is therefore a need to consider a baseline natural hazard scenario that ought to be correctly managed by the system if there were no crisis. In this study, we will consider the natural hazard scenario that is attributed the highest probability of occurrence by the expert assessment meeting mentioned in an earlier paragraph.

A massive earthquake of 0.5g acceleration (magnitude 7.3 on the Richter scale) occurs. The ensuing gravity wave causes a structural disturbance at the dam’s toe, significantly modifies its internal structure and disturbs its seepage regime. The disturbance is such that, as the situation evolves, a piping phenomenon occurs within the dam, whereby the pressing water forms large underground channels. Eventually, after a period of about two hours during which a strong discharge drop can be observed downstream, these channels reach the downstream slope of the dam and cause a flash flood. The resulting flood is massive and powerful enough (above 400 m3/s in excess of the normal discharge, flowing at 40 km/h) to cause significant consequences in the downstream valley. Furthermore, during the piping process the flow would erode loose deposits and debris, forming a hyper-concentrated flow, whereby the volume of water will progressively represent only one fifth of the total flow volume, before being diluted at the confluence forming the Bartang River.

2. Flood routing analysis

2.1. Flood routing simulation

Given the hazard described above, a flood routing simulation has been performed by Stucky Ltd, on the basis of which the early warning system and evacuation plan were designed. The objective was to identify the potential effects of a sudden flood in the Bartang Valley due to a sudden discharge from Usoi dam. Two flooding scenarios were therefore simulated, in which 1000 m3/s and 5000 m3/s respectively were reached in a period of six hours. These parameters were selected to simulate the timing of a piping event in Usoi dam, where the flood is initiated by a disturbance in the internal structure of the dam following an earthquake. The simulation was conducted using the Saint-Venant hydrodynamic equations, with an approximate terrain roughness of 25 (Droz 2007).

According to the simulation results, an increase of 10% in the flow rate can be expected in Barchidiv 1.5 hours after the beginning of the phenomenon. The maximum discharge is expected after 6.5 hours, that is, 30 minutes after the peak source discharge is reached at the dam. Therefore, taking into account the 20 km distance separating Barchidiv from Usoi dam, we can expect an average flood velocity of 40 km/h (see footnote 1).
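The average velocity quoted above follows directly from the distance and the lag between the two discharge peaks. A minimal sketch of that arithmetic, with illustrative names, is given below.

```python
# Figures quoted in the text (Droz 2007)
DISTANCE_KM = 20.0          # Usoi dam to Barchidiv
PEAK_AT_DAM_H = 6.0         # peak source discharge reached after six hours
PEAK_AT_BARCHIDIV_H = 6.5   # maximum discharge expected at Barchidiv


def average_velocity_kmh(distance_km: float, lag_h: float) -> float:
    """Average flood velocity as distance divided by peak-to-peak travel time."""
    return distance_km / lag_h


lag = PEAK_AT_BARCHIDIV_H - PEAK_AT_DAM_H
print(f"Average flood velocity: {average_velocity_kmh(DISTANCE_KM, lag):.0f} km/h")
```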

2.2. Flood hazard mapping

Given the flood routing simulation results, flood hazard intensity maps have been established by Stucky Ltd using the Swiss standards (Fig. 11), which take into account the level and velocity of the flow according to the following table.

While the two uppermost villages of Barchidiv and Nisur are partly located in low to medium danger zones, the hazard intensity mapping shows most of the other villages of the Bartang valley to be entirely located in high risk zones for floods from 1000 m3/s.

Furthermore, in the absence of an efficient risk mitigation system, an estimated 5000 human casualties would add up to the 4000 houses, 44 schools, 180 community hospitals and 18000 ha of cultivated land that would be destroyed in the Bartang valley as a result of such floods from Lake Sarez (Droz 2007). The implementation of a risk mitigation strategy involving the evacuation of all the villages of the Bartang valley is therefore considered a necessity.

Fig. 12. 5000m3/s flood risk intensity mapping of

the village of Barchidiv.[7]

1 With the assumption that the flood is water. If the fact that the

flow is a hyper concentrated mud flood is taken into account, the velocity is much lower (but the flood much more destructive).

Water height (h)     or   Water height x velocity (h x v)   Intensity level
h < 0.5 m                 h x v < 0.5 m2/s                  low
0.5 m < h < 2 m           0.5 m2/s < h x v < 2 m2/s         medium
h > 2 m                   h x v > 2 m2/s                    high

Fig. 11. Swiss flood risk intensity standards. [7]
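The classification in Fig. 11 can be sketched as a small function. Two points are assumptions of this sketch and are not stated in the source: when the two criteria disagree, the more severe level is retained, and values falling exactly on a limit (0.5 or 2) are assigned to "medium". The function names are illustrative.

```python
LEVELS = ("low", "medium", "high")


def _level(value: float) -> int:
    """Index into LEVELS for one criterion.

    Water height (m) and height x velocity (m2/s) share the limits 0.5 and 2.
    Boundary values are assigned to "medium" (assumption).
    """
    if value < 0.5:
        return 0
    if value <= 2.0:
        return 1
    return 2


def flood_intensity(height_m: float, velocity_ms: float) -> str:
    """Intensity level from water height and from height x velocity,
    keeping the more severe of the two (assumption)."""
    by_height = _level(height_m)
    by_product = _level(height_m * velocity_ms)
    return LEVELS[max(by_height, by_product)]


print(flood_intensity(0.3, 1.0))  # shallow and slow
print(flood_intensity(0.4, 2.0))  # shallow but fast
print(flood_intensity(2.5, 0.1))  # deep and slow
```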


2.3. Quantifying the demand

The parameters resulting from the flood routing simulation described above enable the estimation of the demand to be met by an effective risk mitigation system in the Bartang valley. Since the mitigation strategy considered in the case of Lake Sarez is an early warning system coupled to a population evacuation plan, the demand parameter to be considered is the evacuation time (i.e. the time available to alert and evacuate the population in a given village). Because their location makes the evacuation time shortest, the demand is highest on the villages of Barchidiv and Nisur, which will thus henceforth be considered critical and focused on in this analysis.

The village of Barchidiv is located 20 km downstream of Lake Sarez and has 30 households with 186 inhabitants. According to the assessment of the geologists, a flash flood from Lake Sarez would reach the village in 23 to 27 minutes. In the case of the village of Nisur, a flood would reach the 40 households (242 inhabitants) of the village in a period of 31 to 38 minutes after being generated at the lake, 29 km upstream (Gayosov, quoted by Karim 2008).

Considering that the uppermost flood detectors (MU7) are located 10 km downstream of the lake, the time between the moment the flood can first be detected by the system and the moment it passes the village of Barchidiv is 12 to 14 minutes, and 16 to 19 minutes for the village of Nisur (ibid). Therefore, an evacuation time of 12 to 14 minutes in the village of Barchidiv is the critical demand to be considered in the present system. A failure to meet this demand is thus considered a sufficient failure condition.
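The Barchidiv window can be approximated with a simple proportional estimate. The sketch below assumes a uniform flood velocity between the lake and the village, which is a simplification introduced here and not the method used by the source; it is applied to Barchidiv only and is not claimed to reproduce the Nisur figures. Names are illustrative.

```python
def window_after_detection_min(
    travel_min: float, village_km: float, sensor_km: float
) -> float:
    """Time left once the flood passes the sensor, assuming uniform velocity
    along the whole reach (simplifying assumption)."""
    return travel_min * (village_km - sensor_km) / village_km


BARCHIDIV_KM = 20.0           # distance from the lake
MU7_KM = 10.0                 # uppermost flood detectors
TRAVEL_MIN = (23.0, 27.0)     # lake to Barchidiv

for t in TRAVEL_MIN:
    left = window_after_detection_min(t, BARCHIDIV_KM, MU7_KM)
    print(f"Travel time {t:.0f} min: about {left:.1f} min left after detection")
```

The resulting 11.5 to 13.5 minutes is consistent with the 12 to 14 minutes quoted for Barchidiv.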

B. Capacity

1. Early Warning System

1.1. Baseline Procedure

The standard procedure planned for the system to manage the baseline scenario described in the preceding section is as follows (Droz 2007).

The earthquake is detected by the strong motion accelerometers in both the measuring units on the lake shore (MU2) and at the dam toe (MU8), and the level 1 warning is transmitted by satellite to the SCADA at Dushanbe, which must ask for visual confirmation by radio from the dam house. Once the earthquake is confirmed, a visual inspection of the dam is conducted from the dam house and awareness is increased to detect discharge drops, flash floods and high waves.

The strong discharge drop (of about half the normal discharge) that could be caused by the dam toe instability and the internal disturbance can then be detected by visual observation and/or the river gauging system located in the village of Barchidiv (MU9), about 16 km downstream from the dam. The detection of the discharge drop issues a level one warning to the SCADA at Dushanbe. Awareness is increased towards the detection of a flash flood.

Assuming it occurs less than two hours after the

detection of the discharge drop, the flash flood is

detected by the MU7 flood sensors located about 10

km upstream of Barchidiv at a flow rate of 400m3/s

(level 1 alarm). If a flow rate superior to 2000m3/s is

detected, a level three alarm is automatically given to

both the SCADA and the villages within two minutes.

The evacuation alarm is thus triggered in the villages

about 10 to 12 minutes before the flood reaches

Barchidiv.

If the flash flood occurs more than two hours after the discharge drop is detected, the villages will already have been evacuated, as a level three evacuation alarm is automatically issued by MU9 once the discharge drop has lasted two hours.
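The alarm rules of the baseline procedure can be summarized in a short sketch. It is a simplification written for illustration only: the function and constant names are assumptions, the real system distributes these checks over several measuring units (MU7 for floods, MU9 for discharge drops and floods), and only the thresholds themselves are taken from the text.

```python
# Thresholds quoted in the baseline procedure. All names are illustrative
# assumptions; they are not identifiers of the real SCADA system.
LEVEL1_FLOW_M3S = 400.0        # flood sensors issue a level 1 alarm
LEVEL3_FLOW_M3S = 2000.0       # flood sensors issue a level 3 (evacuation) alarm
DROP_EVACUATION_HOURS = 2.0    # a discharge drop lasting this long triggers level 3


def alarm_level(flow_m3s: float, discharge_drop_hours: float = 0.0) -> int:
    """Return 0 (no alarm), 1 (warning) or 3 (evacuation).

    flow_m3s: flow rate measured by the flood sensors.
    discharge_drop_hours: time elapsed since a strong discharge drop was
    detected (0.0 if no drop has been detected).
    """
    if flow_m3s > LEVEL3_FLOW_M3S or discharge_drop_hours >= DROP_EVACUATION_HOURS:
        return 3
    if flow_m3s >= LEVEL1_FLOW_M3S or discharge_drop_hours > 0.0:
        return 1
    return 0


if __name__ == "__main__":
    print(alarm_level(100.0))        # normal flow, no discharge drop
    print(alarm_level(100.0, 0.5))   # discharge drop detected 30 minutes ago
    print(alarm_level(2500.0))       # flash flood above the evacuation threshold
```

The sketch makes the two independent paths to an evacuation alarm explicit: a flow rate above 2000 m³/s, or a discharge drop that has lasted two hours.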

1.2. Crisis Situations

Several crisis situations with different crisis-triggering events have been generated on the basis of the concerns identified in several documentation sources (Droz 2008, El Hanbali 2007, Bea 2008 personal communication).

- The first three situations are caused by direct and

indirect (i.e. maintenance related) human violations, which can be linked to existing

concerns on the future supply of funding and trained staff.

Situation 1

Due to insufficient funding and out-migration of trained staff, there is a lack of qualified personnel to operate the system on a continuous basis. Therefore, the remaining sufficiently trained operating personnel have to work longer shifts, which raises their level of stress and fatigue.

The earthquake occurs around 4 a.m., and neither the SCADA operator nor the personnel at the dam house are able to react appropriately because of stress, fatigue and a concomitant health problem (say, severe diarrhea). As a result, as the baseline natural hazard scenario unfolds, no human operator is available to manage the system.

Situation 2

Due to the remoteness of their location and their difficulty of access, no maintenance operation has been performed on measuring units 7 (flood sensor), 4 and 8 (strong motion accelerometers) during the past winter season, which has been particularly harsh.

As a result, at the occurrence of the earthquake in

mid April, the flood sensors and strong motion

accelerometers have not been accessed for either

routine maintenance or testing since mid

October.

Situation 3

Due to the decrease in funding and trained staff, the overall maintenance quality of the system has decreased dramatically, except for the alarm units in the villages, which are maintained in a satisfactory way because of the strong local involvement.

As a result, at the occurrence of the earthquake,

no maintenance operation has been performed on

any part of the system (except the village alarm

units) for a period of two years.

- The next three situations are caused by unpredicted and unpredictable events whose occurrence has a relatively low probability and high consequences. Such features have been identified as typical of crisis-triggering events (Bea 2008).

Situation 4

Due to a huge snowstorm, all access to and from the dam house is cut for two weeks at the occurrence of the earthquake. In addition, the outside visibility is reduced to less than 10 meters for the whole period.

As a result, no site access or visual observation

can be made for a period of two weeks following

the earthquake.

Situation 5

A stone avalanche triggered by the earthquake completely destroys measuring unit 7. As a result, the flood sensors at MU7 are not operational at the occurrence of the flash flood.

Situation 6

A snow avalanche triggered by the earthquake completely destroys the dam house.

- Finally, the last two situations illustrate specific concerns found in the literature (Schuster 2004, Papyrin 2007) about the reliability of the Lake Sarez early warning and monitoring system.

Situation 7

Due to global warming, the hydrogeological

setting of the area changes. Specifically, the

melting of the permafrost and the progressive rise

of the water level due to glacier melting may

change the natural hazard probabilities and

decrease the capacity of the early warning system

to mitigate the considered scenario.

Situation 8

A massive landslide on the unstable right bank slope is triggered by the earthquake and generates a massive tsunami that does not overtop the dam but destroys the dam house.

1.3. Crisis selection

Validation was sought for the eight situations described above by submitting them to an expert, who was asked to qualitatively evaluate their probability of occurrence and their consequences in terms of human lives at risk, by describing them as “low”, “medium” or “high”. Both Kadam Maskaev (the current chief operator) and Patrice Droz (the system designer) were contacted and requested to evaluate the situations, yet only Mr. Droz responded positively. The following section is thus solely based on Mr. Droz’s knowledge of the system and its environment, as its designer.

Situation 1

The first situation involves the activation of a level one emergency in the event of an earthquake. The procedure does not require any direct and immediate operator intervention other than a visual evaluation of the earthquake impacts, conducted at dawn. If a flash flood happened to occur before dawn, the flash flood detection and alarm system is fully automatic and operational, provided it is not destroyed during the earthquake (situation 5). Therefore, although the probability of an earthquake occurring at night during the shift of an insufficiently trained operator is considered “medium”, its consequences are considered “low”.

Situations 2, 3 and 5

Situations two, three and five all involve a malfunction of the MU7 flood detector, either because of a lack of maintenance or because of its destruction during the earthquake. According to the expert, there is a “medium” probability of a physical destruction of MU7 due to consequences of the earthquake (say, a stone avalanche). Yet, due to the difficulty of access to the device, combined with a decrease in funding and staff training, the probability of a malfunction of MU7 due to a lack of maintenance (typically a failure to periodically check its power supply) is considered “high”. The flood sensors located at MU7 are the only flash flood detection devices upstream from the village of Barchidiv and are thus a critical component of the system’s ability to detect the flood early enough to allow a safe evacuation of the valley. Yet the advantageous configuration of the system, and in particular the correlation with the MU9 measuring unit located in Barchidiv, would still allow the evacuation of most of the downstream population. However, if an unfavorable crisis sequence unfolds, the villages of Barchidiv and Nisur may not be evacuated in time. This point will be pursued in a following section. The consequences of a malfunction of MU7 are thus considered “medium”.

Situation 4

The temporary lack of accessibility and visibility described in situation four, although likely to happen in this elevated and mountainous area, does not compromise the direct efficiency of the evacuation alarm triggering system, which is entirely automatic and independent of the need for visual confirmation in the case of a flash flood. Yet it may influence the evacuation time of the population and delay the rescue and/or observation missions. This point will be considered in the assessment of the evacuation plan conducted in a later section. On a side note, according to Mr. Droz, the only Mi-8 helicopter in Tajikistan that could transport 15 people crashed in March 2008, which may temporarily complicate emergency access to the area. The probability of such a situation occurring is thus considered “high”, while the consequences for the ability of the system to detect the danger and transmit the alarm are considered “low”.

Situations 6 and 8

Situations six and eight involve the complete destruction of the dam house. The probability of occurrence of situation six, which involves a snow avalanche, is considered very close to zero by the expert, due to the chosen location of the house. Situation eight involves the destruction of the dam house due to a tsunami in Lake Sarez. The very occurrence of the tsunami due to a massive landslide of the unstable right bank slope of the lake in the event of an earthquake has been considered “very unlikely” by the panel of experts in charge of the risk survey undertaken in the frame of the LSRMP. Yet if the tsunami does occur, the destruction of the dam house is possible. Indeed, the dam house was built by the Usoi department at an altitude 30 meters lower than advised by Stucky Ltd, to “spare their men”.

Situation 7

Finally, situation seven involves possible effects of

global warming on the region, namely a melting of the

permafrost and an increase in river water levels due to

glacier melting. According to the expert, the melting of the permafrost will merely produce superficial landslides and local block collapses, but no significant large-scale movements. Moreover, a hydrological survey established in 2001 has shown that despite global warming, no important modification in the natural flow rates in the area has occurred in recent years. Therefore, despite the near certainty of global warming, its consequences for the early warning system of Lake Sarez are considered “low”.

The expert opinion on the effects of the considered crises on the system gives important insights into its functioning and behavior. The qualitative risk evaluation linked to each situation is summarized in figure 13. According to the expert, the riskiest situations concerning the alarm system are situations two, three and five, which all involve a malfunction of the main flood sensors 10 km upstream from Barchidiv (MU7). The loss of MU7 is thus selected as the crisis-triggering situation to be further investigated in the following section.

1.4. Crisis Analysis

A malfunction of MU7 having been identified as the riskiest crisis-triggering situation by the expert assessment presented above, the aim of the following paragraphs is to estimate in more depth the effect of such a crisis on the functionality of the early warning system. A quantitative likelihood estimate of such a crisis was provided by Mr. Droz (2008). The likelihood of the destruction of MU7 as a consequence of the earthquake was estimated at 5%, while the likelihood of a malfunction due to a lack of maintenance was estimated to be as high as 30%. These crisis likelihoods are high enough to deserve attention.

However, the effects of a MU7 malfunction are

mitigated by the presence of the downstream measuring unit MU9. Indeed, MU9 is very likely to be

working properly because of its location in the village of Barchidiv, which facilitates its access for

maintenance and decreases the probability of it being destroyed in the aftermath of the earthquake (the

villages being generally located in “safer” zones). This

measuring unit is equipped with a gauging unit to

detect the significant flow rate decrease that precedes a piping failure flash flood. In addition and similarly to

MU7, MU9 is equipped with automatic flood sensors. Therefore, if the flash flood occurs after a period of

two hours of significant flow rate decrease, an evacuation alarm will be automatically triggered by

MU9. The response to such a scenario would thus not be affected by a malfunction of MU7.

Situation  Likelihood  Consequences  Description
1          Medium      Low           Earthquake at night on unprepared staff
2          High        Medium        MU7 flood sensors (FS) malfunction due to physical inaccessibility for maintenance
3          High        Medium        MU7 FS malfunction due to long-term decrease in maintenance quality
4          High        Low           No site access and no visibility
5          Medium      Medium        MU7 FS destruction resulting from earthquake
6          Low         Low           Dam house destruction resulting from snow avalanche
7          Medium      Low           Global warming effects
8          Low         Low           Dam house destruction resulting from a tsunami

Fig. 13. Expert situation selection.

Yet if the flash flood occurs within two hours after the flow rate decreases, it will only be automatically detected when a 2000 m³/s flow rate passes through the village of Barchidiv. According to Mr. Droz, the system has been designed to trigger the evacuation alarm within 2 minutes from the moment the flood is detected. Therefore, in the worst-case scenario where the flood is only detected at MU9, the evacuation is triggered about 15 minutes after the flood has passed MU7 (i.e. the fifteen minutes needed for the flood to cover the 10 km separating MU7 and MU9). If no proper action were taken to decrease this 15-minute delay in detection time, the two uppermost villages of Barchidiv and Nisur would thus be threatened by a lack of sufficient evacuation time. Mr. Droz estimates the consequences of this delay in terms of human lives at about ten casualties. This low number can probably be explained by the fact that both Barchidiv and Nisur are mostly located in low to medium risk zones, as revealed by the risk mapping established from the flood routing simulations. Moreover, according to Mr. Droz, the villagers are likely to spontaneously evacuate to the safe havens in the event of an earthquake. This assertion will be investigated in a following section concerning the evacuation plan.
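The timing and likelihood figures quoted in this crisis analysis can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and combining the 5% and 30% estimates into a single figure assumes that destruction and lack of maintenance are independent causes, which the expert did not state.

```python
def warning_time_min(travel_time_min: float, alarm_delay_min: float = 2.0) -> float:
    """Minutes between the siren and the arrival of the flood front.

    travel_time_min: time the flood needs to travel from the detecting sensor
    to the village. A negative result means the siren sounds after the flood
    front has already reached the village.
    """
    return travel_time_min - alarm_delay_min


def p_unavailable(p_destroyed: float, p_unmaintained: float) -> float:
    """Probability that at least one of two causes disables the sensor.

    Assumption (not from the source): the two causes are independent.
    """
    return 1.0 - (1.0 - p_destroyed) * (1.0 - p_unmaintained)


# Detection at MU7: the flood reaches Barchidiv 12 to 14 minutes later.
with_mu7 = [warning_time_min(t) for t in (12.0, 14.0)]

# Detection only at MU9, which is located in Barchidiv itself.
without_mu7 = warning_time_min(0.0)

# Expert estimates: 5% destruction by the earthquake, 30% lack of maintenance.
p_mu7_down = p_unavailable(0.05, 0.30)

if __name__ == "__main__":
    print(with_mu7, without_mu7, round(p_mu7_down, 3))
```

Under these assumptions, losing MU7 turns a warning time of 10 to 12 minutes in Barchidiv into a siren that sounds after the flood front has arrived, and the combined likelihood of MU7 being unavailable is roughly one in three.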

1.5. Crisis management suggestion

However, given the configuration of the early warning system, I argue that the baseline 15-minute delay imposed by the automatic reaction time described above can be decreased through a proper management of a MU7 malfunction crisis. Such crisis management involves a proper implementation by the SCADA operating crew of what Reason (1990, quoted by Bea 2008) describes as an Observe, Orient, Decide and Act (OODA) loop. In the case of the described crisis, an efficient OODA loop can be described as follows:

- Observe: The goal of the observation phase is to detect the symptoms and abnormal system behavior that reveal the presence of a crisis. In the considered crisis, a level 1 (400 m³/s) flood alarm from MU9 without the corresponding alarm from the upstream MU7 flood sensors is a possible early sign of the crisis induced by a malfunction of MU7.

- Orient: The goal of the orientation phase is to establish

and confirm a causality link between the observed symptoms and the crisis-triggering event. In the

considered crisis, the malfunction of MU7 has to be inferred by the operating crew on the basis of

the absence of the expected level one warning. This inference must ultimately be confirmed by launching a remote MU7 testing protocol.

- Decide: The goal of the decision phase is to take the appropriate decision on the basis of the previously established causality links. The outcomes of an appropriate decision are to mitigate the crisis and return the system to its normal functioning state. In the considered crisis, the decision to evacuate the potentially threatened villages must be taken. Given the uncertainty about the actual occurrence of a flood (due to the acknowledged malfunction of MU7), such a decision may not be easy to take, knowing that a “false” evacuation would be dangerous and costly and could encourage a cry-wolf effect (i.e. decrease the awareness of the population in the event of future alarms). On the other hand, the results of not evacuating the villagers in the event of a flash flood are obvious.

- Act: The decision taken in the previous step must then

be implemented, which in the considered case consists in manually triggering from the SCADA

the level three alarms in the threatened villages.

- Observe: Finally, the results of the crisis management

strategies must be observed and the decision taken to reiterate the OODA loop if the crisis is

not solved. In the considered case, a feedback and visual confirmation from the dam house staff

and/or the evacuated population are expected.

If the OODA loop is successfully conducted, an evacuation alarm can be given shortly after a level one 400 m³/s flow rate is detected in Barchidiv, which, depending on the hydrograph of the flood, would leave enough time for a successful evacuation of the threatened villages. This management strategy has been submitted to and validated by Mr. Droz (2008), with the remark that its efficient implementation would be compromised in the event of too sharp a hydrograph peak.
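The OODA loop proposed in this section can be sketched as a small decision helper for the SCADA crew. This is a hypothetical illustration, not a description of the installed software: the function names, the returned action strings and the branch taken when the remote MU7 test passes are all assumptions made for the example.

```python
from typing import Optional

LEVEL1_FLOW_M3S = 400.0  # level 1 flood threshold quoted in the text


def suspect_mu7_malfunction(mu9_flow_m3s: float, mu7_alarm_received: bool) -> bool:
    """Observe: MU9 reports a level 1 flood that the upstream MU7 never announced."""
    return mu9_flow_m3s >= LEVEL1_FLOW_M3S and not mu7_alarm_received


def ooda_action(mu9_flow_m3s: float,
                mu7_alarm_received: bool,
                mu7_remote_test_passed: Optional[bool] = None) -> str:
    """Return the next action suggested by one pass through the loop.

    mu7_remote_test_passed is None while the remote test has not been run.
    """
    if not suspect_mu7_malfunction(mu9_flow_m3s, mu7_alarm_received):
        return "monitor"
    if mu7_remote_test_passed is None:
        # Orient: confirm the inferred malfunction before deciding.
        return "run remote MU7 test"
    if mu7_remote_test_passed:
        # Assumption: a healthy MU7 makes the MU9 reading itself suspect.
        return "investigate MU9 reading"
    # Decide and Act: MU7 is confirmed down while a flood is rising at MU9.
    return "trigger level 3 alarm manually"


if __name__ == "__main__":
    print(ooda_action(450.0, mu7_alarm_received=False))
    print(ooda_action(450.0, mu7_alarm_received=False, mu7_remote_test_passed=False))
```

The final Observe step of the loop, feedback from the dam house staff or the evacuated population, would correspond to calling the helper again with updated inputs.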

2. Evacuation plan

2.1. Baseline performance

In addition to triggering the alarms, the successful mitigation of the natural hazard described in the baseline scenario involves the efficient evacuation of the villages. The evacuation plan consists of the rapid evacuation of the villagers towards equipped and maintained safety havens in the nearby mountains once the local alarm sirens are activated. A “Disaster Response Team” is nominated in each village to organize and coordinate these evacuations, as well as to supply and maintain the safety havens. This is done in collaboration with FOCUS humanitarian, an NGO that works to raise awareness and provide training among the communities. In that frame, evacuation exercises were ordered in April 2007 by the Usoi department (the LSRMP management authority) in the most critical villages of Barchidiv and Nisur. According to FOCUS humanitarian (Karim 2008, personal correspondence), an evacuation drill was conducted in the village of Barchidiv on April 19th 2007 and yielded an evacuation time of ten minutes, in which the population of 186 was successfully transferred to the safety haven located 500 meters above the village. On the next day in the downstream village of Nisur, 17 minutes were required for the 242 inhabitants to cover the 800 meters to the designated safety haven.

Assuming the alarm is given by the early warning system within two minutes after the flood is detected at MU7 (see section II.B.1.1), these evacuation times are barely within the totals of 12 to 14 minutes and 16 to 19 minutes respectively imposed by the flash flood demand on the villages of Barchidiv and Nisur (see section II.A.2.2). As a matter of fact, in the case of Nisur, the total capacity performance of 19 minutes (i.e. two minutes detection time and 17 minutes evacuation) does not meet the demand, as it exceeds the lower bound of 16 minutes and leaves no margin at the upper bound of 19 minutes. Furthermore, it is important to mention that the evacuation exercises were conducted in favorable conditions (in daytime and in spring) and on a warned population, to prevent “shocking” and “excessive stress” on the local population (Karim 2008, personal correspondence). Yet several authors have shown (UrbanikII 2000, Duclos 1987, Aboelata 2004, Graham 1999, Sime 1995, Colonna 2001) that the evacuation time performance of a community is shaped by numerous factors that are not taken into account if the exercise is conducted on a warned population. Such factors include the level of local awareness and compliance, the isolation of the community in a crisis situation, environmental factors such as night time and bad weather, the lack of sufficient warning time, and the congestion of the evacuation paths due to panic, confusion and the lack of excess capacity of the local communication system.

Thus, given the limited excess capacity in the population evacuation performance times, and taking into account the evacuation performance shaping factors mentioned in the literature, several crisis situations were generated, whereby the ability of the evacuation plan to meet the required demand may be compromised by unfavorable factors added to the baseline exercise conditions in which the evacuation time performances were measured.

2.2. Crisis situations

According to previous reports (Palmieri 1999) and thanks to the awareness-raising work of local NGOs among the population, the lack of local knowledge and of perceived threat, which have been mentioned in the literature (Duclos 1987, Colonna 2001) as key evacuation time performance shaping factors, are not of concern in the present case. Moreover, the low population density of the area also decreases the concern of evacuation delays due to the congestion of evacuation routes (Sime 1995). Yet several other factors may increase the evacuation time to a point where the demand imposed by the flood exceeds the capacity of the system to evacuate the population in time.

Thus, several potential crisis-triggering situations were generated on the basis of the factors that were found in the literature to affect the population evacuation time. The relevance of each scenario was then tested through its submission to an expert’s judgment. Mr. Rahim Balsara, the FOCUS humanitarian staff member responsible for the LSRMP evacuation plan in the Bartang valley, was consulted. He was asked to evaluate the likelihood (P()) of each scenario, as well as its potential consequences (Cons) in terms of human lives lost. For each situation, his evaluation and comments are given in the table that follows it.

- The first factor to consider is the surprise effect (Colonna 2001). As Colonna states, “With no announced warning, occupants might demonstrate behaviors that could be dangerous under actual emergency conditions”. Therefore, even without taking into consideration the other aggravating factors that will be mentioned in the following paragraphs, the surprise factor alone may be of significant importance and is thus a potential crisis-triggering situation:

Situation 1

The baseline scenario occurs without other

aggravating factor than the fact that the

evacuation occurs on a surprise basis.

Remarks
P()   Med   The surprise effect can be mitigated through the crisis mitigation strategy presented further down, where training in the detection of “natural” early signs will be suggested.
Cons  High  The surprise effect has been estimated by Mr. Balsara to increase the evacuation time from 10 to 15-17 minutes for Barchidiv, and from 17 to 20-25 minutes for Nisur.

Fig. 14. Situation 1

Mr. Balsara estimated the effect of the surprise factor on the evacuation time shown in figure 14, with the comment that actual trainings or drills have not yet been conducted during winter. The figures given above are thus rough estimates. Yet the fact is that the upper values of 14 and 19 minutes, imposed by the demand on the villages of Barchidiv and Nisur respectively, are exceeded.

- The next two situations may be linked to the surprise factor as well, whereby subsequent panic

or the lack of appropriate and available leadership may significantly affect the evacuation

performance (Duclos 1987, Colonna 2001):

Situation 2

A general panic situation occurs among the

population, as a result of the simultaneous

occurrence of the earthquake and the flood

sirens.

Remarks
P()   Low   Villagers are used to natural hazards around them and would most likely react and respond in a calm manner.
Cons  Low   Once the gravity of the situation is realized, evacuation times would not be impacted. These villagers have received trainings and general education related to these hazards as well as the alert system.

Fig. 15. Situation 2

Situation 3

The earthquake causes heavy casualties among the population. As a result, the key members of the disaster response team in the villages are missing when the flood sirens ring.

Remarks
P()   Low   Members of Community Support Teams (CST) are spread across the villages and unlikely to be all affected at the same time.
Cons  Low   Dependence on CST for evacuation is fairly low, as general community knowledge and trainings are at play during such a crisis.

Fig. 15. Situation 3

- Furthermore, the obvious effect of external environmental factors affecting orientation and movement (Graham 1999, UrbanikII 2000) is taken into account in the next two situations:

Situation 4

The principal evacuation routes are severely

damaged due to snow or stone avalanches

triggered by the earthquake.

Remarks
P()   Medium  Evacuation paths are designed to be safe from these natural hazards during the planning and design phase.
Cons  High    If this were the case, they would need to pursue alternate routes, which might delay reaching destination.

Fig. 16. Situation 4

Situation 5

The baseline scenario occurs at night in a bad

snowstorm. Therefore visibility is reduced below

10 meters in the whole area.

Remarks
P()   Low   Villagers are familiar with the routes and used to severe weather conditions.
Cons  Low   Less likely to impact evacuation times in a significant manner, as they are familiar with these routes and weather conditions.

Fig. 17. Situation 5

- Finally, the criticality of providing a sufficient warning time for the population to evacuate (Graham 1999, Aboelata 2004) is illustrated in the last two situations, where the alarm could simply not be transmitted in time by the early warning system.

Situation 6

The earthquake destroys the alarm stations in the

villages. As a result, no evacuation alarm is

given.

Remarks
P()   Medium  These are wireless systems connected via satellite and less likely to be affected by on-the-ground damage.
Cons  High    If these systems do fail to function due to unforeseen circumstances, the evacuation times will be significantly longer, as the warnings would have to come from a neighboring village or from the actual event itself.

Fig. 18. Situation 6

Situation 7

Due to a decrease in maintenance quality, the alarm equipment in the villages is not functioning.

Remarks
P()   High  System malfunction probability is high as these equipments are not managed and maintained by Committee of Emergency Situation, USOI department and villagers. Due to joint responsibility its upkeep, regular testing and functioning
Cons  High  In case of system failures, evacuation times will be significantly longer.

Fig. 19. Situation 7

The qualitative risk evaluation linked to each situation is summarized in figure 20. According to the expert, the highest risk with regard to the evacuation of the population can be found in situation 4, where the evacuation routes are made useless as a consequence of the earthquake, as well as in situations 6 and 7, where the alarm is not given at all. Although the possible occurrence of situation 4 would certainly be concerning enough to deserve further study, this paper will focus on the crisis triggered by the failure of the alarm system to order the evacuation of a village. Indeed, the malfunction likelihood of the early warning system (either at MU7 or at the village sirens level) has been considered high by both consulted experts (Droz 2008 and Balsara 2008). The importance of the potential consequences linked to the total absence of a flash flood alarm in the Bartang Valley being obvious, the risk linked to such a crisis is thus maximal.

Situation  Likelihood  Consequences  Description
1          Medium      High          Evacuation on a surprise basis
2          Low         Low           General panic
3          Low         Low           No surviving leadership
4          Medium      High          Evacuation path destroyed by the earthquake
5          Low         Low           Very low visibility and extremely bad weather conditions
6          Medium      High          Village alarm destroyed by the earthquake
7          High        High          Village alarm not functioning due to poor maintenance

Fig. 20. Expert situation selection.
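As an illustration of how the qualitative ratings of figure 20 can be turned into a ranking, the sketch below maps each rating to a score and multiplies likelihood by consequences. The numeric scores and the product rule are assumptions made for this example; the paper itself relies on the expert's qualitative judgment. Situations are numbered as in the text of section 2.2 (situation 4 is the destroyed evacuation path, situation 5 the snowstorm).

```python
# Illustrative scoring; the scale is an assumption, not part of the expert survey.
SCORE = {"Low": 1, "Medium": 2, "High": 3}

# (likelihood, consequences) per evacuation situation, as rated by the expert.
situations = {
    1: ("Medium", "High"),  # evacuation on a surprise basis
    2: ("Low", "Low"),      # general panic
    3: ("Low", "Low"),      # no surviving leadership
    4: ("Medium", "High"),  # evacuation path destroyed by the earthquake
    5: ("Low", "Low"),      # very low visibility and bad weather
    6: ("Medium", "High"),  # village alarm destroyed by the earthquake
    7: ("High", "High"),    # village alarm not functioning (poor maintenance)
}


def risk_score(likelihood: str, consequences: str) -> int:
    """Product of the two ordinal scores (higher means riskier)."""
    return SCORE[likelihood] * SCORE[consequences]


# Sort situations from riskiest to least risky; ties keep their original order.
ranking = sorted(situations, key=lambda s: risk_score(*situations[s]), reverse=True)

if __name__ == "__main__":
    for s in ranking:
        print(s, situations[s], risk_score(*situations[s]))
```

Under this scoring, situation 7 ranks first, which is consistent with the focus on alarm failure chosen in the text. Situation 1 ties with situations 4 and 6, since all three carry the same ratings.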

2.3. Crisis Analysis

The risk evaluation by expert judgment has led to selecting the crisis-triggering situation where the village alarm system does not function properly and fails to issue the evacuation alarm before the passage of the flash flood. According to the expert, the likelihood of an alarm malfunction in the village of Barchidiv or Nisur is “very high”, that is, higher than 85%.

The malfunction of the village alarm units may result in the passage of a super-concentrated flash flood through unwarned villages. The consequences of such a crisis in terms of human casualties are considered “medium” in Barchidiv but “high” in Nisur.

Yet, in a similar manner to the other crises detailed earlier in this paper, I argue that a proper interactive management of the crisis can mitigate the consequences of these malfunctions. Indeed, given the failure of the engineered system to issue the expected early warning, the most obvious crisis management strategy would consist in searching for alternate warning signs. In that light, two possibilities have been identified and considered, in collaboration with Mr. Balsara (2008).

Reported alarm

The first strategy has been mentioned by Mr. Balsara as a response to one of the crisis-triggering situations mentioned in the previous section (situation 6), and consists of the evacuation message being transmitted from a neighboring village.

Considering the critical villages of Barchidiv and Nisur, and given the remoteness of the location and the available transportation means, the alarm message can be transmitted either via radio communication or by a running messenger. These possibilities have been submitted to the expert’s judgment, in order to evaluate their probability of occurrence and success. A warning is considered successful when the evacuation of the village is completed before the passage of the flash flood.

Given the distance and the lack of quick communication means in an earthquake aftermath, the expert considered both the occurrence and the success of a warning transmitted by land negligible. However, the chances of occurrence and success of a radio warning were considered high (60%-85%). Thus, training the villagers to systematically transmit the warning signal to the neighboring village by radio can mitigate the risk linked to a malfunction of the alarm equipment.

However, this strategy is based on the assumption that the alarm malfunctions are not correlated among the villages. In other words, the occurrence of the malfunction in a given village does not affect the likelihood of it occurring in a neighboring village as well. Yet the task of maintenance currently being the responsibility of the same entity (the Usoi department) in all the villages, a decrease in the quality of the maintenance would affect all the villages equally. The likelihood that a given alarm will not function is thus higher if the surrounding alarms are also not functioning due to a lack of maintenance. Therefore, the reported alarm strategy would only be an efficient crisis management option if the procedure of warning the surrounding villages by radio is formalized and the villagers are trained accordingly. In addition, the task and responsibility of maintaining the village alarm equipment should be handed over to the villagers themselves. That would decrease the correlation between the alarm malfunctions. Furthermore, attributing the alarm maintenance task to the people these alarms are designed to save may increase the likelihood that the maintenance is properly performed. Finally, it would build local capacity and empower the communities, which was one of the main goals of the project as stated by the World Bank (see section I.C.3.2 and El Hanbali 2007).
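The effect of correlated alarm malfunctions on the reported alarm strategy can be illustrated with a simple mixture model. The model is an assumption introduced for this sketch, not something stated by the experts: with probability `correlation` the neighboring alarm shares the fate of the local one (common maintenance cause), and otherwise it fails independently. The input values are the expert's estimates used as illustrative inputs.

```python
def p_relayed_warning(p_alarm_fails: float,
                      p_radio_success: float,
                      correlation: float) -> float:
    """Probability that a village whose own alarm failed is warned by its neighbor.

    correlation = 0.0: the two alarms fail independently.
    correlation = 1.0: a shared cause makes both alarms fail together.
    """
    if not 0.0 <= correlation <= 1.0:
        raise ValueError("correlation must lie between 0 and 1")
    # Given that the local alarm failed, the neighbor's alarm works only in the
    # independent branch of the mixture, with probability (1 - p_alarm_fails).
    p_neighbor_works = (1.0 - correlation) * (1.0 - p_alarm_fails)
    return p_neighbor_works * p_radio_success


if __name__ == "__main__":
    # Illustrative inputs: 85% alarm malfunction likelihood, 60% radio success.
    for c in (0.0, 0.5, 1.0):
        print(c, round(p_relayed_warning(0.85, 0.60, c), 4))
```

In this model the chance of a relayed warning falls linearly to zero as the correlation approaches one, which is the argument made above for decoupling the maintenance of the village alarms.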

Direct Observation

The second strategy has been suggested by Mr. Droz (2008), and is based on the capacity of the villagers to notice and react to the chain of natural events that would precede the flash flood in the baseline scenario. Namely, the life-saving strategy consists in the reflex of the Community Support Teams in the villages to spontaneously take the appropriate decision to conduct a safety evacuation once a strong earthquake and/or important flow rate variations in the river are observed.

A spontaneous decision of the villagers to evacuate the village shortly after the occurrence of the earthquake, and thus before the occurrence of the flash flood, would remove the negative consequences of a malfunction of the alarm system by giving more than enough time for a successful journey to the safety havens. Such a reaction is thus a priori desirable. Yet the occurrence and success of such a spontaneous decision depend on a chain of likelihoods that frame the decision-making process within the community. In the same manner as in a preceding paragraph, these crisis management stages are described here in the frame of Reason’s (1990) OODA loop:

- Observation: The first likelihood concerns the probability that these preceding phenomena are observed and noticed by the villagers, despite possibly difficult environmental conditions resulting in low visibility. In the considered case, these phenomena are an earthquake and a strong river flow rate variation. The likelihood that the villagers would notice these events is considered “medium” (40%-60%) and “very high” (>85%) for the earthquake and the flow rate variation respectively. Yet it must be noted that these probabilities depend greatly on local conditions, mainly environmental, that could affect the detection ability of the villagers.

- Orientation: An inference must then be made between the detected phenomenon and the imminent threat of a flash flood. There is thus a certain likelihood that these events are properly interpreted by the Community Support Teams (CST), which may then issue an evacuation order. This likelihood is considered “high” (60%-80%) by the expert.

- Decision: The evacuation order from the CST must then be followed by the villagers, despite the absence of a tangible and unmistakable evacuation signal such as alarm sirens. Therefore, the likelihood must be considered that the whole village will actually follow the evacuation order from their fellow villagers of the CST in the absence of an alarm. According to the expert, the issue here is the absence of a formalized backup village alarm system that can be unmistakably activated and followed in case the main system fails to work. Such a system can be as simple as a village gong, but must be implemented.

- Action: The evacuation order must then be transmitted to the entire population of the village in a minimum time and, again, in the absence of an unmistakable alarm siren. However, if the entire population is reached and warned, the chances of a successful evacuation are “very high” (>85%) according to the expert.

- Observation: Finally, the observation of the effectiveness of the intended actions is critical. Specifically, in the present case, the piping phenomenon and the resulting flash flood can occur up to hours after the triggering earthquake. Therefore, it is critical that the entire population stays in the safety haven until the risk is confirmed to have decreased to an acceptable level.

If the OODA loop is successfully conducted, the villagers themselves can give an evacuation alarm shortly after the occurrence of the early phenomenon, that is, several hours before the occurrence of the flood. Furthermore, this warning mechanism would be totally unaffected by any of the hardware malfunctions of the system that are considered here. Finally, if we consider the OODA loops for the two considered warning signs (earthquake and flow rate variation) as a parallel configuration of two series systems, the overall chance of success estimated by the expert is “high”, which provides internal validation to this strategy.
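The parallel configuration of two series systems mentioned above can be sketched numerically as follows. This is an illustration under stated assumptions, not the expert's calculation: the stages are treated as statistically independent (which is optimistic, since both paths rely on the same CST), the “very high” (>85%) class is given an upper bound of 100%, and the decision stage, which the text does not quantify, is given a hypothetical placeholder range. The function and constant names are mine.

```python
from math import prod


def series_success(stage_probs):
    """A series system succeeds only if every stage succeeds."""
    return prod(stage_probs)


def parallel_success(path_probs):
    """A parallel system succeeds if at least one path succeeds."""
    return 1.0 - prod(1.0 - p for p in path_probs)


# (low, high) bounds of each likelihood class.
OBSERVE_EARTHQUAKE = (0.40, 0.60)  # "medium" in the text
OBSERVE_FLOW = (0.85, 1.00)        # "very high" (>85%); upper bound assumed
ORIENT = (0.60, 0.80)              # "high" in the text
DECIDE = (0.60, 0.80)              # ASSUMPTION: not quantified in the text
ACT = (0.85, 1.00)                 # "very high" (>85%); upper bound assumed


def overall_bounds():
    """Return (low, high) bounds on the chance that at least one loop succeeds."""
    bounds = []
    for i in (0, 1):  # 0 = all lower bounds, 1 = all upper bounds
        quake_path = series_success(
            [OBSERVE_EARTHQUAKE[i], ORIENT[i], DECIDE[i], ACT[i]]
        )
        flow_path = series_success(
            [OBSERVE_FLOW[i], ORIENT[i], DECIDE[i], ACT[i]]
        )
        bounds.append(parallel_success([quake_path, flow_path]))
    return tuple(bounds)


low, high = overall_bounds()
print(f"Overall chance of success between {low:.2f} and {high:.2f}")
```

The resulting range is wide and sensitive to the placeholder chosen for the decision stage and to the independence assumption, so it should be read as an illustration of the configuration rather than as a check on the expert's qualitative estimate.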

III. Evaluation

A qualitative analysis of the robustness of the Lake Sarez Risk Mitigation Project (LSRMP) has been undertaken in this work. The evaluation consisted in inquiring whether the capacity depicted by an expert assessment meets the demand in the considered situations. In the absence of a crisis, the baseline demand of at least 12 and 16 minutes is barely met by the baseline capacity of 12 and 19 minutes in the critical villages of Barchidiv and Nisur respectively. However, knowing that most of both villages are located in a relatively safe zone, the consequences of this lack of excess capacity have been estimated at fewer than 10 casualties (Droz 2008). As a matter of fact, the LSRMP has led to a significant decrease of the risk in the Bartang valley, to orders of magnitude similar to those nowadays accepted for engineered dams (Droz 2007), which must be considered a success given the remoteness and lack of infrastructure of the area. Furthermore, the effect of most of the considered potential crisis situations on the reliability of the LSRMP was considered “low” by the experts in terms of likelihood or consequences. This denotes a sufficient robustness of the system with regard to most of the considered crises, due to its configuration, correlation, and local excess capacity.
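The baseline comparison of demand and capacity quoted above amounts to a simple subtraction, sketched here for the two critical villages. The figures are those given in the text; since the demand is stated as “at least” 12 and 16 minutes, the computed margins are upper bounds. The names used are illustrative.

```python
# (demand, capacity) in minutes, as quoted in the text for the baseline scenario.
BASELINE = {
    "Barchidiv": (12, 12),
    "Nisur": (16, 19),
}


def excess_capacity(demand_min: int, capacity_min: int) -> int:
    """Excess capacity in minutes; the demand is a lower bound, so this is an upper bound."""
    return capacity_min - demand_min


margins = {
    village: excess_capacity(demand, capacity)
    for village, (demand, capacity) in BASELINE.items()
}

for village, margin in margins.items():
    print(f"{village}: at most {margin} min of excess capacity")
```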

However, the expert assessments have also revealed that the probabilities of occurrence of the riskiest crisis-triggering situations are highly dependent on the quality of the maintenance of the infrastructure, for both the engineered early warning system and the population evacuation plan.

In that light, given (1) the non-negligible level of natural hazard risk linked to Lake Sarez (fig. 5) and (2) the concerns expressed about the future staffing and funding of the device, as well as the decrease in the quality of maintenance (section I.C.3.4 and expert assessments), a crisis similar to those analyzed in this paper is likely to happen in the medium to long term, and will thus require efficient interactive management as an absolute necessity to mitigate its effect.

The following recommendations are therefore given as possible paths to address this issue and thus limit the related risk through an optimized crisis mitigation and management strategy.

IV. Recommendations

The following recommendations were issued on the basis of the theories and methods drafted by Professor Robert Bea, from the University of California at Berkeley, as part of his research on interactive crisis management strategies. The fact that Professor Bea based his work on 500+ accident cases (Bea 2008) itself provides an external validation of these recommendations. Additionally, an internal validation in the specific context of Lake Sarez will be sought by submitting this paper to the local experts.

A. Proactive measures: towards a High Reliability Organization

The term “High Reliability Organization” (HRO) describes “organization[s] that have operated nearly error free for a long period of time” (Bea 2008). Such organizations include nuclear aircraft carriers, nuclear power plants and air traffic control systems; the common characteristic of all these systems is the extreme consequences of a failure. In that light, and given that its very purpose is to be reliable enough to decrease the risk linked to the Usoi dam, it is my opinion that the LSRMP should strive towards being an HRO as a proactive measure to prevent crises. This section underlines the HRO principles and characteristics (Roberts 1990, Bea 2008) that are particularly relevant to the LSRMP.

1. Safety oriented organizational culture

The first relevant common characteristic of HROs is that they place safety at the very root of their organizational culture. In other words, such organizations are primarily preoccupied with failures and crises, at every level of management. Such a state of mind is crucial among the actors involved in the LSRMP, as it would allow a quick detection and remediation of incipient crises, as well as a more efficient management of the crises that do occur. In that light, the decision of the Usoi department to build the dam house at a lower altitude than advised by the expert, in order to “spare their men” (Droz 2008), is a worrying detail. Indeed, although not necessarily significant in terms of direct risk increase, this decision may reveal a deeper and worrying mindset, where safety requirements are compromised for an economy of labor, even before the operational phase of the project.

2. Open communication and extensive process auditing

According to Merry (1998), a safety oriented organizational culture involves, among other things, ongoing safety performance measurements and openness of communications within the organization. Moreover, extensive process auditing procedures and actions are mentioned by Roberts and Libuser (1993) as key aspects of an HRO. Therefore, the operating authority’s lack of motivation to be audited is worrying at the very least. Indeed, despite numerous attempts, no relevant answer has been obtained from the Usoi department in the course of this work. Furthermore, Droz (2008) mentioned a decrease in the quality of the safety reports issued by the organization, as well as a

break in the communication between himself (the

designer) and the current operators of the project.

3. Migrating decision making

Migrating decision making is mentioned by Roberts (1989) as another key aspect of HROs, whereby “authority is pushed to the lower levels of the organization [and…] decision making responsibility allowed to migrate to the persons with the most expertise to make the decision when the situation arises (employee empowerment)” (Bea 2008). Yet in the case of the Usoi Department, most operational decisions concerning maintenance (Droz 2008), real-time management of crises (Droz 2008) and training (Karim 2008) are reported to be left to a single person, Kadam Maksaev, the Usoi Department head, in Dushanbe.² Such a decision-making scheme does not optimally allocate the available cognitive resources of the organization to improve its safety, and may eventually compromise its ability to efficiently resolve an occurring crisis.

Therefore, maintaining the reliability of the LSRMP on a long-term basis would involve a deep shift in the organizational mindset towards a safety oriented culture; a better match between decision-making power, responsibility and relevant knowledge; and the restoration of communications with potentially useful parties outside the organization, coupled with an internal motivation to be audited. In addition, several other HRO characteristics could help decrease the likelihood of crises and optimize their management in the case of the LSRMP. Such principles include a commitment to resilience, an emphasis on training and selection, and a proper management of incentives, all of which will be further described in the following paragraphs.

B. Interactive measures: staying awake!

The importance of properly managing the real time occurrence of a crisis, in order to limit its effect and

restore a proper functioning of the system, has been described at the beginning of this paper (section I.B).

Considering the selected crisis situations, a specific interactive crisis management strategy has been

suggested to minimize the time required to both warn and evacuate the population (sections II.B.1.5. and

II.B.2.3.). The goal of the following section is to walk through the OODA loops of both strategies to identify and summarize the main system characteristics necessary for their successful application.

² Dushanbe is located more than 400 km from the Bartang Valley. Furthermore, the only available helicopter was reported to have crashed in May 2008 (Droz 2008).

1. Early Warning System Crisis Management

Observation

The symptom detection time depends on the proper functioning of the system’s sensory abilities. In the population evacuation case, these sensory abilities are the villagers themselves, who should be able to quickly detect any potential crisis-forecasting event in their environment. This goal is achieved by raising awareness through training and education. Fortunately, risk assessments (Droz 2007) have revealed that the trigger of any natural hazard linked to Lake Sarez is almost certainly a strong earthquake, which is rather obvious to detect. In the early warning system case, the detection time is influenced by the flood sensors at MU9, as well as by the ability of the system to properly transmit the alarm to the operations headquarters in Dushanbe, by avoiding unfavorably correlated components (e.g. components in a parallel configuration that are threatened by the same source of risk). In the case of Lake Sarez, this is achieved through the low correlation between the probabilities of failure of MU7 and MU9 (section II.B.1.5). However, the symptom detection time also depends on an uncontrollable characteristic of the “demand” on the system, namely the morphology of the flood. Indeed, the strategy would only be effective for progressively increasing floods (Droz 2008), where sufficient time is available between the passage of a 400 m³/s flow rate and that of a level-three 2000 m³/s flow. If the flood is too sudden, the available time is too short to effectively perform the prescribed actions.

Orientation

The orientation mainly depends on the ability of the persons involved to decode the symptoms and to attribute them to their actual cause or consequence. This ability is described by Weick (1995) as “sensemaking” and is a critical component of the success of the crisis management strategy.

Decision

The decision time depends on the ability of the responsible persons to apply a clear and efficient decision protocol, whereby the decision power in a given situation is attributed to the person best able to exercise it. This ability includes sufficient experience and knowledge of the specific field context to appropriately visualize the situation (i.e. “to step in the victim’s shoes”), the ability to generate and browse numerous decision options, and the ability to instantly be aware of the probable consequences of any possible solution. In other words, the decision maker must be able to exercise empathy, improvisation and mental simulation in a stressful and time-limited situation (Bea 2008).

Action

Finally, minimizing the action time requires good coordination within the communities and the crisis management crew, as well as the system’s ability to efficiently transmit information to the village alarms.

Most importantly, the success of the described strategy depends mainly on human and organizational qualities. It is thus once again the responsibility of the top management authority to sustain these qualities by nourishing a safety-based organizational culture that includes what Weick et al. (1998) call a “commitment to resilience”, which involves formal support for improvisation in crisis situations and the active building of the organization’s cognitive resources. Yet this critical aspect can only be achieved through an adapted training and selection policy.

2. Population Evacuation Crisis Management

The strategy suggested in the following section is based on the strategies suggested by or submitted to the experts (Mr. Balsara and Mr. Droz). It consists of achieving the goal of evacuating the villages at risk in time through a triple barrier strategy. The first barrier is a spontaneous evacuation of the population upon observation of natural early warning signs. If the villagers fail to evacuate, the second barrier is the siren alarm from the engineered warning system. If the villagers still fail to evacuate due to a system malfunction, the last barrier is a reported alarm via radio communication from a neighboring village.
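The ordering of the three barriers can be made explicit with a short sketch. It only encodes the sequence described above (each barrier is relied upon if the previous one has not triggered the evacuation); the class, function and parameter names are hypothetical, and the boolean inputs stand for outcomes that would in practice be uncertain.

```python
from enum import Enum
from typing import Optional


class Barrier(Enum):
    SPONTANEOUS = "spontaneous evacuation on natural early warning signs"
    SIREN = "siren alarm from the engineered warning system"
    REPORTED = "reported alarm by radio from a neighboring village"


def first_effective_barrier(
    signs_acted_on: bool,
    siren_works: bool,
    neighbor_reports: bool,
) -> Optional[Barrier]:
    """Return the first barrier that triggers the evacuation, or None if all fail."""
    if signs_acted_on:
        return Barrier.SPONTANEOUS
    if siren_works:
        return Barrier.SIREN
    if neighbor_reports:
        return Barrier.REPORTED
    return None


# Example: the villagers do not react to the earthquake and the local siren is
# out of order, but a neighboring village relays its own alarm by radio.
outcome = first_effective_barrier(False, False, True)
print(outcome.value if outcome else "no barrier triggered the evacuation")
```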

2.1. Spontaneous evacuation

Despite the presence of an engineered early warning system, I argue that the success of the evacuation plan must primarily be based on the evacuees themselves, for the following reasons. First, I believe that their own security must ultimately rely on themselves, rather than on an unfamiliar and complex engineered system that is far from infallible, as the experts’ assessments have shown. Secondly, I believe that the extremely valuable capital of local knowledge among the villagers must be used to mitigate their risk situation. Finally, the possibility that the villagers would spontaneously evacuate towards the mountains at the occurrence of a major natural hazard has been considered by both Mr. Droz and Mr. Balsara. Therefore, the training and formalization of such a process through the OODA loop described in section II.B.2.3 is likely to increase its effectiveness. Each step of this loop is considered in the following paragraph, with specific implementation recommendations that were issued in collaboration with Mr. Balsara.

- Observation:

The local capacity to observe and notice specific early warning signs may be increased through training. According to Mr. Balsara, it is important that the training be adapted to the multiple layers of the population, including an awareness campaign for children in schools. Yet, most importantly, it must keep the people alert to natural signs by helping them avoid the pitfalls of overconfidence (i.e. being “used to the danger”) and over-reliance on the technology alone.

- Orientation:

The level of awareness of the Community Support Teams (CST) and their ability to interpret and react to early natural signs must be ensured through frequent checks and testing by an outside implementing agency (e.g. FOCUS Humanitarian or the Usoi Department).

- Decision:

The capacity of the CST to decide on an evacuation in a crisis situation must be tested and improved through frequent and targeted drills and training.

- Action:

Finally, a formalized protocol must be established to alert the whole village in case of an alarm malfunction. Such a protocol may involve the use of traditional alarm techniques such as a village gong.

2.2. Village alarm system

The second barrier relies on the capability of the early

warning system to detect the danger and trigger the village alarm systems. This system and the linked

crisis management strategies and OODA loop have been described and detailed in sections II.B.1 and

IV.B.1.

2.3. Reported alarms

Finally, should the first two evacuation strategies fail, the ultimate crisis management strategy on which to rely is the collaboration among the villagers in a crisis situation (Balsara 2008). This solution was investigated in section II.B.2.3. The two principal elements that were found to potentially increase the likelihood of success of the strategy are summarized here:

- The reflex to warn the neighboring villagers upon reception of a level 3 evacuation alarm must be formalized and specifically trained. Moreover, villages must be provided with efficient and adapted means of transmitting alarms between villages. Such means may include radio communications, but also more traditional devices (e.g. visual signals) in the event of a radio malfunction.

- Finally, responsibility for maintaining the elements of the early warning system that are within the village must be given to the villagers. In addition to decreasing the correlation between the village alarm systems, this would perhaps increase the maintenance level of the device, given the villagers’ obvious motivation. Finally, it would help build and value local capacity.

C. Training: Empowering the human factor

Maintaining awareness, sensemaking, empathy, improvisation, mental simulation and coordination

capabilities among the human components of the system requires the active sustaining of the

organization’s cognitive resources, which must involve training and selection. Moreover, both

consulted experts have mentioned training and proper staff selection as the key measures needed to sustain

the long-term reliability of the system. Therefore, although the consulted experts have expressed no concerns about the current preparedness level of both the operators and the local population, the following section addresses the issue of training, since “the cognitive skills developed for crisis management degrade rapidly if they are not maintained and used” (Bea 2008). According to Bea (2008), training should occur on three levels:

1. Normal Situations

First, the operators should be trained in the system’s normal operation so that all the commonly performed tasks fall within their skill-based cognitive realm. In other words, the result of the training must be a skill-based performance of the routine tasks that does not mobilize excessive cognitive resources. In the case of Lake Sarez, training for normal operations involves the proper maintenance of the hardware and infrastructure parts of the system, including the critical maintenance operations on the flood sensors and the village alarm stations.

2. Abnormal Situations

Secondly, operators should be trained to handle abnormal situations, in which known but unexpected threatening events are imposed. Such training enables the prescribed restoration procedure to be performed on a rule-based basis, rather than on a slower knowledge basis. As a result, the OODA loop reasoning, which would occur in the management of an untrained situation, could be cut short to bypass the orientation and decision stages.³ In the case of the early warning system, such a trained rule would include the direct launch of a testing procedure on MU7 if a level-one flood alarm were given solely by MU9. Furthermore, the local population can be trained toward rule-based reasoning through multiple evacuation exercises and a thorough knowledge of the evacuation procedure.
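The difference between rule-based and knowledge-based handling can be sketched as a lookup that maps an observed set of symptoms directly to a prescribed procedure, falling back to the full OODA loop when no trained rule applies. The only rule listed is the MU7/MU9 example given above; the symptom identifiers, the table and the function name are hypothetical.

```python
from typing import Iterable, Tuple

# Trained rules: observed symptoms -> prescribed procedure.
# Only the example from the text is included; the identifiers are illustrative.
TRAINED_RULES = {
    frozenset({"MU9_LEVEL_1_ALARM"}): "launch testing procedure on MU7",
}


def respond(symptoms: Iterable[str]) -> Tuple[str, str]:
    """Return (reasoning mode, action) for the observed symptoms.

    A matching trained rule bypasses the orientation and decision stages;
    otherwise the slower knowledge-based OODA loop has to be run in full.
    """
    procedure = TRAINED_RULES.get(frozenset(symptoms))
    if procedure is not None:
        return ("rule-based", procedure)
    return ("knowledge-based", "run the full OODA loop")


# A level-one alarm given solely by MU9 triggers the trained rule.
print(respond(["MU9_LEVEL_1_ALARM"]))
# Any combination that has not been trained falls back to the full loop.
print(respond(["MU9_LEVEL_1_ALARM", "MU7_LEVEL_1_ALARM"]))
```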

³ The procedure to adopt is directly linked to the observed symptoms by a trained rule.

3. Unbelievable situations

Finally, training for extraordinary “unknown unknowable” situations is crucial to sharpen people’s cognitive skills so that they react properly when real-time crisis management is needed. Indeed, being exposed to unexpected and unforeseeable situations is the best way to train the cognitive qualities that are mentioned above as critical to accelerating the OODA loop sequence. In the case

of the early warning system, such training may for example focus on the need to manually trigger the evacuation alarm, in a situation of high cognitive stress, as soon as a major irreversible malfunction is detected and confirmed (such as the malfunction of MU7 in the example given above). Furthermore, training for unexpected situations should also be applied to the local population, through unannounced evacuation drills and by building the reflex to flee towards the safety haven at any serious suspicion of a threatening event. The National Fire Protection Association (NFPA) Life Safety Code (2001) states that “Fire is always unexpected. If the drill is

always held in the same way at the same time, it loses much of its value. When, for some reason during an actual fire, it is not possible to follow the usual routine of the emergency egress and relocation drill to which occupants have become accustomed, confusion and panic might ensue. Drills should be carefully planned to simulate actual fire conditions.” Due to their unexpected nature and the necessity to evacuate the population in a limited time, crises on the LSRMP must be managed in a similar manner. Yet, as mentioned in a previous section, the Usoi Department applies the policy of not conducting any unannounced evacuation drill with the local population, precisely to avoid the panic and emotion that would sharpen the cognitive capabilities of the people.

In addition to the crucial importance of training, a proper staff selection process is essential to ensure that the key powers and responsibilities are attributed to the right persons, that is, people possessing the skills and cognitive potential needed to be efficient in a crisis situation. Yet such a selection can only be undertaken if the proper incentives are put in place to attract and retain sufficiently talented and trained staff.

V. Conclusion

A qualitative assessment of the reliability of the Lake Sarez Risk Mitigation Project (LSRMP) and its interactive management has been conducted on the basis of expert judgment. As a result, several potential areas for improvement have been brought to light and recommendations issued. Such recommendations include the need to strive towards High Reliability Organization (HRO) standards through a safety oriented organizational culture, improved communication and decision making migration; the need to improve interactive crisis management skills such as maintaining awareness, sensemaking, empathy, improvisation, mental simulation and coordination capabilities; and the key importance of training in normal, abnormal and extraordinary conditions.

The qualitative assessment was deliberately limited to bringing these potential areas for improvement to light; a quantitative analysis of the situation would be the logical next step in documenting the reliability of the LSRMP in a crisis situation. However, more theoretical and methodological research would then be required, notably on ways to quantify the effect of cognitive and environmental factors (such as surprise and night time) on the evacuation time of the villages.

Furthermore, it is evident that the suggested measures will only be effectively implemented if the proper framework of incentives and means is provided. Indeed, incentives are needed at the operator level to prevent the exodus of skilled staff and thus ensure the durability of the system through proper maintenance, as well as sustained cognitive skills and local knowledge. Incentives are also needed at the management level to effectively promote a safety oriented organizational culture and lead the organization towards becoming an HRO. Finally, incentives are needed at the government level to provide the project with sufficient means to achieve its goals once international funding has ceased. Although a variety of means can be thought of to create incentives based on human wants and needs, including regulation and peer recognition, the incentives that have been recognized as most efficient (Bea 2008) in implementing reliability measures over the long term are positive (i.e. rewards rather than punishment) and financial, and are often rooted in a societal increase of safety and reliability requirements. Therefore, the most efficient incentives to provide long-term funding sustainability to the LSRMP are likely to be found in the realms of international commerce and politics, and public opinion. Thus, a proper financial, social and economic study of the long-term sustainability of this endeavor is yet to be undertaken.

In conclusion, the main principle to bear in mind at the end of this paper is the importance of not relying on technology alone when managing a risk mitigation system. Technology can fail, especially in the given context of harsh environmental conditions and decreasing maintenance quality. As a matter of fact, technology will fail, and by failing will create a crisis if no robust measure is taken. Moreover, the engineered alarm system shows in the present case a very limited excess capacity of a few minutes at most, while the first signs of the phenomenon (i.e. the earthquake) would occur, and could thus be interpreted, with a time margin of several hours. Therefore, although the engineered early warning system is clearly necessary here to ensure the warning of the entire valley with an acceptable probability in the event of a flash flood, the potential effect of grass-roots capacity building in the communities, through intensive training and education and based on their extensive knowledge of the location, must not be neglected, and may prove a complementary and much more efficient means to mitigate the risk in the Bartang valley.

Finally, and more fundamentally, it is most important to acknowledge the impossibility of entirely controlling the risk linked to a natural phenomenon of the magnitude of Lake Sarez; and perhaps someday, in the dreaded aftermath of a catastrophe, to have the humility to consider what is perhaps the only reasonable reactive approach to a near miss, given the amplitude of the consequences: withdrawing, migrating and resettling, and ultimately leaving to Mother Nature the conclusion of the story.

VI. Acknowledgement

The author is indebted to the consulted experts, Mr. Patrice Droz (Stucky Ltd, Switzerland) and Mr. Rahim Balsara (FOCUS Humanitarian, Tajikistan), for their priceless contribution. Furthermore, Ms. Michèle Itten, Mr. Mustafa Karim, Dr. Robert Bea and Mr. Rune Storesund are acknowledged for their precious advice.

VII. References

[1] Aboelata M et al, “Transportation model for evacuation in estimating dam failure life loss”, Proceedings of the Australian Committee on Large Dams Conference, 2004
[2] Balsara R, Personal Correspondence, April 2008
[3] Bea R, “Human and Organizational Factors: Quality and Reliability of Engineered Systems”, Course Reader, 2008
[4] Bea R, “Managing the Unpredictable”, ASME feature article, 2008
[5] Colonna G, “Introduction to Employee Fire and Life Safety”, National Fire Protection Association, 2001
[6] Droz P, “Abstracts Related to Risk Mitigation of the LSRMP Final Report”, Stucky Ltd, 2007
[7] Droz P, Spasic-Gril L, “Lake Sarez Risk Mitigation Project: A Global Risk Analysis”, ICOLD, 2006
[8] Droz P, Personal Correspondence, February - April 2008
[9] Duclos P et al, “Community evacuation following a chlorine release, Mississippi”, Disasters 11 (4), 1987
[10] El-Hanbali U, “Implementation Completion and Results Report […] for Lake Sarez Risk Mitigation Project”, World Bank, 2007
[11] “Flood on Stava valley”, Seconds from Disaster, National Geographic Channel, 2008
[12] Gayosov A, Personal Correspondence, April 2008
[13] Genevois R, Ghirotti M, “The 1963 Vaiont Landslide”, Giornale di Geologia Applicata, 2005
[14] Graham W, “A Simple Procedure for Estimating Loss of Life from Dam Failure”, Dam Safety Office, US Bureau of Reclamation, 2001
[15] Karim M, Personal Correspondence, April 2008
[16] Merry M, “Assessing the Safety Culture of an Organization”, J. Safety and Reliability Society, 1998
[17] Life Safety Code, National Fire Protection Association, 2001
[18] Palmieri A, “Project Appraisal Document […] for Lake Sarez Risk Mitigation Project”, World Bank, 2000

[19] Palmieri A, “UN/IDNDR Interagency Risk Assessment Mission to Lake Sarez”, UN-OCHA, 1999
[20] Papyrin L, “Myths on Lake Sarez risk mitigation and realities”, Ferghana information agency, 2007
[21] Risley et al, “Usoi Dam Wave Overtopping and Flood Routing in the Bartang and Panj Rivers, Tajikistan”, USAid Water Resource Investigation Report, 2006
[22] Roberts K, “New Challenges in Organizational Research: High Reliability Organizations”, Industrial Crisis Quarterly, 1989
[23] Roberts K and Libuser C, “From Bhopal to Banking, Organizational Design Can Mitigate Risk”, Organizational Dynamics, 1993
[24] Roberts K, “Some Characteristics of High Reliability Organizations”, Organization Science, 1990
[25] Schuster R, Alford D, “Usoi Landslide Dam and Lake Sarez, Pamir Mountains, Tajikistan”, Environmental Engineering and Geoscience, 2004
[26] Sime J, “Crowd Psychology and Engineering”, Safety Science, 1995
[27] “Tajikistan Lake Sarez”, Fela Planungs AG Website, 2004 (http://www.fela.ch) (02/25/08)
[28] Urbanik II T, “Evacuation time estimates for nuclear power plants”, Journal of Hazardous Materials, 2000
[29] Weick K, “Sensemaking in Organizations”, Thousand Oaks, CA: Sage, 1995
[30] Weick K et al, “Organizing for High Reliability: Processes of Collective Mindfulness”, Research in Organizational Behavior, 1998.