
HotSpots: Failure Cascades on Heterogeneous Critical Infrastructure Networks

Liangzhe Chen∗, Xinfeng Xu∗,◦, Sangkeun Lee+, Sisi Duan+, Alfonso G. Tarditi+, Supriya Chinthavali+, B. Aditya Prakash∗

∗Department of Computer Science, Virginia Tech; ◦Department of Physics, Virginia Tech; +Oak Ridge National Laboratory

{liangzhe, badityap}@cs.vt.edu, [email protected], {lees4, duans, tarditiag, chinthavalis}@ornl.gov

ABSTRACT

Critical Infrastructure Systems such as transportation, water, and power grid systems are vital to our national security, economy, and public safety. Recent events, like Hurricane Sandy in 2012, show how the interdependencies among different CI networks lead to catastrophic failures across the whole system. Hence, analyzing these CI networks and modeling failure cascades on them is a very important problem.

However, traditional models either do not take multiple CIs or the dynamics of the system into account, or model it simplistically. In this paper, we study this problem using a heterogeneous network viewpoint. We first construct heterogeneous CI networks with multiple components using national-level datasets. Then we study novel failure maximization problems on these networks, to compute critical nodes in such systems. We then provide HotSpots, a scalable and effective algorithm for these problems, based on careful transformations. Finally, we conduct extensive experiments on real CIS data from multiple US states, and show that our method HotSpots outperforms non-trivial baselines, gives meaningful results, and that our approach gives immediate benefits in providing situational awareness during large-scale failures.

1 INTRODUCTION

Modern critical infrastructures (CIs) such as Energy, Water, Communication, etc. are mutually dependent in complex ways. For example, the energy network depends on the water network for treatment, dissemination, and disposition, and the water network relies on the energy network for energy production [28]. Indeed, such dependencies exist across multiple CIs, where hazards/failures affecting one CI network can potentially propagate to other networks and disrupt the functionality of the entire system.

The 2003 Northeastern US blackout [5] is a perfect example depicting the risk, where a fault in 3 transmission lines caused a massive blackout impacting multiple CIs. Initially, the massive power outage caused blackouts across several U.S. states. The effects of the outage then cascaded to drinking and waste-water treatment systems, communication, transportation, and a number of major business and food services. Nearly 50 million people were affected, causing huge economic losses exceeding $5 billion. About every 4 months in the US, a major blackout occurs, affecting one million or more people [24]. This increased vulnerability of a single CI network can be easily amplified due to the interdependencies. Similar examples abound, such as Hurricane Sandy, where cascading and escalating failures caused severe impacts to the infrastructures and slowed down the recovery tremendously [10]. Hence, modeling and simulation of interdependent CI systems (CISs) has become an important field to realize the idea of 'smart cities and nations'.

This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan). © ACM.

Since the 2003 NE blackout, FEMA (Federal Emergency Management Agency), other federal agencies such as DOE-OE and DoD, and several national labs have been working constantly towards improving wide-area situational awareness and developing sophisticated decision support tools that can help in predicting propagating impacts due to extreme events like hurricanes, wildfires, etc. [2]. For example, Oak Ridge National Lab (ORNL) developed detailed hurricane outage models to predict which substations will be impacted, and what the downstream implications of these failures are, even before a hurricane makes landfall [8]. Such situational awareness can assist mitigation planning with the available resources, and support the construction of more resilient systems for the future.

To this end, identifying the critical and vulnerable nodes and links of these interconnected networks that can cause maximum damage is very important, so that the system reliability and efficiency can be improved by monitoring and protecting them [22, 26]. For example, based on topological analysis of the NE American grid, it was identified that nodes with a high degree are more important than others [6]. Centrality indexes based on betweenness were defined by several researchers to analyze the vulnerability of power systems [19]. Similarly, from a data-mining viewpoint, a few recent studies have tried to model failures in infrastructure networks [13, 14]. However, all these methods either work on single CIs, or do not take into account any dynamics of the system, or model it very simplistically (like considering just one-step failures).

In this paper, a collaboration between computer scientists and power engineers, we broadly approach this problem using heterogeneous networks. We unify various CI systems by first constructing a 5-component heterogeneous network that combines various individual CIs. Looking at the system as a heterogeneous network gives many advantages. We then study the problem of finding the k critical nodes (the 'hot spots') in the power grid network, the failure of which would cause the maximum failures across the CI networks. We design a novel and effective way of predicting non-linear interactions among the nodes, the result of which may not be visible through a static structural analysis.

Our contributions are:

(1) We construct heterogeneous networks from real CI datasets and develop a novel tractable cascade model F-Cas.

(2) We develop an efficient and effective algorithm HotSpots to find the most critical nodes in the CIS using F-Cas, whose failure can cause maximum damage.

(3) Through our extensive experiments and case studies on unique large real datasets at ORNL, we show that our approach has immediate benefits to CI analysis, that HotSpots outperforms non-trivial baselines and effectively finds the vulnerable nodes, and that the results from HotSpots help with real-world situations.

2 OUR FORMULATION AND SETUP

Here, we first introduce the five CI components we consider in this work and how we interlink them into a heterogeneous network G. Second, we propose a failure cascade model F-Cas on G that captures the interdependencies between different components in G. Finally, we formally define our problems.

Preliminary: IC Model. Given a weighted directed network, the popular Independent Cascade (IC) model [20] describes the spread of a contagion (idea, influence, etc.) over it. Once infected/activated, each node gets one chance to activate each of its neighbors in the next time step, with probability equal to the connecting edge weight. The cascading process starts with an initial active 'seed' set, and ends when there is no new activation.
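To make the IC preliminary concrete, the following is a minimal Python sketch of a single IC simulation. The function name `independent_cascade`, the adjacency-list format, and the toy graph are assumptions chosen for illustration; they are not taken from the paper or from any particular library.

```python
import random

def independent_cascade(edges, seeds, rng=None):
    """Run one Independent Cascade simulation.

    edges: dict mapping node -> list of (neighbor, probability) pairs.
    seeds: iterable of initially active nodes.
    Returns the set of all nodes that are active when the cascade stops.
    """
    rng = rng or random.Random()
    active = set(seeds)
    frontier = list(active)  # nodes activated in the previous time step
    while frontier:
        next_frontier = []
        for u in frontier:
            # A newly active node gets exactly one attempt per outgoing edge.
            for v, p in edges.get(u, []):
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier  # the cascade ends when this is empty
    return active

# Hypothetical toy graph with deterministic weights (1.0 always fires, 0.0 never).
toy_edges = {"a": [("b", 1.0), ("c", 0.0)], "b": [("d", 1.0)], "c": [("e", 1.0)]}
result = independent_cascade(toy_edges, {"a"}, random.Random(0))
```

In the toy graph, the cascade from `a` reaches `b` and `d`, while `c` and `e` stay inactive because the edge into `c` has weight 0. Averaging the size of `result` over many runs with non-trivial weights gives an empirical estimate of the expected spread.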

2.1 Network Construction

Interpreting and converting a set of GIS (geographic information system) datasets into a heterogeneous network is a crucial but challenging task. With no systematic tools available, researchers need to understand geographical objects and their relationships carefully, and write conversion scripts for every different analytic purpose. To avoid such ad hoc proprietary data processing, we develop and utilize a generic reusable urbannet-toolkit to systematically construct CI heterogeneous networks for our analysis.

The urbannet-toolkit we design contains four components. shp2csv converts shapefile data (a prevalent data format used in GIS which is not suitable for network analysis) to csv files; csv2net constructs node lists or edge lists from different shapes (POINT, MULTIPOLYGON, MULTILINESTRING); if needed, net-simplifier simplifies the networks by removing redundant nodes that are used to depict the shape contour; and finally, net-linker interconnects different CI networks based on user-specific criteria.

Specifically, we select five important components from the HSIP Gold data [1] and EIA data [3], covering the power system and the natural gas system (as shown in Tab. 1). Among them, power plants, substations, and natural gas compressors are facilities without connections among themselves. For example, a power plant does not directly connect to another power plant; the connections are through other types of facilities. In the transmission network and the pipeline network, in contrast, each node represents a connection point between two transmission lines or pipelines, and the link between two nodes represents the actual transmission line or pipeline. These components contain a natural support chain, where the power plants use natural gas as fuel to generate electric power, the transmission nodes deliver the power to substations, the substations distribute power to natural gas compressors, and finally natural gas compressors help deliver natural gas to power plants through the pipelines. Note that there are different types of power plants which use different fuels; here we only consider those which use natural gas as fuel.

| Infrastructure Type | Node Type | Description |
|---|---|---|
| Power | Electrical power plants (g) | Generate electrical power which is transmitted to substations through the transmission network. |
| Power | Transmission nodes (t) | Move electrical power from power plants to substations. |
| Power | Electrical substations (s) | Transform voltage and distribute electrical power to consumers. |
| Natural gas | Natural gas compressors (c) | Increase the pressure of a gas to transport it through pipelines. |
| Natural gas | Pipelines (p) | Transport natural gas to consumers. |

Table 1: Summary of the five components in G.

To realize such a support chain in the system, we create interlinks between different components in the following ways.

Substations are connected to the nearest transmission node since they get their electrical supply from it. Each substation is also connected to the natural gas compressors within its service area to capture the fact that it provides power to these local facilities (service areas are non-overlapping, so each natural gas compressor is connected to only one substation).

Power plants are connected to the nearest natural gas pipeline and transmission node, since they get fuel from the pipelines and output power through the transmission network.

Natural gas compressors are connected to the nearest pipeline to capture the fact that the flow and the pressure of the natural gas along these pipelines depend on the compressors.
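The interlinking rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it is not the net-linker tool, the helper names `nearest` and `build_interlinks` are hypothetical, coordinates are treated as planar points with Euclidean distance (real GIS data would need a geodesic distance), service areas are given as a precomputed compressor-to-substation map, and edges are oriented from supplier to consumer.

```python
import math

def nearest(point, candidates):
    """Return the id of the candidate whose coordinates are closest to `point`."""
    return min(candidates, key=lambda cid: math.dist(point, candidates[cid]))

def build_interlinks(plants, substations, compressors, trans_nodes, pipe_nodes, service_area):
    """Create directed (supplier, consumer) interlinks between components.

    All arguments except `service_area` map node id -> (x, y) coordinates.
    service_area maps compressor id -> the substation whose area contains it.
    """
    edges = []
    for s, xy in substations.items():
        edges.append((nearest(xy, trans_nodes), s))   # transmission node -> substation
    for c, xy in compressors.items():
        edges.append((service_area[c], c))            # substation -> compressor
        edges.append((c, nearest(xy, pipe_nodes)))    # compressor -> pipeline
    for g, xy in plants.items():
        edges.append((nearest(xy, pipe_nodes), g))    # pipeline -> power plant (fuel)
        edges.append((g, nearest(xy, trans_nodes)))   # power plant -> transmission node
    return edges

# Hypothetical toy coordinates.
links = build_interlinks(
    plants={"g1": (0, 0)},
    substations={"s1": (4, 5)},
    compressors={"c1": (4, 4)},
    trans_nodes={"t1": (0, 1), "t2": (5, 5)},
    pipe_nodes={"p1": (1, 0), "p2": (4, 3)},
    service_area={"c1": "s1"},
)
```

The linear scan in `nearest` is adequate for a toy example; for national-level datasets a spatial index would be the natural replacement.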

We summarize the network structure in Fig. 1. Our final directed heterogeneous graph is $G(V, E)$, where $V = \{V_g, V_t, V_s, V_c, V_p\}$ contains all nodes in the five CI components, and $E = \{E_t, E_p, E_{inter}\}$ contains the edges in the transmission network, the pipeline network, and all the interlinks we created above. The directions of edges are either indicated by the data themselves, or by the feed-supply relation we introduce for creating the interlinks.

2.2 Failure Cascade Model F-Cas

The CI network system is vulnerable to potential failure cascades, as localized failures may get amplified to system-wide levels. For example, the failure of nodes in the transmission network would force re-routing of the power flow and cause overload, or near-capacity operations of other transmission lines, with an increased probability of failure of additional nodes. With widespread, scattered power interruptions, natural gas compressors may eventually lose power, thus affecting gas-fueled power plants, further reducing the overall generation available to support the load.

[Figure 1: Interconnections and structure of G. Components shown: Power Plants, Transmission Net, Substations, Gas Compressors, Gas Pipeline; annotation: loop of failure cascade.]

Such failures are very complicated to model: state-of-the-art supercomputers utilizing sophisticated parallelization algorithms require tens of days to perform accurate simulations just for the power system (footnote 1). Such simulations are intractable for analysis and also hard to interface with other tools. Hence we propose a failure cascade model F-Cas which makes simplified yet sufficiently realistic assumptions. The goal and the challenge is to capture the most important unique dynamics quickly, to estimate failure cascades, and then extract critical nodes. See Alg. 1. Overall, we first simulate the failure cascade caused by overloads within the transmission network. Based on such failures, we check, for each other type of node, whether it would fail, and repeat until there are no new failures (this process may continue for several iterations or 'loops' until convergence). Next, we define the failure conditions for each CI component in F-Cas.

Substations fail when they have no path in the transmission network to an active power plant, due to the lack of power.

Natural gas compressors fail when their associated substation (from which they draw power) fails.

Power plants fail when any of the natural gas compressors they connect to fails, due to the lack of fuel.

Pipelines serve as the connection between natural gas compressors and the power plants. They do not depend on other facilities, and hence we assume they would not fail during the failure cascade. In reality, if some pipelines are damaged by a natural disaster, they can be removed from our analysis in advance.

Transmission nodes may fail due to any overload resulting from power re-routing in the transmission network caused by the failure of other transmission nodes (the power has to go through other routes, which increases the probability of an overload of other transmission nodes). To capture such failure cascades due to overloading, we propose two Independent Cascade (IC) style models.

Trans-naive: When a transmission node fails, its children in the transmission network (the nodes that consume power from it) would have to gain power from other nodes. This increases the chance of overloading other nodes. So in this model, we simply assume that when a node fails, its 'co-parents' (nodes that share a common child node) have a certain probability of overloading. Using this assumption, we first identify the co-parent nodes according to the transmission network, and then create a new co-parent network (where two transmission nodes are connected if they are co-parents in the original network). The edge weights are:

$$e_{ij} = \begin{cases} c & \text{if } t_i \text{ and } t_j \text{ share a child} \\ 0 & \text{otherwise} \end{cases} \tag{1}$$

where c is a constant probability/weight, which represents the probability that a failed node causes its co-parent to fail. The failure cascade process in the transmission network can be thought of as an IC model on the co-parent transmission network.

Footnote 1: http://www.nrel.gov/continuum/analysis/ergis.html

Algorithm 1: A simulation process for F-Cas
Input: G, a seed node set S, Trans-real (or Trans-naive)
Output: The set of failed nodes W
1: W = S
2: Run the Trans-real/Trans-naive simulation to find the set of transmission nodes that fail (add them to W).
3: while |W| is increasing do
4:   for each substation s ∉ W, compressor c ∉ W, and power plant g ∉ W do
5:     Check if it fails according to its failure condition (see Sec. 2.2)
6:     Add the corresponding node to W if it fails.
7: Return W.
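As an illustration of the loop in Alg. 1 (steps 3 to 6), the following Python sketch applies the failure conditions of Sec. 2.2 until no new node fails. It is a simplified sketch under assumptions: the set of failed transmission nodes (step 2) is passed in as input rather than simulated, pipelines are assumed never to fail, and the names `reachable`, `f_cas_loop`, `power_succ`, `sub_of`, and `fuel_of` are hypothetical.

```python
from collections import deque

def reachable(sources, succ, failed):
    """Nodes reachable from the non-failed sources without entering failed nodes."""
    seen = {s for s in sources if s not in failed}
    queue = deque(seen)
    while queue:
        u = queue.popleft()
        for v in succ.get(u, []):
            if v not in failed and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

def f_cas_loop(failed_trans, plants, substations, power_succ, sub_of, fuel_of):
    """Iterate the non-transmission failure conditions until convergence.

    power_succ: power-flow successors (plant -> transmission -> substation).
    sub_of: compressor -> the substation that powers it.
    fuel_of: power plant -> list of compressors that fuel it.
    """
    failed = set(failed_trans)
    while True:
        before = len(failed)
        live = reachable(plants, power_succ, failed)
        # Substations fail when no active power plant can reach them.
        failed |= {s for s in substations if s not in live}
        # Compressors fail when their associated substation fails.
        failed |= {c for c, s in sub_of.items() if s in failed}
        # Power plants fail when any compressor fueling them fails.
        failed |= {g for g in plants if any(c in failed for c in fuel_of.get(g, []))}
        if len(failed) == before:  # |W| stopped increasing
            return failed

# Hypothetical toy system: t1 fails, which eventually takes down s2 via the gas loop.
W = f_cas_loop(
    failed_trans={"t1"},
    plants={"g1", "g2"},
    substations={"s1", "s2"},
    power_succ={"g1": ["t1"], "t1": ["s1"], "g2": ["t2"], "t2": ["s2"]},
    sub_of={"c1": "s1"},
    fuel_of={"g2": ["c1"]},
)
```

In the toy system the failure of `t1` cuts off `s1`, which disables compressor `c1`, which starves plant `g2`, which in turn cuts off `s2`: one full pass around the loop of Fig. 1.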

Trans-real: We make a more realistic assumption here: the probability of a transmission node's failure is higher when the loading operating condition (e.g. power transferred) is closer to an established engineering limit (here referred to as the node power capacity) [29]. Based on this assumption, we design the following edge weights for all edges in the transmission network, representing the probability of node $t_j$'s failure given the failure of its parent node $t_i$:

$$e_{ij} = \frac{\sum_{x \in Cs(Par(t_j) \setminus t_i)} Load(x)}{\sum_{x \in Par(t_j) \setminus t_i} Capacity(x)} \tag{2}$$

where $Par(t_j)$ is the set of parents of $t_j$, and $Cs(Par(\cdot))$ is the union of the child nodes of the parent nodes in $Par(\cdot)$. Note that unlike Trans-naive, the above edge weights are defined directly on the edges in the transmission network instead of on an additional co-parent network. Basically, we look at the parents of $t_j$ (except $t_i$), and calculate the ratio of the total load from their children to the capacity the parents have. If the ratio is closer to 1, the operating loads of the parents are closer to their engineering limits, and hence it is more likely that they fail to provide enough power to $t_j$.

Novelty: Note that our final failure cascade model F-Cas is a combination of all the different types of failures mentioned above, which cannot be simply represented by popular cascade-style models in the standard literature [15, 20]. For example, consider the path-based failure condition of a substation. In typical cascade models such as IC, failures are passed from one failed neighbor to another (through the connecting edge). However, in F-Cas, a substation may fail due to a non-local failure of some transmission node which may be far away. Hence the typical influence-analysis algorithms cannot be directly applied to our problem, and novel techniques are needed.
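A small worked example may help in reading Eq. 2. The Python sketch below computes the Trans-real weight for one edge of a toy network. The function name `trans_real_weight` and the data layout are hypothetical, and two behaviors are assumptions that the text does not specify: the weight is set to 1 when $t_j$ has no parent other than $t_i$ (the ratio would otherwise be undefined), and the ratio is clipped to 1 so that it can be used as a probability.

```python
def trans_real_weight(ti, tj, parents, children, load, capacity):
    """Edge weight e_ij of Eq. 2: load served by the other parents of tj
    divided by the total capacity of those parents."""
    other_parents = parents[tj] - {ti}
    if not other_parents:
        return 1.0  # assumption: no alternative supplier is left for tj
    # Cs(Par(tj) \ ti): union of the children of the remaining parents.
    fed = set().union(*(children[p] for p in other_parents))
    ratio = sum(load[x] for x in fed) / sum(capacity[p] for p in other_parents)
    return min(1.0, ratio)  # assumption: clip so the weight is a valid probability

# Hypothetical toy network: t3 has parents t1 and t2; t2 also feeds t4.
parents = {"t3": {"t1", "t2"}}
children = {"t1": {"t3"}, "t2": {"t3", "t4"}}
load = {"t3": 2.0, "t4": 1.0}
capacity = {"t1": 5.0, "t2": 4.0}

w = trans_real_weight("t1", "t3", parents, children, load, capacity)
```

Here, if `t1` fails, the only remaining parent `t2` must serve a total load of 2.0 + 1.0 = 3.0 with a capacity of 4.0, so the weight is 0.75.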

2.3 Problem Definitions

As mentioned in the introduction, identifying critical nodes that can cause maximum damage is important for improving the system reliability and efficiency. Hence, given the cascade model, we formally define the following failure maximization problems to find such critical nodes.

Problem 1 (Max-Sub)

Given the heterogeneous network G, the failure cascade model F-Cas, and k.

Find the best set $S^*$ of k transmission nodes to fail, such that the expected number of final failed substations is maximized, i.e.

$$S^* = \arg\max_S E[\#s \mid S] \tag{3}$$

where $\#s$ represents the number of substations that would eventually fail given the initial failure of S. Note that in Max-Sub, we select nodes based on the failure of substations because the loss/failure of an electrical substation directly leads to the power blackout of a region. In Max-SubBus, defined below, we further extend Max-Sub by adding another component to our target.

Problem 2 (Max-SubBus)

Given the heterogeneous network G, the failure cascade model F-Cas, and k.

Find the best set $S^*$ of k transmission nodes to fail, such that the expected number of final failed substations and transmission bus nodes is maximized, i.e.

$$S^* = \arg\max_S E[\#s + \#t \mid S] \tag{4}$$

As in Max-Sub, $\#t$ represents the number of transmission nodes that would eventually fail given the initial failure of S.

3 OUR METHODS

The challenges of solving Max-Sub and Max-SubBus are two-fold. First, as described before, the failure does not necessarily cascade locally from one node to its neighbor. For example, we need to check the entire transmission network to decide if a substation fails or not, which is a very expensive operation. Second, failures loop through different components before convergence, making it harder to analyze. In fact, we prove that both Max-Sub and Max-SubBus are NP-hard.

Lemma 3.1. Max-Sub and Max-SubBus are NP-hard.

This lemma can be proved by a reduction from the Set Cover problem (detailed proofs for all our lemmas are omitted for space; see footnote 2). Because of the difficulty of the problems, we first attempt to solve them in a simplified scenario where the failure cascade does not form a loop. In such a scenario, instead of using $G$, we use $G'$, which is a subgraph of $G$ that only contains three components: power plants, transmission bus nodes, and substations. When only considering these three components, the failure of a substation cannot further induce failure of power plants, i.e. the failure spreads in one direction without forming any loop. In the following, we solve Max-Sub and Max-SubBus under such a simplified scenario first, and then propose our algorithms for the original problems where the failure cascade forms a loop.

Footnote 2: Detailed proofs and additional results are in the appendix: https://goo.gl/QiTJMQ

3.1 Max-Sub and Max-SubBus without loop

In this section, we consider $G'$ with only three CI components, where the failure cascades in one direction from the transmission network to the substations.

In both Max-Sub and Max-SubBus, we need to optimize the expected number of failed substations, which can be written as

$$E[\#s \mid S] = \sum_{s_i} \Pr(s_i \mid S) \tag{5}$$

where $\Pr(s_i \mid S)$ represents the probability of $s_i$'s failure given the node set $S$ which initially fails. By summing the failure probabilities over all substations, we get the expected number of failed substations. The failure probability $\Pr(s_i \mid S)$ basically represents the probability that $s_i$ does not have a path to any power plant. We may combine connected component analysis and Trans-real/Trans-naive simulations to empirically estimate these probabilities; however, it is hard to express them in closed form and hence hard to directly optimize. To express such a probability, we propose to use the dominator tree to capture the nodes critical to $s_i$ and estimate $\Pr(s_i \mid S)$.

As a first step, we merge all the power plants into a super power plant node g (Fig. 2(a)(b)). The super node g inherits all the edges from the merged power plants, i.e. if there is an edge connecting $g_i$ to $t_j$, we create a link between g and $t_j$. With such a merging operation, the failure condition of a substation does not change: if a substation has a path to g, it certainly has a path to at least one of the original power plants. What is more, it allows us to construct a dominator tree rooted at g.

Dominator tree (D). In graph theory, given any directed graph and a starting node g, a node u dominates another node v if all paths from g to v pass through u. If all other dominator nodes of v dominate u, then u is a direct dominator of v, denoted by $u = idom(v)$. For example, in Fig. 2(b), $t_6$ is a direct dominator of $s_2$ since all paths from g to $s_2$ pass through node $t_6$, and all other dominators of $s_2$ dominate $t_6$. We can build a dominator tree rooted at g by adding edges between all node pairs u and v with $u = idom(v)$. Dominator trees have been extensively studied in control-flow problems, and building dominator trees is a well-studied topic with near-linear time algorithms available [11].
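To illustrate the definition, the sketch below computes direct dominators with the simple iterative set-intersection method. This is not the near-linear algorithm of [11]; it is a compact alternative that is easier to read and adequate for small graphs. The function name `immediate_dominators` and the toy graph (which is not the graph of Fig. 2) are assumptions for illustration.

```python
def immediate_dominators(succ, root):
    """Return {node: idom(node)} for every node reachable from root.

    succ: dict mapping node -> list of successor nodes.
    """
    # Collect the nodes reachable from the root.
    order, seen, stack = [], {root}, [root]
    while stack:
        u = stack.pop()
        order.append(u)
        for v in succ.get(u, []):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    preds = {v: set() for v in order}
    for u in order:
        for v in succ.get(u, []):
            preds[v].add(u)

    # dom[v] = {v} united with the intersection of dom[p] over predecessors p.
    dom = {v: set(order) for v in order}
    dom[root] = {root}
    changed = True
    while changed:
        changed = False
        for v in order:
            if v == root:
                continue
            new = {v} | set.intersection(*(dom[p] for p in preds[v]))
            if new != dom[v]:
                dom[v] = new
                changed = True

    # Dominators of v form a chain; the closest one has the largest dom set.
    return {v: max(dom[v] - {v}, key=lambda d: len(dom[d]))
            for v in order if v != root}

# Hypothetical toy graph: t3 is fed by both t1 and t2, so only g dominates it.
toy_succ = {"g": ["t1", "t2"], "t1": ["t3", "s2"], "t2": ["t3"], "t3": ["s1"]}
idom = immediate_dominators(toy_succ, "g")
```

In the toy graph, `s1` can only be cut off from `g` by removing `t3` (or `g` itself), whereas removing `t1` alone or `t2` alone does not disconnect `t3`; hence `idom["t3"]` is `g` and `idom["s1"]` is `t3`.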

[Figure 2: Dominator tree examples. We first merge all power plants into the super root node g, and then construct the corresponding D. Panels: (a) Original graph; (b) Get root g; (c) Dominator tree.]

Naturally, the substations would become leaf nodes in the constructed dominator tree. For each substation $s_i$ there exists a path $P_i = \{g, t_1, t_2, \ldots, s_i\}$ in the dominator tree, which starts from g, goes over the transmission network, and finally reaches the substation $s_i$. Further, each $t_j \in P_i$ dominates $s_i$, namely if any $t_j \in P_i$ fails, $s_i$ would certainly fail. Therefore, we can estimate $\Pr(s_i \mid S)$ using its ancestor nodes in the dominator tree as

$$\Pr(s_i \mid S) = 1 - \prod_{t_j \in P_i} (1 - \Pr(t_j \mid S)) \tag{6}$$

which is the probability of any transmission node $t_j$ in $P_i$ failing. Note that to simplify analysis, we assume independence among the $\Pr(t_i \mid S)$. Since we know that if any $t_j \in P_i$ fails, $s_i$ is bound to fail in the end, this is a lower bound estimate of the true $\Pr(s_i \mid S)$. Using this estimate, we can now reformulate our objective functions (expected number of failed nodes) in Max-Sub and Max-SubBus (without loop) as

$$E[\#s \mid S]' = \sum_{s_i} \Pr(s_i \mid S) = T - \sum_{s_i} \prod_{t_j \in P_i} (1 - \Pr(t_j \mid S)) \tag{7}$$

$$E[\#s + \#t \mid S]' = \sum_{s_i} \Pr(s_i \mid S) + \sum_{t_i} \Pr(t_i \mid S) = T - \sum_{s_i} \prod_{t_j \in P_i} (1 - \Pr(t_j \mid S)) + \sum_{t_i} \Pr(t_i \mid S) \tag{8}$$

Here $T$ is the total number of substations, which follows from substituting Eq. 6 into the sum over $s_i$.

For both $E[\#s \mid S]'$ and $E[\#s + \#t \mid S]'$, we show that they satisfy the following diminishing return property.

Lemma 3.2. $E[\#s \mid S]'$ and $E[\#s + \#t \mid S]'$ are both a) monotonically non-decreasing; b) submodular in terms of $S$ (see footnote 2).

HotSpots framework. See the pseudo-code of our HotSpots framework in Alg. 2. Note that due to diminishing returns we also adopt a lazy evaluation strategy [21] to speed up the algorithm (by avoiding evaluating $\delta(t_i)$ for all $t_i$ in each iteration). For ease of understanding, Alg. 2 does not include the lazy evaluation. HotSpots uses a greedy method to iteratively select the node that leads to the maximum marginal gain of the final objective function. As both $E[\#s \mid S]'$ and $E[\#s + \#t \mid S]'$ are submodular, this immediately gives us a $(1 - 1/e)$-approximation algorithm for optimizing them [23].

To calculate the marginal gain $\delta(t_i)$ of adding a node $t_i$ to $S$, we first need to estimate $\Pr(t \mid S)$ for calculating $E[\#s]'$ and $E[\#s + \#t]'$. For this purpose, we run a set of $m$ Trans-real/Trans-naive simulations on the transmission network to get the empirical probabilities of a $t_i$ failing given $S$. Then, we propose to use a recursive function to efficiently calculate the value of the objective function given $\Pr(t \mid S)$ (Alg. 4). In effect, this recursive function just does a correct 'walk' on the dominator tree. We observe that the objective function Eq. 7 (and also the first part of Eq. 8) can be naturally factorized over different paths in the dominator tree (each t corresponds to a path in D). Therefore, we call our Recur(·) function on the root node g, and it goes over the dominator tree in a top-down fashion, where we take a summation when iterating horizontally (visiting the children of a node), and a product when iterating vertically (visiting different nodes in the same path). Combining the above, we calculate the marginal gain $\delta(t_i)$ of adding a node in lines 5-6 of Alg. 2.
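The lazy evaluation strategy mentioned above can be sketched as follows. This is a generic lazy greedy selection for any monotone submodular objective, not the HotSpots implementation: the names `lazy_greedy` and `coverage` are hypothetical, and a deterministic set-coverage function stands in for the simulation-based objective of Eq. 7.

```python
import heapq

def lazy_greedy(candidates, objective, k):
    """Pick up to k candidates greedily, re-evaluating marginal gains lazily.

    objective: function mapping a list of chosen candidates to a number;
    assumed monotone and submodular, so stale gains are upper bounds.
    """
    selected = []
    current = objective(selected)
    # Heap entries are (-gain, candidate, size of `selected` when gain was computed).
    heap = [(-(objective([c]) - current), c, 0) for c in candidates]
    heapq.heapify(heap)
    while heap and len(selected) < k:
        neg_gain, c, stamp = heapq.heappop(heap)
        if stamp == len(selected):
            # The gain is up to date and is the largest upper bound: select it.
            selected.append(c)
            current += -neg_gain
        else:
            # The gain is stale: recompute it against the current selection.
            gain = objective(selected + [c]) - current
            heapq.heappush(heap, (-gain, c, len(selected)))
    return selected

# Hypothetical stand-in objective: number of distinct items covered.
toy_sets = {"t1": {1, 2, 3}, "t2": {3, 4}, "t3": {4, 5, 6, 7}}

def coverage(chosen):
    covered = set()
    for c in chosen:
        covered |= toy_sets[c]
    return len(covered)

picked = lazy_greedy(list(toy_sets), coverage, 2)
```

In the toy run, `t3` is selected first, and in the second round only `t1` is re-evaluated: its refreshed gain (3) already exceeds the stale upper bound of `t2` (2), so `t2` is never recomputed. This skipped work is the source of the speed-up.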

3.2 Max-Sub and Max-SubBus with loop

Now we consider the original G with all five components, where the failures can further spread from substations to natural gas compressors, power plants, and finally to substations again (the 'loop'). Our main idea is to first calculate the failure probabilities of power plants given those of substations using the natural gas system. Then we modify the dominator tree we constructed above to include the failure of power plants, and thus encode the failure cascade from substations to power plants.

Algorithm 2: HotSpots framework
Input: G′, F-Cas, k, m
Output: A set S of k nodes
1: S = { }
2: Merge all power plants into g, and construct a dominator tree D rooted at g for G′
3: while |S| < k do
4:   for each ti ∉ S do
5:     Estimate Pr(t|S), Pr(t|S ∪ ti) using F-Cas simulation.
6:     δ(ti) = Update(Pr(t|S ∪ ti), Pr(t|S), D, g)
7:   t∗ = argmax δ(ti), add t∗ to S
8: Return S

Algorithm 3: Update(·)
Input: Pr(t|S ∪ t∗), Pr(t|S), D/D+, g
Output: δ(t∗)
// For the without-loop version
1: Return Recur(Pr(t|S ∪ t∗), D, g) − Recur(Pr(t|S), D, g)
// For the with-loop version
2: Initialize all Pr(s|S ∪ t∗), Pr(s|S), Pr(d|S ∪ t∗), Pr(d|S) as 0
3: while Pr(s|S ∪ t∗), Pr(s|S) is changing do
   // Traverse D+ to update Pr(s|S ∪ t∗), Pr(s|S)
4:   Recur+(Pr(s|S ∪ t∗), Pr(t|S ∪ t∗), D+, g, 1)
5:   Recur+(Pr(s|S), Pr(t|S), D+, g, 1)
6:   Update Pr(d|S ∪ t∗), Pr(d|S) using Eq. 10
7: Return E[#s|S ∪ t∗] − E[#s|S] using Eq. 11 (similarly for Max-SubBus)

Algorithm 4: Recur(Pr(t|S), D, x)
Input: Pr(t|S), the dominator tree D, the current node visited x
Output: The value of the objective function Eq. 7 (Eq. 8 can be calculated similarly)
1: if x is a substation then
2:   Return 1
3: kids = {child nodes of x in D}
4: if kids = ∅ then
5:   Return 0 // reached a leaf node in D that is not a substation
6: s = 0 // add up the value from the child nodes
7: for each kid in kids do
8:   s += Recur(Pr(t|S), D, kid) // call Recur() on the child node
9: Return Pr(x|S) ∗ s
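The top-down walk of Alg. 4 can be illustrated with a short Python sketch that evaluates Eq. 7 on a toy dominator tree. Two points are assumptions about how to read the pseudo-code rather than statements from the text: each internal node contributes its survival factor $1 - \Pr(x \mid S)$, so that the walk reproduces the product in Eq. 7, and the root g is given failure probability 0 in the without-loop setting. The names `recur` and `expected_failed_substations` and the toy tree are hypothetical.

```python
def recur(pr_fail, tree, substations, x):
    """Sum, over the substations below x, of the product of (1 - Pr) along the
    tree path from x down to (but excluding) the substation."""
    if x in substations:
        return 1.0
    kids = tree.get(x, [])
    if not kids:
        return 0.0  # a leaf that is not a substation contributes nothing
    # Sum horizontally over the children, multiply vertically along the path.
    below = sum(recur(pr_fail, tree, substations, kid) for kid in kids)
    return (1.0 - pr_fail.get(x, 0.0)) * below

def expected_failed_substations(pr_fail, tree, substations, root="g"):
    """Eq. 7: T minus the sum of path survival products, with T = number of substations."""
    return len(substations) - recur(pr_fail, tree, substations, root)

# Hypothetical dominator tree and failure probabilities.
toy_tree = {"g": ["t1", "t2"], "t1": ["s1", "t3"], "t3": ["s2"], "t2": ["s3"]}
toy_subs = {"s1", "s2", "s3"}
toy_pr = {"t1": 0.5, "t2": 0.2, "t3": 0.5}

value = expected_failed_substations(toy_pr, toy_tree, toy_subs)
```

Checking by hand with Eq. 6: $\Pr(s_1) = 0.5$, $\Pr(s_2) = 1 - 0.5 \cdot 0.5 = 0.75$, and $\Pr(s_3) = 0.2$, which sum to 1.45, the same value the walk returns. The shared factor for `t1` is computed once for both `s1` and `s2`, which is what makes the tree walk cheaper than evaluating each path separately.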

3.2.1 Failure probabilities of power plants. In the failure cascade loop, once a substation fails, the natural gas compressors that rely on it for power would fail. Similarly, once a natural gas compressor fails, the power plants it supplies fuel to (through the pipeline network) may fail. As mentioned in Sec. 2.2, we assume the pipelines do not fail; hence we can directly map each power plant $g_i$ to a set of relevant substations, and derive its failure probability:

$$\Pr(g_i \mid S) = 1 - \prod_{c_j \in A_i} (1 - \Pr(Sub(c_j) \mid S)) \tag{9}$$

where $A_i$ is the set of compressor nodes that fuel the power plant $g_i$ through the pipeline network, and $Sub(c_j)$ represents the substation that is connected to $c_j$. Basically, for each power plant, we find all the natural gas compressors it can reach through the pipeline network, find the substations that are connected with these natural gas compressors, and finally calculate the probabilities accordingly.

3.2.2 Dominator tree with loop (D+). The dominator tree we constructed before merges all power plants into a super root node g. While this is an essential step to construct a meaningful dominator tree for our tasks, it makes it difficult to include the failure probabilities of power plants, since all of them are represented as g. To modify the dominator tree to include the failure probabilities of power plants, we insert a dummy node for each branch in D to represent the set of power plants that provide power to this branch.

[Inset figure: the dominator tree with dummy nodes d1, d2, d3 inserted between the root g and its direct children; labels visible in the figure include the power plant sets "g1,g2" and "g2,g3".]

For example, in the inset figure, for each direct child node $t_i$ of g in D, we insert a node $d_i$, such that the connection from g to $t_i$ goes through $d_i$ first. This dummy node $d_i$ represents the set of power plants that can reach $t_i$, and it fails with the following probability:

$$\Pr(d_i \mid S) = 1 - \prod_{g_j \in B_i} (1 - \Pr(g_j \mid S)) \tag{10}$$

where $B_i$ represents the set of power plants that can reach $t_i$.

where Bi represents the set of power plants that can reach ti .With the addition of these dummy nodes, we now have a failure

cascade loop in the dominator tree. Initially Pr(d |S ) are set as 0;then we estimate the failure probabilities of substations Pr(s |S );then we update the value of Pr(d |S ), and recalculate Pr(s |S ), soon. �ese probabilities are non-decreasing over iterations in thefailure cascade loop, and they are bounded by 1. Hence we wouldeventually converge. We can rewrite our objective function Eq. 7,Eq. 8 using the probabilities Pr(s |S ) a�er convergence.

E[#s |S]+ = T −∑si

∏tj ∈Pi

(1 − Pr(tj |S ))·∏дj ∈Bi

∏cy ∈Aj

(1 − Pr(Sub (cy ) |S )) (11)

E[#s + #t |S]+ = T −∑si

∏tj ∈Pi

(1 − Pr(tj |S ))·∏дj ∈Bi

∏cy ∈Aj

(1 − Pr(Sub (cy ) |S )) +∑ti

Pr(ti |S ) (12)

Further, we can prove by induction that the above objective functions maintain the diminishing return property.

Lemma 3.3. $E[\#s \mid S]^+$ and $E[\#s + \#t \mid S]^+$ are both a) monotonically non-decreasing; b) submodular in terms of $S$ (see footnote 2).

Hence, the complete problem with loop can be solved using the same $(1 - 1/e)$-approximation framework in Alg. 2, with a new dominator tree, Update(·), and Recur+(·) function (Alg. 3, Alg. 5).
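The fixed-point iteration of Sec. 3.2.2 can be sketched in Python as follows. This is an illustrative reading of Alg. 3 (with-loop branch) and Alg. 5 under assumptions: the tree, the mappings `plants_of` (the set $B_i$ for each dummy node), `compressors_of` (the set $A_j$ for each plant), and `sub_of` are hypothetical inputs, and convergence is tested with a numerical tolerance rather than exact equality.

```python
def recur_plus(pr_s, pr_node, tree, substations, x, v):
    """Alg. 5 style walk; v is the probability that no ancestor of x has failed."""
    if x in substations:
        pr_s[x] = 1.0 - v
        return
    for kid in tree.get(x, []):
        recur_plus(pr_s, pr_node, tree, substations, kid,
                   v * (1.0 - pr_node.get(x, 0.0)))

def solve_loop(tree, substations, pr_t, plants_of, compressors_of, sub_of,
               root="g", tol=1e-9, max_iter=1000):
    """Iterate substation and dummy-node probabilities until they stop changing."""
    pr_d = {d: 0.0 for d in plants_of}        # dummy nodes start at 0
    pr_s = {s: 0.0 for s in substations}
    for _ in range(max_iter):
        old = dict(pr_s)
        recur_plus(pr_s, {**pr_t, **pr_d}, tree, substations, root, 1.0)
        pr_g = {}
        for g, comps in compressors_of.items():
            survive = 1.0
            for c in comps:
                survive *= 1.0 - pr_s[sub_of[c]]
            pr_g[g] = 1.0 - survive           # Eq. 9
        for d, plants in plants_of.items():
            survive = 1.0
            for g in plants:
                survive *= 1.0 - pr_g.get(g, 0.0)
            pr_d[d] = 1.0 - survive           # Eq. 10
        if max(abs(pr_s[s] - old[s]) for s in substations) < tol:
            break
    return pr_s

# Hypothetical toy system: plant g2 depends on compressor c1, powered by s1.
probs = solve_loop(
    tree={"g": ["d1", "d2"], "d1": ["t1"], "t1": ["s1"], "d2": ["t2"], "t2": ["s2"]},
    substations={"s1", "s2"},
    pr_t={"t1": 0.5, "t2": 0.0},
    plants_of={"d1": ["g1"], "d2": ["g2"]},
    compressors_of={"g1": [], "g2": ["c1"]},
    sub_of={"c1": "s1"},
)
```

In the toy system, `s2` has failure probability 0 in the first pass because `t2` never fails, but after the loop is closed it rises to 0.5: `s1` fails with probability 0.5, which propagates through `c1` and `g2` to the dummy node `d2` above `s2`.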

The overall time complexity of our HotSpots framework is $O(k V_t (m E_t + l V_t + l V_s))$, where $V_t$, $V_s$ are the numbers of transmission nodes and substation nodes respectively, $E_t$ is the number of edges in the transmission network, and $l$ is the number of failure cascade loops until the probabilities converge.

Algorithm 5: Recur+(Pr(s|S), Pr(t|S), D+, x, v)
Input: Pr(s|S), Pr(t|S), D+, x, v (the probability that none of the previous nodes fail)
Output: Update the values of Pr(s|S)
1: if x is a substation then
2:   Pr(x|S) = 1 − v
3: else
4:   kids = {child nodes of x in D+}
5:   for each kid in kids do
6:     Recur+(Pr(s|S), Pr(t|S), D+, kid, v ∗ (1 − Pr(x|S)))

| Node Type | TN | PA | FL | OH |
|---|---|---|---|---|
| Power Plants | 11 | 45 | 79 | 35 |
| Transmission Nodes | 206 | 224 | 253 | 105 |
| Electrical Substations | 489 | 831 | 1590 | 806 |
| Gas Compressors | 105 | 291 | 45 | 189 |
| Pipelines | 387 | 5667 | 624 | 7641 |

Table 2: Number of nodes in each CI component for each dataset we use.

4 EXPERIMENTS

In this section, we design various experiments and case studies to evaluate our algorithms. For the experiments, we set the load and the capacity of any transmission node to a constant value. Different load and capacity settings can easily be used by changing the corresponding values in F-Cas.

Datasets: We construct heterogeneous CI networks as described in Sec. 2.1 for four different states for our evaluation: Tennessee (TN), Pennsylvania (PA), Florida (FL) and Ohio (OH). The statistics of each of these datasets are shown in Tab. 2.

Baselines: To the best of our knowledge, there is no existing algorithm that can be used to solve the failure maximization problems (Max-Sub, Max-SubBus) as our algorithms do for the CI networks. We adapt two related algorithms, and generate several algorithms with different node selection strategies as our baselines.

(1) Opera [13] is a recent work which picks k critical nodes that would maximally break the connectivity of a target network (which can be measured by the number of triangles in the network). We run it on the transmission network.

(2) Netshield [27] is an immunization algorithm which aims to minimize the epidemic threshold of the graph. We select the top k transmission nodes according to the ranking given by the algorithm.

(3) Degree: We pick the k transmission nodes with the highest degree as the seed node set.

(4) Pagerank: We pick the k transmission nodes with the highest pagerank values.

(5) Random: We randomly select k transmission nodes.

4.1 Effectiveness (Q1)

For the nodes selected by our algorithms and the baselines, we run our F-Cas simulation to evaluate the effectiveness for Max-Sub and Max-SubBus. See Figure 3 (we only show OH for lack of space; results for FL, TN and PA are similar²). In all states under all situations (Trans-real and Trans-naive), the proposed HotSpots algorithm performs better than the baseline methods. The difference between HotSpots and the baselines decreases as k increases. This is expected: as more seeds are picked, the eventual failures saturate. Note that Opera does not give good results for Max-Sub and Max-SubBus, as it is designed to deal with static interdependent networks. Our Max-Sub and Max-SubBus both depend on the dynamics of the whole network system; as a result, our proposed HotSpots outperforms Opera. The other baselines likewise consider only the static information within one network, so they do not perform as well or as stably as HotSpots.

[Figure 3 panels: (a) OH, Max-Sub; (b) OH, Max-SubBus; (c) OH, Max-Sub; (d) OH, Max-SubBus. x-axis: number of seeds k (0 to 50). y-axis: #s for Max-Sub, #s + #t for Max-SubBus. Legend: HotSpots, Opera, Degree, Pagerank, Random, Netshield.]

Figure 3: Evaluating the detected nodes. Proposed HotSpots (in red) outperforms all the baselines. Top/bottom row are Trans-naive/Trans-real. # of seeds k vs expected # of substations (and transmission nodes) from each method: larger is better.

4.2 Scalability (Q2)

We investigate how HotSpots scales as the number of seeds k and the size of the network |V| change (we ran HotSpots on increasing sizes of the TN state data). See Figure 4. As expected from our complexity analysis, it scales linearly with k and almost quadratically with |V|. Note that for all our datasets, HotSpots finished within 30 minutes when choosing the top-50 nodes.

Figure 4: Scalability results. (a) vs k. (b) vs |V|.
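A common way to check scaling claims such as "linear in k" or "almost quadratic in |V|" is to fit the slope of log(running time) against log(size). The sketch below shows that fit under stated assumptions: the helper name `fit_power_law_exponent` is hypothetical, and the timings are synthetic values generated from exact power laws, not measurements from our experiments.

```python
import math

def fit_power_law_exponent(sizes, times):
    """Least-squares slope of log(time) against log(size)."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(t) for t in times]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

# Synthetic timings from known laws (illustrative only, not measured data).
sizes = [100, 200, 400, 800]
linear_times = [0.5 * s for s in sizes]
quadratic_times = [1e-3 * s ** 2 for s in sizes]

print(round(fit_power_law_exponent(sizes, linear_times), 3))     # 1.0
print(round(fit_power_law_exponent(sizes, quadratic_times), 3))  # 2.0
```

A fitted slope close to 1 indicates linear growth and a slope close to 2 indicates quadratic growth; real timings would give noisier estimates.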

4.3 Success Stories and Case Studies (Q3)

In this section, we show how the HotSpots algorithms and results can be used for real applications at ORNL and beyond. Note that the HSIP Gold data we used is not public, and we cannot directly show the network we constructed for security reasons. Therefore, in all our visualizations, we use the maps and the high-voltage transmission lines from the US Energy Information Administration (EIA) [3], which are publicly available.

Heterogeneous networks for CI. Constructing a heterogeneous CI network with interlinks within and across networks, capturing various physical, geographical and cyber interdependencies, provides useful insights immediately. For example, by simply overlaying the hurricane advisory track over the heterogeneous network we constructed, we can quickly estimate the extent of the infrastructure damage, and even predict the spread of the damage beyond the contour of the hurricane track. This is crucial, as getting such estimates is currently non-trivial and time consuming [10].
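The overlay step described above amounts to testing which node locations fall inside the hurricane track polygon. The following is a minimal sketch using the standard ray-casting point-in-polygon test. It assumes node locations and the track are given as planar (longitude, latitude) pairs, which ignores map projection; the function names, node names and coordinates are hypothetical.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: True if point (x, y) lies inside the polygon."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def directly_affected(nodes, swath):
    """Names of the nodes whose location lies inside the swath polygon."""
    return sorted(name for name, loc in nodes.items()
                  if point_in_polygon(loc, swath))

# Hypothetical rectangular swath and node locations as (lon, lat).
swath = [(-75.0, 40.0), (-73.0, 40.0), (-73.0, 42.0), (-75.0, 42.0)]
nodes = {
    "substation_a": (-74.0, 40.7),
    "plant_b": (-76.5, 41.0),
    "compressor_c": (-73.5, 41.5),
    "substation_d": (-74.0, 43.0),
}
print(directly_affected(nodes, swath))  # ['compressor_c', 'substation_a']
```

The resulting node set would then serve as the initial failure set for a cascade simulation.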

In this case study, we overlay actual Hurricane Sandy wind swath tracks (advisory no. 20) obtained from Hurricane Mapping [4] over our five-component heterogeneous network, and estimate both the immediate and the predicted damage of the hurricane. The results from our analysis provide several intuitive insights. First, when Hurricane Sandy makes landfall on New York City, by simply overlaying the hurricane track with our G, we can quickly estimate the CI facilities directly affected by the hurricane. As shown in Fig. 5(a), we expect from HotSpots that 227 nodes in G are affected after Sandy hits New York City. Assuming these 227 nodes fail immediately, by running our failure cascade model F-Cas, we can predict the future damage to the CI network and pinpoint high-risk regions and facilities. In Fig. 5(b)(c), the failure quickly cascades to other CI facilities in the entire NY state. Further, the cascading and escalating loop failures are clearly visible on the map, with 487 nodes failing before the failure cascade loop is formed and 712 nodes failing after the loop formation. These reinforcing loops were one of the major causes of the prolonged recovery. Our failure model is able to systematically predict the cascading loop trends given the initial perturbation input. Such prediction results complement existing hurricane assessment tools, such as HEADOUT [2], OCIA³ and EARRS [8], by adding the failure cascade effect to their damage assessment, which is mainly based on the application of fragility curves [7] and does not consider the interdependencies among different CI components.

Comparison with the 2003 Blackout. An in-depth study of the US NE 2003 blackout event revealed that a single high-voltage transmission line in northern Ohio brushed against some overgrown trees and shut down due to overheating [5]. Within a couple of hours, three other lines sagged into trees and switched off, forcing other power lines to shoulder an extra burden and tripping a cascade of failures throughout southeastern Canada and eight NE states. This suggests that the initiators of this massive blackout should be critical nodes in the context of cascading failures. In this case study, we select a portion of the heterogeneous network overlapping the Ohio region and run HotSpots to identify the top 5 vulnerable nodes (shown in Fig. 6a). By comparing our findings with the study mentioned above, we find that one of the nodes we identified (on the top right) is truly critical: it is only one hop away (within ∼10 miles in geographical distance) from the nodes connecting the 3 transmission lines which actually failed and

³ https://www.dhs.gov/office-cyber-infrastructure-analysis


(a) 227 nodes initially affected in NY city. (b) 487 nodes fail before the cascade loop. (c) 712 nodes fail after a loop of cascade.

Figure 5: Estimating the impact of Hurricane Sandy on NY state. (a) # of directly affected nodes, obtained by overlaying the constructed G with the hurricane track. (b) F-Cas simulation results before the failure cascade forms a loop. (c) F-Cas results after the failure cascade forms a loop. We mark regions with a high number of CI failures in red (actual network not shown for data privacy).

triggered the blackout. Such critical nodes identified by HotSpots can also support DHS NIPP⁴ for strengthening the security and resilience of the CI system.

Analyzing detected critical nodes. To evaluate the quality of the critical nodes HotSpots finds, we mark our results on high-voltage (> 345 kV) transmission networks sourced from the EIA. In Fig. 6, the locations of the detected nodes are marked with red circles. While the high-voltage transmission lines are not a direct indicator of importance, it is expected in general that critical nodes would be either on large generation plants or on transmission line terminations with a large power throughput. In the visualization of the high-voltage transmission networks, this often corresponds to nodes with several converging lines. In these figures, we clearly observe that many of the top transmission nodes HotSpots identified are on these high-voltage transmission lines, and some are at the intersection of multiple HV lines, indicating that the nodes detected by HotSpots are truly important.

(a) OH (b) PA (c) TN

Figure 6: Top five transmission nodes identified by HotSpots, highlighted with red circles.

4.4 User interface

To improve usability, we also designed a web-based user interface that integrates all the proposed tools and algorithms and facilitates 'what-if' scenario analysis. We describe the details below and show an example screenshot in Fig. 7.

⁴ https://www.dhs.gov/sites/default/files/publications/NIPP%202013_Partnering%20for%20Critical%20Infrastructure%20Security%20and%20Resilience_508_0.pdf

Our interface allows users to input their own CI data and construct their own heterogeneous CI network using our graph generation toolkit (urbannet-toolkit). The heterogeneous network is then visualized on a real map (top right in Fig. 7), with options to show a certain type of CI component (top left) and to check the detailed attributes of a specific node. Further, it allows users to input a perturbation (i.e., select initial failure nodes), customize and run our failure cascading simulation (bottom left), and get real-time failure statistics and visualizations for all the CI components (bottom right). Finally, it integrates the HotSpots algorithm, which automatically identifies and visualizes the critical nodes. We also provide a knob which controls the number of critical nodes the analyst wants to visualize. This interface would greatly help domain experts and decision makers in analyzing the vulnerability of CI systems.

Figure 7: Snapshot of our UI (shows part of Florida).

5 DISCUSSION

To the best of our knowledge, the proposed HotSpots algorithm, together with our urbannet-toolkit and F-Cas model, is the first attempt at analyzing up to five CI components in a unified framework from a graph-analytic viewpoint. Additional CI components can be easily added given the interlinking rules, and the graphs it generates facilitate network-based analysis and easy visualization for better understanding. As discussed in Section 2.2, our failure cascade model F-Cas is not very expensive to run, and captures both the path-based and neighbor-based failure conditions that cannot be



modeled by existing cascade models. Note that such path-based failure cascading is not restricted to the transmission network. Similar overload phenomena and path-based failures can be observed in the electrical distribution network (crucial for power delivery), which distributes power from substations to local facilities (a layer that we do not model, to simplify our analysis). Take the SCADA (supervisory control and data acquisition) system⁵ as another example: the communication servers send control signals through WAN and LAN networks to control facilities such as PLCs (programmable logic controllers) and RTUs (remote terminal units). When some connections in the WAN and LAN networks are damaged, message flooding may happen, which would cause message loss (similar to the overloading in our problem) and further cause malfunctioning in the electric grid system. As a result, HotSpots can be easily applied to a wide range of CI systems to identify the critical nodes in those systems.
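To make the contrast with purely local (neighbor-based) cascade rules concrete, the sketch below illustrates one possible path-based failure condition: a node is considered cut off when no surviving path connects it to any working source. This is an illustrative assumption only and is not the F-Cas model itself; the function name, the choice of power plants as sources, and the toy network are hypothetical.

```python
from collections import deque

def path_based_failures(adj, sources, failed):
    """Nodes cut off from every working source once `failed` is removed."""
    alive_sources = [s for s in sources if s not in failed]
    reached = set(alive_sources)
    queue = deque(alive_sources)
    # Breadth-first search over surviving nodes only.
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in failed and v not in reached:
                reached.add(v)
                queue.append(v)
    return sorted(set(adj) - set(failed) - reached)

# Hypothetical network: one plant feeding two substations via three
# transmission nodes.
adj = {
    "plant": {"t1"},
    "t1": {"plant", "t2", "t3"},
    "t2": {"t1", "sub_a"},
    "t3": {"t1", "sub_b"},
    "sub_a": {"t2"},
    "sub_b": {"t3"},
}
print(path_based_failures(adj, {"plant"}, {"t2"}))  # ['sub_a']
print(path_based_failures(adj, {"plant"}, {"t1"}))  # ['sub_a', 'sub_b', 't2', 't3']
```

In the second call, sub_a and sub_b fail even though none of their direct neighbors was in the initial failure set, which is the kind of non-local effect a neighbor-only rule cannot express.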

6 OTHER RELATED WORK

In addition to the work mentioned in previous sections, we discuss more related work here.

Infrastructure Vulnerability Analysis. Previous work on vulnerability analysis and simulation of interdependencies between critical infrastructure systems can be categorized into empirical, agent-based, system-dynamics-based, economic-theory-based, and network-based approaches (see [24] for a review). A few mathematical frameworks [12] and interdependency models [17] have been proposed for vulnerability analysis. Most of these works focus on only two critical infrastructures at a time. For example, Parandehgheibi et al. [25] study the interdependency between a communication network and a power grid network, and Dueñas-Osorio et al. [18] study the fragilities and interdependency between power and water networks. We study a more general problem where we focus on combining multiple (five) practical CI components.

Influence Maximization and Cascade Analysis. The influence maximization problem aims to find the best seed nodes which maximize influence. This has been extensively studied on the Independent Cascade and the Linear Threshold models by Kempe et al. [20], who gave a (1 − 1/e)-approximation algorithm based on submodularity. Much work has focused on designing more efficient algorithms for the original problem [9], extending it to continuous-time models [16], or handling uncertainty [15]. All these algorithms assume that the influence cascades locally through edges. In contrast, in our problem, the failure condition for nodes requires examining the connectivity of the entire network, and not just neighbors. A very recent work, Opera [13], finds critical nodes in a CI network that would maximally decrease the connectivity in target networks. However, it still uses a 'local' failure model, which does not capture the dynamics of CIS (it optimizes a different objective function based on the number of triangles in the network).
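For readers unfamiliar with the classical setting, the greedy seed selection of [20] under the Independent Cascade model can be sketched as below. This is a simplified illustration in which the expected spread is estimated by Monte Carlo simulation, so the (1 − 1/e) guarantee holds only approximately; the function names, the edge-list representation, and the toy graph are assumptions made here. Note that activation spreads only along edges, which is the 'local' behavior our problem departs from.

```python
import random

def simulate_ic(graph, seeds, rng):
    """One Independent Cascade run; returns the set of activated nodes."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, p in graph.get(u, []):
                # Each newly active node gets one chance per outgoing edge.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active

def expected_spread(graph, seeds, runs, rng):
    """Monte Carlo estimate of the expected number of activated nodes."""
    return sum(len(simulate_ic(graph, seeds, rng)) for _ in range(runs)) / runs

def greedy_seeds(graph, nodes, k, runs=200, seed=0):
    """Repeatedly add the node with the largest estimated spread."""
    rng = random.Random(seed)
    chosen = []
    for _ in range(k):
        best, best_spread = None, -1.0
        for u in sorted(nodes):
            if u in chosen:
                continue
            spread = expected_spread(graph, chosen + [u], runs, rng)
            if spread > best_spread:
                best, best_spread = u, spread
        chosen.append(best)
    return chosen

# Hypothetical directed graph: node -> list of (neighbor, activation prob.).
# Probabilities of 1.0 make this toy example deterministic.
graph = {"a": [("b", 1.0), ("c", 1.0)], "d": [("e", 1.0)]}
nodes = {"a", "b", "c", "d", "e"}
print(greedy_seeds(graph, nodes, 2, runs=10))  # ['a', 'd']
```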

7 CONCLUSIONS

In this paper, we approach the problem of failure analysis on CI systems using heterogeneous networks. We construct real CI networks from datasets at ORNL, and formulate a novel cascade model F-Cas and problems to identify critical nodes. We then develop an effective and scalable algorithm, HotSpots, for these problems. We also show through extensive experiments on real datasets that such an approach is useful for CI analysis, that HotSpots gives high-quality results beating competitors, and that the results are interpretable and meaningful, matching real-world situations.

As future work, the results from our approaches can serve as a starting point for expensive, high-fidelity simulations. Further, applying HotSpots to other CIs, such as the transportation network and the SCADA system, would also be useful.

⁵ https://en.wikipedia.org/wiki/SCADA

REFERENCES

[1] Homeland security infrastructure program (HSIP). https://gii.dhs.gov/HIFLD/hsip-guest.

[2] Hurricane electrical assessment damage outage tool (HEADOUT). http://www.gss.anl.gov/wp-content/uploads/2015/09/MORS_Presentation_Talaber_WG21_and_WG31_060415.pdf.

[3] U.S. energy information administration (EIA). https://www.eia.gov/.

[4] U.S. hurricane mapping. https://hurricanemapping.com/.

[5] S. Abraham, H. Dhaliwal, R. J. Efford, L. J. Keen, A. McLellan, J. Manley, K. Vollman, N. J. Diaz, T. Ridge, et al. Final report on the august 14, 2003 blackout in the united states and canada: Causes and recommendations. US-Canada Power System Outage Task Force, 2004.

[6] R. Albert, I. Albert, and G. L. Nakarado. Structural vulnerability of the north american power grid. Phys. Rev. E, 69:025103, feb 2004.

[7] M. Allen, S. Fernandez, O. Omitaomu, and K. Walker. Application of hybrid geo-spatially granular fragility curves to improve power outage predictions. J Geogr Nat Disast, 4(127):2167–0587, 2014.

[8] A. M. Barker, E. B. Freer, O. A. Omitaomu, S. J. Fernandez, S. Chinthavali, and J. B. Kodysh. Automating natural disaster impact analysis: An open resource to visually estimate a hurricane's impact on the electric grid. In Southeastcon, pages 1–3, 2013.

[9] C. Borgs, M. Brautbar, J. Chayes, and B. Lucier. Maximizing social influence in nearly optimal time. In SODA, pages 946–957, Philadelphia, PA, USA, 2014.

[10] W. N. Bryan. Hurricane Sandy Situation Report. U.S. Department of Energy Office of Electricity Delivery & Energy Reliability, 2012.

[11] A. L. Buchsbaum, H. Kaplan, A. Rogers, and J. R. Westbrook. A new, simpler linear-time dominators algorithm. ACM Trans. Program. Lang. Syst., 20(6):1265–1296, 1998.

[12] S. V. Buldyrev, N. W. Shere, and G. A. Cwilich. Interdependent networks with identical degree of mutually dependent nodes. Physical Review, 83(1), 2011.

[13] C. Chen, J. He, N. Bliss, and H. Tong. On the connectivity of multi-layered networks: Models, measures and optimal control. In ICDM. IEEE, 2015.

[14] C. Chen, H. Tong, L. Xie, L. Ying, and Q. He. Fascinate: Fast cross-layer dependency inference on multi-layered networks. In KDD. ACM, 2016.

[15] W. Chen, T. Lin, Z. Tan, M. Zhao, and X. Zhou. Robust influence maximization. In KDD. ACM, 2016.

[16] H. Daneshmand, M. Gomez-Rodriguez, L. Song, and B. Schoelkopf. Estimating diffusion network structures: Recovery conditions, sample complexity & soft-thresholding algorithm. In ICML, pages 793–801, 2014.

[17] S. Duan, S. Lee, S. Chinthavali, and M. Shankar. Reliable communication models in interdependent critical infrastructure networks. In Resilience Week (RWS), 2016, pages 152–157. IEEE, 2016.

[18] L. Dueñas-Osorio, J. I. Craig, and B. J. Goodno. Seismic response of critical interdependent networks. Earthquake Engineering and Structural Dynamics, 36(2):285–306, 2007.

[19] A. Dwivedi and X. Yu. A maximum-flow-based complex network approach for power system vulnerability analysis. IEEE Transactions on Industrial Informatics, 9(1):81–88, 2013.

[20] D. Kempe, J. Kleinberg, and E. Tardos. Maximizing the spread of influence through a social network. In KDD. ACM, 2003.

[21] J. Leskovec, A. Krause, C. Guestrin, C. Faloutsos, J. VanBriesen, and N. Glance. Cost-effective outbreak detection in networks. In KDD. ACM, 2007.

[22] Y.-S. Li, D.-Z. Ma, H.-G. Zhang, and Q.-Y. Sun. Critical nodes identification of power systems based on controllability of complex networks. Applied Sciences, 5(3):622–636, 2015.

[23] G. L. Nemhauser, L. A. Wolsey, and M. L. Fisher. An analysis of approximations for maximizing submodular set functions i. Mathematical Programming, 14(1):265–294, 1978.

[24] M. Ouyang. Review on modeling and simulation of interdependent critical infrastructure systems. Reliability Engineering and System Safety, 121:43–60, 2014.

[25] M. Parandehgheibi and E. Modiano. Robustness of bidirectional interdependent networks: Analysis and design. arXiv preprint arXiv:1605.01262, 2016.

[26] A. Sen, A. Mazumder, J. Banerjee, A. Das, and R. Compton. Identification of k most vulnerable nodes in multi-layered network using a new model of interdependency. In INFOCOM WKSHPS, pages 831–836. IEEE, 2014.

[27] H. Tong, B. A. Prakash, C. Tsourakakis, T. Eliassi-Rad, C. Faloutsos, and D. H. Chau. On the vulnerability of large graphs. In Data Mining (ICDM), 2010 IEEE 10th International Conference on, pages 1091–1096. IEEE, 2010.

[28] U.S. Department of Energy. The Water-Energy Nexus: Challenges and Opportunities, June 2014.

[29] Y. Yuan, Z. Li, and K. Ren. Modeling load redistribution attacks in power systems. IEEE Transactions on Smart Grid, 2(2):382–390, 2011.
