[ieee 2011 international conference on recent trends in information systems (retis) - kolkata, india...
An Indexing Method for Efficient Querying of an Attack Graph
Atin Ruia, Vishal Parekh and Aveek Chakrabarti
Department of Computer Science and Engineering
Jadavpur University
Kolkata, India
{atinruia.jucse, vishal.jucse, aveek.chakrabarti}@gmail.com
Abstract − Attack graphs are a novel way of examining how safe a
network is from attacks and analysing the shortcomings of these
networks. The analysis of the attack graph may help in assessing
network security. However, an attack graph can be very large,
containing a million nodes and a million edges. Analysing
such a large graph therefore becomes problematic and time
consuming. We propose an indexing scheme for fast data
retrieval. Using this scheme, we can identify the
vulnerable machines in a network that correspond to a given attack pattern.
Keywords−attack graph; graph mining; indexing; query
I. INTRODUCTION
An attack graph is a directed graph which gives a succinct
representation of different attack scenarios, depicted by attack
paths. An attack path is a logical succession of exploits where
each exploit in the series satisfies the preconditions for
subsequent exploits, establishing a causal relationship among
them. Attack graphs have been proposed as a way to identify
critical network weaknesses, construct adversary models, analyse
the security of a network and suggest changes to improve it. An
attack graph enumerates multi-stage attacks. Thus,
analysis of the attack graph may help in assessing network
security from the attacker's perspective. Network
security consists of the provisions and policies adopted by
the network administrator to prevent and
monitor unauthorized access, misuse, modification, or denial of
the computer network and network-accessible resources. It
deals with a trade-off between the degrees of accessibility and
protection of the network. The aim of network security is
to provide resources and information to authorized users. However, in
practice it is extremely difficult to prevent
unauthorised users from accessing protected resources. An
attack graph plays a key role in providing advance information about
possible sequences of malicious actions and attacks on a
protected network.
The process of identifying patterns and extracting
knowledge from large graph databases is known as graph
mining. A typical attack graph can range from hundreds of
nodes to millions of nodes and is generally represented using
the Resource Description Framework (RDF) [1-2]. As the size
of these RDF databases increases to millions of tuples, efficient graph matching and querying methods become increasingly
important. Due to the large size of these graphs, they cannot be
completely stored in main memory. This makes the scalability
of querying and searching methods extremely problematic.
Typically, indexes can be constructed for fast access. An index
is a data structure that improves the speed of data retrieval
operations. The space required to store the index is typically
less than the amount of space required to store the actual data.
In this paper an indexing method is proposed which
generates a key value for each node in the attack graph based
on its label. These values are then inserted into an index structure that is highly optimized for searching, namely a B-tree.
Thus, according to our proposed method, searching for an
object involves mapping a label to its key value and then
performing a lookup in the index to find the desired object.
This method has several advantages. Firstly, it is not necessary
to know the schema of the data in advance as the indexing
depends on the data itself. The data accompanying each node
and edge of the attack graph may have a large number of
attributes associated with it. These attributes could differ for
different objects of the graph and could also be of different
types. However, our scheme requires only the label of each node or edge, which is a string, and is independent of all other
attributes. Furthermore, the index structure can handle dynamic
changes of the graph.
Certain queries are very useful in analysing large attack
graphs. It is often required to identify certain machines
corresponding to an attack pattern. Such a pattern may also
occur multiple times in different parts of the graph.
In a large graph it is not feasible to locate these
unknown machines manually. Thus, an algorithm has been proposed to
query an attack graph and identify all the possible groups of
machines compromised in a given attack pattern.
The rest of the article has been arranged as follows. Section
II describes the related work. The proposed approach has been
described in Section III with the proposed algorithm in Section
IV. Section V shows the time complexity and Section VI
concludes the article.
II. RELATED WORK
Many methods have been proposed in the literature for
the generation and analysis of large attack graphs [3-6].
However, to the best of our knowledge this is the first time an
indexing and searching approach for attack graphs has been
proposed. A number of systematic approaches to statically
analyse attack graphs by means of reasoning mechanisms based on logical expressions and conditional preference
networks have been presented in [3]. That work also provides a method
to compute preventative paths that protect the network from
malicious attacks and to select appropriate countermeasures based
on given conditional decision preferences and relevant
factors. Instead of generating the full attack graph, the concept of
minimal attack graphs has been introduced in [4] and [6]. A
minimal attack graph is an attack graph in which all the attack
paths terminate at a goal condition. In [7], Jha et al. propose a
minimisation analysis of attack graphs. It addresses the issue of
finding a minimum set of countermeasures to prevent all
attacks in a given attack graph. In [8], an index has been proposed for sub-graph matching
and query processing by partitioning the graph. However, the
index method is inefficient for dynamic graphs as the index
structure changes drastically with any change in the network.
An index for semi-structured data has been proposed in [9].
The authors have proposed the use of a trie to provide fast
access to the data. There has also been some research on query
answering over graph datasets related to bioinformatics [10].
However in these cases the datasets are small enough to fit into
main memory itself and hence the need for efficient storage
and retrieval does not arise.
III. PROPOSED APPROACH
An attack graph G(V, E) consists of nodes and edges, all of
which have labels. Such a graph can be very large, running
up to a million nodes and a million edges. An
indexing method using B-trees [11] has been proposed. B-trees
are balanced search trees designed for minimizing disk I/O
operations. An assumption has been made in our indexing
scheme, that no two nodes of the graph will have the same
label.
This assumption is realistic and applicable to all attack
graphs. A node in an attack graph specifies a service which is provided by one device to another device. In a network, one
service may be provided by many devices. However the
combination of the device providing the service and the device
using the service makes each node unique. Thus, no two nodes
of the attack graph will have the same label.
Consider Figure 1 as an example. The diagram shows the
network configuration. In the network configuration an attacker
is positioned at the workstation machine 0. The attacker can
initially remotely execute a shell in machine 0 without having
to provide a password. The goal of the attacker is to gain super-
user or root privilege at machine 2. A firewall is located
between the attacker and the internal network.
Figure 1 – Network configuration
Figure 2 shows the attack graph corresponding to the network in Figure 1. In the attack graph a node is used to
denote capabilities or conditions while edges are used to depict
exploits or attacks. Capabilities or conditions are denoted in
two ways –
a. f(x,y) – machine x has the capability to
perform the service f on y.
b. f(x) – the service f can be performed locally
by x.
The terms service and privilege have been used
interchangeably in the rest of the paper. A node with an
outgoing edge signifies a pre-condition and a node with an
incoming edge signifies a post-condition. Exploits or attacks
can be denoted as –
e(x,y) – machine x can execute the exploit e on y.
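The notation above can be made concrete with a small parser. The following is a minimal sketch in Python, assuming labels are written exactly as in the text, i.e. a name followed by one or two integer machine ids in parentheses. The helper name parse_label is hypothetical and does not appear in the paper.

```python
import re

# Hypothetical helper: the regular expression accepts "name(x)" or "name(x,y)"
# where x and y are non-negative integer machine ids.
_LABEL = re.compile(r"^\s*([A-Za-z_]\w*)\s*\(\s*(\d+)\s*(?:,\s*(\d+)\s*)?\)\s*$")


def parse_label(label):
    """Split a node or exploit label into (name, tuple of machine ids)."""
    match = _LABEL.match(label)
    if match is None:
        raise ValueError(f"not a valid label: {label!r}")
    name, first, second = match.groups()
    if second is None:
        # Form f(x): the service is performed locally by machine x.
        return name, (int(first),)
    # Form f(x,y) or e(x,y): machine x acts on machine y.
    return name, (int(first), int(second))
```

Under this sketch, parse_label applied to ftp_c(0,2) yields the name ftp_c with machine ids (0, 2), and applied to execute(0) yields the name execute with the single machine id (0,).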
Figure 2 - Attack graph of the network in Figure 1
A node may have multiple edges leading to it having the
same label. In this case all of the pre-conditions corresponding
to that node for that label must be simultaneously present for
the exploit to be executed. For example in Figure 2, the node
trust(2,0) has two edges leading to it having the label
ftp_rhost(0,2). Thus, the attacker must have both the
capabilities ftp_c(0,2) and execute(0) before being able to
execute the attack ftp_rhost(0,2).
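The rule that all pre-conditions of an exploit must hold simultaneously can be illustrated as follows. This is a minimal sketch in Python using only the example from the text. The dictionary layout and the function names can_execute and apply_exploit are assumptions made for illustration, not part of the proposed index.

```python
# Pre- and post-conditions of the exploit discussed in the text.
# Storing them in dictionaries is an illustrative assumption.
PRECONDITIONS = {
    "ftp_rhost(0,2)": {"ftp_c(0,2)", "execute(0)"},
}
POSTCONDITIONS = {
    "ftp_rhost(0,2)": {"trust(2,0)"},
}


def can_execute(exploit, held):
    """An exploit is executable only if every one of its pre-conditions is held."""
    return PRECONDITIONS[exploit] <= set(held)


def apply_exploit(exploit, held):
    """Return the capabilities held after attempting the exploit."""
    held = set(held)
    if can_execute(exploit, held):
        held |= POSTCONDITIONS[exploit]
    return held
```

With only ftp_c(0,2) held the exploit cannot be executed. Once execute(0) is also held, the post-condition trust(2,0) is gained.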
An index of this attack graph is to be created. A single B-
tree for storing all the nodes of the graph is constructed. A simple function is used to generate a key value for each node in
the graph. The nodes are inserted into the B-tree based on this
key value which is derived from the label of each node –
f (n) =∑ ASCII values of all the characters of the privilege
denoted by node n.
The sum of the ASCII values of all the characters of the
privilege denoted by a node is used to generate the key value of
that node. The label of the first node in Figure 2 is ftp_c(0,2).
The key value of the node is the sum of the ASCII values of the characters of the
privilege denoted by the node, i.e. ftp_c. Thus the key value for
that node is 102+116+112+95+99 or 524. The data pointers of each node of the B-tree along with the key values are stored in
a single memory block. Whenever a node of the B-tree is to be
accessed its corresponding block is loaded into the main
memory, provided it is not already present. So the whole B-tree
does not need to be loaded into the main memory at any time.
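The key function described above can be sketched in a few lines of Python. The sketch assumes that the privilege name is the part of the label before the opening parenthesis and that labels contain only ASCII characters, so that Python's ord returns the ASCII value. The name generate_key follows the pseudocode in Section IV.

```python
def generate_key(label):
    """Sum the ASCII values of the characters of the privilege name.

    The machine ids in parentheses are ignored, so ftp_c(0,2) and
    ftp_c(1,2) map to the same key value.
    """
    privilege = label.split("(", 1)[0].strip()
    return sum(ord(ch) for ch in privilege)
```

For the label ftp_c(0,2) this gives 102+116+112+95+99 = 524, matching the value computed in the text.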
Each data pointer of this B-tree, points to a secondary
memory location where that service is stored. In this location a
list mid_list and a table exploit_table are present. The list
mid_list will store all the machine ids, in a sorted manner,
between which that service is present. The table exploit_table will store all exploits which are incident on that service. For
each exploit in this table the location and name of the service at
the opposite end of the exploit are also stored.
For example in Figure 2, in case of the privilege ftp_c the
data pointer for the key value 524 will point to a location in
secondary memory where the details of that service are stored.
The list mid_list for that service will have two attributes per entry. The
values for this list will be (0,1), (0,2), (1,2). Also, the
exploit_table for that service will store the exploit ftp_rhost.
Along with the exploit, the service at the opposite end, i.e. trust,
and its location will also be stored.
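The per-service record described above might be laid out as in the following Python sketch. The class name ServiceRecord, its method names, and the placeholder location string addr_trust are assumptions made for illustration. Only mid_list, exploit_table and the example values come from the text.

```python
from bisect import insort
from dataclasses import dataclass, field


@dataclass
class ServiceRecord:
    """Hypothetical in-memory view of what one data pointer refers to."""
    name: str
    mid_list: list = field(default_factory=list)       # sorted machine-id tuples
    exploit_table: dict = field(default_factory=dict)  # exploit -> (opposite service, its location)

    def add_machines(self, machines):
        # insort keeps mid_list sorted, which the intersection step relies on.
        if machines not in self.mid_list:
            insort(self.mid_list, machines)

    def add_exploit(self, exploit, opposite_service, opposite_addr):
        self.exploit_table[exploit] = (opposite_service, opposite_addr)


# The ftp_c example from the text, inserted in arbitrary order.
ftp_c = ServiceRecord("ftp_c")
for pair in [(1, 2), (0, 2), (0, 1)]:
    ftp_c.add_machines(pair)
ftp_c.add_exploit("ftp_rhost", "trust", "addr_trust")  # placeholder location
```

Keeping mid_list sorted at insertion time is one possible design choice. It moves the sorting cost to index construction, which is consistent with the assumption that searches far outnumber updates.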
However, there may be a multiple number of nodes in the
attack graph having the same key value. For example the two
nodes denoting the services abcd and dcba both have the same
key value. In this case provisions are made such that the data
pointer of the node in the B-tree points to a table,
collision_table. This table will contain the actual names of the
service of the nodes having the same key value, along with the
corresponding secondary memory locations of the services.
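Collision handling can be sketched as follows in Python. For brevity the sketch replaces the B-tree with a plain dictionary and lets every key value point to a collision table, even when only one service has that key. Both are simplifying assumptions, and the class name NodeIndex and the location strings are hypothetical.

```python
def generate_key(name):
    """Sum of the ASCII values of the characters of a service name."""
    return sum(ord(ch) for ch in name)


class NodeIndex:
    """Dictionary stand-in for the node B-tree (an assumption made for brevity)."""

    def __init__(self):
        # key value -> collision table {service name: secondary-memory location}
        self._entries = {}

    def insert(self, name, addr):
        self._entries.setdefault(generate_key(name), {})[name] = addr

    def lookup(self, name):
        """Return the location of the named service, or None if it is absent."""
        collision_table = self._entries.get(generate_key(name))
        if collision_table is None:
            return None
        # Services sharing a key value are told apart by their actual names.
        return collision_table.get(name)


index = NodeIndex()
index.insert("abcd", "addr_1")  # placeholder locations
index.insert("dcba", "addr_2")
```

Here abcd and dcba share one key value, yet each lookup returns its own location because the collision table is searched by the actual service name.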
Depending on the requirements, either a B-tree or a B+-tree may
be used. In this paper a B-tree has been considered
under the assumption that the number of insertions and deletions
of nodes and edges will be much smaller than the
number of search operations required. This is because, for a
typical enterprise network, an attack graph is generated once
and only minor changes occur during its maintenance. In this case a
B-tree is most effective as the data pointers are not all required
to be at the leaf level. In a B+-tree all the data pointers are present
at the leaf level. Thus, searching for a value will require a
larger number of block accesses than in the case of a B-tree, as the
whole height of the B+-tree will have to be traversed.
For a graph this large, the extra block accesses can make searching time consuming.
An attack graph can also be stored in a relational database
(RDBMS). It is possible to use the inbuilt searching and
indexing techniques provided by the database. However there
are certain drawbacks in using this method. The searching
techniques adopted by the database are not specifically
designed for optimising the searching of an attack graph. The
methods provided may be of no use or worse could have an
adverse effect in this specific application.
IV. PROPOSED ALGORITHM
The formation of the index requires basic B-tree operations like insertion and deletion. The key values of nodes are
determined using the function described above. The nodes are
then inserted into the node B-tree using standard B-tree
algorithms [11]. The node B-tree formed from the attack graph
in Figure 2 is shown below in Figure 3.
Figure 3 - Node B-tree corresponding to the attack graph in Fig. 2
The minimum degree, t, of each node in the B-tree is taken
to be 2. Thus the minimum number of key values in each node
is (t-1) or 1 and the maximum number is (2t-1) or 3. The key
values of all the nodes are determined and are inserted
accordingly.
The proposed algorithm is used for identifying unknown
machines from an attack pattern. The query is shown in Figure
4 for the attack graph in Figure 2. The attack query is also in
the form of an attack graph. The unknown machine ids are
denoted by x and y. The index structure formed for the attack
graph (Figure 3) is partially shown in Figure 5. According to the algorithm, the start node of the
query (selected arbitrarily) is initially taken to be ftp_c(x,y). The location of the service ftp_c
is found from the node B-tree using the function lookup, shown
in Figure 5 as step 1. If the service cannot be found in the
B-tree then there exists no possible solution for the query and null
is returned. If multiple services exist for that key value,
i.e. there are collisions, then the required service location is
obtained from collision_table using the locate function.
Algorithm : for identifying unknown machine ids from an attack graph
Input: Attack Graph G , Query-graph GQ, B – Tree nodeBTree
Output: Machine list result
Q ← createQueue()
service_name ← get_service(startNode(GQ))
key ← generate_key(service_name)
service_addr ← lookup(nodeBTree, key)   /* service_addr denotes a memory location */
if service_addr is null
    return null
else if service_addr points to a collision_table
    service_addr ← locate(collision_table, service_name)
enqueue service_addr in Q
n ← load the service stored at service_addr
result ← getMIDlist(n)
while Q is not empty
    service_addr ← dequeue from Q
    n ← load the service stored at service_addr
    result ← result ∩ getMIDlist(n)
    for all edges e incident on n in GQ
        exploit ← search(exploit_table, e)
        if exploit is null
            return null
        service_addr ← get_opposite_service_addr(exploit_table, exploit)
        service_name ← get_opposite_service_name(exploit_table, exploit)
        if e.endnode.service_name does not match service_name
            return null
        if e.endnode is not visited
            enqueue service_addr in Q
    end for
end while
return result
In step 2 the service stored at that location in secondary memory is
loaded into main memory. Initially the machine id list mid_list
of this service is stored as the result. The location of this
service is then inserted into a queue.
Figure 4 - Search query
At each iteration a service location is dequeued and the service is loaded into main memory. A new possible set of
solutions is obtained from the mid_list of this service (step 3).
An intersection operation is then carried out between this new
list and the old result list to obtain a new result list. In step 4,
for all the exploits on that service in the query graph, a check is
made to determine whether the same exploits also exist on that
service in the attack graph. The function search in the
algorithm is used for this purpose. If this function returns null,
then there is no possible solution and null is returned. For each
exploit another check is also necessary. The two services at the
opposite end of each exploit i.e. e.endnode.service in case of
the query graph, and service_name in case of the attack graph,
must match. If that is not the case then also null is returned.
Step 4 also shows how the location of the service trust at the
other end of the exploit ftp_rhost is obtained. If the node in the
query graph at the other end of the exploit has not been visited,
then the service represented by that node is enqueued. In the
next iteration the entire process is repeated for the next service as shown in steps 2′, 3′, and 4′ in Figure 5. This process is
repeated until the queue is empty.
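The query procedure walked through above can be sketched in Python as follows. This is an illustration under several assumptions, not the authors' implementation. Dictionaries stand in for the node B-tree and for secondary memory, sets replace the sorted-list intersection, and nodes are marked as visited when they are enqueued. The mid_list values for trust and the locations a1 and a2 are made up for the example.

```python
from collections import deque


def generate_key(name):
    return sum(ord(ch) for ch in name)


def query(index, storage, query_edges, start):
    """index: key -> {service name: location}; storage: location -> record.
    query_edges: service name -> list of (exploit, service at the other end)."""
    addr = index.get(generate_key(start), {}).get(start)    # step 1: lookup
    if addr is None:
        return None
    result = set(storage[addr]["mid_list"])                 # step 2: initial result
    visited = {start}
    queue = deque([(start, addr)])
    while queue:
        name, addr = queue.popleft()
        record = storage[addr]
        result &= set(record["mid_list"])                   # step 3: intersection
        for exploit, end_name in query_edges.get(name, []):
            entry = record["exploit_table"].get(exploit)    # step 4: exploit check
            if entry is None:
                return None
            opposite_name, opposite_addr = entry
            if opposite_name != end_name:
                return None
            if end_name not in visited:
                visited.add(end_name)
                queue.append((end_name, opposite_addr))
    return sorted(result)


# Illustrative data: the trust mid_list is an assumed value, not taken from the paper.
STORAGE = {
    "a1": {"mid_list": [(0, 1), (0, 2), (1, 2)],
           "exploit_table": {"ftp_rhost": ("trust", "a2")}},
    "a2": {"mid_list": [(0, 2), (1, 2)],
           "exploit_table": {"ftp_rhost": ("ftp_c", "a1")}},
}
INDEX = {generate_key("ftp_c"): {"ftp_c": "a1"},
         generate_key("trust"): {"trust": "a2"}}
QUERY_EDGES = {"ftp_c": [("ftp_rhost", "trust")],
               "trust": [("ftp_rhost", "ftp_c")]}
```

With this data, query(INDEX, STORAGE, QUERY_EDGES, "ftp_c") returns the machine id pairs common to both services, and a query that names a service or exploit absent from the index returns None.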
The machine id lists are always stored in a sorted manner.
This is to ensure that the intersection operation can be carried
out efficiently. The elements common to both lists
are to be found. At the first step, the first element of the
first list, say list 1, is compared with the first element
of the second list, say list 2. Because the lists are sorted, when
checking the second element of list 1 it is only necessary to
start from the element of list 2 that was checked last.
This guarantees that each list is traversed completely at most once. Thus the intersection operation can be carried out
in linear time.
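The intersection described above is the standard two-pointer merge over sorted lists. A minimal Python sketch is given below. The function name intersect_sorted is hypothetical, and the sketch assumes both lists are sorted in ascending order and contain no duplicates.

```python
def intersect_sorted(list1, list2):
    """Return the elements common to two sorted lists in linear time."""
    i = j = 0
    common = []
    while i < len(list1) and j < len(list2):
        if list1[i] == list2[j]:
            common.append(list1[i])
            i += 1
            j += 1
        elif list1[i] < list2[j]:
            i += 1  # list1[i] cannot occur later in list2
        else:
            j += 1  # list2[j] cannot occur later in list1
    return common
```

Each comparison advances at least one of the two positions, so the number of comparisons is at most the sum of the two list lengths.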
Figure 5 – Steps involved in querying the attack graph
V. TIME COMPLEXITY
Number of nodes in the attack graph = n
Minimum degree of each node in the node B-tree = t
Height of the node B-tree = h
Average number of edges incident on a node = e
Average number of machine id combinations in a machine list = m
Number of nodes in a query graph = p
Number of edges in a query graph = q
Time required to search for a node in the node B-tree, $T(N) = O(t \cdot h) = O(t \log_t n)$
Time required to search for an exploit in exploit_table, $T(E) = O(\log_2 e)$
In the first case, the time to search for a key value within a B-tree node is
negligible compared with the time required to load a new block
from secondary memory into main memory.
Thus, $T(N) = O(\log_t n)$
Time required to perform an intersection operation between two sorted machine lists, $T(I) = O(m)$
Thus, the time to process a query graph
= Time required to search for the first service in the node B-tree
+ (p-1) × (Time required to perform an intersection operation)
+ q × (Time required to search for an exploit in the exploit table)
$= T(N) + (p-1) \times T(I) + q \times T(E)$
$= O(\log_t n) + (p-1) \times O(m) + q \times O(\log_2 e)$
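To give a feel for the relative size of the three terms, the following Python sketch evaluates them for one set of parameter values. The values are assumptions chosen purely for illustration, and the results are unit-less operation counts that ignore the constants hidden by the O notation.

```python
import math


def query_cost_terms(n, t, m, p, q, e):
    """Return the three terms of the query cost as unit-less counts."""
    btree_search = math.log(n, t)        # block accesses for the single B-tree search
    intersections = (p - 1) * m          # (p-1) linear-time list intersections
    exploit_searches = q * math.log2(e)  # q searches of an exploit table
    return btree_search, intersections, exploit_searches


# Assumed values: one million nodes, minimum degree 100, 50 entries per
# machine list, a query graph with 4 nodes and 3 edges, 8 exploits per service.
terms = query_cost_terms(n=1_000_000, t=100, m=50, p=4, q=3, e=8)
```

For these assumed values the intersection term is far larger than the other two, which agrees with the observation that the intersections and the number of query nodes dominate the cost.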
The time required to search for an exploit in the exploit table is
very low because, for each service in an attack graph, the number of possible exploits is limited. Further, the traversal of the B-tree,
which requires comparatively more time, is performed only
once at the beginning. Thus the number of nodes in the query
graph and the time required to perform an intersection
operation are the two major factors.
VI. CONCLUSIONS
In this paper we have suggested an indexing method for the
efficient querying of attack graphs. In large attack graphs
consisting of a number of machines, it is often desirable to
identify which machines are vulnerable to attacks. Due to the
presence of such vulnerable machines, the whole network may be compromised. Thus the identification of these machines in a
network helps the network administrator in preventing attacks
by introducing remedial measures. We have proposed an
algorithm which uses the above index to identify the presence
of all such vulnerable machines in a network from a given
attack pattern. Our algorithm is efficient as the solution is
generated by a single traversal of the query graph. Further as
shown, the time complexity depends mainly on the time taken
to perform an intersection operation.
This is the first time such an analysis has been performed
on attack graphs, to the best of our knowledge. In future, we plan to utilise this indexing scheme for solving other types of
queries which will lead to a better analysis of attack graphs.
ACKNOWLEDGMENT
We would like to acknowledge our guide Mridul Sankar
Barik of the Department of Computer Science and
Engineering, Jadavpur University for his constant support. His
knowledge of attack graphs and his tutelage have made our work
possible.
REFERENCES
[1] A. Kiryakov, D. Ognyanov, D. Manov, OWLIM - a pragmatic semantic repository for OWL. In: WISE Workshops. (2005) 182–192.
[2] Y. Theoharis, V. Christophides, G. Karvounarakis, Benchmarking database representations of RDF/S Stores. (2005) 685–701.
[3] P. Kijsanayothin, R. Hewett, “Analytical Approach to Attack Graph Analysis for Network Security”, in ARES '10, Krakow 2010.
[4] P. Ammann, J. Pamula, R. Ritchey, J. Street, “A host-based approach to network attack chaining analysis”, in Proceedings of the 21st Annual Computer Security Applications Conference (ACSAC 2005), 2005, pp. 72-84.
[5] S. Noel, S. Jajodia, “Understanding complex network attack graphs through clustered adjacency matrices”, in Proceedings of the Computer Security Applications Conference (ACSAC), 2005, pp. 160–169.
[6] N. Ghosh and S. K. Ghosh, “An intelligent technique for generating minimal attack graph”, First Workshop on Intelligent Security (Security and Artificial Intelligence) Sec Art 2009.
[7] S. Jha, O. Sheyner and J. Wing, “Two formal analyses of attack graphs”, in CSFW '02, Proceedings of the 15th IEEE Workshop on Computer Security Foundations, Washington D.C., USA 2002.
[8] Matthias Brocheler, Andrea Pugliese, V.S. Subrahmanian, “DOGMA: A Disk-Oriented Graph Matching Algorithm for RDF Databases”, Proceedings of the 8th International Semantic Web Conference 2009.
[9] Brian Cooper, Neal Sample, Michael Franklin, Gisli Hjaltason, Moshe Shadmon, “A Fast Index for Semistructured Data”, Proceedings of the 27th VLDB Conference, Rome, Italy 2001.
[10] Tian, Y., McEachin, R.C., Santos, C.: “SAGA: A Subgraph Matching Tool for Biological Graphs”. Bioinformatics 23(2) (2007) 232.
[11] T. Cormen, C. Leiserson, R. Rivest, C. Stein, Introduction to Algorithms, MIT Press.