

Extending the Similarity-Based XML Multicast Approach with Digital Signatures

Antonia Azzini, Stefania Marrara
Dipartimento di Tecnologie dell’Informazione
Università degli Studi di Milano
Crema, Italy
[email protected]@unimi.it

Meiko Jensen, Jörg Schwenk
Horst Görtz Institute for IT-Security
Ruhr-University Bochum
Bochum, Germany
[email protected]ö[email protected]

ABSTRACT
This paper investigates the interplay between similarity-based SOAP message aggregation and digital signature application. It provides an overview of the approaches that result from the different orders in which signature application, verification, similarity aggregation and splitting can be performed. Depending on the intersection between similarity-aggregated and signed SOAP message parts, the paper discusses three different cases of signature application and sketches their applicability and performance implications.

Categories and Subject Descriptors
M.4.1.d [Advanced Services Invocation Framework]: SMP protocol; M.3.0.b [Web Services Communication Protocols]: SOAP and WS-Security, similarity-based aggregation; M.13.2.a [Service-Oriented Security Enablement at Software Level]: WS-Security optimization

General Terms
Security, Standardization

Keywords
XML similarity, SMP protocol, XML Signature, similarity-based aggregation

1. INTRODUCTION
Web Services (WS) Security [14] and associated emerging standards define SOAP-level techniques to move security-related information along with message content, maintaining functionality and interoperability in business processes. Since SOAP messages often carry vital business information, their integrity and confidentiality need to be protected, and SOAP message security assurance is a challenging part of SOA integration.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
SWS’09, November 13, 2009, Chicago, Illinois, USA.
Copyright 2009 ACM 978-1-60558-789-9/09/11 ...$10.00.

SOAP security activities consist of encryption operations and XML filtering [4], with particular interest in signing and verification, which include parsing, validation, transformation and other document-level operations.

While protecting message confidentiality is of paramount importance, it is often desirable to encrypt only parts of a message that is being sent from one entity to another, so that intermediate nodes between the two entities can process the message appropriately.

Another important issue regards the performance of SOAP processing. Indeed, like other XML-based protocols, SOAP can consume a large amount of network resources when transmitting messages over the wire. This issue has drawn great interest, and many studies have proposed techniques for enhancing SOAP’s performance. SOAP messages consume significant network bandwidth and therefore cause higher latency than other, competing technologies.

While a great effort has been made to optimize SOAP performance in transmission, less attention has been given to the problem of reducing the overhead of applying security policy rules to a large number of messages to be transmitted. Recent studies [5] investigate the possibility of using a similarity-based approach to identify similar SOAP messages that can be aggregated into a single message at the sender side before performing encryption and signing.

In this paper we discuss aggregation in the framework of general WS-Security performance. WS-Security utilizes existing XML digital signature and encryption models to specify how to attach security tokens to SOAP messages, together with a set of processing rules. Digital signatures play an important role, due to their non-repudiation, authentication and integrity capabilities. The paper is organized as follows: Section 2 presents a sample scenario to illustrate the application of the presented approach, Section 3 reports some basic foundations necessary to understand this work, Section 4 discusses in detail three possible cases of application of our message aggregation approach, outlining advantages and drawbacks, and finally Section 5 presents the conclusions of this work and possible future developments.

2. SAMPLE SCENARIO
Most research on SOAP performance has focused on reducing SOAP message size and, thereby, on decreasing the overall traffic in the network. This can be done by reducing the size of each SOAP message individually via compression or by combining SOAP messages to avoid sending duplicate parts of the messages.


Figure 1: An example for a signed SOAP message

A similarity-based SOAP multicast protocol has to address three main aspects. First, it must be able to measure the similarity between messages to determine which messages are similar enough to be grouped together into one message. Second, a special SMP message structure based on the standard SOAP envelope must be defined to contain the data of multiple recipients in one message. Third, the SMP solution needs to deal with the processing of aggregated SOAP messages at intermediary nodes.

In this paper we consider as a running example a national meteorological service authority (NMSA), which usually broadcasts weather forecasts and actual weather data to interested customers. For instance, airports may use this data to decide on their operation mode (e.g. close runways due to storm or fog). Therefore, customers have to register with the NMSA, providing the location they want to receive the forecast for, and a communication endpoint for data delivery. As weather is likely to be equally bad in a broader area, the resulting weather forecasts for a certain region are likely to contain similar or even identical weather data values, such as temperature, cloudiness, and wind strength and direction.

As the customers of the NMSA are spread over the whole Internet, the distribution of the weather data cannot be accomplished via traditional IP-based multicast. Thus, the weather data usually has to be transmitted using single TCP connections in unicast style. This poses a severe load overhead, since SOAP is based on XML and therefore inherits all of XML’s verbosity disadvantages. When there are many transactions involving similar messages, one-by-one differential encryption, decryption and signature checking of SOAP messages can generate a very large workload for the service requester and provider.

Figure 1 shows the XML tree structure of what a weather data broadcast SOAP message may look like. The Header of the message contains the information related to the signature, certified by the corresponding tokens indicated as AuthToken in this example, and all the security information present in the message. Reference elements specify the resources being signed in the SOAP message, and they are included in the element SignedInfo to protect them from tampering. Our approach combines a similarity-based SOAP multicast protocol with digital signatures in order to reduce the size of all multicast messages, by joining their common parts into a single new part of the SMP body.

Figure 2: An example for five similarity-aggregated SOAP messages

The new message structure is represented in Figure 2, which shows an aggregated SMP message that represents five nearly identical SOAP messages. The only data value that slightly differs across these messages is the value of wind speed, which is 3 Beaufort for messages 1, 2, and 4, and 2 Beaufort for messages 3 and 5. Apart from this, the original messages are completely identical. In the aggregate message the signed body is described by an SMP header, which specifies all the information regarding how the message has been re-organized, and an SMP body, divided into the common element, which contains all the information that remains unchanged across the messages, and the distinctive element, which holds all the data that cannot be aggregated, divided into single parts. In such a scenario the overall computation time is reduced, since the hash value of the common parts of all the SOAP messages is computed only once and reused for all the others. Separate hash values are then computed for the remaining distinctive parts of the aggregated messages.
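The split into common and distinctive parts can be illustrated with a small Python sketch. It is a minimal illustration under simplifying assumptions, not the SMP implementation: messages are modeled as flat dictionaries rather than XML trees, the function names are hypothetical, and the temperature and cloudiness values are invented; only the wind speed values follow the running example.

```python
def aggregate(messages):
    """Split flat messages into one common part and per-message distinctive parts."""
    first = messages[0]
    # A field is common if every message carries the same value for it.
    common = {k: v for k, v in first.items()
              if all(m.get(k) == v for m in messages)}
    # Everything else stays with the individual message.
    distinctive = [{k: v for k, v in m.items() if k not in common}
                   for m in messages]
    return common, distinctive


def split(common, distinctive):
    """Rebuild the original messages from an aggregated representation."""
    return [{**common, **d} for d in distinctive]


# Five nearly identical messages; wind speed is 3 for messages 1, 2, 4
# and 2 for messages 3 and 5 (other values are hypothetical).
messages = [
    {"temperature": "12", "cloudiness": "overcast", "wind_speed": speed}
    for speed in ("3", "3", "2", "3", "2")
]

common, distinctive = aggregate(messages)
restored = split(common, distinctive)
```

In this sketch only the wind speed ends up in the distinctive parts, and splitting the aggregate yields the original five messages again.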

3. FOUNDATIONS AND OBJECTIVES
In this section we present some notions and approaches that are preliminary to the idea presented in this work.

3.1 Similarity-Based Aggregation in XML
In the past few years, there has been an increased interest in developing efficient techniques for comparing XML-based documents, both in the field of information retrieval (IR) and in database retrieval. Before discussing the possibility of applying an XML Signature to an aggregate SOAP message, we briefly review the proposals available in the literature on similarity computation that could provide the foundation for message aggregation. The available algorithms differ in the efficiency of the aggregation proposed and in the computational costs required. Application to SOAP aggregation poses strict requirements on both accounts, so an efficient trade-off becomes necessary.

3.1.1 XML Structural Similarity
The literature offers various approaches for determining structural similarities between XML documents. Most of them derive from the techniques for finding the edit distance between strings (e.g., [11]). In essence, all these approaches find the cheapest sequence of edit operations that can transform one tree into another. Early approaches such as [3] allow insertion, deletion and relabeling of nodes anywhere in the tree. Some works, for instance [2], restrict insertion and deletion operations to leaf nodes and add a move operator that can relocate a sub-tree, as a single edit operation, from one parent to another. The approach presented in [1] allows insertion and deletion of leaf nodes, and allows the relabeling of nodes anywhere in the tree. The paper [15] extends a previous work by adding the operations insert tree and delete tree to allow insertion and deletion of whole sub-trees. The last structural similarity approach we review is [7], which uses the Fast Fourier Transform to compute similarity between XML documents.
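To make the edit-distance idea concrete, the sketch below applies the classic string edit distance to the preorder tag sequences of two XML documents. This is a deliberate simplification assumed for illustration only: it is not one of the tree edit algorithms of [1], [2], [3] or [15], it ignores text content and nesting depth, and the function names and sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET


def tag_sequence(xml_text):
    """Element tags of a document in preorder (document order)."""
    return [el.tag for el in ET.fromstring(xml_text).iter()]


def edit_distance(a, b):
    """Minimum number of insertions, deletions and relabelings turning a into b."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # relabel x to y (free if equal)
        prev = cur
    return prev[-1]


def structural_similarity(xml_a, xml_b):
    """Similarity in [0, 1]; 1.0 means identical tag sequences."""
    s1, s2 = tag_sequence(xml_a), tag_sequence(xml_b)
    return 1 - edit_distance(s1, s2) / max(len(s1), len(s2))


doc_a = "<weather><temp/><wind/></weather>"
doc_b = "<weather><temp/><cloud/><wind/></weather>"
```

Here the two sample documents differ by a single inserted element, so one edit operation separates their tag sequences.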

3.1.2 Semantic Similarity Measures
Semantic similarity measures are used in evaluating the effectiveness of web search mechanisms in finding and ranking results [12]. In the field of Information Retrieval (IR), knowledge bases (thesauri, taxonomies and/or ontologies) provide a framework for organizing words (expressions) into a semantic space [9]. Therefore, several methods have been proposed to determine semantic similarity between concepts in a knowledge base. They can be classified as edge-based approaches and node-based approaches. The edge-based approach is used to evaluate semantic similarity in a knowledge base. For instance, [10] estimates the distance between the nodes corresponding to the concepts being compared: the shorter the path from one node to another, the more similar they are. Nevertheless, a widely known problem with the edge-based approach is that it often relies on the notion that links in the knowledge base represent uniform distances [9]. In real knowledge bases, the distance covered by a single link can vary with regard to network density, node depth, link type and information content of the corresponding nodes [9].
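The edge-based idea can be sketched as a shortest-path computation over a small taxonomy. The taxonomy, the function names and the mapping from path length to a similarity score are assumptions made for illustration; they are not taken from [10]. Every link counts as one unit of distance, which is exactly the uniform-distance assumption criticized above.

```python
from collections import deque


def undirected(edges):
    """Build an adjacency map from a list of (concept, concept) links."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def path_length(graph, start, goal):
    """Number of links on the shortest path, or None if unreachable (BFS)."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def edge_similarity(graph, a, b):
    """Assumed mapping: the shorter the path, the higher the similarity."""
    dist = path_length(graph, a, b)
    return 0.0 if dist is None else 1 / (1 + dist)


# Hypothetical weather taxonomy.
taxonomy = undirected([
    ("weather", "precipitation"), ("weather", "wind"),
    ("precipitation", "rain"), ("precipitation", "snow"),
    ("wind", "storm"),
])
```

In this taxonomy "rain" and "snow" are two links apart, while "rain" and "storm" are four links apart, so the first pair receives the higher score.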

3.2 Similarity-Based Aggregation and Broadcasting of XML messages
The basis of our approach, initially presented by Damiani and Marrara in [5], is [16], which proposes a SOAP multicast technique, called Similarity-based Multicast Protocol (SMP), that takes into account the similarity of SOAP messages. SMP was designed to deal with SOAP performance issues by exploiting the similar structure of SOAP messages. The goal is to reduce the total traffic generated over a network when sending SOAP responses from servers to clients. SMP allows similar SOAP messages that share some parts of the SOAP template to be sent as a single customized SMP message instead of being sent as multiple copies. Clients’ addresses are represented as strings and stored in the SMP header, which is encapsulated inside the SOAP message body. The SMP body is also embedded inside the SOAP message body. There are two sections in the SMP body: the Common section, containing common values and structures of all messages addressed to the clients encoded in the SMP header; and the Distinctive section, containing the individual different parts of each response message.

The outermost envelope is referred to as an SMP message. The destination of an SMP message, which is specified in the SOAP header, is the next router in a network when the message is forwarded to all clients given in its SMP header.

Despite its advantage of saving network traffic, SMP has a remarkable disadvantage: it uses a conventional routing protocol (OSPF) to deliver messages to clients. Since OSPF uses Dijkstra’s algorithm, SMP messages are routed along their shortest paths to destinations. Two nodes of a network are often connected by multiple paths. Therefore, sending messages just along least-hop paths does not maximize the traffic savings resulting from the similarity of messages.

In addition, SMP has a user-configured time frame. During this time period, outgoing SOAP response messages are placed in a queue if their similarity level falls within a threshold limit. When a new request message arrives at the server, the server generates its corresponding SOAP response message and computes its similarity against the messages already in the queue. The algorithm used for computing similarity relies on structural and content comparisons (see Section 3.1). If the computed similarity satisfies the threshold, the message is inserted into the queue. If not, the messages that already reside in the queue are sent out as an aggregated SMP message. As a result, the queue is empty for new requests and the above aggregation steps can be repeated. Messages in the queue can also be dispatched automatically after the defined time period expires.

It is important to note that to deploy SMP in a real network, all routers in the network need to be SMP-compatible. This can be done by installing SMP software, i.e. an implementation of the proposed SMP, on each router to enable it to interpret SMP messages. The SOAP header in an SMP message specifies the next-hop router as the message’s destination. Therefore, when an intermediary router receives an SMP message, it processes the message as if it were the final destination of the message. Since an SMP-compatible router operates on the application layer, it has full access to the message’s envelope and parses the SOAP body to get the list of clients encoded in the SMP header and the actual payload in the SMP body.
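The queueing behaviour described above can be sketched as follows. This is an illustrative reading, not the SMP implementation of [16]: the class and method names are hypothetical, a new message is compared against every queued message, and a message that fails the threshold is assumed to start the next queue after the current one has been flushed.

```python
class SmpQueue:
    """Collects similar outgoing responses until a flush condition is met."""

    def __init__(self, similarity, threshold, time_frame):
        self.similarity = similarity  # function (msg, msg) -> score in [0, 1]
        self.threshold = threshold    # minimum score to join the queue
        self.time_frame = time_frame  # maximum age of a queue, in seconds
        self.queue = []
        self.opened_at = None

    def _flush(self):
        batch, self.queue, self.opened_at = self.queue, [], None
        return batch

    def expire(self, now):
        """Return the queued batch if the time frame has elapsed, else None."""
        if self.queue and now - self.opened_at >= self.time_frame:
            return self._flush()
        return None

    def offer(self, message, now):
        """Enqueue a response; return a batch to send as one SMP message, or None."""
        batch = self.expire(now)
        too_different = any(self.similarity(message, queued) < self.threshold
                            for queued in self.queue)
        if batch is None and too_different:
            batch = self._flush()
        if not self.queue:
            self.opened_at = now  # the message opens a fresh queue
        self.queue.append(message)
        return batch
```

A caller would pass each generated response to `offer` and additionally call `expire` periodically, aggregating and sending whatever batch either call returns.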

3.3 XML Signature and WS-Security
Along with the benefits that web services provide for network transactions, data integrity and security must also be considered, together with the rights and access permissions of the message’s users. Since web service communication relies on XML-formatted messages, a satisfactory solution, also adopted in this approach, uses XML Signature for identifying the requesting web service, validating message integrity, confirming non-repudiation and ensuring proper security.

As reported in the literature [17], XML Signature defines a set of rules and syntax that are used to handle digital signatures of data. Unlike a traditional digital signature, which is calculated over a complete message, an XML signature generally considers a message or a document as consisting of many elements, and it signs one or more (i.e. an aggregation) of such elements, making the signing process more flexible and practical. As also reported in [18], the XML Signature Working Group has created a specification for defining digital signatures in an XML format. The author also gives the main reasons for an XML signature standard despite the existence of alternative mechanisms for maintaining data security in transit. The first is the portability of an XML signature, which is entwined with the XML data; the second is flexibility, due to its capability to refer to many documents or parts of a single document, as required, and also due to the use of XML standards. Furthermore, XML signature is optimized for XML documents, but it can also be used to sign non-XML documents.

SignedInfo: contains or references the signed data and specifies the algorithms used.
SignatureMethod: defines the signature algorithm used.
CanonicalizationMethod: puts an XML document into a standard format, ensuring that XML documents containing the same intrinsic information have the same binary representation and therefore the same signature.
Reference: contains the URI reference and any optional transformations that could be applied to the resource before signature.
SignatureValue: contains the actual value of the signature.
KeyInfo: optional element that can be used to include key material with the digital signature.

Figure 3: Elements defining an XML Signature

The XML signature components are shown in Figure 3.

XML digital signatures can be divided into three main classes, depending on where the signature is applied: if used to sign a resource outside its containing XML document, it is called a detached signature; if used to sign some part of its containing document, it is called an enveloped signature; and finally, if it contains the signed data within itself, it is called an enveloping signature.

The main steps carried out when a digital signature is created are as follows: 1) create a SignedInfo element with SignatureMethod, CanonicalizationMethod and References; 2) canonicalize the XML document; 3) calculate the SignatureValue, depending on the algorithms specified in the SignedInfo element; 4) create the digital signature, which also includes the SignedInfo, KeyInfo and SignatureValue elements.
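The four steps can be sketched with the Python standard library. The sketch makes several assumptions that should be read as such: it uses an HMAC with a shared key in place of a public-key signature, omits the XML Signature namespace and the optional KeyInfo element, canonicalizes both the referenced subtree (before digesting it) and SignedInfo (before signing it), and uses hypothetical function names. It illustrates the structure of the process and is not a conforming implementation.

```python
import base64
import hashlib
import hmac
import xml.etree.ElementTree as ET

C14N2 = "http://www.w3.org/2010/xml-c14n2"
HMAC_SHA256 = "http://www.w3.org/2001/04/xmldsig-more#hmac-sha256"
SHA256 = "http://www.w3.org/2001/04/xmlenc#sha256"


def digest(xml_text):
    """Base64 SHA-256 digest of the canonical form of an XML fragment."""
    canonical = ET.canonicalize(xml_text)
    return base64.b64encode(hashlib.sha256(canonical.encode("utf-8")).digest()).decode("ascii")


def mac_over(signed_info, key):
    """Canonicalize SignedInfo (step 2) and compute the MAC over it (step 3)."""
    canonical = ET.canonicalize(ET.tostring(signed_info, encoding="unicode"))
    raw = hmac.new(key, canonical.encode("utf-8"), hashlib.sha256).digest()
    return base64.b64encode(raw).decode("ascii")


def sign(body_xml, body_id, key):
    # Step 1: SignedInfo with CanonicalizationMethod, SignatureMethod, Reference.
    signed_info = ET.Element("SignedInfo")
    ET.SubElement(signed_info, "CanonicalizationMethod", Algorithm=C14N2)
    ET.SubElement(signed_info, "SignatureMethod", Algorithm=HMAC_SHA256)
    reference = ET.SubElement(signed_info, "Reference", URI="#" + body_id)
    ET.SubElement(reference, "DigestMethod", Algorithm=SHA256)
    ET.SubElement(reference, "DigestValue").text = digest(body_xml)
    # Step 4: assemble Signature from SignedInfo and SignatureValue.
    signature = ET.Element("Signature")
    signature.append(signed_info)
    ET.SubElement(signature, "SignatureValue").text = mac_over(signed_info, key)
    return signature


def verify(signature, body_xml, key):
    signed_info = signature.find("SignedInfo")
    if signed_info.findtext("Reference/DigestValue") != digest(body_xml):
        return False  # the referenced content was modified
    return hmac.compare_digest(mac_over(signed_info, key),
                               signature.findtext("SignatureValue"))
```

Because the digest is taken over the canonical form, verification in this sketch tolerates purely lexical differences such as extra whitespace inside a start tag, but fails as soon as a signed value changes.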

4. DIGITAL SIGNATURES AND XCAST
Whatever kind of similarity-based aggregation is used, if the data to be delivered using XCast has to be integrity-protected, it must be digitally signed. As digital signatures are invalidated by any character modification within the signed block, this poses some problems for the use of similarity-based aggregation. These problems are investigated next. Please note that the scenarios and requirements discussed below must be considered regardless of the actual similarity approach used, as they apply to any kind of similarity-based aggregation. However, the optimization potential of the signature application strongly depends on the optimization level of the similarity approach used.

Regarding the application of digital signatures to XCast-style broadcast messages, it is necessary to consider four different tasks in a SOAP message’s lifecycle: 1) the application of a digital signature to the SOAP message’s contents, 2) the aggregation of several single SOAP messages into a single, similarity-based broadcast message, 3) the splitting of a single broadcast message into several new messages (either smaller broadcasts or single SOAP messages), 4) the verification of the digital signatures applied to the document.

In the following, we discuss the different scenarios that result from performing these tasks in different orders.

4.1 Sign-Join-Split-Verify: Naïve Approach
Obviously, as SOAP messages with digital signatures still remain SOAP messages, they can be used as-is for similarity-based aggregation. According to the WS-Security and XML Signature specifications, the application of a digital signature to SOAP messages does not require many changes to the document’s contents. It causes the addition of a new SOAP header value (Security), and, depending on the XML referencing approach taken, it might require an additional ID attribute at the root element of the signed subtree. Either way, most of the document’s contents remain identical, thus the similarity-based aggregation will most likely retain its benefits even in this kind of scenario.

Figure 4: First scenario: Sign-Join-Split-Verify

Once the message’s aggregation and re-separation tasks have been completed, the resulting single SOAP messages are identical to those messages used in the signature-application step. Thus, a verification of the contained digital signatures will result in the same hash values, and signature verification succeeds.

Nevertheless, though this approach can benefit from the performance boost effects described in [16], it does not provide any valuable improvements regarding digital signature application in detail.

4.2 Join-Sign-Split-Verify: Broadcast Approach

Figure 5: Second scenario: Join-Sign-Split-Verify

In the original SMP-based XCast approach [16], this ordering would require re-application and re-verification of digital signatures on every router from the sender to all recipients. Obviously, this has severe flaws. Thus, we investigate the scenario considered for XCast, and provide some details on how to apply digital signatures appropriately here.

Figure 6: XCast scenario with digital signatures using the Join-Sign-Split-Verify approach

Figure 7: Examples of semantically equivalent XML fragments

Figure 6 shows an example routing path for an XCast-based message broadcast. Starting at the message creator’s side, it shows how three different (but similar) SOAP messages are to be transmitted to three different recipients. As the recipients are not located within the same network, there are two XCast routers located at the network edge nodes. The XCast approach now consists in aggregating the three SOAP messages at the sender side to become a single new message (compare Figure 2). This single message is then delivered to the first XCast router (first split block in the figure). Here, the message is processed and split into two new messages (C1 and C4). C1 only contains those parts of the SMP message that are addressed to R1 or R2, while C4 contains only the single SOAP message addressed to R3. This processing mode is repeated for the SMP message C1 accordingly. In the end, the final XCast routers (C2, C3, and C4) transform the last SMP messages to the original single SOAP messages.

Before going into the details of the approach, we discuss the benefits of using a similarity approach to aggregate the messages before applying the signature. Indeed, the similarity approach, applied before computing the canonicalization of the original messages, can identify parts of messages that are identical from a semantic point of view but differ in structure. These are parts that become identical once the canonicalization algorithm is applied. Some examples of such documents are shown in Figure 7. Using a traditional approach, we would first canonicalize each original message and only then recognize that these parts are identical and can be aggregated. With our similarity-based approach we can first recognize and aggregate these semantically but not structurally identical parts and then apply the canonicalization procedure only once on the resulting aggregate, saving time and computational costs. Indeed, it is well known that the canonicalization process is the cost bottleneck of the entire XML signature algorithm.
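The effect of canonicalization on such fragments can be shown with Python's `xml.etree.ElementTree.canonicalize` (available from Python 3.8, implementing Canonical XML 2.0). The two fragments below are hypothetical examples chosen for illustration and are not the ones shown in Figure 7; they differ in attribute order, quoting, whitespace and empty-element syntax.

```python
import hashlib
import xml.etree.ElementTree as ET

# Two textually different but semantically equivalent fragments (hypothetical).
variant_a = '<wind speed="3" unit="beaufort"/>'
variant_b = "<wind   unit='beaufort' speed='3'></wind>"


def canonical_digest(xml_text):
    """SHA-256 digest over the canonical form of an XML fragment."""
    canonical = ET.canonicalize(xml_text)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


canonical_a = ET.canonicalize(variant_a)
canonical_b = ET.canonicalize(variant_b)
# Both yield: <wind speed="3" unit="beaufort"></wind>
```

Since both variants share one canonical form, they also share one digest; an aggregation step that recognizes them as equivalent beforehand only needs to canonicalize and hash one of them.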

After this short discussion of the potential of the similarity approach, in order to enable digital signatures in the most efficient way, it is necessary to distinguish three different cases in terms of the relation between SMP messages and signed XML subtrees. These are discussed next.

4.2.1 Case 1: Signed Subtree is Common

Figure 8: Separate signed and distinctive subtrees

The first case (illustrated in Figure 8) is that a digital signature must be applied to an XML subtree that is semantically identical among all aggregated SOAP messages. An example would be a PKI certificate of the sender, which may be contained in all outgoing SOAP message headers for authentication purposes. As such a certificate is a static set of string values, it will have the same, or at most a very similar, representation in XML for all outgoing messages.

As the signed subtree (the certificate data) is common to all SOAP messages considered in the aggregation step, it will be completely contained in the <common> part of the resulting SMP message. Thus, instead of having the signature calculated for every single SOAP message, a better approach consists in performing the aggregation step first. Then, the signature can be calculated on the appropriate XML subtree from the <common> part of the SMP message, and the resulting <Security> header can be added to the SOAP header within the common part of the SMP message.

Once the SMP message created in this way is split at an XCast router, the common part of the SMP message will be copied into all outgoing SMP messages. Thus, the signature header is copied as well.

At the final recipient, the last XCast router re-instantiates the original SOAP message, but it now includes all digital signature metadata in the new <Security> header. Additionally, as the signed contents did not change since their generation, the signature remains valid. Thus, the signature verification done by the final recipient will succeed, and the data integrity remains assured.
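The saving in Case 1 can be sketched by counting hash computations. The sketch is an illustration under assumptions: the certificate string and recipient names are hypothetical, a plain SHA-256 digest stands in for the full signature calculation, and the counter exists only to make the difference visible.

```python
import hashlib


def digest(data, counter):
    """SHA-256 digest that also counts how often it was invoked."""
    counter["hashes"] += 1
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


def sign_each(signed_subtrees, counter):
    """Sign-then-join: one digest per outgoing SOAP message."""
    return [digest(subtree, counter) for subtree in signed_subtrees]


def sign_common_once(common_subtree, recipients, counter):
    """Join-then-sign: one digest over <common>, copied to every recipient."""
    value = digest(common_subtree, counter)
    return {recipient: value for recipient in recipients}


certificate = "<AuthToken>hypothetical-certificate-data</AuthToken>"
recipients = ["R1", "R2", "R3"]

naive_count = {"hashes": 0}
naive = sign_each([certificate] * len(recipients), naive_count)

joined_count = {"hashes": 0}
joined = sign_common_once(certificate, recipients, joined_count)
```

Both variants produce the same digest value for every recipient, but the join-then-sign variant computes it once instead of once per message.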

4.2.2 Case 2: Signed Subtree is CompletelyDistinctive

Figure 9: The signed subtree is completely includedin distinctive subtree

Depending on the degree of similarity aggregation, it ispossible to have signed contents to be completely distinctivefor all SOAP messages considered (cf. Figure 9). An exam-ple may be a signed WS-Addressing message ID [8], whichis intended to be different for all outgoing SOAP messagesof a certain service.

In this case, the resulting hash values, and thus the signature values, will definitely differ between SOAP messages (otherwise the signature would be breakable). Thus, there is little opportunity here to gain the same performance optimization effects as in the previous case. Every digital signature has to be calculated individually, and the resulting <Security> header has to be placed in the SMP message appropriately.

Using this approach, the signed contents and the <Security> header are placed in appropriate <distinctive> parts of the SMP message. Then, the XCast routers split the SMP message as described above. The final XCast router places the contents of the distinctive part at the corresponding position in the resulting SOAP message. Thus, the signed contents at the final recipient remain unmodified, and the signature verification will succeed.

The only potential performance gain lies in exploiting the fact that most of the <Security> header remains static, and thus is common among all considered SOAP messages. It is therefore possible to place most of the <Security> header in the <common> part of the SMP message. Nevertheless, for the <DigestValue> and <SignatureValue> elements it would be necessary to add appropriate new <distinctive> parts for all considered SOAP messages to the SMP message. Thus, the real performance benefit of this kind of optimization is doubtful.
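The split of the <Security> header into a static and a per-message portion can be sketched as follows. This is an illustration under assumptions: the header is modeled as a flat dictionary rather than XML, SHA-256 is chosen as digest algorithm, the <SignatureValue> is left out, and `digest_value` and `rebuild_header` are hypothetical helper names.

```python
import base64
import hashlib

def digest_value(fragment: bytes) -> str:
    """Base64-encoded SHA-256 digest, as a DigestValue element would carry."""
    return base64.b64encode(hashlib.sha256(fragment).digest()).decode("ascii")

# Signed contents that differ for every message (e.g. WS-Addressing IDs).
message_ids = [
    b"<MessageID>urn:uuid:1</MessageID>",
    b"<MessageID>urn:uuid:2</MessageID>",
]

# Static header fields shared by all messages: candidates for <common>.
common_header = {
    "CanonicalizationMethod": "http://www.w3.org/2001/10/xml-exc-c14n#",
    "DigestMethod": "http://www.w3.org/2001/04/xmlenc#sha256",
}

# Per-message fields: these must go into <distinctive> parts.
distinctive_parts = [{"DigestValue": digest_value(m)} for m in message_ids]

def rebuild_header(index: int) -> dict:
    """Recombine the common and distinctive header fields for one message."""
    return {**common_header, **distinctive_parts[index]}

headers = [rebuild_header(i) for i in range(len(message_ids))]
```

One digest still has to be computed per message, so only the static fields are shared, which matches the limited gain stated in the text.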

4.2.3 Case 3: Signed Subtree is Partially Distinctive

By far the most common case for the relation of signed contents and distinctive subtrees is that of signed contents that are common to all SOAP messages but contain some distinctive subtrees (cf. Figure 11). For example, SOAP messages invoking the same WSDL operation most likely have a very similar message structure, but differ in the text node values of their contents.

On the other hand, this case also poses the most difficulties in applying digital signatures, as the message fragments to be signed are no longer represented in a single block, but may be spread over several common and distinctive parts within the aggregated message. Thus, applying a digital signature in such a scenario causes severe difficulties, and may raise doubts on whether this is a viable approach at all.

Figure 11: The signed subtree includes a distinctive subtree

Nevertheless, in the following we outline an approach that enables applying a digital signature to aggregated SOAP messages while keeping it valid on each XCast split. A signature is applied once for the aggregated message, and on each split it is automatically copied into each of the resulting messages. This way, in the end, each recipient is provided with a single SOAP message that still contains a valid digital signature that protects the very same message fragments as a common XML signature (applied prior to similarity-based aggregation) would.

Approach Description.

The main idea of this approach is to apply a signature for each part of the aggregated message that is to be signed (cf. Figure 10). Thus, instead of computing a single signature over the whole message fragment, we suggest applying n signatures on all common and distinctive parts that contain contents to be signed. Then, once the aggregated message is split at any intermediate XCast router, these signatures are attached to all resulting messages (either SMP or SOAP) that still contain contents covered by the signature. In the end, the final recipient has to verify the signature for each part received in order to determine the validity of the signature as a whole.

Before turning to the security considerations for this approach, it is necessary to look at some details of XML Signature relevant to its realization. As explained in Section 3.3, XML Signature enables the application developer to instantiate one signature with a list of Reference elements, such that each of these references is resolved, hashed, and verified as part of the overall signature verification task. This concept can easily be adapted for the approach presented here: each signed part of the aggregated message is referenced by its very own Reference element within the (single) SignedInfo block. Then, once the aggregated message is split at any XCast router, the SignedInfo block is added to all messages that still contain signed contents. This implies that some of the fragments referenced in the SignedInfo block are no longer present in the resulting (SMP or SOAP) message, but for those that remain, the referenced fragments' hash values, and thus the DigestValues, are still valid. In the end, the only adaptation required for proper signature validation is that the final recipient ignores all references that do not point to a valid location in the message document.

Figure 10: Outline of an SMP message containing a signature on a partially distinctive subtree

Figure 12: Outline of a SOAP message resulting from an SMP message with aggregated signature
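The reference handling described in the approach can be sketched in a few lines. The sketch makes simplifying assumptions: message parts are modeled as a dictionary from hypothetical fragment IDs to bytes, SHA-256 replaces canonicalization plus digest, and the signature over the SignedInfo block itself (the SignatureValue) is omitted. The function names are illustrative and not taken from any XML Signature library.

```python
import hashlib

def digest(fragment: bytes) -> str:
    """Stand-in for canonicalizing and hashing a referenced fragment."""
    return hashlib.sha256(fragment).hexdigest()

def build_signed_info(fragments: dict) -> dict:
    """One Reference entry (fragment id -> DigestValue) per signed part."""
    return {fid: digest(data) for fid, data in fragments.items()}

def verify_references(signed_info: dict, message: dict) -> bool:
    """Check every Reference that resolves; ignore those that do not."""
    present = [fid for fid in signed_info if fid in message]
    if not present:
        return False  # nothing covered by the signature is left
    return all(digest(message[fid]) == signed_info[fid] for fid in present)

# Aggregated message: one common part, one distinctive part per recipient.
aggregated = {
    "common": b"<op>getQuote</op>",
    "d1": b"<symbol>ACME</symbol>",
    "d2": b"<symbol>INIT</symbol>",
}
signed_info = build_signed_info(aggregated)

# After the split, recipient 1 holds the common part and "d1" only.
for_recipient_1 = {"common": aggregated["common"], "d1": aggregated["d1"]}
ok = verify_references(signed_info, for_recipient_1)

# A modified fragment no longer matches its DigestValue.
tampered = dict(for_recipient_1, d1=b"<symbol>EVIL</symbol>")
rejected = not verify_references(signed_info, tampered)
```

Note that standard XML Signature core validation expects every Reference to validate; skipping unresolved references is the adaptation the approach requires at the final recipient.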

Security Considerations.

To investigate the security properties provided by the presented approach of fragmented signature contents, an important precondition is to establish which requirements a digital signature can reasonably be expected to fulfill. This strongly depends on the intended semantics behind the signature application.

The major purpose of a digital signature is to ensure data integrity. As every modification to the signed data contents would instantly invalidate the signature, a common digital signature perfectly fulfills the requirements of this task. Let us investigate this property with regard to the approach presented above. Every content that is referenced within the signature is also protected by the corresponding DigestValue, so every modification to any of these signed message parts also instantly invalidates the aggregated signature as a whole. In this sense, given that the union of all message fragments results in the complete original SOAP message, the aggregated signature provides the same level of data integrity assurance as a single signature would. The only property an attacker could exploit here is to add or remove some of the parts of the aggregated SMP message in order to have these be present or absent in the recipient's SOAP message. Thus, an attacker can only add or remove message fragments at those fragment cut-off points that were used during the similarity-based aggregation (cf. Figure 12). Nevertheless, the attacker cannot add contents of his own, nor modify any of the existing parts without invalidating the signature; he can only "play with the aggregation-dependent message fragments", so to speak.

Obviously, this attack vector still provides some opportunities, depending on the actual SMP message and scenario, for sophisticated XML rewriting attacks (cf. [13, 6]), but it can be shown that all of these issues can be addressed by embedding appropriate message structure metadata along with each part of the aggregated message (see also Section 5). Thus, these issues can be dealt with, and in the end, the aggregated signature approach provides the same level of data integrity as a common signature applied prior to similarity-based aggregation would.
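The fragment-removal attack and the effect of structure metadata can be illustrated with a small sketch. The paper does not specify the form of this metadata; the ordered list `expected_parts` below is one hypothetical form, assumed to be covered by the signature as well. Fragment IDs, the SHA-256 digest, and the names `lenient_verify` and `strict_verify` are likewise assumptions made for illustration.

```python
import hashlib

def digest(fragment: bytes) -> str:
    """Stand-in for canonicalizing and hashing a referenced fragment."""
    return hashlib.sha256(fragment).hexdigest()

# Signed parts of the message intended for one recipient (hypothetical ids).
parts = {
    "common-1": b"<Transfer>",
    "dist-1": b"<Amount>100</Amount>",
    "common-2": b"<Memo>rent</Memo>",
}
signed_info = {pid: digest(data) for pid, data in parts.items()}

# Hypothetical structure metadata: the ordered parts this recipient must see.
expected_parts = ["common-1", "dist-1", "common-2"]

def lenient_verify(message: dict) -> bool:
    """Check only the references that resolve, ignoring absent ones."""
    present = [pid for pid in signed_info if pid in message]
    return bool(present) and all(
        digest(message[pid]) == signed_info[pid] for pid in present
    )

def strict_verify(message: dict) -> bool:
    """Additionally require exactly the expected parts, in order."""
    return list(message) == expected_parts and lenient_verify(message)

# The attacker drops one fragment at an aggregation cut-off point.
stripped = {pid: data for pid, data in parts.items() if pid != "common-2"}

removal_unnoticed = lenient_verify(stripped)
removal_detected = not strict_verify(stripped)
intact_accepted = strict_verify(parts)
```

Here the lenient check accepts the stripped message, while the check against the structure metadata rejects it and still accepts the intact one.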

4.3 Join-Sign-Verify-Split: Singlecast Approach

The Join-Sign-Verify-Split scenario is the one in which we expect the greatest performance gains. In this case we assume that the messages are first aggregated and then signed. At the receiver side we assume a trusted entity that is in charge of verifying the signature directly on the aggregated message (hence the verification process is performed only once for all recipients); the trusted unsigned messages are then delivered to the final recipients. At the signing side, all considerations made for the Join-Sign-Split-Verify case still hold. The only difference is the presence of this trusted entity, which performs the verification directly on the aggregated message, before the splitting phase, thereby also saving time and computational cost in the verification process.

Figure 13: Third scenario: Join-Sign-Verify-Split

4.4 Other Application Schemes

The other potential permutations of the four tasks do not result in useful scenarios. They either require nonsensical activities (e.g. verifying digital signatures that have not yet been created) or fail to be of interest for real-world scenarios (e.g. the Sign-Verify-Join-Split approach).
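The space of task orders can be enumerated mechanically. The following sketch filters out the nonsensical permutations under two ordering constraints, which are our reading of the text and not stated by the authors in this form: a signature must be created before it is verified, and messages must be joined before they are split.

```python
from itertools import permutations

TASKS = ("Join", "Sign", "Verify", "Split")

def is_feasible(order: tuple) -> bool:
    """Assumed constraints: Sign precedes Verify, Join precedes Split."""
    return (
        order.index("Sign") < order.index("Verify")
        and order.index("Join") < order.index("Split")
    )

feasible = ["-".join(order) for order in permutations(TASKS) if is_feasible(order)]
```

Of the 24 permutations, 6 satisfy both constraints. The remaining filter (for example, discarding Sign-Verify-Join-Split as uninteresting in practice) is a matter of judgment rather than of ordering.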

5. CONCLUSION AND FUTURE WORK

In this paper, we investigated the interplay between similarity-based SOAP message aggregation and digital signature application. We provided an overview on the approaches resulting from the different orders for the tasks of signature application, verification, similarity aggregation and splitting. Depending on the intersection between similarity-aggregated and signed SOAP message parts, we discussed three different cases of signature application, and sketched their applicability and performance implications.

In conclusion, we found that interweaving similarity-based aggregation and signature application has considerable potential for providing real performance gains, compared to performing each of these tasks separately. Thus, obvious future work consists in finishing the ongoing prototype implementation of the approaches, and providing a full evaluation of the different performance and network bandwidth impacts. Furthermore, the investigation of a similarity-aware canonicalization approach prior to signature value calculation seems promising, as it may provide further optimization potential. Finally, we intend to perform an in-depth analysis of the provided level of security, and to derive a formal security proof for the approach of aggregated signatures as described in Section 4.2.3.

6. REFERENCES

[1] S. S. Chawathe. Comparing hierarchical data in external memory. In Proc. of the Twenty-fifth Int. Conf. on Very Large Data Bases, pages 90–101, 1999.

[2] G. Cobena, S. Abiteboul, and A. Marian. Detecting changes in XML documents. In ICDE '02: Proceedings of the 18th International Conference on Data Engineering, page 41, Washington, DC, USA, 2002. IEEE Computer Society.

[3] D. Shasha and K. Zhang. Approximate tree matching. Pattern Matching in Strings, Trees and Arrays, chapter 14, 1995.

[4] E. Damiani, S. D. C. di Vimercati, S. Paraboschi, and P. Samarati. P2P-based collaborative spam detection and filtering. Peer-to-Peer Computing, IEEE International Conference on, 0:176–183, 2004.

[5] E. Damiani and S. Marrara. Efficient SOAP message exchange and evaluation through XML similarity. In Proceedings of SWS08, pages 29–36, 2008.

[6] S. Gajek, M. Jensen, L. Liao, and J. Schwenk. Analysis of signature wrapping attacks and countermeasures. In Proceedings of the 7th IEEE International Conference on Web Services (ICWS), 2009.

[7] S. Flesca, G. Manco, E. Masciari, L. Pontieri, and A. Pugliese. Detecting structural similarities between XML documents. In Proc. of the 5th Intl. Workshop on the Web and Databases, pages 55–60, 2002.

[8] M. Gudgin, M. Hadley, and T. Rogers. Web Services Addressing 1.0 - SOAP Binding. W3C Recommendation, May 2006.

[9] J. J. Jiang and D. W. Conrath. Semantic similarity based on corpus statistics and lexical taxonomy, 1997.

[10] J. H. Lee, M. H. Kim, and Y. J. Lee. Information retrieval based on conceptual distance in is-a hierarchies. Journal of Documentation, 49(2):188–207, June 1993.

[11] V. I. Levenshtein. Binary codes capable of correcting deletions, insertions and reversals. Soviet Physics Doklady, 10:707+, February 1966.

[12] A. G. Maguitman, F. Menczer, H. Roinestad, and A. Vespignani. Algorithmic detection of semantic similarity. In WWW '05: Proceedings of the 14th International Conference on World Wide Web, pages 107–116, New York, NY, USA, 2005. ACM.

[13] M. McIntosh and P. Austel. XML signature element wrapping attacks and countermeasures. In SWS '05: Proceedings of the 2005 Workshop on Secure Web Services, pages 20–27, New York, NY, USA, 2005. ACM Press.

[14] A. Nadalin, C. Kaler, R. Monzillo, and P. Hallam-Baker. Web Services Security: SOAP Message Security 1.1 (WS-Security 2004). 2006.

[15] A. Nierman and H. V. Jagadish. Evaluating structural similarity in XML documents. In WebDB, pages 61–66, 2002.

[16] K. A. Phan, Z. Tari, and P. Bertok. Optimizing web services performance by using similarity-based multicast protocol. In ECOWS '06: Proceedings of the European Conference on Web Services, pages 119–128, Washington, DC, USA, 2006. IEEE Computer Society.

[17] D. J. Polivy and R. Tamassia. Authenticating distributed data using web services and XML signatures. In Proceedings of the 2002 ACM Workshop on XML Security, pages 80–89, Fairfax, VA, USA, 2002. ACM Press.

[18] A. Selkirk. XML and security. BT Technology Journal, 19, July 2001.
