
Aggregation of DNS health indicators: issues, expectations and results

E. Casalicchio
University of Rome "Tor Vergata", Dep. of Computer Science
Rome, Italy
[email protected]

M. Caselli, A. Coletta and I. Nai Fovino
Global Cyber Security Center
Rome, Italy
[email protected], [email protected], [email protected]

Abstract—Today the DNS community is debating the issue of measuring naming system health and security. There are several initiatives in this field, all claiming to be able to measure the DNS health state from a local perspective. The reality is rather different, and many challenges are still open: no standard metrics exist (only a shared list of five health indicators); there is no common agreement on how to compute health indicators; there is no common concept of normality of DNS behavior; and there is no standard framework for data/information sharing.

The Measuring the Naming System (MeNSa) project proposes to realize a framework providing a formal and structured methodology, metrics and tools for the measurement of the DNS health and security level. In this paper we concentrate on the measurement aggregation problem.

This paper is a work in progress aiming to stimulate discussion. The contribution we provide is twofold. First, we give a brief description of the MeNSa project. Second, we propose a methodology to combine different health and security metrics into aggregated indexes. Experimental results show what can be obtained, which issues remain open, and what we expect.

I. INTRODUCTION

The Domain Name System (DNS) constitutes the backbone of the modern cyber-world. However, its completely distributed infrastructure and the fact that it is completely seamless to end users have contributed to making it one of the least considered Internet infrastructures when speaking of cyber-security. That is no longer true: the Kaminsky exploit [5] and the attacks that in recent years have taken advantage of DNS weaknesses to damage cyber-infrastructures (e.g. [9]) have posed serious questions about the security, safety and stability of this system.

The problem of DNS security and of its impact on the cyber-society has been discussed at length by the community. In 2010 [4] the concept of DNS health emerged as a way of defining when the DNS is healthy or not, taking the health of the human body as an example. The concepts expressed remained at the abstract level, without suggesting how to assess such a complex property. In particular, as we describe in the next section, the concept of DNS health is developed around five main indicators: availability, coherency, integrity, resiliency and speed. What is not clear in this concept is how it would be possible, from real, measurable observations, to obtain "numbers" or, better, indexes that can be used to quantify these properties and, ultimately, to summarize the "level of health" of the portion of the DNS under analysis. Moreover, other open issues are how to define a common concept of normality for DNS behavior and what the standard framework for data/information sharing should be.

In the scientific literature (not cited here for lack of space) several low-level metrics have been identified that capture single aspects of these indicators but, to our knowledge, a framework explaining how to combine these low-level measures has not yet been developed.

In this paper, we present our "answer" to this challenge: after a brief description of the Measuring the Naming System (MeNSa) project¹, we show how it is possible, once a well-defined measuring point of view has been identified, to aggregate several different metrics related to the DNS system in order to obtain aggregate indicators of its health.

The issues in the aggregation process are described in the paper. They mainly concern the definition of proper normalization functions for the evaluated metrics and the meaning that can be attributed to the values obtained. We expect that the preliminary results and the methodology, tested in an end-user scenario, can be exported to other server-side scenarios and validated with real traffic measurements.

The paper is organized as follows: Section II recalls the definition of DNS health and security. Section III describes the MeNSa framework, its key challenges and main concepts. The methodologies for metric aggregation we propose are presented in Section IV. The experimental setup, the metrics used for the evaluation and the health indicators we compute are described in Section V-A.

¹Measuring the Naming System (MeNSa) Project, GCSEC, http://www.gcsec.org/activity/research/dns-security-and-stability


Section V-B shows the obtained results. Finally, Section VI concludes the paper.

II. HEALTH AND SECURITY CONCEPTS

The DNS is a pillar of distributed services accessed daily by millions of Internet users. For that reason, its Security, Stability and Resilience (SSR) cannot be considered negligible. While in the last decade the community has worked hard to make the DNS more secure, stable and resilient, few steps have been taken toward understanding how to evaluate and quantify the SSR reached by the DNS under different conditions. One of the main challenges is indeed to define metrics explaining what DNS SSR means, and to assess the impact on its SSR properties of changes in the DNS, due to query volume increases or technology changes such as DNSSEC. In the literature many studies related to DNS traffic measurement techniques and performance metrics exist (e.g. [1][2][3]), but very few on DNS SSR analysis. A first official discussion of indexes to measure DNS SSR is contained in the report of the 2nd Symposium on DNS SSR [4]. This report defined the notion of DNS health as a concept capturing performance and resiliency, but not directly including the notion of security.

According to this report [4], the key indexes to determine whether the DNS is healthy are availability, coherency, integrity, resiliency and speed. We believe that this list needs to be extended with the concepts of stability and security, since a system cannot be described as "healthy" if it is not stable, and can quickly become unhealthy if it is not secure. Therefore, in our work we extended the list of health indicators with the concepts of security, stability and vulnerability. In more detail, security is the potential of the DNS to limit or protect itself from malicious activities (e.g. unauthorized system access, fraudulent representation of identity, and interception of communications). Stability is the desired DNS feature of functioning reliably and predictably day-to-day (e.g. protocols and standards); stability facilitates universal acceptance and usage. Vulnerability is defined as the likelihood of finding a weakness of an asset or group of assets that can be exploited by one or more threats [10].

III. THE MENSA FRAMEWORK

As stated before, the scope of the MeNSa project is to define a methodology and a set of metrics to quantify the global health and security level of the DNS. The DNS community agrees that, while it is common practice to monitor DNS subsystems individually to observe whether traffic parameters deviate from average values, it is a challenge to extract knowledge about the local/global DNS behavior and to understand when that behavior is normal or abnormal.

Figure 1. Reference Architecture

The key points we propose to face this challenge are the following:

1) To refine and improve existing metrics for coherency, integrity, speed, availability, resiliency, vulnerability and security;

2) To define metric aggregation models to merge measured metrics into single, easy-to-understand health and security indicators;

3) To refine and improve measurement approaches to compute the seven health indicators for the different DNS actors (root server operators, operators of non-root authoritative name servers, recursive caches, open DNS resolvers, end users);

4) To investigate methodologies (also from other fields) to model DNS behavior;

5) To identify metric threshold levels that allow the DNS community to detect when the behavior is normal or abnormal.

While in the long term the MeNSa project aims to provide a solution to all the above items, in this paper we concentrate on item (2), that is, metric aggregation. Previous results of the project were the definition of a set of health and security metrics [8], the framework architectural design and the definition of the point-of-view concept (for more details see the project deliverables [6], [7], [8]).

A. Framework components

The most relevant concepts behind the MeNSa framework are summarized in the following.

A DNS reference model [6]. When defining a measurement framework, it is important to define the boundaries of the system we want to measure. Figure 1 shows the reference architecture we have taken into consideration. The End User Application (e.g. a browser) generates DNS queries and can have advanced features such as pre-fetching and internal caching. The Application Service Provider (ASP) provides distributed services/applications, mainly using web services technologies. The NET component represents all the network interconnections (LAN, Internet, etc.) and components interconnecting all the model elements. The Global DNS System is a pseudo-element that indicates the global DNS in its entirety, and the DNS Sub-system is an autonomous naming service managed by a specific entity. All other components are used with their standard meaning and are described in [6].

A set of metrics to quantify the health and security level of the DNS. The metrics we propose are intended to evaluate the health of the DNS by measuring it along three dimensions, namely vulnerability, security and resiliency. A comprehensive description can be found in [8]. The metrics we consider in the experiments are described in Sec. V-A.

A set of measurement techniques and tools put in place to gather the information needed to compute the metrics. How measurement is implemented depends on two main factors: (a) what can be measured from which point; and (b) the time horizon of data collection (e.g. seconds, hours, days or months).

The concept of point of view (PoV). A PoV is intended as the perspective of a DNS actor/component in observing, using, operating and influencing the global DNS. Potential users of the MeNSa framework fall into one of the following categories: end users, who are mostly unaware of DNS function and operation; service providers, e.g. Internet and application service providers; and operators, e.g. resolvers, name servers, registrars. The definition of different points of view is intended to categorize which components can be observed and measured by a specific DNS actor, and which information is needed from other DNS actors, to properly assess the perceived level of DNS Health & Security (H&S). This categorization allows us to define, for each PoV, a set of health and security indicators and a set of computable metrics needed to evaluate the indicators of interest (see Sec. IV). The six points of view we defined are: End-User PoV, Application Service Provider PoV, Resolver PoV, Name Server PoV, Zone PoV and Global PoV. Of main interest for this paper is the End-User PoV, which represents the perspective from which each user can evaluate the naming service. From the End-User PoV, the components involved in the resolution process are the end-user application, the stub resolver and the network, while the only operation of interest is the DNS lookup process.

B. Framework operation

The framework operation is organized in three macro phases. (1) The preliminary diagnosis phase, given the chosen PoV, performs a first evaluation of the perceived health level by conducting simple measurements and assessments. (2) The definition of Service Level Objectives (SLOs) and scenario phase, given the PoV, selects one or more threat scenarios and the measurable and representative indexes. (3) The detailed diagnosis and measurement phase assesses the perceived H&S level, the achievable SLOs, the causes of SLO violations and possible improvement actions. The detailed diagnosis and measurement phase is organized in three stages: selection of metrics [8], measurement [6], [7] and aggregation.

At the aggregation stage, all the collected measures are combined to provide aggregated indexes summarizing the health and security level perceived by the PoV, the achievable SLOs and, finally, the possible causes of health degradation and possible solutions.

IV. METRICS AGGREGATION

The concept of system health and security is multi-faceted and can hardly be captured by measurements of a single metric. For a correct evaluation it is necessary to compute one or more aggregate indicators (or indexes). For example, an end user desiring a complete evaluation would be interested in the overall ranking of the DNS health of his system, but also in the DNS health of its subcomponents.

There can be many ways to aggregate the metrics. In our work we consider two approaches (described in detail in Sec. IV-A and Sec. IV-B), each presenting advantages and disadvantages. The common concepts are defined below.

Given a PoV it is possible to define a Total Evaluation (TE) index, which expresses the overall level of DNS health and security, and other indicators representing more specific aspects of the system (e.g. protocol issues, vulnerability to DDoS, stub resolver issues, and so on). A specific PoV and its related indexes are associated with a set of measurable metrics necessary for computing the aggregate indicators. The final concept associated with a PoV is the quality mapping, a function that normalizes measured data and makes them dimensionless.

Formally, a PoV is associated with:

• A set of M metrics {m_1, ..., m_M}. Let D_i be the domain of the metric m_i, i.e. the measured values v_i1, v_i2, ... of m_i belong to D_i.

• A set of M quality mappings q_i : D_i → [0, 1], one for each metric m_i. The mapping q_i transforms the measured value v_ij into a dimensionless quality value q_ij = q_i(v_ij), where 0 indicates the lowest quality and 1 the highest.

• A set of aggregated indicators. Each indicator I_k is fully defined by its vector of weights w_k = (w_k1, ..., w_kM) such that \sum_{i=1}^{M} w_{ki} = 1.

In the following, for ease of presentation, the PoV is identified with the user of the framework in the specific viewpoint.
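Viewed as data, a PoV therefore bundles metrics, their quality mappings and weighted indicators. A minimal Python rendering of these definitions (a sketch of ours; the type and field names are not part of the MeNSa framework) could be:

    from dataclasses import dataclass
    from typing import Callable, Sequence

    # Illustrative types for the formal definition above; names are
    # assumptions, not MeNSa's actual data model.

    @dataclass
    class Metric:
        name: str
        quality: Callable[[float], float]   # q_i : D_i -> [0, 1]

    @dataclass
    class Indicator:
        name: str
        weights: Sequence[float]            # w_k, one weight per metric, summing to 1

    @dataclass
    class PoV:
        metrics: Sequence[Metric]
        indicators: Sequence[Indicator]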

A. Session-based Aggregation

This aggregation approach considers a measurement interval divided into a set {s_1, ..., s_S} of sessions, specified by the PoV.


Algorithm 1: Aggregation by Session

    foreach session s_j do
        foreach metric m_i do
            measure v_ij
            q_ij = q_i(v_ij)
        end
        foreach indicator k do
            compute(I_kj)
        end
    end
    foreach indicator k do
        compute(I_k)
        compute(∆I_k)
    end

The number S and the duration of each session s_j are specified by the PoV and set independently of the considered metrics. Depending on the goal of the analysis, the degree of precision desired and the measuring tools, the duration and number of sessions can change. In general, a greater number of sessions makes the health and security evaluation more precise.

During each session, data are gathered and collected, providing one measured value for each metric, and these values are aggregated to obtain the values of the indicators. In this way, each session provides one value for each indicator. At the end of the experiment, the indicators of all the sessions are combined again, in order to obtain the final indicator values and an estimate of their uncertainty.

The aggregation is formally defined in Algorithm 1. For each session s_j, the measuring tools provide one value v_ij for each metric m_i. The measurements v_ij are then mapped to the quality values q_ij through the function q_i. For each indicator I_k and each session s_j, an aggregated value I_kj is computed as the weighted average of the quality values, using the indicator's vector of weights, i.e.

    I_{kj} = \sum_{i=1}^{M} w_{ki} q_{ij}    (1)

where I_kj is the aggregate value of the k-th indicator I_k in session s_j. At the end of the last session, for each indicator I_k, the final aggregated value I_k is computed as the mean of the session indicators I_kj, and their standard deviation represents the uncertainty ∆I_k, i.e.

    I_k = \frac{1}{S} \sum_{j=1}^{S} I_{kj}, \qquad \Delta I_k = \sqrt{\frac{1}{S} \sum_{j=1}^{S} \left( I_{kj} - I_k \right)^2}    (2)

It is worth noticing that this aggregation method does not allow missing values v_ij. In a real scenario it might happen that, during a session s_j, it is impossible to obtain the measurement v_hj of a metric m_h, for example because a tool fails or simply because not enough data is available; then the quality value q_hj is not available either. In this case equation (1) cannot be computed for any indicator I_k with w_kh ≠ 0. Computing the weighted average of the remaining values, after renormalizing the related weights accordingly, may seem a solution, i.e. calculating

    I_{kj} = \sum_{i \neq h} \frac{w_{ki}}{1 - w_{kh}} q_{ij}

but this alters the way the vector of weights expresses how important each metric is for a specific indicator. Moreover, if the values v_hj of all the metrics m_h with w_kh ≠ 0 are missing, the previous equation cannot be computed either. Skipping every session with some missing value is the only workaround for this method; in this way, however, the precision can be much lower.

As explained above, the aggregation method based on fixed time sessions is simple and provides an easy way to handle measurement error. However, it does not tolerate missing values: with them, the computation is either impossible or loses accuracy.
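To make the procedure concrete, the following Python sketch implements Algorithm 1 and equations (1) and (2). The data layout and names are ours, for illustration only; the sample values are the first two sessions of Table I.

    import math

    # Session-based aggregation (Algorithm 1, eqs. (1)-(2)).
    # `quality` is a session-by-metric matrix of quality values q_ij in [0, 1];
    # `weights` maps each indicator to its weight vector w_k (summing to 1).

    def aggregate_by_session(quality, weights):
        results = {}
        for name, w in weights.items():
            # Eq. (1): one indicator value per session (weighted average).
            per_session = [sum(wi * qi for wi, qi in zip(w, session))
                           for session in quality]
            # Eq. (2): final value = mean, uncertainty = standard deviation.
            s = len(per_session)
            mean = sum(per_session) / s
            delta = math.sqrt(sum((x - mean) ** 2 for x in per_session) / s)
            results[name] = (mean, delta)
        return results

    # Two sessions over the metrics (IBC, ITV, TT, CP, DNSR, RRQ):
    quality = [[0.999, 1.00, 1.00, 0.73, 1.00, 0.77],
               [0.999, 1.00, 0.00, 0.63, 1.00, 0.67]]
    weights = {"NET": [1/3, 1/3, 1/3, 0, 0, 0],
               "PI":  [0, 0, 0, 1, 0, 0]}
    print(aggregate_by_session(quality, weights))

On this toy input the per-session NET values are about 0.999 and 0.666, matching the corresponding entries of Table I.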

B. Metrics-based Aggregation

In this section we define a slightly different way of aggregating measurements to obtain the values of the aggregate indicators.

Unlike the previous method, the session length can change depending on the metric. Indeed, for each metric m_i, time is divided into S_i sessions {s_ij}, with j = 1, ..., S_i. The number S_i and the duration of the sessions depend on the metric m_i, and specifically on the nature of the measurement. This permits a better-tuned measurement of the specific metric. For example, some metrics may need more data gathered in many quick sessions, while others may need fewer values taken over longer sessions. Another big advantage of this method is how it handles missing measures. The actual session specification can be indicated in the PoV, or changed according to the actual requirements of the system or of the measuring tools. The fixed session schema of the previous method is a particular case of this one.

Algorithm 2 shows how the aggregators are computed. For each metric m_i and every session s_ij of m_i, the values v_ij are mapped to the quality values q_ij through the function q_i. Then the mean value and the standard deviation over the S_i sessions are computed as, respectively, the quality value of the metric m_i and the corresponding uncertainty level. Formally, for every metric m_i,

    q_i = \frac{1}{S_i} \sum_{j=1}^{S_i} q_{ij}, \qquad \Delta q_i = \sqrt{\frac{1}{S_i} \sum_{j=1}^{S_i} \left( q_{ij} - q_i \right)^2}    (3)


Algorithm 2: Aggregation by Metrics

    foreach metric m_i do
        foreach session s_ij do
            q_ij = q_i(v_ij)
        end
        compute(q_i)
        compute(∆q_i)
    end
    foreach indicator k do
        compute(I_k)
        compute(∆I_k)
    end

The aggregated indicators are computed as weighted averages using their vectors of weights, and an estimate of the uncertainty can be expressed by the weighted average of the errors of the individual metrics. Formally, the k-th aggregator and its error estimate are computed as

    I_k = \sum_{i=1}^{M} w_{ki} q_i, \qquad \Delta I_k = \sum_{i=1}^{M} w_{ki} \Delta q_i    (4)

In some cases the metrics can be considered independent. If so, the previous error estimate can be replaced by a more precise one, computed as a squared weighted average, as is standard in error theory:

    \Delta I_k = \sqrt{\sum_{i=1}^{M} w_{ki}^2 \Delta q_i^2}    (5)

The hypothesis that the metrics are independent can be too strong in some cases. For this reason the PoV may specify whether the uncertainty estimate must be computed as in (4) or as in (5).

Aggregating by metrics is more complex than the previous method. Moreover, the final error must itself be aggregated from the measurement errors of the individual metrics. The proper way of aggregating errors may vary and is the subject of further investigation; error theory, however, gives at least two standard methods for aggregating compound errors: a simple weighted average, to be preferred when metrics may be mutually dependent, and a squared weighted average, for metrics that are statistically independent. On the other hand, this method easily handles missing values, because for every metric the computation is simply made on the basis of the actually available values. Moreover, the flexible session schema can be very effective in dealing with metrics with different time requirements.
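A matching Python sketch for this second method follows; again the names and data layout are illustrative assumptions of ours. Each metric carries its own list of quality values, so a metric with missing sessions simply contributes fewer samples.

    import math

    # Metrics-based aggregation (Algorithm 2, eqs. (3)-(5)).
    # `samples[i]` holds the quality values q_ij of metric m_i over its own
    # S_i sessions; missing measurements are simply absent from the list.

    def aggregate_by_metrics(samples, w, independent=False):
        q, dq = [], []
        for values in samples:
            # Eq. (3): per-metric quality = mean, uncertainty = std deviation.
            s = len(values)
            mean = sum(values) / s
            q.append(mean)
            dq.append(math.sqrt(sum((x - mean) ** 2 for x in values) / s))
        ik = sum(wi * qi for wi, qi in zip(w, q))
        if independent:
            # Eq. (5): squared weighted average for independent metrics.
            dik = math.sqrt(sum((wi * di) ** 2 for wi, di in zip(w, dq)))
        else:
            # Eq. (4): plain weighted average of the per-metric errors.
            dik = sum(wi * di for wi, di in zip(w, dq))
        return ik, dik

    # Three metrics with different numbers of sessions (one value missing):
    samples = [[0.999, 0.999, 0.997], [1.0, 0.0, 0.30], [0.73, 0.63]]
    print(aggregate_by_metrics(samples, [1/3, 1/3, 1/3]))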

V. EXPERIMENTAL EVALUATION

The set of experiments that follows is designed to show how a subset of the defined metrics can be computed in the end-user PoV and how such metrics can be aggregated.

A. Measurements and Metrics

We set up two testbeds, each composed of a Windows machine running Firefox 8.0. The first, located in the GCSEC laboratory, was connected to the Internet through the Italian ISP Fastweb (7 Mbit/s nominal); the second used the GARR network, with the University of Rome "Tor Vergata" (UTV) as access point. DNS resolutions are delegated to the Fastweb and UTV resolvers respectively.

Each test collected data from 10-12 web browsing sessions. Every session lasted from 10 to 15 minutes, for a total of 2 hours. For every session the data is collected and analysed to obtain a measure of each metric. Thus, the measurement process provides n = 10-12 values (one per session) for each metric.

In our study we identified a large set of metrics useful for assessing DNS health and security. As explained in [8], we started by considering the most important threat scenarios for the DNS. The interesting metrics identify the characteristics of the system that change when the system stops behaving normally. Some metrics cannot always be calculated, for lack of the necessary information; for each PoV we selected only the measurable ones. For example, it is impossible to evaluate the Zone Inconsistencies metric for end users or application service providers. Finally, in the following experiments we further skimmed the set of measures, considering only those measurable in a timespan of two hours. Therefore, in our experiments we considered the following metrics.

a) Incoming Bandwidth Consumption (IBC): defined as the ratio between the total amount of incoming data during a session and the duration of the session. The domain of this metric is [0, IBC_max], measured in Mbit/s, where IBC_max is the nominal maximum bandwidth declared by the ISP.

b) Incoming Traffic Variation (ITV): defined, for each session i, as (IBC_i − IBC_{i−1}) / length_i, where IBC_i is the incoming bandwidth consumption measured in the i-th session and length_i is the duration of that session. The domain of this metric is [−ITV_max, ITV_max] (in Mbit/s²), where ITV_max = max_i (IBC_max / length_i).

c) Traffic Tolerance (TT): measures the Round Trip Time (RTT) of an IP packet flowing between the end-user node and the ISP's recursive resolver. The domain of the metric is [0, +∞], measured in seconds.

d) Stub Resolver Cache Poisoning (CP): measures the percentage of poisoned entries in the cache. The domain is [0, 100]. Every entry of the cache is checked against a set of known recursive resolvers.


e) DNS Requests per Second (DNSR): gives the total number of DNS queries in the session. The domain is [0, +∞].

f) Rate of Repeated Queries (RRQ): under normal behavior, during a short session a name should be resolved only once, because of DNS caching. Many DNS queries for the same name in the same session can therefore indicate some misbehavior. The metric returns the number of repeated DNS queries in a session. The domain is [0, +∞].

IBC and ITV are measured using Netalyzr; TT is measured using ping. DNSR and RRQ are measured by monitoring the session with Wireshark and analyzing the resulting PCAP file. Finally, CP is measured by dumping the cache and checking its content against authoritative DNS servers. The comparison is done immediately after the session, to avoid the expiration of open resolver cache entries.
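As an illustration of how DNSR and RRQ can be derived from such a capture, the sketch below parses a PCAP with scapy; this is our own tooling substitution for the Wireshark-based analysis described above, and the file name is an assumption of the example.

    from collections import Counter
    from scapy.all import rdpcap, DNS, DNSQR  # assumes scapy is installed

    packets = rdpcap("session.pcap")   # hypothetical capture of one session

    queries = Counter()
    for pkt in packets:
        # qr == 0 marks a DNS query (not a response); DNSQR is the question.
        if pkt.haslayer(DNS) and pkt[DNS].qr == 0 and pkt.haslayer(DNSQR):
            queries[pkt[DNSQR].qname] += 1

    dnsr = sum(queries.values())                # total DNS queries (DNSR)
    rrq = sum(n - 1 for n in queries.values())  # repeated queries (RRQ)
    print(f"DNSR = {dnsr} queries, RRQ = {rrq} repeated")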

For every given metric, the measurement gives a set of values in some domain, which depends on the nature of the metric itself. A possible first step for aggregating measurements ranging over different domains is to transform the measured values into a quality value expressing how good the measures are. This raw evaluation of single metrics can change in different PoVs. We use real numbers in [0, 1], where 1 is the best value and 0 the worst. We call quality mapping the function used for this transformation. In this way every measurement is mapped into a common and uniform mathematical domain, enabling easier aggregation.

The quality mapping functions for the above metrics are defined as follows.

g) Incoming Bandwidth Consumption (IBC): let IBC_max be the maximum bandwidth value provided by the ISP. The quality mapping q : [0, IBC_max] → [0, 1] for this metric is defined as

    q(x) = 1 - \frac{x}{IBC_{max}}

h) Incoming Traffic Variation (ITV): the quality mapping q : [−ITV_max, ITV_max] → [0, 1] for the ITV metric is defined as

    q(x) = \begin{cases} e^{-2x/ITV_{max}} & x > 0 \\ 1 & x \le 0 \end{cases}

i) Traffic Tolerance (TT): let RTT_avg be the average value of the RTT during the session. We define the quality mapping q : [0, +∞] → [0, 1] for the TT metric as

    q(x) = \begin{cases} 1 & x \le RTT_{avg} \\ -\frac{x}{RTT_{avg}} + 2 & RTT_{avg} \le x \le 2 \cdot RTT_{avg} \\ 0 & x > 2 \cdot RTT_{avg} \end{cases}

Metric quality ratings per session:

    Metric   s1      s2      …   sn
    IBC      0.999   0.999   …   0.994
    ITV      1.00    1.00    …   0.999
    TT       1.00    0       …   1.00
    CP       0.73    0.63    …   0.67
    DNSR     1.00    1.00    …   0
    RRQ      0.77    0.67    …   0.72

Indicators, with weight vectors w over (IBC, ITV, TT, CP, DNSR, RRQ):

    Indicator  w                                     s1     s2     …   sn      I_k ± ∆I_k
    TE         (0.17, 0.17, 0.17, 0.13, 0.17, 0.17)  0.926  0.72   …   0.735   0.763 ± 0.13
    NET        (1/3, 1/3, 1/3, 0, 0, 0)              0.999  0.666  …   0.998   0.840 ± 0.16
    SR         (0, 0, 0, 0.27, 0.36, 0.36)           0.846  0.779  …   0.743   0.679 ± 0.21
    PI         (0, 0, 0, 1, 0, 0)                    0.734  0.633  …   0.679   0.693 ± 0.12
    DoS        (1/5, 1/5, 1/5, 0, 1/5, 1/5)          0.840  0.779  …   0.743   0.774 ± 0.14

Table I. Quality ratings and results of the session-based aggregation.

j) Cache Poisoning in the "Stub Resolver" (CP-SR): the quality mapping q : [0, 100] → [0, 1] for this metric is defined as

    q(x) = e^{-x/k}

where k is a properly tuned parameter. In our case, after some tests, we decided to map 10% of poisoned entries to the quality value 0.6, obtained with k = 20 (indeed e^{-10/20} ≈ 0.61).

k) DNS Requests per Second (DNSR): the quality mapping compares the current DNS behavior against a good reference. Let DNSR_avg be the average number of DNS requests per second during the session. The quality mapping q for this metric is defined as

    q(x) = \begin{cases} 1 - \frac{x}{2 \cdot DNSR_{avg}} & 0 \le x \le 2 \cdot DNSR_{avg} \\ 0 & x > 2 \cdot DNSR_{avg} \end{cases}

l) Rate of Repeated Queries (RRQ): let R_max be the maximum number of DNS requests in the current session; note that R_max changes from session to session. The quality mapping q for this metric is defined as

    q(x) = 1 - \frac{x}{R_{max}}
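The mappings above translate directly into code. The sketch below is our own rendering of them as plain Python functions, using the 7 Mbit/s nominal Fastweb bandwidth of the first testbed as IBC_max; the function names are ours.

    import math

    IBC_MAX = 7.0            # nominal ISP bandwidth (Mbit/s), first testbed

    def q_ibc(x):            # q(x) = 1 - x / IBC_max
        return 1 - x / IBC_MAX

    def q_itv(x, itv_max):   # exponential penalty for positive variations
        return math.exp(-2 * x / itv_max) if x > 0 else 1.0

    def q_tt(x, rtt_avg):    # linear drop between RTT_avg and 2 * RTT_avg
        return 1.0 if x <= rtt_avg else max(0.0, 2 - x / rtt_avg)

    def q_cp(x, k=20):       # e^(-x/k); k = 20 maps 10% poisoning to ~0.6
        return math.exp(-x / k)

    def q_dnsr(x, dnsr_avg): # linear drop, zero beyond twice the average
        return max(0.0, 1 - x / (2 * dnsr_avg))

    def q_rrq(x, r_max):     # one minus the fraction of repeated queries
        return 1 - x / r_max

    print(q_cp(10))          # ~0.607: the 10% -> 0.6 calibration of the text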

B. Aggregation and Experiment Results

Table I shows the quality ratings and the results of the session-based aggregation. Table II shows the same data for the metric-based aggregation.

For every session, the quality ratings of the metrics are combined into the aggregated indexes that follow (such indexes are valid for the End-User PoV).

m) Total Evaluation Index (TE): gives a global assessment of the PoV, aggregating all the metrics considered. It is worth noticing that the Cache Poisoning metric can be affected by false positives. For this reason it should be considered less important than the other metrics for the overall evaluation of the system: the weight of the CP metric in the TE is lower than the weights of the other metrics, which reduces the impact of false positives on the results.

Per-metric quality ratings (each metric over its own sessions) with aggregated quality and uncertainty:

    Metric   s1      s2      s3      …   q_i ± ∆q_i
    IBC      0.999   0.999   0.997   …   0.997 ± 0.002
    ITV      1.000   1.000   0.999   …   0.999999 ± 2.24E-06
    TT       1.00    0       0.30    …   0.60 ± 0.46
    CP       0.73    0.63    0.57    …   0.66 ± 0.11
    DNSR     1.00    1.00    0       …   0.61 ± 0.48
    RRQ      0.77    0.66    0.55    …   0.69 ± 0.15

Indicators, with weight vectors w over (IBC, ITV, TT, CP, DNSR, RRQ):

    Indicator  w                                     I_k ± ∆I_k
    TE         (0.17, 0.17, 0.17, 0.13, 0.17, 0.17)  0.766 ± 0.12
    NET        (0.33, 0.33, 0.33, 0, 0, 0)           0.867 ± 0.15
    SR         (0, 0, 0, 0.27, 0.36, 0.36)           0.657 ± 0.18
    PI         (0, 0, 0, 1, 0, 0)                    0.665 ± 0.11
    DoS        (0.2, 0.2, 0.2, 0, 0.2, 0.2)          0.781 ± 0.13

Table II. Quality ratings and results of the metric-based aggregation.
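As a quick consistency check of eq. (4) against Table II (using our reading of the extracted table), the NET value follows directly from the per-metric qualities of IBC, ITV and TT:

    I_{NET} = \tfrac{1}{3}(0.997) + \tfrac{1}{3}(0.999999) + \tfrac{1}{3}(0.60) \approx 0.866

which matches the tabulated 0.867 up to rounding of the per-metric qualities.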


n) Protocol Issues Index (PI): estimates possible DNS protocol problems and, in our PoV, is related only to the Cache Poisoning (Stub Resolver) metric.

o) Denial of Service Index (DoS): evaluates how improbable a DoS is in a given scenario. It aggregates all the metrics except CP.

p) NET Index: estimates the performance of the network component. It aggregates Incoming Bandwidth Consumption, Incoming Traffic Variation and Traffic Tolerance.

q) Stub Resolver Index (SR): evaluates the stub resolver performance. It aggregates Cache Poisoning, DNS Requests per Second and Rate of Repeated Queries. The CP metric has a lower weight, for the same false positive issues as in the Total Evaluation case.

C. Final Remarks on Results

The Total Evaluation is the main result of the End-User perspective. It reflects the overall performance of the system with respect to the DNS service, using the components that can be measured by a user. In this scenario small disservices can be acceptable, such as temporary DNS failures that merely require reloading a web page. For this reason, TE values around 0.8 are acceptable even for a properly functioning system. With our framework it is possible to quantify such a service level and to verify whether SLOs are violated.

In our first experiment we show results for both types of aggregation discussed above. The values are quite similar: the Total Evaluation is 0.76 in both cases. These values quantify the H&S level effectively perceived by the end user.

The other aggregated results give insights into the performance of the different aspects and subcomponents of the system. This information is very important for improving the performance of the system, because it enables one to spot the components that must be fixed in case of malfunctions.

Figure 2. Threat scenario experiment: quality (0 = worst, 1 = best) of the NET, Stub Resolver, Protocol Issues, DoS and Total Evaluation indexes, comparing aggregation by metrics and aggregation by sessions.


Our calculations show that the Stub Resolver is the component most likely to have problems, because the SR value is far from 1 (~0.66). The NET component evaluation, instead, is positive (~0.85). It is important to notice that a further analysis would be possible if the aggregated results of the Recursive Resolver perspective were available as input metrics for the End-User PoV: the outputs of one point of view can be used as input metrics of another PoV. Aggregating these values with the available local metrics increases the accuracy of the overall assessment and refines the evaluation of the single components.

In our investigation we also focus on which threat scenarios could affect our infrastructure. Some indicators give good hints about the likelihood of certain threats or attacks. In our experiment, for example, the high values of PI and DoS (around 0.7 and 0.8 respectively) indicate, with low uncertainty, that the system was affected neither by protocol issues nor by denial of service attacks during the measurement timespan. We show the results in Figure 2.

In the second experiment we use the framework to measure the performance and security of the DNS inside the GARR network. We decided to use the aggregation by sessions to obtain the final indicators. The Total Evaluation reaches 0.8, an improvement of almost 5% compared to the tests performed in the GCSEC laboratory. This is mainly due to the result of the cache poisoning metric (over 0.85).

Both the NET and the Stub Resolver components perform well: the first is rated 0.86, while the second stops just above 0.7.

While the NET and the Stub Resolver indexes express the quality of the related system components, the DoS and Protocol Issues indicators represent threat scenarios. The values of these two indicators are respectively 0.80 and 0.85. Thus, we can exclude that the system is subject to a DoS attack or suffers from protocol issues. We show the indicators' values of this test in Figure 3.


Figure 3. GARR network experiment: quality (0 = worst, 1 = best) of the NET, Stub Resolver, Protocol Issues, DoS and Total Evaluation indexes, aggregation by sessions.

Figure 4. Cache poisoning experiment: quality (0 = worst, 1 = best) of the NET, Stub Resolver, Protocol Issues, DoS and Total Evaluation indexes, aggregation by sessions.


Finally, in the last experiment we simulated some cache poisoning in order to validate our methodology. We manually corrupted 10% of the DNS cache entries in the GCSEC laboratory. The Total Evaluation decreases to 0.7. The NET component is still evaluated around 0.8, but the Stub Resolver assessment goes down to 0.6. As expected, these results point to problems in the DNS libraries of the operating system. Going further in the measurement process, we discover that the system clearly suffers from protocol issues, since this indicator is 0.38. The DoS indicator instead remains above 0.75. Results of this experiment are shown in Figure 4.

It is worth noticing how the results can lead to practical actions. Comparing the components' indicators with the threat indicators enables one to spot the cache poisoning problem. Indeed, the expected report of such a framework should suggest erasing the DNS cache. Repeating the same evaluation afterwards would further validate this suggested action.

It is important to remark that the results we obtained cannot be generalized and must be validated with a larger set of experiments. Our goal has been to show what it is possible to measure and how metrics can be used and aggregated to investigate DNS health and security.

VI. CONCLUSIONS

The Domain Name System constitutes the hidden backbone of the Internet. Without its services, almost all the applications making use of the public network would be unable to operate efficiently. The massive use of ICT systems in critical infrastructures has, as a logical effect, put the DNS under the spotlight as a new potential source of disservices. In the last few years the DNS community has started to reflect on the need for methodologies to assess the health of the global DNS. In this paper, after a high-level description of the MeNSa framework, designed to fulfill this need, we provided the results of the first field tests, showing how, from the end-user PoV, metrics can be aggregated and used as a tool to verify the level of service perceived and the presence or absence of threats. The next step is to validate our framework with a larger set of experiments and to expand it to cover the other points of view.

REFERENCES

[1] S. Castro, D. Wessels, M. Fomenkov, and K. Claffy. A day at the root of the Internet. SIGCOMM Comput. Commun. Rev., 38(5):41-46, 2008.

[2] R. Liston, S. Srinivasan, and E. Zegura. Diversity in DNS performance measures. In Proc. of the 2nd ACM SIGCOMM Workshop on Internet Measurement, pages 19-31. ACM, New York, NY, USA, 2002.

[3] Y. Sekiya, K. Cho, A. Kato, and J. Murai. Research of method for DNS performance measurement and evaluation based on benchmark DNS servers. Electronics and Communications in Japan (Part I: Communications), 89:66-75, 2006.

[4] ICANN. Measuring the health of the Domain Name System. Report of the 2nd Annual Symposium on DNS Security, Stability, & Resiliency, Kyoto, Japan, 2010.

[5] D. Kaminsky. "It's the end of the cache as we know it". Black Hat USA 2008, Aug. 2008.

[6] E. Casalicchio, M. Caselli, D. Conrad, J. Damas, I. Nai Fovino. "Reference Architecture, Models and Metrics". GCSEC technical document, Version 1.5, July 2011.

[7] E. Casalicchio, M. Caselli, D. Conrad, J. Damas, I. Nai Fovino. "Framework operation, the Web user PoV". GCSEC report, Version 1.1, July 2011.

[8] E. Casalicchio, D. Conrad, J. Damas, S. Di Blasi, I. Nai Fovino. "DNS Metric Use Cases". GCSEC report, Version 1.0, May 2011.

[9] "DNS hijack hits The Register: All well". http://www.theregister.co.uk/2011/09/05/dns_hijack_service_updated

[10] "Information technology – Security techniques – Information security risk management". ISO/IEC 27005:2008.