
Fundamentals of Biometric System Design

by S. N. Yanushkevich

Chapter 3

Biometric Methods and Techniques: Part I – Statistics

(Cover figure: FAR and FRR curves plotted against the decision threshold, crossing at the equal error rate (EER).)

• Computing type I errors (FRR)

• Computing type II errors (FAR)

• System performance evaluation


Preface

The key methodology of measurement in biometric technology is engineering statistics. The key methodology of biometric data processing is signal processing and pattern recognition. The key performance metrics in biometrics are related to matching rates (false match rate, false nonmatch rate, and failure-to-enroll rate).

The crucial point of biometric system design is the measurement of biometric data. It is mandatory knowledge for all biometric design teams. In the engineering environment, the data is always a sample¹ selected from some population. For example, the calculation of the reliability parameters of biometric data, such as the confidence interval and the sample size, is a typical problem of experimental study, reliability design, and quality control. Engineering statistics provides various tools for assessing biometric data, in particular:

◮ Techniques for estimation of mean, variance, and correlation,

◮ Techniques for computing confidence intervals,

◮ Techniques for hypothesis testing, and

◮ Techniques for computing type I and type II errors.

In a biometric system, decision making is based on statistical criteria. This is because biometric data is characterized by high variability. Every time a user presents biometric data, a unique template² is generated. Depending on the type of biometric, even two immediately successive samples of data from the same user generate entirely different templates. Statistical techniques are used to recognize that these templates belong to the same person. For example, a user may place the same finger on a biometric device several times, and all generated templates will be different. To deal with this variability, statistical tools must be used.

For processing the raw biometric data, various techniques of signal and image processing, pattern recognition, and decision making are used, in particular,

◮ 2D discrete Fourier transform,

◮ Filtering in spatial and frequency domains using Fourier transform,

¹ In a biometric system, a sample is a biometric measure submitted by the user and captured by the data acquisition tool.

² A template is a small file derived from the distinctive features of a user's biometric data. A template is used in a system to perform biometric matches. Biometric systems store and compare templates derived from biometric data. Biometric data cannot be reconstructed from a biometric template.


◮ Classifiers, and

◮ Pattern recognition module design.

In the design of a biometric system, these techniques should be considered with respect to the software or hardware implementation. This lecture brings these techniques together in the context of the implementation.

Finally, this lecture introduces the notion of biometric system performance. Because a biometric system is an application-specific computer system, the performance is defined:

◮ In terms of the specific application, such as operational accuracy, and

◮ In terms of the computer platform, such as operational time.

In this lecture, performance in terms of operational accuracy is introduced (false reject and accept rates, false match and non-match rates, and failure to enroll).

Essentials of this lecture

• Statistical thinking. A statistical approach should be applied at all phases of the life cycle of a biometric system, including experimental study, design techniques, testing, reliability estimation, and quality control. Biometric data must be represented in a form that is acceptable for decision making in verification and identification procedures. For this, classic signal processing and pattern recognition methods are adopted.

• Statistical performance evaluation. Performance parameters of a biometric system in terms of operational (system) accuracy and operational time (computational speed) cannot be measured exactly; they can only be estimated using statistical techniques.

• Statistical decision-making. The variability of biometric data is propagated into the templates and into decision making. Decision making at various levels of the biometric system hierarchy is a statistical procedure by nature, that is, decision making under uncertainty.


Biometric Methods and Techniques

Biometrics is a multidisciplinary area. Various advanced mathematical and engineering methods and techniques are used in biometric system design. In this lecture, the methods from the following directions are briefly introduced:

◮ Statistical methods,

◮ Methods of signal processing, and

◮ Methods of pattern recognition.

1 Basic statistics for biometric system design

Biometric systems begin with the measurement of a behavioral/physiological characteristic. Key to all systems is the underlying assumption that the measured biometric characteristic is both distinctive between individuals and repeatable over time for the same individual. Statistical methods provide the techniques for measuring the biometric characteristic.

In the implementation, the problems of measuring and controlling these random variations begin in the data acquisition module. The user's characteristic must be presented to a sensor. The presentation of any biometric to the sensor introduces a behavioral (random) component to every biometric method. The output of the sensor forms the input data upon which the system is built. It is a combination of (a) the biometric measure, (b) the way the measure is presented, and (c) the technical characteristics of the sensor. Both the repeatability and distinctiveness of the measurement are negatively impacted by changes in any of these factors.

The engineering method and statistical thinking

An engineering approach to formulating and solving problems is applicable to the design of biometric devices and systems.

Step 1: Develop a clear and concise description of the problem.

Step 2: Identify, at least tentatively, the important factors that affect this problem or that may play a role in its solution.

Step 3: Propose a model for the problem, using knowledge of the biometric phenomenon being used in the biometric system. State any limitations or assumptions of the model.

Step 4: Conduct appropriate experiments and collect data to test or validate the tentative model or conclusions made in Steps 2 and 3.

Step 5: Refine the model on the basis of observed data.

Step 6: Manipulate the model to assist in developing an algorithm, program, and hardware platform.


Step 7: Conduct an appropriate experiment to confirm that the proposed design solutions are both effective and efficient with respect to given criteria.

Step 8: Draw conclusions or make recommendations based on design solutions.

The field of statistics deals with the collection, presentation, analysis, and use of data to make decisions. Statistical techniques are used in all phases of biometric system design, their comparison and testing, and improving existing designs. Statistical methods are used to help us describe and understand variability. By variability, we mean that any successive observation of a biometric system or biometric phenomenon does not produce an identical result. Because the measurements exhibit variability, we say that the measured parameter is a random variable. A convenient way to think of a random variable, say X, which represents a measured quantity, is by using an appropriate model, for example,

Random variable X = Constant µ + Noise ε
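As a small illustrative sketch (not from the text), the following Python fragment simulates repeated measurements under this constant-plus-noise model; the values µ = 50 and σ = 2.5 are borrowed from the design example later in this lecture, and all names are hypothetical.

```python
import random

MU, SIGMA = 50.0, 2.5          # assumed constant part and spread of the noise
random.seed(1)                 # fixed seed so the sketch is reproducible

# Each observation is the constant mu plus zero-mean Gaussian noise epsilon.
observations = [MU + random.gauss(0.0, SIGMA) for _ in range(10)]
sample_mean = sum(observations) / len(observations)

print([round(x, 2) for x in observations])
print("sample mean:", round(sample_mean, 2))   # varies around mu from sample to sample
```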

In the engineering environment, the data is almost always a sample that has been selected from some population. In biometric system design, data is collected in three ways:

◮ A retrospective study based on historical data; the engineer uses either all or a sample of the historical process data from some period of time; for example, biometric data from databases, data from previous experimental studies, etc.

◮ An observation study; the engineer observes the process during a period of routine operation; for example, facial expressions, signatures, etc.

◮ A designed experiment; the engineer makes deliberate or purposeful changes in controllable variables, called factors of the system, observes the system output, and makes a decision or an inference about which variables are responsible for the changes that he/she observes in the system output; for example, feature extraction from biometric data using an appropriate algorithm.

Distinction between a designed experiment and an observational/retrospective study

An important distinction between a designed experiment and either an observational or retrospective study is that in the first one the different combinations of the factors of interest are applied randomly to a set of experimental units. This allows cause-and-effect relationships to be established, and that cannot be done with observational/retrospective studies. A designed experiment is based on two statistical techniques: hypothesis testing and confidence intervals.


Example 1: (Designed experiment.) Assume that a company introduces a new biometric device. How should an experiment be designed to test its effectiveness? The basic method would be to perform a comparison between the control devices and the new device.

Any comparison is based on a measurement. If the same thing is measured several times, in an ideal world, the same result would be obtained each time. In practice, there are differences. Each result is thrown off by chance error, and the error changes from measurement to measurement. No matter how carefully it is made, a measurement could turn out a bit differently from the way it did before.

Statistical hypothesis

Many problems in biometric system design require that we decide whether to accept or reject a statement about some parameters. The statement is called a hypothesis, and the decision-making procedure about the hypothesis is called hypothesis testing.

Statistical hypothesis

A statistical hypothesis is an assertion or conjecture concerning one or more populations. The truth or falsity of a statistical hypothesis is never known with absolute certainty, unless we examine the entire population. This is impractical. Instead, we take a random sample from the population of interest and use the data contained in this sample to provide evidence that either supports or does not support the hypothesis (leads to rejection of the hypothesis). The decision procedure must be carried out with awareness of the probability of a wrong conclusion. The rejection of a hypothesis implies that the sample evidence refutes it. In other words: the rejection means that there is a small probability of obtaining the sample information observed when, in fact, the hypothesis is true.

The structure of hypothesis testing is formulated using the term null hypothesis. This refers to any hypothesis we wish to test and is denoted by H0. The rejection of H0 leads to the acceptance of an alternative hypothesis, denoted by H1.

Null and alternative hypothesis

The alternative hypothesis H1 represents the question to be answered; its specification is crucial. The null hypothesis H0 nullifies or opposes H1 and is often the logical complement to H1. This results in one of the two following conclusions:

Reject H0 : In favor of H1 because of sufficient evidence in the data

Fail to reject H0 : because of insufficient evidence in the data


Example 2: (Null and alternative hypothesis.) Suppose that we are interested in deciding whether or not the mean, µ, is equal to the value 50. Formally it is expressed as Null hypothesis H0 : µ = 50 and Alternative hypothesis H1 : µ ≠ 50. That is, the conclusion is that we reject the hypothesis H0 in favor of hypothesis H1 if µ ≠ 50.

Because in Example 2 the alternative hypothesis specifies values of µ that could be either greater or less than 50, it is called a two-sided alternative hypothesis. In some situations, we may wish to formulate a one-sided alternative hypothesis:

Null hypothesis H0 : µ = 50

One-sided alternative hypothesis H1 : µ < 50 or

One-sided alternative hypothesis H1 : µ > 50

Testing a statistical hypothesis

Let the null hypothesis be that the mean is µ = a, and the alternative hypothesis be that µ ≠ a. That is, we wish to test:

Null hypothesis H0 : µ = a

Two-sided alternative hypothesis H1 : µ ≠ a

Suppose that a data sample of size n is tested, and that the sample mean x is observed. The sample mean is an estimate of the true population mean µ = a. A value of the sample mean x that falls close to the hypothesized value of µ is evidence that the true mean µ is really a; that is, such evidence supports the null hypothesis H0. On the other hand, a sample mean x that is considerably different from a is evidence in support of the alternative hypothesis H1. Thus, the sample mean represents the test statistic.

Example 3: (Critical region and values.) The sample mean x can take on many different values. Suppose that if 48.5 ≤ x ≤ 51.5, we will not reject the null hypothesis H0 : µ = 50. If either x < 48.5 or x > 51.5, we will reject the null hypothesis in favor of the alternative hypothesis H1 : µ ≠ 50. The values of x that are less than 48.5 and greater than 51.5 constitute the critical region for the test. The boundaries that define the critical regions (48.5 and 51.5) are called critical values.
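A minimal sketch (not part of the original text) of the decision procedure in Example 3, with the critical values 48.5 and 51.5 hard-coded for illustration:

```python
def test_mean_hypothesis(sample_mean, lower=48.5, upper=51.5):
    """Two-sided test of H0: mu = 50 using the critical region of Example 3.

    Values of the sample mean below `lower` or above `upper` fall in the
    critical region; values inside [lower, upper] do not give enough
    evidence to reject H0.
    """
    if sample_mean < lower or sample_mean > upper:
        return "reject H0 in favor of H1"
    return "fail to reject H0"

print(test_mean_hypothesis(51.0))   # inside the acceptance region
print(test_mean_hypothesis(47.9))   # falls in the critical region
```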


Therefore, we reject H0 in favor of H1 if the test statistic falls in the critical region, and fail to reject H0 otherwise. This decision procedure can lead to either of two wrong conclusions:

Type I error or False reject rate (FRR): is defined as rejecting the null hypothesis H0 when it is true. The type I error is also called the significance level of the test. The probability of making a type I error is

α = P (Type I error) = P (Reject H0 when H0 is true)

Type II error or False accept rate (FAR): is defined as failing to reject the null hypothesis when it is false. The probability of making a type II error is

β = P (Type II error) = P (Fail to reject H0 when H0 is false)

Properties of type I (FRR) and type II (FAR) errors

Property 1: Type I error and type II error are related. A decrease in the probability of one generally results in an increase in the probability of the other.

Property 2: The size of the critical region, and, therefore, the probability of committing a type I error, can always be reduced by adjusting the critical value(s).

Property 3: An increase in the sample size n will reduce α and β simultaneously.

Property 4: If H0 is false, β is maximum when the true value of a parameter approaches the hypothesized value. The greater the distance between the true value and the hypothesized value, the smaller β will be.

Recommendations for computing type I and II errors

Type I error. Generally, the designer controls the type I error probability α, called a significance level, when the critical values (the boundaries that define the critical region, see Example 3) are selected. Thus, it is usually easy for the designer to set the type I error probability at (or near) any desired value. Because the designer can directly control the probability of wrongly rejecting H0, we always think of rejection of the null hypothesis H0 as a strong conclusion.

Because we can control the probability of making a type I error, α, the problem is what value should be used. The type I error probability is a measure of risk, specifically, the risk of concluding that the null hypothesis is false when it really is not. So, the value of α should be chosen to reflect the consequences (for the biometric data, device, system, etc.) of incorrectly rejecting H0:

◮ Smaller values of α would reflect more serious consequences, and


◮ Larger values of α would be consistent with less severe consequences.

This is often hard to do, and what has evolved in much of biometric system design is to use the value α = 0.05 in most situations, unless there is information available that indicates that this is an inappropriate choice.

Type II error. The probability of a type II error, β, is not a constant. It depends on both the true value of the parameter and the sample size that we have selected. Because the type II error probability β is a function of both the sample size and the extent to which the null hypothesis H0 is false, it is customary to think of the decision not to reject H0 as a weak conclusion, unless we know that β is acceptably small. Therefore, rather than saying we “accept H0”, we prefer the terminology “fail to reject H0”.

Failing to reject H0 implies that we have not found sufficient evidence to reject H0, that is, to make a strong statement. Failing to reject H0 does not necessarily mean there is a high probability that H0 is true. It may simply mean that more data are required to reach a strong conclusion. This can have important implications for the formulation of hypotheses.

The power of a statistical test is the probability of rejecting the null hypothesis H0 when the alternative hypothesis is true. The power is computed as

Power of a statistical test = 1− β

The power of a statistical test can be interpreted as the probability of correctly rejecting a false null hypothesis. The power of the test is a very descriptive and concise measure of the sensitivity of a statistical test, where by sensitivity we mean the ability of the test to detect differences.

Example 4: (Type I and II errors.) The techniques for computing type I and II errors for a given data sample are shown in Fig. 1.

Estimating the mean

Even the most efficient estimator is unlikely to estimate a population parameter θ exactly. It is true that our accuracy increases with large samples, but there is still no reason why we should expect a point estimate from a given sample to be exactly equal to the population parameter it is supposed to estimate. It is preferable to determine an interval within which we would expect to find the value of the parameter. Such an interval is called an interval estimate.


Design example: Computing type I and II errors

Problem formulation: Let face features such as the regions of the lips, mouth, nose, ears, eyes, eyebrows, and other facial measurements be detected. Let the biometric data corresponding to the lip topology be represented by a sample of size n = 10, with mean µ = 50 and standard deviation σ = 2.5. This biometric data has a distribution for which the conditions of the central limit theorem apply, so the distribution of the sample mean is approximately normal with mean µ = 50 and standard deviation σ/√n = 2.5/√10 = 0.79. Find the probability of a type I error.

Step 1: The probability of type I error

The probability of making a type I error (or the significance level of our test)

α = P (Type I error) = P (Reject H0 when H0 is true)

is equal to the sum of the areas that have been shaded in the tails of the normal distribution. We may find this probability as

Probability of type I error, α = P(X < 48.5 when µ = 50) + P(X > 51.5 when µ = 50),

where the first term is the left-tail area below the critical value x1 = 48.5 and the second term is the right-tail area above x2 = 51.5.

The z-values that correspond to the critical values 48.5 and 51.5 are calculated as follows:

z1 = (x1 − µ)/(σ/√n) = (48.5 − 50)/0.79 = −1.90  and  z2 = (x2 − µ)/(σ/√n) = (51.5 − 50)/0.79 = 1.90

Therefore α = P(Z < −1.90) + P(Z > 1.90) = P(Z < −1.90) + (1 − P(Z < 1.90)) = 0.0287 + (1 − 0.9713) = 0.0574

Conclusion: This implies that 5.74% of all random samples would lead to rejection of the hypothesis H0 : µ = 50 when the true mean is really 50.

Step 2: Reducing a type I error by decreasing the critical region

(Figure: normal curve centered at µ = 50 with shaded tails of area α/2 = 0.0287 beyond the critical values 48.5 and 51.5.)

From inspection of the critical region for H0 : µ = 50 versus H1 : µ ≠ 50 and n = 10, note that we can reduce α by pushing the critical regions further into the tails of the distribution. For example, if we make the critical values 48 and 52, the value of α is

α = P(Z < (48 − 50)/0.79) + P(Z > (52 − 50)/0.79) = P(Z < −2.53) + P(Z > 2.53) = 0.0057 + 0.0057 = 0.0114

Fig. 1: Techniques for computing type I and II errors (Example 4).
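The α values obtained in Steps 1 and 2 can be checked numerically. The sketch below (not from the text) uses Python's standard-library NormalDist with the data of Example 4 (n = 10, σ = 2.5, critical values 48.5/51.5, then 48/52):

```python
from math import sqrt
from statistics import NormalDist

MU0, SIGMA, N = 50.0, 2.5, 10
SE = SIGMA / sqrt(N)                      # standard error of the sample mean, about 0.79
Z = NormalDist()                          # standard normal distribution

def alpha(lower, upper):
    """P(reject H0 | H0 true) for the critical region x_bar < lower or x_bar > upper."""
    return Z.cdf((lower - MU0) / SE) + (1.0 - Z.cdf((upper - MU0) / SE))

print(round(alpha(48.5, 51.5), 4))   # ~0.0578 (the text gives 0.0574 with se rounded to 0.79)
print(round(alpha(48.0, 52.0), 4))   # ~0.0114 (Step 2: wider acceptance region)
```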


Design example: Computing type I and II errors (Continuation)

Step 3: Reducing type I error by increasing the sample size

We could also reduce α by increasing the sample size, assuming that the critical values of 48.5 and 51.5 do not change. If n = 16, σ/√n = 2.5/√16 = 0.625, and using the original critical region, we find

z1 = (48.5 − 50)/0.625 = −2.40  and  z2 = (51.5 − 50)/0.625 = 2.40

Therefore α = P(Z < −2.40) + P(Z > 2.40) = 0.0082 + 0.0082 = 0.0164

Step 4: Design decision on type I error

An acceptable type I error can be chosen from the following possibilities:

Type I error from the original critical region Z < −1.90, Z > 1.90 is α=0.0574

The type I error reduced by decreasing the critical region from Z < −1.90, Z > 1.90 to Z < −2.53, Z > 2.53 is α = 0.0114

Type I error reduced by increasing the sample size from n = 10 to n = 16 is α = 0.0164

Step 5: Specification of the probability of type II error

The probability of making a type II error is

β = P (Type II error) = P (Fail to reject H0 when H0 is false)

To calculate β, we must have a specific alternative hypothesis; that is, we must have a particular value of µ. For example, suppose we want to reject the null hypothesis H0 : µ = 50 whenever the mean µ is greater than 52 or less than 48. We could calculate the probability of a type II error β for the values µ = 52 and µ = 48, and use this result to tell us something about how the test procedure would perform. Because of the symmetry of the normal distribution function, it is only necessary to evaluate one of the two cases, say, find the probability of not rejecting the null hypothesis H0 : µ = 50 when the true mean is µ = 52.

(Figure: two normal curves for the test statistic X, one under H0: µ = 50 and one under H1: µ = 52.)

◮ The normal distribution on the left (see the figure) is the distribution of the test statistic X when the null hypothesis H0 : µ = 50 is true (this is what is meant by the expression “under H0 : µ = 50”).

◮ The normal distribution on the right is the distribution of the test statistic X when the alternative hypothesis is true and the value of the mean is 52 (or “under H1 : µ = 52”).

Fig. 2: Techniques for computing type I and type II errors (continuation of Example 4).


Design example: Computing type I and II errors (Continuation)

Step 5: (continuation)

Now the type II error will be committed if the sample mean x falls between 48.5 and 51.5 (the critical region boundaries) when µ = 52. This is the probability that 48.5 ≤ X ≤ 51.5 when the true mean is µ = 52, or the shaded area under the normal distribution on the right, that is

β = P (Type II error) = P (48.5 ≤ X ≤ 51.5 when µ = 52)

Step 6: Computing the probability of type II error

The z-values corresponding to 48.5 and 51.5 when µ = 52 are

z1 = (48.5 − 52)/0.79 = −4.43  and  z2 = (51.5 − 52)/0.79 = −0.63

Therefore,

Probability of type II error, β = P(−4.43 ≤ Z ≤ −0.63) = P(Z ≤ −0.63) − P(Z ≤ −4.43) = 0.2643 − 0.0000 = 0.2643

Conclusion: If we are testing H0 : µ = 50 against H1 : µ ≠ 50 with n = 10, and the true value of the mean is µ = 52, the probability that we will fail to reject the false null hypothesis is 0.2643. By symmetry (see the graphical representation in Fig. 2), if the true value of the mean is µ = 48, the value of β will also be 0.2643.

Step 7: Analysis of a type II error

The probability of making a type II error β increases rapidly as the true value µ approaches the hypothesized value. For example, consider the case when the true value of the mean is µ = 50.5 and the hypothesized value is H0 : µ = 50. The true value of µ is very close to 50, and the probability of a type II error is β = P(48.5 ≤ X ≤ 51.5) when µ = 50.5. The z-values corresponding to 48.5 and 51.5 when µ = 50.5 are

z1 = (48.5 − 50.5)/0.79 = −2.53  and  z2 = (51.5 − 50.5)/0.79 = 1.27

Therefore β = P(−2.53 ≤ Z ≤ 1.27) = P(Z ≤ 1.27) − P(Z ≤ −2.53) = 0.8980 − 0.0057 = 0.8923. This is higher than in the case µ = 52; that is, we are more likely to accept the false hypothesis µ = 50 (fail to reject H0 : µ = 50).

Fig. 3: Techniques for computing type I and type II errors (continuation of Example 4).
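A short numerical check of Steps 5-7 (a sketch, not from the text), again with n = 10, σ = 2.5, and the critical region [48.5, 51.5]:

```python
from math import sqrt
from statistics import NormalDist

SIGMA, N = 2.5, 10
LOWER, UPPER = 48.5, 51.5
Z = NormalDist()

def beta(true_mu, n=N):
    """P(fail to reject H0: mu = 50 | true mean = true_mu) = P(LOWER <= X_bar <= UPPER)."""
    se = SIGMA / sqrt(n)
    return Z.cdf((UPPER - true_mu) / se) - Z.cdf((LOWER - true_mu) / se)

print(round(beta(52.0), 4))   # ~0.26 (Step 6; the text reports 0.2643 with se rounded to 0.79)
print(round(beta(50.5), 4))   # ~0.89 (Step 7; beta grows as the true mean approaches 50)
```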


Design example: Computing type I and II errors (Continuation)

Step 7: (Continuation)

Conclusion: The type II error probability is much higher for the case in which the true mean is 50.5 than for the case in which the mean is 52. Of course, in many practical situations, we would not be as concerned with making a type II error if the mean were “close” to the hypothesized value. We would be much more interested in identifying large differences between the true mean and the value specified in the null hypothesis.

Step 8: Reducing a type II error by increasing the sample size

(Figure: two normal curves for X, one under H0: µ = 50 and one under H1: µ = 52, for the larger sample size.)

The type II error probability also depends on the sample size n. Suppose that the null hypothesis is H0 : µ = 50 and that the true value of the mean is µ = 52. By letting the sample size increase from n = 10 to n = 16, we can compare the two cases. The normal distribution on the left is the distribution of X when µ = 50, and the normal distribution on the right is the distribution of X when µ = 52. As shown in the figure, the type II error probability is

Probability of type II error β = P (48.5 ≤ X ≤ 51.5) when µ = 52

When n = 16, the standard deviation of X is σ/√n = 2.5/√16 = 0.625, and the z-values corresponding to 48.5 and 51.5 when µ = 52 are

z1 = (48.5 − 52)/0.625 = −5.60  and  z2 = (51.5 − 52)/0.625 = −0.80

Therefore

Probability of type II error β = P(−5.60 ≤ Z ≤ −0.80) = P(Z ≤ −0.80) − P(Z ≤ −5.60) = 0.2119 − 0.0000 = 0.2119

This β = 0.2119 is smaller than β = 0.2643, so we decrease the probability of accepting the false hypothesis H0 by increasing the sample size.

Step 9: Design decision on type II error

An acceptable type II error can be chosen from the following possibilities:

The type II error for the original sample size n = 10 and −4.43 ≤ Z ≤ −0.63 is 0.2643

The type II error, reduced by increasing the sample size from n = 10 to n = 16, is 0.2119

Fig. 4: Techniques for computing type I and type II errors (continuation of Example 4).


Design example: Computing type I and II errors (Continuation)

Step 10: Computing the power of a test

Suppose that the true value of the mean is µ = 52. When n = 10, we found that β = 0.2643, so the power of this test is

Power of the test = 1 − β = 1 − 0.2643 = 0.7357

Conclusion: The sensitivity of the test for detecting the difference between a mean of 50 and 52 is 0.7357. That is, if the true mean is really 52, this test will correctly reject H0 : µ = 50 and “detect” this difference 73.57% of the time. If this value of power is judged to be too low, the designer can increase either α or the sample size n.

Fig. 5: Techniques for computing type I and type II errors (continuation of Example 4).
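Steps 8-10 can be reproduced with the same helper (an illustrative sketch, not from the text): increasing n from 10 to 16 lowers β, and the power of the test is 1 − β.

```python
from math import sqrt
from statistics import NormalDist

SIGMA = 2.5
LOWER, UPPER = 48.5, 51.5
Z = NormalDist()

def beta(true_mu, n):
    """Probability of a type II error for H0: mu = 50 with critical region [48.5, 51.5]."""
    se = SIGMA / sqrt(n)
    return Z.cdf((UPPER - true_mu) / se) - Z.cdf((LOWER - true_mu) / se)

b10, b16 = beta(52.0, 10), beta(52.0, 16)
print(round(b10, 4), round(b16, 4))          # ~0.26 vs ~0.21: a larger sample reduces beta
print("power (n = 10):", round(1 - b10, 4))  # ~0.74, the chance of detecting mu = 52
```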

Let 0 < α < 1; then the interval a < θ < b, computed from the selected sample, is called a 100(1 − α)% confidence interval, the fraction 1 − α is called the degree of confidence, and the endpoints a and b are called the lower and upper confidence limits.

If x is the mean of a random sample of size n from a population with known variance σ², a 100(1 − α)% confidence interval for µ is given by

x − z_{α/2} σ/√n < µ < x + z_{α/2} σ/√n     (1)

where z_{α/2} is the z-value leaving an area of α/2 to the right.

Practice recommendation. In experiments, σ is often unknown, and normality cannot always be assumed. If n ≥ 30, s can replace σ, and the confidence interval

Confidence interval = x ± z_{α/2} s/√n

may be used. This is often referred to as a large-sample confidence interval. The justification lies only in the presumption that, with a sample as large as 30 and the population distribution not too skewed, s (the standard deviation of the sample) will be very close to the true σ and, thus, the central limit theorem prevails. It should be emphasized that this is only an approximation, and the quality of the approximation becomes better as the sample size grows larger.


The 100(1 − α)% confidence interval provides an estimate of the accuracy of our point estimate. If µ is actually the center value of the interval, then x estimates µ without error. However, in most cases, x will not be exactly equal to µ, and the point estimate is in error.

Theorem 1: If x is used as an estimate of µ, we can then be 100(1 − α)% confident that the error will not exceed the value

Error = z_{α/2} × σ/√n     (2)

Example 5: (Errors of the confidence intervals.) Hand geometry is defined as the surface area of the hand or fingers and the corresponding measures (length, width, and thickness). The average distance between two points of a hand in 36 different measurements is found to be 2.6 mm. Calculate: (a) the 95% and 99% confidence intervals for the mean distance between these hand points and (b) the accuracy of the point estimate using Theorem 1. Assume that the population standard deviation is σ = 0.3. The solution is given in Fig. 6.

Often in experimental studies, we are interested in how large a sample of biometric data is necessary to ensure that the error in estimating µ will be less than a specified amount e.

Theorem 2: If x is used as an estimate of µ, we can be 100(1 − α)% confident that the error will not exceed a specified amount e when the sample size is

Sample size n = (z_{α/2} × σ / e)²     (3)

Theorem 2 is applicable only if we know the variance of the population from which we are to select our sample. Lacking this information, we could take a preliminary sample of size n ≥ 30 to provide an estimate of σ. Then, using this estimate as an approximation for σ in Theorem 2, we could determine approximately how many observations are needed to provide the desired degree of accuracy.


Design example: Errors of the confidence intervals

Problem formulation: Let the hand geometry measurements result in a sample of size n = 36, with sample mean x = 2.6 and population standard deviation σ = 0.3. Calculate:
(a) the 95% and 99% confidence intervals for the mean distance between these hand points;
(b) the accuracy of a point estimate using Theorem 1;
(c) the sample size, if we want to be 95% confident that the error of our estimate of µ does not exceed 0.05 (Theorem 2).

Step 1: If x is the mean of a random sample of size n from a population with known variance σ², a 100(1 − α)% confidence interval for µ is given by

x − z_{α/2} σ/√n < µ < x + z_{α/2} σ/√n

where z_{α/2} is the z-value leaving an area of α/2 to the right.

Step 2: For α = 0.05, n = 36, x = 2.6, and σ = 0.3, the 95% confidence interval is

2.6 − (1.96)(0.3/√36) < µ < 2.6 + (1.96)(0.3/√36), that is, 2.50 < µ < 2.70

Note that the z-value leaving an area of 0.025 to the right, and, therefore, an area of 0.975 to the left, is z_{0.05/2} = z_{0.025} = 1.96 (see the table).

Step 3: For α = 0.01, n = 36, x = 2.6, and σ = 0.3, the 99% confidence interval is

2.6 − (2.575)(0.3/√36) < µ < 2.6 + (2.575)(0.3/√36), that is, 2.47 < µ < 2.73

Note that the z-value leaving an area of 0.005 to the right, and, therefore, an area of 0.995 to the left, is z_{0.01/2} = z_{0.005} = 2.575 (see the table).

Observation: A longer interval is required to estimate µ with a higher degree of confidence.

Decision: Based on Theorem 1, we are 95% confident that the sample mean x = 2.6 differs from the true mean µ by an amount that is less than

z_{α/2} × σ/√n = (1.96)(0.3/√36) = 0.098

By analogy, we are 99% confident that the sample mean x = 2.6 differs from the true mean µ by an amount that is less than

z_{α/2} × σ/√n = (2.575)(0.3/√36) = 0.13

Fig. 6: The error of estimating the mean (Example 5).
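The confidence intervals and error bounds of Fig. 6 can be verified with a few lines of Python (an illustrative sketch; the quantiles 1.96 and 2.575 come out of the inverse normal CDF):

```python
from math import sqrt
from statistics import NormalDist

X_BAR, SIGMA, N = 2.6, 0.3, 36

def confidence_interval(alpha):
    """100(1 - alpha)% interval x_bar +/- z_{alpha/2} * sigma / sqrt(n), known sigma."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}, e.g. ~1.96 for alpha = 0.05
    half_width = z * SIGMA / sqrt(N)          # Theorem 1: maximum error of the point estimate
    return X_BAR - half_width, X_BAR + half_width, half_width

print(confidence_interval(0.05))   # ~(2.50, 2.70), error bound ~0.098
print(confidence_interval(0.01))   # ~(2.47, 2.73), error bound ~0.13
```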


Example 6: (Sample size.) (Continuation of Example 5.) How large a sample is required if we want to be 95% confident that our estimate of µ is off by less than 0.05? Using Theorem 2,

n = (z_{α/2} × σ / e)² = (1.96 × 0.3 / 0.05)² = 138.3 ≈ 139

Therefore, we can be 95% confident that a random sample of size 139 will provide an estimate x differing from µ by an amount of less than 0.05.
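A one-function sketch of the Theorem 2 sample-size computation used in Example 6 (not from the text):

```python
from math import ceil
from statistics import NormalDist

def sample_size(sigma, e, alpha=0.05):
    """Smallest n giving 100(1 - alpha)% confidence that |x_bar - mu| < e (Theorem 2)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return ceil((z * sigma / e) ** 2)       # round up to the next whole observation

print(sample_size(sigma=0.3, e=0.05))       # 139, as in Example 6
```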

2 Biometric system performance evaluation

Fig. 7 contains the basic definitions and terminology used in the design and testing of biometric systems. In this design, terms such as a sample of biometric data, user template, matching score, decision-making, decision rule, and decision error rates are used in their application-specific meaning.

2.1 Matching score

A broad category of variables impacts the way in which the user's inherent biometric characteristics are displayed to the sensor. In many cases, the distinction between changes in the fundamental biometric characteristics and the presentation effects may not be clear.

Two samples of the same biometric characteristic from the same person are not identical due to imperfect imaging conditions, changes in the user's physiological or behavioral characteristics, ambient conditions, and the user's interaction with the sensor. Therefore, the response of a biometric matching system is the matching score S(XQ, XI)

Response = Matching score S(Input XQ, Template XI)

that quantifies the similarity between the input XQ and the template XI representations. This similarity can be encoded by a single number.
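The lecture does not prescribe a particular similarity measure, so purely as an illustration, a matching score between two fixed-length feature vectors could be computed as a normalized inverse Euclidean distance (all names and values below are hypothetical):

```python
from math import sqrt

def matching_score(query, template):
    """Illustrative score S(XQ, XI) in (0, 1]: 1 for identical feature vectors,
    decreasing toward 0 as the Euclidean distance between them grows."""
    distance = sqrt(sum((q - t) ** 2 for q, t in zip(query, template)))
    return 1.0 / (1.0 + distance)

xq = [0.31, 0.80, 0.55]   # hypothetical features extracted from the input sample
xi = [0.30, 0.82, 0.50]   # hypothetical features stored in the user's template
print(round(matching_score(xq, xi), 3))
```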


Basic definitions and terminology

Sample: A biometric measure submitted by the user.

Template: A user's reference measure based on features extracted from the enrolment samples.

Matching score: A measure of the similarity between features derived from a presented sample and a stored template. A match/nonmatch decision may be made according to whether this score exceeds a decision threshold.

System decision: A determination of the probable validity of a user's claim to identity/non-identity in the system.

Transaction: An attempt by a user to validate a claim of identity or non-identity by consecutively submitting one or more samples, as allowed by the system's decision policy.

Verification: The user makes a positive claim to an identity, requiring a one-to-one comparison of the submitted sample to the enrolled template for the claimed identity.

Identification: The user makes either no claim or an implicit negative claim to an enrolled identity, and a one-to-many search of the entire enrolled database is required.

Positive claim of identity: The user claims to be enrolled in or known to the system. An explicit claim might be accompanied by a claimed identity in the form of a name or personal identification number (PIN). Common access control systems are an example.

Negative claim of identity: The user claims not to be known to or enrolled in the system. For example, enrolment in social service systems open only to those not already enrolled.

Genuine claim of identity: A user making a truthful positive claim about identity in the system. The user truthfully claims to be him/herself, leading to a comparison of a sample with a truly matching template.

Impostor claim of identity: A user making a false positive claim about identity in the system. The user falsely claims to be someone else, leading to the comparison of a sample with a non-matching template.

Fig. 7: Basic definitions and terminology that are used in biometric system design.


Example 7: (Response.) The similarity, encoded by YES (1) or NO (0), between the input XQ with number 11101 and the template XI with number 10011 can be represented by the following binary number:

0 11101 10011

where the leading bit (0) encodes the NO decision, the next five bits give the number of XQ, and the last five bits give the number of XI.

2.2 Decision rule

If the stored biometric template of a user I is represented by XI and the acquired input for recognition is represented by XQ, then the null (H0) and alternate (H1) hypotheses are:

Null hypothesis H0: Input XQ does not come from the same person as the template XI; the associated decision is: “Person I is not who he/she claims to be.”

Alternate hypothesis H1: Input XQ comes from the same person as the template XI; the associated decision is: “Person I is who he/she claims to be.”

That is, we wish to test

Null hypothesis H0 : D = D0

Alternative hypothesis H1 : D ≠ D0

The decision rule is as follows: if the matching score S(XQ, XI) is less than the system threshold t, then decide H0; else decide H1.
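A minimal sketch of this threshold rule (illustrative names and values, not from the text):

```python
def decide(score, threshold):
    """Threshold decision rule: a score below t favors H0 (different persons),
    a score at or above t favors H1 (same person)."""
    if score < threshold:
        return "H0: person is not who he/she claims to be"
    return "H1: person is who he/she claims to be"

print(decide(score=0.83, threshold=0.70))
print(decide(score=0.42, threshold=0.70))
```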

Controlled decision making in a biometric system

(Figure: probability distributions of the matching score for nonmate pairs (different persons) and mate pairs (the same person), with the decision threshold t marked on the matching-score axis.)

The higher the score, the more certain the system is that the two biometric measurements come from the same person. The system decision is regulated by the threshold t:

Decision 1: Pairs of biometric samples generating scores higher than or equal to t are inferred as mate pairs, that is, the pairs belong to the same person.

Decision 2: Pairs of biometric samples generating scores lower than t are inferred as nonmate pairs, that is, the pairs belong to different persons.


2.3 Decision error rates

Decision errors are due to matching errors and image acquisition errors. These errors are summed up and drive the decision process at various levels of the system, in particular, in situations when (a) one-to-one or one-to-many matching is required; (b) there is a positive or negative claim of identity; and (c) the system allows multiple attempts (the decision policy). Biometric performance has traditionally been stated in terms of the decision error rates.

2.4 FRR computing

The FRR (type I error) is defined as the probability that a user making a true claim about his/her identity will be rejected as him/herself. That is, the FRR is the expected proportion of transactions with truthful claims of identity (in a positive ID system) or non-identity (in a negative ID system) that are incorrectly denied. A transaction may consist of one or more truthful attempts, depending upon the decision policy. Note that rejection always refers to the claim of the user.

Example 8: (False reject.) If person A1 types his/her correct user ID into the biometric login for the given terminal, this means that A1 has just made a true claim that he/she is A1. Person A1 presents his/her biometric measurement for verification. If the biometric system does not match the template of A1 to A1's presented measurement, then there is a false reject. This could happen because the matching threshold is set too high, or because the biometric features presented by person A1 are not close enough to the biometric template.

Suppose a person A1 was denied authentication (unsuccessfully authenticated) as A1 n times, while the total number of attempts was N; then FRR = n/N. Statistically, the more times something is done, the greater is the confidence in the result. The result is the mean (average) FRR for K users of the system:

FRR = (1/K) Σ_{i=1}^{K} FRR_i

FRR and matching algorithm

The FRR reflects the robustness of the matching algorithm. The more accurate the matching algorithm, the less likely a false rejection will happen.


2.5 FAR computing

The FAR (type II error) is defined as the probability that a user making a false claim about his/her identity will be verified as that false identity. That is, the FAR is the expected proportion of transactions with wrongful claims of identity (in a positive ID system) or non-identity (in a negative ID system) that are incorrectly confirmed. A transaction may consist of one or more wrongful attempts, depending upon the decision policy. Note that acceptance always refers to the claim of the user³.

Example 9: (False accept.) If a person A1 types the user ID of another person A2 into the biometric login for the given terminal, this means that A1 has just made a false claim that he or she is A2. Person A1 presents his biometric measurement for verification. If the biometric system matches A1 to A2, then there is a false acceptance. This could happen because the matching threshold is set too low, or because the biometric features of A1 and A2 are very similar.

Suppose the person A1 was successfully authenticated as A2 n times out of a total number of attempts N; then FAR = n/N. The FAR is the mean (average) FAR for K users of a system:

FAR = (1/K) Σ_{i=1}^{K} FAR_i

FAR and matching algorithm

The FAR characterizes the strength of the matching algorithm. The stronger the algorithm, the less likely that a false authentication will happen.
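The FRR/FAR bookkeeping described above translates directly into code. The sketch below assumes per-user counts of falsely rejected genuine attempts and falsely accepted impostor attempts are available; the data is made up for illustration.

```python
def error_rate(errors, attempts):
    """n/N for one user: falsely rejected (FRR) or falsely accepted (FAR) attempts."""
    return errors / attempts

def mean_rate(per_user_rates):
    """Average rate over K users, as in the FRR/FAR formulas above."""
    return sum(per_user_rates) / len(per_user_rates)

# Hypothetical logs: (false rejections, genuine attempts) and (false acceptances, impostor attempts)
frr_per_user = [error_rate(2, 100), error_rate(0, 80), error_rate(5, 120)]
far_per_user = [error_rate(1, 1000), error_rate(0, 1000), error_rate(3, 1000)]

print("mean FRR:", round(mean_rate(frr_per_user), 4))
print("mean FAR:", round(mean_rate(far_per_user), 4))
```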

2.6 Matching errors

Matching algorithm errors, which occur while performing a single comparison of a submitted sample against a single enrolled template/model, are defined so as to avoid ambiguity within a system that allows multiple attempts or has multiple templates.

³ It should be noted that conflicting definitions are implicit in the literature. In the access control literature, a false acceptance is said to have occurred when a submitted sample is incorrectly matched to a template enrolled by another user.


False match rate (FMR) is the expected probability that a sample will be falsely declared to match a single randomly selected non-self template; that is, measurements from two different persons are interpreted as if they were from the same person.

False non-match rate (FNMR) is the expected probability that a sample will be falsely declared not to match a template of the same measure from the same user supplying the sample; that is, measurements from the same person are treated as if they were from two different persons.

Equal error rate (EER) is the value defined as EER = FMR = FNMR; that is, the point where the false match and false non-match curves cross is called the equal error rate or crossover rate. The EER provides an indicator of the system's performance: a lower EER indicates a system with a good level of sensitivity and performance.

The difference between false match/non-match rates and false accept/reject rates is introduced in Fig. 8.

Example 10: (FMR and FNMR.) Let us assume that a certain commercial biometric verification system wishes to operate at a 0.001% FMR. At this setting, several biometric systems, such as state-of-the-art fingerprint and iris recognition systems, can deliver less than 1% FNMR. An FMR of 0.001% indicates that, if a hacker launches a brute force attack with a large number of different fingerprints, 1 out of 100,000 attempts will succeed on average.

To attack a biometric-based system, one needs to generate (or acquire) a large number of samples of that biometric, which is much more difficult than generating a large number of PINs/passwords. The FMR of a biometric system can be arbitrarily reduced for higher security at the cost of increased inconvenience to the users that results from a higher FNMR. Note that a longer PIN or password also increases security while causing more inconvenience in remembering and correctly typing it.


Difference between false match/non-match rates and false accept/reject rates

False match rate (FMR) and false non-match rate (FNMR) are not generally synonymous with false accept rate (FAR) and false reject rate (FRR), respectively:

◮ False match/non-match rates are calculated over the number of comparisons:

False match rate (# of comparisons) ← Verification ← Biometric system → Verification → False non-match rate (# of comparisons)

◮ False accept/reject rates are calculated over transactions and refer to the acceptance or rejection of the stated hypothesis, whether positive or negative:

False accept rate (# of transactions) ← Verification ← Biometric system → Verification → False reject rate (# of transactions)

Fig. 8: Difference between false match/non-match rates and false accept/reject rates.

Example 11: (FMR and FNMR.) Consider that airport authorities are looking for 100 criminals.
(a) Consider a verification system. A state-of-the-art fingerprint verification system operates at 1% FNMR and 0.001% FMR; that is, this system would fail to match the correct users 1% of the time and erroneously verify wrong users 0.001% of the time.
(b) Consider an identification system. Assume that the identification FNMR is still 1%, while the FMR is 0.1%. That is, while the system has a 99% chance of catching a criminal, it will produce a large number of false alarms. For example, if 10,000 people use the airport in a day, the system will produce 10 false alarms.


In fact, the tradeoff between the FMR and FNMR in a biometric system is no different from that in any detection system, including the metal detectors already in use at all airports. Other negative recognition applications, such as background checks and forensic criminal identification, are also expected to operate in semi-automatic mode, and their use follows a similar cost-benefit analysis.

2.7 FTE computing

The FTE (failure to enroll) is defined as the probability that a user attempting to enroll biometrically will be unable to do so. The FTE is usually defined by a minimum of three attempts. The FTE can be calculated as follows. An unsuccessful enrollment event occurs if a person A1, on his third attempt, is still unsuccessful. Let n be the number of unsuccessful enrollment events, and N be the total number of enrollment events. Then FTE = n/N. The mean (average) FTE for K users of a system is

FTE = (1/K) Σ_{i=1}^{K} FTE_i

The EER (equal error rate) is defined as the crossover point on a graph that has both the FAR and FRR curves plotted.

Genuine and impostor distribution

(Figure: genuine and impostor matching-score distributions with the threshold t; the FMR is the impostor tail at or above t and the FNMR is the genuine tail below t.)

The distribution of scores generated from pairs of samples taken from the same person is called the genuine distribution. The distribution of scores where the samples are taken from different persons is called the impostor distribution. The FMR and FNMR for a given threshold t are displayed over the genuine and impostor score distributions; the FMR is the percentage of nonmate pairs whose matching scores are greater than or equal to t, and the FNMR is the percentage of mate pairs whose matching scores are less than t.
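These definitions translate directly into code. The sketch below (illustrative, with made-up scores) counts nonmate scores at or above t and mate scores below t, and scans thresholds for the point where the two rates are closest, an approximate EER:

```python
def fmr(impostor_scores, t):
    """Fraction of nonmate (impostor) scores >= t, i.e. falsely matched."""
    return sum(s >= t for s in impostor_scores) / len(impostor_scores)

def fnmr(genuine_scores, t):
    """Fraction of mate (genuine) scores < t, i.e. falsely non-matched."""
    return sum(s < t for s in genuine_scores) / len(genuine_scores)

# Hypothetical matching scores from mate and nonmate comparisons
genuine  = [0.91, 0.83, 0.76, 0.58, 0.69, 0.95, 0.81, 0.46]
impostor = [0.35, 0.48, 0.22, 0.77, 0.41, 0.62, 0.66, 0.44]

thresholds = [i / 100 for i in range(101)]
eer_t = min(thresholds, key=lambda t: abs(fmr(impostor, t) - fnmr(genuine, t)))
print("approximate EER threshold:", eer_t,
      "FMR:", fmr(impostor, eer_t), "FNMR:", fnmr(genuine, eer_t))
```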

The FMR (FAR) and FNMR (FRR) are related and must be balanced (Figure 9). For example, in access control, perfect security would require denying access to everyone. Conversely, granting access to everyone would mean no security. Obviously, neither extreme is reasonable, and a biometric system must operate somewhere between the two.


False match rate (FMR) or False accept rate (FAR):

◮ An FMR/FAR error occurs when a system incorrectly matches an identity; the FMR (FAR) is the probability of individuals being wrongly matched.

◮ False matches may occur because there is a high degree of similarity between two individuals' characteristics.

◮ In a verification or positive identification system, unauthorized people can be granted access to facilities or resources as a result of an incorrect match.

◮ In a negative identification system, the result of a false match may be to deny access.

False non-match rate (FNMR) or False reject rate (FRR):

◮ An FNMR/FRR error occurs when a system rejects a valid identity; the FNMR (FRR) is the probability of valid individuals being wrongly not matched.

◮ False non-matches occur because there is not a sufficiently strong similarity between an individual's enrollment and trial templates, which could be caused by any number of conditions. For example, an individual's biometric data may have changed as a result of aging or injury.

◮ In a verification or positive identification system, people can be denied access to some facility or resource as a result of a system's failure to make a correct match.

◮ In a negative identification system, the result of a false non-match may be that a person is granted access to resources to which he/she should be denied.

Balance of FMR (FAR) and FNMR (FRR)

FMR (FAR) and FNMR (FRR) are related and must, therefore, always be assessed in tandem, and acceptable risk levels must be balanced with the disadvantages of inconvenience.

Fig. 9: Relations of the FMR (FAR) and FNMR (FRR).


3 Receiver operating characteristic (ROC) curves

The standard method for expressing the technical performance of a biometric device for a specific population in a specific application is the Receiver Operating Characteristic (ROC) curve.

3.1 Applications of a biometric system in terms of the ROC

The system performance at all operating points (thresholds) can be depicted in the form given in Fig. 10.

(Figure: ROC plane with false reject rate (FRR) on one axis and false accept rate (FAR) on the other; typical operating regions for forensic applications, civilian applications, and high-security applications are marked.)

Fig. 10: Typical operating points of different biometric applications displayed on the ROC curve.

An ROC curve plots, parametrically as a function of the decision threshold t = T, the rate of “false positives” (i.e., impostor attempts accepted) on the X-axis against the corresponding rate of “true positives” (i.e., genuine attempts accepted) on the Y-axis.
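As an illustrative sketch (not from the text), ROC points can be generated from the same kind of score data: for each threshold T, the false-positive rate is the fraction of impostor scores accepted and the true-positive rate is the fraction of genuine scores accepted.

```python
def roc_points(genuine_scores, impostor_scores, num_thresholds=11):
    """Return (threshold, false-positive rate, true-positive rate) triples."""
    points = []
    for i in range(num_thresholds):
        t = i / (num_thresholds - 1)
        fpr = sum(s >= t for s in impostor_scores) / len(impostor_scores)  # impostors accepted
        tpr = sum(s >= t for s in genuine_scores) / len(genuine_scores)    # genuine users accepted
        points.append((t, fpr, tpr))
    return points

genuine  = [0.91, 0.83, 0.76, 0.58, 0.69, 0.95]   # hypothetical mate-pair scores
impostor = [0.35, 0.48, 0.22, 0.77, 0.41, 0.62]   # hypothetical nonmate-pair scores
for t, fpr, tpr in roc_points(genuine, impostor):
    print(f"T={t:.1f}  FPR={fpr:.2f}  TPR={tpr:.2f}")
```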

3.2 Equal error rate (EER) in terms of ROC

A graphical interpretation of the EER is given in Figure 11. The FMR, FNMR, and EER behavior is expressed in terms of the ROC. The FMR and FNMR can be considered as functions of the threshold t = T. These functions give the error rates when the match decision is made at some threshold T.

3.3 Comparing the performance of biometric systems

ROC curves allow the performance of different systems to be compared under similar conditions, or of a single system under differing conditions.


(Figure: FAR and FRR plotted on a 0-1 scale, crossing at the equal error rate (EER).)

◮ When the threshold T is set low, the FMR is high and theFNMR is low; when T is set high, the FMR is low andthe FNMR is high.

◮ For a given matcher, operating point (a point on the ROC)is often given by specifying the threshold T .

◮ In biometric system design, when specifying an applica-tion, or a performance target, or when comparing twomatches, the operating point is specified by choosingFMR or FNMR.

◮ The equal error operating point is defined as the EER. A matcher can operate with highly unequal FMR and FNMR; in this case, the EER is an unreliable summary of system accuracy (a numerical sketch of these quantities follows Fig. 11).

Fig. 11: The relationship between FRR, FAR, and EER.
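As a numerical illustration of these relations, the following sketch (Python; the score arrays and their distributions are hypothetical placeholders, not data from the text) computes FMR(T) and FNMR(T) over a grid of thresholds and locates an empirical equal error point.

import numpy as np

# Hypothetical matcher similarity scores (higher score = stronger match).
rng = np.random.default_rng(0)
genuine_scores = rng.normal(0.7, 0.1, 1000)   # same-person comparisons
impostor_scores = rng.normal(0.4, 0.1, 1000)  # different-person comparisons

thresholds = np.linspace(0.0, 1.0, 501)
# FMR(T): fraction of impostor scores accepted (score >= T).
fmr = np.array([(impostor_scores >= t).mean() for t in thresholds])
# FNMR(T): fraction of genuine scores rejected (score < T).
fnmr = np.array([(genuine_scores < t).mean() for t in thresholds])

# Empirical EER: the threshold where FMR and FNMR are closest.
i = int(np.argmin(np.abs(fmr - fnmr)))
print(f"EER approx. {0.5 * (fmr[i] + fnmr[i]):.3f} at threshold T = {thresholds[i]:.2f}")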

Example 12: (Comparing two matchers.) Various approaches can be used in matcher design. The matchers must be compared using criteria of operational accuracy (method and algorithm) and operational time (computing platform). In Figure 12, the technique for comparing two matchers is introduced using the criterion of operational accuracy.

3.4 Confidence intervals for the ROC

Each point on the ROC curve is calculated by integrating the “genuine” and “impostor” score distributions between zero and some threshold, t = T. Confidence intervals for the ROC at each threshold, t, have been found through a summation of the binomial distribution under the assumption that each comparison represents a Bernoulli trial.4

The confidence, β, given a non-varying probability p, of k sample/template comparison scores, or fewer, out of n independent comparison scores being in the region of

4 An experiment can be represented by n repeated Bernoulli trials, each with two outcomes that can be labeled success, with probability p, or failure, with probability 1 − p. The probability distribution of the binomial random variable X, that is, the number of successes in n independent trials, is b(x; n, p) = (n!/(x!(n − x)!)) p^x q^(n−x), x = 0, 1, . . . , n. For example, for n = 3 and p = 0.25, the probability distribution of X can be calculated as b(x; 3, 0.25) = (3!/(x!(3 − x)!)) (0.25)^x (0.75)^(3−x), x = 0, 1, . . . , 3.


Design example: Comparing two matchers using the ROC curves

Problem formulation:

[Figure: ROC curves of matcher A and matcher B, with operating points a and b at a specified target FNMR; axes FAR and FRR, each ranging from 0 to 1.]

In biometric system design, two matchers are specified, matcher A and matcher B. These matchers are described by their ROC curves. The figure on the left shows the corresponding ROCs for these matchers and their operating points for some specified target FNMR. The problem is to choose the better matcher.

Step 1: Understanding the initial data

The ROCs of the two matchers are plotted in a form acceptable for comparison (the same type of ROC and scaling factors). The ROC shows the trade-off between FMR and FNMR with respect to the threshold T. For a given operational matcher, the operating point is specified by the particular threshold T.

Step 2: Comparison of two matchers

It follows from the ROC characteristics of the matchers that:

◮ For every specified FMR, matcher A has a lower FNMR;

◮ For every specified FNMR, matcher A has a lower FMR.

Conclusion

Matcher A is better than matcher B for all possible thresholds T.

Fig. 12: Technique for comparing two matchers using the ROC curves (Example 12).

integration would be

1 − β = P(i ≤ k) = Σ_{i=0}^{k} b(i; n, p)    (4)

where the binomial sums are available from a table for different values of n and p.
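Instead of reading the binomial sums from a table, they can also be evaluated directly; the sketch below is illustrative only and assumes SciPy is available.

from scipy.stats import binom

# 1 - beta = P(i <= k) = sum_{i=0}^{k} b(i; n, p)   (Equation 4)
n, p, k = 15, 0.4, 9
one_minus_beta = binom.cdf(k, n, p)   # cumulative binomial sum
print(f"P(i <= {k}) = {one_minus_beta:.4f}")   # about 0.9662 for these values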


Example 13: (Binomial distribution.) Examples of manipulations of the binomial distribution, given n = 15 and p = 0.4, are as follows:

(a) P(i ≥ 10) = 1 − P(i < 10) = 1 − Σ_{i=0}^{9} b(i; 15, 0.4) = 1 − 0.9662 = 0.0338, where the sum 0.9662 is taken from the table.

(b) P(3 ≤ i ≤ 8) = Σ_{i=3}^{8} b(i; 15, 0.4) = Σ_{i=0}^{8} b(i; 15, 0.4) − Σ_{i=0}^{2} b(i; 15, 0.4) = 0.9050 − 0.0271 = 0.8779

(c) P(i = 5) = b(5; 15, 0.4) = Σ_{i=0}^{5} b(i; 15, 0.4) − Σ_{i=0}^{4} b(i; 15, 0.4) = 0.4032 − 0.2173 = 0.1859
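The three manipulations above can be checked numerically; this is a minimal sketch assuming SciPy's binom distribution object.

from scipy.stats import binom

n, p = 15, 0.4
print(1 - binom.cdf(9, n, p))                    # (a) P(i >= 10), about 0.0338
print(binom.cdf(8, n, p) - binom.cdf(2, n, p))   # (b) P(3 <= i <= 8), about 0.8779
print(binom.pmf(5, n, p))                        # (c) P(i = 5), about 0.1859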

Equation 4 might be inverted to determine the required size, n, of a biometric test for a given level of confidence, β, if the error probability, p, is known in advance.

3.5 The number of comparison scores

The required number of comparison scores (and test subjects) cannot be predicted prior to testing. To deal with this, Doddington's Law is to test until 30 errors have been observed.

Example 14: (Doddington's law.) If the test is large enough to produce 30 errors, we will be about 95% sure that the true value of the error rate for this test lies within about 40% of that measured.

If the test is large enough to produce 30 errors, we will be about 95% sure that the true value of the error rate for this test lies within about 40% of that measured, provided that Equation 4 is applicable. The comparisons of biometric measures will not be Bernoulli trials, and Equation 4 will not be applicable, if:

(a) Trials are not independent, and

(b) The error probability varies across the population.

Example 15: (Equation 4 is not applicable.) Trials will not be independent if users stop after a successful use and continue after a non-successful use.


Example 16: (Failure to enroll (FTE) rate.) A fingerprint biometric system may be unable to extract features from the fingerprints of certain individuals, due to the poor quality of the ridges. Thus, there is a failure to enroll (FTE) rate associated with using a single biometric trait. It has been empirically estimated that as much as 4% of the population may have poor-quality fingerprint ridges that are difficult to image with the currently available fingerprint sensors. This fact results in FTE errors.

3.6 Test size

The size of an evaluation, in terms of the number of volunteers and the number of attempts made (and, if applicable, the number of fingers/hands/eyes used per person), will affect how accurately we can measure error rates. The larger the test, the more accurate the results are likely to be.

Rules such as the Rule of 3 and the Rule of 30, detailed below, give lower bounds on the number of attempts needed for a given level of accuracy. However, these rules are overoptimistic, as they assume that error rates are due to a single source of variability, which is not generally the case with biometrics. Ten enrolment-test sample pairs from each of a hundred people is not statistically equivalent to a single enrolment-test sample pair from each of a thousand people, and will not deliver the same level of certainty in the results.

The Rule of 3 addresses the question: What is the lowest error rate that can be statistically established with a given number N of (independent, identically distributed) comparisons? This value is the error rate p for which the probability of observing zero errors in N trials, purely by chance, equals the chosen significance level, for example, 5%.

The Rule of 3

Error rate p ≈ 3/N for a 95% confidence level    (5)
Error rate p ≈ 2/N for a 90% confidence level    (6)

Example 17: (Rule of 3.) A test of 300 independent samples can be said with 95% confidence to have an error rate of 3/300 = 1% or less.
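The Rule of 3 approximates the exact bound obtained from (1 − p)^N = 1 − confidence. A short sketch (plain Python, illustrative only) compares the two for the test of Example 17.

def rule_of_3(n_trials, confidence=0.95):
    """Approximate bound on the error rate after zero errors in n_trials (Equations 5 and 6)."""
    return (3.0 if confidence >= 0.95 else 2.0) / n_trials

def exact_bound(n_trials, confidence=0.95):
    """Exact bound: the p for which zero errors in n_trials occurs with probability 1 - confidence."""
    return 1.0 - (1.0 - confidence) ** (1.0 / n_trials)

print(rule_of_3(300))     # 0.01 (1%)
print(exact_bound(300))   # about 0.0099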


The “Rule of 30”. Doddington5 proposes the Rule of 30 for helping determine the test size: to be 90% confident that the true error rate is within ±30% of the observed value, we need at least 30 errors.

The rule below generalizes to different proportional error bands:

The Rule of 30. To be 90% confident that the true error rate is within

◮ ±10% of the observed value, we need at least 260 errors;

◮ ±30% of the observed value, we need at least 30 errors;

◮ ±50% of the observed value, we need at least 11 errors.

Example 18: (Rule of 30.) If we have 30 false non-match errors in 3,000 independent genuine trials, we can say with 90% confidence that the true error rate is 30/3000 = 1% ± 30%, that is,

1% − 0.3% ≤ True error rate ≤ 1% + 0.3%
0.7% ≤ True error rate ≤ 1.3%
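The band in Example 18 can be reproduced numerically. The sketch below (illustrative only) computes the observed rate, the ±30% Rule-of-30 band, and, for comparison, a 90% normal-approximation confidence interval, which here nearly coincides with the band.

import math

errors, trials = 30, 3000
p_hat = errors / trials                       # observed error rate: 0.01

# Rule of 30 band: +/- 30% of the observed value (valid with at least 30 errors).
band = (0.7 * p_hat, 1.3 * p_hat)             # about (0.007, 0.013)

# 90% normal-approximation interval for comparison.
z = 1.645
half_width = z * math.sqrt(p_hat * (1 - p_hat) / trials)
ci = (p_hat - half_width, p_hat + half_width)

print(band)   # about (0.007, 0.013)
print(ci)     # about (0.0070, 0.0130)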

3.7 Estimating confidence intervals

With sufficiently large samples, the central limit theorem implies that the observed error rates should follow an approximately normal distribution. However, because we are dealing with proportions near to 0%, and the variance in the measures is not uniform over the population, some skewness is likely to remain until the sample size is quite large.

Confidence intervals under the assumption of normality are considered in Section 1. Often, when Equation 1 is applied, the confidence interval reaches into negative values for the observed error rate. However, negative error rates are impossible. This is due to the non-normality of the distribution of observed error rates. In these cases, special approaches are required, such as non-parametric methods. The latter reduce the need to make assumptions about the underlying distribution of the observed error rates and the dependencies between attempts.
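One common non-parametric choice is the bootstrap. The sketch below is only an illustration of the idea, not a method prescribed by the text: it resamples subjects (rather than individual attempts) with replacement, so that within-subject dependence between attempts is preserved, and reports percentile confidence limits for the error rate. The data in the usage example are synthetic.

import numpy as np

def bootstrap_error_rate_ci(errors_per_subject, attempts_per_subject,
                            n_boot=2000, alpha=0.10, seed=0):
    """Percentile bootstrap CI for an error rate, resampling subjects rather than attempts."""
    rng = np.random.default_rng(seed)
    errors = np.asarray(errors_per_subject, dtype=float)
    attempts = np.asarray(attempts_per_subject, dtype=float)
    n_subjects = len(errors)
    rates = []
    for _ in range(n_boot):
        idx = rng.integers(0, n_subjects, n_subjects)      # resample subjects with replacement
        rates.append(errors[idx].sum() / attempts[idx].sum())
    lo, hi = np.percentile(rates, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi

# Synthetic example: 100 subjects, 10 genuine attempts each, a roughly 1% error rate.
rng = np.random.default_rng(1)
attempts = np.full(100, 10)
errors = rng.binomial(10, 0.01, size=100)
print(bootstrap_error_rate_ci(errors, attempts))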

5 Doddington, G.R., Przybocki, M.A., Martin, A.F., and Reynolds, D.A. The NIST speaker recognition evaluation: Overview, methodology, systems, results, perspective. Speech Communication, 2000, 31(2-3), 225-254.




4 Problems

Problem 1: The distances Di between feature points measured in a sample of signatures are represented by a normally distributed random variable d, n(d; µ, σ), with mean µ and standard deviation σ (Fig. 13a):

(a) If µ = 40 and σ = 1.5, calculate the probability P(39 < d < 42).
Solution:

Step 1: (d1 − µ)/σ < z < (d2 − µ)/σ, that is, (39 − 40)/1.5 < z < (42 − 40)/1.5, i.e., −0.67 < z < 1.33

Step 2: P(39 < d < 42) = P(−0.67 < z < 1.33)

Step 3: P(−0.67 < z < 1.33) = P(z < 1.33) − P(z < −0.67) = 0.6568

Answer: P(39 < d < 42) = 0.6568

[Figure: standard normal density n(z; 0, 1) with the area P(−0.67 < z < 1.33) shaded.]

(b) If µ = 2.03 and σ = 0.44, calculate the probability P(d > 2.5).
Solution:

Step 1: (d − 2.03)/0.44 > (2.5 − 2.03)/0.44, that is, z > 1.07

Step 2: P(d > 2.5) = P(z > 1.07)

Step 3: P(z > 1.07) = 1 − P(z < 1.07) = 0.1423

Answer: P(d > 2.5) = 0.1423

[Figure: standard normal density n(z; 0, 1) with the area P(z > 1.07) shaded.]

(c) If µ = 5 and σ = 1.58, calculate the probability P(d = 4).
Solution: Let d1 = 1.5 and d2 = 4.5; then


Step 1: (d1 − µ)/σ < z < (d2 − µ)/σ, that is, (1.5 − 5)/1.58 < z < (4.5 − 5)/1.58, i.e., −2.22 < z < −0.32

Step 2: P(1.5 < d < 4.5) = P(−2.22 < z < −0.32)

Step 3: P(−2.22 < z < −0.32) = P(z < −0.32) − P(z < −2.22) = 0.3613

Answer: P(d = 4) ≈ P(1.5 < d < 4.5) = 0.3613

[Figure: standard normal density n(z; 0, 1) with the area P(−2.22 < z < −0.32) shaded.]


Fig. 13: The distances Di between feature points measured in a signature (a) and a hand (b) are represented by a normally distributed, n(d; µ, σ), random variable d with mean µ and standard deviation σ (Problems 1 and 2).
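The table look-ups in Problem 1 can be verified with a few lines of Python, and the same calls can be reused to check Problem 2, which follows; this sketch is illustrative only and assumes SciPy.

from scipy.stats import norm

# Problem 1(a): mu = 40, sigma = 1.5, P(39 < d < 42)
print(norm.cdf(42, loc=40, scale=1.5) - norm.cdf(39, loc=40, scale=1.5))   # about 0.656 (table value 0.6568)

# Problem 1(b): mu = 2.03, sigma = 0.44, P(d > 2.5)
print(1 - norm.cdf(2.5, loc=2.03, scale=0.44))                             # about 0.143 (table value 0.1423)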

Problem 2: The distances Di between feature points measured in a sample of hand images are represented by a normally distributed, n(d; µ, σ), random variable d with mean µ = 10 and standard deviation σ = 1.5 (Fig. 13b). Calculate

(a) P (9 < d < 11) and P (8 < d < 12)

(b) P (d < 10) and P (d < 9)

(c) P (d > 10) and P (d > 11)

(d) P (d = 10) , P (d = 9), and P (d = 11)

(e) P (8 < d < 10) and P (10 < d < 12)


Problem 3: The distances Di between feature points measured on a retina image are represented by a normally distributed, n(d; µ, σ), random variable d (Fig. 14a). The sample size is n = 36 and the sample mean is d = 2.6. The standard deviation, σ, of the population is assumed to be σ = 0.3. Calculate:

(a) 90% confidence interval for µ
Solution: Using Equation 1,

d − zα/2 · σ/√n < µ < d + zα/2 · σ/√n

For α = 0.1, α/2 = 0.05 and z0.05 = 1.645:

2.6 − 1.645 · 0.3/√36 < µ < 2.6 + 1.645 · 0.3/√36
2.52 < µ < 2.68

Answer: With 90% confidence, the true mean µ lies in the interval 2.52 < µ < 2.68 around the observed sample mean d = 2.6.

[Figure: standard normal density n(z; 0, 1) with 5% in each tail beyond ±1.645.]

(b) 95% confidence interval for µ
Solution: Using Equation 1,

d − zα/2 · σ/√n < µ < d + zα/2 · σ/√n

For α = 0.05, α/2 = 0.025 and z0.025 = 1.96:

2.6 − 1.96 · 0.3/√36 < µ < 2.6 + 1.96 · 0.3/√36
2.50 < µ < 2.70

Answer: With 95% confidence, the true mean µ lies in the interval 2.50 < µ < 2.70 around the observed sample mean d = 2.6.

[Figure: standard normal density n(z; 0, 1) with 2.5% in each tail beyond ±1.96.]

(c) 99% confidence interval for µ
Solution: Using Equation 1,


d − zα/2 · σ/√n < µ < d + zα/2 · σ/√n

For α = 0.01, α/2 = 0.005 and z0.005 = 2.575:

2.6 − 2.575 · 0.3/√36 < µ < 2.6 + 2.575 · 0.3/√36
2.47 < µ < 2.73

Answer: With 99% confidence, the true mean µ lies in the interval 2.47 < µ < 2.73 around the observed sample mean d = 2.6.

[Figure: standard normal density n(z; 0, 1) with 0.5% in each tail beyond ±2.575.]

Observation: The larger the value we choose for zα/2, the wider all the intervals become, and the more confident we can be that the sample selected will produce an interval that contains the unknown parameter µ.


Fig. 14: The distances Di between feature points measured in a retina image (a) and a gait image (b) are represented by a normally distributed, n(d; µ, σ), random variable d (Problems 3 and 4).
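A small helper (an illustrative sketch assuming SciPy) reproduces the intervals of Problem 3 and can be reused for Problem 4 below.

import math
from scipy.stats import norm

def z_confidence_interval(sample_mean, sigma, n, confidence):
    """Two-sided z-interval for the mean with known population sigma (Equation 1)."""
    z = norm.ppf(1 - (1 - confidence) / 2)        # z_{alpha/2}
    half_width = z * sigma / math.sqrt(n)
    return sample_mean - half_width, sample_mean + half_width

print(z_confidence_interval(2.6, 0.3, 36, 0.90))  # about (2.52, 2.68)
print(z_confidence_interval(2.6, 0.3, 36, 0.95))  # about (2.50, 2.70)
print(z_confidence_interval(2.6, 0.3, 36, 0.99))  # about (2.47, 2.73)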

Problem 4: The distances Di between feature points measured in a sample of gait images are represented by a normally distributed, n(d; µ, σ), random variable d (Fig. 14b). The sample size is n = 49 and the sample mean is d = 4.0. The standard deviation, σ, of the population is assumed to be σ = 0.2. Calculate:

(a) 85% confidence interval for µ

(b) 90% confidence interval for µ


(c) 95% confidence interval for µ

(d) 98% confidence interval for µ

Compare the confidence intervals

Problem 5: How large must the sample considered in Problem 3 be if we want to be:

(a) 90% confident that our estimate of µ is off by less than 0.05.
Solution: Using Equation 3, the sample size is

n = (zα/2 × σ/e)^2 = (1.645 × 0.3/0.05)^2 ≈ 98

(b) 95% confident that our estimate of µ is off by less than 0.05.
Solution: Using Equation 3, the sample size is

n = (zα/2 × σ/e)^2 = (1.96 × 0.3/0.05)^2 ≈ 138

(c) 99% confident that our estimate of µ is off by less than 0.05.
Solution: Using Equation 3, the sample size is

n = (zα/2 × σ/e)^2 = (2.575 × 0.3/0.05)^2 ≈ 239
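The same computation can be scripted. The sketch below (illustrative, SciPy assumed) implements Equation 3; in practice the result is rounded up to the next whole sample.

from scipy.stats import norm

def required_sample_size(sigma, margin, confidence):
    """Equation 3: n = (z_{alpha/2} * sigma / e)^2 (round up in practice)."""
    z = norm.ppf(1 - (1 - confidence) / 2)
    return (z * sigma / margin) ** 2

print(required_sample_size(0.3, 0.05, 0.90))   # about 97, so at least 98 samples
print(required_sample_size(0.3, 0.05, 0.95))   # about 138
print(required_sample_size(0.3, 0.05, 0.99))   # about 239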

Problem 6: How large must the sample in Problem 4 be if we want to be:

(a) 85% confident that our estimate of µ is off by less than 0.5

(b) 90% confident that our estimate of µ is off by less than 0.5

(c) 95% confident that our estimate of µ is off by less than 0.5

(d) 99% confident that our estimate of µ is off by less than 0.5

Problem 7: Estimate the lowest error rate that can be statistically established with the following number N of (independent, identically distributed) comparisons:

(a) With 90% confidence, the lowest error rate p that can be established by zero errors in 30 trials.
Solution: Using Equation 6, the lowest error rate is p ≈ 2/30 ≈ 0.07, or 7%.

(b) With 90% confidence, the lowest error rate p that can be established by zero errors in 100 trials.
Solution: Using Equation 6, the lowest error rate is p ≈ 2/100 = 0.02, or 2%.


(c) With 95% confidence, the lowest error rate p that can be established by zero errors in 30 trials.
Solution: Using Equation 5, the lowest error rate is p ≈ 3/30 = 0.1, or 10%.

(d) With 95% confidence, the lowest error rate p that can be established by zero errors in 100 trials.
Solution: Using Equation 5, the lowest error rate is p ≈ 3/100 = 0.03, or 3%.

Problem 8: Using the Rule of 30, estimate the true error rate in the following experiments:

(a) 1 error is observed in 30 independent trials

(b) 1 error is observed in 100 independent trials

(c) 10 errors are observed in 500 independent trials

(d) 50 errors are observed in 1000 independent trials

Problem 9: Suppose that a device's performance goal is to reach a 1% false non-match rate and a 0.1% false match rate. Using the Rule of 30, estimate the number of genuine attempt trials and impostor attempt trials.
Solution: 30 errors at a 1% false non-match rate implies a total of 3,000 genuine attempt trials, and 30 errors at a 0.1% false match rate implies a total of 30,000 impostor attempt trials. Note that the key assumption is that these trials are independent.

Problem 10: The distances Di between feature points measured in 100 fingerprints are represented by a normally distributed, n(x; µ, σ), random variable x with the sample mean x = 71.8 (Fig. 15a). Assuming a population standard deviation of σ = 8.9, does this seem to indicate that the mean of the distances is greater than 70? Use a 0.05 level of significance.
Solution:

Input data: x = 71.8, σ = 8.9, n = 100, µ = 70, and α = 0.05

Step 1: Formulate the hypotheses

H0 : µ = 70

H1 : µ > 70

Step 2: The critical point for α = 0.05 is z0.05 = 1.645 (from the table); the critical region is z > 1.645.

Step 3: Test statistic for the input data (x = 71.8, σ = 8.9, n = 100, µ = 70):

z = (x − µ)/(σ/√n) = (71.8 − 70)/(8.9/√100) = 2.02

Step 4 (Decision): Since 2.02 > 1.645, reject H0 and conclude that the mean is greater than 70.

[Figure: standard normal density n(z; 0, 1) with the critical region z > 1.645 shaded; the observed statistic z = 2.02 falls inside it.]
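The test statistic in Problems 10 and 11 can be computed directly; this is a minimal sketch (SciPy assumed) of a one-sample z-test with known population standard deviation.

import math
from scipy.stats import norm

def one_sample_z(sample_mean, mu0, sigma, n):
    """z statistic for testing H0: mu = mu0 when the population sigma is known."""
    return (sample_mean - mu0) / (sigma / math.sqrt(n))

# Problem 10: one-sided test at alpha = 0.05; critical value z_{0.05} = 1.645.
z = one_sample_z(71.8, 70, 8.9, 100)
print(z, z > norm.ppf(0.95))   # about 2.02, True -> reject H0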



Fig. 15: The distances Di between feature points measured in a fingerprint (a) and a face (b) are represented by a normally distributed, n(d; µ, σ), random variable d (Problems 10 and 11).

Problem 11: The distances Di between feature points measured in 50 facial images are represented by a normally distributed, n(x; µ, σ), random variable x with the sample mean x = 7.8 (Fig. 15b). Assuming a population standard deviation of σ = 0.5, does this seem to indicate that the mean of the distances is greater or less than 8? Use a 0.01 level of significance.
Solution:


Input data: x = 7.8, σ = 0.5, n = 50, µ = 8, and α = 0.01

Step 1: Formulate the hypotheses

H0 : µ = 8

H1 : µ ≠ 8

Step 2: The critical points for α/2 = 0.01/2 = 0.005 are z0.005 = ±2.575 (from the table); the critical region is z < −2.575 or z > 2.575.

Step 3: Test statistic for the input data (x = 7.8, σ = 0.5, n = 50, µ = 8):

z = (x − µ)/(σ/√n) = (7.8 − 8)/(0.5/√50) = −2.83

Step 4 (Decision): Since −2.83 < −2.575, reject H0 in favor of the alternative hypothesis H1: µ ≠ 8.

[Figure: standard normal density n(z; 0, 1) with the two-sided critical regions z < −2.575 and z > 2.575 shaded; the observed statistic z = −2.83 falls in the lower critical region.]

Problem 12: Evaluate the performance of a system that accepts at least 5 facial images of impostors as belonging to the database of 100 enrolled persons, and rejects 10 faces of persons enrolled in the database.