effects of hard and soft equality constraints on

40
Article Effects of Hard and Soft Equality Constraints on Reliability Analysis Vinicius Francisco Rofatto 1,2 , Marcelo Tomio Matsuoka 1,2,3,4 , Ivandro Klein 5,6 , Maurício Roberto Veronez 4 and Luiz Gonzaga da Silveira Jr. 4 1 Graduate Program in Remote Sensing, Federal University of Rio Grande do Sul, RS, Brazil; [email protected] (V.F.R.); 2 Institute of Geography, Federal University of Uberlandia, Monte Carmelo, 38500-000, MG, Brazil 3 Graduate Program in Agriculture and Geospatial Information, Federal University of Uberlândia, 38500-000, Monte Carmelo, MG, Brazil; [email protected] (M.T.M.) 4 Graduate Program in Applied Computing, Unisinos University, Av. Unisinos, 950, São Leopoldo, 93022-000, RS, Brazil; [email protected] (M.R.V.); [email protected] (L.G.d.S.J.) 5 Department of Civil Construction, Federal Institute of Santa Catarina, Florianopolis, 88020-300, SC, Brazil; [email protected] (I.K.) 6 Graduate Program in Geodetic Sciences, Federal University of Paraná, Curitiba, 81531-990, PR, Brazil * Correspondence: [email protected] Abstract: The reliability analysis allows to estimate the system’s probability of detecting and identifying outlier. Failure to identify an outlier can jeopardise the reliability level of a system. Due to its importance, outliers must be appropriately treated to ensure the normal operation of a system. The system models are usually developed from certain constraints. Constraints play a central role in model precision and validity. In this work, we present a detailed optical investigation of the effects of the hard and soft constraints on the reliability of a measurement system model. Hard constraints represent a case in which there exist known functional relations between the unknown model parameters, whereas the soft constraints are employed for the case where such functional relations can slightly be violated depending on their uncertainty. The results highlighted that the success rate of identifying an outlier for the case of hard constraints is larger than soft constraints. This suggested that hard constraints should be used in the stage of pre-processing data for the purpose of identifying and removing possible outlying measurements. After identifying and removing possible outliers, one should set up the soft constraints to propagate the uncertainties of the constraints during the data processing. This recommendation is valid for outlier detection and identification purpose. Keywords: Constraints; Hypothesis Testing; Outlier Detection; Monte Carlo; Quality Control; Geodesy. 1. Introduction It is very common to build models (i.e. the equation systems) based on some initial knowledge about a given problem. In other words, models are often set up in a way that the model parameters need to fulfil certain constraints. Such constraints are a prior knowledge embedded into a model to avoid a trivial solution; to guarantee the stability of estimates; to improve the precision and accuracy of the results by reducing the number of unknown parameters or accordingly by increasing the redundancy of the system; and to mitigate (or even estimate) a possible systematic effect [13]. For example, [4] adopted constraints to determine the transponder coordinates in a problem of combining satellite positioning (GNSS) of a surface platform with acoustic ranging to seafloor transponders; [510] have used constraints to model the atmospheric effects on GNSS signals; and [11] have imposed the Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 8 April 2020 doi:10.20944/preprints202004.0119.v1 © 2020 by the author(s). Distributed under a Creative Commons CC BY license.

Upload: others

Post on 22-Oct-2021

7 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Effects of Hard and Soft Equality Constraints on

Article

Effects of Hard and Soft Equality Constraints onReliability Analysis

Vinicius Francisco Rofatto 12 Marcelo Tomio Matsuoka 1234 Ivandro Klein 56 MauriacutecioRoberto Veronez 4 and Luiz Gonzaga da Silveira Jr 4

1 Graduate Program in Remote Sensing Federal University of Rio Grande do Sul RS Brazilvfrofattogmailcom (VFR)

2 Institute of Geography Federal University of Uberlandia Monte Carmelo 38500-000 MG Brazil3 Graduate Program in Agriculture and Geospatial Information Federal University of Uberlacircndia 38500-000

Monte Carmelo MG Brazil tomioufubr (MTM)4 Graduate Program in Applied Computing Unisinos University Av Unisinos 950 Satildeo Leopoldo 93022-000

RS Brazil veronezunisinosbr (MRV) lgonzagajrgmailcom (LGdSJ)5 Department of Civil Construction Federal Institute of Santa Catarina Florianopolis 88020-300 SC Brazil

ivandrokleingmailcom (IK)6 Graduate Program in Geodetic Sciences Federal University of Paranaacute Curitiba 81531-990 PR Brazil Correspondence viniciusrofattoufubr

Abstract The reliability analysis allows to estimate the systemrsquos probability of detecting andidentifying outlier Failure to identify an outlier can jeopardise the reliability level of a systemDue to its importance outliers must be appropriately treated to ensure the normal operation of asystem The system models are usually developed from certain constraints Constraints play a centralrole in model precision and validity In this work we present a detailed optical investigation ofthe effects of the hard and soft constraints on the reliability of a measurement system model Hardconstraints represent a case in which there exist known functional relations between the unknownmodel parameters whereas the soft constraints are employed for the case where such functionalrelations can slightly be violated depending on their uncertainty The results highlighted that thesuccess rate of identifying an outlier for the case of hard constraints is larger than soft constraints Thissuggested that hard constraints should be used in the stage of pre-processing data for the purpose ofidentifying and removing possible outlying measurements After identifying and removing possibleoutliers one should set up the soft constraints to propagate the uncertainties of the constraints duringthe data processing This recommendation is valid for outlier detection and identification purpose

Keywords Constraints Hypothesis Testing Outlier Detection Monte Carlo Quality ControlGeodesy

1 Introduction

It is very common to build models (ie the equation systems) based on some initial knowledgeabout a given problem In other words models are often set up in a way that the model parametersneed to fulfil certain constraints Such constraints are a prior knowledge embedded into a model toavoid a trivial solution to guarantee the stability of estimates to improve the precision and accuracyof the results by reducing the number of unknown parameters or accordingly by increasing theredundancy of the system and to mitigate (or even estimate) a possible systematic effect [1ndash3] Forexample [4] adopted constraints to determine the transponder coordinates in a problem of combiningsatellite positioning (GNSS) of a surface platform with acoustic ranging to seafloor transponders [5ndash10]have used constraints to model the atmospheric effects on GNSS signals and [11] have imposed the

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

copy 2020 by the author(s) Distributed under a Creative Commons CC BY license

2 of 40

constraints of predicted satellite clocks to improve the precise orbit determination (POD) processingduring maneuvers

The models are usually formulated with minimal constraint or extra (redundant) constraints Inthat case we are referring to the so-called equality constraints which are usually incorporated into asystem of equations in order to try to have a model well posed [12]

For the most part minimal constraints are introduced to solve to the problem of rank deficiencyin linear systems The rank deficiency is often caused by the lack (or insufficient) information abouta problem In the field of geodesy for example minimal constraints are external information whoseprimary role is to specify the coordinate system to which the network station positions will beestimated by least-squares method (LS) This problem is known as datum definition (or also zero-orderdesign or datum choice problem) [13ndash18] Several works have been investigated in the sense ofminimum-constrained adjustment and the datum choice problem in the geodetic literature focusingon topics like free-adjustment and the role of inner constraints (see eg [19ndash22])

If the number of constraints exceeds the minimum needed to solve the rank deficiency of theequation systems we say that we have redundant (or extra) constraints Extra constraints have alsobeen used to check the stability of points in geodetic deformation analysis (see eg [23ndash25]) to test thecompatibility of constraints with the observations and the rest of the constraints [1142627] So far wehave only distinguished the constraints in terms of numerical quantity

The model can also be subject to a hard and soft (or weighted) constraints Hard constraintscan often represent a case in which there exist known functional relations between the unknownparameters Soft constraints (or looser constraints) are however for which such functional relationscan slightly be violated depending on their uncertainty [327] Soft constraints may also be referred toas a pseudo-observation model [28]

The well-known least-squares (LS) has been widely used as a standard method of modelparameters estimation in geodetic applications and many others branches of modern science [29ndash42]This is due to the flexibility of the LS since no concepts from probability theory are used in formulatingthe least-squares minimisation problem

LS has the property of being a linear unbiased estimator (LUE) and some special cases it coincideswith the best linear unbiased estimator (BLUE) The estimator which has the smallest variance ofall LUEs is called the best linear unbiased estimator (BLUE) If we have the full knowledge of theprobability density function (PDF) of the measurements the method of maximum likelihood estimation(MLE) can also be applied In case of normally distributed measurements (Gauss-Markov model)the MLE estimators are identical to the BLUE ones and therefore the LS and MLE principles provideidentical results [3243]

However the presence of undesirable outliers in dataset makes LS no longer unbiased and itdoes not coincides with MLE Here we assume that an outlier is a measurement that is so probablycaused by a blunder that it is better not used or not used as it is [44] Failure to identify an outliercan jeopardise the reliability level of a system Due to its importance outliers must be appropriatelytreated to ensure the quality of data analysis [45]

In this paper we employ the iterative data snooping (IDS) which is hypothesis test-based outlierImportant to mention that IDS is not restrict to the field of geodetic statistics but it is a generallyapplicable method [4647]

IDS is an iterative outlier elimination procedure which combines estimation testing and acorrective action [4548] Parameter estimation is often conducted in the sense of the LS Thenhypothesis testing is performed with the aim to identify any outlier that may be present in datasetAfter identification the suspected outlier is then excluded from the dataset as a corrective action(ie adaptation) and the LS is restarted without the rejected measurement If the model redundancypermits this procedure is repeated until no more (possible) outliers can be identified (see eg [31] pp135) Although here we restrict ourselves to the case of one outlier at a time IDS can also be applied

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

3 of 40

for the case of multiples (simultaneous) outliers [49] For more details about multiples (simultaneous)outlier refers to [50ndash52]

There are chances of correct and false decisions of IDS because these procedure is based onstatistical hypothesis testing Recently Rofatto et al [45] have provided an algorithm based onMonte Carlo to determine the probability levels associated with IDS In that case they described sixclasses of decisions for IDS namely probability of correct identification (PCI) probability of misseddetection (PMD) probability of wrong exclusion (PWE) probability of over-identification positive(Pover+) probability of over-identification negative (Poverminus) and probability of statistical overlap (Pol)as follows

bull PCI Probability of identifying and removing correctly an outlying measurementbull PMD Probability of not detecting the outlier (ie Type II decision error for IDS)bull PWE Probability of identifying and removing a non-outlying measurement while the lsquotruersquo outlier

remains in the dataset (ie Type III decision error [53] for IDS)bull Pover+ Probability of identifying and removing correctly the outlying measurement and othersbull Poverminus Probability of identifying and removing more than one non-outlying measurement

whereas the lsquotrue outlierrsquo remains in the datasetbull Pol occurs in cases where one alternative hypothesis has the same distribution as the another one

These hypotheses cannot be distinguished because their test statistics are numerically the sameviolating the IDS rule of one outlier at a time In that case they are nonseparable and an outliercannot be identified In other words it corresponds to the probability of flagging simultaneouslytwo (or more) measurements as outliers

Based on the probability of correct detection (PCD) and probability of correct identification(PCI) the minimal biases MDB (Minimal Detectable Bias) and MIB (Minimal Identifiable Bias) canbe computed as sensitivity indicators for outlier detection and identification respectively Outlierdetection only informs whether or not there might have been at least one outlier However thedetection does not tell us which measurement is an outlier The localization of the outlier is a problemof outlier identification ie outlier identification implies the execution of a search among themeasurements for the most likely outlier [45] Therefore the smallest value of an outlier that can bedetected given a certain PCD defines the MDB On the other hand the smallest value of an outlier thatcan be identified given a certain PCI defines the MIB

In this paper we investigate the effects of models subject to constraints (minimum redundanthard and soft) on the probability levels associated with IDS It is important to emphasised that if astandard-deviation of a constraint (or a set of a constraint) is changed from zero to a nonzero value itis called ldquorelaxationrdquo of the constraint [28] Here we also evaluate the effect of relaxing constraintson the MIB and MDB This kind of assessment is a kind of sensitivity analysis We also highlight thatthe task of clustering a set of geodetic measurements is applied for the first time in this study Weshow that the clusters can be defined according to two deterministic parameters local redundancyand correlation between the outlier test statistics Moreover critical values optimized by Monte Carlomethod were used here [4546] in order to compute the decision classes associated with IDS ie PCI PMD PWE Pover+ Poverminus and Pol

The rest of the paper is organised as follows the elements involved with IDS are provided inSection 2 Next the method used to compute the probabilities is given in Section 3 The results for acase study are showed in Section 4 and their discussion are emphasised in Section 5 Finally the mainhighlights of this research are provided in Section 6

2 Theoretical Framework

The IDS is an important case of multiple hypothesis testing In this section therefore we brieflypresent the principal elements involved for the case of multiple testing

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

4 of 40

First the null hypothesis denoted byH0 is formulated under the condition that random errorsare normally distributed with expectation zero ie in the absence of outliers Thus the null hypothesisH0 of the standard GaussndashMarkov model in the linear or linearised form is given by [33]

H0 Ey = Ax +Ee = Ax Dy = Qe (1)

where E is the expectation operator D is the dispersion operator y isin Rntimes1 is the vector ofmeasurements A isin Rntimesu is the coefficient matrix x isin Rutimes1 is the unknown parameter vectore isin Rntimes1 is the unknown vector of measurement errors and Qe isin Rntimesn is the positive-definitecovariance matrix of the measurements y

Under normal working conditions (ieH0) the measurement error model is then given by

e sim N(0 Qe) (2)

First we assume that the coefficient matrix A suffers from a rank deficiency ie uminus rank(A) gt 0In that case minimal constraints are added to solve the problem of the rank-deficient system Anexample of how to handle the problem of the rank-deficient will be given in the section 41 where aminimal constraint in Equation 51 has been added to the matrix A in Equation 50 This means that thecolumns of the design matrix A in Equation 1 become linearly independent ie the matrix A becomefull rank such that uminus rank(A) = 0

The best linear unbiased estimator (BLUE) of e underH0 is the well-known estimated least-squaresresidual vector e isin Rntimes1 which is given by

e = yminus Ax

= yminus A(ATW A)minus1(ATWy)

= Ax + eminus A(ATW A)minus1(ATW(Ax + e))

= eminus A(ATW A)minus1(ATWe)

= (Iminus A(ATW A)minus1 ATW)e

= Re

(3)

with x isin Rutimes1 being the BLUE of x under H0 W isin Rntimesn is the known matrix of weights taken asW = σ0

2Qminus1e where σ2

0 is the variance factor I isin Rntimesn is the identity matrix and R isin Rntimesn is knownas the redundancy matrix The R matrix is an orthogonal projector that projects onto the orthogonalcomplement of the range space of A The main diagonal elements of matrix R are known as localredundancy numbers (denoted by ri of the the system The larger the number of local redundancy fora given measurement the larger the degree of importance of that measurement for the model ie thelarger the absorption of a possible error of that measurement into their corresponding least-squaresresidual

The degrees of freedom r (ie the overall redundancy) of the model underH0 (Equation (1)) is

r = rank(Qe) = nminus rank(A) = nminus u where (4)

Qe = Qe minus σ02 A(ATW A)minus1 AT (5)

On the other hand an alternative model is proposed when there are doubts about the reliabilitylevel of the model underH0 Here we assume that the validity of the null hypothesisH0 in Equation (1)can be violated if the dataset is contaminated by outliers The model in an alternative hypothesis

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

5 of 40

denoted by HA is to oppose Equation (1) by an extended model that includes the unknown vectornabla isin Rqtimes1 of deterministic bias parameters as follows ([2931])

HA y = Ax + Cnabla+ e =(

A C)( x

nabla

)+ e (6)

where C isin Rntimesq is the matrix that relates bias parameters ie the values of the outliers to observationsWe restrict ourselves to the matrix (A C) having full column rank such that

r = rank(

A C)= u + q le n (7)

One of the most used procedures based on hypothesis testing for outliers in linear (or linearised)models is the well-known data snooping method [2954] This procedure consists of screening eachindividual measurement for the presence of an outlier [49] In that case data snooping is based on alocal model test such that q = 1 and therefore the n alternative hypothesis is expressed as

H(i)A y = Ax + cinablai + e =

(A ci

)( xnablai

)+ e foralli = 1 middot middot middot n (8)

Now matrix C in Equation (6) is reduced to a canonical unit vector ci which consists exclusivelyof elements with values of 0 and 1 where 1 means that the ith bias parameter of magnitude nablai affectsthe ith measurement and 0 means otherwise In that case the rank of (A ci) isin Rntimes(u+1) and the vector

nabla in Equation (6) reduces to a scalar nablai in Equation (8) ie ci=(

0 0 0 middot middot middot 1ith 0 middot middot middot 0)T

When q = nminus u an overall model test is performed For more details about the overall model test seefor example [5556]

Note that the alternative hypothesisH(i)A in Equation (8) is formulated under the condition that

the outlier acts as a systematic effect by shifting the random error distribution underH0 by its ownvalue [44] In other words the presence of an outlier in a dataset can cause a shift of the expectationunderH0 to a nonzero value Therefore hypothesis testing is often employed to check whether thepossible shifting of the random error distribution under H0 by an outlier is in fact a systematiceffect (bias) or merely a random effect This hypothesis test-based approach is called the mean-shiftmodel (see eg [1823294647525457ndash65]

In the context of the mean-shift model the test statistic involved in data snooping is given by thenormalised least-squares residual denoted by wi This test statistic also known as Baardarsquos w-testis given as follows

wi =ci

TQminus1e eradic

ciTQminus1e QeQminus1

e ci

foralli = 1 middot middot middot n (9)

The alternative hypothesis in Equation (8) is formulated in the sense that There is at least oneoutlier in the vector of measurements yirdquo [46] In that case we are interested in knowing which of thealternative hypotheses may lead to the rejection of the null hypothesis with a certain probability Thismeans testingH0 againstH(1)

A H(2)A H(3)

A H(n)A This is known as multiple hypothesis testing (see

eg [47485355575866ndash70]) In that case the test statistic coming into effect is the maximum absoluteBaardarsquos w-test value (denoted by max-w) which is computed as [47]

max-w = maxiisin1middotmiddotmiddot n

|wi| (10)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

6 of 40

The decision rule for this case is given by

Accept H0 i f max-w le k

Otherwise

Accept H(i)A i f max-w gt k

(11)

The decision rule in Equation 11 says that if none of the n w-tests get rejected then we accept thenull hypothesisH0 If the null hypothesisH0 is rejected in any of the n tests then one can only assumethat detection occurred In other words if the max-w is larger than some percentile of its probabilitydistribution (ie some critical value k) then there is evidence that there is an outlier in the datasetTherefore outlier detection only informs us whether the null hypothesisH0 is accepted or not [45]

However the detection does not tell us which alternative hypothesisH(i)A would have led to the

rejection of the null hypothesisH0 The localisation of the alternative hypothesis which would haverejected the null hypothesis is a problem of outlier identification Outlier identification implies theexecution of a search among the measurements for the most likely outlier In other words one seeks tofind which of Baardarsquos w-test is the maximum absolute value max-w and if that max-w is greater thansome critical value k

Therefore the data snooping procedure of screening measurements for possible outliers is actuallyan important case of multiple hypothesis testing and not single hypothesis testing Moreover note thatoutlier identification only happens when outlier detection necessarily exists ie ldquooutlier identificationrdquoonly occurs when the null hypothesisH0 is rejected However correct detection does not necessarilyimply correct identification [475568]

In a special case of having only one single alternative hypothesis one should decide betweenthe null hypothesis H0 and only one single alternative hypothesis H(i)

A of Equation (8) In that casethe false decisions are restricted to Type I error and Type II error The probability of a Type I Errorα0 is the probability of rejecting the null hypothesis H0 when it is true whereas the probability of aType II error β0 is the probability of failing to reject the null hypothesisH0 when it is false (note theindex lsquo0rsquo represents the case in which a single hypothesis is tested) Instead of α0 and β0 there is theconfidence level CL = 1minus α0 and the power of the test γ0 = 1minus β0 respectively The first deals withthe probability of accepting a true null hypothesisH0 the second addresses the probability of correctlyaccepting the alternative hypothesisH(i)

A In that case given a probability of a Type I decision error α0we find the critical value k0 as follows

k0 = Φminus1(

1minus α0

2

)(12)

where Φminus1 denotes the inverse of the cumulative distribution function (cdf) of the two-tailed standardnormal distribution N(0 1)

The normalised least-squares residual wi follows a standard normal distribution with theexpectation that Ewi = 0 if H0 holds true (there is no outlier) On the other hand if the systemis contaminated with a single outlier at the ith location of the dataset (ie under H(i)

A ) then theexpectation of wi is

Ewi =radic

λ0 =radic

ciTQminus1e QeQminus1

e cinabla2i (13)

where λ0 is the non-centrality parameter for q = 1 Note therefore that there is an outlier thatcauses the expectation of wi to become

radicλ0 The square-root of the non-centrality parameter

radicλ0

in Equation (13) represents the expected mean shift of a specific w-test In such a case the termci

TQminus1e QeQminus1

e ci in Equation (13) is a scalar and therefore it can be rewritten as follows [71]

|nablai| = MDB0(i) =

radicλ0

ciTQminus1e QeQminus1

e ci foralli = 1 middot middot middot n (14)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

7 of 40

where |nablai| is the Minimal Detectable Bias (MDB0(i) ) for the case in which there is only one singlealternative hypothesis which can be computed for each individual alternative hypothesis according toEquation (8)

For a single outlier the variance of an estimated outlier denoted by σ2nablai

is

σ2nablai

=(

ciTQminus1

e QeQminus1e ci

)minus1 foralli = 1 middot middot middot n (15)

Thus the MDB can also be written as

MDB0(i) = σnablai

radicλ0 foralli = 1 middot middot middot n (16)

where σnablai =radic

σ2nablai

is the standard deviation of estimated outlier nablaiThe MDB in Equations (14) or (16) of an alternative hypothesis is the smallest-magnitude outlier

that can lead to the rejection of the null hypothesisH0 for a given α0 and β0 Thus for each model ofthe alternative hypothesisH(i)

A the corresponding MDB can be computed [477273] The limitation ofthis MDB is that it was initially developed for the binary hypothesis testing case In that case the MDBis a sensitivity indicator of Baardarsquos w-test when only one single alternative hypothesis is taken intoaccount In this article we are confined to multiple alternative hypotheses Therefore both the MDBand MIB are computed by considering the case of multiple hypothesis testing

For a scenario coinciding with the null hypothesisH0 under multiple testing hypothesis thereis the probability of incorrectly identifying at least one alternative hypothesis This type of wrongdecision is known as the family-wise error rate (FWE) The FWE is defined as

FWE = αprime =le 1minus (1minus α0)n (17)

which is approximatelyFWE = αprime le ntimes α0 (18)

where α0 is the significance level for an individual test The quantity in Equation (18) is just equal tothe upper bound of the Bonferroni inequality ie αprime le nα [74] For example if the FWE level (αprime) is005 and one is running 5 tests then each test will have an α0 of 0055 = 001 In other words one usesa global Type I Error rate αprime that combines all tests under consideration instead of an individual errorrate α0 that only considers one test at a time time [70] In that case the critical value kbon f is computedas

kbon f = Φminus1(

1minus αprime

2n

)(19)

The Bonferroni in Equation (18) is a good approximation for the case in which alternativehypotheses are independent In practice however the test results always depend on each otherto some degree because we always have a correlation between w-tests The correlation coefficientbetween any Baardarsquos w-test statistic (denoted by ρwi wj ) such as wi and wj is given by [57]

ρwi wj =ci

TQminus1e QeQminus1

e cjradicciTQminus1

e QeQminus1e ci

radiccjTQminus1

e QeQminus1e cj

forall(i 6= j) (20)

The correlation coefficient ρwi wj can assume values within the range [minus1 1]Here the extreme normalised residuals max-w (ie maximum absolute) in Equation (10) are

treated directly as a test statistic Note that when using Equation (10) as a test statistic the decisionrule is based on a one-sided test of the form max-w le k However the distributions of max-w cannotbe derived from well-known test distributions (eg normal distribution) The procedure to computethe critical value of max-w is given step-by-step by Rofatto et al [45]

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

8 of 40

The other side of the multiple testing problem is the situation in which there is an outlier in thedataset In that case apart from Type I and Type II errors there is a third type of wrong decisionassociated with Baardarsquos w-test Baardarsquos w-test can also flag a non-outlying observation whilethe lsquotruersquo outlier remains in the dataset We are referring to the Type III error [53] also referred toas the probability of wrong identification (PWI) The description of the Type III error involves aseparability analysis between alternative hypotheses [57666869] Therefore we are now interested inthe identification of the correct alternative hypothesis In that case the non-centrality parameter inEquation (13) is not only related to the sizes of Type I and Type II decision errors but also dependenton the correlation coefficient ρwi wj given by Equation (20)

On the basis of the assumption that one outlier is in the ith position of the dataset (ie H(i)A is

rsquotruersquo) the probability of a Type II error (also referenced as the probability of ldquomissed detectionrdquo denotedby PMD) for multiple testing is

PMD = P(⋂n

i=1|wi| le k

∣∣∣ H(i)A true

) (21)

and the size of a Type III wrong decision (also called ldquomisidentificationrdquo denoted by PWI) is given by

PWI =n

sumi=1P(|wj| gt |wi| foralli |wj| gt k(i 6= j)

∣∣∣ H(i)A true

)(22)

On the other hand the probability of correct identification (denoted by PCI) is

PCI = P(|wi| gt |wj| forallj |wi| gt k(i 6= j)

∣∣∣ H(i)A true

)(23)

with1minusPCI = 1minusPCD + PWI = PMD + PWI (24)

Note that the three probabilities of missed detection PMD wrong identification PWI and correctidentification PCI sum up to unity ie PMD + PWI + PCI = 1

The probability of correct detection PCD is the sum of the probability of correct identification PCI(selecting a correct alternative hypothesis) and the probability of misidentification PWI (selecting oneof the n-1 other hypotheses) ie

PCD = PCI + PWI (25)

The probability of wrong identification PWI is identically zero PWI = 0 when the correlationcoefficient is exactly zero ρwi wj = 0 In that case we have

PCD = PCI = 1minusPMD (26)

The relationship given in Equation (26) would only happen if one neglected the nature of thedependence between alternative hypotheses In other words this relationship is valid for the specialcase of testing the null hypothesisH0 against only one single alternative hypothesisH(i)

A Since the critical region in multiple hypothesis testing is larger than that in single hypothesis

testing the Type II decision error (ie PMD) for the multiple test becomes smaller [47] This meansthat the correct detection in binary hypothesis testing (γ0) is smaller than the correct detection PCDunder multiple hypothesis testing ie

PCD gt γ0 (27)

Detection is easier in the case of multiple hypothesis testing than single hypothesis testingHowever the probability of correct detection PCD under multiple testing is spread out over allalternative hypotheses and therefore identifying is harder than detecting From Equation (25) it is

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

9 of 40

also noted that detection does not depend on identification However outlier identification dependson correct outlier detection Therefore we have the following inequality

PCI le PCD (28)

Note that the probability of correct identification PCI depends on the probability of misseddetection PMD and wrong identification PWI for the case in which data snooping is run only once iea single round of estimation and testing However in this paper we deal with data snooping in itsiterative form (ie IDS) and therefore the probability of correct identification PCI depends on otherdecision rules

In contrast to the data snooping single run the success rate of correct detection PCD forIDS depends on the sum of the probabilities of correct identification PCI wrong exclusion (PWE)over-identification cases (Pover+ and Poverminus) and statistical overlap (Pol) ie

PCD = 1minusPMD = PCI + PWE + Pover+ + Poverminus + Pol (29)

It is important to mention that the probability of correct detection is the complement of theprobability of missed detection Note from Equation (29) that the probability of correct detection PCDis available even for cases in which the identification rate is null PCI = 0 However the probabilityof correct identification (PCI) necessarily requires that the probability of correct detection PCD begreater than zero For the same reasons given for the data snooping single run in the previous sectiondetecting is easier than identifying In that case we have the following relationship for the success rateof correct outlier identification PCI

PCI = PCD minus (PWE + Pover+ + Poverminus + Pol) (30)

such asexist(PCI) isin [0 1] lArrrArr (PCD) gt 0 (31)

It is important to mention that the wrong exclusion PWE describes the probability of identifyingand removing a non-outlying measurement while the lsquotruersquo outlier remains in the dataset In otherwords PWE is the Type III decision error for IDS) The overall wrong exclusion PWE is the result of thesum of each individual contribution to PWE ie

PWE =nminus1

sumi=1PWE(i) (32)

On the basis of the probability levels of correct detection PCD and correct identification PCI the sensitivity indicators of minimal biasesmdashMinimal Detectable Bias (MDB) and Minimal IdentifiableBias (MIB)mdashfor a given αprime can be computed as follows

MDB = arg minnablaiPCD(nablai) gt PCD foralli = 1 middot middot middot n (33)

MIB = arg minnablaiPCI(nablai) gt PCI foralli = 1 middot middot middot n (34)

Equation (33) gives the smallest outlier nablai that leads to its detection for a user-defined correctdetection rate PCD whereas (34) provides the smallest outlier nablai that leads to its identification for auser-defined correct identification rate PCI

As a consequence of the inequality in (28) the MIB will be larger than MDB ie MIB ge MDBFor the special case of having only one single alternative hypothesis there is no difference between theMDB and MIB [55] The computation of MDB0 is easily performed by Equations (14) or (16) whereasthe computation of the MDB in Equation (33) and the MIB in Equation (34) must be computed using

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

10 of 40

Monte Carlo because the acceptance region (as well as the critical region) for the case of multiplealternative hypotheses is analytically intractable

3 Material and Methods

Here we use the procedure provided by Rofatto et al [45] to compute the probability levelsassociated with IDS as well as to estimate the minimal biases mdash Minimal Detectable Bias(MDB) andMinimal Identifiable Bias (MIB) The procedure is summarised in Figure 1

The procedure to compute the critical value of max-w (k) is given step-by-step as follows

1 Specify the probability density function (pdf) of the w-test statistics The pdf assigned to thew-test statistics under anH0-distribution is

(w1 w2 w3 middot middot middot wn)T sim N (0Rw) (35)

where Rw isin Rntimesn is the correlation matrix with the main diagonal elements equal to 1 and theoff-diagonal elements are the correlation between the w-test statistics computed by Equation (20)

2 In order to have w-test statistics under H0 uniformly distributed random number sequencesare produced by the Mersenne Twister algorithm and then they are transformed into a normaldistribution by using the BoxndashMuller transformation [75] BoxndashMuller has already been usedin geodesy for Monte Carlo experiments [467677] Therefore a sequence of m random vectorsfrom the pdf assigned to the w-test statistics is generated according to Equation (35) In that casewe have a sequence of m vectors of the w-test statistics as follows[

(w1 w2 w3 middot middot middot wn)T(1)

(w1 w2 w3 middot middot middot wn)T(2)

middot middot middot (w1 w2 w3 middot middot middot wn)T(m)

](36)

3 Compute the test statistic by Equation (10) for each sequence of w-test statistics Thus we have(max

iisin1middotmiddotmiddot n|wi|(1) max

iisin1middotmiddotmiddot n|wi|(2) middot middot middot max

iisin1middotmiddotmiddot n|wi|(m)

)(37)

4 Sort in ascending order the maximum test statistic in Equation (37) getting a sorted vector wsuch that

w(1) lt w(2) w(3) middot middot middot lt w(m) (38)

The sorted values w in Equation (38) provide a discrete representation of the cumulative densityfunction (cdf) of the maximum test statistic max-w

5 Determine the critical value k as follows

k = w[(1minusαprime)timesm] (39)

where [] denotes rounding down to the next integer that indicates the position of the selectedelements in the ascending order of w This position corresponds to a critical value for a stipulatedoverall false alarm probability αprime This can be done for a sequence of values αprime in parallel

It is important to mention that the probability of a Type I decision error for multiple testing αprime islarger than that of Type I for single testing α0 This is because the critical region in multiple testing islarger than that in single hypothesis testing

After finding the critical value k the procedure based on Monte Carlo is also applied to computethe probability levels of IDS when there is an outlier in the dataset The overview of of the mainelements involved with the IDS was detailed in Section 2 The flowchart is displayed in Figure 1

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

11 of 40

Inp

ut

Functional amp Stochastic Models Significance level (αrsquo) Outlier magnitude range (nablai) Probability of correct detection ( ) Probability of correct identification ( )

CD CI

Gen

erat

e

ran

do

m d

raw

s

Generate a sample of random errors

119942~119925(120782 Q119942)

Compute the total error ε = e + cinablai

Compute the least-squares residual vector e = Rε

Iterativ

e ou

tlier elimin

ation

pro

cess

Compute the maximum w-test statistic

max wi

max wi gt k

End

Identify and remove the measurement from the dataset

Size of max wi gt 1

det ATWA gt 0

Compute w-test statistics

Compute and store nCI nMD nWE nover+ nover- and nol

Co

mp

ute th

e pro

bab

ilities

Number of Monte Carlo experiments (m) = 200000

Compute the probabilities CI MD WE over+ over- and ol

Sensitivity indicators Outlier detection (MDB)

amp Outlier identification (MIB)

YES NO

YES

NO

YES NO

YES

NO

Figure 1 Flowchart of the algorithm to compute the probability levels of Iterative Data Snooping (IDS)for each measurement in the presence of an outlier [45]

The steps displayed in Figure 1 are detailed as follows

1 First random error vectors are synthetically generated on the basis of a multivariate normaldistribution because the assumed stochastic model for random errors is based on the matrixcovariance of the observations Here we use the Mersenne Twister algorithm [78] to generate asequence of random numbers and BoxndashMuller [75] to transform it into a normal distribution

2 The total error (ε) is a combination of random errors and its corresponding outlier is given asfollows

ε = e + cinablai (40)

The magnitude intervals of simulated outliers are user-defined The magnitude intervals arebased on the standard deviation of the observation (σ) eg |3σ| to |6σ| Since the outlier can bepositive or negative the proposed algorithm randomly selects the signal of the outlier (for q = 1)Here we use the discrete uniform distribution to select the signal of the outlier

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

12 of 40

In Equation (40) e is the random error generated from the normal distribution according toEquation (2) and the second part cinablai is the additional parameter that describes the alternativemodel according to Equation (40)

3 Next we compute the least-squares residuals vector according to Equation (3) but now we usethe total error (ε) in Equation 40 as follows

e = Rε (41)

4 Run the IDS In this step the test statistic is computed according to (9) Then the maximumtest statistic value is obtained according to Equation (10) Now the decision rule is based on thecritical value k computed by Monte Carlo in previous stage After identifying the measurementsuspected to be the most likely outlier it is excluded from the model and least-squares estimationand data snooping are applied iteratively until there are no further outliers identified in thedataset If two or more observations are simultaneously detected (ie if maxminusw gt k and the sizeof maxminusw gt 1 then the IDS is ended) Furthermore every time that a measurement suspectedto be the most likely outlier is removed from the model we check whether the normal matrixATW A is invertible or not If the determinant of ATW A is 0 det|ATW A| = 0 then there isa necessary and sufficient condition for a square matrix ATW A to be non-invertible In otherwords the IDS is ended when det|ATW A| = 0 If no outlier is detected (ie maxminusw lt k) thenthe IDS is also ended

The IDS procedure is performed for m experiments of random error vectors for each experimentcontaminated by an outlier in the ith measurement Therefore for each measurementcontaminated by an outlier there are υ = 1 m experiments

5 After running m=200000 experiments [47] the probabilities associated with IDS are computed(note m refers to the total number of Monte Carlo experiments) The probability of correctidentification (PCI) is the ratio between the number of times that the outlier is correctly identified(denoted as nCI) and m experiments ie

PCI =nCIm

(42)

Similar to Equation (42) the wrong decisions are computed as

PMD =nMD

m(43)

where nMD is the number of experiments in which IDS does not detect the outlier (PMDcorresponds to the rate of missed detection)

PWE =nWE

m(44)

where nWE is the number of experiments in which the IDS procedure flags and removes onlyone single non-outlying measurement while the lsquotruersquo outlier remains in the dataset (PWE is thewrong exclusion rate)

Pover+ =nover+

m(45)

where nover+ is the number of experiments in which IDS correctly identifies and removes theoutlying measurement and others and Pover+ corresponds to its probability

Poverminus =noverminus

m(46)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

13 of 40

where noverminus represents the number of experiments in which IDS identifies and removes morethan one non-outlying measurement whereas the lsquotrue outlierrsquo remains in the dataset (Poverminus isthe probability corresponding to this error probability class)

Pol =nolm

(47)

where nol is the number of experiments in which the detector in Equation (10) flags two (or more)measurements simultaneously during a given iteration of IDS Here this is referred to as thenumber of statistical overlap nol and Pol corresponds to its probability

6 Finally the sensitivity indicators (MDB and MIB) are computed based on Equation 33 andEquation 34 respectively

In this paper the probability levels associated with IDS were computed for each observationindividually and for each outlier magnitude However they were grouped into clusters based onnumber of local redundancy (ri) and maximum absolute correlation between the w-test statistics(ρwi wj ) Furthermore we take care to control the family-wise error rate

4 Case study of Levelling Geodetic Network

The analyse of the constraints effects on the probability levels of IDS are performed by an exampleof a levelling geodetic network

A levelling geodetic network is a set of points located on the Earthrsquos surface or near it andinterconnected by height difference measurements Some of these points are associated with a verticalheight reference system (ie some points have known height) whereas others are parameters (unknownheights) The term known here means that those points with known heights are constraints necessaryto ensure that parameters (unknown heights) are sufficiently estimable In the sense of modern geodeticreference systems realisation and unification of a vertical height system consists of a combination ofGNSS (Global Navigation Satellite System) and spirit levelling with geoid models We will not gointo the details about height geodetic reference system and frame (Readers who wish to have furtherdetails on that issue please see eg [79])

The mathematical model associated with a geodetic levelling network is linear (ie the levellingmeasurements bear a known linear relationship with the unknown heights) Geodetic networksusually has more measurements than parameters (ie n gt u) ie we have a redundant measurementsystem However due to intrinsic random errors in a system redundant measurements often leadto an inconsistent system of equations To make the system consistent we have to introduce theinformation about the random measurement errors The stochastical properties of the measurementerrors are directly associated with the assumption of the probability distribution of these errors Ingeodesy and many other scientific branches the well-known normal distribution is one of the mostused as measurement error model [80] Because of this the model ceases to be purely mathematicaland becomes a statistical model with functional and stochastic part

In the absence of outliers ie under null hypothesisH0 in Equation (1) the model for levellinggeodetic network can be written as follows

∆himinusj + e∆himinusj= hj minus hi (48)

where ∆himinusj is the height difference measured from point i to j and e∆himinusjis the random error associated

with the levelling measurement For a only one single levelling line one of these points is the constraint(known height) from which the height of another point is determined The point with known height isalso referred to as control point or benchmark In a geodetic network on the other hand we have a setof levelling lines that connect both from points of unknown height and from points of known height(constraints) Under normal working conditions (ie H0) the measurement errors model is then givenby Equation (2)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

14 of 40

In this case study we demonstrate the application of the proposed algorithm by [45] subject to adifferent constraints scenarios for that levelling geodetic network

Recent work has been focused on non-stochastic constraints (hard constraints) Here we alsoconsider cases where constraints are subject to a random errors (ie soft constraints) That softconstraints are non-deterministic and therefore they are measured model elements that can bedesignated as a priori information The approach of this example can be applied in the designstage of geodetic network analysis [6265] Furthermore the experiments performed here can also beextended to geodetic deformation analysis when the deformation effects are unmodelled in the senseof deterministic form [28]

41 Problem description

To analyse the constraints effects on the IDS an example is taken from a geodetic levellingnetwork with 12 height differences between the points The equipment used to measure the leveldifference can be an electronic digital level In that case the levelling measurement system comprisesof a special bar-coded staff (also called barcode rod) and a digital level (instrument) A digital level isbasically a telescope that enables a horizontal line of sight Digital levels consist of additional electronicimage processing components to automatically read and analyse digital (bar coded) levelling staffswhere the graduation is replaced by a manufacturer dependent code pattern Generally the result isautomatically stored in the data collector of the digital level An example of a digital level ndash bar-codestaff system is displayed in the Figure 2 For more details about digital level see eg [81ndash84]

Figure 2 Example of a digital level ndash bar-code staff system [45]

The standard-deviation of the uncorrelated measurements were the same and taken equal toσ = 1mm The points are indicated as A to G The eight network configuration are displayed in Figure3(a b c d and e) and their details are given as follows

1 Figure 3(a) Network with 1 hard constraint (ie network minimally constrained) Since thedimension of the network is 1D the minimum information necessary to estimate the unknownheights is one The height of G was fixed as control point (hard constraint) and 6 unknown heights(ABCDEF) were minimally constrained Therefore the redundancy of the system (or overalldegrees of freedom) was r = nminus rank(A) = nminus u = 12minus 6 = 6

2 Figure 3(b) Network with one extra hard constraint (ie two hard constraints) The heights A andD were taken as hard constraints (ie heights A and D were fixed) The redundancy of the system

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

15 of 40

in that case was r = 12minus 5 = 7 with 5 unknown heights (BCEFG) over-constrained

3 Figure 3(c) Network with two extra hard constraints (ie three hard constraints)The heights A Dand G were taken as hard constraints In that case the redundancy of the system was r = 12minus 4 = 8

4 Figure 3(d) Network with two soft constraints (A and D) In that case a standard-deviation largerthan zero was assigned to both constraints (ie σc gt 0 In other words A and D were processedas being both observations and unknown parameters ie A and D were pseudo-observationsHere the both constraints were simultaneously relaxed by considering their uncertainties 10 timesworse than measurements (ie σc = 10times σ = 10mm) 10 times better than measurements (ieσc = 01mm) and their uncertainties equal to the measurements (σc = 1mm) In that case theredundancy of the system was r = 14minus 7 = 7

5 Figure 3(e) Network processed with A D and G as pseudo-observations Those three constraintswere simultaneously relaxed by considering their standard-deviations equal to σc = 10mm (10 timesworse than measurements) σc = 01mm (10 times better than measurements) and σc = 1mm (thesame as the measurements) In that case the redundancy of the system was r = 15minus 7 = 8

y1

y2

y3

y4

y5

y6

y7 y8

y9 y10

B C

D

E F

G y12 y11

A

y1

y2

y3

y4

y5

y6

y7 y8

y9 y10

A

B C

D

E F

G y12 y11

y1

y2

y3

y4

y5

y6

y7 y8

y9 y10

A

B C

D

E F

G y12 y11

(a) (b)

(c) (d)

(e)

Unknown Heights Soft Constraints Hard Constraints

y1

y2

y3

y4

y5

y6

y7 y8

y9 y10

A

B C

D

E F

G y12 y11

y1

y2

y3

y4

y5

y6

y7 y8

y9 y10

A

B C

D

E F

G y12 y11

Figure 3 Levelling Geodetic Network subject to different constraint scenarios

The following system of equations for that problem is given by

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

16 of 40

y1 + e1 = hB minus hA

y2 + e2 = hC minus hB

y3 + e3 = hD minus hC

y7 + e7 = hB minus hG

y8 + e8 = hC minus hG

y11 + e11 = hB minus hF

y12 + e12 = hC minus hE

(49)

The functional model (A) for the system of equations in 49 is given by

A =

minus1 1 0 0 0 0 00 minus1 1 0 0 0 00 0 minus1 1 0 0 00 0 0 minus1 1 0 00 0 0 0 minus1 1 01 0 0 0 0 minus1 00 1 0 0 0 0 minus10 0 1 0 0 0 minus10 0 0 0 1 0 minus10 0 0 0 0 1 minus10 1 0 0 0 minus1 00 0 1 0 minus1 0 0

(50)

Note that the rank defect of the matrix A is uminus rank(A) = 7minus 6 = 1 In that case there is neededto add at least one constraint in order to avoid rank deficiency of the matrix A This is guaranteedwhen we take one height as known For example from the network in Figure 3(a) we have added theheight G as known (ie as a hard constraint) In that case the constraint equation should be addedinto the system in 49 ie

y13 = hG with σy13 = 0 (51)

noticing that because the standard deviation is zero the observation is non-stochastic (hard constraint)and the residual ey13 = 0 This can generate problems in the inversion of the covariance matrix of theobservations Qe for the calculation of the weight matrix W because the weight for that constraintwould be 1

0 = infin In order to avoid that problem we have eliminated the rank deficiency of matrixA by removing the seventh column of matrix A in 50 associated with the height G Now we haveuminus rank(A) = 6minus 6 = 0

The constraint defines the geodetic datum ie the S-system ([85]p41) Another approach tosolving the system of equations in 2 could be based on generalised (pseudo) inverses (see eg [86])

The location of the constraints can be chosen in some circumstances for example during thedesign stage of a geodetic network For the special case of having a minimally constrained systemthe location of the constraint will not influence the w-test statistics and the sensitivity indicators (MIBand MDB) [18] However more constraints than the minimum necessary to have a solution (ie extraconstraints or redundant constraints) can change the least-squares residuals and hence w-test statisticsand the minimal biases

From the network with one extra constraint (2 constraints) in Figure 3(b) for example the bothfirst (height A) and fourth column (height D) of matrix A in 50 were eliminated in the case of having

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

17 of 40

the two heights as hard constraints For the case where these two heights (A and D) were taken as softconstraints however two observation equations were added to Equation 49 ie

y13 + e13 = hA σy13 gt 0

y14 + e14 = hD σy14 gt 0(52)

In the case of soft constraints in Equation 52 two lines were added in matrix A In other wordsA and D were taken as pseudo-observations In that case the rank deficiency was also null (ieuminus rank(A) = 7minus 7 = 0) the redundancy of the system was r = nminus rank(A) = nminus u = 7 and thematrix A was given as follows

A =

minus1 1 0 0 0 0 00 minus1 1 0 0 0 00 0 minus1 1 0 0 00 0 0 minus1 1 0 00 0 0 0 minus1 1 01 0 0 0 0 minus1 00 1 0 0 0 0 minus10 0 1 0 0 0 minus10 0 0 0 1 0 minus10 0 0 0 0 1 minus10 1 0 0 0 minus1 00 0 1 0 minus1 0 01 0 0 0 0 0 00 0 0 1 0 0 0

(53)

For this example of two soft constraints and by considering the both soft constraints withstandard-deviation σc = 10mm the symmetric and positive semi-definite covariance matrix of theobservations (Qe) was given as follows

Qe =

1 0 0 middot middot middot 0 00 1 0 middot middot middot 0 00 0 1 middot middot middot 0 0

0 0 0 middot middot middot 100 00 0 0 middot middot middot 0 100

(54)

The last two rows and columns of the matrix Qe in Equation 54 refer to the variances (σc2 =

(10mm)2 = 100mm2) of the heights constraints A and D respectivelySimilarly matrices A and Qe were constructed for the other cases studied hereAlthough other measurements are able to identify an outlier for the case of having only one single

soft constraint the pseudo-observation (constraint) is not In that case the defect configurationis associated with the additional parameter in the constraint (ie the presence of an outlier inthe constraint) In other words an additional parameter on the soft constraint will not estimableFor example if the height point G was taken as a soft constraint the presence of an outlier inpseudo-observation G would lead to rank deficiency of matrix A ie u minus rank(A) = 8 minus 7 = 1Therefore the case of having only one single soft constraint was not considered here

42 Result of the Hard Constraint Effects on the Iterative Outlier Elimination Procedure

In this section we present the results of the effects of the hard constraint on the IDS The scenariosin Figure 3(a) (network minimally constrained) Figure 3(b) (two hard constraints) and Figure 3(c)(three hard constraints) were considered for the analysis

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

18 of 40

The correlation coefficient between w-test statistics (ρwi wj ) were computed for the three scenariosaccording to Equation(20) Table 1 provides that correlation for that network minimally constrainedTable 2 for that network over-constrained with two hard constraints and Table 3 for that networkover-constrained with three hard constraints

Table 1 Correlation matrix of w-test statistics for the levelling network minimally constrained in Figure3(a)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12

wy1 10 02 01 01 02 10 -02 00 00 02 -04 -01wy2 10 02 02 03 02 05 -05 -02 02 03 -03wy3 10 10 02 01 00 02 -02 00 01 04wy4 10 02 01 00 02 -02 00 01 04wy5 10 02 -02 02 05 -05 03 -03wy6 10 -02 00 00 02 -04 -01wy7 10 -03 -03 -04 -04 -01wy8 10 -04 -03 -01 -04wy9 10 -03 01 04wy10 10 04 01wy11 10 -01wy12 100

Table 2 Correlation matrix of w-test statistics for the levelling network with two hard constraints inFigure 3(b)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12

wy1 10 04 04 -03 -01 04 -03 01 01 01 -04 -01wy2 10 04 -01 01 -01 04 -04 -01 01 03 -03wy3 10 04 -01 -03 -01 03 -01 -01 01 04wy4 10 04 04 01 01 -03 01 01 04wy5 10 04 -01 01 04 -04 03 -03wy6 10 -01 -01 -01 03 -04 -01wy7 10 -04 -03 -04 -04 -01wy8 10 -04 -03 -01 -04wy9 10 -04 01 04wy10 10 04 01wy11 10 -01wy12 10

Table 3 Correlation matrix of w-test statistics for the levelling network with three hard constraints inFigure 3(c)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12

wy1 10 03 01 -01 -01 01 -04 -01 -01 -01 -03 -01wy2 10 03 -01 01 -01 03 -03 -01 01 03 -03wy3 10 01 -01 -01 01 04 01 01 01 03wy4 10 03 01 -01 -01 -04 -01 01 03wy5 10 03 -01 01 03 -03 03 -03wy6 10 01 01 01 04 -03 -01wy7 10 -01 -01 -01 -03 -01wy8 10 -01 -01 -01 -03wy9 10 -01 01 03wy10 10 03 01wy11 10 -01wy12 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

19 of 40

Table 4 gives the local redundancy (ri) the standard-deviation of the LS-estimated outlier σnablai andthe maximum absolute correlation (maxρwi wj

) for each scenario of hard constraint set out of in thisstudy ie Figure 3(ab and c)

Table 4 Local redundancy (ri) standard-deviation of the LS-estimated outlier σnablai and the maximumabsolute correlation (maxρwi wj

) for each scenario of hard constraint

1 hard constraint 2 hard constraints 3 hard constraintsMeasurement ri σnablai maxρwi wj

ri σnablai maxρwi wjri σnablai maxρwi wj

y1 0396 1589 100 0583 1309 036 0708 1188 041y2 0500 1414 047 0583 1309 036 0583 1309 032y3 0396 1589 100 0583 1309 036 0708 1188 041y4 0396 1589 100 0583 1309 036 0708 1188 041y5 0500 1414 047 0583 1309 036 0583 1309 032y6 0396 1589 100 0583 1309 036 0708 1188 041y7 0563 1333 047 0583 1309 036 0708 1188 041y8 0563 1333 047 0583 1309 036 0708 1188 041y9 0563 1333 047 0583 1309 036 0708 1188 041y10 0563 1333 047 0583 1309 036 0708 1188 041y11 0583 1309 043 0583 1309 036 0583 1309 032y12 0583 1309 043 0583 1309 036 0583 1309 032

Here the twelve levelling measurements were clustered into four clusters The clustering wasdefined according to the local redundancy (ri) and the maximum absolute correlation (maxρwi wj

) inTable 4 This is the first time that a clustering technique based on the similarity of local redundancyand the maximum absolute correlation between w-test statistics is applied to a problem of geodeticnetworks Similarly this has been done in [45] The four cluster were defined as follows

bull Cluster 1 y1 y3 y4 and y6bull Cluster 2 y2 and y5bull Cluster 3 y7 y8 y9 and y10bull Cluster 4 y11 and y12

The probability levels associated with IDS were averaged for each of these clusters Here we computethe probability levels of the IDS based on the procedure in Section (3) [45] The critical values werek = 389 k = 393 and k = 393 for 1 hard constraint 2 hard constraints and 3 hard constraintsrespectively These critical values were found for αprime = 0001 according to the procedure described inAppendix The probability of correct identification (PCI) and correct detection (PCD = 1minusPMD)are displayed in Figure 4 for each number of hard constraint (denoted by hc)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

20 of 40

(c) (d)

(e) (f)

(g) (h)

(a) (b)

Figure 4 Probability of correct identification (PCI) and Probability of correct detection (PCD) for thecase of hard constraints and for αprime = 0001 Cluster 1(ab) Cluster 2(cd) Cluster 3(ef) and Cluster4(gh)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

21 of 40

The outlier magnitude were defined from |5σ| to |9σ| The outlier of |5σ| was chosen because itis approximately the lowest MDB0(i) of the network when a single hypothesis testing is in play ascan be seen in Equation (14) or Equation (16) That MDB0(i) of |5σ| was computed for significancelevel of αprime = 0001 and a power of the test γ0 = 08 This strategy reduces the search space for a MIB(Minimal Identifiable Bias) because we will always have the following inequality MIB ge MDB0(i)[4755] Remember that the IDS procedure is an example of multiple hypothesis testing (see 2)

The sensitivity indicators MDB and MIB for IDS were also computed according to Equation (33)and (34) respectively The success rate for outlier detection and outlier identification were taken asbeing PCD = PCI = 08 respectively Table 5 provides the values of MDB and MIB for that case ofhard constraints

Table 5 MDB and MIB for the case of hard constraints based on αprime = 0001 and PCD = PCI = 08

1 hard constraint 2 hard constraints 3 hard constraintsCluster MDB (σ) MIB (σ) MDB (σ) MIB (σ) MDB (σ) MIB (σ)

1 75 - 63 63 57 572 67 68 63 64 63 643 64 64 63 63 58 584 64 64 64 64 64 64

Figure 5 shows the probability of wrong exclusion (PWE) The over-identification cases (Pover+

and Poverminus) were smaller than 0001 (ie they were practically null) There were not statistical overlap(Pol) for clusters 2 3 and 4 We will discuss more about statistical overlap (Pol) later

(c) (d)

(a) (b)

Figure 5 Probability of wrong exclusion (PWE) for the case of hard constraints and for αprime = 0001Cluster 1(a) Cluster 2(b) Cluster 3(c) and Cluster 4(d)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

22 of 40

43 Result of the Soft Constraint Effects on the Iterative Outlier Elimination Procedure

The both configuration in Figure 3(d) and Figure 3(e) were analysed in terms of soft constraintsIn that case the critical values were k = 395 k = 395 and k = 392 for two soft constraints withσc = 01mm σc = 1mm and σc = 10mm respectively In the case of three soft constraints the criticalvalues found were k = 399 k = 399 and k = 396 for σc = 01mm σc = 1mm and σc = 10mmrespectively All these critical values were computed for αprime = 0001

The correlation coefficient between w-test statistics (ρwi wj ) are displayed in Table 6 Table 7and Table 8 for two soft constraints with standard-deviation 10 times larger than measurements(ie σc = 10times σ = 10mm) 10 times better than measurements (ie σc = 01mm) and equal to themeasurements (σc = 1mm) respectively

Table 6 Correlation matrix of w-test statistics for the levelling network with two soft constraints withσc = 10mm in Figure 3(d)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14

wy1 10 02 01 01 02 10 -02 00 00 02 -04 -01 006 -01wy2 10 02 02 03 02 05 -05 -02 02 03 -03 003 00wy3 10 10 02 01 00 02 -02 00 01 04 01 -01wy4 10 02 01 00 02 -02 00 01 04 -01 01wy5 10 02 -02 02 05 -05 03 -03 00 00wy6 10 -02 00 00 02 -04 -01 -01 01wy7 10 -03 -03 -04 -04 -01 00 00wy8 10 -04 -03 -01 -04 00 00wy9 10 -03 01 04 00 00wy10 10 04 01 00 00wy11 10 -01 00 00wy12 10 00 00wy13 10 -10wy14 10

Table 7 Correlation matrix of w-test statistics for the levelling network with two soft constraints withσc = 01mm in Figure 3(d)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14

wy1 10 04 04 -03 -01 04 -03 01 01 01 -04 -01 06 -06wy2 10 04 -01 02 -01 04 -04 -01 01 03 -03 04 -04wy3 10 04 -01 -03 -01 03 -01 -01 01 04 06 -06wy4 10 04 04 01 01 -03 01 01 04 -06 06wy5 10 04 -01 01 04 -04 03 -03 -04 04wy6 10 -01 -01 -01 03 -04 -01 -06 06wy7 10 -04 -03 -04 -04 -01 -02 02wy8 10 -04 -03 -01 -04 02 -02wy9 10 -04 01 04 02 -02wy10 10 04 01 -02 02wy11 10 -01 00 00wy12 10 00 00wy13 10 -10wy14 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

23 of 40

Table 8 Correlation matrix of w-test statistics for the levelling network with two soft constraints withσc = 1mm in Figure 3(d)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14

wy1 10 03 02 -01 01 07 -03 00 01 02 -04 -01 04 -04wy2 10 03 01 03 01 04 -04 -01 01 03 -03 03 -03wy3 10 07 01 -01 00 03 -02 -01 01 04 04 -04wy4 10 03 02 01 02 -03 00 01 04 -04 04wy5 10 03 -01 01 04 -04 03 -03 -03 03wy6 10 -02 -01 00 03 -04 -01 -04 04wy7 10 -03 -03 -04 -04 -01 -01 01wy8 10 -04 -03 -01 -04 01 -01wy9 10 -03 01 04 01 -01wy10 10 04 01 -01 01wy11 10 -01 00 00wy12 10 00 00wy13 10 -10wy14 10

Table 9 gives the local redundancy (ri) the standard-deviation of the LS-estimated outlier σnablai andthe maximum absolute correlation (maxρwi wj

) for the scenarios of two constraints

Table 9 Local redundancy (ri) standard-deviation of the LS-estimated outlier σnablai (mm) and themaximum absolute correlation (maxρwi wj

) for each scenario of two soft constraints

σc = 01mm σc = 1mm σc = 10mmMeasurement ri σnablai maxρwi wj

ri σnablai maxρwi wjri σnablai maxρwi wj

y1 0581 1312 0564 0471 1457 0681 0397 1587 0994y2 0582 1311 0376 0533 1369 0423 0501 1413 0471y3 0581 1312 0564 0471 1457 0681 0397 1587 0994y4 0581 1312 0564 0471 1457 0681 0397 1587 0994y5 0582 1311 0376 0533 1369 0423 0501 1413 0471y6 0581 1312 0564 0471 1457 0681 0397 1587 0994y7 0583 1310 0359 0571 1324 0423 0563 1333 0471y8 0583 1310 0359 0571 1324 0423 0563 1333 0471y9 0583 1310 0359 0571 1324 0423 0563 1333 0471y10 0583 1310 0359 0571 1324 0423 0563 1333 0471y11 0583 1309 0358 0583 1309 0398 0583 1309 0433y12 0583 1309 0358 0583 1309 0398 0583 1309 0433y13 0007 1163 1000 0300 1826 1000 0497 14189 1000y14 0007 1163 1000 0300 1826 1000 0497 14189 1000

From Table 9 five clusters were defined for each case of two soft constraints ie for the casewhere heights A and D were given as soft constraints in Figure 3(d) as follows

bull Cluster 1 y1 y3 y4 and y6bull Cluster 2 y2 and y5bull Cluster 3 y7 y8 y9 and y10bull Cluster 4 y11 and y12bull Cluster 5 y13 and y14

The probabilities of correct identification (PCI) and correct detection (PCD) for the measurements(Cluster 1 to Cluster 4) subject to the scenarios of two soft constraints (heights A and D) are displayedin Figure 6

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

24 of 40

(c) (d)

(e) (f)

(g) (h)

(a) (b)

Figure 6 Probability of correct identification (PCI) and Probability of correct detection (PCD) for themeasurements subject to the scenarios of two soft constraints for αprime = 0001 Cluster 1(ab) Cluster2(cd) Cluster 3(ef) and Cluster 4(gh)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

25 of 40

Note that the Cluster 5 is associated with the two soft constraints (ie y13 and y14) The probabilityof correct identification (PCI) for these both soft constraints were null However the probability ofcorrect detection were not Figure 7 shows the probability of correct detection (PCD) for these two softconstraints (ie heights A and D)

Figure 7 Probability of correct detection PCD for the two soft constraints (Cluster 5 heights A and D)and for αprime = 0001

The probability of wrong exclusion PWE for the measurements (Cluster 1 to Cluster 4) subject tothe scenarios of two soft constraints (heights A and D) are displayed in Figure 8 Figure 9 gives the theprobability of wrong exclusion PWE for that two constraints (ie heights A and D)

(c) (d)

(a) (b)

Figure 8 Probability of wrong exclusion PWE for the measurements subject to the scenarios of two softconstraints for αprime = 0001 Cluster 1(a) Cluster 2(b) Cluster 3(c) and Cluster 4(d)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

26 of 40

Figure 9 Probability of wrong exclusion PWE for the two soft constraints (Cluster 5 heights A and D)and for αprime = 0001

The over-identification cases (Pover+ and Poverminus) and the statistical overlap (Pol) were practicallynull for that case

The sensitivity indicators (MDB and MIB) for each scenario of two soft constraints are displayedin Table 10

Table 10 MDB and MIB for the case of two soft constraints based on αprime = 0001 and PCD = PCI = 08

σc = 10mm σc = 1mm σc = 01mmCluster MDB (σ) MIB (σ) MDB (σ) MIB (σ) MDB (σ) MIB (σ)

1 75 25 7 71 63 632 68 68 66 66 63 633 64 64 64 64 63 634 63 63 63 63 63 635 68 - 88 - 57 -

As with the case of two soft constraints Tables 1112 and 13 show the correlations (ρwi wj ) for thecase where there were three soft constraints ie for the network configuration in Figure 3(e)

Table 11 Correlation matrix of w-test statistics for the levelling network with three soft constraintswith σc = 10mm in Figure 3(e)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14 wy15

wy1 10 02 01 01 02 10 -02 00 00 02 -04 -01 01 00 00wy2 10 02 02 03 02 05 -05 -02 02 03 -03 00 00 00wy3 10 10 02 01 00 02 -02 00 01 04 00 -01 00wy4 10 02 01 00 02 -02 00 01 04 00 01 00wy5 10 02 -02 02 05 -05 03 -03 00 00 00wy6 10 -02 00 00 02 -04 -01 -01 00 00wy7 10 -03 -03 -04 -04 -01 00 00 00wy8 10 -04 -03 -01 -04 00 00 00wy9 10 -03 01 04 00 00 00wy10 10 04 01 00 00 00wy11 10 -01 00 00 00wy12 10 00 00 00wy13 10 -05 -05wy14 10 -05wy15 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

27 of 40

Table 12 Correlation matrix of w-test statistics for the levelling network with three soft constraintswith σc = 01mm in Figure 3(e)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14 wy15

wy1 10 03 01 -01 -01 01 -04 -01 -01 -01 -03 -01 07 -01 -04wy2 10 03 -01 01 -01 03 -03 -01 01 03 -03 03 -03 00wy3 10 01 -01 -01 01 04 01 01 01 03 01 -07 04wy4 10 03 01 -01 -01 -04 -01 01 03 -01 07 -04wy5 10 03 -01 01 03 -03 03 -03 -03 03 00wy6 10 01 01 01 04 -03 -01 -07 01 04wy7 10 -01 -01 -01 -03 -01 -04 -01 04wy8 10 -01 -01 -01 -03 -01 -04 04wy9 10 -01 01 03 -01 -04 04wy10 10 03 01 -04 -01 04wy11 10 -01 00 00 00wy12 10 00 00 00wy13 10 -02 -06wy14 10 -06wy15 10

Table 13 Correlation matrix of w-test statistics for the levelling network with three soft constraintswith σc = 1mm in Figure 3(e)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14 wy15

wy1 10 03 01 00 01 06 -03 00 00 01 -04 -01 05 -02 -02wy2 10 03 01 03 01 04 -04 -01 01 03 -03 02 -02 00wy3 10 06 01 00 00 03 -01 00 01 04 02 -05 02wy4 10 03 01 00 01 -03 00 01 04 -02 05 -02wy5 10 03 -01 01 04 -04 03 -03 -02 02 00wy6 10 -01 00 00 03 -04 -01 -05 02 02wy7 10 -03 -02 -03 -04 -01 -02 00 02wy8 10 -03 -02 -01 -04 00 -02 02wy9 10 -03 01 04 00 -02 02wy10 10 04 01 -02 00 02wy11 10 -01 00 00 00wy12 10 00 00 00wy13 10 -04 -05wy14 10 -05wy15 10

Table 14 Local redundancy (ri) standard-deviation of the LS-estimated outlier σnablai (mm) and themaximum absolute correlation (maxρwi wj

) for each scenario of three soft constraints

σc = 01mm σc = 1mm σc = 10mmMeasurement ri σnablai maxρwi wj

ri σnablai maxρwi wjri σnablai maxρwi wj

y1 0702 1194 0660 0502 1411 0577 0398 1586 0992y2 0582 1311 0326 0533 1369 0412 0501 1413 0470y3 0702 1194 0660 0502 1411 0577 0398 1586 0992y4 0702 1194 0660 0502 1411 0577 0398 1586 0992y5 0582 1311 0326 0533 1369 0412 0501 1413 0470y6 0702 1194 0660 0502 1411 0577 0398 1586 0992y7 0704 1192 0415 0602 1289 0412 0563 1333 0470y8 0704 1192 0415 0602 1289 0412 0563 1333 0470y9 0704 1192 0415 0602 1289 0412 0563 1333 0470y10 0704 1192 0415 0602 1289 0412 0563 1333 0470y11 0583 1309 0326 0583 1309 0385 0583 1309 0433y12 0583 1309 0326 0583 1309 0385 0583 1309 0433y13 0012 0904 0660 0425 1534 0542 0663 12283 0501y14 0012 0904 0660 0425 1534 0542 0663 12283 0501y15 0019 0718 063 05 1414 0542 0665 12268 0501

The probabilities of correct identification (PCI) and correct detection PCD for the measurements(Cluster 1 to Cluster 4) subject to the scenarios of three soft constraints (heights A D and G) aredisplayed in Figure 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

28 of 40

(c) (d)

(e) (f)

(g) (h)

(a) (b)

Figure 10 Probability of correct identification (PCI) and Probability of correct detection (PCD) for themeasurements subject to the scenarios of three soft constraints for αprime = 0001 Cluster 1(ab) Cluster2(cd) Cluster 3(ef) and Cluster 4(gh)

The probabilities of correct identification (PCI) and correct detection (PCD) in Figure 8 werecomputed for the clusters based on Table 14 as follows

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

29 of 40

bull Cluster 1 y1 y3 y4 and y6bull Cluster 2 y2 and y5bull Cluster 3 y7 y8 y9 and y10bull Cluster 4 y11 and y12bull Cluster 5 y13 and y14bull Cluster 6 y15

Figure 11 shows the probabilities of correct identification PCI and correct detection PCD for thethree soft constraints ie for Cluster 5 (heights A and D) and Cluster 6 (height G) in Figure 3(e)

(c) (d)

(a) (b)

Figure 11 Probability of correct identification (PCI) and Probability of correct detection (PCD) for thethree constraints and for αprime = 0001 Cluster 5(ab) and Cluster 6(cd)

The probability of wrong exclusion PWE for the measurements (Cluster 1 to Cluster 4) subject tothe scenarios of three soft constraints (heights A D and G) are displayed in Figure 12 Figure 13 givesthe the probability of wrong exclusion PWE for that three constraints (ie heights A D and G)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

30 of 40

(c) (d)

(a) (b)

Figure 12 Probability of wrong exclusion PWE for the measurements subject to the scenarios of threesoft constraints and for αprime = 0001 Cluster 1(a) Cluster 2(b) Cluster 3(c) and Cluster 4(d)

(a) (b)

Figure 13 Probability of wrong exclusion (PWE) for the three constraints and for αprime = 0001 Cluster5(a) and Cluster 6(b)

The over-identification cases (Pover+ and Poverminus) and the statistical overlap (Pol) were alsopractically null for that case of three soft constraints

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

31 of 40

The sensitivity indicators (MDB and MIB) for each scenario of three soft constraints are displayedin Table 15

Table 15 MDB and MIB for the case of three soft constraints based on αprime = 0001 and PCD = PCI = 08

σc = 10mm σc = 1mm σc = 01mmCluster MDB (σ) MIB (σ) MDB (σ) MIB (σ) MDB (σ) MIB (σ)

1 75 22 68 69 58 592 68 69 66 67 64 643 64 64 63 63 58 584 63 63 63 63 63 635 59 60 74 75 435 456 59 59 69 69 346 355

5 Discussion

We start from the scenario of one hard constraint in Figure 3(a) Table 4 reveal that the maximumcorrelation between w-test statistics for the measurements constituting Cluster 1 is exactly equal to100 (ie maxρwi wj

= 100) This means that the measurements belonging to Cluster 1 are those thatare connected with unknown heights whose connections are limited to only two The both unknownheights A and D are tied only to two measurements (ie y1 and y6 linked to A and y3 and y4 linked toD) therefore if an outlier occurred in one of these measurements we would only be able to analyse theconsistency between them but we would not be able to distinguish which of them was contaminatedby an outlier This means that we would only be able to detect because the w-test statistics could belarger than a critical value k however in that case the values of w-test statistics would be the same andtherefore we would not have only one unique maximum w-test statistics but actually would have fourmaximum w-test statistics In other words the equation systems associated with the measurements ofCluster 1 are linearly dependent [87] Therefore there is no reliability in terms of outlier identificationfor the Cluster 1 as can be seen in Figure 3(a)

From Figure 3(b) we can note that there is reliability in terms of outlier detection for Cluster 1and it is caused by overlapping w-test statistics The probability of statistics overlap Pol for Cluster 1in this scenario of minimally constrained network is displayed in Figure 14

Figure 14 Probability of correct detection PCD and statistical overlap (Pol) for the Cluster 1 subject toone hard constraint and for αprime = 0001

The problem of not having more connections (ie more measurements) for the unknown heightsA and D in the case of one hard constraint with G fixed is overcome when these heights (A and D) aretaken as hard constraints in 3(b) or when the heights A D and G are hard constraints in figure 3(c)Figure 3(a b) show that the measurements of Cluster 1 are now able to identify an outlier when twohard constraints (A and D fixed) are in play It is also verified to the case of three hard constraints (A Dand G fixes) in Figure 3(e f) ie there is reliability in terms of both outlier detection and identificationfor these measurements in those conditions

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

32 of 40

One can argue that the more constraints the larger the probability of correct identification (PCI)and correct detection (PCD) This claim is not entirely correct This may be true in case where all pointson the network have the same number of connections However in the example given if both heightsB and G were selected as hard constraints the measurements of the Cluster 1 (ie y1 y6 y3 and y4)would still have their least-squares residuals fully correlated (ie ρwy1 wy6

= 100 and ρwy3 wy4= 100)

and therefore there would be no reliability in terms of outlier identification for these measurementsFrom Table 5 we observe different behaviour for the clusters as follows

bull Cluster 1 there was no MIB for the case of having only one single hard constraint whereasMDB = MIB for the other cases However both MDB and MIB decrease significantly with theincrease in the number of hard constraints

bull Cluster 2 MDB slightly smaller than MIB Both MDB and MIB were practically the same for thecase of having two or three hard constraints

bull Cluster 3 MDB = MIB for all cases of hard constraints however both MDB and MIB decreasesignificantly with the increase in the number of hard constraints

bull Cluster 4 MDB e MIB were equal for all cases

In terms of outlier detection and identification therefore Cluster 1 was more sensitive toconstraints Cluster 3 relatively sensitive to constraints whereas the Cluster 4 completely insensitiveto constraints and Cluster 2 relatively insensitive to constraints This is also can be seen in Figure 4The reason for this is that the local redundancy (ri) of the Cluster 1 increased with the increase of thenumber of hard constraints whereas the Cluster 4 have remained the same as can be seen in Table 4

Leaving aside the cases of statistical overlap (Pol) the network presents low least-squares residualscorrelation (ρwi wj lt 05) and high local redundancy (ri gt 05) Because of this the probabilities ofwrong exclusion (PWE) were less than 1 as can be seen in Figure 5 The over-identification cases(Pover+ and Poverminus) were practically null Consequently PCI asymp PCD Due of this fact the family-wiseerror rate (αprime) should be increased in order to have more success rate in the outlier detection andidentification [45]

From Figure 15 we observe that increasing the αprime increases the both probabilities of correctidentification (PCI) and correct detection (PCD) for outlier magnitude from 5σ to 6σ in the case ofthree hard constraints and from 5σ to 68σ in the case of two hard constraints Although the rates ofPover+ and PWE also increase they are not significant when compared to the improvement of correctidentification (PCI) and detection (PCD) This same analysis can be done for the other clusters

In terms of soft constraints for the cases of two constraints in Figure 3(d) we observe from Table 9that the larger the relaxation of the constraint (ie the larger the standard-deviation of the constraintσc) the larger the residuals correlation (ρwi wj ) and the standard-deviation of the outlier σnablai and thesmaller the local redundancy (ri) Consequently the probabilities of correct identification (PCI) anddetection (PCD) get smaller and smaller with the relaxation of the constraints whereas the probabilityof wrong exclusion gets larger (PWE) This can be more clearly verified in Figure 6(a b) and Figure8(a) for Cluster 1 whose measurements are connected with the constraints A and D (ie y13 and y14 inTable 9 respectively)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

33 of 40

(a) (b)

(c) (d)

Figure 15 Probabilities of correct identification PCI (a) correct detection PCD (b) over-identificationpositive Pover+ (c) and wrong exclusion PWE for the Cluster 1 subject to two and three hard constraintsand for αprime = 0001 and αprime = 01

Note from Figure 8(a) that the probability of wrong exclusion (PWE) increases as the magnitude ofthe outlier (nablai) increases However this is true up to a certain limit of outlier magnitude The effect ofresiduals correlation ρwi wj on the rates of wrong exclusion (PWE) and correct identification (PCI) tendsto decrease with the increase in the magnitude of the outlier nablai This effect is more clearly verified forCluster 1 in case where the precision of the constraints are ten times worse than the measurementsσc = 10σ = 10mm

Note from Figure 6 that identifying an outlier in the Cluster 1 (ie y1 y3 y4 and y6) whenσc = 10mm is more difficult than the other clusters This is due to the fact the Cluster 1 has a higherresiduals correlation ρwi wj = 0994 than other clusters In general therefore we observe that the largerthe relaxation of the constraints the larger is the effect of the correlation ρwi wj on the success rate ofoutlier identification (PCI) Consequently the higher the sensitivity indicator for outlier identification(MIB) Table 10 reveals that the ratio between MIB and MDB for the Cluster 1 and for the scenariowhere the standard-deviations of that two soft constraints are σc = 10mm is MIBMDB=2575=33On the other hand the relationship between MIB and MDB is practically one (ie MIBMDB=10) forthe others scenarios

If the FWE rate (αprime) were increased for the case where the two soft constraints of σc = 10mmare in play we would not have great advantages for Cluster 1 due to its high residuals correlation(ρwi wj = 994) From Figure 16 we can observe that the probability of correct identification (PCI)for outlier magnitudes from 5σ to 8σ is effectively larger to a user-defined αprime = 01 than αprime = 0001However the success rate is still less than 80 ie PCI lt 08 Note for example the correct

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

34 of 40

identification rate is PCI = 56 for an outlier magnitude of nablai = 8σ and αprime = 01 For αprime = 01 theMIB = 335σ = 335mm whereas for αprime = 0001 is MIB = 25σ = 25mm Therefore in that case theMIB for PCI = 08(80) and αprime = 01 would be 34 larger than user-defined αprime = 0001

Figure 16 Probabilities of correct identification (PCI) for the Cluster 1 subject to two soft constraints (2sc) A and D for αprime = 0001 and αprime = 01

The soft constraints A and D were grouped in the Cluster 5 (ie A and D were treated aspseudo-observations in the model) There is no reliability in terms of outlier identification for theconstraints because the residual correlation between them is ρwi wj = 100 as can be seen in Table 9for y13 and y14 However these soft constraints are able to detect an outlier In that case the probabilityof correct detection PCD in Figure 7 is mainly caused by the statistical overlap Pol as can be seen in theexample of the case of σc = 10mm in Figure 17 From Table 10 we observe that the larger the relaxationof the constraints the larger the MDB Note that the values of MDB are given in σ and therefore theMDB for σc = 10mm is larger than σc = 1mm and σc = 01mm ie we had the following inequalityMDB = 68σc = 68times 10mm = 68mm gt MDB = 88σc = 88times 1mm = 88mm gt MDB = 57σc =

57times 01mm = 57mm In that case if the FWE (αprime) were increased the rate of outlier detection by theCluster 4 (ie by the soft constraints) would increase

Figure 17 Probabilities of correct detection PCD and statistical overlap Pol for the two soft constraintsA and D (Cluster 5) with σc = 10mm and for αprime = 0001

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

35 of 40

Similar effects of the relaxation of the constraints on the performance of the IDS in case of two softconstraints are verified in case of three soft constraints as can be seen in the Figures 10 11 12 and 13

In case of having three soft constraints in Figure 3(e) there is reliability in terms of outlieridentification for the three pseudo-observations y13 y14 and y15 (ie for AD and G) as can be seen inFigure 10 and Table 15 In that case we also observe that the probabilities of correct detection PCD ofthe soft constraints A and D (ie Cluster 5) were approximately 13 for σc = 10mm 16 for σc = 1mmand 24 for σc = 01mm larger than the scenario of the network subject to two soft constraints Table15 reveals that the advantage of having three soft constraints instead of two constraints is that theconstraints become identifiable in the presence of an outlier The behaviour of the probabilities ofcorrect detection (PCD) correct identification (PCI) and wrong exclusion (PWE) was similar to the caseof the two soft constraints

Furthermore the larger the relaxation of the constraints the smaller the residuals correlationbetween the measurements and the soft constraints and the larger the residuals correlation amongthe measurements This effect can be verified in Tables 6 7 and 8 for two soft constraints as well as inTables 11 12 and 13 for three soft constraints

In general we observe that the case of two soft constraints for σc = 01mm was comparablewith two hard constraints (see eg Table 5 and Table 10) in terms of the probability levels associatedwith IDS for the measurements (ie clusters 1 2 3 and 4) In the same way for the case of twosoft constraints with σc = 1mm or σc = 10mm the probabilities levels were similar to the one hardconstraint for that measurements with the benefit of two soft constraints having reliability in terms ofoutlier identification for the Cluster 1 Finally the three soft constraints with σc = 1mm and σc = 10mmwere comparable to the two soft constraints for that scenario of constraints relaxation wheres thethree soft constraints for σc = 01mm showed similar outcomes with three hard constraints for themeasurements (see eg Table 5 and Table 15) In that case however an advantage of the three softconstraints on the three hard constraints is the possibility of analysing the sensitivity of the constraintsIt is important to emphasised that the stochastic models of the measurements and constraints wereassumed well-known and defined for the analyzes performed here

6 Conclusions

In general the probability of correct identification (PCI) for the case of hard constraints is largerthan soft constraints It can be verified in Figure 4 Figure 6 and Figure 10 Therefore hard constraintsshould be used in the stage of pre-processing data for the purpose of identifying and removing possibleoutlying measurements In that process one should opt to set out the redundant hard constraints atpoints in the network where the smallest connections exist After identifying and removing possibleoutliers the soft constraints should be employed to propagate the uncertainties of the constraints(pseudobservations) to the model parameters during the process of least-squares estimation Thisrecommendation is valid for outlier detection and identification purpose

We highlight the main findings of this research as follows

bull Under a system of a high local redundancy ri gt 05 and low residuals correlation (ρwi wj lt 05) ifone increase the family-wise error rate (FWE) of the test statistic the performance of the procedurewill be improved for both scenarios of hard constraints and soft constraintsbull The larger the relaxation of the constraints the larger is the effect of the residuals correlation

(ρwi wj ) on the success rate of outlier identification (PCI) Consequently the higher the sensitivityindicator for outlier identification (MIB) and therefore more difficult it becomes to identify anoutlier

bull Under a scenario of soft constraints one should set out at least three soft constraints in order toidentify an outlier in the constraints

In other types of analysis for example deformation analysis of geodetic networks one shouldformulated the constraints as pseudo-observations with σc gt 0 in an adjustment model of system

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

36 of 40

equations In other words the relaxation of constraints makes it possible to estimate deformationeffects that are unmodelled in the deterministic model matrix for example deformations which have aspatial and temporal variations In that case the deformation can be modelled by covariance functionsthat determine the cofactors between constraints [28] This should be applied to control points locatedin the structure but not to control points located outside the structure The control points locatedoutside the structure should be hard constraints and have their stability verified by an independentprocess in order to analyse the deformation of the structure So one will be also able to analyse thesensitivity in terms of minimal detectable deformations andor minimal identifiable deformations [23]Future researches will be addressed to sensitivity analysis for deformation monitoring networks

Author Contributions Conceptualization VFR MTM and IK Data curation VFR MTM and IK Formalanalysis VFR and IK Funding acquisition MTM MRV and LGdSJ Investigation VFR MethodologyVFR and MTM Project administration MTM IK MRV and LGdSJ Software VFR and LGdSJSupervision MTM IK and MRV Validation VFR and MRV Writing ndash original draft VFR Writing ndashreview amp editing VFR MTM IK and MRV All authors have read and agreed to the published version of themanuscript

Funding This research was funded by the CNPqmdashConselho Nacional de Desenvolvimento Cientiacutefico eTecnoloacutegicomdashBrasil (proc no 1035872019-5) This research and the APC were also funded by PETROBRAS(Grant Number 201800545-0)

Acknowledgments In this section you can acknowledge any support given which is not covered by the authorcontribution or funding sections This may include administrative and technical support or donations in kind(eg materials used for experiments)

Conflicts of Interest The authors declare no conflict of interest

References

1 Lehmann R Neitzel F Testing the compatibility of constraints for parameters of a geodetic adjustmentmodel Journal of Geodesy 2013 87 555ndash566 doi101007s00190-013-0627-2

2 Fang X Weighted total least-squares with constraints a universal formula for geodetic symmetricaltransformations Journal of Geodesy 2015 89 459ndash469 doi101007s00190-015-0790-8

3 Amiri-Simkooei AR Weighted Total Least Squares with Singular Covariance Matrices Subject to Weightedand Hard Constraints J Surv Eng 2017 143 04017018

4 Sakic P Ballu V Royer JY A Multi-Observation Least-Squares Inversion for GNSS-Acoustic SeafloorPositioning Remote Sensing 2020 12 doi103390rs12030448

5 Odijk D Stochastic modelling of the ionosphere for fast GPS ambiguity resolution Geodesy Beyond 2000Schwarz KP Ed Springer Berlin Heidelberg Berlin Heidelberg 2000 pp 387ndash392

6 de Oliveira PS Morel L Fund F Legros R Monico JFG Durand S Durand F Modelingtropospheric wet delays with dense and sparse network configurations for PPP-RTK GPS Solutions 201721 237ndash250 doi101007s10291-016-0518-0

7 Chen B Wu L Dai W Luo X Xu Y A new parameterized approach for ionospheric tomography GPSSolutions 2019 23 96 doi101007s10291-019-0893-4

8 Ahmed E Ahmed S Mohamed D Mostafa R jag 2019 Vol 13 chapter Vertical ionospheric delayestimation for single-receiver operation p 81 2 doi101515jag-2018-0041

9 Wang A Chen J Zhang Y Meng L Wang J Performance of Selected Ionospheric Models inMulti-Global Navigation Satellite System Single-Frequency Positioning over China Remote Sensing 201911 doi103390rs11172070

10 Deo M El-Mowafy A Comparison of advanced troposphere models for aiding reductionof PPP convergence time in Australia Journal of Spatial Science 2019 64 381ndash403doi1010801449859620181472046

11 Dai X Lou Y Dai Z Hu C Peng Y Qiao J Shi C Precise Orbit Determination for GNSS ManeuveringSatellite with the Constraint of a Predicted Clock Remote Sensing 2019 11 doi103390rs11161949

12 Courant R Hilbert D Methods of mathematical physics Vol 2 Vol 2 Wiley [New York] 196213 Grafarend EW Optimization of Geodetic Networks The Canadian Surveyor 1974 28 716ndash723

doi101139tcs-1974-0120

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

37 of 40

14 Teunissen P Zero Order Design Generalized Inverses Adjustment the Datum Problem andS-Transformations Optimization and Design of Geodetic Networks Grafarend EW Sansograve F EdsSpringer Berlin Heidelberg Berlin Heidelberg 1985 pp 11ndash55

15 Dermanis A Free network solutions with the DLT method ISPRS Journal of Photogrammetry and RemoteSensing 1994 49 2 ndash 12 doihttpsdoiorg1010160924-2716(94)90061-2

16 Kotsakis C Reference frame stability and nonlinear distortion in minimum-constrained networkadjustment Journal of Geodesy 2012 86 755ndash774 doi101007s00190-012-0555-6

17 Kotsakis C Datum Definition and Minimal Constraints In Encyclopedia of Geodesy Grafarend E EdSpringer International Publishing Cham 2018 pp 1ndash6 doi101007978-3-319-02370-0_157-1

18 Matsuoka MT Rofatto VF Klein I Roberto Veronez M da Silveira LG Neto JBS Alves ACRControl Points Selection Based on Maximum External Reliability for Designing Geodetic Networks AppliedSciences 2020 10 doi103390app10020687

19 Baarda W S-transformations and criterion matrices Publ on geodesy New Series 1973 520 Schaffrin B Aspects of Network Design Optimization and Design of Geodetic Networks Grafarend

EW Sansograve F Eds Springer Berlin Heidelberg Berlin Heidelberg 1985 pp 548ndash59721 Xu PB A general solution in geodetic nonlinear rank-defect models 199722 Altamimi Z Dermanis A The Choice of Reference System in ITRF Formulation VII Hotine-Marussi

Symposium on Mathematical Geodesy Sneeuw N Novaacutek P Crespi M Sansograve F Eds Springer BerlinHeidelberg Berlin Heidelberg 2012 pp 329ndash334

23 Velsink H On the deformation analysis of point fields J Geod 2015 89 1071ndash1087doi101007s00190-015-0835-z

24 Velsink H Extendable linearised adjustment model for deformation analysis Survey Review 201547 397ndash410 doi1011791752270614Y0000000140

25 Hiddo V jag 2016 Vol 10 chapter Time Series Analysis of 3D Coordinates Using NonstochasticObservations p 5 1 doi101515jag-2015-0027

26 Rao CR Markoffrsquos Theorem with Linear Restrictions on Parameters SankhyAuml The Indian Journal ofStatistics (1933-1960) 1945 7 16ndash19

27 Velsink H Testing Methods for Adjustment Models with Constraints Journal of Surveying Engineering2018 144 04018009 doi101061(ASCE)SU1943-54280000260

28 Hiddo V Testing deformation hypotheses by constraints on a time series of geodetic observations J ApplGeod 2017 12 77ndash93 doi101515jag-2017-0028

29 Baarda W A testing procedure for use in geodetic networks Publ on geodesy New Series 1968 230 Teunissen P First and second moments of non-linear least-squares estimators Bull Geodesique (Journal of

Geodesy) 1989 63 253ndash262 doihttpsdoiorg101007BF0252047531 Teunissen P Testing Theory an introduction 2 ed Delft University Press 200632 Amiri-Simkooei A Least-squares variance component estimation theory and GPS applications Phd

thesis Delft University of Technology 200733 Koch KR Parameter estimation and hypothesis testing in linear models 2 ed Springer 199934 Tao H Feng H Xu L Miao M Long H Yue J Li Z Yang G Yang X Fan L Estimation

of Crop Growth Parameters Using UAV-Based Hyperspectral Remote Sensing Data Sensors 2020 20doi103390s20051296

35 Han M Wang Q Wen Y He M He X The Application of Robust Least Squares Method inFrequency Lock Loop Fusion for Global Navigation Satellite System Receivers Sensors 2020 20doi103390s20041224

36 Wei C Estimation for the Discretely Observed CoxacircldquoIngersollacircldquoRoss Model Driven by Small SymmetricalStable Noises Symmetry 2020 12 doi103390sym12030327

37 Sakic P Ballu V Royer JY A Multi-Observation Least-Squares Inversion for GNSS-Acoustic SeafloorPositioning Remote Sensing 2020 12 doi103390rs12030448

38 Livadiotis G General Fitting Methods Based on Lq Norms and their Optimization Stats 2020 3 16ndash31doi103390stats3010002

39 Farooq SZ Yang D Ada ENJ A Cycle Slip Detection Framework for Reliable Single Frequency RTKPositioning Sensors 2020 20 doi103390s20010304

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

38 of 40

40 Araveeporn A Comparing Parameter Estimation of Random Coefficient Autoregressive Model byFrequentist Method Mathematics 2020 8 doi103390math8010062

41 Zhang C Peng T Zhou J Ji J Wang X An Improved Autoencoder and Partial Least SquaresRegression-Based Extreme Learning Machine Model for Pump Turbine Characteristics Applied Sciences2019 9 doi103390app9193987

42 Ji J Yang M Jiang L He J Teng Z Liu Y Song H Output-Only Parameters Identification ofEarthquake-Excited Building Structures with Least Squares and Input Modification Process AppliedSciences 2019 9 doi103390app9040696

43 Kargoll B On the theory and application of model misspecification tests in geodesy Doctoral thesisUniversity of Bonn Landwirtschaftliche Fakultaumlt German Bonn 2007

44 Lehmann R On the formulation of the alternative hypothesis for geodetic outlier detection J Geod 201387 373ndash386 doi101007s00190-012-0607-y

45 Rofatto VF Matsuoka MT Klein I Roberto Veronez M da Silveira LG A Monte Carlo-Based OutlierDiagnosis Method for Sensitivity Analysis Remote Sensing 2020 12 doi103390rs12050860

46 Lehmann R Improved critical values for extreme normalized and studentized residuals in GaussndashMarkovmodels J Geod 2012 86 1137ndash1146 doi101007s00190-012-0569-0

47 Rofatto VF Matsuoka MT Klein I Veronez MR Bonimani ML Lehmann R A half-century ofBaardarsquos concept of reliability a review new perspectives and applications Surv Rev 2018 0 1ndash17[httpsdoiorg1010800039626520181548118] doi1010800039626520181548118

48 Zaminpardaz S Teunissen P DIA-datasnooping and identifiability J Geod 2019 93 85ndash101doi101007s00190-018-1141-3

49 Kok JJ States U On data snooping and multiple outlier testing [microform] Johan J Kok US Dept ofCommerce National Oceanic and Atmospheric Administration National Ocean Service Charting andGeodetic Services For sale by the National Geodetic Information Center NOAA Rockville Md 1984 p61 p

50 Knight NL Wang J Rizos C Generalised measures of reliability for multiple outliers Journal of Geodesy2010 84 625ndash635 doi101007s00190-010-0392-4

51 Gui Q Li X Gong Y Li B Li G A Bayesian unmasking method for locating multiplegross errors based on posterior probabilities of classification variables J Geod 2011 85 191ndash203doi101007s00190-010-0429-8

52 Klein I Matsuoka MT Guzatto MP Nievinski FG An approach to identify multipleoutliers based on sequential likelihood ratio tests Surv Rev 2017 49 449ndash457[httpsdoiorg1010800039626520161212970] doi1010800039626520161212970

53 Hawkins DM Identification of Outliers 1 ed Springer Netherlands 1980 httpsdoiorg101007978-94-015-3994-4

54 Baarda W Statistical concepts in geodesy Publ on geodesy New Series 1967 255 Imparato D Teunissen P Tiberius C Minimal Detectable and Identifiable Biases for quality

control Surv Rev 2019 51 289ndash299 [httpsdoiorg1010800039626520181437947]doi1010800039626520181437947

56 Teunissen PJG Distributional theory for the DIA method Journal of Geodesy 2018 92 59ndash80doi101007s00190-017-1045-7

57 Foumlrstner W Reliability and discernability of extended Gauss-Markov models Seminar on mathematicalmodels of GeodeticPhotogrammetric Point Determination with Regard to Outliers and Systematic Errors 1983 Vol Series A pp 79ndash103

58 Wang J Knight NL New Outlier Separability Test and Its Application in GNSS Positioning 201259 Lehmann R Loumlsler M Multiple Outlier Detection Hypothesis Tests versus Model Selection by

Information Criteria J Surv Eng 2016 142 0401601760 Teunissen PJG Imparato D Tiberius CCJM Does RAIM with Correct Exclusion Produce Unbiased

Positions Sensors 2017 17 doi103390s1707150861 Lehmann R Loumlsler M Congruence analysis of geodetic networks ndash hypothesis tests versus model

selection by information criteria J Appl Geod 2017 11 271ndash283 doi101515jag-2016-0049

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

39 of 40

62 Rofatto V Matsuoka M Klein I DESIGN OF GEODETIC NETWORKS BASED ON OUTLIERIDENTIFICATION CRITERIA AN EXAMPLE APPLIED TO THE LEVELING NETWORK Bull Geod Sci2018 24 152ndash170 httpdxdoiorg101590s1982-21702018000200011

63 Nguyen VK Renault Milocco R Environment Monitoring for Anomaly Detection System UsingSmartphones Sensors 2019 19 doi103390s19183834

64 Nie Y Yang L Shen Y Specific Direction-Based Outlier Detection Approach for GNSS Vector NetworksSensors 2019 19 doi103390s19081836

65 Klein I Matsuoka MT Guzatto MP Nievinski FG Veronez MR Rofatto VF A newrelationship between the quality criteria for geodetic networks Journal of Geodesy 2019 93 529ndash544doi101007s00190-018-1181-8

66 Mierlo JV Statistical Analysis of Geodetic Measurements for the Investigation of Crustal Movements InRecent Crustal Movements 1977 Whitten C Green R Meade B Eds Elsevier 1979 Vol 13 Developmentsin Geotectonics pp 457 ndash 467 doihttpsdoiorg101016B978-0-444-41783-150072-6

67 van der Marel H Roumlsters AJM Statistical Testing and Quality Analysis in 3-D Networks (part II)Application to GPS Global Positioning System An Overview Bock Y Leppard N Eds Springer NewYork New York NY 1990 pp 290ndash297

68 Yang L Wang J Knight NL Shen Y Outlier separability analysis with a multiple alternative hypothesestest J Geod 2013 87 591ndash604 doi101007s00190-013-0629-0

69 Proacuteszynski W Revisiting Baardarsquos concept of minimal detectable bias with regard to outlier identifiabilityJ Geod 2015 89 993ndash1003 doi101007s00190-015-0828-y

70 Romano JP Wolf M Multiple Testing of One-Sided Hypotheses Combining Bonferroni and the BootstrapPredictive Econometrics and Big Data Kreinovich V Sriboonchitta S Chakpitak N Eds SpringerInternational Publishing Cham 2018 pp 78ndash94

71 Teunissen P An Integrity and Quality Control Procedure for use in Multi Sensor Integration 199072 Aydin C Demirel H Computation of Baardarsquos lower bound of the non-centrality parameter J Geod

2004 78 437ndash441 doi101007s00190-004-0406-173 Aydin C Power of Global Test in Deformation Analysis J Surv Eng 2012 138 51ndash56 https

doiorg101061(ASCE)SU1943-5428000006474 Bonferroni C Teoria statistica delle classi e calcolo delle probabilitagrave Pubblicazioni del R Istituto superiore di

scienze economiche e commerciali di Firenze Libreria internazionale Seeber 193675 Box GEP Muller ME A Note on the Generation of Random Normal Deviates The Annals of Mathematical

Statistics 1958 29 610ndash611 httpsdoi101214aoms117770664576 Lemeshko BY Lemeshko SB Extending the Application of Grubbs-Type Tests in Rejecting Anomalous

Measurements Measurement Techniques 2005 48 536ndash547 doi101007s11018-005-0179-977 Lehmann R Scheffler T Monte Carlo based data snooping with application to a geodetic network J

Appl Geod 2011 5 123ndash134 httpsdoiorg101515JAG201101478 Matsumoto M Nishimura T Mersenne twister A 623-dimensionally equidistributed uniform

pseudo-random number generator ACM Transactions on Modeling and Computer Simulation 1998 8 3ndash30httpsdlacmorgcitationcfmid=272991

79 Ihde J Saacutenchez L Barzaghi R Drewes H Foerste C Gruber T Liebsch G Marti U Pail R SiderisM Definition and Proposed Realization of the International Height Reference System (IHRS) Surveys inGeophysics 2017 38 549ndash570 doi101007s10712-017-9409-3

80 Lehmann R Observation error model selection by information criteria vs normality testing StudiaGeophysica et Geodaetica 2015 59 489ndash504 doi101007s11200-015-0725-0

81 Algarni DA Ali AE Heighting and Distance Accuracy with Electronic Digital Levels Journal of King SaudUniversity - Engineering Sciences 1998 10 229 ndash 239 doihttpsdoiorg101016S1018-3639(18)30698-6

82 Gemin ARS Matos AS Faggion PLAs APPLICATION OF CALIBRATION CERTIFICATE OFDIGITAL LEVELING SYSTEMS IN THE MONITORING OF STRUCTURES A CASE STUDY AT THEGOVERNADOR JOSAtilde RICHA HYDROELECTRIC POWER PLANT - PR Boletim de CiAtildeGeodAtilde 201824 235 ndash 249

83 Takalo M Rouhiainen P Development of a System Calibration Comparator for Digital Levels in FinlandNordic Journal of Surveying and Real Estate Research 1 1

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

40 of 40

84 Wiedemann W Wagner A Wunderlich T Using IATS to Read and Analyze Digital Levelling Staffs SIG2016 Paar R Marendic A Zrinjski M Eds Croatian Geodetic Society 2016 pp 515ndash526

85 PJG T The Geometry of Geodetic Inverse Linear Mapping and Non-linear Adjustment Publications onGeodesy New Series 1985 8

86 Rao CR Mitra SK Generalized inverse of a matrix and its applications 197287 Hekimoglu S Erenoglu RC Sanli DU Erdogan B Detecting Configuration Weaknesses in Geodetic

Networks Survey Review 2011 43 713ndash730 doi101179003962611X13117748892632

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

Page 2: Effects of Hard and Soft Equality Constraints on

2 of 40

constraints of predicted satellite clocks to improve the precise orbit determination (POD) processingduring maneuvers

The models are usually formulated with minimal constraint or extra (redundant) constraints Inthat case we are referring to the so-called equality constraints which are usually incorporated into asystem of equations in order to try to have a model well posed [12]

For the most part minimal constraints are introduced to solve to the problem of rank deficiencyin linear systems The rank deficiency is often caused by the lack (or insufficient) information abouta problem In the field of geodesy for example minimal constraints are external information whoseprimary role is to specify the coordinate system to which the network station positions will beestimated by least-squares method (LS) This problem is known as datum definition (or also zero-orderdesign or datum choice problem) [13ndash18] Several works have been investigated in the sense ofminimum-constrained adjustment and the datum choice problem in the geodetic literature focusingon topics like free-adjustment and the role of inner constraints (see eg [19ndash22])

If the number of constraints exceeds the minimum needed to solve the rank deficiency of theequation systems we say that we have redundant (or extra) constraints Extra constraints have alsobeen used to check the stability of points in geodetic deformation analysis (see eg [23ndash25]) to test thecompatibility of constraints with the observations and the rest of the constraints [1142627] So far wehave only distinguished the constraints in terms of numerical quantity

The model can also be subject to a hard and soft (or weighted) constraints Hard constraintscan often represent a case in which there exist known functional relations between the unknownparameters Soft constraints (or looser constraints) are however for which such functional relationscan slightly be violated depending on their uncertainty [327] Soft constraints may also be referred toas a pseudo-observation model [28]

The well-known least-squares (LS) has been widely used as a standard method of modelparameters estimation in geodetic applications and many others branches of modern science [29ndash42]This is due to the flexibility of the LS since no concepts from probability theory are used in formulatingthe least-squares minimisation problem

LS has the property of being a linear unbiased estimator (LUE) and some special cases it coincideswith the best linear unbiased estimator (BLUE) The estimator which has the smallest variance ofall LUEs is called the best linear unbiased estimator (BLUE) If we have the full knowledge of theprobability density function (PDF) of the measurements the method of maximum likelihood estimation(MLE) can also be applied In case of normally distributed measurements (Gauss-Markov model)the MLE estimators are identical to the BLUE ones and therefore the LS and MLE principles provideidentical results [3243]

However the presence of undesirable outliers in dataset makes LS no longer unbiased and itdoes not coincides with MLE Here we assume that an outlier is a measurement that is so probablycaused by a blunder that it is better not used or not used as it is [44] Failure to identify an outliercan jeopardise the reliability level of a system Due to its importance outliers must be appropriatelytreated to ensure the quality of data analysis [45]

In this paper we employ the iterative data snooping (IDS) which is hypothesis test-based outlierImportant to mention that IDS is not restrict to the field of geodetic statistics but it is a generallyapplicable method [4647]

IDS is an iterative outlier elimination procedure which combines estimation testing and acorrective action [4548] Parameter estimation is often conducted in the sense of the LS Thenhypothesis testing is performed with the aim to identify any outlier that may be present in datasetAfter identification the suspected outlier is then excluded from the dataset as a corrective action(ie adaptation) and the LS is restarted without the rejected measurement If the model redundancypermits this procedure is repeated until no more (possible) outliers can be identified (see eg [31] pp135) Although here we restrict ourselves to the case of one outlier at a time IDS can also be applied

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

3 of 40

for the case of multiples (simultaneous) outliers [49] For more details about multiples (simultaneous)outlier refers to [50ndash52]

There are chances of correct and false decisions of IDS because these procedure is based onstatistical hypothesis testing Recently Rofatto et al [45] have provided an algorithm based onMonte Carlo to determine the probability levels associated with IDS In that case they described sixclasses of decisions for IDS namely probability of correct identification (PCI) probability of misseddetection (PMD) probability of wrong exclusion (PWE) probability of over-identification positive(Pover+) probability of over-identification negative (Poverminus) and probability of statistical overlap (Pol)as follows

bull PCI Probability of identifying and removing correctly an outlying measurementbull PMD Probability of not detecting the outlier (ie Type II decision error for IDS)bull PWE Probability of identifying and removing a non-outlying measurement while the lsquotruersquo outlier

remains in the dataset (ie Type III decision error [53] for IDS)bull Pover+ Probability of identifying and removing correctly the outlying measurement and othersbull Poverminus Probability of identifying and removing more than one non-outlying measurement

whereas the lsquotrue outlierrsquo remains in the datasetbull Pol occurs in cases where one alternative hypothesis has the same distribution as the another one

These hypotheses cannot be distinguished because their test statistics are numerically the sameviolating the IDS rule of one outlier at a time In that case they are nonseparable and an outliercannot be identified In other words it corresponds to the probability of flagging simultaneouslytwo (or more) measurements as outliers

Based on the probability of correct detection (PCD) and probability of correct identification(PCI) the minimal biases MDB (Minimal Detectable Bias) and MIB (Minimal Identifiable Bias) canbe computed as sensitivity indicators for outlier detection and identification respectively Outlierdetection only informs whether or not there might have been at least one outlier However thedetection does not tell us which measurement is an outlier The localization of the outlier is a problemof outlier identification ie outlier identification implies the execution of a search among themeasurements for the most likely outlier [45] Therefore the smallest value of an outlier that can bedetected given a certain PCD defines the MDB On the other hand the smallest value of an outlier thatcan be identified given a certain PCI defines the MIB

In this paper we investigate the effects of models subject to constraints (minimum redundanthard and soft) on the probability levels associated with IDS It is important to emphasised that if astandard-deviation of a constraint (or a set of a constraint) is changed from zero to a nonzero value itis called ldquorelaxationrdquo of the constraint [28] Here we also evaluate the effect of relaxing constraintson the MIB and MDB This kind of assessment is a kind of sensitivity analysis We also highlight thatthe task of clustering a set of geodetic measurements is applied for the first time in this study Weshow that the clusters can be defined according to two deterministic parameters local redundancyand correlation between the outlier test statistics Moreover critical values optimized by Monte Carlomethod were used here [4546] in order to compute the decision classes associated with IDS ie PCI PMD PWE Pover+ Poverminus and Pol

The rest of the paper is organised as follows the elements involved with IDS are provided inSection 2 Next the method used to compute the probabilities is given in Section 3 The results for acase study are showed in Section 4 and their discussion are emphasised in Section 5 Finally the mainhighlights of this research are provided in Section 6

2 Theoretical Framework

The IDS is an important case of multiple hypothesis testing In this section therefore we brieflypresent the principal elements involved for the case of multiple testing

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

4 of 40

First the null hypothesis denoted byH0 is formulated under the condition that random errorsare normally distributed with expectation zero ie in the absence of outliers Thus the null hypothesisH0 of the standard GaussndashMarkov model in the linear or linearised form is given by [33]

H0 Ey = Ax +Ee = Ax Dy = Qe (1)

where E is the expectation operator D is the dispersion operator y isin Rntimes1 is the vector ofmeasurements A isin Rntimesu is the coefficient matrix x isin Rutimes1 is the unknown parameter vectore isin Rntimes1 is the unknown vector of measurement errors and Qe isin Rntimesn is the positive-definitecovariance matrix of the measurements y

Under normal working conditions (ieH0) the measurement error model is then given by

e sim N(0 Qe) (2)

First we assume that the coefficient matrix A suffers from a rank deficiency ie uminus rank(A) gt 0In that case minimal constraints are added to solve the problem of the rank-deficient system Anexample of how to handle the problem of the rank-deficient will be given in the section 41 where aminimal constraint in Equation 51 has been added to the matrix A in Equation 50 This means that thecolumns of the design matrix A in Equation 1 become linearly independent ie the matrix A becomefull rank such that uminus rank(A) = 0

The best linear unbiased estimator (BLUE) of e underH0 is the well-known estimated least-squaresresidual vector e isin Rntimes1 which is given by

e = yminus Ax

= yminus A(ATW A)minus1(ATWy)

= Ax + eminus A(ATW A)minus1(ATW(Ax + e))

= eminus A(ATW A)minus1(ATWe)

= (Iminus A(ATW A)minus1 ATW)e

= Re

(3)

with x isin Rutimes1 being the BLUE of x under H0 W isin Rntimesn is the known matrix of weights taken asW = σ0

2Qminus1e where σ2

0 is the variance factor I isin Rntimesn is the identity matrix and R isin Rntimesn is knownas the redundancy matrix The R matrix is an orthogonal projector that projects onto the orthogonalcomplement of the range space of A The main diagonal elements of matrix R are known as localredundancy numbers (denoted by ri of the the system The larger the number of local redundancy fora given measurement the larger the degree of importance of that measurement for the model ie thelarger the absorption of a possible error of that measurement into their corresponding least-squaresresidual

The degrees of freedom r (ie the overall redundancy) of the model underH0 (Equation (1)) is

r = rank(Qe) = nminus rank(A) = nminus u where (4)

Qe = Qe minus σ02 A(ATW A)minus1 AT (5)

On the other hand an alternative model is proposed when there are doubts about the reliabilitylevel of the model underH0 Here we assume that the validity of the null hypothesisH0 in Equation (1)can be violated if the dataset is contaminated by outliers The model in an alternative hypothesis

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

5 of 40

denoted by HA is to oppose Equation (1) by an extended model that includes the unknown vectornabla isin Rqtimes1 of deterministic bias parameters as follows ([2931])

HA y = Ax + Cnabla+ e =(

A C)( x

nabla

)+ e (6)

where C isin Rntimesq is the matrix that relates bias parameters ie the values of the outliers to observationsWe restrict ourselves to the matrix (A C) having full column rank such that

r = rank(

A C)= u + q le n (7)

One of the most used procedures based on hypothesis testing for outliers in linear (or linearised)models is the well-known data snooping method [2954] This procedure consists of screening eachindividual measurement for the presence of an outlier [49] In that case data snooping is based on alocal model test such that q = 1 and therefore the n alternative hypothesis is expressed as

H(i)A y = Ax + cinablai + e =

(A ci

)( xnablai

)+ e foralli = 1 middot middot middot n (8)

Now matrix C in Equation (6) is reduced to a canonical unit vector ci which consists exclusivelyof elements with values of 0 and 1 where 1 means that the ith bias parameter of magnitude nablai affectsthe ith measurement and 0 means otherwise In that case the rank of (A ci) isin Rntimes(u+1) and the vector

nabla in Equation (6) reduces to a scalar nablai in Equation (8) ie ci=(

0 0 0 middot middot middot 1ith 0 middot middot middot 0)T

When q = nminus u an overall model test is performed For more details about the overall model test seefor example [5556]

Note that the alternative hypothesisH(i)A in Equation (8) is formulated under the condition that

the outlier acts as a systematic effect by shifting the random error distribution underH0 by its ownvalue [44] In other words the presence of an outlier in a dataset can cause a shift of the expectationunderH0 to a nonzero value Therefore hypothesis testing is often employed to check whether thepossible shifting of the random error distribution under H0 by an outlier is in fact a systematiceffect (bias) or merely a random effect This hypothesis test-based approach is called the mean-shiftmodel (see eg [1823294647525457ndash65]

In the context of the mean-shift model the test statistic involved in data snooping is given by thenormalised least-squares residual denoted by wi This test statistic also known as Baardarsquos w-testis given as follows

wi =ci

TQminus1e eradic

ciTQminus1e QeQminus1

e ci

foralli = 1 middot middot middot n (9)

The alternative hypothesis in Equation (8) is formulated in the sense that There is at least oneoutlier in the vector of measurements yirdquo [46] In that case we are interested in knowing which of thealternative hypotheses may lead to the rejection of the null hypothesis with a certain probability Thismeans testingH0 againstH(1)

A H(2)A H(3)

A H(n)A This is known as multiple hypothesis testing (see

eg [47485355575866ndash70]) In that case the test statistic coming into effect is the maximum absoluteBaardarsquos w-test value (denoted by max-w) which is computed as [47]

max-w = maxiisin1middotmiddotmiddot n

|wi| (10)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

6 of 40

The decision rule for this case is given by

Accept H0 i f max-w le k

Otherwise

Accept H(i)A i f max-w gt k

(11)

The decision rule in Equation 11 says that if none of the n w-tests get rejected then we accept thenull hypothesisH0 If the null hypothesisH0 is rejected in any of the n tests then one can only assumethat detection occurred In other words if the max-w is larger than some percentile of its probabilitydistribution (ie some critical value k) then there is evidence that there is an outlier in the datasetTherefore outlier detection only informs us whether the null hypothesisH0 is accepted or not [45]

However the detection does not tell us which alternative hypothesisH(i)A would have led to the

rejection of the null hypothesisH0 The localisation of the alternative hypothesis which would haverejected the null hypothesis is a problem of outlier identification Outlier identification implies theexecution of a search among the measurements for the most likely outlier In other words one seeks tofind which of Baardarsquos w-test is the maximum absolute value max-w and if that max-w is greater thansome critical value k

Therefore the data snooping procedure of screening measurements for possible outliers is actuallyan important case of multiple hypothesis testing and not single hypothesis testing Moreover note thatoutlier identification only happens when outlier detection necessarily exists ie ldquooutlier identificationrdquoonly occurs when the null hypothesisH0 is rejected However correct detection does not necessarilyimply correct identification [475568]

In a special case of having only one single alternative hypothesis one should decide betweenthe null hypothesis H0 and only one single alternative hypothesis H(i)

A of Equation (8) In that casethe false decisions are restricted to Type I error and Type II error The probability of a Type I Errorα0 is the probability of rejecting the null hypothesis H0 when it is true whereas the probability of aType II error β0 is the probability of failing to reject the null hypothesisH0 when it is false (note theindex lsquo0rsquo represents the case in which a single hypothesis is tested) Instead of α0 and β0 there is theconfidence level CL = 1minus α0 and the power of the test γ0 = 1minus β0 respectively The first deals withthe probability of accepting a true null hypothesisH0 the second addresses the probability of correctlyaccepting the alternative hypothesisH(i)

A In that case given a probability of a Type I decision error α0we find the critical value k0 as follows

k0 = Φminus1(

1minus α0

2

)(12)

where Φminus1 denotes the inverse of the cumulative distribution function (cdf) of the two-tailed standardnormal distribution N(0 1)

The normalised least-squares residual wi follows a standard normal distribution with theexpectation that Ewi = 0 if H0 holds true (there is no outlier) On the other hand if the systemis contaminated with a single outlier at the ith location of the dataset (ie under H(i)

A ) then theexpectation of wi is

Ewi =radic

λ0 =radic

ciTQminus1e QeQminus1

e cinabla2i (13)

where λ0 is the non-centrality parameter for q = 1 Note therefore that there is an outlier thatcauses the expectation of wi to become

radicλ0 The square-root of the non-centrality parameter

radicλ0

in Equation (13) represents the expected mean shift of a specific w-test In such a case the termci

TQminus1e QeQminus1

e ci in Equation (13) is a scalar and therefore it can be rewritten as follows [71]

|nablai| = MDB0(i) =

radicλ0

ciTQminus1e QeQminus1

e ci foralli = 1 middot middot middot n (14)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

7 of 40

where |nablai| is the Minimal Detectable Bias (MDB0(i) ) for the case in which there is only one singlealternative hypothesis which can be computed for each individual alternative hypothesis according toEquation (8)

For a single outlier the variance of an estimated outlier denoted by σ2nablai

is

σ2nablai

=(

ciTQminus1

e QeQminus1e ci

)minus1 foralli = 1 middot middot middot n (15)

Thus the MDB can also be written as

MDB0(i) = σnablai

radicλ0 foralli = 1 middot middot middot n (16)

where σnablai =radic

σ2nablai

is the standard deviation of estimated outlier nablaiThe MDB in Equations (14) or (16) of an alternative hypothesis is the smallest-magnitude outlier

that can lead to the rejection of the null hypothesisH0 for a given α0 and β0 Thus for each model ofthe alternative hypothesisH(i)

A the corresponding MDB can be computed [477273] The limitation ofthis MDB is that it was initially developed for the binary hypothesis testing case In that case the MDBis a sensitivity indicator of Baardarsquos w-test when only one single alternative hypothesis is taken intoaccount In this article we are confined to multiple alternative hypotheses Therefore both the MDBand MIB are computed by considering the case of multiple hypothesis testing

For a scenario coinciding with the null hypothesisH0 under multiple testing hypothesis thereis the probability of incorrectly identifying at least one alternative hypothesis This type of wrongdecision is known as the family-wise error rate (FWE) The FWE is defined as

FWE = αprime =le 1minus (1minus α0)n (17)

which is approximatelyFWE = αprime le ntimes α0 (18)

where α0 is the significance level for an individual test The quantity in Equation (18) is just equal tothe upper bound of the Bonferroni inequality ie αprime le nα [74] For example if the FWE level (αprime) is005 and one is running 5 tests then each test will have an α0 of 0055 = 001 In other words one usesa global Type I Error rate αprime that combines all tests under consideration instead of an individual errorrate α0 that only considers one test at a time time [70] In that case the critical value kbon f is computedas

kbon f = Φminus1(

1minus αprime

2n

)(19)

The Bonferroni in Equation (18) is a good approximation for the case in which alternativehypotheses are independent In practice however the test results always depend on each otherto some degree because we always have a correlation between w-tests The correlation coefficientbetween any Baardarsquos w-test statistic (denoted by ρwi wj ) such as wi and wj is given by [57]

ρwi wj =ci

TQminus1e QeQminus1

e cjradicciTQminus1

e QeQminus1e ci

radiccjTQminus1

e QeQminus1e cj

forall(i 6= j) (20)

The correlation coefficient ρwi wj can assume values within the range [minus1 1]Here the extreme normalised residuals max-w (ie maximum absolute) in Equation (10) are

treated directly as a test statistic Note that when using Equation (10) as a test statistic the decisionrule is based on a one-sided test of the form max-w le k However the distributions of max-w cannotbe derived from well-known test distributions (eg normal distribution) The procedure to computethe critical value of max-w is given step-by-step by Rofatto et al [45]

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

8 of 40

The other side of the multiple testing problem is the situation in which there is an outlier in thedataset In that case apart from Type I and Type II errors there is a third type of wrong decisionassociated with Baardarsquos w-test Baardarsquos w-test can also flag a non-outlying observation whilethe lsquotruersquo outlier remains in the dataset We are referring to the Type III error [53] also referred toas the probability of wrong identification (PWI) The description of the Type III error involves aseparability analysis between alternative hypotheses [57666869] Therefore we are now interested inthe identification of the correct alternative hypothesis In that case the non-centrality parameter inEquation (13) is not only related to the sizes of Type I and Type II decision errors but also dependenton the correlation coefficient ρwi wj given by Equation (20)

On the basis of the assumption that one outlier is in the ith position of the dataset (ie H(i)A is

rsquotruersquo) the probability of a Type II error (also referenced as the probability of ldquomissed detectionrdquo denotedby PMD) for multiple testing is

PMD = P(⋂n

i=1|wi| le k

∣∣∣ H(i)A true

) (21)

and the size of a Type III wrong decision (also called ldquomisidentificationrdquo denoted by PWI) is given by

PWI =n

sumi=1P(|wj| gt |wi| foralli |wj| gt k(i 6= j)

∣∣∣ H(i)A true

)(22)

On the other hand the probability of correct identification (denoted by PCI) is

PCI = P(|wi| gt |wj| forallj |wi| gt k(i 6= j)

∣∣∣ H(i)A true

)(23)

with1minusPCI = 1minusPCD + PWI = PMD + PWI (24)

Note that the three probabilities of missed detection PMD wrong identification PWI and correctidentification PCI sum up to unity ie PMD + PWI + PCI = 1

The probability of correct detection PCD is the sum of the probability of correct identification PCI(selecting a correct alternative hypothesis) and the probability of misidentification PWI (selecting oneof the n-1 other hypotheses) ie

PCD = PCI + PWI (25)

The probability of wrong identification PWI is identically zero PWI = 0 when the correlationcoefficient is exactly zero ρwi wj = 0 In that case we have

PCD = PCI = 1minusPMD (26)

The relationship given in Equation (26) would only happen if one neglected the nature of thedependence between alternative hypotheses In other words this relationship is valid for the specialcase of testing the null hypothesisH0 against only one single alternative hypothesisH(i)

A Since the critical region in multiple hypothesis testing is larger than that in single hypothesis

testing the Type II decision error (ie PMD) for the multiple test becomes smaller [47] This meansthat the correct detection in binary hypothesis testing (γ0) is smaller than the correct detection PCDunder multiple hypothesis testing ie

PCD gt γ0 (27)

Detection is easier in the case of multiple hypothesis testing than single hypothesis testingHowever the probability of correct detection PCD under multiple testing is spread out over allalternative hypotheses and therefore identifying is harder than detecting From Equation (25) it is

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

9 of 40

also noted that detection does not depend on identification However outlier identification dependson correct outlier detection Therefore we have the following inequality

PCI le PCD (28)

Note that the probability of correct identification PCI depends on the probability of misseddetection PMD and wrong identification PWI for the case in which data snooping is run only once iea single round of estimation and testing However in this paper we deal with data snooping in itsiterative form (ie IDS) and therefore the probability of correct identification PCI depends on otherdecision rules

In contrast to the data snooping single run the success rate of correct detection PCD forIDS depends on the sum of the probabilities of correct identification PCI wrong exclusion (PWE)over-identification cases (Pover+ and Poverminus) and statistical overlap (Pol) ie

PCD = 1minusPMD = PCI + PWE + Pover+ + Poverminus + Pol (29)

It is important to mention that the probability of correct detection is the complement of theprobability of missed detection Note from Equation (29) that the probability of correct detection PCDis available even for cases in which the identification rate is null PCI = 0 However the probabilityof correct identification (PCI) necessarily requires that the probability of correct detection PCD begreater than zero For the same reasons given for the data snooping single run in the previous sectiondetecting is easier than identifying In that case we have the following relationship for the success rateof correct outlier identification PCI

PCI = PCD minus (PWE + Pover+ + Poverminus + Pol) (30)

such asexist(PCI) isin [0 1] lArrrArr (PCD) gt 0 (31)

It is important to mention that the wrong exclusion PWE describes the probability of identifyingand removing a non-outlying measurement while the lsquotruersquo outlier remains in the dataset In otherwords PWE is the Type III decision error for IDS) The overall wrong exclusion PWE is the result of thesum of each individual contribution to PWE ie

PWE =nminus1

sumi=1PWE(i) (32)

On the basis of the probability levels of correct detection PCD and correct identification PCI the sensitivity indicators of minimal biasesmdashMinimal Detectable Bias (MDB) and Minimal IdentifiableBias (MIB)mdashfor a given αprime can be computed as follows

MDB = arg minnablaiPCD(nablai) gt PCD foralli = 1 middot middot middot n (33)

MIB = arg minnablaiPCI(nablai) gt PCI foralli = 1 middot middot middot n (34)

Equation (33) gives the smallest outlier nablai that leads to its detection for a user-defined correctdetection rate PCD whereas (34) provides the smallest outlier nablai that leads to its identification for auser-defined correct identification rate PCI

As a consequence of the inequality in (28) the MIB will be larger than MDB ie MIB ge MDBFor the special case of having only one single alternative hypothesis there is no difference between theMDB and MIB [55] The computation of MDB0 is easily performed by Equations (14) or (16) whereasthe computation of the MDB in Equation (33) and the MIB in Equation (34) must be computed using

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

10 of 40

Monte Carlo because the acceptance region (as well as the critical region) for the case of multiplealternative hypotheses is analytically intractable

3 Material and Methods

Here we use the procedure provided by Rofatto et al [45] to compute the probability levelsassociated with IDS as well as to estimate the minimal biases mdash Minimal Detectable Bias(MDB) andMinimal Identifiable Bias (MIB) The procedure is summarised in Figure 1

The procedure to compute the critical value of max-w (k) is given step-by-step as follows

1 Specify the probability density function (pdf) of the w-test statistics The pdf assigned to thew-test statistics under anH0-distribution is

(w1 w2 w3 middot middot middot wn)T sim N (0Rw) (35)

where Rw isin Rntimesn is the correlation matrix with the main diagonal elements equal to 1 and theoff-diagonal elements are the correlation between the w-test statistics computed by Equation (20)

2 In order to have w-test statistics under H0 uniformly distributed random number sequencesare produced by the Mersenne Twister algorithm and then they are transformed into a normaldistribution by using the BoxndashMuller transformation [75] BoxndashMuller has already been usedin geodesy for Monte Carlo experiments [467677] Therefore a sequence of m random vectorsfrom the pdf assigned to the w-test statistics is generated according to Equation (35) In that casewe have a sequence of m vectors of the w-test statistics as follows[

(w1 w2 w3 middot middot middot wn)T(1)

(w1 w2 w3 middot middot middot wn)T(2)

middot middot middot (w1 w2 w3 middot middot middot wn)T(m)

](36)

3 Compute the test statistic by Equation (10) for each sequence of w-test statistics Thus we have(max

iisin1middotmiddotmiddot n|wi|(1) max

iisin1middotmiddotmiddot n|wi|(2) middot middot middot max

iisin1middotmiddotmiddot n|wi|(m)

)(37)

4 Sort in ascending order the maximum test statistic in Equation (37) getting a sorted vector wsuch that

w(1) lt w(2) w(3) middot middot middot lt w(m) (38)

The sorted values w in Equation (38) provide a discrete representation of the cumulative densityfunction (cdf) of the maximum test statistic max-w

5 Determine the critical value k as follows

k = w[(1minusαprime)timesm] (39)

where [] denotes rounding down to the next integer that indicates the position of the selectedelements in the ascending order of w This position corresponds to a critical value for a stipulatedoverall false alarm probability αprime This can be done for a sequence of values αprime in parallel

It is important to mention that the probability of a Type I decision error for multiple testing αprime islarger than that of Type I for single testing α0 This is because the critical region in multiple testing islarger than that in single hypothesis testing

After finding the critical value k the procedure based on Monte Carlo is also applied to computethe probability levels of IDS when there is an outlier in the dataset The overview of of the mainelements involved with the IDS was detailed in Section 2 The flowchart is displayed in Figure 1

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

11 of 40

Inp

ut

Functional amp Stochastic Models Significance level (αrsquo) Outlier magnitude range (nablai) Probability of correct detection ( ) Probability of correct identification ( )

CD CI

Gen

erat

e

ran

do

m d

raw

s

Generate a sample of random errors

119942~119925(120782 Q119942)

Compute the total error ε = e + cinablai

Compute the least-squares residual vector e = Rε

Iterativ

e ou

tlier elimin

ation

pro

cess

Compute the maximum w-test statistic

max wi

max wi gt k

End

Identify and remove the measurement from the dataset

Size of max wi gt 1

det ATWA gt 0

Compute w-test statistics

Compute and store nCI nMD nWE nover+ nover- and nol

Co

mp

ute th

e pro

bab

ilities

Number of Monte Carlo experiments (m) = 200000

Compute the probabilities CI MD WE over+ over- and ol

Sensitivity indicators Outlier detection (MDB)

amp Outlier identification (MIB)

YES NO

YES

NO

YES NO

YES

NO

Figure 1 Flowchart of the algorithm to compute the probability levels of Iterative Data Snooping (IDS)for each measurement in the presence of an outlier [45]

The steps displayed in Figure 1 are detailed as follows

1 First random error vectors are synthetically generated on the basis of a multivariate normaldistribution because the assumed stochastic model for random errors is based on the matrixcovariance of the observations Here we use the Mersenne Twister algorithm [78] to generate asequence of random numbers and BoxndashMuller [75] to transform it into a normal distribution

2 The total error (ε) is a combination of random errors and its corresponding outlier is given asfollows

ε = e + cinablai (40)

The magnitude intervals of simulated outliers are user-defined The magnitude intervals arebased on the standard deviation of the observation (σ) eg |3σ| to |6σ| Since the outlier can bepositive or negative the proposed algorithm randomly selects the signal of the outlier (for q = 1)Here we use the discrete uniform distribution to select the signal of the outlier

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

12 of 40

In Equation (40) e is the random error generated from the normal distribution according toEquation (2) and the second part cinablai is the additional parameter that describes the alternativemodel according to Equation (40)

3 Next we compute the least-squares residuals vector according to Equation (3) but now we usethe total error (ε) in Equation 40 as follows

e = Rε (41)

4 Run the IDS In this step the test statistic is computed according to (9) Then the maximumtest statistic value is obtained according to Equation (10) Now the decision rule is based on thecritical value k computed by Monte Carlo in previous stage After identifying the measurementsuspected to be the most likely outlier it is excluded from the model and least-squares estimationand data snooping are applied iteratively until there are no further outliers identified in thedataset If two or more observations are simultaneously detected (ie if maxminusw gt k and the sizeof maxminusw gt 1 then the IDS is ended) Furthermore every time that a measurement suspectedto be the most likely outlier is removed from the model we check whether the normal matrixATW A is invertible or not If the determinant of ATW A is 0 det|ATW A| = 0 then there isa necessary and sufficient condition for a square matrix ATW A to be non-invertible In otherwords the IDS is ended when det|ATW A| = 0 If no outlier is detected (ie maxminusw lt k) thenthe IDS is also ended

The IDS procedure is performed for m experiments of random error vectors for each experimentcontaminated by an outlier in the ith measurement Therefore for each measurementcontaminated by an outlier there are υ = 1 m experiments

5 After running m=200000 experiments [47] the probabilities associated with IDS are computed(note m refers to the total number of Monte Carlo experiments) The probability of correctidentification (PCI) is the ratio between the number of times that the outlier is correctly identified(denoted as nCI) and m experiments ie

PCI =nCIm

(42)

Similar to Equation (42) the wrong decisions are computed as

PMD =nMD

m(43)

where nMD is the number of experiments in which IDS does not detect the outlier (PMDcorresponds to the rate of missed detection)

PWE =nWE

m(44)

where nWE is the number of experiments in which the IDS procedure flags and removes onlyone single non-outlying measurement while the lsquotruersquo outlier remains in the dataset (PWE is thewrong exclusion rate)

Pover+ =nover+

m(45)

where nover+ is the number of experiments in which IDS correctly identifies and removes theoutlying measurement and others and Pover+ corresponds to its probability

Poverminus =noverminus

m(46)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

13 of 40

where noverminus represents the number of experiments in which IDS identifies and removes morethan one non-outlying measurement whereas the lsquotrue outlierrsquo remains in the dataset (Poverminus isthe probability corresponding to this error probability class)

Pol =nolm

(47)

where nol is the number of experiments in which the detector in Equation (10) flags two (or more)measurements simultaneously during a given iteration of IDS Here this is referred to as thenumber of statistical overlap nol and Pol corresponds to its probability

6 Finally the sensitivity indicators (MDB and MIB) are computed based on Equation 33 andEquation 34 respectively

In this paper the probability levels associated with IDS were computed for each observationindividually and for each outlier magnitude However they were grouped into clusters based onnumber of local redundancy (ri) and maximum absolute correlation between the w-test statistics(ρwi wj ) Furthermore we take care to control the family-wise error rate

4 Case study of Levelling Geodetic Network

The analyse of the constraints effects on the probability levels of IDS are performed by an exampleof a levelling geodetic network

A levelling geodetic network is a set of points located on the Earthrsquos surface or near it andinterconnected by height difference measurements Some of these points are associated with a verticalheight reference system (ie some points have known height) whereas others are parameters (unknownheights) The term known here means that those points with known heights are constraints necessaryto ensure that parameters (unknown heights) are sufficiently estimable In the sense of modern geodeticreference systems realisation and unification of a vertical height system consists of a combination ofGNSS (Global Navigation Satellite System) and spirit levelling with geoid models We will not gointo the details about height geodetic reference system and frame (Readers who wish to have furtherdetails on that issue please see eg [79])

The mathematical model associated with a geodetic levelling network is linear (ie the levellingmeasurements bear a known linear relationship with the unknown heights) Geodetic networksusually has more measurements than parameters (ie n gt u) ie we have a redundant measurementsystem However due to intrinsic random errors in a system redundant measurements often leadto an inconsistent system of equations To make the system consistent we have to introduce theinformation about the random measurement errors The stochastical properties of the measurementerrors are directly associated with the assumption of the probability distribution of these errors Ingeodesy and many other scientific branches the well-known normal distribution is one of the mostused as measurement error model [80] Because of this the model ceases to be purely mathematicaland becomes a statistical model with functional and stochastic part

In the absence of outliers ie under null hypothesisH0 in Equation (1) the model for levellinggeodetic network can be written as follows

∆himinusj + e∆himinusj= hj minus hi (48)

where ∆himinusj is the height difference measured from point i to j and e∆himinusjis the random error associated

with the levelling measurement For a only one single levelling line one of these points is the constraint(known height) from which the height of another point is determined The point with known height isalso referred to as control point or benchmark In a geodetic network on the other hand we have a setof levelling lines that connect both from points of unknown height and from points of known height(constraints) Under normal working conditions (ie H0) the measurement errors model is then givenby Equation (2)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

14 of 40

In this case study we demonstrate the application of the proposed algorithm by [45] subject to adifferent constraints scenarios for that levelling geodetic network

Recent work has been focused on non-stochastic constraints (hard constraints) Here we alsoconsider cases where constraints are subject to a random errors (ie soft constraints) That softconstraints are non-deterministic and therefore they are measured model elements that can bedesignated as a priori information The approach of this example can be applied in the designstage of geodetic network analysis [6265] Furthermore the experiments performed here can also beextended to geodetic deformation analysis when the deformation effects are unmodelled in the senseof deterministic form [28]

41 Problem description

To analyse the constraints effects on the IDS an example is taken from a geodetic levellingnetwork with 12 height differences between the points The equipment used to measure the leveldifference can be an electronic digital level In that case the levelling measurement system comprisesof a special bar-coded staff (also called barcode rod) and a digital level (instrument) A digital level isbasically a telescope that enables a horizontal line of sight Digital levels consist of additional electronicimage processing components to automatically read and analyse digital (bar coded) levelling staffswhere the graduation is replaced by a manufacturer dependent code pattern Generally the result isautomatically stored in the data collector of the digital level An example of a digital level ndash bar-codestaff system is displayed in the Figure 2 For more details about digital level see eg [81ndash84]

Figure 2 Example of a digital level ndash bar-code staff system [45]

The standard-deviation of the uncorrelated measurements were the same and taken equal toσ = 1mm The points are indicated as A to G The eight network configuration are displayed in Figure3(a b c d and e) and their details are given as follows

1 Figure 3(a) Network with 1 hard constraint (ie network minimally constrained) Since thedimension of the network is 1D the minimum information necessary to estimate the unknownheights is one The height of G was fixed as control point (hard constraint) and 6 unknown heights(ABCDEF) were minimally constrained Therefore the redundancy of the system (or overalldegrees of freedom) was r = nminus rank(A) = nminus u = 12minus 6 = 6

2 Figure 3(b) Network with one extra hard constraint (ie two hard constraints) The heights A andD were taken as hard constraints (ie heights A and D were fixed) The redundancy of the system

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

15 of 40

in that case was r = 12minus 5 = 7 with 5 unknown heights (BCEFG) over-constrained

3 Figure 3(c) Network with two extra hard constraints (ie three hard constraints)The heights A Dand G were taken as hard constraints In that case the redundancy of the system was r = 12minus 4 = 8

4 Figure 3(d) Network with two soft constraints (A and D) In that case a standard-deviation largerthan zero was assigned to both constraints (ie σc gt 0 In other words A and D were processedas being both observations and unknown parameters ie A and D were pseudo-observationsHere the both constraints were simultaneously relaxed by considering their uncertainties 10 timesworse than measurements (ie σc = 10times σ = 10mm) 10 times better than measurements (ieσc = 01mm) and their uncertainties equal to the measurements (σc = 1mm) In that case theredundancy of the system was r = 14minus 7 = 7

5 Figure 3(e) Network processed with A D and G as pseudo-observations Those three constraintswere simultaneously relaxed by considering their standard-deviations equal to σc = 10mm (10 timesworse than measurements) σc = 01mm (10 times better than measurements) and σc = 1mm (thesame as the measurements) In that case the redundancy of the system was r = 15minus 7 = 8

y1

y2

y3

y4

y5

y6

y7 y8

y9 y10

B C

D

E F

G y12 y11

A

y1

y2

y3

y4

y5

y6

y7 y8

y9 y10

A

B C

D

E F

G y12 y11

y1

y2

y3

y4

y5

y6

y7 y8

y9 y10

A

B C

D

E F

G y12 y11

(a) (b)

(c) (d)

(e)

Unknown Heights Soft Constraints Hard Constraints

y1

y2

y3

y4

y5

y6

y7 y8

y9 y10

A

B C

D

E F

G y12 y11

y1

y2

y3

y4

y5

y6

y7 y8

y9 y10

A

B C

D

E F

G y12 y11

Figure 3 Levelling Geodetic Network subject to different constraint scenarios

The following system of equations for that problem is given by

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

16 of 40

y1 + e1 = hB minus hA

y2 + e2 = hC minus hB

y3 + e3 = hD minus hC

y7 + e7 = hB minus hG

y8 + e8 = hC minus hG

y11 + e11 = hB minus hF

y12 + e12 = hC minus hE

(49)

The functional model (A) for the system of equations in 49 is given by

A =

minus1 1 0 0 0 0 00 minus1 1 0 0 0 00 0 minus1 1 0 0 00 0 0 minus1 1 0 00 0 0 0 minus1 1 01 0 0 0 0 minus1 00 1 0 0 0 0 minus10 0 1 0 0 0 minus10 0 0 0 1 0 minus10 0 0 0 0 1 minus10 1 0 0 0 minus1 00 0 1 0 minus1 0 0

(50)

Note that the rank defect of the matrix A is uminus rank(A) = 7minus 6 = 1 In that case there is neededto add at least one constraint in order to avoid rank deficiency of the matrix A This is guaranteedwhen we take one height as known For example from the network in Figure 3(a) we have added theheight G as known (ie as a hard constraint) In that case the constraint equation should be addedinto the system in 49 ie

y13 = hG with σy13 = 0 (51)

noticing that because the standard deviation is zero the observation is non-stochastic (hard constraint)and the residual ey13 = 0 This can generate problems in the inversion of the covariance matrix of theobservations Qe for the calculation of the weight matrix W because the weight for that constraintwould be 1

0 = infin In order to avoid that problem we have eliminated the rank deficiency of matrixA by removing the seventh column of matrix A in 50 associated with the height G Now we haveuminus rank(A) = 6minus 6 = 0

The constraint defines the geodetic datum ie the S-system ([85]p41) Another approach tosolving the system of equations in 2 could be based on generalised (pseudo) inverses (see eg [86])

The location of the constraints can be chosen in some circumstances for example during thedesign stage of a geodetic network For the special case of having a minimally constrained systemthe location of the constraint will not influence the w-test statistics and the sensitivity indicators (MIBand MDB) [18] However more constraints than the minimum necessary to have a solution (ie extraconstraints or redundant constraints) can change the least-squares residuals and hence w-test statisticsand the minimal biases

From the network with one extra constraint (2 constraints) in Figure 3(b) for example the bothfirst (height A) and fourth column (height D) of matrix A in 50 were eliminated in the case of having

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

17 of 40

the two heights as hard constraints For the case where these two heights (A and D) were taken as softconstraints however two observation equations were added to Equation 49 ie

y13 + e13 = hA σy13 gt 0

y14 + e14 = hD σy14 gt 0(52)

In the case of soft constraints in Equation 52 two lines were added in matrix A In other wordsA and D were taken as pseudo-observations In that case the rank deficiency was also null (ieuminus rank(A) = 7minus 7 = 0) the redundancy of the system was r = nminus rank(A) = nminus u = 7 and thematrix A was given as follows

A =

minus1 1 0 0 0 0 00 minus1 1 0 0 0 00 0 minus1 1 0 0 00 0 0 minus1 1 0 00 0 0 0 minus1 1 01 0 0 0 0 minus1 00 1 0 0 0 0 minus10 0 1 0 0 0 minus10 0 0 0 1 0 minus10 0 0 0 0 1 minus10 1 0 0 0 minus1 00 0 1 0 minus1 0 01 0 0 0 0 0 00 0 0 1 0 0 0

(53)

For this example of two soft constraints and by considering the both soft constraints withstandard-deviation σc = 10mm the symmetric and positive semi-definite covariance matrix of theobservations (Qe) was given as follows

Qe =

1 0 0 middot middot middot 0 00 1 0 middot middot middot 0 00 0 1 middot middot middot 0 0

0 0 0 middot middot middot 100 00 0 0 middot middot middot 0 100

(54)

The last two rows and columns of the matrix Qe in Equation 54 refer to the variances (σc2 =

(10mm)2 = 100mm2) of the heights constraints A and D respectivelySimilarly matrices A and Qe were constructed for the other cases studied hereAlthough other measurements are able to identify an outlier for the case of having only one single

soft constraint the pseudo-observation (constraint) is not In that case the defect configurationis associated with the additional parameter in the constraint (ie the presence of an outlier inthe constraint) In other words an additional parameter on the soft constraint will not estimableFor example if the height point G was taken as a soft constraint the presence of an outlier inpseudo-observation G would lead to rank deficiency of matrix A ie u minus rank(A) = 8 minus 7 = 1Therefore the case of having only one single soft constraint was not considered here

42 Result of the Hard Constraint Effects on the Iterative Outlier Elimination Procedure

In this section we present the results of the effects of the hard constraint on the IDS The scenariosin Figure 3(a) (network minimally constrained) Figure 3(b) (two hard constraints) and Figure 3(c)(three hard constraints) were considered for the analysis

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

18 of 40

The correlation coefficient between w-test statistics (ρwi wj ) were computed for the three scenariosaccording to Equation(20) Table 1 provides that correlation for that network minimally constrainedTable 2 for that network over-constrained with two hard constraints and Table 3 for that networkover-constrained with three hard constraints

Table 1 Correlation matrix of w-test statistics for the levelling network minimally constrained in Figure3(a)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12

wy1 10 02 01 01 02 10 -02 00 00 02 -04 -01wy2 10 02 02 03 02 05 -05 -02 02 03 -03wy3 10 10 02 01 00 02 -02 00 01 04wy4 10 02 01 00 02 -02 00 01 04wy5 10 02 -02 02 05 -05 03 -03wy6 10 -02 00 00 02 -04 -01wy7 10 -03 -03 -04 -04 -01wy8 10 -04 -03 -01 -04wy9 10 -03 01 04wy10 10 04 01wy11 10 -01wy12 100

Table 2 Correlation matrix of w-test statistics for the levelling network with two hard constraints inFigure 3(b)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12

wy1 10 04 04 -03 -01 04 -03 01 01 01 -04 -01wy2 10 04 -01 01 -01 04 -04 -01 01 03 -03wy3 10 04 -01 -03 -01 03 -01 -01 01 04wy4 10 04 04 01 01 -03 01 01 04wy5 10 04 -01 01 04 -04 03 -03wy6 10 -01 -01 -01 03 -04 -01wy7 10 -04 -03 -04 -04 -01wy8 10 -04 -03 -01 -04wy9 10 -04 01 04wy10 10 04 01wy11 10 -01wy12 10

Table 3 Correlation matrix of w-test statistics for the levelling network with three hard constraints inFigure 3(c)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12

wy1 10 03 01 -01 -01 01 -04 -01 -01 -01 -03 -01wy2 10 03 -01 01 -01 03 -03 -01 01 03 -03wy3 10 01 -01 -01 01 04 01 01 01 03wy4 10 03 01 -01 -01 -04 -01 01 03wy5 10 03 -01 01 03 -03 03 -03wy6 10 01 01 01 04 -03 -01wy7 10 -01 -01 -01 -03 -01wy8 10 -01 -01 -01 -03wy9 10 -01 01 03wy10 10 03 01wy11 10 -01wy12 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

19 of 40

Table 4 gives the local redundancy (ri) the standard-deviation of the LS-estimated outlier σnablai andthe maximum absolute correlation (maxρwi wj

) for each scenario of hard constraint set out of in thisstudy ie Figure 3(ab and c)

Table 4 Local redundancy (ri) standard-deviation of the LS-estimated outlier σnablai and the maximumabsolute correlation (maxρwi wj

) for each scenario of hard constraint

1 hard constraint 2 hard constraints 3 hard constraintsMeasurement ri σnablai maxρwi wj

ri σnablai maxρwi wjri σnablai maxρwi wj

y1 0396 1589 100 0583 1309 036 0708 1188 041y2 0500 1414 047 0583 1309 036 0583 1309 032y3 0396 1589 100 0583 1309 036 0708 1188 041y4 0396 1589 100 0583 1309 036 0708 1188 041y5 0500 1414 047 0583 1309 036 0583 1309 032y6 0396 1589 100 0583 1309 036 0708 1188 041y7 0563 1333 047 0583 1309 036 0708 1188 041y8 0563 1333 047 0583 1309 036 0708 1188 041y9 0563 1333 047 0583 1309 036 0708 1188 041y10 0563 1333 047 0583 1309 036 0708 1188 041y11 0583 1309 043 0583 1309 036 0583 1309 032y12 0583 1309 043 0583 1309 036 0583 1309 032

Here the twelve levelling measurements were clustered into four clusters The clustering wasdefined according to the local redundancy (ri) and the maximum absolute correlation (maxρwi wj

) inTable 4 This is the first time that a clustering technique based on the similarity of local redundancyand the maximum absolute correlation between w-test statistics is applied to a problem of geodeticnetworks Similarly this has been done in [45] The four cluster were defined as follows

bull Cluster 1 y1 y3 y4 and y6bull Cluster 2 y2 and y5bull Cluster 3 y7 y8 y9 and y10bull Cluster 4 y11 and y12

The probability levels associated with IDS were averaged for each of these clusters Here we computethe probability levels of the IDS based on the procedure in Section (3) [45] The critical values werek = 389 k = 393 and k = 393 for 1 hard constraint 2 hard constraints and 3 hard constraintsrespectively These critical values were found for αprime = 0001 according to the procedure described inAppendix The probability of correct identification (PCI) and correct detection (PCD = 1minusPMD)are displayed in Figure 4 for each number of hard constraint (denoted by hc)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

20 of 40

(c) (d)

(e) (f)

(g) (h)

(a) (b)

Figure 4 Probability of correct identification (PCI) and Probability of correct detection (PCD) for thecase of hard constraints and for αprime = 0001 Cluster 1(ab) Cluster 2(cd) Cluster 3(ef) and Cluster4(gh)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

21 of 40

The outlier magnitude were defined from |5σ| to |9σ| The outlier of |5σ| was chosen because itis approximately the lowest MDB0(i) of the network when a single hypothesis testing is in play ascan be seen in Equation (14) or Equation (16) That MDB0(i) of |5σ| was computed for significancelevel of αprime = 0001 and a power of the test γ0 = 08 This strategy reduces the search space for a MIB(Minimal Identifiable Bias) because we will always have the following inequality MIB ge MDB0(i)[4755] Remember that the IDS procedure is an example of multiple hypothesis testing (see 2)

The sensitivity indicators MDB and MIB for IDS were also computed according to Equation (33)and (34) respectively The success rate for outlier detection and outlier identification were taken asbeing PCD = PCI = 08 respectively Table 5 provides the values of MDB and MIB for that case ofhard constraints

Table 5 MDB and MIB for the case of hard constraints based on αprime = 0001 and PCD = PCI = 08

1 hard constraint 2 hard constraints 3 hard constraintsCluster MDB (σ) MIB (σ) MDB (σ) MIB (σ) MDB (σ) MIB (σ)

1 75 - 63 63 57 572 67 68 63 64 63 643 64 64 63 63 58 584 64 64 64 64 64 64

Figure 5 shows the probability of wrong exclusion (PWE) The over-identification cases (Pover+

and Poverminus) were smaller than 0001 (ie they were practically null) There were not statistical overlap(Pol) for clusters 2 3 and 4 We will discuss more about statistical overlap (Pol) later

(c) (d)

(a) (b)

Figure 5 Probability of wrong exclusion (PWE) for the case of hard constraints and for αprime = 0001Cluster 1(a) Cluster 2(b) Cluster 3(c) and Cluster 4(d)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

22 of 40

43 Result of the Soft Constraint Effects on the Iterative Outlier Elimination Procedure

The both configuration in Figure 3(d) and Figure 3(e) were analysed in terms of soft constraintsIn that case the critical values were k = 395 k = 395 and k = 392 for two soft constraints withσc = 01mm σc = 1mm and σc = 10mm respectively In the case of three soft constraints the criticalvalues found were k = 399 k = 399 and k = 396 for σc = 01mm σc = 1mm and σc = 10mmrespectively All these critical values were computed for αprime = 0001

The correlation coefficient between w-test statistics (ρwi wj ) are displayed in Table 6 Table 7and Table 8 for two soft constraints with standard-deviation 10 times larger than measurements(ie σc = 10times σ = 10mm) 10 times better than measurements (ie σc = 01mm) and equal to themeasurements (σc = 1mm) respectively

Table 6 Correlation matrix of w-test statistics for the levelling network with two soft constraints withσc = 10mm in Figure 3(d)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14

wy1 10 02 01 01 02 10 -02 00 00 02 -04 -01 006 -01wy2 10 02 02 03 02 05 -05 -02 02 03 -03 003 00wy3 10 10 02 01 00 02 -02 00 01 04 01 -01wy4 10 02 01 00 02 -02 00 01 04 -01 01wy5 10 02 -02 02 05 -05 03 -03 00 00wy6 10 -02 00 00 02 -04 -01 -01 01wy7 10 -03 -03 -04 -04 -01 00 00wy8 10 -04 -03 -01 -04 00 00wy9 10 -03 01 04 00 00wy10 10 04 01 00 00wy11 10 -01 00 00wy12 10 00 00wy13 10 -10wy14 10

Table 7 Correlation matrix of w-test statistics for the levelling network with two soft constraints withσc = 01mm in Figure 3(d)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14

wy1 10 04 04 -03 -01 04 -03 01 01 01 -04 -01 06 -06wy2 10 04 -01 02 -01 04 -04 -01 01 03 -03 04 -04wy3 10 04 -01 -03 -01 03 -01 -01 01 04 06 -06wy4 10 04 04 01 01 -03 01 01 04 -06 06wy5 10 04 -01 01 04 -04 03 -03 -04 04wy6 10 -01 -01 -01 03 -04 -01 -06 06wy7 10 -04 -03 -04 -04 -01 -02 02wy8 10 -04 -03 -01 -04 02 -02wy9 10 -04 01 04 02 -02wy10 10 04 01 -02 02wy11 10 -01 00 00wy12 10 00 00wy13 10 -10wy14 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

23 of 40

Table 8 Correlation matrix of w-test statistics for the levelling network with two soft constraints withσc = 1mm in Figure 3(d)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14

wy1 10 03 02 -01 01 07 -03 00 01 02 -04 -01 04 -04wy2 10 03 01 03 01 04 -04 -01 01 03 -03 03 -03wy3 10 07 01 -01 00 03 -02 -01 01 04 04 -04wy4 10 03 02 01 02 -03 00 01 04 -04 04wy5 10 03 -01 01 04 -04 03 -03 -03 03wy6 10 -02 -01 00 03 -04 -01 -04 04wy7 10 -03 -03 -04 -04 -01 -01 01wy8 10 -04 -03 -01 -04 01 -01wy9 10 -03 01 04 01 -01wy10 10 04 01 -01 01wy11 10 -01 00 00wy12 10 00 00wy13 10 -10wy14 10

Table 9 gives the local redundancy (ri) the standard-deviation of the LS-estimated outlier σnablai andthe maximum absolute correlation (maxρwi wj

) for the scenarios of two constraints

Table 9 Local redundancy (ri) standard-deviation of the LS-estimated outlier σnablai (mm) and themaximum absolute correlation (maxρwi wj

) for each scenario of two soft constraints

σc = 01mm σc = 1mm σc = 10mmMeasurement ri σnablai maxρwi wj

ri σnablai maxρwi wjri σnablai maxρwi wj

y1 0581 1312 0564 0471 1457 0681 0397 1587 0994y2 0582 1311 0376 0533 1369 0423 0501 1413 0471y3 0581 1312 0564 0471 1457 0681 0397 1587 0994y4 0581 1312 0564 0471 1457 0681 0397 1587 0994y5 0582 1311 0376 0533 1369 0423 0501 1413 0471y6 0581 1312 0564 0471 1457 0681 0397 1587 0994y7 0583 1310 0359 0571 1324 0423 0563 1333 0471y8 0583 1310 0359 0571 1324 0423 0563 1333 0471y9 0583 1310 0359 0571 1324 0423 0563 1333 0471y10 0583 1310 0359 0571 1324 0423 0563 1333 0471y11 0583 1309 0358 0583 1309 0398 0583 1309 0433y12 0583 1309 0358 0583 1309 0398 0583 1309 0433y13 0007 1163 1000 0300 1826 1000 0497 14189 1000y14 0007 1163 1000 0300 1826 1000 0497 14189 1000

From Table 9 five clusters were defined for each case of two soft constraints ie for the casewhere heights A and D were given as soft constraints in Figure 3(d) as follows

bull Cluster 1 y1 y3 y4 and y6bull Cluster 2 y2 and y5bull Cluster 3 y7 y8 y9 and y10bull Cluster 4 y11 and y12bull Cluster 5 y13 and y14

The probabilities of correct identification (PCI) and correct detection (PCD) for the measurements(Cluster 1 to Cluster 4) subject to the scenarios of two soft constraints (heights A and D) are displayedin Figure 6

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

24 of 40

(c) (d)

(e) (f)

(g) (h)

(a) (b)

Figure 6 Probability of correct identification (PCI) and Probability of correct detection (PCD) for themeasurements subject to the scenarios of two soft constraints for αprime = 0001 Cluster 1(ab) Cluster2(cd) Cluster 3(ef) and Cluster 4(gh)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

25 of 40

Note that the Cluster 5 is associated with the two soft constraints (ie y13 and y14) The probabilityof correct identification (PCI) for these both soft constraints were null However the probability ofcorrect detection were not Figure 7 shows the probability of correct detection (PCD) for these two softconstraints (ie heights A and D)

Figure 7 Probability of correct detection PCD for the two soft constraints (Cluster 5 heights A and D)and for αprime = 0001

The probability of wrong exclusion PWE for the measurements (Cluster 1 to Cluster 4) subject tothe scenarios of two soft constraints (heights A and D) are displayed in Figure 8 Figure 9 gives the theprobability of wrong exclusion PWE for that two constraints (ie heights A and D)

(c) (d)

(a) (b)

Figure 8 Probability of wrong exclusion PWE for the measurements subject to the scenarios of two softconstraints for αprime = 0001 Cluster 1(a) Cluster 2(b) Cluster 3(c) and Cluster 4(d)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

26 of 40

Figure 9 Probability of wrong exclusion PWE for the two soft constraints (Cluster 5 heights A and D)and for αprime = 0001

The over-identification cases (Pover+ and Poverminus) and the statistical overlap (Pol) were practicallynull for that case

The sensitivity indicators (MDB and MIB) for each scenario of two soft constraints are displayedin Table 10

Table 10 MDB and MIB for the case of two soft constraints based on αprime = 0001 and PCD = PCI = 08

σc = 10mm σc = 1mm σc = 01mmCluster MDB (σ) MIB (σ) MDB (σ) MIB (σ) MDB (σ) MIB (σ)

1 75 25 7 71 63 632 68 68 66 66 63 633 64 64 64 64 63 634 63 63 63 63 63 635 68 - 88 - 57 -

As with the case of two soft constraints Tables 1112 and 13 show the correlations (ρwi wj ) for thecase where there were three soft constraints ie for the network configuration in Figure 3(e)

Table 11 Correlation matrix of w-test statistics for the levelling network with three soft constraintswith σc = 10mm in Figure 3(e)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14 wy15

wy1 10 02 01 01 02 10 -02 00 00 02 -04 -01 01 00 00wy2 10 02 02 03 02 05 -05 -02 02 03 -03 00 00 00wy3 10 10 02 01 00 02 -02 00 01 04 00 -01 00wy4 10 02 01 00 02 -02 00 01 04 00 01 00wy5 10 02 -02 02 05 -05 03 -03 00 00 00wy6 10 -02 00 00 02 -04 -01 -01 00 00wy7 10 -03 -03 -04 -04 -01 00 00 00wy8 10 -04 -03 -01 -04 00 00 00wy9 10 -03 01 04 00 00 00wy10 10 04 01 00 00 00wy11 10 -01 00 00 00wy12 10 00 00 00wy13 10 -05 -05wy14 10 -05wy15 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

27 of 40

Table 12 Correlation matrix of w-test statistics for the levelling network with three soft constraintswith σc = 01mm in Figure 3(e)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14 wy15

wy1 10 03 01 -01 -01 01 -04 -01 -01 -01 -03 -01 07 -01 -04wy2 10 03 -01 01 -01 03 -03 -01 01 03 -03 03 -03 00wy3 10 01 -01 -01 01 04 01 01 01 03 01 -07 04wy4 10 03 01 -01 -01 -04 -01 01 03 -01 07 -04wy5 10 03 -01 01 03 -03 03 -03 -03 03 00wy6 10 01 01 01 04 -03 -01 -07 01 04wy7 10 -01 -01 -01 -03 -01 -04 -01 04wy8 10 -01 -01 -01 -03 -01 -04 04wy9 10 -01 01 03 -01 -04 04wy10 10 03 01 -04 -01 04wy11 10 -01 00 00 00wy12 10 00 00 00wy13 10 -02 -06wy14 10 -06wy15 10

Table 13 Correlation matrix of w-test statistics for the levelling network with three soft constraintswith σc = 1mm in Figure 3(e)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14 wy15

wy1 10 03 01 00 01 06 -03 00 00 01 -04 -01 05 -02 -02wy2 10 03 01 03 01 04 -04 -01 01 03 -03 02 -02 00wy3 10 06 01 00 00 03 -01 00 01 04 02 -05 02wy4 10 03 01 00 01 -03 00 01 04 -02 05 -02wy5 10 03 -01 01 04 -04 03 -03 -02 02 00wy6 10 -01 00 00 03 -04 -01 -05 02 02wy7 10 -03 -02 -03 -04 -01 -02 00 02wy8 10 -03 -02 -01 -04 00 -02 02wy9 10 -03 01 04 00 -02 02wy10 10 04 01 -02 00 02wy11 10 -01 00 00 00wy12 10 00 00 00wy13 10 -04 -05wy14 10 -05wy15 10

Table 14 Local redundancy (ri) standard-deviation of the LS-estimated outlier σnablai (mm) and themaximum absolute correlation (maxρwi wj

) for each scenario of three soft constraints

σc = 01mm σc = 1mm σc = 10mmMeasurement ri σnablai maxρwi wj

ri σnablai maxρwi wjri σnablai maxρwi wj

y1 0702 1194 0660 0502 1411 0577 0398 1586 0992y2 0582 1311 0326 0533 1369 0412 0501 1413 0470y3 0702 1194 0660 0502 1411 0577 0398 1586 0992y4 0702 1194 0660 0502 1411 0577 0398 1586 0992y5 0582 1311 0326 0533 1369 0412 0501 1413 0470y6 0702 1194 0660 0502 1411 0577 0398 1586 0992y7 0704 1192 0415 0602 1289 0412 0563 1333 0470y8 0704 1192 0415 0602 1289 0412 0563 1333 0470y9 0704 1192 0415 0602 1289 0412 0563 1333 0470y10 0704 1192 0415 0602 1289 0412 0563 1333 0470y11 0583 1309 0326 0583 1309 0385 0583 1309 0433y12 0583 1309 0326 0583 1309 0385 0583 1309 0433y13 0012 0904 0660 0425 1534 0542 0663 12283 0501y14 0012 0904 0660 0425 1534 0542 0663 12283 0501y15 0019 0718 063 05 1414 0542 0665 12268 0501

The probabilities of correct identification (PCI) and correct detection PCD for the measurements(Cluster 1 to Cluster 4) subject to the scenarios of three soft constraints (heights A D and G) aredisplayed in Figure 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

28 of 40

(c) (d)

(e) (f)

(g) (h)

(a) (b)

Figure 10 Probability of correct identification (PCI) and Probability of correct detection (PCD) for themeasurements subject to the scenarios of three soft constraints for αprime = 0001 Cluster 1(ab) Cluster2(cd) Cluster 3(ef) and Cluster 4(gh)

The probabilities of correct identification (PCI) and correct detection (PCD) in Figure 8 werecomputed for the clusters based on Table 14 as follows

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

29 of 40

bull Cluster 1 y1 y3 y4 and y6bull Cluster 2 y2 and y5bull Cluster 3 y7 y8 y9 and y10bull Cluster 4 y11 and y12bull Cluster 5 y13 and y14bull Cluster 6 y15

Figure 11 shows the probabilities of correct identification PCI and correct detection PCD for thethree soft constraints ie for Cluster 5 (heights A and D) and Cluster 6 (height G) in Figure 3(e)

(c) (d)

(a) (b)

Figure 11 Probability of correct identification (PCI) and Probability of correct detection (PCD) for thethree constraints and for αprime = 0001 Cluster 5(ab) and Cluster 6(cd)

The probability of wrong exclusion PWE for the measurements (Cluster 1 to Cluster 4) subject tothe scenarios of three soft constraints (heights A D and G) are displayed in Figure 12 Figure 13 givesthe the probability of wrong exclusion PWE for that three constraints (ie heights A D and G)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

30 of 40

(c) (d)

(a) (b)

Figure 12 Probability of wrong exclusion PWE for the measurements subject to the scenarios of threesoft constraints and for αprime = 0001 Cluster 1(a) Cluster 2(b) Cluster 3(c) and Cluster 4(d)

(a) (b)

Figure 13 Probability of wrong exclusion (PWE) for the three constraints and for αprime = 0001 Cluster5(a) and Cluster 6(b)

The over-identification cases (Pover+ and Poverminus) and the statistical overlap (Pol) were alsopractically null for that case of three soft constraints

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

31 of 40

The sensitivity indicators (MDB and MIB) for each scenario of three soft constraints are displayedin Table 15

Table 15 MDB and MIB for the case of three soft constraints based on αprime = 0001 and PCD = PCI = 08

σc = 10mm σc = 1mm σc = 01mmCluster MDB (σ) MIB (σ) MDB (σ) MIB (σ) MDB (σ) MIB (σ)

1 75 22 68 69 58 592 68 69 66 67 64 643 64 64 63 63 58 584 63 63 63 63 63 635 59 60 74 75 435 456 59 59 69 69 346 355

5 Discussion

We start from the scenario of one hard constraint in Figure 3(a) Table 4 reveal that the maximumcorrelation between w-test statistics for the measurements constituting Cluster 1 is exactly equal to100 (ie maxρwi wj

= 100) This means that the measurements belonging to Cluster 1 are those thatare connected with unknown heights whose connections are limited to only two The both unknownheights A and D are tied only to two measurements (ie y1 and y6 linked to A and y3 and y4 linked toD) therefore if an outlier occurred in one of these measurements we would only be able to analyse theconsistency between them but we would not be able to distinguish which of them was contaminatedby an outlier This means that we would only be able to detect because the w-test statistics could belarger than a critical value k however in that case the values of w-test statistics would be the same andtherefore we would not have only one unique maximum w-test statistics but actually would have fourmaximum w-test statistics In other words the equation systems associated with the measurements ofCluster 1 are linearly dependent [87] Therefore there is no reliability in terms of outlier identificationfor the Cluster 1 as can be seen in Figure 3(a)

From Figure 3(b) we can note that there is reliability in terms of outlier detection for Cluster 1and it is caused by overlapping w-test statistics The probability of statistics overlap Pol for Cluster 1in this scenario of minimally constrained network is displayed in Figure 14

Figure 14 Probability of correct detection PCD and statistical overlap (Pol) for the Cluster 1 subject toone hard constraint and for αprime = 0001

The problem of not having more connections (ie more measurements) for the unknown heightsA and D in the case of one hard constraint with G fixed is overcome when these heights (A and D) aretaken as hard constraints in 3(b) or when the heights A D and G are hard constraints in figure 3(c)Figure 3(a b) show that the measurements of Cluster 1 are now able to identify an outlier when twohard constraints (A and D fixed) are in play It is also verified to the case of three hard constraints (A Dand G fixes) in Figure 3(e f) ie there is reliability in terms of both outlier detection and identificationfor these measurements in those conditions

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

32 of 40

One can argue that the more constraints the larger the probability of correct identification (PCI)and correct detection (PCD) This claim is not entirely correct This may be true in case where all pointson the network have the same number of connections However in the example given if both heightsB and G were selected as hard constraints the measurements of the Cluster 1 (ie y1 y6 y3 and y4)would still have their least-squares residuals fully correlated (ie ρwy1 wy6

= 100 and ρwy3 wy4= 100)

and therefore there would be no reliability in terms of outlier identification for these measurementsFrom Table 5 we observe different behaviour for the clusters as follows

bull Cluster 1 there was no MIB for the case of having only one single hard constraint whereasMDB = MIB for the other cases However both MDB and MIB decrease significantly with theincrease in the number of hard constraints

bull Cluster 2 MDB slightly smaller than MIB Both MDB and MIB were practically the same for thecase of having two or three hard constraints

bull Cluster 3 MDB = MIB for all cases of hard constraints however both MDB and MIB decreasesignificantly with the increase in the number of hard constraints

bull Cluster 4 MDB e MIB were equal for all cases

In terms of outlier detection and identification therefore Cluster 1 was more sensitive toconstraints Cluster 3 relatively sensitive to constraints whereas the Cluster 4 completely insensitiveto constraints and Cluster 2 relatively insensitive to constraints This is also can be seen in Figure 4The reason for this is that the local redundancy (ri) of the Cluster 1 increased with the increase of thenumber of hard constraints whereas the Cluster 4 have remained the same as can be seen in Table 4

Leaving aside the cases of statistical overlap (Pol) the network presents low least-squares residualscorrelation (ρwi wj lt 05) and high local redundancy (ri gt 05) Because of this the probabilities ofwrong exclusion (PWE) were less than 1 as can be seen in Figure 5 The over-identification cases(Pover+ and Poverminus) were practically null Consequently PCI asymp PCD Due of this fact the family-wiseerror rate (αprime) should be increased in order to have more success rate in the outlier detection andidentification [45]

From Figure 15 we observe that increasing the αprime increases the both probabilities of correctidentification (PCI) and correct detection (PCD) for outlier magnitude from 5σ to 6σ in the case ofthree hard constraints and from 5σ to 68σ in the case of two hard constraints Although the rates ofPover+ and PWE also increase they are not significant when compared to the improvement of correctidentification (PCI) and detection (PCD) This same analysis can be done for the other clusters

In terms of soft constraints for the cases of two constraints in Figure 3(d) we observe from Table 9that the larger the relaxation of the constraint (ie the larger the standard-deviation of the constraintσc) the larger the residuals correlation (ρwi wj ) and the standard-deviation of the outlier σnablai and thesmaller the local redundancy (ri) Consequently the probabilities of correct identification (PCI) anddetection (PCD) get smaller and smaller with the relaxation of the constraints whereas the probabilityof wrong exclusion gets larger (PWE) This can be more clearly verified in Figure 6(a b) and Figure8(a) for Cluster 1 whose measurements are connected with the constraints A and D (ie y13 and y14 inTable 9 respectively)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

33 of 40

(a) (b)

(c) (d)

Figure 15 Probabilities of correct identification PCI (a) correct detection PCD (b) over-identificationpositive Pover+ (c) and wrong exclusion PWE for the Cluster 1 subject to two and three hard constraintsand for αprime = 0001 and αprime = 01

Note from Figure 8(a) that the probability of wrong exclusion (PWE) increases as the magnitude ofthe outlier (nablai) increases However this is true up to a certain limit of outlier magnitude The effect ofresiduals correlation ρwi wj on the rates of wrong exclusion (PWE) and correct identification (PCI) tendsto decrease with the increase in the magnitude of the outlier nablai This effect is more clearly verified forCluster 1 in case where the precision of the constraints are ten times worse than the measurementsσc = 10σ = 10mm

Note from Figure 6 that identifying an outlier in the Cluster 1 (ie y1 y3 y4 and y6) whenσc = 10mm is more difficult than the other clusters This is due to the fact the Cluster 1 has a higherresiduals correlation ρwi wj = 0994 than other clusters In general therefore we observe that the largerthe relaxation of the constraints the larger is the effect of the correlation ρwi wj on the success rate ofoutlier identification (PCI) Consequently the higher the sensitivity indicator for outlier identification(MIB) Table 10 reveals that the ratio between MIB and MDB for the Cluster 1 and for the scenariowhere the standard-deviations of that two soft constraints are σc = 10mm is MIBMDB=2575=33On the other hand the relationship between MIB and MDB is practically one (ie MIBMDB=10) forthe others scenarios

If the FWE rate (αprime) were increased for the case where the two soft constraints of σc = 10mmare in play we would not have great advantages for Cluster 1 due to its high residuals correlation(ρwi wj = 994) From Figure 16 we can observe that the probability of correct identification (PCI)for outlier magnitudes from 5σ to 8σ is effectively larger to a user-defined αprime = 01 than αprime = 0001However the success rate is still less than 80 ie PCI lt 08 Note for example the correct

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

34 of 40

identification rate is PCI = 56 for an outlier magnitude of nablai = 8σ and αprime = 01 For αprime = 01 theMIB = 335σ = 335mm whereas for αprime = 0001 is MIB = 25σ = 25mm Therefore in that case theMIB for PCI = 08(80) and αprime = 01 would be 34 larger than user-defined αprime = 0001

Figure 16 Probabilities of correct identification (PCI) for the Cluster 1 subject to two soft constraints (2sc) A and D for αprime = 0001 and αprime = 01

The soft constraints A and D were grouped in the Cluster 5 (ie A and D were treated aspseudo-observations in the model) There is no reliability in terms of outlier identification for theconstraints because the residual correlation between them is ρwi wj = 100 as can be seen in Table 9for y13 and y14 However these soft constraints are able to detect an outlier In that case the probabilityof correct detection PCD in Figure 7 is mainly caused by the statistical overlap Pol as can be seen in theexample of the case of σc = 10mm in Figure 17 From Table 10 we observe that the larger the relaxationof the constraints the larger the MDB Note that the values of MDB are given in σ and therefore theMDB for σc = 10mm is larger than σc = 1mm and σc = 01mm ie we had the following inequalityMDB = 68σc = 68times 10mm = 68mm gt MDB = 88σc = 88times 1mm = 88mm gt MDB = 57σc =

57times 01mm = 57mm In that case if the FWE (αprime) were increased the rate of outlier detection by theCluster 4 (ie by the soft constraints) would increase

Figure 17 Probabilities of correct detection PCD and statistical overlap Pol for the two soft constraintsA and D (Cluster 5) with σc = 10mm and for αprime = 0001

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

35 of 40

Similar effects of the relaxation of the constraints on the performance of the IDS in case of two softconstraints are verified in case of three soft constraints as can be seen in the Figures 10 11 12 and 13

In case of having three soft constraints in Figure 3(e) there is reliability in terms of outlieridentification for the three pseudo-observations y13 y14 and y15 (ie for AD and G) as can be seen inFigure 10 and Table 15 In that case we also observe that the probabilities of correct detection PCD ofthe soft constraints A and D (ie Cluster 5) were approximately 13 for σc = 10mm 16 for σc = 1mmand 24 for σc = 01mm larger than the scenario of the network subject to two soft constraints Table15 reveals that the advantage of having three soft constraints instead of two constraints is that theconstraints become identifiable in the presence of an outlier The behaviour of the probabilities ofcorrect detection (PCD) correct identification (PCI) and wrong exclusion (PWE) was similar to the caseof the two soft constraints

Furthermore the larger the relaxation of the constraints the smaller the residuals correlationbetween the measurements and the soft constraints and the larger the residuals correlation amongthe measurements This effect can be verified in Tables 6 7 and 8 for two soft constraints as well as inTables 11 12 and 13 for three soft constraints

In general we observe that the case of two soft constraints for σc = 01mm was comparablewith two hard constraints (see eg Table 5 and Table 10) in terms of the probability levels associatedwith IDS for the measurements (ie clusters 1 2 3 and 4) In the same way for the case of twosoft constraints with σc = 1mm or σc = 10mm the probabilities levels were similar to the one hardconstraint for that measurements with the benefit of two soft constraints having reliability in terms ofoutlier identification for the Cluster 1 Finally the three soft constraints with σc = 1mm and σc = 10mmwere comparable to the two soft constraints for that scenario of constraints relaxation wheres thethree soft constraints for σc = 01mm showed similar outcomes with three hard constraints for themeasurements (see eg Table 5 and Table 15) In that case however an advantage of the three softconstraints on the three hard constraints is the possibility of analysing the sensitivity of the constraintsIt is important to emphasised that the stochastic models of the measurements and constraints wereassumed well-known and defined for the analyzes performed here

6 Conclusions

In general the probability of correct identification (PCI) for the case of hard constraints is largerthan soft constraints It can be verified in Figure 4 Figure 6 and Figure 10 Therefore hard constraintsshould be used in the stage of pre-processing data for the purpose of identifying and removing possibleoutlying measurements In that process one should opt to set out the redundant hard constraints atpoints in the network where the smallest connections exist After identifying and removing possibleoutliers the soft constraints should be employed to propagate the uncertainties of the constraints(pseudobservations) to the model parameters during the process of least-squares estimation Thisrecommendation is valid for outlier detection and identification purpose

We highlight the main findings of this research as follows

bull Under a system of a high local redundancy ri gt 05 and low residuals correlation (ρwi wj lt 05) ifone increase the family-wise error rate (FWE) of the test statistic the performance of the procedurewill be improved for both scenarios of hard constraints and soft constraintsbull The larger the relaxation of the constraints the larger is the effect of the residuals correlation

(ρwi wj ) on the success rate of outlier identification (PCI) Consequently the higher the sensitivityindicator for outlier identification (MIB) and therefore more difficult it becomes to identify anoutlier

bull Under a scenario of soft constraints one should set out at least three soft constraints in order toidentify an outlier in the constraints

In other types of analysis for example deformation analysis of geodetic networks one shouldformulated the constraints as pseudo-observations with σc gt 0 in an adjustment model of system

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

36 of 40

equations In other words the relaxation of constraints makes it possible to estimate deformationeffects that are unmodelled in the deterministic model matrix for example deformations which have aspatial and temporal variations In that case the deformation can be modelled by covariance functionsthat determine the cofactors between constraints [28] This should be applied to control points locatedin the structure but not to control points located outside the structure The control points locatedoutside the structure should be hard constraints and have their stability verified by an independentprocess in order to analyse the deformation of the structure So one will be also able to analyse thesensitivity in terms of minimal detectable deformations andor minimal identifiable deformations [23]Future researches will be addressed to sensitivity analysis for deformation monitoring networks

Author Contributions Conceptualization VFR MTM and IK Data curation VFR MTM and IK Formalanalysis VFR and IK Funding acquisition MTM MRV and LGdSJ Investigation VFR MethodologyVFR and MTM Project administration MTM IK MRV and LGdSJ Software VFR and LGdSJSupervision MTM IK and MRV Validation VFR and MRV Writing ndash original draft VFR Writing ndashreview amp editing VFR MTM IK and MRV All authors have read and agreed to the published version of themanuscript

Funding This research was funded by the CNPqmdashConselho Nacional de Desenvolvimento Cientiacutefico eTecnoloacutegicomdashBrasil (proc no 1035872019-5) This research and the APC were also funded by PETROBRAS(Grant Number 201800545-0)

Acknowledgments In this section you can acknowledge any support given which is not covered by the authorcontribution or funding sections This may include administrative and technical support or donations in kind(eg materials used for experiments)

Conflicts of Interest The authors declare no conflict of interest

References

1 Lehmann R Neitzel F Testing the compatibility of constraints for parameters of a geodetic adjustmentmodel Journal of Geodesy 2013 87 555ndash566 doi101007s00190-013-0627-2

2 Fang X Weighted total least-squares with constraints a universal formula for geodetic symmetricaltransformations Journal of Geodesy 2015 89 459ndash469 doi101007s00190-015-0790-8

3 Amiri-Simkooei AR Weighted Total Least Squares with Singular Covariance Matrices Subject to Weightedand Hard Constraints J Surv Eng 2017 143 04017018

4 Sakic P Ballu V Royer JY A Multi-Observation Least-Squares Inversion for GNSS-Acoustic SeafloorPositioning Remote Sensing 2020 12 doi103390rs12030448

5 Odijk D Stochastic modelling of the ionosphere for fast GPS ambiguity resolution Geodesy Beyond 2000Schwarz KP Ed Springer Berlin Heidelberg Berlin Heidelberg 2000 pp 387ndash392

6 de Oliveira PS Morel L Fund F Legros R Monico JFG Durand S Durand F Modelingtropospheric wet delays with dense and sparse network configurations for PPP-RTK GPS Solutions 201721 237ndash250 doi101007s10291-016-0518-0

7 Chen B Wu L Dai W Luo X Xu Y A new parameterized approach for ionospheric tomography GPSSolutions 2019 23 96 doi101007s10291-019-0893-4

8 Ahmed E Ahmed S Mohamed D Mostafa R jag 2019 Vol 13 chapter Vertical ionospheric delayestimation for single-receiver operation p 81 2 doi101515jag-2018-0041

9 Wang A Chen J Zhang Y Meng L Wang J Performance of Selected Ionospheric Models inMulti-Global Navigation Satellite System Single-Frequency Positioning over China Remote Sensing 201911 doi103390rs11172070

10 Deo M El-Mowafy A Comparison of advanced troposphere models for aiding reductionof PPP convergence time in Australia Journal of Spatial Science 2019 64 381ndash403doi1010801449859620181472046

11 Dai X Lou Y Dai Z Hu C Peng Y Qiao J Shi C Precise Orbit Determination for GNSS ManeuveringSatellite with the Constraint of a Predicted Clock Remote Sensing 2019 11 doi103390rs11161949

12 Courant R Hilbert D Methods of mathematical physics Vol 2 Vol 2 Wiley [New York] 196213 Grafarend EW Optimization of Geodetic Networks The Canadian Surveyor 1974 28 716ndash723

doi101139tcs-1974-0120

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

37 of 40

14 Teunissen P Zero Order Design Generalized Inverses Adjustment the Datum Problem andS-Transformations Optimization and Design of Geodetic Networks Grafarend EW Sansograve F EdsSpringer Berlin Heidelberg Berlin Heidelberg 1985 pp 11ndash55

15 Dermanis A Free network solutions with the DLT method ISPRS Journal of Photogrammetry and RemoteSensing 1994 49 2 ndash 12 doihttpsdoiorg1010160924-2716(94)90061-2

16 Kotsakis C Reference frame stability and nonlinear distortion in minimum-constrained networkadjustment Journal of Geodesy 2012 86 755ndash774 doi101007s00190-012-0555-6

17 Kotsakis C Datum Definition and Minimal Constraints In Encyclopedia of Geodesy Grafarend E EdSpringer International Publishing Cham 2018 pp 1ndash6 doi101007978-3-319-02370-0_157-1

18 Matsuoka MT Rofatto VF Klein I Roberto Veronez M da Silveira LG Neto JBS Alves ACRControl Points Selection Based on Maximum External Reliability for Designing Geodetic Networks AppliedSciences 2020 10 doi103390app10020687

19 Baarda W S-transformations and criterion matrices Publ on geodesy New Series 1973 520 Schaffrin B Aspects of Network Design Optimization and Design of Geodetic Networks Grafarend

EW Sansograve F Eds Springer Berlin Heidelberg Berlin Heidelberg 1985 pp 548ndash59721 Xu PB A general solution in geodetic nonlinear rank-defect models 199722 Altamimi Z Dermanis A The Choice of Reference System in ITRF Formulation VII Hotine-Marussi

Symposium on Mathematical Geodesy Sneeuw N Novaacutek P Crespi M Sansograve F Eds Springer BerlinHeidelberg Berlin Heidelberg 2012 pp 329ndash334

23 Velsink H On the deformation analysis of point fields J Geod 2015 89 1071ndash1087doi101007s00190-015-0835-z

24 Velsink H Extendable linearised adjustment model for deformation analysis Survey Review 201547 397ndash410 doi1011791752270614Y0000000140

25 Hiddo V jag 2016 Vol 10 chapter Time Series Analysis of 3D Coordinates Using NonstochasticObservations p 5 1 doi101515jag-2015-0027

26 Rao CR Markoffrsquos Theorem with Linear Restrictions on Parameters SankhyAuml The Indian Journal ofStatistics (1933-1960) 1945 7 16ndash19

27 Velsink H Testing Methods for Adjustment Models with Constraints Journal of Surveying Engineering2018 144 04018009 doi101061(ASCE)SU1943-54280000260

28 Hiddo V Testing deformation hypotheses by constraints on a time series of geodetic observations J ApplGeod 2017 12 77ndash93 doi101515jag-2017-0028

29 Baarda W A testing procedure for use in geodetic networks Publ on geodesy New Series 1968 230 Teunissen P First and second moments of non-linear least-squares estimators Bull Geodesique (Journal of

Geodesy) 1989 63 253ndash262 doihttpsdoiorg101007BF0252047531 Teunissen P Testing Theory an introduction 2 ed Delft University Press 200632 Amiri-Simkooei A Least-squares variance component estimation theory and GPS applications Phd

thesis Delft University of Technology 200733 Koch KR Parameter estimation and hypothesis testing in linear models 2 ed Springer 199934 Tao H Feng H Xu L Miao M Long H Yue J Li Z Yang G Yang X Fan L Estimation

of Crop Growth Parameters Using UAV-Based Hyperspectral Remote Sensing Data Sensors 2020 20doi103390s20051296

35 Han M Wang Q Wen Y He M He X The Application of Robust Least Squares Method inFrequency Lock Loop Fusion for Global Navigation Satellite System Receivers Sensors 2020 20doi103390s20041224

36 Wei C Estimation for the Discretely Observed CoxacircldquoIngersollacircldquoRoss Model Driven by Small SymmetricalStable Noises Symmetry 2020 12 doi103390sym12030327

37 Sakic P Ballu V Royer JY A Multi-Observation Least-Squares Inversion for GNSS-Acoustic SeafloorPositioning Remote Sensing 2020 12 doi103390rs12030448

38 Livadiotis G General Fitting Methods Based on Lq Norms and their Optimization Stats 2020 3 16ndash31doi103390stats3010002

39 Farooq SZ Yang D Ada ENJ A Cycle Slip Detection Framework for Reliable Single Frequency RTKPositioning Sensors 2020 20 doi103390s20010304

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

38 of 40

40 Araveeporn A Comparing Parameter Estimation of Random Coefficient Autoregressive Model byFrequentist Method Mathematics 2020 8 doi103390math8010062

41 Zhang C Peng T Zhou J Ji J Wang X An Improved Autoencoder and Partial Least SquaresRegression-Based Extreme Learning Machine Model for Pump Turbine Characteristics Applied Sciences2019 9 doi103390app9193987

42 Ji J Yang M Jiang L He J Teng Z Liu Y Song H Output-Only Parameters Identification ofEarthquake-Excited Building Structures with Least Squares and Input Modification Process AppliedSciences 2019 9 doi103390app9040696

43 Kargoll B On the theory and application of model misspecification tests in geodesy Doctoral thesisUniversity of Bonn Landwirtschaftliche Fakultaumlt German Bonn 2007

44 Lehmann R On the formulation of the alternative hypothesis for geodetic outlier detection J Geod 201387 373ndash386 doi101007s00190-012-0607-y

45 Rofatto VF Matsuoka MT Klein I Roberto Veronez M da Silveira LG A Monte Carlo-Based OutlierDiagnosis Method for Sensitivity Analysis Remote Sensing 2020 12 doi103390rs12050860

46 Lehmann R Improved critical values for extreme normalized and studentized residuals in GaussndashMarkovmodels J Geod 2012 86 1137ndash1146 doi101007s00190-012-0569-0

47 Rofatto VF Matsuoka MT Klein I Veronez MR Bonimani ML Lehmann R A half-century ofBaardarsquos concept of reliability a review new perspectives and applications Surv Rev 2018 0 1ndash17[httpsdoiorg1010800039626520181548118] doi1010800039626520181548118

48 Zaminpardaz S Teunissen P DIA-datasnooping and identifiability J Geod 2019 93 85ndash101doi101007s00190-018-1141-3

49 Kok JJ States U On data snooping and multiple outlier testing [microform] Johan J Kok US Dept ofCommerce National Oceanic and Atmospheric Administration National Ocean Service Charting andGeodetic Services For sale by the National Geodetic Information Center NOAA Rockville Md 1984 p61 p

50 Knight NL Wang J Rizos C Generalised measures of reliability for multiple outliers Journal of Geodesy2010 84 625ndash635 doi101007s00190-010-0392-4

51 Gui Q Li X Gong Y Li B Li G A Bayesian unmasking method for locating multiplegross errors based on posterior probabilities of classification variables J Geod 2011 85 191ndash203doi101007s00190-010-0429-8

52 Klein I Matsuoka MT Guzatto MP Nievinski FG An approach to identify multipleoutliers based on sequential likelihood ratio tests Surv Rev 2017 49 449ndash457[httpsdoiorg1010800039626520161212970] doi1010800039626520161212970

53 Hawkins DM Identification of Outliers 1 ed Springer Netherlands 1980 httpsdoiorg101007978-94-015-3994-4

54 Baarda W Statistical concepts in geodesy Publ on geodesy New Series 1967 255 Imparato D Teunissen P Tiberius C Minimal Detectable and Identifiable Biases for quality

control Surv Rev 2019 51 289ndash299 [httpsdoiorg1010800039626520181437947]doi1010800039626520181437947

56 Teunissen PJG Distributional theory for the DIA method Journal of Geodesy 2018 92 59ndash80doi101007s00190-017-1045-7

57 Foumlrstner W Reliability and discernability of extended Gauss-Markov models Seminar on mathematicalmodels of GeodeticPhotogrammetric Point Determination with Regard to Outliers and Systematic Errors 1983 Vol Series A pp 79ndash103

58 Wang J Knight NL New Outlier Separability Test and Its Application in GNSS Positioning 201259 Lehmann R Loumlsler M Multiple Outlier Detection Hypothesis Tests versus Model Selection by

Information Criteria J Surv Eng 2016 142 0401601760 Teunissen PJG Imparato D Tiberius CCJM Does RAIM with Correct Exclusion Produce Unbiased

Positions Sensors 2017 17 doi103390s1707150861 Lehmann R Loumlsler M Congruence analysis of geodetic networks ndash hypothesis tests versus model

selection by information criteria J Appl Geod 2017 11 271ndash283 doi101515jag-2016-0049

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

39 of 40

62 Rofatto V Matsuoka M Klein I DESIGN OF GEODETIC NETWORKS BASED ON OUTLIERIDENTIFICATION CRITERIA AN EXAMPLE APPLIED TO THE LEVELING NETWORK Bull Geod Sci2018 24 152ndash170 httpdxdoiorg101590s1982-21702018000200011

63 Nguyen VK Renault Milocco R Environment Monitoring for Anomaly Detection System UsingSmartphones Sensors 2019 19 doi103390s19183834

64 Nie Y Yang L Shen Y Specific Direction-Based Outlier Detection Approach for GNSS Vector NetworksSensors 2019 19 doi103390s19081836

65 Klein I Matsuoka MT Guzatto MP Nievinski FG Veronez MR Rofatto VF A newrelationship between the quality criteria for geodetic networks Journal of Geodesy 2019 93 529ndash544doi101007s00190-018-1181-8

66 Mierlo JV Statistical Analysis of Geodetic Measurements for the Investigation of Crustal Movements InRecent Crustal Movements 1977 Whitten C Green R Meade B Eds Elsevier 1979 Vol 13 Developmentsin Geotectonics pp 457 ndash 467 doihttpsdoiorg101016B978-0-444-41783-150072-6

67 van der Marel H Roumlsters AJM Statistical Testing and Quality Analysis in 3-D Networks (part II)Application to GPS Global Positioning System An Overview Bock Y Leppard N Eds Springer NewYork New York NY 1990 pp 290ndash297

68 Yang L Wang J Knight NL Shen Y Outlier separability analysis with a multiple alternative hypothesestest J Geod 2013 87 591ndash604 doi101007s00190-013-0629-0

69 Proacuteszynski W Revisiting Baardarsquos concept of minimal detectable bias with regard to outlier identifiabilityJ Geod 2015 89 993ndash1003 doi101007s00190-015-0828-y

70 Romano JP Wolf M Multiple Testing of One-Sided Hypotheses Combining Bonferroni and the BootstrapPredictive Econometrics and Big Data Kreinovich V Sriboonchitta S Chakpitak N Eds SpringerInternational Publishing Cham 2018 pp 78ndash94

71 Teunissen P An Integrity and Quality Control Procedure for use in Multi Sensor Integration 199072 Aydin C Demirel H Computation of Baardarsquos lower bound of the non-centrality parameter J Geod

2004 78 437ndash441 doi101007s00190-004-0406-173 Aydin C Power of Global Test in Deformation Analysis J Surv Eng 2012 138 51ndash56 https

doiorg101061(ASCE)SU1943-5428000006474 Bonferroni C Teoria statistica delle classi e calcolo delle probabilitagrave Pubblicazioni del R Istituto superiore di

scienze economiche e commerciali di Firenze Libreria internazionale Seeber 193675 Box GEP Muller ME A Note on the Generation of Random Normal Deviates The Annals of Mathematical

Statistics 1958 29 610ndash611 httpsdoi101214aoms117770664576 Lemeshko BY Lemeshko SB Extending the Application of Grubbs-Type Tests in Rejecting Anomalous

Measurements Measurement Techniques 2005 48 536ndash547 doi101007s11018-005-0179-977 Lehmann R Scheffler T Monte Carlo based data snooping with application to a geodetic network J

Appl Geod 2011 5 123ndash134 httpsdoiorg101515JAG201101478 Matsumoto M Nishimura T Mersenne twister A 623-dimensionally equidistributed uniform

pseudo-random number generator ACM Transactions on Modeling and Computer Simulation 1998 8 3ndash30httpsdlacmorgcitationcfmid=272991

79 Ihde J Saacutenchez L Barzaghi R Drewes H Foerste C Gruber T Liebsch G Marti U Pail R SiderisM Definition and Proposed Realization of the International Height Reference System (IHRS) Surveys inGeophysics 2017 38 549ndash570 doi101007s10712-017-9409-3

80 Lehmann R Observation error model selection by information criteria vs normality testing StudiaGeophysica et Geodaetica 2015 59 489ndash504 doi101007s11200-015-0725-0

81 Algarni DA Ali AE Heighting and Distance Accuracy with Electronic Digital Levels Journal of King SaudUniversity - Engineering Sciences 1998 10 229 ndash 239 doihttpsdoiorg101016S1018-3639(18)30698-6

82 Gemin ARS Matos AS Faggion PLAs APPLICATION OF CALIBRATION CERTIFICATE OFDIGITAL LEVELING SYSTEMS IN THE MONITORING OF STRUCTURES A CASE STUDY AT THEGOVERNADOR JOSAtilde RICHA HYDROELECTRIC POWER PLANT - PR Boletim de CiAtildeGeodAtilde 201824 235 ndash 249

83 Takalo M Rouhiainen P Development of a System Calibration Comparator for Digital Levels in FinlandNordic Journal of Surveying and Real Estate Research 1 1

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

40 of 40

84 Wiedemann W Wagner A Wunderlich T Using IATS to Read and Analyze Digital Levelling Staffs SIG2016 Paar R Marendic A Zrinjski M Eds Croatian Geodetic Society 2016 pp 515ndash526

85 PJG T The Geometry of Geodetic Inverse Linear Mapping and Non-linear Adjustment Publications onGeodesy New Series 1985 8

86 Rao CR Mitra SK Generalized inverse of a matrix and its applications 197287 Hekimoglu S Erenoglu RC Sanli DU Erdogan B Detecting Configuration Weaknesses in Geodetic

Networks Survey Review 2011 43 713ndash730 doi101179003962611X13117748892632

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

Page 3: Effects of Hard and Soft Equality Constraints on

3 of 40

for the case of multiples (simultaneous) outliers [49] For more details about multiples (simultaneous)outlier refers to [50ndash52]

There are chances of correct and false decisions of IDS because these procedure is based onstatistical hypothesis testing Recently Rofatto et al [45] have provided an algorithm based onMonte Carlo to determine the probability levels associated with IDS In that case they described sixclasses of decisions for IDS namely probability of correct identification (PCI) probability of misseddetection (PMD) probability of wrong exclusion (PWE) probability of over-identification positive(Pover+) probability of over-identification negative (Poverminus) and probability of statistical overlap (Pol)as follows

bull PCI Probability of identifying and removing correctly an outlying measurementbull PMD Probability of not detecting the outlier (ie Type II decision error for IDS)bull PWE Probability of identifying and removing a non-outlying measurement while the lsquotruersquo outlier

remains in the dataset (ie Type III decision error [53] for IDS)bull Pover+ Probability of identifying and removing correctly the outlying measurement and othersbull Poverminus Probability of identifying and removing more than one non-outlying measurement

whereas the lsquotrue outlierrsquo remains in the datasetbull Pol occurs in cases where one alternative hypothesis has the same distribution as the another one

These hypotheses cannot be distinguished because their test statistics are numerically the sameviolating the IDS rule of one outlier at a time In that case they are nonseparable and an outliercannot be identified In other words it corresponds to the probability of flagging simultaneouslytwo (or more) measurements as outliers

Based on the probability of correct detection (PCD) and probability of correct identification(PCI) the minimal biases MDB (Minimal Detectable Bias) and MIB (Minimal Identifiable Bias) canbe computed as sensitivity indicators for outlier detection and identification respectively Outlierdetection only informs whether or not there might have been at least one outlier However thedetection does not tell us which measurement is an outlier The localization of the outlier is a problemof outlier identification ie outlier identification implies the execution of a search among themeasurements for the most likely outlier [45] Therefore the smallest value of an outlier that can bedetected given a certain PCD defines the MDB On the other hand the smallest value of an outlier thatcan be identified given a certain PCI defines the MIB

In this paper we investigate the effects of models subject to constraints (minimum redundanthard and soft) on the probability levels associated with IDS It is important to emphasised that if astandard-deviation of a constraint (or a set of a constraint) is changed from zero to a nonzero value itis called ldquorelaxationrdquo of the constraint [28] Here we also evaluate the effect of relaxing constraintson the MIB and MDB This kind of assessment is a kind of sensitivity analysis We also highlight thatthe task of clustering a set of geodetic measurements is applied for the first time in this study Weshow that the clusters can be defined according to two deterministic parameters local redundancyand correlation between the outlier test statistics Moreover critical values optimized by Monte Carlomethod were used here [4546] in order to compute the decision classes associated with IDS ie PCI PMD PWE Pover+ Poverminus and Pol

The rest of the paper is organised as follows the elements involved with IDS are provided inSection 2 Next the method used to compute the probabilities is given in Section 3 The results for acase study are showed in Section 4 and their discussion are emphasised in Section 5 Finally the mainhighlights of this research are provided in Section 6

2 Theoretical Framework

The IDS is an important case of multiple hypothesis testing In this section therefore we brieflypresent the principal elements involved for the case of multiple testing

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

4 of 40

First the null hypothesis denoted byH0 is formulated under the condition that random errorsare normally distributed with expectation zero ie in the absence of outliers Thus the null hypothesisH0 of the standard GaussndashMarkov model in the linear or linearised form is given by [33]

H0 Ey = Ax +Ee = Ax Dy = Qe (1)

where E is the expectation operator D is the dispersion operator y isin Rntimes1 is the vector ofmeasurements A isin Rntimesu is the coefficient matrix x isin Rutimes1 is the unknown parameter vectore isin Rntimes1 is the unknown vector of measurement errors and Qe isin Rntimesn is the positive-definitecovariance matrix of the measurements y

Under normal working conditions (ieH0) the measurement error model is then given by

e sim N(0 Qe) (2)

First we assume that the coefficient matrix A suffers from a rank deficiency ie uminus rank(A) gt 0In that case minimal constraints are added to solve the problem of the rank-deficient system Anexample of how to handle the problem of the rank-deficient will be given in the section 41 where aminimal constraint in Equation 51 has been added to the matrix A in Equation 50 This means that thecolumns of the design matrix A in Equation 1 become linearly independent ie the matrix A becomefull rank such that uminus rank(A) = 0

The best linear unbiased estimator (BLUE) of e underH0 is the well-known estimated least-squaresresidual vector e isin Rntimes1 which is given by

e = yminus Ax

= yminus A(ATW A)minus1(ATWy)

= Ax + eminus A(ATW A)minus1(ATW(Ax + e))

= eminus A(ATW A)minus1(ATWe)

= (Iminus A(ATW A)minus1 ATW)e

= Re

(3)

with x isin Rutimes1 being the BLUE of x under H0 W isin Rntimesn is the known matrix of weights taken asW = σ0

2Qminus1e where σ2

0 is the variance factor I isin Rntimesn is the identity matrix and R isin Rntimesn is knownas the redundancy matrix The R matrix is an orthogonal projector that projects onto the orthogonalcomplement of the range space of A The main diagonal elements of matrix R are known as localredundancy numbers (denoted by ri of the the system The larger the number of local redundancy fora given measurement the larger the degree of importance of that measurement for the model ie thelarger the absorption of a possible error of that measurement into their corresponding least-squaresresidual

The degrees of freedom r (ie the overall redundancy) of the model underH0 (Equation (1)) is

r = rank(Qe) = nminus rank(A) = nminus u where (4)

Qe = Qe minus σ02 A(ATW A)minus1 AT (5)

On the other hand an alternative model is proposed when there are doubts about the reliabilitylevel of the model underH0 Here we assume that the validity of the null hypothesisH0 in Equation (1)can be violated if the dataset is contaminated by outliers The model in an alternative hypothesis

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

5 of 40

denoted by HA is to oppose Equation (1) by an extended model that includes the unknown vectornabla isin Rqtimes1 of deterministic bias parameters as follows ([2931])

HA y = Ax + Cnabla+ e =(

A C)( x

nabla

)+ e (6)

where C isin Rntimesq is the matrix that relates bias parameters ie the values of the outliers to observationsWe restrict ourselves to the matrix (A C) having full column rank such that

r = rank(

A C)= u + q le n (7)

One of the most used procedures based on hypothesis testing for outliers in linear (or linearised)models is the well-known data snooping method [2954] This procedure consists of screening eachindividual measurement for the presence of an outlier [49] In that case data snooping is based on alocal model test such that q = 1 and therefore the n alternative hypothesis is expressed as

H(i)A y = Ax + cinablai + e =

(A ci

)( xnablai

)+ e foralli = 1 middot middot middot n (8)

Now matrix C in Equation (6) is reduced to a canonical unit vector ci which consists exclusivelyof elements with values of 0 and 1 where 1 means that the ith bias parameter of magnitude nablai affectsthe ith measurement and 0 means otherwise In that case the rank of (A ci) isin Rntimes(u+1) and the vector

nabla in Equation (6) reduces to a scalar nablai in Equation (8) ie ci=(

0 0 0 middot middot middot 1ith 0 middot middot middot 0)T

When q = nminus u an overall model test is performed For more details about the overall model test seefor example [5556]

Note that the alternative hypothesisH(i)A in Equation (8) is formulated under the condition that

the outlier acts as a systematic effect by shifting the random error distribution underH0 by its ownvalue [44] In other words the presence of an outlier in a dataset can cause a shift of the expectationunderH0 to a nonzero value Therefore hypothesis testing is often employed to check whether thepossible shifting of the random error distribution under H0 by an outlier is in fact a systematiceffect (bias) or merely a random effect This hypothesis test-based approach is called the mean-shiftmodel (see eg [1823294647525457ndash65]

In the context of the mean-shift model the test statistic involved in data snooping is given by thenormalised least-squares residual denoted by wi This test statistic also known as Baardarsquos w-testis given as follows

wi =ci

TQminus1e eradic

ciTQminus1e QeQminus1

e ci

foralli = 1 middot middot middot n (9)

The alternative hypothesis in Equation (8) is formulated in the sense that There is at least oneoutlier in the vector of measurements yirdquo [46] In that case we are interested in knowing which of thealternative hypotheses may lead to the rejection of the null hypothesis with a certain probability Thismeans testingH0 againstH(1)

A H(2)A H(3)

A H(n)A This is known as multiple hypothesis testing (see

eg [47485355575866ndash70]) In that case the test statistic coming into effect is the maximum absoluteBaardarsquos w-test value (denoted by max-w) which is computed as [47]

max-w = maxiisin1middotmiddotmiddot n

|wi| (10)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

6 of 40

The decision rule for this case is given by

Accept H0 i f max-w le k

Otherwise

Accept H(i)A i f max-w gt k

(11)

The decision rule in Equation 11 says that if none of the n w-tests get rejected then we accept thenull hypothesisH0 If the null hypothesisH0 is rejected in any of the n tests then one can only assumethat detection occurred In other words if the max-w is larger than some percentile of its probabilitydistribution (ie some critical value k) then there is evidence that there is an outlier in the datasetTherefore outlier detection only informs us whether the null hypothesisH0 is accepted or not [45]

However the detection does not tell us which alternative hypothesisH(i)A would have led to the

rejection of the null hypothesisH0 The localisation of the alternative hypothesis which would haverejected the null hypothesis is a problem of outlier identification Outlier identification implies theexecution of a search among the measurements for the most likely outlier In other words one seeks tofind which of Baardarsquos w-test is the maximum absolute value max-w and if that max-w is greater thansome critical value k

Therefore the data snooping procedure of screening measurements for possible outliers is actuallyan important case of multiple hypothesis testing and not single hypothesis testing Moreover note thatoutlier identification only happens when outlier detection necessarily exists ie ldquooutlier identificationrdquoonly occurs when the null hypothesisH0 is rejected However correct detection does not necessarilyimply correct identification [475568]

In a special case of having only one single alternative hypothesis one should decide betweenthe null hypothesis H0 and only one single alternative hypothesis H(i)

A of Equation (8) In that casethe false decisions are restricted to Type I error and Type II error The probability of a Type I Errorα0 is the probability of rejecting the null hypothesis H0 when it is true whereas the probability of aType II error β0 is the probability of failing to reject the null hypothesisH0 when it is false (note theindex lsquo0rsquo represents the case in which a single hypothesis is tested) Instead of α0 and β0 there is theconfidence level CL = 1minus α0 and the power of the test γ0 = 1minus β0 respectively The first deals withthe probability of accepting a true null hypothesisH0 the second addresses the probability of correctlyaccepting the alternative hypothesisH(i)

A In that case given a probability of a Type I decision error α0we find the critical value k0 as follows

k0 = Φminus1(

1minus α0

2

)(12)

where Φminus1 denotes the inverse of the cumulative distribution function (cdf) of the two-tailed standardnormal distribution N(0 1)

The normalised least-squares residual wi follows a standard normal distribution with theexpectation that Ewi = 0 if H0 holds true (there is no outlier) On the other hand if the systemis contaminated with a single outlier at the ith location of the dataset (ie under H(i)

A ) then theexpectation of wi is

Ewi =radic

λ0 =radic

ciTQminus1e QeQminus1

e cinabla2i (13)

where λ0 is the non-centrality parameter for q = 1 Note therefore that there is an outlier thatcauses the expectation of wi to become

radicλ0 The square-root of the non-centrality parameter

radicλ0

in Equation (13) represents the expected mean shift of a specific w-test In such a case the termci

TQminus1e QeQminus1

e ci in Equation (13) is a scalar and therefore it can be rewritten as follows [71]

|nablai| = MDB0(i) =

radicλ0

ciTQminus1e QeQminus1

e ci foralli = 1 middot middot middot n (14)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

7 of 40

where |nablai| is the Minimal Detectable Bias (MDB0(i) ) for the case in which there is only one singlealternative hypothesis which can be computed for each individual alternative hypothesis according toEquation (8)

For a single outlier the variance of an estimated outlier denoted by σ2nablai

is

σ2nablai

=(

ciTQminus1

e QeQminus1e ci

)minus1 foralli = 1 middot middot middot n (15)

Thus the MDB can also be written as

MDB0(i) = σnablai

radicλ0 foralli = 1 middot middot middot n (16)

where σnablai =radic

σ2nablai

is the standard deviation of estimated outlier nablaiThe MDB in Equations (14) or (16) of an alternative hypothesis is the smallest-magnitude outlier

that can lead to the rejection of the null hypothesisH0 for a given α0 and β0 Thus for each model ofthe alternative hypothesisH(i)

A the corresponding MDB can be computed [477273] The limitation ofthis MDB is that it was initially developed for the binary hypothesis testing case In that case the MDBis a sensitivity indicator of Baardarsquos w-test when only one single alternative hypothesis is taken intoaccount In this article we are confined to multiple alternative hypotheses Therefore both the MDBand MIB are computed by considering the case of multiple hypothesis testing

For a scenario coinciding with the null hypothesisH0 under multiple testing hypothesis thereis the probability of incorrectly identifying at least one alternative hypothesis This type of wrongdecision is known as the family-wise error rate (FWE) The FWE is defined as

FWE = αprime =le 1minus (1minus α0)n (17)

which is approximatelyFWE = αprime le ntimes α0 (18)

where α0 is the significance level for an individual test The quantity in Equation (18) is just equal tothe upper bound of the Bonferroni inequality ie αprime le nα [74] For example if the FWE level (αprime) is005 and one is running 5 tests then each test will have an α0 of 0055 = 001 In other words one usesa global Type I Error rate αprime that combines all tests under consideration instead of an individual errorrate α0 that only considers one test at a time time [70] In that case the critical value kbon f is computedas

kbon f = Φminus1(

1minus αprime

2n

)(19)

The Bonferroni in Equation (18) is a good approximation for the case in which alternativehypotheses are independent In practice however the test results always depend on each otherto some degree because we always have a correlation between w-tests The correlation coefficientbetween any Baardarsquos w-test statistic (denoted by ρwi wj ) such as wi and wj is given by [57]

ρwi wj =ci

TQminus1e QeQminus1

e cjradicciTQminus1

e QeQminus1e ci

radiccjTQminus1

e QeQminus1e cj

forall(i 6= j) (20)

The correlation coefficient ρwi wj can assume values within the range [minus1 1]Here the extreme normalised residuals max-w (ie maximum absolute) in Equation (10) are

treated directly as a test statistic Note that when using Equation (10) as a test statistic the decisionrule is based on a one-sided test of the form max-w le k However the distributions of max-w cannotbe derived from well-known test distributions (eg normal distribution) The procedure to computethe critical value of max-w is given step-by-step by Rofatto et al [45]

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

8 of 40

The other side of the multiple testing problem is the situation in which there is an outlier in thedataset In that case apart from Type I and Type II errors there is a third type of wrong decisionassociated with Baardarsquos w-test Baardarsquos w-test can also flag a non-outlying observation whilethe lsquotruersquo outlier remains in the dataset We are referring to the Type III error [53] also referred toas the probability of wrong identification (PWI) The description of the Type III error involves aseparability analysis between alternative hypotheses [57666869] Therefore we are now interested inthe identification of the correct alternative hypothesis In that case the non-centrality parameter inEquation (13) is not only related to the sizes of Type I and Type II decision errors but also dependenton the correlation coefficient ρwi wj given by Equation (20)

On the basis of the assumption that one outlier is in the ith position of the dataset (ie H(i)A is

rsquotruersquo) the probability of a Type II error (also referenced as the probability of ldquomissed detectionrdquo denotedby PMD) for multiple testing is

PMD = P(⋂n

i=1|wi| le k

∣∣∣ H(i)A true

) (21)

and the size of a Type III wrong decision (also called ldquomisidentificationrdquo denoted by PWI) is given by

PWI =n

sumi=1P(|wj| gt |wi| foralli |wj| gt k(i 6= j)

∣∣∣ H(i)A true

)(22)

On the other hand the probability of correct identification (denoted by PCI) is

PCI = P(|wi| gt |wj| forallj |wi| gt k(i 6= j)

∣∣∣ H(i)A true

)(23)

with1minusPCI = 1minusPCD + PWI = PMD + PWI (24)

Note that the three probabilities of missed detection PMD wrong identification PWI and correctidentification PCI sum up to unity ie PMD + PWI + PCI = 1

The probability of correct detection PCD is the sum of the probability of correct identification PCI(selecting a correct alternative hypothesis) and the probability of misidentification PWI (selecting oneof the n-1 other hypotheses) ie

PCD = PCI + PWI (25)

The probability of wrong identification PWI is identically zero PWI = 0 when the correlationcoefficient is exactly zero ρwi wj = 0 In that case we have

PCD = PCI = 1minusPMD (26)

The relationship given in Equation (26) would only happen if one neglected the nature of thedependence between alternative hypotheses In other words this relationship is valid for the specialcase of testing the null hypothesisH0 against only one single alternative hypothesisH(i)

A Since the critical region in multiple hypothesis testing is larger than that in single hypothesis

testing the Type II decision error (ie PMD) for the multiple test becomes smaller [47] This meansthat the correct detection in binary hypothesis testing (γ0) is smaller than the correct detection PCDunder multiple hypothesis testing ie

PCD gt γ0 (27)

Detection is easier in the case of multiple hypothesis testing than single hypothesis testingHowever the probability of correct detection PCD under multiple testing is spread out over allalternative hypotheses and therefore identifying is harder than detecting From Equation (25) it is

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

9 of 40

also noted that detection does not depend on identification However outlier identification dependson correct outlier detection Therefore we have the following inequality

PCI le PCD (28)

Note that the probability of correct identification PCI depends on the probability of misseddetection PMD and wrong identification PWI for the case in which data snooping is run only once iea single round of estimation and testing However in this paper we deal with data snooping in itsiterative form (ie IDS) and therefore the probability of correct identification PCI depends on otherdecision rules

In contrast to the data snooping single run the success rate of correct detection PCD forIDS depends on the sum of the probabilities of correct identification PCI wrong exclusion (PWE)over-identification cases (Pover+ and Poverminus) and statistical overlap (Pol) ie

PCD = 1minusPMD = PCI + PWE + Pover+ + Poverminus + Pol (29)

It is important to mention that the probability of correct detection is the complement of theprobability of missed detection Note from Equation (29) that the probability of correct detection PCDis available even for cases in which the identification rate is null PCI = 0 However the probabilityof correct identification (PCI) necessarily requires that the probability of correct detection PCD begreater than zero For the same reasons given for the data snooping single run in the previous sectiondetecting is easier than identifying In that case we have the following relationship for the success rateof correct outlier identification PCI

PCI = PCD minus (PWE + Pover+ + Poverminus + Pol) (30)

such asexist(PCI) isin [0 1] lArrrArr (PCD) gt 0 (31)

It is important to mention that the wrong exclusion PWE describes the probability of identifyingand removing a non-outlying measurement while the lsquotruersquo outlier remains in the dataset In otherwords PWE is the Type III decision error for IDS) The overall wrong exclusion PWE is the result of thesum of each individual contribution to PWE ie

PWE =nminus1

sumi=1PWE(i) (32)

On the basis of the probability levels of correct detection PCD and correct identification PCI the sensitivity indicators of minimal biasesmdashMinimal Detectable Bias (MDB) and Minimal IdentifiableBias (MIB)mdashfor a given αprime can be computed as follows

MDB = arg minnablaiPCD(nablai) gt PCD foralli = 1 middot middot middot n (33)

MIB = arg minnablaiPCI(nablai) gt PCI foralli = 1 middot middot middot n (34)

Equation (33) gives the smallest outlier nablai that leads to its detection for a user-defined correctdetection rate PCD whereas (34) provides the smallest outlier nablai that leads to its identification for auser-defined correct identification rate PCI

As a consequence of the inequality in (28) the MIB will be larger than MDB ie MIB ge MDBFor the special case of having only one single alternative hypothesis there is no difference between theMDB and MIB [55] The computation of MDB0 is easily performed by Equations (14) or (16) whereasthe computation of the MDB in Equation (33) and the MIB in Equation (34) must be computed using

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

10 of 40

Monte Carlo because the acceptance region (as well as the critical region) for the case of multiplealternative hypotheses is analytically intractable

3 Material and Methods

Here we use the procedure provided by Rofatto et al [45] to compute the probability levelsassociated with IDS as well as to estimate the minimal biases mdash Minimal Detectable Bias(MDB) andMinimal Identifiable Bias (MIB) The procedure is summarised in Figure 1

The procedure to compute the critical value of max-w (k) is given step-by-step as follows

1 Specify the probability density function (pdf) of the w-test statistics The pdf assigned to thew-test statistics under anH0-distribution is

(w1 w2 w3 middot middot middot wn)T sim N (0Rw) (35)

where Rw isin Rntimesn is the correlation matrix with the main diagonal elements equal to 1 and theoff-diagonal elements are the correlation between the w-test statistics computed by Equation (20)

2 In order to have w-test statistics under H0 uniformly distributed random number sequencesare produced by the Mersenne Twister algorithm and then they are transformed into a normaldistribution by using the BoxndashMuller transformation [75] BoxndashMuller has already been usedin geodesy for Monte Carlo experiments [467677] Therefore a sequence of m random vectorsfrom the pdf assigned to the w-test statistics is generated according to Equation (35) In that casewe have a sequence of m vectors of the w-test statistics as follows[

(w1 w2 w3 middot middot middot wn)T(1)

(w1 w2 w3 middot middot middot wn)T(2)

middot middot middot (w1 w2 w3 middot middot middot wn)T(m)

](36)

3 Compute the test statistic by Equation (10) for each sequence of w-test statistics Thus we have(max

iisin1middotmiddotmiddot n|wi|(1) max

iisin1middotmiddotmiddot n|wi|(2) middot middot middot max

iisin1middotmiddotmiddot n|wi|(m)

)(37)

4 Sort in ascending order the maximum test statistic in Equation (37) getting a sorted vector wsuch that

w(1) lt w(2) w(3) middot middot middot lt w(m) (38)

The sorted values w in Equation (38) provide a discrete representation of the cumulative densityfunction (cdf) of the maximum test statistic max-w

5 Determine the critical value k as follows

k = w[(1minusαprime)timesm] (39)

where [] denotes rounding down to the next integer that indicates the position of the selectedelements in the ascending order of w This position corresponds to a critical value for a stipulatedoverall false alarm probability αprime This can be done for a sequence of values αprime in parallel

It is important to mention that the probability of a Type I decision error for multiple testing αprime islarger than that of Type I for single testing α0 This is because the critical region in multiple testing islarger than that in single hypothesis testing

After finding the critical value k the procedure based on Monte Carlo is also applied to computethe probability levels of IDS when there is an outlier in the dataset The overview of of the mainelements involved with the IDS was detailed in Section 2 The flowchart is displayed in Figure 1

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

11 of 40

Inp

ut

Functional amp Stochastic Models Significance level (αrsquo) Outlier magnitude range (nablai) Probability of correct detection ( ) Probability of correct identification ( )

CD CI

Gen

erat

e

ran

do

m d

raw

s

Generate a sample of random errors

119942~119925(120782 Q119942)

Compute the total error ε = e + cinablai

Compute the least-squares residual vector e = Rε

Iterativ

e ou

tlier elimin

ation

pro

cess

Compute the maximum w-test statistic

max wi

max wi gt k

End

Identify and remove the measurement from the dataset

Size of max wi gt 1

det ATWA gt 0

Compute w-test statistics

Compute and store nCI nMD nWE nover+ nover- and nol

Co

mp

ute th

e pro

bab

ilities

Number of Monte Carlo experiments (m) = 200000

Compute the probabilities CI MD WE over+ over- and ol

Sensitivity indicators Outlier detection (MDB)

amp Outlier identification (MIB)

YES NO

YES

NO

YES NO

YES

NO

Figure 1 Flowchart of the algorithm to compute the probability levels of Iterative Data Snooping (IDS)for each measurement in the presence of an outlier [45]

The steps displayed in Figure 1 are detailed as follows

1 First random error vectors are synthetically generated on the basis of a multivariate normaldistribution because the assumed stochastic model for random errors is based on the matrixcovariance of the observations Here we use the Mersenne Twister algorithm [78] to generate asequence of random numbers and BoxndashMuller [75] to transform it into a normal distribution

2 The total error (ε) is a combination of random errors and its corresponding outlier is given asfollows

ε = e + cinablai (40)

The magnitude intervals of simulated outliers are user-defined The magnitude intervals arebased on the standard deviation of the observation (σ) eg |3σ| to |6σ| Since the outlier can bepositive or negative the proposed algorithm randomly selects the signal of the outlier (for q = 1)Here we use the discrete uniform distribution to select the signal of the outlier

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

12 of 40

In Equation (40) e is the random error generated from the normal distribution according toEquation (2) and the second part cinablai is the additional parameter that describes the alternativemodel according to Equation (40)

3 Next we compute the least-squares residuals vector according to Equation (3) but now we usethe total error (ε) in Equation 40 as follows

e = Rε (41)

4 Run the IDS In this step the test statistic is computed according to (9) Then the maximumtest statistic value is obtained according to Equation (10) Now the decision rule is based on thecritical value k computed by Monte Carlo in previous stage After identifying the measurementsuspected to be the most likely outlier it is excluded from the model and least-squares estimationand data snooping are applied iteratively until there are no further outliers identified in thedataset If two or more observations are simultaneously detected (ie if maxminusw gt k and the sizeof maxminusw gt 1 then the IDS is ended) Furthermore every time that a measurement suspectedto be the most likely outlier is removed from the model we check whether the normal matrixATW A is invertible or not If the determinant of ATW A is 0 det|ATW A| = 0 then there isa necessary and sufficient condition for a square matrix ATW A to be non-invertible In otherwords the IDS is ended when det|ATW A| = 0 If no outlier is detected (ie maxminusw lt k) thenthe IDS is also ended

The IDS procedure is performed for m experiments of random error vectors for each experimentcontaminated by an outlier in the ith measurement Therefore for each measurementcontaminated by an outlier there are υ = 1 m experiments

5 After running m=200000 experiments [47] the probabilities associated with IDS are computed(note m refers to the total number of Monte Carlo experiments) The probability of correctidentification (PCI) is the ratio between the number of times that the outlier is correctly identified(denoted as nCI) and m experiments ie

PCI =nCIm

(42)

Similar to Equation (42) the wrong decisions are computed as

PMD =nMD

m(43)

where nMD is the number of experiments in which IDS does not detect the outlier (PMDcorresponds to the rate of missed detection)

PWE =nWE

m(44)

where nWE is the number of experiments in which the IDS procedure flags and removes onlyone single non-outlying measurement while the lsquotruersquo outlier remains in the dataset (PWE is thewrong exclusion rate)

Pover+ =nover+

m(45)

where nover+ is the number of experiments in which IDS correctly identifies and removes theoutlying measurement and others and Pover+ corresponds to its probability

Poverminus =noverminus

m(46)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

13 of 40

where noverminus represents the number of experiments in which IDS identifies and removes morethan one non-outlying measurement whereas the lsquotrue outlierrsquo remains in the dataset (Poverminus isthe probability corresponding to this error probability class)

Pol =nolm

(47)

where nol is the number of experiments in which the detector in Equation (10) flags two (or more)measurements simultaneously during a given iteration of IDS Here this is referred to as thenumber of statistical overlap nol and Pol corresponds to its probability

6 Finally the sensitivity indicators (MDB and MIB) are computed based on Equation 33 andEquation 34 respectively

In this paper the probability levels associated with IDS were computed for each observationindividually and for each outlier magnitude However they were grouped into clusters based onnumber of local redundancy (ri) and maximum absolute correlation between the w-test statistics(ρwi wj ) Furthermore we take care to control the family-wise error rate

4 Case study of Levelling Geodetic Network

The analyse of the constraints effects on the probability levels of IDS are performed by an exampleof a levelling geodetic network

A levelling geodetic network is a set of points located on the Earthrsquos surface or near it andinterconnected by height difference measurements Some of these points are associated with a verticalheight reference system (ie some points have known height) whereas others are parameters (unknownheights) The term known here means that those points with known heights are constraints necessaryto ensure that parameters (unknown heights) are sufficiently estimable In the sense of modern geodeticreference systems realisation and unification of a vertical height system consists of a combination ofGNSS (Global Navigation Satellite System) and spirit levelling with geoid models We will not gointo the details about height geodetic reference system and frame (Readers who wish to have furtherdetails on that issue please see eg [79])

The mathematical model associated with a geodetic levelling network is linear (ie the levellingmeasurements bear a known linear relationship with the unknown heights) Geodetic networksusually has more measurements than parameters (ie n gt u) ie we have a redundant measurementsystem However due to intrinsic random errors in a system redundant measurements often leadto an inconsistent system of equations To make the system consistent we have to introduce theinformation about the random measurement errors The stochastical properties of the measurementerrors are directly associated with the assumption of the probability distribution of these errors Ingeodesy and many other scientific branches the well-known normal distribution is one of the mostused as measurement error model [80] Because of this the model ceases to be purely mathematicaland becomes a statistical model with functional and stochastic part

In the absence of outliers ie under null hypothesisH0 in Equation (1) the model for levellinggeodetic network can be written as follows

∆himinusj + e∆himinusj= hj minus hi (48)

where ∆himinusj is the height difference measured from point i to j and e∆himinusjis the random error associated

with the levelling measurement For a only one single levelling line one of these points is the constraint(known height) from which the height of another point is determined The point with known height isalso referred to as control point or benchmark In a geodetic network on the other hand we have a setof levelling lines that connect both from points of unknown height and from points of known height(constraints) Under normal working conditions (ie H0) the measurement errors model is then givenby Equation (2)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

14 of 40

In this case study we demonstrate the application of the proposed algorithm by [45] subject to adifferent constraints scenarios for that levelling geodetic network

Recent work has been focused on non-stochastic constraints (hard constraints) Here we alsoconsider cases where constraints are subject to a random errors (ie soft constraints) That softconstraints are non-deterministic and therefore they are measured model elements that can bedesignated as a priori information The approach of this example can be applied in the designstage of geodetic network analysis [6265] Furthermore the experiments performed here can also beextended to geodetic deformation analysis when the deformation effects are unmodelled in the senseof deterministic form [28]

41 Problem description

To analyse the constraints effects on the IDS an example is taken from a geodetic levellingnetwork with 12 height differences between the points The equipment used to measure the leveldifference can be an electronic digital level In that case the levelling measurement system comprisesof a special bar-coded staff (also called barcode rod) and a digital level (instrument) A digital level isbasically a telescope that enables a horizontal line of sight Digital levels consist of additional electronicimage processing components to automatically read and analyse digital (bar coded) levelling staffswhere the graduation is replaced by a manufacturer dependent code pattern Generally the result isautomatically stored in the data collector of the digital level An example of a digital level ndash bar-codestaff system is displayed in the Figure 2 For more details about digital level see eg [81ndash84]

Figure 2 Example of a digital level ndash bar-code staff system [45]

The standard-deviation of the uncorrelated measurements were the same and taken equal toσ = 1mm The points are indicated as A to G The eight network configuration are displayed in Figure3(a b c d and e) and their details are given as follows

1 Figure 3(a) Network with 1 hard constraint (ie network minimally constrained) Since thedimension of the network is 1D the minimum information necessary to estimate the unknownheights is one The height of G was fixed as control point (hard constraint) and 6 unknown heights(ABCDEF) were minimally constrained Therefore the redundancy of the system (or overalldegrees of freedom) was r = nminus rank(A) = nminus u = 12minus 6 = 6

2 Figure 3(b) Network with one extra hard constraint (ie two hard constraints) The heights A andD were taken as hard constraints (ie heights A and D were fixed) The redundancy of the system

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

15 of 40

in that case was r = 12minus 5 = 7 with 5 unknown heights (BCEFG) over-constrained

3 Figure 3(c) Network with two extra hard constraints (ie three hard constraints)The heights A Dand G were taken as hard constraints In that case the redundancy of the system was r = 12minus 4 = 8

4 Figure 3(d) Network with two soft constraints (A and D) In that case a standard-deviation largerthan zero was assigned to both constraints (ie σc gt 0 In other words A and D were processedas being both observations and unknown parameters ie A and D were pseudo-observationsHere the both constraints were simultaneously relaxed by considering their uncertainties 10 timesworse than measurements (ie σc = 10times σ = 10mm) 10 times better than measurements (ieσc = 01mm) and their uncertainties equal to the measurements (σc = 1mm) In that case theredundancy of the system was r = 14minus 7 = 7

5 Figure 3(e) Network processed with A D and G as pseudo-observations Those three constraintswere simultaneously relaxed by considering their standard-deviations equal to σc = 10mm (10 timesworse than measurements) σc = 01mm (10 times better than measurements) and σc = 1mm (thesame as the measurements) In that case the redundancy of the system was r = 15minus 7 = 8

y1

y2

y3

y4

y5

y6

y7 y8

y9 y10

B C

D

E F

G y12 y11

A

y1

y2

y3

y4

y5

y6

y7 y8

y9 y10

A

B C

D

E F

G y12 y11

y1

y2

y3

y4

y5

y6

y7 y8

y9 y10

A

B C

D

E F

G y12 y11

(a) (b)

(c) (d)

(e)

Unknown Heights Soft Constraints Hard Constraints

y1

y2

y3

y4

y5

y6

y7 y8

y9 y10

A

B C

D

E F

G y12 y11

y1

y2

y3

y4

y5

y6

y7 y8

y9 y10

A

B C

D

E F

G y12 y11

Figure 3 Levelling Geodetic Network subject to different constraint scenarios

The following system of equations for that problem is given by

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

16 of 40

y1 + e1 = hB minus hA

y2 + e2 = hC minus hB

y3 + e3 = hD minus hC

y7 + e7 = hB minus hG

y8 + e8 = hC minus hG

y11 + e11 = hB minus hF

y12 + e12 = hC minus hE

(49)

The functional model (A) for the system of equations in 49 is given by

A =

minus1 1 0 0 0 0 00 minus1 1 0 0 0 00 0 minus1 1 0 0 00 0 0 minus1 1 0 00 0 0 0 minus1 1 01 0 0 0 0 minus1 00 1 0 0 0 0 minus10 0 1 0 0 0 minus10 0 0 0 1 0 minus10 0 0 0 0 1 minus10 1 0 0 0 minus1 00 0 1 0 minus1 0 0

(50)

Note that the rank defect of the matrix A is uminus rank(A) = 7minus 6 = 1 In that case there is neededto add at least one constraint in order to avoid rank deficiency of the matrix A This is guaranteedwhen we take one height as known For example from the network in Figure 3(a) we have added theheight G as known (ie as a hard constraint) In that case the constraint equation should be addedinto the system in 49 ie

y13 = hG with σy13 = 0 (51)

noticing that because the standard deviation is zero the observation is non-stochastic (hard constraint)and the residual ey13 = 0 This can generate problems in the inversion of the covariance matrix of theobservations Qe for the calculation of the weight matrix W because the weight for that constraintwould be 1

0 = infin In order to avoid that problem we have eliminated the rank deficiency of matrixA by removing the seventh column of matrix A in 50 associated with the height G Now we haveuminus rank(A) = 6minus 6 = 0

The constraint defines the geodetic datum ie the S-system ([85]p41) Another approach tosolving the system of equations in 2 could be based on generalised (pseudo) inverses (see eg [86])

The location of the constraints can be chosen in some circumstances for example during thedesign stage of a geodetic network For the special case of having a minimally constrained systemthe location of the constraint will not influence the w-test statistics and the sensitivity indicators (MIBand MDB) [18] However more constraints than the minimum necessary to have a solution (ie extraconstraints or redundant constraints) can change the least-squares residuals and hence w-test statisticsand the minimal biases

From the network with one extra constraint (2 constraints) in Figure 3(b) for example the bothfirst (height A) and fourth column (height D) of matrix A in 50 were eliminated in the case of having

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

17 of 40

the two heights as hard constraints For the case where these two heights (A and D) were taken as softconstraints however two observation equations were added to Equation 49 ie

y13 + e13 = hA σy13 gt 0

y14 + e14 = hD σy14 gt 0(52)

In the case of soft constraints in Equation 52 two lines were added in matrix A In other wordsA and D were taken as pseudo-observations In that case the rank deficiency was also null (ieuminus rank(A) = 7minus 7 = 0) the redundancy of the system was r = nminus rank(A) = nminus u = 7 and thematrix A was given as follows

A =

minus1 1 0 0 0 0 00 minus1 1 0 0 0 00 0 minus1 1 0 0 00 0 0 minus1 1 0 00 0 0 0 minus1 1 01 0 0 0 0 minus1 00 1 0 0 0 0 minus10 0 1 0 0 0 minus10 0 0 0 1 0 minus10 0 0 0 0 1 minus10 1 0 0 0 minus1 00 0 1 0 minus1 0 01 0 0 0 0 0 00 0 0 1 0 0 0

(53)

For this example of two soft constraints and by considering the both soft constraints withstandard-deviation σc = 10mm the symmetric and positive semi-definite covariance matrix of theobservations (Qe) was given as follows

Qe =

1 0 0 middot middot middot 0 00 1 0 middot middot middot 0 00 0 1 middot middot middot 0 0

0 0 0 middot middot middot 100 00 0 0 middot middot middot 0 100

(54)

The last two rows and columns of the matrix Qe in Equation 54 refer to the variances (σc2 =

(10mm)2 = 100mm2) of the heights constraints A and D respectivelySimilarly matrices A and Qe were constructed for the other cases studied hereAlthough other measurements are able to identify an outlier for the case of having only one single

soft constraint the pseudo-observation (constraint) is not In that case the defect configurationis associated with the additional parameter in the constraint (ie the presence of an outlier inthe constraint) In other words an additional parameter on the soft constraint will not estimableFor example if the height point G was taken as a soft constraint the presence of an outlier inpseudo-observation G would lead to rank deficiency of matrix A ie u minus rank(A) = 8 minus 7 = 1Therefore the case of having only one single soft constraint was not considered here

42 Result of the Hard Constraint Effects on the Iterative Outlier Elimination Procedure

In this section we present the results of the effects of the hard constraint on the IDS The scenariosin Figure 3(a) (network minimally constrained) Figure 3(b) (two hard constraints) and Figure 3(c)(three hard constraints) were considered for the analysis

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

18 of 40

The correlation coefficient between w-test statistics (ρwi wj ) were computed for the three scenariosaccording to Equation(20) Table 1 provides that correlation for that network minimally constrainedTable 2 for that network over-constrained with two hard constraints and Table 3 for that networkover-constrained with three hard constraints

Table 1 Correlation matrix of w-test statistics for the levelling network minimally constrained in Figure3(a)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12

wy1 10 02 01 01 02 10 -02 00 00 02 -04 -01wy2 10 02 02 03 02 05 -05 -02 02 03 -03wy3 10 10 02 01 00 02 -02 00 01 04wy4 10 02 01 00 02 -02 00 01 04wy5 10 02 -02 02 05 -05 03 -03wy6 10 -02 00 00 02 -04 -01wy7 10 -03 -03 -04 -04 -01wy8 10 -04 -03 -01 -04wy9 10 -03 01 04wy10 10 04 01wy11 10 -01wy12 100

Table 2 Correlation matrix of w-test statistics for the levelling network with two hard constraints inFigure 3(b)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12

wy1 10 04 04 -03 -01 04 -03 01 01 01 -04 -01wy2 10 04 -01 01 -01 04 -04 -01 01 03 -03wy3 10 04 -01 -03 -01 03 -01 -01 01 04wy4 10 04 04 01 01 -03 01 01 04wy5 10 04 -01 01 04 -04 03 -03wy6 10 -01 -01 -01 03 -04 -01wy7 10 -04 -03 -04 -04 -01wy8 10 -04 -03 -01 -04wy9 10 -04 01 04wy10 10 04 01wy11 10 -01wy12 10

Table 3 Correlation matrix of w-test statistics for the levelling network with three hard constraints inFigure 3(c)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12

wy1 10 03 01 -01 -01 01 -04 -01 -01 -01 -03 -01wy2 10 03 -01 01 -01 03 -03 -01 01 03 -03wy3 10 01 -01 -01 01 04 01 01 01 03wy4 10 03 01 -01 -01 -04 -01 01 03wy5 10 03 -01 01 03 -03 03 -03wy6 10 01 01 01 04 -03 -01wy7 10 -01 -01 -01 -03 -01wy8 10 -01 -01 -01 -03wy9 10 -01 01 03wy10 10 03 01wy11 10 -01wy12 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

19 of 40

Table 4 gives the local redundancy (ri) the standard-deviation of the LS-estimated outlier σnablai andthe maximum absolute correlation (maxρwi wj

) for each scenario of hard constraint set out of in thisstudy ie Figure 3(ab and c)

Table 4 Local redundancy (ri) standard-deviation of the LS-estimated outlier σnablai and the maximumabsolute correlation (maxρwi wj

) for each scenario of hard constraint

1 hard constraint 2 hard constraints 3 hard constraintsMeasurement ri σnablai maxρwi wj

ri σnablai maxρwi wjri σnablai maxρwi wj

y1 0396 1589 100 0583 1309 036 0708 1188 041y2 0500 1414 047 0583 1309 036 0583 1309 032y3 0396 1589 100 0583 1309 036 0708 1188 041y4 0396 1589 100 0583 1309 036 0708 1188 041y5 0500 1414 047 0583 1309 036 0583 1309 032y6 0396 1589 100 0583 1309 036 0708 1188 041y7 0563 1333 047 0583 1309 036 0708 1188 041y8 0563 1333 047 0583 1309 036 0708 1188 041y9 0563 1333 047 0583 1309 036 0708 1188 041y10 0563 1333 047 0583 1309 036 0708 1188 041y11 0583 1309 043 0583 1309 036 0583 1309 032y12 0583 1309 043 0583 1309 036 0583 1309 032

Here the twelve levelling measurements were clustered into four clusters The clustering wasdefined according to the local redundancy (ri) and the maximum absolute correlation (maxρwi wj

) inTable 4 This is the first time that a clustering technique based on the similarity of local redundancyand the maximum absolute correlation between w-test statistics is applied to a problem of geodeticnetworks Similarly this has been done in [45] The four cluster were defined as follows

bull Cluster 1 y1 y3 y4 and y6bull Cluster 2 y2 and y5bull Cluster 3 y7 y8 y9 and y10bull Cluster 4 y11 and y12

The probability levels associated with IDS were averaged for each of these clusters Here we computethe probability levels of the IDS based on the procedure in Section (3) [45] The critical values werek = 389 k = 393 and k = 393 for 1 hard constraint 2 hard constraints and 3 hard constraintsrespectively These critical values were found for αprime = 0001 according to the procedure described inAppendix The probability of correct identification (PCI) and correct detection (PCD = 1minusPMD)are displayed in Figure 4 for each number of hard constraint (denoted by hc)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

20 of 40

(c) (d)

(e) (f)

(g) (h)

(a) (b)

Figure 4 Probability of correct identification (PCI) and Probability of correct detection (PCD) for thecase of hard constraints and for αprime = 0001 Cluster 1(ab) Cluster 2(cd) Cluster 3(ef) and Cluster4(gh)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

21 of 40

The outlier magnitude were defined from |5σ| to |9σ| The outlier of |5σ| was chosen because itis approximately the lowest MDB0(i) of the network when a single hypothesis testing is in play ascan be seen in Equation (14) or Equation (16) That MDB0(i) of |5σ| was computed for significancelevel of αprime = 0001 and a power of the test γ0 = 08 This strategy reduces the search space for a MIB(Minimal Identifiable Bias) because we will always have the following inequality MIB ge MDB0(i)[4755] Remember that the IDS procedure is an example of multiple hypothesis testing (see 2)

The sensitivity indicators MDB and MIB for IDS were also computed according to Equation (33)and (34) respectively The success rate for outlier detection and outlier identification were taken asbeing PCD = PCI = 08 respectively Table 5 provides the values of MDB and MIB for that case ofhard constraints

Table 5 MDB and MIB for the case of hard constraints based on αprime = 0001 and PCD = PCI = 08

1 hard constraint 2 hard constraints 3 hard constraintsCluster MDB (σ) MIB (σ) MDB (σ) MIB (σ) MDB (σ) MIB (σ)

1 75 - 63 63 57 572 67 68 63 64 63 643 64 64 63 63 58 584 64 64 64 64 64 64

Figure 5 shows the probability of wrong exclusion (PWE) The over-identification cases (Pover+

and Poverminus) were smaller than 0001 (ie they were practically null) There were not statistical overlap(Pol) for clusters 2 3 and 4 We will discuss more about statistical overlap (Pol) later

(c) (d)

(a) (b)

Figure 5 Probability of wrong exclusion (PWE) for the case of hard constraints and for αprime = 0001Cluster 1(a) Cluster 2(b) Cluster 3(c) and Cluster 4(d)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

22 of 40

43 Result of the Soft Constraint Effects on the Iterative Outlier Elimination Procedure

The both configuration in Figure 3(d) and Figure 3(e) were analysed in terms of soft constraintsIn that case the critical values were k = 395 k = 395 and k = 392 for two soft constraints withσc = 01mm σc = 1mm and σc = 10mm respectively In the case of three soft constraints the criticalvalues found were k = 399 k = 399 and k = 396 for σc = 01mm σc = 1mm and σc = 10mmrespectively All these critical values were computed for αprime = 0001

The correlation coefficient between w-test statistics (ρwi wj ) are displayed in Table 6 Table 7and Table 8 for two soft constraints with standard-deviation 10 times larger than measurements(ie σc = 10times σ = 10mm) 10 times better than measurements (ie σc = 01mm) and equal to themeasurements (σc = 1mm) respectively

Table 6 Correlation matrix of w-test statistics for the levelling network with two soft constraints withσc = 10mm in Figure 3(d)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14

wy1 10 02 01 01 02 10 -02 00 00 02 -04 -01 006 -01wy2 10 02 02 03 02 05 -05 -02 02 03 -03 003 00wy3 10 10 02 01 00 02 -02 00 01 04 01 -01wy4 10 02 01 00 02 -02 00 01 04 -01 01wy5 10 02 -02 02 05 -05 03 -03 00 00wy6 10 -02 00 00 02 -04 -01 -01 01wy7 10 -03 -03 -04 -04 -01 00 00wy8 10 -04 -03 -01 -04 00 00wy9 10 -03 01 04 00 00wy10 10 04 01 00 00wy11 10 -01 00 00wy12 10 00 00wy13 10 -10wy14 10

Table 7 Correlation matrix of w-test statistics for the levelling network with two soft constraints withσc = 01mm in Figure 3(d)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14

wy1 10 04 04 -03 -01 04 -03 01 01 01 -04 -01 06 -06wy2 10 04 -01 02 -01 04 -04 -01 01 03 -03 04 -04wy3 10 04 -01 -03 -01 03 -01 -01 01 04 06 -06wy4 10 04 04 01 01 -03 01 01 04 -06 06wy5 10 04 -01 01 04 -04 03 -03 -04 04wy6 10 -01 -01 -01 03 -04 -01 -06 06wy7 10 -04 -03 -04 -04 -01 -02 02wy8 10 -04 -03 -01 -04 02 -02wy9 10 -04 01 04 02 -02wy10 10 04 01 -02 02wy11 10 -01 00 00wy12 10 00 00wy13 10 -10wy14 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

23 of 40

Table 8 Correlation matrix of w-test statistics for the levelling network with two soft constraints withσc = 1mm in Figure 3(d)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14

wy1 10 03 02 -01 01 07 -03 00 01 02 -04 -01 04 -04wy2 10 03 01 03 01 04 -04 -01 01 03 -03 03 -03wy3 10 07 01 -01 00 03 -02 -01 01 04 04 -04wy4 10 03 02 01 02 -03 00 01 04 -04 04wy5 10 03 -01 01 04 -04 03 -03 -03 03wy6 10 -02 -01 00 03 -04 -01 -04 04wy7 10 -03 -03 -04 -04 -01 -01 01wy8 10 -04 -03 -01 -04 01 -01wy9 10 -03 01 04 01 -01wy10 10 04 01 -01 01wy11 10 -01 00 00wy12 10 00 00wy13 10 -10wy14 10

Table 9 gives the local redundancy (ri) the standard-deviation of the LS-estimated outlier σnablai andthe maximum absolute correlation (maxρwi wj

) for the scenarios of two constraints

Table 9 Local redundancy (ri) standard-deviation of the LS-estimated outlier σnablai (mm) and themaximum absolute correlation (maxρwi wj

) for each scenario of two soft constraints

σc = 01mm σc = 1mm σc = 10mmMeasurement ri σnablai maxρwi wj

ri σnablai maxρwi wjri σnablai maxρwi wj

y1 0581 1312 0564 0471 1457 0681 0397 1587 0994y2 0582 1311 0376 0533 1369 0423 0501 1413 0471y3 0581 1312 0564 0471 1457 0681 0397 1587 0994y4 0581 1312 0564 0471 1457 0681 0397 1587 0994y5 0582 1311 0376 0533 1369 0423 0501 1413 0471y6 0581 1312 0564 0471 1457 0681 0397 1587 0994y7 0583 1310 0359 0571 1324 0423 0563 1333 0471y8 0583 1310 0359 0571 1324 0423 0563 1333 0471y9 0583 1310 0359 0571 1324 0423 0563 1333 0471y10 0583 1310 0359 0571 1324 0423 0563 1333 0471y11 0583 1309 0358 0583 1309 0398 0583 1309 0433y12 0583 1309 0358 0583 1309 0398 0583 1309 0433y13 0007 1163 1000 0300 1826 1000 0497 14189 1000y14 0007 1163 1000 0300 1826 1000 0497 14189 1000

From Table 9 five clusters were defined for each case of two soft constraints ie for the casewhere heights A and D were given as soft constraints in Figure 3(d) as follows

bull Cluster 1 y1 y3 y4 and y6bull Cluster 2 y2 and y5bull Cluster 3 y7 y8 y9 and y10bull Cluster 4 y11 and y12bull Cluster 5 y13 and y14

The probabilities of correct identification (PCI) and correct detection (PCD) for the measurements(Cluster 1 to Cluster 4) subject to the scenarios of two soft constraints (heights A and D) are displayedin Figure 6

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

24 of 40

(c) (d)

(e) (f)

(g) (h)

(a) (b)

Figure 6 Probability of correct identification (PCI) and Probability of correct detection (PCD) for themeasurements subject to the scenarios of two soft constraints for αprime = 0001 Cluster 1(ab) Cluster2(cd) Cluster 3(ef) and Cluster 4(gh)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

25 of 40

Note that the Cluster 5 is associated with the two soft constraints (ie y13 and y14) The probabilityof correct identification (PCI) for these both soft constraints were null However the probability ofcorrect detection were not Figure 7 shows the probability of correct detection (PCD) for these two softconstraints (ie heights A and D)

Figure 7 Probability of correct detection PCD for the two soft constraints (Cluster 5 heights A and D)and for αprime = 0001

The probability of wrong exclusion PWE for the measurements (Cluster 1 to Cluster 4) subject tothe scenarios of two soft constraints (heights A and D) are displayed in Figure 8 Figure 9 gives the theprobability of wrong exclusion PWE for that two constraints (ie heights A and D)

(c) (d)

(a) (b)

Figure 8 Probability of wrong exclusion PWE for the measurements subject to the scenarios of two softconstraints for αprime = 0001 Cluster 1(a) Cluster 2(b) Cluster 3(c) and Cluster 4(d)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

26 of 40

Figure 9 Probability of wrong exclusion PWE for the two soft constraints (Cluster 5 heights A and D)and for αprime = 0001

The over-identification cases (Pover+ and Poverminus) and the statistical overlap (Pol) were practicallynull for that case

The sensitivity indicators (MDB and MIB) for each scenario of two soft constraints are displayedin Table 10

Table 10 MDB and MIB for the case of two soft constraints based on αprime = 0001 and PCD = PCI = 08

σc = 10mm σc = 1mm σc = 01mmCluster MDB (σ) MIB (σ) MDB (σ) MIB (σ) MDB (σ) MIB (σ)

1 75 25 7 71 63 632 68 68 66 66 63 633 64 64 64 64 63 634 63 63 63 63 63 635 68 - 88 - 57 -

As with the case of two soft constraints Tables 1112 and 13 show the correlations (ρwi wj ) for thecase where there were three soft constraints ie for the network configuration in Figure 3(e)

Table 11 Correlation matrix of w-test statistics for the levelling network with three soft constraintswith σc = 10mm in Figure 3(e)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14 wy15

wy1 10 02 01 01 02 10 -02 00 00 02 -04 -01 01 00 00wy2 10 02 02 03 02 05 -05 -02 02 03 -03 00 00 00wy3 10 10 02 01 00 02 -02 00 01 04 00 -01 00wy4 10 02 01 00 02 -02 00 01 04 00 01 00wy5 10 02 -02 02 05 -05 03 -03 00 00 00wy6 10 -02 00 00 02 -04 -01 -01 00 00wy7 10 -03 -03 -04 -04 -01 00 00 00wy8 10 -04 -03 -01 -04 00 00 00wy9 10 -03 01 04 00 00 00wy10 10 04 01 00 00 00wy11 10 -01 00 00 00wy12 10 00 00 00wy13 10 -05 -05wy14 10 -05wy15 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

27 of 40

Table 12 Correlation matrix of w-test statistics for the levelling network with three soft constraintswith σc = 01mm in Figure 3(e)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14 wy15

wy1 10 03 01 -01 -01 01 -04 -01 -01 -01 -03 -01 07 -01 -04wy2 10 03 -01 01 -01 03 -03 -01 01 03 -03 03 -03 00wy3 10 01 -01 -01 01 04 01 01 01 03 01 -07 04wy4 10 03 01 -01 -01 -04 -01 01 03 -01 07 -04wy5 10 03 -01 01 03 -03 03 -03 -03 03 00wy6 10 01 01 01 04 -03 -01 -07 01 04wy7 10 -01 -01 -01 -03 -01 -04 -01 04wy8 10 -01 -01 -01 -03 -01 -04 04wy9 10 -01 01 03 -01 -04 04wy10 10 03 01 -04 -01 04wy11 10 -01 00 00 00wy12 10 00 00 00wy13 10 -02 -06wy14 10 -06wy15 10

Table 13 Correlation matrix of w-test statistics for the levelling network with three soft constraintswith σc = 1mm in Figure 3(e)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14 wy15

wy1 10 03 01 00 01 06 -03 00 00 01 -04 -01 05 -02 -02wy2 10 03 01 03 01 04 -04 -01 01 03 -03 02 -02 00wy3 10 06 01 00 00 03 -01 00 01 04 02 -05 02wy4 10 03 01 00 01 -03 00 01 04 -02 05 -02wy5 10 03 -01 01 04 -04 03 -03 -02 02 00wy6 10 -01 00 00 03 -04 -01 -05 02 02wy7 10 -03 -02 -03 -04 -01 -02 00 02wy8 10 -03 -02 -01 -04 00 -02 02wy9 10 -03 01 04 00 -02 02wy10 10 04 01 -02 00 02wy11 10 -01 00 00 00wy12 10 00 00 00wy13 10 -04 -05wy14 10 -05wy15 10

Table 14 Local redundancy (ri) standard-deviation of the LS-estimated outlier σnablai (mm) and themaximum absolute correlation (maxρwi wj

) for each scenario of three soft constraints

σc = 01mm σc = 1mm σc = 10mmMeasurement ri σnablai maxρwi wj

ri σnablai maxρwi wjri σnablai maxρwi wj

y1 0702 1194 0660 0502 1411 0577 0398 1586 0992y2 0582 1311 0326 0533 1369 0412 0501 1413 0470y3 0702 1194 0660 0502 1411 0577 0398 1586 0992y4 0702 1194 0660 0502 1411 0577 0398 1586 0992y5 0582 1311 0326 0533 1369 0412 0501 1413 0470y6 0702 1194 0660 0502 1411 0577 0398 1586 0992y7 0704 1192 0415 0602 1289 0412 0563 1333 0470y8 0704 1192 0415 0602 1289 0412 0563 1333 0470y9 0704 1192 0415 0602 1289 0412 0563 1333 0470y10 0704 1192 0415 0602 1289 0412 0563 1333 0470y11 0583 1309 0326 0583 1309 0385 0583 1309 0433y12 0583 1309 0326 0583 1309 0385 0583 1309 0433y13 0012 0904 0660 0425 1534 0542 0663 12283 0501y14 0012 0904 0660 0425 1534 0542 0663 12283 0501y15 0019 0718 063 05 1414 0542 0665 12268 0501

The probabilities of correct identification (PCI) and correct detection PCD for the measurements(Cluster 1 to Cluster 4) subject to the scenarios of three soft constraints (heights A D and G) aredisplayed in Figure 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

28 of 40

(c) (d)

(e) (f)

(g) (h)

(a) (b)

Figure 10 Probability of correct identification (PCI) and Probability of correct detection (PCD) for themeasurements subject to the scenarios of three soft constraints for αprime = 0001 Cluster 1(ab) Cluster2(cd) Cluster 3(ef) and Cluster 4(gh)

The probabilities of correct identification (PCI) and correct detection (PCD) in Figure 8 werecomputed for the clusters based on Table 14 as follows

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

29 of 40

bull Cluster 1 y1 y3 y4 and y6bull Cluster 2 y2 and y5bull Cluster 3 y7 y8 y9 and y10bull Cluster 4 y11 and y12bull Cluster 5 y13 and y14bull Cluster 6 y15

Figure 11 shows the probabilities of correct identification PCI and correct detection PCD for thethree soft constraints ie for Cluster 5 (heights A and D) and Cluster 6 (height G) in Figure 3(e)

(c) (d)

(a) (b)

Figure 11 Probability of correct identification (PCI) and Probability of correct detection (PCD) for thethree constraints and for αprime = 0001 Cluster 5(ab) and Cluster 6(cd)

The probability of wrong exclusion PWE for the measurements (Cluster 1 to Cluster 4) subject tothe scenarios of three soft constraints (heights A D and G) are displayed in Figure 12 Figure 13 givesthe the probability of wrong exclusion PWE for that three constraints (ie heights A D and G)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

30 of 40

(c) (d)

(a) (b)

Figure 12 Probability of wrong exclusion PWE for the measurements subject to the scenarios of threesoft constraints and for αprime = 0001 Cluster 1(a) Cluster 2(b) Cluster 3(c) and Cluster 4(d)

(a) (b)

Figure 13 Probability of wrong exclusion (PWE) for the three constraints and for αprime = 0001 Cluster5(a) and Cluster 6(b)

The over-identification cases (Pover+ and Poverminus) and the statistical overlap (Pol) were alsopractically null for that case of three soft constraints

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

31 of 40

The sensitivity indicators (MDB and MIB) for each scenario of three soft constraints are displayedin Table 15

Table 15 MDB and MIB for the case of three soft constraints based on αprime = 0001 and PCD = PCI = 08

σc = 10mm σc = 1mm σc = 01mmCluster MDB (σ) MIB (σ) MDB (σ) MIB (σ) MDB (σ) MIB (σ)

1 75 22 68 69 58 592 68 69 66 67 64 643 64 64 63 63 58 584 63 63 63 63 63 635 59 60 74 75 435 456 59 59 69 69 346 355

5 Discussion

We start from the scenario of one hard constraint in Figure 3(a) Table 4 reveal that the maximumcorrelation between w-test statistics for the measurements constituting Cluster 1 is exactly equal to100 (ie maxρwi wj

= 100) This means that the measurements belonging to Cluster 1 are those thatare connected with unknown heights whose connections are limited to only two The both unknownheights A and D are tied only to two measurements (ie y1 and y6 linked to A and y3 and y4 linked toD) therefore if an outlier occurred in one of these measurements we would only be able to analyse theconsistency between them but we would not be able to distinguish which of them was contaminatedby an outlier This means that we would only be able to detect because the w-test statistics could belarger than a critical value k however in that case the values of w-test statistics would be the same andtherefore we would not have only one unique maximum w-test statistics but actually would have fourmaximum w-test statistics In other words the equation systems associated with the measurements ofCluster 1 are linearly dependent [87] Therefore there is no reliability in terms of outlier identificationfor the Cluster 1 as can be seen in Figure 3(a)

From Figure 3(b) we can note that there is reliability in terms of outlier detection for Cluster 1and it is caused by overlapping w-test statistics The probability of statistics overlap Pol for Cluster 1in this scenario of minimally constrained network is displayed in Figure 14

Figure 14 Probability of correct detection PCD and statistical overlap (Pol) for the Cluster 1 subject toone hard constraint and for αprime = 0001

The problem of not having more connections (ie more measurements) for the unknown heightsA and D in the case of one hard constraint with G fixed is overcome when these heights (A and D) aretaken as hard constraints in 3(b) or when the heights A D and G are hard constraints in figure 3(c)Figure 3(a b) show that the measurements of Cluster 1 are now able to identify an outlier when twohard constraints (A and D fixed) are in play It is also verified to the case of three hard constraints (A Dand G fixes) in Figure 3(e f) ie there is reliability in terms of both outlier detection and identificationfor these measurements in those conditions

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

32 of 40

One can argue that the more constraints the larger the probability of correct identification (PCI)and correct detection (PCD) This claim is not entirely correct This may be true in case where all pointson the network have the same number of connections However in the example given if both heightsB and G were selected as hard constraints the measurements of the Cluster 1 (ie y1 y6 y3 and y4)would still have their least-squares residuals fully correlated (ie ρwy1 wy6

= 100 and ρwy3 wy4= 100)

and therefore there would be no reliability in terms of outlier identification for these measurementsFrom Table 5 we observe different behaviour for the clusters as follows

bull Cluster 1 there was no MIB for the case of having only one single hard constraint whereasMDB = MIB for the other cases However both MDB and MIB decrease significantly with theincrease in the number of hard constraints

bull Cluster 2 MDB slightly smaller than MIB Both MDB and MIB were practically the same for thecase of having two or three hard constraints

bull Cluster 3 MDB = MIB for all cases of hard constraints however both MDB and MIB decreasesignificantly with the increase in the number of hard constraints

bull Cluster 4 MDB e MIB were equal for all cases

In terms of outlier detection and identification therefore Cluster 1 was more sensitive toconstraints Cluster 3 relatively sensitive to constraints whereas the Cluster 4 completely insensitiveto constraints and Cluster 2 relatively insensitive to constraints This is also can be seen in Figure 4The reason for this is that the local redundancy (ri) of the Cluster 1 increased with the increase of thenumber of hard constraints whereas the Cluster 4 have remained the same as can be seen in Table 4

Leaving aside the cases of statistical overlap (Pol) the network presents low least-squares residualscorrelation (ρwi wj lt 05) and high local redundancy (ri gt 05) Because of this the probabilities ofwrong exclusion (PWE) were less than 1 as can be seen in Figure 5 The over-identification cases(Pover+ and Poverminus) were practically null Consequently PCI asymp PCD Due of this fact the family-wiseerror rate (αprime) should be increased in order to have more success rate in the outlier detection andidentification [45]

From Figure 15 we observe that increasing the αprime increases the both probabilities of correctidentification (PCI) and correct detection (PCD) for outlier magnitude from 5σ to 6σ in the case ofthree hard constraints and from 5σ to 68σ in the case of two hard constraints Although the rates ofPover+ and PWE also increase they are not significant when compared to the improvement of correctidentification (PCI) and detection (PCD) This same analysis can be done for the other clusters

In terms of soft constraints for the cases of two constraints in Figure 3(d) we observe from Table 9that the larger the relaxation of the constraint (ie the larger the standard-deviation of the constraintσc) the larger the residuals correlation (ρwi wj ) and the standard-deviation of the outlier σnablai and thesmaller the local redundancy (ri) Consequently the probabilities of correct identification (PCI) anddetection (PCD) get smaller and smaller with the relaxation of the constraints whereas the probabilityof wrong exclusion gets larger (PWE) This can be more clearly verified in Figure 6(a b) and Figure8(a) for Cluster 1 whose measurements are connected with the constraints A and D (ie y13 and y14 inTable 9 respectively)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

33 of 40

(a) (b)

(c) (d)

Figure 15 Probabilities of correct identification PCI (a) correct detection PCD (b) over-identificationpositive Pover+ (c) and wrong exclusion PWE for the Cluster 1 subject to two and three hard constraintsand for αprime = 0001 and αprime = 01

Note from Figure 8(a) that the probability of wrong exclusion (PWE) increases as the magnitude ofthe outlier (nablai) increases However this is true up to a certain limit of outlier magnitude The effect ofresiduals correlation ρwi wj on the rates of wrong exclusion (PWE) and correct identification (PCI) tendsto decrease with the increase in the magnitude of the outlier nablai This effect is more clearly verified forCluster 1 in case where the precision of the constraints are ten times worse than the measurementsσc = 10σ = 10mm

Note from Figure 6 that identifying an outlier in the Cluster 1 (ie y1 y3 y4 and y6) whenσc = 10mm is more difficult than the other clusters This is due to the fact the Cluster 1 has a higherresiduals correlation ρwi wj = 0994 than other clusters In general therefore we observe that the largerthe relaxation of the constraints the larger is the effect of the correlation ρwi wj on the success rate ofoutlier identification (PCI) Consequently the higher the sensitivity indicator for outlier identification(MIB) Table 10 reveals that the ratio between MIB and MDB for the Cluster 1 and for the scenariowhere the standard-deviations of that two soft constraints are σc = 10mm is MIBMDB=2575=33On the other hand the relationship between MIB and MDB is practically one (ie MIBMDB=10) forthe others scenarios

If the FWE rate (αprime) were increased for the case where the two soft constraints of σc = 10mmare in play we would not have great advantages for Cluster 1 due to its high residuals correlation(ρwi wj = 994) From Figure 16 we can observe that the probability of correct identification (PCI)for outlier magnitudes from 5σ to 8σ is effectively larger to a user-defined αprime = 01 than αprime = 0001However the success rate is still less than 80 ie PCI lt 08 Note for example the correct

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

34 of 40

identification rate is PCI = 56 for an outlier magnitude of nablai = 8σ and αprime = 01 For αprime = 01 theMIB = 335σ = 335mm whereas for αprime = 0001 is MIB = 25σ = 25mm Therefore in that case theMIB for PCI = 08(80) and αprime = 01 would be 34 larger than user-defined αprime = 0001

Figure 16 Probabilities of correct identification (PCI) for the Cluster 1 subject to two soft constraints (2sc) A and D for αprime = 0001 and αprime = 01

The soft constraints A and D were grouped in the Cluster 5 (ie A and D were treated aspseudo-observations in the model) There is no reliability in terms of outlier identification for theconstraints because the residual correlation between them is ρwi wj = 100 as can be seen in Table 9for y13 and y14 However these soft constraints are able to detect an outlier In that case the probabilityof correct detection PCD in Figure 7 is mainly caused by the statistical overlap Pol as can be seen in theexample of the case of σc = 10mm in Figure 17 From Table 10 we observe that the larger the relaxationof the constraints the larger the MDB Note that the values of MDB are given in σ and therefore theMDB for σc = 10mm is larger than σc = 1mm and σc = 01mm ie we had the following inequalityMDB = 68σc = 68times 10mm = 68mm gt MDB = 88σc = 88times 1mm = 88mm gt MDB = 57σc =

57times 01mm = 57mm In that case if the FWE (αprime) were increased the rate of outlier detection by theCluster 4 (ie by the soft constraints) would increase

Figure 17 Probabilities of correct detection PCD and statistical overlap Pol for the two soft constraintsA and D (Cluster 5) with σc = 10mm and for αprime = 0001

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

35 of 40

Similar effects of the relaxation of the constraints on the performance of the IDS in case of two softconstraints are verified in case of three soft constraints as can be seen in the Figures 10 11 12 and 13

In case of having three soft constraints in Figure 3(e) there is reliability in terms of outlieridentification for the three pseudo-observations y13 y14 and y15 (ie for AD and G) as can be seen inFigure 10 and Table 15 In that case we also observe that the probabilities of correct detection PCD ofthe soft constraints A and D (ie Cluster 5) were approximately 13 for σc = 10mm 16 for σc = 1mmand 24 for σc = 01mm larger than the scenario of the network subject to two soft constraints Table15 reveals that the advantage of having three soft constraints instead of two constraints is that theconstraints become identifiable in the presence of an outlier The behaviour of the probabilities ofcorrect detection (PCD) correct identification (PCI) and wrong exclusion (PWE) was similar to the caseof the two soft constraints

Furthermore the larger the relaxation of the constraints the smaller the residuals correlationbetween the measurements and the soft constraints and the larger the residuals correlation amongthe measurements This effect can be verified in Tables 6 7 and 8 for two soft constraints as well as inTables 11 12 and 13 for three soft constraints

In general we observe that the case of two soft constraints for σc = 01mm was comparablewith two hard constraints (see eg Table 5 and Table 10) in terms of the probability levels associatedwith IDS for the measurements (ie clusters 1 2 3 and 4) In the same way for the case of twosoft constraints with σc = 1mm or σc = 10mm the probabilities levels were similar to the one hardconstraint for that measurements with the benefit of two soft constraints having reliability in terms ofoutlier identification for the Cluster 1 Finally the three soft constraints with σc = 1mm and σc = 10mmwere comparable to the two soft constraints for that scenario of constraints relaxation wheres thethree soft constraints for σc = 01mm showed similar outcomes with three hard constraints for themeasurements (see eg Table 5 and Table 15) In that case however an advantage of the three softconstraints on the three hard constraints is the possibility of analysing the sensitivity of the constraintsIt is important to emphasised that the stochastic models of the measurements and constraints wereassumed well-known and defined for the analyzes performed here

6 Conclusions

In general the probability of correct identification (PCI) for the case of hard constraints is largerthan soft constraints It can be verified in Figure 4 Figure 6 and Figure 10 Therefore hard constraintsshould be used in the stage of pre-processing data for the purpose of identifying and removing possibleoutlying measurements In that process one should opt to set out the redundant hard constraints atpoints in the network where the smallest connections exist After identifying and removing possibleoutliers the soft constraints should be employed to propagate the uncertainties of the constraints(pseudobservations) to the model parameters during the process of least-squares estimation Thisrecommendation is valid for outlier detection and identification purpose

We highlight the main findings of this research as follows

bull Under a system of a high local redundancy ri gt 05 and low residuals correlation (ρwi wj lt 05) ifone increase the family-wise error rate (FWE) of the test statistic the performance of the procedurewill be improved for both scenarios of hard constraints and soft constraintsbull The larger the relaxation of the constraints the larger is the effect of the residuals correlation

(ρwi wj ) on the success rate of outlier identification (PCI) Consequently the higher the sensitivityindicator for outlier identification (MIB) and therefore more difficult it becomes to identify anoutlier

bull Under a scenario of soft constraints one should set out at least three soft constraints in order toidentify an outlier in the constraints

In other types of analysis for example deformation analysis of geodetic networks one shouldformulated the constraints as pseudo-observations with σc gt 0 in an adjustment model of system

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

36 of 40

equations In other words the relaxation of constraints makes it possible to estimate deformationeffects that are unmodelled in the deterministic model matrix for example deformations which have aspatial and temporal variations In that case the deformation can be modelled by covariance functionsthat determine the cofactors between constraints [28] This should be applied to control points locatedin the structure but not to control points located outside the structure The control points locatedoutside the structure should be hard constraints and have their stability verified by an independentprocess in order to analyse the deformation of the structure So one will be also able to analyse thesensitivity in terms of minimal detectable deformations andor minimal identifiable deformations [23]Future researches will be addressed to sensitivity analysis for deformation monitoring networks

Author Contributions Conceptualization VFR MTM and IK Data curation VFR MTM and IK Formalanalysis VFR and IK Funding acquisition MTM MRV and LGdSJ Investigation VFR MethodologyVFR and MTM Project administration MTM IK MRV and LGdSJ Software VFR and LGdSJSupervision MTM IK and MRV Validation VFR and MRV Writing ndash original draft VFR Writing ndashreview amp editing VFR MTM IK and MRV All authors have read and agreed to the published version of themanuscript

Funding This research was funded by the CNPqmdashConselho Nacional de Desenvolvimento Cientiacutefico eTecnoloacutegicomdashBrasil (proc no 1035872019-5) This research and the APC were also funded by PETROBRAS(Grant Number 201800545-0)

Acknowledgments In this section you can acknowledge any support given which is not covered by the authorcontribution or funding sections This may include administrative and technical support or donations in kind(eg materials used for experiments)

Conflicts of Interest The authors declare no conflict of interest

References

1 Lehmann R Neitzel F Testing the compatibility of constraints for parameters of a geodetic adjustmentmodel Journal of Geodesy 2013 87 555ndash566 doi101007s00190-013-0627-2

2 Fang X Weighted total least-squares with constraints a universal formula for geodetic symmetricaltransformations Journal of Geodesy 2015 89 459ndash469 doi101007s00190-015-0790-8

3 Amiri-Simkooei AR Weighted Total Least Squares with Singular Covariance Matrices Subject to Weightedand Hard Constraints J Surv Eng 2017 143 04017018

4 Sakic P Ballu V Royer JY A Multi-Observation Least-Squares Inversion for GNSS-Acoustic SeafloorPositioning Remote Sensing 2020 12 doi103390rs12030448

5 Odijk D Stochastic modelling of the ionosphere for fast GPS ambiguity resolution Geodesy Beyond 2000Schwarz KP Ed Springer Berlin Heidelberg Berlin Heidelberg 2000 pp 387ndash392

6 de Oliveira PS Morel L Fund F Legros R Monico JFG Durand S Durand F Modelingtropospheric wet delays with dense and sparse network configurations for PPP-RTK GPS Solutions 201721 237ndash250 doi101007s10291-016-0518-0

7 Chen B Wu L Dai W Luo X Xu Y A new parameterized approach for ionospheric tomography GPSSolutions 2019 23 96 doi101007s10291-019-0893-4

8 Ahmed E Ahmed S Mohamed D Mostafa R jag 2019 Vol 13 chapter Vertical ionospheric delayestimation for single-receiver operation p 81 2 doi101515jag-2018-0041

9 Wang A Chen J Zhang Y Meng L Wang J Performance of Selected Ionospheric Models inMulti-Global Navigation Satellite System Single-Frequency Positioning over China Remote Sensing 201911 doi103390rs11172070

10 Deo M El-Mowafy A Comparison of advanced troposphere models for aiding reductionof PPP convergence time in Australia Journal of Spatial Science 2019 64 381ndash403doi1010801449859620181472046

11 Dai X Lou Y Dai Z Hu C Peng Y Qiao J Shi C Precise Orbit Determination for GNSS ManeuveringSatellite with the Constraint of a Predicted Clock Remote Sensing 2019 11 doi103390rs11161949

12 Courant R Hilbert D Methods of mathematical physics Vol 2 Vol 2 Wiley [New York] 196213 Grafarend EW Optimization of Geodetic Networks The Canadian Surveyor 1974 28 716ndash723

doi101139tcs-1974-0120

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

37 of 40

14 Teunissen P Zero Order Design Generalized Inverses Adjustment the Datum Problem andS-Transformations Optimization and Design of Geodetic Networks Grafarend EW Sansograve F EdsSpringer Berlin Heidelberg Berlin Heidelberg 1985 pp 11ndash55

15 Dermanis A Free network solutions with the DLT method ISPRS Journal of Photogrammetry and RemoteSensing 1994 49 2 ndash 12 doihttpsdoiorg1010160924-2716(94)90061-2

16 Kotsakis C Reference frame stability and nonlinear distortion in minimum-constrained networkadjustment Journal of Geodesy 2012 86 755ndash774 doi101007s00190-012-0555-6

17 Kotsakis C Datum Definition and Minimal Constraints In Encyclopedia of Geodesy Grafarend E EdSpringer International Publishing Cham 2018 pp 1ndash6 doi101007978-3-319-02370-0_157-1

18 Matsuoka MT Rofatto VF Klein I Roberto Veronez M da Silveira LG Neto JBS Alves ACRControl Points Selection Based on Maximum External Reliability for Designing Geodetic Networks AppliedSciences 2020 10 doi103390app10020687

19 Baarda W S-transformations and criterion matrices Publ on geodesy New Series 1973 520 Schaffrin B Aspects of Network Design Optimization and Design of Geodetic Networks Grafarend

EW Sansograve F Eds Springer Berlin Heidelberg Berlin Heidelberg 1985 pp 548ndash59721 Xu PB A general solution in geodetic nonlinear rank-defect models 199722 Altamimi Z Dermanis A The Choice of Reference System in ITRF Formulation VII Hotine-Marussi

Symposium on Mathematical Geodesy Sneeuw N Novaacutek P Crespi M Sansograve F Eds Springer BerlinHeidelberg Berlin Heidelberg 2012 pp 329ndash334

23 Velsink H On the deformation analysis of point fields J Geod 2015 89 1071ndash1087doi101007s00190-015-0835-z

24 Velsink H Extendable linearised adjustment model for deformation analysis Survey Review 201547 397ndash410 doi1011791752270614Y0000000140

25 Hiddo V jag 2016 Vol 10 chapter Time Series Analysis of 3D Coordinates Using NonstochasticObservations p 5 1 doi101515jag-2015-0027

26 Rao CR Markoffrsquos Theorem with Linear Restrictions on Parameters SankhyAuml The Indian Journal ofStatistics (1933-1960) 1945 7 16ndash19

27 Velsink H Testing Methods for Adjustment Models with Constraints Journal of Surveying Engineering2018 144 04018009 doi101061(ASCE)SU1943-54280000260

28 Hiddo V Testing deformation hypotheses by constraints on a time series of geodetic observations J ApplGeod 2017 12 77ndash93 doi101515jag-2017-0028

29 Baarda W A testing procedure for use in geodetic networks Publ on geodesy New Series 1968 230 Teunissen P First and second moments of non-linear least-squares estimators Bull Geodesique (Journal of

Geodesy) 1989 63 253ndash262 doihttpsdoiorg101007BF0252047531 Teunissen P Testing Theory an introduction 2 ed Delft University Press 200632 Amiri-Simkooei A Least-squares variance component estimation theory and GPS applications Phd

thesis Delft University of Technology 200733 Koch KR Parameter estimation and hypothesis testing in linear models 2 ed Springer 199934 Tao H Feng H Xu L Miao M Long H Yue J Li Z Yang G Yang X Fan L Estimation

of Crop Growth Parameters Using UAV-Based Hyperspectral Remote Sensing Data Sensors 2020 20doi103390s20051296

35 Han M Wang Q Wen Y He M He X The Application of Robust Least Squares Method inFrequency Lock Loop Fusion for Global Navigation Satellite System Receivers Sensors 2020 20doi103390s20041224

36 Wei C Estimation for the Discretely Observed CoxacircldquoIngersollacircldquoRoss Model Driven by Small SymmetricalStable Noises Symmetry 2020 12 doi103390sym12030327

37 Sakic P Ballu V Royer JY A Multi-Observation Least-Squares Inversion for GNSS-Acoustic SeafloorPositioning Remote Sensing 2020 12 doi103390rs12030448

38 Livadiotis G General Fitting Methods Based on Lq Norms and their Optimization Stats 2020 3 16ndash31doi103390stats3010002

39 Farooq SZ Yang D Ada ENJ A Cycle Slip Detection Framework for Reliable Single Frequency RTKPositioning Sensors 2020 20 doi103390s20010304

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

38 of 40

40 Araveeporn A Comparing Parameter Estimation of Random Coefficient Autoregressive Model byFrequentist Method Mathematics 2020 8 doi103390math8010062

41 Zhang C Peng T Zhou J Ji J Wang X An Improved Autoencoder and Partial Least SquaresRegression-Based Extreme Learning Machine Model for Pump Turbine Characteristics Applied Sciences2019 9 doi103390app9193987

42 Ji J Yang M Jiang L He J Teng Z Liu Y Song H Output-Only Parameters Identification ofEarthquake-Excited Building Structures with Least Squares and Input Modification Process AppliedSciences 2019 9 doi103390app9040696

43 Kargoll B On the theory and application of model misspecification tests in geodesy Doctoral thesisUniversity of Bonn Landwirtschaftliche Fakultaumlt German Bonn 2007

44 Lehmann R On the formulation of the alternative hypothesis for geodetic outlier detection J Geod 201387 373ndash386 doi101007s00190-012-0607-y

45 Rofatto VF Matsuoka MT Klein I Roberto Veronez M da Silveira LG A Monte Carlo-Based OutlierDiagnosis Method for Sensitivity Analysis Remote Sensing 2020 12 doi103390rs12050860

46 Lehmann R Improved critical values for extreme normalized and studentized residuals in GaussndashMarkovmodels J Geod 2012 86 1137ndash1146 doi101007s00190-012-0569-0

47 Rofatto VF Matsuoka MT Klein I Veronez MR Bonimani ML Lehmann R A half-century ofBaardarsquos concept of reliability a review new perspectives and applications Surv Rev 2018 0 1ndash17[httpsdoiorg1010800039626520181548118] doi1010800039626520181548118

48 Zaminpardaz S Teunissen P DIA-datasnooping and identifiability J Geod 2019 93 85ndash101doi101007s00190-018-1141-3

49 Kok JJ States U On data snooping and multiple outlier testing [microform] Johan J Kok US Dept ofCommerce National Oceanic and Atmospheric Administration National Ocean Service Charting andGeodetic Services For sale by the National Geodetic Information Center NOAA Rockville Md 1984 p61 p

50 Knight NL Wang J Rizos C Generalised measures of reliability for multiple outliers Journal of Geodesy2010 84 625ndash635 doi101007s00190-010-0392-4

51 Gui Q Li X Gong Y Li B Li G A Bayesian unmasking method for locating multiplegross errors based on posterior probabilities of classification variables J Geod 2011 85 191ndash203doi101007s00190-010-0429-8

52 Klein I Matsuoka MT Guzatto MP Nievinski FG An approach to identify multipleoutliers based on sequential likelihood ratio tests Surv Rev 2017 49 449ndash457[httpsdoiorg1010800039626520161212970] doi1010800039626520161212970

53 Hawkins DM Identification of Outliers 1 ed Springer Netherlands 1980 httpsdoiorg101007978-94-015-3994-4

54 Baarda W Statistical concepts in geodesy Publ on geodesy New Series 1967 255 Imparato D Teunissen P Tiberius C Minimal Detectable and Identifiable Biases for quality

control Surv Rev 2019 51 289ndash299 [httpsdoiorg1010800039626520181437947]doi1010800039626520181437947

56 Teunissen PJG Distributional theory for the DIA method Journal of Geodesy 2018 92 59ndash80doi101007s00190-017-1045-7

57 Foumlrstner W Reliability and discernability of extended Gauss-Markov models Seminar on mathematicalmodels of GeodeticPhotogrammetric Point Determination with Regard to Outliers and Systematic Errors 1983 Vol Series A pp 79ndash103

58 Wang J Knight NL New Outlier Separability Test and Its Application in GNSS Positioning 201259 Lehmann R Loumlsler M Multiple Outlier Detection Hypothesis Tests versus Model Selection by

Information Criteria J Surv Eng 2016 142 0401601760 Teunissen PJG Imparato D Tiberius CCJM Does RAIM with Correct Exclusion Produce Unbiased

Positions Sensors 2017 17 doi103390s1707150861 Lehmann R Loumlsler M Congruence analysis of geodetic networks ndash hypothesis tests versus model

selection by information criteria J Appl Geod 2017 11 271ndash283 doi101515jag-2016-0049

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

39 of 40

62 Rofatto V Matsuoka M Klein I DESIGN OF GEODETIC NETWORKS BASED ON OUTLIERIDENTIFICATION CRITERIA AN EXAMPLE APPLIED TO THE LEVELING NETWORK Bull Geod Sci2018 24 152ndash170 httpdxdoiorg101590s1982-21702018000200011

63 Nguyen VK Renault Milocco R Environment Monitoring for Anomaly Detection System UsingSmartphones Sensors 2019 19 doi103390s19183834

64 Nie Y Yang L Shen Y Specific Direction-Based Outlier Detection Approach for GNSS Vector NetworksSensors 2019 19 doi103390s19081836

65 Klein I Matsuoka MT Guzatto MP Nievinski FG Veronez MR Rofatto VF A newrelationship between the quality criteria for geodetic networks Journal of Geodesy 2019 93 529ndash544doi101007s00190-018-1181-8

66 Mierlo JV Statistical Analysis of Geodetic Measurements for the Investigation of Crustal Movements InRecent Crustal Movements 1977 Whitten C Green R Meade B Eds Elsevier 1979 Vol 13 Developmentsin Geotectonics pp 457 ndash 467 doihttpsdoiorg101016B978-0-444-41783-150072-6

67 van der Marel H Roumlsters AJM Statistical Testing and Quality Analysis in 3-D Networks (part II)Application to GPS Global Positioning System An Overview Bock Y Leppard N Eds Springer NewYork New York NY 1990 pp 290ndash297

68 Yang L Wang J Knight NL Shen Y Outlier separability analysis with a multiple alternative hypothesestest J Geod 2013 87 591ndash604 doi101007s00190-013-0629-0

69 Proacuteszynski W Revisiting Baardarsquos concept of minimal detectable bias with regard to outlier identifiabilityJ Geod 2015 89 993ndash1003 doi101007s00190-015-0828-y

70 Romano JP Wolf M Multiple Testing of One-Sided Hypotheses Combining Bonferroni and the BootstrapPredictive Econometrics and Big Data Kreinovich V Sriboonchitta S Chakpitak N Eds SpringerInternational Publishing Cham 2018 pp 78ndash94

71 Teunissen P An Integrity and Quality Control Procedure for use in Multi Sensor Integration 199072 Aydin C Demirel H Computation of Baardarsquos lower bound of the non-centrality parameter J Geod

2004 78 437ndash441 doi101007s00190-004-0406-173 Aydin C Power of Global Test in Deformation Analysis J Surv Eng 2012 138 51ndash56 https

doiorg101061(ASCE)SU1943-5428000006474 Bonferroni C Teoria statistica delle classi e calcolo delle probabilitagrave Pubblicazioni del R Istituto superiore di

scienze economiche e commerciali di Firenze Libreria internazionale Seeber 193675 Box GEP Muller ME A Note on the Generation of Random Normal Deviates The Annals of Mathematical

Statistics 1958 29 610ndash611 httpsdoi101214aoms117770664576 Lemeshko BY Lemeshko SB Extending the Application of Grubbs-Type Tests in Rejecting Anomalous

Measurements Measurement Techniques 2005 48 536ndash547 doi101007s11018-005-0179-977 Lehmann R Scheffler T Monte Carlo based data snooping with application to a geodetic network J

Appl Geod 2011 5 123ndash134 httpsdoiorg101515JAG201101478 Matsumoto M Nishimura T Mersenne twister A 623-dimensionally equidistributed uniform

pseudo-random number generator ACM Transactions on Modeling and Computer Simulation 1998 8 3ndash30httpsdlacmorgcitationcfmid=272991

79 Ihde J Saacutenchez L Barzaghi R Drewes H Foerste C Gruber T Liebsch G Marti U Pail R SiderisM Definition and Proposed Realization of the International Height Reference System (IHRS) Surveys inGeophysics 2017 38 549ndash570 doi101007s10712-017-9409-3

80 Lehmann R Observation error model selection by information criteria vs normality testing StudiaGeophysica et Geodaetica 2015 59 489ndash504 doi101007s11200-015-0725-0

81 Algarni DA Ali AE Heighting and Distance Accuracy with Electronic Digital Levels Journal of King SaudUniversity - Engineering Sciences 1998 10 229 ndash 239 doihttpsdoiorg101016S1018-3639(18)30698-6

82 Gemin ARS Matos AS Faggion PLAs APPLICATION OF CALIBRATION CERTIFICATE OFDIGITAL LEVELING SYSTEMS IN THE MONITORING OF STRUCTURES A CASE STUDY AT THEGOVERNADOR JOSAtilde RICHA HYDROELECTRIC POWER PLANT - PR Boletim de CiAtildeGeodAtilde 201824 235 ndash 249

83 Takalo M Rouhiainen P Development of a System Calibration Comparator for Digital Levels in FinlandNordic Journal of Surveying and Real Estate Research 1 1

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

40 of 40

84 Wiedemann W Wagner A Wunderlich T Using IATS to Read and Analyze Digital Levelling Staffs SIG2016 Paar R Marendic A Zrinjski M Eds Croatian Geodetic Society 2016 pp 515ndash526

85 PJG T The Geometry of Geodetic Inverse Linear Mapping and Non-linear Adjustment Publications onGeodesy New Series 1985 8

86 Rao CR Mitra SK Generalized inverse of a matrix and its applications 197287 Hekimoglu S Erenoglu RC Sanli DU Erdogan B Detecting Configuration Weaknesses in Geodetic

Networks Survey Review 2011 43 713ndash730 doi101179003962611X13117748892632

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

Page 4: Effects of Hard and Soft Equality Constraints on

4 of 40

First the null hypothesis denoted byH0 is formulated under the condition that random errorsare normally distributed with expectation zero ie in the absence of outliers Thus the null hypothesisH0 of the standard GaussndashMarkov model in the linear or linearised form is given by [33]

H0 Ey = Ax +Ee = Ax Dy = Qe (1)

where E is the expectation operator D is the dispersion operator y isin Rntimes1 is the vector ofmeasurements A isin Rntimesu is the coefficient matrix x isin Rutimes1 is the unknown parameter vectore isin Rntimes1 is the unknown vector of measurement errors and Qe isin Rntimesn is the positive-definitecovariance matrix of the measurements y

Under normal working conditions (ieH0) the measurement error model is then given by

e sim N(0 Qe) (2)

First we assume that the coefficient matrix A suffers from a rank deficiency ie uminus rank(A) gt 0In that case minimal constraints are added to solve the problem of the rank-deficient system Anexample of how to handle the problem of the rank-deficient will be given in the section 41 where aminimal constraint in Equation 51 has been added to the matrix A in Equation 50 This means that thecolumns of the design matrix A in Equation 1 become linearly independent ie the matrix A becomefull rank such that uminus rank(A) = 0

The best linear unbiased estimator (BLUE) of e underH0 is the well-known estimated least-squaresresidual vector e isin Rntimes1 which is given by

e = yminus Ax

= yminus A(ATW A)minus1(ATWy)

= Ax + eminus A(ATW A)minus1(ATW(Ax + e))

= eminus A(ATW A)minus1(ATWe)

= (Iminus A(ATW A)minus1 ATW)e

= Re

(3)

with x isin Rutimes1 being the BLUE of x under H0 W isin Rntimesn is the known matrix of weights taken asW = σ0

2Qminus1e where σ2

0 is the variance factor I isin Rntimesn is the identity matrix and R isin Rntimesn is knownas the redundancy matrix The R matrix is an orthogonal projector that projects onto the orthogonalcomplement of the range space of A The main diagonal elements of matrix R are known as localredundancy numbers (denoted by ri of the the system The larger the number of local redundancy fora given measurement the larger the degree of importance of that measurement for the model ie thelarger the absorption of a possible error of that measurement into their corresponding least-squaresresidual

The degrees of freedom r (ie the overall redundancy) of the model underH0 (Equation (1)) is

r = rank(Qe) = nminus rank(A) = nminus u where (4)

Qe = Qe minus σ02 A(ATW A)minus1 AT (5)

On the other hand an alternative model is proposed when there are doubts about the reliabilitylevel of the model underH0 Here we assume that the validity of the null hypothesisH0 in Equation (1)can be violated if the dataset is contaminated by outliers The model in an alternative hypothesis

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

5 of 40

denoted by HA is to oppose Equation (1) by an extended model that includes the unknown vectornabla isin Rqtimes1 of deterministic bias parameters as follows ([2931])

HA y = Ax + Cnabla+ e =(

A C)( x

nabla

)+ e (6)

where C isin Rntimesq is the matrix that relates bias parameters ie the values of the outliers to observationsWe restrict ourselves to the matrix (A C) having full column rank such that

r = rank(

A C)= u + q le n (7)

One of the most used procedures based on hypothesis testing for outliers in linear (or linearised)models is the well-known data snooping method [2954] This procedure consists of screening eachindividual measurement for the presence of an outlier [49] In that case data snooping is based on alocal model test such that q = 1 and therefore the n alternative hypothesis is expressed as

H(i)A y = Ax + cinablai + e =

(A ci

)( xnablai

)+ e foralli = 1 middot middot middot n (8)

Now matrix C in Equation (6) is reduced to a canonical unit vector ci which consists exclusivelyof elements with values of 0 and 1 where 1 means that the ith bias parameter of magnitude nablai affectsthe ith measurement and 0 means otherwise In that case the rank of (A ci) isin Rntimes(u+1) and the vector

nabla in Equation (6) reduces to a scalar nablai in Equation (8) ie ci=(

0 0 0 middot middot middot 1ith 0 middot middot middot 0)T

When q = nminus u an overall model test is performed For more details about the overall model test seefor example [5556]

Note that the alternative hypothesisH(i)A in Equation (8) is formulated under the condition that

the outlier acts as a systematic effect by shifting the random error distribution underH0 by its ownvalue [44] In other words the presence of an outlier in a dataset can cause a shift of the expectationunderH0 to a nonzero value Therefore hypothesis testing is often employed to check whether thepossible shifting of the random error distribution under H0 by an outlier is in fact a systematiceffect (bias) or merely a random effect This hypothesis test-based approach is called the mean-shiftmodel (see eg [1823294647525457ndash65]

In the context of the mean-shift model the test statistic involved in data snooping is given by thenormalised least-squares residual denoted by wi This test statistic also known as Baardarsquos w-testis given as follows

wi =ci

TQminus1e eradic

ciTQminus1e QeQminus1

e ci

foralli = 1 middot middot middot n (9)

The alternative hypothesis in Equation (8) is formulated in the sense that There is at least oneoutlier in the vector of measurements yirdquo [46] In that case we are interested in knowing which of thealternative hypotheses may lead to the rejection of the null hypothesis with a certain probability Thismeans testingH0 againstH(1)

A H(2)A H(3)

A H(n)A This is known as multiple hypothesis testing (see

eg [47485355575866ndash70]) In that case the test statistic coming into effect is the maximum absoluteBaardarsquos w-test value (denoted by max-w) which is computed as [47]

max-w = maxiisin1middotmiddotmiddot n

|wi| (10)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

6 of 40

The decision rule for this case is given by

Accept H0 i f max-w le k

Otherwise

Accept H(i)A i f max-w gt k

(11)

The decision rule in Equation 11 says that if none of the n w-tests get rejected then we accept thenull hypothesisH0 If the null hypothesisH0 is rejected in any of the n tests then one can only assumethat detection occurred In other words if the max-w is larger than some percentile of its probabilitydistribution (ie some critical value k) then there is evidence that there is an outlier in the datasetTherefore outlier detection only informs us whether the null hypothesisH0 is accepted or not [45]

However the detection does not tell us which alternative hypothesisH(i)A would have led to the

rejection of the null hypothesisH0 The localisation of the alternative hypothesis which would haverejected the null hypothesis is a problem of outlier identification Outlier identification implies theexecution of a search among the measurements for the most likely outlier In other words one seeks tofind which of Baardarsquos w-test is the maximum absolute value max-w and if that max-w is greater thansome critical value k

Therefore the data snooping procedure of screening measurements for possible outliers is actuallyan important case of multiple hypothesis testing and not single hypothesis testing Moreover note thatoutlier identification only happens when outlier detection necessarily exists ie ldquooutlier identificationrdquoonly occurs when the null hypothesisH0 is rejected However correct detection does not necessarilyimply correct identification [475568]

In a special case of having only one single alternative hypothesis one should decide betweenthe null hypothesis H0 and only one single alternative hypothesis H(i)

A of Equation (8) In that casethe false decisions are restricted to Type I error and Type II error The probability of a Type I Errorα0 is the probability of rejecting the null hypothesis H0 when it is true whereas the probability of aType II error β0 is the probability of failing to reject the null hypothesisH0 when it is false (note theindex lsquo0rsquo represents the case in which a single hypothesis is tested) Instead of α0 and β0 there is theconfidence level CL = 1minus α0 and the power of the test γ0 = 1minus β0 respectively The first deals withthe probability of accepting a true null hypothesisH0 the second addresses the probability of correctlyaccepting the alternative hypothesisH(i)

A In that case given a probability of a Type I decision error α0we find the critical value k0 as follows

k0 = Φminus1(

1minus α0

2

)(12)

where Φminus1 denotes the inverse of the cumulative distribution function (cdf) of the two-tailed standardnormal distribution N(0 1)

The normalised least-squares residual wi follows a standard normal distribution with theexpectation that Ewi = 0 if H0 holds true (there is no outlier) On the other hand if the systemis contaminated with a single outlier at the ith location of the dataset (ie under H(i)

A ) then theexpectation of wi is

Ewi =radic

λ0 =radic

ciTQminus1e QeQminus1

e cinabla2i (13)

where λ0 is the non-centrality parameter for q = 1 Note therefore that there is an outlier thatcauses the expectation of wi to become

radicλ0 The square-root of the non-centrality parameter

radicλ0

in Equation (13) represents the expected mean shift of a specific w-test In such a case the termci

TQminus1e QeQminus1

e ci in Equation (13) is a scalar and therefore it can be rewritten as follows [71]

|nablai| = MDB0(i) =

radicλ0

ciTQminus1e QeQminus1

e ci foralli = 1 middot middot middot n (14)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

7 of 40

where |nablai| is the Minimal Detectable Bias (MDB0(i) ) for the case in which there is only one singlealternative hypothesis which can be computed for each individual alternative hypothesis according toEquation (8)

For a single outlier the variance of an estimated outlier denoted by σ2nablai

is

σ2nablai

=(

ciTQminus1

e QeQminus1e ci

)minus1 foralli = 1 middot middot middot n (15)

Thus the MDB can also be written as

MDB0(i) = σnablai

radicλ0 foralli = 1 middot middot middot n (16)

where σnablai =radic

σ2nablai

is the standard deviation of estimated outlier nablaiThe MDB in Equations (14) or (16) of an alternative hypothesis is the smallest-magnitude outlier

that can lead to the rejection of the null hypothesisH0 for a given α0 and β0 Thus for each model ofthe alternative hypothesisH(i)

A the corresponding MDB can be computed [477273] The limitation ofthis MDB is that it was initially developed for the binary hypothesis testing case In that case the MDBis a sensitivity indicator of Baardarsquos w-test when only one single alternative hypothesis is taken intoaccount In this article we are confined to multiple alternative hypotheses Therefore both the MDBand MIB are computed by considering the case of multiple hypothesis testing

For a scenario coinciding with the null hypothesisH0 under multiple testing hypothesis thereis the probability of incorrectly identifying at least one alternative hypothesis This type of wrongdecision is known as the family-wise error rate (FWE) The FWE is defined as

FWE = αprime =le 1minus (1minus α0)n (17)

which is approximatelyFWE = αprime le ntimes α0 (18)

where α0 is the significance level for an individual test The quantity in Equation (18) is just equal tothe upper bound of the Bonferroni inequality ie αprime le nα [74] For example if the FWE level (αprime) is005 and one is running 5 tests then each test will have an α0 of 0055 = 001 In other words one usesa global Type I Error rate αprime that combines all tests under consideration instead of an individual errorrate α0 that only considers one test at a time time [70] In that case the critical value kbon f is computedas

kbon f = Φminus1(

1minus αprime

2n

)(19)

The Bonferroni in Equation (18) is a good approximation for the case in which alternativehypotheses are independent In practice however the test results always depend on each otherto some degree because we always have a correlation between w-tests The correlation coefficientbetween any Baardarsquos w-test statistic (denoted by ρwi wj ) such as wi and wj is given by [57]

ρwi wj =ci

TQminus1e QeQminus1

e cjradicciTQminus1

e QeQminus1e ci

radiccjTQminus1

e QeQminus1e cj

forall(i 6= j) (20)

The correlation coefficient ρwi wj can assume values within the range [minus1 1]Here the extreme normalised residuals max-w (ie maximum absolute) in Equation (10) are

treated directly as a test statistic Note that when using Equation (10) as a test statistic the decisionrule is based on a one-sided test of the form max-w le k However the distributions of max-w cannotbe derived from well-known test distributions (eg normal distribution) The procedure to computethe critical value of max-w is given step-by-step by Rofatto et al [45]

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

8 of 40

The other side of the multiple testing problem is the situation in which there is an outlier in thedataset In that case apart from Type I and Type II errors there is a third type of wrong decisionassociated with Baardarsquos w-test Baardarsquos w-test can also flag a non-outlying observation whilethe lsquotruersquo outlier remains in the dataset We are referring to the Type III error [53] also referred toas the probability of wrong identification (PWI) The description of the Type III error involves aseparability analysis between alternative hypotheses [57666869] Therefore we are now interested inthe identification of the correct alternative hypothesis In that case the non-centrality parameter inEquation (13) is not only related to the sizes of Type I and Type II decision errors but also dependenton the correlation coefficient ρwi wj given by Equation (20)

On the basis of the assumption that one outlier is in the ith position of the dataset (ie H(i)A is

rsquotruersquo) the probability of a Type II error (also referenced as the probability of ldquomissed detectionrdquo denotedby PMD) for multiple testing is

PMD = P(⋂n

i=1|wi| le k

∣∣∣ H(i)A true

) (21)

and the size of a Type III wrong decision (also called ldquomisidentificationrdquo denoted by PWI) is given by

PWI =n

sumi=1P(|wj| gt |wi| foralli |wj| gt k(i 6= j)

∣∣∣ H(i)A true

)(22)

On the other hand the probability of correct identification (denoted by PCI) is

PCI = P(|wi| gt |wj| forallj |wi| gt k(i 6= j)

∣∣∣ H(i)A true

)(23)

with1minusPCI = 1minusPCD + PWI = PMD + PWI (24)

Note that the three probabilities of missed detection PMD wrong identification PWI and correctidentification PCI sum up to unity ie PMD + PWI + PCI = 1

The probability of correct detection PCD is the sum of the probability of correct identification PCI(selecting a correct alternative hypothesis) and the probability of misidentification PWI (selecting oneof the n-1 other hypotheses) ie

PCD = PCI + PWI (25)

The probability of wrong identification PWI is identically zero PWI = 0 when the correlationcoefficient is exactly zero ρwi wj = 0 In that case we have

PCD = PCI = 1minusPMD (26)

The relationship given in Equation (26) would only happen if one neglected the nature of thedependence between alternative hypotheses In other words this relationship is valid for the specialcase of testing the null hypothesisH0 against only one single alternative hypothesisH(i)

A Since the critical region in multiple hypothesis testing is larger than that in single hypothesis

testing the Type II decision error (ie PMD) for the multiple test becomes smaller [47] This meansthat the correct detection in binary hypothesis testing (γ0) is smaller than the correct detection PCDunder multiple hypothesis testing ie

PCD gt γ0 (27)

Detection is easier in the case of multiple hypothesis testing than single hypothesis testingHowever the probability of correct detection PCD under multiple testing is spread out over allalternative hypotheses and therefore identifying is harder than detecting From Equation (25) it is

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

9 of 40

also noted that detection does not depend on identification However outlier identification dependson correct outlier detection Therefore we have the following inequality

PCI le PCD (28)

Note that the probability of correct identification PCI depends on the probability of misseddetection PMD and wrong identification PWI for the case in which data snooping is run only once iea single round of estimation and testing However in this paper we deal with data snooping in itsiterative form (ie IDS) and therefore the probability of correct identification PCI depends on otherdecision rules

In contrast to the data snooping single run the success rate of correct detection PCD forIDS depends on the sum of the probabilities of correct identification PCI wrong exclusion (PWE)over-identification cases (Pover+ and Poverminus) and statistical overlap (Pol) ie

PCD = 1minusPMD = PCI + PWE + Pover+ + Poverminus + Pol (29)

It is important to mention that the probability of correct detection is the complement of theprobability of missed detection Note from Equation (29) that the probability of correct detection PCDis available even for cases in which the identification rate is null PCI = 0 However the probabilityof correct identification (PCI) necessarily requires that the probability of correct detection PCD begreater than zero For the same reasons given for the data snooping single run in the previous sectiondetecting is easier than identifying In that case we have the following relationship for the success rateof correct outlier identification PCI

PCI = PCD minus (PWE + Pover+ + Poverminus + Pol) (30)

such asexist(PCI) isin [0 1] lArrrArr (PCD) gt 0 (31)

It is important to mention that the wrong exclusion PWE describes the probability of identifyingand removing a non-outlying measurement while the lsquotruersquo outlier remains in the dataset In otherwords PWE is the Type III decision error for IDS) The overall wrong exclusion PWE is the result of thesum of each individual contribution to PWE ie

PWE =nminus1

sumi=1PWE(i) (32)

On the basis of the probability levels of correct detection PCD and correct identification PCI the sensitivity indicators of minimal biasesmdashMinimal Detectable Bias (MDB) and Minimal IdentifiableBias (MIB)mdashfor a given αprime can be computed as follows

MDB = arg minnablaiPCD(nablai) gt PCD foralli = 1 middot middot middot n (33)

MIB = arg minnablaiPCI(nablai) gt PCI foralli = 1 middot middot middot n (34)

Equation (33) gives the smallest outlier nablai that leads to its detection for a user-defined correctdetection rate PCD whereas (34) provides the smallest outlier nablai that leads to its identification for auser-defined correct identification rate PCI

As a consequence of the inequality in (28) the MIB will be larger than MDB ie MIB ge MDBFor the special case of having only one single alternative hypothesis there is no difference between theMDB and MIB [55] The computation of MDB0 is easily performed by Equations (14) or (16) whereasthe computation of the MDB in Equation (33) and the MIB in Equation (34) must be computed using

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

10 of 40

Monte Carlo because the acceptance region (as well as the critical region) for the case of multiplealternative hypotheses is analytically intractable

3 Material and Methods

Here we use the procedure provided by Rofatto et al [45] to compute the probability levelsassociated with IDS as well as to estimate the minimal biases mdash Minimal Detectable Bias(MDB) andMinimal Identifiable Bias (MIB) The procedure is summarised in Figure 1

The procedure to compute the critical value of max-w (k) is given step-by-step as follows

1 Specify the probability density function (pdf) of the w-test statistics The pdf assigned to thew-test statistics under anH0-distribution is

(w1 w2 w3 middot middot middot wn)T sim N (0Rw) (35)

where Rw isin Rntimesn is the correlation matrix with the main diagonal elements equal to 1 and theoff-diagonal elements are the correlation between the w-test statistics computed by Equation (20)

2 In order to have w-test statistics under H0 uniformly distributed random number sequencesare produced by the Mersenne Twister algorithm and then they are transformed into a normaldistribution by using the BoxndashMuller transformation [75] BoxndashMuller has already been usedin geodesy for Monte Carlo experiments [467677] Therefore a sequence of m random vectorsfrom the pdf assigned to the w-test statistics is generated according to Equation (35) In that casewe have a sequence of m vectors of the w-test statistics as follows[

(w1 w2 w3 middot middot middot wn)T(1)

(w1 w2 w3 middot middot middot wn)T(2)

middot middot middot (w1 w2 w3 middot middot middot wn)T(m)

](36)

3 Compute the test statistic by Equation (10) for each sequence of w-test statistics Thus we have(max

iisin1middotmiddotmiddot n|wi|(1) max

iisin1middotmiddotmiddot n|wi|(2) middot middot middot max

iisin1middotmiddotmiddot n|wi|(m)

)(37)

4 Sort in ascending order the maximum test statistic in Equation (37) getting a sorted vector wsuch that

w(1) lt w(2) w(3) middot middot middot lt w(m) (38)

The sorted values w in Equation (38) provide a discrete representation of the cumulative densityfunction (cdf) of the maximum test statistic max-w

5 Determine the critical value k as follows

k = w[(1minusαprime)timesm] (39)

where [] denotes rounding down to the next integer that indicates the position of the selectedelements in the ascending order of w This position corresponds to a critical value for a stipulatedoverall false alarm probability αprime This can be done for a sequence of values αprime in parallel

It is important to mention that the probability of a Type I decision error for multiple testing αprime islarger than that of Type I for single testing α0 This is because the critical region in multiple testing islarger than that in single hypothesis testing

After finding the critical value k the procedure based on Monte Carlo is also applied to computethe probability levels of IDS when there is an outlier in the dataset The overview of of the mainelements involved with the IDS was detailed in Section 2 The flowchart is displayed in Figure 1

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

11 of 40

Inp

ut

Functional amp Stochastic Models Significance level (αrsquo) Outlier magnitude range (nablai) Probability of correct detection ( ) Probability of correct identification ( )

CD CI

Gen

erat

e

ran

do

m d

raw

s

Generate a sample of random errors

119942~119925(120782 Q119942)

Compute the total error ε = e + cinablai

Compute the least-squares residual vector e = Rε

Iterativ

e ou

tlier elimin

ation

pro

cess

Compute the maximum w-test statistic

max wi

max wi gt k

End

Identify and remove the measurement from the dataset

Size of max wi gt 1

det ATWA gt 0

Compute w-test statistics

Compute and store nCI nMD nWE nover+ nover- and nol

Co

mp

ute th

e pro

bab

ilities

Number of Monte Carlo experiments (m) = 200000

Compute the probabilities CI MD WE over+ over- and ol

Sensitivity indicators Outlier detection (MDB)

amp Outlier identification (MIB)

YES NO

YES

NO

YES NO

YES

NO

Figure 1 Flowchart of the algorithm to compute the probability levels of Iterative Data Snooping (IDS)for each measurement in the presence of an outlier [45]

The steps displayed in Figure 1 are detailed as follows

1 First random error vectors are synthetically generated on the basis of a multivariate normaldistribution because the assumed stochastic model for random errors is based on the matrixcovariance of the observations Here we use the Mersenne Twister algorithm [78] to generate asequence of random numbers and BoxndashMuller [75] to transform it into a normal distribution

2 The total error (ε) is a combination of random errors and its corresponding outlier is given asfollows

ε = e + cinablai (40)

The magnitude intervals of simulated outliers are user-defined The magnitude intervals arebased on the standard deviation of the observation (σ) eg |3σ| to |6σ| Since the outlier can bepositive or negative the proposed algorithm randomly selects the signal of the outlier (for q = 1)Here we use the discrete uniform distribution to select the signal of the outlier

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

12 of 40

In Equation (40) e is the random error generated from the normal distribution according toEquation (2) and the second part cinablai is the additional parameter that describes the alternativemodel according to Equation (40)

3 Next we compute the least-squares residuals vector according to Equation (3) but now we usethe total error (ε) in Equation 40 as follows

e = Rε (41)

4 Run the IDS In this step the test statistic is computed according to (9) Then the maximumtest statistic value is obtained according to Equation (10) Now the decision rule is based on thecritical value k computed by Monte Carlo in previous stage After identifying the measurementsuspected to be the most likely outlier it is excluded from the model and least-squares estimationand data snooping are applied iteratively until there are no further outliers identified in thedataset If two or more observations are simultaneously detected (ie if maxminusw gt k and the sizeof maxminusw gt 1 then the IDS is ended) Furthermore every time that a measurement suspectedto be the most likely outlier is removed from the model we check whether the normal matrixATW A is invertible or not If the determinant of ATW A is 0 det|ATW A| = 0 then there isa necessary and sufficient condition for a square matrix ATW A to be non-invertible In otherwords the IDS is ended when det|ATW A| = 0 If no outlier is detected (ie maxminusw lt k) thenthe IDS is also ended

The IDS procedure is performed for m experiments of random error vectors for each experimentcontaminated by an outlier in the ith measurement Therefore for each measurementcontaminated by an outlier there are υ = 1 m experiments

5 After running m=200000 experiments [47] the probabilities associated with IDS are computed(note m refers to the total number of Monte Carlo experiments) The probability of correctidentification (PCI) is the ratio between the number of times that the outlier is correctly identified(denoted as nCI) and m experiments ie

PCI =nCIm

(42)

Similar to Equation (42) the wrong decisions are computed as

PMD =nMD

m(43)

where nMD is the number of experiments in which IDS does not detect the outlier (PMDcorresponds to the rate of missed detection)

PWE =nWE

m(44)

where nWE is the number of experiments in which the IDS procedure flags and removes onlyone single non-outlying measurement while the lsquotruersquo outlier remains in the dataset (PWE is thewrong exclusion rate)

Pover+ =nover+

m(45)

where nover+ is the number of experiments in which IDS correctly identifies and removes theoutlying measurement and others and Pover+ corresponds to its probability

Poverminus =noverminus

m(46)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

13 of 40

where noverminus represents the number of experiments in which IDS identifies and removes morethan one non-outlying measurement whereas the lsquotrue outlierrsquo remains in the dataset (Poverminus isthe probability corresponding to this error probability class)

Pol =nolm

(47)

where nol is the number of experiments in which the detector in Equation (10) flags two (or more)measurements simultaneously during a given iteration of IDS Here this is referred to as thenumber of statistical overlap nol and Pol corresponds to its probability

6 Finally the sensitivity indicators (MDB and MIB) are computed based on Equation 33 andEquation 34 respectively

In this paper the probability levels associated with IDS were computed for each observationindividually and for each outlier magnitude However they were grouped into clusters based onnumber of local redundancy (ri) and maximum absolute correlation between the w-test statistics(ρwi wj ) Furthermore we take care to control the family-wise error rate

4 Case study of Levelling Geodetic Network

The analyse of the constraints effects on the probability levels of IDS are performed by an exampleof a levelling geodetic network

A levelling geodetic network is a set of points located on the Earthrsquos surface or near it andinterconnected by height difference measurements Some of these points are associated with a verticalheight reference system (ie some points have known height) whereas others are parameters (unknownheights) The term known here means that those points with known heights are constraints necessaryto ensure that parameters (unknown heights) are sufficiently estimable In the sense of modern geodeticreference systems realisation and unification of a vertical height system consists of a combination ofGNSS (Global Navigation Satellite System) and spirit levelling with geoid models We will not gointo the details about height geodetic reference system and frame (Readers who wish to have furtherdetails on that issue please see eg [79])

The mathematical model associated with a geodetic levelling network is linear (ie the levellingmeasurements bear a known linear relationship with the unknown heights) Geodetic networksusually has more measurements than parameters (ie n gt u) ie we have a redundant measurementsystem However due to intrinsic random errors in a system redundant measurements often leadto an inconsistent system of equations To make the system consistent we have to introduce theinformation about the random measurement errors The stochastical properties of the measurementerrors are directly associated with the assumption of the probability distribution of these errors Ingeodesy and many other scientific branches the well-known normal distribution is one of the mostused as measurement error model [80] Because of this the model ceases to be purely mathematicaland becomes a statistical model with functional and stochastic part

In the absence of outliers ie under null hypothesisH0 in Equation (1) the model for levellinggeodetic network can be written as follows

∆himinusj + e∆himinusj= hj minus hi (48)

where ∆himinusj is the height difference measured from point i to j and e∆himinusjis the random error associated

with the levelling measurement For a only one single levelling line one of these points is the constraint(known height) from which the height of another point is determined The point with known height isalso referred to as control point or benchmark In a geodetic network on the other hand we have a setof levelling lines that connect both from points of unknown height and from points of known height(constraints) Under normal working conditions (ie H0) the measurement errors model is then givenby Equation (2)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

14 of 40

In this case study we demonstrate the application of the proposed algorithm by [45] subject to adifferent constraints scenarios for that levelling geodetic network

Recent work has been focused on non-stochastic constraints (hard constraints) Here we alsoconsider cases where constraints are subject to a random errors (ie soft constraints) That softconstraints are non-deterministic and therefore they are measured model elements that can bedesignated as a priori information The approach of this example can be applied in the designstage of geodetic network analysis [6265] Furthermore the experiments performed here can also beextended to geodetic deformation analysis when the deformation effects are unmodelled in the senseof deterministic form [28]

41 Problem description

To analyse the constraints effects on the IDS an example is taken from a geodetic levellingnetwork with 12 height differences between the points The equipment used to measure the leveldifference can be an electronic digital level In that case the levelling measurement system comprisesof a special bar-coded staff (also called barcode rod) and a digital level (instrument) A digital level isbasically a telescope that enables a horizontal line of sight Digital levels consist of additional electronicimage processing components to automatically read and analyse digital (bar coded) levelling staffswhere the graduation is replaced by a manufacturer dependent code pattern Generally the result isautomatically stored in the data collector of the digital level An example of a digital level ndash bar-codestaff system is displayed in the Figure 2 For more details about digital level see eg [81ndash84]

Figure 2 Example of a digital level ndash bar-code staff system [45]

The standard-deviation of the uncorrelated measurements were the same and taken equal toσ = 1mm The points are indicated as A to G The eight network configuration are displayed in Figure3(a b c d and e) and their details are given as follows

1 Figure 3(a) Network with 1 hard constraint (ie network minimally constrained) Since thedimension of the network is 1D the minimum information necessary to estimate the unknownheights is one The height of G was fixed as control point (hard constraint) and 6 unknown heights(ABCDEF) were minimally constrained Therefore the redundancy of the system (or overalldegrees of freedom) was r = nminus rank(A) = nminus u = 12minus 6 = 6

2 Figure 3(b) Network with one extra hard constraint (ie two hard constraints) The heights A andD were taken as hard constraints (ie heights A and D were fixed) The redundancy of the system

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

15 of 40

in that case was r = 12minus 5 = 7 with 5 unknown heights (BCEFG) over-constrained

3 Figure 3(c) Network with two extra hard constraints (ie three hard constraints)The heights A Dand G were taken as hard constraints In that case the redundancy of the system was r = 12minus 4 = 8

4 Figure 3(d) Network with two soft constraints (A and D) In that case a standard-deviation largerthan zero was assigned to both constraints (ie σc gt 0 In other words A and D were processedas being both observations and unknown parameters ie A and D were pseudo-observationsHere the both constraints were simultaneously relaxed by considering their uncertainties 10 timesworse than measurements (ie σc = 10times σ = 10mm) 10 times better than measurements (ieσc = 01mm) and their uncertainties equal to the measurements (σc = 1mm) In that case theredundancy of the system was r = 14minus 7 = 7

5 Figure 3(e) Network processed with A D and G as pseudo-observations Those three constraintswere simultaneously relaxed by considering their standard-deviations equal to σc = 10mm (10 timesworse than measurements) σc = 01mm (10 times better than measurements) and σc = 1mm (thesame as the measurements) In that case the redundancy of the system was r = 15minus 7 = 8

y1

y2

y3

y4

y5

y6

y7 y8

y9 y10

B C

D

E F

G y12 y11

A

y1

y2

y3

y4

y5

y6

y7 y8

y9 y10

A

B C

D

E F

G y12 y11

y1

y2

y3

y4

y5

y6

y7 y8

y9 y10

A

B C

D

E F

G y12 y11

(a) (b)

(c) (d)

(e)

Unknown Heights Soft Constraints Hard Constraints

y1

y2

y3

y4

y5

y6

y7 y8

y9 y10

A

B C

D

E F

G y12 y11

y1

y2

y3

y4

y5

y6

y7 y8

y9 y10

A

B C

D

E F

G y12 y11

Figure 3 Levelling Geodetic Network subject to different constraint scenarios

The following system of equations for that problem is given by

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

16 of 40

y1 + e1 = hB minus hA

y2 + e2 = hC minus hB

y3 + e3 = hD minus hC

y7 + e7 = hB minus hG

y8 + e8 = hC minus hG

y11 + e11 = hB minus hF

y12 + e12 = hC minus hE

(49)

The functional model (A) for the system of equations in 49 is given by

A =

minus1 1 0 0 0 0 00 minus1 1 0 0 0 00 0 minus1 1 0 0 00 0 0 minus1 1 0 00 0 0 0 minus1 1 01 0 0 0 0 minus1 00 1 0 0 0 0 minus10 0 1 0 0 0 minus10 0 0 0 1 0 minus10 0 0 0 0 1 minus10 1 0 0 0 minus1 00 0 1 0 minus1 0 0

(50)

Note that the rank defect of the matrix A is uminus rank(A) = 7minus 6 = 1 In that case there is neededto add at least one constraint in order to avoid rank deficiency of the matrix A This is guaranteedwhen we take one height as known For example from the network in Figure 3(a) we have added theheight G as known (ie as a hard constraint) In that case the constraint equation should be addedinto the system in 49 ie

y13 = hG with σy13 = 0 (51)

noticing that because the standard deviation is zero the observation is non-stochastic (hard constraint)and the residual ey13 = 0 This can generate problems in the inversion of the covariance matrix of theobservations Qe for the calculation of the weight matrix W because the weight for that constraintwould be 1

0 = infin In order to avoid that problem we have eliminated the rank deficiency of matrixA by removing the seventh column of matrix A in 50 associated with the height G Now we haveuminus rank(A) = 6minus 6 = 0

The constraint defines the geodetic datum ie the S-system ([85]p41) Another approach tosolving the system of equations in 2 could be based on generalised (pseudo) inverses (see eg [86])

The location of the constraints can be chosen in some circumstances for example during thedesign stage of a geodetic network For the special case of having a minimally constrained systemthe location of the constraint will not influence the w-test statistics and the sensitivity indicators (MIBand MDB) [18] However more constraints than the minimum necessary to have a solution (ie extraconstraints or redundant constraints) can change the least-squares residuals and hence w-test statisticsand the minimal biases

From the network with one extra constraint (2 constraints) in Figure 3(b) for example the bothfirst (height A) and fourth column (height D) of matrix A in 50 were eliminated in the case of having

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

17 of 40

the two heights as hard constraints For the case where these two heights (A and D) were taken as softconstraints however two observation equations were added to Equation 49 ie

y13 + e13 = hA σy13 gt 0

y14 + e14 = hD σy14 gt 0(52)

In the case of soft constraints in Equation 52 two lines were added in matrix A In other wordsA and D were taken as pseudo-observations In that case the rank deficiency was also null (ieuminus rank(A) = 7minus 7 = 0) the redundancy of the system was r = nminus rank(A) = nminus u = 7 and thematrix A was given as follows

A =

minus1 1 0 0 0 0 00 minus1 1 0 0 0 00 0 minus1 1 0 0 00 0 0 minus1 1 0 00 0 0 0 minus1 1 01 0 0 0 0 minus1 00 1 0 0 0 0 minus10 0 1 0 0 0 minus10 0 0 0 1 0 minus10 0 0 0 0 1 minus10 1 0 0 0 minus1 00 0 1 0 minus1 0 01 0 0 0 0 0 00 0 0 1 0 0 0

(53)

For this example of two soft constraints and by considering the both soft constraints withstandard-deviation σc = 10mm the symmetric and positive semi-definite covariance matrix of theobservations (Qe) was given as follows

Qe =

1 0 0 middot middot middot 0 00 1 0 middot middot middot 0 00 0 1 middot middot middot 0 0

0 0 0 middot middot middot 100 00 0 0 middot middot middot 0 100

(54)

The last two rows and columns of the matrix Qe in Equation 54 refer to the variances (σc2 =

(10mm)2 = 100mm2) of the heights constraints A and D respectivelySimilarly matrices A and Qe were constructed for the other cases studied hereAlthough other measurements are able to identify an outlier for the case of having only one single

soft constraint the pseudo-observation (constraint) is not In that case the defect configurationis associated with the additional parameter in the constraint (ie the presence of an outlier inthe constraint) In other words an additional parameter on the soft constraint will not estimableFor example if the height point G was taken as a soft constraint the presence of an outlier inpseudo-observation G would lead to rank deficiency of matrix A ie u minus rank(A) = 8 minus 7 = 1Therefore the case of having only one single soft constraint was not considered here

42 Result of the Hard Constraint Effects on the Iterative Outlier Elimination Procedure

In this section we present the results of the effects of the hard constraint on the IDS The scenariosin Figure 3(a) (network minimally constrained) Figure 3(b) (two hard constraints) and Figure 3(c)(three hard constraints) were considered for the analysis

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

18 of 40

The correlation coefficient between w-test statistics (ρwi wj ) were computed for the three scenariosaccording to Equation(20) Table 1 provides that correlation for that network minimally constrainedTable 2 for that network over-constrained with two hard constraints and Table 3 for that networkover-constrained with three hard constraints

Table 1 Correlation matrix of w-test statistics for the levelling network minimally constrained in Figure3(a)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12

wy1 10 02 01 01 02 10 -02 00 00 02 -04 -01wy2 10 02 02 03 02 05 -05 -02 02 03 -03wy3 10 10 02 01 00 02 -02 00 01 04wy4 10 02 01 00 02 -02 00 01 04wy5 10 02 -02 02 05 -05 03 -03wy6 10 -02 00 00 02 -04 -01wy7 10 -03 -03 -04 -04 -01wy8 10 -04 -03 -01 -04wy9 10 -03 01 04wy10 10 04 01wy11 10 -01wy12 100

Table 2 Correlation matrix of w-test statistics for the levelling network with two hard constraints inFigure 3(b)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12

wy1 10 04 04 -03 -01 04 -03 01 01 01 -04 -01wy2 10 04 -01 01 -01 04 -04 -01 01 03 -03wy3 10 04 -01 -03 -01 03 -01 -01 01 04wy4 10 04 04 01 01 -03 01 01 04wy5 10 04 -01 01 04 -04 03 -03wy6 10 -01 -01 -01 03 -04 -01wy7 10 -04 -03 -04 -04 -01wy8 10 -04 -03 -01 -04wy9 10 -04 01 04wy10 10 04 01wy11 10 -01wy12 10

Table 3 Correlation matrix of w-test statistics for the levelling network with three hard constraints inFigure 3(c)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12

wy1 10 03 01 -01 -01 01 -04 -01 -01 -01 -03 -01wy2 10 03 -01 01 -01 03 -03 -01 01 03 -03wy3 10 01 -01 -01 01 04 01 01 01 03wy4 10 03 01 -01 -01 -04 -01 01 03wy5 10 03 -01 01 03 -03 03 -03wy6 10 01 01 01 04 -03 -01wy7 10 -01 -01 -01 -03 -01wy8 10 -01 -01 -01 -03wy9 10 -01 01 03wy10 10 03 01wy11 10 -01wy12 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

19 of 40

Table 4 gives the local redundancy (ri) the standard-deviation of the LS-estimated outlier σnablai andthe maximum absolute correlation (maxρwi wj

) for each scenario of hard constraint set out of in thisstudy ie Figure 3(ab and c)

Table 4 Local redundancy (ri) standard-deviation of the LS-estimated outlier σnablai and the maximumabsolute correlation (maxρwi wj

) for each scenario of hard constraint

1 hard constraint 2 hard constraints 3 hard constraintsMeasurement ri σnablai maxρwi wj

ri σnablai maxρwi wjri σnablai maxρwi wj

y1 0396 1589 100 0583 1309 036 0708 1188 041y2 0500 1414 047 0583 1309 036 0583 1309 032y3 0396 1589 100 0583 1309 036 0708 1188 041y4 0396 1589 100 0583 1309 036 0708 1188 041y5 0500 1414 047 0583 1309 036 0583 1309 032y6 0396 1589 100 0583 1309 036 0708 1188 041y7 0563 1333 047 0583 1309 036 0708 1188 041y8 0563 1333 047 0583 1309 036 0708 1188 041y9 0563 1333 047 0583 1309 036 0708 1188 041y10 0563 1333 047 0583 1309 036 0708 1188 041y11 0583 1309 043 0583 1309 036 0583 1309 032y12 0583 1309 043 0583 1309 036 0583 1309 032

Here the twelve levelling measurements were clustered into four clusters The clustering wasdefined according to the local redundancy (ri) and the maximum absolute correlation (maxρwi wj

) inTable 4 This is the first time that a clustering technique based on the similarity of local redundancyand the maximum absolute correlation between w-test statistics is applied to a problem of geodeticnetworks Similarly this has been done in [45] The four cluster were defined as follows

bull Cluster 1 y1 y3 y4 and y6bull Cluster 2 y2 and y5bull Cluster 3 y7 y8 y9 and y10bull Cluster 4 y11 and y12

The probability levels associated with IDS were averaged for each of these clusters Here we computethe probability levels of the IDS based on the procedure in Section (3) [45] The critical values werek = 389 k = 393 and k = 393 for 1 hard constraint 2 hard constraints and 3 hard constraintsrespectively These critical values were found for αprime = 0001 according to the procedure described inAppendix The probability of correct identification (PCI) and correct detection (PCD = 1minusPMD)are displayed in Figure 4 for each number of hard constraint (denoted by hc)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

20 of 40

(c) (d)

(e) (f)

(g) (h)

(a) (b)

Figure 4 Probability of correct identification (PCI) and Probability of correct detection (PCD) for thecase of hard constraints and for αprime = 0001 Cluster 1(ab) Cluster 2(cd) Cluster 3(ef) and Cluster4(gh)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

21 of 40

The outlier magnitude were defined from |5σ| to |9σ| The outlier of |5σ| was chosen because itis approximately the lowest MDB0(i) of the network when a single hypothesis testing is in play ascan be seen in Equation (14) or Equation (16) That MDB0(i) of |5σ| was computed for significancelevel of αprime = 0001 and a power of the test γ0 = 08 This strategy reduces the search space for a MIB(Minimal Identifiable Bias) because we will always have the following inequality MIB ge MDB0(i)[4755] Remember that the IDS procedure is an example of multiple hypothesis testing (see 2)

The sensitivity indicators MDB and MIB for IDS were also computed according to Equation (33)and (34) respectively The success rate for outlier detection and outlier identification were taken asbeing PCD = PCI = 08 respectively Table 5 provides the values of MDB and MIB for that case ofhard constraints

Table 5 MDB and MIB for the case of hard constraints based on αprime = 0001 and PCD = PCI = 08

1 hard constraint 2 hard constraints 3 hard constraintsCluster MDB (σ) MIB (σ) MDB (σ) MIB (σ) MDB (σ) MIB (σ)

1 75 - 63 63 57 572 67 68 63 64 63 643 64 64 63 63 58 584 64 64 64 64 64 64

Figure 5 shows the probability of wrong exclusion (PWE) The over-identification cases (Pover+

and Poverminus) were smaller than 0001 (ie they were practically null) There were not statistical overlap(Pol) for clusters 2 3 and 4 We will discuss more about statistical overlap (Pol) later

(c) (d)

(a) (b)

Figure 5 Probability of wrong exclusion (PWE) for the case of hard constraints and for αprime = 0001Cluster 1(a) Cluster 2(b) Cluster 3(c) and Cluster 4(d)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

22 of 40

43 Result of the Soft Constraint Effects on the Iterative Outlier Elimination Procedure

The both configuration in Figure 3(d) and Figure 3(e) were analysed in terms of soft constraintsIn that case the critical values were k = 395 k = 395 and k = 392 for two soft constraints withσc = 01mm σc = 1mm and σc = 10mm respectively In the case of three soft constraints the criticalvalues found were k = 399 k = 399 and k = 396 for σc = 01mm σc = 1mm and σc = 10mmrespectively All these critical values were computed for αprime = 0001

The correlation coefficient between w-test statistics (ρwi wj ) are displayed in Table 6 Table 7and Table 8 for two soft constraints with standard-deviation 10 times larger than measurements(ie σc = 10times σ = 10mm) 10 times better than measurements (ie σc = 01mm) and equal to themeasurements (σc = 1mm) respectively

Table 6 Correlation matrix of w-test statistics for the levelling network with two soft constraints withσc = 10mm in Figure 3(d)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14

wy1 10 02 01 01 02 10 -02 00 00 02 -04 -01 006 -01wy2 10 02 02 03 02 05 -05 -02 02 03 -03 003 00wy3 10 10 02 01 00 02 -02 00 01 04 01 -01wy4 10 02 01 00 02 -02 00 01 04 -01 01wy5 10 02 -02 02 05 -05 03 -03 00 00wy6 10 -02 00 00 02 -04 -01 -01 01wy7 10 -03 -03 -04 -04 -01 00 00wy8 10 -04 -03 -01 -04 00 00wy9 10 -03 01 04 00 00wy10 10 04 01 00 00wy11 10 -01 00 00wy12 10 00 00wy13 10 -10wy14 10

Table 7 Correlation matrix of w-test statistics for the levelling network with two soft constraints withσc = 01mm in Figure 3(d)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14

wy1 10 04 04 -03 -01 04 -03 01 01 01 -04 -01 06 -06wy2 10 04 -01 02 -01 04 -04 -01 01 03 -03 04 -04wy3 10 04 -01 -03 -01 03 -01 -01 01 04 06 -06wy4 10 04 04 01 01 -03 01 01 04 -06 06wy5 10 04 -01 01 04 -04 03 -03 -04 04wy6 10 -01 -01 -01 03 -04 -01 -06 06wy7 10 -04 -03 -04 -04 -01 -02 02wy8 10 -04 -03 -01 -04 02 -02wy9 10 -04 01 04 02 -02wy10 10 04 01 -02 02wy11 10 -01 00 00wy12 10 00 00wy13 10 -10wy14 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

23 of 40

Table 8 Correlation matrix of w-test statistics for the levelling network with two soft constraints withσc = 1mm in Figure 3(d)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14

wy1 10 03 02 -01 01 07 -03 00 01 02 -04 -01 04 -04wy2 10 03 01 03 01 04 -04 -01 01 03 -03 03 -03wy3 10 07 01 -01 00 03 -02 -01 01 04 04 -04wy4 10 03 02 01 02 -03 00 01 04 -04 04wy5 10 03 -01 01 04 -04 03 -03 -03 03wy6 10 -02 -01 00 03 -04 -01 -04 04wy7 10 -03 -03 -04 -04 -01 -01 01wy8 10 -04 -03 -01 -04 01 -01wy9 10 -03 01 04 01 -01wy10 10 04 01 -01 01wy11 10 -01 00 00wy12 10 00 00wy13 10 -10wy14 10

Table 9 gives the local redundancy (ri) the standard-deviation of the LS-estimated outlier σnablai andthe maximum absolute correlation (maxρwi wj

) for the scenarios of two constraints

Table 9 Local redundancy (ri) standard-deviation of the LS-estimated outlier σnablai (mm) and themaximum absolute correlation (maxρwi wj

) for each scenario of two soft constraints

σc = 01mm σc = 1mm σc = 10mmMeasurement ri σnablai maxρwi wj

ri σnablai maxρwi wjri σnablai maxρwi wj

y1 0581 1312 0564 0471 1457 0681 0397 1587 0994y2 0582 1311 0376 0533 1369 0423 0501 1413 0471y3 0581 1312 0564 0471 1457 0681 0397 1587 0994y4 0581 1312 0564 0471 1457 0681 0397 1587 0994y5 0582 1311 0376 0533 1369 0423 0501 1413 0471y6 0581 1312 0564 0471 1457 0681 0397 1587 0994y7 0583 1310 0359 0571 1324 0423 0563 1333 0471y8 0583 1310 0359 0571 1324 0423 0563 1333 0471y9 0583 1310 0359 0571 1324 0423 0563 1333 0471y10 0583 1310 0359 0571 1324 0423 0563 1333 0471y11 0583 1309 0358 0583 1309 0398 0583 1309 0433y12 0583 1309 0358 0583 1309 0398 0583 1309 0433y13 0007 1163 1000 0300 1826 1000 0497 14189 1000y14 0007 1163 1000 0300 1826 1000 0497 14189 1000

From Table 9 five clusters were defined for each case of two soft constraints ie for the casewhere heights A and D were given as soft constraints in Figure 3(d) as follows

bull Cluster 1 y1 y3 y4 and y6bull Cluster 2 y2 and y5bull Cluster 3 y7 y8 y9 and y10bull Cluster 4 y11 and y12bull Cluster 5 y13 and y14

The probabilities of correct identification (PCI) and correct detection (PCD) for the measurements(Cluster 1 to Cluster 4) subject to the scenarios of two soft constraints (heights A and D) are displayedin Figure 6

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

24 of 40

(c) (d)

(e) (f)

(g) (h)

(a) (b)

Figure 6 Probability of correct identification (PCI) and Probability of correct detection (PCD) for themeasurements subject to the scenarios of two soft constraints for αprime = 0001 Cluster 1(ab) Cluster2(cd) Cluster 3(ef) and Cluster 4(gh)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

25 of 40

Note that the Cluster 5 is associated with the two soft constraints (ie y13 and y14) The probabilityof correct identification (PCI) for these both soft constraints were null However the probability ofcorrect detection were not Figure 7 shows the probability of correct detection (PCD) for these two softconstraints (ie heights A and D)

Figure 7 Probability of correct detection PCD for the two soft constraints (Cluster 5 heights A and D)and for αprime = 0001

The probability of wrong exclusion PWE for the measurements (Cluster 1 to Cluster 4) subject tothe scenarios of two soft constraints (heights A and D) are displayed in Figure 8 Figure 9 gives the theprobability of wrong exclusion PWE for that two constraints (ie heights A and D)

(c) (d)

(a) (b)

Figure 8 Probability of wrong exclusion PWE for the measurements subject to the scenarios of two softconstraints for αprime = 0001 Cluster 1(a) Cluster 2(b) Cluster 3(c) and Cluster 4(d)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

26 of 40

Figure 9 Probability of wrong exclusion PWE for the two soft constraints (Cluster 5 heights A and D)and for αprime = 0001

The over-identification cases (Pover+ and Poverminus) and the statistical overlap (Pol) were practicallynull for that case

The sensitivity indicators (MDB and MIB) for each scenario of two soft constraints are displayedin Table 10

Table 10 MDB and MIB for the case of two soft constraints based on αprime = 0001 and PCD = PCI = 08

σc = 10mm σc = 1mm σc = 01mmCluster MDB (σ) MIB (σ) MDB (σ) MIB (σ) MDB (σ) MIB (σ)

1 75 25 7 71 63 632 68 68 66 66 63 633 64 64 64 64 63 634 63 63 63 63 63 635 68 - 88 - 57 -

As with the case of two soft constraints Tables 1112 and 13 show the correlations (ρwi wj ) for thecase where there were three soft constraints ie for the network configuration in Figure 3(e)

Table 11 Correlation matrix of w-test statistics for the levelling network with three soft constraintswith σc = 10mm in Figure 3(e)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14 wy15

wy1 10 02 01 01 02 10 -02 00 00 02 -04 -01 01 00 00wy2 10 02 02 03 02 05 -05 -02 02 03 -03 00 00 00wy3 10 10 02 01 00 02 -02 00 01 04 00 -01 00wy4 10 02 01 00 02 -02 00 01 04 00 01 00wy5 10 02 -02 02 05 -05 03 -03 00 00 00wy6 10 -02 00 00 02 -04 -01 -01 00 00wy7 10 -03 -03 -04 -04 -01 00 00 00wy8 10 -04 -03 -01 -04 00 00 00wy9 10 -03 01 04 00 00 00wy10 10 04 01 00 00 00wy11 10 -01 00 00 00wy12 10 00 00 00wy13 10 -05 -05wy14 10 -05wy15 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

27 of 40

Table 12 Correlation matrix of w-test statistics for the levelling network with three soft constraintswith σc = 01mm in Figure 3(e)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14 wy15

wy1 10 03 01 -01 -01 01 -04 -01 -01 -01 -03 -01 07 -01 -04wy2 10 03 -01 01 -01 03 -03 -01 01 03 -03 03 -03 00wy3 10 01 -01 -01 01 04 01 01 01 03 01 -07 04wy4 10 03 01 -01 -01 -04 -01 01 03 -01 07 -04wy5 10 03 -01 01 03 -03 03 -03 -03 03 00wy6 10 01 01 01 04 -03 -01 -07 01 04wy7 10 -01 -01 -01 -03 -01 -04 -01 04wy8 10 -01 -01 -01 -03 -01 -04 04wy9 10 -01 01 03 -01 -04 04wy10 10 03 01 -04 -01 04wy11 10 -01 00 00 00wy12 10 00 00 00wy13 10 -02 -06wy14 10 -06wy15 10

Table 13 Correlation matrix of w-test statistics for the levelling network with three soft constraintswith σc = 1mm in Figure 3(e)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14 wy15

wy1 10 03 01 00 01 06 -03 00 00 01 -04 -01 05 -02 -02wy2 10 03 01 03 01 04 -04 -01 01 03 -03 02 -02 00wy3 10 06 01 00 00 03 -01 00 01 04 02 -05 02wy4 10 03 01 00 01 -03 00 01 04 -02 05 -02wy5 10 03 -01 01 04 -04 03 -03 -02 02 00wy6 10 -01 00 00 03 -04 -01 -05 02 02wy7 10 -03 -02 -03 -04 -01 -02 00 02wy8 10 -03 -02 -01 -04 00 -02 02wy9 10 -03 01 04 00 -02 02wy10 10 04 01 -02 00 02wy11 10 -01 00 00 00wy12 10 00 00 00wy13 10 -04 -05wy14 10 -05wy15 10

Table 14 Local redundancy (ri) standard-deviation of the LS-estimated outlier σnablai (mm) and themaximum absolute correlation (maxρwi wj

) for each scenario of three soft constraints

σc = 01mm σc = 1mm σc = 10mmMeasurement ri σnablai maxρwi wj

ri σnablai maxρwi wjri σnablai maxρwi wj

y1 0702 1194 0660 0502 1411 0577 0398 1586 0992y2 0582 1311 0326 0533 1369 0412 0501 1413 0470y3 0702 1194 0660 0502 1411 0577 0398 1586 0992y4 0702 1194 0660 0502 1411 0577 0398 1586 0992y5 0582 1311 0326 0533 1369 0412 0501 1413 0470y6 0702 1194 0660 0502 1411 0577 0398 1586 0992y7 0704 1192 0415 0602 1289 0412 0563 1333 0470y8 0704 1192 0415 0602 1289 0412 0563 1333 0470y9 0704 1192 0415 0602 1289 0412 0563 1333 0470y10 0704 1192 0415 0602 1289 0412 0563 1333 0470y11 0583 1309 0326 0583 1309 0385 0583 1309 0433y12 0583 1309 0326 0583 1309 0385 0583 1309 0433y13 0012 0904 0660 0425 1534 0542 0663 12283 0501y14 0012 0904 0660 0425 1534 0542 0663 12283 0501y15 0019 0718 063 05 1414 0542 0665 12268 0501

The probabilities of correct identification (PCI) and correct detection PCD for the measurements(Cluster 1 to Cluster 4) subject to the scenarios of three soft constraints (heights A D and G) aredisplayed in Figure 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

28 of 40

(c) (d)

(e) (f)

(g) (h)

(a) (b)

Figure 10 Probability of correct identification (PCI) and Probability of correct detection (PCD) for themeasurements subject to the scenarios of three soft constraints for αprime = 0001 Cluster 1(ab) Cluster2(cd) Cluster 3(ef) and Cluster 4(gh)

The probabilities of correct identification (PCI) and correct detection (PCD) in Figure 8 werecomputed for the clusters based on Table 14 as follows

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

29 of 40

bull Cluster 1 y1 y3 y4 and y6bull Cluster 2 y2 and y5bull Cluster 3 y7 y8 y9 and y10bull Cluster 4 y11 and y12bull Cluster 5 y13 and y14bull Cluster 6 y15

Figure 11 shows the probabilities of correct identification PCI and correct detection PCD for thethree soft constraints ie for Cluster 5 (heights A and D) and Cluster 6 (height G) in Figure 3(e)

(c) (d)

(a) (b)

Figure 11 Probability of correct identification (PCI) and Probability of correct detection (PCD) for thethree constraints and for αprime = 0001 Cluster 5(ab) and Cluster 6(cd)

The probability of wrong exclusion PWE for the measurements (Cluster 1 to Cluster 4) subject tothe scenarios of three soft constraints (heights A D and G) are displayed in Figure 12 Figure 13 givesthe the probability of wrong exclusion PWE for that three constraints (ie heights A D and G)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

30 of 40

(c) (d)

(a) (b)

Figure 12 Probability of wrong exclusion PWE for the measurements subject to the scenarios of threesoft constraints and for αprime = 0001 Cluster 1(a) Cluster 2(b) Cluster 3(c) and Cluster 4(d)

(a) (b)

Figure 13 Probability of wrong exclusion (PWE) for the three constraints and for αprime = 0001 Cluster5(a) and Cluster 6(b)

The over-identification cases (Pover+ and Poverminus) and the statistical overlap (Pol) were alsopractically null for that case of three soft constraints

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

31 of 40

The sensitivity indicators (MDB and MIB) for each scenario of three soft constraints are displayedin Table 15

Table 15 MDB and MIB for the case of three soft constraints based on αprime = 0001 and PCD = PCI = 08

σc = 10mm σc = 1mm σc = 01mmCluster MDB (σ) MIB (σ) MDB (σ) MIB (σ) MDB (σ) MIB (σ)

1 75 22 68 69 58 592 68 69 66 67 64 643 64 64 63 63 58 584 63 63 63 63 63 635 59 60 74 75 435 456 59 59 69 69 346 355

5 Discussion

We start from the scenario of one hard constraint in Figure 3(a) Table 4 reveal that the maximumcorrelation between w-test statistics for the measurements constituting Cluster 1 is exactly equal to100 (ie maxρwi wj

= 100) This means that the measurements belonging to Cluster 1 are those thatare connected with unknown heights whose connections are limited to only two The both unknownheights A and D are tied only to two measurements (ie y1 and y6 linked to A and y3 and y4 linked toD) therefore if an outlier occurred in one of these measurements we would only be able to analyse theconsistency between them but we would not be able to distinguish which of them was contaminatedby an outlier This means that we would only be able to detect because the w-test statistics could belarger than a critical value k however in that case the values of w-test statistics would be the same andtherefore we would not have only one unique maximum w-test statistics but actually would have fourmaximum w-test statistics In other words the equation systems associated with the measurements ofCluster 1 are linearly dependent [87] Therefore there is no reliability in terms of outlier identificationfor the Cluster 1 as can be seen in Figure 3(a)

From Figure 3(b) we can note that there is reliability in terms of outlier detection for Cluster 1and it is caused by overlapping w-test statistics The probability of statistics overlap Pol for Cluster 1in this scenario of minimally constrained network is displayed in Figure 14

Figure 14 Probability of correct detection PCD and statistical overlap (Pol) for the Cluster 1 subject toone hard constraint and for αprime = 0001

The problem of not having more connections (ie more measurements) for the unknown heightsA and D in the case of one hard constraint with G fixed is overcome when these heights (A and D) aretaken as hard constraints in 3(b) or when the heights A D and G are hard constraints in figure 3(c)Figure 3(a b) show that the measurements of Cluster 1 are now able to identify an outlier when twohard constraints (A and D fixed) are in play It is also verified to the case of three hard constraints (A Dand G fixes) in Figure 3(e f) ie there is reliability in terms of both outlier detection and identificationfor these measurements in those conditions

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

32 of 40

One can argue that the more constraints the larger the probability of correct identification (PCI)and correct detection (PCD) This claim is not entirely correct This may be true in case where all pointson the network have the same number of connections However in the example given if both heightsB and G were selected as hard constraints the measurements of the Cluster 1 (ie y1 y6 y3 and y4)would still have their least-squares residuals fully correlated (ie ρwy1 wy6

= 100 and ρwy3 wy4= 100)

and therefore there would be no reliability in terms of outlier identification for these measurementsFrom Table 5 we observe different behaviour for the clusters as follows

bull Cluster 1 there was no MIB for the case of having only one single hard constraint whereasMDB = MIB for the other cases However both MDB and MIB decrease significantly with theincrease in the number of hard constraints

bull Cluster 2 MDB slightly smaller than MIB Both MDB and MIB were practically the same for thecase of having two or three hard constraints

bull Cluster 3 MDB = MIB for all cases of hard constraints however both MDB and MIB decreasesignificantly with the increase in the number of hard constraints

bull Cluster 4 MDB e MIB were equal for all cases

In terms of outlier detection and identification therefore Cluster 1 was more sensitive toconstraints Cluster 3 relatively sensitive to constraints whereas the Cluster 4 completely insensitiveto constraints and Cluster 2 relatively insensitive to constraints This is also can be seen in Figure 4The reason for this is that the local redundancy (ri) of the Cluster 1 increased with the increase of thenumber of hard constraints whereas the Cluster 4 have remained the same as can be seen in Table 4

Leaving aside the cases of statistical overlap (Pol) the network presents low least-squares residualscorrelation (ρwi wj lt 05) and high local redundancy (ri gt 05) Because of this the probabilities ofwrong exclusion (PWE) were less than 1 as can be seen in Figure 5 The over-identification cases(Pover+ and Poverminus) were practically null Consequently PCI asymp PCD Due of this fact the family-wiseerror rate (αprime) should be increased in order to have more success rate in the outlier detection andidentification [45]

From Figure 15 we observe that increasing the αprime increases the both probabilities of correctidentification (PCI) and correct detection (PCD) for outlier magnitude from 5σ to 6σ in the case ofthree hard constraints and from 5σ to 68σ in the case of two hard constraints Although the rates ofPover+ and PWE also increase they are not significant when compared to the improvement of correctidentification (PCI) and detection (PCD) This same analysis can be done for the other clusters

In terms of soft constraints for the cases of two constraints in Figure 3(d) we observe from Table 9that the larger the relaxation of the constraint (ie the larger the standard-deviation of the constraintσc) the larger the residuals correlation (ρwi wj ) and the standard-deviation of the outlier σnablai and thesmaller the local redundancy (ri) Consequently the probabilities of correct identification (PCI) anddetection (PCD) get smaller and smaller with the relaxation of the constraints whereas the probabilityof wrong exclusion gets larger (PWE) This can be more clearly verified in Figure 6(a b) and Figure8(a) for Cluster 1 whose measurements are connected with the constraints A and D (ie y13 and y14 inTable 9 respectively)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

33 of 40

(a) (b)

(c) (d)

Figure 15 Probabilities of correct identification PCI (a) correct detection PCD (b) over-identificationpositive Pover+ (c) and wrong exclusion PWE for the Cluster 1 subject to two and three hard constraintsand for αprime = 0001 and αprime = 01

Note from Figure 8(a) that the probability of wrong exclusion (PWE) increases as the magnitude ofthe outlier (nablai) increases However this is true up to a certain limit of outlier magnitude The effect ofresiduals correlation ρwi wj on the rates of wrong exclusion (PWE) and correct identification (PCI) tendsto decrease with the increase in the magnitude of the outlier nablai This effect is more clearly verified forCluster 1 in case where the precision of the constraints are ten times worse than the measurementsσc = 10σ = 10mm

Note from Figure 6 that identifying an outlier in the Cluster 1 (ie y1 y3 y4 and y6) whenσc = 10mm is more difficult than the other clusters This is due to the fact the Cluster 1 has a higherresiduals correlation ρwi wj = 0994 than other clusters In general therefore we observe that the largerthe relaxation of the constraints the larger is the effect of the correlation ρwi wj on the success rate ofoutlier identification (PCI) Consequently the higher the sensitivity indicator for outlier identification(MIB) Table 10 reveals that the ratio between MIB and MDB for the Cluster 1 and for the scenariowhere the standard-deviations of that two soft constraints are σc = 10mm is MIBMDB=2575=33On the other hand the relationship between MIB and MDB is practically one (ie MIBMDB=10) forthe others scenarios

If the FWE rate (αprime) were increased for the case where the two soft constraints of σc = 10mmare in play we would not have great advantages for Cluster 1 due to its high residuals correlation(ρwi wj = 994) From Figure 16 we can observe that the probability of correct identification (PCI)for outlier magnitudes from 5σ to 8σ is effectively larger to a user-defined αprime = 01 than αprime = 0001However the success rate is still less than 80 ie PCI lt 08 Note for example the correct

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

34 of 40

identification rate is PCI = 56 for an outlier magnitude of nablai = 8σ and αprime = 01 For αprime = 01 theMIB = 335σ = 335mm whereas for αprime = 0001 is MIB = 25σ = 25mm Therefore in that case theMIB for PCI = 08(80) and αprime = 01 would be 34 larger than user-defined αprime = 0001

Figure 16 Probabilities of correct identification (PCI) for the Cluster 1 subject to two soft constraints (2sc) A and D for αprime = 0001 and αprime = 01

The soft constraints A and D were grouped in the Cluster 5 (ie A and D were treated aspseudo-observations in the model) There is no reliability in terms of outlier identification for theconstraints because the residual correlation between them is ρwi wj = 100 as can be seen in Table 9for y13 and y14 However these soft constraints are able to detect an outlier In that case the probabilityof correct detection PCD in Figure 7 is mainly caused by the statistical overlap Pol as can be seen in theexample of the case of σc = 10mm in Figure 17 From Table 10 we observe that the larger the relaxationof the constraints the larger the MDB Note that the values of MDB are given in σ and therefore theMDB for σc = 10mm is larger than σc = 1mm and σc = 01mm ie we had the following inequalityMDB = 68σc = 68times 10mm = 68mm gt MDB = 88σc = 88times 1mm = 88mm gt MDB = 57σc =

57times 01mm = 57mm In that case if the FWE (αprime) were increased the rate of outlier detection by theCluster 4 (ie by the soft constraints) would increase

Figure 17 Probabilities of correct detection PCD and statistical overlap Pol for the two soft constraintsA and D (Cluster 5) with σc = 10mm and for αprime = 0001

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

35 of 40

Similar effects of the relaxation of the constraints on the performance of the IDS in case of two softconstraints are verified in case of three soft constraints as can be seen in the Figures 10 11 12 and 13

In case of having three soft constraints in Figure 3(e) there is reliability in terms of outlieridentification for the three pseudo-observations y13 y14 and y15 (ie for AD and G) as can be seen inFigure 10 and Table 15 In that case we also observe that the probabilities of correct detection PCD ofthe soft constraints A and D (ie Cluster 5) were approximately 13 for σc = 10mm 16 for σc = 1mmand 24 for σc = 01mm larger than the scenario of the network subject to two soft constraints Table15 reveals that the advantage of having three soft constraints instead of two constraints is that theconstraints become identifiable in the presence of an outlier The behaviour of the probabilities ofcorrect detection (PCD) correct identification (PCI) and wrong exclusion (PWE) was similar to the caseof the two soft constraints

Furthermore the larger the relaxation of the constraints the smaller the residuals correlationbetween the measurements and the soft constraints and the larger the residuals correlation amongthe measurements This effect can be verified in Tables 6 7 and 8 for two soft constraints as well as inTables 11 12 and 13 for three soft constraints

In general we observe that the case of two soft constraints for σc = 01mm was comparablewith two hard constraints (see eg Table 5 and Table 10) in terms of the probability levels associatedwith IDS for the measurements (ie clusters 1 2 3 and 4) In the same way for the case of twosoft constraints with σc = 1mm or σc = 10mm the probabilities levels were similar to the one hardconstraint for that measurements with the benefit of two soft constraints having reliability in terms ofoutlier identification for the Cluster 1 Finally the three soft constraints with σc = 1mm and σc = 10mmwere comparable to the two soft constraints for that scenario of constraints relaxation wheres thethree soft constraints for σc = 01mm showed similar outcomes with three hard constraints for themeasurements (see eg Table 5 and Table 15) In that case however an advantage of the three softconstraints on the three hard constraints is the possibility of analysing the sensitivity of the constraintsIt is important to emphasised that the stochastic models of the measurements and constraints wereassumed well-known and defined for the analyzes performed here

6 Conclusions

In general the probability of correct identification (PCI) for the case of hard constraints is largerthan soft constraints It can be verified in Figure 4 Figure 6 and Figure 10 Therefore hard constraintsshould be used in the stage of pre-processing data for the purpose of identifying and removing possibleoutlying measurements In that process one should opt to set out the redundant hard constraints atpoints in the network where the smallest connections exist After identifying and removing possibleoutliers the soft constraints should be employed to propagate the uncertainties of the constraints(pseudobservations) to the model parameters during the process of least-squares estimation Thisrecommendation is valid for outlier detection and identification purpose

We highlight the main findings of this research as follows

bull Under a system of a high local redundancy ri gt 05 and low residuals correlation (ρwi wj lt 05) ifone increase the family-wise error rate (FWE) of the test statistic the performance of the procedurewill be improved for both scenarios of hard constraints and soft constraintsbull The larger the relaxation of the constraints the larger is the effect of the residuals correlation

(ρwi wj ) on the success rate of outlier identification (PCI) Consequently the higher the sensitivityindicator for outlier identification (MIB) and therefore more difficult it becomes to identify anoutlier

bull Under a scenario of soft constraints one should set out at least three soft constraints in order toidentify an outlier in the constraints

In other types of analysis for example deformation analysis of geodetic networks one shouldformulated the constraints as pseudo-observations with σc gt 0 in an adjustment model of system

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

36 of 40

equations In other words the relaxation of constraints makes it possible to estimate deformationeffects that are unmodelled in the deterministic model matrix for example deformations which have aspatial and temporal variations In that case the deformation can be modelled by covariance functionsthat determine the cofactors between constraints [28] This should be applied to control points locatedin the structure but not to control points located outside the structure The control points locatedoutside the structure should be hard constraints and have their stability verified by an independentprocess in order to analyse the deformation of the structure So one will be also able to analyse thesensitivity in terms of minimal detectable deformations andor minimal identifiable deformations [23]Future researches will be addressed to sensitivity analysis for deformation monitoring networks

Author Contributions Conceptualization VFR MTM and IK Data curation VFR MTM and IK Formalanalysis VFR and IK Funding acquisition MTM MRV and LGdSJ Investigation VFR MethodologyVFR and MTM Project administration MTM IK MRV and LGdSJ Software VFR and LGdSJSupervision MTM IK and MRV Validation VFR and MRV Writing ndash original draft VFR Writing ndashreview amp editing VFR MTM IK and MRV All authors have read and agreed to the published version of themanuscript

Funding This research was funded by the CNPqmdashConselho Nacional de Desenvolvimento Cientiacutefico eTecnoloacutegicomdashBrasil (proc no 1035872019-5) This research and the APC were also funded by PETROBRAS(Grant Number 201800545-0)

Acknowledgments In this section you can acknowledge any support given which is not covered by the authorcontribution or funding sections This may include administrative and technical support or donations in kind(eg materials used for experiments)

Conflicts of Interest The authors declare no conflict of interest

References

1 Lehmann R Neitzel F Testing the compatibility of constraints for parameters of a geodetic adjustmentmodel Journal of Geodesy 2013 87 555ndash566 doi101007s00190-013-0627-2

2 Fang X Weighted total least-squares with constraints a universal formula for geodetic symmetricaltransformations Journal of Geodesy 2015 89 459ndash469 doi101007s00190-015-0790-8

3 Amiri-Simkooei AR Weighted Total Least Squares with Singular Covariance Matrices Subject to Weightedand Hard Constraints J Surv Eng 2017 143 04017018

4 Sakic P Ballu V Royer JY A Multi-Observation Least-Squares Inversion for GNSS-Acoustic SeafloorPositioning Remote Sensing 2020 12 doi103390rs12030448

5 Odijk D Stochastic modelling of the ionosphere for fast GPS ambiguity resolution Geodesy Beyond 2000Schwarz KP Ed Springer Berlin Heidelberg Berlin Heidelberg 2000 pp 387ndash392

6 de Oliveira PS Morel L Fund F Legros R Monico JFG Durand S Durand F Modelingtropospheric wet delays with dense and sparse network configurations for PPP-RTK GPS Solutions 201721 237ndash250 doi101007s10291-016-0518-0

7 Chen B Wu L Dai W Luo X Xu Y A new parameterized approach for ionospheric tomography GPSSolutions 2019 23 96 doi101007s10291-019-0893-4

8 Ahmed E Ahmed S Mohamed D Mostafa R jag 2019 Vol 13 chapter Vertical ionospheric delayestimation for single-receiver operation p 81 2 doi101515jag-2018-0041

9 Wang A Chen J Zhang Y Meng L Wang J Performance of Selected Ionospheric Models inMulti-Global Navigation Satellite System Single-Frequency Positioning over China Remote Sensing 201911 doi103390rs11172070

10 Deo M El-Mowafy A Comparison of advanced troposphere models for aiding reductionof PPP convergence time in Australia Journal of Spatial Science 2019 64 381ndash403doi1010801449859620181472046

11 Dai X Lou Y Dai Z Hu C Peng Y Qiao J Shi C Precise Orbit Determination for GNSS ManeuveringSatellite with the Constraint of a Predicted Clock Remote Sensing 2019 11 doi103390rs11161949

12 Courant R Hilbert D Methods of mathematical physics Vol 2 Vol 2 Wiley [New York] 196213 Grafarend EW Optimization of Geodetic Networks The Canadian Surveyor 1974 28 716ndash723

doi101139tcs-1974-0120

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

37 of 40

14 Teunissen P Zero Order Design Generalized Inverses Adjustment the Datum Problem andS-Transformations Optimization and Design of Geodetic Networks Grafarend EW Sansograve F EdsSpringer Berlin Heidelberg Berlin Heidelberg 1985 pp 11ndash55

15 Dermanis A Free network solutions with the DLT method ISPRS Journal of Photogrammetry and RemoteSensing 1994 49 2 ndash 12 doihttpsdoiorg1010160924-2716(94)90061-2

16 Kotsakis C Reference frame stability and nonlinear distortion in minimum-constrained networkadjustment Journal of Geodesy 2012 86 755ndash774 doi101007s00190-012-0555-6

17 Kotsakis C Datum Definition and Minimal Constraints In Encyclopedia of Geodesy Grafarend E EdSpringer International Publishing Cham 2018 pp 1ndash6 doi101007978-3-319-02370-0_157-1

18 Matsuoka MT Rofatto VF Klein I Roberto Veronez M da Silveira LG Neto JBS Alves ACRControl Points Selection Based on Maximum External Reliability for Designing Geodetic Networks AppliedSciences 2020 10 doi103390app10020687

19 Baarda W S-transformations and criterion matrices Publ on geodesy New Series 1973 520 Schaffrin B Aspects of Network Design Optimization and Design of Geodetic Networks Grafarend

EW Sansograve F Eds Springer Berlin Heidelberg Berlin Heidelberg 1985 pp 548ndash59721 Xu PB A general solution in geodetic nonlinear rank-defect models 199722 Altamimi Z Dermanis A The Choice of Reference System in ITRF Formulation VII Hotine-Marussi

Symposium on Mathematical Geodesy Sneeuw N Novaacutek P Crespi M Sansograve F Eds Springer BerlinHeidelberg Berlin Heidelberg 2012 pp 329ndash334

23 Velsink H On the deformation analysis of point fields J Geod 2015 89 1071ndash1087doi101007s00190-015-0835-z

24 Velsink H Extendable linearised adjustment model for deformation analysis Survey Review 201547 397ndash410 doi1011791752270614Y0000000140

25 Hiddo V jag 2016 Vol 10 chapter Time Series Analysis of 3D Coordinates Using NonstochasticObservations p 5 1 doi101515jag-2015-0027

26 Rao CR Markoffrsquos Theorem with Linear Restrictions on Parameters SankhyAuml The Indian Journal ofStatistics (1933-1960) 1945 7 16ndash19

27 Velsink H Testing Methods for Adjustment Models with Constraints Journal of Surveying Engineering2018 144 04018009 doi101061(ASCE)SU1943-54280000260

28 Hiddo V Testing deformation hypotheses by constraints on a time series of geodetic observations J ApplGeod 2017 12 77ndash93 doi101515jag-2017-0028

29 Baarda W A testing procedure for use in geodetic networks Publ on geodesy New Series 1968 230 Teunissen P First and second moments of non-linear least-squares estimators Bull Geodesique (Journal of

Geodesy) 1989 63 253ndash262 doihttpsdoiorg101007BF0252047531 Teunissen P Testing Theory an introduction 2 ed Delft University Press 200632 Amiri-Simkooei A Least-squares variance component estimation theory and GPS applications Phd

thesis Delft University of Technology 200733 Koch KR Parameter estimation and hypothesis testing in linear models 2 ed Springer 199934 Tao H Feng H Xu L Miao M Long H Yue J Li Z Yang G Yang X Fan L Estimation

of Crop Growth Parameters Using UAV-Based Hyperspectral Remote Sensing Data Sensors 2020 20doi103390s20051296

35 Han M Wang Q Wen Y He M He X The Application of Robust Least Squares Method inFrequency Lock Loop Fusion for Global Navigation Satellite System Receivers Sensors 2020 20doi103390s20041224

36 Wei C Estimation for the Discretely Observed CoxacircldquoIngersollacircldquoRoss Model Driven by Small SymmetricalStable Noises Symmetry 2020 12 doi103390sym12030327

37 Sakic P Ballu V Royer JY A Multi-Observation Least-Squares Inversion for GNSS-Acoustic SeafloorPositioning Remote Sensing 2020 12 doi103390rs12030448

38 Livadiotis G General Fitting Methods Based on Lq Norms and their Optimization Stats 2020 3 16ndash31doi103390stats3010002

39 Farooq SZ Yang D Ada ENJ A Cycle Slip Detection Framework for Reliable Single Frequency RTKPositioning Sensors 2020 20 doi103390s20010304

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

38 of 40

40 Araveeporn A Comparing Parameter Estimation of Random Coefficient Autoregressive Model byFrequentist Method Mathematics 2020 8 doi103390math8010062

41 Zhang C Peng T Zhou J Ji J Wang X An Improved Autoencoder and Partial Least SquaresRegression-Based Extreme Learning Machine Model for Pump Turbine Characteristics Applied Sciences2019 9 doi103390app9193987

42 Ji J Yang M Jiang L He J Teng Z Liu Y Song H Output-Only Parameters Identification ofEarthquake-Excited Building Structures with Least Squares and Input Modification Process AppliedSciences 2019 9 doi103390app9040696

43 Kargoll B On the theory and application of model misspecification tests in geodesy Doctoral thesisUniversity of Bonn Landwirtschaftliche Fakultaumlt German Bonn 2007

44 Lehmann R On the formulation of the alternative hypothesis for geodetic outlier detection J Geod 201387 373ndash386 doi101007s00190-012-0607-y

45 Rofatto VF Matsuoka MT Klein I Roberto Veronez M da Silveira LG A Monte Carlo-Based OutlierDiagnosis Method for Sensitivity Analysis Remote Sensing 2020 12 doi103390rs12050860

46 Lehmann R Improved critical values for extreme normalized and studentized residuals in GaussndashMarkovmodels J Geod 2012 86 1137ndash1146 doi101007s00190-012-0569-0

47 Rofatto VF Matsuoka MT Klein I Veronez MR Bonimani ML Lehmann R A half-century ofBaardarsquos concept of reliability a review new perspectives and applications Surv Rev 2018 0 1ndash17[httpsdoiorg1010800039626520181548118] doi1010800039626520181548118

48 Zaminpardaz S Teunissen P DIA-datasnooping and identifiability J Geod 2019 93 85ndash101doi101007s00190-018-1141-3

49 Kok JJ States U On data snooping and multiple outlier testing [microform] Johan J Kok US Dept ofCommerce National Oceanic and Atmospheric Administration National Ocean Service Charting andGeodetic Services For sale by the National Geodetic Information Center NOAA Rockville Md 1984 p61 p

50 Knight NL Wang J Rizos C Generalised measures of reliability for multiple outliers Journal of Geodesy2010 84 625ndash635 doi101007s00190-010-0392-4

51 Gui Q Li X Gong Y Li B Li G A Bayesian unmasking method for locating multiplegross errors based on posterior probabilities of classification variables J Geod 2011 85 191ndash203doi101007s00190-010-0429-8

52 Klein I Matsuoka MT Guzatto MP Nievinski FG An approach to identify multipleoutliers based on sequential likelihood ratio tests Surv Rev 2017 49 449ndash457[httpsdoiorg1010800039626520161212970] doi1010800039626520161212970

53 Hawkins DM Identification of Outliers 1 ed Springer Netherlands 1980 httpsdoiorg101007978-94-015-3994-4

54 Baarda W Statistical concepts in geodesy Publ on geodesy New Series 1967 255 Imparato D Teunissen P Tiberius C Minimal Detectable and Identifiable Biases for quality

control Surv Rev 2019 51 289ndash299 [httpsdoiorg1010800039626520181437947]doi1010800039626520181437947

56 Teunissen PJG Distributional theory for the DIA method Journal of Geodesy 2018 92 59ndash80doi101007s00190-017-1045-7

57 Foumlrstner W Reliability and discernability of extended Gauss-Markov models Seminar on mathematicalmodels of GeodeticPhotogrammetric Point Determination with Regard to Outliers and Systematic Errors 1983 Vol Series A pp 79ndash103

58 Wang J Knight NL New Outlier Separability Test and Its Application in GNSS Positioning 201259 Lehmann R Loumlsler M Multiple Outlier Detection Hypothesis Tests versus Model Selection by

Information Criteria J Surv Eng 2016 142 0401601760 Teunissen PJG Imparato D Tiberius CCJM Does RAIM with Correct Exclusion Produce Unbiased

Positions Sensors 2017 17 doi103390s1707150861 Lehmann R Loumlsler M Congruence analysis of geodetic networks ndash hypothesis tests versus model

selection by information criteria J Appl Geod 2017 11 271ndash283 doi101515jag-2016-0049

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

39 of 40

62 Rofatto V Matsuoka M Klein I DESIGN OF GEODETIC NETWORKS BASED ON OUTLIERIDENTIFICATION CRITERIA AN EXAMPLE APPLIED TO THE LEVELING NETWORK Bull Geod Sci2018 24 152ndash170 httpdxdoiorg101590s1982-21702018000200011

63 Nguyen VK Renault Milocco R Environment Monitoring for Anomaly Detection System UsingSmartphones Sensors 2019 19 doi103390s19183834

64 Nie Y Yang L Shen Y Specific Direction-Based Outlier Detection Approach for GNSS Vector NetworksSensors 2019 19 doi103390s19081836

65 Klein I Matsuoka MT Guzatto MP Nievinski FG Veronez MR Rofatto VF A newrelationship between the quality criteria for geodetic networks Journal of Geodesy 2019 93 529ndash544doi101007s00190-018-1181-8

66 Mierlo JV Statistical Analysis of Geodetic Measurements for the Investigation of Crustal Movements InRecent Crustal Movements 1977 Whitten C Green R Meade B Eds Elsevier 1979 Vol 13 Developmentsin Geotectonics pp 457 ndash 467 doihttpsdoiorg101016B978-0-444-41783-150072-6

67 van der Marel H Roumlsters AJM Statistical Testing and Quality Analysis in 3-D Networks (part II)Application to GPS Global Positioning System An Overview Bock Y Leppard N Eds Springer NewYork New York NY 1990 pp 290ndash297

68 Yang L Wang J Knight NL Shen Y Outlier separability analysis with a multiple alternative hypothesestest J Geod 2013 87 591ndash604 doi101007s00190-013-0629-0

69 Proacuteszynski W Revisiting Baardarsquos concept of minimal detectable bias with regard to outlier identifiabilityJ Geod 2015 89 993ndash1003 doi101007s00190-015-0828-y

70 Romano JP Wolf M Multiple Testing of One-Sided Hypotheses Combining Bonferroni and the BootstrapPredictive Econometrics and Big Data Kreinovich V Sriboonchitta S Chakpitak N Eds SpringerInternational Publishing Cham 2018 pp 78ndash94

71 Teunissen P An Integrity and Quality Control Procedure for use in Multi Sensor Integration 199072 Aydin C Demirel H Computation of Baardarsquos lower bound of the non-centrality parameter J Geod

2004 78 437ndash441 doi101007s00190-004-0406-173 Aydin C Power of Global Test in Deformation Analysis J Surv Eng 2012 138 51ndash56 https

doiorg101061(ASCE)SU1943-5428000006474 Bonferroni C Teoria statistica delle classi e calcolo delle probabilitagrave Pubblicazioni del R Istituto superiore di

scienze economiche e commerciali di Firenze Libreria internazionale Seeber 193675 Box GEP Muller ME A Note on the Generation of Random Normal Deviates The Annals of Mathematical

Statistics 1958 29 610ndash611 httpsdoi101214aoms117770664576 Lemeshko BY Lemeshko SB Extending the Application of Grubbs-Type Tests in Rejecting Anomalous

Measurements Measurement Techniques 2005 48 536ndash547 doi101007s11018-005-0179-977 Lehmann R Scheffler T Monte Carlo based data snooping with application to a geodetic network J

Appl Geod 2011 5 123ndash134 httpsdoiorg101515JAG201101478 Matsumoto M Nishimura T Mersenne twister A 623-dimensionally equidistributed uniform

pseudo-random number generator ACM Transactions on Modeling and Computer Simulation 1998 8 3ndash30httpsdlacmorgcitationcfmid=272991

79 Ihde J Saacutenchez L Barzaghi R Drewes H Foerste C Gruber T Liebsch G Marti U Pail R SiderisM Definition and Proposed Realization of the International Height Reference System (IHRS) Surveys inGeophysics 2017 38 549ndash570 doi101007s10712-017-9409-3

80 Lehmann R Observation error model selection by information criteria vs normality testing StudiaGeophysica et Geodaetica 2015 59 489ndash504 doi101007s11200-015-0725-0

81 Algarni DA Ali AE Heighting and Distance Accuracy with Electronic Digital Levels Journal of King SaudUniversity - Engineering Sciences 1998 10 229 ndash 239 doihttpsdoiorg101016S1018-3639(18)30698-6

82 Gemin ARS Matos AS Faggion PLAs APPLICATION OF CALIBRATION CERTIFICATE OFDIGITAL LEVELING SYSTEMS IN THE MONITORING OF STRUCTURES A CASE STUDY AT THEGOVERNADOR JOSAtilde RICHA HYDROELECTRIC POWER PLANT - PR Boletim de CiAtildeGeodAtilde 201824 235 ndash 249

83 Takalo M Rouhiainen P Development of a System Calibration Comparator for Digital Levels in FinlandNordic Journal of Surveying and Real Estate Research 1 1

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

40 of 40

84 Wiedemann W Wagner A Wunderlich T Using IATS to Read and Analyze Digital Levelling Staffs SIG2016 Paar R Marendic A Zrinjski M Eds Croatian Geodetic Society 2016 pp 515ndash526

85 PJG T The Geometry of Geodetic Inverse Linear Mapping and Non-linear Adjustment Publications onGeodesy New Series 1985 8

86 Rao CR Mitra SK Generalized inverse of a matrix and its applications 197287 Hekimoglu S Erenoglu RC Sanli DU Erdogan B Detecting Configuration Weaknesses in Geodetic

Networks Survey Review 2011 43 713ndash730 doi101179003962611X13117748892632

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

Page 5: Effects of Hard and Soft Equality Constraints on

5 of 40

denoted by HA is to oppose Equation (1) by an extended model that includes the unknown vectornabla isin Rqtimes1 of deterministic bias parameters as follows ([2931])

HA y = Ax + Cnabla+ e =(

A C)( x

nabla

)+ e (6)

where C isin Rntimesq is the matrix that relates bias parameters ie the values of the outliers to observationsWe restrict ourselves to the matrix (A C) having full column rank such that

r = rank(

A C)= u + q le n (7)

One of the most used procedures based on hypothesis testing for outliers in linear (or linearised)models is the well-known data snooping method [2954] This procedure consists of screening eachindividual measurement for the presence of an outlier [49] In that case data snooping is based on alocal model test such that q = 1 and therefore the n alternative hypothesis is expressed as

H(i)A y = Ax + cinablai + e =

(A ci

)( xnablai

)+ e foralli = 1 middot middot middot n (8)

Now matrix C in Equation (6) is reduced to a canonical unit vector ci which consists exclusivelyof elements with values of 0 and 1 where 1 means that the ith bias parameter of magnitude nablai affectsthe ith measurement and 0 means otherwise In that case the rank of (A ci) isin Rntimes(u+1) and the vector

nabla in Equation (6) reduces to a scalar nablai in Equation (8) ie ci=(

0 0 0 middot middot middot 1ith 0 middot middot middot 0)T

When q = nminus u an overall model test is performed For more details about the overall model test seefor example [5556]

Note that the alternative hypothesisH(i)A in Equation (8) is formulated under the condition that

the outlier acts as a systematic effect by shifting the random error distribution underH0 by its ownvalue [44] In other words the presence of an outlier in a dataset can cause a shift of the expectationunderH0 to a nonzero value Therefore hypothesis testing is often employed to check whether thepossible shifting of the random error distribution under H0 by an outlier is in fact a systematiceffect (bias) or merely a random effect This hypothesis test-based approach is called the mean-shiftmodel (see eg [1823294647525457ndash65]

In the context of the mean-shift model the test statistic involved in data snooping is given by thenormalised least-squares residual denoted by wi This test statistic also known as Baardarsquos w-testis given as follows

wi =ci

TQminus1e eradic

ciTQminus1e QeQminus1

e ci

foralli = 1 middot middot middot n (9)

The alternative hypothesis in Equation (8) is formulated in the sense that There is at least oneoutlier in the vector of measurements yirdquo [46] In that case we are interested in knowing which of thealternative hypotheses may lead to the rejection of the null hypothesis with a certain probability Thismeans testingH0 againstH(1)

A H(2)A H(3)

A H(n)A This is known as multiple hypothesis testing (see

eg [47485355575866ndash70]) In that case the test statistic coming into effect is the maximum absoluteBaardarsquos w-test value (denoted by max-w) which is computed as [47]

max-w = maxiisin1middotmiddotmiddot n

|wi| (10)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

6 of 40

The decision rule for this case is given by

Accept H0 i f max-w le k

Otherwise

Accept H(i)A i f max-w gt k

(11)

The decision rule in Equation 11 says that if none of the n w-tests get rejected then we accept thenull hypothesisH0 If the null hypothesisH0 is rejected in any of the n tests then one can only assumethat detection occurred In other words if the max-w is larger than some percentile of its probabilitydistribution (ie some critical value k) then there is evidence that there is an outlier in the datasetTherefore outlier detection only informs us whether the null hypothesisH0 is accepted or not [45]

However the detection does not tell us which alternative hypothesisH(i)A would have led to the

rejection of the null hypothesisH0 The localisation of the alternative hypothesis which would haverejected the null hypothesis is a problem of outlier identification Outlier identification implies theexecution of a search among the measurements for the most likely outlier In other words one seeks tofind which of Baardarsquos w-test is the maximum absolute value max-w and if that max-w is greater thansome critical value k

Therefore the data snooping procedure of screening measurements for possible outliers is actuallyan important case of multiple hypothesis testing and not single hypothesis testing Moreover note thatoutlier identification only happens when outlier detection necessarily exists ie ldquooutlier identificationrdquoonly occurs when the null hypothesisH0 is rejected However correct detection does not necessarilyimply correct identification [475568]

In a special case of having only one single alternative hypothesis one should decide betweenthe null hypothesis H0 and only one single alternative hypothesis H(i)

A of Equation (8) In that casethe false decisions are restricted to Type I error and Type II error The probability of a Type I Errorα0 is the probability of rejecting the null hypothesis H0 when it is true whereas the probability of aType II error β0 is the probability of failing to reject the null hypothesisH0 when it is false (note theindex lsquo0rsquo represents the case in which a single hypothesis is tested) Instead of α0 and β0 there is theconfidence level CL = 1minus α0 and the power of the test γ0 = 1minus β0 respectively The first deals withthe probability of accepting a true null hypothesisH0 the second addresses the probability of correctlyaccepting the alternative hypothesisH(i)

A In that case given a probability of a Type I decision error α0we find the critical value k0 as follows

k0 = Φminus1(

1minus α0

2

)(12)

where Φminus1 denotes the inverse of the cumulative distribution function (cdf) of the two-tailed standardnormal distribution N(0 1)

The normalised least-squares residual wi follows a standard normal distribution with theexpectation that Ewi = 0 if H0 holds true (there is no outlier) On the other hand if the systemis contaminated with a single outlier at the ith location of the dataset (ie under H(i)

A ) then theexpectation of wi is

Ewi =radic

λ0 =radic

ciTQminus1e QeQminus1

e cinabla2i (13)

where λ0 is the non-centrality parameter for q = 1 Note therefore that there is an outlier thatcauses the expectation of wi to become

radicλ0 The square-root of the non-centrality parameter

radicλ0

in Equation (13) represents the expected mean shift of a specific w-test In such a case the termci

TQminus1e QeQminus1

e ci in Equation (13) is a scalar and therefore it can be rewritten as follows [71]

|nablai| = MDB0(i) =

radicλ0

ciTQminus1e QeQminus1

e ci foralli = 1 middot middot middot n (14)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

7 of 40

where |nablai| is the Minimal Detectable Bias (MDB0(i) ) for the case in which there is only one singlealternative hypothesis which can be computed for each individual alternative hypothesis according toEquation (8)

For a single outlier the variance of an estimated outlier denoted by σ2nablai

is

σ2nablai

=(

ciTQminus1

e QeQminus1e ci

)minus1 foralli = 1 middot middot middot n (15)

Thus the MDB can also be written as

MDB0(i) = σnablai

radicλ0 foralli = 1 middot middot middot n (16)

where σnablai =radic

σ2nablai

is the standard deviation of estimated outlier nablaiThe MDB in Equations (14) or (16) of an alternative hypothesis is the smallest-magnitude outlier

that can lead to the rejection of the null hypothesisH0 for a given α0 and β0 Thus for each model ofthe alternative hypothesisH(i)

A the corresponding MDB can be computed [477273] The limitation ofthis MDB is that it was initially developed for the binary hypothesis testing case In that case the MDBis a sensitivity indicator of Baardarsquos w-test when only one single alternative hypothesis is taken intoaccount In this article we are confined to multiple alternative hypotheses Therefore both the MDBand MIB are computed by considering the case of multiple hypothesis testing

For a scenario coinciding with the null hypothesisH0 under multiple testing hypothesis thereis the probability of incorrectly identifying at least one alternative hypothesis This type of wrongdecision is known as the family-wise error rate (FWE) The FWE is defined as

FWE = αprime =le 1minus (1minus α0)n (17)

which is approximatelyFWE = αprime le ntimes α0 (18)

where α0 is the significance level for an individual test The quantity in Equation (18) is just equal tothe upper bound of the Bonferroni inequality ie αprime le nα [74] For example if the FWE level (αprime) is005 and one is running 5 tests then each test will have an α0 of 0055 = 001 In other words one usesa global Type I Error rate αprime that combines all tests under consideration instead of an individual errorrate α0 that only considers one test at a time time [70] In that case the critical value kbon f is computedas

kbon f = Φminus1(

1minus αprime

2n

)(19)

The Bonferroni in Equation (18) is a good approximation for the case in which alternativehypotheses are independent In practice however the test results always depend on each otherto some degree because we always have a correlation between w-tests The correlation coefficientbetween any Baardarsquos w-test statistic (denoted by ρwi wj ) such as wi and wj is given by [57]

ρwi wj =ci

TQminus1e QeQminus1

e cjradicciTQminus1

e QeQminus1e ci

radiccjTQminus1

e QeQminus1e cj

forall(i 6= j) (20)

The correlation coefficient ρwi wj can assume values within the range [minus1 1]Here the extreme normalised residuals max-w (ie maximum absolute) in Equation (10) are

treated directly as a test statistic Note that when using Equation (10) as a test statistic the decisionrule is based on a one-sided test of the form max-w le k However the distributions of max-w cannotbe derived from well-known test distributions (eg normal distribution) The procedure to computethe critical value of max-w is given step-by-step by Rofatto et al [45]

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

8 of 40

The other side of the multiple testing problem is the situation in which there is an outlier in thedataset In that case apart from Type I and Type II errors there is a third type of wrong decisionassociated with Baardarsquos w-test Baardarsquos w-test can also flag a non-outlying observation whilethe lsquotruersquo outlier remains in the dataset We are referring to the Type III error [53] also referred toas the probability of wrong identification (PWI) The description of the Type III error involves aseparability analysis between alternative hypotheses [57666869] Therefore we are now interested inthe identification of the correct alternative hypothesis In that case the non-centrality parameter inEquation (13) is not only related to the sizes of Type I and Type II decision errors but also dependenton the correlation coefficient ρwi wj given by Equation (20)

On the basis of the assumption that one outlier is in the ith position of the dataset (ie H(i)A is

rsquotruersquo) the probability of a Type II error (also referenced as the probability of ldquomissed detectionrdquo denotedby PMD) for multiple testing is

PMD = P(⋂n

i=1|wi| le k

∣∣∣ H(i)A true

) (21)

and the size of a Type III wrong decision (also called ldquomisidentificationrdquo denoted by PWI) is given by

PWI =n

sumi=1P(|wj| gt |wi| foralli |wj| gt k(i 6= j)

∣∣∣ H(i)A true

)(22)

On the other hand the probability of correct identification (denoted by PCI) is

PCI = P(|wi| gt |wj| forallj |wi| gt k(i 6= j)

∣∣∣ H(i)A true

)(23)

with1minusPCI = 1minusPCD + PWI = PMD + PWI (24)

Note that the three probabilities of missed detection PMD wrong identification PWI and correctidentification PCI sum up to unity ie PMD + PWI + PCI = 1

The probability of correct detection PCD is the sum of the probability of correct identification PCI(selecting a correct alternative hypothesis) and the probability of misidentification PWI (selecting oneof the n-1 other hypotheses) ie

PCD = PCI + PWI (25)

The probability of wrong identification PWI is identically zero PWI = 0 when the correlationcoefficient is exactly zero ρwi wj = 0 In that case we have

PCD = PCI = 1minusPMD (26)

The relationship given in Equation (26) would only happen if one neglected the nature of thedependence between alternative hypotheses In other words this relationship is valid for the specialcase of testing the null hypothesisH0 against only one single alternative hypothesisH(i)

A Since the critical region in multiple hypothesis testing is larger than that in single hypothesis

testing the Type II decision error (ie PMD) for the multiple test becomes smaller [47] This meansthat the correct detection in binary hypothesis testing (γ0) is smaller than the correct detection PCDunder multiple hypothesis testing ie

PCD gt γ0 (27)

Detection is easier in the case of multiple hypothesis testing than single hypothesis testingHowever the probability of correct detection PCD under multiple testing is spread out over allalternative hypotheses and therefore identifying is harder than detecting From Equation (25) it is

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

9 of 40

also noted that detection does not depend on identification However outlier identification dependson correct outlier detection Therefore we have the following inequality

PCI le PCD (28)

Note that the probability of correct identification PCI depends on the probability of misseddetection PMD and wrong identification PWI for the case in which data snooping is run only once iea single round of estimation and testing However in this paper we deal with data snooping in itsiterative form (ie IDS) and therefore the probability of correct identification PCI depends on otherdecision rules

In contrast to the data snooping single run the success rate of correct detection PCD forIDS depends on the sum of the probabilities of correct identification PCI wrong exclusion (PWE)over-identification cases (Pover+ and Poverminus) and statistical overlap (Pol) ie

PCD = 1minusPMD = PCI + PWE + Pover+ + Poverminus + Pol (29)

It is important to mention that the probability of correct detection is the complement of theprobability of missed detection Note from Equation (29) that the probability of correct detection PCDis available even for cases in which the identification rate is null PCI = 0 However the probabilityof correct identification (PCI) necessarily requires that the probability of correct detection PCD begreater than zero For the same reasons given for the data snooping single run in the previous sectiondetecting is easier than identifying In that case we have the following relationship for the success rateof correct outlier identification PCI

PCI = PCD minus (PWE + Pover+ + Poverminus + Pol) (30)

such asexist(PCI) isin [0 1] lArrrArr (PCD) gt 0 (31)

It is important to mention that the wrong exclusion PWE describes the probability of identifyingand removing a non-outlying measurement while the lsquotruersquo outlier remains in the dataset In otherwords PWE is the Type III decision error for IDS) The overall wrong exclusion PWE is the result of thesum of each individual contribution to PWE ie

PWE =nminus1

sumi=1PWE(i) (32)

On the basis of the probability levels of correct detection PCD and correct identification PCI the sensitivity indicators of minimal biasesmdashMinimal Detectable Bias (MDB) and Minimal IdentifiableBias (MIB)mdashfor a given αprime can be computed as follows

MDB = arg minnablaiPCD(nablai) gt PCD foralli = 1 middot middot middot n (33)

MIB = arg minnablaiPCI(nablai) gt PCI foralli = 1 middot middot middot n (34)

Equation (33) gives the smallest outlier nablai that leads to its detection for a user-defined correctdetection rate PCD whereas (34) provides the smallest outlier nablai that leads to its identification for auser-defined correct identification rate PCI

As a consequence of the inequality in (28) the MIB will be larger than MDB ie MIB ge MDBFor the special case of having only one single alternative hypothesis there is no difference between theMDB and MIB [55] The computation of MDB0 is easily performed by Equations (14) or (16) whereasthe computation of the MDB in Equation (33) and the MIB in Equation (34) must be computed using

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

10 of 40

Monte Carlo because the acceptance region (as well as the critical region) for the case of multiplealternative hypotheses is analytically intractable

3 Material and Methods

Here we use the procedure provided by Rofatto et al [45] to compute the probability levelsassociated with IDS as well as to estimate the minimal biases mdash Minimal Detectable Bias(MDB) andMinimal Identifiable Bias (MIB) The procedure is summarised in Figure 1

The procedure to compute the critical value of max-w (k) is given step-by-step as follows

1 Specify the probability density function (pdf) of the w-test statistics The pdf assigned to thew-test statistics under anH0-distribution is

(w1 w2 w3 middot middot middot wn)T sim N (0Rw) (35)

where Rw isin Rntimesn is the correlation matrix with the main diagonal elements equal to 1 and theoff-diagonal elements are the correlation between the w-test statistics computed by Equation (20)

2 In order to have w-test statistics under H0 uniformly distributed random number sequencesare produced by the Mersenne Twister algorithm and then they are transformed into a normaldistribution by using the BoxndashMuller transformation [75] BoxndashMuller has already been usedin geodesy for Monte Carlo experiments [467677] Therefore a sequence of m random vectorsfrom the pdf assigned to the w-test statistics is generated according to Equation (35) In that casewe have a sequence of m vectors of the w-test statistics as follows[

(w1 w2 w3 middot middot middot wn)T(1)

(w1 w2 w3 middot middot middot wn)T(2)

middot middot middot (w1 w2 w3 middot middot middot wn)T(m)

](36)

3 Compute the test statistic by Equation (10) for each sequence of w-test statistics Thus we have(max

iisin1middotmiddotmiddot n|wi|(1) max

iisin1middotmiddotmiddot n|wi|(2) middot middot middot max

iisin1middotmiddotmiddot n|wi|(m)

)(37)

4 Sort in ascending order the maximum test statistic in Equation (37) getting a sorted vector wsuch that

w(1) lt w(2) w(3) middot middot middot lt w(m) (38)

The sorted values w in Equation (38) provide a discrete representation of the cumulative densityfunction (cdf) of the maximum test statistic max-w

5 Determine the critical value k as follows

k = w[(1minusαprime)timesm] (39)

where [] denotes rounding down to the next integer that indicates the position of the selectedelements in the ascending order of w This position corresponds to a critical value for a stipulatedoverall false alarm probability αprime This can be done for a sequence of values αprime in parallel

It is important to mention that the probability of a Type I decision error for multiple testing αprime islarger than that of Type I for single testing α0 This is because the critical region in multiple testing islarger than that in single hypothesis testing

After finding the critical value k the procedure based on Monte Carlo is also applied to computethe probability levels of IDS when there is an outlier in the dataset The overview of of the mainelements involved with the IDS was detailed in Section 2 The flowchart is displayed in Figure 1

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

11 of 40

Inp

ut

Functional amp Stochastic Models Significance level (αrsquo) Outlier magnitude range (nablai) Probability of correct detection ( ) Probability of correct identification ( )

CD CI

Gen

erat

e

ran

do

m d

raw

s

Generate a sample of random errors

119942~119925(120782 Q119942)

Compute the total error ε = e + cinablai

Compute the least-squares residual vector e = Rε

Iterativ

e ou

tlier elimin

ation

pro

cess

Compute the maximum w-test statistic

max wi

max wi gt k

End

Identify and remove the measurement from the dataset

Size of max wi gt 1

det ATWA gt 0

Compute w-test statistics

Compute and store nCI nMD nWE nover+ nover- and nol

Co

mp

ute th

e pro

bab

ilities

Number of Monte Carlo experiments (m) = 200000

Compute the probabilities CI MD WE over+ over- and ol

Sensitivity indicators Outlier detection (MDB)

amp Outlier identification (MIB)

YES NO

YES

NO

YES NO

YES

NO

Figure 1 Flowchart of the algorithm to compute the probability levels of Iterative Data Snooping (IDS)for each measurement in the presence of an outlier [45]

The steps displayed in Figure 1 are detailed as follows

1 First random error vectors are synthetically generated on the basis of a multivariate normaldistribution because the assumed stochastic model for random errors is based on the matrixcovariance of the observations Here we use the Mersenne Twister algorithm [78] to generate asequence of random numbers and BoxndashMuller [75] to transform it into a normal distribution

2 The total error (ε) is a combination of random errors and its corresponding outlier is given asfollows

ε = e + cinablai (40)

The magnitude intervals of simulated outliers are user-defined The magnitude intervals arebased on the standard deviation of the observation (σ) eg |3σ| to |6σ| Since the outlier can bepositive or negative the proposed algorithm randomly selects the signal of the outlier (for q = 1)Here we use the discrete uniform distribution to select the signal of the outlier

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

12 of 40

In Equation (40) e is the random error generated from the normal distribution according toEquation (2) and the second part cinablai is the additional parameter that describes the alternativemodel according to Equation (40)

3 Next we compute the least-squares residuals vector according to Equation (3) but now we usethe total error (ε) in Equation 40 as follows

e = Rε (41)

4 Run the IDS In this step the test statistic is computed according to (9) Then the maximumtest statistic value is obtained according to Equation (10) Now the decision rule is based on thecritical value k computed by Monte Carlo in previous stage After identifying the measurementsuspected to be the most likely outlier it is excluded from the model and least-squares estimationand data snooping are applied iteratively until there are no further outliers identified in thedataset If two or more observations are simultaneously detected (ie if maxminusw gt k and the sizeof maxminusw gt 1 then the IDS is ended) Furthermore every time that a measurement suspectedto be the most likely outlier is removed from the model we check whether the normal matrixATW A is invertible or not If the determinant of ATW A is 0 det|ATW A| = 0 then there isa necessary and sufficient condition for a square matrix ATW A to be non-invertible In otherwords the IDS is ended when det|ATW A| = 0 If no outlier is detected (ie maxminusw lt k) thenthe IDS is also ended

The IDS procedure is performed for m experiments of random error vectors for each experimentcontaminated by an outlier in the ith measurement Therefore for each measurementcontaminated by an outlier there are υ = 1 m experiments

5 After running m=200000 experiments [47] the probabilities associated with IDS are computed(note m refers to the total number of Monte Carlo experiments) The probability of correctidentification (PCI) is the ratio between the number of times that the outlier is correctly identified(denoted as nCI) and m experiments ie

PCI =nCIm

(42)

Similar to Equation (42) the wrong decisions are computed as

PMD =nMD

m(43)

where nMD is the number of experiments in which IDS does not detect the outlier (PMDcorresponds to the rate of missed detection)

PWE =nWE

m(44)

where nWE is the number of experiments in which the IDS procedure flags and removes onlyone single non-outlying measurement while the lsquotruersquo outlier remains in the dataset (PWE is thewrong exclusion rate)

Pover+ =nover+

m(45)

where nover+ is the number of experiments in which IDS correctly identifies and removes theoutlying measurement and others and Pover+ corresponds to its probability

Poverminus =noverminus

m(46)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

13 of 40

where noverminus represents the number of experiments in which IDS identifies and removes morethan one non-outlying measurement whereas the lsquotrue outlierrsquo remains in the dataset (Poverminus isthe probability corresponding to this error probability class)

Pol =nolm

(47)

where nol is the number of experiments in which the detector in Equation (10) flags two (or more)measurements simultaneously during a given iteration of IDS Here this is referred to as thenumber of statistical overlap nol and Pol corresponds to its probability

6 Finally the sensitivity indicators (MDB and MIB) are computed based on Equation 33 andEquation 34 respectively

In this paper the probability levels associated with IDS were computed for each observationindividually and for each outlier magnitude However they were grouped into clusters based onnumber of local redundancy (ri) and maximum absolute correlation between the w-test statistics(ρwi wj ) Furthermore we take care to control the family-wise error rate

4 Case study of Levelling Geodetic Network

The analyse of the constraints effects on the probability levels of IDS are performed by an exampleof a levelling geodetic network

A levelling geodetic network is a set of points located on the Earthrsquos surface or near it andinterconnected by height difference measurements Some of these points are associated with a verticalheight reference system (ie some points have known height) whereas others are parameters (unknownheights) The term known here means that those points with known heights are constraints necessaryto ensure that parameters (unknown heights) are sufficiently estimable In the sense of modern geodeticreference systems realisation and unification of a vertical height system consists of a combination ofGNSS (Global Navigation Satellite System) and spirit levelling with geoid models We will not gointo the details about height geodetic reference system and frame (Readers who wish to have furtherdetails on that issue please see eg [79])

The mathematical model associated with a geodetic levelling network is linear (ie the levellingmeasurements bear a known linear relationship with the unknown heights) Geodetic networksusually has more measurements than parameters (ie n gt u) ie we have a redundant measurementsystem However due to intrinsic random errors in a system redundant measurements often leadto an inconsistent system of equations To make the system consistent we have to introduce theinformation about the random measurement errors The stochastical properties of the measurementerrors are directly associated with the assumption of the probability distribution of these errors Ingeodesy and many other scientific branches the well-known normal distribution is one of the mostused as measurement error model [80] Because of this the model ceases to be purely mathematicaland becomes a statistical model with functional and stochastic part

In the absence of outliers ie under null hypothesisH0 in Equation (1) the model for levellinggeodetic network can be written as follows

∆himinusj + e∆himinusj= hj minus hi (48)

where ∆himinusj is the height difference measured from point i to j and e∆himinusjis the random error associated

with the levelling measurement For a only one single levelling line one of these points is the constraint(known height) from which the height of another point is determined The point with known height isalso referred to as control point or benchmark In a geodetic network on the other hand we have a setof levelling lines that connect both from points of unknown height and from points of known height(constraints) Under normal working conditions (ie H0) the measurement errors model is then givenby Equation (2)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

14 of 40

In this case study we demonstrate the application of the proposed algorithm by [45] subject to adifferent constraints scenarios for that levelling geodetic network

Recent work has been focused on non-stochastic constraints (hard constraints) Here we alsoconsider cases where constraints are subject to a random errors (ie soft constraints) That softconstraints are non-deterministic and therefore they are measured model elements that can bedesignated as a priori information The approach of this example can be applied in the designstage of geodetic network analysis [6265] Furthermore the experiments performed here can also beextended to geodetic deformation analysis when the deformation effects are unmodelled in the senseof deterministic form [28]

41 Problem description

To analyse the constraints effects on the IDS an example is taken from a geodetic levellingnetwork with 12 height differences between the points The equipment used to measure the leveldifference can be an electronic digital level In that case the levelling measurement system comprisesof a special bar-coded staff (also called barcode rod) and a digital level (instrument) A digital level isbasically a telescope that enables a horizontal line of sight Digital levels consist of additional electronicimage processing components to automatically read and analyse digital (bar coded) levelling staffswhere the graduation is replaced by a manufacturer dependent code pattern Generally the result isautomatically stored in the data collector of the digital level An example of a digital level ndash bar-codestaff system is displayed in the Figure 2 For more details about digital level see eg [81ndash84]

Figure 2 Example of a digital level ndash bar-code staff system [45]

The standard-deviation of the uncorrelated measurements were the same and taken equal toσ = 1mm The points are indicated as A to G The eight network configuration are displayed in Figure3(a b c d and e) and their details are given as follows

1 Figure 3(a) Network with 1 hard constraint (ie network minimally constrained) Since thedimension of the network is 1D the minimum information necessary to estimate the unknownheights is one The height of G was fixed as control point (hard constraint) and 6 unknown heights(ABCDEF) were minimally constrained Therefore the redundancy of the system (or overalldegrees of freedom) was r = nminus rank(A) = nminus u = 12minus 6 = 6

2 Figure 3(b) Network with one extra hard constraint (ie two hard constraints) The heights A andD were taken as hard constraints (ie heights A and D were fixed) The redundancy of the system

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

15 of 40

in that case was r = 12minus 5 = 7 with 5 unknown heights (BCEFG) over-constrained

3 Figure 3(c) Network with two extra hard constraints (ie three hard constraints)The heights A Dand G were taken as hard constraints In that case the redundancy of the system was r = 12minus 4 = 8

4 Figure 3(d) Network with two soft constraints (A and D) In that case a standard-deviation largerthan zero was assigned to both constraints (ie σc gt 0 In other words A and D were processedas being both observations and unknown parameters ie A and D were pseudo-observationsHere the both constraints were simultaneously relaxed by considering their uncertainties 10 timesworse than measurements (ie σc = 10times σ = 10mm) 10 times better than measurements (ieσc = 01mm) and their uncertainties equal to the measurements (σc = 1mm) In that case theredundancy of the system was r = 14minus 7 = 7

5 Figure 3(e) Network processed with A D and G as pseudo-observations Those three constraintswere simultaneously relaxed by considering their standard-deviations equal to σc = 10mm (10 timesworse than measurements) σc = 01mm (10 times better than measurements) and σc = 1mm (thesame as the measurements) In that case the redundancy of the system was r = 15minus 7 = 8

y1

y2

y3

y4

y5

y6

y7 y8

y9 y10

B C

D

E F

G y12 y11

A

y1

y2

y3

y4

y5

y6

y7 y8

y9 y10

A

B C

D

E F

G y12 y11

y1

y2

y3

y4

y5

y6

y7 y8

y9 y10

A

B C

D

E F

G y12 y11

(a) (b)

(c) (d)

(e)

Unknown Heights Soft Constraints Hard Constraints

y1

y2

y3

y4

y5

y6

y7 y8

y9 y10

A

B C

D

E F

G y12 y11

y1

y2

y3

y4

y5

y6

y7 y8

y9 y10

A

B C

D

E F

G y12 y11

Figure 3 Levelling Geodetic Network subject to different constraint scenarios

The following system of equations for that problem is given by

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

16 of 40

y1 + e1 = hB minus hA

y2 + e2 = hC minus hB

y3 + e3 = hD minus hC

y7 + e7 = hB minus hG

y8 + e8 = hC minus hG

y11 + e11 = hB minus hF

y12 + e12 = hC minus hE

(49)

The functional model (A) for the system of equations in 49 is given by

A =

minus1 1 0 0 0 0 00 minus1 1 0 0 0 00 0 minus1 1 0 0 00 0 0 minus1 1 0 00 0 0 0 minus1 1 01 0 0 0 0 minus1 00 1 0 0 0 0 minus10 0 1 0 0 0 minus10 0 0 0 1 0 minus10 0 0 0 0 1 minus10 1 0 0 0 minus1 00 0 1 0 minus1 0 0

(50)

Note that the rank defect of the matrix A is uminus rank(A) = 7minus 6 = 1 In that case there is neededto add at least one constraint in order to avoid rank deficiency of the matrix A This is guaranteedwhen we take one height as known For example from the network in Figure 3(a) we have added theheight G as known (ie as a hard constraint) In that case the constraint equation should be addedinto the system in 49 ie

y13 = hG with σy13 = 0 (51)

noticing that because the standard deviation is zero the observation is non-stochastic (hard constraint)and the residual ey13 = 0 This can generate problems in the inversion of the covariance matrix of theobservations Qe for the calculation of the weight matrix W because the weight for that constraintwould be 1

0 = infin In order to avoid that problem we have eliminated the rank deficiency of matrixA by removing the seventh column of matrix A in 50 associated with the height G Now we haveuminus rank(A) = 6minus 6 = 0

The constraint defines the geodetic datum ie the S-system ([85]p41) Another approach tosolving the system of equations in 2 could be based on generalised (pseudo) inverses (see eg [86])

The location of the constraints can be chosen in some circumstances for example during thedesign stage of a geodetic network For the special case of having a minimally constrained systemthe location of the constraint will not influence the w-test statistics and the sensitivity indicators (MIBand MDB) [18] However more constraints than the minimum necessary to have a solution (ie extraconstraints or redundant constraints) can change the least-squares residuals and hence w-test statisticsand the minimal biases

From the network with one extra constraint (2 constraints) in Figure 3(b) for example the bothfirst (height A) and fourth column (height D) of matrix A in 50 were eliminated in the case of having

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

17 of 40

the two heights as hard constraints For the case where these two heights (A and D) were taken as softconstraints however two observation equations were added to Equation 49 ie

y13 + e13 = hA σy13 gt 0

y14 + e14 = hD σy14 gt 0(52)

In the case of soft constraints in Equation 52 two lines were added in matrix A In other wordsA and D were taken as pseudo-observations In that case the rank deficiency was also null (ieuminus rank(A) = 7minus 7 = 0) the redundancy of the system was r = nminus rank(A) = nminus u = 7 and thematrix A was given as follows

A =

minus1 1 0 0 0 0 00 minus1 1 0 0 0 00 0 minus1 1 0 0 00 0 0 minus1 1 0 00 0 0 0 minus1 1 01 0 0 0 0 minus1 00 1 0 0 0 0 minus10 0 1 0 0 0 minus10 0 0 0 1 0 minus10 0 0 0 0 1 minus10 1 0 0 0 minus1 00 0 1 0 minus1 0 01 0 0 0 0 0 00 0 0 1 0 0 0

(53)

For this example of two soft constraints and by considering the both soft constraints withstandard-deviation σc = 10mm the symmetric and positive semi-definite covariance matrix of theobservations (Qe) was given as follows

Qe =

1 0 0 middot middot middot 0 00 1 0 middot middot middot 0 00 0 1 middot middot middot 0 0

0 0 0 middot middot middot 100 00 0 0 middot middot middot 0 100

(54)

The last two rows and columns of the matrix Qe in Equation 54 refer to the variances (σc2 =

(10mm)2 = 100mm2) of the heights constraints A and D respectivelySimilarly matrices A and Qe were constructed for the other cases studied hereAlthough other measurements are able to identify an outlier for the case of having only one single

soft constraint the pseudo-observation (constraint) is not In that case the defect configurationis associated with the additional parameter in the constraint (ie the presence of an outlier inthe constraint) In other words an additional parameter on the soft constraint will not estimableFor example if the height point G was taken as a soft constraint the presence of an outlier inpseudo-observation G would lead to rank deficiency of matrix A ie u minus rank(A) = 8 minus 7 = 1Therefore the case of having only one single soft constraint was not considered here

42 Result of the Hard Constraint Effects on the Iterative Outlier Elimination Procedure

In this section we present the results of the effects of the hard constraint on the IDS The scenariosin Figure 3(a) (network minimally constrained) Figure 3(b) (two hard constraints) and Figure 3(c)(three hard constraints) were considered for the analysis

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

18 of 40

The correlation coefficient between w-test statistics (ρwi wj ) were computed for the three scenariosaccording to Equation(20) Table 1 provides that correlation for that network minimally constrainedTable 2 for that network over-constrained with two hard constraints and Table 3 for that networkover-constrained with three hard constraints

Table 1 Correlation matrix of w-test statistics for the levelling network minimally constrained in Figure3(a)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12

wy1 10 02 01 01 02 10 -02 00 00 02 -04 -01wy2 10 02 02 03 02 05 -05 -02 02 03 -03wy3 10 10 02 01 00 02 -02 00 01 04wy4 10 02 01 00 02 -02 00 01 04wy5 10 02 -02 02 05 -05 03 -03wy6 10 -02 00 00 02 -04 -01wy7 10 -03 -03 -04 -04 -01wy8 10 -04 -03 -01 -04wy9 10 -03 01 04wy10 10 04 01wy11 10 -01wy12 100

Table 2 Correlation matrix of w-test statistics for the levelling network with two hard constraints inFigure 3(b)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12

wy1 10 04 04 -03 -01 04 -03 01 01 01 -04 -01wy2 10 04 -01 01 -01 04 -04 -01 01 03 -03wy3 10 04 -01 -03 -01 03 -01 -01 01 04wy4 10 04 04 01 01 -03 01 01 04wy5 10 04 -01 01 04 -04 03 -03wy6 10 -01 -01 -01 03 -04 -01wy7 10 -04 -03 -04 -04 -01wy8 10 -04 -03 -01 -04wy9 10 -04 01 04wy10 10 04 01wy11 10 -01wy12 10

Table 3 Correlation matrix of w-test statistics for the levelling network with three hard constraints inFigure 3(c)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12

wy1 10 03 01 -01 -01 01 -04 -01 -01 -01 -03 -01wy2 10 03 -01 01 -01 03 -03 -01 01 03 -03wy3 10 01 -01 -01 01 04 01 01 01 03wy4 10 03 01 -01 -01 -04 -01 01 03wy5 10 03 -01 01 03 -03 03 -03wy6 10 01 01 01 04 -03 -01wy7 10 -01 -01 -01 -03 -01wy8 10 -01 -01 -01 -03wy9 10 -01 01 03wy10 10 03 01wy11 10 -01wy12 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

19 of 40

Table 4 gives the local redundancy (ri) the standard-deviation of the LS-estimated outlier σnablai andthe maximum absolute correlation (maxρwi wj

) for each scenario of hard constraint set out of in thisstudy ie Figure 3(ab and c)

Table 4 Local redundancy (ri) standard-deviation of the LS-estimated outlier σnablai and the maximumabsolute correlation (maxρwi wj

) for each scenario of hard constraint

1 hard constraint 2 hard constraints 3 hard constraintsMeasurement ri σnablai maxρwi wj

ri σnablai maxρwi wjri σnablai maxρwi wj

y1 0396 1589 100 0583 1309 036 0708 1188 041y2 0500 1414 047 0583 1309 036 0583 1309 032y3 0396 1589 100 0583 1309 036 0708 1188 041y4 0396 1589 100 0583 1309 036 0708 1188 041y5 0500 1414 047 0583 1309 036 0583 1309 032y6 0396 1589 100 0583 1309 036 0708 1188 041y7 0563 1333 047 0583 1309 036 0708 1188 041y8 0563 1333 047 0583 1309 036 0708 1188 041y9 0563 1333 047 0583 1309 036 0708 1188 041y10 0563 1333 047 0583 1309 036 0708 1188 041y11 0583 1309 043 0583 1309 036 0583 1309 032y12 0583 1309 043 0583 1309 036 0583 1309 032

Here the twelve levelling measurements were clustered into four clusters The clustering wasdefined according to the local redundancy (ri) and the maximum absolute correlation (maxρwi wj

) inTable 4 This is the first time that a clustering technique based on the similarity of local redundancyand the maximum absolute correlation between w-test statistics is applied to a problem of geodeticnetworks Similarly this has been done in [45] The four cluster were defined as follows

bull Cluster 1 y1 y3 y4 and y6bull Cluster 2 y2 and y5bull Cluster 3 y7 y8 y9 and y10bull Cluster 4 y11 and y12

The probability levels associated with IDS were averaged for each of these clusters Here we computethe probability levels of the IDS based on the procedure in Section (3) [45] The critical values werek = 389 k = 393 and k = 393 for 1 hard constraint 2 hard constraints and 3 hard constraintsrespectively These critical values were found for αprime = 0001 according to the procedure described inAppendix The probability of correct identification (PCI) and correct detection (PCD = 1minusPMD)are displayed in Figure 4 for each number of hard constraint (denoted by hc)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

20 of 40

(c) (d)

(e) (f)

(g) (h)

(a) (b)

Figure 4 Probability of correct identification (PCI) and Probability of correct detection (PCD) for thecase of hard constraints and for αprime = 0001 Cluster 1(ab) Cluster 2(cd) Cluster 3(ef) and Cluster4(gh)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

21 of 40

The outlier magnitude were defined from |5σ| to |9σ| The outlier of |5σ| was chosen because itis approximately the lowest MDB0(i) of the network when a single hypothesis testing is in play ascan be seen in Equation (14) or Equation (16) That MDB0(i) of |5σ| was computed for significancelevel of αprime = 0001 and a power of the test γ0 = 08 This strategy reduces the search space for a MIB(Minimal Identifiable Bias) because we will always have the following inequality MIB ge MDB0(i)[4755] Remember that the IDS procedure is an example of multiple hypothesis testing (see 2)

The sensitivity indicators MDB and MIB for IDS were also computed according to Equation (33)and (34) respectively The success rate for outlier detection and outlier identification were taken asbeing PCD = PCI = 08 respectively Table 5 provides the values of MDB and MIB for that case ofhard constraints

Table 5 MDB and MIB for the case of hard constraints based on αprime = 0001 and PCD = PCI = 08

1 hard constraint 2 hard constraints 3 hard constraintsCluster MDB (σ) MIB (σ) MDB (σ) MIB (σ) MDB (σ) MIB (σ)

1 75 - 63 63 57 572 67 68 63 64 63 643 64 64 63 63 58 584 64 64 64 64 64 64

Figure 5 shows the probability of wrong exclusion (PWE) The over-identification cases (Pover+

and Poverminus) were smaller than 0001 (ie they were practically null) There were not statistical overlap(Pol) for clusters 2 3 and 4 We will discuss more about statistical overlap (Pol) later

(c) (d)

(a) (b)

Figure 5 Probability of wrong exclusion (PWE) for the case of hard constraints and for αprime = 0001Cluster 1(a) Cluster 2(b) Cluster 3(c) and Cluster 4(d)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

22 of 40

43 Result of the Soft Constraint Effects on the Iterative Outlier Elimination Procedure

The both configuration in Figure 3(d) and Figure 3(e) were analysed in terms of soft constraintsIn that case the critical values were k = 395 k = 395 and k = 392 for two soft constraints withσc = 01mm σc = 1mm and σc = 10mm respectively In the case of three soft constraints the criticalvalues found were k = 399 k = 399 and k = 396 for σc = 01mm σc = 1mm and σc = 10mmrespectively All these critical values were computed for αprime = 0001

The correlation coefficient between w-test statistics (ρwi wj ) are displayed in Table 6 Table 7and Table 8 for two soft constraints with standard-deviation 10 times larger than measurements(ie σc = 10times σ = 10mm) 10 times better than measurements (ie σc = 01mm) and equal to themeasurements (σc = 1mm) respectively

Table 6 Correlation matrix of w-test statistics for the levelling network with two soft constraints withσc = 10mm in Figure 3(d)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14

wy1 10 02 01 01 02 10 -02 00 00 02 -04 -01 006 -01wy2 10 02 02 03 02 05 -05 -02 02 03 -03 003 00wy3 10 10 02 01 00 02 -02 00 01 04 01 -01wy4 10 02 01 00 02 -02 00 01 04 -01 01wy5 10 02 -02 02 05 -05 03 -03 00 00wy6 10 -02 00 00 02 -04 -01 -01 01wy7 10 -03 -03 -04 -04 -01 00 00wy8 10 -04 -03 -01 -04 00 00wy9 10 -03 01 04 00 00wy10 10 04 01 00 00wy11 10 -01 00 00wy12 10 00 00wy13 10 -10wy14 10

Table 7 Correlation matrix of w-test statistics for the levelling network with two soft constraints withσc = 01mm in Figure 3(d)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14

wy1 10 04 04 -03 -01 04 -03 01 01 01 -04 -01 06 -06wy2 10 04 -01 02 -01 04 -04 -01 01 03 -03 04 -04wy3 10 04 -01 -03 -01 03 -01 -01 01 04 06 -06wy4 10 04 04 01 01 -03 01 01 04 -06 06wy5 10 04 -01 01 04 -04 03 -03 -04 04wy6 10 -01 -01 -01 03 -04 -01 -06 06wy7 10 -04 -03 -04 -04 -01 -02 02wy8 10 -04 -03 -01 -04 02 -02wy9 10 -04 01 04 02 -02wy10 10 04 01 -02 02wy11 10 -01 00 00wy12 10 00 00wy13 10 -10wy14 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

23 of 40

Table 8 Correlation matrix of w-test statistics for the levelling network with two soft constraints withσc = 1mm in Figure 3(d)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14

wy1 10 03 02 -01 01 07 -03 00 01 02 -04 -01 04 -04wy2 10 03 01 03 01 04 -04 -01 01 03 -03 03 -03wy3 10 07 01 -01 00 03 -02 -01 01 04 04 -04wy4 10 03 02 01 02 -03 00 01 04 -04 04wy5 10 03 -01 01 04 -04 03 -03 -03 03wy6 10 -02 -01 00 03 -04 -01 -04 04wy7 10 -03 -03 -04 -04 -01 -01 01wy8 10 -04 -03 -01 -04 01 -01wy9 10 -03 01 04 01 -01wy10 10 04 01 -01 01wy11 10 -01 00 00wy12 10 00 00wy13 10 -10wy14 10

Table 9 gives the local redundancy (ri) the standard-deviation of the LS-estimated outlier σnablai andthe maximum absolute correlation (maxρwi wj

) for the scenarios of two constraints

Table 9 Local redundancy (ri) standard-deviation of the LS-estimated outlier σnablai (mm) and themaximum absolute correlation (maxρwi wj

) for each scenario of two soft constraints

σc = 01mm σc = 1mm σc = 10mmMeasurement ri σnablai maxρwi wj

ri σnablai maxρwi wjri σnablai maxρwi wj

y1 0581 1312 0564 0471 1457 0681 0397 1587 0994y2 0582 1311 0376 0533 1369 0423 0501 1413 0471y3 0581 1312 0564 0471 1457 0681 0397 1587 0994y4 0581 1312 0564 0471 1457 0681 0397 1587 0994y5 0582 1311 0376 0533 1369 0423 0501 1413 0471y6 0581 1312 0564 0471 1457 0681 0397 1587 0994y7 0583 1310 0359 0571 1324 0423 0563 1333 0471y8 0583 1310 0359 0571 1324 0423 0563 1333 0471y9 0583 1310 0359 0571 1324 0423 0563 1333 0471y10 0583 1310 0359 0571 1324 0423 0563 1333 0471y11 0583 1309 0358 0583 1309 0398 0583 1309 0433y12 0583 1309 0358 0583 1309 0398 0583 1309 0433y13 0007 1163 1000 0300 1826 1000 0497 14189 1000y14 0007 1163 1000 0300 1826 1000 0497 14189 1000

From Table 9 five clusters were defined for each case of two soft constraints ie for the casewhere heights A and D were given as soft constraints in Figure 3(d) as follows

bull Cluster 1 y1 y3 y4 and y6bull Cluster 2 y2 and y5bull Cluster 3 y7 y8 y9 and y10bull Cluster 4 y11 and y12bull Cluster 5 y13 and y14

The probabilities of correct identification (PCI) and correct detection (PCD) for the measurements(Cluster 1 to Cluster 4) subject to the scenarios of two soft constraints (heights A and D) are displayedin Figure 6

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

24 of 40

(c) (d)

(e) (f)

(g) (h)

(a) (b)

Figure 6 Probability of correct identification (PCI) and Probability of correct detection (PCD) for themeasurements subject to the scenarios of two soft constraints for αprime = 0001 Cluster 1(ab) Cluster2(cd) Cluster 3(ef) and Cluster 4(gh)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

25 of 40

Note that the Cluster 5 is associated with the two soft constraints (ie y13 and y14) The probabilityof correct identification (PCI) for these both soft constraints were null However the probability ofcorrect detection were not Figure 7 shows the probability of correct detection (PCD) for these two softconstraints (ie heights A and D)

Figure 7 Probability of correct detection PCD for the two soft constraints (Cluster 5 heights A and D)and for αprime = 0001

The probability of wrong exclusion PWE for the measurements (Cluster 1 to Cluster 4) subject tothe scenarios of two soft constraints (heights A and D) are displayed in Figure 8 Figure 9 gives the theprobability of wrong exclusion PWE for that two constraints (ie heights A and D)

(c) (d)

(a) (b)

Figure 8 Probability of wrong exclusion PWE for the measurements subject to the scenarios of two softconstraints for αprime = 0001 Cluster 1(a) Cluster 2(b) Cluster 3(c) and Cluster 4(d)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

26 of 40

Figure 9 Probability of wrong exclusion PWE for the two soft constraints (Cluster 5 heights A and D)and for αprime = 0001

The over-identification cases (Pover+ and Poverminus) and the statistical overlap (Pol) were practicallynull for that case

The sensitivity indicators (MDB and MIB) for each scenario of two soft constraints are displayedin Table 10

Table 10 MDB and MIB for the case of two soft constraints based on αprime = 0001 and PCD = PCI = 08

σc = 10mm σc = 1mm σc = 01mmCluster MDB (σ) MIB (σ) MDB (σ) MIB (σ) MDB (σ) MIB (σ)

1 75 25 7 71 63 632 68 68 66 66 63 633 64 64 64 64 63 634 63 63 63 63 63 635 68 - 88 - 57 -

As with the case of two soft constraints Tables 1112 and 13 show the correlations (ρwi wj ) for thecase where there were three soft constraints ie for the network configuration in Figure 3(e)

Table 11 Correlation matrix of w-test statistics for the levelling network with three soft constraintswith σc = 10mm in Figure 3(e)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14 wy15

wy1 10 02 01 01 02 10 -02 00 00 02 -04 -01 01 00 00wy2 10 02 02 03 02 05 -05 -02 02 03 -03 00 00 00wy3 10 10 02 01 00 02 -02 00 01 04 00 -01 00wy4 10 02 01 00 02 -02 00 01 04 00 01 00wy5 10 02 -02 02 05 -05 03 -03 00 00 00wy6 10 -02 00 00 02 -04 -01 -01 00 00wy7 10 -03 -03 -04 -04 -01 00 00 00wy8 10 -04 -03 -01 -04 00 00 00wy9 10 -03 01 04 00 00 00wy10 10 04 01 00 00 00wy11 10 -01 00 00 00wy12 10 00 00 00wy13 10 -05 -05wy14 10 -05wy15 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

27 of 40

Table 12 Correlation matrix of w-test statistics for the levelling network with three soft constraintswith σc = 01mm in Figure 3(e)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14 wy15

wy1 10 03 01 -01 -01 01 -04 -01 -01 -01 -03 -01 07 -01 -04wy2 10 03 -01 01 -01 03 -03 -01 01 03 -03 03 -03 00wy3 10 01 -01 -01 01 04 01 01 01 03 01 -07 04wy4 10 03 01 -01 -01 -04 -01 01 03 -01 07 -04wy5 10 03 -01 01 03 -03 03 -03 -03 03 00wy6 10 01 01 01 04 -03 -01 -07 01 04wy7 10 -01 -01 -01 -03 -01 -04 -01 04wy8 10 -01 -01 -01 -03 -01 -04 04wy9 10 -01 01 03 -01 -04 04wy10 10 03 01 -04 -01 04wy11 10 -01 00 00 00wy12 10 00 00 00wy13 10 -02 -06wy14 10 -06wy15 10

Table 13 Correlation matrix of w-test statistics for the levelling network with three soft constraintswith σc = 1mm in Figure 3(e)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14 wy15

wy1 10 03 01 00 01 06 -03 00 00 01 -04 -01 05 -02 -02wy2 10 03 01 03 01 04 -04 -01 01 03 -03 02 -02 00wy3 10 06 01 00 00 03 -01 00 01 04 02 -05 02wy4 10 03 01 00 01 -03 00 01 04 -02 05 -02wy5 10 03 -01 01 04 -04 03 -03 -02 02 00wy6 10 -01 00 00 03 -04 -01 -05 02 02wy7 10 -03 -02 -03 -04 -01 -02 00 02wy8 10 -03 -02 -01 -04 00 -02 02wy9 10 -03 01 04 00 -02 02wy10 10 04 01 -02 00 02wy11 10 -01 00 00 00wy12 10 00 00 00wy13 10 -04 -05wy14 10 -05wy15 10

Table 14 Local redundancy (ri) standard-deviation of the LS-estimated outlier σnablai (mm) and themaximum absolute correlation (maxρwi wj

) for each scenario of three soft constraints

σc = 01mm σc = 1mm σc = 10mmMeasurement ri σnablai maxρwi wj

ri σnablai maxρwi wjri σnablai maxρwi wj

y1 0702 1194 0660 0502 1411 0577 0398 1586 0992y2 0582 1311 0326 0533 1369 0412 0501 1413 0470y3 0702 1194 0660 0502 1411 0577 0398 1586 0992y4 0702 1194 0660 0502 1411 0577 0398 1586 0992y5 0582 1311 0326 0533 1369 0412 0501 1413 0470y6 0702 1194 0660 0502 1411 0577 0398 1586 0992y7 0704 1192 0415 0602 1289 0412 0563 1333 0470y8 0704 1192 0415 0602 1289 0412 0563 1333 0470y9 0704 1192 0415 0602 1289 0412 0563 1333 0470y10 0704 1192 0415 0602 1289 0412 0563 1333 0470y11 0583 1309 0326 0583 1309 0385 0583 1309 0433y12 0583 1309 0326 0583 1309 0385 0583 1309 0433y13 0012 0904 0660 0425 1534 0542 0663 12283 0501y14 0012 0904 0660 0425 1534 0542 0663 12283 0501y15 0019 0718 063 05 1414 0542 0665 12268 0501

The probabilities of correct identification (PCI) and correct detection PCD for the measurements(Cluster 1 to Cluster 4) subject to the scenarios of three soft constraints (heights A D and G) aredisplayed in Figure 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

28 of 40

(c) (d)

(e) (f)

(g) (h)

(a) (b)

Figure 10 Probability of correct identification (PCI) and Probability of correct detection (PCD) for themeasurements subject to the scenarios of three soft constraints for αprime = 0001 Cluster 1(ab) Cluster2(cd) Cluster 3(ef) and Cluster 4(gh)

The probabilities of correct identification (PCI) and correct detection (PCD) in Figure 8 werecomputed for the clusters based on Table 14 as follows

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

29 of 40

bull Cluster 1 y1 y3 y4 and y6bull Cluster 2 y2 and y5bull Cluster 3 y7 y8 y9 and y10bull Cluster 4 y11 and y12bull Cluster 5 y13 and y14bull Cluster 6 y15

Figure 11 shows the probabilities of correct identification PCI and correct detection PCD for thethree soft constraints ie for Cluster 5 (heights A and D) and Cluster 6 (height G) in Figure 3(e)

(c) (d)

(a) (b)

Figure 11 Probability of correct identification (PCI) and Probability of correct detection (PCD) for thethree constraints and for αprime = 0001 Cluster 5(ab) and Cluster 6(cd)

The probability of wrong exclusion PWE for the measurements (Cluster 1 to Cluster 4) subject tothe scenarios of three soft constraints (heights A D and G) are displayed in Figure 12 Figure 13 givesthe the probability of wrong exclusion PWE for that three constraints (ie heights A D and G)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

30 of 40

(c) (d)

(a) (b)

Figure 12 Probability of wrong exclusion PWE for the measurements subject to the scenarios of threesoft constraints and for αprime = 0001 Cluster 1(a) Cluster 2(b) Cluster 3(c) and Cluster 4(d)

(a) (b)

Figure 13 Probability of wrong exclusion (PWE) for the three constraints and for αprime = 0001 Cluster5(a) and Cluster 6(b)

The over-identification cases (Pover+ and Poverminus) and the statistical overlap (Pol) were alsopractically null for that case of three soft constraints

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

31 of 40

The sensitivity indicators (MDB and MIB) for each scenario of three soft constraints are displayedin Table 15

Table 15 MDB and MIB for the case of three soft constraints based on αprime = 0001 and PCD = PCI = 08

σc = 10mm σc = 1mm σc = 01mmCluster MDB (σ) MIB (σ) MDB (σ) MIB (σ) MDB (σ) MIB (σ)

1 75 22 68 69 58 592 68 69 66 67 64 643 64 64 63 63 58 584 63 63 63 63 63 635 59 60 74 75 435 456 59 59 69 69 346 355

5 Discussion

We start from the scenario of one hard constraint in Figure 3(a) Table 4 reveal that the maximumcorrelation between w-test statistics for the measurements constituting Cluster 1 is exactly equal to100 (ie maxρwi wj

= 100) This means that the measurements belonging to Cluster 1 are those thatare connected with unknown heights whose connections are limited to only two The both unknownheights A and D are tied only to two measurements (ie y1 and y6 linked to A and y3 and y4 linked toD) therefore if an outlier occurred in one of these measurements we would only be able to analyse theconsistency between them but we would not be able to distinguish which of them was contaminatedby an outlier This means that we would only be able to detect because the w-test statistics could belarger than a critical value k however in that case the values of w-test statistics would be the same andtherefore we would not have only one unique maximum w-test statistics but actually would have fourmaximum w-test statistics In other words the equation systems associated with the measurements ofCluster 1 are linearly dependent [87] Therefore there is no reliability in terms of outlier identificationfor the Cluster 1 as can be seen in Figure 3(a)

From Figure 3(b) we can note that there is reliability in terms of outlier detection for Cluster 1and it is caused by overlapping w-test statistics The probability of statistics overlap Pol for Cluster 1in this scenario of minimally constrained network is displayed in Figure 14

Figure 14 Probability of correct detection PCD and statistical overlap (Pol) for the Cluster 1 subject toone hard constraint and for αprime = 0001

The problem of not having more connections (ie more measurements) for the unknown heightsA and D in the case of one hard constraint with G fixed is overcome when these heights (A and D) aretaken as hard constraints in 3(b) or when the heights A D and G are hard constraints in figure 3(c)Figure 3(a b) show that the measurements of Cluster 1 are now able to identify an outlier when twohard constraints (A and D fixed) are in play It is also verified to the case of three hard constraints (A Dand G fixes) in Figure 3(e f) ie there is reliability in terms of both outlier detection and identificationfor these measurements in those conditions

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

32 of 40

One can argue that the more constraints the larger the probability of correct identification (PCI)and correct detection (PCD) This claim is not entirely correct This may be true in case where all pointson the network have the same number of connections However in the example given if both heightsB and G were selected as hard constraints the measurements of the Cluster 1 (ie y1 y6 y3 and y4)would still have their least-squares residuals fully correlated (ie ρwy1 wy6

= 100 and ρwy3 wy4= 100)

and therefore there would be no reliability in terms of outlier identification for these measurementsFrom Table 5 we observe different behaviour for the clusters as follows

bull Cluster 1 there was no MIB for the case of having only one single hard constraint whereasMDB = MIB for the other cases However both MDB and MIB decrease significantly with theincrease in the number of hard constraints

bull Cluster 2 MDB slightly smaller than MIB Both MDB and MIB were practically the same for thecase of having two or three hard constraints

bull Cluster 3 MDB = MIB for all cases of hard constraints however both MDB and MIB decreasesignificantly with the increase in the number of hard constraints

bull Cluster 4 MDB e MIB were equal for all cases

In terms of outlier detection and identification therefore Cluster 1 was more sensitive toconstraints Cluster 3 relatively sensitive to constraints whereas the Cluster 4 completely insensitiveto constraints and Cluster 2 relatively insensitive to constraints This is also can be seen in Figure 4The reason for this is that the local redundancy (ri) of the Cluster 1 increased with the increase of thenumber of hard constraints whereas the Cluster 4 have remained the same as can be seen in Table 4

Leaving aside the cases of statistical overlap (Pol) the network presents low least-squares residualscorrelation (ρwi wj lt 05) and high local redundancy (ri gt 05) Because of this the probabilities ofwrong exclusion (PWE) were less than 1 as can be seen in Figure 5 The over-identification cases(Pover+ and Poverminus) were practically null Consequently PCI asymp PCD Due of this fact the family-wiseerror rate (αprime) should be increased in order to have more success rate in the outlier detection andidentification [45]

From Figure 15 we observe that increasing the αprime increases the both probabilities of correctidentification (PCI) and correct detection (PCD) for outlier magnitude from 5σ to 6σ in the case ofthree hard constraints and from 5σ to 68σ in the case of two hard constraints Although the rates ofPover+ and PWE also increase they are not significant when compared to the improvement of correctidentification (PCI) and detection (PCD) This same analysis can be done for the other clusters

In terms of soft constraints for the cases of two constraints in Figure 3(d) we observe from Table 9that the larger the relaxation of the constraint (ie the larger the standard-deviation of the constraintσc) the larger the residuals correlation (ρwi wj ) and the standard-deviation of the outlier σnablai and thesmaller the local redundancy (ri) Consequently the probabilities of correct identification (PCI) anddetection (PCD) get smaller and smaller with the relaxation of the constraints whereas the probabilityof wrong exclusion gets larger (PWE) This can be more clearly verified in Figure 6(a b) and Figure8(a) for Cluster 1 whose measurements are connected with the constraints A and D (ie y13 and y14 inTable 9 respectively)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

33 of 40

(a) (b)

(c) (d)

Figure 15 Probabilities of correct identification PCI (a) correct detection PCD (b) over-identificationpositive Pover+ (c) and wrong exclusion PWE for the Cluster 1 subject to two and three hard constraintsand for αprime = 0001 and αprime = 01

Note from Figure 8(a) that the probability of wrong exclusion (PWE) increases as the magnitude ofthe outlier (nablai) increases However this is true up to a certain limit of outlier magnitude The effect ofresiduals correlation ρwi wj on the rates of wrong exclusion (PWE) and correct identification (PCI) tendsto decrease with the increase in the magnitude of the outlier nablai This effect is more clearly verified forCluster 1 in case where the precision of the constraints are ten times worse than the measurementsσc = 10σ = 10mm

Note from Figure 6 that identifying an outlier in the Cluster 1 (ie y1 y3 y4 and y6) whenσc = 10mm is more difficult than the other clusters This is due to the fact the Cluster 1 has a higherresiduals correlation ρwi wj = 0994 than other clusters In general therefore we observe that the largerthe relaxation of the constraints the larger is the effect of the correlation ρwi wj on the success rate ofoutlier identification (PCI) Consequently the higher the sensitivity indicator for outlier identification(MIB) Table 10 reveals that the ratio between MIB and MDB for the Cluster 1 and for the scenariowhere the standard-deviations of that two soft constraints are σc = 10mm is MIBMDB=2575=33On the other hand the relationship between MIB and MDB is practically one (ie MIBMDB=10) forthe others scenarios

If the FWE rate (αprime) were increased for the case where the two soft constraints of σc = 10mmare in play we would not have great advantages for Cluster 1 due to its high residuals correlation(ρwi wj = 994) From Figure 16 we can observe that the probability of correct identification (PCI)for outlier magnitudes from 5σ to 8σ is effectively larger to a user-defined αprime = 01 than αprime = 0001However the success rate is still less than 80 ie PCI lt 08 Note for example the correct

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

34 of 40

identification rate is PCI = 56 for an outlier magnitude of nablai = 8σ and αprime = 01 For αprime = 01 theMIB = 335σ = 335mm whereas for αprime = 0001 is MIB = 25σ = 25mm Therefore in that case theMIB for PCI = 08(80) and αprime = 01 would be 34 larger than user-defined αprime = 0001

Figure 16 Probabilities of correct identification (PCI) for the Cluster 1 subject to two soft constraints (2sc) A and D for αprime = 0001 and αprime = 01

The soft constraints A and D were grouped in the Cluster 5 (ie A and D were treated aspseudo-observations in the model) There is no reliability in terms of outlier identification for theconstraints because the residual correlation between them is ρwi wj = 100 as can be seen in Table 9for y13 and y14 However these soft constraints are able to detect an outlier In that case the probabilityof correct detection PCD in Figure 7 is mainly caused by the statistical overlap Pol as can be seen in theexample of the case of σc = 10mm in Figure 17 From Table 10 we observe that the larger the relaxationof the constraints the larger the MDB Note that the values of MDB are given in σ and therefore theMDB for σc = 10mm is larger than σc = 1mm and σc = 01mm ie we had the following inequalityMDB = 68σc = 68times 10mm = 68mm gt MDB = 88σc = 88times 1mm = 88mm gt MDB = 57σc =

57times 01mm = 57mm In that case if the FWE (αprime) were increased the rate of outlier detection by theCluster 4 (ie by the soft constraints) would increase

Figure 17 Probabilities of correct detection PCD and statistical overlap Pol for the two soft constraintsA and D (Cluster 5) with σc = 10mm and for αprime = 0001

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

35 of 40

Similar effects of the relaxation of the constraints on the performance of the IDS in case of two softconstraints are verified in case of three soft constraints as can be seen in the Figures 10 11 12 and 13

In case of having three soft constraints in Figure 3(e) there is reliability in terms of outlieridentification for the three pseudo-observations y13 y14 and y15 (ie for AD and G) as can be seen inFigure 10 and Table 15 In that case we also observe that the probabilities of correct detection PCD ofthe soft constraints A and D (ie Cluster 5) were approximately 13 for σc = 10mm 16 for σc = 1mmand 24 for σc = 01mm larger than the scenario of the network subject to two soft constraints Table15 reveals that the advantage of having three soft constraints instead of two constraints is that theconstraints become identifiable in the presence of an outlier The behaviour of the probabilities ofcorrect detection (PCD) correct identification (PCI) and wrong exclusion (PWE) was similar to the caseof the two soft constraints

Furthermore the larger the relaxation of the constraints the smaller the residuals correlationbetween the measurements and the soft constraints and the larger the residuals correlation amongthe measurements This effect can be verified in Tables 6 7 and 8 for two soft constraints as well as inTables 11 12 and 13 for three soft constraints

In general we observe that the case of two soft constraints for σc = 01mm was comparablewith two hard constraints (see eg Table 5 and Table 10) in terms of the probability levels associatedwith IDS for the measurements (ie clusters 1 2 3 and 4) In the same way for the case of twosoft constraints with σc = 1mm or σc = 10mm the probabilities levels were similar to the one hardconstraint for that measurements with the benefit of two soft constraints having reliability in terms ofoutlier identification for the Cluster 1 Finally the three soft constraints with σc = 1mm and σc = 10mmwere comparable to the two soft constraints for that scenario of constraints relaxation wheres thethree soft constraints for σc = 01mm showed similar outcomes with three hard constraints for themeasurements (see eg Table 5 and Table 15) In that case however an advantage of the three softconstraints on the three hard constraints is the possibility of analysing the sensitivity of the constraintsIt is important to emphasised that the stochastic models of the measurements and constraints wereassumed well-known and defined for the analyzes performed here

6 Conclusions

In general the probability of correct identification (PCI) for the case of hard constraints is largerthan soft constraints It can be verified in Figure 4 Figure 6 and Figure 10 Therefore hard constraintsshould be used in the stage of pre-processing data for the purpose of identifying and removing possibleoutlying measurements In that process one should opt to set out the redundant hard constraints atpoints in the network where the smallest connections exist After identifying and removing possibleoutliers the soft constraints should be employed to propagate the uncertainties of the constraints(pseudobservations) to the model parameters during the process of least-squares estimation Thisrecommendation is valid for outlier detection and identification purpose

We highlight the main findings of this research as follows

bull Under a system of a high local redundancy ri gt 05 and low residuals correlation (ρwi wj lt 05) ifone increase the family-wise error rate (FWE) of the test statistic the performance of the procedurewill be improved for both scenarios of hard constraints and soft constraintsbull The larger the relaxation of the constraints the larger is the effect of the residuals correlation

(ρwi wj ) on the success rate of outlier identification (PCI) Consequently the higher the sensitivityindicator for outlier identification (MIB) and therefore more difficult it becomes to identify anoutlier

bull Under a scenario of soft constraints one should set out at least three soft constraints in order toidentify an outlier in the constraints

In other types of analysis for example deformation analysis of geodetic networks one shouldformulated the constraints as pseudo-observations with σc gt 0 in an adjustment model of system

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

36 of 40

equations In other words the relaxation of constraints makes it possible to estimate deformationeffects that are unmodelled in the deterministic model matrix for example deformations which have aspatial and temporal variations In that case the deformation can be modelled by covariance functionsthat determine the cofactors between constraints [28] This should be applied to control points locatedin the structure but not to control points located outside the structure The control points locatedoutside the structure should be hard constraints and have their stability verified by an independentprocess in order to analyse the deformation of the structure So one will be also able to analyse thesensitivity in terms of minimal detectable deformations andor minimal identifiable deformations [23]Future researches will be addressed to sensitivity analysis for deformation monitoring networks

Author Contributions Conceptualization VFR MTM and IK Data curation VFR MTM and IK Formalanalysis VFR and IK Funding acquisition MTM MRV and LGdSJ Investigation VFR MethodologyVFR and MTM Project administration MTM IK MRV and LGdSJ Software VFR and LGdSJSupervision MTM IK and MRV Validation VFR and MRV Writing ndash original draft VFR Writing ndashreview amp editing VFR MTM IK and MRV All authors have read and agreed to the published version of themanuscript

Funding This research was funded by the CNPqmdashConselho Nacional de Desenvolvimento Cientiacutefico eTecnoloacutegicomdashBrasil (proc no 1035872019-5) This research and the APC were also funded by PETROBRAS(Grant Number 201800545-0)

Acknowledgments In this section you can acknowledge any support given which is not covered by the authorcontribution or funding sections This may include administrative and technical support or donations in kind(eg materials used for experiments)

Conflicts of Interest The authors declare no conflict of interest

References

1 Lehmann R Neitzel F Testing the compatibility of constraints for parameters of a geodetic adjustmentmodel Journal of Geodesy 2013 87 555ndash566 doi101007s00190-013-0627-2

2 Fang X Weighted total least-squares with constraints a universal formula for geodetic symmetricaltransformations Journal of Geodesy 2015 89 459ndash469 doi101007s00190-015-0790-8

3 Amiri-Simkooei AR Weighted Total Least Squares with Singular Covariance Matrices Subject to Weightedand Hard Constraints J Surv Eng 2017 143 04017018

4 Sakic P Ballu V Royer JY A Multi-Observation Least-Squares Inversion for GNSS-Acoustic SeafloorPositioning Remote Sensing 2020 12 doi103390rs12030448

5 Odijk D Stochastic modelling of the ionosphere for fast GPS ambiguity resolution Geodesy Beyond 2000Schwarz KP Ed Springer Berlin Heidelberg Berlin Heidelberg 2000 pp 387ndash392

6 de Oliveira PS Morel L Fund F Legros R Monico JFG Durand S Durand F Modelingtropospheric wet delays with dense and sparse network configurations for PPP-RTK GPS Solutions 201721 237ndash250 doi101007s10291-016-0518-0

7 Chen B Wu L Dai W Luo X Xu Y A new parameterized approach for ionospheric tomography GPSSolutions 2019 23 96 doi101007s10291-019-0893-4

8 Ahmed E Ahmed S Mohamed D Mostafa R jag 2019 Vol 13 chapter Vertical ionospheric delayestimation for single-receiver operation p 81 2 doi101515jag-2018-0041

9 Wang A Chen J Zhang Y Meng L Wang J Performance of Selected Ionospheric Models inMulti-Global Navigation Satellite System Single-Frequency Positioning over China Remote Sensing 201911 doi103390rs11172070

10 Deo M El-Mowafy A Comparison of advanced troposphere models for aiding reductionof PPP convergence time in Australia Journal of Spatial Science 2019 64 381ndash403doi1010801449859620181472046

11 Dai X Lou Y Dai Z Hu C Peng Y Qiao J Shi C Precise Orbit Determination for GNSS ManeuveringSatellite with the Constraint of a Predicted Clock Remote Sensing 2019 11 doi103390rs11161949

12 Courant R Hilbert D Methods of mathematical physics Vol 2 Vol 2 Wiley [New York] 196213 Grafarend EW Optimization of Geodetic Networks The Canadian Surveyor 1974 28 716ndash723

doi101139tcs-1974-0120

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

37 of 40

14 Teunissen P Zero Order Design Generalized Inverses Adjustment the Datum Problem andS-Transformations Optimization and Design of Geodetic Networks Grafarend EW Sansograve F EdsSpringer Berlin Heidelberg Berlin Heidelberg 1985 pp 11ndash55

15 Dermanis A Free network solutions with the DLT method ISPRS Journal of Photogrammetry and RemoteSensing 1994 49 2 ndash 12 doihttpsdoiorg1010160924-2716(94)90061-2

16 Kotsakis C Reference frame stability and nonlinear distortion in minimum-constrained networkadjustment Journal of Geodesy 2012 86 755ndash774 doi101007s00190-012-0555-6

17 Kotsakis C Datum Definition and Minimal Constraints In Encyclopedia of Geodesy Grafarend E EdSpringer International Publishing Cham 2018 pp 1ndash6 doi101007978-3-319-02370-0_157-1

18 Matsuoka MT Rofatto VF Klein I Roberto Veronez M da Silveira LG Neto JBS Alves ACRControl Points Selection Based on Maximum External Reliability for Designing Geodetic Networks AppliedSciences 2020 10 doi103390app10020687

19 Baarda W S-transformations and criterion matrices Publ on geodesy New Series 1973 520 Schaffrin B Aspects of Network Design Optimization and Design of Geodetic Networks Grafarend

EW Sansograve F Eds Springer Berlin Heidelberg Berlin Heidelberg 1985 pp 548ndash59721 Xu PB A general solution in geodetic nonlinear rank-defect models 199722 Altamimi Z Dermanis A The Choice of Reference System in ITRF Formulation VII Hotine-Marussi

Symposium on Mathematical Geodesy Sneeuw N Novaacutek P Crespi M Sansograve F Eds Springer BerlinHeidelberg Berlin Heidelberg 2012 pp 329ndash334

23 Velsink H On the deformation analysis of point fields J Geod 2015 89 1071ndash1087doi101007s00190-015-0835-z

24 Velsink H Extendable linearised adjustment model for deformation analysis Survey Review 201547 397ndash410 doi1011791752270614Y0000000140

25 Hiddo V jag 2016 Vol 10 chapter Time Series Analysis of 3D Coordinates Using NonstochasticObservations p 5 1 doi101515jag-2015-0027

26 Rao CR Markoffrsquos Theorem with Linear Restrictions on Parameters SankhyAuml The Indian Journal ofStatistics (1933-1960) 1945 7 16ndash19

27 Velsink H Testing Methods for Adjustment Models with Constraints Journal of Surveying Engineering2018 144 04018009 doi101061(ASCE)SU1943-54280000260

28 Hiddo V Testing deformation hypotheses by constraints on a time series of geodetic observations J ApplGeod 2017 12 77ndash93 doi101515jag-2017-0028

29 Baarda W A testing procedure for use in geodetic networks Publ on geodesy New Series 1968 230 Teunissen P First and second moments of non-linear least-squares estimators Bull Geodesique (Journal of

Geodesy) 1989 63 253ndash262 doihttpsdoiorg101007BF0252047531 Teunissen P Testing Theory an introduction 2 ed Delft University Press 200632 Amiri-Simkooei A Least-squares variance component estimation theory and GPS applications Phd

thesis Delft University of Technology 200733 Koch KR Parameter estimation and hypothesis testing in linear models 2 ed Springer 199934 Tao H Feng H Xu L Miao M Long H Yue J Li Z Yang G Yang X Fan L Estimation

of Crop Growth Parameters Using UAV-Based Hyperspectral Remote Sensing Data Sensors 2020 20doi103390s20051296

35 Han M Wang Q Wen Y He M He X The Application of Robust Least Squares Method inFrequency Lock Loop Fusion for Global Navigation Satellite System Receivers Sensors 2020 20doi103390s20041224

36 Wei C Estimation for the Discretely Observed CoxacircldquoIngersollacircldquoRoss Model Driven by Small SymmetricalStable Noises Symmetry 2020 12 doi103390sym12030327

37 Sakic P Ballu V Royer JY A Multi-Observation Least-Squares Inversion for GNSS-Acoustic SeafloorPositioning Remote Sensing 2020 12 doi103390rs12030448

38 Livadiotis G General Fitting Methods Based on Lq Norms and their Optimization Stats 2020 3 16ndash31doi103390stats3010002

39 Farooq SZ Yang D Ada ENJ A Cycle Slip Detection Framework for Reliable Single Frequency RTKPositioning Sensors 2020 20 doi103390s20010304

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

38 of 40

40 Araveeporn A Comparing Parameter Estimation of Random Coefficient Autoregressive Model byFrequentist Method Mathematics 2020 8 doi103390math8010062

41 Zhang C Peng T Zhou J Ji J Wang X An Improved Autoencoder and Partial Least SquaresRegression-Based Extreme Learning Machine Model for Pump Turbine Characteristics Applied Sciences2019 9 doi103390app9193987

42 Ji J Yang M Jiang L He J Teng Z Liu Y Song H Output-Only Parameters Identification ofEarthquake-Excited Building Structures with Least Squares and Input Modification Process AppliedSciences 2019 9 doi103390app9040696

43 Kargoll B On the theory and application of model misspecification tests in geodesy Doctoral thesisUniversity of Bonn Landwirtschaftliche Fakultaumlt German Bonn 2007

44 Lehmann R On the formulation of the alternative hypothesis for geodetic outlier detection J Geod 201387 373ndash386 doi101007s00190-012-0607-y

45 Rofatto VF Matsuoka MT Klein I Roberto Veronez M da Silveira LG A Monte Carlo-Based OutlierDiagnosis Method for Sensitivity Analysis Remote Sensing 2020 12 doi103390rs12050860

46 Lehmann R Improved critical values for extreme normalized and studentized residuals in GaussndashMarkovmodels J Geod 2012 86 1137ndash1146 doi101007s00190-012-0569-0

47 Rofatto VF Matsuoka MT Klein I Veronez MR Bonimani ML Lehmann R A half-century ofBaardarsquos concept of reliability a review new perspectives and applications Surv Rev 2018 0 1ndash17[httpsdoiorg1010800039626520181548118] doi1010800039626520181548118

48 Zaminpardaz S Teunissen P DIA-datasnooping and identifiability J Geod 2019 93 85ndash101doi101007s00190-018-1141-3

49 Kok JJ States U On data snooping and multiple outlier testing [microform] Johan J Kok US Dept ofCommerce National Oceanic and Atmospheric Administration National Ocean Service Charting andGeodetic Services For sale by the National Geodetic Information Center NOAA Rockville Md 1984 p61 p

50 Knight NL Wang J Rizos C Generalised measures of reliability for multiple outliers Journal of Geodesy2010 84 625ndash635 doi101007s00190-010-0392-4

51 Gui Q Li X Gong Y Li B Li G A Bayesian unmasking method for locating multiplegross errors based on posterior probabilities of classification variables J Geod 2011 85 191ndash203doi101007s00190-010-0429-8

52 Klein I Matsuoka MT Guzatto MP Nievinski FG An approach to identify multipleoutliers based on sequential likelihood ratio tests Surv Rev 2017 49 449ndash457[httpsdoiorg1010800039626520161212970] doi1010800039626520161212970

53 Hawkins DM Identification of Outliers 1 ed Springer Netherlands 1980 httpsdoiorg101007978-94-015-3994-4

54 Baarda W Statistical concepts in geodesy Publ on geodesy New Series 1967 255 Imparato D Teunissen P Tiberius C Minimal Detectable and Identifiable Biases for quality

control Surv Rev 2019 51 289ndash299 [httpsdoiorg1010800039626520181437947]doi1010800039626520181437947

56 Teunissen PJG Distributional theory for the DIA method Journal of Geodesy 2018 92 59ndash80doi101007s00190-017-1045-7

57 Foumlrstner W Reliability and discernability of extended Gauss-Markov models Seminar on mathematicalmodels of GeodeticPhotogrammetric Point Determination with Regard to Outliers and Systematic Errors 1983 Vol Series A pp 79ndash103

58 Wang J Knight NL New Outlier Separability Test and Its Application in GNSS Positioning 201259 Lehmann R Loumlsler M Multiple Outlier Detection Hypothesis Tests versus Model Selection by

Information Criteria J Surv Eng 2016 142 0401601760 Teunissen PJG Imparato D Tiberius CCJM Does RAIM with Correct Exclusion Produce Unbiased

Positions Sensors 2017 17 doi103390s1707150861 Lehmann R Loumlsler M Congruence analysis of geodetic networks ndash hypothesis tests versus model

selection by information criteria J Appl Geod 2017 11 271ndash283 doi101515jag-2016-0049

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

39 of 40

62 Rofatto V Matsuoka M Klein I DESIGN OF GEODETIC NETWORKS BASED ON OUTLIERIDENTIFICATION CRITERIA AN EXAMPLE APPLIED TO THE LEVELING NETWORK Bull Geod Sci2018 24 152ndash170 httpdxdoiorg101590s1982-21702018000200011

63 Nguyen VK Renault Milocco R Environment Monitoring for Anomaly Detection System UsingSmartphones Sensors 2019 19 doi103390s19183834

64 Nie Y Yang L Shen Y Specific Direction-Based Outlier Detection Approach for GNSS Vector NetworksSensors 2019 19 doi103390s19081836

65 Klein I Matsuoka MT Guzatto MP Nievinski FG Veronez MR Rofatto VF A newrelationship between the quality criteria for geodetic networks Journal of Geodesy 2019 93 529ndash544doi101007s00190-018-1181-8

66 Mierlo JV Statistical Analysis of Geodetic Measurements for the Investigation of Crustal Movements InRecent Crustal Movements 1977 Whitten C Green R Meade B Eds Elsevier 1979 Vol 13 Developmentsin Geotectonics pp 457 ndash 467 doihttpsdoiorg101016B978-0-444-41783-150072-6

67 van der Marel H Roumlsters AJM Statistical Testing and Quality Analysis in 3-D Networks (part II)Application to GPS Global Positioning System An Overview Bock Y Leppard N Eds Springer NewYork New York NY 1990 pp 290ndash297

68 Yang L Wang J Knight NL Shen Y Outlier separability analysis with a multiple alternative hypothesestest J Geod 2013 87 591ndash604 doi101007s00190-013-0629-0

69 Proacuteszynski W Revisiting Baardarsquos concept of minimal detectable bias with regard to outlier identifiabilityJ Geod 2015 89 993ndash1003 doi101007s00190-015-0828-y

70 Romano JP Wolf M Multiple Testing of One-Sided Hypotheses Combining Bonferroni and the BootstrapPredictive Econometrics and Big Data Kreinovich V Sriboonchitta S Chakpitak N Eds SpringerInternational Publishing Cham 2018 pp 78ndash94

71 Teunissen P An Integrity and Quality Control Procedure for use in Multi Sensor Integration 199072 Aydin C Demirel H Computation of Baardarsquos lower bound of the non-centrality parameter J Geod

2004 78 437ndash441 doi101007s00190-004-0406-173 Aydin C Power of Global Test in Deformation Analysis J Surv Eng 2012 138 51ndash56 https

doiorg101061(ASCE)SU1943-5428000006474 Bonferroni C Teoria statistica delle classi e calcolo delle probabilitagrave Pubblicazioni del R Istituto superiore di

scienze economiche e commerciali di Firenze Libreria internazionale Seeber 193675 Box GEP Muller ME A Note on the Generation of Random Normal Deviates The Annals of Mathematical

Statistics 1958 29 610ndash611 httpsdoi101214aoms117770664576 Lemeshko BY Lemeshko SB Extending the Application of Grubbs-Type Tests in Rejecting Anomalous

Measurements Measurement Techniques 2005 48 536ndash547 doi101007s11018-005-0179-977 Lehmann R Scheffler T Monte Carlo based data snooping with application to a geodetic network J

Appl Geod 2011 5 123ndash134 httpsdoiorg101515JAG201101478 Matsumoto M Nishimura T Mersenne twister A 623-dimensionally equidistributed uniform

pseudo-random number generator ACM Transactions on Modeling and Computer Simulation 1998 8 3ndash30httpsdlacmorgcitationcfmid=272991

79 Ihde J Saacutenchez L Barzaghi R Drewes H Foerste C Gruber T Liebsch G Marti U Pail R SiderisM Definition and Proposed Realization of the International Height Reference System (IHRS) Surveys inGeophysics 2017 38 549ndash570 doi101007s10712-017-9409-3

80 Lehmann R Observation error model selection by information criteria vs normality testing StudiaGeophysica et Geodaetica 2015 59 489ndash504 doi101007s11200-015-0725-0

81 Algarni DA Ali AE Heighting and Distance Accuracy with Electronic Digital Levels Journal of King SaudUniversity - Engineering Sciences 1998 10 229 ndash 239 doihttpsdoiorg101016S1018-3639(18)30698-6

82 Gemin ARS Matos AS Faggion PLAs APPLICATION OF CALIBRATION CERTIFICATE OFDIGITAL LEVELING SYSTEMS IN THE MONITORING OF STRUCTURES A CASE STUDY AT THEGOVERNADOR JOSAtilde RICHA HYDROELECTRIC POWER PLANT - PR Boletim de CiAtildeGeodAtilde 201824 235 ndash 249

83 Takalo M Rouhiainen P Development of a System Calibration Comparator for Digital Levels in FinlandNordic Journal of Surveying and Real Estate Research 1 1

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

40 of 40

84 Wiedemann W Wagner A Wunderlich T Using IATS to Read and Analyze Digital Levelling Staffs SIG2016 Paar R Marendic A Zrinjski M Eds Croatian Geodetic Society 2016 pp 515ndash526

85 PJG T The Geometry of Geodetic Inverse Linear Mapping and Non-linear Adjustment Publications onGeodesy New Series 1985 8

86 Rao CR Mitra SK Generalized inverse of a matrix and its applications 197287 Hekimoglu S Erenoglu RC Sanli DU Erdogan B Detecting Configuration Weaknesses in Geodetic

Networks Survey Review 2011 43 713ndash730 doi101179003962611X13117748892632

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

Page 6: Effects of Hard and Soft Equality Constraints on

6 of 40

The decision rule for this case is given by

Accept H0 i f max-w le k

Otherwise

Accept H(i)A i f max-w gt k

(11)

The decision rule in Equation 11 says that if none of the n w-tests get rejected then we accept thenull hypothesisH0 If the null hypothesisH0 is rejected in any of the n tests then one can only assumethat detection occurred In other words if the max-w is larger than some percentile of its probabilitydistribution (ie some critical value k) then there is evidence that there is an outlier in the datasetTherefore outlier detection only informs us whether the null hypothesisH0 is accepted or not [45]

However the detection does not tell us which alternative hypothesisH(i)A would have led to the

rejection of the null hypothesisH0 The localisation of the alternative hypothesis which would haverejected the null hypothesis is a problem of outlier identification Outlier identification implies theexecution of a search among the measurements for the most likely outlier In other words one seeks tofind which of Baardarsquos w-test is the maximum absolute value max-w and if that max-w is greater thansome critical value k

Therefore the data snooping procedure of screening measurements for possible outliers is actuallyan important case of multiple hypothesis testing and not single hypothesis testing Moreover note thatoutlier identification only happens when outlier detection necessarily exists ie ldquooutlier identificationrdquoonly occurs when the null hypothesisH0 is rejected However correct detection does not necessarilyimply correct identification [475568]

In a special case of having only one single alternative hypothesis one should decide betweenthe null hypothesis H0 and only one single alternative hypothesis H(i)

A of Equation (8) In that casethe false decisions are restricted to Type I error and Type II error The probability of a Type I Errorα0 is the probability of rejecting the null hypothesis H0 when it is true whereas the probability of aType II error β0 is the probability of failing to reject the null hypothesisH0 when it is false (note theindex lsquo0rsquo represents the case in which a single hypothesis is tested) Instead of α0 and β0 there is theconfidence level CL = 1minus α0 and the power of the test γ0 = 1minus β0 respectively The first deals withthe probability of accepting a true null hypothesisH0 the second addresses the probability of correctlyaccepting the alternative hypothesisH(i)

A In that case given a probability of a Type I decision error α0we find the critical value k0 as follows

k0 = Φminus1(

1minus α0

2

)(12)

where Φminus1 denotes the inverse of the cumulative distribution function (cdf) of the two-tailed standardnormal distribution N(0 1)

The normalised least-squares residual wi follows a standard normal distribution with theexpectation that Ewi = 0 if H0 holds true (there is no outlier) On the other hand if the systemis contaminated with a single outlier at the ith location of the dataset (ie under H(i)

A ) then theexpectation of wi is

Ewi =radic

λ0 =radic

ciTQminus1e QeQminus1

e cinabla2i (13)

where λ0 is the non-centrality parameter for q = 1 Note therefore that there is an outlier thatcauses the expectation of wi to become

radicλ0 The square-root of the non-centrality parameter

radicλ0

in Equation (13) represents the expected mean shift of a specific w-test In such a case the termci

TQminus1e QeQminus1

e ci in Equation (13) is a scalar and therefore it can be rewritten as follows [71]

|nablai| = MDB0(i) =

radicλ0

ciTQminus1e QeQminus1

e ci foralli = 1 middot middot middot n (14)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

7 of 40

where |nablai| is the Minimal Detectable Bias (MDB0(i) ) for the case in which there is only one singlealternative hypothesis which can be computed for each individual alternative hypothesis according toEquation (8)

For a single outlier the variance of an estimated outlier denoted by σ2nablai

is

σ2nablai

=(

ciTQminus1

e QeQminus1e ci

)minus1 foralli = 1 middot middot middot n (15)

Thus the MDB can also be written as

MDB0(i) = σnablai

radicλ0 foralli = 1 middot middot middot n (16)

where σnablai =radic

σ2nablai

is the standard deviation of estimated outlier nablaiThe MDB in Equations (14) or (16) of an alternative hypothesis is the smallest-magnitude outlier

that can lead to the rejection of the null hypothesisH0 for a given α0 and β0 Thus for each model ofthe alternative hypothesisH(i)

A the corresponding MDB can be computed [477273] The limitation ofthis MDB is that it was initially developed for the binary hypothesis testing case In that case the MDBis a sensitivity indicator of Baardarsquos w-test when only one single alternative hypothesis is taken intoaccount In this article we are confined to multiple alternative hypotheses Therefore both the MDBand MIB are computed by considering the case of multiple hypothesis testing

For a scenario coinciding with the null hypothesisH0 under multiple testing hypothesis thereis the probability of incorrectly identifying at least one alternative hypothesis This type of wrongdecision is known as the family-wise error rate (FWE) The FWE is defined as

FWE = αprime =le 1minus (1minus α0)n (17)

which is approximatelyFWE = αprime le ntimes α0 (18)

where α0 is the significance level for an individual test The quantity in Equation (18) is just equal tothe upper bound of the Bonferroni inequality ie αprime le nα [74] For example if the FWE level (αprime) is005 and one is running 5 tests then each test will have an α0 of 0055 = 001 In other words one usesa global Type I Error rate αprime that combines all tests under consideration instead of an individual errorrate α0 that only considers one test at a time time [70] In that case the critical value kbon f is computedas

kbon f = Φminus1(

1minus αprime

2n

)(19)

The Bonferroni in Equation (18) is a good approximation for the case in which alternativehypotheses are independent In practice however the test results always depend on each otherto some degree because we always have a correlation between w-tests The correlation coefficientbetween any Baardarsquos w-test statistic (denoted by ρwi wj ) such as wi and wj is given by [57]

ρwi wj =ci

TQminus1e QeQminus1

e cjradicciTQminus1

e QeQminus1e ci

radiccjTQminus1

e QeQminus1e cj

forall(i 6= j) (20)

The correlation coefficient ρwi wj can assume values within the range [minus1 1]Here the extreme normalised residuals max-w (ie maximum absolute) in Equation (10) are

treated directly as a test statistic Note that when using Equation (10) as a test statistic the decisionrule is based on a one-sided test of the form max-w le k However the distributions of max-w cannotbe derived from well-known test distributions (eg normal distribution) The procedure to computethe critical value of max-w is given step-by-step by Rofatto et al [45]

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

8 of 40

The other side of the multiple testing problem is the situation in which there is an outlier in thedataset In that case apart from Type I and Type II errors there is a third type of wrong decisionassociated with Baardarsquos w-test Baardarsquos w-test can also flag a non-outlying observation whilethe lsquotruersquo outlier remains in the dataset We are referring to the Type III error [53] also referred toas the probability of wrong identification (PWI) The description of the Type III error involves aseparability analysis between alternative hypotheses [57666869] Therefore we are now interested inthe identification of the correct alternative hypothesis In that case the non-centrality parameter inEquation (13) is not only related to the sizes of Type I and Type II decision errors but also dependenton the correlation coefficient ρwi wj given by Equation (20)

On the basis of the assumption that one outlier is in the ith position of the dataset (ie H(i)A is

rsquotruersquo) the probability of a Type II error (also referenced as the probability of ldquomissed detectionrdquo denotedby PMD) for multiple testing is

PMD = P(⋂n

i=1|wi| le k

∣∣∣ H(i)A true

) (21)

and the size of a Type III wrong decision (also called ldquomisidentificationrdquo denoted by PWI) is given by

PWI =n

sumi=1P(|wj| gt |wi| foralli |wj| gt k(i 6= j)

∣∣∣ H(i)A true

)(22)

On the other hand the probability of correct identification (denoted by PCI) is

PCI = P(|wi| gt |wj| forallj |wi| gt k(i 6= j)

∣∣∣ H(i)A true

)(23)

with1minusPCI = 1minusPCD + PWI = PMD + PWI (24)

Note that the three probabilities of missed detection PMD wrong identification PWI and correctidentification PCI sum up to unity ie PMD + PWI + PCI = 1

The probability of correct detection PCD is the sum of the probability of correct identification PCI(selecting a correct alternative hypothesis) and the probability of misidentification PWI (selecting oneof the n-1 other hypotheses) ie

PCD = PCI + PWI (25)

The probability of wrong identification PWI is identically zero PWI = 0 when the correlationcoefficient is exactly zero ρwi wj = 0 In that case we have

PCD = PCI = 1minusPMD (26)

The relationship given in Equation (26) would only happen if one neglected the nature of thedependence between alternative hypotheses In other words this relationship is valid for the specialcase of testing the null hypothesisH0 against only one single alternative hypothesisH(i)

A Since the critical region in multiple hypothesis testing is larger than that in single hypothesis

testing the Type II decision error (ie PMD) for the multiple test becomes smaller [47] This meansthat the correct detection in binary hypothesis testing (γ0) is smaller than the correct detection PCDunder multiple hypothesis testing ie

PCD gt γ0 (27)

Detection is easier in the case of multiple hypothesis testing than single hypothesis testingHowever the probability of correct detection PCD under multiple testing is spread out over allalternative hypotheses and therefore identifying is harder than detecting From Equation (25) it is

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

9 of 40

also noted that detection does not depend on identification However outlier identification dependson correct outlier detection Therefore we have the following inequality

PCI le PCD (28)

Note that the probability of correct identification PCI depends on the probability of misseddetection PMD and wrong identification PWI for the case in which data snooping is run only once iea single round of estimation and testing However in this paper we deal with data snooping in itsiterative form (ie IDS) and therefore the probability of correct identification PCI depends on otherdecision rules

In contrast to the data snooping single run the success rate of correct detection PCD forIDS depends on the sum of the probabilities of correct identification PCI wrong exclusion (PWE)over-identification cases (Pover+ and Poverminus) and statistical overlap (Pol) ie

PCD = 1minusPMD = PCI + PWE + Pover+ + Poverminus + Pol (29)

It is important to mention that the probability of correct detection is the complement of theprobability of missed detection Note from Equation (29) that the probability of correct detection PCDis available even for cases in which the identification rate is null PCI = 0 However the probabilityof correct identification (PCI) necessarily requires that the probability of correct detection PCD begreater than zero For the same reasons given for the data snooping single run in the previous sectiondetecting is easier than identifying In that case we have the following relationship for the success rateof correct outlier identification PCI

PCI = PCD minus (PWE + Pover+ + Poverminus + Pol) (30)

such asexist(PCI) isin [0 1] lArrrArr (PCD) gt 0 (31)

It is important to mention that the wrong exclusion PWE describes the probability of identifyingand removing a non-outlying measurement while the lsquotruersquo outlier remains in the dataset In otherwords PWE is the Type III decision error for IDS) The overall wrong exclusion PWE is the result of thesum of each individual contribution to PWE ie

PWE =nminus1

sumi=1PWE(i) (32)

On the basis of the probability levels of correct detection PCD and correct identification PCI the sensitivity indicators of minimal biasesmdashMinimal Detectable Bias (MDB) and Minimal IdentifiableBias (MIB)mdashfor a given αprime can be computed as follows

MDB = arg minnablaiPCD(nablai) gt PCD foralli = 1 middot middot middot n (33)

MIB = arg minnablaiPCI(nablai) gt PCI foralli = 1 middot middot middot n (34)

Equation (33) gives the smallest outlier nablai that leads to its detection for a user-defined correctdetection rate PCD whereas (34) provides the smallest outlier nablai that leads to its identification for auser-defined correct identification rate PCI

As a consequence of the inequality in (28) the MIB will be larger than MDB ie MIB ge MDBFor the special case of having only one single alternative hypothesis there is no difference between theMDB and MIB [55] The computation of MDB0 is easily performed by Equations (14) or (16) whereasthe computation of the MDB in Equation (33) and the MIB in Equation (34) must be computed using

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

10 of 40

Monte Carlo because the acceptance region (as well as the critical region) for the case of multiplealternative hypotheses is analytically intractable

3 Material and Methods

Here we use the procedure provided by Rofatto et al [45] to compute the probability levelsassociated with IDS as well as to estimate the minimal biases mdash Minimal Detectable Bias(MDB) andMinimal Identifiable Bias (MIB) The procedure is summarised in Figure 1

The procedure to compute the critical value of max-w (k) is given step-by-step as follows

1 Specify the probability density function (pdf) of the w-test statistics The pdf assigned to thew-test statistics under anH0-distribution is

(w1 w2 w3 middot middot middot wn)T sim N (0Rw) (35)

where Rw isin Rntimesn is the correlation matrix with the main diagonal elements equal to 1 and theoff-diagonal elements are the correlation between the w-test statistics computed by Equation (20)

2 In order to have w-test statistics under H0 uniformly distributed random number sequencesare produced by the Mersenne Twister algorithm and then they are transformed into a normaldistribution by using the BoxndashMuller transformation [75] BoxndashMuller has already been usedin geodesy for Monte Carlo experiments [467677] Therefore a sequence of m random vectorsfrom the pdf assigned to the w-test statistics is generated according to Equation (35) In that casewe have a sequence of m vectors of the w-test statistics as follows[

(w1 w2 w3 middot middot middot wn)T(1)

(w1 w2 w3 middot middot middot wn)T(2)

middot middot middot (w1 w2 w3 middot middot middot wn)T(m)

](36)

3 Compute the test statistic by Equation (10) for each sequence of w-test statistics Thus we have(max

iisin1middotmiddotmiddot n|wi|(1) max

iisin1middotmiddotmiddot n|wi|(2) middot middot middot max

iisin1middotmiddotmiddot n|wi|(m)

)(37)

4 Sort in ascending order the maximum test statistic in Equation (37) getting a sorted vector wsuch that

w(1) lt w(2) w(3) middot middot middot lt w(m) (38)

The sorted values w in Equation (38) provide a discrete representation of the cumulative densityfunction (cdf) of the maximum test statistic max-w

5 Determine the critical value k as follows

k = w[(1minusαprime)timesm] (39)

where [] denotes rounding down to the next integer that indicates the position of the selectedelements in the ascending order of w This position corresponds to a critical value for a stipulatedoverall false alarm probability αprime This can be done for a sequence of values αprime in parallel

It is important to mention that the probability of a Type I decision error for multiple testing αprime islarger than that of Type I for single testing α0 This is because the critical region in multiple testing islarger than that in single hypothesis testing

After finding the critical value k the procedure based on Monte Carlo is also applied to computethe probability levels of IDS when there is an outlier in the dataset The overview of of the mainelements involved with the IDS was detailed in Section 2 The flowchart is displayed in Figure 1

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

11 of 40

Inp

ut

Functional amp Stochastic Models Significance level (αrsquo) Outlier magnitude range (nablai) Probability of correct detection ( ) Probability of correct identification ( )

CD CI

Gen

erat

e

ran

do

m d

raw

s

Generate a sample of random errors

119942~119925(120782 Q119942)

Compute the total error ε = e + cinablai

Compute the least-squares residual vector e = Rε

Iterativ

e ou

tlier elimin

ation

pro

cess

Compute the maximum w-test statistic

max wi

max wi gt k

End

Identify and remove the measurement from the dataset

Size of max wi gt 1

det ATWA gt 0

Compute w-test statistics

Compute and store nCI nMD nWE nover+ nover- and nol

Co

mp

ute th

e pro

bab

ilities

Number of Monte Carlo experiments (m) = 200000

Compute the probabilities CI MD WE over+ over- and ol

Sensitivity indicators Outlier detection (MDB)

amp Outlier identification (MIB)

YES NO

YES

NO

YES NO

YES

NO

Figure 1 Flowchart of the algorithm to compute the probability levels of Iterative Data Snooping (IDS)for each measurement in the presence of an outlier [45]

The steps displayed in Figure 1 are detailed as follows

1 First random error vectors are synthetically generated on the basis of a multivariate normaldistribution because the assumed stochastic model for random errors is based on the matrixcovariance of the observations Here we use the Mersenne Twister algorithm [78] to generate asequence of random numbers and BoxndashMuller [75] to transform it into a normal distribution

2 The total error (ε) is a combination of random errors and its corresponding outlier is given asfollows

ε = e + cinablai (40)

The magnitude intervals of simulated outliers are user-defined The magnitude intervals arebased on the standard deviation of the observation (σ) eg |3σ| to |6σ| Since the outlier can bepositive or negative the proposed algorithm randomly selects the signal of the outlier (for q = 1)Here we use the discrete uniform distribution to select the signal of the outlier

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

12 of 40

In Equation (40) e is the random error generated from the normal distribution according toEquation (2) and the second part cinablai is the additional parameter that describes the alternativemodel according to Equation (40)

3 Next we compute the least-squares residuals vector according to Equation (3) but now we usethe total error (ε) in Equation 40 as follows

e = Rε (41)

4 Run the IDS In this step the test statistic is computed according to (9) Then the maximumtest statistic value is obtained according to Equation (10) Now the decision rule is based on thecritical value k computed by Monte Carlo in previous stage After identifying the measurementsuspected to be the most likely outlier it is excluded from the model and least-squares estimationand data snooping are applied iteratively until there are no further outliers identified in thedataset If two or more observations are simultaneously detected (ie if maxminusw gt k and the sizeof maxminusw gt 1 then the IDS is ended) Furthermore every time that a measurement suspectedto be the most likely outlier is removed from the model we check whether the normal matrixATW A is invertible or not If the determinant of ATW A is 0 det|ATW A| = 0 then there isa necessary and sufficient condition for a square matrix ATW A to be non-invertible In otherwords the IDS is ended when det|ATW A| = 0 If no outlier is detected (ie maxminusw lt k) thenthe IDS is also ended

The IDS procedure is performed for m experiments of random error vectors for each experimentcontaminated by an outlier in the ith measurement Therefore for each measurementcontaminated by an outlier there are υ = 1 m experiments

5 After running m=200000 experiments [47] the probabilities associated with IDS are computed(note m refers to the total number of Monte Carlo experiments) The probability of correctidentification (PCI) is the ratio between the number of times that the outlier is correctly identified(denoted as nCI) and m experiments ie

PCI =nCIm

(42)

Similar to Equation (42) the wrong decisions are computed as

PMD =nMD

m(43)

where nMD is the number of experiments in which IDS does not detect the outlier (PMDcorresponds to the rate of missed detection)

PWE =nWE

m(44)

where nWE is the number of experiments in which the IDS procedure flags and removes onlyone single non-outlying measurement while the lsquotruersquo outlier remains in the dataset (PWE is thewrong exclusion rate)

Pover+ =nover+

m(45)

where nover+ is the number of experiments in which IDS correctly identifies and removes theoutlying measurement and others and Pover+ corresponds to its probability

Poverminus =noverminus

m(46)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

13 of 40

where noverminus represents the number of experiments in which IDS identifies and removes morethan one non-outlying measurement whereas the lsquotrue outlierrsquo remains in the dataset (Poverminus isthe probability corresponding to this error probability class)

Pol =nolm

(47)

where nol is the number of experiments in which the detector in Equation (10) flags two (or more)measurements simultaneously during a given iteration of IDS Here this is referred to as thenumber of statistical overlap nol and Pol corresponds to its probability

6 Finally the sensitivity indicators (MDB and MIB) are computed based on Equation 33 andEquation 34 respectively

In this paper the probability levels associated with IDS were computed for each observationindividually and for each outlier magnitude However they were grouped into clusters based onnumber of local redundancy (ri) and maximum absolute correlation between the w-test statistics(ρwi wj ) Furthermore we take care to control the family-wise error rate

4 Case study of Levelling Geodetic Network

The analyse of the constraints effects on the probability levels of IDS are performed by an exampleof a levelling geodetic network

A levelling geodetic network is a set of points located on the Earthrsquos surface or near it andinterconnected by height difference measurements Some of these points are associated with a verticalheight reference system (ie some points have known height) whereas others are parameters (unknownheights) The term known here means that those points with known heights are constraints necessaryto ensure that parameters (unknown heights) are sufficiently estimable In the sense of modern geodeticreference systems realisation and unification of a vertical height system consists of a combination ofGNSS (Global Navigation Satellite System) and spirit levelling with geoid models We will not gointo the details about height geodetic reference system and frame (Readers who wish to have furtherdetails on that issue please see eg [79])

The mathematical model associated with a geodetic levelling network is linear (ie the levellingmeasurements bear a known linear relationship with the unknown heights) Geodetic networksusually has more measurements than parameters (ie n gt u) ie we have a redundant measurementsystem However due to intrinsic random errors in a system redundant measurements often leadto an inconsistent system of equations To make the system consistent we have to introduce theinformation about the random measurement errors The stochastical properties of the measurementerrors are directly associated with the assumption of the probability distribution of these errors Ingeodesy and many other scientific branches the well-known normal distribution is one of the mostused as measurement error model [80] Because of this the model ceases to be purely mathematicaland becomes a statistical model with functional and stochastic part

In the absence of outliers ie under null hypothesisH0 in Equation (1) the model for levellinggeodetic network can be written as follows

∆himinusj + e∆himinusj= hj minus hi (48)

where ∆himinusj is the height difference measured from point i to j and e∆himinusjis the random error associated

with the levelling measurement For a only one single levelling line one of these points is the constraint(known height) from which the height of another point is determined The point with known height isalso referred to as control point or benchmark In a geodetic network on the other hand we have a setof levelling lines that connect both from points of unknown height and from points of known height(constraints) Under normal working conditions (ie H0) the measurement errors model is then givenby Equation (2)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

14 of 40

In this case study we demonstrate the application of the proposed algorithm by [45] subject to adifferent constraints scenarios for that levelling geodetic network

Recent work has been focused on non-stochastic constraints (hard constraints) Here we alsoconsider cases where constraints are subject to a random errors (ie soft constraints) That softconstraints are non-deterministic and therefore they are measured model elements that can bedesignated as a priori information The approach of this example can be applied in the designstage of geodetic network analysis [6265] Furthermore the experiments performed here can also beextended to geodetic deformation analysis when the deformation effects are unmodelled in the senseof deterministic form [28]

41 Problem description

To analyse the constraints effects on the IDS an example is taken from a geodetic levellingnetwork with 12 height differences between the points The equipment used to measure the leveldifference can be an electronic digital level In that case the levelling measurement system comprisesof a special bar-coded staff (also called barcode rod) and a digital level (instrument) A digital level isbasically a telescope that enables a horizontal line of sight Digital levels consist of additional electronicimage processing components to automatically read and analyse digital (bar coded) levelling staffswhere the graduation is replaced by a manufacturer dependent code pattern Generally the result isautomatically stored in the data collector of the digital level An example of a digital level ndash bar-codestaff system is displayed in the Figure 2 For more details about digital level see eg [81ndash84]

Figure 2 Example of a digital level ndash bar-code staff system [45]

The standard-deviation of the uncorrelated measurements were the same and taken equal toσ = 1mm The points are indicated as A to G The eight network configuration are displayed in Figure3(a b c d and e) and their details are given as follows

1 Figure 3(a) Network with 1 hard constraint (ie network minimally constrained) Since thedimension of the network is 1D the minimum information necessary to estimate the unknownheights is one The height of G was fixed as control point (hard constraint) and 6 unknown heights(ABCDEF) were minimally constrained Therefore the redundancy of the system (or overalldegrees of freedom) was r = nminus rank(A) = nminus u = 12minus 6 = 6

2 Figure 3(b) Network with one extra hard constraint (ie two hard constraints) The heights A andD were taken as hard constraints (ie heights A and D were fixed) The redundancy of the system

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

15 of 40

in that case was r = 12minus 5 = 7 with 5 unknown heights (BCEFG) over-constrained

3 Figure 3(c) Network with two extra hard constraints (ie three hard constraints)The heights A Dand G were taken as hard constraints In that case the redundancy of the system was r = 12minus 4 = 8

4 Figure 3(d) Network with two soft constraints (A and D) In that case a standard-deviation largerthan zero was assigned to both constraints (ie σc gt 0 In other words A and D were processedas being both observations and unknown parameters ie A and D were pseudo-observationsHere the both constraints were simultaneously relaxed by considering their uncertainties 10 timesworse than measurements (ie σc = 10times σ = 10mm) 10 times better than measurements (ieσc = 01mm) and their uncertainties equal to the measurements (σc = 1mm) In that case theredundancy of the system was r = 14minus 7 = 7

5 Figure 3(e) Network processed with A D and G as pseudo-observations Those three constraintswere simultaneously relaxed by considering their standard-deviations equal to σc = 10mm (10 timesworse than measurements) σc = 01mm (10 times better than measurements) and σc = 1mm (thesame as the measurements) In that case the redundancy of the system was r = 15minus 7 = 8

y1

y2

y3

y4

y5

y6

y7 y8

y9 y10

B C

D

E F

G y12 y11

A

y1

y2

y3

y4

y5

y6

y7 y8

y9 y10

A

B C

D

E F

G y12 y11

y1

y2

y3

y4

y5

y6

y7 y8

y9 y10

A

B C

D

E F

G y12 y11

(a) (b)

(c) (d)

(e)

Unknown Heights Soft Constraints Hard Constraints

y1

y2

y3

y4

y5

y6

y7 y8

y9 y10

A

B C

D

E F

G y12 y11

y1

y2

y3

y4

y5

y6

y7 y8

y9 y10

A

B C

D

E F

G y12 y11

Figure 3 Levelling Geodetic Network subject to different constraint scenarios

The following system of equations for that problem is given by

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

16 of 40

y1 + e1 = hB minus hA

y2 + e2 = hC minus hB

y3 + e3 = hD minus hC

y7 + e7 = hB minus hG

y8 + e8 = hC minus hG

y11 + e11 = hB minus hF

y12 + e12 = hC minus hE

(49)

The functional model (A) for the system of equations in 49 is given by

A =

minus1 1 0 0 0 0 00 minus1 1 0 0 0 00 0 minus1 1 0 0 00 0 0 minus1 1 0 00 0 0 0 minus1 1 01 0 0 0 0 minus1 00 1 0 0 0 0 minus10 0 1 0 0 0 minus10 0 0 0 1 0 minus10 0 0 0 0 1 minus10 1 0 0 0 minus1 00 0 1 0 minus1 0 0

(50)

Note that the rank defect of the matrix A is uminus rank(A) = 7minus 6 = 1 In that case there is neededto add at least one constraint in order to avoid rank deficiency of the matrix A This is guaranteedwhen we take one height as known For example from the network in Figure 3(a) we have added theheight G as known (ie as a hard constraint) In that case the constraint equation should be addedinto the system in 49 ie

y13 = hG with σy13 = 0 (51)

noticing that because the standard deviation is zero the observation is non-stochastic (hard constraint)and the residual ey13 = 0 This can generate problems in the inversion of the covariance matrix of theobservations Qe for the calculation of the weight matrix W because the weight for that constraintwould be 1

0 = infin In order to avoid that problem we have eliminated the rank deficiency of matrixA by removing the seventh column of matrix A in 50 associated with the height G Now we haveuminus rank(A) = 6minus 6 = 0

The constraint defines the geodetic datum ie the S-system ([85]p41) Another approach tosolving the system of equations in 2 could be based on generalised (pseudo) inverses (see eg [86])

The location of the constraints can be chosen in some circumstances for example during thedesign stage of a geodetic network For the special case of having a minimally constrained systemthe location of the constraint will not influence the w-test statistics and the sensitivity indicators (MIBand MDB) [18] However more constraints than the minimum necessary to have a solution (ie extraconstraints or redundant constraints) can change the least-squares residuals and hence w-test statisticsand the minimal biases

From the network with one extra constraint (2 constraints) in Figure 3(b) for example the bothfirst (height A) and fourth column (height D) of matrix A in 50 were eliminated in the case of having

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

17 of 40

the two heights as hard constraints For the case where these two heights (A and D) were taken as softconstraints however two observation equations were added to Equation 49 ie

y13 + e13 = hA σy13 gt 0

y14 + e14 = hD σy14 gt 0(52)

In the case of soft constraints in Equation 52 two lines were added in matrix A In other wordsA and D were taken as pseudo-observations In that case the rank deficiency was also null (ieuminus rank(A) = 7minus 7 = 0) the redundancy of the system was r = nminus rank(A) = nminus u = 7 and thematrix A was given as follows

A =

minus1 1 0 0 0 0 00 minus1 1 0 0 0 00 0 minus1 1 0 0 00 0 0 minus1 1 0 00 0 0 0 minus1 1 01 0 0 0 0 minus1 00 1 0 0 0 0 minus10 0 1 0 0 0 minus10 0 0 0 1 0 minus10 0 0 0 0 1 minus10 1 0 0 0 minus1 00 0 1 0 minus1 0 01 0 0 0 0 0 00 0 0 1 0 0 0

(53)

For this example of two soft constraints and by considering the both soft constraints withstandard-deviation σc = 10mm the symmetric and positive semi-definite covariance matrix of theobservations (Qe) was given as follows

Qe =

1 0 0 middot middot middot 0 00 1 0 middot middot middot 0 00 0 1 middot middot middot 0 0

0 0 0 middot middot middot 100 00 0 0 middot middot middot 0 100

(54)

The last two rows and columns of the matrix Qe in Equation 54 refer to the variances (σc2 =

(10mm)2 = 100mm2) of the heights constraints A and D respectivelySimilarly matrices A and Qe were constructed for the other cases studied hereAlthough other measurements are able to identify an outlier for the case of having only one single

soft constraint the pseudo-observation (constraint) is not In that case the defect configurationis associated with the additional parameter in the constraint (ie the presence of an outlier inthe constraint) In other words an additional parameter on the soft constraint will not estimableFor example if the height point G was taken as a soft constraint the presence of an outlier inpseudo-observation G would lead to rank deficiency of matrix A ie u minus rank(A) = 8 minus 7 = 1Therefore the case of having only one single soft constraint was not considered here

42 Result of the Hard Constraint Effects on the Iterative Outlier Elimination Procedure

In this section we present the results of the effects of the hard constraint on the IDS The scenariosin Figure 3(a) (network minimally constrained) Figure 3(b) (two hard constraints) and Figure 3(c)(three hard constraints) were considered for the analysis

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

18 of 40

The correlation coefficient between w-test statistics (ρwi wj ) were computed for the three scenariosaccording to Equation(20) Table 1 provides that correlation for that network minimally constrainedTable 2 for that network over-constrained with two hard constraints and Table 3 for that networkover-constrained with three hard constraints

Table 1 Correlation matrix of w-test statistics for the levelling network minimally constrained in Figure3(a)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12

wy1 10 02 01 01 02 10 -02 00 00 02 -04 -01wy2 10 02 02 03 02 05 -05 -02 02 03 -03wy3 10 10 02 01 00 02 -02 00 01 04wy4 10 02 01 00 02 -02 00 01 04wy5 10 02 -02 02 05 -05 03 -03wy6 10 -02 00 00 02 -04 -01wy7 10 -03 -03 -04 -04 -01wy8 10 -04 -03 -01 -04wy9 10 -03 01 04wy10 10 04 01wy11 10 -01wy12 100

Table 2 Correlation matrix of w-test statistics for the levelling network with two hard constraints inFigure 3(b)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12

wy1 10 04 04 -03 -01 04 -03 01 01 01 -04 -01wy2 10 04 -01 01 -01 04 -04 -01 01 03 -03wy3 10 04 -01 -03 -01 03 -01 -01 01 04wy4 10 04 04 01 01 -03 01 01 04wy5 10 04 -01 01 04 -04 03 -03wy6 10 -01 -01 -01 03 -04 -01wy7 10 -04 -03 -04 -04 -01wy8 10 -04 -03 -01 -04wy9 10 -04 01 04wy10 10 04 01wy11 10 -01wy12 10

Table 3 Correlation matrix of w-test statistics for the levelling network with three hard constraints inFigure 3(c)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12

wy1 10 03 01 -01 -01 01 -04 -01 -01 -01 -03 -01wy2 10 03 -01 01 -01 03 -03 -01 01 03 -03wy3 10 01 -01 -01 01 04 01 01 01 03wy4 10 03 01 -01 -01 -04 -01 01 03wy5 10 03 -01 01 03 -03 03 -03wy6 10 01 01 01 04 -03 -01wy7 10 -01 -01 -01 -03 -01wy8 10 -01 -01 -01 -03wy9 10 -01 01 03wy10 10 03 01wy11 10 -01wy12 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

19 of 40

Table 4 gives the local redundancy (ri) the standard-deviation of the LS-estimated outlier σnablai andthe maximum absolute correlation (maxρwi wj

) for each scenario of hard constraint set out of in thisstudy ie Figure 3(ab and c)

Table 4 Local redundancy (ri) standard-deviation of the LS-estimated outlier σnablai and the maximumabsolute correlation (maxρwi wj

) for each scenario of hard constraint

1 hard constraint 2 hard constraints 3 hard constraintsMeasurement ri σnablai maxρwi wj

ri σnablai maxρwi wjri σnablai maxρwi wj

y1 0396 1589 100 0583 1309 036 0708 1188 041y2 0500 1414 047 0583 1309 036 0583 1309 032y3 0396 1589 100 0583 1309 036 0708 1188 041y4 0396 1589 100 0583 1309 036 0708 1188 041y5 0500 1414 047 0583 1309 036 0583 1309 032y6 0396 1589 100 0583 1309 036 0708 1188 041y7 0563 1333 047 0583 1309 036 0708 1188 041y8 0563 1333 047 0583 1309 036 0708 1188 041y9 0563 1333 047 0583 1309 036 0708 1188 041y10 0563 1333 047 0583 1309 036 0708 1188 041y11 0583 1309 043 0583 1309 036 0583 1309 032y12 0583 1309 043 0583 1309 036 0583 1309 032

Here the twelve levelling measurements were clustered into four clusters The clustering wasdefined according to the local redundancy (ri) and the maximum absolute correlation (maxρwi wj

) inTable 4 This is the first time that a clustering technique based on the similarity of local redundancyand the maximum absolute correlation between w-test statistics is applied to a problem of geodeticnetworks Similarly this has been done in [45] The four cluster were defined as follows

bull Cluster 1 y1 y3 y4 and y6bull Cluster 2 y2 and y5bull Cluster 3 y7 y8 y9 and y10bull Cluster 4 y11 and y12

The probability levels associated with IDS were averaged for each of these clusters Here we computethe probability levels of the IDS based on the procedure in Section (3) [45] The critical values werek = 389 k = 393 and k = 393 for 1 hard constraint 2 hard constraints and 3 hard constraintsrespectively These critical values were found for αprime = 0001 according to the procedure described inAppendix The probability of correct identification (PCI) and correct detection (PCD = 1minusPMD)are displayed in Figure 4 for each number of hard constraint (denoted by hc)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

20 of 40

(c) (d)

(e) (f)

(g) (h)

(a) (b)

Figure 4 Probability of correct identification (PCI) and Probability of correct detection (PCD) for thecase of hard constraints and for αprime = 0001 Cluster 1(ab) Cluster 2(cd) Cluster 3(ef) and Cluster4(gh)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

21 of 40

The outlier magnitude were defined from |5σ| to |9σ| The outlier of |5σ| was chosen because itis approximately the lowest MDB0(i) of the network when a single hypothesis testing is in play ascan be seen in Equation (14) or Equation (16) That MDB0(i) of |5σ| was computed for significancelevel of αprime = 0001 and a power of the test γ0 = 08 This strategy reduces the search space for a MIB(Minimal Identifiable Bias) because we will always have the following inequality MIB ge MDB0(i)[4755] Remember that the IDS procedure is an example of multiple hypothesis testing (see 2)

The sensitivity indicators MDB and MIB for IDS were also computed according to Equation (33)and (34) respectively The success rate for outlier detection and outlier identification were taken asbeing PCD = PCI = 08 respectively Table 5 provides the values of MDB and MIB for that case ofhard constraints

Table 5 MDB and MIB for the case of hard constraints based on αprime = 0001 and PCD = PCI = 08

1 hard constraint 2 hard constraints 3 hard constraintsCluster MDB (σ) MIB (σ) MDB (σ) MIB (σ) MDB (σ) MIB (σ)

1 75 - 63 63 57 572 67 68 63 64 63 643 64 64 63 63 58 584 64 64 64 64 64 64

Figure 5 shows the probability of wrong exclusion (PWE) The over-identification cases (Pover+

and Poverminus) were smaller than 0001 (ie they were practically null) There were not statistical overlap(Pol) for clusters 2 3 and 4 We will discuss more about statistical overlap (Pol) later

(c) (d)

(a) (b)

Figure 5 Probability of wrong exclusion (PWE) for the case of hard constraints and for αprime = 0001Cluster 1(a) Cluster 2(b) Cluster 3(c) and Cluster 4(d)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

22 of 40

43 Result of the Soft Constraint Effects on the Iterative Outlier Elimination Procedure

The both configuration in Figure 3(d) and Figure 3(e) were analysed in terms of soft constraintsIn that case the critical values were k = 395 k = 395 and k = 392 for two soft constraints withσc = 01mm σc = 1mm and σc = 10mm respectively In the case of three soft constraints the criticalvalues found were k = 399 k = 399 and k = 396 for σc = 01mm σc = 1mm and σc = 10mmrespectively All these critical values were computed for αprime = 0001

The correlation coefficient between w-test statistics (ρwi wj ) are displayed in Table 6 Table 7and Table 8 for two soft constraints with standard-deviation 10 times larger than measurements(ie σc = 10times σ = 10mm) 10 times better than measurements (ie σc = 01mm) and equal to themeasurements (σc = 1mm) respectively

Table 6 Correlation matrix of w-test statistics for the levelling network with two soft constraints withσc = 10mm in Figure 3(d)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14

wy1 10 02 01 01 02 10 -02 00 00 02 -04 -01 006 -01wy2 10 02 02 03 02 05 -05 -02 02 03 -03 003 00wy3 10 10 02 01 00 02 -02 00 01 04 01 -01wy4 10 02 01 00 02 -02 00 01 04 -01 01wy5 10 02 -02 02 05 -05 03 -03 00 00wy6 10 -02 00 00 02 -04 -01 -01 01wy7 10 -03 -03 -04 -04 -01 00 00wy8 10 -04 -03 -01 -04 00 00wy9 10 -03 01 04 00 00wy10 10 04 01 00 00wy11 10 -01 00 00wy12 10 00 00wy13 10 -10wy14 10

Table 7 Correlation matrix of w-test statistics for the levelling network with two soft constraints withσc = 01mm in Figure 3(d)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14

wy1 10 04 04 -03 -01 04 -03 01 01 01 -04 -01 06 -06wy2 10 04 -01 02 -01 04 -04 -01 01 03 -03 04 -04wy3 10 04 -01 -03 -01 03 -01 -01 01 04 06 -06wy4 10 04 04 01 01 -03 01 01 04 -06 06wy5 10 04 -01 01 04 -04 03 -03 -04 04wy6 10 -01 -01 -01 03 -04 -01 -06 06wy7 10 -04 -03 -04 -04 -01 -02 02wy8 10 -04 -03 -01 -04 02 -02wy9 10 -04 01 04 02 -02wy10 10 04 01 -02 02wy11 10 -01 00 00wy12 10 00 00wy13 10 -10wy14 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

23 of 40

Table 8 Correlation matrix of w-test statistics for the levelling network with two soft constraints withσc = 1mm in Figure 3(d)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14

wy1 10 03 02 -01 01 07 -03 00 01 02 -04 -01 04 -04wy2 10 03 01 03 01 04 -04 -01 01 03 -03 03 -03wy3 10 07 01 -01 00 03 -02 -01 01 04 04 -04wy4 10 03 02 01 02 -03 00 01 04 -04 04wy5 10 03 -01 01 04 -04 03 -03 -03 03wy6 10 -02 -01 00 03 -04 -01 -04 04wy7 10 -03 -03 -04 -04 -01 -01 01wy8 10 -04 -03 -01 -04 01 -01wy9 10 -03 01 04 01 -01wy10 10 04 01 -01 01wy11 10 -01 00 00wy12 10 00 00wy13 10 -10wy14 10

Table 9 gives the local redundancy (ri) the standard-deviation of the LS-estimated outlier σnablai andthe maximum absolute correlation (maxρwi wj

) for the scenarios of two constraints

Table 9 Local redundancy (ri) standard-deviation of the LS-estimated outlier σnablai (mm) and themaximum absolute correlation (maxρwi wj

) for each scenario of two soft constraints

σc = 01mm σc = 1mm σc = 10mmMeasurement ri σnablai maxρwi wj

ri σnablai maxρwi wjri σnablai maxρwi wj

y1 0581 1312 0564 0471 1457 0681 0397 1587 0994y2 0582 1311 0376 0533 1369 0423 0501 1413 0471y3 0581 1312 0564 0471 1457 0681 0397 1587 0994y4 0581 1312 0564 0471 1457 0681 0397 1587 0994y5 0582 1311 0376 0533 1369 0423 0501 1413 0471y6 0581 1312 0564 0471 1457 0681 0397 1587 0994y7 0583 1310 0359 0571 1324 0423 0563 1333 0471y8 0583 1310 0359 0571 1324 0423 0563 1333 0471y9 0583 1310 0359 0571 1324 0423 0563 1333 0471y10 0583 1310 0359 0571 1324 0423 0563 1333 0471y11 0583 1309 0358 0583 1309 0398 0583 1309 0433y12 0583 1309 0358 0583 1309 0398 0583 1309 0433y13 0007 1163 1000 0300 1826 1000 0497 14189 1000y14 0007 1163 1000 0300 1826 1000 0497 14189 1000

From Table 9 five clusters were defined for each case of two soft constraints ie for the casewhere heights A and D were given as soft constraints in Figure 3(d) as follows

bull Cluster 1 y1 y3 y4 and y6bull Cluster 2 y2 and y5bull Cluster 3 y7 y8 y9 and y10bull Cluster 4 y11 and y12bull Cluster 5 y13 and y14

The probabilities of correct identification (PCI) and correct detection (PCD) for the measurements(Cluster 1 to Cluster 4) subject to the scenarios of two soft constraints (heights A and D) are displayedin Figure 6

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

24 of 40

(c) (d)

(e) (f)

(g) (h)

(a) (b)

Figure 6 Probability of correct identification (PCI) and Probability of correct detection (PCD) for themeasurements subject to the scenarios of two soft constraints for αprime = 0001 Cluster 1(ab) Cluster2(cd) Cluster 3(ef) and Cluster 4(gh)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

25 of 40

Note that the Cluster 5 is associated with the two soft constraints (ie y13 and y14) The probabilityof correct identification (PCI) for these both soft constraints were null However the probability ofcorrect detection were not Figure 7 shows the probability of correct detection (PCD) for these two softconstraints (ie heights A and D)

Figure 7 Probability of correct detection PCD for the two soft constraints (Cluster 5 heights A and D)and for αprime = 0001

The probability of wrong exclusion PWE for the measurements (Cluster 1 to Cluster 4) subject tothe scenarios of two soft constraints (heights A and D) are displayed in Figure 8 Figure 9 gives the theprobability of wrong exclusion PWE for that two constraints (ie heights A and D)

(c) (d)

(a) (b)

Figure 8 Probability of wrong exclusion PWE for the measurements subject to the scenarios of two softconstraints for αprime = 0001 Cluster 1(a) Cluster 2(b) Cluster 3(c) and Cluster 4(d)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

26 of 40

Figure 9 Probability of wrong exclusion PWE for the two soft constraints (Cluster 5 heights A and D)and for αprime = 0001

The over-identification cases (Pover+ and Poverminus) and the statistical overlap (Pol) were practicallynull for that case

The sensitivity indicators (MDB and MIB) for each scenario of two soft constraints are displayedin Table 10

Table 10 MDB and MIB for the case of two soft constraints based on αprime = 0001 and PCD = PCI = 08

σc = 10mm σc = 1mm σc = 01mmCluster MDB (σ) MIB (σ) MDB (σ) MIB (σ) MDB (σ) MIB (σ)

1 75 25 7 71 63 632 68 68 66 66 63 633 64 64 64 64 63 634 63 63 63 63 63 635 68 - 88 - 57 -

As with the case of two soft constraints Tables 1112 and 13 show the correlations (ρwi wj ) for thecase where there were three soft constraints ie for the network configuration in Figure 3(e)

Table 11 Correlation matrix of w-test statistics for the levelling network with three soft constraintswith σc = 10mm in Figure 3(e)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14 wy15

wy1 10 02 01 01 02 10 -02 00 00 02 -04 -01 01 00 00wy2 10 02 02 03 02 05 -05 -02 02 03 -03 00 00 00wy3 10 10 02 01 00 02 -02 00 01 04 00 -01 00wy4 10 02 01 00 02 -02 00 01 04 00 01 00wy5 10 02 -02 02 05 -05 03 -03 00 00 00wy6 10 -02 00 00 02 -04 -01 -01 00 00wy7 10 -03 -03 -04 -04 -01 00 00 00wy8 10 -04 -03 -01 -04 00 00 00wy9 10 -03 01 04 00 00 00wy10 10 04 01 00 00 00wy11 10 -01 00 00 00wy12 10 00 00 00wy13 10 -05 -05wy14 10 -05wy15 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

27 of 40

Table 12 Correlation matrix of w-test statistics for the levelling network with three soft constraintswith σc = 01mm in Figure 3(e)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14 wy15

wy1 10 03 01 -01 -01 01 -04 -01 -01 -01 -03 -01 07 -01 -04wy2 10 03 -01 01 -01 03 -03 -01 01 03 -03 03 -03 00wy3 10 01 -01 -01 01 04 01 01 01 03 01 -07 04wy4 10 03 01 -01 -01 -04 -01 01 03 -01 07 -04wy5 10 03 -01 01 03 -03 03 -03 -03 03 00wy6 10 01 01 01 04 -03 -01 -07 01 04wy7 10 -01 -01 -01 -03 -01 -04 -01 04wy8 10 -01 -01 -01 -03 -01 -04 04wy9 10 -01 01 03 -01 -04 04wy10 10 03 01 -04 -01 04wy11 10 -01 00 00 00wy12 10 00 00 00wy13 10 -02 -06wy14 10 -06wy15 10

Table 13 Correlation matrix of w-test statistics for the levelling network with three soft constraintswith σc = 1mm in Figure 3(e)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14 wy15

wy1 10 03 01 00 01 06 -03 00 00 01 -04 -01 05 -02 -02wy2 10 03 01 03 01 04 -04 -01 01 03 -03 02 -02 00wy3 10 06 01 00 00 03 -01 00 01 04 02 -05 02wy4 10 03 01 00 01 -03 00 01 04 -02 05 -02wy5 10 03 -01 01 04 -04 03 -03 -02 02 00wy6 10 -01 00 00 03 -04 -01 -05 02 02wy7 10 -03 -02 -03 -04 -01 -02 00 02wy8 10 -03 -02 -01 -04 00 -02 02wy9 10 -03 01 04 00 -02 02wy10 10 04 01 -02 00 02wy11 10 -01 00 00 00wy12 10 00 00 00wy13 10 -04 -05wy14 10 -05wy15 10

Table 14 Local redundancy (ri) standard-deviation of the LS-estimated outlier σnablai (mm) and themaximum absolute correlation (maxρwi wj

) for each scenario of three soft constraints

σc = 01mm σc = 1mm σc = 10mmMeasurement ri σnablai maxρwi wj

ri σnablai maxρwi wjri σnablai maxρwi wj

y1 0702 1194 0660 0502 1411 0577 0398 1586 0992y2 0582 1311 0326 0533 1369 0412 0501 1413 0470y3 0702 1194 0660 0502 1411 0577 0398 1586 0992y4 0702 1194 0660 0502 1411 0577 0398 1586 0992y5 0582 1311 0326 0533 1369 0412 0501 1413 0470y6 0702 1194 0660 0502 1411 0577 0398 1586 0992y7 0704 1192 0415 0602 1289 0412 0563 1333 0470y8 0704 1192 0415 0602 1289 0412 0563 1333 0470y9 0704 1192 0415 0602 1289 0412 0563 1333 0470y10 0704 1192 0415 0602 1289 0412 0563 1333 0470y11 0583 1309 0326 0583 1309 0385 0583 1309 0433y12 0583 1309 0326 0583 1309 0385 0583 1309 0433y13 0012 0904 0660 0425 1534 0542 0663 12283 0501y14 0012 0904 0660 0425 1534 0542 0663 12283 0501y15 0019 0718 063 05 1414 0542 0665 12268 0501

The probabilities of correct identification (PCI) and correct detection PCD for the measurements(Cluster 1 to Cluster 4) subject to the scenarios of three soft constraints (heights A D and G) aredisplayed in Figure 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

28 of 40

(c) (d)

(e) (f)

(g) (h)

(a) (b)

Figure 10 Probability of correct identification (PCI) and Probability of correct detection (PCD) for themeasurements subject to the scenarios of three soft constraints for αprime = 0001 Cluster 1(ab) Cluster2(cd) Cluster 3(ef) and Cluster 4(gh)

The probabilities of correct identification (PCI) and correct detection (PCD) in Figure 8 werecomputed for the clusters based on Table 14 as follows

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

29 of 40

bull Cluster 1 y1 y3 y4 and y6bull Cluster 2 y2 and y5bull Cluster 3 y7 y8 y9 and y10bull Cluster 4 y11 and y12bull Cluster 5 y13 and y14bull Cluster 6 y15

Figure 11 shows the probabilities of correct identification PCI and correct detection PCD for thethree soft constraints ie for Cluster 5 (heights A and D) and Cluster 6 (height G) in Figure 3(e)

(c) (d)

(a) (b)

Figure 11 Probability of correct identification (PCI) and Probability of correct detection (PCD) for thethree constraints and for αprime = 0001 Cluster 5(ab) and Cluster 6(cd)

The probability of wrong exclusion PWE for the measurements (Cluster 1 to Cluster 4) subject tothe scenarios of three soft constraints (heights A D and G) are displayed in Figure 12 Figure 13 givesthe the probability of wrong exclusion PWE for that three constraints (ie heights A D and G)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

30 of 40

(c) (d)

(a) (b)

Figure 12 Probability of wrong exclusion PWE for the measurements subject to the scenarios of threesoft constraints and for αprime = 0001 Cluster 1(a) Cluster 2(b) Cluster 3(c) and Cluster 4(d)

(a) (b)

Figure 13 Probability of wrong exclusion (PWE) for the three constraints and for αprime = 0001 Cluster5(a) and Cluster 6(b)

The over-identification cases (Pover+ and Poverminus) and the statistical overlap (Pol) were alsopractically null for that case of three soft constraints

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

31 of 40

The sensitivity indicators (MDB and MIB) for each scenario of three soft constraints are displayedin Table 15

Table 15 MDB and MIB for the case of three soft constraints based on αprime = 0001 and PCD = PCI = 08

σc = 10mm σc = 1mm σc = 01mmCluster MDB (σ) MIB (σ) MDB (σ) MIB (σ) MDB (σ) MIB (σ)

1 75 22 68 69 58 592 68 69 66 67 64 643 64 64 63 63 58 584 63 63 63 63 63 635 59 60 74 75 435 456 59 59 69 69 346 355

5 Discussion

We start from the scenario of one hard constraint in Figure 3(a) Table 4 reveal that the maximumcorrelation between w-test statistics for the measurements constituting Cluster 1 is exactly equal to100 (ie maxρwi wj

= 100) This means that the measurements belonging to Cluster 1 are those thatare connected with unknown heights whose connections are limited to only two The both unknownheights A and D are tied only to two measurements (ie y1 and y6 linked to A and y3 and y4 linked toD) therefore if an outlier occurred in one of these measurements we would only be able to analyse theconsistency between them but we would not be able to distinguish which of them was contaminatedby an outlier This means that we would only be able to detect because the w-test statistics could belarger than a critical value k however in that case the values of w-test statistics would be the same andtherefore we would not have only one unique maximum w-test statistics but actually would have fourmaximum w-test statistics In other words the equation systems associated with the measurements ofCluster 1 are linearly dependent [87] Therefore there is no reliability in terms of outlier identificationfor the Cluster 1 as can be seen in Figure 3(a)

From Figure 3(b) we can note that there is reliability in terms of outlier detection for Cluster 1and it is caused by overlapping w-test statistics The probability of statistics overlap Pol for Cluster 1in this scenario of minimally constrained network is displayed in Figure 14

Figure 14 Probability of correct detection PCD and statistical overlap (Pol) for the Cluster 1 subject toone hard constraint and for αprime = 0001

The problem of not having more connections (ie more measurements) for the unknown heightsA and D in the case of one hard constraint with G fixed is overcome when these heights (A and D) aretaken as hard constraints in 3(b) or when the heights A D and G are hard constraints in figure 3(c)Figure 3(a b) show that the measurements of Cluster 1 are now able to identify an outlier when twohard constraints (A and D fixed) are in play It is also verified to the case of three hard constraints (A Dand G fixes) in Figure 3(e f) ie there is reliability in terms of both outlier detection and identificationfor these measurements in those conditions

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

32 of 40

One can argue that the more constraints the larger the probability of correct identification (PCI)and correct detection (PCD) This claim is not entirely correct This may be true in case where all pointson the network have the same number of connections However in the example given if both heightsB and G were selected as hard constraints the measurements of the Cluster 1 (ie y1 y6 y3 and y4)would still have their least-squares residuals fully correlated (ie ρwy1 wy6

= 100 and ρwy3 wy4= 100)

and therefore there would be no reliability in terms of outlier identification for these measurementsFrom Table 5 we observe different behaviour for the clusters as follows

bull Cluster 1 there was no MIB for the case of having only one single hard constraint whereasMDB = MIB for the other cases However both MDB and MIB decrease significantly with theincrease in the number of hard constraints

bull Cluster 2 MDB slightly smaller than MIB Both MDB and MIB were practically the same for thecase of having two or three hard constraints

bull Cluster 3 MDB = MIB for all cases of hard constraints however both MDB and MIB decreasesignificantly with the increase in the number of hard constraints

bull Cluster 4 MDB e MIB were equal for all cases

In terms of outlier detection and identification therefore Cluster 1 was more sensitive toconstraints Cluster 3 relatively sensitive to constraints whereas the Cluster 4 completely insensitiveto constraints and Cluster 2 relatively insensitive to constraints This is also can be seen in Figure 4The reason for this is that the local redundancy (ri) of the Cluster 1 increased with the increase of thenumber of hard constraints whereas the Cluster 4 have remained the same as can be seen in Table 4

Leaving aside the cases of statistical overlap (Pol) the network presents low least-squares residualscorrelation (ρwi wj lt 05) and high local redundancy (ri gt 05) Because of this the probabilities ofwrong exclusion (PWE) were less than 1 as can be seen in Figure 5 The over-identification cases(Pover+ and Poverminus) were practically null Consequently PCI asymp PCD Due of this fact the family-wiseerror rate (αprime) should be increased in order to have more success rate in the outlier detection andidentification [45]

From Figure 15 we observe that increasing the αprime increases the both probabilities of correctidentification (PCI) and correct detection (PCD) for outlier magnitude from 5σ to 6σ in the case ofthree hard constraints and from 5σ to 68σ in the case of two hard constraints Although the rates ofPover+ and PWE also increase they are not significant when compared to the improvement of correctidentification (PCI) and detection (PCD) This same analysis can be done for the other clusters

In terms of soft constraints for the cases of two constraints in Figure 3(d) we observe from Table 9that the larger the relaxation of the constraint (ie the larger the standard-deviation of the constraintσc) the larger the residuals correlation (ρwi wj ) and the standard-deviation of the outlier σnablai and thesmaller the local redundancy (ri) Consequently the probabilities of correct identification (PCI) anddetection (PCD) get smaller and smaller with the relaxation of the constraints whereas the probabilityof wrong exclusion gets larger (PWE) This can be more clearly verified in Figure 6(a b) and Figure8(a) for Cluster 1 whose measurements are connected with the constraints A and D (ie y13 and y14 inTable 9 respectively)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

33 of 40

(a) (b)

(c) (d)

Figure 15 Probabilities of correct identification PCI (a) correct detection PCD (b) over-identificationpositive Pover+ (c) and wrong exclusion PWE for the Cluster 1 subject to two and three hard constraintsand for αprime = 0001 and αprime = 01

Note from Figure 8(a) that the probability of wrong exclusion (PWE) increases as the magnitude ofthe outlier (nablai) increases However this is true up to a certain limit of outlier magnitude The effect ofresiduals correlation ρwi wj on the rates of wrong exclusion (PWE) and correct identification (PCI) tendsto decrease with the increase in the magnitude of the outlier nablai This effect is more clearly verified forCluster 1 in case where the precision of the constraints are ten times worse than the measurementsσc = 10σ = 10mm

Note from Figure 6 that identifying an outlier in the Cluster 1 (ie y1 y3 y4 and y6) whenσc = 10mm is more difficult than the other clusters This is due to the fact the Cluster 1 has a higherresiduals correlation ρwi wj = 0994 than other clusters In general therefore we observe that the largerthe relaxation of the constraints the larger is the effect of the correlation ρwi wj on the success rate ofoutlier identification (PCI) Consequently the higher the sensitivity indicator for outlier identification(MIB) Table 10 reveals that the ratio between MIB and MDB for the Cluster 1 and for the scenariowhere the standard-deviations of that two soft constraints are σc = 10mm is MIBMDB=2575=33On the other hand the relationship between MIB and MDB is practically one (ie MIBMDB=10) forthe others scenarios

If the FWE rate (αprime) were increased for the case where the two soft constraints of σc = 10mmare in play we would not have great advantages for Cluster 1 due to its high residuals correlation(ρwi wj = 994) From Figure 16 we can observe that the probability of correct identification (PCI)for outlier magnitudes from 5σ to 8σ is effectively larger to a user-defined αprime = 01 than αprime = 0001However the success rate is still less than 80 ie PCI lt 08 Note for example the correct

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

34 of 40

identification rate is PCI = 56 for an outlier magnitude of nablai = 8σ and αprime = 01 For αprime = 01 theMIB = 335σ = 335mm whereas for αprime = 0001 is MIB = 25σ = 25mm Therefore in that case theMIB for PCI = 08(80) and αprime = 01 would be 34 larger than user-defined αprime = 0001

Figure 16 Probabilities of correct identification (PCI) for the Cluster 1 subject to two soft constraints (2sc) A and D for αprime = 0001 and αprime = 01

The soft constraints A and D were grouped in the Cluster 5 (ie A and D were treated aspseudo-observations in the model) There is no reliability in terms of outlier identification for theconstraints because the residual correlation between them is ρwi wj = 100 as can be seen in Table 9for y13 and y14 However these soft constraints are able to detect an outlier In that case the probabilityof correct detection PCD in Figure 7 is mainly caused by the statistical overlap Pol as can be seen in theexample of the case of σc = 10mm in Figure 17 From Table 10 we observe that the larger the relaxationof the constraints the larger the MDB Note that the values of MDB are given in σ and therefore theMDB for σc = 10mm is larger than σc = 1mm and σc = 01mm ie we had the following inequalityMDB = 68σc = 68times 10mm = 68mm gt MDB = 88σc = 88times 1mm = 88mm gt MDB = 57σc =

57times 01mm = 57mm In that case if the FWE (αprime) were increased the rate of outlier detection by theCluster 4 (ie by the soft constraints) would increase

Figure 17 Probabilities of correct detection PCD and statistical overlap Pol for the two soft constraintsA and D (Cluster 5) with σc = 10mm and for αprime = 0001

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

35 of 40

Similar effects of the relaxation of the constraints on the performance of the IDS in case of two softconstraints are verified in case of three soft constraints as can be seen in the Figures 10 11 12 and 13

In case of having three soft constraints in Figure 3(e) there is reliability in terms of outlieridentification for the three pseudo-observations y13 y14 and y15 (ie for AD and G) as can be seen inFigure 10 and Table 15 In that case we also observe that the probabilities of correct detection PCD ofthe soft constraints A and D (ie Cluster 5) were approximately 13 for σc = 10mm 16 for σc = 1mmand 24 for σc = 01mm larger than the scenario of the network subject to two soft constraints Table15 reveals that the advantage of having three soft constraints instead of two constraints is that theconstraints become identifiable in the presence of an outlier The behaviour of the probabilities ofcorrect detection (PCD) correct identification (PCI) and wrong exclusion (PWE) was similar to the caseof the two soft constraints

Furthermore the larger the relaxation of the constraints the smaller the residuals correlationbetween the measurements and the soft constraints and the larger the residuals correlation amongthe measurements This effect can be verified in Tables 6 7 and 8 for two soft constraints as well as inTables 11 12 and 13 for three soft constraints

In general we observe that the case of two soft constraints for σc = 01mm was comparablewith two hard constraints (see eg Table 5 and Table 10) in terms of the probability levels associatedwith IDS for the measurements (ie clusters 1 2 3 and 4) In the same way for the case of twosoft constraints with σc = 1mm or σc = 10mm the probabilities levels were similar to the one hardconstraint for that measurements with the benefit of two soft constraints having reliability in terms ofoutlier identification for the Cluster 1 Finally the three soft constraints with σc = 1mm and σc = 10mmwere comparable to the two soft constraints for that scenario of constraints relaxation wheres thethree soft constraints for σc = 01mm showed similar outcomes with three hard constraints for themeasurements (see eg Table 5 and Table 15) In that case however an advantage of the three softconstraints on the three hard constraints is the possibility of analysing the sensitivity of the constraintsIt is important to emphasised that the stochastic models of the measurements and constraints wereassumed well-known and defined for the analyzes performed here

6 Conclusions

In general the probability of correct identification (PCI) for the case of hard constraints is largerthan soft constraints It can be verified in Figure 4 Figure 6 and Figure 10 Therefore hard constraintsshould be used in the stage of pre-processing data for the purpose of identifying and removing possibleoutlying measurements In that process one should opt to set out the redundant hard constraints atpoints in the network where the smallest connections exist After identifying and removing possibleoutliers the soft constraints should be employed to propagate the uncertainties of the constraints(pseudobservations) to the model parameters during the process of least-squares estimation Thisrecommendation is valid for outlier detection and identification purpose

We highlight the main findings of this research as follows

bull Under a system of a high local redundancy ri gt 05 and low residuals correlation (ρwi wj lt 05) ifone increase the family-wise error rate (FWE) of the test statistic the performance of the procedurewill be improved for both scenarios of hard constraints and soft constraintsbull The larger the relaxation of the constraints the larger is the effect of the residuals correlation

(ρwi wj ) on the success rate of outlier identification (PCI) Consequently the higher the sensitivityindicator for outlier identification (MIB) and therefore more difficult it becomes to identify anoutlier

bull Under a scenario of soft constraints one should set out at least three soft constraints in order toidentify an outlier in the constraints

In other types of analysis for example deformation analysis of geodetic networks one shouldformulated the constraints as pseudo-observations with σc gt 0 in an adjustment model of system

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

36 of 40

equations In other words the relaxation of constraints makes it possible to estimate deformationeffects that are unmodelled in the deterministic model matrix for example deformations which have aspatial and temporal variations In that case the deformation can be modelled by covariance functionsthat determine the cofactors between constraints [28] This should be applied to control points locatedin the structure but not to control points located outside the structure The control points locatedoutside the structure should be hard constraints and have their stability verified by an independentprocess in order to analyse the deformation of the structure So one will be also able to analyse thesensitivity in terms of minimal detectable deformations andor minimal identifiable deformations [23]Future researches will be addressed to sensitivity analysis for deformation monitoring networks

Author Contributions Conceptualization VFR MTM and IK Data curation VFR MTM and IK Formalanalysis VFR and IK Funding acquisition MTM MRV and LGdSJ Investigation VFR MethodologyVFR and MTM Project administration MTM IK MRV and LGdSJ Software VFR and LGdSJSupervision MTM IK and MRV Validation VFR and MRV Writing ndash original draft VFR Writing ndashreview amp editing VFR MTM IK and MRV All authors have read and agreed to the published version of themanuscript

Funding This research was funded by the CNPqmdashConselho Nacional de Desenvolvimento Cientiacutefico eTecnoloacutegicomdashBrasil (proc no 1035872019-5) This research and the APC were also funded by PETROBRAS(Grant Number 201800545-0)

Acknowledgments In this section you can acknowledge any support given which is not covered by the authorcontribution or funding sections This may include administrative and technical support or donations in kind(eg materials used for experiments)

Conflicts of Interest The authors declare no conflict of interest

References

1 Lehmann R Neitzel F Testing the compatibility of constraints for parameters of a geodetic adjustmentmodel Journal of Geodesy 2013 87 555ndash566 doi101007s00190-013-0627-2

2 Fang X Weighted total least-squares with constraints a universal formula for geodetic symmetricaltransformations Journal of Geodesy 2015 89 459ndash469 doi101007s00190-015-0790-8

3 Amiri-Simkooei AR Weighted Total Least Squares with Singular Covariance Matrices Subject to Weightedand Hard Constraints J Surv Eng 2017 143 04017018

4 Sakic P Ballu V Royer JY A Multi-Observation Least-Squares Inversion for GNSS-Acoustic SeafloorPositioning Remote Sensing 2020 12 doi103390rs12030448

5 Odijk D Stochastic modelling of the ionosphere for fast GPS ambiguity resolution Geodesy Beyond 2000Schwarz KP Ed Springer Berlin Heidelberg Berlin Heidelberg 2000 pp 387ndash392

6 de Oliveira PS Morel L Fund F Legros R Monico JFG Durand S Durand F Modelingtropospheric wet delays with dense and sparse network configurations for PPP-RTK GPS Solutions 201721 237ndash250 doi101007s10291-016-0518-0

7 Chen B Wu L Dai W Luo X Xu Y A new parameterized approach for ionospheric tomography GPSSolutions 2019 23 96 doi101007s10291-019-0893-4

8 Ahmed E Ahmed S Mohamed D Mostafa R jag 2019 Vol 13 chapter Vertical ionospheric delayestimation for single-receiver operation p 81 2 doi101515jag-2018-0041

9 Wang A Chen J Zhang Y Meng L Wang J Performance of Selected Ionospheric Models inMulti-Global Navigation Satellite System Single-Frequency Positioning over China Remote Sensing 201911 doi103390rs11172070

10 Deo M El-Mowafy A Comparison of advanced troposphere models for aiding reductionof PPP convergence time in Australia Journal of Spatial Science 2019 64 381ndash403doi1010801449859620181472046

11 Dai X Lou Y Dai Z Hu C Peng Y Qiao J Shi C Precise Orbit Determination for GNSS ManeuveringSatellite with the Constraint of a Predicted Clock Remote Sensing 2019 11 doi103390rs11161949

12 Courant R Hilbert D Methods of mathematical physics Vol 2 Vol 2 Wiley [New York] 196213 Grafarend EW Optimization of Geodetic Networks The Canadian Surveyor 1974 28 716ndash723

doi101139tcs-1974-0120

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

37 of 40

14 Teunissen P Zero Order Design Generalized Inverses Adjustment the Datum Problem andS-Transformations Optimization and Design of Geodetic Networks Grafarend EW Sansograve F EdsSpringer Berlin Heidelberg Berlin Heidelberg 1985 pp 11ndash55

15 Dermanis A Free network solutions with the DLT method ISPRS Journal of Photogrammetry and RemoteSensing 1994 49 2 ndash 12 doihttpsdoiorg1010160924-2716(94)90061-2

16 Kotsakis C Reference frame stability and nonlinear distortion in minimum-constrained networkadjustment Journal of Geodesy 2012 86 755ndash774 doi101007s00190-012-0555-6

17 Kotsakis C Datum Definition and Minimal Constraints In Encyclopedia of Geodesy Grafarend E EdSpringer International Publishing Cham 2018 pp 1ndash6 doi101007978-3-319-02370-0_157-1

18 Matsuoka MT Rofatto VF Klein I Roberto Veronez M da Silveira LG Neto JBS Alves ACRControl Points Selection Based on Maximum External Reliability for Designing Geodetic Networks AppliedSciences 2020 10 doi103390app10020687

19 Baarda W S-transformations and criterion matrices Publ on geodesy New Series 1973 520 Schaffrin B Aspects of Network Design Optimization and Design of Geodetic Networks Grafarend

EW Sansograve F Eds Springer Berlin Heidelberg Berlin Heidelberg 1985 pp 548ndash59721 Xu PB A general solution in geodetic nonlinear rank-defect models 199722 Altamimi Z Dermanis A The Choice of Reference System in ITRF Formulation VII Hotine-Marussi

Symposium on Mathematical Geodesy Sneeuw N Novaacutek P Crespi M Sansograve F Eds Springer BerlinHeidelberg Berlin Heidelberg 2012 pp 329ndash334

23 Velsink H On the deformation analysis of point fields J Geod 2015 89 1071ndash1087doi101007s00190-015-0835-z

24 Velsink H Extendable linearised adjustment model for deformation analysis Survey Review 201547 397ndash410 doi1011791752270614Y0000000140

25 Hiddo V jag 2016 Vol 10 chapter Time Series Analysis of 3D Coordinates Using NonstochasticObservations p 5 1 doi101515jag-2015-0027

26 Rao CR Markoffrsquos Theorem with Linear Restrictions on Parameters SankhyAuml The Indian Journal ofStatistics (1933-1960) 1945 7 16ndash19

27 Velsink H Testing Methods for Adjustment Models with Constraints Journal of Surveying Engineering2018 144 04018009 doi101061(ASCE)SU1943-54280000260

28 Hiddo V Testing deformation hypotheses by constraints on a time series of geodetic observations J ApplGeod 2017 12 77ndash93 doi101515jag-2017-0028

29 Baarda W A testing procedure for use in geodetic networks Publ on geodesy New Series 1968 230 Teunissen P First and second moments of non-linear least-squares estimators Bull Geodesique (Journal of

Geodesy) 1989 63 253ndash262 doihttpsdoiorg101007BF0252047531 Teunissen P Testing Theory an introduction 2 ed Delft University Press 200632 Amiri-Simkooei A Least-squares variance component estimation theory and GPS applications Phd

thesis Delft University of Technology 200733 Koch KR Parameter estimation and hypothesis testing in linear models 2 ed Springer 199934 Tao H Feng H Xu L Miao M Long H Yue J Li Z Yang G Yang X Fan L Estimation

of Crop Growth Parameters Using UAV-Based Hyperspectral Remote Sensing Data Sensors 2020 20doi103390s20051296

35 Han M Wang Q Wen Y He M He X The Application of Robust Least Squares Method inFrequency Lock Loop Fusion for Global Navigation Satellite System Receivers Sensors 2020 20doi103390s20041224

36 Wei C Estimation for the Discretely Observed CoxacircldquoIngersollacircldquoRoss Model Driven by Small SymmetricalStable Noises Symmetry 2020 12 doi103390sym12030327

37 Sakic P Ballu V Royer JY A Multi-Observation Least-Squares Inversion for GNSS-Acoustic SeafloorPositioning Remote Sensing 2020 12 doi103390rs12030448

38 Livadiotis G General Fitting Methods Based on Lq Norms and their Optimization Stats 2020 3 16ndash31doi103390stats3010002

39 Farooq SZ Yang D Ada ENJ A Cycle Slip Detection Framework for Reliable Single Frequency RTKPositioning Sensors 2020 20 doi103390s20010304

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

38 of 40

40 Araveeporn A Comparing Parameter Estimation of Random Coefficient Autoregressive Model byFrequentist Method Mathematics 2020 8 doi103390math8010062

41 Zhang C Peng T Zhou J Ji J Wang X An Improved Autoencoder and Partial Least SquaresRegression-Based Extreme Learning Machine Model for Pump Turbine Characteristics Applied Sciences2019 9 doi103390app9193987

42 Ji J Yang M Jiang L He J Teng Z Liu Y Song H Output-Only Parameters Identification ofEarthquake-Excited Building Structures with Least Squares and Input Modification Process AppliedSciences 2019 9 doi103390app9040696

43 Kargoll B On the theory and application of model misspecification tests in geodesy Doctoral thesisUniversity of Bonn Landwirtschaftliche Fakultaumlt German Bonn 2007

44 Lehmann R On the formulation of the alternative hypothesis for geodetic outlier detection J Geod 201387 373ndash386 doi101007s00190-012-0607-y

45 Rofatto VF Matsuoka MT Klein I Roberto Veronez M da Silveira LG A Monte Carlo-Based OutlierDiagnosis Method for Sensitivity Analysis Remote Sensing 2020 12 doi103390rs12050860

46 Lehmann R Improved critical values for extreme normalized and studentized residuals in GaussndashMarkovmodels J Geod 2012 86 1137ndash1146 doi101007s00190-012-0569-0

47 Rofatto VF Matsuoka MT Klein I Veronez MR Bonimani ML Lehmann R A half-century ofBaardarsquos concept of reliability a review new perspectives and applications Surv Rev 2018 0 1ndash17[httpsdoiorg1010800039626520181548118] doi1010800039626520181548118

48 Zaminpardaz S Teunissen P DIA-datasnooping and identifiability J Geod 2019 93 85ndash101doi101007s00190-018-1141-3

49 Kok JJ States U On data snooping and multiple outlier testing [microform] Johan J Kok US Dept ofCommerce National Oceanic and Atmospheric Administration National Ocean Service Charting andGeodetic Services For sale by the National Geodetic Information Center NOAA Rockville Md 1984 p61 p

50 Knight NL Wang J Rizos C Generalised measures of reliability for multiple outliers Journal of Geodesy2010 84 625ndash635 doi101007s00190-010-0392-4

51 Gui Q Li X Gong Y Li B Li G A Bayesian unmasking method for locating multiplegross errors based on posterior probabilities of classification variables J Geod 2011 85 191ndash203doi101007s00190-010-0429-8

52 Klein I Matsuoka MT Guzatto MP Nievinski FG An approach to identify multipleoutliers based on sequential likelihood ratio tests Surv Rev 2017 49 449ndash457[httpsdoiorg1010800039626520161212970] doi1010800039626520161212970

53 Hawkins DM Identification of Outliers 1 ed Springer Netherlands 1980 httpsdoiorg101007978-94-015-3994-4

54 Baarda W Statistical concepts in geodesy Publ on geodesy New Series 1967 255 Imparato D Teunissen P Tiberius C Minimal Detectable and Identifiable Biases for quality

control Surv Rev 2019 51 289ndash299 [httpsdoiorg1010800039626520181437947]doi1010800039626520181437947

56 Teunissen PJG Distributional theory for the DIA method Journal of Geodesy 2018 92 59ndash80doi101007s00190-017-1045-7

57 Foumlrstner W Reliability and discernability of extended Gauss-Markov models Seminar on mathematicalmodels of GeodeticPhotogrammetric Point Determination with Regard to Outliers and Systematic Errors 1983 Vol Series A pp 79ndash103

58 Wang J Knight NL New Outlier Separability Test and Its Application in GNSS Positioning 201259 Lehmann R Loumlsler M Multiple Outlier Detection Hypothesis Tests versus Model Selection by

Information Criteria J Surv Eng 2016 142 0401601760 Teunissen PJG Imparato D Tiberius CCJM Does RAIM with Correct Exclusion Produce Unbiased

Positions Sensors 2017 17 doi103390s1707150861 Lehmann R Loumlsler M Congruence analysis of geodetic networks ndash hypothesis tests versus model

selection by information criteria J Appl Geod 2017 11 271ndash283 doi101515jag-2016-0049

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

39 of 40

62 Rofatto V Matsuoka M Klein I DESIGN OF GEODETIC NETWORKS BASED ON OUTLIERIDENTIFICATION CRITERIA AN EXAMPLE APPLIED TO THE LEVELING NETWORK Bull Geod Sci2018 24 152ndash170 httpdxdoiorg101590s1982-21702018000200011

63 Nguyen VK Renault Milocco R Environment Monitoring for Anomaly Detection System UsingSmartphones Sensors 2019 19 doi103390s19183834

64 Nie Y Yang L Shen Y Specific Direction-Based Outlier Detection Approach for GNSS Vector NetworksSensors 2019 19 doi103390s19081836

65 Klein I Matsuoka MT Guzatto MP Nievinski FG Veronez MR Rofatto VF A newrelationship between the quality criteria for geodetic networks Journal of Geodesy 2019 93 529ndash544doi101007s00190-018-1181-8

66 Mierlo JV Statistical Analysis of Geodetic Measurements for the Investigation of Crustal Movements InRecent Crustal Movements 1977 Whitten C Green R Meade B Eds Elsevier 1979 Vol 13 Developmentsin Geotectonics pp 457 ndash 467 doihttpsdoiorg101016B978-0-444-41783-150072-6

67 van der Marel H Roumlsters AJM Statistical Testing and Quality Analysis in 3-D Networks (part II)Application to GPS Global Positioning System An Overview Bock Y Leppard N Eds Springer NewYork New York NY 1990 pp 290ndash297

68 Yang L Wang J Knight NL Shen Y Outlier separability analysis with a multiple alternative hypothesestest J Geod 2013 87 591ndash604 doi101007s00190-013-0629-0

69 Proacuteszynski W Revisiting Baardarsquos concept of minimal detectable bias with regard to outlier identifiabilityJ Geod 2015 89 993ndash1003 doi101007s00190-015-0828-y

70 Romano JP Wolf M Multiple Testing of One-Sided Hypotheses Combining Bonferroni and the BootstrapPredictive Econometrics and Big Data Kreinovich V Sriboonchitta S Chakpitak N Eds SpringerInternational Publishing Cham 2018 pp 78ndash94

71 Teunissen P An Integrity and Quality Control Procedure for use in Multi Sensor Integration 199072 Aydin C Demirel H Computation of Baardarsquos lower bound of the non-centrality parameter J Geod

2004 78 437ndash441 doi101007s00190-004-0406-173 Aydin C Power of Global Test in Deformation Analysis J Surv Eng 2012 138 51ndash56 https

doiorg101061(ASCE)SU1943-5428000006474 Bonferroni C Teoria statistica delle classi e calcolo delle probabilitagrave Pubblicazioni del R Istituto superiore di

scienze economiche e commerciali di Firenze Libreria internazionale Seeber 193675 Box GEP Muller ME A Note on the Generation of Random Normal Deviates The Annals of Mathematical

Statistics 1958 29 610ndash611 httpsdoi101214aoms117770664576 Lemeshko BY Lemeshko SB Extending the Application of Grubbs-Type Tests in Rejecting Anomalous

Measurements Measurement Techniques 2005 48 536ndash547 doi101007s11018-005-0179-977 Lehmann R Scheffler T Monte Carlo based data snooping with application to a geodetic network J

Appl Geod 2011 5 123ndash134 httpsdoiorg101515JAG201101478 Matsumoto M Nishimura T Mersenne twister A 623-dimensionally equidistributed uniform

pseudo-random number generator ACM Transactions on Modeling and Computer Simulation 1998 8 3ndash30httpsdlacmorgcitationcfmid=272991

79 Ihde J Saacutenchez L Barzaghi R Drewes H Foerste C Gruber T Liebsch G Marti U Pail R SiderisM Definition and Proposed Realization of the International Height Reference System (IHRS) Surveys inGeophysics 2017 38 549ndash570 doi101007s10712-017-9409-3

80 Lehmann R Observation error model selection by information criteria vs normality testing StudiaGeophysica et Geodaetica 2015 59 489ndash504 doi101007s11200-015-0725-0

81 Algarni DA Ali AE Heighting and Distance Accuracy with Electronic Digital Levels Journal of King SaudUniversity - Engineering Sciences 1998 10 229 ndash 239 doihttpsdoiorg101016S1018-3639(18)30698-6

82 Gemin ARS Matos AS Faggion PLAs APPLICATION OF CALIBRATION CERTIFICATE OFDIGITAL LEVELING SYSTEMS IN THE MONITORING OF STRUCTURES A CASE STUDY AT THEGOVERNADOR JOSAtilde RICHA HYDROELECTRIC POWER PLANT - PR Boletim de CiAtildeGeodAtilde 201824 235 ndash 249

83 Takalo M Rouhiainen P Development of a System Calibration Comparator for Digital Levels in FinlandNordic Journal of Surveying and Real Estate Research 1 1

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

40 of 40

84 Wiedemann W Wagner A Wunderlich T Using IATS to Read and Analyze Digital Levelling Staffs SIG2016 Paar R Marendic A Zrinjski M Eds Croatian Geodetic Society 2016 pp 515ndash526

85 PJG T The Geometry of Geodetic Inverse Linear Mapping and Non-linear Adjustment Publications onGeodesy New Series 1985 8

86 Rao CR Mitra SK Generalized inverse of a matrix and its applications 197287 Hekimoglu S Erenoglu RC Sanli DU Erdogan B Detecting Configuration Weaknesses in Geodetic

Networks Survey Review 2011 43 713ndash730 doi101179003962611X13117748892632

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

Page 7: Effects of Hard and Soft Equality Constraints on

7 of 40

where |nablai| is the Minimal Detectable Bias (MDB0(i) ) for the case in which there is only one singlealternative hypothesis which can be computed for each individual alternative hypothesis according toEquation (8)

For a single outlier the variance of an estimated outlier denoted by σ2nablai

is

σ2nablai

=(

ciTQminus1

e QeQminus1e ci

)minus1 foralli = 1 middot middot middot n (15)

Thus the MDB can also be written as

MDB0(i) = σnablai

radicλ0 foralli = 1 middot middot middot n (16)

where σnablai =radic

σ2nablai

is the standard deviation of estimated outlier nablaiThe MDB in Equations (14) or (16) of an alternative hypothesis is the smallest-magnitude outlier

that can lead to the rejection of the null hypothesisH0 for a given α0 and β0 Thus for each model ofthe alternative hypothesisH(i)

A the corresponding MDB can be computed [477273] The limitation ofthis MDB is that it was initially developed for the binary hypothesis testing case In that case the MDBis a sensitivity indicator of Baardarsquos w-test when only one single alternative hypothesis is taken intoaccount In this article we are confined to multiple alternative hypotheses Therefore both the MDBand MIB are computed by considering the case of multiple hypothesis testing

For a scenario coinciding with the null hypothesisH0 under multiple testing hypothesis thereis the probability of incorrectly identifying at least one alternative hypothesis This type of wrongdecision is known as the family-wise error rate (FWE) The FWE is defined as

FWE = αprime =le 1minus (1minus α0)n (17)

which is approximatelyFWE = αprime le ntimes α0 (18)

where α0 is the significance level for an individual test The quantity in Equation (18) is just equal tothe upper bound of the Bonferroni inequality ie αprime le nα [74] For example if the FWE level (αprime) is005 and one is running 5 tests then each test will have an α0 of 0055 = 001 In other words one usesa global Type I Error rate αprime that combines all tests under consideration instead of an individual errorrate α0 that only considers one test at a time time [70] In that case the critical value kbon f is computedas

kbon f = Φminus1(

1minus αprime

2n

)(19)

The Bonferroni in Equation (18) is a good approximation for the case in which alternativehypotheses are independent In practice however the test results always depend on each otherto some degree because we always have a correlation between w-tests The correlation coefficientbetween any Baardarsquos w-test statistic (denoted by ρwi wj ) such as wi and wj is given by [57]

ρwi wj =ci

TQminus1e QeQminus1

e cjradicciTQminus1

e QeQminus1e ci

radiccjTQminus1

e QeQminus1e cj

forall(i 6= j) (20)

The correlation coefficient ρwi wj can assume values within the range [minus1 1]Here the extreme normalised residuals max-w (ie maximum absolute) in Equation (10) are

treated directly as a test statistic Note that when using Equation (10) as a test statistic the decisionrule is based on a one-sided test of the form max-w le k However the distributions of max-w cannotbe derived from well-known test distributions (eg normal distribution) The procedure to computethe critical value of max-w is given step-by-step by Rofatto et al [45]

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

8 of 40

The other side of the multiple testing problem is the situation in which there is an outlier in thedataset In that case apart from Type I and Type II errors there is a third type of wrong decisionassociated with Baardarsquos w-test Baardarsquos w-test can also flag a non-outlying observation whilethe lsquotruersquo outlier remains in the dataset We are referring to the Type III error [53] also referred toas the probability of wrong identification (PWI) The description of the Type III error involves aseparability analysis between alternative hypotheses [57666869] Therefore we are now interested inthe identification of the correct alternative hypothesis In that case the non-centrality parameter inEquation (13) is not only related to the sizes of Type I and Type II decision errors but also dependenton the correlation coefficient ρwi wj given by Equation (20)

On the basis of the assumption that one outlier is in the ith position of the dataset (ie H(i)A is

rsquotruersquo) the probability of a Type II error (also referenced as the probability of ldquomissed detectionrdquo denotedby PMD) for multiple testing is

PMD = P(⋂n

i=1|wi| le k

∣∣∣ H(i)A true

) (21)

and the size of a Type III wrong decision (also called ldquomisidentificationrdquo denoted by PWI) is given by

PWI =n

sumi=1P(|wj| gt |wi| foralli |wj| gt k(i 6= j)

∣∣∣ H(i)A true

)(22)

On the other hand the probability of correct identification (denoted by PCI) is

PCI = P(|wi| gt |wj| forallj |wi| gt k(i 6= j)

∣∣∣ H(i)A true

)(23)

with1minusPCI = 1minusPCD + PWI = PMD + PWI (24)

Note that the three probabilities of missed detection PMD wrong identification PWI and correctidentification PCI sum up to unity ie PMD + PWI + PCI = 1

The probability of correct detection PCD is the sum of the probability of correct identification PCI(selecting a correct alternative hypothesis) and the probability of misidentification PWI (selecting oneof the n-1 other hypotheses) ie

PCD = PCI + PWI (25)

The probability of wrong identification PWI is identically zero PWI = 0 when the correlationcoefficient is exactly zero ρwi wj = 0 In that case we have

PCD = PCI = 1minusPMD (26)

The relationship given in Equation (26) would only happen if one neglected the nature of thedependence between alternative hypotheses In other words this relationship is valid for the specialcase of testing the null hypothesisH0 against only one single alternative hypothesisH(i)

A Since the critical region in multiple hypothesis testing is larger than that in single hypothesis

testing the Type II decision error (ie PMD) for the multiple test becomes smaller [47] This meansthat the correct detection in binary hypothesis testing (γ0) is smaller than the correct detection PCDunder multiple hypothesis testing ie

PCD gt γ0 (27)

Detection is easier in the case of multiple hypothesis testing than single hypothesis testingHowever the probability of correct detection PCD under multiple testing is spread out over allalternative hypotheses and therefore identifying is harder than detecting From Equation (25) it is

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

9 of 40

also noted that detection does not depend on identification However outlier identification dependson correct outlier detection Therefore we have the following inequality

PCI le PCD (28)

Note that the probability of correct identification PCI depends on the probability of misseddetection PMD and wrong identification PWI for the case in which data snooping is run only once iea single round of estimation and testing However in this paper we deal with data snooping in itsiterative form (ie IDS) and therefore the probability of correct identification PCI depends on otherdecision rules

In contrast to the data snooping single run the success rate of correct detection PCD forIDS depends on the sum of the probabilities of correct identification PCI wrong exclusion (PWE)over-identification cases (Pover+ and Poverminus) and statistical overlap (Pol) ie

PCD = 1minusPMD = PCI + PWE + Pover+ + Poverminus + Pol (29)

It is important to mention that the probability of correct detection is the complement of theprobability of missed detection Note from Equation (29) that the probability of correct detection PCDis available even for cases in which the identification rate is null PCI = 0 However the probabilityof correct identification (PCI) necessarily requires that the probability of correct detection PCD begreater than zero For the same reasons given for the data snooping single run in the previous sectiondetecting is easier than identifying In that case we have the following relationship for the success rateof correct outlier identification PCI

PCI = PCD minus (PWE + Pover+ + Poverminus + Pol) (30)

such asexist(PCI) isin [0 1] lArrrArr (PCD) gt 0 (31)

It is important to mention that the wrong exclusion PWE describes the probability of identifyingand removing a non-outlying measurement while the lsquotruersquo outlier remains in the dataset In otherwords PWE is the Type III decision error for IDS) The overall wrong exclusion PWE is the result of thesum of each individual contribution to PWE ie

PWE =nminus1

sumi=1PWE(i) (32)

On the basis of the probability levels of correct detection PCD and correct identification PCI the sensitivity indicators of minimal biasesmdashMinimal Detectable Bias (MDB) and Minimal IdentifiableBias (MIB)mdashfor a given αprime can be computed as follows

MDB = arg minnablaiPCD(nablai) gt PCD foralli = 1 middot middot middot n (33)

MIB = arg minnablaiPCI(nablai) gt PCI foralli = 1 middot middot middot n (34)

Equation (33) gives the smallest outlier nablai that leads to its detection for a user-defined correctdetection rate PCD whereas (34) provides the smallest outlier nablai that leads to its identification for auser-defined correct identification rate PCI

As a consequence of the inequality in (28) the MIB will be larger than MDB ie MIB ge MDBFor the special case of having only one single alternative hypothesis there is no difference between theMDB and MIB [55] The computation of MDB0 is easily performed by Equations (14) or (16) whereasthe computation of the MDB in Equation (33) and the MIB in Equation (34) must be computed using

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

10 of 40

Monte Carlo because the acceptance region (as well as the critical region) for the case of multiplealternative hypotheses is analytically intractable

3 Material and Methods

Here we use the procedure provided by Rofatto et al [45] to compute the probability levelsassociated with IDS as well as to estimate the minimal biases mdash Minimal Detectable Bias(MDB) andMinimal Identifiable Bias (MIB) The procedure is summarised in Figure 1

The procedure to compute the critical value of max-w (k) is given step-by-step as follows

1 Specify the probability density function (pdf) of the w-test statistics The pdf assigned to thew-test statistics under anH0-distribution is

(w1 w2 w3 middot middot middot wn)T sim N (0Rw) (35)

where Rw isin Rntimesn is the correlation matrix with the main diagonal elements equal to 1 and theoff-diagonal elements are the correlation between the w-test statistics computed by Equation (20)

2 In order to have w-test statistics under H0 uniformly distributed random number sequencesare produced by the Mersenne Twister algorithm and then they are transformed into a normaldistribution by using the BoxndashMuller transformation [75] BoxndashMuller has already been usedin geodesy for Monte Carlo experiments [467677] Therefore a sequence of m random vectorsfrom the pdf assigned to the w-test statistics is generated according to Equation (35) In that casewe have a sequence of m vectors of the w-test statistics as follows[

(w1 w2 w3 middot middot middot wn)T(1)

(w1 w2 w3 middot middot middot wn)T(2)

middot middot middot (w1 w2 w3 middot middot middot wn)T(m)

](36)

3 Compute the test statistic by Equation (10) for each sequence of w-test statistics Thus we have(max

iisin1middotmiddotmiddot n|wi|(1) max

iisin1middotmiddotmiddot n|wi|(2) middot middot middot max

iisin1middotmiddotmiddot n|wi|(m)

)(37)

4 Sort in ascending order the maximum test statistic in Equation (37) getting a sorted vector wsuch that

w(1) lt w(2) w(3) middot middot middot lt w(m) (38)

The sorted values w in Equation (38) provide a discrete representation of the cumulative densityfunction (cdf) of the maximum test statistic max-w

5 Determine the critical value k as follows

k = w[(1minusαprime)timesm] (39)

where [] denotes rounding down to the next integer that indicates the position of the selectedelements in the ascending order of w This position corresponds to a critical value for a stipulatedoverall false alarm probability αprime This can be done for a sequence of values αprime in parallel

It is important to mention that the probability of a Type I decision error for multiple testing αprime islarger than that of Type I for single testing α0 This is because the critical region in multiple testing islarger than that in single hypothesis testing

After finding the critical value k the procedure based on Monte Carlo is also applied to computethe probability levels of IDS when there is an outlier in the dataset The overview of of the mainelements involved with the IDS was detailed in Section 2 The flowchart is displayed in Figure 1

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

11 of 40

Inp

ut

Functional amp Stochastic Models Significance level (αrsquo) Outlier magnitude range (nablai) Probability of correct detection ( ) Probability of correct identification ( )

CD CI

Gen

erat

e

ran

do

m d

raw

s

Generate a sample of random errors

119942~119925(120782 Q119942)

Compute the total error ε = e + cinablai

Compute the least-squares residual vector e = Rε

Iterativ

e ou

tlier elimin

ation

pro

cess

Compute the maximum w-test statistic

max wi

max wi gt k

End

Identify and remove the measurement from the dataset

Size of max wi gt 1

det ATWA gt 0

Compute w-test statistics

Compute and store nCI nMD nWE nover+ nover- and nol

Co

mp

ute th

e pro

bab

ilities

Number of Monte Carlo experiments (m) = 200000

Compute the probabilities CI MD WE over+ over- and ol

Sensitivity indicators Outlier detection (MDB)

amp Outlier identification (MIB)

YES NO

YES

NO

YES NO

YES

NO

Figure 1 Flowchart of the algorithm to compute the probability levels of Iterative Data Snooping (IDS)for each measurement in the presence of an outlier [45]

The steps displayed in Figure 1 are detailed as follows

1 First random error vectors are synthetically generated on the basis of a multivariate normaldistribution because the assumed stochastic model for random errors is based on the matrixcovariance of the observations Here we use the Mersenne Twister algorithm [78] to generate asequence of random numbers and BoxndashMuller [75] to transform it into a normal distribution

2 The total error (ε) is a combination of random errors and its corresponding outlier is given asfollows

ε = e + cinablai (40)

The magnitude intervals of simulated outliers are user-defined The magnitude intervals arebased on the standard deviation of the observation (σ) eg |3σ| to |6σ| Since the outlier can bepositive or negative the proposed algorithm randomly selects the signal of the outlier (for q = 1)Here we use the discrete uniform distribution to select the signal of the outlier

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

12 of 40

In Equation (40) e is the random error generated from the normal distribution according toEquation (2) and the second part cinablai is the additional parameter that describes the alternativemodel according to Equation (40)

3 Next we compute the least-squares residuals vector according to Equation (3) but now we usethe total error (ε) in Equation 40 as follows

e = Rε (41)

4 Run the IDS In this step the test statistic is computed according to (9) Then the maximumtest statistic value is obtained according to Equation (10) Now the decision rule is based on thecritical value k computed by Monte Carlo in previous stage After identifying the measurementsuspected to be the most likely outlier it is excluded from the model and least-squares estimationand data snooping are applied iteratively until there are no further outliers identified in thedataset If two or more observations are simultaneously detected (ie if maxminusw gt k and the sizeof maxminusw gt 1 then the IDS is ended) Furthermore every time that a measurement suspectedto be the most likely outlier is removed from the model we check whether the normal matrixATW A is invertible or not If the determinant of ATW A is 0 det|ATW A| = 0 then there isa necessary and sufficient condition for a square matrix ATW A to be non-invertible In otherwords the IDS is ended when det|ATW A| = 0 If no outlier is detected (ie maxminusw lt k) thenthe IDS is also ended

The IDS procedure is performed for m experiments of random error vectors for each experimentcontaminated by an outlier in the ith measurement Therefore for each measurementcontaminated by an outlier there are υ = 1 m experiments

5 After running m=200000 experiments [47] the probabilities associated with IDS are computed(note m refers to the total number of Monte Carlo experiments) The probability of correctidentification (PCI) is the ratio between the number of times that the outlier is correctly identified(denoted as nCI) and m experiments ie

PCI =nCIm

(42)

Similar to Equation (42) the wrong decisions are computed as

PMD =nMD

m(43)

where nMD is the number of experiments in which IDS does not detect the outlier (PMDcorresponds to the rate of missed detection)

PWE =nWE

m(44)

where nWE is the number of experiments in which the IDS procedure flags and removes onlyone single non-outlying measurement while the lsquotruersquo outlier remains in the dataset (PWE is thewrong exclusion rate)

Pover+ =nover+

m(45)

where nover+ is the number of experiments in which IDS correctly identifies and removes theoutlying measurement and others and Pover+ corresponds to its probability

Poverminus =noverminus

m(46)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

13 of 40

where noverminus represents the number of experiments in which IDS identifies and removes morethan one non-outlying measurement whereas the lsquotrue outlierrsquo remains in the dataset (Poverminus isthe probability corresponding to this error probability class)

Pol =nolm

(47)

where nol is the number of experiments in which the detector in Equation (10) flags two (or more)measurements simultaneously during a given iteration of IDS Here this is referred to as thenumber of statistical overlap nol and Pol corresponds to its probability

6 Finally the sensitivity indicators (MDB and MIB) are computed based on Equation 33 andEquation 34 respectively

In this paper the probability levels associated with IDS were computed for each observationindividually and for each outlier magnitude However they were grouped into clusters based onnumber of local redundancy (ri) and maximum absolute correlation between the w-test statistics(ρwi wj ) Furthermore we take care to control the family-wise error rate

4 Case study of Levelling Geodetic Network

The analyse of the constraints effects on the probability levels of IDS are performed by an exampleof a levelling geodetic network

A levelling geodetic network is a set of points located on the Earthrsquos surface or near it andinterconnected by height difference measurements Some of these points are associated with a verticalheight reference system (ie some points have known height) whereas others are parameters (unknownheights) The term known here means that those points with known heights are constraints necessaryto ensure that parameters (unknown heights) are sufficiently estimable In the sense of modern geodeticreference systems realisation and unification of a vertical height system consists of a combination ofGNSS (Global Navigation Satellite System) and spirit levelling with geoid models We will not gointo the details about height geodetic reference system and frame (Readers who wish to have furtherdetails on that issue please see eg [79])

The mathematical model associated with a geodetic levelling network is linear (ie the levellingmeasurements bear a known linear relationship with the unknown heights) Geodetic networksusually has more measurements than parameters (ie n gt u) ie we have a redundant measurementsystem However due to intrinsic random errors in a system redundant measurements often leadto an inconsistent system of equations To make the system consistent we have to introduce theinformation about the random measurement errors The stochastical properties of the measurementerrors are directly associated with the assumption of the probability distribution of these errors Ingeodesy and many other scientific branches the well-known normal distribution is one of the mostused as measurement error model [80] Because of this the model ceases to be purely mathematicaland becomes a statistical model with functional and stochastic part

In the absence of outliers ie under null hypothesisH0 in Equation (1) the model for levellinggeodetic network can be written as follows

∆himinusj + e∆himinusj= hj minus hi (48)

where ∆himinusj is the height difference measured from point i to j and e∆himinusjis the random error associated

with the levelling measurement For a only one single levelling line one of these points is the constraint(known height) from which the height of another point is determined The point with known height isalso referred to as control point or benchmark In a geodetic network on the other hand we have a setof levelling lines that connect both from points of unknown height and from points of known height(constraints) Under normal working conditions (ie H0) the measurement errors model is then givenby Equation (2)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

14 of 40

In this case study we demonstrate the application of the proposed algorithm by [45] subject to adifferent constraints scenarios for that levelling geodetic network

Recent work has been focused on non-stochastic constraints (hard constraints) Here we alsoconsider cases where constraints are subject to a random errors (ie soft constraints) That softconstraints are non-deterministic and therefore they are measured model elements that can bedesignated as a priori information The approach of this example can be applied in the designstage of geodetic network analysis [6265] Furthermore the experiments performed here can also beextended to geodetic deformation analysis when the deformation effects are unmodelled in the senseof deterministic form [28]

41 Problem description

To analyse the constraints effects on the IDS an example is taken from a geodetic levellingnetwork with 12 height differences between the points The equipment used to measure the leveldifference can be an electronic digital level In that case the levelling measurement system comprisesof a special bar-coded staff (also called barcode rod) and a digital level (instrument) A digital level isbasically a telescope that enables a horizontal line of sight Digital levels consist of additional electronicimage processing components to automatically read and analyse digital (bar coded) levelling staffswhere the graduation is replaced by a manufacturer dependent code pattern Generally the result isautomatically stored in the data collector of the digital level An example of a digital level ndash bar-codestaff system is displayed in the Figure 2 For more details about digital level see eg [81ndash84]

Figure 2 Example of a digital level ndash bar-code staff system [45]

The standard-deviation of the uncorrelated measurements were the same and taken equal toσ = 1mm The points are indicated as A to G The eight network configuration are displayed in Figure3(a b c d and e) and their details are given as follows

1 Figure 3(a) Network with 1 hard constraint (ie network minimally constrained) Since thedimension of the network is 1D the minimum information necessary to estimate the unknownheights is one The height of G was fixed as control point (hard constraint) and 6 unknown heights(ABCDEF) were minimally constrained Therefore the redundancy of the system (or overalldegrees of freedom) was r = nminus rank(A) = nminus u = 12minus 6 = 6

2 Figure 3(b) Network with one extra hard constraint (ie two hard constraints) The heights A andD were taken as hard constraints (ie heights A and D were fixed) The redundancy of the system

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

15 of 40

in that case was r = 12minus 5 = 7 with 5 unknown heights (BCEFG) over-constrained

3 Figure 3(c) Network with two extra hard constraints (ie three hard constraints)The heights A Dand G were taken as hard constraints In that case the redundancy of the system was r = 12minus 4 = 8

4 Figure 3(d) Network with two soft constraints (A and D) In that case a standard-deviation largerthan zero was assigned to both constraints (ie σc gt 0 In other words A and D were processedas being both observations and unknown parameters ie A and D were pseudo-observationsHere the both constraints were simultaneously relaxed by considering their uncertainties 10 timesworse than measurements (ie σc = 10times σ = 10mm) 10 times better than measurements (ieσc = 01mm) and their uncertainties equal to the measurements (σc = 1mm) In that case theredundancy of the system was r = 14minus 7 = 7

5 Figure 3(e) Network processed with A D and G as pseudo-observations Those three constraintswere simultaneously relaxed by considering their standard-deviations equal to σc = 10mm (10 timesworse than measurements) σc = 01mm (10 times better than measurements) and σc = 1mm (thesame as the measurements) In that case the redundancy of the system was r = 15minus 7 = 8

y1

y2

y3

y4

y5

y6

y7 y8

y9 y10

B C

D

E F

G y12 y11

A

y1

y2

y3

y4

y5

y6

y7 y8

y9 y10

A

B C

D

E F

G y12 y11

y1

y2

y3

y4

y5

y6

y7 y8

y9 y10

A

B C

D

E F

G y12 y11

(a) (b)

(c) (d)

(e)

Unknown Heights Soft Constraints Hard Constraints

y1

y2

y3

y4

y5

y6

y7 y8

y9 y10

A

B C

D

E F

G y12 y11

y1

y2

y3

y4

y5

y6

y7 y8

y9 y10

A

B C

D

E F

G y12 y11

Figure 3 Levelling Geodetic Network subject to different constraint scenarios

The following system of equations for that problem is given by

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

16 of 40

y1 + e1 = hB minus hA

y2 + e2 = hC minus hB

y3 + e3 = hD minus hC

y7 + e7 = hB minus hG

y8 + e8 = hC minus hG

y11 + e11 = hB minus hF

y12 + e12 = hC minus hE

(49)

The functional model (A) for the system of equations in 49 is given by

A =

minus1 1 0 0 0 0 00 minus1 1 0 0 0 00 0 minus1 1 0 0 00 0 0 minus1 1 0 00 0 0 0 minus1 1 01 0 0 0 0 minus1 00 1 0 0 0 0 minus10 0 1 0 0 0 minus10 0 0 0 1 0 minus10 0 0 0 0 1 minus10 1 0 0 0 minus1 00 0 1 0 minus1 0 0

(50)

Note that the rank defect of the matrix A is uminus rank(A) = 7minus 6 = 1 In that case there is neededto add at least one constraint in order to avoid rank deficiency of the matrix A This is guaranteedwhen we take one height as known For example from the network in Figure 3(a) we have added theheight G as known (ie as a hard constraint) In that case the constraint equation should be addedinto the system in 49 ie

y13 = hG with σy13 = 0 (51)

noticing that because the standard deviation is zero the observation is non-stochastic (hard constraint)and the residual ey13 = 0 This can generate problems in the inversion of the covariance matrix of theobservations Qe for the calculation of the weight matrix W because the weight for that constraintwould be 1

0 = infin In order to avoid that problem we have eliminated the rank deficiency of matrixA by removing the seventh column of matrix A in 50 associated with the height G Now we haveuminus rank(A) = 6minus 6 = 0

The constraint defines the geodetic datum ie the S-system ([85]p41) Another approach tosolving the system of equations in 2 could be based on generalised (pseudo) inverses (see eg [86])

The location of the constraints can be chosen in some circumstances for example during thedesign stage of a geodetic network For the special case of having a minimally constrained systemthe location of the constraint will not influence the w-test statistics and the sensitivity indicators (MIBand MDB) [18] However more constraints than the minimum necessary to have a solution (ie extraconstraints or redundant constraints) can change the least-squares residuals and hence w-test statisticsand the minimal biases

From the network with one extra constraint (2 constraints) in Figure 3(b) for example the bothfirst (height A) and fourth column (height D) of matrix A in 50 were eliminated in the case of having

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

17 of 40

the two heights as hard constraints For the case where these two heights (A and D) were taken as softconstraints however two observation equations were added to Equation 49 ie

y13 + e13 = hA σy13 gt 0

y14 + e14 = hD σy14 gt 0(52)

In the case of soft constraints in Equation 52 two lines were added in matrix A In other wordsA and D were taken as pseudo-observations In that case the rank deficiency was also null (ieuminus rank(A) = 7minus 7 = 0) the redundancy of the system was r = nminus rank(A) = nminus u = 7 and thematrix A was given as follows

A =

minus1 1 0 0 0 0 00 minus1 1 0 0 0 00 0 minus1 1 0 0 00 0 0 minus1 1 0 00 0 0 0 minus1 1 01 0 0 0 0 minus1 00 1 0 0 0 0 minus10 0 1 0 0 0 minus10 0 0 0 1 0 minus10 0 0 0 0 1 minus10 1 0 0 0 minus1 00 0 1 0 minus1 0 01 0 0 0 0 0 00 0 0 1 0 0 0

(53)

For this example of two soft constraints and by considering the both soft constraints withstandard-deviation σc = 10mm the symmetric and positive semi-definite covariance matrix of theobservations (Qe) was given as follows

Qe =

1 0 0 middot middot middot 0 00 1 0 middot middot middot 0 00 0 1 middot middot middot 0 0

0 0 0 middot middot middot 100 00 0 0 middot middot middot 0 100

(54)

The last two rows and columns of the matrix Qe in Equation 54 refer to the variances (σc2 =

(10mm)2 = 100mm2) of the heights constraints A and D respectivelySimilarly matrices A and Qe were constructed for the other cases studied hereAlthough other measurements are able to identify an outlier for the case of having only one single

soft constraint the pseudo-observation (constraint) is not In that case the defect configurationis associated with the additional parameter in the constraint (ie the presence of an outlier inthe constraint) In other words an additional parameter on the soft constraint will not estimableFor example if the height point G was taken as a soft constraint the presence of an outlier inpseudo-observation G would lead to rank deficiency of matrix A ie u minus rank(A) = 8 minus 7 = 1Therefore the case of having only one single soft constraint was not considered here

42 Result of the Hard Constraint Effects on the Iterative Outlier Elimination Procedure

In this section we present the results of the effects of the hard constraint on the IDS The scenariosin Figure 3(a) (network minimally constrained) Figure 3(b) (two hard constraints) and Figure 3(c)(three hard constraints) were considered for the analysis

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

18 of 40

The correlation coefficient between w-test statistics (ρwi wj ) were computed for the three scenariosaccording to Equation(20) Table 1 provides that correlation for that network minimally constrainedTable 2 for that network over-constrained with two hard constraints and Table 3 for that networkover-constrained with three hard constraints

Table 1 Correlation matrix of w-test statistics for the levelling network minimally constrained in Figure3(a)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12

wy1 10 02 01 01 02 10 -02 00 00 02 -04 -01wy2 10 02 02 03 02 05 -05 -02 02 03 -03wy3 10 10 02 01 00 02 -02 00 01 04wy4 10 02 01 00 02 -02 00 01 04wy5 10 02 -02 02 05 -05 03 -03wy6 10 -02 00 00 02 -04 -01wy7 10 -03 -03 -04 -04 -01wy8 10 -04 -03 -01 -04wy9 10 -03 01 04wy10 10 04 01wy11 10 -01wy12 100

Table 2 Correlation matrix of w-test statistics for the levelling network with two hard constraints inFigure 3(b)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12

wy1 10 04 04 -03 -01 04 -03 01 01 01 -04 -01wy2 10 04 -01 01 -01 04 -04 -01 01 03 -03wy3 10 04 -01 -03 -01 03 -01 -01 01 04wy4 10 04 04 01 01 -03 01 01 04wy5 10 04 -01 01 04 -04 03 -03wy6 10 -01 -01 -01 03 -04 -01wy7 10 -04 -03 -04 -04 -01wy8 10 -04 -03 -01 -04wy9 10 -04 01 04wy10 10 04 01wy11 10 -01wy12 10

Table 3 Correlation matrix of w-test statistics for the levelling network with three hard constraints inFigure 3(c)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12

wy1 10 03 01 -01 -01 01 -04 -01 -01 -01 -03 -01wy2 10 03 -01 01 -01 03 -03 -01 01 03 -03wy3 10 01 -01 -01 01 04 01 01 01 03wy4 10 03 01 -01 -01 -04 -01 01 03wy5 10 03 -01 01 03 -03 03 -03wy6 10 01 01 01 04 -03 -01wy7 10 -01 -01 -01 -03 -01wy8 10 -01 -01 -01 -03wy9 10 -01 01 03wy10 10 03 01wy11 10 -01wy12 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

19 of 40

Table 4 gives the local redundancy (ri) the standard-deviation of the LS-estimated outlier σnablai andthe maximum absolute correlation (maxρwi wj

) for each scenario of hard constraint set out of in thisstudy ie Figure 3(ab and c)

Table 4 Local redundancy (ri) standard-deviation of the LS-estimated outlier σnablai and the maximumabsolute correlation (maxρwi wj

) for each scenario of hard constraint

1 hard constraint 2 hard constraints 3 hard constraintsMeasurement ri σnablai maxρwi wj

ri σnablai maxρwi wjri σnablai maxρwi wj

y1 0396 1589 100 0583 1309 036 0708 1188 041y2 0500 1414 047 0583 1309 036 0583 1309 032y3 0396 1589 100 0583 1309 036 0708 1188 041y4 0396 1589 100 0583 1309 036 0708 1188 041y5 0500 1414 047 0583 1309 036 0583 1309 032y6 0396 1589 100 0583 1309 036 0708 1188 041y7 0563 1333 047 0583 1309 036 0708 1188 041y8 0563 1333 047 0583 1309 036 0708 1188 041y9 0563 1333 047 0583 1309 036 0708 1188 041y10 0563 1333 047 0583 1309 036 0708 1188 041y11 0583 1309 043 0583 1309 036 0583 1309 032y12 0583 1309 043 0583 1309 036 0583 1309 032

Here the twelve levelling measurements were clustered into four clusters The clustering wasdefined according to the local redundancy (ri) and the maximum absolute correlation (maxρwi wj

) inTable 4 This is the first time that a clustering technique based on the similarity of local redundancyand the maximum absolute correlation between w-test statistics is applied to a problem of geodeticnetworks Similarly this has been done in [45] The four cluster were defined as follows

bull Cluster 1 y1 y3 y4 and y6bull Cluster 2 y2 and y5bull Cluster 3 y7 y8 y9 and y10bull Cluster 4 y11 and y12

The probability levels associated with IDS were averaged for each of these clusters Here we computethe probability levels of the IDS based on the procedure in Section (3) [45] The critical values werek = 389 k = 393 and k = 393 for 1 hard constraint 2 hard constraints and 3 hard constraintsrespectively These critical values were found for αprime = 0001 according to the procedure described inAppendix The probability of correct identification (PCI) and correct detection (PCD = 1minusPMD)are displayed in Figure 4 for each number of hard constraint (denoted by hc)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

20 of 40

(c) (d)

(e) (f)

(g) (h)

(a) (b)

Figure 4 Probability of correct identification (PCI) and Probability of correct detection (PCD) for thecase of hard constraints and for αprime = 0001 Cluster 1(ab) Cluster 2(cd) Cluster 3(ef) and Cluster4(gh)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

21 of 40

The outlier magnitude were defined from |5σ| to |9σ| The outlier of |5σ| was chosen because itis approximately the lowest MDB0(i) of the network when a single hypothesis testing is in play ascan be seen in Equation (14) or Equation (16) That MDB0(i) of |5σ| was computed for significancelevel of αprime = 0001 and a power of the test γ0 = 08 This strategy reduces the search space for a MIB(Minimal Identifiable Bias) because we will always have the following inequality MIB ge MDB0(i)[4755] Remember that the IDS procedure is an example of multiple hypothesis testing (see 2)

The sensitivity indicators MDB and MIB for IDS were also computed according to Equation (33)and (34) respectively The success rate for outlier detection and outlier identification were taken asbeing PCD = PCI = 08 respectively Table 5 provides the values of MDB and MIB for that case ofhard constraints

Table 5 MDB and MIB for the case of hard constraints based on αprime = 0001 and PCD = PCI = 08

1 hard constraint 2 hard constraints 3 hard constraintsCluster MDB (σ) MIB (σ) MDB (σ) MIB (σ) MDB (σ) MIB (σ)

1 75 - 63 63 57 572 67 68 63 64 63 643 64 64 63 63 58 584 64 64 64 64 64 64

Figure 5 shows the probability of wrong exclusion (PWE) The over-identification cases (Pover+

and Poverminus) were smaller than 0001 (ie they were practically null) There were not statistical overlap(Pol) for clusters 2 3 and 4 We will discuss more about statistical overlap (Pol) later

(c) (d)

(a) (b)

Figure 5 Probability of wrong exclusion (PWE) for the case of hard constraints and for αprime = 0001Cluster 1(a) Cluster 2(b) Cluster 3(c) and Cluster 4(d)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

22 of 40

43 Result of the Soft Constraint Effects on the Iterative Outlier Elimination Procedure

The both configuration in Figure 3(d) and Figure 3(e) were analysed in terms of soft constraintsIn that case the critical values were k = 395 k = 395 and k = 392 for two soft constraints withσc = 01mm σc = 1mm and σc = 10mm respectively In the case of three soft constraints the criticalvalues found were k = 399 k = 399 and k = 396 for σc = 01mm σc = 1mm and σc = 10mmrespectively All these critical values were computed for αprime = 0001

The correlation coefficient between w-test statistics (ρwi wj ) are displayed in Table 6 Table 7and Table 8 for two soft constraints with standard-deviation 10 times larger than measurements(ie σc = 10times σ = 10mm) 10 times better than measurements (ie σc = 01mm) and equal to themeasurements (σc = 1mm) respectively

Table 6 Correlation matrix of w-test statistics for the levelling network with two soft constraints withσc = 10mm in Figure 3(d)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14

wy1 10 02 01 01 02 10 -02 00 00 02 -04 -01 006 -01wy2 10 02 02 03 02 05 -05 -02 02 03 -03 003 00wy3 10 10 02 01 00 02 -02 00 01 04 01 -01wy4 10 02 01 00 02 -02 00 01 04 -01 01wy5 10 02 -02 02 05 -05 03 -03 00 00wy6 10 -02 00 00 02 -04 -01 -01 01wy7 10 -03 -03 -04 -04 -01 00 00wy8 10 -04 -03 -01 -04 00 00wy9 10 -03 01 04 00 00wy10 10 04 01 00 00wy11 10 -01 00 00wy12 10 00 00wy13 10 -10wy14 10

Table 7 Correlation matrix of w-test statistics for the levelling network with two soft constraints withσc = 01mm in Figure 3(d)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14

wy1 10 04 04 -03 -01 04 -03 01 01 01 -04 -01 06 -06wy2 10 04 -01 02 -01 04 -04 -01 01 03 -03 04 -04wy3 10 04 -01 -03 -01 03 -01 -01 01 04 06 -06wy4 10 04 04 01 01 -03 01 01 04 -06 06wy5 10 04 -01 01 04 -04 03 -03 -04 04wy6 10 -01 -01 -01 03 -04 -01 -06 06wy7 10 -04 -03 -04 -04 -01 -02 02wy8 10 -04 -03 -01 -04 02 -02wy9 10 -04 01 04 02 -02wy10 10 04 01 -02 02wy11 10 -01 00 00wy12 10 00 00wy13 10 -10wy14 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

23 of 40

Table 8 Correlation matrix of w-test statistics for the levelling network with two soft constraints withσc = 1mm in Figure 3(d)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14

wy1 10 03 02 -01 01 07 -03 00 01 02 -04 -01 04 -04wy2 10 03 01 03 01 04 -04 -01 01 03 -03 03 -03wy3 10 07 01 -01 00 03 -02 -01 01 04 04 -04wy4 10 03 02 01 02 -03 00 01 04 -04 04wy5 10 03 -01 01 04 -04 03 -03 -03 03wy6 10 -02 -01 00 03 -04 -01 -04 04wy7 10 -03 -03 -04 -04 -01 -01 01wy8 10 -04 -03 -01 -04 01 -01wy9 10 -03 01 04 01 -01wy10 10 04 01 -01 01wy11 10 -01 00 00wy12 10 00 00wy13 10 -10wy14 10

Table 9 gives the local redundancy (ri) the standard-deviation of the LS-estimated outlier σnablai andthe maximum absolute correlation (maxρwi wj

) for the scenarios of two constraints

Table 9 Local redundancy (ri) standard-deviation of the LS-estimated outlier σnablai (mm) and themaximum absolute correlation (maxρwi wj

) for each scenario of two soft constraints

σc = 01mm σc = 1mm σc = 10mmMeasurement ri σnablai maxρwi wj

ri σnablai maxρwi wjri σnablai maxρwi wj

y1 0581 1312 0564 0471 1457 0681 0397 1587 0994y2 0582 1311 0376 0533 1369 0423 0501 1413 0471y3 0581 1312 0564 0471 1457 0681 0397 1587 0994y4 0581 1312 0564 0471 1457 0681 0397 1587 0994y5 0582 1311 0376 0533 1369 0423 0501 1413 0471y6 0581 1312 0564 0471 1457 0681 0397 1587 0994y7 0583 1310 0359 0571 1324 0423 0563 1333 0471y8 0583 1310 0359 0571 1324 0423 0563 1333 0471y9 0583 1310 0359 0571 1324 0423 0563 1333 0471y10 0583 1310 0359 0571 1324 0423 0563 1333 0471y11 0583 1309 0358 0583 1309 0398 0583 1309 0433y12 0583 1309 0358 0583 1309 0398 0583 1309 0433y13 0007 1163 1000 0300 1826 1000 0497 14189 1000y14 0007 1163 1000 0300 1826 1000 0497 14189 1000

From Table 9 five clusters were defined for each case of two soft constraints ie for the casewhere heights A and D were given as soft constraints in Figure 3(d) as follows

bull Cluster 1 y1 y3 y4 and y6bull Cluster 2 y2 and y5bull Cluster 3 y7 y8 y9 and y10bull Cluster 4 y11 and y12bull Cluster 5 y13 and y14

The probabilities of correct identification (PCI) and correct detection (PCD) for the measurements(Cluster 1 to Cluster 4) subject to the scenarios of two soft constraints (heights A and D) are displayedin Figure 6

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

24 of 40

(c) (d)

(e) (f)

(g) (h)

(a) (b)

Figure 6 Probability of correct identification (PCI) and Probability of correct detection (PCD) for themeasurements subject to the scenarios of two soft constraints for αprime = 0001 Cluster 1(ab) Cluster2(cd) Cluster 3(ef) and Cluster 4(gh)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

25 of 40

Note that the Cluster 5 is associated with the two soft constraints (ie y13 and y14) The probabilityof correct identification (PCI) for these both soft constraints were null However the probability ofcorrect detection were not Figure 7 shows the probability of correct detection (PCD) for these two softconstraints (ie heights A and D)

Figure 7 Probability of correct detection PCD for the two soft constraints (Cluster 5 heights A and D)and for αprime = 0001

The probability of wrong exclusion PWE for the measurements (Cluster 1 to Cluster 4) subject tothe scenarios of two soft constraints (heights A and D) are displayed in Figure 8 Figure 9 gives the theprobability of wrong exclusion PWE for that two constraints (ie heights A and D)

(c) (d)

(a) (b)

Figure 8 Probability of wrong exclusion PWE for the measurements subject to the scenarios of two softconstraints for αprime = 0001 Cluster 1(a) Cluster 2(b) Cluster 3(c) and Cluster 4(d)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

26 of 40

Figure 9 Probability of wrong exclusion PWE for the two soft constraints (Cluster 5 heights A and D)and for αprime = 0001

The over-identification cases (Pover+ and Poverminus) and the statistical overlap (Pol) were practicallynull for that case

The sensitivity indicators (MDB and MIB) for each scenario of two soft constraints are displayedin Table 10

Table 10 MDB and MIB for the case of two soft constraints based on αprime = 0001 and PCD = PCI = 08

σc = 10mm σc = 1mm σc = 01mmCluster MDB (σ) MIB (σ) MDB (σ) MIB (σ) MDB (σ) MIB (σ)

1 75 25 7 71 63 632 68 68 66 66 63 633 64 64 64 64 63 634 63 63 63 63 63 635 68 - 88 - 57 -

As with the case of two soft constraints Tables 1112 and 13 show the correlations (ρwi wj ) for thecase where there were three soft constraints ie for the network configuration in Figure 3(e)

Table 11 Correlation matrix of w-test statistics for the levelling network with three soft constraintswith σc = 10mm in Figure 3(e)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14 wy15

wy1 10 02 01 01 02 10 -02 00 00 02 -04 -01 01 00 00wy2 10 02 02 03 02 05 -05 -02 02 03 -03 00 00 00wy3 10 10 02 01 00 02 -02 00 01 04 00 -01 00wy4 10 02 01 00 02 -02 00 01 04 00 01 00wy5 10 02 -02 02 05 -05 03 -03 00 00 00wy6 10 -02 00 00 02 -04 -01 -01 00 00wy7 10 -03 -03 -04 -04 -01 00 00 00wy8 10 -04 -03 -01 -04 00 00 00wy9 10 -03 01 04 00 00 00wy10 10 04 01 00 00 00wy11 10 -01 00 00 00wy12 10 00 00 00wy13 10 -05 -05wy14 10 -05wy15 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

27 of 40

Table 12 Correlation matrix of w-test statistics for the levelling network with three soft constraintswith σc = 01mm in Figure 3(e)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14 wy15

wy1 10 03 01 -01 -01 01 -04 -01 -01 -01 -03 -01 07 -01 -04wy2 10 03 -01 01 -01 03 -03 -01 01 03 -03 03 -03 00wy3 10 01 -01 -01 01 04 01 01 01 03 01 -07 04wy4 10 03 01 -01 -01 -04 -01 01 03 -01 07 -04wy5 10 03 -01 01 03 -03 03 -03 -03 03 00wy6 10 01 01 01 04 -03 -01 -07 01 04wy7 10 -01 -01 -01 -03 -01 -04 -01 04wy8 10 -01 -01 -01 -03 -01 -04 04wy9 10 -01 01 03 -01 -04 04wy10 10 03 01 -04 -01 04wy11 10 -01 00 00 00wy12 10 00 00 00wy13 10 -02 -06wy14 10 -06wy15 10

Table 13 Correlation matrix of w-test statistics for the levelling network with three soft constraintswith σc = 1mm in Figure 3(e)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14 wy15

wy1 10 03 01 00 01 06 -03 00 00 01 -04 -01 05 -02 -02wy2 10 03 01 03 01 04 -04 -01 01 03 -03 02 -02 00wy3 10 06 01 00 00 03 -01 00 01 04 02 -05 02wy4 10 03 01 00 01 -03 00 01 04 -02 05 -02wy5 10 03 -01 01 04 -04 03 -03 -02 02 00wy6 10 -01 00 00 03 -04 -01 -05 02 02wy7 10 -03 -02 -03 -04 -01 -02 00 02wy8 10 -03 -02 -01 -04 00 -02 02wy9 10 -03 01 04 00 -02 02wy10 10 04 01 -02 00 02wy11 10 -01 00 00 00wy12 10 00 00 00wy13 10 -04 -05wy14 10 -05wy15 10

Table 14 Local redundancy (ri) standard-deviation of the LS-estimated outlier σnablai (mm) and themaximum absolute correlation (maxρwi wj

) for each scenario of three soft constraints

σc = 01mm σc = 1mm σc = 10mmMeasurement ri σnablai maxρwi wj

ri σnablai maxρwi wjri σnablai maxρwi wj

y1 0702 1194 0660 0502 1411 0577 0398 1586 0992y2 0582 1311 0326 0533 1369 0412 0501 1413 0470y3 0702 1194 0660 0502 1411 0577 0398 1586 0992y4 0702 1194 0660 0502 1411 0577 0398 1586 0992y5 0582 1311 0326 0533 1369 0412 0501 1413 0470y6 0702 1194 0660 0502 1411 0577 0398 1586 0992y7 0704 1192 0415 0602 1289 0412 0563 1333 0470y8 0704 1192 0415 0602 1289 0412 0563 1333 0470y9 0704 1192 0415 0602 1289 0412 0563 1333 0470y10 0704 1192 0415 0602 1289 0412 0563 1333 0470y11 0583 1309 0326 0583 1309 0385 0583 1309 0433y12 0583 1309 0326 0583 1309 0385 0583 1309 0433y13 0012 0904 0660 0425 1534 0542 0663 12283 0501y14 0012 0904 0660 0425 1534 0542 0663 12283 0501y15 0019 0718 063 05 1414 0542 0665 12268 0501

The probabilities of correct identification (PCI) and correct detection PCD for the measurements(Cluster 1 to Cluster 4) subject to the scenarios of three soft constraints (heights A D and G) aredisplayed in Figure 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

28 of 40

(c) (d)

(e) (f)

(g) (h)

(a) (b)

Figure 10 Probability of correct identification (PCI) and Probability of correct detection (PCD) for themeasurements subject to the scenarios of three soft constraints for αprime = 0001 Cluster 1(ab) Cluster2(cd) Cluster 3(ef) and Cluster 4(gh)

The probabilities of correct identification (PCI) and correct detection (PCD) in Figure 8 werecomputed for the clusters based on Table 14 as follows

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

29 of 40

bull Cluster 1 y1 y3 y4 and y6bull Cluster 2 y2 and y5bull Cluster 3 y7 y8 y9 and y10bull Cluster 4 y11 and y12bull Cluster 5 y13 and y14bull Cluster 6 y15

Figure 11 shows the probabilities of correct identification PCI and correct detection PCD for thethree soft constraints ie for Cluster 5 (heights A and D) and Cluster 6 (height G) in Figure 3(e)

(c) (d)

(a) (b)

Figure 11 Probability of correct identification (PCI) and Probability of correct detection (PCD) for thethree constraints and for αprime = 0001 Cluster 5(ab) and Cluster 6(cd)

The probability of wrong exclusion PWE for the measurements (Cluster 1 to Cluster 4) subject tothe scenarios of three soft constraints (heights A D and G) are displayed in Figure 12 Figure 13 givesthe the probability of wrong exclusion PWE for that three constraints (ie heights A D and G)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

30 of 40

(c) (d)

(a) (b)

Figure 12 Probability of wrong exclusion PWE for the measurements subject to the scenarios of threesoft constraints and for αprime = 0001 Cluster 1(a) Cluster 2(b) Cluster 3(c) and Cluster 4(d)

(a) (b)

Figure 13 Probability of wrong exclusion (PWE) for the three constraints and for αprime = 0001 Cluster5(a) and Cluster 6(b)

The over-identification cases (Pover+ and Poverminus) and the statistical overlap (Pol) were alsopractically null for that case of three soft constraints

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

31 of 40

The sensitivity indicators (MDB and MIB) for each scenario of three soft constraints are displayedin Table 15

Table 15 MDB and MIB for the case of three soft constraints based on αprime = 0001 and PCD = PCI = 08

σc = 10mm σc = 1mm σc = 01mmCluster MDB (σ) MIB (σ) MDB (σ) MIB (σ) MDB (σ) MIB (σ)

1 75 22 68 69 58 592 68 69 66 67 64 643 64 64 63 63 58 584 63 63 63 63 63 635 59 60 74 75 435 456 59 59 69 69 346 355

5 Discussion

We start from the scenario of one hard constraint in Figure 3(a) Table 4 reveal that the maximumcorrelation between w-test statistics for the measurements constituting Cluster 1 is exactly equal to100 (ie maxρwi wj

= 100) This means that the measurements belonging to Cluster 1 are those thatare connected with unknown heights whose connections are limited to only two The both unknownheights A and D are tied only to two measurements (ie y1 and y6 linked to A and y3 and y4 linked toD) therefore if an outlier occurred in one of these measurements we would only be able to analyse theconsistency between them but we would not be able to distinguish which of them was contaminatedby an outlier This means that we would only be able to detect because the w-test statistics could belarger than a critical value k however in that case the values of w-test statistics would be the same andtherefore we would not have only one unique maximum w-test statistics but actually would have fourmaximum w-test statistics In other words the equation systems associated with the measurements ofCluster 1 are linearly dependent [87] Therefore there is no reliability in terms of outlier identificationfor the Cluster 1 as can be seen in Figure 3(a)

From Figure 3(b) we can note that there is reliability in terms of outlier detection for Cluster 1and it is caused by overlapping w-test statistics The probability of statistics overlap Pol for Cluster 1in this scenario of minimally constrained network is displayed in Figure 14

Figure 14 Probability of correct detection PCD and statistical overlap (Pol) for the Cluster 1 subject toone hard constraint and for αprime = 0001

The problem of not having more connections (ie more measurements) for the unknown heightsA and D in the case of one hard constraint with G fixed is overcome when these heights (A and D) aretaken as hard constraints in 3(b) or when the heights A D and G are hard constraints in figure 3(c)Figure 3(a b) show that the measurements of Cluster 1 are now able to identify an outlier when twohard constraints (A and D fixed) are in play It is also verified to the case of three hard constraints (A Dand G fixes) in Figure 3(e f) ie there is reliability in terms of both outlier detection and identificationfor these measurements in those conditions

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

32 of 40

One can argue that the more constraints the larger the probability of correct identification (PCI)and correct detection (PCD) This claim is not entirely correct This may be true in case where all pointson the network have the same number of connections However in the example given if both heightsB and G were selected as hard constraints the measurements of the Cluster 1 (ie y1 y6 y3 and y4)would still have their least-squares residuals fully correlated (ie ρwy1 wy6

= 100 and ρwy3 wy4= 100)

and therefore there would be no reliability in terms of outlier identification for these measurementsFrom Table 5 we observe different behaviour for the clusters as follows

bull Cluster 1 there was no MIB for the case of having only one single hard constraint whereasMDB = MIB for the other cases However both MDB and MIB decrease significantly with theincrease in the number of hard constraints

bull Cluster 2 MDB slightly smaller than MIB Both MDB and MIB were practically the same for thecase of having two or three hard constraints

bull Cluster 3 MDB = MIB for all cases of hard constraints however both MDB and MIB decreasesignificantly with the increase in the number of hard constraints

bull Cluster 4 MDB e MIB were equal for all cases

In terms of outlier detection and identification therefore Cluster 1 was more sensitive toconstraints Cluster 3 relatively sensitive to constraints whereas the Cluster 4 completely insensitiveto constraints and Cluster 2 relatively insensitive to constraints This is also can be seen in Figure 4The reason for this is that the local redundancy (ri) of the Cluster 1 increased with the increase of thenumber of hard constraints whereas the Cluster 4 have remained the same as can be seen in Table 4

Leaving aside the cases of statistical overlap (Pol) the network presents low least-squares residualscorrelation (ρwi wj lt 05) and high local redundancy (ri gt 05) Because of this the probabilities ofwrong exclusion (PWE) were less than 1 as can be seen in Figure 5 The over-identification cases(Pover+ and Poverminus) were practically null Consequently PCI asymp PCD Due of this fact the family-wiseerror rate (αprime) should be increased in order to have more success rate in the outlier detection andidentification [45]

From Figure 15 we observe that increasing the αprime increases the both probabilities of correctidentification (PCI) and correct detection (PCD) for outlier magnitude from 5σ to 6σ in the case ofthree hard constraints and from 5σ to 68σ in the case of two hard constraints Although the rates ofPover+ and PWE also increase they are not significant when compared to the improvement of correctidentification (PCI) and detection (PCD) This same analysis can be done for the other clusters

In terms of soft constraints for the cases of two constraints in Figure 3(d) we observe from Table 9that the larger the relaxation of the constraint (ie the larger the standard-deviation of the constraintσc) the larger the residuals correlation (ρwi wj ) and the standard-deviation of the outlier σnablai and thesmaller the local redundancy (ri) Consequently the probabilities of correct identification (PCI) anddetection (PCD) get smaller and smaller with the relaxation of the constraints whereas the probabilityof wrong exclusion gets larger (PWE) This can be more clearly verified in Figure 6(a b) and Figure8(a) for Cluster 1 whose measurements are connected with the constraints A and D (ie y13 and y14 inTable 9 respectively)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

33 of 40

(a) (b)

(c) (d)

Figure 15 Probabilities of correct identification PCI (a) correct detection PCD (b) over-identificationpositive Pover+ (c) and wrong exclusion PWE for the Cluster 1 subject to two and three hard constraintsand for αprime = 0001 and αprime = 01

Note from Figure 8(a) that the probability of wrong exclusion (PWE) increases as the magnitude ofthe outlier (nablai) increases However this is true up to a certain limit of outlier magnitude The effect ofresiduals correlation ρwi wj on the rates of wrong exclusion (PWE) and correct identification (PCI) tendsto decrease with the increase in the magnitude of the outlier nablai This effect is more clearly verified forCluster 1 in case where the precision of the constraints are ten times worse than the measurementsσc = 10σ = 10mm

Note from Figure 6 that identifying an outlier in the Cluster 1 (ie y1 y3 y4 and y6) whenσc = 10mm is more difficult than the other clusters This is due to the fact the Cluster 1 has a higherresiduals correlation ρwi wj = 0994 than other clusters In general therefore we observe that the largerthe relaxation of the constraints the larger is the effect of the correlation ρwi wj on the success rate ofoutlier identification (PCI) Consequently the higher the sensitivity indicator for outlier identification(MIB) Table 10 reveals that the ratio between MIB and MDB for the Cluster 1 and for the scenariowhere the standard-deviations of that two soft constraints are σc = 10mm is MIBMDB=2575=33On the other hand the relationship between MIB and MDB is practically one (ie MIBMDB=10) forthe others scenarios

If the FWE rate (αprime) were increased for the case where the two soft constraints of σc = 10mmare in play we would not have great advantages for Cluster 1 due to its high residuals correlation(ρwi wj = 994) From Figure 16 we can observe that the probability of correct identification (PCI)for outlier magnitudes from 5σ to 8σ is effectively larger to a user-defined αprime = 01 than αprime = 0001However the success rate is still less than 80 ie PCI lt 08 Note for example the correct

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

34 of 40

identification rate is PCI = 56 for an outlier magnitude of nablai = 8σ and αprime = 01 For αprime = 01 theMIB = 335σ = 335mm whereas for αprime = 0001 is MIB = 25σ = 25mm Therefore in that case theMIB for PCI = 08(80) and αprime = 01 would be 34 larger than user-defined αprime = 0001

Figure 16 Probabilities of correct identification (PCI) for the Cluster 1 subject to two soft constraints (2sc) A and D for αprime = 0001 and αprime = 01

The soft constraints A and D were grouped in the Cluster 5 (ie A and D were treated aspseudo-observations in the model) There is no reliability in terms of outlier identification for theconstraints because the residual correlation between them is ρwi wj = 100 as can be seen in Table 9for y13 and y14 However these soft constraints are able to detect an outlier In that case the probabilityof correct detection PCD in Figure 7 is mainly caused by the statistical overlap Pol as can be seen in theexample of the case of σc = 10mm in Figure 17 From Table 10 we observe that the larger the relaxationof the constraints the larger the MDB Note that the values of MDB are given in σ and therefore theMDB for σc = 10mm is larger than σc = 1mm and σc = 01mm ie we had the following inequalityMDB = 68σc = 68times 10mm = 68mm gt MDB = 88σc = 88times 1mm = 88mm gt MDB = 57σc =

57times 01mm = 57mm In that case if the FWE (αprime) were increased the rate of outlier detection by theCluster 4 (ie by the soft constraints) would increase

Figure 17 Probabilities of correct detection PCD and statistical overlap Pol for the two soft constraintsA and D (Cluster 5) with σc = 10mm and for αprime = 0001

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

35 of 40

Similar effects of the relaxation of the constraints on the performance of the IDS in case of two softconstraints are verified in case of three soft constraints as can be seen in the Figures 10 11 12 and 13

In case of having three soft constraints in Figure 3(e) there is reliability in terms of outlieridentification for the three pseudo-observations y13 y14 and y15 (ie for AD and G) as can be seen inFigure 10 and Table 15 In that case we also observe that the probabilities of correct detection PCD ofthe soft constraints A and D (ie Cluster 5) were approximately 13 for σc = 10mm 16 for σc = 1mmand 24 for σc = 01mm larger than the scenario of the network subject to two soft constraints Table15 reveals that the advantage of having three soft constraints instead of two constraints is that theconstraints become identifiable in the presence of an outlier The behaviour of the probabilities ofcorrect detection (PCD) correct identification (PCI) and wrong exclusion (PWE) was similar to the caseof the two soft constraints

Furthermore the larger the relaxation of the constraints the smaller the residuals correlationbetween the measurements and the soft constraints and the larger the residuals correlation amongthe measurements This effect can be verified in Tables 6 7 and 8 for two soft constraints as well as inTables 11 12 and 13 for three soft constraints

In general we observe that the case of two soft constraints for σc = 01mm was comparablewith two hard constraints (see eg Table 5 and Table 10) in terms of the probability levels associatedwith IDS for the measurements (ie clusters 1 2 3 and 4) In the same way for the case of twosoft constraints with σc = 1mm or σc = 10mm the probabilities levels were similar to the one hardconstraint for that measurements with the benefit of two soft constraints having reliability in terms ofoutlier identification for the Cluster 1 Finally the three soft constraints with σc = 1mm and σc = 10mmwere comparable to the two soft constraints for that scenario of constraints relaxation wheres thethree soft constraints for σc = 01mm showed similar outcomes with three hard constraints for themeasurements (see eg Table 5 and Table 15) In that case however an advantage of the three softconstraints on the three hard constraints is the possibility of analysing the sensitivity of the constraintsIt is important to emphasised that the stochastic models of the measurements and constraints wereassumed well-known and defined for the analyzes performed here

6 Conclusions

In general the probability of correct identification (PCI) for the case of hard constraints is largerthan soft constraints It can be verified in Figure 4 Figure 6 and Figure 10 Therefore hard constraintsshould be used in the stage of pre-processing data for the purpose of identifying and removing possibleoutlying measurements In that process one should opt to set out the redundant hard constraints atpoints in the network where the smallest connections exist After identifying and removing possibleoutliers the soft constraints should be employed to propagate the uncertainties of the constraints(pseudobservations) to the model parameters during the process of least-squares estimation Thisrecommendation is valid for outlier detection and identification purpose

We highlight the main findings of this research as follows

bull Under a system of a high local redundancy ri gt 05 and low residuals correlation (ρwi wj lt 05) ifone increase the family-wise error rate (FWE) of the test statistic the performance of the procedurewill be improved for both scenarios of hard constraints and soft constraintsbull The larger the relaxation of the constraints the larger is the effect of the residuals correlation

(ρwi wj ) on the success rate of outlier identification (PCI) Consequently the higher the sensitivityindicator for outlier identification (MIB) and therefore more difficult it becomes to identify anoutlier

bull Under a scenario of soft constraints one should set out at least three soft constraints in order toidentify an outlier in the constraints

In other types of analysis for example deformation analysis of geodetic networks one shouldformulated the constraints as pseudo-observations with σc gt 0 in an adjustment model of system

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

36 of 40

equations In other words the relaxation of constraints makes it possible to estimate deformationeffects that are unmodelled in the deterministic model matrix for example deformations which have aspatial and temporal variations In that case the deformation can be modelled by covariance functionsthat determine the cofactors between constraints [28] This should be applied to control points locatedin the structure but not to control points located outside the structure The control points locatedoutside the structure should be hard constraints and have their stability verified by an independentprocess in order to analyse the deformation of the structure So one will be also able to analyse thesensitivity in terms of minimal detectable deformations andor minimal identifiable deformations [23]Future researches will be addressed to sensitivity analysis for deformation monitoring networks

Author Contributions Conceptualization VFR MTM and IK Data curation VFR MTM and IK Formalanalysis VFR and IK Funding acquisition MTM MRV and LGdSJ Investigation VFR MethodologyVFR and MTM Project administration MTM IK MRV and LGdSJ Software VFR and LGdSJSupervision MTM IK and MRV Validation VFR and MRV Writing ndash original draft VFR Writing ndashreview amp editing VFR MTM IK and MRV All authors have read and agreed to the published version of themanuscript

Funding This research was funded by the CNPqmdashConselho Nacional de Desenvolvimento Cientiacutefico eTecnoloacutegicomdashBrasil (proc no 1035872019-5) This research and the APC were also funded by PETROBRAS(Grant Number 201800545-0)

Acknowledgments In this section you can acknowledge any support given which is not covered by the authorcontribution or funding sections This may include administrative and technical support or donations in kind(eg materials used for experiments)

Conflicts of Interest The authors declare no conflict of interest

References

1 Lehmann R Neitzel F Testing the compatibility of constraints for parameters of a geodetic adjustmentmodel Journal of Geodesy 2013 87 555ndash566 doi101007s00190-013-0627-2

2 Fang X Weighted total least-squares with constraints a universal formula for geodetic symmetricaltransformations Journal of Geodesy 2015 89 459ndash469 doi101007s00190-015-0790-8

3 Amiri-Simkooei AR Weighted Total Least Squares with Singular Covariance Matrices Subject to Weightedand Hard Constraints J Surv Eng 2017 143 04017018

4 Sakic P Ballu V Royer JY A Multi-Observation Least-Squares Inversion for GNSS-Acoustic SeafloorPositioning Remote Sensing 2020 12 doi103390rs12030448

5 Odijk D Stochastic modelling of the ionosphere for fast GPS ambiguity resolution Geodesy Beyond 2000Schwarz KP Ed Springer Berlin Heidelberg Berlin Heidelberg 2000 pp 387ndash392

6 de Oliveira PS Morel L Fund F Legros R Monico JFG Durand S Durand F Modelingtropospheric wet delays with dense and sparse network configurations for PPP-RTK GPS Solutions 201721 237ndash250 doi101007s10291-016-0518-0

7 Chen B Wu L Dai W Luo X Xu Y A new parameterized approach for ionospheric tomography GPSSolutions 2019 23 96 doi101007s10291-019-0893-4

8 Ahmed E Ahmed S Mohamed D Mostafa R jag 2019 Vol 13 chapter Vertical ionospheric delayestimation for single-receiver operation p 81 2 doi101515jag-2018-0041

9 Wang A Chen J Zhang Y Meng L Wang J Performance of Selected Ionospheric Models inMulti-Global Navigation Satellite System Single-Frequency Positioning over China Remote Sensing 201911 doi103390rs11172070

10 Deo M El-Mowafy A Comparison of advanced troposphere models for aiding reductionof PPP convergence time in Australia Journal of Spatial Science 2019 64 381ndash403doi1010801449859620181472046

11 Dai X Lou Y Dai Z Hu C Peng Y Qiao J Shi C Precise Orbit Determination for GNSS ManeuveringSatellite with the Constraint of a Predicted Clock Remote Sensing 2019 11 doi103390rs11161949

12 Courant R Hilbert D Methods of mathematical physics Vol 2 Vol 2 Wiley [New York] 196213 Grafarend EW Optimization of Geodetic Networks The Canadian Surveyor 1974 28 716ndash723

doi101139tcs-1974-0120

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

37 of 40

14 Teunissen P Zero Order Design Generalized Inverses Adjustment the Datum Problem andS-Transformations Optimization and Design of Geodetic Networks Grafarend EW Sansograve F EdsSpringer Berlin Heidelberg Berlin Heidelberg 1985 pp 11ndash55

15 Dermanis A Free network solutions with the DLT method ISPRS Journal of Photogrammetry and RemoteSensing 1994 49 2 ndash 12 doihttpsdoiorg1010160924-2716(94)90061-2

16 Kotsakis C Reference frame stability and nonlinear distortion in minimum-constrained networkadjustment Journal of Geodesy 2012 86 755ndash774 doi101007s00190-012-0555-6

17 Kotsakis C Datum Definition and Minimal Constraints In Encyclopedia of Geodesy Grafarend E EdSpringer International Publishing Cham 2018 pp 1ndash6 doi101007978-3-319-02370-0_157-1

18 Matsuoka MT Rofatto VF Klein I Roberto Veronez M da Silveira LG Neto JBS Alves ACRControl Points Selection Based on Maximum External Reliability for Designing Geodetic Networks AppliedSciences 2020 10 doi103390app10020687

19 Baarda W S-transformations and criterion matrices Publ on geodesy New Series 1973 520 Schaffrin B Aspects of Network Design Optimization and Design of Geodetic Networks Grafarend

EW Sansograve F Eds Springer Berlin Heidelberg Berlin Heidelberg 1985 pp 548ndash59721 Xu PB A general solution in geodetic nonlinear rank-defect models 199722 Altamimi Z Dermanis A The Choice of Reference System in ITRF Formulation VII Hotine-Marussi

Symposium on Mathematical Geodesy Sneeuw N Novaacutek P Crespi M Sansograve F Eds Springer BerlinHeidelberg Berlin Heidelberg 2012 pp 329ndash334

23 Velsink H On the deformation analysis of point fields J Geod 2015 89 1071ndash1087doi101007s00190-015-0835-z

24 Velsink H Extendable linearised adjustment model for deformation analysis Survey Review 201547 397ndash410 doi1011791752270614Y0000000140

25 Hiddo V jag 2016 Vol 10 chapter Time Series Analysis of 3D Coordinates Using NonstochasticObservations p 5 1 doi101515jag-2015-0027

26 Rao CR Markoffrsquos Theorem with Linear Restrictions on Parameters SankhyAuml The Indian Journal ofStatistics (1933-1960) 1945 7 16ndash19

27 Velsink H Testing Methods for Adjustment Models with Constraints Journal of Surveying Engineering2018 144 04018009 doi101061(ASCE)SU1943-54280000260

28 Hiddo V Testing deformation hypotheses by constraints on a time series of geodetic observations J ApplGeod 2017 12 77ndash93 doi101515jag-2017-0028

29 Baarda W A testing procedure for use in geodetic networks Publ on geodesy New Series 1968 230 Teunissen P First and second moments of non-linear least-squares estimators Bull Geodesique (Journal of

Geodesy) 1989 63 253ndash262 doihttpsdoiorg101007BF0252047531 Teunissen P Testing Theory an introduction 2 ed Delft University Press 200632 Amiri-Simkooei A Least-squares variance component estimation theory and GPS applications Phd

thesis Delft University of Technology 200733 Koch KR Parameter estimation and hypothesis testing in linear models 2 ed Springer 199934 Tao H Feng H Xu L Miao M Long H Yue J Li Z Yang G Yang X Fan L Estimation

of Crop Growth Parameters Using UAV-Based Hyperspectral Remote Sensing Data Sensors 2020 20doi103390s20051296

35 Han M Wang Q Wen Y He M He X The Application of Robust Least Squares Method inFrequency Lock Loop Fusion for Global Navigation Satellite System Receivers Sensors 2020 20doi103390s20041224

36 Wei C Estimation for the Discretely Observed CoxacircldquoIngersollacircldquoRoss Model Driven by Small SymmetricalStable Noises Symmetry 2020 12 doi103390sym12030327

37 Sakic P Ballu V Royer JY A Multi-Observation Least-Squares Inversion for GNSS-Acoustic SeafloorPositioning Remote Sensing 2020 12 doi103390rs12030448

38 Livadiotis G General Fitting Methods Based on Lq Norms and their Optimization Stats 2020 3 16ndash31doi103390stats3010002

39 Farooq SZ Yang D Ada ENJ A Cycle Slip Detection Framework for Reliable Single Frequency RTKPositioning Sensors 2020 20 doi103390s20010304

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

38 of 40

40 Araveeporn A Comparing Parameter Estimation of Random Coefficient Autoregressive Model byFrequentist Method Mathematics 2020 8 doi103390math8010062

41 Zhang C Peng T Zhou J Ji J Wang X An Improved Autoencoder and Partial Least SquaresRegression-Based Extreme Learning Machine Model for Pump Turbine Characteristics Applied Sciences2019 9 doi103390app9193987

42 Ji J Yang M Jiang L He J Teng Z Liu Y Song H Output-Only Parameters Identification ofEarthquake-Excited Building Structures with Least Squares and Input Modification Process AppliedSciences 2019 9 doi103390app9040696

43 Kargoll B On the theory and application of model misspecification tests in geodesy Doctoral thesisUniversity of Bonn Landwirtschaftliche Fakultaumlt German Bonn 2007

44 Lehmann R On the formulation of the alternative hypothesis for geodetic outlier detection J Geod 201387 373ndash386 doi101007s00190-012-0607-y

45 Rofatto VF Matsuoka MT Klein I Roberto Veronez M da Silveira LG A Monte Carlo-Based OutlierDiagnosis Method for Sensitivity Analysis Remote Sensing 2020 12 doi103390rs12050860

46 Lehmann R Improved critical values for extreme normalized and studentized residuals in GaussndashMarkovmodels J Geod 2012 86 1137ndash1146 doi101007s00190-012-0569-0

47 Rofatto VF Matsuoka MT Klein I Veronez MR Bonimani ML Lehmann R A half-century ofBaardarsquos concept of reliability a review new perspectives and applications Surv Rev 2018 0 1ndash17[httpsdoiorg1010800039626520181548118] doi1010800039626520181548118

48 Zaminpardaz S Teunissen P DIA-datasnooping and identifiability J Geod 2019 93 85ndash101doi101007s00190-018-1141-3

49 Kok JJ States U On data snooping and multiple outlier testing [microform] Johan J Kok US Dept ofCommerce National Oceanic and Atmospheric Administration National Ocean Service Charting andGeodetic Services For sale by the National Geodetic Information Center NOAA Rockville Md 1984 p61 p

50 Knight NL Wang J Rizos C Generalised measures of reliability for multiple outliers Journal of Geodesy2010 84 625ndash635 doi101007s00190-010-0392-4

51 Gui Q Li X Gong Y Li B Li G A Bayesian unmasking method for locating multiplegross errors based on posterior probabilities of classification variables J Geod 2011 85 191ndash203doi101007s00190-010-0429-8

52 Klein I Matsuoka MT Guzatto MP Nievinski FG An approach to identify multipleoutliers based on sequential likelihood ratio tests Surv Rev 2017 49 449ndash457[httpsdoiorg1010800039626520161212970] doi1010800039626520161212970

53 Hawkins DM Identification of Outliers 1 ed Springer Netherlands 1980 httpsdoiorg101007978-94-015-3994-4

54 Baarda W Statistical concepts in geodesy Publ on geodesy New Series 1967 255 Imparato D Teunissen P Tiberius C Minimal Detectable and Identifiable Biases for quality

control Surv Rev 2019 51 289ndash299 [httpsdoiorg1010800039626520181437947]doi1010800039626520181437947

56 Teunissen PJG Distributional theory for the DIA method Journal of Geodesy 2018 92 59ndash80doi101007s00190-017-1045-7

57 Foumlrstner W Reliability and discernability of extended Gauss-Markov models Seminar on mathematicalmodels of GeodeticPhotogrammetric Point Determination with Regard to Outliers and Systematic Errors 1983 Vol Series A pp 79ndash103

58 Wang J Knight NL New Outlier Separability Test and Its Application in GNSS Positioning 201259 Lehmann R Loumlsler M Multiple Outlier Detection Hypothesis Tests versus Model Selection by

Information Criteria J Surv Eng 2016 142 0401601760 Teunissen PJG Imparato D Tiberius CCJM Does RAIM with Correct Exclusion Produce Unbiased

Positions Sensors 2017 17 doi103390s1707150861 Lehmann R Loumlsler M Congruence analysis of geodetic networks ndash hypothesis tests versus model

selection by information criteria J Appl Geod 2017 11 271ndash283 doi101515jag-2016-0049

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

39 of 40

62 Rofatto V Matsuoka M Klein I DESIGN OF GEODETIC NETWORKS BASED ON OUTLIERIDENTIFICATION CRITERIA AN EXAMPLE APPLIED TO THE LEVELING NETWORK Bull Geod Sci2018 24 152ndash170 httpdxdoiorg101590s1982-21702018000200011

63 Nguyen VK Renault Milocco R Environment Monitoring for Anomaly Detection System UsingSmartphones Sensors 2019 19 doi103390s19183834

64 Nie Y Yang L Shen Y Specific Direction-Based Outlier Detection Approach for GNSS Vector NetworksSensors 2019 19 doi103390s19081836

65 Klein I Matsuoka MT Guzatto MP Nievinski FG Veronez MR Rofatto VF A newrelationship between the quality criteria for geodetic networks Journal of Geodesy 2019 93 529ndash544doi101007s00190-018-1181-8

66 Mierlo JV Statistical Analysis of Geodetic Measurements for the Investigation of Crustal Movements InRecent Crustal Movements 1977 Whitten C Green R Meade B Eds Elsevier 1979 Vol 13 Developmentsin Geotectonics pp 457 ndash 467 doihttpsdoiorg101016B978-0-444-41783-150072-6

67 van der Marel H Roumlsters AJM Statistical Testing and Quality Analysis in 3-D Networks (part II)Application to GPS Global Positioning System An Overview Bock Y Leppard N Eds Springer NewYork New York NY 1990 pp 290ndash297

68 Yang L Wang J Knight NL Shen Y Outlier separability analysis with a multiple alternative hypothesestest J Geod 2013 87 591ndash604 doi101007s00190-013-0629-0

69 Proacuteszynski W Revisiting Baardarsquos concept of minimal detectable bias with regard to outlier identifiabilityJ Geod 2015 89 993ndash1003 doi101007s00190-015-0828-y

70 Romano JP Wolf M Multiple Testing of One-Sided Hypotheses Combining Bonferroni and the BootstrapPredictive Econometrics and Big Data Kreinovich V Sriboonchitta S Chakpitak N Eds SpringerInternational Publishing Cham 2018 pp 78ndash94

71 Teunissen P An Integrity and Quality Control Procedure for use in Multi Sensor Integration 199072 Aydin C Demirel H Computation of Baardarsquos lower bound of the non-centrality parameter J Geod

2004 78 437ndash441 doi101007s00190-004-0406-173 Aydin C Power of Global Test in Deformation Analysis J Surv Eng 2012 138 51ndash56 https

doiorg101061(ASCE)SU1943-5428000006474 Bonferroni C Teoria statistica delle classi e calcolo delle probabilitagrave Pubblicazioni del R Istituto superiore di

scienze economiche e commerciali di Firenze Libreria internazionale Seeber 193675 Box GEP Muller ME A Note on the Generation of Random Normal Deviates The Annals of Mathematical

Statistics 1958 29 610ndash611 httpsdoi101214aoms117770664576 Lemeshko BY Lemeshko SB Extending the Application of Grubbs-Type Tests in Rejecting Anomalous

Measurements Measurement Techniques 2005 48 536ndash547 doi101007s11018-005-0179-977 Lehmann R Scheffler T Monte Carlo based data snooping with application to a geodetic network J

Appl Geod 2011 5 123ndash134 httpsdoiorg101515JAG201101478 Matsumoto M Nishimura T Mersenne twister A 623-dimensionally equidistributed uniform

pseudo-random number generator ACM Transactions on Modeling and Computer Simulation 1998 8 3ndash30httpsdlacmorgcitationcfmid=272991

79 Ihde J Saacutenchez L Barzaghi R Drewes H Foerste C Gruber T Liebsch G Marti U Pail R SiderisM Definition and Proposed Realization of the International Height Reference System (IHRS) Surveys inGeophysics 2017 38 549ndash570 doi101007s10712-017-9409-3

80 Lehmann R Observation error model selection by information criteria vs normality testing StudiaGeophysica et Geodaetica 2015 59 489ndash504 doi101007s11200-015-0725-0

81 Algarni DA Ali AE Heighting and Distance Accuracy with Electronic Digital Levels Journal of King SaudUniversity - Engineering Sciences 1998 10 229 ndash 239 doihttpsdoiorg101016S1018-3639(18)30698-6

82 Gemin ARS Matos AS Faggion PLAs APPLICATION OF CALIBRATION CERTIFICATE OFDIGITAL LEVELING SYSTEMS IN THE MONITORING OF STRUCTURES A CASE STUDY AT THEGOVERNADOR JOSAtilde RICHA HYDROELECTRIC POWER PLANT - PR Boletim de CiAtildeGeodAtilde 201824 235 ndash 249

83 Takalo M Rouhiainen P Development of a System Calibration Comparator for Digital Levels in FinlandNordic Journal of Surveying and Real Estate Research 1 1

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

40 of 40

84 Wiedemann W Wagner A Wunderlich T Using IATS to Read and Analyze Digital Levelling Staffs SIG2016 Paar R Marendic A Zrinjski M Eds Croatian Geodetic Society 2016 pp 515ndash526

85 PJG T The Geometry of Geodetic Inverse Linear Mapping and Non-linear Adjustment Publications onGeodesy New Series 1985 8

86 Rao CR Mitra SK Generalized inverse of a matrix and its applications 197287 Hekimoglu S Erenoglu RC Sanli DU Erdogan B Detecting Configuration Weaknesses in Geodetic

Networks Survey Review 2011 43 713ndash730 doi101179003962611X13117748892632

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

Page 8: Effects of Hard and Soft Equality Constraints on

8 of 40

The other side of the multiple testing problem is the situation in which there is an outlier in thedataset In that case apart from Type I and Type II errors there is a third type of wrong decisionassociated with Baardarsquos w-test Baardarsquos w-test can also flag a non-outlying observation whilethe lsquotruersquo outlier remains in the dataset We are referring to the Type III error [53] also referred toas the probability of wrong identification (PWI) The description of the Type III error involves aseparability analysis between alternative hypotheses [57666869] Therefore we are now interested inthe identification of the correct alternative hypothesis In that case the non-centrality parameter inEquation (13) is not only related to the sizes of Type I and Type II decision errors but also dependenton the correlation coefficient ρwi wj given by Equation (20)

On the basis of the assumption that one outlier is in the ith position of the dataset (ie H(i)A is

rsquotruersquo) the probability of a Type II error (also referenced as the probability of ldquomissed detectionrdquo denotedby PMD) for multiple testing is

PMD = P(⋂n

i=1|wi| le k

∣∣∣ H(i)A true

) (21)

and the size of a Type III wrong decision (also called ldquomisidentificationrdquo denoted by PWI) is given by

PWI =n

sumi=1P(|wj| gt |wi| foralli |wj| gt k(i 6= j)

∣∣∣ H(i)A true

)(22)

On the other hand the probability of correct identification (denoted by PCI) is

PCI = P(|wi| gt |wj| forallj |wi| gt k(i 6= j)

∣∣∣ H(i)A true

)(23)

with1minusPCI = 1minusPCD + PWI = PMD + PWI (24)

Note that the three probabilities of missed detection PMD wrong identification PWI and correctidentification PCI sum up to unity ie PMD + PWI + PCI = 1

The probability of correct detection PCD is the sum of the probability of correct identification PCI(selecting a correct alternative hypothesis) and the probability of misidentification PWI (selecting oneof the n-1 other hypotheses) ie

PCD = PCI + PWI (25)

The probability of wrong identification PWI is identically zero PWI = 0 when the correlationcoefficient is exactly zero ρwi wj = 0 In that case we have

PCD = PCI = 1minusPMD (26)

The relationship given in Equation (26) would only happen if one neglected the nature of thedependence between alternative hypotheses In other words this relationship is valid for the specialcase of testing the null hypothesisH0 against only one single alternative hypothesisH(i)

A Since the critical region in multiple hypothesis testing is larger than that in single hypothesis

testing the Type II decision error (ie PMD) for the multiple test becomes smaller [47] This meansthat the correct detection in binary hypothesis testing (γ0) is smaller than the correct detection PCDunder multiple hypothesis testing ie

PCD gt γ0 (27)

Detection is easier in the case of multiple hypothesis testing than single hypothesis testingHowever the probability of correct detection PCD under multiple testing is spread out over allalternative hypotheses and therefore identifying is harder than detecting From Equation (25) it is

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

9 of 40

also noted that detection does not depend on identification However outlier identification dependson correct outlier detection Therefore we have the following inequality

PCI le PCD (28)

Note that the probability of correct identification PCI depends on the probability of misseddetection PMD and wrong identification PWI for the case in which data snooping is run only once iea single round of estimation and testing However in this paper we deal with data snooping in itsiterative form (ie IDS) and therefore the probability of correct identification PCI depends on otherdecision rules

In contrast to the data snooping single run the success rate of correct detection PCD forIDS depends on the sum of the probabilities of correct identification PCI wrong exclusion (PWE)over-identification cases (Pover+ and Poverminus) and statistical overlap (Pol) ie

PCD = 1minusPMD = PCI + PWE + Pover+ + Poverminus + Pol (29)

It is important to mention that the probability of correct detection is the complement of theprobability of missed detection Note from Equation (29) that the probability of correct detection PCDis available even for cases in which the identification rate is null PCI = 0 However the probabilityof correct identification (PCI) necessarily requires that the probability of correct detection PCD begreater than zero For the same reasons given for the data snooping single run in the previous sectiondetecting is easier than identifying In that case we have the following relationship for the success rateof correct outlier identification PCI

PCI = PCD minus (PWE + Pover+ + Poverminus + Pol) (30)

such asexist(PCI) isin [0 1] lArrrArr (PCD) gt 0 (31)

It is important to mention that the wrong exclusion PWE describes the probability of identifyingand removing a non-outlying measurement while the lsquotruersquo outlier remains in the dataset In otherwords PWE is the Type III decision error for IDS) The overall wrong exclusion PWE is the result of thesum of each individual contribution to PWE ie

PWE =nminus1

sumi=1PWE(i) (32)

On the basis of the probability levels of correct detection PCD and correct identification PCI the sensitivity indicators of minimal biasesmdashMinimal Detectable Bias (MDB) and Minimal IdentifiableBias (MIB)mdashfor a given αprime can be computed as follows

MDB = arg minnablaiPCD(nablai) gt PCD foralli = 1 middot middot middot n (33)

MIB = arg minnablaiPCI(nablai) gt PCI foralli = 1 middot middot middot n (34)

Equation (33) gives the smallest outlier nablai that leads to its detection for a user-defined correctdetection rate PCD whereas (34) provides the smallest outlier nablai that leads to its identification for auser-defined correct identification rate PCI

As a consequence of the inequality in (28) the MIB will be larger than MDB ie MIB ge MDBFor the special case of having only one single alternative hypothesis there is no difference between theMDB and MIB [55] The computation of MDB0 is easily performed by Equations (14) or (16) whereasthe computation of the MDB in Equation (33) and the MIB in Equation (34) must be computed using

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

10 of 40

Monte Carlo because the acceptance region (as well as the critical region) for the case of multiplealternative hypotheses is analytically intractable

3 Material and Methods

Here we use the procedure provided by Rofatto et al [45] to compute the probability levelsassociated with IDS as well as to estimate the minimal biases mdash Minimal Detectable Bias(MDB) andMinimal Identifiable Bias (MIB) The procedure is summarised in Figure 1

The procedure to compute the critical value of max-w (k) is given step-by-step as follows

1 Specify the probability density function (pdf) of the w-test statistics The pdf assigned to thew-test statistics under anH0-distribution is

(w1 w2 w3 middot middot middot wn)T sim N (0Rw) (35)

where Rw isin Rntimesn is the correlation matrix with the main diagonal elements equal to 1 and theoff-diagonal elements are the correlation between the w-test statistics computed by Equation (20)

2 In order to have w-test statistics under H0 uniformly distributed random number sequencesare produced by the Mersenne Twister algorithm and then they are transformed into a normaldistribution by using the BoxndashMuller transformation [75] BoxndashMuller has already been usedin geodesy for Monte Carlo experiments [467677] Therefore a sequence of m random vectorsfrom the pdf assigned to the w-test statistics is generated according to Equation (35) In that casewe have a sequence of m vectors of the w-test statistics as follows[

(w1 w2 w3 middot middot middot wn)T(1)

(w1 w2 w3 middot middot middot wn)T(2)

middot middot middot (w1 w2 w3 middot middot middot wn)T(m)

](36)

3 Compute the test statistic by Equation (10) for each sequence of w-test statistics Thus we have(max

iisin1middotmiddotmiddot n|wi|(1) max

iisin1middotmiddotmiddot n|wi|(2) middot middot middot max

iisin1middotmiddotmiddot n|wi|(m)

)(37)

4 Sort in ascending order the maximum test statistic in Equation (37) getting a sorted vector wsuch that

w(1) lt w(2) w(3) middot middot middot lt w(m) (38)

The sorted values w in Equation (38) provide a discrete representation of the cumulative densityfunction (cdf) of the maximum test statistic max-w

5 Determine the critical value k as follows

k = w[(1minusαprime)timesm] (39)

where [] denotes rounding down to the next integer that indicates the position of the selectedelements in the ascending order of w This position corresponds to a critical value for a stipulatedoverall false alarm probability αprime This can be done for a sequence of values αprime in parallel

It is important to mention that the probability of a Type I decision error for multiple testing αprime islarger than that of Type I for single testing α0 This is because the critical region in multiple testing islarger than that in single hypothesis testing

After finding the critical value k the procedure based on Monte Carlo is also applied to computethe probability levels of IDS when there is an outlier in the dataset The overview of of the mainelements involved with the IDS was detailed in Section 2 The flowchart is displayed in Figure 1

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

11 of 40

Inp

ut

Functional amp Stochastic Models Significance level (αrsquo) Outlier magnitude range (nablai) Probability of correct detection ( ) Probability of correct identification ( )

CD CI

Gen

erat

e

ran

do

m d

raw

s

Generate a sample of random errors

119942~119925(120782 Q119942)

Compute the total error ε = e + cinablai

Compute the least-squares residual vector e = Rε

Iterativ

e ou

tlier elimin

ation

pro

cess

Compute the maximum w-test statistic

max wi

max wi gt k

End

Identify and remove the measurement from the dataset

Size of max wi gt 1

det ATWA gt 0

Compute w-test statistics

Compute and store nCI nMD nWE nover+ nover- and nol

Co

mp

ute th

e pro

bab

ilities

Number of Monte Carlo experiments (m) = 200000

Compute the probabilities CI MD WE over+ over- and ol

Sensitivity indicators Outlier detection (MDB)

amp Outlier identification (MIB)

YES NO

YES

NO

YES NO

YES

NO

Figure 1 Flowchart of the algorithm to compute the probability levels of Iterative Data Snooping (IDS)for each measurement in the presence of an outlier [45]

The steps displayed in Figure 1 are detailed as follows

1 First random error vectors are synthetically generated on the basis of a multivariate normaldistribution because the assumed stochastic model for random errors is based on the matrixcovariance of the observations Here we use the Mersenne Twister algorithm [78] to generate asequence of random numbers and BoxndashMuller [75] to transform it into a normal distribution

2 The total error (ε) is a combination of random errors and its corresponding outlier is given asfollows

ε = e + cinablai (40)

The magnitude intervals of simulated outliers are user-defined The magnitude intervals arebased on the standard deviation of the observation (σ) eg |3σ| to |6σ| Since the outlier can bepositive or negative the proposed algorithm randomly selects the signal of the outlier (for q = 1)Here we use the discrete uniform distribution to select the signal of the outlier

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

12 of 40

In Equation (40) e is the random error generated from the normal distribution according toEquation (2) and the second part cinablai is the additional parameter that describes the alternativemodel according to Equation (40)

3 Next we compute the least-squares residuals vector according to Equation (3) but now we usethe total error (ε) in Equation 40 as follows

e = Rε (41)

4 Run the IDS In this step the test statistic is computed according to (9) Then the maximumtest statistic value is obtained according to Equation (10) Now the decision rule is based on thecritical value k computed by Monte Carlo in previous stage After identifying the measurementsuspected to be the most likely outlier it is excluded from the model and least-squares estimationand data snooping are applied iteratively until there are no further outliers identified in thedataset If two or more observations are simultaneously detected (ie if maxminusw gt k and the sizeof maxminusw gt 1 then the IDS is ended) Furthermore every time that a measurement suspectedto be the most likely outlier is removed from the model we check whether the normal matrixATW A is invertible or not If the determinant of ATW A is 0 det|ATW A| = 0 then there isa necessary and sufficient condition for a square matrix ATW A to be non-invertible In otherwords the IDS is ended when det|ATW A| = 0 If no outlier is detected (ie maxminusw lt k) thenthe IDS is also ended

The IDS procedure is performed for m experiments of random error vectors for each experimentcontaminated by an outlier in the ith measurement Therefore for each measurementcontaminated by an outlier there are υ = 1 m experiments

5 After running m=200000 experiments [47] the probabilities associated with IDS are computed(note m refers to the total number of Monte Carlo experiments) The probability of correctidentification (PCI) is the ratio between the number of times that the outlier is correctly identified(denoted as nCI) and m experiments ie

PCI =nCIm

(42)

Similar to Equation (42) the wrong decisions are computed as

PMD =nMD

m(43)

where nMD is the number of experiments in which IDS does not detect the outlier (PMDcorresponds to the rate of missed detection)

PWE =nWE

m(44)

where nWE is the number of experiments in which the IDS procedure flags and removes onlyone single non-outlying measurement while the lsquotruersquo outlier remains in the dataset (PWE is thewrong exclusion rate)

Pover+ =nover+

m(45)

where nover+ is the number of experiments in which IDS correctly identifies and removes theoutlying measurement and others and Pover+ corresponds to its probability

Poverminus =noverminus

m(46)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

13 of 40

where noverminus represents the number of experiments in which IDS identifies and removes morethan one non-outlying measurement whereas the lsquotrue outlierrsquo remains in the dataset (Poverminus isthe probability corresponding to this error probability class)

Pol =nolm

(47)

where nol is the number of experiments in which the detector in Equation (10) flags two (or more)measurements simultaneously during a given iteration of IDS Here this is referred to as thenumber of statistical overlap nol and Pol corresponds to its probability

6 Finally the sensitivity indicators (MDB and MIB) are computed based on Equation 33 andEquation 34 respectively

In this paper the probability levels associated with IDS were computed for each observationindividually and for each outlier magnitude However they were grouped into clusters based onnumber of local redundancy (ri) and maximum absolute correlation between the w-test statistics(ρwi wj ) Furthermore we take care to control the family-wise error rate

4 Case study of Levelling Geodetic Network

The analyse of the constraints effects on the probability levels of IDS are performed by an exampleof a levelling geodetic network

A levelling geodetic network is a set of points located on the Earthrsquos surface or near it andinterconnected by height difference measurements Some of these points are associated with a verticalheight reference system (ie some points have known height) whereas others are parameters (unknownheights) The term known here means that those points with known heights are constraints necessaryto ensure that parameters (unknown heights) are sufficiently estimable In the sense of modern geodeticreference systems realisation and unification of a vertical height system consists of a combination ofGNSS (Global Navigation Satellite System) and spirit levelling with geoid models We will not gointo the details about height geodetic reference system and frame (Readers who wish to have furtherdetails on that issue please see eg [79])

The mathematical model associated with a geodetic levelling network is linear (ie the levellingmeasurements bear a known linear relationship with the unknown heights) Geodetic networksusually has more measurements than parameters (ie n gt u) ie we have a redundant measurementsystem However due to intrinsic random errors in a system redundant measurements often leadto an inconsistent system of equations To make the system consistent we have to introduce theinformation about the random measurement errors The stochastical properties of the measurementerrors are directly associated with the assumption of the probability distribution of these errors Ingeodesy and many other scientific branches the well-known normal distribution is one of the mostused as measurement error model [80] Because of this the model ceases to be purely mathematicaland becomes a statistical model with functional and stochastic part

In the absence of outliers ie under null hypothesisH0 in Equation (1) the model for levellinggeodetic network can be written as follows

∆himinusj + e∆himinusj= hj minus hi (48)

where ∆himinusj is the height difference measured from point i to j and e∆himinusjis the random error associated

with the levelling measurement For a only one single levelling line one of these points is the constraint(known height) from which the height of another point is determined The point with known height isalso referred to as control point or benchmark In a geodetic network on the other hand we have a setof levelling lines that connect both from points of unknown height and from points of known height(constraints) Under normal working conditions (ie H0) the measurement errors model is then givenby Equation (2)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

14 of 40

In this case study we demonstrate the application of the proposed algorithm by [45] subject to adifferent constraints scenarios for that levelling geodetic network

Recent work has been focused on non-stochastic constraints (hard constraints) Here we alsoconsider cases where constraints are subject to a random errors (ie soft constraints) That softconstraints are non-deterministic and therefore they are measured model elements that can bedesignated as a priori information The approach of this example can be applied in the designstage of geodetic network analysis [6265] Furthermore the experiments performed here can also beextended to geodetic deformation analysis when the deformation effects are unmodelled in the senseof deterministic form [28]

41 Problem description

To analyse the constraints effects on the IDS an example is taken from a geodetic levellingnetwork with 12 height differences between the points The equipment used to measure the leveldifference can be an electronic digital level In that case the levelling measurement system comprisesof a special bar-coded staff (also called barcode rod) and a digital level (instrument) A digital level isbasically a telescope that enables a horizontal line of sight Digital levels consist of additional electronicimage processing components to automatically read and analyse digital (bar coded) levelling staffswhere the graduation is replaced by a manufacturer dependent code pattern Generally the result isautomatically stored in the data collector of the digital level An example of a digital level ndash bar-codestaff system is displayed in the Figure 2 For more details about digital level see eg [81ndash84]

Figure 2 Example of a digital level ndash bar-code staff system [45]

The standard-deviation of the uncorrelated measurements were the same and taken equal toσ = 1mm The points are indicated as A to G The eight network configuration are displayed in Figure3(a b c d and e) and their details are given as follows

1 Figure 3(a) Network with 1 hard constraint (ie network minimally constrained) Since thedimension of the network is 1D the minimum information necessary to estimate the unknownheights is one The height of G was fixed as control point (hard constraint) and 6 unknown heights(ABCDEF) were minimally constrained Therefore the redundancy of the system (or overalldegrees of freedom) was r = nminus rank(A) = nminus u = 12minus 6 = 6

2 Figure 3(b) Network with one extra hard constraint (ie two hard constraints) The heights A andD were taken as hard constraints (ie heights A and D were fixed) The redundancy of the system

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

15 of 40

in that case was r = 12minus 5 = 7 with 5 unknown heights (BCEFG) over-constrained

3 Figure 3(c) Network with two extra hard constraints (ie three hard constraints)The heights A Dand G were taken as hard constraints In that case the redundancy of the system was r = 12minus 4 = 8

4 Figure 3(d) Network with two soft constraints (A and D) In that case a standard-deviation largerthan zero was assigned to both constraints (ie σc gt 0 In other words A and D were processedas being both observations and unknown parameters ie A and D were pseudo-observationsHere the both constraints were simultaneously relaxed by considering their uncertainties 10 timesworse than measurements (ie σc = 10times σ = 10mm) 10 times better than measurements (ieσc = 01mm) and their uncertainties equal to the measurements (σc = 1mm) In that case theredundancy of the system was r = 14minus 7 = 7

5 Figure 3(e) Network processed with A D and G as pseudo-observations Those three constraintswere simultaneously relaxed by considering their standard-deviations equal to σc = 10mm (10 timesworse than measurements) σc = 01mm (10 times better than measurements) and σc = 1mm (thesame as the measurements) In that case the redundancy of the system was r = 15minus 7 = 8

y1

y2

y3

y4

y5

y6

y7 y8

y9 y10

B C

D

E F

G y12 y11

A

y1

y2

y3

y4

y5

y6

y7 y8

y9 y10

A

B C

D

E F

G y12 y11

y1

y2

y3

y4

y5

y6

y7 y8

y9 y10

A

B C

D

E F

G y12 y11

(a) (b)

(c) (d)

(e)

Unknown Heights Soft Constraints Hard Constraints

y1

y2

y3

y4

y5

y6

y7 y8

y9 y10

A

B C

D

E F

G y12 y11

y1

y2

y3

y4

y5

y6

y7 y8

y9 y10

A

B C

D

E F

G y12 y11

Figure 3 Levelling Geodetic Network subject to different constraint scenarios

The following system of equations for that problem is given by

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

16 of 40

y1 + e1 = hB minus hA

y2 + e2 = hC minus hB

y3 + e3 = hD minus hC

y7 + e7 = hB minus hG

y8 + e8 = hC minus hG

y11 + e11 = hB minus hF

y12 + e12 = hC minus hE

(49)

The functional model (A) for the system of equations in 49 is given by

A =

minus1 1 0 0 0 0 00 minus1 1 0 0 0 00 0 minus1 1 0 0 00 0 0 minus1 1 0 00 0 0 0 minus1 1 01 0 0 0 0 minus1 00 1 0 0 0 0 minus10 0 1 0 0 0 minus10 0 0 0 1 0 minus10 0 0 0 0 1 minus10 1 0 0 0 minus1 00 0 1 0 minus1 0 0

(50)

Note that the rank defect of the matrix A is uminus rank(A) = 7minus 6 = 1 In that case there is neededto add at least one constraint in order to avoid rank deficiency of the matrix A This is guaranteedwhen we take one height as known For example from the network in Figure 3(a) we have added theheight G as known (ie as a hard constraint) In that case the constraint equation should be addedinto the system in 49 ie

y13 = hG with σy13 = 0 (51)

noticing that because the standard deviation is zero the observation is non-stochastic (hard constraint)and the residual ey13 = 0 This can generate problems in the inversion of the covariance matrix of theobservations Qe for the calculation of the weight matrix W because the weight for that constraintwould be 1

0 = infin In order to avoid that problem we have eliminated the rank deficiency of matrixA by removing the seventh column of matrix A in 50 associated with the height G Now we haveuminus rank(A) = 6minus 6 = 0

The constraint defines the geodetic datum ie the S-system ([85]p41) Another approach tosolving the system of equations in 2 could be based on generalised (pseudo) inverses (see eg [86])

The location of the constraints can be chosen in some circumstances for example during thedesign stage of a geodetic network For the special case of having a minimally constrained systemthe location of the constraint will not influence the w-test statistics and the sensitivity indicators (MIBand MDB) [18] However more constraints than the minimum necessary to have a solution (ie extraconstraints or redundant constraints) can change the least-squares residuals and hence w-test statisticsand the minimal biases

From the network with one extra constraint (2 constraints) in Figure 3(b) for example the bothfirst (height A) and fourth column (height D) of matrix A in 50 were eliminated in the case of having

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

17 of 40

the two heights as hard constraints For the case where these two heights (A and D) were taken as softconstraints however two observation equations were added to Equation 49 ie

y13 + e13 = hA σy13 gt 0

y14 + e14 = hD σy14 gt 0(52)

In the case of soft constraints in Equation 52 two lines were added in matrix A In other wordsA and D were taken as pseudo-observations In that case the rank deficiency was also null (ieuminus rank(A) = 7minus 7 = 0) the redundancy of the system was r = nminus rank(A) = nminus u = 7 and thematrix A was given as follows

A =

minus1 1 0 0 0 0 00 minus1 1 0 0 0 00 0 minus1 1 0 0 00 0 0 minus1 1 0 00 0 0 0 minus1 1 01 0 0 0 0 minus1 00 1 0 0 0 0 minus10 0 1 0 0 0 minus10 0 0 0 1 0 minus10 0 0 0 0 1 minus10 1 0 0 0 minus1 00 0 1 0 minus1 0 01 0 0 0 0 0 00 0 0 1 0 0 0

(53)

For this example of two soft constraints and by considering the both soft constraints withstandard-deviation σc = 10mm the symmetric and positive semi-definite covariance matrix of theobservations (Qe) was given as follows

Qe =

1 0 0 middot middot middot 0 00 1 0 middot middot middot 0 00 0 1 middot middot middot 0 0

0 0 0 middot middot middot 100 00 0 0 middot middot middot 0 100

(54)

The last two rows and columns of the matrix Qe in Equation 54 refer to the variances (σc2 =

(10mm)2 = 100mm2) of the heights constraints A and D respectivelySimilarly matrices A and Qe were constructed for the other cases studied hereAlthough other measurements are able to identify an outlier for the case of having only one single

soft constraint the pseudo-observation (constraint) is not In that case the defect configurationis associated with the additional parameter in the constraint (ie the presence of an outlier inthe constraint) In other words an additional parameter on the soft constraint will not estimableFor example if the height point G was taken as a soft constraint the presence of an outlier inpseudo-observation G would lead to rank deficiency of matrix A ie u minus rank(A) = 8 minus 7 = 1Therefore the case of having only one single soft constraint was not considered here

42 Result of the Hard Constraint Effects on the Iterative Outlier Elimination Procedure

In this section we present the results of the effects of the hard constraint on the IDS The scenariosin Figure 3(a) (network minimally constrained) Figure 3(b) (two hard constraints) and Figure 3(c)(three hard constraints) were considered for the analysis

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

18 of 40

The correlation coefficient between w-test statistics (ρwi wj ) were computed for the three scenariosaccording to Equation(20) Table 1 provides that correlation for that network minimally constrainedTable 2 for that network over-constrained with two hard constraints and Table 3 for that networkover-constrained with three hard constraints

Table 1 Correlation matrix of w-test statistics for the levelling network minimally constrained in Figure3(a)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12

wy1 10 02 01 01 02 10 -02 00 00 02 -04 -01wy2 10 02 02 03 02 05 -05 -02 02 03 -03wy3 10 10 02 01 00 02 -02 00 01 04wy4 10 02 01 00 02 -02 00 01 04wy5 10 02 -02 02 05 -05 03 -03wy6 10 -02 00 00 02 -04 -01wy7 10 -03 -03 -04 -04 -01wy8 10 -04 -03 -01 -04wy9 10 -03 01 04wy10 10 04 01wy11 10 -01wy12 100

Table 2 Correlation matrix of w-test statistics for the levelling network with two hard constraints inFigure 3(b)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12

wy1 10 04 04 -03 -01 04 -03 01 01 01 -04 -01wy2 10 04 -01 01 -01 04 -04 -01 01 03 -03wy3 10 04 -01 -03 -01 03 -01 -01 01 04wy4 10 04 04 01 01 -03 01 01 04wy5 10 04 -01 01 04 -04 03 -03wy6 10 -01 -01 -01 03 -04 -01wy7 10 -04 -03 -04 -04 -01wy8 10 -04 -03 -01 -04wy9 10 -04 01 04wy10 10 04 01wy11 10 -01wy12 10

Table 3 Correlation matrix of w-test statistics for the levelling network with three hard constraints inFigure 3(c)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12

wy1 10 03 01 -01 -01 01 -04 -01 -01 -01 -03 -01wy2 10 03 -01 01 -01 03 -03 -01 01 03 -03wy3 10 01 -01 -01 01 04 01 01 01 03wy4 10 03 01 -01 -01 -04 -01 01 03wy5 10 03 -01 01 03 -03 03 -03wy6 10 01 01 01 04 -03 -01wy7 10 -01 -01 -01 -03 -01wy8 10 -01 -01 -01 -03wy9 10 -01 01 03wy10 10 03 01wy11 10 -01wy12 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

19 of 40

Table 4 gives the local redundancy (ri) the standard-deviation of the LS-estimated outlier σnablai andthe maximum absolute correlation (maxρwi wj

) for each scenario of hard constraint set out of in thisstudy ie Figure 3(ab and c)

Table 4 Local redundancy (ri) standard-deviation of the LS-estimated outlier σnablai and the maximumabsolute correlation (maxρwi wj

) for each scenario of hard constraint

1 hard constraint 2 hard constraints 3 hard constraintsMeasurement ri σnablai maxρwi wj

ri σnablai maxρwi wjri σnablai maxρwi wj

y1 0396 1589 100 0583 1309 036 0708 1188 041y2 0500 1414 047 0583 1309 036 0583 1309 032y3 0396 1589 100 0583 1309 036 0708 1188 041y4 0396 1589 100 0583 1309 036 0708 1188 041y5 0500 1414 047 0583 1309 036 0583 1309 032y6 0396 1589 100 0583 1309 036 0708 1188 041y7 0563 1333 047 0583 1309 036 0708 1188 041y8 0563 1333 047 0583 1309 036 0708 1188 041y9 0563 1333 047 0583 1309 036 0708 1188 041y10 0563 1333 047 0583 1309 036 0708 1188 041y11 0583 1309 043 0583 1309 036 0583 1309 032y12 0583 1309 043 0583 1309 036 0583 1309 032

Here the twelve levelling measurements were clustered into four clusters The clustering wasdefined according to the local redundancy (ri) and the maximum absolute correlation (maxρwi wj

) inTable 4 This is the first time that a clustering technique based on the similarity of local redundancyand the maximum absolute correlation between w-test statistics is applied to a problem of geodeticnetworks Similarly this has been done in [45] The four cluster were defined as follows

bull Cluster 1 y1 y3 y4 and y6bull Cluster 2 y2 and y5bull Cluster 3 y7 y8 y9 and y10bull Cluster 4 y11 and y12

The probability levels associated with IDS were averaged for each of these clusters Here we computethe probability levels of the IDS based on the procedure in Section (3) [45] The critical values werek = 389 k = 393 and k = 393 for 1 hard constraint 2 hard constraints and 3 hard constraintsrespectively These critical values were found for αprime = 0001 according to the procedure described inAppendix The probability of correct identification (PCI) and correct detection (PCD = 1minusPMD)are displayed in Figure 4 for each number of hard constraint (denoted by hc)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

20 of 40

(c) (d)

(e) (f)

(g) (h)

(a) (b)

Figure 4 Probability of correct identification (PCI) and Probability of correct detection (PCD) for thecase of hard constraints and for αprime = 0001 Cluster 1(ab) Cluster 2(cd) Cluster 3(ef) and Cluster4(gh)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

21 of 40

The outlier magnitude were defined from |5σ| to |9σ| The outlier of |5σ| was chosen because itis approximately the lowest MDB0(i) of the network when a single hypothesis testing is in play ascan be seen in Equation (14) or Equation (16) That MDB0(i) of |5σ| was computed for significancelevel of αprime = 0001 and a power of the test γ0 = 08 This strategy reduces the search space for a MIB(Minimal Identifiable Bias) because we will always have the following inequality MIB ge MDB0(i)[4755] Remember that the IDS procedure is an example of multiple hypothesis testing (see 2)

The sensitivity indicators MDB and MIB for IDS were also computed according to Equation (33)and (34) respectively The success rate for outlier detection and outlier identification were taken asbeing PCD = PCI = 08 respectively Table 5 provides the values of MDB and MIB for that case ofhard constraints

Table 5 MDB and MIB for the case of hard constraints based on αprime = 0001 and PCD = PCI = 08

1 hard constraint 2 hard constraints 3 hard constraintsCluster MDB (σ) MIB (σ) MDB (σ) MIB (σ) MDB (σ) MIB (σ)

1 75 - 63 63 57 572 67 68 63 64 63 643 64 64 63 63 58 584 64 64 64 64 64 64

Figure 5 shows the probability of wrong exclusion (PWE) The over-identification cases (Pover+

and Poverminus) were smaller than 0001 (ie they were practically null) There were not statistical overlap(Pol) for clusters 2 3 and 4 We will discuss more about statistical overlap (Pol) later

(c) (d)

(a) (b)

Figure 5 Probability of wrong exclusion (PWE) for the case of hard constraints and for αprime = 0001Cluster 1(a) Cluster 2(b) Cluster 3(c) and Cluster 4(d)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

22 of 40

43 Result of the Soft Constraint Effects on the Iterative Outlier Elimination Procedure

The both configuration in Figure 3(d) and Figure 3(e) were analysed in terms of soft constraintsIn that case the critical values were k = 395 k = 395 and k = 392 for two soft constraints withσc = 01mm σc = 1mm and σc = 10mm respectively In the case of three soft constraints the criticalvalues found were k = 399 k = 399 and k = 396 for σc = 01mm σc = 1mm and σc = 10mmrespectively All these critical values were computed for αprime = 0001

The correlation coefficient between w-test statistics (ρwi wj ) are displayed in Table 6 Table 7and Table 8 for two soft constraints with standard-deviation 10 times larger than measurements(ie σc = 10times σ = 10mm) 10 times better than measurements (ie σc = 01mm) and equal to themeasurements (σc = 1mm) respectively

Table 6 Correlation matrix of w-test statistics for the levelling network with two soft constraints withσc = 10mm in Figure 3(d)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14

wy1 10 02 01 01 02 10 -02 00 00 02 -04 -01 006 -01wy2 10 02 02 03 02 05 -05 -02 02 03 -03 003 00wy3 10 10 02 01 00 02 -02 00 01 04 01 -01wy4 10 02 01 00 02 -02 00 01 04 -01 01wy5 10 02 -02 02 05 -05 03 -03 00 00wy6 10 -02 00 00 02 -04 -01 -01 01wy7 10 -03 -03 -04 -04 -01 00 00wy8 10 -04 -03 -01 -04 00 00wy9 10 -03 01 04 00 00wy10 10 04 01 00 00wy11 10 -01 00 00wy12 10 00 00wy13 10 -10wy14 10

Table 7 Correlation matrix of w-test statistics for the levelling network with two soft constraints withσc = 01mm in Figure 3(d)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14

wy1 10 04 04 -03 -01 04 -03 01 01 01 -04 -01 06 -06wy2 10 04 -01 02 -01 04 -04 -01 01 03 -03 04 -04wy3 10 04 -01 -03 -01 03 -01 -01 01 04 06 -06wy4 10 04 04 01 01 -03 01 01 04 -06 06wy5 10 04 -01 01 04 -04 03 -03 -04 04wy6 10 -01 -01 -01 03 -04 -01 -06 06wy7 10 -04 -03 -04 -04 -01 -02 02wy8 10 -04 -03 -01 -04 02 -02wy9 10 -04 01 04 02 -02wy10 10 04 01 -02 02wy11 10 -01 00 00wy12 10 00 00wy13 10 -10wy14 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

23 of 40

Table 8 Correlation matrix of w-test statistics for the levelling network with two soft constraints withσc = 1mm in Figure 3(d)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14

wy1 10 03 02 -01 01 07 -03 00 01 02 -04 -01 04 -04wy2 10 03 01 03 01 04 -04 -01 01 03 -03 03 -03wy3 10 07 01 -01 00 03 -02 -01 01 04 04 -04wy4 10 03 02 01 02 -03 00 01 04 -04 04wy5 10 03 -01 01 04 -04 03 -03 -03 03wy6 10 -02 -01 00 03 -04 -01 -04 04wy7 10 -03 -03 -04 -04 -01 -01 01wy8 10 -04 -03 -01 -04 01 -01wy9 10 -03 01 04 01 -01wy10 10 04 01 -01 01wy11 10 -01 00 00wy12 10 00 00wy13 10 -10wy14 10

Table 9 gives the local redundancy (ri) the standard-deviation of the LS-estimated outlier σnablai andthe maximum absolute correlation (maxρwi wj

) for the scenarios of two constraints

Table 9 Local redundancy (ri) standard-deviation of the LS-estimated outlier σnablai (mm) and themaximum absolute correlation (maxρwi wj

) for each scenario of two soft constraints

σc = 01mm σc = 1mm σc = 10mmMeasurement ri σnablai maxρwi wj

ri σnablai maxρwi wjri σnablai maxρwi wj

y1 0581 1312 0564 0471 1457 0681 0397 1587 0994y2 0582 1311 0376 0533 1369 0423 0501 1413 0471y3 0581 1312 0564 0471 1457 0681 0397 1587 0994y4 0581 1312 0564 0471 1457 0681 0397 1587 0994y5 0582 1311 0376 0533 1369 0423 0501 1413 0471y6 0581 1312 0564 0471 1457 0681 0397 1587 0994y7 0583 1310 0359 0571 1324 0423 0563 1333 0471y8 0583 1310 0359 0571 1324 0423 0563 1333 0471y9 0583 1310 0359 0571 1324 0423 0563 1333 0471y10 0583 1310 0359 0571 1324 0423 0563 1333 0471y11 0583 1309 0358 0583 1309 0398 0583 1309 0433y12 0583 1309 0358 0583 1309 0398 0583 1309 0433y13 0007 1163 1000 0300 1826 1000 0497 14189 1000y14 0007 1163 1000 0300 1826 1000 0497 14189 1000

From Table 9 five clusters were defined for each case of two soft constraints ie for the casewhere heights A and D were given as soft constraints in Figure 3(d) as follows

bull Cluster 1 y1 y3 y4 and y6bull Cluster 2 y2 and y5bull Cluster 3 y7 y8 y9 and y10bull Cluster 4 y11 and y12bull Cluster 5 y13 and y14

The probabilities of correct identification (PCI) and correct detection (PCD) for the measurements(Cluster 1 to Cluster 4) subject to the scenarios of two soft constraints (heights A and D) are displayedin Figure 6

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

24 of 40

(c) (d)

(e) (f)

(g) (h)

(a) (b)

Figure 6 Probability of correct identification (PCI) and Probability of correct detection (PCD) for themeasurements subject to the scenarios of two soft constraints for αprime = 0001 Cluster 1(ab) Cluster2(cd) Cluster 3(ef) and Cluster 4(gh)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

25 of 40

Note that the Cluster 5 is associated with the two soft constraints (ie y13 and y14) The probabilityof correct identification (PCI) for these both soft constraints were null However the probability ofcorrect detection were not Figure 7 shows the probability of correct detection (PCD) for these two softconstraints (ie heights A and D)

Figure 7 Probability of correct detection PCD for the two soft constraints (Cluster 5 heights A and D)and for αprime = 0001

The probability of wrong exclusion PWE for the measurements (Cluster 1 to Cluster 4) subject tothe scenarios of two soft constraints (heights A and D) are displayed in Figure 8 Figure 9 gives the theprobability of wrong exclusion PWE for that two constraints (ie heights A and D)

(c) (d)

(a) (b)

Figure 8 Probability of wrong exclusion PWE for the measurements subject to the scenarios of two softconstraints for αprime = 0001 Cluster 1(a) Cluster 2(b) Cluster 3(c) and Cluster 4(d)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

26 of 40

Figure 9 Probability of wrong exclusion PWE for the two soft constraints (Cluster 5 heights A and D)and for αprime = 0001

The over-identification cases (Pover+ and Poverminus) and the statistical overlap (Pol) were practicallynull for that case

The sensitivity indicators (MDB and MIB) for each scenario of two soft constraints are displayedin Table 10

Table 10 MDB and MIB for the case of two soft constraints based on αprime = 0001 and PCD = PCI = 08

σc = 10mm σc = 1mm σc = 01mmCluster MDB (σ) MIB (σ) MDB (σ) MIB (σ) MDB (σ) MIB (σ)

1 75 25 7 71 63 632 68 68 66 66 63 633 64 64 64 64 63 634 63 63 63 63 63 635 68 - 88 - 57 -

As with the case of two soft constraints Tables 1112 and 13 show the correlations (ρwi wj ) for thecase where there were three soft constraints ie for the network configuration in Figure 3(e)

Table 11 Correlation matrix of w-test statistics for the levelling network with three soft constraintswith σc = 10mm in Figure 3(e)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14 wy15

wy1 10 02 01 01 02 10 -02 00 00 02 -04 -01 01 00 00wy2 10 02 02 03 02 05 -05 -02 02 03 -03 00 00 00wy3 10 10 02 01 00 02 -02 00 01 04 00 -01 00wy4 10 02 01 00 02 -02 00 01 04 00 01 00wy5 10 02 -02 02 05 -05 03 -03 00 00 00wy6 10 -02 00 00 02 -04 -01 -01 00 00wy7 10 -03 -03 -04 -04 -01 00 00 00wy8 10 -04 -03 -01 -04 00 00 00wy9 10 -03 01 04 00 00 00wy10 10 04 01 00 00 00wy11 10 -01 00 00 00wy12 10 00 00 00wy13 10 -05 -05wy14 10 -05wy15 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

27 of 40

Table 12 Correlation matrix of w-test statistics for the levelling network with three soft constraintswith σc = 01mm in Figure 3(e)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14 wy15

wy1 10 03 01 -01 -01 01 -04 -01 -01 -01 -03 -01 07 -01 -04wy2 10 03 -01 01 -01 03 -03 -01 01 03 -03 03 -03 00wy3 10 01 -01 -01 01 04 01 01 01 03 01 -07 04wy4 10 03 01 -01 -01 -04 -01 01 03 -01 07 -04wy5 10 03 -01 01 03 -03 03 -03 -03 03 00wy6 10 01 01 01 04 -03 -01 -07 01 04wy7 10 -01 -01 -01 -03 -01 -04 -01 04wy8 10 -01 -01 -01 -03 -01 -04 04wy9 10 -01 01 03 -01 -04 04wy10 10 03 01 -04 -01 04wy11 10 -01 00 00 00wy12 10 00 00 00wy13 10 -02 -06wy14 10 -06wy15 10

Table 13 Correlation matrix of w-test statistics for the levelling network with three soft constraintswith σc = 1mm in Figure 3(e)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14 wy15

wy1 10 03 01 00 01 06 -03 00 00 01 -04 -01 05 -02 -02wy2 10 03 01 03 01 04 -04 -01 01 03 -03 02 -02 00wy3 10 06 01 00 00 03 -01 00 01 04 02 -05 02wy4 10 03 01 00 01 -03 00 01 04 -02 05 -02wy5 10 03 -01 01 04 -04 03 -03 -02 02 00wy6 10 -01 00 00 03 -04 -01 -05 02 02wy7 10 -03 -02 -03 -04 -01 -02 00 02wy8 10 -03 -02 -01 -04 00 -02 02wy9 10 -03 01 04 00 -02 02wy10 10 04 01 -02 00 02wy11 10 -01 00 00 00wy12 10 00 00 00wy13 10 -04 -05wy14 10 -05wy15 10

Table 14 Local redundancy (ri) standard-deviation of the LS-estimated outlier σnablai (mm) and themaximum absolute correlation (maxρwi wj

) for each scenario of three soft constraints

σc = 01mm σc = 1mm σc = 10mmMeasurement ri σnablai maxρwi wj

ri σnablai maxρwi wjri σnablai maxρwi wj

y1 0702 1194 0660 0502 1411 0577 0398 1586 0992y2 0582 1311 0326 0533 1369 0412 0501 1413 0470y3 0702 1194 0660 0502 1411 0577 0398 1586 0992y4 0702 1194 0660 0502 1411 0577 0398 1586 0992y5 0582 1311 0326 0533 1369 0412 0501 1413 0470y6 0702 1194 0660 0502 1411 0577 0398 1586 0992y7 0704 1192 0415 0602 1289 0412 0563 1333 0470y8 0704 1192 0415 0602 1289 0412 0563 1333 0470y9 0704 1192 0415 0602 1289 0412 0563 1333 0470y10 0704 1192 0415 0602 1289 0412 0563 1333 0470y11 0583 1309 0326 0583 1309 0385 0583 1309 0433y12 0583 1309 0326 0583 1309 0385 0583 1309 0433y13 0012 0904 0660 0425 1534 0542 0663 12283 0501y14 0012 0904 0660 0425 1534 0542 0663 12283 0501y15 0019 0718 063 05 1414 0542 0665 12268 0501

The probabilities of correct identification (PCI) and correct detection PCD for the measurements(Cluster 1 to Cluster 4) subject to the scenarios of three soft constraints (heights A D and G) aredisplayed in Figure 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

28 of 40

(c) (d)

(e) (f)

(g) (h)

(a) (b)

Figure 10 Probability of correct identification (PCI) and Probability of correct detection (PCD) for themeasurements subject to the scenarios of three soft constraints for αprime = 0001 Cluster 1(ab) Cluster2(cd) Cluster 3(ef) and Cluster 4(gh)

The probabilities of correct identification (PCI) and correct detection (PCD) in Figure 8 werecomputed for the clusters based on Table 14 as follows

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

29 of 40

bull Cluster 1 y1 y3 y4 and y6bull Cluster 2 y2 and y5bull Cluster 3 y7 y8 y9 and y10bull Cluster 4 y11 and y12bull Cluster 5 y13 and y14bull Cluster 6 y15

Figure 11 shows the probabilities of correct identification PCI and correct detection PCD for thethree soft constraints ie for Cluster 5 (heights A and D) and Cluster 6 (height G) in Figure 3(e)

(c) (d)

(a) (b)

Figure 11 Probability of correct identification (PCI) and Probability of correct detection (PCD) for thethree constraints and for αprime = 0001 Cluster 5(ab) and Cluster 6(cd)

The probability of wrong exclusion PWE for the measurements (Cluster 1 to Cluster 4) subject tothe scenarios of three soft constraints (heights A D and G) are displayed in Figure 12 Figure 13 givesthe the probability of wrong exclusion PWE for that three constraints (ie heights A D and G)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

30 of 40

(c) (d)

(a) (b)

Figure 12 Probability of wrong exclusion PWE for the measurements subject to the scenarios of threesoft constraints and for αprime = 0001 Cluster 1(a) Cluster 2(b) Cluster 3(c) and Cluster 4(d)

(a) (b)

Figure 13 Probability of wrong exclusion (PWE) for the three constraints and for αprime = 0001 Cluster5(a) and Cluster 6(b)

The over-identification cases (Pover+ and Poverminus) and the statistical overlap (Pol) were alsopractically null for that case of three soft constraints

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

31 of 40

The sensitivity indicators (MDB and MIB) for each scenario of three soft constraints are displayedin Table 15

Table 15 MDB and MIB for the case of three soft constraints based on αprime = 0001 and PCD = PCI = 08

σc = 10mm σc = 1mm σc = 01mmCluster MDB (σ) MIB (σ) MDB (σ) MIB (σ) MDB (σ) MIB (σ)

1 75 22 68 69 58 592 68 69 66 67 64 643 64 64 63 63 58 584 63 63 63 63 63 635 59 60 74 75 435 456 59 59 69 69 346 355

5 Discussion

We start from the scenario of one hard constraint in Figure 3(a) Table 4 reveal that the maximumcorrelation between w-test statistics for the measurements constituting Cluster 1 is exactly equal to100 (ie maxρwi wj

= 100) This means that the measurements belonging to Cluster 1 are those thatare connected with unknown heights whose connections are limited to only two The both unknownheights A and D are tied only to two measurements (ie y1 and y6 linked to A and y3 and y4 linked toD) therefore if an outlier occurred in one of these measurements we would only be able to analyse theconsistency between them but we would not be able to distinguish which of them was contaminatedby an outlier This means that we would only be able to detect because the w-test statistics could belarger than a critical value k however in that case the values of w-test statistics would be the same andtherefore we would not have only one unique maximum w-test statistics but actually would have fourmaximum w-test statistics In other words the equation systems associated with the measurements ofCluster 1 are linearly dependent [87] Therefore there is no reliability in terms of outlier identificationfor the Cluster 1 as can be seen in Figure 3(a)

From Figure 3(b) we can note that there is reliability in terms of outlier detection for Cluster 1and it is caused by overlapping w-test statistics The probability of statistics overlap Pol for Cluster 1in this scenario of minimally constrained network is displayed in Figure 14

Figure 14 Probability of correct detection PCD and statistical overlap (Pol) for the Cluster 1 subject toone hard constraint and for αprime = 0001

The problem of not having more connections (ie more measurements) for the unknown heightsA and D in the case of one hard constraint with G fixed is overcome when these heights (A and D) aretaken as hard constraints in 3(b) or when the heights A D and G are hard constraints in figure 3(c)Figure 3(a b) show that the measurements of Cluster 1 are now able to identify an outlier when twohard constraints (A and D fixed) are in play It is also verified to the case of three hard constraints (A Dand G fixes) in Figure 3(e f) ie there is reliability in terms of both outlier detection and identificationfor these measurements in those conditions

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

32 of 40

One can argue that the more constraints the larger the probability of correct identification (PCI)and correct detection (PCD) This claim is not entirely correct This may be true in case where all pointson the network have the same number of connections However in the example given if both heightsB and G were selected as hard constraints the measurements of the Cluster 1 (ie y1 y6 y3 and y4)would still have their least-squares residuals fully correlated (ie ρwy1 wy6

= 100 and ρwy3 wy4= 100)

and therefore there would be no reliability in terms of outlier identification for these measurementsFrom Table 5 we observe different behaviour for the clusters as follows

bull Cluster 1 there was no MIB for the case of having only one single hard constraint whereasMDB = MIB for the other cases However both MDB and MIB decrease significantly with theincrease in the number of hard constraints

bull Cluster 2 MDB slightly smaller than MIB Both MDB and MIB were practically the same for thecase of having two or three hard constraints

bull Cluster 3 MDB = MIB for all cases of hard constraints however both MDB and MIB decreasesignificantly with the increase in the number of hard constraints

bull Cluster 4 MDB e MIB were equal for all cases

In terms of outlier detection and identification therefore Cluster 1 was more sensitive toconstraints Cluster 3 relatively sensitive to constraints whereas the Cluster 4 completely insensitiveto constraints and Cluster 2 relatively insensitive to constraints This is also can be seen in Figure 4The reason for this is that the local redundancy (ri) of the Cluster 1 increased with the increase of thenumber of hard constraints whereas the Cluster 4 have remained the same as can be seen in Table 4

Leaving aside the cases of statistical overlap (Pol) the network presents low least-squares residualscorrelation (ρwi wj lt 05) and high local redundancy (ri gt 05) Because of this the probabilities ofwrong exclusion (PWE) were less than 1 as can be seen in Figure 5 The over-identification cases(Pover+ and Poverminus) were practically null Consequently PCI asymp PCD Due of this fact the family-wiseerror rate (αprime) should be increased in order to have more success rate in the outlier detection andidentification [45]

From Figure 15 we observe that increasing the αprime increases the both probabilities of correctidentification (PCI) and correct detection (PCD) for outlier magnitude from 5σ to 6σ in the case ofthree hard constraints and from 5σ to 68σ in the case of two hard constraints Although the rates ofPover+ and PWE also increase they are not significant when compared to the improvement of correctidentification (PCI) and detection (PCD) This same analysis can be done for the other clusters

In terms of soft constraints for the cases of two constraints in Figure 3(d) we observe from Table 9that the larger the relaxation of the constraint (ie the larger the standard-deviation of the constraintσc) the larger the residuals correlation (ρwi wj ) and the standard-deviation of the outlier σnablai and thesmaller the local redundancy (ri) Consequently the probabilities of correct identification (PCI) anddetection (PCD) get smaller and smaller with the relaxation of the constraints whereas the probabilityof wrong exclusion gets larger (PWE) This can be more clearly verified in Figure 6(a b) and Figure8(a) for Cluster 1 whose measurements are connected with the constraints A and D (ie y13 and y14 inTable 9 respectively)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

33 of 40

(a) (b)

(c) (d)

Figure 15 Probabilities of correct identification PCI (a) correct detection PCD (b) over-identificationpositive Pover+ (c) and wrong exclusion PWE for the Cluster 1 subject to two and three hard constraintsand for αprime = 0001 and αprime = 01

Note from Figure 8(a) that the probability of wrong exclusion (PWE) increases as the magnitude ofthe outlier (nablai) increases However this is true up to a certain limit of outlier magnitude The effect ofresiduals correlation ρwi wj on the rates of wrong exclusion (PWE) and correct identification (PCI) tendsto decrease with the increase in the magnitude of the outlier nablai This effect is more clearly verified forCluster 1 in case where the precision of the constraints are ten times worse than the measurementsσc = 10σ = 10mm

Note from Figure 6 that identifying an outlier in the Cluster 1 (ie y1 y3 y4 and y6) whenσc = 10mm is more difficult than the other clusters This is due to the fact the Cluster 1 has a higherresiduals correlation ρwi wj = 0994 than other clusters In general therefore we observe that the largerthe relaxation of the constraints the larger is the effect of the correlation ρwi wj on the success rate ofoutlier identification (PCI) Consequently the higher the sensitivity indicator for outlier identification(MIB) Table 10 reveals that the ratio between MIB and MDB for the Cluster 1 and for the scenariowhere the standard-deviations of that two soft constraints are σc = 10mm is MIBMDB=2575=33On the other hand the relationship between MIB and MDB is practically one (ie MIBMDB=10) forthe others scenarios

If the FWE rate (αprime) were increased for the case where the two soft constraints of σc = 10mmare in play we would not have great advantages for Cluster 1 due to its high residuals correlation(ρwi wj = 994) From Figure 16 we can observe that the probability of correct identification (PCI)for outlier magnitudes from 5σ to 8σ is effectively larger to a user-defined αprime = 01 than αprime = 0001However the success rate is still less than 80 ie PCI lt 08 Note for example the correct

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

34 of 40

identification rate is PCI = 56 for an outlier magnitude of nablai = 8σ and αprime = 01 For αprime = 01 theMIB = 335σ = 335mm whereas for αprime = 0001 is MIB = 25σ = 25mm Therefore in that case theMIB for PCI = 08(80) and αprime = 01 would be 34 larger than user-defined αprime = 0001

Figure 16 Probabilities of correct identification (PCI) for the Cluster 1 subject to two soft constraints (2sc) A and D for αprime = 0001 and αprime = 01

The soft constraints A and D were grouped in the Cluster 5 (ie A and D were treated aspseudo-observations in the model) There is no reliability in terms of outlier identification for theconstraints because the residual correlation between them is ρwi wj = 100 as can be seen in Table 9for y13 and y14 However these soft constraints are able to detect an outlier In that case the probabilityof correct detection PCD in Figure 7 is mainly caused by the statistical overlap Pol as can be seen in theexample of the case of σc = 10mm in Figure 17 From Table 10 we observe that the larger the relaxationof the constraints the larger the MDB Note that the values of MDB are given in σ and therefore theMDB for σc = 10mm is larger than σc = 1mm and σc = 01mm ie we had the following inequalityMDB = 68σc = 68times 10mm = 68mm gt MDB = 88σc = 88times 1mm = 88mm gt MDB = 57σc =

57times 01mm = 57mm In that case if the FWE (αprime) were increased the rate of outlier detection by theCluster 4 (ie by the soft constraints) would increase

Figure 17 Probabilities of correct detection PCD and statistical overlap Pol for the two soft constraintsA and D (Cluster 5) with σc = 10mm and for αprime = 0001

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

35 of 40

Similar effects of the relaxation of the constraints on the performance of the IDS in case of two softconstraints are verified in case of three soft constraints as can be seen in the Figures 10 11 12 and 13

In case of having three soft constraints in Figure 3(e) there is reliability in terms of outlieridentification for the three pseudo-observations y13 y14 and y15 (ie for AD and G) as can be seen inFigure 10 and Table 15 In that case we also observe that the probabilities of correct detection PCD ofthe soft constraints A and D (ie Cluster 5) were approximately 13 for σc = 10mm 16 for σc = 1mmand 24 for σc = 01mm larger than the scenario of the network subject to two soft constraints Table15 reveals that the advantage of having three soft constraints instead of two constraints is that theconstraints become identifiable in the presence of an outlier The behaviour of the probabilities ofcorrect detection (PCD) correct identification (PCI) and wrong exclusion (PWE) was similar to the caseof the two soft constraints

Furthermore the larger the relaxation of the constraints the smaller the residuals correlationbetween the measurements and the soft constraints and the larger the residuals correlation amongthe measurements This effect can be verified in Tables 6 7 and 8 for two soft constraints as well as inTables 11 12 and 13 for three soft constraints

In general we observe that the case of two soft constraints for σc = 01mm was comparablewith two hard constraints (see eg Table 5 and Table 10) in terms of the probability levels associatedwith IDS for the measurements (ie clusters 1 2 3 and 4) In the same way for the case of twosoft constraints with σc = 1mm or σc = 10mm the probabilities levels were similar to the one hardconstraint for that measurements with the benefit of two soft constraints having reliability in terms ofoutlier identification for the Cluster 1 Finally the three soft constraints with σc = 1mm and σc = 10mmwere comparable to the two soft constraints for that scenario of constraints relaxation wheres thethree soft constraints for σc = 01mm showed similar outcomes with three hard constraints for themeasurements (see eg Table 5 and Table 15) In that case however an advantage of the three softconstraints on the three hard constraints is the possibility of analysing the sensitivity of the constraintsIt is important to emphasised that the stochastic models of the measurements and constraints wereassumed well-known and defined for the analyzes performed here

6 Conclusions

In general the probability of correct identification (PCI) for the case of hard constraints is largerthan soft constraints It can be verified in Figure 4 Figure 6 and Figure 10 Therefore hard constraintsshould be used in the stage of pre-processing data for the purpose of identifying and removing possibleoutlying measurements In that process one should opt to set out the redundant hard constraints atpoints in the network where the smallest connections exist After identifying and removing possibleoutliers the soft constraints should be employed to propagate the uncertainties of the constraints(pseudobservations) to the model parameters during the process of least-squares estimation Thisrecommendation is valid for outlier detection and identification purpose

We highlight the main findings of this research as follows

bull Under a system of a high local redundancy ri gt 05 and low residuals correlation (ρwi wj lt 05) ifone increase the family-wise error rate (FWE) of the test statistic the performance of the procedurewill be improved for both scenarios of hard constraints and soft constraintsbull The larger the relaxation of the constraints the larger is the effect of the residuals correlation

(ρwi wj ) on the success rate of outlier identification (PCI) Consequently the higher the sensitivityindicator for outlier identification (MIB) and therefore more difficult it becomes to identify anoutlier

bull Under a scenario of soft constraints one should set out at least three soft constraints in order toidentify an outlier in the constraints

In other types of analysis for example deformation analysis of geodetic networks one shouldformulated the constraints as pseudo-observations with σc gt 0 in an adjustment model of system

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

36 of 40

equations In other words the relaxation of constraints makes it possible to estimate deformationeffects that are unmodelled in the deterministic model matrix for example deformations which have aspatial and temporal variations In that case the deformation can be modelled by covariance functionsthat determine the cofactors between constraints [28] This should be applied to control points locatedin the structure but not to control points located outside the structure The control points locatedoutside the structure should be hard constraints and have their stability verified by an independentprocess in order to analyse the deformation of the structure So one will be also able to analyse thesensitivity in terms of minimal detectable deformations andor minimal identifiable deformations [23]Future researches will be addressed to sensitivity analysis for deformation monitoring networks

Author Contributions Conceptualization VFR MTM and IK Data curation VFR MTM and IK Formalanalysis VFR and IK Funding acquisition MTM MRV and LGdSJ Investigation VFR MethodologyVFR and MTM Project administration MTM IK MRV and LGdSJ Software VFR and LGdSJSupervision MTM IK and MRV Validation VFR and MRV Writing ndash original draft VFR Writing ndashreview amp editing VFR MTM IK and MRV All authors have read and agreed to the published version of themanuscript

Funding This research was funded by the CNPqmdashConselho Nacional de Desenvolvimento Cientiacutefico eTecnoloacutegicomdashBrasil (proc no 1035872019-5) This research and the APC were also funded by PETROBRAS(Grant Number 201800545-0)

Acknowledgments In this section you can acknowledge any support given which is not covered by the authorcontribution or funding sections This may include administrative and technical support or donations in kind(eg materials used for experiments)

Conflicts of Interest The authors declare no conflict of interest

References

1 Lehmann R Neitzel F Testing the compatibility of constraints for parameters of a geodetic adjustmentmodel Journal of Geodesy 2013 87 555ndash566 doi101007s00190-013-0627-2

2 Fang X Weighted total least-squares with constraints a universal formula for geodetic symmetricaltransformations Journal of Geodesy 2015 89 459ndash469 doi101007s00190-015-0790-8

3 Amiri-Simkooei AR Weighted Total Least Squares with Singular Covariance Matrices Subject to Weightedand Hard Constraints J Surv Eng 2017 143 04017018

4 Sakic P Ballu V Royer JY A Multi-Observation Least-Squares Inversion for GNSS-Acoustic SeafloorPositioning Remote Sensing 2020 12 doi103390rs12030448

5 Odijk D Stochastic modelling of the ionosphere for fast GPS ambiguity resolution Geodesy Beyond 2000Schwarz KP Ed Springer Berlin Heidelberg Berlin Heidelberg 2000 pp 387ndash392

6 de Oliveira PS Morel L Fund F Legros R Monico JFG Durand S Durand F Modelingtropospheric wet delays with dense and sparse network configurations for PPP-RTK GPS Solutions 201721 237ndash250 doi101007s10291-016-0518-0

7 Chen B Wu L Dai W Luo X Xu Y A new parameterized approach for ionospheric tomography GPSSolutions 2019 23 96 doi101007s10291-019-0893-4

8 Ahmed E Ahmed S Mohamed D Mostafa R jag 2019 Vol 13 chapter Vertical ionospheric delayestimation for single-receiver operation p 81 2 doi101515jag-2018-0041

9 Wang A Chen J Zhang Y Meng L Wang J Performance of Selected Ionospheric Models inMulti-Global Navigation Satellite System Single-Frequency Positioning over China Remote Sensing 201911 doi103390rs11172070

10 Deo M El-Mowafy A Comparison of advanced troposphere models for aiding reductionof PPP convergence time in Australia Journal of Spatial Science 2019 64 381ndash403doi1010801449859620181472046

11 Dai X Lou Y Dai Z Hu C Peng Y Qiao J Shi C Precise Orbit Determination for GNSS ManeuveringSatellite with the Constraint of a Predicted Clock Remote Sensing 2019 11 doi103390rs11161949

12 Courant R Hilbert D Methods of mathematical physics Vol 2 Vol 2 Wiley [New York] 196213 Grafarend EW Optimization of Geodetic Networks The Canadian Surveyor 1974 28 716ndash723

doi101139tcs-1974-0120

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

37 of 40

14 Teunissen P Zero Order Design Generalized Inverses Adjustment the Datum Problem andS-Transformations Optimization and Design of Geodetic Networks Grafarend EW Sansograve F EdsSpringer Berlin Heidelberg Berlin Heidelberg 1985 pp 11ndash55

15 Dermanis A Free network solutions with the DLT method ISPRS Journal of Photogrammetry and RemoteSensing 1994 49 2 ndash 12 doihttpsdoiorg1010160924-2716(94)90061-2

16 Kotsakis C Reference frame stability and nonlinear distortion in minimum-constrained networkadjustment Journal of Geodesy 2012 86 755ndash774 doi101007s00190-012-0555-6

17 Kotsakis C Datum Definition and Minimal Constraints In Encyclopedia of Geodesy Grafarend E EdSpringer International Publishing Cham 2018 pp 1ndash6 doi101007978-3-319-02370-0_157-1

18 Matsuoka MT Rofatto VF Klein I Roberto Veronez M da Silveira LG Neto JBS Alves ACRControl Points Selection Based on Maximum External Reliability for Designing Geodetic Networks AppliedSciences 2020 10 doi103390app10020687

19 Baarda W S-transformations and criterion matrices Publ on geodesy New Series 1973 520 Schaffrin B Aspects of Network Design Optimization and Design of Geodetic Networks Grafarend

EW Sansograve F Eds Springer Berlin Heidelberg Berlin Heidelberg 1985 pp 548ndash59721 Xu PB A general solution in geodetic nonlinear rank-defect models 199722 Altamimi Z Dermanis A The Choice of Reference System in ITRF Formulation VII Hotine-Marussi

Symposium on Mathematical Geodesy Sneeuw N Novaacutek P Crespi M Sansograve F Eds Springer BerlinHeidelberg Berlin Heidelberg 2012 pp 329ndash334

23 Velsink H On the deformation analysis of point fields J Geod 2015 89 1071ndash1087doi101007s00190-015-0835-z

24 Velsink H Extendable linearised adjustment model for deformation analysis Survey Review 201547 397ndash410 doi1011791752270614Y0000000140

25 Hiddo V jag 2016 Vol 10 chapter Time Series Analysis of 3D Coordinates Using NonstochasticObservations p 5 1 doi101515jag-2015-0027

26 Rao CR Markoffrsquos Theorem with Linear Restrictions on Parameters SankhyAuml The Indian Journal ofStatistics (1933-1960) 1945 7 16ndash19

27 Velsink H Testing Methods for Adjustment Models with Constraints Journal of Surveying Engineering2018 144 04018009 doi101061(ASCE)SU1943-54280000260

28 Hiddo V Testing deformation hypotheses by constraints on a time series of geodetic observations J ApplGeod 2017 12 77ndash93 doi101515jag-2017-0028

29 Baarda W A testing procedure for use in geodetic networks Publ on geodesy New Series 1968 230 Teunissen P First and second moments of non-linear least-squares estimators Bull Geodesique (Journal of

Geodesy) 1989 63 253ndash262 doihttpsdoiorg101007BF0252047531 Teunissen P Testing Theory an introduction 2 ed Delft University Press 200632 Amiri-Simkooei A Least-squares variance component estimation theory and GPS applications Phd

thesis Delft University of Technology 200733 Koch KR Parameter estimation and hypothesis testing in linear models 2 ed Springer 199934 Tao H Feng H Xu L Miao M Long H Yue J Li Z Yang G Yang X Fan L Estimation

of Crop Growth Parameters Using UAV-Based Hyperspectral Remote Sensing Data Sensors 2020 20doi103390s20051296

35 Han M Wang Q Wen Y He M He X The Application of Robust Least Squares Method inFrequency Lock Loop Fusion for Global Navigation Satellite System Receivers Sensors 2020 20doi103390s20041224

36 Wei C Estimation for the Discretely Observed CoxacircldquoIngersollacircldquoRoss Model Driven by Small SymmetricalStable Noises Symmetry 2020 12 doi103390sym12030327

37 Sakic P Ballu V Royer JY A Multi-Observation Least-Squares Inversion for GNSS-Acoustic SeafloorPositioning Remote Sensing 2020 12 doi103390rs12030448

38 Livadiotis G General Fitting Methods Based on Lq Norms and their Optimization Stats 2020 3 16ndash31doi103390stats3010002

39 Farooq SZ Yang D Ada ENJ A Cycle Slip Detection Framework for Reliable Single Frequency RTKPositioning Sensors 2020 20 doi103390s20010304

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

38 of 40

40 Araveeporn A Comparing Parameter Estimation of Random Coefficient Autoregressive Model byFrequentist Method Mathematics 2020 8 doi103390math8010062

41 Zhang C Peng T Zhou J Ji J Wang X An Improved Autoencoder and Partial Least SquaresRegression-Based Extreme Learning Machine Model for Pump Turbine Characteristics Applied Sciences2019 9 doi103390app9193987

42 Ji J Yang M Jiang L He J Teng Z Liu Y Song H Output-Only Parameters Identification ofEarthquake-Excited Building Structures with Least Squares and Input Modification Process AppliedSciences 2019 9 doi103390app9040696

43 Kargoll B On the theory and application of model misspecification tests in geodesy Doctoral thesisUniversity of Bonn Landwirtschaftliche Fakultaumlt German Bonn 2007

44 Lehmann R On the formulation of the alternative hypothesis for geodetic outlier detection J Geod 201387 373ndash386 doi101007s00190-012-0607-y

45 Rofatto VF Matsuoka MT Klein I Roberto Veronez M da Silveira LG A Monte Carlo-Based OutlierDiagnosis Method for Sensitivity Analysis Remote Sensing 2020 12 doi103390rs12050860

46 Lehmann R Improved critical values for extreme normalized and studentized residuals in GaussndashMarkovmodels J Geod 2012 86 1137ndash1146 doi101007s00190-012-0569-0

47 Rofatto VF Matsuoka MT Klein I Veronez MR Bonimani ML Lehmann R A half-century ofBaardarsquos concept of reliability a review new perspectives and applications Surv Rev 2018 0 1ndash17[httpsdoiorg1010800039626520181548118] doi1010800039626520181548118

48 Zaminpardaz S Teunissen P DIA-datasnooping and identifiability J Geod 2019 93 85ndash101doi101007s00190-018-1141-3

49 Kok JJ States U On data snooping and multiple outlier testing [microform] Johan J Kok US Dept ofCommerce National Oceanic and Atmospheric Administration National Ocean Service Charting andGeodetic Services For sale by the National Geodetic Information Center NOAA Rockville Md 1984 p61 p

50 Knight NL Wang J Rizos C Generalised measures of reliability for multiple outliers Journal of Geodesy2010 84 625ndash635 doi101007s00190-010-0392-4

51 Gui Q Li X Gong Y Li B Li G A Bayesian unmasking method for locating multiplegross errors based on posterior probabilities of classification variables J Geod 2011 85 191ndash203doi101007s00190-010-0429-8

52 Klein I Matsuoka MT Guzatto MP Nievinski FG An approach to identify multipleoutliers based on sequential likelihood ratio tests Surv Rev 2017 49 449ndash457[httpsdoiorg1010800039626520161212970] doi1010800039626520161212970

53 Hawkins DM Identification of Outliers 1 ed Springer Netherlands 1980 httpsdoiorg101007978-94-015-3994-4

54 Baarda W Statistical concepts in geodesy Publ on geodesy New Series 1967 255 Imparato D Teunissen P Tiberius C Minimal Detectable and Identifiable Biases for quality

control Surv Rev 2019 51 289ndash299 [httpsdoiorg1010800039626520181437947]doi1010800039626520181437947

56 Teunissen PJG Distributional theory for the DIA method Journal of Geodesy 2018 92 59ndash80doi101007s00190-017-1045-7

57 Foumlrstner W Reliability and discernability of extended Gauss-Markov models Seminar on mathematicalmodels of GeodeticPhotogrammetric Point Determination with Regard to Outliers and Systematic Errors 1983 Vol Series A pp 79ndash103

58 Wang J Knight NL New Outlier Separability Test and Its Application in GNSS Positioning 201259 Lehmann R Loumlsler M Multiple Outlier Detection Hypothesis Tests versus Model Selection by

Information Criteria J Surv Eng 2016 142 0401601760 Teunissen PJG Imparato D Tiberius CCJM Does RAIM with Correct Exclusion Produce Unbiased

Positions Sensors 2017 17 doi103390s1707150861 Lehmann R Loumlsler M Congruence analysis of geodetic networks ndash hypothesis tests versus model

selection by information criteria J Appl Geod 2017 11 271ndash283 doi101515jag-2016-0049

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

39 of 40

62 Rofatto V Matsuoka M Klein I DESIGN OF GEODETIC NETWORKS BASED ON OUTLIERIDENTIFICATION CRITERIA AN EXAMPLE APPLIED TO THE LEVELING NETWORK Bull Geod Sci2018 24 152ndash170 httpdxdoiorg101590s1982-21702018000200011

63 Nguyen VK Renault Milocco R Environment Monitoring for Anomaly Detection System UsingSmartphones Sensors 2019 19 doi103390s19183834

64 Nie Y Yang L Shen Y Specific Direction-Based Outlier Detection Approach for GNSS Vector NetworksSensors 2019 19 doi103390s19081836

65 Klein I Matsuoka MT Guzatto MP Nievinski FG Veronez MR Rofatto VF A newrelationship between the quality criteria for geodetic networks Journal of Geodesy 2019 93 529ndash544doi101007s00190-018-1181-8

66 Mierlo JV Statistical Analysis of Geodetic Measurements for the Investigation of Crustal Movements InRecent Crustal Movements 1977 Whitten C Green R Meade B Eds Elsevier 1979 Vol 13 Developmentsin Geotectonics pp 457 ndash 467 doihttpsdoiorg101016B978-0-444-41783-150072-6

67 van der Marel H Roumlsters AJM Statistical Testing and Quality Analysis in 3-D Networks (part II)Application to GPS Global Positioning System An Overview Bock Y Leppard N Eds Springer NewYork New York NY 1990 pp 290ndash297

68 Yang L Wang J Knight NL Shen Y Outlier separability analysis with a multiple alternative hypothesestest J Geod 2013 87 591ndash604 doi101007s00190-013-0629-0

69 Proacuteszynski W Revisiting Baardarsquos concept of minimal detectable bias with regard to outlier identifiabilityJ Geod 2015 89 993ndash1003 doi101007s00190-015-0828-y

70 Romano JP Wolf M Multiple Testing of One-Sided Hypotheses Combining Bonferroni and the BootstrapPredictive Econometrics and Big Data Kreinovich V Sriboonchitta S Chakpitak N Eds SpringerInternational Publishing Cham 2018 pp 78ndash94

71 Teunissen P An Integrity and Quality Control Procedure for use in Multi Sensor Integration 199072 Aydin C Demirel H Computation of Baardarsquos lower bound of the non-centrality parameter J Geod

2004 78 437ndash441 doi101007s00190-004-0406-173 Aydin C Power of Global Test in Deformation Analysis J Surv Eng 2012 138 51ndash56 https

doiorg101061(ASCE)SU1943-5428000006474 Bonferroni C Teoria statistica delle classi e calcolo delle probabilitagrave Pubblicazioni del R Istituto superiore di

scienze economiche e commerciali di Firenze Libreria internazionale Seeber 193675 Box GEP Muller ME A Note on the Generation of Random Normal Deviates The Annals of Mathematical

Statistics 1958 29 610ndash611 httpsdoi101214aoms117770664576 Lemeshko BY Lemeshko SB Extending the Application of Grubbs-Type Tests in Rejecting Anomalous

Measurements Measurement Techniques 2005 48 536ndash547 doi101007s11018-005-0179-977 Lehmann R Scheffler T Monte Carlo based data snooping with application to a geodetic network J

Appl Geod 2011 5 123ndash134 httpsdoiorg101515JAG201101478 Matsumoto M Nishimura T Mersenne twister A 623-dimensionally equidistributed uniform

pseudo-random number generator ACM Transactions on Modeling and Computer Simulation 1998 8 3ndash30httpsdlacmorgcitationcfmid=272991

79 Ihde J Saacutenchez L Barzaghi R Drewes H Foerste C Gruber T Liebsch G Marti U Pail R SiderisM Definition and Proposed Realization of the International Height Reference System (IHRS) Surveys inGeophysics 2017 38 549ndash570 doi101007s10712-017-9409-3

80 Lehmann R Observation error model selection by information criteria vs normality testing StudiaGeophysica et Geodaetica 2015 59 489ndash504 doi101007s11200-015-0725-0

81 Algarni DA Ali AE Heighting and Distance Accuracy with Electronic Digital Levels Journal of King SaudUniversity - Engineering Sciences 1998 10 229 ndash 239 doihttpsdoiorg101016S1018-3639(18)30698-6

82 Gemin ARS Matos AS Faggion PLAs APPLICATION OF CALIBRATION CERTIFICATE OFDIGITAL LEVELING SYSTEMS IN THE MONITORING OF STRUCTURES A CASE STUDY AT THEGOVERNADOR JOSAtilde RICHA HYDROELECTRIC POWER PLANT - PR Boletim de CiAtildeGeodAtilde 201824 235 ndash 249

83 Takalo M Rouhiainen P Development of a System Calibration Comparator for Digital Levels in FinlandNordic Journal of Surveying and Real Estate Research 1 1

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

40 of 40

84 Wiedemann W Wagner A Wunderlich T Using IATS to Read and Analyze Digital Levelling Staffs SIG2016 Paar R Marendic A Zrinjski M Eds Croatian Geodetic Society 2016 pp 515ndash526

85 PJG T The Geometry of Geodetic Inverse Linear Mapping and Non-linear Adjustment Publications onGeodesy New Series 1985 8

86 Rao CR Mitra SK Generalized inverse of a matrix and its applications 197287 Hekimoglu S Erenoglu RC Sanli DU Erdogan B Detecting Configuration Weaknesses in Geodetic

Networks Survey Review 2011 43 713ndash730 doi101179003962611X13117748892632

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

Page 9: Effects of Hard and Soft Equality Constraints on

9 of 40

also noted that detection does not depend on identification However outlier identification dependson correct outlier detection Therefore we have the following inequality

PCI le PCD (28)

Note that the probability of correct identification PCI depends on the probability of misseddetection PMD and wrong identification PWI for the case in which data snooping is run only once iea single round of estimation and testing However in this paper we deal with data snooping in itsiterative form (ie IDS) and therefore the probability of correct identification PCI depends on otherdecision rules

In contrast to the data snooping single run the success rate of correct detection PCD forIDS depends on the sum of the probabilities of correct identification PCI wrong exclusion (PWE)over-identification cases (Pover+ and Poverminus) and statistical overlap (Pol) ie

PCD = 1minusPMD = PCI + PWE + Pover+ + Poverminus + Pol (29)

It is important to mention that the probability of correct detection is the complement of theprobability of missed detection Note from Equation (29) that the probability of correct detection PCDis available even for cases in which the identification rate is null PCI = 0 However the probabilityof correct identification (PCI) necessarily requires that the probability of correct detection PCD begreater than zero For the same reasons given for the data snooping single run in the previous sectiondetecting is easier than identifying In that case we have the following relationship for the success rateof correct outlier identification PCI

PCI = PCD minus (PWE + Pover+ + Poverminus + Pol) (30)

such asexist(PCI) isin [0 1] lArrrArr (PCD) gt 0 (31)

It is important to mention that the wrong exclusion PWE describes the probability of identifyingand removing a non-outlying measurement while the lsquotruersquo outlier remains in the dataset In otherwords PWE is the Type III decision error for IDS) The overall wrong exclusion PWE is the result of thesum of each individual contribution to PWE ie

PWE =nminus1

sumi=1PWE(i) (32)

On the basis of the probability levels of correct detection PCD and correct identification PCI the sensitivity indicators of minimal biasesmdashMinimal Detectable Bias (MDB) and Minimal IdentifiableBias (MIB)mdashfor a given αprime can be computed as follows

MDB = arg minnablaiPCD(nablai) gt PCD foralli = 1 middot middot middot n (33)

MIB = arg minnablaiPCI(nablai) gt PCI foralli = 1 middot middot middot n (34)

Equation (33) gives the smallest outlier nablai that leads to its detection for a user-defined correctdetection rate PCD whereas (34) provides the smallest outlier nablai that leads to its identification for auser-defined correct identification rate PCI

As a consequence of the inequality in (28) the MIB will be larger than MDB ie MIB ge MDBFor the special case of having only one single alternative hypothesis there is no difference between theMDB and MIB [55] The computation of MDB0 is easily performed by Equations (14) or (16) whereasthe computation of the MDB in Equation (33) and the MIB in Equation (34) must be computed using

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

10 of 40

Monte Carlo because the acceptance region (as well as the critical region) for the case of multiplealternative hypotheses is analytically intractable

3 Material and Methods

Here we use the procedure provided by Rofatto et al [45] to compute the probability levelsassociated with IDS as well as to estimate the minimal biases mdash Minimal Detectable Bias(MDB) andMinimal Identifiable Bias (MIB) The procedure is summarised in Figure 1

The procedure to compute the critical value of max-w (k) is given step-by-step as follows

1 Specify the probability density function (pdf) of the w-test statistics The pdf assigned to thew-test statistics under anH0-distribution is

(w1 w2 w3 middot middot middot wn)T sim N (0Rw) (35)

where Rw isin Rntimesn is the correlation matrix with the main diagonal elements equal to 1 and theoff-diagonal elements are the correlation between the w-test statistics computed by Equation (20)

2 In order to have w-test statistics under H0 uniformly distributed random number sequencesare produced by the Mersenne Twister algorithm and then they are transformed into a normaldistribution by using the BoxndashMuller transformation [75] BoxndashMuller has already been usedin geodesy for Monte Carlo experiments [467677] Therefore a sequence of m random vectorsfrom the pdf assigned to the w-test statistics is generated according to Equation (35) In that casewe have a sequence of m vectors of the w-test statistics as follows[

(w1 w2 w3 middot middot middot wn)T(1)

(w1 w2 w3 middot middot middot wn)T(2)

middot middot middot (w1 w2 w3 middot middot middot wn)T(m)

](36)

3 Compute the test statistic by Equation (10) for each sequence of w-test statistics Thus we have(max

iisin1middotmiddotmiddot n|wi|(1) max

iisin1middotmiddotmiddot n|wi|(2) middot middot middot max

iisin1middotmiddotmiddot n|wi|(m)

)(37)

4 Sort in ascending order the maximum test statistic in Equation (37) getting a sorted vector wsuch that

w(1) lt w(2) w(3) middot middot middot lt w(m) (38)

The sorted values w in Equation (38) provide a discrete representation of the cumulative densityfunction (cdf) of the maximum test statistic max-w

5 Determine the critical value k as follows

k = w[(1minusαprime)timesm] (39)

where [] denotes rounding down to the next integer that indicates the position of the selectedelements in the ascending order of w This position corresponds to a critical value for a stipulatedoverall false alarm probability αprime This can be done for a sequence of values αprime in parallel

It is important to mention that the probability of a Type I decision error for multiple testing αprime islarger than that of Type I for single testing α0 This is because the critical region in multiple testing islarger than that in single hypothesis testing

After finding the critical value k the procedure based on Monte Carlo is also applied to computethe probability levels of IDS when there is an outlier in the dataset The overview of of the mainelements involved with the IDS was detailed in Section 2 The flowchart is displayed in Figure 1

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

11 of 40

Inp

ut

Functional amp Stochastic Models Significance level (αrsquo) Outlier magnitude range (nablai) Probability of correct detection ( ) Probability of correct identification ( )

CD CI

Gen

erat

e

ran

do

m d

raw

s

Generate a sample of random errors

119942~119925(120782 Q119942)

Compute the total error ε = e + cinablai

Compute the least-squares residual vector e = Rε

Iterativ

e ou

tlier elimin

ation

pro

cess

Compute the maximum w-test statistic

max wi

max wi gt k

End

Identify and remove the measurement from the dataset

Size of max wi gt 1

det ATWA gt 0

Compute w-test statistics

Compute and store nCI nMD nWE nover+ nover- and nol

Co

mp

ute th

e pro

bab

ilities

Number of Monte Carlo experiments (m) = 200000

Compute the probabilities CI MD WE over+ over- and ol

Sensitivity indicators Outlier detection (MDB)

amp Outlier identification (MIB)

YES NO

YES

NO

YES NO

YES

NO

Figure 1 Flowchart of the algorithm to compute the probability levels of Iterative Data Snooping (IDS)for each measurement in the presence of an outlier [45]

The steps displayed in Figure 1 are detailed as follows

1 First random error vectors are synthetically generated on the basis of a multivariate normaldistribution because the assumed stochastic model for random errors is based on the matrixcovariance of the observations Here we use the Mersenne Twister algorithm [78] to generate asequence of random numbers and BoxndashMuller [75] to transform it into a normal distribution

2 The total error (ε) is a combination of random errors and its corresponding outlier is given asfollows

ε = e + cinablai (40)

The magnitude intervals of simulated outliers are user-defined The magnitude intervals arebased on the standard deviation of the observation (σ) eg |3σ| to |6σ| Since the outlier can bepositive or negative the proposed algorithm randomly selects the signal of the outlier (for q = 1)Here we use the discrete uniform distribution to select the signal of the outlier

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

12 of 40

In Equation (40) e is the random error generated from the normal distribution according toEquation (2) and the second part cinablai is the additional parameter that describes the alternativemodel according to Equation (40)

3 Next we compute the least-squares residuals vector according to Equation (3) but now we usethe total error (ε) in Equation 40 as follows

e = Rε (41)

4 Run the IDS In this step the test statistic is computed according to (9) Then the maximumtest statistic value is obtained according to Equation (10) Now the decision rule is based on thecritical value k computed by Monte Carlo in previous stage After identifying the measurementsuspected to be the most likely outlier it is excluded from the model and least-squares estimationand data snooping are applied iteratively until there are no further outliers identified in thedataset If two or more observations are simultaneously detected (ie if maxminusw gt k and the sizeof maxminusw gt 1 then the IDS is ended) Furthermore every time that a measurement suspectedto be the most likely outlier is removed from the model we check whether the normal matrixATW A is invertible or not If the determinant of ATW A is 0 det|ATW A| = 0 then there isa necessary and sufficient condition for a square matrix ATW A to be non-invertible In otherwords the IDS is ended when det|ATW A| = 0 If no outlier is detected (ie maxminusw lt k) thenthe IDS is also ended

The IDS procedure is performed for m experiments of random error vectors for each experimentcontaminated by an outlier in the ith measurement Therefore for each measurementcontaminated by an outlier there are υ = 1 m experiments

5 After running m=200000 experiments [47] the probabilities associated with IDS are computed(note m refers to the total number of Monte Carlo experiments) The probability of correctidentification (PCI) is the ratio between the number of times that the outlier is correctly identified(denoted as nCI) and m experiments ie

PCI =nCIm

(42)

Similar to Equation (42) the wrong decisions are computed as

PMD =nMD

m(43)

where nMD is the number of experiments in which IDS does not detect the outlier (PMDcorresponds to the rate of missed detection)

PWE =nWE

m(44)

where nWE is the number of experiments in which the IDS procedure flags and removes onlyone single non-outlying measurement while the lsquotruersquo outlier remains in the dataset (PWE is thewrong exclusion rate)

Pover+ =nover+

m(45)

where nover+ is the number of experiments in which IDS correctly identifies and removes theoutlying measurement and others and Pover+ corresponds to its probability

Poverminus =noverminus

m(46)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

13 of 40

where noverminus represents the number of experiments in which IDS identifies and removes morethan one non-outlying measurement whereas the lsquotrue outlierrsquo remains in the dataset (Poverminus isthe probability corresponding to this error probability class)

Pol =nolm

(47)

where nol is the number of experiments in which the detector in Equation (10) flags two (or more)measurements simultaneously during a given iteration of IDS Here this is referred to as thenumber of statistical overlap nol and Pol corresponds to its probability

6 Finally the sensitivity indicators (MDB and MIB) are computed based on Equation 33 andEquation 34 respectively

In this paper the probability levels associated with IDS were computed for each observationindividually and for each outlier magnitude However they were grouped into clusters based onnumber of local redundancy (ri) and maximum absolute correlation between the w-test statistics(ρwi wj ) Furthermore we take care to control the family-wise error rate

4 Case study of Levelling Geodetic Network

The analyse of the constraints effects on the probability levels of IDS are performed by an exampleof a levelling geodetic network

A levelling geodetic network is a set of points located on the Earthrsquos surface or near it andinterconnected by height difference measurements Some of these points are associated with a verticalheight reference system (ie some points have known height) whereas others are parameters (unknownheights) The term known here means that those points with known heights are constraints necessaryto ensure that parameters (unknown heights) are sufficiently estimable In the sense of modern geodeticreference systems realisation and unification of a vertical height system consists of a combination ofGNSS (Global Navigation Satellite System) and spirit levelling with geoid models We will not gointo the details about height geodetic reference system and frame (Readers who wish to have furtherdetails on that issue please see eg [79])

The mathematical model associated with a geodetic levelling network is linear (ie the levellingmeasurements bear a known linear relationship with the unknown heights) Geodetic networksusually has more measurements than parameters (ie n gt u) ie we have a redundant measurementsystem However due to intrinsic random errors in a system redundant measurements often leadto an inconsistent system of equations To make the system consistent we have to introduce theinformation about the random measurement errors The stochastical properties of the measurementerrors are directly associated with the assumption of the probability distribution of these errors Ingeodesy and many other scientific branches the well-known normal distribution is one of the mostused as measurement error model [80] Because of this the model ceases to be purely mathematicaland becomes a statistical model with functional and stochastic part

In the absence of outliers ie under null hypothesisH0 in Equation (1) the model for levellinggeodetic network can be written as follows

∆himinusj + e∆himinusj= hj minus hi (48)

where ∆himinusj is the height difference measured from point i to j and e∆himinusjis the random error associated

with the levelling measurement For a only one single levelling line one of these points is the constraint(known height) from which the height of another point is determined The point with known height isalso referred to as control point or benchmark In a geodetic network on the other hand we have a setof levelling lines that connect both from points of unknown height and from points of known height(constraints) Under normal working conditions (ie H0) the measurement errors model is then givenby Equation (2)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

14 of 40

In this case study we demonstrate the application of the proposed algorithm by [45] subject to adifferent constraints scenarios for that levelling geodetic network

Recent work has been focused on non-stochastic constraints (hard constraints) Here we alsoconsider cases where constraints are subject to a random errors (ie soft constraints) That softconstraints are non-deterministic and therefore they are measured model elements that can bedesignated as a priori information The approach of this example can be applied in the designstage of geodetic network analysis [6265] Furthermore the experiments performed here can also beextended to geodetic deformation analysis when the deformation effects are unmodelled in the senseof deterministic form [28]

41 Problem description

To analyse the constraints effects on the IDS an example is taken from a geodetic levellingnetwork with 12 height differences between the points The equipment used to measure the leveldifference can be an electronic digital level In that case the levelling measurement system comprisesof a special bar-coded staff (also called barcode rod) and a digital level (instrument) A digital level isbasically a telescope that enables a horizontal line of sight Digital levels consist of additional electronicimage processing components to automatically read and analyse digital (bar coded) levelling staffswhere the graduation is replaced by a manufacturer dependent code pattern Generally the result isautomatically stored in the data collector of the digital level An example of a digital level ndash bar-codestaff system is displayed in the Figure 2 For more details about digital level see eg [81ndash84]

Figure 2 Example of a digital level ndash bar-code staff system [45]

The standard-deviation of the uncorrelated measurements were the same and taken equal toσ = 1mm The points are indicated as A to G The eight network configuration are displayed in Figure3(a b c d and e) and their details are given as follows

1 Figure 3(a) Network with 1 hard constraint (ie network minimally constrained) Since thedimension of the network is 1D the minimum information necessary to estimate the unknownheights is one The height of G was fixed as control point (hard constraint) and 6 unknown heights(ABCDEF) were minimally constrained Therefore the redundancy of the system (or overalldegrees of freedom) was r = nminus rank(A) = nminus u = 12minus 6 = 6

2 Figure 3(b) Network with one extra hard constraint (ie two hard constraints) The heights A andD were taken as hard constraints (ie heights A and D were fixed) The redundancy of the system

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

15 of 40

in that case was r = 12minus 5 = 7 with 5 unknown heights (BCEFG) over-constrained

3 Figure 3(c) Network with two extra hard constraints (ie three hard constraints)The heights A Dand G were taken as hard constraints In that case the redundancy of the system was r = 12minus 4 = 8

4 Figure 3(d) Network with two soft constraints (A and D) In that case a standard-deviation largerthan zero was assigned to both constraints (ie σc gt 0 In other words A and D were processedas being both observations and unknown parameters ie A and D were pseudo-observationsHere the both constraints were simultaneously relaxed by considering their uncertainties 10 timesworse than measurements (ie σc = 10times σ = 10mm) 10 times better than measurements (ieσc = 01mm) and their uncertainties equal to the measurements (σc = 1mm) In that case theredundancy of the system was r = 14minus 7 = 7

5 Figure 3(e) Network processed with A D and G as pseudo-observations Those three constraintswere simultaneously relaxed by considering their standard-deviations equal to σc = 10mm (10 timesworse than measurements) σc = 01mm (10 times better than measurements) and σc = 1mm (thesame as the measurements) In that case the redundancy of the system was r = 15minus 7 = 8

y1

y2

y3

y4

y5

y6

y7 y8

y9 y10

B C

D

E F

G y12 y11

A

y1

y2

y3

y4

y5

y6

y7 y8

y9 y10

A

B C

D

E F

G y12 y11

y1

y2

y3

y4

y5

y6

y7 y8

y9 y10

A

B C

D

E F

G y12 y11

(a) (b)

(c) (d)

(e)

Unknown Heights Soft Constraints Hard Constraints

y1

y2

y3

y4

y5

y6

y7 y8

y9 y10

A

B C

D

E F

G y12 y11

y1

y2

y3

y4

y5

y6

y7 y8

y9 y10

A

B C

D

E F

G y12 y11

Figure 3 Levelling Geodetic Network subject to different constraint scenarios

The following system of equations for that problem is given by

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

16 of 40

y1 + e1 = hB minus hA

y2 + e2 = hC minus hB

y3 + e3 = hD minus hC

y7 + e7 = hB minus hG

y8 + e8 = hC minus hG

y11 + e11 = hB minus hF

y12 + e12 = hC minus hE

(49)

The functional model (A) for the system of equations in 49 is given by

A =

minus1 1 0 0 0 0 00 minus1 1 0 0 0 00 0 minus1 1 0 0 00 0 0 minus1 1 0 00 0 0 0 minus1 1 01 0 0 0 0 minus1 00 1 0 0 0 0 minus10 0 1 0 0 0 minus10 0 0 0 1 0 minus10 0 0 0 0 1 minus10 1 0 0 0 minus1 00 0 1 0 minus1 0 0

(50)

Note that the rank defect of the matrix A is uminus rank(A) = 7minus 6 = 1 In that case there is neededto add at least one constraint in order to avoid rank deficiency of the matrix A This is guaranteedwhen we take one height as known For example from the network in Figure 3(a) we have added theheight G as known (ie as a hard constraint) In that case the constraint equation should be addedinto the system in 49 ie

y13 = hG with σy13 = 0 (51)

noticing that because the standard deviation is zero the observation is non-stochastic (hard constraint)and the residual ey13 = 0 This can generate problems in the inversion of the covariance matrix of theobservations Qe for the calculation of the weight matrix W because the weight for that constraintwould be 1

0 = infin In order to avoid that problem we have eliminated the rank deficiency of matrixA by removing the seventh column of matrix A in 50 associated with the height G Now we haveuminus rank(A) = 6minus 6 = 0

The constraint defines the geodetic datum ie the S-system ([85]p41) Another approach tosolving the system of equations in 2 could be based on generalised (pseudo) inverses (see eg [86])

The location of the constraints can be chosen in some circumstances for example during thedesign stage of a geodetic network For the special case of having a minimally constrained systemthe location of the constraint will not influence the w-test statistics and the sensitivity indicators (MIBand MDB) [18] However more constraints than the minimum necessary to have a solution (ie extraconstraints or redundant constraints) can change the least-squares residuals and hence w-test statisticsand the minimal biases

From the network with one extra constraint (2 constraints) in Figure 3(b) for example the bothfirst (height A) and fourth column (height D) of matrix A in 50 were eliminated in the case of having

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

17 of 40

the two heights as hard constraints For the case where these two heights (A and D) were taken as softconstraints however two observation equations were added to Equation 49 ie

y13 + e13 = hA σy13 gt 0

y14 + e14 = hD σy14 gt 0(52)

In the case of soft constraints in Equation 52 two lines were added in matrix A In other wordsA and D were taken as pseudo-observations In that case the rank deficiency was also null (ieuminus rank(A) = 7minus 7 = 0) the redundancy of the system was r = nminus rank(A) = nminus u = 7 and thematrix A was given as follows

A =

minus1 1 0 0 0 0 00 minus1 1 0 0 0 00 0 minus1 1 0 0 00 0 0 minus1 1 0 00 0 0 0 minus1 1 01 0 0 0 0 minus1 00 1 0 0 0 0 minus10 0 1 0 0 0 minus10 0 0 0 1 0 minus10 0 0 0 0 1 minus10 1 0 0 0 minus1 00 0 1 0 minus1 0 01 0 0 0 0 0 00 0 0 1 0 0 0

(53)

For this example of two soft constraints and by considering the both soft constraints withstandard-deviation σc = 10mm the symmetric and positive semi-definite covariance matrix of theobservations (Qe) was given as follows

Qe =

1 0 0 middot middot middot 0 00 1 0 middot middot middot 0 00 0 1 middot middot middot 0 0

0 0 0 middot middot middot 100 00 0 0 middot middot middot 0 100

(54)

The last two rows and columns of the matrix Qe in Equation 54 refer to the variances (σc2 =

(10mm)2 = 100mm2) of the heights constraints A and D respectivelySimilarly matrices A and Qe were constructed for the other cases studied hereAlthough other measurements are able to identify an outlier for the case of having only one single

soft constraint the pseudo-observation (constraint) is not In that case the defect configurationis associated with the additional parameter in the constraint (ie the presence of an outlier inthe constraint) In other words an additional parameter on the soft constraint will not estimableFor example if the height point G was taken as a soft constraint the presence of an outlier inpseudo-observation G would lead to rank deficiency of matrix A ie u minus rank(A) = 8 minus 7 = 1Therefore the case of having only one single soft constraint was not considered here

42 Result of the Hard Constraint Effects on the Iterative Outlier Elimination Procedure

In this section we present the results of the effects of the hard constraint on the IDS The scenariosin Figure 3(a) (network minimally constrained) Figure 3(b) (two hard constraints) and Figure 3(c)(three hard constraints) were considered for the analysis

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

18 of 40

The correlation coefficient between w-test statistics (ρwi wj ) were computed for the three scenariosaccording to Equation(20) Table 1 provides that correlation for that network minimally constrainedTable 2 for that network over-constrained with two hard constraints and Table 3 for that networkover-constrained with three hard constraints

Table 1 Correlation matrix of w-test statistics for the levelling network minimally constrained in Figure3(a)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12

wy1 10 02 01 01 02 10 -02 00 00 02 -04 -01wy2 10 02 02 03 02 05 -05 -02 02 03 -03wy3 10 10 02 01 00 02 -02 00 01 04wy4 10 02 01 00 02 -02 00 01 04wy5 10 02 -02 02 05 -05 03 -03wy6 10 -02 00 00 02 -04 -01wy7 10 -03 -03 -04 -04 -01wy8 10 -04 -03 -01 -04wy9 10 -03 01 04wy10 10 04 01wy11 10 -01wy12 100

Table 2 Correlation matrix of w-test statistics for the levelling network with two hard constraints inFigure 3(b)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12

wy1 10 04 04 -03 -01 04 -03 01 01 01 -04 -01wy2 10 04 -01 01 -01 04 -04 -01 01 03 -03wy3 10 04 -01 -03 -01 03 -01 -01 01 04wy4 10 04 04 01 01 -03 01 01 04wy5 10 04 -01 01 04 -04 03 -03wy6 10 -01 -01 -01 03 -04 -01wy7 10 -04 -03 -04 -04 -01wy8 10 -04 -03 -01 -04wy9 10 -04 01 04wy10 10 04 01wy11 10 -01wy12 10

Table 3 Correlation matrix of w-test statistics for the levelling network with three hard constraints inFigure 3(c)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12

wy1 10 03 01 -01 -01 01 -04 -01 -01 -01 -03 -01wy2 10 03 -01 01 -01 03 -03 -01 01 03 -03wy3 10 01 -01 -01 01 04 01 01 01 03wy4 10 03 01 -01 -01 -04 -01 01 03wy5 10 03 -01 01 03 -03 03 -03wy6 10 01 01 01 04 -03 -01wy7 10 -01 -01 -01 -03 -01wy8 10 -01 -01 -01 -03wy9 10 -01 01 03wy10 10 03 01wy11 10 -01wy12 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

19 of 40

Table 4 gives the local redundancy (ri) the standard-deviation of the LS-estimated outlier σnablai andthe maximum absolute correlation (maxρwi wj

) for each scenario of hard constraint set out of in thisstudy ie Figure 3(ab and c)

Table 4 Local redundancy (ri) standard-deviation of the LS-estimated outlier σnablai and the maximumabsolute correlation (maxρwi wj

) for each scenario of hard constraint

1 hard constraint 2 hard constraints 3 hard constraintsMeasurement ri σnablai maxρwi wj

ri σnablai maxρwi wjri σnablai maxρwi wj

y1 0396 1589 100 0583 1309 036 0708 1188 041y2 0500 1414 047 0583 1309 036 0583 1309 032y3 0396 1589 100 0583 1309 036 0708 1188 041y4 0396 1589 100 0583 1309 036 0708 1188 041y5 0500 1414 047 0583 1309 036 0583 1309 032y6 0396 1589 100 0583 1309 036 0708 1188 041y7 0563 1333 047 0583 1309 036 0708 1188 041y8 0563 1333 047 0583 1309 036 0708 1188 041y9 0563 1333 047 0583 1309 036 0708 1188 041y10 0563 1333 047 0583 1309 036 0708 1188 041y11 0583 1309 043 0583 1309 036 0583 1309 032y12 0583 1309 043 0583 1309 036 0583 1309 032

Here the twelve levelling measurements were clustered into four clusters The clustering wasdefined according to the local redundancy (ri) and the maximum absolute correlation (maxρwi wj

) inTable 4 This is the first time that a clustering technique based on the similarity of local redundancyand the maximum absolute correlation between w-test statistics is applied to a problem of geodeticnetworks Similarly this has been done in [45] The four cluster were defined as follows

bull Cluster 1 y1 y3 y4 and y6bull Cluster 2 y2 and y5bull Cluster 3 y7 y8 y9 and y10bull Cluster 4 y11 and y12

The probability levels associated with IDS were averaged for each of these clusters Here we computethe probability levels of the IDS based on the procedure in Section (3) [45] The critical values werek = 389 k = 393 and k = 393 for 1 hard constraint 2 hard constraints and 3 hard constraintsrespectively These critical values were found for αprime = 0001 according to the procedure described inAppendix The probability of correct identification (PCI) and correct detection (PCD = 1minusPMD)are displayed in Figure 4 for each number of hard constraint (denoted by hc)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

20 of 40

(c) (d)

(e) (f)

(g) (h)

(a) (b)

Figure 4 Probability of correct identification (PCI) and Probability of correct detection (PCD) for thecase of hard constraints and for αprime = 0001 Cluster 1(ab) Cluster 2(cd) Cluster 3(ef) and Cluster4(gh)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

21 of 40

The outlier magnitude were defined from |5σ| to |9σ| The outlier of |5σ| was chosen because itis approximately the lowest MDB0(i) of the network when a single hypothesis testing is in play ascan be seen in Equation (14) or Equation (16) That MDB0(i) of |5σ| was computed for significancelevel of αprime = 0001 and a power of the test γ0 = 08 This strategy reduces the search space for a MIB(Minimal Identifiable Bias) because we will always have the following inequality MIB ge MDB0(i)[4755] Remember that the IDS procedure is an example of multiple hypothesis testing (see 2)

The sensitivity indicators MDB and MIB for IDS were also computed according to Equation (33)and (34) respectively The success rate for outlier detection and outlier identification were taken asbeing PCD = PCI = 08 respectively Table 5 provides the values of MDB and MIB for that case ofhard constraints

Table 5 MDB and MIB for the case of hard constraints based on αprime = 0001 and PCD = PCI = 08

1 hard constraint 2 hard constraints 3 hard constraintsCluster MDB (σ) MIB (σ) MDB (σ) MIB (σ) MDB (σ) MIB (σ)

1 75 - 63 63 57 572 67 68 63 64 63 643 64 64 63 63 58 584 64 64 64 64 64 64

Figure 5 shows the probability of wrong exclusion (PWE) The over-identification cases (Pover+

and Poverminus) were smaller than 0001 (ie they were practically null) There were not statistical overlap(Pol) for clusters 2 3 and 4 We will discuss more about statistical overlap (Pol) later

(c) (d)

(a) (b)

Figure 5 Probability of wrong exclusion (PWE) for the case of hard constraints and for αprime = 0001Cluster 1(a) Cluster 2(b) Cluster 3(c) and Cluster 4(d)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

22 of 40

43 Result of the Soft Constraint Effects on the Iterative Outlier Elimination Procedure

The both configuration in Figure 3(d) and Figure 3(e) were analysed in terms of soft constraintsIn that case the critical values were k = 395 k = 395 and k = 392 for two soft constraints withσc = 01mm σc = 1mm and σc = 10mm respectively In the case of three soft constraints the criticalvalues found were k = 399 k = 399 and k = 396 for σc = 01mm σc = 1mm and σc = 10mmrespectively All these critical values were computed for αprime = 0001

The correlation coefficient between w-test statistics (ρwi wj ) are displayed in Table 6 Table 7and Table 8 for two soft constraints with standard-deviation 10 times larger than measurements(ie σc = 10times σ = 10mm) 10 times better than measurements (ie σc = 01mm) and equal to themeasurements (σc = 1mm) respectively

Table 6 Correlation matrix of w-test statistics for the levelling network with two soft constraints withσc = 10mm in Figure 3(d)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14

wy1 10 02 01 01 02 10 -02 00 00 02 -04 -01 006 -01wy2 10 02 02 03 02 05 -05 -02 02 03 -03 003 00wy3 10 10 02 01 00 02 -02 00 01 04 01 -01wy4 10 02 01 00 02 -02 00 01 04 -01 01wy5 10 02 -02 02 05 -05 03 -03 00 00wy6 10 -02 00 00 02 -04 -01 -01 01wy7 10 -03 -03 -04 -04 -01 00 00wy8 10 -04 -03 -01 -04 00 00wy9 10 -03 01 04 00 00wy10 10 04 01 00 00wy11 10 -01 00 00wy12 10 00 00wy13 10 -10wy14 10

Table 7 Correlation matrix of w-test statistics for the levelling network with two soft constraints withσc = 01mm in Figure 3(d)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14

wy1 10 04 04 -03 -01 04 -03 01 01 01 -04 -01 06 -06wy2 10 04 -01 02 -01 04 -04 -01 01 03 -03 04 -04wy3 10 04 -01 -03 -01 03 -01 -01 01 04 06 -06wy4 10 04 04 01 01 -03 01 01 04 -06 06wy5 10 04 -01 01 04 -04 03 -03 -04 04wy6 10 -01 -01 -01 03 -04 -01 -06 06wy7 10 -04 -03 -04 -04 -01 -02 02wy8 10 -04 -03 -01 -04 02 -02wy9 10 -04 01 04 02 -02wy10 10 04 01 -02 02wy11 10 -01 00 00wy12 10 00 00wy13 10 -10wy14 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

23 of 40

Table 8 Correlation matrix of w-test statistics for the levelling network with two soft constraints withσc = 1mm in Figure 3(d)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14

wy1 10 03 02 -01 01 07 -03 00 01 02 -04 -01 04 -04wy2 10 03 01 03 01 04 -04 -01 01 03 -03 03 -03wy3 10 07 01 -01 00 03 -02 -01 01 04 04 -04wy4 10 03 02 01 02 -03 00 01 04 -04 04wy5 10 03 -01 01 04 -04 03 -03 -03 03wy6 10 -02 -01 00 03 -04 -01 -04 04wy7 10 -03 -03 -04 -04 -01 -01 01wy8 10 -04 -03 -01 -04 01 -01wy9 10 -03 01 04 01 -01wy10 10 04 01 -01 01wy11 10 -01 00 00wy12 10 00 00wy13 10 -10wy14 10

Table 9 gives the local redundancy (ri) the standard-deviation of the LS-estimated outlier σnablai andthe maximum absolute correlation (maxρwi wj

) for the scenarios of two constraints

Table 9 Local redundancy (ri) standard-deviation of the LS-estimated outlier σnablai (mm) and themaximum absolute correlation (maxρwi wj

) for each scenario of two soft constraints

σc = 01mm σc = 1mm σc = 10mmMeasurement ri σnablai maxρwi wj

ri σnablai maxρwi wjri σnablai maxρwi wj

y1 0581 1312 0564 0471 1457 0681 0397 1587 0994y2 0582 1311 0376 0533 1369 0423 0501 1413 0471y3 0581 1312 0564 0471 1457 0681 0397 1587 0994y4 0581 1312 0564 0471 1457 0681 0397 1587 0994y5 0582 1311 0376 0533 1369 0423 0501 1413 0471y6 0581 1312 0564 0471 1457 0681 0397 1587 0994y7 0583 1310 0359 0571 1324 0423 0563 1333 0471y8 0583 1310 0359 0571 1324 0423 0563 1333 0471y9 0583 1310 0359 0571 1324 0423 0563 1333 0471y10 0583 1310 0359 0571 1324 0423 0563 1333 0471y11 0583 1309 0358 0583 1309 0398 0583 1309 0433y12 0583 1309 0358 0583 1309 0398 0583 1309 0433y13 0007 1163 1000 0300 1826 1000 0497 14189 1000y14 0007 1163 1000 0300 1826 1000 0497 14189 1000

From Table 9 five clusters were defined for each case of two soft constraints ie for the casewhere heights A and D were given as soft constraints in Figure 3(d) as follows

bull Cluster 1 y1 y3 y4 and y6bull Cluster 2 y2 and y5bull Cluster 3 y7 y8 y9 and y10bull Cluster 4 y11 and y12bull Cluster 5 y13 and y14

The probabilities of correct identification (PCI) and correct detection (PCD) for the measurements(Cluster 1 to Cluster 4) subject to the scenarios of two soft constraints (heights A and D) are displayedin Figure 6

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

24 of 40

(c) (d)

(e) (f)

(g) (h)

(a) (b)

Figure 6 Probability of correct identification (PCI) and Probability of correct detection (PCD) for themeasurements subject to the scenarios of two soft constraints for αprime = 0001 Cluster 1(ab) Cluster2(cd) Cluster 3(ef) and Cluster 4(gh)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

25 of 40

Note that the Cluster 5 is associated with the two soft constraints (ie y13 and y14) The probabilityof correct identification (PCI) for these both soft constraints were null However the probability ofcorrect detection were not Figure 7 shows the probability of correct detection (PCD) for these two softconstraints (ie heights A and D)

Figure 7 Probability of correct detection PCD for the two soft constraints (Cluster 5 heights A and D)and for αprime = 0001

The probability of wrong exclusion PWE for the measurements (Cluster 1 to Cluster 4) subject tothe scenarios of two soft constraints (heights A and D) are displayed in Figure 8 Figure 9 gives the theprobability of wrong exclusion PWE for that two constraints (ie heights A and D)

(c) (d)

(a) (b)

Figure 8 Probability of wrong exclusion PWE for the measurements subject to the scenarios of two softconstraints for αprime = 0001 Cluster 1(a) Cluster 2(b) Cluster 3(c) and Cluster 4(d)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

26 of 40

Figure 9 Probability of wrong exclusion PWE for the two soft constraints (Cluster 5 heights A and D)and for αprime = 0001

The over-identification cases (Pover+ and Poverminus) and the statistical overlap (Pol) were practicallynull for that case

The sensitivity indicators (MDB and MIB) for each scenario of two soft constraints are displayedin Table 10

Table 10 MDB and MIB for the case of two soft constraints based on αprime = 0001 and PCD = PCI = 08

σc = 10mm σc = 1mm σc = 01mmCluster MDB (σ) MIB (σ) MDB (σ) MIB (σ) MDB (σ) MIB (σ)

1 75 25 7 71 63 632 68 68 66 66 63 633 64 64 64 64 63 634 63 63 63 63 63 635 68 - 88 - 57 -

As with the case of two soft constraints Tables 1112 and 13 show the correlations (ρwi wj ) for thecase where there were three soft constraints ie for the network configuration in Figure 3(e)

Table 11 Correlation matrix of w-test statistics for the levelling network with three soft constraintswith σc = 10mm in Figure 3(e)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14 wy15

wy1 10 02 01 01 02 10 -02 00 00 02 -04 -01 01 00 00wy2 10 02 02 03 02 05 -05 -02 02 03 -03 00 00 00wy3 10 10 02 01 00 02 -02 00 01 04 00 -01 00wy4 10 02 01 00 02 -02 00 01 04 00 01 00wy5 10 02 -02 02 05 -05 03 -03 00 00 00wy6 10 -02 00 00 02 -04 -01 -01 00 00wy7 10 -03 -03 -04 -04 -01 00 00 00wy8 10 -04 -03 -01 -04 00 00 00wy9 10 -03 01 04 00 00 00wy10 10 04 01 00 00 00wy11 10 -01 00 00 00wy12 10 00 00 00wy13 10 -05 -05wy14 10 -05wy15 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

27 of 40

Table 12 Correlation matrix of w-test statistics for the levelling network with three soft constraintswith σc = 01mm in Figure 3(e)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14 wy15

wy1 10 03 01 -01 -01 01 -04 -01 -01 -01 -03 -01 07 -01 -04wy2 10 03 -01 01 -01 03 -03 -01 01 03 -03 03 -03 00wy3 10 01 -01 -01 01 04 01 01 01 03 01 -07 04wy4 10 03 01 -01 -01 -04 -01 01 03 -01 07 -04wy5 10 03 -01 01 03 -03 03 -03 -03 03 00wy6 10 01 01 01 04 -03 -01 -07 01 04wy7 10 -01 -01 -01 -03 -01 -04 -01 04wy8 10 -01 -01 -01 -03 -01 -04 04wy9 10 -01 01 03 -01 -04 04wy10 10 03 01 -04 -01 04wy11 10 -01 00 00 00wy12 10 00 00 00wy13 10 -02 -06wy14 10 -06wy15 10

Table 13 Correlation matrix of w-test statistics for the levelling network with three soft constraintswith σc = 1mm in Figure 3(e)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14 wy15

wy1 10 03 01 00 01 06 -03 00 00 01 -04 -01 05 -02 -02wy2 10 03 01 03 01 04 -04 -01 01 03 -03 02 -02 00wy3 10 06 01 00 00 03 -01 00 01 04 02 -05 02wy4 10 03 01 00 01 -03 00 01 04 -02 05 -02wy5 10 03 -01 01 04 -04 03 -03 -02 02 00wy6 10 -01 00 00 03 -04 -01 -05 02 02wy7 10 -03 -02 -03 -04 -01 -02 00 02wy8 10 -03 -02 -01 -04 00 -02 02wy9 10 -03 01 04 00 -02 02wy10 10 04 01 -02 00 02wy11 10 -01 00 00 00wy12 10 00 00 00wy13 10 -04 -05wy14 10 -05wy15 10

Table 14 Local redundancy (ri) standard-deviation of the LS-estimated outlier σnablai (mm) and themaximum absolute correlation (maxρwi wj

) for each scenario of three soft constraints

σc = 01mm σc = 1mm σc = 10mmMeasurement ri σnablai maxρwi wj

ri σnablai maxρwi wjri σnablai maxρwi wj

y1 0702 1194 0660 0502 1411 0577 0398 1586 0992y2 0582 1311 0326 0533 1369 0412 0501 1413 0470y3 0702 1194 0660 0502 1411 0577 0398 1586 0992y4 0702 1194 0660 0502 1411 0577 0398 1586 0992y5 0582 1311 0326 0533 1369 0412 0501 1413 0470y6 0702 1194 0660 0502 1411 0577 0398 1586 0992y7 0704 1192 0415 0602 1289 0412 0563 1333 0470y8 0704 1192 0415 0602 1289 0412 0563 1333 0470y9 0704 1192 0415 0602 1289 0412 0563 1333 0470y10 0704 1192 0415 0602 1289 0412 0563 1333 0470y11 0583 1309 0326 0583 1309 0385 0583 1309 0433y12 0583 1309 0326 0583 1309 0385 0583 1309 0433y13 0012 0904 0660 0425 1534 0542 0663 12283 0501y14 0012 0904 0660 0425 1534 0542 0663 12283 0501y15 0019 0718 063 05 1414 0542 0665 12268 0501

The probabilities of correct identification (PCI) and correct detection PCD for the measurements(Cluster 1 to Cluster 4) subject to the scenarios of three soft constraints (heights A D and G) aredisplayed in Figure 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

28 of 40

(c) (d)

(e) (f)

(g) (h)

(a) (b)

Figure 10 Probability of correct identification (PCI) and Probability of correct detection (PCD) for themeasurements subject to the scenarios of three soft constraints for αprime = 0001 Cluster 1(ab) Cluster2(cd) Cluster 3(ef) and Cluster 4(gh)

The probabilities of correct identification (PCI) and correct detection (PCD) in Figure 8 werecomputed for the clusters based on Table 14 as follows

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

29 of 40

bull Cluster 1 y1 y3 y4 and y6bull Cluster 2 y2 and y5bull Cluster 3 y7 y8 y9 and y10bull Cluster 4 y11 and y12bull Cluster 5 y13 and y14bull Cluster 6 y15

Figure 11 shows the probabilities of correct identification PCI and correct detection PCD for thethree soft constraints ie for Cluster 5 (heights A and D) and Cluster 6 (height G) in Figure 3(e)

(c) (d)

(a) (b)

Figure 11 Probability of correct identification (PCI) and Probability of correct detection (PCD) for thethree constraints and for αprime = 0001 Cluster 5(ab) and Cluster 6(cd)

The probability of wrong exclusion PWE for the measurements (Cluster 1 to Cluster 4) subject tothe scenarios of three soft constraints (heights A D and G) are displayed in Figure 12 Figure 13 givesthe the probability of wrong exclusion PWE for that three constraints (ie heights A D and G)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

30 of 40

(c) (d)

(a) (b)

Figure 12 Probability of wrong exclusion PWE for the measurements subject to the scenarios of threesoft constraints and for αprime = 0001 Cluster 1(a) Cluster 2(b) Cluster 3(c) and Cluster 4(d)

(a) (b)

Figure 13 Probability of wrong exclusion (PWE) for the three constraints and for αprime = 0001 Cluster5(a) and Cluster 6(b)

The over-identification cases (Pover+ and Poverminus) and the statistical overlap (Pol) were alsopractically null for that case of three soft constraints

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

31 of 40

The sensitivity indicators (MDB and MIB) for each scenario of three soft constraints are displayedin Table 15

Table 15 MDB and MIB for the case of three soft constraints based on αprime = 0001 and PCD = PCI = 08

σc = 10mm σc = 1mm σc = 01mmCluster MDB (σ) MIB (σ) MDB (σ) MIB (σ) MDB (σ) MIB (σ)

1 75 22 68 69 58 592 68 69 66 67 64 643 64 64 63 63 58 584 63 63 63 63 63 635 59 60 74 75 435 456 59 59 69 69 346 355

5 Discussion

We start from the scenario of one hard constraint in Figure 3(a) Table 4 reveal that the maximumcorrelation between w-test statistics for the measurements constituting Cluster 1 is exactly equal to100 (ie maxρwi wj

= 100) This means that the measurements belonging to Cluster 1 are those thatare connected with unknown heights whose connections are limited to only two The both unknownheights A and D are tied only to two measurements (ie y1 and y6 linked to A and y3 and y4 linked toD) therefore if an outlier occurred in one of these measurements we would only be able to analyse theconsistency between them but we would not be able to distinguish which of them was contaminatedby an outlier This means that we would only be able to detect because the w-test statistics could belarger than a critical value k however in that case the values of w-test statistics would be the same andtherefore we would not have only one unique maximum w-test statistics but actually would have fourmaximum w-test statistics In other words the equation systems associated with the measurements ofCluster 1 are linearly dependent [87] Therefore there is no reliability in terms of outlier identificationfor the Cluster 1 as can be seen in Figure 3(a)

From Figure 3(b) we can note that there is reliability in terms of outlier detection for Cluster 1and it is caused by overlapping w-test statistics The probability of statistics overlap Pol for Cluster 1in this scenario of minimally constrained network is displayed in Figure 14

Figure 14 Probability of correct detection PCD and statistical overlap (Pol) for the Cluster 1 subject toone hard constraint and for αprime = 0001

The problem of not having more connections (ie more measurements) for the unknown heightsA and D in the case of one hard constraint with G fixed is overcome when these heights (A and D) aretaken as hard constraints in 3(b) or when the heights A D and G are hard constraints in figure 3(c)Figure 3(a b) show that the measurements of Cluster 1 are now able to identify an outlier when twohard constraints (A and D fixed) are in play It is also verified to the case of three hard constraints (A Dand G fixes) in Figure 3(e f) ie there is reliability in terms of both outlier detection and identificationfor these measurements in those conditions

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

32 of 40

One can argue that the more constraints the larger the probability of correct identification (PCI)and correct detection (PCD) This claim is not entirely correct This may be true in case where all pointson the network have the same number of connections However in the example given if both heightsB and G were selected as hard constraints the measurements of the Cluster 1 (ie y1 y6 y3 and y4)would still have their least-squares residuals fully correlated (ie ρwy1 wy6

= 100 and ρwy3 wy4= 100)

and therefore there would be no reliability in terms of outlier identification for these measurementsFrom Table 5 we observe different behaviour for the clusters as follows

bull Cluster 1 there was no MIB for the case of having only one single hard constraint whereasMDB = MIB for the other cases However both MDB and MIB decrease significantly with theincrease in the number of hard constraints

bull Cluster 2 MDB slightly smaller than MIB Both MDB and MIB were practically the same for thecase of having two or three hard constraints

bull Cluster 3 MDB = MIB for all cases of hard constraints however both MDB and MIB decreasesignificantly with the increase in the number of hard constraints

bull Cluster 4 MDB e MIB were equal for all cases

In terms of outlier detection and identification therefore Cluster 1 was more sensitive toconstraints Cluster 3 relatively sensitive to constraints whereas the Cluster 4 completely insensitiveto constraints and Cluster 2 relatively insensitive to constraints This is also can be seen in Figure 4The reason for this is that the local redundancy (ri) of the Cluster 1 increased with the increase of thenumber of hard constraints whereas the Cluster 4 have remained the same as can be seen in Table 4

Leaving aside the cases of statistical overlap (Pol) the network presents low least-squares residualscorrelation (ρwi wj lt 05) and high local redundancy (ri gt 05) Because of this the probabilities ofwrong exclusion (PWE) were less than 1 as can be seen in Figure 5 The over-identification cases(Pover+ and Poverminus) were practically null Consequently PCI asymp PCD Due of this fact the family-wiseerror rate (αprime) should be increased in order to have more success rate in the outlier detection andidentification [45]

From Figure 15 we observe that increasing the αprime increases the both probabilities of correctidentification (PCI) and correct detection (PCD) for outlier magnitude from 5σ to 6σ in the case ofthree hard constraints and from 5σ to 68σ in the case of two hard constraints Although the rates ofPover+ and PWE also increase they are not significant when compared to the improvement of correctidentification (PCI) and detection (PCD) This same analysis can be done for the other clusters

In terms of soft constraints for the cases of two constraints in Figure 3(d) we observe from Table 9that the larger the relaxation of the constraint (ie the larger the standard-deviation of the constraintσc) the larger the residuals correlation (ρwi wj ) and the standard-deviation of the outlier σnablai and thesmaller the local redundancy (ri) Consequently the probabilities of correct identification (PCI) anddetection (PCD) get smaller and smaller with the relaxation of the constraints whereas the probabilityof wrong exclusion gets larger (PWE) This can be more clearly verified in Figure 6(a b) and Figure8(a) for Cluster 1 whose measurements are connected with the constraints A and D (ie y13 and y14 inTable 9 respectively)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

33 of 40

(a) (b)

(c) (d)

Figure 15 Probabilities of correct identification PCI (a) correct detection PCD (b) over-identificationpositive Pover+ (c) and wrong exclusion PWE for the Cluster 1 subject to two and three hard constraintsand for αprime = 0001 and αprime = 01

Note from Figure 8(a) that the probability of wrong exclusion (PWE) increases as the magnitude ofthe outlier (nablai) increases However this is true up to a certain limit of outlier magnitude The effect ofresiduals correlation ρwi wj on the rates of wrong exclusion (PWE) and correct identification (PCI) tendsto decrease with the increase in the magnitude of the outlier nablai This effect is more clearly verified forCluster 1 in case where the precision of the constraints are ten times worse than the measurementsσc = 10σ = 10mm

Note from Figure 6 that identifying an outlier in the Cluster 1 (ie y1 y3 y4 and y6) whenσc = 10mm is more difficult than the other clusters This is due to the fact the Cluster 1 has a higherresiduals correlation ρwi wj = 0994 than other clusters In general therefore we observe that the largerthe relaxation of the constraints the larger is the effect of the correlation ρwi wj on the success rate ofoutlier identification (PCI) Consequently the higher the sensitivity indicator for outlier identification(MIB) Table 10 reveals that the ratio between MIB and MDB for the Cluster 1 and for the scenariowhere the standard-deviations of that two soft constraints are σc = 10mm is MIBMDB=2575=33On the other hand the relationship between MIB and MDB is practically one (ie MIBMDB=10) forthe others scenarios

If the FWE rate (αprime) were increased for the case where the two soft constraints of σc = 10mmare in play we would not have great advantages for Cluster 1 due to its high residuals correlation(ρwi wj = 994) From Figure 16 we can observe that the probability of correct identification (PCI)for outlier magnitudes from 5σ to 8σ is effectively larger to a user-defined αprime = 01 than αprime = 0001However the success rate is still less than 80 ie PCI lt 08 Note for example the correct

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

34 of 40

identification rate is PCI = 56 for an outlier magnitude of nablai = 8σ and αprime = 01 For αprime = 01 theMIB = 335σ = 335mm whereas for αprime = 0001 is MIB = 25σ = 25mm Therefore in that case theMIB for PCI = 08(80) and αprime = 01 would be 34 larger than user-defined αprime = 0001

Figure 16 Probabilities of correct identification (PCI) for the Cluster 1 subject to two soft constraints (2sc) A and D for αprime = 0001 and αprime = 01

The soft constraints A and D were grouped in the Cluster 5 (ie A and D were treated aspseudo-observations in the model) There is no reliability in terms of outlier identification for theconstraints because the residual correlation between them is ρwi wj = 100 as can be seen in Table 9for y13 and y14 However these soft constraints are able to detect an outlier In that case the probabilityof correct detection PCD in Figure 7 is mainly caused by the statistical overlap Pol as can be seen in theexample of the case of σc = 10mm in Figure 17 From Table 10 we observe that the larger the relaxationof the constraints the larger the MDB Note that the values of MDB are given in σ and therefore theMDB for σc = 10mm is larger than σc = 1mm and σc = 01mm ie we had the following inequalityMDB = 68σc = 68times 10mm = 68mm gt MDB = 88σc = 88times 1mm = 88mm gt MDB = 57σc =

57times 01mm = 57mm In that case if the FWE (αprime) were increased the rate of outlier detection by theCluster 4 (ie by the soft constraints) would increase

Figure 17 Probabilities of correct detection PCD and statistical overlap Pol for the two soft constraintsA and D (Cluster 5) with σc = 10mm and for αprime = 0001

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

35 of 40

Similar effects of the relaxation of the constraints on the performance of the IDS in case of two softconstraints are verified in case of three soft constraints as can be seen in the Figures 10 11 12 and 13

In case of having three soft constraints in Figure 3(e) there is reliability in terms of outlieridentification for the three pseudo-observations y13 y14 and y15 (ie for AD and G) as can be seen inFigure 10 and Table 15 In that case we also observe that the probabilities of correct detection PCD ofthe soft constraints A and D (ie Cluster 5) were approximately 13 for σc = 10mm 16 for σc = 1mmand 24 for σc = 01mm larger than the scenario of the network subject to two soft constraints Table15 reveals that the advantage of having three soft constraints instead of two constraints is that theconstraints become identifiable in the presence of an outlier The behaviour of the probabilities ofcorrect detection (PCD) correct identification (PCI) and wrong exclusion (PWE) was similar to the caseof the two soft constraints

Furthermore the larger the relaxation of the constraints the smaller the residuals correlationbetween the measurements and the soft constraints and the larger the residuals correlation amongthe measurements This effect can be verified in Tables 6 7 and 8 for two soft constraints as well as inTables 11 12 and 13 for three soft constraints

In general we observe that the case of two soft constraints for σc = 01mm was comparablewith two hard constraints (see eg Table 5 and Table 10) in terms of the probability levels associatedwith IDS for the measurements (ie clusters 1 2 3 and 4) In the same way for the case of twosoft constraints with σc = 1mm or σc = 10mm the probabilities levels were similar to the one hardconstraint for that measurements with the benefit of two soft constraints having reliability in terms ofoutlier identification for the Cluster 1 Finally the three soft constraints with σc = 1mm and σc = 10mmwere comparable to the two soft constraints for that scenario of constraints relaxation wheres thethree soft constraints for σc = 01mm showed similar outcomes with three hard constraints for themeasurements (see eg Table 5 and Table 15) In that case however an advantage of the three softconstraints on the three hard constraints is the possibility of analysing the sensitivity of the constraintsIt is important to emphasised that the stochastic models of the measurements and constraints wereassumed well-known and defined for the analyzes performed here

6 Conclusions

In general the probability of correct identification (PCI) for the case of hard constraints is largerthan soft constraints It can be verified in Figure 4 Figure 6 and Figure 10 Therefore hard constraintsshould be used in the stage of pre-processing data for the purpose of identifying and removing possibleoutlying measurements In that process one should opt to set out the redundant hard constraints atpoints in the network where the smallest connections exist After identifying and removing possibleoutliers the soft constraints should be employed to propagate the uncertainties of the constraints(pseudobservations) to the model parameters during the process of least-squares estimation Thisrecommendation is valid for outlier detection and identification purpose

We highlight the main findings of this research as follows

bull Under a system of a high local redundancy ri gt 05 and low residuals correlation (ρwi wj lt 05) ifone increase the family-wise error rate (FWE) of the test statistic the performance of the procedurewill be improved for both scenarios of hard constraints and soft constraintsbull The larger the relaxation of the constraints the larger is the effect of the residuals correlation

(ρwi wj ) on the success rate of outlier identification (PCI) Consequently the higher the sensitivityindicator for outlier identification (MIB) and therefore more difficult it becomes to identify anoutlier

bull Under a scenario of soft constraints one should set out at least three soft constraints in order toidentify an outlier in the constraints

In other types of analysis for example deformation analysis of geodetic networks one shouldformulated the constraints as pseudo-observations with σc gt 0 in an adjustment model of system

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

36 of 40

equations In other words the relaxation of constraints makes it possible to estimate deformationeffects that are unmodelled in the deterministic model matrix for example deformations which have aspatial and temporal variations In that case the deformation can be modelled by covariance functionsthat determine the cofactors between constraints [28] This should be applied to control points locatedin the structure but not to control points located outside the structure The control points locatedoutside the structure should be hard constraints and have their stability verified by an independentprocess in order to analyse the deformation of the structure So one will be also able to analyse thesensitivity in terms of minimal detectable deformations andor minimal identifiable deformations [23]Future researches will be addressed to sensitivity analysis for deformation monitoring networks

Author Contributions Conceptualization VFR MTM and IK Data curation VFR MTM and IK Formalanalysis VFR and IK Funding acquisition MTM MRV and LGdSJ Investigation VFR MethodologyVFR and MTM Project administration MTM IK MRV and LGdSJ Software VFR and LGdSJSupervision MTM IK and MRV Validation VFR and MRV Writing ndash original draft VFR Writing ndashreview amp editing VFR MTM IK and MRV All authors have read and agreed to the published version of themanuscript

Funding This research was funded by the CNPqmdashConselho Nacional de Desenvolvimento Cientiacutefico eTecnoloacutegicomdashBrasil (proc no 1035872019-5) This research and the APC were also funded by PETROBRAS(Grant Number 201800545-0)

Acknowledgments In this section you can acknowledge any support given which is not covered by the authorcontribution or funding sections This may include administrative and technical support or donations in kind(eg materials used for experiments)

Conflicts of Interest The authors declare no conflict of interest

References

1 Lehmann R Neitzel F Testing the compatibility of constraints for parameters of a geodetic adjustmentmodel Journal of Geodesy 2013 87 555ndash566 doi101007s00190-013-0627-2

2 Fang X Weighted total least-squares with constraints a universal formula for geodetic symmetricaltransformations Journal of Geodesy 2015 89 459ndash469 doi101007s00190-015-0790-8

3 Amiri-Simkooei AR Weighted Total Least Squares with Singular Covariance Matrices Subject to Weightedand Hard Constraints J Surv Eng 2017 143 04017018

4 Sakic P Ballu V Royer JY A Multi-Observation Least-Squares Inversion for GNSS-Acoustic SeafloorPositioning Remote Sensing 2020 12 doi103390rs12030448

5 Odijk D Stochastic modelling of the ionosphere for fast GPS ambiguity resolution Geodesy Beyond 2000Schwarz KP Ed Springer Berlin Heidelberg Berlin Heidelberg 2000 pp 387ndash392

6 de Oliveira PS Morel L Fund F Legros R Monico JFG Durand S Durand F Modelingtropospheric wet delays with dense and sparse network configurations for PPP-RTK GPS Solutions 201721 237ndash250 doi101007s10291-016-0518-0

7 Chen B Wu L Dai W Luo X Xu Y A new parameterized approach for ionospheric tomography GPSSolutions 2019 23 96 doi101007s10291-019-0893-4

8 Ahmed E Ahmed S Mohamed D Mostafa R jag 2019 Vol 13 chapter Vertical ionospheric delayestimation for single-receiver operation p 81 2 doi101515jag-2018-0041

9 Wang A Chen J Zhang Y Meng L Wang J Performance of Selected Ionospheric Models inMulti-Global Navigation Satellite System Single-Frequency Positioning over China Remote Sensing 201911 doi103390rs11172070

10 Deo M El-Mowafy A Comparison of advanced troposphere models for aiding reductionof PPP convergence time in Australia Journal of Spatial Science 2019 64 381ndash403doi1010801449859620181472046

11 Dai X Lou Y Dai Z Hu C Peng Y Qiao J Shi C Precise Orbit Determination for GNSS ManeuveringSatellite with the Constraint of a Predicted Clock Remote Sensing 2019 11 doi103390rs11161949

12 Courant R Hilbert D Methods of mathematical physics Vol 2 Vol 2 Wiley [New York] 196213 Grafarend EW Optimization of Geodetic Networks The Canadian Surveyor 1974 28 716ndash723

doi101139tcs-1974-0120

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

37 of 40

14 Teunissen P Zero Order Design Generalized Inverses Adjustment the Datum Problem andS-Transformations Optimization and Design of Geodetic Networks Grafarend EW Sansograve F EdsSpringer Berlin Heidelberg Berlin Heidelberg 1985 pp 11ndash55

15 Dermanis A Free network solutions with the DLT method ISPRS Journal of Photogrammetry and RemoteSensing 1994 49 2 ndash 12 doihttpsdoiorg1010160924-2716(94)90061-2

16 Kotsakis C Reference frame stability and nonlinear distortion in minimum-constrained networkadjustment Journal of Geodesy 2012 86 755ndash774 doi101007s00190-012-0555-6

17 Kotsakis C Datum Definition and Minimal Constraints In Encyclopedia of Geodesy Grafarend E EdSpringer International Publishing Cham 2018 pp 1ndash6 doi101007978-3-319-02370-0_157-1

18 Matsuoka MT Rofatto VF Klein I Roberto Veronez M da Silveira LG Neto JBS Alves ACRControl Points Selection Based on Maximum External Reliability for Designing Geodetic Networks AppliedSciences 2020 10 doi103390app10020687

19 Baarda W S-transformations and criterion matrices Publ on geodesy New Series 1973 520 Schaffrin B Aspects of Network Design Optimization and Design of Geodetic Networks Grafarend

EW Sansograve F Eds Springer Berlin Heidelberg Berlin Heidelberg 1985 pp 548ndash59721 Xu PB A general solution in geodetic nonlinear rank-defect models 199722 Altamimi Z Dermanis A The Choice of Reference System in ITRF Formulation VII Hotine-Marussi

Symposium on Mathematical Geodesy Sneeuw N Novaacutek P Crespi M Sansograve F Eds Springer BerlinHeidelberg Berlin Heidelberg 2012 pp 329ndash334

23 Velsink H On the deformation analysis of point fields J Geod 2015 89 1071ndash1087doi101007s00190-015-0835-z

24 Velsink H Extendable linearised adjustment model for deformation analysis Survey Review 201547 397ndash410 doi1011791752270614Y0000000140

25 Hiddo V jag 2016 Vol 10 chapter Time Series Analysis of 3D Coordinates Using NonstochasticObservations p 5 1 doi101515jag-2015-0027

26 Rao CR Markoffrsquos Theorem with Linear Restrictions on Parameters SankhyAuml The Indian Journal ofStatistics (1933-1960) 1945 7 16ndash19

27 Velsink H Testing Methods for Adjustment Models with Constraints Journal of Surveying Engineering2018 144 04018009 doi101061(ASCE)SU1943-54280000260

28 Hiddo V Testing deformation hypotheses by constraints on a time series of geodetic observations J ApplGeod 2017 12 77ndash93 doi101515jag-2017-0028

29 Baarda W A testing procedure for use in geodetic networks Publ on geodesy New Series 1968 230 Teunissen P First and second moments of non-linear least-squares estimators Bull Geodesique (Journal of

Geodesy) 1989 63 253ndash262 doihttpsdoiorg101007BF0252047531 Teunissen P Testing Theory an introduction 2 ed Delft University Press 200632 Amiri-Simkooei A Least-squares variance component estimation theory and GPS applications Phd

thesis Delft University of Technology 200733 Koch KR Parameter estimation and hypothesis testing in linear models 2 ed Springer 199934 Tao H Feng H Xu L Miao M Long H Yue J Li Z Yang G Yang X Fan L Estimation

of Crop Growth Parameters Using UAV-Based Hyperspectral Remote Sensing Data Sensors 2020 20doi103390s20051296

35 Han M Wang Q Wen Y He M He X The Application of Robust Least Squares Method inFrequency Lock Loop Fusion for Global Navigation Satellite System Receivers Sensors 2020 20doi103390s20041224

36 Wei C Estimation for the Discretely Observed CoxacircldquoIngersollacircldquoRoss Model Driven by Small SymmetricalStable Noises Symmetry 2020 12 doi103390sym12030327

37 Sakic P Ballu V Royer JY A Multi-Observation Least-Squares Inversion for GNSS-Acoustic SeafloorPositioning Remote Sensing 2020 12 doi103390rs12030448

38 Livadiotis G General Fitting Methods Based on Lq Norms and their Optimization Stats 2020 3 16ndash31doi103390stats3010002

39 Farooq SZ Yang D Ada ENJ A Cycle Slip Detection Framework for Reliable Single Frequency RTKPositioning Sensors 2020 20 doi103390s20010304

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

38 of 40

40 Araveeporn A Comparing Parameter Estimation of Random Coefficient Autoregressive Model byFrequentist Method Mathematics 2020 8 doi103390math8010062

41 Zhang C Peng T Zhou J Ji J Wang X An Improved Autoencoder and Partial Least SquaresRegression-Based Extreme Learning Machine Model for Pump Turbine Characteristics Applied Sciences2019 9 doi103390app9193987

42 Ji J Yang M Jiang L He J Teng Z Liu Y Song H Output-Only Parameters Identification ofEarthquake-Excited Building Structures with Least Squares and Input Modification Process AppliedSciences 2019 9 doi103390app9040696

43 Kargoll B On the theory and application of model misspecification tests in geodesy Doctoral thesisUniversity of Bonn Landwirtschaftliche Fakultaumlt German Bonn 2007

44 Lehmann R On the formulation of the alternative hypothesis for geodetic outlier detection J Geod 201387 373ndash386 doi101007s00190-012-0607-y

45 Rofatto VF Matsuoka MT Klein I Roberto Veronez M da Silveira LG A Monte Carlo-Based OutlierDiagnosis Method for Sensitivity Analysis Remote Sensing 2020 12 doi103390rs12050860

46 Lehmann R Improved critical values for extreme normalized and studentized residuals in GaussndashMarkovmodels J Geod 2012 86 1137ndash1146 doi101007s00190-012-0569-0

47 Rofatto VF Matsuoka MT Klein I Veronez MR Bonimani ML Lehmann R A half-century ofBaardarsquos concept of reliability a review new perspectives and applications Surv Rev 2018 0 1ndash17[httpsdoiorg1010800039626520181548118] doi1010800039626520181548118

48 Zaminpardaz S Teunissen P DIA-datasnooping and identifiability J Geod 2019 93 85ndash101doi101007s00190-018-1141-3

49 Kok JJ States U On data snooping and multiple outlier testing [microform] Johan J Kok US Dept ofCommerce National Oceanic and Atmospheric Administration National Ocean Service Charting andGeodetic Services For sale by the National Geodetic Information Center NOAA Rockville Md 1984 p61 p

50 Knight NL Wang J Rizos C Generalised measures of reliability for multiple outliers Journal of Geodesy2010 84 625ndash635 doi101007s00190-010-0392-4

51 Gui Q Li X Gong Y Li B Li G A Bayesian unmasking method for locating multiplegross errors based on posterior probabilities of classification variables J Geod 2011 85 191ndash203doi101007s00190-010-0429-8

52 Klein I Matsuoka MT Guzatto MP Nievinski FG An approach to identify multipleoutliers based on sequential likelihood ratio tests Surv Rev 2017 49 449ndash457[httpsdoiorg1010800039626520161212970] doi1010800039626520161212970

53 Hawkins DM Identification of Outliers 1 ed Springer Netherlands 1980 httpsdoiorg101007978-94-015-3994-4

54 Baarda W Statistical concepts in geodesy Publ on geodesy New Series 1967 255 Imparato D Teunissen P Tiberius C Minimal Detectable and Identifiable Biases for quality

control Surv Rev 2019 51 289ndash299 [httpsdoiorg1010800039626520181437947]doi1010800039626520181437947

56 Teunissen PJG Distributional theory for the DIA method Journal of Geodesy 2018 92 59ndash80doi101007s00190-017-1045-7

57 Foumlrstner W Reliability and discernability of extended Gauss-Markov models Seminar on mathematicalmodels of GeodeticPhotogrammetric Point Determination with Regard to Outliers and Systematic Errors 1983 Vol Series A pp 79ndash103

58 Wang J Knight NL New Outlier Separability Test and Its Application in GNSS Positioning 201259 Lehmann R Loumlsler M Multiple Outlier Detection Hypothesis Tests versus Model Selection by

Information Criteria J Surv Eng 2016 142 0401601760 Teunissen PJG Imparato D Tiberius CCJM Does RAIM with Correct Exclusion Produce Unbiased

Positions Sensors 2017 17 doi103390s1707150861 Lehmann R Loumlsler M Congruence analysis of geodetic networks ndash hypothesis tests versus model

selection by information criteria J Appl Geod 2017 11 271ndash283 doi101515jag-2016-0049

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

39 of 40

62 Rofatto V Matsuoka M Klein I DESIGN OF GEODETIC NETWORKS BASED ON OUTLIERIDENTIFICATION CRITERIA AN EXAMPLE APPLIED TO THE LEVELING NETWORK Bull Geod Sci2018 24 152ndash170 httpdxdoiorg101590s1982-21702018000200011

63 Nguyen VK Renault Milocco R Environment Monitoring for Anomaly Detection System UsingSmartphones Sensors 2019 19 doi103390s19183834

64 Nie Y Yang L Shen Y Specific Direction-Based Outlier Detection Approach for GNSS Vector NetworksSensors 2019 19 doi103390s19081836

65 Klein I Matsuoka MT Guzatto MP Nievinski FG Veronez MR Rofatto VF A newrelationship between the quality criteria for geodetic networks Journal of Geodesy 2019 93 529ndash544doi101007s00190-018-1181-8

66 Mierlo JV Statistical Analysis of Geodetic Measurements for the Investigation of Crustal Movements InRecent Crustal Movements 1977 Whitten C Green R Meade B Eds Elsevier 1979 Vol 13 Developmentsin Geotectonics pp 457 ndash 467 doihttpsdoiorg101016B978-0-444-41783-150072-6

67 van der Marel H Roumlsters AJM Statistical Testing and Quality Analysis in 3-D Networks (part II)Application to GPS Global Positioning System An Overview Bock Y Leppard N Eds Springer NewYork New York NY 1990 pp 290ndash297

68 Yang L Wang J Knight NL Shen Y Outlier separability analysis with a multiple alternative hypothesestest J Geod 2013 87 591ndash604 doi101007s00190-013-0629-0

69 Proacuteszynski W Revisiting Baardarsquos concept of minimal detectable bias with regard to outlier identifiabilityJ Geod 2015 89 993ndash1003 doi101007s00190-015-0828-y

70 Romano JP Wolf M Multiple Testing of One-Sided Hypotheses Combining Bonferroni and the BootstrapPredictive Econometrics and Big Data Kreinovich V Sriboonchitta S Chakpitak N Eds SpringerInternational Publishing Cham 2018 pp 78ndash94

71 Teunissen P An Integrity and Quality Control Procedure for use in Multi Sensor Integration 199072 Aydin C Demirel H Computation of Baardarsquos lower bound of the non-centrality parameter J Geod

2004 78 437ndash441 doi101007s00190-004-0406-173 Aydin C Power of Global Test in Deformation Analysis J Surv Eng 2012 138 51ndash56 https

doiorg101061(ASCE)SU1943-5428000006474 Bonferroni C Teoria statistica delle classi e calcolo delle probabilitagrave Pubblicazioni del R Istituto superiore di

scienze economiche e commerciali di Firenze Libreria internazionale Seeber 193675 Box GEP Muller ME A Note on the Generation of Random Normal Deviates The Annals of Mathematical

Statistics 1958 29 610ndash611 httpsdoi101214aoms117770664576 Lemeshko BY Lemeshko SB Extending the Application of Grubbs-Type Tests in Rejecting Anomalous

Measurements Measurement Techniques 2005 48 536ndash547 doi101007s11018-005-0179-977 Lehmann R Scheffler T Monte Carlo based data snooping with application to a geodetic network J

Appl Geod 2011 5 123ndash134 httpsdoiorg101515JAG201101478 Matsumoto M Nishimura T Mersenne twister A 623-dimensionally equidistributed uniform

pseudo-random number generator ACM Transactions on Modeling and Computer Simulation 1998 8 3ndash30httpsdlacmorgcitationcfmid=272991

79 Ihde J Saacutenchez L Barzaghi R Drewes H Foerste C Gruber T Liebsch G Marti U Pail R SiderisM Definition and Proposed Realization of the International Height Reference System (IHRS) Surveys inGeophysics 2017 38 549ndash570 doi101007s10712-017-9409-3

80 Lehmann R Observation error model selection by information criteria vs normality testing StudiaGeophysica et Geodaetica 2015 59 489ndash504 doi101007s11200-015-0725-0

81 Algarni DA Ali AE Heighting and Distance Accuracy with Electronic Digital Levels Journal of King SaudUniversity - Engineering Sciences 1998 10 229 ndash 239 doihttpsdoiorg101016S1018-3639(18)30698-6

82 Gemin ARS Matos AS Faggion PLAs APPLICATION OF CALIBRATION CERTIFICATE OFDIGITAL LEVELING SYSTEMS IN THE MONITORING OF STRUCTURES A CASE STUDY AT THEGOVERNADOR JOSAtilde RICHA HYDROELECTRIC POWER PLANT - PR Boletim de CiAtildeGeodAtilde 201824 235 ndash 249

83 Takalo M Rouhiainen P Development of a System Calibration Comparator for Digital Levels in FinlandNordic Journal of Surveying and Real Estate Research 1 1

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

40 of 40

84 Wiedemann W Wagner A Wunderlich T Using IATS to Read and Analyze Digital Levelling Staffs SIG2016 Paar R Marendic A Zrinjski M Eds Croatian Geodetic Society 2016 pp 515ndash526

85 PJG T The Geometry of Geodetic Inverse Linear Mapping and Non-linear Adjustment Publications onGeodesy New Series 1985 8

86 Rao CR Mitra SK Generalized inverse of a matrix and its applications 197287 Hekimoglu S Erenoglu RC Sanli DU Erdogan B Detecting Configuration Weaknesses in Geodetic

Networks Survey Review 2011 43 713ndash730 doi101179003962611X13117748892632

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

Page 10: Effects of Hard and Soft Equality Constraints on

10 of 40

Monte Carlo because the acceptance region (as well as the critical region) for the case of multiplealternative hypotheses is analytically intractable

3 Material and Methods

Here we use the procedure provided by Rofatto et al [45] to compute the probability levelsassociated with IDS as well as to estimate the minimal biases mdash Minimal Detectable Bias(MDB) andMinimal Identifiable Bias (MIB) The procedure is summarised in Figure 1

The procedure to compute the critical value of max-w (k) is given step-by-step as follows

1 Specify the probability density function (pdf) of the w-test statistics The pdf assigned to thew-test statistics under anH0-distribution is

(w1 w2 w3 middot middot middot wn)T sim N (0Rw) (35)

where Rw isin Rntimesn is the correlation matrix with the main diagonal elements equal to 1 and theoff-diagonal elements are the correlation between the w-test statistics computed by Equation (20)

2 In order to have w-test statistics under H0 uniformly distributed random number sequencesare produced by the Mersenne Twister algorithm and then they are transformed into a normaldistribution by using the BoxndashMuller transformation [75] BoxndashMuller has already been usedin geodesy for Monte Carlo experiments [467677] Therefore a sequence of m random vectorsfrom the pdf assigned to the w-test statistics is generated according to Equation (35) In that casewe have a sequence of m vectors of the w-test statistics as follows[

(w1 w2 w3 middot middot middot wn)T(1)

(w1 w2 w3 middot middot middot wn)T(2)

middot middot middot (w1 w2 w3 middot middot middot wn)T(m)

](36)

3 Compute the test statistic by Equation (10) for each sequence of w-test statistics Thus we have(max

iisin1middotmiddotmiddot n|wi|(1) max

iisin1middotmiddotmiddot n|wi|(2) middot middot middot max

iisin1middotmiddotmiddot n|wi|(m)

)(37)

4 Sort in ascending order the maximum test statistic in Equation (37) getting a sorted vector wsuch that

w(1) lt w(2) w(3) middot middot middot lt w(m) (38)

The sorted values w in Equation (38) provide a discrete representation of the cumulative densityfunction (cdf) of the maximum test statistic max-w

5 Determine the critical value k as follows

k = w[(1minusαprime)timesm] (39)

where [] denotes rounding down to the next integer that indicates the position of the selectedelements in the ascending order of w This position corresponds to a critical value for a stipulatedoverall false alarm probability αprime This can be done for a sequence of values αprime in parallel

It is important to mention that the probability of a Type I decision error for multiple testing αprime islarger than that of Type I for single testing α0 This is because the critical region in multiple testing islarger than that in single hypothesis testing

After finding the critical value k the procedure based on Monte Carlo is also applied to computethe probability levels of IDS when there is an outlier in the dataset The overview of of the mainelements involved with the IDS was detailed in Section 2 The flowchart is displayed in Figure 1

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

11 of 40

Inp

ut

Functional amp Stochastic Models Significance level (αrsquo) Outlier magnitude range (nablai) Probability of correct detection ( ) Probability of correct identification ( )

CD CI

Gen

erat

e

ran

do

m d

raw

s

Generate a sample of random errors

119942~119925(120782 Q119942)

Compute the total error ε = e + cinablai

Compute the least-squares residual vector e = Rε

Iterativ

e ou

tlier elimin

ation

pro

cess

Compute the maximum w-test statistic

max wi

max wi gt k

End

Identify and remove the measurement from the dataset

Size of max wi gt 1

det ATWA gt 0

Compute w-test statistics

Compute and store nCI nMD nWE nover+ nover- and nol

Co

mp

ute th

e pro

bab

ilities

Number of Monte Carlo experiments (m) = 200000

Compute the probabilities CI MD WE over+ over- and ol

Sensitivity indicators Outlier detection (MDB)

amp Outlier identification (MIB)

YES NO

YES

NO

YES NO

YES

NO

Figure 1 Flowchart of the algorithm to compute the probability levels of Iterative Data Snooping (IDS)for each measurement in the presence of an outlier [45]

The steps displayed in Figure 1 are detailed as follows

1 First random error vectors are synthetically generated on the basis of a multivariate normaldistribution because the assumed stochastic model for random errors is based on the matrixcovariance of the observations Here we use the Mersenne Twister algorithm [78] to generate asequence of random numbers and BoxndashMuller [75] to transform it into a normal distribution

2 The total error (ε) is a combination of random errors and its corresponding outlier is given asfollows

ε = e + cinablai (40)

The magnitude intervals of simulated outliers are user-defined The magnitude intervals arebased on the standard deviation of the observation (σ) eg |3σ| to |6σ| Since the outlier can bepositive or negative the proposed algorithm randomly selects the signal of the outlier (for q = 1)Here we use the discrete uniform distribution to select the signal of the outlier

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

12 of 40

In Equation (40) e is the random error generated from the normal distribution according toEquation (2) and the second part cinablai is the additional parameter that describes the alternativemodel according to Equation (40)

3 Next we compute the least-squares residuals vector according to Equation (3) but now we usethe total error (ε) in Equation 40 as follows

e = Rε (41)

4 Run the IDS In this step the test statistic is computed according to (9) Then the maximumtest statistic value is obtained according to Equation (10) Now the decision rule is based on thecritical value k computed by Monte Carlo in previous stage After identifying the measurementsuspected to be the most likely outlier it is excluded from the model and least-squares estimationand data snooping are applied iteratively until there are no further outliers identified in thedataset If two or more observations are simultaneously detected (ie if maxminusw gt k and the sizeof maxminusw gt 1 then the IDS is ended) Furthermore every time that a measurement suspectedto be the most likely outlier is removed from the model we check whether the normal matrixATW A is invertible or not If the determinant of ATW A is 0 det|ATW A| = 0 then there isa necessary and sufficient condition for a square matrix ATW A to be non-invertible In otherwords the IDS is ended when det|ATW A| = 0 If no outlier is detected (ie maxminusw lt k) thenthe IDS is also ended

The IDS procedure is performed for m experiments of random error vectors for each experimentcontaminated by an outlier in the ith measurement Therefore for each measurementcontaminated by an outlier there are υ = 1 m experiments

5 After running m=200000 experiments [47] the probabilities associated with IDS are computed(note m refers to the total number of Monte Carlo experiments) The probability of correctidentification (PCI) is the ratio between the number of times that the outlier is correctly identified(denoted as nCI) and m experiments ie

PCI =nCIm

(42)

Similar to Equation (42) the wrong decisions are computed as

PMD =nMD

m(43)

where nMD is the number of experiments in which IDS does not detect the outlier (PMDcorresponds to the rate of missed detection)

PWE =nWE

m(44)

where nWE is the number of experiments in which the IDS procedure flags and removes onlyone single non-outlying measurement while the lsquotruersquo outlier remains in the dataset (PWE is thewrong exclusion rate)

Pover+ =nover+

m(45)

where nover+ is the number of experiments in which IDS correctly identifies and removes theoutlying measurement and others and Pover+ corresponds to its probability

Poverminus =noverminus

m(46)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

13 of 40

where noverminus represents the number of experiments in which IDS identifies and removes morethan one non-outlying measurement whereas the lsquotrue outlierrsquo remains in the dataset (Poverminus isthe probability corresponding to this error probability class)

Pol =nolm

(47)

where nol is the number of experiments in which the detector in Equation (10) flags two (or more)measurements simultaneously during a given iteration of IDS Here this is referred to as thenumber of statistical overlap nol and Pol corresponds to its probability

6 Finally the sensitivity indicators (MDB and MIB) are computed based on Equation 33 andEquation 34 respectively

In this paper the probability levels associated with IDS were computed for each observationindividually and for each outlier magnitude However they were grouped into clusters based onnumber of local redundancy (ri) and maximum absolute correlation between the w-test statistics(ρwi wj ) Furthermore we take care to control the family-wise error rate

4 Case study of Levelling Geodetic Network

The analyse of the constraints effects on the probability levels of IDS are performed by an exampleof a levelling geodetic network

A levelling geodetic network is a set of points located on the Earthrsquos surface or near it andinterconnected by height difference measurements Some of these points are associated with a verticalheight reference system (ie some points have known height) whereas others are parameters (unknownheights) The term known here means that those points with known heights are constraints necessaryto ensure that parameters (unknown heights) are sufficiently estimable In the sense of modern geodeticreference systems realisation and unification of a vertical height system consists of a combination ofGNSS (Global Navigation Satellite System) and spirit levelling with geoid models We will not gointo the details about height geodetic reference system and frame (Readers who wish to have furtherdetails on that issue please see eg [79])

The mathematical model associated with a geodetic levelling network is linear (ie the levellingmeasurements bear a known linear relationship with the unknown heights) Geodetic networksusually has more measurements than parameters (ie n gt u) ie we have a redundant measurementsystem However due to intrinsic random errors in a system redundant measurements often leadto an inconsistent system of equations To make the system consistent we have to introduce theinformation about the random measurement errors The stochastical properties of the measurementerrors are directly associated with the assumption of the probability distribution of these errors Ingeodesy and many other scientific branches the well-known normal distribution is one of the mostused as measurement error model [80] Because of this the model ceases to be purely mathematicaland becomes a statistical model with functional and stochastic part

In the absence of outliers ie under null hypothesisH0 in Equation (1) the model for levellinggeodetic network can be written as follows

∆himinusj + e∆himinusj= hj minus hi (48)

where ∆himinusj is the height difference measured from point i to j and e∆himinusjis the random error associated

with the levelling measurement For a only one single levelling line one of these points is the constraint(known height) from which the height of another point is determined The point with known height isalso referred to as control point or benchmark In a geodetic network on the other hand we have a setof levelling lines that connect both from points of unknown height and from points of known height(constraints) Under normal working conditions (ie H0) the measurement errors model is then givenby Equation (2)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

14 of 40

In this case study we demonstrate the application of the proposed algorithm by [45] subject to adifferent constraints scenarios for that levelling geodetic network

Recent work has been focused on non-stochastic constraints (hard constraints) Here we alsoconsider cases where constraints are subject to a random errors (ie soft constraints) That softconstraints are non-deterministic and therefore they are measured model elements that can bedesignated as a priori information The approach of this example can be applied in the designstage of geodetic network analysis [6265] Furthermore the experiments performed here can also beextended to geodetic deformation analysis when the deformation effects are unmodelled in the senseof deterministic form [28]

41 Problem description

To analyse the constraints effects on the IDS an example is taken from a geodetic levellingnetwork with 12 height differences between the points The equipment used to measure the leveldifference can be an electronic digital level In that case the levelling measurement system comprisesof a special bar-coded staff (also called barcode rod) and a digital level (instrument) A digital level isbasically a telescope that enables a horizontal line of sight Digital levels consist of additional electronicimage processing components to automatically read and analyse digital (bar coded) levelling staffswhere the graduation is replaced by a manufacturer dependent code pattern Generally the result isautomatically stored in the data collector of the digital level An example of a digital level ndash bar-codestaff system is displayed in the Figure 2 For more details about digital level see eg [81ndash84]

Figure 2 Example of a digital level ndash bar-code staff system [45]

The standard-deviation of the uncorrelated measurements were the same and taken equal toσ = 1mm The points are indicated as A to G The eight network configuration are displayed in Figure3(a b c d and e) and their details are given as follows

1 Figure 3(a) Network with 1 hard constraint (ie network minimally constrained) Since thedimension of the network is 1D the minimum information necessary to estimate the unknownheights is one The height of G was fixed as control point (hard constraint) and 6 unknown heights(ABCDEF) were minimally constrained Therefore the redundancy of the system (or overalldegrees of freedom) was r = nminus rank(A) = nminus u = 12minus 6 = 6

2 Figure 3(b) Network with one extra hard constraint (ie two hard constraints) The heights A andD were taken as hard constraints (ie heights A and D were fixed) The redundancy of the system

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

15 of 40

in that case was r = 12minus 5 = 7 with 5 unknown heights (BCEFG) over-constrained

3 Figure 3(c) Network with two extra hard constraints (ie three hard constraints)The heights A Dand G were taken as hard constraints In that case the redundancy of the system was r = 12minus 4 = 8

4 Figure 3(d) Network with two soft constraints (A and D) In that case a standard-deviation largerthan zero was assigned to both constraints (ie σc gt 0 In other words A and D were processedas being both observations and unknown parameters ie A and D were pseudo-observationsHere the both constraints were simultaneously relaxed by considering their uncertainties 10 timesworse than measurements (ie σc = 10times σ = 10mm) 10 times better than measurements (ieσc = 01mm) and their uncertainties equal to the measurements (σc = 1mm) In that case theredundancy of the system was r = 14minus 7 = 7

5 Figure 3(e) Network processed with A D and G as pseudo-observations Those three constraintswere simultaneously relaxed by considering their standard-deviations equal to σc = 10mm (10 timesworse than measurements) σc = 01mm (10 times better than measurements) and σc = 1mm (thesame as the measurements) In that case the redundancy of the system was r = 15minus 7 = 8

y1

y2

y3

y4

y5

y6

y7 y8

y9 y10

B C

D

E F

G y12 y11

A

y1

y2

y3

y4

y5

y6

y7 y8

y9 y10

A

B C

D

E F

G y12 y11

y1

y2

y3

y4

y5

y6

y7 y8

y9 y10

A

B C

D

E F

G y12 y11

(a) (b)

(c) (d)

(e)

Unknown Heights Soft Constraints Hard Constraints

y1

y2

y3

y4

y5

y6

y7 y8

y9 y10

A

B C

D

E F

G y12 y11

y1

y2

y3

y4

y5

y6

y7 y8

y9 y10

A

B C

D

E F

G y12 y11

Figure 3 Levelling Geodetic Network subject to different constraint scenarios

The following system of equations for that problem is given by

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

16 of 40

y1 + e1 = hB minus hA

y2 + e2 = hC minus hB

y3 + e3 = hD minus hC

y7 + e7 = hB minus hG

y8 + e8 = hC minus hG

y11 + e11 = hB minus hF

y12 + e12 = hC minus hE

(49)

The functional model (A) for the system of equations in 49 is given by

A =

minus1 1 0 0 0 0 00 minus1 1 0 0 0 00 0 minus1 1 0 0 00 0 0 minus1 1 0 00 0 0 0 minus1 1 01 0 0 0 0 minus1 00 1 0 0 0 0 minus10 0 1 0 0 0 minus10 0 0 0 1 0 minus10 0 0 0 0 1 minus10 1 0 0 0 minus1 00 0 1 0 minus1 0 0

(50)

Note that the rank defect of the matrix A is uminus rank(A) = 7minus 6 = 1 In that case there is neededto add at least one constraint in order to avoid rank deficiency of the matrix A This is guaranteedwhen we take one height as known For example from the network in Figure 3(a) we have added theheight G as known (ie as a hard constraint) In that case the constraint equation should be addedinto the system in 49 ie

y13 = hG with σy13 = 0 (51)

noticing that because the standard deviation is zero the observation is non-stochastic (hard constraint)and the residual ey13 = 0 This can generate problems in the inversion of the covariance matrix of theobservations Qe for the calculation of the weight matrix W because the weight for that constraintwould be 1

0 = infin In order to avoid that problem we have eliminated the rank deficiency of matrixA by removing the seventh column of matrix A in 50 associated with the height G Now we haveuminus rank(A) = 6minus 6 = 0

The constraint defines the geodetic datum ie the S-system ([85]p41) Another approach tosolving the system of equations in 2 could be based on generalised (pseudo) inverses (see eg [86])

The location of the constraints can be chosen in some circumstances for example during thedesign stage of a geodetic network For the special case of having a minimally constrained systemthe location of the constraint will not influence the w-test statistics and the sensitivity indicators (MIBand MDB) [18] However more constraints than the minimum necessary to have a solution (ie extraconstraints or redundant constraints) can change the least-squares residuals and hence w-test statisticsand the minimal biases

From the network with one extra constraint (2 constraints) in Figure 3(b) for example the bothfirst (height A) and fourth column (height D) of matrix A in 50 were eliminated in the case of having

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

17 of 40

the two heights as hard constraints For the case where these two heights (A and D) were taken as softconstraints however two observation equations were added to Equation 49 ie

y13 + e13 = hA σy13 gt 0

y14 + e14 = hD σy14 gt 0(52)

In the case of soft constraints in Equation 52 two lines were added in matrix A In other wordsA and D were taken as pseudo-observations In that case the rank deficiency was also null (ieuminus rank(A) = 7minus 7 = 0) the redundancy of the system was r = nminus rank(A) = nminus u = 7 and thematrix A was given as follows

A =

minus1 1 0 0 0 0 00 minus1 1 0 0 0 00 0 minus1 1 0 0 00 0 0 minus1 1 0 00 0 0 0 minus1 1 01 0 0 0 0 minus1 00 1 0 0 0 0 minus10 0 1 0 0 0 minus10 0 0 0 1 0 minus10 0 0 0 0 1 minus10 1 0 0 0 minus1 00 0 1 0 minus1 0 01 0 0 0 0 0 00 0 0 1 0 0 0

(53)

For this example of two soft constraints and by considering the both soft constraints withstandard-deviation σc = 10mm the symmetric and positive semi-definite covariance matrix of theobservations (Qe) was given as follows

Qe =

1 0 0 middot middot middot 0 00 1 0 middot middot middot 0 00 0 1 middot middot middot 0 0

0 0 0 middot middot middot 100 00 0 0 middot middot middot 0 100

(54)

The last two rows and columns of the matrix Qe in Equation 54 refer to the variances (σc2 =

(10mm)2 = 100mm2) of the heights constraints A and D respectivelySimilarly matrices A and Qe were constructed for the other cases studied hereAlthough other measurements are able to identify an outlier for the case of having only one single

soft constraint the pseudo-observation (constraint) is not In that case the defect configurationis associated with the additional parameter in the constraint (ie the presence of an outlier inthe constraint) In other words an additional parameter on the soft constraint will not estimableFor example if the height point G was taken as a soft constraint the presence of an outlier inpseudo-observation G would lead to rank deficiency of matrix A ie u minus rank(A) = 8 minus 7 = 1Therefore the case of having only one single soft constraint was not considered here

42 Result of the Hard Constraint Effects on the Iterative Outlier Elimination Procedure

In this section we present the results of the effects of the hard constraint on the IDS The scenariosin Figure 3(a) (network minimally constrained) Figure 3(b) (two hard constraints) and Figure 3(c)(three hard constraints) were considered for the analysis

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

18 of 40

The correlation coefficient between w-test statistics (ρwi wj ) were computed for the three scenariosaccording to Equation(20) Table 1 provides that correlation for that network minimally constrainedTable 2 for that network over-constrained with two hard constraints and Table 3 for that networkover-constrained with three hard constraints

Table 1 Correlation matrix of w-test statistics for the levelling network minimally constrained in Figure3(a)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12

wy1 10 02 01 01 02 10 -02 00 00 02 -04 -01wy2 10 02 02 03 02 05 -05 -02 02 03 -03wy3 10 10 02 01 00 02 -02 00 01 04wy4 10 02 01 00 02 -02 00 01 04wy5 10 02 -02 02 05 -05 03 -03wy6 10 -02 00 00 02 -04 -01wy7 10 -03 -03 -04 -04 -01wy8 10 -04 -03 -01 -04wy9 10 -03 01 04wy10 10 04 01wy11 10 -01wy12 100

Table 2 Correlation matrix of w-test statistics for the levelling network with two hard constraints inFigure 3(b)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12

wy1 10 04 04 -03 -01 04 -03 01 01 01 -04 -01wy2 10 04 -01 01 -01 04 -04 -01 01 03 -03wy3 10 04 -01 -03 -01 03 -01 -01 01 04wy4 10 04 04 01 01 -03 01 01 04wy5 10 04 -01 01 04 -04 03 -03wy6 10 -01 -01 -01 03 -04 -01wy7 10 -04 -03 -04 -04 -01wy8 10 -04 -03 -01 -04wy9 10 -04 01 04wy10 10 04 01wy11 10 -01wy12 10

Table 3 Correlation matrix of w-test statistics for the levelling network with three hard constraints inFigure 3(c)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12

wy1 10 03 01 -01 -01 01 -04 -01 -01 -01 -03 -01wy2 10 03 -01 01 -01 03 -03 -01 01 03 -03wy3 10 01 -01 -01 01 04 01 01 01 03wy4 10 03 01 -01 -01 -04 -01 01 03wy5 10 03 -01 01 03 -03 03 -03wy6 10 01 01 01 04 -03 -01wy7 10 -01 -01 -01 -03 -01wy8 10 -01 -01 -01 -03wy9 10 -01 01 03wy10 10 03 01wy11 10 -01wy12 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

19 of 40

Table 4 gives the local redundancy (ri) the standard-deviation of the LS-estimated outlier σnablai andthe maximum absolute correlation (maxρwi wj

) for each scenario of hard constraint set out of in thisstudy ie Figure 3(ab and c)

Table 4 Local redundancy (ri) standard-deviation of the LS-estimated outlier σnablai and the maximumabsolute correlation (maxρwi wj

) for each scenario of hard constraint

1 hard constraint 2 hard constraints 3 hard constraintsMeasurement ri σnablai maxρwi wj

ri σnablai maxρwi wjri σnablai maxρwi wj

y1 0396 1589 100 0583 1309 036 0708 1188 041y2 0500 1414 047 0583 1309 036 0583 1309 032y3 0396 1589 100 0583 1309 036 0708 1188 041y4 0396 1589 100 0583 1309 036 0708 1188 041y5 0500 1414 047 0583 1309 036 0583 1309 032y6 0396 1589 100 0583 1309 036 0708 1188 041y7 0563 1333 047 0583 1309 036 0708 1188 041y8 0563 1333 047 0583 1309 036 0708 1188 041y9 0563 1333 047 0583 1309 036 0708 1188 041y10 0563 1333 047 0583 1309 036 0708 1188 041y11 0583 1309 043 0583 1309 036 0583 1309 032y12 0583 1309 043 0583 1309 036 0583 1309 032

Here the twelve levelling measurements were clustered into four clusters The clustering wasdefined according to the local redundancy (ri) and the maximum absolute correlation (maxρwi wj

) inTable 4 This is the first time that a clustering technique based on the similarity of local redundancyand the maximum absolute correlation between w-test statistics is applied to a problem of geodeticnetworks Similarly this has been done in [45] The four cluster were defined as follows

bull Cluster 1 y1 y3 y4 and y6bull Cluster 2 y2 and y5bull Cluster 3 y7 y8 y9 and y10bull Cluster 4 y11 and y12

The probability levels associated with IDS were averaged for each of these clusters Here we computethe probability levels of the IDS based on the procedure in Section (3) [45] The critical values werek = 389 k = 393 and k = 393 for 1 hard constraint 2 hard constraints and 3 hard constraintsrespectively These critical values were found for αprime = 0001 according to the procedure described inAppendix The probability of correct identification (PCI) and correct detection (PCD = 1minusPMD)are displayed in Figure 4 for each number of hard constraint (denoted by hc)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

20 of 40

(c) (d)

(e) (f)

(g) (h)

(a) (b)

Figure 4 Probability of correct identification (PCI) and Probability of correct detection (PCD) for thecase of hard constraints and for αprime = 0001 Cluster 1(ab) Cluster 2(cd) Cluster 3(ef) and Cluster4(gh)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

21 of 40

The outlier magnitude were defined from |5σ| to |9σ| The outlier of |5σ| was chosen because itis approximately the lowest MDB0(i) of the network when a single hypothesis testing is in play ascan be seen in Equation (14) or Equation (16) That MDB0(i) of |5σ| was computed for significancelevel of αprime = 0001 and a power of the test γ0 = 08 This strategy reduces the search space for a MIB(Minimal Identifiable Bias) because we will always have the following inequality MIB ge MDB0(i)[4755] Remember that the IDS procedure is an example of multiple hypothesis testing (see 2)

The sensitivity indicators MDB and MIB for IDS were also computed according to Equation (33)and (34) respectively The success rate for outlier detection and outlier identification were taken asbeing PCD = PCI = 08 respectively Table 5 provides the values of MDB and MIB for that case ofhard constraints

Table 5 MDB and MIB for the case of hard constraints based on αprime = 0001 and PCD = PCI = 08

1 hard constraint 2 hard constraints 3 hard constraintsCluster MDB (σ) MIB (σ) MDB (σ) MIB (σ) MDB (σ) MIB (σ)

1 75 - 63 63 57 572 67 68 63 64 63 643 64 64 63 63 58 584 64 64 64 64 64 64

Figure 5 shows the probability of wrong exclusion (PWE) The over-identification cases (Pover+

and Poverminus) were smaller than 0001 (ie they were practically null) There were not statistical overlap(Pol) for clusters 2 3 and 4 We will discuss more about statistical overlap (Pol) later

(c) (d)

(a) (b)

Figure 5 Probability of wrong exclusion (PWE) for the case of hard constraints and for αprime = 0001Cluster 1(a) Cluster 2(b) Cluster 3(c) and Cluster 4(d)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

22 of 40

43 Result of the Soft Constraint Effects on the Iterative Outlier Elimination Procedure

The both configuration in Figure 3(d) and Figure 3(e) were analysed in terms of soft constraintsIn that case the critical values were k = 395 k = 395 and k = 392 for two soft constraints withσc = 01mm σc = 1mm and σc = 10mm respectively In the case of three soft constraints the criticalvalues found were k = 399 k = 399 and k = 396 for σc = 01mm σc = 1mm and σc = 10mmrespectively All these critical values were computed for αprime = 0001

The correlation coefficient between w-test statistics (ρwi wj ) are displayed in Table 6 Table 7and Table 8 for two soft constraints with standard-deviation 10 times larger than measurements(ie σc = 10times σ = 10mm) 10 times better than measurements (ie σc = 01mm) and equal to themeasurements (σc = 1mm) respectively

Table 6 Correlation matrix of w-test statistics for the levelling network with two soft constraints withσc = 10mm in Figure 3(d)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14

wy1 10 02 01 01 02 10 -02 00 00 02 -04 -01 006 -01wy2 10 02 02 03 02 05 -05 -02 02 03 -03 003 00wy3 10 10 02 01 00 02 -02 00 01 04 01 -01wy4 10 02 01 00 02 -02 00 01 04 -01 01wy5 10 02 -02 02 05 -05 03 -03 00 00wy6 10 -02 00 00 02 -04 -01 -01 01wy7 10 -03 -03 -04 -04 -01 00 00wy8 10 -04 -03 -01 -04 00 00wy9 10 -03 01 04 00 00wy10 10 04 01 00 00wy11 10 -01 00 00wy12 10 00 00wy13 10 -10wy14 10

Table 7 Correlation matrix of w-test statistics for the levelling network with two soft constraints withσc = 01mm in Figure 3(d)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14

wy1 10 04 04 -03 -01 04 -03 01 01 01 -04 -01 06 -06wy2 10 04 -01 02 -01 04 -04 -01 01 03 -03 04 -04wy3 10 04 -01 -03 -01 03 -01 -01 01 04 06 -06wy4 10 04 04 01 01 -03 01 01 04 -06 06wy5 10 04 -01 01 04 -04 03 -03 -04 04wy6 10 -01 -01 -01 03 -04 -01 -06 06wy7 10 -04 -03 -04 -04 -01 -02 02wy8 10 -04 -03 -01 -04 02 -02wy9 10 -04 01 04 02 -02wy10 10 04 01 -02 02wy11 10 -01 00 00wy12 10 00 00wy13 10 -10wy14 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

23 of 40

Table 8 Correlation matrix of w-test statistics for the levelling network with two soft constraints withσc = 1mm in Figure 3(d)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14

wy1 10 03 02 -01 01 07 -03 00 01 02 -04 -01 04 -04wy2 10 03 01 03 01 04 -04 -01 01 03 -03 03 -03wy3 10 07 01 -01 00 03 -02 -01 01 04 04 -04wy4 10 03 02 01 02 -03 00 01 04 -04 04wy5 10 03 -01 01 04 -04 03 -03 -03 03wy6 10 -02 -01 00 03 -04 -01 -04 04wy7 10 -03 -03 -04 -04 -01 -01 01wy8 10 -04 -03 -01 -04 01 -01wy9 10 -03 01 04 01 -01wy10 10 04 01 -01 01wy11 10 -01 00 00wy12 10 00 00wy13 10 -10wy14 10

Table 9 gives the local redundancy (ri) the standard-deviation of the LS-estimated outlier σnablai andthe maximum absolute correlation (maxρwi wj

) for the scenarios of two constraints

Table 9 Local redundancy (ri) standard-deviation of the LS-estimated outlier σnablai (mm) and themaximum absolute correlation (maxρwi wj

) for each scenario of two soft constraints

σc = 01mm σc = 1mm σc = 10mmMeasurement ri σnablai maxρwi wj

ri σnablai maxρwi wjri σnablai maxρwi wj

y1 0581 1312 0564 0471 1457 0681 0397 1587 0994y2 0582 1311 0376 0533 1369 0423 0501 1413 0471y3 0581 1312 0564 0471 1457 0681 0397 1587 0994y4 0581 1312 0564 0471 1457 0681 0397 1587 0994y5 0582 1311 0376 0533 1369 0423 0501 1413 0471y6 0581 1312 0564 0471 1457 0681 0397 1587 0994y7 0583 1310 0359 0571 1324 0423 0563 1333 0471y8 0583 1310 0359 0571 1324 0423 0563 1333 0471y9 0583 1310 0359 0571 1324 0423 0563 1333 0471y10 0583 1310 0359 0571 1324 0423 0563 1333 0471y11 0583 1309 0358 0583 1309 0398 0583 1309 0433y12 0583 1309 0358 0583 1309 0398 0583 1309 0433y13 0007 1163 1000 0300 1826 1000 0497 14189 1000y14 0007 1163 1000 0300 1826 1000 0497 14189 1000

From Table 9 five clusters were defined for each case of two soft constraints ie for the casewhere heights A and D were given as soft constraints in Figure 3(d) as follows

bull Cluster 1 y1 y3 y4 and y6bull Cluster 2 y2 and y5bull Cluster 3 y7 y8 y9 and y10bull Cluster 4 y11 and y12bull Cluster 5 y13 and y14

The probabilities of correct identification (PCI) and correct detection (PCD) for the measurements(Cluster 1 to Cluster 4) subject to the scenarios of two soft constraints (heights A and D) are displayedin Figure 6

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

24 of 40

(c) (d)

(e) (f)

(g) (h)

(a) (b)

Figure 6 Probability of correct identification (PCI) and Probability of correct detection (PCD) for themeasurements subject to the scenarios of two soft constraints for αprime = 0001 Cluster 1(ab) Cluster2(cd) Cluster 3(ef) and Cluster 4(gh)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

25 of 40

Note that the Cluster 5 is associated with the two soft constraints (ie y13 and y14) The probabilityof correct identification (PCI) for these both soft constraints were null However the probability ofcorrect detection were not Figure 7 shows the probability of correct detection (PCD) for these two softconstraints (ie heights A and D)

Figure 7 Probability of correct detection PCD for the two soft constraints (Cluster 5 heights A and D)and for αprime = 0001

The probability of wrong exclusion PWE for the measurements (Cluster 1 to Cluster 4) subject tothe scenarios of two soft constraints (heights A and D) are displayed in Figure 8 Figure 9 gives the theprobability of wrong exclusion PWE for that two constraints (ie heights A and D)

(c) (d)

(a) (b)

Figure 8 Probability of wrong exclusion PWE for the measurements subject to the scenarios of two softconstraints for αprime = 0001 Cluster 1(a) Cluster 2(b) Cluster 3(c) and Cluster 4(d)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

26 of 40

Figure 9 Probability of wrong exclusion PWE for the two soft constraints (Cluster 5 heights A and D)and for αprime = 0001

The over-identification cases (Pover+ and Poverminus) and the statistical overlap (Pol) were practicallynull for that case

The sensitivity indicators (MDB and MIB) for each scenario of two soft constraints are displayedin Table 10

Table 10 MDB and MIB for the case of two soft constraints based on αprime = 0001 and PCD = PCI = 08

σc = 10mm σc = 1mm σc = 01mmCluster MDB (σ) MIB (σ) MDB (σ) MIB (σ) MDB (σ) MIB (σ)

1 75 25 7 71 63 632 68 68 66 66 63 633 64 64 64 64 63 634 63 63 63 63 63 635 68 - 88 - 57 -

As with the case of two soft constraints Tables 1112 and 13 show the correlations (ρwi wj ) for thecase where there were three soft constraints ie for the network configuration in Figure 3(e)

Table 11 Correlation matrix of w-test statistics for the levelling network with three soft constraintswith σc = 10mm in Figure 3(e)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14 wy15

wy1 10 02 01 01 02 10 -02 00 00 02 -04 -01 01 00 00wy2 10 02 02 03 02 05 -05 -02 02 03 -03 00 00 00wy3 10 10 02 01 00 02 -02 00 01 04 00 -01 00wy4 10 02 01 00 02 -02 00 01 04 00 01 00wy5 10 02 -02 02 05 -05 03 -03 00 00 00wy6 10 -02 00 00 02 -04 -01 -01 00 00wy7 10 -03 -03 -04 -04 -01 00 00 00wy8 10 -04 -03 -01 -04 00 00 00wy9 10 -03 01 04 00 00 00wy10 10 04 01 00 00 00wy11 10 -01 00 00 00wy12 10 00 00 00wy13 10 -05 -05wy14 10 -05wy15 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

27 of 40

Table 12 Correlation matrix of w-test statistics for the levelling network with three soft constraintswith σc = 01mm in Figure 3(e)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14 wy15

wy1 10 03 01 -01 -01 01 -04 -01 -01 -01 -03 -01 07 -01 -04wy2 10 03 -01 01 -01 03 -03 -01 01 03 -03 03 -03 00wy3 10 01 -01 -01 01 04 01 01 01 03 01 -07 04wy4 10 03 01 -01 -01 -04 -01 01 03 -01 07 -04wy5 10 03 -01 01 03 -03 03 -03 -03 03 00wy6 10 01 01 01 04 -03 -01 -07 01 04wy7 10 -01 -01 -01 -03 -01 -04 -01 04wy8 10 -01 -01 -01 -03 -01 -04 04wy9 10 -01 01 03 -01 -04 04wy10 10 03 01 -04 -01 04wy11 10 -01 00 00 00wy12 10 00 00 00wy13 10 -02 -06wy14 10 -06wy15 10

Table 13 Correlation matrix of w-test statistics for the levelling network with three soft constraintswith σc = 1mm in Figure 3(e)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14 wy15

wy1 10 03 01 00 01 06 -03 00 00 01 -04 -01 05 -02 -02wy2 10 03 01 03 01 04 -04 -01 01 03 -03 02 -02 00wy3 10 06 01 00 00 03 -01 00 01 04 02 -05 02wy4 10 03 01 00 01 -03 00 01 04 -02 05 -02wy5 10 03 -01 01 04 -04 03 -03 -02 02 00wy6 10 -01 00 00 03 -04 -01 -05 02 02wy7 10 -03 -02 -03 -04 -01 -02 00 02wy8 10 -03 -02 -01 -04 00 -02 02wy9 10 -03 01 04 00 -02 02wy10 10 04 01 -02 00 02wy11 10 -01 00 00 00wy12 10 00 00 00wy13 10 -04 -05wy14 10 -05wy15 10

Table 14 Local redundancy (ri) standard-deviation of the LS-estimated outlier σnablai (mm) and themaximum absolute correlation (maxρwi wj

) for each scenario of three soft constraints

σc = 01mm σc = 1mm σc = 10mmMeasurement ri σnablai maxρwi wj

ri σnablai maxρwi wjri σnablai maxρwi wj

y1 0702 1194 0660 0502 1411 0577 0398 1586 0992y2 0582 1311 0326 0533 1369 0412 0501 1413 0470y3 0702 1194 0660 0502 1411 0577 0398 1586 0992y4 0702 1194 0660 0502 1411 0577 0398 1586 0992y5 0582 1311 0326 0533 1369 0412 0501 1413 0470y6 0702 1194 0660 0502 1411 0577 0398 1586 0992y7 0704 1192 0415 0602 1289 0412 0563 1333 0470y8 0704 1192 0415 0602 1289 0412 0563 1333 0470y9 0704 1192 0415 0602 1289 0412 0563 1333 0470y10 0704 1192 0415 0602 1289 0412 0563 1333 0470y11 0583 1309 0326 0583 1309 0385 0583 1309 0433y12 0583 1309 0326 0583 1309 0385 0583 1309 0433y13 0012 0904 0660 0425 1534 0542 0663 12283 0501y14 0012 0904 0660 0425 1534 0542 0663 12283 0501y15 0019 0718 063 05 1414 0542 0665 12268 0501

The probabilities of correct identification (PCI) and correct detection PCD for the measurements(Cluster 1 to Cluster 4) subject to the scenarios of three soft constraints (heights A D and G) aredisplayed in Figure 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

28 of 40

(c) (d)

(e) (f)

(g) (h)

(a) (b)

Figure 10 Probability of correct identification (PCI) and Probability of correct detection (PCD) for themeasurements subject to the scenarios of three soft constraints for αprime = 0001 Cluster 1(ab) Cluster2(cd) Cluster 3(ef) and Cluster 4(gh)

The probabilities of correct identification (PCI) and correct detection (PCD) in Figure 8 werecomputed for the clusters based on Table 14 as follows

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

29 of 40

bull Cluster 1 y1 y3 y4 and y6bull Cluster 2 y2 and y5bull Cluster 3 y7 y8 y9 and y10bull Cluster 4 y11 and y12bull Cluster 5 y13 and y14bull Cluster 6 y15

Figure 11 shows the probabilities of correct identification PCI and correct detection PCD for thethree soft constraints ie for Cluster 5 (heights A and D) and Cluster 6 (height G) in Figure 3(e)

(c) (d)

(a) (b)

Figure 11 Probability of correct identification (PCI) and Probability of correct detection (PCD) for thethree constraints and for αprime = 0001 Cluster 5(ab) and Cluster 6(cd)

The probability of wrong exclusion PWE for the measurements (Cluster 1 to Cluster 4) subject tothe scenarios of three soft constraints (heights A D and G) are displayed in Figure 12 Figure 13 givesthe the probability of wrong exclusion PWE for that three constraints (ie heights A D and G)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

30 of 40

(c) (d)

(a) (b)

Figure 12 Probability of wrong exclusion PWE for the measurements subject to the scenarios of threesoft constraints and for αprime = 0001 Cluster 1(a) Cluster 2(b) Cluster 3(c) and Cluster 4(d)

(a) (b)

Figure 13 Probability of wrong exclusion (PWE) for the three constraints and for αprime = 0001 Cluster5(a) and Cluster 6(b)

The over-identification cases (Pover+ and Poverminus) and the statistical overlap (Pol) were alsopractically null for that case of three soft constraints

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

31 of 40

The sensitivity indicators (MDB and MIB) for each scenario of three soft constraints are displayedin Table 15

Table 15 MDB and MIB for the case of three soft constraints based on αprime = 0001 and PCD = PCI = 08

σc = 10mm σc = 1mm σc = 01mmCluster MDB (σ) MIB (σ) MDB (σ) MIB (σ) MDB (σ) MIB (σ)

1 75 22 68 69 58 592 68 69 66 67 64 643 64 64 63 63 58 584 63 63 63 63 63 635 59 60 74 75 435 456 59 59 69 69 346 355

5 Discussion

We start from the scenario of one hard constraint in Figure 3(a) Table 4 reveal that the maximumcorrelation between w-test statistics for the measurements constituting Cluster 1 is exactly equal to100 (ie maxρwi wj

= 100) This means that the measurements belonging to Cluster 1 are those thatare connected with unknown heights whose connections are limited to only two The both unknownheights A and D are tied only to two measurements (ie y1 and y6 linked to A and y3 and y4 linked toD) therefore if an outlier occurred in one of these measurements we would only be able to analyse theconsistency between them but we would not be able to distinguish which of them was contaminatedby an outlier This means that we would only be able to detect because the w-test statistics could belarger than a critical value k however in that case the values of w-test statistics would be the same andtherefore we would not have only one unique maximum w-test statistics but actually would have fourmaximum w-test statistics In other words the equation systems associated with the measurements ofCluster 1 are linearly dependent [87] Therefore there is no reliability in terms of outlier identificationfor the Cluster 1 as can be seen in Figure 3(a)

From Figure 3(b) we can note that there is reliability in terms of outlier detection for Cluster 1and it is caused by overlapping w-test statistics The probability of statistics overlap Pol for Cluster 1in this scenario of minimally constrained network is displayed in Figure 14

Figure 14 Probability of correct detection PCD and statistical overlap (Pol) for the Cluster 1 subject toone hard constraint and for αprime = 0001

The problem of not having more connections (ie more measurements) for the unknown heightsA and D in the case of one hard constraint with G fixed is overcome when these heights (A and D) aretaken as hard constraints in 3(b) or when the heights A D and G are hard constraints in figure 3(c)Figure 3(a b) show that the measurements of Cluster 1 are now able to identify an outlier when twohard constraints (A and D fixed) are in play It is also verified to the case of three hard constraints (A Dand G fixes) in Figure 3(e f) ie there is reliability in terms of both outlier detection and identificationfor these measurements in those conditions

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

32 of 40

One can argue that the more constraints the larger the probability of correct identification (PCI)and correct detection (PCD) This claim is not entirely correct This may be true in case where all pointson the network have the same number of connections However in the example given if both heightsB and G were selected as hard constraints the measurements of the Cluster 1 (ie y1 y6 y3 and y4)would still have their least-squares residuals fully correlated (ie ρwy1 wy6

= 100 and ρwy3 wy4= 100)

and therefore there would be no reliability in terms of outlier identification for these measurementsFrom Table 5 we observe different behaviour for the clusters as follows

bull Cluster 1 there was no MIB for the case of having only one single hard constraint whereasMDB = MIB for the other cases However both MDB and MIB decrease significantly with theincrease in the number of hard constraints

bull Cluster 2 MDB slightly smaller than MIB Both MDB and MIB were practically the same for thecase of having two or three hard constraints

bull Cluster 3 MDB = MIB for all cases of hard constraints however both MDB and MIB decreasesignificantly with the increase in the number of hard constraints

bull Cluster 4 MDB e MIB were equal for all cases

In terms of outlier detection and identification therefore Cluster 1 was more sensitive toconstraints Cluster 3 relatively sensitive to constraints whereas the Cluster 4 completely insensitiveto constraints and Cluster 2 relatively insensitive to constraints This is also can be seen in Figure 4The reason for this is that the local redundancy (ri) of the Cluster 1 increased with the increase of thenumber of hard constraints whereas the Cluster 4 have remained the same as can be seen in Table 4

Leaving aside the cases of statistical overlap (Pol) the network presents low least-squares residualscorrelation (ρwi wj lt 05) and high local redundancy (ri gt 05) Because of this the probabilities ofwrong exclusion (PWE) were less than 1 as can be seen in Figure 5 The over-identification cases(Pover+ and Poverminus) were practically null Consequently PCI asymp PCD Due of this fact the family-wiseerror rate (αprime) should be increased in order to have more success rate in the outlier detection andidentification [45]

From Figure 15 we observe that increasing the αprime increases the both probabilities of correctidentification (PCI) and correct detection (PCD) for outlier magnitude from 5σ to 6σ in the case ofthree hard constraints and from 5σ to 68σ in the case of two hard constraints Although the rates ofPover+ and PWE also increase they are not significant when compared to the improvement of correctidentification (PCI) and detection (PCD) This same analysis can be done for the other clusters

In terms of soft constraints for the cases of two constraints in Figure 3(d) we observe from Table 9that the larger the relaxation of the constraint (ie the larger the standard-deviation of the constraintσc) the larger the residuals correlation (ρwi wj ) and the standard-deviation of the outlier σnablai and thesmaller the local redundancy (ri) Consequently the probabilities of correct identification (PCI) anddetection (PCD) get smaller and smaller with the relaxation of the constraints whereas the probabilityof wrong exclusion gets larger (PWE) This can be more clearly verified in Figure 6(a b) and Figure8(a) for Cluster 1 whose measurements are connected with the constraints A and D (ie y13 and y14 inTable 9 respectively)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

33 of 40

(a) (b)

(c) (d)

Figure 15 Probabilities of correct identification PCI (a) correct detection PCD (b) over-identificationpositive Pover+ (c) and wrong exclusion PWE for the Cluster 1 subject to two and three hard constraintsand for αprime = 0001 and αprime = 01

Note from Figure 8(a) that the probability of wrong exclusion (PWE) increases as the magnitude ofthe outlier (nablai) increases However this is true up to a certain limit of outlier magnitude The effect ofresiduals correlation ρwi wj on the rates of wrong exclusion (PWE) and correct identification (PCI) tendsto decrease with the increase in the magnitude of the outlier nablai This effect is more clearly verified forCluster 1 in case where the precision of the constraints are ten times worse than the measurementsσc = 10σ = 10mm

Note from Figure 6 that identifying an outlier in the Cluster 1 (ie y1 y3 y4 and y6) whenσc = 10mm is more difficult than the other clusters This is due to the fact the Cluster 1 has a higherresiduals correlation ρwi wj = 0994 than other clusters In general therefore we observe that the largerthe relaxation of the constraints the larger is the effect of the correlation ρwi wj on the success rate ofoutlier identification (PCI) Consequently the higher the sensitivity indicator for outlier identification(MIB) Table 10 reveals that the ratio between MIB and MDB for the Cluster 1 and for the scenariowhere the standard-deviations of that two soft constraints are σc = 10mm is MIBMDB=2575=33On the other hand the relationship between MIB and MDB is practically one (ie MIBMDB=10) forthe others scenarios

If the FWE rate (αprime) were increased for the case where the two soft constraints of σc = 10mmare in play we would not have great advantages for Cluster 1 due to its high residuals correlation(ρwi wj = 994) From Figure 16 we can observe that the probability of correct identification (PCI)for outlier magnitudes from 5σ to 8σ is effectively larger to a user-defined αprime = 01 than αprime = 0001However the success rate is still less than 80 ie PCI lt 08 Note for example the correct

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

34 of 40

identification rate is PCI = 56 for an outlier magnitude of nablai = 8σ and αprime = 01 For αprime = 01 theMIB = 335σ = 335mm whereas for αprime = 0001 is MIB = 25σ = 25mm Therefore in that case theMIB for PCI = 08(80) and αprime = 01 would be 34 larger than user-defined αprime = 0001

Figure 16 Probabilities of correct identification (PCI) for the Cluster 1 subject to two soft constraints (2sc) A and D for αprime = 0001 and αprime = 01

The soft constraints A and D were grouped in the Cluster 5 (ie A and D were treated aspseudo-observations in the model) There is no reliability in terms of outlier identification for theconstraints because the residual correlation between them is ρwi wj = 100 as can be seen in Table 9for y13 and y14 However these soft constraints are able to detect an outlier In that case the probabilityof correct detection PCD in Figure 7 is mainly caused by the statistical overlap Pol as can be seen in theexample of the case of σc = 10mm in Figure 17 From Table 10 we observe that the larger the relaxationof the constraints the larger the MDB Note that the values of MDB are given in σ and therefore theMDB for σc = 10mm is larger than σc = 1mm and σc = 01mm ie we had the following inequalityMDB = 68σc = 68times 10mm = 68mm gt MDB = 88σc = 88times 1mm = 88mm gt MDB = 57σc =

57times 01mm = 57mm In that case if the FWE (αprime) were increased the rate of outlier detection by theCluster 4 (ie by the soft constraints) would increase

Figure 17 Probabilities of correct detection PCD and statistical overlap Pol for the two soft constraintsA and D (Cluster 5) with σc = 10mm and for αprime = 0001

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

35 of 40

Similar effects of the relaxation of the constraints on the performance of the IDS in case of two softconstraints are verified in case of three soft constraints as can be seen in the Figures 10 11 12 and 13

In case of having three soft constraints in Figure 3(e) there is reliability in terms of outlieridentification for the three pseudo-observations y13 y14 and y15 (ie for AD and G) as can be seen inFigure 10 and Table 15 In that case we also observe that the probabilities of correct detection PCD ofthe soft constraints A and D (ie Cluster 5) were approximately 13 for σc = 10mm 16 for σc = 1mmand 24 for σc = 01mm larger than the scenario of the network subject to two soft constraints Table15 reveals that the advantage of having three soft constraints instead of two constraints is that theconstraints become identifiable in the presence of an outlier The behaviour of the probabilities ofcorrect detection (PCD) correct identification (PCI) and wrong exclusion (PWE) was similar to the caseof the two soft constraints

Furthermore the larger the relaxation of the constraints the smaller the residuals correlationbetween the measurements and the soft constraints and the larger the residuals correlation amongthe measurements This effect can be verified in Tables 6 7 and 8 for two soft constraints as well as inTables 11 12 and 13 for three soft constraints

In general we observe that the case of two soft constraints for σc = 01mm was comparablewith two hard constraints (see eg Table 5 and Table 10) in terms of the probability levels associatedwith IDS for the measurements (ie clusters 1 2 3 and 4) In the same way for the case of twosoft constraints with σc = 1mm or σc = 10mm the probabilities levels were similar to the one hardconstraint for that measurements with the benefit of two soft constraints having reliability in terms ofoutlier identification for the Cluster 1 Finally the three soft constraints with σc = 1mm and σc = 10mmwere comparable to the two soft constraints for that scenario of constraints relaxation wheres thethree soft constraints for σc = 01mm showed similar outcomes with three hard constraints for themeasurements (see eg Table 5 and Table 15) In that case however an advantage of the three softconstraints on the three hard constraints is the possibility of analysing the sensitivity of the constraintsIt is important to emphasised that the stochastic models of the measurements and constraints wereassumed well-known and defined for the analyzes performed here

6 Conclusions

In general the probability of correct identification (PCI) for the case of hard constraints is largerthan soft constraints It can be verified in Figure 4 Figure 6 and Figure 10 Therefore hard constraintsshould be used in the stage of pre-processing data for the purpose of identifying and removing possibleoutlying measurements In that process one should opt to set out the redundant hard constraints atpoints in the network where the smallest connections exist After identifying and removing possibleoutliers the soft constraints should be employed to propagate the uncertainties of the constraints(pseudobservations) to the model parameters during the process of least-squares estimation Thisrecommendation is valid for outlier detection and identification purpose

We highlight the main findings of this research as follows

bull Under a system of a high local redundancy ri gt 05 and low residuals correlation (ρwi wj lt 05) ifone increase the family-wise error rate (FWE) of the test statistic the performance of the procedurewill be improved for both scenarios of hard constraints and soft constraintsbull The larger the relaxation of the constraints the larger is the effect of the residuals correlation

(ρwi wj ) on the success rate of outlier identification (PCI) Consequently the higher the sensitivityindicator for outlier identification (MIB) and therefore more difficult it becomes to identify anoutlier

bull Under a scenario of soft constraints one should set out at least three soft constraints in order toidentify an outlier in the constraints

In other types of analysis for example deformation analysis of geodetic networks one shouldformulated the constraints as pseudo-observations with σc gt 0 in an adjustment model of system

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

36 of 40

equations In other words the relaxation of constraints makes it possible to estimate deformationeffects that are unmodelled in the deterministic model matrix for example deformations which have aspatial and temporal variations In that case the deformation can be modelled by covariance functionsthat determine the cofactors between constraints [28] This should be applied to control points locatedin the structure but not to control points located outside the structure The control points locatedoutside the structure should be hard constraints and have their stability verified by an independentprocess in order to analyse the deformation of the structure So one will be also able to analyse thesensitivity in terms of minimal detectable deformations andor minimal identifiable deformations [23]Future researches will be addressed to sensitivity analysis for deformation monitoring networks

Author Contributions Conceptualization VFR MTM and IK Data curation VFR MTM and IK Formalanalysis VFR and IK Funding acquisition MTM MRV and LGdSJ Investigation VFR MethodologyVFR and MTM Project administration MTM IK MRV and LGdSJ Software VFR and LGdSJSupervision MTM IK and MRV Validation VFR and MRV Writing ndash original draft VFR Writing ndashreview amp editing VFR MTM IK and MRV All authors have read and agreed to the published version of themanuscript

Funding This research was funded by the CNPqmdashConselho Nacional de Desenvolvimento Cientiacutefico eTecnoloacutegicomdashBrasil (proc no 1035872019-5) This research and the APC were also funded by PETROBRAS(Grant Number 201800545-0)

Acknowledgments In this section you can acknowledge any support given which is not covered by the authorcontribution or funding sections This may include administrative and technical support or donations in kind(eg materials used for experiments)

Conflicts of Interest The authors declare no conflict of interest

References

1 Lehmann R Neitzel F Testing the compatibility of constraints for parameters of a geodetic adjustmentmodel Journal of Geodesy 2013 87 555ndash566 doi101007s00190-013-0627-2

2 Fang X Weighted total least-squares with constraints a universal formula for geodetic symmetricaltransformations Journal of Geodesy 2015 89 459ndash469 doi101007s00190-015-0790-8

3 Amiri-Simkooei AR Weighted Total Least Squares with Singular Covariance Matrices Subject to Weightedand Hard Constraints J Surv Eng 2017 143 04017018

4 Sakic P Ballu V Royer JY A Multi-Observation Least-Squares Inversion for GNSS-Acoustic SeafloorPositioning Remote Sensing 2020 12 doi103390rs12030448

5 Odijk D Stochastic modelling of the ionosphere for fast GPS ambiguity resolution Geodesy Beyond 2000Schwarz KP Ed Springer Berlin Heidelberg Berlin Heidelberg 2000 pp 387ndash392

6 de Oliveira PS Morel L Fund F Legros R Monico JFG Durand S Durand F Modelingtropospheric wet delays with dense and sparse network configurations for PPP-RTK GPS Solutions 201721 237ndash250 doi101007s10291-016-0518-0

7 Chen B Wu L Dai W Luo X Xu Y A new parameterized approach for ionospheric tomography GPSSolutions 2019 23 96 doi101007s10291-019-0893-4

8 Ahmed E Ahmed S Mohamed D Mostafa R jag 2019 Vol 13 chapter Vertical ionospheric delayestimation for single-receiver operation p 81 2 doi101515jag-2018-0041

9 Wang A Chen J Zhang Y Meng L Wang J Performance of Selected Ionospheric Models inMulti-Global Navigation Satellite System Single-Frequency Positioning over China Remote Sensing 201911 doi103390rs11172070

10 Deo M El-Mowafy A Comparison of advanced troposphere models for aiding reductionof PPP convergence time in Australia Journal of Spatial Science 2019 64 381ndash403doi1010801449859620181472046

11 Dai X Lou Y Dai Z Hu C Peng Y Qiao J Shi C Precise Orbit Determination for GNSS ManeuveringSatellite with the Constraint of a Predicted Clock Remote Sensing 2019 11 doi103390rs11161949

12 Courant R Hilbert D Methods of mathematical physics Vol 2 Vol 2 Wiley [New York] 196213 Grafarend EW Optimization of Geodetic Networks The Canadian Surveyor 1974 28 716ndash723

doi101139tcs-1974-0120

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

37 of 40

14 Teunissen P Zero Order Design Generalized Inverses Adjustment the Datum Problem andS-Transformations Optimization and Design of Geodetic Networks Grafarend EW Sansograve F EdsSpringer Berlin Heidelberg Berlin Heidelberg 1985 pp 11ndash55

15 Dermanis A Free network solutions with the DLT method ISPRS Journal of Photogrammetry and RemoteSensing 1994 49 2 ndash 12 doihttpsdoiorg1010160924-2716(94)90061-2

16 Kotsakis C Reference frame stability and nonlinear distortion in minimum-constrained networkadjustment Journal of Geodesy 2012 86 755ndash774 doi101007s00190-012-0555-6

17 Kotsakis C Datum Definition and Minimal Constraints In Encyclopedia of Geodesy Grafarend E EdSpringer International Publishing Cham 2018 pp 1ndash6 doi101007978-3-319-02370-0_157-1

18 Matsuoka MT Rofatto VF Klein I Roberto Veronez M da Silveira LG Neto JBS Alves ACRControl Points Selection Based on Maximum External Reliability for Designing Geodetic Networks AppliedSciences 2020 10 doi103390app10020687

19 Baarda W S-transformations and criterion matrices Publ on geodesy New Series 1973 520 Schaffrin B Aspects of Network Design Optimization and Design of Geodetic Networks Grafarend

EW Sansograve F Eds Springer Berlin Heidelberg Berlin Heidelberg 1985 pp 548ndash59721 Xu PB A general solution in geodetic nonlinear rank-defect models 199722 Altamimi Z Dermanis A The Choice of Reference System in ITRF Formulation VII Hotine-Marussi

Symposium on Mathematical Geodesy Sneeuw N Novaacutek P Crespi M Sansograve F Eds Springer BerlinHeidelberg Berlin Heidelberg 2012 pp 329ndash334

23 Velsink H On the deformation analysis of point fields J Geod 2015 89 1071ndash1087doi101007s00190-015-0835-z

24 Velsink H Extendable linearised adjustment model for deformation analysis Survey Review 201547 397ndash410 doi1011791752270614Y0000000140

25 Hiddo V jag 2016 Vol 10 chapter Time Series Analysis of 3D Coordinates Using NonstochasticObservations p 5 1 doi101515jag-2015-0027

26 Rao CR Markoffrsquos Theorem with Linear Restrictions on Parameters SankhyAuml The Indian Journal ofStatistics (1933-1960) 1945 7 16ndash19

27 Velsink H Testing Methods for Adjustment Models with Constraints Journal of Surveying Engineering2018 144 04018009 doi101061(ASCE)SU1943-54280000260

28 Hiddo V Testing deformation hypotheses by constraints on a time series of geodetic observations J ApplGeod 2017 12 77ndash93 doi101515jag-2017-0028

29 Baarda W A testing procedure for use in geodetic networks Publ on geodesy New Series 1968 230 Teunissen P First and second moments of non-linear least-squares estimators Bull Geodesique (Journal of

Geodesy) 1989 63 253ndash262 doihttpsdoiorg101007BF0252047531 Teunissen P Testing Theory an introduction 2 ed Delft University Press 200632 Amiri-Simkooei A Least-squares variance component estimation theory and GPS applications Phd

thesis Delft University of Technology 200733 Koch KR Parameter estimation and hypothesis testing in linear models 2 ed Springer 199934 Tao H Feng H Xu L Miao M Long H Yue J Li Z Yang G Yang X Fan L Estimation

of Crop Growth Parameters Using UAV-Based Hyperspectral Remote Sensing Data Sensors 2020 20doi103390s20051296

35 Han M Wang Q Wen Y He M He X The Application of Robust Least Squares Method inFrequency Lock Loop Fusion for Global Navigation Satellite System Receivers Sensors 2020 20doi103390s20041224

36 Wei C Estimation for the Discretely Observed CoxacircldquoIngersollacircldquoRoss Model Driven by Small SymmetricalStable Noises Symmetry 2020 12 doi103390sym12030327

37 Sakic P Ballu V Royer JY A Multi-Observation Least-Squares Inversion for GNSS-Acoustic SeafloorPositioning Remote Sensing 2020 12 doi103390rs12030448

38 Livadiotis G General Fitting Methods Based on Lq Norms and their Optimization Stats 2020 3 16ndash31doi103390stats3010002

39 Farooq SZ Yang D Ada ENJ A Cycle Slip Detection Framework for Reliable Single Frequency RTKPositioning Sensors 2020 20 doi103390s20010304

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

38 of 40

40 Araveeporn A Comparing Parameter Estimation of Random Coefficient Autoregressive Model byFrequentist Method Mathematics 2020 8 doi103390math8010062

41 Zhang C Peng T Zhou J Ji J Wang X An Improved Autoencoder and Partial Least SquaresRegression-Based Extreme Learning Machine Model for Pump Turbine Characteristics Applied Sciences2019 9 doi103390app9193987

42 Ji J Yang M Jiang L He J Teng Z Liu Y Song H Output-Only Parameters Identification ofEarthquake-Excited Building Structures with Least Squares and Input Modification Process AppliedSciences 2019 9 doi103390app9040696

43 Kargoll B On the theory and application of model misspecification tests in geodesy Doctoral thesisUniversity of Bonn Landwirtschaftliche Fakultaumlt German Bonn 2007

44 Lehmann R On the formulation of the alternative hypothesis for geodetic outlier detection J Geod 201387 373ndash386 doi101007s00190-012-0607-y

45 Rofatto VF Matsuoka MT Klein I Roberto Veronez M da Silveira LG A Monte Carlo-Based OutlierDiagnosis Method for Sensitivity Analysis Remote Sensing 2020 12 doi103390rs12050860

46 Lehmann R Improved critical values for extreme normalized and studentized residuals in GaussndashMarkovmodels J Geod 2012 86 1137ndash1146 doi101007s00190-012-0569-0

47 Rofatto VF Matsuoka MT Klein I Veronez MR Bonimani ML Lehmann R A half-century ofBaardarsquos concept of reliability a review new perspectives and applications Surv Rev 2018 0 1ndash17[httpsdoiorg1010800039626520181548118] doi1010800039626520181548118

48 Zaminpardaz S Teunissen P DIA-datasnooping and identifiability J Geod 2019 93 85ndash101doi101007s00190-018-1141-3

49 Kok JJ States U On data snooping and multiple outlier testing [microform] Johan J Kok US Dept ofCommerce National Oceanic and Atmospheric Administration National Ocean Service Charting andGeodetic Services For sale by the National Geodetic Information Center NOAA Rockville Md 1984 p61 p

50 Knight NL Wang J Rizos C Generalised measures of reliability for multiple outliers Journal of Geodesy2010 84 625ndash635 doi101007s00190-010-0392-4

51 Gui Q Li X Gong Y Li B Li G A Bayesian unmasking method for locating multiplegross errors based on posterior probabilities of classification variables J Geod 2011 85 191ndash203doi101007s00190-010-0429-8

52 Klein I Matsuoka MT Guzatto MP Nievinski FG An approach to identify multipleoutliers based on sequential likelihood ratio tests Surv Rev 2017 49 449ndash457[httpsdoiorg1010800039626520161212970] doi1010800039626520161212970

53 Hawkins DM Identification of Outliers 1 ed Springer Netherlands 1980 httpsdoiorg101007978-94-015-3994-4

54 Baarda W Statistical concepts in geodesy Publ on geodesy New Series 1967 255 Imparato D Teunissen P Tiberius C Minimal Detectable and Identifiable Biases for quality

control Surv Rev 2019 51 289ndash299 [httpsdoiorg1010800039626520181437947]doi1010800039626520181437947

56 Teunissen PJG Distributional theory for the DIA method Journal of Geodesy 2018 92 59ndash80doi101007s00190-017-1045-7

57 Foumlrstner W Reliability and discernability of extended Gauss-Markov models Seminar on mathematicalmodels of GeodeticPhotogrammetric Point Determination with Regard to Outliers and Systematic Errors 1983 Vol Series A pp 79ndash103

58 Wang J Knight NL New Outlier Separability Test and Its Application in GNSS Positioning 201259 Lehmann R Loumlsler M Multiple Outlier Detection Hypothesis Tests versus Model Selection by

Information Criteria J Surv Eng 2016 142 0401601760 Teunissen PJG Imparato D Tiberius CCJM Does RAIM with Correct Exclusion Produce Unbiased

Positions Sensors 2017 17 doi103390s1707150861 Lehmann R Loumlsler M Congruence analysis of geodetic networks ndash hypothesis tests versus model

selection by information criteria J Appl Geod 2017 11 271ndash283 doi101515jag-2016-0049

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

39 of 40

62 Rofatto V Matsuoka M Klein I DESIGN OF GEODETIC NETWORKS BASED ON OUTLIERIDENTIFICATION CRITERIA AN EXAMPLE APPLIED TO THE LEVELING NETWORK Bull Geod Sci2018 24 152ndash170 httpdxdoiorg101590s1982-21702018000200011

63 Nguyen VK Renault Milocco R Environment Monitoring for Anomaly Detection System UsingSmartphones Sensors 2019 19 doi103390s19183834

64 Nie Y Yang L Shen Y Specific Direction-Based Outlier Detection Approach for GNSS Vector NetworksSensors 2019 19 doi103390s19081836

65 Klein I Matsuoka MT Guzatto MP Nievinski FG Veronez MR Rofatto VF A newrelationship between the quality criteria for geodetic networks Journal of Geodesy 2019 93 529ndash544doi101007s00190-018-1181-8

66 Mierlo JV Statistical Analysis of Geodetic Measurements for the Investigation of Crustal Movements InRecent Crustal Movements 1977 Whitten C Green R Meade B Eds Elsevier 1979 Vol 13 Developmentsin Geotectonics pp 457 ndash 467 doihttpsdoiorg101016B978-0-444-41783-150072-6

67 van der Marel H Roumlsters AJM Statistical Testing and Quality Analysis in 3-D Networks (part II)Application to GPS Global Positioning System An Overview Bock Y Leppard N Eds Springer NewYork New York NY 1990 pp 290ndash297

68 Yang L Wang J Knight NL Shen Y Outlier separability analysis with a multiple alternative hypothesestest J Geod 2013 87 591ndash604 doi101007s00190-013-0629-0

69 Proacuteszynski W Revisiting Baardarsquos concept of minimal detectable bias with regard to outlier identifiabilityJ Geod 2015 89 993ndash1003 doi101007s00190-015-0828-y

70 Romano JP Wolf M Multiple Testing of One-Sided Hypotheses Combining Bonferroni and the BootstrapPredictive Econometrics and Big Data Kreinovich V Sriboonchitta S Chakpitak N Eds SpringerInternational Publishing Cham 2018 pp 78ndash94

71 Teunissen P An Integrity and Quality Control Procedure for use in Multi Sensor Integration 199072 Aydin C Demirel H Computation of Baardarsquos lower bound of the non-centrality parameter J Geod

2004 78 437ndash441 doi101007s00190-004-0406-173 Aydin C Power of Global Test in Deformation Analysis J Surv Eng 2012 138 51ndash56 https

doiorg101061(ASCE)SU1943-5428000006474 Bonferroni C Teoria statistica delle classi e calcolo delle probabilitagrave Pubblicazioni del R Istituto superiore di

scienze economiche e commerciali di Firenze Libreria internazionale Seeber 193675 Box GEP Muller ME A Note on the Generation of Random Normal Deviates The Annals of Mathematical

Statistics 1958 29 610ndash611 httpsdoi101214aoms117770664576 Lemeshko BY Lemeshko SB Extending the Application of Grubbs-Type Tests in Rejecting Anomalous

Measurements Measurement Techniques 2005 48 536ndash547 doi101007s11018-005-0179-977 Lehmann R Scheffler T Monte Carlo based data snooping with application to a geodetic network J

Appl Geod 2011 5 123ndash134 httpsdoiorg101515JAG201101478 Matsumoto M Nishimura T Mersenne twister A 623-dimensionally equidistributed uniform

pseudo-random number generator ACM Transactions on Modeling and Computer Simulation 1998 8 3ndash30httpsdlacmorgcitationcfmid=272991

79 Ihde J Saacutenchez L Barzaghi R Drewes H Foerste C Gruber T Liebsch G Marti U Pail R SiderisM Definition and Proposed Realization of the International Height Reference System (IHRS) Surveys inGeophysics 2017 38 549ndash570 doi101007s10712-017-9409-3

80 Lehmann R Observation error model selection by information criteria vs normality testing StudiaGeophysica et Geodaetica 2015 59 489ndash504 doi101007s11200-015-0725-0

81 Algarni DA Ali AE Heighting and Distance Accuracy with Electronic Digital Levels Journal of King SaudUniversity - Engineering Sciences 1998 10 229 ndash 239 doihttpsdoiorg101016S1018-3639(18)30698-6

82 Gemin ARS Matos AS Faggion PLAs APPLICATION OF CALIBRATION CERTIFICATE OFDIGITAL LEVELING SYSTEMS IN THE MONITORING OF STRUCTURES A CASE STUDY AT THEGOVERNADOR JOSAtilde RICHA HYDROELECTRIC POWER PLANT - PR Boletim de CiAtildeGeodAtilde 201824 235 ndash 249

83 Takalo M Rouhiainen P Development of a System Calibration Comparator for Digital Levels in FinlandNordic Journal of Surveying and Real Estate Research 1 1

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

40 of 40

84 Wiedemann W Wagner A Wunderlich T Using IATS to Read and Analyze Digital Levelling Staffs SIG2016 Paar R Marendic A Zrinjski M Eds Croatian Geodetic Society 2016 pp 515ndash526

85 PJG T The Geometry of Geodetic Inverse Linear Mapping and Non-linear Adjustment Publications onGeodesy New Series 1985 8

86 Rao CR Mitra SK Generalized inverse of a matrix and its applications 197287 Hekimoglu S Erenoglu RC Sanli DU Erdogan B Detecting Configuration Weaknesses in Geodetic

Networks Survey Review 2011 43 713ndash730 doi101179003962611X13117748892632

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

Page 11: Effects of Hard and Soft Equality Constraints on

11 of 40

Inp

ut

Functional amp Stochastic Models Significance level (αrsquo) Outlier magnitude range (nablai) Probability of correct detection ( ) Probability of correct identification ( )

CD CI

Gen

erat

e

ran

do

m d

raw

s

Generate a sample of random errors

119942~119925(120782 Q119942)

Compute the total error ε = e + cinablai

Compute the least-squares residual vector e = Rε

Iterativ

e ou

tlier elimin

ation

pro

cess

Compute the maximum w-test statistic

max wi

max wi gt k

End

Identify and remove the measurement from the dataset

Size of max wi gt 1

det ATWA gt 0

Compute w-test statistics

Compute and store nCI nMD nWE nover+ nover- and nol

Co

mp

ute th

e pro

bab

ilities

Number of Monte Carlo experiments (m) = 200000

Compute the probabilities CI MD WE over+ over- and ol

Sensitivity indicators Outlier detection (MDB)

amp Outlier identification (MIB)

YES NO

YES

NO

YES NO

YES

NO

Figure 1 Flowchart of the algorithm to compute the probability levels of Iterative Data Snooping (IDS)for each measurement in the presence of an outlier [45]

The steps displayed in Figure 1 are detailed as follows

1 First random error vectors are synthetically generated on the basis of a multivariate normaldistribution because the assumed stochastic model for random errors is based on the matrixcovariance of the observations Here we use the Mersenne Twister algorithm [78] to generate asequence of random numbers and BoxndashMuller [75] to transform it into a normal distribution

2 The total error (ε) is a combination of random errors and its corresponding outlier is given asfollows

ε = e + cinablai (40)

The magnitude intervals of simulated outliers are user-defined The magnitude intervals arebased on the standard deviation of the observation (σ) eg |3σ| to |6σ| Since the outlier can bepositive or negative the proposed algorithm randomly selects the signal of the outlier (for q = 1)Here we use the discrete uniform distribution to select the signal of the outlier

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

12 of 40

In Equation (40) e is the random error generated from the normal distribution according toEquation (2) and the second part cinablai is the additional parameter that describes the alternativemodel according to Equation (40)

3 Next we compute the least-squares residuals vector according to Equation (3) but now we usethe total error (ε) in Equation 40 as follows

e = Rε (41)

4 Run the IDS In this step the test statistic is computed according to (9) Then the maximumtest statistic value is obtained according to Equation (10) Now the decision rule is based on thecritical value k computed by Monte Carlo in previous stage After identifying the measurementsuspected to be the most likely outlier it is excluded from the model and least-squares estimationand data snooping are applied iteratively until there are no further outliers identified in thedataset If two or more observations are simultaneously detected (ie if maxminusw gt k and the sizeof maxminusw gt 1 then the IDS is ended) Furthermore every time that a measurement suspectedto be the most likely outlier is removed from the model we check whether the normal matrixATW A is invertible or not If the determinant of ATW A is 0 det|ATW A| = 0 then there isa necessary and sufficient condition for a square matrix ATW A to be non-invertible In otherwords the IDS is ended when det|ATW A| = 0 If no outlier is detected (ie maxminusw lt k) thenthe IDS is also ended

The IDS procedure is performed for m experiments of random error vectors for each experimentcontaminated by an outlier in the ith measurement Therefore for each measurementcontaminated by an outlier there are υ = 1 m experiments

5 After running m=200000 experiments [47] the probabilities associated with IDS are computed(note m refers to the total number of Monte Carlo experiments) The probability of correctidentification (PCI) is the ratio between the number of times that the outlier is correctly identified(denoted as nCI) and m experiments ie

PCI =nCIm

(42)

Similar to Equation (42) the wrong decisions are computed as

PMD =nMD

m(43)

where nMD is the number of experiments in which IDS does not detect the outlier (PMDcorresponds to the rate of missed detection)

PWE =nWE

m(44)

where nWE is the number of experiments in which the IDS procedure flags and removes onlyone single non-outlying measurement while the lsquotruersquo outlier remains in the dataset (PWE is thewrong exclusion rate)

Pover+ =nover+

m(45)

where nover+ is the number of experiments in which IDS correctly identifies and removes theoutlying measurement and others and Pover+ corresponds to its probability

Poverminus =noverminus

m(46)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

13 of 40

where noverminus represents the number of experiments in which IDS identifies and removes morethan one non-outlying measurement whereas the lsquotrue outlierrsquo remains in the dataset (Poverminus isthe probability corresponding to this error probability class)

Pol =nolm

(47)

where nol is the number of experiments in which the detector in Equation (10) flags two (or more)measurements simultaneously during a given iteration of IDS Here this is referred to as thenumber of statistical overlap nol and Pol corresponds to its probability

6 Finally the sensitivity indicators (MDB and MIB) are computed based on Equation 33 andEquation 34 respectively

In this paper the probability levels associated with IDS were computed for each observationindividually and for each outlier magnitude However they were grouped into clusters based onnumber of local redundancy (ri) and maximum absolute correlation between the w-test statistics(ρwi wj ) Furthermore we take care to control the family-wise error rate

4 Case study of Levelling Geodetic Network

The analyse of the constraints effects on the probability levels of IDS are performed by an exampleof a levelling geodetic network

A levelling geodetic network is a set of points located on the Earthrsquos surface or near it andinterconnected by height difference measurements Some of these points are associated with a verticalheight reference system (ie some points have known height) whereas others are parameters (unknownheights) The term known here means that those points with known heights are constraints necessaryto ensure that parameters (unknown heights) are sufficiently estimable In the sense of modern geodeticreference systems realisation and unification of a vertical height system consists of a combination ofGNSS (Global Navigation Satellite System) and spirit levelling with geoid models We will not gointo the details about height geodetic reference system and frame (Readers who wish to have furtherdetails on that issue please see eg [79])

The mathematical model associated with a geodetic levelling network is linear (ie the levellingmeasurements bear a known linear relationship with the unknown heights) Geodetic networksusually has more measurements than parameters (ie n gt u) ie we have a redundant measurementsystem However due to intrinsic random errors in a system redundant measurements often leadto an inconsistent system of equations To make the system consistent we have to introduce theinformation about the random measurement errors The stochastical properties of the measurementerrors are directly associated with the assumption of the probability distribution of these errors Ingeodesy and many other scientific branches the well-known normal distribution is one of the mostused as measurement error model [80] Because of this the model ceases to be purely mathematicaland becomes a statistical model with functional and stochastic part

In the absence of outliers ie under null hypothesisH0 in Equation (1) the model for levellinggeodetic network can be written as follows

∆himinusj + e∆himinusj= hj minus hi (48)

where ∆himinusj is the height difference measured from point i to j and e∆himinusjis the random error associated

with the levelling measurement For a only one single levelling line one of these points is the constraint(known height) from which the height of another point is determined The point with known height isalso referred to as control point or benchmark In a geodetic network on the other hand we have a setof levelling lines that connect both from points of unknown height and from points of known height(constraints) Under normal working conditions (ie H0) the measurement errors model is then givenby Equation (2)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

14 of 40

In this case study we demonstrate the application of the proposed algorithm by [45] subject to adifferent constraints scenarios for that levelling geodetic network

Recent work has been focused on non-stochastic constraints (hard constraints) Here we alsoconsider cases where constraints are subject to a random errors (ie soft constraints) That softconstraints are non-deterministic and therefore they are measured model elements that can bedesignated as a priori information The approach of this example can be applied in the designstage of geodetic network analysis [6265] Furthermore the experiments performed here can also beextended to geodetic deformation analysis when the deformation effects are unmodelled in the senseof deterministic form [28]

41 Problem description

To analyse the constraints effects on the IDS an example is taken from a geodetic levellingnetwork with 12 height differences between the points The equipment used to measure the leveldifference can be an electronic digital level In that case the levelling measurement system comprisesof a special bar-coded staff (also called barcode rod) and a digital level (instrument) A digital level isbasically a telescope that enables a horizontal line of sight Digital levels consist of additional electronicimage processing components to automatically read and analyse digital (bar coded) levelling staffswhere the graduation is replaced by a manufacturer dependent code pattern Generally the result isautomatically stored in the data collector of the digital level An example of a digital level ndash bar-codestaff system is displayed in the Figure 2 For more details about digital level see eg [81ndash84]

Figure 2 Example of a digital level ndash bar-code staff system [45]

The standard-deviation of the uncorrelated measurements were the same and taken equal toσ = 1mm The points are indicated as A to G The eight network configuration are displayed in Figure3(a b c d and e) and their details are given as follows

1 Figure 3(a) Network with 1 hard constraint (ie network minimally constrained) Since thedimension of the network is 1D the minimum information necessary to estimate the unknownheights is one The height of G was fixed as control point (hard constraint) and 6 unknown heights(ABCDEF) were minimally constrained Therefore the redundancy of the system (or overalldegrees of freedom) was r = nminus rank(A) = nminus u = 12minus 6 = 6

2 Figure 3(b) Network with one extra hard constraint (ie two hard constraints) The heights A andD were taken as hard constraints (ie heights A and D were fixed) The redundancy of the system

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

15 of 40

in that case was r = 12minus 5 = 7 with 5 unknown heights (BCEFG) over-constrained

3 Figure 3(c) Network with two extra hard constraints (ie three hard constraints)The heights A Dand G were taken as hard constraints In that case the redundancy of the system was r = 12minus 4 = 8

4 Figure 3(d) Network with two soft constraints (A and D) In that case a standard-deviation largerthan zero was assigned to both constraints (ie σc gt 0 In other words A and D were processedas being both observations and unknown parameters ie A and D were pseudo-observationsHere the both constraints were simultaneously relaxed by considering their uncertainties 10 timesworse than measurements (ie σc = 10times σ = 10mm) 10 times better than measurements (ieσc = 01mm) and their uncertainties equal to the measurements (σc = 1mm) In that case theredundancy of the system was r = 14minus 7 = 7

5 Figure 3(e) Network processed with A D and G as pseudo-observations Those three constraintswere simultaneously relaxed by considering their standard-deviations equal to σc = 10mm (10 timesworse than measurements) σc = 01mm (10 times better than measurements) and σc = 1mm (thesame as the measurements) In that case the redundancy of the system was r = 15minus 7 = 8

y1

y2

y3

y4

y5

y6

y7 y8

y9 y10

B C

D

E F

G y12 y11

A

y1

y2

y3

y4

y5

y6

y7 y8

y9 y10

A

B C

D

E F

G y12 y11

y1

y2

y3

y4

y5

y6

y7 y8

y9 y10

A

B C

D

E F

G y12 y11

(a) (b)

(c) (d)

(e)

Unknown Heights Soft Constraints Hard Constraints

y1

y2

y3

y4

y5

y6

y7 y8

y9 y10

A

B C

D

E F

G y12 y11

y1

y2

y3

y4

y5

y6

y7 y8

y9 y10

A

B C

D

E F

G y12 y11

Figure 3 Levelling Geodetic Network subject to different constraint scenarios

The following system of equations for that problem is given by

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

16 of 40

y1 + e1 = hB minus hA

y2 + e2 = hC minus hB

y3 + e3 = hD minus hC

y7 + e7 = hB minus hG

y8 + e8 = hC minus hG

y11 + e11 = hB minus hF

y12 + e12 = hC minus hE

(49)

The functional model (A) for the system of equations in 49 is given by

A =

minus1 1 0 0 0 0 00 minus1 1 0 0 0 00 0 minus1 1 0 0 00 0 0 minus1 1 0 00 0 0 0 minus1 1 01 0 0 0 0 minus1 00 1 0 0 0 0 minus10 0 1 0 0 0 minus10 0 0 0 1 0 minus10 0 0 0 0 1 minus10 1 0 0 0 minus1 00 0 1 0 minus1 0 0

(50)

Note that the rank defect of the matrix A is uminus rank(A) = 7minus 6 = 1 In that case there is neededto add at least one constraint in order to avoid rank deficiency of the matrix A This is guaranteedwhen we take one height as known For example from the network in Figure 3(a) we have added theheight G as known (ie as a hard constraint) In that case the constraint equation should be addedinto the system in 49 ie

y13 = hG with σy13 = 0 (51)

noticing that because the standard deviation is zero the observation is non-stochastic (hard constraint)and the residual ey13 = 0 This can generate problems in the inversion of the covariance matrix of theobservations Qe for the calculation of the weight matrix W because the weight for that constraintwould be 1

0 = infin In order to avoid that problem we have eliminated the rank deficiency of matrixA by removing the seventh column of matrix A in 50 associated with the height G Now we haveuminus rank(A) = 6minus 6 = 0

The constraint defines the geodetic datum ie the S-system ([85]p41) Another approach tosolving the system of equations in 2 could be based on generalised (pseudo) inverses (see eg [86])

The location of the constraints can be chosen in some circumstances for example during thedesign stage of a geodetic network For the special case of having a minimally constrained systemthe location of the constraint will not influence the w-test statistics and the sensitivity indicators (MIBand MDB) [18] However more constraints than the minimum necessary to have a solution (ie extraconstraints or redundant constraints) can change the least-squares residuals and hence w-test statisticsand the minimal biases

From the network with one extra constraint (2 constraints) in Figure 3(b) for example the bothfirst (height A) and fourth column (height D) of matrix A in 50 were eliminated in the case of having

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

17 of 40

the two heights as hard constraints For the case where these two heights (A and D) were taken as softconstraints however two observation equations were added to Equation 49 ie

y13 + e13 = hA σy13 gt 0

y14 + e14 = hD σy14 gt 0(52)

In the case of soft constraints in Equation 52 two lines were added in matrix A In other wordsA and D were taken as pseudo-observations In that case the rank deficiency was also null (ieuminus rank(A) = 7minus 7 = 0) the redundancy of the system was r = nminus rank(A) = nminus u = 7 and thematrix A was given as follows

A =

minus1 1 0 0 0 0 00 minus1 1 0 0 0 00 0 minus1 1 0 0 00 0 0 minus1 1 0 00 0 0 0 minus1 1 01 0 0 0 0 minus1 00 1 0 0 0 0 minus10 0 1 0 0 0 minus10 0 0 0 1 0 minus10 0 0 0 0 1 minus10 1 0 0 0 minus1 00 0 1 0 minus1 0 01 0 0 0 0 0 00 0 0 1 0 0 0

(53)

For this example of two soft constraints and by considering the both soft constraints withstandard-deviation σc = 10mm the symmetric and positive semi-definite covariance matrix of theobservations (Qe) was given as follows

Qe =

1 0 0 middot middot middot 0 00 1 0 middot middot middot 0 00 0 1 middot middot middot 0 0

0 0 0 middot middot middot 100 00 0 0 middot middot middot 0 100

(54)

The last two rows and columns of the matrix Qe in Equation 54 refer to the variances (σc2 =

(10mm)2 = 100mm2) of the heights constraints A and D respectivelySimilarly matrices A and Qe were constructed for the other cases studied hereAlthough other measurements are able to identify an outlier for the case of having only one single

soft constraint the pseudo-observation (constraint) is not In that case the defect configurationis associated with the additional parameter in the constraint (ie the presence of an outlier inthe constraint) In other words an additional parameter on the soft constraint will not estimableFor example if the height point G was taken as a soft constraint the presence of an outlier inpseudo-observation G would lead to rank deficiency of matrix A ie u minus rank(A) = 8 minus 7 = 1Therefore the case of having only one single soft constraint was not considered here

42 Result of the Hard Constraint Effects on the Iterative Outlier Elimination Procedure

In this section we present the results of the effects of the hard constraint on the IDS The scenariosin Figure 3(a) (network minimally constrained) Figure 3(b) (two hard constraints) and Figure 3(c)(three hard constraints) were considered for the analysis

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

18 of 40

The correlation coefficient between w-test statistics (ρwi wj ) were computed for the three scenariosaccording to Equation(20) Table 1 provides that correlation for that network minimally constrainedTable 2 for that network over-constrained with two hard constraints and Table 3 for that networkover-constrained with three hard constraints

Table 1 Correlation matrix of w-test statistics for the levelling network minimally constrained in Figure3(a)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12

wy1 10 02 01 01 02 10 -02 00 00 02 -04 -01wy2 10 02 02 03 02 05 -05 -02 02 03 -03wy3 10 10 02 01 00 02 -02 00 01 04wy4 10 02 01 00 02 -02 00 01 04wy5 10 02 -02 02 05 -05 03 -03wy6 10 -02 00 00 02 -04 -01wy7 10 -03 -03 -04 -04 -01wy8 10 -04 -03 -01 -04wy9 10 -03 01 04wy10 10 04 01wy11 10 -01wy12 100

Table 2 Correlation matrix of w-test statistics for the levelling network with two hard constraints inFigure 3(b)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12

wy1 10 04 04 -03 -01 04 -03 01 01 01 -04 -01wy2 10 04 -01 01 -01 04 -04 -01 01 03 -03wy3 10 04 -01 -03 -01 03 -01 -01 01 04wy4 10 04 04 01 01 -03 01 01 04wy5 10 04 -01 01 04 -04 03 -03wy6 10 -01 -01 -01 03 -04 -01wy7 10 -04 -03 -04 -04 -01wy8 10 -04 -03 -01 -04wy9 10 -04 01 04wy10 10 04 01wy11 10 -01wy12 10

Table 3 Correlation matrix of w-test statistics for the levelling network with three hard constraints inFigure 3(c)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12

wy1 10 03 01 -01 -01 01 -04 -01 -01 -01 -03 -01wy2 10 03 -01 01 -01 03 -03 -01 01 03 -03wy3 10 01 -01 -01 01 04 01 01 01 03wy4 10 03 01 -01 -01 -04 -01 01 03wy5 10 03 -01 01 03 -03 03 -03wy6 10 01 01 01 04 -03 -01wy7 10 -01 -01 -01 -03 -01wy8 10 -01 -01 -01 -03wy9 10 -01 01 03wy10 10 03 01wy11 10 -01wy12 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

19 of 40

Table 4 gives the local redundancy (ri) the standard-deviation of the LS-estimated outlier σnablai andthe maximum absolute correlation (maxρwi wj

) for each scenario of hard constraint set out of in thisstudy ie Figure 3(ab and c)

Table 4 Local redundancy (ri) standard-deviation of the LS-estimated outlier σnablai and the maximumabsolute correlation (maxρwi wj

) for each scenario of hard constraint

1 hard constraint 2 hard constraints 3 hard constraintsMeasurement ri σnablai maxρwi wj

ri σnablai maxρwi wjri σnablai maxρwi wj

y1 0396 1589 100 0583 1309 036 0708 1188 041y2 0500 1414 047 0583 1309 036 0583 1309 032y3 0396 1589 100 0583 1309 036 0708 1188 041y4 0396 1589 100 0583 1309 036 0708 1188 041y5 0500 1414 047 0583 1309 036 0583 1309 032y6 0396 1589 100 0583 1309 036 0708 1188 041y7 0563 1333 047 0583 1309 036 0708 1188 041y8 0563 1333 047 0583 1309 036 0708 1188 041y9 0563 1333 047 0583 1309 036 0708 1188 041y10 0563 1333 047 0583 1309 036 0708 1188 041y11 0583 1309 043 0583 1309 036 0583 1309 032y12 0583 1309 043 0583 1309 036 0583 1309 032

Here the twelve levelling measurements were clustered into four clusters The clustering wasdefined according to the local redundancy (ri) and the maximum absolute correlation (maxρwi wj

) inTable 4 This is the first time that a clustering technique based on the similarity of local redundancyand the maximum absolute correlation between w-test statistics is applied to a problem of geodeticnetworks Similarly this has been done in [45] The four cluster were defined as follows

bull Cluster 1 y1 y3 y4 and y6bull Cluster 2 y2 and y5bull Cluster 3 y7 y8 y9 and y10bull Cluster 4 y11 and y12

The probability levels associated with IDS were averaged for each of these clusters Here we computethe probability levels of the IDS based on the procedure in Section (3) [45] The critical values werek = 389 k = 393 and k = 393 for 1 hard constraint 2 hard constraints and 3 hard constraintsrespectively These critical values were found for αprime = 0001 according to the procedure described inAppendix The probability of correct identification (PCI) and correct detection (PCD = 1minusPMD)are displayed in Figure 4 for each number of hard constraint (denoted by hc)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

20 of 40

(c) (d)

(e) (f)

(g) (h)

(a) (b)

Figure 4 Probability of correct identification (PCI) and Probability of correct detection (PCD) for thecase of hard constraints and for αprime = 0001 Cluster 1(ab) Cluster 2(cd) Cluster 3(ef) and Cluster4(gh)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

21 of 40

The outlier magnitude were defined from |5σ| to |9σ| The outlier of |5σ| was chosen because itis approximately the lowest MDB0(i) of the network when a single hypothesis testing is in play ascan be seen in Equation (14) or Equation (16) That MDB0(i) of |5σ| was computed for significancelevel of αprime = 0001 and a power of the test γ0 = 08 This strategy reduces the search space for a MIB(Minimal Identifiable Bias) because we will always have the following inequality MIB ge MDB0(i)[4755] Remember that the IDS procedure is an example of multiple hypothesis testing (see 2)

The sensitivity indicators MDB and MIB for IDS were also computed according to Equation (33)and (34) respectively The success rate for outlier detection and outlier identification were taken asbeing PCD = PCI = 08 respectively Table 5 provides the values of MDB and MIB for that case ofhard constraints

Table 5 MDB and MIB for the case of hard constraints based on αprime = 0001 and PCD = PCI = 08

1 hard constraint 2 hard constraints 3 hard constraintsCluster MDB (σ) MIB (σ) MDB (σ) MIB (σ) MDB (σ) MIB (σ)

1 75 - 63 63 57 572 67 68 63 64 63 643 64 64 63 63 58 584 64 64 64 64 64 64

Figure 5 shows the probability of wrong exclusion (PWE) The over-identification cases (Pover+

and Poverminus) were smaller than 0001 (ie they were practically null) There were not statistical overlap(Pol) for clusters 2 3 and 4 We will discuss more about statistical overlap (Pol) later

(c) (d)

(a) (b)

Figure 5 Probability of wrong exclusion (PWE) for the case of hard constraints and for αprime = 0001Cluster 1(a) Cluster 2(b) Cluster 3(c) and Cluster 4(d)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

22 of 40

43 Result of the Soft Constraint Effects on the Iterative Outlier Elimination Procedure

The both configuration in Figure 3(d) and Figure 3(e) were analysed in terms of soft constraintsIn that case the critical values were k = 395 k = 395 and k = 392 for two soft constraints withσc = 01mm σc = 1mm and σc = 10mm respectively In the case of three soft constraints the criticalvalues found were k = 399 k = 399 and k = 396 for σc = 01mm σc = 1mm and σc = 10mmrespectively All these critical values were computed for αprime = 0001

The correlation coefficient between w-test statistics (ρwi wj ) are displayed in Table 6 Table 7and Table 8 for two soft constraints with standard-deviation 10 times larger than measurements(ie σc = 10times σ = 10mm) 10 times better than measurements (ie σc = 01mm) and equal to themeasurements (σc = 1mm) respectively

Table 6 Correlation matrix of w-test statistics for the levelling network with two soft constraints withσc = 10mm in Figure 3(d)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14

wy1 10 02 01 01 02 10 -02 00 00 02 -04 -01 006 -01wy2 10 02 02 03 02 05 -05 -02 02 03 -03 003 00wy3 10 10 02 01 00 02 -02 00 01 04 01 -01wy4 10 02 01 00 02 -02 00 01 04 -01 01wy5 10 02 -02 02 05 -05 03 -03 00 00wy6 10 -02 00 00 02 -04 -01 -01 01wy7 10 -03 -03 -04 -04 -01 00 00wy8 10 -04 -03 -01 -04 00 00wy9 10 -03 01 04 00 00wy10 10 04 01 00 00wy11 10 -01 00 00wy12 10 00 00wy13 10 -10wy14 10

Table 7 Correlation matrix of w-test statistics for the levelling network with two soft constraints withσc = 01mm in Figure 3(d)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14

wy1 10 04 04 -03 -01 04 -03 01 01 01 -04 -01 06 -06wy2 10 04 -01 02 -01 04 -04 -01 01 03 -03 04 -04wy3 10 04 -01 -03 -01 03 -01 -01 01 04 06 -06wy4 10 04 04 01 01 -03 01 01 04 -06 06wy5 10 04 -01 01 04 -04 03 -03 -04 04wy6 10 -01 -01 -01 03 -04 -01 -06 06wy7 10 -04 -03 -04 -04 -01 -02 02wy8 10 -04 -03 -01 -04 02 -02wy9 10 -04 01 04 02 -02wy10 10 04 01 -02 02wy11 10 -01 00 00wy12 10 00 00wy13 10 -10wy14 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

23 of 40

Table 8 Correlation matrix of w-test statistics for the levelling network with two soft constraints withσc = 1mm in Figure 3(d)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14

wy1 10 03 02 -01 01 07 -03 00 01 02 -04 -01 04 -04wy2 10 03 01 03 01 04 -04 -01 01 03 -03 03 -03wy3 10 07 01 -01 00 03 -02 -01 01 04 04 -04wy4 10 03 02 01 02 -03 00 01 04 -04 04wy5 10 03 -01 01 04 -04 03 -03 -03 03wy6 10 -02 -01 00 03 -04 -01 -04 04wy7 10 -03 -03 -04 -04 -01 -01 01wy8 10 -04 -03 -01 -04 01 -01wy9 10 -03 01 04 01 -01wy10 10 04 01 -01 01wy11 10 -01 00 00wy12 10 00 00wy13 10 -10wy14 10

Table 9 gives the local redundancy (ri) the standard-deviation of the LS-estimated outlier σnablai andthe maximum absolute correlation (maxρwi wj

) for the scenarios of two constraints

Table 9 Local redundancy (ri) standard-deviation of the LS-estimated outlier σnablai (mm) and themaximum absolute correlation (maxρwi wj

) for each scenario of two soft constraints

σc = 01mm σc = 1mm σc = 10mmMeasurement ri σnablai maxρwi wj

ri σnablai maxρwi wjri σnablai maxρwi wj

y1 0581 1312 0564 0471 1457 0681 0397 1587 0994y2 0582 1311 0376 0533 1369 0423 0501 1413 0471y3 0581 1312 0564 0471 1457 0681 0397 1587 0994y4 0581 1312 0564 0471 1457 0681 0397 1587 0994y5 0582 1311 0376 0533 1369 0423 0501 1413 0471y6 0581 1312 0564 0471 1457 0681 0397 1587 0994y7 0583 1310 0359 0571 1324 0423 0563 1333 0471y8 0583 1310 0359 0571 1324 0423 0563 1333 0471y9 0583 1310 0359 0571 1324 0423 0563 1333 0471y10 0583 1310 0359 0571 1324 0423 0563 1333 0471y11 0583 1309 0358 0583 1309 0398 0583 1309 0433y12 0583 1309 0358 0583 1309 0398 0583 1309 0433y13 0007 1163 1000 0300 1826 1000 0497 14189 1000y14 0007 1163 1000 0300 1826 1000 0497 14189 1000

From Table 9 five clusters were defined for each case of two soft constraints ie for the casewhere heights A and D were given as soft constraints in Figure 3(d) as follows

bull Cluster 1 y1 y3 y4 and y6bull Cluster 2 y2 and y5bull Cluster 3 y7 y8 y9 and y10bull Cluster 4 y11 and y12bull Cluster 5 y13 and y14

The probabilities of correct identification (PCI) and correct detection (PCD) for the measurements(Cluster 1 to Cluster 4) subject to the scenarios of two soft constraints (heights A and D) are displayedin Figure 6

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

24 of 40

(c) (d)

(e) (f)

(g) (h)

(a) (b)

Figure 6 Probability of correct identification (PCI) and Probability of correct detection (PCD) for themeasurements subject to the scenarios of two soft constraints for αprime = 0001 Cluster 1(ab) Cluster2(cd) Cluster 3(ef) and Cluster 4(gh)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

25 of 40

Note that the Cluster 5 is associated with the two soft constraints (ie y13 and y14) The probabilityof correct identification (PCI) for these both soft constraints were null However the probability ofcorrect detection were not Figure 7 shows the probability of correct detection (PCD) for these two softconstraints (ie heights A and D)

Figure 7 Probability of correct detection PCD for the two soft constraints (Cluster 5 heights A and D)and for αprime = 0001

The probability of wrong exclusion PWE for the measurements (Cluster 1 to Cluster 4) subject tothe scenarios of two soft constraints (heights A and D) are displayed in Figure 8 Figure 9 gives the theprobability of wrong exclusion PWE for that two constraints (ie heights A and D)

(c) (d)

(a) (b)

Figure 8 Probability of wrong exclusion PWE for the measurements subject to the scenarios of two softconstraints for αprime = 0001 Cluster 1(a) Cluster 2(b) Cluster 3(c) and Cluster 4(d)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

26 of 40

Figure 9 Probability of wrong exclusion PWE for the two soft constraints (Cluster 5 heights A and D)and for αprime = 0001

The over-identification cases (Pover+ and Poverminus) and the statistical overlap (Pol) were practicallynull for that case

The sensitivity indicators (MDB and MIB) for each scenario of two soft constraints are displayedin Table 10

Table 10 MDB and MIB for the case of two soft constraints based on αprime = 0001 and PCD = PCI = 08

σc = 10mm σc = 1mm σc = 01mmCluster MDB (σ) MIB (σ) MDB (σ) MIB (σ) MDB (σ) MIB (σ)

1 75 25 7 71 63 632 68 68 66 66 63 633 64 64 64 64 63 634 63 63 63 63 63 635 68 - 88 - 57 -

As with the case of two soft constraints Tables 1112 and 13 show the correlations (ρwi wj ) for thecase where there were three soft constraints ie for the network configuration in Figure 3(e)

Table 11 Correlation matrix of w-test statistics for the levelling network with three soft constraintswith σc = 10mm in Figure 3(e)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14 wy15

wy1 10 02 01 01 02 10 -02 00 00 02 -04 -01 01 00 00wy2 10 02 02 03 02 05 -05 -02 02 03 -03 00 00 00wy3 10 10 02 01 00 02 -02 00 01 04 00 -01 00wy4 10 02 01 00 02 -02 00 01 04 00 01 00wy5 10 02 -02 02 05 -05 03 -03 00 00 00wy6 10 -02 00 00 02 -04 -01 -01 00 00wy7 10 -03 -03 -04 -04 -01 00 00 00wy8 10 -04 -03 -01 -04 00 00 00wy9 10 -03 01 04 00 00 00wy10 10 04 01 00 00 00wy11 10 -01 00 00 00wy12 10 00 00 00wy13 10 -05 -05wy14 10 -05wy15 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

27 of 40

Table 12 Correlation matrix of w-test statistics for the levelling network with three soft constraintswith σc = 01mm in Figure 3(e)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14 wy15

wy1 10 03 01 -01 -01 01 -04 -01 -01 -01 -03 -01 07 -01 -04wy2 10 03 -01 01 -01 03 -03 -01 01 03 -03 03 -03 00wy3 10 01 -01 -01 01 04 01 01 01 03 01 -07 04wy4 10 03 01 -01 -01 -04 -01 01 03 -01 07 -04wy5 10 03 -01 01 03 -03 03 -03 -03 03 00wy6 10 01 01 01 04 -03 -01 -07 01 04wy7 10 -01 -01 -01 -03 -01 -04 -01 04wy8 10 -01 -01 -01 -03 -01 -04 04wy9 10 -01 01 03 -01 -04 04wy10 10 03 01 -04 -01 04wy11 10 -01 00 00 00wy12 10 00 00 00wy13 10 -02 -06wy14 10 -06wy15 10

Table 13 Correlation matrix of w-test statistics for the levelling network with three soft constraintswith σc = 1mm in Figure 3(e)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14 wy15

wy1 10 03 01 00 01 06 -03 00 00 01 -04 -01 05 -02 -02wy2 10 03 01 03 01 04 -04 -01 01 03 -03 02 -02 00wy3 10 06 01 00 00 03 -01 00 01 04 02 -05 02wy4 10 03 01 00 01 -03 00 01 04 -02 05 -02wy5 10 03 -01 01 04 -04 03 -03 -02 02 00wy6 10 -01 00 00 03 -04 -01 -05 02 02wy7 10 -03 -02 -03 -04 -01 -02 00 02wy8 10 -03 -02 -01 -04 00 -02 02wy9 10 -03 01 04 00 -02 02wy10 10 04 01 -02 00 02wy11 10 -01 00 00 00wy12 10 00 00 00wy13 10 -04 -05wy14 10 -05wy15 10

Table 14 Local redundancy (ri) standard-deviation of the LS-estimated outlier σnablai (mm) and themaximum absolute correlation (maxρwi wj

) for each scenario of three soft constraints

σc = 01mm σc = 1mm σc = 10mmMeasurement ri σnablai maxρwi wj

ri σnablai maxρwi wjri σnablai maxρwi wj

y1 0702 1194 0660 0502 1411 0577 0398 1586 0992y2 0582 1311 0326 0533 1369 0412 0501 1413 0470y3 0702 1194 0660 0502 1411 0577 0398 1586 0992y4 0702 1194 0660 0502 1411 0577 0398 1586 0992y5 0582 1311 0326 0533 1369 0412 0501 1413 0470y6 0702 1194 0660 0502 1411 0577 0398 1586 0992y7 0704 1192 0415 0602 1289 0412 0563 1333 0470y8 0704 1192 0415 0602 1289 0412 0563 1333 0470y9 0704 1192 0415 0602 1289 0412 0563 1333 0470y10 0704 1192 0415 0602 1289 0412 0563 1333 0470y11 0583 1309 0326 0583 1309 0385 0583 1309 0433y12 0583 1309 0326 0583 1309 0385 0583 1309 0433y13 0012 0904 0660 0425 1534 0542 0663 12283 0501y14 0012 0904 0660 0425 1534 0542 0663 12283 0501y15 0019 0718 063 05 1414 0542 0665 12268 0501

The probabilities of correct identification (PCI) and correct detection PCD for the measurements(Cluster 1 to Cluster 4) subject to the scenarios of three soft constraints (heights A D and G) aredisplayed in Figure 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

28 of 40

(c) (d)

(e) (f)

(g) (h)

(a) (b)

Figure 10 Probability of correct identification (PCI) and Probability of correct detection (PCD) for themeasurements subject to the scenarios of three soft constraints for αprime = 0001 Cluster 1(ab) Cluster2(cd) Cluster 3(ef) and Cluster 4(gh)

The probabilities of correct identification (PCI) and correct detection (PCD) in Figure 8 werecomputed for the clusters based on Table 14 as follows

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

29 of 40

bull Cluster 1 y1 y3 y4 and y6bull Cluster 2 y2 and y5bull Cluster 3 y7 y8 y9 and y10bull Cluster 4 y11 and y12bull Cluster 5 y13 and y14bull Cluster 6 y15

Figure 11 shows the probabilities of correct identification PCI and correct detection PCD for thethree soft constraints ie for Cluster 5 (heights A and D) and Cluster 6 (height G) in Figure 3(e)

(c) (d)

(a) (b)

Figure 11 Probability of correct identification (PCI) and Probability of correct detection (PCD) for thethree constraints and for αprime = 0001 Cluster 5(ab) and Cluster 6(cd)

The probability of wrong exclusion PWE for the measurements (Cluster 1 to Cluster 4) subject tothe scenarios of three soft constraints (heights A D and G) are displayed in Figure 12 Figure 13 givesthe the probability of wrong exclusion PWE for that three constraints (ie heights A D and G)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

30 of 40

(c) (d)

(a) (b)

Figure 12 Probability of wrong exclusion PWE for the measurements subject to the scenarios of threesoft constraints and for αprime = 0001 Cluster 1(a) Cluster 2(b) Cluster 3(c) and Cluster 4(d)

(a) (b)

Figure 13 Probability of wrong exclusion (PWE) for the three constraints and for αprime = 0001 Cluster5(a) and Cluster 6(b)

The over-identification cases (Pover+ and Poverminus) and the statistical overlap (Pol) were alsopractically null for that case of three soft constraints

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

31 of 40

The sensitivity indicators (MDB and MIB) for each scenario of three soft constraints are displayedin Table 15

Table 15 MDB and MIB for the case of three soft constraints based on αprime = 0001 and PCD = PCI = 08

σc = 10mm σc = 1mm σc = 01mmCluster MDB (σ) MIB (σ) MDB (σ) MIB (σ) MDB (σ) MIB (σ)

1 75 22 68 69 58 592 68 69 66 67 64 643 64 64 63 63 58 584 63 63 63 63 63 635 59 60 74 75 435 456 59 59 69 69 346 355

5 Discussion

We start from the scenario of one hard constraint in Figure 3(a) Table 4 reveal that the maximumcorrelation between w-test statistics for the measurements constituting Cluster 1 is exactly equal to100 (ie maxρwi wj

= 100) This means that the measurements belonging to Cluster 1 are those thatare connected with unknown heights whose connections are limited to only two The both unknownheights A and D are tied only to two measurements (ie y1 and y6 linked to A and y3 and y4 linked toD) therefore if an outlier occurred in one of these measurements we would only be able to analyse theconsistency between them but we would not be able to distinguish which of them was contaminatedby an outlier This means that we would only be able to detect because the w-test statistics could belarger than a critical value k however in that case the values of w-test statistics would be the same andtherefore we would not have only one unique maximum w-test statistics but actually would have fourmaximum w-test statistics In other words the equation systems associated with the measurements ofCluster 1 are linearly dependent [87] Therefore there is no reliability in terms of outlier identificationfor the Cluster 1 as can be seen in Figure 3(a)

From Figure 3(b) we can note that there is reliability in terms of outlier detection for Cluster 1and it is caused by overlapping w-test statistics The probability of statistics overlap Pol for Cluster 1in this scenario of minimally constrained network is displayed in Figure 14

Figure 14 Probability of correct detection PCD and statistical overlap (Pol) for the Cluster 1 subject toone hard constraint and for αprime = 0001

The problem of not having more connections (ie more measurements) for the unknown heightsA and D in the case of one hard constraint with G fixed is overcome when these heights (A and D) aretaken as hard constraints in 3(b) or when the heights A D and G are hard constraints in figure 3(c)Figure 3(a b) show that the measurements of Cluster 1 are now able to identify an outlier when twohard constraints (A and D fixed) are in play It is also verified to the case of three hard constraints (A Dand G fixes) in Figure 3(e f) ie there is reliability in terms of both outlier detection and identificationfor these measurements in those conditions

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

32 of 40

One can argue that the more constraints the larger the probability of correct identification (PCI)and correct detection (PCD) This claim is not entirely correct This may be true in case where all pointson the network have the same number of connections However in the example given if both heightsB and G were selected as hard constraints the measurements of the Cluster 1 (ie y1 y6 y3 and y4)would still have their least-squares residuals fully correlated (ie ρwy1 wy6

= 100 and ρwy3 wy4= 100)

and therefore there would be no reliability in terms of outlier identification for these measurementsFrom Table 5 we observe different behaviour for the clusters as follows

bull Cluster 1 there was no MIB for the case of having only one single hard constraint whereasMDB = MIB for the other cases However both MDB and MIB decrease significantly with theincrease in the number of hard constraints

bull Cluster 2 MDB slightly smaller than MIB Both MDB and MIB were practically the same for thecase of having two or three hard constraints

bull Cluster 3 MDB = MIB for all cases of hard constraints however both MDB and MIB decreasesignificantly with the increase in the number of hard constraints

bull Cluster 4 MDB e MIB were equal for all cases

In terms of outlier detection and identification therefore Cluster 1 was more sensitive toconstraints Cluster 3 relatively sensitive to constraints whereas the Cluster 4 completely insensitiveto constraints and Cluster 2 relatively insensitive to constraints This is also can be seen in Figure 4The reason for this is that the local redundancy (ri) of the Cluster 1 increased with the increase of thenumber of hard constraints whereas the Cluster 4 have remained the same as can be seen in Table 4

Leaving aside the cases of statistical overlap (Pol) the network presents low least-squares residualscorrelation (ρwi wj lt 05) and high local redundancy (ri gt 05) Because of this the probabilities ofwrong exclusion (PWE) were less than 1 as can be seen in Figure 5 The over-identification cases(Pover+ and Poverminus) were practically null Consequently PCI asymp PCD Due of this fact the family-wiseerror rate (αprime) should be increased in order to have more success rate in the outlier detection andidentification [45]

From Figure 15 we observe that increasing the αprime increases the both probabilities of correctidentification (PCI) and correct detection (PCD) for outlier magnitude from 5σ to 6σ in the case ofthree hard constraints and from 5σ to 68σ in the case of two hard constraints Although the rates ofPover+ and PWE also increase they are not significant when compared to the improvement of correctidentification (PCI) and detection (PCD) This same analysis can be done for the other clusters

In terms of soft constraints for the cases of two constraints in Figure 3(d) we observe from Table 9that the larger the relaxation of the constraint (ie the larger the standard-deviation of the constraintσc) the larger the residuals correlation (ρwi wj ) and the standard-deviation of the outlier σnablai and thesmaller the local redundancy (ri) Consequently the probabilities of correct identification (PCI) anddetection (PCD) get smaller and smaller with the relaxation of the constraints whereas the probabilityof wrong exclusion gets larger (PWE) This can be more clearly verified in Figure 6(a b) and Figure8(a) for Cluster 1 whose measurements are connected with the constraints A and D (ie y13 and y14 inTable 9 respectively)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

33 of 40

(a) (b)

(c) (d)

Figure 15 Probabilities of correct identification PCI (a) correct detection PCD (b) over-identificationpositive Pover+ (c) and wrong exclusion PWE for the Cluster 1 subject to two and three hard constraintsand for αprime = 0001 and αprime = 01

Note from Figure 8(a) that the probability of wrong exclusion (PWE) increases as the magnitude ofthe outlier (nablai) increases However this is true up to a certain limit of outlier magnitude The effect ofresiduals correlation ρwi wj on the rates of wrong exclusion (PWE) and correct identification (PCI) tendsto decrease with the increase in the magnitude of the outlier nablai This effect is more clearly verified forCluster 1 in case where the precision of the constraints are ten times worse than the measurementsσc = 10σ = 10mm

Note from Figure 6 that identifying an outlier in the Cluster 1 (ie y1 y3 y4 and y6) whenσc = 10mm is more difficult than the other clusters This is due to the fact the Cluster 1 has a higherresiduals correlation ρwi wj = 0994 than other clusters In general therefore we observe that the largerthe relaxation of the constraints the larger is the effect of the correlation ρwi wj on the success rate ofoutlier identification (PCI) Consequently the higher the sensitivity indicator for outlier identification(MIB) Table 10 reveals that the ratio between MIB and MDB for the Cluster 1 and for the scenariowhere the standard-deviations of that two soft constraints are σc = 10mm is MIBMDB=2575=33On the other hand the relationship between MIB and MDB is practically one (ie MIBMDB=10) forthe others scenarios

If the FWE rate (αprime) were increased for the case where the two soft constraints of σc = 10mmare in play we would not have great advantages for Cluster 1 due to its high residuals correlation(ρwi wj = 994) From Figure 16 we can observe that the probability of correct identification (PCI)for outlier magnitudes from 5σ to 8σ is effectively larger to a user-defined αprime = 01 than αprime = 0001However the success rate is still less than 80 ie PCI lt 08 Note for example the correct

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

34 of 40

identification rate is PCI = 56 for an outlier magnitude of nablai = 8σ and αprime = 01 For αprime = 01 theMIB = 335σ = 335mm whereas for αprime = 0001 is MIB = 25σ = 25mm Therefore in that case theMIB for PCI = 08(80) and αprime = 01 would be 34 larger than user-defined αprime = 0001

Figure 16 Probabilities of correct identification (PCI) for the Cluster 1 subject to two soft constraints (2sc) A and D for αprime = 0001 and αprime = 01

The soft constraints A and D were grouped in the Cluster 5 (ie A and D were treated aspseudo-observations in the model) There is no reliability in terms of outlier identification for theconstraints because the residual correlation between them is ρwi wj = 100 as can be seen in Table 9for y13 and y14 However these soft constraints are able to detect an outlier In that case the probabilityof correct detection PCD in Figure 7 is mainly caused by the statistical overlap Pol as can be seen in theexample of the case of σc = 10mm in Figure 17 From Table 10 we observe that the larger the relaxationof the constraints the larger the MDB Note that the values of MDB are given in σ and therefore theMDB for σc = 10mm is larger than σc = 1mm and σc = 01mm ie we had the following inequalityMDB = 68σc = 68times 10mm = 68mm gt MDB = 88σc = 88times 1mm = 88mm gt MDB = 57σc =

57times 01mm = 57mm In that case if the FWE (αprime) were increased the rate of outlier detection by theCluster 4 (ie by the soft constraints) would increase

Figure 17 Probabilities of correct detection PCD and statistical overlap Pol for the two soft constraintsA and D (Cluster 5) with σc = 10mm and for αprime = 0001

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

35 of 40

Similar effects of the relaxation of the constraints on the performance of the IDS in case of two softconstraints are verified in case of three soft constraints as can be seen in the Figures 10 11 12 and 13

In case of having three soft constraints in Figure 3(e) there is reliability in terms of outlieridentification for the three pseudo-observations y13 y14 and y15 (ie for AD and G) as can be seen inFigure 10 and Table 15 In that case we also observe that the probabilities of correct detection PCD ofthe soft constraints A and D (ie Cluster 5) were approximately 13 for σc = 10mm 16 for σc = 1mmand 24 for σc = 01mm larger than the scenario of the network subject to two soft constraints Table15 reveals that the advantage of having three soft constraints instead of two constraints is that theconstraints become identifiable in the presence of an outlier The behaviour of the probabilities ofcorrect detection (PCD) correct identification (PCI) and wrong exclusion (PWE) was similar to the caseof the two soft constraints

Furthermore the larger the relaxation of the constraints the smaller the residuals correlationbetween the measurements and the soft constraints and the larger the residuals correlation amongthe measurements This effect can be verified in Tables 6 7 and 8 for two soft constraints as well as inTables 11 12 and 13 for three soft constraints

In general we observe that the case of two soft constraints for σc = 01mm was comparablewith two hard constraints (see eg Table 5 and Table 10) in terms of the probability levels associatedwith IDS for the measurements (ie clusters 1 2 3 and 4) In the same way for the case of twosoft constraints with σc = 1mm or σc = 10mm the probabilities levels were similar to the one hardconstraint for that measurements with the benefit of two soft constraints having reliability in terms ofoutlier identification for the Cluster 1 Finally the three soft constraints with σc = 1mm and σc = 10mmwere comparable to the two soft constraints for that scenario of constraints relaxation wheres thethree soft constraints for σc = 01mm showed similar outcomes with three hard constraints for themeasurements (see eg Table 5 and Table 15) In that case however an advantage of the three softconstraints on the three hard constraints is the possibility of analysing the sensitivity of the constraintsIt is important to emphasised that the stochastic models of the measurements and constraints wereassumed well-known and defined for the analyzes performed here

6 Conclusions

In general the probability of correct identification (PCI) for the case of hard constraints is largerthan soft constraints It can be verified in Figure 4 Figure 6 and Figure 10 Therefore hard constraintsshould be used in the stage of pre-processing data for the purpose of identifying and removing possibleoutlying measurements In that process one should opt to set out the redundant hard constraints atpoints in the network where the smallest connections exist After identifying and removing possibleoutliers the soft constraints should be employed to propagate the uncertainties of the constraints(pseudobservations) to the model parameters during the process of least-squares estimation Thisrecommendation is valid for outlier detection and identification purpose

We highlight the main findings of this research as follows

bull Under a system of a high local redundancy ri gt 05 and low residuals correlation (ρwi wj lt 05) ifone increase the family-wise error rate (FWE) of the test statistic the performance of the procedurewill be improved for both scenarios of hard constraints and soft constraintsbull The larger the relaxation of the constraints the larger is the effect of the residuals correlation

(ρwi wj ) on the success rate of outlier identification (PCI) Consequently the higher the sensitivityindicator for outlier identification (MIB) and therefore more difficult it becomes to identify anoutlier

bull Under a scenario of soft constraints one should set out at least three soft constraints in order toidentify an outlier in the constraints

In other types of analysis for example deformation analysis of geodetic networks one shouldformulated the constraints as pseudo-observations with σc gt 0 in an adjustment model of system

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

36 of 40

equations In other words the relaxation of constraints makes it possible to estimate deformationeffects that are unmodelled in the deterministic model matrix for example deformations which have aspatial and temporal variations In that case the deformation can be modelled by covariance functionsthat determine the cofactors between constraints [28] This should be applied to control points locatedin the structure but not to control points located outside the structure The control points locatedoutside the structure should be hard constraints and have their stability verified by an independentprocess in order to analyse the deformation of the structure So one will be also able to analyse thesensitivity in terms of minimal detectable deformations andor minimal identifiable deformations [23]Future researches will be addressed to sensitivity analysis for deformation monitoring networks

Author Contributions Conceptualization VFR MTM and IK Data curation VFR MTM and IK Formalanalysis VFR and IK Funding acquisition MTM MRV and LGdSJ Investigation VFR MethodologyVFR and MTM Project administration MTM IK MRV and LGdSJ Software VFR and LGdSJSupervision MTM IK and MRV Validation VFR and MRV Writing ndash original draft VFR Writing ndashreview amp editing VFR MTM IK and MRV All authors have read and agreed to the published version of themanuscript

Funding This research was funded by the CNPqmdashConselho Nacional de Desenvolvimento Cientiacutefico eTecnoloacutegicomdashBrasil (proc no 1035872019-5) This research and the APC were also funded by PETROBRAS(Grant Number 201800545-0)

Acknowledgments In this section you can acknowledge any support given which is not covered by the authorcontribution or funding sections This may include administrative and technical support or donations in kind(eg materials used for experiments)

Conflicts of Interest The authors declare no conflict of interest

References

1 Lehmann R Neitzel F Testing the compatibility of constraints for parameters of a geodetic adjustmentmodel Journal of Geodesy 2013 87 555ndash566 doi101007s00190-013-0627-2

2 Fang X Weighted total least-squares with constraints a universal formula for geodetic symmetricaltransformations Journal of Geodesy 2015 89 459ndash469 doi101007s00190-015-0790-8

3 Amiri-Simkooei AR Weighted Total Least Squares with Singular Covariance Matrices Subject to Weightedand Hard Constraints J Surv Eng 2017 143 04017018

4 Sakic P Ballu V Royer JY A Multi-Observation Least-Squares Inversion for GNSS-Acoustic SeafloorPositioning Remote Sensing 2020 12 doi103390rs12030448

5 Odijk D Stochastic modelling of the ionosphere for fast GPS ambiguity resolution Geodesy Beyond 2000Schwarz KP Ed Springer Berlin Heidelberg Berlin Heidelberg 2000 pp 387ndash392

6 de Oliveira PS Morel L Fund F Legros R Monico JFG Durand S Durand F Modelingtropospheric wet delays with dense and sparse network configurations for PPP-RTK GPS Solutions 201721 237ndash250 doi101007s10291-016-0518-0

7 Chen B Wu L Dai W Luo X Xu Y A new parameterized approach for ionospheric tomography GPSSolutions 2019 23 96 doi101007s10291-019-0893-4

8 Ahmed E Ahmed S Mohamed D Mostafa R jag 2019 Vol 13 chapter Vertical ionospheric delayestimation for single-receiver operation p 81 2 doi101515jag-2018-0041

9 Wang A Chen J Zhang Y Meng L Wang J Performance of Selected Ionospheric Models inMulti-Global Navigation Satellite System Single-Frequency Positioning over China Remote Sensing 201911 doi103390rs11172070

10 Deo M El-Mowafy A Comparison of advanced troposphere models for aiding reductionof PPP convergence time in Australia Journal of Spatial Science 2019 64 381ndash403doi1010801449859620181472046

11 Dai X Lou Y Dai Z Hu C Peng Y Qiao J Shi C Precise Orbit Determination for GNSS ManeuveringSatellite with the Constraint of a Predicted Clock Remote Sensing 2019 11 doi103390rs11161949

12 Courant R Hilbert D Methods of mathematical physics Vol 2 Vol 2 Wiley [New York] 196213 Grafarend EW Optimization of Geodetic Networks The Canadian Surveyor 1974 28 716ndash723

doi101139tcs-1974-0120

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

37 of 40

14 Teunissen P Zero Order Design Generalized Inverses Adjustment the Datum Problem andS-Transformations Optimization and Design of Geodetic Networks Grafarend EW Sansograve F EdsSpringer Berlin Heidelberg Berlin Heidelberg 1985 pp 11ndash55

15 Dermanis A Free network solutions with the DLT method ISPRS Journal of Photogrammetry and RemoteSensing 1994 49 2 ndash 12 doihttpsdoiorg1010160924-2716(94)90061-2

16 Kotsakis C Reference frame stability and nonlinear distortion in minimum-constrained networkadjustment Journal of Geodesy 2012 86 755ndash774 doi101007s00190-012-0555-6

17 Kotsakis C Datum Definition and Minimal Constraints In Encyclopedia of Geodesy Grafarend E EdSpringer International Publishing Cham 2018 pp 1ndash6 doi101007978-3-319-02370-0_157-1

18 Matsuoka MT Rofatto VF Klein I Roberto Veronez M da Silveira LG Neto JBS Alves ACRControl Points Selection Based on Maximum External Reliability for Designing Geodetic Networks AppliedSciences 2020 10 doi103390app10020687

19 Baarda W S-transformations and criterion matrices Publ on geodesy New Series 1973 520 Schaffrin B Aspects of Network Design Optimization and Design of Geodetic Networks Grafarend

EW Sansograve F Eds Springer Berlin Heidelberg Berlin Heidelberg 1985 pp 548ndash59721 Xu PB A general solution in geodetic nonlinear rank-defect models 199722 Altamimi Z Dermanis A The Choice of Reference System in ITRF Formulation VII Hotine-Marussi

Symposium on Mathematical Geodesy Sneeuw N Novaacutek P Crespi M Sansograve F Eds Springer BerlinHeidelberg Berlin Heidelberg 2012 pp 329ndash334

23 Velsink H On the deformation analysis of point fields J Geod 2015 89 1071ndash1087doi101007s00190-015-0835-z

24 Velsink H Extendable linearised adjustment model for deformation analysis Survey Review 201547 397ndash410 doi1011791752270614Y0000000140

25 Hiddo V jag 2016 Vol 10 chapter Time Series Analysis of 3D Coordinates Using NonstochasticObservations p 5 1 doi101515jag-2015-0027

26 Rao CR Markoffrsquos Theorem with Linear Restrictions on Parameters SankhyAuml The Indian Journal ofStatistics (1933-1960) 1945 7 16ndash19

27 Velsink H Testing Methods for Adjustment Models with Constraints Journal of Surveying Engineering2018 144 04018009 doi101061(ASCE)SU1943-54280000260

28 Hiddo V Testing deformation hypotheses by constraints on a time series of geodetic observations J ApplGeod 2017 12 77ndash93 doi101515jag-2017-0028

29 Baarda W A testing procedure for use in geodetic networks Publ on geodesy New Series 1968 230 Teunissen P First and second moments of non-linear least-squares estimators Bull Geodesique (Journal of

Geodesy) 1989 63 253ndash262 doihttpsdoiorg101007BF0252047531 Teunissen P Testing Theory an introduction 2 ed Delft University Press 200632 Amiri-Simkooei A Least-squares variance component estimation theory and GPS applications Phd

thesis Delft University of Technology 200733 Koch KR Parameter estimation and hypothesis testing in linear models 2 ed Springer 199934 Tao H Feng H Xu L Miao M Long H Yue J Li Z Yang G Yang X Fan L Estimation

of Crop Growth Parameters Using UAV-Based Hyperspectral Remote Sensing Data Sensors 2020 20doi103390s20051296

35 Han M Wang Q Wen Y He M He X The Application of Robust Least Squares Method inFrequency Lock Loop Fusion for Global Navigation Satellite System Receivers Sensors 2020 20doi103390s20041224

36 Wei C Estimation for the Discretely Observed CoxacircldquoIngersollacircldquoRoss Model Driven by Small SymmetricalStable Noises Symmetry 2020 12 doi103390sym12030327

37 Sakic P Ballu V Royer JY A Multi-Observation Least-Squares Inversion for GNSS-Acoustic SeafloorPositioning Remote Sensing 2020 12 doi103390rs12030448

38 Livadiotis G General Fitting Methods Based on Lq Norms and their Optimization Stats 2020 3 16ndash31doi103390stats3010002

39 Farooq SZ Yang D Ada ENJ A Cycle Slip Detection Framework for Reliable Single Frequency RTKPositioning Sensors 2020 20 doi103390s20010304

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

38 of 40

40 Araveeporn A Comparing Parameter Estimation of Random Coefficient Autoregressive Model byFrequentist Method Mathematics 2020 8 doi103390math8010062

41 Zhang C Peng T Zhou J Ji J Wang X An Improved Autoencoder and Partial Least SquaresRegression-Based Extreme Learning Machine Model for Pump Turbine Characteristics Applied Sciences2019 9 doi103390app9193987

42 Ji J Yang M Jiang L He J Teng Z Liu Y Song H Output-Only Parameters Identification ofEarthquake-Excited Building Structures with Least Squares and Input Modification Process AppliedSciences 2019 9 doi103390app9040696

43 Kargoll B On the theory and application of model misspecification tests in geodesy Doctoral thesisUniversity of Bonn Landwirtschaftliche Fakultaumlt German Bonn 2007

44 Lehmann R On the formulation of the alternative hypothesis for geodetic outlier detection J Geod 201387 373ndash386 doi101007s00190-012-0607-y

45 Rofatto VF Matsuoka MT Klein I Roberto Veronez M da Silveira LG A Monte Carlo-Based OutlierDiagnosis Method for Sensitivity Analysis Remote Sensing 2020 12 doi103390rs12050860

46 Lehmann R Improved critical values for extreme normalized and studentized residuals in GaussndashMarkovmodels J Geod 2012 86 1137ndash1146 doi101007s00190-012-0569-0

47 Rofatto VF Matsuoka MT Klein I Veronez MR Bonimani ML Lehmann R A half-century ofBaardarsquos concept of reliability a review new perspectives and applications Surv Rev 2018 0 1ndash17[httpsdoiorg1010800039626520181548118] doi1010800039626520181548118

48 Zaminpardaz S Teunissen P DIA-datasnooping and identifiability J Geod 2019 93 85ndash101doi101007s00190-018-1141-3

49 Kok JJ States U On data snooping and multiple outlier testing [microform] Johan J Kok US Dept ofCommerce National Oceanic and Atmospheric Administration National Ocean Service Charting andGeodetic Services For sale by the National Geodetic Information Center NOAA Rockville Md 1984 p61 p

50 Knight NL Wang J Rizos C Generalised measures of reliability for multiple outliers Journal of Geodesy2010 84 625ndash635 doi101007s00190-010-0392-4

51 Gui Q Li X Gong Y Li B Li G A Bayesian unmasking method for locating multiplegross errors based on posterior probabilities of classification variables J Geod 2011 85 191ndash203doi101007s00190-010-0429-8

52 Klein I Matsuoka MT Guzatto MP Nievinski FG An approach to identify multipleoutliers based on sequential likelihood ratio tests Surv Rev 2017 49 449ndash457[httpsdoiorg1010800039626520161212970] doi1010800039626520161212970

53 Hawkins DM Identification of Outliers 1 ed Springer Netherlands 1980 httpsdoiorg101007978-94-015-3994-4

54 Baarda W Statistical concepts in geodesy Publ on geodesy New Series 1967 255 Imparato D Teunissen P Tiberius C Minimal Detectable and Identifiable Biases for quality

control Surv Rev 2019 51 289ndash299 [httpsdoiorg1010800039626520181437947]doi1010800039626520181437947

56 Teunissen PJG Distributional theory for the DIA method Journal of Geodesy 2018 92 59ndash80doi101007s00190-017-1045-7

57 Foumlrstner W Reliability and discernability of extended Gauss-Markov models Seminar on mathematicalmodels of GeodeticPhotogrammetric Point Determination with Regard to Outliers and Systematic Errors 1983 Vol Series A pp 79ndash103

58 Wang J Knight NL New Outlier Separability Test and Its Application in GNSS Positioning 201259 Lehmann R Loumlsler M Multiple Outlier Detection Hypothesis Tests versus Model Selection by

Information Criteria J Surv Eng 2016 142 0401601760 Teunissen PJG Imparato D Tiberius CCJM Does RAIM with Correct Exclusion Produce Unbiased

Positions Sensors 2017 17 doi103390s1707150861 Lehmann R Loumlsler M Congruence analysis of geodetic networks ndash hypothesis tests versus model

selection by information criteria J Appl Geod 2017 11 271ndash283 doi101515jag-2016-0049

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

39 of 40

62 Rofatto V Matsuoka M Klein I DESIGN OF GEODETIC NETWORKS BASED ON OUTLIERIDENTIFICATION CRITERIA AN EXAMPLE APPLIED TO THE LEVELING NETWORK Bull Geod Sci2018 24 152ndash170 httpdxdoiorg101590s1982-21702018000200011

63 Nguyen VK Renault Milocco R Environment Monitoring for Anomaly Detection System UsingSmartphones Sensors 2019 19 doi103390s19183834

64 Nie Y Yang L Shen Y Specific Direction-Based Outlier Detection Approach for GNSS Vector NetworksSensors 2019 19 doi103390s19081836

65 Klein I Matsuoka MT Guzatto MP Nievinski FG Veronez MR Rofatto VF A newrelationship between the quality criteria for geodetic networks Journal of Geodesy 2019 93 529ndash544doi101007s00190-018-1181-8

66 Mierlo JV Statistical Analysis of Geodetic Measurements for the Investigation of Crustal Movements InRecent Crustal Movements 1977 Whitten C Green R Meade B Eds Elsevier 1979 Vol 13 Developmentsin Geotectonics pp 457 ndash 467 doihttpsdoiorg101016B978-0-444-41783-150072-6

67 van der Marel H Roumlsters AJM Statistical Testing and Quality Analysis in 3-D Networks (part II)Application to GPS Global Positioning System An Overview Bock Y Leppard N Eds Springer NewYork New York NY 1990 pp 290ndash297

68 Yang L Wang J Knight NL Shen Y Outlier separability analysis with a multiple alternative hypothesestest J Geod 2013 87 591ndash604 doi101007s00190-013-0629-0

69 Proacuteszynski W Revisiting Baardarsquos concept of minimal detectable bias with regard to outlier identifiabilityJ Geod 2015 89 993ndash1003 doi101007s00190-015-0828-y

70 Romano JP Wolf M Multiple Testing of One-Sided Hypotheses Combining Bonferroni and the BootstrapPredictive Econometrics and Big Data Kreinovich V Sriboonchitta S Chakpitak N Eds SpringerInternational Publishing Cham 2018 pp 78ndash94

71 Teunissen P An Integrity and Quality Control Procedure for use in Multi Sensor Integration 199072 Aydin C Demirel H Computation of Baardarsquos lower bound of the non-centrality parameter J Geod

2004 78 437ndash441 doi101007s00190-004-0406-173 Aydin C Power of Global Test in Deformation Analysis J Surv Eng 2012 138 51ndash56 https

doiorg101061(ASCE)SU1943-5428000006474 Bonferroni C Teoria statistica delle classi e calcolo delle probabilitagrave Pubblicazioni del R Istituto superiore di

scienze economiche e commerciali di Firenze Libreria internazionale Seeber 193675 Box GEP Muller ME A Note on the Generation of Random Normal Deviates The Annals of Mathematical

Statistics 1958 29 610ndash611 httpsdoi101214aoms117770664576 Lemeshko BY Lemeshko SB Extending the Application of Grubbs-Type Tests in Rejecting Anomalous

Measurements Measurement Techniques 2005 48 536ndash547 doi101007s11018-005-0179-977 Lehmann R Scheffler T Monte Carlo based data snooping with application to a geodetic network J

Appl Geod 2011 5 123ndash134 httpsdoiorg101515JAG201101478 Matsumoto M Nishimura T Mersenne twister A 623-dimensionally equidistributed uniform

pseudo-random number generator ACM Transactions on Modeling and Computer Simulation 1998 8 3ndash30httpsdlacmorgcitationcfmid=272991

79 Ihde J Saacutenchez L Barzaghi R Drewes H Foerste C Gruber T Liebsch G Marti U Pail R SiderisM Definition and Proposed Realization of the International Height Reference System (IHRS) Surveys inGeophysics 2017 38 549ndash570 doi101007s10712-017-9409-3

80 Lehmann R Observation error model selection by information criteria vs normality testing StudiaGeophysica et Geodaetica 2015 59 489ndash504 doi101007s11200-015-0725-0

81 Algarni DA Ali AE Heighting and Distance Accuracy with Electronic Digital Levels Journal of King SaudUniversity - Engineering Sciences 1998 10 229 ndash 239 doihttpsdoiorg101016S1018-3639(18)30698-6

82 Gemin ARS Matos AS Faggion PLAs APPLICATION OF CALIBRATION CERTIFICATE OFDIGITAL LEVELING SYSTEMS IN THE MONITORING OF STRUCTURES A CASE STUDY AT THEGOVERNADOR JOSAtilde RICHA HYDROELECTRIC POWER PLANT - PR Boletim de CiAtildeGeodAtilde 201824 235 ndash 249

83 Takalo M Rouhiainen P Development of a System Calibration Comparator for Digital Levels in FinlandNordic Journal of Surveying and Real Estate Research 1 1

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

40 of 40

84 Wiedemann W Wagner A Wunderlich T Using IATS to Read and Analyze Digital Levelling Staffs SIG2016 Paar R Marendic A Zrinjski M Eds Croatian Geodetic Society 2016 pp 515ndash526

85 PJG T The Geometry of Geodetic Inverse Linear Mapping and Non-linear Adjustment Publications onGeodesy New Series 1985 8

86 Rao CR Mitra SK Generalized inverse of a matrix and its applications 197287 Hekimoglu S Erenoglu RC Sanli DU Erdogan B Detecting Configuration Weaknesses in Geodetic

Networks Survey Review 2011 43 713ndash730 doi101179003962611X13117748892632

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

Page 12: Effects of Hard and Soft Equality Constraints on

12 of 40

In Equation (40) e is the random error generated from the normal distribution according toEquation (2) and the second part cinablai is the additional parameter that describes the alternativemodel according to Equation (40)

3 Next we compute the least-squares residuals vector according to Equation (3) but now we usethe total error (ε) in Equation 40 as follows

e = Rε (41)

4 Run the IDS In this step the test statistic is computed according to (9) Then the maximumtest statistic value is obtained according to Equation (10) Now the decision rule is based on thecritical value k computed by Monte Carlo in previous stage After identifying the measurementsuspected to be the most likely outlier it is excluded from the model and least-squares estimationand data snooping are applied iteratively until there are no further outliers identified in thedataset If two or more observations are simultaneously detected (ie if maxminusw gt k and the sizeof maxminusw gt 1 then the IDS is ended) Furthermore every time that a measurement suspectedto be the most likely outlier is removed from the model we check whether the normal matrixATW A is invertible or not If the determinant of ATW A is 0 det|ATW A| = 0 then there isa necessary and sufficient condition for a square matrix ATW A to be non-invertible In otherwords the IDS is ended when det|ATW A| = 0 If no outlier is detected (ie maxminusw lt k) thenthe IDS is also ended

The IDS procedure is performed for m experiments of random error vectors for each experimentcontaminated by an outlier in the ith measurement Therefore for each measurementcontaminated by an outlier there are υ = 1 m experiments

5 After running m=200000 experiments [47] the probabilities associated with IDS are computed(note m refers to the total number of Monte Carlo experiments) The probability of correctidentification (PCI) is the ratio between the number of times that the outlier is correctly identified(denoted as nCI) and m experiments ie

PCI =nCIm

(42)

Similar to Equation (42) the wrong decisions are computed as

PMD =nMD

m(43)

where nMD is the number of experiments in which IDS does not detect the outlier (PMDcorresponds to the rate of missed detection)

PWE =nWE

m(44)

where nWE is the number of experiments in which the IDS procedure flags and removes onlyone single non-outlying measurement while the lsquotruersquo outlier remains in the dataset (PWE is thewrong exclusion rate)

Pover+ =nover+

m(45)

where nover+ is the number of experiments in which IDS correctly identifies and removes theoutlying measurement and others and Pover+ corresponds to its probability

Poverminus =noverminus

m(46)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

13 of 40

where noverminus represents the number of experiments in which IDS identifies and removes morethan one non-outlying measurement whereas the lsquotrue outlierrsquo remains in the dataset (Poverminus isthe probability corresponding to this error probability class)

Pol =nolm

(47)

where nol is the number of experiments in which the detector in Equation (10) flags two (or more)measurements simultaneously during a given iteration of IDS Here this is referred to as thenumber of statistical overlap nol and Pol corresponds to its probability

6 Finally the sensitivity indicators (MDB and MIB) are computed based on Equation 33 andEquation 34 respectively

In this paper the probability levels associated with IDS were computed for each observationindividually and for each outlier magnitude However they were grouped into clusters based onnumber of local redundancy (ri) and maximum absolute correlation between the w-test statistics(ρwi wj ) Furthermore we take care to control the family-wise error rate

4 Case study of Levelling Geodetic Network

The analyse of the constraints effects on the probability levels of IDS are performed by an exampleof a levelling geodetic network

A levelling geodetic network is a set of points located on the Earthrsquos surface or near it andinterconnected by height difference measurements Some of these points are associated with a verticalheight reference system (ie some points have known height) whereas others are parameters (unknownheights) The term known here means that those points with known heights are constraints necessaryto ensure that parameters (unknown heights) are sufficiently estimable In the sense of modern geodeticreference systems realisation and unification of a vertical height system consists of a combination ofGNSS (Global Navigation Satellite System) and spirit levelling with geoid models We will not gointo the details about height geodetic reference system and frame (Readers who wish to have furtherdetails on that issue please see eg [79])

The mathematical model associated with a geodetic levelling network is linear (ie the levellingmeasurements bear a known linear relationship with the unknown heights) Geodetic networksusually has more measurements than parameters (ie n gt u) ie we have a redundant measurementsystem However due to intrinsic random errors in a system redundant measurements often leadto an inconsistent system of equations To make the system consistent we have to introduce theinformation about the random measurement errors The stochastical properties of the measurementerrors are directly associated with the assumption of the probability distribution of these errors Ingeodesy and many other scientific branches the well-known normal distribution is one of the mostused as measurement error model [80] Because of this the model ceases to be purely mathematicaland becomes a statistical model with functional and stochastic part

In the absence of outliers ie under null hypothesisH0 in Equation (1) the model for levellinggeodetic network can be written as follows

∆himinusj + e∆himinusj= hj minus hi (48)

where ∆himinusj is the height difference measured from point i to j and e∆himinusjis the random error associated

with the levelling measurement For a only one single levelling line one of these points is the constraint(known height) from which the height of another point is determined The point with known height isalso referred to as control point or benchmark In a geodetic network on the other hand we have a setof levelling lines that connect both from points of unknown height and from points of known height(constraints) Under normal working conditions (ie H0) the measurement errors model is then givenby Equation (2)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

14 of 40

In this case study we demonstrate the application of the proposed algorithm by [45] subject to adifferent constraints scenarios for that levelling geodetic network

Recent work has been focused on non-stochastic constraints (hard constraints) Here we alsoconsider cases where constraints are subject to a random errors (ie soft constraints) That softconstraints are non-deterministic and therefore they are measured model elements that can bedesignated as a priori information The approach of this example can be applied in the designstage of geodetic network analysis [6265] Furthermore the experiments performed here can also beextended to geodetic deformation analysis when the deformation effects are unmodelled in the senseof deterministic form [28]

41 Problem description

To analyse the constraints effects on the IDS an example is taken from a geodetic levellingnetwork with 12 height differences between the points The equipment used to measure the leveldifference can be an electronic digital level In that case the levelling measurement system comprisesof a special bar-coded staff (also called barcode rod) and a digital level (instrument) A digital level isbasically a telescope that enables a horizontal line of sight Digital levels consist of additional electronicimage processing components to automatically read and analyse digital (bar coded) levelling staffswhere the graduation is replaced by a manufacturer dependent code pattern Generally the result isautomatically stored in the data collector of the digital level An example of a digital level ndash bar-codestaff system is displayed in the Figure 2 For more details about digital level see eg [81ndash84]

Figure 2 Example of a digital level ndash bar-code staff system [45]

The standard-deviation of the uncorrelated measurements were the same and taken equal toσ = 1mm The points are indicated as A to G The eight network configuration are displayed in Figure3(a b c d and e) and their details are given as follows

1 Figure 3(a) Network with 1 hard constraint (ie network minimally constrained) Since thedimension of the network is 1D the minimum information necessary to estimate the unknownheights is one The height of G was fixed as control point (hard constraint) and 6 unknown heights(ABCDEF) were minimally constrained Therefore the redundancy of the system (or overalldegrees of freedom) was r = nminus rank(A) = nminus u = 12minus 6 = 6

2 Figure 3(b) Network with one extra hard constraint (ie two hard constraints) The heights A andD were taken as hard constraints (ie heights A and D were fixed) The redundancy of the system

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

15 of 40

in that case was r = 12minus 5 = 7 with 5 unknown heights (BCEFG) over-constrained

3 Figure 3(c) Network with two extra hard constraints (ie three hard constraints)The heights A Dand G were taken as hard constraints In that case the redundancy of the system was r = 12minus 4 = 8

4 Figure 3(d) Network with two soft constraints (A and D) In that case a standard-deviation largerthan zero was assigned to both constraints (ie σc gt 0 In other words A and D were processedas being both observations and unknown parameters ie A and D were pseudo-observationsHere the both constraints were simultaneously relaxed by considering their uncertainties 10 timesworse than measurements (ie σc = 10times σ = 10mm) 10 times better than measurements (ieσc = 01mm) and their uncertainties equal to the measurements (σc = 1mm) In that case theredundancy of the system was r = 14minus 7 = 7

5 Figure 3(e) Network processed with A D and G as pseudo-observations Those three constraintswere simultaneously relaxed by considering their standard-deviations equal to σc = 10mm (10 timesworse than measurements) σc = 01mm (10 times better than measurements) and σc = 1mm (thesame as the measurements) In that case the redundancy of the system was r = 15minus 7 = 8

y1

y2

y3

y4

y5

y6

y7 y8

y9 y10

B C

D

E F

G y12 y11

A

y1

y2

y3

y4

y5

y6

y7 y8

y9 y10

A

B C

D

E F

G y12 y11

y1

y2

y3

y4

y5

y6

y7 y8

y9 y10

A

B C

D

E F

G y12 y11

(a) (b)

(c) (d)

(e)

Unknown Heights Soft Constraints Hard Constraints

y1

y2

y3

y4

y5

y6

y7 y8

y9 y10

A

B C

D

E F

G y12 y11

y1

y2

y3

y4

y5

y6

y7 y8

y9 y10

A

B C

D

E F

G y12 y11

Figure 3 Levelling Geodetic Network subject to different constraint scenarios

The following system of equations for that problem is given by

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

16 of 40

y1 + e1 = hB minus hA

y2 + e2 = hC minus hB

y3 + e3 = hD minus hC

y7 + e7 = hB minus hG

y8 + e8 = hC minus hG

y11 + e11 = hB minus hF

y12 + e12 = hC minus hE

(49)

The functional model (A) for the system of equations in 49 is given by

A =

minus1 1 0 0 0 0 00 minus1 1 0 0 0 00 0 minus1 1 0 0 00 0 0 minus1 1 0 00 0 0 0 minus1 1 01 0 0 0 0 minus1 00 1 0 0 0 0 minus10 0 1 0 0 0 minus10 0 0 0 1 0 minus10 0 0 0 0 1 minus10 1 0 0 0 minus1 00 0 1 0 minus1 0 0

(50)

Note that the rank defect of the matrix A is uminus rank(A) = 7minus 6 = 1 In that case there is neededto add at least one constraint in order to avoid rank deficiency of the matrix A This is guaranteedwhen we take one height as known For example from the network in Figure 3(a) we have added theheight G as known (ie as a hard constraint) In that case the constraint equation should be addedinto the system in 49 ie

y13 = hG with σy13 = 0 (51)

noticing that because the standard deviation is zero the observation is non-stochastic (hard constraint)and the residual ey13 = 0 This can generate problems in the inversion of the covariance matrix of theobservations Qe for the calculation of the weight matrix W because the weight for that constraintwould be 1

0 = infin In order to avoid that problem we have eliminated the rank deficiency of matrixA by removing the seventh column of matrix A in 50 associated with the height G Now we haveuminus rank(A) = 6minus 6 = 0

The constraint defines the geodetic datum ie the S-system ([85]p41) Another approach tosolving the system of equations in 2 could be based on generalised (pseudo) inverses (see eg [86])

The location of the constraints can be chosen in some circumstances for example during thedesign stage of a geodetic network For the special case of having a minimally constrained systemthe location of the constraint will not influence the w-test statistics and the sensitivity indicators (MIBand MDB) [18] However more constraints than the minimum necessary to have a solution (ie extraconstraints or redundant constraints) can change the least-squares residuals and hence w-test statisticsand the minimal biases

From the network with one extra constraint (2 constraints) in Figure 3(b) for example the bothfirst (height A) and fourth column (height D) of matrix A in 50 were eliminated in the case of having

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

17 of 40

the two heights as hard constraints For the case where these two heights (A and D) were taken as softconstraints however two observation equations were added to Equation 49 ie

y13 + e13 = hA σy13 gt 0

y14 + e14 = hD σy14 gt 0(52)

In the case of soft constraints in Equation 52 two lines were added in matrix A In other wordsA and D were taken as pseudo-observations In that case the rank deficiency was also null (ieuminus rank(A) = 7minus 7 = 0) the redundancy of the system was r = nminus rank(A) = nminus u = 7 and thematrix A was given as follows

A =

minus1 1 0 0 0 0 00 minus1 1 0 0 0 00 0 minus1 1 0 0 00 0 0 minus1 1 0 00 0 0 0 minus1 1 01 0 0 0 0 minus1 00 1 0 0 0 0 minus10 0 1 0 0 0 minus10 0 0 0 1 0 minus10 0 0 0 0 1 minus10 1 0 0 0 minus1 00 0 1 0 minus1 0 01 0 0 0 0 0 00 0 0 1 0 0 0

(53)

For this example of two soft constraints and by considering the both soft constraints withstandard-deviation σc = 10mm the symmetric and positive semi-definite covariance matrix of theobservations (Qe) was given as follows

Qe =

1 0 0 middot middot middot 0 00 1 0 middot middot middot 0 00 0 1 middot middot middot 0 0

0 0 0 middot middot middot 100 00 0 0 middot middot middot 0 100

(54)

The last two rows and columns of the matrix Qe in Equation 54 refer to the variances (σc2 =

(10mm)2 = 100mm2) of the heights constraints A and D respectivelySimilarly matrices A and Qe were constructed for the other cases studied hereAlthough other measurements are able to identify an outlier for the case of having only one single

soft constraint the pseudo-observation (constraint) is not In that case the defect configurationis associated with the additional parameter in the constraint (ie the presence of an outlier inthe constraint) In other words an additional parameter on the soft constraint will not estimableFor example if the height point G was taken as a soft constraint the presence of an outlier inpseudo-observation G would lead to rank deficiency of matrix A ie u minus rank(A) = 8 minus 7 = 1Therefore the case of having only one single soft constraint was not considered here

42 Result of the Hard Constraint Effects on the Iterative Outlier Elimination Procedure

In this section we present the results of the effects of the hard constraint on the IDS The scenariosin Figure 3(a) (network minimally constrained) Figure 3(b) (two hard constraints) and Figure 3(c)(three hard constraints) were considered for the analysis

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

18 of 40

The correlation coefficient between w-test statistics (ρwi wj ) were computed for the three scenariosaccording to Equation(20) Table 1 provides that correlation for that network minimally constrainedTable 2 for that network over-constrained with two hard constraints and Table 3 for that networkover-constrained with three hard constraints

Table 1 Correlation matrix of w-test statistics for the levelling network minimally constrained in Figure3(a)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12

wy1 10 02 01 01 02 10 -02 00 00 02 -04 -01wy2 10 02 02 03 02 05 -05 -02 02 03 -03wy3 10 10 02 01 00 02 -02 00 01 04wy4 10 02 01 00 02 -02 00 01 04wy5 10 02 -02 02 05 -05 03 -03wy6 10 -02 00 00 02 -04 -01wy7 10 -03 -03 -04 -04 -01wy8 10 -04 -03 -01 -04wy9 10 -03 01 04wy10 10 04 01wy11 10 -01wy12 100

Table 2 Correlation matrix of w-test statistics for the levelling network with two hard constraints inFigure 3(b)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12

wy1 10 04 04 -03 -01 04 -03 01 01 01 -04 -01wy2 10 04 -01 01 -01 04 -04 -01 01 03 -03wy3 10 04 -01 -03 -01 03 -01 -01 01 04wy4 10 04 04 01 01 -03 01 01 04wy5 10 04 -01 01 04 -04 03 -03wy6 10 -01 -01 -01 03 -04 -01wy7 10 -04 -03 -04 -04 -01wy8 10 -04 -03 -01 -04wy9 10 -04 01 04wy10 10 04 01wy11 10 -01wy12 10

Table 3 Correlation matrix of w-test statistics for the levelling network with three hard constraints inFigure 3(c)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12

wy1 10 03 01 -01 -01 01 -04 -01 -01 -01 -03 -01wy2 10 03 -01 01 -01 03 -03 -01 01 03 -03wy3 10 01 -01 -01 01 04 01 01 01 03wy4 10 03 01 -01 -01 -04 -01 01 03wy5 10 03 -01 01 03 -03 03 -03wy6 10 01 01 01 04 -03 -01wy7 10 -01 -01 -01 -03 -01wy8 10 -01 -01 -01 -03wy9 10 -01 01 03wy10 10 03 01wy11 10 -01wy12 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

19 of 40

Table 4 gives the local redundancy (ri) the standard-deviation of the LS-estimated outlier σnablai andthe maximum absolute correlation (maxρwi wj

) for each scenario of hard constraint set out of in thisstudy ie Figure 3(ab and c)

Table 4 Local redundancy (ri) standard-deviation of the LS-estimated outlier σnablai and the maximumabsolute correlation (maxρwi wj

) for each scenario of hard constraint

1 hard constraint 2 hard constraints 3 hard constraintsMeasurement ri σnablai maxρwi wj

ri σnablai maxρwi wjri σnablai maxρwi wj

y1 0396 1589 100 0583 1309 036 0708 1188 041y2 0500 1414 047 0583 1309 036 0583 1309 032y3 0396 1589 100 0583 1309 036 0708 1188 041y4 0396 1589 100 0583 1309 036 0708 1188 041y5 0500 1414 047 0583 1309 036 0583 1309 032y6 0396 1589 100 0583 1309 036 0708 1188 041y7 0563 1333 047 0583 1309 036 0708 1188 041y8 0563 1333 047 0583 1309 036 0708 1188 041y9 0563 1333 047 0583 1309 036 0708 1188 041y10 0563 1333 047 0583 1309 036 0708 1188 041y11 0583 1309 043 0583 1309 036 0583 1309 032y12 0583 1309 043 0583 1309 036 0583 1309 032

Here the twelve levelling measurements were clustered into four clusters The clustering wasdefined according to the local redundancy (ri) and the maximum absolute correlation (maxρwi wj

) inTable 4 This is the first time that a clustering technique based on the similarity of local redundancyand the maximum absolute correlation between w-test statistics is applied to a problem of geodeticnetworks Similarly this has been done in [45] The four cluster were defined as follows

bull Cluster 1 y1 y3 y4 and y6bull Cluster 2 y2 and y5bull Cluster 3 y7 y8 y9 and y10bull Cluster 4 y11 and y12

The probability levels associated with IDS were averaged for each of these clusters Here we computethe probability levels of the IDS based on the procedure in Section (3) [45] The critical values werek = 389 k = 393 and k = 393 for 1 hard constraint 2 hard constraints and 3 hard constraintsrespectively These critical values were found for αprime = 0001 according to the procedure described inAppendix The probability of correct identification (PCI) and correct detection (PCD = 1minusPMD)are displayed in Figure 4 for each number of hard constraint (denoted by hc)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

20 of 40

(c) (d)

(e) (f)

(g) (h)

(a) (b)

Figure 4 Probability of correct identification (PCI) and Probability of correct detection (PCD) for thecase of hard constraints and for αprime = 0001 Cluster 1(ab) Cluster 2(cd) Cluster 3(ef) and Cluster4(gh)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

21 of 40

The outlier magnitude were defined from |5σ| to |9σ| The outlier of |5σ| was chosen because itis approximately the lowest MDB0(i) of the network when a single hypothesis testing is in play ascan be seen in Equation (14) or Equation (16) That MDB0(i) of |5σ| was computed for significancelevel of αprime = 0001 and a power of the test γ0 = 08 This strategy reduces the search space for a MIB(Minimal Identifiable Bias) because we will always have the following inequality MIB ge MDB0(i)[4755] Remember that the IDS procedure is an example of multiple hypothesis testing (see 2)

The sensitivity indicators MDB and MIB for IDS were also computed according to Equation (33)and (34) respectively The success rate for outlier detection and outlier identification were taken asbeing PCD = PCI = 08 respectively Table 5 provides the values of MDB and MIB for that case ofhard constraints

Table 5 MDB and MIB for the case of hard constraints based on αprime = 0001 and PCD = PCI = 08

1 hard constraint 2 hard constraints 3 hard constraintsCluster MDB (σ) MIB (σ) MDB (σ) MIB (σ) MDB (σ) MIB (σ)

1 75 - 63 63 57 572 67 68 63 64 63 643 64 64 63 63 58 584 64 64 64 64 64 64

Figure 5 shows the probability of wrong exclusion (PWE) The over-identification cases (Pover+

and Poverminus) were smaller than 0001 (ie they were practically null) There were not statistical overlap(Pol) for clusters 2 3 and 4 We will discuss more about statistical overlap (Pol) later

(c) (d)

(a) (b)

Figure 5 Probability of wrong exclusion (PWE) for the case of hard constraints and for αprime = 0001Cluster 1(a) Cluster 2(b) Cluster 3(c) and Cluster 4(d)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

22 of 40

43 Result of the Soft Constraint Effects on the Iterative Outlier Elimination Procedure

The both configuration in Figure 3(d) and Figure 3(e) were analysed in terms of soft constraintsIn that case the critical values were k = 395 k = 395 and k = 392 for two soft constraints withσc = 01mm σc = 1mm and σc = 10mm respectively In the case of three soft constraints the criticalvalues found were k = 399 k = 399 and k = 396 for σc = 01mm σc = 1mm and σc = 10mmrespectively All these critical values were computed for αprime = 0001

The correlation coefficient between w-test statistics (ρwi wj ) are displayed in Table 6 Table 7and Table 8 for two soft constraints with standard-deviation 10 times larger than measurements(ie σc = 10times σ = 10mm) 10 times better than measurements (ie σc = 01mm) and equal to themeasurements (σc = 1mm) respectively

Table 6 Correlation matrix of w-test statistics for the levelling network with two soft constraints withσc = 10mm in Figure 3(d)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14

wy1 10 02 01 01 02 10 -02 00 00 02 -04 -01 006 -01wy2 10 02 02 03 02 05 -05 -02 02 03 -03 003 00wy3 10 10 02 01 00 02 -02 00 01 04 01 -01wy4 10 02 01 00 02 -02 00 01 04 -01 01wy5 10 02 -02 02 05 -05 03 -03 00 00wy6 10 -02 00 00 02 -04 -01 -01 01wy7 10 -03 -03 -04 -04 -01 00 00wy8 10 -04 -03 -01 -04 00 00wy9 10 -03 01 04 00 00wy10 10 04 01 00 00wy11 10 -01 00 00wy12 10 00 00wy13 10 -10wy14 10

Table 7 Correlation matrix of w-test statistics for the levelling network with two soft constraints withσc = 01mm in Figure 3(d)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14

wy1 10 04 04 -03 -01 04 -03 01 01 01 -04 -01 06 -06wy2 10 04 -01 02 -01 04 -04 -01 01 03 -03 04 -04wy3 10 04 -01 -03 -01 03 -01 -01 01 04 06 -06wy4 10 04 04 01 01 -03 01 01 04 -06 06wy5 10 04 -01 01 04 -04 03 -03 -04 04wy6 10 -01 -01 -01 03 -04 -01 -06 06wy7 10 -04 -03 -04 -04 -01 -02 02wy8 10 -04 -03 -01 -04 02 -02wy9 10 -04 01 04 02 -02wy10 10 04 01 -02 02wy11 10 -01 00 00wy12 10 00 00wy13 10 -10wy14 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

23 of 40

Table 8 Correlation matrix of w-test statistics for the levelling network with two soft constraints withσc = 1mm in Figure 3(d)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14

wy1 10 03 02 -01 01 07 -03 00 01 02 -04 -01 04 -04wy2 10 03 01 03 01 04 -04 -01 01 03 -03 03 -03wy3 10 07 01 -01 00 03 -02 -01 01 04 04 -04wy4 10 03 02 01 02 -03 00 01 04 -04 04wy5 10 03 -01 01 04 -04 03 -03 -03 03wy6 10 -02 -01 00 03 -04 -01 -04 04wy7 10 -03 -03 -04 -04 -01 -01 01wy8 10 -04 -03 -01 -04 01 -01wy9 10 -03 01 04 01 -01wy10 10 04 01 -01 01wy11 10 -01 00 00wy12 10 00 00wy13 10 -10wy14 10

Table 9 gives the local redundancy (ri) the standard-deviation of the LS-estimated outlier σnablai andthe maximum absolute correlation (maxρwi wj

) for the scenarios of two constraints

Table 9 Local redundancy (ri) standard-deviation of the LS-estimated outlier σnablai (mm) and themaximum absolute correlation (maxρwi wj

) for each scenario of two soft constraints

σc = 01mm σc = 1mm σc = 10mmMeasurement ri σnablai maxρwi wj

ri σnablai maxρwi wjri σnablai maxρwi wj

y1 0581 1312 0564 0471 1457 0681 0397 1587 0994y2 0582 1311 0376 0533 1369 0423 0501 1413 0471y3 0581 1312 0564 0471 1457 0681 0397 1587 0994y4 0581 1312 0564 0471 1457 0681 0397 1587 0994y5 0582 1311 0376 0533 1369 0423 0501 1413 0471y6 0581 1312 0564 0471 1457 0681 0397 1587 0994y7 0583 1310 0359 0571 1324 0423 0563 1333 0471y8 0583 1310 0359 0571 1324 0423 0563 1333 0471y9 0583 1310 0359 0571 1324 0423 0563 1333 0471y10 0583 1310 0359 0571 1324 0423 0563 1333 0471y11 0583 1309 0358 0583 1309 0398 0583 1309 0433y12 0583 1309 0358 0583 1309 0398 0583 1309 0433y13 0007 1163 1000 0300 1826 1000 0497 14189 1000y14 0007 1163 1000 0300 1826 1000 0497 14189 1000

From Table 9 five clusters were defined for each case of two soft constraints ie for the casewhere heights A and D were given as soft constraints in Figure 3(d) as follows

bull Cluster 1 y1 y3 y4 and y6bull Cluster 2 y2 and y5bull Cluster 3 y7 y8 y9 and y10bull Cluster 4 y11 and y12bull Cluster 5 y13 and y14

The probabilities of correct identification (PCI) and correct detection (PCD) for the measurements(Cluster 1 to Cluster 4) subject to the scenarios of two soft constraints (heights A and D) are displayedin Figure 6

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

24 of 40

(c) (d)

(e) (f)

(g) (h)

(a) (b)

Figure 6 Probability of correct identification (PCI) and Probability of correct detection (PCD) for themeasurements subject to the scenarios of two soft constraints for αprime = 0001 Cluster 1(ab) Cluster2(cd) Cluster 3(ef) and Cluster 4(gh)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

25 of 40

Note that the Cluster 5 is associated with the two soft constraints (ie y13 and y14) The probabilityof correct identification (PCI) for these both soft constraints were null However the probability ofcorrect detection were not Figure 7 shows the probability of correct detection (PCD) for these two softconstraints (ie heights A and D)

Figure 7 Probability of correct detection PCD for the two soft constraints (Cluster 5 heights A and D)and for αprime = 0001

The probability of wrong exclusion PWE for the measurements (Cluster 1 to Cluster 4) subject tothe scenarios of two soft constraints (heights A and D) are displayed in Figure 8 Figure 9 gives the theprobability of wrong exclusion PWE for that two constraints (ie heights A and D)

(c) (d)

(a) (b)

Figure 8 Probability of wrong exclusion PWE for the measurements subject to the scenarios of two softconstraints for αprime = 0001 Cluster 1(a) Cluster 2(b) Cluster 3(c) and Cluster 4(d)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

26 of 40

Figure 9 Probability of wrong exclusion PWE for the two soft constraints (Cluster 5 heights A and D)and for αprime = 0001

The over-identification cases (Pover+ and Poverminus) and the statistical overlap (Pol) were practicallynull for that case

The sensitivity indicators (MDB and MIB) for each scenario of two soft constraints are displayedin Table 10

Table 10 MDB and MIB for the case of two soft constraints based on αprime = 0001 and PCD = PCI = 08

σc = 10mm σc = 1mm σc = 01mmCluster MDB (σ) MIB (σ) MDB (σ) MIB (σ) MDB (σ) MIB (σ)

1 75 25 7 71 63 632 68 68 66 66 63 633 64 64 64 64 63 634 63 63 63 63 63 635 68 - 88 - 57 -

As with the case of two soft constraints Tables 1112 and 13 show the correlations (ρwi wj ) for thecase where there were three soft constraints ie for the network configuration in Figure 3(e)

Table 11 Correlation matrix of w-test statistics for the levelling network with three soft constraintswith σc = 10mm in Figure 3(e)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14 wy15

wy1 10 02 01 01 02 10 -02 00 00 02 -04 -01 01 00 00wy2 10 02 02 03 02 05 -05 -02 02 03 -03 00 00 00wy3 10 10 02 01 00 02 -02 00 01 04 00 -01 00wy4 10 02 01 00 02 -02 00 01 04 00 01 00wy5 10 02 -02 02 05 -05 03 -03 00 00 00wy6 10 -02 00 00 02 -04 -01 -01 00 00wy7 10 -03 -03 -04 -04 -01 00 00 00wy8 10 -04 -03 -01 -04 00 00 00wy9 10 -03 01 04 00 00 00wy10 10 04 01 00 00 00wy11 10 -01 00 00 00wy12 10 00 00 00wy13 10 -05 -05wy14 10 -05wy15 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

27 of 40

Table 12 Correlation matrix of w-test statistics for the levelling network with three soft constraintswith σc = 01mm in Figure 3(e)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14 wy15

wy1 10 03 01 -01 -01 01 -04 -01 -01 -01 -03 -01 07 -01 -04wy2 10 03 -01 01 -01 03 -03 -01 01 03 -03 03 -03 00wy3 10 01 -01 -01 01 04 01 01 01 03 01 -07 04wy4 10 03 01 -01 -01 -04 -01 01 03 -01 07 -04wy5 10 03 -01 01 03 -03 03 -03 -03 03 00wy6 10 01 01 01 04 -03 -01 -07 01 04wy7 10 -01 -01 -01 -03 -01 -04 -01 04wy8 10 -01 -01 -01 -03 -01 -04 04wy9 10 -01 01 03 -01 -04 04wy10 10 03 01 -04 -01 04wy11 10 -01 00 00 00wy12 10 00 00 00wy13 10 -02 -06wy14 10 -06wy15 10

Table 13 Correlation matrix of w-test statistics for the levelling network with three soft constraintswith σc = 1mm in Figure 3(e)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14 wy15

wy1 10 03 01 00 01 06 -03 00 00 01 -04 -01 05 -02 -02wy2 10 03 01 03 01 04 -04 -01 01 03 -03 02 -02 00wy3 10 06 01 00 00 03 -01 00 01 04 02 -05 02wy4 10 03 01 00 01 -03 00 01 04 -02 05 -02wy5 10 03 -01 01 04 -04 03 -03 -02 02 00wy6 10 -01 00 00 03 -04 -01 -05 02 02wy7 10 -03 -02 -03 -04 -01 -02 00 02wy8 10 -03 -02 -01 -04 00 -02 02wy9 10 -03 01 04 00 -02 02wy10 10 04 01 -02 00 02wy11 10 -01 00 00 00wy12 10 00 00 00wy13 10 -04 -05wy14 10 -05wy15 10

Table 14 Local redundancy (ri) standard-deviation of the LS-estimated outlier σnablai (mm) and themaximum absolute correlation (maxρwi wj

) for each scenario of three soft constraints

σc = 01mm σc = 1mm σc = 10mmMeasurement ri σnablai maxρwi wj

ri σnablai maxρwi wjri σnablai maxρwi wj

y1 0702 1194 0660 0502 1411 0577 0398 1586 0992y2 0582 1311 0326 0533 1369 0412 0501 1413 0470y3 0702 1194 0660 0502 1411 0577 0398 1586 0992y4 0702 1194 0660 0502 1411 0577 0398 1586 0992y5 0582 1311 0326 0533 1369 0412 0501 1413 0470y6 0702 1194 0660 0502 1411 0577 0398 1586 0992y7 0704 1192 0415 0602 1289 0412 0563 1333 0470y8 0704 1192 0415 0602 1289 0412 0563 1333 0470y9 0704 1192 0415 0602 1289 0412 0563 1333 0470y10 0704 1192 0415 0602 1289 0412 0563 1333 0470y11 0583 1309 0326 0583 1309 0385 0583 1309 0433y12 0583 1309 0326 0583 1309 0385 0583 1309 0433y13 0012 0904 0660 0425 1534 0542 0663 12283 0501y14 0012 0904 0660 0425 1534 0542 0663 12283 0501y15 0019 0718 063 05 1414 0542 0665 12268 0501

The probabilities of correct identification (PCI) and correct detection PCD for the measurements(Cluster 1 to Cluster 4) subject to the scenarios of three soft constraints (heights A D and G) aredisplayed in Figure 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

28 of 40

(c) (d)

(e) (f)

(g) (h)

(a) (b)

Figure 10 Probability of correct identification (PCI) and Probability of correct detection (PCD) for themeasurements subject to the scenarios of three soft constraints for αprime = 0001 Cluster 1(ab) Cluster2(cd) Cluster 3(ef) and Cluster 4(gh)

The probabilities of correct identification (PCI) and correct detection (PCD) in Figure 8 werecomputed for the clusters based on Table 14 as follows

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

29 of 40

bull Cluster 1 y1 y3 y4 and y6bull Cluster 2 y2 and y5bull Cluster 3 y7 y8 y9 and y10bull Cluster 4 y11 and y12bull Cluster 5 y13 and y14bull Cluster 6 y15

Figure 11 shows the probabilities of correct identification PCI and correct detection PCD for thethree soft constraints ie for Cluster 5 (heights A and D) and Cluster 6 (height G) in Figure 3(e)

(c) (d)

(a) (b)

Figure 11 Probability of correct identification (PCI) and Probability of correct detection (PCD) for thethree constraints and for αprime = 0001 Cluster 5(ab) and Cluster 6(cd)

The probability of wrong exclusion PWE for the measurements (Cluster 1 to Cluster 4) subject tothe scenarios of three soft constraints (heights A D and G) are displayed in Figure 12 Figure 13 givesthe the probability of wrong exclusion PWE for that three constraints (ie heights A D and G)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

30 of 40

(c) (d)

(a) (b)

Figure 12 Probability of wrong exclusion PWE for the measurements subject to the scenarios of threesoft constraints and for αprime = 0001 Cluster 1(a) Cluster 2(b) Cluster 3(c) and Cluster 4(d)

(a) (b)

Figure 13 Probability of wrong exclusion (PWE) for the three constraints and for αprime = 0001 Cluster5(a) and Cluster 6(b)

The over-identification cases (Pover+ and Poverminus) and the statistical overlap (Pol) were alsopractically null for that case of three soft constraints

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

31 of 40

The sensitivity indicators (MDB and MIB) for each scenario of three soft constraints are displayedin Table 15

Table 15 MDB and MIB for the case of three soft constraints based on αprime = 0001 and PCD = PCI = 08

σc = 10mm σc = 1mm σc = 01mmCluster MDB (σ) MIB (σ) MDB (σ) MIB (σ) MDB (σ) MIB (σ)

1 75 22 68 69 58 592 68 69 66 67 64 643 64 64 63 63 58 584 63 63 63 63 63 635 59 60 74 75 435 456 59 59 69 69 346 355

5 Discussion

We start from the scenario of one hard constraint in Figure 3(a) Table 4 reveal that the maximumcorrelation between w-test statistics for the measurements constituting Cluster 1 is exactly equal to100 (ie maxρwi wj

= 100) This means that the measurements belonging to Cluster 1 are those thatare connected with unknown heights whose connections are limited to only two The both unknownheights A and D are tied only to two measurements (ie y1 and y6 linked to A and y3 and y4 linked toD) therefore if an outlier occurred in one of these measurements we would only be able to analyse theconsistency between them but we would not be able to distinguish which of them was contaminatedby an outlier This means that we would only be able to detect because the w-test statistics could belarger than a critical value k however in that case the values of w-test statistics would be the same andtherefore we would not have only one unique maximum w-test statistics but actually would have fourmaximum w-test statistics In other words the equation systems associated with the measurements ofCluster 1 are linearly dependent [87] Therefore there is no reliability in terms of outlier identificationfor the Cluster 1 as can be seen in Figure 3(a)

From Figure 3(b) we can note that there is reliability in terms of outlier detection for Cluster 1and it is caused by overlapping w-test statistics The probability of statistics overlap Pol for Cluster 1in this scenario of minimally constrained network is displayed in Figure 14

Figure 14 Probability of correct detection PCD and statistical overlap (Pol) for the Cluster 1 subject toone hard constraint and for αprime = 0001

The problem of not having more connections (ie more measurements) for the unknown heightsA and D in the case of one hard constraint with G fixed is overcome when these heights (A and D) aretaken as hard constraints in 3(b) or when the heights A D and G are hard constraints in figure 3(c)Figure 3(a b) show that the measurements of Cluster 1 are now able to identify an outlier when twohard constraints (A and D fixed) are in play It is also verified to the case of three hard constraints (A Dand G fixes) in Figure 3(e f) ie there is reliability in terms of both outlier detection and identificationfor these measurements in those conditions

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

32 of 40

One can argue that the more constraints the larger the probability of correct identification (PCI)and correct detection (PCD) This claim is not entirely correct This may be true in case where all pointson the network have the same number of connections However in the example given if both heightsB and G were selected as hard constraints the measurements of the Cluster 1 (ie y1 y6 y3 and y4)would still have their least-squares residuals fully correlated (ie ρwy1 wy6

= 100 and ρwy3 wy4= 100)

and therefore there would be no reliability in terms of outlier identification for these measurementsFrom Table 5 we observe different behaviour for the clusters as follows

bull Cluster 1 there was no MIB for the case of having only one single hard constraint whereasMDB = MIB for the other cases However both MDB and MIB decrease significantly with theincrease in the number of hard constraints

bull Cluster 2 MDB slightly smaller than MIB Both MDB and MIB were practically the same for thecase of having two or three hard constraints

bull Cluster 3 MDB = MIB for all cases of hard constraints however both MDB and MIB decreasesignificantly with the increase in the number of hard constraints

bull Cluster 4 MDB e MIB were equal for all cases

In terms of outlier detection and identification therefore Cluster 1 was more sensitive toconstraints Cluster 3 relatively sensitive to constraints whereas the Cluster 4 completely insensitiveto constraints and Cluster 2 relatively insensitive to constraints This is also can be seen in Figure 4The reason for this is that the local redundancy (ri) of the Cluster 1 increased with the increase of thenumber of hard constraints whereas the Cluster 4 have remained the same as can be seen in Table 4

Leaving aside the cases of statistical overlap (Pol) the network presents low least-squares residualscorrelation (ρwi wj lt 05) and high local redundancy (ri gt 05) Because of this the probabilities ofwrong exclusion (PWE) were less than 1 as can be seen in Figure 5 The over-identification cases(Pover+ and Poverminus) were practically null Consequently PCI asymp PCD Due of this fact the family-wiseerror rate (αprime) should be increased in order to have more success rate in the outlier detection andidentification [45]

From Figure 15 we observe that increasing the αprime increases the both probabilities of correctidentification (PCI) and correct detection (PCD) for outlier magnitude from 5σ to 6σ in the case ofthree hard constraints and from 5σ to 68σ in the case of two hard constraints Although the rates ofPover+ and PWE also increase they are not significant when compared to the improvement of correctidentification (PCI) and detection (PCD) This same analysis can be done for the other clusters

In terms of soft constraints for the cases of two constraints in Figure 3(d) we observe from Table 9that the larger the relaxation of the constraint (ie the larger the standard-deviation of the constraintσc) the larger the residuals correlation (ρwi wj ) and the standard-deviation of the outlier σnablai and thesmaller the local redundancy (ri) Consequently the probabilities of correct identification (PCI) anddetection (PCD) get smaller and smaller with the relaxation of the constraints whereas the probabilityof wrong exclusion gets larger (PWE) This can be more clearly verified in Figure 6(a b) and Figure8(a) for Cluster 1 whose measurements are connected with the constraints A and D (ie y13 and y14 inTable 9 respectively)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

33 of 40

(a) (b)

(c) (d)

Figure 15 Probabilities of correct identification PCI (a) correct detection PCD (b) over-identificationpositive Pover+ (c) and wrong exclusion PWE for the Cluster 1 subject to two and three hard constraintsand for αprime = 0001 and αprime = 01

Note from Figure 8(a) that the probability of wrong exclusion (PWE) increases as the magnitude ofthe outlier (nablai) increases However this is true up to a certain limit of outlier magnitude The effect ofresiduals correlation ρwi wj on the rates of wrong exclusion (PWE) and correct identification (PCI) tendsto decrease with the increase in the magnitude of the outlier nablai This effect is more clearly verified forCluster 1 in case where the precision of the constraints are ten times worse than the measurementsσc = 10σ = 10mm

Note from Figure 6 that identifying an outlier in the Cluster 1 (ie y1 y3 y4 and y6) whenσc = 10mm is more difficult than the other clusters This is due to the fact the Cluster 1 has a higherresiduals correlation ρwi wj = 0994 than other clusters In general therefore we observe that the largerthe relaxation of the constraints the larger is the effect of the correlation ρwi wj on the success rate ofoutlier identification (PCI) Consequently the higher the sensitivity indicator for outlier identification(MIB) Table 10 reveals that the ratio between MIB and MDB for the Cluster 1 and for the scenariowhere the standard-deviations of that two soft constraints are σc = 10mm is MIBMDB=2575=33On the other hand the relationship between MIB and MDB is practically one (ie MIBMDB=10) forthe others scenarios

If the FWE rate (αprime) were increased for the case where the two soft constraints of σc = 10mmare in play we would not have great advantages for Cluster 1 due to its high residuals correlation(ρwi wj = 994) From Figure 16 we can observe that the probability of correct identification (PCI)for outlier magnitudes from 5σ to 8σ is effectively larger to a user-defined αprime = 01 than αprime = 0001However the success rate is still less than 80 ie PCI lt 08 Note for example the correct

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

34 of 40

identification rate is PCI = 56 for an outlier magnitude of nablai = 8σ and αprime = 01 For αprime = 01 theMIB = 335σ = 335mm whereas for αprime = 0001 is MIB = 25σ = 25mm Therefore in that case theMIB for PCI = 08(80) and αprime = 01 would be 34 larger than user-defined αprime = 0001

Figure 16 Probabilities of correct identification (PCI) for the Cluster 1 subject to two soft constraints (2sc) A and D for αprime = 0001 and αprime = 01

The soft constraints A and D were grouped in the Cluster 5 (ie A and D were treated aspseudo-observations in the model) There is no reliability in terms of outlier identification for theconstraints because the residual correlation between them is ρwi wj = 100 as can be seen in Table 9for y13 and y14 However these soft constraints are able to detect an outlier In that case the probabilityof correct detection PCD in Figure 7 is mainly caused by the statistical overlap Pol as can be seen in theexample of the case of σc = 10mm in Figure 17 From Table 10 we observe that the larger the relaxationof the constraints the larger the MDB Note that the values of MDB are given in σ and therefore theMDB for σc = 10mm is larger than σc = 1mm and σc = 01mm ie we had the following inequalityMDB = 68σc = 68times 10mm = 68mm gt MDB = 88σc = 88times 1mm = 88mm gt MDB = 57σc =

57times 01mm = 57mm In that case if the FWE (αprime) were increased the rate of outlier detection by theCluster 4 (ie by the soft constraints) would increase

Figure 17 Probabilities of correct detection PCD and statistical overlap Pol for the two soft constraintsA and D (Cluster 5) with σc = 10mm and for αprime = 0001

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

35 of 40

Similar effects of the relaxation of the constraints on the performance of the IDS in case of two softconstraints are verified in case of three soft constraints as can be seen in the Figures 10 11 12 and 13

In case of having three soft constraints in Figure 3(e) there is reliability in terms of outlieridentification for the three pseudo-observations y13 y14 and y15 (ie for AD and G) as can be seen inFigure 10 and Table 15 In that case we also observe that the probabilities of correct detection PCD ofthe soft constraints A and D (ie Cluster 5) were approximately 13 for σc = 10mm 16 for σc = 1mmand 24 for σc = 01mm larger than the scenario of the network subject to two soft constraints Table15 reveals that the advantage of having three soft constraints instead of two constraints is that theconstraints become identifiable in the presence of an outlier The behaviour of the probabilities ofcorrect detection (PCD) correct identification (PCI) and wrong exclusion (PWE) was similar to the caseof the two soft constraints

Furthermore the larger the relaxation of the constraints the smaller the residuals correlationbetween the measurements and the soft constraints and the larger the residuals correlation amongthe measurements This effect can be verified in Tables 6 7 and 8 for two soft constraints as well as inTables 11 12 and 13 for three soft constraints

In general we observe that the case of two soft constraints for σc = 01mm was comparablewith two hard constraints (see eg Table 5 and Table 10) in terms of the probability levels associatedwith IDS for the measurements (ie clusters 1 2 3 and 4) In the same way for the case of twosoft constraints with σc = 1mm or σc = 10mm the probabilities levels were similar to the one hardconstraint for that measurements with the benefit of two soft constraints having reliability in terms ofoutlier identification for the Cluster 1 Finally the three soft constraints with σc = 1mm and σc = 10mmwere comparable to the two soft constraints for that scenario of constraints relaxation wheres thethree soft constraints for σc = 01mm showed similar outcomes with three hard constraints for themeasurements (see eg Table 5 and Table 15) In that case however an advantage of the three softconstraints on the three hard constraints is the possibility of analysing the sensitivity of the constraintsIt is important to emphasised that the stochastic models of the measurements and constraints wereassumed well-known and defined for the analyzes performed here

6 Conclusions

In general the probability of correct identification (PCI) for the case of hard constraints is largerthan soft constraints It can be verified in Figure 4 Figure 6 and Figure 10 Therefore hard constraintsshould be used in the stage of pre-processing data for the purpose of identifying and removing possibleoutlying measurements In that process one should opt to set out the redundant hard constraints atpoints in the network where the smallest connections exist After identifying and removing possibleoutliers the soft constraints should be employed to propagate the uncertainties of the constraints(pseudobservations) to the model parameters during the process of least-squares estimation Thisrecommendation is valid for outlier detection and identification purpose

We highlight the main findings of this research as follows

bull Under a system of a high local redundancy ri gt 05 and low residuals correlation (ρwi wj lt 05) ifone increase the family-wise error rate (FWE) of the test statistic the performance of the procedurewill be improved for both scenarios of hard constraints and soft constraintsbull The larger the relaxation of the constraints the larger is the effect of the residuals correlation

(ρwi wj ) on the success rate of outlier identification (PCI) Consequently the higher the sensitivityindicator for outlier identification (MIB) and therefore more difficult it becomes to identify anoutlier

bull Under a scenario of soft constraints one should set out at least three soft constraints in order toidentify an outlier in the constraints

In other types of analysis for example deformation analysis of geodetic networks one shouldformulated the constraints as pseudo-observations with σc gt 0 in an adjustment model of system

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

36 of 40

equations In other words the relaxation of constraints makes it possible to estimate deformationeffects that are unmodelled in the deterministic model matrix for example deformations which have aspatial and temporal variations In that case the deformation can be modelled by covariance functionsthat determine the cofactors between constraints [28] This should be applied to control points locatedin the structure but not to control points located outside the structure The control points locatedoutside the structure should be hard constraints and have their stability verified by an independentprocess in order to analyse the deformation of the structure So one will be also able to analyse thesensitivity in terms of minimal detectable deformations andor minimal identifiable deformations [23]Future researches will be addressed to sensitivity analysis for deformation monitoring networks

Author Contributions Conceptualization VFR MTM and IK Data curation VFR MTM and IK Formalanalysis VFR and IK Funding acquisition MTM MRV and LGdSJ Investigation VFR MethodologyVFR and MTM Project administration MTM IK MRV and LGdSJ Software VFR and LGdSJSupervision MTM IK and MRV Validation VFR and MRV Writing ndash original draft VFR Writing ndashreview amp editing VFR MTM IK and MRV All authors have read and agreed to the published version of themanuscript

Funding This research was funded by the CNPqmdashConselho Nacional de Desenvolvimento Cientiacutefico eTecnoloacutegicomdashBrasil (proc no 1035872019-5) This research and the APC were also funded by PETROBRAS(Grant Number 201800545-0)

Acknowledgments In this section you can acknowledge any support given which is not covered by the authorcontribution or funding sections This may include administrative and technical support or donations in kind(eg materials used for experiments)

Conflicts of Interest The authors declare no conflict of interest

References

1 Lehmann R Neitzel F Testing the compatibility of constraints for parameters of a geodetic adjustmentmodel Journal of Geodesy 2013 87 555ndash566 doi101007s00190-013-0627-2

2 Fang X Weighted total least-squares with constraints a universal formula for geodetic symmetricaltransformations Journal of Geodesy 2015 89 459ndash469 doi101007s00190-015-0790-8

3 Amiri-Simkooei AR Weighted Total Least Squares with Singular Covariance Matrices Subject to Weightedand Hard Constraints J Surv Eng 2017 143 04017018

4 Sakic P Ballu V Royer JY A Multi-Observation Least-Squares Inversion for GNSS-Acoustic SeafloorPositioning Remote Sensing 2020 12 doi103390rs12030448

5 Odijk D Stochastic modelling of the ionosphere for fast GPS ambiguity resolution Geodesy Beyond 2000Schwarz KP Ed Springer Berlin Heidelberg Berlin Heidelberg 2000 pp 387ndash392

6 de Oliveira PS Morel L Fund F Legros R Monico JFG Durand S Durand F Modelingtropospheric wet delays with dense and sparse network configurations for PPP-RTK GPS Solutions 201721 237ndash250 doi101007s10291-016-0518-0

7 Chen B Wu L Dai W Luo X Xu Y A new parameterized approach for ionospheric tomography GPSSolutions 2019 23 96 doi101007s10291-019-0893-4

8 Ahmed E Ahmed S Mohamed D Mostafa R jag 2019 Vol 13 chapter Vertical ionospheric delayestimation for single-receiver operation p 81 2 doi101515jag-2018-0041

9 Wang A Chen J Zhang Y Meng L Wang J Performance of Selected Ionospheric Models inMulti-Global Navigation Satellite System Single-Frequency Positioning over China Remote Sensing 201911 doi103390rs11172070

10 Deo M El-Mowafy A Comparison of advanced troposphere models for aiding reductionof PPP convergence time in Australia Journal of Spatial Science 2019 64 381ndash403doi1010801449859620181472046

11 Dai X Lou Y Dai Z Hu C Peng Y Qiao J Shi C Precise Orbit Determination for GNSS ManeuveringSatellite with the Constraint of a Predicted Clock Remote Sensing 2019 11 doi103390rs11161949

12 Courant R Hilbert D Methods of mathematical physics Vol 2 Vol 2 Wiley [New York] 196213 Grafarend EW Optimization of Geodetic Networks The Canadian Surveyor 1974 28 716ndash723

doi101139tcs-1974-0120

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

37 of 40

14 Teunissen P Zero Order Design Generalized Inverses Adjustment the Datum Problem andS-Transformations Optimization and Design of Geodetic Networks Grafarend EW Sansograve F EdsSpringer Berlin Heidelberg Berlin Heidelberg 1985 pp 11ndash55

15 Dermanis A Free network solutions with the DLT method ISPRS Journal of Photogrammetry and RemoteSensing 1994 49 2 ndash 12 doihttpsdoiorg1010160924-2716(94)90061-2

16 Kotsakis C Reference frame stability and nonlinear distortion in minimum-constrained networkadjustment Journal of Geodesy 2012 86 755ndash774 doi101007s00190-012-0555-6

17 Kotsakis C Datum Definition and Minimal Constraints In Encyclopedia of Geodesy Grafarend E EdSpringer International Publishing Cham 2018 pp 1ndash6 doi101007978-3-319-02370-0_157-1

18 Matsuoka MT Rofatto VF Klein I Roberto Veronez M da Silveira LG Neto JBS Alves ACRControl Points Selection Based on Maximum External Reliability for Designing Geodetic Networks AppliedSciences 2020 10 doi103390app10020687

19 Baarda W S-transformations and criterion matrices Publ on geodesy New Series 1973 520 Schaffrin B Aspects of Network Design Optimization and Design of Geodetic Networks Grafarend

EW Sansograve F Eds Springer Berlin Heidelberg Berlin Heidelberg 1985 pp 548ndash59721 Xu PB A general solution in geodetic nonlinear rank-defect models 199722 Altamimi Z Dermanis A The Choice of Reference System in ITRF Formulation VII Hotine-Marussi

Symposium on Mathematical Geodesy Sneeuw N Novaacutek P Crespi M Sansograve F Eds Springer BerlinHeidelberg Berlin Heidelberg 2012 pp 329ndash334

23 Velsink H On the deformation analysis of point fields J Geod 2015 89 1071ndash1087doi101007s00190-015-0835-z

24 Velsink H Extendable linearised adjustment model for deformation analysis Survey Review 201547 397ndash410 doi1011791752270614Y0000000140

25 Hiddo V jag 2016 Vol 10 chapter Time Series Analysis of 3D Coordinates Using NonstochasticObservations p 5 1 doi101515jag-2015-0027

26 Rao CR Markoffrsquos Theorem with Linear Restrictions on Parameters SankhyAuml The Indian Journal ofStatistics (1933-1960) 1945 7 16ndash19

27 Velsink H Testing Methods for Adjustment Models with Constraints Journal of Surveying Engineering2018 144 04018009 doi101061(ASCE)SU1943-54280000260

28 Hiddo V Testing deformation hypotheses by constraints on a time series of geodetic observations J ApplGeod 2017 12 77ndash93 doi101515jag-2017-0028

29 Baarda W A testing procedure for use in geodetic networks Publ on geodesy New Series 1968 230 Teunissen P First and second moments of non-linear least-squares estimators Bull Geodesique (Journal of

Geodesy) 1989 63 253ndash262 doihttpsdoiorg101007BF0252047531 Teunissen P Testing Theory an introduction 2 ed Delft University Press 200632 Amiri-Simkooei A Least-squares variance component estimation theory and GPS applications Phd

thesis Delft University of Technology 200733 Koch KR Parameter estimation and hypothesis testing in linear models 2 ed Springer 199934 Tao H Feng H Xu L Miao M Long H Yue J Li Z Yang G Yang X Fan L Estimation

of Crop Growth Parameters Using UAV-Based Hyperspectral Remote Sensing Data Sensors 2020 20doi103390s20051296

35 Han M Wang Q Wen Y He M He X The Application of Robust Least Squares Method inFrequency Lock Loop Fusion for Global Navigation Satellite System Receivers Sensors 2020 20doi103390s20041224

36 Wei C Estimation for the Discretely Observed CoxacircldquoIngersollacircldquoRoss Model Driven by Small SymmetricalStable Noises Symmetry 2020 12 doi103390sym12030327

37 Sakic P Ballu V Royer JY A Multi-Observation Least-Squares Inversion for GNSS-Acoustic SeafloorPositioning Remote Sensing 2020 12 doi103390rs12030448

38 Livadiotis G General Fitting Methods Based on Lq Norms and their Optimization Stats 2020 3 16ndash31doi103390stats3010002

39 Farooq SZ Yang D Ada ENJ A Cycle Slip Detection Framework for Reliable Single Frequency RTKPositioning Sensors 2020 20 doi103390s20010304

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

38 of 40

40 Araveeporn A Comparing Parameter Estimation of Random Coefficient Autoregressive Model byFrequentist Method Mathematics 2020 8 doi103390math8010062

41 Zhang C Peng T Zhou J Ji J Wang X An Improved Autoencoder and Partial Least SquaresRegression-Based Extreme Learning Machine Model for Pump Turbine Characteristics Applied Sciences2019 9 doi103390app9193987

42 Ji J Yang M Jiang L He J Teng Z Liu Y Song H Output-Only Parameters Identification ofEarthquake-Excited Building Structures with Least Squares and Input Modification Process AppliedSciences 2019 9 doi103390app9040696

43 Kargoll B On the theory and application of model misspecification tests in geodesy Doctoral thesisUniversity of Bonn Landwirtschaftliche Fakultaumlt German Bonn 2007

44 Lehmann R On the formulation of the alternative hypothesis for geodetic outlier detection J Geod 201387 373ndash386 doi101007s00190-012-0607-y

45 Rofatto VF Matsuoka MT Klein I Roberto Veronez M da Silveira LG A Monte Carlo-Based OutlierDiagnosis Method for Sensitivity Analysis Remote Sensing 2020 12 doi103390rs12050860

46 Lehmann R Improved critical values for extreme normalized and studentized residuals in GaussndashMarkovmodels J Geod 2012 86 1137ndash1146 doi101007s00190-012-0569-0

47 Rofatto VF Matsuoka MT Klein I Veronez MR Bonimani ML Lehmann R A half-century ofBaardarsquos concept of reliability a review new perspectives and applications Surv Rev 2018 0 1ndash17[httpsdoiorg1010800039626520181548118] doi1010800039626520181548118

48 Zaminpardaz S Teunissen P DIA-datasnooping and identifiability J Geod 2019 93 85ndash101doi101007s00190-018-1141-3

49 Kok JJ States U On data snooping and multiple outlier testing [microform] Johan J Kok US Dept ofCommerce National Oceanic and Atmospheric Administration National Ocean Service Charting andGeodetic Services For sale by the National Geodetic Information Center NOAA Rockville Md 1984 p61 p

50 Knight NL Wang J Rizos C Generalised measures of reliability for multiple outliers Journal of Geodesy2010 84 625ndash635 doi101007s00190-010-0392-4

51 Gui Q Li X Gong Y Li B Li G A Bayesian unmasking method for locating multiplegross errors based on posterior probabilities of classification variables J Geod 2011 85 191ndash203doi101007s00190-010-0429-8

52 Klein I Matsuoka MT Guzatto MP Nievinski FG An approach to identify multipleoutliers based on sequential likelihood ratio tests Surv Rev 2017 49 449ndash457[httpsdoiorg1010800039626520161212970] doi1010800039626520161212970

53 Hawkins DM Identification of Outliers 1 ed Springer Netherlands 1980 httpsdoiorg101007978-94-015-3994-4

54 Baarda W Statistical concepts in geodesy Publ on geodesy New Series 1967 255 Imparato D Teunissen P Tiberius C Minimal Detectable and Identifiable Biases for quality

control Surv Rev 2019 51 289ndash299 [httpsdoiorg1010800039626520181437947]doi1010800039626520181437947

56 Teunissen PJG Distributional theory for the DIA method Journal of Geodesy 2018 92 59ndash80doi101007s00190-017-1045-7

57 Foumlrstner W Reliability and discernability of extended Gauss-Markov models Seminar on mathematicalmodels of GeodeticPhotogrammetric Point Determination with Regard to Outliers and Systematic Errors 1983 Vol Series A pp 79ndash103

58 Wang J Knight NL New Outlier Separability Test and Its Application in GNSS Positioning 201259 Lehmann R Loumlsler M Multiple Outlier Detection Hypothesis Tests versus Model Selection by

Information Criteria J Surv Eng 2016 142 0401601760 Teunissen PJG Imparato D Tiberius CCJM Does RAIM with Correct Exclusion Produce Unbiased

Positions Sensors 2017 17 doi103390s1707150861 Lehmann R Loumlsler M Congruence analysis of geodetic networks ndash hypothesis tests versus model

selection by information criteria J Appl Geod 2017 11 271ndash283 doi101515jag-2016-0049

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

39 of 40

62 Rofatto V Matsuoka M Klein I DESIGN OF GEODETIC NETWORKS BASED ON OUTLIERIDENTIFICATION CRITERIA AN EXAMPLE APPLIED TO THE LEVELING NETWORK Bull Geod Sci2018 24 152ndash170 httpdxdoiorg101590s1982-21702018000200011

63 Nguyen VK Renault Milocco R Environment Monitoring for Anomaly Detection System UsingSmartphones Sensors 2019 19 doi103390s19183834

64 Nie Y Yang L Shen Y Specific Direction-Based Outlier Detection Approach for GNSS Vector NetworksSensors 2019 19 doi103390s19081836

65 Klein I Matsuoka MT Guzatto MP Nievinski FG Veronez MR Rofatto VF A newrelationship between the quality criteria for geodetic networks Journal of Geodesy 2019 93 529ndash544doi101007s00190-018-1181-8

66 Mierlo JV Statistical Analysis of Geodetic Measurements for the Investigation of Crustal Movements InRecent Crustal Movements 1977 Whitten C Green R Meade B Eds Elsevier 1979 Vol 13 Developmentsin Geotectonics pp 457 ndash 467 doihttpsdoiorg101016B978-0-444-41783-150072-6

67 van der Marel H Roumlsters AJM Statistical Testing and Quality Analysis in 3-D Networks (part II)Application to GPS Global Positioning System An Overview Bock Y Leppard N Eds Springer NewYork New York NY 1990 pp 290ndash297

68 Yang L Wang J Knight NL Shen Y Outlier separability analysis with a multiple alternative hypothesestest J Geod 2013 87 591ndash604 doi101007s00190-013-0629-0

69 Proacuteszynski W Revisiting Baardarsquos concept of minimal detectable bias with regard to outlier identifiabilityJ Geod 2015 89 993ndash1003 doi101007s00190-015-0828-y

70 Romano JP Wolf M Multiple Testing of One-Sided Hypotheses Combining Bonferroni and the BootstrapPredictive Econometrics and Big Data Kreinovich V Sriboonchitta S Chakpitak N Eds SpringerInternational Publishing Cham 2018 pp 78ndash94

71 Teunissen P An Integrity and Quality Control Procedure for use in Multi Sensor Integration 199072 Aydin C Demirel H Computation of Baardarsquos lower bound of the non-centrality parameter J Geod

2004 78 437ndash441 doi101007s00190-004-0406-173 Aydin C Power of Global Test in Deformation Analysis J Surv Eng 2012 138 51ndash56 https

doiorg101061(ASCE)SU1943-5428000006474 Bonferroni C Teoria statistica delle classi e calcolo delle probabilitagrave Pubblicazioni del R Istituto superiore di

scienze economiche e commerciali di Firenze Libreria internazionale Seeber 193675 Box GEP Muller ME A Note on the Generation of Random Normal Deviates The Annals of Mathematical

Statistics 1958 29 610ndash611 httpsdoi101214aoms117770664576 Lemeshko BY Lemeshko SB Extending the Application of Grubbs-Type Tests in Rejecting Anomalous

Measurements Measurement Techniques 2005 48 536ndash547 doi101007s11018-005-0179-977 Lehmann R Scheffler T Monte Carlo based data snooping with application to a geodetic network J

Appl Geod 2011 5 123ndash134 httpsdoiorg101515JAG201101478 Matsumoto M Nishimura T Mersenne twister A 623-dimensionally equidistributed uniform

pseudo-random number generator ACM Transactions on Modeling and Computer Simulation 1998 8 3ndash30httpsdlacmorgcitationcfmid=272991

79 Ihde J Saacutenchez L Barzaghi R Drewes H Foerste C Gruber T Liebsch G Marti U Pail R SiderisM Definition and Proposed Realization of the International Height Reference System (IHRS) Surveys inGeophysics 2017 38 549ndash570 doi101007s10712-017-9409-3

80 Lehmann R Observation error model selection by information criteria vs normality testing StudiaGeophysica et Geodaetica 2015 59 489ndash504 doi101007s11200-015-0725-0

81 Algarni DA Ali AE Heighting and Distance Accuracy with Electronic Digital Levels Journal of King SaudUniversity - Engineering Sciences 1998 10 229 ndash 239 doihttpsdoiorg101016S1018-3639(18)30698-6

82 Gemin ARS Matos AS Faggion PLAs APPLICATION OF CALIBRATION CERTIFICATE OFDIGITAL LEVELING SYSTEMS IN THE MONITORING OF STRUCTURES A CASE STUDY AT THEGOVERNADOR JOSAtilde RICHA HYDROELECTRIC POWER PLANT - PR Boletim de CiAtildeGeodAtilde 201824 235 ndash 249

83 Takalo M Rouhiainen P Development of a System Calibration Comparator for Digital Levels in FinlandNordic Journal of Surveying and Real Estate Research 1 1

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

40 of 40

84 Wiedemann W Wagner A Wunderlich T Using IATS to Read and Analyze Digital Levelling Staffs SIG2016 Paar R Marendic A Zrinjski M Eds Croatian Geodetic Society 2016 pp 515ndash526

85 PJG T The Geometry of Geodetic Inverse Linear Mapping and Non-linear Adjustment Publications onGeodesy New Series 1985 8

86 Rao CR Mitra SK Generalized inverse of a matrix and its applications 197287 Hekimoglu S Erenoglu RC Sanli DU Erdogan B Detecting Configuration Weaknesses in Geodetic

Networks Survey Review 2011 43 713ndash730 doi101179003962611X13117748892632

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

Page 13: Effects of Hard and Soft Equality Constraints on

13 of 40

where noverminus represents the number of experiments in which IDS identifies and removes morethan one non-outlying measurement whereas the lsquotrue outlierrsquo remains in the dataset (Poverminus isthe probability corresponding to this error probability class)

Pol =nolm

(47)

where nol is the number of experiments in which the detector in Equation (10) flags two (or more)measurements simultaneously during a given iteration of IDS Here this is referred to as thenumber of statistical overlap nol and Pol corresponds to its probability

6 Finally the sensitivity indicators (MDB and MIB) are computed based on Equation 33 andEquation 34 respectively

In this paper the probability levels associated with IDS were computed for each observationindividually and for each outlier magnitude However they were grouped into clusters based onnumber of local redundancy (ri) and maximum absolute correlation between the w-test statistics(ρwi wj ) Furthermore we take care to control the family-wise error rate

4 Case study of Levelling Geodetic Network

The analyse of the constraints effects on the probability levels of IDS are performed by an exampleof a levelling geodetic network

A levelling geodetic network is a set of points located on the Earthrsquos surface or near it andinterconnected by height difference measurements Some of these points are associated with a verticalheight reference system (ie some points have known height) whereas others are parameters (unknownheights) The term known here means that those points with known heights are constraints necessaryto ensure that parameters (unknown heights) are sufficiently estimable In the sense of modern geodeticreference systems realisation and unification of a vertical height system consists of a combination ofGNSS (Global Navigation Satellite System) and spirit levelling with geoid models We will not gointo the details about height geodetic reference system and frame (Readers who wish to have furtherdetails on that issue please see eg [79])

The mathematical model associated with a geodetic levelling network is linear (ie the levellingmeasurements bear a known linear relationship with the unknown heights) Geodetic networksusually has more measurements than parameters (ie n gt u) ie we have a redundant measurementsystem However due to intrinsic random errors in a system redundant measurements often leadto an inconsistent system of equations To make the system consistent we have to introduce theinformation about the random measurement errors The stochastical properties of the measurementerrors are directly associated with the assumption of the probability distribution of these errors Ingeodesy and many other scientific branches the well-known normal distribution is one of the mostused as measurement error model [80] Because of this the model ceases to be purely mathematicaland becomes a statistical model with functional and stochastic part

In the absence of outliers ie under null hypothesisH0 in Equation (1) the model for levellinggeodetic network can be written as follows

∆himinusj + e∆himinusj= hj minus hi (48)

where ∆himinusj is the height difference measured from point i to j and e∆himinusjis the random error associated

with the levelling measurement For a only one single levelling line one of these points is the constraint(known height) from which the height of another point is determined The point with known height isalso referred to as control point or benchmark In a geodetic network on the other hand we have a setof levelling lines that connect both from points of unknown height and from points of known height(constraints) Under normal working conditions (ie H0) the measurement errors model is then givenby Equation (2)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

14 of 40

In this case study we demonstrate the application of the proposed algorithm by [45] subject to adifferent constraints scenarios for that levelling geodetic network

Recent work has been focused on non-stochastic constraints (hard constraints) Here we alsoconsider cases where constraints are subject to a random errors (ie soft constraints) That softconstraints are non-deterministic and therefore they are measured model elements that can bedesignated as a priori information The approach of this example can be applied in the designstage of geodetic network analysis [6265] Furthermore the experiments performed here can also beextended to geodetic deformation analysis when the deformation effects are unmodelled in the senseof deterministic form [28]

41 Problem description

To analyse the constraints effects on the IDS an example is taken from a geodetic levellingnetwork with 12 height differences between the points The equipment used to measure the leveldifference can be an electronic digital level In that case the levelling measurement system comprisesof a special bar-coded staff (also called barcode rod) and a digital level (instrument) A digital level isbasically a telescope that enables a horizontal line of sight Digital levels consist of additional electronicimage processing components to automatically read and analyse digital (bar coded) levelling staffswhere the graduation is replaced by a manufacturer dependent code pattern Generally the result isautomatically stored in the data collector of the digital level An example of a digital level ndash bar-codestaff system is displayed in the Figure 2 For more details about digital level see eg [81ndash84]

Figure 2 Example of a digital level ndash bar-code staff system [45]

The standard-deviation of the uncorrelated measurements were the same and taken equal toσ = 1mm The points are indicated as A to G The eight network configuration are displayed in Figure3(a b c d and e) and their details are given as follows

1 Figure 3(a) Network with 1 hard constraint (ie network minimally constrained) Since thedimension of the network is 1D the minimum information necessary to estimate the unknownheights is one The height of G was fixed as control point (hard constraint) and 6 unknown heights(ABCDEF) were minimally constrained Therefore the redundancy of the system (or overalldegrees of freedom) was r = nminus rank(A) = nminus u = 12minus 6 = 6

2 Figure 3(b) Network with one extra hard constraint (ie two hard constraints) The heights A andD were taken as hard constraints (ie heights A and D were fixed) The redundancy of the system

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

15 of 40

in that case was r = 12minus 5 = 7 with 5 unknown heights (BCEFG) over-constrained

3 Figure 3(c) Network with two extra hard constraints (ie three hard constraints)The heights A Dand G were taken as hard constraints In that case the redundancy of the system was r = 12minus 4 = 8

4 Figure 3(d) Network with two soft constraints (A and D) In that case a standard-deviation largerthan zero was assigned to both constraints (ie σc gt 0 In other words A and D were processedas being both observations and unknown parameters ie A and D were pseudo-observationsHere the both constraints were simultaneously relaxed by considering their uncertainties 10 timesworse than measurements (ie σc = 10times σ = 10mm) 10 times better than measurements (ieσc = 01mm) and their uncertainties equal to the measurements (σc = 1mm) In that case theredundancy of the system was r = 14minus 7 = 7

5 Figure 3(e) Network processed with A D and G as pseudo-observations Those three constraintswere simultaneously relaxed by considering their standard-deviations equal to σc = 10mm (10 timesworse than measurements) σc = 01mm (10 times better than measurements) and σc = 1mm (thesame as the measurements) In that case the redundancy of the system was r = 15minus 7 = 8

y1

y2

y3

y4

y5

y6

y7 y8

y9 y10

B C

D

E F

G y12 y11

A

y1

y2

y3

y4

y5

y6

y7 y8

y9 y10

A

B C

D

E F

G y12 y11

y1

y2

y3

y4

y5

y6

y7 y8

y9 y10

A

B C

D

E F

G y12 y11

(a) (b)

(c) (d)

(e)

Unknown Heights Soft Constraints Hard Constraints

y1

y2

y3

y4

y5

y6

y7 y8

y9 y10

A

B C

D

E F

G y12 y11

y1

y2

y3

y4

y5

y6

y7 y8

y9 y10

A

B C

D

E F

G y12 y11

Figure 3 Levelling Geodetic Network subject to different constraint scenarios

The following system of equations for that problem is given by

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

16 of 40

y1 + e1 = hB minus hA

y2 + e2 = hC minus hB

y3 + e3 = hD minus hC

y7 + e7 = hB minus hG

y8 + e8 = hC minus hG

y11 + e11 = hB minus hF

y12 + e12 = hC minus hE

(49)

The functional model (A) for the system of equations in 49 is given by

A =

minus1 1 0 0 0 0 00 minus1 1 0 0 0 00 0 minus1 1 0 0 00 0 0 minus1 1 0 00 0 0 0 minus1 1 01 0 0 0 0 minus1 00 1 0 0 0 0 minus10 0 1 0 0 0 minus10 0 0 0 1 0 minus10 0 0 0 0 1 minus10 1 0 0 0 minus1 00 0 1 0 minus1 0 0

(50)

Note that the rank defect of the matrix A is uminus rank(A) = 7minus 6 = 1 In that case there is neededto add at least one constraint in order to avoid rank deficiency of the matrix A This is guaranteedwhen we take one height as known For example from the network in Figure 3(a) we have added theheight G as known (ie as a hard constraint) In that case the constraint equation should be addedinto the system in 49 ie

y13 = hG with σy13 = 0 (51)

noticing that because the standard deviation is zero the observation is non-stochastic (hard constraint)and the residual ey13 = 0 This can generate problems in the inversion of the covariance matrix of theobservations Qe for the calculation of the weight matrix W because the weight for that constraintwould be 1

0 = infin In order to avoid that problem we have eliminated the rank deficiency of matrixA by removing the seventh column of matrix A in 50 associated with the height G Now we haveuminus rank(A) = 6minus 6 = 0

The constraint defines the geodetic datum ie the S-system ([85]p41) Another approach tosolving the system of equations in 2 could be based on generalised (pseudo) inverses (see eg [86])

The location of the constraints can be chosen in some circumstances for example during thedesign stage of a geodetic network For the special case of having a minimally constrained systemthe location of the constraint will not influence the w-test statistics and the sensitivity indicators (MIBand MDB) [18] However more constraints than the minimum necessary to have a solution (ie extraconstraints or redundant constraints) can change the least-squares residuals and hence w-test statisticsand the minimal biases

From the network with one extra constraint (2 constraints) in Figure 3(b) for example the bothfirst (height A) and fourth column (height D) of matrix A in 50 were eliminated in the case of having

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

17 of 40

the two heights as hard constraints For the case where these two heights (A and D) were taken as softconstraints however two observation equations were added to Equation 49 ie

y13 + e13 = hA σy13 gt 0

y14 + e14 = hD σy14 gt 0(52)

In the case of soft constraints in Equation 52 two lines were added in matrix A In other wordsA and D were taken as pseudo-observations In that case the rank deficiency was also null (ieuminus rank(A) = 7minus 7 = 0) the redundancy of the system was r = nminus rank(A) = nminus u = 7 and thematrix A was given as follows

A =

minus1 1 0 0 0 0 00 minus1 1 0 0 0 00 0 minus1 1 0 0 00 0 0 minus1 1 0 00 0 0 0 minus1 1 01 0 0 0 0 minus1 00 1 0 0 0 0 minus10 0 1 0 0 0 minus10 0 0 0 1 0 minus10 0 0 0 0 1 minus10 1 0 0 0 minus1 00 0 1 0 minus1 0 01 0 0 0 0 0 00 0 0 1 0 0 0

(53)

For this example of two soft constraints and by considering the both soft constraints withstandard-deviation σc = 10mm the symmetric and positive semi-definite covariance matrix of theobservations (Qe) was given as follows

Qe =

1 0 0 middot middot middot 0 00 1 0 middot middot middot 0 00 0 1 middot middot middot 0 0

0 0 0 middot middot middot 100 00 0 0 middot middot middot 0 100

(54)

The last two rows and columns of the matrix Qe in Equation 54 refer to the variances (σc2 =

(10mm)2 = 100mm2) of the heights constraints A and D respectivelySimilarly matrices A and Qe were constructed for the other cases studied hereAlthough other measurements are able to identify an outlier for the case of having only one single

soft constraint the pseudo-observation (constraint) is not In that case the defect configurationis associated with the additional parameter in the constraint (ie the presence of an outlier inthe constraint) In other words an additional parameter on the soft constraint will not estimableFor example if the height point G was taken as a soft constraint the presence of an outlier inpseudo-observation G would lead to rank deficiency of matrix A ie u minus rank(A) = 8 minus 7 = 1Therefore the case of having only one single soft constraint was not considered here

42 Result of the Hard Constraint Effects on the Iterative Outlier Elimination Procedure

In this section we present the results of the effects of the hard constraint on the IDS The scenariosin Figure 3(a) (network minimally constrained) Figure 3(b) (two hard constraints) and Figure 3(c)(three hard constraints) were considered for the analysis

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

18 of 40

The correlation coefficient between w-test statistics (ρwi wj ) were computed for the three scenariosaccording to Equation(20) Table 1 provides that correlation for that network minimally constrainedTable 2 for that network over-constrained with two hard constraints and Table 3 for that networkover-constrained with three hard constraints

Table 1 Correlation matrix of w-test statistics for the levelling network minimally constrained in Figure3(a)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12

wy1 10 02 01 01 02 10 -02 00 00 02 -04 -01wy2 10 02 02 03 02 05 -05 -02 02 03 -03wy3 10 10 02 01 00 02 -02 00 01 04wy4 10 02 01 00 02 -02 00 01 04wy5 10 02 -02 02 05 -05 03 -03wy6 10 -02 00 00 02 -04 -01wy7 10 -03 -03 -04 -04 -01wy8 10 -04 -03 -01 -04wy9 10 -03 01 04wy10 10 04 01wy11 10 -01wy12 100

Table 2 Correlation matrix of w-test statistics for the levelling network with two hard constraints inFigure 3(b)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12

wy1 10 04 04 -03 -01 04 -03 01 01 01 -04 -01wy2 10 04 -01 01 -01 04 -04 -01 01 03 -03wy3 10 04 -01 -03 -01 03 -01 -01 01 04wy4 10 04 04 01 01 -03 01 01 04wy5 10 04 -01 01 04 -04 03 -03wy6 10 -01 -01 -01 03 -04 -01wy7 10 -04 -03 -04 -04 -01wy8 10 -04 -03 -01 -04wy9 10 -04 01 04wy10 10 04 01wy11 10 -01wy12 10

Table 3 Correlation matrix of w-test statistics for the levelling network with three hard constraints inFigure 3(c)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12

wy1 10 03 01 -01 -01 01 -04 -01 -01 -01 -03 -01wy2 10 03 -01 01 -01 03 -03 -01 01 03 -03wy3 10 01 -01 -01 01 04 01 01 01 03wy4 10 03 01 -01 -01 -04 -01 01 03wy5 10 03 -01 01 03 -03 03 -03wy6 10 01 01 01 04 -03 -01wy7 10 -01 -01 -01 -03 -01wy8 10 -01 -01 -01 -03wy9 10 -01 01 03wy10 10 03 01wy11 10 -01wy12 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

19 of 40

Table 4 gives the local redundancy (ri) the standard-deviation of the LS-estimated outlier σnablai andthe maximum absolute correlation (maxρwi wj

) for each scenario of hard constraint set out of in thisstudy ie Figure 3(ab and c)

Table 4 Local redundancy (ri) standard-deviation of the LS-estimated outlier σnablai and the maximumabsolute correlation (maxρwi wj

) for each scenario of hard constraint

1 hard constraint 2 hard constraints 3 hard constraintsMeasurement ri σnablai maxρwi wj

ri σnablai maxρwi wjri σnablai maxρwi wj

y1 0396 1589 100 0583 1309 036 0708 1188 041y2 0500 1414 047 0583 1309 036 0583 1309 032y3 0396 1589 100 0583 1309 036 0708 1188 041y4 0396 1589 100 0583 1309 036 0708 1188 041y5 0500 1414 047 0583 1309 036 0583 1309 032y6 0396 1589 100 0583 1309 036 0708 1188 041y7 0563 1333 047 0583 1309 036 0708 1188 041y8 0563 1333 047 0583 1309 036 0708 1188 041y9 0563 1333 047 0583 1309 036 0708 1188 041y10 0563 1333 047 0583 1309 036 0708 1188 041y11 0583 1309 043 0583 1309 036 0583 1309 032y12 0583 1309 043 0583 1309 036 0583 1309 032

Here the twelve levelling measurements were clustered into four clusters The clustering wasdefined according to the local redundancy (ri) and the maximum absolute correlation (maxρwi wj

) inTable 4 This is the first time that a clustering technique based on the similarity of local redundancyand the maximum absolute correlation between w-test statistics is applied to a problem of geodeticnetworks Similarly this has been done in [45] The four cluster were defined as follows

bull Cluster 1 y1 y3 y4 and y6bull Cluster 2 y2 and y5bull Cluster 3 y7 y8 y9 and y10bull Cluster 4 y11 and y12

The probability levels associated with IDS were averaged for each of these clusters Here we computethe probability levels of the IDS based on the procedure in Section (3) [45] The critical values werek = 389 k = 393 and k = 393 for 1 hard constraint 2 hard constraints and 3 hard constraintsrespectively These critical values were found for αprime = 0001 according to the procedure described inAppendix The probability of correct identification (PCI) and correct detection (PCD = 1minusPMD)are displayed in Figure 4 for each number of hard constraint (denoted by hc)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

20 of 40

(c) (d)

(e) (f)

(g) (h)

(a) (b)

Figure 4 Probability of correct identification (PCI) and Probability of correct detection (PCD) for thecase of hard constraints and for αprime = 0001 Cluster 1(ab) Cluster 2(cd) Cluster 3(ef) and Cluster4(gh)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

21 of 40

The outlier magnitude were defined from |5σ| to |9σ| The outlier of |5σ| was chosen because itis approximately the lowest MDB0(i) of the network when a single hypothesis testing is in play ascan be seen in Equation (14) or Equation (16) That MDB0(i) of |5σ| was computed for significancelevel of αprime = 0001 and a power of the test γ0 = 08 This strategy reduces the search space for a MIB(Minimal Identifiable Bias) because we will always have the following inequality MIB ge MDB0(i)[4755] Remember that the IDS procedure is an example of multiple hypothesis testing (see 2)

The sensitivity indicators MDB and MIB for IDS were also computed according to Equation (33)and (34) respectively The success rate for outlier detection and outlier identification were taken asbeing PCD = PCI = 08 respectively Table 5 provides the values of MDB and MIB for that case ofhard constraints

Table 5 MDB and MIB for the case of hard constraints based on αprime = 0001 and PCD = PCI = 08

1 hard constraint 2 hard constraints 3 hard constraintsCluster MDB (σ) MIB (σ) MDB (σ) MIB (σ) MDB (σ) MIB (σ)

1 75 - 63 63 57 572 67 68 63 64 63 643 64 64 63 63 58 584 64 64 64 64 64 64

Figure 5 shows the probability of wrong exclusion (PWE) The over-identification cases (Pover+

and Poverminus) were smaller than 0001 (ie they were practically null) There were not statistical overlap(Pol) for clusters 2 3 and 4 We will discuss more about statistical overlap (Pol) later

(c) (d)

(a) (b)

Figure 5 Probability of wrong exclusion (PWE) for the case of hard constraints and for αprime = 0001Cluster 1(a) Cluster 2(b) Cluster 3(c) and Cluster 4(d)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

22 of 40

43 Result of the Soft Constraint Effects on the Iterative Outlier Elimination Procedure

The both configuration in Figure 3(d) and Figure 3(e) were analysed in terms of soft constraintsIn that case the critical values were k = 395 k = 395 and k = 392 for two soft constraints withσc = 01mm σc = 1mm and σc = 10mm respectively In the case of three soft constraints the criticalvalues found were k = 399 k = 399 and k = 396 for σc = 01mm σc = 1mm and σc = 10mmrespectively All these critical values were computed for αprime = 0001

The correlation coefficient between w-test statistics (ρwi wj ) are displayed in Table 6 Table 7and Table 8 for two soft constraints with standard-deviation 10 times larger than measurements(ie σc = 10times σ = 10mm) 10 times better than measurements (ie σc = 01mm) and equal to themeasurements (σc = 1mm) respectively

Table 6 Correlation matrix of w-test statistics for the levelling network with two soft constraints withσc = 10mm in Figure 3(d)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14

wy1 10 02 01 01 02 10 -02 00 00 02 -04 -01 006 -01wy2 10 02 02 03 02 05 -05 -02 02 03 -03 003 00wy3 10 10 02 01 00 02 -02 00 01 04 01 -01wy4 10 02 01 00 02 -02 00 01 04 -01 01wy5 10 02 -02 02 05 -05 03 -03 00 00wy6 10 -02 00 00 02 -04 -01 -01 01wy7 10 -03 -03 -04 -04 -01 00 00wy8 10 -04 -03 -01 -04 00 00wy9 10 -03 01 04 00 00wy10 10 04 01 00 00wy11 10 -01 00 00wy12 10 00 00wy13 10 -10wy14 10

Table 7 Correlation matrix of w-test statistics for the levelling network with two soft constraints withσc = 01mm in Figure 3(d)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14

wy1 10 04 04 -03 -01 04 -03 01 01 01 -04 -01 06 -06wy2 10 04 -01 02 -01 04 -04 -01 01 03 -03 04 -04wy3 10 04 -01 -03 -01 03 -01 -01 01 04 06 -06wy4 10 04 04 01 01 -03 01 01 04 -06 06wy5 10 04 -01 01 04 -04 03 -03 -04 04wy6 10 -01 -01 -01 03 -04 -01 -06 06wy7 10 -04 -03 -04 -04 -01 -02 02wy8 10 -04 -03 -01 -04 02 -02wy9 10 -04 01 04 02 -02wy10 10 04 01 -02 02wy11 10 -01 00 00wy12 10 00 00wy13 10 -10wy14 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

23 of 40

Table 8 Correlation matrix of w-test statistics for the levelling network with two soft constraints withσc = 1mm in Figure 3(d)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14

wy1 10 03 02 -01 01 07 -03 00 01 02 -04 -01 04 -04wy2 10 03 01 03 01 04 -04 -01 01 03 -03 03 -03wy3 10 07 01 -01 00 03 -02 -01 01 04 04 -04wy4 10 03 02 01 02 -03 00 01 04 -04 04wy5 10 03 -01 01 04 -04 03 -03 -03 03wy6 10 -02 -01 00 03 -04 -01 -04 04wy7 10 -03 -03 -04 -04 -01 -01 01wy8 10 -04 -03 -01 -04 01 -01wy9 10 -03 01 04 01 -01wy10 10 04 01 -01 01wy11 10 -01 00 00wy12 10 00 00wy13 10 -10wy14 10

Table 9 gives the local redundancy (ri) the standard-deviation of the LS-estimated outlier σnablai andthe maximum absolute correlation (maxρwi wj

) for the scenarios of two constraints

Table 9 Local redundancy (ri) standard-deviation of the LS-estimated outlier σnablai (mm) and themaximum absolute correlation (maxρwi wj

) for each scenario of two soft constraints

σc = 01mm σc = 1mm σc = 10mmMeasurement ri σnablai maxρwi wj

ri σnablai maxρwi wjri σnablai maxρwi wj

y1 0581 1312 0564 0471 1457 0681 0397 1587 0994y2 0582 1311 0376 0533 1369 0423 0501 1413 0471y3 0581 1312 0564 0471 1457 0681 0397 1587 0994y4 0581 1312 0564 0471 1457 0681 0397 1587 0994y5 0582 1311 0376 0533 1369 0423 0501 1413 0471y6 0581 1312 0564 0471 1457 0681 0397 1587 0994y7 0583 1310 0359 0571 1324 0423 0563 1333 0471y8 0583 1310 0359 0571 1324 0423 0563 1333 0471y9 0583 1310 0359 0571 1324 0423 0563 1333 0471y10 0583 1310 0359 0571 1324 0423 0563 1333 0471y11 0583 1309 0358 0583 1309 0398 0583 1309 0433y12 0583 1309 0358 0583 1309 0398 0583 1309 0433y13 0007 1163 1000 0300 1826 1000 0497 14189 1000y14 0007 1163 1000 0300 1826 1000 0497 14189 1000

From Table 9 five clusters were defined for each case of two soft constraints ie for the casewhere heights A and D were given as soft constraints in Figure 3(d) as follows

bull Cluster 1 y1 y3 y4 and y6bull Cluster 2 y2 and y5bull Cluster 3 y7 y8 y9 and y10bull Cluster 4 y11 and y12bull Cluster 5 y13 and y14

The probabilities of correct identification (PCI) and correct detection (PCD) for the measurements(Cluster 1 to Cluster 4) subject to the scenarios of two soft constraints (heights A and D) are displayedin Figure 6

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

24 of 40

(c) (d)

(e) (f)

(g) (h)

(a) (b)

Figure 6 Probability of correct identification (PCI) and Probability of correct detection (PCD) for themeasurements subject to the scenarios of two soft constraints for αprime = 0001 Cluster 1(ab) Cluster2(cd) Cluster 3(ef) and Cluster 4(gh)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

25 of 40

Note that the Cluster 5 is associated with the two soft constraints (ie y13 and y14) The probabilityof correct identification (PCI) for these both soft constraints were null However the probability ofcorrect detection were not Figure 7 shows the probability of correct detection (PCD) for these two softconstraints (ie heights A and D)

Figure 7 Probability of correct detection PCD for the two soft constraints (Cluster 5 heights A and D)and for αprime = 0001

The probability of wrong exclusion PWE for the measurements (Cluster 1 to Cluster 4) subject tothe scenarios of two soft constraints (heights A and D) are displayed in Figure 8 Figure 9 gives the theprobability of wrong exclusion PWE for that two constraints (ie heights A and D)

(c) (d)

(a) (b)

Figure 8 Probability of wrong exclusion PWE for the measurements subject to the scenarios of two softconstraints for αprime = 0001 Cluster 1(a) Cluster 2(b) Cluster 3(c) and Cluster 4(d)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

26 of 40

Figure 9 Probability of wrong exclusion PWE for the two soft constraints (Cluster 5 heights A and D)and for αprime = 0001

The over-identification cases (Pover+ and Poverminus) and the statistical overlap (Pol) were practicallynull for that case

The sensitivity indicators (MDB and MIB) for each scenario of two soft constraints are displayedin Table 10

Table 10 MDB and MIB for the case of two soft constraints based on αprime = 0001 and PCD = PCI = 08

σc = 10mm σc = 1mm σc = 01mmCluster MDB (σ) MIB (σ) MDB (σ) MIB (σ) MDB (σ) MIB (σ)

1 75 25 7 71 63 632 68 68 66 66 63 633 64 64 64 64 63 634 63 63 63 63 63 635 68 - 88 - 57 -

As with the case of two soft constraints Tables 1112 and 13 show the correlations (ρwi wj ) for thecase where there were three soft constraints ie for the network configuration in Figure 3(e)

Table 11 Correlation matrix of w-test statistics for the levelling network with three soft constraintswith σc = 10mm in Figure 3(e)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14 wy15

wy1 10 02 01 01 02 10 -02 00 00 02 -04 -01 01 00 00wy2 10 02 02 03 02 05 -05 -02 02 03 -03 00 00 00wy3 10 10 02 01 00 02 -02 00 01 04 00 -01 00wy4 10 02 01 00 02 -02 00 01 04 00 01 00wy5 10 02 -02 02 05 -05 03 -03 00 00 00wy6 10 -02 00 00 02 -04 -01 -01 00 00wy7 10 -03 -03 -04 -04 -01 00 00 00wy8 10 -04 -03 -01 -04 00 00 00wy9 10 -03 01 04 00 00 00wy10 10 04 01 00 00 00wy11 10 -01 00 00 00wy12 10 00 00 00wy13 10 -05 -05wy14 10 -05wy15 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

27 of 40

Table 12 Correlation matrix of w-test statistics for the levelling network with three soft constraintswith σc = 01mm in Figure 3(e)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14 wy15

wy1 10 03 01 -01 -01 01 -04 -01 -01 -01 -03 -01 07 -01 -04wy2 10 03 -01 01 -01 03 -03 -01 01 03 -03 03 -03 00wy3 10 01 -01 -01 01 04 01 01 01 03 01 -07 04wy4 10 03 01 -01 -01 -04 -01 01 03 -01 07 -04wy5 10 03 -01 01 03 -03 03 -03 -03 03 00wy6 10 01 01 01 04 -03 -01 -07 01 04wy7 10 -01 -01 -01 -03 -01 -04 -01 04wy8 10 -01 -01 -01 -03 -01 -04 04wy9 10 -01 01 03 -01 -04 04wy10 10 03 01 -04 -01 04wy11 10 -01 00 00 00wy12 10 00 00 00wy13 10 -02 -06wy14 10 -06wy15 10

Table 13 Correlation matrix of w-test statistics for the levelling network with three soft constraintswith σc = 1mm in Figure 3(e)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14 wy15

wy1 10 03 01 00 01 06 -03 00 00 01 -04 -01 05 -02 -02wy2 10 03 01 03 01 04 -04 -01 01 03 -03 02 -02 00wy3 10 06 01 00 00 03 -01 00 01 04 02 -05 02wy4 10 03 01 00 01 -03 00 01 04 -02 05 -02wy5 10 03 -01 01 04 -04 03 -03 -02 02 00wy6 10 -01 00 00 03 -04 -01 -05 02 02wy7 10 -03 -02 -03 -04 -01 -02 00 02wy8 10 -03 -02 -01 -04 00 -02 02wy9 10 -03 01 04 00 -02 02wy10 10 04 01 -02 00 02wy11 10 -01 00 00 00wy12 10 00 00 00wy13 10 -04 -05wy14 10 -05wy15 10

Table 14 Local redundancy (ri) standard-deviation of the LS-estimated outlier σnablai (mm) and themaximum absolute correlation (maxρwi wj

) for each scenario of three soft constraints

σc = 01mm σc = 1mm σc = 10mmMeasurement ri σnablai maxρwi wj

ri σnablai maxρwi wjri σnablai maxρwi wj

y1 0702 1194 0660 0502 1411 0577 0398 1586 0992y2 0582 1311 0326 0533 1369 0412 0501 1413 0470y3 0702 1194 0660 0502 1411 0577 0398 1586 0992y4 0702 1194 0660 0502 1411 0577 0398 1586 0992y5 0582 1311 0326 0533 1369 0412 0501 1413 0470y6 0702 1194 0660 0502 1411 0577 0398 1586 0992y7 0704 1192 0415 0602 1289 0412 0563 1333 0470y8 0704 1192 0415 0602 1289 0412 0563 1333 0470y9 0704 1192 0415 0602 1289 0412 0563 1333 0470y10 0704 1192 0415 0602 1289 0412 0563 1333 0470y11 0583 1309 0326 0583 1309 0385 0583 1309 0433y12 0583 1309 0326 0583 1309 0385 0583 1309 0433y13 0012 0904 0660 0425 1534 0542 0663 12283 0501y14 0012 0904 0660 0425 1534 0542 0663 12283 0501y15 0019 0718 063 05 1414 0542 0665 12268 0501

The probabilities of correct identification (PCI) and correct detection PCD for the measurements(Cluster 1 to Cluster 4) subject to the scenarios of three soft constraints (heights A D and G) aredisplayed in Figure 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

28 of 40

(c) (d)

(e) (f)

(g) (h)

(a) (b)

Figure 10 Probability of correct identification (PCI) and Probability of correct detection (PCD) for themeasurements subject to the scenarios of three soft constraints for αprime = 0001 Cluster 1(ab) Cluster2(cd) Cluster 3(ef) and Cluster 4(gh)

The probabilities of correct identification (PCI) and correct detection (PCD) in Figure 8 werecomputed for the clusters based on Table 14 as follows

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

29 of 40

bull Cluster 1 y1 y3 y4 and y6bull Cluster 2 y2 and y5bull Cluster 3 y7 y8 y9 and y10bull Cluster 4 y11 and y12bull Cluster 5 y13 and y14bull Cluster 6 y15

Figure 11 shows the probabilities of correct identification PCI and correct detection PCD for thethree soft constraints ie for Cluster 5 (heights A and D) and Cluster 6 (height G) in Figure 3(e)

(c) (d)

(a) (b)

Figure 11 Probability of correct identification (PCI) and Probability of correct detection (PCD) for thethree constraints and for αprime = 0001 Cluster 5(ab) and Cluster 6(cd)

The probability of wrong exclusion PWE for the measurements (Cluster 1 to Cluster 4) subject tothe scenarios of three soft constraints (heights A D and G) are displayed in Figure 12 Figure 13 givesthe the probability of wrong exclusion PWE for that three constraints (ie heights A D and G)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

30 of 40

(c) (d)

(a) (b)

Figure 12 Probability of wrong exclusion PWE for the measurements subject to the scenarios of threesoft constraints and for αprime = 0001 Cluster 1(a) Cluster 2(b) Cluster 3(c) and Cluster 4(d)

(a) (b)

Figure 13 Probability of wrong exclusion (PWE) for the three constraints and for αprime = 0001 Cluster5(a) and Cluster 6(b)

The over-identification cases (Pover+ and Poverminus) and the statistical overlap (Pol) were alsopractically null for that case of three soft constraints

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

31 of 40

The sensitivity indicators (MDB and MIB) for each scenario of three soft constraints are displayedin Table 15

Table 15 MDB and MIB for the case of three soft constraints based on αprime = 0001 and PCD = PCI = 08

σc = 10mm σc = 1mm σc = 01mmCluster MDB (σ) MIB (σ) MDB (σ) MIB (σ) MDB (σ) MIB (σ)

1 75 22 68 69 58 592 68 69 66 67 64 643 64 64 63 63 58 584 63 63 63 63 63 635 59 60 74 75 435 456 59 59 69 69 346 355

5 Discussion

We start from the scenario of one hard constraint in Figure 3(a) Table 4 reveal that the maximumcorrelation between w-test statistics for the measurements constituting Cluster 1 is exactly equal to100 (ie maxρwi wj

= 100) This means that the measurements belonging to Cluster 1 are those thatare connected with unknown heights whose connections are limited to only two The both unknownheights A and D are tied only to two measurements (ie y1 and y6 linked to A and y3 and y4 linked toD) therefore if an outlier occurred in one of these measurements we would only be able to analyse theconsistency between them but we would not be able to distinguish which of them was contaminatedby an outlier This means that we would only be able to detect because the w-test statistics could belarger than a critical value k however in that case the values of w-test statistics would be the same andtherefore we would not have only one unique maximum w-test statistics but actually would have fourmaximum w-test statistics In other words the equation systems associated with the measurements ofCluster 1 are linearly dependent [87] Therefore there is no reliability in terms of outlier identificationfor the Cluster 1 as can be seen in Figure 3(a)

From Figure 3(b) we can note that there is reliability in terms of outlier detection for Cluster 1and it is caused by overlapping w-test statistics The probability of statistics overlap Pol for Cluster 1in this scenario of minimally constrained network is displayed in Figure 14

Figure 14 Probability of correct detection PCD and statistical overlap (Pol) for the Cluster 1 subject toone hard constraint and for αprime = 0001

The problem of not having more connections (ie more measurements) for the unknown heightsA and D in the case of one hard constraint with G fixed is overcome when these heights (A and D) aretaken as hard constraints in 3(b) or when the heights A D and G are hard constraints in figure 3(c)Figure 3(a b) show that the measurements of Cluster 1 are now able to identify an outlier when twohard constraints (A and D fixed) are in play It is also verified to the case of three hard constraints (A Dand G fixes) in Figure 3(e f) ie there is reliability in terms of both outlier detection and identificationfor these measurements in those conditions

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

32 of 40

One can argue that the more constraints the larger the probability of correct identification (PCI)and correct detection (PCD) This claim is not entirely correct This may be true in case where all pointson the network have the same number of connections However in the example given if both heightsB and G were selected as hard constraints the measurements of the Cluster 1 (ie y1 y6 y3 and y4)would still have their least-squares residuals fully correlated (ie ρwy1 wy6

= 100 and ρwy3 wy4= 100)

and therefore there would be no reliability in terms of outlier identification for these measurementsFrom Table 5 we observe different behaviour for the clusters as follows

bull Cluster 1 there was no MIB for the case of having only one single hard constraint whereasMDB = MIB for the other cases However both MDB and MIB decrease significantly with theincrease in the number of hard constraints

bull Cluster 2 MDB slightly smaller than MIB Both MDB and MIB were practically the same for thecase of having two or three hard constraints

bull Cluster 3 MDB = MIB for all cases of hard constraints however both MDB and MIB decreasesignificantly with the increase in the number of hard constraints

bull Cluster 4 MDB e MIB were equal for all cases

In terms of outlier detection and identification therefore Cluster 1 was more sensitive toconstraints Cluster 3 relatively sensitive to constraints whereas the Cluster 4 completely insensitiveto constraints and Cluster 2 relatively insensitive to constraints This is also can be seen in Figure 4The reason for this is that the local redundancy (ri) of the Cluster 1 increased with the increase of thenumber of hard constraints whereas the Cluster 4 have remained the same as can be seen in Table 4

Leaving aside the cases of statistical overlap (Pol) the network presents low least-squares residualscorrelation (ρwi wj lt 05) and high local redundancy (ri gt 05) Because of this the probabilities ofwrong exclusion (PWE) were less than 1 as can be seen in Figure 5 The over-identification cases(Pover+ and Poverminus) were practically null Consequently PCI asymp PCD Due of this fact the family-wiseerror rate (αprime) should be increased in order to have more success rate in the outlier detection andidentification [45]

From Figure 15 we observe that increasing the αprime increases the both probabilities of correctidentification (PCI) and correct detection (PCD) for outlier magnitude from 5σ to 6σ in the case ofthree hard constraints and from 5σ to 68σ in the case of two hard constraints Although the rates ofPover+ and PWE also increase they are not significant when compared to the improvement of correctidentification (PCI) and detection (PCD) This same analysis can be done for the other clusters

In terms of soft constraints for the cases of two constraints in Figure 3(d) we observe from Table 9that the larger the relaxation of the constraint (ie the larger the standard-deviation of the constraintσc) the larger the residuals correlation (ρwi wj ) and the standard-deviation of the outlier σnablai and thesmaller the local redundancy (ri) Consequently the probabilities of correct identification (PCI) anddetection (PCD) get smaller and smaller with the relaxation of the constraints whereas the probabilityof wrong exclusion gets larger (PWE) This can be more clearly verified in Figure 6(a b) and Figure8(a) for Cluster 1 whose measurements are connected with the constraints A and D (ie y13 and y14 inTable 9 respectively)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

33 of 40

(a) (b)

(c) (d)

Figure 15 Probabilities of correct identification PCI (a) correct detection PCD (b) over-identificationpositive Pover+ (c) and wrong exclusion PWE for the Cluster 1 subject to two and three hard constraintsand for αprime = 0001 and αprime = 01

Note from Figure 8(a) that the probability of wrong exclusion (PWE) increases as the magnitude ofthe outlier (nablai) increases However this is true up to a certain limit of outlier magnitude The effect ofresiduals correlation ρwi wj on the rates of wrong exclusion (PWE) and correct identification (PCI) tendsto decrease with the increase in the magnitude of the outlier nablai This effect is more clearly verified forCluster 1 in case where the precision of the constraints are ten times worse than the measurementsσc = 10σ = 10mm

Note from Figure 6 that identifying an outlier in the Cluster 1 (ie y1 y3 y4 and y6) whenσc = 10mm is more difficult than the other clusters This is due to the fact the Cluster 1 has a higherresiduals correlation ρwi wj = 0994 than other clusters In general therefore we observe that the largerthe relaxation of the constraints the larger is the effect of the correlation ρwi wj on the success rate ofoutlier identification (PCI) Consequently the higher the sensitivity indicator for outlier identification(MIB) Table 10 reveals that the ratio between MIB and MDB for the Cluster 1 and for the scenariowhere the standard-deviations of that two soft constraints are σc = 10mm is MIBMDB=2575=33On the other hand the relationship between MIB and MDB is practically one (ie MIBMDB=10) forthe others scenarios

If the FWE rate (αprime) were increased for the case where the two soft constraints of σc = 10mmare in play we would not have great advantages for Cluster 1 due to its high residuals correlation(ρwi wj = 994) From Figure 16 we can observe that the probability of correct identification (PCI)for outlier magnitudes from 5σ to 8σ is effectively larger to a user-defined αprime = 01 than αprime = 0001However the success rate is still less than 80 ie PCI lt 08 Note for example the correct

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

34 of 40

identification rate is PCI = 56 for an outlier magnitude of nablai = 8σ and αprime = 01 For αprime = 01 theMIB = 335σ = 335mm whereas for αprime = 0001 is MIB = 25σ = 25mm Therefore in that case theMIB for PCI = 08(80) and αprime = 01 would be 34 larger than user-defined αprime = 0001

Figure 16 Probabilities of correct identification (PCI) for the Cluster 1 subject to two soft constraints (2sc) A and D for αprime = 0001 and αprime = 01

The soft constraints A and D were grouped in the Cluster 5 (ie A and D were treated aspseudo-observations in the model) There is no reliability in terms of outlier identification for theconstraints because the residual correlation between them is ρwi wj = 100 as can be seen in Table 9for y13 and y14 However these soft constraints are able to detect an outlier In that case the probabilityof correct detection PCD in Figure 7 is mainly caused by the statistical overlap Pol as can be seen in theexample of the case of σc = 10mm in Figure 17 From Table 10 we observe that the larger the relaxationof the constraints the larger the MDB Note that the values of MDB are given in σ and therefore theMDB for σc = 10mm is larger than σc = 1mm and σc = 01mm ie we had the following inequalityMDB = 68σc = 68times 10mm = 68mm gt MDB = 88σc = 88times 1mm = 88mm gt MDB = 57σc =

57times 01mm = 57mm In that case if the FWE (αprime) were increased the rate of outlier detection by theCluster 4 (ie by the soft constraints) would increase

Figure 17 Probabilities of correct detection PCD and statistical overlap Pol for the two soft constraintsA and D (Cluster 5) with σc = 10mm and for αprime = 0001

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

35 of 40

Similar effects of the relaxation of the constraints on the performance of the IDS in case of two softconstraints are verified in case of three soft constraints as can be seen in the Figures 10 11 12 and 13

In case of having three soft constraints in Figure 3(e) there is reliability in terms of outlieridentification for the three pseudo-observations y13 y14 and y15 (ie for AD and G) as can be seen inFigure 10 and Table 15 In that case we also observe that the probabilities of correct detection PCD ofthe soft constraints A and D (ie Cluster 5) were approximately 13 for σc = 10mm 16 for σc = 1mmand 24 for σc = 01mm larger than the scenario of the network subject to two soft constraints Table15 reveals that the advantage of having three soft constraints instead of two constraints is that theconstraints become identifiable in the presence of an outlier The behaviour of the probabilities ofcorrect detection (PCD) correct identification (PCI) and wrong exclusion (PWE) was similar to the caseof the two soft constraints

Furthermore the larger the relaxation of the constraints the smaller the residuals correlationbetween the measurements and the soft constraints and the larger the residuals correlation amongthe measurements This effect can be verified in Tables 6 7 and 8 for two soft constraints as well as inTables 11 12 and 13 for three soft constraints

In general we observe that the case of two soft constraints for σc = 01mm was comparablewith two hard constraints (see eg Table 5 and Table 10) in terms of the probability levels associatedwith IDS for the measurements (ie clusters 1 2 3 and 4) In the same way for the case of twosoft constraints with σc = 1mm or σc = 10mm the probabilities levels were similar to the one hardconstraint for that measurements with the benefit of two soft constraints having reliability in terms ofoutlier identification for the Cluster 1 Finally the three soft constraints with σc = 1mm and σc = 10mmwere comparable to the two soft constraints for that scenario of constraints relaxation wheres thethree soft constraints for σc = 01mm showed similar outcomes with three hard constraints for themeasurements (see eg Table 5 and Table 15) In that case however an advantage of the three softconstraints on the three hard constraints is the possibility of analysing the sensitivity of the constraintsIt is important to emphasised that the stochastic models of the measurements and constraints wereassumed well-known and defined for the analyzes performed here

6 Conclusions

In general the probability of correct identification (PCI) for the case of hard constraints is largerthan soft constraints It can be verified in Figure 4 Figure 6 and Figure 10 Therefore hard constraintsshould be used in the stage of pre-processing data for the purpose of identifying and removing possibleoutlying measurements In that process one should opt to set out the redundant hard constraints atpoints in the network where the smallest connections exist After identifying and removing possibleoutliers the soft constraints should be employed to propagate the uncertainties of the constraints(pseudobservations) to the model parameters during the process of least-squares estimation Thisrecommendation is valid for outlier detection and identification purpose

We highlight the main findings of this research as follows

bull Under a system of a high local redundancy ri gt 05 and low residuals correlation (ρwi wj lt 05) ifone increase the family-wise error rate (FWE) of the test statistic the performance of the procedurewill be improved for both scenarios of hard constraints and soft constraintsbull The larger the relaxation of the constraints the larger is the effect of the residuals correlation

(ρwi wj ) on the success rate of outlier identification (PCI) Consequently the higher the sensitivityindicator for outlier identification (MIB) and therefore more difficult it becomes to identify anoutlier

bull Under a scenario of soft constraints one should set out at least three soft constraints in order toidentify an outlier in the constraints

In other types of analysis for example deformation analysis of geodetic networks one shouldformulated the constraints as pseudo-observations with σc gt 0 in an adjustment model of system

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

36 of 40

equations In other words the relaxation of constraints makes it possible to estimate deformationeffects that are unmodelled in the deterministic model matrix for example deformations which have aspatial and temporal variations In that case the deformation can be modelled by covariance functionsthat determine the cofactors between constraints [28] This should be applied to control points locatedin the structure but not to control points located outside the structure The control points locatedoutside the structure should be hard constraints and have their stability verified by an independentprocess in order to analyse the deformation of the structure So one will be also able to analyse thesensitivity in terms of minimal detectable deformations andor minimal identifiable deformations [23]Future researches will be addressed to sensitivity analysis for deformation monitoring networks

Author Contributions Conceptualization VFR MTM and IK Data curation VFR MTM and IK Formalanalysis VFR and IK Funding acquisition MTM MRV and LGdSJ Investigation VFR MethodologyVFR and MTM Project administration MTM IK MRV and LGdSJ Software VFR and LGdSJSupervision MTM IK and MRV Validation VFR and MRV Writing ndash original draft VFR Writing ndashreview amp editing VFR MTM IK and MRV All authors have read and agreed to the published version of themanuscript

Funding This research was funded by the CNPqmdashConselho Nacional de Desenvolvimento Cientiacutefico eTecnoloacutegicomdashBrasil (proc no 1035872019-5) This research and the APC were also funded by PETROBRAS(Grant Number 201800545-0)

Acknowledgments In this section you can acknowledge any support given which is not covered by the authorcontribution or funding sections This may include administrative and technical support or donations in kind(eg materials used for experiments)

Conflicts of Interest The authors declare no conflict of interest

References

1 Lehmann R Neitzel F Testing the compatibility of constraints for parameters of a geodetic adjustmentmodel Journal of Geodesy 2013 87 555ndash566 doi101007s00190-013-0627-2

2 Fang X Weighted total least-squares with constraints a universal formula for geodetic symmetricaltransformations Journal of Geodesy 2015 89 459ndash469 doi101007s00190-015-0790-8

3 Amiri-Simkooei AR Weighted Total Least Squares with Singular Covariance Matrices Subject to Weightedand Hard Constraints J Surv Eng 2017 143 04017018

4 Sakic P Ballu V Royer JY A Multi-Observation Least-Squares Inversion for GNSS-Acoustic SeafloorPositioning Remote Sensing 2020 12 doi103390rs12030448

5 Odijk D Stochastic modelling of the ionosphere for fast GPS ambiguity resolution Geodesy Beyond 2000Schwarz KP Ed Springer Berlin Heidelberg Berlin Heidelberg 2000 pp 387ndash392

6 de Oliveira PS Morel L Fund F Legros R Monico JFG Durand S Durand F Modelingtropospheric wet delays with dense and sparse network configurations for PPP-RTK GPS Solutions 201721 237ndash250 doi101007s10291-016-0518-0

7 Chen B Wu L Dai W Luo X Xu Y A new parameterized approach for ionospheric tomography GPSSolutions 2019 23 96 doi101007s10291-019-0893-4

8 Ahmed E Ahmed S Mohamed D Mostafa R jag 2019 Vol 13 chapter Vertical ionospheric delayestimation for single-receiver operation p 81 2 doi101515jag-2018-0041

9 Wang A Chen J Zhang Y Meng L Wang J Performance of Selected Ionospheric Models inMulti-Global Navigation Satellite System Single-Frequency Positioning over China Remote Sensing 201911 doi103390rs11172070

10 Deo M El-Mowafy A Comparison of advanced troposphere models for aiding reductionof PPP convergence time in Australia Journal of Spatial Science 2019 64 381ndash403doi1010801449859620181472046

11 Dai X Lou Y Dai Z Hu C Peng Y Qiao J Shi C Precise Orbit Determination for GNSS ManeuveringSatellite with the Constraint of a Predicted Clock Remote Sensing 2019 11 doi103390rs11161949

12 Courant R Hilbert D Methods of mathematical physics Vol 2 Vol 2 Wiley [New York] 196213 Grafarend EW Optimization of Geodetic Networks The Canadian Surveyor 1974 28 716ndash723

doi101139tcs-1974-0120

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

37 of 40

14 Teunissen P Zero Order Design Generalized Inverses Adjustment the Datum Problem andS-Transformations Optimization and Design of Geodetic Networks Grafarend EW Sansograve F EdsSpringer Berlin Heidelberg Berlin Heidelberg 1985 pp 11ndash55

15 Dermanis A Free network solutions with the DLT method ISPRS Journal of Photogrammetry and RemoteSensing 1994 49 2 ndash 12 doihttpsdoiorg1010160924-2716(94)90061-2

16 Kotsakis C Reference frame stability and nonlinear distortion in minimum-constrained networkadjustment Journal of Geodesy 2012 86 755ndash774 doi101007s00190-012-0555-6

17 Kotsakis C Datum Definition and Minimal Constraints In Encyclopedia of Geodesy Grafarend E EdSpringer International Publishing Cham 2018 pp 1ndash6 doi101007978-3-319-02370-0_157-1

18 Matsuoka MT Rofatto VF Klein I Roberto Veronez M da Silveira LG Neto JBS Alves ACRControl Points Selection Based on Maximum External Reliability for Designing Geodetic Networks AppliedSciences 2020 10 doi103390app10020687

19 Baarda W S-transformations and criterion matrices Publ on geodesy New Series 1973 520 Schaffrin B Aspects of Network Design Optimization and Design of Geodetic Networks Grafarend

EW Sansograve F Eds Springer Berlin Heidelberg Berlin Heidelberg 1985 pp 548ndash59721 Xu PB A general solution in geodetic nonlinear rank-defect models 199722 Altamimi Z Dermanis A The Choice of Reference System in ITRF Formulation VII Hotine-Marussi

Symposium on Mathematical Geodesy Sneeuw N Novaacutek P Crespi M Sansograve F Eds Springer BerlinHeidelberg Berlin Heidelberg 2012 pp 329ndash334

23 Velsink H On the deformation analysis of point fields J Geod 2015 89 1071ndash1087doi101007s00190-015-0835-z

24 Velsink H Extendable linearised adjustment model for deformation analysis Survey Review 201547 397ndash410 doi1011791752270614Y0000000140

25 Hiddo V jag 2016 Vol 10 chapter Time Series Analysis of 3D Coordinates Using NonstochasticObservations p 5 1 doi101515jag-2015-0027

26 Rao CR Markoffrsquos Theorem with Linear Restrictions on Parameters SankhyAuml The Indian Journal ofStatistics (1933-1960) 1945 7 16ndash19

27 Velsink H Testing Methods for Adjustment Models with Constraints Journal of Surveying Engineering2018 144 04018009 doi101061(ASCE)SU1943-54280000260

28 Hiddo V Testing deformation hypotheses by constraints on a time series of geodetic observations J ApplGeod 2017 12 77ndash93 doi101515jag-2017-0028

29 Baarda W A testing procedure for use in geodetic networks Publ on geodesy New Series 1968 230 Teunissen P First and second moments of non-linear least-squares estimators Bull Geodesique (Journal of

Geodesy) 1989 63 253ndash262 doihttpsdoiorg101007BF0252047531 Teunissen P Testing Theory an introduction 2 ed Delft University Press 200632 Amiri-Simkooei A Least-squares variance component estimation theory and GPS applications Phd

thesis Delft University of Technology 200733 Koch KR Parameter estimation and hypothesis testing in linear models 2 ed Springer 199934 Tao H Feng H Xu L Miao M Long H Yue J Li Z Yang G Yang X Fan L Estimation

of Crop Growth Parameters Using UAV-Based Hyperspectral Remote Sensing Data Sensors 2020 20doi103390s20051296

35 Han M Wang Q Wen Y He M He X The Application of Robust Least Squares Method inFrequency Lock Loop Fusion for Global Navigation Satellite System Receivers Sensors 2020 20doi103390s20041224

36 Wei C Estimation for the Discretely Observed CoxacircldquoIngersollacircldquoRoss Model Driven by Small SymmetricalStable Noises Symmetry 2020 12 doi103390sym12030327

37 Sakic P Ballu V Royer JY A Multi-Observation Least-Squares Inversion for GNSS-Acoustic SeafloorPositioning Remote Sensing 2020 12 doi103390rs12030448

38 Livadiotis G General Fitting Methods Based on Lq Norms and their Optimization Stats 2020 3 16ndash31doi103390stats3010002

39 Farooq SZ Yang D Ada ENJ A Cycle Slip Detection Framework for Reliable Single Frequency RTKPositioning Sensors 2020 20 doi103390s20010304

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

38 of 40

40 Araveeporn A Comparing Parameter Estimation of Random Coefficient Autoregressive Model byFrequentist Method Mathematics 2020 8 doi103390math8010062

41 Zhang C Peng T Zhou J Ji J Wang X An Improved Autoencoder and Partial Least SquaresRegression-Based Extreme Learning Machine Model for Pump Turbine Characteristics Applied Sciences2019 9 doi103390app9193987

42 Ji J Yang M Jiang L He J Teng Z Liu Y Song H Output-Only Parameters Identification ofEarthquake-Excited Building Structures with Least Squares and Input Modification Process AppliedSciences 2019 9 doi103390app9040696

43 Kargoll B On the theory and application of model misspecification tests in geodesy Doctoral thesisUniversity of Bonn Landwirtschaftliche Fakultaumlt German Bonn 2007

44 Lehmann R On the formulation of the alternative hypothesis for geodetic outlier detection J Geod 201387 373ndash386 doi101007s00190-012-0607-y

45 Rofatto VF Matsuoka MT Klein I Roberto Veronez M da Silveira LG A Monte Carlo-Based OutlierDiagnosis Method for Sensitivity Analysis Remote Sensing 2020 12 doi103390rs12050860

46 Lehmann R Improved critical values for extreme normalized and studentized residuals in GaussndashMarkovmodels J Geod 2012 86 1137ndash1146 doi101007s00190-012-0569-0

47 Rofatto VF Matsuoka MT Klein I Veronez MR Bonimani ML Lehmann R A half-century ofBaardarsquos concept of reliability a review new perspectives and applications Surv Rev 2018 0 1ndash17[httpsdoiorg1010800039626520181548118] doi1010800039626520181548118

48 Zaminpardaz S Teunissen P DIA-datasnooping and identifiability J Geod 2019 93 85ndash101doi101007s00190-018-1141-3

49 Kok JJ States U On data snooping and multiple outlier testing [microform] Johan J Kok US Dept ofCommerce National Oceanic and Atmospheric Administration National Ocean Service Charting andGeodetic Services For sale by the National Geodetic Information Center NOAA Rockville Md 1984 p61 p

50 Knight NL Wang J Rizos C Generalised measures of reliability for multiple outliers Journal of Geodesy2010 84 625ndash635 doi101007s00190-010-0392-4

51 Gui Q Li X Gong Y Li B Li G A Bayesian unmasking method for locating multiplegross errors based on posterior probabilities of classification variables J Geod 2011 85 191ndash203doi101007s00190-010-0429-8

52 Klein I Matsuoka MT Guzatto MP Nievinski FG An approach to identify multipleoutliers based on sequential likelihood ratio tests Surv Rev 2017 49 449ndash457[httpsdoiorg1010800039626520161212970] doi1010800039626520161212970

53 Hawkins DM Identification of Outliers 1 ed Springer Netherlands 1980 httpsdoiorg101007978-94-015-3994-4

54 Baarda W Statistical concepts in geodesy Publ on geodesy New Series 1967 255 Imparato D Teunissen P Tiberius C Minimal Detectable and Identifiable Biases for quality

control Surv Rev 2019 51 289ndash299 [httpsdoiorg1010800039626520181437947]doi1010800039626520181437947

56 Teunissen PJG Distributional theory for the DIA method Journal of Geodesy 2018 92 59ndash80doi101007s00190-017-1045-7

57 Foumlrstner W Reliability and discernability of extended Gauss-Markov models Seminar on mathematicalmodels of GeodeticPhotogrammetric Point Determination with Regard to Outliers and Systematic Errors 1983 Vol Series A pp 79ndash103

58 Wang J Knight NL New Outlier Separability Test and Its Application in GNSS Positioning 201259 Lehmann R Loumlsler M Multiple Outlier Detection Hypothesis Tests versus Model Selection by

Information Criteria J Surv Eng 2016 142 0401601760 Teunissen PJG Imparato D Tiberius CCJM Does RAIM with Correct Exclusion Produce Unbiased

Positions Sensors 2017 17 doi103390s1707150861 Lehmann R Loumlsler M Congruence analysis of geodetic networks ndash hypothesis tests versus model

selection by information criteria J Appl Geod 2017 11 271ndash283 doi101515jag-2016-0049

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

39 of 40

62 Rofatto V Matsuoka M Klein I DESIGN OF GEODETIC NETWORKS BASED ON OUTLIERIDENTIFICATION CRITERIA AN EXAMPLE APPLIED TO THE LEVELING NETWORK Bull Geod Sci2018 24 152ndash170 httpdxdoiorg101590s1982-21702018000200011

63 Nguyen VK Renault Milocco R Environment Monitoring for Anomaly Detection System UsingSmartphones Sensors 2019 19 doi103390s19183834

64 Nie Y Yang L Shen Y Specific Direction-Based Outlier Detection Approach for GNSS Vector NetworksSensors 2019 19 doi103390s19081836

65 Klein I Matsuoka MT Guzatto MP Nievinski FG Veronez MR Rofatto VF A newrelationship between the quality criteria for geodetic networks Journal of Geodesy 2019 93 529ndash544doi101007s00190-018-1181-8

66 Mierlo JV Statistical Analysis of Geodetic Measurements for the Investigation of Crustal Movements InRecent Crustal Movements 1977 Whitten C Green R Meade B Eds Elsevier 1979 Vol 13 Developmentsin Geotectonics pp 457 ndash 467 doihttpsdoiorg101016B978-0-444-41783-150072-6

67 van der Marel H Roumlsters AJM Statistical Testing and Quality Analysis in 3-D Networks (part II)Application to GPS Global Positioning System An Overview Bock Y Leppard N Eds Springer NewYork New York NY 1990 pp 290ndash297

68 Yang L Wang J Knight NL Shen Y Outlier separability analysis with a multiple alternative hypothesestest J Geod 2013 87 591ndash604 doi101007s00190-013-0629-0

69 Proacuteszynski W Revisiting Baardarsquos concept of minimal detectable bias with regard to outlier identifiabilityJ Geod 2015 89 993ndash1003 doi101007s00190-015-0828-y

70 Romano JP Wolf M Multiple Testing of One-Sided Hypotheses Combining Bonferroni and the BootstrapPredictive Econometrics and Big Data Kreinovich V Sriboonchitta S Chakpitak N Eds SpringerInternational Publishing Cham 2018 pp 78ndash94

71 Teunissen P An Integrity and Quality Control Procedure for use in Multi Sensor Integration 199072 Aydin C Demirel H Computation of Baardarsquos lower bound of the non-centrality parameter J Geod

2004 78 437ndash441 doi101007s00190-004-0406-173 Aydin C Power of Global Test in Deformation Analysis J Surv Eng 2012 138 51ndash56 https

doiorg101061(ASCE)SU1943-5428000006474 Bonferroni C Teoria statistica delle classi e calcolo delle probabilitagrave Pubblicazioni del R Istituto superiore di

scienze economiche e commerciali di Firenze Libreria internazionale Seeber 193675 Box GEP Muller ME A Note on the Generation of Random Normal Deviates The Annals of Mathematical

Statistics 1958 29 610ndash611 httpsdoi101214aoms117770664576 Lemeshko BY Lemeshko SB Extending the Application of Grubbs-Type Tests in Rejecting Anomalous

Measurements Measurement Techniques 2005 48 536ndash547 doi101007s11018-005-0179-977 Lehmann R Scheffler T Monte Carlo based data snooping with application to a geodetic network J

Appl Geod 2011 5 123ndash134 httpsdoiorg101515JAG201101478 Matsumoto M Nishimura T Mersenne twister A 623-dimensionally equidistributed uniform

pseudo-random number generator ACM Transactions on Modeling and Computer Simulation 1998 8 3ndash30httpsdlacmorgcitationcfmid=272991

79 Ihde J Saacutenchez L Barzaghi R Drewes H Foerste C Gruber T Liebsch G Marti U Pail R SiderisM Definition and Proposed Realization of the International Height Reference System (IHRS) Surveys inGeophysics 2017 38 549ndash570 doi101007s10712-017-9409-3

80 Lehmann R Observation error model selection by information criteria vs normality testing StudiaGeophysica et Geodaetica 2015 59 489ndash504 doi101007s11200-015-0725-0

81 Algarni DA Ali AE Heighting and Distance Accuracy with Electronic Digital Levels Journal of King SaudUniversity - Engineering Sciences 1998 10 229 ndash 239 doihttpsdoiorg101016S1018-3639(18)30698-6

82 Gemin ARS Matos AS Faggion PLAs APPLICATION OF CALIBRATION CERTIFICATE OFDIGITAL LEVELING SYSTEMS IN THE MONITORING OF STRUCTURES A CASE STUDY AT THEGOVERNADOR JOSAtilde RICHA HYDROELECTRIC POWER PLANT - PR Boletim de CiAtildeGeodAtilde 201824 235 ndash 249

83 Takalo M Rouhiainen P Development of a System Calibration Comparator for Digital Levels in FinlandNordic Journal of Surveying and Real Estate Research 1 1

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

40 of 40

84 Wiedemann W Wagner A Wunderlich T Using IATS to Read and Analyze Digital Levelling Staffs SIG2016 Paar R Marendic A Zrinjski M Eds Croatian Geodetic Society 2016 pp 515ndash526

85 PJG T The Geometry of Geodetic Inverse Linear Mapping and Non-linear Adjustment Publications onGeodesy New Series 1985 8

86 Rao CR Mitra SK Generalized inverse of a matrix and its applications 197287 Hekimoglu S Erenoglu RC Sanli DU Erdogan B Detecting Configuration Weaknesses in Geodetic

Networks Survey Review 2011 43 713ndash730 doi101179003962611X13117748892632

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

Page 14: Effects of Hard and Soft Equality Constraints on

14 of 40

In this case study we demonstrate the application of the proposed algorithm by [45] subject to adifferent constraints scenarios for that levelling geodetic network

Recent work has been focused on non-stochastic constraints (hard constraints) Here we alsoconsider cases where constraints are subject to a random errors (ie soft constraints) That softconstraints are non-deterministic and therefore they are measured model elements that can bedesignated as a priori information The approach of this example can be applied in the designstage of geodetic network analysis [6265] Furthermore the experiments performed here can also beextended to geodetic deformation analysis when the deformation effects are unmodelled in the senseof deterministic form [28]

41 Problem description

To analyse the constraints effects on the IDS an example is taken from a geodetic levellingnetwork with 12 height differences between the points The equipment used to measure the leveldifference can be an electronic digital level In that case the levelling measurement system comprisesof a special bar-coded staff (also called barcode rod) and a digital level (instrument) A digital level isbasically a telescope that enables a horizontal line of sight Digital levels consist of additional electronicimage processing components to automatically read and analyse digital (bar coded) levelling staffswhere the graduation is replaced by a manufacturer dependent code pattern Generally the result isautomatically stored in the data collector of the digital level An example of a digital level ndash bar-codestaff system is displayed in the Figure 2 For more details about digital level see eg [81ndash84]

Figure 2 Example of a digital level ndash bar-code staff system [45]

The standard-deviation of the uncorrelated measurements were the same and taken equal toσ = 1mm The points are indicated as A to G The eight network configuration are displayed in Figure3(a b c d and e) and their details are given as follows

1 Figure 3(a) Network with 1 hard constraint (ie network minimally constrained) Since thedimension of the network is 1D the minimum information necessary to estimate the unknownheights is one The height of G was fixed as control point (hard constraint) and 6 unknown heights(ABCDEF) were minimally constrained Therefore the redundancy of the system (or overalldegrees of freedom) was r = nminus rank(A) = nminus u = 12minus 6 = 6

2 Figure 3(b) Network with one extra hard constraint (ie two hard constraints) The heights A andD were taken as hard constraints (ie heights A and D were fixed) The redundancy of the system

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

15 of 40

in that case was r = 12minus 5 = 7 with 5 unknown heights (BCEFG) over-constrained

3 Figure 3(c) Network with two extra hard constraints (ie three hard constraints)The heights A Dand G were taken as hard constraints In that case the redundancy of the system was r = 12minus 4 = 8

4 Figure 3(d) Network with two soft constraints (A and D) In that case a standard-deviation largerthan zero was assigned to both constraints (ie σc gt 0 In other words A and D were processedas being both observations and unknown parameters ie A and D were pseudo-observationsHere the both constraints were simultaneously relaxed by considering their uncertainties 10 timesworse than measurements (ie σc = 10times σ = 10mm) 10 times better than measurements (ieσc = 01mm) and their uncertainties equal to the measurements (σc = 1mm) In that case theredundancy of the system was r = 14minus 7 = 7

5 Figure 3(e) Network processed with A D and G as pseudo-observations Those three constraintswere simultaneously relaxed by considering their standard-deviations equal to σc = 10mm (10 timesworse than measurements) σc = 01mm (10 times better than measurements) and σc = 1mm (thesame as the measurements) In that case the redundancy of the system was r = 15minus 7 = 8

y1

y2

y3

y4

y5

y6

y7 y8

y9 y10

B C

D

E F

G y12 y11

A

y1

y2

y3

y4

y5

y6

y7 y8

y9 y10

A

B C

D

E F

G y12 y11

y1

y2

y3

y4

y5

y6

y7 y8

y9 y10

A

B C

D

E F

G y12 y11

(a) (b)

(c) (d)

(e)

Unknown Heights Soft Constraints Hard Constraints

y1

y2

y3

y4

y5

y6

y7 y8

y9 y10

A

B C

D

E F

G y12 y11

y1

y2

y3

y4

y5

y6

y7 y8

y9 y10

A

B C

D

E F

G y12 y11

Figure 3 Levelling Geodetic Network subject to different constraint scenarios

The following system of equations for that problem is given by

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

16 of 40

y1 + e1 = hB minus hA

y2 + e2 = hC minus hB

y3 + e3 = hD minus hC

y7 + e7 = hB minus hG

y8 + e8 = hC minus hG

y11 + e11 = hB minus hF

y12 + e12 = hC minus hE

(49)

The functional model (A) for the system of equations in 49 is given by

A =

minus1 1 0 0 0 0 00 minus1 1 0 0 0 00 0 minus1 1 0 0 00 0 0 minus1 1 0 00 0 0 0 minus1 1 01 0 0 0 0 minus1 00 1 0 0 0 0 minus10 0 1 0 0 0 minus10 0 0 0 1 0 minus10 0 0 0 0 1 minus10 1 0 0 0 minus1 00 0 1 0 minus1 0 0

(50)

Note that the rank defect of the matrix A is uminus rank(A) = 7minus 6 = 1 In that case there is neededto add at least one constraint in order to avoid rank deficiency of the matrix A This is guaranteedwhen we take one height as known For example from the network in Figure 3(a) we have added theheight G as known (ie as a hard constraint) In that case the constraint equation should be addedinto the system in 49 ie

y13 = hG with σy13 = 0 (51)

noticing that because the standard deviation is zero the observation is non-stochastic (hard constraint)and the residual ey13 = 0 This can generate problems in the inversion of the covariance matrix of theobservations Qe for the calculation of the weight matrix W because the weight for that constraintwould be 1

0 = infin In order to avoid that problem we have eliminated the rank deficiency of matrixA by removing the seventh column of matrix A in 50 associated with the height G Now we haveuminus rank(A) = 6minus 6 = 0

The constraint defines the geodetic datum ie the S-system ([85]p41) Another approach tosolving the system of equations in 2 could be based on generalised (pseudo) inverses (see eg [86])

The location of the constraints can be chosen in some circumstances for example during thedesign stage of a geodetic network For the special case of having a minimally constrained systemthe location of the constraint will not influence the w-test statistics and the sensitivity indicators (MIBand MDB) [18] However more constraints than the minimum necessary to have a solution (ie extraconstraints or redundant constraints) can change the least-squares residuals and hence w-test statisticsand the minimal biases

From the network with one extra constraint (2 constraints) in Figure 3(b) for example the bothfirst (height A) and fourth column (height D) of matrix A in 50 were eliminated in the case of having

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

17 of 40

the two heights as hard constraints For the case where these two heights (A and D) were taken as softconstraints however two observation equations were added to Equation 49 ie

y13 + e13 = hA σy13 gt 0

y14 + e14 = hD σy14 gt 0(52)

In the case of soft constraints in Equation 52 two lines were added in matrix A In other wordsA and D were taken as pseudo-observations In that case the rank deficiency was also null (ieuminus rank(A) = 7minus 7 = 0) the redundancy of the system was r = nminus rank(A) = nminus u = 7 and thematrix A was given as follows

A =

minus1 1 0 0 0 0 00 minus1 1 0 0 0 00 0 minus1 1 0 0 00 0 0 minus1 1 0 00 0 0 0 minus1 1 01 0 0 0 0 minus1 00 1 0 0 0 0 minus10 0 1 0 0 0 minus10 0 0 0 1 0 minus10 0 0 0 0 1 minus10 1 0 0 0 minus1 00 0 1 0 minus1 0 01 0 0 0 0 0 00 0 0 1 0 0 0

(53)

For this example of two soft constraints and by considering the both soft constraints withstandard-deviation σc = 10mm the symmetric and positive semi-definite covariance matrix of theobservations (Qe) was given as follows

Qe =

1 0 0 middot middot middot 0 00 1 0 middot middot middot 0 00 0 1 middot middot middot 0 0

0 0 0 middot middot middot 100 00 0 0 middot middot middot 0 100

(54)

The last two rows and columns of the matrix Qe in Equation 54 refer to the variances (σc2 =

(10mm)2 = 100mm2) of the heights constraints A and D respectivelySimilarly matrices A and Qe were constructed for the other cases studied hereAlthough other measurements are able to identify an outlier for the case of having only one single

soft constraint the pseudo-observation (constraint) is not In that case the defect configurationis associated with the additional parameter in the constraint (ie the presence of an outlier inthe constraint) In other words an additional parameter on the soft constraint will not estimableFor example if the height point G was taken as a soft constraint the presence of an outlier inpseudo-observation G would lead to rank deficiency of matrix A ie u minus rank(A) = 8 minus 7 = 1Therefore the case of having only one single soft constraint was not considered here

42 Result of the Hard Constraint Effects on the Iterative Outlier Elimination Procedure

In this section we present the results of the effects of the hard constraint on the IDS The scenariosin Figure 3(a) (network minimally constrained) Figure 3(b) (two hard constraints) and Figure 3(c)(three hard constraints) were considered for the analysis

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

18 of 40

The correlation coefficient between w-test statistics (ρwi wj ) were computed for the three scenariosaccording to Equation(20) Table 1 provides that correlation for that network minimally constrainedTable 2 for that network over-constrained with two hard constraints and Table 3 for that networkover-constrained with three hard constraints

Table 1 Correlation matrix of w-test statistics for the levelling network minimally constrained in Figure3(a)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12

wy1 10 02 01 01 02 10 -02 00 00 02 -04 -01wy2 10 02 02 03 02 05 -05 -02 02 03 -03wy3 10 10 02 01 00 02 -02 00 01 04wy4 10 02 01 00 02 -02 00 01 04wy5 10 02 -02 02 05 -05 03 -03wy6 10 -02 00 00 02 -04 -01wy7 10 -03 -03 -04 -04 -01wy8 10 -04 -03 -01 -04wy9 10 -03 01 04wy10 10 04 01wy11 10 -01wy12 100

Table 2 Correlation matrix of w-test statistics for the levelling network with two hard constraints inFigure 3(b)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12

wy1 10 04 04 -03 -01 04 -03 01 01 01 -04 -01wy2 10 04 -01 01 -01 04 -04 -01 01 03 -03wy3 10 04 -01 -03 -01 03 -01 -01 01 04wy4 10 04 04 01 01 -03 01 01 04wy5 10 04 -01 01 04 -04 03 -03wy6 10 -01 -01 -01 03 -04 -01wy7 10 -04 -03 -04 -04 -01wy8 10 -04 -03 -01 -04wy9 10 -04 01 04wy10 10 04 01wy11 10 -01wy12 10

Table 3 Correlation matrix of w-test statistics for the levelling network with three hard constraints inFigure 3(c)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12

wy1 10 03 01 -01 -01 01 -04 -01 -01 -01 -03 -01wy2 10 03 -01 01 -01 03 -03 -01 01 03 -03wy3 10 01 -01 -01 01 04 01 01 01 03wy4 10 03 01 -01 -01 -04 -01 01 03wy5 10 03 -01 01 03 -03 03 -03wy6 10 01 01 01 04 -03 -01wy7 10 -01 -01 -01 -03 -01wy8 10 -01 -01 -01 -03wy9 10 -01 01 03wy10 10 03 01wy11 10 -01wy12 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

19 of 40

Table 4 gives the local redundancy (ri) the standard-deviation of the LS-estimated outlier σnablai andthe maximum absolute correlation (maxρwi wj

) for each scenario of hard constraint set out of in thisstudy ie Figure 3(ab and c)

Table 4 Local redundancy (ri) standard-deviation of the LS-estimated outlier σnablai and the maximumabsolute correlation (maxρwi wj

) for each scenario of hard constraint

1 hard constraint 2 hard constraints 3 hard constraintsMeasurement ri σnablai maxρwi wj

ri σnablai maxρwi wjri σnablai maxρwi wj

y1 0396 1589 100 0583 1309 036 0708 1188 041y2 0500 1414 047 0583 1309 036 0583 1309 032y3 0396 1589 100 0583 1309 036 0708 1188 041y4 0396 1589 100 0583 1309 036 0708 1188 041y5 0500 1414 047 0583 1309 036 0583 1309 032y6 0396 1589 100 0583 1309 036 0708 1188 041y7 0563 1333 047 0583 1309 036 0708 1188 041y8 0563 1333 047 0583 1309 036 0708 1188 041y9 0563 1333 047 0583 1309 036 0708 1188 041y10 0563 1333 047 0583 1309 036 0708 1188 041y11 0583 1309 043 0583 1309 036 0583 1309 032y12 0583 1309 043 0583 1309 036 0583 1309 032

Here the twelve levelling measurements were clustered into four clusters The clustering wasdefined according to the local redundancy (ri) and the maximum absolute correlation (maxρwi wj

) inTable 4 This is the first time that a clustering technique based on the similarity of local redundancyand the maximum absolute correlation between w-test statistics is applied to a problem of geodeticnetworks Similarly this has been done in [45] The four cluster were defined as follows

bull Cluster 1 y1 y3 y4 and y6bull Cluster 2 y2 and y5bull Cluster 3 y7 y8 y9 and y10bull Cluster 4 y11 and y12

The probability levels associated with IDS were averaged for each of these clusters Here we computethe probability levels of the IDS based on the procedure in Section (3) [45] The critical values werek = 389 k = 393 and k = 393 for 1 hard constraint 2 hard constraints and 3 hard constraintsrespectively These critical values were found for αprime = 0001 according to the procedure described inAppendix The probability of correct identification (PCI) and correct detection (PCD = 1minusPMD)are displayed in Figure 4 for each number of hard constraint (denoted by hc)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

20 of 40

(c) (d)

(e) (f)

(g) (h)

(a) (b)

Figure 4 Probability of correct identification (PCI) and Probability of correct detection (PCD) for thecase of hard constraints and for αprime = 0001 Cluster 1(ab) Cluster 2(cd) Cluster 3(ef) and Cluster4(gh)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

21 of 40

The outlier magnitude were defined from |5σ| to |9σ| The outlier of |5σ| was chosen because itis approximately the lowest MDB0(i) of the network when a single hypothesis testing is in play ascan be seen in Equation (14) or Equation (16) That MDB0(i) of |5σ| was computed for significancelevel of αprime = 0001 and a power of the test γ0 = 08 This strategy reduces the search space for a MIB(Minimal Identifiable Bias) because we will always have the following inequality MIB ge MDB0(i)[4755] Remember that the IDS procedure is an example of multiple hypothesis testing (see 2)

The sensitivity indicators MDB and MIB for IDS were also computed according to Equation (33)and (34) respectively The success rate for outlier detection and outlier identification were taken asbeing PCD = PCI = 08 respectively Table 5 provides the values of MDB and MIB for that case ofhard constraints

Table 5 MDB and MIB for the case of hard constraints based on αprime = 0001 and PCD = PCI = 08

1 hard constraint 2 hard constraints 3 hard constraintsCluster MDB (σ) MIB (σ) MDB (σ) MIB (σ) MDB (σ) MIB (σ)

1 75 - 63 63 57 572 67 68 63 64 63 643 64 64 63 63 58 584 64 64 64 64 64 64

Figure 5 shows the probability of wrong exclusion (PWE) The over-identification cases (Pover+

and Poverminus) were smaller than 0001 (ie they were practically null) There were not statistical overlap(Pol) for clusters 2 3 and 4 We will discuss more about statistical overlap (Pol) later

(c) (d)

(a) (b)

Figure 5 Probability of wrong exclusion (PWE) for the case of hard constraints and for αprime = 0001Cluster 1(a) Cluster 2(b) Cluster 3(c) and Cluster 4(d)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

22 of 40

43 Result of the Soft Constraint Effects on the Iterative Outlier Elimination Procedure

The both configuration in Figure 3(d) and Figure 3(e) were analysed in terms of soft constraintsIn that case the critical values were k = 395 k = 395 and k = 392 for two soft constraints withσc = 01mm σc = 1mm and σc = 10mm respectively In the case of three soft constraints the criticalvalues found were k = 399 k = 399 and k = 396 for σc = 01mm σc = 1mm and σc = 10mmrespectively All these critical values were computed for αprime = 0001

The correlation coefficient between w-test statistics (ρwi wj ) are displayed in Table 6 Table 7and Table 8 for two soft constraints with standard-deviation 10 times larger than measurements(ie σc = 10times σ = 10mm) 10 times better than measurements (ie σc = 01mm) and equal to themeasurements (σc = 1mm) respectively

Table 6 Correlation matrix of w-test statistics for the levelling network with two soft constraints withσc = 10mm in Figure 3(d)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14

wy1 10 02 01 01 02 10 -02 00 00 02 -04 -01 006 -01wy2 10 02 02 03 02 05 -05 -02 02 03 -03 003 00wy3 10 10 02 01 00 02 -02 00 01 04 01 -01wy4 10 02 01 00 02 -02 00 01 04 -01 01wy5 10 02 -02 02 05 -05 03 -03 00 00wy6 10 -02 00 00 02 -04 -01 -01 01wy7 10 -03 -03 -04 -04 -01 00 00wy8 10 -04 -03 -01 -04 00 00wy9 10 -03 01 04 00 00wy10 10 04 01 00 00wy11 10 -01 00 00wy12 10 00 00wy13 10 -10wy14 10

Table 7 Correlation matrix of w-test statistics for the levelling network with two soft constraints withσc = 01mm in Figure 3(d)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14

wy1 10 04 04 -03 -01 04 -03 01 01 01 -04 -01 06 -06wy2 10 04 -01 02 -01 04 -04 -01 01 03 -03 04 -04wy3 10 04 -01 -03 -01 03 -01 -01 01 04 06 -06wy4 10 04 04 01 01 -03 01 01 04 -06 06wy5 10 04 -01 01 04 -04 03 -03 -04 04wy6 10 -01 -01 -01 03 -04 -01 -06 06wy7 10 -04 -03 -04 -04 -01 -02 02wy8 10 -04 -03 -01 -04 02 -02wy9 10 -04 01 04 02 -02wy10 10 04 01 -02 02wy11 10 -01 00 00wy12 10 00 00wy13 10 -10wy14 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

23 of 40

Table 8 Correlation matrix of w-test statistics for the levelling network with two soft constraints withσc = 1mm in Figure 3(d)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14

wy1 10 03 02 -01 01 07 -03 00 01 02 -04 -01 04 -04wy2 10 03 01 03 01 04 -04 -01 01 03 -03 03 -03wy3 10 07 01 -01 00 03 -02 -01 01 04 04 -04wy4 10 03 02 01 02 -03 00 01 04 -04 04wy5 10 03 -01 01 04 -04 03 -03 -03 03wy6 10 -02 -01 00 03 -04 -01 -04 04wy7 10 -03 -03 -04 -04 -01 -01 01wy8 10 -04 -03 -01 -04 01 -01wy9 10 -03 01 04 01 -01wy10 10 04 01 -01 01wy11 10 -01 00 00wy12 10 00 00wy13 10 -10wy14 10

Table 9 gives the local redundancy (ri) the standard-deviation of the LS-estimated outlier σnablai andthe maximum absolute correlation (maxρwi wj

) for the scenarios of two constraints

Table 9 Local redundancy (ri) standard-deviation of the LS-estimated outlier σnablai (mm) and themaximum absolute correlation (maxρwi wj

) for each scenario of two soft constraints

σc = 01mm σc = 1mm σc = 10mmMeasurement ri σnablai maxρwi wj

ri σnablai maxρwi wjri σnablai maxρwi wj

y1 0581 1312 0564 0471 1457 0681 0397 1587 0994y2 0582 1311 0376 0533 1369 0423 0501 1413 0471y3 0581 1312 0564 0471 1457 0681 0397 1587 0994y4 0581 1312 0564 0471 1457 0681 0397 1587 0994y5 0582 1311 0376 0533 1369 0423 0501 1413 0471y6 0581 1312 0564 0471 1457 0681 0397 1587 0994y7 0583 1310 0359 0571 1324 0423 0563 1333 0471y8 0583 1310 0359 0571 1324 0423 0563 1333 0471y9 0583 1310 0359 0571 1324 0423 0563 1333 0471y10 0583 1310 0359 0571 1324 0423 0563 1333 0471y11 0583 1309 0358 0583 1309 0398 0583 1309 0433y12 0583 1309 0358 0583 1309 0398 0583 1309 0433y13 0007 1163 1000 0300 1826 1000 0497 14189 1000y14 0007 1163 1000 0300 1826 1000 0497 14189 1000

From Table 9 five clusters were defined for each case of two soft constraints ie for the casewhere heights A and D were given as soft constraints in Figure 3(d) as follows

bull Cluster 1 y1 y3 y4 and y6bull Cluster 2 y2 and y5bull Cluster 3 y7 y8 y9 and y10bull Cluster 4 y11 and y12bull Cluster 5 y13 and y14

The probabilities of correct identification (PCI) and correct detection (PCD) for the measurements(Cluster 1 to Cluster 4) subject to the scenarios of two soft constraints (heights A and D) are displayedin Figure 6

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

24 of 40

(c) (d)

(e) (f)

(g) (h)

(a) (b)

Figure 6 Probability of correct identification (PCI) and Probability of correct detection (PCD) for themeasurements subject to the scenarios of two soft constraints for αprime = 0001 Cluster 1(ab) Cluster2(cd) Cluster 3(ef) and Cluster 4(gh)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

25 of 40

Note that the Cluster 5 is associated with the two soft constraints (ie y13 and y14) The probabilityof correct identification (PCI) for these both soft constraints were null However the probability ofcorrect detection were not Figure 7 shows the probability of correct detection (PCD) for these two softconstraints (ie heights A and D)

Figure 7 Probability of correct detection PCD for the two soft constraints (Cluster 5 heights A and D)and for αprime = 0001

The probability of wrong exclusion PWE for the measurements (Cluster 1 to Cluster 4) subject tothe scenarios of two soft constraints (heights A and D) are displayed in Figure 8 Figure 9 gives the theprobability of wrong exclusion PWE for that two constraints (ie heights A and D)

(c) (d)

(a) (b)

Figure 8 Probability of wrong exclusion PWE for the measurements subject to the scenarios of two softconstraints for αprime = 0001 Cluster 1(a) Cluster 2(b) Cluster 3(c) and Cluster 4(d)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

26 of 40

Figure 9 Probability of wrong exclusion PWE for the two soft constraints (Cluster 5 heights A and D)and for αprime = 0001

The over-identification cases (Pover+ and Poverminus) and the statistical overlap (Pol) were practicallynull for that case

The sensitivity indicators (MDB and MIB) for each scenario of two soft constraints are displayedin Table 10

Table 10 MDB and MIB for the case of two soft constraints based on αprime = 0001 and PCD = PCI = 08

σc = 10mm σc = 1mm σc = 01mmCluster MDB (σ) MIB (σ) MDB (σ) MIB (σ) MDB (σ) MIB (σ)

1 75 25 7 71 63 632 68 68 66 66 63 633 64 64 64 64 63 634 63 63 63 63 63 635 68 - 88 - 57 -

As with the case of two soft constraints Tables 1112 and 13 show the correlations (ρwi wj ) for thecase where there were three soft constraints ie for the network configuration in Figure 3(e)

Table 11 Correlation matrix of w-test statistics for the levelling network with three soft constraintswith σc = 10mm in Figure 3(e)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14 wy15

wy1 10 02 01 01 02 10 -02 00 00 02 -04 -01 01 00 00wy2 10 02 02 03 02 05 -05 -02 02 03 -03 00 00 00wy3 10 10 02 01 00 02 -02 00 01 04 00 -01 00wy4 10 02 01 00 02 -02 00 01 04 00 01 00wy5 10 02 -02 02 05 -05 03 -03 00 00 00wy6 10 -02 00 00 02 -04 -01 -01 00 00wy7 10 -03 -03 -04 -04 -01 00 00 00wy8 10 -04 -03 -01 -04 00 00 00wy9 10 -03 01 04 00 00 00wy10 10 04 01 00 00 00wy11 10 -01 00 00 00wy12 10 00 00 00wy13 10 -05 -05wy14 10 -05wy15 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

27 of 40

Table 12 Correlation matrix of w-test statistics for the levelling network with three soft constraintswith σc = 01mm in Figure 3(e)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14 wy15

wy1 10 03 01 -01 -01 01 -04 -01 -01 -01 -03 -01 07 -01 -04wy2 10 03 -01 01 -01 03 -03 -01 01 03 -03 03 -03 00wy3 10 01 -01 -01 01 04 01 01 01 03 01 -07 04wy4 10 03 01 -01 -01 -04 -01 01 03 -01 07 -04wy5 10 03 -01 01 03 -03 03 -03 -03 03 00wy6 10 01 01 01 04 -03 -01 -07 01 04wy7 10 -01 -01 -01 -03 -01 -04 -01 04wy8 10 -01 -01 -01 -03 -01 -04 04wy9 10 -01 01 03 -01 -04 04wy10 10 03 01 -04 -01 04wy11 10 -01 00 00 00wy12 10 00 00 00wy13 10 -02 -06wy14 10 -06wy15 10

Table 13 Correlation matrix of w-test statistics for the levelling network with three soft constraintswith σc = 1mm in Figure 3(e)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14 wy15

wy1 10 03 01 00 01 06 -03 00 00 01 -04 -01 05 -02 -02wy2 10 03 01 03 01 04 -04 -01 01 03 -03 02 -02 00wy3 10 06 01 00 00 03 -01 00 01 04 02 -05 02wy4 10 03 01 00 01 -03 00 01 04 -02 05 -02wy5 10 03 -01 01 04 -04 03 -03 -02 02 00wy6 10 -01 00 00 03 -04 -01 -05 02 02wy7 10 -03 -02 -03 -04 -01 -02 00 02wy8 10 -03 -02 -01 -04 00 -02 02wy9 10 -03 01 04 00 -02 02wy10 10 04 01 -02 00 02wy11 10 -01 00 00 00wy12 10 00 00 00wy13 10 -04 -05wy14 10 -05wy15 10

Table 14 Local redundancy (ri) standard-deviation of the LS-estimated outlier σnablai (mm) and themaximum absolute correlation (maxρwi wj

) for each scenario of three soft constraints

σc = 01mm σc = 1mm σc = 10mmMeasurement ri σnablai maxρwi wj

ri σnablai maxρwi wjri σnablai maxρwi wj

y1 0702 1194 0660 0502 1411 0577 0398 1586 0992y2 0582 1311 0326 0533 1369 0412 0501 1413 0470y3 0702 1194 0660 0502 1411 0577 0398 1586 0992y4 0702 1194 0660 0502 1411 0577 0398 1586 0992y5 0582 1311 0326 0533 1369 0412 0501 1413 0470y6 0702 1194 0660 0502 1411 0577 0398 1586 0992y7 0704 1192 0415 0602 1289 0412 0563 1333 0470y8 0704 1192 0415 0602 1289 0412 0563 1333 0470y9 0704 1192 0415 0602 1289 0412 0563 1333 0470y10 0704 1192 0415 0602 1289 0412 0563 1333 0470y11 0583 1309 0326 0583 1309 0385 0583 1309 0433y12 0583 1309 0326 0583 1309 0385 0583 1309 0433y13 0012 0904 0660 0425 1534 0542 0663 12283 0501y14 0012 0904 0660 0425 1534 0542 0663 12283 0501y15 0019 0718 063 05 1414 0542 0665 12268 0501

The probabilities of correct identification (PCI) and correct detection PCD for the measurements(Cluster 1 to Cluster 4) subject to the scenarios of three soft constraints (heights A D and G) aredisplayed in Figure 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

28 of 40

(c) (d)

(e) (f)

(g) (h)

(a) (b)

Figure 10 Probability of correct identification (PCI) and Probability of correct detection (PCD) for themeasurements subject to the scenarios of three soft constraints for αprime = 0001 Cluster 1(ab) Cluster2(cd) Cluster 3(ef) and Cluster 4(gh)

The probabilities of correct identification (PCI) and correct detection (PCD) in Figure 8 werecomputed for the clusters based on Table 14 as follows

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

29 of 40

bull Cluster 1 y1 y3 y4 and y6bull Cluster 2 y2 and y5bull Cluster 3 y7 y8 y9 and y10bull Cluster 4 y11 and y12bull Cluster 5 y13 and y14bull Cluster 6 y15

Figure 11 shows the probabilities of correct identification PCI and correct detection PCD for thethree soft constraints ie for Cluster 5 (heights A and D) and Cluster 6 (height G) in Figure 3(e)

(c) (d)

(a) (b)

Figure 11 Probability of correct identification (PCI) and Probability of correct detection (PCD) for thethree constraints and for αprime = 0001 Cluster 5(ab) and Cluster 6(cd)

The probability of wrong exclusion PWE for the measurements (Cluster 1 to Cluster 4) subject tothe scenarios of three soft constraints (heights A D and G) are displayed in Figure 12 Figure 13 givesthe the probability of wrong exclusion PWE for that three constraints (ie heights A D and G)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

30 of 40

(c) (d)

(a) (b)

Figure 12 Probability of wrong exclusion PWE for the measurements subject to the scenarios of threesoft constraints and for αprime = 0001 Cluster 1(a) Cluster 2(b) Cluster 3(c) and Cluster 4(d)

(a) (b)

Figure 13 Probability of wrong exclusion (PWE) for the three constraints and for αprime = 0001 Cluster5(a) and Cluster 6(b)

The over-identification cases (Pover+ and Poverminus) and the statistical overlap (Pol) were alsopractically null for that case of three soft constraints

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

31 of 40

The sensitivity indicators (MDB and MIB) for each scenario of three soft constraints are displayedin Table 15

Table 15 MDB and MIB for the case of three soft constraints based on αprime = 0001 and PCD = PCI = 08

σc = 10mm σc = 1mm σc = 01mmCluster MDB (σ) MIB (σ) MDB (σ) MIB (σ) MDB (σ) MIB (σ)

1 75 22 68 69 58 592 68 69 66 67 64 643 64 64 63 63 58 584 63 63 63 63 63 635 59 60 74 75 435 456 59 59 69 69 346 355

5 Discussion

We start from the scenario of one hard constraint in Figure 3(a) Table 4 reveal that the maximumcorrelation between w-test statistics for the measurements constituting Cluster 1 is exactly equal to100 (ie maxρwi wj

= 100) This means that the measurements belonging to Cluster 1 are those thatare connected with unknown heights whose connections are limited to only two The both unknownheights A and D are tied only to two measurements (ie y1 and y6 linked to A and y3 and y4 linked toD) therefore if an outlier occurred in one of these measurements we would only be able to analyse theconsistency between them but we would not be able to distinguish which of them was contaminatedby an outlier This means that we would only be able to detect because the w-test statistics could belarger than a critical value k however in that case the values of w-test statistics would be the same andtherefore we would not have only one unique maximum w-test statistics but actually would have fourmaximum w-test statistics In other words the equation systems associated with the measurements ofCluster 1 are linearly dependent [87] Therefore there is no reliability in terms of outlier identificationfor the Cluster 1 as can be seen in Figure 3(a)

From Figure 3(b) we can note that there is reliability in terms of outlier detection for Cluster 1and it is caused by overlapping w-test statistics The probability of statistics overlap Pol for Cluster 1in this scenario of minimally constrained network is displayed in Figure 14

Figure 14 Probability of correct detection PCD and statistical overlap (Pol) for the Cluster 1 subject toone hard constraint and for αprime = 0001

The problem of not having more connections (ie more measurements) for the unknown heightsA and D in the case of one hard constraint with G fixed is overcome when these heights (A and D) aretaken as hard constraints in 3(b) or when the heights A D and G are hard constraints in figure 3(c)Figure 3(a b) show that the measurements of Cluster 1 are now able to identify an outlier when twohard constraints (A and D fixed) are in play It is also verified to the case of three hard constraints (A Dand G fixes) in Figure 3(e f) ie there is reliability in terms of both outlier detection and identificationfor these measurements in those conditions

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

32 of 40

One can argue that the more constraints the larger the probability of correct identification (PCI)and correct detection (PCD) This claim is not entirely correct This may be true in case where all pointson the network have the same number of connections However in the example given if both heightsB and G were selected as hard constraints the measurements of the Cluster 1 (ie y1 y6 y3 and y4)would still have their least-squares residuals fully correlated (ie ρwy1 wy6

= 100 and ρwy3 wy4= 100)

and therefore there would be no reliability in terms of outlier identification for these measurementsFrom Table 5 we observe different behaviour for the clusters as follows

bull Cluster 1 there was no MIB for the case of having only one single hard constraint whereasMDB = MIB for the other cases However both MDB and MIB decrease significantly with theincrease in the number of hard constraints

bull Cluster 2 MDB slightly smaller than MIB Both MDB and MIB were practically the same for thecase of having two or three hard constraints

bull Cluster 3 MDB = MIB for all cases of hard constraints however both MDB and MIB decreasesignificantly with the increase in the number of hard constraints

bull Cluster 4 MDB e MIB were equal for all cases

In terms of outlier detection and identification therefore Cluster 1 was more sensitive toconstraints Cluster 3 relatively sensitive to constraints whereas the Cluster 4 completely insensitiveto constraints and Cluster 2 relatively insensitive to constraints This is also can be seen in Figure 4The reason for this is that the local redundancy (ri) of the Cluster 1 increased with the increase of thenumber of hard constraints whereas the Cluster 4 have remained the same as can be seen in Table 4

Leaving aside the cases of statistical overlap (Pol) the network presents low least-squares residualscorrelation (ρwi wj lt 05) and high local redundancy (ri gt 05) Because of this the probabilities ofwrong exclusion (PWE) were less than 1 as can be seen in Figure 5 The over-identification cases(Pover+ and Poverminus) were practically null Consequently PCI asymp PCD Due of this fact the family-wiseerror rate (αprime) should be increased in order to have more success rate in the outlier detection andidentification [45]

From Figure 15 we observe that increasing the αprime increases the both probabilities of correctidentification (PCI) and correct detection (PCD) for outlier magnitude from 5σ to 6σ in the case ofthree hard constraints and from 5σ to 68σ in the case of two hard constraints Although the rates ofPover+ and PWE also increase they are not significant when compared to the improvement of correctidentification (PCI) and detection (PCD) This same analysis can be done for the other clusters

In terms of soft constraints for the cases of two constraints in Figure 3(d) we observe from Table 9that the larger the relaxation of the constraint (ie the larger the standard-deviation of the constraintσc) the larger the residuals correlation (ρwi wj ) and the standard-deviation of the outlier σnablai and thesmaller the local redundancy (ri) Consequently the probabilities of correct identification (PCI) anddetection (PCD) get smaller and smaller with the relaxation of the constraints whereas the probabilityof wrong exclusion gets larger (PWE) This can be more clearly verified in Figure 6(a b) and Figure8(a) for Cluster 1 whose measurements are connected with the constraints A and D (ie y13 and y14 inTable 9 respectively)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

33 of 40

(a) (b)

(c) (d)

Figure 15 Probabilities of correct identification PCI (a) correct detection PCD (b) over-identificationpositive Pover+ (c) and wrong exclusion PWE for the Cluster 1 subject to two and three hard constraintsand for αprime = 0001 and αprime = 01

Note from Figure 8(a) that the probability of wrong exclusion (PWE) increases as the magnitude ofthe outlier (nablai) increases However this is true up to a certain limit of outlier magnitude The effect ofresiduals correlation ρwi wj on the rates of wrong exclusion (PWE) and correct identification (PCI) tendsto decrease with the increase in the magnitude of the outlier nablai This effect is more clearly verified forCluster 1 in case where the precision of the constraints are ten times worse than the measurementsσc = 10σ = 10mm

Note from Figure 6 that identifying an outlier in the Cluster 1 (ie y1 y3 y4 and y6) whenσc = 10mm is more difficult than the other clusters This is due to the fact the Cluster 1 has a higherresiduals correlation ρwi wj = 0994 than other clusters In general therefore we observe that the largerthe relaxation of the constraints the larger is the effect of the correlation ρwi wj on the success rate ofoutlier identification (PCI) Consequently the higher the sensitivity indicator for outlier identification(MIB) Table 10 reveals that the ratio between MIB and MDB for the Cluster 1 and for the scenariowhere the standard-deviations of that two soft constraints are σc = 10mm is MIBMDB=2575=33On the other hand the relationship between MIB and MDB is practically one (ie MIBMDB=10) forthe others scenarios

If the FWE rate (αprime) were increased for the case where the two soft constraints of σc = 10mmare in play we would not have great advantages for Cluster 1 due to its high residuals correlation(ρwi wj = 994) From Figure 16 we can observe that the probability of correct identification (PCI)for outlier magnitudes from 5σ to 8σ is effectively larger to a user-defined αprime = 01 than αprime = 0001However the success rate is still less than 80 ie PCI lt 08 Note for example the correct

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

34 of 40

identification rate is PCI = 56 for an outlier magnitude of nablai = 8σ and αprime = 01 For αprime = 01 theMIB = 335σ = 335mm whereas for αprime = 0001 is MIB = 25σ = 25mm Therefore in that case theMIB for PCI = 08(80) and αprime = 01 would be 34 larger than user-defined αprime = 0001

Figure 16 Probabilities of correct identification (PCI) for the Cluster 1 subject to two soft constraints (2sc) A and D for αprime = 0001 and αprime = 01

The soft constraints A and D were grouped in the Cluster 5 (ie A and D were treated aspseudo-observations in the model) There is no reliability in terms of outlier identification for theconstraints because the residual correlation between them is ρwi wj = 100 as can be seen in Table 9for y13 and y14 However these soft constraints are able to detect an outlier In that case the probabilityof correct detection PCD in Figure 7 is mainly caused by the statistical overlap Pol as can be seen in theexample of the case of σc = 10mm in Figure 17 From Table 10 we observe that the larger the relaxationof the constraints the larger the MDB Note that the values of MDB are given in σ and therefore theMDB for σc = 10mm is larger than σc = 1mm and σc = 01mm ie we had the following inequalityMDB = 68σc = 68times 10mm = 68mm gt MDB = 88σc = 88times 1mm = 88mm gt MDB = 57σc =

57times 01mm = 57mm In that case if the FWE (αprime) were increased the rate of outlier detection by theCluster 4 (ie by the soft constraints) would increase

Figure 17 Probabilities of correct detection PCD and statistical overlap Pol for the two soft constraintsA and D (Cluster 5) with σc = 10mm and for αprime = 0001

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

35 of 40

Similar effects of the relaxation of the constraints on the performance of the IDS in case of two softconstraints are verified in case of three soft constraints as can be seen in the Figures 10 11 12 and 13

In case of having three soft constraints in Figure 3(e) there is reliability in terms of outlieridentification for the three pseudo-observations y13 y14 and y15 (ie for AD and G) as can be seen inFigure 10 and Table 15 In that case we also observe that the probabilities of correct detection PCD ofthe soft constraints A and D (ie Cluster 5) were approximately 13 for σc = 10mm 16 for σc = 1mmand 24 for σc = 01mm larger than the scenario of the network subject to two soft constraints Table15 reveals that the advantage of having three soft constraints instead of two constraints is that theconstraints become identifiable in the presence of an outlier The behaviour of the probabilities ofcorrect detection (PCD) correct identification (PCI) and wrong exclusion (PWE) was similar to the caseof the two soft constraints

Furthermore the larger the relaxation of the constraints the smaller the residuals correlationbetween the measurements and the soft constraints and the larger the residuals correlation amongthe measurements This effect can be verified in Tables 6 7 and 8 for two soft constraints as well as inTables 11 12 and 13 for three soft constraints

In general we observe that the case of two soft constraints for σc = 01mm was comparablewith two hard constraints (see eg Table 5 and Table 10) in terms of the probability levels associatedwith IDS for the measurements (ie clusters 1 2 3 and 4) In the same way for the case of twosoft constraints with σc = 1mm or σc = 10mm the probabilities levels were similar to the one hardconstraint for that measurements with the benefit of two soft constraints having reliability in terms ofoutlier identification for the Cluster 1 Finally the three soft constraints with σc = 1mm and σc = 10mmwere comparable to the two soft constraints for that scenario of constraints relaxation wheres thethree soft constraints for σc = 01mm showed similar outcomes with three hard constraints for themeasurements (see eg Table 5 and Table 15) In that case however an advantage of the three softconstraints on the three hard constraints is the possibility of analysing the sensitivity of the constraintsIt is important to emphasised that the stochastic models of the measurements and constraints wereassumed well-known and defined for the analyzes performed here

6 Conclusions

In general the probability of correct identification (PCI) for the case of hard constraints is largerthan soft constraints It can be verified in Figure 4 Figure 6 and Figure 10 Therefore hard constraintsshould be used in the stage of pre-processing data for the purpose of identifying and removing possibleoutlying measurements In that process one should opt to set out the redundant hard constraints atpoints in the network where the smallest connections exist After identifying and removing possibleoutliers the soft constraints should be employed to propagate the uncertainties of the constraints(pseudobservations) to the model parameters during the process of least-squares estimation Thisrecommendation is valid for outlier detection and identification purpose

We highlight the main findings of this research as follows

bull Under a system of a high local redundancy ri gt 05 and low residuals correlation (ρwi wj lt 05) ifone increase the family-wise error rate (FWE) of the test statistic the performance of the procedurewill be improved for both scenarios of hard constraints and soft constraintsbull The larger the relaxation of the constraints the larger is the effect of the residuals correlation

(ρwi wj ) on the success rate of outlier identification (PCI) Consequently the higher the sensitivityindicator for outlier identification (MIB) and therefore more difficult it becomes to identify anoutlier

bull Under a scenario of soft constraints one should set out at least three soft constraints in order toidentify an outlier in the constraints

In other types of analysis for example deformation analysis of geodetic networks one shouldformulated the constraints as pseudo-observations with σc gt 0 in an adjustment model of system

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

36 of 40

equations In other words the relaxation of constraints makes it possible to estimate deformationeffects that are unmodelled in the deterministic model matrix for example deformations which have aspatial and temporal variations In that case the deformation can be modelled by covariance functionsthat determine the cofactors between constraints [28] This should be applied to control points locatedin the structure but not to control points located outside the structure The control points locatedoutside the structure should be hard constraints and have their stability verified by an independentprocess in order to analyse the deformation of the structure So one will be also able to analyse thesensitivity in terms of minimal detectable deformations andor minimal identifiable deformations [23]Future researches will be addressed to sensitivity analysis for deformation monitoring networks

Author Contributions Conceptualization VFR MTM and IK Data curation VFR MTM and IK Formalanalysis VFR and IK Funding acquisition MTM MRV and LGdSJ Investigation VFR MethodologyVFR and MTM Project administration MTM IK MRV and LGdSJ Software VFR and LGdSJSupervision MTM IK and MRV Validation VFR and MRV Writing ndash original draft VFR Writing ndashreview amp editing VFR MTM IK and MRV All authors have read and agreed to the published version of themanuscript

Funding This research was funded by the CNPqmdashConselho Nacional de Desenvolvimento Cientiacutefico eTecnoloacutegicomdashBrasil (proc no 1035872019-5) This research and the APC were also funded by PETROBRAS(Grant Number 201800545-0)

Acknowledgments In this section you can acknowledge any support given which is not covered by the authorcontribution or funding sections This may include administrative and technical support or donations in kind(eg materials used for experiments)

Conflicts of Interest The authors declare no conflict of interest

References

1 Lehmann R Neitzel F Testing the compatibility of constraints for parameters of a geodetic adjustmentmodel Journal of Geodesy 2013 87 555ndash566 doi101007s00190-013-0627-2

2 Fang X Weighted total least-squares with constraints a universal formula for geodetic symmetricaltransformations Journal of Geodesy 2015 89 459ndash469 doi101007s00190-015-0790-8

3 Amiri-Simkooei AR Weighted Total Least Squares with Singular Covariance Matrices Subject to Weightedand Hard Constraints J Surv Eng 2017 143 04017018

4 Sakic P Ballu V Royer JY A Multi-Observation Least-Squares Inversion for GNSS-Acoustic SeafloorPositioning Remote Sensing 2020 12 doi103390rs12030448

5 Odijk D Stochastic modelling of the ionosphere for fast GPS ambiguity resolution Geodesy Beyond 2000Schwarz KP Ed Springer Berlin Heidelberg Berlin Heidelberg 2000 pp 387ndash392

6 de Oliveira PS Morel L Fund F Legros R Monico JFG Durand S Durand F Modelingtropospheric wet delays with dense and sparse network configurations for PPP-RTK GPS Solutions 201721 237ndash250 doi101007s10291-016-0518-0

7 Chen B Wu L Dai W Luo X Xu Y A new parameterized approach for ionospheric tomography GPSSolutions 2019 23 96 doi101007s10291-019-0893-4

8 Ahmed E Ahmed S Mohamed D Mostafa R jag 2019 Vol 13 chapter Vertical ionospheric delayestimation for single-receiver operation p 81 2 doi101515jag-2018-0041

9 Wang A Chen J Zhang Y Meng L Wang J Performance of Selected Ionospheric Models inMulti-Global Navigation Satellite System Single-Frequency Positioning over China Remote Sensing 201911 doi103390rs11172070

10 Deo M El-Mowafy A Comparison of advanced troposphere models for aiding reductionof PPP convergence time in Australia Journal of Spatial Science 2019 64 381ndash403doi1010801449859620181472046

11 Dai X Lou Y Dai Z Hu C Peng Y Qiao J Shi C Precise Orbit Determination for GNSS ManeuveringSatellite with the Constraint of a Predicted Clock Remote Sensing 2019 11 doi103390rs11161949

12 Courant R Hilbert D Methods of mathematical physics Vol 2 Vol 2 Wiley [New York] 196213 Grafarend EW Optimization of Geodetic Networks The Canadian Surveyor 1974 28 716ndash723

doi101139tcs-1974-0120

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

37 of 40

14 Teunissen P Zero Order Design Generalized Inverses Adjustment the Datum Problem andS-Transformations Optimization and Design of Geodetic Networks Grafarend EW Sansograve F EdsSpringer Berlin Heidelberg Berlin Heidelberg 1985 pp 11ndash55

15 Dermanis A Free network solutions with the DLT method ISPRS Journal of Photogrammetry and RemoteSensing 1994 49 2 ndash 12 doihttpsdoiorg1010160924-2716(94)90061-2

16 Kotsakis C Reference frame stability and nonlinear distortion in minimum-constrained networkadjustment Journal of Geodesy 2012 86 755ndash774 doi101007s00190-012-0555-6

17 Kotsakis C Datum Definition and Minimal Constraints In Encyclopedia of Geodesy Grafarend E EdSpringer International Publishing Cham 2018 pp 1ndash6 doi101007978-3-319-02370-0_157-1

18 Matsuoka MT Rofatto VF Klein I Roberto Veronez M da Silveira LG Neto JBS Alves ACRControl Points Selection Based on Maximum External Reliability for Designing Geodetic Networks AppliedSciences 2020 10 doi103390app10020687

19 Baarda W S-transformations and criterion matrices Publ on geodesy New Series 1973 520 Schaffrin B Aspects of Network Design Optimization and Design of Geodetic Networks Grafarend

EW Sansograve F Eds Springer Berlin Heidelberg Berlin Heidelberg 1985 pp 548ndash59721 Xu PB A general solution in geodetic nonlinear rank-defect models 199722 Altamimi Z Dermanis A The Choice of Reference System in ITRF Formulation VII Hotine-Marussi

Symposium on Mathematical Geodesy Sneeuw N Novaacutek P Crespi M Sansograve F Eds Springer BerlinHeidelberg Berlin Heidelberg 2012 pp 329ndash334

23 Velsink H On the deformation analysis of point fields J Geod 2015 89 1071ndash1087doi101007s00190-015-0835-z

24 Velsink H Extendable linearised adjustment model for deformation analysis Survey Review 201547 397ndash410 doi1011791752270614Y0000000140

25 Hiddo V jag 2016 Vol 10 chapter Time Series Analysis of 3D Coordinates Using NonstochasticObservations p 5 1 doi101515jag-2015-0027

26 Rao CR Markoffrsquos Theorem with Linear Restrictions on Parameters SankhyAuml The Indian Journal ofStatistics (1933-1960) 1945 7 16ndash19

27 Velsink H Testing Methods for Adjustment Models with Constraints Journal of Surveying Engineering2018 144 04018009 doi101061(ASCE)SU1943-54280000260

28 Hiddo V Testing deformation hypotheses by constraints on a time series of geodetic observations J ApplGeod 2017 12 77ndash93 doi101515jag-2017-0028

29 Baarda W A testing procedure for use in geodetic networks Publ on geodesy New Series 1968 230 Teunissen P First and second moments of non-linear least-squares estimators Bull Geodesique (Journal of

Geodesy) 1989 63 253ndash262 doihttpsdoiorg101007BF0252047531 Teunissen P Testing Theory an introduction 2 ed Delft University Press 200632 Amiri-Simkooei A Least-squares variance component estimation theory and GPS applications Phd

thesis Delft University of Technology 200733 Koch KR Parameter estimation and hypothesis testing in linear models 2 ed Springer 199934 Tao H Feng H Xu L Miao M Long H Yue J Li Z Yang G Yang X Fan L Estimation

of Crop Growth Parameters Using UAV-Based Hyperspectral Remote Sensing Data Sensors 2020 20doi103390s20051296

35 Han M Wang Q Wen Y He M He X The Application of Robust Least Squares Method inFrequency Lock Loop Fusion for Global Navigation Satellite System Receivers Sensors 2020 20doi103390s20041224

36 Wei C Estimation for the Discretely Observed CoxacircldquoIngersollacircldquoRoss Model Driven by Small SymmetricalStable Noises Symmetry 2020 12 doi103390sym12030327

37 Sakic P Ballu V Royer JY A Multi-Observation Least-Squares Inversion for GNSS-Acoustic SeafloorPositioning Remote Sensing 2020 12 doi103390rs12030448

38 Livadiotis G General Fitting Methods Based on Lq Norms and their Optimization Stats 2020 3 16ndash31doi103390stats3010002

39 Farooq SZ Yang D Ada ENJ A Cycle Slip Detection Framework for Reliable Single Frequency RTKPositioning Sensors 2020 20 doi103390s20010304

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

38 of 40

40 Araveeporn A Comparing Parameter Estimation of Random Coefficient Autoregressive Model byFrequentist Method Mathematics 2020 8 doi103390math8010062

41 Zhang C Peng T Zhou J Ji J Wang X An Improved Autoencoder and Partial Least SquaresRegression-Based Extreme Learning Machine Model for Pump Turbine Characteristics Applied Sciences2019 9 doi103390app9193987

42 Ji J Yang M Jiang L He J Teng Z Liu Y Song H Output-Only Parameters Identification ofEarthquake-Excited Building Structures with Least Squares and Input Modification Process AppliedSciences 2019 9 doi103390app9040696

43 Kargoll B On the theory and application of model misspecification tests in geodesy Doctoral thesisUniversity of Bonn Landwirtschaftliche Fakultaumlt German Bonn 2007

44 Lehmann R On the formulation of the alternative hypothesis for geodetic outlier detection J Geod 201387 373ndash386 doi101007s00190-012-0607-y

45 Rofatto VF Matsuoka MT Klein I Roberto Veronez M da Silveira LG A Monte Carlo-Based OutlierDiagnosis Method for Sensitivity Analysis Remote Sensing 2020 12 doi103390rs12050860

46 Lehmann R Improved critical values for extreme normalized and studentized residuals in GaussndashMarkovmodels J Geod 2012 86 1137ndash1146 doi101007s00190-012-0569-0

47 Rofatto VF Matsuoka MT Klein I Veronez MR Bonimani ML Lehmann R A half-century ofBaardarsquos concept of reliability a review new perspectives and applications Surv Rev 2018 0 1ndash17[httpsdoiorg1010800039626520181548118] doi1010800039626520181548118

48 Zaminpardaz S Teunissen P DIA-datasnooping and identifiability J Geod 2019 93 85ndash101doi101007s00190-018-1141-3

49 Kok JJ States U On data snooping and multiple outlier testing [microform] Johan J Kok US Dept ofCommerce National Oceanic and Atmospheric Administration National Ocean Service Charting andGeodetic Services For sale by the National Geodetic Information Center NOAA Rockville Md 1984 p61 p

50 Knight NL Wang J Rizos C Generalised measures of reliability for multiple outliers Journal of Geodesy2010 84 625ndash635 doi101007s00190-010-0392-4

51 Gui Q Li X Gong Y Li B Li G A Bayesian unmasking method for locating multiplegross errors based on posterior probabilities of classification variables J Geod 2011 85 191ndash203doi101007s00190-010-0429-8

52 Klein I Matsuoka MT Guzatto MP Nievinski FG An approach to identify multipleoutliers based on sequential likelihood ratio tests Surv Rev 2017 49 449ndash457[httpsdoiorg1010800039626520161212970] doi1010800039626520161212970

53 Hawkins DM Identification of Outliers 1 ed Springer Netherlands 1980 httpsdoiorg101007978-94-015-3994-4

54 Baarda W Statistical concepts in geodesy Publ on geodesy New Series 1967 255 Imparato D Teunissen P Tiberius C Minimal Detectable and Identifiable Biases for quality

control Surv Rev 2019 51 289ndash299 [httpsdoiorg1010800039626520181437947]doi1010800039626520181437947

56 Teunissen PJG Distributional theory for the DIA method Journal of Geodesy 2018 92 59ndash80doi101007s00190-017-1045-7

57 Foumlrstner W Reliability and discernability of extended Gauss-Markov models Seminar on mathematicalmodels of GeodeticPhotogrammetric Point Determination with Regard to Outliers and Systematic Errors 1983 Vol Series A pp 79ndash103

58 Wang J Knight NL New Outlier Separability Test and Its Application in GNSS Positioning 201259 Lehmann R Loumlsler M Multiple Outlier Detection Hypothesis Tests versus Model Selection by

Information Criteria J Surv Eng 2016 142 0401601760 Teunissen PJG Imparato D Tiberius CCJM Does RAIM with Correct Exclusion Produce Unbiased

Positions Sensors 2017 17 doi103390s1707150861 Lehmann R Loumlsler M Congruence analysis of geodetic networks ndash hypothesis tests versus model

selection by information criteria J Appl Geod 2017 11 271ndash283 doi101515jag-2016-0049

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

39 of 40

62 Rofatto V Matsuoka M Klein I DESIGN OF GEODETIC NETWORKS BASED ON OUTLIERIDENTIFICATION CRITERIA AN EXAMPLE APPLIED TO THE LEVELING NETWORK Bull Geod Sci2018 24 152ndash170 httpdxdoiorg101590s1982-21702018000200011

63 Nguyen VK Renault Milocco R Environment Monitoring for Anomaly Detection System UsingSmartphones Sensors 2019 19 doi103390s19183834

64 Nie Y Yang L Shen Y Specific Direction-Based Outlier Detection Approach for GNSS Vector NetworksSensors 2019 19 doi103390s19081836

65 Klein I Matsuoka MT Guzatto MP Nievinski FG Veronez MR Rofatto VF A newrelationship between the quality criteria for geodetic networks Journal of Geodesy 2019 93 529ndash544doi101007s00190-018-1181-8

66 Mierlo JV Statistical Analysis of Geodetic Measurements for the Investigation of Crustal Movements InRecent Crustal Movements 1977 Whitten C Green R Meade B Eds Elsevier 1979 Vol 13 Developmentsin Geotectonics pp 457 ndash 467 doihttpsdoiorg101016B978-0-444-41783-150072-6

67 van der Marel H Roumlsters AJM Statistical Testing and Quality Analysis in 3-D Networks (part II)Application to GPS Global Positioning System An Overview Bock Y Leppard N Eds Springer NewYork New York NY 1990 pp 290ndash297

68 Yang L Wang J Knight NL Shen Y Outlier separability analysis with a multiple alternative hypothesestest J Geod 2013 87 591ndash604 doi101007s00190-013-0629-0

69 Proacuteszynski W Revisiting Baardarsquos concept of minimal detectable bias with regard to outlier identifiabilityJ Geod 2015 89 993ndash1003 doi101007s00190-015-0828-y

70 Romano JP Wolf M Multiple Testing of One-Sided Hypotheses Combining Bonferroni and the BootstrapPredictive Econometrics and Big Data Kreinovich V Sriboonchitta S Chakpitak N Eds SpringerInternational Publishing Cham 2018 pp 78ndash94

71 Teunissen P An Integrity and Quality Control Procedure for use in Multi Sensor Integration 199072 Aydin C Demirel H Computation of Baardarsquos lower bound of the non-centrality parameter J Geod

2004 78 437ndash441 doi101007s00190-004-0406-173 Aydin C Power of Global Test in Deformation Analysis J Surv Eng 2012 138 51ndash56 https

doiorg101061(ASCE)SU1943-5428000006474 Bonferroni C Teoria statistica delle classi e calcolo delle probabilitagrave Pubblicazioni del R Istituto superiore di

scienze economiche e commerciali di Firenze Libreria internazionale Seeber 193675 Box GEP Muller ME A Note on the Generation of Random Normal Deviates The Annals of Mathematical

Statistics 1958 29 610ndash611 httpsdoi101214aoms117770664576 Lemeshko BY Lemeshko SB Extending the Application of Grubbs-Type Tests in Rejecting Anomalous

Measurements Measurement Techniques 2005 48 536ndash547 doi101007s11018-005-0179-977 Lehmann R Scheffler T Monte Carlo based data snooping with application to a geodetic network J

Appl Geod 2011 5 123ndash134 httpsdoiorg101515JAG201101478 Matsumoto M Nishimura T Mersenne twister A 623-dimensionally equidistributed uniform

pseudo-random number generator ACM Transactions on Modeling and Computer Simulation 1998 8 3ndash30httpsdlacmorgcitationcfmid=272991

79 Ihde J Saacutenchez L Barzaghi R Drewes H Foerste C Gruber T Liebsch G Marti U Pail R SiderisM Definition and Proposed Realization of the International Height Reference System (IHRS) Surveys inGeophysics 2017 38 549ndash570 doi101007s10712-017-9409-3

80 Lehmann R Observation error model selection by information criteria vs normality testing StudiaGeophysica et Geodaetica 2015 59 489ndash504 doi101007s11200-015-0725-0

81 Algarni DA Ali AE Heighting and Distance Accuracy with Electronic Digital Levels Journal of King SaudUniversity - Engineering Sciences 1998 10 229 ndash 239 doihttpsdoiorg101016S1018-3639(18)30698-6

82 Gemin ARS Matos AS Faggion PLAs APPLICATION OF CALIBRATION CERTIFICATE OFDIGITAL LEVELING SYSTEMS IN THE MONITORING OF STRUCTURES A CASE STUDY AT THEGOVERNADOR JOSAtilde RICHA HYDROELECTRIC POWER PLANT - PR Boletim de CiAtildeGeodAtilde 201824 235 ndash 249

83 Takalo M Rouhiainen P Development of a System Calibration Comparator for Digital Levels in FinlandNordic Journal of Surveying and Real Estate Research 1 1

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

40 of 40

84 Wiedemann W Wagner A Wunderlich T Using IATS to Read and Analyze Digital Levelling Staffs SIG2016 Paar R Marendic A Zrinjski M Eds Croatian Geodetic Society 2016 pp 515ndash526

85 PJG T The Geometry of Geodetic Inverse Linear Mapping and Non-linear Adjustment Publications onGeodesy New Series 1985 8

86 Rao CR Mitra SK Generalized inverse of a matrix and its applications 197287 Hekimoglu S Erenoglu RC Sanli DU Erdogan B Detecting Configuration Weaknesses in Geodetic

Networks Survey Review 2011 43 713ndash730 doi101179003962611X13117748892632

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

Page 15: Effects of Hard and Soft Equality Constraints on

15 of 40

in that case was r = 12minus 5 = 7 with 5 unknown heights (BCEFG) over-constrained

3 Figure 3(c) Network with two extra hard constraints (ie three hard constraints)The heights A Dand G were taken as hard constraints In that case the redundancy of the system was r = 12minus 4 = 8

4 Figure 3(d) Network with two soft constraints (A and D) In that case a standard-deviation largerthan zero was assigned to both constraints (ie σc gt 0 In other words A and D were processedas being both observations and unknown parameters ie A and D were pseudo-observationsHere the both constraints were simultaneously relaxed by considering their uncertainties 10 timesworse than measurements (ie σc = 10times σ = 10mm) 10 times better than measurements (ieσc = 01mm) and their uncertainties equal to the measurements (σc = 1mm) In that case theredundancy of the system was r = 14minus 7 = 7

5 Figure 3(e) Network processed with A D and G as pseudo-observations Those three constraintswere simultaneously relaxed by considering their standard-deviations equal to σc = 10mm (10 timesworse than measurements) σc = 01mm (10 times better than measurements) and σc = 1mm (thesame as the measurements) In that case the redundancy of the system was r = 15minus 7 = 8

y1

y2

y3

y4

y5

y6

y7 y8

y9 y10

B C

D

E F

G y12 y11

A

y1

y2

y3

y4

y5

y6

y7 y8

y9 y10

A

B C

D

E F

G y12 y11

y1

y2

y3

y4

y5

y6

y7 y8

y9 y10

A

B C

D

E F

G y12 y11

(a) (b)

(c) (d)

(e)

Unknown Heights Soft Constraints Hard Constraints

y1

y2

y3

y4

y5

y6

y7 y8

y9 y10

A

B C

D

E F

G y12 y11

y1

y2

y3

y4

y5

y6

y7 y8

y9 y10

A

B C

D

E F

G y12 y11

Figure 3 Levelling Geodetic Network subject to different constraint scenarios

The following system of equations for that problem is given by

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

16 of 40

y1 + e1 = hB minus hA

y2 + e2 = hC minus hB

y3 + e3 = hD minus hC

y7 + e7 = hB minus hG

y8 + e8 = hC minus hG

y11 + e11 = hB minus hF

y12 + e12 = hC minus hE

(49)

The functional model (A) for the system of equations in 49 is given by

A =

minus1 1 0 0 0 0 00 minus1 1 0 0 0 00 0 minus1 1 0 0 00 0 0 minus1 1 0 00 0 0 0 minus1 1 01 0 0 0 0 minus1 00 1 0 0 0 0 minus10 0 1 0 0 0 minus10 0 0 0 1 0 minus10 0 0 0 0 1 minus10 1 0 0 0 minus1 00 0 1 0 minus1 0 0

(50)

Note that the rank defect of the matrix A is uminus rank(A) = 7minus 6 = 1 In that case there is neededto add at least one constraint in order to avoid rank deficiency of the matrix A This is guaranteedwhen we take one height as known For example from the network in Figure 3(a) we have added theheight G as known (ie as a hard constraint) In that case the constraint equation should be addedinto the system in 49 ie

y13 = hG with σy13 = 0 (51)

noticing that because the standard deviation is zero the observation is non-stochastic (hard constraint)and the residual ey13 = 0 This can generate problems in the inversion of the covariance matrix of theobservations Qe for the calculation of the weight matrix W because the weight for that constraintwould be 1

0 = infin In order to avoid that problem we have eliminated the rank deficiency of matrixA by removing the seventh column of matrix A in 50 associated with the height G Now we haveuminus rank(A) = 6minus 6 = 0

The constraint defines the geodetic datum ie the S-system ([85]p41) Another approach tosolving the system of equations in 2 could be based on generalised (pseudo) inverses (see eg [86])

The location of the constraints can be chosen in some circumstances for example during thedesign stage of a geodetic network For the special case of having a minimally constrained systemthe location of the constraint will not influence the w-test statistics and the sensitivity indicators (MIBand MDB) [18] However more constraints than the minimum necessary to have a solution (ie extraconstraints or redundant constraints) can change the least-squares residuals and hence w-test statisticsand the minimal biases

From the network with one extra constraint (2 constraints) in Figure 3(b) for example the bothfirst (height A) and fourth column (height D) of matrix A in 50 were eliminated in the case of having

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

17 of 40

the two heights as hard constraints For the case where these two heights (A and D) were taken as softconstraints however two observation equations were added to Equation 49 ie

y13 + e13 = hA σy13 gt 0

y14 + e14 = hD σy14 gt 0(52)

In the case of soft constraints in Equation 52 two lines were added in matrix A In other wordsA and D were taken as pseudo-observations In that case the rank deficiency was also null (ieuminus rank(A) = 7minus 7 = 0) the redundancy of the system was r = nminus rank(A) = nminus u = 7 and thematrix A was given as follows

A =

minus1 1 0 0 0 0 00 minus1 1 0 0 0 00 0 minus1 1 0 0 00 0 0 minus1 1 0 00 0 0 0 minus1 1 01 0 0 0 0 minus1 00 1 0 0 0 0 minus10 0 1 0 0 0 minus10 0 0 0 1 0 minus10 0 0 0 0 1 minus10 1 0 0 0 minus1 00 0 1 0 minus1 0 01 0 0 0 0 0 00 0 0 1 0 0 0

(53)

For this example of two soft constraints and by considering the both soft constraints withstandard-deviation σc = 10mm the symmetric and positive semi-definite covariance matrix of theobservations (Qe) was given as follows

Qe =

1 0 0 middot middot middot 0 00 1 0 middot middot middot 0 00 0 1 middot middot middot 0 0

0 0 0 middot middot middot 100 00 0 0 middot middot middot 0 100

(54)

The last two rows and columns of the matrix Qe in Equation 54 refer to the variances (σc2 =

(10mm)2 = 100mm2) of the heights constraints A and D respectivelySimilarly matrices A and Qe were constructed for the other cases studied hereAlthough other measurements are able to identify an outlier for the case of having only one single

soft constraint the pseudo-observation (constraint) is not In that case the defect configurationis associated with the additional parameter in the constraint (ie the presence of an outlier inthe constraint) In other words an additional parameter on the soft constraint will not estimableFor example if the height point G was taken as a soft constraint the presence of an outlier inpseudo-observation G would lead to rank deficiency of matrix A ie u minus rank(A) = 8 minus 7 = 1Therefore the case of having only one single soft constraint was not considered here

42 Result of the Hard Constraint Effects on the Iterative Outlier Elimination Procedure

In this section we present the results of the effects of the hard constraint on the IDS The scenariosin Figure 3(a) (network minimally constrained) Figure 3(b) (two hard constraints) and Figure 3(c)(three hard constraints) were considered for the analysis

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

18 of 40

The correlation coefficient between w-test statistics (ρwi wj ) were computed for the three scenariosaccording to Equation(20) Table 1 provides that correlation for that network minimally constrainedTable 2 for that network over-constrained with two hard constraints and Table 3 for that networkover-constrained with three hard constraints

Table 1 Correlation matrix of w-test statistics for the levelling network minimally constrained in Figure3(a)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12

wy1 10 02 01 01 02 10 -02 00 00 02 -04 -01wy2 10 02 02 03 02 05 -05 -02 02 03 -03wy3 10 10 02 01 00 02 -02 00 01 04wy4 10 02 01 00 02 -02 00 01 04wy5 10 02 -02 02 05 -05 03 -03wy6 10 -02 00 00 02 -04 -01wy7 10 -03 -03 -04 -04 -01wy8 10 -04 -03 -01 -04wy9 10 -03 01 04wy10 10 04 01wy11 10 -01wy12 100

Table 2 Correlation matrix of w-test statistics for the levelling network with two hard constraints inFigure 3(b)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12

wy1 10 04 04 -03 -01 04 -03 01 01 01 -04 -01wy2 10 04 -01 01 -01 04 -04 -01 01 03 -03wy3 10 04 -01 -03 -01 03 -01 -01 01 04wy4 10 04 04 01 01 -03 01 01 04wy5 10 04 -01 01 04 -04 03 -03wy6 10 -01 -01 -01 03 -04 -01wy7 10 -04 -03 -04 -04 -01wy8 10 -04 -03 -01 -04wy9 10 -04 01 04wy10 10 04 01wy11 10 -01wy12 10

Table 3 Correlation matrix of w-test statistics for the levelling network with three hard constraints inFigure 3(c)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12

wy1 10 03 01 -01 -01 01 -04 -01 -01 -01 -03 -01wy2 10 03 -01 01 -01 03 -03 -01 01 03 -03wy3 10 01 -01 -01 01 04 01 01 01 03wy4 10 03 01 -01 -01 -04 -01 01 03wy5 10 03 -01 01 03 -03 03 -03wy6 10 01 01 01 04 -03 -01wy7 10 -01 -01 -01 -03 -01wy8 10 -01 -01 -01 -03wy9 10 -01 01 03wy10 10 03 01wy11 10 -01wy12 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

19 of 40

Table 4 gives the local redundancy (ri) the standard-deviation of the LS-estimated outlier σnablai andthe maximum absolute correlation (maxρwi wj

) for each scenario of hard constraint set out of in thisstudy ie Figure 3(ab and c)

Table 4 Local redundancy (ri) standard-deviation of the LS-estimated outlier σnablai and the maximumabsolute correlation (maxρwi wj

) for each scenario of hard constraint

1 hard constraint 2 hard constraints 3 hard constraintsMeasurement ri σnablai maxρwi wj

ri σnablai maxρwi wjri σnablai maxρwi wj

y1 0396 1589 100 0583 1309 036 0708 1188 041y2 0500 1414 047 0583 1309 036 0583 1309 032y3 0396 1589 100 0583 1309 036 0708 1188 041y4 0396 1589 100 0583 1309 036 0708 1188 041y5 0500 1414 047 0583 1309 036 0583 1309 032y6 0396 1589 100 0583 1309 036 0708 1188 041y7 0563 1333 047 0583 1309 036 0708 1188 041y8 0563 1333 047 0583 1309 036 0708 1188 041y9 0563 1333 047 0583 1309 036 0708 1188 041y10 0563 1333 047 0583 1309 036 0708 1188 041y11 0583 1309 043 0583 1309 036 0583 1309 032y12 0583 1309 043 0583 1309 036 0583 1309 032

Here the twelve levelling measurements were clustered into four clusters The clustering wasdefined according to the local redundancy (ri) and the maximum absolute correlation (maxρwi wj

) inTable 4 This is the first time that a clustering technique based on the similarity of local redundancyand the maximum absolute correlation between w-test statistics is applied to a problem of geodeticnetworks Similarly this has been done in [45] The four cluster were defined as follows

bull Cluster 1 y1 y3 y4 and y6bull Cluster 2 y2 and y5bull Cluster 3 y7 y8 y9 and y10bull Cluster 4 y11 and y12

The probability levels associated with IDS were averaged for each of these clusters Here we computethe probability levels of the IDS based on the procedure in Section (3) [45] The critical values werek = 389 k = 393 and k = 393 for 1 hard constraint 2 hard constraints and 3 hard constraintsrespectively These critical values were found for αprime = 0001 according to the procedure described inAppendix The probability of correct identification (PCI) and correct detection (PCD = 1minusPMD)are displayed in Figure 4 for each number of hard constraint (denoted by hc)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

20 of 40

(c) (d)

(e) (f)

(g) (h)

(a) (b)

Figure 4 Probability of correct identification (PCI) and Probability of correct detection (PCD) for thecase of hard constraints and for αprime = 0001 Cluster 1(ab) Cluster 2(cd) Cluster 3(ef) and Cluster4(gh)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

21 of 40

The outlier magnitude were defined from |5σ| to |9σ| The outlier of |5σ| was chosen because itis approximately the lowest MDB0(i) of the network when a single hypothesis testing is in play ascan be seen in Equation (14) or Equation (16) That MDB0(i) of |5σ| was computed for significancelevel of αprime = 0001 and a power of the test γ0 = 08 This strategy reduces the search space for a MIB(Minimal Identifiable Bias) because we will always have the following inequality MIB ge MDB0(i)[4755] Remember that the IDS procedure is an example of multiple hypothesis testing (see 2)

The sensitivity indicators MDB and MIB for IDS were also computed according to Equation (33)and (34) respectively The success rate for outlier detection and outlier identification were taken asbeing PCD = PCI = 08 respectively Table 5 provides the values of MDB and MIB for that case ofhard constraints

Table 5 MDB and MIB for the case of hard constraints based on αprime = 0001 and PCD = PCI = 08

1 hard constraint 2 hard constraints 3 hard constraintsCluster MDB (σ) MIB (σ) MDB (σ) MIB (σ) MDB (σ) MIB (σ)

1 75 - 63 63 57 572 67 68 63 64 63 643 64 64 63 63 58 584 64 64 64 64 64 64

Figure 5 shows the probability of wrong exclusion (PWE) The over-identification cases (Pover+

and Poverminus) were smaller than 0001 (ie they were practically null) There were not statistical overlap(Pol) for clusters 2 3 and 4 We will discuss more about statistical overlap (Pol) later

(c) (d)

(a) (b)

Figure 5 Probability of wrong exclusion (PWE) for the case of hard constraints and for αprime = 0001Cluster 1(a) Cluster 2(b) Cluster 3(c) and Cluster 4(d)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

22 of 40

43 Result of the Soft Constraint Effects on the Iterative Outlier Elimination Procedure

The both configuration in Figure 3(d) and Figure 3(e) were analysed in terms of soft constraintsIn that case the critical values were k = 395 k = 395 and k = 392 for two soft constraints withσc = 01mm σc = 1mm and σc = 10mm respectively In the case of three soft constraints the criticalvalues found were k = 399 k = 399 and k = 396 for σc = 01mm σc = 1mm and σc = 10mmrespectively All these critical values were computed for αprime = 0001

The correlation coefficient between w-test statistics (ρwi wj ) are displayed in Table 6 Table 7and Table 8 for two soft constraints with standard-deviation 10 times larger than measurements(ie σc = 10times σ = 10mm) 10 times better than measurements (ie σc = 01mm) and equal to themeasurements (σc = 1mm) respectively

Table 6 Correlation matrix of w-test statistics for the levelling network with two soft constraints withσc = 10mm in Figure 3(d)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14

wy1 10 02 01 01 02 10 -02 00 00 02 -04 -01 006 -01wy2 10 02 02 03 02 05 -05 -02 02 03 -03 003 00wy3 10 10 02 01 00 02 -02 00 01 04 01 -01wy4 10 02 01 00 02 -02 00 01 04 -01 01wy5 10 02 -02 02 05 -05 03 -03 00 00wy6 10 -02 00 00 02 -04 -01 -01 01wy7 10 -03 -03 -04 -04 -01 00 00wy8 10 -04 -03 -01 -04 00 00wy9 10 -03 01 04 00 00wy10 10 04 01 00 00wy11 10 -01 00 00wy12 10 00 00wy13 10 -10wy14 10

Table 7 Correlation matrix of w-test statistics for the levelling network with two soft constraints withσc = 01mm in Figure 3(d)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14

wy1 10 04 04 -03 -01 04 -03 01 01 01 -04 -01 06 -06wy2 10 04 -01 02 -01 04 -04 -01 01 03 -03 04 -04wy3 10 04 -01 -03 -01 03 -01 -01 01 04 06 -06wy4 10 04 04 01 01 -03 01 01 04 -06 06wy5 10 04 -01 01 04 -04 03 -03 -04 04wy6 10 -01 -01 -01 03 -04 -01 -06 06wy7 10 -04 -03 -04 -04 -01 -02 02wy8 10 -04 -03 -01 -04 02 -02wy9 10 -04 01 04 02 -02wy10 10 04 01 -02 02wy11 10 -01 00 00wy12 10 00 00wy13 10 -10wy14 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

23 of 40

Table 8 Correlation matrix of w-test statistics for the levelling network with two soft constraints withσc = 1mm in Figure 3(d)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14

wy1 10 03 02 -01 01 07 -03 00 01 02 -04 -01 04 -04wy2 10 03 01 03 01 04 -04 -01 01 03 -03 03 -03wy3 10 07 01 -01 00 03 -02 -01 01 04 04 -04wy4 10 03 02 01 02 -03 00 01 04 -04 04wy5 10 03 -01 01 04 -04 03 -03 -03 03wy6 10 -02 -01 00 03 -04 -01 -04 04wy7 10 -03 -03 -04 -04 -01 -01 01wy8 10 -04 -03 -01 -04 01 -01wy9 10 -03 01 04 01 -01wy10 10 04 01 -01 01wy11 10 -01 00 00wy12 10 00 00wy13 10 -10wy14 10

Table 9 gives the local redundancy (ri) the standard-deviation of the LS-estimated outlier σnablai andthe maximum absolute correlation (maxρwi wj

) for the scenarios of two constraints

Table 9 Local redundancy (ri) standard-deviation of the LS-estimated outlier σnablai (mm) and themaximum absolute correlation (maxρwi wj

) for each scenario of two soft constraints

σc = 01mm σc = 1mm σc = 10mmMeasurement ri σnablai maxρwi wj

ri σnablai maxρwi wjri σnablai maxρwi wj

y1 0581 1312 0564 0471 1457 0681 0397 1587 0994y2 0582 1311 0376 0533 1369 0423 0501 1413 0471y3 0581 1312 0564 0471 1457 0681 0397 1587 0994y4 0581 1312 0564 0471 1457 0681 0397 1587 0994y5 0582 1311 0376 0533 1369 0423 0501 1413 0471y6 0581 1312 0564 0471 1457 0681 0397 1587 0994y7 0583 1310 0359 0571 1324 0423 0563 1333 0471y8 0583 1310 0359 0571 1324 0423 0563 1333 0471y9 0583 1310 0359 0571 1324 0423 0563 1333 0471y10 0583 1310 0359 0571 1324 0423 0563 1333 0471y11 0583 1309 0358 0583 1309 0398 0583 1309 0433y12 0583 1309 0358 0583 1309 0398 0583 1309 0433y13 0007 1163 1000 0300 1826 1000 0497 14189 1000y14 0007 1163 1000 0300 1826 1000 0497 14189 1000

From Table 9 five clusters were defined for each case of two soft constraints ie for the casewhere heights A and D were given as soft constraints in Figure 3(d) as follows

bull Cluster 1 y1 y3 y4 and y6bull Cluster 2 y2 and y5bull Cluster 3 y7 y8 y9 and y10bull Cluster 4 y11 and y12bull Cluster 5 y13 and y14

The probabilities of correct identification (PCI) and correct detection (PCD) for the measurements(Cluster 1 to Cluster 4) subject to the scenarios of two soft constraints (heights A and D) are displayedin Figure 6

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

24 of 40

(c) (d)

(e) (f)

(g) (h)

(a) (b)

Figure 6 Probability of correct identification (PCI) and Probability of correct detection (PCD) for themeasurements subject to the scenarios of two soft constraints for αprime = 0001 Cluster 1(ab) Cluster2(cd) Cluster 3(ef) and Cluster 4(gh)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

25 of 40

Note that the Cluster 5 is associated with the two soft constraints (ie y13 and y14) The probabilityof correct identification (PCI) for these both soft constraints were null However the probability ofcorrect detection were not Figure 7 shows the probability of correct detection (PCD) for these two softconstraints (ie heights A and D)

Figure 7 Probability of correct detection PCD for the two soft constraints (Cluster 5 heights A and D)and for αprime = 0001

The probability of wrong exclusion PWE for the measurements (Cluster 1 to Cluster 4) subject tothe scenarios of two soft constraints (heights A and D) are displayed in Figure 8 Figure 9 gives the theprobability of wrong exclusion PWE for that two constraints (ie heights A and D)

(c) (d)

(a) (b)

Figure 8 Probability of wrong exclusion PWE for the measurements subject to the scenarios of two softconstraints for αprime = 0001 Cluster 1(a) Cluster 2(b) Cluster 3(c) and Cluster 4(d)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

26 of 40

Figure 9 Probability of wrong exclusion PWE for the two soft constraints (Cluster 5 heights A and D)and for αprime = 0001

The over-identification cases (Pover+ and Poverminus) and the statistical overlap (Pol) were practicallynull for that case

The sensitivity indicators (MDB and MIB) for each scenario of two soft constraints are displayedin Table 10

Table 10 MDB and MIB for the case of two soft constraints based on αprime = 0001 and PCD = PCI = 08

σc = 10mm σc = 1mm σc = 01mmCluster MDB (σ) MIB (σ) MDB (σ) MIB (σ) MDB (σ) MIB (σ)

1 75 25 7 71 63 632 68 68 66 66 63 633 64 64 64 64 63 634 63 63 63 63 63 635 68 - 88 - 57 -

As with the case of two soft constraints Tables 1112 and 13 show the correlations (ρwi wj ) for thecase where there were three soft constraints ie for the network configuration in Figure 3(e)

Table 11 Correlation matrix of w-test statistics for the levelling network with three soft constraintswith σc = 10mm in Figure 3(e)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14 wy15

wy1 10 02 01 01 02 10 -02 00 00 02 -04 -01 01 00 00wy2 10 02 02 03 02 05 -05 -02 02 03 -03 00 00 00wy3 10 10 02 01 00 02 -02 00 01 04 00 -01 00wy4 10 02 01 00 02 -02 00 01 04 00 01 00wy5 10 02 -02 02 05 -05 03 -03 00 00 00wy6 10 -02 00 00 02 -04 -01 -01 00 00wy7 10 -03 -03 -04 -04 -01 00 00 00wy8 10 -04 -03 -01 -04 00 00 00wy9 10 -03 01 04 00 00 00wy10 10 04 01 00 00 00wy11 10 -01 00 00 00wy12 10 00 00 00wy13 10 -05 -05wy14 10 -05wy15 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

27 of 40

Table 12 Correlation matrix of w-test statistics for the levelling network with three soft constraintswith σc = 01mm in Figure 3(e)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14 wy15

wy1 10 03 01 -01 -01 01 -04 -01 -01 -01 -03 -01 07 -01 -04wy2 10 03 -01 01 -01 03 -03 -01 01 03 -03 03 -03 00wy3 10 01 -01 -01 01 04 01 01 01 03 01 -07 04wy4 10 03 01 -01 -01 -04 -01 01 03 -01 07 -04wy5 10 03 -01 01 03 -03 03 -03 -03 03 00wy6 10 01 01 01 04 -03 -01 -07 01 04wy7 10 -01 -01 -01 -03 -01 -04 -01 04wy8 10 -01 -01 -01 -03 -01 -04 04wy9 10 -01 01 03 -01 -04 04wy10 10 03 01 -04 -01 04wy11 10 -01 00 00 00wy12 10 00 00 00wy13 10 -02 -06wy14 10 -06wy15 10

Table 13 Correlation matrix of w-test statistics for the levelling network with three soft constraintswith σc = 1mm in Figure 3(e)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14 wy15

wy1 10 03 01 00 01 06 -03 00 00 01 -04 -01 05 -02 -02wy2 10 03 01 03 01 04 -04 -01 01 03 -03 02 -02 00wy3 10 06 01 00 00 03 -01 00 01 04 02 -05 02wy4 10 03 01 00 01 -03 00 01 04 -02 05 -02wy5 10 03 -01 01 04 -04 03 -03 -02 02 00wy6 10 -01 00 00 03 -04 -01 -05 02 02wy7 10 -03 -02 -03 -04 -01 -02 00 02wy8 10 -03 -02 -01 -04 00 -02 02wy9 10 -03 01 04 00 -02 02wy10 10 04 01 -02 00 02wy11 10 -01 00 00 00wy12 10 00 00 00wy13 10 -04 -05wy14 10 -05wy15 10

Table 14 Local redundancy (ri) standard-deviation of the LS-estimated outlier σnablai (mm) and themaximum absolute correlation (maxρwi wj

) for each scenario of three soft constraints

σc = 01mm σc = 1mm σc = 10mmMeasurement ri σnablai maxρwi wj

ri σnablai maxρwi wjri σnablai maxρwi wj

y1 0702 1194 0660 0502 1411 0577 0398 1586 0992y2 0582 1311 0326 0533 1369 0412 0501 1413 0470y3 0702 1194 0660 0502 1411 0577 0398 1586 0992y4 0702 1194 0660 0502 1411 0577 0398 1586 0992y5 0582 1311 0326 0533 1369 0412 0501 1413 0470y6 0702 1194 0660 0502 1411 0577 0398 1586 0992y7 0704 1192 0415 0602 1289 0412 0563 1333 0470y8 0704 1192 0415 0602 1289 0412 0563 1333 0470y9 0704 1192 0415 0602 1289 0412 0563 1333 0470y10 0704 1192 0415 0602 1289 0412 0563 1333 0470y11 0583 1309 0326 0583 1309 0385 0583 1309 0433y12 0583 1309 0326 0583 1309 0385 0583 1309 0433y13 0012 0904 0660 0425 1534 0542 0663 12283 0501y14 0012 0904 0660 0425 1534 0542 0663 12283 0501y15 0019 0718 063 05 1414 0542 0665 12268 0501

The probabilities of correct identification (PCI) and correct detection PCD for the measurements(Cluster 1 to Cluster 4) subject to the scenarios of three soft constraints (heights A D and G) aredisplayed in Figure 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

28 of 40

(c) (d)

(e) (f)

(g) (h)

(a) (b)

Figure 10 Probability of correct identification (PCI) and Probability of correct detection (PCD) for themeasurements subject to the scenarios of three soft constraints for αprime = 0001 Cluster 1(ab) Cluster2(cd) Cluster 3(ef) and Cluster 4(gh)

The probabilities of correct identification (PCI) and correct detection (PCD) in Figure 8 werecomputed for the clusters based on Table 14 as follows

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

29 of 40

bull Cluster 1 y1 y3 y4 and y6bull Cluster 2 y2 and y5bull Cluster 3 y7 y8 y9 and y10bull Cluster 4 y11 and y12bull Cluster 5 y13 and y14bull Cluster 6 y15

Figure 11 shows the probabilities of correct identification PCI and correct detection PCD for thethree soft constraints ie for Cluster 5 (heights A and D) and Cluster 6 (height G) in Figure 3(e)

(c) (d)

(a) (b)

Figure 11 Probability of correct identification (PCI) and Probability of correct detection (PCD) for thethree constraints and for αprime = 0001 Cluster 5(ab) and Cluster 6(cd)

The probability of wrong exclusion PWE for the measurements (Cluster 1 to Cluster 4) subject tothe scenarios of three soft constraints (heights A D and G) are displayed in Figure 12 Figure 13 givesthe the probability of wrong exclusion PWE for that three constraints (ie heights A D and G)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

30 of 40

(c) (d)

(a) (b)

Figure 12 Probability of wrong exclusion PWE for the measurements subject to the scenarios of threesoft constraints and for αprime = 0001 Cluster 1(a) Cluster 2(b) Cluster 3(c) and Cluster 4(d)

(a) (b)

Figure 13 Probability of wrong exclusion (PWE) for the three constraints and for αprime = 0001 Cluster5(a) and Cluster 6(b)

The over-identification cases (Pover+ and Poverminus) and the statistical overlap (Pol) were alsopractically null for that case of three soft constraints

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

31 of 40

The sensitivity indicators (MDB and MIB) for each scenario of three soft constraints are displayedin Table 15

Table 15 MDB and MIB for the case of three soft constraints based on αprime = 0001 and PCD = PCI = 08

σc = 10mm σc = 1mm σc = 01mmCluster MDB (σ) MIB (σ) MDB (σ) MIB (σ) MDB (σ) MIB (σ)

1 75 22 68 69 58 592 68 69 66 67 64 643 64 64 63 63 58 584 63 63 63 63 63 635 59 60 74 75 435 456 59 59 69 69 346 355

5 Discussion

We start from the scenario of one hard constraint in Figure 3(a) Table 4 reveal that the maximumcorrelation between w-test statistics for the measurements constituting Cluster 1 is exactly equal to100 (ie maxρwi wj

= 100) This means that the measurements belonging to Cluster 1 are those thatare connected with unknown heights whose connections are limited to only two The both unknownheights A and D are tied only to two measurements (ie y1 and y6 linked to A and y3 and y4 linked toD) therefore if an outlier occurred in one of these measurements we would only be able to analyse theconsistency between them but we would not be able to distinguish which of them was contaminatedby an outlier This means that we would only be able to detect because the w-test statistics could belarger than a critical value k however in that case the values of w-test statistics would be the same andtherefore we would not have only one unique maximum w-test statistics but actually would have fourmaximum w-test statistics In other words the equation systems associated with the measurements ofCluster 1 are linearly dependent [87] Therefore there is no reliability in terms of outlier identificationfor the Cluster 1 as can be seen in Figure 3(a)

From Figure 3(b) we can note that there is reliability in terms of outlier detection for Cluster 1and it is caused by overlapping w-test statistics The probability of statistics overlap Pol for Cluster 1in this scenario of minimally constrained network is displayed in Figure 14

Figure 14 Probability of correct detection PCD and statistical overlap (Pol) for the Cluster 1 subject toone hard constraint and for αprime = 0001

The problem of not having more connections (ie more measurements) for the unknown heightsA and D in the case of one hard constraint with G fixed is overcome when these heights (A and D) aretaken as hard constraints in 3(b) or when the heights A D and G are hard constraints in figure 3(c)Figure 3(a b) show that the measurements of Cluster 1 are now able to identify an outlier when twohard constraints (A and D fixed) are in play It is also verified to the case of three hard constraints (A Dand G fixes) in Figure 3(e f) ie there is reliability in terms of both outlier detection and identificationfor these measurements in those conditions

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

32 of 40

One can argue that the more constraints the larger the probability of correct identification (PCI)and correct detection (PCD) This claim is not entirely correct This may be true in case where all pointson the network have the same number of connections However in the example given if both heightsB and G were selected as hard constraints the measurements of the Cluster 1 (ie y1 y6 y3 and y4)would still have their least-squares residuals fully correlated (ie ρwy1 wy6

= 100 and ρwy3 wy4= 100)

and therefore there would be no reliability in terms of outlier identification for these measurementsFrom Table 5 we observe different behaviour for the clusters as follows

bull Cluster 1 there was no MIB for the case of having only one single hard constraint whereasMDB = MIB for the other cases However both MDB and MIB decrease significantly with theincrease in the number of hard constraints

bull Cluster 2 MDB slightly smaller than MIB Both MDB and MIB were practically the same for thecase of having two or three hard constraints

bull Cluster 3 MDB = MIB for all cases of hard constraints however both MDB and MIB decreasesignificantly with the increase in the number of hard constraints

bull Cluster 4 MDB e MIB were equal for all cases

In terms of outlier detection and identification therefore Cluster 1 was more sensitive toconstraints Cluster 3 relatively sensitive to constraints whereas the Cluster 4 completely insensitiveto constraints and Cluster 2 relatively insensitive to constraints This is also can be seen in Figure 4The reason for this is that the local redundancy (ri) of the Cluster 1 increased with the increase of thenumber of hard constraints whereas the Cluster 4 have remained the same as can be seen in Table 4

Leaving aside the cases of statistical overlap (Pol) the network presents low least-squares residualscorrelation (ρwi wj lt 05) and high local redundancy (ri gt 05) Because of this the probabilities ofwrong exclusion (PWE) were less than 1 as can be seen in Figure 5 The over-identification cases(Pover+ and Poverminus) were practically null Consequently PCI asymp PCD Due of this fact the family-wiseerror rate (αprime) should be increased in order to have more success rate in the outlier detection andidentification [45]

From Figure 15 we observe that increasing the αprime increases the both probabilities of correctidentification (PCI) and correct detection (PCD) for outlier magnitude from 5σ to 6σ in the case ofthree hard constraints and from 5σ to 68σ in the case of two hard constraints Although the rates ofPover+ and PWE also increase they are not significant when compared to the improvement of correctidentification (PCI) and detection (PCD) This same analysis can be done for the other clusters

In terms of soft constraints for the cases of two constraints in Figure 3(d) we observe from Table 9that the larger the relaxation of the constraint (ie the larger the standard-deviation of the constraintσc) the larger the residuals correlation (ρwi wj ) and the standard-deviation of the outlier σnablai and thesmaller the local redundancy (ri) Consequently the probabilities of correct identification (PCI) anddetection (PCD) get smaller and smaller with the relaxation of the constraints whereas the probabilityof wrong exclusion gets larger (PWE) This can be more clearly verified in Figure 6(a b) and Figure8(a) for Cluster 1 whose measurements are connected with the constraints A and D (ie y13 and y14 inTable 9 respectively)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

33 of 40

(a) (b)

(c) (d)

Figure 15 Probabilities of correct identification PCI (a) correct detection PCD (b) over-identificationpositive Pover+ (c) and wrong exclusion PWE for the Cluster 1 subject to two and three hard constraintsand for αprime = 0001 and αprime = 01

Note from Figure 8(a) that the probability of wrong exclusion (PWE) increases as the magnitude ofthe outlier (nablai) increases However this is true up to a certain limit of outlier magnitude The effect ofresiduals correlation ρwi wj on the rates of wrong exclusion (PWE) and correct identification (PCI) tendsto decrease with the increase in the magnitude of the outlier nablai This effect is more clearly verified forCluster 1 in case where the precision of the constraints are ten times worse than the measurementsσc = 10σ = 10mm

Note from Figure 6 that identifying an outlier in the Cluster 1 (ie y1 y3 y4 and y6) whenσc = 10mm is more difficult than the other clusters This is due to the fact the Cluster 1 has a higherresiduals correlation ρwi wj = 0994 than other clusters In general therefore we observe that the largerthe relaxation of the constraints the larger is the effect of the correlation ρwi wj on the success rate ofoutlier identification (PCI) Consequently the higher the sensitivity indicator for outlier identification(MIB) Table 10 reveals that the ratio between MIB and MDB for the Cluster 1 and for the scenariowhere the standard-deviations of that two soft constraints are σc = 10mm is MIBMDB=2575=33On the other hand the relationship between MIB and MDB is practically one (ie MIBMDB=10) forthe others scenarios

If the FWE rate (αprime) were increased for the case where the two soft constraints of σc = 10mmare in play we would not have great advantages for Cluster 1 due to its high residuals correlation(ρwi wj = 994) From Figure 16 we can observe that the probability of correct identification (PCI)for outlier magnitudes from 5σ to 8σ is effectively larger to a user-defined αprime = 01 than αprime = 0001However the success rate is still less than 80 ie PCI lt 08 Note for example the correct

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

34 of 40

identification rate is PCI = 56 for an outlier magnitude of nablai = 8σ and αprime = 01 For αprime = 01 theMIB = 335σ = 335mm whereas for αprime = 0001 is MIB = 25σ = 25mm Therefore in that case theMIB for PCI = 08(80) and αprime = 01 would be 34 larger than user-defined αprime = 0001

Figure 16 Probabilities of correct identification (PCI) for the Cluster 1 subject to two soft constraints (2sc) A and D for αprime = 0001 and αprime = 01

The soft constraints A and D were grouped in the Cluster 5 (ie A and D were treated aspseudo-observations in the model) There is no reliability in terms of outlier identification for theconstraints because the residual correlation between them is ρwi wj = 100 as can be seen in Table 9for y13 and y14 However these soft constraints are able to detect an outlier In that case the probabilityof correct detection PCD in Figure 7 is mainly caused by the statistical overlap Pol as can be seen in theexample of the case of σc = 10mm in Figure 17 From Table 10 we observe that the larger the relaxationof the constraints the larger the MDB Note that the values of MDB are given in σ and therefore theMDB for σc = 10mm is larger than σc = 1mm and σc = 01mm ie we had the following inequalityMDB = 68σc = 68times 10mm = 68mm gt MDB = 88σc = 88times 1mm = 88mm gt MDB = 57σc =

57times 01mm = 57mm In that case if the FWE (αprime) were increased the rate of outlier detection by theCluster 4 (ie by the soft constraints) would increase

Figure 17 Probabilities of correct detection PCD and statistical overlap Pol for the two soft constraintsA and D (Cluster 5) with σc = 10mm and for αprime = 0001

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

35 of 40

Similar effects of the relaxation of the constraints on the performance of the IDS in case of two softconstraints are verified in case of three soft constraints as can be seen in the Figures 10 11 12 and 13

In case of having three soft constraints in Figure 3(e) there is reliability in terms of outlieridentification for the three pseudo-observations y13 y14 and y15 (ie for AD and G) as can be seen inFigure 10 and Table 15 In that case we also observe that the probabilities of correct detection PCD ofthe soft constraints A and D (ie Cluster 5) were approximately 13 for σc = 10mm 16 for σc = 1mmand 24 for σc = 01mm larger than the scenario of the network subject to two soft constraints Table15 reveals that the advantage of having three soft constraints instead of two constraints is that theconstraints become identifiable in the presence of an outlier The behaviour of the probabilities ofcorrect detection (PCD) correct identification (PCI) and wrong exclusion (PWE) was similar to the caseof the two soft constraints

Furthermore the larger the relaxation of the constraints the smaller the residuals correlationbetween the measurements and the soft constraints and the larger the residuals correlation amongthe measurements This effect can be verified in Tables 6 7 and 8 for two soft constraints as well as inTables 11 12 and 13 for three soft constraints

In general we observe that the case of two soft constraints for σc = 01mm was comparablewith two hard constraints (see eg Table 5 and Table 10) in terms of the probability levels associatedwith IDS for the measurements (ie clusters 1 2 3 and 4) In the same way for the case of twosoft constraints with σc = 1mm or σc = 10mm the probabilities levels were similar to the one hardconstraint for that measurements with the benefit of two soft constraints having reliability in terms ofoutlier identification for the Cluster 1 Finally the three soft constraints with σc = 1mm and σc = 10mmwere comparable to the two soft constraints for that scenario of constraints relaxation wheres thethree soft constraints for σc = 01mm showed similar outcomes with three hard constraints for themeasurements (see eg Table 5 and Table 15) In that case however an advantage of the three softconstraints on the three hard constraints is the possibility of analysing the sensitivity of the constraintsIt is important to emphasised that the stochastic models of the measurements and constraints wereassumed well-known and defined for the analyzes performed here

6 Conclusions

In general the probability of correct identification (PCI) for the case of hard constraints is largerthan soft constraints It can be verified in Figure 4 Figure 6 and Figure 10 Therefore hard constraintsshould be used in the stage of pre-processing data for the purpose of identifying and removing possibleoutlying measurements In that process one should opt to set out the redundant hard constraints atpoints in the network where the smallest connections exist After identifying and removing possibleoutliers the soft constraints should be employed to propagate the uncertainties of the constraints(pseudobservations) to the model parameters during the process of least-squares estimation Thisrecommendation is valid for outlier detection and identification purpose

We highlight the main findings of this research as follows

bull Under a system of a high local redundancy ri gt 05 and low residuals correlation (ρwi wj lt 05) ifone increase the family-wise error rate (FWE) of the test statistic the performance of the procedurewill be improved for both scenarios of hard constraints and soft constraintsbull The larger the relaxation of the constraints the larger is the effect of the residuals correlation

(ρwi wj ) on the success rate of outlier identification (PCI) Consequently the higher the sensitivityindicator for outlier identification (MIB) and therefore more difficult it becomes to identify anoutlier

bull Under a scenario of soft constraints one should set out at least three soft constraints in order toidentify an outlier in the constraints

In other types of analysis for example deformation analysis of geodetic networks one shouldformulated the constraints as pseudo-observations with σc gt 0 in an adjustment model of system

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

36 of 40

equations In other words the relaxation of constraints makes it possible to estimate deformationeffects that are unmodelled in the deterministic model matrix for example deformations which have aspatial and temporal variations In that case the deformation can be modelled by covariance functionsthat determine the cofactors between constraints [28] This should be applied to control points locatedin the structure but not to control points located outside the structure The control points locatedoutside the structure should be hard constraints and have their stability verified by an independentprocess in order to analyse the deformation of the structure So one will be also able to analyse thesensitivity in terms of minimal detectable deformations andor minimal identifiable deformations [23]Future researches will be addressed to sensitivity analysis for deformation monitoring networks

Author Contributions Conceptualization VFR MTM and IK Data curation VFR MTM and IK Formalanalysis VFR and IK Funding acquisition MTM MRV and LGdSJ Investigation VFR MethodologyVFR and MTM Project administration MTM IK MRV and LGdSJ Software VFR and LGdSJSupervision MTM IK and MRV Validation VFR and MRV Writing ndash original draft VFR Writing ndashreview amp editing VFR MTM IK and MRV All authors have read and agreed to the published version of themanuscript

Funding This research was funded by the CNPqmdashConselho Nacional de Desenvolvimento Cientiacutefico eTecnoloacutegicomdashBrasil (proc no 1035872019-5) This research and the APC were also funded by PETROBRAS(Grant Number 201800545-0)

Acknowledgments In this section you can acknowledge any support given which is not covered by the authorcontribution or funding sections This may include administrative and technical support or donations in kind(eg materials used for experiments)

Conflicts of Interest The authors declare no conflict of interest

References

1 Lehmann R Neitzel F Testing the compatibility of constraints for parameters of a geodetic adjustmentmodel Journal of Geodesy 2013 87 555ndash566 doi101007s00190-013-0627-2

2 Fang X Weighted total least-squares with constraints a universal formula for geodetic symmetricaltransformations Journal of Geodesy 2015 89 459ndash469 doi101007s00190-015-0790-8

3 Amiri-Simkooei AR Weighted Total Least Squares with Singular Covariance Matrices Subject to Weightedand Hard Constraints J Surv Eng 2017 143 04017018

4 Sakic P Ballu V Royer JY A Multi-Observation Least-Squares Inversion for GNSS-Acoustic SeafloorPositioning Remote Sensing 2020 12 doi103390rs12030448

5 Odijk D Stochastic modelling of the ionosphere for fast GPS ambiguity resolution Geodesy Beyond 2000Schwarz KP Ed Springer Berlin Heidelberg Berlin Heidelberg 2000 pp 387ndash392

6 de Oliveira PS Morel L Fund F Legros R Monico JFG Durand S Durand F Modelingtropospheric wet delays with dense and sparse network configurations for PPP-RTK GPS Solutions 201721 237ndash250 doi101007s10291-016-0518-0

7 Chen B Wu L Dai W Luo X Xu Y A new parameterized approach for ionospheric tomography GPSSolutions 2019 23 96 doi101007s10291-019-0893-4

8 Ahmed E Ahmed S Mohamed D Mostafa R jag 2019 Vol 13 chapter Vertical ionospheric delayestimation for single-receiver operation p 81 2 doi101515jag-2018-0041

9 Wang A Chen J Zhang Y Meng L Wang J Performance of Selected Ionospheric Models inMulti-Global Navigation Satellite System Single-Frequency Positioning over China Remote Sensing 201911 doi103390rs11172070

10 Deo M El-Mowafy A Comparison of advanced troposphere models for aiding reductionof PPP convergence time in Australia Journal of Spatial Science 2019 64 381ndash403doi1010801449859620181472046

11 Dai X Lou Y Dai Z Hu C Peng Y Qiao J Shi C Precise Orbit Determination for GNSS ManeuveringSatellite with the Constraint of a Predicted Clock Remote Sensing 2019 11 doi103390rs11161949

12 Courant R Hilbert D Methods of mathematical physics Vol 2 Vol 2 Wiley [New York] 196213 Grafarend EW Optimization of Geodetic Networks The Canadian Surveyor 1974 28 716ndash723

doi101139tcs-1974-0120

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

37 of 40

14 Teunissen P Zero Order Design Generalized Inverses Adjustment the Datum Problem andS-Transformations Optimization and Design of Geodetic Networks Grafarend EW Sansograve F EdsSpringer Berlin Heidelberg Berlin Heidelberg 1985 pp 11ndash55

15 Dermanis A Free network solutions with the DLT method ISPRS Journal of Photogrammetry and RemoteSensing 1994 49 2 ndash 12 doihttpsdoiorg1010160924-2716(94)90061-2

16 Kotsakis C Reference frame stability and nonlinear distortion in minimum-constrained networkadjustment Journal of Geodesy 2012 86 755ndash774 doi101007s00190-012-0555-6

17 Kotsakis C Datum Definition and Minimal Constraints In Encyclopedia of Geodesy Grafarend E EdSpringer International Publishing Cham 2018 pp 1ndash6 doi101007978-3-319-02370-0_157-1

18 Matsuoka MT Rofatto VF Klein I Roberto Veronez M da Silveira LG Neto JBS Alves ACRControl Points Selection Based on Maximum External Reliability for Designing Geodetic Networks AppliedSciences 2020 10 doi103390app10020687

19 Baarda W S-transformations and criterion matrices Publ on geodesy New Series 1973 520 Schaffrin B Aspects of Network Design Optimization and Design of Geodetic Networks Grafarend

EW Sansograve F Eds Springer Berlin Heidelberg Berlin Heidelberg 1985 pp 548ndash59721 Xu PB A general solution in geodetic nonlinear rank-defect models 199722 Altamimi Z Dermanis A The Choice of Reference System in ITRF Formulation VII Hotine-Marussi

Symposium on Mathematical Geodesy Sneeuw N Novaacutek P Crespi M Sansograve F Eds Springer BerlinHeidelberg Berlin Heidelberg 2012 pp 329ndash334

23 Velsink H On the deformation analysis of point fields J Geod 2015 89 1071ndash1087doi101007s00190-015-0835-z

24 Velsink H Extendable linearised adjustment model for deformation analysis Survey Review 201547 397ndash410 doi1011791752270614Y0000000140

25 Hiddo V jag 2016 Vol 10 chapter Time Series Analysis of 3D Coordinates Using NonstochasticObservations p 5 1 doi101515jag-2015-0027

26 Rao CR Markoffrsquos Theorem with Linear Restrictions on Parameters SankhyAuml The Indian Journal ofStatistics (1933-1960) 1945 7 16ndash19

27 Velsink H Testing Methods for Adjustment Models with Constraints Journal of Surveying Engineering2018 144 04018009 doi101061(ASCE)SU1943-54280000260

28 Hiddo V Testing deformation hypotheses by constraints on a time series of geodetic observations J ApplGeod 2017 12 77ndash93 doi101515jag-2017-0028

29 Baarda W A testing procedure for use in geodetic networks Publ on geodesy New Series 1968 230 Teunissen P First and second moments of non-linear least-squares estimators Bull Geodesique (Journal of

Geodesy) 1989 63 253ndash262 doihttpsdoiorg101007BF0252047531 Teunissen P Testing Theory an introduction 2 ed Delft University Press 200632 Amiri-Simkooei A Least-squares variance component estimation theory and GPS applications Phd

thesis Delft University of Technology 200733 Koch KR Parameter estimation and hypothesis testing in linear models 2 ed Springer 199934 Tao H Feng H Xu L Miao M Long H Yue J Li Z Yang G Yang X Fan L Estimation

of Crop Growth Parameters Using UAV-Based Hyperspectral Remote Sensing Data Sensors 2020 20doi103390s20051296

35 Han M Wang Q Wen Y He M He X The Application of Robust Least Squares Method inFrequency Lock Loop Fusion for Global Navigation Satellite System Receivers Sensors 2020 20doi103390s20041224

36 Wei C Estimation for the Discretely Observed CoxacircldquoIngersollacircldquoRoss Model Driven by Small SymmetricalStable Noises Symmetry 2020 12 doi103390sym12030327

37 Sakic P Ballu V Royer JY A Multi-Observation Least-Squares Inversion for GNSS-Acoustic SeafloorPositioning Remote Sensing 2020 12 doi103390rs12030448

38 Livadiotis G General Fitting Methods Based on Lq Norms and their Optimization Stats 2020 3 16ndash31doi103390stats3010002

39 Farooq SZ Yang D Ada ENJ A Cycle Slip Detection Framework for Reliable Single Frequency RTKPositioning Sensors 2020 20 doi103390s20010304

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

38 of 40

40 Araveeporn A Comparing Parameter Estimation of Random Coefficient Autoregressive Model byFrequentist Method Mathematics 2020 8 doi103390math8010062

41 Zhang C Peng T Zhou J Ji J Wang X An Improved Autoencoder and Partial Least SquaresRegression-Based Extreme Learning Machine Model for Pump Turbine Characteristics Applied Sciences2019 9 doi103390app9193987

42 Ji J Yang M Jiang L He J Teng Z Liu Y Song H Output-Only Parameters Identification ofEarthquake-Excited Building Structures with Least Squares and Input Modification Process AppliedSciences 2019 9 doi103390app9040696

43 Kargoll B On the theory and application of model misspecification tests in geodesy Doctoral thesisUniversity of Bonn Landwirtschaftliche Fakultaumlt German Bonn 2007

44 Lehmann R On the formulation of the alternative hypothesis for geodetic outlier detection J Geod 201387 373ndash386 doi101007s00190-012-0607-y

45 Rofatto VF Matsuoka MT Klein I Roberto Veronez M da Silveira LG A Monte Carlo-Based OutlierDiagnosis Method for Sensitivity Analysis Remote Sensing 2020 12 doi103390rs12050860

46 Lehmann R Improved critical values for extreme normalized and studentized residuals in GaussndashMarkovmodels J Geod 2012 86 1137ndash1146 doi101007s00190-012-0569-0

47 Rofatto VF Matsuoka MT Klein I Veronez MR Bonimani ML Lehmann R A half-century ofBaardarsquos concept of reliability a review new perspectives and applications Surv Rev 2018 0 1ndash17[httpsdoiorg1010800039626520181548118] doi1010800039626520181548118

48 Zaminpardaz S Teunissen P DIA-datasnooping and identifiability J Geod 2019 93 85ndash101doi101007s00190-018-1141-3

49 Kok JJ States U On data snooping and multiple outlier testing [microform] Johan J Kok US Dept ofCommerce National Oceanic and Atmospheric Administration National Ocean Service Charting andGeodetic Services For sale by the National Geodetic Information Center NOAA Rockville Md 1984 p61 p

50 Knight NL Wang J Rizos C Generalised measures of reliability for multiple outliers Journal of Geodesy2010 84 625ndash635 doi101007s00190-010-0392-4

51 Gui Q Li X Gong Y Li B Li G A Bayesian unmasking method for locating multiplegross errors based on posterior probabilities of classification variables J Geod 2011 85 191ndash203doi101007s00190-010-0429-8

52 Klein I Matsuoka MT Guzatto MP Nievinski FG An approach to identify multipleoutliers based on sequential likelihood ratio tests Surv Rev 2017 49 449ndash457[httpsdoiorg1010800039626520161212970] doi1010800039626520161212970

53 Hawkins DM Identification of Outliers 1 ed Springer Netherlands 1980 httpsdoiorg101007978-94-015-3994-4

54 Baarda W Statistical concepts in geodesy Publ on geodesy New Series 1967 255 Imparato D Teunissen P Tiberius C Minimal Detectable and Identifiable Biases for quality

control Surv Rev 2019 51 289ndash299 [httpsdoiorg1010800039626520181437947]doi1010800039626520181437947

56 Teunissen PJG Distributional theory for the DIA method Journal of Geodesy 2018 92 59ndash80doi101007s00190-017-1045-7

57 Foumlrstner W Reliability and discernability of extended Gauss-Markov models Seminar on mathematicalmodels of GeodeticPhotogrammetric Point Determination with Regard to Outliers and Systematic Errors 1983 Vol Series A pp 79ndash103

58 Wang J Knight NL New Outlier Separability Test and Its Application in GNSS Positioning 201259 Lehmann R Loumlsler M Multiple Outlier Detection Hypothesis Tests versus Model Selection by

Information Criteria J Surv Eng 2016 142 0401601760 Teunissen PJG Imparato D Tiberius CCJM Does RAIM with Correct Exclusion Produce Unbiased

Positions Sensors 2017 17 doi103390s1707150861 Lehmann R Loumlsler M Congruence analysis of geodetic networks ndash hypothesis tests versus model

selection by information criteria J Appl Geod 2017 11 271ndash283 doi101515jag-2016-0049

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

39 of 40

62 Rofatto V Matsuoka M Klein I DESIGN OF GEODETIC NETWORKS BASED ON OUTLIERIDENTIFICATION CRITERIA AN EXAMPLE APPLIED TO THE LEVELING NETWORK Bull Geod Sci2018 24 152ndash170 httpdxdoiorg101590s1982-21702018000200011

63 Nguyen VK Renault Milocco R Environment Monitoring for Anomaly Detection System UsingSmartphones Sensors 2019 19 doi103390s19183834

64 Nie Y Yang L Shen Y Specific Direction-Based Outlier Detection Approach for GNSS Vector NetworksSensors 2019 19 doi103390s19081836

65 Klein I Matsuoka MT Guzatto MP Nievinski FG Veronez MR Rofatto VF A newrelationship between the quality criteria for geodetic networks Journal of Geodesy 2019 93 529ndash544doi101007s00190-018-1181-8

66 Mierlo JV Statistical Analysis of Geodetic Measurements for the Investigation of Crustal Movements InRecent Crustal Movements 1977 Whitten C Green R Meade B Eds Elsevier 1979 Vol 13 Developmentsin Geotectonics pp 457 ndash 467 doihttpsdoiorg101016B978-0-444-41783-150072-6

67 van der Marel H Roumlsters AJM Statistical Testing and Quality Analysis in 3-D Networks (part II)Application to GPS Global Positioning System An Overview Bock Y Leppard N Eds Springer NewYork New York NY 1990 pp 290ndash297

68 Yang L Wang J Knight NL Shen Y Outlier separability analysis with a multiple alternative hypothesestest J Geod 2013 87 591ndash604 doi101007s00190-013-0629-0

69 Proacuteszynski W Revisiting Baardarsquos concept of minimal detectable bias with regard to outlier identifiabilityJ Geod 2015 89 993ndash1003 doi101007s00190-015-0828-y

70 Romano JP Wolf M Multiple Testing of One-Sided Hypotheses Combining Bonferroni and the BootstrapPredictive Econometrics and Big Data Kreinovich V Sriboonchitta S Chakpitak N Eds SpringerInternational Publishing Cham 2018 pp 78ndash94

71 Teunissen P An Integrity and Quality Control Procedure for use in Multi Sensor Integration 199072 Aydin C Demirel H Computation of Baardarsquos lower bound of the non-centrality parameter J Geod

2004 78 437ndash441 doi101007s00190-004-0406-173 Aydin C Power of Global Test in Deformation Analysis J Surv Eng 2012 138 51ndash56 https

doiorg101061(ASCE)SU1943-5428000006474 Bonferroni C Teoria statistica delle classi e calcolo delle probabilitagrave Pubblicazioni del R Istituto superiore di

scienze economiche e commerciali di Firenze Libreria internazionale Seeber 193675 Box GEP Muller ME A Note on the Generation of Random Normal Deviates The Annals of Mathematical

Statistics 1958 29 610ndash611 httpsdoi101214aoms117770664576 Lemeshko BY Lemeshko SB Extending the Application of Grubbs-Type Tests in Rejecting Anomalous

Measurements Measurement Techniques 2005 48 536ndash547 doi101007s11018-005-0179-977 Lehmann R Scheffler T Monte Carlo based data snooping with application to a geodetic network J

Appl Geod 2011 5 123ndash134 httpsdoiorg101515JAG201101478 Matsumoto M Nishimura T Mersenne twister A 623-dimensionally equidistributed uniform

pseudo-random number generator ACM Transactions on Modeling and Computer Simulation 1998 8 3ndash30httpsdlacmorgcitationcfmid=272991

79 Ihde J Saacutenchez L Barzaghi R Drewes H Foerste C Gruber T Liebsch G Marti U Pail R SiderisM Definition and Proposed Realization of the International Height Reference System (IHRS) Surveys inGeophysics 2017 38 549ndash570 doi101007s10712-017-9409-3

80 Lehmann R Observation error model selection by information criteria vs normality testing StudiaGeophysica et Geodaetica 2015 59 489ndash504 doi101007s11200-015-0725-0

81 Algarni DA Ali AE Heighting and Distance Accuracy with Electronic Digital Levels Journal of King SaudUniversity - Engineering Sciences 1998 10 229 ndash 239 doihttpsdoiorg101016S1018-3639(18)30698-6

82 Gemin ARS Matos AS Faggion PLAs APPLICATION OF CALIBRATION CERTIFICATE OFDIGITAL LEVELING SYSTEMS IN THE MONITORING OF STRUCTURES A CASE STUDY AT THEGOVERNADOR JOSAtilde RICHA HYDROELECTRIC POWER PLANT - PR Boletim de CiAtildeGeodAtilde 201824 235 ndash 249

83 Takalo M Rouhiainen P Development of a System Calibration Comparator for Digital Levels in FinlandNordic Journal of Surveying and Real Estate Research 1 1

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

40 of 40

84 Wiedemann W Wagner A Wunderlich T Using IATS to Read and Analyze Digital Levelling Staffs SIG2016 Paar R Marendic A Zrinjski M Eds Croatian Geodetic Society 2016 pp 515ndash526

85 PJG T The Geometry of Geodetic Inverse Linear Mapping and Non-linear Adjustment Publications onGeodesy New Series 1985 8

86 Rao CR Mitra SK Generalized inverse of a matrix and its applications 197287 Hekimoglu S Erenoglu RC Sanli DU Erdogan B Detecting Configuration Weaknesses in Geodetic

Networks Survey Review 2011 43 713ndash730 doi101179003962611X13117748892632

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

Page 16: Effects of Hard and Soft Equality Constraints on

16 of 40

y1 + e1 = hB minus hA

y2 + e2 = hC minus hB

y3 + e3 = hD minus hC

y7 + e7 = hB minus hG

y8 + e8 = hC minus hG

y11 + e11 = hB minus hF

y12 + e12 = hC minus hE

(49)

The functional model (A) for the system of equations in 49 is given by

A =

minus1 1 0 0 0 0 00 minus1 1 0 0 0 00 0 minus1 1 0 0 00 0 0 minus1 1 0 00 0 0 0 minus1 1 01 0 0 0 0 minus1 00 1 0 0 0 0 minus10 0 1 0 0 0 minus10 0 0 0 1 0 minus10 0 0 0 0 1 minus10 1 0 0 0 minus1 00 0 1 0 minus1 0 0

(50)

Note that the rank defect of the matrix A is uminus rank(A) = 7minus 6 = 1 In that case there is neededto add at least one constraint in order to avoid rank deficiency of the matrix A This is guaranteedwhen we take one height as known For example from the network in Figure 3(a) we have added theheight G as known (ie as a hard constraint) In that case the constraint equation should be addedinto the system in 49 ie

y13 = hG with σy13 = 0 (51)

noticing that because the standard deviation is zero the observation is non-stochastic (hard constraint)and the residual ey13 = 0 This can generate problems in the inversion of the covariance matrix of theobservations Qe for the calculation of the weight matrix W because the weight for that constraintwould be 1

0 = infin In order to avoid that problem we have eliminated the rank deficiency of matrixA by removing the seventh column of matrix A in 50 associated with the height G Now we haveuminus rank(A) = 6minus 6 = 0

The constraint defines the geodetic datum ie the S-system ([85]p41) Another approach tosolving the system of equations in 2 could be based on generalised (pseudo) inverses (see eg [86])

The location of the constraints can be chosen in some circumstances for example during thedesign stage of a geodetic network For the special case of having a minimally constrained systemthe location of the constraint will not influence the w-test statistics and the sensitivity indicators (MIBand MDB) [18] However more constraints than the minimum necessary to have a solution (ie extraconstraints or redundant constraints) can change the least-squares residuals and hence w-test statisticsand the minimal biases

From the network with one extra constraint (2 constraints) in Figure 3(b) for example the bothfirst (height A) and fourth column (height D) of matrix A in 50 were eliminated in the case of having

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

17 of 40

the two heights as hard constraints For the case where these two heights (A and D) were taken as softconstraints however two observation equations were added to Equation 49 ie

y13 + e13 = hA σy13 gt 0

y14 + e14 = hD σy14 gt 0(52)

In the case of soft constraints in Equation 52 two lines were added in matrix A In other wordsA and D were taken as pseudo-observations In that case the rank deficiency was also null (ieuminus rank(A) = 7minus 7 = 0) the redundancy of the system was r = nminus rank(A) = nminus u = 7 and thematrix A was given as follows

A =

minus1 1 0 0 0 0 00 minus1 1 0 0 0 00 0 minus1 1 0 0 00 0 0 minus1 1 0 00 0 0 0 minus1 1 01 0 0 0 0 minus1 00 1 0 0 0 0 minus10 0 1 0 0 0 minus10 0 0 0 1 0 minus10 0 0 0 0 1 minus10 1 0 0 0 minus1 00 0 1 0 minus1 0 01 0 0 0 0 0 00 0 0 1 0 0 0

(53)

For this example of two soft constraints and by considering the both soft constraints withstandard-deviation σc = 10mm the symmetric and positive semi-definite covariance matrix of theobservations (Qe) was given as follows

Qe =

1 0 0 middot middot middot 0 00 1 0 middot middot middot 0 00 0 1 middot middot middot 0 0

0 0 0 middot middot middot 100 00 0 0 middot middot middot 0 100

(54)

The last two rows and columns of the matrix Qe in Equation 54 refer to the variances (σc2 =

(10mm)2 = 100mm2) of the heights constraints A and D respectivelySimilarly matrices A and Qe were constructed for the other cases studied hereAlthough other measurements are able to identify an outlier for the case of having only one single

soft constraint the pseudo-observation (constraint) is not In that case the defect configurationis associated with the additional parameter in the constraint (ie the presence of an outlier inthe constraint) In other words an additional parameter on the soft constraint will not estimableFor example if the height point G was taken as a soft constraint the presence of an outlier inpseudo-observation G would lead to rank deficiency of matrix A ie u minus rank(A) = 8 minus 7 = 1Therefore the case of having only one single soft constraint was not considered here

42 Result of the Hard Constraint Effects on the Iterative Outlier Elimination Procedure

In this section we present the results of the effects of the hard constraint on the IDS The scenariosin Figure 3(a) (network minimally constrained) Figure 3(b) (two hard constraints) and Figure 3(c)(three hard constraints) were considered for the analysis

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

18 of 40

The correlation coefficient between w-test statistics (ρwi wj ) were computed for the three scenariosaccording to Equation(20) Table 1 provides that correlation for that network minimally constrainedTable 2 for that network over-constrained with two hard constraints and Table 3 for that networkover-constrained with three hard constraints

Table 1 Correlation matrix of w-test statistics for the levelling network minimally constrained in Figure3(a)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12

wy1 10 02 01 01 02 10 -02 00 00 02 -04 -01wy2 10 02 02 03 02 05 -05 -02 02 03 -03wy3 10 10 02 01 00 02 -02 00 01 04wy4 10 02 01 00 02 -02 00 01 04wy5 10 02 -02 02 05 -05 03 -03wy6 10 -02 00 00 02 -04 -01wy7 10 -03 -03 -04 -04 -01wy8 10 -04 -03 -01 -04wy9 10 -03 01 04wy10 10 04 01wy11 10 -01wy12 100

Table 2 Correlation matrix of w-test statistics for the levelling network with two hard constraints inFigure 3(b)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12

wy1 10 04 04 -03 -01 04 -03 01 01 01 -04 -01wy2 10 04 -01 01 -01 04 -04 -01 01 03 -03wy3 10 04 -01 -03 -01 03 -01 -01 01 04wy4 10 04 04 01 01 -03 01 01 04wy5 10 04 -01 01 04 -04 03 -03wy6 10 -01 -01 -01 03 -04 -01wy7 10 -04 -03 -04 -04 -01wy8 10 -04 -03 -01 -04wy9 10 -04 01 04wy10 10 04 01wy11 10 -01wy12 10

Table 3 Correlation matrix of w-test statistics for the levelling network with three hard constraints inFigure 3(c)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12

wy1 10 03 01 -01 -01 01 -04 -01 -01 -01 -03 -01wy2 10 03 -01 01 -01 03 -03 -01 01 03 -03wy3 10 01 -01 -01 01 04 01 01 01 03wy4 10 03 01 -01 -01 -04 -01 01 03wy5 10 03 -01 01 03 -03 03 -03wy6 10 01 01 01 04 -03 -01wy7 10 -01 -01 -01 -03 -01wy8 10 -01 -01 -01 -03wy9 10 -01 01 03wy10 10 03 01wy11 10 -01wy12 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

19 of 40

Table 4 gives the local redundancy (ri) the standard-deviation of the LS-estimated outlier σnablai andthe maximum absolute correlation (maxρwi wj

) for each scenario of hard constraint set out of in thisstudy ie Figure 3(ab and c)

Table 4 Local redundancy (ri) standard-deviation of the LS-estimated outlier σnablai and the maximumabsolute correlation (maxρwi wj

) for each scenario of hard constraint

1 hard constraint 2 hard constraints 3 hard constraintsMeasurement ri σnablai maxρwi wj

ri σnablai maxρwi wjri σnablai maxρwi wj

y1 0396 1589 100 0583 1309 036 0708 1188 041y2 0500 1414 047 0583 1309 036 0583 1309 032y3 0396 1589 100 0583 1309 036 0708 1188 041y4 0396 1589 100 0583 1309 036 0708 1188 041y5 0500 1414 047 0583 1309 036 0583 1309 032y6 0396 1589 100 0583 1309 036 0708 1188 041y7 0563 1333 047 0583 1309 036 0708 1188 041y8 0563 1333 047 0583 1309 036 0708 1188 041y9 0563 1333 047 0583 1309 036 0708 1188 041y10 0563 1333 047 0583 1309 036 0708 1188 041y11 0583 1309 043 0583 1309 036 0583 1309 032y12 0583 1309 043 0583 1309 036 0583 1309 032

Here the twelve levelling measurements were clustered into four clusters The clustering wasdefined according to the local redundancy (ri) and the maximum absolute correlation (maxρwi wj

) inTable 4 This is the first time that a clustering technique based on the similarity of local redundancyand the maximum absolute correlation between w-test statistics is applied to a problem of geodeticnetworks Similarly this has been done in [45] The four cluster were defined as follows

bull Cluster 1 y1 y3 y4 and y6bull Cluster 2 y2 and y5bull Cluster 3 y7 y8 y9 and y10bull Cluster 4 y11 and y12

The probability levels associated with IDS were averaged for each of these clusters Here we computethe probability levels of the IDS based on the procedure in Section (3) [45] The critical values werek = 389 k = 393 and k = 393 for 1 hard constraint 2 hard constraints and 3 hard constraintsrespectively These critical values were found for αprime = 0001 according to the procedure described inAppendix The probability of correct identification (PCI) and correct detection (PCD = 1minusPMD)are displayed in Figure 4 for each number of hard constraint (denoted by hc)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

20 of 40

(c) (d)

(e) (f)

(g) (h)

(a) (b)

Figure 4 Probability of correct identification (PCI) and Probability of correct detection (PCD) for thecase of hard constraints and for αprime = 0001 Cluster 1(ab) Cluster 2(cd) Cluster 3(ef) and Cluster4(gh)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

21 of 40

The outlier magnitude were defined from |5σ| to |9σ| The outlier of |5σ| was chosen because itis approximately the lowest MDB0(i) of the network when a single hypothesis testing is in play ascan be seen in Equation (14) or Equation (16) That MDB0(i) of |5σ| was computed for significancelevel of αprime = 0001 and a power of the test γ0 = 08 This strategy reduces the search space for a MIB(Minimal Identifiable Bias) because we will always have the following inequality MIB ge MDB0(i)[4755] Remember that the IDS procedure is an example of multiple hypothesis testing (see 2)

The sensitivity indicators MDB and MIB for IDS were also computed according to Equation (33)and (34) respectively The success rate for outlier detection and outlier identification were taken asbeing PCD = PCI = 08 respectively Table 5 provides the values of MDB and MIB for that case ofhard constraints

Table 5 MDB and MIB for the case of hard constraints based on αprime = 0001 and PCD = PCI = 08

1 hard constraint 2 hard constraints 3 hard constraintsCluster MDB (σ) MIB (σ) MDB (σ) MIB (σ) MDB (σ) MIB (σ)

1 75 - 63 63 57 572 67 68 63 64 63 643 64 64 63 63 58 584 64 64 64 64 64 64

Figure 5 shows the probability of wrong exclusion (PWE) The over-identification cases (Pover+

and Poverminus) were smaller than 0001 (ie they were practically null) There were not statistical overlap(Pol) for clusters 2 3 and 4 We will discuss more about statistical overlap (Pol) later

(c) (d)

(a) (b)

Figure 5 Probability of wrong exclusion (PWE) for the case of hard constraints and for αprime = 0001Cluster 1(a) Cluster 2(b) Cluster 3(c) and Cluster 4(d)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

22 of 40

43 Result of the Soft Constraint Effects on the Iterative Outlier Elimination Procedure

The both configuration in Figure 3(d) and Figure 3(e) were analysed in terms of soft constraintsIn that case the critical values were k = 395 k = 395 and k = 392 for two soft constraints withσc = 01mm σc = 1mm and σc = 10mm respectively In the case of three soft constraints the criticalvalues found were k = 399 k = 399 and k = 396 for σc = 01mm σc = 1mm and σc = 10mmrespectively All these critical values were computed for αprime = 0001

The correlation coefficient between w-test statistics (ρwi wj ) are displayed in Table 6 Table 7and Table 8 for two soft constraints with standard-deviation 10 times larger than measurements(ie σc = 10times σ = 10mm) 10 times better than measurements (ie σc = 01mm) and equal to themeasurements (σc = 1mm) respectively

Table 6 Correlation matrix of w-test statistics for the levelling network with two soft constraints withσc = 10mm in Figure 3(d)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14

wy1 10 02 01 01 02 10 -02 00 00 02 -04 -01 006 -01wy2 10 02 02 03 02 05 -05 -02 02 03 -03 003 00wy3 10 10 02 01 00 02 -02 00 01 04 01 -01wy4 10 02 01 00 02 -02 00 01 04 -01 01wy5 10 02 -02 02 05 -05 03 -03 00 00wy6 10 -02 00 00 02 -04 -01 -01 01wy7 10 -03 -03 -04 -04 -01 00 00wy8 10 -04 -03 -01 -04 00 00wy9 10 -03 01 04 00 00wy10 10 04 01 00 00wy11 10 -01 00 00wy12 10 00 00wy13 10 -10wy14 10

Table 7 Correlation matrix of w-test statistics for the levelling network with two soft constraints withσc = 01mm in Figure 3(d)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14

wy1 10 04 04 -03 -01 04 -03 01 01 01 -04 -01 06 -06wy2 10 04 -01 02 -01 04 -04 -01 01 03 -03 04 -04wy3 10 04 -01 -03 -01 03 -01 -01 01 04 06 -06wy4 10 04 04 01 01 -03 01 01 04 -06 06wy5 10 04 -01 01 04 -04 03 -03 -04 04wy6 10 -01 -01 -01 03 -04 -01 -06 06wy7 10 -04 -03 -04 -04 -01 -02 02wy8 10 -04 -03 -01 -04 02 -02wy9 10 -04 01 04 02 -02wy10 10 04 01 -02 02wy11 10 -01 00 00wy12 10 00 00wy13 10 -10wy14 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

23 of 40

Table 8 Correlation matrix of w-test statistics for the levelling network with two soft constraints withσc = 1mm in Figure 3(d)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14

wy1 10 03 02 -01 01 07 -03 00 01 02 -04 -01 04 -04wy2 10 03 01 03 01 04 -04 -01 01 03 -03 03 -03wy3 10 07 01 -01 00 03 -02 -01 01 04 04 -04wy4 10 03 02 01 02 -03 00 01 04 -04 04wy5 10 03 -01 01 04 -04 03 -03 -03 03wy6 10 -02 -01 00 03 -04 -01 -04 04wy7 10 -03 -03 -04 -04 -01 -01 01wy8 10 -04 -03 -01 -04 01 -01wy9 10 -03 01 04 01 -01wy10 10 04 01 -01 01wy11 10 -01 00 00wy12 10 00 00wy13 10 -10wy14 10

Table 9 gives the local redundancy (ri) the standard-deviation of the LS-estimated outlier σnablai andthe maximum absolute correlation (maxρwi wj

) for the scenarios of two constraints

Table 9 Local redundancy (ri) standard-deviation of the LS-estimated outlier σnablai (mm) and themaximum absolute correlation (maxρwi wj

) for each scenario of two soft constraints

σc = 01mm σc = 1mm σc = 10mmMeasurement ri σnablai maxρwi wj

ri σnablai maxρwi wjri σnablai maxρwi wj

y1 0581 1312 0564 0471 1457 0681 0397 1587 0994y2 0582 1311 0376 0533 1369 0423 0501 1413 0471y3 0581 1312 0564 0471 1457 0681 0397 1587 0994y4 0581 1312 0564 0471 1457 0681 0397 1587 0994y5 0582 1311 0376 0533 1369 0423 0501 1413 0471y6 0581 1312 0564 0471 1457 0681 0397 1587 0994y7 0583 1310 0359 0571 1324 0423 0563 1333 0471y8 0583 1310 0359 0571 1324 0423 0563 1333 0471y9 0583 1310 0359 0571 1324 0423 0563 1333 0471y10 0583 1310 0359 0571 1324 0423 0563 1333 0471y11 0583 1309 0358 0583 1309 0398 0583 1309 0433y12 0583 1309 0358 0583 1309 0398 0583 1309 0433y13 0007 1163 1000 0300 1826 1000 0497 14189 1000y14 0007 1163 1000 0300 1826 1000 0497 14189 1000

From Table 9 five clusters were defined for each case of two soft constraints ie for the casewhere heights A and D were given as soft constraints in Figure 3(d) as follows

bull Cluster 1 y1 y3 y4 and y6bull Cluster 2 y2 and y5bull Cluster 3 y7 y8 y9 and y10bull Cluster 4 y11 and y12bull Cluster 5 y13 and y14

The probabilities of correct identification (PCI) and correct detection (PCD) for the measurements(Cluster 1 to Cluster 4) subject to the scenarios of two soft constraints (heights A and D) are displayedin Figure 6

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

24 of 40

(c) (d)

(e) (f)

(g) (h)

(a) (b)

Figure 6 Probability of correct identification (PCI) and Probability of correct detection (PCD) for themeasurements subject to the scenarios of two soft constraints for αprime = 0001 Cluster 1(ab) Cluster2(cd) Cluster 3(ef) and Cluster 4(gh)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

25 of 40

Note that the Cluster 5 is associated with the two soft constraints (ie y13 and y14) The probabilityof correct identification (PCI) for these both soft constraints were null However the probability ofcorrect detection were not Figure 7 shows the probability of correct detection (PCD) for these two softconstraints (ie heights A and D)

Figure 7 Probability of correct detection PCD for the two soft constraints (Cluster 5 heights A and D)and for αprime = 0001

The probability of wrong exclusion PWE for the measurements (Cluster 1 to Cluster 4) subject tothe scenarios of two soft constraints (heights A and D) are displayed in Figure 8 Figure 9 gives the theprobability of wrong exclusion PWE for that two constraints (ie heights A and D)

(c) (d)

(a) (b)

Figure 8 Probability of wrong exclusion PWE for the measurements subject to the scenarios of two softconstraints for αprime = 0001 Cluster 1(a) Cluster 2(b) Cluster 3(c) and Cluster 4(d)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

26 of 40

Figure 9 Probability of wrong exclusion PWE for the two soft constraints (Cluster 5 heights A and D)and for αprime = 0001

The over-identification cases (Pover+ and Poverminus) and the statistical overlap (Pol) were practicallynull for that case

The sensitivity indicators (MDB and MIB) for each scenario of two soft constraints are displayedin Table 10

Table 10 MDB and MIB for the case of two soft constraints based on αprime = 0001 and PCD = PCI = 08

σc = 10mm σc = 1mm σc = 01mmCluster MDB (σ) MIB (σ) MDB (σ) MIB (σ) MDB (σ) MIB (σ)

1 75 25 7 71 63 632 68 68 66 66 63 633 64 64 64 64 63 634 63 63 63 63 63 635 68 - 88 - 57 -

As with the case of two soft constraints Tables 1112 and 13 show the correlations (ρwi wj ) for thecase where there were three soft constraints ie for the network configuration in Figure 3(e)

Table 11 Correlation matrix of w-test statistics for the levelling network with three soft constraintswith σc = 10mm in Figure 3(e)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14 wy15

wy1 10 02 01 01 02 10 -02 00 00 02 -04 -01 01 00 00wy2 10 02 02 03 02 05 -05 -02 02 03 -03 00 00 00wy3 10 10 02 01 00 02 -02 00 01 04 00 -01 00wy4 10 02 01 00 02 -02 00 01 04 00 01 00wy5 10 02 -02 02 05 -05 03 -03 00 00 00wy6 10 -02 00 00 02 -04 -01 -01 00 00wy7 10 -03 -03 -04 -04 -01 00 00 00wy8 10 -04 -03 -01 -04 00 00 00wy9 10 -03 01 04 00 00 00wy10 10 04 01 00 00 00wy11 10 -01 00 00 00wy12 10 00 00 00wy13 10 -05 -05wy14 10 -05wy15 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

27 of 40

Table 12 Correlation matrix of w-test statistics for the levelling network with three soft constraintswith σc = 01mm in Figure 3(e)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14 wy15

wy1 10 03 01 -01 -01 01 -04 -01 -01 -01 -03 -01 07 -01 -04wy2 10 03 -01 01 -01 03 -03 -01 01 03 -03 03 -03 00wy3 10 01 -01 -01 01 04 01 01 01 03 01 -07 04wy4 10 03 01 -01 -01 -04 -01 01 03 -01 07 -04wy5 10 03 -01 01 03 -03 03 -03 -03 03 00wy6 10 01 01 01 04 -03 -01 -07 01 04wy7 10 -01 -01 -01 -03 -01 -04 -01 04wy8 10 -01 -01 -01 -03 -01 -04 04wy9 10 -01 01 03 -01 -04 04wy10 10 03 01 -04 -01 04wy11 10 -01 00 00 00wy12 10 00 00 00wy13 10 -02 -06wy14 10 -06wy15 10

Table 13 Correlation matrix of w-test statistics for the levelling network with three soft constraintswith σc = 1mm in Figure 3(e)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14 wy15

wy1 10 03 01 00 01 06 -03 00 00 01 -04 -01 05 -02 -02wy2 10 03 01 03 01 04 -04 -01 01 03 -03 02 -02 00wy3 10 06 01 00 00 03 -01 00 01 04 02 -05 02wy4 10 03 01 00 01 -03 00 01 04 -02 05 -02wy5 10 03 -01 01 04 -04 03 -03 -02 02 00wy6 10 -01 00 00 03 -04 -01 -05 02 02wy7 10 -03 -02 -03 -04 -01 -02 00 02wy8 10 -03 -02 -01 -04 00 -02 02wy9 10 -03 01 04 00 -02 02wy10 10 04 01 -02 00 02wy11 10 -01 00 00 00wy12 10 00 00 00wy13 10 -04 -05wy14 10 -05wy15 10

Table 14 Local redundancy (ri) standard-deviation of the LS-estimated outlier σnablai (mm) and themaximum absolute correlation (maxρwi wj

) for each scenario of three soft constraints

σc = 01mm σc = 1mm σc = 10mmMeasurement ri σnablai maxρwi wj

ri σnablai maxρwi wjri σnablai maxρwi wj

y1 0702 1194 0660 0502 1411 0577 0398 1586 0992y2 0582 1311 0326 0533 1369 0412 0501 1413 0470y3 0702 1194 0660 0502 1411 0577 0398 1586 0992y4 0702 1194 0660 0502 1411 0577 0398 1586 0992y5 0582 1311 0326 0533 1369 0412 0501 1413 0470y6 0702 1194 0660 0502 1411 0577 0398 1586 0992y7 0704 1192 0415 0602 1289 0412 0563 1333 0470y8 0704 1192 0415 0602 1289 0412 0563 1333 0470y9 0704 1192 0415 0602 1289 0412 0563 1333 0470y10 0704 1192 0415 0602 1289 0412 0563 1333 0470y11 0583 1309 0326 0583 1309 0385 0583 1309 0433y12 0583 1309 0326 0583 1309 0385 0583 1309 0433y13 0012 0904 0660 0425 1534 0542 0663 12283 0501y14 0012 0904 0660 0425 1534 0542 0663 12283 0501y15 0019 0718 063 05 1414 0542 0665 12268 0501

The probabilities of correct identification (PCI) and correct detection PCD for the measurements(Cluster 1 to Cluster 4) subject to the scenarios of three soft constraints (heights A D and G) aredisplayed in Figure 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

28 of 40

(c) (d)

(e) (f)

(g) (h)

(a) (b)

Figure 10 Probability of correct identification (PCI) and Probability of correct detection (PCD) for themeasurements subject to the scenarios of three soft constraints for αprime = 0001 Cluster 1(ab) Cluster2(cd) Cluster 3(ef) and Cluster 4(gh)

The probabilities of correct identification (PCI) and correct detection (PCD) in Figure 8 werecomputed for the clusters based on Table 14 as follows

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

29 of 40

bull Cluster 1 y1 y3 y4 and y6bull Cluster 2 y2 and y5bull Cluster 3 y7 y8 y9 and y10bull Cluster 4 y11 and y12bull Cluster 5 y13 and y14bull Cluster 6 y15

Figure 11 shows the probabilities of correct identification PCI and correct detection PCD for thethree soft constraints ie for Cluster 5 (heights A and D) and Cluster 6 (height G) in Figure 3(e)

(c) (d)

(a) (b)

Figure 11 Probability of correct identification (PCI) and Probability of correct detection (PCD) for thethree constraints and for αprime = 0001 Cluster 5(ab) and Cluster 6(cd)

The probability of wrong exclusion PWE for the measurements (Cluster 1 to Cluster 4) subject tothe scenarios of three soft constraints (heights A D and G) are displayed in Figure 12 Figure 13 givesthe the probability of wrong exclusion PWE for that three constraints (ie heights A D and G)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

30 of 40

(c) (d)

(a) (b)

Figure 12 Probability of wrong exclusion PWE for the measurements subject to the scenarios of threesoft constraints and for αprime = 0001 Cluster 1(a) Cluster 2(b) Cluster 3(c) and Cluster 4(d)

(a) (b)

Figure 13 Probability of wrong exclusion (PWE) for the three constraints and for αprime = 0001 Cluster5(a) and Cluster 6(b)

The over-identification cases (Pover+ and Poverminus) and the statistical overlap (Pol) were alsopractically null for that case of three soft constraints

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

31 of 40

The sensitivity indicators (MDB and MIB) for each scenario of three soft constraints are displayedin Table 15

Table 15 MDB and MIB for the case of three soft constraints based on αprime = 0001 and PCD = PCI = 08

σc = 10mm σc = 1mm σc = 01mmCluster MDB (σ) MIB (σ) MDB (σ) MIB (σ) MDB (σ) MIB (σ)

1 75 22 68 69 58 592 68 69 66 67 64 643 64 64 63 63 58 584 63 63 63 63 63 635 59 60 74 75 435 456 59 59 69 69 346 355

5 Discussion

We start from the scenario of one hard constraint in Figure 3(a) Table 4 reveal that the maximumcorrelation between w-test statistics for the measurements constituting Cluster 1 is exactly equal to100 (ie maxρwi wj

= 100) This means that the measurements belonging to Cluster 1 are those thatare connected with unknown heights whose connections are limited to only two The both unknownheights A and D are tied only to two measurements (ie y1 and y6 linked to A and y3 and y4 linked toD) therefore if an outlier occurred in one of these measurements we would only be able to analyse theconsistency between them but we would not be able to distinguish which of them was contaminatedby an outlier This means that we would only be able to detect because the w-test statistics could belarger than a critical value k however in that case the values of w-test statistics would be the same andtherefore we would not have only one unique maximum w-test statistics but actually would have fourmaximum w-test statistics In other words the equation systems associated with the measurements ofCluster 1 are linearly dependent [87] Therefore there is no reliability in terms of outlier identificationfor the Cluster 1 as can be seen in Figure 3(a)

From Figure 3(b) we can note that there is reliability in terms of outlier detection for Cluster 1and it is caused by overlapping w-test statistics The probability of statistics overlap Pol for Cluster 1in this scenario of minimally constrained network is displayed in Figure 14

Figure 14 Probability of correct detection PCD and statistical overlap (Pol) for the Cluster 1 subject toone hard constraint and for αprime = 0001

The problem of not having more connections (ie more measurements) for the unknown heightsA and D in the case of one hard constraint with G fixed is overcome when these heights (A and D) aretaken as hard constraints in 3(b) or when the heights A D and G are hard constraints in figure 3(c)Figure 3(a b) show that the measurements of Cluster 1 are now able to identify an outlier when twohard constraints (A and D fixed) are in play It is also verified to the case of three hard constraints (A Dand G fixes) in Figure 3(e f) ie there is reliability in terms of both outlier detection and identificationfor these measurements in those conditions

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

32 of 40

One can argue that the more constraints the larger the probability of correct identification (PCI)and correct detection (PCD) This claim is not entirely correct This may be true in case where all pointson the network have the same number of connections However in the example given if both heightsB and G were selected as hard constraints the measurements of the Cluster 1 (ie y1 y6 y3 and y4)would still have their least-squares residuals fully correlated (ie ρwy1 wy6

= 100 and ρwy3 wy4= 100)

and therefore there would be no reliability in terms of outlier identification for these measurementsFrom Table 5 we observe different behaviour for the clusters as follows

bull Cluster 1 there was no MIB for the case of having only one single hard constraint whereasMDB = MIB for the other cases However both MDB and MIB decrease significantly with theincrease in the number of hard constraints

bull Cluster 2 MDB slightly smaller than MIB Both MDB and MIB were practically the same for thecase of having two or three hard constraints

bull Cluster 3 MDB = MIB for all cases of hard constraints however both MDB and MIB decreasesignificantly with the increase in the number of hard constraints

bull Cluster 4 MDB e MIB were equal for all cases

In terms of outlier detection and identification therefore Cluster 1 was more sensitive toconstraints Cluster 3 relatively sensitive to constraints whereas the Cluster 4 completely insensitiveto constraints and Cluster 2 relatively insensitive to constraints This is also can be seen in Figure 4The reason for this is that the local redundancy (ri) of the Cluster 1 increased with the increase of thenumber of hard constraints whereas the Cluster 4 have remained the same as can be seen in Table 4

Leaving aside the cases of statistical overlap (Pol) the network presents low least-squares residualscorrelation (ρwi wj lt 05) and high local redundancy (ri gt 05) Because of this the probabilities ofwrong exclusion (PWE) were less than 1 as can be seen in Figure 5 The over-identification cases(Pover+ and Poverminus) were practically null Consequently PCI asymp PCD Due of this fact the family-wiseerror rate (αprime) should be increased in order to have more success rate in the outlier detection andidentification [45]

From Figure 15 we observe that increasing the αprime increases the both probabilities of correctidentification (PCI) and correct detection (PCD) for outlier magnitude from 5σ to 6σ in the case ofthree hard constraints and from 5σ to 68σ in the case of two hard constraints Although the rates ofPover+ and PWE also increase they are not significant when compared to the improvement of correctidentification (PCI) and detection (PCD) This same analysis can be done for the other clusters

In terms of soft constraints for the cases of two constraints in Figure 3(d) we observe from Table 9that the larger the relaxation of the constraint (ie the larger the standard-deviation of the constraintσc) the larger the residuals correlation (ρwi wj ) and the standard-deviation of the outlier σnablai and thesmaller the local redundancy (ri) Consequently the probabilities of correct identification (PCI) anddetection (PCD) get smaller and smaller with the relaxation of the constraints whereas the probabilityof wrong exclusion gets larger (PWE) This can be more clearly verified in Figure 6(a b) and Figure8(a) for Cluster 1 whose measurements are connected with the constraints A and D (ie y13 and y14 inTable 9 respectively)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

33 of 40

(a) (b)

(c) (d)

Figure 15 Probabilities of correct identification PCI (a) correct detection PCD (b) over-identificationpositive Pover+ (c) and wrong exclusion PWE for the Cluster 1 subject to two and three hard constraintsand for αprime = 0001 and αprime = 01

Note from Figure 8(a) that the probability of wrong exclusion (PWE) increases as the magnitude ofthe outlier (nablai) increases However this is true up to a certain limit of outlier magnitude The effect ofresiduals correlation ρwi wj on the rates of wrong exclusion (PWE) and correct identification (PCI) tendsto decrease with the increase in the magnitude of the outlier nablai This effect is more clearly verified forCluster 1 in case where the precision of the constraints are ten times worse than the measurementsσc = 10σ = 10mm

Note from Figure 6 that identifying an outlier in the Cluster 1 (ie y1 y3 y4 and y6) whenσc = 10mm is more difficult than the other clusters This is due to the fact the Cluster 1 has a higherresiduals correlation ρwi wj = 0994 than other clusters In general therefore we observe that the largerthe relaxation of the constraints the larger is the effect of the correlation ρwi wj on the success rate ofoutlier identification (PCI) Consequently the higher the sensitivity indicator for outlier identification(MIB) Table 10 reveals that the ratio between MIB and MDB for the Cluster 1 and for the scenariowhere the standard-deviations of that two soft constraints are σc = 10mm is MIBMDB=2575=33On the other hand the relationship between MIB and MDB is practically one (ie MIBMDB=10) forthe others scenarios

If the FWE rate (αprime) were increased for the case where the two soft constraints of σc = 10mmare in play we would not have great advantages for Cluster 1 due to its high residuals correlation(ρwi wj = 994) From Figure 16 we can observe that the probability of correct identification (PCI)for outlier magnitudes from 5σ to 8σ is effectively larger to a user-defined αprime = 01 than αprime = 0001However the success rate is still less than 80 ie PCI lt 08 Note for example the correct

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

34 of 40

identification rate is PCI = 56 for an outlier magnitude of nablai = 8σ and αprime = 01 For αprime = 01 theMIB = 335σ = 335mm whereas for αprime = 0001 is MIB = 25σ = 25mm Therefore in that case theMIB for PCI = 08(80) and αprime = 01 would be 34 larger than user-defined αprime = 0001

Figure 16 Probabilities of correct identification (PCI) for the Cluster 1 subject to two soft constraints (2sc) A and D for αprime = 0001 and αprime = 01

The soft constraints A and D were grouped in the Cluster 5 (ie A and D were treated aspseudo-observations in the model) There is no reliability in terms of outlier identification for theconstraints because the residual correlation between them is ρwi wj = 100 as can be seen in Table 9for y13 and y14 However these soft constraints are able to detect an outlier In that case the probabilityof correct detection PCD in Figure 7 is mainly caused by the statistical overlap Pol as can be seen in theexample of the case of σc = 10mm in Figure 17 From Table 10 we observe that the larger the relaxationof the constraints the larger the MDB Note that the values of MDB are given in σ and therefore theMDB for σc = 10mm is larger than σc = 1mm and σc = 01mm ie we had the following inequalityMDB = 68σc = 68times 10mm = 68mm gt MDB = 88σc = 88times 1mm = 88mm gt MDB = 57σc =

57times 01mm = 57mm In that case if the FWE (αprime) were increased the rate of outlier detection by theCluster 4 (ie by the soft constraints) would increase

Figure 17 Probabilities of correct detection PCD and statistical overlap Pol for the two soft constraintsA and D (Cluster 5) with σc = 10mm and for αprime = 0001

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

35 of 40

Similar effects of the relaxation of the constraints on the performance of the IDS in case of two softconstraints are verified in case of three soft constraints as can be seen in the Figures 10 11 12 and 13

In case of having three soft constraints in Figure 3(e) there is reliability in terms of outlieridentification for the three pseudo-observations y13 y14 and y15 (ie for AD and G) as can be seen inFigure 10 and Table 15 In that case we also observe that the probabilities of correct detection PCD ofthe soft constraints A and D (ie Cluster 5) were approximately 13 for σc = 10mm 16 for σc = 1mmand 24 for σc = 01mm larger than the scenario of the network subject to two soft constraints Table15 reveals that the advantage of having three soft constraints instead of two constraints is that theconstraints become identifiable in the presence of an outlier The behaviour of the probabilities ofcorrect detection (PCD) correct identification (PCI) and wrong exclusion (PWE) was similar to the caseof the two soft constraints

Furthermore the larger the relaxation of the constraints the smaller the residuals correlationbetween the measurements and the soft constraints and the larger the residuals correlation amongthe measurements This effect can be verified in Tables 6 7 and 8 for two soft constraints as well as inTables 11 12 and 13 for three soft constraints

In general we observe that the case of two soft constraints for σc = 01mm was comparablewith two hard constraints (see eg Table 5 and Table 10) in terms of the probability levels associatedwith IDS for the measurements (ie clusters 1 2 3 and 4) In the same way for the case of twosoft constraints with σc = 1mm or σc = 10mm the probabilities levels were similar to the one hardconstraint for that measurements with the benefit of two soft constraints having reliability in terms ofoutlier identification for the Cluster 1 Finally the three soft constraints with σc = 1mm and σc = 10mmwere comparable to the two soft constraints for that scenario of constraints relaxation wheres thethree soft constraints for σc = 01mm showed similar outcomes with three hard constraints for themeasurements (see eg Table 5 and Table 15) In that case however an advantage of the three softconstraints on the three hard constraints is the possibility of analysing the sensitivity of the constraintsIt is important to emphasised that the stochastic models of the measurements and constraints wereassumed well-known and defined for the analyzes performed here

6 Conclusions

In general the probability of correct identification (PCI) for the case of hard constraints is largerthan soft constraints It can be verified in Figure 4 Figure 6 and Figure 10 Therefore hard constraintsshould be used in the stage of pre-processing data for the purpose of identifying and removing possibleoutlying measurements In that process one should opt to set out the redundant hard constraints atpoints in the network where the smallest connections exist After identifying and removing possibleoutliers the soft constraints should be employed to propagate the uncertainties of the constraints(pseudobservations) to the model parameters during the process of least-squares estimation Thisrecommendation is valid for outlier detection and identification purpose

We highlight the main findings of this research as follows

bull Under a system of a high local redundancy ri gt 05 and low residuals correlation (ρwi wj lt 05) ifone increase the family-wise error rate (FWE) of the test statistic the performance of the procedurewill be improved for both scenarios of hard constraints and soft constraintsbull The larger the relaxation of the constraints the larger is the effect of the residuals correlation

(ρwi wj ) on the success rate of outlier identification (PCI) Consequently the higher the sensitivityindicator for outlier identification (MIB) and therefore more difficult it becomes to identify anoutlier

bull Under a scenario of soft constraints one should set out at least three soft constraints in order toidentify an outlier in the constraints

In other types of analysis for example deformation analysis of geodetic networks one shouldformulated the constraints as pseudo-observations with σc gt 0 in an adjustment model of system

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

36 of 40

equations In other words the relaxation of constraints makes it possible to estimate deformationeffects that are unmodelled in the deterministic model matrix for example deformations which have aspatial and temporal variations In that case the deformation can be modelled by covariance functionsthat determine the cofactors between constraints [28] This should be applied to control points locatedin the structure but not to control points located outside the structure The control points locatedoutside the structure should be hard constraints and have their stability verified by an independentprocess in order to analyse the deformation of the structure So one will be also able to analyse thesensitivity in terms of minimal detectable deformations andor minimal identifiable deformations [23]Future researches will be addressed to sensitivity analysis for deformation monitoring networks

Author Contributions Conceptualization VFR MTM and IK Data curation VFR MTM and IK Formalanalysis VFR and IK Funding acquisition MTM MRV and LGdSJ Investigation VFR MethodologyVFR and MTM Project administration MTM IK MRV and LGdSJ Software VFR and LGdSJSupervision MTM IK and MRV Validation VFR and MRV Writing ndash original draft VFR Writing ndashreview amp editing VFR MTM IK and MRV All authors have read and agreed to the published version of themanuscript

Funding This research was funded by the CNPqmdashConselho Nacional de Desenvolvimento Cientiacutefico eTecnoloacutegicomdashBrasil (proc no 1035872019-5) This research and the APC were also funded by PETROBRAS(Grant Number 201800545-0)

Acknowledgments In this section you can acknowledge any support given which is not covered by the authorcontribution or funding sections This may include administrative and technical support or donations in kind(eg materials used for experiments)

Conflicts of Interest The authors declare no conflict of interest

References

1 Lehmann R Neitzel F Testing the compatibility of constraints for parameters of a geodetic adjustmentmodel Journal of Geodesy 2013 87 555ndash566 doi101007s00190-013-0627-2

2 Fang X Weighted total least-squares with constraints a universal formula for geodetic symmetricaltransformations Journal of Geodesy 2015 89 459ndash469 doi101007s00190-015-0790-8

3 Amiri-Simkooei AR Weighted Total Least Squares with Singular Covariance Matrices Subject to Weightedand Hard Constraints J Surv Eng 2017 143 04017018

4 Sakic P Ballu V Royer JY A Multi-Observation Least-Squares Inversion for GNSS-Acoustic SeafloorPositioning Remote Sensing 2020 12 doi103390rs12030448

5 Odijk D Stochastic modelling of the ionosphere for fast GPS ambiguity resolution Geodesy Beyond 2000Schwarz KP Ed Springer Berlin Heidelberg Berlin Heidelberg 2000 pp 387ndash392

6 de Oliveira PS Morel L Fund F Legros R Monico JFG Durand S Durand F Modelingtropospheric wet delays with dense and sparse network configurations for PPP-RTK GPS Solutions 201721 237ndash250 doi101007s10291-016-0518-0

7 Chen B Wu L Dai W Luo X Xu Y A new parameterized approach for ionospheric tomography GPSSolutions 2019 23 96 doi101007s10291-019-0893-4

8 Ahmed E Ahmed S Mohamed D Mostafa R jag 2019 Vol 13 chapter Vertical ionospheric delayestimation for single-receiver operation p 81 2 doi101515jag-2018-0041

9 Wang A Chen J Zhang Y Meng L Wang J Performance of Selected Ionospheric Models inMulti-Global Navigation Satellite System Single-Frequency Positioning over China Remote Sensing 201911 doi103390rs11172070

10 Deo M El-Mowafy A Comparison of advanced troposphere models for aiding reductionof PPP convergence time in Australia Journal of Spatial Science 2019 64 381ndash403doi1010801449859620181472046

11 Dai X Lou Y Dai Z Hu C Peng Y Qiao J Shi C Precise Orbit Determination for GNSS ManeuveringSatellite with the Constraint of a Predicted Clock Remote Sensing 2019 11 doi103390rs11161949

12 Courant R Hilbert D Methods of mathematical physics Vol 2 Vol 2 Wiley [New York] 196213 Grafarend EW Optimization of Geodetic Networks The Canadian Surveyor 1974 28 716ndash723

doi101139tcs-1974-0120

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

37 of 40

14 Teunissen P Zero Order Design Generalized Inverses Adjustment the Datum Problem andS-Transformations Optimization and Design of Geodetic Networks Grafarend EW Sansograve F EdsSpringer Berlin Heidelberg Berlin Heidelberg 1985 pp 11ndash55

15 Dermanis A Free network solutions with the DLT method ISPRS Journal of Photogrammetry and RemoteSensing 1994 49 2 ndash 12 doihttpsdoiorg1010160924-2716(94)90061-2

16 Kotsakis C Reference frame stability and nonlinear distortion in minimum-constrained networkadjustment Journal of Geodesy 2012 86 755ndash774 doi101007s00190-012-0555-6

17 Kotsakis C Datum Definition and Minimal Constraints In Encyclopedia of Geodesy Grafarend E EdSpringer International Publishing Cham 2018 pp 1ndash6 doi101007978-3-319-02370-0_157-1

18 Matsuoka MT Rofatto VF Klein I Roberto Veronez M da Silveira LG Neto JBS Alves ACRControl Points Selection Based on Maximum External Reliability for Designing Geodetic Networks AppliedSciences 2020 10 doi103390app10020687

19 Baarda W S-transformations and criterion matrices Publ on geodesy New Series 1973 520 Schaffrin B Aspects of Network Design Optimization and Design of Geodetic Networks Grafarend

EW Sansograve F Eds Springer Berlin Heidelberg Berlin Heidelberg 1985 pp 548ndash59721 Xu PB A general solution in geodetic nonlinear rank-defect models 199722 Altamimi Z Dermanis A The Choice of Reference System in ITRF Formulation VII Hotine-Marussi

Symposium on Mathematical Geodesy Sneeuw N Novaacutek P Crespi M Sansograve F Eds Springer BerlinHeidelberg Berlin Heidelberg 2012 pp 329ndash334

23 Velsink H On the deformation analysis of point fields J Geod 2015 89 1071ndash1087doi101007s00190-015-0835-z

24 Velsink H Extendable linearised adjustment model for deformation analysis Survey Review 201547 397ndash410 doi1011791752270614Y0000000140

25 Hiddo V jag 2016 Vol 10 chapter Time Series Analysis of 3D Coordinates Using NonstochasticObservations p 5 1 doi101515jag-2015-0027

26 Rao CR Markoffrsquos Theorem with Linear Restrictions on Parameters SankhyAuml The Indian Journal ofStatistics (1933-1960) 1945 7 16ndash19

27 Velsink H Testing Methods for Adjustment Models with Constraints Journal of Surveying Engineering2018 144 04018009 doi101061(ASCE)SU1943-54280000260

28 Hiddo V Testing deformation hypotheses by constraints on a time series of geodetic observations J ApplGeod 2017 12 77ndash93 doi101515jag-2017-0028

29 Baarda W A testing procedure for use in geodetic networks Publ on geodesy New Series 1968 230 Teunissen P First and second moments of non-linear least-squares estimators Bull Geodesique (Journal of

Geodesy) 1989 63 253ndash262 doihttpsdoiorg101007BF0252047531 Teunissen P Testing Theory an introduction 2 ed Delft University Press 200632 Amiri-Simkooei A Least-squares variance component estimation theory and GPS applications Phd

thesis Delft University of Technology 200733 Koch KR Parameter estimation and hypothesis testing in linear models 2 ed Springer 199934 Tao H Feng H Xu L Miao M Long H Yue J Li Z Yang G Yang X Fan L Estimation

of Crop Growth Parameters Using UAV-Based Hyperspectral Remote Sensing Data Sensors 2020 20doi103390s20051296

35 Han M Wang Q Wen Y He M He X The Application of Robust Least Squares Method inFrequency Lock Loop Fusion for Global Navigation Satellite System Receivers Sensors 2020 20doi103390s20041224

36 Wei C Estimation for the Discretely Observed CoxacircldquoIngersollacircldquoRoss Model Driven by Small SymmetricalStable Noises Symmetry 2020 12 doi103390sym12030327

37 Sakic P Ballu V Royer JY A Multi-Observation Least-Squares Inversion for GNSS-Acoustic SeafloorPositioning Remote Sensing 2020 12 doi103390rs12030448

38 Livadiotis G General Fitting Methods Based on Lq Norms and their Optimization Stats 2020 3 16ndash31doi103390stats3010002

39 Farooq SZ Yang D Ada ENJ A Cycle Slip Detection Framework for Reliable Single Frequency RTKPositioning Sensors 2020 20 doi103390s20010304

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

38 of 40

40 Araveeporn A Comparing Parameter Estimation of Random Coefficient Autoregressive Model byFrequentist Method Mathematics 2020 8 doi103390math8010062

41 Zhang C Peng T Zhou J Ji J Wang X An Improved Autoencoder and Partial Least SquaresRegression-Based Extreme Learning Machine Model for Pump Turbine Characteristics Applied Sciences2019 9 doi103390app9193987

42 Ji J Yang M Jiang L He J Teng Z Liu Y Song H Output-Only Parameters Identification ofEarthquake-Excited Building Structures with Least Squares and Input Modification Process AppliedSciences 2019 9 doi103390app9040696

43 Kargoll B On the theory and application of model misspecification tests in geodesy Doctoral thesisUniversity of Bonn Landwirtschaftliche Fakultaumlt German Bonn 2007

44 Lehmann R On the formulation of the alternative hypothesis for geodetic outlier detection J Geod 201387 373ndash386 doi101007s00190-012-0607-y

45 Rofatto VF Matsuoka MT Klein I Roberto Veronez M da Silveira LG A Monte Carlo-Based OutlierDiagnosis Method for Sensitivity Analysis Remote Sensing 2020 12 doi103390rs12050860

46 Lehmann R Improved critical values for extreme normalized and studentized residuals in GaussndashMarkovmodels J Geod 2012 86 1137ndash1146 doi101007s00190-012-0569-0

47 Rofatto VF Matsuoka MT Klein I Veronez MR Bonimani ML Lehmann R A half-century ofBaardarsquos concept of reliability a review new perspectives and applications Surv Rev 2018 0 1ndash17[httpsdoiorg1010800039626520181548118] doi1010800039626520181548118

48 Zaminpardaz S Teunissen P DIA-datasnooping and identifiability J Geod 2019 93 85ndash101doi101007s00190-018-1141-3

49 Kok JJ States U On data snooping and multiple outlier testing [microform] Johan J Kok US Dept ofCommerce National Oceanic and Atmospheric Administration National Ocean Service Charting andGeodetic Services For sale by the National Geodetic Information Center NOAA Rockville Md 1984 p61 p

50 Knight NL Wang J Rizos C Generalised measures of reliability for multiple outliers Journal of Geodesy2010 84 625ndash635 doi101007s00190-010-0392-4

51 Gui Q Li X Gong Y Li B Li G A Bayesian unmasking method for locating multiplegross errors based on posterior probabilities of classification variables J Geod 2011 85 191ndash203doi101007s00190-010-0429-8

52 Klein I Matsuoka MT Guzatto MP Nievinski FG An approach to identify multipleoutliers based on sequential likelihood ratio tests Surv Rev 2017 49 449ndash457[httpsdoiorg1010800039626520161212970] doi1010800039626520161212970

53 Hawkins DM Identification of Outliers 1 ed Springer Netherlands 1980 httpsdoiorg101007978-94-015-3994-4

54 Baarda W Statistical concepts in geodesy Publ on geodesy New Series 1967 255 Imparato D Teunissen P Tiberius C Minimal Detectable and Identifiable Biases for quality

control Surv Rev 2019 51 289ndash299 [httpsdoiorg1010800039626520181437947]doi1010800039626520181437947

56 Teunissen PJG Distributional theory for the DIA method Journal of Geodesy 2018 92 59ndash80doi101007s00190-017-1045-7

57 Foumlrstner W Reliability and discernability of extended Gauss-Markov models Seminar on mathematicalmodels of GeodeticPhotogrammetric Point Determination with Regard to Outliers and Systematic Errors 1983 Vol Series A pp 79ndash103

58 Wang J Knight NL New Outlier Separability Test and Its Application in GNSS Positioning 201259 Lehmann R Loumlsler M Multiple Outlier Detection Hypothesis Tests versus Model Selection by

Information Criteria J Surv Eng 2016 142 0401601760 Teunissen PJG Imparato D Tiberius CCJM Does RAIM with Correct Exclusion Produce Unbiased

Positions Sensors 2017 17 doi103390s1707150861 Lehmann R Loumlsler M Congruence analysis of geodetic networks ndash hypothesis tests versus model

selection by information criteria J Appl Geod 2017 11 271ndash283 doi101515jag-2016-0049

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

39 of 40

62 Rofatto V Matsuoka M Klein I DESIGN OF GEODETIC NETWORKS BASED ON OUTLIERIDENTIFICATION CRITERIA AN EXAMPLE APPLIED TO THE LEVELING NETWORK Bull Geod Sci2018 24 152ndash170 httpdxdoiorg101590s1982-21702018000200011

63 Nguyen VK Renault Milocco R Environment Monitoring for Anomaly Detection System UsingSmartphones Sensors 2019 19 doi103390s19183834

64 Nie Y Yang L Shen Y Specific Direction-Based Outlier Detection Approach for GNSS Vector NetworksSensors 2019 19 doi103390s19081836

65 Klein I Matsuoka MT Guzatto MP Nievinski FG Veronez MR Rofatto VF A newrelationship between the quality criteria for geodetic networks Journal of Geodesy 2019 93 529ndash544doi101007s00190-018-1181-8

66 Mierlo JV Statistical Analysis of Geodetic Measurements for the Investigation of Crustal Movements InRecent Crustal Movements 1977 Whitten C Green R Meade B Eds Elsevier 1979 Vol 13 Developmentsin Geotectonics pp 457 ndash 467 doihttpsdoiorg101016B978-0-444-41783-150072-6

67 van der Marel H Roumlsters AJM Statistical Testing and Quality Analysis in 3-D Networks (part II)Application to GPS Global Positioning System An Overview Bock Y Leppard N Eds Springer NewYork New York NY 1990 pp 290ndash297

68 Yang L Wang J Knight NL Shen Y Outlier separability analysis with a multiple alternative hypothesestest J Geod 2013 87 591ndash604 doi101007s00190-013-0629-0

69 Proacuteszynski W Revisiting Baardarsquos concept of minimal detectable bias with regard to outlier identifiabilityJ Geod 2015 89 993ndash1003 doi101007s00190-015-0828-y

70 Romano JP Wolf M Multiple Testing of One-Sided Hypotheses Combining Bonferroni and the BootstrapPredictive Econometrics and Big Data Kreinovich V Sriboonchitta S Chakpitak N Eds SpringerInternational Publishing Cham 2018 pp 78ndash94

71 Teunissen P An Integrity and Quality Control Procedure for use in Multi Sensor Integration 199072 Aydin C Demirel H Computation of Baardarsquos lower bound of the non-centrality parameter J Geod

2004 78 437ndash441 doi101007s00190-004-0406-173 Aydin C Power of Global Test in Deformation Analysis J Surv Eng 2012 138 51ndash56 https

doiorg101061(ASCE)SU1943-5428000006474 Bonferroni C Teoria statistica delle classi e calcolo delle probabilitagrave Pubblicazioni del R Istituto superiore di

scienze economiche e commerciali di Firenze Libreria internazionale Seeber 193675 Box GEP Muller ME A Note on the Generation of Random Normal Deviates The Annals of Mathematical

Statistics 1958 29 610ndash611 httpsdoi101214aoms117770664576 Lemeshko BY Lemeshko SB Extending the Application of Grubbs-Type Tests in Rejecting Anomalous

Measurements Measurement Techniques 2005 48 536ndash547 doi101007s11018-005-0179-977 Lehmann R Scheffler T Monte Carlo based data snooping with application to a geodetic network J

Appl Geod 2011 5 123ndash134 httpsdoiorg101515JAG201101478 Matsumoto M Nishimura T Mersenne twister A 623-dimensionally equidistributed uniform

pseudo-random number generator ACM Transactions on Modeling and Computer Simulation 1998 8 3ndash30httpsdlacmorgcitationcfmid=272991

79 Ihde J Saacutenchez L Barzaghi R Drewes H Foerste C Gruber T Liebsch G Marti U Pail R SiderisM Definition and Proposed Realization of the International Height Reference System (IHRS) Surveys inGeophysics 2017 38 549ndash570 doi101007s10712-017-9409-3

80 Lehmann R Observation error model selection by information criteria vs normality testing StudiaGeophysica et Geodaetica 2015 59 489ndash504 doi101007s11200-015-0725-0

81 Algarni DA Ali AE Heighting and Distance Accuracy with Electronic Digital Levels Journal of King SaudUniversity - Engineering Sciences 1998 10 229 ndash 239 doihttpsdoiorg101016S1018-3639(18)30698-6

82 Gemin ARS Matos AS Faggion PLAs APPLICATION OF CALIBRATION CERTIFICATE OFDIGITAL LEVELING SYSTEMS IN THE MONITORING OF STRUCTURES A CASE STUDY AT THEGOVERNADOR JOSAtilde RICHA HYDROELECTRIC POWER PLANT - PR Boletim de CiAtildeGeodAtilde 201824 235 ndash 249

83 Takalo M Rouhiainen P Development of a System Calibration Comparator for Digital Levels in FinlandNordic Journal of Surveying and Real Estate Research 1 1

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

40 of 40

84 Wiedemann W Wagner A Wunderlich T Using IATS to Read and Analyze Digital Levelling Staffs SIG2016 Paar R Marendic A Zrinjski M Eds Croatian Geodetic Society 2016 pp 515ndash526

85 PJG T The Geometry of Geodetic Inverse Linear Mapping and Non-linear Adjustment Publications onGeodesy New Series 1985 8

86 Rao CR Mitra SK Generalized inverse of a matrix and its applications 197287 Hekimoglu S Erenoglu RC Sanli DU Erdogan B Detecting Configuration Weaknesses in Geodetic

Networks Survey Review 2011 43 713ndash730 doi101179003962611X13117748892632

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

Page 17: Effects of Hard and Soft Equality Constraints on

17 of 40

the two heights as hard constraints For the case where these two heights (A and D) were taken as softconstraints however two observation equations were added to Equation 49 ie

y13 + e13 = hA σy13 gt 0

y14 + e14 = hD σy14 gt 0(52)

In the case of soft constraints in Equation 52 two lines were added in matrix A In other wordsA and D were taken as pseudo-observations In that case the rank deficiency was also null (ieuminus rank(A) = 7minus 7 = 0) the redundancy of the system was r = nminus rank(A) = nminus u = 7 and thematrix A was given as follows

A =

minus1 1 0 0 0 0 00 minus1 1 0 0 0 00 0 minus1 1 0 0 00 0 0 minus1 1 0 00 0 0 0 minus1 1 01 0 0 0 0 minus1 00 1 0 0 0 0 minus10 0 1 0 0 0 minus10 0 0 0 1 0 minus10 0 0 0 0 1 minus10 1 0 0 0 minus1 00 0 1 0 minus1 0 01 0 0 0 0 0 00 0 0 1 0 0 0

(53)

For this example of two soft constraints and by considering the both soft constraints withstandard-deviation σc = 10mm the symmetric and positive semi-definite covariance matrix of theobservations (Qe) was given as follows

Qe =

1 0 0 middot middot middot 0 00 1 0 middot middot middot 0 00 0 1 middot middot middot 0 0

0 0 0 middot middot middot 100 00 0 0 middot middot middot 0 100

(54)

The last two rows and columns of the matrix Qe in Equation 54 refer to the variances (σc2 =

(10mm)2 = 100mm2) of the heights constraints A and D respectivelySimilarly matrices A and Qe were constructed for the other cases studied hereAlthough other measurements are able to identify an outlier for the case of having only one single

soft constraint the pseudo-observation (constraint) is not In that case the defect configurationis associated with the additional parameter in the constraint (ie the presence of an outlier inthe constraint) In other words an additional parameter on the soft constraint will not estimableFor example if the height point G was taken as a soft constraint the presence of an outlier inpseudo-observation G would lead to rank deficiency of matrix A ie u minus rank(A) = 8 minus 7 = 1Therefore the case of having only one single soft constraint was not considered here

42 Result of the Hard Constraint Effects on the Iterative Outlier Elimination Procedure

In this section we present the results of the effects of the hard constraint on the IDS The scenariosin Figure 3(a) (network minimally constrained) Figure 3(b) (two hard constraints) and Figure 3(c)(three hard constraints) were considered for the analysis

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

18 of 40

The correlation coefficient between w-test statistics (ρwi wj ) were computed for the three scenariosaccording to Equation(20) Table 1 provides that correlation for that network minimally constrainedTable 2 for that network over-constrained with two hard constraints and Table 3 for that networkover-constrained with three hard constraints

Table 1 Correlation matrix of w-test statistics for the levelling network minimally constrained in Figure3(a)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12

wy1 10 02 01 01 02 10 -02 00 00 02 -04 -01wy2 10 02 02 03 02 05 -05 -02 02 03 -03wy3 10 10 02 01 00 02 -02 00 01 04wy4 10 02 01 00 02 -02 00 01 04wy5 10 02 -02 02 05 -05 03 -03wy6 10 -02 00 00 02 -04 -01wy7 10 -03 -03 -04 -04 -01wy8 10 -04 -03 -01 -04wy9 10 -03 01 04wy10 10 04 01wy11 10 -01wy12 100

Table 2 Correlation matrix of w-test statistics for the levelling network with two hard constraints inFigure 3(b)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12

wy1 10 04 04 -03 -01 04 -03 01 01 01 -04 -01wy2 10 04 -01 01 -01 04 -04 -01 01 03 -03wy3 10 04 -01 -03 -01 03 -01 -01 01 04wy4 10 04 04 01 01 -03 01 01 04wy5 10 04 -01 01 04 -04 03 -03wy6 10 -01 -01 -01 03 -04 -01wy7 10 -04 -03 -04 -04 -01wy8 10 -04 -03 -01 -04wy9 10 -04 01 04wy10 10 04 01wy11 10 -01wy12 10

Table 3 Correlation matrix of w-test statistics for the levelling network with three hard constraints inFigure 3(c)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12

wy1 10 03 01 -01 -01 01 -04 -01 -01 -01 -03 -01wy2 10 03 -01 01 -01 03 -03 -01 01 03 -03wy3 10 01 -01 -01 01 04 01 01 01 03wy4 10 03 01 -01 -01 -04 -01 01 03wy5 10 03 -01 01 03 -03 03 -03wy6 10 01 01 01 04 -03 -01wy7 10 -01 -01 -01 -03 -01wy8 10 -01 -01 -01 -03wy9 10 -01 01 03wy10 10 03 01wy11 10 -01wy12 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

19 of 40

Table 4 gives the local redundancy (ri) the standard-deviation of the LS-estimated outlier σnablai andthe maximum absolute correlation (maxρwi wj

) for each scenario of hard constraint set out of in thisstudy ie Figure 3(ab and c)

Table 4 Local redundancy (ri) standard-deviation of the LS-estimated outlier σnablai and the maximumabsolute correlation (maxρwi wj

) for each scenario of hard constraint

1 hard constraint 2 hard constraints 3 hard constraintsMeasurement ri σnablai maxρwi wj

ri σnablai maxρwi wjri σnablai maxρwi wj

y1 0396 1589 100 0583 1309 036 0708 1188 041y2 0500 1414 047 0583 1309 036 0583 1309 032y3 0396 1589 100 0583 1309 036 0708 1188 041y4 0396 1589 100 0583 1309 036 0708 1188 041y5 0500 1414 047 0583 1309 036 0583 1309 032y6 0396 1589 100 0583 1309 036 0708 1188 041y7 0563 1333 047 0583 1309 036 0708 1188 041y8 0563 1333 047 0583 1309 036 0708 1188 041y9 0563 1333 047 0583 1309 036 0708 1188 041y10 0563 1333 047 0583 1309 036 0708 1188 041y11 0583 1309 043 0583 1309 036 0583 1309 032y12 0583 1309 043 0583 1309 036 0583 1309 032

Here the twelve levelling measurements were clustered into four clusters The clustering wasdefined according to the local redundancy (ri) and the maximum absolute correlation (maxρwi wj

) inTable 4 This is the first time that a clustering technique based on the similarity of local redundancyand the maximum absolute correlation between w-test statistics is applied to a problem of geodeticnetworks Similarly this has been done in [45] The four cluster were defined as follows

bull Cluster 1 y1 y3 y4 and y6bull Cluster 2 y2 and y5bull Cluster 3 y7 y8 y9 and y10bull Cluster 4 y11 and y12

The probability levels associated with IDS were averaged for each of these clusters Here we computethe probability levels of the IDS based on the procedure in Section (3) [45] The critical values werek = 389 k = 393 and k = 393 for 1 hard constraint 2 hard constraints and 3 hard constraintsrespectively These critical values were found for αprime = 0001 according to the procedure described inAppendix The probability of correct identification (PCI) and correct detection (PCD = 1minusPMD)are displayed in Figure 4 for each number of hard constraint (denoted by hc)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

20 of 40

(c) (d)

(e) (f)

(g) (h)

(a) (b)

Figure 4 Probability of correct identification (PCI) and Probability of correct detection (PCD) for thecase of hard constraints and for αprime = 0001 Cluster 1(ab) Cluster 2(cd) Cluster 3(ef) and Cluster4(gh)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

21 of 40

The outlier magnitude were defined from |5σ| to |9σ| The outlier of |5σ| was chosen because itis approximately the lowest MDB0(i) of the network when a single hypothesis testing is in play ascan be seen in Equation (14) or Equation (16) That MDB0(i) of |5σ| was computed for significancelevel of αprime = 0001 and a power of the test γ0 = 08 This strategy reduces the search space for a MIB(Minimal Identifiable Bias) because we will always have the following inequality MIB ge MDB0(i)[4755] Remember that the IDS procedure is an example of multiple hypothesis testing (see 2)

The sensitivity indicators MDB and MIB for IDS were also computed according to Equation (33)and (34) respectively The success rate for outlier detection and outlier identification were taken asbeing PCD = PCI = 08 respectively Table 5 provides the values of MDB and MIB for that case ofhard constraints

Table 5 MDB and MIB for the case of hard constraints based on αprime = 0001 and PCD = PCI = 08

1 hard constraint 2 hard constraints 3 hard constraintsCluster MDB (σ) MIB (σ) MDB (σ) MIB (σ) MDB (σ) MIB (σ)

1 75 - 63 63 57 572 67 68 63 64 63 643 64 64 63 63 58 584 64 64 64 64 64 64

Figure 5 shows the probability of wrong exclusion (PWE) The over-identification cases (Pover+

and Poverminus) were smaller than 0001 (ie they were practically null) There were not statistical overlap(Pol) for clusters 2 3 and 4 We will discuss more about statistical overlap (Pol) later

(c) (d)

(a) (b)

Figure 5 Probability of wrong exclusion (PWE) for the case of hard constraints and for αprime = 0001Cluster 1(a) Cluster 2(b) Cluster 3(c) and Cluster 4(d)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

22 of 40

43 Result of the Soft Constraint Effects on the Iterative Outlier Elimination Procedure

The both configuration in Figure 3(d) and Figure 3(e) were analysed in terms of soft constraintsIn that case the critical values were k = 395 k = 395 and k = 392 for two soft constraints withσc = 01mm σc = 1mm and σc = 10mm respectively In the case of three soft constraints the criticalvalues found were k = 399 k = 399 and k = 396 for σc = 01mm σc = 1mm and σc = 10mmrespectively All these critical values were computed for αprime = 0001

The correlation coefficient between w-test statistics (ρwi wj ) are displayed in Table 6 Table 7and Table 8 for two soft constraints with standard-deviation 10 times larger than measurements(ie σc = 10times σ = 10mm) 10 times better than measurements (ie σc = 01mm) and equal to themeasurements (σc = 1mm) respectively

Table 6 Correlation matrix of w-test statistics for the levelling network with two soft constraints withσc = 10mm in Figure 3(d)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14

wy1 10 02 01 01 02 10 -02 00 00 02 -04 -01 006 -01wy2 10 02 02 03 02 05 -05 -02 02 03 -03 003 00wy3 10 10 02 01 00 02 -02 00 01 04 01 -01wy4 10 02 01 00 02 -02 00 01 04 -01 01wy5 10 02 -02 02 05 -05 03 -03 00 00wy6 10 -02 00 00 02 -04 -01 -01 01wy7 10 -03 -03 -04 -04 -01 00 00wy8 10 -04 -03 -01 -04 00 00wy9 10 -03 01 04 00 00wy10 10 04 01 00 00wy11 10 -01 00 00wy12 10 00 00wy13 10 -10wy14 10

Table 7 Correlation matrix of w-test statistics for the levelling network with two soft constraints withσc = 01mm in Figure 3(d)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14

wy1 10 04 04 -03 -01 04 -03 01 01 01 -04 -01 06 -06wy2 10 04 -01 02 -01 04 -04 -01 01 03 -03 04 -04wy3 10 04 -01 -03 -01 03 -01 -01 01 04 06 -06wy4 10 04 04 01 01 -03 01 01 04 -06 06wy5 10 04 -01 01 04 -04 03 -03 -04 04wy6 10 -01 -01 -01 03 -04 -01 -06 06wy7 10 -04 -03 -04 -04 -01 -02 02wy8 10 -04 -03 -01 -04 02 -02wy9 10 -04 01 04 02 -02wy10 10 04 01 -02 02wy11 10 -01 00 00wy12 10 00 00wy13 10 -10wy14 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

23 of 40

Table 8 Correlation matrix of w-test statistics for the levelling network with two soft constraints withσc = 1mm in Figure 3(d)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14

wy1 10 03 02 -01 01 07 -03 00 01 02 -04 -01 04 -04wy2 10 03 01 03 01 04 -04 -01 01 03 -03 03 -03wy3 10 07 01 -01 00 03 -02 -01 01 04 04 -04wy4 10 03 02 01 02 -03 00 01 04 -04 04wy5 10 03 -01 01 04 -04 03 -03 -03 03wy6 10 -02 -01 00 03 -04 -01 -04 04wy7 10 -03 -03 -04 -04 -01 -01 01wy8 10 -04 -03 -01 -04 01 -01wy9 10 -03 01 04 01 -01wy10 10 04 01 -01 01wy11 10 -01 00 00wy12 10 00 00wy13 10 -10wy14 10

Table 9 gives the local redundancy (ri) the standard-deviation of the LS-estimated outlier σnablai andthe maximum absolute correlation (maxρwi wj

) for the scenarios of two constraints

Table 9 Local redundancy (ri) standard-deviation of the LS-estimated outlier σnablai (mm) and themaximum absolute correlation (maxρwi wj

) for each scenario of two soft constraints

σc = 01mm σc = 1mm σc = 10mmMeasurement ri σnablai maxρwi wj

ri σnablai maxρwi wjri σnablai maxρwi wj

y1 0581 1312 0564 0471 1457 0681 0397 1587 0994y2 0582 1311 0376 0533 1369 0423 0501 1413 0471y3 0581 1312 0564 0471 1457 0681 0397 1587 0994y4 0581 1312 0564 0471 1457 0681 0397 1587 0994y5 0582 1311 0376 0533 1369 0423 0501 1413 0471y6 0581 1312 0564 0471 1457 0681 0397 1587 0994y7 0583 1310 0359 0571 1324 0423 0563 1333 0471y8 0583 1310 0359 0571 1324 0423 0563 1333 0471y9 0583 1310 0359 0571 1324 0423 0563 1333 0471y10 0583 1310 0359 0571 1324 0423 0563 1333 0471y11 0583 1309 0358 0583 1309 0398 0583 1309 0433y12 0583 1309 0358 0583 1309 0398 0583 1309 0433y13 0007 1163 1000 0300 1826 1000 0497 14189 1000y14 0007 1163 1000 0300 1826 1000 0497 14189 1000

From Table 9 five clusters were defined for each case of two soft constraints ie for the casewhere heights A and D were given as soft constraints in Figure 3(d) as follows

bull Cluster 1 y1 y3 y4 and y6bull Cluster 2 y2 and y5bull Cluster 3 y7 y8 y9 and y10bull Cluster 4 y11 and y12bull Cluster 5 y13 and y14

The probabilities of correct identification (PCI) and correct detection (PCD) for the measurements(Cluster 1 to Cluster 4) subject to the scenarios of two soft constraints (heights A and D) are displayedin Figure 6

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

24 of 40

(c) (d)

(e) (f)

(g) (h)

(a) (b)

Figure 6 Probability of correct identification (PCI) and Probability of correct detection (PCD) for themeasurements subject to the scenarios of two soft constraints for αprime = 0001 Cluster 1(ab) Cluster2(cd) Cluster 3(ef) and Cluster 4(gh)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

25 of 40

Note that the Cluster 5 is associated with the two soft constraints (ie y13 and y14) The probabilityof correct identification (PCI) for these both soft constraints were null However the probability ofcorrect detection were not Figure 7 shows the probability of correct detection (PCD) for these two softconstraints (ie heights A and D)

Figure 7 Probability of correct detection PCD for the two soft constraints (Cluster 5 heights A and D)and for αprime = 0001

The probability of wrong exclusion PWE for the measurements (Cluster 1 to Cluster 4) subject tothe scenarios of two soft constraints (heights A and D) are displayed in Figure 8 Figure 9 gives the theprobability of wrong exclusion PWE for that two constraints (ie heights A and D)

(c) (d)

(a) (b)

Figure 8 Probability of wrong exclusion PWE for the measurements subject to the scenarios of two softconstraints for αprime = 0001 Cluster 1(a) Cluster 2(b) Cluster 3(c) and Cluster 4(d)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

26 of 40

Figure 9 Probability of wrong exclusion PWE for the two soft constraints (Cluster 5 heights A and D)and for αprime = 0001

The over-identification cases (Pover+ and Poverminus) and the statistical overlap (Pol) were practicallynull for that case

The sensitivity indicators (MDB and MIB) for each scenario of two soft constraints are displayedin Table 10

Table 10 MDB and MIB for the case of two soft constraints based on αprime = 0001 and PCD = PCI = 08

σc = 10mm σc = 1mm σc = 01mmCluster MDB (σ) MIB (σ) MDB (σ) MIB (σ) MDB (σ) MIB (σ)

1 75 25 7 71 63 632 68 68 66 66 63 633 64 64 64 64 63 634 63 63 63 63 63 635 68 - 88 - 57 -

As with the case of two soft constraints Tables 1112 and 13 show the correlations (ρwi wj ) for thecase where there were three soft constraints ie for the network configuration in Figure 3(e)

Table 11 Correlation matrix of w-test statistics for the levelling network with three soft constraintswith σc = 10mm in Figure 3(e)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14 wy15

wy1 10 02 01 01 02 10 -02 00 00 02 -04 -01 01 00 00wy2 10 02 02 03 02 05 -05 -02 02 03 -03 00 00 00wy3 10 10 02 01 00 02 -02 00 01 04 00 -01 00wy4 10 02 01 00 02 -02 00 01 04 00 01 00wy5 10 02 -02 02 05 -05 03 -03 00 00 00wy6 10 -02 00 00 02 -04 -01 -01 00 00wy7 10 -03 -03 -04 -04 -01 00 00 00wy8 10 -04 -03 -01 -04 00 00 00wy9 10 -03 01 04 00 00 00wy10 10 04 01 00 00 00wy11 10 -01 00 00 00wy12 10 00 00 00wy13 10 -05 -05wy14 10 -05wy15 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

27 of 40

Table 12 Correlation matrix of w-test statistics for the levelling network with three soft constraintswith σc = 01mm in Figure 3(e)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14 wy15

wy1 10 03 01 -01 -01 01 -04 -01 -01 -01 -03 -01 07 -01 -04wy2 10 03 -01 01 -01 03 -03 -01 01 03 -03 03 -03 00wy3 10 01 -01 -01 01 04 01 01 01 03 01 -07 04wy4 10 03 01 -01 -01 -04 -01 01 03 -01 07 -04wy5 10 03 -01 01 03 -03 03 -03 -03 03 00wy6 10 01 01 01 04 -03 -01 -07 01 04wy7 10 -01 -01 -01 -03 -01 -04 -01 04wy8 10 -01 -01 -01 -03 -01 -04 04wy9 10 -01 01 03 -01 -04 04wy10 10 03 01 -04 -01 04wy11 10 -01 00 00 00wy12 10 00 00 00wy13 10 -02 -06wy14 10 -06wy15 10

Table 13 Correlation matrix of w-test statistics for the levelling network with three soft constraintswith σc = 1mm in Figure 3(e)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14 wy15

wy1 10 03 01 00 01 06 -03 00 00 01 -04 -01 05 -02 -02wy2 10 03 01 03 01 04 -04 -01 01 03 -03 02 -02 00wy3 10 06 01 00 00 03 -01 00 01 04 02 -05 02wy4 10 03 01 00 01 -03 00 01 04 -02 05 -02wy5 10 03 -01 01 04 -04 03 -03 -02 02 00wy6 10 -01 00 00 03 -04 -01 -05 02 02wy7 10 -03 -02 -03 -04 -01 -02 00 02wy8 10 -03 -02 -01 -04 00 -02 02wy9 10 -03 01 04 00 -02 02wy10 10 04 01 -02 00 02wy11 10 -01 00 00 00wy12 10 00 00 00wy13 10 -04 -05wy14 10 -05wy15 10

Table 14 Local redundancy (ri) standard-deviation of the LS-estimated outlier σnablai (mm) and themaximum absolute correlation (maxρwi wj

) for each scenario of three soft constraints

σc = 01mm σc = 1mm σc = 10mmMeasurement ri σnablai maxρwi wj

ri σnablai maxρwi wjri σnablai maxρwi wj

y1 0702 1194 0660 0502 1411 0577 0398 1586 0992y2 0582 1311 0326 0533 1369 0412 0501 1413 0470y3 0702 1194 0660 0502 1411 0577 0398 1586 0992y4 0702 1194 0660 0502 1411 0577 0398 1586 0992y5 0582 1311 0326 0533 1369 0412 0501 1413 0470y6 0702 1194 0660 0502 1411 0577 0398 1586 0992y7 0704 1192 0415 0602 1289 0412 0563 1333 0470y8 0704 1192 0415 0602 1289 0412 0563 1333 0470y9 0704 1192 0415 0602 1289 0412 0563 1333 0470y10 0704 1192 0415 0602 1289 0412 0563 1333 0470y11 0583 1309 0326 0583 1309 0385 0583 1309 0433y12 0583 1309 0326 0583 1309 0385 0583 1309 0433y13 0012 0904 0660 0425 1534 0542 0663 12283 0501y14 0012 0904 0660 0425 1534 0542 0663 12283 0501y15 0019 0718 063 05 1414 0542 0665 12268 0501

The probabilities of correct identification (PCI) and correct detection PCD for the measurements(Cluster 1 to Cluster 4) subject to the scenarios of three soft constraints (heights A D and G) aredisplayed in Figure 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

28 of 40

(c) (d)

(e) (f)

(g) (h)

(a) (b)

Figure 10 Probability of correct identification (PCI) and Probability of correct detection (PCD) for themeasurements subject to the scenarios of three soft constraints for αprime = 0001 Cluster 1(ab) Cluster2(cd) Cluster 3(ef) and Cluster 4(gh)

The probabilities of correct identification (PCI) and correct detection (PCD) in Figure 8 werecomputed for the clusters based on Table 14 as follows

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

29 of 40

bull Cluster 1 y1 y3 y4 and y6bull Cluster 2 y2 and y5bull Cluster 3 y7 y8 y9 and y10bull Cluster 4 y11 and y12bull Cluster 5 y13 and y14bull Cluster 6 y15

Figure 11 shows the probabilities of correct identification PCI and correct detection PCD for thethree soft constraints ie for Cluster 5 (heights A and D) and Cluster 6 (height G) in Figure 3(e)

(c) (d)

(a) (b)

Figure 11 Probability of correct identification (PCI) and Probability of correct detection (PCD) for thethree constraints and for αprime = 0001 Cluster 5(ab) and Cluster 6(cd)

The probability of wrong exclusion PWE for the measurements (Cluster 1 to Cluster 4) subject tothe scenarios of three soft constraints (heights A D and G) are displayed in Figure 12 Figure 13 givesthe the probability of wrong exclusion PWE for that three constraints (ie heights A D and G)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

30 of 40

(c) (d)

(a) (b)

Figure 12 Probability of wrong exclusion PWE for the measurements subject to the scenarios of threesoft constraints and for αprime = 0001 Cluster 1(a) Cluster 2(b) Cluster 3(c) and Cluster 4(d)

(a) (b)

Figure 13 Probability of wrong exclusion (PWE) for the three constraints and for αprime = 0001 Cluster5(a) and Cluster 6(b)

The over-identification cases (Pover+ and Poverminus) and the statistical overlap (Pol) were alsopractically null for that case of three soft constraints

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

31 of 40

The sensitivity indicators (MDB and MIB) for each scenario of three soft constraints are displayedin Table 15

Table 15 MDB and MIB for the case of three soft constraints based on αprime = 0001 and PCD = PCI = 08

σc = 10mm σc = 1mm σc = 01mmCluster MDB (σ) MIB (σ) MDB (σ) MIB (σ) MDB (σ) MIB (σ)

1 75 22 68 69 58 592 68 69 66 67 64 643 64 64 63 63 58 584 63 63 63 63 63 635 59 60 74 75 435 456 59 59 69 69 346 355

5 Discussion

We start from the scenario of one hard constraint in Figure 3(a) Table 4 reveal that the maximumcorrelation between w-test statistics for the measurements constituting Cluster 1 is exactly equal to100 (ie maxρwi wj

= 100) This means that the measurements belonging to Cluster 1 are those thatare connected with unknown heights whose connections are limited to only two The both unknownheights A and D are tied only to two measurements (ie y1 and y6 linked to A and y3 and y4 linked toD) therefore if an outlier occurred in one of these measurements we would only be able to analyse theconsistency between them but we would not be able to distinguish which of them was contaminatedby an outlier This means that we would only be able to detect because the w-test statistics could belarger than a critical value k however in that case the values of w-test statistics would be the same andtherefore we would not have only one unique maximum w-test statistics but actually would have fourmaximum w-test statistics In other words the equation systems associated with the measurements ofCluster 1 are linearly dependent [87] Therefore there is no reliability in terms of outlier identificationfor the Cluster 1 as can be seen in Figure 3(a)

From Figure 3(b) we can note that there is reliability in terms of outlier detection for Cluster 1and it is caused by overlapping w-test statistics The probability of statistics overlap Pol for Cluster 1in this scenario of minimally constrained network is displayed in Figure 14

Figure 14 Probability of correct detection PCD and statistical overlap (Pol) for the Cluster 1 subject toone hard constraint and for αprime = 0001

The problem of not having more connections (ie more measurements) for the unknown heightsA and D in the case of one hard constraint with G fixed is overcome when these heights (A and D) aretaken as hard constraints in 3(b) or when the heights A D and G are hard constraints in figure 3(c)Figure 3(a b) show that the measurements of Cluster 1 are now able to identify an outlier when twohard constraints (A and D fixed) are in play It is also verified to the case of three hard constraints (A Dand G fixes) in Figure 3(e f) ie there is reliability in terms of both outlier detection and identificationfor these measurements in those conditions

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

32 of 40

One can argue that the more constraints the larger the probability of correct identification (PCI)and correct detection (PCD) This claim is not entirely correct This may be true in case where all pointson the network have the same number of connections However in the example given if both heightsB and G were selected as hard constraints the measurements of the Cluster 1 (ie y1 y6 y3 and y4)would still have their least-squares residuals fully correlated (ie ρwy1 wy6

= 100 and ρwy3 wy4= 100)

and therefore there would be no reliability in terms of outlier identification for these measurementsFrom Table 5 we observe different behaviour for the clusters as follows

bull Cluster 1 there was no MIB for the case of having only one single hard constraint whereasMDB = MIB for the other cases However both MDB and MIB decrease significantly with theincrease in the number of hard constraints

bull Cluster 2 MDB slightly smaller than MIB Both MDB and MIB were practically the same for thecase of having two or three hard constraints

bull Cluster 3 MDB = MIB for all cases of hard constraints however both MDB and MIB decreasesignificantly with the increase in the number of hard constraints

bull Cluster 4 MDB e MIB were equal for all cases

In terms of outlier detection and identification therefore Cluster 1 was more sensitive toconstraints Cluster 3 relatively sensitive to constraints whereas the Cluster 4 completely insensitiveto constraints and Cluster 2 relatively insensitive to constraints This is also can be seen in Figure 4The reason for this is that the local redundancy (ri) of the Cluster 1 increased with the increase of thenumber of hard constraints whereas the Cluster 4 have remained the same as can be seen in Table 4

Leaving aside the cases of statistical overlap (Pol) the network presents low least-squares residualscorrelation (ρwi wj lt 05) and high local redundancy (ri gt 05) Because of this the probabilities ofwrong exclusion (PWE) were less than 1 as can be seen in Figure 5 The over-identification cases(Pover+ and Poverminus) were practically null Consequently PCI asymp PCD Due of this fact the family-wiseerror rate (αprime) should be increased in order to have more success rate in the outlier detection andidentification [45]

From Figure 15 we observe that increasing the αprime increases the both probabilities of correctidentification (PCI) and correct detection (PCD) for outlier magnitude from 5σ to 6σ in the case ofthree hard constraints and from 5σ to 68σ in the case of two hard constraints Although the rates ofPover+ and PWE also increase they are not significant when compared to the improvement of correctidentification (PCI) and detection (PCD) This same analysis can be done for the other clusters

In terms of soft constraints for the cases of two constraints in Figure 3(d) we observe from Table 9that the larger the relaxation of the constraint (ie the larger the standard-deviation of the constraintσc) the larger the residuals correlation (ρwi wj ) and the standard-deviation of the outlier σnablai and thesmaller the local redundancy (ri) Consequently the probabilities of correct identification (PCI) anddetection (PCD) get smaller and smaller with the relaxation of the constraints whereas the probabilityof wrong exclusion gets larger (PWE) This can be more clearly verified in Figure 6(a b) and Figure8(a) for Cluster 1 whose measurements are connected with the constraints A and D (ie y13 and y14 inTable 9 respectively)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

33 of 40

(a) (b)

(c) (d)

Figure 15 Probabilities of correct identification PCI (a) correct detection PCD (b) over-identificationpositive Pover+ (c) and wrong exclusion PWE for the Cluster 1 subject to two and three hard constraintsand for αprime = 0001 and αprime = 01

Note from Figure 8(a) that the probability of wrong exclusion (PWE) increases as the magnitude ofthe outlier (nablai) increases However this is true up to a certain limit of outlier magnitude The effect ofresiduals correlation ρwi wj on the rates of wrong exclusion (PWE) and correct identification (PCI) tendsto decrease with the increase in the magnitude of the outlier nablai This effect is more clearly verified forCluster 1 in case where the precision of the constraints are ten times worse than the measurementsσc = 10σ = 10mm

Note from Figure 6 that identifying an outlier in the Cluster 1 (ie y1 y3 y4 and y6) whenσc = 10mm is more difficult than the other clusters This is due to the fact the Cluster 1 has a higherresiduals correlation ρwi wj = 0994 than other clusters In general therefore we observe that the largerthe relaxation of the constraints the larger is the effect of the correlation ρwi wj on the success rate ofoutlier identification (PCI) Consequently the higher the sensitivity indicator for outlier identification(MIB) Table 10 reveals that the ratio between MIB and MDB for the Cluster 1 and for the scenariowhere the standard-deviations of that two soft constraints are σc = 10mm is MIBMDB=2575=33On the other hand the relationship between MIB and MDB is practically one (ie MIBMDB=10) forthe others scenarios

If the FWE rate (αprime) were increased for the case where the two soft constraints of σc = 10mmare in play we would not have great advantages for Cluster 1 due to its high residuals correlation(ρwi wj = 994) From Figure 16 we can observe that the probability of correct identification (PCI)for outlier magnitudes from 5σ to 8σ is effectively larger to a user-defined αprime = 01 than αprime = 0001However the success rate is still less than 80 ie PCI lt 08 Note for example the correct

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

34 of 40

identification rate is PCI = 56 for an outlier magnitude of nablai = 8σ and αprime = 01 For αprime = 01 theMIB = 335σ = 335mm whereas for αprime = 0001 is MIB = 25σ = 25mm Therefore in that case theMIB for PCI = 08(80) and αprime = 01 would be 34 larger than user-defined αprime = 0001

Figure 16 Probabilities of correct identification (PCI) for the Cluster 1 subject to two soft constraints (2sc) A and D for αprime = 0001 and αprime = 01

The soft constraints A and D were grouped in the Cluster 5 (ie A and D were treated aspseudo-observations in the model) There is no reliability in terms of outlier identification for theconstraints because the residual correlation between them is ρwi wj = 100 as can be seen in Table 9for y13 and y14 However these soft constraints are able to detect an outlier In that case the probabilityof correct detection PCD in Figure 7 is mainly caused by the statistical overlap Pol as can be seen in theexample of the case of σc = 10mm in Figure 17 From Table 10 we observe that the larger the relaxationof the constraints the larger the MDB Note that the values of MDB are given in σ and therefore theMDB for σc = 10mm is larger than σc = 1mm and σc = 01mm ie we had the following inequalityMDB = 68σc = 68times 10mm = 68mm gt MDB = 88σc = 88times 1mm = 88mm gt MDB = 57σc =

57times 01mm = 57mm In that case if the FWE (αprime) were increased the rate of outlier detection by theCluster 4 (ie by the soft constraints) would increase

Figure 17 Probabilities of correct detection PCD and statistical overlap Pol for the two soft constraintsA and D (Cluster 5) with σc = 10mm and for αprime = 0001

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

35 of 40

Similar effects of the relaxation of the constraints on the performance of the IDS in case of two softconstraints are verified in case of three soft constraints as can be seen in the Figures 10 11 12 and 13

In case of having three soft constraints in Figure 3(e) there is reliability in terms of outlieridentification for the three pseudo-observations y13 y14 and y15 (ie for AD and G) as can be seen inFigure 10 and Table 15 In that case we also observe that the probabilities of correct detection PCD ofthe soft constraints A and D (ie Cluster 5) were approximately 13 for σc = 10mm 16 for σc = 1mmand 24 for σc = 01mm larger than the scenario of the network subject to two soft constraints Table15 reveals that the advantage of having three soft constraints instead of two constraints is that theconstraints become identifiable in the presence of an outlier The behaviour of the probabilities ofcorrect detection (PCD) correct identification (PCI) and wrong exclusion (PWE) was similar to the caseof the two soft constraints

Furthermore the larger the relaxation of the constraints the smaller the residuals correlationbetween the measurements and the soft constraints and the larger the residuals correlation amongthe measurements This effect can be verified in Tables 6 7 and 8 for two soft constraints as well as inTables 11 12 and 13 for three soft constraints

In general we observe that the case of two soft constraints for σc = 01mm was comparablewith two hard constraints (see eg Table 5 and Table 10) in terms of the probability levels associatedwith IDS for the measurements (ie clusters 1 2 3 and 4) In the same way for the case of twosoft constraints with σc = 1mm or σc = 10mm the probabilities levels were similar to the one hardconstraint for that measurements with the benefit of two soft constraints having reliability in terms ofoutlier identification for the Cluster 1 Finally the three soft constraints with σc = 1mm and σc = 10mmwere comparable to the two soft constraints for that scenario of constraints relaxation wheres thethree soft constraints for σc = 01mm showed similar outcomes with three hard constraints for themeasurements (see eg Table 5 and Table 15) In that case however an advantage of the three softconstraints on the three hard constraints is the possibility of analysing the sensitivity of the constraintsIt is important to emphasised that the stochastic models of the measurements and constraints wereassumed well-known and defined for the analyzes performed here

6 Conclusions

In general the probability of correct identification (PCI) for the case of hard constraints is largerthan soft constraints It can be verified in Figure 4 Figure 6 and Figure 10 Therefore hard constraintsshould be used in the stage of pre-processing data for the purpose of identifying and removing possibleoutlying measurements In that process one should opt to set out the redundant hard constraints atpoints in the network where the smallest connections exist After identifying and removing possibleoutliers the soft constraints should be employed to propagate the uncertainties of the constraints(pseudobservations) to the model parameters during the process of least-squares estimation Thisrecommendation is valid for outlier detection and identification purpose

We highlight the main findings of this research as follows

bull Under a system of a high local redundancy ri gt 05 and low residuals correlation (ρwi wj lt 05) ifone increase the family-wise error rate (FWE) of the test statistic the performance of the procedurewill be improved for both scenarios of hard constraints and soft constraintsbull The larger the relaxation of the constraints the larger is the effect of the residuals correlation

(ρwi wj ) on the success rate of outlier identification (PCI) Consequently the higher the sensitivityindicator for outlier identification (MIB) and therefore more difficult it becomes to identify anoutlier

bull Under a scenario of soft constraints one should set out at least three soft constraints in order toidentify an outlier in the constraints

In other types of analysis for example deformation analysis of geodetic networks one shouldformulated the constraints as pseudo-observations with σc gt 0 in an adjustment model of system

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

36 of 40

equations In other words the relaxation of constraints makes it possible to estimate deformationeffects that are unmodelled in the deterministic model matrix for example deformations which have aspatial and temporal variations In that case the deformation can be modelled by covariance functionsthat determine the cofactors between constraints [28] This should be applied to control points locatedin the structure but not to control points located outside the structure The control points locatedoutside the structure should be hard constraints and have their stability verified by an independentprocess in order to analyse the deformation of the structure So one will be also able to analyse thesensitivity in terms of minimal detectable deformations andor minimal identifiable deformations [23]Future researches will be addressed to sensitivity analysis for deformation monitoring networks

Author Contributions Conceptualization VFR MTM and IK Data curation VFR MTM and IK Formalanalysis VFR and IK Funding acquisition MTM MRV and LGdSJ Investigation VFR MethodologyVFR and MTM Project administration MTM IK MRV and LGdSJ Software VFR and LGdSJSupervision MTM IK and MRV Validation VFR and MRV Writing ndash original draft VFR Writing ndashreview amp editing VFR MTM IK and MRV All authors have read and agreed to the published version of themanuscript

Funding This research was funded by the CNPqmdashConselho Nacional de Desenvolvimento Cientiacutefico eTecnoloacutegicomdashBrasil (proc no 1035872019-5) This research and the APC were also funded by PETROBRAS(Grant Number 201800545-0)

Acknowledgments In this section you can acknowledge any support given which is not covered by the authorcontribution or funding sections This may include administrative and technical support or donations in kind(eg materials used for experiments)

Conflicts of Interest The authors declare no conflict of interest

References

1 Lehmann R Neitzel F Testing the compatibility of constraints for parameters of a geodetic adjustmentmodel Journal of Geodesy 2013 87 555ndash566 doi101007s00190-013-0627-2

2 Fang X Weighted total least-squares with constraints a universal formula for geodetic symmetricaltransformations Journal of Geodesy 2015 89 459ndash469 doi101007s00190-015-0790-8

3 Amiri-Simkooei AR Weighted Total Least Squares with Singular Covariance Matrices Subject to Weightedand Hard Constraints J Surv Eng 2017 143 04017018

4 Sakic P Ballu V Royer JY A Multi-Observation Least-Squares Inversion for GNSS-Acoustic SeafloorPositioning Remote Sensing 2020 12 doi103390rs12030448

5 Odijk D Stochastic modelling of the ionosphere for fast GPS ambiguity resolution Geodesy Beyond 2000Schwarz KP Ed Springer Berlin Heidelberg Berlin Heidelberg 2000 pp 387ndash392

6 de Oliveira PS Morel L Fund F Legros R Monico JFG Durand S Durand F Modelingtropospheric wet delays with dense and sparse network configurations for PPP-RTK GPS Solutions 201721 237ndash250 doi101007s10291-016-0518-0

7 Chen B Wu L Dai W Luo X Xu Y A new parameterized approach for ionospheric tomography GPSSolutions 2019 23 96 doi101007s10291-019-0893-4

8 Ahmed E Ahmed S Mohamed D Mostafa R jag 2019 Vol 13 chapter Vertical ionospheric delayestimation for single-receiver operation p 81 2 doi101515jag-2018-0041

9 Wang A Chen J Zhang Y Meng L Wang J Performance of Selected Ionospheric Models inMulti-Global Navigation Satellite System Single-Frequency Positioning over China Remote Sensing 201911 doi103390rs11172070

10 Deo M El-Mowafy A Comparison of advanced troposphere models for aiding reductionof PPP convergence time in Australia Journal of Spatial Science 2019 64 381ndash403doi1010801449859620181472046

11 Dai X Lou Y Dai Z Hu C Peng Y Qiao J Shi C Precise Orbit Determination for GNSS ManeuveringSatellite with the Constraint of a Predicted Clock Remote Sensing 2019 11 doi103390rs11161949

12 Courant R Hilbert D Methods of mathematical physics Vol 2 Vol 2 Wiley [New York] 196213 Grafarend EW Optimization of Geodetic Networks The Canadian Surveyor 1974 28 716ndash723

doi101139tcs-1974-0120

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

37 of 40

14 Teunissen P Zero Order Design Generalized Inverses Adjustment the Datum Problem andS-Transformations Optimization and Design of Geodetic Networks Grafarend EW Sansograve F EdsSpringer Berlin Heidelberg Berlin Heidelberg 1985 pp 11ndash55

15 Dermanis A Free network solutions with the DLT method ISPRS Journal of Photogrammetry and RemoteSensing 1994 49 2 ndash 12 doihttpsdoiorg1010160924-2716(94)90061-2

16 Kotsakis C Reference frame stability and nonlinear distortion in minimum-constrained networkadjustment Journal of Geodesy 2012 86 755ndash774 doi101007s00190-012-0555-6

17 Kotsakis C Datum Definition and Minimal Constraints In Encyclopedia of Geodesy Grafarend E EdSpringer International Publishing Cham 2018 pp 1ndash6 doi101007978-3-319-02370-0_157-1

18 Matsuoka MT Rofatto VF Klein I Roberto Veronez M da Silveira LG Neto JBS Alves ACRControl Points Selection Based on Maximum External Reliability for Designing Geodetic Networks AppliedSciences 2020 10 doi103390app10020687

19 Baarda W S-transformations and criterion matrices Publ on geodesy New Series 1973 520 Schaffrin B Aspects of Network Design Optimization and Design of Geodetic Networks Grafarend

EW Sansograve F Eds Springer Berlin Heidelberg Berlin Heidelberg 1985 pp 548ndash59721 Xu PB A general solution in geodetic nonlinear rank-defect models 199722 Altamimi Z Dermanis A The Choice of Reference System in ITRF Formulation VII Hotine-Marussi

Symposium on Mathematical Geodesy Sneeuw N Novaacutek P Crespi M Sansograve F Eds Springer BerlinHeidelberg Berlin Heidelberg 2012 pp 329ndash334

23 Velsink H On the deformation analysis of point fields J Geod 2015 89 1071ndash1087doi101007s00190-015-0835-z

24 Velsink H Extendable linearised adjustment model for deformation analysis Survey Review 201547 397ndash410 doi1011791752270614Y0000000140

25 Hiddo V jag 2016 Vol 10 chapter Time Series Analysis of 3D Coordinates Using NonstochasticObservations p 5 1 doi101515jag-2015-0027

26 Rao CR Markoffrsquos Theorem with Linear Restrictions on Parameters SankhyAuml The Indian Journal ofStatistics (1933-1960) 1945 7 16ndash19

27 Velsink H Testing Methods for Adjustment Models with Constraints Journal of Surveying Engineering2018 144 04018009 doi101061(ASCE)SU1943-54280000260

28 Hiddo V Testing deformation hypotheses by constraints on a time series of geodetic observations J ApplGeod 2017 12 77ndash93 doi101515jag-2017-0028

29 Baarda W A testing procedure for use in geodetic networks Publ on geodesy New Series 1968 230 Teunissen P First and second moments of non-linear least-squares estimators Bull Geodesique (Journal of

Geodesy) 1989 63 253ndash262 doihttpsdoiorg101007BF0252047531 Teunissen P Testing Theory an introduction 2 ed Delft University Press 200632 Amiri-Simkooei A Least-squares variance component estimation theory and GPS applications Phd

thesis Delft University of Technology 200733 Koch KR Parameter estimation and hypothesis testing in linear models 2 ed Springer 199934 Tao H Feng H Xu L Miao M Long H Yue J Li Z Yang G Yang X Fan L Estimation

of Crop Growth Parameters Using UAV-Based Hyperspectral Remote Sensing Data Sensors 2020 20doi103390s20051296

35 Han M Wang Q Wen Y He M He X The Application of Robust Least Squares Method inFrequency Lock Loop Fusion for Global Navigation Satellite System Receivers Sensors 2020 20doi103390s20041224

36 Wei C Estimation for the Discretely Observed CoxacircldquoIngersollacircldquoRoss Model Driven by Small SymmetricalStable Noises Symmetry 2020 12 doi103390sym12030327

37 Sakic P Ballu V Royer JY A Multi-Observation Least-Squares Inversion for GNSS-Acoustic SeafloorPositioning Remote Sensing 2020 12 doi103390rs12030448

38 Livadiotis G General Fitting Methods Based on Lq Norms and their Optimization Stats 2020 3 16ndash31doi103390stats3010002

39 Farooq SZ Yang D Ada ENJ A Cycle Slip Detection Framework for Reliable Single Frequency RTKPositioning Sensors 2020 20 doi103390s20010304

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

38 of 40

40 Araveeporn A Comparing Parameter Estimation of Random Coefficient Autoregressive Model byFrequentist Method Mathematics 2020 8 doi103390math8010062

41 Zhang C Peng T Zhou J Ji J Wang X An Improved Autoencoder and Partial Least SquaresRegression-Based Extreme Learning Machine Model for Pump Turbine Characteristics Applied Sciences2019 9 doi103390app9193987

42 Ji J Yang M Jiang L He J Teng Z Liu Y Song H Output-Only Parameters Identification ofEarthquake-Excited Building Structures with Least Squares and Input Modification Process AppliedSciences 2019 9 doi103390app9040696

43 Kargoll B On the theory and application of model misspecification tests in geodesy Doctoral thesisUniversity of Bonn Landwirtschaftliche Fakultaumlt German Bonn 2007

44 Lehmann R On the formulation of the alternative hypothesis for geodetic outlier detection J Geod 201387 373ndash386 doi101007s00190-012-0607-y

45 Rofatto VF Matsuoka MT Klein I Roberto Veronez M da Silveira LG A Monte Carlo-Based OutlierDiagnosis Method for Sensitivity Analysis Remote Sensing 2020 12 doi103390rs12050860

46 Lehmann R Improved critical values for extreme normalized and studentized residuals in GaussndashMarkovmodels J Geod 2012 86 1137ndash1146 doi101007s00190-012-0569-0

47 Rofatto VF Matsuoka MT Klein I Veronez MR Bonimani ML Lehmann R A half-century ofBaardarsquos concept of reliability a review new perspectives and applications Surv Rev 2018 0 1ndash17[httpsdoiorg1010800039626520181548118] doi1010800039626520181548118

48 Zaminpardaz S Teunissen P DIA-datasnooping and identifiability J Geod 2019 93 85ndash101doi101007s00190-018-1141-3

49 Kok JJ States U On data snooping and multiple outlier testing [microform] Johan J Kok US Dept ofCommerce National Oceanic and Atmospheric Administration National Ocean Service Charting andGeodetic Services For sale by the National Geodetic Information Center NOAA Rockville Md 1984 p61 p

50 Knight NL Wang J Rizos C Generalised measures of reliability for multiple outliers Journal of Geodesy2010 84 625ndash635 doi101007s00190-010-0392-4

51 Gui Q Li X Gong Y Li B Li G A Bayesian unmasking method for locating multiplegross errors based on posterior probabilities of classification variables J Geod 2011 85 191ndash203doi101007s00190-010-0429-8

52 Klein I Matsuoka MT Guzatto MP Nievinski FG An approach to identify multipleoutliers based on sequential likelihood ratio tests Surv Rev 2017 49 449ndash457[httpsdoiorg1010800039626520161212970] doi1010800039626520161212970

53 Hawkins DM Identification of Outliers 1 ed Springer Netherlands 1980 httpsdoiorg101007978-94-015-3994-4

54 Baarda W Statistical concepts in geodesy Publ on geodesy New Series 1967 255 Imparato D Teunissen P Tiberius C Minimal Detectable and Identifiable Biases for quality

control Surv Rev 2019 51 289ndash299 [httpsdoiorg1010800039626520181437947]doi1010800039626520181437947

56 Teunissen PJG Distributional theory for the DIA method Journal of Geodesy 2018 92 59ndash80doi101007s00190-017-1045-7

57 Foumlrstner W Reliability and discernability of extended Gauss-Markov models Seminar on mathematicalmodels of GeodeticPhotogrammetric Point Determination with Regard to Outliers and Systematic Errors 1983 Vol Series A pp 79ndash103

58 Wang J Knight NL New Outlier Separability Test and Its Application in GNSS Positioning 201259 Lehmann R Loumlsler M Multiple Outlier Detection Hypothesis Tests versus Model Selection by

Information Criteria J Surv Eng 2016 142 0401601760 Teunissen PJG Imparato D Tiberius CCJM Does RAIM with Correct Exclusion Produce Unbiased

Positions Sensors 2017 17 doi103390s1707150861 Lehmann R Loumlsler M Congruence analysis of geodetic networks ndash hypothesis tests versus model

selection by information criteria J Appl Geod 2017 11 271ndash283 doi101515jag-2016-0049

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

39 of 40

62 Rofatto V Matsuoka M Klein I DESIGN OF GEODETIC NETWORKS BASED ON OUTLIERIDENTIFICATION CRITERIA AN EXAMPLE APPLIED TO THE LEVELING NETWORK Bull Geod Sci2018 24 152ndash170 httpdxdoiorg101590s1982-21702018000200011

63 Nguyen VK Renault Milocco R Environment Monitoring for Anomaly Detection System UsingSmartphones Sensors 2019 19 doi103390s19183834

64 Nie Y Yang L Shen Y Specific Direction-Based Outlier Detection Approach for GNSS Vector NetworksSensors 2019 19 doi103390s19081836

65 Klein I Matsuoka MT Guzatto MP Nievinski FG Veronez MR Rofatto VF A newrelationship between the quality criteria for geodetic networks Journal of Geodesy 2019 93 529ndash544doi101007s00190-018-1181-8

66 Mierlo JV Statistical Analysis of Geodetic Measurements for the Investigation of Crustal Movements InRecent Crustal Movements 1977 Whitten C Green R Meade B Eds Elsevier 1979 Vol 13 Developmentsin Geotectonics pp 457 ndash 467 doihttpsdoiorg101016B978-0-444-41783-150072-6

67 van der Marel H Roumlsters AJM Statistical Testing and Quality Analysis in 3-D Networks (part II)Application to GPS Global Positioning System An Overview Bock Y Leppard N Eds Springer NewYork New York NY 1990 pp 290ndash297

68 Yang L Wang J Knight NL Shen Y Outlier separability analysis with a multiple alternative hypothesestest J Geod 2013 87 591ndash604 doi101007s00190-013-0629-0

69 Proacuteszynski W Revisiting Baardarsquos concept of minimal detectable bias with regard to outlier identifiabilityJ Geod 2015 89 993ndash1003 doi101007s00190-015-0828-y

70 Romano JP Wolf M Multiple Testing of One-Sided Hypotheses Combining Bonferroni and the BootstrapPredictive Econometrics and Big Data Kreinovich V Sriboonchitta S Chakpitak N Eds SpringerInternational Publishing Cham 2018 pp 78ndash94

71 Teunissen P An Integrity and Quality Control Procedure for use in Multi Sensor Integration 199072 Aydin C Demirel H Computation of Baardarsquos lower bound of the non-centrality parameter J Geod

2004 78 437ndash441 doi101007s00190-004-0406-173 Aydin C Power of Global Test in Deformation Analysis J Surv Eng 2012 138 51ndash56 https

doiorg101061(ASCE)SU1943-5428000006474 Bonferroni C Teoria statistica delle classi e calcolo delle probabilitagrave Pubblicazioni del R Istituto superiore di

scienze economiche e commerciali di Firenze Libreria internazionale Seeber 193675 Box GEP Muller ME A Note on the Generation of Random Normal Deviates The Annals of Mathematical

Statistics 1958 29 610ndash611 httpsdoi101214aoms117770664576 Lemeshko BY Lemeshko SB Extending the Application of Grubbs-Type Tests in Rejecting Anomalous

Measurements Measurement Techniques 2005 48 536ndash547 doi101007s11018-005-0179-977 Lehmann R Scheffler T Monte Carlo based data snooping with application to a geodetic network J

Appl Geod 2011 5 123ndash134 httpsdoiorg101515JAG201101478 Matsumoto M Nishimura T Mersenne twister A 623-dimensionally equidistributed uniform

pseudo-random number generator ACM Transactions on Modeling and Computer Simulation 1998 8 3ndash30httpsdlacmorgcitationcfmid=272991

79 Ihde J Saacutenchez L Barzaghi R Drewes H Foerste C Gruber T Liebsch G Marti U Pail R SiderisM Definition and Proposed Realization of the International Height Reference System (IHRS) Surveys inGeophysics 2017 38 549ndash570 doi101007s10712-017-9409-3

80 Lehmann R Observation error model selection by information criteria vs normality testing StudiaGeophysica et Geodaetica 2015 59 489ndash504 doi101007s11200-015-0725-0

81 Algarni DA Ali AE Heighting and Distance Accuracy with Electronic Digital Levels Journal of King SaudUniversity - Engineering Sciences 1998 10 229 ndash 239 doihttpsdoiorg101016S1018-3639(18)30698-6

82 Gemin ARS Matos AS Faggion PLAs APPLICATION OF CALIBRATION CERTIFICATE OFDIGITAL LEVELING SYSTEMS IN THE MONITORING OF STRUCTURES A CASE STUDY AT THEGOVERNADOR JOSAtilde RICHA HYDROELECTRIC POWER PLANT - PR Boletim de CiAtildeGeodAtilde 201824 235 ndash 249

83 Takalo M Rouhiainen P Development of a System Calibration Comparator for Digital Levels in FinlandNordic Journal of Surveying and Real Estate Research 1 1

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

40 of 40

84 Wiedemann W Wagner A Wunderlich T Using IATS to Read and Analyze Digital Levelling Staffs SIG2016 Paar R Marendic A Zrinjski M Eds Croatian Geodetic Society 2016 pp 515ndash526

85 PJG T The Geometry of Geodetic Inverse Linear Mapping and Non-linear Adjustment Publications onGeodesy New Series 1985 8

86 Rao CR Mitra SK Generalized inverse of a matrix and its applications 197287 Hekimoglu S Erenoglu RC Sanli DU Erdogan B Detecting Configuration Weaknesses in Geodetic

Networks Survey Review 2011 43 713ndash730 doi101179003962611X13117748892632

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

Page 18: Effects of Hard and Soft Equality Constraints on

18 of 40

The correlation coefficient between w-test statistics (ρwi wj ) were computed for the three scenariosaccording to Equation(20) Table 1 provides that correlation for that network minimally constrainedTable 2 for that network over-constrained with two hard constraints and Table 3 for that networkover-constrained with three hard constraints

Table 1 Correlation matrix of w-test statistics for the levelling network minimally constrained in Figure3(a)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12

wy1 10 02 01 01 02 10 -02 00 00 02 -04 -01wy2 10 02 02 03 02 05 -05 -02 02 03 -03wy3 10 10 02 01 00 02 -02 00 01 04wy4 10 02 01 00 02 -02 00 01 04wy5 10 02 -02 02 05 -05 03 -03wy6 10 -02 00 00 02 -04 -01wy7 10 -03 -03 -04 -04 -01wy8 10 -04 -03 -01 -04wy9 10 -03 01 04wy10 10 04 01wy11 10 -01wy12 100

Table 2 Correlation matrix of w-test statistics for the levelling network with two hard constraints inFigure 3(b)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12

wy1 10 04 04 -03 -01 04 -03 01 01 01 -04 -01wy2 10 04 -01 01 -01 04 -04 -01 01 03 -03wy3 10 04 -01 -03 -01 03 -01 -01 01 04wy4 10 04 04 01 01 -03 01 01 04wy5 10 04 -01 01 04 -04 03 -03wy6 10 -01 -01 -01 03 -04 -01wy7 10 -04 -03 -04 -04 -01wy8 10 -04 -03 -01 -04wy9 10 -04 01 04wy10 10 04 01wy11 10 -01wy12 10

Table 3 Correlation matrix of w-test statistics for the levelling network with three hard constraints inFigure 3(c)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12

wy1 10 03 01 -01 -01 01 -04 -01 -01 -01 -03 -01wy2 10 03 -01 01 -01 03 -03 -01 01 03 -03wy3 10 01 -01 -01 01 04 01 01 01 03wy4 10 03 01 -01 -01 -04 -01 01 03wy5 10 03 -01 01 03 -03 03 -03wy6 10 01 01 01 04 -03 -01wy7 10 -01 -01 -01 -03 -01wy8 10 -01 -01 -01 -03wy9 10 -01 01 03wy10 10 03 01wy11 10 -01wy12 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

19 of 40

Table 4 gives the local redundancy (ri) the standard-deviation of the LS-estimated outlier σnablai andthe maximum absolute correlation (maxρwi wj

) for each scenario of hard constraint set out of in thisstudy ie Figure 3(ab and c)

Table 4 Local redundancy (ri) standard-deviation of the LS-estimated outlier σnablai and the maximumabsolute correlation (maxρwi wj

) for each scenario of hard constraint

1 hard constraint 2 hard constraints 3 hard constraintsMeasurement ri σnablai maxρwi wj

ri σnablai maxρwi wjri σnablai maxρwi wj

y1 0396 1589 100 0583 1309 036 0708 1188 041y2 0500 1414 047 0583 1309 036 0583 1309 032y3 0396 1589 100 0583 1309 036 0708 1188 041y4 0396 1589 100 0583 1309 036 0708 1188 041y5 0500 1414 047 0583 1309 036 0583 1309 032y6 0396 1589 100 0583 1309 036 0708 1188 041y7 0563 1333 047 0583 1309 036 0708 1188 041y8 0563 1333 047 0583 1309 036 0708 1188 041y9 0563 1333 047 0583 1309 036 0708 1188 041y10 0563 1333 047 0583 1309 036 0708 1188 041y11 0583 1309 043 0583 1309 036 0583 1309 032y12 0583 1309 043 0583 1309 036 0583 1309 032

Here the twelve levelling measurements were clustered into four clusters The clustering wasdefined according to the local redundancy (ri) and the maximum absolute correlation (maxρwi wj

) inTable 4 This is the first time that a clustering technique based on the similarity of local redundancyand the maximum absolute correlation between w-test statistics is applied to a problem of geodeticnetworks Similarly this has been done in [45] The four cluster were defined as follows

bull Cluster 1 y1 y3 y4 and y6bull Cluster 2 y2 and y5bull Cluster 3 y7 y8 y9 and y10bull Cluster 4 y11 and y12

The probability levels associated with IDS were averaged for each of these clusters Here we computethe probability levels of the IDS based on the procedure in Section (3) [45] The critical values werek = 389 k = 393 and k = 393 for 1 hard constraint 2 hard constraints and 3 hard constraintsrespectively These critical values were found for αprime = 0001 according to the procedure described inAppendix The probability of correct identification (PCI) and correct detection (PCD = 1minusPMD)are displayed in Figure 4 for each number of hard constraint (denoted by hc)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

20 of 40

(c) (d)

(e) (f)

(g) (h)

(a) (b)

Figure 4 Probability of correct identification (PCI) and Probability of correct detection (PCD) for thecase of hard constraints and for αprime = 0001 Cluster 1(ab) Cluster 2(cd) Cluster 3(ef) and Cluster4(gh)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

21 of 40

The outlier magnitude were defined from |5σ| to |9σ| The outlier of |5σ| was chosen because itis approximately the lowest MDB0(i) of the network when a single hypothesis testing is in play ascan be seen in Equation (14) or Equation (16) That MDB0(i) of |5σ| was computed for significancelevel of αprime = 0001 and a power of the test γ0 = 08 This strategy reduces the search space for a MIB(Minimal Identifiable Bias) because we will always have the following inequality MIB ge MDB0(i)[4755] Remember that the IDS procedure is an example of multiple hypothesis testing (see 2)

The sensitivity indicators MDB and MIB for IDS were also computed according to Equation (33)and (34) respectively The success rate for outlier detection and outlier identification were taken asbeing PCD = PCI = 08 respectively Table 5 provides the values of MDB and MIB for that case ofhard constraints

Table 5 MDB and MIB for the case of hard constraints based on αprime = 0001 and PCD = PCI = 08

1 hard constraint 2 hard constraints 3 hard constraintsCluster MDB (σ) MIB (σ) MDB (σ) MIB (σ) MDB (σ) MIB (σ)

1 75 - 63 63 57 572 67 68 63 64 63 643 64 64 63 63 58 584 64 64 64 64 64 64

Figure 5 shows the probability of wrong exclusion (PWE) The over-identification cases (Pover+

and Poverminus) were smaller than 0001 (ie they were practically null) There were not statistical overlap(Pol) for clusters 2 3 and 4 We will discuss more about statistical overlap (Pol) later

(c) (d)

(a) (b)

Figure 5 Probability of wrong exclusion (PWE) for the case of hard constraints and for αprime = 0001Cluster 1(a) Cluster 2(b) Cluster 3(c) and Cluster 4(d)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

22 of 40

43 Result of the Soft Constraint Effects on the Iterative Outlier Elimination Procedure

The both configuration in Figure 3(d) and Figure 3(e) were analysed in terms of soft constraintsIn that case the critical values were k = 395 k = 395 and k = 392 for two soft constraints withσc = 01mm σc = 1mm and σc = 10mm respectively In the case of three soft constraints the criticalvalues found were k = 399 k = 399 and k = 396 for σc = 01mm σc = 1mm and σc = 10mmrespectively All these critical values were computed for αprime = 0001

The correlation coefficient between w-test statistics (ρwi wj ) are displayed in Table 6 Table 7and Table 8 for two soft constraints with standard-deviation 10 times larger than measurements(ie σc = 10times σ = 10mm) 10 times better than measurements (ie σc = 01mm) and equal to themeasurements (σc = 1mm) respectively

Table 6 Correlation matrix of w-test statistics for the levelling network with two soft constraints withσc = 10mm in Figure 3(d)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14

wy1 10 02 01 01 02 10 -02 00 00 02 -04 -01 006 -01wy2 10 02 02 03 02 05 -05 -02 02 03 -03 003 00wy3 10 10 02 01 00 02 -02 00 01 04 01 -01wy4 10 02 01 00 02 -02 00 01 04 -01 01wy5 10 02 -02 02 05 -05 03 -03 00 00wy6 10 -02 00 00 02 -04 -01 -01 01wy7 10 -03 -03 -04 -04 -01 00 00wy8 10 -04 -03 -01 -04 00 00wy9 10 -03 01 04 00 00wy10 10 04 01 00 00wy11 10 -01 00 00wy12 10 00 00wy13 10 -10wy14 10

Table 7 Correlation matrix of w-test statistics for the levelling network with two soft constraints withσc = 01mm in Figure 3(d)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14

wy1 10 04 04 -03 -01 04 -03 01 01 01 -04 -01 06 -06wy2 10 04 -01 02 -01 04 -04 -01 01 03 -03 04 -04wy3 10 04 -01 -03 -01 03 -01 -01 01 04 06 -06wy4 10 04 04 01 01 -03 01 01 04 -06 06wy5 10 04 -01 01 04 -04 03 -03 -04 04wy6 10 -01 -01 -01 03 -04 -01 -06 06wy7 10 -04 -03 -04 -04 -01 -02 02wy8 10 -04 -03 -01 -04 02 -02wy9 10 -04 01 04 02 -02wy10 10 04 01 -02 02wy11 10 -01 00 00wy12 10 00 00wy13 10 -10wy14 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

23 of 40

Table 8 Correlation matrix of w-test statistics for the levelling network with two soft constraints withσc = 1mm in Figure 3(d)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14

wy1 10 03 02 -01 01 07 -03 00 01 02 -04 -01 04 -04wy2 10 03 01 03 01 04 -04 -01 01 03 -03 03 -03wy3 10 07 01 -01 00 03 -02 -01 01 04 04 -04wy4 10 03 02 01 02 -03 00 01 04 -04 04wy5 10 03 -01 01 04 -04 03 -03 -03 03wy6 10 -02 -01 00 03 -04 -01 -04 04wy7 10 -03 -03 -04 -04 -01 -01 01wy8 10 -04 -03 -01 -04 01 -01wy9 10 -03 01 04 01 -01wy10 10 04 01 -01 01wy11 10 -01 00 00wy12 10 00 00wy13 10 -10wy14 10

Table 9 gives the local redundancy (ri) the standard-deviation of the LS-estimated outlier σnablai andthe maximum absolute correlation (maxρwi wj

) for the scenarios of two constraints

Table 9 Local redundancy (ri) standard-deviation of the LS-estimated outlier σnablai (mm) and themaximum absolute correlation (maxρwi wj

) for each scenario of two soft constraints

σc = 01mm σc = 1mm σc = 10mmMeasurement ri σnablai maxρwi wj

ri σnablai maxρwi wjri σnablai maxρwi wj

y1 0581 1312 0564 0471 1457 0681 0397 1587 0994y2 0582 1311 0376 0533 1369 0423 0501 1413 0471y3 0581 1312 0564 0471 1457 0681 0397 1587 0994y4 0581 1312 0564 0471 1457 0681 0397 1587 0994y5 0582 1311 0376 0533 1369 0423 0501 1413 0471y6 0581 1312 0564 0471 1457 0681 0397 1587 0994y7 0583 1310 0359 0571 1324 0423 0563 1333 0471y8 0583 1310 0359 0571 1324 0423 0563 1333 0471y9 0583 1310 0359 0571 1324 0423 0563 1333 0471y10 0583 1310 0359 0571 1324 0423 0563 1333 0471y11 0583 1309 0358 0583 1309 0398 0583 1309 0433y12 0583 1309 0358 0583 1309 0398 0583 1309 0433y13 0007 1163 1000 0300 1826 1000 0497 14189 1000y14 0007 1163 1000 0300 1826 1000 0497 14189 1000

From Table 9 five clusters were defined for each case of two soft constraints ie for the casewhere heights A and D were given as soft constraints in Figure 3(d) as follows

bull Cluster 1 y1 y3 y4 and y6bull Cluster 2 y2 and y5bull Cluster 3 y7 y8 y9 and y10bull Cluster 4 y11 and y12bull Cluster 5 y13 and y14

The probabilities of correct identification (PCI) and correct detection (PCD) for the measurements(Cluster 1 to Cluster 4) subject to the scenarios of two soft constraints (heights A and D) are displayedin Figure 6

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

24 of 40

(c) (d)

(e) (f)

(g) (h)

(a) (b)

Figure 6 Probability of correct identification (PCI) and Probability of correct detection (PCD) for themeasurements subject to the scenarios of two soft constraints for αprime = 0001 Cluster 1(ab) Cluster2(cd) Cluster 3(ef) and Cluster 4(gh)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

25 of 40

Note that the Cluster 5 is associated with the two soft constraints (ie y13 and y14) The probabilityof correct identification (PCI) for these both soft constraints were null However the probability ofcorrect detection were not Figure 7 shows the probability of correct detection (PCD) for these two softconstraints (ie heights A and D)

Figure 7 Probability of correct detection PCD for the two soft constraints (Cluster 5 heights A and D)and for αprime = 0001

The probability of wrong exclusion PWE for the measurements (Cluster 1 to Cluster 4) subject tothe scenarios of two soft constraints (heights A and D) are displayed in Figure 8 Figure 9 gives the theprobability of wrong exclusion PWE for that two constraints (ie heights A and D)

(c) (d)

(a) (b)

Figure 8 Probability of wrong exclusion PWE for the measurements subject to the scenarios of two softconstraints for αprime = 0001 Cluster 1(a) Cluster 2(b) Cluster 3(c) and Cluster 4(d)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

26 of 40

Figure 9 Probability of wrong exclusion PWE for the two soft constraints (Cluster 5 heights A and D)and for αprime = 0001

The over-identification cases (Pover+ and Poverminus) and the statistical overlap (Pol) were practicallynull for that case

The sensitivity indicators (MDB and MIB) for each scenario of two soft constraints are displayedin Table 10

Table 10 MDB and MIB for the case of two soft constraints based on αprime = 0001 and PCD = PCI = 08

σc = 10mm σc = 1mm σc = 01mmCluster MDB (σ) MIB (σ) MDB (σ) MIB (σ) MDB (σ) MIB (σ)

1 75 25 7 71 63 632 68 68 66 66 63 633 64 64 64 64 63 634 63 63 63 63 63 635 68 - 88 - 57 -

As with the case of two soft constraints Tables 1112 and 13 show the correlations (ρwi wj ) for thecase where there were three soft constraints ie for the network configuration in Figure 3(e)

Table 11 Correlation matrix of w-test statistics for the levelling network with three soft constraintswith σc = 10mm in Figure 3(e)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14 wy15

wy1 10 02 01 01 02 10 -02 00 00 02 -04 -01 01 00 00wy2 10 02 02 03 02 05 -05 -02 02 03 -03 00 00 00wy3 10 10 02 01 00 02 -02 00 01 04 00 -01 00wy4 10 02 01 00 02 -02 00 01 04 00 01 00wy5 10 02 -02 02 05 -05 03 -03 00 00 00wy6 10 -02 00 00 02 -04 -01 -01 00 00wy7 10 -03 -03 -04 -04 -01 00 00 00wy8 10 -04 -03 -01 -04 00 00 00wy9 10 -03 01 04 00 00 00wy10 10 04 01 00 00 00wy11 10 -01 00 00 00wy12 10 00 00 00wy13 10 -05 -05wy14 10 -05wy15 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

27 of 40

Table 12 Correlation matrix of w-test statistics for the levelling network with three soft constraintswith σc = 01mm in Figure 3(e)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14 wy15

wy1 10 03 01 -01 -01 01 -04 -01 -01 -01 -03 -01 07 -01 -04wy2 10 03 -01 01 -01 03 -03 -01 01 03 -03 03 -03 00wy3 10 01 -01 -01 01 04 01 01 01 03 01 -07 04wy4 10 03 01 -01 -01 -04 -01 01 03 -01 07 -04wy5 10 03 -01 01 03 -03 03 -03 -03 03 00wy6 10 01 01 01 04 -03 -01 -07 01 04wy7 10 -01 -01 -01 -03 -01 -04 -01 04wy8 10 -01 -01 -01 -03 -01 -04 04wy9 10 -01 01 03 -01 -04 04wy10 10 03 01 -04 -01 04wy11 10 -01 00 00 00wy12 10 00 00 00wy13 10 -02 -06wy14 10 -06wy15 10

Table 13 Correlation matrix of w-test statistics for the levelling network with three soft constraintswith σc = 1mm in Figure 3(e)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14 wy15

wy1 10 03 01 00 01 06 -03 00 00 01 -04 -01 05 -02 -02wy2 10 03 01 03 01 04 -04 -01 01 03 -03 02 -02 00wy3 10 06 01 00 00 03 -01 00 01 04 02 -05 02wy4 10 03 01 00 01 -03 00 01 04 -02 05 -02wy5 10 03 -01 01 04 -04 03 -03 -02 02 00wy6 10 -01 00 00 03 -04 -01 -05 02 02wy7 10 -03 -02 -03 -04 -01 -02 00 02wy8 10 -03 -02 -01 -04 00 -02 02wy9 10 -03 01 04 00 -02 02wy10 10 04 01 -02 00 02wy11 10 -01 00 00 00wy12 10 00 00 00wy13 10 -04 -05wy14 10 -05wy15 10

Table 14 Local redundancy (ri) standard-deviation of the LS-estimated outlier σnablai (mm) and themaximum absolute correlation (maxρwi wj

) for each scenario of three soft constraints

σc = 01mm σc = 1mm σc = 10mmMeasurement ri σnablai maxρwi wj

ri σnablai maxρwi wjri σnablai maxρwi wj

y1 0702 1194 0660 0502 1411 0577 0398 1586 0992y2 0582 1311 0326 0533 1369 0412 0501 1413 0470y3 0702 1194 0660 0502 1411 0577 0398 1586 0992y4 0702 1194 0660 0502 1411 0577 0398 1586 0992y5 0582 1311 0326 0533 1369 0412 0501 1413 0470y6 0702 1194 0660 0502 1411 0577 0398 1586 0992y7 0704 1192 0415 0602 1289 0412 0563 1333 0470y8 0704 1192 0415 0602 1289 0412 0563 1333 0470y9 0704 1192 0415 0602 1289 0412 0563 1333 0470y10 0704 1192 0415 0602 1289 0412 0563 1333 0470y11 0583 1309 0326 0583 1309 0385 0583 1309 0433y12 0583 1309 0326 0583 1309 0385 0583 1309 0433y13 0012 0904 0660 0425 1534 0542 0663 12283 0501y14 0012 0904 0660 0425 1534 0542 0663 12283 0501y15 0019 0718 063 05 1414 0542 0665 12268 0501

The probabilities of correct identification (PCI) and correct detection PCD for the measurements(Cluster 1 to Cluster 4) subject to the scenarios of three soft constraints (heights A D and G) aredisplayed in Figure 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

28 of 40

(c) (d)

(e) (f)

(g) (h)

(a) (b)

Figure 10 Probability of correct identification (PCI) and Probability of correct detection (PCD) for themeasurements subject to the scenarios of three soft constraints for αprime = 0001 Cluster 1(ab) Cluster2(cd) Cluster 3(ef) and Cluster 4(gh)

The probabilities of correct identification (PCI) and correct detection (PCD) in Figure 8 werecomputed for the clusters based on Table 14 as follows

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

29 of 40

bull Cluster 1 y1 y3 y4 and y6bull Cluster 2 y2 and y5bull Cluster 3 y7 y8 y9 and y10bull Cluster 4 y11 and y12bull Cluster 5 y13 and y14bull Cluster 6 y15

Figure 11 shows the probabilities of correct identification PCI and correct detection PCD for thethree soft constraints ie for Cluster 5 (heights A and D) and Cluster 6 (height G) in Figure 3(e)

(c) (d)

(a) (b)

Figure 11 Probability of correct identification (PCI) and Probability of correct detection (PCD) for thethree constraints and for αprime = 0001 Cluster 5(ab) and Cluster 6(cd)

The probability of wrong exclusion PWE for the measurements (Cluster 1 to Cluster 4) subject tothe scenarios of three soft constraints (heights A D and G) are displayed in Figure 12 Figure 13 givesthe the probability of wrong exclusion PWE for that three constraints (ie heights A D and G)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

30 of 40

(c) (d)

(a) (b)

Figure 12 Probability of wrong exclusion PWE for the measurements subject to the scenarios of threesoft constraints and for αprime = 0001 Cluster 1(a) Cluster 2(b) Cluster 3(c) and Cluster 4(d)

(a) (b)

Figure 13 Probability of wrong exclusion (PWE) for the three constraints and for αprime = 0001 Cluster5(a) and Cluster 6(b)

The over-identification cases (Pover+ and Poverminus) and the statistical overlap (Pol) were alsopractically null for that case of three soft constraints

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

31 of 40

The sensitivity indicators (MDB and MIB) for each scenario of three soft constraints are displayedin Table 15

Table 15 MDB and MIB for the case of three soft constraints based on αprime = 0001 and PCD = PCI = 08

σc = 10mm σc = 1mm σc = 01mmCluster MDB (σ) MIB (σ) MDB (σ) MIB (σ) MDB (σ) MIB (σ)

1 75 22 68 69 58 592 68 69 66 67 64 643 64 64 63 63 58 584 63 63 63 63 63 635 59 60 74 75 435 456 59 59 69 69 346 355

5 Discussion

We start from the scenario of one hard constraint in Figure 3(a) Table 4 reveal that the maximumcorrelation between w-test statistics for the measurements constituting Cluster 1 is exactly equal to100 (ie maxρwi wj

= 100) This means that the measurements belonging to Cluster 1 are those thatare connected with unknown heights whose connections are limited to only two The both unknownheights A and D are tied only to two measurements (ie y1 and y6 linked to A and y3 and y4 linked toD) therefore if an outlier occurred in one of these measurements we would only be able to analyse theconsistency between them but we would not be able to distinguish which of them was contaminatedby an outlier This means that we would only be able to detect because the w-test statistics could belarger than a critical value k however in that case the values of w-test statistics would be the same andtherefore we would not have only one unique maximum w-test statistics but actually would have fourmaximum w-test statistics In other words the equation systems associated with the measurements ofCluster 1 are linearly dependent [87] Therefore there is no reliability in terms of outlier identificationfor the Cluster 1 as can be seen in Figure 3(a)

From Figure 3(b) we can note that there is reliability in terms of outlier detection for Cluster 1and it is caused by overlapping w-test statistics The probability of statistics overlap Pol for Cluster 1in this scenario of minimally constrained network is displayed in Figure 14

Figure 14 Probability of correct detection PCD and statistical overlap (Pol) for the Cluster 1 subject toone hard constraint and for αprime = 0001

The problem of not having more connections (ie more measurements) for the unknown heightsA and D in the case of one hard constraint with G fixed is overcome when these heights (A and D) aretaken as hard constraints in 3(b) or when the heights A D and G are hard constraints in figure 3(c)Figure 3(a b) show that the measurements of Cluster 1 are now able to identify an outlier when twohard constraints (A and D fixed) are in play It is also verified to the case of three hard constraints (A Dand G fixes) in Figure 3(e f) ie there is reliability in terms of both outlier detection and identificationfor these measurements in those conditions

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

32 of 40

One can argue that the more constraints the larger the probability of correct identification (PCI)and correct detection (PCD) This claim is not entirely correct This may be true in case where all pointson the network have the same number of connections However in the example given if both heightsB and G were selected as hard constraints the measurements of the Cluster 1 (ie y1 y6 y3 and y4)would still have their least-squares residuals fully correlated (ie ρwy1 wy6

= 100 and ρwy3 wy4= 100)

and therefore there would be no reliability in terms of outlier identification for these measurementsFrom Table 5 we observe different behaviour for the clusters as follows

bull Cluster 1 there was no MIB for the case of having only one single hard constraint whereasMDB = MIB for the other cases However both MDB and MIB decrease significantly with theincrease in the number of hard constraints

bull Cluster 2 MDB slightly smaller than MIB Both MDB and MIB were practically the same for thecase of having two or three hard constraints

bull Cluster 3 MDB = MIB for all cases of hard constraints however both MDB and MIB decreasesignificantly with the increase in the number of hard constraints

bull Cluster 4 MDB e MIB were equal for all cases

In terms of outlier detection and identification therefore Cluster 1 was more sensitive toconstraints Cluster 3 relatively sensitive to constraints whereas the Cluster 4 completely insensitiveto constraints and Cluster 2 relatively insensitive to constraints This is also can be seen in Figure 4The reason for this is that the local redundancy (ri) of the Cluster 1 increased with the increase of thenumber of hard constraints whereas the Cluster 4 have remained the same as can be seen in Table 4

Leaving aside the cases of statistical overlap (Pol) the network presents low least-squares residualscorrelation (ρwi wj lt 05) and high local redundancy (ri gt 05) Because of this the probabilities ofwrong exclusion (PWE) were less than 1 as can be seen in Figure 5 The over-identification cases(Pover+ and Poverminus) were practically null Consequently PCI asymp PCD Due of this fact the family-wiseerror rate (αprime) should be increased in order to have more success rate in the outlier detection andidentification [45]

From Figure 15 we observe that increasing the αprime increases the both probabilities of correctidentification (PCI) and correct detection (PCD) for outlier magnitude from 5σ to 6σ in the case ofthree hard constraints and from 5σ to 68σ in the case of two hard constraints Although the rates ofPover+ and PWE also increase they are not significant when compared to the improvement of correctidentification (PCI) and detection (PCD) This same analysis can be done for the other clusters

In terms of soft constraints for the cases of two constraints in Figure 3(d) we observe from Table 9that the larger the relaxation of the constraint (ie the larger the standard-deviation of the constraintσc) the larger the residuals correlation (ρwi wj ) and the standard-deviation of the outlier σnablai and thesmaller the local redundancy (ri) Consequently the probabilities of correct identification (PCI) anddetection (PCD) get smaller and smaller with the relaxation of the constraints whereas the probabilityof wrong exclusion gets larger (PWE) This can be more clearly verified in Figure 6(a b) and Figure8(a) for Cluster 1 whose measurements are connected with the constraints A and D (ie y13 and y14 inTable 9 respectively)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

33 of 40

(a) (b)

(c) (d)

Figure 15 Probabilities of correct identification PCI (a) correct detection PCD (b) over-identificationpositive Pover+ (c) and wrong exclusion PWE for the Cluster 1 subject to two and three hard constraintsand for αprime = 0001 and αprime = 01

Note from Figure 8(a) that the probability of wrong exclusion (PWE) increases as the magnitude ofthe outlier (nablai) increases However this is true up to a certain limit of outlier magnitude The effect ofresiduals correlation ρwi wj on the rates of wrong exclusion (PWE) and correct identification (PCI) tendsto decrease with the increase in the magnitude of the outlier nablai This effect is more clearly verified forCluster 1 in case where the precision of the constraints are ten times worse than the measurementsσc = 10σ = 10mm

Note from Figure 6 that identifying an outlier in the Cluster 1 (ie y1 y3 y4 and y6) whenσc = 10mm is more difficult than the other clusters This is due to the fact the Cluster 1 has a higherresiduals correlation ρwi wj = 0994 than other clusters In general therefore we observe that the largerthe relaxation of the constraints the larger is the effect of the correlation ρwi wj on the success rate ofoutlier identification (PCI) Consequently the higher the sensitivity indicator for outlier identification(MIB) Table 10 reveals that the ratio between MIB and MDB for the Cluster 1 and for the scenariowhere the standard-deviations of that two soft constraints are σc = 10mm is MIBMDB=2575=33On the other hand the relationship between MIB and MDB is practically one (ie MIBMDB=10) forthe others scenarios

If the FWE rate (αprime) were increased for the case where the two soft constraints of σc = 10mmare in play we would not have great advantages for Cluster 1 due to its high residuals correlation(ρwi wj = 994) From Figure 16 we can observe that the probability of correct identification (PCI)for outlier magnitudes from 5σ to 8σ is effectively larger to a user-defined αprime = 01 than αprime = 0001However the success rate is still less than 80 ie PCI lt 08 Note for example the correct

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

34 of 40

identification rate is PCI = 56 for an outlier magnitude of nablai = 8σ and αprime = 01 For αprime = 01 theMIB = 335σ = 335mm whereas for αprime = 0001 is MIB = 25σ = 25mm Therefore in that case theMIB for PCI = 08(80) and αprime = 01 would be 34 larger than user-defined αprime = 0001

Figure 16 Probabilities of correct identification (PCI) for the Cluster 1 subject to two soft constraints (2sc) A and D for αprime = 0001 and αprime = 01

The soft constraints A and D were grouped in the Cluster 5 (ie A and D were treated aspseudo-observations in the model) There is no reliability in terms of outlier identification for theconstraints because the residual correlation between them is ρwi wj = 100 as can be seen in Table 9for y13 and y14 However these soft constraints are able to detect an outlier In that case the probabilityof correct detection PCD in Figure 7 is mainly caused by the statistical overlap Pol as can be seen in theexample of the case of σc = 10mm in Figure 17 From Table 10 we observe that the larger the relaxationof the constraints the larger the MDB Note that the values of MDB are given in σ and therefore theMDB for σc = 10mm is larger than σc = 1mm and σc = 01mm ie we had the following inequalityMDB = 68σc = 68times 10mm = 68mm gt MDB = 88σc = 88times 1mm = 88mm gt MDB = 57σc =

57times 01mm = 57mm In that case if the FWE (αprime) were increased the rate of outlier detection by theCluster 4 (ie by the soft constraints) would increase

Figure 17 Probabilities of correct detection PCD and statistical overlap Pol for the two soft constraintsA and D (Cluster 5) with σc = 10mm and for αprime = 0001

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

35 of 40

Similar effects of the relaxation of the constraints on the performance of the IDS in case of two softconstraints are verified in case of three soft constraints as can be seen in the Figures 10 11 12 and 13

In case of having three soft constraints in Figure 3(e) there is reliability in terms of outlieridentification for the three pseudo-observations y13 y14 and y15 (ie for AD and G) as can be seen inFigure 10 and Table 15 In that case we also observe that the probabilities of correct detection PCD ofthe soft constraints A and D (ie Cluster 5) were approximately 13 for σc = 10mm 16 for σc = 1mmand 24 for σc = 01mm larger than the scenario of the network subject to two soft constraints Table15 reveals that the advantage of having three soft constraints instead of two constraints is that theconstraints become identifiable in the presence of an outlier The behaviour of the probabilities ofcorrect detection (PCD) correct identification (PCI) and wrong exclusion (PWE) was similar to the caseof the two soft constraints

Furthermore the larger the relaxation of the constraints the smaller the residuals correlationbetween the measurements and the soft constraints and the larger the residuals correlation amongthe measurements This effect can be verified in Tables 6 7 and 8 for two soft constraints as well as inTables 11 12 and 13 for three soft constraints

In general we observe that the case of two soft constraints for σc = 01mm was comparablewith two hard constraints (see eg Table 5 and Table 10) in terms of the probability levels associatedwith IDS for the measurements (ie clusters 1 2 3 and 4) In the same way for the case of twosoft constraints with σc = 1mm or σc = 10mm the probabilities levels were similar to the one hardconstraint for that measurements with the benefit of two soft constraints having reliability in terms ofoutlier identification for the Cluster 1 Finally the three soft constraints with σc = 1mm and σc = 10mmwere comparable to the two soft constraints for that scenario of constraints relaxation wheres thethree soft constraints for σc = 01mm showed similar outcomes with three hard constraints for themeasurements (see eg Table 5 and Table 15) In that case however an advantage of the three softconstraints on the three hard constraints is the possibility of analysing the sensitivity of the constraintsIt is important to emphasised that the stochastic models of the measurements and constraints wereassumed well-known and defined for the analyzes performed here

6 Conclusions

In general the probability of correct identification (PCI) for the case of hard constraints is largerthan soft constraints It can be verified in Figure 4 Figure 6 and Figure 10 Therefore hard constraintsshould be used in the stage of pre-processing data for the purpose of identifying and removing possibleoutlying measurements In that process one should opt to set out the redundant hard constraints atpoints in the network where the smallest connections exist After identifying and removing possibleoutliers the soft constraints should be employed to propagate the uncertainties of the constraints(pseudobservations) to the model parameters during the process of least-squares estimation Thisrecommendation is valid for outlier detection and identification purpose

We highlight the main findings of this research as follows

bull Under a system of a high local redundancy ri gt 05 and low residuals correlation (ρwi wj lt 05) ifone increase the family-wise error rate (FWE) of the test statistic the performance of the procedurewill be improved for both scenarios of hard constraints and soft constraintsbull The larger the relaxation of the constraints the larger is the effect of the residuals correlation

(ρwi wj ) on the success rate of outlier identification (PCI) Consequently the higher the sensitivityindicator for outlier identification (MIB) and therefore more difficult it becomes to identify anoutlier

bull Under a scenario of soft constraints one should set out at least three soft constraints in order toidentify an outlier in the constraints

In other types of analysis for example deformation analysis of geodetic networks one shouldformulated the constraints as pseudo-observations with σc gt 0 in an adjustment model of system

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

36 of 40

equations In other words the relaxation of constraints makes it possible to estimate deformationeffects that are unmodelled in the deterministic model matrix for example deformations which have aspatial and temporal variations In that case the deformation can be modelled by covariance functionsthat determine the cofactors between constraints [28] This should be applied to control points locatedin the structure but not to control points located outside the structure The control points locatedoutside the structure should be hard constraints and have their stability verified by an independentprocess in order to analyse the deformation of the structure So one will be also able to analyse thesensitivity in terms of minimal detectable deformations andor minimal identifiable deformations [23]Future researches will be addressed to sensitivity analysis for deformation monitoring networks

Author Contributions Conceptualization VFR MTM and IK Data curation VFR MTM and IK Formalanalysis VFR and IK Funding acquisition MTM MRV and LGdSJ Investigation VFR MethodologyVFR and MTM Project administration MTM IK MRV and LGdSJ Software VFR and LGdSJSupervision MTM IK and MRV Validation VFR and MRV Writing ndash original draft VFR Writing ndashreview amp editing VFR MTM IK and MRV All authors have read and agreed to the published version of themanuscript

Funding This research was funded by the CNPqmdashConselho Nacional de Desenvolvimento Cientiacutefico eTecnoloacutegicomdashBrasil (proc no 1035872019-5) This research and the APC were also funded by PETROBRAS(Grant Number 201800545-0)

Acknowledgments In this section you can acknowledge any support given which is not covered by the authorcontribution or funding sections This may include administrative and technical support or donations in kind(eg materials used for experiments)

Conflicts of Interest The authors declare no conflict of interest

References

1 Lehmann R Neitzel F Testing the compatibility of constraints for parameters of a geodetic adjustmentmodel Journal of Geodesy 2013 87 555ndash566 doi101007s00190-013-0627-2

2 Fang X Weighted total least-squares with constraints a universal formula for geodetic symmetricaltransformations Journal of Geodesy 2015 89 459ndash469 doi101007s00190-015-0790-8

3 Amiri-Simkooei AR Weighted Total Least Squares with Singular Covariance Matrices Subject to Weightedand Hard Constraints J Surv Eng 2017 143 04017018

4 Sakic P Ballu V Royer JY A Multi-Observation Least-Squares Inversion for GNSS-Acoustic SeafloorPositioning Remote Sensing 2020 12 doi103390rs12030448

5 Odijk D Stochastic modelling of the ionosphere for fast GPS ambiguity resolution Geodesy Beyond 2000Schwarz KP Ed Springer Berlin Heidelberg Berlin Heidelberg 2000 pp 387ndash392

6 de Oliveira PS Morel L Fund F Legros R Monico JFG Durand S Durand F Modelingtropospheric wet delays with dense and sparse network configurations for PPP-RTK GPS Solutions 201721 237ndash250 doi101007s10291-016-0518-0

7 Chen B Wu L Dai W Luo X Xu Y A new parameterized approach for ionospheric tomography GPSSolutions 2019 23 96 doi101007s10291-019-0893-4

8 Ahmed E Ahmed S Mohamed D Mostafa R jag 2019 Vol 13 chapter Vertical ionospheric delayestimation for single-receiver operation p 81 2 doi101515jag-2018-0041

9 Wang A Chen J Zhang Y Meng L Wang J Performance of Selected Ionospheric Models inMulti-Global Navigation Satellite System Single-Frequency Positioning over China Remote Sensing 201911 doi103390rs11172070

10 Deo M El-Mowafy A Comparison of advanced troposphere models for aiding reductionof PPP convergence time in Australia Journal of Spatial Science 2019 64 381ndash403doi1010801449859620181472046

11 Dai X Lou Y Dai Z Hu C Peng Y Qiao J Shi C Precise Orbit Determination for GNSS ManeuveringSatellite with the Constraint of a Predicted Clock Remote Sensing 2019 11 doi103390rs11161949

12 Courant R Hilbert D Methods of mathematical physics Vol 2 Vol 2 Wiley [New York] 196213 Grafarend EW Optimization of Geodetic Networks The Canadian Surveyor 1974 28 716ndash723

doi101139tcs-1974-0120

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

37 of 40

14 Teunissen P Zero Order Design Generalized Inverses Adjustment the Datum Problem andS-Transformations Optimization and Design of Geodetic Networks Grafarend EW Sansograve F EdsSpringer Berlin Heidelberg Berlin Heidelberg 1985 pp 11ndash55

15 Dermanis A Free network solutions with the DLT method ISPRS Journal of Photogrammetry and RemoteSensing 1994 49 2 ndash 12 doihttpsdoiorg1010160924-2716(94)90061-2

16 Kotsakis C Reference frame stability and nonlinear distortion in minimum-constrained networkadjustment Journal of Geodesy 2012 86 755ndash774 doi101007s00190-012-0555-6

17 Kotsakis C Datum Definition and Minimal Constraints In Encyclopedia of Geodesy Grafarend E EdSpringer International Publishing Cham 2018 pp 1ndash6 doi101007978-3-319-02370-0_157-1

18 Matsuoka MT Rofatto VF Klein I Roberto Veronez M da Silveira LG Neto JBS Alves ACRControl Points Selection Based on Maximum External Reliability for Designing Geodetic Networks AppliedSciences 2020 10 doi103390app10020687

19 Baarda W S-transformations and criterion matrices Publ on geodesy New Series 1973 520 Schaffrin B Aspects of Network Design Optimization and Design of Geodetic Networks Grafarend

EW Sansograve F Eds Springer Berlin Heidelberg Berlin Heidelberg 1985 pp 548ndash59721 Xu PB A general solution in geodetic nonlinear rank-defect models 199722 Altamimi Z Dermanis A The Choice of Reference System in ITRF Formulation VII Hotine-Marussi

Symposium on Mathematical Geodesy Sneeuw N Novaacutek P Crespi M Sansograve F Eds Springer BerlinHeidelberg Berlin Heidelberg 2012 pp 329ndash334

23 Velsink H On the deformation analysis of point fields J Geod 2015 89 1071ndash1087doi101007s00190-015-0835-z

24 Velsink H Extendable linearised adjustment model for deformation analysis Survey Review 201547 397ndash410 doi1011791752270614Y0000000140

25 Hiddo V jag 2016 Vol 10 chapter Time Series Analysis of 3D Coordinates Using NonstochasticObservations p 5 1 doi101515jag-2015-0027

26 Rao CR Markoffrsquos Theorem with Linear Restrictions on Parameters SankhyAuml The Indian Journal ofStatistics (1933-1960) 1945 7 16ndash19

27 Velsink H Testing Methods for Adjustment Models with Constraints Journal of Surveying Engineering2018 144 04018009 doi101061(ASCE)SU1943-54280000260

28 Hiddo V Testing deformation hypotheses by constraints on a time series of geodetic observations J ApplGeod 2017 12 77ndash93 doi101515jag-2017-0028

29 Baarda W A testing procedure for use in geodetic networks Publ on geodesy New Series 1968 230 Teunissen P First and second moments of non-linear least-squares estimators Bull Geodesique (Journal of

Geodesy) 1989 63 253ndash262 doihttpsdoiorg101007BF0252047531 Teunissen P Testing Theory an introduction 2 ed Delft University Press 200632 Amiri-Simkooei A Least-squares variance component estimation theory and GPS applications Phd

thesis Delft University of Technology 200733 Koch KR Parameter estimation and hypothesis testing in linear models 2 ed Springer 199934 Tao H Feng H Xu L Miao M Long H Yue J Li Z Yang G Yang X Fan L Estimation

of Crop Growth Parameters Using UAV-Based Hyperspectral Remote Sensing Data Sensors 2020 20doi103390s20051296

35 Han M Wang Q Wen Y He M He X The Application of Robust Least Squares Method inFrequency Lock Loop Fusion for Global Navigation Satellite System Receivers Sensors 2020 20doi103390s20041224

36 Wei C Estimation for the Discretely Observed CoxacircldquoIngersollacircldquoRoss Model Driven by Small SymmetricalStable Noises Symmetry 2020 12 doi103390sym12030327

37 Sakic P Ballu V Royer JY A Multi-Observation Least-Squares Inversion for GNSS-Acoustic SeafloorPositioning Remote Sensing 2020 12 doi103390rs12030448

38 Livadiotis G General Fitting Methods Based on Lq Norms and their Optimization Stats 2020 3 16ndash31doi103390stats3010002

39 Farooq SZ Yang D Ada ENJ A Cycle Slip Detection Framework for Reliable Single Frequency RTKPositioning Sensors 2020 20 doi103390s20010304

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

38 of 40

40 Araveeporn A Comparing Parameter Estimation of Random Coefficient Autoregressive Model byFrequentist Method Mathematics 2020 8 doi103390math8010062

41 Zhang C Peng T Zhou J Ji J Wang X An Improved Autoencoder and Partial Least SquaresRegression-Based Extreme Learning Machine Model for Pump Turbine Characteristics Applied Sciences2019 9 doi103390app9193987

42 Ji J Yang M Jiang L He J Teng Z Liu Y Song H Output-Only Parameters Identification ofEarthquake-Excited Building Structures with Least Squares and Input Modification Process AppliedSciences 2019 9 doi103390app9040696

43 Kargoll B On the theory and application of model misspecification tests in geodesy Doctoral thesisUniversity of Bonn Landwirtschaftliche Fakultaumlt German Bonn 2007

44 Lehmann R On the formulation of the alternative hypothesis for geodetic outlier detection J Geod 201387 373ndash386 doi101007s00190-012-0607-y

45 Rofatto VF Matsuoka MT Klein I Roberto Veronez M da Silveira LG A Monte Carlo-Based OutlierDiagnosis Method for Sensitivity Analysis Remote Sensing 2020 12 doi103390rs12050860

46 Lehmann R Improved critical values for extreme normalized and studentized residuals in GaussndashMarkovmodels J Geod 2012 86 1137ndash1146 doi101007s00190-012-0569-0

47 Rofatto VF Matsuoka MT Klein I Veronez MR Bonimani ML Lehmann R A half-century ofBaardarsquos concept of reliability a review new perspectives and applications Surv Rev 2018 0 1ndash17[httpsdoiorg1010800039626520181548118] doi1010800039626520181548118

48 Zaminpardaz S Teunissen P DIA-datasnooping and identifiability J Geod 2019 93 85ndash101doi101007s00190-018-1141-3

49 Kok JJ States U On data snooping and multiple outlier testing [microform] Johan J Kok US Dept ofCommerce National Oceanic and Atmospheric Administration National Ocean Service Charting andGeodetic Services For sale by the National Geodetic Information Center NOAA Rockville Md 1984 p61 p

50 Knight NL Wang J Rizos C Generalised measures of reliability for multiple outliers Journal of Geodesy2010 84 625ndash635 doi101007s00190-010-0392-4

51 Gui Q Li X Gong Y Li B Li G A Bayesian unmasking method for locating multiplegross errors based on posterior probabilities of classification variables J Geod 2011 85 191ndash203doi101007s00190-010-0429-8

52 Klein I Matsuoka MT Guzatto MP Nievinski FG An approach to identify multipleoutliers based on sequential likelihood ratio tests Surv Rev 2017 49 449ndash457[httpsdoiorg1010800039626520161212970] doi1010800039626520161212970

53 Hawkins DM Identification of Outliers 1 ed Springer Netherlands 1980 httpsdoiorg101007978-94-015-3994-4

54 Baarda W Statistical concepts in geodesy Publ on geodesy New Series 1967 255 Imparato D Teunissen P Tiberius C Minimal Detectable and Identifiable Biases for quality

control Surv Rev 2019 51 289ndash299 [httpsdoiorg1010800039626520181437947]doi1010800039626520181437947

56 Teunissen PJG Distributional theory for the DIA method Journal of Geodesy 2018 92 59ndash80doi101007s00190-017-1045-7

57 Foumlrstner W Reliability and discernability of extended Gauss-Markov models Seminar on mathematicalmodels of GeodeticPhotogrammetric Point Determination with Regard to Outliers and Systematic Errors 1983 Vol Series A pp 79ndash103

58 Wang J Knight NL New Outlier Separability Test and Its Application in GNSS Positioning 201259 Lehmann R Loumlsler M Multiple Outlier Detection Hypothesis Tests versus Model Selection by

Information Criteria J Surv Eng 2016 142 0401601760 Teunissen PJG Imparato D Tiberius CCJM Does RAIM with Correct Exclusion Produce Unbiased

Positions Sensors 2017 17 doi103390s1707150861 Lehmann R Loumlsler M Congruence analysis of geodetic networks ndash hypothesis tests versus model

selection by information criteria J Appl Geod 2017 11 271ndash283 doi101515jag-2016-0049

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

39 of 40

62 Rofatto V Matsuoka M Klein I DESIGN OF GEODETIC NETWORKS BASED ON OUTLIERIDENTIFICATION CRITERIA AN EXAMPLE APPLIED TO THE LEVELING NETWORK Bull Geod Sci2018 24 152ndash170 httpdxdoiorg101590s1982-21702018000200011

63 Nguyen VK Renault Milocco R Environment Monitoring for Anomaly Detection System UsingSmartphones Sensors 2019 19 doi103390s19183834

64 Nie Y Yang L Shen Y Specific Direction-Based Outlier Detection Approach for GNSS Vector NetworksSensors 2019 19 doi103390s19081836

65 Klein I Matsuoka MT Guzatto MP Nievinski FG Veronez MR Rofatto VF A newrelationship between the quality criteria for geodetic networks Journal of Geodesy 2019 93 529ndash544doi101007s00190-018-1181-8

66 Mierlo JV Statistical Analysis of Geodetic Measurements for the Investigation of Crustal Movements InRecent Crustal Movements 1977 Whitten C Green R Meade B Eds Elsevier 1979 Vol 13 Developmentsin Geotectonics pp 457 ndash 467 doihttpsdoiorg101016B978-0-444-41783-150072-6

67 van der Marel H Roumlsters AJM Statistical Testing and Quality Analysis in 3-D Networks (part II)Application to GPS Global Positioning System An Overview Bock Y Leppard N Eds Springer NewYork New York NY 1990 pp 290ndash297

68 Yang L Wang J Knight NL Shen Y Outlier separability analysis with a multiple alternative hypothesestest J Geod 2013 87 591ndash604 doi101007s00190-013-0629-0

69 Proacuteszynski W Revisiting Baardarsquos concept of minimal detectable bias with regard to outlier identifiabilityJ Geod 2015 89 993ndash1003 doi101007s00190-015-0828-y

70 Romano JP Wolf M Multiple Testing of One-Sided Hypotheses Combining Bonferroni and the BootstrapPredictive Econometrics and Big Data Kreinovich V Sriboonchitta S Chakpitak N Eds SpringerInternational Publishing Cham 2018 pp 78ndash94

71 Teunissen P An Integrity and Quality Control Procedure for use in Multi Sensor Integration 199072 Aydin C Demirel H Computation of Baardarsquos lower bound of the non-centrality parameter J Geod

2004 78 437ndash441 doi101007s00190-004-0406-173 Aydin C Power of Global Test in Deformation Analysis J Surv Eng 2012 138 51ndash56 https

doiorg101061(ASCE)SU1943-5428000006474 Bonferroni C Teoria statistica delle classi e calcolo delle probabilitagrave Pubblicazioni del R Istituto superiore di

scienze economiche e commerciali di Firenze Libreria internazionale Seeber 193675 Box GEP Muller ME A Note on the Generation of Random Normal Deviates The Annals of Mathematical

Statistics 1958 29 610ndash611 httpsdoi101214aoms117770664576 Lemeshko BY Lemeshko SB Extending the Application of Grubbs-Type Tests in Rejecting Anomalous

Measurements Measurement Techniques 2005 48 536ndash547 doi101007s11018-005-0179-977 Lehmann R Scheffler T Monte Carlo based data snooping with application to a geodetic network J

Appl Geod 2011 5 123ndash134 httpsdoiorg101515JAG201101478 Matsumoto M Nishimura T Mersenne twister A 623-dimensionally equidistributed uniform

pseudo-random number generator ACM Transactions on Modeling and Computer Simulation 1998 8 3ndash30httpsdlacmorgcitationcfmid=272991

79 Ihde J Saacutenchez L Barzaghi R Drewes H Foerste C Gruber T Liebsch G Marti U Pail R SiderisM Definition and Proposed Realization of the International Height Reference System (IHRS) Surveys inGeophysics 2017 38 549ndash570 doi101007s10712-017-9409-3

80 Lehmann R Observation error model selection by information criteria vs normality testing StudiaGeophysica et Geodaetica 2015 59 489ndash504 doi101007s11200-015-0725-0

81 Algarni DA Ali AE Heighting and Distance Accuracy with Electronic Digital Levels Journal of King SaudUniversity - Engineering Sciences 1998 10 229 ndash 239 doihttpsdoiorg101016S1018-3639(18)30698-6

82 Gemin ARS Matos AS Faggion PLAs APPLICATION OF CALIBRATION CERTIFICATE OFDIGITAL LEVELING SYSTEMS IN THE MONITORING OF STRUCTURES A CASE STUDY AT THEGOVERNADOR JOSAtilde RICHA HYDROELECTRIC POWER PLANT - PR Boletim de CiAtildeGeodAtilde 201824 235 ndash 249

83 Takalo M Rouhiainen P Development of a System Calibration Comparator for Digital Levels in FinlandNordic Journal of Surveying and Real Estate Research 1 1

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

40 of 40

84 Wiedemann W Wagner A Wunderlich T Using IATS to Read and Analyze Digital Levelling Staffs SIG2016 Paar R Marendic A Zrinjski M Eds Croatian Geodetic Society 2016 pp 515ndash526

85 PJG T The Geometry of Geodetic Inverse Linear Mapping and Non-linear Adjustment Publications onGeodesy New Series 1985 8

86 Rao CR Mitra SK Generalized inverse of a matrix and its applications 197287 Hekimoglu S Erenoglu RC Sanli DU Erdogan B Detecting Configuration Weaknesses in Geodetic

Networks Survey Review 2011 43 713ndash730 doi101179003962611X13117748892632

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

Page 19: Effects of Hard and Soft Equality Constraints on

19 of 40

Table 4 gives the local redundancy (ri) the standard-deviation of the LS-estimated outlier σnablai andthe maximum absolute correlation (maxρwi wj

) for each scenario of hard constraint set out of in thisstudy ie Figure 3(ab and c)

Table 4 Local redundancy (ri) standard-deviation of the LS-estimated outlier σnablai and the maximumabsolute correlation (maxρwi wj

) for each scenario of hard constraint

1 hard constraint 2 hard constraints 3 hard constraintsMeasurement ri σnablai maxρwi wj

ri σnablai maxρwi wjri σnablai maxρwi wj

y1 0396 1589 100 0583 1309 036 0708 1188 041y2 0500 1414 047 0583 1309 036 0583 1309 032y3 0396 1589 100 0583 1309 036 0708 1188 041y4 0396 1589 100 0583 1309 036 0708 1188 041y5 0500 1414 047 0583 1309 036 0583 1309 032y6 0396 1589 100 0583 1309 036 0708 1188 041y7 0563 1333 047 0583 1309 036 0708 1188 041y8 0563 1333 047 0583 1309 036 0708 1188 041y9 0563 1333 047 0583 1309 036 0708 1188 041y10 0563 1333 047 0583 1309 036 0708 1188 041y11 0583 1309 043 0583 1309 036 0583 1309 032y12 0583 1309 043 0583 1309 036 0583 1309 032

Here the twelve levelling measurements were clustered into four clusters The clustering wasdefined according to the local redundancy (ri) and the maximum absolute correlation (maxρwi wj

) inTable 4 This is the first time that a clustering technique based on the similarity of local redundancyand the maximum absolute correlation between w-test statistics is applied to a problem of geodeticnetworks Similarly this has been done in [45] The four cluster were defined as follows

bull Cluster 1 y1 y3 y4 and y6bull Cluster 2 y2 and y5bull Cluster 3 y7 y8 y9 and y10bull Cluster 4 y11 and y12

The probability levels associated with IDS were averaged for each of these clusters Here we computethe probability levels of the IDS based on the procedure in Section (3) [45] The critical values werek = 389 k = 393 and k = 393 for 1 hard constraint 2 hard constraints and 3 hard constraintsrespectively These critical values were found for αprime = 0001 according to the procedure described inAppendix The probability of correct identification (PCI) and correct detection (PCD = 1minusPMD)are displayed in Figure 4 for each number of hard constraint (denoted by hc)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

20 of 40

(c) (d)

(e) (f)

(g) (h)

(a) (b)

Figure 4 Probability of correct identification (PCI) and Probability of correct detection (PCD) for thecase of hard constraints and for αprime = 0001 Cluster 1(ab) Cluster 2(cd) Cluster 3(ef) and Cluster4(gh)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

21 of 40

The outlier magnitude were defined from |5σ| to |9σ| The outlier of |5σ| was chosen because itis approximately the lowest MDB0(i) of the network when a single hypothesis testing is in play ascan be seen in Equation (14) or Equation (16) That MDB0(i) of |5σ| was computed for significancelevel of αprime = 0001 and a power of the test γ0 = 08 This strategy reduces the search space for a MIB(Minimal Identifiable Bias) because we will always have the following inequality MIB ge MDB0(i)[4755] Remember that the IDS procedure is an example of multiple hypothesis testing (see 2)

The sensitivity indicators MDB and MIB for IDS were also computed according to Equation (33)and (34) respectively The success rate for outlier detection and outlier identification were taken asbeing PCD = PCI = 08 respectively Table 5 provides the values of MDB and MIB for that case ofhard constraints

Table 5 MDB and MIB for the case of hard constraints based on αprime = 0001 and PCD = PCI = 08

1 hard constraint 2 hard constraints 3 hard constraintsCluster MDB (σ) MIB (σ) MDB (σ) MIB (σ) MDB (σ) MIB (σ)

1 75 - 63 63 57 572 67 68 63 64 63 643 64 64 63 63 58 584 64 64 64 64 64 64

Figure 5 shows the probability of wrong exclusion (PWE) The over-identification cases (Pover+

and Poverminus) were smaller than 0001 (ie they were practically null) There were not statistical overlap(Pol) for clusters 2 3 and 4 We will discuss more about statistical overlap (Pol) later

(c) (d)

(a) (b)

Figure 5 Probability of wrong exclusion (PWE) for the case of hard constraints and for αprime = 0001Cluster 1(a) Cluster 2(b) Cluster 3(c) and Cluster 4(d)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

22 of 40

43 Result of the Soft Constraint Effects on the Iterative Outlier Elimination Procedure

The both configuration in Figure 3(d) and Figure 3(e) were analysed in terms of soft constraintsIn that case the critical values were k = 395 k = 395 and k = 392 for two soft constraints withσc = 01mm σc = 1mm and σc = 10mm respectively In the case of three soft constraints the criticalvalues found were k = 399 k = 399 and k = 396 for σc = 01mm σc = 1mm and σc = 10mmrespectively All these critical values were computed for αprime = 0001

The correlation coefficient between w-test statistics (ρwi wj ) are displayed in Table 6 Table 7and Table 8 for two soft constraints with standard-deviation 10 times larger than measurements(ie σc = 10times σ = 10mm) 10 times better than measurements (ie σc = 01mm) and equal to themeasurements (σc = 1mm) respectively

Table 6 Correlation matrix of w-test statistics for the levelling network with two soft constraints withσc = 10mm in Figure 3(d)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14

wy1 10 02 01 01 02 10 -02 00 00 02 -04 -01 006 -01wy2 10 02 02 03 02 05 -05 -02 02 03 -03 003 00wy3 10 10 02 01 00 02 -02 00 01 04 01 -01wy4 10 02 01 00 02 -02 00 01 04 -01 01wy5 10 02 -02 02 05 -05 03 -03 00 00wy6 10 -02 00 00 02 -04 -01 -01 01wy7 10 -03 -03 -04 -04 -01 00 00wy8 10 -04 -03 -01 -04 00 00wy9 10 -03 01 04 00 00wy10 10 04 01 00 00wy11 10 -01 00 00wy12 10 00 00wy13 10 -10wy14 10

Table 7 Correlation matrix of w-test statistics for the levelling network with two soft constraints withσc = 01mm in Figure 3(d)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14

wy1 10 04 04 -03 -01 04 -03 01 01 01 -04 -01 06 -06wy2 10 04 -01 02 -01 04 -04 -01 01 03 -03 04 -04wy3 10 04 -01 -03 -01 03 -01 -01 01 04 06 -06wy4 10 04 04 01 01 -03 01 01 04 -06 06wy5 10 04 -01 01 04 -04 03 -03 -04 04wy6 10 -01 -01 -01 03 -04 -01 -06 06wy7 10 -04 -03 -04 -04 -01 -02 02wy8 10 -04 -03 -01 -04 02 -02wy9 10 -04 01 04 02 -02wy10 10 04 01 -02 02wy11 10 -01 00 00wy12 10 00 00wy13 10 -10wy14 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

23 of 40

Table 8 Correlation matrix of w-test statistics for the levelling network with two soft constraints withσc = 1mm in Figure 3(d)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14

wy1 10 03 02 -01 01 07 -03 00 01 02 -04 -01 04 -04wy2 10 03 01 03 01 04 -04 -01 01 03 -03 03 -03wy3 10 07 01 -01 00 03 -02 -01 01 04 04 -04wy4 10 03 02 01 02 -03 00 01 04 -04 04wy5 10 03 -01 01 04 -04 03 -03 -03 03wy6 10 -02 -01 00 03 -04 -01 -04 04wy7 10 -03 -03 -04 -04 -01 -01 01wy8 10 -04 -03 -01 -04 01 -01wy9 10 -03 01 04 01 -01wy10 10 04 01 -01 01wy11 10 -01 00 00wy12 10 00 00wy13 10 -10wy14 10

Table 9 gives the local redundancy (ri) the standard-deviation of the LS-estimated outlier σnablai andthe maximum absolute correlation (maxρwi wj

) for the scenarios of two constraints

Table 9 Local redundancy (ri) standard-deviation of the LS-estimated outlier σnablai (mm) and themaximum absolute correlation (maxρwi wj

) for each scenario of two soft constraints

σc = 01mm σc = 1mm σc = 10mmMeasurement ri σnablai maxρwi wj

ri σnablai maxρwi wjri σnablai maxρwi wj

y1 0581 1312 0564 0471 1457 0681 0397 1587 0994y2 0582 1311 0376 0533 1369 0423 0501 1413 0471y3 0581 1312 0564 0471 1457 0681 0397 1587 0994y4 0581 1312 0564 0471 1457 0681 0397 1587 0994y5 0582 1311 0376 0533 1369 0423 0501 1413 0471y6 0581 1312 0564 0471 1457 0681 0397 1587 0994y7 0583 1310 0359 0571 1324 0423 0563 1333 0471y8 0583 1310 0359 0571 1324 0423 0563 1333 0471y9 0583 1310 0359 0571 1324 0423 0563 1333 0471y10 0583 1310 0359 0571 1324 0423 0563 1333 0471y11 0583 1309 0358 0583 1309 0398 0583 1309 0433y12 0583 1309 0358 0583 1309 0398 0583 1309 0433y13 0007 1163 1000 0300 1826 1000 0497 14189 1000y14 0007 1163 1000 0300 1826 1000 0497 14189 1000

From Table 9 five clusters were defined for each case of two soft constraints ie for the casewhere heights A and D were given as soft constraints in Figure 3(d) as follows

bull Cluster 1 y1 y3 y4 and y6bull Cluster 2 y2 and y5bull Cluster 3 y7 y8 y9 and y10bull Cluster 4 y11 and y12bull Cluster 5 y13 and y14

The probabilities of correct identification (PCI) and correct detection (PCD) for the measurements(Cluster 1 to Cluster 4) subject to the scenarios of two soft constraints (heights A and D) are displayedin Figure 6

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

24 of 40

(c) (d)

(e) (f)

(g) (h)

(a) (b)

Figure 6 Probability of correct identification (PCI) and Probability of correct detection (PCD) for themeasurements subject to the scenarios of two soft constraints for αprime = 0001 Cluster 1(ab) Cluster2(cd) Cluster 3(ef) and Cluster 4(gh)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

25 of 40

Note that the Cluster 5 is associated with the two soft constraints (ie y13 and y14) The probabilityof correct identification (PCI) for these both soft constraints were null However the probability ofcorrect detection were not Figure 7 shows the probability of correct detection (PCD) for these two softconstraints (ie heights A and D)

Figure 7 Probability of correct detection PCD for the two soft constraints (Cluster 5 heights A and D)and for αprime = 0001

The probability of wrong exclusion PWE for the measurements (Cluster 1 to Cluster 4) subject tothe scenarios of two soft constraints (heights A and D) are displayed in Figure 8 Figure 9 gives the theprobability of wrong exclusion PWE for that two constraints (ie heights A and D)

(c) (d)

(a) (b)

Figure 8 Probability of wrong exclusion PWE for the measurements subject to the scenarios of two softconstraints for αprime = 0001 Cluster 1(a) Cluster 2(b) Cluster 3(c) and Cluster 4(d)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

26 of 40

Figure 9 Probability of wrong exclusion PWE for the two soft constraints (Cluster 5 heights A and D)and for αprime = 0001

The over-identification cases (Pover+ and Poverminus) and the statistical overlap (Pol) were practicallynull for that case

The sensitivity indicators (MDB and MIB) for each scenario of two soft constraints are displayedin Table 10

Table 10 MDB and MIB for the case of two soft constraints based on αprime = 0001 and PCD = PCI = 08

σc = 10mm σc = 1mm σc = 01mmCluster MDB (σ) MIB (σ) MDB (σ) MIB (σ) MDB (σ) MIB (σ)

1 75 25 7 71 63 632 68 68 66 66 63 633 64 64 64 64 63 634 63 63 63 63 63 635 68 - 88 - 57 -

As with the case of two soft constraints Tables 1112 and 13 show the correlations (ρwi wj ) for thecase where there were three soft constraints ie for the network configuration in Figure 3(e)

Table 11 Correlation matrix of w-test statistics for the levelling network with three soft constraintswith σc = 10mm in Figure 3(e)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14 wy15

wy1 10 02 01 01 02 10 -02 00 00 02 -04 -01 01 00 00wy2 10 02 02 03 02 05 -05 -02 02 03 -03 00 00 00wy3 10 10 02 01 00 02 -02 00 01 04 00 -01 00wy4 10 02 01 00 02 -02 00 01 04 00 01 00wy5 10 02 -02 02 05 -05 03 -03 00 00 00wy6 10 -02 00 00 02 -04 -01 -01 00 00wy7 10 -03 -03 -04 -04 -01 00 00 00wy8 10 -04 -03 -01 -04 00 00 00wy9 10 -03 01 04 00 00 00wy10 10 04 01 00 00 00wy11 10 -01 00 00 00wy12 10 00 00 00wy13 10 -05 -05wy14 10 -05wy15 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

27 of 40

Table 12 Correlation matrix of w-test statistics for the levelling network with three soft constraintswith σc = 01mm in Figure 3(e)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14 wy15

wy1 10 03 01 -01 -01 01 -04 -01 -01 -01 -03 -01 07 -01 -04wy2 10 03 -01 01 -01 03 -03 -01 01 03 -03 03 -03 00wy3 10 01 -01 -01 01 04 01 01 01 03 01 -07 04wy4 10 03 01 -01 -01 -04 -01 01 03 -01 07 -04wy5 10 03 -01 01 03 -03 03 -03 -03 03 00wy6 10 01 01 01 04 -03 -01 -07 01 04wy7 10 -01 -01 -01 -03 -01 -04 -01 04wy8 10 -01 -01 -01 -03 -01 -04 04wy9 10 -01 01 03 -01 -04 04wy10 10 03 01 -04 -01 04wy11 10 -01 00 00 00wy12 10 00 00 00wy13 10 -02 -06wy14 10 -06wy15 10

Table 13 Correlation matrix of w-test statistics for the levelling network with three soft constraintswith σc = 1mm in Figure 3(e)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14 wy15

wy1 10 03 01 00 01 06 -03 00 00 01 -04 -01 05 -02 -02wy2 10 03 01 03 01 04 -04 -01 01 03 -03 02 -02 00wy3 10 06 01 00 00 03 -01 00 01 04 02 -05 02wy4 10 03 01 00 01 -03 00 01 04 -02 05 -02wy5 10 03 -01 01 04 -04 03 -03 -02 02 00wy6 10 -01 00 00 03 -04 -01 -05 02 02wy7 10 -03 -02 -03 -04 -01 -02 00 02wy8 10 -03 -02 -01 -04 00 -02 02wy9 10 -03 01 04 00 -02 02wy10 10 04 01 -02 00 02wy11 10 -01 00 00 00wy12 10 00 00 00wy13 10 -04 -05wy14 10 -05wy15 10

Table 14 Local redundancy (ri) standard-deviation of the LS-estimated outlier σnablai (mm) and themaximum absolute correlation (maxρwi wj

) for each scenario of three soft constraints

σc = 01mm σc = 1mm σc = 10mmMeasurement ri σnablai maxρwi wj

ri σnablai maxρwi wjri σnablai maxρwi wj

y1 0702 1194 0660 0502 1411 0577 0398 1586 0992y2 0582 1311 0326 0533 1369 0412 0501 1413 0470y3 0702 1194 0660 0502 1411 0577 0398 1586 0992y4 0702 1194 0660 0502 1411 0577 0398 1586 0992y5 0582 1311 0326 0533 1369 0412 0501 1413 0470y6 0702 1194 0660 0502 1411 0577 0398 1586 0992y7 0704 1192 0415 0602 1289 0412 0563 1333 0470y8 0704 1192 0415 0602 1289 0412 0563 1333 0470y9 0704 1192 0415 0602 1289 0412 0563 1333 0470y10 0704 1192 0415 0602 1289 0412 0563 1333 0470y11 0583 1309 0326 0583 1309 0385 0583 1309 0433y12 0583 1309 0326 0583 1309 0385 0583 1309 0433y13 0012 0904 0660 0425 1534 0542 0663 12283 0501y14 0012 0904 0660 0425 1534 0542 0663 12283 0501y15 0019 0718 063 05 1414 0542 0665 12268 0501

The probabilities of correct identification (PCI) and correct detection PCD for the measurements(Cluster 1 to Cluster 4) subject to the scenarios of three soft constraints (heights A D and G) aredisplayed in Figure 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

28 of 40

(c) (d)

(e) (f)

(g) (h)

(a) (b)

Figure 10 Probability of correct identification (PCI) and Probability of correct detection (PCD) for themeasurements subject to the scenarios of three soft constraints for αprime = 0001 Cluster 1(ab) Cluster2(cd) Cluster 3(ef) and Cluster 4(gh)

The probabilities of correct identification (PCI) and correct detection (PCD) in Figure 8 werecomputed for the clusters based on Table 14 as follows

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

29 of 40

bull Cluster 1 y1 y3 y4 and y6bull Cluster 2 y2 and y5bull Cluster 3 y7 y8 y9 and y10bull Cluster 4 y11 and y12bull Cluster 5 y13 and y14bull Cluster 6 y15

Figure 11 shows the probabilities of correct identification PCI and correct detection PCD for thethree soft constraints ie for Cluster 5 (heights A and D) and Cluster 6 (height G) in Figure 3(e)

(c) (d)

(a) (b)

Figure 11 Probability of correct identification (PCI) and Probability of correct detection (PCD) for thethree constraints and for αprime = 0001 Cluster 5(ab) and Cluster 6(cd)

The probability of wrong exclusion PWE for the measurements (Cluster 1 to Cluster 4) subject tothe scenarios of three soft constraints (heights A D and G) are displayed in Figure 12 Figure 13 givesthe the probability of wrong exclusion PWE for that three constraints (ie heights A D and G)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

30 of 40

(c) (d)

(a) (b)

Figure 12 Probability of wrong exclusion PWE for the measurements subject to the scenarios of threesoft constraints and for αprime = 0001 Cluster 1(a) Cluster 2(b) Cluster 3(c) and Cluster 4(d)

(a) (b)

Figure 13 Probability of wrong exclusion (PWE) for the three constraints and for αprime = 0001 Cluster5(a) and Cluster 6(b)

The over-identification cases (Pover+ and Poverminus) and the statistical overlap (Pol) were alsopractically null for that case of three soft constraints

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

31 of 40

The sensitivity indicators (MDB and MIB) for each scenario of three soft constraints are displayedin Table 15

Table 15 MDB and MIB for the case of three soft constraints based on αprime = 0001 and PCD = PCI = 08

σc = 10mm σc = 1mm σc = 01mmCluster MDB (σ) MIB (σ) MDB (σ) MIB (σ) MDB (σ) MIB (σ)

1 75 22 68 69 58 592 68 69 66 67 64 643 64 64 63 63 58 584 63 63 63 63 63 635 59 60 74 75 435 456 59 59 69 69 346 355

5 Discussion

We start from the scenario of one hard constraint in Figure 3(a) Table 4 reveal that the maximumcorrelation between w-test statistics for the measurements constituting Cluster 1 is exactly equal to100 (ie maxρwi wj

= 100) This means that the measurements belonging to Cluster 1 are those thatare connected with unknown heights whose connections are limited to only two The both unknownheights A and D are tied only to two measurements (ie y1 and y6 linked to A and y3 and y4 linked toD) therefore if an outlier occurred in one of these measurements we would only be able to analyse theconsistency between them but we would not be able to distinguish which of them was contaminatedby an outlier This means that we would only be able to detect because the w-test statistics could belarger than a critical value k however in that case the values of w-test statistics would be the same andtherefore we would not have only one unique maximum w-test statistics but actually would have fourmaximum w-test statistics In other words the equation systems associated with the measurements ofCluster 1 are linearly dependent [87] Therefore there is no reliability in terms of outlier identificationfor the Cluster 1 as can be seen in Figure 3(a)

From Figure 3(b) we can note that there is reliability in terms of outlier detection for Cluster 1and it is caused by overlapping w-test statistics The probability of statistics overlap Pol for Cluster 1in this scenario of minimally constrained network is displayed in Figure 14

Figure 14 Probability of correct detection PCD and statistical overlap (Pol) for the Cluster 1 subject toone hard constraint and for αprime = 0001

The problem of not having more connections (ie more measurements) for the unknown heightsA and D in the case of one hard constraint with G fixed is overcome when these heights (A and D) aretaken as hard constraints in 3(b) or when the heights A D and G are hard constraints in figure 3(c)Figure 3(a b) show that the measurements of Cluster 1 are now able to identify an outlier when twohard constraints (A and D fixed) are in play It is also verified to the case of three hard constraints (A Dand G fixes) in Figure 3(e f) ie there is reliability in terms of both outlier detection and identificationfor these measurements in those conditions

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

32 of 40

One can argue that the more constraints the larger the probability of correct identification (PCI)and correct detection (PCD) This claim is not entirely correct This may be true in case where all pointson the network have the same number of connections However in the example given if both heightsB and G were selected as hard constraints the measurements of the Cluster 1 (ie y1 y6 y3 and y4)would still have their least-squares residuals fully correlated (ie ρwy1 wy6

= 100 and ρwy3 wy4= 100)

and therefore there would be no reliability in terms of outlier identification for these measurementsFrom Table 5 we observe different behaviour for the clusters as follows

bull Cluster 1 there was no MIB for the case of having only one single hard constraint whereasMDB = MIB for the other cases However both MDB and MIB decrease significantly with theincrease in the number of hard constraints

bull Cluster 2 MDB slightly smaller than MIB Both MDB and MIB were practically the same for thecase of having two or three hard constraints

bull Cluster 3 MDB = MIB for all cases of hard constraints however both MDB and MIB decreasesignificantly with the increase in the number of hard constraints

bull Cluster 4 MDB e MIB were equal for all cases

In terms of outlier detection and identification therefore Cluster 1 was more sensitive toconstraints Cluster 3 relatively sensitive to constraints whereas the Cluster 4 completely insensitiveto constraints and Cluster 2 relatively insensitive to constraints This is also can be seen in Figure 4The reason for this is that the local redundancy (ri) of the Cluster 1 increased with the increase of thenumber of hard constraints whereas the Cluster 4 have remained the same as can be seen in Table 4

Leaving aside the cases of statistical overlap (Pol) the network presents low least-squares residualscorrelation (ρwi wj lt 05) and high local redundancy (ri gt 05) Because of this the probabilities ofwrong exclusion (PWE) were less than 1 as can be seen in Figure 5 The over-identification cases(Pover+ and Poverminus) were practically null Consequently PCI asymp PCD Due of this fact the family-wiseerror rate (αprime) should be increased in order to have more success rate in the outlier detection andidentification [45]

From Figure 15 we observe that increasing the αprime increases the both probabilities of correctidentification (PCI) and correct detection (PCD) for outlier magnitude from 5σ to 6σ in the case ofthree hard constraints and from 5σ to 68σ in the case of two hard constraints Although the rates ofPover+ and PWE also increase they are not significant when compared to the improvement of correctidentification (PCI) and detection (PCD) This same analysis can be done for the other clusters

In terms of soft constraints for the cases of two constraints in Figure 3(d) we observe from Table 9that the larger the relaxation of the constraint (ie the larger the standard-deviation of the constraintσc) the larger the residuals correlation (ρwi wj ) and the standard-deviation of the outlier σnablai and thesmaller the local redundancy (ri) Consequently the probabilities of correct identification (PCI) anddetection (PCD) get smaller and smaller with the relaxation of the constraints whereas the probabilityof wrong exclusion gets larger (PWE) This can be more clearly verified in Figure 6(a b) and Figure8(a) for Cluster 1 whose measurements are connected with the constraints A and D (ie y13 and y14 inTable 9 respectively)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

33 of 40

(a) (b)

(c) (d)

Figure 15 Probabilities of correct identification PCI (a) correct detection PCD (b) over-identificationpositive Pover+ (c) and wrong exclusion PWE for the Cluster 1 subject to two and three hard constraintsand for αprime = 0001 and αprime = 01

Note from Figure 8(a) that the probability of wrong exclusion (PWE) increases as the magnitude ofthe outlier (nablai) increases However this is true up to a certain limit of outlier magnitude The effect ofresiduals correlation ρwi wj on the rates of wrong exclusion (PWE) and correct identification (PCI) tendsto decrease with the increase in the magnitude of the outlier nablai This effect is more clearly verified forCluster 1 in case where the precision of the constraints are ten times worse than the measurementsσc = 10σ = 10mm

Note from Figure 6 that identifying an outlier in the Cluster 1 (ie y1 y3 y4 and y6) whenσc = 10mm is more difficult than the other clusters This is due to the fact the Cluster 1 has a higherresiduals correlation ρwi wj = 0994 than other clusters In general therefore we observe that the largerthe relaxation of the constraints the larger is the effect of the correlation ρwi wj on the success rate ofoutlier identification (PCI) Consequently the higher the sensitivity indicator for outlier identification(MIB) Table 10 reveals that the ratio between MIB and MDB for the Cluster 1 and for the scenariowhere the standard-deviations of that two soft constraints are σc = 10mm is MIBMDB=2575=33On the other hand the relationship between MIB and MDB is practically one (ie MIBMDB=10) forthe others scenarios

If the FWE rate (αprime) were increased for the case where the two soft constraints of σc = 10mmare in play we would not have great advantages for Cluster 1 due to its high residuals correlation(ρwi wj = 994) From Figure 16 we can observe that the probability of correct identification (PCI)for outlier magnitudes from 5σ to 8σ is effectively larger to a user-defined αprime = 01 than αprime = 0001However the success rate is still less than 80 ie PCI lt 08 Note for example the correct

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

34 of 40

identification rate is PCI = 56 for an outlier magnitude of nablai = 8σ and αprime = 01 For αprime = 01 theMIB = 335σ = 335mm whereas for αprime = 0001 is MIB = 25σ = 25mm Therefore in that case theMIB for PCI = 08(80) and αprime = 01 would be 34 larger than user-defined αprime = 0001

Figure 16 Probabilities of correct identification (PCI) for the Cluster 1 subject to two soft constraints (2sc) A and D for αprime = 0001 and αprime = 01

The soft constraints A and D were grouped in the Cluster 5 (ie A and D were treated aspseudo-observations in the model) There is no reliability in terms of outlier identification for theconstraints because the residual correlation between them is ρwi wj = 100 as can be seen in Table 9for y13 and y14 However these soft constraints are able to detect an outlier In that case the probabilityof correct detection PCD in Figure 7 is mainly caused by the statistical overlap Pol as can be seen in theexample of the case of σc = 10mm in Figure 17 From Table 10 we observe that the larger the relaxationof the constraints the larger the MDB Note that the values of MDB are given in σ and therefore theMDB for σc = 10mm is larger than σc = 1mm and σc = 01mm ie we had the following inequalityMDB = 68σc = 68times 10mm = 68mm gt MDB = 88σc = 88times 1mm = 88mm gt MDB = 57σc =

57times 01mm = 57mm In that case if the FWE (αprime) were increased the rate of outlier detection by theCluster 4 (ie by the soft constraints) would increase

Figure 17 Probabilities of correct detection PCD and statistical overlap Pol for the two soft constraintsA and D (Cluster 5) with σc = 10mm and for αprime = 0001

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

35 of 40

Similar effects of the relaxation of the constraints on the performance of the IDS in case of two softconstraints are verified in case of three soft constraints as can be seen in the Figures 10 11 12 and 13

In case of having three soft constraints in Figure 3(e) there is reliability in terms of outlieridentification for the three pseudo-observations y13 y14 and y15 (ie for AD and G) as can be seen inFigure 10 and Table 15 In that case we also observe that the probabilities of correct detection PCD ofthe soft constraints A and D (ie Cluster 5) were approximately 13 for σc = 10mm 16 for σc = 1mmand 24 for σc = 01mm larger than the scenario of the network subject to two soft constraints Table15 reveals that the advantage of having three soft constraints instead of two constraints is that theconstraints become identifiable in the presence of an outlier The behaviour of the probabilities ofcorrect detection (PCD) correct identification (PCI) and wrong exclusion (PWE) was similar to the caseof the two soft constraints

Furthermore the larger the relaxation of the constraints the smaller the residuals correlationbetween the measurements and the soft constraints and the larger the residuals correlation amongthe measurements This effect can be verified in Tables 6 7 and 8 for two soft constraints as well as inTables 11 12 and 13 for three soft constraints

In general we observe that the case of two soft constraints for σc = 01mm was comparablewith two hard constraints (see eg Table 5 and Table 10) in terms of the probability levels associatedwith IDS for the measurements (ie clusters 1 2 3 and 4) In the same way for the case of twosoft constraints with σc = 1mm or σc = 10mm the probabilities levels were similar to the one hardconstraint for that measurements with the benefit of two soft constraints having reliability in terms ofoutlier identification for the Cluster 1 Finally the three soft constraints with σc = 1mm and σc = 10mmwere comparable to the two soft constraints for that scenario of constraints relaxation wheres thethree soft constraints for σc = 01mm showed similar outcomes with three hard constraints for themeasurements (see eg Table 5 and Table 15) In that case however an advantage of the three softconstraints on the three hard constraints is the possibility of analysing the sensitivity of the constraintsIt is important to emphasised that the stochastic models of the measurements and constraints wereassumed well-known and defined for the analyzes performed here

6 Conclusions

In general the probability of correct identification (PCI) for the case of hard constraints is largerthan soft constraints It can be verified in Figure 4 Figure 6 and Figure 10 Therefore hard constraintsshould be used in the stage of pre-processing data for the purpose of identifying and removing possibleoutlying measurements In that process one should opt to set out the redundant hard constraints atpoints in the network where the smallest connections exist After identifying and removing possibleoutliers the soft constraints should be employed to propagate the uncertainties of the constraints(pseudobservations) to the model parameters during the process of least-squares estimation Thisrecommendation is valid for outlier detection and identification purpose

We highlight the main findings of this research as follows

bull Under a system of a high local redundancy ri gt 05 and low residuals correlation (ρwi wj lt 05) ifone increase the family-wise error rate (FWE) of the test statistic the performance of the procedurewill be improved for both scenarios of hard constraints and soft constraintsbull The larger the relaxation of the constraints the larger is the effect of the residuals correlation

(ρwi wj ) on the success rate of outlier identification (PCI) Consequently the higher the sensitivityindicator for outlier identification (MIB) and therefore more difficult it becomes to identify anoutlier

bull Under a scenario of soft constraints one should set out at least three soft constraints in order toidentify an outlier in the constraints

In other types of analysis for example deformation analysis of geodetic networks one shouldformulated the constraints as pseudo-observations with σc gt 0 in an adjustment model of system

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

36 of 40

equations In other words the relaxation of constraints makes it possible to estimate deformationeffects that are unmodelled in the deterministic model matrix for example deformations which have aspatial and temporal variations In that case the deformation can be modelled by covariance functionsthat determine the cofactors between constraints [28] This should be applied to control points locatedin the structure but not to control points located outside the structure The control points locatedoutside the structure should be hard constraints and have their stability verified by an independentprocess in order to analyse the deformation of the structure So one will be also able to analyse thesensitivity in terms of minimal detectable deformations andor minimal identifiable deformations [23]Future researches will be addressed to sensitivity analysis for deformation monitoring networks

Author Contributions Conceptualization VFR MTM and IK Data curation VFR MTM and IK Formalanalysis VFR and IK Funding acquisition MTM MRV and LGdSJ Investigation VFR MethodologyVFR and MTM Project administration MTM IK MRV and LGdSJ Software VFR and LGdSJSupervision MTM IK and MRV Validation VFR and MRV Writing ndash original draft VFR Writing ndashreview amp editing VFR MTM IK and MRV All authors have read and agreed to the published version of themanuscript

Funding This research was funded by the CNPqmdashConselho Nacional de Desenvolvimento Cientiacutefico eTecnoloacutegicomdashBrasil (proc no 1035872019-5) This research and the APC were also funded by PETROBRAS(Grant Number 201800545-0)

Acknowledgments In this section you can acknowledge any support given which is not covered by the authorcontribution or funding sections This may include administrative and technical support or donations in kind(eg materials used for experiments)

Conflicts of Interest The authors declare no conflict of interest

References

1 Lehmann R Neitzel F Testing the compatibility of constraints for parameters of a geodetic adjustmentmodel Journal of Geodesy 2013 87 555ndash566 doi101007s00190-013-0627-2

2 Fang X Weighted total least-squares with constraints a universal formula for geodetic symmetricaltransformations Journal of Geodesy 2015 89 459ndash469 doi101007s00190-015-0790-8

3 Amiri-Simkooei AR Weighted Total Least Squares with Singular Covariance Matrices Subject to Weightedand Hard Constraints J Surv Eng 2017 143 04017018

4 Sakic P Ballu V Royer JY A Multi-Observation Least-Squares Inversion for GNSS-Acoustic SeafloorPositioning Remote Sensing 2020 12 doi103390rs12030448

5 Odijk D Stochastic modelling of the ionosphere for fast GPS ambiguity resolution Geodesy Beyond 2000Schwarz KP Ed Springer Berlin Heidelberg Berlin Heidelberg 2000 pp 387ndash392

6 de Oliveira PS Morel L Fund F Legros R Monico JFG Durand S Durand F Modelingtropospheric wet delays with dense and sparse network configurations for PPP-RTK GPS Solutions 201721 237ndash250 doi101007s10291-016-0518-0

7 Chen B Wu L Dai W Luo X Xu Y A new parameterized approach for ionospheric tomography GPSSolutions 2019 23 96 doi101007s10291-019-0893-4

8 Ahmed E Ahmed S Mohamed D Mostafa R jag 2019 Vol 13 chapter Vertical ionospheric delayestimation for single-receiver operation p 81 2 doi101515jag-2018-0041

9 Wang A Chen J Zhang Y Meng L Wang J Performance of Selected Ionospheric Models inMulti-Global Navigation Satellite System Single-Frequency Positioning over China Remote Sensing 201911 doi103390rs11172070

10 Deo M El-Mowafy A Comparison of advanced troposphere models for aiding reductionof PPP convergence time in Australia Journal of Spatial Science 2019 64 381ndash403doi1010801449859620181472046

11 Dai X Lou Y Dai Z Hu C Peng Y Qiao J Shi C Precise Orbit Determination for GNSS ManeuveringSatellite with the Constraint of a Predicted Clock Remote Sensing 2019 11 doi103390rs11161949

12 Courant R Hilbert D Methods of mathematical physics Vol 2 Vol 2 Wiley [New York] 196213 Grafarend EW Optimization of Geodetic Networks The Canadian Surveyor 1974 28 716ndash723

doi101139tcs-1974-0120

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

37 of 40

14 Teunissen P Zero Order Design Generalized Inverses Adjustment the Datum Problem andS-Transformations Optimization and Design of Geodetic Networks Grafarend EW Sansograve F EdsSpringer Berlin Heidelberg Berlin Heidelberg 1985 pp 11ndash55

15 Dermanis A Free network solutions with the DLT method ISPRS Journal of Photogrammetry and RemoteSensing 1994 49 2 ndash 12 doihttpsdoiorg1010160924-2716(94)90061-2

16 Kotsakis C Reference frame stability and nonlinear distortion in minimum-constrained networkadjustment Journal of Geodesy 2012 86 755ndash774 doi101007s00190-012-0555-6

17 Kotsakis C Datum Definition and Minimal Constraints In Encyclopedia of Geodesy Grafarend E EdSpringer International Publishing Cham 2018 pp 1ndash6 doi101007978-3-319-02370-0_157-1

18 Matsuoka MT Rofatto VF Klein I Roberto Veronez M da Silveira LG Neto JBS Alves ACRControl Points Selection Based on Maximum External Reliability for Designing Geodetic Networks AppliedSciences 2020 10 doi103390app10020687

19 Baarda W S-transformations and criterion matrices Publ on geodesy New Series 1973 520 Schaffrin B Aspects of Network Design Optimization and Design of Geodetic Networks Grafarend

EW Sansograve F Eds Springer Berlin Heidelberg Berlin Heidelberg 1985 pp 548ndash59721 Xu PB A general solution in geodetic nonlinear rank-defect models 199722 Altamimi Z Dermanis A The Choice of Reference System in ITRF Formulation VII Hotine-Marussi

Symposium on Mathematical Geodesy Sneeuw N Novaacutek P Crespi M Sansograve F Eds Springer BerlinHeidelberg Berlin Heidelberg 2012 pp 329ndash334

23 Velsink H On the deformation analysis of point fields J Geod 2015 89 1071ndash1087doi101007s00190-015-0835-z

24 Velsink H Extendable linearised adjustment model for deformation analysis Survey Review 201547 397ndash410 doi1011791752270614Y0000000140

25 Hiddo V jag 2016 Vol 10 chapter Time Series Analysis of 3D Coordinates Using NonstochasticObservations p 5 1 doi101515jag-2015-0027

26 Rao CR Markoffrsquos Theorem with Linear Restrictions on Parameters SankhyAuml The Indian Journal ofStatistics (1933-1960) 1945 7 16ndash19

27 Velsink H Testing Methods for Adjustment Models with Constraints Journal of Surveying Engineering2018 144 04018009 doi101061(ASCE)SU1943-54280000260

28 Hiddo V Testing deformation hypotheses by constraints on a time series of geodetic observations J ApplGeod 2017 12 77ndash93 doi101515jag-2017-0028

29 Baarda W A testing procedure for use in geodetic networks Publ on geodesy New Series 1968 230 Teunissen P First and second moments of non-linear least-squares estimators Bull Geodesique (Journal of

Geodesy) 1989 63 253ndash262 doihttpsdoiorg101007BF0252047531 Teunissen P Testing Theory an introduction 2 ed Delft University Press 200632 Amiri-Simkooei A Least-squares variance component estimation theory and GPS applications Phd

thesis Delft University of Technology 200733 Koch KR Parameter estimation and hypothesis testing in linear models 2 ed Springer 199934 Tao H Feng H Xu L Miao M Long H Yue J Li Z Yang G Yang X Fan L Estimation

of Crop Growth Parameters Using UAV-Based Hyperspectral Remote Sensing Data Sensors 2020 20doi103390s20051296

35 Han M Wang Q Wen Y He M He X The Application of Robust Least Squares Method inFrequency Lock Loop Fusion for Global Navigation Satellite System Receivers Sensors 2020 20doi103390s20041224

36 Wei C Estimation for the Discretely Observed CoxacircldquoIngersollacircldquoRoss Model Driven by Small SymmetricalStable Noises Symmetry 2020 12 doi103390sym12030327

37 Sakic P Ballu V Royer JY A Multi-Observation Least-Squares Inversion for GNSS-Acoustic SeafloorPositioning Remote Sensing 2020 12 doi103390rs12030448

38 Livadiotis G General Fitting Methods Based on Lq Norms and their Optimization Stats 2020 3 16ndash31doi103390stats3010002

39 Farooq SZ Yang D Ada ENJ A Cycle Slip Detection Framework for Reliable Single Frequency RTKPositioning Sensors 2020 20 doi103390s20010304

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

38 of 40

40 Araveeporn A Comparing Parameter Estimation of Random Coefficient Autoregressive Model byFrequentist Method Mathematics 2020 8 doi103390math8010062

41 Zhang C Peng T Zhou J Ji J Wang X An Improved Autoencoder and Partial Least SquaresRegression-Based Extreme Learning Machine Model for Pump Turbine Characteristics Applied Sciences2019 9 doi103390app9193987

42 Ji J Yang M Jiang L He J Teng Z Liu Y Song H Output-Only Parameters Identification ofEarthquake-Excited Building Structures with Least Squares and Input Modification Process AppliedSciences 2019 9 doi103390app9040696

43 Kargoll B On the theory and application of model misspecification tests in geodesy Doctoral thesisUniversity of Bonn Landwirtschaftliche Fakultaumlt German Bonn 2007

44 Lehmann R On the formulation of the alternative hypothesis for geodetic outlier detection J Geod 201387 373ndash386 doi101007s00190-012-0607-y

45 Rofatto VF Matsuoka MT Klein I Roberto Veronez M da Silveira LG A Monte Carlo-Based OutlierDiagnosis Method for Sensitivity Analysis Remote Sensing 2020 12 doi103390rs12050860

46 Lehmann R Improved critical values for extreme normalized and studentized residuals in GaussndashMarkovmodels J Geod 2012 86 1137ndash1146 doi101007s00190-012-0569-0

47 Rofatto VF Matsuoka MT Klein I Veronez MR Bonimani ML Lehmann R A half-century ofBaardarsquos concept of reliability a review new perspectives and applications Surv Rev 2018 0 1ndash17[httpsdoiorg1010800039626520181548118] doi1010800039626520181548118

48 Zaminpardaz S Teunissen P DIA-datasnooping and identifiability J Geod 2019 93 85ndash101doi101007s00190-018-1141-3

49 Kok JJ States U On data snooping and multiple outlier testing [microform] Johan J Kok US Dept ofCommerce National Oceanic and Atmospheric Administration National Ocean Service Charting andGeodetic Services For sale by the National Geodetic Information Center NOAA Rockville Md 1984 p61 p

50 Knight NL Wang J Rizos C Generalised measures of reliability for multiple outliers Journal of Geodesy2010 84 625ndash635 doi101007s00190-010-0392-4

51 Gui Q Li X Gong Y Li B Li G A Bayesian unmasking method for locating multiplegross errors based on posterior probabilities of classification variables J Geod 2011 85 191ndash203doi101007s00190-010-0429-8

52 Klein I Matsuoka MT Guzatto MP Nievinski FG An approach to identify multipleoutliers based on sequential likelihood ratio tests Surv Rev 2017 49 449ndash457[httpsdoiorg1010800039626520161212970] doi1010800039626520161212970

53 Hawkins DM Identification of Outliers 1 ed Springer Netherlands 1980 httpsdoiorg101007978-94-015-3994-4

54 Baarda W Statistical concepts in geodesy Publ on geodesy New Series 1967 255 Imparato D Teunissen P Tiberius C Minimal Detectable and Identifiable Biases for quality

control Surv Rev 2019 51 289ndash299 [httpsdoiorg1010800039626520181437947]doi1010800039626520181437947

56 Teunissen PJG Distributional theory for the DIA method Journal of Geodesy 2018 92 59ndash80doi101007s00190-017-1045-7

57 Foumlrstner W Reliability and discernability of extended Gauss-Markov models Seminar on mathematicalmodels of GeodeticPhotogrammetric Point Determination with Regard to Outliers and Systematic Errors 1983 Vol Series A pp 79ndash103

58 Wang J Knight NL New Outlier Separability Test and Its Application in GNSS Positioning 201259 Lehmann R Loumlsler M Multiple Outlier Detection Hypothesis Tests versus Model Selection by

Information Criteria J Surv Eng 2016 142 0401601760 Teunissen PJG Imparato D Tiberius CCJM Does RAIM with Correct Exclusion Produce Unbiased

Positions Sensors 2017 17 doi103390s1707150861 Lehmann R Loumlsler M Congruence analysis of geodetic networks ndash hypothesis tests versus model

selection by information criteria J Appl Geod 2017 11 271ndash283 doi101515jag-2016-0049

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

39 of 40

62 Rofatto V Matsuoka M Klein I DESIGN OF GEODETIC NETWORKS BASED ON OUTLIERIDENTIFICATION CRITERIA AN EXAMPLE APPLIED TO THE LEVELING NETWORK Bull Geod Sci2018 24 152ndash170 httpdxdoiorg101590s1982-21702018000200011

63 Nguyen VK Renault Milocco R Environment Monitoring for Anomaly Detection System UsingSmartphones Sensors 2019 19 doi103390s19183834

64 Nie Y Yang L Shen Y Specific Direction-Based Outlier Detection Approach for GNSS Vector NetworksSensors 2019 19 doi103390s19081836

65 Klein I Matsuoka MT Guzatto MP Nievinski FG Veronez MR Rofatto VF A newrelationship between the quality criteria for geodetic networks Journal of Geodesy 2019 93 529ndash544doi101007s00190-018-1181-8

66 Mierlo JV Statistical Analysis of Geodetic Measurements for the Investigation of Crustal Movements InRecent Crustal Movements 1977 Whitten C Green R Meade B Eds Elsevier 1979 Vol 13 Developmentsin Geotectonics pp 457 ndash 467 doihttpsdoiorg101016B978-0-444-41783-150072-6

67 van der Marel H Roumlsters AJM Statistical Testing and Quality Analysis in 3-D Networks (part II)Application to GPS Global Positioning System An Overview Bock Y Leppard N Eds Springer NewYork New York NY 1990 pp 290ndash297

68 Yang L Wang J Knight NL Shen Y Outlier separability analysis with a multiple alternative hypothesestest J Geod 2013 87 591ndash604 doi101007s00190-013-0629-0

69 Proacuteszynski W Revisiting Baardarsquos concept of minimal detectable bias with regard to outlier identifiabilityJ Geod 2015 89 993ndash1003 doi101007s00190-015-0828-y

70 Romano JP Wolf M Multiple Testing of One-Sided Hypotheses Combining Bonferroni and the BootstrapPredictive Econometrics and Big Data Kreinovich V Sriboonchitta S Chakpitak N Eds SpringerInternational Publishing Cham 2018 pp 78ndash94

71 Teunissen P An Integrity and Quality Control Procedure for use in Multi Sensor Integration 199072 Aydin C Demirel H Computation of Baardarsquos lower bound of the non-centrality parameter J Geod

2004 78 437ndash441 doi101007s00190-004-0406-173 Aydin C Power of Global Test in Deformation Analysis J Surv Eng 2012 138 51ndash56 https

doiorg101061(ASCE)SU1943-5428000006474 Bonferroni C Teoria statistica delle classi e calcolo delle probabilitagrave Pubblicazioni del R Istituto superiore di

scienze economiche e commerciali di Firenze Libreria internazionale Seeber 193675 Box GEP Muller ME A Note on the Generation of Random Normal Deviates The Annals of Mathematical

Statistics 1958 29 610ndash611 httpsdoi101214aoms117770664576 Lemeshko BY Lemeshko SB Extending the Application of Grubbs-Type Tests in Rejecting Anomalous

Measurements Measurement Techniques 2005 48 536ndash547 doi101007s11018-005-0179-977 Lehmann R Scheffler T Monte Carlo based data snooping with application to a geodetic network J

Appl Geod 2011 5 123ndash134 httpsdoiorg101515JAG201101478 Matsumoto M Nishimura T Mersenne twister A 623-dimensionally equidistributed uniform

pseudo-random number generator ACM Transactions on Modeling and Computer Simulation 1998 8 3ndash30httpsdlacmorgcitationcfmid=272991

79 Ihde J Saacutenchez L Barzaghi R Drewes H Foerste C Gruber T Liebsch G Marti U Pail R SiderisM Definition and Proposed Realization of the International Height Reference System (IHRS) Surveys inGeophysics 2017 38 549ndash570 doi101007s10712-017-9409-3

80 Lehmann R Observation error model selection by information criteria vs normality testing StudiaGeophysica et Geodaetica 2015 59 489ndash504 doi101007s11200-015-0725-0

81 Algarni DA Ali AE Heighting and Distance Accuracy with Electronic Digital Levels Journal of King SaudUniversity - Engineering Sciences 1998 10 229 ndash 239 doihttpsdoiorg101016S1018-3639(18)30698-6

82 Gemin ARS Matos AS Faggion PLAs APPLICATION OF CALIBRATION CERTIFICATE OFDIGITAL LEVELING SYSTEMS IN THE MONITORING OF STRUCTURES A CASE STUDY AT THEGOVERNADOR JOSAtilde RICHA HYDROELECTRIC POWER PLANT - PR Boletim de CiAtildeGeodAtilde 201824 235 ndash 249

83 Takalo M Rouhiainen P Development of a System Calibration Comparator for Digital Levels in FinlandNordic Journal of Surveying and Real Estate Research 1 1

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

40 of 40

84 Wiedemann W Wagner A Wunderlich T Using IATS to Read and Analyze Digital Levelling Staffs SIG2016 Paar R Marendic A Zrinjski M Eds Croatian Geodetic Society 2016 pp 515ndash526

85 PJG T The Geometry of Geodetic Inverse Linear Mapping and Non-linear Adjustment Publications onGeodesy New Series 1985 8

86 Rao CR Mitra SK Generalized inverse of a matrix and its applications 197287 Hekimoglu S Erenoglu RC Sanli DU Erdogan B Detecting Configuration Weaknesses in Geodetic

Networks Survey Review 2011 43 713ndash730 doi101179003962611X13117748892632

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

Page 20: Effects of Hard and Soft Equality Constraints on

20 of 40

(c) (d)

(e) (f)

(g) (h)

(a) (b)

Figure 4 Probability of correct identification (PCI) and Probability of correct detection (PCD) for thecase of hard constraints and for αprime = 0001 Cluster 1(ab) Cluster 2(cd) Cluster 3(ef) and Cluster4(gh)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

21 of 40

The outlier magnitude were defined from |5σ| to |9σ| The outlier of |5σ| was chosen because itis approximately the lowest MDB0(i) of the network when a single hypothesis testing is in play ascan be seen in Equation (14) or Equation (16) That MDB0(i) of |5σ| was computed for significancelevel of αprime = 0001 and a power of the test γ0 = 08 This strategy reduces the search space for a MIB(Minimal Identifiable Bias) because we will always have the following inequality MIB ge MDB0(i)[4755] Remember that the IDS procedure is an example of multiple hypothesis testing (see 2)

The sensitivity indicators MDB and MIB for IDS were also computed according to Equation (33)and (34) respectively The success rate for outlier detection and outlier identification were taken asbeing PCD = PCI = 08 respectively Table 5 provides the values of MDB and MIB for that case ofhard constraints

Table 5 MDB and MIB for the case of hard constraints based on αprime = 0001 and PCD = PCI = 08

1 hard constraint 2 hard constraints 3 hard constraintsCluster MDB (σ) MIB (σ) MDB (σ) MIB (σ) MDB (σ) MIB (σ)

1 75 - 63 63 57 572 67 68 63 64 63 643 64 64 63 63 58 584 64 64 64 64 64 64

Figure 5 shows the probability of wrong exclusion (PWE) The over-identification cases (Pover+

and Poverminus) were smaller than 0001 (ie they were practically null) There were not statistical overlap(Pol) for clusters 2 3 and 4 We will discuss more about statistical overlap (Pol) later

(c) (d)

(a) (b)

Figure 5 Probability of wrong exclusion (PWE) for the case of hard constraints and for αprime = 0001Cluster 1(a) Cluster 2(b) Cluster 3(c) and Cluster 4(d)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

22 of 40

43 Result of the Soft Constraint Effects on the Iterative Outlier Elimination Procedure

The both configuration in Figure 3(d) and Figure 3(e) were analysed in terms of soft constraintsIn that case the critical values were k = 395 k = 395 and k = 392 for two soft constraints withσc = 01mm σc = 1mm and σc = 10mm respectively In the case of three soft constraints the criticalvalues found were k = 399 k = 399 and k = 396 for σc = 01mm σc = 1mm and σc = 10mmrespectively All these critical values were computed for αprime = 0001

The correlation coefficient between w-test statistics (ρwi wj ) are displayed in Table 6 Table 7and Table 8 for two soft constraints with standard-deviation 10 times larger than measurements(ie σc = 10times σ = 10mm) 10 times better than measurements (ie σc = 01mm) and equal to themeasurements (σc = 1mm) respectively

Table 6 Correlation matrix of w-test statistics for the levelling network with two soft constraints withσc = 10mm in Figure 3(d)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14

wy1 10 02 01 01 02 10 -02 00 00 02 -04 -01 006 -01wy2 10 02 02 03 02 05 -05 -02 02 03 -03 003 00wy3 10 10 02 01 00 02 -02 00 01 04 01 -01wy4 10 02 01 00 02 -02 00 01 04 -01 01wy5 10 02 -02 02 05 -05 03 -03 00 00wy6 10 -02 00 00 02 -04 -01 -01 01wy7 10 -03 -03 -04 -04 -01 00 00wy8 10 -04 -03 -01 -04 00 00wy9 10 -03 01 04 00 00wy10 10 04 01 00 00wy11 10 -01 00 00wy12 10 00 00wy13 10 -10wy14 10

Table 7 Correlation matrix of w-test statistics for the levelling network with two soft constraints withσc = 01mm in Figure 3(d)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14

wy1 10 04 04 -03 -01 04 -03 01 01 01 -04 -01 06 -06wy2 10 04 -01 02 -01 04 -04 -01 01 03 -03 04 -04wy3 10 04 -01 -03 -01 03 -01 -01 01 04 06 -06wy4 10 04 04 01 01 -03 01 01 04 -06 06wy5 10 04 -01 01 04 -04 03 -03 -04 04wy6 10 -01 -01 -01 03 -04 -01 -06 06wy7 10 -04 -03 -04 -04 -01 -02 02wy8 10 -04 -03 -01 -04 02 -02wy9 10 -04 01 04 02 -02wy10 10 04 01 -02 02wy11 10 -01 00 00wy12 10 00 00wy13 10 -10wy14 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

23 of 40

Table 8 Correlation matrix of w-test statistics for the levelling network with two soft constraints withσc = 1mm in Figure 3(d)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14

wy1 10 03 02 -01 01 07 -03 00 01 02 -04 -01 04 -04wy2 10 03 01 03 01 04 -04 -01 01 03 -03 03 -03wy3 10 07 01 -01 00 03 -02 -01 01 04 04 -04wy4 10 03 02 01 02 -03 00 01 04 -04 04wy5 10 03 -01 01 04 -04 03 -03 -03 03wy6 10 -02 -01 00 03 -04 -01 -04 04wy7 10 -03 -03 -04 -04 -01 -01 01wy8 10 -04 -03 -01 -04 01 -01wy9 10 -03 01 04 01 -01wy10 10 04 01 -01 01wy11 10 -01 00 00wy12 10 00 00wy13 10 -10wy14 10

Table 9 gives the local redundancy (ri) the standard-deviation of the LS-estimated outlier σnablai andthe maximum absolute correlation (maxρwi wj

) for the scenarios of two constraints

Table 9 Local redundancy (ri) standard-deviation of the LS-estimated outlier σnablai (mm) and themaximum absolute correlation (maxρwi wj

) for each scenario of two soft constraints

σc = 01mm σc = 1mm σc = 10mmMeasurement ri σnablai maxρwi wj

ri σnablai maxρwi wjri σnablai maxρwi wj

y1 0581 1312 0564 0471 1457 0681 0397 1587 0994y2 0582 1311 0376 0533 1369 0423 0501 1413 0471y3 0581 1312 0564 0471 1457 0681 0397 1587 0994y4 0581 1312 0564 0471 1457 0681 0397 1587 0994y5 0582 1311 0376 0533 1369 0423 0501 1413 0471y6 0581 1312 0564 0471 1457 0681 0397 1587 0994y7 0583 1310 0359 0571 1324 0423 0563 1333 0471y8 0583 1310 0359 0571 1324 0423 0563 1333 0471y9 0583 1310 0359 0571 1324 0423 0563 1333 0471y10 0583 1310 0359 0571 1324 0423 0563 1333 0471y11 0583 1309 0358 0583 1309 0398 0583 1309 0433y12 0583 1309 0358 0583 1309 0398 0583 1309 0433y13 0007 1163 1000 0300 1826 1000 0497 14189 1000y14 0007 1163 1000 0300 1826 1000 0497 14189 1000

From Table 9 five clusters were defined for each case of two soft constraints ie for the casewhere heights A and D were given as soft constraints in Figure 3(d) as follows

bull Cluster 1 y1 y3 y4 and y6bull Cluster 2 y2 and y5bull Cluster 3 y7 y8 y9 and y10bull Cluster 4 y11 and y12bull Cluster 5 y13 and y14

The probabilities of correct identification (PCI) and correct detection (PCD) for the measurements(Cluster 1 to Cluster 4) subject to the scenarios of two soft constraints (heights A and D) are displayedin Figure 6

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

24 of 40

(c) (d)

(e) (f)

(g) (h)

(a) (b)

Figure 6 Probability of correct identification (PCI) and Probability of correct detection (PCD) for themeasurements subject to the scenarios of two soft constraints for αprime = 0001 Cluster 1(ab) Cluster2(cd) Cluster 3(ef) and Cluster 4(gh)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

25 of 40

Note that the Cluster 5 is associated with the two soft constraints (ie y13 and y14) The probabilityof correct identification (PCI) for these both soft constraints were null However the probability ofcorrect detection were not Figure 7 shows the probability of correct detection (PCD) for these two softconstraints (ie heights A and D)

Figure 7 Probability of correct detection PCD for the two soft constraints (Cluster 5 heights A and D)and for αprime = 0001

The probability of wrong exclusion PWE for the measurements (Cluster 1 to Cluster 4) subject tothe scenarios of two soft constraints (heights A and D) are displayed in Figure 8 Figure 9 gives the theprobability of wrong exclusion PWE for that two constraints (ie heights A and D)

(c) (d)

(a) (b)

Figure 8 Probability of wrong exclusion PWE for the measurements subject to the scenarios of two softconstraints for αprime = 0001 Cluster 1(a) Cluster 2(b) Cluster 3(c) and Cluster 4(d)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

26 of 40

Figure 9 Probability of wrong exclusion PWE for the two soft constraints (Cluster 5 heights A and D)and for αprime = 0001

The over-identification cases (Pover+ and Poverminus) and the statistical overlap (Pol) were practicallynull for that case

The sensitivity indicators (MDB and MIB) for each scenario of two soft constraints are displayedin Table 10

Table 10 MDB and MIB for the case of two soft constraints based on αprime = 0001 and PCD = PCI = 08

σc = 10mm σc = 1mm σc = 01mmCluster MDB (σ) MIB (σ) MDB (σ) MIB (σ) MDB (σ) MIB (σ)

1 75 25 7 71 63 632 68 68 66 66 63 633 64 64 64 64 63 634 63 63 63 63 63 635 68 - 88 - 57 -

As with the case of two soft constraints Tables 1112 and 13 show the correlations (ρwi wj ) for thecase where there were three soft constraints ie for the network configuration in Figure 3(e)

Table 11 Correlation matrix of w-test statistics for the levelling network with three soft constraintswith σc = 10mm in Figure 3(e)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14 wy15

wy1 10 02 01 01 02 10 -02 00 00 02 -04 -01 01 00 00wy2 10 02 02 03 02 05 -05 -02 02 03 -03 00 00 00wy3 10 10 02 01 00 02 -02 00 01 04 00 -01 00wy4 10 02 01 00 02 -02 00 01 04 00 01 00wy5 10 02 -02 02 05 -05 03 -03 00 00 00wy6 10 -02 00 00 02 -04 -01 -01 00 00wy7 10 -03 -03 -04 -04 -01 00 00 00wy8 10 -04 -03 -01 -04 00 00 00wy9 10 -03 01 04 00 00 00wy10 10 04 01 00 00 00wy11 10 -01 00 00 00wy12 10 00 00 00wy13 10 -05 -05wy14 10 -05wy15 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

27 of 40

Table 12 Correlation matrix of w-test statistics for the levelling network with three soft constraintswith σc = 01mm in Figure 3(e)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14 wy15

wy1 10 03 01 -01 -01 01 -04 -01 -01 -01 -03 -01 07 -01 -04wy2 10 03 -01 01 -01 03 -03 -01 01 03 -03 03 -03 00wy3 10 01 -01 -01 01 04 01 01 01 03 01 -07 04wy4 10 03 01 -01 -01 -04 -01 01 03 -01 07 -04wy5 10 03 -01 01 03 -03 03 -03 -03 03 00wy6 10 01 01 01 04 -03 -01 -07 01 04wy7 10 -01 -01 -01 -03 -01 -04 -01 04wy8 10 -01 -01 -01 -03 -01 -04 04wy9 10 -01 01 03 -01 -04 04wy10 10 03 01 -04 -01 04wy11 10 -01 00 00 00wy12 10 00 00 00wy13 10 -02 -06wy14 10 -06wy15 10

Table 13 Correlation matrix of w-test statistics for the levelling network with three soft constraintswith σc = 1mm in Figure 3(e)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14 wy15

wy1 10 03 01 00 01 06 -03 00 00 01 -04 -01 05 -02 -02wy2 10 03 01 03 01 04 -04 -01 01 03 -03 02 -02 00wy3 10 06 01 00 00 03 -01 00 01 04 02 -05 02wy4 10 03 01 00 01 -03 00 01 04 -02 05 -02wy5 10 03 -01 01 04 -04 03 -03 -02 02 00wy6 10 -01 00 00 03 -04 -01 -05 02 02wy7 10 -03 -02 -03 -04 -01 -02 00 02wy8 10 -03 -02 -01 -04 00 -02 02wy9 10 -03 01 04 00 -02 02wy10 10 04 01 -02 00 02wy11 10 -01 00 00 00wy12 10 00 00 00wy13 10 -04 -05wy14 10 -05wy15 10

Table 14 Local redundancy (ri) standard-deviation of the LS-estimated outlier σnablai (mm) and themaximum absolute correlation (maxρwi wj

) for each scenario of three soft constraints

σc = 01mm σc = 1mm σc = 10mmMeasurement ri σnablai maxρwi wj

ri σnablai maxρwi wjri σnablai maxρwi wj

y1 0702 1194 0660 0502 1411 0577 0398 1586 0992y2 0582 1311 0326 0533 1369 0412 0501 1413 0470y3 0702 1194 0660 0502 1411 0577 0398 1586 0992y4 0702 1194 0660 0502 1411 0577 0398 1586 0992y5 0582 1311 0326 0533 1369 0412 0501 1413 0470y6 0702 1194 0660 0502 1411 0577 0398 1586 0992y7 0704 1192 0415 0602 1289 0412 0563 1333 0470y8 0704 1192 0415 0602 1289 0412 0563 1333 0470y9 0704 1192 0415 0602 1289 0412 0563 1333 0470y10 0704 1192 0415 0602 1289 0412 0563 1333 0470y11 0583 1309 0326 0583 1309 0385 0583 1309 0433y12 0583 1309 0326 0583 1309 0385 0583 1309 0433y13 0012 0904 0660 0425 1534 0542 0663 12283 0501y14 0012 0904 0660 0425 1534 0542 0663 12283 0501y15 0019 0718 063 05 1414 0542 0665 12268 0501

The probabilities of correct identification (PCI) and correct detection PCD for the measurements(Cluster 1 to Cluster 4) subject to the scenarios of three soft constraints (heights A D and G) aredisplayed in Figure 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

28 of 40

(c) (d)

(e) (f)

(g) (h)

(a) (b)

Figure 10 Probability of correct identification (PCI) and Probability of correct detection (PCD) for themeasurements subject to the scenarios of three soft constraints for αprime = 0001 Cluster 1(ab) Cluster2(cd) Cluster 3(ef) and Cluster 4(gh)

The probabilities of correct identification (PCI) and correct detection (PCD) in Figure 8 werecomputed for the clusters based on Table 14 as follows

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

29 of 40

bull Cluster 1 y1 y3 y4 and y6bull Cluster 2 y2 and y5bull Cluster 3 y7 y8 y9 and y10bull Cluster 4 y11 and y12bull Cluster 5 y13 and y14bull Cluster 6 y15

Figure 11 shows the probabilities of correct identification PCI and correct detection PCD for thethree soft constraints ie for Cluster 5 (heights A and D) and Cluster 6 (height G) in Figure 3(e)

(c) (d)

(a) (b)

Figure 11 Probability of correct identification (PCI) and Probability of correct detection (PCD) for thethree constraints and for αprime = 0001 Cluster 5(ab) and Cluster 6(cd)

The probability of wrong exclusion PWE for the measurements (Cluster 1 to Cluster 4) subject tothe scenarios of three soft constraints (heights A D and G) are displayed in Figure 12 Figure 13 givesthe the probability of wrong exclusion PWE for that three constraints (ie heights A D and G)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

30 of 40

(c) (d)

(a) (b)

Figure 12 Probability of wrong exclusion PWE for the measurements subject to the scenarios of threesoft constraints and for αprime = 0001 Cluster 1(a) Cluster 2(b) Cluster 3(c) and Cluster 4(d)

(a) (b)

Figure 13 Probability of wrong exclusion (PWE) for the three constraints and for αprime = 0001 Cluster5(a) and Cluster 6(b)

The over-identification cases (Pover+ and Poverminus) and the statistical overlap (Pol) were alsopractically null for that case of three soft constraints

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

31 of 40

The sensitivity indicators (MDB and MIB) for each scenario of three soft constraints are displayedin Table 15

Table 15 MDB and MIB for the case of three soft constraints based on αprime = 0001 and PCD = PCI = 08

σc = 10mm σc = 1mm σc = 01mmCluster MDB (σ) MIB (σ) MDB (σ) MIB (σ) MDB (σ) MIB (σ)

1 75 22 68 69 58 592 68 69 66 67 64 643 64 64 63 63 58 584 63 63 63 63 63 635 59 60 74 75 435 456 59 59 69 69 346 355

5 Discussion

We start from the scenario of one hard constraint in Figure 3(a) Table 4 reveal that the maximumcorrelation between w-test statistics for the measurements constituting Cluster 1 is exactly equal to100 (ie maxρwi wj

= 100) This means that the measurements belonging to Cluster 1 are those thatare connected with unknown heights whose connections are limited to only two The both unknownheights A and D are tied only to two measurements (ie y1 and y6 linked to A and y3 and y4 linked toD) therefore if an outlier occurred in one of these measurements we would only be able to analyse theconsistency between them but we would not be able to distinguish which of them was contaminatedby an outlier This means that we would only be able to detect because the w-test statistics could belarger than a critical value k however in that case the values of w-test statistics would be the same andtherefore we would not have only one unique maximum w-test statistics but actually would have fourmaximum w-test statistics In other words the equation systems associated with the measurements ofCluster 1 are linearly dependent [87] Therefore there is no reliability in terms of outlier identificationfor the Cluster 1 as can be seen in Figure 3(a)

From Figure 3(b) we can note that there is reliability in terms of outlier detection for Cluster 1and it is caused by overlapping w-test statistics The probability of statistics overlap Pol for Cluster 1in this scenario of minimally constrained network is displayed in Figure 14

Figure 14 Probability of correct detection PCD and statistical overlap (Pol) for the Cluster 1 subject toone hard constraint and for αprime = 0001

The problem of not having more connections (ie more measurements) for the unknown heightsA and D in the case of one hard constraint with G fixed is overcome when these heights (A and D) aretaken as hard constraints in 3(b) or when the heights A D and G are hard constraints in figure 3(c)Figure 3(a b) show that the measurements of Cluster 1 are now able to identify an outlier when twohard constraints (A and D fixed) are in play It is also verified to the case of three hard constraints (A Dand G fixes) in Figure 3(e f) ie there is reliability in terms of both outlier detection and identificationfor these measurements in those conditions

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

32 of 40

One can argue that the more constraints the larger the probability of correct identification (PCI)and correct detection (PCD) This claim is not entirely correct This may be true in case where all pointson the network have the same number of connections However in the example given if both heightsB and G were selected as hard constraints the measurements of the Cluster 1 (ie y1 y6 y3 and y4)would still have their least-squares residuals fully correlated (ie ρwy1 wy6

= 100 and ρwy3 wy4= 100)

and therefore there would be no reliability in terms of outlier identification for these measurementsFrom Table 5 we observe different behaviour for the clusters as follows

bull Cluster 1 there was no MIB for the case of having only one single hard constraint whereasMDB = MIB for the other cases However both MDB and MIB decrease significantly with theincrease in the number of hard constraints

bull Cluster 2 MDB slightly smaller than MIB Both MDB and MIB were practically the same for thecase of having two or three hard constraints

bull Cluster 3 MDB = MIB for all cases of hard constraints however both MDB and MIB decreasesignificantly with the increase in the number of hard constraints

bull Cluster 4 MDB e MIB were equal for all cases

In terms of outlier detection and identification therefore Cluster 1 was more sensitive toconstraints Cluster 3 relatively sensitive to constraints whereas the Cluster 4 completely insensitiveto constraints and Cluster 2 relatively insensitive to constraints This is also can be seen in Figure 4The reason for this is that the local redundancy (ri) of the Cluster 1 increased with the increase of thenumber of hard constraints whereas the Cluster 4 have remained the same as can be seen in Table 4

Leaving aside the cases of statistical overlap (Pol) the network presents low least-squares residualscorrelation (ρwi wj lt 05) and high local redundancy (ri gt 05) Because of this the probabilities ofwrong exclusion (PWE) were less than 1 as can be seen in Figure 5 The over-identification cases(Pover+ and Poverminus) were practically null Consequently PCI asymp PCD Due of this fact the family-wiseerror rate (αprime) should be increased in order to have more success rate in the outlier detection andidentification [45]

From Figure 15 we observe that increasing the αprime increases the both probabilities of correctidentification (PCI) and correct detection (PCD) for outlier magnitude from 5σ to 6σ in the case ofthree hard constraints and from 5σ to 68σ in the case of two hard constraints Although the rates ofPover+ and PWE also increase they are not significant when compared to the improvement of correctidentification (PCI) and detection (PCD) This same analysis can be done for the other clusters

In terms of soft constraints for the cases of two constraints in Figure 3(d) we observe from Table 9that the larger the relaxation of the constraint (ie the larger the standard-deviation of the constraintσc) the larger the residuals correlation (ρwi wj ) and the standard-deviation of the outlier σnablai and thesmaller the local redundancy (ri) Consequently the probabilities of correct identification (PCI) anddetection (PCD) get smaller and smaller with the relaxation of the constraints whereas the probabilityof wrong exclusion gets larger (PWE) This can be more clearly verified in Figure 6(a b) and Figure8(a) for Cluster 1 whose measurements are connected with the constraints A and D (ie y13 and y14 inTable 9 respectively)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

33 of 40

(a) (b)

(c) (d)

Figure 15 Probabilities of correct identification PCI (a) correct detection PCD (b) over-identificationpositive Pover+ (c) and wrong exclusion PWE for the Cluster 1 subject to two and three hard constraintsand for αprime = 0001 and αprime = 01

Note from Figure 8(a) that the probability of wrong exclusion (PWE) increases as the magnitude ofthe outlier (nablai) increases However this is true up to a certain limit of outlier magnitude The effect ofresiduals correlation ρwi wj on the rates of wrong exclusion (PWE) and correct identification (PCI) tendsto decrease with the increase in the magnitude of the outlier nablai This effect is more clearly verified forCluster 1 in case where the precision of the constraints are ten times worse than the measurementsσc = 10σ = 10mm

Note from Figure 6 that identifying an outlier in the Cluster 1 (ie y1 y3 y4 and y6) whenσc = 10mm is more difficult than the other clusters This is due to the fact the Cluster 1 has a higherresiduals correlation ρwi wj = 0994 than other clusters In general therefore we observe that the largerthe relaxation of the constraints the larger is the effect of the correlation ρwi wj on the success rate ofoutlier identification (PCI) Consequently the higher the sensitivity indicator for outlier identification(MIB) Table 10 reveals that the ratio between MIB and MDB for the Cluster 1 and for the scenariowhere the standard-deviations of that two soft constraints are σc = 10mm is MIBMDB=2575=33On the other hand the relationship between MIB and MDB is practically one (ie MIBMDB=10) forthe others scenarios

If the FWE rate (αprime) were increased for the case where the two soft constraints of σc = 10mmare in play we would not have great advantages for Cluster 1 due to its high residuals correlation(ρwi wj = 994) From Figure 16 we can observe that the probability of correct identification (PCI)for outlier magnitudes from 5σ to 8σ is effectively larger to a user-defined αprime = 01 than αprime = 0001However the success rate is still less than 80 ie PCI lt 08 Note for example the correct

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

34 of 40

identification rate is PCI = 56 for an outlier magnitude of nablai = 8σ and αprime = 01 For αprime = 01 theMIB = 335σ = 335mm whereas for αprime = 0001 is MIB = 25σ = 25mm Therefore in that case theMIB for PCI = 08(80) and αprime = 01 would be 34 larger than user-defined αprime = 0001

Figure 16 Probabilities of correct identification (PCI) for the Cluster 1 subject to two soft constraints (2sc) A and D for αprime = 0001 and αprime = 01

The soft constraints A and D were grouped in the Cluster 5 (ie A and D were treated aspseudo-observations in the model) There is no reliability in terms of outlier identification for theconstraints because the residual correlation between them is ρwi wj = 100 as can be seen in Table 9for y13 and y14 However these soft constraints are able to detect an outlier In that case the probabilityof correct detection PCD in Figure 7 is mainly caused by the statistical overlap Pol as can be seen in theexample of the case of σc = 10mm in Figure 17 From Table 10 we observe that the larger the relaxationof the constraints the larger the MDB Note that the values of MDB are given in σ and therefore theMDB for σc = 10mm is larger than σc = 1mm and σc = 01mm ie we had the following inequalityMDB = 68σc = 68times 10mm = 68mm gt MDB = 88σc = 88times 1mm = 88mm gt MDB = 57σc =

57times 01mm = 57mm In that case if the FWE (αprime) were increased the rate of outlier detection by theCluster 4 (ie by the soft constraints) would increase

Figure 17 Probabilities of correct detection PCD and statistical overlap Pol for the two soft constraintsA and D (Cluster 5) with σc = 10mm and for αprime = 0001

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

35 of 40

Similar effects of the relaxation of the constraints on the performance of the IDS in case of two softconstraints are verified in case of three soft constraints as can be seen in the Figures 10 11 12 and 13

In case of having three soft constraints in Figure 3(e) there is reliability in terms of outlieridentification for the three pseudo-observations y13 y14 and y15 (ie for AD and G) as can be seen inFigure 10 and Table 15 In that case we also observe that the probabilities of correct detection PCD ofthe soft constraints A and D (ie Cluster 5) were approximately 13 for σc = 10mm 16 for σc = 1mmand 24 for σc = 01mm larger than the scenario of the network subject to two soft constraints Table15 reveals that the advantage of having three soft constraints instead of two constraints is that theconstraints become identifiable in the presence of an outlier The behaviour of the probabilities ofcorrect detection (PCD) correct identification (PCI) and wrong exclusion (PWE) was similar to the caseof the two soft constraints

Furthermore the larger the relaxation of the constraints the smaller the residuals correlationbetween the measurements and the soft constraints and the larger the residuals correlation amongthe measurements This effect can be verified in Tables 6 7 and 8 for two soft constraints as well as inTables 11 12 and 13 for three soft constraints

In general we observe that the case of two soft constraints for σc = 01mm was comparablewith two hard constraints (see eg Table 5 and Table 10) in terms of the probability levels associatedwith IDS for the measurements (ie clusters 1 2 3 and 4) In the same way for the case of twosoft constraints with σc = 1mm or σc = 10mm the probabilities levels were similar to the one hardconstraint for that measurements with the benefit of two soft constraints having reliability in terms ofoutlier identification for the Cluster 1 Finally the three soft constraints with σc = 1mm and σc = 10mmwere comparable to the two soft constraints for that scenario of constraints relaxation wheres thethree soft constraints for σc = 01mm showed similar outcomes with three hard constraints for themeasurements (see eg Table 5 and Table 15) In that case however an advantage of the three softconstraints on the three hard constraints is the possibility of analysing the sensitivity of the constraintsIt is important to emphasised that the stochastic models of the measurements and constraints wereassumed well-known and defined for the analyzes performed here

6 Conclusions

In general the probability of correct identification (PCI) for the case of hard constraints is largerthan soft constraints It can be verified in Figure 4 Figure 6 and Figure 10 Therefore hard constraintsshould be used in the stage of pre-processing data for the purpose of identifying and removing possibleoutlying measurements In that process one should opt to set out the redundant hard constraints atpoints in the network where the smallest connections exist After identifying and removing possibleoutliers the soft constraints should be employed to propagate the uncertainties of the constraints(pseudobservations) to the model parameters during the process of least-squares estimation Thisrecommendation is valid for outlier detection and identification purpose

We highlight the main findings of this research as follows

bull Under a system of a high local redundancy ri gt 05 and low residuals correlation (ρwi wj lt 05) ifone increase the family-wise error rate (FWE) of the test statistic the performance of the procedurewill be improved for both scenarios of hard constraints and soft constraintsbull The larger the relaxation of the constraints the larger is the effect of the residuals correlation

(ρwi wj ) on the success rate of outlier identification (PCI) Consequently the higher the sensitivityindicator for outlier identification (MIB) and therefore more difficult it becomes to identify anoutlier

bull Under a scenario of soft constraints one should set out at least three soft constraints in order toidentify an outlier in the constraints

In other types of analysis for example deformation analysis of geodetic networks one shouldformulated the constraints as pseudo-observations with σc gt 0 in an adjustment model of system

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

36 of 40

equations In other words the relaxation of constraints makes it possible to estimate deformationeffects that are unmodelled in the deterministic model matrix for example deformations which have aspatial and temporal variations In that case the deformation can be modelled by covariance functionsthat determine the cofactors between constraints [28] This should be applied to control points locatedin the structure but not to control points located outside the structure The control points locatedoutside the structure should be hard constraints and have their stability verified by an independentprocess in order to analyse the deformation of the structure So one will be also able to analyse thesensitivity in terms of minimal detectable deformations andor minimal identifiable deformations [23]Future researches will be addressed to sensitivity analysis for deformation monitoring networks

Author Contributions Conceptualization VFR MTM and IK Data curation VFR MTM and IK Formalanalysis VFR and IK Funding acquisition MTM MRV and LGdSJ Investigation VFR MethodologyVFR and MTM Project administration MTM IK MRV and LGdSJ Software VFR and LGdSJSupervision MTM IK and MRV Validation VFR and MRV Writing ndash original draft VFR Writing ndashreview amp editing VFR MTM IK and MRV All authors have read and agreed to the published version of themanuscript

Funding This research was funded by the CNPqmdashConselho Nacional de Desenvolvimento Cientiacutefico eTecnoloacutegicomdashBrasil (proc no 1035872019-5) This research and the APC were also funded by PETROBRAS(Grant Number 201800545-0)

Acknowledgments In this section you can acknowledge any support given which is not covered by the authorcontribution or funding sections This may include administrative and technical support or donations in kind(eg materials used for experiments)

Conflicts of Interest The authors declare no conflict of interest

References

1 Lehmann R Neitzel F Testing the compatibility of constraints for parameters of a geodetic adjustmentmodel Journal of Geodesy 2013 87 555ndash566 doi101007s00190-013-0627-2

2 Fang X Weighted total least-squares with constraints a universal formula for geodetic symmetricaltransformations Journal of Geodesy 2015 89 459ndash469 doi101007s00190-015-0790-8

3 Amiri-Simkooei AR Weighted Total Least Squares with Singular Covariance Matrices Subject to Weightedand Hard Constraints J Surv Eng 2017 143 04017018

4 Sakic P Ballu V Royer JY A Multi-Observation Least-Squares Inversion for GNSS-Acoustic SeafloorPositioning Remote Sensing 2020 12 doi103390rs12030448

5 Odijk D Stochastic modelling of the ionosphere for fast GPS ambiguity resolution Geodesy Beyond 2000Schwarz KP Ed Springer Berlin Heidelberg Berlin Heidelberg 2000 pp 387ndash392

6 de Oliveira PS Morel L Fund F Legros R Monico JFG Durand S Durand F Modelingtropospheric wet delays with dense and sparse network configurations for PPP-RTK GPS Solutions 201721 237ndash250 doi101007s10291-016-0518-0

7 Chen B Wu L Dai W Luo X Xu Y A new parameterized approach for ionospheric tomography GPSSolutions 2019 23 96 doi101007s10291-019-0893-4

8 Ahmed E Ahmed S Mohamed D Mostafa R jag 2019 Vol 13 chapter Vertical ionospheric delayestimation for single-receiver operation p 81 2 doi101515jag-2018-0041

9 Wang A Chen J Zhang Y Meng L Wang J Performance of Selected Ionospheric Models inMulti-Global Navigation Satellite System Single-Frequency Positioning over China Remote Sensing 201911 doi103390rs11172070

10 Deo M El-Mowafy A Comparison of advanced troposphere models for aiding reductionof PPP convergence time in Australia Journal of Spatial Science 2019 64 381ndash403doi1010801449859620181472046

11 Dai X Lou Y Dai Z Hu C Peng Y Qiao J Shi C Precise Orbit Determination for GNSS ManeuveringSatellite with the Constraint of a Predicted Clock Remote Sensing 2019 11 doi103390rs11161949

12 Courant R Hilbert D Methods of mathematical physics Vol 2 Vol 2 Wiley [New York] 196213 Grafarend EW Optimization of Geodetic Networks The Canadian Surveyor 1974 28 716ndash723

doi101139tcs-1974-0120

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

37 of 40

14 Teunissen P Zero Order Design Generalized Inverses Adjustment the Datum Problem andS-Transformations Optimization and Design of Geodetic Networks Grafarend EW Sansograve F EdsSpringer Berlin Heidelberg Berlin Heidelberg 1985 pp 11ndash55

15 Dermanis A Free network solutions with the DLT method ISPRS Journal of Photogrammetry and RemoteSensing 1994 49 2 ndash 12 doihttpsdoiorg1010160924-2716(94)90061-2

16 Kotsakis C Reference frame stability and nonlinear distortion in minimum-constrained networkadjustment Journal of Geodesy 2012 86 755ndash774 doi101007s00190-012-0555-6

17 Kotsakis C Datum Definition and Minimal Constraints In Encyclopedia of Geodesy Grafarend E EdSpringer International Publishing Cham 2018 pp 1ndash6 doi101007978-3-319-02370-0_157-1

18 Matsuoka MT Rofatto VF Klein I Roberto Veronez M da Silveira LG Neto JBS Alves ACRControl Points Selection Based on Maximum External Reliability for Designing Geodetic Networks AppliedSciences 2020 10 doi103390app10020687

19 Baarda W S-transformations and criterion matrices Publ on geodesy New Series 1973 520 Schaffrin B Aspects of Network Design Optimization and Design of Geodetic Networks Grafarend

EW Sansograve F Eds Springer Berlin Heidelberg Berlin Heidelberg 1985 pp 548ndash59721 Xu PB A general solution in geodetic nonlinear rank-defect models 199722 Altamimi Z Dermanis A The Choice of Reference System in ITRF Formulation VII Hotine-Marussi

Symposium on Mathematical Geodesy Sneeuw N Novaacutek P Crespi M Sansograve F Eds Springer BerlinHeidelberg Berlin Heidelberg 2012 pp 329ndash334

23 Velsink H On the deformation analysis of point fields J Geod 2015 89 1071ndash1087doi101007s00190-015-0835-z

24 Velsink H Extendable linearised adjustment model for deformation analysis Survey Review 201547 397ndash410 doi1011791752270614Y0000000140

25 Hiddo V jag 2016 Vol 10 chapter Time Series Analysis of 3D Coordinates Using NonstochasticObservations p 5 1 doi101515jag-2015-0027

26 Rao CR Markoffrsquos Theorem with Linear Restrictions on Parameters SankhyAuml The Indian Journal ofStatistics (1933-1960) 1945 7 16ndash19

27 Velsink H Testing Methods for Adjustment Models with Constraints Journal of Surveying Engineering2018 144 04018009 doi101061(ASCE)SU1943-54280000260

28 Hiddo V Testing deformation hypotheses by constraints on a time series of geodetic observations J ApplGeod 2017 12 77ndash93 doi101515jag-2017-0028

29 Baarda W A testing procedure for use in geodetic networks Publ on geodesy New Series 1968 230 Teunissen P First and second moments of non-linear least-squares estimators Bull Geodesique (Journal of

Geodesy) 1989 63 253ndash262 doihttpsdoiorg101007BF0252047531 Teunissen P Testing Theory an introduction 2 ed Delft University Press 200632 Amiri-Simkooei A Least-squares variance component estimation theory and GPS applications Phd

thesis Delft University of Technology 200733 Koch KR Parameter estimation and hypothesis testing in linear models 2 ed Springer 199934 Tao H Feng H Xu L Miao M Long H Yue J Li Z Yang G Yang X Fan L Estimation

of Crop Growth Parameters Using UAV-Based Hyperspectral Remote Sensing Data Sensors 2020 20doi103390s20051296

35 Han M Wang Q Wen Y He M He X The Application of Robust Least Squares Method inFrequency Lock Loop Fusion for Global Navigation Satellite System Receivers Sensors 2020 20doi103390s20041224

36 Wei C Estimation for the Discretely Observed CoxacircldquoIngersollacircldquoRoss Model Driven by Small SymmetricalStable Noises Symmetry 2020 12 doi103390sym12030327

37 Sakic P Ballu V Royer JY A Multi-Observation Least-Squares Inversion for GNSS-Acoustic SeafloorPositioning Remote Sensing 2020 12 doi103390rs12030448

38 Livadiotis G General Fitting Methods Based on Lq Norms and their Optimization Stats 2020 3 16ndash31doi103390stats3010002

39 Farooq SZ Yang D Ada ENJ A Cycle Slip Detection Framework for Reliable Single Frequency RTKPositioning Sensors 2020 20 doi103390s20010304

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

38 of 40

40 Araveeporn A Comparing Parameter Estimation of Random Coefficient Autoregressive Model byFrequentist Method Mathematics 2020 8 doi103390math8010062

41 Zhang C Peng T Zhou J Ji J Wang X An Improved Autoencoder and Partial Least SquaresRegression-Based Extreme Learning Machine Model for Pump Turbine Characteristics Applied Sciences2019 9 doi103390app9193987

42 Ji J Yang M Jiang L He J Teng Z Liu Y Song H Output-Only Parameters Identification ofEarthquake-Excited Building Structures with Least Squares and Input Modification Process AppliedSciences 2019 9 doi103390app9040696

43 Kargoll B On the theory and application of model misspecification tests in geodesy Doctoral thesisUniversity of Bonn Landwirtschaftliche Fakultaumlt German Bonn 2007

44 Lehmann R On the formulation of the alternative hypothesis for geodetic outlier detection J Geod 201387 373ndash386 doi101007s00190-012-0607-y

45 Rofatto VF Matsuoka MT Klein I Roberto Veronez M da Silveira LG A Monte Carlo-Based OutlierDiagnosis Method for Sensitivity Analysis Remote Sensing 2020 12 doi103390rs12050860

46 Lehmann R Improved critical values for extreme normalized and studentized residuals in GaussndashMarkovmodels J Geod 2012 86 1137ndash1146 doi101007s00190-012-0569-0

47 Rofatto VF Matsuoka MT Klein I Veronez MR Bonimani ML Lehmann R A half-century ofBaardarsquos concept of reliability a review new perspectives and applications Surv Rev 2018 0 1ndash17[httpsdoiorg1010800039626520181548118] doi1010800039626520181548118

48 Zaminpardaz S Teunissen P DIA-datasnooping and identifiability J Geod 2019 93 85ndash101doi101007s00190-018-1141-3

49 Kok JJ States U On data snooping and multiple outlier testing [microform] Johan J Kok US Dept ofCommerce National Oceanic and Atmospheric Administration National Ocean Service Charting andGeodetic Services For sale by the National Geodetic Information Center NOAA Rockville Md 1984 p61 p

50 Knight NL Wang J Rizos C Generalised measures of reliability for multiple outliers Journal of Geodesy2010 84 625ndash635 doi101007s00190-010-0392-4

51 Gui Q Li X Gong Y Li B Li G A Bayesian unmasking method for locating multiplegross errors based on posterior probabilities of classification variables J Geod 2011 85 191ndash203doi101007s00190-010-0429-8

52 Klein I Matsuoka MT Guzatto MP Nievinski FG An approach to identify multipleoutliers based on sequential likelihood ratio tests Surv Rev 2017 49 449ndash457[httpsdoiorg1010800039626520161212970] doi1010800039626520161212970

53 Hawkins DM Identification of Outliers 1 ed Springer Netherlands 1980 httpsdoiorg101007978-94-015-3994-4

54 Baarda W Statistical concepts in geodesy Publ on geodesy New Series 1967 255 Imparato D Teunissen P Tiberius C Minimal Detectable and Identifiable Biases for quality

control Surv Rev 2019 51 289ndash299 [httpsdoiorg1010800039626520181437947]doi1010800039626520181437947

56 Teunissen PJG Distributional theory for the DIA method Journal of Geodesy 2018 92 59ndash80doi101007s00190-017-1045-7

57 Foumlrstner W Reliability and discernability of extended Gauss-Markov models Seminar on mathematicalmodels of GeodeticPhotogrammetric Point Determination with Regard to Outliers and Systematic Errors 1983 Vol Series A pp 79ndash103

58 Wang J Knight NL New Outlier Separability Test and Its Application in GNSS Positioning 201259 Lehmann R Loumlsler M Multiple Outlier Detection Hypothesis Tests versus Model Selection by

Information Criteria J Surv Eng 2016 142 0401601760 Teunissen PJG Imparato D Tiberius CCJM Does RAIM with Correct Exclusion Produce Unbiased

Positions Sensors 2017 17 doi103390s1707150861 Lehmann R Loumlsler M Congruence analysis of geodetic networks ndash hypothesis tests versus model

selection by information criteria J Appl Geod 2017 11 271ndash283 doi101515jag-2016-0049

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

39 of 40

62 Rofatto V Matsuoka M Klein I DESIGN OF GEODETIC NETWORKS BASED ON OUTLIERIDENTIFICATION CRITERIA AN EXAMPLE APPLIED TO THE LEVELING NETWORK Bull Geod Sci2018 24 152ndash170 httpdxdoiorg101590s1982-21702018000200011

63 Nguyen VK Renault Milocco R Environment Monitoring for Anomaly Detection System UsingSmartphones Sensors 2019 19 doi103390s19183834

64 Nie Y Yang L Shen Y Specific Direction-Based Outlier Detection Approach for GNSS Vector NetworksSensors 2019 19 doi103390s19081836

65 Klein I Matsuoka MT Guzatto MP Nievinski FG Veronez MR Rofatto VF A newrelationship between the quality criteria for geodetic networks Journal of Geodesy 2019 93 529ndash544doi101007s00190-018-1181-8

66 Mierlo JV Statistical Analysis of Geodetic Measurements for the Investigation of Crustal Movements InRecent Crustal Movements 1977 Whitten C Green R Meade B Eds Elsevier 1979 Vol 13 Developmentsin Geotectonics pp 457 ndash 467 doihttpsdoiorg101016B978-0-444-41783-150072-6

67 van der Marel H Roumlsters AJM Statistical Testing and Quality Analysis in 3-D Networks (part II)Application to GPS Global Positioning System An Overview Bock Y Leppard N Eds Springer NewYork New York NY 1990 pp 290ndash297

68 Yang L Wang J Knight NL Shen Y Outlier separability analysis with a multiple alternative hypothesestest J Geod 2013 87 591ndash604 doi101007s00190-013-0629-0

69 Proacuteszynski W Revisiting Baardarsquos concept of minimal detectable bias with regard to outlier identifiabilityJ Geod 2015 89 993ndash1003 doi101007s00190-015-0828-y

70 Romano JP Wolf M Multiple Testing of One-Sided Hypotheses Combining Bonferroni and the BootstrapPredictive Econometrics and Big Data Kreinovich V Sriboonchitta S Chakpitak N Eds SpringerInternational Publishing Cham 2018 pp 78ndash94

71 Teunissen P An Integrity and Quality Control Procedure for use in Multi Sensor Integration 199072 Aydin C Demirel H Computation of Baardarsquos lower bound of the non-centrality parameter J Geod

2004 78 437ndash441 doi101007s00190-004-0406-173 Aydin C Power of Global Test in Deformation Analysis J Surv Eng 2012 138 51ndash56 https

doiorg101061(ASCE)SU1943-5428000006474 Bonferroni C Teoria statistica delle classi e calcolo delle probabilitagrave Pubblicazioni del R Istituto superiore di

scienze economiche e commerciali di Firenze Libreria internazionale Seeber 193675 Box GEP Muller ME A Note on the Generation of Random Normal Deviates The Annals of Mathematical

Statistics 1958 29 610ndash611 httpsdoi101214aoms117770664576 Lemeshko BY Lemeshko SB Extending the Application of Grubbs-Type Tests in Rejecting Anomalous

Measurements Measurement Techniques 2005 48 536ndash547 doi101007s11018-005-0179-977 Lehmann R Scheffler T Monte Carlo based data snooping with application to a geodetic network J

Appl Geod 2011 5 123ndash134 httpsdoiorg101515JAG201101478 Matsumoto M Nishimura T Mersenne twister A 623-dimensionally equidistributed uniform

pseudo-random number generator ACM Transactions on Modeling and Computer Simulation 1998 8 3ndash30httpsdlacmorgcitationcfmid=272991

79 Ihde J Saacutenchez L Barzaghi R Drewes H Foerste C Gruber T Liebsch G Marti U Pail R SiderisM Definition and Proposed Realization of the International Height Reference System (IHRS) Surveys inGeophysics 2017 38 549ndash570 doi101007s10712-017-9409-3

80 Lehmann R Observation error model selection by information criteria vs normality testing StudiaGeophysica et Geodaetica 2015 59 489ndash504 doi101007s11200-015-0725-0

81 Algarni DA Ali AE Heighting and Distance Accuracy with Electronic Digital Levels Journal of King SaudUniversity - Engineering Sciences 1998 10 229 ndash 239 doihttpsdoiorg101016S1018-3639(18)30698-6

82 Gemin ARS Matos AS Faggion PLAs APPLICATION OF CALIBRATION CERTIFICATE OFDIGITAL LEVELING SYSTEMS IN THE MONITORING OF STRUCTURES A CASE STUDY AT THEGOVERNADOR JOSAtilde RICHA HYDROELECTRIC POWER PLANT - PR Boletim de CiAtildeGeodAtilde 201824 235 ndash 249

83 Takalo M Rouhiainen P Development of a System Calibration Comparator for Digital Levels in FinlandNordic Journal of Surveying and Real Estate Research 1 1

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

40 of 40

84 Wiedemann W Wagner A Wunderlich T Using IATS to Read and Analyze Digital Levelling Staffs SIG2016 Paar R Marendic A Zrinjski M Eds Croatian Geodetic Society 2016 pp 515ndash526

85 PJG T The Geometry of Geodetic Inverse Linear Mapping and Non-linear Adjustment Publications onGeodesy New Series 1985 8

86 Rao CR Mitra SK Generalized inverse of a matrix and its applications 197287 Hekimoglu S Erenoglu RC Sanli DU Erdogan B Detecting Configuration Weaknesses in Geodetic

Networks Survey Review 2011 43 713ndash730 doi101179003962611X13117748892632

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

Page 21: Effects of Hard and Soft Equality Constraints on

21 of 40

The outlier magnitude were defined from |5σ| to |9σ| The outlier of |5σ| was chosen because itis approximately the lowest MDB0(i) of the network when a single hypothesis testing is in play ascan be seen in Equation (14) or Equation (16) That MDB0(i) of |5σ| was computed for significancelevel of αprime = 0001 and a power of the test γ0 = 08 This strategy reduces the search space for a MIB(Minimal Identifiable Bias) because we will always have the following inequality MIB ge MDB0(i)[4755] Remember that the IDS procedure is an example of multiple hypothesis testing (see 2)

The sensitivity indicators MDB and MIB for IDS were also computed according to Equation (33)and (34) respectively The success rate for outlier detection and outlier identification were taken asbeing PCD = PCI = 08 respectively Table 5 provides the values of MDB and MIB for that case ofhard constraints

Table 5 MDB and MIB for the case of hard constraints based on αprime = 0001 and PCD = PCI = 08

1 hard constraint 2 hard constraints 3 hard constraintsCluster MDB (σ) MIB (σ) MDB (σ) MIB (σ) MDB (σ) MIB (σ)

1 75 - 63 63 57 572 67 68 63 64 63 643 64 64 63 63 58 584 64 64 64 64 64 64

Figure 5 shows the probability of wrong exclusion (PWE) The over-identification cases (Pover+

and Poverminus) were smaller than 0001 (ie they were practically null) There were not statistical overlap(Pol) for clusters 2 3 and 4 We will discuss more about statistical overlap (Pol) later

(c) (d)

(a) (b)

Figure 5 Probability of wrong exclusion (PWE) for the case of hard constraints and for αprime = 0001Cluster 1(a) Cluster 2(b) Cluster 3(c) and Cluster 4(d)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

22 of 40

43 Result of the Soft Constraint Effects on the Iterative Outlier Elimination Procedure

The both configuration in Figure 3(d) and Figure 3(e) were analysed in terms of soft constraintsIn that case the critical values were k = 395 k = 395 and k = 392 for two soft constraints withσc = 01mm σc = 1mm and σc = 10mm respectively In the case of three soft constraints the criticalvalues found were k = 399 k = 399 and k = 396 for σc = 01mm σc = 1mm and σc = 10mmrespectively All these critical values were computed for αprime = 0001

The correlation coefficient between w-test statistics (ρwi wj ) are displayed in Table 6 Table 7and Table 8 for two soft constraints with standard-deviation 10 times larger than measurements(ie σc = 10times σ = 10mm) 10 times better than measurements (ie σc = 01mm) and equal to themeasurements (σc = 1mm) respectively

Table 6 Correlation matrix of w-test statistics for the levelling network with two soft constraints withσc = 10mm in Figure 3(d)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14

wy1 10 02 01 01 02 10 -02 00 00 02 -04 -01 006 -01wy2 10 02 02 03 02 05 -05 -02 02 03 -03 003 00wy3 10 10 02 01 00 02 -02 00 01 04 01 -01wy4 10 02 01 00 02 -02 00 01 04 -01 01wy5 10 02 -02 02 05 -05 03 -03 00 00wy6 10 -02 00 00 02 -04 -01 -01 01wy7 10 -03 -03 -04 -04 -01 00 00wy8 10 -04 -03 -01 -04 00 00wy9 10 -03 01 04 00 00wy10 10 04 01 00 00wy11 10 -01 00 00wy12 10 00 00wy13 10 -10wy14 10

Table 7 Correlation matrix of w-test statistics for the levelling network with two soft constraints withσc = 01mm in Figure 3(d)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14

wy1 10 04 04 -03 -01 04 -03 01 01 01 -04 -01 06 -06wy2 10 04 -01 02 -01 04 -04 -01 01 03 -03 04 -04wy3 10 04 -01 -03 -01 03 -01 -01 01 04 06 -06wy4 10 04 04 01 01 -03 01 01 04 -06 06wy5 10 04 -01 01 04 -04 03 -03 -04 04wy6 10 -01 -01 -01 03 -04 -01 -06 06wy7 10 -04 -03 -04 -04 -01 -02 02wy8 10 -04 -03 -01 -04 02 -02wy9 10 -04 01 04 02 -02wy10 10 04 01 -02 02wy11 10 -01 00 00wy12 10 00 00wy13 10 -10wy14 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

23 of 40

Table 8 Correlation matrix of w-test statistics for the levelling network with two soft constraints withσc = 1mm in Figure 3(d)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14

wy1 10 03 02 -01 01 07 -03 00 01 02 -04 -01 04 -04wy2 10 03 01 03 01 04 -04 -01 01 03 -03 03 -03wy3 10 07 01 -01 00 03 -02 -01 01 04 04 -04wy4 10 03 02 01 02 -03 00 01 04 -04 04wy5 10 03 -01 01 04 -04 03 -03 -03 03wy6 10 -02 -01 00 03 -04 -01 -04 04wy7 10 -03 -03 -04 -04 -01 -01 01wy8 10 -04 -03 -01 -04 01 -01wy9 10 -03 01 04 01 -01wy10 10 04 01 -01 01wy11 10 -01 00 00wy12 10 00 00wy13 10 -10wy14 10

Table 9 gives the local redundancy (ri) the standard-deviation of the LS-estimated outlier σnablai andthe maximum absolute correlation (maxρwi wj

) for the scenarios of two constraints

Table 9 Local redundancy (ri) standard-deviation of the LS-estimated outlier σnablai (mm) and themaximum absolute correlation (maxρwi wj

) for each scenario of two soft constraints

σc = 01mm σc = 1mm σc = 10mmMeasurement ri σnablai maxρwi wj

ri σnablai maxρwi wjri σnablai maxρwi wj

y1 0581 1312 0564 0471 1457 0681 0397 1587 0994y2 0582 1311 0376 0533 1369 0423 0501 1413 0471y3 0581 1312 0564 0471 1457 0681 0397 1587 0994y4 0581 1312 0564 0471 1457 0681 0397 1587 0994y5 0582 1311 0376 0533 1369 0423 0501 1413 0471y6 0581 1312 0564 0471 1457 0681 0397 1587 0994y7 0583 1310 0359 0571 1324 0423 0563 1333 0471y8 0583 1310 0359 0571 1324 0423 0563 1333 0471y9 0583 1310 0359 0571 1324 0423 0563 1333 0471y10 0583 1310 0359 0571 1324 0423 0563 1333 0471y11 0583 1309 0358 0583 1309 0398 0583 1309 0433y12 0583 1309 0358 0583 1309 0398 0583 1309 0433y13 0007 1163 1000 0300 1826 1000 0497 14189 1000y14 0007 1163 1000 0300 1826 1000 0497 14189 1000

From Table 9 five clusters were defined for each case of two soft constraints ie for the casewhere heights A and D were given as soft constraints in Figure 3(d) as follows

bull Cluster 1 y1 y3 y4 and y6bull Cluster 2 y2 and y5bull Cluster 3 y7 y8 y9 and y10bull Cluster 4 y11 and y12bull Cluster 5 y13 and y14

The probabilities of correct identification (PCI) and correct detection (PCD) for the measurements(Cluster 1 to Cluster 4) subject to the scenarios of two soft constraints (heights A and D) are displayedin Figure 6

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

24 of 40

(c) (d)

(e) (f)

(g) (h)

(a) (b)

Figure 6 Probability of correct identification (PCI) and Probability of correct detection (PCD) for themeasurements subject to the scenarios of two soft constraints for αprime = 0001 Cluster 1(ab) Cluster2(cd) Cluster 3(ef) and Cluster 4(gh)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

25 of 40

Note that the Cluster 5 is associated with the two soft constraints (ie y13 and y14) The probabilityof correct identification (PCI) for these both soft constraints were null However the probability ofcorrect detection were not Figure 7 shows the probability of correct detection (PCD) for these two softconstraints (ie heights A and D)

Figure 7 Probability of correct detection PCD for the two soft constraints (Cluster 5 heights A and D)and for αprime = 0001

The probability of wrong exclusion PWE for the measurements (Cluster 1 to Cluster 4) subject tothe scenarios of two soft constraints (heights A and D) are displayed in Figure 8 Figure 9 gives the theprobability of wrong exclusion PWE for that two constraints (ie heights A and D)

(c) (d)

(a) (b)

Figure 8 Probability of wrong exclusion PWE for the measurements subject to the scenarios of two softconstraints for αprime = 0001 Cluster 1(a) Cluster 2(b) Cluster 3(c) and Cluster 4(d)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

26 of 40

Figure 9 Probability of wrong exclusion PWE for the two soft constraints (Cluster 5 heights A and D)and for αprime = 0001

The over-identification cases (Pover+ and Poverminus) and the statistical overlap (Pol) were practicallynull for that case

The sensitivity indicators (MDB and MIB) for each scenario of two soft constraints are displayedin Table 10

Table 10 MDB and MIB for the case of two soft constraints based on αprime = 0001 and PCD = PCI = 08

σc = 10mm σc = 1mm σc = 01mmCluster MDB (σ) MIB (σ) MDB (σ) MIB (σ) MDB (σ) MIB (σ)

1 75 25 7 71 63 632 68 68 66 66 63 633 64 64 64 64 63 634 63 63 63 63 63 635 68 - 88 - 57 -

As with the case of two soft constraints Tables 1112 and 13 show the correlations (ρwi wj ) for thecase where there were three soft constraints ie for the network configuration in Figure 3(e)

Table 11 Correlation matrix of w-test statistics for the levelling network with three soft constraintswith σc = 10mm in Figure 3(e)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14 wy15

wy1 10 02 01 01 02 10 -02 00 00 02 -04 -01 01 00 00wy2 10 02 02 03 02 05 -05 -02 02 03 -03 00 00 00wy3 10 10 02 01 00 02 -02 00 01 04 00 -01 00wy4 10 02 01 00 02 -02 00 01 04 00 01 00wy5 10 02 -02 02 05 -05 03 -03 00 00 00wy6 10 -02 00 00 02 -04 -01 -01 00 00wy7 10 -03 -03 -04 -04 -01 00 00 00wy8 10 -04 -03 -01 -04 00 00 00wy9 10 -03 01 04 00 00 00wy10 10 04 01 00 00 00wy11 10 -01 00 00 00wy12 10 00 00 00wy13 10 -05 -05wy14 10 -05wy15 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

27 of 40

Table 12 Correlation matrix of w-test statistics for the levelling network with three soft constraintswith σc = 01mm in Figure 3(e)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14 wy15

wy1 10 03 01 -01 -01 01 -04 -01 -01 -01 -03 -01 07 -01 -04wy2 10 03 -01 01 -01 03 -03 -01 01 03 -03 03 -03 00wy3 10 01 -01 -01 01 04 01 01 01 03 01 -07 04wy4 10 03 01 -01 -01 -04 -01 01 03 -01 07 -04wy5 10 03 -01 01 03 -03 03 -03 -03 03 00wy6 10 01 01 01 04 -03 -01 -07 01 04wy7 10 -01 -01 -01 -03 -01 -04 -01 04wy8 10 -01 -01 -01 -03 -01 -04 04wy9 10 -01 01 03 -01 -04 04wy10 10 03 01 -04 -01 04wy11 10 -01 00 00 00wy12 10 00 00 00wy13 10 -02 -06wy14 10 -06wy15 10

Table 13 Correlation matrix of w-test statistics for the levelling network with three soft constraintswith σc = 1mm in Figure 3(e)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14 wy15

wy1 10 03 01 00 01 06 -03 00 00 01 -04 -01 05 -02 -02wy2 10 03 01 03 01 04 -04 -01 01 03 -03 02 -02 00wy3 10 06 01 00 00 03 -01 00 01 04 02 -05 02wy4 10 03 01 00 01 -03 00 01 04 -02 05 -02wy5 10 03 -01 01 04 -04 03 -03 -02 02 00wy6 10 -01 00 00 03 -04 -01 -05 02 02wy7 10 -03 -02 -03 -04 -01 -02 00 02wy8 10 -03 -02 -01 -04 00 -02 02wy9 10 -03 01 04 00 -02 02wy10 10 04 01 -02 00 02wy11 10 -01 00 00 00wy12 10 00 00 00wy13 10 -04 -05wy14 10 -05wy15 10

Table 14 Local redundancy (ri) standard-deviation of the LS-estimated outlier σnablai (mm) and themaximum absolute correlation (maxρwi wj

) for each scenario of three soft constraints

σc = 01mm σc = 1mm σc = 10mmMeasurement ri σnablai maxρwi wj

ri σnablai maxρwi wjri σnablai maxρwi wj

y1 0702 1194 0660 0502 1411 0577 0398 1586 0992y2 0582 1311 0326 0533 1369 0412 0501 1413 0470y3 0702 1194 0660 0502 1411 0577 0398 1586 0992y4 0702 1194 0660 0502 1411 0577 0398 1586 0992y5 0582 1311 0326 0533 1369 0412 0501 1413 0470y6 0702 1194 0660 0502 1411 0577 0398 1586 0992y7 0704 1192 0415 0602 1289 0412 0563 1333 0470y8 0704 1192 0415 0602 1289 0412 0563 1333 0470y9 0704 1192 0415 0602 1289 0412 0563 1333 0470y10 0704 1192 0415 0602 1289 0412 0563 1333 0470y11 0583 1309 0326 0583 1309 0385 0583 1309 0433y12 0583 1309 0326 0583 1309 0385 0583 1309 0433y13 0012 0904 0660 0425 1534 0542 0663 12283 0501y14 0012 0904 0660 0425 1534 0542 0663 12283 0501y15 0019 0718 063 05 1414 0542 0665 12268 0501

The probabilities of correct identification (PCI) and correct detection PCD for the measurements(Cluster 1 to Cluster 4) subject to the scenarios of three soft constraints (heights A D and G) aredisplayed in Figure 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

28 of 40

(c) (d)

(e) (f)

(g) (h)

(a) (b)

Figure 10 Probability of correct identification (PCI) and Probability of correct detection (PCD) for themeasurements subject to the scenarios of three soft constraints for αprime = 0001 Cluster 1(ab) Cluster2(cd) Cluster 3(ef) and Cluster 4(gh)

The probabilities of correct identification (PCI) and correct detection (PCD) in Figure 8 werecomputed for the clusters based on Table 14 as follows

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

29 of 40

bull Cluster 1 y1 y3 y4 and y6bull Cluster 2 y2 and y5bull Cluster 3 y7 y8 y9 and y10bull Cluster 4 y11 and y12bull Cluster 5 y13 and y14bull Cluster 6 y15

Figure 11 shows the probabilities of correct identification PCI and correct detection PCD for thethree soft constraints ie for Cluster 5 (heights A and D) and Cluster 6 (height G) in Figure 3(e)

(c) (d)

(a) (b)

Figure 11 Probability of correct identification (PCI) and Probability of correct detection (PCD) for thethree constraints and for αprime = 0001 Cluster 5(ab) and Cluster 6(cd)

The probability of wrong exclusion PWE for the measurements (Cluster 1 to Cluster 4) subject tothe scenarios of three soft constraints (heights A D and G) are displayed in Figure 12 Figure 13 givesthe the probability of wrong exclusion PWE for that three constraints (ie heights A D and G)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

30 of 40

(c) (d)

(a) (b)

Figure 12 Probability of wrong exclusion PWE for the measurements subject to the scenarios of threesoft constraints and for αprime = 0001 Cluster 1(a) Cluster 2(b) Cluster 3(c) and Cluster 4(d)

(a) (b)

Figure 13 Probability of wrong exclusion (PWE) for the three constraints and for αprime = 0001 Cluster5(a) and Cluster 6(b)

The over-identification cases (Pover+ and Poverminus) and the statistical overlap (Pol) were alsopractically null for that case of three soft constraints

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

31 of 40

The sensitivity indicators (MDB and MIB) for each scenario of three soft constraints are displayedin Table 15

Table 15 MDB and MIB for the case of three soft constraints based on αprime = 0001 and PCD = PCI = 08

σc = 10mm σc = 1mm σc = 01mmCluster MDB (σ) MIB (σ) MDB (σ) MIB (σ) MDB (σ) MIB (σ)

1 75 22 68 69 58 592 68 69 66 67 64 643 64 64 63 63 58 584 63 63 63 63 63 635 59 60 74 75 435 456 59 59 69 69 346 355

5 Discussion

We start from the scenario of one hard constraint in Figure 3(a) Table 4 reveal that the maximumcorrelation between w-test statistics for the measurements constituting Cluster 1 is exactly equal to100 (ie maxρwi wj

= 100) This means that the measurements belonging to Cluster 1 are those thatare connected with unknown heights whose connections are limited to only two The both unknownheights A and D are tied only to two measurements (ie y1 and y6 linked to A and y3 and y4 linked toD) therefore if an outlier occurred in one of these measurements we would only be able to analyse theconsistency between them but we would not be able to distinguish which of them was contaminatedby an outlier This means that we would only be able to detect because the w-test statistics could belarger than a critical value k however in that case the values of w-test statistics would be the same andtherefore we would not have only one unique maximum w-test statistics but actually would have fourmaximum w-test statistics In other words the equation systems associated with the measurements ofCluster 1 are linearly dependent [87] Therefore there is no reliability in terms of outlier identificationfor the Cluster 1 as can be seen in Figure 3(a)

From Figure 3(b) we can note that there is reliability in terms of outlier detection for Cluster 1and it is caused by overlapping w-test statistics The probability of statistics overlap Pol for Cluster 1in this scenario of minimally constrained network is displayed in Figure 14

Figure 14 Probability of correct detection PCD and statistical overlap (Pol) for the Cluster 1 subject toone hard constraint and for αprime = 0001

The problem of not having more connections (ie more measurements) for the unknown heightsA and D in the case of one hard constraint with G fixed is overcome when these heights (A and D) aretaken as hard constraints in 3(b) or when the heights A D and G are hard constraints in figure 3(c)Figure 3(a b) show that the measurements of Cluster 1 are now able to identify an outlier when twohard constraints (A and D fixed) are in play It is also verified to the case of three hard constraints (A Dand G fixes) in Figure 3(e f) ie there is reliability in terms of both outlier detection and identificationfor these measurements in those conditions

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

32 of 40

One can argue that the more constraints the larger the probability of correct identification (PCI)and correct detection (PCD) This claim is not entirely correct This may be true in case where all pointson the network have the same number of connections However in the example given if both heightsB and G were selected as hard constraints the measurements of the Cluster 1 (ie y1 y6 y3 and y4)would still have their least-squares residuals fully correlated (ie ρwy1 wy6

= 100 and ρwy3 wy4= 100)

and therefore there would be no reliability in terms of outlier identification for these measurementsFrom Table 5 we observe different behaviour for the clusters as follows

bull Cluster 1 there was no MIB for the case of having only one single hard constraint whereasMDB = MIB for the other cases However both MDB and MIB decrease significantly with theincrease in the number of hard constraints

bull Cluster 2 MDB slightly smaller than MIB Both MDB and MIB were practically the same for thecase of having two or three hard constraints

bull Cluster 3 MDB = MIB for all cases of hard constraints however both MDB and MIB decreasesignificantly with the increase in the number of hard constraints

bull Cluster 4 MDB e MIB were equal for all cases

In terms of outlier detection and identification therefore Cluster 1 was more sensitive toconstraints Cluster 3 relatively sensitive to constraints whereas the Cluster 4 completely insensitiveto constraints and Cluster 2 relatively insensitive to constraints This is also can be seen in Figure 4The reason for this is that the local redundancy (ri) of the Cluster 1 increased with the increase of thenumber of hard constraints whereas the Cluster 4 have remained the same as can be seen in Table 4

Leaving aside the cases of statistical overlap (Pol) the network presents low least-squares residualscorrelation (ρwi wj lt 05) and high local redundancy (ri gt 05) Because of this the probabilities ofwrong exclusion (PWE) were less than 1 as can be seen in Figure 5 The over-identification cases(Pover+ and Poverminus) were practically null Consequently PCI asymp PCD Due of this fact the family-wiseerror rate (αprime) should be increased in order to have more success rate in the outlier detection andidentification [45]

From Figure 15 we observe that increasing the αprime increases the both probabilities of correctidentification (PCI) and correct detection (PCD) for outlier magnitude from 5σ to 6σ in the case ofthree hard constraints and from 5σ to 68σ in the case of two hard constraints Although the rates ofPover+ and PWE also increase they are not significant when compared to the improvement of correctidentification (PCI) and detection (PCD) This same analysis can be done for the other clusters

In terms of soft constraints for the cases of two constraints in Figure 3(d) we observe from Table 9that the larger the relaxation of the constraint (ie the larger the standard-deviation of the constraintσc) the larger the residuals correlation (ρwi wj ) and the standard-deviation of the outlier σnablai and thesmaller the local redundancy (ri) Consequently the probabilities of correct identification (PCI) anddetection (PCD) get smaller and smaller with the relaxation of the constraints whereas the probabilityof wrong exclusion gets larger (PWE) This can be more clearly verified in Figure 6(a b) and Figure8(a) for Cluster 1 whose measurements are connected with the constraints A and D (ie y13 and y14 inTable 9 respectively)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

33 of 40

(a) (b)

(c) (d)

Figure 15 Probabilities of correct identification PCI (a) correct detection PCD (b) over-identificationpositive Pover+ (c) and wrong exclusion PWE for the Cluster 1 subject to two and three hard constraintsand for αprime = 0001 and αprime = 01

Note from Figure 8(a) that the probability of wrong exclusion (PWE) increases as the magnitude ofthe outlier (nablai) increases However this is true up to a certain limit of outlier magnitude The effect ofresiduals correlation ρwi wj on the rates of wrong exclusion (PWE) and correct identification (PCI) tendsto decrease with the increase in the magnitude of the outlier nablai This effect is more clearly verified forCluster 1 in case where the precision of the constraints are ten times worse than the measurementsσc = 10σ = 10mm

Note from Figure 6 that identifying an outlier in the Cluster 1 (ie y1 y3 y4 and y6) whenσc = 10mm is more difficult than the other clusters This is due to the fact the Cluster 1 has a higherresiduals correlation ρwi wj = 0994 than other clusters In general therefore we observe that the largerthe relaxation of the constraints the larger is the effect of the correlation ρwi wj on the success rate ofoutlier identification (PCI) Consequently the higher the sensitivity indicator for outlier identification(MIB) Table 10 reveals that the ratio between MIB and MDB for the Cluster 1 and for the scenariowhere the standard-deviations of that two soft constraints are σc = 10mm is MIBMDB=2575=33On the other hand the relationship between MIB and MDB is practically one (ie MIBMDB=10) forthe others scenarios

If the FWE rate (αprime) were increased for the case where the two soft constraints of σc = 10mmare in play we would not have great advantages for Cluster 1 due to its high residuals correlation(ρwi wj = 994) From Figure 16 we can observe that the probability of correct identification (PCI)for outlier magnitudes from 5σ to 8σ is effectively larger to a user-defined αprime = 01 than αprime = 0001However the success rate is still less than 80 ie PCI lt 08 Note for example the correct

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

34 of 40

identification rate is PCI = 56 for an outlier magnitude of nablai = 8σ and αprime = 01 For αprime = 01 theMIB = 335σ = 335mm whereas for αprime = 0001 is MIB = 25σ = 25mm Therefore in that case theMIB for PCI = 08(80) and αprime = 01 would be 34 larger than user-defined αprime = 0001

Figure 16 Probabilities of correct identification (PCI) for the Cluster 1 subject to two soft constraints (2sc) A and D for αprime = 0001 and αprime = 01

The soft constraints A and D were grouped in the Cluster 5 (ie A and D were treated aspseudo-observations in the model) There is no reliability in terms of outlier identification for theconstraints because the residual correlation between them is ρwi wj = 100 as can be seen in Table 9for y13 and y14 However these soft constraints are able to detect an outlier In that case the probabilityof correct detection PCD in Figure 7 is mainly caused by the statistical overlap Pol as can be seen in theexample of the case of σc = 10mm in Figure 17 From Table 10 we observe that the larger the relaxationof the constraints the larger the MDB Note that the values of MDB are given in σ and therefore theMDB for σc = 10mm is larger than σc = 1mm and σc = 01mm ie we had the following inequalityMDB = 68σc = 68times 10mm = 68mm gt MDB = 88σc = 88times 1mm = 88mm gt MDB = 57σc =

57times 01mm = 57mm In that case if the FWE (αprime) were increased the rate of outlier detection by theCluster 4 (ie by the soft constraints) would increase

Figure 17 Probabilities of correct detection PCD and statistical overlap Pol for the two soft constraintsA and D (Cluster 5) with σc = 10mm and for αprime = 0001

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

35 of 40

Similar effects of the relaxation of the constraints on the performance of the IDS in case of two softconstraints are verified in case of three soft constraints as can be seen in the Figures 10 11 12 and 13

In case of having three soft constraints in Figure 3(e) there is reliability in terms of outlieridentification for the three pseudo-observations y13 y14 and y15 (ie for AD and G) as can be seen inFigure 10 and Table 15 In that case we also observe that the probabilities of correct detection PCD ofthe soft constraints A and D (ie Cluster 5) were approximately 13 for σc = 10mm 16 for σc = 1mmand 24 for σc = 01mm larger than the scenario of the network subject to two soft constraints Table15 reveals that the advantage of having three soft constraints instead of two constraints is that theconstraints become identifiable in the presence of an outlier The behaviour of the probabilities ofcorrect detection (PCD) correct identification (PCI) and wrong exclusion (PWE) was similar to the caseof the two soft constraints

Furthermore the larger the relaxation of the constraints the smaller the residuals correlationbetween the measurements and the soft constraints and the larger the residuals correlation amongthe measurements This effect can be verified in Tables 6 7 and 8 for two soft constraints as well as inTables 11 12 and 13 for three soft constraints

In general we observe that the case of two soft constraints for σc = 01mm was comparablewith two hard constraints (see eg Table 5 and Table 10) in terms of the probability levels associatedwith IDS for the measurements (ie clusters 1 2 3 and 4) In the same way for the case of twosoft constraints with σc = 1mm or σc = 10mm the probabilities levels were similar to the one hardconstraint for that measurements with the benefit of two soft constraints having reliability in terms ofoutlier identification for the Cluster 1 Finally the three soft constraints with σc = 1mm and σc = 10mmwere comparable to the two soft constraints for that scenario of constraints relaxation wheres thethree soft constraints for σc = 01mm showed similar outcomes with three hard constraints for themeasurements (see eg Table 5 and Table 15) In that case however an advantage of the three softconstraints on the three hard constraints is the possibility of analysing the sensitivity of the constraintsIt is important to emphasised that the stochastic models of the measurements and constraints wereassumed well-known and defined for the analyzes performed here

6 Conclusions

In general the probability of correct identification (PCI) for the case of hard constraints is largerthan soft constraints It can be verified in Figure 4 Figure 6 and Figure 10 Therefore hard constraintsshould be used in the stage of pre-processing data for the purpose of identifying and removing possibleoutlying measurements In that process one should opt to set out the redundant hard constraints atpoints in the network where the smallest connections exist After identifying and removing possibleoutliers the soft constraints should be employed to propagate the uncertainties of the constraints(pseudobservations) to the model parameters during the process of least-squares estimation Thisrecommendation is valid for outlier detection and identification purpose

We highlight the main findings of this research as follows

bull Under a system of a high local redundancy ri gt 05 and low residuals correlation (ρwi wj lt 05) ifone increase the family-wise error rate (FWE) of the test statistic the performance of the procedurewill be improved for both scenarios of hard constraints and soft constraintsbull The larger the relaxation of the constraints the larger is the effect of the residuals correlation

(ρwi wj ) on the success rate of outlier identification (PCI) Consequently the higher the sensitivityindicator for outlier identification (MIB) and therefore more difficult it becomes to identify anoutlier

bull Under a scenario of soft constraints one should set out at least three soft constraints in order toidentify an outlier in the constraints

In other types of analysis for example deformation analysis of geodetic networks one shouldformulated the constraints as pseudo-observations with σc gt 0 in an adjustment model of system

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

36 of 40

equations In other words the relaxation of constraints makes it possible to estimate deformationeffects that are unmodelled in the deterministic model matrix for example deformations which have aspatial and temporal variations In that case the deformation can be modelled by covariance functionsthat determine the cofactors between constraints [28] This should be applied to control points locatedin the structure but not to control points located outside the structure The control points locatedoutside the structure should be hard constraints and have their stability verified by an independentprocess in order to analyse the deformation of the structure So one will be also able to analyse thesensitivity in terms of minimal detectable deformations andor minimal identifiable deformations [23]Future researches will be addressed to sensitivity analysis for deformation monitoring networks

Author Contributions Conceptualization VFR MTM and IK Data curation VFR MTM and IK Formalanalysis VFR and IK Funding acquisition MTM MRV and LGdSJ Investigation VFR MethodologyVFR and MTM Project administration MTM IK MRV and LGdSJ Software VFR and LGdSJSupervision MTM IK and MRV Validation VFR and MRV Writing ndash original draft VFR Writing ndashreview amp editing VFR MTM IK and MRV All authors have read and agreed to the published version of themanuscript

Funding This research was funded by the CNPqmdashConselho Nacional de Desenvolvimento Cientiacutefico eTecnoloacutegicomdashBrasil (proc no 1035872019-5) This research and the APC were also funded by PETROBRAS(Grant Number 201800545-0)

Acknowledgments In this section you can acknowledge any support given which is not covered by the authorcontribution or funding sections This may include administrative and technical support or donations in kind(eg materials used for experiments)

Conflicts of Interest The authors declare no conflict of interest

References

1 Lehmann R Neitzel F Testing the compatibility of constraints for parameters of a geodetic adjustmentmodel Journal of Geodesy 2013 87 555ndash566 doi101007s00190-013-0627-2

2 Fang X Weighted total least-squares with constraints a universal formula for geodetic symmetricaltransformations Journal of Geodesy 2015 89 459ndash469 doi101007s00190-015-0790-8

3 Amiri-Simkooei AR Weighted Total Least Squares with Singular Covariance Matrices Subject to Weightedand Hard Constraints J Surv Eng 2017 143 04017018

4 Sakic P Ballu V Royer JY A Multi-Observation Least-Squares Inversion for GNSS-Acoustic SeafloorPositioning Remote Sensing 2020 12 doi103390rs12030448

5 Odijk D Stochastic modelling of the ionosphere for fast GPS ambiguity resolution Geodesy Beyond 2000Schwarz KP Ed Springer Berlin Heidelberg Berlin Heidelberg 2000 pp 387ndash392

6 de Oliveira PS Morel L Fund F Legros R Monico JFG Durand S Durand F Modelingtropospheric wet delays with dense and sparse network configurations for PPP-RTK GPS Solutions 201721 237ndash250 doi101007s10291-016-0518-0

7 Chen B Wu L Dai W Luo X Xu Y A new parameterized approach for ionospheric tomography GPSSolutions 2019 23 96 doi101007s10291-019-0893-4

8 Ahmed E Ahmed S Mohamed D Mostafa R jag 2019 Vol 13 chapter Vertical ionospheric delayestimation for single-receiver operation p 81 2 doi101515jag-2018-0041

9 Wang A Chen J Zhang Y Meng L Wang J Performance of Selected Ionospheric Models inMulti-Global Navigation Satellite System Single-Frequency Positioning over China Remote Sensing 201911 doi103390rs11172070

10 Deo M El-Mowafy A Comparison of advanced troposphere models for aiding reductionof PPP convergence time in Australia Journal of Spatial Science 2019 64 381ndash403doi1010801449859620181472046

11 Dai X Lou Y Dai Z Hu C Peng Y Qiao J Shi C Precise Orbit Determination for GNSS ManeuveringSatellite with the Constraint of a Predicted Clock Remote Sensing 2019 11 doi103390rs11161949

12 Courant R Hilbert D Methods of mathematical physics Vol 2 Vol 2 Wiley [New York] 196213 Grafarend EW Optimization of Geodetic Networks The Canadian Surveyor 1974 28 716ndash723

doi101139tcs-1974-0120

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

37 of 40

14 Teunissen P Zero Order Design Generalized Inverses Adjustment the Datum Problem andS-Transformations Optimization and Design of Geodetic Networks Grafarend EW Sansograve F EdsSpringer Berlin Heidelberg Berlin Heidelberg 1985 pp 11ndash55

15 Dermanis A Free network solutions with the DLT method ISPRS Journal of Photogrammetry and RemoteSensing 1994 49 2 ndash 12 doihttpsdoiorg1010160924-2716(94)90061-2

16 Kotsakis C Reference frame stability and nonlinear distortion in minimum-constrained networkadjustment Journal of Geodesy 2012 86 755ndash774 doi101007s00190-012-0555-6

17 Kotsakis C Datum Definition and Minimal Constraints In Encyclopedia of Geodesy Grafarend E EdSpringer International Publishing Cham 2018 pp 1ndash6 doi101007978-3-319-02370-0_157-1

18 Matsuoka MT Rofatto VF Klein I Roberto Veronez M da Silveira LG Neto JBS Alves ACRControl Points Selection Based on Maximum External Reliability for Designing Geodetic Networks AppliedSciences 2020 10 doi103390app10020687

19 Baarda W S-transformations and criterion matrices Publ on geodesy New Series 1973 520 Schaffrin B Aspects of Network Design Optimization and Design of Geodetic Networks Grafarend

EW Sansograve F Eds Springer Berlin Heidelberg Berlin Heidelberg 1985 pp 548ndash59721 Xu PB A general solution in geodetic nonlinear rank-defect models 199722 Altamimi Z Dermanis A The Choice of Reference System in ITRF Formulation VII Hotine-Marussi

Symposium on Mathematical Geodesy Sneeuw N Novaacutek P Crespi M Sansograve F Eds Springer BerlinHeidelberg Berlin Heidelberg 2012 pp 329ndash334

23 Velsink H On the deformation analysis of point fields J Geod 2015 89 1071ndash1087doi101007s00190-015-0835-z

24 Velsink H Extendable linearised adjustment model for deformation analysis Survey Review 201547 397ndash410 doi1011791752270614Y0000000140

25 Hiddo V jag 2016 Vol 10 chapter Time Series Analysis of 3D Coordinates Using NonstochasticObservations p 5 1 doi101515jag-2015-0027

26 Rao CR Markoffrsquos Theorem with Linear Restrictions on Parameters SankhyAuml The Indian Journal ofStatistics (1933-1960) 1945 7 16ndash19

27 Velsink H Testing Methods for Adjustment Models with Constraints Journal of Surveying Engineering2018 144 04018009 doi101061(ASCE)SU1943-54280000260

28 Hiddo V Testing deformation hypotheses by constraints on a time series of geodetic observations J ApplGeod 2017 12 77ndash93 doi101515jag-2017-0028

29 Baarda W A testing procedure for use in geodetic networks Publ on geodesy New Series 1968 230 Teunissen P First and second moments of non-linear least-squares estimators Bull Geodesique (Journal of

Geodesy) 1989 63 253ndash262 doihttpsdoiorg101007BF0252047531 Teunissen P Testing Theory an introduction 2 ed Delft University Press 200632 Amiri-Simkooei A Least-squares variance component estimation theory and GPS applications Phd

thesis Delft University of Technology 200733 Koch KR Parameter estimation and hypothesis testing in linear models 2 ed Springer 199934 Tao H Feng H Xu L Miao M Long H Yue J Li Z Yang G Yang X Fan L Estimation

of Crop Growth Parameters Using UAV-Based Hyperspectral Remote Sensing Data Sensors 2020 20doi103390s20051296

35 Han M Wang Q Wen Y He M He X The Application of Robust Least Squares Method inFrequency Lock Loop Fusion for Global Navigation Satellite System Receivers Sensors 2020 20doi103390s20041224

36 Wei C Estimation for the Discretely Observed CoxacircldquoIngersollacircldquoRoss Model Driven by Small SymmetricalStable Noises Symmetry 2020 12 doi103390sym12030327

37 Sakic P Ballu V Royer JY A Multi-Observation Least-Squares Inversion for GNSS-Acoustic SeafloorPositioning Remote Sensing 2020 12 doi103390rs12030448

38 Livadiotis G General Fitting Methods Based on Lq Norms and their Optimization Stats 2020 3 16ndash31doi103390stats3010002

39 Farooq SZ Yang D Ada ENJ A Cycle Slip Detection Framework for Reliable Single Frequency RTKPositioning Sensors 2020 20 doi103390s20010304

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

38 of 40

40 Araveeporn A Comparing Parameter Estimation of Random Coefficient Autoregressive Model byFrequentist Method Mathematics 2020 8 doi103390math8010062

41 Zhang C Peng T Zhou J Ji J Wang X An Improved Autoencoder and Partial Least SquaresRegression-Based Extreme Learning Machine Model for Pump Turbine Characteristics Applied Sciences2019 9 doi103390app9193987

42 Ji J Yang M Jiang L He J Teng Z Liu Y Song H Output-Only Parameters Identification ofEarthquake-Excited Building Structures with Least Squares and Input Modification Process AppliedSciences 2019 9 doi103390app9040696

43 Kargoll B On the theory and application of model misspecification tests in geodesy Doctoral thesisUniversity of Bonn Landwirtschaftliche Fakultaumlt German Bonn 2007

44 Lehmann R On the formulation of the alternative hypothesis for geodetic outlier detection J Geod 201387 373ndash386 doi101007s00190-012-0607-y

45 Rofatto VF Matsuoka MT Klein I Roberto Veronez M da Silveira LG A Monte Carlo-Based OutlierDiagnosis Method for Sensitivity Analysis Remote Sensing 2020 12 doi103390rs12050860

46 Lehmann R Improved critical values for extreme normalized and studentized residuals in GaussndashMarkovmodels J Geod 2012 86 1137ndash1146 doi101007s00190-012-0569-0

47 Rofatto VF Matsuoka MT Klein I Veronez MR Bonimani ML Lehmann R A half-century ofBaardarsquos concept of reliability a review new perspectives and applications Surv Rev 2018 0 1ndash17[httpsdoiorg1010800039626520181548118] doi1010800039626520181548118

48 Zaminpardaz S Teunissen P DIA-datasnooping and identifiability J Geod 2019 93 85ndash101doi101007s00190-018-1141-3

49 Kok JJ States U On data snooping and multiple outlier testing [microform] Johan J Kok US Dept ofCommerce National Oceanic and Atmospheric Administration National Ocean Service Charting andGeodetic Services For sale by the National Geodetic Information Center NOAA Rockville Md 1984 p61 p

50 Knight NL Wang J Rizos C Generalised measures of reliability for multiple outliers Journal of Geodesy2010 84 625ndash635 doi101007s00190-010-0392-4

51 Gui Q Li X Gong Y Li B Li G A Bayesian unmasking method for locating multiplegross errors based on posterior probabilities of classification variables J Geod 2011 85 191ndash203doi101007s00190-010-0429-8

52 Klein I Matsuoka MT Guzatto MP Nievinski FG An approach to identify multipleoutliers based on sequential likelihood ratio tests Surv Rev 2017 49 449ndash457[httpsdoiorg1010800039626520161212970] doi1010800039626520161212970

53 Hawkins DM Identification of Outliers 1 ed Springer Netherlands 1980 httpsdoiorg101007978-94-015-3994-4

54 Baarda W Statistical concepts in geodesy Publ on geodesy New Series 1967 255 Imparato D Teunissen P Tiberius C Minimal Detectable and Identifiable Biases for quality

control Surv Rev 2019 51 289ndash299 [httpsdoiorg1010800039626520181437947]doi1010800039626520181437947

56 Teunissen PJG Distributional theory for the DIA method Journal of Geodesy 2018 92 59ndash80doi101007s00190-017-1045-7

57 Foumlrstner W Reliability and discernability of extended Gauss-Markov models Seminar on mathematicalmodels of GeodeticPhotogrammetric Point Determination with Regard to Outliers and Systematic Errors 1983 Vol Series A pp 79ndash103

58 Wang J Knight NL New Outlier Separability Test and Its Application in GNSS Positioning 201259 Lehmann R Loumlsler M Multiple Outlier Detection Hypothesis Tests versus Model Selection by

Information Criteria J Surv Eng 2016 142 0401601760 Teunissen PJG Imparato D Tiberius CCJM Does RAIM with Correct Exclusion Produce Unbiased

Positions Sensors 2017 17 doi103390s1707150861 Lehmann R Loumlsler M Congruence analysis of geodetic networks ndash hypothesis tests versus model

selection by information criteria J Appl Geod 2017 11 271ndash283 doi101515jag-2016-0049

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

39 of 40

62 Rofatto V Matsuoka M Klein I DESIGN OF GEODETIC NETWORKS BASED ON OUTLIERIDENTIFICATION CRITERIA AN EXAMPLE APPLIED TO THE LEVELING NETWORK Bull Geod Sci2018 24 152ndash170 httpdxdoiorg101590s1982-21702018000200011

63 Nguyen VK Renault Milocco R Environment Monitoring for Anomaly Detection System UsingSmartphones Sensors 2019 19 doi103390s19183834

64 Nie Y Yang L Shen Y Specific Direction-Based Outlier Detection Approach for GNSS Vector NetworksSensors 2019 19 doi103390s19081836

65 Klein I Matsuoka MT Guzatto MP Nievinski FG Veronez MR Rofatto VF A newrelationship between the quality criteria for geodetic networks Journal of Geodesy 2019 93 529ndash544doi101007s00190-018-1181-8

66 Mierlo JV Statistical Analysis of Geodetic Measurements for the Investigation of Crustal Movements InRecent Crustal Movements 1977 Whitten C Green R Meade B Eds Elsevier 1979 Vol 13 Developmentsin Geotectonics pp 457 ndash 467 doihttpsdoiorg101016B978-0-444-41783-150072-6

67 van der Marel H Roumlsters AJM Statistical Testing and Quality Analysis in 3-D Networks (part II)Application to GPS Global Positioning System An Overview Bock Y Leppard N Eds Springer NewYork New York NY 1990 pp 290ndash297

68 Yang L Wang J Knight NL Shen Y Outlier separability analysis with a multiple alternative hypothesestest J Geod 2013 87 591ndash604 doi101007s00190-013-0629-0

69 Proacuteszynski W Revisiting Baardarsquos concept of minimal detectable bias with regard to outlier identifiabilityJ Geod 2015 89 993ndash1003 doi101007s00190-015-0828-y

70 Romano JP Wolf M Multiple Testing of One-Sided Hypotheses Combining Bonferroni and the BootstrapPredictive Econometrics and Big Data Kreinovich V Sriboonchitta S Chakpitak N Eds SpringerInternational Publishing Cham 2018 pp 78ndash94

71 Teunissen P An Integrity and Quality Control Procedure for use in Multi Sensor Integration 199072 Aydin C Demirel H Computation of Baardarsquos lower bound of the non-centrality parameter J Geod

2004 78 437ndash441 doi101007s00190-004-0406-173 Aydin C Power of Global Test in Deformation Analysis J Surv Eng 2012 138 51ndash56 https

doiorg101061(ASCE)SU1943-5428000006474 Bonferroni C Teoria statistica delle classi e calcolo delle probabilitagrave Pubblicazioni del R Istituto superiore di

scienze economiche e commerciali di Firenze Libreria internazionale Seeber 193675 Box GEP Muller ME A Note on the Generation of Random Normal Deviates The Annals of Mathematical

Statistics 1958 29 610ndash611 httpsdoi101214aoms117770664576 Lemeshko BY Lemeshko SB Extending the Application of Grubbs-Type Tests in Rejecting Anomalous

Measurements Measurement Techniques 2005 48 536ndash547 doi101007s11018-005-0179-977 Lehmann R Scheffler T Monte Carlo based data snooping with application to a geodetic network J

Appl Geod 2011 5 123ndash134 httpsdoiorg101515JAG201101478 Matsumoto M Nishimura T Mersenne twister A 623-dimensionally equidistributed uniform

pseudo-random number generator ACM Transactions on Modeling and Computer Simulation 1998 8 3ndash30httpsdlacmorgcitationcfmid=272991

79 Ihde J Saacutenchez L Barzaghi R Drewes H Foerste C Gruber T Liebsch G Marti U Pail R SiderisM Definition and Proposed Realization of the International Height Reference System (IHRS) Surveys inGeophysics 2017 38 549ndash570 doi101007s10712-017-9409-3

80 Lehmann R Observation error model selection by information criteria vs normality testing StudiaGeophysica et Geodaetica 2015 59 489ndash504 doi101007s11200-015-0725-0

81 Algarni DA Ali AE Heighting and Distance Accuracy with Electronic Digital Levels Journal of King SaudUniversity - Engineering Sciences 1998 10 229 ndash 239 doihttpsdoiorg101016S1018-3639(18)30698-6

82 Gemin ARS Matos AS Faggion PLAs APPLICATION OF CALIBRATION CERTIFICATE OFDIGITAL LEVELING SYSTEMS IN THE MONITORING OF STRUCTURES A CASE STUDY AT THEGOVERNADOR JOSAtilde RICHA HYDROELECTRIC POWER PLANT - PR Boletim de CiAtildeGeodAtilde 201824 235 ndash 249

83 Takalo M Rouhiainen P Development of a System Calibration Comparator for Digital Levels in FinlandNordic Journal of Surveying and Real Estate Research 1 1

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

40 of 40

84 Wiedemann W Wagner A Wunderlich T Using IATS to Read and Analyze Digital Levelling Staffs SIG2016 Paar R Marendic A Zrinjski M Eds Croatian Geodetic Society 2016 pp 515ndash526

85 PJG T The Geometry of Geodetic Inverse Linear Mapping and Non-linear Adjustment Publications onGeodesy New Series 1985 8

86 Rao CR Mitra SK Generalized inverse of a matrix and its applications 197287 Hekimoglu S Erenoglu RC Sanli DU Erdogan B Detecting Configuration Weaknesses in Geodetic

Networks Survey Review 2011 43 713ndash730 doi101179003962611X13117748892632

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

Page 22: Effects of Hard and Soft Equality Constraints on

22 of 40

43 Result of the Soft Constraint Effects on the Iterative Outlier Elimination Procedure

The both configuration in Figure 3(d) and Figure 3(e) were analysed in terms of soft constraintsIn that case the critical values were k = 395 k = 395 and k = 392 for two soft constraints withσc = 01mm σc = 1mm and σc = 10mm respectively In the case of three soft constraints the criticalvalues found were k = 399 k = 399 and k = 396 for σc = 01mm σc = 1mm and σc = 10mmrespectively All these critical values were computed for αprime = 0001

The correlation coefficient between w-test statistics (ρwi wj ) are displayed in Table 6 Table 7and Table 8 for two soft constraints with standard-deviation 10 times larger than measurements(ie σc = 10times σ = 10mm) 10 times better than measurements (ie σc = 01mm) and equal to themeasurements (σc = 1mm) respectively

Table 6 Correlation matrix of w-test statistics for the levelling network with two soft constraints withσc = 10mm in Figure 3(d)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14

wy1 10 02 01 01 02 10 -02 00 00 02 -04 -01 006 -01wy2 10 02 02 03 02 05 -05 -02 02 03 -03 003 00wy3 10 10 02 01 00 02 -02 00 01 04 01 -01wy4 10 02 01 00 02 -02 00 01 04 -01 01wy5 10 02 -02 02 05 -05 03 -03 00 00wy6 10 -02 00 00 02 -04 -01 -01 01wy7 10 -03 -03 -04 -04 -01 00 00wy8 10 -04 -03 -01 -04 00 00wy9 10 -03 01 04 00 00wy10 10 04 01 00 00wy11 10 -01 00 00wy12 10 00 00wy13 10 -10wy14 10

Table 7 Correlation matrix of w-test statistics for the levelling network with two soft constraints withσc = 01mm in Figure 3(d)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14

wy1 10 04 04 -03 -01 04 -03 01 01 01 -04 -01 06 -06wy2 10 04 -01 02 -01 04 -04 -01 01 03 -03 04 -04wy3 10 04 -01 -03 -01 03 -01 -01 01 04 06 -06wy4 10 04 04 01 01 -03 01 01 04 -06 06wy5 10 04 -01 01 04 -04 03 -03 -04 04wy6 10 -01 -01 -01 03 -04 -01 -06 06wy7 10 -04 -03 -04 -04 -01 -02 02wy8 10 -04 -03 -01 -04 02 -02wy9 10 -04 01 04 02 -02wy10 10 04 01 -02 02wy11 10 -01 00 00wy12 10 00 00wy13 10 -10wy14 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

23 of 40

Table 8 Correlation matrix of w-test statistics for the levelling network with two soft constraints withσc = 1mm in Figure 3(d)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14

wy1 10 03 02 -01 01 07 -03 00 01 02 -04 -01 04 -04wy2 10 03 01 03 01 04 -04 -01 01 03 -03 03 -03wy3 10 07 01 -01 00 03 -02 -01 01 04 04 -04wy4 10 03 02 01 02 -03 00 01 04 -04 04wy5 10 03 -01 01 04 -04 03 -03 -03 03wy6 10 -02 -01 00 03 -04 -01 -04 04wy7 10 -03 -03 -04 -04 -01 -01 01wy8 10 -04 -03 -01 -04 01 -01wy9 10 -03 01 04 01 -01wy10 10 04 01 -01 01wy11 10 -01 00 00wy12 10 00 00wy13 10 -10wy14 10

Table 9 gives the local redundancy (ri) the standard-deviation of the LS-estimated outlier σnablai andthe maximum absolute correlation (maxρwi wj

) for the scenarios of two constraints

Table 9 Local redundancy (ri) standard-deviation of the LS-estimated outlier σnablai (mm) and themaximum absolute correlation (maxρwi wj

) for each scenario of two soft constraints

σc = 01mm σc = 1mm σc = 10mmMeasurement ri σnablai maxρwi wj

ri σnablai maxρwi wjri σnablai maxρwi wj

y1 0581 1312 0564 0471 1457 0681 0397 1587 0994y2 0582 1311 0376 0533 1369 0423 0501 1413 0471y3 0581 1312 0564 0471 1457 0681 0397 1587 0994y4 0581 1312 0564 0471 1457 0681 0397 1587 0994y5 0582 1311 0376 0533 1369 0423 0501 1413 0471y6 0581 1312 0564 0471 1457 0681 0397 1587 0994y7 0583 1310 0359 0571 1324 0423 0563 1333 0471y8 0583 1310 0359 0571 1324 0423 0563 1333 0471y9 0583 1310 0359 0571 1324 0423 0563 1333 0471y10 0583 1310 0359 0571 1324 0423 0563 1333 0471y11 0583 1309 0358 0583 1309 0398 0583 1309 0433y12 0583 1309 0358 0583 1309 0398 0583 1309 0433y13 0007 1163 1000 0300 1826 1000 0497 14189 1000y14 0007 1163 1000 0300 1826 1000 0497 14189 1000

From Table 9 five clusters were defined for each case of two soft constraints ie for the casewhere heights A and D were given as soft constraints in Figure 3(d) as follows

bull Cluster 1 y1 y3 y4 and y6bull Cluster 2 y2 and y5bull Cluster 3 y7 y8 y9 and y10bull Cluster 4 y11 and y12bull Cluster 5 y13 and y14

The probabilities of correct identification (PCI) and correct detection (PCD) for the measurements(Cluster 1 to Cluster 4) subject to the scenarios of two soft constraints (heights A and D) are displayedin Figure 6

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

24 of 40

(c) (d)

(e) (f)

(g) (h)

(a) (b)

Figure 6 Probability of correct identification (PCI) and Probability of correct detection (PCD) for themeasurements subject to the scenarios of two soft constraints for αprime = 0001 Cluster 1(ab) Cluster2(cd) Cluster 3(ef) and Cluster 4(gh)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

25 of 40

Note that the Cluster 5 is associated with the two soft constraints (ie y13 and y14) The probabilityof correct identification (PCI) for these both soft constraints were null However the probability ofcorrect detection were not Figure 7 shows the probability of correct detection (PCD) for these two softconstraints (ie heights A and D)

Figure 7 Probability of correct detection PCD for the two soft constraints (Cluster 5 heights A and D)and for αprime = 0001

The probability of wrong exclusion PWE for the measurements (Cluster 1 to Cluster 4) subject tothe scenarios of two soft constraints (heights A and D) are displayed in Figure 8 Figure 9 gives the theprobability of wrong exclusion PWE for that two constraints (ie heights A and D)

(c) (d)

(a) (b)

Figure 8 Probability of wrong exclusion PWE for the measurements subject to the scenarios of two softconstraints for αprime = 0001 Cluster 1(a) Cluster 2(b) Cluster 3(c) and Cluster 4(d)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

26 of 40

Figure 9 Probability of wrong exclusion PWE for the two soft constraints (Cluster 5 heights A and D)and for αprime = 0001

The over-identification cases (Pover+ and Poverminus) and the statistical overlap (Pol) were practicallynull for that case

The sensitivity indicators (MDB and MIB) for each scenario of two soft constraints are displayedin Table 10

Table 10 MDB and MIB for the case of two soft constraints based on αprime = 0001 and PCD = PCI = 08

σc = 10mm σc = 1mm σc = 01mmCluster MDB (σ) MIB (σ) MDB (σ) MIB (σ) MDB (σ) MIB (σ)

1 75 25 7 71 63 632 68 68 66 66 63 633 64 64 64 64 63 634 63 63 63 63 63 635 68 - 88 - 57 -

As with the case of two soft constraints Tables 1112 and 13 show the correlations (ρwi wj ) for thecase where there were three soft constraints ie for the network configuration in Figure 3(e)

Table 11 Correlation matrix of w-test statistics for the levelling network with three soft constraintswith σc = 10mm in Figure 3(e)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14 wy15

wy1 10 02 01 01 02 10 -02 00 00 02 -04 -01 01 00 00wy2 10 02 02 03 02 05 -05 -02 02 03 -03 00 00 00wy3 10 10 02 01 00 02 -02 00 01 04 00 -01 00wy4 10 02 01 00 02 -02 00 01 04 00 01 00wy5 10 02 -02 02 05 -05 03 -03 00 00 00wy6 10 -02 00 00 02 -04 -01 -01 00 00wy7 10 -03 -03 -04 -04 -01 00 00 00wy8 10 -04 -03 -01 -04 00 00 00wy9 10 -03 01 04 00 00 00wy10 10 04 01 00 00 00wy11 10 -01 00 00 00wy12 10 00 00 00wy13 10 -05 -05wy14 10 -05wy15 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

27 of 40

Table 12 Correlation matrix of w-test statistics for the levelling network with three soft constraintswith σc = 01mm in Figure 3(e)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14 wy15

wy1 10 03 01 -01 -01 01 -04 -01 -01 -01 -03 -01 07 -01 -04wy2 10 03 -01 01 -01 03 -03 -01 01 03 -03 03 -03 00wy3 10 01 -01 -01 01 04 01 01 01 03 01 -07 04wy4 10 03 01 -01 -01 -04 -01 01 03 -01 07 -04wy5 10 03 -01 01 03 -03 03 -03 -03 03 00wy6 10 01 01 01 04 -03 -01 -07 01 04wy7 10 -01 -01 -01 -03 -01 -04 -01 04wy8 10 -01 -01 -01 -03 -01 -04 04wy9 10 -01 01 03 -01 -04 04wy10 10 03 01 -04 -01 04wy11 10 -01 00 00 00wy12 10 00 00 00wy13 10 -02 -06wy14 10 -06wy15 10

Table 13 Correlation matrix of w-test statistics for the levelling network with three soft constraintswith σc = 1mm in Figure 3(e)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14 wy15

wy1 10 03 01 00 01 06 -03 00 00 01 -04 -01 05 -02 -02wy2 10 03 01 03 01 04 -04 -01 01 03 -03 02 -02 00wy3 10 06 01 00 00 03 -01 00 01 04 02 -05 02wy4 10 03 01 00 01 -03 00 01 04 -02 05 -02wy5 10 03 -01 01 04 -04 03 -03 -02 02 00wy6 10 -01 00 00 03 -04 -01 -05 02 02wy7 10 -03 -02 -03 -04 -01 -02 00 02wy8 10 -03 -02 -01 -04 00 -02 02wy9 10 -03 01 04 00 -02 02wy10 10 04 01 -02 00 02wy11 10 -01 00 00 00wy12 10 00 00 00wy13 10 -04 -05wy14 10 -05wy15 10

Table 14 Local redundancy (ri) standard-deviation of the LS-estimated outlier σnablai (mm) and themaximum absolute correlation (maxρwi wj

) for each scenario of three soft constraints

σc = 01mm σc = 1mm σc = 10mmMeasurement ri σnablai maxρwi wj

ri σnablai maxρwi wjri σnablai maxρwi wj

y1 0702 1194 0660 0502 1411 0577 0398 1586 0992y2 0582 1311 0326 0533 1369 0412 0501 1413 0470y3 0702 1194 0660 0502 1411 0577 0398 1586 0992y4 0702 1194 0660 0502 1411 0577 0398 1586 0992y5 0582 1311 0326 0533 1369 0412 0501 1413 0470y6 0702 1194 0660 0502 1411 0577 0398 1586 0992y7 0704 1192 0415 0602 1289 0412 0563 1333 0470y8 0704 1192 0415 0602 1289 0412 0563 1333 0470y9 0704 1192 0415 0602 1289 0412 0563 1333 0470y10 0704 1192 0415 0602 1289 0412 0563 1333 0470y11 0583 1309 0326 0583 1309 0385 0583 1309 0433y12 0583 1309 0326 0583 1309 0385 0583 1309 0433y13 0012 0904 0660 0425 1534 0542 0663 12283 0501y14 0012 0904 0660 0425 1534 0542 0663 12283 0501y15 0019 0718 063 05 1414 0542 0665 12268 0501

The probabilities of correct identification (PCI) and correct detection PCD for the measurements(Cluster 1 to Cluster 4) subject to the scenarios of three soft constraints (heights A D and G) aredisplayed in Figure 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

28 of 40

(c) (d)

(e) (f)

(g) (h)

(a) (b)

Figure 10 Probability of correct identification (PCI) and Probability of correct detection (PCD) for themeasurements subject to the scenarios of three soft constraints for αprime = 0001 Cluster 1(ab) Cluster2(cd) Cluster 3(ef) and Cluster 4(gh)

The probabilities of correct identification (PCI) and correct detection (PCD) in Figure 8 werecomputed for the clusters based on Table 14 as follows

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

29 of 40

bull Cluster 1 y1 y3 y4 and y6bull Cluster 2 y2 and y5bull Cluster 3 y7 y8 y9 and y10bull Cluster 4 y11 and y12bull Cluster 5 y13 and y14bull Cluster 6 y15

Figure 11 shows the probabilities of correct identification PCI and correct detection PCD for thethree soft constraints ie for Cluster 5 (heights A and D) and Cluster 6 (height G) in Figure 3(e)

(c) (d)

(a) (b)

Figure 11 Probability of correct identification (PCI) and Probability of correct detection (PCD) for thethree constraints and for αprime = 0001 Cluster 5(ab) and Cluster 6(cd)

The probability of wrong exclusion PWE for the measurements (Cluster 1 to Cluster 4) subject tothe scenarios of three soft constraints (heights A D and G) are displayed in Figure 12 Figure 13 givesthe the probability of wrong exclusion PWE for that three constraints (ie heights A D and G)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

30 of 40

(c) (d)

(a) (b)

Figure 12 Probability of wrong exclusion PWE for the measurements subject to the scenarios of threesoft constraints and for αprime = 0001 Cluster 1(a) Cluster 2(b) Cluster 3(c) and Cluster 4(d)

(a) (b)

Figure 13 Probability of wrong exclusion (PWE) for the three constraints and for αprime = 0001 Cluster5(a) and Cluster 6(b)

The over-identification cases (Pover+ and Poverminus) and the statistical overlap (Pol) were alsopractically null for that case of three soft constraints

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

31 of 40

The sensitivity indicators (MDB and MIB) for each scenario of three soft constraints are displayedin Table 15

Table 15 MDB and MIB for the case of three soft constraints based on αprime = 0001 and PCD = PCI = 08

σc = 10mm σc = 1mm σc = 01mmCluster MDB (σ) MIB (σ) MDB (σ) MIB (σ) MDB (σ) MIB (σ)

1 75 22 68 69 58 592 68 69 66 67 64 643 64 64 63 63 58 584 63 63 63 63 63 635 59 60 74 75 435 456 59 59 69 69 346 355

5 Discussion

We start from the scenario of one hard constraint in Figure 3(a) Table 4 reveal that the maximumcorrelation between w-test statistics for the measurements constituting Cluster 1 is exactly equal to100 (ie maxρwi wj

= 100) This means that the measurements belonging to Cluster 1 are those thatare connected with unknown heights whose connections are limited to only two The both unknownheights A and D are tied only to two measurements (ie y1 and y6 linked to A and y3 and y4 linked toD) therefore if an outlier occurred in one of these measurements we would only be able to analyse theconsistency between them but we would not be able to distinguish which of them was contaminatedby an outlier This means that we would only be able to detect because the w-test statistics could belarger than a critical value k however in that case the values of w-test statistics would be the same andtherefore we would not have only one unique maximum w-test statistics but actually would have fourmaximum w-test statistics In other words the equation systems associated with the measurements ofCluster 1 are linearly dependent [87] Therefore there is no reliability in terms of outlier identificationfor the Cluster 1 as can be seen in Figure 3(a)

From Figure 3(b) we can note that there is reliability in terms of outlier detection for Cluster 1and it is caused by overlapping w-test statistics The probability of statistics overlap Pol for Cluster 1in this scenario of minimally constrained network is displayed in Figure 14

Figure 14 Probability of correct detection PCD and statistical overlap (Pol) for the Cluster 1 subject toone hard constraint and for αprime = 0001

The problem of not having more connections (ie more measurements) for the unknown heightsA and D in the case of one hard constraint with G fixed is overcome when these heights (A and D) aretaken as hard constraints in 3(b) or when the heights A D and G are hard constraints in figure 3(c)Figure 3(a b) show that the measurements of Cluster 1 are now able to identify an outlier when twohard constraints (A and D fixed) are in play It is also verified to the case of three hard constraints (A Dand G fixes) in Figure 3(e f) ie there is reliability in terms of both outlier detection and identificationfor these measurements in those conditions

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

32 of 40

One can argue that the more constraints the larger the probability of correct identification (PCI)and correct detection (PCD) This claim is not entirely correct This may be true in case where all pointson the network have the same number of connections However in the example given if both heightsB and G were selected as hard constraints the measurements of the Cluster 1 (ie y1 y6 y3 and y4)would still have their least-squares residuals fully correlated (ie ρwy1 wy6

= 100 and ρwy3 wy4= 100)

and therefore there would be no reliability in terms of outlier identification for these measurementsFrom Table 5 we observe different behaviour for the clusters as follows

bull Cluster 1 there was no MIB for the case of having only one single hard constraint whereasMDB = MIB for the other cases However both MDB and MIB decrease significantly with theincrease in the number of hard constraints

bull Cluster 2 MDB slightly smaller than MIB Both MDB and MIB were practically the same for thecase of having two or three hard constraints

bull Cluster 3 MDB = MIB for all cases of hard constraints however both MDB and MIB decreasesignificantly with the increase in the number of hard constraints

bull Cluster 4 MDB e MIB were equal for all cases

In terms of outlier detection and identification therefore Cluster 1 was more sensitive toconstraints Cluster 3 relatively sensitive to constraints whereas the Cluster 4 completely insensitiveto constraints and Cluster 2 relatively insensitive to constraints This is also can be seen in Figure 4The reason for this is that the local redundancy (ri) of the Cluster 1 increased with the increase of thenumber of hard constraints whereas the Cluster 4 have remained the same as can be seen in Table 4

Leaving aside the cases of statistical overlap (Pol) the network presents low least-squares residualscorrelation (ρwi wj lt 05) and high local redundancy (ri gt 05) Because of this the probabilities ofwrong exclusion (PWE) were less than 1 as can be seen in Figure 5 The over-identification cases(Pover+ and Poverminus) were practically null Consequently PCI asymp PCD Due of this fact the family-wiseerror rate (αprime) should be increased in order to have more success rate in the outlier detection andidentification [45]

From Figure 15 we observe that increasing the αprime increases the both probabilities of correctidentification (PCI) and correct detection (PCD) for outlier magnitude from 5σ to 6σ in the case ofthree hard constraints and from 5σ to 68σ in the case of two hard constraints Although the rates ofPover+ and PWE also increase they are not significant when compared to the improvement of correctidentification (PCI) and detection (PCD) This same analysis can be done for the other clusters

In terms of soft constraints for the cases of two constraints in Figure 3(d) we observe from Table 9that the larger the relaxation of the constraint (ie the larger the standard-deviation of the constraintσc) the larger the residuals correlation (ρwi wj ) and the standard-deviation of the outlier σnablai and thesmaller the local redundancy (ri) Consequently the probabilities of correct identification (PCI) anddetection (PCD) get smaller and smaller with the relaxation of the constraints whereas the probabilityof wrong exclusion gets larger (PWE) This can be more clearly verified in Figure 6(a b) and Figure8(a) for Cluster 1 whose measurements are connected with the constraints A and D (ie y13 and y14 inTable 9 respectively)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

33 of 40

(a) (b)

(c) (d)

Figure 15 Probabilities of correct identification PCI (a) correct detection PCD (b) over-identificationpositive Pover+ (c) and wrong exclusion PWE for the Cluster 1 subject to two and three hard constraintsand for αprime = 0001 and αprime = 01

Note from Figure 8(a) that the probability of wrong exclusion (PWE) increases as the magnitude ofthe outlier (nablai) increases However this is true up to a certain limit of outlier magnitude The effect ofresiduals correlation ρwi wj on the rates of wrong exclusion (PWE) and correct identification (PCI) tendsto decrease with the increase in the magnitude of the outlier nablai This effect is more clearly verified forCluster 1 in case where the precision of the constraints are ten times worse than the measurementsσc = 10σ = 10mm

Note from Figure 6 that identifying an outlier in the Cluster 1 (ie y1 y3 y4 and y6) whenσc = 10mm is more difficult than the other clusters This is due to the fact the Cluster 1 has a higherresiduals correlation ρwi wj = 0994 than other clusters In general therefore we observe that the largerthe relaxation of the constraints the larger is the effect of the correlation ρwi wj on the success rate ofoutlier identification (PCI) Consequently the higher the sensitivity indicator for outlier identification(MIB) Table 10 reveals that the ratio between MIB and MDB for the Cluster 1 and for the scenariowhere the standard-deviations of that two soft constraints are σc = 10mm is MIBMDB=2575=33On the other hand the relationship between MIB and MDB is practically one (ie MIBMDB=10) forthe others scenarios

If the FWE rate (αprime) were increased for the case where the two soft constraints of σc = 10mmare in play we would not have great advantages for Cluster 1 due to its high residuals correlation(ρwi wj = 994) From Figure 16 we can observe that the probability of correct identification (PCI)for outlier magnitudes from 5σ to 8σ is effectively larger to a user-defined αprime = 01 than αprime = 0001However the success rate is still less than 80 ie PCI lt 08 Note for example the correct

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

34 of 40

identification rate is PCI = 56 for an outlier magnitude of nablai = 8σ and αprime = 01 For αprime = 01 theMIB = 335σ = 335mm whereas for αprime = 0001 is MIB = 25σ = 25mm Therefore in that case theMIB for PCI = 08(80) and αprime = 01 would be 34 larger than user-defined αprime = 0001

Figure 16 Probabilities of correct identification (PCI) for the Cluster 1 subject to two soft constraints (2sc) A and D for αprime = 0001 and αprime = 01

The soft constraints A and D were grouped in the Cluster 5 (ie A and D were treated aspseudo-observations in the model) There is no reliability in terms of outlier identification for theconstraints because the residual correlation between them is ρwi wj = 100 as can be seen in Table 9for y13 and y14 However these soft constraints are able to detect an outlier In that case the probabilityof correct detection PCD in Figure 7 is mainly caused by the statistical overlap Pol as can be seen in theexample of the case of σc = 10mm in Figure 17 From Table 10 we observe that the larger the relaxationof the constraints the larger the MDB Note that the values of MDB are given in σ and therefore theMDB for σc = 10mm is larger than σc = 1mm and σc = 01mm ie we had the following inequalityMDB = 68σc = 68times 10mm = 68mm gt MDB = 88σc = 88times 1mm = 88mm gt MDB = 57σc =

57times 01mm = 57mm In that case if the FWE (αprime) were increased the rate of outlier detection by theCluster 4 (ie by the soft constraints) would increase

Figure 17 Probabilities of correct detection PCD and statistical overlap Pol for the two soft constraintsA and D (Cluster 5) with σc = 10mm and for αprime = 0001

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

35 of 40

Similar effects of the relaxation of the constraints on the performance of the IDS in case of two softconstraints are verified in case of three soft constraints as can be seen in the Figures 10 11 12 and 13

In case of having three soft constraints in Figure 3(e) there is reliability in terms of outlieridentification for the three pseudo-observations y13 y14 and y15 (ie for AD and G) as can be seen inFigure 10 and Table 15 In that case we also observe that the probabilities of correct detection PCD ofthe soft constraints A and D (ie Cluster 5) were approximately 13 for σc = 10mm 16 for σc = 1mmand 24 for σc = 01mm larger than the scenario of the network subject to two soft constraints Table15 reveals that the advantage of having three soft constraints instead of two constraints is that theconstraints become identifiable in the presence of an outlier The behaviour of the probabilities ofcorrect detection (PCD) correct identification (PCI) and wrong exclusion (PWE) was similar to the caseof the two soft constraints

Furthermore the larger the relaxation of the constraints the smaller the residuals correlationbetween the measurements and the soft constraints and the larger the residuals correlation amongthe measurements This effect can be verified in Tables 6 7 and 8 for two soft constraints as well as inTables 11 12 and 13 for three soft constraints

In general we observe that the case of two soft constraints for σc = 01mm was comparablewith two hard constraints (see eg Table 5 and Table 10) in terms of the probability levels associatedwith IDS for the measurements (ie clusters 1 2 3 and 4) In the same way for the case of twosoft constraints with σc = 1mm or σc = 10mm the probabilities levels were similar to the one hardconstraint for that measurements with the benefit of two soft constraints having reliability in terms ofoutlier identification for the Cluster 1 Finally the three soft constraints with σc = 1mm and σc = 10mmwere comparable to the two soft constraints for that scenario of constraints relaxation wheres thethree soft constraints for σc = 01mm showed similar outcomes with three hard constraints for themeasurements (see eg Table 5 and Table 15) In that case however an advantage of the three softconstraints on the three hard constraints is the possibility of analysing the sensitivity of the constraintsIt is important to emphasised that the stochastic models of the measurements and constraints wereassumed well-known and defined for the analyzes performed here

6 Conclusions

In general the probability of correct identification (PCI) for the case of hard constraints is largerthan soft constraints It can be verified in Figure 4 Figure 6 and Figure 10 Therefore hard constraintsshould be used in the stage of pre-processing data for the purpose of identifying and removing possibleoutlying measurements In that process one should opt to set out the redundant hard constraints atpoints in the network where the smallest connections exist After identifying and removing possibleoutliers the soft constraints should be employed to propagate the uncertainties of the constraints(pseudobservations) to the model parameters during the process of least-squares estimation Thisrecommendation is valid for outlier detection and identification purpose

We highlight the main findings of this research as follows

bull Under a system of a high local redundancy ri gt 05 and low residuals correlation (ρwi wj lt 05) ifone increase the family-wise error rate (FWE) of the test statistic the performance of the procedurewill be improved for both scenarios of hard constraints and soft constraintsbull The larger the relaxation of the constraints the larger is the effect of the residuals correlation

(ρwi wj ) on the success rate of outlier identification (PCI) Consequently the higher the sensitivityindicator for outlier identification (MIB) and therefore more difficult it becomes to identify anoutlier

bull Under a scenario of soft constraints one should set out at least three soft constraints in order toidentify an outlier in the constraints

In other types of analysis for example deformation analysis of geodetic networks one shouldformulated the constraints as pseudo-observations with σc gt 0 in an adjustment model of system

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

36 of 40

equations In other words the relaxation of constraints makes it possible to estimate deformationeffects that are unmodelled in the deterministic model matrix for example deformations which have aspatial and temporal variations In that case the deformation can be modelled by covariance functionsthat determine the cofactors between constraints [28] This should be applied to control points locatedin the structure but not to control points located outside the structure The control points locatedoutside the structure should be hard constraints and have their stability verified by an independentprocess in order to analyse the deformation of the structure So one will be also able to analyse thesensitivity in terms of minimal detectable deformations andor minimal identifiable deformations [23]Future researches will be addressed to sensitivity analysis for deformation monitoring networks

Author Contributions Conceptualization VFR MTM and IK Data curation VFR MTM and IK Formalanalysis VFR and IK Funding acquisition MTM MRV and LGdSJ Investigation VFR MethodologyVFR and MTM Project administration MTM IK MRV and LGdSJ Software VFR and LGdSJSupervision MTM IK and MRV Validation VFR and MRV Writing ndash original draft VFR Writing ndashreview amp editing VFR MTM IK and MRV All authors have read and agreed to the published version of themanuscript

Funding This research was funded by the CNPqmdashConselho Nacional de Desenvolvimento Cientiacutefico eTecnoloacutegicomdashBrasil (proc no 1035872019-5) This research and the APC were also funded by PETROBRAS(Grant Number 201800545-0)

Acknowledgments In this section you can acknowledge any support given which is not covered by the authorcontribution or funding sections This may include administrative and technical support or donations in kind(eg materials used for experiments)

Conflicts of Interest The authors declare no conflict of interest

References

1 Lehmann R Neitzel F Testing the compatibility of constraints for parameters of a geodetic adjustmentmodel Journal of Geodesy 2013 87 555ndash566 doi101007s00190-013-0627-2

2 Fang X Weighted total least-squares with constraints a universal formula for geodetic symmetricaltransformations Journal of Geodesy 2015 89 459ndash469 doi101007s00190-015-0790-8

3 Amiri-Simkooei AR Weighted Total Least Squares with Singular Covariance Matrices Subject to Weightedand Hard Constraints J Surv Eng 2017 143 04017018

4 Sakic P Ballu V Royer JY A Multi-Observation Least-Squares Inversion for GNSS-Acoustic SeafloorPositioning Remote Sensing 2020 12 doi103390rs12030448

5 Odijk D Stochastic modelling of the ionosphere for fast GPS ambiguity resolution Geodesy Beyond 2000Schwarz KP Ed Springer Berlin Heidelberg Berlin Heidelberg 2000 pp 387ndash392

6 de Oliveira PS Morel L Fund F Legros R Monico JFG Durand S Durand F Modelingtropospheric wet delays with dense and sparse network configurations for PPP-RTK GPS Solutions 201721 237ndash250 doi101007s10291-016-0518-0

7 Chen B Wu L Dai W Luo X Xu Y A new parameterized approach for ionospheric tomography GPSSolutions 2019 23 96 doi101007s10291-019-0893-4

8 Ahmed E Ahmed S Mohamed D Mostafa R jag 2019 Vol 13 chapter Vertical ionospheric delayestimation for single-receiver operation p 81 2 doi101515jag-2018-0041

9 Wang A Chen J Zhang Y Meng L Wang J Performance of Selected Ionospheric Models inMulti-Global Navigation Satellite System Single-Frequency Positioning over China Remote Sensing 201911 doi103390rs11172070

10 Deo M El-Mowafy A Comparison of advanced troposphere models for aiding reductionof PPP convergence time in Australia Journal of Spatial Science 2019 64 381ndash403doi1010801449859620181472046

11 Dai X Lou Y Dai Z Hu C Peng Y Qiao J Shi C Precise Orbit Determination for GNSS ManeuveringSatellite with the Constraint of a Predicted Clock Remote Sensing 2019 11 doi103390rs11161949

12 Courant R Hilbert D Methods of mathematical physics Vol 2 Vol 2 Wiley [New York] 196213 Grafarend EW Optimization of Geodetic Networks The Canadian Surveyor 1974 28 716ndash723

doi101139tcs-1974-0120

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

37 of 40

14 Teunissen P Zero Order Design Generalized Inverses Adjustment the Datum Problem andS-Transformations Optimization and Design of Geodetic Networks Grafarend EW Sansograve F EdsSpringer Berlin Heidelberg Berlin Heidelberg 1985 pp 11ndash55

15 Dermanis A Free network solutions with the DLT method ISPRS Journal of Photogrammetry and RemoteSensing 1994 49 2 ndash 12 doihttpsdoiorg1010160924-2716(94)90061-2

16 Kotsakis C Reference frame stability and nonlinear distortion in minimum-constrained networkadjustment Journal of Geodesy 2012 86 755ndash774 doi101007s00190-012-0555-6

17 Kotsakis C Datum Definition and Minimal Constraints In Encyclopedia of Geodesy Grafarend E EdSpringer International Publishing Cham 2018 pp 1ndash6 doi101007978-3-319-02370-0_157-1

18 Matsuoka MT Rofatto VF Klein I Roberto Veronez M da Silveira LG Neto JBS Alves ACRControl Points Selection Based on Maximum External Reliability for Designing Geodetic Networks AppliedSciences 2020 10 doi103390app10020687

19 Baarda W S-transformations and criterion matrices Publ on geodesy New Series 1973 520 Schaffrin B Aspects of Network Design Optimization and Design of Geodetic Networks Grafarend

EW Sansograve F Eds Springer Berlin Heidelberg Berlin Heidelberg 1985 pp 548ndash59721 Xu PB A general solution in geodetic nonlinear rank-defect models 199722 Altamimi Z Dermanis A The Choice of Reference System in ITRF Formulation VII Hotine-Marussi

Symposium on Mathematical Geodesy Sneeuw N Novaacutek P Crespi M Sansograve F Eds Springer BerlinHeidelberg Berlin Heidelberg 2012 pp 329ndash334

23 Velsink H On the deformation analysis of point fields J Geod 2015 89 1071ndash1087doi101007s00190-015-0835-z

24 Velsink H Extendable linearised adjustment model for deformation analysis Survey Review 201547 397ndash410 doi1011791752270614Y0000000140

25 Hiddo V jag 2016 Vol 10 chapter Time Series Analysis of 3D Coordinates Using NonstochasticObservations p 5 1 doi101515jag-2015-0027

26 Rao CR Markoffrsquos Theorem with Linear Restrictions on Parameters SankhyAuml The Indian Journal ofStatistics (1933-1960) 1945 7 16ndash19

27 Velsink H Testing Methods for Adjustment Models with Constraints Journal of Surveying Engineering2018 144 04018009 doi101061(ASCE)SU1943-54280000260

28 Hiddo V Testing deformation hypotheses by constraints on a time series of geodetic observations J ApplGeod 2017 12 77ndash93 doi101515jag-2017-0028

29 Baarda W A testing procedure for use in geodetic networks Publ on geodesy New Series 1968 230 Teunissen P First and second moments of non-linear least-squares estimators Bull Geodesique (Journal of

Geodesy) 1989 63 253ndash262 doihttpsdoiorg101007BF0252047531 Teunissen P Testing Theory an introduction 2 ed Delft University Press 200632 Amiri-Simkooei A Least-squares variance component estimation theory and GPS applications Phd

thesis Delft University of Technology 200733 Koch KR Parameter estimation and hypothesis testing in linear models 2 ed Springer 199934 Tao H Feng H Xu L Miao M Long H Yue J Li Z Yang G Yang X Fan L Estimation

of Crop Growth Parameters Using UAV-Based Hyperspectral Remote Sensing Data Sensors 2020 20doi103390s20051296

35 Han M Wang Q Wen Y He M He X The Application of Robust Least Squares Method inFrequency Lock Loop Fusion for Global Navigation Satellite System Receivers Sensors 2020 20doi103390s20041224

36 Wei C Estimation for the Discretely Observed CoxacircldquoIngersollacircldquoRoss Model Driven by Small SymmetricalStable Noises Symmetry 2020 12 doi103390sym12030327

37 Sakic P Ballu V Royer JY A Multi-Observation Least-Squares Inversion for GNSS-Acoustic SeafloorPositioning Remote Sensing 2020 12 doi103390rs12030448

38 Livadiotis G General Fitting Methods Based on Lq Norms and their Optimization Stats 2020 3 16ndash31doi103390stats3010002

39 Farooq SZ Yang D Ada ENJ A Cycle Slip Detection Framework for Reliable Single Frequency RTKPositioning Sensors 2020 20 doi103390s20010304

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

38 of 40

40 Araveeporn A Comparing Parameter Estimation of Random Coefficient Autoregressive Model byFrequentist Method Mathematics 2020 8 doi103390math8010062

41 Zhang C Peng T Zhou J Ji J Wang X An Improved Autoencoder and Partial Least SquaresRegression-Based Extreme Learning Machine Model for Pump Turbine Characteristics Applied Sciences2019 9 doi103390app9193987

42 Ji J Yang M Jiang L He J Teng Z Liu Y Song H Output-Only Parameters Identification ofEarthquake-Excited Building Structures with Least Squares and Input Modification Process AppliedSciences 2019 9 doi103390app9040696

43 Kargoll B On the theory and application of model misspecification tests in geodesy Doctoral thesisUniversity of Bonn Landwirtschaftliche Fakultaumlt German Bonn 2007

44 Lehmann R On the formulation of the alternative hypothesis for geodetic outlier detection J Geod 201387 373ndash386 doi101007s00190-012-0607-y

45 Rofatto VF Matsuoka MT Klein I Roberto Veronez M da Silveira LG A Monte Carlo-Based OutlierDiagnosis Method for Sensitivity Analysis Remote Sensing 2020 12 doi103390rs12050860

46 Lehmann R Improved critical values for extreme normalized and studentized residuals in GaussndashMarkovmodels J Geod 2012 86 1137ndash1146 doi101007s00190-012-0569-0

47 Rofatto VF Matsuoka MT Klein I Veronez MR Bonimani ML Lehmann R A half-century ofBaardarsquos concept of reliability a review new perspectives and applications Surv Rev 2018 0 1ndash17[httpsdoiorg1010800039626520181548118] doi1010800039626520181548118

48 Zaminpardaz S Teunissen P DIA-datasnooping and identifiability J Geod 2019 93 85ndash101doi101007s00190-018-1141-3

49 Kok JJ States U On data snooping and multiple outlier testing [microform] Johan J Kok US Dept ofCommerce National Oceanic and Atmospheric Administration National Ocean Service Charting andGeodetic Services For sale by the National Geodetic Information Center NOAA Rockville Md 1984 p61 p

50 Knight NL Wang J Rizos C Generalised measures of reliability for multiple outliers Journal of Geodesy2010 84 625ndash635 doi101007s00190-010-0392-4

51 Gui Q Li X Gong Y Li B Li G A Bayesian unmasking method for locating multiplegross errors based on posterior probabilities of classification variables J Geod 2011 85 191ndash203doi101007s00190-010-0429-8

52 Klein I Matsuoka MT Guzatto MP Nievinski FG An approach to identify multipleoutliers based on sequential likelihood ratio tests Surv Rev 2017 49 449ndash457[httpsdoiorg1010800039626520161212970] doi1010800039626520161212970

53 Hawkins DM Identification of Outliers 1 ed Springer Netherlands 1980 httpsdoiorg101007978-94-015-3994-4

54 Baarda W Statistical concepts in geodesy Publ on geodesy New Series 1967 255 Imparato D Teunissen P Tiberius C Minimal Detectable and Identifiable Biases for quality

control Surv Rev 2019 51 289ndash299 [httpsdoiorg1010800039626520181437947]doi1010800039626520181437947

56 Teunissen PJG Distributional theory for the DIA method Journal of Geodesy 2018 92 59ndash80doi101007s00190-017-1045-7

57 Foumlrstner W Reliability and discernability of extended Gauss-Markov models Seminar on mathematicalmodels of GeodeticPhotogrammetric Point Determination with Regard to Outliers and Systematic Errors 1983 Vol Series A pp 79ndash103

58 Wang J Knight NL New Outlier Separability Test and Its Application in GNSS Positioning 201259 Lehmann R Loumlsler M Multiple Outlier Detection Hypothesis Tests versus Model Selection by

Information Criteria J Surv Eng 2016 142 0401601760 Teunissen PJG Imparato D Tiberius CCJM Does RAIM with Correct Exclusion Produce Unbiased

Positions Sensors 2017 17 doi103390s1707150861 Lehmann R Loumlsler M Congruence analysis of geodetic networks ndash hypothesis tests versus model

selection by information criteria J Appl Geod 2017 11 271ndash283 doi101515jag-2016-0049

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

39 of 40

62 Rofatto V Matsuoka M Klein I DESIGN OF GEODETIC NETWORKS BASED ON OUTLIERIDENTIFICATION CRITERIA AN EXAMPLE APPLIED TO THE LEVELING NETWORK Bull Geod Sci2018 24 152ndash170 httpdxdoiorg101590s1982-21702018000200011

63 Nguyen VK Renault Milocco R Environment Monitoring for Anomaly Detection System UsingSmartphones Sensors 2019 19 doi103390s19183834

64 Nie Y Yang L Shen Y Specific Direction-Based Outlier Detection Approach for GNSS Vector NetworksSensors 2019 19 doi103390s19081836

65 Klein I Matsuoka MT Guzatto MP Nievinski FG Veronez MR Rofatto VF A newrelationship between the quality criteria for geodetic networks Journal of Geodesy 2019 93 529ndash544doi101007s00190-018-1181-8

66 Mierlo JV Statistical Analysis of Geodetic Measurements for the Investigation of Crustal Movements InRecent Crustal Movements 1977 Whitten C Green R Meade B Eds Elsevier 1979 Vol 13 Developmentsin Geotectonics pp 457 ndash 467 doihttpsdoiorg101016B978-0-444-41783-150072-6

67 van der Marel H Roumlsters AJM Statistical Testing and Quality Analysis in 3-D Networks (part II)Application to GPS Global Positioning System An Overview Bock Y Leppard N Eds Springer NewYork New York NY 1990 pp 290ndash297

68 Yang L Wang J Knight NL Shen Y Outlier separability analysis with a multiple alternative hypothesestest J Geod 2013 87 591ndash604 doi101007s00190-013-0629-0

69 Proacuteszynski W Revisiting Baardarsquos concept of minimal detectable bias with regard to outlier identifiabilityJ Geod 2015 89 993ndash1003 doi101007s00190-015-0828-y

70 Romano JP Wolf M Multiple Testing of One-Sided Hypotheses Combining Bonferroni and the BootstrapPredictive Econometrics and Big Data Kreinovich V Sriboonchitta S Chakpitak N Eds SpringerInternational Publishing Cham 2018 pp 78ndash94

71 Teunissen P An Integrity and Quality Control Procedure for use in Multi Sensor Integration 199072 Aydin C Demirel H Computation of Baardarsquos lower bound of the non-centrality parameter J Geod

2004 78 437ndash441 doi101007s00190-004-0406-173 Aydin C Power of Global Test in Deformation Analysis J Surv Eng 2012 138 51ndash56 https

doiorg101061(ASCE)SU1943-5428000006474 Bonferroni C Teoria statistica delle classi e calcolo delle probabilitagrave Pubblicazioni del R Istituto superiore di

scienze economiche e commerciali di Firenze Libreria internazionale Seeber 193675 Box GEP Muller ME A Note on the Generation of Random Normal Deviates The Annals of Mathematical

Statistics 1958 29 610ndash611 httpsdoi101214aoms117770664576 Lemeshko BY Lemeshko SB Extending the Application of Grubbs-Type Tests in Rejecting Anomalous

Measurements Measurement Techniques 2005 48 536ndash547 doi101007s11018-005-0179-977 Lehmann R Scheffler T Monte Carlo based data snooping with application to a geodetic network J

Appl Geod 2011 5 123ndash134 httpsdoiorg101515JAG201101478 Matsumoto M Nishimura T Mersenne twister A 623-dimensionally equidistributed uniform

pseudo-random number generator ACM Transactions on Modeling and Computer Simulation 1998 8 3ndash30httpsdlacmorgcitationcfmid=272991

79 Ihde J Saacutenchez L Barzaghi R Drewes H Foerste C Gruber T Liebsch G Marti U Pail R SiderisM Definition and Proposed Realization of the International Height Reference System (IHRS) Surveys inGeophysics 2017 38 549ndash570 doi101007s10712-017-9409-3

80 Lehmann R Observation error model selection by information criteria vs normality testing StudiaGeophysica et Geodaetica 2015 59 489ndash504 doi101007s11200-015-0725-0

81 Algarni DA Ali AE Heighting and Distance Accuracy with Electronic Digital Levels Journal of King SaudUniversity - Engineering Sciences 1998 10 229 ndash 239 doihttpsdoiorg101016S1018-3639(18)30698-6

82 Gemin ARS Matos AS Faggion PLAs APPLICATION OF CALIBRATION CERTIFICATE OFDIGITAL LEVELING SYSTEMS IN THE MONITORING OF STRUCTURES A CASE STUDY AT THEGOVERNADOR JOSAtilde RICHA HYDROELECTRIC POWER PLANT - PR Boletim de CiAtildeGeodAtilde 201824 235 ndash 249

83 Takalo M Rouhiainen P Development of a System Calibration Comparator for Digital Levels in FinlandNordic Journal of Surveying and Real Estate Research 1 1

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

40 of 40

84 Wiedemann W Wagner A Wunderlich T Using IATS to Read and Analyze Digital Levelling Staffs SIG2016 Paar R Marendic A Zrinjski M Eds Croatian Geodetic Society 2016 pp 515ndash526

85 PJG T The Geometry of Geodetic Inverse Linear Mapping and Non-linear Adjustment Publications onGeodesy New Series 1985 8

86 Rao CR Mitra SK Generalized inverse of a matrix and its applications 197287 Hekimoglu S Erenoglu RC Sanli DU Erdogan B Detecting Configuration Weaknesses in Geodetic

Networks Survey Review 2011 43 713ndash730 doi101179003962611X13117748892632

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

Page 23: Effects of Hard and Soft Equality Constraints on

23 of 40

Table 8 Correlation matrix of w-test statistics for the levelling network with two soft constraints withσc = 1mm in Figure 3(d)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14

wy1 10 03 02 -01 01 07 -03 00 01 02 -04 -01 04 -04wy2 10 03 01 03 01 04 -04 -01 01 03 -03 03 -03wy3 10 07 01 -01 00 03 -02 -01 01 04 04 -04wy4 10 03 02 01 02 -03 00 01 04 -04 04wy5 10 03 -01 01 04 -04 03 -03 -03 03wy6 10 -02 -01 00 03 -04 -01 -04 04wy7 10 -03 -03 -04 -04 -01 -01 01wy8 10 -04 -03 -01 -04 01 -01wy9 10 -03 01 04 01 -01wy10 10 04 01 -01 01wy11 10 -01 00 00wy12 10 00 00wy13 10 -10wy14 10

Table 9 gives the local redundancy (ri) the standard-deviation of the LS-estimated outlier σnablai andthe maximum absolute correlation (maxρwi wj

) for the scenarios of two constraints

Table 9 Local redundancy (ri) standard-deviation of the LS-estimated outlier σnablai (mm) and themaximum absolute correlation (maxρwi wj

) for each scenario of two soft constraints

σc = 01mm σc = 1mm σc = 10mmMeasurement ri σnablai maxρwi wj

ri σnablai maxρwi wjri σnablai maxρwi wj

y1 0581 1312 0564 0471 1457 0681 0397 1587 0994y2 0582 1311 0376 0533 1369 0423 0501 1413 0471y3 0581 1312 0564 0471 1457 0681 0397 1587 0994y4 0581 1312 0564 0471 1457 0681 0397 1587 0994y5 0582 1311 0376 0533 1369 0423 0501 1413 0471y6 0581 1312 0564 0471 1457 0681 0397 1587 0994y7 0583 1310 0359 0571 1324 0423 0563 1333 0471y8 0583 1310 0359 0571 1324 0423 0563 1333 0471y9 0583 1310 0359 0571 1324 0423 0563 1333 0471y10 0583 1310 0359 0571 1324 0423 0563 1333 0471y11 0583 1309 0358 0583 1309 0398 0583 1309 0433y12 0583 1309 0358 0583 1309 0398 0583 1309 0433y13 0007 1163 1000 0300 1826 1000 0497 14189 1000y14 0007 1163 1000 0300 1826 1000 0497 14189 1000

From Table 9 five clusters were defined for each case of two soft constraints ie for the casewhere heights A and D were given as soft constraints in Figure 3(d) as follows

bull Cluster 1 y1 y3 y4 and y6bull Cluster 2 y2 and y5bull Cluster 3 y7 y8 y9 and y10bull Cluster 4 y11 and y12bull Cluster 5 y13 and y14

The probabilities of correct identification (PCI) and correct detection (PCD) for the measurements(Cluster 1 to Cluster 4) subject to the scenarios of two soft constraints (heights A and D) are displayedin Figure 6

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

24 of 40

(c) (d)

(e) (f)

(g) (h)

(a) (b)

Figure 6 Probability of correct identification (PCI) and Probability of correct detection (PCD) for themeasurements subject to the scenarios of two soft constraints for αprime = 0001 Cluster 1(ab) Cluster2(cd) Cluster 3(ef) and Cluster 4(gh)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

25 of 40

Note that the Cluster 5 is associated with the two soft constraints (ie y13 and y14) The probabilityof correct identification (PCI) for these both soft constraints were null However the probability ofcorrect detection were not Figure 7 shows the probability of correct detection (PCD) for these two softconstraints (ie heights A and D)

Figure 7 Probability of correct detection PCD for the two soft constraints (Cluster 5 heights A and D)and for αprime = 0001

The probability of wrong exclusion PWE for the measurements (Cluster 1 to Cluster 4) subject tothe scenarios of two soft constraints (heights A and D) are displayed in Figure 8 Figure 9 gives the theprobability of wrong exclusion PWE for that two constraints (ie heights A and D)

(c) (d)

(a) (b)

Figure 8 Probability of wrong exclusion PWE for the measurements subject to the scenarios of two softconstraints for αprime = 0001 Cluster 1(a) Cluster 2(b) Cluster 3(c) and Cluster 4(d)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

26 of 40

Figure 9 Probability of wrong exclusion PWE for the two soft constraints (Cluster 5 heights A and D)and for αprime = 0001

The over-identification cases (Pover+ and Poverminus) and the statistical overlap (Pol) were practicallynull for that case

The sensitivity indicators (MDB and MIB) for each scenario of two soft constraints are displayedin Table 10

Table 10 MDB and MIB for the case of two soft constraints based on αprime = 0001 and PCD = PCI = 08

σc = 10mm σc = 1mm σc = 01mmCluster MDB (σ) MIB (σ) MDB (σ) MIB (σ) MDB (σ) MIB (σ)

1 75 25 7 71 63 632 68 68 66 66 63 633 64 64 64 64 63 634 63 63 63 63 63 635 68 - 88 - 57 -

As with the case of two soft constraints Tables 1112 and 13 show the correlations (ρwi wj ) for thecase where there were three soft constraints ie for the network configuration in Figure 3(e)

Table 11 Correlation matrix of w-test statistics for the levelling network with three soft constraintswith σc = 10mm in Figure 3(e)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14 wy15

wy1 10 02 01 01 02 10 -02 00 00 02 -04 -01 01 00 00wy2 10 02 02 03 02 05 -05 -02 02 03 -03 00 00 00wy3 10 10 02 01 00 02 -02 00 01 04 00 -01 00wy4 10 02 01 00 02 -02 00 01 04 00 01 00wy5 10 02 -02 02 05 -05 03 -03 00 00 00wy6 10 -02 00 00 02 -04 -01 -01 00 00wy7 10 -03 -03 -04 -04 -01 00 00 00wy8 10 -04 -03 -01 -04 00 00 00wy9 10 -03 01 04 00 00 00wy10 10 04 01 00 00 00wy11 10 -01 00 00 00wy12 10 00 00 00wy13 10 -05 -05wy14 10 -05wy15 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

27 of 40

Table 12 Correlation matrix of w-test statistics for the levelling network with three soft constraintswith σc = 01mm in Figure 3(e)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14 wy15

wy1 10 03 01 -01 -01 01 -04 -01 -01 -01 -03 -01 07 -01 -04wy2 10 03 -01 01 -01 03 -03 -01 01 03 -03 03 -03 00wy3 10 01 -01 -01 01 04 01 01 01 03 01 -07 04wy4 10 03 01 -01 -01 -04 -01 01 03 -01 07 -04wy5 10 03 -01 01 03 -03 03 -03 -03 03 00wy6 10 01 01 01 04 -03 -01 -07 01 04wy7 10 -01 -01 -01 -03 -01 -04 -01 04wy8 10 -01 -01 -01 -03 -01 -04 04wy9 10 -01 01 03 -01 -04 04wy10 10 03 01 -04 -01 04wy11 10 -01 00 00 00wy12 10 00 00 00wy13 10 -02 -06wy14 10 -06wy15 10

Table 13 Correlation matrix of w-test statistics for the levelling network with three soft constraintswith σc = 1mm in Figure 3(e)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14 wy15

wy1 10 03 01 00 01 06 -03 00 00 01 -04 -01 05 -02 -02wy2 10 03 01 03 01 04 -04 -01 01 03 -03 02 -02 00wy3 10 06 01 00 00 03 -01 00 01 04 02 -05 02wy4 10 03 01 00 01 -03 00 01 04 -02 05 -02wy5 10 03 -01 01 04 -04 03 -03 -02 02 00wy6 10 -01 00 00 03 -04 -01 -05 02 02wy7 10 -03 -02 -03 -04 -01 -02 00 02wy8 10 -03 -02 -01 -04 00 -02 02wy9 10 -03 01 04 00 -02 02wy10 10 04 01 -02 00 02wy11 10 -01 00 00 00wy12 10 00 00 00wy13 10 -04 -05wy14 10 -05wy15 10

Table 14 Local redundancy (ri) standard-deviation of the LS-estimated outlier σnablai (mm) and themaximum absolute correlation (maxρwi wj

) for each scenario of three soft constraints

σc = 01mm σc = 1mm σc = 10mmMeasurement ri σnablai maxρwi wj

ri σnablai maxρwi wjri σnablai maxρwi wj

y1 0702 1194 0660 0502 1411 0577 0398 1586 0992y2 0582 1311 0326 0533 1369 0412 0501 1413 0470y3 0702 1194 0660 0502 1411 0577 0398 1586 0992y4 0702 1194 0660 0502 1411 0577 0398 1586 0992y5 0582 1311 0326 0533 1369 0412 0501 1413 0470y6 0702 1194 0660 0502 1411 0577 0398 1586 0992y7 0704 1192 0415 0602 1289 0412 0563 1333 0470y8 0704 1192 0415 0602 1289 0412 0563 1333 0470y9 0704 1192 0415 0602 1289 0412 0563 1333 0470y10 0704 1192 0415 0602 1289 0412 0563 1333 0470y11 0583 1309 0326 0583 1309 0385 0583 1309 0433y12 0583 1309 0326 0583 1309 0385 0583 1309 0433y13 0012 0904 0660 0425 1534 0542 0663 12283 0501y14 0012 0904 0660 0425 1534 0542 0663 12283 0501y15 0019 0718 063 05 1414 0542 0665 12268 0501

The probabilities of correct identification (PCI) and correct detection PCD for the measurements(Cluster 1 to Cluster 4) subject to the scenarios of three soft constraints (heights A D and G) aredisplayed in Figure 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

28 of 40

(c) (d)

(e) (f)

(g) (h)

(a) (b)

Figure 10 Probability of correct identification (PCI) and Probability of correct detection (PCD) for themeasurements subject to the scenarios of three soft constraints for αprime = 0001 Cluster 1(ab) Cluster2(cd) Cluster 3(ef) and Cluster 4(gh)

The probabilities of correct identification (PCI) and correct detection (PCD) in Figure 8 werecomputed for the clusters based on Table 14 as follows

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

29 of 40

bull Cluster 1 y1 y3 y4 and y6bull Cluster 2 y2 and y5bull Cluster 3 y7 y8 y9 and y10bull Cluster 4 y11 and y12bull Cluster 5 y13 and y14bull Cluster 6 y15

Figure 11 shows the probabilities of correct identification PCI and correct detection PCD for thethree soft constraints ie for Cluster 5 (heights A and D) and Cluster 6 (height G) in Figure 3(e)

(c) (d)

(a) (b)

Figure 11 Probability of correct identification (PCI) and Probability of correct detection (PCD) for thethree constraints and for αprime = 0001 Cluster 5(ab) and Cluster 6(cd)

The probability of wrong exclusion PWE for the measurements (Cluster 1 to Cluster 4) subject tothe scenarios of three soft constraints (heights A D and G) are displayed in Figure 12 Figure 13 givesthe the probability of wrong exclusion PWE for that three constraints (ie heights A D and G)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

30 of 40

(c) (d)

(a) (b)

Figure 12 Probability of wrong exclusion PWE for the measurements subject to the scenarios of threesoft constraints and for αprime = 0001 Cluster 1(a) Cluster 2(b) Cluster 3(c) and Cluster 4(d)

(a) (b)

Figure 13 Probability of wrong exclusion (PWE) for the three constraints and for αprime = 0001 Cluster5(a) and Cluster 6(b)

The over-identification cases (Pover+ and Poverminus) and the statistical overlap (Pol) were alsopractically null for that case of three soft constraints

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

31 of 40

The sensitivity indicators (MDB and MIB) for each scenario of three soft constraints are displayedin Table 15

Table 15 MDB and MIB for the case of three soft constraints based on αprime = 0001 and PCD = PCI = 08

σc = 10mm σc = 1mm σc = 01mmCluster MDB (σ) MIB (σ) MDB (σ) MIB (σ) MDB (σ) MIB (σ)

1 75 22 68 69 58 592 68 69 66 67 64 643 64 64 63 63 58 584 63 63 63 63 63 635 59 60 74 75 435 456 59 59 69 69 346 355

5 Discussion

We start from the scenario of one hard constraint in Figure 3(a) Table 4 reveal that the maximumcorrelation between w-test statistics for the measurements constituting Cluster 1 is exactly equal to100 (ie maxρwi wj

= 100) This means that the measurements belonging to Cluster 1 are those thatare connected with unknown heights whose connections are limited to only two The both unknownheights A and D are tied only to two measurements (ie y1 and y6 linked to A and y3 and y4 linked toD) therefore if an outlier occurred in one of these measurements we would only be able to analyse theconsistency between them but we would not be able to distinguish which of them was contaminatedby an outlier This means that we would only be able to detect because the w-test statistics could belarger than a critical value k however in that case the values of w-test statistics would be the same andtherefore we would not have only one unique maximum w-test statistics but actually would have fourmaximum w-test statistics In other words the equation systems associated with the measurements ofCluster 1 are linearly dependent [87] Therefore there is no reliability in terms of outlier identificationfor the Cluster 1 as can be seen in Figure 3(a)

From Figure 3(b) we can note that there is reliability in terms of outlier detection for Cluster 1and it is caused by overlapping w-test statistics The probability of statistics overlap Pol for Cluster 1in this scenario of minimally constrained network is displayed in Figure 14

Figure 14 Probability of correct detection PCD and statistical overlap (Pol) for the Cluster 1 subject toone hard constraint and for αprime = 0001

The problem of not having more connections (ie more measurements) for the unknown heightsA and D in the case of one hard constraint with G fixed is overcome when these heights (A and D) aretaken as hard constraints in 3(b) or when the heights A D and G are hard constraints in figure 3(c)Figure 3(a b) show that the measurements of Cluster 1 are now able to identify an outlier when twohard constraints (A and D fixed) are in play It is also verified to the case of three hard constraints (A Dand G fixes) in Figure 3(e f) ie there is reliability in terms of both outlier detection and identificationfor these measurements in those conditions

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

32 of 40

One can argue that the more constraints the larger the probability of correct identification (PCI)and correct detection (PCD) This claim is not entirely correct This may be true in case where all pointson the network have the same number of connections However in the example given if both heightsB and G were selected as hard constraints the measurements of the Cluster 1 (ie y1 y6 y3 and y4)would still have their least-squares residuals fully correlated (ie ρwy1 wy6

= 100 and ρwy3 wy4= 100)

and therefore there would be no reliability in terms of outlier identification for these measurementsFrom Table 5 we observe different behaviour for the clusters as follows

bull Cluster 1 there was no MIB for the case of having only one single hard constraint whereasMDB = MIB for the other cases However both MDB and MIB decrease significantly with theincrease in the number of hard constraints

bull Cluster 2 MDB slightly smaller than MIB Both MDB and MIB were practically the same for thecase of having two or three hard constraints

bull Cluster 3 MDB = MIB for all cases of hard constraints however both MDB and MIB decreasesignificantly with the increase in the number of hard constraints

bull Cluster 4 MDB e MIB were equal for all cases

In terms of outlier detection and identification therefore Cluster 1 was more sensitive toconstraints Cluster 3 relatively sensitive to constraints whereas the Cluster 4 completely insensitiveto constraints and Cluster 2 relatively insensitive to constraints This is also can be seen in Figure 4The reason for this is that the local redundancy (ri) of the Cluster 1 increased with the increase of thenumber of hard constraints whereas the Cluster 4 have remained the same as can be seen in Table 4

Leaving aside the cases of statistical overlap (Pol) the network presents low least-squares residualscorrelation (ρwi wj lt 05) and high local redundancy (ri gt 05) Because of this the probabilities ofwrong exclusion (PWE) were less than 1 as can be seen in Figure 5 The over-identification cases(Pover+ and Poverminus) were practically null Consequently PCI asymp PCD Due of this fact the family-wiseerror rate (αprime) should be increased in order to have more success rate in the outlier detection andidentification [45]

From Figure 15 we observe that increasing the αprime increases the both probabilities of correctidentification (PCI) and correct detection (PCD) for outlier magnitude from 5σ to 6σ in the case ofthree hard constraints and from 5σ to 68σ in the case of two hard constraints Although the rates ofPover+ and PWE also increase they are not significant when compared to the improvement of correctidentification (PCI) and detection (PCD) This same analysis can be done for the other clusters

In terms of soft constraints for the cases of two constraints in Figure 3(d) we observe from Table 9that the larger the relaxation of the constraint (ie the larger the standard-deviation of the constraintσc) the larger the residuals correlation (ρwi wj ) and the standard-deviation of the outlier σnablai and thesmaller the local redundancy (ri) Consequently the probabilities of correct identification (PCI) anddetection (PCD) get smaller and smaller with the relaxation of the constraints whereas the probabilityof wrong exclusion gets larger (PWE) This can be more clearly verified in Figure 6(a b) and Figure8(a) for Cluster 1 whose measurements are connected with the constraints A and D (ie y13 and y14 inTable 9 respectively)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

33 of 40

(a) (b)

(c) (d)

Figure 15 Probabilities of correct identification PCI (a) correct detection PCD (b) over-identificationpositive Pover+ (c) and wrong exclusion PWE for the Cluster 1 subject to two and three hard constraintsand for αprime = 0001 and αprime = 01

Note from Figure 8(a) that the probability of wrong exclusion (PWE) increases as the magnitude ofthe outlier (nablai) increases However this is true up to a certain limit of outlier magnitude The effect ofresiduals correlation ρwi wj on the rates of wrong exclusion (PWE) and correct identification (PCI) tendsto decrease with the increase in the magnitude of the outlier nablai This effect is more clearly verified forCluster 1 in case where the precision of the constraints are ten times worse than the measurementsσc = 10σ = 10mm

Note from Figure 6 that identifying an outlier in the Cluster 1 (ie y1 y3 y4 and y6) whenσc = 10mm is more difficult than the other clusters This is due to the fact the Cluster 1 has a higherresiduals correlation ρwi wj = 0994 than other clusters In general therefore we observe that the largerthe relaxation of the constraints the larger is the effect of the correlation ρwi wj on the success rate ofoutlier identification (PCI) Consequently the higher the sensitivity indicator for outlier identification(MIB) Table 10 reveals that the ratio between MIB and MDB for the Cluster 1 and for the scenariowhere the standard-deviations of that two soft constraints are σc = 10mm is MIBMDB=2575=33On the other hand the relationship between MIB and MDB is practically one (ie MIBMDB=10) forthe others scenarios

If the FWE rate (αprime) were increased for the case where the two soft constraints of σc = 10mmare in play we would not have great advantages for Cluster 1 due to its high residuals correlation(ρwi wj = 994) From Figure 16 we can observe that the probability of correct identification (PCI)for outlier magnitudes from 5σ to 8σ is effectively larger to a user-defined αprime = 01 than αprime = 0001However the success rate is still less than 80 ie PCI lt 08 Note for example the correct

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

34 of 40

identification rate is PCI = 56 for an outlier magnitude of nablai = 8σ and αprime = 01 For αprime = 01 theMIB = 335σ = 335mm whereas for αprime = 0001 is MIB = 25σ = 25mm Therefore in that case theMIB for PCI = 08(80) and αprime = 01 would be 34 larger than user-defined αprime = 0001

Figure 16 Probabilities of correct identification (PCI) for the Cluster 1 subject to two soft constraints (2sc) A and D for αprime = 0001 and αprime = 01

The soft constraints A and D were grouped in the Cluster 5 (ie A and D were treated aspseudo-observations in the model) There is no reliability in terms of outlier identification for theconstraints because the residual correlation between them is ρwi wj = 100 as can be seen in Table 9for y13 and y14 However these soft constraints are able to detect an outlier In that case the probabilityof correct detection PCD in Figure 7 is mainly caused by the statistical overlap Pol as can be seen in theexample of the case of σc = 10mm in Figure 17 From Table 10 we observe that the larger the relaxationof the constraints the larger the MDB Note that the values of MDB are given in σ and therefore theMDB for σc = 10mm is larger than σc = 1mm and σc = 01mm ie we had the following inequalityMDB = 68σc = 68times 10mm = 68mm gt MDB = 88σc = 88times 1mm = 88mm gt MDB = 57σc =

57times 01mm = 57mm In that case if the FWE (αprime) were increased the rate of outlier detection by theCluster 4 (ie by the soft constraints) would increase

Figure 17 Probabilities of correct detection PCD and statistical overlap Pol for the two soft constraintsA and D (Cluster 5) with σc = 10mm and for αprime = 0001

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

35 of 40

Similar effects of the relaxation of the constraints on the performance of the IDS in case of two softconstraints are verified in case of three soft constraints as can be seen in the Figures 10 11 12 and 13

In case of having three soft constraints in Figure 3(e) there is reliability in terms of outlieridentification for the three pseudo-observations y13 y14 and y15 (ie for AD and G) as can be seen inFigure 10 and Table 15 In that case we also observe that the probabilities of correct detection PCD ofthe soft constraints A and D (ie Cluster 5) were approximately 13 for σc = 10mm 16 for σc = 1mmand 24 for σc = 01mm larger than the scenario of the network subject to two soft constraints Table15 reveals that the advantage of having three soft constraints instead of two constraints is that theconstraints become identifiable in the presence of an outlier The behaviour of the probabilities ofcorrect detection (PCD) correct identification (PCI) and wrong exclusion (PWE) was similar to the caseof the two soft constraints

Furthermore the larger the relaxation of the constraints the smaller the residuals correlationbetween the measurements and the soft constraints and the larger the residuals correlation amongthe measurements This effect can be verified in Tables 6 7 and 8 for two soft constraints as well as inTables 11 12 and 13 for three soft constraints

In general we observe that the case of two soft constraints for σc = 01mm was comparablewith two hard constraints (see eg Table 5 and Table 10) in terms of the probability levels associatedwith IDS for the measurements (ie clusters 1 2 3 and 4) In the same way for the case of twosoft constraints with σc = 1mm or σc = 10mm the probabilities levels were similar to the one hardconstraint for that measurements with the benefit of two soft constraints having reliability in terms ofoutlier identification for the Cluster 1 Finally the three soft constraints with σc = 1mm and σc = 10mmwere comparable to the two soft constraints for that scenario of constraints relaxation wheres thethree soft constraints for σc = 01mm showed similar outcomes with three hard constraints for themeasurements (see eg Table 5 and Table 15) In that case however an advantage of the three softconstraints on the three hard constraints is the possibility of analysing the sensitivity of the constraintsIt is important to emphasised that the stochastic models of the measurements and constraints wereassumed well-known and defined for the analyzes performed here

6 Conclusions

In general the probability of correct identification (PCI) for the case of hard constraints is largerthan soft constraints It can be verified in Figure 4 Figure 6 and Figure 10 Therefore hard constraintsshould be used in the stage of pre-processing data for the purpose of identifying and removing possibleoutlying measurements In that process one should opt to set out the redundant hard constraints atpoints in the network where the smallest connections exist After identifying and removing possibleoutliers the soft constraints should be employed to propagate the uncertainties of the constraints(pseudobservations) to the model parameters during the process of least-squares estimation Thisrecommendation is valid for outlier detection and identification purpose

We highlight the main findings of this research as follows

bull Under a system of a high local redundancy ri gt 05 and low residuals correlation (ρwi wj lt 05) ifone increase the family-wise error rate (FWE) of the test statistic the performance of the procedurewill be improved for both scenarios of hard constraints and soft constraintsbull The larger the relaxation of the constraints the larger is the effect of the residuals correlation

(ρwi wj ) on the success rate of outlier identification (PCI) Consequently the higher the sensitivityindicator for outlier identification (MIB) and therefore more difficult it becomes to identify anoutlier

bull Under a scenario of soft constraints one should set out at least three soft constraints in order toidentify an outlier in the constraints

In other types of analysis for example deformation analysis of geodetic networks one shouldformulated the constraints as pseudo-observations with σc gt 0 in an adjustment model of system

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

36 of 40

equations In other words the relaxation of constraints makes it possible to estimate deformationeffects that are unmodelled in the deterministic model matrix for example deformations which have aspatial and temporal variations In that case the deformation can be modelled by covariance functionsthat determine the cofactors between constraints [28] This should be applied to control points locatedin the structure but not to control points located outside the structure The control points locatedoutside the structure should be hard constraints and have their stability verified by an independentprocess in order to analyse the deformation of the structure So one will be also able to analyse thesensitivity in terms of minimal detectable deformations andor minimal identifiable deformations [23]Future researches will be addressed to sensitivity analysis for deformation monitoring networks

Author Contributions Conceptualization VFR MTM and IK Data curation VFR MTM and IK Formalanalysis VFR and IK Funding acquisition MTM MRV and LGdSJ Investigation VFR MethodologyVFR and MTM Project administration MTM IK MRV and LGdSJ Software VFR and LGdSJSupervision MTM IK and MRV Validation VFR and MRV Writing ndash original draft VFR Writing ndashreview amp editing VFR MTM IK and MRV All authors have read and agreed to the published version of themanuscript

Funding This research was funded by the CNPqmdashConselho Nacional de Desenvolvimento Cientiacutefico eTecnoloacutegicomdashBrasil (proc no 1035872019-5) This research and the APC were also funded by PETROBRAS(Grant Number 201800545-0)

Acknowledgments In this section you can acknowledge any support given which is not covered by the authorcontribution or funding sections This may include administrative and technical support or donations in kind(eg materials used for experiments)

Conflicts of Interest The authors declare no conflict of interest

References

1 Lehmann R Neitzel F Testing the compatibility of constraints for parameters of a geodetic adjustmentmodel Journal of Geodesy 2013 87 555ndash566 doi101007s00190-013-0627-2

2 Fang X Weighted total least-squares with constraints a universal formula for geodetic symmetricaltransformations Journal of Geodesy 2015 89 459ndash469 doi101007s00190-015-0790-8

3 Amiri-Simkooei AR Weighted Total Least Squares with Singular Covariance Matrices Subject to Weightedand Hard Constraints J Surv Eng 2017 143 04017018

4 Sakic P Ballu V Royer JY A Multi-Observation Least-Squares Inversion for GNSS-Acoustic SeafloorPositioning Remote Sensing 2020 12 doi103390rs12030448

5 Odijk D Stochastic modelling of the ionosphere for fast GPS ambiguity resolution Geodesy Beyond 2000Schwarz KP Ed Springer Berlin Heidelberg Berlin Heidelberg 2000 pp 387ndash392

6 de Oliveira PS Morel L Fund F Legros R Monico JFG Durand S Durand F Modelingtropospheric wet delays with dense and sparse network configurations for PPP-RTK GPS Solutions 201721 237ndash250 doi101007s10291-016-0518-0

7 Chen B Wu L Dai W Luo X Xu Y A new parameterized approach for ionospheric tomography GPSSolutions 2019 23 96 doi101007s10291-019-0893-4

8 Ahmed E Ahmed S Mohamed D Mostafa R jag 2019 Vol 13 chapter Vertical ionospheric delayestimation for single-receiver operation p 81 2 doi101515jag-2018-0041

9 Wang A Chen J Zhang Y Meng L Wang J Performance of Selected Ionospheric Models inMulti-Global Navigation Satellite System Single-Frequency Positioning over China Remote Sensing 201911 doi103390rs11172070

10 Deo M El-Mowafy A Comparison of advanced troposphere models for aiding reductionof PPP convergence time in Australia Journal of Spatial Science 2019 64 381ndash403doi1010801449859620181472046

11 Dai X Lou Y Dai Z Hu C Peng Y Qiao J Shi C Precise Orbit Determination for GNSS ManeuveringSatellite with the Constraint of a Predicted Clock Remote Sensing 2019 11 doi103390rs11161949

12 Courant R Hilbert D Methods of mathematical physics Vol 2 Vol 2 Wiley [New York] 196213 Grafarend EW Optimization of Geodetic Networks The Canadian Surveyor 1974 28 716ndash723

doi101139tcs-1974-0120

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

37 of 40

14 Teunissen P Zero Order Design Generalized Inverses Adjustment the Datum Problem andS-Transformations Optimization and Design of Geodetic Networks Grafarend EW Sansograve F EdsSpringer Berlin Heidelberg Berlin Heidelberg 1985 pp 11ndash55

15 Dermanis A Free network solutions with the DLT method ISPRS Journal of Photogrammetry and RemoteSensing 1994 49 2 ndash 12 doihttpsdoiorg1010160924-2716(94)90061-2

16 Kotsakis C Reference frame stability and nonlinear distortion in minimum-constrained networkadjustment Journal of Geodesy 2012 86 755ndash774 doi101007s00190-012-0555-6

17 Kotsakis C Datum Definition and Minimal Constraints In Encyclopedia of Geodesy Grafarend E EdSpringer International Publishing Cham 2018 pp 1ndash6 doi101007978-3-319-02370-0_157-1

18 Matsuoka MT Rofatto VF Klein I Roberto Veronez M da Silveira LG Neto JBS Alves ACRControl Points Selection Based on Maximum External Reliability for Designing Geodetic Networks AppliedSciences 2020 10 doi103390app10020687

19 Baarda W S-transformations and criterion matrices Publ on geodesy New Series 1973 520 Schaffrin B Aspects of Network Design Optimization and Design of Geodetic Networks Grafarend

EW Sansograve F Eds Springer Berlin Heidelberg Berlin Heidelberg 1985 pp 548ndash59721 Xu PB A general solution in geodetic nonlinear rank-defect models 199722 Altamimi Z Dermanis A The Choice of Reference System in ITRF Formulation VII Hotine-Marussi

Symposium on Mathematical Geodesy Sneeuw N Novaacutek P Crespi M Sansograve F Eds Springer BerlinHeidelberg Berlin Heidelberg 2012 pp 329ndash334

23 Velsink H On the deformation analysis of point fields J Geod 2015 89 1071ndash1087doi101007s00190-015-0835-z

24 Velsink H Extendable linearised adjustment model for deformation analysis Survey Review 201547 397ndash410 doi1011791752270614Y0000000140

25 Hiddo V jag 2016 Vol 10 chapter Time Series Analysis of 3D Coordinates Using NonstochasticObservations p 5 1 doi101515jag-2015-0027

26 Rao CR Markoffrsquos Theorem with Linear Restrictions on Parameters SankhyAuml The Indian Journal ofStatistics (1933-1960) 1945 7 16ndash19

27 Velsink H Testing Methods for Adjustment Models with Constraints Journal of Surveying Engineering2018 144 04018009 doi101061(ASCE)SU1943-54280000260

28 Hiddo V Testing deformation hypotheses by constraints on a time series of geodetic observations J ApplGeod 2017 12 77ndash93 doi101515jag-2017-0028

29 Baarda W A testing procedure for use in geodetic networks Publ on geodesy New Series 1968 230 Teunissen P First and second moments of non-linear least-squares estimators Bull Geodesique (Journal of

Geodesy) 1989 63 253ndash262 doihttpsdoiorg101007BF0252047531 Teunissen P Testing Theory an introduction 2 ed Delft University Press 200632 Amiri-Simkooei A Least-squares variance component estimation theory and GPS applications Phd

thesis Delft University of Technology 200733 Koch KR Parameter estimation and hypothesis testing in linear models 2 ed Springer 199934 Tao H Feng H Xu L Miao M Long H Yue J Li Z Yang G Yang X Fan L Estimation

of Crop Growth Parameters Using UAV-Based Hyperspectral Remote Sensing Data Sensors 2020 20doi103390s20051296

35 Han M Wang Q Wen Y He M He X The Application of Robust Least Squares Method inFrequency Lock Loop Fusion for Global Navigation Satellite System Receivers Sensors 2020 20doi103390s20041224

36 Wei C Estimation for the Discretely Observed CoxacircldquoIngersollacircldquoRoss Model Driven by Small SymmetricalStable Noises Symmetry 2020 12 doi103390sym12030327

37 Sakic P Ballu V Royer JY A Multi-Observation Least-Squares Inversion for GNSS-Acoustic SeafloorPositioning Remote Sensing 2020 12 doi103390rs12030448

38 Livadiotis G General Fitting Methods Based on Lq Norms and their Optimization Stats 2020 3 16ndash31doi103390stats3010002

39 Farooq SZ Yang D Ada ENJ A Cycle Slip Detection Framework for Reliable Single Frequency RTKPositioning Sensors 2020 20 doi103390s20010304

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

38 of 40

40 Araveeporn A Comparing Parameter Estimation of Random Coefficient Autoregressive Model byFrequentist Method Mathematics 2020 8 doi103390math8010062

41 Zhang C Peng T Zhou J Ji J Wang X An Improved Autoencoder and Partial Least SquaresRegression-Based Extreme Learning Machine Model for Pump Turbine Characteristics Applied Sciences2019 9 doi103390app9193987

42 Ji J Yang M Jiang L He J Teng Z Liu Y Song H Output-Only Parameters Identification ofEarthquake-Excited Building Structures with Least Squares and Input Modification Process AppliedSciences 2019 9 doi103390app9040696

43 Kargoll B On the theory and application of model misspecification tests in geodesy Doctoral thesisUniversity of Bonn Landwirtschaftliche Fakultaumlt German Bonn 2007

44 Lehmann R On the formulation of the alternative hypothesis for geodetic outlier detection J Geod 201387 373ndash386 doi101007s00190-012-0607-y

45 Rofatto VF Matsuoka MT Klein I Roberto Veronez M da Silveira LG A Monte Carlo-Based OutlierDiagnosis Method for Sensitivity Analysis Remote Sensing 2020 12 doi103390rs12050860

46 Lehmann R Improved critical values for extreme normalized and studentized residuals in GaussndashMarkovmodels J Geod 2012 86 1137ndash1146 doi101007s00190-012-0569-0

47 Rofatto VF Matsuoka MT Klein I Veronez MR Bonimani ML Lehmann R A half-century ofBaardarsquos concept of reliability a review new perspectives and applications Surv Rev 2018 0 1ndash17[httpsdoiorg1010800039626520181548118] doi1010800039626520181548118

48 Zaminpardaz S Teunissen P DIA-datasnooping and identifiability J Geod 2019 93 85ndash101doi101007s00190-018-1141-3

49 Kok JJ States U On data snooping and multiple outlier testing [microform] Johan J Kok US Dept ofCommerce National Oceanic and Atmospheric Administration National Ocean Service Charting andGeodetic Services For sale by the National Geodetic Information Center NOAA Rockville Md 1984 p61 p

50 Knight NL Wang J Rizos C Generalised measures of reliability for multiple outliers Journal of Geodesy2010 84 625ndash635 doi101007s00190-010-0392-4

51 Gui Q Li X Gong Y Li B Li G A Bayesian unmasking method for locating multiplegross errors based on posterior probabilities of classification variables J Geod 2011 85 191ndash203doi101007s00190-010-0429-8

52 Klein I Matsuoka MT Guzatto MP Nievinski FG An approach to identify multipleoutliers based on sequential likelihood ratio tests Surv Rev 2017 49 449ndash457[httpsdoiorg1010800039626520161212970] doi1010800039626520161212970

53 Hawkins DM Identification of Outliers 1 ed Springer Netherlands 1980 httpsdoiorg101007978-94-015-3994-4

54 Baarda W Statistical concepts in geodesy Publ on geodesy New Series 1967 255 Imparato D Teunissen P Tiberius C Minimal Detectable and Identifiable Biases for quality

control Surv Rev 2019 51 289ndash299 [httpsdoiorg1010800039626520181437947]doi1010800039626520181437947

56 Teunissen PJG Distributional theory for the DIA method Journal of Geodesy 2018 92 59ndash80doi101007s00190-017-1045-7

57 Foumlrstner W Reliability and discernability of extended Gauss-Markov models Seminar on mathematicalmodels of GeodeticPhotogrammetric Point Determination with Regard to Outliers and Systematic Errors 1983 Vol Series A pp 79ndash103

58 Wang J Knight NL New Outlier Separability Test and Its Application in GNSS Positioning 201259 Lehmann R Loumlsler M Multiple Outlier Detection Hypothesis Tests versus Model Selection by

Information Criteria J Surv Eng 2016 142 0401601760 Teunissen PJG Imparato D Tiberius CCJM Does RAIM with Correct Exclusion Produce Unbiased

Positions Sensors 2017 17 doi103390s1707150861 Lehmann R Loumlsler M Congruence analysis of geodetic networks ndash hypothesis tests versus model

selection by information criteria J Appl Geod 2017 11 271ndash283 doi101515jag-2016-0049

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

39 of 40

62 Rofatto V Matsuoka M Klein I DESIGN OF GEODETIC NETWORKS BASED ON OUTLIERIDENTIFICATION CRITERIA AN EXAMPLE APPLIED TO THE LEVELING NETWORK Bull Geod Sci2018 24 152ndash170 httpdxdoiorg101590s1982-21702018000200011

63 Nguyen VK Renault Milocco R Environment Monitoring for Anomaly Detection System UsingSmartphones Sensors 2019 19 doi103390s19183834

64 Nie Y Yang L Shen Y Specific Direction-Based Outlier Detection Approach for GNSS Vector NetworksSensors 2019 19 doi103390s19081836

65 Klein I Matsuoka MT Guzatto MP Nievinski FG Veronez MR Rofatto VF A newrelationship between the quality criteria for geodetic networks Journal of Geodesy 2019 93 529ndash544doi101007s00190-018-1181-8

66 Mierlo JV Statistical Analysis of Geodetic Measurements for the Investigation of Crustal Movements InRecent Crustal Movements 1977 Whitten C Green R Meade B Eds Elsevier 1979 Vol 13 Developmentsin Geotectonics pp 457 ndash 467 doihttpsdoiorg101016B978-0-444-41783-150072-6

67 van der Marel H Roumlsters AJM Statistical Testing and Quality Analysis in 3-D Networks (part II)Application to GPS Global Positioning System An Overview Bock Y Leppard N Eds Springer NewYork New York NY 1990 pp 290ndash297

68 Yang L Wang J Knight NL Shen Y Outlier separability analysis with a multiple alternative hypothesestest J Geod 2013 87 591ndash604 doi101007s00190-013-0629-0

69 Proacuteszynski W Revisiting Baardarsquos concept of minimal detectable bias with regard to outlier identifiabilityJ Geod 2015 89 993ndash1003 doi101007s00190-015-0828-y

70 Romano JP Wolf M Multiple Testing of One-Sided Hypotheses Combining Bonferroni and the BootstrapPredictive Econometrics and Big Data Kreinovich V Sriboonchitta S Chakpitak N Eds SpringerInternational Publishing Cham 2018 pp 78ndash94

71 Teunissen P An Integrity and Quality Control Procedure for use in Multi Sensor Integration 199072 Aydin C Demirel H Computation of Baardarsquos lower bound of the non-centrality parameter J Geod

2004 78 437ndash441 doi101007s00190-004-0406-173 Aydin C Power of Global Test in Deformation Analysis J Surv Eng 2012 138 51ndash56 https

doiorg101061(ASCE)SU1943-5428000006474 Bonferroni C Teoria statistica delle classi e calcolo delle probabilitagrave Pubblicazioni del R Istituto superiore di

scienze economiche e commerciali di Firenze Libreria internazionale Seeber 193675 Box GEP Muller ME A Note on the Generation of Random Normal Deviates The Annals of Mathematical

Statistics 1958 29 610ndash611 httpsdoi101214aoms117770664576 Lemeshko BY Lemeshko SB Extending the Application of Grubbs-Type Tests in Rejecting Anomalous

Measurements Measurement Techniques 2005 48 536ndash547 doi101007s11018-005-0179-977 Lehmann R Scheffler T Monte Carlo based data snooping with application to a geodetic network J

Appl Geod 2011 5 123ndash134 httpsdoiorg101515JAG201101478 Matsumoto M Nishimura T Mersenne twister A 623-dimensionally equidistributed uniform

pseudo-random number generator ACM Transactions on Modeling and Computer Simulation 1998 8 3ndash30httpsdlacmorgcitationcfmid=272991

79 Ihde J Saacutenchez L Barzaghi R Drewes H Foerste C Gruber T Liebsch G Marti U Pail R SiderisM Definition and Proposed Realization of the International Height Reference System (IHRS) Surveys inGeophysics 2017 38 549ndash570 doi101007s10712-017-9409-3

80 Lehmann R Observation error model selection by information criteria vs normality testing StudiaGeophysica et Geodaetica 2015 59 489ndash504 doi101007s11200-015-0725-0

81 Algarni DA Ali AE Heighting and Distance Accuracy with Electronic Digital Levels Journal of King SaudUniversity - Engineering Sciences 1998 10 229 ndash 239 doihttpsdoiorg101016S1018-3639(18)30698-6

82 Gemin ARS Matos AS Faggion PLAs APPLICATION OF CALIBRATION CERTIFICATE OFDIGITAL LEVELING SYSTEMS IN THE MONITORING OF STRUCTURES A CASE STUDY AT THEGOVERNADOR JOSAtilde RICHA HYDROELECTRIC POWER PLANT - PR Boletim de CiAtildeGeodAtilde 201824 235 ndash 249

83 Takalo M Rouhiainen P Development of a System Calibration Comparator for Digital Levels in FinlandNordic Journal of Surveying and Real Estate Research 1 1

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

40 of 40

84 Wiedemann W Wagner A Wunderlich T Using IATS to Read and Analyze Digital Levelling Staffs SIG2016 Paar R Marendic A Zrinjski M Eds Croatian Geodetic Society 2016 pp 515ndash526

85 PJG T The Geometry of Geodetic Inverse Linear Mapping and Non-linear Adjustment Publications onGeodesy New Series 1985 8

86 Rao CR Mitra SK Generalized inverse of a matrix and its applications 197287 Hekimoglu S Erenoglu RC Sanli DU Erdogan B Detecting Configuration Weaknesses in Geodetic

Networks Survey Review 2011 43 713ndash730 doi101179003962611X13117748892632

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

Page 24: Effects of Hard and Soft Equality Constraints on

24 of 40

(c) (d)

(e) (f)

(g) (h)

(a) (b)

Figure 6 Probability of correct identification (PCI) and Probability of correct detection (PCD) for themeasurements subject to the scenarios of two soft constraints for αprime = 0001 Cluster 1(ab) Cluster2(cd) Cluster 3(ef) and Cluster 4(gh)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

25 of 40

Note that the Cluster 5 is associated with the two soft constraints (ie y13 and y14) The probabilityof correct identification (PCI) for these both soft constraints were null However the probability ofcorrect detection were not Figure 7 shows the probability of correct detection (PCD) for these two softconstraints (ie heights A and D)

Figure 7 Probability of correct detection PCD for the two soft constraints (Cluster 5 heights A and D)and for αprime = 0001

The probability of wrong exclusion PWE for the measurements (Cluster 1 to Cluster 4) subject tothe scenarios of two soft constraints (heights A and D) are displayed in Figure 8 Figure 9 gives the theprobability of wrong exclusion PWE for that two constraints (ie heights A and D)

(c) (d)

(a) (b)

Figure 8 Probability of wrong exclusion PWE for the measurements subject to the scenarios of two softconstraints for αprime = 0001 Cluster 1(a) Cluster 2(b) Cluster 3(c) and Cluster 4(d)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

26 of 40

Figure 9 Probability of wrong exclusion PWE for the two soft constraints (Cluster 5 heights A and D)and for αprime = 0001

The over-identification cases (Pover+ and Poverminus) and the statistical overlap (Pol) were practicallynull for that case

The sensitivity indicators (MDB and MIB) for each scenario of two soft constraints are displayedin Table 10

Table 10 MDB and MIB for the case of two soft constraints based on αprime = 0001 and PCD = PCI = 08

σc = 10mm σc = 1mm σc = 01mmCluster MDB (σ) MIB (σ) MDB (σ) MIB (σ) MDB (σ) MIB (σ)

1 75 25 7 71 63 632 68 68 66 66 63 633 64 64 64 64 63 634 63 63 63 63 63 635 68 - 88 - 57 -

As with the case of two soft constraints Tables 1112 and 13 show the correlations (ρwi wj ) for thecase where there were three soft constraints ie for the network configuration in Figure 3(e)

Table 11 Correlation matrix of w-test statistics for the levelling network with three soft constraintswith σc = 10mm in Figure 3(e)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14 wy15

wy1 10 02 01 01 02 10 -02 00 00 02 -04 -01 01 00 00wy2 10 02 02 03 02 05 -05 -02 02 03 -03 00 00 00wy3 10 10 02 01 00 02 -02 00 01 04 00 -01 00wy4 10 02 01 00 02 -02 00 01 04 00 01 00wy5 10 02 -02 02 05 -05 03 -03 00 00 00wy6 10 -02 00 00 02 -04 -01 -01 00 00wy7 10 -03 -03 -04 -04 -01 00 00 00wy8 10 -04 -03 -01 -04 00 00 00wy9 10 -03 01 04 00 00 00wy10 10 04 01 00 00 00wy11 10 -01 00 00 00wy12 10 00 00 00wy13 10 -05 -05wy14 10 -05wy15 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

27 of 40

Table 12 Correlation matrix of w-test statistics for the levelling network with three soft constraintswith σc = 01mm in Figure 3(e)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14 wy15

wy1 10 03 01 -01 -01 01 -04 -01 -01 -01 -03 -01 07 -01 -04wy2 10 03 -01 01 -01 03 -03 -01 01 03 -03 03 -03 00wy3 10 01 -01 -01 01 04 01 01 01 03 01 -07 04wy4 10 03 01 -01 -01 -04 -01 01 03 -01 07 -04wy5 10 03 -01 01 03 -03 03 -03 -03 03 00wy6 10 01 01 01 04 -03 -01 -07 01 04wy7 10 -01 -01 -01 -03 -01 -04 -01 04wy8 10 -01 -01 -01 -03 -01 -04 04wy9 10 -01 01 03 -01 -04 04wy10 10 03 01 -04 -01 04wy11 10 -01 00 00 00wy12 10 00 00 00wy13 10 -02 -06wy14 10 -06wy15 10

Table 13 Correlation matrix of w-test statistics for the levelling network with three soft constraintswith σc = 1mm in Figure 3(e)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14 wy15

wy1 10 03 01 00 01 06 -03 00 00 01 -04 -01 05 -02 -02wy2 10 03 01 03 01 04 -04 -01 01 03 -03 02 -02 00wy3 10 06 01 00 00 03 -01 00 01 04 02 -05 02wy4 10 03 01 00 01 -03 00 01 04 -02 05 -02wy5 10 03 -01 01 04 -04 03 -03 -02 02 00wy6 10 -01 00 00 03 -04 -01 -05 02 02wy7 10 -03 -02 -03 -04 -01 -02 00 02wy8 10 -03 -02 -01 -04 00 -02 02wy9 10 -03 01 04 00 -02 02wy10 10 04 01 -02 00 02wy11 10 -01 00 00 00wy12 10 00 00 00wy13 10 -04 -05wy14 10 -05wy15 10

Table 14 Local redundancy (ri) standard-deviation of the LS-estimated outlier σnablai (mm) and themaximum absolute correlation (maxρwi wj

) for each scenario of three soft constraints

σc = 01mm σc = 1mm σc = 10mmMeasurement ri σnablai maxρwi wj

ri σnablai maxρwi wjri σnablai maxρwi wj

y1 0702 1194 0660 0502 1411 0577 0398 1586 0992y2 0582 1311 0326 0533 1369 0412 0501 1413 0470y3 0702 1194 0660 0502 1411 0577 0398 1586 0992y4 0702 1194 0660 0502 1411 0577 0398 1586 0992y5 0582 1311 0326 0533 1369 0412 0501 1413 0470y6 0702 1194 0660 0502 1411 0577 0398 1586 0992y7 0704 1192 0415 0602 1289 0412 0563 1333 0470y8 0704 1192 0415 0602 1289 0412 0563 1333 0470y9 0704 1192 0415 0602 1289 0412 0563 1333 0470y10 0704 1192 0415 0602 1289 0412 0563 1333 0470y11 0583 1309 0326 0583 1309 0385 0583 1309 0433y12 0583 1309 0326 0583 1309 0385 0583 1309 0433y13 0012 0904 0660 0425 1534 0542 0663 12283 0501y14 0012 0904 0660 0425 1534 0542 0663 12283 0501y15 0019 0718 063 05 1414 0542 0665 12268 0501

The probabilities of correct identification (PCI) and correct detection PCD for the measurements(Cluster 1 to Cluster 4) subject to the scenarios of three soft constraints (heights A D and G) aredisplayed in Figure 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

28 of 40

(c) (d)

(e) (f)

(g) (h)

(a) (b)

Figure 10 Probability of correct identification (PCI) and Probability of correct detection (PCD) for themeasurements subject to the scenarios of three soft constraints for αprime = 0001 Cluster 1(ab) Cluster2(cd) Cluster 3(ef) and Cluster 4(gh)

The probabilities of correct identification (PCI) and correct detection (PCD) in Figure 8 werecomputed for the clusters based on Table 14 as follows

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

29 of 40

bull Cluster 1 y1 y3 y4 and y6bull Cluster 2 y2 and y5bull Cluster 3 y7 y8 y9 and y10bull Cluster 4 y11 and y12bull Cluster 5 y13 and y14bull Cluster 6 y15

Figure 11 shows the probabilities of correct identification PCI and correct detection PCD for thethree soft constraints ie for Cluster 5 (heights A and D) and Cluster 6 (height G) in Figure 3(e)

(c) (d)

(a) (b)

Figure 11 Probability of correct identification (PCI) and Probability of correct detection (PCD) for thethree constraints and for αprime = 0001 Cluster 5(ab) and Cluster 6(cd)

The probability of wrong exclusion PWE for the measurements (Cluster 1 to Cluster 4) subject tothe scenarios of three soft constraints (heights A D and G) are displayed in Figure 12 Figure 13 givesthe the probability of wrong exclusion PWE for that three constraints (ie heights A D and G)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

30 of 40

(c) (d)

(a) (b)

Figure 12 Probability of wrong exclusion PWE for the measurements subject to the scenarios of threesoft constraints and for αprime = 0001 Cluster 1(a) Cluster 2(b) Cluster 3(c) and Cluster 4(d)

(a) (b)

Figure 13 Probability of wrong exclusion (PWE) for the three constraints and for αprime = 0001 Cluster5(a) and Cluster 6(b)

The over-identification cases (Pover+ and Poverminus) and the statistical overlap (Pol) were alsopractically null for that case of three soft constraints

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

31 of 40

The sensitivity indicators (MDB and MIB) for each scenario of three soft constraints are displayedin Table 15

Table 15 MDB and MIB for the case of three soft constraints based on αprime = 0001 and PCD = PCI = 08

σc = 10mm σc = 1mm σc = 01mmCluster MDB (σ) MIB (σ) MDB (σ) MIB (σ) MDB (σ) MIB (σ)

1 75 22 68 69 58 592 68 69 66 67 64 643 64 64 63 63 58 584 63 63 63 63 63 635 59 60 74 75 435 456 59 59 69 69 346 355

5 Discussion

We start from the scenario of one hard constraint in Figure 3(a) Table 4 reveal that the maximumcorrelation between w-test statistics for the measurements constituting Cluster 1 is exactly equal to100 (ie maxρwi wj

= 100) This means that the measurements belonging to Cluster 1 are those thatare connected with unknown heights whose connections are limited to only two The both unknownheights A and D are tied only to two measurements (ie y1 and y6 linked to A and y3 and y4 linked toD) therefore if an outlier occurred in one of these measurements we would only be able to analyse theconsistency between them but we would not be able to distinguish which of them was contaminatedby an outlier This means that we would only be able to detect because the w-test statistics could belarger than a critical value k however in that case the values of w-test statistics would be the same andtherefore we would not have only one unique maximum w-test statistics but actually would have fourmaximum w-test statistics In other words the equation systems associated with the measurements ofCluster 1 are linearly dependent [87] Therefore there is no reliability in terms of outlier identificationfor the Cluster 1 as can be seen in Figure 3(a)

From Figure 3(b) we can note that there is reliability in terms of outlier detection for Cluster 1and it is caused by overlapping w-test statistics The probability of statistics overlap Pol for Cluster 1in this scenario of minimally constrained network is displayed in Figure 14

Figure 14 Probability of correct detection PCD and statistical overlap (Pol) for the Cluster 1 subject toone hard constraint and for αprime = 0001

The problem of not having more connections (ie more measurements) for the unknown heightsA and D in the case of one hard constraint with G fixed is overcome when these heights (A and D) aretaken as hard constraints in 3(b) or when the heights A D and G are hard constraints in figure 3(c)Figure 3(a b) show that the measurements of Cluster 1 are now able to identify an outlier when twohard constraints (A and D fixed) are in play It is also verified to the case of three hard constraints (A Dand G fixes) in Figure 3(e f) ie there is reliability in terms of both outlier detection and identificationfor these measurements in those conditions

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

32 of 40

One can argue that the more constraints the larger the probability of correct identification (PCI)and correct detection (PCD) This claim is not entirely correct This may be true in case where all pointson the network have the same number of connections However in the example given if both heightsB and G were selected as hard constraints the measurements of the Cluster 1 (ie y1 y6 y3 and y4)would still have their least-squares residuals fully correlated (ie ρwy1 wy6

= 100 and ρwy3 wy4= 100)

and therefore there would be no reliability in terms of outlier identification for these measurementsFrom Table 5 we observe different behaviour for the clusters as follows

bull Cluster 1 there was no MIB for the case of having only one single hard constraint whereasMDB = MIB for the other cases However both MDB and MIB decrease significantly with theincrease in the number of hard constraints

bull Cluster 2 MDB slightly smaller than MIB Both MDB and MIB were practically the same for thecase of having two or three hard constraints

bull Cluster 3 MDB = MIB for all cases of hard constraints however both MDB and MIB decreasesignificantly with the increase in the number of hard constraints

bull Cluster 4 MDB e MIB were equal for all cases

In terms of outlier detection and identification therefore Cluster 1 was more sensitive toconstraints Cluster 3 relatively sensitive to constraints whereas the Cluster 4 completely insensitiveto constraints and Cluster 2 relatively insensitive to constraints This is also can be seen in Figure 4The reason for this is that the local redundancy (ri) of the Cluster 1 increased with the increase of thenumber of hard constraints whereas the Cluster 4 have remained the same as can be seen in Table 4

Leaving aside the cases of statistical overlap (Pol) the network presents low least-squares residualscorrelation (ρwi wj lt 05) and high local redundancy (ri gt 05) Because of this the probabilities ofwrong exclusion (PWE) were less than 1 as can be seen in Figure 5 The over-identification cases(Pover+ and Poverminus) were practically null Consequently PCI asymp PCD Due of this fact the family-wiseerror rate (αprime) should be increased in order to have more success rate in the outlier detection andidentification [45]

From Figure 15 we observe that increasing the αprime increases the both probabilities of correctidentification (PCI) and correct detection (PCD) for outlier magnitude from 5σ to 6σ in the case ofthree hard constraints and from 5σ to 68σ in the case of two hard constraints Although the rates ofPover+ and PWE also increase they are not significant when compared to the improvement of correctidentification (PCI) and detection (PCD) This same analysis can be done for the other clusters

In terms of soft constraints for the cases of two constraints in Figure 3(d) we observe from Table 9that the larger the relaxation of the constraint (ie the larger the standard-deviation of the constraintσc) the larger the residuals correlation (ρwi wj ) and the standard-deviation of the outlier σnablai and thesmaller the local redundancy (ri) Consequently the probabilities of correct identification (PCI) anddetection (PCD) get smaller and smaller with the relaxation of the constraints whereas the probabilityof wrong exclusion gets larger (PWE) This can be more clearly verified in Figure 6(a b) and Figure8(a) for Cluster 1 whose measurements are connected with the constraints A and D (ie y13 and y14 inTable 9 respectively)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

33 of 40

(a) (b)

(c) (d)

Figure 15 Probabilities of correct identification PCI (a) correct detection PCD (b) over-identificationpositive Pover+ (c) and wrong exclusion PWE for the Cluster 1 subject to two and three hard constraintsand for αprime = 0001 and αprime = 01

Note from Figure 8(a) that the probability of wrong exclusion (PWE) increases as the magnitude ofthe outlier (nablai) increases However this is true up to a certain limit of outlier magnitude The effect ofresiduals correlation ρwi wj on the rates of wrong exclusion (PWE) and correct identification (PCI) tendsto decrease with the increase in the magnitude of the outlier nablai This effect is more clearly verified forCluster 1 in case where the precision of the constraints are ten times worse than the measurementsσc = 10σ = 10mm

Note from Figure 6 that identifying an outlier in the Cluster 1 (ie y1 y3 y4 and y6) whenσc = 10mm is more difficult than the other clusters This is due to the fact the Cluster 1 has a higherresiduals correlation ρwi wj = 0994 than other clusters In general therefore we observe that the largerthe relaxation of the constraints the larger is the effect of the correlation ρwi wj on the success rate ofoutlier identification (PCI) Consequently the higher the sensitivity indicator for outlier identification(MIB) Table 10 reveals that the ratio between MIB and MDB for the Cluster 1 and for the scenariowhere the standard-deviations of that two soft constraints are σc = 10mm is MIBMDB=2575=33On the other hand the relationship between MIB and MDB is practically one (ie MIBMDB=10) forthe others scenarios

If the FWE rate (αprime) were increased for the case where the two soft constraints of σc = 10mmare in play we would not have great advantages for Cluster 1 due to its high residuals correlation(ρwi wj = 994) From Figure 16 we can observe that the probability of correct identification (PCI)for outlier magnitudes from 5σ to 8σ is effectively larger to a user-defined αprime = 01 than αprime = 0001However the success rate is still less than 80 ie PCI lt 08 Note for example the correct

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

34 of 40

identification rate is PCI = 56 for an outlier magnitude of nablai = 8σ and αprime = 01 For αprime = 01 theMIB = 335σ = 335mm whereas for αprime = 0001 is MIB = 25σ = 25mm Therefore in that case theMIB for PCI = 08(80) and αprime = 01 would be 34 larger than user-defined αprime = 0001

Figure 16 Probabilities of correct identification (PCI) for the Cluster 1 subject to two soft constraints (2sc) A and D for αprime = 0001 and αprime = 01

The soft constraints A and D were grouped in the Cluster 5 (ie A and D were treated aspseudo-observations in the model) There is no reliability in terms of outlier identification for theconstraints because the residual correlation between them is ρwi wj = 100 as can be seen in Table 9for y13 and y14 However these soft constraints are able to detect an outlier In that case the probabilityof correct detection PCD in Figure 7 is mainly caused by the statistical overlap Pol as can be seen in theexample of the case of σc = 10mm in Figure 17 From Table 10 we observe that the larger the relaxationof the constraints the larger the MDB Note that the values of MDB are given in σ and therefore theMDB for σc = 10mm is larger than σc = 1mm and σc = 01mm ie we had the following inequalityMDB = 68σc = 68times 10mm = 68mm gt MDB = 88σc = 88times 1mm = 88mm gt MDB = 57σc =

57times 01mm = 57mm In that case if the FWE (αprime) were increased the rate of outlier detection by theCluster 4 (ie by the soft constraints) would increase

Figure 17 Probabilities of correct detection PCD and statistical overlap Pol for the two soft constraintsA and D (Cluster 5) with σc = 10mm and for αprime = 0001

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

35 of 40

Similar effects of the relaxation of the constraints on the performance of the IDS in case of two softconstraints are verified in case of three soft constraints as can be seen in the Figures 10 11 12 and 13

In case of having three soft constraints in Figure 3(e) there is reliability in terms of outlieridentification for the three pseudo-observations y13 y14 and y15 (ie for AD and G) as can be seen inFigure 10 and Table 15 In that case we also observe that the probabilities of correct detection PCD ofthe soft constraints A and D (ie Cluster 5) were approximately 13 for σc = 10mm 16 for σc = 1mmand 24 for σc = 01mm larger than the scenario of the network subject to two soft constraints Table15 reveals that the advantage of having three soft constraints instead of two constraints is that theconstraints become identifiable in the presence of an outlier The behaviour of the probabilities ofcorrect detection (PCD) correct identification (PCI) and wrong exclusion (PWE) was similar to the caseof the two soft constraints

Furthermore the larger the relaxation of the constraints the smaller the residuals correlationbetween the measurements and the soft constraints and the larger the residuals correlation amongthe measurements This effect can be verified in Tables 6 7 and 8 for two soft constraints as well as inTables 11 12 and 13 for three soft constraints

In general we observe that the case of two soft constraints for σc = 01mm was comparablewith two hard constraints (see eg Table 5 and Table 10) in terms of the probability levels associatedwith IDS for the measurements (ie clusters 1 2 3 and 4) In the same way for the case of twosoft constraints with σc = 1mm or σc = 10mm the probabilities levels were similar to the one hardconstraint for that measurements with the benefit of two soft constraints having reliability in terms ofoutlier identification for the Cluster 1 Finally the three soft constraints with σc = 1mm and σc = 10mmwere comparable to the two soft constraints for that scenario of constraints relaxation wheres thethree soft constraints for σc = 01mm showed similar outcomes with three hard constraints for themeasurements (see eg Table 5 and Table 15) In that case however an advantage of the three softconstraints on the three hard constraints is the possibility of analysing the sensitivity of the constraintsIt is important to emphasised that the stochastic models of the measurements and constraints wereassumed well-known and defined for the analyzes performed here

6 Conclusions

In general the probability of correct identification (PCI) for the case of hard constraints is largerthan soft constraints It can be verified in Figure 4 Figure 6 and Figure 10 Therefore hard constraintsshould be used in the stage of pre-processing data for the purpose of identifying and removing possibleoutlying measurements In that process one should opt to set out the redundant hard constraints atpoints in the network where the smallest connections exist After identifying and removing possibleoutliers the soft constraints should be employed to propagate the uncertainties of the constraints(pseudobservations) to the model parameters during the process of least-squares estimation Thisrecommendation is valid for outlier detection and identification purpose

We highlight the main findings of this research as follows

bull Under a system of a high local redundancy ri gt 05 and low residuals correlation (ρwi wj lt 05) ifone increase the family-wise error rate (FWE) of the test statistic the performance of the procedurewill be improved for both scenarios of hard constraints and soft constraintsbull The larger the relaxation of the constraints the larger is the effect of the residuals correlation

(ρwi wj ) on the success rate of outlier identification (PCI) Consequently the higher the sensitivityindicator for outlier identification (MIB) and therefore more difficult it becomes to identify anoutlier

bull Under a scenario of soft constraints one should set out at least three soft constraints in order toidentify an outlier in the constraints

In other types of analysis for example deformation analysis of geodetic networks one shouldformulated the constraints as pseudo-observations with σc gt 0 in an adjustment model of system

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

36 of 40

equations In other words the relaxation of constraints makes it possible to estimate deformationeffects that are unmodelled in the deterministic model matrix for example deformations which have aspatial and temporal variations In that case the deformation can be modelled by covariance functionsthat determine the cofactors between constraints [28] This should be applied to control points locatedin the structure but not to control points located outside the structure The control points locatedoutside the structure should be hard constraints and have their stability verified by an independentprocess in order to analyse the deformation of the structure So one will be also able to analyse thesensitivity in terms of minimal detectable deformations andor minimal identifiable deformations [23]Future researches will be addressed to sensitivity analysis for deformation monitoring networks

Author Contributions Conceptualization VFR MTM and IK Data curation VFR MTM and IK Formalanalysis VFR and IK Funding acquisition MTM MRV and LGdSJ Investigation VFR MethodologyVFR and MTM Project administration MTM IK MRV and LGdSJ Software VFR and LGdSJSupervision MTM IK and MRV Validation VFR and MRV Writing ndash original draft VFR Writing ndashreview amp editing VFR MTM IK and MRV All authors have read and agreed to the published version of themanuscript

Funding This research was funded by the CNPqmdashConselho Nacional de Desenvolvimento Cientiacutefico eTecnoloacutegicomdashBrasil (proc no 1035872019-5) This research and the APC were also funded by PETROBRAS(Grant Number 201800545-0)

Acknowledgments In this section you can acknowledge any support given which is not covered by the authorcontribution or funding sections This may include administrative and technical support or donations in kind(eg materials used for experiments)

Conflicts of Interest The authors declare no conflict of interest

References

1 Lehmann R Neitzel F Testing the compatibility of constraints for parameters of a geodetic adjustmentmodel Journal of Geodesy 2013 87 555ndash566 doi101007s00190-013-0627-2

2 Fang X Weighted total least-squares with constraints a universal formula for geodetic symmetricaltransformations Journal of Geodesy 2015 89 459ndash469 doi101007s00190-015-0790-8

3 Amiri-Simkooei AR Weighted Total Least Squares with Singular Covariance Matrices Subject to Weightedand Hard Constraints J Surv Eng 2017 143 04017018

4 Sakic P Ballu V Royer JY A Multi-Observation Least-Squares Inversion for GNSS-Acoustic SeafloorPositioning Remote Sensing 2020 12 doi103390rs12030448

5 Odijk D Stochastic modelling of the ionosphere for fast GPS ambiguity resolution Geodesy Beyond 2000Schwarz KP Ed Springer Berlin Heidelberg Berlin Heidelberg 2000 pp 387ndash392

6 de Oliveira PS Morel L Fund F Legros R Monico JFG Durand S Durand F Modelingtropospheric wet delays with dense and sparse network configurations for PPP-RTK GPS Solutions 201721 237ndash250 doi101007s10291-016-0518-0

7 Chen B Wu L Dai W Luo X Xu Y A new parameterized approach for ionospheric tomography GPSSolutions 2019 23 96 doi101007s10291-019-0893-4

8 Ahmed E Ahmed S Mohamed D Mostafa R jag 2019 Vol 13 chapter Vertical ionospheric delayestimation for single-receiver operation p 81 2 doi101515jag-2018-0041

9 Wang A Chen J Zhang Y Meng L Wang J Performance of Selected Ionospheric Models inMulti-Global Navigation Satellite System Single-Frequency Positioning over China Remote Sensing 201911 doi103390rs11172070

10 Deo M El-Mowafy A Comparison of advanced troposphere models for aiding reductionof PPP convergence time in Australia Journal of Spatial Science 2019 64 381ndash403doi1010801449859620181472046

11 Dai X Lou Y Dai Z Hu C Peng Y Qiao J Shi C Precise Orbit Determination for GNSS ManeuveringSatellite with the Constraint of a Predicted Clock Remote Sensing 2019 11 doi103390rs11161949

12 Courant R Hilbert D Methods of mathematical physics Vol 2 Vol 2 Wiley [New York] 196213 Grafarend EW Optimization of Geodetic Networks The Canadian Surveyor 1974 28 716ndash723

doi101139tcs-1974-0120

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

37 of 40

14 Teunissen P Zero Order Design Generalized Inverses Adjustment the Datum Problem andS-Transformations Optimization and Design of Geodetic Networks Grafarend EW Sansograve F EdsSpringer Berlin Heidelberg Berlin Heidelberg 1985 pp 11ndash55

15 Dermanis A Free network solutions with the DLT method ISPRS Journal of Photogrammetry and RemoteSensing 1994 49 2 ndash 12 doihttpsdoiorg1010160924-2716(94)90061-2

16 Kotsakis C Reference frame stability and nonlinear distortion in minimum-constrained networkadjustment Journal of Geodesy 2012 86 755ndash774 doi101007s00190-012-0555-6

17 Kotsakis C Datum Definition and Minimal Constraints In Encyclopedia of Geodesy Grafarend E EdSpringer International Publishing Cham 2018 pp 1ndash6 doi101007978-3-319-02370-0_157-1

18 Matsuoka MT Rofatto VF Klein I Roberto Veronez M da Silveira LG Neto JBS Alves ACRControl Points Selection Based on Maximum External Reliability for Designing Geodetic Networks AppliedSciences 2020 10 doi103390app10020687

19 Baarda W S-transformations and criterion matrices Publ on geodesy New Series 1973 520 Schaffrin B Aspects of Network Design Optimization and Design of Geodetic Networks Grafarend

EW Sansograve F Eds Springer Berlin Heidelberg Berlin Heidelberg 1985 pp 548ndash59721 Xu PB A general solution in geodetic nonlinear rank-defect models 199722 Altamimi Z Dermanis A The Choice of Reference System in ITRF Formulation VII Hotine-Marussi

Symposium on Mathematical Geodesy Sneeuw N Novaacutek P Crespi M Sansograve F Eds Springer BerlinHeidelberg Berlin Heidelberg 2012 pp 329ndash334

23 Velsink H On the deformation analysis of point fields J Geod 2015 89 1071ndash1087doi101007s00190-015-0835-z

24 Velsink H Extendable linearised adjustment model for deformation analysis Survey Review 201547 397ndash410 doi1011791752270614Y0000000140

25 Hiddo V jag 2016 Vol 10 chapter Time Series Analysis of 3D Coordinates Using NonstochasticObservations p 5 1 doi101515jag-2015-0027

26 Rao CR Markoffrsquos Theorem with Linear Restrictions on Parameters SankhyAuml The Indian Journal ofStatistics (1933-1960) 1945 7 16ndash19

27 Velsink H Testing Methods for Adjustment Models with Constraints Journal of Surveying Engineering2018 144 04018009 doi101061(ASCE)SU1943-54280000260

28 Hiddo V Testing deformation hypotheses by constraints on a time series of geodetic observations J ApplGeod 2017 12 77ndash93 doi101515jag-2017-0028

29 Baarda W A testing procedure for use in geodetic networks Publ on geodesy New Series 1968 230 Teunissen P First and second moments of non-linear least-squares estimators Bull Geodesique (Journal of

Geodesy) 1989 63 253ndash262 doihttpsdoiorg101007BF0252047531 Teunissen P Testing Theory an introduction 2 ed Delft University Press 200632 Amiri-Simkooei A Least-squares variance component estimation theory and GPS applications Phd

thesis Delft University of Technology 200733 Koch KR Parameter estimation and hypothesis testing in linear models 2 ed Springer 199934 Tao H Feng H Xu L Miao M Long H Yue J Li Z Yang G Yang X Fan L Estimation

of Crop Growth Parameters Using UAV-Based Hyperspectral Remote Sensing Data Sensors 2020 20doi103390s20051296

35 Han M Wang Q Wen Y He M He X The Application of Robust Least Squares Method inFrequency Lock Loop Fusion for Global Navigation Satellite System Receivers Sensors 2020 20doi103390s20041224

36 Wei C Estimation for the Discretely Observed CoxacircldquoIngersollacircldquoRoss Model Driven by Small SymmetricalStable Noises Symmetry 2020 12 doi103390sym12030327

37 Sakic P Ballu V Royer JY A Multi-Observation Least-Squares Inversion for GNSS-Acoustic SeafloorPositioning Remote Sensing 2020 12 doi103390rs12030448

38 Livadiotis G General Fitting Methods Based on Lq Norms and their Optimization Stats 2020 3 16ndash31doi103390stats3010002

39 Farooq SZ Yang D Ada ENJ A Cycle Slip Detection Framework for Reliable Single Frequency RTKPositioning Sensors 2020 20 doi103390s20010304

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

38 of 40

40 Araveeporn A Comparing Parameter Estimation of Random Coefficient Autoregressive Model byFrequentist Method Mathematics 2020 8 doi103390math8010062

41 Zhang C Peng T Zhou J Ji J Wang X An Improved Autoencoder and Partial Least SquaresRegression-Based Extreme Learning Machine Model for Pump Turbine Characteristics Applied Sciences2019 9 doi103390app9193987

42 Ji J Yang M Jiang L He J Teng Z Liu Y Song H Output-Only Parameters Identification ofEarthquake-Excited Building Structures with Least Squares and Input Modification Process AppliedSciences 2019 9 doi103390app9040696

43 Kargoll B On the theory and application of model misspecification tests in geodesy Doctoral thesisUniversity of Bonn Landwirtschaftliche Fakultaumlt German Bonn 2007

44 Lehmann R On the formulation of the alternative hypothesis for geodetic outlier detection J Geod 201387 373ndash386 doi101007s00190-012-0607-y

45 Rofatto VF Matsuoka MT Klein I Roberto Veronez M da Silveira LG A Monte Carlo-Based OutlierDiagnosis Method for Sensitivity Analysis Remote Sensing 2020 12 doi103390rs12050860

46 Lehmann R Improved critical values for extreme normalized and studentized residuals in GaussndashMarkovmodels J Geod 2012 86 1137ndash1146 doi101007s00190-012-0569-0

47 Rofatto VF Matsuoka MT Klein I Veronez MR Bonimani ML Lehmann R A half-century ofBaardarsquos concept of reliability a review new perspectives and applications Surv Rev 2018 0 1ndash17[httpsdoiorg1010800039626520181548118] doi1010800039626520181548118

48 Zaminpardaz S Teunissen P DIA-datasnooping and identifiability J Geod 2019 93 85ndash101doi101007s00190-018-1141-3

49 Kok JJ States U On data snooping and multiple outlier testing [microform] Johan J Kok US Dept ofCommerce National Oceanic and Atmospheric Administration National Ocean Service Charting andGeodetic Services For sale by the National Geodetic Information Center NOAA Rockville Md 1984 p61 p

50 Knight NL Wang J Rizos C Generalised measures of reliability for multiple outliers Journal of Geodesy2010 84 625ndash635 doi101007s00190-010-0392-4

51 Gui Q Li X Gong Y Li B Li G A Bayesian unmasking method for locating multiplegross errors based on posterior probabilities of classification variables J Geod 2011 85 191ndash203doi101007s00190-010-0429-8

52 Klein I Matsuoka MT Guzatto MP Nievinski FG An approach to identify multipleoutliers based on sequential likelihood ratio tests Surv Rev 2017 49 449ndash457[httpsdoiorg1010800039626520161212970] doi1010800039626520161212970

53 Hawkins DM Identification of Outliers 1 ed Springer Netherlands 1980 httpsdoiorg101007978-94-015-3994-4

54 Baarda W Statistical concepts in geodesy Publ on geodesy New Series 1967 255 Imparato D Teunissen P Tiberius C Minimal Detectable and Identifiable Biases for quality

control Surv Rev 2019 51 289ndash299 [httpsdoiorg1010800039626520181437947]doi1010800039626520181437947

56 Teunissen PJG Distributional theory for the DIA method Journal of Geodesy 2018 92 59ndash80doi101007s00190-017-1045-7

57 Foumlrstner W Reliability and discernability of extended Gauss-Markov models Seminar on mathematicalmodels of GeodeticPhotogrammetric Point Determination with Regard to Outliers and Systematic Errors 1983 Vol Series A pp 79ndash103

58 Wang J Knight NL New Outlier Separability Test and Its Application in GNSS Positioning 201259 Lehmann R Loumlsler M Multiple Outlier Detection Hypothesis Tests versus Model Selection by

Information Criteria J Surv Eng 2016 142 0401601760 Teunissen PJG Imparato D Tiberius CCJM Does RAIM with Correct Exclusion Produce Unbiased

Positions Sensors 2017 17 doi103390s1707150861 Lehmann R Loumlsler M Congruence analysis of geodetic networks ndash hypothesis tests versus model

selection by information criteria J Appl Geod 2017 11 271ndash283 doi101515jag-2016-0049

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

39 of 40

62 Rofatto V Matsuoka M Klein I DESIGN OF GEODETIC NETWORKS BASED ON OUTLIERIDENTIFICATION CRITERIA AN EXAMPLE APPLIED TO THE LEVELING NETWORK Bull Geod Sci2018 24 152ndash170 httpdxdoiorg101590s1982-21702018000200011

63 Nguyen VK Renault Milocco R Environment Monitoring for Anomaly Detection System UsingSmartphones Sensors 2019 19 doi103390s19183834

64 Nie Y Yang L Shen Y Specific Direction-Based Outlier Detection Approach for GNSS Vector NetworksSensors 2019 19 doi103390s19081836

65 Klein I Matsuoka MT Guzatto MP Nievinski FG Veronez MR Rofatto VF A newrelationship between the quality criteria for geodetic networks Journal of Geodesy 2019 93 529ndash544doi101007s00190-018-1181-8

66 Mierlo JV Statistical Analysis of Geodetic Measurements for the Investigation of Crustal Movements InRecent Crustal Movements 1977 Whitten C Green R Meade B Eds Elsevier 1979 Vol 13 Developmentsin Geotectonics pp 457 ndash 467 doihttpsdoiorg101016B978-0-444-41783-150072-6

67 van der Marel H Roumlsters AJM Statistical Testing and Quality Analysis in 3-D Networks (part II)Application to GPS Global Positioning System An Overview Bock Y Leppard N Eds Springer NewYork New York NY 1990 pp 290ndash297

68 Yang L Wang J Knight NL Shen Y Outlier separability analysis with a multiple alternative hypothesestest J Geod 2013 87 591ndash604 doi101007s00190-013-0629-0

69 Proacuteszynski W Revisiting Baardarsquos concept of minimal detectable bias with regard to outlier identifiabilityJ Geod 2015 89 993ndash1003 doi101007s00190-015-0828-y

70 Romano JP Wolf M Multiple Testing of One-Sided Hypotheses Combining Bonferroni and the BootstrapPredictive Econometrics and Big Data Kreinovich V Sriboonchitta S Chakpitak N Eds SpringerInternational Publishing Cham 2018 pp 78ndash94

71 Teunissen P An Integrity and Quality Control Procedure for use in Multi Sensor Integration 199072 Aydin C Demirel H Computation of Baardarsquos lower bound of the non-centrality parameter J Geod

2004 78 437ndash441 doi101007s00190-004-0406-173 Aydin C Power of Global Test in Deformation Analysis J Surv Eng 2012 138 51ndash56 https

doiorg101061(ASCE)SU1943-5428000006474 Bonferroni C Teoria statistica delle classi e calcolo delle probabilitagrave Pubblicazioni del R Istituto superiore di

scienze economiche e commerciali di Firenze Libreria internazionale Seeber 193675 Box GEP Muller ME A Note on the Generation of Random Normal Deviates The Annals of Mathematical

Statistics 1958 29 610ndash611 httpsdoi101214aoms117770664576 Lemeshko BY Lemeshko SB Extending the Application of Grubbs-Type Tests in Rejecting Anomalous

Measurements Measurement Techniques 2005 48 536ndash547 doi101007s11018-005-0179-977 Lehmann R Scheffler T Monte Carlo based data snooping with application to a geodetic network J

Appl Geod 2011 5 123ndash134 httpsdoiorg101515JAG201101478 Matsumoto M Nishimura T Mersenne twister A 623-dimensionally equidistributed uniform

pseudo-random number generator ACM Transactions on Modeling and Computer Simulation 1998 8 3ndash30httpsdlacmorgcitationcfmid=272991

79 Ihde J Saacutenchez L Barzaghi R Drewes H Foerste C Gruber T Liebsch G Marti U Pail R SiderisM Definition and Proposed Realization of the International Height Reference System (IHRS) Surveys inGeophysics 2017 38 549ndash570 doi101007s10712-017-9409-3

80 Lehmann R Observation error model selection by information criteria vs normality testing StudiaGeophysica et Geodaetica 2015 59 489ndash504 doi101007s11200-015-0725-0

81 Algarni DA Ali AE Heighting and Distance Accuracy with Electronic Digital Levels Journal of King SaudUniversity - Engineering Sciences 1998 10 229 ndash 239 doihttpsdoiorg101016S1018-3639(18)30698-6

82 Gemin ARS Matos AS Faggion PLAs APPLICATION OF CALIBRATION CERTIFICATE OFDIGITAL LEVELING SYSTEMS IN THE MONITORING OF STRUCTURES A CASE STUDY AT THEGOVERNADOR JOSAtilde RICHA HYDROELECTRIC POWER PLANT - PR Boletim de CiAtildeGeodAtilde 201824 235 ndash 249

83 Takalo M Rouhiainen P Development of a System Calibration Comparator for Digital Levels in FinlandNordic Journal of Surveying and Real Estate Research 1 1

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

40 of 40

84 Wiedemann W Wagner A Wunderlich T Using IATS to Read and Analyze Digital Levelling Staffs SIG2016 Paar R Marendic A Zrinjski M Eds Croatian Geodetic Society 2016 pp 515ndash526

85 PJG T The Geometry of Geodetic Inverse Linear Mapping and Non-linear Adjustment Publications onGeodesy New Series 1985 8

86 Rao CR Mitra SK Generalized inverse of a matrix and its applications 197287 Hekimoglu S Erenoglu RC Sanli DU Erdogan B Detecting Configuration Weaknesses in Geodetic

Networks Survey Review 2011 43 713ndash730 doi101179003962611X13117748892632

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

Page 25: Effects of Hard and Soft Equality Constraints on

25 of 40

Note that the Cluster 5 is associated with the two soft constraints (ie y13 and y14) The probabilityof correct identification (PCI) for these both soft constraints were null However the probability ofcorrect detection were not Figure 7 shows the probability of correct detection (PCD) for these two softconstraints (ie heights A and D)

Figure 7 Probability of correct detection PCD for the two soft constraints (Cluster 5 heights A and D)and for αprime = 0001

The probability of wrong exclusion PWE for the measurements (Cluster 1 to Cluster 4) subject tothe scenarios of two soft constraints (heights A and D) are displayed in Figure 8 Figure 9 gives the theprobability of wrong exclusion PWE for that two constraints (ie heights A and D)

(c) (d)

(a) (b)

Figure 8 Probability of wrong exclusion PWE for the measurements subject to the scenarios of two softconstraints for αprime = 0001 Cluster 1(a) Cluster 2(b) Cluster 3(c) and Cluster 4(d)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

26 of 40

Figure 9 Probability of wrong exclusion PWE for the two soft constraints (Cluster 5 heights A and D)and for αprime = 0001

The over-identification cases (Pover+ and Poverminus) and the statistical overlap (Pol) were practicallynull for that case

The sensitivity indicators (MDB and MIB) for each scenario of two soft constraints are displayedin Table 10

Table 10 MDB and MIB for the case of two soft constraints based on αprime = 0001 and PCD = PCI = 08

σc = 10mm σc = 1mm σc = 01mmCluster MDB (σ) MIB (σ) MDB (σ) MIB (σ) MDB (σ) MIB (σ)

1 75 25 7 71 63 632 68 68 66 66 63 633 64 64 64 64 63 634 63 63 63 63 63 635 68 - 88 - 57 -

As with the case of two soft constraints Tables 1112 and 13 show the correlations (ρwi wj ) for thecase where there were three soft constraints ie for the network configuration in Figure 3(e)

Table 11 Correlation matrix of w-test statistics for the levelling network with three soft constraintswith σc = 10mm in Figure 3(e)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14 wy15

wy1 10 02 01 01 02 10 -02 00 00 02 -04 -01 01 00 00wy2 10 02 02 03 02 05 -05 -02 02 03 -03 00 00 00wy3 10 10 02 01 00 02 -02 00 01 04 00 -01 00wy4 10 02 01 00 02 -02 00 01 04 00 01 00wy5 10 02 -02 02 05 -05 03 -03 00 00 00wy6 10 -02 00 00 02 -04 -01 -01 00 00wy7 10 -03 -03 -04 -04 -01 00 00 00wy8 10 -04 -03 -01 -04 00 00 00wy9 10 -03 01 04 00 00 00wy10 10 04 01 00 00 00wy11 10 -01 00 00 00wy12 10 00 00 00wy13 10 -05 -05wy14 10 -05wy15 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

27 of 40

Table 12 Correlation matrix of w-test statistics for the levelling network with three soft constraintswith σc = 01mm in Figure 3(e)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14 wy15

wy1 10 03 01 -01 -01 01 -04 -01 -01 -01 -03 -01 07 -01 -04wy2 10 03 -01 01 -01 03 -03 -01 01 03 -03 03 -03 00wy3 10 01 -01 -01 01 04 01 01 01 03 01 -07 04wy4 10 03 01 -01 -01 -04 -01 01 03 -01 07 -04wy5 10 03 -01 01 03 -03 03 -03 -03 03 00wy6 10 01 01 01 04 -03 -01 -07 01 04wy7 10 -01 -01 -01 -03 -01 -04 -01 04wy8 10 -01 -01 -01 -03 -01 -04 04wy9 10 -01 01 03 -01 -04 04wy10 10 03 01 -04 -01 04wy11 10 -01 00 00 00wy12 10 00 00 00wy13 10 -02 -06wy14 10 -06wy15 10

Table 13 Correlation matrix of w-test statistics for the levelling network with three soft constraintswith σc = 1mm in Figure 3(e)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14 wy15

wy1 10 03 01 00 01 06 -03 00 00 01 -04 -01 05 -02 -02wy2 10 03 01 03 01 04 -04 -01 01 03 -03 02 -02 00wy3 10 06 01 00 00 03 -01 00 01 04 02 -05 02wy4 10 03 01 00 01 -03 00 01 04 -02 05 -02wy5 10 03 -01 01 04 -04 03 -03 -02 02 00wy6 10 -01 00 00 03 -04 -01 -05 02 02wy7 10 -03 -02 -03 -04 -01 -02 00 02wy8 10 -03 -02 -01 -04 00 -02 02wy9 10 -03 01 04 00 -02 02wy10 10 04 01 -02 00 02wy11 10 -01 00 00 00wy12 10 00 00 00wy13 10 -04 -05wy14 10 -05wy15 10

Table 14 Local redundancy (ri) standard-deviation of the LS-estimated outlier σnablai (mm) and themaximum absolute correlation (maxρwi wj

) for each scenario of three soft constraints

σc = 01mm σc = 1mm σc = 10mmMeasurement ri σnablai maxρwi wj

ri σnablai maxρwi wjri σnablai maxρwi wj

y1 0702 1194 0660 0502 1411 0577 0398 1586 0992y2 0582 1311 0326 0533 1369 0412 0501 1413 0470y3 0702 1194 0660 0502 1411 0577 0398 1586 0992y4 0702 1194 0660 0502 1411 0577 0398 1586 0992y5 0582 1311 0326 0533 1369 0412 0501 1413 0470y6 0702 1194 0660 0502 1411 0577 0398 1586 0992y7 0704 1192 0415 0602 1289 0412 0563 1333 0470y8 0704 1192 0415 0602 1289 0412 0563 1333 0470y9 0704 1192 0415 0602 1289 0412 0563 1333 0470y10 0704 1192 0415 0602 1289 0412 0563 1333 0470y11 0583 1309 0326 0583 1309 0385 0583 1309 0433y12 0583 1309 0326 0583 1309 0385 0583 1309 0433y13 0012 0904 0660 0425 1534 0542 0663 12283 0501y14 0012 0904 0660 0425 1534 0542 0663 12283 0501y15 0019 0718 063 05 1414 0542 0665 12268 0501

The probabilities of correct identification (PCI) and correct detection PCD for the measurements(Cluster 1 to Cluster 4) subject to the scenarios of three soft constraints (heights A D and G) aredisplayed in Figure 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

28 of 40

(c) (d)

(e) (f)

(g) (h)

(a) (b)

Figure 10 Probability of correct identification (PCI) and Probability of correct detection (PCD) for themeasurements subject to the scenarios of three soft constraints for αprime = 0001 Cluster 1(ab) Cluster2(cd) Cluster 3(ef) and Cluster 4(gh)

The probabilities of correct identification (PCI) and correct detection (PCD) in Figure 8 werecomputed for the clusters based on Table 14 as follows

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

29 of 40

bull Cluster 1 y1 y3 y4 and y6bull Cluster 2 y2 and y5bull Cluster 3 y7 y8 y9 and y10bull Cluster 4 y11 and y12bull Cluster 5 y13 and y14bull Cluster 6 y15

Figure 11 shows the probabilities of correct identification PCI and correct detection PCD for thethree soft constraints ie for Cluster 5 (heights A and D) and Cluster 6 (height G) in Figure 3(e)

(c) (d)

(a) (b)

Figure 11 Probability of correct identification (PCI) and Probability of correct detection (PCD) for thethree constraints and for αprime = 0001 Cluster 5(ab) and Cluster 6(cd)

The probability of wrong exclusion PWE for the measurements (Cluster 1 to Cluster 4) subject tothe scenarios of three soft constraints (heights A D and G) are displayed in Figure 12 Figure 13 givesthe the probability of wrong exclusion PWE for that three constraints (ie heights A D and G)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

30 of 40

(c) (d)

(a) (b)

Figure 12 Probability of wrong exclusion PWE for the measurements subject to the scenarios of threesoft constraints and for αprime = 0001 Cluster 1(a) Cluster 2(b) Cluster 3(c) and Cluster 4(d)

(a) (b)

Figure 13 Probability of wrong exclusion (PWE) for the three constraints and for αprime = 0001 Cluster5(a) and Cluster 6(b)

The over-identification cases (Pover+ and Poverminus) and the statistical overlap (Pol) were alsopractically null for that case of three soft constraints

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

31 of 40

The sensitivity indicators (MDB and MIB) for each scenario of three soft constraints are displayedin Table 15

Table 15 MDB and MIB for the case of three soft constraints based on αprime = 0001 and PCD = PCI = 08

σc = 10mm σc = 1mm σc = 01mmCluster MDB (σ) MIB (σ) MDB (σ) MIB (σ) MDB (σ) MIB (σ)

1 75 22 68 69 58 592 68 69 66 67 64 643 64 64 63 63 58 584 63 63 63 63 63 635 59 60 74 75 435 456 59 59 69 69 346 355

5 Discussion

We start from the scenario of one hard constraint in Figure 3(a) Table 4 reveal that the maximumcorrelation between w-test statistics for the measurements constituting Cluster 1 is exactly equal to100 (ie maxρwi wj

= 100) This means that the measurements belonging to Cluster 1 are those thatare connected with unknown heights whose connections are limited to only two The both unknownheights A and D are tied only to two measurements (ie y1 and y6 linked to A and y3 and y4 linked toD) therefore if an outlier occurred in one of these measurements we would only be able to analyse theconsistency between them but we would not be able to distinguish which of them was contaminatedby an outlier This means that we would only be able to detect because the w-test statistics could belarger than a critical value k however in that case the values of w-test statistics would be the same andtherefore we would not have only one unique maximum w-test statistics but actually would have fourmaximum w-test statistics In other words the equation systems associated with the measurements ofCluster 1 are linearly dependent [87] Therefore there is no reliability in terms of outlier identificationfor the Cluster 1 as can be seen in Figure 3(a)

From Figure 3(b) we can note that there is reliability in terms of outlier detection for Cluster 1and it is caused by overlapping w-test statistics The probability of statistics overlap Pol for Cluster 1in this scenario of minimally constrained network is displayed in Figure 14

Figure 14 Probability of correct detection PCD and statistical overlap (Pol) for the Cluster 1 subject toone hard constraint and for αprime = 0001

The problem of not having more connections (ie more measurements) for the unknown heightsA and D in the case of one hard constraint with G fixed is overcome when these heights (A and D) aretaken as hard constraints in 3(b) or when the heights A D and G are hard constraints in figure 3(c)Figure 3(a b) show that the measurements of Cluster 1 are now able to identify an outlier when twohard constraints (A and D fixed) are in play It is also verified to the case of three hard constraints (A Dand G fixes) in Figure 3(e f) ie there is reliability in terms of both outlier detection and identificationfor these measurements in those conditions

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

32 of 40

One can argue that the more constraints the larger the probability of correct identification (PCI)and correct detection (PCD) This claim is not entirely correct This may be true in case where all pointson the network have the same number of connections However in the example given if both heightsB and G were selected as hard constraints the measurements of the Cluster 1 (ie y1 y6 y3 and y4)would still have their least-squares residuals fully correlated (ie ρwy1 wy6

= 100 and ρwy3 wy4= 100)

and therefore there would be no reliability in terms of outlier identification for these measurementsFrom Table 5 we observe different behaviour for the clusters as follows

bull Cluster 1 there was no MIB for the case of having only one single hard constraint whereasMDB = MIB for the other cases However both MDB and MIB decrease significantly with theincrease in the number of hard constraints

bull Cluster 2 MDB slightly smaller than MIB Both MDB and MIB were practically the same for thecase of having two or three hard constraints

bull Cluster 3 MDB = MIB for all cases of hard constraints however both MDB and MIB decreasesignificantly with the increase in the number of hard constraints

bull Cluster 4 MDB e MIB were equal for all cases

In terms of outlier detection and identification therefore Cluster 1 was more sensitive toconstraints Cluster 3 relatively sensitive to constraints whereas the Cluster 4 completely insensitiveto constraints and Cluster 2 relatively insensitive to constraints This is also can be seen in Figure 4The reason for this is that the local redundancy (ri) of the Cluster 1 increased with the increase of thenumber of hard constraints whereas the Cluster 4 have remained the same as can be seen in Table 4

Leaving aside the cases of statistical overlap (Pol) the network presents low least-squares residualscorrelation (ρwi wj lt 05) and high local redundancy (ri gt 05) Because of this the probabilities ofwrong exclusion (PWE) were less than 1 as can be seen in Figure 5 The over-identification cases(Pover+ and Poverminus) were practically null Consequently PCI asymp PCD Due of this fact the family-wiseerror rate (αprime) should be increased in order to have more success rate in the outlier detection andidentification [45]

From Figure 15 we observe that increasing the αprime increases the both probabilities of correctidentification (PCI) and correct detection (PCD) for outlier magnitude from 5σ to 6σ in the case ofthree hard constraints and from 5σ to 68σ in the case of two hard constraints Although the rates ofPover+ and PWE also increase they are not significant when compared to the improvement of correctidentification (PCI) and detection (PCD) This same analysis can be done for the other clusters

In terms of soft constraints for the cases of two constraints in Figure 3(d) we observe from Table 9that the larger the relaxation of the constraint (ie the larger the standard-deviation of the constraintσc) the larger the residuals correlation (ρwi wj ) and the standard-deviation of the outlier σnablai and thesmaller the local redundancy (ri) Consequently the probabilities of correct identification (PCI) anddetection (PCD) get smaller and smaller with the relaxation of the constraints whereas the probabilityof wrong exclusion gets larger (PWE) This can be more clearly verified in Figure 6(a b) and Figure8(a) for Cluster 1 whose measurements are connected with the constraints A and D (ie y13 and y14 inTable 9 respectively)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

33 of 40

(a) (b)

(c) (d)

Figure 15 Probabilities of correct identification PCI (a) correct detection PCD (b) over-identificationpositive Pover+ (c) and wrong exclusion PWE for the Cluster 1 subject to two and three hard constraintsand for αprime = 0001 and αprime = 01

Note from Figure 8(a) that the probability of wrong exclusion (PWE) increases as the magnitude ofthe outlier (nablai) increases However this is true up to a certain limit of outlier magnitude The effect ofresiduals correlation ρwi wj on the rates of wrong exclusion (PWE) and correct identification (PCI) tendsto decrease with the increase in the magnitude of the outlier nablai This effect is more clearly verified forCluster 1 in case where the precision of the constraints are ten times worse than the measurementsσc = 10σ = 10mm

Note from Figure 6 that identifying an outlier in the Cluster 1 (ie y1 y3 y4 and y6) whenσc = 10mm is more difficult than the other clusters This is due to the fact the Cluster 1 has a higherresiduals correlation ρwi wj = 0994 than other clusters In general therefore we observe that the largerthe relaxation of the constraints the larger is the effect of the correlation ρwi wj on the success rate ofoutlier identification (PCI) Consequently the higher the sensitivity indicator for outlier identification(MIB) Table 10 reveals that the ratio between MIB and MDB for the Cluster 1 and for the scenariowhere the standard-deviations of that two soft constraints are σc = 10mm is MIBMDB=2575=33On the other hand the relationship between MIB and MDB is practically one (ie MIBMDB=10) forthe others scenarios

If the FWE rate (αprime) were increased for the case where the two soft constraints of σc = 10mmare in play we would not have great advantages for Cluster 1 due to its high residuals correlation(ρwi wj = 994) From Figure 16 we can observe that the probability of correct identification (PCI)for outlier magnitudes from 5σ to 8σ is effectively larger to a user-defined αprime = 01 than αprime = 0001However the success rate is still less than 80 ie PCI lt 08 Note for example the correct

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

34 of 40

identification rate is PCI = 56 for an outlier magnitude of nablai = 8σ and αprime = 01 For αprime = 01 theMIB = 335σ = 335mm whereas for αprime = 0001 is MIB = 25σ = 25mm Therefore in that case theMIB for PCI = 08(80) and αprime = 01 would be 34 larger than user-defined αprime = 0001

Figure 16 Probabilities of correct identification (PCI) for the Cluster 1 subject to two soft constraints (2sc) A and D for αprime = 0001 and αprime = 01

The soft constraints A and D were grouped in the Cluster 5 (ie A and D were treated aspseudo-observations in the model) There is no reliability in terms of outlier identification for theconstraints because the residual correlation between them is ρwi wj = 100 as can be seen in Table 9for y13 and y14 However these soft constraints are able to detect an outlier In that case the probabilityof correct detection PCD in Figure 7 is mainly caused by the statistical overlap Pol as can be seen in theexample of the case of σc = 10mm in Figure 17 From Table 10 we observe that the larger the relaxationof the constraints the larger the MDB Note that the values of MDB are given in σ and therefore theMDB for σc = 10mm is larger than σc = 1mm and σc = 01mm ie we had the following inequalityMDB = 68σc = 68times 10mm = 68mm gt MDB = 88σc = 88times 1mm = 88mm gt MDB = 57σc =

57times 01mm = 57mm In that case if the FWE (αprime) were increased the rate of outlier detection by theCluster 4 (ie by the soft constraints) would increase

Figure 17 Probabilities of correct detection PCD and statistical overlap Pol for the two soft constraintsA and D (Cluster 5) with σc = 10mm and for αprime = 0001

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

35 of 40

Similar effects of the relaxation of the constraints on the performance of the IDS in case of two softconstraints are verified in case of three soft constraints as can be seen in the Figures 10 11 12 and 13

In case of having three soft constraints in Figure 3(e) there is reliability in terms of outlieridentification for the three pseudo-observations y13 y14 and y15 (ie for AD and G) as can be seen inFigure 10 and Table 15 In that case we also observe that the probabilities of correct detection PCD ofthe soft constraints A and D (ie Cluster 5) were approximately 13 for σc = 10mm 16 for σc = 1mmand 24 for σc = 01mm larger than the scenario of the network subject to two soft constraints Table15 reveals that the advantage of having three soft constraints instead of two constraints is that theconstraints become identifiable in the presence of an outlier The behaviour of the probabilities ofcorrect detection (PCD) correct identification (PCI) and wrong exclusion (PWE) was similar to the caseof the two soft constraints

Furthermore the larger the relaxation of the constraints the smaller the residuals correlationbetween the measurements and the soft constraints and the larger the residuals correlation amongthe measurements This effect can be verified in Tables 6 7 and 8 for two soft constraints as well as inTables 11 12 and 13 for three soft constraints

In general we observe that the case of two soft constraints for σc = 01mm was comparablewith two hard constraints (see eg Table 5 and Table 10) in terms of the probability levels associatedwith IDS for the measurements (ie clusters 1 2 3 and 4) In the same way for the case of twosoft constraints with σc = 1mm or σc = 10mm the probabilities levels were similar to the one hardconstraint for that measurements with the benefit of two soft constraints having reliability in terms ofoutlier identification for the Cluster 1 Finally the three soft constraints with σc = 1mm and σc = 10mmwere comparable to the two soft constraints for that scenario of constraints relaxation wheres thethree soft constraints for σc = 01mm showed similar outcomes with three hard constraints for themeasurements (see eg Table 5 and Table 15) In that case however an advantage of the three softconstraints on the three hard constraints is the possibility of analysing the sensitivity of the constraintsIt is important to emphasised that the stochastic models of the measurements and constraints wereassumed well-known and defined for the analyzes performed here

6 Conclusions

In general the probability of correct identification (PCI) for the case of hard constraints is largerthan soft constraints It can be verified in Figure 4 Figure 6 and Figure 10 Therefore hard constraintsshould be used in the stage of pre-processing data for the purpose of identifying and removing possibleoutlying measurements In that process one should opt to set out the redundant hard constraints atpoints in the network where the smallest connections exist After identifying and removing possibleoutliers the soft constraints should be employed to propagate the uncertainties of the constraints(pseudobservations) to the model parameters during the process of least-squares estimation Thisrecommendation is valid for outlier detection and identification purpose

We highlight the main findings of this research as follows

bull Under a system of a high local redundancy ri gt 05 and low residuals correlation (ρwi wj lt 05) ifone increase the family-wise error rate (FWE) of the test statistic the performance of the procedurewill be improved for both scenarios of hard constraints and soft constraintsbull The larger the relaxation of the constraints the larger is the effect of the residuals correlation

(ρwi wj ) on the success rate of outlier identification (PCI) Consequently the higher the sensitivityindicator for outlier identification (MIB) and therefore more difficult it becomes to identify anoutlier

bull Under a scenario of soft constraints one should set out at least three soft constraints in order toidentify an outlier in the constraints

In other types of analysis for example deformation analysis of geodetic networks one shouldformulated the constraints as pseudo-observations with σc gt 0 in an adjustment model of system

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

36 of 40

equations In other words the relaxation of constraints makes it possible to estimate deformationeffects that are unmodelled in the deterministic model matrix for example deformations which have aspatial and temporal variations In that case the deformation can be modelled by covariance functionsthat determine the cofactors between constraints [28] This should be applied to control points locatedin the structure but not to control points located outside the structure The control points locatedoutside the structure should be hard constraints and have their stability verified by an independentprocess in order to analyse the deformation of the structure So one will be also able to analyse thesensitivity in terms of minimal detectable deformations andor minimal identifiable deformations [23]Future researches will be addressed to sensitivity analysis for deformation monitoring networks

Author Contributions Conceptualization VFR MTM and IK Data curation VFR MTM and IK Formalanalysis VFR and IK Funding acquisition MTM MRV and LGdSJ Investigation VFR MethodologyVFR and MTM Project administration MTM IK MRV and LGdSJ Software VFR and LGdSJSupervision MTM IK and MRV Validation VFR and MRV Writing ndash original draft VFR Writing ndashreview amp editing VFR MTM IK and MRV All authors have read and agreed to the published version of themanuscript

Funding This research was funded by the CNPqmdashConselho Nacional de Desenvolvimento Cientiacutefico eTecnoloacutegicomdashBrasil (proc no 1035872019-5) This research and the APC were also funded by PETROBRAS(Grant Number 201800545-0)

Acknowledgments In this section you can acknowledge any support given which is not covered by the authorcontribution or funding sections This may include administrative and technical support or donations in kind(eg materials used for experiments)

Conflicts of Interest The authors declare no conflict of interest

References

1 Lehmann R Neitzel F Testing the compatibility of constraints for parameters of a geodetic adjustmentmodel Journal of Geodesy 2013 87 555ndash566 doi101007s00190-013-0627-2

2 Fang X Weighted total least-squares with constraints a universal formula for geodetic symmetricaltransformations Journal of Geodesy 2015 89 459ndash469 doi101007s00190-015-0790-8

3 Amiri-Simkooei AR Weighted Total Least Squares with Singular Covariance Matrices Subject to Weightedand Hard Constraints J Surv Eng 2017 143 04017018

4 Sakic P Ballu V Royer JY A Multi-Observation Least-Squares Inversion for GNSS-Acoustic SeafloorPositioning Remote Sensing 2020 12 doi103390rs12030448

5 Odijk D Stochastic modelling of the ionosphere for fast GPS ambiguity resolution Geodesy Beyond 2000Schwarz KP Ed Springer Berlin Heidelberg Berlin Heidelberg 2000 pp 387ndash392

6 de Oliveira PS Morel L Fund F Legros R Monico JFG Durand S Durand F Modelingtropospheric wet delays with dense and sparse network configurations for PPP-RTK GPS Solutions 201721 237ndash250 doi101007s10291-016-0518-0

7 Chen B Wu L Dai W Luo X Xu Y A new parameterized approach for ionospheric tomography GPSSolutions 2019 23 96 doi101007s10291-019-0893-4

8 Ahmed E Ahmed S Mohamed D Mostafa R jag 2019 Vol 13 chapter Vertical ionospheric delayestimation for single-receiver operation p 81 2 doi101515jag-2018-0041

9 Wang A Chen J Zhang Y Meng L Wang J Performance of Selected Ionospheric Models inMulti-Global Navigation Satellite System Single-Frequency Positioning over China Remote Sensing 201911 doi103390rs11172070

10 Deo M El-Mowafy A Comparison of advanced troposphere models for aiding reductionof PPP convergence time in Australia Journal of Spatial Science 2019 64 381ndash403doi1010801449859620181472046

11 Dai X Lou Y Dai Z Hu C Peng Y Qiao J Shi C Precise Orbit Determination for GNSS ManeuveringSatellite with the Constraint of a Predicted Clock Remote Sensing 2019 11 doi103390rs11161949

12 Courant R Hilbert D Methods of mathematical physics Vol 2 Vol 2 Wiley [New York] 196213 Grafarend EW Optimization of Geodetic Networks The Canadian Surveyor 1974 28 716ndash723

doi101139tcs-1974-0120

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

37 of 40

14 Teunissen P Zero Order Design Generalized Inverses Adjustment the Datum Problem andS-Transformations Optimization and Design of Geodetic Networks Grafarend EW Sansograve F EdsSpringer Berlin Heidelberg Berlin Heidelberg 1985 pp 11ndash55

15 Dermanis A Free network solutions with the DLT method ISPRS Journal of Photogrammetry and RemoteSensing 1994 49 2 ndash 12 doihttpsdoiorg1010160924-2716(94)90061-2

16 Kotsakis C Reference frame stability and nonlinear distortion in minimum-constrained networkadjustment Journal of Geodesy 2012 86 755ndash774 doi101007s00190-012-0555-6

17 Kotsakis C Datum Definition and Minimal Constraints In Encyclopedia of Geodesy Grafarend E EdSpringer International Publishing Cham 2018 pp 1ndash6 doi101007978-3-319-02370-0_157-1

18 Matsuoka MT Rofatto VF Klein I Roberto Veronez M da Silveira LG Neto JBS Alves ACRControl Points Selection Based on Maximum External Reliability for Designing Geodetic Networks AppliedSciences 2020 10 doi103390app10020687

19 Baarda W S-transformations and criterion matrices Publ on geodesy New Series 1973 520 Schaffrin B Aspects of Network Design Optimization and Design of Geodetic Networks Grafarend

EW Sansograve F Eds Springer Berlin Heidelberg Berlin Heidelberg 1985 pp 548ndash59721 Xu PB A general solution in geodetic nonlinear rank-defect models 199722 Altamimi Z Dermanis A The Choice of Reference System in ITRF Formulation VII Hotine-Marussi

Symposium on Mathematical Geodesy Sneeuw N Novaacutek P Crespi M Sansograve F Eds Springer BerlinHeidelberg Berlin Heidelberg 2012 pp 329ndash334

23 Velsink H On the deformation analysis of point fields J Geod 2015 89 1071ndash1087doi101007s00190-015-0835-z

24 Velsink H Extendable linearised adjustment model for deformation analysis Survey Review 201547 397ndash410 doi1011791752270614Y0000000140

25 Hiddo V jag 2016 Vol 10 chapter Time Series Analysis of 3D Coordinates Using NonstochasticObservations p 5 1 doi101515jag-2015-0027

26 Rao CR Markoffrsquos Theorem with Linear Restrictions on Parameters SankhyAuml The Indian Journal ofStatistics (1933-1960) 1945 7 16ndash19

27 Velsink H Testing Methods for Adjustment Models with Constraints Journal of Surveying Engineering2018 144 04018009 doi101061(ASCE)SU1943-54280000260

28 Hiddo V Testing deformation hypotheses by constraints on a time series of geodetic observations J ApplGeod 2017 12 77ndash93 doi101515jag-2017-0028

29 Baarda W A testing procedure for use in geodetic networks Publ on geodesy New Series 1968 230 Teunissen P First and second moments of non-linear least-squares estimators Bull Geodesique (Journal of

Geodesy) 1989 63 253ndash262 doihttpsdoiorg101007BF0252047531 Teunissen P Testing Theory an introduction 2 ed Delft University Press 200632 Amiri-Simkooei A Least-squares variance component estimation theory and GPS applications Phd

thesis Delft University of Technology 200733 Koch KR Parameter estimation and hypothesis testing in linear models 2 ed Springer 199934 Tao H Feng H Xu L Miao M Long H Yue J Li Z Yang G Yang X Fan L Estimation

of Crop Growth Parameters Using UAV-Based Hyperspectral Remote Sensing Data Sensors 2020 20doi103390s20051296

35 Han M Wang Q Wen Y He M He X The Application of Robust Least Squares Method inFrequency Lock Loop Fusion for Global Navigation Satellite System Receivers Sensors 2020 20doi103390s20041224

36 Wei C Estimation for the Discretely Observed CoxacircldquoIngersollacircldquoRoss Model Driven by Small SymmetricalStable Noises Symmetry 2020 12 doi103390sym12030327

37 Sakic P Ballu V Royer JY A Multi-Observation Least-Squares Inversion for GNSS-Acoustic SeafloorPositioning Remote Sensing 2020 12 doi103390rs12030448

38 Livadiotis G General Fitting Methods Based on Lq Norms and their Optimization Stats 2020 3 16ndash31doi103390stats3010002

39 Farooq SZ Yang D Ada ENJ A Cycle Slip Detection Framework for Reliable Single Frequency RTKPositioning Sensors 2020 20 doi103390s20010304

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

38 of 40

40 Araveeporn A Comparing Parameter Estimation of Random Coefficient Autoregressive Model byFrequentist Method Mathematics 2020 8 doi103390math8010062

41 Zhang C Peng T Zhou J Ji J Wang X An Improved Autoencoder and Partial Least SquaresRegression-Based Extreme Learning Machine Model for Pump Turbine Characteristics Applied Sciences2019 9 doi103390app9193987

42 Ji J Yang M Jiang L He J Teng Z Liu Y Song H Output-Only Parameters Identification ofEarthquake-Excited Building Structures with Least Squares and Input Modification Process AppliedSciences 2019 9 doi103390app9040696

43 Kargoll B On the theory and application of model misspecification tests in geodesy Doctoral thesisUniversity of Bonn Landwirtschaftliche Fakultaumlt German Bonn 2007

44 Lehmann R On the formulation of the alternative hypothesis for geodetic outlier detection J Geod 201387 373ndash386 doi101007s00190-012-0607-y

45 Rofatto VF Matsuoka MT Klein I Roberto Veronez M da Silveira LG A Monte Carlo-Based OutlierDiagnosis Method for Sensitivity Analysis Remote Sensing 2020 12 doi103390rs12050860

46 Lehmann R Improved critical values for extreme normalized and studentized residuals in GaussndashMarkovmodels J Geod 2012 86 1137ndash1146 doi101007s00190-012-0569-0

47 Rofatto VF Matsuoka MT Klein I Veronez MR Bonimani ML Lehmann R A half-century ofBaardarsquos concept of reliability a review new perspectives and applications Surv Rev 2018 0 1ndash17[httpsdoiorg1010800039626520181548118] doi1010800039626520181548118

48 Zaminpardaz S Teunissen P DIA-datasnooping and identifiability J Geod 2019 93 85ndash101doi101007s00190-018-1141-3

49 Kok JJ States U On data snooping and multiple outlier testing [microform] Johan J Kok US Dept ofCommerce National Oceanic and Atmospheric Administration National Ocean Service Charting andGeodetic Services For sale by the National Geodetic Information Center NOAA Rockville Md 1984 p61 p

50 Knight NL Wang J Rizos C Generalised measures of reliability for multiple outliers Journal of Geodesy2010 84 625ndash635 doi101007s00190-010-0392-4

51 Gui Q Li X Gong Y Li B Li G A Bayesian unmasking method for locating multiplegross errors based on posterior probabilities of classification variables J Geod 2011 85 191ndash203doi101007s00190-010-0429-8

52 Klein I Matsuoka MT Guzatto MP Nievinski FG An approach to identify multipleoutliers based on sequential likelihood ratio tests Surv Rev 2017 49 449ndash457[httpsdoiorg1010800039626520161212970] doi1010800039626520161212970

53 Hawkins DM Identification of Outliers 1 ed Springer Netherlands 1980 httpsdoiorg101007978-94-015-3994-4

54 Baarda W Statistical concepts in geodesy Publ on geodesy New Series 1967 255 Imparato D Teunissen P Tiberius C Minimal Detectable and Identifiable Biases for quality

control Surv Rev 2019 51 289ndash299 [httpsdoiorg1010800039626520181437947]doi1010800039626520181437947

56 Teunissen PJG Distributional theory for the DIA method Journal of Geodesy 2018 92 59ndash80doi101007s00190-017-1045-7

57 Foumlrstner W Reliability and discernability of extended Gauss-Markov models Seminar on mathematicalmodels of GeodeticPhotogrammetric Point Determination with Regard to Outliers and Systematic Errors 1983 Vol Series A pp 79ndash103

58 Wang J Knight NL New Outlier Separability Test and Its Application in GNSS Positioning 201259 Lehmann R Loumlsler M Multiple Outlier Detection Hypothesis Tests versus Model Selection by

Information Criteria J Surv Eng 2016 142 0401601760 Teunissen PJG Imparato D Tiberius CCJM Does RAIM with Correct Exclusion Produce Unbiased

Positions Sensors 2017 17 doi103390s1707150861 Lehmann R Loumlsler M Congruence analysis of geodetic networks ndash hypothesis tests versus model

selection by information criteria J Appl Geod 2017 11 271ndash283 doi101515jag-2016-0049

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

39 of 40

62 Rofatto V Matsuoka M Klein I DESIGN OF GEODETIC NETWORKS BASED ON OUTLIERIDENTIFICATION CRITERIA AN EXAMPLE APPLIED TO THE LEVELING NETWORK Bull Geod Sci2018 24 152ndash170 httpdxdoiorg101590s1982-21702018000200011

63 Nguyen VK Renault Milocco R Environment Monitoring for Anomaly Detection System UsingSmartphones Sensors 2019 19 doi103390s19183834

64 Nie Y Yang L Shen Y Specific Direction-Based Outlier Detection Approach for GNSS Vector NetworksSensors 2019 19 doi103390s19081836

65 Klein I Matsuoka MT Guzatto MP Nievinski FG Veronez MR Rofatto VF A newrelationship between the quality criteria for geodetic networks Journal of Geodesy 2019 93 529ndash544doi101007s00190-018-1181-8

66 Mierlo JV Statistical Analysis of Geodetic Measurements for the Investigation of Crustal Movements InRecent Crustal Movements 1977 Whitten C Green R Meade B Eds Elsevier 1979 Vol 13 Developmentsin Geotectonics pp 457 ndash 467 doihttpsdoiorg101016B978-0-444-41783-150072-6

67 van der Marel H Roumlsters AJM Statistical Testing and Quality Analysis in 3-D Networks (part II)Application to GPS Global Positioning System An Overview Bock Y Leppard N Eds Springer NewYork New York NY 1990 pp 290ndash297

68 Yang L Wang J Knight NL Shen Y Outlier separability analysis with a multiple alternative hypothesestest J Geod 2013 87 591ndash604 doi101007s00190-013-0629-0

69 Proacuteszynski W Revisiting Baardarsquos concept of minimal detectable bias with regard to outlier identifiabilityJ Geod 2015 89 993ndash1003 doi101007s00190-015-0828-y

70 Romano JP Wolf M Multiple Testing of One-Sided Hypotheses Combining Bonferroni and the BootstrapPredictive Econometrics and Big Data Kreinovich V Sriboonchitta S Chakpitak N Eds SpringerInternational Publishing Cham 2018 pp 78ndash94

71 Teunissen P An Integrity and Quality Control Procedure for use in Multi Sensor Integration 199072 Aydin C Demirel H Computation of Baardarsquos lower bound of the non-centrality parameter J Geod

2004 78 437ndash441 doi101007s00190-004-0406-173 Aydin C Power of Global Test in Deformation Analysis J Surv Eng 2012 138 51ndash56 https

doiorg101061(ASCE)SU1943-5428000006474 Bonferroni C Teoria statistica delle classi e calcolo delle probabilitagrave Pubblicazioni del R Istituto superiore di

scienze economiche e commerciali di Firenze Libreria internazionale Seeber 193675 Box GEP Muller ME A Note on the Generation of Random Normal Deviates The Annals of Mathematical

Statistics 1958 29 610ndash611 httpsdoi101214aoms117770664576 Lemeshko BY Lemeshko SB Extending the Application of Grubbs-Type Tests in Rejecting Anomalous

Measurements Measurement Techniques 2005 48 536ndash547 doi101007s11018-005-0179-977 Lehmann R Scheffler T Monte Carlo based data snooping with application to a geodetic network J

Appl Geod 2011 5 123ndash134 httpsdoiorg101515JAG201101478 Matsumoto M Nishimura T Mersenne twister A 623-dimensionally equidistributed uniform

pseudo-random number generator ACM Transactions on Modeling and Computer Simulation 1998 8 3ndash30httpsdlacmorgcitationcfmid=272991

79 Ihde J Saacutenchez L Barzaghi R Drewes H Foerste C Gruber T Liebsch G Marti U Pail R SiderisM Definition and Proposed Realization of the International Height Reference System (IHRS) Surveys inGeophysics 2017 38 549ndash570 doi101007s10712-017-9409-3

80 Lehmann R Observation error model selection by information criteria vs normality testing StudiaGeophysica et Geodaetica 2015 59 489ndash504 doi101007s11200-015-0725-0

81 Algarni DA Ali AE Heighting and Distance Accuracy with Electronic Digital Levels Journal of King SaudUniversity - Engineering Sciences 1998 10 229 ndash 239 doihttpsdoiorg101016S1018-3639(18)30698-6

82 Gemin ARS Matos AS Faggion PLAs APPLICATION OF CALIBRATION CERTIFICATE OFDIGITAL LEVELING SYSTEMS IN THE MONITORING OF STRUCTURES A CASE STUDY AT THEGOVERNADOR JOSAtilde RICHA HYDROELECTRIC POWER PLANT - PR Boletim de CiAtildeGeodAtilde 201824 235 ndash 249

83 Takalo M Rouhiainen P Development of a System Calibration Comparator for Digital Levels in FinlandNordic Journal of Surveying and Real Estate Research 1 1

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

40 of 40

84 Wiedemann W Wagner A Wunderlich T Using IATS to Read and Analyze Digital Levelling Staffs SIG2016 Paar R Marendic A Zrinjski M Eds Croatian Geodetic Society 2016 pp 515ndash526

85 PJG T The Geometry of Geodetic Inverse Linear Mapping and Non-linear Adjustment Publications onGeodesy New Series 1985 8

86 Rao CR Mitra SK Generalized inverse of a matrix and its applications 197287 Hekimoglu S Erenoglu RC Sanli DU Erdogan B Detecting Configuration Weaknesses in Geodetic

Networks Survey Review 2011 43 713ndash730 doi101179003962611X13117748892632

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

Page 26: Effects of Hard and Soft Equality Constraints on

26 of 40

Figure 9 Probability of wrong exclusion PWE for the two soft constraints (Cluster 5 heights A and D)and for αprime = 0001

The over-identification cases (Pover+ and Poverminus) and the statistical overlap (Pol) were practicallynull for that case

The sensitivity indicators (MDB and MIB) for each scenario of two soft constraints are displayedin Table 10

Table 10 MDB and MIB for the case of two soft constraints based on αprime = 0001 and PCD = PCI = 08

σc = 10mm σc = 1mm σc = 01mmCluster MDB (σ) MIB (σ) MDB (σ) MIB (σ) MDB (σ) MIB (σ)

1 75 25 7 71 63 632 68 68 66 66 63 633 64 64 64 64 63 634 63 63 63 63 63 635 68 - 88 - 57 -

As with the case of two soft constraints Tables 1112 and 13 show the correlations (ρwi wj ) for thecase where there were three soft constraints ie for the network configuration in Figure 3(e)

Table 11 Correlation matrix of w-test statistics for the levelling network with three soft constraintswith σc = 10mm in Figure 3(e)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14 wy15

wy1 10 02 01 01 02 10 -02 00 00 02 -04 -01 01 00 00wy2 10 02 02 03 02 05 -05 -02 02 03 -03 00 00 00wy3 10 10 02 01 00 02 -02 00 01 04 00 -01 00wy4 10 02 01 00 02 -02 00 01 04 00 01 00wy5 10 02 -02 02 05 -05 03 -03 00 00 00wy6 10 -02 00 00 02 -04 -01 -01 00 00wy7 10 -03 -03 -04 -04 -01 00 00 00wy8 10 -04 -03 -01 -04 00 00 00wy9 10 -03 01 04 00 00 00wy10 10 04 01 00 00 00wy11 10 -01 00 00 00wy12 10 00 00 00wy13 10 -05 -05wy14 10 -05wy15 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

27 of 40

Table 12 Correlation matrix of w-test statistics for the levelling network with three soft constraintswith σc = 01mm in Figure 3(e)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14 wy15

wy1 10 03 01 -01 -01 01 -04 -01 -01 -01 -03 -01 07 -01 -04wy2 10 03 -01 01 -01 03 -03 -01 01 03 -03 03 -03 00wy3 10 01 -01 -01 01 04 01 01 01 03 01 -07 04wy4 10 03 01 -01 -01 -04 -01 01 03 -01 07 -04wy5 10 03 -01 01 03 -03 03 -03 -03 03 00wy6 10 01 01 01 04 -03 -01 -07 01 04wy7 10 -01 -01 -01 -03 -01 -04 -01 04wy8 10 -01 -01 -01 -03 -01 -04 04wy9 10 -01 01 03 -01 -04 04wy10 10 03 01 -04 -01 04wy11 10 -01 00 00 00wy12 10 00 00 00wy13 10 -02 -06wy14 10 -06wy15 10

Table 13 Correlation matrix of w-test statistics for the levelling network with three soft constraintswith σc = 1mm in Figure 3(e)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14 wy15

wy1 10 03 01 00 01 06 -03 00 00 01 -04 -01 05 -02 -02wy2 10 03 01 03 01 04 -04 -01 01 03 -03 02 -02 00wy3 10 06 01 00 00 03 -01 00 01 04 02 -05 02wy4 10 03 01 00 01 -03 00 01 04 -02 05 -02wy5 10 03 -01 01 04 -04 03 -03 -02 02 00wy6 10 -01 00 00 03 -04 -01 -05 02 02wy7 10 -03 -02 -03 -04 -01 -02 00 02wy8 10 -03 -02 -01 -04 00 -02 02wy9 10 -03 01 04 00 -02 02wy10 10 04 01 -02 00 02wy11 10 -01 00 00 00wy12 10 00 00 00wy13 10 -04 -05wy14 10 -05wy15 10

Table 14 Local redundancy (ri) standard-deviation of the LS-estimated outlier σnablai (mm) and themaximum absolute correlation (maxρwi wj

) for each scenario of three soft constraints

σc = 01mm σc = 1mm σc = 10mmMeasurement ri σnablai maxρwi wj

ri σnablai maxρwi wjri σnablai maxρwi wj

y1 0702 1194 0660 0502 1411 0577 0398 1586 0992y2 0582 1311 0326 0533 1369 0412 0501 1413 0470y3 0702 1194 0660 0502 1411 0577 0398 1586 0992y4 0702 1194 0660 0502 1411 0577 0398 1586 0992y5 0582 1311 0326 0533 1369 0412 0501 1413 0470y6 0702 1194 0660 0502 1411 0577 0398 1586 0992y7 0704 1192 0415 0602 1289 0412 0563 1333 0470y8 0704 1192 0415 0602 1289 0412 0563 1333 0470y9 0704 1192 0415 0602 1289 0412 0563 1333 0470y10 0704 1192 0415 0602 1289 0412 0563 1333 0470y11 0583 1309 0326 0583 1309 0385 0583 1309 0433y12 0583 1309 0326 0583 1309 0385 0583 1309 0433y13 0012 0904 0660 0425 1534 0542 0663 12283 0501y14 0012 0904 0660 0425 1534 0542 0663 12283 0501y15 0019 0718 063 05 1414 0542 0665 12268 0501

The probabilities of correct identification (PCI) and correct detection PCD for the measurements(Cluster 1 to Cluster 4) subject to the scenarios of three soft constraints (heights A D and G) aredisplayed in Figure 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

28 of 40

(c) (d)

(e) (f)

(g) (h)

(a) (b)

Figure 10 Probability of correct identification (PCI) and Probability of correct detection (PCD) for themeasurements subject to the scenarios of three soft constraints for αprime = 0001 Cluster 1(ab) Cluster2(cd) Cluster 3(ef) and Cluster 4(gh)

The probabilities of correct identification (PCI) and correct detection (PCD) in Figure 8 werecomputed for the clusters based on Table 14 as follows

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

29 of 40

bull Cluster 1 y1 y3 y4 and y6bull Cluster 2 y2 and y5bull Cluster 3 y7 y8 y9 and y10bull Cluster 4 y11 and y12bull Cluster 5 y13 and y14bull Cluster 6 y15

Figure 11 shows the probabilities of correct identification PCI and correct detection PCD for thethree soft constraints ie for Cluster 5 (heights A and D) and Cluster 6 (height G) in Figure 3(e)

(c) (d)

(a) (b)

Figure 11 Probability of correct identification (PCI) and Probability of correct detection (PCD) for thethree constraints and for αprime = 0001 Cluster 5(ab) and Cluster 6(cd)

The probability of wrong exclusion PWE for the measurements (Cluster 1 to Cluster 4) subject tothe scenarios of three soft constraints (heights A D and G) are displayed in Figure 12 Figure 13 givesthe the probability of wrong exclusion PWE for that three constraints (ie heights A D and G)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

30 of 40

(c) (d)

(a) (b)

Figure 12 Probability of wrong exclusion PWE for the measurements subject to the scenarios of threesoft constraints and for αprime = 0001 Cluster 1(a) Cluster 2(b) Cluster 3(c) and Cluster 4(d)

(a) (b)

Figure 13 Probability of wrong exclusion (PWE) for the three constraints and for αprime = 0001 Cluster5(a) and Cluster 6(b)

The over-identification cases (Pover+ and Poverminus) and the statistical overlap (Pol) were alsopractically null for that case of three soft constraints

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

31 of 40

The sensitivity indicators (MDB and MIB) for each scenario of three soft constraints are displayedin Table 15

Table 15 MDB and MIB for the case of three soft constraints based on αprime = 0001 and PCD = PCI = 08

σc = 10mm σc = 1mm σc = 01mmCluster MDB (σ) MIB (σ) MDB (σ) MIB (σ) MDB (σ) MIB (σ)

1 75 22 68 69 58 592 68 69 66 67 64 643 64 64 63 63 58 584 63 63 63 63 63 635 59 60 74 75 435 456 59 59 69 69 346 355

5 Discussion

We start from the scenario of one hard constraint in Figure 3(a) Table 4 reveal that the maximumcorrelation between w-test statistics for the measurements constituting Cluster 1 is exactly equal to100 (ie maxρwi wj

= 100) This means that the measurements belonging to Cluster 1 are those thatare connected with unknown heights whose connections are limited to only two The both unknownheights A and D are tied only to two measurements (ie y1 and y6 linked to A and y3 and y4 linked toD) therefore if an outlier occurred in one of these measurements we would only be able to analyse theconsistency between them but we would not be able to distinguish which of them was contaminatedby an outlier This means that we would only be able to detect because the w-test statistics could belarger than a critical value k however in that case the values of w-test statistics would be the same andtherefore we would not have only one unique maximum w-test statistics but actually would have fourmaximum w-test statistics In other words the equation systems associated with the measurements ofCluster 1 are linearly dependent [87] Therefore there is no reliability in terms of outlier identificationfor the Cluster 1 as can be seen in Figure 3(a)

From Figure 3(b) we can note that there is reliability in terms of outlier detection for Cluster 1and it is caused by overlapping w-test statistics The probability of statistics overlap Pol for Cluster 1in this scenario of minimally constrained network is displayed in Figure 14

Figure 14 Probability of correct detection PCD and statistical overlap (Pol) for the Cluster 1 subject toone hard constraint and for αprime = 0001

The problem of not having more connections (ie more measurements) for the unknown heightsA and D in the case of one hard constraint with G fixed is overcome when these heights (A and D) aretaken as hard constraints in 3(b) or when the heights A D and G are hard constraints in figure 3(c)Figure 3(a b) show that the measurements of Cluster 1 are now able to identify an outlier when twohard constraints (A and D fixed) are in play It is also verified to the case of three hard constraints (A Dand G fixes) in Figure 3(e f) ie there is reliability in terms of both outlier detection and identificationfor these measurements in those conditions

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

32 of 40

One can argue that the more constraints the larger the probability of correct identification (PCI)and correct detection (PCD) This claim is not entirely correct This may be true in case where all pointson the network have the same number of connections However in the example given if both heightsB and G were selected as hard constraints the measurements of the Cluster 1 (ie y1 y6 y3 and y4)would still have their least-squares residuals fully correlated (ie ρwy1 wy6

= 100 and ρwy3 wy4= 100)

and therefore there would be no reliability in terms of outlier identification for these measurementsFrom Table 5 we observe different behaviour for the clusters as follows

bull Cluster 1 there was no MIB for the case of having only one single hard constraint whereasMDB = MIB for the other cases However both MDB and MIB decrease significantly with theincrease in the number of hard constraints

bull Cluster 2 MDB slightly smaller than MIB Both MDB and MIB were practically the same for thecase of having two or three hard constraints

bull Cluster 3 MDB = MIB for all cases of hard constraints however both MDB and MIB decreasesignificantly with the increase in the number of hard constraints

bull Cluster 4 MDB e MIB were equal for all cases

In terms of outlier detection and identification therefore Cluster 1 was more sensitive toconstraints Cluster 3 relatively sensitive to constraints whereas the Cluster 4 completely insensitiveto constraints and Cluster 2 relatively insensitive to constraints This is also can be seen in Figure 4The reason for this is that the local redundancy (ri) of the Cluster 1 increased with the increase of thenumber of hard constraints whereas the Cluster 4 have remained the same as can be seen in Table 4

Leaving aside the cases of statistical overlap (Pol) the network presents low least-squares residualscorrelation (ρwi wj lt 05) and high local redundancy (ri gt 05) Because of this the probabilities ofwrong exclusion (PWE) were less than 1 as can be seen in Figure 5 The over-identification cases(Pover+ and Poverminus) were practically null Consequently PCI asymp PCD Due of this fact the family-wiseerror rate (αprime) should be increased in order to have more success rate in the outlier detection andidentification [45]

From Figure 15 we observe that increasing the αprime increases the both probabilities of correctidentification (PCI) and correct detection (PCD) for outlier magnitude from 5σ to 6σ in the case ofthree hard constraints and from 5σ to 68σ in the case of two hard constraints Although the rates ofPover+ and PWE also increase they are not significant when compared to the improvement of correctidentification (PCI) and detection (PCD) This same analysis can be done for the other clusters

In terms of soft constraints for the cases of two constraints in Figure 3(d) we observe from Table 9that the larger the relaxation of the constraint (ie the larger the standard-deviation of the constraintσc) the larger the residuals correlation (ρwi wj ) and the standard-deviation of the outlier σnablai and thesmaller the local redundancy (ri) Consequently the probabilities of correct identification (PCI) anddetection (PCD) get smaller and smaller with the relaxation of the constraints whereas the probabilityof wrong exclusion gets larger (PWE) This can be more clearly verified in Figure 6(a b) and Figure8(a) for Cluster 1 whose measurements are connected with the constraints A and D (ie y13 and y14 inTable 9 respectively)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

33 of 40

(a) (b)

(c) (d)

Figure 15 Probabilities of correct identification PCI (a) correct detection PCD (b) over-identificationpositive Pover+ (c) and wrong exclusion PWE for the Cluster 1 subject to two and three hard constraintsand for αprime = 0001 and αprime = 01

Note from Figure 8(a) that the probability of wrong exclusion (PWE) increases as the magnitude ofthe outlier (nablai) increases However this is true up to a certain limit of outlier magnitude The effect ofresiduals correlation ρwi wj on the rates of wrong exclusion (PWE) and correct identification (PCI) tendsto decrease with the increase in the magnitude of the outlier nablai This effect is more clearly verified forCluster 1 in case where the precision of the constraints are ten times worse than the measurementsσc = 10σ = 10mm

Note from Figure 6 that identifying an outlier in the Cluster 1 (ie y1 y3 y4 and y6) whenσc = 10mm is more difficult than the other clusters This is due to the fact the Cluster 1 has a higherresiduals correlation ρwi wj = 0994 than other clusters In general therefore we observe that the largerthe relaxation of the constraints the larger is the effect of the correlation ρwi wj on the success rate ofoutlier identification (PCI) Consequently the higher the sensitivity indicator for outlier identification(MIB) Table 10 reveals that the ratio between MIB and MDB for the Cluster 1 and for the scenariowhere the standard-deviations of that two soft constraints are σc = 10mm is MIBMDB=2575=33On the other hand the relationship between MIB and MDB is practically one (ie MIBMDB=10) forthe others scenarios

If the FWE rate (αprime) were increased for the case where the two soft constraints of σc = 10mmare in play we would not have great advantages for Cluster 1 due to its high residuals correlation(ρwi wj = 994) From Figure 16 we can observe that the probability of correct identification (PCI)for outlier magnitudes from 5σ to 8σ is effectively larger to a user-defined αprime = 01 than αprime = 0001However the success rate is still less than 80 ie PCI lt 08 Note for example the correct

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

34 of 40

identification rate is PCI = 56 for an outlier magnitude of nablai = 8σ and αprime = 01 For αprime = 01 theMIB = 335σ = 335mm whereas for αprime = 0001 is MIB = 25σ = 25mm Therefore in that case theMIB for PCI = 08(80) and αprime = 01 would be 34 larger than user-defined αprime = 0001

Figure 16 Probabilities of correct identification (PCI) for the Cluster 1 subject to two soft constraints (2sc) A and D for αprime = 0001 and αprime = 01

The soft constraints A and D were grouped in the Cluster 5 (ie A and D were treated aspseudo-observations in the model) There is no reliability in terms of outlier identification for theconstraints because the residual correlation between them is ρwi wj = 100 as can be seen in Table 9for y13 and y14 However these soft constraints are able to detect an outlier In that case the probabilityof correct detection PCD in Figure 7 is mainly caused by the statistical overlap Pol as can be seen in theexample of the case of σc = 10mm in Figure 17 From Table 10 we observe that the larger the relaxationof the constraints the larger the MDB Note that the values of MDB are given in σ and therefore theMDB for σc = 10mm is larger than σc = 1mm and σc = 01mm ie we had the following inequalityMDB = 68σc = 68times 10mm = 68mm gt MDB = 88σc = 88times 1mm = 88mm gt MDB = 57σc =

57times 01mm = 57mm In that case if the FWE (αprime) were increased the rate of outlier detection by theCluster 4 (ie by the soft constraints) would increase

Figure 17 Probabilities of correct detection PCD and statistical overlap Pol for the two soft constraintsA and D (Cluster 5) with σc = 10mm and for αprime = 0001

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

35 of 40

Similar effects of the relaxation of the constraints on the performance of the IDS in case of two softconstraints are verified in case of three soft constraints as can be seen in the Figures 10 11 12 and 13

In case of having three soft constraints in Figure 3(e) there is reliability in terms of outlieridentification for the three pseudo-observations y13 y14 and y15 (ie for AD and G) as can be seen inFigure 10 and Table 15 In that case we also observe that the probabilities of correct detection PCD ofthe soft constraints A and D (ie Cluster 5) were approximately 13 for σc = 10mm 16 for σc = 1mmand 24 for σc = 01mm larger than the scenario of the network subject to two soft constraints Table15 reveals that the advantage of having three soft constraints instead of two constraints is that theconstraints become identifiable in the presence of an outlier The behaviour of the probabilities ofcorrect detection (PCD) correct identification (PCI) and wrong exclusion (PWE) was similar to the caseof the two soft constraints

Furthermore the larger the relaxation of the constraints the smaller the residuals correlationbetween the measurements and the soft constraints and the larger the residuals correlation amongthe measurements This effect can be verified in Tables 6 7 and 8 for two soft constraints as well as inTables 11 12 and 13 for three soft constraints

In general we observe that the case of two soft constraints for σc = 01mm was comparablewith two hard constraints (see eg Table 5 and Table 10) in terms of the probability levels associatedwith IDS for the measurements (ie clusters 1 2 3 and 4) In the same way for the case of twosoft constraints with σc = 1mm or σc = 10mm the probabilities levels were similar to the one hardconstraint for that measurements with the benefit of two soft constraints having reliability in terms ofoutlier identification for the Cluster 1 Finally the three soft constraints with σc = 1mm and σc = 10mmwere comparable to the two soft constraints for that scenario of constraints relaxation wheres thethree soft constraints for σc = 01mm showed similar outcomes with three hard constraints for themeasurements (see eg Table 5 and Table 15) In that case however an advantage of the three softconstraints on the three hard constraints is the possibility of analysing the sensitivity of the constraintsIt is important to emphasised that the stochastic models of the measurements and constraints wereassumed well-known and defined for the analyzes performed here

6 Conclusions

In general the probability of correct identification (PCI) for the case of hard constraints is largerthan soft constraints It can be verified in Figure 4 Figure 6 and Figure 10 Therefore hard constraintsshould be used in the stage of pre-processing data for the purpose of identifying and removing possibleoutlying measurements In that process one should opt to set out the redundant hard constraints atpoints in the network where the smallest connections exist After identifying and removing possibleoutliers the soft constraints should be employed to propagate the uncertainties of the constraints(pseudobservations) to the model parameters during the process of least-squares estimation Thisrecommendation is valid for outlier detection and identification purpose

We highlight the main findings of this research as follows

bull Under a system of a high local redundancy ri gt 05 and low residuals correlation (ρwi wj lt 05) ifone increase the family-wise error rate (FWE) of the test statistic the performance of the procedurewill be improved for both scenarios of hard constraints and soft constraintsbull The larger the relaxation of the constraints the larger is the effect of the residuals correlation

(ρwi wj ) on the success rate of outlier identification (PCI) Consequently the higher the sensitivityindicator for outlier identification (MIB) and therefore more difficult it becomes to identify anoutlier

bull Under a scenario of soft constraints one should set out at least three soft constraints in order toidentify an outlier in the constraints

In other types of analysis for example deformation analysis of geodetic networks one shouldformulated the constraints as pseudo-observations with σc gt 0 in an adjustment model of system

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

36 of 40

equations In other words the relaxation of constraints makes it possible to estimate deformationeffects that are unmodelled in the deterministic model matrix for example deformations which have aspatial and temporal variations In that case the deformation can be modelled by covariance functionsthat determine the cofactors between constraints [28] This should be applied to control points locatedin the structure but not to control points located outside the structure The control points locatedoutside the structure should be hard constraints and have their stability verified by an independentprocess in order to analyse the deformation of the structure So one will be also able to analyse thesensitivity in terms of minimal detectable deformations andor minimal identifiable deformations [23]Future researches will be addressed to sensitivity analysis for deformation monitoring networks

Author Contributions Conceptualization VFR MTM and IK Data curation VFR MTM and IK Formalanalysis VFR and IK Funding acquisition MTM MRV and LGdSJ Investigation VFR MethodologyVFR and MTM Project administration MTM IK MRV and LGdSJ Software VFR and LGdSJSupervision MTM IK and MRV Validation VFR and MRV Writing ndash original draft VFR Writing ndashreview amp editing VFR MTM IK and MRV All authors have read and agreed to the published version of themanuscript

Funding This research was funded by the CNPqmdashConselho Nacional de Desenvolvimento Cientiacutefico eTecnoloacutegicomdashBrasil (proc no 1035872019-5) This research and the APC were also funded by PETROBRAS(Grant Number 201800545-0)

Acknowledgments In this section you can acknowledge any support given which is not covered by the authorcontribution or funding sections This may include administrative and technical support or donations in kind(eg materials used for experiments)

Conflicts of Interest The authors declare no conflict of interest

References

1 Lehmann R Neitzel F Testing the compatibility of constraints for parameters of a geodetic adjustmentmodel Journal of Geodesy 2013 87 555ndash566 doi101007s00190-013-0627-2

2 Fang X Weighted total least-squares with constraints a universal formula for geodetic symmetricaltransformations Journal of Geodesy 2015 89 459ndash469 doi101007s00190-015-0790-8

3 Amiri-Simkooei AR Weighted Total Least Squares with Singular Covariance Matrices Subject to Weightedand Hard Constraints J Surv Eng 2017 143 04017018

4 Sakic P Ballu V Royer JY A Multi-Observation Least-Squares Inversion for GNSS-Acoustic SeafloorPositioning Remote Sensing 2020 12 doi103390rs12030448

5 Odijk D Stochastic modelling of the ionosphere for fast GPS ambiguity resolution Geodesy Beyond 2000Schwarz KP Ed Springer Berlin Heidelberg Berlin Heidelberg 2000 pp 387ndash392

6 de Oliveira PS Morel L Fund F Legros R Monico JFG Durand S Durand F Modelingtropospheric wet delays with dense and sparse network configurations for PPP-RTK GPS Solutions 201721 237ndash250 doi101007s10291-016-0518-0

7 Chen B Wu L Dai W Luo X Xu Y A new parameterized approach for ionospheric tomography GPSSolutions 2019 23 96 doi101007s10291-019-0893-4

8 Ahmed E Ahmed S Mohamed D Mostafa R jag 2019 Vol 13 chapter Vertical ionospheric delayestimation for single-receiver operation p 81 2 doi101515jag-2018-0041

9 Wang A Chen J Zhang Y Meng L Wang J Performance of Selected Ionospheric Models inMulti-Global Navigation Satellite System Single-Frequency Positioning over China Remote Sensing 201911 doi103390rs11172070

10 Deo M El-Mowafy A Comparison of advanced troposphere models for aiding reductionof PPP convergence time in Australia Journal of Spatial Science 2019 64 381ndash403doi1010801449859620181472046

11 Dai X Lou Y Dai Z Hu C Peng Y Qiao J Shi C Precise Orbit Determination for GNSS ManeuveringSatellite with the Constraint of a Predicted Clock Remote Sensing 2019 11 doi103390rs11161949

12 Courant R Hilbert D Methods of mathematical physics Vol 2 Vol 2 Wiley [New York] 196213 Grafarend EW Optimization of Geodetic Networks The Canadian Surveyor 1974 28 716ndash723

doi101139tcs-1974-0120

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

37 of 40

14 Teunissen P Zero Order Design Generalized Inverses Adjustment the Datum Problem andS-Transformations Optimization and Design of Geodetic Networks Grafarend EW Sansograve F EdsSpringer Berlin Heidelberg Berlin Heidelberg 1985 pp 11ndash55

15 Dermanis A Free network solutions with the DLT method ISPRS Journal of Photogrammetry and RemoteSensing 1994 49 2 ndash 12 doihttpsdoiorg1010160924-2716(94)90061-2

16 Kotsakis C Reference frame stability and nonlinear distortion in minimum-constrained networkadjustment Journal of Geodesy 2012 86 755ndash774 doi101007s00190-012-0555-6

17 Kotsakis C Datum Definition and Minimal Constraints In Encyclopedia of Geodesy Grafarend E EdSpringer International Publishing Cham 2018 pp 1ndash6 doi101007978-3-319-02370-0_157-1

18 Matsuoka MT Rofatto VF Klein I Roberto Veronez M da Silveira LG Neto JBS Alves ACRControl Points Selection Based on Maximum External Reliability for Designing Geodetic Networks AppliedSciences 2020 10 doi103390app10020687

19 Baarda W S-transformations and criterion matrices Publ on geodesy New Series 1973 520 Schaffrin B Aspects of Network Design Optimization and Design of Geodetic Networks Grafarend

EW Sansograve F Eds Springer Berlin Heidelberg Berlin Heidelberg 1985 pp 548ndash59721 Xu PB A general solution in geodetic nonlinear rank-defect models 199722 Altamimi Z Dermanis A The Choice of Reference System in ITRF Formulation VII Hotine-Marussi

Symposium on Mathematical Geodesy Sneeuw N Novaacutek P Crespi M Sansograve F Eds Springer BerlinHeidelberg Berlin Heidelberg 2012 pp 329ndash334

23 Velsink H On the deformation analysis of point fields J Geod 2015 89 1071ndash1087doi101007s00190-015-0835-z

24 Velsink H Extendable linearised adjustment model for deformation analysis Survey Review 201547 397ndash410 doi1011791752270614Y0000000140

25 Hiddo V jag 2016 Vol 10 chapter Time Series Analysis of 3D Coordinates Using NonstochasticObservations p 5 1 doi101515jag-2015-0027

26 Rao CR Markoffrsquos Theorem with Linear Restrictions on Parameters SankhyAuml The Indian Journal ofStatistics (1933-1960) 1945 7 16ndash19

27 Velsink H Testing Methods for Adjustment Models with Constraints Journal of Surveying Engineering2018 144 04018009 doi101061(ASCE)SU1943-54280000260

28 Hiddo V Testing deformation hypotheses by constraints on a time series of geodetic observations J ApplGeod 2017 12 77ndash93 doi101515jag-2017-0028

29 Baarda W A testing procedure for use in geodetic networks Publ on geodesy New Series 1968 230 Teunissen P First and second moments of non-linear least-squares estimators Bull Geodesique (Journal of

Geodesy) 1989 63 253ndash262 doihttpsdoiorg101007BF0252047531 Teunissen P Testing Theory an introduction 2 ed Delft University Press 200632 Amiri-Simkooei A Least-squares variance component estimation theory and GPS applications Phd

thesis Delft University of Technology 200733 Koch KR Parameter estimation and hypothesis testing in linear models 2 ed Springer 199934 Tao H Feng H Xu L Miao M Long H Yue J Li Z Yang G Yang X Fan L Estimation

of Crop Growth Parameters Using UAV-Based Hyperspectral Remote Sensing Data Sensors 2020 20doi103390s20051296

35 Han M Wang Q Wen Y He M He X The Application of Robust Least Squares Method inFrequency Lock Loop Fusion for Global Navigation Satellite System Receivers Sensors 2020 20doi103390s20041224

36 Wei C Estimation for the Discretely Observed CoxacircldquoIngersollacircldquoRoss Model Driven by Small SymmetricalStable Noises Symmetry 2020 12 doi103390sym12030327

37 Sakic P Ballu V Royer JY A Multi-Observation Least-Squares Inversion for GNSS-Acoustic SeafloorPositioning Remote Sensing 2020 12 doi103390rs12030448

38 Livadiotis G General Fitting Methods Based on Lq Norms and their Optimization Stats 2020 3 16ndash31doi103390stats3010002

39 Farooq SZ Yang D Ada ENJ A Cycle Slip Detection Framework for Reliable Single Frequency RTKPositioning Sensors 2020 20 doi103390s20010304

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

38 of 40

40 Araveeporn A Comparing Parameter Estimation of Random Coefficient Autoregressive Model byFrequentist Method Mathematics 2020 8 doi103390math8010062

41 Zhang C Peng T Zhou J Ji J Wang X An Improved Autoencoder and Partial Least SquaresRegression-Based Extreme Learning Machine Model for Pump Turbine Characteristics Applied Sciences2019 9 doi103390app9193987

42 Ji J Yang M Jiang L He J Teng Z Liu Y Song H Output-Only Parameters Identification ofEarthquake-Excited Building Structures with Least Squares and Input Modification Process AppliedSciences 2019 9 doi103390app9040696

43 Kargoll B On the theory and application of model misspecification tests in geodesy Doctoral thesisUniversity of Bonn Landwirtschaftliche Fakultaumlt German Bonn 2007

44 Lehmann R On the formulation of the alternative hypothesis for geodetic outlier detection J Geod 201387 373ndash386 doi101007s00190-012-0607-y

45 Rofatto VF Matsuoka MT Klein I Roberto Veronez M da Silveira LG A Monte Carlo-Based OutlierDiagnosis Method for Sensitivity Analysis Remote Sensing 2020 12 doi103390rs12050860

46 Lehmann R Improved critical values for extreme normalized and studentized residuals in GaussndashMarkovmodels J Geod 2012 86 1137ndash1146 doi101007s00190-012-0569-0

47 Rofatto VF Matsuoka MT Klein I Veronez MR Bonimani ML Lehmann R A half-century ofBaardarsquos concept of reliability a review new perspectives and applications Surv Rev 2018 0 1ndash17[httpsdoiorg1010800039626520181548118] doi1010800039626520181548118

48 Zaminpardaz S Teunissen P DIA-datasnooping and identifiability J Geod 2019 93 85ndash101doi101007s00190-018-1141-3

49 Kok JJ States U On data snooping and multiple outlier testing [microform] Johan J Kok US Dept ofCommerce National Oceanic and Atmospheric Administration National Ocean Service Charting andGeodetic Services For sale by the National Geodetic Information Center NOAA Rockville Md 1984 p61 p

50 Knight NL Wang J Rizos C Generalised measures of reliability for multiple outliers Journal of Geodesy2010 84 625ndash635 doi101007s00190-010-0392-4

51 Gui Q Li X Gong Y Li B Li G A Bayesian unmasking method for locating multiplegross errors based on posterior probabilities of classification variables J Geod 2011 85 191ndash203doi101007s00190-010-0429-8

52 Klein I Matsuoka MT Guzatto MP Nievinski FG An approach to identify multipleoutliers based on sequential likelihood ratio tests Surv Rev 2017 49 449ndash457[httpsdoiorg1010800039626520161212970] doi1010800039626520161212970

53 Hawkins DM Identification of Outliers 1 ed Springer Netherlands 1980 httpsdoiorg101007978-94-015-3994-4

54 Baarda W Statistical concepts in geodesy Publ on geodesy New Series 1967 255 Imparato D Teunissen P Tiberius C Minimal Detectable and Identifiable Biases for quality

control Surv Rev 2019 51 289ndash299 [httpsdoiorg1010800039626520181437947]doi1010800039626520181437947

56 Teunissen PJG Distributional theory for the DIA method Journal of Geodesy 2018 92 59ndash80doi101007s00190-017-1045-7

57 Foumlrstner W Reliability and discernability of extended Gauss-Markov models Seminar on mathematicalmodels of GeodeticPhotogrammetric Point Determination with Regard to Outliers and Systematic Errors 1983 Vol Series A pp 79ndash103

58 Wang J Knight NL New Outlier Separability Test and Its Application in GNSS Positioning 201259 Lehmann R Loumlsler M Multiple Outlier Detection Hypothesis Tests versus Model Selection by

Information Criteria J Surv Eng 2016 142 0401601760 Teunissen PJG Imparato D Tiberius CCJM Does RAIM with Correct Exclusion Produce Unbiased

Positions Sensors 2017 17 doi103390s1707150861 Lehmann R Loumlsler M Congruence analysis of geodetic networks ndash hypothesis tests versus model

selection by information criteria J Appl Geod 2017 11 271ndash283 doi101515jag-2016-0049

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

39 of 40

62 Rofatto V Matsuoka M Klein I DESIGN OF GEODETIC NETWORKS BASED ON OUTLIERIDENTIFICATION CRITERIA AN EXAMPLE APPLIED TO THE LEVELING NETWORK Bull Geod Sci2018 24 152ndash170 httpdxdoiorg101590s1982-21702018000200011

63 Nguyen VK Renault Milocco R Environment Monitoring for Anomaly Detection System UsingSmartphones Sensors 2019 19 doi103390s19183834

64 Nie Y Yang L Shen Y Specific Direction-Based Outlier Detection Approach for GNSS Vector NetworksSensors 2019 19 doi103390s19081836

65 Klein I Matsuoka MT Guzatto MP Nievinski FG Veronez MR Rofatto VF A newrelationship between the quality criteria for geodetic networks Journal of Geodesy 2019 93 529ndash544doi101007s00190-018-1181-8

66 Mierlo JV Statistical Analysis of Geodetic Measurements for the Investigation of Crustal Movements InRecent Crustal Movements 1977 Whitten C Green R Meade B Eds Elsevier 1979 Vol 13 Developmentsin Geotectonics pp 457 ndash 467 doihttpsdoiorg101016B978-0-444-41783-150072-6

67 van der Marel H Roumlsters AJM Statistical Testing and Quality Analysis in 3-D Networks (part II)Application to GPS Global Positioning System An Overview Bock Y Leppard N Eds Springer NewYork New York NY 1990 pp 290ndash297

68 Yang L Wang J Knight NL Shen Y Outlier separability analysis with a multiple alternative hypothesestest J Geod 2013 87 591ndash604 doi101007s00190-013-0629-0

69 Proacuteszynski W Revisiting Baardarsquos concept of minimal detectable bias with regard to outlier identifiabilityJ Geod 2015 89 993ndash1003 doi101007s00190-015-0828-y

70 Romano JP Wolf M Multiple Testing of One-Sided Hypotheses Combining Bonferroni and the BootstrapPredictive Econometrics and Big Data Kreinovich V Sriboonchitta S Chakpitak N Eds SpringerInternational Publishing Cham 2018 pp 78ndash94

71 Teunissen P An Integrity and Quality Control Procedure for use in Multi Sensor Integration 199072 Aydin C Demirel H Computation of Baardarsquos lower bound of the non-centrality parameter J Geod

2004 78 437ndash441 doi101007s00190-004-0406-173 Aydin C Power of Global Test in Deformation Analysis J Surv Eng 2012 138 51ndash56 https

doiorg101061(ASCE)SU1943-5428000006474 Bonferroni C Teoria statistica delle classi e calcolo delle probabilitagrave Pubblicazioni del R Istituto superiore di

scienze economiche e commerciali di Firenze Libreria internazionale Seeber 193675 Box GEP Muller ME A Note on the Generation of Random Normal Deviates The Annals of Mathematical

Statistics 1958 29 610ndash611 httpsdoi101214aoms117770664576 Lemeshko BY Lemeshko SB Extending the Application of Grubbs-Type Tests in Rejecting Anomalous

Measurements Measurement Techniques 2005 48 536ndash547 doi101007s11018-005-0179-977 Lehmann R Scheffler T Monte Carlo based data snooping with application to a geodetic network J

Appl Geod 2011 5 123ndash134 httpsdoiorg101515JAG201101478 Matsumoto M Nishimura T Mersenne twister A 623-dimensionally equidistributed uniform

pseudo-random number generator ACM Transactions on Modeling and Computer Simulation 1998 8 3ndash30httpsdlacmorgcitationcfmid=272991

79 Ihde J Saacutenchez L Barzaghi R Drewes H Foerste C Gruber T Liebsch G Marti U Pail R SiderisM Definition and Proposed Realization of the International Height Reference System (IHRS) Surveys inGeophysics 2017 38 549ndash570 doi101007s10712-017-9409-3

80 Lehmann R Observation error model selection by information criteria vs normality testing StudiaGeophysica et Geodaetica 2015 59 489ndash504 doi101007s11200-015-0725-0

81 Algarni DA Ali AE Heighting and Distance Accuracy with Electronic Digital Levels Journal of King SaudUniversity - Engineering Sciences 1998 10 229 ndash 239 doihttpsdoiorg101016S1018-3639(18)30698-6

82 Gemin ARS Matos AS Faggion PLAs APPLICATION OF CALIBRATION CERTIFICATE OFDIGITAL LEVELING SYSTEMS IN THE MONITORING OF STRUCTURES A CASE STUDY AT THEGOVERNADOR JOSAtilde RICHA HYDROELECTRIC POWER PLANT - PR Boletim de CiAtildeGeodAtilde 201824 235 ndash 249

83 Takalo M Rouhiainen P Development of a System Calibration Comparator for Digital Levels in FinlandNordic Journal of Surveying and Real Estate Research 1 1

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

40 of 40

84 Wiedemann W Wagner A Wunderlich T Using IATS to Read and Analyze Digital Levelling Staffs SIG2016 Paar R Marendic A Zrinjski M Eds Croatian Geodetic Society 2016 pp 515ndash526

85 PJG T The Geometry of Geodetic Inverse Linear Mapping and Non-linear Adjustment Publications onGeodesy New Series 1985 8

86 Rao CR Mitra SK Generalized inverse of a matrix and its applications 197287 Hekimoglu S Erenoglu RC Sanli DU Erdogan B Detecting Configuration Weaknesses in Geodetic

Networks Survey Review 2011 43 713ndash730 doi101179003962611X13117748892632

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

Page 27: Effects of Hard and Soft Equality Constraints on

27 of 40

Table 12 Correlation matrix of w-test statistics for the levelling network with three soft constraintswith σc = 01mm in Figure 3(e)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14 wy15

wy1 10 03 01 -01 -01 01 -04 -01 -01 -01 -03 -01 07 -01 -04wy2 10 03 -01 01 -01 03 -03 -01 01 03 -03 03 -03 00wy3 10 01 -01 -01 01 04 01 01 01 03 01 -07 04wy4 10 03 01 -01 -01 -04 -01 01 03 -01 07 -04wy5 10 03 -01 01 03 -03 03 -03 -03 03 00wy6 10 01 01 01 04 -03 -01 -07 01 04wy7 10 -01 -01 -01 -03 -01 -04 -01 04wy8 10 -01 -01 -01 -03 -01 -04 04wy9 10 -01 01 03 -01 -04 04wy10 10 03 01 -04 -01 04wy11 10 -01 00 00 00wy12 10 00 00 00wy13 10 -02 -06wy14 10 -06wy15 10

Table 13 Correlation matrix of w-test statistics for the levelling network with three soft constraintswith σc = 1mm in Figure 3(e)

wy1 wy2 wy3 wy4 wy5 wy6 wy7 wy8 wy9 wy10 wy11 wy12 wy13 wy14 wy15

wy1 10 03 01 00 01 06 -03 00 00 01 -04 -01 05 -02 -02wy2 10 03 01 03 01 04 -04 -01 01 03 -03 02 -02 00wy3 10 06 01 00 00 03 -01 00 01 04 02 -05 02wy4 10 03 01 00 01 -03 00 01 04 -02 05 -02wy5 10 03 -01 01 04 -04 03 -03 -02 02 00wy6 10 -01 00 00 03 -04 -01 -05 02 02wy7 10 -03 -02 -03 -04 -01 -02 00 02wy8 10 -03 -02 -01 -04 00 -02 02wy9 10 -03 01 04 00 -02 02wy10 10 04 01 -02 00 02wy11 10 -01 00 00 00wy12 10 00 00 00wy13 10 -04 -05wy14 10 -05wy15 10

Table 14 Local redundancy (ri) standard-deviation of the LS-estimated outlier σnablai (mm) and themaximum absolute correlation (maxρwi wj

) for each scenario of three soft constraints

σc = 01mm σc = 1mm σc = 10mmMeasurement ri σnablai maxρwi wj

ri σnablai maxρwi wjri σnablai maxρwi wj

y1 0702 1194 0660 0502 1411 0577 0398 1586 0992y2 0582 1311 0326 0533 1369 0412 0501 1413 0470y3 0702 1194 0660 0502 1411 0577 0398 1586 0992y4 0702 1194 0660 0502 1411 0577 0398 1586 0992y5 0582 1311 0326 0533 1369 0412 0501 1413 0470y6 0702 1194 0660 0502 1411 0577 0398 1586 0992y7 0704 1192 0415 0602 1289 0412 0563 1333 0470y8 0704 1192 0415 0602 1289 0412 0563 1333 0470y9 0704 1192 0415 0602 1289 0412 0563 1333 0470y10 0704 1192 0415 0602 1289 0412 0563 1333 0470y11 0583 1309 0326 0583 1309 0385 0583 1309 0433y12 0583 1309 0326 0583 1309 0385 0583 1309 0433y13 0012 0904 0660 0425 1534 0542 0663 12283 0501y14 0012 0904 0660 0425 1534 0542 0663 12283 0501y15 0019 0718 063 05 1414 0542 0665 12268 0501

The probabilities of correct identification (PCI) and correct detection PCD for the measurements(Cluster 1 to Cluster 4) subject to the scenarios of three soft constraints (heights A D and G) aredisplayed in Figure 10

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

28 of 40

(c) (d)

(e) (f)

(g) (h)

(a) (b)

Figure 10 Probability of correct identification (PCI) and Probability of correct detection (PCD) for themeasurements subject to the scenarios of three soft constraints for αprime = 0001 Cluster 1(ab) Cluster2(cd) Cluster 3(ef) and Cluster 4(gh)

The probabilities of correct identification (PCI) and correct detection (PCD) in Figure 8 werecomputed for the clusters based on Table 14 as follows

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

29 of 40

bull Cluster 1 y1 y3 y4 and y6bull Cluster 2 y2 and y5bull Cluster 3 y7 y8 y9 and y10bull Cluster 4 y11 and y12bull Cluster 5 y13 and y14bull Cluster 6 y15

Figure 11 shows the probabilities of correct identification PCI and correct detection PCD for thethree soft constraints ie for Cluster 5 (heights A and D) and Cluster 6 (height G) in Figure 3(e)

(c) (d)

(a) (b)

Figure 11 Probability of correct identification (PCI) and Probability of correct detection (PCD) for thethree constraints and for αprime = 0001 Cluster 5(ab) and Cluster 6(cd)

The probability of wrong exclusion PWE for the measurements (Cluster 1 to Cluster 4) subject tothe scenarios of three soft constraints (heights A D and G) are displayed in Figure 12 Figure 13 givesthe the probability of wrong exclusion PWE for that three constraints (ie heights A D and G)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

30 of 40

(c) (d)

(a) (b)

Figure 12 Probability of wrong exclusion PWE for the measurements subject to the scenarios of threesoft constraints and for αprime = 0001 Cluster 1(a) Cluster 2(b) Cluster 3(c) and Cluster 4(d)

(a) (b)

Figure 13 Probability of wrong exclusion (PWE) for the three constraints and for αprime = 0001 Cluster5(a) and Cluster 6(b)

The over-identification cases (Pover+ and Poverminus) and the statistical overlap (Pol) were alsopractically null for that case of three soft constraints

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

31 of 40

The sensitivity indicators (MDB and MIB) for each scenario of three soft constraints are displayedin Table 15

Table 15 MDB and MIB for the case of three soft constraints based on αprime = 0001 and PCD = PCI = 08

σc = 10mm σc = 1mm σc = 01mmCluster MDB (σ) MIB (σ) MDB (σ) MIB (σ) MDB (σ) MIB (σ)

1 75 22 68 69 58 592 68 69 66 67 64 643 64 64 63 63 58 584 63 63 63 63 63 635 59 60 74 75 435 456 59 59 69 69 346 355

5 Discussion

We start from the scenario of one hard constraint in Figure 3(a) Table 4 reveal that the maximumcorrelation between w-test statistics for the measurements constituting Cluster 1 is exactly equal to100 (ie maxρwi wj

= 100) This means that the measurements belonging to Cluster 1 are those thatare connected with unknown heights whose connections are limited to only two The both unknownheights A and D are tied only to two measurements (ie y1 and y6 linked to A and y3 and y4 linked toD) therefore if an outlier occurred in one of these measurements we would only be able to analyse theconsistency between them but we would not be able to distinguish which of them was contaminatedby an outlier This means that we would only be able to detect because the w-test statistics could belarger than a critical value k however in that case the values of w-test statistics would be the same andtherefore we would not have only one unique maximum w-test statistics but actually would have fourmaximum w-test statistics In other words the equation systems associated with the measurements ofCluster 1 are linearly dependent [87] Therefore there is no reliability in terms of outlier identificationfor the Cluster 1 as can be seen in Figure 3(a)

From Figure 3(b) we can note that there is reliability in terms of outlier detection for Cluster 1and it is caused by overlapping w-test statistics The probability of statistics overlap Pol for Cluster 1in this scenario of minimally constrained network is displayed in Figure 14

Figure 14 Probability of correct detection PCD and statistical overlap (Pol) for the Cluster 1 subject toone hard constraint and for αprime = 0001

The problem of not having more connections (ie more measurements) for the unknown heightsA and D in the case of one hard constraint with G fixed is overcome when these heights (A and D) aretaken as hard constraints in 3(b) or when the heights A D and G are hard constraints in figure 3(c)Figure 3(a b) show that the measurements of Cluster 1 are now able to identify an outlier when twohard constraints (A and D fixed) are in play It is also verified to the case of three hard constraints (A Dand G fixes) in Figure 3(e f) ie there is reliability in terms of both outlier detection and identificationfor these measurements in those conditions

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

32 of 40

One can argue that the more constraints the larger the probability of correct identification (PCI)and correct detection (PCD) This claim is not entirely correct This may be true in case where all pointson the network have the same number of connections However in the example given if both heightsB and G were selected as hard constraints the measurements of the Cluster 1 (ie y1 y6 y3 and y4)would still have their least-squares residuals fully correlated (ie ρwy1 wy6

= 100 and ρwy3 wy4= 100)

and therefore there would be no reliability in terms of outlier identification for these measurementsFrom Table 5 we observe different behaviour for the clusters as follows

bull Cluster 1 there was no MIB for the case of having only one single hard constraint whereasMDB = MIB for the other cases However both MDB and MIB decrease significantly with theincrease in the number of hard constraints

bull Cluster 2 MDB slightly smaller than MIB Both MDB and MIB were practically the same for thecase of having two or three hard constraints

bull Cluster 3 MDB = MIB for all cases of hard constraints however both MDB and MIB decreasesignificantly with the increase in the number of hard constraints

bull Cluster 4 MDB e MIB were equal for all cases

In terms of outlier detection and identification therefore Cluster 1 was more sensitive toconstraints Cluster 3 relatively sensitive to constraints whereas the Cluster 4 completely insensitiveto constraints and Cluster 2 relatively insensitive to constraints This is also can be seen in Figure 4The reason for this is that the local redundancy (ri) of the Cluster 1 increased with the increase of thenumber of hard constraints whereas the Cluster 4 have remained the same as can be seen in Table 4

Leaving aside the cases of statistical overlap (Pol) the network presents low least-squares residualscorrelation (ρwi wj lt 05) and high local redundancy (ri gt 05) Because of this the probabilities ofwrong exclusion (PWE) were less than 1 as can be seen in Figure 5 The over-identification cases(Pover+ and Poverminus) were practically null Consequently PCI asymp PCD Due of this fact the family-wiseerror rate (αprime) should be increased in order to have more success rate in the outlier detection andidentification [45]

From Figure 15 we observe that increasing the αprime increases the both probabilities of correctidentification (PCI) and correct detection (PCD) for outlier magnitude from 5σ to 6σ in the case ofthree hard constraints and from 5σ to 68σ in the case of two hard constraints Although the rates ofPover+ and PWE also increase they are not significant when compared to the improvement of correctidentification (PCI) and detection (PCD) This same analysis can be done for the other clusters

In terms of soft constraints for the cases of two constraints in Figure 3(d) we observe from Table 9that the larger the relaxation of the constraint (ie the larger the standard-deviation of the constraintσc) the larger the residuals correlation (ρwi wj ) and the standard-deviation of the outlier σnablai and thesmaller the local redundancy (ri) Consequently the probabilities of correct identification (PCI) anddetection (PCD) get smaller and smaller with the relaxation of the constraints whereas the probabilityof wrong exclusion gets larger (PWE) This can be more clearly verified in Figure 6(a b) and Figure8(a) for Cluster 1 whose measurements are connected with the constraints A and D (ie y13 and y14 inTable 9 respectively)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

33 of 40

(a) (b)

(c) (d)

Figure 15 Probabilities of correct identification PCI (a) correct detection PCD (b) over-identificationpositive Pover+ (c) and wrong exclusion PWE for the Cluster 1 subject to two and three hard constraintsand for αprime = 0001 and αprime = 01

Note from Figure 8(a) that the probability of wrong exclusion (PWE) increases as the magnitude ofthe outlier (nablai) increases However this is true up to a certain limit of outlier magnitude The effect ofresiduals correlation ρwi wj on the rates of wrong exclusion (PWE) and correct identification (PCI) tendsto decrease with the increase in the magnitude of the outlier nablai This effect is more clearly verified forCluster 1 in case where the precision of the constraints are ten times worse than the measurementsσc = 10σ = 10mm

Note from Figure 6 that identifying an outlier in the Cluster 1 (ie y1 y3 y4 and y6) whenσc = 10mm is more difficult than the other clusters This is due to the fact the Cluster 1 has a higherresiduals correlation ρwi wj = 0994 than other clusters In general therefore we observe that the largerthe relaxation of the constraints the larger is the effect of the correlation ρwi wj on the success rate ofoutlier identification (PCI) Consequently the higher the sensitivity indicator for outlier identification(MIB) Table 10 reveals that the ratio between MIB and MDB for the Cluster 1 and for the scenariowhere the standard-deviations of that two soft constraints are σc = 10mm is MIBMDB=2575=33On the other hand the relationship between MIB and MDB is practically one (ie MIBMDB=10) forthe others scenarios

If the FWE rate (αprime) were increased for the case where the two soft constraints of σc = 10mmare in play we would not have great advantages for Cluster 1 due to its high residuals correlation(ρwi wj = 994) From Figure 16 we can observe that the probability of correct identification (PCI)for outlier magnitudes from 5σ to 8σ is effectively larger to a user-defined αprime = 01 than αprime = 0001However the success rate is still less than 80 ie PCI lt 08 Note for example the correct

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

34 of 40

identification rate is PCI = 56 for an outlier magnitude of nablai = 8σ and αprime = 01 For αprime = 01 theMIB = 335σ = 335mm whereas for αprime = 0001 is MIB = 25σ = 25mm Therefore in that case theMIB for PCI = 08(80) and αprime = 01 would be 34 larger than user-defined αprime = 0001

Figure 16 Probabilities of correct identification (PCI) for the Cluster 1 subject to two soft constraints (2sc) A and D for αprime = 0001 and αprime = 01

The soft constraints A and D were grouped in the Cluster 5 (ie A and D were treated aspseudo-observations in the model) There is no reliability in terms of outlier identification for theconstraints because the residual correlation between them is ρwi wj = 100 as can be seen in Table 9for y13 and y14 However these soft constraints are able to detect an outlier In that case the probabilityof correct detection PCD in Figure 7 is mainly caused by the statistical overlap Pol as can be seen in theexample of the case of σc = 10mm in Figure 17 From Table 10 we observe that the larger the relaxationof the constraints the larger the MDB Note that the values of MDB are given in σ and therefore theMDB for σc = 10mm is larger than σc = 1mm and σc = 01mm ie we had the following inequalityMDB = 68σc = 68times 10mm = 68mm gt MDB = 88σc = 88times 1mm = 88mm gt MDB = 57σc =

57times 01mm = 57mm In that case if the FWE (αprime) were increased the rate of outlier detection by theCluster 4 (ie by the soft constraints) would increase

Figure 17 Probabilities of correct detection PCD and statistical overlap Pol for the two soft constraintsA and D (Cluster 5) with σc = 10mm and for αprime = 0001

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

35 of 40

Similar effects of the relaxation of the constraints on the performance of the IDS in case of two softconstraints are verified in case of three soft constraints as can be seen in the Figures 10 11 12 and 13

In case of having three soft constraints in Figure 3(e) there is reliability in terms of outlieridentification for the three pseudo-observations y13 y14 and y15 (ie for AD and G) as can be seen inFigure 10 and Table 15 In that case we also observe that the probabilities of correct detection PCD ofthe soft constraints A and D (ie Cluster 5) were approximately 13 for σc = 10mm 16 for σc = 1mmand 24 for σc = 01mm larger than the scenario of the network subject to two soft constraints Table15 reveals that the advantage of having three soft constraints instead of two constraints is that theconstraints become identifiable in the presence of an outlier The behaviour of the probabilities ofcorrect detection (PCD) correct identification (PCI) and wrong exclusion (PWE) was similar to the caseof the two soft constraints

Furthermore the larger the relaxation of the constraints the smaller the residuals correlationbetween the measurements and the soft constraints and the larger the residuals correlation amongthe measurements This effect can be verified in Tables 6 7 and 8 for two soft constraints as well as inTables 11 12 and 13 for three soft constraints

In general we observe that the case of two soft constraints for σc = 01mm was comparablewith two hard constraints (see eg Table 5 and Table 10) in terms of the probability levels associatedwith IDS for the measurements (ie clusters 1 2 3 and 4) In the same way for the case of twosoft constraints with σc = 1mm or σc = 10mm the probabilities levels were similar to the one hardconstraint for that measurements with the benefit of two soft constraints having reliability in terms ofoutlier identification for the Cluster 1 Finally the three soft constraints with σc = 1mm and σc = 10mmwere comparable to the two soft constraints for that scenario of constraints relaxation wheres thethree soft constraints for σc = 01mm showed similar outcomes with three hard constraints for themeasurements (see eg Table 5 and Table 15) In that case however an advantage of the three softconstraints on the three hard constraints is the possibility of analysing the sensitivity of the constraintsIt is important to emphasised that the stochastic models of the measurements and constraints wereassumed well-known and defined for the analyzes performed here

6 Conclusions

In general the probability of correct identification (PCI) for the case of hard constraints is largerthan soft constraints It can be verified in Figure 4 Figure 6 and Figure 10 Therefore hard constraintsshould be used in the stage of pre-processing data for the purpose of identifying and removing possibleoutlying measurements In that process one should opt to set out the redundant hard constraints atpoints in the network where the smallest connections exist After identifying and removing possibleoutliers the soft constraints should be employed to propagate the uncertainties of the constraints(pseudobservations) to the model parameters during the process of least-squares estimation Thisrecommendation is valid for outlier detection and identification purpose

We highlight the main findings of this research as follows

bull Under a system of a high local redundancy ri gt 05 and low residuals correlation (ρwi wj lt 05) ifone increase the family-wise error rate (FWE) of the test statistic the performance of the procedurewill be improved for both scenarios of hard constraints and soft constraintsbull The larger the relaxation of the constraints the larger is the effect of the residuals correlation

(ρwi wj ) on the success rate of outlier identification (PCI) Consequently the higher the sensitivityindicator for outlier identification (MIB) and therefore more difficult it becomes to identify anoutlier

bull Under a scenario of soft constraints one should set out at least three soft constraints in order toidentify an outlier in the constraints

In other types of analysis for example deformation analysis of geodetic networks one shouldformulated the constraints as pseudo-observations with σc gt 0 in an adjustment model of system

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

36 of 40

equations In other words the relaxation of constraints makes it possible to estimate deformationeffects that are unmodelled in the deterministic model matrix for example deformations which have aspatial and temporal variations In that case the deformation can be modelled by covariance functionsthat determine the cofactors between constraints [28] This should be applied to control points locatedin the structure but not to control points located outside the structure The control points locatedoutside the structure should be hard constraints and have their stability verified by an independentprocess in order to analyse the deformation of the structure So one will be also able to analyse thesensitivity in terms of minimal detectable deformations andor minimal identifiable deformations [23]Future researches will be addressed to sensitivity analysis for deformation monitoring networks

Author Contributions Conceptualization VFR MTM and IK Data curation VFR MTM and IK Formalanalysis VFR and IK Funding acquisition MTM MRV and LGdSJ Investigation VFR MethodologyVFR and MTM Project administration MTM IK MRV and LGdSJ Software VFR and LGdSJSupervision MTM IK and MRV Validation VFR and MRV Writing ndash original draft VFR Writing ndashreview amp editing VFR MTM IK and MRV All authors have read and agreed to the published version of themanuscript

Funding This research was funded by the CNPqmdashConselho Nacional de Desenvolvimento Cientiacutefico eTecnoloacutegicomdashBrasil (proc no 1035872019-5) This research and the APC were also funded by PETROBRAS(Grant Number 201800545-0)

Acknowledgments In this section you can acknowledge any support given which is not covered by the authorcontribution or funding sections This may include administrative and technical support or donations in kind(eg materials used for experiments)

Conflicts of Interest The authors declare no conflict of interest

References

1 Lehmann R Neitzel F Testing the compatibility of constraints for parameters of a geodetic adjustmentmodel Journal of Geodesy 2013 87 555ndash566 doi101007s00190-013-0627-2

2 Fang X Weighted total least-squares with constraints a universal formula for geodetic symmetricaltransformations Journal of Geodesy 2015 89 459ndash469 doi101007s00190-015-0790-8

3 Amiri-Simkooei AR Weighted Total Least Squares with Singular Covariance Matrices Subject to Weightedand Hard Constraints J Surv Eng 2017 143 04017018

4 Sakic P Ballu V Royer JY A Multi-Observation Least-Squares Inversion for GNSS-Acoustic SeafloorPositioning Remote Sensing 2020 12 doi103390rs12030448

5 Odijk D Stochastic modelling of the ionosphere for fast GPS ambiguity resolution Geodesy Beyond 2000Schwarz KP Ed Springer Berlin Heidelberg Berlin Heidelberg 2000 pp 387ndash392

6 de Oliveira PS Morel L Fund F Legros R Monico JFG Durand S Durand F Modelingtropospheric wet delays with dense and sparse network configurations for PPP-RTK GPS Solutions 201721 237ndash250 doi101007s10291-016-0518-0

7 Chen B Wu L Dai W Luo X Xu Y A new parameterized approach for ionospheric tomography GPSSolutions 2019 23 96 doi101007s10291-019-0893-4

8 Ahmed E Ahmed S Mohamed D Mostafa R jag 2019 Vol 13 chapter Vertical ionospheric delayestimation for single-receiver operation p 81 2 doi101515jag-2018-0041

9 Wang A Chen J Zhang Y Meng L Wang J Performance of Selected Ionospheric Models inMulti-Global Navigation Satellite System Single-Frequency Positioning over China Remote Sensing 201911 doi103390rs11172070

10 Deo M El-Mowafy A Comparison of advanced troposphere models for aiding reductionof PPP convergence time in Australia Journal of Spatial Science 2019 64 381ndash403doi1010801449859620181472046

11 Dai X Lou Y Dai Z Hu C Peng Y Qiao J Shi C Precise Orbit Determination for GNSS ManeuveringSatellite with the Constraint of a Predicted Clock Remote Sensing 2019 11 doi103390rs11161949

12 Courant R Hilbert D Methods of mathematical physics Vol 2 Vol 2 Wiley [New York] 196213 Grafarend EW Optimization of Geodetic Networks The Canadian Surveyor 1974 28 716ndash723

doi101139tcs-1974-0120

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

37 of 40

14 Teunissen P Zero Order Design Generalized Inverses Adjustment the Datum Problem andS-Transformations Optimization and Design of Geodetic Networks Grafarend EW Sansograve F EdsSpringer Berlin Heidelberg Berlin Heidelberg 1985 pp 11ndash55

15 Dermanis A Free network solutions with the DLT method ISPRS Journal of Photogrammetry and RemoteSensing 1994 49 2 ndash 12 doihttpsdoiorg1010160924-2716(94)90061-2

16 Kotsakis C Reference frame stability and nonlinear distortion in minimum-constrained networkadjustment Journal of Geodesy 2012 86 755ndash774 doi101007s00190-012-0555-6

17 Kotsakis C Datum Definition and Minimal Constraints In Encyclopedia of Geodesy Grafarend E EdSpringer International Publishing Cham 2018 pp 1ndash6 doi101007978-3-319-02370-0_157-1

18 Matsuoka MT Rofatto VF Klein I Roberto Veronez M da Silveira LG Neto JBS Alves ACRControl Points Selection Based on Maximum External Reliability for Designing Geodetic Networks AppliedSciences 2020 10 doi103390app10020687

19 Baarda W S-transformations and criterion matrices Publ on geodesy New Series 1973 520 Schaffrin B Aspects of Network Design Optimization and Design of Geodetic Networks Grafarend

EW Sansograve F Eds Springer Berlin Heidelberg Berlin Heidelberg 1985 pp 548ndash59721 Xu PB A general solution in geodetic nonlinear rank-defect models 199722 Altamimi Z Dermanis A The Choice of Reference System in ITRF Formulation VII Hotine-Marussi

Symposium on Mathematical Geodesy Sneeuw N Novaacutek P Crespi M Sansograve F Eds Springer BerlinHeidelberg Berlin Heidelberg 2012 pp 329ndash334

23 Velsink H On the deformation analysis of point fields J Geod 2015 89 1071ndash1087doi101007s00190-015-0835-z

24 Velsink H Extendable linearised adjustment model for deformation analysis Survey Review 201547 397ndash410 doi1011791752270614Y0000000140

25 Hiddo V jag 2016 Vol 10 chapter Time Series Analysis of 3D Coordinates Using NonstochasticObservations p 5 1 doi101515jag-2015-0027

26 Rao CR Markoffrsquos Theorem with Linear Restrictions on Parameters SankhyAuml The Indian Journal ofStatistics (1933-1960) 1945 7 16ndash19

27 Velsink H Testing Methods for Adjustment Models with Constraints Journal of Surveying Engineering2018 144 04018009 doi101061(ASCE)SU1943-54280000260

28 Hiddo V Testing deformation hypotheses by constraints on a time series of geodetic observations J ApplGeod 2017 12 77ndash93 doi101515jag-2017-0028

29 Baarda W A testing procedure for use in geodetic networks Publ on geodesy New Series 1968 230 Teunissen P First and second moments of non-linear least-squares estimators Bull Geodesique (Journal of

Geodesy) 1989 63 253ndash262 doihttpsdoiorg101007BF0252047531 Teunissen P Testing Theory an introduction 2 ed Delft University Press 200632 Amiri-Simkooei A Least-squares variance component estimation theory and GPS applications Phd

thesis Delft University of Technology 200733 Koch KR Parameter estimation and hypothesis testing in linear models 2 ed Springer 199934 Tao H Feng H Xu L Miao M Long H Yue J Li Z Yang G Yang X Fan L Estimation

of Crop Growth Parameters Using UAV-Based Hyperspectral Remote Sensing Data Sensors 2020 20doi103390s20051296

35 Han M Wang Q Wen Y He M He X The Application of Robust Least Squares Method inFrequency Lock Loop Fusion for Global Navigation Satellite System Receivers Sensors 2020 20doi103390s20041224

36 Wei C Estimation for the Discretely Observed CoxacircldquoIngersollacircldquoRoss Model Driven by Small SymmetricalStable Noises Symmetry 2020 12 doi103390sym12030327

37 Sakic P Ballu V Royer JY A Multi-Observation Least-Squares Inversion for GNSS-Acoustic SeafloorPositioning Remote Sensing 2020 12 doi103390rs12030448

38 Livadiotis G General Fitting Methods Based on Lq Norms and their Optimization Stats 2020 3 16ndash31doi103390stats3010002

39 Farooq SZ Yang D Ada ENJ A Cycle Slip Detection Framework for Reliable Single Frequency RTKPositioning Sensors 2020 20 doi103390s20010304

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

38 of 40

40 Araveeporn A Comparing Parameter Estimation of Random Coefficient Autoregressive Model byFrequentist Method Mathematics 2020 8 doi103390math8010062

41 Zhang C Peng T Zhou J Ji J Wang X An Improved Autoencoder and Partial Least SquaresRegression-Based Extreme Learning Machine Model for Pump Turbine Characteristics Applied Sciences2019 9 doi103390app9193987

42 Ji J Yang M Jiang L He J Teng Z Liu Y Song H Output-Only Parameters Identification ofEarthquake-Excited Building Structures with Least Squares and Input Modification Process AppliedSciences 2019 9 doi103390app9040696

43 Kargoll B On the theory and application of model misspecification tests in geodesy Doctoral thesisUniversity of Bonn Landwirtschaftliche Fakultaumlt German Bonn 2007

44 Lehmann R On the formulation of the alternative hypothesis for geodetic outlier detection J Geod 201387 373ndash386 doi101007s00190-012-0607-y

45 Rofatto VF Matsuoka MT Klein I Roberto Veronez M da Silveira LG A Monte Carlo-Based OutlierDiagnosis Method for Sensitivity Analysis Remote Sensing 2020 12 doi103390rs12050860

46 Lehmann R Improved critical values for extreme normalized and studentized residuals in GaussndashMarkovmodels J Geod 2012 86 1137ndash1146 doi101007s00190-012-0569-0

47 Rofatto VF Matsuoka MT Klein I Veronez MR Bonimani ML Lehmann R A half-century ofBaardarsquos concept of reliability a review new perspectives and applications Surv Rev 2018 0 1ndash17[httpsdoiorg1010800039626520181548118] doi1010800039626520181548118

48 Zaminpardaz S Teunissen P DIA-datasnooping and identifiability J Geod 2019 93 85ndash101doi101007s00190-018-1141-3

49 Kok JJ States U On data snooping and multiple outlier testing [microform] Johan J Kok US Dept ofCommerce National Oceanic and Atmospheric Administration National Ocean Service Charting andGeodetic Services For sale by the National Geodetic Information Center NOAA Rockville Md 1984 p61 p

50 Knight NL Wang J Rizos C Generalised measures of reliability for multiple outliers Journal of Geodesy2010 84 625ndash635 doi101007s00190-010-0392-4

51 Gui Q Li X Gong Y Li B Li G A Bayesian unmasking method for locating multiplegross errors based on posterior probabilities of classification variables J Geod 2011 85 191ndash203doi101007s00190-010-0429-8

52 Klein I Matsuoka MT Guzatto MP Nievinski FG An approach to identify multipleoutliers based on sequential likelihood ratio tests Surv Rev 2017 49 449ndash457[httpsdoiorg1010800039626520161212970] doi1010800039626520161212970

53 Hawkins DM Identification of Outliers 1 ed Springer Netherlands 1980 httpsdoiorg101007978-94-015-3994-4

54 Baarda W Statistical concepts in geodesy Publ on geodesy New Series 1967 255 Imparato D Teunissen P Tiberius C Minimal Detectable and Identifiable Biases for quality

control Surv Rev 2019 51 289ndash299 [httpsdoiorg1010800039626520181437947]doi1010800039626520181437947

56 Teunissen PJG Distributional theory for the DIA method Journal of Geodesy 2018 92 59ndash80doi101007s00190-017-1045-7

57 Foumlrstner W Reliability and discernability of extended Gauss-Markov models Seminar on mathematicalmodels of GeodeticPhotogrammetric Point Determination with Regard to Outliers and Systematic Errors 1983 Vol Series A pp 79ndash103

58 Wang J Knight NL New Outlier Separability Test and Its Application in GNSS Positioning 201259 Lehmann R Loumlsler M Multiple Outlier Detection Hypothesis Tests versus Model Selection by

Information Criteria J Surv Eng 2016 142 0401601760 Teunissen PJG Imparato D Tiberius CCJM Does RAIM with Correct Exclusion Produce Unbiased

Positions Sensors 2017 17 doi103390s1707150861 Lehmann R Loumlsler M Congruence analysis of geodetic networks ndash hypothesis tests versus model

selection by information criteria J Appl Geod 2017 11 271ndash283 doi101515jag-2016-0049

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

39 of 40

62 Rofatto V Matsuoka M Klein I DESIGN OF GEODETIC NETWORKS BASED ON OUTLIERIDENTIFICATION CRITERIA AN EXAMPLE APPLIED TO THE LEVELING NETWORK Bull Geod Sci2018 24 152ndash170 httpdxdoiorg101590s1982-21702018000200011

63 Nguyen VK Renault Milocco R Environment Monitoring for Anomaly Detection System UsingSmartphones Sensors 2019 19 doi103390s19183834

64 Nie Y Yang L Shen Y Specific Direction-Based Outlier Detection Approach for GNSS Vector NetworksSensors 2019 19 doi103390s19081836

65 Klein I Matsuoka MT Guzatto MP Nievinski FG Veronez MR Rofatto VF A newrelationship between the quality criteria for geodetic networks Journal of Geodesy 2019 93 529ndash544doi101007s00190-018-1181-8

66 Mierlo JV Statistical Analysis of Geodetic Measurements for the Investigation of Crustal Movements InRecent Crustal Movements 1977 Whitten C Green R Meade B Eds Elsevier 1979 Vol 13 Developmentsin Geotectonics pp 457 ndash 467 doihttpsdoiorg101016B978-0-444-41783-150072-6

67 van der Marel H Roumlsters AJM Statistical Testing and Quality Analysis in 3-D Networks (part II)Application to GPS Global Positioning System An Overview Bock Y Leppard N Eds Springer NewYork New York NY 1990 pp 290ndash297

68 Yang L Wang J Knight NL Shen Y Outlier separability analysis with a multiple alternative hypothesestest J Geod 2013 87 591ndash604 doi101007s00190-013-0629-0

69 Proacuteszynski W Revisiting Baardarsquos concept of minimal detectable bias with regard to outlier identifiabilityJ Geod 2015 89 993ndash1003 doi101007s00190-015-0828-y

70 Romano JP Wolf M Multiple Testing of One-Sided Hypotheses Combining Bonferroni and the BootstrapPredictive Econometrics and Big Data Kreinovich V Sriboonchitta S Chakpitak N Eds SpringerInternational Publishing Cham 2018 pp 78ndash94

71 Teunissen P An Integrity and Quality Control Procedure for use in Multi Sensor Integration 199072 Aydin C Demirel H Computation of Baardarsquos lower bound of the non-centrality parameter J Geod

2004 78 437ndash441 doi101007s00190-004-0406-173 Aydin C Power of Global Test in Deformation Analysis J Surv Eng 2012 138 51ndash56 https

doiorg101061(ASCE)SU1943-5428000006474 Bonferroni C Teoria statistica delle classi e calcolo delle probabilitagrave Pubblicazioni del R Istituto superiore di

scienze economiche e commerciali di Firenze Libreria internazionale Seeber 193675 Box GEP Muller ME A Note on the Generation of Random Normal Deviates The Annals of Mathematical

Statistics 1958 29 610ndash611 httpsdoi101214aoms117770664576 Lemeshko BY Lemeshko SB Extending the Application of Grubbs-Type Tests in Rejecting Anomalous

Measurements Measurement Techniques 2005 48 536ndash547 doi101007s11018-005-0179-977 Lehmann R Scheffler T Monte Carlo based data snooping with application to a geodetic network J

Appl Geod 2011 5 123ndash134 httpsdoiorg101515JAG201101478 Matsumoto M Nishimura T Mersenne twister A 623-dimensionally equidistributed uniform

pseudo-random number generator ACM Transactions on Modeling and Computer Simulation 1998 8 3ndash30httpsdlacmorgcitationcfmid=272991

79 Ihde J Saacutenchez L Barzaghi R Drewes H Foerste C Gruber T Liebsch G Marti U Pail R SiderisM Definition and Proposed Realization of the International Height Reference System (IHRS) Surveys inGeophysics 2017 38 549ndash570 doi101007s10712-017-9409-3

80 Lehmann R Observation error model selection by information criteria vs normality testing StudiaGeophysica et Geodaetica 2015 59 489ndash504 doi101007s11200-015-0725-0

81 Algarni DA Ali AE Heighting and Distance Accuracy with Electronic Digital Levels Journal of King SaudUniversity - Engineering Sciences 1998 10 229 ndash 239 doihttpsdoiorg101016S1018-3639(18)30698-6

82 Gemin ARS Matos AS Faggion PLAs APPLICATION OF CALIBRATION CERTIFICATE OFDIGITAL LEVELING SYSTEMS IN THE MONITORING OF STRUCTURES A CASE STUDY AT THEGOVERNADOR JOSAtilde RICHA HYDROELECTRIC POWER PLANT - PR Boletim de CiAtildeGeodAtilde 201824 235 ndash 249

83 Takalo M Rouhiainen P Development of a System Calibration Comparator for Digital Levels in FinlandNordic Journal of Surveying and Real Estate Research 1 1

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

40 of 40

84 Wiedemann W Wagner A Wunderlich T Using IATS to Read and Analyze Digital Levelling Staffs SIG2016 Paar R Marendic A Zrinjski M Eds Croatian Geodetic Society 2016 pp 515ndash526

85 PJG T The Geometry of Geodetic Inverse Linear Mapping and Non-linear Adjustment Publications onGeodesy New Series 1985 8

86 Rao CR Mitra SK Generalized inverse of a matrix and its applications 197287 Hekimoglu S Erenoglu RC Sanli DU Erdogan B Detecting Configuration Weaknesses in Geodetic

Networks Survey Review 2011 43 713ndash730 doi101179003962611X13117748892632

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

Page 28: Effects of Hard and Soft Equality Constraints on

28 of 40

(c) (d)

(e) (f)

(g) (h)

(a) (b)

Figure 10 Probability of correct identification (PCI) and Probability of correct detection (PCD) for themeasurements subject to the scenarios of three soft constraints for αprime = 0001 Cluster 1(ab) Cluster2(cd) Cluster 3(ef) and Cluster 4(gh)

The probabilities of correct identification (PCI) and correct detection (PCD) in Figure 8 werecomputed for the clusters based on Table 14 as follows

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

29 of 40

bull Cluster 1 y1 y3 y4 and y6bull Cluster 2 y2 and y5bull Cluster 3 y7 y8 y9 and y10bull Cluster 4 y11 and y12bull Cluster 5 y13 and y14bull Cluster 6 y15

Figure 11 shows the probabilities of correct identification PCI and correct detection PCD for thethree soft constraints ie for Cluster 5 (heights A and D) and Cluster 6 (height G) in Figure 3(e)

(c) (d)

(a) (b)

Figure 11 Probability of correct identification (PCI) and Probability of correct detection (PCD) for thethree constraints and for αprime = 0001 Cluster 5(ab) and Cluster 6(cd)

The probability of wrong exclusion PWE for the measurements (Cluster 1 to Cluster 4) subject tothe scenarios of three soft constraints (heights A D and G) are displayed in Figure 12 Figure 13 givesthe the probability of wrong exclusion PWE for that three constraints (ie heights A D and G)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

30 of 40

(c) (d)

(a) (b)

Figure 12 Probability of wrong exclusion PWE for the measurements subject to the scenarios of threesoft constraints and for αprime = 0001 Cluster 1(a) Cluster 2(b) Cluster 3(c) and Cluster 4(d)

(a) (b)

Figure 13 Probability of wrong exclusion (PWE) for the three constraints and for αprime = 0001 Cluster5(a) and Cluster 6(b)

The over-identification cases (Pover+ and Poverminus) and the statistical overlap (Pol) were alsopractically null for that case of three soft constraints

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

31 of 40

The sensitivity indicators (MDB and MIB) for each scenario of three soft constraints are displayedin Table 15

Table 15 MDB and MIB for the case of three soft constraints based on αprime = 0001 and PCD = PCI = 08

σc = 10mm σc = 1mm σc = 01mmCluster MDB (σ) MIB (σ) MDB (σ) MIB (σ) MDB (σ) MIB (σ)

1 75 22 68 69 58 592 68 69 66 67 64 643 64 64 63 63 58 584 63 63 63 63 63 635 59 60 74 75 435 456 59 59 69 69 346 355

5 Discussion

We start from the scenario of one hard constraint in Figure 3(a) Table 4 reveal that the maximumcorrelation between w-test statistics for the measurements constituting Cluster 1 is exactly equal to100 (ie maxρwi wj

= 100) This means that the measurements belonging to Cluster 1 are those thatare connected with unknown heights whose connections are limited to only two The both unknownheights A and D are tied only to two measurements (ie y1 and y6 linked to A and y3 and y4 linked toD) therefore if an outlier occurred in one of these measurements we would only be able to analyse theconsistency between them but we would not be able to distinguish which of them was contaminatedby an outlier This means that we would only be able to detect because the w-test statistics could belarger than a critical value k however in that case the values of w-test statistics would be the same andtherefore we would not have only one unique maximum w-test statistics but actually would have fourmaximum w-test statistics In other words the equation systems associated with the measurements ofCluster 1 are linearly dependent [87] Therefore there is no reliability in terms of outlier identificationfor the Cluster 1 as can be seen in Figure 3(a)

From Figure 3(b) we can note that there is reliability in terms of outlier detection for Cluster 1and it is caused by overlapping w-test statistics The probability of statistics overlap Pol for Cluster 1in this scenario of minimally constrained network is displayed in Figure 14

Figure 14 Probability of correct detection PCD and statistical overlap (Pol) for the Cluster 1 subject toone hard constraint and for αprime = 0001

The problem of not having more connections (ie more measurements) for the unknown heightsA and D in the case of one hard constraint with G fixed is overcome when these heights (A and D) aretaken as hard constraints in 3(b) or when the heights A D and G are hard constraints in figure 3(c)Figure 3(a b) show that the measurements of Cluster 1 are now able to identify an outlier when twohard constraints (A and D fixed) are in play It is also verified to the case of three hard constraints (A Dand G fixes) in Figure 3(e f) ie there is reliability in terms of both outlier detection and identificationfor these measurements in those conditions

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

32 of 40

One can argue that the more constraints the larger the probability of correct identification (PCI)and correct detection (PCD) This claim is not entirely correct This may be true in case where all pointson the network have the same number of connections However in the example given if both heightsB and G were selected as hard constraints the measurements of the Cluster 1 (ie y1 y6 y3 and y4)would still have their least-squares residuals fully correlated (ie ρwy1 wy6

= 100 and ρwy3 wy4= 100)

and therefore there would be no reliability in terms of outlier identification for these measurementsFrom Table 5 we observe different behaviour for the clusters as follows

bull Cluster 1 there was no MIB for the case of having only one single hard constraint whereasMDB = MIB for the other cases However both MDB and MIB decrease significantly with theincrease in the number of hard constraints

bull Cluster 2 MDB slightly smaller than MIB Both MDB and MIB were practically the same for thecase of having two or three hard constraints

bull Cluster 3 MDB = MIB for all cases of hard constraints however both MDB and MIB decreasesignificantly with the increase in the number of hard constraints

bull Cluster 4 MDB e MIB were equal for all cases

In terms of outlier detection and identification therefore Cluster 1 was more sensitive toconstraints Cluster 3 relatively sensitive to constraints whereas the Cluster 4 completely insensitiveto constraints and Cluster 2 relatively insensitive to constraints This is also can be seen in Figure 4The reason for this is that the local redundancy (ri) of the Cluster 1 increased with the increase of thenumber of hard constraints whereas the Cluster 4 have remained the same as can be seen in Table 4

Leaving aside the cases of statistical overlap (Pol) the network presents low least-squares residualscorrelation (ρwi wj lt 05) and high local redundancy (ri gt 05) Because of this the probabilities ofwrong exclusion (PWE) were less than 1 as can be seen in Figure 5 The over-identification cases(Pover+ and Poverminus) were practically null Consequently PCI asymp PCD Due of this fact the family-wiseerror rate (αprime) should be increased in order to have more success rate in the outlier detection andidentification [45]

From Figure 15 we observe that increasing the αprime increases the both probabilities of correctidentification (PCI) and correct detection (PCD) for outlier magnitude from 5σ to 6σ in the case ofthree hard constraints and from 5σ to 68σ in the case of two hard constraints Although the rates ofPover+ and PWE also increase they are not significant when compared to the improvement of correctidentification (PCI) and detection (PCD) This same analysis can be done for the other clusters

In terms of soft constraints for the cases of two constraints in Figure 3(d) we observe from Table 9that the larger the relaxation of the constraint (ie the larger the standard-deviation of the constraintσc) the larger the residuals correlation (ρwi wj ) and the standard-deviation of the outlier σnablai and thesmaller the local redundancy (ri) Consequently the probabilities of correct identification (PCI) anddetection (PCD) get smaller and smaller with the relaxation of the constraints whereas the probabilityof wrong exclusion gets larger (PWE) This can be more clearly verified in Figure 6(a b) and Figure8(a) for Cluster 1 whose measurements are connected with the constraints A and D (ie y13 and y14 inTable 9 respectively)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

33 of 40

(a) (b)

(c) (d)

Figure 15 Probabilities of correct identification PCI (a) correct detection PCD (b) over-identificationpositive Pover+ (c) and wrong exclusion PWE for the Cluster 1 subject to two and three hard constraintsand for αprime = 0001 and αprime = 01

Note from Figure 8(a) that the probability of wrong exclusion (PWE) increases as the magnitude ofthe outlier (nablai) increases However this is true up to a certain limit of outlier magnitude The effect ofresiduals correlation ρwi wj on the rates of wrong exclusion (PWE) and correct identification (PCI) tendsto decrease with the increase in the magnitude of the outlier nablai This effect is more clearly verified forCluster 1 in case where the precision of the constraints are ten times worse than the measurementsσc = 10σ = 10mm

Note from Figure 6 that identifying an outlier in the Cluster 1 (ie y1 y3 y4 and y6) whenσc = 10mm is more difficult than the other clusters This is due to the fact the Cluster 1 has a higherresiduals correlation ρwi wj = 0994 than other clusters In general therefore we observe that the largerthe relaxation of the constraints the larger is the effect of the correlation ρwi wj on the success rate ofoutlier identification (PCI) Consequently the higher the sensitivity indicator for outlier identification(MIB) Table 10 reveals that the ratio between MIB and MDB for the Cluster 1 and for the scenariowhere the standard-deviations of that two soft constraints are σc = 10mm is MIBMDB=2575=33On the other hand the relationship between MIB and MDB is practically one (ie MIBMDB=10) forthe others scenarios

If the FWE rate (αprime) were increased for the case where the two soft constraints of σc = 10mmare in play we would not have great advantages for Cluster 1 due to its high residuals correlation(ρwi wj = 994) From Figure 16 we can observe that the probability of correct identification (PCI)for outlier magnitudes from 5σ to 8σ is effectively larger to a user-defined αprime = 01 than αprime = 0001However the success rate is still less than 80 ie PCI lt 08 Note for example the correct

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

34 of 40

identification rate is PCI = 56 for an outlier magnitude of nablai = 8σ and αprime = 01 For αprime = 01 theMIB = 335σ = 335mm whereas for αprime = 0001 is MIB = 25σ = 25mm Therefore in that case theMIB for PCI = 08(80) and αprime = 01 would be 34 larger than user-defined αprime = 0001

Figure 16 Probabilities of correct identification (PCI) for the Cluster 1 subject to two soft constraints (2sc) A and D for αprime = 0001 and αprime = 01

The soft constraints A and D were grouped in the Cluster 5 (ie A and D were treated aspseudo-observations in the model) There is no reliability in terms of outlier identification for theconstraints because the residual correlation between them is ρwi wj = 100 as can be seen in Table 9for y13 and y14 However these soft constraints are able to detect an outlier In that case the probabilityof correct detection PCD in Figure 7 is mainly caused by the statistical overlap Pol as can be seen in theexample of the case of σc = 10mm in Figure 17 From Table 10 we observe that the larger the relaxationof the constraints the larger the MDB Note that the values of MDB are given in σ and therefore theMDB for σc = 10mm is larger than σc = 1mm and σc = 01mm ie we had the following inequalityMDB = 68σc = 68times 10mm = 68mm gt MDB = 88σc = 88times 1mm = 88mm gt MDB = 57σc =

57times 01mm = 57mm In that case if the FWE (αprime) were increased the rate of outlier detection by theCluster 4 (ie by the soft constraints) would increase

Figure 17 Probabilities of correct detection PCD and statistical overlap Pol for the two soft constraintsA and D (Cluster 5) with σc = 10mm and for αprime = 0001

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

35 of 40

Similar effects of the relaxation of the constraints on the performance of the IDS in case of two softconstraints are verified in case of three soft constraints as can be seen in the Figures 10 11 12 and 13

In case of having three soft constraints in Figure 3(e) there is reliability in terms of outlieridentification for the three pseudo-observations y13 y14 and y15 (ie for AD and G) as can be seen inFigure 10 and Table 15 In that case we also observe that the probabilities of correct detection PCD ofthe soft constraints A and D (ie Cluster 5) were approximately 13 for σc = 10mm 16 for σc = 1mmand 24 for σc = 01mm larger than the scenario of the network subject to two soft constraints Table15 reveals that the advantage of having three soft constraints instead of two constraints is that theconstraints become identifiable in the presence of an outlier The behaviour of the probabilities ofcorrect detection (PCD) correct identification (PCI) and wrong exclusion (PWE) was similar to the caseof the two soft constraints

Furthermore the larger the relaxation of the constraints the smaller the residuals correlationbetween the measurements and the soft constraints and the larger the residuals correlation amongthe measurements This effect can be verified in Tables 6 7 and 8 for two soft constraints as well as inTables 11 12 and 13 for three soft constraints

In general we observe that the case of two soft constraints for σc = 01mm was comparablewith two hard constraints (see eg Table 5 and Table 10) in terms of the probability levels associatedwith IDS for the measurements (ie clusters 1 2 3 and 4) In the same way for the case of twosoft constraints with σc = 1mm or σc = 10mm the probabilities levels were similar to the one hardconstraint for that measurements with the benefit of two soft constraints having reliability in terms ofoutlier identification for the Cluster 1 Finally the three soft constraints with σc = 1mm and σc = 10mmwere comparable to the two soft constraints for that scenario of constraints relaxation wheres thethree soft constraints for σc = 01mm showed similar outcomes with three hard constraints for themeasurements (see eg Table 5 and Table 15) In that case however an advantage of the three softconstraints on the three hard constraints is the possibility of analysing the sensitivity of the constraintsIt is important to emphasised that the stochastic models of the measurements and constraints wereassumed well-known and defined for the analyzes performed here

6 Conclusions

In general the probability of correct identification (PCI) for the case of hard constraints is largerthan soft constraints It can be verified in Figure 4 Figure 6 and Figure 10 Therefore hard constraintsshould be used in the stage of pre-processing data for the purpose of identifying and removing possibleoutlying measurements In that process one should opt to set out the redundant hard constraints atpoints in the network where the smallest connections exist After identifying and removing possibleoutliers the soft constraints should be employed to propagate the uncertainties of the constraints(pseudobservations) to the model parameters during the process of least-squares estimation Thisrecommendation is valid for outlier detection and identification purpose

We highlight the main findings of this research as follows

bull Under a system of a high local redundancy ri gt 05 and low residuals correlation (ρwi wj lt 05) ifone increase the family-wise error rate (FWE) of the test statistic the performance of the procedurewill be improved for both scenarios of hard constraints and soft constraintsbull The larger the relaxation of the constraints the larger is the effect of the residuals correlation

(ρwi wj ) on the success rate of outlier identification (PCI) Consequently the higher the sensitivityindicator for outlier identification (MIB) and therefore more difficult it becomes to identify anoutlier

bull Under a scenario of soft constraints one should set out at least three soft constraints in order toidentify an outlier in the constraints

In other types of analysis for example deformation analysis of geodetic networks one shouldformulated the constraints as pseudo-observations with σc gt 0 in an adjustment model of system

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

36 of 40

equations In other words the relaxation of constraints makes it possible to estimate deformationeffects that are unmodelled in the deterministic model matrix for example deformations which have aspatial and temporal variations In that case the deformation can be modelled by covariance functionsthat determine the cofactors between constraints [28] This should be applied to control points locatedin the structure but not to control points located outside the structure The control points locatedoutside the structure should be hard constraints and have their stability verified by an independentprocess in order to analyse the deformation of the structure So one will be also able to analyse thesensitivity in terms of minimal detectable deformations andor minimal identifiable deformations [23]Future researches will be addressed to sensitivity analysis for deformation monitoring networks

Author Contributions Conceptualization VFR MTM and IK Data curation VFR MTM and IK Formalanalysis VFR and IK Funding acquisition MTM MRV and LGdSJ Investigation VFR MethodologyVFR and MTM Project administration MTM IK MRV and LGdSJ Software VFR and LGdSJSupervision MTM IK and MRV Validation VFR and MRV Writing ndash original draft VFR Writing ndashreview amp editing VFR MTM IK and MRV All authors have read and agreed to the published version of themanuscript

Funding This research was funded by the CNPqmdashConselho Nacional de Desenvolvimento Cientiacutefico eTecnoloacutegicomdashBrasil (proc no 1035872019-5) This research and the APC were also funded by PETROBRAS(Grant Number 201800545-0)

Acknowledgments In this section you can acknowledge any support given which is not covered by the authorcontribution or funding sections This may include administrative and technical support or donations in kind(eg materials used for experiments)

Conflicts of Interest The authors declare no conflict of interest

References

1 Lehmann R Neitzel F Testing the compatibility of constraints for parameters of a geodetic adjustmentmodel Journal of Geodesy 2013 87 555ndash566 doi101007s00190-013-0627-2

2 Fang X Weighted total least-squares with constraints a universal formula for geodetic symmetricaltransformations Journal of Geodesy 2015 89 459ndash469 doi101007s00190-015-0790-8

3 Amiri-Simkooei AR Weighted Total Least Squares with Singular Covariance Matrices Subject to Weightedand Hard Constraints J Surv Eng 2017 143 04017018

4 Sakic P Ballu V Royer JY A Multi-Observation Least-Squares Inversion for GNSS-Acoustic SeafloorPositioning Remote Sensing 2020 12 doi103390rs12030448

5 Odijk D Stochastic modelling of the ionosphere for fast GPS ambiguity resolution Geodesy Beyond 2000Schwarz KP Ed Springer Berlin Heidelberg Berlin Heidelberg 2000 pp 387ndash392

6 de Oliveira PS Morel L Fund F Legros R Monico JFG Durand S Durand F Modelingtropospheric wet delays with dense and sparse network configurations for PPP-RTK GPS Solutions 201721 237ndash250 doi101007s10291-016-0518-0

7 Chen B Wu L Dai W Luo X Xu Y A new parameterized approach for ionospheric tomography GPSSolutions 2019 23 96 doi101007s10291-019-0893-4

8 Ahmed E Ahmed S Mohamed D Mostafa R jag 2019 Vol 13 chapter Vertical ionospheric delayestimation for single-receiver operation p 81 2 doi101515jag-2018-0041

9 Wang A Chen J Zhang Y Meng L Wang J Performance of Selected Ionospheric Models inMulti-Global Navigation Satellite System Single-Frequency Positioning over China Remote Sensing 201911 doi103390rs11172070

10 Deo M El-Mowafy A Comparison of advanced troposphere models for aiding reductionof PPP convergence time in Australia Journal of Spatial Science 2019 64 381ndash403doi1010801449859620181472046

11 Dai X Lou Y Dai Z Hu C Peng Y Qiao J Shi C Precise Orbit Determination for GNSS ManeuveringSatellite with the Constraint of a Predicted Clock Remote Sensing 2019 11 doi103390rs11161949

12 Courant R Hilbert D Methods of mathematical physics Vol 2 Vol 2 Wiley [New York] 196213 Grafarend EW Optimization of Geodetic Networks The Canadian Surveyor 1974 28 716ndash723

doi101139tcs-1974-0120

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

37 of 40

14 Teunissen P Zero Order Design Generalized Inverses Adjustment the Datum Problem andS-Transformations Optimization and Design of Geodetic Networks Grafarend EW Sansograve F EdsSpringer Berlin Heidelberg Berlin Heidelberg 1985 pp 11ndash55

15 Dermanis A Free network solutions with the DLT method ISPRS Journal of Photogrammetry and RemoteSensing 1994 49 2 ndash 12 doihttpsdoiorg1010160924-2716(94)90061-2

16 Kotsakis C Reference frame stability and nonlinear distortion in minimum-constrained networkadjustment Journal of Geodesy 2012 86 755ndash774 doi101007s00190-012-0555-6

17 Kotsakis C Datum Definition and Minimal Constraints In Encyclopedia of Geodesy Grafarend E EdSpringer International Publishing Cham 2018 pp 1ndash6 doi101007978-3-319-02370-0_157-1

18 Matsuoka MT Rofatto VF Klein I Roberto Veronez M da Silveira LG Neto JBS Alves ACRControl Points Selection Based on Maximum External Reliability for Designing Geodetic Networks AppliedSciences 2020 10 doi103390app10020687

19 Baarda W S-transformations and criterion matrices Publ on geodesy New Series 1973 520 Schaffrin B Aspects of Network Design Optimization and Design of Geodetic Networks Grafarend

EW Sansograve F Eds Springer Berlin Heidelberg Berlin Heidelberg 1985 pp 548ndash59721 Xu PB A general solution in geodetic nonlinear rank-defect models 199722 Altamimi Z Dermanis A The Choice of Reference System in ITRF Formulation VII Hotine-Marussi

Symposium on Mathematical Geodesy Sneeuw N Novaacutek P Crespi M Sansograve F Eds Springer BerlinHeidelberg Berlin Heidelberg 2012 pp 329ndash334

23 Velsink H On the deformation analysis of point fields J Geod 2015 89 1071ndash1087doi101007s00190-015-0835-z

24 Velsink H Extendable linearised adjustment model for deformation analysis Survey Review 201547 397ndash410 doi1011791752270614Y0000000140

25 Hiddo V jag 2016 Vol 10 chapter Time Series Analysis of 3D Coordinates Using NonstochasticObservations p 5 1 doi101515jag-2015-0027

26 Rao CR Markoffrsquos Theorem with Linear Restrictions on Parameters SankhyAuml The Indian Journal ofStatistics (1933-1960) 1945 7 16ndash19

27 Velsink H Testing Methods for Adjustment Models with Constraints Journal of Surveying Engineering2018 144 04018009 doi101061(ASCE)SU1943-54280000260

28 Hiddo V Testing deformation hypotheses by constraints on a time series of geodetic observations J ApplGeod 2017 12 77ndash93 doi101515jag-2017-0028

29 Baarda W A testing procedure for use in geodetic networks Publ on geodesy New Series 1968 230 Teunissen P First and second moments of non-linear least-squares estimators Bull Geodesique (Journal of

Geodesy) 1989 63 253ndash262 doihttpsdoiorg101007BF0252047531 Teunissen P Testing Theory an introduction 2 ed Delft University Press 200632 Amiri-Simkooei A Least-squares variance component estimation theory and GPS applications Phd

thesis Delft University of Technology 200733 Koch KR Parameter estimation and hypothesis testing in linear models 2 ed Springer 199934 Tao H Feng H Xu L Miao M Long H Yue J Li Z Yang G Yang X Fan L Estimation

of Crop Growth Parameters Using UAV-Based Hyperspectral Remote Sensing Data Sensors 2020 20doi103390s20051296

35 Han M Wang Q Wen Y He M He X The Application of Robust Least Squares Method inFrequency Lock Loop Fusion for Global Navigation Satellite System Receivers Sensors 2020 20doi103390s20041224

36 Wei C Estimation for the Discretely Observed CoxacircldquoIngersollacircldquoRoss Model Driven by Small SymmetricalStable Noises Symmetry 2020 12 doi103390sym12030327

37 Sakic P Ballu V Royer JY A Multi-Observation Least-Squares Inversion for GNSS-Acoustic SeafloorPositioning Remote Sensing 2020 12 doi103390rs12030448

38 Livadiotis G General Fitting Methods Based on Lq Norms and their Optimization Stats 2020 3 16ndash31doi103390stats3010002

39 Farooq SZ Yang D Ada ENJ A Cycle Slip Detection Framework for Reliable Single Frequency RTKPositioning Sensors 2020 20 doi103390s20010304

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

38 of 40

40 Araveeporn A Comparing Parameter Estimation of Random Coefficient Autoregressive Model byFrequentist Method Mathematics 2020 8 doi103390math8010062

41 Zhang C Peng T Zhou J Ji J Wang X An Improved Autoencoder and Partial Least SquaresRegression-Based Extreme Learning Machine Model for Pump Turbine Characteristics Applied Sciences2019 9 doi103390app9193987

42 Ji J Yang M Jiang L He J Teng Z Liu Y Song H Output-Only Parameters Identification ofEarthquake-Excited Building Structures with Least Squares and Input Modification Process AppliedSciences 2019 9 doi103390app9040696

43 Kargoll B On the theory and application of model misspecification tests in geodesy Doctoral thesisUniversity of Bonn Landwirtschaftliche Fakultaumlt German Bonn 2007

44 Lehmann R On the formulation of the alternative hypothesis for geodetic outlier detection J Geod 201387 373ndash386 doi101007s00190-012-0607-y

45 Rofatto VF Matsuoka MT Klein I Roberto Veronez M da Silveira LG A Monte Carlo-Based OutlierDiagnosis Method for Sensitivity Analysis Remote Sensing 2020 12 doi103390rs12050860

46 Lehmann R Improved critical values for extreme normalized and studentized residuals in GaussndashMarkovmodels J Geod 2012 86 1137ndash1146 doi101007s00190-012-0569-0

47 Rofatto VF Matsuoka MT Klein I Veronez MR Bonimani ML Lehmann R A half-century ofBaardarsquos concept of reliability a review new perspectives and applications Surv Rev 2018 0 1ndash17[httpsdoiorg1010800039626520181548118] doi1010800039626520181548118

48 Zaminpardaz S Teunissen P DIA-datasnooping and identifiability J Geod 2019 93 85ndash101doi101007s00190-018-1141-3

49 Kok JJ States U On data snooping and multiple outlier testing [microform] Johan J Kok US Dept ofCommerce National Oceanic and Atmospheric Administration National Ocean Service Charting andGeodetic Services For sale by the National Geodetic Information Center NOAA Rockville Md 1984 p61 p

50 Knight NL Wang J Rizos C Generalised measures of reliability for multiple outliers Journal of Geodesy2010 84 625ndash635 doi101007s00190-010-0392-4

51 Gui Q Li X Gong Y Li B Li G A Bayesian unmasking method for locating multiplegross errors based on posterior probabilities of classification variables J Geod 2011 85 191ndash203doi101007s00190-010-0429-8

52 Klein I Matsuoka MT Guzatto MP Nievinski FG An approach to identify multipleoutliers based on sequential likelihood ratio tests Surv Rev 2017 49 449ndash457[httpsdoiorg1010800039626520161212970] doi1010800039626520161212970

53 Hawkins DM Identification of Outliers 1 ed Springer Netherlands 1980 httpsdoiorg101007978-94-015-3994-4

54 Baarda W Statistical concepts in geodesy Publ on geodesy New Series 1967 255 Imparato D Teunissen P Tiberius C Minimal Detectable and Identifiable Biases for quality

control Surv Rev 2019 51 289ndash299 [httpsdoiorg1010800039626520181437947]doi1010800039626520181437947

56 Teunissen PJG Distributional theory for the DIA method Journal of Geodesy 2018 92 59ndash80doi101007s00190-017-1045-7

57 Foumlrstner W Reliability and discernability of extended Gauss-Markov models Seminar on mathematicalmodels of GeodeticPhotogrammetric Point Determination with Regard to Outliers and Systematic Errors 1983 Vol Series A pp 79ndash103

58 Wang J Knight NL New Outlier Separability Test and Its Application in GNSS Positioning 201259 Lehmann R Loumlsler M Multiple Outlier Detection Hypothesis Tests versus Model Selection by

Information Criteria J Surv Eng 2016 142 0401601760 Teunissen PJG Imparato D Tiberius CCJM Does RAIM with Correct Exclusion Produce Unbiased

Positions Sensors 2017 17 doi103390s1707150861 Lehmann R Loumlsler M Congruence analysis of geodetic networks ndash hypothesis tests versus model

selection by information criteria J Appl Geod 2017 11 271ndash283 doi101515jag-2016-0049

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

39 of 40

62 Rofatto V Matsuoka M Klein I DESIGN OF GEODETIC NETWORKS BASED ON OUTLIERIDENTIFICATION CRITERIA AN EXAMPLE APPLIED TO THE LEVELING NETWORK Bull Geod Sci2018 24 152ndash170 httpdxdoiorg101590s1982-21702018000200011

63 Nguyen VK Renault Milocco R Environment Monitoring for Anomaly Detection System UsingSmartphones Sensors 2019 19 doi103390s19183834

64 Nie Y Yang L Shen Y Specific Direction-Based Outlier Detection Approach for GNSS Vector NetworksSensors 2019 19 doi103390s19081836

65 Klein I Matsuoka MT Guzatto MP Nievinski FG Veronez MR Rofatto VF A newrelationship between the quality criteria for geodetic networks Journal of Geodesy 2019 93 529ndash544doi101007s00190-018-1181-8

66 Mierlo JV Statistical Analysis of Geodetic Measurements for the Investigation of Crustal Movements InRecent Crustal Movements 1977 Whitten C Green R Meade B Eds Elsevier 1979 Vol 13 Developmentsin Geotectonics pp 457 ndash 467 doihttpsdoiorg101016B978-0-444-41783-150072-6

67 van der Marel H Roumlsters AJM Statistical Testing and Quality Analysis in 3-D Networks (part II)Application to GPS Global Positioning System An Overview Bock Y Leppard N Eds Springer NewYork New York NY 1990 pp 290ndash297

68 Yang L Wang J Knight NL Shen Y Outlier separability analysis with a multiple alternative hypothesestest J Geod 2013 87 591ndash604 doi101007s00190-013-0629-0

69 Proacuteszynski W Revisiting Baardarsquos concept of minimal detectable bias with regard to outlier identifiabilityJ Geod 2015 89 993ndash1003 doi101007s00190-015-0828-y

70 Romano JP Wolf M Multiple Testing of One-Sided Hypotheses Combining Bonferroni and the BootstrapPredictive Econometrics and Big Data Kreinovich V Sriboonchitta S Chakpitak N Eds SpringerInternational Publishing Cham 2018 pp 78ndash94

71 Teunissen P An Integrity and Quality Control Procedure for use in Multi Sensor Integration 199072 Aydin C Demirel H Computation of Baardarsquos lower bound of the non-centrality parameter J Geod

2004 78 437ndash441 doi101007s00190-004-0406-173 Aydin C Power of Global Test in Deformation Analysis J Surv Eng 2012 138 51ndash56 https

doiorg101061(ASCE)SU1943-5428000006474 Bonferroni C Teoria statistica delle classi e calcolo delle probabilitagrave Pubblicazioni del R Istituto superiore di

scienze economiche e commerciali di Firenze Libreria internazionale Seeber 193675 Box GEP Muller ME A Note on the Generation of Random Normal Deviates The Annals of Mathematical

Statistics 1958 29 610ndash611 httpsdoi101214aoms117770664576 Lemeshko BY Lemeshko SB Extending the Application of Grubbs-Type Tests in Rejecting Anomalous

Measurements Measurement Techniques 2005 48 536ndash547 doi101007s11018-005-0179-977 Lehmann R Scheffler T Monte Carlo based data snooping with application to a geodetic network J

Appl Geod 2011 5 123ndash134 httpsdoiorg101515JAG201101478 Matsumoto M Nishimura T Mersenne twister A 623-dimensionally equidistributed uniform

pseudo-random number generator ACM Transactions on Modeling and Computer Simulation 1998 8 3ndash30httpsdlacmorgcitationcfmid=272991

79 Ihde J Saacutenchez L Barzaghi R Drewes H Foerste C Gruber T Liebsch G Marti U Pail R SiderisM Definition and Proposed Realization of the International Height Reference System (IHRS) Surveys inGeophysics 2017 38 549ndash570 doi101007s10712-017-9409-3

80 Lehmann R Observation error model selection by information criteria vs normality testing StudiaGeophysica et Geodaetica 2015 59 489ndash504 doi101007s11200-015-0725-0

81 Algarni DA Ali AE Heighting and Distance Accuracy with Electronic Digital Levels Journal of King SaudUniversity - Engineering Sciences 1998 10 229 ndash 239 doihttpsdoiorg101016S1018-3639(18)30698-6

82 Gemin ARS Matos AS Faggion PLAs APPLICATION OF CALIBRATION CERTIFICATE OFDIGITAL LEVELING SYSTEMS IN THE MONITORING OF STRUCTURES A CASE STUDY AT THEGOVERNADOR JOSAtilde RICHA HYDROELECTRIC POWER PLANT - PR Boletim de CiAtildeGeodAtilde 201824 235 ndash 249

83 Takalo M Rouhiainen P Development of a System Calibration Comparator for Digital Levels in FinlandNordic Journal of Surveying and Real Estate Research 1 1

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

40 of 40

84 Wiedemann W Wagner A Wunderlich T Using IATS to Read and Analyze Digital Levelling Staffs SIG2016 Paar R Marendic A Zrinjski M Eds Croatian Geodetic Society 2016 pp 515ndash526

85 PJG T The Geometry of Geodetic Inverse Linear Mapping and Non-linear Adjustment Publications onGeodesy New Series 1985 8

86 Rao CR Mitra SK Generalized inverse of a matrix and its applications 197287 Hekimoglu S Erenoglu RC Sanli DU Erdogan B Detecting Configuration Weaknesses in Geodetic

Networks Survey Review 2011 43 713ndash730 doi101179003962611X13117748892632

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

Page 29: Effects of Hard and Soft Equality Constraints on

29 of 40

bull Cluster 1 y1 y3 y4 and y6bull Cluster 2 y2 and y5bull Cluster 3 y7 y8 y9 and y10bull Cluster 4 y11 and y12bull Cluster 5 y13 and y14bull Cluster 6 y15

Figure 11 shows the probabilities of correct identification PCI and correct detection PCD for thethree soft constraints ie for Cluster 5 (heights A and D) and Cluster 6 (height G) in Figure 3(e)

(c) (d)

(a) (b)

Figure 11 Probability of correct identification (PCI) and Probability of correct detection (PCD) for thethree constraints and for αprime = 0001 Cluster 5(ab) and Cluster 6(cd)

The probability of wrong exclusion PWE for the measurements (Cluster 1 to Cluster 4) subject tothe scenarios of three soft constraints (heights A D and G) are displayed in Figure 12 Figure 13 givesthe the probability of wrong exclusion PWE for that three constraints (ie heights A D and G)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

30 of 40

(c) (d)

(a) (b)

Figure 12 Probability of wrong exclusion PWE for the measurements subject to the scenarios of threesoft constraints and for αprime = 0001 Cluster 1(a) Cluster 2(b) Cluster 3(c) and Cluster 4(d)

(a) (b)

Figure 13 Probability of wrong exclusion (PWE) for the three constraints and for αprime = 0001 Cluster5(a) and Cluster 6(b)

The over-identification cases (Pover+ and Poverminus) and the statistical overlap (Pol) were alsopractically null for that case of three soft constraints

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

31 of 40

The sensitivity indicators (MDB and MIB) for each scenario of three soft constraints are displayedin Table 15

Table 15 MDB and MIB for the case of three soft constraints based on αprime = 0001 and PCD = PCI = 08

σc = 10mm σc = 1mm σc = 01mmCluster MDB (σ) MIB (σ) MDB (σ) MIB (σ) MDB (σ) MIB (σ)

1 75 22 68 69 58 592 68 69 66 67 64 643 64 64 63 63 58 584 63 63 63 63 63 635 59 60 74 75 435 456 59 59 69 69 346 355

5 Discussion

We start from the scenario of one hard constraint in Figure 3(a) Table 4 reveal that the maximumcorrelation between w-test statistics for the measurements constituting Cluster 1 is exactly equal to100 (ie maxρwi wj

= 100) This means that the measurements belonging to Cluster 1 are those thatare connected with unknown heights whose connections are limited to only two The both unknownheights A and D are tied only to two measurements (ie y1 and y6 linked to A and y3 and y4 linked toD) therefore if an outlier occurred in one of these measurements we would only be able to analyse theconsistency between them but we would not be able to distinguish which of them was contaminatedby an outlier This means that we would only be able to detect because the w-test statistics could belarger than a critical value k however in that case the values of w-test statistics would be the same andtherefore we would not have only one unique maximum w-test statistics but actually would have fourmaximum w-test statistics In other words the equation systems associated with the measurements ofCluster 1 are linearly dependent [87] Therefore there is no reliability in terms of outlier identificationfor the Cluster 1 as can be seen in Figure 3(a)

From Figure 3(b) we can note that there is reliability in terms of outlier detection for Cluster 1and it is caused by overlapping w-test statistics The probability of statistics overlap Pol for Cluster 1in this scenario of minimally constrained network is displayed in Figure 14

Figure 14 Probability of correct detection PCD and statistical overlap (Pol) for the Cluster 1 subject toone hard constraint and for αprime = 0001

The problem of not having more connections (ie more measurements) for the unknown heightsA and D in the case of one hard constraint with G fixed is overcome when these heights (A and D) aretaken as hard constraints in 3(b) or when the heights A D and G are hard constraints in figure 3(c)Figure 3(a b) show that the measurements of Cluster 1 are now able to identify an outlier when twohard constraints (A and D fixed) are in play It is also verified to the case of three hard constraints (A Dand G fixes) in Figure 3(e f) ie there is reliability in terms of both outlier detection and identificationfor these measurements in those conditions

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

32 of 40

One can argue that the more constraints the larger the probability of correct identification (PCI)and correct detection (PCD) This claim is not entirely correct This may be true in case where all pointson the network have the same number of connections However in the example given if both heightsB and G were selected as hard constraints the measurements of the Cluster 1 (ie y1 y6 y3 and y4)would still have their least-squares residuals fully correlated (ie ρwy1 wy6

= 100 and ρwy3 wy4= 100)

and therefore there would be no reliability in terms of outlier identification for these measurementsFrom Table 5 we observe different behaviour for the clusters as follows

bull Cluster 1 there was no MIB for the case of having only one single hard constraint whereasMDB = MIB for the other cases However both MDB and MIB decrease significantly with theincrease in the number of hard constraints

bull Cluster 2 MDB slightly smaller than MIB Both MDB and MIB were practically the same for thecase of having two or three hard constraints

bull Cluster 3 MDB = MIB for all cases of hard constraints however both MDB and MIB decreasesignificantly with the increase in the number of hard constraints

bull Cluster 4 MDB e MIB were equal for all cases

In terms of outlier detection and identification therefore Cluster 1 was more sensitive toconstraints Cluster 3 relatively sensitive to constraints whereas the Cluster 4 completely insensitiveto constraints and Cluster 2 relatively insensitive to constraints This is also can be seen in Figure 4The reason for this is that the local redundancy (ri) of the Cluster 1 increased with the increase of thenumber of hard constraints whereas the Cluster 4 have remained the same as can be seen in Table 4

Leaving aside the cases of statistical overlap (Pol) the network presents low least-squares residualscorrelation (ρwi wj lt 05) and high local redundancy (ri gt 05) Because of this the probabilities ofwrong exclusion (PWE) were less than 1 as can be seen in Figure 5 The over-identification cases(Pover+ and Poverminus) were practically null Consequently PCI asymp PCD Due of this fact the family-wiseerror rate (αprime) should be increased in order to have more success rate in the outlier detection andidentification [45]

From Figure 15 we observe that increasing the αprime increases the both probabilities of correctidentification (PCI) and correct detection (PCD) for outlier magnitude from 5σ to 6σ in the case ofthree hard constraints and from 5σ to 68σ in the case of two hard constraints Although the rates ofPover+ and PWE also increase they are not significant when compared to the improvement of correctidentification (PCI) and detection (PCD) This same analysis can be done for the other clusters

In terms of soft constraints for the cases of two constraints in Figure 3(d) we observe from Table 9that the larger the relaxation of the constraint (ie the larger the standard-deviation of the constraintσc) the larger the residuals correlation (ρwi wj ) and the standard-deviation of the outlier σnablai and thesmaller the local redundancy (ri) Consequently the probabilities of correct identification (PCI) anddetection (PCD) get smaller and smaller with the relaxation of the constraints whereas the probabilityof wrong exclusion gets larger (PWE) This can be more clearly verified in Figure 6(a b) and Figure8(a) for Cluster 1 whose measurements are connected with the constraints A and D (ie y13 and y14 inTable 9 respectively)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

33 of 40

(a) (b)

(c) (d)

Figure 15 Probabilities of correct identification PCI (a) correct detection PCD (b) over-identificationpositive Pover+ (c) and wrong exclusion PWE for the Cluster 1 subject to two and three hard constraintsand for αprime = 0001 and αprime = 01

Note from Figure 8(a) that the probability of wrong exclusion (PWE) increases as the magnitude ofthe outlier (nablai) increases However this is true up to a certain limit of outlier magnitude The effect ofresiduals correlation ρwi wj on the rates of wrong exclusion (PWE) and correct identification (PCI) tendsto decrease with the increase in the magnitude of the outlier nablai This effect is more clearly verified forCluster 1 in case where the precision of the constraints are ten times worse than the measurementsσc = 10σ = 10mm

Note from Figure 6 that identifying an outlier in the Cluster 1 (ie y1 y3 y4 and y6) whenσc = 10mm is more difficult than the other clusters This is due to the fact the Cluster 1 has a higherresiduals correlation ρwi wj = 0994 than other clusters In general therefore we observe that the largerthe relaxation of the constraints the larger is the effect of the correlation ρwi wj on the success rate ofoutlier identification (PCI) Consequently the higher the sensitivity indicator for outlier identification(MIB) Table 10 reveals that the ratio between MIB and MDB for the Cluster 1 and for the scenariowhere the standard-deviations of that two soft constraints are σc = 10mm is MIBMDB=2575=33On the other hand the relationship between MIB and MDB is practically one (ie MIBMDB=10) forthe others scenarios

If the FWE rate (αprime) were increased for the case where the two soft constraints of σc = 10mmare in play we would not have great advantages for Cluster 1 due to its high residuals correlation(ρwi wj = 994) From Figure 16 we can observe that the probability of correct identification (PCI)for outlier magnitudes from 5σ to 8σ is effectively larger to a user-defined αprime = 01 than αprime = 0001However the success rate is still less than 80 ie PCI lt 08 Note for example the correct

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

34 of 40

identification rate is PCI = 56 for an outlier magnitude of nablai = 8σ and αprime = 01 For αprime = 01 theMIB = 335σ = 335mm whereas for αprime = 0001 is MIB = 25σ = 25mm Therefore in that case theMIB for PCI = 08(80) and αprime = 01 would be 34 larger than user-defined αprime = 0001

Figure 16 Probabilities of correct identification (PCI) for the Cluster 1 subject to two soft constraints (2sc) A and D for αprime = 0001 and αprime = 01

The soft constraints A and D were grouped in the Cluster 5 (ie A and D were treated aspseudo-observations in the model) There is no reliability in terms of outlier identification for theconstraints because the residual correlation between them is ρwi wj = 100 as can be seen in Table 9for y13 and y14 However these soft constraints are able to detect an outlier In that case the probabilityof correct detection PCD in Figure 7 is mainly caused by the statistical overlap Pol as can be seen in theexample of the case of σc = 10mm in Figure 17 From Table 10 we observe that the larger the relaxationof the constraints the larger the MDB Note that the values of MDB are given in σ and therefore theMDB for σc = 10mm is larger than σc = 1mm and σc = 01mm ie we had the following inequalityMDB = 68σc = 68times 10mm = 68mm gt MDB = 88σc = 88times 1mm = 88mm gt MDB = 57σc =

57times 01mm = 57mm In that case if the FWE (αprime) were increased the rate of outlier detection by theCluster 4 (ie by the soft constraints) would increase

Figure 17 Probabilities of correct detection PCD and statistical overlap Pol for the two soft constraintsA and D (Cluster 5) with σc = 10mm and for αprime = 0001

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

35 of 40

Similar effects of the relaxation of the constraints on the performance of the IDS in case of two softconstraints are verified in case of three soft constraints as can be seen in the Figures 10 11 12 and 13

In case of having three soft constraints in Figure 3(e) there is reliability in terms of outlieridentification for the three pseudo-observations y13 y14 and y15 (ie for AD and G) as can be seen inFigure 10 and Table 15 In that case we also observe that the probabilities of correct detection PCD ofthe soft constraints A and D (ie Cluster 5) were approximately 13 for σc = 10mm 16 for σc = 1mmand 24 for σc = 01mm larger than the scenario of the network subject to two soft constraints Table15 reveals that the advantage of having three soft constraints instead of two constraints is that theconstraints become identifiable in the presence of an outlier The behaviour of the probabilities ofcorrect detection (PCD) correct identification (PCI) and wrong exclusion (PWE) was similar to the caseof the two soft constraints

Furthermore the larger the relaxation of the constraints the smaller the residuals correlationbetween the measurements and the soft constraints and the larger the residuals correlation amongthe measurements This effect can be verified in Tables 6 7 and 8 for two soft constraints as well as inTables 11 12 and 13 for three soft constraints

In general we observe that the case of two soft constraints for σc = 01mm was comparablewith two hard constraints (see eg Table 5 and Table 10) in terms of the probability levels associatedwith IDS for the measurements (ie clusters 1 2 3 and 4) In the same way for the case of twosoft constraints with σc = 1mm or σc = 10mm the probabilities levels were similar to the one hardconstraint for that measurements with the benefit of two soft constraints having reliability in terms ofoutlier identification for the Cluster 1 Finally the three soft constraints with σc = 1mm and σc = 10mmwere comparable to the two soft constraints for that scenario of constraints relaxation wheres thethree soft constraints for σc = 01mm showed similar outcomes with three hard constraints for themeasurements (see eg Table 5 and Table 15) In that case however an advantage of the three softconstraints on the three hard constraints is the possibility of analysing the sensitivity of the constraintsIt is important to emphasised that the stochastic models of the measurements and constraints wereassumed well-known and defined for the analyzes performed here

6 Conclusions

In general the probability of correct identification (PCI) for the case of hard constraints is largerthan soft constraints It can be verified in Figure 4 Figure 6 and Figure 10 Therefore hard constraintsshould be used in the stage of pre-processing data for the purpose of identifying and removing possibleoutlying measurements In that process one should opt to set out the redundant hard constraints atpoints in the network where the smallest connections exist After identifying and removing possibleoutliers the soft constraints should be employed to propagate the uncertainties of the constraints(pseudobservations) to the model parameters during the process of least-squares estimation Thisrecommendation is valid for outlier detection and identification purpose

We highlight the main findings of this research as follows

bull Under a system of a high local redundancy ri gt 05 and low residuals correlation (ρwi wj lt 05) ifone increase the family-wise error rate (FWE) of the test statistic the performance of the procedurewill be improved for both scenarios of hard constraints and soft constraintsbull The larger the relaxation of the constraints the larger is the effect of the residuals correlation

(ρwi wj ) on the success rate of outlier identification (PCI) Consequently the higher the sensitivityindicator for outlier identification (MIB) and therefore more difficult it becomes to identify anoutlier

bull Under a scenario of soft constraints one should set out at least three soft constraints in order toidentify an outlier in the constraints

In other types of analysis for example deformation analysis of geodetic networks one shouldformulated the constraints as pseudo-observations with σc gt 0 in an adjustment model of system

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

36 of 40

equations In other words the relaxation of constraints makes it possible to estimate deformationeffects that are unmodelled in the deterministic model matrix for example deformations which have aspatial and temporal variations In that case the deformation can be modelled by covariance functionsthat determine the cofactors between constraints [28] This should be applied to control points locatedin the structure but not to control points located outside the structure The control points locatedoutside the structure should be hard constraints and have their stability verified by an independentprocess in order to analyse the deformation of the structure So one will be also able to analyse thesensitivity in terms of minimal detectable deformations andor minimal identifiable deformations [23]Future researches will be addressed to sensitivity analysis for deformation monitoring networks

Author Contributions Conceptualization VFR MTM and IK Data curation VFR MTM and IK Formalanalysis VFR and IK Funding acquisition MTM MRV and LGdSJ Investigation VFR MethodologyVFR and MTM Project administration MTM IK MRV and LGdSJ Software VFR and LGdSJSupervision MTM IK and MRV Validation VFR and MRV Writing ndash original draft VFR Writing ndashreview amp editing VFR MTM IK and MRV All authors have read and agreed to the published version of themanuscript

Funding This research was funded by the CNPqmdashConselho Nacional de Desenvolvimento Cientiacutefico eTecnoloacutegicomdashBrasil (proc no 1035872019-5) This research and the APC were also funded by PETROBRAS(Grant Number 201800545-0)

Acknowledgments In this section you can acknowledge any support given which is not covered by the authorcontribution or funding sections This may include administrative and technical support or donations in kind(eg materials used for experiments)

Conflicts of Interest The authors declare no conflict of interest

References

1 Lehmann R Neitzel F Testing the compatibility of constraints for parameters of a geodetic adjustmentmodel Journal of Geodesy 2013 87 555ndash566 doi101007s00190-013-0627-2

2 Fang X Weighted total least-squares with constraints a universal formula for geodetic symmetricaltransformations Journal of Geodesy 2015 89 459ndash469 doi101007s00190-015-0790-8

3 Amiri-Simkooei AR Weighted Total Least Squares with Singular Covariance Matrices Subject to Weightedand Hard Constraints J Surv Eng 2017 143 04017018

4 Sakic P Ballu V Royer JY A Multi-Observation Least-Squares Inversion for GNSS-Acoustic SeafloorPositioning Remote Sensing 2020 12 doi103390rs12030448

5 Odijk D Stochastic modelling of the ionosphere for fast GPS ambiguity resolution Geodesy Beyond 2000Schwarz KP Ed Springer Berlin Heidelberg Berlin Heidelberg 2000 pp 387ndash392

6 de Oliveira PS Morel L Fund F Legros R Monico JFG Durand S Durand F Modelingtropospheric wet delays with dense and sparse network configurations for PPP-RTK GPS Solutions 201721 237ndash250 doi101007s10291-016-0518-0

7 Chen B Wu L Dai W Luo X Xu Y A new parameterized approach for ionospheric tomography GPSSolutions 2019 23 96 doi101007s10291-019-0893-4

8 Ahmed E Ahmed S Mohamed D Mostafa R jag 2019 Vol 13 chapter Vertical ionospheric delayestimation for single-receiver operation p 81 2 doi101515jag-2018-0041

9 Wang A Chen J Zhang Y Meng L Wang J Performance of Selected Ionospheric Models inMulti-Global Navigation Satellite System Single-Frequency Positioning over China Remote Sensing 201911 doi103390rs11172070

10 Deo M El-Mowafy A Comparison of advanced troposphere models for aiding reductionof PPP convergence time in Australia Journal of Spatial Science 2019 64 381ndash403doi1010801449859620181472046

11 Dai X Lou Y Dai Z Hu C Peng Y Qiao J Shi C Precise Orbit Determination for GNSS ManeuveringSatellite with the Constraint of a Predicted Clock Remote Sensing 2019 11 doi103390rs11161949

12 Courant R Hilbert D Methods of mathematical physics Vol 2 Vol 2 Wiley [New York] 196213 Grafarend EW Optimization of Geodetic Networks The Canadian Surveyor 1974 28 716ndash723

doi101139tcs-1974-0120

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

37 of 40

14 Teunissen P Zero Order Design Generalized Inverses Adjustment the Datum Problem andS-Transformations Optimization and Design of Geodetic Networks Grafarend EW Sansograve F EdsSpringer Berlin Heidelberg Berlin Heidelberg 1985 pp 11ndash55

15 Dermanis A Free network solutions with the DLT method ISPRS Journal of Photogrammetry and RemoteSensing 1994 49 2 ndash 12 doihttpsdoiorg1010160924-2716(94)90061-2

16 Kotsakis C Reference frame stability and nonlinear distortion in minimum-constrained networkadjustment Journal of Geodesy 2012 86 755ndash774 doi101007s00190-012-0555-6

17 Kotsakis C Datum Definition and Minimal Constraints In Encyclopedia of Geodesy Grafarend E EdSpringer International Publishing Cham 2018 pp 1ndash6 doi101007978-3-319-02370-0_157-1

18 Matsuoka MT Rofatto VF Klein I Roberto Veronez M da Silveira LG Neto JBS Alves ACRControl Points Selection Based on Maximum External Reliability for Designing Geodetic Networks AppliedSciences 2020 10 doi103390app10020687

19 Baarda W S-transformations and criterion matrices Publ on geodesy New Series 1973 520 Schaffrin B Aspects of Network Design Optimization and Design of Geodetic Networks Grafarend

EW Sansograve F Eds Springer Berlin Heidelberg Berlin Heidelberg 1985 pp 548ndash59721 Xu PB A general solution in geodetic nonlinear rank-defect models 199722 Altamimi Z Dermanis A The Choice of Reference System in ITRF Formulation VII Hotine-Marussi

Symposium on Mathematical Geodesy Sneeuw N Novaacutek P Crespi M Sansograve F Eds Springer BerlinHeidelberg Berlin Heidelberg 2012 pp 329ndash334

23 Velsink H On the deformation analysis of point fields J Geod 2015 89 1071ndash1087doi101007s00190-015-0835-z

24 Velsink H Extendable linearised adjustment model for deformation analysis Survey Review 201547 397ndash410 doi1011791752270614Y0000000140

25 Hiddo V jag 2016 Vol 10 chapter Time Series Analysis of 3D Coordinates Using NonstochasticObservations p 5 1 doi101515jag-2015-0027

26 Rao CR Markoffrsquos Theorem with Linear Restrictions on Parameters SankhyAuml The Indian Journal ofStatistics (1933-1960) 1945 7 16ndash19

27 Velsink H Testing Methods for Adjustment Models with Constraints Journal of Surveying Engineering2018 144 04018009 doi101061(ASCE)SU1943-54280000260

28 Hiddo V Testing deformation hypotheses by constraints on a time series of geodetic observations J ApplGeod 2017 12 77ndash93 doi101515jag-2017-0028

29 Baarda W A testing procedure for use in geodetic networks Publ on geodesy New Series 1968 230 Teunissen P First and second moments of non-linear least-squares estimators Bull Geodesique (Journal of

Geodesy) 1989 63 253ndash262 doihttpsdoiorg101007BF0252047531 Teunissen P Testing Theory an introduction 2 ed Delft University Press 200632 Amiri-Simkooei A Least-squares variance component estimation theory and GPS applications Phd

thesis Delft University of Technology 200733 Koch KR Parameter estimation and hypothesis testing in linear models 2 ed Springer 199934 Tao H Feng H Xu L Miao M Long H Yue J Li Z Yang G Yang X Fan L Estimation

of Crop Growth Parameters Using UAV-Based Hyperspectral Remote Sensing Data Sensors 2020 20doi103390s20051296

35 Han M Wang Q Wen Y He M He X The Application of Robust Least Squares Method inFrequency Lock Loop Fusion for Global Navigation Satellite System Receivers Sensors 2020 20doi103390s20041224

36 Wei C Estimation for the Discretely Observed CoxacircldquoIngersollacircldquoRoss Model Driven by Small SymmetricalStable Noises Symmetry 2020 12 doi103390sym12030327

37 Sakic P Ballu V Royer JY A Multi-Observation Least-Squares Inversion for GNSS-Acoustic SeafloorPositioning Remote Sensing 2020 12 doi103390rs12030448

38 Livadiotis G General Fitting Methods Based on Lq Norms and their Optimization Stats 2020 3 16ndash31doi103390stats3010002

39 Farooq SZ Yang D Ada ENJ A Cycle Slip Detection Framework for Reliable Single Frequency RTKPositioning Sensors 2020 20 doi103390s20010304

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

38 of 40

40 Araveeporn A Comparing Parameter Estimation of Random Coefficient Autoregressive Model byFrequentist Method Mathematics 2020 8 doi103390math8010062

41 Zhang C Peng T Zhou J Ji J Wang X An Improved Autoencoder and Partial Least SquaresRegression-Based Extreme Learning Machine Model for Pump Turbine Characteristics Applied Sciences2019 9 doi103390app9193987

42 Ji J Yang M Jiang L He J Teng Z Liu Y Song H Output-Only Parameters Identification ofEarthquake-Excited Building Structures with Least Squares and Input Modification Process AppliedSciences 2019 9 doi103390app9040696

43 Kargoll B On the theory and application of model misspecification tests in geodesy Doctoral thesisUniversity of Bonn Landwirtschaftliche Fakultaumlt German Bonn 2007

44 Lehmann R On the formulation of the alternative hypothesis for geodetic outlier detection J Geod 201387 373ndash386 doi101007s00190-012-0607-y

45 Rofatto VF Matsuoka MT Klein I Roberto Veronez M da Silveira LG A Monte Carlo-Based OutlierDiagnosis Method for Sensitivity Analysis Remote Sensing 2020 12 doi103390rs12050860

46 Lehmann R Improved critical values for extreme normalized and studentized residuals in GaussndashMarkovmodels J Geod 2012 86 1137ndash1146 doi101007s00190-012-0569-0

47 Rofatto VF Matsuoka MT Klein I Veronez MR Bonimani ML Lehmann R A half-century ofBaardarsquos concept of reliability a review new perspectives and applications Surv Rev 2018 0 1ndash17[httpsdoiorg1010800039626520181548118] doi1010800039626520181548118

48 Zaminpardaz S Teunissen P DIA-datasnooping and identifiability J Geod 2019 93 85ndash101doi101007s00190-018-1141-3

49 Kok JJ States U On data snooping and multiple outlier testing [microform] Johan J Kok US Dept ofCommerce National Oceanic and Atmospheric Administration National Ocean Service Charting andGeodetic Services For sale by the National Geodetic Information Center NOAA Rockville Md 1984 p61 p

50 Knight NL Wang J Rizos C Generalised measures of reliability for multiple outliers Journal of Geodesy2010 84 625ndash635 doi101007s00190-010-0392-4

51 Gui Q Li X Gong Y Li B Li G A Bayesian unmasking method for locating multiplegross errors based on posterior probabilities of classification variables J Geod 2011 85 191ndash203doi101007s00190-010-0429-8

52 Klein I Matsuoka MT Guzatto MP Nievinski FG An approach to identify multipleoutliers based on sequential likelihood ratio tests Surv Rev 2017 49 449ndash457[httpsdoiorg1010800039626520161212970] doi1010800039626520161212970

53 Hawkins DM Identification of Outliers 1 ed Springer Netherlands 1980 httpsdoiorg101007978-94-015-3994-4

54 Baarda W Statistical concepts in geodesy Publ on geodesy New Series 1967 255 Imparato D Teunissen P Tiberius C Minimal Detectable and Identifiable Biases for quality

control Surv Rev 2019 51 289ndash299 [httpsdoiorg1010800039626520181437947]doi1010800039626520181437947

56 Teunissen PJG Distributional theory for the DIA method Journal of Geodesy 2018 92 59ndash80doi101007s00190-017-1045-7

57 Foumlrstner W Reliability and discernability of extended Gauss-Markov models Seminar on mathematicalmodels of GeodeticPhotogrammetric Point Determination with Regard to Outliers and Systematic Errors 1983 Vol Series A pp 79ndash103

58 Wang J Knight NL New Outlier Separability Test and Its Application in GNSS Positioning 201259 Lehmann R Loumlsler M Multiple Outlier Detection Hypothesis Tests versus Model Selection by

Information Criteria J Surv Eng 2016 142 0401601760 Teunissen PJG Imparato D Tiberius CCJM Does RAIM with Correct Exclusion Produce Unbiased

Positions Sensors 2017 17 doi103390s1707150861 Lehmann R Loumlsler M Congruence analysis of geodetic networks ndash hypothesis tests versus model

selection by information criteria J Appl Geod 2017 11 271ndash283 doi101515jag-2016-0049

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

39 of 40

62 Rofatto V Matsuoka M Klein I DESIGN OF GEODETIC NETWORKS BASED ON OUTLIERIDENTIFICATION CRITERIA AN EXAMPLE APPLIED TO THE LEVELING NETWORK Bull Geod Sci2018 24 152ndash170 httpdxdoiorg101590s1982-21702018000200011

63 Nguyen VK Renault Milocco R Environment Monitoring for Anomaly Detection System UsingSmartphones Sensors 2019 19 doi103390s19183834

64 Nie Y Yang L Shen Y Specific Direction-Based Outlier Detection Approach for GNSS Vector NetworksSensors 2019 19 doi103390s19081836

65 Klein I Matsuoka MT Guzatto MP Nievinski FG Veronez MR Rofatto VF A newrelationship between the quality criteria for geodetic networks Journal of Geodesy 2019 93 529ndash544doi101007s00190-018-1181-8

66 Mierlo JV Statistical Analysis of Geodetic Measurements for the Investigation of Crustal Movements InRecent Crustal Movements 1977 Whitten C Green R Meade B Eds Elsevier 1979 Vol 13 Developmentsin Geotectonics pp 457 ndash 467 doihttpsdoiorg101016B978-0-444-41783-150072-6

67 van der Marel H Roumlsters AJM Statistical Testing and Quality Analysis in 3-D Networks (part II)Application to GPS Global Positioning System An Overview Bock Y Leppard N Eds Springer NewYork New York NY 1990 pp 290ndash297

68 Yang L Wang J Knight NL Shen Y Outlier separability analysis with a multiple alternative hypothesestest J Geod 2013 87 591ndash604 doi101007s00190-013-0629-0

69 Proacuteszynski W Revisiting Baardarsquos concept of minimal detectable bias with regard to outlier identifiabilityJ Geod 2015 89 993ndash1003 doi101007s00190-015-0828-y

70 Romano JP Wolf M Multiple Testing of One-Sided Hypotheses Combining Bonferroni and the BootstrapPredictive Econometrics and Big Data Kreinovich V Sriboonchitta S Chakpitak N Eds SpringerInternational Publishing Cham 2018 pp 78ndash94

71 Teunissen P An Integrity and Quality Control Procedure for use in Multi Sensor Integration 199072 Aydin C Demirel H Computation of Baardarsquos lower bound of the non-centrality parameter J Geod

2004 78 437ndash441 doi101007s00190-004-0406-173 Aydin C Power of Global Test in Deformation Analysis J Surv Eng 2012 138 51ndash56 https

doiorg101061(ASCE)SU1943-5428000006474 Bonferroni C Teoria statistica delle classi e calcolo delle probabilitagrave Pubblicazioni del R Istituto superiore di

scienze economiche e commerciali di Firenze Libreria internazionale Seeber 193675 Box GEP Muller ME A Note on the Generation of Random Normal Deviates The Annals of Mathematical

Statistics 1958 29 610ndash611 httpsdoi101214aoms117770664576 Lemeshko BY Lemeshko SB Extending the Application of Grubbs-Type Tests in Rejecting Anomalous

Measurements Measurement Techniques 2005 48 536ndash547 doi101007s11018-005-0179-977 Lehmann R Scheffler T Monte Carlo based data snooping with application to a geodetic network J

Appl Geod 2011 5 123ndash134 httpsdoiorg101515JAG201101478 Matsumoto M Nishimura T Mersenne twister A 623-dimensionally equidistributed uniform

pseudo-random number generator ACM Transactions on Modeling and Computer Simulation 1998 8 3ndash30httpsdlacmorgcitationcfmid=272991

79 Ihde J Saacutenchez L Barzaghi R Drewes H Foerste C Gruber T Liebsch G Marti U Pail R SiderisM Definition and Proposed Realization of the International Height Reference System (IHRS) Surveys inGeophysics 2017 38 549ndash570 doi101007s10712-017-9409-3

80 Lehmann R Observation error model selection by information criteria vs normality testing StudiaGeophysica et Geodaetica 2015 59 489ndash504 doi101007s11200-015-0725-0

81 Algarni DA Ali AE Heighting and Distance Accuracy with Electronic Digital Levels Journal of King SaudUniversity - Engineering Sciences 1998 10 229 ndash 239 doihttpsdoiorg101016S1018-3639(18)30698-6

82 Gemin ARS Matos AS Faggion PLAs APPLICATION OF CALIBRATION CERTIFICATE OFDIGITAL LEVELING SYSTEMS IN THE MONITORING OF STRUCTURES A CASE STUDY AT THEGOVERNADOR JOSAtilde RICHA HYDROELECTRIC POWER PLANT - PR Boletim de CiAtildeGeodAtilde 201824 235 ndash 249

83 Takalo M Rouhiainen P Development of a System Calibration Comparator for Digital Levels in FinlandNordic Journal of Surveying and Real Estate Research 1 1

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

40 of 40

84 Wiedemann W Wagner A Wunderlich T Using IATS to Read and Analyze Digital Levelling Staffs SIG2016 Paar R Marendic A Zrinjski M Eds Croatian Geodetic Society 2016 pp 515ndash526

85 PJG T The Geometry of Geodetic Inverse Linear Mapping and Non-linear Adjustment Publications onGeodesy New Series 1985 8

86 Rao CR Mitra SK Generalized inverse of a matrix and its applications 197287 Hekimoglu S Erenoglu RC Sanli DU Erdogan B Detecting Configuration Weaknesses in Geodetic

Networks Survey Review 2011 43 713ndash730 doi101179003962611X13117748892632

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

Page 30: Effects of Hard and Soft Equality Constraints on

30 of 40

(c) (d)

(a) (b)

Figure 12 Probability of wrong exclusion PWE for the measurements subject to the scenarios of threesoft constraints and for αprime = 0001 Cluster 1(a) Cluster 2(b) Cluster 3(c) and Cluster 4(d)

(a) (b)

Figure 13 Probability of wrong exclusion (PWE) for the three constraints and for αprime = 0001 Cluster5(a) and Cluster 6(b)

The over-identification cases (Pover+ and Poverminus) and the statistical overlap (Pol) were alsopractically null for that case of three soft constraints

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

31 of 40

The sensitivity indicators (MDB and MIB) for each scenario of three soft constraints are displayedin Table 15

Table 15 MDB and MIB for the case of three soft constraints based on αprime = 0001 and PCD = PCI = 08

σc = 10mm σc = 1mm σc = 01mmCluster MDB (σ) MIB (σ) MDB (σ) MIB (σ) MDB (σ) MIB (σ)

1 75 22 68 69 58 592 68 69 66 67 64 643 64 64 63 63 58 584 63 63 63 63 63 635 59 60 74 75 435 456 59 59 69 69 346 355

5 Discussion

We start from the scenario of one hard constraint in Figure 3(a) Table 4 reveal that the maximumcorrelation between w-test statistics for the measurements constituting Cluster 1 is exactly equal to100 (ie maxρwi wj

= 100) This means that the measurements belonging to Cluster 1 are those thatare connected with unknown heights whose connections are limited to only two The both unknownheights A and D are tied only to two measurements (ie y1 and y6 linked to A and y3 and y4 linked toD) therefore if an outlier occurred in one of these measurements we would only be able to analyse theconsistency between them but we would not be able to distinguish which of them was contaminatedby an outlier This means that we would only be able to detect because the w-test statistics could belarger than a critical value k however in that case the values of w-test statistics would be the same andtherefore we would not have only one unique maximum w-test statistics but actually would have fourmaximum w-test statistics In other words the equation systems associated with the measurements ofCluster 1 are linearly dependent [87] Therefore there is no reliability in terms of outlier identificationfor the Cluster 1 as can be seen in Figure 3(a)

From Figure 3(b) we can note that there is reliability in terms of outlier detection for Cluster 1and it is caused by overlapping w-test statistics The probability of statistics overlap Pol for Cluster 1in this scenario of minimally constrained network is displayed in Figure 14

Figure 14 Probability of correct detection PCD and statistical overlap (Pol) for the Cluster 1 subject toone hard constraint and for αprime = 0001

The problem of not having more connections (ie more measurements) for the unknown heightsA and D in the case of one hard constraint with G fixed is overcome when these heights (A and D) aretaken as hard constraints in 3(b) or when the heights A D and G are hard constraints in figure 3(c)Figure 3(a b) show that the measurements of Cluster 1 are now able to identify an outlier when twohard constraints (A and D fixed) are in play It is also verified to the case of three hard constraints (A Dand G fixes) in Figure 3(e f) ie there is reliability in terms of both outlier detection and identificationfor these measurements in those conditions

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

32 of 40

One can argue that the more constraints the larger the probability of correct identification (PCI)and correct detection (PCD) This claim is not entirely correct This may be true in case where all pointson the network have the same number of connections However in the example given if both heightsB and G were selected as hard constraints the measurements of the Cluster 1 (ie y1 y6 y3 and y4)would still have their least-squares residuals fully correlated (ie ρwy1 wy6

= 100 and ρwy3 wy4= 100)

and therefore there would be no reliability in terms of outlier identification for these measurementsFrom Table 5 we observe different behaviour for the clusters as follows

bull Cluster 1 there was no MIB for the case of having only one single hard constraint whereasMDB = MIB for the other cases However both MDB and MIB decrease significantly with theincrease in the number of hard constraints

bull Cluster 2 MDB slightly smaller than MIB Both MDB and MIB were practically the same for thecase of having two or three hard constraints

bull Cluster 3 MDB = MIB for all cases of hard constraints however both MDB and MIB decreasesignificantly with the increase in the number of hard constraints

bull Cluster 4 MDB e MIB were equal for all cases

In terms of outlier detection and identification therefore Cluster 1 was more sensitive toconstraints Cluster 3 relatively sensitive to constraints whereas the Cluster 4 completely insensitiveto constraints and Cluster 2 relatively insensitive to constraints This is also can be seen in Figure 4The reason for this is that the local redundancy (ri) of the Cluster 1 increased with the increase of thenumber of hard constraints whereas the Cluster 4 have remained the same as can be seen in Table 4

Leaving aside the cases of statistical overlap (Pol) the network presents low least-squares residualscorrelation (ρwi wj lt 05) and high local redundancy (ri gt 05) Because of this the probabilities ofwrong exclusion (PWE) were less than 1 as can be seen in Figure 5 The over-identification cases(Pover+ and Poverminus) were practically null Consequently PCI asymp PCD Due of this fact the family-wiseerror rate (αprime) should be increased in order to have more success rate in the outlier detection andidentification [45]

From Figure 15 we observe that increasing the αprime increases the both probabilities of correctidentification (PCI) and correct detection (PCD) for outlier magnitude from 5σ to 6σ in the case ofthree hard constraints and from 5σ to 68σ in the case of two hard constraints Although the rates ofPover+ and PWE also increase they are not significant when compared to the improvement of correctidentification (PCI) and detection (PCD) This same analysis can be done for the other clusters

In terms of soft constraints for the cases of two constraints in Figure 3(d) we observe from Table 9that the larger the relaxation of the constraint (ie the larger the standard-deviation of the constraintσc) the larger the residuals correlation (ρwi wj ) and the standard-deviation of the outlier σnablai and thesmaller the local redundancy (ri) Consequently the probabilities of correct identification (PCI) anddetection (PCD) get smaller and smaller with the relaxation of the constraints whereas the probabilityof wrong exclusion gets larger (PWE) This can be more clearly verified in Figure 6(a b) and Figure8(a) for Cluster 1 whose measurements are connected with the constraints A and D (ie y13 and y14 inTable 9 respectively)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

33 of 40

(a) (b)

(c) (d)

Figure 15 Probabilities of correct identification PCI (a) correct detection PCD (b) over-identificationpositive Pover+ (c) and wrong exclusion PWE for the Cluster 1 subject to two and three hard constraintsand for αprime = 0001 and αprime = 01

Note from Figure 8(a) that the probability of wrong exclusion (PWE) increases as the magnitude ofthe outlier (nablai) increases However this is true up to a certain limit of outlier magnitude The effect ofresiduals correlation ρwi wj on the rates of wrong exclusion (PWE) and correct identification (PCI) tendsto decrease with the increase in the magnitude of the outlier nablai This effect is more clearly verified forCluster 1 in case where the precision of the constraints are ten times worse than the measurementsσc = 10σ = 10mm

Note from Figure 6 that identifying an outlier in the Cluster 1 (ie y1 y3 y4 and y6) whenσc = 10mm is more difficult than the other clusters This is due to the fact the Cluster 1 has a higherresiduals correlation ρwi wj = 0994 than other clusters In general therefore we observe that the largerthe relaxation of the constraints the larger is the effect of the correlation ρwi wj on the success rate ofoutlier identification (PCI) Consequently the higher the sensitivity indicator for outlier identification(MIB) Table 10 reveals that the ratio between MIB and MDB for the Cluster 1 and for the scenariowhere the standard-deviations of that two soft constraints are σc = 10mm is MIBMDB=2575=33On the other hand the relationship between MIB and MDB is practically one (ie MIBMDB=10) forthe others scenarios

If the FWE rate (αprime) were increased for the case where the two soft constraints of σc = 10mmare in play we would not have great advantages for Cluster 1 due to its high residuals correlation(ρwi wj = 994) From Figure 16 we can observe that the probability of correct identification (PCI)for outlier magnitudes from 5σ to 8σ is effectively larger to a user-defined αprime = 01 than αprime = 0001However the success rate is still less than 80 ie PCI lt 08 Note for example the correct

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

34 of 40

identification rate is PCI = 56 for an outlier magnitude of nablai = 8σ and αprime = 01 For αprime = 01 theMIB = 335σ = 335mm whereas for αprime = 0001 is MIB = 25σ = 25mm Therefore in that case theMIB for PCI = 08(80) and αprime = 01 would be 34 larger than user-defined αprime = 0001

Figure 16 Probabilities of correct identification (PCI) for the Cluster 1 subject to two soft constraints (2sc) A and D for αprime = 0001 and αprime = 01

The soft constraints A and D were grouped in the Cluster 5 (ie A and D were treated aspseudo-observations in the model) There is no reliability in terms of outlier identification for theconstraints because the residual correlation between them is ρwi wj = 100 as can be seen in Table 9for y13 and y14 However these soft constraints are able to detect an outlier In that case the probabilityof correct detection PCD in Figure 7 is mainly caused by the statistical overlap Pol as can be seen in theexample of the case of σc = 10mm in Figure 17 From Table 10 we observe that the larger the relaxationof the constraints the larger the MDB Note that the values of MDB are given in σ and therefore theMDB for σc = 10mm is larger than σc = 1mm and σc = 01mm ie we had the following inequalityMDB = 68σc = 68times 10mm = 68mm gt MDB = 88σc = 88times 1mm = 88mm gt MDB = 57σc =

57times 01mm = 57mm In that case if the FWE (αprime) were increased the rate of outlier detection by theCluster 4 (ie by the soft constraints) would increase

Figure 17 Probabilities of correct detection PCD and statistical overlap Pol for the two soft constraintsA and D (Cluster 5) with σc = 10mm and for αprime = 0001

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

35 of 40

Similar effects of the relaxation of the constraints on the performance of the IDS in case of two softconstraints are verified in case of three soft constraints as can be seen in the Figures 10 11 12 and 13

In case of having three soft constraints in Figure 3(e) there is reliability in terms of outlieridentification for the three pseudo-observations y13 y14 and y15 (ie for AD and G) as can be seen inFigure 10 and Table 15 In that case we also observe that the probabilities of correct detection PCD ofthe soft constraints A and D (ie Cluster 5) were approximately 13 for σc = 10mm 16 for σc = 1mmand 24 for σc = 01mm larger than the scenario of the network subject to two soft constraints Table15 reveals that the advantage of having three soft constraints instead of two constraints is that theconstraints become identifiable in the presence of an outlier The behaviour of the probabilities ofcorrect detection (PCD) correct identification (PCI) and wrong exclusion (PWE) was similar to the caseof the two soft constraints

Furthermore the larger the relaxation of the constraints the smaller the residuals correlationbetween the measurements and the soft constraints and the larger the residuals correlation amongthe measurements This effect can be verified in Tables 6 7 and 8 for two soft constraints as well as inTables 11 12 and 13 for three soft constraints

In general we observe that the case of two soft constraints for σc = 01mm was comparablewith two hard constraints (see eg Table 5 and Table 10) in terms of the probability levels associatedwith IDS for the measurements (ie clusters 1 2 3 and 4) In the same way for the case of twosoft constraints with σc = 1mm or σc = 10mm the probabilities levels were similar to the one hardconstraint for that measurements with the benefit of two soft constraints having reliability in terms ofoutlier identification for the Cluster 1 Finally the three soft constraints with σc = 1mm and σc = 10mmwere comparable to the two soft constraints for that scenario of constraints relaxation wheres thethree soft constraints for σc = 01mm showed similar outcomes with three hard constraints for themeasurements (see eg Table 5 and Table 15) In that case however an advantage of the three softconstraints on the three hard constraints is the possibility of analysing the sensitivity of the constraintsIt is important to emphasised that the stochastic models of the measurements and constraints wereassumed well-known and defined for the analyzes performed here

6 Conclusions

In general the probability of correct identification (PCI) for the case of hard constraints is largerthan soft constraints It can be verified in Figure 4 Figure 6 and Figure 10 Therefore hard constraintsshould be used in the stage of pre-processing data for the purpose of identifying and removing possibleoutlying measurements In that process one should opt to set out the redundant hard constraints atpoints in the network where the smallest connections exist After identifying and removing possibleoutliers the soft constraints should be employed to propagate the uncertainties of the constraints(pseudobservations) to the model parameters during the process of least-squares estimation Thisrecommendation is valid for outlier detection and identification purpose

We highlight the main findings of this research as follows

bull Under a system of a high local redundancy ri gt 05 and low residuals correlation (ρwi wj lt 05) ifone increase the family-wise error rate (FWE) of the test statistic the performance of the procedurewill be improved for both scenarios of hard constraints and soft constraintsbull The larger the relaxation of the constraints the larger is the effect of the residuals correlation

(ρwi wj ) on the success rate of outlier identification (PCI) Consequently the higher the sensitivityindicator for outlier identification (MIB) and therefore more difficult it becomes to identify anoutlier

bull Under a scenario of soft constraints one should set out at least three soft constraints in order toidentify an outlier in the constraints

In other types of analysis for example deformation analysis of geodetic networks one shouldformulated the constraints as pseudo-observations with σc gt 0 in an adjustment model of system

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

36 of 40

equations In other words the relaxation of constraints makes it possible to estimate deformationeffects that are unmodelled in the deterministic model matrix for example deformations which have aspatial and temporal variations In that case the deformation can be modelled by covariance functionsthat determine the cofactors between constraints [28] This should be applied to control points locatedin the structure but not to control points located outside the structure The control points locatedoutside the structure should be hard constraints and have their stability verified by an independentprocess in order to analyse the deformation of the structure So one will be also able to analyse thesensitivity in terms of minimal detectable deformations andor minimal identifiable deformations [23]Future researches will be addressed to sensitivity analysis for deformation monitoring networks

Author Contributions Conceptualization VFR MTM and IK Data curation VFR MTM and IK Formalanalysis VFR and IK Funding acquisition MTM MRV and LGdSJ Investigation VFR MethodologyVFR and MTM Project administration MTM IK MRV and LGdSJ Software VFR and LGdSJSupervision MTM IK and MRV Validation VFR and MRV Writing ndash original draft VFR Writing ndashreview amp editing VFR MTM IK and MRV All authors have read and agreed to the published version of themanuscript

Funding This research was funded by the CNPqmdashConselho Nacional de Desenvolvimento Cientiacutefico eTecnoloacutegicomdashBrasil (proc no 1035872019-5) This research and the APC were also funded by PETROBRAS(Grant Number 201800545-0)

Acknowledgments In this section you can acknowledge any support given which is not covered by the authorcontribution or funding sections This may include administrative and technical support or donations in kind(eg materials used for experiments)

Conflicts of Interest The authors declare no conflict of interest

References

1 Lehmann R Neitzel F Testing the compatibility of constraints for parameters of a geodetic adjustmentmodel Journal of Geodesy 2013 87 555ndash566 doi101007s00190-013-0627-2

2 Fang X Weighted total least-squares with constraints a universal formula for geodetic symmetricaltransformations Journal of Geodesy 2015 89 459ndash469 doi101007s00190-015-0790-8

3 Amiri-Simkooei AR Weighted Total Least Squares with Singular Covariance Matrices Subject to Weightedand Hard Constraints J Surv Eng 2017 143 04017018

4 Sakic P Ballu V Royer JY A Multi-Observation Least-Squares Inversion for GNSS-Acoustic SeafloorPositioning Remote Sensing 2020 12 doi103390rs12030448

5 Odijk D Stochastic modelling of the ionosphere for fast GPS ambiguity resolution Geodesy Beyond 2000Schwarz KP Ed Springer Berlin Heidelberg Berlin Heidelberg 2000 pp 387ndash392

6 de Oliveira PS Morel L Fund F Legros R Monico JFG Durand S Durand F Modelingtropospheric wet delays with dense and sparse network configurations for PPP-RTK GPS Solutions 201721 237ndash250 doi101007s10291-016-0518-0

7 Chen B Wu L Dai W Luo X Xu Y A new parameterized approach for ionospheric tomography GPSSolutions 2019 23 96 doi101007s10291-019-0893-4

8 Ahmed E Ahmed S Mohamed D Mostafa R jag 2019 Vol 13 chapter Vertical ionospheric delayestimation for single-receiver operation p 81 2 doi101515jag-2018-0041

9 Wang A Chen J Zhang Y Meng L Wang J Performance of Selected Ionospheric Models inMulti-Global Navigation Satellite System Single-Frequency Positioning over China Remote Sensing 201911 doi103390rs11172070

10 Deo M El-Mowafy A Comparison of advanced troposphere models for aiding reductionof PPP convergence time in Australia Journal of Spatial Science 2019 64 381ndash403doi1010801449859620181472046

11 Dai X Lou Y Dai Z Hu C Peng Y Qiao J Shi C Precise Orbit Determination for GNSS ManeuveringSatellite with the Constraint of a Predicted Clock Remote Sensing 2019 11 doi103390rs11161949

12 Courant R Hilbert D Methods of mathematical physics Vol 2 Vol 2 Wiley [New York] 196213 Grafarend EW Optimization of Geodetic Networks The Canadian Surveyor 1974 28 716ndash723

doi101139tcs-1974-0120

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

37 of 40

14 Teunissen P Zero Order Design Generalized Inverses Adjustment the Datum Problem andS-Transformations Optimization and Design of Geodetic Networks Grafarend EW Sansograve F EdsSpringer Berlin Heidelberg Berlin Heidelberg 1985 pp 11ndash55

15 Dermanis A Free network solutions with the DLT method ISPRS Journal of Photogrammetry and RemoteSensing 1994 49 2 ndash 12 doihttpsdoiorg1010160924-2716(94)90061-2

16 Kotsakis C Reference frame stability and nonlinear distortion in minimum-constrained networkadjustment Journal of Geodesy 2012 86 755ndash774 doi101007s00190-012-0555-6

17 Kotsakis C Datum Definition and Minimal Constraints In Encyclopedia of Geodesy Grafarend E EdSpringer International Publishing Cham 2018 pp 1ndash6 doi101007978-3-319-02370-0_157-1

18 Matsuoka MT Rofatto VF Klein I Roberto Veronez M da Silveira LG Neto JBS Alves ACRControl Points Selection Based on Maximum External Reliability for Designing Geodetic Networks AppliedSciences 2020 10 doi103390app10020687

19 Baarda W S-transformations and criterion matrices Publ on geodesy New Series 1973 520 Schaffrin B Aspects of Network Design Optimization and Design of Geodetic Networks Grafarend

EW Sansograve F Eds Springer Berlin Heidelberg Berlin Heidelberg 1985 pp 548ndash59721 Xu PB A general solution in geodetic nonlinear rank-defect models 199722 Altamimi Z Dermanis A The Choice of Reference System in ITRF Formulation VII Hotine-Marussi

Symposium on Mathematical Geodesy Sneeuw N Novaacutek P Crespi M Sansograve F Eds Springer BerlinHeidelberg Berlin Heidelberg 2012 pp 329ndash334

23 Velsink H On the deformation analysis of point fields J Geod 2015 89 1071ndash1087doi101007s00190-015-0835-z

24 Velsink H Extendable linearised adjustment model for deformation analysis Survey Review 201547 397ndash410 doi1011791752270614Y0000000140

25 Hiddo V jag 2016 Vol 10 chapter Time Series Analysis of 3D Coordinates Using NonstochasticObservations p 5 1 doi101515jag-2015-0027

26 Rao CR Markoffrsquos Theorem with Linear Restrictions on Parameters SankhyAuml The Indian Journal ofStatistics (1933-1960) 1945 7 16ndash19

27 Velsink H Testing Methods for Adjustment Models with Constraints Journal of Surveying Engineering2018 144 04018009 doi101061(ASCE)SU1943-54280000260

28 Hiddo V Testing deformation hypotheses by constraints on a time series of geodetic observations J ApplGeod 2017 12 77ndash93 doi101515jag-2017-0028

29 Baarda W A testing procedure for use in geodetic networks Publ on geodesy New Series 1968 230 Teunissen P First and second moments of non-linear least-squares estimators Bull Geodesique (Journal of

Geodesy) 1989 63 253ndash262 doihttpsdoiorg101007BF0252047531 Teunissen P Testing Theory an introduction 2 ed Delft University Press 200632 Amiri-Simkooei A Least-squares variance component estimation theory and GPS applications Phd

thesis Delft University of Technology 200733 Koch KR Parameter estimation and hypothesis testing in linear models 2 ed Springer 199934 Tao H Feng H Xu L Miao M Long H Yue J Li Z Yang G Yang X Fan L Estimation

of Crop Growth Parameters Using UAV-Based Hyperspectral Remote Sensing Data Sensors 2020 20doi103390s20051296

35 Han M Wang Q Wen Y He M He X The Application of Robust Least Squares Method inFrequency Lock Loop Fusion for Global Navigation Satellite System Receivers Sensors 2020 20doi103390s20041224

36 Wei C Estimation for the Discretely Observed CoxacircldquoIngersollacircldquoRoss Model Driven by Small SymmetricalStable Noises Symmetry 2020 12 doi103390sym12030327

37 Sakic P Ballu V Royer JY A Multi-Observation Least-Squares Inversion for GNSS-Acoustic SeafloorPositioning Remote Sensing 2020 12 doi103390rs12030448

38 Livadiotis G General Fitting Methods Based on Lq Norms and their Optimization Stats 2020 3 16ndash31doi103390stats3010002

39 Farooq SZ Yang D Ada ENJ A Cycle Slip Detection Framework for Reliable Single Frequency RTKPositioning Sensors 2020 20 doi103390s20010304

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

38 of 40

40 Araveeporn A Comparing Parameter Estimation of Random Coefficient Autoregressive Model byFrequentist Method Mathematics 2020 8 doi103390math8010062

41 Zhang C Peng T Zhou J Ji J Wang X An Improved Autoencoder and Partial Least SquaresRegression-Based Extreme Learning Machine Model for Pump Turbine Characteristics Applied Sciences2019 9 doi103390app9193987

42 Ji J Yang M Jiang L He J Teng Z Liu Y Song H Output-Only Parameters Identification ofEarthquake-Excited Building Structures with Least Squares and Input Modification Process AppliedSciences 2019 9 doi103390app9040696

43 Kargoll B On the theory and application of model misspecification tests in geodesy Doctoral thesisUniversity of Bonn Landwirtschaftliche Fakultaumlt German Bonn 2007

44 Lehmann R On the formulation of the alternative hypothesis for geodetic outlier detection J Geod 201387 373ndash386 doi101007s00190-012-0607-y

45 Rofatto VF Matsuoka MT Klein I Roberto Veronez M da Silveira LG A Monte Carlo-Based OutlierDiagnosis Method for Sensitivity Analysis Remote Sensing 2020 12 doi103390rs12050860

46 Lehmann R Improved critical values for extreme normalized and studentized residuals in GaussndashMarkovmodels J Geod 2012 86 1137ndash1146 doi101007s00190-012-0569-0

47 Rofatto VF Matsuoka MT Klein I Veronez MR Bonimani ML Lehmann R A half-century ofBaardarsquos concept of reliability a review new perspectives and applications Surv Rev 2018 0 1ndash17[httpsdoiorg1010800039626520181548118] doi1010800039626520181548118

48 Zaminpardaz S Teunissen P DIA-datasnooping and identifiability J Geod 2019 93 85ndash101doi101007s00190-018-1141-3

49 Kok JJ States U On data snooping and multiple outlier testing [microform] Johan J Kok US Dept ofCommerce National Oceanic and Atmospheric Administration National Ocean Service Charting andGeodetic Services For sale by the National Geodetic Information Center NOAA Rockville Md 1984 p61 p

50 Knight NL Wang J Rizos C Generalised measures of reliability for multiple outliers Journal of Geodesy2010 84 625ndash635 doi101007s00190-010-0392-4

51 Gui Q Li X Gong Y Li B Li G A Bayesian unmasking method for locating multiplegross errors based on posterior probabilities of classification variables J Geod 2011 85 191ndash203doi101007s00190-010-0429-8

52 Klein I Matsuoka MT Guzatto MP Nievinski FG An approach to identify multipleoutliers based on sequential likelihood ratio tests Surv Rev 2017 49 449ndash457[httpsdoiorg1010800039626520161212970] doi1010800039626520161212970

53 Hawkins DM Identification of Outliers 1 ed Springer Netherlands 1980 httpsdoiorg101007978-94-015-3994-4

54 Baarda W Statistical concepts in geodesy Publ on geodesy New Series 1967 255 Imparato D Teunissen P Tiberius C Minimal Detectable and Identifiable Biases for quality

control Surv Rev 2019 51 289ndash299 [httpsdoiorg1010800039626520181437947]doi1010800039626520181437947

56 Teunissen PJG Distributional theory for the DIA method Journal of Geodesy 2018 92 59ndash80doi101007s00190-017-1045-7

57 Foumlrstner W Reliability and discernability of extended Gauss-Markov models Seminar on mathematicalmodels of GeodeticPhotogrammetric Point Determination with Regard to Outliers and Systematic Errors 1983 Vol Series A pp 79ndash103

58 Wang J Knight NL New Outlier Separability Test and Its Application in GNSS Positioning 201259 Lehmann R Loumlsler M Multiple Outlier Detection Hypothesis Tests versus Model Selection by

Information Criteria J Surv Eng 2016 142 0401601760 Teunissen PJG Imparato D Tiberius CCJM Does RAIM with Correct Exclusion Produce Unbiased

Positions Sensors 2017 17 doi103390s1707150861 Lehmann R Loumlsler M Congruence analysis of geodetic networks ndash hypothesis tests versus model

selection by information criteria J Appl Geod 2017 11 271ndash283 doi101515jag-2016-0049

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

39 of 40

62 Rofatto V Matsuoka M Klein I DESIGN OF GEODETIC NETWORKS BASED ON OUTLIERIDENTIFICATION CRITERIA AN EXAMPLE APPLIED TO THE LEVELING NETWORK Bull Geod Sci2018 24 152ndash170 httpdxdoiorg101590s1982-21702018000200011

63 Nguyen VK Renault Milocco R Environment Monitoring for Anomaly Detection System UsingSmartphones Sensors 2019 19 doi103390s19183834

64 Nie Y Yang L Shen Y Specific Direction-Based Outlier Detection Approach for GNSS Vector NetworksSensors 2019 19 doi103390s19081836

65 Klein I Matsuoka MT Guzatto MP Nievinski FG Veronez MR Rofatto VF A newrelationship between the quality criteria for geodetic networks Journal of Geodesy 2019 93 529ndash544doi101007s00190-018-1181-8

66 Mierlo JV Statistical Analysis of Geodetic Measurements for the Investigation of Crustal Movements InRecent Crustal Movements 1977 Whitten C Green R Meade B Eds Elsevier 1979 Vol 13 Developmentsin Geotectonics pp 457 ndash 467 doihttpsdoiorg101016B978-0-444-41783-150072-6

67 van der Marel H Roumlsters AJM Statistical Testing and Quality Analysis in 3-D Networks (part II)Application to GPS Global Positioning System An Overview Bock Y Leppard N Eds Springer NewYork New York NY 1990 pp 290ndash297

68 Yang L Wang J Knight NL Shen Y Outlier separability analysis with a multiple alternative hypothesestest J Geod 2013 87 591ndash604 doi101007s00190-013-0629-0

69 Proacuteszynski W Revisiting Baardarsquos concept of minimal detectable bias with regard to outlier identifiabilityJ Geod 2015 89 993ndash1003 doi101007s00190-015-0828-y

70 Romano JP Wolf M Multiple Testing of One-Sided Hypotheses Combining Bonferroni and the BootstrapPredictive Econometrics and Big Data Kreinovich V Sriboonchitta S Chakpitak N Eds SpringerInternational Publishing Cham 2018 pp 78ndash94

71 Teunissen P An Integrity and Quality Control Procedure for use in Multi Sensor Integration 199072 Aydin C Demirel H Computation of Baardarsquos lower bound of the non-centrality parameter J Geod

2004 78 437ndash441 doi101007s00190-004-0406-173 Aydin C Power of Global Test in Deformation Analysis J Surv Eng 2012 138 51ndash56 https

doiorg101061(ASCE)SU1943-5428000006474 Bonferroni C Teoria statistica delle classi e calcolo delle probabilitagrave Pubblicazioni del R Istituto superiore di

scienze economiche e commerciali di Firenze Libreria internazionale Seeber 193675 Box GEP Muller ME A Note on the Generation of Random Normal Deviates The Annals of Mathematical

Statistics 1958 29 610ndash611 httpsdoi101214aoms117770664576 Lemeshko BY Lemeshko SB Extending the Application of Grubbs-Type Tests in Rejecting Anomalous

Measurements Measurement Techniques 2005 48 536ndash547 doi101007s11018-005-0179-977 Lehmann R Scheffler T Monte Carlo based data snooping with application to a geodetic network J

Appl Geod 2011 5 123ndash134 httpsdoiorg101515JAG201101478 Matsumoto M Nishimura T Mersenne twister A 623-dimensionally equidistributed uniform

pseudo-random number generator ACM Transactions on Modeling and Computer Simulation 1998 8 3ndash30httpsdlacmorgcitationcfmid=272991

79 Ihde J Saacutenchez L Barzaghi R Drewes H Foerste C Gruber T Liebsch G Marti U Pail R SiderisM Definition and Proposed Realization of the International Height Reference System (IHRS) Surveys inGeophysics 2017 38 549ndash570 doi101007s10712-017-9409-3

80 Lehmann R Observation error model selection by information criteria vs normality testing StudiaGeophysica et Geodaetica 2015 59 489ndash504 doi101007s11200-015-0725-0

81 Algarni DA Ali AE Heighting and Distance Accuracy with Electronic Digital Levels Journal of King SaudUniversity - Engineering Sciences 1998 10 229 ndash 239 doihttpsdoiorg101016S1018-3639(18)30698-6

82 Gemin ARS Matos AS Faggion PLAs APPLICATION OF CALIBRATION CERTIFICATE OFDIGITAL LEVELING SYSTEMS IN THE MONITORING OF STRUCTURES A CASE STUDY AT THEGOVERNADOR JOSAtilde RICHA HYDROELECTRIC POWER PLANT - PR Boletim de CiAtildeGeodAtilde 201824 235 ndash 249

83 Takalo M Rouhiainen P Development of a System Calibration Comparator for Digital Levels in FinlandNordic Journal of Surveying and Real Estate Research 1 1

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

40 of 40

84 Wiedemann W Wagner A Wunderlich T Using IATS to Read and Analyze Digital Levelling Staffs SIG2016 Paar R Marendic A Zrinjski M Eds Croatian Geodetic Society 2016 pp 515ndash526

85 PJG T The Geometry of Geodetic Inverse Linear Mapping and Non-linear Adjustment Publications onGeodesy New Series 1985 8

86 Rao CR Mitra SK Generalized inverse of a matrix and its applications 197287 Hekimoglu S Erenoglu RC Sanli DU Erdogan B Detecting Configuration Weaknesses in Geodetic

Networks Survey Review 2011 43 713ndash730 doi101179003962611X13117748892632

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

Page 31: Effects of Hard and Soft Equality Constraints on

31 of 40

The sensitivity indicators (MDB and MIB) for each scenario of three soft constraints are displayedin Table 15

Table 15 MDB and MIB for the case of three soft constraints based on αprime = 0001 and PCD = PCI = 08

σc = 10mm σc = 1mm σc = 01mmCluster MDB (σ) MIB (σ) MDB (σ) MIB (σ) MDB (σ) MIB (σ)

1 75 22 68 69 58 592 68 69 66 67 64 643 64 64 63 63 58 584 63 63 63 63 63 635 59 60 74 75 435 456 59 59 69 69 346 355

5 Discussion

We start from the scenario of one hard constraint in Figure 3(a) Table 4 reveal that the maximumcorrelation between w-test statistics for the measurements constituting Cluster 1 is exactly equal to100 (ie maxρwi wj

= 100) This means that the measurements belonging to Cluster 1 are those thatare connected with unknown heights whose connections are limited to only two The both unknownheights A and D are tied only to two measurements (ie y1 and y6 linked to A and y3 and y4 linked toD) therefore if an outlier occurred in one of these measurements we would only be able to analyse theconsistency between them but we would not be able to distinguish which of them was contaminatedby an outlier This means that we would only be able to detect because the w-test statistics could belarger than a critical value k however in that case the values of w-test statistics would be the same andtherefore we would not have only one unique maximum w-test statistics but actually would have fourmaximum w-test statistics In other words the equation systems associated with the measurements ofCluster 1 are linearly dependent [87] Therefore there is no reliability in terms of outlier identificationfor the Cluster 1 as can be seen in Figure 3(a)

From Figure 3(b) we can note that there is reliability in terms of outlier detection for Cluster 1and it is caused by overlapping w-test statistics The probability of statistics overlap Pol for Cluster 1in this scenario of minimally constrained network is displayed in Figure 14

Figure 14 Probability of correct detection PCD and statistical overlap (Pol) for the Cluster 1 subject toone hard constraint and for αprime = 0001

The problem of not having more connections (ie more measurements) for the unknown heightsA and D in the case of one hard constraint with G fixed is overcome when these heights (A and D) aretaken as hard constraints in 3(b) or when the heights A D and G are hard constraints in figure 3(c)Figure 3(a b) show that the measurements of Cluster 1 are now able to identify an outlier when twohard constraints (A and D fixed) are in play It is also verified to the case of three hard constraints (A Dand G fixes) in Figure 3(e f) ie there is reliability in terms of both outlier detection and identificationfor these measurements in those conditions

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

32 of 40

One can argue that the more constraints the larger the probability of correct identification (PCI)and correct detection (PCD) This claim is not entirely correct This may be true in case where all pointson the network have the same number of connections However in the example given if both heightsB and G were selected as hard constraints the measurements of the Cluster 1 (ie y1 y6 y3 and y4)would still have their least-squares residuals fully correlated (ie ρwy1 wy6

= 100 and ρwy3 wy4= 100)

and therefore there would be no reliability in terms of outlier identification for these measurementsFrom Table 5 we observe different behaviour for the clusters as follows

bull Cluster 1 there was no MIB for the case of having only one single hard constraint whereasMDB = MIB for the other cases However both MDB and MIB decrease significantly with theincrease in the number of hard constraints

bull Cluster 2 MDB slightly smaller than MIB Both MDB and MIB were practically the same for thecase of having two or three hard constraints

bull Cluster 3 MDB = MIB for all cases of hard constraints however both MDB and MIB decreasesignificantly with the increase in the number of hard constraints

bull Cluster 4 MDB e MIB were equal for all cases

In terms of outlier detection and identification therefore Cluster 1 was more sensitive toconstraints Cluster 3 relatively sensitive to constraints whereas the Cluster 4 completely insensitiveto constraints and Cluster 2 relatively insensitive to constraints This is also can be seen in Figure 4The reason for this is that the local redundancy (ri) of the Cluster 1 increased with the increase of thenumber of hard constraints whereas the Cluster 4 have remained the same as can be seen in Table 4

Leaving aside the cases of statistical overlap (Pol) the network presents low least-squares residualscorrelation (ρwi wj lt 05) and high local redundancy (ri gt 05) Because of this the probabilities ofwrong exclusion (PWE) were less than 1 as can be seen in Figure 5 The over-identification cases(Pover+ and Poverminus) were practically null Consequently PCI asymp PCD Due of this fact the family-wiseerror rate (αprime) should be increased in order to have more success rate in the outlier detection andidentification [45]

From Figure 15 we observe that increasing the αprime increases the both probabilities of correctidentification (PCI) and correct detection (PCD) for outlier magnitude from 5σ to 6σ in the case ofthree hard constraints and from 5σ to 68σ in the case of two hard constraints Although the rates ofPover+ and PWE also increase they are not significant when compared to the improvement of correctidentification (PCI) and detection (PCD) This same analysis can be done for the other clusters

In terms of soft constraints for the cases of two constraints in Figure 3(d) we observe from Table 9that the larger the relaxation of the constraint (ie the larger the standard-deviation of the constraintσc) the larger the residuals correlation (ρwi wj ) and the standard-deviation of the outlier σnablai and thesmaller the local redundancy (ri) Consequently the probabilities of correct identification (PCI) anddetection (PCD) get smaller and smaller with the relaxation of the constraints whereas the probabilityof wrong exclusion gets larger (PWE) This can be more clearly verified in Figure 6(a b) and Figure8(a) for Cluster 1 whose measurements are connected with the constraints A and D (ie y13 and y14 inTable 9 respectively)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

33 of 40

(a) (b)

(c) (d)

Figure 15 Probabilities of correct identification PCI (a) correct detection PCD (b) over-identificationpositive Pover+ (c) and wrong exclusion PWE for the Cluster 1 subject to two and three hard constraintsand for αprime = 0001 and αprime = 01

Note from Figure 8(a) that the probability of wrong exclusion (PWE) increases as the magnitude ofthe outlier (nablai) increases However this is true up to a certain limit of outlier magnitude The effect ofresiduals correlation ρwi wj on the rates of wrong exclusion (PWE) and correct identification (PCI) tendsto decrease with the increase in the magnitude of the outlier nablai This effect is more clearly verified forCluster 1 in case where the precision of the constraints are ten times worse than the measurementsσc = 10σ = 10mm

Note from Figure 6 that identifying an outlier in the Cluster 1 (ie y1 y3 y4 and y6) whenσc = 10mm is more difficult than the other clusters This is due to the fact the Cluster 1 has a higherresiduals correlation ρwi wj = 0994 than other clusters In general therefore we observe that the largerthe relaxation of the constraints the larger is the effect of the correlation ρwi wj on the success rate ofoutlier identification (PCI) Consequently the higher the sensitivity indicator for outlier identification(MIB) Table 10 reveals that the ratio between MIB and MDB for the Cluster 1 and for the scenariowhere the standard-deviations of that two soft constraints are σc = 10mm is MIBMDB=2575=33On the other hand the relationship between MIB and MDB is practically one (ie MIBMDB=10) forthe others scenarios

If the FWE rate (αprime) were increased for the case where the two soft constraints of σc = 10mmare in play we would not have great advantages for Cluster 1 due to its high residuals correlation(ρwi wj = 994) From Figure 16 we can observe that the probability of correct identification (PCI)for outlier magnitudes from 5σ to 8σ is effectively larger to a user-defined αprime = 01 than αprime = 0001However the success rate is still less than 80 ie PCI lt 08 Note for example the correct

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

34 of 40

identification rate is PCI = 56 for an outlier magnitude of nablai = 8σ and αprime = 01 For αprime = 01 theMIB = 335σ = 335mm whereas for αprime = 0001 is MIB = 25σ = 25mm Therefore in that case theMIB for PCI = 08(80) and αprime = 01 would be 34 larger than user-defined αprime = 0001

Figure 16 Probabilities of correct identification (PCI) for the Cluster 1 subject to two soft constraints (2sc) A and D for αprime = 0001 and αprime = 01

The soft constraints A and D were grouped in the Cluster 5 (ie A and D were treated aspseudo-observations in the model) There is no reliability in terms of outlier identification for theconstraints because the residual correlation between them is ρwi wj = 100 as can be seen in Table 9for y13 and y14 However these soft constraints are able to detect an outlier In that case the probabilityof correct detection PCD in Figure 7 is mainly caused by the statistical overlap Pol as can be seen in theexample of the case of σc = 10mm in Figure 17 From Table 10 we observe that the larger the relaxationof the constraints the larger the MDB Note that the values of MDB are given in σ and therefore theMDB for σc = 10mm is larger than σc = 1mm and σc = 01mm ie we had the following inequalityMDB = 68σc = 68times 10mm = 68mm gt MDB = 88σc = 88times 1mm = 88mm gt MDB = 57σc =

57times 01mm = 57mm In that case if the FWE (αprime) were increased the rate of outlier detection by theCluster 4 (ie by the soft constraints) would increase

Figure 17 Probabilities of correct detection PCD and statistical overlap Pol for the two soft constraintsA and D (Cluster 5) with σc = 10mm and for αprime = 0001

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

35 of 40

Similar effects of the relaxation of the constraints on the performance of the IDS in case of two softconstraints are verified in case of three soft constraints as can be seen in the Figures 10 11 12 and 13

In case of having three soft constraints in Figure 3(e) there is reliability in terms of outlieridentification for the three pseudo-observations y13 y14 and y15 (ie for AD and G) as can be seen inFigure 10 and Table 15 In that case we also observe that the probabilities of correct detection PCD ofthe soft constraints A and D (ie Cluster 5) were approximately 13 for σc = 10mm 16 for σc = 1mmand 24 for σc = 01mm larger than the scenario of the network subject to two soft constraints Table15 reveals that the advantage of having three soft constraints instead of two constraints is that theconstraints become identifiable in the presence of an outlier The behaviour of the probabilities ofcorrect detection (PCD) correct identification (PCI) and wrong exclusion (PWE) was similar to the caseof the two soft constraints

Furthermore the larger the relaxation of the constraints the smaller the residuals correlationbetween the measurements and the soft constraints and the larger the residuals correlation amongthe measurements This effect can be verified in Tables 6 7 and 8 for two soft constraints as well as inTables 11 12 and 13 for three soft constraints

In general we observe that the case of two soft constraints for σc = 01mm was comparablewith two hard constraints (see eg Table 5 and Table 10) in terms of the probability levels associatedwith IDS for the measurements (ie clusters 1 2 3 and 4) In the same way for the case of twosoft constraints with σc = 1mm or σc = 10mm the probabilities levels were similar to the one hardconstraint for that measurements with the benefit of two soft constraints having reliability in terms ofoutlier identification for the Cluster 1 Finally the three soft constraints with σc = 1mm and σc = 10mmwere comparable to the two soft constraints for that scenario of constraints relaxation wheres thethree soft constraints for σc = 01mm showed similar outcomes with three hard constraints for themeasurements (see eg Table 5 and Table 15) In that case however an advantage of the three softconstraints on the three hard constraints is the possibility of analysing the sensitivity of the constraintsIt is important to emphasised that the stochastic models of the measurements and constraints wereassumed well-known and defined for the analyzes performed here

6 Conclusions

In general the probability of correct identification (PCI) for the case of hard constraints is largerthan soft constraints It can be verified in Figure 4 Figure 6 and Figure 10 Therefore hard constraintsshould be used in the stage of pre-processing data for the purpose of identifying and removing possibleoutlying measurements In that process one should opt to set out the redundant hard constraints atpoints in the network where the smallest connections exist After identifying and removing possibleoutliers the soft constraints should be employed to propagate the uncertainties of the constraints(pseudobservations) to the model parameters during the process of least-squares estimation Thisrecommendation is valid for outlier detection and identification purpose

We highlight the main findings of this research as follows

bull Under a system of a high local redundancy ri gt 05 and low residuals correlation (ρwi wj lt 05) ifone increase the family-wise error rate (FWE) of the test statistic the performance of the procedurewill be improved for both scenarios of hard constraints and soft constraintsbull The larger the relaxation of the constraints the larger is the effect of the residuals correlation

(ρwi wj ) on the success rate of outlier identification (PCI) Consequently the higher the sensitivityindicator for outlier identification (MIB) and therefore more difficult it becomes to identify anoutlier

bull Under a scenario of soft constraints one should set out at least three soft constraints in order toidentify an outlier in the constraints

In other types of analysis for example deformation analysis of geodetic networks one shouldformulated the constraints as pseudo-observations with σc gt 0 in an adjustment model of system

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

36 of 40

equations In other words the relaxation of constraints makes it possible to estimate deformationeffects that are unmodelled in the deterministic model matrix for example deformations which have aspatial and temporal variations In that case the deformation can be modelled by covariance functionsthat determine the cofactors between constraints [28] This should be applied to control points locatedin the structure but not to control points located outside the structure The control points locatedoutside the structure should be hard constraints and have their stability verified by an independentprocess in order to analyse the deformation of the structure So one will be also able to analyse thesensitivity in terms of minimal detectable deformations andor minimal identifiable deformations [23]Future researches will be addressed to sensitivity analysis for deformation monitoring networks

Author Contributions Conceptualization VFR MTM and IK Data curation VFR MTM and IK Formalanalysis VFR and IK Funding acquisition MTM MRV and LGdSJ Investigation VFR MethodologyVFR and MTM Project administration MTM IK MRV and LGdSJ Software VFR and LGdSJSupervision MTM IK and MRV Validation VFR and MRV Writing ndash original draft VFR Writing ndashreview amp editing VFR MTM IK and MRV All authors have read and agreed to the published version of themanuscript

Funding This research was funded by the CNPqmdashConselho Nacional de Desenvolvimento Cientiacutefico eTecnoloacutegicomdashBrasil (proc no 1035872019-5) This research and the APC were also funded by PETROBRAS(Grant Number 201800545-0)

Acknowledgments In this section you can acknowledge any support given which is not covered by the authorcontribution or funding sections This may include administrative and technical support or donations in kind(eg materials used for experiments)

Conflicts of Interest The authors declare no conflict of interest

References

1 Lehmann R Neitzel F Testing the compatibility of constraints for parameters of a geodetic adjustmentmodel Journal of Geodesy 2013 87 555ndash566 doi101007s00190-013-0627-2

2 Fang X Weighted total least-squares with constraints a universal formula for geodetic symmetricaltransformations Journal of Geodesy 2015 89 459ndash469 doi101007s00190-015-0790-8

3 Amiri-Simkooei AR Weighted Total Least Squares with Singular Covariance Matrices Subject to Weightedand Hard Constraints J Surv Eng 2017 143 04017018

4 Sakic P Ballu V Royer JY A Multi-Observation Least-Squares Inversion for GNSS-Acoustic SeafloorPositioning Remote Sensing 2020 12 doi103390rs12030448

5 Odijk D Stochastic modelling of the ionosphere for fast GPS ambiguity resolution Geodesy Beyond 2000Schwarz KP Ed Springer Berlin Heidelberg Berlin Heidelberg 2000 pp 387ndash392

6 de Oliveira PS Morel L Fund F Legros R Monico JFG Durand S Durand F Modelingtropospheric wet delays with dense and sparse network configurations for PPP-RTK GPS Solutions 201721 237ndash250 doi101007s10291-016-0518-0

7 Chen B Wu L Dai W Luo X Xu Y A new parameterized approach for ionospheric tomography GPSSolutions 2019 23 96 doi101007s10291-019-0893-4

8 Ahmed E Ahmed S Mohamed D Mostafa R jag 2019 Vol 13 chapter Vertical ionospheric delayestimation for single-receiver operation p 81 2 doi101515jag-2018-0041

9 Wang A Chen J Zhang Y Meng L Wang J Performance of Selected Ionospheric Models inMulti-Global Navigation Satellite System Single-Frequency Positioning over China Remote Sensing 201911 doi103390rs11172070

10 Deo M El-Mowafy A Comparison of advanced troposphere models for aiding reductionof PPP convergence time in Australia Journal of Spatial Science 2019 64 381ndash403doi1010801449859620181472046

11 Dai X Lou Y Dai Z Hu C Peng Y Qiao J Shi C Precise Orbit Determination for GNSS ManeuveringSatellite with the Constraint of a Predicted Clock Remote Sensing 2019 11 doi103390rs11161949

12 Courant R Hilbert D Methods of mathematical physics Vol 2 Vol 2 Wiley [New York] 196213 Grafarend EW Optimization of Geodetic Networks The Canadian Surveyor 1974 28 716ndash723

doi101139tcs-1974-0120

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

37 of 40

14 Teunissen P Zero Order Design Generalized Inverses Adjustment the Datum Problem andS-Transformations Optimization and Design of Geodetic Networks Grafarend EW Sansograve F EdsSpringer Berlin Heidelberg Berlin Heidelberg 1985 pp 11ndash55

15 Dermanis A Free network solutions with the DLT method ISPRS Journal of Photogrammetry and RemoteSensing 1994 49 2 ndash 12 doihttpsdoiorg1010160924-2716(94)90061-2

16 Kotsakis C Reference frame stability and nonlinear distortion in minimum-constrained networkadjustment Journal of Geodesy 2012 86 755ndash774 doi101007s00190-012-0555-6

17 Kotsakis C Datum Definition and Minimal Constraints In Encyclopedia of Geodesy Grafarend E EdSpringer International Publishing Cham 2018 pp 1ndash6 doi101007978-3-319-02370-0_157-1

18 Matsuoka MT Rofatto VF Klein I Roberto Veronez M da Silveira LG Neto JBS Alves ACRControl Points Selection Based on Maximum External Reliability for Designing Geodetic Networks AppliedSciences 2020 10 doi103390app10020687

19 Baarda W S-transformations and criterion matrices Publ on geodesy New Series 1973 520 Schaffrin B Aspects of Network Design Optimization and Design of Geodetic Networks Grafarend

EW Sansograve F Eds Springer Berlin Heidelberg Berlin Heidelberg 1985 pp 548ndash59721 Xu PB A general solution in geodetic nonlinear rank-defect models 199722 Altamimi Z Dermanis A The Choice of Reference System in ITRF Formulation VII Hotine-Marussi

Symposium on Mathematical Geodesy Sneeuw N Novaacutek P Crespi M Sansograve F Eds Springer BerlinHeidelberg Berlin Heidelberg 2012 pp 329ndash334

23 Velsink H On the deformation analysis of point fields J Geod 2015 89 1071ndash1087doi101007s00190-015-0835-z

24 Velsink H Extendable linearised adjustment model for deformation analysis Survey Review 201547 397ndash410 doi1011791752270614Y0000000140

25 Hiddo V jag 2016 Vol 10 chapter Time Series Analysis of 3D Coordinates Using NonstochasticObservations p 5 1 doi101515jag-2015-0027

26 Rao CR Markoffrsquos Theorem with Linear Restrictions on Parameters SankhyAuml The Indian Journal ofStatistics (1933-1960) 1945 7 16ndash19

27 Velsink H Testing Methods for Adjustment Models with Constraints Journal of Surveying Engineering2018 144 04018009 doi101061(ASCE)SU1943-54280000260

28 Hiddo V Testing deformation hypotheses by constraints on a time series of geodetic observations J ApplGeod 2017 12 77ndash93 doi101515jag-2017-0028

29 Baarda W A testing procedure for use in geodetic networks Publ on geodesy New Series 1968 230 Teunissen P First and second moments of non-linear least-squares estimators Bull Geodesique (Journal of

Geodesy) 1989 63 253ndash262 doihttpsdoiorg101007BF0252047531 Teunissen P Testing Theory an introduction 2 ed Delft University Press 200632 Amiri-Simkooei A Least-squares variance component estimation theory and GPS applications Phd

thesis Delft University of Technology 200733 Koch KR Parameter estimation and hypothesis testing in linear models 2 ed Springer 199934 Tao H Feng H Xu L Miao M Long H Yue J Li Z Yang G Yang X Fan L Estimation

of Crop Growth Parameters Using UAV-Based Hyperspectral Remote Sensing Data Sensors 2020 20doi103390s20051296

35 Han M Wang Q Wen Y He M He X The Application of Robust Least Squares Method inFrequency Lock Loop Fusion for Global Navigation Satellite System Receivers Sensors 2020 20doi103390s20041224

36 Wei C Estimation for the Discretely Observed CoxacircldquoIngersollacircldquoRoss Model Driven by Small SymmetricalStable Noises Symmetry 2020 12 doi103390sym12030327

37 Sakic P Ballu V Royer JY A Multi-Observation Least-Squares Inversion for GNSS-Acoustic SeafloorPositioning Remote Sensing 2020 12 doi103390rs12030448

38 Livadiotis G General Fitting Methods Based on Lq Norms and their Optimization Stats 2020 3 16ndash31doi103390stats3010002

39 Farooq SZ Yang D Ada ENJ A Cycle Slip Detection Framework for Reliable Single Frequency RTKPositioning Sensors 2020 20 doi103390s20010304

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

38 of 40

40 Araveeporn A Comparing Parameter Estimation of Random Coefficient Autoregressive Model byFrequentist Method Mathematics 2020 8 doi103390math8010062

41 Zhang C Peng T Zhou J Ji J Wang X An Improved Autoencoder and Partial Least SquaresRegression-Based Extreme Learning Machine Model for Pump Turbine Characteristics Applied Sciences2019 9 doi103390app9193987

42 Ji J Yang M Jiang L He J Teng Z Liu Y Song H Output-Only Parameters Identification ofEarthquake-Excited Building Structures with Least Squares and Input Modification Process AppliedSciences 2019 9 doi103390app9040696

43 Kargoll B On the theory and application of model misspecification tests in geodesy Doctoral thesisUniversity of Bonn Landwirtschaftliche Fakultaumlt German Bonn 2007

44 Lehmann R On the formulation of the alternative hypothesis for geodetic outlier detection J Geod 201387 373ndash386 doi101007s00190-012-0607-y

45 Rofatto VF Matsuoka MT Klein I Roberto Veronez M da Silveira LG A Monte Carlo-Based OutlierDiagnosis Method for Sensitivity Analysis Remote Sensing 2020 12 doi103390rs12050860

46 Lehmann R Improved critical values for extreme normalized and studentized residuals in GaussndashMarkovmodels J Geod 2012 86 1137ndash1146 doi101007s00190-012-0569-0

47 Rofatto VF Matsuoka MT Klein I Veronez MR Bonimani ML Lehmann R A half-century ofBaardarsquos concept of reliability a review new perspectives and applications Surv Rev 2018 0 1ndash17[httpsdoiorg1010800039626520181548118] doi1010800039626520181548118

48 Zaminpardaz S Teunissen P DIA-datasnooping and identifiability J Geod 2019 93 85ndash101doi101007s00190-018-1141-3

49 Kok JJ States U On data snooping and multiple outlier testing [microform] Johan J Kok US Dept ofCommerce National Oceanic and Atmospheric Administration National Ocean Service Charting andGeodetic Services For sale by the National Geodetic Information Center NOAA Rockville Md 1984 p61 p

50 Knight NL Wang J Rizos C Generalised measures of reliability for multiple outliers Journal of Geodesy2010 84 625ndash635 doi101007s00190-010-0392-4

51 Gui Q Li X Gong Y Li B Li G A Bayesian unmasking method for locating multiplegross errors based on posterior probabilities of classification variables J Geod 2011 85 191ndash203doi101007s00190-010-0429-8

52 Klein I Matsuoka MT Guzatto MP Nievinski FG An approach to identify multipleoutliers based on sequential likelihood ratio tests Surv Rev 2017 49 449ndash457[httpsdoiorg1010800039626520161212970] doi1010800039626520161212970

53 Hawkins DM Identification of Outliers 1 ed Springer Netherlands 1980 httpsdoiorg101007978-94-015-3994-4

54 Baarda W Statistical concepts in geodesy Publ on geodesy New Series 1967 255 Imparato D Teunissen P Tiberius C Minimal Detectable and Identifiable Biases for quality

control Surv Rev 2019 51 289ndash299 [httpsdoiorg1010800039626520181437947]doi1010800039626520181437947

56 Teunissen PJG Distributional theory for the DIA method Journal of Geodesy 2018 92 59ndash80doi101007s00190-017-1045-7

57 Foumlrstner W Reliability and discernability of extended Gauss-Markov models Seminar on mathematicalmodels of GeodeticPhotogrammetric Point Determination with Regard to Outliers and Systematic Errors 1983 Vol Series A pp 79ndash103

58 Wang J Knight NL New Outlier Separability Test and Its Application in GNSS Positioning 201259 Lehmann R Loumlsler M Multiple Outlier Detection Hypothesis Tests versus Model Selection by

Information Criteria J Surv Eng 2016 142 0401601760 Teunissen PJG Imparato D Tiberius CCJM Does RAIM with Correct Exclusion Produce Unbiased

Positions Sensors 2017 17 doi103390s1707150861 Lehmann R Loumlsler M Congruence analysis of geodetic networks ndash hypothesis tests versus model

selection by information criteria J Appl Geod 2017 11 271ndash283 doi101515jag-2016-0049

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

39 of 40

62 Rofatto V Matsuoka M Klein I DESIGN OF GEODETIC NETWORKS BASED ON OUTLIERIDENTIFICATION CRITERIA AN EXAMPLE APPLIED TO THE LEVELING NETWORK Bull Geod Sci2018 24 152ndash170 httpdxdoiorg101590s1982-21702018000200011

63 Nguyen VK Renault Milocco R Environment Monitoring for Anomaly Detection System UsingSmartphones Sensors 2019 19 doi103390s19183834

64 Nie Y Yang L Shen Y Specific Direction-Based Outlier Detection Approach for GNSS Vector NetworksSensors 2019 19 doi103390s19081836

65 Klein I Matsuoka MT Guzatto MP Nievinski FG Veronez MR Rofatto VF A newrelationship between the quality criteria for geodetic networks Journal of Geodesy 2019 93 529ndash544doi101007s00190-018-1181-8

66 Mierlo JV Statistical Analysis of Geodetic Measurements for the Investigation of Crustal Movements InRecent Crustal Movements 1977 Whitten C Green R Meade B Eds Elsevier 1979 Vol 13 Developmentsin Geotectonics pp 457 ndash 467 doihttpsdoiorg101016B978-0-444-41783-150072-6

67 van der Marel H Roumlsters AJM Statistical Testing and Quality Analysis in 3-D Networks (part II)Application to GPS Global Positioning System An Overview Bock Y Leppard N Eds Springer NewYork New York NY 1990 pp 290ndash297

68 Yang L Wang J Knight NL Shen Y Outlier separability analysis with a multiple alternative hypothesestest J Geod 2013 87 591ndash604 doi101007s00190-013-0629-0

69 Proacuteszynski W Revisiting Baardarsquos concept of minimal detectable bias with regard to outlier identifiabilityJ Geod 2015 89 993ndash1003 doi101007s00190-015-0828-y

70 Romano JP Wolf M Multiple Testing of One-Sided Hypotheses Combining Bonferroni and the BootstrapPredictive Econometrics and Big Data Kreinovich V Sriboonchitta S Chakpitak N Eds SpringerInternational Publishing Cham 2018 pp 78ndash94

71 Teunissen P An Integrity and Quality Control Procedure for use in Multi Sensor Integration 199072 Aydin C Demirel H Computation of Baardarsquos lower bound of the non-centrality parameter J Geod

2004 78 437ndash441 doi101007s00190-004-0406-173 Aydin C Power of Global Test in Deformation Analysis J Surv Eng 2012 138 51ndash56 https

doiorg101061(ASCE)SU1943-5428000006474 Bonferroni C Teoria statistica delle classi e calcolo delle probabilitagrave Pubblicazioni del R Istituto superiore di

scienze economiche e commerciali di Firenze Libreria internazionale Seeber 193675 Box GEP Muller ME A Note on the Generation of Random Normal Deviates The Annals of Mathematical

Statistics 1958 29 610ndash611 httpsdoi101214aoms117770664576 Lemeshko BY Lemeshko SB Extending the Application of Grubbs-Type Tests in Rejecting Anomalous

Measurements Measurement Techniques 2005 48 536ndash547 doi101007s11018-005-0179-977 Lehmann R Scheffler T Monte Carlo based data snooping with application to a geodetic network J

Appl Geod 2011 5 123ndash134 httpsdoiorg101515JAG201101478 Matsumoto M Nishimura T Mersenne twister A 623-dimensionally equidistributed uniform

pseudo-random number generator ACM Transactions on Modeling and Computer Simulation 1998 8 3ndash30httpsdlacmorgcitationcfmid=272991

79 Ihde J Saacutenchez L Barzaghi R Drewes H Foerste C Gruber T Liebsch G Marti U Pail R SiderisM Definition and Proposed Realization of the International Height Reference System (IHRS) Surveys inGeophysics 2017 38 549ndash570 doi101007s10712-017-9409-3

80 Lehmann R Observation error model selection by information criteria vs normality testing StudiaGeophysica et Geodaetica 2015 59 489ndash504 doi101007s11200-015-0725-0

81 Algarni DA Ali AE Heighting and Distance Accuracy with Electronic Digital Levels Journal of King SaudUniversity - Engineering Sciences 1998 10 229 ndash 239 doihttpsdoiorg101016S1018-3639(18)30698-6

82 Gemin ARS Matos AS Faggion PLAs APPLICATION OF CALIBRATION CERTIFICATE OFDIGITAL LEVELING SYSTEMS IN THE MONITORING OF STRUCTURES A CASE STUDY AT THEGOVERNADOR JOSAtilde RICHA HYDROELECTRIC POWER PLANT - PR Boletim de CiAtildeGeodAtilde 201824 235 ndash 249

83 Takalo M Rouhiainen P Development of a System Calibration Comparator for Digital Levels in FinlandNordic Journal of Surveying and Real Estate Research 1 1

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

40 of 40

84 Wiedemann W Wagner A Wunderlich T Using IATS to Read and Analyze Digital Levelling Staffs SIG2016 Paar R Marendic A Zrinjski M Eds Croatian Geodetic Society 2016 pp 515ndash526

85 PJG T The Geometry of Geodetic Inverse Linear Mapping and Non-linear Adjustment Publications onGeodesy New Series 1985 8

86 Rao CR Mitra SK Generalized inverse of a matrix and its applications 197287 Hekimoglu S Erenoglu RC Sanli DU Erdogan B Detecting Configuration Weaknesses in Geodetic

Networks Survey Review 2011 43 713ndash730 doi101179003962611X13117748892632

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

Page 32: Effects of Hard and Soft Equality Constraints on

32 of 40

One can argue that the more constraints the larger the probability of correct identification (PCI)and correct detection (PCD) This claim is not entirely correct This may be true in case where all pointson the network have the same number of connections However in the example given if both heightsB and G were selected as hard constraints the measurements of the Cluster 1 (ie y1 y6 y3 and y4)would still have their least-squares residuals fully correlated (ie ρwy1 wy6

= 100 and ρwy3 wy4= 100)

and therefore there would be no reliability in terms of outlier identification for these measurementsFrom Table 5 we observe different behaviour for the clusters as follows

bull Cluster 1 there was no MIB for the case of having only one single hard constraint whereasMDB = MIB for the other cases However both MDB and MIB decrease significantly with theincrease in the number of hard constraints

bull Cluster 2 MDB slightly smaller than MIB Both MDB and MIB were practically the same for thecase of having two or three hard constraints

bull Cluster 3 MDB = MIB for all cases of hard constraints however both MDB and MIB decreasesignificantly with the increase in the number of hard constraints

bull Cluster 4 MDB e MIB were equal for all cases

In terms of outlier detection and identification therefore Cluster 1 was more sensitive toconstraints Cluster 3 relatively sensitive to constraints whereas the Cluster 4 completely insensitiveto constraints and Cluster 2 relatively insensitive to constraints This is also can be seen in Figure 4The reason for this is that the local redundancy (ri) of the Cluster 1 increased with the increase of thenumber of hard constraints whereas the Cluster 4 have remained the same as can be seen in Table 4

Leaving aside the cases of statistical overlap (Pol) the network presents low least-squares residualscorrelation (ρwi wj lt 05) and high local redundancy (ri gt 05) Because of this the probabilities ofwrong exclusion (PWE) were less than 1 as can be seen in Figure 5 The over-identification cases(Pover+ and Poverminus) were practically null Consequently PCI asymp PCD Due of this fact the family-wiseerror rate (αprime) should be increased in order to have more success rate in the outlier detection andidentification [45]

From Figure 15 we observe that increasing the αprime increases the both probabilities of correctidentification (PCI) and correct detection (PCD) for outlier magnitude from 5σ to 6σ in the case ofthree hard constraints and from 5σ to 68σ in the case of two hard constraints Although the rates ofPover+ and PWE also increase they are not significant when compared to the improvement of correctidentification (PCI) and detection (PCD) This same analysis can be done for the other clusters

In terms of soft constraints for the cases of two constraints in Figure 3(d) we observe from Table 9that the larger the relaxation of the constraint (ie the larger the standard-deviation of the constraintσc) the larger the residuals correlation (ρwi wj ) and the standard-deviation of the outlier σnablai and thesmaller the local redundancy (ri) Consequently the probabilities of correct identification (PCI) anddetection (PCD) get smaller and smaller with the relaxation of the constraints whereas the probabilityof wrong exclusion gets larger (PWE) This can be more clearly verified in Figure 6(a b) and Figure8(a) for Cluster 1 whose measurements are connected with the constraints A and D (ie y13 and y14 inTable 9 respectively)

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

33 of 40

(a) (b)

(c) (d)

Figure 15 Probabilities of correct identification PCI (a) correct detection PCD (b) over-identificationpositive Pover+ (c) and wrong exclusion PWE for the Cluster 1 subject to two and three hard constraintsand for αprime = 0001 and αprime = 01

Note from Figure 8(a) that the probability of wrong exclusion (PWE) increases as the magnitude ofthe outlier (nablai) increases However this is true up to a certain limit of outlier magnitude The effect ofresiduals correlation ρwi wj on the rates of wrong exclusion (PWE) and correct identification (PCI) tendsto decrease with the increase in the magnitude of the outlier nablai This effect is more clearly verified forCluster 1 in case where the precision of the constraints are ten times worse than the measurementsσc = 10σ = 10mm

Note from Figure 6 that identifying an outlier in the Cluster 1 (ie y1 y3 y4 and y6) whenσc = 10mm is more difficult than the other clusters This is due to the fact the Cluster 1 has a higherresiduals correlation ρwi wj = 0994 than other clusters In general therefore we observe that the largerthe relaxation of the constraints the larger is the effect of the correlation ρwi wj on the success rate ofoutlier identification (PCI) Consequently the higher the sensitivity indicator for outlier identification(MIB) Table 10 reveals that the ratio between MIB and MDB for the Cluster 1 and for the scenariowhere the standard-deviations of that two soft constraints are σc = 10mm is MIBMDB=2575=33On the other hand the relationship between MIB and MDB is practically one (ie MIBMDB=10) forthe others scenarios

If the FWE rate (αprime) were increased for the case where the two soft constraints of σc = 10mmare in play we would not have great advantages for Cluster 1 due to its high residuals correlation(ρwi wj = 994) From Figure 16 we can observe that the probability of correct identification (PCI)for outlier magnitudes from 5σ to 8σ is effectively larger to a user-defined αprime = 01 than αprime = 0001However the success rate is still less than 80 ie PCI lt 08 Note for example the correct

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

34 of 40

identification rate is PCI = 56 for an outlier magnitude of nablai = 8σ and αprime = 01 For αprime = 01 theMIB = 335σ = 335mm whereas for αprime = 0001 is MIB = 25σ = 25mm Therefore in that case theMIB for PCI = 08(80) and αprime = 01 would be 34 larger than user-defined αprime = 0001

Figure 16 Probabilities of correct identification (PCI) for the Cluster 1 subject to two soft constraints (2sc) A and D for αprime = 0001 and αprime = 01

The soft constraints A and D were grouped in the Cluster 5 (ie A and D were treated aspseudo-observations in the model) There is no reliability in terms of outlier identification for theconstraints because the residual correlation between them is ρwi wj = 100 as can be seen in Table 9for y13 and y14 However these soft constraints are able to detect an outlier In that case the probabilityof correct detection PCD in Figure 7 is mainly caused by the statistical overlap Pol as can be seen in theexample of the case of σc = 10mm in Figure 17 From Table 10 we observe that the larger the relaxationof the constraints the larger the MDB Note that the values of MDB are given in σ and therefore theMDB for σc = 10mm is larger than σc = 1mm and σc = 01mm ie we had the following inequalityMDB = 68σc = 68times 10mm = 68mm gt MDB = 88σc = 88times 1mm = 88mm gt MDB = 57σc =

57times 01mm = 57mm In that case if the FWE (αprime) were increased the rate of outlier detection by theCluster 4 (ie by the soft constraints) would increase

Figure 17 Probabilities of correct detection PCD and statistical overlap Pol for the two soft constraintsA and D (Cluster 5) with σc = 10mm and for αprime = 0001

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

35 of 40

Similar effects of the relaxation of the constraints on the performance of the IDS in case of two softconstraints are verified in case of three soft constraints as can be seen in the Figures 10 11 12 and 13

In case of having three soft constraints in Figure 3(e) there is reliability in terms of outlieridentification for the three pseudo-observations y13 y14 and y15 (ie for AD and G) as can be seen inFigure 10 and Table 15 In that case we also observe that the probabilities of correct detection PCD ofthe soft constraints A and D (ie Cluster 5) were approximately 13 for σc = 10mm 16 for σc = 1mmand 24 for σc = 01mm larger than the scenario of the network subject to two soft constraints Table15 reveals that the advantage of having three soft constraints instead of two constraints is that theconstraints become identifiable in the presence of an outlier The behaviour of the probabilities ofcorrect detection (PCD) correct identification (PCI) and wrong exclusion (PWE) was similar to the caseof the two soft constraints

Furthermore the larger the relaxation of the constraints the smaller the residuals correlationbetween the measurements and the soft constraints and the larger the residuals correlation amongthe measurements This effect can be verified in Tables 6 7 and 8 for two soft constraints as well as inTables 11 12 and 13 for three soft constraints

In general we observe that the case of two soft constraints for σc = 01mm was comparablewith two hard constraints (see eg Table 5 and Table 10) in terms of the probability levels associatedwith IDS for the measurements (ie clusters 1 2 3 and 4) In the same way for the case of twosoft constraints with σc = 1mm or σc = 10mm the probabilities levels were similar to the one hardconstraint for that measurements with the benefit of two soft constraints having reliability in terms ofoutlier identification for the Cluster 1 Finally the three soft constraints with σc = 1mm and σc = 10mmwere comparable to the two soft constraints for that scenario of constraints relaxation wheres thethree soft constraints for σc = 01mm showed similar outcomes with three hard constraints for themeasurements (see eg Table 5 and Table 15) In that case however an advantage of the three softconstraints on the three hard constraints is the possibility of analysing the sensitivity of the constraintsIt is important to emphasised that the stochastic models of the measurements and constraints wereassumed well-known and defined for the analyzes performed here

6 Conclusions

In general the probability of correct identification (PCI) for the case of hard constraints is largerthan soft constraints It can be verified in Figure 4 Figure 6 and Figure 10 Therefore hard constraintsshould be used in the stage of pre-processing data for the purpose of identifying and removing possibleoutlying measurements In that process one should opt to set out the redundant hard constraints atpoints in the network where the smallest connections exist After identifying and removing possibleoutliers the soft constraints should be employed to propagate the uncertainties of the constraints(pseudobservations) to the model parameters during the process of least-squares estimation Thisrecommendation is valid for outlier detection and identification purpose

We highlight the main findings of this research as follows

bull Under a system of a high local redundancy ri gt 05 and low residuals correlation (ρwi wj lt 05) ifone increase the family-wise error rate (FWE) of the test statistic the performance of the procedurewill be improved for both scenarios of hard constraints and soft constraintsbull The larger the relaxation of the constraints the larger is the effect of the residuals correlation

(ρwi wj ) on the success rate of outlier identification (PCI) Consequently the higher the sensitivityindicator for outlier identification (MIB) and therefore more difficult it becomes to identify anoutlier

bull Under a scenario of soft constraints one should set out at least three soft constraints in order toidentify an outlier in the constraints

In other types of analysis for example deformation analysis of geodetic networks one shouldformulated the constraints as pseudo-observations with σc gt 0 in an adjustment model of system

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

36 of 40

equations In other words the relaxation of constraints makes it possible to estimate deformationeffects that are unmodelled in the deterministic model matrix for example deformations which have aspatial and temporal variations In that case the deformation can be modelled by covariance functionsthat determine the cofactors between constraints [28] This should be applied to control points locatedin the structure but not to control points located outside the structure The control points locatedoutside the structure should be hard constraints and have their stability verified by an independentprocess in order to analyse the deformation of the structure So one will be also able to analyse thesensitivity in terms of minimal detectable deformations andor minimal identifiable deformations [23]Future researches will be addressed to sensitivity analysis for deformation monitoring networks

Author Contributions Conceptualization VFR MTM and IK Data curation VFR MTM and IK Formalanalysis VFR and IK Funding acquisition MTM MRV and LGdSJ Investigation VFR MethodologyVFR and MTM Project administration MTM IK MRV and LGdSJ Software VFR and LGdSJSupervision MTM IK and MRV Validation VFR and MRV Writing ndash original draft VFR Writing ndashreview amp editing VFR MTM IK and MRV All authors have read and agreed to the published version of themanuscript

Funding This research was funded by the CNPqmdashConselho Nacional de Desenvolvimento Cientiacutefico eTecnoloacutegicomdashBrasil (proc no 1035872019-5) This research and the APC were also funded by PETROBRAS(Grant Number 201800545-0)

Acknowledgments In this section you can acknowledge any support given which is not covered by the authorcontribution or funding sections This may include administrative and technical support or donations in kind(eg materials used for experiments)

Conflicts of Interest The authors declare no conflict of interest

References

1 Lehmann R Neitzel F Testing the compatibility of constraints for parameters of a geodetic adjustmentmodel Journal of Geodesy 2013 87 555ndash566 doi101007s00190-013-0627-2

2 Fang X Weighted total least-squares with constraints a universal formula for geodetic symmetricaltransformations Journal of Geodesy 2015 89 459ndash469 doi101007s00190-015-0790-8

3 Amiri-Simkooei AR Weighted Total Least Squares with Singular Covariance Matrices Subject to Weightedand Hard Constraints J Surv Eng 2017 143 04017018

4 Sakic P Ballu V Royer JY A Multi-Observation Least-Squares Inversion for GNSS-Acoustic SeafloorPositioning Remote Sensing 2020 12 doi103390rs12030448

5 Odijk D Stochastic modelling of the ionosphere for fast GPS ambiguity resolution Geodesy Beyond 2000Schwarz KP Ed Springer Berlin Heidelberg Berlin Heidelberg 2000 pp 387ndash392

6 de Oliveira PS Morel L Fund F Legros R Monico JFG Durand S Durand F Modelingtropospheric wet delays with dense and sparse network configurations for PPP-RTK GPS Solutions 201721 237ndash250 doi101007s10291-016-0518-0

7 Chen B Wu L Dai W Luo X Xu Y A new parameterized approach for ionospheric tomography GPSSolutions 2019 23 96 doi101007s10291-019-0893-4

8 Ahmed E Ahmed S Mohamed D Mostafa R jag 2019 Vol 13 chapter Vertical ionospheric delayestimation for single-receiver operation p 81 2 doi101515jag-2018-0041

9 Wang A Chen J Zhang Y Meng L Wang J Performance of Selected Ionospheric Models inMulti-Global Navigation Satellite System Single-Frequency Positioning over China Remote Sensing 201911 doi103390rs11172070

10 Deo M El-Mowafy A Comparison of advanced troposphere models for aiding reductionof PPP convergence time in Australia Journal of Spatial Science 2019 64 381ndash403doi1010801449859620181472046

11 Dai X Lou Y Dai Z Hu C Peng Y Qiao J Shi C Precise Orbit Determination for GNSS ManeuveringSatellite with the Constraint of a Predicted Clock Remote Sensing 2019 11 doi103390rs11161949

12 Courant R Hilbert D Methods of mathematical physics Vol 2 Vol 2 Wiley [New York] 196213 Grafarend EW Optimization of Geodetic Networks The Canadian Surveyor 1974 28 716ndash723

doi101139tcs-1974-0120

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

37 of 40

14 Teunissen P Zero Order Design Generalized Inverses Adjustment the Datum Problem andS-Transformations Optimization and Design of Geodetic Networks Grafarend EW Sansograve F EdsSpringer Berlin Heidelberg Berlin Heidelberg 1985 pp 11ndash55

15 Dermanis A Free network solutions with the DLT method ISPRS Journal of Photogrammetry and RemoteSensing 1994 49 2 ndash 12 doihttpsdoiorg1010160924-2716(94)90061-2

16 Kotsakis C Reference frame stability and nonlinear distortion in minimum-constrained networkadjustment Journal of Geodesy 2012 86 755ndash774 doi101007s00190-012-0555-6

17 Kotsakis C Datum Definition and Minimal Constraints In Encyclopedia of Geodesy Grafarend E EdSpringer International Publishing Cham 2018 pp 1ndash6 doi101007978-3-319-02370-0_157-1

18 Matsuoka MT Rofatto VF Klein I Roberto Veronez M da Silveira LG Neto JBS Alves ACRControl Points Selection Based on Maximum External Reliability for Designing Geodetic Networks AppliedSciences 2020 10 doi103390app10020687

19 Baarda W S-transformations and criterion matrices Publ on geodesy New Series 1973 520 Schaffrin B Aspects of Network Design Optimization and Design of Geodetic Networks Grafarend

EW Sansograve F Eds Springer Berlin Heidelberg Berlin Heidelberg 1985 pp 548ndash59721 Xu PB A general solution in geodetic nonlinear rank-defect models 199722 Altamimi Z Dermanis A The Choice of Reference System in ITRF Formulation VII Hotine-Marussi

Symposium on Mathematical Geodesy Sneeuw N Novaacutek P Crespi M Sansograve F Eds Springer BerlinHeidelberg Berlin Heidelberg 2012 pp 329ndash334

23 Velsink H On the deformation analysis of point fields J Geod 2015 89 1071ndash1087doi101007s00190-015-0835-z

24 Velsink H Extendable linearised adjustment model for deformation analysis Survey Review 201547 397ndash410 doi1011791752270614Y0000000140

25 Hiddo V jag 2016 Vol 10 chapter Time Series Analysis of 3D Coordinates Using NonstochasticObservations p 5 1 doi101515jag-2015-0027

26 Rao CR Markoffrsquos Theorem with Linear Restrictions on Parameters SankhyAuml The Indian Journal ofStatistics (1933-1960) 1945 7 16ndash19

27 Velsink H Testing Methods for Adjustment Models with Constraints Journal of Surveying Engineering2018 144 04018009 doi101061(ASCE)SU1943-54280000260

28 Hiddo V Testing deformation hypotheses by constraints on a time series of geodetic observations J ApplGeod 2017 12 77ndash93 doi101515jag-2017-0028

29 Baarda W A testing procedure for use in geodetic networks Publ on geodesy New Series 1968 230 Teunissen P First and second moments of non-linear least-squares estimators Bull Geodesique (Journal of

Geodesy) 1989 63 253ndash262 doihttpsdoiorg101007BF0252047531 Teunissen P Testing Theory an introduction 2 ed Delft University Press 200632 Amiri-Simkooei A Least-squares variance component estimation theory and GPS applications Phd

thesis Delft University of Technology 200733 Koch KR Parameter estimation and hypothesis testing in linear models 2 ed Springer 199934 Tao H Feng H Xu L Miao M Long H Yue J Li Z Yang G Yang X Fan L Estimation

of Crop Growth Parameters Using UAV-Based Hyperspectral Remote Sensing Data Sensors 2020 20doi103390s20051296

35 Han M Wang Q Wen Y He M He X The Application of Robust Least Squares Method inFrequency Lock Loop Fusion for Global Navigation Satellite System Receivers Sensors 2020 20doi103390s20041224

36 Wei C Estimation for the Discretely Observed CoxacircldquoIngersollacircldquoRoss Model Driven by Small SymmetricalStable Noises Symmetry 2020 12 doi103390sym12030327

37 Sakic P Ballu V Royer JY A Multi-Observation Least-Squares Inversion for GNSS-Acoustic SeafloorPositioning Remote Sensing 2020 12 doi103390rs12030448

38 Livadiotis G General Fitting Methods Based on Lq Norms and their Optimization Stats 2020 3 16ndash31doi103390stats3010002

39 Farooq SZ Yang D Ada ENJ A Cycle Slip Detection Framework for Reliable Single Frequency RTKPositioning Sensors 2020 20 doi103390s20010304

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

38 of 40

40 Araveeporn A Comparing Parameter Estimation of Random Coefficient Autoregressive Model byFrequentist Method Mathematics 2020 8 doi103390math8010062

41 Zhang C Peng T Zhou J Ji J Wang X An Improved Autoencoder and Partial Least SquaresRegression-Based Extreme Learning Machine Model for Pump Turbine Characteristics Applied Sciences2019 9 doi103390app9193987

42 Ji J Yang M Jiang L He J Teng Z Liu Y Song H Output-Only Parameters Identification ofEarthquake-Excited Building Structures with Least Squares and Input Modification Process AppliedSciences 2019 9 doi103390app9040696

43 Kargoll B On the theory and application of model misspecification tests in geodesy Doctoral thesisUniversity of Bonn Landwirtschaftliche Fakultaumlt German Bonn 2007

44 Lehmann R On the formulation of the alternative hypothesis for geodetic outlier detection J Geod 201387 373ndash386 doi101007s00190-012-0607-y

45 Rofatto VF Matsuoka MT Klein I Roberto Veronez M da Silveira LG A Monte Carlo-Based OutlierDiagnosis Method for Sensitivity Analysis Remote Sensing 2020 12 doi103390rs12050860

46 Lehmann R Improved critical values for extreme normalized and studentized residuals in GaussndashMarkovmodels J Geod 2012 86 1137ndash1146 doi101007s00190-012-0569-0

47 Rofatto VF Matsuoka MT Klein I Veronez MR Bonimani ML Lehmann R A half-century ofBaardarsquos concept of reliability a review new perspectives and applications Surv Rev 2018 0 1ndash17[httpsdoiorg1010800039626520181548118] doi1010800039626520181548118

48 Zaminpardaz S Teunissen P DIA-datasnooping and identifiability J Geod 2019 93 85ndash101doi101007s00190-018-1141-3

49 Kok JJ States U On data snooping and multiple outlier testing [microform] Johan J Kok US Dept ofCommerce National Oceanic and Atmospheric Administration National Ocean Service Charting andGeodetic Services For sale by the National Geodetic Information Center NOAA Rockville Md 1984 p61 p

50 Knight NL Wang J Rizos C Generalised measures of reliability for multiple outliers Journal of Geodesy2010 84 625ndash635 doi101007s00190-010-0392-4

51 Gui Q Li X Gong Y Li B Li G A Bayesian unmasking method for locating multiplegross errors based on posterior probabilities of classification variables J Geod 2011 85 191ndash203doi101007s00190-010-0429-8

52 Klein I Matsuoka MT Guzatto MP Nievinski FG An approach to identify multipleoutliers based on sequential likelihood ratio tests Surv Rev 2017 49 449ndash457[httpsdoiorg1010800039626520161212970] doi1010800039626520161212970

53 Hawkins DM Identification of Outliers 1 ed Springer Netherlands 1980 httpsdoiorg101007978-94-015-3994-4

54 Baarda W Statistical concepts in geodesy Publ on geodesy New Series 1967 255 Imparato D Teunissen P Tiberius C Minimal Detectable and Identifiable Biases for quality

control Surv Rev 2019 51 289ndash299 [httpsdoiorg1010800039626520181437947]doi1010800039626520181437947

56 Teunissen PJG Distributional theory for the DIA method Journal of Geodesy 2018 92 59ndash80doi101007s00190-017-1045-7

57 Foumlrstner W Reliability and discernability of extended Gauss-Markov models Seminar on mathematicalmodels of GeodeticPhotogrammetric Point Determination with Regard to Outliers and Systematic Errors 1983 Vol Series A pp 79ndash103

58 Wang J Knight NL New Outlier Separability Test and Its Application in GNSS Positioning 201259 Lehmann R Loumlsler M Multiple Outlier Detection Hypothesis Tests versus Model Selection by

Information Criteria J Surv Eng 2016 142 0401601760 Teunissen PJG Imparato D Tiberius CCJM Does RAIM with Correct Exclusion Produce Unbiased

Positions Sensors 2017 17 doi103390s1707150861 Lehmann R Loumlsler M Congruence analysis of geodetic networks ndash hypothesis tests versus model

selection by information criteria J Appl Geod 2017 11 271ndash283 doi101515jag-2016-0049

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

39 of 40

62 Rofatto V Matsuoka M Klein I DESIGN OF GEODETIC NETWORKS BASED ON OUTLIERIDENTIFICATION CRITERIA AN EXAMPLE APPLIED TO THE LEVELING NETWORK Bull Geod Sci2018 24 152ndash170 httpdxdoiorg101590s1982-21702018000200011

63 Nguyen VK Renault Milocco R Environment Monitoring for Anomaly Detection System UsingSmartphones Sensors 2019 19 doi103390s19183834

64 Nie Y Yang L Shen Y Specific Direction-Based Outlier Detection Approach for GNSS Vector NetworksSensors 2019 19 doi103390s19081836

65 Klein I Matsuoka MT Guzatto MP Nievinski FG Veronez MR Rofatto VF A newrelationship between the quality criteria for geodetic networks Journal of Geodesy 2019 93 529ndash544doi101007s00190-018-1181-8

66 Mierlo JV Statistical Analysis of Geodetic Measurements for the Investigation of Crustal Movements InRecent Crustal Movements 1977 Whitten C Green R Meade B Eds Elsevier 1979 Vol 13 Developmentsin Geotectonics pp 457 ndash 467 doihttpsdoiorg101016B978-0-444-41783-150072-6

67 van der Marel H Roumlsters AJM Statistical Testing and Quality Analysis in 3-D Networks (part II)Application to GPS Global Positioning System An Overview Bock Y Leppard N Eds Springer NewYork New York NY 1990 pp 290ndash297

68 Yang L Wang J Knight NL Shen Y Outlier separability analysis with a multiple alternative hypothesestest J Geod 2013 87 591ndash604 doi101007s00190-013-0629-0

69 Proacuteszynski W Revisiting Baardarsquos concept of minimal detectable bias with regard to outlier identifiabilityJ Geod 2015 89 993ndash1003 doi101007s00190-015-0828-y

70 Romano JP Wolf M Multiple Testing of One-Sided Hypotheses Combining Bonferroni and the BootstrapPredictive Econometrics and Big Data Kreinovich V Sriboonchitta S Chakpitak N Eds SpringerInternational Publishing Cham 2018 pp 78ndash94

71 Teunissen P An Integrity and Quality Control Procedure for use in Multi Sensor Integration 199072 Aydin C Demirel H Computation of Baardarsquos lower bound of the non-centrality parameter J Geod

2004 78 437ndash441 doi101007s00190-004-0406-173 Aydin C Power of Global Test in Deformation Analysis J Surv Eng 2012 138 51ndash56 https

doiorg101061(ASCE)SU1943-5428000006474 Bonferroni C Teoria statistica delle classi e calcolo delle probabilitagrave Pubblicazioni del R Istituto superiore di

scienze economiche e commerciali di Firenze Libreria internazionale Seeber 193675 Box GEP Muller ME A Note on the Generation of Random Normal Deviates The Annals of Mathematical

Statistics 1958 29 610ndash611 httpsdoi101214aoms117770664576 Lemeshko BY Lemeshko SB Extending the Application of Grubbs-Type Tests in Rejecting Anomalous

Measurements Measurement Techniques 2005 48 536ndash547 doi101007s11018-005-0179-977 Lehmann R Scheffler T Monte Carlo based data snooping with application to a geodetic network J

Appl Geod 2011 5 123ndash134 httpsdoiorg101515JAG201101478 Matsumoto M Nishimura T Mersenne twister A 623-dimensionally equidistributed uniform

pseudo-random number generator ACM Transactions on Modeling and Computer Simulation 1998 8 3ndash30httpsdlacmorgcitationcfmid=272991

79 Ihde J Saacutenchez L Barzaghi R Drewes H Foerste C Gruber T Liebsch G Marti U Pail R SiderisM Definition and Proposed Realization of the International Height Reference System (IHRS) Surveys inGeophysics 2017 38 549ndash570 doi101007s10712-017-9409-3

80 Lehmann R Observation error model selection by information criteria vs normality testing StudiaGeophysica et Geodaetica 2015 59 489ndash504 doi101007s11200-015-0725-0

81 Algarni DA Ali AE Heighting and Distance Accuracy with Electronic Digital Levels Journal of King SaudUniversity - Engineering Sciences 1998 10 229 ndash 239 doihttpsdoiorg101016S1018-3639(18)30698-6

82 Gemin ARS Matos AS Faggion PLAs APPLICATION OF CALIBRATION CERTIFICATE OFDIGITAL LEVELING SYSTEMS IN THE MONITORING OF STRUCTURES A CASE STUDY AT THEGOVERNADOR JOSAtilde RICHA HYDROELECTRIC POWER PLANT - PR Boletim de CiAtildeGeodAtilde 201824 235 ndash 249

83 Takalo M Rouhiainen P Development of a System Calibration Comparator for Digital Levels in FinlandNordic Journal of Surveying and Real Estate Research 1 1

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

40 of 40

84 Wiedemann W Wagner A Wunderlich T Using IATS to Read and Analyze Digital Levelling Staffs SIG2016 Paar R Marendic A Zrinjski M Eds Croatian Geodetic Society 2016 pp 515ndash526

85 PJG T The Geometry of Geodetic Inverse Linear Mapping and Non-linear Adjustment Publications onGeodesy New Series 1985 8

86 Rao CR Mitra SK Generalized inverse of a matrix and its applications 197287 Hekimoglu S Erenoglu RC Sanli DU Erdogan B Detecting Configuration Weaknesses in Geodetic

Networks Survey Review 2011 43 713ndash730 doi101179003962611X13117748892632

Preprints (wwwpreprintsorg) | NOT PEER-REVIEWED | Posted 8 April 2020 doi1020944preprints2020040119v1

Page 33: Effects of Hard and Soft Equality Constraints on
Page 34: Effects of Hard and Soft Equality Constraints on
Page 35: Effects of Hard and Soft Equality Constraints on
Page 36: Effects of Hard and Soft Equality Constraints on
Page 37: Effects of Hard and Soft Equality Constraints on
Page 38: Effects of Hard and Soft Equality Constraints on
Page 39: Effects of Hard and Soft Equality Constraints on
Page 40: Effects of Hard and Soft Equality Constraints on