IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. 00, NO. 00, 2016 2817

Online Detection of Low-Quality Synchrophasor Measurements: A Data-Driven Approach

Meng Wu, Student Member, IEEE, and Le Xie, Senior Member, IEEE

Abstract—In this paper, an online data-driven approach is proposed for the detection of low-quality synchrophasor measurements. The proposed method leverages the spatio-temporal similarities among multiple-time-instant synchrophasor measurements and formulates the low-quality synchrophasor data as spatio-temporal outliers. A density-based local outlier detection technique is proposed to detect the spatio-temporal outliers. This data-driven approach involves no system modeling information. The detection algorithm can operate under both normal and fault-on system conditions, with fast computation speed suitable for online applications. Case studies on both synthetic and real-world synchrophasor data verify the effectiveness of the proposed approach.

Index Terms—Data mining, data quality improvement, outlier detection, spatio-temporal similarity, synchrophasor.

I. INTRODUCTION

In recent years, there has been significant deployment of synchrophasor-based measurement systems around the world. Compared with traditional metering units in supervisory control and data acquisition (SCADA) systems, synchrophasors provide measurements with much higher sampling rates. The high-resolution synchrophasor measurements contain rich information on system dynamics, which stimulates the development of advanced analytics, such as dynamic state estimation [1], synchrophasor-based model validation [2], and wide-area control and protection [3], [4]. However, as a large amount of data is streaming into the control center, the synchrophasor data quality problem becomes one of the major challenges for system operators. Generally speaking, low-quality synchrophasor data represents data that cannot accurately reflect the underlying system behavior. The inaccuracy can be caused by various problems such as sensing noises, data loss, and GPS time errors. As an example, the ratio of low-quality synchrophasor data, reported by California Independent System Operator (ISO) in 2011, ranged from 10% to 17% [5]. In 2013, the ratio of low-quality synchrophasor data in China was reported to range from 20% to 30% [6]. The online data quality monitoring of synchrophasors becomes a major barrier for any advanced synchrophasor-based analytics.

Manuscript received February 27, 2016; revised October 5, 2016; accepted November 18, 2016. Date of publication November 29, 2016; date of current version June 16, 2017. This work was supported in part by the Power Systems Engineering Research Center, and in part by NSF ECCS-1150944, DGE-1303378, and IIS-1636772. Paper no. TPWRS-00318-2016.

The authors are with the Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843 USA (e-mail: [email protected]; [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TPWRS.2016.2633462

In order to improve data quality of synchrophasor systems, various methods have been proposed. In [7], a synchrophasor-based state estimator is introduced to detect phasor angle bias and current magnitude scaling problems. In [8], a Kalman filtering technique is applied to detect low-quality synchrophasor data. Both state estimator and Kalman filter-based approaches require prior knowledge of system topology and model parameters for detecting low-quality data. Therefore, the detection accuracy of the above approaches may be affected when gross errors are present in system topology or parameters. Furthermore, these methods cannot operate successfully when state estimation diverges because of gross measurement errors, system physical disturbances, or stressful operating conditions. In [9], [10], several logic-based low-quality data detection schemes are presented. These approaches compare synchrophasor data with certain thresholds, apply high-noise filters to raw synchrophasor measurements, and perform cross-checking on synchrophasor measurements obtained in nearby physical locations, in order to detect abnormal synchrophasor measurements. However, these pre-defined logics may be rendered ineffective when large disturbances occur in the studied power grid. In [11], clustering algorithms are applied to extract information from power system time-varying data. These clustering techniques could potentially be applied to detect system anomalies such as low-quality synchrophasor data or system physical disturbances. References [12], [13] pioneered a purely data-driven method to improve synchrophasor data quality. This method applies low-rank matrix factorization techniques to detect and repair low-quality synchrophasor data. It has satisfactory performance under both normal and fault-on operating conditions. However, since the matrix factorization techniques bear a high computational burden, such as nonlinear optimizations, they are challenging to apply in real-time applications.

In view of the current efforts on synchrophasor data quality improvement, this paper presents a data-driven approach for online detection of low-quality synchrophasor measurements. It leverages the spatio-temporal similarities among multi-time-instant synchrophasor data, and applies a density-based local outlier detection technique to detect low-quality synchrophasor measurements. The major advantages of the proposed approach are summarized as follows. (1) This is a purely data-driven approach, without requiring any prior knowledge on system topology or model parameters, which eliminates the potential misdetections caused by inaccurate system information; (2) the proposed approach can operate without any converged state estimation results and is suitable for filtering out gross measurement errors for advanced power system analytics; (3) the proposed approach has fast computational speed, which could be beneficial for real-time applications; and (4) the algorithm is able to perform detections under both normal and fault-on operating conditions. The proposed detection algorithm differentiates high-quality synchrophasor data recorded during system physical disturbances (faults) from the low-quality data, which avoids potential false alarms caused by physical disturbances.

0885-8950 © 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

The rest of the paper is organized as follows. Section II presents the problem formulation of the low-quality synchrophasor data detection issue; Section III discusses the proposed data-driven approach for low-quality synchrophasor data detection; Section IV presents case study results to verify the proposed approach; Section V provides concluding remarks to this paper.

II. PROBLEM FORMULATION

This section presents the key features differentiating low-quality synchrophasor measurements from the high-quality ones. Based on these features, low-quality synchrophasor measurements are formulated as spatio-temporal outliers among high-quality measurements in the power grid. Accordingly, the low-quality synchrophasor data detection problem is formulated to be a spatio-temporal outlier detection problem.

A. Key Features of High-Quality and Low-Quality Synchrophasor Data

Let the m × n matrix M denote a set of synchrophasor measurements collected from n synchrophasor channels of the same type (i.e., all of them are voltage/current/power channels) within m time instants. This measurement matrix can be decomposed into the following two matrices:

M = L + D (1)

where the kth column of matrix L represents the accurate measurements corresponding to the kth synchrophasor channel in M, and D denotes the matrix containing inaccurate information caused by data quality problems. Each nonzero entry Dij represents a measurement error of the jth synchrophasor channel at time instant i. Here, a synchrophasor channel represents one of the following electrical quantities obtained by a synchrophasor: voltage magnitude, voltage phasor angle, current magnitude, current phasor angle, real power, and reactive power. Therefore, Mij is a real number instead of a complex number.

Definition 1: Mij is defined to be low-quality synchrophasor data if its corresponding |Dij| > τ, where τ is a positive threshold to determine low-quality data.

It has been shown in [12], [13] that, when low-quality synchrophasor data is present in a power system, the rank of matrix M would be higher than the rank of matrix L, due to the nonzero entries in matrix D. This phenomenon indicates that the linear dependency (similarity) among synchrophasor measurements would be weakened by data quality problems.
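As a small numerical illustration of (1) and Definition 1, the following Python sketch builds a hypothetical rank-one matrix L, adds a single gross error through D, and compares the ranks of L and M. The numerical values, the threshold tau = 1, and the helper `matrix_rank` are assumptions made for this example only; they are not taken from the case studies.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a small matrix via Gaussian elimination on exact fractions."""
    a = [[Fraction(x) for x in row] for row in rows]
    n_rows, n_cols = len(a), len(a[0])
    rank = 0
    for col in range(n_cols):
        # Look for a pivot in this column at or below the current rank row.
        pivot = next((r for r in range(rank, n_rows) if a[r][col] != 0), None)
        if pivot is None:
            continue
        a[rank], a[pivot] = a[pivot], a[rank]
        for r in range(rank + 1, n_rows):
            factor = a[r][col] / a[rank][col]
            a[r] = [x - factor * y for x, y in zip(a[r], a[rank])]
        rank += 1
        if rank == n_rows:
            break
    return rank


# Hypothetical clean data: 4 time instants x 3 channels, every column is a
# multiple of the same profile, so L has rank 1.
profile = [100, 101, 99, 100]
L = [[v, 2 * v, 3 * v] for v in profile]

# D holds one gross error: channel index 1 at time index 2.
D = [[0, 0, 0] for _ in profile]
D[2][1] = 5

# Equation (1): M = L + D.
M = [[l + d for l, d in zip(l_row, d_row)] for l_row, d_row in zip(L, D)]

# Definition 1 with an assumed threshold tau.
tau = 1
low_quality = [(i, j) for i, row in enumerate(D)
               for j, d in enumerate(row) if abs(d) > tau]

print(matrix_rank(L), matrix_rank(M), low_quality)
```

In this toy case the single nonzero entry of D raises the rank from 1 to 2. Real measurement matrices are only approximately low rank, so a practical comparison would rely on a numerical rank rather than an exact one.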

In order to demonstrate the above property of low-quality synchrophasor measurements, Fig. 1 shows voltage magnitude curves measured by two synchrophasors with nearby physical locations. Both curves were recorded during the same time period, when a line-tripping fault was present in the system (from 3 s to 5 s). The upper curve contains low-quality data at around 1 s. By observing only the upper curve, it is difficult to confirm whether the data spikes are caused by a physical disturbance or a data-quality problem, since all the data spikes have outlier behavior compared with their temporal neighbors. However, by comparing multiple synchrophasor curves obtained in different locations of the system, it would be possible to differentiate spikes caused by data-quality problems from those caused by disturbances, since spikes caused by data-quality problems are outliers compared with their spatial neighbors, while spikes caused by disturbances appear in curves recorded by multiple synchrophasors and therefore cannot be considered as outliers compared with their spatial neighbors.

Fig. 1. Comparison between synchrophasor curves with and without low-quality data.

The above observations can be summarized as the following key features of low-quality and high-quality synchrophasor data under normal/fault-on operating conditions:

Feature 1: Both low-quality synchrophasor measurements and fault-on synchrophasor measurements exhibit weak temporal similarities with the measurements obtained at the neighboring time periods, while high-quality synchrophasor measurements obtained during normal operating conditions exhibit strong temporal similarities with the measurements obtained at the neighboring time periods.

Feature 2: Low-quality synchrophasor measurements exhibit weak spatial similarities with the measurements obtained by the neighboring synchrophasors at the same time period, while fault-on synchrophasor measurements exhibit strong spatial similarities with the measurements obtained by the neighboring synchrophasors at the same time period.

It should be noted that strong electrical connections among neighboring synchrophasors are required in order for the above features to be valid. Therefore, higher synchrophasor measurement redundancy would lead to better accuracy in low-quality data detection, and lack of measurement redundancy could cause misdetections for the proposed algorithm. As more and more synchrophasors are being installed in power grids around the world, the measurement redundancy would be enhanced, and therefore the detection accuracy of the proposed algorithm would be improved.

B. Formulation of Low-Quality Synchrophasor Data as Spatio-Temporal Outliers

According to the discussions in the previous section, low-quality synchrophasor measurements have weaker spatio-temporal similarities with their high-quality neighbors, under both normal and fault-on operating conditions. Therefore, these low-quality measurements can be formulated as spatio-temporal outliers among all the synchrophasor measurements in the system. With a proper definition of similarity metrics for synchrophasor curves, the degree of similarity between two synchrophasor curves can be quantified, and data-mining techniques can be applied to detect the spatio-temporal outliers whose degrees of similarity are significantly different from other synchrophasor curves.

Fig. 2. 2D points representing synchrophasor curves under normal/fault-on/low-quality conditions. (a) Overall figure with all the 2D points under normal/fault-on/low-quality conditions. (b) Zoomed-in figure with all the 2D points under fault-on condition. (c) Zoomed-in figure with all the 2D points under normal condition and high-quality 2D points under low-quality condition.

For a measurement matrix M obtained within a certain period of time, general steps to formulate the detection problem are described as follows:

Step 1: Define a proper similarity metric (distance function) f(Mi, Mj), which quantifies the degree of similarity between the ith and jth columns of M.

Step 2: Map each column of M (a data curve obtained from a certain synchrophasor channel) to the space S where the distance function f(Mi, Mj) is defined. Each column of M can be represented as a point in S.

Step 3: Examine the outlier behavior of the points in S, according to the distance function f(Mi, Mj). Points lying far from the majority are more likely to be outliers with low-quality data.

Fig. 2 demonstrates the above formulation through a simple example. Three 2 × 8 measurement matrices M(1), M(2), M(3) are sampled from the same set of synchrophasor channels at three different time periods. Each matrix contains 8 synchrophasor curves within 2 consecutive time instants. M(1) contains 6 high-quality synchrophasor curves and 2 low-quality synchrophasor curves obtained under normal operating condition. M(2) and M(3) contain 8 high-quality synchrophasor curves obtained under fault-on and normal operating conditions, respectively. The Euclidean distance is used as the similarity metric (distance function), and each synchrophasor curve in the three matrices is projected to the 2D Euclidean space shown in Fig. 2. The x and y coordinates of each point are the data values at the first and second time instant of the corresponding synchrophasor curve, respectively.

The following observations can be drawn from Fig. 2: (1) The cluster of fault-on synchrophasor data (fault-on cluster) lies far from the clusters of high-quality synchrophasor data under normal operating condition (normal-condition cluster), indicating weak temporal similarity between the two clusters; (2) all the points within the fault-on cluster lie close to each other, indicating strong spatial similarities among points within the fault-on cluster; and (3) the two points representing low-quality synchrophasor curves lie far from the normal-condition cluster, as well as the majority of points in the low-quality cluster, indicating weak spatial and temporal similarities with their neighboring points. Therefore, the low-quality data points can be defined as spatio-temporal outliers under this formulation.
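The three formulation steps can be sketched in Python for a single 2 × 8 window. The measurement values below are invented for illustration, and the mean Euclidean distance to all other points is used only as a crude stand-in for the outlier examination in Step 3; the proposed algorithm relies on LOF analysis instead.

```python
import math

# Hypothetical 2 x 8 window (per-unit values): rows are two consecutive time
# instants, columns are channels. Channels 6 and 7 carry invented gross errors.
M1 = [
    [1.000, 1.001, 0.999, 1.002, 1.000, 0.998, 1.050, 0.940],
    [1.001, 1.000, 1.000, 1.001, 0.999, 0.999, 0.950, 1.060],
]


def distance(p, q):
    """Step 1: Euclidean distance between two curves of length 2."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def columns_as_points(matrix):
    """Step 2: each column (one channel's curve) becomes one 2D point."""
    return list(zip(*matrix))


def mean_distance_to_others(points):
    """Step 3 (simplified): average distance from each point to the rest."""
    scores = []
    for i, p in enumerate(points):
        dists = [distance(p, q) for j, q in enumerate(points) if j != i]
        scores.append(sum(dists) / len(dists))
    return scores


points = columns_as_points(M1)
scores = mean_distance_to_others(points)

# Channels sorted from most to least isolated.
ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
print(ranked[:2])
```

The two channels with the invented errors end up as the most isolated points, which mirrors observation (3) for this toy data.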

III. ONLINE DETECTION OF LOW-QUALITY SYNCHROPHASOR MEASUREMENTS

Based on the previous discussion, we propose a density-based local outlier factor (LOF) analysis to detect low-quality synchrophasor data. In [14], similar LOF-based techniques are introduced for the detection of high sensing noises and false data injections in synchrophasor data. This paper improves the similarity metrics for synchrophasor curves, which leads to more robust performance in detecting various types of data quality problems, including not only sensing noises and false data injections, but also data spikes and un-updated data problems.

A. Similarity Metrics Between Synchrophasor Curves

In this subsection, two similarity metrics are proposed for detecting low-quality synchrophasor data whose variance is significantly higher or lower than that of its spatio-temporal neighborhood.

Definition 2: Let M(k) denote the synchrophasor measurement matrix obtained at the kth time period. The length of each time period equals the length of the moving data window of the proposed algorithm. Let Mi(k) and Mj(k) denote the ith and jth columns of M(k). Let σi(k) denote the standard deviation of Mi(k). Let C denote the data set of all the synchrophasor measurements identified to be clean (without data quality problems) by the proposed algorithm. The normalized standard deviation for synchrophasor data obtained from the ith channel at the kth time period is defined as follows:

$$\sigma_i^{\mathrm{Norm}}(k) = \frac{\sigma_i(k)}{\dfrac{\sum_{t=1}^{k-1} \sigma_i(t)\,\chi_C(M_i(t))}{\sum_{t=1}^{k-1} \chi_C(M_i(t))}} \tag{2}$$

where

$$\chi_C(M_i(t)) = \begin{cases} 1, & M_i(t) \in C \\ 0, & M_i(t) \notin C \end{cases} \tag{3}$$

The normalized deviation σ_i^Norm(k) represents the standard deviation of the data curve obtained from the ith synchrophasor channel at the kth time period, normalized by the average standard deviation of the historical clean measurements obtained from the same synchrophasor channel. Considering σi(k) as an indicator of the strength of system dynamic response recorded by the ith synchrophasor channel at the kth time period, σ_i^Norm(k) is a normalized indicator which compares the current strength of system dynamic response with the average historical strength recorded by the same sensing channel. This normalization process removes the influence of synchrophasor physical locations on the dynamic strength of the synchrophasor curves.
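A minimal Python sketch of (2) and (3) for one channel is given below. The function name `normalized_std`, the representation of the history as (samples, is_clean) pairs, and the use of the population standard deviation are assumptions of this example; the text does not specify which standard deviation estimator is used.

```python
import statistics


def normalized_std(current_window, history):
    """Normalized standard deviation of one channel, following Eq. (2).

    current_window: samples of channel i in the kth time period.
    history: list of (samples, is_clean) pairs for periods 1..k-1 of the same
             channel; is_clean plays the role of the indicator in Eq. (3).
    """
    sigma_k = statistics.pstdev(current_window)
    clean_sigmas = [statistics.pstdev(w) for w, is_clean in history if is_clean]
    if not clean_sigmas:
        raise ValueError("no clean historical window available for this channel")
    baseline = sum(clean_sigmas) / len(clean_sigmas)
    if baseline == 0:
        raise ValueError("clean history has zero variance; ratio is undefined")
    return sigma_k / baseline


# Hypothetical history: the second window was flagged and is therefore ignored.
history = [
    ([1.0, 1.2, 1.0, 1.2], True),
    ([1.0, 1.4, 1.0, 1.4], False),
    ([2.0, 2.2, 2.0, 2.2], True),
]

active = normalized_std([1.0, 1.4, 1.0, 1.4], history)   # stronger dynamics
frozen = normalized_std([1.0, 1.0, 1.0, 1.0], history)   # un-updated data
print(active, frozen)
```

A window with twice the historical clean deviation yields a value close to 2, while a frozen (un-updated) window yields 0, which is the behavior the low-variance metric below is designed to exploit.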

1) Similarity Metric for Low-Quality Synchrophasor Data With High Variance: The similarity metric (distance function) fH(i, j) between Mi(k) and Mj(k) is defined as follows:

$$f_H(i, j) = \left| \sigma_i^{\mathrm{Norm}} - \sigma_j^{\mathrm{Norm}} \right| \tag{4}$$

2) Similarity Metric for Low-Quality Synchrophasor Data With Low Variance: The similarity metric (distance function) fL(i, j) between Mi(k) and Mj(k) is defined as follows:

$$f_L(i, j) = \max\left( \left| \frac{\sigma_i^{\mathrm{Norm}}}{\sigma_j^{\mathrm{Norm}}} \right|, \left| \frac{\sigma_j^{\mathrm{Norm}}}{\sigma_i^{\mathrm{Norm}}} \right| \right) \tag{5}$$

The above two similarity metrics measure the difference between the dynamic strength of data curves Mi(k) and Mj(k). Since during the same time period k, clean synchrophasor curves across the system tend to have similar dynamic strength (similarly low/high strength under normal/fault-on operating condition), fH(i, j) and fL(i, j) values tend to be small for clean measurements. However, the dynamic strength of low-quality synchrophasor curves tends to be different from that of the clean curves, since dynamics of low-quality synchrophasor curves are mainly driven by the dynamics of the data quality problems, rather than the true system dynamics. Therefore, fH(i, j) and fL(i, j) values tend to be large for low-quality synchrophasor measurements.

Although both similarity metrics could reflect the outlier behavior of both low-quality data with high variance (such as sensing noises, data spikes, etc.) and low variance (such as un-updated data), fH(i, j) tends to be more sensitive to high-variance data problems and fL(i, j) tends to be more sensitive to low-variance data problems. Under normal operating conditions, the performance of fH(i, j) in detecting low-variance data problems (such as un-updated data) could be unsatisfactory. This is because under normal operating conditions, the normalized standard deviations for clean measurements tend to be close to one, while the normalized standard deviations for low-variance data (such as un-updated data) tend to be close to zero. Therefore, under normal operating conditions, fH(i, j) between clean data and un-updated data would remain close to one, while fL(i, j) between clean data and un-updated data would be a very large number. However, under normal operating conditions, fH(i, j) between two clean data sets would be a small positive number (close to zero), and fL(i, j) between two clean data sets would lie around one. Therefore, the fL(i, j) value between un-updated data and clean data tends to be much larger than the fL(i, j) value between two clean data sets, leading to a better detection performance. This performance difference is further demonstrated through case studies.
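The contrast between the two metrics can be checked numerically with a short Python sketch of (4) and (5). The function names, the sample values, and the small floor `eps` that guards against division by zero for perfectly frozen data are assumptions of this example; the text does not state how a zero normalized deviation is handled.

```python
def f_high(s_i, s_j):
    """Eq. (4): absolute difference of normalized standard deviations."""
    return abs(s_i - s_j)


def f_low(s_i, s_j, eps=1e-12):
    """Eq. (5): larger of the two ratios; eps is an assumed floor for zeros."""
    a = max(abs(s_i), eps)
    b = max(abs(s_j), eps)
    return max(a / b, b / a)


def pairwise(metric, sigma_norm):
    """Distance matrix over all channel pairs for one moving data window."""
    return [[metric(a, b) for b in sigma_norm] for a in sigma_norm]


# Hypothetical normalized deviations under normal operating conditions:
# two clean channels near one and one nearly frozen channel.
sigma_norm = [1.00, 1.05, 0.001]

d_high = pairwise(f_high, sigma_norm)
d_low = pairwise(f_low, sigma_norm)

print("clean vs clean :", d_high[0][1], d_low[0][1])
print("clean vs frozen:", d_high[0][2], d_low[0][2])
```

For these values fH between the clean and the nearly frozen channel stays below one, whereas fL is on the order of 1000, in line with the argument above.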

B. Density-Based Outlier Detections for Synchrophasor Data

Built upon the above similarity metrics, LOF analysis, which is a density-based outlier detection technique, is applied to solve the low-quality data detection problem. In this subsection, procedures for calculating LOFs are briefly discussed. The mathematical definition of “density” is presented below. Details of LOF analysis can be found in [15].

1) Calculation of k-distance(p): Let the measurement matrix M be a database consisting of synchrophasor measurements. Let p, q, o be some objects in M, where each object represents a column in M. Let k be a positive integer. The distance between p and q, denoted by d(p, q), is defined by fH(p, q) or fL(p, q).

For any positive integer k, the k-distance of object p, denoted by k-distance(p), is defined as the distance d(p, o) between p and an object o ∈ M such that:

a) for at least k objects o′ ∈ M\{p} it holds that d(p, o′) ≤ d(p, o), and

b) for at most k − 1 objects o′ ∈ M\{p} it holds that d(p, o′) < d(p, o).

In the above definition, o′ ∈ M\{p} denotes {o′ : o′ ∈ M, o′ ∉ {p}}.

Intuitively, k-distance(p) represents the distance between object p and the kth nearest neighbor of p. The value of k-distance(p) provides a measure of the density around the object p. For the same value of k, a smaller k-distance(p) indicates higher density around p.
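The definition of k-distance(p) can be sketched in Python on a precomputed distance matrix. The function name, the one-dimensional toy objects, and the absolute difference used as d(·) are assumptions of this example; in the proposed algorithm d(·) would be fH or fL.

```python
def k_distance(dist, p, k):
    """Distance from object p to its kth nearest neighbor.

    dist is a square matrix with dist[p][q] = d(p, q). Sorting the distances
    to all other objects and taking the kth entry satisfies conditions a)
    and b), including the case of tied distances.
    """
    others = sorted(dist[p][q] for q in range(len(dist)) if q != p)
    return others[k - 1]


# Hypothetical one-dimensional objects; d(p, q) is the absolute difference.
values = [0.0, 1.0, 2.0, 2.0, 10.0]
dist = [[abs(a - b) for b in values] for a in values]

kd_dense = k_distance(dist, 0, 2)    # object inside the group near 0..2
kd_sparse = k_distance(dist, 4, 2)   # isolated object at 10
print(kd_dense, kd_sparse)
```

The isolated object has a much larger 2-distance than the object inside the group, which reflects the lower density around it.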

2) Identification of k-distance(p) neighborhood of p: Given k-distance(p), the k-distance(p) neighborhood of p contains every object whose distance from p is not greater than the k-distance, i.e., N_{k-distance(p)}(p) = {q ∈ M\{p} | d(p, q) ≤ k-distance(p)}. These objects q are called the k-nearest neighbors of p.

3) Calculation of reachability distance of object p with respect to object o: The reachability¹ distance of object p with respect to object o is defined as reach-dist_k(p, o) = max{k-distance(o), d(p, o)}.

Intuitively, if object p is far away from object o, then the reachability distance between p and o is simply their actual distance d(p, o). However, if they are “sufficiently” close to each other, the actual distance d(p, o) is replaced by the k-distance(o). The reason is that in doing so, the statistical fluctuations of d(p, o) for all the p’s close to o can be significantly reduced. The strength of this smoothing effect can be controlled by the parameter k. The higher the value of k, the more similar the reachability distances for objects within the same neighborhood.
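The smoothing effect of the reachability distance can be reproduced with a few lines of Python. The point coordinates are invented, and the Euclidean distance is used here only to keep the example easy to check by hand; the function names are likewise assumptions of this sketch.

```python
import math


def d(points, a, b):
    """Euclidean distance between objects a and b (illustration only)."""
    return math.hypot(points[a][0] - points[b][0], points[a][1] - points[b][1])


def k_distance(points, o, k):
    """Distance from object o to its kth nearest neighbor."""
    others = sorted(d(points, o, q) for q in range(len(points)) if q != o)
    return others[k - 1]


def reach_dist(points, p, o, k):
    """reach-dist_k(p, o) = max{k-distance(o), d(p, o)}."""
    return max(k_distance(points, o, k), d(points, p, o))


# Hypothetical layout: object o at the origin (index 0) and five objects
# p1..p5 (indices 1..5) at distances 1, 2, 3, 4 and 5 from o.
points = [(0, 0), (1, 0), (0, 2), (-3, 0), (0, -4), (5, 0)]
k = 3

near = reach_dist(points, 1, 0, k)   # p1 lies inside the k-distance of o
far = reach_dist(points, 5, 0, k)    # p5 lies outside the k-distance of o
print(k_distance(points, 0, k), near, far)
```

With k = 3 the k-distance of o is 3, so the close object p1 is assigned a reachability distance of 3 instead of its true distance 1, while the distant object p5 keeps its true distance 5.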

Fig. 3 illustrates the relationship among true distance d(p3, o), k-distance(o), reach-dist_k(p1, o), and reach-dist_k(p5, o). In this example, k = 3, and true distance d(·) is the Euclidean distance². According to the above definitions, k-distance(o) represents the distance between object o and the kth nearest neighbor of o. Therefore, when k = 3, k-distance(o) = d(p3, o), where p3 is the third nearest neighbor of o. The radius of the circle in Fig. 3 represents k-distance(o). Since true distance d(p1, o) < k-distance(o), and true distance d(p5, o) > k-distance(o), the reachability distance between p1 and o is reach-dist_k(p1, o) = k-distance(o), while the reachability distance between p5 and o is reach-dist_k(p5, o) = d(p5, o). These reachability distances reach-dist_k(·), developed through the comparison between true distances d(·) and k-distance(o), will then be used to formulate the local outlier factor.

¹ It should be noted that the notion of reachability in this paper does not refer to the reachability concept in hybrid system literature.

² It should be noted that Euclidean distance is used here only for the illustration of the concepts of k-distance(·) and reach-dist_k(·). In the proposed low-quality data detection algorithm, the true distance d(·) is defined by similarity metrics fH(·) and fL(·).

Fig. 3. k-distance(o), reach-dist_k(p1, o), and reach-dist_k(p5, o) when k = 3.

4) Calculation of local reachability density of p: The local reachability density of p is defined as

$$\mathrm{lrd}_{MinPts}(p) = 1 \Big/ \left( \frac{\sum_{o \in N_{MinPts}(p)} \text{reach-dist}_{MinPts}(p, o)}{|N_{MinPts}(p)|} \right) \tag{6}$$

where N_MinPts(p) = N_{MinPts-distance(p)}(p), and MinPts is a positive integer.

Intuitively, the local reachability density of an object p is the inverse of the average reachability distance based on the MinPts-nearest neighbors of p.

5) Calculation of LOF of p: The local outlier factor of p is defined as

$$\mathrm{LOF}_{MinPts}(p) = \frac{\sum_{o \in N_{MinPts}(p)} \dfrac{\mathrm{lrd}_{MinPts}(o)}{\mathrm{lrd}_{MinPts}(p)}}{|N_{MinPts}(p)|} \tag{7}$$

The local outlier factor of object p captures the degree to which p is a local outlier. It is the average of the ratios between the local reachability densities of p’s MinPts-nearest neighbors and the local reachability density of p itself. It is easy to see that the lower p’s local reachability density is, and the higher the local reachability densities of p’s MinPts-nearest neighbors are, the higher the LOF value of p is.
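Equations (6) and (7) can be combined into a compact Python sketch that operates on a precomputed distance matrix. The function name `lof_scores`, the small floor `tiny` that avoids division by zero when several objects coincide, and the toy data are assumptions of this example rather than part of the LOF definition in [15].

```python
def lof_scores(dist, min_pts, tiny=1e-12):
    """LOF of every object, given dist[p][q] = d(p, q) and MinPts."""
    n = len(dist)
    k_dist, neigh = [], []
    for p in range(n):
        # MinPts-distance of p and its MinPts-distance neighborhood.
        kd = sorted(dist[p][q] for q in range(n) if q != p)[min_pts - 1]
        k_dist.append(kd)
        neigh.append([q for q in range(n) if q != p and dist[p][q] <= kd])

    lrd = []
    for p in range(n):
        # Eq. (6): inverse of the average reachability distance of p.
        total = sum(max(k_dist[o], dist[p][o]) for o in neigh[p])
        lrd.append(len(neigh[p]) / max(total, tiny))

    # Eq. (7): average ratio lrd(o) / lrd(p) over the neighbors o of p.
    return [sum(lrd[o] for o in neigh[p]) / (len(neigh[p]) * lrd[p])
            for p in range(n)]


# Hypothetical normalized deviations of six channels; the last one is far
# from the others. d(p, q) is taken as the absolute difference, as in f_H.
values = [1.00, 1.02, 0.98, 1.01, 0.99, 3.00]
dist = [[abs(a - b) for b in values] for a in values]

scores = lof_scores(dist, min_pts=len(values) // 2)
print([round(s, 2) for s in scores])
```

For this toy data the five similar channels obtain LOF values close to one, while the isolated channel obtains a value well above the fH threshold of 10 used in the case studies.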

C. Robust Detection Criterion and Parameter Selections

In order to improve the robustness of the proposed approach, the following detection criterion and parameter selection procedure are applied to the algorithm.

1) Robust Detection Criterion: Due to the propagation delay of electro-magnetic waves, synchrophasors installed at different locations of a large-scale power system may respond to physical disturbances at slightly different time instants. If a short moving data window is chosen for the algorithm, this slight time shift may cause false alarms under fault-on operating conditions. In order to avoid the false alarms without introducing too much computational burden, synchrophasor measurements within the current moving data window are identified to contain low-quality data only if there are already l consecutive moving data windows prior to this current window whose LOF values exceed the threshold value. Here l is an integer slightly less than the length of the moving data window. This criterion would introduce a small detection delay to the proposed algorithm. However, since the length of the moving data window is set to be short for the purpose of online application, the delay would be insignificant.
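The consecutive-window criterion can be written as a short Python check on the LOF history of one channel. The function name, the list-based history, and the values l = 3 and threshold = 10 in the usage lines are assumptions chosen to keep the example small; the text recommends an l slightly below the window length.

```python
def confirmed_low_quality(lof_history, threshold, l):
    """Return True if the current window should be declared low quality.

    lof_history: LOF values of one channel, oldest first; the last entry
                 belongs to the current moving data window.
    The current value and the l values immediately before it must all
    exceed the threshold.
    """
    if len(lof_history) < l + 1:
        return False
    return all(value > threshold for value in lof_history[-(l + 1):])


# A sustained excursion is confirmed once l earlier windows also exceeded.
sustained = confirmed_low_quality([1, 50, 60, 70, 80], threshold=10, l=3)

# Too few preceding exceedances: not yet confirmed.
too_early = confirmed_low_quality([1, 1, 50, 60, 70], threshold=10, l=3)

# A brief drop below the threshold restarts the count.
interrupted = confirmed_low_quality([50, 60, 1, 70, 80], threshold=10, l=3)

print(sustained, too_early, interrupted)
```

The price of this filtering is a delay of l windows between the first exceedance and the confirmed detection.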

2) Parameter Selections: Three parameters need to be determined for the proposed algorithm: number of nearest neighbors (MinPts) of each object, length of the moving data window, and LOF thresholds for various similarity metrics. These parameters can be determined through off-line training using historical data. In order to reduce the detection delay, the length of the moving data window should remain short. The MinPts value can be selected to be around half of the total number of synchrophasor channels, by assuming the total number of low-quality curves at each time window should be less than the total number of high-quality synchrophasor curves.

According to the previous discussions, the overall flowchart of the proposed algorithm is shown in Fig. 4. Key steps for implementing this low-quality data detection approach are as follows.

Step 1: Create the current moving data window by reading in synchrophasor measurements at the latest time instant.

Step 2: Compute fH(·) and fL(·) values for each pair of synchrophasor curves.

Step 3: Compute the LOF value of each synchrophasor curve, based on fH(·) and fL(·). For each synchrophasor curve, the LOF value can be calculated following the equations in the previous subsection.

Step 4: If the LOF value corresponding to fH(·) or fL(·) of the ith synchrophasor curve exceeds the threshold, go to Step 5; otherwise, go to Step 7.

Step 5: If the previous l consecutive LOF values corresponding to fH(·) or fL(·) of the ith synchrophasor curve exceed the threshold, go to Step 6; otherwise, go to Step 7.

Step 6: The ith synchrophasor curve is detected to contain low-quality data at the current time window.

Step 7: Move the data window to the next time instant, and go back to Step 1.

Although the above calculation procedure involves a loop over the synchrophasor curves for the LOF calculation, there is no time-consuming computation (such as matrix inversion, decomposition, etc.) involved. All the operations within the loop require light computational effort, so the computational burden of the entire process is not significant. The computational performance of the proposed algorithm is demonstrated through the case studies.
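Steps 2 to 7 can be sketched end to end in Python as follows. To keep the example short it assumes that the normalized standard deviations of each moving data window (Step 1 and Definition 2) have already been computed and are supplied as a stream of vectors. The function names, the floors `eps` and `tiny`, the value l = 3, and the toy stream are assumptions of this sketch; the thresholds 10 and 100 and MinPts equal to half the number of curves follow the settings stated for the case studies.

```python
def lof_scores(dist, min_pts, tiny=1e-12):
    """LOF of every object from a precomputed distance matrix, Eqs. (6)-(7)."""
    n = len(dist)
    k_dist, neigh = [], []
    for p in range(n):
        kd = sorted(dist[p][q] for q in range(n) if q != p)[min_pts - 1]
        k_dist.append(kd)
        neigh.append([q for q in range(n) if q != p and dist[p][q] <= kd])
    lrd = [len(neigh[p]) / max(sum(max(k_dist[o], dist[p][o]) for o in neigh[p]), tiny)
           for p in range(n)]
    return [sum(lrd[o] for o in neigh[p]) / (len(neigh[p]) * lrd[p])
            for p in range(n)]


def detect_stream(sigma_norm_stream, l=3, thr_high=10.0, thr_low=100.0, eps=1e-9):
    """Return (window index, channel index) pairs flagged as low quality."""
    alarms, streak = [], None
    for t, sigma_norm in enumerate(sigma_norm_stream):
        s = [max(abs(v), eps) for v in sigma_norm]      # assumed floor for zeros
        n = len(s)
        if streak is None:
            streak = [0] * n       # consecutive exceedances seen per channel
        min_pts = max(1, n // 2)
        # Step 2: pairwise f_H and f_L values.
        d_high = [[abs(a - b) for b in s] for a in s]
        d_low = [[max(a / b, b / a) for b in s] for a in s]
        # Step 3: LOF values for both metrics.
        lof_h = lof_scores(d_high, min_pts)
        lof_l = lof_scores(d_low, min_pts)
        for i in range(n):
            exceeded = lof_h[i] > thr_high or lof_l[i] > thr_low   # Step 4
            if exceeded and streak[i] >= l:                         # Steps 5-6
                alarms.append((t, i))
            streak[i] = streak[i] + 1 if exceeded else 0
        # Step 7: the loop advances to the next moving data window.
    return alarms


# Hypothetical stream: channel 2 is frozen (zero deviation) in windows 3 to 7.
clean = [1.00, 1.02, 0.98, 1.01, 0.99, 1.03]
stale = [1.00, 1.02, 0.00, 1.01, 0.99, 1.03]
stream = [clean] * 3 + [stale] * 5 + [clean] * 2

alarms = detect_stream(stream)
print(alarms)
```

In this toy stream the frozen channel is reported only from the fourth affected window onward, which illustrates the detection delay introduced by the robust criterion. Every operation in the loop is a comparison, a sort over at most n values, or a division, consistent with the light computational effort noted above.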

IV. CASE STUDIES

The proposed approach is tested using both synthetic and real-world synchrophasor data. Low-quality measurements with various causes are used to verify the effectiveness of the approach. In all the following test cases, a single set of algorithm parameters is used: moving data window length = 20 data points; LOF threshold corresponding to fH(·) = 10; LOF threshold corresponding to fL(·) = 100; number of neighboring data points for the LOF algorithm = 0.5 × number of synchrophasor curves. In order to demonstrate that the proposed method is capable of detecting low-quality data under fault-on operating conditions, a system physical disturbance (fault) is recorded by the synchrophasor data in each test case.

Fig. 4. Overall flowchart of the proposed approach.

Fig. 5. Synthetic synchrophasor measurements with high sensing noise.

A. Case Study With Synthetic Data

The synthetic synchrophasor measurements are sampled from the simulation results of a standard IEEE-14 test system, with a sampling rate of 50 Hz. A three-phase line-to-ground fault occurs during the simulation. In each test case, one type of low-quality data is randomly inserted into a subset of the test data.

1) Synthetic Data With High Sensing Noise: This test data set contains 14 synthetic voltage magnitude measurement curves, where 3 of them (No. 1, 5, 14) contain Gaussian noise lasting from 6 s to 6.4 s, with a signal-to-noise ratio (SNR) of 40 dB. Fig. 5 shows the 3 curves with data quality problems.

Table I presents the detection results. It shows that all the 3 noisy data segments are successfully detected, without introducing any false alarm by the physical disturbance. A small detection delay (less than 0.38 s) is introduced, due to the length of the moving data window. The average computing time for each moving data window is 0.0161 s. Fig. 6 presents the LOF values of all the synchrophasor curves, when a data quality problem or physical disturbance is present. This comparison shows that the LOF values exceed the threshold when low-quality data is present, while they remain below the threshold when a physical disturbance is present. The results indicate the proposed method is able to detect low-quality synchrophasor data while avoiding false alarms caused by system physical disturbances.

TABLE I
DETECTION RESULTS FOR SYNTHETIC SYNCHROPHASOR DATA WITH HIGH SENSING NOISE

| Index of Synchrophasor with High Noise | Starting Time of Noisy Segment | Ending Time of Noisy Segment |
|---|---|---|
| 1 | 6.22 s (LOF = 620.5) | 6.78 s (LOF = 31.9) |
| 5 | 6.34 s (LOF = 429.1) | 6.78 s (LOF = 73.3) |
| 14 | 6.34 s (LOF = 418.6) | 6.76 s (LOF = 48.2) |

Fig. 6. LOF values of synthetic synchrophasor channels when physical disturbance (right) or high sensing noise (left) is present.

2) Synthetic Data With Spikes: This test data set contains 47 synthetic real power measurement curves, where 4 of them (No. 3, 6, 30, 45) contain data spikes lasting from 6.3 s to 6.4 s. These spikes can be caused by problems such as data loss or time skew of the GPS clock [10]. Fig. 7 shows the 4 curves with data quality problems.

The detection results are shown in Table II. All the 4 spikesare detected and no false alarm is introduced by physical dis-turbance. The detection delay introduced by the length of themoving data window is less than 0.36 s. The average computingtime for each moving data window is 0.0627 s. Fig. 8 presentsthe LOF values of all the synchrophasor curves, when data qual-ity problem or physical disturbance is presented. It is clear thatlow-quality data would cause the LOF values to exceed the


Fig. 7. Synthetic synchrophasor measurements with data spikes.

TABLE II
DETECTION RESULTS FOR SYNTHETIC SYNCHROPHASOR DATA WITH SPIKES

Index of Synchrophasor    Starting Time of    Ending Time of
with Data Spike           Spike Segment       Spike Segment



Fig. 8. LOF values of synthetic synchrophasor channels when physical disturbance (right) or data spike (left) is present.

Fig. 9. Synthetic synchrophasor measurements with un-updated data.

threshold, while physical disturbances in the system do not cause a significant increase in LOF values.

3) Synthetic Data With Un-Updated Data: This test data set contains 14 synthetic voltage magnitude measurement curves, 3 of which (No. 6, 12, 13) contain un-updated data lasting from 6 s to 6.4 s. Fig. 9 shows the 3 curves with data quality problems.

Table III presents the detection results. The 3 un-updated data segments are detected, and the presence of the physical disturbance does not cause any false alarm. The detection delay introduced by the length of the moving data window is less than 0.36 s, and the average computing time for each moving data window is 0.0128 s.

4) Synthetic Data With False Data Injection: This test data set contains 47 synthetic real power measurement curves, 4 of which (No. 15, 21, 29, 42) contain false data injections lasting

TABLE III
DETECTION RESULTS FOR SYNTHETIC SYNCHROPHASOR DATA WITH UN-UPDATED DATA

Index of Synchrophasor    Starting Time of          Ending Time of
with Un-updated Data      Un-updated Segment        Un-updated Segment
6                         6.36 s (LOF = 3423.6)     6.40 s (LOF = 3519.5)
12                        6.36 s (LOF = 3423.6)     6.40 s (LOF = 3519.5)
13                        6.36 s (LOF = 3423.6)     6.40 s (LOF = 3519.5)

Fig. 10. Synthetic synchrophasor measurements with false data injection.

TABLE IV
DETECTION RESULTS FOR SYNTHETIC SYNCHROPHASOR DATA WITH FALSE DATA INJECTIONS

Index of Synchrophasor      Starting Time of          Ending Time of
with False Data Injection   Injected Data Segment     Injected Data Segment
15                          6.32 s (LOF = 39.7)       6.78 s (LOF = 30.9)
21                          6.32 s (LOF = 25.7)       6.78 s (LOF = 19.9)
29                          6.32 s (LOF = 14.1)       6.78 s (LOF = 10.7)
42                          6.34 s (LOF = 10.8)       6.72 s (LOF = 10.9)

from 6 s to 6.4 s. Fig. 10 shows the 4 curves with data quality problems.

The detection results are shown in Table IV. Although a physical disturbance is present, all 4 false data injections are correctly detected and no false alarm is introduced. The detection delay caused by the length of the moving data window is less than 0.38 s. The average computing time for each moving data window is 0.0627 s.

In all the above case studies using synthetic synchrophasor measurements, the maximum detection delay is less than 0.4 s, and the maximum computing time for each moving data window is less than 0.1 s. According to [16], the data latency requirements for online quasi-steady-state applications (state estimation, small signal stability analysis, oscillation analysis, voltage stability analysis, etc.) range from 1 s to 5 s. Both the detection delay and the computing time of the proposed method therefore satisfy the latency requirements for synchrophasor-based online quasi-steady-state applications, and the proposed method is suitable for online detection of low-quality synchrophasor measurements in order to improve the accuracy of these applications.
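The latency argument above amounts to a simple budget check, sketched below. The helper `within_latency_budget` is hypothetical, and adding the detection delay to the computing time is a conservative assumption made here for illustration; the paper compares each quantity with the requirement separately.

```python
def within_latency_budget(detection_delay_s, computing_time_s, requirement_s):
    """True if worst-case delay plus computing time fits inside the requirement."""
    return detection_delay_s + computing_time_s <= requirement_s

# Upper bounds quoted for the synthetic cases, checked against the range cited from [16].
strictest_ok = within_latency_budget(0.4, 0.1, 1.0)  # 1 s requirement
loosest_ok = within_latency_budget(0.4, 0.1, 5.0)    # 5 s requirement
```

Even under this conservative sum, the quoted bounds leave a margin against the strictest 1 s requirement.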


Fig. 11. Real-world synchrophasor measurements with high sensing noise.

TABLE V
DETECTION RESULTS FOR REAL-WORLD SYNCHROPHASOR DATA WITH HIGH SENSING NOISE

Index of Synchrophasor    Starting Time of    Ending Time of
with High Noise           Noisy Segment       Noisy Segment


Fig. 12. LOF values of real-world synchrophasor channels when physical disturbance (right) or high sensing noise (left) is present.

B. Case Study With Real-World Data

High-quality synchrophasor measurements obtained from a real-world power grid are used to test the proposed approach. The sampling rate of the data is 100 Hz. A line-tripping fault is recorded by the data. In each test case, one type of low-quality data is manually inserted into a randomly chosen subset of the test data, so that the ground truth regarding the existence of low-quality data is known with certainty.

1) Real-World Data With High Sensing Noise: This test data set contains 39 real-world voltage magnitude measurement curves, 4 of which (No. 10, 15, 23, 29) contain Gaussian noise lasting from 1 s to 1.2 s, with an SNR of 40 dB. The SNR of the original clean data set is tested to be well above 40 dB. Fig. 11 shows the 4 curves with data quality problems.

Table V presents the detection results. All 4 noisy data segments are successfully detected, and the physical disturbance introduces no false alarm. A small detection delay (less than 0.19 s) is introduced by the length of the moving data window. The average computing time for each moving data window is 0.0376 s. Fig. 12 presents the LOF values of all the synchrophasor curves when a data quality problem or a physical disturbance is present. This comparison shows that the LOF values exceed the threshold when low-quality

Fig. 13. Real-world synchrophasor measurements with data spikes.

TABLE VI
DETECTION RESULTS FOR REAL-WORLD SYNCHROPHASOR DATA WITH SPIKES

Index of Synchrophasor    Starting Time of    Ending Time of
with Data Spike           Spike Segment       Spike Segment



Fig. 14. LOF values of real-world synchrophasor channels when physical disturbance (right) or data spike (left) is present.

data is present, but remain below the threshold when a physical disturbance is present. The results indicate that the proposed method is able to detect low-quality synchrophasor data while avoiding false alarms caused by physical disturbances.

2) Real-World Data With Spikes: This test data set contains 22 real-world real power measurement curves, 4 of which (No. 3, 6, 20, 21) contain data spikes at the time instant of 1.06 s. In this test case, the length of each data spike is one sample. This test scenario is created in order to test the performance of the algorithm in detecting a single data dropout. Fig. 13 shows the 4 curves with data quality problems.

The detection results are shown in Table VI. All 4 spikes are detected, and the physical disturbance introduces no false alarm. The detection delay introduced by the length of the moving data window is less than 0.19 s. The average computing time for each moving data window is 0.0150 s. Fig. 14 presents the LOF values of all the synchrophasor curves when a data quality problem or a physical disturbance is present. It is clear that low-quality data causes the LOF values to exceed the


Fig. 15. Real-world synchrophasor measurements with un-updated data.

TABLE VII
DETECTION RESULTS FOR REAL-WORLD SYNCHROPHASOR DATA WITH UN-UPDATED DATA

Index of Synchrophasor    Starting Time of          Ending Time of
with Un-updated Data      Un-updated Segment        Un-updated Segment
1                         1.18 s (LOF = 4637.2)     1.20 s (LOF = 4537.2)
5                         1.18 s (LOF = 4637.2)     1.20 s (LOF = 4537.2)
7                         1.17 s (LOF = 3317.8)     1.20 s (LOF = 4537.2)
13                        1.18 s (LOF = 4637.2)     1.20 s (LOF = 4537.2)

Fig. 16. Real-world current magnitude synchrophasor measurements.

threshold, while physical disturbances in the system do not cause a significant increase in LOF.

3) Real-World Data With Un-Updated Data: This test data set contains 13 real-world current magnitude measurement curves, 4 of which (No. 1, 5, 7, 13) contain un-updated data lasting from 1 s to 1.2 s. Fig. 15 shows the 4 curves with data quality problems.

Table VII presents the detection results. The 4 un-updated data segments are detected, and the presence of the physical disturbance does not cause any false alarm. The detection delay introduced by the length of the moving data window is less than 0.18 s, and the average computing time for each moving data window is 0.0115 s.

Fig. 16 presents the current magnitude data obtained from synchrophasor channels No. 1 and No. 2, where channel No. 1 contains un-updated data from 1 s to 1.2 s and channel No. 2 contains clean data only. Fig. 17 presents the normalized deviations of the two synchrophasor channels as the computation data window moves with time. It is clear that: 1) under normal operating conditions, the normalized

Fig. 17. Normalized deviation of synchrophasor channel No. 1 and No. 2.

Fig. 18. Similarity metric fH (i, j) (left) or fL (i, j) (right) between synchrophasor channels No. 1 and No. 2.

Fig. 19. LOF values when similarity metric fH (i, j) (left) or fL (i, j) (right) is applied.

deviations of clean data segments lie close to one; 2) under fault-on operating conditions, the normalized deviations of clean data segments increase significantly; 3) the normalized deviations of un-updated data segments decrease towards zero.
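The three regimes listed above can be reproduced with a toy calculation. The sketch below assumes, purely for illustration, that the normalized deviation of a window is its standard deviation divided by a reference standard deviation taken from earlier normal-condition data of the same channel. The exact definition used by the method is given earlier in the paper and may differ, and all sample values here are made up.

```python
from statistics import pstdev

def normalized_deviation(window, reference_std):
    """Window standard deviation relative to a normal-condition reference (assumed definition)."""
    return pstdev(window) / reference_std

# Hypothetical history recorded under normal operating conditions.
history = [1.00, 1.01, 0.99] * 4
reference_std = pstdev(history)

normal_window = [1.00, 1.01, 0.99, 1.00, 1.01, 0.99]   # same variability as history
fault_window = [1.00, 0.80, 0.70, 0.85, 0.95, 1.00]    # large swing during a fault
frozen_window = [1.00] * 6                             # un-updated (repeated) samples

nd_normal = normalized_deviation(normal_window, reference_std)
nd_fault = normalized_deviation(fault_window, reference_std)
nd_frozen = normalized_deviation(frozen_window, reference_std)
```

Under this assumed definition, `nd_normal` is close to one, `nd_fault` is much larger than one, and `nd_frozen` is exactly zero because a repeated sample has no variability.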

Fig. 18 presents the fH (i, j) and fL (i, j) values of synchrophasor channels No. 1 and No. 2 as the computation data window moves with time. Fig. 19 presents the LOF values of synchrophasor channels No. 1 and No. 2 as the computation data window moves with time. It is clear from Figs. 18 and 19 that fL (i, j) is more sensitive to the un-updated data than fH (i, j), and therefore leads to better detection performance for un-updated data.

4) Real-World Data With False Data Injection: This test data set contains 39 real-world voltage magnitude measurement curves, 4 of which (No. 2, 20, 27, 37) contain false data injections lasting from 1 s to 1.2 s. Fig. 20 shows the 4 curves with data quality problems.

The detection results are shown in Table VIII. Although a physical disturbance is present, all 4 false data injections are


Fig. 20. Real-world synchrophasor measurements with false data injection.

TABLE VIII
DETECTION RESULTS FOR REAL-WORLD SYNCHROPHASOR DATA WITH FALSE DATA INJECTIONS

Index of Synchrophasor      Starting Time of          Ending Time of
with False Data Injection   Injected Data Segment     Injected Data Segment


correctly detected and no false alarm is introduced. The detection delay caused by the length of the moving data window is less than 0.19 s. The average computing time for each moving data window is 0.0475 s.

In all the above case studies using real-world synchrophasor measurements, the maximum detection delay is less than 0.2 s, and the maximum computing time for each moving data window is less than 0.05 s. According to [16], the data latency requirements for online quasi-steady-state applications (such as state estimation, small signal stability analysis, oscillation analysis, and voltage stability analysis) range from 1 s to 5 s. Both the detection delay and the computing time of the proposed method therefore satisfy the latency requirements for synchrophasor-based online quasi-steady-state applications, and the proposed method is suitable for online detection of low-quality synchrophasor measurements in order to improve the accuracy of these applications.

Since the detection delay of the proposed algorithm is mainly caused by the length of the moving data window, the delay can be estimated and removed when the occurrence time of the low-quality data is reported. The reported occurrence time of the low-quality data can then be very close to its actual occurrence time.
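A minimal sketch of this correction is given below. The function name and the 0.18 s delay estimate are hypothetical; how the delay would be estimated in practice is not specified here, so a fixed value is simply assumed.

```python
def corrected_occurrence_time(reported_time_s, estimated_delay_s):
    """Shift a reported detection time back by the estimated window-induced delay."""
    return reported_time_s - estimated_delay_s

# Channel No. 1 in Table VII is first flagged at 1.18 s.
# Assumption: the window-induced delay has been estimated as 0.18 s.
estimated_start_s = corrected_occurrence_time(1.18, 0.18)
```

With this assumed estimate, the corrected time coincides (up to floating-point rounding) with the 1 s instant at which the un-updated data was inserted.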

For power grids with a large number of synchrophasors, the computation speed of the proposed algorithm could be further improved by applying the detection algorithm in a decentralized framework. In large systems, multiple detection engines could be applied to process synchrophasor measurements obtained from different physical locations or control areas (such as different states or different local control centers). Synchrophasors lying far from each other could be grouped into different subgroups and processed in parallel by different detection engines. This decentralized framework could help reduce the

number of synchrophasor channels that need to be processed by each detection engine, and therefore improve the computation speed of each detection engine. Since this method does not require any system-wide information (such as system topology), it can be easily decentralized without spending extra effort on creating a reduced or equivalent system model.
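The grouping idea can be sketched as follows. The area labels, the channel data, and `stand_in_detector` are all hypothetical; the stand-in simply flags channels whose window mean is far from the group median and is used only so that the example runs, in place of the LOF-based engine.

```python
from collections import defaultdict
from statistics import median

def group_by_area(channel_area):
    """Map each control area to the list of channels assigned to it."""
    groups = defaultdict(list)
    for channel, area in channel_area.items():
        groups[area].append(channel)
    return dict(groups)

def stand_in_detector(windows, tol=0.05):
    """Placeholder engine: flag channels whose mean is far from the group median."""
    means = {ch: sum(w) / len(w) for ch, w in windows.items()}
    mid = median(means.values())
    return sorted(ch for ch, m in means.items() if abs(m - mid) > tol)

def detect_decentralized(windows, channel_area):
    """Run one detection engine per area, each seeing only its own channels."""
    return {
        area: stand_in_detector({ch: windows[ch] for ch in channels})
        for area, channels in group_by_area(channel_area).items()
    }

# Hypothetical channels in two areas with different operating levels.
channel_area = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
windows = {
    1: [1.00, 1.00], 2: [1.01, 1.00], 3: [1.30, 1.20],   # channel 3 deviates in area A
    4: [0.50, 0.50], 5: [0.50, 0.51], 6: [0.49, 0.50],
}
flagged_by_area = detect_decentralized(windows, channel_area)
```

Each engine only compares channels within its own area, so the number of channels per engine shrinks and no system model is needed to form the groups.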

Meanwhile, parallel processing could also help improve the online computation performance of the proposed algorithm. Multiple processors could be applied at each detection engine, so that several consecutive moving data windows could be processed by different processors at the same time. This parallel technique could improve the overall computation speed when the proposed algorithm is applied to power systems with a significant number of synchrophasors.
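Processing consecutive windows concurrently might look like the sketch below. The per-window function `score_window` is a hypothetical stand-in (peak-to-peak range) rather than the LOF computation, and a thread pool is used only to keep the example portable; for CPU-bound scoring, a process pool would be the more natural choice.

```python
from concurrent.futures import ThreadPoolExecutor

def moving_windows(samples, length, step=1):
    """All consecutive windows of the given length over the sample stream."""
    return [samples[s:s + length] for s in range(0, len(samples) - length + 1, step)]

def score_window(window):
    """Stand-in per-window computation (peak-to-peak range), not the LOF score."""
    return max(window) - min(window)

def score_in_parallel(samples, length, workers=4):
    """Score consecutive windows on a pool of workers, preserving window order."""
    windows = moving_windows(samples, length)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(score_window, windows))

# Hypothetical stream with a single spike at index 3.
stream = [1.0, 1.0, 1.0, 5.0, 1.0, 1.0]
window_scores = score_in_parallel(stream, length=3)
```

Because `pool.map` returns results in submission order, the scores stay aligned with the time order of the windows even though the windows are evaluated concurrently.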

V. CONCLUSION

This paper presents a framework that makes online detection and improvement of synchrophasor data quality issues possible. The proposed approach formulates the low-quality synchrophasor data as spatio-temporal outliers among all the synchrophasor measurements, and performs detection through a density-based local outlier detection algorithm. Similarity metrics are proposed to quantify the spatio-temporal similarities among multi-time-instant synchrophasor measurements. The proposed approach has satisfactory performance under both normal and fault-on operating conditions. It requires no prior information on system modeling and topology. The computation speed of the proposed algorithm is suitable for online applications. Synthetic and real-world synchrophasor measurements are used to verify the effectiveness of the proposed approach. This framework, if successful, could boost system operators' confidence in synchrophasor-based analytics in modern power systems.

Building upon this work, future research could focus on developing similarity metrics with more sensitive and robust performance, identifying root causes of the low-quality problems, and correcting the low-quality synchrophasor data.

REFERENCES

[1] E. Ghahremani and I. Kamwa, “Dynamic state estimation in power system by applying the extended Kalman filter with unknown inputs to phasor measurements,” IEEE Trans. Power Syst., vol. 26, no. 4, pp. 2556–2566, Nov. 2011.

[2] D. Kosterev, “Hydro turbine-governor model validation in pacific northwest,” IEEE Trans. Power Syst., vol. 19, no. 2, pp. 1144–1149, May 2004.

[3] I. Kamwa, R. Grondin, and Y. Hebert, “Wide-area measurement based stabilizing control of large power systems-a decentralized/hierarchical approach,” IEEE Trans. Power Syst., vol. 16, no. 1, pp. 136–153, Feb. 2001.

[4] J. Bertsch, C. Carnal, D. Karlson, J. McDaniel, and K. Vu, “Wide-area protection and power system utilization,” Proc. IEEE, vol. 93, no. 5, pp. 997–1003, May 2005.

[5] California ISO, “Five year synchrophasor plan,” California ISO, Folsom, CA, USA, Tech. Rep., Nov. 2011.

[6] W. Qi, “Comparison of differences between SCADA and WAMS real-time data in dispatch center,” in Proc. 12th Int. Workshops Electr. Power Control Centers, Jun. 2013, pp. 1–37.

[7] S. Ghiocel et al., “Phasor-measurement-based state estimation for synchrophasor data quality improvement and power transfer interface monitoring,” IEEE Trans. Power Syst., vol. 29, no. 2, pp. 881–888, Mar. 2014.

[8] K. Jones, A. Pal, and J. Thorp, “Methodology for performing synchrophasor data conditioning and validation,” IEEE Trans. Power Syst., vol. 30, no. 3, pp. 1121–1130, May 2015.

[9] K. Martin, “Synchrophasor data diagnostics: Detection & resolution of data problems for operations and analysis,” in Electric Power Group Webinar Series, Jan. 2014.

[10] Q. Zhang and V. Venkatasubramanian, “Synchrophasor time skew: Formulation, detection and correction,” in Proc. North Amer. Power Symp., Sep. 2014, pp. 1–6.

[11] S. Dutta and T. Overbye, “Information processing and visualization of power system wide area time varying data,” in Proc. IEEE Symp., Comput. Intell. Appl. Smart Grid, Apr. 2013, pp. 6–12.

[12] M. Wang et al., “A low-rank matrix approach for the analysis of large amounts of power system synchrophasor data,” in Proc. 48th Hawaii Int. Conf. Syst. Sci., Jan. 2015, pp. 2637–2644.

[13] L. Liu, M. Esmalifalak, Q. Ding, V. Emesih, and Z. Han, “Detecting false data injection attacks on power grid by sparse optimization,” IEEE Trans. Smart Grid, vol. 5, no. 2, pp. 612–621, Mar. 2014.

[14] M. Wu and L. Xie, “Online identification of bad synchrophasor measurements via spatio-temporal correlations,” in Proc. Power Syst. Comput. Conf., Jun. 2016, pp. 1–7.

[15] M. M. Breunig, H.-P. Kriegel, R. T. Ng, and J. Sander, “LOF: Identifying density-based local outliers,” in Proc. ACM SIGMOD Int. Conf. Manag. Data, 2000, pp. 93–104. [Online]. Available: http://doi.acm.org/10.1145/342009.335388

[16] P. Kansal and A. Bose, “Bandwidth and latency requirements for smart transmission grid applications,” IEEE Trans. Smart Grid, vol. 3, no. 3, pp. 1344–1352, Sep. 2012.

Meng Wu (S’14) received the B.E. degree from Tianjin University, Tianjin, China, in 2010, and the M.Eng. degree from Cornell University, Ithaca, NY, USA, in 2011, both in electrical engineering. She is currently working toward the Ph.D. degree at Texas A&M University, College Station, TX, USA. She was an R&D Engineer at China Electric Power Research Institute, Beijing, China, from 2011 to 2012, and Beijing Sifang Automation Co. Ltd., Beijing, from 2012 to 2013. Her research interests include power system modeling, stability analysis, renewable energy integration, and wide-area monitoring and control systems.

Le Xie (S’05–M’10–SM’16) received the B.E. degree in electrical engineering from Tsinghua University, Beijing, China, in 2004, the M.S. degree in engineering sciences from Harvard University, Cambridge, MA, USA, in 2005, and the Ph.D. degree from the Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA, USA, in 2009.

He is currently an Associate Professor with the Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA. His research interests include modeling and control of large-scale complex systems, smart grids application with renewable energy resources, and electricity markets.
