j&m-2

7/27/2019 J&M-2


1402 IEEE TRANSACTIONS ON POWER DELIVERY, VOL. 28, NO. 3, JULY 2013

Detection and Classification of Faults in Power Transmission Lines Using Functional Analysis and Computational Intelligence

André de Souza Gomes, Marcelo Azevedo Costa, Thomaz Giovani Akar de Faria, and Walmir Matos Caminhas

Abstract: The transmission line is the most vulnerable element of any electrical power system due to its large physical dimension. As a consequence, many fault diagnosis algorithms have been proposed in the literature. In general, most proposals use signal-processing analysis and computational intelligence. In this paper, a new model to functionally represent the phases of a transmission line is proposed. The detection and classification strategies are developed from the analysis of the model's parameters and were evaluated using a set of simulated faults and a real database. The results show that the proposed model detects faults very quickly, using a vastly simplified mathematical process, and is able to classify faults accurately.

Index Terms: Detection and classification of faults, power transmission lines.

I. INTRODUCTION

THE ABILITY to detect faults in transmission lines as fast as possible is crucial, since faults may compromise the propagation of energy to customers and the functioning of the transmission network. Therefore, efficient fault detection approaches focus primarily on the analysis of short time intervals, or transient signals. In this context, the wavelet transform has emerged as a powerful tool for feature extraction, mainly due to its ability to focus on short time intervals for the analysis of high-frequency components [1], [2]. Unlike Fourier transforms [3], wavelet transforms can use varying time windows in order to extract the coefficients of the mother wavelet [4]-[6]. Briefly, short time windows are applied for high-frequency components, and long time windows for low-frequency components. Further details about wavelets can be found in [1].

The estimates of wavelet coefficients require further analysis in order to detect or classify faults. In general, the coefficients

Manuscript received March 12, 2012; revised July 27, 2012, November 14, 2012, and February 06, 2013; accepted February 26, 2013. Date of publication March 27, 2013; date of current version June 20, 2013. This work was supported in part by CAPES-Brasil, in part by CEMIG, in part by FAPEMIG, and in part by CNPq. Paper no. TPWRD-00260-2012.

A. S. Gomes and W. M. Caminhas are with the Graduate Program in Electrical Engineering, Federal University of Minas Gerais, Belo Horizonte 31270-901 MG, Brazil.

M. A. Costa is with the Department of Production Engineering, Federal University of Minas Gerais, Belo Horizonte 31270-901 MG, Brazil.

T. G. A. de Faria is with Electric Company of Minas Gerais, Minas Gerais, Belo Horizonte 30150-150, Brazil.

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TPWRD.2013.2251752

are used as inputs to classification models, such as artificial neural networks (ANNs) [7]; fuzzy systems [8]-[10]; classification and regression trees (CART) [11]; support vector machines (SVMs) [12]; and a combination of these and other techniques [13], [14]. Despite their popularity, ANNs have been widely criticized because they require a considerable amount of training [11]. As alternatives to ANNs, SVM and CART reduce training efforts. Nevertheless, there is always a need for innovative methods for transmission-line protection which can potentially detect faults faster than current routines [2].

In fact, current industrial relays for line protection are based on simpler but still effective techniques, such as the absolute sum of the differential current signal, differences in phase angle, etc. [15], [16]. The principle of differential and directional relays consists of the sequential surveillance of a protection zone (i.e., these relays detect changes in the differences between the total current output of the zone and the total current input entering the zone). For instance, threshold boundaries for differential quantities and the use of wavelets are found in recent literature [17], [18].

Similarly, in order to detect patterns that do not conform to the expected behavior, our proposed anomaly detection approach defines a protection zone, or a region that represents normal behavior within the domain of transmission lines, and then declares any behavior outside this region as an anomaly. In general, the normal behavior of transmission lines is much easier to model than the anomalous behavior. This is because the behavior of an anomaly is usually unknown and, therefore, difficult to model. In such modeling, statistical inference analysis [19] is crucial and provides mechanisms to properly build the boundaries between normal and anomalous behaviors. That is, the error of classifying normal behavior as an anomaly is controlled: this is also known as the control of the type I error.

Our approach focuses on a reliable representation of the transmission line under normal operating conditions. Our proposed mathematical model includes stochastic components which account for current and voltage stochastic deviations, or noise, under normal operating conditions. By doing so, we provide a novel stochastic representation of the transmission lines, which enables faster detection of anomalous behaviors, or faults. Therefore, our novel approach also relies on anomaly detection techniques [20].

This work was initially motivated by a research project between the Department of Electronics Engineering and the Electric Company of Minas Gerais (CEMIG), Brazil. CEMIG has a transmission network of approximately 7605 km. It is the third

0885-8977/$31.00 © 2013 IEEE



largest power transmission company in Brazil and is responsible for the propagation of large amounts of energy throughout the country. Currently, the power transmission protection system provides prompt responses to faults; that is, the network is locally shut down when a fault is detected. Subsequently, maintenance teams move to the location of the fault. In most situations, there is no upfront information regarding the type of fault that will be found. Therefore, once the maintenance teams reach the fault location and identify its type, additional equipment may be required. Because of the large extent of the transmission network, the transport of extra equipment may considerably delay the return of the transmission line to its normal operating behavior. Therefore, a system that identifies the probable cause of faults can help the maintenance team bring the proper equipment.

Initially, we proposed a methodology intended to classify faults in transmission lines. We found that the boundaries built around the model of normal operation can also be used to detect faults quickly. On average, the proposed model detects faults in 0.09 cycles (or 3.64 ms), or faster. It is important to point out that our detection time of 0.094 cycles at 60 Hz is better than the recently published results of half a cycle using a sampling rate of 32 samples per cycle at 50 Hz [12], and two full cycles using a sampling rate of 256 samples per cycle at 50 Hz [11].

This paper is organized as follows: Section II presents the novel approach based on the elliptical behavior of the current and voltage signals. Section III presents the performance of the method for simulated and real databases. Finally, Section IV presents the conclusion.

II. METHODS

A. Mathematical Background

Let the voltage and current signals in one phase of a transmission line at time $t$ be written as

(1)
(2)

Likewise, we can express voltage and current signals as sine functions

$$v(t) = V_p \sin(\omega t) \tag{3}$$
$$i(t) = I_p \sin(\omega t - \varphi) \tag{4}$$

We do this in order to simplify the mathematical analysis of the proposed model. In this model, the angle $\varphi$ is the delay between the current sine signal $i(t)$ and the voltage sine signal $v(t)$. $V_p$ and $I_p$ are the peak values of the voltage and current sine signals, respectively. $\cos\varphi$ is the power factor (PF), and $\omega = 2\pi f$ is the angular velocity. In the Brazilian system, $f = 60$ Hz. For any electrical system, PF is also defined as the ratio between active power (P) and apparent power (S).

Fig. 1 shows the expected 2-D behavior of the voltage and current sine signals for one phase of a transmission line at standard operation (i.e., without noise). This behavior can be modeled using a conic section mathematical equation or, more specifically, the equation of the ellipse.

Fig. 1. Elliptical behavior of a phase in a transmission line, for different values of : (a) and (b) .

It can be shown that different values of the PF generate different values for the radii and the rotation angle. If the value of the PF is kept constant and the peak values of the current and voltage sine signals change, then the shape of the ellipse changes, as shown in Fig. 2. Considering the transmission system under standard operation, the 2-D elliptical behavior is similar for each of the three phases of a transmission line.

B. Geometric Representation of One Phase in a Transmission Line

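Given the operational peak values $V_p$ and $I_p$ for the voltage and current signals, the standardized signals can be written as

$$v_s(t) = \frac{v(t)}{V_p} = \sin(\omega t) \tag{5}$$

$$i_s(t) = \frac{i(t)}{I_p} = \sin(\omega t - \varphi) \tag{6}$$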



Fig. 2. Elliptical behavior of a phase in a transmission line for different peak values of current and voltage sine signals. (a) $V_p$ = 200 kV and $I_p$ = 0.4 kA, (b) $V_p$ = 200 kV and $I_p$ = 1.5 kA.

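Furthermore, by applying basic trigonometric identities, it follows:

$$\sin(\omega t - \varphi) = \sin(\omega t)\cos\varphi - \cos(\omega t)\sin\varphi \tag{7}$$

Thus, from (5) and (6) and applying (7), the Cartesian form for the equation of the ellipse is defined as

$$v_s^2 - 2\, v_s\, i_s \cos\varphi + i_s^2 = \sin^2\varphi \tag{8}$$

where $v_s$ and $i_s$ denote the standardized voltage and current signals of (5) and (6), and $\cos\varphi$ is the PF.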

Equation (8) represents the expected behavior of the electrical system after applying the standardization operator, and assuming a constant power factor and normal operating conditions. The standardization of the voltage and current signals enables the comparison of these signals using the same unit scale.

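The nonstandardized equation of the ellipse, for any value of the peak values $V_p$ and $I_p$, is given by

$$\frac{v^2}{V_p^2} - \frac{2\, v\, i \cos\varphi}{V_p I_p} + \frac{i^2}{I_p^2} = \sin^2\varphi \tag{9}$$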

Using the geometric properties of (8), it is possible to monitor the behavior of voltage and current signals for each phase of a transmission line. As will be shown, this representation allows the design of a system to identify, as fast as possible, the moment when potential faults have occurred. The parameters of the ellipse can be re-estimated after a fault; that is, the value of the radii and the slope of the ellipse, along with the estimates of the parameters of the postfault ellipse, may be used as input variables in any classification system.
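As a quick numerical check of the elliptical behavior described in this subsection, the following minimal sketch samples one cycle of noise-free sine signals and evaluates the standardized ellipse relation. The sine convention, the function name `ellipse_residual`, and the operating point (200 kV, 0.4 kA, PF = 0.9, 32 samples per cycle) are illustrative assumptions, not values taken from the paper.

```python
import math

def ellipse_residual(v, i, v_peak, i_peak, phi):
    """Left side minus right side of the ellipse relation
    vs^2 - 2*vs*is*cos(phi) + is^2 = sin(phi)^2 (assumed sine convention)."""
    vs, cs = v / v_peak, i / i_peak  # standardized voltage and current
    return vs ** 2 - 2.0 * vs * cs * math.cos(phi) + cs ** 2 - math.sin(phi) ** 2

# Hypothetical operating point, chosen only for illustration.
V_PEAK, I_PEAK, PHI, F, N = 200.0, 0.4, math.acos(0.9), 60.0, 32
omega = 2.0 * math.pi * F

samples = []
for k in range(N):
    t = k / (N * F)  # N equally spaced samples over one cycle
    samples.append((V_PEAK * math.sin(omega * t),
                    I_PEAK * math.sin(omega * t - PHI)))

# Largest deviation from the ellipse over the whole cycle.
max_abs_residual = max(abs(ellipse_residual(v, i, V_PEAK, I_PEAK, PHI))
                       for v, i in samples)
```

Under these assumptions every noise-free sample lies on the ellipse up to floating-point error, while a point such as the origin gives a clearly nonzero residual.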

C. Estimates of the Coefficients of a Conic Section

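According to Boldrine et al. [21], a conic section defined in $\mathbb{R}^2$ represents a set of points whose coordinates $(x, y)$ satisfy the general equation

$$A x^2 + B x y + C y^2 + D x + E y + F = 0 \tag{10}$$

with the parameters $A \neq 0$ or $B \neq 0$ or $C \neq 0$.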

The parameters can be estimated using a sample of the current and voltage signals. In this case, it is possible to describe a set of linear equations for the parameters in order to find their estimates through least squares.

First, (9) can be rewritten in matrix form as

When rewritten as matrices, the parameters are estimated by means of the least-squares solution, written as

(11)

From the least-squares estimates, $V_p$, $I_p$, and PF can be estimated as
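The least-squares step can be illustrated with a minimal sketch. It assumes the ellipse is written as b1*v^2 + b2*v*i + b3*i^2 = 1 (the nonstandardized ellipse divided by sin^2 of the phase angle); this parameterization, the internal rescaling, and the names `solve3` and `fit_phase_ellipse` are assumptions made for illustration and may differ from the matrix form used by the authors.

```python
import math

def solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_phase_ellipse(v, i):
    """Least-squares fit of b1*u^2 + b2*u*w + b3*w^2 = 1 on rescaled signals;
    returns estimates of (V_p, I_p, PF) under the assumed parameterization."""
    sv, si = max(abs(x) for x in v), max(abs(x) for x in i)  # scaling for conditioning
    rows = [((a / sv) ** 2, (a / sv) * (b / si), (b / si) ** 2) for a, b in zip(v, i)]
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(3)] for p in range(3)]
    xty = [sum(r[p] for r in rows) for p in range(3)]  # right-hand side is all ones
    b1, b2, b3 = solve3(xtx, xty)
    pf = -b2 / (2.0 * math.sqrt(b1 * b3))
    sin2 = 1.0 - pf ** 2
    return sv / math.sqrt(b1 * sin2), si / math.sqrt(b3 * sin2), pf

# Hypothetical noise-free cycle: 200 kV, 0.4 kA, PF = 0.9, 32 samples per cycle.
phi = math.acos(0.9)
v = [200.0 * math.sin(2.0 * math.pi * k / 32) for k in range(32)]
i = [0.4 * math.sin(2.0 * math.pi * k / 32 - phi) for k in range(32)]
v_peak_hat, i_peak_hat, pf_hat = fit_phase_ellipse(v, i)
```

The fit is only defined when the PF is not exactly 1 or -1, since the ellipse then collapses to a straight line and the normal equations become singular.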

D. Rotation and Translation Operations of a Conic Section

As will be shown in Section II-E, it is of interest to represent the equation of the ellipse in a reduced form. The reduced form (or canonical form) of an ellipse can be obtained starting from the general equation, then performing a translation operation of the Cartesian axes [21], followed by a rotation operation of angle $\alpha$. The translation operation can be done by simply subtracting the average values of the original voltage and current signals. The rotation operation creates a new PF between the signals, which is zero (i.e., $\cos\varphi_r = 0$), where the subscript $r$ represents the operation of rotation.



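The reduced form of (8) after the operation of rotation is given as

$$\frac{v_r^2}{1+\cos\varphi} + \frac{i_r^2}{1-\cos\varphi} = 1 \tag{12}$$

or, alternatively, as $v_r^2/a^2 + i_r^2/b^2 = 1$, where $a = \sqrt{1+\cos\varphi}$ and $b = \sqrt{1-\cos\varphi}$, and $v_r$ and $i_r$ are the rotated standardized signals.

For $\varphi = 0$ or $\varphi = \pi$, the final conic equation represents a degenerate ellipse with one of the radii equal to zero and, consequently, the equation defines a straight line. For the remaining cases, (8) and (9) represent an ellipse.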
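The rotation to the reduced form can be checked numerically. The sketch below assumes standardized sine signals and a rotation of 45 degrees, which follows algebraically from the standardized ellipse relation; the rotation direction, the assignment of the radii to the two axes, and the names `rotate` and `reduced_radii` are assumptions for illustration.

```python
import math

def rotate(v_s, i_s, angle=math.pi / 4.0):
    """Rotate a standardized (voltage, current) point by `angle` radians."""
    x = v_s * math.cos(angle) + i_s * math.sin(angle)
    y = -v_s * math.sin(angle) + i_s * math.cos(angle)
    return x, y

def reduced_radii(pf):
    """Radii of the reduced ellipse for a given power factor (assumed axis order)."""
    return math.sqrt(1.0 + pf), math.sqrt(1.0 - pf)

# Hypothetical PF of 0.9 and 32 samples per cycle.
pf = 0.9
phi = math.acos(pf)
a, b = reduced_radii(pf)

worst = 0.0
for k in range(32):
    theta = 2.0 * math.pi * k / 32
    x, y = rotate(math.sin(theta), math.sin(theta - phi))
    # After rotation the cross term vanishes and the canonical form holds.
    worst = max(worst, abs(x ** 2 / a ** 2 + y ** 2 / b ** 2 - 1.0))
```

With a PF of exactly 1 the second radius is zero, which is the degenerate straight-line case mentioned in the text.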

E. Modeling Stochastic Components

The elliptical representation of the behavior of both voltage and current signals does not consider, at first, the existence of the noise which is inherent in the processes of generation and transmission, or even disturbances related to digital recorders (DRs). The DRs collect current and voltage data in the power substations. Even under normal operating conditions and without the occurrence of faults, the transmission lines are constantly subjected to noise that is present in current and voltage signals. In this case, it is of interest to quantify the noise components and to incorporate these components into the parametric equation of the ellipse. In order to do this, we propose an alternative model for the behavior of voltage and current signals by adding two stochastic components: one component to the voltage signal and the second component to the current signal, as follows:

$$v(t) = V_p\left[\sin(\omega t) + \varepsilon_v\right], \qquad i(t) = I_p\left[\sin(\omega t - \varphi) + \varepsilon_i\right] \tag{13}$$

where $\varepsilon_v$ and $\varepsilon_i$ are two random variables, with means of zero and variances of $\sigma_v^2$ and $\sigma_i^2$, respectively.

The model proposed in (13) has interesting mathematical properties, as will be shown next. The fact that the stochastic components have means of zero indicates that the random variables do not affect the average behavior of the ellipse. Both stochastic variables are associated only with the noise, that is, the random dispersion of the voltage and current signals with respect to their nominal values under normal operating conditions.

The fact that the proposed stochastic model is multiplicative with respect to the nominal peak values means that the variances of the noise are proportional to the square of the peak values of the voltage and current signals ($V_p$ and $I_p$). This specific formulation has advantages with respect to the standardization of the signals. Thus, the standardized equations for voltage and current signals, assuming the stochastic components, are defined as

$$v_s(t) = \sin(\omega t) + \varepsilon_v, \qquad i_s(t) = \sin(\omega t - \varphi) + \varepsilon_i \tag{14}$$

It is possible to show that the expected mathematical value of the signal with the stochastic component is given by (8). That is, the stochastic component changes neither the rotation nor the mean value of the ellipse with respect to the Cartesian plane.

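In order to fit the model with the stochastic components, it is necessary to estimate the dispersion parameters $\sigma_v$ and $\sigma_i$. We estimate these parameters by first finding the solution of a 1-D optimization problem that searches for the value of $\theta$ that minimizes the squared Euclidean distance of a point $(v_r, i_r)$ to the ellipse

$$\theta^{*} = \arg\min_{\theta}\left[(v_r - a\cos\theta)^2 + (i_r - b\sin\theta)^2\right] \tag{15}$$

Therefore, the residuals are calculated as

$$e_v = v_r - a\cos\theta^{*}, \qquad e_i = i_r - b\sin\theta^{*} \tag{16}$$

where $a$ and $b$ are the radii of the rotated ellipse, described previously.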

In this case, we use the optimization method named golden-section search [22] to find the minimizing value $\theta^{*}$. We repeat the optimization procedure for a sample of points $(v_{r,j}, i_{r,j})$, $j = 1, \ldots, n$, where $n$ is the sample size. These points represent the electric power system (EPS) under normal operating conditions. Finally, we use the residuals to estimate the dispersion parameters

(17)

(18)

It is worth mentioning that our proposed model projects the temporal behavior of the voltage and current signals into a statistical 2-D space. In this space, we estimate the residuals by simply calculating the minimum distance of a point to the ellipse. This procedure has the advantage of estimating voltage and current residuals simultaneously.

Having found estimates for the dispersion parameters $\sigma_v$ and $\sigma_i$, the ultimate goal of the stochastic model is to build boundaries around the ellipse for the operation of the phases (i.e., voltage and current signals of each phase of a transmission line), under normal operating conditions. These boundaries are built based on statistical inference analysis [19].
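The residual computation described above can be sketched as follows. The sketch assumes the reduced ellipse is parameterized as (a cos t, b sin t), folds each point into the first quadrant so that the distance is unimodal on the search interval, and estimates each dispersion parameter as the root mean square of the residuals; the parameterization, the folding step, the RMS estimator, and all function names are assumptions for illustration and may differ from the authors' implementation.

```python
import math

INV_GOLDEN = (math.sqrt(5.0) - 1.0) / 2.0  # reciprocal of the golden ratio

def golden_section_min(f, lo, hi, tol=1e-10):
    """Golden-section search for the minimizer of a unimodal f on [lo, hi]."""
    while hi - lo > tol:
        c = hi - INV_GOLDEN * (hi - lo)
        d = lo + INV_GOLDEN * (hi - lo)
        if f(c) < f(d):
            hi = d  # minimum lies in [lo, d]
        else:
            lo = c  # minimum lies in [c, hi]
    return 0.5 * (lo + hi)

def ellipse_residuals(x, y, a, b):
    """Residuals of point (x, y) with respect to its nearest point on the
    ellipse (a*cos(t), b*sin(t)), using symmetry to search only [0, pi/2]."""
    ax, ay = abs(x), abs(y)
    t = golden_section_min(
        lambda s: (ax - a * math.cos(s)) ** 2 + (ay - b * math.sin(s)) ** 2,
        0.0, math.pi / 2.0)
    rx, ry = ax - a * math.cos(t), ay - b * math.sin(t)
    # Undo the folding into the first quadrant.
    return (rx if x >= 0 else -rx), (ry if y >= 0 else -ry)

def dispersion(residuals):
    """Root mean square of zero-mean residuals (assumed estimator)."""
    return math.sqrt(sum(e * e for e in residuals) / len(residuals))

# Hypothetical radii and one point slightly outside the ellipse on the first axis.
e_first, e_second = ellipse_residuals(1.5, 0.0, 1.3, 0.5)
```

Repeating `ellipse_residuals` over a sample of normal-operation points and passing each residual list to `dispersion` gives the two dispersion estimates needed to build the boundaries.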

F. Confidence Intervals for the Ellipse Under Normal Operating Conditions

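The following stochastic model is being considered for each phase of any transmission line operating under normal operating conditions:

$$v_r = a\cos\theta + \varepsilon_v, \qquad i_r = b\sin\theta + \varepsilon_i \tag{19}$$

where $a$ and $b$ are the radii of the rotated ellipse, $\theta$ is the angular position on the ellipse, and $\varepsilon_v$ and $\varepsilon_i$ are the two random variables.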

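We want to build a $\gamma$-level confidence interval for $a$ and $b$. To do so, we initially evaluate the points $(v_r, i_r)$ at time $t$, where $\theta = 0$. For this specific condition, $v_r = a + \varepsilon_v$, where $v_r$ follows an unknown distribution with mean $a$ and variance $\sigma_v^2$. Similarly, for $\theta = \pi/2$, then $i_r = b + \varepsilon_i$, where $i_r$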



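also follows an unknown distribution with mean $b$ and variance $\sigma_i^2$. The time points corresponding to $\theta = 0$ and $\theta = \pi/2$ represent the axes of the Cartesian plane where the ellipse is located. If $\varepsilon_v$ and $\varepsilon_i$ are Gaussian random variables, then at $\theta = 0$ and $\theta = \pi/2$, the confidence intervals are defined as

$$a \pm z_{\gamma}\,\sigma_v, \qquad b \pm z_{\gamma}\,\sigma_i \tag{20}$$

where $z_{\gamma}$ is the $z$-score statistic with the $\gamma$-confidence level. For instance, choosing a confidence parameter of 99.7%, then $z_{\gamma} \approx 3$ [19].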

In this case, the upper and lower limits of the confidence interval are represented by equations of ellipses with adjusted radii. At $\theta = 0$, the radius $a$ is adjusted to $a + z_{\gamma}\sigma_v$, which is the upper limit, and to $a - z_{\gamma}\sigma_v$, which is the lower limit. If $a - z_{\gamma}\sigma_v \le 0$, then the lower limit does not exist. The same principle applies to the radius $b$ (i.e., by adjusting the radius to $b + z_{\gamma}\sigma_i$ (upper limit) and to $b - z_{\gamma}\sigma_i$ (lower boundary)). If $b - z_{\gamma}\sigma_i \le 0$, then the lower boundary does not exist. Nevertheless, when $\theta \neq 0$ and $\theta \neq \pi/2$, the additive effect of the proposed stochastic model does not hold. In practice, for the construction of confidence intervals, the additive form proposed in the stochastic model does not result in significant loss in the accuracy of the upper and lower limits.

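Using the proposed additive model has advantages, such as: 1) generating upper and lower limits that are also ellipses; and 2) generating simple rules to assess whether points are within the confidence region. A simple rule to determine whether a point $(v_r, i_r)$ is within the confidence region is to check whether the following conditions are satisfied:

$$\frac{v_r^2}{(a - z_{\gamma}\sigma_v)^2} + \frac{i_r^2}{(b - z_{\gamma}\sigma_i)^2} \ge 1, \qquad \frac{v_r^2}{(a + z_{\gamma}\sigma_v)^2} + \frac{i_r^2}{(b + z_{\gamma}\sigma_i)^2} \le 1 \tag{21}$$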

Fig. 3 shows the confidence region for the ellipse, under normal operating conditions, and considering Gaussian random noise with a dispersion parameter of 0.02 (or 2%).
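The membership rule for the confidence region can be sketched as a test against two concentric ellipses. The form of the inequalities, the treatment of a missing lower limit, and the name `inside_control_region` are assumptions based on the description of adjusted radii in this subsection; the radii and dispersion values in the usage lines are hypothetical.

```python
import math

def inside_control_region(x, y, a, b, sigma_x, sigma_y, z=3.0):
    """True if the rotated point (x, y) lies between the inner and outer
    ellipses whose radii are a -/+ z*sigma_x and b -/+ z*sigma_y."""
    a_hi, b_hi = a + z * sigma_x, b + z * sigma_y
    a_lo, b_lo = a - z * sigma_x, b - z * sigma_y
    inside_outer = (x / a_hi) ** 2 + (y / b_hi) ** 2 <= 1.0
    if a_lo <= 0.0 or b_lo <= 0.0:
        # The lower limit does not exist; only the outer ellipse applies.
        return inside_outer
    outside_inner = (x / a_lo) ** 2 + (y / b_lo) ** 2 >= 1.0
    return inside_outer and outside_inner

# Hypothetical reduced ellipse for PF = 0.9 with 2% dispersion on both axes.
a, b = math.sqrt(1.9), math.sqrt(0.1)
on_ellipse = inside_control_region(a, 0.0, a, b, 0.02, 0.02)        # expected inside
too_far = inside_control_region(a + 0.1, 0.0, a, b, 0.02, 0.02)     # beyond outer limit
collapsed = inside_control_region(0.0, 0.0, a, b, 0.02, 0.02)       # inside inner limit
```

Because both limits are ellipses, the check costs only a few multiplications per sample, which is consistent with the emphasis on a simple rule.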

1) Confidence Intervals for Non-Gaussian Residuals: Confidence intervals can also be estimated if there is evidence that the residuals do not follow a Gaussian distribution. In this situation, upper and lower limits are obtained using the order statistics or the percentiles of the residuals. For instance, assuming a 99% equally tailed interval, the rule to determine whether a point $(v_r, i_r)$ is within the confidence region is rewritten as

$$\frac{v_r^2}{(a + e_v^{0.5\%})^2} + \frac{i_r^2}{(b + e_i^{0.5\%})^2} \ge 1, \qquad \frac{v_r^2}{(a + e_v^{99.5\%})^2} + \frac{i_r^2}{(b + e_i^{99.5\%})^2} \le 1 \tag{22}$$

where $e_v^{0.5\%}$ and $e_i^{0.5\%}$ are the 0.5th percentiles of the voltage and current residuals, respectively; and $e_v^{99.5\%}$ and $e_i^{99.5\%}$ are the 99.5th percentiles of the voltage and current residuals, respectively. Briefly, we sort the residuals in ascending order and select the value that leaves 0.5% of the residuals below it and the value that leaves 99.5% of the residuals below it. By doing so, an empirical confidence interval of 99% (confidence level) is created, if the random variables $\varepsilon_v$ and $\varepsilon_i$ are independent.

Fig. 3. Upper and lower limits defined by modeling the stochastic component, and assuming nominal operating conditions.
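The percentile-based limits can be sketched with a small helper that sorts the residuals and interpolates between order statistics. The linear interpolation rule and the names `percentile` and `empirical_limits` are assumptions; the paper only states that order statistics or percentiles of the residuals are used, and the residual sample below is synthetic.

```python
import math

def percentile(sorted_vals, q):
    """Percentile q (0 to 100) of pre-sorted data with linear interpolation."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1.0 - frac) + sorted_vals[hi] * frac

def empirical_limits(residuals, lower_pct=0.5, upper_pct=99.5):
    """Lower and upper residual limits of an equally tailed empirical interval."""
    ordered = sorted(residuals)
    return percentile(ordered, lower_pct), percentile(ordered, upper_pct)

# Synthetic residuals, evenly spread between -0.1 and 0.1 (illustration only).
residuals = [k / 1000.0 for k in range(-100, 101)]
e_low, e_high = empirical_limits(residuals)

# Adjusted radii for a hypothetical radius a = 1.0 on the same axis.
a_inner, a_outer = 1.0 + e_low, 1.0 + e_high
```

The lower limit is negative and the upper limit positive, so adding them to a radius produces the inner and outer ellipses without any Gaussian assumption.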

G. Fitting the Model Under Normal Operating Conditions

Using a sample of size $n$ for both voltage and current signals under normal operating conditions, $V_p$, $I_p$, and $\varphi$ are estimated, as shown in Section II-C. Alternatively, nominal values can be used, without the need for a sample. The voltage and current signals are then standardized using the estimated peak values. Next, the sample points are rotated using angle $\alpha$. As a consequence, the expected behavior of the current and voltage signals is given by the ellipse in its reduced form (see (12)). Finally, the residuals and their variances are estimated. By doing so, a control region is established by the upper and lower limits, as defined previously in Section II-F.

H. Fault Detection and False Fault Detection

Having defined the control region, each phase of the transmission line is continuously monitored until three consecutive points violate the boundary conditions by leaving the control region. For instance, if the signals are monitored using a sampling rate of 32 points per cycle, then a fault is detected in approximately 0.094 cycles (3/32 of a cycle).
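The detection rule can be sketched as a counter of consecutive boundary violations. The function names and the example flag sequence are hypothetical; the rule itself (three consecutive points outside the control region, 32 samples per cycle) is taken from the text.

```python
def detect_fault(inside_flags, k=3):
    """Return the index of the sample that completes k consecutive
    violations of the control region, or None if no fault is detected."""
    run = 0
    for idx, inside in enumerate(inside_flags):
        run = 0 if inside else run + 1  # an inside point resets the counter
        if run == k:
            return idx
    return None

def detection_delay_cycles(k, samples_per_cycle):
    """Minimum detection delay, in cycles, for k consecutive violations."""
    return k / samples_per_cycle

# Hypothetical monitoring sequence: an isolated outlier, then a sustained violation.
flags = [True] * 10 + [False, True, False, False, False, True]
fault_index = detect_fault(flags)
delay = detection_delay_cycles(3, 32)
```

The isolated outlier at index 10 does not trigger a detection because the next sample falls back inside the region; the fault is declared at the third consecutive violation.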

Assuming a confidence interval of 99.7%, it is expected that under normal operating conditions, an average of 0.3% of the points will leave the boundaries. Therefore, a false fault occurs, on average, with every 333 sampled points. Nonetheless, if we consider three consecutive points to be outside the boundaries as the fault detection criterion, then, under normal operating conditions, a false fault occurs with every 40 000 000 sampled points, approximately, or

$$\frac{1}{(1-\gamma)^{k}} \tag{23}$$

points on average. In this case, $\gamma$ is the confidence value and $k$ is the number of consecutive points. In general, the larger the confidence level and the number of consecutive points outside the boundaries, the smaller the chance that a false fault will occur.



Fig. 4. Flowchart of the proposed method for monitoring the phases of a transmission line.

Nevertheless, in this situation, the time required to detect a true fault will increase. Therefore, the choice of $\gamma$ and $k$ must balance the time required to detect true faults against the number of false faults.
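The trade-off between the confidence level and the number of consecutive points can be illustrated with a short calculation. It assumes that consecutive samples leave the control region independently, which is what the quoted figures of 333 and roughly 40 000 000 points imply; the function name is hypothetical.

```python
def samples_per_false_fault(confidence, k):
    """Average number of sampled points per false fault when k consecutive
    independent points must fall outside a region with the given confidence."""
    return 1.0 / (1.0 - confidence) ** k

single_point = samples_per_false_fault(0.997, 1)   # about 333 points
three_points = samples_per_false_fault(0.997, 3)   # about 3.7e7 points

# At 32 samples per cycle and 60 Hz, convert the second figure to hours.
hours_between_false_faults = three_points / (32 * 60) / 3600.0
```

Under these assumptions the three-point criterion gives roughly 37 million points between false faults, in line with the approximate figure of 40 000 000 quoted in the text, at the cost of a detection delay of three samples.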

I. Classification of Faults

As shown in Section II-H, even under normal operating conditions, the model will eventually detect a false fault. We handle this situation by including the normal behavior of the EPS as one of the possible outcomes of the classifier. Therefore, when a fault is detected, the classifier is able to determine whether the fault is false.

After the fault detection step, the classification stage of the fault starts. New incoming data, after fault detection, are first standardized using previous prefault estimates. That is, we use the $V_p$ and $I_p$ prefault values. Then, the operations of rotation and translation are applied to the points. Recall that these operations also use the parameters estimated from the initial sample of size $n$, assuming normal operating conditions. A flowchart of the proposed method for monitoring the phases of a transmission line is shown in Fig. 4.

For each phase, one postfault ellipse is estimated and the following parameters are used as input patterns for the classifier: 1) radius of the ellipse, projected on the vertical axis; 2) radius of the ellipse, projected on the horizontal axis; 3) absolute value of the rotation angle of the ellipse, with respect to the vertical axis; 4) peak value of the transformed voltage signal; 5) peak value of the transformed current signal; and 6) the postfault power factor.

We explored other parameters as potential inputs for the classifier. Nevertheless, the parameters listed before achieved the best results. The output of the classifier is defined as the type of fault.

It is worth noting that at the postfault stage, there is no need to build a control region. We are mostly interested in evaluating the expected behavior of the postfault ellipse. Therefore, we estimate neither the postfault variances nor the residuals.

Selected Classifiers: The selected classifiers were available from the machine intelligence platform WEKA [23]. The WEKA platform has been previously used in fault detection and classification for power transmission lines [24], and it provides an extensive list of classifiers. We applied the following classifiers: Bayesian networks, naive Bayes, logistic regression, radial basis functions, multilayer perceptron, decision table, decision table naive Bayes, $k$-nearest neighbor (knn), AdaBoost, and bagging, among others. Overall, we tested 22 different classifiers. Among the tested classifiers, the decision table naive Bayes (DTNB), the Bayesian networks, and the $k$-nearest neighbors provided the best results, as shown next. The parameters of the classifiers were adjusted using a five-fold cross-validation procedure [25].

In the following, two examples using the monitoring system, or fault detection system, and the classification system are shown. In the first case, a simulated database is used to detect and classify faults in transmission lines. In the second case, a real database is used to evaluate and classify weather events that have compromised the operation of transmission lines of a power transmission company located in the state of Minas Gerais, Brazil.

III. FAULT STUDIES

It is worth mentioning that the ellipse equation aims at modeling the mean behavior of each phase of any transmission line under normal operating conditions. After a fault, we fit the same model to the voltage and current signals, but we standardize the postfault signals by applying the values of $V_p$, $I_p$, the rotation angle, and the translation parameters that were estimated before the fault. As a consequence, all parameters of the postfault ellipse are interpreted as relative values. For instance, if the postfault ellipse has a current peak value equal to 3, then the postfault current peak value, within the 1.4 cycles after the fault detection, is three times larger than the prefault peak value. The same principle applies to the angle of the postfault ellipse. The use of our proposed parametric model has further advantages. First, our model mimics the functional behavior of the transmission line and provides robust information about the transmission line just a few moments after the fault; that is, the effects of adverse events such as fault current decay, switching of shunt capacitors, and current variations are accounted for as noise components and, as a consequence, do not substantially compromise the postfault estimates of the ellipse. In practice, this means that the ellipse estimated using 1.4 postfault cycles will not differ much from an ellipse estimated using 2 or 3 postfault cycles. In the following, we present two case studies using our proposed method. The first case study uses a simulated database. The second case study uses a real database.

A. Case Study 1: Simulated Database

An EPS model was simulated using the software Power Systems CAD (PSCAD; pscad.com, accessed July 13, 2012), considering two power sources connected by a 230-kV transmission line, 200 km long. The simulated faults were of the short-circuit type (AB, AC, BC, ABC, A-G, B-G, C-G, AB-G, AC-G, BC-G, where A, B, and C are the three phases of any transmission line and G means ground) and of the open-circuit type (A-open, B-open, C-open, AB-open, AC-open, BC-open), resulting in a total of 320 faults. We also included 20 simulations without any faults. The simulations of the faults assumed different values for the following quantities: type of the fault, distance to the fault, fault resistance type, angle of the fault, and power factor. Table I shows the parameters of the simulated faults, where 100 km and . For open-circuit faults, the resistance was set at 1 MΩ. For each scenario described in Table I, 16 simulations of faults were created, as previously described. Normal operating conditions were included in the simulated faults; that is, if an AB short-circuit fault is simulated, then the type of fault for the C phase is normal operation. Therefore, for each simulated



TABLE I
PARAMETERS OF SIMULATED FAULTS

TABLE II
DECISION TABLE FOR THE FINAL CLASSIFICATION OF THE FAULT

fault, there is one possible output for each phase, namely: 1) normal operation, 2) short circuit between phases, 3) short circuit between phase and ground, and 4) open phase.

The classification results for each phase are applied to a decision table, whose output indicates the final diagnosis of the fault in the transmission line. Table II shows the decision table. For each phase of the transmission line, there is one possible output among the four possibilities: 1) normal operation, 2) short circuit between phases, 3) short circuit between phase and ground, and 4) open phase.

B. Results of the Simulated Database

Fig. 5. Three-phase voltage and current signals, and the postfault ellipse for a simulated AB short-circuit fault. (a) Three-phase signals with simulated AB fault. (b) Detection step function and classification step function which show the moment at which the fault starts (horizontal dashed lines), the moment at which the fault is detected, and the moment at which the fault is classified. (c) Prefault and postfault signals and the postfault estimated ellipse for phase A.

Fig. 5 shows a simulated AB short-circuit fault. Fig. 5(a) shows the behavior of the current and voltage signals just before and after the fault. Fig. 5(b) shows the fault detection step function and the classification step function, which describe the moment at which the fault starts (horizontal dashed lines), the moment at which the fault is detected, and the moment at which the



TABLE III
CLASSIFICATION TABLE

fault is classified. The fault was detected after three consecutive points out of the control region, or 0.094 cycles. The classification of the fault was achieved after two cycles. Fig. 5(c) shows the prefault and postfault signals projected into the 2-D space and the estimated postfault ellipse (solid line). It can be seen that the postfault signal is very noisy. Nevertheless, the postfault ellipse captures the average behavior of the signal within two cycles. It is worth mentioning that the estimated variances for the simulated database were very small and very similar. Therefore, we assume $\sigma_v = \sigma_i$, and that the residuals are independent and follow a Gaussian distribution.

Table III shows the classification results. The Bayes network classifier [26] achieved the best result. The decision table naive Bayes (DTNB) [27] classifier and the $k$-nearest neighbor classifier also achieved good results. On average, the classification rate is 96.6%.

Fig. 6 shows the simulated faults projected into the 2-D space of the following predictive variables: current relative peak value and the postfault power factor. It can be noticed that the classes are grouped together and nonlinearly separable. As a consequence, nonlinear classifiers with low complexity such as decision trees, Bayesian networks, decision table naive Bayes (DTNB), and knn provide high classification rates, as shown in Table III. Fig. 6 also shows that the peak value of the postfault current signal, within the two cycles after the fault, may reach 20 times the peak value of the current signal under normal operating conditions for the short circuit between phases and the short circuit between phase and ground faults. It can be seen that the peak value of the postfault current signal for the open-phase fault is slightly larger than the peak value of the prefault current signal. This is because the time required for the current signal to reach zero is longer than 1.4 cycles after the fault. It is also interesting that for normal operation the power factor ranges from -1 to 1. This is because of false faults which, in this case, were created using random pieces of voltage and current signals under normal operating conditions.

Table IV shows the misclassified patterns for each fold of the five-fold cross-validation procedure, and for each phase (A, B, and C) of the transmission line. The first fold has three patterns with at least one misclassified phase in each. For the first pattern, only the output of phase B was incorrectly classified. It was classified as short circuit between phase and ground. Nevertheless, phases A and C were correctly classified. For the second pattern, all phases were incorrectly classified. For the third pattern, again, only one phase was incorrectly classified.

Folds 2, 3, and 5 show one pattern each, again, with at least one misclassified phase. Fold 4 does not show any misclassified pattern. For the second fold, all phases were incorrectly classified, which is similar to the second pattern in the first fold. The pattern in the third fold shows the outcomes of phases A and

Fig. 6. The 2-D space of predictive variables: current relative peak value and postfault power factor.

TABLE IV
MISCLASSIFICATION OUTCOMES FOR EACH ONE OF THE 5-FOLD VALIDATION SETS, USING THE BAYESIAN NETWORK CLASSIFIER FOR EACH PHASE (A, B, AND C) OF THE TRANSMISSION LINE

B switched. The fth fold also shows one incorrectly classi ed phase. In these cases, regardless the switched output betweenthe two phases, the general type of fault is being correctly iden-ti ed. On average, only one phase is being incorrectly classi edfor each pattern. Except in the second pattern of fold andthe pattern of the second fold in which the true type of fault is athree-phase short circuit (ABC short circuit), and the classi ca-tion output is a three-phase short circuit and ground. For thesetwo cases, the classi er is able to partially identify the type of fault, which is a three-phase short circuit.

C. Case Study 2: CEMIG Database

The CEMIG database consists of 41 records of faults reported from 2001 to 2003. Each report provides prefault and postfault voltage and current signals for each phase of the transmission line. Four different types of faults are provided. The types of faults and their respective codes are:

• falling tree on transmission line (W1);
• electrical lightning in transmission line (ND);
• cable entanglement (K6);
• fire close to the transmission line (AQ).

TABLE V
CEMIG DATABASE

The CEMIG data set does not report the type of fault for each phase separately. Therefore, the classifier cannot be designed to detect the type of fault for each phase, as was done for the simulated data set. In this case, the output of the classifier is the fault of the transmission line and the inputs are the features of the ellipse of each phase, grouped in one single input vector as follows. We first compare the postfault peak value of the current signal of each phase with its prefault peak value. We then create an input vector for the classifier in which the initial elements are the parameters of the ellipse of the phase which, among the three phases, achieved the highest ratio between the postfault and prefault current peak values. Next, the parameters of the ellipses of the remaining phases are included in the input vector, in descending order of the ratio between postfault and prefault current peak values. We choose the peak values of the current signals to sort the elements of the input vector because, in practice, current signals were more sensitive to faults than voltage signals.
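The ordering of the input vector described above can be illustrated with a minimal Python sketch. The function name `build_input_vector`, the dictionary layout, and the numeric values are hypothetical and are used only for illustration; the actual ellipse features are those defined earlier in the paper.

```python
def build_input_vector(ellipse_features, prefault_peak, postfault_peak):
    """Concatenate per-phase features, most affected phase first.

    ellipse_features: dict mapping phase -> list of features (hypothetical layout)
    prefault_peak, postfault_peak: dict mapping phase -> peak current value
    """
    # Ratio between postfault and prefault current peak values, per phase.
    ratio = {ph: postfault_peak[ph] / prefault_peak[ph] for ph in ellipse_features}
    # Phases sorted in descending order of that ratio.
    order = sorted(ratio, key=ratio.get, reverse=True)
    vector = []
    for ph in order:
        vector.extend(ellipse_features[ph])
    return order, vector


# Made-up numbers, two features per phase only to keep the example short.
features = {"A": [1.0, 1.1], "B": [2.0, 2.1], "C": [3.0, 3.1]}
pre = {"A": 100.0, "B": 100.0, "C": 100.0}
post = {"A": 150.0, "B": 900.0, "C": 400.0}

order, x = build_input_vector(features, pre, post)
print(order)  # ['B', 'C', 'A']
print(x)      # [2.0, 2.1, 3.0, 3.1, 1.0, 1.1]
```

With this ordering, the first block of the vector always describes the phase most affected by the fault, regardless of which physical phase it is.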

Table V shows the number of records in the CEMIG database for each type of fault. The database also includes 41 records of the transmission line under normal operating conditions. Overall, the number of records is much smaller than in the simulated database.

D. Results of the CEMIG Database

Initially, we investigate some of the properties of the voltage and current residuals, as initially shown in (13). We sampled 650 points, or approximately 20.3 cycles, from both voltage and current signals under normal operating conditions. Next, we estimate the residuals for both voltage and current signals using (16). Fig. 7 shows the histograms of the residuals for both voltage and current signals. The solid line represents the density of the Gaussian distribution. It is worth noting that the empirical distributions of the residuals, i.e., the histograms, do not follow the Gaussian distribution. By applying the Anderson-Darling normality test [28] to the voltage and current residuals, we obtain p-values which provide statistical evidence that the residuals do not follow the Gaussian distribution. Furthermore, the estimates of the standard deviations of the residuals are 0.0066 and 0.0194. We replicate this analysis for the different phases of the transmission line and the results also indicate non-Gaussian behavior of the residuals. Therefore, we conclude that the assumption of a Gaussian distribution of the residuals is not consistent with the CEMIG database, and we apply the proposed analysis for non-Gaussian residuals, as shown in Section II-F1.

Fig. 7. Histograms of voltage and current residuals under normal operating conditions. Solid line represents the Gaussian density distribution. (a) Residuals of voltage. (b) Residuals of current.

Fig. 8. Behavior of the voltage and current signals for four different types of faults. (a) Fire close to the transmission lines. (b) Electrical lightning. (c) Falling tree on transmission line. (d) Cable entanglement.

We also tested whether the two residual random variables are independent. The empirical linear correlation between the current and voltage residuals is 0.2411, and its p-value leads us to conclude that the residuals are not independent. As a consequence, we cannot assume that, using the 0.5th and the 99.5th percentiles of the residuals in (22), the true confidence level is 99%. Furthermore, we cannot assume that a false fault occurs once every 8000 sampled points, as shown in (23). Nevertheless, even if the residuals are correlated, it is possible to estimate the empirical false fault rate. To do so, we use the sampled points and, after building the control region, we estimate the proportion of points which lie within the control region. For the CEMIG database, this value is 84.46%. Then, applying (23) with a proportion of 0.8446 and considering three consecutive points out of the control region as our fault detection criterion, a false fault occurs once every 267 sampled points. As shown in Section II-H, it is possible to change the expected time required to detect false faults by enlarging the control region or by increasing the number of consecutive points outside the control region. In particular, the sampling rate of the CEMIG database is 64 points per cycle. Therefore, if we choose seven consecutive points, then a false fault will occur, on average, once every 457 000 points and the time required to detect faults will be approximately 0.11 cycles. Furthermore, even if a false fault occurs, results show that the classifier is able to correctly identify false faults.
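The figures quoted above can be reproduced with a short Python sketch. It assumes that (23) has the form 1/(1 - p)^k, where p is the proportion of points inside the control region and k is the number of consecutive points required outside it, and that consecutive points leave the region independently. This form is an assumption made here because it matches the quoted values of about 267 and about 457 000 points. The 60-Hz line frequency used for the conversion to milliseconds is also an assumption, and the function names are hypothetical.

```python
import math


def points_between_false_faults(p_inside, k):
    """Expected number of sampled points per false fault.

    Assumes each point falls outside the control region independently
    with probability 1 - p_inside (assumed form of the false-fault rate).
    """
    return 1.0 / (1.0 - p_inside) ** k


def detection_delay(k, points_per_cycle, line_freq_hz=60.0):
    """Detection delay for k consecutive points, in cycles and milliseconds."""
    cycles = k / points_per_cycle
    return cycles, 1000.0 * cycles / line_freq_hz


for k in (3, 7):
    n = points_between_false_faults(0.8446, k)
    cycles, ms = detection_delay(k, 64)
    print(k, math.ceil(n), round(cycles, 4), round(ms, 2))
```

For k = 7 and 64 points per cycle, the delay is 7/64, or about 0.11 cycles, which illustrates the trade-off between detection speed and false fault rate.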

Due to the opening time of the breaker, which usually starts, on average, within two sine cycles after the fault, the samples of the postfault signals were chosen as the 1.4 sine cycles after the fault. Fig. 8 shows the prefault and postfault voltage and current signals, as well as the postfault ellipse, for four different types of faults. The figure illustrates that the angle of the postfault ellipse is quite distinct among the different types of faults. It is also evident that the amplitudes of the postfault signals are different. For the electrical lightning fault [see Fig. 8(b)], the ellipse is very narrow, with large values of the postfault voltage and current signals.

Fig. 9. Three-phase voltage and current signals, and the projected ellipse, before and after the fire close to the transmission line fault type. (a) Prefault and postfault voltage and current signals of a transmission line. (b) Detection step function and classification step function, which show the moment at which the fault is detected and the moment at which the fault is classified. (c) Prefault and postfault signals and the postfault estimated ellipse for phase A.

TABLE VI
CONFUSION MATRIX OF CEMIG DATABASE

Fig. 9 shows the behavior of the current and voltage signals before and after a fault of the fire close to the transmission line type. The fault was detected after seven consecutive points out of the control region, or 0.1094 cycles (or 1.82 ms). In this case, the sampling rate is 64 points per cycle, as previously mentioned. The classification of the fault was achieved after 1.4 cycles, as shown in Fig. 9(b). Fig. 9(c) shows that, for this particular fault, the postfault ellipse is not noisy. Furthermore, 1.4 cycles after the detection of the fault were sufficient to properly estimate the ellipse before the complete shutdown of the line. It is worth mentioning that phase A achieved the highest ratio between postfault and prefault current peak values and, therefore, its estimated parameters were used as the first elements of the input vector of the classifier.

In the state of Minas Gerais, during the dry season, it is quite common to have fire in the woods close to the transmission line. The fire may eventually cause short circuit faults, but may cause other types of faults as well. Nonetheless, the fire usually affects all phases of the transmission line almost simultaneously and, as a consequence, its pattern is quite distinct. In fact, our approach provides high classification rates for this type of fault, as shown in Table VI.

As previously described, the inputs of the classifiers are formed by six different measures of each phase. The applied measures were the same as those used for the simulated data. The classification models were adjusted using the five-fold cross-validation method. Among the evaluated classification methods, BayesNet achieved the best results. The confusion matrix for the BayesNet method is shown in Table VI.

From Table VI, it can be seen that 63 samples were correctly classified overall, which represents a classification rate of 76.83%. This number is the sum of the elements of the diagonal. These results are promising, even though the sample size is small.

It is worth noting that the classifier was able to correctly identify all samples related to normal operating conditions. We included this category among the fault types in order to minimize the effects of false faults, as described in Section II-H.
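As a quick check of the overall rate, the sketch below computes the classification rate as the diagonal sum of a confusion matrix divided by the total number of samples. The 2x2 matrix is made up purely to exercise the hypothetical function `classification_rate` and is not Table VI; the final lines use only the totals stated in the text (41 fault records, 41 normal records, 63 correct).

```python
def classification_rate(confusion):
    """Percentage of samples on the diagonal of a square confusion matrix."""
    correct = sum(confusion[r][r] for r in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return 100.0 * correct / total


# Made-up 2x2 matrix, only to exercise the function (not Table VI).
toy = [[8, 2],
       [1, 9]]
print(classification_rate(toy))  # 85.0

# Totals stated in the text: 41 fault records + 41 normal records, 63 correct.
rate = 100.0 * 63 / (41 + 41)
print(f"{rate:.2f}%")  # 76.83%
```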

With regard to faults related to fire close to the transmission line (AQ), the classifier achieved a classification rate of 73%. The classification rate for the four cases of cable entanglement faults (K6) is 50%. In the latter case, the sample size is extremely small.

The classification results for the electrical lightning fault type (ND) show a classification rate of 46.7%. For this class, some of the samples were erroneously classified as fire close to the transmission line (26.7%), cable entanglement (20.0%), and falling tree on transmission line (6.7%).

Fig. 10. The 3-D space of the predictive variables: current relative peak value, projected radius of the postfault ellipse, and postfault power factor.

Specifically, for the falling tree on transmission line fault type (W1), the estimates of the parameters of the ellipse were compromised because of singular matrices in the least squares estimator.

Fig. 10 shows the faults projected into the 3-D space of the following predictive variables: current relative peak value, projected radius of the postfault ellipse, and postfault power factor. The figure shows the variables related to the phase which achieved the highest ratio between the postfault and prefault current peak values (i.e., these variables are the first elements of the input vector of the classifier). It can be seen that the records related to normal operating conditions (OP) do not overlap the remaining classes. Overall, the elements of the fire close to the transmission line (AQ) and electrical lightning (ND) classes share a low degree of overlap. The elements of the falling tree on transmission line (W1) and cable entanglement (K6) classes present a higher degree of overlap. The class K6 has just a few elements.

IV. DISCUSSION AND CONCLUSION

A new methodology for fault detection and fault classification is presented in this paper. The behavior of the voltage and current signals of any transmission line is modeled using an elliptical 2-D structure. Two stochastic components were included in the elliptical structure to account for noise under normal operating conditions. Based on statistical inference, a control region under normal operating conditions is built. Faults are quickly detected when a short sequence of points leaves the control region. Furthermore, the width of the boundaries and the number of consecutive points leaving the control region can be changed in order to detect faults faster. It is shown that if three consecutive points (or 0.094 cycles) leaving the control region are chosen, then the false fault rate is 1/40 000 000 sample points. Furthermore, if six consecutive points (or 0.187 cycles) are chosen, then the new false fault rate is 1/1 370 000 000 000 000 sample points. In this case, the false fault rate is so small that it can be concluded that false faults will not occur. It is also worth mentioning that different fault detection strategies can be designed based on the distance, in standard deviation units, of a point which lies outside the control region. Briefly, using statistical inference, the farther a point is from the control region, the less likely that point is to represent a false fault; therefore, it is more likely that the point represents a true fault. Thus, based on a single point, our approach provides mechanisms for fault detection in 0.031 cycles. It must be cautioned, however, that narrow boundaries will falsely classify normal operating conditions as faults. Therefore, the width of the boundary can be set based on the probability of detecting false faults.
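The detection rule summarized above can be sketched in a few lines of Python. The sketch assumes a rectangular control region defined by lower and upper percentiles of the voltage and current residuals under normal operation; the exact region is the one defined by (22), which is not reproduced here, so the box shape, the function names, and the numeric values are illustrative assumptions only.

```python
def percentile(sorted_vals, q):
    """Linear-interpolation percentile (q in [0, 100]) of a sorted list."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def control_region(residuals, lower_q=0.5, upper_q=99.5):
    """Lower and upper bounds estimated from residuals under normal operation."""
    s = sorted(residuals)
    return percentile(s, lower_q), percentile(s, upper_q)


def first_detection(res_v, res_i, bounds_v, bounds_i, k):
    """Index at which k consecutive points have left the region, else None."""
    run = 0
    for n, (rv, ri) in enumerate(zip(res_v, res_i)):
        inside = (bounds_v[0] <= rv <= bounds_v[1]
                  and bounds_i[0] <= ri <= bounds_i[1])
        run = 0 if inside else run + 1  # an isolated outlier resets nothing but itself
        if run == k:
            return n
    return None


# Made-up residuals: one isolated outlier at index 4, sustained fault from index 10.
res_v = [0.0] * 15
res_v[4] = 5.0
res_v[10:] = [5.0] * 5
res_i = [0.0] * 15
print(first_detection(res_v, res_i, (-1.0, 1.0), (-1.0, 1.0), k=3))  # 12
```

Increasing `k` or widening the bounds lowers the false fault rate at the cost of a longer detection delay, which is the trade-off discussed in this section.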

After fault detection, 1.4 cycles of current and voltage signals are used to estimate the expected behavior of the fault. In this situation, the same model used to capture the normal operating conditions of the transmission line is now applied to capture unique features of the fault. These features are used as inputs for a classifier, which determines the probable type of fault. Furthermore, the computational complexity of the proposed framework is very low, and our proposal can efficiently detect faults in real time.

Estimates for the parameters of the model are presented for both prefault and postfault conditions. The use of geometric components of the postfault ellipse as input vectors in classification models proved to be a robust and innovative strategy for classifying different types of faults, as shown by the results using simulated and real databases. The simulation case study showed that the estimated variables of the ellipse provide nonlinearly separable classes. The classification rates in both cases were promising despite the small sample size of the real database.

Furthermore, our proposal presents very low complexity compared to the Fourier transform, wavelets, or artificial neural networks (ANNs). In our proposal, each new sampling point is tested using two simple mathematical expressions, as shown in (22). The estimates of the coefficients of the postfault ellipse are obtained, almost instantaneously, by solving a linear system of equations. Another major advantage of the proposed method is its ability to generate a fault classification space with low complexity. Therefore, high classification rates can be achieved with low-complexity models, such as decision trees, Bayesian networks, and knn.
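To illustrate why estimating an ellipse reduces to a linear solve, the following Python sketch fits a generic centered conic A v^2 + B v i + C i^2 = 1 to voltage and current samples by least squares. This parameterization is an assumption chosen for illustration and is not necessarily the one used in the paper; the function names are hypothetical. For ideal sinusoids v = V cos(θ) and i = I cos(θ - φ), the fitted coefficients satisfy cos(φ) = -B / (2 sqrt(A C)), so the power factor can be read directly from the ellipse.

```python
import math


def solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular normal equations")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    x = [0.0] * 3
    for r in (2, 1, 0):
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x


def fit_centered_ellipse(v, i):
    """Least-squares fit of A*v^2 + B*v*i + C*i^2 = 1 via the normal equations."""
    rows = [(a * a, a * b, b * b) for a, b in zip(v, i)]
    ata = [[sum(r[p] * r[q] for r in rows) for q in range(3)] for p in range(3)]
    atb = [sum(r[p] for r in rows) for p in range(3)]
    return solve3(ata, atb)


def power_factor(A, B, C):
    """cos(phi) recovered from the conic coefficients of ideal sinusoids."""
    return -B / (2.0 * math.sqrt(A * C))


# Synthetic cycle: 64 points, V = 2, I = 0.5, current lagging by 60 degrees.
theta = [2.0 * math.pi * n / 64 for n in range(64)]
v = [2.0 * math.cos(t) for t in theta]
i = [0.5 * math.cos(t - math.pi / 3.0) for t in theta]
A, B, C = fit_centered_ellipse(v, i)
print(round(power_factor(A, B, C), 6))  # 0.5
```

When the voltage and current samples are exactly proportional, the normal equations become singular and `solve3` raises an error, which mirrors the kind of degenerate case reported for the W1 records. In practice, a library routine such as `numpy.linalg.lstsq` would be the usual choice for this solve.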

It is worth noting that the proposed methodology will classify abnormal behavior of the current and voltage signals into one of the possible types of faults previously specified by the user. Therefore, in order to properly detect nonfault transient cases, the user must include such nonfault transient cases as regular types of faults.

Future work will aim at extending the study of the statistical properties of the proposed model, as well as the use of different parameters of the ellipse. Postfault noise parameters, as potential inputs to the classification models, will also be investigated further. We also aim at proposing novel extensions into the 3-D and 6-D spaces. In the latter case, the 6-D space represents the current and voltage signals of the three phases of a transmission line simultaneously.

REFERENCES

[1] S. P. Valsan and K. S. Swarup, "Wavelet transform based digital protection for transmission lines," Elect. Power Energy Syst., vol. 31, pp. 379-388, 2009.
[2] A. Abdollahi and S. Seyedtabaii, "Comparison of Fourier & wavelet transform methods for transmission line fault classification," in Proc. 4th Int. Power Eng. Optimiz. Conf., 2010, pp. 579-584.
[3] K. Gayathri and N. Kumarappan, "Comparative study of fault identification and classification on EHV lines using discrete wavelet transform and Fourier transform based ANN," Int. J. Elect., Comput., Syst. Eng., vol. 2, pp. 125-136, 2008.
[4] M. Patel, "Fault detection and classification on a transmission line using wavelet multi resolution analysis and neural network," Int. J. Comput. Appl., vol. 47, no. 22, pp. 27-33, 2012.
[5] M. J. Reddy and D. K. Mohanta, "A wavelet-fuzzy combined approach for classification and location of transmission line faults," Int. J. Elect. Power Energy Syst., vol. 29, pp. 669-678, 2007.
[6] X. Dong, W. Kong, and T. Cui, "Fault classification and faulted-phase selection based on the initial current traveling wave," IEEE Trans. Power Del., vol. 24, no. 2, pp. 552-558, Apr. 2009.
[7] A. L. O. Fernandez and N. K. I. Ghonaim, "A novel approach using a FIRANN for fault detection and direction estimation for high-voltage transmission lines," IEEE Trans. Power Del., vol. 17, no. 4, pp. 894-900, Oct. 2002.
[8] N. Zhang and M. Kezunovic, "Coordinating fuzzy ART neural networks to improve transmission line fault detection and classification," in Proc. IEEE Power Eng. Soc. Gen. Meeting, Jun. 2005, vol. 1, pp. 734-740.
[9] R. Mahanty and P. D. Gupta, "A fuzzy logic based fault classification approach using current samples only," Elect. Power Syst. Res., vol. 77, pp. 501-507, 2007.
[10] O. A. S. Youssef, "Combined fuzzy-logic wavelet-based fault classification technique for power system relaying," IEEE Trans. Power Del., vol. 19, no. 2, pp. 582-589, Apr. 2004.
[11] J. Upendar, C. Gupta, and G. Singh, "Statistical decision-tree based fault classification scheme for protection of power transmission lines," Int. J. Elect. Power Energy Syst., vol. 36, no. 1, pp. 1-12, 2012. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0142061511001864
[12] V. Malathi, N. S. Marimuthu, and S. Baskar, "Intelligent approaches using support vector machine and extreme machine for transmission line protection," Neurocomputing, vol. 73, no. 10-12, pp. 2160-2167, 2010.
[13] P. Chiradeja and A. Ngaopitakkul, "Identification of fault types for single circuit transmission line using discrete wavelet transform and artificial neural networks," in Proc. Int. MultiConf. Eng. Comput. Scientists, 2009, vol. 2, pp. 1520-1525.
[14] A. Ngaopitakkul and C. Jettanasen, "Combination of discrete wavelet transform and probabilistic neural network algorithm for detecting fault location on transmission system," Int. J. Innovative Comput., Inf. Control, vol. 7, no. 4, pp. 1861-1873, 2011.
[15] H. Ferrer, R. E. O. Schweitzer, and S. E. Laboratories, Modern Solutions for Protection, Control and Monitoring of Electric Power Systems. Pullman, WA, USA: Schweitzer Engineering Laboratories, 2010.
[16] J. L. Blackburn and T. J. Domin, Protective Relaying: Principles and Applications, 3rd ed. Boca Raton, FL: CRC, 2006.
[17] M. M. Eissa, "Current directional protection technique based on polarizing current," Int. J. Elect. Power Energy Syst., vol. 44, no. 1, pp. 488-494, 2013.
[18] M. M. Eissa, "A new digital busbar protection technique based on frequency information during CT saturation," Int. J. Elect. Power Energy Syst., vol. 45, no. 1, pp. 42-49, 2013.
[19] G. Casella and R. L. Berger, Statistical Inference, 2nd ed. Pacific Grove, CA, USA: Duxbury Press, 2002.
[20] V. Chandola, A. Banerjee, and V. Kumar, "Anomaly detection: A survey," ACM Comput. Surv., vol. 41, no. 3, pp. 15:1-15:58, Jul. 2009. [Online]. Available: http://doi.acm.org/10.1145/1541880.1541882
[21] J. L. Boldrini, S. I. R. Costa, V. L. Figueiredo, and H. G. Wetzler, Linear Algebra, 3rd ed. New York: Harper & Row, 1980.
[22] J. Kiefer, "Sequential minimax search for a maximum," in Proc. Amer. Math. Soc., 1953, vol. 4, no. 3, pp. 502-506.
[23] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten, "The WEKA data mining software: An update," SIGKDD Explor. Newsl., vol. 11, no. 1, pp. 10-18, 2009.
[24] A. A. Yusuff, A. A. Jimoh, and J. L. Munda, "Determinant-based feature extraction for fault detection and classification for power transmission lines," IET Gen., Transm. Distrib., vol. 5, no. 12, pp. 1259-1267, 2011.
[25] T. Hastie, R. Tibshirani, and J. H. Friedman, The Elements of Statistical Learning. New York: Springer, Jul. 2003.
[26] N. Friedman, D. Geiger, and M. Goldszmidt, "Bayesian network classifiers," Mach. Learn., vol. 29, pp. 131-163, 1997.
[27] R. Kohavi, "Scaling up the accuracy of naive-Bayes classifiers: A decision-tree hybrid," in Proc. 2nd Int. Conf. Knowl. Discovery Data Mining, 1996, pp. 202-207.
[28] M. A. Stephens, "EDF statistics for goodness of fit and some comparisons," J. Amer. Stat. Assoc., vol. 69, pp. 730-737, 1974.

André de Souza Gomes, photograph and biography not available at the time of publication.

Marcelo Azevedo Costa, photograph and biography not available at the time of publication.

Thomaz Giovani Akar de Faria, photograph and biography not available at the time of publication.

Walmir Matos Caminhas, photograph and biography not available at the time of publication.
