Fault Location Using Wide-Area Measurements and Sparse
Estimation
A Thesis Presented
by
Guangyu Feng
to
The Department of Electrical and Computer Engineering
in partial fulfillment of the requirements
for the degree of
Master of Science
in
Electrical and Computer Engineering
Northeastern University
Boston, Massachusetts
November 2015
NORTHEASTERN UNIVERSITY
Graduate School of Engineering
Thesis Signature Page
Thesis Title: Fault Location Using Wide-Area Measurements and Sparse Estimation
Author: Guangyu Feng NUID: 001916249
Department: Electrical and Computer Engineering
Approved for Thesis Requirements of the Master of Science Degree
Thesis Advisor
Dr. Ali Abur    Signature    Date
Thesis Committee Member or Reader
Dr. Hanoch Lev-Ari    Signature    Date
Thesis Committee Member or Reader
Dr. Bahram Shafai    Signature    Date
Department Chair
Dr. Sheila S. Hemami    Signature    Date
Associate Dean of Graduate School:
Dr. Sara Wadia-Fascetti    Signature    Date
To my parents.
Contents
List of Figures v
List of Tables vi
List of Acronyms vii
Acknowledgments viii
Abstract of the Thesis ix
1 Introduction 1
  1.1 Motivation of the Study 2
  1.2 Contribution of the Thesis 2
  1.3 Thesis Outline 3
2 Background 4
  2.1 State of the Art of Fault Location Methods 4
    2.1.1 Traveling Wave-based Methods 4
    2.1.2 Impedance-based Methods 5
    2.1.3 AI-based Methods 5
  2.2 Technical Background 5
    2.2.1 Wide Area Measurement System 5
    2.2.2 Sparse Estimation Techniques 6
3 Basic Methodology 7
  3.1 Transformation of Fault Location Problem 7
    3.1.1 Derivation of Equivalent Injection 7
    3.1.2 Determination of Fault Type and Fault Line Type 11
    3.1.3 Simulation Verification of Equivalent Injection Theory 12
    3.1.4 Building the Linear Estimation Equations 13
  3.2 Sparse Estimation Algorithm 15
  3.3 Measurement Placement for Unique Solution 16
4 Alterations Improving the Method 19
  4.1 Overlapping Group Lasso 19
  4.2 Second Stage OLS Estimation 20
    4.2.1 Criteria for Valid Solution 20
    4.2.2 OLS Backup Estimation Routine 20
  4.3 Extended Robust Formulation and Correction Routine 21
    4.3.1 Extended Robust Formulation 21
    4.3.2 Recursive Solution Routine with Correction of Erroneous Measurement(s) 21
  4.4 Parallel Implementation 22
    4.4.1 Problem Decomposition 22
    4.4.2 Parallelism Attempted 24
5 Simulation Results 27
  5.1 Faults in the Unbalanced Test Case 28
  5.2 Wide-area Cases 29
    5.2.1 Teed Circuit Fault 29
    5.2.2 Comparison of Basic LARS-Lasso and Overlapping Group Lasso 30
    5.2.3 Robustness of Extended LARS-Lasso Formulation Against Gross Error 33
  5.3 Parallel Implementation Results 34
    5.3.1 Sequential Run Time Benchmark 34
    5.3.2 Speed-up Using openMP 34
    5.3.3 Speed-up Using GPU 35
6 Conclusion 37
Bibliography 39
A Appendix 45
  A.1 Matlab Code Solving Lasso Problem 46
  A.2 Parallel C Code using openMP 51
  A.3 CUDA C Code for GPU implementation 81
List of Figures
3.1 Illustration of a fault on a two terminal line. 9
3.2 Illustration of a fault on a two terminal line with parallel branch. 9
3.3 Illustration of a fault on teed line. 10
3.4 Identification of fault type and faulted line type through inspecting equivalent injections. 11
3.5 11-bus distribution system diagram in ATP-EMTP. 12
3.6 Real part of equivalent injections for the 11-bus unbalanced case. 13
3.7 Imaginary part of equivalent injections for the 11-bus unbalanced case. 13
4.1 Flowchart for the recursive estimation 22
4.2 Three Scenarios of Data Distribution 23
4.3 Example of Column Block Distribution 24
4.4 Development process of different implementations 25
5.1 Computation Times Used by Basic LARS-Lasso and Overlapping Group Lasso. 31
5.2 Number of Iterations by Basic LARS-Lasso and Overlapping Group Lasso. 31
5.3 Error of Fault Distance by Basic LARS-Lasso and Overlapping Group Lasso. 32
5.4 Number of OLS Backup Estimations by Basic LARS-Lasso and Overlapping Group Lasso. 32
5.5 Comparison of Fault Distance Error under One or Two Unit Failures 33
5.6 Comparison of Speed-ups for Different Cases using openMP 35
List of Tables
5.1 Relative Error of Fault Distance for the Unbalanced 11-Bus Case 28
5.2 Platform Information 34
5.3 Sequential Run Times 34
5.4 Summary of GPU run times 35
5.5 Summary of Profiling Results 36
List of Acronyms
AI Artificial Intelligence.
WAMS Wide Area Measurement System.
PMU Phasor Measurement Unit.
Lasso Least Absolute Shrinkage and Selection Operator.
The Lasso is a shrinkage and selection method for linear regression. It minimizes the usual sum of squared errors, with a bound on the sum of the absolute values of the coefficients. It has connections to soft-thresholding of wavelet coefficients, forward stagewise regression, and boosting methods.
LARS Least Angle Regression. The “S” suggests “Lasso” and “Stagewise”.
OLS Ordinary Least Squares.
openMP Open Multi-Processing.
MPI Message Passing Interface.
GPU Graphics Processing Unit.
CUDA Compute Unified Device Architecture.
Acknowledgments
Here I wish to thank those who have supported me during and before the thesis work. Among them, I am especially grateful to Dr. Ali Abur, who has shared with me valuable research guidance and wisdom as well as unique insights about problem solving and debugging. I consider it a privilege to have taken the power system analysis and unbalanced power grid courses offered by Dr. Abur, since these courses not only made my knowledge of power systems more comprehensive but also left me with notes and code that are still of great help to my research. In addition, I very much appreciate the chances Dr. Abur gave me to publish and present my work, even when I was not feeling very confident about my results.
I would also like to thank Dr. Lev-Ari and Dr. Shafai on my committee, as well as Dr. Tomsovic from the University of Tennessee and Dr. Leeser, from whom I have taken important courses on signal processing, linear systems and control, constrained optimization and high performance computing. Besides enhancing my basic engineering skills, all these courses have contributed directly or indirectly to my research.
Furthermore, I really appreciate the encouragement and help from my lab mates (and former lab mates), especially Mert, Murat, Alireza, Chenxi, Pengxiang and Yuzhang. I wish them all the best in their lives and careers.
Last but not least, I cannot thank my parents enough for supporting me remotely, both financially and spiritually, over the past years. They always stand behind me unconditionally no matter what choice I make.
Abstract of the Thesis
Fault Location Using Wide-Area Measurements and Sparse Estimation
by
Guangyu Feng
Master of Science in Power System
Northeastern University, November 2015
Dr. Ali Abur, Adviser
This thesis describes a fault location method which relies on scattered wide-area synchronized phasor measurements and on sparse estimation techniques. The main contribution is the way fault location is reformulated as a sparse estimation problem. The faulted system is modeled by equivalent terminal bus injections which would cause the same changes in bus voltages as the fault current drawn at any point along the faulted line. Once these injections are estimated, the fact that the ratio between the equivalent injections at the two terminal buses depends only on the ratio of the series impedances on each side of the fault point can be used to locate the fault. It is shown that this formulation applies to two terminal lines as well as teed lines, regardless of fault type or resistance. Assuming availability of an accurate three-phase network model and a sufficient number of phasor measurements over the entire network, an underdetermined set of linear equations can be formed and then solved for the sparse equivalent bus injections. The problem then fits naturally into a Least Absolute Shrinkage and Selection Operator (Lasso) formulation and can be solved via the Least Angle Regression (LARS) algorithm. Based on the condition for a unique solution of the Lasso problem, a scheme for optimal phasor measurement placement is also derived. Furthermore, alterations have been made to the basic implementation of LARS so that the method's reliability, robustness and efficiency are improved.
Using extensive simulations on both an unbalanced three-phase test case and a wide-area network test case, the accuracy and efficiency of the proposed method with its alterations have been verified.
Chapter 1
Introduction
Power transmission lines are exposed to faults caused by natural or artificial factors such as lightning, short circuits, faulty equipment, mis-operation, overload and aging. Faults can be roughly categorized as open circuit faults and short circuit faults. While open circuit faults are generally rare, short circuit faults are more common in power systems and attract wide-spread attention. Faults are also divided into temporary and permanent ones. Temporary faults are known to be able to self-clear and are more common on power lines. For these faults, accurately locating the fault reveals the weak point on the line so that future faults can be prevented by repair. Permanent faults, on the other hand, cause major damage and require immediate action from the crew to return the faulted line to service. Untimely or inaccurate fault location can result in prolonged line outages as well as severe economic losses and reduced reliability of the service. Thus accurate and timely fault location has always been a critical issue in power system operation [1].
When a fault occurs on a transmission line, distance relays can provide some indication of the general area where the fault is, but they are designed to provide on-line protection as fast as possible rather than to pinpoint the fault location. The distance to the fault is instead estimated off-line from the recorded data. Depending on the types of data and models available to researchers, various methods have been proposed so far, most of them relying on local data from one or both ends of the faulted transmission line.
CHAPTER 1. INTRODUCTION
1.1 Motivation of the Study
While there are well-established fault locators using local measurements, chances are that local measurement units fail at the very time the fault occurs. In this regard, this work turns to wide-area measurements for a more reliable fault location method. Recent years have seen increasing deployment of synchronized phasor measurements in power systems, and of applications based on the Wide Area Measurement System (WAMS) such as wide-area control and linear state estimation. However, the benefit of WAMS for fault location has not yet been fully explored, since power grid protection systems have long been designed with local measurements as the decision variables.
Furthermore, previous formulations of the fault location problem often need prior information about the fault type or the faulted line to determine how the fault can be located. This adds to the purpose of this work: to derive a fault location method of universal applicability and straightforward solution regardless of fault type and faulted line type.
In formulating the proposed fault location method, advances in sparse estimation techniques are also taken into account, so that the fault location problem can be tackled with new mathematical tools, contributing to the diversity of the pool of fault locators.
1.2 Contribution of the Thesis
This work contributes to current fault location practice in the following aspects:
• Non-local, in other words wide-area, phasor measurements are used to locate the fault, making it possible to locate the fault even when local measurement units fail or break down.
• The fault location problem is transformed into a linear sparse estimation problem of general form, regardless of fault type and faulted line type, enabling the use of sparse estimation algorithms.
• Only partial observability of the system is required to ensure a unique fault location solution.
• Alterations to the basic sparse estimation implementation are made to improve the reliability and robustness of the method.
• In addition to the sequential implementation, parallel implementations of the proposed method are investigated and realized, improving the method's efficiency.
1.3 Thesis Outline
This thesis is composed of six chapters. The second chapter reviews past literature on fault location and provides technical background for the proposed method. The third chapter introduces the formulation and the basic proposed methodology. Chapter 4 then presents several alterations made to the basic method to obtain better performance in terms of accuracy, robustness and efficiency. Chapter 5 describes the test results addressing the universal applicability and performance of the proposed method. Finally, chapter 6 concludes the thesis and introduces future work.
Chapter 2
Background
2.1 State of the Art of Fault Location Methods
Fault location methods currently used for transmission system faults can be broadly classified as (1) traveling wave-based methods, (2) impedance-based methods and (3) Artificial Intelligence (AI)-based methods.
2.1.1 Traveling Wave-based Methods
Traveling wave-based methods [2–6] make use of the high frequency (several kHz to MHz) electromagnetic transients induced by the fault to determine the time of arrival of the traveling wave from the fault point. Modal domain wave velocities then need to be calculated in order to obtain the distance from the fault point to the measuring device where the traveling wave is recorded. While, thanks to the high frequency sampling, traveling wave based methods are independent of fault type, the wave velocities cannot always be accurately defined given their variation with frequency and tower configuration. In addition, there is usually a non-negligible cost associated with the required high-frequency sampling of the waveforms.
CHAPTER 2. BACKGROUND
2.1.2 Impedance-based Methods
Impedance-based methods use phasors to locate the fault. There is a large volume of papers describing such methods using single-ended or double-ended measurements (e.g. [7–10]). These methods require at least one terminal of the faulted line to be equipped with a monitoring device such as a digital fault recorder or another type of IED (intelligent electronic device) [11]. There are also multi-end algorithms [9, 12–17] that use measurements from multiple ends of the transmission line. Algorithms requiring measurements from more than one end can be further divided into those using synchronized and those using unsynchronized measurements. As stated in Sec. 1.1, many impedance methods require prior assumptions or knowledge about the faulted section as well as the fault type to determine the details of their algorithm, making them harder to implement.
2.1.3 AI-based Methods
AI-based methods [18–22] use artificial intelligence techniques such as pattern recognition and machine learning algorithms. Unlike impedance or traveling wave based methods, they do not rely on physical relationships between the measurements and the fault location, but rather look for underlying connections between certain features of the data and the fault. As the configuration of the system evolves, these methods often need to be retrained to adjust to the system.
2.2 Technical Background
2.2.1 Wide Area Measurement System
Recent years have seen major changes in power system operation due to the deployment of WAMS. Many applications, mostly regarding monitoring and control actions, have been improved. Yet it has not been clear how power system protection or fault location can benefit from this technical advance.
Recently, several studies have proposed the use of wide-area phasor measurements, locating the fault not just from single or double-ended measurements but from several measurements taken at various locations over the whole system. The work in [23] derives a fault location factor as a function of fault distance for every bus. The homogeneity of this factor over a network is used to identify first the fault region and then the exact location. In [24], the expected changes of phasors due to faults at different points of a network are matched with the measured phasors using an optimization scheme. Similar to [23], its diagnosis process is also hierarchical.
2.2.2 Sparse Estimation Techniques
Another important technical basis for this work is the use of sparse estimation techniques in power systems. In fact, such techniques have been used by various researchers for the detection and identification of faults. One recent work reported in [25] uses several compressive sensing algorithms to pinpoint the location of a faulted bus. In [26] the authors employ the greedy orthogonal matching pursuit (OMP) method and the least absolute shrinkage and selection operator (Lasso) to identify line outages at affordable complexity. In another study [27] the authors derive an algorithm to detect and localize malicious data attacks using L1 regularization for the generalized likelihood ratio test.
Chapter 3
Basic Methodology
3.1 Transformation of Fault Location Problem
3.1.1 Derivation of Equivalent Injection
The proposed method aims to use the pre-fault steady state model of a power system and to represent a fault, or multiple faults of any type on any combination of phases, by current injections at the fault points, so as to keep the system model consistent. To discover how a fault can be identified using the pre-fault system model, a classical fault analysis is conducted. Consider a single phase to ground fault in an N-bus system. First, an extended bus impedance matrix Z̃bus is built with the fault point as the (N + 1)th node. The change of bus voltages due to the fault is computed by multiplying the (N + 1)th column of Z̃bus by the fault current If. Then, using (3.1), where Ybus and ∆V represent the bus admittance matrix and the change in bus voltages of the original N-bus system, a sparse ∆I is derived; these sparse bus current injections can be regarded as equivalent to the fault current injection in terms of their effect on bus voltages.
Ybus∆V = ∆I (3.1)
The result unveils an interesting feature that the nonzero entries in ∆I only exist in the terminal
CHAPTER 3. BASIC METHODOLOGY
buses of the faulted line, and that their ratio is determined solely by the series impedances of the branches incident to the fault point. If the faulted line is homogeneous, the distance from the fault point to each terminal can be readily calculated from the ratio of the equivalent injections, which is a real number. Otherwise it takes some effort to look up the line parameters to translate the ratio of injections into a ratio of distances, which is not within the scope of this thesis. Note that the aforementioned feature does not require the system to be balanced, nor does it require the shunt capacitances to be neglected; the only prerequisite is accurate pi-models of the lines and load data at the time of the fault. Since the bus impedance matrix of any system during the fault instant can be seen as a constant matrix, superposition can be applied for multi-phase faults and multiple faults in the system; their equivalent bus injections will then appear in multiple phases or at multiple terminal buses instead of in a single pair as in the single phase to ground case. It has to be clarified that although the equivalent injection representation of faults has been built from scratch by the author, it also appears in the literature, e.g. [28], where it has been utilized in a different formulation.
To support the general applicability of the proposed method, detailed demonstrations of the equivalent injections are given separately for two and three terminal lines.
3.1.1.1 Two terminal lines
For conventional two terminal lines, consider the scenario shown in Fig. 3.1, where the terminal buses are indexed A and B and the series line impedances from the fault point to the terminals are named z1 and z2 respectively. Here the single phase representation is used for the sake of simplicity. Following the derivation in Sec. 3.1.1, the injections equivalent to the fault current If are:
CHAPTER 3. BASIC METHODOLOGY
Figure 3.1: Illustration of a fault on a two terminal line.
IA = If z2 / (z1 + z2) (3.2)

IB = If z1 / (z1 + z2) (3.3)

IA / IB = z2 / z1 (3.4)
Figure 3.2: Illustration of a fault on a two terminal line with parallel branch.
As shown in (3.4), the ratio of the derived equivalent injections is the inverse of the ratio of the series impedances on each side of the fault point and has nothing to do with the rest of the network.
CHAPTER 3. BASIC METHODOLOGY
3.1.1.2 Teed lines
The representation of faults by equivalent bus injections applies not only to conventional two terminal lines, but also to three terminal (teed) lines, where the T-node is typically not monitored. In Fig. 3.3 the terminal buses are named A, B and C, and the T-node T. z1 and z2 are the impedances between the fault point and bus A and the T-node, respectively. The branch impedances of lines B − T and C − T are denoted by zBT and zCT.
Figure 3.3: Illustration of a fault on teed line.
Given a fault between A and T with a fault current If , the equivalent terminal injections will be
given by:
IA = If (1 − z1 (zBT + zCT) / (zAT zBT + zAT zCT + zBT zCT)) (3.5)

IB = If z1 zCT / (zAT zBT + zAT zCT + zBT zCT) (3.6)

IC = If z1 zBT / (zAT zBT + zAT zCT + zBT zCT) (3.7)

IB / IC = zCT / zBT (3.8)
Although the relationships between the injections seem more complicated than in (3.4), they are still determined only by the ratios of the series branch impedances of the faulted line. By comparing IB and IC, the line segment A − T can be readily identified as the faulted segment, and the distance to the fault point can then be determined by comparing either IA and IB or IA and IC.
3.1.2 Determination of Fault Type and Fault Line Type
From the derivations in 3.1.1 it can be concluded that the number and locations of the nodes with equivalent injections inherently indicate the type of the faulted line. When a multi-phase fault occurs, the pairs of equivalent injections on nodes of the same phase still exhibit the proportional relationships in (3.4) and (3.8). The only difference is that the sparsity of the computed current injection vector in the two and three phase fault cases will be respectively two and three times that of the single phase fault case. Based on Kirchhoff's current law, if all elements of the equivalent injections sum to zero, the fault is ungrounded; otherwise it is a grounded fault. Therefore there is no need to know the type of the faulted line or of the fault, or to adjust the approach, before computing these equivalent injections. Instead, they can be determined by simple inspection following the decision making diagram below.
[Decision diagram: two buses with equivalent injections indicate a two terminal line fault, three buses a teed line fault; a sparsity of 2/4/6 (3/6/9 for teed lines) indicates a single/two/three phase fault; a nonzero sum of equivalent injections indicates a grounded fault, a zero sum an ungrounded one.]
Figure 3.4: Identification of fault type and faulted line type through inspecting equivalent injections.
Since the faulted branch, whatever its type, is replaced by a current source branch, the feasibility of the equivalent injection representation and of the decision making diagram is not affected by the value of the fault resistance, as long as it is not so large that the fault current can be neglected. The proposed method is therefore independent of the fault resistance, which is not commonly seen in the field of fault location.
3.1.3 Simulation Verification of Equivalent Injection Theory
To add to the credibility of the equivalent injection theory, simulations have been run in ATP-EMTP for a fault in an unbalanced system (Fig. 3.5) modified from the 13-bus feeder case in [29]. All 11 buses and 10 line sections are marked on the diagram. A phase A and B to ground fault at the 1/4 point of section 5, between buses 2 and 6, has been simulated. Using the simulated change of bus voltages, the equivalent current injections have been computed. Fig. 3.6 and Fig. 3.7 plot the real and imaginary parts of the current injection vector.
Figure 3.5: 11-bus distribution system diagram in ATP-EMTP.
Figure 3.6: Real part of equivalent injections for the 11-bus unbalanced case.
Figure 3.7: Imaginary part of equivalent injections for the 11-bus unbalanced case.
It can be readily seen from the plots that the non-zeros do appear at the terminal nodes (nodes 4, 5, 14 and 15 correspond to the A and B phases of buses 2 and 6) and that the proportional relationships between the pairs of equivalent injections hold. In addition, the decision making diagram readily identifies the fault as a double phase to ground fault.
3.1.4 Building the Linear Estimation Equations
As shown in (3.1) and the supporting simulation verification, it is possible to estimate the equivalent
injections by a simple matrix-vector product of Ybus and ∆V. This, however, requires full observability of the system, i.e. the pre- and post-fault voltages at all buses are assumed to be available and synchronized. Unfortunately, this is not a very realistic assumption for most power systems today. Thus, this assumption is relaxed by considering a partially observable system with voltage phasors known at only m out of N buses. The voltage phasors at these m buses can be measured by Phasor Measurement Units (PMUs) placed at those buses, or derived from voltages measured by PMUs at neighboring buses together with current flow measurements on the lines connecting these neighbors.
This will lead to the following underdetermined version of nodal equations for the system:
Zm ∆I = ∆Vm (3.9)
where Zm and ∆Vm are extracted from Zbus and ∆V such that their rows correspond to the m
observable buses. In order for this complex-valued equation to be solved by existing sparse recovery
algorithms, it is converted into a larger set of real equations as described in [30] and shown below.
X = [  real(Zm)   −imag(Zm)
       imag(Zm)    real(Zm) ]        (3.10)

y = [  real(∆Vm)
       imag(∆Vm) ]                   (3.11)
As a result, the following real-valued estimation problem is obtained:
y = Xβ (3.12)
The solution vector β will contain two sub-vectors, namely the real and imaginary parts of the
equivalent injections.
3.2 Sparse Estimation Algorithm
The linear equation given by (3.12) is underdetermined and would normally have infinitely many solutions. However, it can be proved that when the number of non-zeros in β is smaller than half the number of rows in X, its sparsest solution is unique [31]. For a fault between the line terminals, the sparsest solution can only be the one with equivalent bus injections, since any sparser injection vector would indicate a fault at a bus. Therefore the most straightforward formulation for solving (3.12) with a sparsity constraint is:
min ‖β‖0 s.t. y = Xβ (3.13)
where the L0 norm of the solution β is defined as:
‖β‖0 = #{i ∈ {1, 2, ..., N} : βi ≠ 0} (3.14)
Since this L0 minimization formulation is non-convex and NP-hard, as it involves combinatorial optimization, we turn to an L1 minimization formulation, the L1 norm being a proven convex relaxation of the L0 norm. A popular L1 norm minimization formulation is the Lasso (least absolute shrinkage and selection operator) [32]:

β̂ ∈ argmin_{β ∈ R^p} (1/2)‖y − Xβ‖₂² + λ‖β‖₁ (3.15)
where tuning the parameter λ > 0 leads to solutions of different sparsity. One efficient algorithm for solving this formulation is LARS (Least Angle Regression and Shrinkage), which iteratively selects predictors using an equal correlation criterion [33]. The merit of this algorithm is that it is parameter-free and requires little computation time. In the following sections, the formulation (3.15) and its solution through the LARS algorithm will be referred to as the basic LARS-Lasso method, to distinguish it from its alterations.
The basic procedure for LARS is as follows, it can be concluded that the complexity of the LARS
algorithm is O(N), that is, it features linear complexity.
1. Start with an all-zero initial coefficient vector, and find the predictor (column in X) most
correlated with the response;
2. Take the largest step possible in the direction of this predictor until some other predictor, has
as much correlation with the current residual;
3. Proceed in a direction equiangular between the two predictors until a third variable earns its
way into the “most correlated” set;
4. Then proceed equiangularly between the former three predictors, that is, along the “least angle
direction”, until a fourth variable enters, and so on.
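The steps above can be sketched in a few lines of numpy. This is an illustrative re-implementation of the LARS selection loop, not the thesis code; it returns the active (selected) set, and the coefficients on that support can then be recovered by ordinary least squares.

```python
import numpy as np

def lars_active_path(X, y, n_steps):
    """Sketch of LARS steps 1-4: track the most-correlated set and advance
    along the equiangular ("least angle") direction."""
    n, p = X.shape
    mu = np.zeros(n)                                # current fit
    active = [int(np.argmax(np.abs(X.T @ y)))]      # step 1: most correlated predictor
    for _ in range(n_steps - 1):
        c = X.T @ (y - mu)                          # correlations with current residual
        C = np.max(np.abs(c))
        s = np.sign(c[active])
        XA = X[:, active] * s                       # sign-adjusted active columns
        G = XA.T @ XA
        Ginv1 = np.linalg.solve(G, np.ones(len(active)))
        A = 1.0 / np.sqrt(np.sum(Ginv1))
        u = XA @ (A * Ginv1)                        # equiangular direction
        a = X.T @ u
        # smallest positive step until another predictor ties in correlation
        gamma, nxt = np.inf, None
        for k in range(p):
            if k in active:
                continue
            for g in ((C - c[k]) / (A - a[k]), (C + c[k]) / (A + a[k])):
                if 1e-12 < g < gamma:
                    gamma, nxt = g, k
        if nxt is None:
            break
        mu = mu + gamma * u                         # steps 2-4: advance equiangularly
        active.append(nxt)
    return active
```

For a response that is an exact combination of two orthonormal columns, the first two LARS steps recover exactly those two predictors.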
3.3 Measurement Placement for Unique Solution
The dictionary matrix in (3.15) should satisfy certain conditions for (3.15) to have a unique solution,
and this condition can be used in placing the voltage measurements. Based on the KKT optimality
conditions for (3.15), a sufficient condition for a unique solution is introduced in [34]. Firstly, the
concept of the equicorrelation set ε is defined as follows:
ε = {i ∈ {1, . . . , p} : |XiT(y − Xβ̂)| = λ} (3.16)
In other words, ε indexes the non-zero entries of the solution β̂, and β̂ε is the corresponding
subvector. Then the condition for a unique solution is given by the following lemma:
Lemma 1 For any y, X and λ > 0, if null(Xε) = 0, or equivalently rank(Xε) = |ε|, then the
Lasso solution is unique.
In order to apply this condition to the placement of PMUs, the columns of the dictionary matrix
X and the variables in β are divided into groups g1, g2, . . . , gL, where L is the total number of lines
and group gi contains the indices of columns in X (equivalently, of entries in β) corresponding to
the terminal buses of the ith line. Assuming that a fault occurs on one line at a time, the non-zero
entries in the solution vector will appear in only one of the groups. To guarantee that the Lasso
problem for each line has a unique solution, for all i ∈ {1, 2, . . . , L} the submatrix Xgi should have full
column rank. Note that if the positive-sequence network of a system is studied, the submatrix
extracted from Zbus according to the grouping g1, g2, . . . , gL will have only two column vectors (or
three for teed lines). If the full three-phase model is used, the submatrix will contain six column
vectors corresponding to the terminal buses in phases A, B, and C. However, even in this case, only
two single-phase columns need to be compared, due to the inherent independence between columns
of different phases.
Intuitively, two normalized column vectors can be considered independent of each other if their
entries in the same rows differ. However, the bus impedance matrix is usually ill-conditioned, with
highly correlated columns; thus the numerical threshold for declaring two entries different has to be
large enough for reliable performance of the Lasso solvers. This is accomplished by taking th = 50%
of the maximum difference between the two columns as the threshold.
two columns as the threshold. As an example, for the jth line from bus k to bus m the following
operation is performed:
d(:, j) = |Zbus(:, k)−Zbus(:,m)| (3.17)
s(:, j) = {i ∈ {1, ...N} : d(i, j) > th ·max(d(:, j))}. (3.18)
This relationship can be summarized in a binary matrix A1 in which each row corresponds to a
branch and each column to a bus in the system:
A1(i, j) = 1 if j ∈ s(:, i), 0 otherwise. (3.19)
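A sketch of (3.17)-(3.19), assuming the row-per-line, column-per-bus orientation of A1 that the constraint (3.22) requires; the 4-bus Zbus and the line list in the test are invented for illustration, not the thesis test system.

```python
import numpy as np

def build_A1(Zbus, lines, th=0.5):
    """For each line (k, m): d = |Zbus[:, k] - Zbus[:, m]|  (3.17), keep the
    buses whose difference exceeds th * max(d)  (3.18), and record them as a
    binary row of A1  (3.19).  th = 0.5 matches the 50% threshold in the text."""
    N = Zbus.shape[0]
    A1 = np.zeros((len(lines), N), dtype=int)
    for j, (k, m) in enumerate(lines):
        d = np.abs(Zbus[:, k] - Zbus[:, m])        # (3.17)
        A1[j, d > th * d.max()] = 1                # (3.18)-(3.19)
    return A1
```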
Since a given bus voltage can be measured either directly by a PMU installed at the bus or indirectly
by a PMU at one of its neighboring buses, similarly to the procedure for optimally placing PMUs
for state estimation [35], the N × N (N is the number of buses) bus incidence matrix A2 is defined as:
A2(i, j) = 1 if bus i is connected with bus j, 0 otherwise. (3.20)
In order to obtain a unique solution using the minimum number of PMUs, the following binary
integer optimization problem is formulated, where xi = 1 (i = 1, 2, . . . , N) implies an installed
PMU at bus i, and the non-zero entries in the product A2x represent the buses whose voltages can
be obtained under the chosen PMU placement:
min Σ(i=1..N) xi (3.21)
s.t. A1A2x ≥ 1̂, x binary. (3.22)
The solution of this optimization will yield only a partially observable system, unlike PMU-based
state estimation, which requires full network observability.
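The binary program (3.21)-(3.22) can be prototyped by exhaustive search on a toy system; a MILP solver would be used for real networks, and the A1/A2 matrices in the test are illustrative (A2 here includes the diagonal, i.e. a bus is considered connected to itself).

```python
import itertools
import numpy as np

def min_pmu_placement(A1, A2):
    """Smallest binary x with A1 @ (A2 @ x) >= 1 elementwise, per (3.21)-(3.22).
    Exhaustive search over PMU counts k = 1, 2, ... (toy-scale only)."""
    N = A2.shape[0]
    for k in range(1, N + 1):
        for combo in itertools.combinations(range(N), k):
            x = np.zeros(N, dtype=int)
            x[list(combo)] = 1
            if np.all(A1 @ (A2 @ x) >= 1):   # every line's Lasso uniquely solvable
                return x
    return None
```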
Chapter 4
Alterations Improving the Method
Following the description of the proposed method in Chapter 3, several alterations have been made,
aiming to improve the performance of the method in terms of reliability, robustness and efficiency.
4.1 Overlapping Group Lasso
In addition to the basic Lasso formulation, an alternative formulation is also considered, in which a
structural feature of the solution, besides its sparsity, is taken into account: the non-zero injections
in the solution appear only at the terminal buses of a line. Therefore the solution is sparse not only
element-wise but also group-wise. Recalling Section 3.3, where the buses in a system with L lines
are grouped into g1, g2, . . . , gL, these groups are overlapping, since a bus in a transmission system
is usually connected to more than one line. Therefore, incorporating a penalty on group sparsity
leads to the following formulation [36]:
β̂ ∈ argminβ∈ℝp (1/2)‖y − Xβ‖₂² + λ₁‖β‖₁ + λ₂ Σ(i=1..L) ‖βgi‖₂ (4.1)
This reformulated problem can be solved using CVX, a software package for specifying and solving
convex programs [37, 38].
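The structure of (4.1) can be made concrete by evaluating its objective; this is only the objective function, not a solver, and the group lists in the test (which overlap at index 1) are illustrative.

```python
import numpy as np

def group_lasso_objective(X, y, beta, groups, lam1, lam2):
    """Objective of the overlapping group Lasso (4.1):
    (1/2)||y - X beta||_2^2 + lam1 ||beta||_1 + lam2 * sum_i ||beta_{g_i}||_2,
    where `groups` holds the (possibly overlapping) index sets g1, ..., gL."""
    fit = 0.5 * np.sum((y - X @ beta) ** 2)
    l1 = lam1 * np.sum(np.abs(beta))
    grp = lam2 * sum(np.linalg.norm(beta[g]) for g in groups)
    return fit + l1 + grp
```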
4.2 Second Stage OLS Estimation
4.2.1 Criteria for Valid Solution
After the Lasso or overlapping group Lasso solver returns the solution vector for (3.15) or (4.1), the
following criteria are used to check the validity of the solution.
1. After thresholding the absolute values of the entries in the solution with a small number ε, the
sparsity of the solution vector belongs to either {2, 4, 6} or {3, 6, 9}, where the latter case indicates
a fault on a teed circuit line.
2. The indices of the “non-zero” entries in the solution belong to one of the groups of indices g1,
g2, . . . , gL.
3. The ratio between the non-zeros at the terminal buses satisfies either (3.4) or (3.8).
4.2.2 OLS Backup Estimation Routine
If the solution does not satisfy all the above conditions, it is necessary to fix the result based on
whatever Lasso or overlapping group Lasso yields. Fortunately, when these two solvers fail to give
the correct coefficients, the support of the invalid solution does contain the indices corresponding
to at least one of the faulted line terminals. As a result, the groups of indices from g1, g2, . . . , gL
that overlap with the support of the invalid solution are considered suspect. The valid solution can
then be computed by a second-stage unregularized ordinary least-squares estimation, which has
also been suggested in [33] under the name “LARS-OLS hybrid”.
In the case of the proposed method, this is done by extracting the columns of the dictionary matrix
X using the suspected group indices (denoted by s) and solving an overdetermined linear equation
via (4.2).
β̂s = (XsᵀXs)⁻¹Xsᵀy (4.2)
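A sketch of the backup routine: try the second-stage OLS fit (4.2) for each suspect group and keep the one with the smallest residual. The validity criteria of Section 4.2.1 are abstracted into this residual comparison, and the test data are invented.

```python
import numpy as np

def ols_backup(X, y, suspect_groups):
    """Second-stage OLS over each suspect group of column indices; returns
    the group and coefficients giving the least residual error."""
    best = None
    for g in suspect_groups:
        Xs = X[:, g]
        beta_s = np.linalg.lstsq(Xs, y, rcond=None)[0]   # solves (4.2)
        r = np.linalg.norm(y - Xs @ beta_s)              # residual of this fit
        if best is None or r < best[0]:
            best = (r, g, beta_s)
    return best[1], best[2]
```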
The resulting β̂s with the least residual error becomes the final solution. This second-stage backup
evaluation can be regarded as reliable protection for the proposed method, which might not work
well under certain noise conditions.
4.3 Extended Robust Formulation and Correction Routine
4.3.1 Extended Robust Formulation
Since individual measurement units in the system may fail or suffer from corruption, it is considered
necessary to investigate the robustness of the proposed method against such gross errors. This
leads to an extended Lasso formulation in [39], which incorporates sparse missing or grossly
corrupted measurements into (3.15). Below is the extended version of the underdetermined linear
equation (3.12), where e stands for the sparse gross errors added to the correct voltage measurements:
y = Xβ + e = [X I][β; e] (4.3)
Using (4.3), the only difference in the solution routine from the basic LARS-Lasso is the replacement
of the original dictionary matrix X with the extended matrix [X I]. In this way, if individual
measurement units fail or happen to be inaccurate, the e vector will have corresponding non-zero
entries; the locations and values of the gross errors, along with the correct equivalent injections,
can be identified simultaneously.
4.3.2 Recursive Solution Routine with Correction of Erroneous Measurement(s)
To achieve better performance using the extended robust formulation, the estimated gross error
vector can be fed back to correct the measurements; a corresponding recursive routine is designed
as in Fig. 4.1. The initially corrupted measurement vector is updated with the estimated e and then
used for the next round of the LARS routine, until no gross error is detected. This approach helps
improve the final accuracy of the fault location, and although it increases the computation time, this
is not critical, since fault location is an offline application and a single round of LARS takes little
time to run.
[Flowchart: start → run LARS with the extended formulation → if the error vector has non-zero
entries, correct the measurement vector with the estimated gross error and repeat; otherwise end
the estimation.]
Figure 4.1: Flowchart for the recursive estimation
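The extended dictionary (4.3) and the correction loop of Fig. 4.1 can be sketched as a skeleton with a pluggable solver; `sparse_solver` stands in for any Lasso-type solver returning the stacked [β; e] vector (an assumed interface, not the thesis code), and the stub solver in the test only exercises the mechanics of the loop.

```python
import numpy as np

def robust_solve(X, y, sparse_solver, tol=1e-6, max_rounds=5):
    """Recursive routine of Fig. 4.1: solve with the extended dictionary
    [X I] per (4.3), subtract the estimated gross error e from y, and repeat
    until no gross error is detected."""
    n, p = X.shape
    Xext = np.hstack([X, np.eye(n)])       # extended dictionary [X I]
    for _ in range(max_rounds):
        z = sparse_solver(Xext, y)
        beta, e = z[:p], z[p:]
        if np.max(np.abs(e)) < tol:        # error vector clean: done
            return beta, y
        y = y - e                          # correct the measurement vector
    return beta, y
```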
4.4 Parallel Implementation
4.4.1 Problem Decomposition
Regarding the efficiency of the proposed method, although LARS has been found to have linear
complexity, as the scale of the power system increases there is a concern that the computation
speed of the fault location program may drop below the limit required for the fault to be repaired in
time. Thus, investigations were made into how the sparse estimation problem can be decomposed
to allow solution by multiple processors running in parallel. Based on [40], three types of data
distribution for the dictionary matrix, namely row block distribution, column block distribution and
general block distribution, are illustrated in Fig. 4.2 below.
Figure 4.2: Three Scenarios of Data Distribution

Among the three distribution scenarios, column block distribution is considered the most appropriate
for parallelizing the original Lasso problem, because in the power system fault location application
each column of the dictionary matrix represents a bus in the network, and aggregations of buses
form sub-networks, which is a common way of decomposing large power networks.
To decompose the original Lasso problem into smaller problems, it is necessary to revisit the notion
of groups of indices. For each line in the network there are four corresponding column indices in
the dictionary matrix, and these four indices form a group. For a network with L lines, there will be
L groups of indices. Thus, as shown in the column block distribution scenario, each parallel
processor can take charge of a subset of the dictionary matrix's columns, namely the union of
several groups of indices corresponding to several lines. As with the overlapping group Lasso
formulation, the subsets of columns overlap, since the terminals of a line cannot be assigned to
different groups.
When the problem is decomposed using the column block distribution, it can easily be seen that
the sub-problems are independent of each other; in other words, they are embarrassingly parallel
[41]. Taking Fig. 4.3 as an example, there are three estimation problems to be solved in parallel:
(A1, b), (A2, b) and (A3, b). Each of {A1, A2, A3} covers roughly L/nWorker (here nWorker = 3)
groups of lines, and a tie-line between two areas corresponds to the overlapping columns between
two sub-matrices.
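The contiguous, overlapping column split described above can be sketched as follows; the chain-network line groups in the test are illustrative.

```python
def split_columns(line_groups, n_workers):
    """Assign a contiguous chunk of line groups to each worker; a worker's
    column set is the union of its groups.  Consecutive workers' sets overlap
    wherever a tie-line's terminal buses are shared across the split."""
    per = -(-len(line_groups) // n_workers)        # ceil(L / n_workers) lines each
    col_sets = []
    for w in range(n_workers):
        chunk = line_groups[w * per:(w + 1) * per]
        col_sets.append(sorted(set().union(*chunk)))
    return col_sets
```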
Assume the sparse injections exist in the x1 part; then the first sub-problem can be solved smoothly
by the first processor, resulting in the same sparse solution as the original problem, but in the
meantime the other sub-problems will run into exhaustive searches or exit halfway due to internal
detection of discrepancies, because there is no sparse solution for them. To achieve the goal of
reducing computation time, there should be a communication mechanism between processors so
that once a single processor finishes its estimation, the other processors are told to stop.

Figure 4.3: Example of Column Block Distribution
4.4.2 Parallelism Attempted
Based on the sequential Matlab code of the LARS algorithm [42] for solving Lasso, the author has
developed four parallel implementations of the algorithm, using the Matlab Parallel Computing
Toolbox (PCT) [43], the Message Passing Interface (MPI) [44], Open Multi-Processing (openMP) [45]
and a GPU [46], respectively. The MPI and openMP implementations are modified from the
sequential C code of LARS, which uses the GNU Scientific Library (GSL) [47] for defining and
manipulating vectors and matrices. The main reason for using the GSL library is that it is built into
the gnu-4.8.1 compiler and provides an interface to the Basic Linear Algebra Subprograms
(BLAS) [48] operations on vector and matrix objects. For set operations such as union and set
difference, the C++ Standard Template Library (STL) [49] has been used.
Fig. 4.4 identifies the development path and supporting libraries for each version. Among the four
parallel versions, the Matlab PCT, MPI and openMP versions are based on the column block
decomposition of the original problem; therefore they are all embarrassingly parallel: each
thread/worker applies the LARS algorithm to its part of the dictionary matrix, and one of them
eventually obtains the correct solution. The GPU version is written in Compute Unified Device
Architecture (CUDA) C [50], and GPU-accelerated libraries, namely cuBLAS [51] and Thrust [52],
which are analogous to the CPU-based GSL BLAS and STL libraries, are utilized.
[Diagram: the Matlab sequential code leads to the Matlab PCT version; the C++ sequential code
(supported by GSL, GSL BLAS and the STL) leads to the C++ MPI, C++ openMP and CUDA GPU
versions, the last supported by cuBLAS and Thrust. MPI: Message Passing Interface; openMP:
Open Multi-Processing; GPU: Graphics Processing Unit.]
Figure 4.4: Development process of different implementations
Among the four implementations, Matlab PCT is only used for prototyping the problem
decomposition and the SPMD (single program multiple data) [53] scheme, which is also used in the
MPI implementation. As for the MPI version, it was later found that the desired communication is
incompatible with MPI's broadcast function: MPI_Bcast() can only take effect when it is called
before the broadcast variable is used, and the root of the broadcast has to be assigned without
knowing which thread will generate the correct sparse solution. Therefore, in the MPI version the
overall timing is determined not by the “successful” thread but by the thread that takes the longest
time; nothing can be done to tell the other threads to stop once one thread has succeeded.
Consequently, neither the Matlab PCT nor the MPI implementation is used for the efficiency test.
On the other hand, due to the inherent difference between the architectures of multi-core CPUs
and GPUs, it is not promising to transplant the per-core solution routine of the MPI or openMP
implementation onto each GPU thread. In this light, the reasonable way is to use the massive
number of GPU threads to accelerate the matrix/vector operations while keeping the framework of
the sequential C version unchanged. The author has resorted to the cuBLAS library, a
GPU-accelerated version of the complete standard BLAS library. Similarly, the use of the STL has
been replaced by its GPU analogue, named Thrust. Both cuBLAS and Thrust functions implicitly
create objects on the GPU and perform the computation without user-defined kernel functions.
Unfortunately, although cuBLAS functions can be called from user-defined kernel functions, Thrust
does not allow this; hence, even though it is fairly convenient to use these two libraries, there is no
freedom in testing different grid and block sizes.
As for the openMP version, its most important merit is its use of shared memory instead of
protocol-based communication, even though this merit comes with a limit on computation
resources: a pure openMP program can only use the cores of one compute node, whereas there is
no such limit for MPI. The broadcast from the “successful” thread to the others is easily
implemented by changing the value of a shared variable so that the loops in the other threads
break. OpenMP can thus bound the overall run time by that of the “successful” thread, which is the
fastest thread of all.
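The shared-variable stop mechanism can be mimicked with Python threads as an analogy to the openMP shared flag; this is not the thesis C++ code, and the `has_solution` flag is a stand-in for a worker actually finding the valid sparse solution on its column block.

```python
import threading

def sub_solver(stop, wid, has_solution, results):
    """Each worker mimics a LARS run on its column block, polling the shared
    event the way an openMP thread would poll a shared stop variable."""
    for _ in range(100000):
        if stop.is_set():            # another worker already succeeded: break out
            return
        if has_solution:             # stand-in for "found the valid sparse solution"
            results.append(wid)
            stop.set()               # tell the other workers to stop
            return

stop, results = threading.Event(), []
workers = [threading.Thread(target=sub_solver, args=(stop, w, w == 1, results))
           for w in range(3)]
for t in workers:
    t.start()
for t in workers:
    t.join()
# Only the "successful" worker reports; the rest observe the flag and exit early.
```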
Chapter 5
Simulation Results
To generate an overall picture of the method's applicability and performance (accuracy, efficiency,
robustness), various tests have been performed from different perspectives. The accuracy of the
estimation is evaluated by the error in the fault distance. Let D be the total length of the faulted line,
d the distance from the fault point to the terminal bus where the equivalent injection is larger, and
resti and rtrue the estimated and “true” ratios between the equivalent injections; the relative fault
distance error is then derived by (5.3):
desti = D/(resti + 1) (5.1)
dtrue = D/(rtrue + 1) (5.2)
err = |desti − dtrue|/dtrue = |rtrue − resti|/(resti + 1) (5.3)
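Equations (5.1)-(5.3) can be checked numerically; the two routes to the relative error must agree (the sample values of D and the ratios are invented).

```python
def fault_distance_error(D, r_true, r_esti):
    """Relative fault distance error via (5.1)-(5.2) and via the closed
    form (5.3); both expressions are mathematically identical."""
    d_esti = D / (r_esti + 1)                      # (5.1)
    d_true = D / (r_true + 1)                      # (5.2)
    err_a = abs(d_esti - d_true) / d_true          # definition of err
    err_b = abs(r_true - r_esti) / (r_esti + 1)    # (5.3)
    return err_a, err_b
```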
In the following sections, firstly the universal applicability of the method to different fault types
and line types is verified through an unbalanced test case as well as a “teed” line in the IEEE
118-bus test case from [54]. Then the performance of the proposed method and its variations is
evaluated using extensive simulation in the 118-bus test case. A 300-bus test case as well as a
3395-bus test case will also be used for efficiency tests of the parallel implementation.
5.1 Faults in the Unbalanced Test Case
Using the same unbalanced ATP test case as in Section 3.1.3, the applicability of the proposed
method to identifying faults of various types on both balanced and unbalanced lines has been
validated. For the 10 line sections in the test system (Fig. 3.5), all five types of fault (single-phase to
three-phase, grounded and ungrounded) are simulated, and the equivalent injections are estimated
using full measurements of all buses, because the system is too small for the optimized PMU
placement to apply. Measurement noise and gross errors are not taken into consideration in this
case, since later subsections discuss these for larger test cases. The relative errors of the estimated
distance for all simulated faults are summarized in Table 5.1, where “SLG, DL, DLG, 3L, 3LG”
stand for single-phase-to-ground fault, double-phase short-circuit fault, double-phase-to-ground
fault, three-phase short-circuit fault and three-phase-to-ground fault, respectively. The fault
resistance is set to 0.1 ohm.
Table 5.1: Relative Error of Fault Distance for the Unbalanced 11-Bus Case

Section ID   SLG       DL        DLG       3L        3LG
Sec 1        3.86e-5   6.75e-5   1.12e-4   7.57e-5   2.41e-5
Sec 2        2.10e-5   1.06e-5   1.18e-5   N/A       N/A
Sec 3        3.63e-5   2.30e-5   7.10e-6   N/A       N/A
Sec 4        5.40e-5   5.12e-5   5.06e-5   8.95e-5   1.38e-4
Sec 5        2.13e-5   3.30e-5   1.68e-5   7.35e-5   5.41e-5
Sec 6        1.37e-4   1.18e-4   7.44e-5   9.62e-5   9.24e-5
Sec 7        5.20e-3   2.00e-3   2.40e-3   N/A       N/A
Sec 8        6.16e-5   N/A       N/A       N/A       N/A
Sec 9        3.80e-3   N/A       N/A       N/A       N/A
Sec 10       2.41e-5   5.84e-6   1.38e-5   3.49e-5   2.24e-5
From the statistics above it can be seen that the accuracy of the equivalent injection estimation is
comparatively better for grounded faults than for ungrounded faults. This is only a rough
observation, because the electromagnetic waveforms provided by ATP have to be fitted into
phasors, and the performance of the curve fitting also affects the final error. In any case, the
proposed method can be regarded as accurate from a practical point of view.
5.2 Wide-area Cases
In the following sections, the IEEE 118-bus test system, for which only the positive-sequence
model is available, will be used firstly to verify the applicability of the proposed method to teed
circuit faults, and then to examine its effectiveness in terms of accuracy, number of iterations and
number of OLS backup evaluations under different levels of measurement noise. After
implementing the binary optimization shown in (3.22), it turned out that 24 PMUs need to be
placed, making 96 of the 118 buses observable.
For faults on each of the 170 lines, the changes in bus voltages are simulated by injecting a pair of
proportional currents at the terminal buses, commensurate with the description of equivalent
injections in (3.4). The location results for all 170 lines under noise-free conditions as well as under
different levels of Gaussian measurement noise are collected to generate statistical performance
metrics. Both the LARS algorithm for Lasso and the CVX package are employed in the Matlab
environment on a 2.6 GHz Intel i5 processor.
5.2.1 Teed Circuit Fault
To verify the equivalent three-terminal injections described in Section 3.1.1.2, bus 71 in the 118-bus
system is reduced to a teed point, since it is a zero-injection bus with only buses 70, 72 and 73 as
its neighbors. Faults are simulated on line sections 70-71, 71-72 and 71-73, respectively, and each
fault can be located correctly. For instance, for a fault on line 70-71 at a point (2/7)th of the line
length away from bus 70, the equivalent injections estimated by basic LARS-Lasso are:
away from bus 70, the equivalent injections estimated by basic LARS-Lasso are:
I70 = 383.0554− j759.1415(p.u.) (5.4)
I72 = 13.1256− j24.9163(p.u.) (5.5)
I73 = 47.1500− j102.6877(p.u.) (5.6)
Let z70−71, z71−72 and z71−73 represent the series line impedances, and z70−f the impedance
from bus 70 to the fault point. Then, comparing the three injections leads to the following
relationships:
I72/I73 = z71−72/z71−73 = (0.0446 + j0.1800)/(0.0087 + j0.0454) (5.7)
zs = z70−71z71−72 + z70−71z71−73 + z71−72z71−73 (5.8)
If = I70 + I72 + I73 = 443.33 − j886.75 (p.u.) (5.9)
z70−f = I72zs/(If z71−73) = 0.0025 + j0.0102 (p.u.) (5.10)
z70−f/z70−71 = 0.2863 ≈ 2/7 (5.11)
Therefore, the location of the fault which is (2/7)th of the total distance from bus 70 to the teed
point (bus 71) is accurately estimated.
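As a quick sanity check, the sum in (5.9) can be reproduced directly from the injections reported in (5.4)-(5.6):

```python
# Phasor injections reported for the fault on line 70-71, (5.4)-(5.6), in p.u.
I70 = 383.0554 - 759.1415j
I72 = 13.1256 - 24.9163j
I73 = 47.1500 - 102.6877j

# Total fault current per (5.9); matches the reported 443.33 - j886.75 p.u.
If = I70 + I72 + I73
```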
5.2.2 Comparison of Basic LARS-Lasso and Overlapping Group Lasso
The performance of the method is examined by comparing the results of the basic LARS-Lasso
formulation with those of the overlapping group Lasso formulation in terms of four metrics:
average computation time, average number of iterations, average error, and the number of cases
requiring second-stage estimation among all fault cases. Since the CVX software package uses a
general-purpose interior-point algorithm rather than the LARS algorithm used for Lasso, it is much
slower than LARS, and the comparison of computation time is only included for completeness; a
later section will show how the efficiency varies with the number of computation cores. Graphic
representations of the comparison results are shown in Fig. 5.1 to Fig. 5.4.
It can be seen from Fig. 5.1 to Fig. 5.4 that the overlapping group Lasso works more stably, judging
from the number of iterations and the number of OLS backup evaluations under different levels of
noise, although it is not as highly accurate as the basic LARS implementation; this may be due to
the limited adjustment that can be made to the tolerance of CVX. However, if timing is important,
overlapping group Lasso will be less favorable than basic LARS-Lasso.

[Figure: average computation time vs. standard deviation of measurement noise, for Lasso and
overlapping group Lasso.]
Figure 5.1: Computation Times Used by Basic LARS-Lasso and Overlapping Group Lasso.

[Figure: average number of iterations vs. standard deviation of measurement noise, for Lasso and
overlapping group Lasso.]
Figure 5.2: Number of Iterations by Basic LARS-Lasso and Overlapping Group Lasso.
[Figure: average fault distance error vs. standard deviation of measurement noise, for Lasso and
overlapping group Lasso.]
Figure 5.3: Error of Fault Distance by Basic LARS-Lasso and Overlapping Group Lasso.
[Figure: average number of OLS backup estimations vs. standard deviation of measurement noise,
for Lasso and overlapping group Lasso.]
Figure 5.4: Number of OLS Backup Estimations by Basic LARS-Lasso and Overlapping Group
Lasso.
5.2.3 Robustness of Extended LARS-Lasso Formulation Against Gross Error
To evaluate the robustness of the extended formulation, the same set of fault cases under Gaussian
noise is solved using the extended formulation with one and two failed measurements (output
equal to zero), respectively, and each case is run twice: once with and once without the recursive
routine. Hence there are four records of performance to be compared with that of the
gross-error-free cases solved by basic LARS-Lasso. In each gross-error-corrupted case the error
vector can be accurately estimated and used for correcting the measurements in the recursive
routine. Fig. 5.5 shows the comparison of the estimation error under different levels of noise. It can
be concluded that the effect sparse gross errors have on the accuracy of the location method is
within acceptable limits, and applying the recursive routine helps push the accuracy towards that
of the gross-error-free cases.
[Figure: average fault distance error vs. standard deviation of measurement noise, for error-free
cases and for one or two unit failures, each with and without correction.]
Figure 5.5: Comparison of Fault Distance Error under One or Two Unit Failures
5.3 Parallel Implementation Results
To evaluate the efficiency of the proposed method, especially its parallel implementation, a 300-bus
as well as a 3395-bus test case are also used for simulations. The sequential and parallel programs
are run on the Discovery cluster offered by Information Technology Services, Research Computing,
at Northeastern University. The queue used belongs to the node group “nodes10g2”, whose
technical specifications are as follows:
Table 5.2: Platform Information
CPU Intel Xeon CPU E5-2680 2.8GHz
# logical cores on each node 40
RAM 64GB
5.3.1 Sequential Run Time Benchmark
Before examining the speed-up gained from the parallel implementations, the run times of the
Matlab and C++ sequential programs are recorded below to serve as benchmarks for the efficiency
tests.
Table 5.3: Sequential Run Times

nBus   Matlab Run Time   C++ Run Time
118    0.0245s           0.0116s
300    0.0981s           0.1153s
3395   89.4747s          213.5774s
5.3.2 Speed-up Using openMP
The openMP implementation is tested using numbers of cores ranging from 5 to 40, the maximum
number of usable cores on a compute node. The speed-ups are shown in Fig. 5.6 below.
[Figure: average speed-up vs. number of cores used in openMP, for the 118-bus, 300-bus and
3395-bus cases.]
Figure 5.6: Comparison of Speed-ups for Different Cases using openMP
It can be seen that for a large system such as the 3395-bus test system, the accelerating effect of
openMP is much more pronounced than for the smaller cases. As the number of cores grows, the
computation time for the 3395-bus case becomes more and more acceptable.
5.3.3 Speed-up Using GPU
Since the library functions launch the kernel functions automatically, it is not possible to choose the
grid dimensions; therefore there is only one way of executing the GPU version. The run times and
corresponding speed-ups are shown in Table 5.4.
Table 5.4: Summary of GPU run times

nBus   GPU Run Time   Speed-up
118    0.4733s        0.02139
300    1.1700s        0.09852
3395   10.1567s       21.0283
It seems odd that for smaller cases, like the 118- and 300-bus cases, the GPU version is much
slower than the sequential C program. A close look at the profiling results indicates that this is
mostly caused by the vector initialization and set operation functions of the Thrust library. Table 5.5
is an extract of the profiling result, from which it can be seen that even the most time-consuming
cuBLAS function (“void gemv2T kernel val”) is much faster than a Thrust function. Since there are
many set operations in the while loop, and this kind of operation involves data dependencies
between vector elements, it is not practical to write kernel functions to parallelize them. Therefore,
so far the most promising approach to parallelizing the proposed method is through openMP.
Table 5.5: Summary of Profiling Results

Total Time%   Total Time   Total Calls   Avg. Time   Function Name
53.61%        384.56ms     8496          45.263us    void::thrust::system......
9.92%         71.172ms     33960         2.0950us    CUDA memcpy DtoH (device to host)
0.08%         450.72us     30            15.023us    void gemv2T kernel val
Despite the non-ideal result, the profiling does verify the efficiency of the cuBLAS functions and the
dynamic kernel launches by the cuBLAS and Thrust function calls (the gridSize and blockSize vary
each time). It is fair to say that a GPU can provide significant speed-up for floating-point matrix
operations, while it is not guaranteed to help with integer set operations.
Chapter 6
Conclusion
The major contribution of this work is the transformation of the fault location problem, for both
two-terminal lines and teed circuit lines, into a sparse estimation problem, or more specifically a
Lasso problem, which can be solved effectively by the LARS algorithm. Using the condition for a
unique solution of the Lasso formulation, an optimal PMU placement scheme is designed, which
results in only partial observability of the system. Besides the basic LARS-Lasso implementation,
the overlapping group Lasso formulation, which customizes the sparsity penalty using structural
information about the solution, is also built and implemented. In addition, an extended robust
Lasso formulation designed to tackle sparse gross errors is applied and equipped with a recursive
routine for correcting the discrepant measurements. To improve the method's efficiency for large
systems, parallel computation techniques are explored and implemented.
Using specific cases, the proposed method's universal applicability to different types of faults and
lines is verified. The performance of the basic LARS-Lasso implementation, the overlapping group
Lasso and the Lasso with extended robust formulation is examined using wide-area test cases.
Furthermore, the results of the parallel implementation based on openMP reveal that for large
systems the proposed method can be accelerated so that the time required to locate a fault is
satisfactory.
Future research topics include investigating the method's robustness against high noise levels
using different amounts of redundant measurements. The issue of parameter errors in the
dictionary matrix is also worth exploring.
Bibliography
[1] M. M. Saha, J. Izykowski, and E. Rosołowski, Fault location on power networks. Springer,
2010.
[2] P. Gale, P. Crossley, X. Bingyin, Y. Ge, B. Cory, and J. Barker, “Fault location based on travelling
waves,” in Fifth International Conference on Developments in Power System Protection. IET,
1993, pp. 54–59.
[3] F. H. Magnago and A. Abur, “Fault location using wavelets,” IEEE Transactions on Power
Delivery, vol. 13, no. 4, pp. 1475–1480, 1998.
[4] C. Y. Evrenosoglu and A. Abur, “Travelling wave based fault location for teed circuits,” IEEE
Transactions on Power Delivery, vol. 20, no. 2, pp. 1115–1121, 2005.
[5] M. Korkali, H. Lev-Ari, and A. Abur, “Traveling-wave-based fault-location technique for
transmission grids via wide-area synchronized voltage measurements,” IEEE Transactions on
Power Systems, vol. 27, no. 2, pp. 1003–1011, 2012.
[6] R. Razzaghi, G. Lugrin, H. Manesh, C. Romero, M. Paolone, and F. Rachidi, “An efficient
method based on the electromagnetic time reversal to locate faults in power networks,” IEEE
Transactions on Power Delivery, vol. 28, no. 3, pp. 1663–1673, 2013.
[7] T. Takagi, Y. Yamakoshi, M. Yamaura, R. Kondow, and T. Matsushima, “Development of a
new type fault locator using the one-terminal voltage and current data,” IEEE Transactions on
Power Apparatus and Systems, no. 8, pp. 2892–2898, 1982.
[8] T. Kawady and J. Stenzel, “A practical fault location approach for double circuit transmission
lines using single end data,” IEEE Transactions on Power Delivery, vol. 18, no. 4, pp. 1166–
1173, 2003.
[9] S. M. Brahma, “Fault location scheme for a multi-terminal transmission line using synchronized
voltage measurements,” IEEE Transactions on Power Delivery, vol. 20, no. 2, pp. 1325–1331,
2005.
[10] D. Novosel, D. G. Hart, E. Udren, and J. Garitty, “Unsynchronized two-terminal fault location
estimation,” IEEE Transactions on Power Delivery, vol. 11, no. 1, pp. 130–138, 1996.
[11] M. Kezunovic, “Smart fault location for smart grids,” IEEE Transactions on Smart Grid, vol. 2,
no. 1, pp. 11–22, 2011.
[12] C.-W. Liu, K.-P. Lien, C.-S. Chen, and J.-A. Jiang, “A universal fault location technique for
N-terminal transmission lines,” IEEE Transactions on Power Delivery, vol. 23, no. 3, pp.
1366–1373, 2008.
[13] T. Funabashi, H. Otoguro, Y. Mizuma, L. Dube, and A. Ametani, “Digital fault location
for parallel double-circuit multi-terminal transmission lines,” IEEE Transactions on Power
Delivery, vol. 15, no. 2, pp. 531–537, 2000.
[14] G. Manassero Jr, E. C. Senger, R. M. Nakagomi, E. L. Pellini, and E. C. N. Rodrigues, “Fault-
location system for multiterminal transmission lines,” IEEE Transactions on Power Delivery,
vol. 25, no. 3, pp. 1418–1426, 2010.
[15] D. A. Tziouvaras, J. B. Roberts, and G. Benmouyal, “New multi-ended fault location design for
two-or three-terminal lines,” 2001.
[16] M. Abe, N. Otsuzuki, T. Emura, and M. Takeuchi, “Development of a new fault location system
for multi-terminal single transmission lines,” IEEE Transactions on Power Delivery, vol. 10,
no. 1, pp. 159–168, 1995.
[17] T. Nagasawa, M. Abe, N. Otsuzuki, T. Emura, Y. Jikihara, and M. Takeuchi, “Development
of a new fault location algorithm for multi-terminal two parallel transmission lines,” IEEE
Transactions on Power Delivery, vol. 7, no. 3, pp. 1516–1532, 1992.
[18] H. Livani and C. Y. Evrenosoglu, “A fault classification and localization method for three-
terminal circuits using machine learning,” IEEE Transactions on Power Delivery, vol. 28, no. 4,
pp. 2282 – 2290, 2013.
[19] R. Mahanty and P. D. Gupta, “Application of RBF neural network to fault classification and
location in transmission lines,” IEE Proceedings-Generation, Transmission and Distribution,
vol. 151, no. 2, pp. 201–212, 2004.
[20] J. Gracia, A. Mazon, and I. Zamora, “Best ANN structures for fault location in single-and
double-circuit transmission lines,” IEEE Transactions on Power Delivery, vol. 20, no. 4, pp.
2389–2395, 2005.
[21] Z. Chen and J.-C. Maun, “Artificial neural network approach to single-ended fault locator for
transmission lines,” IEEE Transactions on Power Systems, vol. 15, no. 1, pp. 370–375, 2000.
[22] A. Mazon, I. Zamora, J. Minambres, M. Zorrozua, J. Barandiaran, and K. Sagastabeitia, “A new
approach to fault location in two-terminal transmission lines using artificial neural networks,”
Electric Power Systems Research, vol. 56, no. 3, pp. 261–266, 2000.
[23] Q. Jiang, X. Li, B. Wang, and H. Wang, “PMU-based fault location using voltage measurements
in large transmission networks,” IEEE Transactions on Power Delivery, vol. 27, no. 3, pp.
1644–1652, 2012.
[24] A. Salehi-Dobakhshari and A. M. Ranjbar, “Application of synchronised phasor measurements
to wide-area fault diagnosis and location,” IET Generation, Transmission & Distribution, vol. 8,
no. 4, pp. 716–729, 2014.
[25] M. Majidi, A. Arabali, and M. Etezadi-Amoli, “Fault location in distribution networks by
compressive sensing,” IEEE Transactions on Power Delivery (Early Access), 2014.
[26] H. Zhu and G. B. Giannakis, “Sparse overcomplete representations for efficient identification
of power line outages,” IEEE Transactions on Power Systems, vol. 27, no. 4, pp. 2215–2224,
2012.
[27] O. Kosut, L. Jia, R. J. Thomas, and L. Tong, “Malicious data attacks on smart grid state estima-
tion: Attack strategies and countermeasures,” in 2010 First IEEE International Conference on
Smart Grid Communications (SmartGridComm). IEEE, 2010, pp. 220–225.
[28] C. Wang, Q.-Q. Jia, X.-B. Li, and C.-X. Dou, “Fault location using synchronized sequence
measurements,” International Journal of Electrical Power & Energy Systems, vol. 30, no. 2, pp.
134–139, 2008.
[29] IEEE Distribution System Analysis Subcommittee. (2013) Distribution test feeders. [Online].
Available: http://ewh.ieee.org/soc/pes/dsacom/testfeeders/index.html
[30] A. Sharif-Nassab, M. Kharratzadeh, M. Babaie-Zadeh, and C. Jutten, “How to use real-valued
sparse recovery algorithms for complex-valued sparse recovery?” in Proceedings of the 20th
European Signal Processing Conference (EUSIPCO). IEEE, 2012, pp. 849–853.
[31] D. L. Donoho, “For most large underdetermined systems of linear equations the minimal L1-
norm solution is also the sparsest solution,” Communications on Pure and Applied Mathematics,
vol. 59, no. 6, pp. 797–829, 2006.
[32] R. Tibshirani, “Regression shrinkage and selection via the lasso,” Journal of the Royal Statistical
Society. Series B (Methodological), pp. 267–288, 1996.
[33] B. Efron, T. Hastie, I. Johnstone, R. Tibshirani et al., “Least angle regression,” The Annals of
statistics, vol. 32, no. 2, pp. 407–499, 2004.
[34] R. J. Tibshirani et al., “The lasso problem and uniqueness,” Electronic Journal of Statistics,
vol. 7, pp. 1456–1490, 2013.
[35] B. Xu and A. Abur, “Observability analysis and measurement placement for systems with
pmus,” in IEEE PES Power Systems Conference and Exposition. IEEE, 2004, pp. 943–946.
[36] L. Yuan, J. Liu, and J. Ye, “Efficient methods for overlapping group lasso,” in Advances in
Neural Information Processing Systems, 2011, pp. 352–360.
[37] M. Grant and S. Boyd, “Graph implementations for nonsmooth convex programs,” in Recent
Advances in Learning and Control, ser. Lecture Notes in Control and Information Sciences,
V. Blondel, S. Boyd, and H. Kimura, Eds. Springer-Verlag Limited, 2008, pp. 95–110,
http://stanford.edu/∼boyd/graph dcp.html.
[38] ——, “CVX: Matlab software for disciplined convex programming, version 2.1,” http://cvxr.
com/cvx, Mar. 2014.
[39] N. M. Nasrabadi, T. D. Tran, and N. Nguyen, “Robust lasso with missing and grossly corrupted
observations,” in Advances in Neural Information Processing Systems, 2011, pp. 1881–1889.
[40] Z. Peng, M. Yan, and W. Yin, “Parallel and distributed sparse optimization,” in 2013 Asilomar
Conference on Signals, Systems and Computers. IEEE, 2013, pp. 659–646.
[41] Argonne National Laboratory, “Parallel Algorithm Examples,” http://www.mcs.anl.gov/∼itf/
dbpp/text/node10.html.
[42] S. S. Kim, “Lars Algorithm,” http://www.mathworks.com/matlabcentral/fileexchange/
23186-lars-algorithm, Mar. 2009.
[43] T. Mathworks, “Matlab Parallel Computing Toolbox,” http://www.mathworks.com/help/
distcomp/index.html, 2015.
[44] Argonne National Laboratory, “The Message Passing Interface (MPI) Standard,” http://www.
mcs.anl.gov/research/projects/mpi/.
[45] Blaise Barney, Lawrence Livermore National Laboratory, “openMP,” https://computing.llnl.
gov/tutorials/openMP/.
[46] Nvidia, “What is GPU Accelerated Computing?” http://www.nvidia.com/object/
what-is-gpu-computing.html.
[47] “GSL - GNU Scientific Library,” http://www.gnu.org/software/gsl/, Aug. 2014.
[48] “BLAS (Basic Linear Algebra Subprograms),” http://www.netlib.org/blas/, Sep. 2014.
[49] “Standard C++ Library Reference,” http://www.cplusplus.com/reference/, Feb. 2015.
[50] Nvidia, “CUDA C Programming Guide,” https://docs.nvidia.com/cuda/
cuda-c-programming-guide/.
[51] “cuBLAS,” http://docs.nvidia.com/cuda/cublas/#axzz3XkIZYzaJ, Feb. 2015.
[52] “Thrust,” http://thrust.github.io/, Feb. 2015.
[53] “SPMD,” http://www.mathworks.com/help/distcomp/spmd.html, Feb. 2015.
[54] University of Washington, Seattle. (1993) Power systems test case archive. [Online]. Available:
http://www.ee.washington.edu/research/pstca/
Appendix A
Appendix
A.1 Matlab Code Solving Lasso Problem
This code implements LARS with the extended robust formulation. The codes for the basic LARS and
the overlapping group Lasso are similar and are thus omitted.
function stat = solver_robust_lars_fun(sig, errV_unit, err_level, lam_e, correct)
global nBus nBranch BranchTypeInfo MeasuredBus Aeq Aeq_noQR Q Neighbour Zbus_red Zbus_f nG G isOpt LengthRatio I0 Result Flag Iter_Err

nCase = 0;
noise_ratio = -1*ones(nBranch,1);
% Iter_Err = (-1)*ones(nBranch,2);
Com_time = (-1)*ones(nBranch,1);
Require_2nd_eval = false(nBranch,1);

true_inj = [real(I0), imag(I0), LengthRatio*real(I0), LengthRatio*imag(I0)];
true_inj = sort(true_inj); % increasing order

[Q,R] = qr(Aeq_noQR);
ext_Aeq = [R, lam_e*inv(Q)];
%ext_Aeq = [Aeq, lam_e*eye(2*length(MeasuredBus))];

stopCriterion = {};
stopCriterion{1,1} = 'MSE'; %'maxKernels';
MSE = 1e-6;
stopCriterion{1,2} = MSE; %1000
for fault_sec_id = 1:1:nBranch
    if BranchTypeInfo(fault_sec_id,3) > 0
        Flag(fault_sec_id) = -1;
        continue;
    end
    nCase = nCase + 1;
    % Flag(fault_sec_id) = 1;
    Fault_sec = BranchTypeInfo(fault_sec_id,1:2);
    I_s = sparse(Fault_sec.', [1;1], [LengthRatio*I0; I0], nBus, 1);
    I = full(I_s);
    deltaV = Zbus_f*I;
    V_noise = sig.*randn(nBus,1); % normal distribution with a mean of 0 and a standard deviation of sig
    deltaV = deltaV + V_noise;
    noise_ratio(nCase,1) = norm(V_noise)/norm(deltaV);

    deltaV2 = deltaV(MeasuredBus);

    beq_org = [real(deltaV2); imag(deltaV2)];

    deltaV2(errV_unit,1) = deltaV2(errV_unit,1)*err_level;

    beq = [real(deltaV2); imag(deltaV2)];

    err_set = beq - beq_org;

    beq = Q\beq;

    Result{nCase,1} = fault_sec_id;
    Result{nCase,2} = Fault_sec;

    isValid = false;
    unsolvableCase = 0;

    existErr = true;
    while existErr == true
        t0 = clock;
        sol = lars(beq, ext_Aeq, [], 'lasso', stopCriterion, [], 0, 1); % solve by LARS
        time1 = etime(clock, t0);

        [tttt,nIter] = size(sol);
        Result{nCase,5} = nIter;
        active_set = sol(1,nIter).active_set;
        beta = sol(1,nIter).beta;

        th = 2e-2*max(abs(beta));
        inj_beta_idx = find(abs(beta)>th);

        inj_nodes_idx = active_set(inj_beta_idx);
        inj = beta(inj_beta_idx);

        temp1 = find(inj_nodes_idx>2*nBus);
        if ~isempty(temp1)
            err_val = inj(temp1)*lam_e;
            err_node_idx = inj_nodes_idx(temp1)-2*nBus;
            temp2 = find(inj_nodes_idx<=2*nBus);
            inj_nodes_idx = inj_nodes_idx(temp2);
            inj = inj(temp2);
            Result{nCase,10} = err_val;
            Result{nCase,11} = err_node_idx;
            error_vector = zeros(2*length(MeasuredBus),1);
            error_vector(err_node_idx) = err_val.';
            err_of_err = norm(error_vector-err_set)/norm(err_set);
            Result{nCase,12} = err_of_err;

            existErr = false;

            % beq = beq - Q\error_vector;

            if correct == true
                beq = beq - Q\error_vector;
                existErr = true; % go back to do LARS again
            end
        else
            existErr = false;
        end

    end
    Result{nCase,3} = inj_nodes_idx;
    Result{nCase,4} = inj;
    %% see if the inj is valid
    if mod(length(inj_nodes_idx),2)==0
        isValid = checkInj(inj, inj_nodes_idx, nBus);
    else
        isValid = false;
    end
    %% 2nd stage OLS
    if ~isValid
        Flag(fault_sec_id) = -1;
        Require_2nd_eval(fault_sec_id) = true;
        [sel_idx, nOLS_test, inj_best, minResidual] = OLS_backup(inj_nodes_idx, nBus, beq, Aeq, Neighbour); % global Aeq Neighbour

        if sel_idx(1)~=Fault_sec(1) || sel_idx(2)~=Fault_sec(2)
        % if minResidual > norm(error_vector)
            disp(['case', num2str(nCase), ' for fault on ', num2str(Fault_sec(1)), '-', num2str(Fault_sec(2)), ' could not be solved through OLS!'])
            nIter = -1;
            unsolvableCase = -1;
            Flag(fault_sec_id) = -1;
        end
        Result{nCase,7} = sel_idx;
        Result{nCase,9} = nOLS_test;
        inj = inj_best.';
        Result{nCase,8} = inj;
    else
        Flag(fault_sec_id) = 1;
    end
    %% Record stats
    Com_time(nCase) = time1;
    Iter_Err(nCase,1) = nIter;
    if unsolvableCase > -1
        unsolvableCase = genError3(true_inj, inj, LengthRatio);
        Result{nCase,6} = unsolvableCase;
    end
    Iter_Err(nCase,2) = unsolvableCase;

end
%% Process stats
total_2nd_eval = sum(Require_2nd_eval)
% Iter_Err = Iter_Err(1:nCase,:);
Iter_Err_idx_1 = find(Iter_Err(:,1) > 0);
Iter_Err_idx_2 = find(Iter_Err(:,2) > 0);
valid_iter = Iter_Err(Iter_Err_idx_1,1);
valid_error = Iter_Err(Iter_Err_idx_2,2);
noise_ratio = noise_ratio(1:nCase,:);

ave_noise = mean(noise_ratio)
ave_iter = mean(valid_iter)
ave_err = mean(valid_error)
ave_time = mean(Com_time(1:nCase))
stat = [ave_noise, ave_iter, ave_err, ave_time, total_2nd_eval];
end
A.2 Parallel C Code Using OpenMP
The attached code is the parallel C implementation of LARS using OpenMP. The sequential C and
MPI implementations are similar and are thus omitted.
1 // FL_Lars.cpp : Defines the entry point for the console application.
2 //
3
4 //#include "stdafx.h"
5
6 //#include "Lars.h"
7 //#include "util.h"
8 #include <time.h>
9 #include <omp.h> // openmp header
10 #include <stdlib.h>
11
12 #include <iostream>
13 #include <stdio.h>
14 #include <stddef.h>
15 #include <math.h>
16 #include <algorithm>
17 #include <string>
18 #include <vector>
19
20 #include <gsl/gsl_math.h>
21 #include <gsl/gsl_vector.h>
22 #include <gsl/gsl_matrix.h>
23 #include <gsl/gsl_blas.h>
24 #include <gsl/gsl_linalg.h>
25 #include <gsl/gsl_errno.h>
26
27 //using namespace std;
51
APPENDIX A. APPENDIX
28 #define MM_MAX_LINE_LENGTH 1025
29 #define MM_PREMATURE_EOF 12
30 #define PI 3.141592653589793238462643383279
31 #define MSE_th 0.001
32 #define RESOLUTION_OF_LARS 1e-12
33 #define REGULARIZATION_FACTOR 1e-11
34 #define MAX_ITER 50
35 #define BETA_TH 1e-3
36
37 double sum_gsl_vec(gsl_vector *vv)
38 {
39 double result;
40 result = 0;
41 for (size_t k = 0; k < vv->size; k++)
42 {
43 result = result + gsl_vector_get (vv, k);
44 }
45 return result;
46 }
47
48 int new_mm_read_mtx_array_size(FILE *f, int *M, int *N)
49 {
50 char line[MM_MAX_LINE_LENGTH];
51 int num_items_read;
52 /* set return null parameter values, in case we exit with errors */
53 *M = *N = 0;
54
55 /* now continue scanning until you reach the end-of-comments */
56 do
57 {
58 if (fgets(line,MM_MAX_LINE_LENGTH,f) == NULL)
59 return MM_PREMATURE_EOF;
60 }while (line[0] == '%');
61
52
APPENDIX A. APPENDIX
62 /* line[] is either blank or has M,N, nz */
63 if (sscanf(line, "%d %d", M, N) == 2)
64 return 0;
65
66 else /* we have a blank line */
67 do
68 {
69 num_items_read = fscanf(f, "%d %d", M, N);
70 if (num_items_read == EOF) return MM_PREMATURE_EOF;
71 }
72 while (num_items_read != 2);
73
74 return 0;
75 }
76
77 int print_matrix(FILE *f, const gsl_matrix *m)
78 {
79 int status, n = 0;
80
81 for (size_t i = 0; i < m->size1; i++) {
82 for (size_t j = 0; j < m->size2; j++) {
83 if ((status = fprintf(f, "%.12f ", gsl_matrix_get(m, i,
j))) < 0)
84 return -1;
85 n += status;
86 }
87
88 if ((status = fprintf(f, "\n")) < 0)
89 return -1;
90 n += status;
91 }
92
93 return n;
94 }
53
APPENDIX A. APPENDIX
95
96 int print_vector(FILE *f, const gsl_vector *m)
97 {
98 int status, n = 0;
99
100 for (size_t j = 0; j < m->size; j++) {
101 if ((status = fprintf(f, "%.12f\n", gsl_vector_get(m, j))) < 0)
102 return -1;
103 n += status;
104 }
105 return n;
106 }
107
108 int main(int argc, char **argv)
109 {
110 int max_n_threads = omp_get_max_threads();
111 printf("max #threads = %d\n",max_n_threads);
112
113 int n_avail_procs = omp_get_num_procs();
114 printf("# of available processors = %d\n",n_avail_procs);
115
116 if (argc <2){printf("need exactly 1 argument for #threads\n");return 0;}
117
118 int caseID = atoi(argv[1]);
119
120 int startFaultCaseID = atoi(argv[2]);
121
122 int endFaultCaseID = atoi(argv[3]);
123 int nTotalFaultCases = endFaultCaseID-startFaultCaseID+1;
124
125 int n_assigned_threads = atoi(argv[4]);
126
127 omp_set_num_threads(n_assigned_threads);
128 printf("#threads assigned = %d\n",n_assigned_threads);
54
APPENDIX A. APPENDIX
129
130 double start_time, end_time, exe_time;
131
132 //bool toldToStop = 0;
133 int othersStop = 0;
134
135 int tid;
136 gsl_set_error_handler_off();
137 /*=========================Build the sub-cases======================== */
138 //BUILD CASE===Lars_prob Lars_sub(caseID);
139
140 size_t nRow;
141 size_t nBr;
142
143 gsl_vector *b;
144 gsl_matrix *groupIdx;
145
146 FILE *f;
147 int m, n;
148 int row, col;
149 double entry;
150 /* Read Original Am */
151 gsl_matrix *Am;
152 char s[20];
153 sprintf(s, "Data/ieee%d/Aeq_%d.dat", caseID, caseID);
154 printf("reading %s\n", s);
155
156 f = fopen(s, "r");
157 if(f == NULL) {
158 printf("ERROR: %s does not exist, exiting.\n", s);
159 exit(EXIT_FAILURE);
160 }
161 //mm_read_mtx_array_size(f, &m, &n);
162 new_mm_read_mtx_array_size(f, &m, &n);
55
APPENDIX A. APPENDIX
163 Am = gsl_matrix_calloc(m, n);
164 for(int i = 0; i < m*n; i++) {
165 row = i % m;
166 col = floor((double)i/m);
167 fscanf(f, "%lf", &entry);
168 gsl_matrix_set(Am, row, col, entry);
169 }
170 fclose(f);
171
172 /* Read groupIdx */
173 sprintf(s, "Data/ieee%d/groupIdx_%d.dat", caseID, caseID);
174 printf("reading %s\n", s);
175
176 f = fopen(s, "r");
177 if(f == NULL) {
178 printf("ERROR: %s does not exist, exiting.\n", s);
179 exit(EXIT_FAILURE);
180 }
181 //mm_read_mtx_array_size(f, &m, &n);
182 new_mm_read_mtx_array_size(f, &m, &n);
183 groupIdx = gsl_matrix_calloc(m, n);
184 for(int i = 0; i < m*n; i++) {
185 row = i % m;
186 col = floor((double)i/m);
187 fscanf(f, "%lf", &entry);
188 gsl_matrix_set(groupIdx, row, col, entry);
189 }
190 fclose(f);
191
192 nRow = Am->size1;
193 //nTotalCol = Am->size2;
194 nBr = groupIdx->size1;
195 size_t nGroupPerWorker = floor((double)nBr/n_assigned_threads);
196 //beta = gsl_vector_calloc(nCol);
56
APPENDIX A. APPENDIX
197 /*======== Decompose A =========*/
198
199 //#pragma omp barrier
200 //start_time = omp_get_wtime();
201 gsl_vector *time_array;
202 time_array = gsl_vector_calloc(nTotalFaultCases);
203
204 int iFaultCase ;
205 for(iFaultCase = startFaultCaseID; iFaultCase <= endFaultCaseID; iFaultCase
++ )
206 {
207 /* Read b */
208 sprintf(s, "Data/ieee%d/beq_%d.dat", caseID, iFaultCase);
209 printf("reading %s\n", s);
210
211 f = fopen(s, "r");
212 if(f == NULL) {
213 printf("ERROR: %s does not exist, exiting.\n", s);
214 exit(EXIT_FAILURE);
215 }
216 //mm_read_mtx_array_size(f, &m, &n);
217 new_mm_read_mtx_array_size(f, &m, &n);
218 b = gsl_vector_calloc(m);
219 for (int i = 0; i < m; i++) {
220 fscanf(f, "%lf", &entry);
221 gsl_vector_set(b, i, entry);
222 }
223 fclose(f);
224 #pragma omp parallel private(tid)
225 {
226 size_t nCol;
227 gsl_matrix *A;
228 bool no_xtx = 0;
229 gsl_vector *beta;
57
APPENDIX A. APPENDIX
230 std::vector<size_t> active;
231 int nIter = 0;
232 int canPrintSol = 0;
233 tid = omp_get_thread_num();
234
235 nCol = 0;
236
237 std::vector<size_t> temp_idx;
238 size_t groupIdx_temp;
239 for(size_t t = (tid)*nGroupPerWorker; t < std::min(nBr,(tid+1)*
nGroupPerWorker); t++)
240 {
241 for(size_t p = 0; p < 4; p++)
242 {
243 groupIdx_temp = (size_t)gsl_matrix_get(groupIdx, t, p)-1;
244 std::vector<size_t>::iterator it;
245 it = std::find(temp_idx.begin(), temp_idx.end(), groupIdx_temp);
246 if (it == temp_idx.end()) // not exist
247 {
248 temp_idx.push_back(groupIdx_temp);
249 nCol += 1;
250 }
251 }
252 }
253 std::sort(temp_idx.begin(),temp_idx.end());
254 A = gsl_matrix_calloc(nRow,nCol);
255 beta = gsl_vector_calloc(nCol);
256
257 gsl_vector *A_temp_col;
258 A_temp_col = gsl_vector_calloc(nRow);
259
260 for(size_t p = 0; p < nCol; p++)
261 {
262 gsl_matrix_get_col(A_temp_col, Am, temp_idx[p]);
58
APPENDIX A. APPENDIX
263 gsl_matrix_set_col (A, p, A_temp_col);//Aeq_temp = Aeq(:,Idx_worker{
k,1});
264 }
265 gsl_vector_free(A_temp_col);
266 temp_idx.clear();
267 //gsl_matrix_free(Am);
268 /*========================END OF CASE BUILDING
===========================*/
269
270 #pragma omp barrier
271 start_time = omp_get_wtime();
272
273 /*========================SOLVE THE SUB-PROB
==============================*/
274 std::vector <size_t> all_candidate;
275 for(size_t i=0; i<nCol; i++) all_candidate.push_back(i);
276 gsl_matrix *x;
277 x = gsl_matrix_calloc(nRow, nCol);
278 gsl_matrix_memcpy(x, A);
279 gsl_vector *temp_col;
280 temp_col = gsl_vector_calloc(nRow);
281 double temp_sum_mean;
282 double temp_access;
283 for(size_t i = 0; i < nCol; i++)// i th column
284 {
285 gsl_matrix_get_col(temp_col, x, i);//x_ith_col->temp_col
286 temp_sum_mean = sum_gsl_vec(temp_col);
287 temp_sum_mean = temp_sum_mean/nRow;
288
289 gsl_vector_add_constant(temp_col, -temp_sum_mean);//substract mean
290 gsl_matrix_set_col(x, i, temp_col);// copy temp_col back to x
291 }
292 // x = x./ sx_rep;
293 gsl_vector *sx;
59
APPENDIX A. APPENDIX
294 sx = gsl_vector_calloc(nCol);
295 gsl_matrix *x2;
296 x2 = gsl_matrix_calloc(nRow, nCol);
297 gsl_vector *temp_col2;
298 temp_col2 = gsl_vector_calloc(nRow);
299 gsl_matrix_memcpy(x2, x);
300 gsl_matrix_mul_elements(x2, x);//x2<-x2.*x;
301 for(size_t i = 0; i < nCol; i++)// i th column
302 {
303 gsl_matrix_get_col(temp_col, x, i);
304 gsl_matrix_get_col(temp_col2, x2, i);
305 temp_sum_mean = sum_gsl_vec(temp_col2);
306 temp_sum_mean = sqrt(temp_sum_mean);
307 //if (nCol*nCol > 1000000)
308 if (nCol*nCol > 100000000)
309 {
310 if(temp_sum_mean < RESOLUTION_OF_LARS)
311 {
312 temp_sum_mean = GSL_POSINF;
313 //printf("sx(%lu)<RESOLUTION_OF_LARS, need to be erased from
all_candidate\n",i);
314 all_candidate.erase(all_candidate.begin()+i);//all_candidate
= find(sx < GSL_POSINF);
315 }
316 }
317
318 gsl_vector_set(sx, i, temp_sum_mean);
319 gsl_vector_scale(temp_col, 1/temp_sum_mean);//x = x./ sx_rep;
320 gsl_matrix_set_col(x, i, temp_col);// copy temp_col back to x
321 }
322 gsl_matrix_free(x2);
323 gsl_vector_free(temp_col);
324 gsl_vector_free(temp_col2);
325
60
APPENDIX A. APPENDIX
326 //
327 gsl_matrix *xtx;
328 //xtx = gsl_matrix_calloc(nCol, nCol);
329 //if (nCol*nCol > 10000000)
330 if (nCol*nCol > 1000000000)
331 {
332 //printf("[%d]Too large matrix, lars will not pre-calculate xtx\n",
tid);
333 no_xtx = 1;
334 }
335 else
336 {
337 xtx = gsl_matrix_calloc(nCol, nCol);
338
339 //xtx = xˆT*x;
340
341 gsl_matrix *temp_c;
342 temp_c = gsl_matrix_alloc(nCol,nCol);
343 gsl_matrix_set_zero(temp_c);
344 gsl_blas_dgemm(CblasTrans, CblasNoTrans, 1.0, x, x, 1.0, temp_c);
345 gsl_matrix_memcpy(xtx, temp_c);
346
347 }
348 gsl_vector *y;
349 y = gsl_vector_calloc(nRow);
350
351 gsl_vector_memcpy(y, b);
352
353 double my;
354 my = sum_gsl_vec(b)/nRow;
355 gsl_vector_add_constant(y, -my);//y = b - my;
356 /*==========INITIALIZATION==========*/
357 active.clear();
358 //inactive = all_candidate;
61
APPENDIX A. APPENDIX
359 std::vector<size_t> inactive(all_candidate);
360 gsl_vector *mu_a;
361 mu_a = gsl_vector_calloc(nRow);
362 gsl_vector_set_zero(mu_a);
363
364 gsl_vector *mu_a_plus;
365 mu_a_plus = gsl_vector_calloc(nRow);
366 gsl_vector *mu_a_OLS;
367 mu_a_OLS = gsl_vector_calloc(nRow);
368
369 //gsl_vector *beta;
370 //beta = gsl_vector_calloc(nCol);
371 gsl_vector_set_zero(beta);
372 gsl_vector *beta_new;
373 beta_new = gsl_vector_calloc(nCol);
374 gsl_vector *beta_OLS;
375 beta_OLS = gsl_vector_calloc(nCol);
376
377 gsl_vector *y2;
378 y2 = gsl_vector_calloc(nRow);
379
380 gsl_vector_memcpy(y2, y);
381 gsl_vector_mul(y2, y);
382 double MSE = sum_gsl_vec(y2);
383 MSE = MSE/nRow;
384 gsl_vector_free(y2);
385
386 gsl_vector *c; //correlation vector
387 c = gsl_vector_calloc(nCol);
388 double C_max;
389 //C_max = max(abs(c));
390 size_t C_max_ind;
391 std::vector<size_t> C_max_ind_pl;
392 std::vector<size_t> drop;
62
APPENDIX A. APPENDIX
393 /*==========MAIN LOOP===============*/
394 //othersStop = 0;
395 //MPI_Bcast(&othersStop, 1, MPI_INT, tid, MPI_COMM_WORLD);
396 while(nIter < MAX_ITER )//&& othersStop==0)
397 {
398 if(othersStop == 1)
399 {
400 canPrintSol = 0;
401 break;
402 }
403
404 if(MSE <= MSE_th)
405 {
406 printf("[%d]MSE <= MSE_th, break.\n", tid);
407 canPrintSol = 1;
408 break;
409 }
410
411 nIter += 1;
412 //c = xˆT*(y-mu_a);
413 gsl_vector *y_mu_a;
414 y_mu_a = gsl_vector_calloc(nRow);
415
416 gsl_vector_memcpy(y_mu_a, y);
417 gsl_vector_sub(y_mu_a, mu_a);
418
419 gsl_vector_set_zero(c);
420 gsl_blas_dgemv(CblasTrans, 1.0, x, y_mu_a, 1.0, c);
421
422 gsl_vector_free(y_mu_a);
423
424 double c_temp;
425 gsl_vector *abs_c_inactive;
426 abs_c_inactive = gsl_vector_calloc((size_t)inactive.size());
63
APPENDIX A. APPENDIX
427 for(size_t k = 0; k < (size_t)inactive.size(); k++)
428 {
429 c_temp = gsl_vector_get(c,inactive[k]);
430 if(c_temp <0) c_temp = -c_temp;
431
432 gsl_vector_set(abs_c_inactive, k, c_temp);
433 }
434 C_max = gsl_vector_max(abs_c_inactive);
435 C_max_ind = gsl_vector_max_index(abs_c_inactive);
436 C_max_ind = inactive[C_max_ind];//C_max_ind = inactive(C_max_ind);
437 //active = sort(union(active,C_max_ind));
438 std::vector<size_t>::iterator it;
439 it = std::find(active.begin(), active.end(), C_max_ind);
440 if(it == active.end())
441 {
442 active.push_back(C_max_ind);
443 //if(tid==10) fprintf(f_act_record, "[%d] %lu added to active
set\n", nIter, C_max_ind);
444 }
445 std::sort(active.begin(), active.end());
446 //C_max_ind_pl = abs(c(inactive))>C_max-RESOLUTION_OF_LARS;
447 for(size_t k = 0; k < abs_c_inactive->size; k++)
448 {
449 if(gsl_vector_get(abs_c_inactive,k) > (C_max-RESOLUTION_OF_LARS)
)
450 {
451 C_max_ind_pl.push_back(inactive[k]);//C_max_ind_pl =
inactive(C_max_ind_pl);
452 it = std::find(active.begin(), active.end(), inactive[k]);
453 if (it == active.end())
454 {
455 active.push_back(inactive[k]);//active = sort(union(
active, C_max_ind_pl));
456 //if(tid==10)fprintf(f_act_record,"[%d]%lu added to
64
APPENDIX A. APPENDIX
active set\n", nIter, inactive[k]);
457 }
458 }
459 }
460 std::sort(active.begin(), active.end());
461 gsl_vector_free(abs_c_inactive);
462
463 std::vector<size_t> inactive_temp;
464 //inactive = setdiff(all_candidate, active);
465 std::set_difference(all_candidate.begin(),all_candidate.end(),active
.begin(),active.end(), std::inserter(inactive_temp,
inactive_temp.begin()));
466 inactive = inactive_temp;
467 std::vector<size_t>::iterator it2;
468 it2 = std::find(drop.begin(), drop.end(), C_max_ind);
469
470 if(!drop.empty() && it2 == drop.end())
471 {
472 //active(find(active==C_max_ind))=[];
473 //printf("[%d][%d]Dropped item and index of maximum correlation
is not the same. But it is being ignored here...\n", tid,
nIter);
474 canPrintSol = 0;
475 break;
476 }
477 if(!drop.empty())
478 {
479 C_max_ind = 0;
480 C_max_ind_pl.clear();
481 //printf("size of drop is %lu, drop has %lu.\n",drop.size(),
drop[0]);
482 }
483
484 //active = setdiff(active,drop);
65
APPENDIX A. APPENDIX
485 std::vector<size_t> active_temp;
486 std::set_difference(active.begin(),active.end(),drop.begin(),drop.
end(), std::inserter(active_temp, active_temp.begin()));
487 active = active_temp;
488
489 //inactive = sort(union(inactive,drop));
490 std::sort(drop.begin(), drop.end());
491 std::set_union(inactive.begin(),inactive.end(),drop.begin(),drop.end
(),inactive_temp.begin());
492 //
493 gsl_vector *x_active_col;
494 x_active_col = gsl_vector_calloc(nRow);
495 gsl_matrix *xa;
496 xa = gsl_matrix_calloc(nRow, (size_t)active.size());
497 gsl_matrix *ga;
498 ga = gsl_matrix_calloc((size_t)active.size(), (size_t)active.size())
;
499
500 double s_temp_0;
501 gsl_vector *s;
502 if(active.empty())
503 {
504 printf("[%d][%d] active is empty, maybe hard to allocate s\n",
tid, nIter);
505 }
506 s = gsl_vector_calloc(active.size());
507
508 for(size_t k = 0; k < (size_t)active.size(); k++)
509 {
510 double c_active_temp = gsl_vector_get(c, active[k]);
511 if(c_active_temp >= RESOLUTION_OF_LARS)//s = sign(c(active));
512 {gsl_vector_set(s,k,1); s_temp_0 = 1;}//s = 1;
513
514 if(c_active_temp <= -RESOLUTION_OF_LARS)
66
APPENDIX A. APPENDIX
515 {gsl_vector_set(s,k,-1); s_temp_0 = -1;}//s = -1;
516
517 if(c_active_temp < RESOLUTION_OF_LARS && c_active_temp > -
RESOLUTION_OF_LARS)
518 {gsl_vector_set(s,k,0); s_temp_0 = 0;}//s = 0;
519 //printf("s = sign(c(active)), s[k] = %f\n",s_temp_0);
520 gsl_matrix_get_col(x_active_col, x, active[k]);//xa = x(:,active
).*repmat(sˆT,n,1);
521 gsl_vector_scale(x_active_col, s_temp_0);
522 gsl_matrix_set_col(xa, k, x_active_col);
523 }
524 gsl_vector_free(x_active_col);
525
526 if(!no_xtx)
527 {
528 double ssT_temp;
529 double xtx_temp;
530 for(size_t i = 0; i < (size_t)active.size(); i++)//ga = xtx(
active,active).*(s*sˆT);
531 {
532 for(size_t j = 0; j < (size_t)active.size(); j++)
533 {
534 xtx_temp = gsl_matrix_get(xtx, active[i], active[j]);
535 //printf("xtx(active[%d],active[%d]) = %lf\n",active[i],
active[j],xtx_temp);
536 ssT_temp = gsl_vector_get(s,i)*gsl_vector_get(s,j);
537 xtx_temp = xtx_temp*ssT_temp;
538 //printf("after xtx_temp*s*s \n");
539 gsl_matrix_set(ga, i, j, xtx_temp);
540 //printf("ga(%d,%d) = %lf\n",i,j,xtx_temp);
541 }
542 }
543 }
544 gsl_vector_free(s);
67
APPENDIX A. APPENDIX
545
546 if(no_xtx)//ga = xaˆT*xa;
547 {
548 gsl_matrix_set_zero(ga);
549 gsl_blas_dgemm(CblasTrans, CblasNoTrans, 1.0, xa, xa, 1.0, ga);
550
551 }
552 if(ga->size2<1)
553 {
                printf("[%d][%d] the nCol of ga is zero, hard to allocate Sga.\n", tid, nIter);
            }
            //invga = ga\eye(size(ga,1));
            int ss;
            gsl_matrix *invga = gsl_matrix_calloc(ga->size2, ga->size2);
            gsl_permutation *perm = gsl_permutation_alloc(ga->size2);
            //if(tid == 10) printf("[%d][%d] before inverting ga.\n", tid, nIter);

            if(othersStop == 1)
            {
                canPrintSol = 0;
                break;
            }

            gsl_matrix *ga_copy = gsl_matrix_calloc(ga->size1, ga->size2);
            gsl_matrix_memcpy(ga_copy, ga);
            gsl_matrix *Vga = gsl_matrix_calloc(ga->size2, ga->size2);
            gsl_vector *Sga = gsl_vector_calloc(ga->size2);

            int status = gsl_linalg_SV_decomp_jacobi(ga_copy, Vga, Sga);

            if(status)
            { /* an error occurred */
                printf("[%d][%d] Jacobi error occurred\n", tid, nIter);
                canPrintSol = 0;
                break;
            }

            //gsl_linalg_SV_decomp_jacobi(ga_copy, Vga, Sga);
            if(gsl_vector_min(Sga) < 1e-9)
            {
                //printf("[%d][%d] ga is not invertible\n", tid, nIter);
                canPrintSol = 0;
                break;
            }
            else
            {
                double cond_ga;
                cond_ga = gsl_vector_max(Sga)/gsl_vector_min(Sga);
                if(cond_ga > 1e9)
                {
                    //printf("[%d][%d] cond(ga) is too large\n", tid, nIter);
                    canPrintSol = 0;
                    break;
                }
            }
            gsl_matrix_free(ga_copy);
            gsl_matrix_free(Vga);
            gsl_vector_free(Sga);

            gsl_linalg_LU_decomp(ga, perm, &ss);
            gsl_linalg_LU_invert(ga, perm, invga);// THIS MIGHT CHANGE GA?
            //if(tid == 10) printf("[%d][%d] after inverting ga.\n", tid, nIter);
            if(ga->size1 < 1)
            {
                printf("[%d][%d] the nRow of ga is zero, hard to allocate invga_col.\n", tid, nIter);
            }
            if(ga->size2 < 1)
            {
                printf("[%d][%d] the nCol of ga is zero, hard to allocate sum_invga_col&invga_row.\n", tid, nIter);
            }
            gsl_vector *invga_col;
            invga_col = gsl_vector_calloc(ga->size1);
            gsl_vector *sum_invga_col;
            sum_invga_col = gsl_vector_calloc(ga->size2);
            for(size_t k = 0; k < ga->size2; k++)
            {
                gsl_matrix_get_col(invga_col, invga, k);
                gsl_vector_set(sum_invga_col, k, sum_gsl_vec(invga_col));
            }
            double aa;//aa = sum(sum(invga))^(-1/2);
            aa = sum_gsl_vec(sum_invga_col);
            aa = sqrt(aa);
            aa = 1/aa;
            //printf("aa = %lf\n", aa);
            gsl_vector_free(invga_col);
            gsl_vector_free(sum_invga_col);
            //wa = aa*sum(invga,2);
            gsl_vector *invga_row;
            invga_row = gsl_vector_calloc(ga->size2);

            if(invga->size1 < 1)
            {
                printf("[%d][%d] the nRow of invga is zero, hard to allocate wa.\n", tid, nIter);
            }
            gsl_vector *wa;
            wa = gsl_vector_calloc(invga->size1);
            double wa_temp;
            for(size_t k = 0; k < ga->size1; k++)
            {
                gsl_matrix_get_row(invga_row, invga, k);
                wa_temp = aa*sum_gsl_vec(invga_row);

                gsl_vector_set(wa, k, wa_temp);
            }
            gsl_vector_free(invga_row);
            gsl_matrix_free(invga);
            gsl_permutation_free(perm);
            //ua = xa*wa;
            gsl_vector *ua;
            ua = gsl_vector_calloc(nRow);
            gsl_vector_set_zero(ua);
            gsl_blas_dgemv(CblasNoTrans, 1.0, xa, wa, 1.0, ua);
            // test using Eq 2.7
            if(xa->size2 < 1)
            {
                printf("[%d][%d] the nCol of xa is zero, hard to allocate test_1.\n", tid, nIter);
            }
            gsl_vector *test_1;//test_1 = xa^T*ua;
            test_1 = gsl_vector_calloc(xa->size2);
            gsl_vector_set_zero(test_1);
            gsl_blas_dgemv(CblasTrans, 1.0, xa, ua, 1.0, test_1);
            if(test_1->size < 1)
            {
                printf("[%d][%d] the size of test_1 is zero, hard to allocate test_2.\n", tid, nIter);
            }
            gsl_vector *test_2; //test_2 = aa*ones(size(test_1));
            test_2 = gsl_vector_calloc(test_1->size);

            double test_1_2_temp;
            for(size_t k = 0; k < xa->size2; k++)
            {
                test_1_2_temp = gsl_vector_get(test_1, k);
                test_1_2_temp = test_1_2_temp - aa;
                if(test_1_2_temp < 0) test_1_2_temp = -test_1_2_temp;
                gsl_vector_set(test_2, k, test_1_2_temp);
            }

            double test_1_2 = sum_gsl_vec(test_2);

            double test_3;

            gsl_blas_ddot(ua, ua, &test_3);

            test_3 = sqrt(test_3);
            test_3 = test_3 - 1;
            if(test_3 < 0) test_3 = -test_3;

            gsl_vector_free(test_1);
            gsl_vector_free(test_2);
            if(test_1_2 > RESOLUTION_OF_LARS*100 || test_3 > RESOLUTION_OF_LARS*100)
            {
                //printf("[%d][%d] Eq 2.7 test failure.\n", tid, nIter);
                canPrintSol = 0;
                break;
            }
            //a = x^T*ua;
            gsl_vector *a;
            a = gsl_vector_calloc(nCol);
            gsl_vector_set_zero(a);
            gsl_blas_dgemv(CblasTrans, 1.0, x, ua, 1.0, a);

            double c_inactive_temp;
            double a_inactive_temp;
            double tmp_1;
            double tmp_2;
            std::vector<double> tmp;
            for(size_t k = 0; k < (size_t)inactive.size(); k++)
            {
                c_inactive_temp = gsl_vector_get(c, inactive[k]);//tmp_1 = (C_max-c(inactive))./(aa-a(inactive));
                a_inactive_temp = gsl_vector_get(a, inactive[k]);//tmp_2 = (C_max+c(inactive))./(aa+a(inactive));
                tmp_1 = (C_max-c_inactive_temp)/(aa-a_inactive_temp);
                tmp_2 = (C_max+c_inactive_temp)/(aa+a_inactive_temp);
                //gsl_matrix_set(tmp_3, k, 1, tmp_1);//tmp_3 = [tmp_1,tmp_2];
                //gsl_matrix_set(tmp_3, k, 2, tmp_2);
                if(tmp_1 > 0) tmp.push_back(tmp_1);//tmp = tmp_3(find(tmp_3>0));
                if(tmp_2 > 0) tmp.push_back(tmp_2);
            }
            double gamm;
            if(tmp.empty())
                gamm = C_max/aa;
            else
                gamm = *std::min_element(tmp.begin(), tmp.end());//gamma = min(tmp);

            gsl_vector *d;//d = zeros(1,nCol);
            d = gsl_vector_calloc(nCol);
            gsl_vector_set_zero(d);
            gsl_vector *tmpp;//tmp = zeros(1,nCol);
            tmpp = gsl_vector_calloc(nCol);
            gsl_vector_set_zero(tmpp);
            double d_temp;
            double beta_temp;
            std::vector<double> tmpp2;
            double s_temp;
            for(size_t k = 0; k < (size_t)active.size(); k++)
            {
                double c_active_temp2 = gsl_vector_get(c, active[k]);
                if(c_active_temp2 >= RESOLUTION_OF_LARS)//s = sign(c(active));
                    s_temp = 1;
                if(c_active_temp2 <= -RESOLUTION_OF_LARS)
                    s_temp = -1;
                if(c_active_temp2 < RESOLUTION_OF_LARS && c_active_temp2 > -RESOLUTION_OF_LARS)
                    s_temp = 0;
                //printf("active[k]=%d, s_temp = %f\n", active[k], s_temp);
                d_temp = s_temp*gsl_vector_get(wa, k);
                //printf("d[active[k]] = s_temp.*wa = %lf\n", d_temp);
                if(d_temp == 0)//if (length(find(d(active)==0)))
                {
                    //printf("[%d][%d] Something wrong with vector d: Eq 3.4\n", tid, nIter);
                    canPrintSol = 0;
                    break;
                }
                gsl_vector_set(d, active[k], d_temp);//d(active) = s.*wa;
            }
            for(size_t k = 0; k < (size_t)active.size(); k++)
            {
                beta_temp = gsl_vector_get(beta, active[k]);
                d_temp = gsl_vector_get(d, active[k]);
                beta_temp = -beta_temp/d_temp;

                gsl_vector_set(tmpp, active[k], beta_temp);//tmp(active) = -1*beta(active)./d(active);
                if(beta_temp > 0)
                {
                    tmpp2.push_back(beta_temp);//tmp2 = tmp(find(tmp>0));
                    //printf("%lf goes to tmpp2 in tmp2 = tmp(find(tmp>0));\n", beta_temp);
                }
            }

            drop.clear();//drop = [];
            double gamm_tilde = GSL_POSINF;

            if(!tmpp2.empty())
            {
                std::vector<double>::iterator min_tmpp2_p;
                min_tmpp2_p = std::min_element(tmpp2.begin(), tmpp2.end());

                double & min_tmpp2 = *min_tmpp2_p;//????

                if(!tmpp2.empty() && gamm >= min_tmpp2)
                {
                    gamm_tilde = min_tmpp2; //gamma_tilde = min(tmp2);
                    for(size_t k = 0; k < nCol; k++)//drop = find(tmp==gamma_tilde);
                    {
                        if(gsl_vector_get(tmpp, k) == gamm_tilde)
                        {
                            drop.push_back(k);
                            //if(tid==10) fprintf(f_act_record, "[%d] %lu added to drop at line: 808\n", nIter, k);
                        }
                    }
                }
            }
            tmpp2.clear();
            gsl_vector_free(tmpp);

            double min_gamm = gamm;
            if(gamm > gamm_tilde) min_gamm = gamm_tilde;
            gsl_vector_memcpy(mu_a_plus, mu_a);
            gsl_blas_daxpy(min_gamm, ua, mu_a_plus);//mu_a_plus = mu_a + min(gamm,gamm_tilde)*ua;

            gsl_vector_memcpy(beta_new, beta);
            gsl_blas_daxpy(min_gamm, d, beta_new);//beta_new = beta + min(gamm,gamm_tilde)*d;

            //active = setdiff(active,drop);
            std::vector<size_t> active_temp2;
            std::set_difference(active.begin(), active.end(), drop.begin(), drop.end(), std::inserter(active_temp2, active_temp2.begin()));
            active = active_temp2;
            active_temp2.clear();

            //inactive = setdiff(all_candidate,active);
            std::vector<size_t> inactive_temp2;
            std::set_difference(all_candidate.begin(), all_candidate.end(), active.begin(), active.end(), std::inserter(inactive_temp2, inactive_temp2.begin()));
            inactive = inactive_temp2;
            inactive_temp2.clear();
            //printf("line 721: size of inactive is %lu.\n", inactive.size());
            for(size_t k = 0; k < (size_t)drop.size(); k++)//beta_new(drop)=0;
            {
                gsl_vector_set(beta_new, drop[k], 0);
            }
            double C_max_aa;
            C_max_aa = C_max/aa;
            gsl_vector_memcpy(mu_a_OLS, mu_a);
            gsl_blas_daxpy(C_max_aa, ua, mu_a_OLS);//mu_a_OLS = mu_a + C_max/aa*ua;

            gsl_vector_memcpy(beta_OLS, beta);
            gsl_blas_daxpy(C_max_aa, d, beta_OLS);//beta_OLS = beta + C_max/aa*d;

            //MSE = sum((y-mu_a_OLS).^2)/length(y);
            gsl_vector *y_minus_mu_a;
            y_minus_mu_a = gsl_vector_calloc(nRow);
            gsl_vector_memcpy(y_minus_mu_a, y);
            gsl_vector_sub(y_minus_mu_a, mu_a_OLS);
            gsl_vector_mul(y_minus_mu_a, y_minus_mu_a);
            double sum_y_mu_a = sum_gsl_vec(y_minus_mu_a);
            MSE = sum_y_mu_a/nRow;

            gsl_vector_free(y_minus_mu_a);
            gsl_vector_free(d);
            gsl_vector_free(ua);
            gsl_vector_free(wa);
            gsl_vector_free(a);
            gsl_matrix_free(xa);
            gsl_matrix_free(ga);

            //update
            gsl_vector_memcpy(mu_a, mu_a_plus);//mu_a = mu_a_plus;
            gsl_vector_memcpy(beta, beta_new);//beta = beta_new;
            double d_beta_temp;
            gsl_vector *beta_backup;
            beta_backup = gsl_vector_calloc(nCol);
            gsl_vector_memcpy(beta_backup, beta);
            if(MSE <= MSE_th)
            {
                //beta = beta(active)./sx(active);
                gsl_vector_set_zero(beta);
                for(size_t k = 0; k < (size_t)active.size(); k++)
                {
                    d_beta_temp = gsl_vector_get(beta_backup, active[k]);
                    d_beta_temp = d_beta_temp/gsl_vector_get(sx, active[k]);
                    gsl_vector_set(beta, active[k], d_beta_temp);
                }
                double temp_beta_element;
                std::vector<size_t> active_backup(active);
                for(size_t k = 0; k < (size_t)active_backup.size(); k++)
                {
                    temp_beta_element = gsl_vector_get(beta, active_backup[k]);
                    if (temp_beta_element >= -BETA_TH && temp_beta_element <= BETA_TH)
                    {
                        std::vector<size_t>::iterator it_act;
                        it_act = std::find(active.begin(), active.end(), active_backup[k]);

                        active.erase(it_act);
                        gsl_vector_set(beta, active_backup[k], 0);
                    }
                }
                active_backup.clear();
                if(active.size() > 4)
                {
                    //printf("[%d] Invalid result despite convergence.\n", tid);
                    canPrintSol = 0;
                    break;
                }
                canPrintSol = 1;
                //othersStop = 1;//need to be broadcasted!
                break;
            }
            gsl_vector_free(beta_backup);
        }
        //if(nIter >= MAX_ITER){printf("[%d] Reached nIter limit!\n", tid);}

        gsl_vector_free(mu_a_plus);
        gsl_vector_free(mu_a_OLS);
        gsl_vector_free(beta_new);
        gsl_vector_free(beta_OLS);
        gsl_vector_free(c);
        gsl_vector_free(mu_a);
        gsl_vector_free(y);
        gsl_matrix_free(x);
        gsl_vector_free(sx);

        if(!no_xtx) {gsl_matrix_free(xtx);}

        if(!canPrintSol)
        {
            gsl_matrix_free(A);
            //gsl_vector_free(b);
            gsl_vector_free(beta);
            //gsl_matrix_free(groupIdx);
            active.clear();
        }

        if(canPrintSol)// && !othersStop)
        {
            othersStop = 1;

            end_time = omp_get_wtime();
            exe_time = end_time - start_time;
            printf("========Solved successfully by tid %d========\n", tid);

            printf("[%d]Thread Runtime = %f\n", tid, exe_time);
            printf("============Solution by tid %d============\n", tid);
            //fprintf(f_record, "[%d]Runtime = %f\n", tid, exe_time);
            //printf("[%d]Runtime = %f\n", tid, exe_time);
            printf("[%d]====Number of Iteration: %d====\n", tid, nIter);
            for(size_t iii = 0; iii < active.size(); iii++)
            {
                printf("[%d]------active_idx:%lu------beta[active_idx]:%6.8f------\n", tid, active[iii], gsl_vector_get(beta, active[iii]));
            }

            gsl_matrix_free(A);
            //gsl_vector_free(b);
            gsl_vector_free(beta);
            //gsl_matrix_free(groupIdx);
            active.clear();
            printf("====************************************************************====\n");

        }
        #pragma omp barrier
        end_time = omp_get_wtime();
        if(tid == 0)
        {
            exe_time = end_time - start_time;
            printf("[%d] Runtime = %f\n", iFaultCase, exe_time);
            gsl_vector_set(time_array, iFaultCase-startFaultCaseID, exe_time);
        }
    }
    gsl_vector_free(b);
}
//end_time = omp_get_wtime();
//exe_time = end_time - start_time;

printf("=============Average Runtime = %f\n", sum_gsl_vec(time_array)/nTotalFaultCases);

gsl_matrix_free(Am);
gsl_matrix_free(groupIdx);
gsl_vector_free(time_array);

/*========================END OF SOLVING==============================*/

return 0;
}
A.3 CUDA C Code for GPU Implementation

The code attached below is the CUDA C code used for the GPU implementation.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <cuda_runtime.h>
// for std
#include <math.h>
#include <algorithm>
#include <string>
#include <vector>
// includes, cuda
//#include <cublas.h>
#include <cublas_v2.h>
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/device_ptr.h>

#include <thrust/sort.h>
#include <thrust/merge.h>
#include <thrust/transform.h>
#include <thrust/reduce.h>
#include <thrust/scan.h>
#include <thrust/set_operations.h>
#include <thrust/iterator/discard_iterator.h>
#include <thrust/device_ptr.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/iterator/transform_iterator.h>
#include <thrust/iterator/permutation_iterator.h>
#include <thrust/extrema.h>

#include <thrust/functional.h>
#include <thrust/sequence.h>
#include <thrust/fill.h>


#define MM_MAX_LINE_LENGTH 1025
#define MM_PREMATURE_EOF 12
#define PI 3.141592653589793238462643383279
#define MSE_th 0.001
#define RESOLUTION_OF_LARS 1e-12
#define REGULARIZATION_FACTOR 1e-11
#define MAX_ITER 50
#define INF 1e12
#define BETA_TH 1e-3

// matrix indexing convention
#define id(m, n, ld) (((n) * (ld) + (m)))

int new_mm_read_mtx_array_size(FILE *f, int *M, int *N)
{
    char line[MM_MAX_LINE_LENGTH];
    int num_items_read;
    /* set return null parameter values, in case we exit with errors */
    *M = *N = 0;

    /* now continue scanning until you reach the end-of-comments */
    do
    {
        if (fgets(line, MM_MAX_LINE_LENGTH, f) == NULL)
            return MM_PREMATURE_EOF;
    } while (line[0] == '%');

    /* line[] is either blank or has M,N, nz */
    if (sscanf(line, "%d %d", M, N) == 2)
        return 0;

    else /* we have a blank line */
        do
        {
            num_items_read = fscanf(f, "%d %d", M, N);
            if (num_items_read == EOF) return MM_PREMATURE_EOF;
        }
        while (num_items_read != 2);

    return 0;
}

struct MinusC: public thrust::unary_function<double, double>
{
    const double offset;
    __host__ __device__ MinusC(double _offset) :
        offset(_offset)
    {
    }
    __host__ __device__ double operator()(double x)
    {
        return (double) x-offset;
    }
};
struct DivideByC: public thrust::unary_function<double, double>
{
    const double offset;
    __host__ __device__ DivideByC(double _offset) :
        offset(_offset)
    {
    }
    __host__ __device__ double operator()(double x)
    {
        return (double) x/offset;
    }
};
struct MultipleByC: public thrust::unary_function<double, double>
{
    const double offset;
    __host__ __device__ MultipleByC(double _offset) :
        offset(_offset)
    {
    }
    __host__ __device__ double operator()(double x)
    {
        return (double) x*offset;
    }
};
struct Inv: public thrust::unary_function<double, double>
{
    __host__ __device__ double operator()(double x)
    {
        return (double) 1.0 / x;
    }
};
struct Abs: public thrust::unary_function<double, double>
{
    __host__ __device__ double operator()(double x)
    {
        if(x < 0) x = -x;
        return (double) x;
    }
};
struct SquareRoot: public thrust::unary_function<double, double>
{
    __host__ __device__ double operator()(double x)
    {
        return (double) sqrt(x);
    }
};
struct Square: public thrust::unary_function<double, double>
{
    __host__ __device__ double operator()(double x)
    {
        return (double) x*x;
    }
};
struct FilterIgnores: public thrust::unary_function<double, double>
{
    __host__ __device__ double operator()(double x)
    {
        if(x < RESOLUTION_OF_LARS)
            return INF;
        else
            return x;
    }
};
// main
int main(int argc, char** argv)
{

    if (argc < 4){printf("need 3 arguments: caseID startFaultCaseID endFaultCaseID\n"); return 0;}

    //double start_time, end_time, exe_time;

    int caseID = atoi(argv[1]);
    printf("Will solve nBus = %d case.\n", caseID);

    int startFaultCaseID = atoi(argv[2]);

    int endFaultCaseID = atoi(argv[3]);
    int nTotalFaultCases = endFaultCaseID-startFaultCaseID+1;
    //int nProcs = atoi(argv[2]);
    /*=========================BUILD THE PROBLEM======================== */

    bool no_xtx = 0;

    FILE *f;
    int m, n;

    double entry;
    /* Read A */
    char s[64]; /* large enough for the formatted path */
    sprintf(s, "Data/ieee%d/Aeq_%d.dat", caseID, caseID);
    printf("reading %s\n", s);

    f = fopen(s, "r");
    if(f == NULL) {
        printf("ERROR: %s does not exist, exiting.\n", s);
        exit(EXIT_FAILURE);
    }
    //mm_read_mtx_array_size(f, &m, &n);
    new_mm_read_mtx_array_size(f, &m, &n);

    int nRow = m;
    int nCol = n;

    printf("nRow = %d, nCol = %d.\n", nRow, nCol);
    //A = (double*)malloc(nRow*nCol * sizeof(double));
    thrust::host_vector<double> A(nRow*nCol);

    int row, col;

    for(int i = 0; i < m*n; i++) {
        row = i % m;
        col = floor((double)i/m);
        fscanf(f, "%lf", &entry);
        //gsl_matrix_set(Am, row, col, entry);

        A[id(row, col, nRow)] = entry;
    }

    fclose(f);

    thrust::host_vector<double> timearray(nTotalFaultCases);
    int iFaultCase;
    for(iFaultCase = startFaultCaseID; iFaultCase <= endFaultCaseID; iFaultCase++)
    {

        /* Read b */
        sprintf(s, "Data/ieee%d/beq_%d.dat", caseID, iFaultCase);
        printf("reading %s\n", s);

        f = fopen(s, "r");
        if(f == NULL) {
            printf("ERROR: %s does not exist, exiting.\n", s);
            exit(EXIT_FAILURE);
        }
        //mm_read_mtx_array_size(f, &m, &n);
        new_mm_read_mtx_array_size(f, &m, &n);
        if(m != nRow) {
            printf("ERROR: length of b doesn't match with nRow of A.\n");
            exit(EXIT_FAILURE);
        }
        //b = gsl_vector_calloc(m);
        //b = (double*)malloc(nRow * sizeof(double));
        thrust::host_vector<double> b(nRow);

        for (int i = 0; i < nRow; i++) {
            fscanf(f, "%lf", &entry);
            b[id(i,0,nRow)] = entry;
        }
        fclose(f);

        /*====================INITIALIZATION OF TIMING AND CUBLAS=====================*/

        //clock_t start=clock();
        //double dtime;

        // initialize cublas
        cublasStatus_t status;
        cublasHandle_t hd;

        status = cublasCreate(&hd);


        if (status != CUBLAS_STATUS_SUCCESS)
        {
            fprintf(stderr, "!!!! CUBLAS initialization error\n");
            return EXIT_FAILURE;
        }
        else printf("CUBLAS initialization success\n");

        /*===========================SOLVE THE PROBLEM===============================*/

        // transfer host to device
        thrust::device_vector<double> dx(A.begin(), A.end());
        thrust::device_vector<double> dy(b.begin(), b.end());

        clock_t start=clock();
        double dtime;

        thrust::device_vector<int> all_candidate(nCol);
        // initialize all_candidate to 0,1,2,3,...: for(int i=0; i<nCol; i++) all_candidate.push_back(i);
        thrust::sequence(all_candidate.begin(), all_candidate.end(), 0);

        // mx = mean(xin); x = xin - repmat(mx,size(xin,1),1);
        thrust::device_vector<double> ones(nRow, 1);
        thrust::device_vector<double> sumA(nCol, 0);


        double* p_dx = thrust::raw_pointer_cast(&dx[0]);
        double* p_dy = thrust::raw_pointer_cast(&dy[0]);
        double* p_ones = thrust::raw_pointer_cast(&ones[0]);
        double* p_sumA = thrust::raw_pointer_cast(&sumA[0]);


        const double c1 = 1.0;
        const double c0 = 0.0;
        const double c_1 = -1.0;

        cublasDgemv(hd, CUBLAS_OP_T, nRow, nCol, &c1, p_dx, nRow, p_ones, 1, &c0, p_sumA, 1);

        thrust::transform(sumA.begin(), sumA.end(), sumA.begin(), DivideByC((double)nRow));

        // B nRow*nCol, make each row of B sumA[k]
        thrust::device_vector<double> BsumA(nRow*nCol);
        for(int k = 0; k < nCol; k++)// TOO SLOW!!!
        {
            thrust::fill(BsumA.begin()+k*nRow, BsumA.begin()+(k+1)*nRow, sumA[k]);
        }
        double* p_BsumA = thrust::raw_pointer_cast(&BsumA[0]);
        thrust::device_vector<double> dx_copy = dx;
        double* p_dxcopy = thrust::raw_pointer_cast(&dx_copy[0]);
        cublasDgeam(hd, CUBLAS_OP_N, CUBLAS_OP_N, nRow, nCol, &c1, p_dxcopy, nRow, &c_1, p_BsumA, nRow, p_dx, nRow);
        dx_copy.clear();

        // sx = sum(x.^2).^(1/2); ignores = find(sx < RESOLUTION_OF_LARS); sx(ignores)=inf;
        // x = x ./ repmat(sx,size(xin,1),1);
        thrust::device_vector<double> sx = dx;
        thrust::transform(sx.begin(), sx.end(), sx.begin(), Square());

        double* p_sx = thrust::raw_pointer_cast(&sx[0]);
        cublasDgemv(hd, CUBLAS_OP_T, nRow, nCol, &c1, p_sx, nRow, p_ones, 1, &c0, p_sumA, 1);

        thrust::transform(sumA.begin(), sumA.end(), sumA.begin(), SquareRoot());
        thrust::device_vector<double> ssx(sumA.begin(), sumA.end());

        //thrust::device_vector<int> all_candidate_backup = all_candidate;
        for(int k = 0; k < nCol; k++)
        {
            if(sumA[k] < RESOLUTION_OF_LARS)
            {
                sumA[k] = INF;
                all_candidate.erase(all_candidate.begin()+k);
            }
            //thrust::transform(dx.begin()+k*nRow, dx.begin()+(k+1)*nRow, dx.begin()+k*nRow, DivideByC(sumA[k]));
            //cublasDscal(hd, nCol, 1/sumA[k], p_dx, 1);
        }
        thrust::device_vector<double> forDivide(nCol*nCol, 0);
        for(int k = 0; k < nCol; k++)// TOO SLOW!!!
        {
            forDivide[id(k,k,nCol)] = 1/sumA[k];
        }
        double* p_forDivide = thrust::raw_pointer_cast(&forDivide[0]);
        thrust::device_vector<double> dx_copy2 = dx;
        double* p_dxcopy2 = thrust::raw_pointer_cast(&dx_copy2[0]);
        cublasDgemm(hd, CUBLAS_OP_N, CUBLAS_OP_N, nRow, nCol, nCol, &c1, p_dxcopy2, nRow, p_forDivide, nCol, &c0, p_dx, nRow);
        dx_copy2.clear();
        sumA.clear();
        ones.clear();
        sx.clear();

        thrust::device_vector<double> xtx(1, 0);
        //double* p_dxtx = thrust::raw_pointer_cast(&xtx[0]);
        if (nCol*nCol > 100000000)
        {
            printf("Too large matrix, lars will not pre-calculate xtx\n");
            no_xtx = 1;
        }
        else
        {
            xtx.resize(nCol*nCol);
            //thrust::device_vector<double> xtx(nCol*nCol, 0);
            double* p_dxtx = thrust::raw_pointer_cast(&xtx[0]);
            //xtx = x^T*x;
            cublasDgemm(hd, CUBLAS_OP_T, CUBLAS_OP_N, nCol, nCol, nRow, &c1, p_dx, nRow, p_dx, nRow, &c0, p_dxtx, nCol);
        }


        //y = yin - mean(yin);
        double mean_y;
        cublasDdot(hd, nRow, p_dy, 1, p_ones, 1, &mean_y);
        mean_y = mean_y/nRow;
        thrust::transform(dy.begin(), dy.end(), dy.begin(), MinusC(mean_y));
        ones.clear();

        /*==========INITIALIZATION==========*/
        thrust::device_vector<int> d_active;
        ////active.clear();
        //inactive = all_candidate;
        thrust::device_vector<int> inactive = all_candidate;
        thrust::device_vector<double> mu_a(nRow, 0);
        thrust::device_vector<double> mu_a_plus(nRow, 0);
        thrust::device_vector<double> mu_a_OLS(nRow, 0);
        thrust::device_vector<double> beta(nCol, 0);
        thrust::device_vector<double> beta_new(nCol, 0);
        thrust::device_vector<double> beta_OLS(nCol, 0);
        //MSE = sum(y.^2)/length(y);
        thrust::device_vector<double> y2 = dy;
        double* p_y2 = thrust::raw_pointer_cast(&y2[0]);
        thrust::transform(y2.begin(), y2.end(), y2.begin(), Square());
        double MSE;
        cublasDasum(hd, nRow, p_y2, 1, &MSE);
        MSE = MSE/nRow;
        y2.clear();

        thrust::device_vector<double> c(nCol);

        //double C_max;
        //C_max = max(abs(c));
        //int C_max_ind;
        thrust::device_vector<int> C_max_ind_pl;
        thrust::device_vector<int> drop;
        int nIter = 0;
        bool canPrintSol = 0;
        /*==============MAIN LOOP===============*/

        while(nIter < MAX_ITER)
        {

            if(MSE <= MSE_th)
            {
                printf("MSE <= MSE_th=%6.8f, break.\n", MSE_th);
                canPrintSol = 0;
                break;
            }
            const double cc1 = 1.0;
            const double cc0 = 0.0;
            nIter += 1;

            //c = x^T*(y-mu_a);
            thrust::device_vector<double> y_mu_a(nRow);
            thrust::transform(dy.begin(), dy.end(), mu_a.begin(), y_mu_a.begin(), thrust::minus<double>());


            double* p_y_mua = thrust::raw_pointer_cast(&y_mu_a[0]);
            thrust::device_vector<double> c(nCol, 0);
            double* p_c = thrust::raw_pointer_cast(&c[0]);

            cublasDgemv(hd, CUBLAS_OP_T, nRow, nCol, &cc1, p_dx, nRow, p_y_mua, 1, &cc0, p_c, 1);
            y_mu_a.clear();

            // [C_max,C_max_ind] = max(abs(c(inactive))); C_max_ind = inactive(C_max_ind);
            thrust::device_vector<int> C_max_ind(1);
            int* p_cmaxInd = thrust::raw_pointer_cast(&C_max_ind[0]);
            thrust::device_vector<double> c_inact(inactive.size());

            for(int k = 0; k < inactive.size(); k++)
            {
                c_inact[k] = c[inactive[k]];
                if(c_inact[k] < 0) c_inact[k] = -c_inact[k];
            }
            const double* p_cinact = thrust::raw_pointer_cast(&c_inact[0]);

            thrust::device_vector<double>::iterator iter_max = thrust::max_element(c_inact.begin(), c_inact.end());

            unsigned int position = iter_max - c_inact.begin();
            double C_max = *iter_max;

            C_max_ind[0] = inactive[position];

            for(int k = 0; k < inactive.size(); k++)
            {
                if(c_inact[k] >= C_max-RESOLUTION_OF_LARS && inactive[k] != C_max_ind[0]) C_max_ind.push_back(inactive[k]);
            }


            //active = sort(union(active,C_max_ind));
            // set_union returns an iterator C_end denoting the end of input
            thrust::device_vector<int>::iterator C_end;
            thrust::device_vector<int> new_active0(d_active.size()+C_max_ind.size());
            C_end = thrust::set_union(d_active.begin(), d_active.end(), C_max_ind.begin(), C_max_ind.end(), new_active0.begin());
            // shrink C to exactly fit output
            new_active0.erase(C_end, new_active0.end());
            d_active = new_active0;
            new_active0.clear();
            thrust::sort(d_active.begin(), d_active.end(), thrust::less<int>());


            //inactive = setdiff(all_candidate, active);
            thrust::device_vector<int>::iterator inact_end;
            inact_end = thrust::set_difference(all_candidate.begin(), all_candidate.end(), d_active.begin(), d_active.end(), inactive.begin());
            inactive.erase(inact_end, inactive.end());
            //if ~isempty(drop) & length(find(drop==C_max_ind))==0
            thrust::device_vector<int>::iterator iter;
            iter = thrust::find(drop.begin(), drop.end(), C_max_ind[0]);
            if(!drop.empty() && iter == drop.end())
            {
                printf("[%d] Dropped item and index of maximum correlation is not the same. But it is being ignored here...\n", nIter);
                canPrintSol = 0;
                break;
            }
            if(!drop.empty())
            {
                C_max_ind.clear();
            }
            //active = setdiff(active,drop);
            thrust::device_vector<int> new_active(d_active.size());
            thrust::device_vector<int>::iterator act_end;
            act_end = thrust::set_difference(d_active.begin(), d_active.end(), drop.begin(), drop.end(), new_active.begin());
            new_active.erase(act_end, new_active.end());
            d_active = new_active;
            new_active.clear();

            //inactive = sort(union(inactive,drop));
            thrust::device_vector<int> new_inactive(inactive.size()+drop.size());
            inact_end = thrust::set_union(inactive.begin(), inactive.end(), drop.begin(), drop.end(), new_inactive.begin());
            new_inactive.erase(inact_end, new_inactive.end());
            inactive = new_inactive;
            new_inactive.clear();
            thrust::sort(inactive.begin(), inactive.end(), thrust::less<int>());
            //
            if(d_active.empty())
            {
                printf("[%d] active is empty, maybe hard to allocate s\n", nIter);
            }
            //s = sign(c(active)); % eq 2.10   xa = x(:,active).*repmat(s',n,1); % eq 2.4
            thrust::device_vector<double> xa(nRow*d_active.size());
            thrust::device_vector<double> ga(d_active.size()*d_active.size());
            thrust::device_vector<double> s(d_active.size());
            double s_temp_0;
            //xa = x(:,active).*repmat(s^T,n,1);
            for(int k = 0; k < d_active.size(); k++)
            {
                double c_active_temp = c[d_active[k]];
                if(c_active_temp >= RESOLUTION_OF_LARS)//s = sign(c(active));
                {s[k] = 1; s_temp_0 = 1;}//s = 1;

                if(c_active_temp <= -RESOLUTION_OF_LARS)
                {s[k] = -1; s_temp_0 = -1;}//s = -1;

                if(c_active_temp < RESOLUTION_OF_LARS && c_active_temp > -RESOLUTION_OF_LARS)
                {s[k] = 0; s_temp_0 = 0;}//s = 0;
                thrust::copy(dx.begin()+d_active[k]*nRow, dx.begin()+(d_active[k]+1)*nRow, xa.begin()+k*nRow);
                thrust::transform(xa.begin()+k*nRow, xa.begin()+(k+1)*nRow, xa.begin()+k*nRow, MultipleByC(s_temp_0));
            }

            if(!no_xtx)//ga = xtx(active,active).*(s*s^T);
            {
                double ssT_temp;
                double xtx_temp;
                for(int i = 0; i < d_active.size(); i++)
                {
                    for(int j = 0; j < d_active.size(); j++)
                    {
                        xtx_temp = xtx[id(d_active[i], d_active[j], nCol)];
                        ssT_temp = s[i]*s[j];
                        xtx_temp = xtx_temp*ssT_temp;
                        ga[id(i,j,d_active.size())] = xtx_temp;
                    }
                }
            }
            double* p_xa = thrust::raw_pointer_cast(&xa[0]);
            double* p_ga = thrust::raw_pointer_cast(&ga[0]);

            if(no_xtx)//ga = xa^T*xa;
            {
                cublasDgemm(hd, CUBLAS_OP_T, CUBLAS_OP_N, d_active.size(), d_active.size(), nRow, &cc1, p_xa, nRow, p_xa, nRow, &cc0, p_ga, d_active.size());
            }
            s.clear();

            //invga = ga\eye(size(ga,1));
            thrust::device_vector<double> invga(d_active.size()*d_active.size());//invga
            if(ga.size() == 1)
            {
                //printf("ga is scalar, inversion is simple\n");
                if(ga[0] == 0)
                {
                    printf("ga is zero!!!\n");
                    canPrintSol = 0;
                    break;
                }
                else
                    invga[0] = 1/ga[0];
            }

            int batchSize = 1;
            int n = d_active.size();
            int lda = n;
            if(n < 1)
            {
                printf("[%d] the size of active is 0.\n", nIter);
                exit(EXIT_FAILURE);
            }

562 int *P, *INFO;
563 cudaMalloc<int>(&P,n * batchSize * sizeof(int));
564 cudaMalloc<int>(&INFO,batchSize * sizeof(int));
565
566 double* p_invga = thrust::raw_pointer_cast(&invga[0]);//p_invga
567 const double* p_gac = thrust::raw_pointer_cast(&ga[0]);
568
569 thrust::device_vector<double*> array_pA(1);
570 array_pA[0] = p_ga;
571 double** p_pA = thrust::raw_pointer_cast(&array_pA[0]);
572
573 cublasDgetrfBatched(hd,n,p_pA,lda,P,INFO,batchSize);
574 int INFOh = 0;
575 cudaMemcpy(&INFOh,INFO,sizeof(int),cudaMemcpyDeviceToHost);
576 if(INFOh == n)
577 {
578 fprintf(stderr, "Factorization Failed: Matrix is singular\n");
579 cudaDeviceReset();
580 exit(EXIT_FAILURE);
581 }
582
583 thrust::device_vector<const double*> array_pAc(1);
584 array_pAc[0] = p_gac;
585 const double** p_pAc = thrust::raw_pointer_cast(&array_pAc[0]);
586
587 thrust::device_vector<double*> array_pinvA(1);
588 array_pinvA[0] = p_invga;
589 double** p_pinvA = thrust::raw_pointer_cast(&array_pinvA[0]);
590
591 cublasDgetriBatched(hd,n,p_pAc,lda,P,p_pinvA,lda,INFO,batchSize);
592 cudaMemcpy(&INFOh,INFO,sizeof(int),cudaMemcpyDeviceToHost);
593 if(INFOh != 0)
594 {
595 fprintf(stderr, "Inversion Failed: Matrix is singular\n");
99
APPENDIX A. APPENDIX
596 cudaDeviceReset();
597 exit(EXIT_FAILURE);
598 }
599
600 cudaFree(P), cudaFree(INFO);
601
602 // aa= sum(sum(invga))ˆ(-1/2);% eq 2.5
603 double aa;
604 double sum_invga = 0;
605
606 thrust::device_vector<double> ones2(n, 1);
607 double* p_ones2 = thrust::raw_pointer_cast(&ones2[0]);
608 thrust::device_vector<double> sum_invga_col(n);
609 double *p_sum_invga_col = thrust::raw_pointer_cast(&sum_invga_col
[0]);
610 cublasDgemv(hd, CUBLAS_OP_T, n, n, &cc1, p_invga, n, p_ones2, 1, &
cc0, p_sum_invga_col, 1);
611 sum_invga = thrust::reduce( sum_invga_col.begin() , sum_invga_col.
end());
612 aa = 1/sqrt(sum_invga);
613
614 // wa= aa*sum(invga,2);% eq 2.6
615 thrust::device_vector<double> sum_invga_row(n);
616 double *p_sum_invga_row = thrust::raw_pointer_cast(&sum_invga_row
[0]);
617 cublasDgemv(hd, CUBLAS_OP_N, n, n, &cc1, p_invga, n, p_ones2, 1, &
cc0, p_sum_invga_row, 1);
618 thrust::device_vector<double> wa(n);
619 double *p_wa = thrust::raw_pointer_cast(&wa[0]);
620 thrust::transform(sum_invga_row.begin(), sum_invga_row.end(), wa.
begin(), MultipleByC(aa));
621
// ua= xa*wa;% eq 2.6
thrust::device_vector<double> ua(nRow);
double *p_ua = thrust::raw_pointer_cast(&ua[0]);
cublasDgemv(hd, CUBLAS_OP_N, nRow, n, &cc1, p_xa, nRow, p_wa, 1, &cc0, p_ua, 1);

// test using Eq 2.7
//test_1 = xa^T*ua;
thrust::device_vector<double> test_1(n,0);
double *p_test_1 = thrust::raw_pointer_cast(&test_1[0]);
cublasDgemv(hd, CUBLAS_OP_T, nRow, n, &cc1, p_xa, nRow, p_ua, 1, &cc0, p_test_1, 1);

//test_2 = aa*ones(size(test_1));
thrust::device_vector<double> test_2(n,0);
double *p_test_2 = thrust::raw_pointer_cast(&test_2[0]);
//test_1_2 = sum(sum(abs(test_1-test_2)));
thrust::transform(test_1.begin(), test_1.end(), test_2.begin(), MinusC(aa));
thrust::transform(test_2.begin(), test_2.end(), test_2.begin(), Abs());
double test_1_2 = thrust::reduce( test_2.begin() , test_2.end());

//test_3 = norm(ua) - 1;
double test_3;
const double *p_uac = thrust::raw_pointer_cast(&ua[0]);
cublasDnrm2(hd, nRow, p_uac, 1, &test_3);
test_3 = test_3-1;
if(test_3<0) test_3 = -test_3;

if(test_1_2 > RESOLUTION_OF_LARS*100 || test_3>RESOLUTION_OF_LARS*100)
{
printf("[%d]Eq 2.7 test failure.\n", nIter);
canPrintSol = 0;
break;
}
//a = x^T*ua;
thrust::device_vector<double> a(nCol);
double *p_a = thrust::raw_pointer_cast(&a[0]);
cublasDgemv(hd, CUBLAS_OP_T, nRow, nCol, &cc1, p_dx, nRow, p_ua, 1, &cc0, p_a, 1);

double c_inactive_temp;
double a_inactive_temp;
double tmp_1;
double tmp_2;
thrust::device_vector<double> tmp;
for(int k = 0; k < inactive.size(); k++)
{
c_inactive_temp = c[inactive[k]];//tmp_1 = (C_max-c(inactive))./(aa-a(inactive));
a_inactive_temp = a[inactive[k]];//tmp_2 = (C_max+c(inactive))./(aa+a(inactive));
tmp_1 = (C_max-c_inactive_temp)/(aa-a_inactive_temp);
tmp_2 = (C_max+c_inactive_temp)/(aa+a_inactive_temp);
if(tmp_1>0) tmp.push_back(tmp_1);//tmp = tmp_3(find(tmp_3>0));
if(tmp_2>0) tmp.push_back(tmp_2);
}

double gamm;
if(tmp.empty())
//if(tmp->empty())
gamm = C_max/aa;
else
{//gamma = min(tmp);
thrust::device_vector<double>::iterator iter4 = thrust::min_element(tmp.begin(), tmp.end());
gamm = *iter4;
}

thrust::device_vector<double> d(nCol,0);//d = zeros(1,nCol);
thrust::device_vector<double> tmpp(nCol,0);//tmp = zeros(1,nCol);
double d_temp;
double beta_temp;
thrust::device_vector<double> tmpp2;
double s_temp;
for(int k = 0; k < d_active.size(); k++)
{
double c_active_temp2 = c[d_active[k]];
if(c_active_temp2 >= RESOLUTION_OF_LARS)//s = sign(c(active));
s_temp = 1;
if(c_active_temp2 <= -RESOLUTION_OF_LARS)
s_temp = -1;
if(c_active_temp2 < RESOLUTION_OF_LARS && c_active_temp2 > -RESOLUTION_OF_LARS)
s_temp = 0;
d_temp = s_temp*wa[k];// d(active) = s.*wa;
if(d_temp==0)//if (length(find(d(active)==0)))
{
printf("[%d]Something wrong with vector d: Eq 3.4\n", nIter);
canPrintSol = 0;
break;
}
d[d_active[k]] = d_temp;
}

for(int k = 0; k < d_active.size(); k++)
{
beta_temp = beta[d_active[k]];
d_temp = d[d_active[k]];
beta_temp = -beta_temp/d_temp;
tmpp[d_active[k]] = beta_temp;//tmp(active) = -1*beta(active)./d(active);

if(beta_temp>0)
{
tmpp2.push_back(beta_temp);//tmp2 = tmp(find(tmp>0));
//tmpp2->push_back(beta_temp);//tmp2 = tmp(find(tmp>0));
//printf("%lf goes to tmpp2 in tmp2 = tmp(find(tmp>0));\n", beta_temp);
}
}

drop.clear();//drop = [];
double gamm_tilde = INF;

if(!tmpp2.empty())
{
thrust::device_vector<double>::iterator iter3 = thrust::min_element(tmpp2.begin(), tmpp2.end());

unsigned int position = iter3 - tmpp2.begin();
double min_tmpp2 = *iter3;

if(!tmpp2.empty() && gamm >= min_tmpp2)
{
gamm_tilde = min_tmpp2; //gamma_tilde = min(tmp2);
for(int k = 0; k < nCol; k++)//drop = find(tmp==gamma_tilde);
{
if(tmpp[k]==gamm_tilde)
{
drop.push_back(k);
//if(rank==10) fprintf(f_act_record, "[%d] %d added to drop at line: 808\n", nIter, k);
}
}
}
}
tmpp2.clear();
tmpp.clear();

double min_gamm = gamm;
if(gamm > gamm_tilde) min_gamm = gamm_tilde;
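The passage above is the Lasso modification of LARS: if a full step of length γ would carry an active coefficient across zero, the step is shortened to γ̃ = min⁺(−β_j/d_j) and the crossing coefficients are dropped from the active set, so `min_gamm` is min(γ, γ̃). A host-side sketch of this drop rule follows; the names `LassoStep` and `lassoStep` are illustrative, not from the thesis code.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Sketch of the Lasso drop rule: shorten the LARS step so no active
// coefficient changes sign; coefficients that reach exactly zero are
// recorded for removal from the active set.
struct LassoStep {
    double step;            // min(gamma, gamma_tilde)
    std::vector<int> drop;  // active positions driven to zero
};

LassoStep lassoStep(double gamma,
                    const std::vector<double>& beta_active,
                    const std::vector<double>& d_active) {
    double gt = std::numeric_limits<double>::infinity();
    for (std::size_t k = 0; k < beta_active.size(); ++k) {
        const double t = -beta_active[k] / d_active[k];
        if (t > 0.0 && t < gt) gt = t;     // gamma_tilde = min(tmp2)
    }
    LassoStep out;
    out.step = (gamma < gt) ? gamma : gt;
    if (gt <= gamma)                       // drop = find(tmp == gamma_tilde)
        for (std::size_t k = 0; k < beta_active.size(); ++k)
            if (-beta_active[k] / d_active[k] == gt)
                out.drop.push_back(static_cast<int>(k));
    return out;
}
```

For example, with β = (1, −1) and d = (−0.5, 1), the crossing times are 2 and 1, so a proposed γ = 3 is cut to 1 and the second coefficient is dropped, while γ = 0.5 goes through unchanged.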
//const double *p_min_gamm = &min_gamm;
mu_a_plus = mu_a;
//double *p_mu_a_plus = thrust::raw_pointer_cast(&mu_a_plus[0]);
//cublasDaxpy(hd, nRow, p_min_gamm, p_uac, 1, p_mu_a_plus, 1);//mu_a_plus = mu_a + min(gamm,gamm_tilde)*ua;
thrust::device_vector<double> new_ua1=ua;
thrust::transform(ua.begin(), ua.end(), new_ua1.begin(), MultipleByC(min_gamm));
thrust::transform(mu_a.begin(), mu_a.end(), new_ua1.begin(), mu_a_plus.begin(), thrust::plus<double>());
new_ua1.clear();

beta_new = beta;
//const double *p_dc = thrust::raw_pointer_cast(&d[0]);
//double *p_beta_new = thrust::raw_pointer_cast(&beta_new[0]);
//cublasDaxpy(hd, nCol, p_min_gamm, p_dc, 1, p_beta_new, 1);//beta_new = beta + min(gamm,gamm_tilde)*d;
thrust::device_vector<double> new_d1=d;
thrust::transform(d.begin(), d.end(), new_d1.begin(), MultipleByC(min_gamm));
thrust::transform(beta.begin(), beta.end(), new_d1.begin(), beta_new.begin(), thrust::plus<double>());
new_d1.clear();

//active = setdiff(active,drop);
thrust::device_vector<int> new_active2(d_active.size());
thrust::device_vector<int>::iterator act_end2;
act_end2 = thrust::set_difference(d_active.begin(), d_active.end(), drop.begin(), drop.end(), new_active2.begin());
new_active2.erase(act_end2, new_active2.end());
d_active = new_active2;
new_active2.clear();

//inactive = setdiff(all_candidate,active);
thrust::device_vector<int>::iterator inact_end2;
inact_end2 = thrust::set_difference(all_candidate.begin(), all_candidate.end(), d_active.begin(), d_active.end(), inactive.begin());
inactive.erase(inact_end2, inactive.end());

for(int k = 0; k < drop.size(); k++)//beta_new(drop)=0;
{
beta_new[drop[k]]= 0;
}

double C_max_aa;
C_max_aa = C_max/aa;
//const double * p_C_max_aa = &C_max_aa;
mu_a_OLS = mu_a;
//double *p_mu_a_OLS = thrust::raw_pointer_cast(&mu_a_OLS[0]);
//cublasDaxpy(hd, nRow, p_C_max_aa, p_uac, 1, p_mu_a_OLS, 1);//mu_a_OLS = mu_a + C_max/aa*ua;
thrust::device_vector<double> new_ua2=ua;
thrust::transform(ua.begin(), ua.end(), new_ua2.begin(), MultipleByC(C_max_aa));
thrust::transform(mu_a.begin(), mu_a.end(), new_ua2.begin(), mu_a_OLS.begin(), thrust::plus<double>());
new_ua2.clear();

beta_OLS = beta;
////const double *p_dc = thrust::raw_pointer_cast(&d[0]);
//double *p_beta_OLS = thrust::raw_pointer_cast(&beta_OLS[0]);
//cublasDaxpy(hd, nCol, p_C_max_aa, p_dc, 1, p_beta_new, 1);//beta_LS = beta + C_max/aa*d;
thrust::device_vector<double> new_d2=d;
thrust::transform(d.begin(), d.end(), new_d2.begin(), MultipleByC(C_max_aa));
thrust::transform(beta.begin(), beta.end(), new_d2.begin(), beta_OLS.begin(), thrust::plus<double>());
new_d2.clear();

//MSE = sum((y-mu_a_OLS).^2)/length(y);
thrust::device_vector<double> y_minus_mu_a(nRow);
y_minus_mu_a = dy;
//double *p_y_minus_mu_a = thrust::raw_pointer_cast(&y_minus_mu_a[0]);
//double* p_mu_a_OLSc = thrust::raw_pointer_cast(&mu_a_OLS[0]);
//const double c_m1 = -1;
//cublasDaxpy(hd, nRow, &c_m1, p_mu_a_OLSc, 1, p_y_minus_mu_a, 1);
thrust::transform(dy.begin(), dy.end(), mu_a_OLS.begin(), y_minus_mu_a.begin(), thrust::minus<double>());
thrust::transform(y_minus_mu_a.begin(), y_minus_mu_a.end(), y_minus_mu_a.begin(), Square());
MSE = thrust::reduce( y_minus_mu_a.begin() , y_minus_mu_a.end());
MSE = MSE/nRow;
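The stopping test above evaluates MSE = sum((y − mu_a_OLS).^2)/length(y) on the device with two `thrust::transform` passes and a `thrust::reduce`; the iteration halts once the value drops below the threshold `MSE_th`. An equivalent host-side check is sketched below; `meanSquaredError` is an illustrative name, not from the thesis code.

```cpp
#include <cstddef>
#include <vector>

// Host-side equivalent of the device-side stopping test:
// MSE = sum((y - mu).^2) / length(y).
double meanSquaredError(const std::vector<double>& y,
                        const std::vector<double>& mu) {
    double s = 0.0;
    for (std::size_t i = 0; i < y.size(); ++i) {
        const double r = y[i] - mu[i];
        s += r * r;  // squared residual, as Square() does on the device
    }
    return s / static_cast<double>(y.size());
}
```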
y_minus_mu_a.clear();
d.clear();
ua.clear();
wa.clear();
a.clear();
xa.clear();
ga.clear();
invga.clear();

//update
mu_a = mu_a_plus;
beta = beta_new;
thrust::device_vector<double> beta_backup=beta;
double d_beta_temp;
if(MSE <= MSE_th)
{
//beta = beta(active)./sx(active);
thrust::fill(beta.begin(), beta.end(), 0);
for(int k = 0; k < d_active.size(); k++)
{
d_beta_temp = beta_backup[d_active[k]];
d_beta_temp = d_beta_temp/ssx[d_active[k]];
beta[d_active[k]] = d_beta_temp;
}

canPrintSol = 1;
//othersStop = 1;//need to be broadcasted!
break;
}
beta_backup.clear();
}
if(nIter>=MAX_ITER){printf(" Reached nIter limit!\n");}
mu_a_plus.clear();
mu_a_OLS.clear();
beta_new.clear();
beta_OLS.clear();
c.clear();
mu_a.clear();
dy.clear();
dx.clear();
//sx.clear();
ssx.clear();

if(!no_xtx) {xtx.clear();}
thrust::host_vector<double> h_beta(beta.begin(), beta.end());
thrust::host_vector<int> active(d_active.begin(), d_active.end());
/*========================END OF SOLVING==============================*/
dtime = ((double)clock() - start)/CLOCKS_PER_SEC;
if(canPrintSol ==1)
{
printf("========Solved successfully ========\n");
//fprintf(f_record, "[%d]Runtime = %f\n", rank, exe_time);
printf("Runtime = %f\n", dtime);

printf("====Number of Iteration: %d====\n", nIter);
for(int iii = 0; iii<active.size(); iii++)
{
printf("------active_idx:%d------beta[active_idx]:%6.8f------\n", active[iii], h_beta[active[iii]] );
}
}
else
{
printf("========Not Solved TAT ========\n");
}
timearray[iFaultCase-startFaultCaseID] = dtime;
// memory clean up

b.clear();
h_beta.clear();
active.clear();

// shutdown
status = cublasDestroy(hd);

if (status != CUBLAS_STATUS_SUCCESS)
{
fprintf(stderr, "!!!! shutdown error (A)\n");
return EXIT_FAILURE;
}
}
A.clear();

double t_ave = thrust::reduce(timearray.begin(),timearray.end());
t_ave = t_ave/nTotalFaultCases;
printf("//***********========[%d] Ave. Time = %lf ========*********\n", caseID, t_ave);
timearray.clear();

/*
if(argc <= 1 || strcmp(argv[1], "-noprompt"))
{
printf("\nPress ENTER to exit...\n");
getchar();
}*/
return EXIT_SUCCESS;
}
Index
L0 minimization problem, 15
L1 minimization problem, 15
Equivalent Injection, 7
LARS, 15
Lasso, 15
OLS Backup Estimation, 20
Overlapping Group Lasso, 19
Teed lines, 10