r 0 0electriconf/2012/posters/tao...0.06 0.08 0.1 0.12 0.14 problem size (ieee test feeders) approx....

1
Distribution Power Flow Model Forward / Backward Sweep Small complex Matrix-Vector Mult Monte Carlo Simulation: Distribution System Probabilistic Monitoring System Problems from Power System Uncertainties in electric power distribution system: Wind power, DG, PHEV, responsive load, etc. Lack of real time measurements: AMI, Smart metering may help, but still in preliminary stage. Probabilistic Model for Uncertainties: Wind forecast model, EV pattern, Load profile. A Probabilistic Solution of Distribution System Advances in High Performance Computing Fully utilizing the computing power is a difficult problem NO free speedup, parallelism, memory , special instructions Require knowledge in specific domain & computer architecture Potential Benefit: Moore’s law! High Performance Computing Enabled Power System Solution Monte Carlo based Probabilistic Solution of Distribution System: “Golden standard” for probabilistic power flow Well fitted problem on modern computing platform Extensible to contingency analysis, steady state time-series… An Affordable Supercomputing Center for Distribution Substation Performance Result: High Performance Computing Engine Monte Carlo Results and Web User Interface Motivation and Background A Multi-core High Performance Computing Framework for Probabilistic Solutions of Distribution Systems Tao Cui and Franz Franchetti Email: [email protected] Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA Methods & System Latest Results Conclusions & Future Work Workstation + GPU accelerator Core i7 + Tesla C1060 1Tflop/s peak performance $5,000 class Image: Nvidia Image: Dell + Year 2010: Program optimization / parallelization: Enable fast computation of large amount of power flow. Performance can be further increased on new platform: Tracking new development in CPU micro-architecture. GPU: small, less powerful but many more cores. Applications of fast distribution power flow solver: Probabilistic monitor of distribution system Fast time series solution; statistical analysis… Active Role of Distribution System in Smart Grid Input with Randomness Physically Deterministic Method: Load Flow Output result with probability features Algorithm y x *Original figure by Andrew Hsu (EESG) abc abc abc n m m abc abc abc m n m I c V d I V A V B I Aggressive Code Optimizations on Monte Carlo Solver Data structure optimization Algorithm level optimization SIMD vectorization Multithreading: Squeezing Computation Power out of the Computer Architecture. Push Performance to the Hardware Peak. 0.01 0.1 1 10 4 16 64 256 Speed (Gflops) C++STL Speed Optimized C++STL Intel Fully Optimized C Array C Pattern SSE SSE+Pthread Node (Bus) Numbers ~ 500x >40x Performance 4 core Core2Extreme 2.66 GHz RNG & Load Flow in Buf A N RNG & Load Flow in Buf A 2 RNG & Load Flow in Buf A 1 Real Time Interval (SCADA Interval) Scheduling Thd 0 Computing Thd 2 Computing Thd 1 Computing Thd N Sync Signal KDE in all Buf Bs Result Out RNG & Load Flow in Buf B N RNG & Load Flow in Buf B 2 RNG & Load Flow in Buf B 1 Sync Signal Out Switch Buffer A,B KDE in all Buf As Result Out RNG: Random Number Generator KDE: Kernel Density Estimation 0 20 40 60 80 100 120 140 160 180 Core2Extreme 2.66GHz (4-core, SSE) Xeon X5680 3.33GHz (6-core, SSE) Core i5-2400 3.10GHz (4-core, AVX) 2 Xeon7560 2.27GHz (16-core, SSE) Speed (Gflop/s) Optimized Scalar SIMD (SSE or AVX) Multi-Core Performance on Different Machines for IEEE37 Testfeeder Load Flow Solver Load Flow Solver Load Flow Solver 2~64 parallel threads ... 0 0.5 1 0 0.5 1 0 0.5 1 Generate Random Samples Solve Load Flows Estimate Density -10 -5 0 5 10 15 0 0.02 0.04 0.06 0.08 0.1 0.12 0.14 Problem Size (IEEE Test Feeders) Approx. flops Approx. Time / Core2 Extreme Approx. Time / Core i5 Baseline. C++ ICC o3 (~300x faster then Matlab) Comments IEEE37: one iteration 12 K ~ 0.3 us ~ 0.3 us IEEE37: one load flow (5 Iter) 60 K ~ 1.5 us ~ 1.5 us 0.01 kVA error IEEE37: 1 million load flow 60 G ~ < 2 s ~ < 1 s ~ 60 s (>5 hrs Matlab) SCADA Interval: 4 seconds IEEE123: 1 million load flow 200 G ~ < 10 s ~ < 3.5 s ~ 200 s (>15 hrs Matlab) Translate speed (Gflop/s) into run time: 0.8 0.9 1 0 5 10 15 20 100 run Phase A 0.98 1 1.02 0 20 40 60 80 100 Phase B 0.93 0.94 0.95 0 50 100 150 Phase C voltage, (p.u.) 0.8 0.9 1 0 5 10 15 20 1000 run 0.98 1 1.02 0 20 40 60 80 100 0.93 0.94 0.95 0 50 100 150 voltage, (p.u.) 0.8 0.9 1 0 5 10 15 20 10000 run 0.98 1 1.02 0 20 40 60 80 100 0.93 0.94 0.95 0 50 100 150 voltage, (p.u.) 0.8 0.9 1 0 5 10 15 20 100000 run 0.98 1 1.02 0 20 40 60 80 100 0.93 0.94 0.95 0 50 100 150 voltage, (p.u.) 0.8 0.9 1 0 5 10 15 20 1000000 run 0.98 1 1.02 0 20 40 60 80 100 0.93 0.94 0.95 0 50 100 150 voltage, (p.u.) 0.8 0.9 1 0 5 10 15 20 10000000 run 0.98 1 1.02 0 20 40 60 80 100 0.93 0.94 0.95 0 50 100 150 voltage, (p.u.) Out: Voltage on Node 738 -400 -200 0 200 400 0 1 2 3 4 x 10 -3 Active power, kw In: Active Power P~ u=0,std=100kw on Phase A of Node 738,711,741 Web Server @ ECE CMU Web UI Monte Carlo solver Multi-core Desktop Computing Server Meter Meter Data Aggregator Simulated Meters Using Computing Nodes Web UI Web UI Web UI Campus Network Internet // input[0~2]: vector real part; // input[3~5]: vector imaginary part; // constant: parameters (r or c on left). switch (matrix_pattern){ case real_diagonal_equal_matrix: output[0] = *constant * input[0]; ... output[5] = *constant * input[5]; break; case imag_diagonal_equal_matrix: output[0] = -*constant * input[3]; output[1] = -*constant * input[4]; output[2] = -*constant * input[5]; output[3] = *constant * input[0]; output[4] = *constant * input[1]; output[5] = *constant * input[2]; break; //cases for other frequent matrix patterns ... default: //default full 3x3 complex matrix ... } r 0 0 0 r 0 0 0 r i 0 0 0 i 0 0 0 i c 11 c 12 c 13 c 21 c 22 c 23 c 31 c 32 c 33 r: real number, i: imaginary number c: complex number ... Matrix Pattern: Pseudo-code for Matrix-Vector Multiplication: Code Generation (Spiral) N0 N2 N3 N4 N5 N1 Load Substation Load Load Load Load pN0 pN1 pN2 pN3 pN4 Forward Sweep Pointer Array pN5 pN5 pN4 pN3 pN2 pN1 Backward Sweep Pointer Array pN0 Array Access N5 Data: Parameters Current Voltage N4 Data: Parameters Current Voltage Consecutive in Data Array ... Sample FBS Load Flow Result Sample FBS Load Flow Result SIMD Instructions Scalar Instructions Vector Register: 4 floats in SSE Scalar Register: 1 float This work is supported by NSF through awards 0931978 and 0702386. Related Publications: 1. T. Cui, F. Franchetti, “A Multi-Core High Performance Computing Framework for Distribution Power Flow,” The 43rd North American Power Symposium (NAPS), Boston, USA, Aug 2011. 2. T. Cui, F. Franchetti, “A Multi-Core High Performance Computing Framework for Probabilistic Solutions of Distribution Systems,” IEEE PES General Meeting 2012, San Diego, CA, USA. Acknowledgement & Publications

Upload: others

Post on 28-Jan-2021

0 views

Category:

Documents


0 download

TRANSCRIPT

  • Distribution Power Flow Model

    • Forward / Backward Sweep

    • Small complex Matrix-Vector Mult

    Monte Carlo Simulation:

    Distribution System Probabilistic Monitoring System

    Problems from Power System

    • Uncertainties in electric power distribution system:

    • Wind power, DG, PHEV, responsive load, etc.

    • Lack of real time measurements:

    • AMI, Smart metering may help, but still in preliminary stage.

    • Probabilistic Model for Uncertainties:

    • Wind forecast model, EV pattern, Load profile.

    • A Probabilistic Solution of Distribution System

    Advances in High Performance Computing

    • Fully utilizing the computing power is a difficult problem

    • NO free speedup, parallelism, memory , special instructions

    • Require knowledge in specific domain & computer architecture

    • Potential Benefit: Moore’s law!

    High Performance Computing Enabled Power System Solution

    • Monte Carlo based Probabilistic Solution of Distribution System:

    • “Golden standard” for probabilistic power flow

    • Well fitted problem on modern computing platform

    • Extensible to contingency analysis, steady state time-series…

    An Affordable Supercomputing Center for Distribution Substation

    Performance Result: High Performance Computing Engine

    Monte Carlo Results and Web User Interface

    Motivation and Background

    A Multi-core High Performance Computing Framework for

    Probabilistic Solutions of Distribution Systems

    Tao Cui and Franz Franchetti Email: [email protected]

    Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA

    Methods & System Latest Results

    Conclusions & Future Work

    Workstation + GPU accelerator Core i7 + Tesla C1060 1Tflop/s peak performance $5,000 class

    Image: Nvidia

    Image: Dell

    +

    Year 2010:

    Program optimization / parallelization:

    • Enable fast computation of large amount of power flow.

    Performance can be further increased on new platform:

    • Tracking new development in CPU micro-architecture.

    • GPU: small, less powerful but many more cores.

    Applications of fast distribution power flow solver:

    • Probabilistic monitor of distribution system

    • Fast time series solution; statistical analysis…

    Active Role of Distribution System in Smart Grid

    Input with Randomness

    Physically Deterministic Method: Load Flow

    Output result with probability features

    Algorithm

    y

    x

    *Original figure by Andrew Hsu (EESG)

    abc abc abcn m m

    abc abc abcm n m

    I c V d I

    V A V B I

    Aggressive Code Optimizations on Monte Carlo Solver

    • Data structure optimization • Algorithm level optimization

    • SIMD vectorization • Multithreading:

    Squeezing Computation Power out of the Computer Architecture.

    Push Performance to the Hardware Peak.

    0.01

    0.1

    1

    10

    4 16 64 256

    Speed (Gflops) C++STL Speed Optimized C++STL Intel Fully Optimized

    C Array C Pattern

    SSE SSE+Pthread

    Node (Bus) Numbers

    ~500x

    >40x

    Performance 4 core Core2Extreme 2.66 GHz

    RNG & Load Flow in Buf AN

    RNG & Load Flow in Buf A2

    RNG & Load Flow in Buf A1

    Real Time Interval

    (SCADA Interval)

    Scheduling Thd 0

    Computing Thd 2

    Computing Thd 1

    Computing Thd N

    Sync Signal

    KDE in all Buf Bs Result Out

    RNG & Load Flow in Buf BN

    RNG & Load Flow in Buf B2

    RNG & Load Flow in Buf B1

    Sync

    Signal Out

    Switch Buffer A,B

    KDE in all Buf As Result Out

    RNG: Random Number Generator

    KDE: Kernel Density Estimation

    0

    20

    40

    60

    80

    100

    120

    140

    160

    180

    Core2Extreme

    2.66GHz(4-core, SSE)

    Xeon X5680

    3.33GHz(6-core, SSE)

    Core i5-2400

    3.10GHz(4-core, AVX)

    2 Xeon7560

    2.27GHz(16-core, SSE)

    Speed (Gflop/s)

    Optimized Scalar

    SIMD (SSE or AVX)

    Multi-Core

    Performance on Different Machines for IEEE37 Testfeeder

    Load Flow Solver

    Load Flow Solver

    Load Flow Solver

    … 2~64 parallel

    threads ...

    0

    0.5

    1

    0

    0.5

    1

    0

    0.5

    1

    Generate Random Samples Solve Load Flows Estimate Density

    -10 -5 0 5 10 150

    0.02

    0.04

    0.06

    0.08

    0.1

    0.12

    0.14

    Problem Size

    (IEEE Test Feeders)

    Approx.

    flops

    Approx. Time /

    Core2 Extreme

    Approx. Time

    / Core i5

    Baseline. C++ ICC –o3

    (~300x faster then Matlab)

    Comments

    IEEE37: one iteration 12 K ~ 0.3 us ~ 0.3 us

    IEEE37: one load flow (5 Iter) 60 K ~ 1.5 us ~ 1.5 us 0.01 kVA error

    IEEE37: 1 million load flow 60 G ~ < 2 s ~ < 1 s ~ 60 s (>5 hrs Matlab) SCADA Interval:

    4 secondsIEEE123: 1 million load flow 200 G ~ < 10 s ~ < 3.5 s ~ 200 s (>15 hrs Matlab)

    Translate speed (Gflop/s) into run time:

    0.8 0.9 1

    0

    5

    10

    15

    20

    100 run

    Ph

    ase

    A

    0.98 1 1.020

    20

    40

    60

    80

    100

    Ph

    ase

    B

    0.93 0.94 0.950

    50

    100

    150

    Ph

    ase

    C

    voltage, (p.u.)

    0.8 0.9 1

    0

    5

    10

    15

    20

    1000 run

    0.98 1 1.020

    20

    40

    60

    80

    100

    0.93 0.94 0.950

    50

    100

    150

    voltage, (p.u.)

    0.8 0.9 1

    0

    5

    10

    15

    20

    10000 run

    0.98 1 1.020

    20

    40

    60

    80

    100

    0.93 0.94 0.950

    50

    100

    150

    voltage, (p.u.)

    0.8 0.9 1

    0

    5

    10

    15

    20

    100000 run

    0.98 1 1.020

    20

    40

    60

    80

    100

    0.93 0.94 0.950

    50

    100

    150

    voltage, (p.u.)

    0.8 0.9 1

    0

    5

    10

    15

    20

    1000000 run

    0.98 1 1.020

    20

    40

    60

    80

    100

    0.93 0.94 0.950

    50

    100

    150

    voltage, (p.u.)

    0.8 0.9 1

    0

    5

    10

    15

    20

    10000000 run

    0.98 1 1.020

    20

    40

    60

    80

    100

    0.93 0.94 0.950

    50

    100

    150

    voltage, (p.u.)

    Out: Voltage on

    Node 738

    -400 -200 0 200 4000

    1

    2

    3

    4x 10

    -3

    Active power, kw

    In: Active Power

    P~ u=0,std=100kw

    on Phase A of

    Node 738,711,741

    Web Server

    @ ECE CMU

    Web UIMonte Carlo solver

    Multi-core Desktop

    Computing ServerMeterMeterData Aggregator

    Simulated Meters Using Computing Nodes

    Web UI

    Web UI

    Web UI

    Campus NetworkInternet

    // input[0~2]: vector real part;

    // input[3~5]: vector imaginary part;

    // constant: parameters (r or c on left).

    switch (matrix_pattern){

    case real_diagonal_equal_matrix:

    output[0] = *constant * input[0];

    ...

    output[5] = *constant * input[5];

    break;

    case imag_diagonal_equal_matrix:

    output[0] = -*constant * input[3];

    output[1] = -*constant * input[4];

    output[2] = -*constant * input[5];

    output[3] = *constant * input[0];

    output[4] = *constant * input[1];

    output[5] = *constant * input[2];

    break;

    //cases for other frequent matrix patterns

    ...

    default: //default full 3x3 complex matrix

    ...

    }

    r 0 0

    0 r 0

    0 0 r

    i 0 0

    0 i 0

    0 0 i

    c11 c12 c13c21 c22 c23c31 c32 c33

    r: real number,

    i: imaginary number

    c: complex number

    ...

    Matrix Pattern: Pseudo-code for Matrix-Vector Multiplication:

    Code Generation (Spiral)

    N0

    N2

    N3 N4 N5

    N1

    Load

    Substation

    Load Load Load

    Load

    pN0

    pN1

    pN2

    pN3

    pN4

    Forward

    Sweep

    Pointer

    Array

    pN5

    pN5

    pN4

    pN3

    pN2

    pN1

    Backward

    Sweep

    Pointer

    Array

    pN0

    Array Access

    N5 Data:

    Parameters

    Current

    Voltage

    N4 Data:

    Parameters

    Current

    Voltage

    Consecutive

    in Data Array

    ...

    Sample FBS Load Flow Result

    Sample FBS Load Flow Result

    SIMD Instructions

    Scalar Instructions

    Vector Register:

    4 floats in SSE

    Scalar Register:

    1 float

    • This work is supported by NSF through awards 0931978 and 0702386. • Related Publications:

    1. T. Cui, F. Franchetti, “A Multi-Core High Performance Computing Framework for Distribution Power Flow,” The 43rd North American Power Symposium (NAPS), Boston, USA, Aug 2011.

    2. T. Cui, F. Franchetti, “A Multi-Core High Performance Computing Framework for Probabilistic Solutions of Distribution Systems,” IEEE PES General Meeting 2012, San Diego, CA, USA.

    Acknowledgement & Publications