high-performance outlier detection algorithm for finding ...lfwu/slides/bdac14.pdf ·...
Post on 15-Feb-2019
BDAC-SC14 1 / 17
High-Performance Outlier Detection Algorithm for Finding
Blob-Filaments in Plasma
Lingfei Wu¹, Kesheng Wu², Alex Sim², Michael Churchill³, Jong Y. Choi⁴,
Andreas Stathopoulos¹, CS Chang³, and Scott Klasky⁴
¹ College of William and Mary
² Lawrence Berkeley National Laboratory
³ Princeton Plasma Physics Laboratory
⁴ Oak Ridge National Laboratory
Outline
Introduction
Related work
Blob detection
Hybrid parallel
Evaluations
Conclusion
What is an outlier?
An outlier is a data object that deviates significantly from the
rest of the objects, as if it were generated by a different
mechanism.¹
• Outliers could be errors or noise to be eliminated
• Outliers can lead to the discovery of important information in data
Outlier detection is employed in a variety of applications:
• fraud detection
• time-series monitoring
• medical care
• public safety and security
¹ Jiawei Han and Micheline Kamber, Data Mining, Southeast Asia Edition:
Concepts and Techniques, Morgan Kaufmann, 2006.
Our goal
Outlier detection is an important task in many safety-critical
environments.
• Outliers must be detected in real time
• Suitable feedback must be provided to alert the control system
• The size of the data sets calls for fast and scalable outlier detection
methods
Our goal: apply outlier detection techniques to effectively
tackle the fusion blob detection problem on extremely large
parallel machines
• Massive amounts of data are generated from fusion experiments
/ simulations
• Near real-time understanding of data is needed to predict
performance
Blobs in fusion
What is fusion and why fusion?
• Fusion is a viable energy source for the future
• Fossil fuels will run out soon; solar and wind have limited potential
• Advantages of fusion: inexhaustible, clean, and safe
Blobs are intermittent bursts of particles near the edge of
the confined plasma
⇒ Driven by turbulence
Blobs are bad for fusion performance
because they:
• Transport heat and particles away from
the confined plasma
• May damage the main chamber wall
• Lead to increased levels of neutrals and
impurities, bypassing control
mechanisms
Blob detection is a very important task!
Big data challenges in fusion energy
Fusion experiments generate massive amounts of data:
• Diagnostic measurements last from a few to several hundred
seconds, generating large amounts of data (∼ gigabytes to terabytes)
• Large-scale fusion simulation generates ∼ a few tens of
terabytes per second
Difficulties in large-scale data analysis:
• Existing data analysis is often single-threaded, slow, and
limited to post-run analysis
• Fusion experiments demand real-time data analysis
• E.g., ICEE aims to apply blob detection to monitor the health
of fusion experiments in KSTAR
Real-time blob detection is a very challenging
task!
Three approaches for blob detection
Single threshold & conditional averaging:
• The exact criterion varies
• Averaging may destroy important information
Image analysis techniques:
• Very sensitive to parameter settings
• Hard to use one generic method for all images
Contouring method & thresholding:
• Cannot perform real-time blob detection
• May miss blobs at the edge
• Is still post-run analysis
An efficient blob detection approach
• Our approach: an outlier detection algorithm for efficiently
finding blobs in fusion simulations / experiments
◦ Two-step outlier detection with various criteria after
normalizing the local intensity
◦ Leverage a fast connected component labeling method to
find blob components based on a refined triangular mesh
• Contributions:
◦ A new method that, unlike the contouring method, does not miss
blobs at the edge of the region of interest
◦ Targets the more challenging in-shot and between-shot analysis
◦ The first research work to achieve blob detection in a few
milliseconds
Outlier detection algorithm for finding blobs
[Figure: sketch of the proposed outlier detection algorithm]
Refine mesh in the region of interest
[Figure: "Magnetic Fields in Poloidal Plane", R (m) vs. Z (m), showing the poloidal plane with the region of interest marked, and a zoomed view of the region of interest comparing the refined and original meshes]
• Produce four times as many triangles by creating new vertices at
the midpoints of the three edges of each original triangle
• Apply recursively until reaching the desired resolution
• The number of refinement levels depends on the data set and the required resolution
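The midpoint refinement described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `refine`, the representation of vertices as `(R, Z)` tuples, and the representation of triangles as index triples are all assumptions made for the example.

```python
def refine(vertices, triangles):
    """One level of midpoint refinement: each triangle becomes four.

    vertices:  list of (R, Z) coordinate tuples (assumed layout)
    triangles: list of (a, b, c) vertex-index triples (assumed layout)
    """
    vertices = list(vertices)   # copy so the caller's list is untouched
    midpoint_index = {}         # edge (lo, hi) -> index of its midpoint

    def midpoint(a, b):
        # Shared edges must reuse the same midpoint vertex.
        key = (min(a, b), max(a, b))
        if key not in midpoint_index:
            (ra, za), (rb, zb) = vertices[a], vertices[b]
            vertices.append(((ra + rb) / 2.0, (za + zb) / 2.0))
            midpoint_index[key] = len(vertices) - 1
        return midpoint_index[key]

    refined = []
    for a, b, c in triangles:
        ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
        # Three corner triangles plus the central one.
        refined += [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]
    return vertices, refined


# Two triangles forming a unit square, refined twice.
verts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
tris = [(0, 1, 2), (1, 3, 2)]
v1, t1 = refine(verts, tris)
v2, t2 = refine(v1, t1)
```

Calling `refine` repeatedly corresponds to the recursive application on the slide; each call multiplies the triangle count by four.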
Two-step outlier detection to identify blobs
Motivation for two-step outlier detection for finding blobs:
A contour plot in the region of interest
Apply exploratory data analysis to analyze the underlying
distribution of the local normalized density:
[Figure: "Density distribution fitting using 50 bins", number of points in each bin vs. normalized electron density (n_e/n_e0): (a) Extreme Value Distribution, (b) Log Normal Distribution]
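\[
\begin{aligned}
N(r_i, z_i, t) - \mu &> \alpha \, \sigma, && \forall (r_i, z_i) \in \Gamma, \\
N(r_i, z_i, t) - \mu_2 &> \beta \, \sigma_2, && \forall (r_i, z_i) \in \Gamma_2.
\end{aligned}
\]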
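A minimal NumPy sketch of the two criteria follows. The slide does not define Γ2, µ2, or σ2, so the code makes a labeled assumption: Γ2 is the set of points that pass the first criterion, and µ2 and σ2 are the mean and standard deviation over that set. The function name `two_step_outliers` and the default values of `alpha` and `beta` are illustrative only.

```python
import numpy as np


def two_step_outliers(density, alpha=2.0, beta=1.0):
    """Apply both criteria to normalized density values at one time t.

    Returns two boolean masks: points passing step one, and points
    passing both steps. alpha and beta are illustrative defaults.
    """
    density = np.asarray(density, dtype=float)

    # Step one: N - mu > alpha * sigma over the whole region (Gamma).
    mu, sigma = density.mean(), density.std()
    step1 = (density - mu) > alpha * sigma
    if not step1.any():
        return step1, step1.copy()

    # Step two (ASSUMPTION): Gamma_2 is the set flagged by step one,
    # and mu_2, sigma_2 are computed over Gamma_2 only.
    subset = density[step1]
    mu2, sigma2 = subset.mean(), subset.std()
    step2 = step1 & ((density - mu2) > beta * sigma2)
    return step1, step2


# 96 background points, three mild peaks, one strong peak.
sample = np.array([1.0] * 96 + [6.0, 6.0, 6.0, 12.0])
first, second = two_step_outliers(sample)
```

In this sample the first criterion flags all four peaks, and the second keeps only the strongest one, which is the kind of separation a second pass over the outlier set is meant to provide.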
A fast connected component labeling algorithm
We apply an efficient connected component labeling algorithm
on a refined triangular mesh to find blob components:
• This is a two-pass approach; the first pass scans each triangle
• Unnecessary memory accesses are avoided once any vertex in a triangle
is found to be connected with others
• After the label array is filled, the union-find tree must be
flattened
• The second pass corrects the labels, and all blob
candidate components are found
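The two-pass idea can be illustrated with a generic union-find labeling over a triangular mesh. This is a plain textbook sketch, not the authors' optimized algorithm: it omits their memory-access reductions, and the function name `label_components` and the input layout (index triples plus a per-vertex boolean flag) are assumptions.

```python
def label_components(triangles, is_outlier):
    """Label connected groups of flagged vertices on a triangular mesh.

    triangles:  list of (a, b, c) vertex-index triples (assumed layout)
    is_outlier: per-vertex booleans, e.g. from an outlier detection step
    Returns a list of labels: 0 for unflagged vertices, 1..k otherwise.
    """
    n = len(is_outlier)
    parent = list(range(n))

    def find(v):
        root = v
        while parent[root] != root:
            root = parent[root]
        while parent[v] != root:          # path compression
            parent[v], v = root, parent[v]
        return root

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    # First pass: scan each triangle and merge its flagged vertices.
    for tri in triangles:
        flagged = [v for v in tri if is_outlier[v]]
        for v in flagged[1:]:
            union(flagged[0], v)

    # Second pass: flatten the union-find tree and assign final labels.
    labels = [0] * n
    root_to_label = {}
    for v in range(n):
        if is_outlier[v]:
            root = find(v)
            if root not in root_to_label:
                root_to_label[root] = len(root_to_label) + 1
            labels[v] = root_to_label[root]
    return labels


mesh = [(0, 1, 2), (1, 2, 3), (3, 4, 5), (5, 6, 7)]
flags = [True, True, False, False, True, True, True, False]
labels = label_components(mesh, flags)
```

Here vertices 0 and 1 form one candidate component and vertices 4, 5, and 6 form another, because flagged vertices are connected only when they share a triangle.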
Parallelization of blob detection approach
A hybrid MPI/OpenMP parallelization on many-core processor
architecture:
• High-level: use MPI to allocate n processes to process each
time frame
• Low-level: use OpenMP to accelerate the computations with m
threads
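The two-level decomposition can be mimicked on a single machine with nested thread pools. This is only a structural stand-in: it uses Python's standard `concurrent.futures` in place of MPI processes and OpenMP threads, and the per-chunk work (counting values above a threshold) is a placeholder for the real detection step. All function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split(values, parts):
    """Split a list into `parts` nearly equal contiguous chunks."""
    size, extra = divmod(len(values), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(values[start:end])
        start = end
    return chunks


def detect_in_chunk(chunk, threshold):
    # Placeholder for the per-thread detection work.
    return sum(1 for x in chunk if x > threshold)


def process_frame(frame, threshold, m_threads):
    # Low level: m workers share the work within one time frame
    # (stand-in for OpenMP threads).
    chunks = split(frame, m_threads)
    with ThreadPoolExecutor(max_workers=m_threads) as pool:
        counts = pool.map(detect_in_chunk, chunks, [threshold] * len(chunks))
        return sum(counts)


def process_all(frames, threshold, n_procs, m_threads):
    # High level: n workers each take whole time frames
    # (stand-in for MPI processes).
    with ThreadPoolExecutor(max_workers=n_procs) as pool:
        return list(pool.map(
            lambda f: process_frame(f, threshold, m_threads), frames))


frames = [list(range(10)), list(range(5, 15)), [0] * 10]
counts = process_all(frames, threshold=6, n_procs=3, m_threads=4)
```

The outer pool preserves frame order in its results, so `counts[i]` always corresponds to `frames[i]`.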
Results: same time frame + four planes
Results: same plane + four time frames
Results: real-time blob detection
[Figure: "Real Time Blob Detection", two log-log panels. Left: time (seconds) vs. number of processes, with curves for I/O Time - MPI, I/O Time - MPI/OpenMP, Detection Time - MPI, and Detection Time - MPI/OpenMP. Right: speedup over sequential vs. number of processes, with curves for MPI Speedup and MPI/OpenMP Speedup]
• Complete blob detection in around 2 ms with MPI/OpenMP using
4096 cores and in 3 ms with MPI using 1024 cores
• MPI/OpenMP is twice as fast as MPI
• Linear speedup in blob detection time and slightly more in
I/O time
Conclusion and future work
We present, for the first time, a real-time blob detection method
for finding blob-filaments in real fusion experiments or
numerical simulations.
• Key components:
◦ Two-step outlier detection with various criteria
◦ A fast connected component labeling method
◦ Hybrid MPI/OpenMP parallelization
• Future work:
◦ Test the detection algorithm on experimental measurement
data from operating fusion devices
◦ Develop a blob tracking algorithm