the typhoon track classification in el niño/la niña...
Embed Size (px)
TRANSCRIPT

The Typhoon Track Classification in El Niño/La Niña Events
Tseng, John Chien-Han
Central Weather Bureau, Taipei, Taiwan
Pao, Hsing-Kuo
Department of Computer Science and Information Engineering
National Taiwan University of Science and Technology, Taipei, Taiwan
Faloutsos, Christos
School of Computer Science, Carnegie Mellon University, PA, USA
Sep/20/2012

2
The purpose…
To classify the typhoon trajectories in a period of
time belonged to El Niño or La Niña events.

3
1986/88 and 1998/00
Typhoon track
La Niña
El Niño

4
Related Work
• (Lee et al., 2007): use MDL(minimum description length) to decide the ensemble(average) path of typhoon tracks.
• (Camargo et al., 2007a, b): use trajectories clustering based on Gaffney and Smyth(1999)with mixtures regression model and arrange 7 clusters of typhoon tracks.
• (Risi, 2004): use Markov chain to predict the typhoon paths.
• (Vlachos and Kollios, 2002): combine longest common subsequence to discover similar multidimensional trajectories
Common point: every typhoon trajectory is regarded as one
sequence.
We think we probably could check typhoons by collecting a
period of time of typhoon cases to be one sequence.

5
Outline
• Motivation
• Methodology and data- tri-plots and fractal dimension- the data
• Performance of Tri-plots
• Conclusions and discussions

6
Fractal Dimension
These two diagrams have same fractal dimension.
a b
1
2 22
0
1 , 8,
3
1 , 8 , ( )
3
log ( ) log(8 ) log8 log8lim
log(1 ) log3 log3log(3 )
n
n
in S N
in S N
it implies
N nd
n

7
Tri-plots: cross-plot, self-plot
Assuming there are two datasets A and B, and the cross-plot
function is defined as:
, , , ( ) log( ) A B A i B i
i
Cross r C C
where CA, i (CB, i ) is the number of points from set A(B)in the i-th
cell, and r is the distance of the pairs of points. Hence, the cross-
plot function is proportional to the count of A-B pairs within distance
r, and the cross-plot is the figure of the cross-plot function versus
log(r).
, , ( 1)( ) log( )
2
A i A ii
A
C CSelf r
Also, the self-plot function is defined as

8
The Typical Tri-plots Result
When we compare two datasets A and B,…
Adopt form Traina et al.(2001)

9
The Data
• The Japan Meteorological Agency(JMA)typhoon tracks
data from 1950 to 2009. The time resolution is about 3~6
hrs.
• The National Center for Environment Prediction(NCEP)reanalysis II data from 1980 to 2009.
• The El Niño, La Niña, or neutral events based on Niño 3.4
index are published by NOAA.(http://www.cpc.noaa.gov/products/analysis_monitoring/ensostuff/ensoy
ears.shtml)

10
El Niño, La Niña, Neutral events
ENSO La Niña Neutral
1982/May~1983/Jul 1984/Oct~1985/Sep 1980/Feb~1982/Apr
1986/Aug~1988/Feb 1988/May~1989/May 1983/Aug~1984/Sep
1991/Apr~1992/Jul 1995/Sep~1996/Mar 1985/Oct~1986/Jul
1994/Apr~1995/Mar 1998/Jul~2000/Jul 1989/Jun~1991/Apr
1997/Apr~1998/May 2007/Aug~2008/Jun 1992/Aug~1994/Apr
2002/Apr~2003/Feb 1996/Apr~1997/Apr
2004/Jul~2005/Feb 2001/Mar~2002/Apr
2003/Apr~2004/May
2005/Mar~2006/Jul
2008/Jun~2009/Jul

11
Outline
• Motivation
• Methodology and data- tri-plots and fractal dimension- the data
• Performance of Tri-plots
• Conclusions and discussions

12
Experiment 4: use lon., lat., u, v, minp 5-D( , , , , )u v p

13
Topography effect, the geodistance calculation
• Use land, ocean (1, 0) grid (lon., lat.) data to be one mask function as follows:
• Define Gaussian function , 22
( )( )
2( )tc
G e
x x
x
where x=(lon., lat.), means one specific grid point;
and xtc=(lon., lat.), means the typhoon center.
1, ( ( , ))
0,
if at landM lon lat
if at ocean
xx
x
2000kmThe standard deviation in the Gaussian function is
• The distance is calculated by(120 ,80 )
(80 ,20 )
( ) ( ) ( )
W N
tc
E S
dist M Gx x x x

14
The Distribution of Self Plot
The dist. of (m,b) by 300 300u v p topo( , , , , , )

15
The Procedure of Tri-plots Experiment
• The slope and intercept of self-plot or cross-plot can be
regarded as the point and the distance in the space.
• If the datasets are quite different each other, then the
distribution will be more dispersive or separated.
• We use the distribution of self-plot to decide what kind of
feature we will use in classification. At the same time, the
cross-plots can be regarded as the distances of ISOMAP. The
points composed by the leading 5 eigenvectors of ISOMAP
are substituted into SSVM for classification.

SVM(Support Vector Machines),
SSVM(Smooth Support Vector Machines)
16

The Classification Based on Expanded Self-plot points by SSVM
• The training error is 0.254;
the testing error is 0.286
(passed the 95%
confidence interval test)
17

ISOMAP(Isometric Feature Mapping)
18
This method was proposed by
Tenebaum et al.(2000, Science)

19
ISOMAP+SSVM
Figure 3.7: The 2-D structure from ISOMAP with k=6 based on cross-plots
experiments.

20
The Result of Classification Based on Cross-Plots
• Select 4 leading eigenvectors of ISOMAP to compose the different event points.
• Give different k nearest neighbors. • 10 –fold cross validation and all the results passed 95% confidence test.
kSSVM 5 6 7 8 9 10
Training error 0.030 0.012 0.130 0.085 0.062 0.057
Testing error 0.304 0.320 0.337 0.274 0.302 0.301
Table 1: Error table of the SSVM based on cross-plots and ISOMAP
Self-plot (training/testing): 0.254/0.286

21
The Classification Result of Half Period of Event
kSSVM
5 6 7 8 9 10
Training error 0.250 0.354 0.389 0.091 0.109 0.018
Testing error 0.391 0.393 0.400 0.358 0.335 0.296
Table 2: Error table of earlier half period of event
by SSVM based on cross-plots and ISOMAP
kSSVM
5 6 7 8 9 10
Training error 0.313 0.110 0.012 0.014 0.102 0.019
Testing error 0.386 0.366 0.326 0.333 0.342 0.343
Table 3: Error table of later half period of event by
SSVM based on cross-plots and ISOMAP
kSSVM
5 6 7 8 9 10
Testing error
0.416 0.433 0.416 0.466 0.450 0.450
Table 6: Error table of using earlier period
event to predict later period event
kSSVM
5 6 7 8 9 10
Testing error
0.416 0.483 0.433 0.433 0.433 0.416
Table 7: Error table of using later period
event to predict earlier period event
Self-plot (training/testing): 0.29/0.30
Self-plot (training/testing): 0.25/0.26

22
The Classification Result of Half Track
kSSVM 5 6 7 8 9 10
Training error 0.032 0.014 0.277 0.087 0.068 0.061
Testing error 0.306 0.321 0.392 0.277 0.306 0.290
kSSVM
5 6 7 8 9 10
Training error 0.269 0.142 0.097 0.314 0.320 0.115
Testing error 0.384 0.363 0.322 0.399 0.399 0.363
Table 4: Error table of first half of track by
SSVM based on cross-plots and ISOMAP
Table 5: Error table of later half of track by
SSVM based on cross-plots and ISOMAP
kSSVM 5 6 7 8 9 10
Testing error
0.423 0.433 0.333 0.466 0.366 0.416
kSSVM
5 6 7 8 9 10
Testing error
0.416 0.433 0.416 0.433 0.417 0.416
Table 8: Error table of using first half
of track to predict later half of track
Table 9: Error table of using later half
of track to predict earlier half of track
Self-plot (training/testing): 0.35/0.413
Self-plot (training/testing): 0.315/0.333

23
Conclusions and discussions
• The features of classification in tri-plots are
, which can be used or considered in typhoon databases or typhoon tracks
predictions.
• We have about 67-70% accuracy to classify the typhoon tracks belonged to
El Niño or La Niña events based on tri-plots.
• Afford the objective view to cluster the typhoon tracks of the different annual
year events. Moreover, tri-plots could be the diagnostic tool of typhoon track
ensemble prediction system.
• Self-plots can be regarded as the metadata of typhoon track data, that is, the
distance or inner product of two self-plot points explains the similarity of two
tracks, and the distance or inner product is very easy to calculate.
• How to implement these methods to other traditional Meteorological data, e.g.
500 hPa height anomaly correlation score, ensemble prediction perturbation
sets, weather regime classification.
• we just split the original 12 events into 60 events, and the number of the
events is still too few.
( , , , , , )u v p topo

24
ReferenceCamargo, S, J., A. W. Robertson, S. J. Gaffney, and P. Smyth. Cluster analysis of typhoon tracks.
Part I: general properties. Journal of Climate, vol. 20, pages 3635-3653, 2007a.
Camargo, S, J., A. W. Robertson, S. J. Gaffney, and P. Smyth. Cluster analysis of typhoon tracks.
Part II: large circulation and ENSO. Journal of Climate, vol. 20, pages 3654-3676, 2007b.
Gaffney, S. and P. Smyth. Trajectory clustering with mixtures of regression models. In the
International Conference on Knowledge Discovery and Data Mining, pages 63-72, 1999.
Lee, J.-G., J. Han, and K.-Y. Whang. Trajectory clustering: A partition-and-group framework.
International Conference on Management of Data, pages 593-640. 2007.
Traina, A., C. Traina, S. Papadimitriou, and C. Faloutsos. Tri-Plots: Scalable tools for multi
dimensional data. In the International Conference on Knowledge Discovery and Data
Mining, pages 184-193, 2001.
Vlachos, M. D. Gunopulos, and G. Kollios. Discovering similar multidimensional trajectories. In
the International Conference on Data Engineering, pages 673-684, 2002.
Thank You!

25
The DistributionThe dist. of (m,b) by ( , ) The dist. of (m,b) by ( , , )p
The dist. of (m,b) by ( , , , , )u v p The dist. of (m,b) by ( , , , , , )u v p topo

9798, 0203 events
26
Same slope but different
intercept…

8889, 9800 events
27
Same slope but different
intercept…