# the typhoon track classification in el niño/la niña...

The Typhoon Track Classification in El Niño/La Niña Events

Tseng, John Chien-Han

Central Weather Bureau, Taipei, Taiwan

Pao, Hsing-Kuo

Department of Computer Science and Information Engineering

National Taiwan University of Science and Technology, Taipei, Taiwan

Faloutsos, Christos

School of Computer Science, Carnegie Mellon University, PA, USA

Sep/20/2012

2

The purpose…

To classify whether the typhoon trajectories observed during a given period of time belong to an El Niño or a La Niña event.

3

[Figure: typhoon tracks for the 1986/88 (El Niño) and 1998/00 (La Niña) events]

4

Related Work

• (Lee et al., 2007): use MDL (minimum description length) to determine the ensemble (average) path of typhoon tracks.

• (Camargo et al., 2007a, b): cluster trajectories with the mixtures-of-regression-models approach of Gaffney and Smyth (1999), yielding 7 clusters of typhoon tracks.

• (Risi, 2004): use a Markov chain to predict typhoon paths.

• (Vlachos et al., 2002): use the longest common subsequence to discover similar multidimensional trajectories.

Common point: every typhoon trajectory is regarded as one sequence.

Our idea: treat all the typhoon cases collected over a period of time as one sequence instead.

5

Outline

• Motivation

• Methodology and data
  - tri-plots and fractal dimension
  - the data

• Performance of Tri-plots

• Conclusions and discussions

6

Fractal Dimension

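These two diagrams (a, b) have the same fractal dimension.

For $n = 1$: $S = \frac{1}{3}$, $N = 8$.

For general $n$: $S = \left(\frac{1}{3}\right)^{n}$, $N = 8^{n}$.

It implies

$$d = \lim_{n \to \infty} \frac{\log N}{\log(1/S)} = \lim_{n \to \infty} \frac{\log(8^{n})}{\log(3^{n})} = \frac{n \log 8}{n \log 3} = \frac{\log 8}{\log 3}$$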
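The limit on this slide can be checked numerically. The sketch below assumes the construction is the usual one in which each square is split into a 3x3 grid and the centre square is dropped (the slide only gives S = 1/3 and N = 8, so the choice of which square is removed is an assumption; any arrangement of 8 surviving squares gives the same count). The function names are illustrative.

```python
import math

def carpet_cells(n):
    """Return the set of occupied (row, col) cells after n subdivision steps.

    Each step splits every occupied square into a 3x3 grid and drops the
    centre square (assumed layout), so 8 of the 9 children survive.
    """
    cells = {(0, 0)}
    for _ in range(n):
        cells = {
            (3 * r + dr, 3 * c + dc)
            for (r, c) in cells
            for dr in range(3)
            for dc in range(3)
            if not (dr == 1 and dc == 1)
        }
    return cells

def box_counting_dimension(n):
    """Estimate d = log N / log(1/S) at step n, where S = (1/3)**n."""
    count = len(carpet_cells(n))  # N = 8**n occupied boxes
    return math.log(count) / math.log(3 ** n)

for n in (1, 2, 4):
    print(n, len(carpet_cells(n)), round(box_counting_dimension(n), 4))
```

Because N and 1/S grow as exact powers of 8 and 3, the estimate equals log 8 / log 3 (about 1.8928) at every step rather than only in the limit.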

7

Tri-plots: cross-plot, self-plot

Assuming there are two datasets A and B, the cross-plot function is defined as:

$$\mathrm{Cross}_{A,B}(r) = \log\Big(\sum_{i} C_{A,i} \cdot C_{B,i}\Big)$$

where $C_{A,i}$ ($C_{B,i}$) is the number of points from set A (B) in the i-th cell of a grid with cell side r, so r sets the distance scale of the point pairs. Hence the sum inside the logarithm is proportional to the count of A-B pairs within distance r, and the cross-plot is the graph of the cross-plot function versus log(r).

Also, the self-plot function is defined as

$$\mathrm{Self}_{A}(r) = \log\Big(\sum_{i} \frac{C_{A,i}\,(C_{A,i} - 1)}{2}\Big)$$
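A minimal sketch of how the cross-plot and self-plot functions might be evaluated by counting points per grid cell, and how a slope and intercept (m, b) could then be fitted against log(r). The grid anchored at the origin, the helper names, and the toy data are assumptions for illustration, not the implementation used in the talk.

```python
import math
from collections import Counter

def cell_counts(points, r):
    """Count points per grid cell of side r (cells indexed by integer tuples)."""
    return Counter(tuple(math.floor(x / r) for x in p) for p in points)

def cross_value(a, b, r):
    """log(sum_i C_A,i * C_B,i); returns None when no cell is shared."""
    ca, cb = cell_counts(a, r), cell_counts(b, r)
    total = sum(n * cb[cell] for cell, n in ca.items() if cell in cb)
    return math.log(total) if total > 0 else None

def self_value(a, r):
    """log(sum_i C_A,i * (C_A,i - 1) / 2); None when no cell holds a pair."""
    total = sum(n * (n - 1) // 2 for n in cell_counts(a, r).values())
    return math.log(total) if total > 0 else None

def fit_line(xs, ys):
    """Ordinary least-squares slope m and intercept b of y = m * x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    m = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return m, my - m * mx

def plot_slope_intercept(values_by_radius):
    """Fit (m, b) to (log r, value) pairs, skipping radii with no pairs."""
    pts = [(math.log(r), v) for r, v in values_by_radius if v is not None]
    return fit_line([p[0] for p in pts], [p[1] for p in pts])

# Toy 2-D data: points filling the unit square uniformly (hypothetical).
grid = [(i / 32 + 1 / 64, j / 32 + 1 / 64) for i in range(32) for j in range(32)]
radii = [1 / 2, 1 / 4, 1 / 8]
m, b = plot_slope_intercept([(r, self_value(grid, r)) for r in radii])
print(m, b)
```

For points that fill a 2-D region uniformly, the fitted self-plot slope comes out close to 2, which is the sense in which the slope acts like a fractal dimension of the dataset.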

8

The Typical Tri-plots Result

When we compare two datasets A and B,…

Adapted from Traina et al. (2001)

9

The Data

• The Japan Meteorological Agency (JMA) typhoon track data from 1950 to 2009. The time resolution is about 3 to 6 hours.

• The National Centers for Environmental Prediction (NCEP) Reanalysis II data from 1980 to 2009.

• The El Niño, La Niña, and neutral events, based on the Niño 3.4 index, as published by NOAA (http://www.cpc.noaa.gov/products/analysis_monitoring/ensostuff/ensoyears.shtml).

10

El Niño, La Niña, Neutral events

| El Niño | La Niña | Neutral |
|---|---|---|
| 1982/May~1983/Jul | 1984/Oct~1985/Sep | 1980/Feb~1982/Apr |
| 1986/Aug~1988/Feb | 1988/May~1989/May | 1983/Aug~1984/Sep |
| 1991/Apr~1992/Jul | 1995/Sep~1996/Mar | 1985/Oct~1986/Jul |
| 1994/Apr~1995/Mar | 1998/Jul~2000/Jul | 1989/Jun~1991/Apr |
| 1997/Apr~1998/May | 2007/Aug~2008/Jun | 1992/Aug~1994/Apr |
| 2002/Apr~2003/Feb | | 1996/Apr~1997/Apr |
| 2004/Jul~2005/Feb | | 2001/Mar~2002/Apr |
| | | 2003/Apr~2004/May |
| | | 2005/Mar~2006/Jul |
| | | 2008/Jun~2009/Jul |

11

12

Experiment 4: use lon., lat., u, v, and minimum pressure (minp) as 5-D features (lon., lat., u, v, p)

13

Topography effect, the geodistance calculation

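• Use gridded land/ocean (1, 0) data on (lon., lat.) as a mask function:

$$M(\mathbf{x}(\mathrm{lon}, \mathrm{lat})) = \begin{cases} 1, & \text{if } \mathbf{x} \text{ is at land} \\ 0, & \text{if } \mathbf{x} \text{ is at ocean} \end{cases}$$

• Define the Gaussian function

$$G(\mathbf{x}) = e^{-\frac{(\mathbf{x} - \mathbf{x}_{tc})^{2}}{2\sigma^{2}}}$$

where $\mathbf{x}$ = (lon., lat.) is one specific grid point and $\mathbf{x}_{tc}$ = (lon., lat.) is the typhoon center. The standard deviation in the Gaussian function is $\sigma$ = 2000 km.

• The distance is calculated by

$$\mathrm{dist}(\mathbf{x}_{tc}) = \sum_{\mathbf{x} = (80^{\circ}\mathrm{E},\, 20^{\circ}\mathrm{S})}^{(120^{\circ}\mathrm{W},\, 80^{\circ}\mathrm{N})} M(\mathbf{x})\, G(\mathbf{x})$$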
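The topography term on this slide, a land mask weighted by a Gaussian centred on the typhoon and summed over the grid, might be computed as in the sketch below. Measuring the separation between a grid point and the typhoon center with the haversine great-circle distance is an assumption (the slide only says "geodistance"); the Earth radius constant, the function names, and the toy mask are likewise illustrative.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed constant)
SIGMA_KM = 2000.0         # standard deviation stated on the slide

def great_circle_km(lon1, lat1, lon2, lat2):
    """Haversine distance in km between two (lon, lat) points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    h = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def gaussian_weight(lon, lat, tc_lon, tc_lat, sigma_km=SIGMA_KM):
    """G(x) = exp(-d^2 / (2 sigma^2)), with d the distance to the typhoon center."""
    d = great_circle_km(lon, lat, tc_lon, tc_lat)
    return math.exp(-d * d / (2 * sigma_km ** 2))

def topo_feature(land_mask, tc_lon, tc_lat):
    """Sum M(x) * G(x) over grid points.

    land_mask maps (lon, lat) in degrees to 1 (land) or 0 (ocean).
    """
    return sum(
        m * gaussian_weight(lon, lat, tc_lon, tc_lat)
        for (lon, lat), m in land_mask.items()
    )

# Hypothetical three-point mask, only to show the call pattern.
toy_mask = {(120.0, 24.0): 1, (125.0, 24.0): 0, (121.0, 25.0): 1}
near = topo_feature(toy_mask, 122.0, 24.0)   # typhoon close to the land points
far = topo_feature(toy_mask, 160.0, 10.0)    # typhoon far out over the ocean
print(near, far)
```

The value grows as the typhoon center approaches land, since only land points contribute and nearby points receive weights close to 1. Longitudes are taken in degrees east, so 120°W would be entered as 240.0 or -120.0.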

14

The Distribution of Self Plot

The dist. of (m,b) by (lon., lat., $u_{300}$, $v_{300}$, p, topo)

15

The Procedure of Tri-plots Experiment

• The slope and intercept of a self-plot can be regarded as a point in the space, and those of a cross-plot as a distance.

• If the datasets are quite different from each other, the distribution will be more dispersed or separated.

• We use the distribution of self-plots to decide which features to use in classification. The cross-plots, in turn, serve as the distances for ISOMAP. The points formed from the leading 5 eigenvectors of ISOMAP are fed into SSVM for classification.

SVM: Support Vector Machines

SSVM: Smooth Support Vector Machines
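To make the role of the cross-plot distances concrete, here is a sketch of the first two steps of ISOMAP applied to a precomputed distance matrix: build a k-nearest-neighbour graph, then replace each distance by the shortest path through that graph. The final embedding step (eigenvectors of the resulting geodesic distance matrix) and the SSVM classifier are omitted. The toy matrix and function names are assumptions for illustration.

```python
import math

def knn_graph(dist, k):
    """Symmetric k-nearest-neighbour graph from a full distance matrix.

    Edge (i, j) keeps dist[i][j] if j is among the k nearest neighbours
    of i or vice versa; all other entries are set to infinity.
    """
    n = len(dist)
    graph = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        graph[i][i] = 0.0
        neighbours = sorted(
            (j for j in range(n) if j != i), key=lambda j: dist[i][j]
        )[:k]
        for j in neighbours:
            graph[i][j] = graph[j][i] = dist[i][j]
    return graph

def geodesic_distances(graph):
    """All-pairs shortest paths (Floyd-Warshall) over the neighbourhood graph."""
    n = len(graph)
    geo = [row[:] for row in graph]
    for mid in range(n):
        for i in range(n):
            for j in range(n):
                via = geo[i][mid] + geo[mid][j]
                if via < geo[i][j]:
                    geo[i][j] = via
    return geo

# Toy distance matrix: four items on a line at positions 0, 1, 2, 3.
# In the setting above, the entries would come from cross-plots instead.
pos = [0.0, 1.0, 2.0, 3.0]
toy = [[abs(a - b) for b in pos] for a in pos]
geo = geodesic_distances(knn_graph(toy, k=1))
print(geo[0][3])
```

With k = 1 only adjacent items are linked, so the distance between the two end items is recovered as a path through the intermediate ones.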

16

The Classification Based on Expanded Self-plot points by SSVM

• The training error is 0.254; the testing error is 0.286 (passed the 95% confidence interval test).

17

ISOMAP(Isometric Feature Mapping)

18

This method was proposed by Tenenbaum et al. (2000, Science).

19

ISOMAP+SSVM

Figure 3.7: The 2-D structure from ISOMAP with k=6, based on the cross-plot experiments.

20

The Result of Classification Based on Cross-Plots

• Select the 4 leading eigenvectors of ISOMAP to compose the points of the different events.

• Vary the number k of nearest neighbors.

• 10-fold cross-validation; all the results passed the 95% confidence test.

| SSVM \ k | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|
| Training error | 0.030 | 0.012 | 0.130 | 0.085 | 0.062 | 0.057 |
| Testing error | 0.304 | 0.320 | 0.337 | 0.274 | 0.302 | 0.301 |

Table 1: Error table of the SSVM based on cross-plots and ISOMAP

Self-plot (training/testing): 0.254/0.286
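For readers unfamiliar with how a cross-validated testing error is obtained, the sketch below shows a 10-fold split and the averaging of per-fold errors. It deliberately uses a nearest-neighbour rule as a stand-in classifier, not SSVM, and the toy points are hypothetical; only the fold mechanics are the point here.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]

def nearest_neighbour_predict(train_x, train_y, x):
    """Stand-in classifier (NOT SSVM): label of the closest training point."""
    best = min(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], x)),
    )
    return train_y[best]

def cross_validated_error(xs, ys, k=10, seed=0):
    """Mean test error over k folds, each fold held out exactly once."""
    errors = []
    for fold in k_fold_indices(len(xs), k, seed):
        held = set(fold)
        train = [i for i in range(len(xs)) if i not in held]
        tx, ty = [xs[i] for i in train], [ys[i] for i in train]
        wrong = sum(nearest_neighbour_predict(tx, ty, xs[i]) != ys[i] for i in fold)
        errors.append(wrong / len(fold))
    return sum(errors) / len(errors)

# Hypothetical, well-separated toy points in two classes.
xs = [(0.01 * i, 0.0) for i in range(30)] + [(5.0 + 0.01 * i, 1.0) for i in range(30)]
ys = [0] * 30 + [1] * 30
print(cross_validated_error(xs, ys))
```

Each point is tested exactly once by a model that never saw it during training, which is why the testing errors in the tables are higher than the training errors.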

21

The Classification Result of Half Period of Event

| SSVM \ k | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|
| Training error | 0.250 | 0.354 | 0.389 | 0.091 | 0.109 | 0.018 |
| Testing error | 0.391 | 0.393 | 0.400 | 0.358 | 0.335 | 0.296 |

Table 2: Error table of earlier half period of event by SSVM based on cross-plots and ISOMAP

| SSVM \ k | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|
| Training error | 0.313 | 0.110 | 0.012 | 0.014 | 0.102 | 0.019 |
| Testing error | 0.386 | 0.366 | 0.326 | 0.333 | 0.342 | 0.343 |

Table 3: Error table of later half period of event by SSVM based on cross-plots and ISOMAP

| SSVM \ k | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|
| Testing error | 0.416 | 0.433 | 0.416 | 0.466 | 0.450 | 0.450 |

Table 6: Error table of using earlier period event to predict later period event

| SSVM \ k | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|
| Testing error | 0.416 | 0.483 | 0.433 | 0.433 | 0.433 | 0.416 |

Table 7: Error table of using later period event to predict earlier period event

Self-plot (training/testing): 0.29/0.30

Self-plot (training/testing): 0.25/0.26

22

The Classification Result of Half Track

| SSVM \ k | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|
| Training error | 0.032 | 0.014 | 0.277 | 0.087 | 0.068 | 0.061 |
| Testing error | 0.306 | 0.321 | 0.392 | 0.277 | 0.306 | 0.290 |

Table 4: Error table of first half of track by SSVM based on cross-plots and ISOMAP

| SSVM \ k | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|
| Training error | 0.269 | 0.142 | 0.097 | 0.314 | 0.320 | 0.115 |
| Testing error | 0.384 | 0.363 | 0.322 | 0.399 | 0.399 | 0.363 |

Table 5: Error table of later half of track by SSVM based on cross-plots and ISOMAP

| SSVM \ k | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|
| Testing error | 0.423 | 0.433 | 0.333 | 0.466 | 0.366 | 0.416 |

Table 8: Error table of using first half of track to predict later half of track

| SSVM \ k | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|
| Testing error | 0.416 | 0.433 | 0.416 | 0.433 | 0.417 | 0.416 |

Table 9: Error table of using later half of track to predict earlier half of track

Self-plot (training/testing): 0.35/0.413

Self-plot (training/testing): 0.315/0.333

23

Conclusions and discussions

• The classification features used in tri-plots are (lon., lat., u, v, p, topo), which can be used or considered in typhoon databases or typhoon track prediction.

• Tri-plots classify typhoon tracks as belonging to El Niño or La Niña events with about 67-70% accuracy.

• Tri-plots offer an objective way to cluster the typhoon tracks of events from different years. Moreover, tri-plots could serve as a diagnostic tool for typhoon track ensemble prediction systems.

• Self-plots can be regarded as metadata of the typhoon track data: the distance or inner product of two self-plot points expresses the similarity of two tracks, and both are very easy to calculate.

• Open question: how to apply these methods to other traditional meteorological data, e.g. 500 hPa height anomaly correlation scores, ensemble prediction perturbation sets, and weather regime classification.

• We only split the original 12 events into 60 events, so the number of events is still too small.
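As a small illustration of the "self-plots as metadata" point, the sketch below compares tracks through the distance or inner product of their self-plot (m, b) points. The catalogue entries and their values are hypothetical, chosen only to show the lookup; they are not results from the talk.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two (m, b) self-plot points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def inner_product(p, q):
    """Inner product of two (m, b) self-plot points."""
    return sum(a * b for a, b in zip(p, q))

def most_similar(query, catalogue):
    """Name of the catalogue entry whose (m, b) point is closest to the query."""
    return min(catalogue, key=lambda name: euclidean(query, catalogue[name]))

# Hypothetical (slope, intercept) pairs for three stored events.
catalogue = {
    "event_a": (1.8, 4.0),
    "event_b": (1.8, 6.5),
    "event_c": (1.2, 4.1),
}
print(most_similar((1.75, 4.2), catalogue))
```

Since each event is reduced to two numbers, a similarity query over a whole database costs one pass over the stored pairs.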

24

Reference

Camargo, S. J., A. W. Robertson, S. J. Gaffney, and P. Smyth. Cluster analysis of typhoon tracks. Part I: general properties. Journal of Climate, vol. 20, pages 3635-3653, 2007a.

Camargo, S. J., A. W. Robertson, S. J. Gaffney, and P. Smyth. Cluster analysis of typhoon tracks. Part II: large circulation and ENSO. Journal of Climate, vol. 20, pages 3654-3676, 2007b.

Gaffney, S. and P. Smyth. Trajectory clustering with mixtures of regression models. In the International Conference on Knowledge Discovery and Data Mining, pages 63-72, 1999.

Lee, J.-G., J. Han, and K.-Y. Whang. Trajectory clustering: A partition-and-group framework. In the International Conference on Management of Data, pages 593-640, 2007.

Traina, A., C. Traina, S. Papadimitriou, and C. Faloutsos. Tri-Plots: Scalable tools for multidimensional data. In the International Conference on Knowledge Discovery and Data Mining, pages 184-193, 2001.

Vlachos, M., D. Gunopulos, and G. Kollios. Discovering similar multidimensional trajectories. In the International Conference on Data Engineering, pages 673-684, 2002.

Thank You!

25

The Distribution

[Figure panels: the dist. of (m,b) by (lon., lat.); by (lon., lat., p); by (lon., lat., u, v, p); by (lon., lat., u, v, p, topo)]

1997/98 and 2002/03 events

26

Same slope but different intercept…

1988/89 and 1998/00 events

27

Same slope but different intercept…