Page 1: Introduction to IEEE ICDM Data Mining Contest (ICDM DMC 2007)

Introduction to IEEE ICDM Data Mining Contest (ICDM DMC 2007)

Main Parts

• Introduction to ICDM DMC 2007

• The work of our team

Introduction to ICDM DMC 2007

• This year's contest is the first IEEE ICDM Data Mining Contest, which will be held in conjunction with the 2007 IEEE International Conference on Data Mining.

• http://www.cse.ust.hk/~qyang/ICDMDMC07/

What is the Problem?

• This year's contest is about indoor location estimation from the radio signal strengths that a client device receives from various WiFi access points (APs).

What is the AP?

• Access points are base stations for a wireless network. They transmit and receive radio signals so that wireless-enabled devices can communicate through them.

• The client device (for example, a PDA) is equipped with a wireless card that can receive signals from many surrounding wireless access points (APs). Each AP is identified by a unique ID. Based on the collected received signal strength (RSS) values, a data mining algorithm running on the client device tries to determine the user's current location.

RSS Vectors

• RSS vector = <(AP_1, RSS value_1), (AP_2, RSS value_2), ..., (AP_k, RSS value_k)>

• The ID of an AP is an integer between 0 and 100.

• The RSS value is also an integer, between -99 and 0.

• The number k differs from one RSS vector to another.

• The WiFi data are very noisy due to the so-called multi-path effect in indoor environments.
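A minimal Python sketch of the RSS vector described above, assuming it is stored as a dict from AP ID to RSS value and that pairs are written as `APID:RSS` tokens (that token format is borrowed from the collection example later in these slides). The function names `make_rss_vector` and `parse_rss_pairs` are hypothetical; the range checks use the bounds stated on this slide.

```python
# Sketch: an RSS vector stored as {AP ID: RSS value}.
# The dict layout and function names are assumptions; the ranges
# (AP ID 0..100, RSS -99..0) are the ones stated on the slide.


def make_rss_vector(pairs):
    """Build an RSS vector from (ap_id, rss) pairs, checking the stated ranges."""
    vector = {}
    for ap_id, rss in pairs:
        if not 0 <= ap_id <= 100:
            raise ValueError(f"AP ID out of range: {ap_id}")
        if not -99 <= rss <= 0:
            raise ValueError(f"RSS value out of range: {rss}")
        if ap_id in vector:
            # Each AP has a unique ID, so it should appear at most once.
            raise ValueError(f"duplicate AP ID: {ap_id}")
        vector[ap_id] = rss
    return vector


def parse_rss_pairs(text):
    """Parse whitespace-separated 'APID:RSS' tokens, e.g. '18:-96 23:-87'."""
    pairs = []
    for token in text.split():
        ap_id, rss = token.split(":")
        pairs.append((int(ap_id), int(rss)))
    return make_rss_vector(pairs)


if __name__ == "__main__":
    vector = parse_rss_pairs("18:-96 23:-87 66:-69")
    print(len(vector), vector)  # k is the number of (AP, RSS) pairs
```

Because k varies, a sparse dict is a natural fit: APs that were not heard simply have no entry.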

Location Label

• All WiFi data were collected at 247 locations, each of which is a grid cell of about 1.5 m × 1.5 m.

• Location label is an integer between 1 and 247.

Task 1. Indoor Location Estimation

• All the WiFi data (training data and test data) are collected by the same device in the same time period.

• There are two types of data provided in this task: (1) trace data and (2) non-trace data.

Task 1: trace data

Statistics of the Task 1 trace data

• 40 traces

• 1404 collections, of which 130 are labeled

• 11881 APID-value pairs

• An average of 8.5 APID-value pairs per collection (minimum 1, maximum 19)
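The counts on this slide can be reproduced with a short sketch like the one below. It assumes a hypothetical in-memory format in which each collection is a `(label, rss_vector)` tuple, with `label` set to `None` when the collection is unlabeled; `collection_stats` is an illustrative name, not part of the contest tools.

```python
def collection_stats(collections):
    """Count collections, labeled collections and APID-value pairs.

    Assumed input: a list of (label, {APID: RSS}) tuples, label None if unlabeled.
    """
    sizes = [len(rss) for _, rss in collections]  # pairs per collection
    total_pairs = sum(sizes)
    return {
        "collections": len(collections),
        "labeled": sum(1 for label, _ in collections if label is not None),
        "pairs": total_pairs,
        "avg_pairs": total_pairs / len(collections),
        "min_pairs": min(sizes),
        "max_pairs": max(sizes),
    }


if __name__ == "__main__":
    toy = [
        (119, {18: -96, 23: -87, 66: -69}),
        (None, {18: -94, 83: -62}),
        (54, {18: -94, 83: -62, 85: -76, 86: -72, 89: -85}),
    ]
    print(collection_stats(toy))
```

As a sanity check on the reported figures, 11881 / 1404 ≈ 8.46, which is consistent with the stated average of 8.5 pairs per collection.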

Task 1: non-trace data

Statistics of the Task 1 non-trace data

• 1792 collections of RSS values

• 375 collections labeled

• An average of 8.5 APID-value pairs per collection (minimum 1, maximum 19)

• 15256 APID-value pairs

Task_2_training_data

Statistics of Task_2_training_data

• 2322 collections of RSS values

• 621 collections labeled

• 2.5 labeled collections per class on average (minimum 1, maximum 8)

• An average of 8.6 APID-value pairs per collection (minimum 2, maximum 19)
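The per-class figures are counts of labeled collections grouped by location label. A minimal sketch, assuming the labels are available as a list with `None` for unlabeled collections; `labels_per_class` is a hypothetical helper. If all 247 locations appear among the 621 labeled collections, the mean is 621 / 247 ≈ 2.51, matching the slide.

```python
from collections import Counter


def labels_per_class(labels):
    """Mean, min and max number of labeled collections per location label.

    `labels` holds one entry per collection; None marks an unlabeled one.
    Only locations with at least one labeled collection are counted.
    """
    counts = Counter(label for label in labels if label is not None)
    values = list(counts.values())
    return sum(values) / len(values), min(values), max(values)


if __name__ == "__main__":
    print(labels_per_class([5, 5, 7, None, 7, 7, 9]))  # (2.0, 1, 3)
```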

Task 2 Test Dataset

Task 2 Landmark Dataset

Evaluation Criterion

• For Task 1, the baseline precision is 60%.

• For Task 2, the baseline precision is 30%.
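The slides do not define precision in detail; the sketch below assumes it is the fraction of test collections whose predicted location label exactly matches the true label. `precision` and `meets_baseline` are hypothetical helpers, and the baseline values are the ones stated above.

```python
# Baseline precisions stated on the slide.
BASELINES = {1: 0.60, 2: 0.30}


def precision(predicted, actual):
    """Fraction of test collections whose predicted location label is correct.

    Assumption: a prediction counts only if it equals the true grid label exactly.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("empty evaluation set")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


def meets_baseline(score, task):
    """True if a precision score reaches the baseline for task 1 or 2."""
    return score >= BASELINES[task]


if __name__ == "__main__":
    score = precision([1, 2, 3, 4], [1, 2, 0, 4])
    print(score, meets_baseline(score, task=1))  # 0.75 True
```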

Our team's algorithm for Task 2

Step 1: Sieve out the labeled collections

Step 2: Compute five difference values for every pair of labeled collections

• Number of APID-value pairs that appear in only one of the two collections

• Sum of the absolute differences between each such RSS value and -100

• Number of APIDs that appear in both collections

• Sum of the absolute differences between the two RSS values of each such shared AP

• Whether the two collections are from the same location: 1 if they are, -1 if not

An example

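• Collection A: 119 18:-96 23:-87 66:-69

• Collection B: 54 18:-94 83:-62 85:-76 86:-72 89:-85

• The five numbers are 6, 149, 1, 2, -1. Here 119 and 54 are the location labels and each APID:RSS token is one pair. AP 18 is the only shared AP, so there are 6 unshared pairs; 149 = 13 + 31 + 38 + 24 + 28 + 15 (|RSS - (-100)| for each unshared pair); 1 pair is shared; 2 = |-96 - (-94)|; and the last value is -1 because locations 119 and 54 differ.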
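The five numbers can be computed with a short Python sketch. It assumes each collection line has the form `label APID:RSS APID:RSS ...`, which is how the example reads (119 and 54 being the location labels); `parse_collection` and `pair_features` are hypothetical names. APs missing from one collection are compared against -100, as Step 2 describes.

```python
def parse_collection(line):
    """Split 'label APID:RSS APID:RSS ...' into (label, {APID: RSS})."""
    label, *tokens = line.split()
    rss = {}
    for token in tokens:
        ap_id, value = token.split(":")
        rss[int(ap_id)] = int(value)
    return int(label), rss


def pair_features(a, b, floor=-100):
    """Return the five Step 2 numbers for two labeled collections."""
    label_a, rss_a = a
    label_b, rss_b = b
    only_one = set(rss_a) ^ set(rss_b)  # APs seen by exactly one collection
    both = set(rss_a) & set(rss_b)      # APs seen by both collections
    merged = {**rss_a, **rss_b}         # unshared APs have a single value here
    return (
        len(only_one),
        sum(abs(merged[ap] - floor) for ap in only_one),
        len(both),
        sum(abs(rss_a[ap] - rss_b[ap]) for ap in both),
        1 if label_a == label_b else -1,
    )


if __name__ == "__main__":
    a = parse_collection("119 18:-96 23:-87 66:-69")
    b = parse_collection("54 18:-94 83:-62 85:-76 86:-72 89:-85")
    print(pair_features(a, b))  # (6, 149, 1, 2, -1)
```

Applied to every pair of labeled collections, this yields one row of five numbers per pair, which is the kind of table Step 3 fits.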

Step 3: Get the coefficients by linear fitting

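% Each row of distance_matrix.txt presumably describes one pair of labeled
% collections. Column 5 holds the +1/-1 same-location target; columns 7
% and 9 are used as features (column 8 is dropped).
e = dlmread('distance_matrix.txt');
b = e(:, 5);              % targets: 1 = same location, -1 = different
x = e(:, 7:9);
x(:, 2) = [];             % keep columns 7 and 9 only

% Same-location pairs are far fewer than different-location pairs, so
% append copies of them to balance the least-squares fit.
pos = find(b > 0);
assert(~isempty(pos), 'no same-location pairs found');
x_pos = x(pos, :);
b_pos = b(pos);
n_rep = floor(length(b) / length(b_pos));
x_append = [x; repmat(x_pos, n_rep, 1)];
b_append = [b; repmat(b_pos, n_rep, 1)];

% Linear least-squares fit b ~ x * a (no intercept term).
a = x_append \ b_append;

% A pair is classified correctly when the sign of x*a matches its target.
c = (x * a) .* b;
accuracy = sum(c > 0) / length(b);
display(accuracy);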

Remaining steps:

Step 4: Compute the center of each class (the labeled collections from the same location)

Step 5: Testing. Our highest precision was 28.30%.
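The slides do not say how class centers are computed or how a test collection is matched to them. One plausible reading, sketched below as an assumption rather than the team's method, is a plain nearest-centroid classifier: each center averages the RSS of every AP over the collections of one location (treating an AP missing from a collection as -100, the same floor used in Step 2), and a test collection is assigned to the center with the smallest sum of absolute RSS differences. This sketch does not use the Step 3 coefficients, and all names are hypothetical.

```python
def class_centers(labeled, floor=-100):
    """Average each AP's RSS over all collections of one location.

    Assumption: an AP missing from a collection counts as `floor` (-100).
    """
    groups = {}
    for label, rss in labeled:
        groups.setdefault(label, []).append(rss)
    centers = {}
    for label, vectors in groups.items():
        aps = set().union(*vectors)  # every AP seen at this location
        centers[label] = {
            ap: sum(v.get(ap, floor) for v in vectors) / len(vectors) for ap in aps
        }
    return centers


def distance(rss, center, floor=-100):
    """Sum of absolute RSS differences, using `floor` for missing APs."""
    aps = set(rss) | set(center)
    return sum(abs(rss.get(ap, floor) - center.get(ap, floor)) for ap in aps)


def predict(rss, centers, floor=-100):
    """Return the location label of the nearest center."""
    return min(centers, key=lambda label: distance(rss, centers[label], floor))


if __name__ == "__main__":
    labeled = [
        (1, {10: -40, 20: -80}),
        (1, {10: -50, 20: -70}),
        (2, {30: -45, 40: -60}),
    ]
    centers = class_centers(labeled)
    print(predict({10: -44, 20: -76}, centers))  # 1
```

Filling missing APs with -100 keeps the center comparable with the Step 2 convention; averaging only over collections that actually heard an AP would be an equally defensible alternative.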

Thank you!

