Performance Improvement for Bayesian Classification on Spatial Data with P-Trees
Amal S. Perera, Masum H. Serazi, William Perrizo
Dept. of Computer Science, North Dakota State University
Fargo, ND 58105
These notes contain NDSU confidential and proprietary material. Patents pending on the P-Tree technology.
Outline
• Introduction
• P-Tree
• P-Tree Algebra
• Bayesian Classifier
• Calculating Probabilities using P-Trees
• Band-based vs. Bit-based Approach
• Sample Data
• Classification Accuracy
• Classification Time
• Conclusion
Introduction
• Classification is a form of data analysis and data mining that can be used to extract models describing important data classes or to predict future data trends.
• Some data classification techniques are: decision tree induction, Bayesian classification, neural networks, k-nearest neighbor, case-based reasoning, genetic algorithms, rough sets, and fuzzy logic techniques.
• A Bayesian classifier is a statistical classifier, which uses Bayes’ theorem to predict class membership as a conditional probability that a given data sample falls into a particular class.
Introduction Cont..
• The P-Tree data structure allows us to compute the Bayesian probability values efficiently, without resorting to the naïve Bayesian assumption.
• Bayesian classification with P-Trees has been used successfully in precision agriculture, to predict yield from remotely sensed imagery, and in genomics (yeast two-hybrid classification), to place in the ACM KDD-Cup 2002 competition. http://www.biostata.wisc.edu/~craven/kddcup/winners.html
• To completely eliminate the naïve assumption, a bit-based Bayesian classification is used instead of a band-based approach.
P-Tree
• Most spatial data comes in a band format called BSQ (Band Sequential).
• Each BSQ band is divided into several files, one for each bit position of the data values. This format is called ‘bit Sequential’ or bSQ.
• Each bSQ bit file, Bij (the file constructed from the jth bits of the ith band), is converted into a tree structure called a Peano Tree (P-Tree).
• P-Trees represent tabular data in a lossless, compressed, bit-by-bit, recursive, datamining-ready arrangement.
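The BSQ-to-bSQ decomposition can be sketched as follows (a hypothetical helper for illustration, not the authors' implementation):

```python
def to_bsq(band, nbits=8):
    """Split a list of band values into nbits bit files (bSQ format).

    bit_files[j][k] is the j-th most significant bit of the k-th value,
    so bit_files[0] is the file of most significant bits.
    """
    return [[(v >> (nbits - 1 - j)) & 1 for v in band]
            for j in range(nbits)]

# A tiny 4-pixel band of 8-bit reflectance values.
band = [255, 128, 7, 64]
bit_files = to_bsq(band)
print(bit_files[0])   # [1, 1, 0, 0] -- most significant bit file
print(bit_files[7])   # [1, 0, 1, 0] -- least significant bit file
```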
A bSQ file, its raster spatial file and P-Tree
[Figure: an 8x8 bSQ bit file shown as a raster grid and as a P-Tree. The 64-bit file is 1111110011111000111111001111111011111111111111111111111101111111; its P-Tree has root count 55 with level-1 quadrant counts 16, 8, 15, 16.]
Key terms: Peano or Z-ordering; pure (Pure-1/Pure-0) quadrant; root count; level; fan-out; QID (quadrant ID).
P-Tree Algebra
• Logical operators: AND, OR, complement, and others (XOR, etc.)
• Applying these operators, we calculate value P-Trees, interval P-Trees, and slice P-Trees.
Ptree: 55
        ____________/ / \ \___________
       /        ___ /     \___        \
      /        /              \        \
    16     ___8___          __15__     16
          / / | \           / | \ \
         3  0 4  1         4  4 3  4
       //|\    //|\             //|\
       1110    0010             1101

Complement: 9
        ____________/ / \ \___________
       /        ___ /     \___        \
      /        /              \        \
     0     ___8___          ___1__     0
          / / | \           / | \ \
         1  4 0  3         0  0 1  0
       //|\    //|\             //|\
       0001    1101             0010

(The ' mark indicates the COMPLEMENT operation.)
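The tree above can be reproduced with a small sketch (illustrative only; a real implementation would store the tree compactly and never materialize children of pure quadrants):

```python
def ptree(grid):
    """Build a P-Tree from a square bit grid (side a power of 2).

    Quadrants are visited in Peano (Z) order: NW, NE, SW, SE.  Each node
    is (count, children); pure-0/pure-1 quadrants and single bits are leaves.
    """
    n = len(grid)
    count = sum(sum(row) for row in grid)
    if count in (0, n * n) or n == 1:
        return (count, None)
    h = n // 2
    quads = [[row[c:c + h] for row in grid[r:r + h]]
             for r in (0, h) for c in (0, h)]
    return (count, [ptree(q) for q in quads])

def complement(node, size):
    """Complement a P-Tree: a count c over an n-bit quadrant becomes n - c."""
    count, children = node
    if children is None:
        return (size - count, None)
    return (size - count, [complement(c, size // 4) for c in children])

# The 8x8 example from the slides (64-bit bSQ file, root count 55).
bits = ("11111100" "11111000" "11111100" "11111110"
        "11111111" "11111111" "11111111" "01111111")
grid = [[int(b) for b in bits[r * 8:(r + 1) * 8]] for r in range(8)]
root = ptree(grid)
print(root[0], [c[0] for c in root[1]])   # 55 [16, 8, 15, 16]
comp = complement(root, 64)
print(comp[0], [c[0] for c in comp[1]])   # 9 [0, 8, 1, 0]
```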
P-Tree Algebra Cont..
• Basic P-Trees can be combined using logical operations to produce P-Trees for the original values at any level of bit precision. Using 8-bit precision for values, Pb11010011, which counts the number of occurrences of 11010011 in each quadrant, can be constructed from the basic P-Trees as:
Pb11010011 = Pb1 AND Pb2 AND Pb3' AND Pb4 AND Pb5' AND Pb6' AND Pb7 AND Pb8
(The AND operation is simply the pixel-wise AND of the bits.)
• Similarly, any data set in the relational format can be represented as P-Trees. For any combination of values, (v1,v2,…,vn), where vi is from band-i, the quadrant-wise count of occurrences of this combination of values is given by:
P(v1,v2,…,vn) = P1v1 ^ P2v2 ^ … ^ Pnvn
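The value-pattern construction can be sketched on flat bit vectors (a simplification: a real P-Tree AND operates on tree nodes and skips pure quadrants; the helper name is illustrative):

```python
def value_ptree(bit_files, value, nbits=8):
    """Bit vector marking pixels equal to `value`: AND together each
    basic bit file, complementing it wherever the target bit is 0."""
    result = [1] * len(bit_files[0])
    for j in range(nbits):
        bit = (value >> (nbits - 1 - j)) & 1
        result = [r & (b if bit else 1 - b)
                  for r, b in zip(result, bit_files[j])]
    return result

# Three pixels; two carry the slide's example value 11010011.
band = [0b11010011, 0b11010011, 0b01100001]
bit_files = [[(v >> (7 - j)) & 1 for v in band] for j in range(8)]
p = value_ptree(bit_files, 0b11010011)
print(p, sum(p))   # [1, 1, 0] 2  -- the root count RC is 2
```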
Bayesian Classifier
Pr(Ci | X) is the posterior probability
Pr(Ci) is the prior probability
Can find conditional probabilities, Pr(X|Ci).
Classify X with Max Pr(Ci | X)
Since Pr(X) is constant for all classes, it suffices to maximize Pr(X|Ci) * Pr(Ci) instead.
Based on Bayes' Theorem:
Pr(Ci | X) = Pr(X | Ci) * Pr(Ci) / Pr(X)
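A tiny numeric illustration of the decision rule (class names and probabilities are made up):

```python
# Posterior is proportional to Pr(X | Ci) * Pr(Ci); Pr(X) is the same
# for every class, so it can be ignored when picking the maximum.
likelihood = {"high_yield": 0.30, "low_yield": 0.05}   # Pr(X | Ci)
prior      = {"high_yield": 0.40, "low_yield": 0.60}   # Pr(Ci)
score = {c: likelihood[c] * prior[c] for c in prior}
predicted = max(score, key=score.get)
print(predicted)   # high_yield  (0.12 beats 0.03)
```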
Calculating Probabilities Pr(X|Ci)
Using the naïve assumption, Pr(X | Ci) = Pr(X1 | Ci) × Pr(X2 | Ci) × … × Pr(Xn | Ci). Scan the data and calculate Pr(X | Ci) for the given X.
Using P-Trees:
Pr(X|Ci) = # training samples in Ci having pattern X / # samples in class Ci
= RC[ P1(X1) ^ P2(X2) ^ … ^Pn(Xn) ^ PC(Ci) ] / RC[ PC(Ci) ]
Problem: what if RC[ P1(X1) ^ P2(X2) ^ … ^ Pn(Xn) ^ PC(Ci) ] = 0 for all i, i.e., the unclassified pattern does not exist in the training set?
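On flat bit vectors (one bit per training sample), the root-count formula above can be sketched as (illustrative names, not the authors' code):

```python
def pr_x_given_c(attr_ptrees, class_ptree):
    """Pr(X | Ci) = RC[P1(x1) ^ ... ^ Pn(xn) ^ PC(Ci)] / RC[PC(Ci)],
    where the root count RC is simply the number of 1 bits here."""
    match = class_ptree[:]
    for p in attr_ptrees:
        match = [m & b for m, b in zip(match, p)]
    rc_class = sum(class_ptree)
    return sum(match) / rc_class if rc_class else 0.0

# Six training samples; three belong to the class, two of those match X.
class_bits = [1, 1, 1, 0, 0, 0]                  # PC(Ci)
attrs = [[1, 0, 1, 1, 0, 0],                     # P1(x1)
         [1, 1, 1, 0, 1, 0]]                     # P2(x2)
print(pr_x_given_c(attrs, class_bits))   # 0.666... (= 2/3)
```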
Band-based P-Tree Approach
• When RC = 0 for the given pattern in every class:
– Reduce the restrictiveness of the pattern by removing the attribute with the least information gain.
– Recalculate (assuming attribute 2 has the least IG):
Pr( X | Ci ) = RC[ P1X1 ^ P3X3 ^ … ^ PnXn ^ PCCi ] / RC[ PCCi ]
• The information gain is calculated using P-Trees, as a one-time calculation over the entire training data.
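The band-based fallback can be sketched as follows (flat bit vectors again; `ig_order` lists attribute indices with least information gain first, and all names are illustrative):

```python
def band_fallback(attr_ptrees, class_ptree, ig_order):
    """If the full pattern has root count 0, drop the attribute with the
    least information gain and retry, until some training sample matches."""
    active = set(range(len(attr_ptrees)))
    for drop in [None] + list(ig_order):
        if drop is not None:
            active.discard(drop)
        match = class_ptree[:]
        for i in sorted(active):
            match = [m & b for m, b in zip(match, attr_ptrees[i])]
        if sum(match):
            return sum(match) / sum(class_ptree)
    return 0.0

# The exact 2-attribute pattern matches nothing, so attribute 1
# (assumed least IG) is dropped and Pr(X | Ci) is recomputed.
class_bits = [1, 1, 1, 1]
attrs = [[0, 0, 1, 1], [1, 1, 0, 0]]
print(band_fallback(attrs, class_bits, ig_order=[1]))   # 0.5
```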
Bit-based Approach
• Search for similar patterns by removing the least significant bits in the attribute space.
• The order of the bits to be removed is selected by calculating the info gain (IG).
[Figure: panels (a)-(d) showing the 2-attribute (G, R) search space, with each axis divided into 00, 01, 10, 11. The search region grows from a single cell in (a) to progressively larger regions in (b), (c), and (d) as pattern bits are removed.]
E.g., Calculate the Bayesian conditional probability value for the pattern [G,R] = [10,01] in 2-attribute space.
Assume IG for 1st significant bit of R < that of G.
Assume IG for 2nd significant bit of G < that of R.
Initially, search for the pattern, [10,01] (a).
If not found, search for [1_,01], removing the 2nd significant bit of G. The search space increases (b).
If not found, search for [1_,0_], removing the 2nd significant bit of R. The search space increases (c).
If not found, search for [1_,_ _], removing the 1st significant bit of R. The search space increases (d).
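The bit-removal search in (a)-(d) can be sketched as follows (plain tuples stand in for P-Trees; `drop_order` lists (attribute, bit) pairs with least information gain first, bit 0 being the least significant, and all names are illustrative):

```python
def bit_fallback(train, labels, cls, pattern, nbits, drop_order):
    """Widen the search one bit at a time: mask out the next pattern bit
    in increasing-information-gain order until some sample of class
    `cls` matches, then return the matching fraction of that class."""
    masks = [(1 << nbits) - 1] * len(pattern)        # all bits compared
    for d in [None] + list(drop_order):
        if d is not None:
            attr, bit = d
            masks[attr] &= ~(1 << bit)               # stop comparing this bit
        hits = sum(1 for x, y in zip(train, labels)
                   if y == cls and all((xi & m) == (pi & m)
                                       for xi, pi, m in zip(x, pattern, masks)))
        if hits:
            return hits / labels.count(cls)
    return 0.0

# Pattern [G, R] = [10, 01]; the slide's removal order is G's 2nd
# significant bit, then R's 2nd, then R's 1st.
train = [(0b11, 0b00), (0b10, 0b11)]
labels = ["hi", "hi"]
p = bit_fallback(train, labels, "hi", (0b10, 0b01), 2,
                 [(0, 0), (1, 0), (1, 1)])
print(p)   # 0.5
```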
Experiments
• The experimental data was extracted from two sets of aerial photographs of the Best Management Plot (BMP) of the Oakes Irrigation Test Area (OITA) near Oakes, North Dakota.
• The images were taken in 1997 and 1998.
• Each image contains 3 bands: red, green and blue reflectance values.
• Three other files contain synchronized soil moisture, nitrate and yield values.
Classification Accuracy
• The accuracy of the proposed bit-based approach is compared with the band-based approach and with KNN using Euclidean distance.
• It is clear that our approach outperforms the others.
[Chart: classification accuracy (%) for '97 data vs. training data size (1K, 4K, 16K, 65K, 260K pixels), comparing Band-Ptree, KNN-Euc., and Bit; the accuracy axis runs from 0 to 90.]
Classification Accuracy Cont..
• The accuracy of the approach was also compared to an existing Bayesian belief network classifier. The classifier is J Cheng's Bayesian Belief Network available at
http://www.cs.ualberta.ca/~jcheng/ .
– This classifier was the winning entry for the KDD Cup 2001 data mining competition. The developer claims that the classifier can perform with or without domain knowledge.
• For the comparison, smaller training data sets ranging from 4K to 16K pixels were used, due to the inability of the belief network implementation to handle larger data sets.
Accuracy:
Training Size (pixels) | Bit-Ptree Based | Bayesian Belief
4000                   | 66 %            | 26 %
16000                  | 67 %            | 51 %
The belief network was built without using any domain knowledge, to make it comparable with the P-Tree approach.
Classification Time
• P-Tree approach requires no build time (lazy classifier).
• In most lazy classifiers, the classification time per tuple varies with the number of items in the training set, because the training data must be scanned.
• P-Tree approach does not require a traditional data scan.
• The data in the figure was collected using 5 significant bits and a threshold probability of 0.85.
• The time is given for scalability comparisons.
[Chart: variation of classification time with training size for the bit-P-tree algorithm; training sample size (pixels) on the x-axis, classification time on the y-axis, both running from 0 to 300.]
Conclusion
• Naïve assumption reduces the accuracy of the classification in this particular application domain.
• Our approach increases the accuracy of a P-Tree Bayesian classifier by completely eliminating the naïve assumption.
– The new approach has better accuracy than the existing P-Tree-based Bayesian classifier.
– It was also shown to be better than a Bayesian belief network implementation and a Euclidean-distance-based KNN approach.
• It has the same computational cost with respect to the use of P-Tree operations as the previous P-tree approach, and is scalable with respect to the size of the data set.