Delineating Metropolitan Housing Submarkets with Fuzzy Clustering Methods
Julie Sungsoon Hwang, Department of Geography, University of Washington
Jean-Claude Thill, Department of Geography, State University of New York at Buffalo
November 10, 2005, North American Meetings of Regional Science Association International
Outline
• Research objectives
• Methodology: specification
• Methodology: illustration
• Evaluating the performance of fuzzy clustering
• Conclusions
Research objectives
• Demonstrate the use of the fuzzy c-means (FCM) algorithm for delineating housing submarkets
  – Comparison to K-means
• Discuss empirical characteristics of FCM in this application, in particular the choice of parameters
  – Cluster validity index
Challenges
• Are the boundaries of clusters crisp?
[Figure: two schematic scatter plots in X1-X2 attribute space, one for the housing market in metropolitan area p and one for the housing market in metropolitan area q, each showing Clusters A, B and C]
• Our task is to group census tracts into homogeneous housing submarkets within a metropolitan area
• Using the fuzzy c-means algorithm
• In order to examine whether fuzzy set-based clustering can do a better job
• Implemented in 85 metropolitan areas
• Most of the data sets are public (e.g. 2000 Census)
• The whole procedure is automated in GIS
Methodology: flow chart
[Flow chart, repeated for each metropolitan area; scale labels shown: National, Regional, Local, …; Metro]
• Census Tract Layer: n tracts described by the candidate variables x1, x2, x3, …, xm
• Stepwise regression (k ≤ m) reduces the candidate variables to the significant variables y1, y2, …, yk
• Cluster Analysis groups the n tracts into clusters 1, 2, …, c (c ≤ n)
  – K-means produces the Hard Cluster Layer, with memberships of 0 or 1:
    #   U1    U2    …   Uc
    1   1     0     …   0
    2   0     1     …   0
    …   0     1     …   0
    n   0     0     …   1
  – Fuzzy C-means produces the Fuzzy Cluster Layer, with memberships between 0 and 1:
    #   U1    U2    …   Uc
    1   0.85  0.05  …   0.10
    2   0.12  0.80  …   0.05
    …   0.02  0.74  …   0.12
    n   0.40  0.03  …   0.50
Legend:
k: # selected variables
c: # submarkets
Uj: membership to cluster j
Explanatory variables for house price

Var_Name   Variable Definition              Data      Year   Spatial Unit
Socioeconomic/demographic Characteristics of Residents
pcincome   per capita income                Census    2000   Census Tract
college    % college degree                 Census    2000   Census Tract
managep    % management workers             Census    2000   Census Tract
prodp      % production workers             Census    2000   Census Tract
famcpchl   % family with children           Census    2000   Census Tract
nfmalone   % nonfamily living alone         Census    2000   Census Tract
black_p    % black                          Census    2000   Census Tract
nhwht_p    % non-hispanic white             Census    2000   Census Tract
nativebr   % native born                    Census    2000   Census Tract
Structural Characteristics of Housing Units
medroom    median number of rooms           Census    2000   Census Tract
hudetp     % detached housing unit          Census    2000   Census Tract
yrhublt    median year structure built      Census    2000   Census Tract
Locational Characteristics (Amenities) of Neighborhoods
ptratio    pupil to teacher ratio           NCES*     2002   School District
schexp     school expenditure per student   NCES      2002   School District
vrlcrime   violent crime rate               FBI**     2003   Designated Place
prpcrime   property crime rate              FBI       2003   Designated Place
jobacm     job accessibility (Hansen 1959)  CTPP***   2000   Census Tract

*NCES: National Center for Education Statistics
**FBI annual report “Crime in the U.S. 2003”
***CTPP: Census Transportation Planning Package
Dependent variable: median home value of owner-occupied housing units
Study set: 85 metropolitan areas
[Map: metropolitan areas (CMSA and MSA) and state boundaries, with the study set highlighted; scale bar in miles. Source: TIGER/Line 1999]
What is fuzzy c-means (FCM)?
• Clustering method that minimizes the following objective function:

$$\sum_{k=1}^{n} \sum_{i=1}^{c} (u_{ik})^m \, \lVert x_k - v_i \rVert_A^2$$

• Updates cluster means v_i and membership degrees u_ik until the algorithm converges:

$$v_i = \frac{\sum_{k=1}^{n} (u_{ik})^m \, x_k}{\sum_{k=1}^{n} (u_{ik})^m} \qquad \text{(III-3a)}$$

$$u_{ik} = \left[ \sum_{j=1}^{c} \left( \frac{\lVert x_k - v_i \rVert_A}{\lVert x_k - v_j \rVert_A} \right)^{2/(m-1)} \right]^{-1} \qquad \text{(III-3b)}$$

  – x_k: vector of data point k, 1 ≤ k ≤ n
  – v_i: center of cluster i, 1 ≤ i ≤ c
  – u_ik: membership degree of data point k with cluster i, in [0, 1]
  – m: fuzziness amount associated with assigning data point k to cluster i, 1 ≤ m < ∞

[Figure: scatter plot of data points in x1-x2 space]
Source: Bezdek 1981
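The two update equations can be tried by hand on a tiny example. The Python sketch below is illustrative only: it assumes the Euclidean norm (A taken as the identity matrix), stores U with one row per cluster, and uses hypothetical function names and one-dimensional data that do not come from the study.

```python
def sq_dist(x, v):
    """Squared Euclidean distance (the A-norm with A = identity, an assumption)."""
    return sum((a - b) ** 2 for a, b in zip(x, v))


def update_centers(X, U, m):
    """(III-3a): each center is the mean of the data weighted by u_ik ** m."""
    centers = []
    for u_i in U:  # U[i][k] is the membership of point k in cluster i
        w = [u ** m for u in u_i]
        total = sum(w)
        centers.append([sum(wk * x[d] for wk, x in zip(w, X)) / total
                        for d in range(len(X[0]))])
    return centers


def update_memberships(X, V, m):
    """(III-3b): needs m > 1 and no data point lying exactly on a center."""
    power = 1.0 / (m - 1.0)  # applied to squared distances, i.e. d ** (2 / (m - 1))
    U = [[0.0] * len(X) for _ in V]
    for k, x in enumerate(X):
        d2 = [sq_dist(x, v) for v in V]
        for i in range(len(V)):
            U[i][k] = 1.0 / sum((d2[i] / d2[j]) ** power for j in range(len(V)))
    return U


# Hypothetical one-dimensional data and starting centers.
X = [[1.0], [3.0]]
V = [[0.0], [4.0]]
U = update_memberships(X, V, m=2.0)
V_next = update_centers(X, U, m=2.0)
print(U)
print(V_next)
```

With centers at 0.0 and 4.0 and m = 2, the point at 1.0 has squared distances 1 and 9 to the two centers, so it receives membership 0.9 in the first cluster and 0.1 in the second.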
Extended fuzzy c-means algorithm
• Step 1: Initialize the parameters related to fuzzy partitioning: c = 2 (2 ≤ c ≤ cmax), m = 1 (1 ≤ m ≤ mmax), where c is an integer and m is a real number; fix m_inc, the increment of m (0 < m_inc ≤ 0.1); fix the cut-off threshold L; choose a validity index v
• Step 2: Given c and m, initialize U(0) as a fuzzy partition matrix. Then at step l, l = 0, 1, 2, …:
• Step 3: Calculate the c fuzzy cluster centers {vi(l)} with (III-3a) and U(l)
• Step 4: Update U(l+1) using (III-3b) and {vi(l)}
• Step 5: Compare U(l) to U(l+1) in a convenient matrix norm; if || U(l+1) – U(l) || ≤ L, go to Step 6; otherwise return to Step 3
• Step 6: Compute the validity index for the given c and m
• Step 7: If c < cmax, then increase c ← c + 1 and go to Step 3; otherwise go to Step 8
• Step 8: If m < mmax, then increase m ← m + m_inc and go to Step 3; otherwise go to Step 9
• Step 9: Obtain the optimal validity index from the values computed in Step 6, the optimal number of clusters c*, and the optimal fuzziness exponent m*; the optimal fuzzy partition U is obtained given c* and m*
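The steps above can be sketched as a parameter sweep in Python. This is a minimal illustration under stated assumptions, not the authors' GIS implementation: it uses the Euclidean norm, a random initial partition, the largest absolute change in U as the "convenient matrix norm", the Xie-Beni index as the validity index, and explicit lists of c and m values restricted to m > 1 (the exponent in (III-3b) is undefined at m = 1). All function names and data are hypothetical.

```python
import random


def sq_dist(x, v):
    return sum((a - b) ** 2 for a, b in zip(x, v))


def fcm(X, c, m, tol=1e-6, max_iter=200, seed=0):
    """Steps 2 to 5: alternate (III-3a) and (III-3b) until U stops changing."""
    rng = random.Random(seed)
    n, dim = len(X), len(X[0])
    U = [[rng.random() + 1e-9 for _ in range(n)] for _ in range(c)]
    for k in range(n):  # memberships of each point must sum to 1
        s = sum(U[i][k] for i in range(c))
        for i in range(c):
            U[i][k] /= s
    V = []
    for _ in range(max_iter):
        V = []
        for i in range(c):
            w = [u ** m for u in U[i]]
            V.append([sum(wk * x[d] for wk, x in zip(w, X)) / sum(w)
                      for d in range(dim)])
        U_new = [[0.0] * n for _ in range(c)]
        for k, x in enumerate(X):
            d2 = [max(sq_dist(x, v), 1e-12) for v in V]  # guard against zero distance
            for i in range(c):
                U_new[i][k] = 1.0 / sum((d2[i] / d2[j]) ** (1.0 / (m - 1.0))
                                        for j in range(c))
        change = max(abs(U_new[i][k] - U[i][k]) for i in range(c) for k in range(n))
        U = U_new
        if change <= tol:
            break
    return U, V


def xie_beni(X, U, V):
    """Step 6: Xie-Beni index; smaller values indicate a better partition."""
    c, n = len(V), len(X)
    compact = sum(U[i][k] ** 2 * sq_dist(X[k], V[i]) for i in range(c) for k in range(n))
    sep = min(sq_dist(V[i], V[j]) for i in range(c) for j in range(c) if i != j)
    return float("inf") if sep == 0.0 else compact / (n * sep)


def sweep(X, c_values, m_values):
    """Steps 7 to 9: evaluate every (c, m) pair and keep the smallest index."""
    best = None
    for m in m_values:
        for c in c_values:
            U, V = fcm(X, c, m)
            score = xie_beni(X, U, V)
            if best is None or score < best[0]:
                best = (score, c, m, U)
    return best


# Hypothetical data: two tight, well-separated groups of points.
X = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
score, c_star, m_star, U_star = sweep(X, [2, 3], [1.5, 2.0])
print(c_star, m_star, score)
```

Restarting from a fresh random partition for every (c, m) pair keeps the runs independent; a production version would likely try several seeds per pair, since FCM can settle in a local minimum.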
Cluster validity indices
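Partition coefficient

$$PC(U) = \frac{1}{n} \sum_{i=1}^{c} \sum_{k=1}^{n} (u_{ik})^2$$

Partition entropy

$$PE(U) = -\frac{1}{n} \sum_{i=1}^{c} \sum_{k=1}^{n} \left[ u_{ik} \log(u_{ik}) \right]$$

Xie-Beni index

$$U_{XB} = \frac{\sum_{k=1}^{n} \sum_{i=1}^{c} (u_{ik})^2 \, \lVert x_k - v_i \rVert_A^2}{n \, \min_{i \neq j} \lVert v_i - v_j \rVert_A^2}$$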
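SVi index

$$S_{VI} = \frac{\sum_{i=1}^{c} \left[ \sum_{k=1}^{n} (u_{ik})^m \, \lVert x_k - v_i \rVert_A^2 \Big/ \sum_{k=1}^{n} u_{ik} \right]}{\sum_{i=1}^{c+1} \sum_{j=1,\, j \neq i}^{c+1} (\mu_{ij})^w \, \lVert z_j - z_i \rVert_A^2}$$

$$\mu_{ij} = \left[ \sum_{l=1,\, l \neq j}^{c+1} \left( \frac{\lVert z_j - z_i \rVert_A}{\lVert z_j - z_l \rVert_A} \right)^{2/(w-1)} \right]^{-1}, \quad 1 \le i \le c+1,\ 1 \le j \le c+1,\ j \neq i$$

$$z = [z_1, z_2, \ldots, z_c, z_{c+1}]^T = [v_1, v_2, \ldots, v_c, \bar{x}]^T$$

where w is set to 2 in this study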
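The partition coefficient and partition entropy depend only on U, so their limiting values are easy to check. The Python sketch below is illustrative: the function names are hypothetical, the natural logarithm is an assumption (the base is not legible in the slide), and 0 · log 0 is treated as 0.

```python
import math


def partition_coefficient(U):
    """PC(U): mean over data points of the squared memberships summed over clusters."""
    n = len(U[0])  # U[i][k] is the membership of point k in cluster i
    return sum(u * u for row in U for u in row) / n


def partition_entropy(U):
    """PE(U) with the natural log; terms with u = 0 contribute nothing."""
    n = len(U[0])
    return -sum(u * math.log(u) for row in U for u in row if u > 0.0) / n


# A crisp partition (as K-means would give) and a maximally fuzzy one, c = 3.
hard = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
uniform = [[1.0 / 3.0] * 3 for _ in range(3)]

print(partition_coefficient(hard), partition_entropy(hard))
print(partition_coefficient(uniform), partition_entropy(uniform))
```

A crisp partition gives PC = 1 and PE = 0, while equal memberships of 1/c give PC = 1/c and PE = log c, so PC falls and PE rises as the partition becomes fuzzier.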
Determining c* and m*
• Selected validity indices are calibrated over the study set
• Xie-Beni index is recommended as a validity index
• Average m* is 1.38
[Figure: index value against number of clusters c (2 to 15) for UXB, PC, PE and SVI/100]
[Figure: index value against fuzziness amount m (1.0 to 1.9) for UXB and SVI/100]
Dimensionality of Buffalo housing market
Predictor                        Coefficient   Standard Error   t-statistic   p-value
Constant                         -1455768      164417           -8.85         0.000
Per capita income                2.3667        0.2791           8.48          0.000
% college degree                 88221         11346            7.78          0.000
% family: couple with children   65735         18775            3.50          0.001
% detached housing unit          -31260        5527             -5.66         0.000
Housing age (year)               692.88        80.26            8.63          0.000
% non-hispanic white             11186         3914             2.86          0.005
% native born status             130039        31111            4.18          0.000
Job accessibility                -0.05266      0.02227          -2.36         0.019
Hedonic regression equation of median home value in Buffalo, NY
Adjusted R sq = 84.3%
Optimal number of housing submarkets c* and optimal fuzziness amount m*, Buffalo, NY

c \ m   1.0      1.1      1.2      1.3      1.4      1.5      1.6      1.7      1.8      1.9
2       0.4735   0.4570   0.4380   8.0983   10.4115  12.5478  14.4334  16.0634  17.4645  18.6721
3       0.4136   0.3889   0.3460   0.3385   10.7864  12.9137  14.7939  16.4217  17.8290  19.0553
4       0.7802   0.7116   0.6080   0.5241   1.3154   6.8837   7.4807   8.0441   8.5632   9.0391
5       0.5560   0.5622   0.5940   0.6121   0.4683   0.3404   0.6489   0.6850   0.7206   0.7555
6       0.6223   0.7578   1.0187   0.8173   0.6907   1.3393   1.4074   1.4819   1.5595   1.6382
7       0.8836   0.6903   0.6881   0.6016   0.6148   0.9515   2.4397   2.6306   2.8317   3.0383
8       0.5981   0.5888   0.5703   0.5232   0.3992   0.7381   0.8910   1.2388   1.2926   1.3538
9       0.9645   0.6160   0.4836   0.4866   0.8449   1.4020   1.4198   1.8317   1.8639   1.9161
10      0.7053   0.6004   0.6619   0.5873   0.5868   1.3465   1.5081   1.6875   1.8215   1.8591
c*      3        3        3        3        8        5        5        5        5        5

Values in the cells represent the Xie-Beni index for the given c and m
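The c* row and the overall optimum can be recomputed from the table, since a smaller Xie-Beni value indicates a better partition. A short Python sketch, with the index values copied from the table and all variable names chosen for illustration:

```python
# Xie-Beni index values for Buffalo, NY (keys: c, list positions: m = 1.0 ... 1.9).
m_values = [1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9]
xb = {
    2: [0.4735, 0.4570, 0.4380, 8.0983, 10.4115, 12.5478, 14.4334, 16.0634, 17.4645, 18.6721],
    3: [0.4136, 0.3889, 0.3460, 0.3385, 10.7864, 12.9137, 14.7939, 16.4217, 17.8290, 19.0553],
    4: [0.7802, 0.7116, 0.6080, 0.5241, 1.3154, 6.8837, 7.4807, 8.0441, 8.5632, 9.0391],
    5: [0.5560, 0.5622, 0.5940, 0.6121, 0.4683, 0.3404, 0.6489, 0.6850, 0.7206, 0.7555],
    6: [0.6223, 0.7578, 1.0187, 0.8173, 0.6907, 1.3393, 1.4074, 1.4819, 1.5595, 1.6382],
    7: [0.8836, 0.6903, 0.6881, 0.6016, 0.6148, 0.9515, 2.4397, 2.6306, 2.8317, 3.0383],
    8: [0.5981, 0.5888, 0.5703, 0.5232, 0.3992, 0.7381, 0.8910, 1.2388, 1.2926, 1.3538],
    9: [0.9645, 0.6160, 0.4836, 0.4866, 0.8449, 1.4020, 1.4198, 1.8317, 1.8639, 1.9161],
    10: [0.7053, 0.6004, 0.6619, 0.5873, 0.5868, 1.3465, 1.5081, 1.6875, 1.8215, 1.8591],
}

# For each m, the number of clusters with the smallest index value (the c* row).
c_star_by_m = [min(xb, key=lambda c: xb[c][j]) for j in range(len(m_values))]

# Overall optimum: the (c, m) cell with the smallest index value.
best_value, c_star, m_star = min(
    (xb[c][j], c, m_values[j]) for c in xb for j in range(len(m_values))
)

print(c_star_by_m)  # [3, 3, 3, 3, 8, 5, 5, 5, 5, 5]
print(c_star, m_star, best_value)  # 3 1.3 0.3385
```

The smallest value in the table, 0.3385, sits at c = 3 and m = 1.3, which matches the c* = 3 and m* = 1.3 reported for Buffalo.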
[Figure: cluster mean (axis from -1.5 to 1.5) for each element of the attribute vector (ZPCINCOME, ZCOLLEGE, ZFAMCPCHL, ZHUDETP, ZYRHUBLT, ZNHWHT_P, ZNATIVEBR, ZJOBACM), plotted for Cluster 1, Cluster 2 and Cluster 3; c* = 3; m* = 1.3]
Buffalo housing submarkets
[Maps of census tracts with interstate highways; tracts without data are shown as "No Data"]
(A) Membership to Cluster 1, membership degree in ten classes from 0 to 1
(B) Membership to Cluster 2, membership degree in ten classes from 0 to 1
(C) Membership to Cluster 3, membership degree in ten classes from 0 to 0.986
(D) Defuzzified Clusters 1, 2, 3
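The slides do not state how the defuzzified clusters in panel (D) were derived. A common convention, assumed here, is to assign each tract to the cluster in which its membership degree is largest. The Python sketch below uses hypothetical membership values and a hypothetical function name.

```python
def defuzzify(memberships):
    """For each tract, return the 1-based cluster with the largest membership."""
    return [max(range(len(row)), key=row.__getitem__) + 1 for row in memberships]


# Hypothetical membership degrees of four tracts in three clusters (rows sum to 1).
tract_memberships = [
    [0.85, 0.05, 0.10],
    [0.12, 0.80, 0.08],
    [0.02, 0.74, 0.24],
    [0.40, 0.10, 0.50],
]

labels = defuzzify(tract_memberships)
print(labels)  # [1, 2, 2, 3]
```

The last tract shows what defuzzification discards: it is labeled Cluster 3 even though its membership in Cluster 1 (0.40) is nearly as large as its membership in Cluster 3 (0.50).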
Compare FCM with K-means (KM)
• Compare the sum of squared error derived from KM (m=1) and FCM (m=m*) given c*:

$$\sum_{k=1}^{n} \sum_{i=1}^{c} (u_{ik})^2 \, \lVert x_k - v_i \rVert_A^2$$

Paired Samples Statistics
Pair 1    Mean       N    Std. Deviation   Std. Error Mean
j2_hcm    1026.546   85   3848.268377      417.4033
j2_fcm    745.7332   85   3022.266891      327.8109

Paired Samples Test (Pair 1: j2_hcm - j2_fcm)
Paired Differences: Mean 280.8133; Std. Deviation 915.57126275; Std. Error Mean 99.30765
95% Confidence Interval of the Difference: Lower 83.32912; Upper 478.2974
t = 2.828; df = 84; Sig. (2-tailed) = .006

• Fuzzy clustering outperforms crisp clustering
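The reported t statistic can be reproduced from the summary figures in the paired-samples output. The Python sketch below is illustrative: the function names are hypothetical, the criterion uses the Euclidean norm as an assumption, and the small one-dimensional example passed to `j2` is made up.

```python
import math


def j2(X, U, V):
    """Sum over points and clusters of u_ik ** 2 times squared Euclidean distance."""
    return sum(
        U[i][k] ** 2 * sum((a - b) ** 2 for a, b in zip(X[k], V[i]))
        for i in range(len(V))
        for k in range(len(X))
    )


def paired_t(mean_diff, sd_diff, n):
    """Paired-samples t statistic and standard error from summary statistics."""
    se = sd_diff / math.sqrt(n)
    return mean_diff / se, se


# Hypothetical crisp partition of two points, to show how the criterion is evaluated.
example = j2([[0.0], [2.0]], [[1.0, 0.0], [0.0, 1.0]], [[0.5], [2.0]])

# Summary figures reported for j2_hcm - j2_fcm over the 85 metropolitan areas.
t_stat, std_err = paired_t(280.8133, 915.57126275, 85)
print(example)
print(round(t_stat, 3), round(std_err, 5))
```

The computed values agree with the reported t = 2.828 and standard error of 99.30765 on 84 degrees of freedom.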
Conclusions
• Fuzzy set theory provides a mechanism for handling the uncertainty involved in classification tasks
• The fuzzy c-means algorithm is of practical use in delineating housing submarkets
• Fuzzy set theory needs further attention in social science fields
• More work on the choice of parameters is needed