

ARTICLE IN PRESS

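0925-2312/$ - see front matter © 2007 Elsevier B.V. All rights reserved.
doi:10.1016/j.neucom.2007.07.031

*Corresponding author. Tel.: +34 967 599 200; fax: +34 967 599 220.
E-mail addresses: [email protected] (N. García), Matias.Gamez@uclm.es (M. Gámez), [email protected] (E. Alfaro).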

Neurocomputing 71 (2008) 733–742

www.elsevier.com/locate/neucom

ANN+GIS: An automated system for property valuation

Noelia García, Matías Gámez, Esteban Alfaro*

Faculty of Business and Economics, University of Castilla-La Mancha, Plaza de la Universidad 1, 02071 Albacete, Spain

Abstract

Although property valuation models have become an important paradigm in real estate market research, the results of the best-known approaches are limited by various data-related problems, such as the non-linearity of relationships, the presence of noise, or the absence of necessary information. This paper focuses on overcoming these obstacles. We introduce an automated system for property valuation that combines artificial neural network models with a geographic information system, two tools that have both shown their potential usefulness in economic research. The artificial neural network models used in this work are the multilayer perceptron, the radial basis function network, and Kohonen's maps.

© 2007 Elsevier B.V. All rights reserved.

Keywords: Artificial neural networks; Geographic information systems; Housing prices

1. Introduction

The task of valuing housing properties has been developed largely within real estate market analysis. This regression problem has usually been tackled econometrically through hedonic and repeat-sales models, both of which belong to the transaction-based approach [23]. The hedonic pricing model was first developed by Rosen [18] in 1974; it is a linear regression approach in which the property price is determined as the weighted sum of the different characteristics that make up the property. The other approach, the repeat-sales model, was introduced by Bailey et al. [3] in 1963. This model has been applied far less than the hedonic model because the information required to implement it is difficult to find.
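The hedonic model described above treats the price as an intercept plus a weighted sum of the property's characteristics, and those weights can be estimated by ordinary least squares. A minimal NumPy sketch; the two characteristics, the "true" weights and the noiseless toy data are hypothetical, chosen only so that the fit visibly recovers the weights.

```python
import numpy as np

def fit_hedonic(X: np.ndarray, prices: np.ndarray):
    """Estimate price = b0 + sum_k b_k * x_k by ordinary least squares."""
    design = np.column_stack([np.ones(len(X)), X])   # intercept column first
    coef, *_ = np.linalg.lstsq(design, prices, rcond=None)
    return coef[0], coef[1:]

def predict_hedonic(intercept: float, weights: np.ndarray, X: np.ndarray) -> np.ndarray:
    # Each price is the intercept plus the weighted sum of characteristics.
    return intercept + X @ weights

# Hypothetical, noiseless toy data: usable surface (m^2) and bedrooms.
rng = np.random.default_rng(0)
surface = rng.uniform(50, 150, size=200)
bedrooms = rng.integers(1, 5, size=200)
X = np.column_stack([surface, bedrooms])
prices = 20_000 + 1_200 * surface + 5_000 * bedrooms

intercept, weights = fit_hedonic(X, prices)
```

Because the model is linear in the characteristics, it cannot represent effects such as the non-monotonic influence of age discussed later, which is the motivation for the neural models.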

More recently, there have been some successful attempts from the field of geostatistics [8]. Additionally, since the pioneering work of Borst [5] in 1991, artificial neural network (ANN) models have become a very attractive alternative to the more traditional econometric models. The main advantage of these techniques is their ability to deal with non-linear relationships or initially unknown models. Other works that must be mentioned are Tay and Ho [20], Do and Grudnitski [6], Worzala et al. [22], McCluskey [14], and Nguyen and Cripps [16]. The latter concludes that ANN performs better than multiple-regression analysis if a sufficiently large training sample is provided. On the other hand, whichever approach is used, the analysis can be improved through integration with a geographic information system (GIS). As Thruston [21] stated, ANN linked to GIS can be used to simulate how the human brain processes spatial data problems. There are many applications in which ANN coupled with GIS has turned out to be very useful, for instance land use, oceanography, forestry, consumer movement, and airport noise evaluation.

The aim of this work is to show how different ANN models and a geographic information system can be combined into a very powerful tool for economic research, specifically for the design of an automated property appraisal system and for other complex tasks related to the real estate market (e.g., the objective assignment of a quality level to each property, which clearly has a large impact on the market value). To reach these goals, we will use three of the best-known ANN models [4,10,13,17]: the multilayer perceptron (MLP), radial basis function (RBF) networks, and self-organizing feature maps (SOFM), also known as Kohonen's maps [12]. The first two models represent a very interesting alternative to traditional methods for regression and supervised classification, whereas SOFM are specially designed for clustering tasks. We will therefore use the MLP and RBF models for the regression task of estimating housing prices, and MLP and Kohonen's maps for intermediate tasks relating to the imputation of missing values of various qualitative variables, such as the quality of the property.

Fig. 1. Street map of Albacete.

Table 1
Confusion matrix for MLP and SOFM models

                     Assigned class
                     MLP                         SOFM
Real class           Bad  Standard  Very good    Bad  Standard  Very good  No class
Training
  Bad                 43         1          0     30        10          0         4
  Standard             7       203          2      5       198          5         4
  Very good            0         8         24      2        15         14         1
Validation
  Bad                 14         5          0     13         6          0         0
  Standard             5        96          0      6        89          5         1
  Very good            2         7          8      0        13          4         0
Test
  Bad                 17         3          0     14         5          1         0
  Standard             5        91          3      1        90          3         5
  Very good            0         3         15      0         9          8         1

The second section of this article describes the crucial problem to be solved, i.e. the estimation of free-market housing prices in the city of Albacete (Spain), and provides essential details of the sample as well as the preliminary data processing. Section 3 deals with the implementation of the neural models and underlines the most important results. In Section 4, we combine the most accurate model with the geographic information system by creating a computer program in the SciViews graphics environment of the free software R; this combination provides the automated valuation system. Finally, Section 5 discusses some experimental results and suggests some changes in the design of databases related to real estate markets so that the procedure proposed in the previous section could be used more widely.

2. Problem description

As we have already mentioned, the main objective of this paper is to develop an automated valuation system. Such a system must be able to estimate the market value of a property from information about its location and other characteristics that may influence it. The starting point is to compile the database. Because there is no official information on relevant characteristics such as quality, parking, or heating, the available information was obtained by a sampling procedure from data supplied by real estate agencies. After much hard work, we obtained 591 sample cases corresponding to real market transactions conducted in Albacete in 2002. For more details, see [9]. The sample records contain a wealth of information about the following explanatory variables:

• Property type: a qualitative variable with the value 0 (apartments) or 1 (single-family houses).
• Location: the postal address is converted into two numerical variables, CoordX and CoordY, using the location of the exact point on the geo-referenced¹ street map.
• Age: expressed in years.
• Surface: measured in usable square metres.
• Number of bedrooms: number of rooms apart from the living room.
• Bathrooms: a numerical variable obtained by adding one point for each complete bathroom and half a point for each incomplete bathroom (without a bath or shower).
• Lift: a qualitative variable with a value of 1 if there is a lift and 0 if not.
• Balcony: a qualitative variable with a value of 1 if there is a balcony of more than 15 m² and 0 otherwise.
• Heating: a qualitative variable with a value of 0 if there is no heating system and 1 if there is.
• Quality: a qualitative variable with three categories:
  ○ Bad, for old houses built with poor-quality materials and in a bad state of repair.
  ○ Standard, for new or semi-new houses built to standard levels of quality and for old renovated properties.
  ○ Very good, for top-quality new houses.
• Parking: a quantitative variable that indicates the number of parking spaces.
• Storage room: a qualitative variable with a value of 0 if there is no storage room and 1 if there is.
• GabLod: in addition to the information provided by the agencies, the use of GIS has enabled us to include another variable that measures the distance to the widely accepted city centre, i.e. Plaza de Gabriel Lodares.²
• Total housing price³: from this variable we have obtained a new one, the square metre price, as the ratio between the total price and the usable surface. These two variables are the dependent variables, i.e. the outputs of the different neural models that will be designed.

Fig. 2. MLP 14:16–5–1:1 architecture. (Input nodes: Property type, Coord.X, Coord.Y, Age, Surface, Bedrooms, Bathrooms, Lift, Balcony, Heating, Quality, Parking, Storage room, GabLod; output node: Total housing price.)

¹This tool was provided by the Teledetection and GIS section of the Instituto de Desarrollo Regional, with the necessary authorization from the Mayor of Albacete.
²The square Plaza de Gabriel Lodares has been taken as the city centre after ruling out other points such as the Plaza del Altozano or the City Hall. The reason for this is that these "historical centres" have been displaced by the growth of the city, which has been restricted towards the north by the railway lines.

Table 2
Regression statistics

                                      Price/m²                  Total price
                                      MLP          RBF          MLP          RBF
Data mean                             1285.9390    1292.1970    139 967.7    135 204.4
Data SD                               296.0914     301.5658     46 032.14    47 511.7
Error mean                            18.3310      −24.8083     −271.8061    −1183.905
Error SD                              139.3686     153.6088     13 049.79    17 601.32
Absolute error mean                   98.6936      116.802      9085.213     13 220.65
Relative error mean (%)               7.6748       9.0390       6.4909       9.7783
SD ratio                              0.4707       0.5093       0.2835       0.3705
Correlation (target and estimation)   0.8833       0.8606       0.9591       0.9317
R²                                    0.8129       0.7406       0.9196       0.8628
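Three of the variables in the list above are derived by simple arithmetic: the bathroom score, the GabLod distance and the square metre price. A minimal sketch, assuming projected map coordinates and a straight-line (Euclidean) distance for GabLod, since the paper does not say how the GIS measures it; the coordinates used for Plaza de Gabriel Lodares are hypothetical placeholders.

```python
import math

def bathroom_score(complete: int, incomplete: int) -> float:
    # One point per complete bathroom, half a point per incomplete one
    # (without a bath or shower).
    return complete + 0.5 * incomplete

def distance_to_centre(x: float, y: float, centre: tuple) -> float:
    # ASSUMPTION: straight-line distance on projected map coordinates;
    # the paper does not say how the GIS measures GabLod.
    cx, cy = centre
    return math.hypot(x - cx, y - cy)

def price_per_m2(total_price: float, usable_surface: float) -> float:
    # Square metre price = total price / usable surface.
    return total_price / usable_surface

# Hypothetical projected coordinates for Plaza de Gabriel Lodares.
GABLOD_XY = (600_000.0, 4_316_000.0)
```

For example, a flat with one full bathroom and one toilet scores `bathroom_score(1, 1)`, i.e. 1.5.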

Fig. 1 reproduces the electronic map of Albacete, with the dots representing the 591 sample records.

It is worth mentioning that further information was requested from the agencies, such as the main orientation of the building or the presence of recreational areas, swimming pools, gardens, etc. Unfortunately, however, much of the information about these variables was missing, so they could not be used. The remaining variables, which had a reasonable number of missing values, were completed using the k-nearest-neighbour technique for the quantitative variables and ANN models⁴ (MLP and SOFM), treated as classification tasks, for the qualitative variables.

In what follows, we briefly describe the procedure for the quality variable. This variable gathers a great deal of information and can often be very subjective. For 29 cases in the total sample, the agencies had not labelled the level of quality and there was not enough information to assign it (home improvements, condition of the floors, carpentry, windows, etc.), so we used the remaining 562 cases to develop a network capable of estimating the missing data. To measure the true error, and thus select the best networks in terms of their generalization capacity, the available samples were divided into three different sets: the training set (288 cases), the validation set (137 cases), and the test set (137 cases). After many trials, the two best networks were selected: a 14:14–6–3:1 MLP (14 nodes in the input layer, a hidden layer with six nodes, and one node for each quality level in the output layer), and a 14–49 SOFM⁵ (14 units in the input layer and 49 nodes in the competition layer). Table 1 shows the confusion matrix for both models.

³It is important to point out that these prices come from true buying and selling operations rather than the offer prices that are largely used in work on this subject.
⁴The neural models have been estimated using TRAJAN software.

Fig. 3. Correlation between target and estimated prices.

Fig. 4. Error histogram for the test set. (Horizontal axis: Error, in bins of width 5000 from −50 000 to 40 000; vertical axis: Frequencies, 0–40.)

Table 3
Sensitivity analysis

Ranking  Variable   Ratio
1        Surface    2.6978
2        Heating    2.1082
3        Lift       1.9416
4        GabLod     1.7586
5        Type       1.6852
6        Parking    1.5713
7        Storage    1.4987
8        Quality    1.4074
9        Balcony    1.2087
10       Age        1.1859
11       Coord. Y   1.1514
12       Bedroom    1.1129
13       Coord. X   1.0972
14       Bathroom   1.0486

It can be seen that all input variables passed the sensitivity analysis (every ratio is greater than 1).
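The text above states that quantitative gaps were filled with a k-nearest-neighbour technique but gives no details. A minimal NumPy sketch, assuming a root-mean-square distance over the features both records share, the mean of the k nearest donors as the imputed value, and columns already scaled to comparable ranges; k, the distance and the averaging rule are assumptions, not the authors' settings.

```python
import numpy as np

def knn_impute(data: np.ndarray, k: int = 3) -> np.ndarray:
    """Fill each NaN with the mean of that column over the k nearest donors.

    Distance between two records is the root-mean-square difference over
    the features observed in both. Donors are records whose value for the
    missing column is observed; imputed values are never reused as donors.
    """
    data = np.asarray(data, dtype=float)
    filled = data.copy()
    for i, row in enumerate(data):
        for col in np.flatnonzero(np.isnan(row)):
            dists, values = [], []
            for j, donor in enumerate(data):
                if j == i or np.isnan(donor[col]):
                    continue
                shared = ~np.isnan(row) & ~np.isnan(donor)
                if not shared.any():
                    continue
                dists.append(np.sqrt(np.mean((row[shared] - donor[shared]) ** 2)))
                values.append(donor[col])
            if values:
                nearest = np.argsort(dists)[:k]
                filled[i, col] = np.mean(np.asarray(values)[nearest])
    return filled

# Toy example: the last record's second value is missing.
data = np.array([[1.0, 10.0], [1.1, 11.0], [5.0, 50.0], [1.05, np.nan]])
imputed = knn_impute(data, k=2)   # imputed[3, 1] → 10.5
```

Reading donors only from the original array keeps the result independent of the order in which rows are processed.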

From the largely diagonal form of the confusion matrices in Table 1, we can conclude that there is very little confusion between the extreme classes. Paying special attention to the results for the test set, the percentage of cases correctly classified by the MLP is close to 90%, while Kohonen's map correctly classifies only 82% of the cases. Since the MLP results were more satisfactory than those of Kohonen's map, the missing data were replaced by MLP predictions.⁶
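The test-set percentages quoted above can be checked directly from Table 1. A short sketch: correct cases lie on the diagonal, and the SOFM's unclassified cases ("No class" column) are counted as errors because they are part of the total, which is what reproduces the 82% figure. The helper that counts Bad/Very good mix-ups is an illustrative addition.

```python
# Test-set rows of Table 1: real class -> counts per assigned class.
MLP_TEST = [
    [17, 3, 0],     # Bad
    [5, 91, 3],     # Standard
    [0, 3, 15],     # Very good
]
SOFM_TEST = [
    [14, 5, 1, 0],  # Bad (last column: no class)
    [1, 90, 3, 5],  # Standard
    [0, 9, 8, 1],   # Very good
]

def accuracy(matrix):
    # Diagonal entries are correct; every case in the row counts in the total.
    correct = sum(row[i] for i, row in enumerate(matrix))
    total = sum(sum(row) for row in matrix)
    return correct / total

def extreme_confusions(matrix):
    # Cases assigned to the opposite extreme class (Bad <-> Very good).
    return matrix[0][2] + matrix[2][0]

mlp_pct = 100 * accuracy(MLP_TEST)    # 123 of 137 cases
sofm_pct = 100 * accuracy(SOFM_TEST)  # 112 of 137 cases
```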

3. Estimation of free-market housing prices

Having completed the sample, we now come to the main objective of this research, that is, the estimation of property prices in Albacete. This task is approached from two perspectives, the square metre price and the total price, and for each one we trained a large number of MLP and RBF networks. Two models were then selected for each dependent variable. Table 2 shows the main results for the four models.

The most significant value is the standard deviation of the prediction error. If this measure is not lower than the standard deviation of the training data, then the network has performed no better than a simple mean estimator. We can analyse the explained variance of the model through the ratio of the prediction error SD to the training data SD (the SD ratio in Table 2). A value significantly lower than 1.0 indicates good regression performance. In view of these regression statistics, we conclude that the best results were obtained by the MLP estimating the total price. Below, we study the process of designing and training this model in detail.
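The SD ratio described above is a one-line computation, and the explained variance follows from it. A minimal sketch, assuming that the R² in Table 2 is the explained variance 1 − (SD ratio)²; under that assumption the total-price columns are reproduced to four decimals (MLP: 0.2835 and 0.9196; RBF: 0.3705 and 0.8628).

```python
def sd_ratio(error_sd: float, data_sd: float) -> float:
    # Prediction-error SD relative to the SD of the target; 1.0 would mean
    # the network does no better than always predicting the mean.
    return error_sd / data_sd

def explained_variance(ratio: float) -> float:
    # 1 - Var(error) / Var(target), i.e. 1 - ratio**2.
    return 1.0 - ratio ** 2

# Error SD and data SD for the total-price models in Table 2.
mlp_ratio = sd_ratio(13_049.79, 46_032.14)
rbf_ratio = sd_ratio(17_601.32, 47_511.7)
```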

The MLP architecture proposed here was selected after a large number of trials. The numbers of nodes in the input and output layers were determined by the structure of our analysis, i.e. by the numbers of explanatory and output variables, respectively. The number of hidden layers and of elements in each one, on the other hand, was chosen with the criterion of building a network of the least possible complexity. This led us to select a 14:16–5–1:1 network, i.e. an input layer with 14 nodes, pre-processed into 16 nodes,⁷ a hidden layer with five elements, and finally an output layer with one. This architecture is shown in Fig. 2.

The following training decisions were made. Firstly, the sample was divided into three subsets: a training set (50%), a validation set (25%), and a test set (25%). The activation functions were chosen to be linear in the input layer and sigmoid in the hidden and output layers. Weights and thresholds were then initialized randomly. The network was trained with the Delta-Bar-Delta⁸ algorithm, using the sum of squared errors (SSE) as the error function. The details of the algorithm were:

• Maximum number of epochs: 2500.
• Initial learning rate: 0.001.
• Increment: 0.07.
• Decay: 0.5.
• Smoothing⁹: 0.5.

⁵Although self-organizing feature maps were primarily designed to solve clustering tasks, they can also be used for supervised classification tasks. Once the SOFM has been trained, each competition node can be labelled. The labels allow us to measure performance as the percentage of correctly classified cases, as in ordinary supervised classification. In this work, one restriction was imposed: for a node to be labelled with a class, at least 50% of the cases for which that node is the winner must belong to that class. With this restriction, we attempt to strike a balance between the number of nodes left unlabelled and the trust placed in the labelled neurons. As a result, nine cases in the training set were not classified, one in the validation set and six in the test set.
⁶Nevertheless, an exhaustive comparison was carried out, which concluded that the agreement between both procedures exceeded 75%.
⁷All variables were pre-processed before being introduced into the network. The numerical variables were scaled to the range 0–1. The qualitative variables were encoded by the two-state technique in a single input variable, except for the quality variable, which was converted using the one-of-N method. This technique uses a set of variables, one for each possible nominal value. Since there were three categories of quality, the total number of variables changes from 14 to 16.
⁸The Delta-Bar-Delta rule, proposed in [11], is an improvement of standard back propagation. Its objective is to accelerate the convergence of the learning process, based on the following idea: since the error surface may have different gradients along the direction of each weight, it might be desirable to allow the learning rate to differ for each adjustable parameter in the network and to allow these rates to adapt during the epochs

Fig. 5. Response surfaces. (Left panel: X = GabLod, Y = Age; right panel: X = Age, Y = Surface.)

Fig. 6. Network response to the age control. (Horizontal axis: Age, 0–50 years; vertical axis: Price.)
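Footnotes 8 and 9 describe the Delta-Bar-Delta rule of Jacobs [11]: each weight has its own learning rate, which grows by a constant when the current gradient agrees in sign with an exponentially smoothed average of past gradients and shrinks multiplicatively when they disagree. A minimal NumPy sketch using the parameter values listed above (initial rate 0.001, increment 0.07, decay 0.5, smoothing 0.5), applied to a toy quadratic error surface rather than to the MLP; the exact order in which TRAJAN applies these updates is an assumption.

```python
import numpy as np

def delta_bar_delta(grad_fn, w0, epochs=2500, lr0=0.001,
                    kappa=0.07, phi=0.5, theta=0.5):
    """Minimize an error function with the Delta-Bar-Delta rule.

    grad_fn : returns the gradient dE/dw at w
    kappa   : constant added to a rate when gradient signs agree (linear)
    phi     : fraction removed from a rate when they disagree (exponential)
    theta   : smoothing factor of the averaged past gradient
    """
    w = np.asarray(w0, dtype=float).copy()
    lr = np.full_like(w, lr0)        # one learning rate per weight
    bar_delta = np.zeros_like(w)     # smoothed past gradient
    for _ in range(epochs):
        delta = grad_fn(w)
        agree = bar_delta * delta
        lr = np.where(agree > 0, lr + kappa, lr)         # linear increase
        lr = np.where(agree < 0, lr * (1.0 - phi), lr)   # multiplicative decay
        w -= lr * delta
        bar_delta = (1.0 - theta) * delta + theta * bar_delta
    return w, lr

# Toy error E = 0.5 * sum(a * (w - c)**2), with very different curvatures.
a = np.array([1.0, 10.0])
c = np.array([3.0, -1.0])
w_opt, rates = delta_bar_delta(lambda w: a * (w - c), [0.0, 0.0])
```

The two coordinates end up with different learning rates, which is the point of the rule: a single global rate that suits the flat direction would overshoot in the steep one.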

Once the network has been completely trained, after 2346 epochs, the model must be validated, and for this we reanalyse the results in Table 2. For the test set, the percentage of the variance explained is close to 92%, the correlation coefficient between the estimated and target outputs is 0.96, and the absolute error mean is about 9085 €.

It is, however, worth specifying two points. Firstly, we know that a high correlation coefficient does not imply coincidence between the estimated price and the target. However, if we look at the graph in Fig. 3, the idea of coincidence is not far-fetched, since the trend line almost coincides with the diagonal; in other words, most points are close to the diagonal line on which the estimated and target prices are equal. Secondly, as Fig. 3 shows, only a few points in the test set lie far from the trend line. Moreover, an analysis of the error histogram in Fig. 4 reveals that eliminating just six extreme errors changes the results significantly: the absolute error mean and the relative mean error drop to 7811 € and 5.65%, respectively.

Once the model has been satisfactorily validated, the relative contribution of each input variable to the global performance can be assessed by means of a sensitivity analysis, which entails testing network performance as if the input variable were unavailable. Table 3 shows the ratio for each variable. This ratio relates the error obtained when the corresponding input variable is unavailable to the error obtained when all variables are available. A ratio of one or lower therefore means that pruning this input variable does not degrade network performance.

Although an artificial neural network model is usually considered a "black box", in the sense that it is very difficult to explain the relationship between each input and the output, this difficulty can be addressed through response surface graphs. Fig. 5 shows two of the most interesting response surfaces. The graph on the left-hand side analyses the network response to age and to the distance to the city centre, and the graph on the right-hand side analyses the response to surface and age. As expected under the monocentric assumption,¹⁰ the slope of the response to GabLod was negative. However, it is worth pointing out that in centrally located areas, the oldest properties command the highest prices. Further from the centre, on the other hand, the response was non-linear, in such a way that the highest prices were found at both extremes of age. The response graph for surface and age shows that for the largest properties the price tends to decrease with age, whereas for medium-sized properties the price only decreases with age up to a certain point, after which it starts to increase. This suggests an interesting non-linear effect of age, as the univariate response graph in Fig. 6 also shows.

Fig. 6 shows that when the other characteristics are held constant at their mean values, the lowest price is reached for 22-year-old houses. This result is in line with the one reported in [6], where the authors found evidence of a reversal in the relationship between age and value, since the negative sign only holds for the first 16–20 years of the property's life. Sabella [19] theorized that after this point the property price starts to increase because of the increasing value of the plot of land on which the property is built. This case underlines the usefulness of flexible models such as ANN for modelling non-linear relationships.

(footnote 8 continued) in the learning process. The learning rate should be higher when the changes in one weight in consecutive epochs occur in the same direction, but lower when the signs of those changes are opposite.
⁹It is worth mentioning that the increments of the learning rate are linear, whereas the decays are exponential. This is necessary to prevent the learning rate from growing too fast while allowing it to decay rapidly if needed, while guaranteeing that it remains positive.
¹⁰This assumption has been widely studied since the pioneering research by Alonso [1], and refers to the existence of a central business district (CBD) where economic and social activities are concentrated. Since property prices are supposedly higher in this area, the relationship between the distance to the CBD and value should have a negative slope. After Alonso's work, many authors' research has been driven by this assumption: e.g. Mok, Chan and Cho [15]; Atack and Margo [2]; or Dunse and Jones [7]. Nevertheless, a considerable number of studies have replaced the monocentric assumption with the non-monocentric or polycentric one, assuming the presence of a group of sub-centres that makes it impossible to observe an inverse relation between value and distance to the main centre. However, we consider the polycentric assumption to be more plausible in cities larger than Albacete, which has nearly 160 000 inhabitants living in 1234 km².

Fig. 7. Initial application page.

Fig. 8. Data editor to enter property characteristics.

Fig. 9. Zoom into the area where the property in the example is located.
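Both the sensitivity ratio of Table 3 and the univariate response of Fig. 6 are obtained by probing a trained model with modified inputs. A minimal NumPy sketch, assuming that an "unavailable" input is replaced by its sample mean and that error is measured as RMSE (the paper, which used TRAJAN, specifies neither). The `predict` function is a hypothetical toy stand-in with a U-shaped age effect centred at 22 years, not the fitted network.

```python
import numpy as np

def rmse(pred, y):
    return float(np.sqrt(np.mean((pred - y) ** 2)))

def sensitivity_ratios(predict, X, y):
    # Ratio = error with input j "unavailable" / error with all inputs.
    # ASSUMPTION: "unavailable" means replaced by the column mean.
    baseline = rmse(predict(X), y)
    ratios = []
    for j in range(X.shape[1]):
        X_masked = X.copy()
        X_masked[:, j] = X[:, j].mean()
        ratios.append(rmse(predict(X_masked), y) / baseline)
    return ratios

def univariate_response(predict, X, col, grid):
    # Hold every input at its mean and sweep one input over the grid.
    rows = np.tile(X.mean(axis=0), (len(grid), 1))
    rows[:, col] = grid
    return predict(rows)

# Hypothetical toy model: columns are age, surface, and an unused input.
def predict(X):
    return 1000.0 * (X[:, 0] - 22.0) ** 2 + 500.0 * X[:, 1]

rng = np.random.default_rng(1)
X = np.column_stack([rng.uniform(0, 50, 300),
                     rng.uniform(50, 150, 300),
                     rng.uniform(0, 1, 300)])
y = predict(X) + rng.normal(0.0, 100.0, 300)   # noisy "observed" prices

ratios = sensitivity_ratios(predict, X, y)
ages = np.arange(0, 51)
curve = univariate_response(predict, X, 0, ages)
```

An input whose ratio is exactly 1.0, like the unused third column here, could be pruned without loss; the response curve reaches its minimum at age 22, mirroring the shape described for Fig. 6.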

Fig. 10. Final report of the estimation process.

4. GIS and neural model integration: automated and intelligent valuation system

Once the model has been estimated and validated, it can be used to estimate prices for properties from the sample, and this task should be made as easy as possible. To do this, the estimated neural model can be combined with the geographical information system so that it is only necessary to click on a dot on the Albacete map to select a property and to complete a data editor with its particular characteristics. In this work, this has been done in the R SciViews graphics environment, and the application has been customized by creating a series of buttons that allow non-specialists to use the program. Fig. 7 shows the initial page with a brief description of the most important buttons.

The process can be summarized as follows. The first step is to click on the maptools button to load the different layers of the geographical information system (streets, blocks, etc.) as well as the tools needed for map navigation. The next step is to click on the Albacete map button to display the map, and to use the mouse and the zoom buttons to move through the city in search of the property whose price is to be estimated. Once the house has been located, we click on the point estimation button and place the pointer at its exact location. A data editor such as the one shown on the left-hand side of Fig. 8 then appears so that the property characteristics can be entered. It is worth mentioning that variables such as the location or the distance to the city centre are filled in automatically thanks to the GIS.

As an example, let us estimate the price of a flat located at 10 Cristobal Lozano Street. We assume that the flat is 17 years old, 95 m², and of standard quality, and has three bedrooms, a full bathroom, a toilet, a lift, no balcony, a heating system, one parking space, and no storage room. The price estimated by the neural network system, 127 592 € in this example, appears in a box at the location of the property on the map (Fig. 9).
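Before the data-editor entries reach the network, they must be pre-processed as footnote 7 describes: numeric variables scaled to 0–1, two-state variables kept as single 0/1 inputs, and quality expanded one-of-N, turning 14 variables into 16 inputs. A minimal sketch using the example flat; the input ordering, the scaling ranges, the map coordinates and the GabLod distance are hypothetical placeholders (the paper reports none of them), and treating the toilet as a half-point incomplete bathroom follows the Bathrooms definition in Section 2.

```python
# Hypothetical input order: eight numeric, five two-state, then quality.
NUMERIC = ["coord_x", "coord_y", "age", "surface", "bedrooms",
           "bathrooms", "parking", "gablod"]
BINARY = ["type", "lift", "balcony", "heating", "storage"]
QUALITY_LEVELS = ["bad", "standard", "very_good"]

def encode(record: dict, ranges: dict) -> list:
    """Map one property record to 16 network inputs.

    Numeric variables are min-max scaled to 0-1 with the (min, max) pairs in
    `ranges`; two-state variables stay as a single 0/1 input; quality is
    expanded one-of-N into three inputs (14 raw variables -> 16 inputs).
    """
    inputs = []
    for name in NUMERIC:
        lo, hi = ranges[name]
        inputs.append((record[name] - lo) / (hi - lo))
    inputs.extend(float(record[name]) for name in BINARY)
    inputs.extend(1.0 if record["quality"] == level else 0.0
                  for level in QUALITY_LEVELS)
    return inputs

# Hypothetical scaling ranges (the paper does not report them).
RANGES = {"coord_x": (0.0, 10_000.0), "coord_y": (0.0, 10_000.0),
          "age": (0.0, 100.0), "surface": (25.0, 325.0),
          "bedrooms": (0.0, 6.0), "bathrooms": (0.0, 4.0),
          "parking": (0.0, 4.0), "gablod": (0.0, 5_000.0)}

# The example flat; coordinates and GabLod are placeholders for GIS values.
flat = {"coord_x": 5_000.0, "coord_y": 2_500.0, "gablod": 1_000.0,
        "age": 17, "surface": 95, "bedrooms": 3,
        "bathrooms": 1.0 + 0.5,          # full bathroom + toilet
        "parking": 1, "type": 0, "lift": 1, "balcony": 0,
        "heating": 1, "storage": 0, "quality": "standard"}

x = encode(flat, RANGES)
```

In the application, the GIS would supply the coordinate and GabLod entries automatically when the user clicks on the map, and the resulting vector would be passed to the trained network.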

Finally, a more complete output can be obtained by clicking on the Make a report button. In this report (Fig. 10), we can find the main data for the example and the corresponding estimated price.

5. Conclusions

In this work, we have presented the construction of an automated valuation system by combining an artificial neural model with a geographical information system. The combination of ANN and GIS has proved to be a very powerful and useful tool for real estate valuation, and these results could well be extended to other problems dealing with spatial data.

With regard to the particular results of our empirical work, the objective has been reached satisfactorily, and the MLP model performed better than the SOFM in the quality-level assignment problem. Comparing the performance of the MLP and RBF models for property price estimation, the best results were achieved by the MLP estimating the total price. This network yielded an R² of 0.92 and a relative mean error of 6.49%, which drops to 5.65% once six extreme errors are excluded. A possible reason why the MLP outperformed the RBF networks is the size of the available sample (591 records), which is small for the requirements of RBF networks.

The sensitivity analysis showed that one of the most important variables was the distance to the central business district (in this work, Plaza de Gabriel Lodares, the GabLod variable), and the response surfaces showed that its effect has a negative slope, in accordance with the monocentric assumption. Another important result was the non-linear effect of age on the housing price. In this sense, it is worth mentioning the capability of neural models to extract non-linear relationships that more traditional models cannot easily detect.

Our work is by no means finished, and in the future we will further explore the possibilities of the GIS and include additional inputs in the analysis, such as accessibility, the existence of green spaces, the distance to educational centres, and other social, economic and geographical factors. For this, it is imperative that geographical information be regularly updated so that data relating to the housing market are easily available. This is the only way to keep the automated valuation system up to date; otherwise, the system will lose its usefulness.

References

[1] W. Alonso, Location and Land Use, Harvard University Press, Cambridge, MA, 1964.

[2] J. Atack, R.A. Margo, Location, location, location! The price gradient for vacant urban land: New York, 1835–1900, J. Real Estate Financ. Econ. 16 (2) (1998) 151–172.

[3] M.J. Bailey, R.F. Muth, H.O. Nourse, A regression method for real estate price index construction, J. Am. Stat. Assoc. 58 (1963) 933–942.

[4] C.M. Bishop, Neural Networks for Pattern Recognition, Clarendon Press, Oxford, 1995.

[5] R.A. Borst, Artificial neural networks: the next modelling/calibration technology for the assessment community?, Prop. Tax J. 10 (1) (1991) 69–94.

[6] A.Q. Do, G. Grudnitski, A neural network analysis of the effect of age on housing values, J. Real Estate Res. 8 (2) (1993) 253–264.

[7] N. Dunse, C. Jones, A hedonic price model of office rents, J. Prop. Val. Invest. 16 (3) (1998) 297–312.

[8] M. Gámez, J.M. Montero, N. García, Kriging methodology for regional economic analysis: estimating the housing price in Albacete, Int. Adv. Econ. Res. 6 (3) (2000) 438–450.

[9] N. García, Diseño de Redes Neuronales Artificiales para el Mercado Inmobiliario. Aplicación a la ciudad de Albacete, Ph.D. Thesis, Department of Business and Economics Sciences, University of Castilla-La Mancha, 2004.

[10] S. Haykin, Neural Networks: A Comprehensive Foundation, Prentice Hall, Englewood Cliffs, NJ, 1994.

[11] R.A. Jacobs, Increased rates of convergence through learning rate adaptation, Neural Networks 1 (4) (1988) 295–307.

[12] T. Kohonen, Self-Organizing Maps, second ed., Springer, Berlin, Heidelberg, 1997.

[13] B. Martín del Brío, A. Sanz, Redes Neuronales y Sistemas Borrosos, RA-MA, 1997.

[14] W. McCluskey, R. Borst, An evaluation of MRA, comparable sales analysis, and ANNs for the mass appraisal of residential properties in Northern Ireland, Assess. J. 4 (1) (1997) 47–55.

[15] H.M.K. Mok, P.P.K. Chan, Y.S. Cho, A hedonic price model for private properties in Hong Kong, J. Real Estate Financ. Econ. 10 (1995) 37–48.

[16] N. Nguyen, A. Cripps, Predicting housing value: a comparison of multiple regression analysis and artificial neural networks, J. Real Estate Res. 22 (3) (2001) 314–326.

[17] B.D. Ripley, Pattern Recognition and Neural Networks, Cambridge University Press, Cambridge, 1999.

[18] S. Rosen, Hedonic prices and implicit markets: product differentiation in pure competition, J. Polit. Econom. 82 (1) (1974) 34–55.

[19] E.M. Sabella, Determining the relationship between the property’s age and its market value, Assessors J. 9 (1974) 81–85.

[20] D.P.H. Tay, D.K.K. Ho, Artificial intelligence and the mass appraisal of residential apartments, J. Prop. Val. Invest. 10 (1991/1992) 524–540.


[21] J. Thurston, GIS & Artificial Neural Networks: Does your GIS Think?, GISVision Magazine, 2002.

[22] E. Worzala, M. Lenk, A. Silva, An exploration of neural networks and its application to real estate valuation, J. Real Estate Res. 10 (2) (1995) 185–201.

[23] C.Y. Yiu, C.S. Tam, A review of recent empirical studies on property price gradients, J. Real Estate Lit. 12 (3) (2004) 307–322.

Noelia García teaches Statistics at the Faculty of Economic and Business Sciences of the University of Castilla-La Mancha. She obtained her degree in Economics at the Autonomous University of Madrid (UAM) in 1996 and completed her Ph.D. in Economics in 2004 on the construction of an intelligent and automated system for property valuation through the combination of neural nets and a geographic information system (GIS). Her current research deals with spatial statistics and the combination of classifiers (decision trees and neural nets) for solving topical problems in economics.

Matías Gámez teaches Statistics at the Faculty of Economic and Business Sciences of the University of Castilla-La Mancha. He obtained his degree in Mathematics at the University of Granada in 1991 and completed a Master's degree in Applied Statistics a year later. He completed his Ph.D. in Economics at the University of Castilla-La Mancha in 1998 on the application of geostatistical techniques to the estimation of housing prices. His current research deals with spatial statistics and the combination of classifiers (decision trees and neural nets) for solving topical problems in economics.

Esteban Alfaro teaches Statistics at the Faculty of Economic and Business Sciences of the University of Castilla-La Mancha. He completed his degree in Business in 1999 and obtained his Ph.D. in Economics in 2005, both at the University of Castilla-La Mancha. His thesis dealt with the application of ensemble classifiers to corporate failure prediction. His current research deals with spatial statistics and the combination of classifiers (decision trees and neural nets) for solving topical problems in economics.