![Page 1: Analysing spatial distribution and dependencies between](https://reader035.vdocument.in/reader035/viewer/2022081503/62a1fb58a20a6c7db92802ea/html5/thumbnails/1.jpg)
Analysing spatial distribution and
dependencies between spatial data
sets
Kirsi Virrantaus
29.10.2020
GIS-E1060 Spatial Analytics
![Page 2: Analysing spatial distribution and dependencies between](https://reader035.vdocument.in/reader035/viewer/2022081503/62a1fb58a20a6c7db92802ea/html5/thumbnails/2.jpg)
Analysis of spatial distribution and
spatial dependencies
Kirsi Virrantaus
29.10.2020
GIS-E1060 Spatial Analytics
![Page 3: Analysing spatial distribution and dependencies between](https://reader035.vdocument.in/reader035/viewer/2022081503/62a1fb58a20a6c7db92802ea/html5/thumbnails/3.jpg)
Contents and learning goals
• 1. Revisiting Moran's I – continuation of last week's exercise
– You understand more about the interpretation of Moran's I
– You can use a correlogram to analyse the type of spatial autocorrelation
– You understand the difference between clustering and spatial
autocorrelation
• 2. Introduction to spatio-statistical testing and analysis of
dependencies between two spatial data sets
– You understand how the distribution of a point pattern can be
analysed by comparing it to the CSR model
– You understand how spatio-statistical testing can be used in
identifying spatial dependencies
– You can carry out simple spatio-statistical tests
APPENDIX: Background material for the examples
![Page 4: Analysing spatial distribution and dependencies between](https://reader035.vdocument.in/reader035/viewer/2022081503/62a1fb58a20a6c7db92802ea/html5/thumbnails/4.jpg)
Understanding Moran's I and spatial
autocorrelation
More on the interpretation of Moran's index and
spatial autocorrelation
![Page 5: Analysing spatial distribution and dependencies between](https://reader035.vdocument.in/reader035/viewer/2022081503/62a1fb58a20a6c7db92802ea/html5/thumbnails/5.jpg)
1. Spatial autocorrelation can be
analysed by using Moran's I
W is important: it shows the adjacency of the compared objects
W also defines the neighbourhood concept that is used
The core of the formula is the covariance term
(O'Sullivan & Unwin)
![Page 6: Analysing spatial distribution and dependencies between](https://reader035.vdocument.in/reader035/viewer/2022081503/62a1fb58a20a6c7db92802ea/html5/thumbnails/6.jpg)
How to analyse the Moran's I value?
• Moran's I normally takes values between -1 and 1
• A value near 0 indicates no spatial autocorrelation
• A value of 1 indicates perfect positive autocorrelation (theoretical)
• A value of -1 indicates perfect negative autocorrelation (theoretical)
• What else does the I value tell about the spatial
arrangement of the data set?
![Page 7: Analysing spatial distribution and dependencies between](https://reader035.vdocument.in/reader035/viewer/2022081503/62a1fb58a20a6c7db92802ea/html5/thumbnails/7.jpg)
Spatial covariance
• Moran's I is based on the covariance function and the so-called
adjacency matrix (weight matrix) W
• The adjacency matrix stores the spatial
relationships/adjacencies of the elements that we want to
take into account in the autocorrelation analysis
– Adjacency can be defined in many ways
– Immediate neighbours, or neighbours at a given lag
– If two elements are neighbours, the matrix entry is 1; if not, it is 0
• Covariance is calculated by comparing how adjacent
elements vary together around the mean value
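The covariance idea can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard form of Moran's I with a binary weight matrix: I = (n / S0) · Σi Σj wij (xi − mean)(xj − mean) / Σi (xi − mean)², where S0 is the sum of all weights. The function name `morans_i` and the four-cell example data are hypothetical and not taken from the lecture.

```python
def morans_i(values, weights):
    """Moran's I for a list of values and a square 0/1 weight matrix (list of lists)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    # Spatial covariance term: products of deviations, counted for neighbouring pairs only
    cov = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    s0 = sum(sum(row) for row in weights)   # sum of all weights
    var = sum(d * d for d in dev)           # sum of squared deviations
    return (n / s0) * (cov / var)


# Hypothetical example: four cells in a row, immediate neighbours share an edge
W = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
smooth = [1, 2, 3, 4]        # similar values next to each other
alternating = [1, 4, 1, 4]   # dissimilar values next to each other

print(morans_i(smooth, W))
print(morans_i(alternating, W))
```

With these made-up values the first call gives a positive I (about 0.33) and the second gives -1, which matches the interpretation of positive and negative autocorrelation.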
![Page 8: Analysing spatial distribution and dependencies between](https://reader035.vdocument.in/reader035/viewer/2022081503/62a1fb58a20a6c7db92802ea/html5/thumbnails/8.jpg)
Spatial correlogram
• A spatial correlogram is a plot that shows the values of a
spatial autocorrelation measure (for example Moran's I) against
distance
• The book by Dale & Fortin gives a good example of
different spatial autocorrelation structures and the
corresponding spatial correlograms (pages 148…151)
• By analysing the maps you can see how the different
examples are described by their spatial correlograms
• By interpreting the correlogram you can understand
more about the spatial arrangement
![Page 9: Analysing spatial distribution and dependencies between](https://reader035.vdocument.in/reader035/viewer/2022081503/62a1fb58a20a6c7db92802ea/html5/thumbnails/9.jpg)
https://stats.stackexchange.com/questions/100919/what-causes-a-u-shaped-pattern-in-the-spatial-correlogram
![Page 10: Analysing spatial distribution and dependencies between](https://reader035.vdocument.in/reader035/viewer/2022081503/62a1fb58a20a6c7db92802ea/html5/thumbnails/10.jpg)
Computing correlogram
• Moran's index is calculated for the pairs of x values that are
at distance d from each other; W(d) is the number of pairs of sampling
locations at distance d
• Moran's index normally takes values between -1 and 1
• When there are too few pairs and the spatial layout
looks non-stationary, the index can fall outside this range
• This happens most often at the largest distances
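The computation can be sketched as follows. This is a simplified illustration, assuming that pairs are grouped into equal-width distance classes and that Moran's I is computed separately for each class with weight 1 for pairs in the class and 0 otherwise. The function name `correlogram`, the class width and the transect data are hypothetical.

```python
import math


def correlogram(coords, values, bin_width, n_bins):
    """Return a list of (upper distance bound, number of pairs W(d), Moran's I)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    var = sum(d * d for d in dev)
    result = []
    for b in range(n_bins):
        lo, hi = b * bin_width, (b + 1) * bin_width
        cov, links = 0.0, 0
        for i in range(n):
            for j in range(n):
                if i != j and lo < math.dist(coords[i], coords[j]) <= hi:
                    cov += dev[i] * dev[j]
                    links += 1
        # links counts ordered pairs (i, j) and (j, i); W(d) as unordered pairs is links // 2
        i_d = (n / links) * (cov / var) if links else float("nan")
        result.append((hi, links // 2, i_d))
    return result


# Hypothetical transect: values rise steadily along a line of unit-spaced points
coords = [(float(k), 0.0) for k in range(8)]
values = [float(k) for k in range(8)]
for upper, pairs, i_d in correlogram(coords, values, bin_width=1.0, n_bins=7):
    print(f"d <= {upper:.0f}: pairs={pairs}, I={i_d:.2f}")
```

In this made-up transect the shortest distance class has seven pairs and a clearly positive I, while the longest class has only one pair and an I below -1, which illustrates how few pairs at large distances can push the index outside its normal range.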
![Page 11: Analysing spatial distribution and dependencies between](https://reader035.vdocument.in/reader035/viewer/2022081503/62a1fb58a20a6c7db92802ea/html5/thumbnails/11.jpg)
Small exercise
• You get a set of schematic grid maps (a, b, c, d, e, f) and a
set of correlograms (1, 2, 3, 4, 5, 6)
• Try to work out which correlogram describes each of the
maps
• You will have a Wednesday exercise on the same topic
• This small exercise is just to help you understand how a
correlogram works
![Page 12: Analysing spatial distribution and dependencies between](https://reader035.vdocument.in/reader035/viewer/2022081503/62a1fb58a20a6c7db92802ea/html5/thumbnails/12.jpg)
![Page 13: Analysing spatial distribution and dependencies between](https://reader035.vdocument.in/reader035/viewer/2022081503/62a1fb58a20a6c7db92802ea/html5/thumbnails/13.jpg)
Towards variogram analysis
• Variogram analysis is another descriptive method for
analysing spatial autocorrelation
– Its strength, direction and spatial range
• The variogram is used in the so-called Kriging interpolation
method
– You will learn the idea of the variogram in the next lecture
![Page 14: Analysing spatial distribution and dependencies between](https://reader035.vdocument.in/reader035/viewer/2022081503/62a1fb58a20a6c7db92802ea/html5/thumbnails/14.jpg)
Autocorrelated or clustered?
![Page 15: Analysing spatial distribution and dependencies between](https://reader035.vdocument.in/reader035/viewer/2022081503/62a1fb58a20a6c7db92802ea/html5/thumbnails/15.jpg)
Autocorrelated or clustered ?
• You have now learnt a bit more about spatial
autocorrelation
• You have also learnt about the various types of spatial arrangement,
such as clustering
• Do these two concepts mean the same thing?
• It is easy to confuse these concepts
![Page 16: Analysing spatial distribution and dependencies between](https://reader035.vdocument.in/reader035/viewer/2022081503/62a1fb58a20a6c7db92802ea/html5/thumbnails/16.jpg)
What is the difference between spatial
autocorrelation and spatial clustering?
– Cluster = “a group of similar things or people positioned or
occurring closely together”
• The definition is based on similarity and distance
• The measure used is actually the distance
• Both coordinates and attributes can be considered as
dimensions, and the distances between points are calculated
in n-dimensional space; similarity = short distance
– Spatial autocorrelation is “a measure of similarity (correlation)
between nearby observations”
• Spatial autocorrelation can indicate clustering, but the focus is on the
variation of attribute values
– Clustering can be defined without attributes (using only the
coordinates of the points), whereas autocorrelation is about
attribute values
![Page 17: Analysing spatial distribution and dependencies between](https://reader035.vdocument.in/reader035/viewer/2022081503/62a1fb58a20a6c7db92802ea/html5/thumbnails/17.jpg)
Some other concepts of spatial
distribution
• Hot spot
– “An area in which the values of measured variables are
relatively high in comparison to its surroundings”
• Cold spot
– “An area in which the values of measured variables are
relatively low in comparison to its surroundings”
• The calculated measures are based on (point) density in
a defined neighbourhood (note the difference from the kernel method)
• The result is called a “heat map”
![Page 18: Analysing spatial distribution and dependencies between](https://reader035.vdocument.in/reader035/viewer/2022081503/62a1fb58a20a6c7db92802ea/html5/thumbnails/18.jpg)
https://www.gislounge.com/difference-heat-map-hot-spot-map/
Kernel density map
Heat map
![Page 19: Analysing spatial distribution and dependencies between](https://reader035.vdocument.in/reader035/viewer/2022081503/62a1fb58a20a6c7db92802ea/html5/thumbnails/19.jpg)
Spatio-statistical testing and inference
![Page 20: Analysing spatial distribution and dependencies between](https://reader035.vdocument.in/reader035/viewer/2022081503/62a1fb58a20a6c7db92802ea/html5/thumbnails/20.jpg)
2. A statistical view of problems
• When researchers study the world and the phenomena related to it, they apply the scientific method
• The research is based on empirical data sets on the phenomenon
• The scientific method starts with an analysis of the context and the definition of concepts
• The available data is described in order to get an understanding of possible dependencies
• Hypotheses are suggested and a model is built in order to test the hypotheses
• If a hypothesis gets support, the model can be developed into laws and even a theory
![Page 21: Analysing spatial distribution and dependencies between](https://reader035.vdocument.in/reader035/viewer/2022081503/62a1fb58a20a6c7db92802ea/html5/thumbnails/21.jpg)
The statistical point of view on problems
• When researchers approach real-world phenomena and problems, they
use a scientific approach
• Research is based on empirical data
• The scientific approach starts from defining the context and the
concepts
• The available data is described in order to get an idea of the
dependencies that prevail between things
• Hypotheses are formed and models are developed to
test the hypotheses
• If the hypotheses are confirmed, they can be developed into
laws (rules) and even theories
![Page 22: Analysing spatial distribution and dependencies between](https://reader035.vdocument.in/reader035/viewer/2022081503/62a1fb58a20a6c7db92802ea/html5/thumbnails/22.jpg)
(Rogerson, 2015)
Statistics develops and applies methods and models
which can be used in making conclusions and decisions
on phenomena based on numerical or quantitative data
that includes uncertainty and randomness
![Page 23: Analysing spatial distribution and dependencies between](https://reader035.vdocument.in/reader035/viewer/2022081503/62a1fb58a20a6c7db92802ea/html5/thumbnails/23.jpg)
Description and inference
• Statistical methods can be used to describe a phenomenon
– Descriptive statistics
– The methods describe characteristics such as the mean, variance and
standard deviation (statistical indices), and statistical models and
their estimation
– Sample of a population
• Inferential methods
– The goal is to draw conclusions based on the collected data
– The aim is to be able to predict how the phenomenon will behave in
the future
– The methods used are statistical models and their estimation and
testing
– The core tool is a hypothesis about the behaviour of the phenomenon
– A model is used to represent the entire population
![Page 24: Analysing spatial distribution and dependencies between](https://reader035.vdocument.in/reader035/viewer/2022081503/62a1fb58a20a6c7db92802ea/html5/thumbnails/24.jpg)
Description and inference
• Descriptive statistics
– Descriptive methods
– Statistical indices describe the features of the data: mean,
variance, standard deviation, median
– Working with sample data
• Statistical inference
– The goal is to draw conclusions based on the collected data
– The aim is to be able to predict the behaviour of the phenomenon in the future
– The methods are statistical models and their estimation and testing
– Based on a hypothesis about the behaviour of the phenomenon
– The model is assumed to describe the entire population
![Page 25: Analysing spatial distribution and dependencies between](https://reader035.vdocument.in/reader035/viewer/2022081503/62a1fb58a20a6c7db92802ea/html5/thumbnails/25.jpg)
Spatio-statistical methods
• GIS users typically use descriptive
spatio-statistical tools
– Easy to use and interpret
• Researchers also apply statistical inference
– Requires good knowledge of and skills in statistical methods
• The methods are often the same, but they are used
in different ways
![Page 26: Analysing spatial distribution and dependencies between](https://reader035.vdocument.in/reader035/viewer/2022081503/62a1fb58a20a6c7db92802ea/html5/thumbnails/26.jpg)
Spatio-statistical methods
• GIS users often use descriptive methods
– Easy to use and interpret
• Researchers use statistical
inference in their studies
– Requires good knowledge of and skills in statistical
analysis
• The methods can be the same, but they are used
in different ways
![Page 27: Analysing spatial distribution and dependencies between](https://reader035.vdocument.in/reader035/viewer/2022081503/62a1fb58a20a6c7db92802ea/html5/thumbnails/27.jpg)
Statistical inference with spatial
point patterns
• In spatio-statistical inference we study and model random
variables in 2, 3 or even 4 dimensions
• The methods are most often related to
– Point patterns and point processes
– Geostatistics, interpolation
• In the methods
– Empirical data is compared with CSR/IRP
– The first aim is to identify spatial autocorrelation
– The goal is to develop a model that can show the
characteristics of the spatial behaviour that was found
![Page 28: Analysing spatial distribution and dependencies between](https://reader035.vdocument.in/reader035/viewer/2022081503/62a1fb58a20a6c7db92802ea/html5/thumbnails/28.jpg)
Statistical inference in spatial
analysis
• In spatial statistical inference we model the variation of the values of a
random variable in two- or three-dimensional
space (also four-dimensional, if time is included)
• The methods are most often related to
– the study of point sets, that is, point processes, or
– interpolation, that is, geostatistics
• In the methods
– The data is compared with the model of complete randomness
– The aim is to identify autocorrelation and its strength
– The goal is a model that takes into account, for example, the properties of the autocorrelation
![Page 29: Analysing spatial distribution and dependencies between](https://reader035.vdocument.in/reader035/viewer/2022081503/62a1fb58a20a6c7db92802ea/html5/thumbnails/29.jpg)
The analysis often starts with a descriptive method: the
density of domestic fires estimated with a kernel estimator, which describes the data set
by its intensity
![Page 30: Analysing spatial distribution and dependencies between](https://reader035.vdocument.in/reader035/viewer/2022081503/62a1fb58a20a6c7db92802ea/html5/thumbnails/30.jpg)
(O'Sullivan & Unwin)
The spatial arrangement can be described with the
nearest-neighbour distance method, simply by visually analysing
the frequency graphs
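The nearest-neighbour idea can be sketched as follows. This is a minimal illustration that assumes plain Euclidean distances and ignores edge effects; the function names and the four example points are hypothetical.

```python
import math


def nearest_neighbour_distances(points):
    """Distance from each point to its nearest other point."""
    return [
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]


def frequency_table(distances, bin_width):
    """Count how many nearest-neighbour distances fall into each distance class."""
    table = {}
    for d in distances:
        k = int(d // bin_width)          # index of the distance class
        table[k] = table.get(k, 0) + 1
    return dict(sorted(table.items()))


# Hypothetical pattern: a tight group of three points and one isolated point
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
nn = nearest_neighbour_distances(points)
print(nn)
print(frequency_table(nn, bin_width=5.0))
```

Plotting the frequency table as a bar chart gives the kind of frequency graph that can be inspected visually: many short distances suggest clustering, while similar distances everywhere suggest a regular arrangement.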
![Page 31: Analysing spatial distribution and dependencies between](https://reader035.vdocument.in/reader035/viewer/2022081503/62a1fb58a20a6c7db92802ea/html5/thumbnails/31.jpg)
(O'Sullivan, D., & Unwin, D., 2003)
![Page 32: Analysing spatial distribution and dependencies between](https://reader035.vdocument.in/reader035/viewer/2022081503/62a1fb58a20a6c7db92802ea/html5/thumbnails/32.jpg)
Some definitions needed
Point patterns, point processes
![Page 33: Analysing spatial distribution and dependencies between](https://reader035.vdocument.in/reader035/viewer/2022081503/62a1fb58a20a6c7db92802ea/html5/thumbnails/33.jpg)
Example on point pattern: domestic fires
![Page 34: Analysing spatial distribution and dependencies between](https://reader035.vdocument.in/reader035/viewer/2022081503/62a1fb58a20a6c7db92802ea/html5/thumbnails/34.jpg)
How to define a point pattern?
• Simplest formulation of a point pattern
– a set X = {x ∈ D} where D, which can be called the 'study region', is a subset of R^n, an n-dimensional Euclidean space
– points have a location and they can have attributes (qualitative, quantitative)
• A point pattern can be visualized as a map
• Point pattern analysis is the study of the spatial or spatio-temporal arrangements of points
• Basic measures:
– frequency (how many) and intensity (how many per unit area)
– mean center, standard distance (how the points are located in relation to their mean center or median center)
• Arrangement of the points:
– random, regular, clustered
– clustered/dispersed
• The concept of Complete Spatial Randomness, CSR
– is used in statistical testing of the spatial distribution
– http://gispopsci.org/point-pattern-analysis/
![Page 35: Analysing spatial distribution and dependencies between](https://reader035.vdocument.in/reader035/viewer/2022081503/62a1fb58a20a6c7db92802ea/html5/thumbnails/35.jpg)
Definition of a point pattern
• Simple definition of a point pattern
– a set X = {x ∈ D} where D, which can be called the 'study region', is a subset of R^n, an n-dimensional Euclidean space
– points have a location, and they can also have qualitative and quantitative attribute information
• A point pattern can be visualized as a map
• Point pattern analysis is the study of the spatial arrangement of points
• Basic measures:
– frequency (how many) and intensity (how many per
unit area)
– mean center, standard distance (shows how the points are located in relation to their mean center)
• Arrangement of the points:
– random, regular, clustered
– clustered, dispersed
• The concept of complete spatial randomness
– is used in testing the spatial distribution
– http://gispopsci.org/point-pattern-analysis/
![Page 36: Analysing spatial distribution and dependencies between](https://reader035.vdocument.in/reader035/viewer/2022081503/62a1fb58a20a6c7db92802ea/html5/thumbnails/36.jpg)
Methods use simple indices of
descriptive spatial statistics
– frequency
• number of objects in the study area
– intensity
• number of objects per unit area
– mean center
• the point whose coordinates are the means of the
corresponding coordinates of all the events of the pattern (average x, average y)
– median center
• the location from which the sum of the distances to all the
other features in the study area is shortest
– distance
• at larger scales, assuming that the world is flat: most often
Euclidean distance
– standard distance
• shows how dispersed the points are around the mean center
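Most of these indices can be computed directly from the coordinates. The sketch below assumes that standard distance is the root of the mean squared distance to the mean center, and it leaves out the median center, which needs an iterative search. The function name `describe_points` and the example square are hypothetical.

```python
import math


def describe_points(points, area):
    """Frequency, intensity, mean center and standard distance of a point pattern."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Sum of squared Euclidean distances from each point to the mean center
    sq = sum((x - mean_x) ** 2 + (y - mean_y) ** 2 for x, y in points)
    return {
        "frequency": n,
        "intensity": n / area,
        "mean_center": (mean_x, mean_y),
        "standard_distance": math.sqrt(sq / n),
    }


# Hypothetical pattern: the four corners of a 2 x 2 study area
square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
summary = describe_points(square, area=4.0)
print(summary)
```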
![Page 37: Analysing spatial distribution and dependencies between](https://reader035.vdocument.in/reader035/viewer/2022081503/62a1fb58a20a6c7db92802ea/html5/thumbnails/37.jpg)
Simple indices of descriptive
spatial statistics are used
in different methods
– frequency (in statistics) (frequency of occurrence)
• number of points (of equal value) in the study area
– density
• number of objects per unit area
– mean center
• the point whose coordinates are the means of the corresponding coordinates of the point set
– median center
• the point from which the sum of the distances to the other points in the study area is smallest
– distance
• usually Euclidean distance (there are others too)
– standard distance
• describes the dispersion of the points around the mean center
![Page 38: Analysing spatial distribution and dependencies between](https://reader035.vdocument.in/reader035/viewer/2022081503/62a1fb58a20a6c7db92802ea/html5/thumbnails/38.jpg)
How to model point patterns?
• Points are random/stochastic variables produced by a process (like domestic fires in Helsinki)
• When the variables are indexed by spatial points, a spatial random field is created
– Variables in a spatial random field are geometrically dependent
• A spatial random field is a spatial stochastic process
– Mathematical and statistical methods can be applied under some restrictions
• In inferential statistics, empirical data is compared to a mathematical model
• In spatial statistics, the Poisson process is often used as the model of complete randomness
– Various methods are used to show whether the data set fits the model or not
– See pages 58…64 of the book for the theory behind Poisson; the Poisson distribution is actually an approximation of the binomial distribution, which is laborious to calculate; a simple example helps you understand the formula and the idea
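The relation between the two distributions can be checked numerically. In the sketch below the numbers are illustrative assumptions: 10 events fall independently into 8 equal quadrats, so the count in one quadrat is binomial with n = 10 and p = 1/8, and the Poisson approximation uses lambda = 10/8. The function names are hypothetical.

```python
import math


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """Probability of exactly k events when lam events are expected."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


n_events, n_quadrats = 10, 8       # illustrative numbers
p = 1 / n_quadrats                 # chance that one event lands in a given quadrat
lam = n_events / n_quadrats        # expected number of events per quadrat

for k in range(4):
    print(k, round(binomial_pmf(k, n_events, p), 4), round(poisson_pmf(k, lam), 4))
```

The two columns are close but not identical; the approximation improves as the number of events grows and the probability per quadrat shrinks.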
![Page 39: Analysing spatial distribution and dependencies between](https://reader035.vdocument.in/reader035/viewer/2022081503/62a1fb58a20a6c7db92802ea/html5/thumbnails/39.jpg)
Basics of modelling a point pattern
• Points are the result of some real-world process, such as
“accidents”
– We can try to describe the real-world process with a mathematical
process
• Because real-world processes always involve randomness, they are
described with random processes, that is, stochastic processes
• Events, e.g. accidents, are described as the random variables
of the process
• When the random variables are indexed by points in space, a
spatial stochastic process, a spatial random field, is created
– In a spatial random field the variables are geometrically
linked
• In statistical inference, empirical data is compared to a suitable
mathematical model; point data is often compared to the Poisson process
• The Poisson process is thus the model of complete randomness, to which empirical
data is compared to see how well the data fits the model
– See pages 58-64 in the book, where the binomial distribution is presented with an example
![Page 40: Analysing spatial distribution and dependencies between](https://reader035.vdocument.in/reader035/viewer/2022081503/62a1fb58a20a6c7db92802ea/html5/thumbnails/40.jpg)
Example of a point pattern as a process
• “Domestic fires” can be considered as a process
• Variable x = domestic fire, describes the process
• Variable x has possible locations
• We are interested in the probabilities of the possible locations
• If the environment in Helsinki were totally homogeneous and we
could not recognize any spatial autocorrelation in the domestic fire
phenomenon
– We could assume that the probability of having a fire in any
location is equal throughout Helsinki
– and we could say that the process is random
• But in Helsinki the domestic fires depend on
– The location of the buildings (no building => no fire)
– The population (in most cases people cause fires)
– Other dependencies ….(facilities, materials…)
![Page 41: Analysing spatial distribution and dependencies between](https://reader035.vdocument.in/reader035/viewer/2022081503/62a1fb58a20a6c7db92802ea/html5/thumbnails/41.jpg)
Example of building fires
as a process
• “Building fires” is a process
• Variable x = building fire, describes the process
• Variable x has possible locations
• We are interested in the probabilities of the possible locations of
building fires
• If, for example, the Helsinki area were totally homogeneous
in its properties and we could not recognize autocorrelation
in the building fire phenomenon
– we could assume that the probability of a building fire is
equal in every location of the Helsinki area
– we could say that the building fire process is random
• But in the Helsinki area the ignition of a building fire depends on
– The location of buildings (no building => no fire)
– The location of the population (in most cases people cause fires)
– Other dependencies ….(equipment, materials…)
![Page 42: Analysing spatial distribution and dependencies between](https://reader035.vdocument.in/reader035/viewer/2022081503/62a1fb58a20a6c7db92802ea/html5/thumbnails/42.jpg)
Can you identify some spatial
process/phenomenon that could be
random?
![Page 43: Analysing spatial distribution and dependencies between](https://reader035.vdocument.in/reader035/viewer/2022081503/62a1fb58a20a6c7db92802ea/html5/thumbnails/43.jpg)
Example on point pattern: domestic fires
Questions: How many in the study region? Intensity?
Spatial/spatio-temporal arrangement? Dependencies on
other data?
![Page 44: Analysing spatial distribution and dependencies between](https://reader035.vdocument.in/reader035/viewer/2022081503/62a1fb58a20a6c7db92802ea/html5/thumbnails/44.jpg)
The example continues
• Many observation data sets can be approximated by using
well-known mathematically defined random variables, that is, by using
probability distributions
• The most typical is the normal distribution
• For discrete phenomena we can use discrete random variables,
for example the binomial distribution
• The most popular distribution for discrete spatial phenomena is the
Poisson distribution, which gives the probability of observing a given number of events
in a specified area
![Page 45: Analysing spatial distribution and dependencies between](https://reader035.vdocument.in/reader035/viewer/2022081503/62a1fb58a20a6c7db92802ea/html5/thumbnails/45.jpg)
The example continues
• Many observation data sets can be approximated by using
well-known mathematically defined random variables, that is, by using probability distributions
• The normal distribution is the best known
• Phenomena differ, however: for example, some are continuous and some discrete; different distributions are used
• For discrete phenomena, discrete random variables are used, for example the binomial distribution
• The most used distribution for discrete phenomena is the Poisson distribution (mathematically easier than the binomial)
• It can be used to model the probability of an event in a given location
![Page 46: Analysing spatial distribution and dependencies between](https://reader035.vdocument.in/reader035/viewer/2022081503/62a1fb58a20a6c7db92802ea/html5/thumbnails/46.jpg)
CSR, complete spatial randomness
• CSR is a concept that represents the null
hypothesis in the analysis of point pattern arrangement
• It stands for a stationary process, in which
– the intensity does not change over space
– there is no interaction between the points
• CSR is also called the independent random process,
IRP
• The Poisson process is often used to model CSR of
point patterns
![Page 47: Analysing spatial distribution and dependencies between](https://reader035.vdocument.in/reader035/viewer/2022081503/62a1fb58a20a6c7db92802ea/html5/thumbnails/47.jpg)
CSR
• CSR is a concept that describes the null hypothesis of the analysis
about the arrangement of the point pattern
• It is a stationary process, in which
– the intensity does not change
– there is no interaction between the points
• CSR is also called the independent
random process
• The Poisson process is often used as the model of CSR
![Page 48: Analysing spatial distribution and dependencies between](https://reader035.vdocument.in/reader035/viewer/2022081503/62a1fb58a20a6c7db92802ea/html5/thumbnails/48.jpg)
Poisson process, Poisson distribution
• The Poisson process is a stochastic process that models discrete,
independent events in space or time; stationarity is assumed
• The Poisson distribution is produced by the Poisson process
• The Poisson distribution has one parameter, the intensity lambda, which is
the expected number of events in a unit of time (or space) and also the variance
• Using this distribution we can calculate the probability of having
a given number of points in a given space
• We only need to know lambda (the expected value)
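As a small worked example, the Poisson probability of exactly k events is P(k) = lambda^k · e^(−lambda) / k!. The sketch below evaluates it for an assumed lambda of 2 events per unit area (a made-up value) and checks numerically that the mean and the variance both equal lambda. The function name `poisson_pmf` is hypothetical.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k events when lam events are expected."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


lam = 2.0   # assumed intensity: 2 events expected per unit area
probs = [poisson_pmf(k, lam) for k in range(60)]   # k = 0..59 covers practically all the mass

mean = sum(k * p for k, p in enumerate(probs))
variance = sum((k - mean) ** 2 * p for k, p in enumerate(probs))

print(round(probs[0], 4))                  # probability of an empty unit area
print(round(mean, 6), round(variance, 6))  # both should be very close to lam
```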
![Page 49: Analysing spatial distribution and dependencies between](https://reader035.vdocument.in/reader035/viewer/2022081503/62a1fb58a20a6c7db92802ea/html5/thumbnails/49.jpg)
Poisson process, Poisson distribution
• The Poisson process is a stochastic process that models mutually
independent events in space and time; stationarity condition
• The Poisson process is the process that produces the Poisson distribution
• The Poisson distribution has one parameter, the intensity lambda of the Poisson
process, which is the expected number of events
in a unit of time (space); the variance of the distribution is lambda
• The Poisson distribution gives the probability of a given
number of events in a given interval of time (region of space) for different
lambda values
• Only lambda needs to be known
![Page 50: Analysing spatial distribution and dependencies between](https://reader035.vdocument.in/reader035/viewer/2022081503/62a1fb58a20a6c7db92802ea/html5/thumbnails/50.jpg)
How to analyse an empirical data set with a
CSR hypothesis
![Page 51: Analysing spatial distribution and dependencies between](https://reader035.vdocument.in/reader035/viewer/2022081503/62a1fb58a20a6c7db92802ea/html5/thumbnails/51.jpg)
Use of a simple statistical test:
Quadrat method
• So-called quadrat methods
– the region is divided into subareas
– the number of events in each quadrat is recorded
• The quadrats can fill the study region with no overlaps
• The quadrats can be randomly placed
• We can compute
– quadrat counts – number of events in each quadrat
– frequency distribution
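The counting step can be sketched as follows, assuming a rectangular study region that is completely filled by a regular grid of quadrats and that all points lie inside the region. The function name `quadrat_counts` and the five example points are hypothetical.

```python
from collections import Counter


def quadrat_counts(points, xmin, ymin, xmax, ymax, nx, ny):
    """Count the points in each cell of an nx-by-ny grid laid over the study region."""
    cell_w = (xmax - xmin) / nx
    cell_h = (ymax - ymin) / ny
    counts = [[0] * nx for _ in range(ny)]
    for x, y in points:
        # min() keeps points lying exactly on the upper boundary in the last cell
        col = min(int((x - xmin) / cell_w), nx - 1)
        row = min(int((y - ymin) / cell_h), ny - 1)
        counts[row][col] += 1
    return counts


# Hypothetical events in a unit square, counted in a 2 x 2 grid
points = [(0.1, 0.1), (0.2, 0.3), (0.8, 0.9), (0.6, 0.2), (0.9, 0.1)]
counts = quadrat_counts(points, 0.0, 0.0, 1.0, 1.0, nx=2, ny=2)

# Frequency distribution: how many quadrats contain 0, 1, 2, ... events
frequency = Counter(c for row in counts for c in row)
print(counts)
print(sorted(frequency.items()))
```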
![Page 52: Analysing spatial distribution and dependencies between](https://reader035.vdocument.in/reader035/viewer/2022081503/62a1fb58a20a6c7db92802ea/html5/thumbnails/52.jpg)
Simple statistical test:
Quadrat method
• Quadrat methods
– the region is divided into parts of equal size (square, polygon)
• the observations are counted for each subarea
– the parts can fill the region completely (grid)
– the parts can be chosen randomly
– we can compute
• point totals for each quadrat
• frequency distribution – how the points are distributed among the subareas
![Page 53: Analysing spatial distribution and dependencies between](https://reader035.vdocument.in/reader035/viewer/2022081503/62a1fb58a20a6c7db92802ea/html5/thumbnails/53.jpg)
(O'Sullivan & Unwin)
![Page 54: Analysing spatial distribution and dependencies between](https://reader035.vdocument.in/reader035/viewer/2022081503/62a1fb58a20a6c7db92802ea/html5/thumbnails/54.jpg)
Analysis of the quadrat counts
• The Poisson distribution is the null hypothesis for the point pattern (representing IRP/CSR)
– if the variance/mean ratio (VMR) = 1, the distribution is Poisson
– if the ratio > 1, the point pattern is more clustered
– if the ratio < 1, the point pattern is more evenly distributed
• The chi-square (χ²) test can be applied in the analysis
– independent observations
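A minimal sketch of the calculation is given below. It assumes the sample variance (divisor m − 1) and the usual index-of-dispersion form of the chi-square statistic, χ² = Σ (count − mean)² / mean = (m − 1) · VMR, with m − 1 degrees of freedom for m quadrats. The function name `dispersion_test` and the two sets of counts are hypothetical.

```python
def dispersion_test(counts):
    """Return (VMR, chi-square statistic, degrees of freedom) for quadrat counts."""
    m = len(counts)
    mean = sum(counts) / m
    variance = sum((c - mean) ** 2 for c in counts) / (m - 1)   # sample variance
    vmr = variance / mean
    chi2 = (m - 1) * vmr        # equals sum((c - mean)^2) / mean
    return vmr, chi2, m - 1


clustered = [0, 0, 0, 8]   # all events in one quadrat
even = [2, 2, 2, 2]        # the same number of events in every quadrat

print(dispersion_test(clustered))
print(dispersion_test(even))
```

The first pattern gives a VMR well above 1 and the second a VMR of 0. To decide on significance, the chi-square statistic is compared with the critical value for the given degrees of freedom, taken from a table or a statistics library.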
![Page 55: Analysing spatial distribution and dependencies between](https://reader035.vdocument.in/reader035/viewer/2022081503/62a1fb58a20a6c7db92802ea/html5/thumbnails/55.jpg)
Analysis of the quadrat results
• The binomial distribution or its approximation, the Poisson distribution (IRP), can be used to model the distribution of the point pattern
– the simplest test of how well the data follows this distribution:
• variance/mean (VMR) = 1: the distribution is Poisson
• if the ratio > 1, the data is more clustered
• if the ratio < 1, the data is more evenly distributed
• The chi-square (χ²) test, for example, can also be used to analyse the distribution; see the example on p. 98…
– independent observations
![Page 56: Analysing spatial distribution and dependencies between](https://reader035.vdocument.in/reader035/viewer/2022081503/62a1fb58a20a6c7db92802ea/html5/thumbnails/56.jpg)
More examples of using statistical
methods in inference
• Methods that we have come to know as summary statistics can
also be used in statistical inference
• From the research work of Ms. Olga Spatenkova
– analysis of fire and rescue incident data and some socio-economic
explanatory variables
– the goal of the research is to find good variables to model the
risk of incidents
– The K-function was used and the statistics were compared to a
simulated hypothetical process; the G-function was used to find the best
model of the potential explanatory variables, statistical
significance was calculated and conclusions were made; see the thesis,
Chapter 6
![Page 57: Analysing spatial distribution and dependencies between](https://reader035.vdocument.in/reader035/viewer/2022081503/62a1fb58a20a6c7db92802ea/html5/thumbnails/57.jpg)
More examples of statistical
inference
• Methods that have been used to describe data can also be
used for inference
• Olga Spatenkova's doctoral dissertation
– Building fire incident data is analysed and compared with
some potentially explanatory variables
– The goal is to find possible factors that explain building fires
and to improve the risk level model
– The K-function is used to analyse the randomness of building
fires; the G-function is used to find potential explanatory
variables; testing of statistical significance and
drawing of conclusions, Chapter 6
![Page 58: Analysing spatial distribution and dependencies between](https://reader035.vdocument.in/reader035/viewer/2022081503/62a1fb58a20a6c7db92802ea/html5/thumbnails/58.jpg)
Ĝ function for building fires and
population density
• Ĝ function (solid line)
• Theoretical values for random distribution (dashed line)
• Simulation envelopes (dotted line)
• (Spatenkova, 2009)
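The three curves in such a plot can be approximated with a short sketch. It assumes a cross-type nearest-neighbour Ĝ (distance from each fire to the nearest cell centre of one population density class), a unit-square study area, no edge correction, and a null model in which the cell centres are completely random; the data, the function names and the choice of 39 simulations are hypothetical and are not taken from the thesis.

```python
import math
import random


def g_hat(from_pts, to_pts, radii):
    """Share of from_pts whose nearest to_pts neighbour lies within each radius."""
    nearest = [min(math.dist(p, q) for q in to_pts) for p in from_pts]
    return [sum(d <= r for d in nearest) / len(nearest) for r in radii]


def g_csr(intensity, radii):
    """Theoretical G under complete spatial randomness: 1 - exp(-lambda*pi*r^2)."""
    return [1.0 - math.exp(-intensity * math.pi * r * r) for r in radii]


def g_envelope(from_pts, n_to, radii, n_sim, rng):
    """Min/max of g_hat over n_sim random patterns of n_to points in the unit square."""
    sims = []
    for _ in range(n_sim):
        sim_to = [(rng.random(), rng.random()) for _ in range(n_to)]
        sims.append(g_hat(from_pts, sim_to, radii))
    lower = [min(col) for col in zip(*sims)]
    upper = [max(col) for col in zip(*sims)]
    return lower, upper


rng = random.Random(42)
# Hypothetical cell centres of one density class, and fires placed right next to them.
cells = [(rng.random(), rng.random()) for _ in range(50)]
fires = [(x + 0.005, y + 0.005) for x, y in cells]

radii = [0.01 * k for k in range(1, 11)]
observed = g_hat(fires, cells, radii)               # solid line
theory = g_csr(len(cells) / 1.0, radii)             # dashed line (unit area)
lower, upper = g_envelope(fires, len(cells), radii, 39, rng)  # dotted lines
```

Where the observed curve lies above the upper envelope, fires are closer to the cells of that class than the random model would suggest, which is read as a sign of spatial dependence.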
![Page 59: Analysing spatial distribution and dependencies between](https://reader035.vdocument.in/reader035/viewer/2022081503/62a1fb58a20a6c7db92802ea/html5/thumbnails/59.jpg)
Ĝ function for building fires and
stage of life in households
• Ĝ function (solid line)
• Theoretical values for random distribution (dashed line)
• Simulation envelopes (dotted line)
![Page 60: Analysing spatial distribution and dependencies between](https://reader035.vdocument.in/reader035/viewer/2022081503/62a1fb58a20a6c7db92802ea/html5/thumbnails/60.jpg)
Ĝ function for building fires and
building type
• Ĝ function (solid line)
• Theoretical values for random distribution (dashed line)
• Simulation envelopes (dotted line)
![Page 61: Analysing spatial distribution and dependencies between](https://reader035.vdocument.in/reader035/viewer/2022081503/62a1fb58a20a6c7db92802ea/html5/thumbnails/61.jpg)
Analysis steps
• Analysis of kernel density; findings about differences from random distribution; two clear hotspots
• Analysis of domestic fires' distance statistics; plot of the K-function of the empirical data and the theoretical values of the CSR model; CSR model with the same intensity simulated 90 times, envelope created; clustering found
• Analyses with domestic fires, socio-economic variables and building types by using the G-function; simulated process for comparison
• Statistical significance testing
• Special features
– Edge problem solved by buffer
– Temporal aspect taken into account by dividing fires into day, evening and night fires
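The K-function step can be sketched as follows. This is a simplified illustration under stated assumptions: a unit-square study area, a naive estimator without any edge correction (the thesis used a buffer), and a synthetic clustered pattern standing in for the fire data; only the 90 CSR simulations come from the slide.

```python
import math
import random


def k_hat(points, radii, area=1.0):
    """Naive Ripley's K estimate (no edge correction) at each radius."""
    n = len(points)
    dists = [
        math.dist(points[i], points[j])
        for i in range(n)
        for j in range(i + 1, n)
    ]
    # Each unordered pair is counted twice (i -> j and j -> i).
    return [area * 2 * sum(d <= r for d in dists) / (n * n) for r in radii]


def k_csr(radii):
    """Theoretical K under CSR: pi * r^2."""
    return [math.pi * r * r for r in radii]


def csr_envelope(n, radii, n_sim, rng):
    """Min/max of k_hat over n_sim CSR patterns with the same number of points."""
    sims = [
        k_hat([(rng.random(), rng.random()) for _ in range(n)], radii)
        for _ in range(n_sim)
    ]
    lower = [min(col) for col in zip(*sims)]
    upper = [max(col) for col in zip(*sims)]
    return lower, upper


rng = random.Random(7)
# Hypothetical clustered "fires": 6 clusters of 10 points each.
centres = [(rng.uniform(0.1, 0.9), rng.uniform(0.1, 0.9)) for _ in range(6)]
fires = [
    (cx + rng.gauss(0, 0.02), cy + rng.gauss(0, 0.02))
    for cx, cy in centres
    for _ in range(10)
]

radii = [0.02 * k for k in range(1, 6)]
observed = k_hat(fires, radii)
theory = k_csr(radii)
lower, upper = csr_envelope(len(fires), radii, 90, rng)

# Radii at which the observed K exceeds the simulation envelope (clustering).
clustered_at = [r for r, obs, hi in zip(radii, observed, upper) if obs > hi]
```

An observed K above the upper envelope at a given distance indicates more point pairs within that distance than the CSR model produces, which corresponds to the "clustering found" conclusion.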
![Page 62: Analysing spatial distribution and dependencies between](https://reader035.vdocument.in/reader035/viewer/2022081503/62a1fb58a20a6c7db92802ea/html5/thumbnails/62.jpg)
Analysis steps
• Kernel density analysis; observations of deviation from random distribution; two clear hotspots
• Distance analysis of building fires; the empirical data and the theoretical model of randomness are plotted with the K-function; the randomness model is used to simulate the same event intensity, giving a min/max band in the plot; clustering is found
• Analysis of dependencies between building fires, socio-economic variables and building types with the G-function; randomness is modelled as above
• Special features
– Edge problem solved with a buffer zone
– Temporal dimension taken into account by dividing building fires into day, evening and night incidents
![Page 63: Analysing spatial distribution and dependencies between](https://reader035.vdocument.in/reader035/viewer/2022081503/62a1fb58a20a6c7db92802ea/html5/thumbnails/63.jpg)
Appendix 1: Background
• in fire and rescue services a so-called risk-level model is used for
resource allocation
• in Finland the risk-level model is used in each municipality
• the variables in the model are:
– population density
– floor area
– intensity of traffic accidents
• based on these data, risk level is calculated in each grid cell (size
250 m x 250 m)
![Page 64: Analysing spatial distribution and dependencies between](https://reader035.vdocument.in/reader035/viewer/2022081503/62a1fb58a20a6c7db92802ea/html5/thumbnails/64.jpg)
Appendix 1: Background of the application
• fire and rescue services use a so-called risk-level model, with
which resources can be placed according to the assumed need
• in Finland, the incident risk level is calculated for all municipalities
• the model uses the following variables as predictors of risk
– population density
– floor area
– density of traffic accidents
• with these variables the incident risk level is calculated and
a risk-level map is obtained at a resolution of 250 m x 250 m
(the size of a statistical grid cell)
![Page 65: Analysing spatial distribution and dependencies between](https://reader035.vdocument.in/reader035/viewer/2022081503/62a1fb58a20a6c7db92802ea/html5/thumbnails/65.jpg)
In this lecture: examples from Olga
Spatenkova's thesis
• Olga made her doctoral thesis on analysis of domestic fire data in
the Helsinki area
• She tried several analysis methods in order to find possible causes, or
at least some correlations, between some socio-economic variables
and domestic fires
• She tried spatio-statistical descriptive and inferential methods,
geographically weighted regression and also some spatial data mining
methods like neural networks
![Page 66: Analysing spatial distribution and dependencies between](https://reader035.vdocument.in/reader035/viewer/2022081503/62a1fb58a20a6c7db92802ea/html5/thumbnails/66.jpg)
Result: "risk-level map" (risk map): probability classes of incidents
in the area, calculated from the selected variables
With this map, resources can, for example, be placed in the right
locations in the area: fire-fighting equipment where the probability
of incidents appears to be highest.
Red - high
Yellow - medium
Green - low
![Page 67: Analysing spatial distribution and dependencies between](https://reader035.vdocument.in/reader035/viewer/2022081503/62a1fb58a20a6c7db92802ea/html5/thumbnails/67.jpg)
Problem: we know the incidents, but we want to
know the potential causes of them
(Spatenkova, 2009)
The first task is just to analyse the events as a map.
Incidents are taken from the Pronto database.
![Page 68: Analysing spatial distribution and dependencies between](https://reader035.vdocument.in/reader035/viewer/2022081503/62a1fb58a20a6c7db92802ea/html5/thumbnails/68.jpg)
Maps show that there is no correlation between incident density
and population density in Helsinki
a. Incident density
b. Population density, according to address
![Page 69: Analysing spatial distribution and dependencies between](https://reader035.vdocument.in/reader035/viewer/2022081503/62a1fb58a20a6c7db92802ea/html5/thumbnails/69.jpg)
The incident density is then computed as a kernel density surface,
separately for the daytime data and the night-time data
Panels: Daytime, Night-time
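Computing separate density surfaces for day and night incidents can be sketched as below. The Gaussian kernel, the bandwidth, the 20 x 20 grid on a unit square, the hour limits for "day" and the six example events are all assumptions for illustration; the slides do not state which kernel or parameters were used.

```python
import math


def kernel_density(points, bandwidth, nx, ny):
    """Gaussian kernel density (events per unit area) at the cell centres of an
    nx-by-ny grid over the unit square; rows index y, columns index x."""
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2)
    surface = []
    for j in range(ny):
        cy = (j + 0.5) / ny
        row = []
        for i in range(nx):
            cx = (i + 0.5) / nx
            total = sum(
                math.exp(-((cx - x) ** 2 + (cy - y) ** 2) / (2.0 * bandwidth ** 2))
                for x, y in points
            )
            row.append(norm * total)
        surface.append(row)
    return surface


def split_day_night(events, day_start=6, day_end=18):
    """Split (x, y, hour) events into day and night; the hour limits are assumed."""
    day = [(x, y) for x, y, h in events if day_start <= h < day_end]
    night = [(x, y) for x, y, h in events if not day_start <= h < day_end]
    return day, night


def peak_cell(surface):
    """(row, column) of the cell with the highest density."""
    return max(
        ((j, i) for j, row in enumerate(surface) for i in range(len(row))),
        key=lambda ji: surface[ji[0]][ji[1]],
    )


# Hypothetical incidents: (x, y, hour of day).
events = [
    (0.25, 0.25, 10), (0.30, 0.25, 12), (0.25, 0.30, 14),
    (0.75, 0.75, 23), (0.70, 0.75, 2), (0.75, 0.70, 3),
]
day, night = split_day_night(events)
day_surface = kernel_density(day, 0.1, 20, 20)
night_surface = kernel_density(night, 0.1, 20, 20)
print(peak_cell(day_surface), peak_cell(night_surface))
```

In this toy data the daytime hotspot falls in the lower-left part of the grid and the night-time hotspot in the upper-right part, which is the kind of difference the two map panels are meant to reveal.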
![Page 70: Analysing spatial distribution and dependencies between](https://reader035.vdocument.in/reader035/viewer/2022081503/62a1fb58a20a6c7db92802ea/html5/thumbnails/70.jpg)
The result: population density and night-time incident density
correlate spatio-temporally
a. Incident density at night-time
b. Population density by address = population density at night-time
![Page 71: Analysing spatial distribution and dependencies between](https://reader035.vdocument.in/reader035/viewer/2022081503/62a1fb58a20a6c7db92802ea/html5/thumbnails/71.jpg)
Incident density analysis with a kernel density surface:
night-time incidents, Fri-Sat (night-time weekend incident density)
(Spatenkova, 2009)
Producing the kernel density surface
![Page 72: Analysing spatial distribution and dependencies between](https://reader035.vdocument.in/reader035/viewer/2022081503/62a1fb58a20a6c7db92802ea/html5/thumbnails/72.jpg)
Ĝ function: building fires and population density;
the analysis is based on the distance between them
▪ Ĝ function (solid line): a cumulative frequency curve of the distances between building fires and population density in the data (both converted from a density surface to grid cell centre points, with population density classified into three classes)
▪ Theoretical values for random distribution (dashed line), simulated values (dotted line)
▪ (Spatenkova, 2009)
![Page 73: Analysing spatial distribution and dependencies between](https://reader035.vdocument.in/reader035/viewer/2022081503/62a1fb58a20a6c7db92802ea/html5/thumbnails/73.jpg)
What kind of methods?
• Kernel density estimation is a descriptive method that requires visual interpretation by the user
• The G-function can be used both as a descriptive and as an inferential method, with random distribution as the hypothesis
• GWR is a regression model that can be used in the analysis of dependencies between variables; it is a model for prediction
![Page 74: Analysing spatial distribution and dependencies between](https://reader035.vdocument.in/reader035/viewer/2022081503/62a1fb58a20a6c7db92802ea/html5/thumbnails/74.jpg)
Literature
• O'Sullivan & Unwin: Geographic Information Analysis, Chapters 2, 3, 4
– References made in the slides to the 2003 edition of the book
• Dale, M.R., & Fortin, M.-J., Spatial analysis. A guide for ecologists. Ch. 6, pp. 140…152.
• Spatenkova, O., Discovering spatio-temporal relationships: A case study of risk modelling of domestic fires. Doctoral thesis, Helsinki University of Technology, 2009. Chapter 6.
• Rogerson, P., Statistical methods for geography. A student's guide. 2015. This book can be used as background reading material, if you need to know some details.
• Brunsdon, C., Comber, L., An introduction to R for spatial analysis & mapping, Chapter 6 (6.1 - 6.6)
![Page 75: Analysing spatial distribution and dependencies between](https://reader035.vdocument.in/reader035/viewer/2022081503/62a1fb58a20a6c7db92802ea/html5/thumbnails/75.jpg)
Reading material
• O'Sullivan & Unwin: Geographic Information Analysis, Chapters 3, 4, 5
– References made in the slides to the 2003 edition of the book
• Dale, M.R., & Fortin, M.-J., Spatial analysis. A guide for ecologists. Ch. 6, pp. 140…152.
• Spatenkova, O., Discovering spatio-temporal relationships: A case study of risk modelling of domestic fires. Doctoral thesis, Helsinki University of Technology, 2009. Chapter 6.
• Rogerson, P., Statistical methods for geography. A student's guide. 2015. This book can be used as background reading material, if you need to know some details.
• Brunsdon, C., Comber, L., An introduction to R for spatial analysis & mapping, Chapter 6 (6.1 - 6.6)