
  • 7/24/2019 An Investigation in Defining Neighbourhood Boundaries Using Location Based Social Media

    1/151

    1

    An Investigation in Defining Neighbourhood Boundaries

    Using Location Based Social Media

    Tai Tong KAM

    28th August 2015

    For BENVGSC6: Dissertation

    Supervised by: Steven Gray, Dr Elsa Arcaute

    Word Count: 10,169 words

    This dissertation is submitted in partial fulfilment for the requirements for the MSc in

    Smart Cities and Urban Analytics in the Centre for Advanced Spatial Analysis, Bartlett

    Faculty of the Built Environment, University College London.


    ABSTRACT

    The widespread use of smartphones and social media has opened opportunities for

    researchers to define one of the most elusive concepts in cities: neighbourhoods. While

    the number of neighbourhood detection methods using location based social media has increased in recent years, there is much that we do not know about the process. For

    example, researchers have rarely integrated the neighbourhoods detected with

    administrative data to add meaning beyond what can be inferred from social media.

    This work takes a step towards better understanding neighbourhood detection methods,

    and also attempts to add meaning to the clusters / neighbourhoods generated by

    incorporating administrative data to these clusters / neighbourhoods.

    I break down the neighbourhood detection process into three common elements: (a) the unit used for aggregation, (b) the type of clustering method used, and (c) the similarity measure.

    I then illustrate one way of better understanding the neighbourhood detection process by

    applying multiple variations of the Livehoods method (Cranshaw et al., 2012) on data

    from Greater London, and find that in addition to neighbourhood clusters, the

    Livehoods method may also be able to generate clusters that depict the city's boundaries from the residents' perspective.

    I also make a preliminary attempt in this work to combine the clusters / neighbourhoods

    formed using the Livehoods method with data from London's Lower Super Output Areas to investigate ethnic diversity in neighbourhoods. I found that location based social media may generate neighbourhood boundaries that are more appropriate than, or can complement, traditional administrative boundaries in studies where the definition of neighbourhood goes beyond arbitrary administrative boundaries and a multifaceted view of neighbourhoods is needed.


    DECLARATION

    I, Tai Tong Kam, hereby declare that this dissertation is all my original work and that all

    sources have been acknowledged. It is 10,169 words in length.

    Signature

    ====================

    Date: 28th August 2015


    TABLE OF CONTENTS

    1. RESEARCH GOAL AND OVERVIEW..................................................................... 8

    1.1. Research goal, motivations, and limitations ......................................................... 8

    1.2. Overview........................................................................................................... 10

    2. INTRODUCTION.................................................................................................... 12

    2.1. Neighbourhoods ................................................................................................ 12

    2.2. Location Based Social Media and Detecting Neighbourhood Boundaries.................. 14

    2.3. Review of Methods for Neighbourhood Detection ............................................. 16

    3. METHODOLOGY ................................................................................................... 25

    3.1. Data sources ...................................................................................................... 25

    3.2. Data sorting, import, storage and analysis......................................................... 26

    3.3. The Livehoods method...................................................................................... 26

    4. ANALYZING THE LIVEHOODS METHOD.......................................................... 30

    4.1. Tuning the number of smallest eigenvalues (k).................................................. 30

    4.2. Tuning the alpha constant (α)............................................................................ 33

    4.3. Tuning the nearest neighbours parameter (m).................................................. 34

    4.4. Using cosine similarity....................................................................................... 35

    4.5. Nearest neighbours versus full similarity graph ................................................ 36

    4.6. Summary........................................................................................................... 36

    5. DESCRIPTION OF LIVEHOOD CLUSTERS / NEIGHBOURHOODS.................. 38

    5.1. Overview of neighbourhoods............................................................................. 38

    5.2. Breakdown of individual neighbourhoods ......................................................... 46

    6. COMPARING LIVEHOODS CLUSTERS TO LOWER SUPER OUTPUT AREAS 54

    7. CONCLUSION......................................................................................................... 59

    7.1. Concluding Remarks ......................................................................................... 59

    7.2. Limitations and Future Research ...................................................................... 60

    8. BIBLIOGRAPHY.................................................................................................... 64

    9. APPENDIX.............................................................................................................. 67

    9.1. Scripts for collecting and formatting data for analysis ...................................... 67

    9.1.1. IPython notebook: twitter_streaming.ipynb ............................................... 67

    9.1.2. IPython notebook: extract_twitter_data.ipynb........................................... 70

    9.1.3. IPython notebook: foursquare_search_place.ipynb................................... 75

    9.1.4. IPython notebook: format_data_for_analysis.ipynb.................................. 84

    9.2. Scripts for Livehoods clustering method........................................................... 89

    9.2.1. Bash script: install.sh................................................................................. 89


    9.2.2. Bash script: runLDN.sh............................................................................. 90

    9.2.3. Python script: clustering.py....................................................................... 92

    9.2.4. Python script: clusteringalgo.py................................................................. 94

    9.2.5. Python script: getdata.py ..........................................................................100

    9.2.6. Python script: utils.py ...............................................................................111

    9.3. Scripts for visualizing cluster results ................................................................119

    9.3.1. Python script: formatresults.py.................................................................119

    9.3.2. Python script: visualize_cluster_results.py................................................127

    9.4. Scripts for comparing Lower Super Output Areas with Livehoods clusters in

    terms of ethnic diversity ..............................................................................................138

    9.4.1. Python script: extract_ldn_lsoa.ipynb .......................................................138

    9.4.2. Python script: add_ethnic_diversity_to_geojson.ipynb.............................141

    9.4.3. Python script: stats_for_eth_diversity.ipynb .............................................146

    9.4.4. R script: ethnic_diversity_chart.R ............................................................148

    9.5. Livehood clusters for nearest neighbours parameter m=5 to m=20..................149

    9.6. Largest cluster generated from Livehoods method ...........................................151


    LIST OF FIGURES

    Figure 1: Relationship between number of smallest eigenvalues (k) found and number of

    clusters formed ................................................................................................................. 32

    Figure 2: Boundaries formed for different number of clusters ............................................ 33

    Figure 3: Boundaries formed for different alpha constants ................................................ 34

    Figure 4: Boundaries formed for different nearest neighbours parameter (m) ....................... 35

    Figure 5: Clustering results for London ................................................................................ 40

    Figure 6: Properties of Livehood clusters ............................................................................. 44

    Figure 7: Overall distribution of venues and checkins across clusters ................................. 47

    Figure 8: Hirschman concentration index (HI) for clusters ................................................... 56

    LIST OF TABLES

    Table 1: Summary statistics for cluster results for London ................................................. 41

    Table 2: Percentage difference between proportion of venues within cluster to proportion of

    venues within city in terms of Foursquare's main categories............................................... 50

    Table 3: Percentage difference between proportion of users within cluster checking-in to

    proportion of users within city checking-in in terms of Foursquare's main categories............ 52


    ACKNOWLEDGMENTS

    I would like to thank my supervisors, Steven Gray and Elsa Arcaute, who have been

    extremely supportive and helpful throughout the dissertation process. Steven was also

    instrumental in helping me process the data by guiding me on the process for setting up

    the cloud computing infrastructure required to run the time-consuming scripts in parallel.

    Elsa introduced me to Anastasios Noulas from the University of

    Cambridge, who kindly provided the Foursquare data used in this work.

    I would also like to thank all the teachers, staff and fellow course mates at CASA, who

    have given me a great year of friendship, learning and joy in my time at CASA and

    inspired me to do better.

    Finally, I would like to thank my partner Cherlyn Ng, whose love, patience and support

    made it possible for me to focus on my work while we were 6,740 miles apart.


    1. RESEARCH GOAL AND OVERVIEW

    1.1. Research goal, motivations, and limitations

    The widespread use of smartphones and social media has generated an immense

    amount of data which has been used to study topics such as mobility and event

    detection in the city (Silva et al., 2013). Some researchers have been attempting to

    use the data to define one of the most elusive concepts in cities: neighbourhoods

    (Cranshaw et al., 2012; Falher et al., 2015; Zhang et al., 2013). While the research is

    promising, there is much that we do not understand about the process of detecting

    neighbourhoods using location based social media. For example, we do not know

    how the neighbourhoods detected compare with traditional administrative

    boundaries, and how we can combine the neighbourhoods detected with data from

    these administrative boundaries to help us better understand cities dynamically. We

    also do not know how the neighbourhoods detected may change when data over

    different time periods or different time intervals are used and what these changes

    may mean.

    This work takes a step towards better understanding neighbourhood detection

    methods. I break down the neighbourhood detection process into three common elements so that each can be studied in depth: (a) the unit used for aggregation, (b) the type of clustering method used, and (c) the similarity measure used.

    Better understanding can come in the form of research on particular elements in the

    neighbourhood detection process across a variety of methods and comparing the

    differences when different elements are used. It can also come in the form of better

    understanding a particular method in depth and exploring how the neighbourhoods

    formed are different depending on the parameters used.


    In this dissertation, I illustrate one way of doing this by applying multiple variations

    of the Livehoods method (Cranshaw et al., 2012) on data from Greater London. The

    Livehoods method was chosen as it is a venues-based approach which has not been

    used as much in the literature. In addition, it has not yet been applied to the Greater

    London area.

    As mentioned above, we do not yet understand how the clusters / neighbourhoods detected via neighbourhood detection methods can be combined with data from administrative boundaries to help us better understand cities. Integrating detected clusters / neighbourhoods with data from administrative boundaries is rare in the neighbourhood detection literature as most

    researchers using neighbourhood detection methods have used them for developing

    recommendation engines that find similar places based on social media activity. As

    such, I make a preliminary attempt in this work to combine the clusters /

    neighbourhoods formed using the Livehoods method with data from more

    traditional administrative boundaries (the Lower Super Output Areas in this case) to

    extend the meaningfulness of the clusters / neighbourhoods formed. In particular, I

    have tried to integrate ethnic diversity data with the clusters / neighbourhoods

    formed using the Livehoods method.

    As neighbourhood detection using location based social media is relatively new and

    there are few comparisons between existing neighbourhood detection methods, this

    work is not aimed at evaluating whether one method or even whether particular

    elements of a method are better than another. Neighbourhood detection is a form of

    clustering, and determining the best clustering method has a certain degree of

    subjectivity.


    1.2. Overview

    The dissertation is divided into seven sections.

    Section Two discusses the concept of neighbourhoods, their importance for understanding cities and why social media is a useful source of data for defining neighbourhoods. I will review the methods that have so far been used for defining neighbourhoods and three elements common to these methods: (a) the unit used for aggregation, (b) the type of clustering method used, and (c) the similarity measure used. I will then describe what we have learnt so far about neighbourhood detection using location based social media, and outline some ideas for better understanding these methods.

    Sections Three to Six illustrate one way we can better understand neighbourhood detection methods by taking a closer look at the Livehoods method (Cranshaw et al., 2012). Section Three begins by describing the data and methodology used.

    Section Four then considers different variations of Cranshaw et al.'s (2012) Livehoods method for neighbourhood detection and tests three different parameters to find out whether changing them affects the clustering results.

    Section Five describes the clusters / neighbourhoods that are formed using the

    Livehoods method and explores some types of information that can be derived from

    these clusters by combining them with Foursquare's venues database.

    Section Six describes the clusters / neighbourhoods that are formed using the

    Livehoods method by combining them with data from Lower Super Output Areas

    (LSOAs) in Greater London. It discusses the issue of the modifiable areal unit

    problem (Openshaw, 1984) and how characteristics of the clusters / neighbourhoods

    formed using the Livehoods method may be more appropriate than traditional

    administrative boundaries such as the LSOAs.


    Section Seven consists of concluding remarks and outlines some ideas for further

    research that can help us better understand neighbourhood detection methods using

    location based social media.


    2. INTRODUCTION

    2.1. Neighbourhoods

    Neighbourhoods are a ubiquitous feature of urban living: everyone lives in a

    neighbourhood. Many groups have an interest in understanding neighbourhoods.

    Cranshaw and Yano (2010) note that analysing neighbourhoods is of interest to

    businesses such as realtors and developers as the quality of a neighbourhood

    affects the value of their assets, and to researchers in the social sciences as they seek

    to understand neighbourhood and community level factors that influence

    phenomena such as obesity rates and perceived happiness through neighbourhood effects (Sampson et al., 2002). A third group with an interest in neighbourhoods is city governments, which implement neighbourhood interventions and wish to identify where the interventions would make sense and be most effective. Being able to identify neighbourhoods in our cities would be valuable to all three groups.

    While there is a general consensus that a neighbourhood is a contiguous

    geographic area within a larger city, limited in size, and somewhat homogeneous in

    its characteristics (Weiss et al., 2007), it is hard to pin down a more exact definition

    (Chaskin, 1998; Weiss et al., 2007). Researchers have defined neighbourhoods in terms of three dimensions, with varying emphasis: social ties, physical demarcations and residents' experiences (Chaskin, 1997). These are influenced by many factors

    such as administrative boundaries, manmade features such as roads, natural features

    such as rivers, demographics, social networks of the people that live in or frequent

    the area, and the availability of services and facilities (Cranshaw and Yano, 2010).

    Each person's perception of their neighbourhood boundaries may differ, even from

    their neighbours, and these perceptions may also differ from the official boundaries

    used by city governments for urban planning or neighbourhood initiatives


    (Campbell et al., 2009). However, researchers have also found evidence that

    residents often identify a common core within their neighbourhood, and the

    differences are about the boundaries where neighbourhoods begin and end

    (Campbell et al., 2009).

    Neighbourhoods differ from communities, in the sense that neighbourhoods are tied

    to a spatial unit with boundaries, while communities are not limited to spatial units.

    This difference is reflected in how the role of neighbourhoods in cities has shifted

    over time. To summarize Chaskin (1997), neighbourhoods in the past were tied

    closely to the idea of community. There were close ties between those living within

    a neighbourhood and a strong sense of identity, akin to an urban village. However,

    as transportation systems improved and communication over long distances became

    available, ties within a neighbourhood have become less close and more functional,

    providing a space where neighbours share information, aid and services. When

    studying social ties within neighbourhoods, it may be useful to look at common

    social and functional activities between those living in a neighbourhood and where

    these activities take place. These may give an indication of places that are

    considered part of the neighbourhood for those involved in the activities.

    Traditionally, studies on neighbourhoods and the neighbourhood effect have used

    boundaries where data was easily available, such as administrative and political

    boundaries. The data is often reliable as it is typically collected by government

    agencies, and the boundaries used usually do not change greatly. Such data is useful

    for understanding long term trends and behaviours such as demographics and

    urbanisation. However, these traditional data sources are usually collected at certain

    periods with long intervals between each period. The data collected represents

    snapshots at particular points in time, and do not capture the multiple changes that


    may occur in between data collection periods. For example, full censuses in the United Kingdom take place only once every ten years. In addition, data from traditional sources is often expensive and time consuming to collect. These issues make data from traditional sources less useful for researchers interested in trends and behaviours that are short term, temporary or frequently changing, such as commuting behaviour during transport strikes or riots. For studying such short term and dynamic trends and behaviours, location based social media is likely to be a more suitable data source.

    2.2. Location Based Social Media and Detecting Neighbourhood Boundaries

    Location based social media is a relatively new source of data for researchers. Users

    of these platforms post their thoughts or activities with location data attached. Many

    of the characteristics of data from these posts or check-ins make it suitable for

    studying short term phenomena and behaviours. It is easily available, it is cheap and

    quick to collect, and it provides multiple points of data within a short period. Its

    biggest advantage over other data sources is the amount of context that it provides.

    A typical data point from location based social media contains information on who the user is, where the user was, and when the post was created. It also provides additional information depending on the social media platform used. For example, Twitter¹ users post tweets indicating what they were doing or thinking, Instagram² users post photos, and Foursquare³ users provide more detailed information about

    ¹ https://twitter.com/

    ² https://instagram.com/

    ³ https://foursquare.com/

    their location. Social media platforms may provide additional contextual

    information. The aforementioned Foursquare, for example, maintains a database of

    venues that their users post from. This database contains rich contextual information

    such as the type of venue (e.g. restaurant, school) and its popularity, which can be

    linked to the posts from its users. Furthermore, it is possible to look at the

    relationships between different users on social media platforms through the users' interactions with each other.
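    To make the who / where / when structure concrete, a single post could be represented by a small record type like the one below. The schema and field names are hypothetical assumptions for illustration only; they do not mirror the actual Twitter, Instagram or Foursquare API responses, and the optional fields stand for the platform-dependent extras described above.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class CheckIn:
    """One location based social media post (hypothetical schema)."""

    user_id: str                          # who: the posting user
    lat: float                            # where: WGS84 latitude in degrees
    lon: float                            # where: WGS84 longitude in degrees
    created_at: datetime                  # when: time the post was created
    text: Optional[str] = None            # platform extra, e.g. tweet text
    venue_id: Optional[str] = None        # platform extra, e.g. a venue database id
    venue_category: Optional[str] = None  # venue context, e.g. "Restaurant"

    def has_venue_context(self) -> bool:
        """True if the post can be linked to an entry in a venue database."""
        return self.venue_id is not None


# Hypothetical example post with venue context attached.
post = CheckIn(
    user_id="u42",
    lat=51.5246,
    lon=-0.1340,
    created_at=datetime(2015, 8, 28, 12, 30, tzinfo=timezone.utc),
    venue_id="v1",
    venue_category="Coffee Shop",
)
print(post.has_venue_context())  # True
```

    Keeping the venue fields optional reflects that only some platforms (such as Foursquare) link posts to a venue database, while a plain geotagged tweet carries only who, where, when and text.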

    Silva et al. (2012) observe that the widespread adoption of smartphones and social

    media websites has created a valuable opportunity to study city dynamics. Data

    from location based social media provides rich contextual information on user

    activity at different times of day. These characteristics make location-based social

    media useful in detecting the invisible image of cities (Silva et al., 2012), such as

    patterns of transition between locations that serve different functions in the city.

    Given that city neighbourhoods do not follow strict boundaries and can shift over

    time (Chaskin, 1997), location-based social media, which provides a large amount

    of data in real time, is a useful source of information for neighbourhood detection in

    cities and identifying changes over time. As such, researchers have also started to

    use social media to detect neighbourhood boundaries.

    Using data from location based social media has its limitations. While data from

    location based social media has rich context and can be collected easily, such

    platforms are typically used by young males who are interested in technology

    (Cranshaw et al., 2012), so the data represent a skewed demographic. Using such

    data may generate clusters / neighbourhoods that reflect the views of a certain

    demographic, which may not be in agreement with the general population. In

    addition, data on these platforms are usually private unless the user agrees to share


    the data publicly, which further limits the amount of data available for analysis.

    Another factor to consider is that users may curate the types of places that they

    check-in at using location based social media. Places that are considered more

    socially desirable to be at may be over represented when using data from location

    based social media. For example, people may be more likely to check in when eating at a fashionable new restaurant or shopping in a branded goods store than when

    they are eating at a fast food restaurant or shopping in a discount store. This means

    that conclusions based on data from location based social media will likely be

    biased towards such socially desirable venues. In the case of neighbourhood

    detection, the clusters / neighbourhoods formed may be similarly biased. Previous

    research has shown that users have been more likely to check-in at venues

    concerning travel and transport, office buildings, and residences (Preotiuc-Pietro

    and Cohn, 2013). Despite these limitations, researchers believe that data from

    location based social media can still be valuable for its rich contextual information

    and sheer volume available (Silva et al., 2013).

    2.3. Review of Methods for Neighbourhood Detection

    What follows is a review of neighbourhood detection methods using location-based

    social media. Neighbourhood detection using location based social media is

    typically treated as a clustering problem, and the methods used so far reflect this

    paradigm. Essentially, researchers wish to cluster users' social media activities into

    contiguous geographic areas based on certain measures of similarity.

    Neighbourhood detection methods usually contain three elements:

    a. The unit used for aggregation (e.g. grid-based, venue-based)

    b. The type of clustering method (e.g. K-Means clustering, spectral

    clustering)


    c. The similarity measures used

    Unit used for aggregation

    While the data from location based social media comes in the form of individual

    posts or check-ins, they are usually aggregated in some spatial form before being

    clustered. A common method used in neighbourhood detection is to take the grid-

    based approach for aggregating the posts. This means dividing the city into multiple

    grid squares of equal size and aggregating the properties of the posts within the grid

    square. The properties of the grid squares are later used to calculate similarity

    measures between grid squares during clustering. Noulas et al. (2011), for example,

    used a grid-square approach where each grid contained the distribution of

    Foursquare venue categories nearby and the number of check-ins at these venues.

    Grid squares that are contiguous and are similar to each other based on the

    clustering algorithm are then grouped together to form neighbourhoods. The neighbourhoods formed using grid-based approaches can change depending on the number, size and shape of the grid cells used, which is an important consideration when adopting this approach. For example, large grid cells mean fewer cells overall and faster processing, but they delineate neighbourhood boundaries less precisely. In certain cases, the grid square itself may be treated as

    a neighbourhood. The size of the grid is often a key decision that has to be made in

    grid-based approaches.
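    The grid-based approach described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure of Noulas et al. (2011): check-ins are assumed to be already projected into metres (for example, British National Grid eastings and northings) and tagged with a venue category, the function names and data are hypothetical, and cosine similarity between category-count vectors stands in for whichever similarity measure a given method uses.

```python
import math
from collections import Counter, defaultdict


def grid_cell(x, y, cell_size):
    """Index (column, row) of the square grid cell containing point (x, y).

    Assumes projected coordinates in metres, so every cell has equal area.
    """
    return (math.floor(x / cell_size), math.floor(y / cell_size))


def aggregate_by_grid(checkins, cell_size):
    """Aggregate (x, y, venue_category) check-ins into per-cell category counts."""
    cells = defaultdict(Counter)
    for x, y, category in checkins:
        cells[grid_cell(x, y, cell_size)][category] += 1
    return dict(cells)


def cosine_similarity(a, b):
    """Cosine similarity between two category-count vectors stored as Counters."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical check-ins: (easting, northing, venue category).
checkins = [
    (530100, 181200, "Food"), (530400, 181900, "Food"), (530700, 181300, "Nightlife"),
    (531200, 181100, "Nightlife"), (531800, 181600, "Nightlife"), (531900, 181700, "Food"),
]

cells = aggregate_by_grid(checkins, cell_size=1000)
print(round(cosine_similarity(cells[(530, 181)], cells[(531, 181)]), 3))  # 0.8
print(len(aggregate_by_grid(checkins, cell_size=2000)))                   # 1
```

    With 1,000 m cells the check-ins fall into a food-dominated cell and a nightlife-dominated cell; with 2,000 m cells all six land in a single cell and that difference disappears. This is a small instance of why the choice of grid size matters so much in grid-based approaches.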

    A second method is the venues-based approach. Venues are locations specifically

    identified by location-based social media platforms, which usually have a database

    of venues that users can check-in from. Researchers can make use of the data

    contained in these venue databases in addition to the posts made by the users to


    develop methods for neighbourhood detection. Venues that are considered similar to

    each other and fulfil a proximity criterion such as being within a certain distance

    from each other are then grouped together and the area bounded by these venues

    form a neighbourhood. The proximity criterion is important as it defines the

    geographic aspect of the venues. It is similar to how defining the size and shape of

    the grids in the grid-based approach determines how the grids are geographically

    related to each other. One of the earliest attempts at neighbourhood detection using

    location based social media is called Livehoods (Cranshaw et al., 2012) and this

    took the venues-based approach. Zhang et al (2013) pointed out that one of the

    weaknesses of the venues-based approach is that the neighbourhoods formed have to

    be geographically tied to the network of venues used, whereas the grid-based

    approach does not.
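    A proximity criterion such as "within a certain distance" needs a geographic distance between venues. A minimal sketch using the haversine great-circle formula is shown below; the mean Earth radius of 6,371 km and the 500 m threshold are assumptions chosen for illustration, not values from the literature discussed here.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumption for this sketch)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_distance(venue_a, venue_b, max_km=0.5):
    """Hypothetical proximity criterion: are two (lat, lon) venues within max_km?"""
    return haversine_km(*venue_a, *venue_b) <= max_km


# Two venues about 60 m apart pass; a venue about 1.6 km away does not.
print(within_distance((51.5136, -0.1365), (51.5140, -0.1370)))  # True
print(within_distance((51.5136, -0.1365), (51.5033, -0.1195)))  # False
```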

    Clustering methods

    Clustering methods used in neighbourhood detection reflect the breadth and variety of clustering methods used in other fields. This dissertation does not seek to determine which clustering methods are best for neighbourhood detection using location based social media, since the choice involves a degree of subjectivity. So far, neighbourhood detection methods have used clustering methods such as K-Means clustering (Del Bimbo et al., 2014), spectral clustering (Cranshaw et al., 2012; Noulas et al., 2011) and topic-based modelling (Cranshaw and Yano, 2010). Each clustering method requires the researcher to choose parameters, for example the number of topics in topic-based modelling and the number of clusters in K-Means clustering.

    Similarity measures

    A variety of similarity measures have been used in neighbourhood detection. In terms of the properties included in the similarity measure, researchers have used properties related to users, such as users' check-in patterns and interests (Del Bimbo et al., 2014). Researchers have also used properties related to venues in the databases of location based social media platforms, such as the distribution of Foursquare venue categories nearby and the number of check-ins at these venues (Noulas et al., 2011). Other researchers have combined these properties with temporal properties to provide a contextually richer set of properties for calculating similarity (Falher et al., 2015; Zhang et al., 2013). Different properties characterise neighbourhoods in different ways, which makes them useful for different purposes. Amongst the three dimensions of neighbourhoods mentioned earlier (social ties, physical demarcations and residents' experiences), methods in neighbourhood detection using location based social media have typically used properties related to residents' experiences, for example the number of check-ins, the temporal pattern of check-ins, and the type and number of venues in the area.

    Cosine similarity measures the similarity of two vectors as the cosine of the angle between them (Xia et al., 2015). In neighbourhood detection, these vectors represent the properties of grid cells in grid-based methods and the properties of venues in venues-based methods. Cosine similarity is commonly used for clustering in neighbourhood detection with location based social media, and is often preferred over other similarity measures because it does not take the magnitude of the vectors into account. This is useful where the magnitudes of the vectors differ greatly but matter less for determining similarity. For example, cosine similarity is often used in information retrieval to measure document similarity, as the relative frequencies of words within and across documents are more important than the total number of words in a document (Huang, 2008). Similarly, the magnitudes of the vectors used in neighbourhood detection differ greatly: the most popular venues often garner many more check-ins than less popular ones, and the most active users check in much more frequently than less active users (Scellato and Mascolo, 2011). As such, researchers have found that relative frequencies across venues or grid squares are more useful for neighbourhood detection than absolute numbers, and prefer cosine similarity over Euclidean distance when measuring similarity for neighbourhood detection (Cranshaw et al., 2012; Preoţiuc-Pietro et al., 2013).
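    A small numeric illustration of why magnitude invariance matters: two venues visited by the same users in the same proportions, but at very different volumes, are identical under cosine similarity yet far apart under Euclidean distance. The check-in counts below are invented for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical check-in counts by three users at three venues
popular_venue = [40, 20, 0]  # busy venue
quiet_venue = [4, 2, 0]      # same users, same proportions, 10x fewer check-ins
other_venue = [0, 2, 4]      # mostly a different set of users

print(cosine_similarity(popular_venue, quiet_venue))   # about 1.0
print(cosine_similarity(quiet_venue, other_venue))     # about 0.2
print(euclidean_distance(popular_venue, quiet_venue))  # about 40.25
print(euclidean_distance(quiet_venue, other_venue))    # about 5.66
```

    Under Euclidean distance the quiet venue looks closer to the venue with different users than to the popular venue with the same users, purely because of check-in volume; cosine similarity gives the opposite, more sensible, ranking.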

    Researchers use different combinations of the three elements of neighbourhood detection (unit used for aggregation, clustering method, similarity measure) to create neighbourhoods, depending on their research purpose. Within each element, researchers have also had to make decisions that influence the neighbourhoods eventually formed. Most of the research so far seeks to compare urban neighbourhoods within and across cities so that recommendation engines can make better recommendations based on criteria such as users' check-in patterns, preferred venue categories and interests. The goal is to suggest new places that users may wish to visit, which are similar to places they have visited in the past.

    A typical example of a neighbourhood detection method for recommendation engines comes from Noulas et al (2011). They take a grid-based approach and use a spectral clustering algorithm to cluster grid squares based on the distribution of Foursquare venue categories nearby and the number of check-ins at these venues. The method creates neighbourhoods that give an idea of the types of places in an area, together with a measure of their importance based on users' check-in activity. Another example is Del Bimbo et al (2014)'s LiveCities method, which performed K-means clustering using Facebook check-ins, user interests and Foursquare venue categories.

    An early attempt at neighbourhood detection was the Livehoods algorithm

    (Cranshaw et al., 2012), which took the venues-based approach and used spectral

    clustering to cluster Foursquare venues in Pittsburgh in the United States based on

    spatial and social proximity. Through interviews with local residents, Cranshaw et al

    (2012) found that neighbourhood detection methods could generate clusters /

    neighbourhoods that reflect the character of life in cities. More recent attempts

    have combined more information and experimented with different elements. For

    example, Zhang et al (2013)'s Hoodsquare method takes a grid-based approach and assesses the similarity of a grid cell with its neighbouring grid cells based on (a) the distribution of Foursquare venue categories in the vicinity; (b) whether these venues were frequented by tourists or locals; and (c) the busiest time of the day in terms of check-ins at these venues. Neighbourhoods were then formed by finding groups of

    grid cells that had high relative homogeneity. Zhang et al (2013) point out that using

    multiple types of information may better represent the multifaceted nature of

    neighbourhoods, and that grid-based methods may be more suitable for identifying

    neighbourhoods as the boundaries formed using grid-based methods are not bound

    to a particular set of venues.

    The most recent attempt at neighbourhood detection using location based social

    media describes neighbourhoods in terms of the activity they host (Falher et al.,

    2015). Falher et al consider two neighbourhoods to be similar if they contain the same kinds of Foursquare venues in the same proportions. In addition to basing the similarity of these venues on the number of check-ins, the number of unique users and the temporal distribution of the check-ins, they also take into account the distribution of Foursquare venues in the surrounding area.

    Cranshaw and Yano (2010) provided a different perspective by treating the question as a problem of latent topic discovery. They divided the city into a grid and applied topic-based modelling to the grid cells, treating each cell as a document and each Foursquare category tag as a word. With this method, they were able to identify clusters of places and activities that often appeared together (e.g. beach and seafood).
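    The grid-as-document framing can be sketched by turning each grid cell into a bag of Foursquare category tags, which is the document-term input a topic model such as latent Dirichlet allocation expects. The cell identifiers and tags below are hypothetical, and the topic model itself is not shown.

```python
from collections import Counter


def build_document_term_matrix(cell_tags):
    """Convert {cell_id: [category tags]} into a vocabulary and count matrix.

    Each grid cell is treated as a 'document' and each category tag as a 'word'.
    Returns (cell_ids, vocabulary, rows), where rows[i][j] is the number of
    times vocabulary[j] occurs in cell cell_ids[i].
    """
    cell_ids = sorted(cell_tags)
    vocabulary = sorted({tag for tags in cell_tags.values() for tag in tags})
    rows = []
    for cell in cell_ids:
        counts = Counter(cell_tags[cell])
        rows.append([counts[word] for word in vocabulary])
    return cell_ids, vocabulary, rows


cell_tags = {
    "cell_a": ["Beach", "Seafood", "Seafood"],
    "cell_b": ["Museum", "Cafe"],
}
cell_ids, vocabulary, rows = build_document_term_matrix(cell_tags)
# vocabulary -> ['Beach', 'Cafe', 'Museum', 'Seafood']
# rows       -> [[1, 0, 0, 2], [0, 1, 1, 0]]
```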

    While research on neighbourhood detection using location based social media has flourished, there has been less research on whether these methods accurately reflect neighbourhoods in reality, and on how they can serve purposes other than recommending new places that users may wish to visit.

    Researchers using the Livehoods algorithm attempted to validate the

    neighbourhoods generated through their algorithm (Cranshaw et al., 2012). The

    neighbourhoods identified by Cranshaw et al's algorithm included neighbourhoods that corresponded with municipal neighbourhood boundaries, neighbourhoods that were subsets of municipal neighbourhoods and neighbourhoods that spilled over more than one municipal neighbourhood. Cranshaw et al interviewed 27 residents who lived in the city and found that the neighbourhoods generated by their Livehoods method closely matched the residents' perspectives of neighbourhoods in the city. Cranshaw et al's research provides evidence that the boundaries generated by neighbourhood detection algorithms can capture local dynamics involving factors such as municipal boundaries, demographics, traffic flow and economic development.

    Some researchers have argued that including more properties in the similarity

    measures would better characterise the units being aggregated and produce clusters

    that more closely match actual neighbourhoods. For example, Del Bimbo et al (2014)

    use both static features (e.g. categories assigned by location based social networks)

    and dynamic features (e.g. distribution of the interests of the people who check in at

    venues) in their LiveCities method to create neighbourhoods for Florence, which

    they then validated qualitatively through online questionnaires with 28 residents.

    They found that including both types of features produces neighbourhoods that better reflect residents' perceptions.

    There is much that we do not know about the neighbourhood detection process using location based social media. For example, we do not know how the neighbourhoods detected compare with traditional administrative boundaries, or how we can combine the neighbourhoods detected with data collected for these administrative boundaries to help us understand cities more dynamically. We

    also do not know how the neighbourhoods detected may change when data over

    different time periods or different time intervals are used and what these changes

    may mean.

    Better understanding can come in the form of research on particular elements in the

    neighbourhood detection process across a variety of methods and comparing the

    differences when different elements are used. It can also come from understanding a particular method in depth and exploring how the neighbourhoods formed differ depending on the parameters used. In this dissertation, I look at the Livehoods method in depth by applying variations of the method to data collected for Greater London. The Livehoods method was chosen as it is a venues-based approach, which has been used less often in the literature. It is also one of the few methods in the neighbourhood detection literature whose clusters / neighbourhoods have been validated with the city's residents, with strong support that the residents' perceptions agreed with the clusters formed. This gives it more legitimacy than other neighbourhood detection methods as a way of detecting actual neighbourhoods. In addition, it has not yet been applied to the Greater London area.

    3. METHODOLOGY

    Python was used for most of the analysis and visualization in this work. IPython

    notebooks were used for early exploration and experimentation with the data and

    Python scripts were written in the later stages to run the neighbourhood detection

    method. All scripts used for this work can be found in the appendix section.

    3.1.Data sources

    The data used for analysis consists of 42,581 Foursquare check-ins at 8,845 venues by 12,397 unique users in the Greater London area from 6th April 2011 to 31st May 2011. This data was kindly provided by Anastasios Noulas from the University of Cambridge. For each check-in, the data consists of the user ID, the time, the latitude and longitude, and the venue ID. Further information on the venues was collected using the Python package foursquare, including each venue's name, category and subcategory (as categorized by the social media network Foursquare).

    Data was also collected from 6th April 2015 to 31st May 2015 for three cities: London, Singapore and New York City. The Python package tweepy was used to collect data from Twitter's streaming API, which provides a sample of the tweets being posted on Twitter in real time. A subset of this data consists of Foursquare check-ins from users who have linked their Foursquare accounts to their Twitter accounts, so that their Foursquare check-ins also appear as tweets on Twitter. The scripts for collecting this data and formatting it for analysis are also included in the appendix. While this data was eventually not used in the analysis for this work, future work could compare the results generated across the three cities, or the results generated from the two different time periods in London.

    3.2. Data sorting, import, storage and analysis

    The data was formatted using the Python package pandas, which was designed to provide R-like capabilities for managing large tables of data quickly and easily. To improve the speed of the analysis, much of the intermediate data required was pre-generated and stored in various file formats, such as JSON files, NumPy files for matrices and pickle files created using Python's pickle module.
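    The pre-generation strategy amounts to caching expensive intermediate results on disk. A minimal sketch using only the standard-library pickle module is shown below; the helper name `load_or_compute` and any file path passed to it are hypothetical, not taken from the dissertation's scripts.

```python
import os
import pickle


def load_or_compute(path, compute):
    """Return the object cached at `path`, computing and saving it if absent.

    `compute` is a zero-argument function producing the intermediate result,
    e.g. a user-by-venue check-in matrix. Only unpickle files you created
    yourself: loading a pickle can execute arbitrary code.
    """
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    result = compute()
    with open(path, "wb") as f:
        pickle.dump(result, f)
    return result
```

    The first call pays the full cost and writes the file; later runs, for example each parameter variation of the method, load the cached result instead.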

    As each run of the method took a significant amount of time (one to two hours), an Amazon cloud server was set up to run the multiple variations of the neighbourhood detection method, which greatly sped up the process.

    The results of the neighbourhood detection method were stored in pickle files. They were subsequently converted to GeoJSON format and also stored in a MySQL database using Python's sqlalchemy package for further analysis and visualization. Where GeoJSON files had to be manipulated, the Python packages fiona and shapely were used to read and write the files and to check relationships between geographic features, for example whether a particular venue was within a particular boundary.
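    The venue-in-boundary check can be illustrated without shapely using the standard ray-casting test. This is a stand-in for shapely's `Polygon.contains`, not the code used in this work, and it assumes a simple polygon without holes whose vertices are given as (longitude, latitude) pairs, as in GeoJSON.

```python
def point_in_polygon(lon, lat, ring):
    """Ray-casting test: is (lon, lat) inside the polygon `ring`?

    ring: list of (lon, lat) vertices; the closing vertex may be repeated,
    as in GeoJSON. Points lying exactly on an edge may be classified either way.
    """
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge cross the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that line
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


# Hypothetical square boundary (GeoJSON order: lon, lat)
boundary = [(-0.14, 51.51), (-0.13, 51.51), (-0.13, 51.52), (-0.14, 51.52), (-0.14, 51.51)]
print(point_in_polygon(-0.1365, 51.5136, boundary))  # True
print(point_in_polygon(-0.1195, 51.5033, boundary))  # False
```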

    Many of the visualizations in this work were created using Python's matplotlib and seaborn packages. Figure 8 was created using R and its ggplot library.

    3.3. The Livehoods method

    The Livehoods method is Cranshaw et al's (2012) method for neighbourhood detection using location based social media. It is a venues-based approach that performs spectral clustering on an affinity matrix that takes both spatial affinity and social affinity into consideration. The method seeks to fit the intuitive notion that neighbourhoods are areas frequented by a similar set of people: the more often the same people go to the same venues, the more likely these venues are to be in the same neighbourhood. To validate this method, Cranshaw et al (2012) conducted qualitative interviews with residents in their study area and verified that the neighbourhoods generated by their method closely matched the residents' perspectives of neighbourhoods in the city.

    Specifically, I applied the following steps from Cranshaw et al (2012) to generate

    the affinity matrices used in the spectral clustering algorithm:

    1. Given the following sets:
       a. Set $V$, a set of $n_v$ Foursquare venues, for which we can compute a geographic distance $d(i, j)$ between venues $i, j \in V$ given their latitude and longitude coordinates.
       b. Set $U$, a set of $n_u$ Foursquare users.
       c. Set $C$, a set of check-ins of users in $U$ to the venues in $V$.
       Each venue $v \in V$ is then represented by an $n_u$-dimensional vector $c_v$, where the $u$th component of $c_v$ is the number of times user $u$ checked in to $v$.
    2. Compute the social similarity $s(i, j)$ between each pair of venues $i, j \in V$ by comparing the vectors $c_i$ and $c_j$. Cosine similarity was used for this measure, where
       $$s(i, j) = \frac{c_i \cdot c_j}{\lVert c_i \rVert \, \lVert c_j \rVert}$$
    3. Compute an $n_v \times n_v$ affinity matrix $A$ on the venues. For a given venue $v$, let $N_m(v)$ be the $m$ closest venues to $v$ according to the distance $d(\cdot, \cdot)$, for some parameter $m$. Then we let
       $$A_{i,j} = \begin{cases} s(i, j) + \alpha, & \text{if } j \in N_m(i) \text{ or } i \in N_m(j) \\ 0, & \text{otherwise} \end{cases}$$
       where $\alpha$ is a small constant that prevents degenerate matrices from forming. In Cranshaw et al's (2012) work, a value of $\alpha = 1 \times 10^{-2}$ was used.
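    The three steps above can be sketched with NumPy as follows. This is an illustrative reimplementation under stated assumptions, not the script used in this work: check-ins are given as (user_index, venue_index) pairs, venue coordinates are assumed to be projected planar (x, y) values so that Euclidean distance stands in for $d(i, j)$, and the function name `livehoods_affinity` is hypothetical.

```python
import numpy as np


def livehoods_affinity(checkins, coords, n_users, m=10, alpha=0.01):
    """Build a Livehoods-style affinity matrix A following steps 1 to 3.

    checkins: iterable of (user_index, venue_index) pairs.
    coords:   (n_venues, 2) array of planar venue coordinates (assumption:
              projected, so Euclidean distance approximates d(i, j)).
    """
    coords = np.asarray(coords, dtype=float)
    n_venues = coords.shape[0]
    m = min(m, n_venues - 1)  # a venue cannot have more neighbours than others

    # Step 1: row v of C is c_v, the number of check-ins by each user at venue v.
    C = np.zeros((n_venues, n_users))
    for user, venue in checkins:
        C[venue, user] += 1

    # Step 2: s(i, j) = cosine similarity between c_i and c_j.
    norms = np.linalg.norm(C, axis=1)
    norms[norms == 0] = 1.0  # venues without check-ins end up with s = 0
    unit = C / norms[:, None]
    S = unit @ unit.T

    # Step 3: N_m(v) = the m spatially closest venues to v (excluding v itself).
    D = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    np.fill_diagonal(D, np.inf)
    nearest = np.argsort(D, axis=1)[:, :m]
    neighbours = np.zeros((n_venues, n_venues), dtype=bool)
    neighbours[np.arange(n_venues)[:, None], nearest] = True
    # A_ij = s(i, j) + alpha if j in N_m(i) or i in N_m(j), otherwise 0.
    neighbours = neighbours | neighbours.T
    return np.where(neighbours, S + alpha, 0.0)
```

    The resulting matrix could then be passed to a spectral clustering routine that accepts a precomputed affinity. For the 8,845 London venues, the dense intermediate distance arrays run to a gigabyte or more, so a spatial index such as SciPy's `cKDTree` (whose `query` method returns nearest neighbours directly) would likely be preferable at that scale.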

    The affinity matrices were generated using the Python packages NumPy (Van Der Walt et al., 2011) and SciPy (Jones et al., 2001), and spectral clustering was performed on the affinity matrices using the Python package scikit-learn (Pedregosa et al., 2011). To determine the number of clusters that the algorithm should create, I applied the commonly used eigengap heuristic (Noulas et al., 2011; Planck and Luxburg, 2006). This involved calculating the k smallest eigenvalues of the normalized Laplacian of the affinity matrix, and setting the number of clusters to the position at which the largest difference between consecutive eigenvalues occurred.
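    The eigengap heuristic described above can be sketched as follows. The sketch assumes the symmetric normalised Laplacian $L = I - D^{-1/2} A D^{-1/2}$ and a dense matrix, which may differ in detail from the computation used in this work; the function name is hypothetical.

```python
import numpy as np


def eigengap_num_clusters(A, k):
    """Choose a number of clusters from the k smallest Laplacian eigenvalues.

    A: symmetric, non-negative affinity matrix.
    Returns the position i of the largest gap lambda_{i+1} - lambda_i among
    the k smallest eigenvalues lambda_1 <= ... <= lambda_k.
    """
    A = np.asarray(A, dtype=float)
    degrees = A.sum(axis=1)
    with np.errstate(divide="ignore"):
        inv_sqrt = 1.0 / np.sqrt(degrees)
    inv_sqrt[~np.isfinite(inv_sqrt)] = 0.0  # isolated venues
    L = np.eye(len(A)) - inv_sqrt[:, None] * A * inv_sqrt[None, :]
    eigenvalues = np.linalg.eigvalsh(L)[:k]  # eigvalsh returns ascending order
    gaps = np.diff(eigenvalues)
    return int(np.argmax(gaps)) + 1
```

    For a graph made of two disconnected triangles, the Laplacian has two zero eigenvalues followed by eigenvalues of 1.5, so with k = 4 the largest gap occurs after the second eigenvalue and the function returns 2. Note that k eigenvalues allow at most k - 1 clusters to be selected, which is one reason the choice of k matters.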

    The question of determining parameters such as the number of clusters to form is an

    important issue for clustering algorithms (Lancichinetti and Fortunato, 2009; Planck

    and Luxburg, 2006; Zelnik-Manor and Perona, 2004). For some clustering

    algorithms, researchers have found that maximizing modularity is a useful technique for guiding the choice of parameter values (Lancichinetti and Fortunato, 2009), though they also recognize that this technique has its own limitations (Fortunato and Barthélemy, 2007; Good et al., 2010; Lancichinetti and Fortunato, 2011). For spectral clustering algorithms such as the one used in the Livehoods method, the eigengap heuristic was developed specifically to maximize the modularity of the clusters generated (Donetti and Munoz, 2004).
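    Modularity, referred to above as a guide to parameter choice, can be computed for any partition of a weighted graph such as a Livehoods affinity matrix. The sketch below uses the standard Newman definition $Q = \frac{1}{2w}\sum_{i,j}\left[A_{ij} - \frac{d_i d_j}{2w}\right]\delta(c_i, c_j)$, where $d_i$ is the weighted degree of node $i$, $2w$ is the sum of all degrees and $c_i$ is the cluster of node $i$. It is an illustration, not part of the Livehoods method itself.

```python
import numpy as np


def modularity(A, labels):
    """Newman modularity Q of a partition of a weighted undirected graph.

    A:      symmetric weighted adjacency (affinity) matrix.
    labels: cluster label for each node.
    """
    A = np.asarray(A, dtype=float)
    labels = np.asarray(labels)
    degrees = A.sum(axis=1)
    two_w = degrees.sum()  # equals twice the total edge weight
    same_cluster = labels[:, None] == labels[None, :]
    expected = np.outer(degrees, degrees) / two_w
    return float(((A - expected) * same_cluster).sum() / two_w)
```

    For two disconnected triangles, labelling each triangle as its own cluster gives Q = 0.5, while putting all six nodes in one cluster gives Q = 0.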

    Cranshaw et al (2012) included a post-processing step after spectral clustering to break up any cluster that spanned too large a geographic area (more than 40% of the geographic area in their work on Pittsburgh), redistributing the venues in those clusters to the nearest cluster instead. In my work, the spectral clustering algorithm typically produced one cluster that spans a large part of the city. This seems to be a qualitatively different type of cluster, whose boundaries reflect what the users of the social media platform regard as the boundaries of their city rather than any particular neighbourhood. As there was no theoretical reason to redistribute the venues in this large cluster, and thereby expand the boundaries of the other clusters, I chose not to break it up.

    4. ANALYZING THE LIVEHOODS METHOD

    As described above, there are a number of parameters in the Livehoods method

    (Cranshaw et al., 2012) that can be tuned to generate the neighbourhood boundaries:

    the number of smallest eigenvalues to calculate (k), the number of nearest

    neighbours (m), and the alpha constant α. Cranshaw et al's (2012) values for these parameters for the Pittsburgh metropolitan area were 45, 10 and 0.01 respectively. Cranshaw et al (2012) acknowledged that tuning the clusters is non-trivial and may lead to experimenter bias. It is therefore worth exploring how tuning the parameters affects the resulting neighbourhoods in order to better understand the Livehoods method.

    4.1. Tuning the number of smallest eigenvalues (k)

    In general, as the value of k increased, the total number of clusters formed increased as well. Figure 1 illustrates the relationship between k and the total number of clusters formed using the eigengap heuristic, for values of k from 0 to 200 with Cranshaw et al's (2012) values of 0.01 for the alpha constant and 10 for the number of nearest neighbours. The number of clusters formed increases at certain threshold values of k and remains constant until the next threshold is reached. The threshold values of k in this case are 7, 9, 13, 25, 43, 74 and 101, with the corresponding numbers of clusters formed being 5, 7, 11, 23, 41, 72 and 99.

    Figure 2 shows the boundaries of the clusters formed when these seven threshold values of k are used in the Livehoods method, with m = 10 and α = 0.01. As the number of clusters created increases, the larger clusters tend to break up into smaller and smaller clusters. The areas near the centre of the city tend to be broken up first, and continue to be broken up into smaller clusters as the number of clusters increases.

    The clusters nearer to the edges of the city tend to remain large and unbroken.

    Generally, the clusters formed nearer the edge of the city are larger than the clusters

    formed nearer the centre of the city. This phenomenon is likely because the density

    of venues further from the centre of the city is much lower than the density of

    venues nearer the centre of the city. Since the Livehoods method uses a nearest neighbours criterion to identify adjacent venues, a venue's m nearest neighbours lie further away where venues are sparse, so the method creates boundaries enclosing larger areas. Many of the clusters formed with a higher number of clusters are either subsets of, or very similar to, the clusters formed with a lower number of clusters. The clear exception occurs at k = 74, where 72 clusters are formed: a previously undetected large cluster appears. This is the qualitatively different cluster mentioned earlier.

    Donetti and Munoz (2004) have pointed out that the weakest part of the eigengap heuristic is that we do not know a priori how many eigenvalues (k in the Livehoods method) should be calculated. While Cranshaw et al (2012) also have not provided any guidelines on how to choose the right value of k for cities of different sizes, cities occupying a larger area could potentially contain more neighbourhoods, in which case larger values of k should be used. As the Greater London area is much larger than Pittsburgh, k should be larger than 45. A k value of 100 was arbitrarily chosen in this work to test the effects of tuning the nearest neighbours parameter and the alpha constant, reflecting the possibility of a higher number of neighbourhoods in London. An even higher value may be more suitable, as London is many times larger than Pittsburgh, but this value was used to keep computation requirements manageable.

    Figure 1: Relationship between number of smallest eigenvalues (k) found and number of clusters formed

    Figure 2: Boundaries formed for different numbers of clusters. Panels: 5 clusters (k = 7); 7 clusters (k = 9); 11 clusters (k = 13); 23 clusters (k = 25); 41 clusters (k = 43); 72 clusters (k = 74); 99 clusters (k = 101).

    4.2. Tuning the alpha constant (α)

    To see whether the alpha constant influenced the clusters formed using the Livehoods method, clusters were formed with k = 100, m = 10 and α varying from 0.00 to 0.05. In general, there was little difference in the clusters formed. Figure 3 depicts the boundaries formed using the various alpha constants. Almost all clusters formed are consistent or highly similar at the different alpha values. In certain rare instances, clusters are merged or subdivided into two clusters. This shows that varying the alpha constant between 0.00 and 0.05 does not greatly influence the boundaries formed. A clear exception occurs with the largest cluster in the shift from α = 0.00 to α = 0.01: it expands greatly to include many other parts of the Greater London area. This boundary remains consistent as α increases further. This behaviour again highlights the qualitatively different nature of this cluster.

    Figure 3: Boundaries formed for different alpha constants. Panels: α = 0.00; α = 0.01; α = 0.02; α = 0.03; α = 0.04; α = 0.05.

    4.3. Tuning the nearest neighbours parameter (m)

    To see whether the nearest neighbours parameter influenced the clusters formed using the Livehoods method, clusters were formed with k = 100, α = 0.01 and m varying from 5 to 20. Figure 4 depicts the boundaries formed for some of the values used. When m = 5, the boundaries formed overlap many of the other boundaries. As m increases, the number of overlaps decreases and more stable clusters are formed. For m = 8 to m = 20, the clusters formed are largely consistent with each other. Smaller clusters with a high density of venues are more consistent than larger clusters with a low density of venues. The largest cluster changes in shape and size at different values of m. It is hard to determine the optimal value of m, but values of 8 and higher seem to generate reasonably consistent clusters.

    Figure 4: Boundaries formed for different nearest neighbours parameters (m). Panels: m = 5; m = 8; m = 10; m = 15; m = 18; m = 20.

    4.4. Using cosine similarity

    As mentioned earlier, cosine similarity is preferred over other similarity measures because it does not take the magnitude of vectors into account. When forming neighbourhoods and determining venue similarity, the relative frequency of user check-ins at each venue and across venues matters more than the total number of user check-ins at each venue. Similarity measures that depend on magnitude, such as Euclidean distance, are thus less suitable than cosine similarity. Using Jaccard similarity, a related set-based similarity measure, produced results similar to those obtained with cosine similarity.
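    For comparison, Jaccard similarity compares only the sets of users who checked in at two venues, ignoring how often each user checked in, whereas cosine similarity uses the counts. A minimal sketch with invented counts, representing each venue as a hypothetical {user: check-in count} dictionary:

```python
import math


def cosine_from_counts(a, b):
    """Cosine similarity of two sparse count vectors given as {user: count}."""
    dot = sum(count * b.get(user, 0) for user, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def jaccard_from_users(a, b):
    """|users in both| / |users in either|, ignoring check-in counts."""
    users_a, users_b = set(a), set(b)
    union = users_a | users_b
    return len(users_a & users_b) / len(union) if union else 0.0


# Hypothetical check-in counts per user at two venues
venue_1 = {"u1": 5, "u2": 1, "u3": 1}
venue_2 = {"u1": 4, "u2": 1, "u4": 1}
print(jaccard_from_users(venue_1, venue_2))  # 0.5: 2 shared users out of 4
print(cosine_from_counts(venue_1, venue_2))  # about 0.95, driven by u1's repeat visits
```

    The two measures can rank venue pairs differently when a few users check in repeatedly, so similar clustering results under both suggest the clusters are not driven mainly by such repeat visits.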

    4.5. Nearest neighbours versus full similarity graph

    The nearest neighbours similarity graph (with m neighbours per venue) was chosen for constructing the affinity matrix instead of the full similarity graph, as the nearest neighbours graph better captured check-in behaviour in neighbourhoods. While individuals have regular mobility patterns and often return to a few highly frequented locations such as home, school or work (González et al., 2008), their check-in behaviour on location based social media networks differs: 60% to 80% of check-ins occur at places that the individual user had not visited before (Noulas et al., 2012). Using the full similarity graph meant that most of the similarity captured would relate to new places that users visited over the time period. This would create clusters of venues related to the types of places that groups of users preferred to visit, such as museums, nightspots and stadiums, and generate boundaries that span most of the city. These boundaries cannot be classified as neighbourhoods, given that they overlap each other greatly and cover areas that are similar to each other.

    The nearest neighbours graph, on the other hand, captures similarity relating to users

    who visited sets of venues close to one another. The boundaries formed often have

    clear separation from each other and there is very little overlap in terms of area

    covered by the boundaries. These boundaries better fit the intuitive notion of

    neighbourhoods in a city.

    4.6. Summary

    Through an investigation of the Livehoods method, I have found that using different alpha values from 0.01 to 0.05, and nearest neighbours parameters of 8 or more, generally does not affect the clusters formed. I have also found that using different values for the number of smallest eigenvalues changes the resulting number of clusters formed, with more clusters being formed as the number of eigenvalues increases. The investigation also revealed that the method may form two types of clusters. One type is a contiguous geographic space that can be associated with a neighbourhood; the other is a large cluster that spans much of the city.

    In the next two sections, I will use one of the sets of clusters / neighbourhoods

    generated by the Livehoods method to illustrate the types of information that can be

    derived from clusters formed using the Livehoods method, and neighbourhood

    detection methods in general. In section 5, I combine the clusters formed with data from Foursquare's venue database and use it to describe the types of venues and activities within each cluster. Incorporating information from location based social media to better understand the clusters / neighbourhoods formed is common among researchers using neighbourhood detection methods.

    In section 6, I attempt to combine the clusters / neighbourhoods formed using the Livehoods method with data from administrative boundaries (the Greater London Lower Super Output Areas in this case) and determine the ethnic diversity of the clusters / neighbourhoods formed. Integrating detected clusters / neighbourhoods with data from administrative boundaries is rare in the neighbourhood detection literature, as most researchers have used neighbourhood detection methods to develop recommendation engines that find similar places based on social media activity. My attempt tries to add more meaning to the clusters formed so that they can be used for other purposes, such as investigating ethnic diversity issues within neighbourhoods.


    5. DESCRIPTION OF LIVEHOOD CLUSTERS / NEIGHBOURHOODS

    5.1. Overview of neighbourhoods

    For comparison, the Livehoods method was applied to the Foursquare data with k = 100, α = 0.01, and m = 10. For the Greater London area, 72 clusters were generated. Their boundaries are depicted in Figure 5. The numbers on the clusters are used as references when labelling and describing the results below. As mentioned earlier, the largest cluster formed (cluster 66 in this case) is not depicted in the figures, as it is a qualitatively different type of cluster, and it is not included when describing the clustering results. The boundaries for this cluster can be found in the appendix.

    Table 1 contains summary statistics for each cluster. The area of each cluster ranged from 0.11 square kilometers (cluster 48) to 203 square kilometers (cluster 18), with a median of 1.86 square kilometers per cluster. While tests (using Python's powerlaw package) show no support for a power law distribution, the distribution is highly skewed, with many small clusters and a few huge clusters. The huge clusters also tend to have low densities of checkins and venues, and as such they could be an artefact of the nearest neighbours proximity criterion. In sparse areas, a venue's nearest neighbours tend to be further away than in dense areas, so venues far apart from each other are more likely to be linked and clustered together.
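    As a quick numerical check on the skew described above, the sketch below recomputes the median and a moment-based skewness from the cluster areas listed in Table 1 (cluster 66 excluded, as in the table). It uses only the Python standard library and does not reproduce the powerlaw package test; the population (Fisher-Pearson) skewness coefficient is an illustrative choice, not necessarily the statistic used in the dissertation.

```python
import statistics

# Cluster areas in sq km, copied from Table 1 (clusters 0-71, cluster 66 excluded).
AREAS_SQ_KM = [
    0.69, 0.89, 1.25, 26.83, 2.95, 0.75, 2.19, 0.16, 0.82, 1.77,    # 0-9
    1.5, 0.6, 1.09, 2.73, 4.62, 0.62, 22.55, 1.74, 203.11, 0.88,    # 10-19
    2.08, 23.94, 12.1, 4.7, 1.56, 42.64, 0.35, 0.41, 0.31, 1.24,    # 20-29
    1.71, 0.75, 136.96, 25.62, 0.21, 0.15, 22.11, 0.6, 0.32, 1.4,   # 30-39
    8.27, 5.94, 0.28, 1.86, 75.25, 1.13, 6.48, 11.88, 0.11, 31.95,  # 40-49
    0.66, 0.55, 39.21, 1.12, 87.89, 5.6, 18.86, 1.12, 0.33, 21.86,  # 50-59
    47.01, 1.27, 1.99, 9.31, 8.39, 10.86,                           # 60-65
    33.75, 14.95, 4.78, 0.5, 34.32,                                 # 67-71
]


def skewness(values):
    """Population (Fisher-Pearson) moment coefficient of skewness."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum((v - mean) ** 3 for v in values) / (len(values) * sd ** 3)


median_area = statistics.median(AREAS_SQ_KM)
mean_area = statistics.fmean(AREAS_SQ_KM)
print(f"n={len(AREAS_SQ_KM)} median={median_area} mean={mean_area:.2f} "
      f"skewness={skewness(AREAS_SQ_KM):.2f}")
```

    The median comes out at 1.86 sq km, matching the figure in the text, while the mean is several times larger, which is the right skew expressed in numbers.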

    Figures 6a to 6c depict properties of the clusters in absolute terms: the number of venues in each cluster ranged from 16 (cluster 45) to 279 (cluster 38) with a median of 129; the number of checkins in each cluster ranged from 43 (cluster 45) to 5147 (cluster 2) with a median of 412; and the number of unique users checking in within each cluster ranged from 10 (cluster 45) to 2585 (cluster 2) with a median of 230. Figures 6d to 6f depict properties of the clusters relative to the area


    of the cluster and the number of venues in the cluster: the number of venues per square kilometer ranged from 0.77 (cluster 18) to 1,304.61 (cluster 7) with a median of 43.95; the number of checkins per venue ranged from 1.26 (cluster 65) to 40.09 (cluster 26) with a median of 3.23; and the number of unique users per venue ranged from 0.55 (cluster 67) to 19.52 (cluster 16) with a median of 1.89.
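    The relative measures in Table 1 are simple ratios of the raw counts. The sketch below derives them for a single cluster; the function name is hypothetical, and the inputs are cluster 2's values from Table 1. Recomputing the per-area figures from the two-decimal areas printed in the table gives slightly different values (5147 / 1.25 is about 4117.6, against 4121.23 in the table), presumably because unrounded areas were used, so the check below sticks to the per-venue and per-user ratios.

```python
def cluster_ratios(area_sq_km: float, checkins: int, users: int, venues: int) -> dict[str, float]:
    """Per-area, per-venue and per-user measures as reported in Table 1."""
    return {
        "checkins_per_sq_km": checkins / area_sq_km,
        "users_per_sq_km": users / area_sq_km,
        "venues_per_sq_km": venues / area_sq_km,
        "checkins_per_venue": checkins / venues,
        "checkins_per_user": checkins / users,
        "users_per_venue": users / venues,
    }


# Cluster 2 from Table 1: 1.25 sq km, 5147 checkins, 2585 unique users, 161 venues.
ratios = cluster_ratios(1.25, 5147, 2585, 161)
for name, value in ratios.items():
    print(f"{name}: {value:.2f}")
```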

    Many of the distributions of cluster properties are highly skewed. Clusters 2, 13, 16 and 26 are particularly active: they are in the top 5 for users and checkins across all clusters, whether in absolute terms or on a per venue basis. Collectively, the four clusters account for 29.5% of all checkins from 60% of unique users despite containing only 5.7% of all venues across the city. This is understandable for clusters 2 and 13, as they are in the city centre, and for cluster 26, as it is at Heathrow airport. Cluster 16 contains Wembley Stadium, and its high values for users and checkins are likely explained by its hosting of the 2011 UEFA Champions League Final on 28th May 2011, which falls within the period of analysis. People attending this event are highly likely to check in on social media, as it is a rare and meaningful event for them. Under more normal circumstances, cluster 16 would likely have values closer to the median.

    Across all clusters, cluster 18 stands out, with the largest area and relatively few users and venues for such a large area. It could be classified as an outlier, but results for the cluster have been included for completeness. In addition, all variations of the Livehoods method detect this cluster or a similar one. It is more likely an artefact of the nearest neighbours proximity criterion, as discussed above.
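    The artefact argument can be illustrated with synthetic points. The sketch below assumes two hypothetical square grids of venues, a dense one with 0.1 km spacing and a sparse one with 2 km spacing, and measures the distance from each venue to its m-th nearest neighbour (m = 4 here, chosen arbitrarily rather than taken from the Livehoods runs). With the same m, the sparse grid links venues about 20 times further apart, which is how a fixed nearest-neighbours criterion can stitch a large, thinly populated area into one cluster.

```python
import math
import statistics


def mth_neighbour_distances(points, m):
    """Distance from each point to its m-th nearest other point (brute force)."""
    distances = []
    for i, p in enumerate(points):
        others = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        distances.append(others[m - 1])
    return distances


def grid(n, spacing):
    """Hypothetical n x n grid of venue coordinates, in km."""
    return [(i * spacing, j * spacing) for i in range(n) for j in range(n)]


M = 4
dense = mth_neighbour_distances(grid(5, 0.1), M)   # compact, busy area
sparse = mth_neighbour_distances(grid(5, 2.0), M)  # large, thinly populated area
print("median m-th neighbour distance (km): dense",
      round(statistics.median(dense), 3), "sparse", round(statistics.median(sparse), 3))
print("largest m-th neighbour distance (km): dense",
      round(max(dense), 3), "sparse", round(max(sparse), 3))
```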


    Figure 5: Clustering results for London (panels: Greater London area; City area)


    Table 1: Summary statistics for cluster results for London

    Cluster, Area (sq km), Number of checkins, Number of users, Number of venues, Number of checkins per sq km, Number of users per sq km, Number of venues per sq km, Number of checkins per venue, Number of checkins per user, Number of users per venue

    0 0.69 1002 641 238 1447.35 925.9 343.78 4.21 1.56 2.69

    1 0.89 469 321 165 527.2 360.84 185.48 2.84 1.46 1.95

    2 1.25 5147 2585 161 4121.23 2069.82 128.91 31.97 1.99 16.06

    3 26.83 356 178 180 13.27 6.63 6.71 1.98 2 0.99

    4 2.95 851 450 163 288.6 152.61 55.28 5.22 1.89 2.76

    5 0.75 462 230 102 616.58 306.95 136.13 4.53 2.01 2.25

    6 2.19 1055 556 239 481.71 253.87 109.13 4.41 1.9 2.33

    7 0.16 695 447 215 4217.23 2712.38 1304.61 3.23 1.55 2.08

    8 0.82 754 493 195 924.93 604.76 239.21 3.87 1.53 2.53

    9 1.77 610 325 241 344.83 183.72 136.24 2.53 1.88 1.35

    10 1.5 806 409 253 536.37 272.18 168.36 3.19 1.97 1.62

    11 0.6 967 622 231 1602.32 1030.65 382.77 4.19 1.55 2.69

    12 1.09 294 163 120 270.77 150.12 110.52 2.45 1.8 1.36

    13 2.73 2888 2032 202 1056.98 743.7 73.93 14.3 1.42 10.06

    14 4.62 540 213 155 116.81 46.07 33.53 3.48 2.54 1.37

    15 0.62 1357 578 108 2184.13 930.31 173.83 12.56 2.35 5.35

    16 22.55 3508 1737 89 155.54 77.01 3.95 39.42 2.02 19.52

    17 1.74 691 322 165 396.12 184.59 94.59 4.19 2.15 1.95

    18 203.11 257 110 157 1.27 0.54 0.77 1.64 2.34 0.7

    19 0.88 248 154 101 280.51 174.19 114.24 2.46 1.61 1.52

    20 2.08 556 296 154 267.1 142.2 73.98 3.61 1.88 1.92

    21 23.94 831 398 257 34.71 16.63 10.74 3.23 2.09 1.55

    22 12.1 453 304 157 37.43 25.12 12.97 2.89 1.49 1.94


    23 4.7 378 168 139 80.49 35.78 29.6 2.72 2.25 1.21

    24 1.56 464 296 123 296.6 189.21 78.62 3.77 1.57 2.41

    25 42.64 285 121 135 6.68 2.84 3.17 2.11 2.36 0.9

    26 0.35 2165 975 54 6131.41 2761.26 152.93 40.09 2.22 18.06

    27 0.41 348 235 163 844.05 569.97 395.34 2.13 1.48 1.44

    28 0.31 167 117 48 543.27 380.61 156.15 3.48 1.43 2.44

    29 1.24 827 384 54 668.99 310.63 43.68 15.31 2.15 7.11

    30 1.71 1921 547 148 1126.03 320.63 86.75 12.98 3.51 3.7

    31 0.75 160 124 31 214.22 166.02 41.5 5.16 1.29 4

    32 136.96 432 340 131 3.15 2.48 0.96 3.3 1.27 2.6

    33 25.62 405 224 141 15.81 8.74 5.5 2.87 1.81 1.59

    34 0.21 637 394 188 3098.25 1916.34 914.4 3.39 1.62 2.1

    35 0.15 181 94 38 1197.88 622.1 251.49 4.76 1.93 2.47

    36 22.11 321 140 93 14.52 6.33 4.21 3.45 2.29 1.51

    37 0.6 358 183 73 600.17 306.79 122.38 4.9 1.96 2.51

    38 0.32 1169 740 279 3624.81 2294.57 865.12 4.19 1.58 2.65

    39 1.4 1366 622 161 974.53 443.75 114.86 8.48 2.2 3.86

    40 8.27 179 69 81 21.65 8.34 9.8 2.21 2.59 0.85

    41 5.94 144 82 87 24.23 13.79 14.64 1.66 1.76 0.94

    42 0.28 481 311 75 1702.65 1100.88 265.49 6.41 1.55 4.15

    43 1.86 172 134 29 92.24 71.87 15.55 5.93 1.28 4.62

    44 75.25 167 69 99 2.22 0.92 1.32 1.69 2.42 0.7

    45 1.13 43 10 16 38.16 8.88 14.2 2.69 4.3 0.62

    46 6.48 65 30 40 10.03 4.63 6.17 1.62 2.17 0.75

    47 11.88 315 149 144 26.51 12.54 12.12 2.19 2.11 1.03


    48 0.11 199 155 36 1761.06 1371.68 318.58 5.53 1.28 4.31

    49 31.95 173 86 89 5.42 2.69 2.79 1.94 2.01 0.97

    50 0.66 255 117 99 387.71 177.89 150.52 2.58 2.18 1.18

    51 0.55 385 248 131 705.65 454.55 240.1 2.94 1.55 1.89

    52 39.21 775 287 129 19.77 7.32 3.29 6.01 2.7 2.22

    53 1.12 751 413 209 670.36 368.65 186.56 3.59 1.82 1.98

    54 87.89 202 93 107 2.3 1.06 1.22 1.89 2.17 0.87

    55 5.6 316 98 123 56.39 17.49 21.95 2.57 3.22 0.8

    56 18.86 551 287 200 29.21 15.21 10.6 2.76 1.92 1.44

    57 1.12 189 105 79 168.69 93.72 70.51 2.39 1.8 1.33

    58 0.33 766 444 132 2296.85 1331.33 395.8 5.8 1.73 3.36

    59 21.86 412 195 193 18.85 8.92 8.83 2.13 2.11 1.01

    60 47.01 228 88 107 4.85 1.87 2.28 2.13 2.59 0.82

    61 1.27 115 60 56 90.25 47.08 43.95 2.05 1.92 1.07

    62 1.99 181 56 66 90.82 28.1 33.12 2.74 3.23 0.85

    63 9.31 47 20 28 5.05 2.15 3.01 1.68 2.35 0.71

    64 8.39 1325 681 261 157.85 81.13 31.09 5.08 1.95 2.61

    65 10.86 54 31 43 4.97 2.86 3.96 1.26 1.74 0.72

    67 33.75 99 28 51 2.93 0.83 1.51 1.94 3.54 0.55

    68 14.95 103 44 38 6.89 2.94 2.54 2.71 2.34 1.16

    69 4.78 113 76 73 23.62 15.89 15.26 1.55 1.49 1.04

    70 0.5 699 367 115 1388.01 728.75 228.36 6.08 1.9 3.19

    71 34.32 532 323 221 15.5 9.41 6.44 2.41 1.65 1.46


    Figure 6: Properties of Livehood clusters


    5.2. Breakdown of individual neighbourhoods

    The venues within each cluster are venues listed on the location based social network Foursquare. Foursquare organises its venues in a category hierarchy with three levels. The 10 main categories at the top of the hierarchy are: Arts & Entertainment, College & University, Event, Food, Nightlife Spot, Outdoors & Recreation, Professional & Other Places, Residence, Shop & Service, and Travel & Transport. Each of these 10 main categories has its own subcategories, which can themselves be further subcategorized. There are more than 200 subcategories and sub-subcategories altogether. As places may be referred to at different levels of granularity, some venues may not have a sub-subcategory. For example, London Heathrow's Terminal 5 falls in the Travel & Transport main category, the airport subcategory, and the airport terminal sub-subcategory. London Heathrow Airport, on the other hand, falls in the same main category and subcategory, but does not have a sub-subcategory.
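    One way to hold this three-level hierarchy, including venues that stop at the second level, is sketched below. The class and method names are hypothetical, the category labels are taken from the Heathrow example in the text, and the fallback to the most specific level a venue actually has is an illustrative design choice rather than Foursquare's own behaviour.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VenueCategory:
    """A venue's position in a three-level category hierarchy."""
    main: str
    sub: Optional[str] = None
    sub_sub: Optional[str] = None

    def label(self, level: int) -> str:
        """Category at the requested level (1-3), falling back to the most
        specific level the venue actually has."""
        path = [c for c in (self.main, self.sub, self.sub_sub) if c is not None]
        return path[min(level, len(path)) - 1]


# The two Heathrow examples from the text.
venues = {
    "London Heathrow Terminal 5": VenueCategory("Travel & Transport", "Airport", "Airport Terminal"),
    "London Heathrow Airport": VenueCategory("Travel & Transport", "Airport"),
}
for name, category in venues.items():
    print(name, "->", category.label(1), "|", category.label(3))
```

    Grouping by `label(1)` gives the main-category view used in the tables that follow, while deeper levels remain available for finer profiles.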

    We can gain insight into the makeup of the city by creating city profiles using information on the category of each venue and the behaviour of the users of location based social media networks. To calculate the distribution of venues / checkins by category for the city, the formula used to calculate the value for each category (A) was:

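    $$A = \frac{\text{number of venues (or checkins) in the category across the city}}{\text{total number of venues (or checkins) across the city}} \times 100$$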
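    The category share A is a percentage of the total. The sketch below computes it from one main-category label per item; the function name and the four-venue input are hypothetical toy data. Passing one label per checkin (the category of the venue checked in at) gives the checkin version, and passing only one cluster's venues gives the neighbourhood value (B) defined next.

```python
from collections import Counter


def category_profile(main_categories):
    """Percentage of items (venues or checkins) falling in each main category."""
    counts = Counter(main_categories)
    total = sum(counts.values())
    return {category: 100.0 * n / total for category, n in counts.items()}


# Hypothetical toy data: one main-category label per venue.
city_venues = ["Food", "Food", "Nightlife Spot", "Travel & Transport"]
profile = category_profile(city_venues)
print(profile)
```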

    Figure 7 shows the overall distribution of venues and checkins across all clusters by Foursquare's main categories, as percentages. 29.23% of venues in the data are in the food category, followed by 17.05% in the nightlife spot


    category. Users, however, check in mostly at venues related to travel & transport (23.04%), professional & other places (18.86%), and arts & entertainment (15.68%). From this, we can observe that venues in the travel & transport, professional & other places, nightlife spot and arts & entertainment categories receive a disproportionate number of checkins. This means that clusters formed based on Foursquare checkins are likely to be biased towards venues in these categories, and may be more suitable for research questions related to them (e.g. transport, culture).

    Figure 7: Overall distribution of venues and checkins across clusters (series: % of venues; % of checkins)

    Similar profiles can be created for each cluster to form neighbourhood profiles. To

    calculate the distribution of venues / checkins by category within a neighbourhood, the

    formula used to calculate the value for each category (B) was:


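    $$B = \frac{\text{number of venues (or checkins) in the category within the neighbourhood}}{\text{total number of venues (or checkins) within the neighbourhood}} \times 100$$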

    This gives a sense of the type of venues in the clusters and the type of activities that

    occur within them. These neighbourhood profiles were compared with the city profile

    to understand which categories within the neighbourhood were overrepresented /

    underrepresented. For each category, the formula was:

    $$\text{Percentage difference} = \frac{B - A}{A} \times 100$$
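    The comparison can be sketched as a relative difference, assuming it is computed as (B - A) / A x 100 per category, which fits the tables below: no value falls below -100 and several exceed 100. The function names and the toy profiles are hypothetical. Categories a cluster lacks are returned as None, mirroring the empty cells in Tables 2 and 3, and the highest positive difference corresponds to the highlighted cell.

```python
from typing import Optional


def percentage_difference(cluster_profile: dict[str, float],
                          city_profile: dict[str, float]) -> dict[str, Optional[float]]:
    """(B - A) / A * 100 for each city category; None where the cluster has
    nothing in that category (an empty cell in Tables 2 and 3)."""
    result: dict[str, Optional[float]] = {}
    for category, a in city_profile.items():
        b = cluster_profile.get(category)
        result[category] = None if not b else (b - a) / a * 100.0
    return result


def most_overrepresented(differences: dict[str, Optional[float]]) -> Optional[str]:
    """Category with the highest positive difference, or None if there is none."""
    positive = {c: d for c, d in differences.items() if d is not None and d > 0}
    return max(positive, key=positive.get) if positive else None


city = {"Food": 25.0, "Nightlife Spot": 20.0, "Residence": 5.0}  # A values, toy data
cluster = {"Food": 50.0, "Nightlife Spot": 10.0}                 # B values, toy data
diff = percentage_difference(cluster, city)
print(diff, most_overrepresented(diff))
```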

    Tables 2 and 3 contain the percentage difference figures for all clusters for venues and checkins respectively, with the highest positive difference for each cluster highlighted. These percentage differences were used to determine which types of venues occurred more frequently within each cluster, and at which types of venues users checked in more frequently. For example, clusters 28 and 29 have more venues and checkins in the travel & transport category, as these clusters are essentially the London Heathrow airport terminals, which we expect to have a higher concentration of venues and checkins related to travel and transport. Another example is the clusters with high concentrations of venues and checkins in the college & university category. Clusters 27, 46 and 47 have percentage difference figures of over 1000% for users checking in, and they contain University College London, Brunel University London, and Queen Mary University of London respectively.

    From Tables 2 and 3, we again observe differences between checkin behaviour and the types of venues. For many clusters, the most overrepresented category in terms of venues differs from the most overrepresented category in terms of checkins. Cluster 3, for


    example, would be characterised as a cluster in the outdoors & recreation category in

    terms of venues, and as a cluster in the residence category in terms of checkins.


    Table 2: Percentage difference between the proportion of venues within the cluster and the proportion of venues within the city, by Foursquare's main categories

    Note: Empty cells indicate that the cluster did not contain venues in that category.

    Cluster, Arts & Entertainment, College & University, Food, Nightlife Spot, Outdoors & Recreation, Professional & Other Places, Residence, Shop & Service, Travel & Transport

    0 -36.9 22.98 -12.42 -2.02 -69.76 99.3 -74.25 73.22 -75.35

    1 56.72 -22.09 94.24 -56.19 56.2 -62.7 -25.86 -73.22

    2 -30.23 -31.03 -20.18 102.84 193.33 -20.29 -75.62 -38.01

    3 -38.26 11.54 -4.65 134.71 -28.4 58.7 -29.9 5.49

    4 -58.42 -36.97 18.11 -40.54 -22.52 10.51 18.75 -57.63 75.24

    5 67.78 9 34.81 -45.16 -30.51 -65.77 -37.2 63.82

    6 -9.85 -79.5 9.09 13.45 31.04 47.02 -17.33 -35.31

    7 -31.44 -77.73 60.26 56.88 -78.1 2.95 -52.95 -79.92

    8 -54.73 -22.79 23.46 -9.36 -49.38 2.55 -83.84 48.28 -18.77

    9 -55.9 180.75 2.22 -4.14 -90.14 133.34 -62.21 -42.23 -57.8

    10 -48.85 55.06 -18.64 -4.91 -42.81 122.44 -75.65 -21.83 -15.52

    11 49.53 154.99 27.43 9.59 -68.65 -8.55 10.19 -61.68

    12 26.02 -4.49 25.29 92.24 -76.52 -54.33 -70.01 -8.28 -71.29

    13 30.04 10.82 -22.51 9.05 13.9 -53.57 -43.21 40.73

    14 -0.96 -13.19 21.4 -36.72 38.45 41.43 -13.5 -8.14

    15 -20.1 5.53 -25.73 -13.13 -73.83 166.2

    16 129.2 -39.23 -34.44 -14.57 1.53 45.47 33.45 56.65

    17 21.43 24.18 54.36 -24.57 -41.32 -80.73 6.06 -44.67

    18 -52.25 117.14 -29.47 -13.5 60.17 -25.01 263.68 18.16 3.35

    19 -14.22 -51.24 31.58 -38.66 -4.08 -53.37 22.5 -25.08 61.23

    20 -15.11 286.03 -8.37 -23.11 58.2 7.68 -19.18 -1.14 -12.97

    21 13.19 -1.02 -10.97 31.98 143.38 -49.51 74.07 14.06 -46.44

    22 60.08 -33.82 -18.15 70.65 95.26 -36.71 24.69 -36.45 -35.35

    23 -34.8 -62.94 -2.77 -39.4 64.02 24.05 -6.9 -0.35 33.67

    24 18.68 -10.05 31.48 -60.4 -55.77 -21.14 -71.75 141.86 -52.68

    25 -49.49 -28.26 -22.95 88.27 9.84 92.36 61.76 9.32

    26 -84.9 -74.66 -80.66 520.55

    27 5.15 557.5 -5.91 -9.77 52.44 -43.69 -2.42 -68.56

    28 -51.49 -42.13 -86.13 -57.63 413.88

    29 -57.48 -64.32 -69.86 -27.37 383.09

    30 -84.8 280.27 24.38 -69.56 -32 -6.33 -34.86 65.99 -6.48

    31 291.8 78.17 6.84 34.47 -12.38 -43.2 11.9 -46.44

    32 62.27 -59.01 -4.74 -48.43 20.96 -67.33 80.23 33.85 66.35

    33 113.59 49.43 -27.19 -20.11 28.6 -52.36 205.03 -13.9 34.75

    34 -52.34 21.84 5.64 -60.03 12.27 118.52 -79.64

    35 85.21 40.38 15.75 23.61 -30.97 -55.25 164.5 -19.12 -57.8

    36 -16.27 -36.54 -61.95 35.7 212.06 21.38 99.28 -14.69 -4.63

    37 -35.32 -17.32 1.74 5.48 38.55 -85.88 143.13

    38 171.65 -14.21 54.33 33.81 -83.13 -64.45 -89.22 -24.21 -79.37

    39 -13.91 226.23 -21.75 10.8 -51.87 4 -38.53 -68.67 76.51

    40 -44.94 12.62 -37 23.14 -80.04 253.85 20.23 41.13

    41 3.16 17.28 -7.7 69.65 15.34 -81.31 120.97 57.67 -73.56

    42 -39.18 38.28 -6.71 -13.03 -32 -44.9 -86.72 180.57

    43 -21.64 -73.29 -77.59 -12.38 13.6 -65.78 328.44

    44 -73.54 20.32 -36.87 -1.62 166.26 -23.29 -62.21 15.55 71.79

    45 -56.6 -27.16 184.75 269.19 263.68

    46 461.5 -36.87 5.95 107.09 11.88 76.33 7.84 -36.71

    47 -6.69 112.17 -33.73 -2.14 143.45 7.09 99.89 8.67 -25.59

    48 473 73.62 -45.37 42.38 -76.93 -72.2


    49 4.48 -15.42 -17.82 -12.38 -14.8 198.41 59.69 -10.74

    50 19.84 9 34.81 -38.3 -46.4 -13.13 -31.54 25.6 -18.09

    51 36.96 -61.07 48.82 37.11 -61.71 -13.13 -25.24 -70.75

    52 -25.23 27.5 -55.4 -62.58 -58.2 8.39 140.23 22.44 149.11

    53 -58.63 -76.49 30.43 -17.18 -19.06 -47.53 94.2 -8.11

    54 -75.45 11.62 -41.43 -1.71 37.23 -19.93 5.16 125.1 17.43

    55 7.23 -2.48 -12.28 -7.99 43.88 -45.59 83.76 124.77 -56.03

    56 19.84 -75.23 16.98 -22.1 -39.09 -48.67 117.82 66.52 -25.54

    57 -68.66 -28.73 49.58 25.51 -64.95 -43.2 -10.48 36.88 -67.87

    58 92.51 -63.52 -15.24 0.94 25.56 51.16 -31.27 -29.94 -17.77

    59 -88.42 -47.36 -9.25 65.54 42.38 -66.44 98.37 1.1 2.85

    60 -76.85 5.28 -21.08 -60.27 81.21 -24.48 429 81.98 -28.8

    61 -53.7 -21.08 -7.3 -48.23 -16.09 32.25 21.32 89.88

    62 -18.51 -7.35 4.17 74.81 173.36 -26.16 -46.62 -58.23

    63 -7.39 -36.87 58.92 3.55 -32.87 142.65 -36.71

    64 23.23 -62.64 -6.19 -10.71 65.34 10.16 -6.15 -46.19 34.75

    65 -48.94 2.83 -33 73.74 156.72 -73.83 104.77

    67 120.25 -81.23 25.99 146.27 -0.22 -21.37 20.23 50.53

    68 764.33 40.38 -47.39 -11.71 38.06 -77.62 -11.83 -15.61

    69 -68.66 81.63 -10.35 75.23 -77.28 79.04 -86.31 -14.31

    70 -52.24 -17.67 -15.9 -6.06 -31.49 -81.66 215.81

    71 18.68 -10.05 -17.4 13.15 54.82 -3.22 154.23 -17.94 -22.27


    Table 3: Percentage difference between the proportion of users within the cluster checking in and the proportion of users within the city checking in, by Foursquare's main categories

    Note: Empty cells indicate that the cluster did not contain checkins at venues in that category.

    Cluster, Arts & Entertainment, College & University, Food, Nightlife Spot, Outdoors & Recreation, Professional & Other Places, Residence, Shop & Service, Travel & Transport

    0 -79.6 -0.55 40.74 27.39 -68.98 -18.97 -85.59 301.5 -56.14

    1 -76.69 102.86 443.13 -84.49 -9.12 -53.68 -62.41 -75.13

    2 -96.62 -89.9 -89.18 -79.65 397.75 -94.49 -98.82 -93.59

    3 -81.43 104.46 56.17 47.46 -35.65 512.27 -40.5 -20.42

    4 -72.77 9.73 -4.76 -70.33 -73.65 -20.14 4.96 -88.73 154.67

    5 126.08 -58.82 45.69 -29.67 -90.09 -83.59 -64.74 54.72

    6 116.72 -82.9 21.01 42.39 -48.17 -15.14 50.33 -77.63

    7 -81.33 -86.93 259.98 268.23 -43.21 -52.84 -28.4 -87.57

    8 -78.01 -39.08 159.78 23.12 -55.41 -63.36 -90.29 176.79 -32.2

    9 -85.31 425.78 109.76 105.26 -98.28 67.16 -40.13 -46.84 -47.13

    10 -77.83 92.76 14.47 19.42 -85.73 52.78 -81.92 8.74 26.19

    11 -13.69 445.6 72.18 60.66 -68.8 -57.56 207.14 -75.39

    12 -48.3 71.85 178.87 242.42 -60.69 -44.18 201.37 -45.07 -77.12

    13 -82.6 -61.13 -82.41 612.28 -83.99 -89.89 -94.21 -28.25

    14 -38.17 13.87 51.86 -92 40.98 290.4 -50.08 13.97

    15 -72.15 -35.39 -67.18 -73.94 -96.01 236.68

    16 524.9 -91.76 -95.61 -99.12 -92.89 -81.57 -86.8 -93.28

    17 107.66 10.02 96.5 -12.62 -78.23 -78.63 95.86 -68.76

    18 -80.65 243.03 21.36 83.09 -41.15 -43.26 822.86 37.06 -12.31

    19 -34.14 -61.93 244.84 8.38 -56.45 -81.68 142.79 -17.42 -14.89

    20 -32.68 524.9 32.26 -3.85 83.51 -38.03 75.02 -42.16 -10.84

    21 195.81 -44.45 12.16 78.69 -30.11 -76.61 192.27 -35.31 -77.81

    22 200.33 117.03 8.52 100.78 -40.43 -73.89 90.3 -72.75 -70.07

    23 -84.82 -24.29 61.91 -3.02 41.43 -2.85 20.7 -7.8 11.63

    24 -28.72 41.17 65.53 -59.82 -88.47 -39.35 -67.85 397.29 -90.41

    25 -89.82 2.18 15.6 -7.1 -20.61 331.64 62.27 28.77

    26 -97.28 -94.73 -99.06 316.58

    27 -84.32 1098.59 126.17 174.43 -12.23 107.7 -4.81 -73.98

    28 -91.32 3.49 -91.78 -73.64 248.71

    29 -75.58 -87.72 -96.76 -86.46 295.48

    30 -82.04 31.14 -37.7 -92.67 108.91 -84.51 -77.6 505.79 -84.18

    31 352.94 186.79 -24.24 14.29 -93.44 -93.1 220.05 -83.63

    32 287.79 -56.49 -11.74 -65.94 -77.61 -84.3 125.45 -35.42 -45.16

    33 95.82 70.33 -26.53 3.9 -16.51 -61.95 559.49 -22.22 -20.15

    34 -92.35 174.14 129.52 -40.18 -36.21 167.03 -86.18

    35 -79.11 -44.45 104.41 58.14 204.98 -83.29 342.83 -74.63 -28.66

    36 71.27 -30.99 -67.45 32