exploring the data universe with semantic signatures: plous lecture 2015
TRANSCRIPT
Analogies Observatories Semantic Signatures Challenges Next Steps
Exploring the Data Universe with Semantic Signatures
Plous Lecture 2015
Krzysztof Janowicz, STKO Lab, University of California, Santa Barbara, USA
Exploring the Data Universe with Semantic Signatures K. Janowicz
Pudding and planets
Analogies & Atoms
Plum Pudding
Thomson’s Plum Pudding Model (1904)
Positive charge distributed equally in the atom, electrons embedded as raisins
Rutherford(-Bohr) Solar System Model (1911/13)
Small nucleus with a high mass and electrons that revolve around it
Analogies
‘And I cherish more than anything the Analogies, my most trustworthy masters. They know all the secrets of Nature, and they ought to be least neglected in Geometry.’ (Johannes Kepler)
Analogies enable us to explore a new domain (target) by mapping its structure to another, more familiar domain (source). They allow us to ask new questions which only become meaningful in the new domain.
Observatories and sensors
Astronomical Observatories And Their Sensors
The Griffith Observatory
Griffith donated funds and land to build the observatory to make astronomy accessible to the public. This was in clear contrast to the prevailing idea of locating observatories on remote mountaintops and restricting them to scientists. Today, our society is willing to invest billions to study phenomena that may not even exist anymore (e.g., the Pillars of Creation).
Observatories and Their Sensors
Whether on land or in space, observatories and their sensors serve different purposes and are most useful when they work together.
Spectral Signatures, Bands, and Remote Sensing
Spectral signatures are the combination of emitted, reflected, or absorbed electromagnetic radiation at varying wavelengths (bands) that uniquely identify a feature type.
Spectral libraries, i.e., the idea of sharing spectral signatures, have revolutionized remote sensing.
A Universe of Data?
The Data Universe: Synthesis Is The New Analysis
What is the common core of the digital universe, physical-cyber-social systems, digital earth, 4th paradigm, big data, social machines, and so forth?
Synthesis is the new analysis
Observational science versus experimental science
(Unintended) reuse of existing data, semantic interoperability
Heterogeneity: multi-thematic, multi-perspective, multi-resolution
Towards Data Observatories
Web Science Trust: ‘A web observatory is a system that gives public access to some specific aspects of the WWW and provides the infrastructure and visualization techniques to support monitoring, analysis, and experiments.’
Web Science Trust wants to establish a network of observatories.
New questions: are there laws of the data universe?
Constructing the Analogy
Semantic Signatures and Bands
Semantic Signatures As Analogy To Spectral Signatures
Geospatial bands based on geographic location: ANND, Ripley’s K Bins, J Measure, Dzero
Temporal bands based on geo-social check-ins: 24 Hours, 7 Days, Seasons
Thematic bands based on venue tips and reviews: LDA topics, TF-IDF
Makes use of data heterogeneity
Thematic Bands
Thematic Bands & Geo-Indicativeness
Places at geographic location 34.43, -119.71 are:
of types city, county seat, ...
at the coastline, near the mountains, have Mediterranean climate, ...
described in terms of urban area, economy, tourism, government, employment, ...
Interesting observation: some of these terms will co-occur by type, others per region.
A thematic band can be computed out of unstructured text from sources such as Wikipedia, travel blogs, news articles, and so forth.
Non-georeferenced plain text is often still geo-indicative.
Different types of geographic features have different, diagnostic topics associated with them (out of 500 topics).
Indicative topics can be lifted to the type level.
Here, we modeled topics using latent Dirichlet allocation (LDA).
Thematic Bands & Geographic Feature Types
City topics: 204 > 450 > 104 > 282 > 267 > 497 > 443 > 484 > 277 > 97 > ...
Town topics: 425 > 450 > 419 > 367 > 104 > 429 > 266 > 69 > 204 > 308 > ...
Mountain topics: 27 > 110 > 5 > 172 > 208 > 459 > 232 > 398 > 453 > 183 > ...
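A minimal sketch of how such a per-type topic ranking could be derived, assuming every place description already carries a topic distribution (for instance from an LDA model). The function name, the four-topic vectors, and the averaging step are illustrative assumptions, not the method used in the talk.

```python
from collections import defaultdict

def rank_topics_by_type(docs):
    """docs: iterable of (feature_type, topic_probabilities).
    Returns {feature_type: [topic ids, most prominent first]}."""
    sums = {}
    counts = defaultdict(int)
    for ftype, probs in docs:
        if ftype not in sums:
            sums[ftype] = [0.0] * len(probs)
        for k, p in enumerate(probs):
            sums[ftype][k] += p
        counts[ftype] += 1
    rankings = {}
    for ftype, total in sums.items():
        # Lift document-level topics to the type level by averaging.
        means = [s / counts[ftype] for s in total]
        rankings[ftype] = sorted(range(len(means)),
                                 key=lambda k: means[k], reverse=True)
    return rankings

# Hypothetical topic distributions (4 topics instead of 500).
docs = [
    ("city", [0.60, 0.20, 0.10, 0.10]),
    ("city", [0.50, 0.30, 0.15, 0.05]),
    ("mountain", [0.10, 0.05, 0.25, 0.60]),
]
rankings = rank_topics_by_type(docs)
print("City topics:", " > ".join(str(k) for k in rankings["city"]))
```

Averaging is only one way to aggregate; any measure of how diagnostic a topic is for a type could replace it.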
Temporal Bands
Study geo-social check-in data from location-based social networks.
Aggregate them to the feature type level and clean them.
Intuitively, people visit wineries in the afternoon and evening and bakeries in the mornings.
Combine weekly and hourly bands to create place type signatures.
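The aggregation step can be sketched as follows, assuming check-ins for one place type are available as plain timestamps. The function names and the sample bakery check-ins are hypothetical; real data would need the cleaning mentioned above first.

```python
from collections import Counter
from datetime import datetime

def hourly_signature(checkins):
    """24 values, one per hour of day, summing to 1."""
    if not checkins:
        raise ValueError("no check-ins for this place type")
    counts = Counter(ts.hour for ts in checkins)
    return [counts[h] / len(checkins) for h in range(24)]

def weekly_hourly_signature(checkins):
    """168 values (7 days x 24 hours), Monday 00:00 first, summing to 1."""
    if not checkins:
        raise ValueError("no check-ins for this place type")
    counts = Counter(ts.weekday() * 24 + ts.hour for ts in checkins)
    return [counts[i] / len(checkins) for i in range(7 * 24)]

# Hypothetical check-ins aggregated for the type "Bakery".
bakery_checkins = [
    datetime(2015, 1, 5, 8, 15),    # Monday morning
    datetime(2015, 1, 5, 8, 40),    # Monday morning
    datetime(2015, 1, 6, 9, 5),     # Tuesday morning
    datetime(2015, 1, 10, 17, 30),  # Saturday afternoon
]
hourly = hourly_signature(bakery_checkins)
weekly = weekly_hourly_signature(bakery_checkins)
print(hourly[8], weekly[5 * 24 + 17])
```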
Spatial Bands
POI plotted by similarity to bar and post office in OpenStreetMap data (London)
Similarity measured as association strength in OSM change history
Bars (and similar features) tend to clump together
Post Offices (and similar features) are rather uniformly distributed
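One simple way to quantify clumping versus uniform spacing is the average nearest neighbor distance, which is how ANND in the list of geospatial bands is read here. The sketch below is a brute-force version on planar coordinates; the two point sets are invented for illustration and are not OSM data.

```python
import math

def average_nearest_neighbor_distance(points):
    """Mean distance from each point to its closest other point."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    total = 0.0
    for i, p in enumerate(points):
        total += min(math.dist(p, q)
                     for j, q in enumerate(points) if j != i)
    return total / len(points)

# Hypothetical layouts with 8 points each, covering a similar extent.
clumped = [(0, 0), (0, 1), (1, 0), (1, 1),
           (10, 10), (10, 11), (11, 10), (11, 11)]
regular = [(x, y) for x in (0, 4, 8, 12) for y in (0, 4)]

annd_clumped = average_nearest_neighbor_distance(clumped)
annd_regular = average_nearest_neighbor_distance(regular)
print(annd_clumped, annd_regular)
```

A low value relative to the study area suggests bar-like clumping; a high value suggests the more even spacing seen for post offices.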
Dzero measures the likelihood of features of a certain type to co-occur within a specific semantic and spatial range.
General idea: generate recommendations and clean up data based on type likelihood. ‘How likely is a post office directly next to an existing one?’
Sensor Resolution & Social Sensing
(Remote sensing) sensors can be characterized by their resolution
Spatial resolution: smallest feature that can be detected, i.e., the pixel size.
Temporal resolution: smallest time interval between repeated observations.
Spectral resolution: number, position, and width of spectral bands.
Radiometric resolution: smallest distinguishable difference in radiation magnitude.
Analogous social sensor resolutions, e.g., types of bands, number of topics.
Platial Resolution of Temporal Signatures
Circular temporal signature histograms for Theme Park (a, b, c) and Drugstore (d, e, f).
About 50% of ≈ 400 Point Of Interest (POI) types are regionally invariant in the USA.
Temporal Resolution of Temporal Signatures
[Figure: axis with ticks 1 to 24]
The ‘Foursquare-day’
How and when do people check in at places: manually or automatically?
Do they check out? If not, after what time are they checked out automatically?
Distinguishable Feature Types For Thematic Signatures From 500-Topics
Which classes in a feature type schema can be meaningfully distinguished?
Spatial Search Challenges
From Space to Place Through Time
1. Challenge: Mapping User Locations from Spaces to Places
Estimate the place visited by a user from the user’s spatial location (e.g., as measured by their smartphone).
Baseline: Google Place API
Marker  Category                 Distance (m)
A       Bakery                   39.2
B       Nightclub                41.4
C       Nightclub                69.9
D       American Restaurant      62.7
E       Bakery                   73.7
F       Fast Food                65.0
G       Apparel Store            85.8
H       Ice Cream Shop           82.6
I       Movie Theater            94.2
J       Pub                      88.9
K       Cosmetics Shop           60.9
L       Diner                    70.0
M       Italian Restaurant       45.7
N       Furniture / Home Store   114.9
O       Grocery Store            147.8
P       BBQ Joint                82.3
Q       Burrito Place            88.1
R       Italian Restaurant       93.6
Geolocation APIs map geographic coordinates, e.g., from a user’s smartphone, to an ordered set of nearby candidate POI.
These services typically return the n nearest POI within a certain radius and use spatial distance to the provided coordinates to determine their order.
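The distance-only baseline can be sketched in a few lines. This is a generic illustration, not the behavior of any particular API: the function names, the POI dictionaries, and the default radius are assumptions, and great-circle distance is computed with the haversine formula.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, spherical approximation

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 coordinates."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def nearest_pois(lat, lon, pois, n=5, radius_m=150.0):
    """Return up to n (distance, poi) pairs within radius_m, nearest first."""
    scored = [(haversine_m(lat, lon, p["lat"], p["lon"]), p) for p in pois]
    within = [(d, p) for d, p in scored if d <= radius_m]
    within.sort(key=lambda dp: dp[0])
    return within[:n]

# Hypothetical POI around the user's position.
pois = [
    {"name": "far", "lat": 34.4400, "lon": -119.71},
    {"name": "mid", "lat": 34.4310, "lon": -119.71},
    {"name": "near", "lat": 34.4305, "lon": -119.71},
]
result = nearest_pois(34.43, -119.71, pois)
print([(p["name"], round(d, 1)) for d, p in result])
```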
Our Approach: Distort POI Locations Using Temporal Signatures
Marker  Category                 Distance (m)  Monday 10AM (10^-3)  Saturday 11PM (10^-3)
A       Bakery                   39.2          6.28                 4.08
B       Nightclub                41.4          0.26                 44.16
C       Nightclub                69.9          0.26                 44.16
D       American Restaurant      62.7          1.61                 9.50
E       Bakery                   73.7          6.28                 4.08
F       Fast Food                65.0          4.80                 5.78
G       Apparel Store            85.8          2.51                 1.09
H       Ice Cream Shop           82.6          0.84                 15.88
I       Movie Theater            94.2          1.44                 11.00
J       Pub                      88.9          0.53                 22.66
K       Cosmetics Shop           60.9          3.87                 1.57
L       Diner                    70.0          5.49                 7.56
M       Italian Restaurant       45.7          1.42                 7.96
N       Furniture / Home Store   114.9         4.79                 5.01
O       Grocery Store            147.8         4.53                 1.38
P       BBQ Joint                82.3          0.43                 9.35
Q       Burrito Place            88.1          0.54                 3.16
R       Italian Restaurant       93.6          1.42                 7.96
The likelihood of visiting a coffee shop, university, bakery, etc. at 7pm is rather low, while it is a peak hour for restaurants.
In analogy to scale distortion in cartography, we can modify the purely spatial ranking by pulling and pushing places based on the check-in probability of their temporal type signatures.
Different distortion models: linear, non-linear, symmetric, non-symmetric
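A minimal sketch of the pulling and pushing idea, using a linear, symmetric model chosen purely for illustration. The formula, the strength parameter, and the function name are assumptions and do not reproduce the distortion model behind the results in the talk. Distances and probabilities for markers A, B, and M are taken from the table above (probabilities in units of 10^-3).

```python
def distort_distances(candidates, strength_m=1.0):
    """candidates: list of (marker, distance_m, checkin_probability).
    Hypothetical linear model: places whose probability is above the
    candidates' mean are pulled closer, places below it are pushed away.
    Returns (marker, distorted_distance) pairs, nearest first."""
    mean_p = sum(p for _, _, p in candidates) / len(candidates)
    distorted = [(m, max(0.0, d + strength_m * (mean_p - p)))
                 for m, d, p in candidates]
    return sorted(distorted, key=lambda md: md[1])

# (marker, distance in m, check-in probability x 10^-3)
monday_10am = [("A", 39.2, 6.28), ("B", 41.4, 0.26), ("M", 45.7, 1.42)]
saturday_11pm = [("A", 39.2, 4.08), ("B", 41.4, 44.16), ("M", 45.7, 7.96)]

monday_order = [m for m, _ in distort_distances(monday_10am)]
saturday_order = [m for m, _ in distort_distances(saturday_11pm)]
print(monday_order, saturday_order)
```

With these inputs the bakery (A) stays on top on Monday morning, while the nightclub (B) overtakes it on Saturday night, although the spatial distances are unchanged.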
Marker  Actual Dist. (m)  Distorted Dist. (m)
A       39.2              25.8
B       41.4              71.4
C       69.9              99.9
D       62.7              79.8
E       73.7              60.3
F       65.0              59.5
G       85.8              95.6
H       82.6              106.7
I       94.2              112.8
J       88.9              116.1
K       60.9              61.1
L       70.0              60.6
M       45.7              64.5
N       114.9             109.5
O       147.8             143.9
P       82.3              110.5
Q       88.1              115.2
R       93.6              112.4

Method               MRR    SRR    nDCG   1st Pos.
Distance-Only        0.359  443.8  0.583  211
Temporally Adjusted  0.453  793.5  0.711  423
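For reference, MRR is read here as the mean reciprocal rank, i.e., the average of 1/rank of the true place over all test check-ins. The sketch below uses the standard definition with invented ranks; how SRR and nDCG were computed is not covered.

```python
def mean_reciprocal_rank(ranks):
    """ranks: 1-based position of the true place in each ranked list."""
    if not ranks:
        raise ValueError("no ranks given")
    return sum(1.0 / r for r in ranks) / len(ranks)

def first_position_count(ranks):
    """Number of queries where the true place was ranked first."""
    return sum(1 for r in ranks if r == 1)

# Hypothetical ranks of the actually visited place for three check-ins.
ranks = [1, 2, 4]
mrr = mean_reciprocal_rank(ranks)
hits = first_position_count(ranks)
print(round(mrr, 3), hits)
```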
Vague Cognitive Regions: Where is SoCal?
2. Challenge: Vague Cognitive Regions
Where are SoCal and NorCal?
Baseline: Tests With Human Participants
44 participants, tessellation of 90 hexagons (≈ 4920 km² each)
Google Maps search for SoCal
[More on the extraction of polygons at [email protected]]
Data and Correlations
Source        SoCal   NorCal  Total
Flickr        22132   19706   41838
Instagram     169648  116984  286632
Twitter       10376   3294    13670
Travel Blogs  107     78      185
Wikipedia     1450    700     2150
[Figure: Empirical Cumulative Distribution (CDF) of Flickr photo counts per user]
Source                   ρ (M1)  τ (M1)
Flickr                   0.881   0.721
Instagram                0.867   0.711
Twitter                  0.874   0.714
TravelBlogs & Wikipedia  0.897   0.74
Means                    0.870   0.712
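Reading ρ and τ as Spearman's and Kendall's rank correlation coefficients (an assumption based on the usual symbols), both can be sketched without external libraries. The version below assumes there are no tied values, and the two score lists are invented; what exactly M1 correlates is not stated in the transcript.

```python
def ranks(values):
    """Rank positions (1 = smallest); assumes all values are distinct."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result

def spearman_rho(x, y):
    """Spearman's rho via the no-ties formula 1 - 6*sum(d^2)/(n(n^2-1))."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))

def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            score += (s > 0) - (s < 0)
    return score / (n * (n - 1) / 2)

# Hypothetical per-cell scores from participants and from one data source.
human = [1.0, 2.0, 3.0, 4.0, 5.0]
source = [2.0, 1.0, 4.0, 3.0, 5.0]
rho = spearman_rho(human, source)
tau = kendall_tau(human, source)
print(round(rho, 3), round(tau, 3))
```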
Vague Cognitive Regions and Inter-rater Agreement
[Figure: map with numeric cell labels; scale bar 0 to 320 miles]
Legend: Insufficient Data; Very Northern Californian; Moderately Northern Californian; Slightly Northern Californian; Equally Northern and Southern Californian; Slightly Southern Californian; Moderately Southern Californian; Very Southern Californian
Standard Deviations: < 0.01; 0.01 - 0.50; 0.51 - 1.00; 1.01 - 1.73; > 1.73
Source       Four Raters  Five Raters
Kendall’s W  0.953        0.929
p-value      < 0.001      < 0.001
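Kendall's W (coefficient of concordance) measures how strongly several raters agree on a ranking, from 0 (no agreement) to 1 (identical rankings). The sketch below uses the standard formula without tie correction, W = 12 S / (m^2 (n^3 - n)), for m raters and n items; the four rankings of five cells are invented and stand in for data sources acting as raters.

```python
def kendalls_w(rankings):
    """rankings: one list of ranks (1..n, no ties) per rater."""
    m = len(rankings)
    n = len(rankings[0])
    rank_sums = [sum(r[i] for r in rankings) for i in range(n)]
    mean_sum = m * (n + 1) / 2
    # S: squared deviations of the items' rank sums from their mean.
    s = sum((t - mean_sum) ** 2 for t in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))

# Hypothetical: four data sources rank five cells from most northern
# (rank 1) to most southern (rank 5).
rankings = [
    [1, 2, 3, 4, 5],
    [1, 2, 3, 5, 4],
    [2, 1, 3, 4, 5],
    [1, 2, 3, 4, 5],
]
w = kendalls_w(rankings)
print(round(w, 3))
```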
Key idea: Data sources become raters/participants.
Vague Cognitive Regions and Thematic Signatures
Do you even have to mine for the SoCal and NorCal terms directly?
Vague Cognitive Regions and Self-Similarity
[Figure: plot labeled ‘KLD Divergence’; legend: Northern California, Southern California, Both Northern & Southern California]
Based on 60 topics, the similarity among SoCal cells (and among NorCal cells) is higher than between SoCal and NorCal cells.
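The comparison rests on the Kullback-Leibler divergence between the topic distributions of cells, where a lower value means more similar cells. A minimal sketch with the standard definition follows; the three-topic vectors are invented (the talk uses 60 topics), and the function assumes q is non-zero wherever p is.

```python
import math

def kl_divergence(p, q):
    """D_KL(P || Q) = sum_i p_i * ln(p_i / q_i), in nats.
    Terms with p_i == 0 contribute nothing."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi <= 0:
                raise ValueError("q must be positive wherever p is")
            total += pi * math.log(pi / qi)
    return total

# Hypothetical topic distributions for three cells.
socal_a = [0.70, 0.20, 0.10]
socal_b = [0.65, 0.25, 0.10]
norcal_a = [0.10, 0.30, 0.60]

within = kl_divergence(socal_a, socal_b)
between = kl_divergence(socal_a, norcal_a)
print(round(within, 4), round(between, 4))
```

KL divergence is asymmetric, so comparing cells in both directions (or using a symmetrized variant) may be preferable in practice.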
Is the Data Universe Homogeneous and Isotropic?
Limits Of The Data Universe Analogy
At large scales, the physical universe is homogeneous and isotropic
In terms of geospatial distribution, the Social Media Web is neither homogeneous nor isotropic. If you direct your social sensing instrument at certain regions, there will be no signal.
Next Steps
POI Pulse Observatory
POI Pulse Observatory: Explore the Pulse of Los Angeles Using Signatures
http://poipulse.com/
A Public Data Observatory at UCSB?
A tangible & public observatory at UCSB; remember Griffith’s will.
Show & stream data from different sources and show analysis results
Visualize privacy implications of data
Show citizens how their everyday data is used for scientific discoveries
The Right Place