Listening to the pulse of our cities with Stream Reasoning (and few more technologies) Emanuele Della Valle @manudellavalle - [email protected] http://emanueledellavalle.org

Upload: emanuele-della-valle

Post on 21-Apr-2017


TRANSCRIPT

Page 1: Listening to the pulse of our cities with Stream Reasoning (and few more technologies)

Listening to the pulse of our cities with Stream Reasoning (and few more technologies)

Emanuele Della Valle
@manudellavalle - [email protected]
http://emanueledellavalle.org

Page 2: Listening to the pulse of our cities with Stream Reasoning (and few more technologies)

Emanuele Della Valle - @manudellavalle - http://emanueledellavalle.org

Share, Remix, Reuse: Legally

This work is licensed under the Creative Commons Attribution 3.0 Unported License.

You are free:
• to Share: to copy, distribute and transmit the work
• to Remix: to adapt the work

Under the following conditions:
• Attribution: You must attribute the work by inserting
  – "[source http://emanueledellavalle.org]" at the end of each reused slide
  – a credits slide stating
    - These slides are partially based on "Listening to the pulse of our cities fusing Social Media Streams and Call Data Records" by Emanuele Della Valle

To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0/

Page 3: Listening to the pulse of our cities with Stream Reasoning (and few more technologies)

Me

Assistant Professor at DEIB, Politecnico di Milano

Expert in semantic technologies and stream computing

Brander of stream reasoning: an approach to master the velocity and variety dimensions of Big Data

15 years of experience in research and innovation projects

Startupper: fluxedo.com
R&D advisor: socialometers.com

[email protected]

@manudellavalle

http://emanueledellavalle.org

http://streamreasoning.org

http://fluxedo.com

Page 4: Listening to the pulse of our cities with Stream Reasoning (and few more technologies)

Acknowledgements

Politecnico di Milano
• DEIB
  – What: scientific direction, semantic technologies, stream processing, data science
  – Who: Emanuele Della Valle, Marco Balduini
• Density Design Lab
  – What: visual analytics
  – Who: Paolo Ciuccarelli, Matteo Azzi

Telecom Italia
• SKIL Lab
  – What: Big Data technology, data science
  – Who: Fabrizio Antonelli, Roberto Larcher

Funding agency

Page 5: Listening to the pulse of our cities with Stream Reasoning (and few more technologies)

Agenda

• Context
• Problem
• Experimental setting
• Solution
• Evaluation
• Conclusions

Page 6: Listening to the pulse of our cities with Stream Reasoning (and few more technologies)

The digital reflection of our cities is sharpening

[photo: http://hoglundassociates.com/Images/Cloud_Gate.jpg]

Page 7: Listening to the pulse of our cities with Stream Reasoning (and few more technologies)

The digital reflection of our cities is sharpening

[photo: http://hoglundassociates.com/Images/Cloud_Gate.jpg]

because the urban environment is captured in open datasets

Page 8: Listening to the pulse of our cities with Stream Reasoning (and few more technologies)

The digital reflection of our cities is sharpening

[photo: http://hoglundassociates.com/Images/Cloud_Gate.jpg]

and streams of information flow through our cities thanks to …

Page 9: Listening to the pulse of our cities with Stream Reasoning (and few more technologies)

The digital reflection of our cities is sharpening

[photo: http://hoglundassociates.com/Images/Cloud_Gate.jpg]

and streams of information flow through our cities thanks to the pervasive deployment of sensors

Page 10: Listening to the pulse of our cities with Stream Reasoning (and few more technologies)

The digital reflection of our cities is sharpening

[photo: http://hoglundassociates.com/Images/Cloud_Gate.jpg]

and streams of information flow through our cities thanks to the wide adoption of smart phones

Page 11: Listening to the pulse of our cities with Stream Reasoning (and few more technologies)

The digital reflection of our cities is sharpening

[photo: http://hoglundassociates.com/Images/Cloud_Gate.jpg]

and streams of information flow through our cities thanks to the usage of (location-based) social networks

Page 12: Listening to the pulse of our cities with Stream Reasoning (and few more technologies)

and it tracks changes with a decreasing delay

Page 13: Listening to the pulse of our cities with Stream Reasoning (and few more technologies)

and it tracks changes with a decreasing delay

Data source         By when         Frequency       Delay
Census data         100s of years   years           months
Newspaper           100s of years   days            1 day
Weather sensors     10s of years    hours/minutes   hours/minutes
TV news             10s of years    hours           minutes
Traffic sensors     years           15 minutes      minutes
Call Data Records   years           15 minutes      hours
Social media        years           seconds         seconds
IoT                 recently        milliseconds    milliseconds

Page 14: Listening to the pulse of our cities with Stream Reasoning (and few more technologies)

Data piles up without easing decision making

Mayor: "I have to decide: A or B? Why not C? What if D?"

Page 15: Listening to the pulse of our cities with Stream Reasoning (and few more technologies)

But smarter Big Data can …

… advance our ability to feel the pulse of our cities
• fusing all those data sources
• making sense of the fused information
to improve decision making and deliver innovative services

Mayor: "Definitely E!"

Page 16: Listening to the pulse of our cities with Stream Reasoning (and few more technologies)

Let's focus on a concrete research question

Can we collect, analyse and repurpose
• social media and
• Call Data Records
to allow
• perceiving emerging patterns and
• observing their dynamics?

[photo: https://www.flickr.com/photos/debord/4932655275]

Page 17: Listening to the pulse of our cities with Stream Reasoning (and few more technologies)

More precisely, the research question is

Can we collect, analyse and repurpose
• social media captured at places and events and
• privacy-preserving aggregates of Call Data Records
to allow visually
• perceiving emerging patterns and
• observing their dynamics?

[photo: https://www.flickr.com/photos/debord/4932655275]

Page 18: Listening to the pulse of our cities with Stream Reasoning (and few more technologies)

How to set up an experiment?

[photo: https://www.flickr.com/photos/myfuturedotcom/6053042920]

Question                 Answer
Which city?              Milan
Comparing what?          Milan Design Week vs. Milan in general
Experimental subjects?   Event managers & casual audience

Page 19: Listening to the pulse of our cities with Stream Reasoning (and few more technologies)

What's Milan Design Week?

[map: http://www.fuorisalone.it]

The Milan Design Week (MDW) is a city-scale event
• held yearly in Milan,
• featuring around 1,200 events
• in 500+ places spread across the city and
• attracting about half a million people from all over the world.

Page 20: Listening to the pulse of our cities with Stream Reasoning (and few more technologies)

CitySensing for event managers (2013)

F. Antonelli, M. Azzi, M. Balduini, P. Ciuccarelli, E. Della Valle, R. Larcher: City sensing: visualising mobile and social data about a city scale event. AVI 2014: 337-338

http://jol.telecomitalia.com/jolskil/citysensing/

Page 21: Listening to the pulse of our cities with Stream Reasoning (and few more technologies)

CitySensing for casual audience (2014)

M. Balduini, E. Della Valle, M. Azzi, R. Larcher, F. Antonelli, and P. Ciuccarelli: CitySensing: Fusing City Data for Visual Storytelling. IEEE MultiMedia.

http://jol.telecomitalia.com/jolskil/citysensing/
http://citysensing.fuorisalone.it/

Page 22: Listening to the pulse of our cities with Stream Reasoning (and few more technologies)

Ingredients of the proposed solution

Big Data technologies
- Address "volume" of data that do not fit in memory
- Address "velocity" of data streams in memory

Semantic technologies
- Address "variety" using Ontology Based Data Access
- Named Entity Recognition and Linking

Data science
- Statistical modelling
- Detecting anomalies

Visual analytics
- Allow non-expert access to data
- Tell stories out of data

[diagram label: Stream Reasoning]

Page 23: Listening to the pulse of our cities with Stream Reasoning (and few more technologies)

What's Stream Reasoning?

Tame Variety and Velocity simultaneously

[diagram labels: Traditional, Stream Reasoning]

E.Della Valle, S. Ceri, F. van Harmelen, D. Fensel: It's a Streaming World! Reasoning upon Rapidly Changing Information. IEEE Intelligent Systems 24(6): 83-89 (2009)

E. Della Valle, D. Dell'Aglio, A. Margara: Taming velocity and variety simultaneously in big data with stream reasoning: tutorial. DEBS 2016: 394-401


Page 25: Listening to the pulse of our cities with Stream Reasoning (and few more technologies)


How CitySensing works – step 0

Time

Reality

Capture

Frame

Digital Reflex

Set up a conceptual model (FraPPE) to master the variety in the data sources

M.Balduini, E. Della Valle: FraPPE: a vocabulary to represent heterogeneous spatio-temporal data to support visual analytics. ISWC 2015

Page 26: Listening to the pulse of our cities with Stream Reasoning (and few more technologies)

How CitySensing works – step 0

Grid

Cell

Time

Frame

Set up a conceptual model (FraPPE) to master the variety in the data sources

M.Balduini, E. Della Valle: FraPPE: a vocabulary to represent heterogeneous spatio-temporal data to support visual analytics. ISWC 2015

Page 27: Listening to the pulse of our cities with Stream Reasoning (and few more technologies)

How CitySensing works – step 0

Pixel Frame 1

Time

Set up a conceptual model (FraPPE) to master the variety in the data sources

M.Balduini, E. Della Valle: FraPPE: a vocabulary to represent heterogeneous spatio-temporal data to support visual analytics. ISWC 2015

Page 28: Listening to the pulse of our cities with Stream Reasoning (and few more technologies)

How CitySensing works – step 0

Place A

Event A

Time

Set up a conceptual model (FraPPE) to master the variety in the data sources

M.Balduini, E. Della Valle: FraPPE: a vocabulary to represent heterogeneous spatio-temporal data to support visual analytics. ISWC 2015

Page 29: Listening to the pulse of our cities with Stream Reasoning (and few more technologies)

How CitySensing works – step 0

Event A

Time

Frame 1

Set up a conceptual model (FraPPE) to master the variety in the data sources

M.Balduini, E. Della Valle: FraPPE: a vocabulary to represent heterogeneous spatio-temporal data to support visual analytics. ISWC 2015

Page 30: Listening to the pulse of our cities with Stream Reasoning (and few more technologies)

How CitySensing works – step 0

Event B

Place B

Time

Frame 2

Set up a conceptual model (FraPPE) to master the variety in the data sources

M.Balduini, E. Della Valle: FraPPE: a vocabulary to represent heterogeneous spatio-temporal data to support visual analytics. ISWC 2015

Page 31: Listening to the pulse of our cities with Stream Reasoning (and few more technologies)

How CitySensing works – step 0

FraPPE offers a homogeneous view to the visual analytics interface built on heterogeneous data

[diagram legend: Geo-spatial fragment, Provenance fragment, Time Varying fragment, FraPPE specifics]

Page 32: Listening to the pulse of our cities with Stream Reasoning (and few more technologies)

How CitySensing works – step 1

For every pixel compute continuously the volume of Call Data Records (using privacy-preserving aggregation)

Real data recorded on 13 April 2013 between 13:00 and 00:00

Page 33: Listening to the pulse of our cities with Stream Reasoning (and few more technologies)

How CitySensing works – step 2

Find continuously the anomalous pixels comparing the current volumes with a model of the volumes in this time period

Real data recorded on 13 April 2013 between 13:00 and 00:00

Page 34: Listening to the pulse of our cities with Stream Reasoning (and few more technologies)

How CitySensing works – step 3

Map continuously anomalies to the districts of Milano Design Week

[map labels: Brera, Tortona, "What's this?"]

Real data recorded on 13 April 2013 between 13:00 and 00:00

Page 35: Listening to the pulse of our cities with Stream Reasoning (and few more technologies)

How CitySensing works – step 4

For every anomalous pixel continuously capture the hashtags and semantic entities named in the social media streams

[map labels: Brera, Tortona, "What's this?"]

Real data recorded on 13 April 2013 between 13:00 and 00:00

Page 36: Listening to the pulse of our cities with Stream Reasoning (and few more technologies)

How CitySensing works – step 5

Continuously discard the hashtags and semantic entities that are systematically used

Brera

Tortona

Real data recorded on 13 April 2013 between 13:00 and 00:00

Page 37: Listening to the pulse of our cities with Stream Reasoning (and few more technologies)

Logical architecture of CitySensing – setup time

[architecture diagram blocks: Capture Data Stream, Capture Static Data, Analyse Data Stream, Build Models; MDW]

Page 38: Listening to the pulse of our cities with Stream Reasoning (and few more technologies)

Logical architecture of CitySensing – run time

[architecture diagram blocks: Capture Data Stream, Capture Static Data, Analyse Data Stream, Build Models, Detect Anomalies, Store Analysis, Visualize Analysis; MDW]

Page 39: Listening to the pulse of our cities with Stream Reasoning (and few more technologies)

Logical architecture of CitySensing – run time

[architecture diagram blocks: Capture Data Stream, Capture Static Data, Analyse Data Stream, Build Models, Detect Anomalies, Store Analysis, Visualize Analysis; MDW; annotations: Stream Reasoning, Inductive, Deductive]

Page 40: Listening to the pulse of our cities with Stream Reasoning (and few more technologies)

Few more details on Stream Reasoning

[diagram labels]
• Connects to a variety of data streams
• Uses logical window
• Real-time query answering
• Complex event processing analysis
• Stream Reasoner for data "in-motion" (in-memory)
• Store data "at-rest" (distributed)
• Optimizes joins
• MDW

Page 41: Listening to the pulse of our cities with Stream Reasoning (and few more technologies)

Capturing static data via FraPPE

The frame duration was fixed to 15 minutes

Milano area was covered with
• 1 grid (100x100)
• 10,000 cells
• 250x250 meters in each cell (the size of the mobile network cells in the centre of Milan)

During the Milano Design Week a total of 5.76 Mln pixels were captured

1,000+ events in 600+ places were collected using the crowd-sourced databases of fuorisalone.it, breradesigndistrict.it and tortonaroundesign.com thanks to a partnership with studiolabo

[map: cells in which there are places hosting Milan Design Week 2013 events]
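The grid and frame figures above can be tied together with a small sketch. It is only an illustration, not the FraPPE vocabulary or the CitySensing code: the function names, the use of metre offsets from an assumed south-west grid corner, and the representation of a pixel as a (cell, date, frame) tuple are all assumptions. The six-day count is also an assumption; it is simply the number of days for which 10,000 cells and 96 frames per day give 5.76 million pixels.

```python
from datetime import datetime

GRID_SIDE = 100        # 100x100 grid (from the slide)
CELL_METERS = 250      # 250x250 m cells (from the slide)
FRAME_MINUTES = 15     # frame duration (from the slide)

FRAMES_PER_DAY = 24 * 60 // FRAME_MINUTES   # 96 frames in a day
CELLS = GRID_SIDE * GRID_SIDE               # 10,000 cells
PIXELS_PER_DAY = CELLS * FRAMES_PER_DAY     # one pixel per cell and frame


def cell_id(x_m, y_m):
    """Map an offset in metres from the (assumed) south-west grid corner to a cell id."""
    col = int(x_m // CELL_METERS)
    row = int(y_m // CELL_METERS)
    if not (0 <= col < GRID_SIDE and 0 <= row < GRID_SIDE):
        raise ValueError("point outside the grid")
    return row * GRID_SIDE + col


def frame_of_day(ts):
    """Index of the 15-minute frame that contains the timestamp."""
    return (ts.hour * 60 + ts.minute) // FRAME_MINUTES


def pixel(x_m, y_m, ts):
    """A pixel is a cell observed during one frame (hypothetical tuple encoding)."""
    return (cell_id(x_m, y_m), ts.date(), frame_of_day(ts))


if __name__ == "__main__":
    print(PIXELS_PER_DAY * 6)  # pixels over an assumed six-day event
    print(pixel(260, 510, datetime(2013, 4, 13, 13, 20)))
```

Keeping the date separate from the frame index makes it easy to group the same frame of the day across many days, which is what a per-frame statistical model needs.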

Page 42: Listening to the pulse of our cities with Stream Reasoning (and few more technologies)

Processing Telecom Italia Call Data Records

1.92 Mln Gaussian models were built
• one for each pixel (i.e., for each frame and cell)
• grouping the frames by working and week-end days
• using two months of Call Data Records, and
• verifying that the volume of CDR has a Gaussian distribution with an Anderson-Darling test at a significance level of 0.05

Built on Pig, R and Cascalog. The processing on 7 m1.large EC2 machines took 24 hours.

[figure: histogram and Q-Q plot for a bad case and for a good case]
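As a rough illustration of the model-building step, the sketch below fits one Gaussian per cell, frame of the day and day type from historical volumes. It is a single-machine toy, not the Pig/R/Cascalog pipeline named on the slide; the tuple layout, the function names and the rule of skipping groups with fewer than two observations are assumptions, and the Anderson-Darling check is left out (in Python, `scipy.stats.anderson` could play that role). The count of 1.92 million models matches 10,000 cells times 96 frames times 2 day types.

```python
from collections import defaultdict
from datetime import date
from statistics import NormalDist, mean, stdev

EXPECTED_MODELS = 10_000 * 96 * 2   # cells x frames per day x {working, weekend}


def day_type(day):
    """Group days as on the slide: working days vs. week-end days."""
    return "weekend" if day.weekday() >= 5 else "working"


def fit_models(observations):
    """Fit a Gaussian per (cell, frame, day type).

    observations: iterable of (cell, day, frame, volume) tuples (hypothetical layout).
    Groups with fewer than two observations are skipped because the
    sample standard deviation is undefined for them.
    """
    groups = defaultdict(list)
    for cell, day, frame, volume in observations:
        groups[(cell, frame, day_type(day))].append(volume)
    return {
        key: NormalDist(mean(values), stdev(values))
        for key, values in groups.items()
        if len(values) >= 2
    }


if __name__ == "__main__":
    history = [
        (0, date(2013, 3, 4), 52, 10),
        (0, date(2013, 3, 5), 52, 12),
        (0, date(2013, 3, 6), 52, 14),
    ]
    print(fit_models(history))
```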

Page 43: Listening to the pulse of our cities with Stream Reasoning (and few more technologies)

Processing Telecom Italia Call Data Records

Volume of CDR captured in Milan during the Design Week

What                     2013          2014
Calls                    16,743,875    19,719,629
SMSs                     19,454,497    20,240,485
Internet data accesses   137,381,761   197,767,245

Calls, SMS and Internet accesses were aggregated (with privacy-preserving methods) and an anomaly index was computed for each of the 1.92 Mln pixel/day

The processing of 1 day on 7 m1.large EC2 took 20 mins

[image: https://cerijayne.files.wordpress.com/2011/10/outliersss.png]

Page 44: Listening to the pulse of our cities with Stream Reasoning (and few more technologies)

Do CDR-anomalous pixels relate to events?

CDR-anomalous pixels = pixels in which the anomaly index is high (> +2σ or < -2σ)

To test if the anomalous pixels were related to the events of the Milan Design Week
• We used three ground truths
  – the pixels of Milan
  – the pixels of Brera district
  – the pixels of Tortona district
  where there was at least one event of Milan Design Week 2013
• We computed
  – Precision
  – Recall
  of the anomalous pixels in finding the pixels in those three ground truths
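A minimal sketch of this evaluation, under the assumption that the anomaly index is the standardized deviation of the observed volume from the per-pixel Gaussian model (the slides do not spell out the formula). The function names and the choice to return 0.0 for empty sets are illustrative.

```python
def anomaly_index(volume, mu, sigma):
    """Deviation of the observed volume from the model mean, in units of sigma (assumed definition)."""
    return (volume - mu) / sigma


def is_anomalous(volume, mu, sigma, k=2.0):
    """A pixel is anomalous when its index is above +k sigma or below -k sigma."""
    return abs(anomaly_index(volume, mu, sigma)) > k


def precision_recall(detected, ground_truth):
    """Precision and recall of the detected pixels against the pixels hosting events."""
    detected, ground_truth = set(detected), set(ground_truth)
    true_positives = len(detected & ground_truth)
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall


if __name__ == "__main__":
    print(is_anomalous(20, mu=12, sigma=2))               # index is 4
    print(precision_recall({1, 2, 3, 4}, {3, 4, 5, 6}))   # two pixels in common
```

Computing the two measures once per frame yields the time series that a precision or recall chart would plot.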

Page 45: Listening to the pulse of our cities with Stream Reasoning (and few more technologies)

Do CDR-anomalous pixels relate to events?

[chart: precision of the CDR-anomalous pixels over time; panels: Milan, Brera, Tortona; x-axis from 09 04:00 to 14 22:00 in 6-hour steps (Tuesday to Sunday); y-axis from 0 to 1]

Page 46: Listening to the pulse of our cities with Stream Reasoning (and few more technologies)

Do CDR-anomalous pixels relate to events?

[chart: recall of the CDR-anomalous pixels over time; panels: Milan, Brera, Tortona; x-axis from 09 04:00 to 14 22:00 in 6-hour steps (Tuesday to Sunday); y-axis from 0 to 1]

Page 47: Listening to the pulse of our cities with Stream Reasoning (and few more technologies)

Do CDR-anomalous pixels relate to events?

[chart: precision and recall of the CDR-anomalous pixels over time; panels: Milan, Brera, Tortona; x-axis from 09 04:00 to 14 22:00 in 6-hour steps (Tuesday to Sunday); y-axis from 0 to 1]

Lesson learnt
• High precision
• Low recall at city scale
• High recall in Brera and Tortona

Page 48: Listening to the pulse of our cities with Stream Reasoning (and few more technologies)

Processing Social Streams

The machinery: the Streaming Linked Data framework

[architecture diagram labels: Data Source, Streaming Linked Data Server, HTML5 Browser; Stream, Adapter, Stream Bus, Analyser, Decorator, Publisher, Visualizer; HTTP]

M. Balduini, E. Della Valle, D. Dell'Aglio, M. Tsytsarau, T. Palpanas, and C. Confalonieri: Social Listening of City Scale Events Using the Streaming Linked Data Framework. International Semantic Web Conference (2) 2013: 1-16

Page 49: Listening to the pulse of our cities with Stream Reasoning (and few more technologies)

Processing Social Streams

Example micro-post: "Happily inside a bottle of Heineken beer @ the Heineken Magazzini #heinekendesignweek"

[knowledge graph diagram; nodes: Event "Milan Design Week", Event "Heineken Design Week", Location "The Magazzini", W Company "Heineken", W Drink "beer"; edge labels: hosts, has location, organized by, produces]

Wide as Wikipedia, as deep as you like

M. Balduini, A. Bozzon, E. Della Valle, Y. Huang, G-J Houben: Recommending Venues Using Continuous Predictive Social Media Analytics. IEEE Internet Computing 18(5): 28-35 (2014)

Page 50: Listening to the pulse of our cities with Stream Reasoning (and few more technologies)

Processing Social Streams

Predictive models were built
• for hashtags and semantic entities systematically present
• using a Holt-Winters method
• grouping the frames by
  – working and week-end days and
  – early morning, morning, afternoon, evening, and late night
• analysing 300,000 geo-located micro-posts collected over 6 months in Milano area (November 2013 to April 2014)
• it takes a few seconds per hashtag/semantic entity on a 60€/month VM in an IaaS

[chart legend: Data, Fitted, Forecast, Lower 2.5%, Upper 97.5%]
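To make the forecasting step concrete, here is a textbook additive Holt-Winters (triple exponential smoothing) sketch. It is not the authors' implementation: the slides name only the method, so the additive variant, the smoothing parameters, the initialization from the first two seasons and the function name are all assumptions. The forecast plays the role of the expected, systematic usage of a hashtag; subtracting it from the observed usage leaves the part that is out of the ordinary.

```python
def holt_winters_additive(series, season_len, alpha=0.3, beta=0.1, gamma=0.2, horizon=1):
    """Forecast `horizon` steps ahead with additive Holt-Winters smoothing."""
    m = season_len
    if len(series) < 2 * m:
        raise ValueError("need at least two full seasons")

    first, second = series[:m], series[m:2 * m]
    level = sum(first) / m                          # mean of the first season
    trend = (sum(second) - sum(first)) / (m * m)    # average per-step change
    seasonal = [y - level for y in first]           # initial seasonal offsets

    # Update level, trend and seasonal offsets with each later observation.
    for t in range(m, len(series)):
        y = series[t]
        s = seasonal[t % m]
        prev_level = level
        level = alpha * (y - s) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonal[t % m] = gamma * (y - level) + (1 - gamma) * s

    n = len(series)
    return [level + h * trend + seasonal[(n + h - 1) % m] for h in range(1, horizon + 1)]


if __name__ == "__main__":
    usage = [5, 9, 2, 4] * 3   # a toy, perfectly periodic hashtag count
    forecast = holt_winters_additive(usage, season_len=4, horizon=4)
    observed = [5, 30, 2, 4]   # a burst in the second slot
    print([round(o - f, 6) for o, f in zip(observed, forecast)])
```

With the frame grouping described on the slide, one season could correspond to the five day parts, with separate series for working and week-end days; that mapping is also an assumption.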

Page 51: Listening to the pulse of our cities with Stream Reasoning (and few more technologies)

Processing Social Streams

[chart: usage of #milan in the weeks around Milan Design Week; rows are the time slots 200 – 700, 700 – 1100, 1100 – 1400, 1400 – 1900, 1900 – 200; columns alternate working days (WD) and week-ends (WE); Milan Design Week is marked]

[chart: the same layout after subtracting the predicted usage of #milan]

Page 52: Listening to the pulse of our cities with Stream Reasoning (and few more technologies)

Processing Social Streams

The difference between the observed and the predicted usage of #milan perfectly fits the usage of #mdw (the official hashtag of Milan Design Week)

[charts: anomalous usage of #milan and usage of #mdw; rows are the time slots 200 – 700, 700 – 1100, 1100 – 1400, 1400 – 1900, 1900 – 200; columns alternate working days (WD) and week-ends (WE); Milan Design Week is marked]

Page 53: Listening to the pulse of our cities with Stream Reasoning (and few more technologies)

Processing Social Streams

Geo-referenced micro-posts were captured, semantically annotated, cleansed using the predictive models and analyzed in Milan area

What                                  2013     2014
Geo-located micro-posts               57,154   21,782
Linked to Milano Design Week          3,569    3,499
Linked to a specific location/event   761      547

For each pixel with at least 1 micro-post we computed
• the volume related to Milano Design Week
• the top-10 hashtags
• the top-3 locations/events

Real-time processing was possible with our in-memory C-SPARQL engine and the Streaming Linked Data framework on a 20€/month VM in an IaaS
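The per-pixel summary can be sketched in plain Python as below. This is a batch toy, not the C-SPARQL continuous queries used in CitySensing; the input layout, the function name and, in particular, the shortcut of treating a post as related to the event when it carries one of a given set of hashtags (instead of using semantic annotation) are assumptions.

```python
from collections import Counter, defaultdict


def summarize_pixels(posts, event_tags, top_n=10):
    """Summarize micro-posts per pixel.

    posts: iterable of (pixel, hashtags) pairs (hypothetical layout).
    event_tags: lower-case hashtags taken as a stand-in for "linked to the event".
    """
    event_volume = Counter()
    tag_counts = defaultdict(Counter)
    for pixel, hashtags in posts:
        tags = {h.lower() for h in hashtags}   # count each tag once per post
        tag_counts[pixel].update(tags)
        if tags & event_tags:
            event_volume[pixel] += 1
    return {
        pixel: {
            "event_volume": event_volume[pixel],
            "top_hashtags": [tag for tag, _ in counts.most_common(top_n)],
        }
        for pixel, counts in tag_counts.items()
    }


if __name__ == "__main__":
    sample = [
        ((201, 53), ["#mdw", "#Milan"]),
        ((201, 53), ["#milan"]),
        ((305, 53), ["#pizza"]),
    ]
    print(summarize_pixels(sample, event_tags={"#mdw"}))
```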

Page 54: Listening to the pulse of our cities with Stream Reasoning (and few more technologies)

Do socially active pixels relate to events?

socially active pixels = pixels in which we captured social media that talk about Milan Design Week

We computed
• precision
• recall
of the socially active pixels in finding the pixels in the three ground truths about Milan, Brera district and Tortona district

Page 55: Listening to the pulse of our cities with Stream Reasoning (and few more technologies)

Do socially active pixels relate to events?

[chart: precision of the socially active pixels over time; panels: Milan, Brera, Tortona; x-axis from 09 04:00 to 14 22:00 in 6-hour steps (Tuesday to Sunday); y-axis from 0 to 1]

Page 56: Listening to the pulse of our cities with Stream Reasoning (and few more technologies)

Do socially active pixels relate to events?

[chart: recall of the socially active pixels over time; panels: Milan, Brera, Tortona; x-axis from 09 04:00 to 14 22:00 in 6-hour steps (Tuesday to Sunday); y-axis from 0 to 1]

Page 57: Listening to the pulse of our cities with Stream Reasoning (and few more technologies)

Do socially active pixels relate to events?

[chart: precision and recall of the socially active pixels over time; panels: Milan, Brera, Tortona; x-axis from 09 04:00 to 14 22:00 in 6-hour steps (Tuesday to Sunday); y-axis from 0 to 1]

Page 58: Listening to the pulse of our cities with Stream Reasoning (and few more technologies)

Do socially active pixels relate to events?

[Figure: precision and recall over time (x-axis 09 04:00 to 14 22:00, Tuesday to Sunday; y-axis 0 to 1), one panel each for Milan, Brera and Tortona]

Lesson learnt
• High precision
• Acceptable recall in the districts
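The precision and recall on this slide can be read as set comparisons between the pixels flagged as socially active and the pixels where an event was actually taking place in a given time slot. The sketch below is a minimal illustration under that assumption; the pixel identifiers and the function name `precision_recall` are hypothetical and not taken from the talk.

```python
from fractions import Fraction


def precision_recall(detected, relevant):
    """Return (precision, recall) of `detected` against `relevant`.

    Both arguments are sets of pixel identifiers (hypothetical here).
    A ratio whose denominator would be zero is returned as None.
    """
    hits = len(detected & relevant)
    precision = Fraction(hits, len(detected)) if detected else None
    recall = Fraction(hits, len(relevant)) if relevant else None
    return precision, recall


# Hypothetical time slot: pixels flagged as socially active vs. pixels hosting events.
socially_active = {"p1", "p2", "p3"}
event_pixels = {"p1", "p2", "p3", "p4", "p5", "p6"}

precision, recall = precision_recall(socially_active, event_pixels)
print(f"precision={precision}, recall={recall}")  # precision=1, recall=1/2
```

In this made-up slot every flagged pixel hosts an event (high precision), while only half of the event pixels are flagged (lower recall), which is the shape of result the lesson describes.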

Page 59: Listening to the pulse of our cities with Stream Reasoning (and few more technologies)

Are CDR-anomalous and socially active pixels similar?

Which of the following four scenarios?

[Figure: four scenarios; columns: Anomalous, Socially active, Intersection, Similar?]

Page 60: Listening to the pulse of our cities with Stream Reasoning (and few more technologies)

Are CDR-anomalous and socially active pixels similar?

More formally
• Jaccard index: $J(A,B) = \frac{|A \cap B|}{|A \cup B|}$
• E.g., [Figure: two pairs of sets A and B, one with J(A,B) = 8/11 and one with J(A,B) = 3/11]
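The Jaccard index on this slide can be computed directly on sets of pixel identifiers. The sketch below is a minimal illustration; the sets are hypothetical and were chosen only so that they reproduce the two example values 8/11 and 3/11, and returning 1 for two empty sets is a convention assumed here.

```python
from fractions import Fraction


def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two sets.

    Two empty sets are treated as identical (index 1) by convention.
    """
    union = a | b
    if not union:
        return Fraction(1)
    return Fraction(len(a & b), len(union))


# Hypothetical pixel sets reproducing the two values shown on the slide.
high_a, high_b = set(range(0, 10)), set(range(2, 11))  # 8 shared, 11 in total
low_a, low_b = set(range(0, 7)), set(range(4, 11))     # 3 shared, 11 in total

print(jaccard(high_a, high_b))  # 8/11
print(jaccard(low_a, low_b))    # 3/11
```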

Page 61: Listening to the pulse of our cities with Stream Reasoning (and few more technologies)

Are CDR-anomalous and socially active pixels similar?

[Figure: recall CDR-anomalous, recall socially active and Jaccard over time (x-axis 09 04:00 to 14 22:00, Tuesday to Sunday; y-axis 0 to 1), one panel each for Brera and Tortona]

Lesson learnt
At district level, in the large majority of the cases the socially active pixels are also CDR-anomalous pixels
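The lesson is about containment rather than equality: most socially active pixels can be CDR-anomalous even when the Jaccard index is low, because the CDR-anomalous set may be much larger. A minimal sketch with hypothetical pixel sets (not data from the talk):

```python
from fractions import Fraction

# Hypothetical pixel identifiers for one district and one time slot.
cdr_anomalous = set(range(0, 20))     # 20 CDR-anomalous pixels
socially_active = {3, 7, 11, 15, 42}  # 5 socially active pixels, 4 of them anomalous

shared = cdr_anomalous & socially_active

# Share of socially active pixels that are also CDR-anomalous.
containment = Fraction(len(shared), len(socially_active))
# Jaccard index of the two sets, for comparison.
jaccard = Fraction(len(shared), len(cdr_anomalous | socially_active))

print(f"containment={containment}, jaccard={jaccard}")  # containment=4/5, jaccard=4/21
```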

Page 62: Listening to the pulse of our cities with Stream Reasoning (and few more technologies)

Visualizing for a casual audience

Page 63: Listening to the pulse of our cities with Stream Reasoning (and few more technologies)

See it in action!

http://youtu.be/MOBie09NHxM

Page 64: Listening to the pulse of our cities with Stream Reasoning (and few more technologies)

Evaluation methodology for casual audience

Guessability study
• Can you guess what I mean without any explanation?

E.g.
• Dinosaur extinction
• "The Shining" by Stephen King

Page 65: Listening to the pulse of our cities with Stream Reasoning (and few more technologies)

Evaluation of interface guessability

Page 66: Listening to the pulse of our cities with Stream Reasoning (and few more technologies)

The patterns you should have got

The CDR-anomaly and the social activity are
• Correlated
• Partially correlated
• Not correlated

Page 67: Listening to the pulse of our cities with Stream Reasoning (and few more technologies)

Evaluation of interface guessability

Q: In Brera District the volume of social media signal is partially correlated with the value of mobile anomaly signal
A: [Bar chart: answers FALSE, UNCERTAIN, TRUE; axis 0 to 1]

Page 69: Listening to the pulse of our cities with Stream Reasoning (and few more technologies)

Evaluation of interface guessability

Q: In Porta Romana the volume of social media signal is strongly correlated with the value of mobile anomaly signal
A: [Bar chart: answers FALSE, UNCERTAIN, TRUE; axis 0 to 1]

Page 71: Listening to the pulse of our cities with Stream Reasoning (and few more technologies)

Evaluation of interface guessability

Q: In Tortona District the volume of social media signal is strongly correlated with the value of mobile anomaly signal
A: [Bar chart: answers FALSE, UNCERTAIN, TRUE; axis 0 to 1]

Page 73: Listening to the pulse of our cities with Stream Reasoning (and few more technologies)

Back to the research question

Can we collect, analyse and repurpose
• social media captured at places and events and
• privacy-preserving aggregates of Call Data Records
to allow visually
• perceiving emerging patterns and
• observing their dynamics?

Yes! At least, in Milano Design Week 2013 and 2014

[photo: https://www.flickr.com/photos/debord/4932655275]
[photo: https://flic.kr/p/beuDaX ]

Page 74: Listening to the pulse of our cities with Stream Reasoning (and few more technologies)

… and I was crazy enough to start up a company …

http://www.socialometers.com

Page 75: Listening to the pulse of our cities with Stream Reasoning (and few more technologies)

Lesson Learnt for Stream Reasoning

The technical barriers are high
The theoretical foundations are incomplete
The veracity problem is sort of forgotten

Page 76: Listening to the pulse of our cities with Stream Reasoning (and few more technologies)

High Technical Barriers for Stream Reasoning

We are getting close to a shared understanding on RDF Stream Processing (RDF streams and a continuous extension of SPARQL)
• See http://www.w3.org/community/rsp/
Missing infrastructure
• Only one proposal for RDF stream publishing
  – http://streamreasoning.github.io/TripleWave/
• Only one proposal for RDF Stream Processing APIs
  – http://streamreasoning.org/resources/rsp-services
Only prototypes, some unmaintained
Need for scalable systems built on Big Data technologies (e.g., Spark/Flink)
Lack of systematic and comparative evaluation
• too many benchmarks, all focusing on RDF stream processing with little emphasis on reasoning

Page 77: Listening to the pulse of our cities with Stream Reasoning (and few more technologies)

Incomplete Stream Reasoning theory

Two reference models exist
• RSP-QL: Built on SPARQL semantics
  – D. Dell'Aglio, E. Della Valle, J-P Calbimonte, Ó. Corcho: RSP-QL Semantics: A Unifying Query Model to Explain Heterogeneity of RDF Stream Processing Systems. Int. J. Semantic Web Inf. Syst. 10(4): 17-44 (2014)
• LARS: Built on datalog-style rules
  – H. Beck, M. Dao-Tran, T. Eiter, M. Fink: LARS: A Logic-Based Framework for Analyzing Reasoning over Streams. AAAI 2015: 1431-1438
However
• What's the complexity of Q/A in RSP-QL/LARS?
• How to deal with inconsistency appearing over time?
• How do stream reasoning and event calculus relate?
OBDA on static data ≠ OBDA for continuous querying
• static data: ans = data + query
• continuous querying: Ans(t) = sys(t) + data(t) + query
What about inductive stream reasoning?
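One way to read Ans(t) = sys(t) + data(t) + query is that, in continuous querying, the answer also depends on operational choices of the engine, such as the instant at which its windows start. The sketch below only illustrates that reading and does not implement RSP-QL or LARS semantics: it counts items per tumbling window over the same hypothetical stream, and the parameter `t0` (the instant at which the system opens its first window) is an assumption introduced here.

```python
from collections import Counter


def tumbling_counts(stream, width, t0):
    """Count stream items per tumbling window of `width` time units.

    `stream` is an iterable of (timestamp, item) pairs; windows are
    [t0 + k*width, t0 + (k+1)*width). Items before t0 are ignored.
    Returns a dict mapping each window's start time to its item count.
    """
    counts = Counter()
    for timestamp, _item in stream:
        if timestamp < t0:
            continue
        window_start = t0 + ((timestamp - t0) // width) * width
        counts[window_start] += 1
    return dict(sorted(counts.items()))


# Same hypothetical data and same query (count per 10-unit window) ...
stream = [(1, "a"), (4, "b"), (9, "c"), (12, "d"), (18, "e"), (21, "f")]

# ... but two systems that opened their first window at different instants.
print(tumbling_counts(stream, width=10, t0=0))  # {0: 3, 10: 2, 20: 1}
print(tumbling_counts(stream, width=10, t0=5))  # {5: 2, 15: 2}
```

With identical data and query, the two runs report different answers, which is why the system term appears in the continuous case and not in the static one.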

Page 78: Listening to the pulse of our cities with Stream Reasoning (and few more technologies)

The veracity problem is sort of forgotten

Some initial works
• D.F. Barbieri, D. Braga, S. Ceri, E. Della Valle, Y. Huang, V. Tresp, A. Rettinger, H. Wermser: Deductive and Inductive Stream Reasoning for Semantic Social Media Analytics. IEEE Intelligent Systems 25(6): 32-41 (2010)
• M. Nickles, A. Mileo: Web Stream Reasoning Using Probabilistic Answer Set Programming. RR 2014: 197-205
• A-Y Turhan, E. Zenker: Towards Temporal Fuzzy Query Answering on Stream-based Data. HiDeSt@KI 2015: 56-69
Missing Theory?

Page 79: Listening to the pulse of our cities with Stream Reasoning (and few more technologies)

Take home message … guess it :-)

Page 80: Listening to the pulse of our cities with Stream Reasoning (and few more technologies)

Take home message … guess it :-)

Thank you!
Any questions?

Emanuele Della Valle
@manudellavalle
[email protected]
http://emanueledellavalle.org

Page 81: Listening to the pulse of our cities with Stream Reasoning (and few more technologies)

Listening to the pulse of our cities with Stream Reasoning (and few more technologies)
Emanuele Della Valle
@manudellavalle - [email protected] - http://emanueledellavalle.org