On the Quest for Changing Knowledge. Capturing Emerging Entities from Social Media. WebScience 2016...

Post on 23-Feb-2017

Category:

Social Media

TRANSCRIPT

On the Quest for Changing Knowledge

Marco Brambilla, Stefano Ceri, Florian Daniel, Emanuele Della Valle

@marcobrambi

Data-driven innovation

and

Innovation-driven data

Innovation requires

Precise

To the point

Up-to-date

Domain-specific

information

There are more things in heaven and earth, Horatio, Than are dreamt of in your philosophy.

Shakespeare (Hamlet Act 1, scene 5)

From Data to Wisdom

Formalizing new knowledge is hard

Only high frequency emerges

The long tail challenge

Knowledge Extraction

Text mining

Semantic Web

Search and recommendation systems

No specific care for emerging knowledge

Heaven and Heart

How to peer through an effective window on the real world?

Social media, our blessing and curse

Domain experts matter

Can we use social networks to discover emerging knowledge?

Beware the streetlamp effect

The bias of the source

The bias of the observer

Famous Emerging

Evolving Knowledge

Overview

Knowledge Enrichment Setting

Emerging Knowledge Harvesting

Domain Types

Types selected by the experts

Relevant for the domain

Seed characterization

Selected by the expert

Belonging to an expert type

Thoroughly described

# @ a w

Social Media Sourcing

Content coming from the seeds’ accounts

Candidate Selection

Potentially any entity extracted from the social streams

Resulting in huge sets of candidates

# @ a w ♥

Candidate Typing

Candidate Pruning

Initial pruning of candidates based on TF-DF (*):

TF-DF := tf * df / (N - df + 1)

(*) A variant of TF-IDF that does not discount document frequency, because here frequent appearance is welcome (we are not looking for information entropy!)
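The TF-DF score above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. The slides do not define the symbols, so the sketch assumes that tf is the total number of occurrences of a candidate across all collected posts, df is the number of posts that mention it, and N is the total number of posts. The function names `tf_df_scores` and `prune` and the `keep` parameter are hypothetical.

```python
from collections import Counter


def tf_df_scores(documents):
    """Score each candidate with TF-DF := tf * df / (N - df + 1).

    `documents` is a list of posts, each given as a list of candidate
    entity strings (assumed input shape, not specified in the slides).
    """
    n_docs = len(documents)
    tf = Counter()  # total occurrences of each candidate (assumed meaning of tf)
    df = Counter()  # number of posts containing each candidate (assumed meaning of df)
    for doc in documents:
        tf.update(doc)
        df.update(set(doc))
    return {
        term: tf[term] * df[term] / (n_docs - df[term] + 1)
        for term in tf
    }


def prune(documents, keep):
    """Keep the `keep` highest-scoring candidates (ties broken alphabetically)."""
    scores = tf_df_scores(documents)
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:keep]]
```

Unlike TF-IDF, the score grows with document frequency, so candidates that appear in many posts survive the pruning step.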

Candidate Ranking

Candidate Vector Space

Purely syntactic

Semantic:

Based on entity extraction / DBpedia

Based on deep learning on images / ClarifAI
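One way to picture ranking in a candidate vector space is sketched below. The slides do not say how seeds and candidates are compared, so this sketch makes loud assumptions: every seed and candidate is already a numeric feature vector of equal length, the seeds are summarized by their centroid, and cosine distance is used (the experiments mention three distances without naming them). All function names are hypothetical.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def rank_candidates(seed_vectors, candidate_vectors):
    """Order candidate names by increasing distance from the seed centroid.

    `candidate_vectors` maps a candidate name to its feature vector.
    Using the centroid is an assumption made for this sketch.
    """
    center = centroid(seed_vectors)
    return sorted(
        candidate_vectors,
        key=lambda name: cosine_distance(candidate_vectors[name], center),
    )
```

Candidates closest to the seeds come first, so the top of the list is what a precision-at-k evaluation would inspect.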

Example Analysis

Experiments

Fashion brands

Writers

Painters

Exhibitions

4,400 strategies evaluated

44 alternative feature vectors (12 basic features and 32 aggregations)

9 different weighting values for aggregations

5 levels of recall for entity extraction

3 different distances

Pruning Phase

From 4,400 down to 10 strategies

Eliminating the less relevant parameters

Italian Fashion Brands

Precision@5 = 0.2

Increasing # seeds reduces precision

Australian Writers – 22 seeds

Precision@5 = 0.8

Innovative Painters – 21 seeds

Precision@5 = 0.6

Twitter vs. Instagram: P@5 = 1.0 vs. P@5 = 0.8

Fashion: Twitter + Instagram

Writers: Twitter + Instagram

Prec. = 1
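The Precision@5 figures reported above follow the standard precision-at-k definition: the fraction of the top k ranked candidates that are judged relevant. A minimal sketch, assuming the ranking is a list of candidate names and relevance judgments are given as a set (the function name and the sample names in the usage note are hypothetical, not data from the experiments):

```python
def precision_at_k(ranked, relevant, k=5):
    """Fraction of the first k ranked candidates that are relevant.

    Divides by k even if fewer than k candidates were returned,
    which is the usual convention for precision at k.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for candidate in ranked[:k] if candidate in relevant)
    return hits / k
```

For instance, `precision_at_k(["a", "b", "c", "d", "e"], {"a", "c", "d", "e"})` yields 0.8, since four of the top five made-up candidates are marked relevant.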

Conclusion

It’s about time to build innovation based on data

and build knowledge based on innovation

Harvesting can be iterative

On the Quest for Changing Knowledge

Contact us

Marco Brambilla, @marcobrambi, marco.brambilla@polimi.it

http://datascience.deib.polimi.it
