Educating Data Scientists: the SoBigData master experience

Social Mining & Big Data Ecosystem Educating Data Scientists: the SoBigData master experience www.sobigdata.eu Fosca Giannotti, Valerio Grossi ISTI-CNR Pisa H2020-INFRAIA-2014-2015 Grant Agreement N. 654024

Upload: research-data-alliance

Post on 22-Jan-2018


TRANSCRIPT

Page 1: Educating Data Scientists: the SoBigData master experience

Social Mining & Big Data Ecosystem

Educating Data Scientists:

the SoBigData master experience

www.sobigdata.eu

Fosca Giannotti, Valerio Grossi

ISTI-CNR Pisa

H2020-INFRAIA-2014-2015 Grant Agreement N. 654024

Page 2: Educating Data Scientists: the SoBigData master experience

Modern science is data-intensive, multidisciplinary, collaborative and global

– efficiency of data management (NoSQL paradigms and cloud computing play an important role here) and curation, search, sharing, transfer.

– managing the complexity of the analytical process is a key issue (scalable distributed analytical methods and Visual Analytics are crucial here).

Firenze, 14 Nov 2016

Page 3: Educating Data Scientists: the SoBigData master experience

Big Data Analytics process

[Figure: Data (demographic data, geographic data, movement data, transport data), Models (T-Clustering, T-Patterns), Forecasts, Validation]

Page 4: Educating Data Scientists: the SoBigData master experience

Interdisciplinary and collaborative

• for sharing data/models/processes and results of experiments (different levels of interoperability and semantic enrichment)

• to realize experiments by combining resources (data, methods and results) belonging to different communities.

– This calls for tools that facilitate the governance of complex analytical processes in a workflow style, or mega-modeling.

– This also calls for sophisticated search that supports resource discovery.

Page 5: Educating Data Scientists: the SoBigData master experience

Data scientist

A new kind of professional has emerged, the data scientist, who combines the skills of software programmer, statistician and storyteller/artist to extract the nuggets of gold hidden under mountains of data.

Page 6: Educating Data Scientists: the SoBigData master experience

Four core points of a data scientist

• Data Procurement and Curation

• Making sense of Data

• Story-telling

• Answering, step by step, for technical correctness and for legal and ethical issues

Page 7: Educating Data Scientists: the SoBigData master experience

SoBigData is…

A Multidisciplinary European Infrastructure for Big Data and Social Data Mining providing an integrated ecosystem for ethically sensitive scientific discoveries and advanced applications of social data mining on the various dimensions of social life, as recorded by “big data”.

Page 8: Educating Data Scientists: the SoBigData master experience

Social Mining - Answer to:

• Who will win the US elections? What are the electors’ current voting intentions? How reliable are they?

• What are the indicators of social well-being (beyond GDP) and how can they be computed and monitored?

• How effectively is the aging population helped by social participation in digital community services?

• What is the link between media ownership and media content? Is there bias in news reporting? And in content reviews?

• Is an infectious disease emerging? What is its diffusion model?

Page 9: Educating Data Scientists: the SoBigData master experience

Page 10: Educating Data Scientists: the SoBigData master experience

Estimating traffic fluxes on the road network with mobile phone data

[Figure with labels A, B, C, HW]

Page 11: Educating Data Scientists: the SoBigData master experience

Predicting Success

“Football is a simple game: 22 men chase a ball for 90 minutes and at the end, the Germans always win” -- Gary Lineker (after Italy 1990 Final)

Page 12: Educating Data Scientists: the SoBigData master experience

Managing Data: what does this mean?

Managing data does not mean only:

• Support discovery

• Provide access

• Verify the quality of data

• Clean errors, outliers, anomalies

• Transform data into a format suitable for specific data analytical tools

It must include support for

• legal interoperability

– copyright management

– licensing of single and derivative products

– terms of use

• fine-grained policies

– attribution

– citation policy

– provenance management

• ethics issues

Page 13: Educating Data Scientists: the SoBigData master experience

Metadata in the SoBigData RI experience

• Huge datasets often describe human activities, which implies privacy and ethical issues

• As a Research Infrastructure, FAIRness is one of our main targets

– The success of the RI is directly connected to the fact that datasets are Findable, Accessible, Interoperable and Reusable

– The intellectual property has to be considered

– The design of a highly structured metadata schema allows the RI to automatically grant or deny access to a dataset, to force the acceptance of terms of use or signing NDAs…
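As a rough illustration of how metadata could drive an automatic grant/deny decision, the sketch below checks a request against a dataset's access metadata. The field names (access, requires_terms_of_use, requires_nda) and the three access levels are assumptions made for this example; the slides do not show the actual SoBigData schema.

```python
from dataclasses import dataclass
from typing import List, Tuple


# Hypothetical access metadata; NOT the real SoBigData schema.
@dataclass(frozen=True)
class AccessMetadata:
    access: str                  # assumed levels: "open", "conditional", "closed"
    requires_terms_of_use: bool  # user must accept the terms of use
    requires_nda: bool           # user must have signed an NDA


@dataclass(frozen=True)
class Request:
    accepted_terms: bool
    signed_nda: bool


def decide(meta: AccessMetadata, req: Request) -> Tuple[bool, List[str]]:
    """Return (granted, missing_steps) for a request on one dataset."""
    if meta.access == "open":
        return True, []
    if meta.access == "closed":
        return False, ["dataset is not released"]
    # "conditional": grant only when every required step has been completed.
    missing = []
    if meta.requires_terms_of_use and not req.accepted_terms:
        missing.append("accept terms of use")
    if meta.requires_nda and not req.signed_nda:
        missing.append("sign NDA")
    return (not missing), missing


if __name__ == "__main__":
    meta = AccessMetadata("conditional", requires_terms_of_use=True, requires_nda=True)
    print(decide(meta, Request(accepted_terms=True, signed_nda=False)))
```

Returning the list of missing steps, rather than a bare yes/no, is one way to let the infrastructure prompt the user for the terms of use or the NDA instead of simply refusing.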

Page 14: Educating Data Scientists: the SoBigData master experience

SoBigData metadata structure

• A highly structured and detailed metadata structure has been designed in order to provide information about:

– Description of the dataset (to make it Findable)

– How the dataset has been produced

– Intellectual Property

– Privacy issues

– Who can access the data and how (terms of use, NDA…)

• Mainly based on the DataCite standard
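A minimal sketch of what a completeness check over such a record might look like. The five groups follow the list on this slide; the field names inside each group, and the sample record, are made up for illustration and are not taken from the SoBigData or DataCite schemas.

```python
from typing import Dict, List

# Groups follow the slide; field names within each group are assumptions.
REQUIRED: Dict[str, List[str]] = {
    "description": ["title", "creators", "publication_year"],
    "provenance": ["how_produced"],
    "intellectual_property": ["license"],
    "privacy": ["contains_personal_data"],
    "access": ["terms_of_use"],
}


def missing_fields(record: Dict[str, dict]) -> List[str]:
    """List 'group.field' entries that are absent or empty in the record."""
    missing = []
    for group, fields in REQUIRED.items():
        section = record.get(group, {})
        for name in fields:
            # False is a valid value (e.g. no personal data), so only
            # None, empty strings and empty lists count as missing.
            if section.get(name) in (None, "", []):
                missing.append(f"{group}.{name}")
    return missing


if __name__ == "__main__":
    # Invented example record with the licence left out.
    record = {
        "description": {"title": "Example dataset",
                        "creators": ["A. Researcher"],
                        "publication_year": 2016},
        "provenance": {"how_produced": "collected from public posts"},
        "intellectual_property": {},
        "privacy": {"contains_personal_data": True},
        "access": {"terms_of_use": "research only"},
    }
    print(missing_fields(record))
```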

Page 15: Educating Data Scientists: the SoBigData master experience

The ethics of SoBigData

• Gathering large quantities of data has serious consequences that SoBigData is trying to address. These consequences range from personal harm to issues of autonomy, injustice and inequality.

• In order to deal with these problems, SoBigData adheres to a value-sensitive design approach. This approach consists in using design solutions to overcome ethical dilemmas, in this case the tension between the utility of the data gathered and the protection of the individuals subject to the research.

• To realize the ideals of SoBigData, scientific methods also need to be developed in order to embed moral principles in practice.

Page 16: Educating Data Scientists: the SoBigData master experience

Ethics: the challenge for SoBigData

• How do we create an infrastructure in which such data and methods can be disseminated and improved upon?

1. A Massive Open Online Course (MOOC) which instructs all prospective researchers about the legal and ethical dangers of big data research and the steps they can take to minimise these;

2. A set of workflows that outline the steps researchers can take when designing their approach;

3. Information pop-ups which redirect researchers to state-of-the-art ethical methods.

Page 17: Educating Data Scientists: the SoBigData master experience

Metadata definition: Ethics

Page 18: Educating Data Scientists: the SoBigData master experience

Metadata definition: Intellectual Property

Page 19: Educating Data Scientists: the SoBigData master experience

Master in Big Data Analytics & Social Mining

http://www.sobigdata.eu/master/bigdata

Page 20: Educating Data Scientists: the SoBigData master experience

Page 21: Educating Data Scientists: the SoBigData master experience

Education

• Big Data Sensing

• Big Data Mining

• Big Data Story Telling

• Big Data Technology

• Big Data for Social Good

• Big Data Ethics

Page 22: Educating Data Scientists: the SoBigData master experience

Students: their studies

[Chart: series 2015 and 2016; axis from 0 to 8]

Page 23: Educating Data Scientists: the SoBigData master experience

Gender distribution

[Chart: M and F for 2014-2015 and 2015-2016; axis from 0 to 25]

Page 24: Educating Data Scientists: the SoBigData master experience
