
Post on 02-Oct-2020


TRANSCRIPT

• Automatic language translation
• Syrian civil war
• Unemployment
• Happiness and alcohol
• The Revolution


• 1992-1995
• Given the proceedings of the Canadian parliament, 3 million sentences carefully translated into French and English, the Candide system automatically learns how English and French are related.

• Worked well, but never became popular • And it could not be improved further!

– Takes every translation it can find on the web.
– A trillion words, 95 billion English sentences
– Very unevenly translated!
– Does not apply any grammatical rules, no models, only statistical analysis.
– It works way better than anything else.
– Not because of better quality of the data. Just because of size.
– It got the size because it accepted bad, messy data, not made for this purpose.

Trading quality for quantity

– It improves all the time.

Do we need models, when we have lots of data?

It is not always enough to crunch data!

MODEL-BASED STATISTICS


Syrian Civil War

Significance, April 2015 Megan Price, Anita Gohdes and Patrick Ball

• policy and military decisions • resource allocation • war crimes tribunals

4 groups produce 4 lists of people killed in Syria:

We can match the lists and compare reports.

IDEA: Comparing the size of the overlaps
• If most of the cases on the lists overlap, the real number of deaths is not much larger than the number of cases listed.
• If the overlap is small, the number of deaths is larger than the union of reports.

N = true unknown number of deaths. Yellow list has A individuals, M of those are also in the blue list, which has in total B individuals.

The probability of being in a random list of size A from a population of size N is $\frac{A}{N}$.

The probability of being in a list of size B is $\frac{B}{N}$.

The probability of being in a list of size M is $\frac{M}{N}$.

If two organisations work independently, the probability of being in both yellow and blue list is the product of the individual probabilities: $\frac{A}{N} \cdot \frac{B}{N}$.

But "to be both in A and B" is the same as M, so it must be:

$\frac{A}{N} \cdot \frac{B}{N} = \frac{M}{N}$,

and therefore we estimate

$N = \frac{A \cdot B}{M}$.

[Figure: population N with the lists A and B and their overlap M]
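The two-list estimate N = A·B/M can be sketched in a few lines of Python. This is only an illustration of the formula on this slide: the function name `two_list_estimate` and the counts used are hypothetical, and the Syrian study works with four lists, which calls for more than this two-list case.

```python
def two_list_estimate(a: int, b: int, m: int) -> float:
    """Estimate the population size N from two lists as N = A * B / M.

    Assumes the two lists were compiled independently of each other.
    """
    if m <= 0:
        raise ValueError("the lists must share at least one individual (M > 0)")
    if m > min(a, b):
        raise ValueError("the overlap M cannot exceed either list size")
    return a * b / m


# Hypothetical counts, chosen only to illustrate the arithmetic.
yellow, blue, overlap = 600, 500, 200

documented = yellow + blue - overlap                 # union of the two lists
estimated = two_list_estimate(yellow, blue, overlap)

print(documented, estimated)
```

With these made-up counts the two lists document 900 distinct individuals, while the estimate is 1500. A smaller overlap with the same list sizes would push the estimate further above the documented count.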

reporting groups

• Documented data suggest that deaths slightly decreased from one month to the next, while the estimates indicate that this is not true.

95% confidence interval

• 1554 documented casualties

December 2012 ----- March 2013

• Confidence intervals suggest there were as many as 3793 deaths

MODEL-BASED STATISTICS

DELIVERS DEEPER UNDERSTANDING THAN JUST DATA SUMMARIES


TRACKING UNEMPLOYMENT USING MOBILE PHONE DATA

Toole, J. L., Lin, Y. R., Muehlegger, E., Shoag, D., González, M. C., & Lazer, D. Journal of The Royal Society Interface, 2015

• Real time estimate of changes in unemployment, at arbitrarily fine spatial scale, using mobile phone data already collected.

• Ahead of traditional indicators in European countries

Data - mobile phone calls: • caller -> receiver • location • time

Training: Case of a large factory closing down

• Compare individual signal before vs. after closure • Find special features of the signal when job is lost

Calibrating: A region with official unemployment estimates • Match “lost-job” mobile phone signal to unemployment rates

Predict: Current (and near future) unemployment

[Figure: unemployment rates based on mobile phones compared with official rates, over the training and prediction periods]
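The calibrating and predicting steps can be pictured with a toy example. The sketch below assumes, purely for illustration, that a single "lost-job" signal is matched to official unemployment rates with a straight-line fit. The slides do not say which model the authors use, so the function names and all numbers here are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, signal):
    """Turn a new phone-signal value into an unemployment-rate estimate."""
    return slope * signal + intercept


# Hypothetical calibration data: share of phones showing the "lost-job"
# pattern (%) and the official unemployment rate (%) in the same periods.
lost_job_signal = [1.0, 2.0, 3.0, 4.0]
official_rate = [5.0, 7.0, 9.0, 11.0]

slope, intercept = fit_line(lost_job_signal, official_rate)

# Prediction for a current period where only the phone signal is known.
print(predict(slope, intercept, 5.0))
```

The point of the split is that the official rates are needed only for calibration. Once the line is fitted, new phone data alone give an estimate for the current period.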


University of Pittsburgh

EXPERIMENT:

• 720 female and male social drinkers, aged 21-28
• randomly assigned to groups of 3 strangers
• seated, offered three beverages over 36 minutes

• video recorded (35 million frames)

Alcohol groups: juice plus vodka

Placebo groups: told alcohol, given juice + hint of vodka

Smiles are infectious!

Explore the impact of alcohol and group gender composition on the likelihood that an initial smile will progress into a mutual smile, rather than remaining unreciprocated.

(Enjoyment) “Duchenne” smile

Social Display

[Figure: Smiling and Speech Behavior of Three Group Members for 10 Minutes of Interaction. Axis labels: Time (Coded Every 1/30th Sec), Speech, Smiling]

Percentage of smiles leading to a mutual smile

Group Gender Makeup    Alcohol % (N)    Placebo % (N)
All Males              50.4% (614)      37.4% (321)
2 Males 1 Female       48.8% (780)      44.4% (490)
1 Male 2 Female        49.8% (714)      49.1% (683)
All Females            49.4% (822)      48.2% (679)

The effect of alcohol on mutual smiles is larger in all-male groups than in all-female groups.
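The comparison can be checked directly from the percentages in the table. The sketch below only subtracts the placebo percentage from the alcohol percentage for each group. It is a raw difference in percentage points, not the statistical model behind the study, and the names `mutual_smile_pct` and `alcohol_effect` are made up for this example.

```python
# Percentage of smiles leading to a mutual smile: (alcohol, placebo),
# copied from the table above.
mutual_smile_pct = {
    "All Males": (50.4, 37.4),
    "2 Males 1 Female": (48.8, 44.4),
    "1 Male 2 Female": (49.8, 49.1),
    "All Females": (49.4, 48.2),
}


def alcohol_effect(table):
    """Return alcohol minus placebo, in percentage points, for each group."""
    return {
        group: round(alcohol - placebo, 1)
        for group, (alcohol, placebo) in table.items()
    }


effects = alcohol_effect(mutual_smile_pct)
for group, diff in effects.items():
    print(f"{group}: {diff:+.1f} percentage points")
```

The all-male groups show a difference of 13.0 percentage points, against 1.2 for the all-female groups.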

Personalised solutions

Forecasting the transient


The impact of data-rich information technologies is deep.

• we work less

• The Internet-of-Things • Automation, sensors • Smart software

1. Jobs get replaced: brokers, agents, drivers, IT, …
2. More productive, less time searching for info…
Fewer working hours to do the same job.

• less difference between work & free time, weakening the concept of salary

1. Freelance, not just one employer
2. Networking (after work) is part of the job
3. Ideas matter, and they do not come between 9 and 5
Is this work or my own free time? What should a salary cover? More trust and collaboration between employers and employees is necessary.

• less private property

1. Information and data are abundant and available for free (while markets exploit scarcity)
2. Re-use of data
3. Monopolies (which can play with prices) that capture data and sell it will fail, because much data is free.
The concepts of “cost” and “private property” are being shaken. There is less possibility for profit.

• the sharing economy, escaping the market rules

1. Collaborative production of goods, services
2. Organisations are different: no managers, no contracts
3. Need to redefine taxing systems

Flea market

• Product for all and for free
• Produced in collaboration
• no profit
• 208 employees
• 73,000 active contributors
• if commercial: 3 billion USD revenue/year
• but impossible for others to make profit in this area any more

• we work less
• less difference between work & free time, weakening the power of salary
• less private property
• the sharing economy, escaping the market rules
• networking people, puts power in the hands of many

THE END OF FREE MARKET CAPITALISM?

My health records

“Can we use your data for a study on Alzheimer's?”

“Can we use your data for a study on myopia?”

1980

Documents of war: Understanding the Syrian conflict. Megan Price, Anita Gohdes and Patrick Ball.
