The Tyranny of Data?

The Bright and Dark Sides of

Data-driven Algorithmic Decision

Making for Social Good

Nuria Oliver, PhD

Director of Research in Data Science @ Vodafone

Chief Data Scientist @ Data-Pop Alliance

Chief Scientific Advisor @ Vodafone Institute.

The Context

We live in a world of data

More data was created in the last two years

than the previous 5,000 years of humanity

Data --and the ability to

make sense of it-- are

arguably the most valuable

assets in the digital economy.

The European Data Economy

Source: European Data Market Study

Data economy value: almost €300 billion in 2016; €739 billion in 2020

Data workers: 6.16 million in 2016; 10.43 million in 2020

Data companies: 255,000 in 2016; 359,050 in 2020

Exponential growth in

computing power, storage

and availability of data has enabled significant

developments in

data-driven AI

Computational Social Sciences

The ubiquity of mobile phones enables us to collect and

analyze, for the first time in human history, large-scale

aggregated and anonymized human behavioral data of

entire cities, countries or even continents

The opportunity is HUGE to help decision making units

(governments, UN, Red Cross…) make more informed decisions thanks to the existence of quantitative real-time

information about populations

Source: Kaspersky Lab

March 2016

1. How can (Big) Data help monitor the SDGs by “filling data gaps” with more granular & disaggregated data, and what does monitoring something do to that something?

2. How can (Big) Data help promote (or impede?) the SDGs and their underlying human development vision and objectives, including towards and through lower (or higher?) inequalities?

The (Big) Data Revolution and the Sustainable Development Goals

Data-Pop Alliance is a global coalition on Big Data

& development created by the Harvard Humanitarian Initiative,

MIT Media Lab, and Overseas Development Institute joined by Flowminder, bringing

together researchers, experts, practitioners and activists to “promote a people-

centered Big Data revolution” by locally co-designing and deploying collaborative

research, training, and engagement

activities

Leadership

Prof. Alex ‘Sandy’ Pentland

Academic Director

Prof. Patrick Vinck

Co-Director & Co-Founder

Prof. Phuong Pham

Elizabeth Stuart

Co-Director for ODI

Dr Emma Samman

Dr Emmanuel Letouzé

Director & Co-Founder

Dr Linus Bengtsson

Co-Director for HHI

Dr Nuria Oliver

Chief Data Scientist

Algorithms strongly influencing

decision-making and resource

optimization for public goods

through the analysis of massive

amounts of (human behavioral)

data from a variety of sources

Data-driven Social Good Algorithms

Transportation

Energy

Natural Disasters

Humanitarian Crises

Climate Change

Public Health

Urban Studies

Population Studies

Agriculture

Areas of impact

Economic Development

Financial Inclusion

• Many decisions with significant individual and societal

implications are now made by or assisted by algorithms:

lending, policing, sentencing, resource allocation…

• Data-driven algorithmic decision-making may enhance

government efficiency and public service delivery

• Parag Khanna (Technocracy in America) argues that a data-

driven direct technocracy is superior to today’s democracy

because it may dynamically capture people’s needs while

avoiding human biases, corruption, conflicts of interest….

The Promise of Algorithmic

Decision-Making

• Global economic development projects have often been

governed by a “tyranny of experts”

• Technocratic justifications for interventions are considered to

be objective

• Intended beneficiaries are unaware of black-box decision

making

• Experts may act with impunity

• Several parallels with what we might refer to as “the tyranny

of algorithms”

The Tyranny of Algorithmic Decision-Making?

Six Areas of Improvement

• Computational violations of privacy

• Bias, social exclusion and

discrimination

• Information Asymmetry

• Opacity

• Veracity

• Ethics

1. Computational Violations of Privacy

• Inference of personal attributes from non-personal data: personality, sexual orientation, intelligence,

ethnicity and political views can be inferred from Facebook

likes (Kosinski et al, 2013), Facebook profile pictures

(Segalin et al, 2017) and patterns of access to the

3G/4G network (Park et al, 2017)

Algorithm could correctly

distinguish between gay and

straight men 81% of the time,

and 74% for women, better than

humans

Source: “Deep neural networks are more accurate than humans at detecting sexual orientation from facial images”, Kosinski and Wang, Journal of Personality and Social Psychology, https://osf.io/zn79k/

35,000 facial images that men

and women publicly posted on a

US dating website.

Publicly available data could be used to infer sexual orientation without explicit consent

2. Discrimination, Bias

• Algorithmic decisions might reproduce and

even magnify patterns of discrimination due to

prejudices in decision makers, existing biases in

society and/or biases in the data

• Disparate impact, misuse of models, type of

model

• ProPublica study of COMPAS recidivism

algorithm
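The kind of disparity at issue in the COMPAS debate can be made concrete by comparing error rates across groups. The following is a minimal sketch on made-up records, not ProPublica's data or methodology; the group labels, the record layout, and the `false_positive_rate` helper are all hypothetical.

```python
from collections import defaultdict

# Made-up records: (group, predicted_high_risk, actually_reoffended).
# These values are illustrative only and are not drawn from any real dataset.
records = [
    ("A", True, False), ("A", True, True), ("A", False, False), ("A", True, False),
    ("B", False, False), ("B", True, True), ("B", False, False), ("B", False, True),
]

def false_positive_rate(rows):
    """Share of people who did not reoffend but were still flagged high risk."""
    negatives = [r for r in rows if not r[2]]
    if not negatives:
        return float("nan")
    return sum(1 for r in negatives if r[1]) / len(negatives)

# Group the records so the same metric can be computed per group.
by_group = defaultdict(list)
for row in records:
    by_group[row[0]].append(row)

fpr = {group: false_positive_rate(rows) for group, rows in by_group.items()}
for group, rate in sorted(fpr.items()):
    print(f"group {group}: false positive rate = {rate:.2f}")
```

A large gap between the per-group rates is one signal of disparate impact; which metric is the right one to equalize is itself a contested design choice.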

http://gendershades.org/overview.html

On the Web: Race and Gender Stereotypes

3. Asymmetry

• Information: The ability to accumulate and

manipulate behavioral data about customers and

citizens with unprecedented scale may give

companies and intrusive/authoritarian governments

powerful means to manipulate segments of the

population through targeted marketing or social

control strategies

• Skills: Lack of computational and data literacy

among citizens

“Social media manipulation is big business. Since 2010, political parties and governments have spent more than half a billion dollars on the research, development, and implementation of

psychological operations and public opinion manipulation over social media. In a few

countries this includes efforts to counter extremism, but in most countries this involves the

spread of junk news and misinformation during elections, military crises, and complex

humanitarian disasters”.

3. Asymmetry: Manipulation

50 million Facebook

profiles were harvested, and

personal information was taken without

authorization, in early 2014

to build a system that

could profile individual US

voters, in order to target

them with personalized

political advertisements.

Facebook/Cambridge Analytica Scandal

Search Engine Manipulation Effect (SEME): When one candidate is favored in search results, that

can easily shift the voting preferences of undecided

voters by 20 percent or more — up to 80 percent in

some demographic groups

Search Suggestion Effect (SSE): (a) Google is manipulating opinions from the very first

character people type into the Google search bar,

and

(b) by manipulating search suggestions, Google can

turn a 50/50 split among undecided voters into a

90/10 split

Two powerful opinion shaping subliminal effects

4. Opacity

• Algorithmic decisions might lack

transparency for a variety of reasons (Burrell et al, 2013):

• Intentional opacity

• Illiterate opacity

• Intrinsic opacity

5. Veracity

• Today we can create fully synthetic

text, images and videos (deepfakes) which are

indistinguishable from real content

• Deepfakes could shape our public

opinion and influence our collective

decision-making

Source: Nvidia research

DerpFakes:

https://www.youtube.com/channel/UCUix6Sk2MZkVOr5PWQrtH1g

Source: University of Washington

6. Ethics

• Well-intentioned projects might

have unintended negative or

unethical consequences that need to be considered

• Projects within the law might be

unethical

The Way Forward

Six Lines of Work

• User (humanity)-centric approaches

• Ethical principles

• Algorithmic Transparency

• Discrimination-aware decision making

• Living labs

• Multi-disciplinary and diverse teams

1. User Centric Approaches

Personal Data Stores / Markets

• Last Day/Week/Month view

• Environment data category

indiv. /community views

(timeline, maps)

• 6 individual views + 3 whole-

period individual views:

evolution of Expenses

(weeks/categ.), more

frequent contacts

(phone/BT)

Source: Mobile Territorial Lab

1. User Centric Approaches

Secure control of Personal Data

https://www.enigma.co/enigma_full.pdf

Source: “Enigma: Decentralized Computation Platform with Guaranteed Privacy”, Zyskind, G., Nathan, O. and Pentland, A.

A peer-to-peer network, enabling different parties to jointly store and run

computations on data while keeping the data completely private. An

external blockchain is utilized as the controller of the network, manages

access control and identities, and serves as a tamper-proof log of events.

Security deposits and fees incentivize operation, correctness, and fairness

of the system. Enigma removes the need for a trusted third party, enabling

autonomous control of personal data. For the first time, users are able to

share their data with cryptographic guarantees regarding their privacy.
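The idea of running computations on data that stays private can be illustrated with additive secret sharing, one building block of secure multi-party computation. This is a toy sketch, not Enigma's actual protocol: the modulus, the number of parties, and the function names are assumptions made for illustration, and the blockchain, access-control, and incentive layers described above are omitted.

```python
import secrets

PRIME = 2**61 - 1  # assumed modulus for this toy example (a Mersenne prime)

def share(value, n_parties=3):
    """Split `value` into n random shares that sum to it modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

def reconstruct(shares):
    """Recombine all shares; any smaller subset looks uniformly random."""
    return sum(shares) % PRIME

# Two users secret-share their private values across three parties.
alice_shares = share(42)
bob_shares = share(100)

# Each party adds the two shares it holds, never seeing either raw value.
sum_shares = [(a + b) % PRIME for a, b in zip(alice_shares, bob_shares)]

# Only the combined result is revealed when all parties pool their sums.
total = reconstruct(sum_shares)
print(total)
```

Addition is the easy case; supporting multiplication and guarding against dishonest parties takes considerably more machinery than shown here.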

2. Ethical principles are needed

• We need to include ethical

considerations from the inception of

an algorithm

• Multi-disciplinary teams

• Ethics panels and ethical code of

conduct

• Chief Ethics Officer (CEO)

2. Ethical principles are needed

1. Behind data there are people

2. Privacy is not a binary variable

3. Guard against re-identification

4. Practice ethical data sharing

5. Know the strengths and limitations of

the data

6. Debate the tough ethical choices

7. Develop a code of conduct

8. Design data and systems for auditability

9. Consider the broader consequences

10. Know when to break these rules

From “Ten simple rules for responsible big data research” by M. Zook et al., PLOS Comp Biol, 2017

Example of

Principles in Big Data Research
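One simple way to act on rule 3 ("Guard against re-identification") is to check how many records share each combination of quasi-identifiers before a dataset is shared. The sketch below computes the smallest such group size (the k in k-anonymity). The column names and rows are invented for illustration, and passing this check alone does not guarantee that re-identification is impossible.

```python
from collections import Counter

# Invented example rows; "zip" and "age_band" play the role of quasi-identifiers.
rows = [
    {"zip": "38122", "age_band": "30-39", "spend": 120},
    {"zip": "38122", "age_band": "30-39", "spend": 80},
    {"zip": "38122", "age_band": "40-49", "spend": 45},
    {"zip": "38123", "age_band": "30-39", "spend": 60},
    {"zip": "38123", "age_band": "30-39", "spend": 95},
]

def group_sizes(rows, quasi_identifiers):
    """Count how many rows share each quasi-identifier combination."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in rows)

def smallest_group(rows, quasi_identifiers):
    """Size of the rarest combination, i.e. the k the dataset satisfies."""
    return min(group_sizes(rows, quasi_identifiers).values())

def risky_rows(rows, quasi_identifiers, k):
    """Rows whose combination is shared by fewer than k records."""
    counts = group_sizes(rows, quasi_identifiers)
    return [r for r in rows if counts[tuple(r[q] for q in quasi_identifiers)] < k]

qi = ["zip", "age_band"]
k_value = smallest_group(rows, qi)
flagged = risky_rows(rows, qi, k=2)
print(k_value, flagged)
```

Flagged rows would typically be generalized (for example, widening the age band) or suppressed before release.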

2. Ethical principles are needed

Asilomar & Future of Life Institute

https://futureoflife.org/ai-principles/

1. Safety

2. Failure Transparency

3. Judicial Transparency

4. Responsibility and Accountability

5. Value alignment

6. Human values

7. Personal privacy and control

8. Liberty and privacy

9. Shared, broad benefit

10. Shared, broad prosperity

11. Human control

12. Non-subversion

13. AI arms race

Trustworthy AI has two components:

(1) Respect fundamental rights, applicable

regulation and core principles and values,

ensuring an “ethical purpose”

(2) Be technically robust and reliable since,

even with good intentions, a lack of

technological mastery can cause

unintentional harm

“Machine learning researchers

should avoid using totally ordered

objective functions or loss functions

as optimization goals in high-stakes

applications.”

“High-stakes systems should always

exhibit uncertainty about the best

action in some cases and rely on

human decisions”
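The second quotation can be read as a call for a reject option: the system acts only when it is sufficiently confident and otherwise hands the case to a person. A minimal sketch follows. The threshold value, the `Decision` type, and the probabilities are hypothetical and chosen only to show the control flow.

```python
from dataclasses import dataclass

@dataclass
class Decision:
    action: str        # "approve", "deny", or "refer_to_human"
    confidence: float  # confidence in the more likely outcome (0.5 to 1.0)

def decide(prob_positive: float, threshold: float = 0.9) -> Decision:
    """Act automatically only when the model is confident; otherwise defer."""
    if not 0.0 <= prob_positive <= 1.0:
        raise ValueError("prob_positive must be a probability")
    confidence = max(prob_positive, 1.0 - prob_positive)
    if confidence < threshold:
        # Uncertain case: leave the final call to a human decision maker.
        return Decision("refer_to_human", confidence)
    action = "approve" if prob_positive >= 0.5 else "deny"
    return Decision(action, confidence)

# Hypothetical model outputs for three cases.
for p in (0.97, 0.55, 0.04):
    print(p, decide(p))
```

Raising the threshold sends more cases to people; where to set it depends on the cost of errors and on how much human review capacity exists.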

3. Algorithmic Transparency

• Explainable algorithms

• Transparency regarding:

• the limitations and uncertainties of

the algorithms

• when we are dealing with an

algorithm vs a human

• how our data is being used, and what

for

Principles for Algorithmic Transparency and Accountability
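For simple model families, an explanation can be as direct as listing how much each input contributed to a score. The sketch below does this for a linear scoring rule. The feature names, weights, and applicant values are hypothetical, and more complex models generally need dedicated explanation methods.

```python
# Hypothetical linear scoring rule: score = bias + sum(weight * feature value).
WEIGHTS = {"income_k": 0.02, "years_employed": 0.10, "missed_payments": -0.50}
BIAS = -1.0

def explain(applicant):
    """Return the score and each feature's additive contribution to it."""
    contributions = {name: WEIGHTS[name] * applicant[name] for name in WEIGHTS}
    score = BIAS + sum(contributions.values())
    return score, contributions

applicant = {"income_k": 40, "years_employed": 5, "missed_payments": 2}
score, contributions = explain(applicant)

print(f"score = {score:.2f} (bias {BIAS:+.2f})")
# List features from most to least influential for this applicant.
for name, value in sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True):
    print(f"  {name}: {value:+.2f}")
```

Because the contributions add up exactly to the score, the person affected can see which input mattered most and contest it if the underlying data is wrong.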

4. Discrimination-aware decision-making

Data → Algorithm → Model → Decision

Preprocessing, In-processing, Postprocessing

Fairness vs. Utility/Performance Trade-off

1. Define anti-discrimination or fairness constraints

2. Transform the data/algorithm/decision to satisfy the constraints

3. Measure the data/model/decision utility
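The three steps above can be sketched end to end with a post-processing intervention. The example assumes demographic parity (equal positive-decision rates across groups) as the fairness constraint and group-specific score thresholds as the transformation. Both are only one of several possible choices, and the scores, labels, and thresholds are made up.

```python
# Made-up (group, score, true_label) triples; not real data.
data = [
    ("A", 0.9, 1), ("A", 0.7, 1), ("A", 0.6, 1), ("A", 0.3, 0),
    ("B", 0.8, 1), ("B", 0.5, 0), ("B", 0.4, 0), ("B", 0.2, 0),
]

def decide(data, thresholds):
    """Step 2: turn scores into decisions using a per-group threshold."""
    return [(g, int(s >= thresholds[g]), y) for g, s, y in data]

def parity_gap(decisions):
    """Step 1: the fairness criterion, as a gap in positive-decision rates."""
    rates = {}
    for group in {g for g, _, _ in decisions}:
        picks = [d for g, d, _ in decisions if g == group]
        rates[group] = sum(picks) / len(picks)
    return max(rates.values()) - min(rates.values())

def accuracy(decisions):
    """Step 3: utility, measured here as plain accuracy."""
    return sum(int(d == y) for _, d, y in decisions) / len(decisions)

single = decide(data, {"A": 0.55, "B": 0.55})    # one threshold for everyone
adjusted = decide(data, {"A": 0.65, "B": 0.45})  # group-specific thresholds

for name, dec in (("single", single), ("adjusted", adjusted)):
    print(name, "parity gap:", parity_gap(dec), "accuracy:", accuracy(dec))
```

On this toy data, closing the parity gap lowers accuracy, which is the fairness versus utility trade-off named on the slide. Preprocessing and in-processing methods pursue the same goal at earlier stages of the pipeline.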

5. Living Labs and Sandboxes

100+ participants in Trento, Italy

Instrumented phone + Personal Data

Store

Volunteers to participate in user studies

on the topic of mobile personal data

User-centric mobile data monetization

Laboratorio Urbano, Bogota

Open space for collaborative work

Multi-disciplinary experimentation and

analysis of Bogota’s urban challenges

Goal: generate innovative solutions

Research, design and development of AI systems are dominated

today by highly educated, very well-paid men

However, AI systems are being used to model and predict the

behaviors, tastes and traits of very diverse populations with

very different life experiences

More diversity in the field would help ensure that AI systems

reflect and are meaningful to a broader user base and

viewpoints

Source: AI Now Institute 2017 Report

6. Multi-disciplinary and diverse teams are

a must

The Copenhagen Letter

http://copenhagenletter.org

1. Tech is not above us

2. Progress is more than

innovation

3. Let’s build from trust

4. Design open to scrutiny

5. Humanity-centered design

FATEN Algorithms

• Fairness or Justice: non-discrimination, cooperation

• Autonomy, Accountability, Intelligence Augmentation

• Transparency

• bEneficence: progress, sustainability, diversity, veracity, Education

• Non-maleficence: reliability, security, reproducibility, prudence, privacy

Data-enabled decision-making for good

Source: The Tyranny of Data? The Bright and the Dark Sides of Data-Driven Decision-making for Social Good, Lepri et al, in Transparent Data Mining for Big and Small data, Springer 2016

“It is only when we

honor these

requirements that we will

be able to move (…) to

a data-enabled model

of democratic

governance … for the

people.”

User-centric data

ownership and

management

Education, living

labs, citizen

engagement,

multidisciplinary and

diverse teams

FATEN: Fair,

Accountable,

Transparent,

Beneficent and

Non-maleficent

Algorithms

Related Publications

•"Fair, transparent and accountable algorithmic decision-making processes"

Lepri, B., Oliver, N., Letouze, E., Pentland, A. and Vinck, P.

Springer Journal on Philosophy and Technology, 2017

•"The Tyranny of Data?: The Bright and Dark Sides of Data-driven Decision-making for

Social Good" Lepri, B., Oliver, N., Letouze, E., Pentland, A. and Vinck, P. in "Transparent data mining for Big and Small data" Springer, 2016

•"The Rise of Decentralized Personal Data Markets" in "Trust::Data: A New Framework for

Identity and Data Sharing", CreateSpace Independent Publishing Platform, Oct 2016

Staiano, J., Zyskind, G., Lepri, B., Oliver, N. and Pentland, A.

•"The mobile territorial lab: a multilayered and dynamic view on parents' daily lives"

Centellegher, S., de Nadai, M., Caraviello, M., Leonardi, C., Vescovi, M., Ramadian,

Y., Oliver, N., Pianesi, F., Pentland, A., Antonelli, F. and Lepri, B.

EPJ Data Science, SpringerOpen, Feb 2016

•"Money Walks: A Human-Centric Study on the Economics of Personal Data"

Staiano, J., Lepri, B., Oliveira, N., Caraviello, M., Sebe, N. and Oliver, N.

Proceedings of ACM Ubicomp 2014. Seattle. September 2014. Best paper award

http://www.nuriaoliver.com/papers/Philosophy_and_Technology_final.pdf

http://www.springer.com/philosophy/epistemology+and+philosophy+of+science/journal/13347

https://arxiv.org/abs/1612.00323

https://www.amazon.com/Trust-Data-Framework-Identity-sharing/dp/153911421X

http://epjdatascience.springeropen.com/articles/10.1140/epjds/s13688-016-0064-6

http://arxiv.org/abs/1407.0566

S. Ruggieri. “Using t-closeness anonymity to control for non-discrimination”. Transactions

on Data Privacy, 7(2), pp.99-129, 2014.

S. Hajian, J. Domingo-Ferrer, and O. Farras. “Generalization-based privacy preservation

and discrimination prevention in data publishing and mining”. Data Mining and

Knowledge Discovery, 28(5-6), pp.1158-1188, 2014.

C. Dwork, M. Hardt, T. Pitassi, O. Reingold and R. S. Zemel. “Fairness through awareness”. In

ITCS 2012, pp. 214-226, 2012.

S. Hajian, J. Domingo-Ferrer, A. Monreale, D. Pedreschi, and F. Giannotti. “Discrimination-

and privacy-aware patterns”. In Data Mining and Knowledge Discovery, 29(6), 2015.

F. Kamiran, T. Calders and M. Pechenizkiy. “Discrimination aware decision tree learning”.

In ICDM, pp. 869-874, 2010.

R. Zemel, Y. Wu, K. Swersky, T. Pitassi and C. Dwork. “Learning fair representations”. In

ICML, pp. 325-333, 2013.

Edelman, Benjamin G. and Luca, Michael, Digital Discrimination: The Case of

Airbnb.com (January 10, 2014). Harvard Business School NOM Unit Working Paper No. 14-054.

Aniko Hannak, Gary Soeller, David Lazer, Alan Mislove, and Christo Wilson: Measuring

Price Discrimination and Steering on E-commerce Web Sites. Proc. of IMC. Vancouver,

Canada, November 2014.

Gary Soeller, Karrie Karahalios, Christian Sandvig, and Christo Wilson: MapWatch:

Detecting and Monitoring International Border Personalization on Online Maps. Proc. of

WWW. Montreal, Quebec, Canada, April 2016

Presentations

KDD Tutorial on bias, discrimination 2017 by Castillo et al

Suresh Venkatasubramanian: Keynote at ICWSM 2016

Ricardo Baeza: Keynote at WebSci 2016

Toon Calders: Keynote at EGC 2016

Data Transparency Lab - http://dtlconferences.org/

Fairness, Accountability, and Transparency in Machine Learning (FATML) workshop and resources

What can we all do to responsibly leverage

data-driven FATEN algorithmic decision-

making with positive social impact?

@nuriaoliver
