The Tyranny of Data?
The Bright and Dark Sides of
Data-driven Algorithmic Decision
Making for Social Good
Nuria Oliver, PhD
Director of Research in Data Science @ Vodafone
Chief Data Scientist @ Data-Pop Alliance
Chief Scientific Advisor @ Vodafone Institute.
The Context
We live in a world of data
More data was created in the last two years
than the previous 5,000 years of humanity
Data --and the ability to
make sense of it-- are
arguably the most valuable
assets in the digital economy.
The European Data Economy
Source: European Data Market Study
Data economy value: almost €300 billion in 2016; €739 billion in 2020
Data workers: 6.16 million in 2016; 10.43 million in 2020
Data companies: 255,000 in 2016; 359,050 in 2020
Exponential growth in
computing power, storage
and availability of data have enabled significant
developments in
data-driven AI
Computational Social Sciences
The ubiquity of mobile phones enables us to collect and
analyze, for the first time in human history, large-scale
aggregated and anonymized human behavioral data of
entire cities, countries or even continents
The opportunity is HUGE to help decision making units
(governments, UN, Red Cross…) make more informed decisions thanks to the existence of quantitative real-time
information about populations
Source: Kaspersky Lab
March 2016
1. How can (Big) Data help monitor the SDGs by “filling data gaps” with more granular & disaggregated data, and what does monitoring something do to that something?
2. How can (Big) Data help promote (or impede?) the SDGs and their underlying human development vision and objectives, including towards and through lower (or higher?) inequalities?
The (Big) Data Revolution and the Sustainable Development Goals
Data-Pop Alliance is a global coalition on Big Data
& development created by the Harvard Humanitarian Initiative,
MIT Media Lab, and Overseas Development Institute joined by Flowminder, bringing
together researchers, experts, practitioners and activists to “promote a people-
centered Big Data revolution” by locally co-designing and deploying collaborative
research, training, and engagement
activities
Leadership
Prof. Alex ‘Sandy’ Pentland
Academic Director
Prof. Patrick Vinck
Co-Director &
Co-Founder
Prof. Phuong Pham
Elizabeth Stuart
Co-Director for ODI
Dr Emma Samman
Dr Emmanuel Letouzé
Director & Co-Founder
Dr Linus Bengtsson
Co-Director for HHI
Dr Nuria Oliver
Chief Data Scientist
Algorithms strongly influencing
decision-making and resource
optimization for public goods
through the analysis of massive
amounts of (human behavioral)
data from a variety of sources
Data-driven Social Good Algorithms
Transportation
Energy
Natural Disasters
Humanitarian Crises
Climate Change
Public Health
Urban Studies
Population Studies
Agriculture
Areas of impact
Economic Development
Financial Inclusion
• Many decisions with significant individual and societal
implications are now made by or assisted by algorithms:
lending, policing, sentencing, resource allocation…
• Data-driven algorithmic decision-making may enhance
government efficiency and public service delivery
• Parag Khanna (Technocracy in America) argues that a data-
driven direct technocracy is superior to today’s democracy
because it may dynamically capture people’s needs while
avoiding human biases, corruption, conflicts of interest….
The Promise of Algorithmic
Decision-Making
• Global economic development projects have often been
governed by a “tyranny of experts”
• Technocratic justifications for interventions are considered to
be objective
• Intended beneficiaries are unaware of black-box decision
making
• Experts may act with impunity
• Several parallels with what we might refer to as “the tyranny
of algorithms”
The Tyranny of Algorithmic Decision-Making?
Six Areas of Improvement
• Computational violations of privacy
• Bias, social exclusion and
discrimination
• Information Asymmetry
• Opacity
• Veracity
• Ethics
1. Computational Violations of Privacy
• Inference of personal attributes from non-personal data: personality, sexual orientation, intelligence,
ethnicity and political views inferred from Facebook
likes (Kosinski et al, 2013), Facebook profile pictures
(Segalin et al, 2017) and patterns of access to the
3G/4G network (Park et al, 2017)
Algorithm could correctly
distinguish between gay and
straight men 81% of the time,
and 74% for women, better than
humans
Source: “Deep neural networks are more accurate than humans at detecting sexual orientation from facial images”, Kosinski and Wang, Journal of Personality and Social Psychology, https://osf.io/zn79k/
35,000 facial images that men
and women publicly posted on a
US dating website.
Publicly available data could be used to infer sexual orientation without explicit consent
2. Discrimination, Bias
• Algorithmic decisions might reproduce and
even magnify patterns of discrimination due to
prejudices in decision makers, existing biases in
society and/or biases in the data
• Disparate impact, misuse of models, type of
model
• ProPublica study of COMPAS recidivism
algorithm
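Audits of risk-scoring tools are often framed as a comparison of error rates across groups. The sketch below is a minimal illustration of that idea, not a reproduction of the ProPublica analysis: the function name, the record layout and the toy records are all hypothetical, and the data is synthetic.

```python
from collections import defaultdict

def false_positive_rate_by_group(records):
    """Return {group: FPR}, where FPR = FP / (FP + TN).

    Each record is a hypothetical tuple:
    (group, predicted_high_risk, actually_reoffended).
    """
    fp = defaultdict(int)  # flagged high risk, did not reoffend
    tn = defaultdict(int)  # flagged low risk, did not reoffend
    for group, predicted, actual in records:
        if actual:
            continue  # FPR is computed only over people who did not reoffend
        if predicted:
            fp[group] += 1
        else:
            tn[group] += 1
    groups = set(fp) | set(tn)
    return {g: fp[g] / (fp[g] + tn[g]) for g in groups}

# Synthetic toy records for illustration only (not real COMPAS data).
toy_records = [
    ("A", True, False), ("A", True, False), ("A", False, False), ("A", True, True),
    ("B", True, False), ("B", False, False), ("B", False, False), ("B", False, False),
]
rates = false_positive_rate_by_group(toy_records)
```

A large gap between the per-group rates is one signal of disparate impact: people who did not reoffend are wrongly flagged more often in one group than in the other.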
http://gendershades.org/overview.html
On the Web: Race and Gender Stereotypes
3. Asymmetry
• Information: The ability to accumulate and
manipulate behavioral data about customers and
citizens with unprecedented scale may give
companies and intrusive/authoritarian governments
powerful means to manipulate segments of the
population through targeted marketing or social
control strategies
• Skills: Lack of computational and data literacy
among citizens
“Social media manipulation is big business. Since 2010, political parties and governments have spent more than half a billion dollars on the research, development, and implementation of
psychological operations and public opinion manipulation over social media. In a few
countries this includes efforts to counter extremism, but in most countries this involves the
spread junk news and misinformation during elections, military crises, and complex
humanitarian disasters”.
3. Asymmetry: Manipulation
Personal information from
50 million Facebook profiles
was harvested without
authorization in early 2014
to build a system that
could profile individual US
voters, in order to target
them with personalized
political advertisements.
Facebook/Cambridge Analytica Scandal
Search Engine Manipulation Effect (SEME): When one candidate is favored in search results, that
can easily shift the voting preferences of undecided
voters by 20 percent or more — up to 80 percent in
some demographic groups
Search Suggestion Effect (SSE): (a) Google is manipulating opinions from the very first
character people type into the Google search bar,
and
(b) by manipulating search suggestions, Google can
turn a 50/50 split among undecided voters into a
90/10 split
Two powerful opinion shaping subliminal effects
4. Opacity
• Algorithmic decisions might lack
transparency for a variety of reasons (Burrell et al, 2013):
• Intentional opacity
• Illiterate opacity
• Intrinsic opacity
5. Veracity
• Today we can create fully synthetic
text, images and videos (deepfakes) which are
indistinguishable from real content
• Deepfakes could shape our public
opinion and influence our collective
decision-making
Source: Nvidia research
DerpFakes:
https://www.youtube.com/channel/UCUix6Sk2MZkVOr5PWQrtH1g
Source: University of Washington
6. Ethics
• Well intentioned projects might
have unintended negative, unethical
consequences that need to be considered
• Projects within the law might be
unethical
The Way Forward
Six Lines of Work
• User (humanity)-centric approaches
• Ethical principles
• Algorithmic Transparency
• Discrimination-aware decision making
• Living labs
• Multi-disciplinary and diverse teams
1. User Centric Approaches
Personal Data Stores / Markets
• Last Day/Week/Month view
• Environment data category: individual/community views (timeline, maps)
• 6 individual views + 3 whole-period individual views: evolution of expenses (weeks/categories), more frequent contacts (phone/BT)
Source: Mobile Territorial Lab
1. User Centric Approaches
Secure control of Personal Data
https://www.enigma.co/enigma_full.pdf
Source: “Enigma: Decentralized Computation Platform with Guaranteed Privacy”, Zyskind, G., Nathan, O. and Pentland, A.
A peer-to-peer network, enabling different parties to jointly store and run
computations on data while keeping the data completely private. An
external blockchain is utilized as the controller of the network, manages
access control and identities, and serves as a tamper-proof log of events.
Security deposits and fees incentivize operation, correctness, and fairness
of the system. Enigma removes the need for a trusted third party, enabling
autonomous control of personal data. For the first time, users are able to
share their data with cryptographic guarantees regarding their privacy.
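The idea of jointly computing on data that stays private can be illustrated with additive secret sharing, a standard building block of secure multi-party computation. The sketch below is a generic toy, not Enigma's actual protocol: it has no blockchain, access control, deposits or fees, and the modulus, function names and input values are assumptions made for illustration.

```python
import secrets

PRIME = 2**61 - 1  # modulus for the toy scheme (illustrative choice)

def share(value, n_parties):
    """Split value into n additive shares that sum to value mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]

def reconstruct(shares):
    """Recombine shares; fewer than all of them reveal nothing useful."""
    return sum(shares) % PRIME

# Two users' private values (hypothetical), each split across three nodes.
alice_shares = share(1200, 3)
bob_shares = share(800, 3)

# Each node adds only the shares it holds, never seeing the raw values.
node_sums = [(a + b) % PRIME for a, b in zip(alice_shares, bob_shares)]

# Only the combined result is revealed when the node sums are recombined.
total = reconstruct(node_sums)
```

Here `total` equals the sum of the two private inputs, although no single node ever held either input in the clear.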
2. Ethical principles are needed
• We need to include ethical
considerations from the inception of
an algorithm
• Multi-disciplinary teams
• Ethics panels and ethical code of
conduct
• Chief Ethics Officer (CEO)
2. Ethical principles are needed
1. Behind data there are people
2. Privacy is not a binary variable
3. Guard against re-identification
4. Practice ethical data sharing
5. Know the strengths and limitations of
the data
6. Debate the tough ethical choices
7. Develop a code of conduct
8. Design data and systems for auditability
9. Consider the broader consequences
10. Know when to break these rules
From “Ten simple rules for responsible big data research” by M. Zook et al., PLOS Comp Biol, 2017
Example of
Principles in Big Data Research
2. Ethical principles are needed
Asilomar & Future of Life Institute
https://futureoflife.org/ai-principles/
1. Safety
2. Failure Transparency
3. Judicial Transparency
4. Responsibility and Accountability
5. Value alignment
6. Human values
7. Personal privacy and control
8. Liberty and privacy
9. Shared, broad benefit
10. Shared, broad prosperity
11. Human control
12. Non-subversion
13. AI arms race
Trustworthy AI has two components:
(1) Respect fundamental rights, applicable
regulation and core principles and values,
ensuring an “ethical purpose”
(2) Be technically robust and reliable since,
even with good intentions, a lack of
technological mastery can cause
unintentional harm
“Machine learning researchers
should avoid using totally ordered
objective functions or loss functions
as optimization goals in high-stakes
applications.”
“High-stakes systems should always
exhibit uncertainty about the best
action in some cases and rely on
human decisions”
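One common way to make a system "exhibit uncertainty" is a reject option: the model decides only when it is confident and otherwise hands the case to a person. The sketch below is a minimal illustration under assumed names and thresholds (`decide`, `lower`, `upper` and the decision labels are hypothetical, not taken from the quoted source).

```python
def decide(probability, lower=0.3, upper=0.7):
    """Return an automated decision only when the model is confident.

    probability: the model's estimated probability of the positive outcome.
    lower, upper: hypothetical thresholds bounding the uncertain band.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    if probability >= upper:
        return "approve"
    if probability <= lower:
        return "deny"
    # Inside the uncertain band the system does not decide on its own.
    return "refer_to_human"

decisions = [decide(p) for p in (0.95, 0.5, 0.1)]
```

Widening the band between `lower` and `upper` sends more cases to human reviewers; how wide it should be is a policy choice, not something the code can settle.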
3. Algorithmic Transparency
• Explainable algorithms
• Transparency regarding:
• the limitations and uncertainties of
the algorithms
• when we are dealing with an
algorithm vs a human
• how is our data being used, what
for
Principles for Algorithmic Transparency and Accountability
4. Discrimination-aware decision-making
Data → Algorithm → Model → Decision
Preprocessing, In-processing, Postprocessing
Fairness vs. Utility/Performance Trade-off
1. Define anti-discrimination or fairness constraints
2. Transform the data/algorithm/decision to satisfy the constraints
3. Measure the data/model/decision utility
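The three steps can be sketched with a simple post-processing example: define a demographic parity constraint, adjust one group's decision threshold to satisfy it, then measure what happens to accuracy. This is a toy under stated assumptions: the scores, labels, thresholds and function names are synthetic and hypothetical, and demographic parity is only one of several possible fairness definitions.

```python
# Synthetic model scores and true labels for two groups (illustration only).
group_a = {"scores": [0.9, 0.8, 0.6, 0.4], "labels": [1, 1, 1, 0]}
group_b = {"scores": [0.7, 0.5, 0.3, 0.2], "labels": [1, 0, 0, 0]}

def positive_rate(scores, threshold):
    """Fraction of people who receive the positive decision."""
    return sum(s >= threshold for s in scores) / len(scores)

# Step 1: fairness constraint = small gap in positive rates between groups.
def parity_gap(thr_a, thr_b):
    return abs(positive_rate(group_a["scores"], thr_a)
               - positive_rate(group_b["scores"], thr_b))

# Step 3 helper: utility measured as overall accuracy.
def overall_accuracy(thr_a, thr_b):
    correct = total = 0
    for group, thr in ((group_a, thr_a), (group_b, thr_b)):
        for s, y in zip(group["scores"], group["labels"]):
            correct += int((s >= thr) == bool(y))
            total += 1
    return correct / total

# Baseline: one shared threshold for everyone.
baseline_gap = parity_gap(0.55, 0.55)
baseline_acc = overall_accuracy(0.55, 0.55)

# Step 2: post-process by picking the group B threshold with the smallest gap.
candidates = [0.25, 0.45, 0.55]
best_thr_b = min(candidates, key=lambda t: parity_gap(0.55, t))

# Step 3: measure fairness and utility after the transformation.
fair_gap = parity_gap(0.55, best_thr_b)
fair_acc = overall_accuracy(0.55, best_thr_b)
```

On this toy data the parity gap closes while accuracy drops, which is the fairness versus utility trade-off named on the slide; real data may behave differently.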
5. Living Labs and Sandboxes
100+ participants in Trento, Italy
Instrumented phone + Personal Data Store
Volunteers to participate in user studies on the topic of mobile personal data
User-centric mobile data monetization
Laboratorio Urbano, Bogota
Open space for collaborative work
Multi-disciplinary experimentation and analysis of Bogota’s urban challenges
Goal: generate innovative solutions
Research, design and development of AI systems are dominated
today by highly educated, very well paid men
However, AI systems are being used to model and predict the
behaviors, tastes and traits of very diverse populations with
very different life experiences
More diversity in the field would help ensure that AI systems
reflect and are meaningful to a broader user base and
viewpoints
Source: AI Now Institute 2017 Report
6. Multi-disciplinary and diverse teams are
a must
The Copenhagen Letter
http://copenhagenletter.org
1. Tech is not above us
2. Progress is more than
innovation
3. Let’s build from trust
4. Design open to scrutiny
5. Humanity-centered design
FATEN Algorithms
Fairness or Justice
Non-discrimination
Cooperation
Autonomy
Accountability
Intelligence Augmentation
Transparency
bEneficence: progress, sustainability, diversity, veracity, Education
Non-maleficence: reliability, security, reproducibility, prudence, privacy
Data-enabled decision-making for good
Source: The Tyranny of Data? The Bright and the Dark Sides of Data-Driven Decision-making for Social Good, Lepri et al, in Transparent Data Mining for Big and Small data, Springer 2016
“It is only when we
honor these
requirements that we will
be able to move (…) to
a data-enabled model
of democratic
governance … for the
people.”
User-centric data
ownership and
management
Education, living
labs, citizen
engagement,
multidisciplinary and
diverse teams
FATEN: Fair,
Accountable,
Transparent,
Beneficent and
Non-maleficent
Algorithms
Related Publications
•"Fair, transparent and accountable algorithmic decision-making processes"
Lepri, B., Oliver, N., Letouze, E., Pentland, A. and Vinck, P.
Springer Journal on Philosophy and Technology, 2017
•"The Tyranny of Data?: The Bright and Dark Sides of Data-driven Decision-making for
Social Good" Lepri, B., Oliver, N., Letouze, E., Pentland, A. and Vinck, P. in "Transparent data mining for Big and Small data" Springer, 2016
•"The Rise of Decentralized Personal Data Markets" in "Trust::Data: A New Framework for
Identity and Data Sharing", CreateSpace Independent Publishing Platform, Oct 2016
Staiano, J., Zyskind, G., Lepri, B., Oliver, N. and Pentland, A.
•"The mobile territorial lab: a multilayered and dynamic view on parents' daily lives"
Centellegher, S., de Nadai, M., Caraviello, M., Leonardi, C., Vescovi, M., Ramadian,
Y., Oliver, N., Pianesi, F., Pentland, A., Antonelli, F. and Lepri, B.
EPJ Data Science, SpringerOpen, Feb 2016
•"Money Walks: A Human-Centric Study on the Economics of Personal Data"
Staiano, J., Lepri, B., Oliveira, N., Caraviello, M., Sebe, N. and Oliver, N.
Proceedings of ACM Ubicomp 2014. Seattle. September 2014. Best paper award
S. Ruggieri. “Using t-closeness anonymity to control for non-discrimination”. Transactions
on Data Privacy, 7(2), pp.99-129, 2014.
S. Hajian, J. Domingo-Ferrer, and O. Farras. “Generalization-based privacy preservation
and discrimination prevention in data publishing and mining”. Data Mining and
Knowledge Discovery, 28(5-6), pp.1158-1188, 2014.
C. Dwork, M. Hardt, T. Pitassi, O. Reingold and R. S. Zemel. “Fairness through awareness”. In
ITCS 2012, pp. 214-226, 2012.
S. Hajian, J. Domingo-Ferrer, A. Monreale, D. Pedreschi, and F. Giannotti. “Discrimination-
and privacy-aware patterns”. In Data Mining and Knowledge Discovery, 29(6), 2015.
F. Kamiran, T. Calders and M. Pechenizkiy. “Discrimination aware decision tree learning”.
In ICDM, pp. 869-874, 2010.
R. Zemel, Y. Wu, K. Swersky, T. Pitassi and C. Dwork. “Learning fair representations”. In
ICML, pp. 325-333, 2013.
Edelman, Benjamin G. and Luca, Michael, Digital Discrimination: The Case of
Airbnb.com (January 10, 2014). Harvard Business School NOM Unit Working Paper No. 14-054.
Aniko Hannak, Gary Soeller, David Lazer, Alan Mislove, and Christo Wilson: Measuring
Price Discrimination and Steering on E-commerce Web Sites. Proc. of IMC. Vancouver,
Canada, November 2014.
Gary Soeller, Karrie Karahalios, Christian Sandvig, and Christo Wilson: MapWatch:
Detecting and Monitoring International Border Personalization on Online Maps. Proc. of
WWW. Montreal, Quebec, Canada, April 2016
Presentations
KDD Tutorial on bias, discrimination 2017 by Castillo et al
Suresh Venkatasubramanian: Keynote at ICWSM 2016
Ricardo Baeza: Keynote at WebSci 2016
Toon Calders: Keynote at EGC 2016
Data Transparency Lab - http://dtlconferences.org/
Fairness, Accountability, and Transparency in Machine Learning (FATML) workshop and resources
What can we all do to responsibly leverage
data-driven FATEN algorithmic decision-
making with positive social impact?
@nuriaoliver