on the edge of human-data interaction in the datab xiot.ed.ac.uk/files/2019/05/mortier-opt.pdf ·...
TRANSCRIPT
On the Edge of Human-Data Interaction in the DATABOX
01000100 01100001 01110100 01100001 01100010 01111000
Richard Mortier
Networks & Operating Systems
SRG, Computer Laboratory
http://weputachipinit.tumblr.com/
“It was just a dumb thing. Then we put a chip in it. Now it's a smart thing.”
http://bigdatapix.tumblr.com/
“Big Data is visualized in so many ways... all of them blue and with numbers and lens flare.”
Living in a Big Data World
• Challenges and Opportunities
  • Who’s tracking us, to what end?
  • Personalisation, Internet of Things
• Digital Footprints
  • Intimate information in large, rich data silos
  • Never forgets or forgives
Key Challenge: How do we enable data subjects to control collection
and exploitation of both their data and data about them?
Existing Ecosystem: Move Data
[Figure labels: you, your data, processors, data]
A Structural Problem?
• The Internet is fragmented, distributed systems are difficult
• Centralising simplifies things
• The Cloud means we can, so we do!
https://www.stickermule.com/marketplace/3442-there-is-no-cloud
• Ease of cloud computing means, by default, we move data to the cloud for processing
[Figure labels: you, your data, processors, data]
Restructuring the Problem
• Horizon (~2009): Build us a Magic Context Service!
  • No-one could explain, but it definitely involved using ALL the personal datas
• The Lazy Computerist’s Approach: Punt on the hard problems!
  • I don’t know what you want when you say you want context
  • But give me some program encoding what you want, I’ll run it for you
[Figure: subjects, sources and processors linked by five numbered steps: ❶ request, ❷ permission, ❸ processing, ❹ results, ❺ interactions]
• Dataware, a service-oriented architecture for personal data processing
• User provides platform to run processor’s code
• Key? Move code to data, not data to code
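The request, permission, processing and results steps can be sketched roughly as follows. This is an illustrative sketch only, not Dataware's actual interface: the `Request` and `Subject` classes, the `permit` callback and the audit strings are all hypothetical names introduced here. The point it shows is that the processor's code travels to the subject's platform and only the result leaves.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


# Hypothetical types for illustration; none of these names come from Dataware.
@dataclass
class Request:
    processor: str                          # who is asking
    sources: List[str]                      # data sources the code wants to read
    code: Callable[[Dict[str, Any]], Any]   # the processor's program, run locally


@dataclass
class Subject:
    data: Dict[str, Any]                    # the subject's data never leaves here
    audit: List[str] = field(default_factory=list)

    def handle(self, request: Request, permit: Callable[[Request], bool]) -> Any:
        """Request, permission, processing, results; every step is audited."""
        self.audit.append(f"request from {request.processor} for {request.sources}")
        if not permit(request):
            self.audit.append("permission denied")
            return None
        self.audit.append("permission granted")
        # Expose only the sources named in the request to the processor's code.
        view = {name: self.data[name] for name in request.sources}
        result = request.code(view)
        self.audit.append(f"result released: {result!r}")
        return result


subject = Subject(data={"steps": [4000, 9000, 12000], "location": "Cambridge"})
mean_steps = Request("fitness-co", ["steps"],
                     lambda d: sum(d["steps"]) / len(d["steps"]))
allow_steps_only = lambda r: set(r.sources) <= {"steps"}

result = subject.handle(mean_steps, allow_steps_only)
```

Here the processor learns the mean step count but never sees the raw list, and the subject keeps an audit trail of what was asked and what was released.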
Constructing Interaction
• Many proposed interaction models
  • E.g., pay-per-use
• Little about how to actually provide for it
  • Dataware was one such proposal
  • Accountable transaction between parties in terms of request, permission, audit
• But there’s a lot more to consider here…
Human-Data Interaction
• Data collected
• Analytics process data
• Inferences drawn
• Actions taken as a result
We Lack Legibility, Agency, Negotiability

Seeing & understanding
We are unaware of
• the many sources of data collected about us,
• the analyses performed on this data, and
• the implications of these analyses

Capacity to act
We are unaware of
• how to affect data collection,
• how to affect data analysis,
• whether such means even exist, and whether we know enough to want to employ them

Dynamics of interaction
We’re still trapped by current systems and services
• Binary accept/reject of terms
• Cannot subsequently modify or refine our decisions
Articulation Work
• Dataware subject is engaged in cooperative work
• There is interdependence between subject, processor, perhaps other subjects
• Activities must thus be meshed together, e.g., Schmidt (1994)
  • maintaining reciprocal awareness of salient activities within a cooperative ensemble
  • directing attention towards current state of cooperative activities
  • assigning tasks to members of the ensemble
  • handing over aspects of the work for others to pick up
Databox: Dataware v2
• Mediates access to data, local or remote
• Controls internal and external communications
• Logs all I/O for users to inspect, control
[Figure labels: you, your data, your databox, data, processors]
Databox moves code to data, minimising data release and
retaining control over processing
FRAUD DETECTION
Henry downloads his bank's app onto his databox.
...sometime later, in Thailand...
...a large transaction is made with Henry's card.
Henry's banking app checks his location ("Is Henry in Thailand?" "NO") and tells the Bank Henry is NOT in Thailand.
The transaction is refused.
Henry is happy. So is his bank manager.
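A minimal sketch of the kind of check the banking app in this story might run on the databox. The `Location` type and both function names are hypothetical, introduced only for illustration. What it shows is that the bank receives a one-word answer, while Henry's actual location stays on his databox.

```python
from dataclasses import dataclass


# Hypothetical record of what a location source on the databox might hold.
@dataclass(frozen=True)
class Location:
    country: str
    latitude: float
    longitude: float


def is_in_country(current: Location, country: str) -> bool:
    """Compare country names case-insensitively; coordinates are never used."""
    return current.country.strip().lower() == country.strip().lower()


def answer_bank_query(current: Location, transaction_country: str) -> str:
    """Return only YES or NO, which is all that leaves the databox."""
    return "YES" if is_in_country(current, transaction_country) else "NO"


henry = Location(country="United Kingdom", latitude=52.2, longitude=0.12)
reply = answer_bank_query(henry, "Thailand")
```

The design choice worth noting is the narrow return type: the app could read the full location record, but the reply is a single yes/no answer.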
Databox: Current Implementation
[Figure: Databox architecture. Labels: User, Dashboard, Container Manager, Arbiter, Proxy, Bridge, Export, Core Network, Core UI, App Store, Registry, GitHub, App, Driver, Store, Sources, Actuators, 3rd Parties]
Does Databox Provide for HDI?
• App construction and installation
  • App manifests describe the data sources needed
  • User reifies requested data sources with those available on installation
• Logging and audit
  • All interactions (reads, writes) are logged and collected
  • Both real-time dashboard display and offline analysis and visualisation
• Possibility of automated risk profiling (of numeric timeseries data)
  • Based on characteristics of subject’s personal data
  • Composable across sources and app publishers
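The first two points can be sketched roughly as below. This is an illustration under stated assumptions, not the Databox manifest format or API: the manifest shape, the `reify` function and the `LoggedStore` class are hypothetical. It shows an app declaring the types of source it needs, the user binding each type to a concrete source at install time, and every subsequent read or write being logged.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical manifest: the app names the *types* of source it needs.
manifest = {"name": "energy-tips", "requires": ["smart-meter", "thermostat"]}

# Hypothetical inventory of concrete sources available on this databox.
available = {
    "smart-meter": ["meter-hallway"],
    "thermostat": ["nest-livingroom", "nest-bedroom"],
    "location": ["phone-gps"],
}


def reify(manifest: Dict[str, Any], available: Dict[str, List[str]],
          choices: Dict[str, str]) -> Dict[str, str]:
    """Bind each required source type to the concrete source the user chose."""
    bindings = {}
    for source_type in manifest["requires"]:
        chosen = choices.get(source_type)
        if chosen not in available.get(source_type, []):
            raise ValueError(f"no valid source chosen for {source_type!r}")
        bindings[source_type] = chosen
    return bindings


class LoggedStore:
    """Wraps source values so that every read and write is recorded."""

    def __init__(self, values: Dict[str, Any]) -> None:
        self._values = dict(values)
        self.log: List[Tuple[str, str, str]] = []  # (operation, app, source)

    def read(self, app: str, source: str) -> Any:
        self.log.append(("read", app, source))
        return self._values[source]

    def write(self, app: str, source: str, value: Any) -> None:
        self.log.append(("write", app, source))
        self._values[source] = value


bindings = reify(manifest, available,
                 {"smart-meter": "meter-hallway", "thermostat": "nest-bedroom"})
store = LoggedStore({"meter-hallway": 3.2, "nest-bedroom": 18.5})
reading = store.read(manifest["name"], bindings["smart-meter"])
```

The `log` list stands in for whatever feeds the dashboard and offline analysis; a real system would also need timestamps and tamper resistance, which this sketch omits.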
Enabling Physical Interactivity
• Physical devices often easier to reason about
  • Visible; Located; Proximate; Portable
• Physical access control (“bag of keys”) is widely understood
• For example,
  • Access to our smart meter data allowed only if a green tag is in my Databox and in my wife’s Databox, or when the green tag is in one Databox and we’re both in the house
• Physical interactions providing for virtual connectivity
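The smart meter example above amounts to a small boolean policy. A sketch, assuming tag presence and occupancy are already available as booleans; the `HouseholdState` type and its field names are hypothetical, and "in one Databox" is read here as "in at least one".

```python
from dataclasses import dataclass


# Hypothetical snapshot of the physical state the policy depends on.
@dataclass(frozen=True)
class HouseholdState:
    tag_in_my_databox: bool
    tag_in_wifes_databox: bool
    i_am_home: bool
    wife_is_home: bool


def smart_meter_access_allowed(s: HouseholdState) -> bool:
    """Allow access if a tag is in both Databoxes, or in one while both are home."""
    tag_in_both = s.tag_in_my_databox and s.tag_in_wifes_databox
    tag_in_one = s.tag_in_my_databox or s.tag_in_wifes_databox
    both_home = s.i_am_home and s.wife_is_home
    return tag_in_both or (tag_in_one and both_home)


away_with_both_tags = HouseholdState(True, True, False, False)
home_with_one_tag = HouseholdState(True, False, True, True)
one_tag_one_home = HouseholdState(True, False, True, False)
```

Under this reading the first two states grant access and the third does not.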
Big Data Analytics?
[Figure labels: Big Data, Big Data Analytics, Small Data, aggregate, public, private, traditional centralised cloud]
Big Data Analytics? Small Data Analytics!
[Figure labels: Big Data, Big Data Analytics, Small Data, Small Data Analytics, aggregate, public, private, traditional centralised cloud, exploratory decentralised computation]
Wide-Area Distributed Analytics
Current: centralise data so it can be processed, usually in big datacenters
First attempt: distribute models and then refine locally
[Figure: batch learning on training data yields a shared model MS; users u1, u2, u3 each run online learning and inference over local data d1 … di+1, with models labelled ML and MP; cooperative learning between users]
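One way to picture "distribute models and then refine locally" is a shared model trained in batch and then updated online against data that never leaves the user. The sketch below assumes a one-dimensional linear model and plain stochastic gradient descent, neither of which is stated in the slides; all function names and the toy data are invented for illustration.

```python
from typing import List, Tuple

Sample = Tuple[float, float]   # (input x, target y)
Model = Tuple[float, float]    # (weight w, bias b) of y = w * x + b


def sgd_step(model: Model, sample: Sample, lr: float) -> Model:
    """One stochastic gradient step on squared error for a single sample."""
    w, b = model
    x, y = sample
    err = (w * x + b) - y
    return w - lr * err * x, b - lr * err


def batch_train(data: List[Sample], epochs: int = 200, lr: float = 0.05) -> Model:
    """Centralised batch phase: produce the shared model from pooled data."""
    model = (0.0, 0.0)
    for _ in range(epochs):
        for sample in data:
            model = sgd_step(model, sample, lr)
    return model


def refine_locally(model: Model, stream: List[Sample], lr: float = 0.05) -> Model:
    """Online phase on the user's device: update as each local sample arrives."""
    for sample in stream:
        model = sgd_step(model, sample, lr)
    return model


def mse(model: Model, data: List[Sample]) -> float:
    w, b = model
    return sum(((w * x + b) - y) ** 2 for x, y in data) / len(data)


# Toy data: the population follows y = 2x, this user follows y = 2x + 1.
shared_data = [(x, 2 * x) for x in (0.0, 0.5, 1.0, 1.5, 2.0)]
local_data = [(x, 2 * x + 1) for x in (0.5, 1.0, 1.5)]

shared = batch_train(shared_data)
personal = refine_locally(shared, local_data * 100)

before = mse(shared, local_data)
after = mse(personal, local_data)
```

In this toy setting the locally refined model fits the user's data better than the shared model does, and the user's samples are only ever read inside `refine_locally`.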
Goal? Fully distributed inference and learning
[Figure: synchronisation schemes BSP, SSP, ASP, pBSP, pSSP placed between “Strong consistency, slow iteration rate, fully centralised” and “Weak consistency, fast iteration rate, fully distributed”; BSP, SSP, ASP and PSP placed on Consistency and Completeness axes]
[Figure labels: data subject, data, processors]
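The schemes named in the figure differ in when a worker may move on to its next iteration. The sketch below uses the usual definitions of BSP (wait for everyone), SSP (bounded staleness) and ASP (never wait). The sampled variant, where the same test is applied to a random subset of workers, is my reading of the pBSP, pSSP and PSP labels and should be treated as an assumption. The function `can_advance` and its parameters are hypothetical.

```python
import random
from typing import Optional, Sequence


def can_advance(worker: int, steps: Sequence[int], staleness: float,
                sample_size: Optional[int] = None,
                rng: Optional[random.Random] = None) -> bool:
    """Decide whether `worker` may start its next iteration.

    steps[i] is the number of iterations worker i has completed.
    staleness = 0 gives BSP, a finite bound gives SSP, infinity gives ASP.
    If sample_size is set, only a random sample of workers is inspected
    (assumed here to correspond to the probabilistic variants).
    """
    peers = list(range(len(steps)))
    if sample_size is not None:
        rng = rng or random.Random()
        peers = rng.sample(peers, min(sample_size, len(peers)))
    if not peers:
        return True  # nobody to wait for
    slowest = min(steps[p] for p in peers)
    return steps[worker] - slowest <= staleness


steps = [3, 5, 4, 5]  # worker 0 is the straggler

bsp = can_advance(1, steps, staleness=0)             # must wait for worker 0
ssp = can_advance(1, steps, staleness=2)             # within the staleness bound
asp = can_advance(1, steps, staleness=float("inf"))  # never waits
```

Shrinking `sample_size` moves the sampled variant from the full barrier towards ASP, which matches the trade-off in the figure between consistency and iteration rate.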
Design Challenges
• Sharing data
  • Need to support offline data collection from, e.g., mobile phones
  • Need a rendezvous and identity service for direct interconnection
• Shared data
  • No current platform is a good fit to social dynamics of a household!
  • Who and how to manage users, groups?
  • Who gets to be root?
Questions?
http://hdiresearch.org/
https://databoxproject.uk/
http://ocaml.xyz/
http://mort.io/
[email protected]