on the edge of human-data interaction in the datab xiot.ed.ac.uk/files/2019/05/mortier-opt.pdf ·...

Post on 11-Jun-2020

Networks & Operating Systems
SRG, Computer Laboratory

On the Edge of Human-Data Interaction in the Databox

Richard Mortier

http://weputachipinit.tumblr.com/
“It was just a dumb thing. Then we put a chip in it. Now it's a smart thing.”

http://bigdatapix.tumblr.com/
“Big Data is visualized in so many ways... all of them blue and with numbers and lens flare.”

Living in a Big Data World

• Challenges and Opportunities
  • Who’s tracking us, to what end?
  • Personalisation, Internet of Things

• Digital Footprints
  • Intimate information in large, rich data silos
  • Never forgets or forgives

Key Challenge: How do we enable data subjects to control collection and exploitation of both their data and data about them?

Existing Ecosystem: Move Data

(Diagram labels: you, your data, processors, data.)

A Structural Problem?

• The Internet is fragmented, distributed systems are difficult
• Centralising simplifies things
• The Cloud means we can, so we do!

https://www.stickermule.com/marketplace/3442-there-is-no-cloud

• Ease of cloud computing means, by default, we move data to the cloud for processing

Restructuring the Problem

• Horizon (~2009): Build us a Magic Context Service!
  • No-one could explain, but it definitely involved using ALL the personal datas

• The Lazy Computerist’s Approach: Punt on the hard problems!
  • I don’t know what you want when you say you want context
  • But give me some program encoding what you want, I’ll run it for you

(Diagram: subjects, sources and processors, linked by ❶ request, ❷ permission, ❸ processing, ❹ results, ❺ interactions.)

• Dataware, a service-oriented architecture for personal data processing
• User provides platform to run processor’s code

• Key? Move code to data, not data to code
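The request, permission, processing and results steps can be illustrated with a minimal Python sketch. It is not the Dataware implementation: `Request`, `SubjectPlatform`, `permit`, `run`, the processor name `fitness-co` and the sample data are all hypothetical names chosen for illustration. The point it shows is only that the processor's code travels to the subject's platform and a computed result, rather than the raw data, travels back.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple


@dataclass
class Request:
    """Step 1: a processor asks to run some code over named sources."""
    processor: str
    sources: List[str]
    code: Callable[[Dict[str, list]], object]


@dataclass
class SubjectPlatform:
    """Hypothetical platform run on behalf of the data subject."""
    data: Dict[str, list]
    # Step 2: permissions granted by the subject, as (processor, source) pairs.
    granted: Set[Tuple[str, str]] = field(default_factory=set)

    def permit(self, processor: str, source: str) -> None:
        self.granted.add((processor, source))

    def run(self, request: Request) -> object:
        missing = [s for s in request.sources
                   if (request.processor, s) not in self.granted]
        if missing:
            raise PermissionError(f"no permission for sources: {missing}")
        # Step 3: the processor's code runs here, next to the data,
        # and sees only the sources it was permitted to use.
        view = {s: self.data[s] for s in request.sources}
        # Step 4: only the computed result leaves the platform.
        return request.code(view)


def mean_daily_steps(view: Dict[str, list]) -> float:
    """Example processor code: an aggregate over one source."""
    steps = view["steps"]
    return sum(steps) / len(steps)


platform = SubjectPlatform(data={"steps": [4000, 6000, 8000],
                                 "location": ["home"]})
platform.permit("fitness-co", "steps")
result = platform.run(Request("fitness-co", ["steps"], mean_daily_steps))
print(result)
```

A request that names a source the subject has not permitted (here, `location`) raises `PermissionError` before any processor code runs.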

Constructing Interaction

• Many proposed interaction models
  • E.g., pay-per-use

• Little about how to actually provide for it
  • Dataware was one such proposal
  • Accountable transaction between parties in terms of request, permission, audit
  • But there’s a lot more to consider here…

Human-Data Interaction

• Data collected

• Analytics process data

• Inferences drawn

• Actions taken as a result

We Lack Legibility, Agency, Negotiability

Legibility: seeing & understanding

We are unaware of
• the many sources of data collected about us,
• the analyses performed on this data, and
• the implications of these analyses

Agency: capacity to act

We are unaware of
• how to affect data collection,
• how to affect data analysis,
• if such means even exist, and we know enough to want to employ them

Negotiability: dynamics of interaction

We’re still trapped by current systems and services
• Binary accept/reject of terms
• Cannot subsequently modify or refine our decisions

Articulation Work

• Dataware subject is engaged in cooperative work
• There is interdependence between subject, processor, perhaps other subjects
• Activities must thus be meshed together, e.g., Schmidt (1994)
  • maintaining reciprocal awareness of salient activities within a cooperative ensemble
  • directing attention towards current state of cooperative activities
  • assigning tasks to members of the ensemble
  • handing over aspects of the work for others to pick up

Databox: Dataware v2

• Mediates access to data, local or remote
• Controls internal and external communications
• Logs all I/O for users to inspect, control

(Diagram labels: you, your data, your databox, processors, data.)

Databox moves code to data, minimising data release and retaining control over processing

Fraud Detection

Henry downloads his bank's app onto his databox.

...sometime later, in Thailand...

...a large transaction is made with Henry's card.

Henry's banking app checks his location (“Is Henry in Thailand?” “NO”)

and tells the Bank Henry is NOT in Thailand.

The transaction is refused.

Henry is happy. So is his bank manager.
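The fraud-detection story can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the bank's app or any Databox API: the function names, the 100 km radius and both coordinate pairs are hypothetical. What it shows is the shape of the exchange: the check runs on the databox and the bank receives only a yes/no answer, never Henry's coordinates.

```python
from math import asin, cos, radians, sin, sqrt


def distance_km(a, b):
    """Great-circle (haversine) distance between two (lat, lon) points, in km."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(h))


def is_subject_near(last_known, transaction_site, radius_km=100.0):
    """Runs on the databox; returns only a boolean, never the coordinates."""
    return distance_km(last_known, transaction_site) <= radius_km


# Illustrative values only: the location stays inside the databox.
henry_location = (52.2053, 0.1218)    # assumed: somewhere in Cambridge, UK
transaction_site = (13.7563, 100.5018)  # assumed: somewhere in Bangkok

answer = is_subject_near(henry_location, transaction_site)
print("YES" if answer else "NO")
```

The radius is a policy choice: a smaller value releases a more precise hint about where the subject is, so even a boolean answer carries some information.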

Databox: Current Implementation

(Architecture diagram. Component labels: Arbiter, App, Driver, Store, Container Manager, Proxy, Dashboard, User, Actuators, Sources, Bridge, Export, 3rd Parties, Registry, Core, Network, Core UI, App Store, GitHub.)

Does Databox Provide for HDI?

• App construction and installation
  • App manifests describe the data sources needed
  • User reifies requested data sources with those available on installation

• Logging and audit
  • All interactions (reads, writes) are logged and collected
  • Both real-time dashboard display and offline analysis and visualisation

• Possibility of automated risk profiling (of numeric timeseries data)
  • Based on characteristics of subject’s personal data
  • Composable across sources and app publishers
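The first two points above, manifests reified at install time and logged reads and writes, can be sketched as follows. The manifest layout, the `reify` function, the `LoggedStore` class and every source name are assumptions made for illustration; the real Databox manifest format and store API are not reproduced here.

```python
import time
from typing import Dict, List

# Hypothetical manifest: the app declares the kinds of source it needs.
manifest = {"app": "energy-tips", "needs": ["smart-meter", "thermostat"]}

# Sources actually present on this (hypothetical) databox, by kind.
available = {
    "smart-meter": ["meter-hall"],
    "thermostat": ["thermo-lounge", "thermo-bedroom"],
}


def reify(manifest: dict, choices: Dict[str, str]) -> Dict[str, str]:
    """Bind each requested kind to a concrete source chosen by the user."""
    binding = {}
    for kind in manifest["needs"]:
        chosen = choices.get(kind)
        if chosen not in available.get(kind, []):
            raise ValueError(f"no valid source chosen for {kind!r}")
        binding[kind] = chosen
    return binding


class LoggedStore:
    """Wraps reads and writes so every interaction is recorded."""

    def __init__(self) -> None:
        self.values: Dict[str, list] = {}
        self.log: List[dict] = []

    def _record(self, app: str, op: str, source: str) -> None:
        self.log.append({"t": time.time(), "app": app, "op": op, "source": source})

    def write(self, app: str, source: str, value) -> None:
        self._record(app, "write", source)
        self.values.setdefault(source, []).append(value)

    def read(self, app: str, source: str) -> list:
        self._record(app, "read", source)
        return list(self.values.get(source, []))


binding = reify(manifest, {"smart-meter": "meter-hall",
                           "thermostat": "thermo-lounge"})
store = LoggedStore()
store.write("meter-driver", binding["smart-meter"], 1.2)
readings = store.read(manifest["app"], binding["smart-meter"])

# The log is what a dashboard or offline audit would consume.
for entry in store.log:
    print(entry["app"], entry["op"], entry["source"])
```

Keeping the log as plain records means the same entries can feed both a live display and later analysis.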


Enabling Physical Interactivity

• Physical devices often easier to reason about
  • Visible; Located; Proximate; Portable

• Physical access control (“bag of keys”) is widely understood

• For example,
  • Access to our smart meter data allowed only if a green tag is in my Databox and in my wife’s Databox, or when the green tag is in one Databox and we’re both in the house

• Physical interactions providing for virtual connectivity
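The green-tag example above is a small boolean policy, and writing it out makes the two alternative conditions explicit. The sketch below is only one reading of that sentence; `HouseholdState` and its field names are hypothetical, and how a Databox would actually sense tags or presence is not covered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HouseholdState:
    """Hypothetical snapshot of tag placement and presence."""
    tag_in_my_box: bool
    tag_in_partner_box: bool
    me_at_home: bool
    partner_at_home: bool


def smart_meter_access_allowed(s: HouseholdState) -> bool:
    """Allow access if a tag is in both boxes, or in one box with both at home."""
    both_tags = s.tag_in_my_box and s.tag_in_partner_box
    one_tag = s.tag_in_my_box or s.tag_in_partner_box
    both_home = s.me_at_home and s.partner_at_home
    return both_tags or (one_tag and both_home)


# One tag present, both people at home: the second condition applies.
print(smart_meter_access_allowed(HouseholdState(True, False, True, True)))
```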


Big Data Analytics?

(Diagram labels: Big Data, Big Data Analytics, Small Data, aggregate, public, private, traditional centralised cloud.)

Big Data Analytics? Small Data Analytics!

(Diagram labels: Big Data, Big Data Analytics, Small Data, Small Data Analytics, aggregate, public, private, traditional centralised cloud, exploratory decentralised computation.)

Wide-Area Distributed Analytics

Current: centralise data so it can be processed, usually in big datacenters

First attempt: distribute models and then refine locally

(Diagram labels: batch learning, training data, MS; for each of users u1, u2, u3: inference, online learning, data d1 … di+1, models ML1 i+1 and MP i+1; cooperative learning.)

Goal? Fully distributed inference and learning

(Diagram: BSP, SSP, ASP, pBSP, pSSP and PSP arranged between two extremes. One extreme: strong consistency, slow iteration rate, fully centralised. Other extreme: weak consistency, fast iteration rate, fully distributed. Axes: consistency, completeness.)

(Diagram labels: data subject, data, processors.)
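The trade-off between consistency and iteration rate named in the diagram can be made concrete with a small Python sketch of a synchronisation check. It assumes the usual expansions of the acronyms (bulk synchronous, stale synchronous and asynchronous parallel), which the slide does not spell out, and it treats the p-prefixed variants as the same check applied to a random sample of peers, which is also an assumption. `may_advance` and its parameters are hypothetical names, not an API of any library mentioned in the talk.

```python
import random
from typing import Dict, Optional


def may_advance(worker: str, steps: Dict[str, int], staleness: Optional[int],
                sample_size: Optional[int] = None,
                rng: Optional[random.Random] = None) -> bool:
    """Decide whether `worker` may start its next step.

    staleness=0    -> BSP-like: no peer in view may be behind this worker.
    staleness=k    -> SSP-like: worker may lead the slowest peer in view by up to k.
    staleness=None -> ASP-like: never wait.
    sample_size=n  -> look at only n randomly chosen peers (probabilistic variant).
    """
    if staleness is None:
        return True
    peers = [w for w in steps if w != worker]
    if sample_size is not None:
        rng = rng or random.Random()
        peers = rng.sample(peers, min(sample_size, len(peers)))
    # With no peers in view there is nobody to wait for.
    slowest = min((steps[w] for w in peers), default=steps[worker])
    return steps[worker] - slowest <= staleness


# Completed step counts per user; u3 is lagging behind.
steps = {"u1": 5, "u2": 5, "u3": 3}
print(may_advance("u1", steps, staleness=0))     # BSP-like
print(may_advance("u1", steps, staleness=2))     # SSP-like
print(may_advance("u1", steps, staleness=None))  # ASP-like
print(may_advance("u1", steps, staleness=0, sample_size=1))  # sampled, may vary
```

Sampling removes the need for any node to know every other node's progress, which is what makes the check usable without a central coordinator, at the cost of a weaker guarantee.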

Design Challenges

• Sharing data
  • Need to support offline data collection from, e.g., mobile phones
  • Need a rendezvous and identity service for direct interconnection

• Shared data
  • No current platform is a good fit to social dynamics of a household!
  • Who manages users and groups, and how?
  • Who gets to be root?

Questions?

http://hdiresearch.org/
https://databoxproject.uk/
http://ocaml.xyz/
http://mort.io/
richard.mortier@cl.cam.ac.uk
