2020.02.06 Workshop on Federated Learning and Analytics


Page 1: 2020.02.06 Workshop on Federated Learning and Analytics


Federated learning at Google: systems, algorithms, and applications. K. Bonawitz ([email protected]). Presenting the work of many.

Workshop on Federated Learning and Analytics (FL-IBM’20), 2020.02.06

Page 2: 2020.02.06 Workshop on Federated Learning and Analytics

Federated learning

Federated learning is a machine learning setting where multiple entities (clients) collaborate in solving a machine learning problem, under the coordination of a central server or service provider. Each client's raw data is stored locally and not exchanged or transferred; instead, focused updates intended for immediate aggregation are used to achieve the learning objective.

Working definition proposed in Advances and Open Problems in Federated Learning (arXiv:1912.04977).

Page 3: 2020.02.06 Workshop on Federated Learning and Analytics

Why federated learning?

Page 4: 2020.02.06 Workshop on Federated Learning and Analytics

Data is born at the edge

Billions of phones & IoT devices constantly generate data

Data enables better products and smarter models

Page 5: 2020.02.06 Workshop on Federated Learning and Analytics

Data processing is moving on device:
● Improved latency
● Works offline
● Better battery life
● Privacy advantages

E.g., on-device inference for mobile keyboards and cameras.

Can data live at the edge?

Page 6: 2020.02.06 Workshop on Federated Learning and Analytics

Data processing is moving on device:
● Improved latency
● Works offline
● Better battery life
● Privacy advantages

E.g., on-device inference for mobile keyboards and cameras.

Can data live at the edge?

What about analytics?What about learning?

Page 7: 2020.02.06 Workshop on Federated Learning and Analytics

ML on Sensitive Data: Privacy versus Utility

[Chart: Privacy (vertical axis) versus Utility (horizontal axis)]

Page 8: 2020.02.06 Workshop on Federated Learning and Analytics

ML on Sensitive Data: Privacy versus Utility

[Chart: Privacy versus Utility, with a curve marking the perceived trade-off (“perception”)]

Page 9: 2020.02.06 Workshop on Federated Learning and Analytics

ML on Sensitive Data: Privacy versus Utility

[Chart: Privacy versus Utility, with the perceived trade-off curve and a marker for where we are today]

1. Policy
2. Technology

Page 10: 2020.02.06 Workshop on Federated Learning and Analytics

ML on Sensitive Data: Privacy versus Utility (?)

[Chart: Privacy versus Utility, with “today” on the current frontier and a “goal” beyond it]

1. Policy
2. New Technology

Push the Pareto frontier with better technology.

Make achieving high privacy and utility possible with less work.

Page 11: 2020.02.06 Workshop on Federated Learning and Analytics

[Diagram: APPLICATIONS, INFRASTRUCTURE, and RESEARCH, linked by arrows]

→ New capabilities →
← Real-world grounding ←
→ Real-world problems →
← Novel solutions ←
→ Requirements →
← Practical solutions ←

Page 12: 2020.02.06 Workshop on Federated Learning and Analytics

Early days (2017) https://ai.googleblog.com/2017/04/federated-learning-collaborative.html

Page 13: 2020.02.06 Workshop on Federated Learning and Analytics

The nascent field of federated learning

Page 14: 2020.02.06 Workshop on Federated Learning and Analytics

The nascent field of federated learning

And workshops like this one...

Page 15: 2020.02.06 Workshop on Federated Learning and Analytics

Advances and Open Problems in FL

58 authors from 25 institutions

arxiv.org/abs/1912.04977

Page 16: 2020.02.06 Workshop on Federated Learning and Analytics

The nascent field of federated learning

● Cross-device vs Cross-silo

● Data partitioning: Horizontal, Vertical, other

Page 17: 2020.02.06 Workshop on Federated Learning and Analytics

Characteristics of the federated learning setting (I)

Setting
● Datacenter distributed learning: Training a model on a large but "flat" dataset. Clients are compute nodes in a single cluster or datacenter.
● Cross-silo federated learning: Training a model on siloed data. Clients are different organizations (e.g., medical or financial) or datacenters in different geographical regions.
● Cross-device federated learning: The clients are a very large number of mobile or IoT devices.

Data distribution
● Datacenter: Data is centrally stored, so it can be shuffled and balanced across clients. Any client can read any part of the dataset.
● Cross-silo and cross-device: Data is generated locally and remains decentralized. Each client stores its own data and cannot read the data of other clients. Data is not independently or identically distributed.

Orchestration
● Datacenter: Centrally orchestrated.
● Cross-silo and cross-device: A central orchestration server/service organizes the training, but never sees raw data.

Wide-area communication
● Datacenter: None (fully connected clients in one datacenter/cluster).
● Cross-silo and cross-device: Hub-and-spoke topology, with the hub representing a coordinating service provider (typically without data) and the spokes connecting to clients.

Data availability
● Datacenter and cross-silo: All clients are almost always available.
● Cross-device: Only a fraction of clients are available at any one time, often with diurnal and other variations.

Distribution scale
● Datacenter: Typically 1-1000 clients.
● Cross-silo: Typically 2-100 clients.
● Cross-device: Massively parallel, up to 10^10 clients.

Page 18: 2020.02.06 Workshop on Federated Learning and Analytics

Characteristics of the federated learning setting (II)

Addressability
● Datacenter and cross-silo: Each client has an identity or name that allows the system to access it specifically.
● Cross-device: Clients cannot be indexed directly (i.e., no use of client identifiers).

Client statefulness
● Datacenter and cross-silo: Stateful: each client may participate in each round of the computation, carrying state from round to round.
● Cross-device: Generally stateless: each client will likely participate only once in a task, so generally we assume a fresh sample of never-before-seen clients in each round of computation.

Primary bottleneck
● Datacenter: Computation is more often the bottleneck in the datacenter, where very fast networks can be assumed.
● Cross-silo: Might be computation or communication.
● Cross-device: Communication is often the primary bottleneck, though it depends on the task. Generally, federated computations use Wi-Fi or slower connections.

Reliability of clients
● Datacenter and cross-silo: Relatively few failures.
● Cross-device: Highly unreliable: 5% or more of the clients participating in a round of computation are expected to fail or drop out (e.g., because the device becomes ineligible when battery, network, or idleness requirements for training/computation are violated).

Data partition axis
● Datacenter: Data can be partitioned / re-partitioned arbitrarily across clients.
● Cross-silo: Partition is fixed. Could be example-partitioned (horizontal) or feature-partitioned (vertical).
● Cross-device: Fixed partitioning by example (horizontal).

Page 19: 2020.02.06 Workshop on Federated Learning and Analytics

ML Engineer's Workflow

Page 20: 2020.02.06 Workshop on Federated Learning and Analytics

Model engineer workflow

[Diagram: engineer, server, cloud data. Train & evaluate on cloud data.]

Page 21: 2020.02.06 Workshop on Federated Learning and Analytics

Model deployment workflow

[Diagram: engineer, server, clients. Final model validation steps.]

Page 22: 2020.02.06 Workshop on Federated Learning and Analytics

Model deployment workflow

[Diagram: engineer, server, clients. Deploy model to devices for on-device inference.]

Page 23: 2020.02.06 Workshop on Federated Learning and Analytics

Federated training

[Diagram: engineer, server, clients. Train & evaluate on decentralized data.]

Page 24: 2020.02.06 Workshop on Federated Learning and Analytics

Federated learning

[Diagram: an opted-in device with local data asks the server: “Need me?”]

Page 25: 2020.02.06 Workshop on Federated Learning and Analytics

Federated learning

[Diagram: a device asks “Need me?”; the server answers “Not now”]

Page 26: 2020.02.06 Workshop on Federated Learning and Analytics

Federated learning

[Diagram: a device asks “Need me?”; the server answers “Yes!”]

Page 27: 2020.02.06 Workshop on Federated Learning and Analytics

Federated learning

[Diagram: the engineer's initial model is sent to a device, which computes an updated model on its local data]

Page 28: 2020.02.06 Workshop on Federated Learning and Analytics

Federated learning

[Diagram: the device computes an (ephemeral) updated model on its local data]

Page 29: 2020.02.06 Workshop on Federated Learning and Analytics

Federated learning

[Diagram: initial model out, updated model reported back]

Privacy principle: Focused collection. Devices report only what is needed for this computation.

Page 30: 2020.02.06 Workshop on Federated Learning and Analytics

Federated learning

[Diagram: initial model out, updated model reported back]

Privacy principle: Ephemeral reports. Server never persists per-device reports.

Page 31: 2020.02.06 Workshop on Federated Learning and Analytics

Federated learning

[Diagram: updated models from many devices are aggregated into a combined model]

Page 32: 2020.02.06 Workshop on Federated Learning and Analytics

Federated learning

[Diagram: updated models aggregated into a combined model]

Privacy principle: Only-in-aggregate. Engineer may only access combined device reports.

Page 33: 2020.02.06 Workshop on Federated Learning and Analytics

Federated learning

[Diagram: the combined model becomes (another) initial model for the next round]

Page 34: 2020.02.06 Workshop on Federated Learning and Analytics

Federated learning

[Diagram: the next round produces (another) combined model]

Page 35: 2020.02.06 Workshop on Federated Learning and Analytics

Federated learning

[Diagram: rounds of broadcasting (another) initial model and aggregating (another) combined model]

Typical orders of magnitude:
● 100-1000s of users per round
● 100-1000s of rounds to convergence
● 1-10 minutes per round
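To make the round structure concrete, here is a minimal, hypothetical sketch of one federated averaging round in NumPy; the names (client_update, server_round) and the single plain gradient step per client are illustrative assumptions, not the production algorithm.

import numpy as np

def client_update(model, data, lr=0.1):
    # One local gradient step on this client's data (linear model, MSE loss).
    x, y = data
    grad = 2 * x.T @ (x @ model - y) / len(y)
    return model - lr * grad

def server_round(model, client_datasets):
    # Broadcast the model, collect updates, keep only their average.
    updates = [client_update(model, d) for d in client_datasets]
    return np.mean(updates, axis=0)

rng = np.random.default_rng(0)
clients = [(rng.normal(size=(20, 3)), rng.normal(size=20))
           for _ in range(5)]  # 100-1000s of users per round in practice
model = np.zeros(3)
for _ in range(10):  # 100-1000s of rounds in practice
    model = server_round(model, clients)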

Page 36: 2020.02.06 Workshop on Federated Learning and Analytics

Federated learning at Google

500M+ installs. Daily use by multiple teams. Powering features on Pixel devices and in Gboard and Android Messages.

Page 37: 2020.02.06 Workshop on Federated Learning and Analytics

Federated Learning on Pixel Phones

Page 38: 2020.02.06 Workshop on Federated Learning and Analytics

Gboard: next-word prediction

Federated RNN (compared to prior n-gram model):
● Better next-word prediction accuracy: +24%
● More useful prediction strip: +10% more clicks

[Chart: federated model compared to baseline]

A. Hard, et al. Federated Learning for Mobile Keyboard Prediction. arXiv:1811.03604.

Page 39: 2020.02.06 Workshop on Federated Learning and Analytics

Other federated models in Gboard

Emoji prediction
● 7% more accurate emoji predictions
● +4% more prediction strip clicks
● 11% more users share emojis!

Action prediction (when is it useful to suggest a gif, sticker, or search query?)
● 47% reduction in unhelpful suggestions
● increased overall emoji, gif, and sticker shares

Discovering new words
● Federated discovery of what words people are typing that Gboard doesn’t know.

T. Yang, et al. Applied Federated Learning: Improving Google Keyboard Query Suggestions. arXiv:1812.02903.
M. Chen, et al. Federated Learning of Out-of-Vocabulary Words. arXiv:1903.10635.
Ramaswamy, et al. Federated Learning for Emoji Prediction in a Mobile Keyboard. arXiv:1906.04329.

Page 40: 2020.02.06 Workshop on Federated Learning and Analytics

Privacy-in-depth

We advocate for building federated systems wherein the privacy properties degrade as gracefully as possible in cases where one technique or another fails to provide its intended privacy contribution.

Page 41: 2020.02.06 Workshop on Federated Learning and Analytics

[Diagram: a federated learning round: initial model, engineer, updated model]

Privacy principle: Only-in-aggregate. Engineer may only access combined device reports.

Privacy principle: Ephemeral reports. Server never persists per-device reports.

Privacy principle: Focused collection. Devices report only what is needed for this computation.

Page 42: 2020.02.06 Workshop on Federated Learning and Analytics

[Diagram: initial model, engineer]

Privacy principle: Only-in-aggregate. Engineer may only access combined device reports.

Wouldn't it be great if...

Page 43: 2020.02.06 Workshop on Federated Learning and Analytics

Secure Aggregation. Existing protocols either:
● transmit a lot of data, or
● fail when users drop out
(or both)

A novel protocol: K. Bonawitz, V. Ivanov, B. Kreuter, A. Marcedone, H. B. McMahan, S. Patel, D. Ramage, A. Segal, K. Seth. Practical Secure Aggregation for Privacy-Preserving Machine Learning. CCS 2017.

Page 44: 2020.02.06 Workshop on Federated Learning and Analytics

[Diagram: Alice, Bob, Carol]

Random positive/negative pairs, aka antiparticles: devices cooperate to sample random pairs of 0-sum perturbation vectors. A matched pair sums to 0.

Page 45: 2020.02.06 Workshop on Federated Learning and Analytics

[Diagram: Alice, Bob, and Carol each hold their sampled antiparticle masks]

Random positive/negative pairs, aka antiparticles: devices cooperate to sample random pairs of 0-sum perturbation vectors.

Page 46: 2020.02.06 Workshop on Federated Learning and Analytics

[Diagram: Alice, Bob, and Carol add their antiparticles to their values before sending to the server]

Add antiparticles before sending to the server. Each contribution looks random on its own...

Page 47: 2020.02.06 Workshop on Federated Learning and Analytics

The antiparticles cancel when summing contributions.

[Diagram: the server sums the masked contributions from Alice, Bob, and Carol]

Each contribution looks random on its own... but paired antiparticles cancel out when summed.

Page 48: 2020.02.06 Workshop on Federated Learning and Analytics

Revealing the sum.

[Diagram: the ∑ of the masked contributions from Alice, Bob, and Carol equals the sum of their true values]

Each contribution looks random on its own... but paired antiparticles cancel out when summed.
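The cancellation property is easy to see in a toy sketch (illustrative only; the actual protocol cited on the next slide derives the pairwise masks from key agreement and adds recovery for dropped-out users, which this toy version omits):

import numpy as np

rng = np.random.default_rng(42)
users = ["Alice", "Bob", "Carol"]
values = {u: rng.normal(size=4) for u in users}  # each user's update

# Each pair (u, v) samples a shared random vector; u adds it and v
# subtracts it, so each matched pair of masks sums to 0.
masks = {u: np.zeros(4) for u in users}
for i, u in enumerate(users):
    for v in users[i + 1:]:
        pair_mask = rng.normal(size=4)
        masks[u] += pair_mask
        masks[v] -= pair_mask

# Each masked contribution looks random on its own...
masked = {u: values[u] + masks[u] for u in users}

# ...but the antiparticles cancel in the sum, revealing only the total.
assert np.allclose(sum(masked.values()), sum(values.values()))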

Page 49: 2020.02.06 Workshop on Federated Learning and Analytics

Secure Aggregation: Google aggregates users' updates, but cannot inspect the individual updates.

Communication efficient:
# Params: 2^20 = 1M, Bits/Param: 16, # Users: 2^10 = 1k, Expansion: 1.73x
# Params: 2^24 = 16M, Bits/Param: 16, # Users: 2^14 = 16k, Expansion: 1.98x

Secure: up to ⅓ malicious clients plus a fully observed server.

Robust: ⅓ of clients can drop out.

Interactive cryptographic protocol: in each phase, 1000 clients + server exchange messages over 4 rounds of communication.

K. Bonawitz, V. Ivanov, B. Kreuter, A. Marcedone, H. B. McMahan, S. Patel, D. Ramage, A. Segal, K. Seth. Practical Secure Aggregation for Privacy-Preserving Machine Learning. CCS 2017.

Page 50: 2020.02.06 Workshop on Federated Learning and Analytics

[Diagram: devices send clipped, noised updates; the server computes the ∑]

1. Devices “clip” their updates, limiting any one user's contribution.
2. Server adds noise when combining updates.

DP-SGD plus Federated Averaging:
M. Abadi, et al. Deep Learning with Differential Privacy. CCS 2016.
H. B. McMahan, et al. Learning Differentially Private Recurrent Language Models. ICLR 2018.
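A toy sketch of these two steps (the clipping bound and noise scale are illustrative; real deployments calibrate the noise to a target (ε, δ) guarantee):

import numpy as np

CLIP_NORM = 1.0       # illustrative L2 bound on any one user's update
NOISE_STDDEV = 0.01   # illustrative; calibrated to (ε, δ) in practice

def clip_update(update):
    # Step 1: scale the update down so its L2 norm is at most CLIP_NORM.
    norm = np.linalg.norm(update)
    return update if norm <= CLIP_NORM else update * (CLIP_NORM / norm)

def dp_average(updates, rng):
    # Step 2: average the clipped updates, then add Gaussian noise.
    clipped = [clip_update(u) for u in updates]
    noise = rng.normal(scale=NOISE_STDDEV, size=clipped[0].shape)
    return np.mean(clipped, axis=0) + noise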

Page 51: 2020.02.06 Workshop on Federated Learning and Analytics

Secure Aggregation + DP → Distributed DP

[Diagram: masked, noised device updates are combined via secure aggregation]

Page 52: 2020.02.06 Workshop on Federated Learning and Analytics

Secure Aggregation + DP → Distributed DP

[Diagram: the same pipeline, annotated with the privacy spectrum from Local DP (lower privacy / higher ε) to Central DP (high privacy / small ε)]
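A sketch of why the combination moves along this spectrum (parameters illustrative): each device adds only a small share of the noise locally, and secure aggregation hides the individually under-noised updates, so the only value ever revealed, the sum, carries central-DP-level noise.

import numpy as np

N_CLIENTS = 100
TOTAL_STDDEV = 0.1  # noise wanted on the aggregate (central-DP level)

# Gaussian noise adds in variance, so each client contributes only
# total / sqrt(n) worth of noise.
per_client_stddev = TOTAL_STDDEV / np.sqrt(N_CLIENTS)

rng = np.random.default_rng(0)
updates = [rng.normal(size=3) for _ in range(N_CLIENTS)]
noised = [u + rng.normal(scale=per_client_stddev, size=3) for u in updates]

# Secure aggregation reveals only this sum, whose noise has stddev
# sqrt(n) * per_client_stddev = TOTAL_STDDEV.
aggregate = np.sum(noised, axis=0)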

Page 53: 2020.02.06 Workshop on Federated Learning and Analytics

ML Engineer's Workflow

Page 54: 2020.02.06 Workshop on Federated Learning and Analytics

Federated training

[Diagram: engineer, server, clients. Train & evaluate on decentralized data.]

Page 55: 2020.02.06 Workshop on Federated Learning and Analytics

[Diagram: engineer, server, cloud data. Train & evaluate on cloud data.]

This is what we like to see ...

Page 56: 2020.02.06 Workshop on Federated Learning and Analytics

[Diagram: engineer, server, cloud data. Train & evaluate on cloud data.]

… but sometimes we see this: Gahhh!

Page 57: 2020.02.06 Workshop on Federated Learning and Analytics

Typical ML Tasks requiring data inspection

Augenstein, et al. Generative Models for Effective ML on Private, Decentralized Datasets. arXiv, 2019.

Page 58: 2020.02.06 Workshop on Federated Learning and Analytics

Typical ML Tasks requiring representative examples

Augenstein, et al. Generative Models for Effective ML on Private, Decentralized Datasets. arXiv, 2019.

Page 59: 2020.02.06 Workshop on Federated Learning and Analytics

Example Federated GAN Problem

Page 60: 2020.02.06 Workshop on Federated Learning and Analytics

Example Federated GAN Problem

After an application update, the classification accuracy drops.

Page 61: 2020.02.06 Workshop on Federated Learning and Analytics

Example Federated GAN Problem

Train 2 GANs: one on the subset of data exhibiting high classification accuracy, and another on the subset exhibiting low classification accuracy.

Page 62: 2020.02.06 Workshop on Federated Learning and Analytics

Example Federated GAN Results

[Figure: examples of real data on devices in each sub-population, alongside GAN samples after 0 and after 1000 rounds]

Population: EMNIST dataset; 50% of devices have their images ‘flipped’ (black <-> white).
Sub-populations: devices where data classifies with ‘low’ accuracy; devices where data classifies with ‘high’ accuracy.

Page 63: 2020.02.06 Workshop on Federated Learning and Analytics

Example Federated GAN Results

[Figure: real data per sub-population, with GAN samples after 1 round and after 1000 rounds]

Population: EMNIST dataset; 50% of devices have their images ‘flipped’ (black <-> white).
Sub-populations: devices where data classifies with ‘low’ accuracy; devices where data classifies with ‘high’ accuracy.

Page 64: 2020.02.06 Workshop on Federated Learning and Analytics

Example Federated GAN Results

[Figure: the same comparison of real data per sub-population with GAN samples after 1 round and after 1000 rounds]

Page 65: 2020.02.06 Workshop on Federated Learning and Analytics

Example Federated GAN Results

[Figure: GAN samples after 1000 rounds next to real data from the sub-population]

Now the modeler can discern this difference... indicating that this (the flipped images) is the problem.

Page 66: 2020.02.06 Workshop on Federated Learning and Analytics

Development environment designed specifically for FL:
● Language that combines TF and communication (embedded in Python)
● Libraries of FL algorithms (tff.learning) expressed in this language
● Runtimes, datasets, examples, etc., for (simulation-based) experiments

Part of the TensorFlow ecosystem: tensorflow.org/federated

OSS project on GitHub: github.com/tensorflow/federated

Page 67: 2020.02.06 Workshop on Federated Learning and Analytics
Page 68: 2020.02.06 Workshop on Federated Learning and Analytics

Easy way to get started exploring the FL space on your own:
● Pseudocode-like style of programming, high-level and compact
● Reference implementations of core FL algorithms such as federated averaging that you can fork/modify
● Preprocessed datasets and some standard models (more coming, contribute your own)
● Modular and configurable simulation environment (Python notebooks)
● Repro of research (emerging), including simulation scripts, models, hyperparameters, to fork, modify, and experiment with

Page 69: 2020.02.06 Workshop on Federated Learning and Analytics

A way to leverage the latest FL research in your application:
● Designed from day 1 to facilitate deployment to physical devices
● Designed for smooth transition from simulations into production
○ Federated learning logic is expressed in a platform- and language-independent manner, so your code does not have to change during this transition
● Designed for composability and hackability
○ Explicit mechanisms for expressing FL code as reusable and stackable modules
○ Code structure that's easy to read and modify
● Actively used at Google, integrated with our production infrastructure
● Deployment options (emerging), interfaces and tools

Page 70: 2020.02.06 Workshop on Federated Learning and Analytics

* Embedded in Python.

Page 71: 2020.02.06 Workshop on Federated Learning and Analytics

Communication is an integral part of your application logic!
● Canned algorithms don't always work out of the box
○ You may have to try different algorithms
○ Your specific use case may call for engineering a custom communication pattern
● A given deployment scenario may call for additional ingredients
○ Compression, differential privacy, adaptive, stateful, multi-round algorithms, etc.

Existing tools offer inadequate communication abstractions:
● Point-to-point messaging, checkpoints, etc. are much too low-level
● Allreduce-like abstractions are not a good fit for mobile device deployment
● No first-class support from the type system, etc.

Page 72: 2020.02.06 Workshop on Federated Learning and Analytics

Portability between research and production is essential:
● Effective development may only be feasible on a live deployed system
○ E.g., by evaluating ideas by training and evaluating in “dry mode”
● Reduced friction for deploying new research algorithms in production
○ Plus ability to use the simulation framework to test production code

Consequences:
● For maximum portability, code should be platform/language-agnostic
● Program logic should be expressed declaratively to support:
○ Ability to compile to diverse platforms
○ Ability to statically analyze all code to verify that it has the properties we want

Page 73: 2020.02.06 Workshop on Federated Learning and Analytics
Page 74: 2020.02.06 Workshop on Federated Learning and Analytics

CLIENTS

Page 75: 2020.02.06 Workshop on Federated Learning and Analytics

CLIENTS

[Diagram: client devices holding the values 68.0, 70.5, 69.8, 70.1]

a local item of data of type float32 (e.g., a sensor reading or a model weight)

Page 76: 2020.02.06 Workshop on Federated Learning and Analytics

CLIENTS

[Diagram: the values 68.0, 70.5, 69.8, 70.1 across client devices]

a local item of data of type float32 (e.g., a sensor reading or a model weight); together, a “federated value” (a multi-set)

Page 77: 2020.02.06 Workshop on Federated Learning and Analytics

CLIENTS

[Diagram: the values 68.0, 70.5, 69.8, 70.1 across client devices]

a local item of data of type float32 (e.g., a sensor reading or a model weight); together, a “federated value” (a multi-set), which has type {float32}@CLIENTS

Page 78: 2020.02.06 Workshop on Federated Learning and Analytics

CLIENTS

[Diagram: the values 68.0, 70.5, 69.8, 70.1 across client devices]

a local item of data of type float32 (e.g., a sensor reading or a model weight); together, a “federated value” (a multi-set), which has type {float32}@CLIENTS, where float32 is the type of local items on each client and @CLIENTS is the “placement”

Page 79: 2020.02.06 Workshop on Federated Learning and Analytics

[Diagram: the client values 68.0, 70.5, 69.8, 70.1, and a SERVER]

a “federated value” (a multi-set) of type {float32}@CLIENTS: float32 is the type of local items on each client; @CLIENTS is the “placement”

Page 80: 2020.02.06 Workshop on Federated Learning and Analytics

[Diagram: the client values 68.0, 70.5, 69.8, 70.1 of type {float32}@CLIENTS flow toward the SERVER, which holds “?”]

Page 81: 2020.02.06 Workshop on Federated Learning and Analytics

[Diagram: {float32}@CLIENTS on the clients; a yet-unknown value of type float32@SERVER on the server]

Page 82: 2020.02.06 Workshop on Federated Learning and Analytics

[Diagram: distributed aggregation turns 68.0, 70.5, 69.8, 70.1 ({float32}@CLIENTS) into 69.5 (float32@SERVER)]

Page 83: 2020.02.06 Workshop on Federated Learning and Analytics

[Diagram: distributed aggregation from {float32}@CLIENTS to 69.5 as float32@SERVER]

A federated “op” can be interpreted as a function, even though its inputs and outputs are in different places.

Page 84: 2020.02.06 Workshop on Federated Learning and Analytics

[Diagram: distributed aggregation from the client values to 69.5 on the SERVER]

A federated “op” can be interpreted as a function, even though its inputs and outputs are in different places:
{float32}@CLIENTS → float32@SERVER

Page 85: 2020.02.06 Workshop on Federated Learning and Analytics

[Diagram: tff.federated_mean turns the client values into 69.5 on the SERVER]

tff.federated_mean can be interpreted as a function even though its inputs and outputs are in different places, {float32}@CLIENTS → float32@SERVER; it represents an abstract specification of a distributed communication protocol.

Page 86: 2020.02.06 Workshop on Federated Learning and Analytics

READINGS_TYPE = tff.FederatedType(tf.float32, tff.CLIENTS)

# An abstract specification of a simple distributed system
@tff.federated_computation(READINGS_TYPE)
def get_average_temperature(sensor_readings):
  return tff.federated_mean(sensor_readings)


Page 89: 2020.02.06 Workshop on Federated Learning and Analytics

@tff.federated_computation(READINGS_TYPE)
def get_average_temperature(sensor_readings):
  return tff.federated_mean(sensor_readings)

What does “get_average_temperature” represent?
● The body of the Python function was traced once, disposed of, and replaced by a serialized abstract representation in TFF's language: an instance of TFF's computation.proto (no longer Python code).
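For example (a small sketch, assuming the definitions above have been run in a session with TensorFlow Federated imported as tff), the serialized computation still carries its federated type signature:

print(get_average_temperature.type_signature)
# ({float32}@CLIENTS -> float32@SERVER)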

Page 90: 2020.02.06 Workshop on Federated Learning and Analytics
Page 91: 2020.02.06 Workshop on Federated Learning and Analytics

[Diagram: temperature sensor readings (an input) on the clients; a temperature threshold (an input) on the server; output: % of sensor readings > threshold]

Page 92: 2020.02.06 Workshop on Federated Learning and Analytics

[Diagram: the temperature threshold (an input) is sent to the clients via federated broadcast; output: % of sensor readings > threshold]

Page 93: 2020.02.06 Workshop on Federated Learning and Analytics

[Diagram: after the federated broadcast, a federated map applies “> threshold, to_float” on each client, yielding 1, 0, 1, ...; output: % of sensor readings > threshold]

Page 94: 2020.02.06 Workshop on Federated Learning and Analytics

[Diagram: federated broadcast of the threshold; federated map of “> threshold, to_float” on each client (1, 0, 1, ...); federated mean of the results; output: % of sensor readings > threshold]

Page 95: 2020.02.06 Workshop on Federated Learning and Analytics

@tff.federated_computation
def get_fraction_over_threshold(readings, threshold):
  return ...

(readings is the client-side input; threshold is the server-side input)

Page 96: 2020.02.06 Workshop on Federated Learning and Analytics

@tff.federated_computation
def get_fraction_over_threshold(readings, threshold):
  return tff.federated_mean(
      tff.federated_map(
          exceeds_threshold_fn,
          [readings, tff.federated_broadcast(threshold)]))

(collective operations and communication)

Page 97: 2020.02.06 Workshop on Federated Learning and Analytics

@tff.tf_computation
def exceeds_threshold_fn(reading, threshold):
  return tf.to_float(reading > threshold)

@tff.federated_computation
def get_fraction_over_threshold(readings, threshold):
  return tff.federated_mean(
      tff.federated_map(
          exceeds_threshold_fn,
          [readings, tff.federated_broadcast(threshold)]))

(local on-device processing)

Page 98: 2020.02.06 Workshop on Federated Learning and Analytics

READINGS_TYPE = tff.FederatedType(tf.float32, tff.CLIENTS)
THRESHOLD_TYPE = tff.FederatedType(tf.float32, tff.SERVER)

@tff.tf_computation(tf.float32, tf.float32)
def exceeds_threshold_fn(reading, threshold):
  return tf.to_float(reading > threshold)

@tff.federated_computation(READINGS_TYPE, THRESHOLD_TYPE)
def get_fraction_over_threshold(readings, threshold):
  return tff.federated_mean(
      tff.federated_map(
          exceeds_threshold_fn,
          [readings, tff.federated_broadcast(threshold)]))

(types)
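Invoked in the default simulation runtime, the computation then behaves like an ordinary function (a sketch with illustrative readings; three of the four exceed the threshold):

get_fraction_over_threshold([68.0, 70.5, 69.8, 70.1], 69.0)
# 0.75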

Page 99: 2020.02.06 Workshop on Federated Learning and Analytics
Page 100: 2020.02.06 Workshop on Federated Learning and Analytics
Page 101: 2020.02.06 Workshop on Federated Learning and Analytics

@tff.federated_computation(
    SERVER_MODEL_TYPE, SERVER_FLOAT_TYPE, CLIENT_DATA_TYPE)
def federated_train(model, learning_rate, data):
  return ...

(model: the initial model on the server; learning_rate: a server-supplied learning rate; data: on-device data)

Page 102: 2020.02.06 Workshop on Federated Learning and Analytics

@tff.federated_computation(
    SERVER_MODEL_TYPE, SERVER_FLOAT_TYPE, CLIENT_DATA_TYPE)
def federated_train(model, learning_rate, data):
  return tff.federated_mean(
      tff.federated_map(
          local_train,
          [tff.federated_broadcast(model),
           tff.federated_broadcast(learning_rate),
           data]))

(server-to-client communication)

Page 103: 2020.02.06 Workshop on Federated Learning and Analytics

(the same computation as on Page 102)

Everything needed for local training is now on the clients.

Page 104: 2020.02.06 Workshop on Federated Learning and Analytics

(the same computation as on Page 102)

Clients train locally: local_train is another computation.

Page 105: 2020.02.06 Workshop on Federated Learning and Analytics

(the same computation as on Page 102)

Averaging the locally trained models; a sketch of local_train follows.
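The slides do not show local_train itself; in the spirit of the TFF custom-algorithms tutorials, a minimal sketch might fold a per-batch training step over the client's sequence of batches (MODEL_TYPE, BATCH_TYPE, and batch_train are assumed to be defined elsewhere; all names here are illustrative):

LOCAL_DATA_TYPE = tff.SequenceType(BATCH_TYPE)

@tff.federated_computation(MODEL_TYPE, tf.float32, LOCAL_DATA_TYPE)
def local_train(initial_model, learning_rate, all_batches):

  @tff.federated_computation(MODEL_TYPE, BATCH_TYPE)
  def batch_fn(model, batch):
    # One local training step on a single batch.
    return batch_train(model, batch, learning_rate)

  # Fold the per-batch step over this client's batches.
  return tff.sequence_reduce(all_batches, initial_model, batch_fn)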

Page 106: 2020.02.06 Workshop on Federated Learning and Analytics
Page 107: 2020.02.06 Workshop on Federated Learning and Analytics

How to inject compression when broadcasting data:

tff.federated_map(decode,
    tff.federated_broadcast(
        tff.federated_apply(encode, x)))

How to inject differential privacy when aggregating:

tff.federated_mean(
    tff.federated_map(y → clip(y) + noise, x))

NOTE: Showing a lambda expression here in a simplified form for the sake of clarity; you would define a TFF computation (it's shown in the tutorial).

[Diagram: the initial model flows down to clients; locally trained models flow back up]
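Spelled out, the simplified lambda might become a proper TFF computation along these lines (a sketch; the clipping bound and the noise scale are illustrative):

@tff.tf_computation(tf.float32)
def clip_and_noise(y):
  # Clip to an illustrative bound, then add Gaussian noise.
  clipped = tf.clip_by_value(y, -1.0, 1.0)
  return clipped + tf.random.normal([], stddev=0.1)

result = tff.federated_mean(tff.federated_map(clip_and_noise, x))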

Page 108: 2020.02.06 Workshop on Federated Learning and Analytics
Page 109: 2020.02.06 Workshop on Federated Learning and Analytics

Calling a TFF computation like a Python function:
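For example (a sketch reusing the temperature computation from earlier; the readings are illustrative):

get_average_temperature([68.0, 70.5, 69.8, 70.1])
# 69.6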

Page 110: 2020.02.06 Workshop on Federated Learning and Analytics
Page 111: 2020.02.06 Workshop on Federated Learning and Analytics
Page 112: 2020.02.06 Workshop on Federated Learning and Analytics

train_data, test_data = tff.simulation.datasets.emnist.load_data()

(create an object that represents training and test data)

Page 113: 2020.02.06 Workshop on Federated Learning and Analytics

train_data, test_data = tff.simulation.datasets.emnist.load_data()
all_clients = train_data.client_ids

(obtain the list of client ids; only accessible to the code that sets up the experiment loop in Python)

Page 114: 2020.02.06 Workshop on Federated Learning and Analytics

train_data, test_data = tff.simulation.datasets.emnist.load_data()
all_clients = train_data.client_ids
... = train_data.create_tf_dataset_for_client(...)

(construct an eager tf.data.Dataset for a given client)

Page 115: 2020.02.06 Workshop on Federated Learning and Analytics

train_data, test_data = tff.simulation.datasets.emnist.load_data()
all_clients = train_data.client_ids
... = train_data.create_tf_dataset_for_client(...)

for round_num in range(5):
  clients_selected_in_this_round = random.sample(all_clients, 10)

(in each round, simulate client selection, in Python)

Page 116: 2020.02.06 Workshop on Federated Learning and Analytics

train_data, test_data = tff.simulation.datasets.emnist.load_data()
all_clients = train_data.client_ids
... = train_data.create_tf_dataset_for_client(...)

for round_num in range(5):
  clients_selected_in_this_round = random.sample(all_clients, 10)
  federated_train_data = [
      train_data.create_tf_dataset_for_client(c).repeat(10)
      for c in clients_selected_in_this_round]
  # Run the computation...

(construct and post-process eager tf.data.Datasets for these clients)

Page 117: 2020.02.06 Workshop on Federated Learning and Analytics
Page 118: 2020.02.06 Workshop on Federated Learning and Analytics

tff.learning

Page 119: 2020.02.06 Workshop on Federated Learning and Analytics

tff.learning

model_fn = lambda: tff.learning.from_keras_model( … )

(absorb an existing Keras model for use in TFF)

Page 120: 2020.02.06 Workshop on Federated Learning and Analytics

tff.learning

model_fn = lambda: tff.learning.from_keras_model( … )

train = tff.learning.build_federated_averaging_process(model_fn)
eval = tff.learning.build_federated_evaluation(model_fn)

(TFF computations for training and evaluation)

Page 121: 2020.02.06 Workshop on Federated Learning and Analytics

tff.learning

model_fn = lambda: tff.learning.from_keras_model( … )

train = tff.learning.build_federated_averaging_process(model_fn)
eval = tff.learning.build_federated_evaluation(model_fn)

state = train.initialize()

(create server state for the first round)

Page 122: 2020.02.06 Workshop on Federated Learning and Analytics

tff.learning

model_fn = lambda: tff.learning.from_keras_model( … )

train = tff.learning.build_federated_averaging_process(model_fn)
eval = tff.learning.build_federated_evaluation(model_fn)

state = train.initialize()
for _ in range(5):
  client_data = …

(loop over rounds, pick a slice of client data in each, as shown a few slides ago)

Page 123: 2020.02.06 Workshop on Federated Learning and Analytics

tff.learning

model_fn = lambda: tff.learning.from_keras_model( … )

train = tff.learning.build_federated_averaging_process(model_fn)
eval = tff.learning.build_federated_evaluation(model_fn)

state = train.initialize()
for _ in range(5):
  client_data = …
  state, metrics = train.next(state, client_data)

(run a single round of training, producing new server state and metrics)

Page 124: 2020.02.06 Workshop on Federated Learning and Analytics

tff.learning

model_fn = lambda: tff.learning.from_keras_model( … )

train = tff.learning.build_federated_averaging_process(model_fn)
eval = tff.learning.build_federated_evaluation(model_fn)

state = train.initialize()
for _ in range(5):
  client_data = …
  state, metrics = train.next(state, client_data)

metrics = eval(state.model, ...)

(extract the trained model and evaluate it)

Page 125: 2020.02.06 Workshop on Federated Learning and Analytics
Page 126: 2020.02.06 Workshop on Federated Learning and Analytics

Modular framework for runtimes (in tff.framework):
● Provided single-machine multi-threaded executor (shown in tutorials)

tff.framework.set_default_executor(
    tff.framework.create_local_executor())

● More ready-to-use setups emerging
○ GCP/GKE
● Can set up custom executor stacks from building blocks
○ Multi-machine, multi-tier, GPU-enabled, etc.
● Can contribute executor components to the framework
○ Abstract interface tff.framework.Executor
○ Alternatively, the gRPC variant of this interface

Page 127: 2020.02.06 Workshop on Federated Learning and Analytics
Page 128: 2020.02.06 Workshop on Federated Learning and Analytics

Two kinds of approaches viable today:
● Plug devices as components into TFF's simulation runtime framework
○ e.g., as custom executors, via gRPC
● Use the (emerging) compiler toolset to generate executable artifacts
○ e.g., see tff.backends.mapreduce

More deployment options on the way!

Page 129: 2020.02.06 Workshop on Federated Learning and Analytics
Page 130: 2020.02.06 Workshop on Federated Learning and Analytics

All that you've seen is open-source, available on GitHub:
● github.com/tensorflow/federated
● tensorflow.org/federated

Many ways to contribute to the emerging TFF ecosystem:
● Apply the tff.learning API to existing ML models and data
● Develop new federated algorithms using TFF abstractions
● Help evolve core abstractions to make TFF more expressive
● Help improve usability and evolve libraries built around TFF
● Integrate with new backends to expand deployment options

Page 131: 2020.02.06 Workshop on Federated Learning and Analytics

[Diagram: federated training across clients, server, engineer, and admin, followed by model deployment to devices]

What can the device see? What can the network see? What can the server see? What can the server admin see? What can the engineer see? What can the world see?

Improving privacy:

Privacy principles: Focused collection · Minimize data exposure · Anonymous / ephemeral collection · Only-in-aggregate release · No memorization of individuals' data

Technologies: Federated Learning · Federated Analytics · Secure Aggregation · Private Retrieval · Differential Privacy · Encryption at rest and on the wire · Limited retention time · Computing on encrypted values