andreas hellander, salman toor department of information ... · 4/23/2019 · promises to let...
![Page 1: Andreas Hellander, Salman Toor Department of Information ... · 4/23/2019 · Promises to let parties collaborate to build stronger models than what could be attained the parties](https://reader036.vdocument.in/reader036/viewer/2022070907/5f7bd647361ca832a528fdd8/html5/thumbnails/1.jpg)
Privacy-preserving federated machine learning
Andreas Hellander, Salman Toor
Department of Information Technology, Division of Scientific Computing, Uppsala University
Scaleout: www.scaleoutsystems.com
Distributed Computing Applications @ IT/UU
https://www.it.uu.se/research/group/dca
Federated Machine Learning
● Federated Conformal Prediction
● Algorithms for FedML
● FedML security, Blockchain
Cloud Computing
Data Engineering Sciences
● Hierarchical analysis of spatial and temporal image data (HASTE)
● Parallel, peer-to-peer streaming
● Intelligent storage backends
● Continuous analytics
Scaleout
Bridging the gap between research and production-grade systems in machine learning
Background
● Founded out of three research teams at Uppsala University.
● Applied focus on large scale production infrastructure in computational science and biotech research.
Expertise
● Cloud architecture
● Machine learning pipelines
● Continuous analytics
● Scientific data management
Cases
● SNIC cloud
● SciLifeLab
● IIS
● Rymdstyrelsen
● Safespring
The centralized ML paradigm
[Figure: Data Store 1, Data Store 2 and Data Store 3 feed a Central Data Store; the machine learning model built on it receives queries and returns predictions.]
1. Centralize data from different sources (data lake, cloud).
2. Create an ML model using the centralized data (cluster computing).
But in many cases we cannot move data
Private/Proprietary Data
Regulated Data
Big data
Central Data Store
Machine Learning model
How can parties construct joint ML models without sharing/pooling data?
Federated machine learning
1. Train a local machine learning model on local/private data.
2. Combine local model updates into a global, federated model.
Smart software on top of decentralized infrastructure/instruments
● Lets a supplier of physical infrastructure/instruments build smart software to support all clients.
● Calibration, predictive maintenance etc.
● Customer A’s data is never shared with Customer B, or with the supplier.
● High-value, unique software offering for those using the FedML services.
Federated Model
Software services
Federated learning system
Infrastructure vendor
Integrity-preserving smart homes
● Digital tools/video surveillance in home care.
Train and deploy models based on homeowners’ private interactions without collecting central data.
Integrity preserving fleet management
● Model driver/staff behavior without compromising their integrity.
● Big data, poor connectivity
By Éric Chassaing - Own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=8876959
Key benefit of federated learning
Promises to let parties collaborate to build stronger models than what could be attained by the parties in isolation.
● This example uses incremental learning of linear models to do FedML.
● Stochastic Gradient Descent.
● One of many possible approaches to decentralized model construction.
N. Gauraha, O. Spjuth, A. Hellander (2019), manuscript in preparation.
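The incremental approach named in the bullets can be sketched as follows. This is a minimal illustration, not the method of the cited manuscript: it assumes a least-squares linear model, synthetic data, and a simple scheme where the current model visits each party in turn and is updated by one SGD pass on that party's private data. All function names, data and hyperparameters are hypothetical.

```python
import random

def sgd_pass(weights, bias, data, lr=0.05):
    """One pass of plain SGD for a least-squares linear model over `data`."""
    for x, y in data:
        err = sum(w * xi for w, xi in zip(weights, x)) + bias - y
        weights = [w - lr * err * xi for w, xi in zip(weights, x)]
        bias -= lr * err
    return weights, bias

def make_party_data(rng, n, true_w, true_b, noise=0.1):
    """Synthetic private dataset for one party (illustrative data only)."""
    data = []
    for _ in range(n):
        x = [rng.uniform(-1, 1) for _ in true_w]
        y = sum(w * xi for w, xi in zip(true_w, x)) + true_b + rng.gauss(0, noise)
        data.append((x, y))
    return data

rng = random.Random(0)
TRUE_W, TRUE_B = [2.0, -1.0], 0.5
parties = [make_party_data(rng, 200, TRUE_W, TRUE_B) for _ in range(3)]

# Only the model parameters travel between parties; the data stays local.
weights, bias = [0.0, 0.0], 0.0
for _ in range(20):                  # rounds over the alliance
    for local_data in parties:       # the model visits each party in turn
        weights, bias = sgd_pass(weights, bias, local_data)

print(weights, bias)
```

The final parameters should end up close to `TRUE_W` and `TRUE_B`, although no party ever saw another party's samples.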
[Figure: decision diagram]
Questions (each answered Yes/No): Ok to share data? Ok to share features? Ok to share model/parameters?
Outcomes: Standard ML on pooled data; Privacy-preserving/data protecting ML; Create one joint model; Combine predictions of separate models
Our focus area (highlighted region of the diagram)
Example: FedML on Gboard
● Local model for search suggestion, with context and whether the suggestion was clicked
● On device the history is processed, and then only a model update is suggested to
● Based on Federated Averaging
https://ai.googleblog.com/2017/04/federated-learning-collaborative.html
Federated Averaging
From McMahan et al. https://arxiv.org/abs/1602.05629
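1. Out of K alliance members/clients, pick a fraction C to take part in each global model update.
2. Each selected client performs E epochs of SGD on its local data, using minibatches of size B.
3. Average the locally updated weights (weighted by local dataset size in McMahan et al.) to form the new global model.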
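The three steps can be sketched in plain Python. This is an illustration under assumptions, not the reference implementation from McMahan et al.: the model is a least-squares linear model, the client data is synthetic, and the helper names (`local_update`, `fedavg_round`) and hyperparameter values are hypothetical. The parameters K, C, E and B have the meaning given in the steps.

```python
import random

def local_update(w, b, data, epochs, batch_size, lr, rng):
    """E epochs of minibatch SGD on one client's local data."""
    data = list(data)  # copy so shuffling does not reorder the client's own list
    for _ in range(epochs):
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            grad_w = [0.0] * len(w)
            grad_b = 0.0
            for x, y in batch:
                err = sum(wi * xi for wi, xi in zip(w, x)) + b - y
                grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
                grad_b += err
            w = [wi - lr * g / len(batch) for wi, g in zip(w, grad_w)]
            b -= lr * grad_b / len(batch)
    return w, b, len(data)

def fedavg_round(w, b, clients, C, E, B, lr, rng):
    """One global round: select clients, train locally, average the weights."""
    m = max(1, round(C * len(clients)))           # step 1: fraction C of K clients
    selected = rng.sample(clients, m)
    updates = [local_update(w, b, d, E, B, lr, rng) for d in selected]  # step 2
    total = sum(n for _, _, n in updates)
    # Step 3: average, weighting each client by its number of local samples.
    new_w = [sum(wk[i] * n for wk, _, n in updates) / total for i in range(len(w))]
    new_b = sum(bk * n for _, bk, n in updates) / total
    return new_w, new_b

rng = random.Random(1)
TRUE_W, TRUE_B = [1.5, -2.0], 0.3

def make_client(n):
    """Synthetic local dataset (illustrative data only)."""
    data = []
    for _ in range(n):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        y = TRUE_W[0] * x[0] + TRUE_W[1] * x[1] + TRUE_B + rng.gauss(0, 0.1)
        data.append((x, y))
    return data

clients = [make_client(rng.randint(30, 80)) for _ in range(10)]  # K = 10

global_w, global_b = [0.0, 0.0], 0.0
for _ in range(30):
    global_w, global_b = fedavg_round(global_w, global_b, clients,
                                      C=0.5, E=5, B=10, lr=0.1, rng=rng)
print(global_w, global_b)
```

Raising E reduces the number of communication rounds at the cost of more local computation, which is the main tuning knob this scheme exposes.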
Different ways to do FedML:
● Federated averaging with stochastic gradient descent
● Using incremental learners
● Ensemble methods
● Hybrids between the above
FedML taxonomy: https://docs.google.com/spreadsheets/d/1SCwwkS_tUw-yAVMJZltJSt3NmhA_w6JG7xNJYkE8ORs/edit#gid=0
Privacy-preserving conformal prediction
Conformal prediction is a class of ML methods that give valid measures of model performance.
● Valid (based on a rigorous mathematical framework) prediction intervals/sets.
● Can be used with any standard machine learning method.
● No need for priors (unlike Bayesian learning).
● Removes the need to talk about “domain of applicability”.
● Very interesting in the context of FedML since this class of methods gives a reliable way to measure global model performance/improvement.
Ola Spjuth, Assoc. Prof. at UU. Lead scientist AI at Scaleout
Gauraha, N. and Spjuth, O. Synergy Conformal Prediction for Regression DiVA preprint. 1288708 (2019). URL: www.diva-portal.org/smash/get/diva2:1288708/FULLTEXT01.pdf
Gauraha, N. and Spjuth, O. Synergy Conformal Prediction DiVA preprint. 360504 (2018). URL: urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-360504
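To make "valid prediction intervals" concrete, here is a minimal sketch of standard split (inductive) conformal regression on a single dataset. It is not the synergy conformal prediction method of the cited preprints; the data, the one-dimensional least-squares model and all names are assumptions chosen for illustration. The underlying model could be swapped for any other regressor without changing the calibration step.

```python
import math
import random

def fit_line(data):
    """Closed-form 1-D least squares; stands in for 'any standard ML method'."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in data)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def predict(model, x):
    slope, intercept = model
    return slope * x + intercept

def conformal_half_width(model, calibration, alpha):
    """Half-width q such that [pred - q, pred + q] covers with prob. >= 1 - alpha."""
    scores = sorted(abs(y - predict(model, x)) for x, y in calibration)
    k = math.ceil((len(scores) + 1) * (1 - alpha))
    return scores[k - 1] if k <= len(scores) else math.inf

rng = random.Random(42)

def sample(n):
    """Synthetic data: y = 3x + 1 plus Gaussian noise (illustrative only)."""
    points = []
    for _ in range(n):
        x = rng.uniform(0, 10)
        points.append((x, 3.0 * x + 1.0 + rng.gauss(0, 0.5)))
    return points

train, calibration, test = sample(200), sample(500), sample(2000)
model = fit_line(train)
half_width = conformal_half_width(model, calibration, alpha=0.1)

covered = sum(1 for x, y in test if abs(y - predict(model, x)) <= half_width)
coverage = covered / len(test)
print(half_width, coverage)
```

The empirical coverage on held-out data should land near the requested 90 percent, which is the sense in which the intervals are valid without any prior.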
Privacy-preservation properties of FedML?
● Input privacy is simplified since data stays local (handled according to local policies).
● Output privacy depends on the algorithm, how easy it is to invert the model, etc.
● What can be learned from the coordination of computation?
○ Different for federated averaging and ensemble methods.
UN Handbook for Privacy-Preserving Techniques: https://docs.google.com/document/d/1GYu6UJI81jR8LgooXVDsYk1s6FlM-SbOvo3oLHglFhY/mobilebasic
Privacy & security
Apart from “standard security” (data at rest and in transit), a number of techniques can be used to enhance privacy:
● Differential privacy (add noise to data)
● Homomorphic encryption (compute directly on encrypted data)
● Secure multiparty computation (emulate a trusted third party)
● Secure enclaves (a hardware solution to private computations)
Homomorphic encryption
● SEAL (Microsoft): https://github.com/Microsoft/SEAL
● HElib (IBM): https://github.com/shaih/HElib
● PALISADE: https://git.njit.edu/palisade/PALISADE
Computations directly on encrypted data producing encrypted results.
● Outsourced secure computations.
● “Secure pooling of data”
● Still not feasible for real world ML tasks.
● In FedML we do not need to outsource computations, except for parts such as secure aggregation of model weights / scores etc. For those parts of the algorithm, HE can be a viable option.
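As an illustration of the last bullet, the sketch below aggregates model weights under encryption using a toy Paillier cryptosystem, which is additively homomorphic. This is an assumption made for the example: it is not the API or the scheme family of SEAL, HElib or PALISADE, the primes are far too small to be secure, and Python's `random` is not a cryptographic generator. It also assumes the private key is held by a party other than the aggregator. Weights are fixed-point encoded with a hypothetical `SCALE` factor.

```python
import math
import random

def keygen(p, q):
    """Toy Paillier key generation with generator g = n + 1."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)                               # modular inverse (Python 3.8+)
    return n, (lam, mu)

def encrypt(n, m, rng):
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m % n, n2) * pow(r, n, n2)) % n2

def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts modulo n."""
    return (c1 * c2) % (n * n)

def decrypt(n, private_key, c):
    lam, mu = private_key
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n

SCALE = 1000  # fixed-point precision for weights (illustrative choice)

def encode(weight, n):
    return round(weight * SCALE) % n

def decode(value, n):
    if value > n // 2:      # upper half of the range represents negative numbers
        value -= n
    return value / SCALE

rng = random.Random(7)
n, private_key = keygen(104729, 1299709)   # small primes, NOT secure

client_weights = [[0.5, -1.25], [0.25, 0.75], [-0.125, 2.0]]
encrypted = [[encrypt(n, encode(w, n), rng) for w in ws] for ws in client_weights]

# The aggregator only ever sees ciphertexts.
aggregate = encrypted[0]
for update in encrypted[1:]:
    aggregate = [add_encrypted(n, a, c) for a, c in zip(aggregate, update)]

summed = [decode(decrypt(n, private_key, c), n) for c in aggregate]
print(summed)  # [0.625, 1.5]
```

Dividing the decrypted sums by the number of clients gives the averaged weights; only that aggregate is ever revealed in the clear.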
Secure multiparty computation
(Secure computation, MPC, privacy preserving computation)
Parties P_1, ..., P_N, each with private data x_1, ..., x_N, want to compute y = f(x_1, ..., x_N).
● No trust amongst the parties P_i.
● Do not want to trust a third party to compute f.
● MPC deals with protocols to emulate a trusted third party.
● Highly active area of research; a hard problem for large N and a large fraction of dishonest members.
In FedML, see e.g. the PySyft project, MPC in PyTorch: https://github.com/OpenMined/PySyft
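A small sketch of the idea for the special case where f is a sum, using additive secret sharing. It assumes honest-but-curious parties and is not the PySyft API; the modulus `PRIME` and the function names are choices made for this example, and a real protocol would draw shares from a cryptographic source such as Python's `secrets` module rather than `random`.

```python
import random

PRIME = 2**61 - 1  # all arithmetic is done modulo this prime (illustrative choice)

def share(secret, n_parties, rng):
    """Split `secret` into n random shares that sum to it modulo PRIME."""
    shares = [rng.randrange(PRIME) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % PRIME)
    return shares

def secure_sum(private_inputs, rng):
    """Compute the sum of all inputs without any party revealing its own."""
    n = len(private_inputs)
    # Row i holds the shares created by party i; party j receives column j.
    distributed = [share(x, n, rng) for x in private_inputs]
    # Each party publishes only the sum of the shares it received.
    partial_sums = [sum(distributed[i][j] for i in range(n)) % PRIME
                    for j in range(n)]
    return sum(partial_sums) % PRIME

total = secure_sum([12, 30, 7], random.Random(1))
print(total)  # 49
```

Any single share, and any single partial sum, is uniformly random on its own, so no coordinator learns an individual input; only the total is reconstructed.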
Differential Privacy can protect against inference attacks
● Rigorous statistical technique for measuring and minimizing the privacy leakage from a statistical database.
● Add controlled noise to the function we want to compute (e.g. Laplace mechanism).
● An interesting tradeoff between accuracy and the number of allowed queries to the model for a given privacy budget epsilon.
● The amount of noise is related to the sensitivity of the function.
Explored for FedML by e.g. Papernot et al. https://arxiv.org/abs/1802.08908
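The Laplace mechanism mentioned in the bullets can be sketched for a counting query, whose sensitivity is 1 because adding or removing one record changes the count by at most 1. The records, the predicate and the function names are assumptions made for the example; the noise scale sensitivity/epsilon is the standard calibration.

```python
import random

def laplace_noise(scale, rng):
    """Laplace(0, scale) sample: the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_count(records, predicate, epsilon, rng):
    """Epsilon-differentially private count of the records matching `predicate`."""
    true_count = sum(1 for r in records if predicate(r))
    sensitivity = 1.0  # one record changes a count by at most 1
    return true_count + laplace_noise(sensitivity / epsilon, rng)

rng = random.Random(3)
records = list(range(100))                      # illustrative "database"
noisy = private_count(records, lambda r: r < 40, epsilon=1.0, rng=rng)
print(noisy)  # close to 40, but randomized
```

Under sequential composition, answering k such queries with a total budget epsilon leaves epsilon/k per query, so the noise scale grows linearly in k. That is the tradeoff between accuracy and the number of allowed queries.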
Private aggregation of teacher ensembles
Papernot et al., https://arxiv.org/pdf/1802.08908.pdf
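The core aggregation step can be sketched as a noisy vote: each teacher model, trained on its own disjoint private data, votes for a label, noise is added to the vote counts, and only the winning label is released. The sketch assumes Gaussian noise on the counts; the function name, the noise level and the vote data are hypothetical, and the privacy accounting that a full PATE implementation needs is omitted.

```python
import random
from collections import Counter

def noisy_aggregate(teacher_votes, num_classes, sigma, rng):
    """Return the label with the highest noise-perturbed vote count."""
    counts = Counter(teacher_votes)
    noisy = [counts.get(c, 0) + rng.gauss(0, sigma) for c in range(num_classes)]
    return max(range(num_classes), key=lambda c: noisy[c])

# 100 hypothetical teachers: 90 vote for class 2, 10 vote for class 0.
votes = [2] * 90 + [0] * 10
label = noisy_aggregate(votes, num_classes=3, sigma=5.0, rng=random.Random(0))
print(label)  # 2
```

When the teachers largely agree, the noise rarely changes the outcome, so strong consensus costs little accuracy while hiding the influence of any single teacher's data.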
Differential privacy & Homomorphic encryption in FedML
Differential privacy
● Add noise to data (protects against inference attacks)
Homomorphic encryption
● Methods work on encrypted data
Secure multiparty computation
● Aggregate/compute without a third-party trust provider/server.
Backdooring federated learning
Bagdasaryan et al. How to backdoor federated learning (2019) https://arxiv.org/pdf/1807.00459.pdf
● A major threat to a FedML alliance comes from within the alliance, i.e. from compromised members.
● Large alliances can be expected to be relatively robust to data poisoning attacks.
● Bagdasaryan et al. show how their proposed approach of model replacement can efficiently introduce backdoors in a global model.
● Secure aggregation/MPC makes it impossible to detect a malicious model update, and who submitted it!
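Why a single member can dominate the average is easiest to see from the arithmetic. The derivation below is a sketch under stated assumptions rather than a reproduction of the paper: it assumes the aggregator applies plain averaging over n participants with a global learning rate \eta, writes G^t for the global model, L_i^{t+1} for the local models and X for the model the compromised member m wants to install, and assumes the honest updates roughly cancel near convergence.

```latex
% Assumed aggregation rule
G^{t+1} = G^{t} + \frac{\eta}{n} \sum_{i=1}^{n} \left( L_i^{t+1} - G^{t} \right)

% The compromised member m submits a scaled update
\tilde{L}_m^{t+1} = G^{t} + \frac{n}{\eta} \left( X - G^{t} \right)

% If \sum_{i \neq m} \left( L_i^{t+1} - G^{t} \right) \approx 0, then
G^{t+1} \approx G^{t} + \frac{\eta}{n} \cdot \frac{n}{\eta} \left( X - G^{t} \right) = X
```

The scaling factor n/\eta cancels the averaging, so the global model is replaced in one round. Under secure aggregation the server only sees the sum, which is why the oversized update cannot be singled out.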
What does it take to build a production federated learning system?
● Decentralized computing / fog computing
● Information security/systems security expertise
● Trust provider (third-party or decentralized protocol)
● Machine learning algorithms adapted to the decentralized case
● Protection against adversarial ML
○ Data poisoning
○ Inference attacks
○ …
A considerable increase in system and developer complexity compared to the standard paradigm!
Research challenges
FedML is a research area that spans many different areas of computer science and mathematics.
Scalability and ML performance
How do we (re)design algorithms and frameworks to scale out to the fog and edge?
Decentralized computation
How can we do FedML without a third-party trust provider?
Adversarial ML
How can we make the system robust to dishonest members and external threats?
Selection of ongoing research projects
● Privacy-preserving conformal prediction.
● Federated online learning.
● Consensus protocols for decentralized model training.
● Performance and scalability of FedML using Blockchain.
Distributed Computing Applications (DCA) research arena at UU.
Security and Trust in FedML
Data privacy
● In a federated machine learning environment, data never leaves the owner's premises. Only the model parameters (or weights) are shared between federated members
● Data owners have complete control over the datasets
● The training of incoming models can be offline or online within the data owner’s secure environment
Security
● Different levels of security
○ Communication level
○ Service level
○ Host level
Example of communication security
Host Identity Protocol (HIP) Architecture
Trust building mechanisms for FedML
● FedML is inherently a distributed system in which each member has full control over its local environment
● Little or no control over the distributed datasets
● Contributions from different federated members can make or break the global model
● A transparent and efficient FedML framework allows different parties to work together
Blockchain Technology
What is Blockchain?
● Blockchain technology enables distributed public ledgers that hold immutable data in a secure and encrypted way and ensure that records can never be altered
● The trust in the system does not arise from the relationship between parties or through an intermediary but from the technology and the process of comparison in the network
● Three major types:
○ Public chain (Decentralized)
○ Private chain (Centralized)
○ Consortium chain (Partially Decentralized)
How does Blockchain work?
● Block
○ contains important information
● Chain
○ ensures that the content of the block remains trustworthy at all times
● Consensus algorithms
○ Proof of Work (PoW) -> Resource hungry
○ Proof of Stake (PoS) -> Energy efficient
○ Ripple -> Energy efficient
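The block, chain and Proof of Work ideas can be sketched in a few lines. This is a toy, single-process illustration and not the design of any platform named in this talk: the block layout, the difficulty setting and the payload (a digest standing in for a model update) are assumptions made for the example, and there is no network or consensus between nodes.

```python
import hashlib
import json

def block_hash(index, prev_hash, payload, nonce):
    """SHA-256 over the block contents, including the previous block's hash."""
    record = json.dumps([index, prev_hash, payload, nonce], sort_keys=True)
    return hashlib.sha256(record.encode()).hexdigest()

def mine_block(index, prev_hash, payload, difficulty=3):
    """Proof of Work: search for a nonce whose hash starts with `difficulty` zeros."""
    prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = block_hash(index, prev_hash, payload, nonce)
        if digest.startswith(prefix):
            return {"index": index, "prev_hash": prev_hash,
                    "payload": payload, "nonce": nonce, "hash": digest}
        nonce += 1

def is_valid(chain, difficulty=3):
    """Check every block's hash, its proof of work, and the links between blocks."""
    for i, block in enumerate(chain):
        expected = block_hash(block["index"], block["prev_hash"],
                              block["payload"], block["nonce"])
        if block["hash"] != expected or not expected.startswith("0" * difficulty):
            return False
        if i > 0 and block["prev_hash"] != chain[i - 1]["hash"]:
            return False
    return True

chain = [mine_block(0, "0" * 64, {"note": "genesis"})]
for round_no in (1, 2):
    # Hypothetical payload: a digest of the global model after each training round.
    digest = hashlib.sha256(f"weights-round-{round_no}".encode()).hexdigest()
    payload = {"round": round_no, "model_sha256": digest}
    chain.append(mine_block(len(chain), chain[-1]["hash"], payload))

print(is_valid(chain))  # True
```

Changing the payload of any earlier block changes its hash and breaks the link stored in the next block, which is what makes tampering detectable. The nonce search is the resource-hungry part of PoW.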
Current implementations of Blockchain
● There are a number of software platforms based on blockchain technology that enable developers to build and deploy decentralized applications
○ Ethereum, popular because of its smart contract functionality
○ EOS, combines the security of Bitcoin and the smart contract functionality of Ethereum
○ LISK, a flexible implementation based on JavaScript
○ …
Challenges and opportunities
● In general, Blockchain technology has limited scalability
○ Limited number of transactions
○ With large data blocks, the system gradually drifts toward centralization
○ Current limit on the block size is 1MB
● Most of the limitations are related to the public chains
● In the case of FedML
★ Each federation has a limited scope based on a specific model training
★ Consortium chains are a realistic approach for the model training
★ Blocks only hold important changes, which reduces the size of the complete chain
Blockchain and FedML
● We are working to design a new platform for FedML that will incorporate features of Blockchain technology
● The aim will be to provide security, auditability and checkpointing for global model training
● The platform will allow different stakeholders to jointly train models in a more transparent and secure manner
Future perspective
● Blockchain technology will allow different organizations to work more efficiently
○ Small organizations often lack valuable datasets but are early adopters of new technologies
○ Large organizations often have huge datasets but are slow adopters of new technologies
● The use of smart contracts will add more functionality to the system that can be used to build incentive-based model training
● In the future, a marketplace can be created based on the usefulness and transparency of machine learning models
Federated learning in production?
[Figure: several local ML pipelines, each behind an API, connect through an API to the federated components (secure model communication, anomaly detection, etc.) and to global model serving.]
Scaleout Federated Platform
Scaleout Studio | Developing
Scaleout Store | Package & Deploying
Scaleout Serve | Serving
[Figure: platform architecture; the components below are connected through APIs]
● ML studio: Ingestion, Prepare & Analyse Data, Modeling & Testing, Training
● ML workflow automation: Automated ML Studio Pipelines
● Model management: Versioning, Annotation, Storage, Distribution
● Model serving: Scaling, LB, SLA/OLA
● Monitoring & Visualizations
● Endpoint registry
● Graphical User Interface incl. Pipeline Visualization
● Authentication and Authorization
● Model Sharing
● Joint Training
● Federation Orchestration
● Federation Identity & Security
● Federation Cross Validation & Holdout Set
To learn more about our Scaleout work on production FedML, see the MVP presented at TestaCenter:
https://www.youtube.com/watch?v=K-JUNkAYs-4