
Noname manuscript No. (will be inserted by the editor)

Federated Learning for Healthcare Informatics

Jie Xu · Benjamin S. Glicksberg · Chang Su · Peter Walker · Jiang Bian · Fei Wang∗

Received: date / Accepted: date

Abstract With the rapid development of computer software and hardware technologies, more and more healthcare data are becoming readily available from clinical institutions, patients, insurance companies and pharmaceutical industries, among others. This access provides an unprecedented opportunity for data science technologies to derive data-driven insights and improve the quality of care delivery. Healthcare data, however, are usually fragmented and private, making it difficult to generate robust results across populations. For example, different hospitals own the electronic health records (EHR) of different patient populations, and these records are difficult to share across hospitals because of their sensitive nature. This creates a big barrier for developing effective analytical approaches that are generalizable, which need diverse, “big data”. Federated learning, a mechanism of training a shared global model with a central server while keeping all the sensitive data in the local institutions where the data belong, provides great promise to connect the fragmented healthcare data sources with privacy preservation. The goal of this survey is to provide a review of federated learning technologies, particularly within the biomedical space. In particular, we summarize the general solutions to the statistical challenges, system challenges and privacy issues in federated learning, and point out the implications and potentials in healthcare.

J. Xu, C. Su and F. Wang∗
Department of Healthcare Policy and Research, Weill Cornell Medicine, New York, New York, USA
E-mail: [email protected]

B. S. Glicksberg
Institute for Digital Health, Icahn School of Medicine at Mount Sinai, New York, New York, USA

P. Walker
U.S. Department of Defense Joint Artificial Intelligence Center, Washington, D.C., USA

J. Bian
Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, Florida, USA

arXiv:1911.06270v2 [cs.LG] 20 Aug 2020

Keywords Federated Learning · Healthcare · Privacy

1 Introduction

The recent years have witnessed a surge of interest in healthcare data analytics, due to the fact that more and more such data are becoming readily available from various sources including clinical institutions, individual patients, insurance companies and pharmaceutical industries, among others. This provides an unprecedented opportunity for the development of computational techniques to derive data-driven insights for improving the quality of care delivery [93,64].

Healthcare data are typically fragmented because of the complicated nature of the healthcare system and its processes. For example, different hospitals may be able to access the clinical records of their own patient populations only. These records are highly sensitive, containing the protected health information (PHI) of individuals. Rigorous regulations, such as the Health Insurance Portability and Accountability Act (HIPAA) [26], have been developed to regulate the process of accessing and analyzing such data. This creates a big challenge for modern data mining and machine learning (ML) technologies, such as deep learning [53], which typically require a large amount of training data.

Federated learning is a paradigm with a recent surge in popularity, as it holds great promise for learning with fragmented sensitive data. Instead of aggregating data from different places all together, or relying on the traditional discovery-then-replication design, it enables training a shared global model with a central server while keeping the data in the local institutions where they originate.

The term “federated learning” is not new. In 1976, Patrick Hill, a philosophy professor, first developed the Federated Learning Community (FLC) to bring people together to jointly learn, which helped students overcome the anonymity and isolation of large research universities [34]. Subsequently, there were several efforts aiming at building federations of learning content and content repositories [75,66,5]. In 2005, Rehak et al. [75] developed a reference model describing how to establish an interoperable repository infrastructure by creating federations of repositories, where the metadata are collected from the contributing repositories into a central registry that provides a single point of discovery and access. The ultimate goal of this model is to enable learning from diverse content repositories. These practices in federated learning communities and federated search services have provided effective references for the development of federated learning algorithms.

Federated learning holds great promise for healthcare data analytics. For both provider-based applications (e.g., building a model for predicting hospital readmission risk with patient Electronic Health Records (EHR) [63]) and consumer (patient)-based applications (e.g., screening atrial fibrillation with electrocardiograms captured by smartwatch [71]), the sensitive patient data can stay either in local institutions or with individual consumers without going out during the federated model learning process, which effectively protects patient privacy. The goal of this paper is to review the setup of federated learning, discuss the general solutions and challenges, and envision its applications in healthcare.

In this review, after a formal overview of federated learning, we summarize the main challenges and recent progress in this field. Then we illustrate the potential of federated learning methods in healthcare by describing successful recent research. Finally, we discuss the main opportunities and open questions for future applications in healthcare.

[Figure: three steps over local data and local models — (1) training the local model, (2) uploading the local model to the server, (3) delivering the new global model; the central server performs model aggregation, w := Σ_{k=1}^K (n_k/n) w_k, under security control.]

Fig. 1: Schematic of the federated learning framework. The model is trained in a distributed manner: the institutions periodically communicate the local updates with a central server to learn a global model. The central server aggregates the updates and sends back the parameters of the updated global model.

Difference with Existing Reviews There have been a few review articles on federated learning recently. For example, Yang et al. [97] wrote the early federated learning survey summarizing the general privacy-preserving techniques that can be applied to federated learning. Some researchers surveyed sub-problems of federated learning, e.g., personalization techniques [51], semi-supervised learning algorithms [41], threat models [60], and mobile edge networks [58]. Kairouz et al. [43] discussed recent advances and presented an extensive collection of open problems and challenges. Li et al. [55] conducted a review of federated learning from a system viewpoint. Different from those reviews, this paper presents the potential of federated learning to be applied in healthcare. We summarize the general solutions to the challenges in the federated learning scenario and survey a set of representative federated learning methods for healthcare. In the last part of this review, we outline some directions or open questions in federated learning for healthcare. An early version of this paper is available on arXiv [95].

2 Federated Learning

Federated learning is the problem of training a high-quality shared global model with a central server from decentralized data scattered among a large number of different clients (Fig. 1). Mathematically, assume there are K activated clients where the data reside (a client could be a mobile phone, a wearable device, or a clinical institution data warehouse, etc.). Let D_k denote the data distribution associated with client k and n_k the number of samples available from that client. n = Σ_{k=1}^K n_k is the total sample size. The federated learning problem boils down to solving an empirical risk minimization problem of the form [48,49,61]:

min_{w ∈ R^d} F(w) := Σ_{k=1}^K (n_k/n) F_k(w),  where  F_k(w) := (1/n_k) Σ_{x_i ∈ D_k} f_i(w),   (1)

where w is the model parameter to be learned. In particular, algorithms for federated learning face a number of challenges [87,12], specifically:

– Statistical: The data distributions among the clients differ greatly, i.e., for all k ≠ k′ we have E_{x_i∼D_k}[f_i(w; x_i)] ≠ E_{x_i∼D_{k′}}[f_i(w; x_i)]. As such, the data points available locally are far from being a representative sample of the overall distribution, i.e., E_{x_i∼D_k}[f_i(w; x_i)] ≠ F(w).
– Communication: The number of clients K is large and can be much bigger than the average number of training samples stored on the activated clients, i.e., K ≫ (n/K).
– Privacy and Security: Additional privacy protections are needed for unreliable participating clients. It is impossible to ensure that none of the millions of clients are malicious.

Next, we will survey, in detail, the existing federated learning related workson handling such challenges.

2.1 Statistical Challenges of Federated Learning

The naive way to solve the federated learning problem is through Federated Averaging (FedAvg) [61]. It has been demonstrated to work with certain non-independent-and-identically-distributed (non-IID) data by requiring all the clients to share the same model. However, FedAvg does not address the statistical challenge of strongly skewed data distributions. The performance of convolutional neural networks trained with the FedAvg algorithm can degrade significantly due to weight divergence [98]. Existing research on dealing with the statistical challenge of federated learning can be grouped into two fields, i.e., consensus solutions and pluralistic solutions.
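The weighted aggregation of Eq. (1) that FedAvg performs each round can be sketched in a few lines of NumPy. The least-squares local objective, the client data, the learning rate and the round counts below are illustrative stand-ins, not the experimental setup of [61]:

```python
import numpy as np

def local_update(w, X, y, lr=0.1, epochs=5):
    """Hypothetical local step: a few epochs of gradient descent
    on a least-squares loss, using only this client's data."""
    w = w.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def fedavg_round(w_global, clients):
    """One FedAvg round: each client trains locally, then the server
    aggregates the local models weighted by sample counts (n_k / n)."""
    n = sum(len(y) for _, y in clients)
    w_new = np.zeros_like(w_global)
    for X, y in clients:
        w_k = local_update(w_global, X, y)
        w_new += (len(y) / n) * w_k  # n_k / n weighting from Eq. (1)
    return w_new

rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0])
# Two hypothetical clients with different sample sizes.
clients = []
for n_k in (30, 70):
    X = rng.normal(size=(n_k, 2))
    clients.append((X, X @ w_true))

w = np.zeros(2)
for _ in range(50):
    w = fedavg_round(w, clients)
```

Since both toy clients share the same underlying model, the averaged iterate converges to it; with skewed client distributions this is exactly where the weight-divergence issue noted above appears.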


2.1.1 Consensus Solution

Most centralized models are trained on the aggregated training samples obtained from the samples drawn from the local clients [87,98]. Intrinsically, the centralized model is trained to minimize the loss with respect to the uniform distribution [65]: D = Σ_{k=1}^K (n_k/n) D_k, where D is the target data distribution for the learning model. However, this specific uniform distribution is not an adequate solution in most scenarios.

To address this issue, a recently proposed solution is to model the target distribution or force the data to adapt to the uniform distribution [98,65]. Specifically, Mohri et al. [65] proposed a minimax optimization scheme, i.e., agnostic federated learning (AFL), where the centralized model is optimized for any possible target distribution formed by a mixture of the client distributions. This method has only been applied at small scales. Compared to AFL, Li et al. [56] proposed q-Fair Federated Learning (q-FFL), assigning higher weights to devices with poor performance, so that the distribution of accuracy in the network reduces in variance. They empirically demonstrated the improved flexibility and scalability of q-FFL compared to AFL.
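The reweighting intuition behind q-FFL can be illustrated with a toy aggregation weight that upweights clients with larger local loss; this is a simplified normalized variant for illustration only, not the exact q-FFL objective of [56]:

```python
import numpy as np

def qffl_style_weights(losses, n_samples, q=1.0):
    """Simplified q-FFL-style aggregation weights: scale the usual
    n_k / n term by F_k**q, so poorly performing clients (larger
    local loss F_k) get more weight. q=0 recovers FedAvg weighting."""
    p = np.asarray(n_samples, dtype=float)
    p /= p.sum()
    w = p * np.asarray(losses, dtype=float) ** q
    return w / w.sum()

# Toy example: client 1 performs worse (loss 2.0) than client 0 (0.5).
wts_fedavg = qffl_style_weights([0.5, 2.0], [100, 100], q=0.0)
wts_fair = qffl_style_weights([0.5, 2.0], [100, 100], q=1.0)
```

With q = 0 both clients keep equal weight; with q = 1 the worse-performing client dominates the aggregate, which is the variance-reducing behavior described above.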

Another commonly used method is globally sharing a small portion of data between all the clients [98,67]. The shared subset is required to contain a uniform distribution over classes and is sent from the central server to the clients. In addition to handling the non-IID issue, sharing information about a small portion of trusted instances and noise patterns can guide the local agents to select a compact training subset, while the clients learn to add changes to selected data samples, in order to improve the test performance of the global model [30].
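A minimal sketch of this data-sharing strategy, assuming a hypothetical public pool on the server and a single-class (heavily skewed) client:

```python
import numpy as np

def build_shared_subset(X_public, y_public, per_class=5, seed=0):
    """Draw a small class-balanced subset from a public pool held
    by the server (uniform distribution over classes)."""
    rng = np.random.default_rng(seed)
    idx = []
    for c in np.unique(y_public):
        cls = np.flatnonzero(y_public == c)
        idx.extend(rng.choice(cls, size=per_class, replace=False))
    idx = np.array(idx)
    return X_public[idx], y_public[idx]

# Hypothetical public pool with 3 classes of 100 samples each.
rng = np.random.default_rng(1)
Xp = rng.normal(size=(300, 4))
yp = np.repeat([0, 1, 2], 100)
Xs, ys = build_shared_subset(Xp, yp, per_class=5)

# A client whose local data contain only class 0 appends the shared
# subset to soften the non-IID skew before local training.
X_local, y_local = rng.normal(size=(50, 4)), np.zeros(50, dtype=int)
X_aug = np.vstack([X_local, Xs])
y_aug = np.concatenate([y_local, ys])
```

After augmentation the client sees every class, at the cost of distributing a small amount of (trusted, non-private) data globally.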

2.1.2 Pluralistic Solution

Generally, it is difficult to find a consensus solution w that is good for all components D_i. Instead of wastefully insisting on a consensus solution, many researchers choose to embrace this heterogeneity.

Multi-task learning (MTL) is a natural way to deal with data drawn from different distributions. It directly captures the relationships among non-IID and unbalanced data by leveraging their relatedness, in comparison to learning a single global model. In order to do this, it is necessary to target a particular way in which tasks are related, e.g., shared sparsity, shared low-rank structure, graph-based relatedness and so forth. Recently, Smith et al. [87] empirically demonstrated this point on real-world federated datasets and proposed a novel method, MOCHA, to solve a general convex MTL problem while handling the system challenges at the same time. Later, Corinzia et al. [17] introduced VIRTUAL, an algorithm for federated multi-task learning with non-convex models. They consider the federation of the central server and clients as a Bayesian network and perform training using approximated variational inference. This work bridges the frameworks of federated and transfer/continuous learning.


The success of multi-task learning rests on whether the chosen relatedness assumptions hold. Compared to this, pluralism can be a critical tool for dealing with heterogeneous data without any additional or even low-order terms that depend on the relatedness as in MTL [22]. Eichner et al. [22] considered training in the presence of block-cyclic data, and showed that a remarkably simple pluralistic approach can entirely resolve the source of data heterogeneity. When the component distributions are actually different, pluralism can outperform the “ideal” IID baseline.

Fig. 2: Communication-efficient federated learning methods. Existing research on improving communication efficiency can be categorized into (a) model compression, (b) client selection, (c) update reduction and (d) peer-to-peer learning.

2.2 Communication Efficiency of Federated Learning

In the federated learning setting, training data remain distributed over a large number of clients, each with unreliable and relatively slow network connections. Naively, for a synchronous protocol in federated learning [87,50], the total number of bits required during uplink (clients → server) and downlink (server → clients) communication by each of the K clients during training is given by

B_up/down ∈ O(U × |w| × (H(Δw_up/down) + β)),   (2)

where U is the total number of updates performed by each client, |w| is the size of the model, and H(Δw_up/down) is the entropy of the weight updates exchanged during transmission; the term H(Δw_up/down) + β is the per-update size, with β the difference between the true update size and the minimal update size (which is given by the entropy) [81]. Apparently, we can consider three ways to reduce the communication cost: a) reduce the number of clients K, b) reduce the update size, and c) reduce the number of updates U. Starting from these three points, we can organize existing research on communication-efficient federated learning into four groups, i.e., model compression, client selection, update reduction and peer-to-peer learning (Fig. 2).
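A back-of-the-envelope instance of Eq. (2), with illustrative numbers, the entropy term approximated by the encoding bits per weight, and β taken as zero for simplicity:

```python
def uplink_bits(num_updates, model_size, bits_per_weight, beta=0.0):
    """Per-client communication cost from Eq. (2):
    U x |w| x (H(dw) + beta), with H(dw) approximated by the
    number of bits used to encode each weight update."""
    return num_updates * model_size * (bits_per_weight + beta)

# Hypothetical setting: 100 rounds, a 1M-parameter model.
full = uplink_bits(num_updates=100, model_size=1_000_000,
                   bits_per_weight=32)   # 32-bit floats
quantized = uplink_bits(num_updates=100, model_size=1_000_000,
                        bits_per_weight=8)  # 8-bit quantized updates
```

Shrinking any factor of the product (fewer rounds U, a smaller model |w|, or fewer bits per update) reduces the total proportionally, which is exactly the design space the four groups below explore.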

2.2.1 Client Selection

The most natural and rough way to reduce communication cost is to restrict the participating clients or choose a fraction of parameters to be updated at each round. Shokri et al. [84] use the selective stochastic gradient descent protocol, where the selection can be completely random, or only the parameters whose current values are farther away from their local optima, i.e., those with a larger gradient, are selected. Nishio et al. [67] proposed a new protocol referred to as FedCS, where the central server manages the resources of heterogeneous clients and determines which clients should participate in the current training task by analyzing the resource information of each client, such as wireless channel states, computational capacities and the size of data resources relevant to the current task. Here, the server should decide how much data, energy and CPU resources are used by the mobile devices such that the energy consumption, training latency and bandwidth cost are minimized while meeting the requirements of the training tasks. Anh et al. [4] thus proposed to use the Deep Q-Learning [91] technique, which enables the server to find the optimal data and energy management for the mobile devices participating in mobile crowd-machine learning through federated learning without any prior knowledge of network dynamics.
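A FedCS-flavored selection step might look like the following greedy sketch; the resource attributes, client names and deadline are invented for illustration and do not reproduce the actual protocol of [67]:

```python
def select_clients(clients, round_deadline):
    """Greedy client selection: rank clients by estimated compute +
    upload time and keep those that fit within the round deadline."""
    ranked = sorted(clients, key=lambda c: c["compute_s"] + c["upload_s"])
    chosen = []
    for c in ranked:
        if c["compute_s"] + c["upload_s"] <= round_deadline:
            chosen.append(c["id"])
    return chosen

# Hypothetical resource reports gathered by the server.
clients = [
    {"id": "hospital_a", "compute_s": 10, "upload_s": 5},
    {"id": "hospital_b", "compute_s": 40, "upload_s": 30},
    {"id": "phone_c", "compute_s": 8, "upload_s": 2},
]
picked = select_clients(clients, round_deadline=20)
```

Slow participants are simply skipped for this round, trading some data coverage for a bounded round time.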

2.2.2 Model Compression

The goal of model compression is to compress the server-to-client exchanges to reduce uplink/downlink communication cost. The first way is through structured updates, where the update is directly learned from a restricted space parameterized using a smaller number of variables, e.g., sparse or low-rank [50], or, more specifically, pruning the least useful connections in a network [29,100], weight quantization [15,81], and model distillation [35]. The second way is lossy compression, where a full model update is first learned and then compressed using a combination of quantization, random rotations, and subsampling before sending it to the server [50,2]. Then the server decodes the updates before doing the aggregation.
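The lossy-compression path can be sketched as subsampling plus uniform quantization of the update, with the server-side decode (random rotations are omitted for brevity, and the names and parameters are illustrative):

```python
import numpy as np

def compress_update(dw, bits=8, keep_frac=0.5, seed=0):
    """Lossy-compress a weight update: subsample a fraction of the
    coordinates, then uniformly quantize them to `bits` bits."""
    rng = np.random.default_rng(seed)
    k = max(1, int(len(dw) * keep_frac))
    idx = rng.choice(len(dw), size=k, replace=False)
    vals = dw[idx]
    lo, hi = vals.min(), vals.max()
    levels = 2 ** bits - 1
    q = np.round((vals - lo) / (hi - lo) * levels).astype(np.uint8)
    return idx, q, lo, hi

def decompress_update(idx, q, lo, hi, size, bits=8):
    """Server-side decode: dequantize and scatter back; coordinates
    that were not transmitted are treated as zero."""
    levels = 2 ** bits - 1
    dw = np.zeros(size)
    dw[idx] = q.astype(float) / levels * (hi - lo) + lo
    return dw

rng = np.random.default_rng(2)
dw = rng.normal(size=1000)
idx, q, lo, hi = compress_update(dw, bits=8, keep_frac=0.5)
dw_hat = decompress_update(idx, q, lo, hi, size=1000)
```

Here the uplink payload shrinks from 1000 floats to 500 single-byte codes plus two scalars, at the cost of bounded quantization error on the transmitted coordinates.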

A third option is federated dropout, in which each client, instead of locally training an update to the whole global model, trains an update to a smaller sub-model [11]. These sub-models are subsets of the global model and, as such, the computed local updates have a natural interpretation as updates to the larger global model. Federated dropout not only reduces the downlink communication but also reduces the size of the uplink updates. Moreover, the local computational costs are correspondingly reduced since the local training procedure deals with parameters of smaller dimensions.

Fig. 3: Privacy-preserving schemes. (a) Secure multi-party computation. In secret sharing, secret values (blue and yellow pie) are split into a number of shares that are distributed among the computing nodes. During the computation, no computing node is able to recover the original value nor learn anything about the output (green pie). The nodes can combine their shares to reconstruct the original value. (b) Differential privacy. It guarantees that anyone seeing the result of a differentially private analysis will make the same inference (Answer 1 and Answer 2 are nearly indistinguishable).

2.2.3 Updates Reduction

Kamp et al. [44] proposed to average models dynamically depending on the utility of the communication, which leads to a reduction of communication by an order of magnitude compared to periodically communicating state-of-the-art approaches. This facet is well suited for massively distributed systems with limited communication infrastructure. Bui et al. [10] improved federated learning for Bayesian neural networks using partitioned variational inference, where a client can decide to upload the parameters back to the central server after multiple passes through its data, after one local epoch, or after just one mini-batch. Guha et al. [28] focused on techniques for one-shot federated learning, in which they learn a global model from data in the network using only a single round of communication between the devices and the central server. Besides the above works, Ren et al. [76] theoretically analyzed the detailed expression of the learning efficiency in the CPU scenario and formulated a training acceleration problem under both communication and learning resource budgets. Reinforcement learning and round-robin learning are widely used to manage the communication and computation resources [4,94,101,38].


2.2.4 Peer-to-Peer Learning

In federated learning, a central server is required to coordinate the training process of the global model. However, the communication cost to the central server may not be affordable since a large number of clients are usually involved. Also, many practical peer-to-peer networks are dynamic, and it is not possible to regularly access a fixed central server. Moreover, because of the dependence on the central server, all clients are required to agree on one trusted central body, whose failure would interrupt the training process for all clients. Therefore, some researchers have begun to study fully decentralized frameworks where the central server is not required [83,77,52,33]. The local clients are distributed over a graph/network where they only communicate with their one-hop neighbors. Each client updates its local belief based on its own data, then aggregates information from its one-hop neighbors.
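One decentralized round can be sketched as neighborhood averaging over a graph, with no central server involved; the ring topology and uniform mixing weights below are illustrative choices:

```python
import numpy as np

def gossip_round(models, neighbors):
    """One round of decentralized averaging: each client replaces its
    model with the mean of its own and its one-hop neighbors' models."""
    new = []
    for i, w in enumerate(models):
        group = [w] + [models[j] for j in neighbors[i]]
        new.append(np.mean(group, axis=0))
    return new

# Four clients on a ring; each talks only to its two one-hop neighbors.
neighbors = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
models = [np.array([float(i)]) for i in range(4)]  # initial models 0..3

for _ in range(30):
    models = gossip_round(models, neighbors)
```

Because the mixing is symmetric and doubly stochastic, repeated rounds drive every client toward the network-wide average (1.5 here) using only one-hop communication.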

2.3 Privacy and Security

In federated learning, we usually assume the number of participating clients (e.g., phones, cars, clinical institutions...) is large, potentially in the thousands or millions. It is impossible to ensure that none of the clients are malicious. The setting of federated learning, where the model is trained locally without revealing the input data or the model's output to any clients, prevents direct leakage while training or using the model. However, the clients may infer some information about another client's private dataset given the execution of f(w), or over the shared predictive model w [90]. To this end, there have been many efforts focused on privacy, either from an individual point of view or from multiparty views, especially in the social media field, which has significantly exacerbated multiparty privacy (MP) conflicts [89,88].

2.3.1 Secure Multi-Party Computation

Secure multi-party computation (SMC) has a natural application to federated learning scenarios, where each individual client uses a combination of cryptographic techniques and oblivious transfer to jointly compute a function of their private data [70,7]. Homomorphic encryption is a public-key system, where any party can encrypt its data with a known public key and perform calculations with data encrypted by others with the same public key [23]. Due to its success in cloud computing, it comes naturally into this realm, and it has certainly been used in many federated learning studies [32,13].

Although SMC guarantees that none of the parties shares anything with each other or with any third party, it cannot prevent an adversary from learning some individual information, e.g., which clients' absence might change the decision boundary of a classifier, etc. Moreover, SMC protocols are usually computationally expensive even for the simplest problems, requiring iterated encryption/decryption and repeated communication between participants about some of the encrypted results [70].
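The additive secret-sharing idea underlying SMC-based aggregation can be sketched as a toy protocol over integers modulo a large prime: each client splits its (quantized) update into random shares, and only the combination of all shares reveals the aggregate. This is an illustration assuming honest-but-curious nodes, not a production scheme:

```python
import random

P = 2**61 - 1  # a large prime modulus

def make_shares(value, n_parties, rng):
    """Split an integer into n additive shares mod P; any subset of
    fewer than n shares is uniformly random and reveals nothing."""
    shares = [rng.randrange(P) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % P)
    return shares

rng = random.Random(0)
updates = [12345, 67890, 11111]  # toy quantized client updates

# Each client distributes one share of its update to each of 3 nodes.
all_shares = [make_shares(u, 3, rng) for u in updates]

# Each node locally sums the shares it holds; combining the node
# totals reconstructs the aggregate, never any individual update.
node_sums = [sum(s[i] for s in all_shares) % P for i in range(3)]
aggregate = sum(node_sums) % P
```

Only the sum of the updates is ever reconstructed, which is all the server needs for the aggregation step of Eq. (1), while each node's view of any single client is uniformly random.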

Table 1: Summary of recent work on federated learning for healthcare

Problem | ML Method | # Hospitals | Data
Patient Similarity Learning [54] | Hashing | 3 | MIMIC-III [42]
Patient Similarity Learning [96] | Hashing | 20 | MIMIC-III
Phenotyping [47] | TF* | 1-5 | MIMIC-III, UCSD [92]
Phenotyping [59] | NLP | 10 | MIMIC-III
Representation Learning [85] | PCA | 10-100 | ADNI, UK Biobank, PPMI, MIRIAD
Mortality Prediction [37] | Autoencoder | 5-50 | eICU Collaborative Research Database [73]
Hospitalizations Prediction [9] | SVM | 5, 10 | Boston Medical Center
Preterm-birth Prediction [8] | RNN | 50 | Cerner Health Facts
Mortality Prediction [72] | LR, NN | 31 | eICU Collaborative Research Database
Mortality Prediction [82] | LR, MLP | 2 | MIMIC-III

*TF: Tensor Factorization, MLP: Multi-layer Perceptron

2.3.2 Differential Privacy

Differential privacy (DP) [20] is an alternative theoretical model for protecting the privacy of individual data, which has been widely applied to many areas, not only traditional algorithms, e.g., boosting [21], principal component analysis [14] and support vector machines [78], but also deep learning research [1,62]. It ensures that the addition or removal of a single record does not substantially affect the outcome of any analysis, and it is thus also widely studied in federated learning research to prevent indirect leakage [84,62,1]. However, DP only protects users from data leakage to a certain extent, and may reduce performance in prediction accuracy because it is a lossy method [16]. Thus, some researchers combine DP with SMC to reduce the growth of noise injection as the number of parties increases, without sacrificing privacy, while preserving provable privacy guarantees and protecting against extraction attacks and collusion threats [16,90].
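The Gaussian mechanism commonly used for DP in federated settings can be sketched as clip-then-noise on a client update: clipping bounds each client's sensitivity, then calibrated noise is added before aggregation. The σ below is illustrative rather than derived from a specific (ε, δ) accounting:

```python
import numpy as np

def dp_sanitize(dw, clip_norm=1.0, sigma=0.5, rng=None):
    """Clip an update to L2 norm <= clip_norm, then add Gaussian noise
    with standard deviation sigma * clip_norm (Gaussian mechanism)."""
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(dw)
    dw = dw * min(1.0, clip_norm / norm)  # bound per-client sensitivity
    return dw + rng.normal(scale=sigma * clip_norm, size=dw.shape)

rng = np.random.default_rng(3)
dw = rng.normal(size=10) * 5.0  # raw local update, large norm
private = dp_sanitize(dw, clip_norm=1.0, sigma=0.5, rng=rng)
```

The clipping step is what makes the noise scale meaningful, and it is also a source of the accuracy loss noted above: large informative updates are shrunk along with adversarial ones.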

3 Applications

Federated learning has been incorporated and utilized in many domains. This widespread adoption is due in part to the fact that it enables a collaborative modeling mechanism that allows for efficient ML all while ensuring data privacy and legal compliance between multiple parties or multiple computing nodes. Some promising examples that highlight these capabilities are virtual keyboard prediction [31,62], smart retail [99], finance [97] and vehicle-to-vehicle communication [80]. In this section, we focus primarily on applications within the healthcare space, but also discuss promising applications in other domains, since some principles can be applied to healthcare.


3.1 Healthcare

EHRs have emerged as a crucial source of real-world healthcare data that has been used for an amalgamation of important biomedical research [24,39], including machine learning research [64]. While providing a huge amount of patient data for analysis, EHRs contain systemic and random biases, both overall and specific to hospitals, that limit the generalizability of results. For example, Obermeyer et al. [68] found that a commonly used algorithm to determine enrollment in specific health programs was biased against African Americans, assigning the same level of risk to healthier Caucasian patients. These improperly calibrated algorithms can arise due to a variety of reasons, such as differences in underlying access to care or low representation in training data. It is clear that one way to alleviate the risk of such biased algorithms is the ability to learn from EHR data that is more representative of the global population and goes beyond a single hospital or site. Unfortunately, due to a myriad of reasons such as discrepant data schemes and privacy concerns, it is unlikely that data will ever be connected together in a single database to learn from all at once. The creation and utility of standardized common data models, such as OMOP [36], allow for more widespread replication analyses, but they do not overcome the limitations of joint data access. As such, it is imperative that alternative strategies emerge for learning from multiple EHR data sources that go beyond the common discovery-replication framework. Federated learning might be the tool to enable large-scale, representative ML of EHR data, and we discuss many studies which demonstrate this fact below.

Federated learning is a viable method to connect EHR data from medical institutions, allowing them to share their experiences, and not their data, with a guarantee of privacy [8,27,74,19,57,37]. In these scenarios, the performance of the ML model will be significantly improved by the iterative improvements of learning from large and diverse medical data sets. Several tasks have been studied in the federated learning setting in healthcare, e.g., patient similarity learning [54], patient representation learning, phenotyping [47,59] and predictive modeling [9,37,82]. A summary of these works is listed in Table 1.

3.2 Others

An important application of federated learning is for natural language pro-cessing (NLP) tasks. When Google first proposed federated learning conceptin 2016, the application scenario is Gboard - a virtual keyboard of Googlefor touchscreen mobile devices with support for more than 600 language vari-eties [31,62]. Indeed, as users increasingly turn to mobile devices, fast mobileinput methods with auto-correction, word completion, and next-word predic-tion features are becoming more and more important. For these NLP tasks,especially next-word prediction, typed text in mobile apps are usually betterthan the data from scanned books or speech-to-text in terms of aiding typingon a mobile keyboard. However, these language data often contain sensitive


information, e.g., passwords, search queries, or text messages with personal information. Therefore, federated learning has promising applications in NLP, such as virtual keyboard prediction [31,62,6].
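One common safeguard for such sensitive text data, used in the differentially private language-model work cited above [62], is to clip each client's update and add calibrated Gaussian noise before aggregation. The sketch below is illustrative only: the function name and parameters are ours, and a real system would calibrate the noise to a formal (epsilon, delta) privacy budget and combine it with secure aggregation [7].

```python
import numpy as np

def dp_aggregate(updates, clip=1.0, noise_mult=1.0, rng=None):
    """Clip each client update to L2 norm `clip`, average the clipped
    updates, then add Gaussian noise scaled to the sensitivity clip/n."""
    rng = rng if rng is not None else np.random.default_rng(0)
    clipped = [u * min(1.0, clip / (np.linalg.norm(u) + 1e-12)) for u in updates]
    mean = np.mean(clipped, axis=0)
    sigma = noise_mult * clip / len(updates)
    return mean + rng.normal(0.0, sigma, size=mean.shape)

# With noise_mult=0 the result is just the mean of the clipped updates:
updates = [np.array([3.0, 4.0]), np.array([0.1, 0.2])]
agg = dp_aggregate(updates, clip=1.0, noise_mult=0.0)   # -> [0.35, 0.5]
```

Clipping bounds any single user's influence on the shared model, so an adversary observing the aggregate learns little about any individual's typed text.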

Table 2: Popular tools for federated learning research

PySyft [79] (OpenMined): Decouples private data from model training using federated learning, DP, and MPC within PyTorch. TensorFlow bindings are also available [69].

TFF [25] (Google): With TFF, TensorFlow provides users with a flexible and open framework through which they can simulate distributed computing locally.

FATE [3] (WeBank): FATE supports the Federated AI ecosystem, in which a secure computing protocol is implemented based on homomorphic encryption and MPC.

Tensor/IO [18] (doc.ai): A lightweight cross-platform library for on-device machine learning, bringing the power of TensorFlow and TensorFlow Lite to iOS, Android, and React Native applications.

Other applications include smart retail [99], finance [46], and so on. Specifically, smart retail aims to use machine learning technology to provide personalized services to customers, based on data such as user purchasing power and product characteristics, for product recommendation and sales services. In terms of financial applications, Tencent's WeBank leverages federated learning technologies for credit risk management, where several banks can jointly generate a comprehensive credit score for a customer without sharing his or her data [97]. With the growth and development of federated learning, many companies and research teams have built tools oriented toward scientific research and product development. Popular ones are listed in Table 2.

4 Conclusions and Open Questions

In this survey, we reviewed the current progress of federated learning, including but not limited to the healthcare field. We summarized general solutions to the various challenges in federated learning and hope to provide a useful resource for researchers. Beyond these general issues, we list below some open questions that may be encountered when federated learning is applied in the healthcare area.

– Data Quality. Federated learning has the potential to connect all the isolated medical institutions, hospitals, and devices so that they can share their experiences with a privacy guarantee. However, most health systems suffer from data clutter and efficiency problems. The quality of data collected from multiple sources is uneven, and there is no uniform data standard.


The analyzed results are worthless when dirty data are accidentally used as samples. The ability to strategically leverage medical data is critical. Therefore, how to clean, correct, and complete data, and accordingly ensure data quality, is key to improving the machine learning model, whether or not we are dealing with a federated learning scenario.

– Incorporating Expert Knowledge. In 2016, IBM introduced Watson for Oncology, a tool that uses a natural language processing system to summarize patients' electronic health records and search the powerful database behind it to advise doctors on treatments. Unfortunately, some oncologists say they trust their own judgment more than what Watson tells them needs to be done 1. It is therefore desirable that doctors be involved in the training process. Since not every data set collected can be of high quality, it would be very helpful to introduce evidence-based standards so that doctors can see the diagnostic criteria of the artificial intelligence. When it is wrong, doctors can give further guidance to improve the accuracy of the machine learning model during the training process.

– Incentive Mechanisms. With the internet of things and a variety of third-party portals, a growing number of smartphone healthcare apps are compatible with wearable devices. In addition to data accumulated in hospitals or medical centers, another type of data of great value comes from wearable devices, not only to researchers but, more importantly, to the owners. However, during the federated model training process, clients incur considerable communication and computation overhead. Without well-designed incentives, self-interested mobile or other wearable devices will be reluctant to participate in federated learning tasks, which will hinder the adoption of federated learning [45]. How to design an efficient incentive mechanism to attract devices with high-quality data to join federated learning is another important problem.

– Personalization. Wearable devices are focused more on public health, which means helping people who are already healthy to improve their health, for instance by helping them exercise, practice meditation, and improve their sleep quality. How to assist patients with scientifically designed, personalized health management, correct functional pathological states by examining indicators, and interrupt pathological processes is very important. Reasonable chronic disease management can avoid emergency visits and hospitalization and reduce the number of visits, saving cost and labor. Although there is some general work on federated learning personalization [86,40], for healthcare informatics, how to combine medical domain knowledge and personalize the global model for each medical institution or wearable device is another open question.

– Model Precision. Federated learning tries to make isolated institutions or devices share their experiences such that the performance of the machine learning model is significantly improved by the resulting large medical dataset. However, the prediction tasks studied so far are restricted and relatively simple. Medical treatment itself is a very professional and precise field, and medical devices in hospitals have incomparable advantages over wearable devices. Moreover, the models of Doc.ai can predict a collection of phenotypes, such as height, weight, age, sex, and BMI, from one's biometric data based on a selfie 2. How to improve the prediction model to predict future health conditions is definitely worth exploring.

1 http://news.moore.ren/industry/158978.htm
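The personalization question above can be illustrated with the simplest strategy from that line of work (cf. [86,40]): start from the shared global model and fine-tune it on a site's own data. The linear model and synthetic data below are illustrative assumptions, not a method from the cited papers.

```python
import numpy as np

def personalize(global_w, X_local, y_local, lr=0.05, epochs=20):
    """Fine-tune the shared global model on one site's own records so the
    resulting model is adapted to that site's local population."""
    w = global_w.copy()
    for _ in range(epochs):
        grad = X_local.T @ (X_local @ w - y_local) / len(y_local)
        w -= lr * grad
    return w

# A site whose local relationship differs from what the global model encodes.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, 0.0, -1.0])          # this site's true signal
global_w = np.ones(3)                        # shared model from federated training
personal_w = personalize(global_w, X, y)

def mse(w):
    return float(np.mean((X @ w - y) ** 2))
```

After fine-tuning, the local error mse(personal_w) drops substantially below mse(global_w); richer schemes interpolate between the global and local models or meta-learn a good initialization [40].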

Acknowledgements The work is supported by ONR N00014-18-1-2585 and NSF 1750326.

Conflict of interest

The authors declare that they have no conflict of interest.

References

1. Abadi, M., Chu, A., Goodfellow, I., McMahan, H.B., Mironov, I., Talwar, K., Zhang, L.: Deep learning with differential privacy. In: Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pp. 308–318. ACM (2016)

2. Agarwal, N., Suresh, A.T., Yu, F.X.X., Kumar, S., McMahan, B.: cpSGD: Communication-efficient and differentially-private distributed SGD. In: Advances in Neural Information Processing Systems, pp. 7564–7575 (2018)

3. AI, W.: Federated AI technology enabler. https://www.fedai.org/cn/ (2019)

4. Anh, T.T., Luong, N.C., Niyato, D., Kim, D.I., Wang, L.C.: Efficient training management for mobile crowd-machine learning: A deep reinforcement learning approach. IEEE Wireless Communications Letters (2019)

5. Barcelos, C., Gluz, J., Vicari, R.: An agent-based federated learning object search service. Interdisciplinary Journal of E-Learning and Learning Objects 7(1), 37–54 (2011)

6. Bonawitz, K., Eichner, H., Grieskamp, W., Huba, D., Ingerman, A., Ivanov, V., Kiddon, C., Konečny, J., Mazzocchi, S., McMahan, H.B., et al.: Towards federated learning at scale: System design. arXiv preprint arXiv:1902.01046 (2019)

7. Bonawitz, K., Ivanov, V., Kreuter, B., Marcedone, A., McMahan, H.B., Patel, S., Ramage, D., Segal, A., Seth, K.: Practical secure aggregation for privacy-preserving machine learning. In: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pp. 1175–1191. ACM (2017)

8. Boughorbel, S., Jarray, F., Venugopal, N., Moosa, S., Elhadi, H., Makhlouf, M.: Federated uncertainty-aware learning for distributed hospital EHR data. arXiv preprint arXiv:1910.12191 (2019)

9. Brisimi, T.S., Chen, R., Mela, T., Olshevsky, A., Paschalidis, I.C., Shi, W.: Federated learning of predictive models from federated electronic health records. International Journal of Medical Informatics 112, 59–67 (2018)

10. Bui, T.D., Nguyen, C.V., Swaroop, S., Turner, R.E.: Partitioned variational inference: A unified framework encompassing federated and continual learning. arXiv preprint arXiv:1811.11206 (2018)

11. Caldas, S., Konečny, J., McMahan, H.B., Talwalkar, A.: Expanding the reach of federated learning by reducing client resource requirements. arXiv preprint arXiv:1812.07210 (2018)

12. Caldas, S., Wu, P., Li, T., Konečny, J., McMahan, H.B., Smith, V., Talwalkar, A.: LEAF: A benchmark for federated settings. arXiv preprint arXiv:1812.01097 (2018)

13. Chai, D., Wang, L., Chen, K., Yang, Q.: Secure federated matrix factorization (2019)

2 https://doc.ai/blog/do-you-know-how-valuable-your-medical-da/


14. Chaudhuri, K., Sarwate, A.D., Sinha, K.: A near-optimal algorithm for differentially-private principal components. The Journal of Machine Learning Research 14(1), 2905–2943 (2013)

15. Chen, Y., Sun, X., Jin, Y.: Communication-efficient federated deep learning with asynchronous model update and temporally weighted aggregation. arXiv preprint arXiv:1903.07424 (2019)

16. Cheng, K., Fan, T., Jin, Y., Liu, Y., Chen, T., Yang, Q.: SecureBoost: A lossless federated learning framework. arXiv preprint arXiv:1901.08755 (2019)

17. Corinzia, L., Buhmann, J.M.: Variational federated multi-task learning. arXiv preprint arXiv:1906.06268 (2019)

18. doc.ai: Declarative, on-device machine learning for iOS, Android, and React Native. https://github.com/doc-ai/tensorio (2019)

19. Duan, R., Boland, M.R., Liu, Z., Liu, Y., Chang, H.H., Xu, H., Chu, H., Schmid, C.H., Forrest, C.B., Holmes, J.H., et al.: Learning from electronic health records across multiple sites: A communication-efficient and privacy-preserving distributed algorithm. Journal of the American Medical Informatics Association 27(3), 376–385 (2020)

20. Dwork, C., Kenthapadi, K., McSherry, F., Mironov, I., Naor, M.: Our data, ourselves: Privacy via distributed noise generation. In: Annual International Conference on the Theory and Applications of Cryptographic Techniques, pp. 486–503. Springer (2006)

21. Dwork, C., Rothblum, G.N., Vadhan, S.: Boosting and differential privacy. In: 2010 IEEE 51st Annual Symposium on Foundations of Computer Science, pp. 51–60. IEEE (2010)

22. Eichner, H., Koren, T., McMahan, H.B., Srebro, N., Talwar, K.: Semi-cyclic stochastic gradient descent. arXiv preprint arXiv:1904.10120 (2019)

23. Fontaine, C., Galand, F.: A survey of homomorphic encryption for nonspecialists. EURASIP Journal on Information Security 2007, 15 (2007)

24. Glicksberg, B.S., Johnson, K.W., Dudley, J.T.: The next generation of precision medicine: observational studies, electronic health records, biobanks and continuous monitoring. Human Molecular Genetics 27(R1), R56–R62 (2018)

25. Google: TensorFlow Federated. https://www.tensorflow.org/federated (2019)

26. Gostin, L.O.: National health information privacy: regulations under the Health Insurance Portability and Accountability Act. JAMA 285(23), 3015–3021 (2001)

27. Gruendner, J., Schwachhofer, T., Sippl, P., Wolf, N., Erpenbeck, M., Gulden, C., Kapsner, L.A., Zierk, J., Mate, S., Stürzl, M., et al.: KETOS: Clinical decision support and machine learning as a service – a training and deployment platform based on Docker, OMOP-CDM, and FHIR web services. PLoS One 14(10) (2019)

28. Guha, N., Talwalkar, A., Smith, V.: One-shot federated learning. arXiv preprint arXiv:1902.11175 (2019)

29. Han, S., Mao, H., Dally, W.J.: Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding. arXiv preprint arXiv:1510.00149 (2015)

30. Han, Y., Zhang, X.: Robust federated training via collaborative machine teaching using trusted instances. arXiv preprint arXiv:1905.02941 (2019)

31. Hard, A., Rao, K., Mathews, R., Beaufays, F., Augenstein, S., Eichner, H., Kiddon, C., Ramage, D.: Federated learning for mobile keyboard prediction. arXiv preprint arXiv:1811.03604 (2018)

32. Hardy, S., Henecka, W., Ivey-Law, H., Nock, R., Patrini, G., Smith, G., Thorne, B.: Private federated learning on vertically partitioned data via entity resolution and additively homomorphic encryption. arXiv preprint arXiv:1711.10677 (2017)

33. He, C., Tan, C., Tang, H., Qiu, S., Liu, J.: Central server free federated learning over single-sided trust social networks. arXiv preprint arXiv:1910.04956 (2019)

34. Hill, P.: The rationale for learning communities and learning community models (1985)

35. Hinton, G., Vinyals, O., Dean, J.: Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531 (2015)

36. Hripcsak, G., Duke, J.D., Shah, N.H., Reich, C.G., Huser, V., Schuemie, M.J., Suchard, M.A., Park, R.W., Wong, I.C.K., Rijnbeek, P.R., et al.: Observational Health Data Sciences and Informatics (OHDSI): opportunities for observational researchers. Studies in Health Technology and Informatics 216, 574 (2015)


37. Huang, L., Liu, D.: Patient clustering improves efficiency of federated machine learning to predict mortality and hospital stay time using distributed electronic medical records. arXiv preprint arXiv:1903.09296 (2019)

38. Ickin, S., Vandikas, K., Fiedler, M.: Privacy preserving QoE modeling using collaborative learning. arXiv preprint arXiv:1906.09248 (2019)

39. Jensen, P.B., Jensen, L.J., Brunak, S.: Mining electronic health records: towards better research applications and clinical care. Nature Reviews Genetics 13(6), 395–405 (2012)

40. Jiang, Y., Konečny, J., Rush, K., Kannan, S.: Improving federated learning personalization via model agnostic meta learning. arXiv preprint arXiv:1909.12488v1 (2019)

41. Jin, Y., Wei, X., Liu, Y., Yang, Q.: A survey towards federated semi-supervised learning. arXiv preprint arXiv:2002.11545 (2020)

42. Johnson, A.E., Pollard, T.J., Shen, L., Li-wei, H.L., Feng, M., Ghassemi, M., Moody, B., Szolovits, P., Celi, L.A., Mark, R.G.: MIMIC-III, a freely accessible critical care database. Scientific Data 3, 160035 (2016)

43. Kairouz, P., McMahan, H.B., Avent, B., Bellet, A., Bennis, M., Bhagoji, A.N., Bonawitz, K., Charles, Z., Cormode, G., Cummings, R., et al.: Advances and open problems in federated learning. arXiv preprint arXiv:1912.04977 (2019)

44. Kamp, M., Adilova, L., Sicking, J., Hüger, F., Schlicht, P., Wirtz, T., Wrobel, S.: Efficient decentralized deep learning by dynamic model averaging. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 393–409. Springer (2018)

45. Kang, J., Xiong, Z., Niyato, D., Yu, H., Liang, Y.C., Kim, D.I.: Incentive design for efficient federated learning in mobile networks: A contract theory approach. arXiv preprint arXiv:1905.07479 (2019)

46. Kawa, D., Punyani, S., Nayak, P., Karkera, A., Jyotinagar, V.: Credit risk assessment from combined bank records using federated learning (2019)

47. Kim, Y., Sun, J., Yu, H., Jiang, X.: Federated tensor factorization for computational phenotyping. In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 887–895. ACM (2017)

48. Konečny, J., McMahan, B., Ramage, D.: Federated optimization: Distributed optimization beyond the datacenter. arXiv preprint arXiv:1511.03575 (2015)

49. Konečny, J., McMahan, H.B., Ramage, D., Richtárik, P.: Federated optimization: Distributed machine learning for on-device intelligence. arXiv preprint arXiv:1610.02527 (2016)

50. Konečny, J., McMahan, H.B., Yu, F.X., Richtárik, P., Suresh, A.T., Bacon, D.: Federated learning: Strategies for improving communication efficiency. arXiv preprint arXiv:1610.05492 (2016)

51. Kulkarni, V., Kulkarni, M., Pant, A.: Survey of personalization techniques for federated learning. arXiv preprint arXiv:2003.08673 (2020)

52. Lalitha, A., Kilinc, O.C., Javidi, T., Koushanfar, F.: Peer-to-peer federated learning on graphs. arXiv preprint arXiv:1901.11173 (2019)

53. LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436–444 (2015)

54. Lee, J., Sun, J., Wang, F., Wang, S., Jun, C.H., Jiang, X.: Privacy-preserving patient similarity learning in a federated environment: development and analysis. JMIR Medical Informatics 6(2), e20 (2018)

55. Li, T., Sahu, A.K., Zaheer, M., Sanjabi, M., Talwalkar, A., Smith, V.: Federated optimization for heterogeneous networks. arXiv preprint arXiv:1812.06127 (2019)

56. Li, T., Sanjabi, M., Smith, V.: Fair resource allocation in federated learning. arXiv preprint arXiv:1905.10497 (2019)

57. Li, Z., Roberts, K., Jiang, X., Long, Q.: Distributed learning from multiple EHR databases: Contextual embedding models for medical events. Journal of Biomedical Informatics 92, 103138 (2019)

58. Lim, W.Y.B., Luong, N.C., Hoang, D.T., Jiao, Y., Liang, Y.C., Yang, Q., Niyato, D., Miao, C.: Federated learning in mobile edge networks: A comprehensive survey. arXiv preprint arXiv:1909.11875 (2019)

59. Liu, D., Dligach, D., Miller, T.: Two-stage federated phenotyping and patient representation learning. arXiv preprint arXiv:1908.05596 (2019)

60. Lyu, L., Yu, H., Yang, Q.: Threats to federated learning: A survey. arXiv preprint arXiv:2003.02133 (2020)


61. McMahan, B., Moore, E., Ramage, D., Hampson, S., y Arcas, B.A.: Communication-efficient learning of deep networks from decentralized data. In: Artificial Intelligence and Statistics, pp. 1273–1282 (2017)

62. McMahan, H.B., Ramage, D., Talwar, K., Zhang, L.: Learning differentially private recurrent language models. arXiv preprint arXiv:1710.06963 (2017)

63. Min, X., Yu, B., Wang, F.: Predictive modeling of the hospital readmission risk from patients' claims data using machine learning: A case study on COPD. Scientific Reports 9(1), 2362 (2019)

64. Miotto, R., Wang, F., Wang, S., Jiang, X., Dudley, J.T.: Deep learning for healthcare: review, opportunities and challenges. Briefings in Bioinformatics 19(6), 1236–1246 (2018)

65. Mohri, M., Sivek, G., Suresh, A.T.: Agnostic federated learning. In: K. Chaudhuri, R. Salakhutdinov (eds.) Proceedings of the 36th International Conference on Machine Learning, Proceedings of Machine Learning Research, vol. 97, pp. 4615–4625. PMLR, Long Beach, California, USA (2019)

66. Mukherjee, R., Jaffe, H.: System and method for dynamic context-sensitive federated search of multiple information repositories (2005). US Patent App. 10/743,196

67. Nishio, T., Yonetani, R.: Client selection for federated learning with heterogeneous resources in mobile edge. arXiv preprint arXiv:1804.08333 (2018)

68. Obermeyer, Z., Powers, B., Vogeli, C., Mullainathan, S.: Dissecting racial bias in an algorithm used to manage the health of populations. Science 366(6464), 447–453 (2019)

69. OpenMined: PySyft-TensorFlow. https://github.com/OpenMined/PySyft-TensorFlow (2019)

70. Pathak, M., Rane, S., Raj, B.: Multiparty differential privacy via aggregation of locally trained classifiers. In: Advances in Neural Information Processing Systems, pp. 1876–1884 (2010)

71. Perez, M.V., Mahaffey, K.W., Hedlin, H., Rumsfeld, J.S., Garcia, A., Ferris, T., Balasubramanian, V., Russo, A.M., Rajmane, A., Cheung, L., et al.: Large-scale assessment of a smartwatch to identify atrial fibrillation. New England Journal of Medicine 381(20), 1909–1917 (2019)

72. Pfohl, S.R., Dai, A.M., Heller, K.: Federated and differentially private learning for electronic health records. arXiv preprint arXiv:1911.05861 (2019)

73. Pollard, T.J., Johnson, A.E., Raffa, J.D., Celi, L.A., Mark, R.G., Badawi, O.: The eICU Collaborative Research Database, a freely available multi-center database for critical care research. Scientific Data 5, 180178 (2018)

74. Raja, P.V., Sivasankar, E.: Modern framework for distributed healthcare data analytics based on Hadoop. In: Information and Communication Technology-EurAsia Conference, pp. 348–355. Springer (2014)

75. Rehak, D., Dodds, P., Lannom, L.: A model and infrastructure for federated learning content repositories. In: Interoperability of Web-Based Educational Systems Workshop, vol. 143. Citeseer (2005)

76. Ren, J., Yu, G., Ding, G.: Accelerating DNN training in wireless federated edge learning systems. arXiv preprint arXiv:1905.09712 (2019)

77. Roy, A.G., Siddiqui, S., Pölsterl, S., Navab, N., Wachinger, C.: BrainTorrent: A peer-to-peer environment for decentralized federated learning. arXiv preprint arXiv:1905.06731 (2019)

78. Rubinstein, B.I., Bartlett, P.L., Huang, L., Taft, N.: Learning in a large function space: Privacy-preserving mechanisms for SVM learning. arXiv preprint arXiv:0911.5708 (2009)

79. Ryffel, T., Trask, A., Dahl, M., Wagner, B., Mancuso, J., Rueckert, D., Passerat-Palmbach, J.: A generic framework for privacy preserving deep learning. arXiv preprint arXiv:1811.04017 (2018)

80. Samarakoon, S., Bennis, M., Saad, W., Debbah, M.: Federated learning for ultra-reliable low-latency V2V communications. In: 2018 IEEE Global Communications Conference (GLOBECOM), pp. 1–7. IEEE (2018)

81. Sattler, F., Wiedemann, S., Müller, K.R., Samek, W.: Robust and communication-efficient federated learning from non-IID data. arXiv preprint arXiv:1903.02891 (2019)


82. Sharma, P., Shamout, F.E., Clifton, D.A.: Preserving patient privacy while training a predictive model of in-hospital mortality. arXiv preprint arXiv:1912.00354 (2019)

83. Shayan, M., Fung, C., Yoon, C.J., Beschastnikh, I.: Biscotti: A ledger for private and secure peer-to-peer machine learning. arXiv preprint arXiv:1811.09904 (2018)

84. Shokri, R., Shmatikov, V.: Privacy-preserving deep learning. In: Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pp. 1310–1321. ACM (2015)

85. Silva, S., Gutman, B., Romero, E., Thompson, P.M., Altmann, A., Lorenzi, M.: Federated learning in distributed medical databases: Meta-analysis of large-scale subcortical brain data. arXiv preprint arXiv:1810.08553 (2018)

86. Sim, K.C., Zadrazil, P., Beaufays, F.: An investigation into on-device personalization of end-to-end automatic speech recognition models. arXiv preprint arXiv:1909.06678 (2019)

87. Smith, V., Chiang, C.K., Sanjabi, M., Talwalkar, A.S.: Federated multi-task learning. In: Advances in Neural Information Processing Systems, pp. 4424–4434 (2017)

88. Such, J.M., Criado, N.: Multiparty privacy in social media. Commun. ACM 61(8), 74–81 (2018)

89. Thomas, K., Grier, C., Nicol, D.M.: unFriendly: Multi-party privacy risks in social networks. In: International Symposium on Privacy Enhancing Technologies Symposium, pp. 236–252. Springer (2010)

90. Truex, S., Baracaldo, N., Anwar, A., Steinke, T., Ludwig, H., Zhang, R.: A hybrid approach to privacy-preserving federated learning. arXiv preprint arXiv:1812.03224 (2018)

91. Van Hasselt, H., Guez, A., Silver, D.: Deep reinforcement learning with double Q-learning. In: Thirtieth AAAI Conference on Artificial Intelligence (2016)

92. Wah, C., Branson, S., Welinder, P., Perona, P., Belongie, S.: The Caltech-UCSD Birds-200-2011 dataset (2011)

93. Wang, F., Preininger, A.: AI in health: State of the art, challenges, and future directions. Yearbook of Medical Informatics 28(01), 016–026 (2019)

94. Wang, X., Han, Y., Wang, C., Zhao, Q., Chen, X., Chen, M.: In-edge AI: Intelligentizing mobile edge computing, caching and communication by federated learning. arXiv preprint arXiv:1809.07857 (2018)

95. Xu, J., Wang, F.: Federated learning for healthcare informatics. arXiv preprint arXiv:1911.06270 (2019)

96. Xu, J., Xu, Z., Walker, P., Wang, F.: Federated patient hashing. In: AAAI, pp. 6486–6493 (2020)

97. Yang, Q., Liu, Y., Chen, T., Tong, Y.: Federated machine learning: Concept and applications. ACM Trans. Intell. Syst. Technol. 10(2), 12:1–12:19 (2019). DOI 10.1145/3298981. URL http://doi.acm.org/10.1145/3298981

98. Zhao, Y., Li, M., Lai, L., Suda, N., Civin, D., Chandra, V.: Federated learning with non-IID data. arXiv preprint arXiv:1806.00582 (2018)

99. Zhao, Y., Zhao, J., Jiang, L., Tan, R., Niyato, D.: Mobile edge computing, blockchain and reputation-based crowdsourcing IoT federated learning: A secure, decentralized and privacy-preserving system. arXiv preprint arXiv:1906.10893 (2019)

100. Zhu, H., Jin, Y.: Multi-objective evolutionary federated learning. IEEE Transactions on Neural Networks and Learning Systems (2019)

101. Zhuo, H.H., Feng, W., Xu, Q., Yang, Q., Lin, Y.: Federated reinforcement learning. arXiv preprint arXiv:1901.08277 (2019)