
Deep Learning for IoT Big Data and Streaming Analytics: A Survey

Mehdi Mohammadi, Graduate Student Member, IEEE, Ala Al-Fuqaha, Senior Member, IEEE, Sameh Sorour, Senior Member, IEEE, Mohsen Guizani, Fellow, IEEE

Abstract—In the era of the Internet of Things (IoT), an enormous number of sensing devices collect and/or generate various sensory data over time for a wide range of fields and applications. Based on the nature of the application, these devices will result in big or fast/real-time data streams. Applying analytics over such data streams to discover new information, predict future insights, and make control decisions is a crucial process that makes IoT a worthy paradigm for businesses and a quality-of-life improving technology. In this paper, we provide a thorough overview of using a class of advanced machine learning techniques, namely Deep Learning (DL), to facilitate analytics and learning in the IoT domain. We start by articulating IoT data characteristics and identifying two major treatments for IoT data from a machine learning perspective, namely IoT big data analytics and IoT streaming data analytics. We also discuss why DL is a promising approach to achieve the desired analytics in these types of data and applications. The potential of using emerging DL techniques for IoT data analytics is then discussed, and its promises and challenges are introduced. We present a comprehensive background on different DL architectures and algorithms. We also analyze and summarize major reported research attempts that leveraged DL in the IoT domain. Smart IoT devices that have incorporated DL in their intelligence background are also discussed. DL implementation approaches on fog and cloud centers in support of IoT applications are also surveyed. Finally, we shed light on some challenges and potential directions for future research. At the end of each section, we highlight the lessons learned based on our experiments and review of the recent literature.

Keywords-Deep Learning, Deep Neural Network, Internet of Things, On-device Intelligence, IoT Big Data, Fast data analytics, Cloud-based analytics.

I. INTRODUCTION

The vision of the Internet of Things (IoT) is to transform traditional objects into smart ones by exploiting a wide range of advanced technologies, from embedded devices and communication technologies to Internet protocols, data analytics, and so forth [1]. The potential economic impact of IoT is expected to bring many business opportunities and to accelerate the economic growth of IoT-based services. Based on McKinsey's report on the economic impact of IoT by 2025 [2], the annual economic impact of IoT would be in the range of $2.7 to $6.2 trillion. Healthcare constitutes the major part, about 41% of this market, followed by industry and energy with 33% and 7% of the IoT market, respectively.

Mehdi Mohammadi and Ala Al-Fuqaha are with the Department of Computer Science, Western Michigan University, Kalamazoo, MI 49008 USA (E-mail: {mehdi.mohammadi,ala.al-fuqaha}@wmich.edu). Sameh Sorour (E-mail: [email protected]) and Mohsen Guizani (E-mail: [email protected]) are with the Department of Electrical and Computer Engineering, University of Idaho, Moscow, ID 83844 USA.

Fig. 1. IoT data generation at different levels and deep learning models to address their knowledge abstraction.

Other domains such as transportation, agriculture, urban infrastructure, security, and retail collectively account for about 15% of the IoT market. These expectations imply the tremendous and steep growth of IoT services, their generated data, and consequently their related market in the years ahead.

In recent years, many IoT applications arose in different vertical domains, e.g., health, transportation, smart home, smart city, agriculture, education, etc. The main element of most of these applications is an intelligent learning mechanism for prediction (i.e., classification or regression) or clustering. Among the many machine learning approaches, Deep Learning (DL) has been actively utilized in many IoT applications in recent years. These two technologies (i.e., DL and IoT) are among the top three strategic technology trends for 2017 that were announced at Gartner Symposium/ITxpo 2016 [3]. This intense publicity for DL stems from the fact that traditional machine learning approaches do not address the emerging analytic needs of IoT systems. Instead, IoT systems need different modern data analytic approaches and artificial intelligence (AI) methods according to the hierarchy of IoT data generation and management, as illustrated in Figure 1.

The growing interest in the Internet of Things (IoT) and its derivative big data requires stakeholders to clearly understand their definition, building blocks, potentials, and challenges. IoT and big data have a two-way relationship. On one hand, IoT is a main producer of big data; on the other hand, it is an important target for big data analytics to improve the processes and services of IoT [4]. Moreover, IoT big data analytics have proven to bring value to society. For example, it is reported that, by detecting damaged pipes and fixing them, the Department of Park Management in Miami has saved about one million USD on their water bills [5].

IoT data are different from general big data. To better understand the requirements for IoT data analytics, we need to explore the properties of IoT data and how they differ from those of general big data. IoT data exhibit the following characteristics [5]:

• Large-Scale Streaming Data: A myriad of data-capturing devices are distributed and deployed for IoT applications, and generate streams of data continuously. This leads to a huge volume of continuous data.

• Heterogeneity: Various IoT data acquisition devices gather different information, resulting in data heterogeneity.

• Time and space correlation: In most IoT applications, sensor devices are attached to a specific location, and thus have a location and time-stamp for each data item.

• High noise data: Since IoT data are often acquired and transmitted in tiny pieces, many of these data items may be subject to errors and noise during acquisition and transmission.

Although obtaining hidden knowledge and information out of big data is promising for enhancing the quality of our lives, it is not an easy and straightforward task. For such a complex and challenging task that goes beyond the capabilities of traditional inference and learning approaches, new technologies, algorithms, and infrastructures are needed [6]. Luckily, recent progress in both fast computing and advanced machine learning techniques is opening the doors for big data analytics and knowledge extraction that is suitable for IoT applications.

Beyond big data analytics, IoT data calls for another new class of analytics, namely fast and streaming data analytics, to support applications with high-speed data streams that require time-sensitive (i.e., real-time or near real-time) actions. Indeed, applications such as autonomous driving, fire prediction, and driver/elderly posture (and thus consciousness and/or health condition) recognition demand fast processing of incoming data and quick actions to achieve their targets. Several researchers have proposed approaches and frameworks for fast streaming data analytics that leverage the capabilities of cloud infrastructures and services [7], [8]. However, for the aforementioned IoT applications among others, we need fast analytics on smaller-scale platforms (i.e., at the system edge) or even on the IoT devices themselves. For example, autonomous cars need to make fast decisions on driving actions such as lane or speed changes. Indeed, such decisions should be supported by fast analytics of possibly multi-modal data streaming from several sources, including the multiple vehicle sensors (e.g., cameras, radars, lidars, speedometer, left/right signals), communications from other vehicles, and traffic entities (e.g., traffic lights, traffic signs). In this case, transferring data to a cloud server for analysis and returning the response is subject to latency that could cause traffic violations or accidents. A more critical scenario would be the detection of pedestrians by such vehicles. Accurate recognition should be performed in strict real-time to prevent fatal accidents. These scenarios imply that fast data analytics for IoT have to be close to or at the source of data to remove unnecessary and prohibitive communication delays.

A. Survey Scope

DL models in general bring two important improvements over traditional machine learning approaches in the two phases of training and prediction. First, they reduce the need for hand-crafted and engineered feature sets to be used for training. Consequently, some features that might not be apparent to the human eye can be extracted easily by DL models. In addition, DL models improve the accuracy of the prediction model when trained with large amounts of data.

In this paper, we review a wide range of deep neural network (DNN) architectures and explore the IoT applications that have benefited from DL algorithms. The paper identifies five main foundational IoT services that can be used in different vertical domains beyond the specific services in each domain. It also discusses the characteristics of IoT applications and provides a guide to matching them with the most appropriate DL model. This survey focuses on the confluence of two emerging technologies, one in communication networks, i.e., IoT, and the other in artificial intelligence, i.e., DL, detailing their potential applications and open issues. The survey does not cover traditional machine learning algorithms for IoT data analytics, as other attempts, mentioned in Section I-B, have covered such approaches. Moreover, this survey does not go into the details of the IoT infrastructure from a communications and networking perspective.

B. Related Work

To the best of our knowledge, there does not exist an article in the literature dedicated to surveying the specific relation between IoT data and DL, as well as the applications of DL methods in IoT. There are a few works presenting common data mining and machine learning methods that have been used in IoT environments. The work presented in [9] by Tsai et al. focused on data mining approaches in IoT. It addressed different classification, clustering, and frequent pattern mining algorithms for the IoT infrastructure and services. However, that work did not consider DL approaches, which are the focus of our survey. Moreover, their focus is mainly on offline data mining, while we also consider learning and mining for both real-time (i.e., fast) and big data analytics.

In [10], Perera et al. reviewed different classes of machine learning approaches (supervised and unsupervised, rules, fuzzy logic, etc.) in the reasoning phase of a context-aware computing system, and discussed the potential of applying those methods in IoT systems. Nonetheless, they also did not study the role of DL in context reasoning.

The work in [11] by Alsheikh et al. provides a survey of machine learning methods for wireless sensor networks (WSNs). In that work, the authors studied machine learning methods in the functional aspects of WSNs, such as routing, localization, and clustering, as well as non-functional requirements, such as security and quality of service. They reviewed several algorithms in supervised, unsupervised, and reinforcement learning approaches. That work focuses on the infrastructure of WSNs (which is one potential infrastructure for implementing IoT applications), while our work is not dependent on the sources of data (i.e., IoT infrastructures) and covers a wide range of IoT applications and services. Moreover, the focus of [11] was on traditional machine learning methods, whereas this article focuses on advanced DL techniques.

Fig. 2. Structure of the survey.

Finally, Fadlullah et al. [12] addressed DL approaches in network traffic control systems. While that work primarily focuses on the network infrastructure, ours focuses on the use of DL in IoT applications.

C. Contributions

This paper is intended for IoT researchers and developers who want to build analytics, AI systems, and learning solutions on top of their IoT infrastructure using emerging DL machine learning approaches. The contributions of this paper can be summarized as follows:

• In order to adopt DL approaches in IoT ecosystems, we identify the key characteristics and issues of IoT data.

• Compared to related work in the literature that has addressed machine learning for IoT, we review the state-of-the-art DL methods and their applicability in the IoT domain, both for big data and streaming data analytics.

• We review a wide range of IoT applications that have used DL in their context. We also provide a comparison and a guideline for using different types of DNN in the various IoT domains and applications.

• We review the recent approaches and technologies for deploying DL on all levels of the IoT hierarchy, from resource-constrained devices to the fog and the cloud.

• We highlight the challenges and future research directions for the successful and fruitful merging of DL and IoT applications.

The rest of this paper is organized as follows. In Section II, we highlight the IoT data characteristics and describe what IoT big data as well as fast and streaming data are, and how they differ from general big data. Section III presents several common and successful architectures of DNNs. It also includes a brief description of advancements toward real-time and fast DL architectures, as well as state-of-the-art algorithms that are used jointly with DL. A succinct review of several frameworks and tools with different capabilities and algorithms that support DNNs is also presented. IoT applications in different domains (e.g., healthcare, agriculture, ITS, etc.) that have used DL are surveyed in Section IV. Section V reviews the attempts to bring DNNs to resource-constrained devices. Section VI covers the works that investigated bringing DNN models to the scale of fog and cloud computing. Future research directions and open challenges are presented in Section VII. The paper is concluded in Section VIII with a summary of its main take-away messages. Figure 2 depicts the structure of the paper.

II. IOT DATA CHARACTERISTICS AND REQUIREMENTS FOR ANALYTICS

IoT data can be streamed continuously or accumulated as a source of big data. Streaming data refers to data generated or captured within tiny intervals of time that need to be promptly analyzed to extract immediate insights and/or make fast decisions. Big data refers to huge datasets that commonly used hardware and software platforms are not able to store, manage, process, and analyze. These two approaches should be treated differently, since their requirements for analytic response are not the same. Insight from big data analytics can be delivered after several days of data generation, but insight from streaming data analytics should be ready within a range of a few hundred milliseconds to a few seconds.

A. IoT Fast and Streaming Data

Many research attempts have suggested streaming data analytics that can mainly be deployed on high-performance computing systems or cloud platforms. Streaming data analytics on such frameworks is based on data parallelism and incremental processing [13]. In data parallelism, a large dataset is partitioned into several smaller datasets, on which parallel analytics are performed simultaneously. Incremental processing refers to fetching a small batch of data to be processed quickly in a pipeline of computation tasks. Although these techniques reduce the latency of the streaming data analytics framework's responses, they are not the best possible solution for time-stringent IoT applications. By bringing streaming data analytics closer to the source of data (i.e., IoT devices or edge devices), the need for data parallelism and incremental processing diminishes, since the smaller data volumes at the source can be processed rapidly. However, bringing fast analytics onto IoT devices introduces its own challenges, such as the limited computing, storage, and power resources at the source of data.

B. IoT Big Data

IoT is well known to be one of the major sources of big data, as it is based on connecting a huge number of smart devices to the Internet to report the frequently captured status of their environments. Recognizing and extracting meaningful patterns from enormous raw input data is the core utility of big data analytics, as it results in higher levels of insight for decision-making and trend prediction. Therefore, extracting these insights and knowledge from big data is of extreme importance to many businesses, since it enables them to gain competitive advantages. In social sciences, Hilbert [14] compares the impact of big data analytics to that of the invention of the telescope and microscope for astronomy and biology, respectively.

Several works have described the general features of big data from different aspects [14]–[17] in terms of volume, velocity, and variety. However, we adopt the general definition of big data to characterize IoT big data through the following "6V's" features:

• Volume: Data volume is a determining factor in considering a dataset as big data or as traditional massive/very large data. The quantity of data generated by IoT devices is much greater than before and clearly fits this feature.

• Velocity: The rate of IoT big data production and processing is high enough to support the availability of big data in real-time. This justifies the need for advanced tools and technologies for analytics to operate efficiently given this high rate of data production.

• Variety: Generally, big data comes in different forms and types. It may consist of structured, semi-structured, and unstructured data. A wide variety of data types may be produced by IoT, such as text, audio, video, sensory data, and so on.

• Veracity: Veracity refers to the quality, consistency, and trustworthiness of the data, which in turn lead to accurate analytics. This property needs special attention in IoT applications, especially those with crowd-sensing data.

• Variability: This property refers to the different rates of data flow. Depending on the nature of IoT applications, different data-generating components may have inconsistent data flows. Moreover, a data source may have different rates of data load at specific times. For example, a parking service application that utilizes IoT sensors may have a peak data load in rush hours.

• Value: Value is the transformation of big data into useful information and insights that bring competitive advantage to organizations. The value of data highly depends on both the underlying processes/services and the way the data is treated. For example, a certain application (e.g., medical vital sign monitoring) may need to capture all sensor data, while a weather forecast service may need just random samples of data from its sensors. As another example, a credit card provider may need to keep data for a specific period of time and discard it thereafter.

Beyond the aforementioned properties, researchers [14], [16] have identified other characteristics, such as:

• Big data can be a byproduct or footprint of a digital activity or IoT interplay. The use of Google's most common search terms to predict seasonal flu is a good example of such a digital byproduct.

• Big data systems should be horizontally scalable, that is, big data sources should be expandable to multiple datasets. This attribute also leads to the complexity attribute of big data, which in turn imposes other challenges like transferring and cleansing data.

Performing analytics over continuous data flows is typically referred to as stream processing or sometimes complex event processing (CEP) in the literature. Strohbach et al. [18] proposed a big data analytics framework for IoT to support the volume and velocity attributes of IoT data analytics. The integration of IoT big data and streaming data analytics, an open issue that needs more investigation, has also been studied as part of that work. However, their proposed framework is designed to be deployed on cloud infrastructures. Moreover, their focus is on the data management aspect of the framework, and they did not use advanced machine learning models such as DL. Other off-the-shelf products, such as Apache Storm, are also available for real-time analytics on the cloud. A big gap in this area is the lack of frameworks and algorithms that can be deployed on the fog (i.e., the system edge) or even on the IoT devices. When DL comes into play in such cases, a trade-off between the depth and performance of the DNN should be considered.

III. DEEP LEARNING

DL consists of supervised or unsupervised learning techniques based on many layers of artificial neural networks that are able to learn hierarchical representations in deep architectures. DL architectures consist of multiple processing layers, each able to produce non-linear responses based on the data from its input layer. The functionality of DL imitates the mechanisms of the human brain and neurons for processing signals.

Fig. 3. Google Trends showing more attention toward deep learning in recent years.

DL architectures have gained more attention in recent years compared to other traditional machine learning approaches. Such traditional approaches are considered shallow-structured learning architectures, i.e., limited versions of DL. Figure 3 shows the search trends of five popular machine learning algorithms in Google Trends, among which DL is becoming the most popular. Although Artificial Neural Networks (ANNs) were introduced in past decades, the growing trend for DNNs started in 2006 when G. Hinton et al. presented the concept of deep belief networks [19]. Thereafter, the state-of-the-art performance of this technology has been observed in different fields of AI, including image recognition, image retrieval, search engines and information retrieval, and natural language processing.

A. Architectures

In this section, we present a brief overview of several common DL models as well as the most cutting-edge architectures that have been introduced in recent years. Interested readers can refer to other literature that surveys the models and architectures of DL in more detail, such as [20]. Table I summarizes these models, their attributes, and their characteristics.

A DNN consists of an input layer, several hidden layers, and an output layer. Each layer includes several units called neurons. A neuron receives several inputs, performs a weighted summation over these inputs, and passes the resulting sum through an activation function to produce an output. Each neuron has a vector of weights associated with its inputs, as well as a bias, which should be optimized during the training process. Figure 4 depicts the structure of a neuron.

Fig. 4. A neuron is a unit of artificial neural networks, with several inputs and trainable weights and bias.
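As a minimal sketch of this computation (assuming NumPy and a ReLU activation, neither of which the text prescribes), a single neuron can be evaluated as follows:

    import numpy as np

    def neuron(x, w, b):
        # Weighted summation of the inputs plus the bias, passed
        # through a ReLU activation function f(s) = max(0, s).
        return max(0.0, float(np.dot(w, x) + b))

    # Three inputs, three trainable weights, and one trainable bias.
    y = neuron(x=np.array([0.5, -1.2, 3.0]),
               w=np.array([0.4, 0.1, -0.2]),
               b=0.05)
    print(y)  # 0.0 if the weighted sum is negative, the sum otherwise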

Fig. 5. The overall mechanism of training of a DL model.

In the training process, the input layer assigns (usually randomly) weights to the input training data and passes them to the next layer. Each subsequent layer also assigns weights to its inputs and produces outputs, which serve as the inputs for the following layer. At the last layer, the final output representing the model's prediction is produced. A loss function determines how right or wrong this prediction is by computing the error between the prediction and the true value. The error is propagated back across the network to the input layer. The network then repeats this training cycle, adjusting the weights of each neuron in each cycle, until the error rate falls below a desired threshold. At this point, the DNN is trained and ready for inference. Figure 5 illustrates the high-level mechanism of training for DL models.
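This cycle maps directly onto the training interface of most DL frameworks. The following sketch, a toy binary classifier in Keras trained on synthetic data (the layer sizes, loss, and data are our illustrative assumptions, not the paper's), shows the loss computation and the repeated weight-update cycle handled by compile() and fit():

    import numpy as np
    from tensorflow import keras

    # Synthetic training set: 100 samples, 8 features, binary labels.
    x = np.random.rand(100, 8).astype("float32")
    y = np.random.randint(0, 2, size=(100, 1))

    model = keras.Sequential([
        keras.layers.Dense(16, activation="relu", input_shape=(8,)),
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    # The loss function scores each prediction against the true label;
    # SGD propagates the error back through the layers after each batch.
    model.compile(optimizer="sgd", loss="binary_crossentropy")
    model.fit(x, y, epochs=5, verbose=0)  # repeat the training cycle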

In a broad categorization, DL models fall into three categories, namely generative, discriminative, and hybrid models. Though the boundary between them is not firm, discriminative models usually provide supervised learning approaches, while generative models are used for unsupervised learning. Hybrid models incorporate the benefits of both discriminative and generative models.


1) Convolutional Neural Networks (CNNs):
For vision-based tasks, DNNs with dense connections between layers are hard to train and do not scale well. One important reason is that such models are not translation-invariant, and thus do not learn features that might appear transformed in the image (e.g., a rotated hand in pose detection). CNNs have solved this problem by supporting translation-equivariant computations. A CNN receives a 2-D input (e.g., an image or speech signal) and extracts high-level features through a series of hidden layers. The hidden layers consist of convolution layers as well as fully connected layers at the end. The convolution layer is at the core of a CNN and consists of a set of learnable parameters, called filters, that have the same depth as the input but smaller spatial dimensions. In the training process, the filter of each convolutional layer slides over the whole input volume (e.g., in the case of an image, it goes across the width and height of the image) and calculates the inner product of the input and the filter. This computation over the whole input produces the filter's feature map.

Another building block of a CNN is the pooling layer, which operates on the feature maps. The objective of pooling layers is to reduce the spatial size of the representation, in order to cut down the number of parameters and the computation time, and to reduce the chance of overfitting. Max pooling is a common approach that partitions the input space into non-overlapping regions and picks the maximum value of each region.

The last important component in a CNN is the Rectified Linear Unit (ReLU), which consists of neurons with an activation function of the form f(x) = max(0, x). The introduction of this activation function in CNNs results in faster training times without a noticeable negative effect on the generalization of the network [21].

A main difference between CNNs and fully connected networks is that each neuron in a CNN is connected only to a small subset of the input. This decreases the total number of parameters in the network and reduces the time complexity of the training process. This property is called local connectivity.
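The building blocks just described (convolution with learnable filters, ReLU activations, max pooling, and a fully connected layer at the end) compose directly. A minimal Keras sketch follows, with filter counts and input shape chosen arbitrarily for illustration:

    from tensorflow import keras

    model = keras.Sequential([
        # 32 learnable 3x3 filters slide over the input volume and each
        # produces a feature map; ReLU adds the non-linearity.
        keras.layers.Conv2D(32, (3, 3), activation="relu",
                            input_shape=(28, 28, 1)),
        # Max pooling keeps the maximum of each 2x2 region, shrinking
        # the feature maps and cutting parameters and overfitting risk.
        keras.layers.MaxPooling2D((2, 2)),
        keras.layers.Flatten(),
        # A fully connected layer maps the features to class scores.
        keras.layers.Dense(10, activation="softmax"),
    ])
    model.summary()  # note how few parameters local connectivity needs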

2) Recurrent Neural Networks (RNNs):
In many tasks, prediction depends on several previous samples, such that, in addition to classifying individual samples, we also need to analyze sequences of inputs. In such applications, a feed-forward neural network is not applicable, since it assumes no dependency between input and output layers. RNNs have been developed to address this issue in sequential (e.g., speech or text) or time-series problems (e.g., sensor data) with varying lengths. The input to an RNN consists of both the current sample and the previously observed samples. In other words, the output of an RNN at time step t−1 affects the output at time step t. Each neuron is equipped with a feedback loop that returns the current output as an input for the next step. This structure can be interpreted as each neuron in an RNN having an internal memory that keeps the information of the computations from the previous inputs.

To train the network, an extension of the backpropagation algorithm, called Backpropagation Through Time (BPTT), is used. Due to the existence of cycles between the neurons, we cannot use the original backpropagation here, since it works based on deriving the error with respect to the weights in the layer above, while we do not have a stacked layer model in RNNs. The core of the BPTT algorithm is a technique called unrolling the RNN, such that we come up with a feed-forward network over the time span. Figure 6 depicts the structure of an RNN and the unrolling concept.

Fig. 6. Structure of a recurrent neural network.

Traditional RNNs can be considered deep models, since they can be seen as several non-linear layers of neurons between the input layer and the output layer when they are unfolded in time [22]. However, considering the architecture and functionality of RNNs, the hidden layers in RNNs are supposed to provide a memory instead of a hierarchical representation of features [23]. There are several approaches to make RNNs deeper, including adding more layers between the input and hidden layers, stacking more hidden layers, and adding more layers between the hidden layers and the output layer [22].
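In a common formulation (notation ours, not the paper's), the recurrence that gives the network its memory can be written as:

    h_t = f(W_{xh} x_t + W_{hh} h_{t-1} + b_h),    y_t = g(W_{hy} h_t + b_y)

where h_t is the hidden state at time step t, f and g are activation functions, and the recurrent weights W_{hh} carry information from one step to the next. Unrolling this recurrence over t = 1, ..., T yields exactly the feed-forward network on which BPTT operates.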

3) Long Short Term Memory (LSTM):
LSTM is an extension of RNNs. Different variations of LSTM have been proposed, though most of them follow the same design as the original network [24]. LSTM uses the concept of gates for its units, each computing a value between 0 and 1 based on its input. In addition to a feedback loop to store the information, each neuron in an LSTM (also called a memory cell) has multiplicative forget, read, and write gates. These gates are introduced to control the access to memory cells and to prevent them from being perturbed by irrelevant inputs. When the forget gate is active, the neuron writes its data into itself. When the forget gate is turned off by sending it a 0, the neuron forgets its last content. When the write gate is set to 1, other connected neurons can write to that neuron. If the read gate is set to 1, the connected neurons can read the content of the neuron. Figure 7 depicts this structure.

Fig. 7. Structure of an LSTM memory cell. Solid arrow lines show the flow of data and dashed arrow lines show the signals coming from gates.

An important difference between LSTMs and RNNs is that LSTM units utilize forget gates to actively control the cell states and ensure they do not degrade. The gates can use sigmoid or tanh as their activation functions. In other models, these activation functions cause the problem of vanishing gradients during backpropagation in the training phase. By learning what data to remember, the computations stored in the memory cells of LSTMs are not distorted over time. BPTT is a common method for training the network to minimize the error.
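For concreteness, the now-standard formulation of these updates (which names the gates input i_t, forget f_t, and output o_t rather than read/write/forget; notation ours) is:

    f_t = sigma(W_f [h_{t-1}, x_t] + b_f)
    i_t = sigma(W_i [h_{t-1}, x_t] + b_i)
    o_t = sigma(W_o [h_{t-1}, x_t] + b_o)
    c_t = f_t * c_{t-1} + i_t * tanh(W_c [h_{t-1}, x_t] + b_c)
    h_t = o_t * tanh(c_t)

where sigma is the sigmoid, * denotes element-wise multiplication, and c_t is the protected cell state. The additive update of c_t, guarded by the forget gate, is what keeps gradients from vanishing over long time lags.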

4) Autoencoders (AEs):
AEs consist of an input layer and an output layer that are connected through one or more hidden layers. AEs have the same number of input and output units. This network aims to reconstruct the input by transforming inputs into outputs in the simplest possible way, such that the input is not distorted very much. This kind of neural network has been used mainly for solving unsupervised learning problems as well as transfer learning [25].

AEs have two main components: an encoder and a decoder. The encoder receives the input and transforms it into a new representation, usually called a code or latent variable. The decoder receives the code generated by the encoder and transforms it into a reconstruction of the original input. The training procedure in AEs involves minimizing the reconstruction error, i.e., making the difference between the output and the input minimal. Figure 8 illustrates the structure of a typical AE. There are several variations and extensions of AEs, like denoising AEs, contractive AEs, stacked AEs, sparse AEs, and variational AEs.
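A minimal sketch of this encoder/decoder pair in Keras (the dimensions are arbitrary illustrative choices, not from the paper):

    from tensorflow import keras

    input_dim, code_dim = 784, 32  # e.g., a flattened 28x28 frame

    autoencoder = keras.Sequential([
        # Encoder: compress the input into a small latent code.
        keras.layers.Dense(code_dim, activation="relu",
                           input_shape=(input_dim,)),
        # Decoder: reconstruct the original input from the code.
        keras.layers.Dense(input_dim, activation="sigmoid"),
    ])
    # Training minimizes the reconstruction error: the target output
    # is the input itself, so no labels are needed (unsupervised).
    autoencoder.compile(optimizer="adam", loss="mse")
    # autoencoder.fit(x_train, x_train, epochs=...)  # unlabeled data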

Fig. 8. Structure of an autoencoder network.

5) Variational Autoencoders (VAEs):
VAEs, introduced in 2013, are a popular generative model framework whose assumptions on the structure of the data are not strong, while having a fast training process through backpropagation [26]. Moreover, this model has been used for semi-supervised learning [27]. Therefore, it is a good fit for IoT solutions that present a variety of data as well as a scarcity of labeled data. For each data point x, there is a vector of corresponding latent variables denoted by z. The training architecture of a VAE consists of an encoder and a decoder with parameters φ and θ, respectively. A fixed-form distribution qφ(z|x) helps the encoder estimate the posterior distribution pθ(z|x). The model consists of two networks: one generating samples and the other performing approximate inference. A schematic of the VAE is depicted in Figure 9.

Fig. 9. Structure of a variational autoencoder network.
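The training objective behind this encoder/decoder pair is the standard evidence lower bound (ELBO) from [26] (written in the notation above):

    log pθ(x) >= E_{qφ(z|x)}[ log pθ(x|z) ] - D_KL( qφ(z|x) || p(z) )

The first term rewards accurate reconstruction of x from the latent code z, while the KL term keeps the approximate posterior qφ(z|x) close to the prior p(z); both terms are differentiable, which is what enables the fast training through backpropagation mentioned above.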

6) Generative Adversarial Networks (GANs):
GANs, introduced by Goodfellow et al. [28], consist of two neural networks, namely the generative and discriminative networks, which work together to produce high-quality synthetic data. The former network (a.k.a. the generator) is in charge of generating new data after it learns the data distribution from a training dataset. The latter network (a.k.a. the discriminator) discriminates between real data (coming from the training data) and fake input data (coming from the generator). The generative network is optimized to produce input data that deceives the discriminator, i.e., data that the discriminator cannot easily distinguish as fake or real. In other words, the generative network is competing with an adversary discriminative network. Figure 10 depicts the concept of GANs.

Fig. 10. Concept of a generative adversarial network.

The objective function in GANs is based on a minimax game, such that one network tries to maximize the value function and the other network wants to minimize it. In each step of this imaginary game, the generator, willing to fool the discriminator, plays by producing a sample from random noise. The discriminator, on the other hand, receives several real data examples from the training set along with the samples from the generator. Its task is then to discriminate between real and fake data. The discriminator is considered to perform satisfactorily if its classifications are correct, and the generator performs well if its examples have fooled the discriminator. Both the discriminator and generator parameters are then updated for the next round of the game. The discriminator's output helps the generator optimize its generated data for the next round.
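This game is captured by the value function from [28]:

    min_G max_D V(D, G) = E_{x ~ p_data(x)}[ log D(x) ] + E_{z ~ p_z(z)}[ log(1 - D(G(z))) ]

The discriminator D maximizes its probability of labeling real and generated samples correctly, while the generator G minimizes the same quantity by driving D(G(z)) toward 1.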

7) Restricted Boltzmann Machines (RBMs):
An RBM is a stochastic ANN that consists of two layers: a visible layer that contains the input that we know, and a hidden layer that contains the latent variables. Compared to Boltzmann machines, the restriction in RBMs is applied to the connectivity of the neurons: an RBM must form a bipartite graph, such that each visible neuron is connected to all hidden neurons and vice versa, but there is no connection between any two units in the same layer. Moreover, the bias unit is connected to all of the visible and hidden neurons. RBMs can be stacked to form DNNs. They are also the building block of deep belief networks.

The training data is assigned to the visible units. The training procedure optimizes the weights of the network with gradient-based algorithms, typically contrastive divergence. The objective of training an RBM is to maximize the product of the probabilities of the visible units. The functionality of an RBM is similar to that of AEs, as it uses forward feeding to compute the latent variables, which are in turn used to reconstruct the input using backward feeding. The structure of an RBM is shown in Figure 11.

Fig. 11. Structure of a restricted Boltzmann machine. The visible and hidden layers have separate biases.
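The bipartite structure gives the RBM a simple energy function; in the standard formulation (notation ours), with visible units v, hidden units h, weights W, and biases a and b:

    E(v, h) = -a^T v - b^T h - v^T W h,    p(v, h) = e^{-E(v, h)} / Z

where Z is the normalizing partition function. Training raises the probability of the observed visible configurations by lowering their energy.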

Fig. 12. Structure of a deep belief network. The dashed arrows show the feature extraction path and the solid arrows show the generative path.

8) Deep Belief Networks (DBNs):
DBNs are a type of generative ANN that consists of a visible layer (corresponding to the inputs) and several hidden layers (corresponding to latent variables). They can extract a hierarchical representation of the training data as well as reconstruct their input data. By adding a classifier layer such as softmax, a DBN can be used for prediction tasks.

The training of a DBN is performed layer by layer, such that each layer is treated as an RBM trained on top of the previously trained layer. This mechanism makes DBNs efficient and fast to train [29]. For a given hidden layer in a DBN, the hidden layer of the previous RBM acts as its input layer. Figure 12 shows the structure of a typical DBN.

9) Ladder Networks:
Ladder networks were proposed in 2015 by Valpola et al. [30] to support unsupervised learning. Later, they were extended to work in semi-supervised settings [31] and have shown state-of-the-art performance for several tasks, such as handwritten digit recognition and image classification. The architecture of a ladder network consists of two encoders and one decoder. The encoders act as the supervised part of the network and the decoder performs unsupervised learning. One of the encoders, called the clean encoder, produces the normal computations, while the other, called the corrupted encoder, adds Gaussian noise to all layers.

Using a denoising function, the decoder can reconstruct the representations at each layer given the corresponding corrupted data. The difference between the reconstructed and clean data at each layer is used to compute the denoising cost of that layer. On the encoder side, the cost function uses the difference between the corrupted outputs of the encoder layers and the corresponding clean outputs. The training objective is to minimize the sum of the costs of the supervised and unsupervised parts. Figure 13 shows the structure of a ladder network.

Fig. 13. Ladder network structure with two layers.

B. Fast and Real-time DL Architectures

The literature on fast and real-time analytics using DL models over streams of data is still in its infancy. An initial work in this area was done by Liang et al. [32], who extended extreme learning machine (ELM) networks with an online sequential learning algorithm for single hidden layer feed-forward networks. Their framework, called OS-ELM, learns the training data one-by-one as well as chunk-by-chunk, and only newly arrived data go through the training process. This architecture is the basis for the real-time manufacturing execution system proposed in [33], where OS-ELM is used for shop floor object localization using RFID technology. Zou et al. [34] also reported using this architecture for an indoor localization algorithm based on WiFi fingerprinting, in which the OS-ELM model tolerates dynamic environmental changes well while still showing good accuracy.

For convolutional networks, the architecture proposed by Ren et al., called Faster R-CNN [35] (based on Fast R-CNN [36]), aims to detect objects in images in real-time. Object detection in images needs more computation, and hence consumes more energy, than image classification tasks, since the system has a large number of potential object proposals to evaluate. The proposed architecture applies region proposal algorithms within full CNNs that perform object bound prediction and objectness score computation at each position at the same time. Their evaluation of the proposed object detection architecture indicates that the run time of the system is between 5-17 frames per second (fps). Mao et al. [37] also used Fast R-CNN on embedded platforms, reporting a run time of 1.85 fps on an embedded CPU+GPU platform, which was shown to be energy-efficient with close-to-real-time performance. However, for image processing tasks, we can consider an approach to be truly real-time when it can process and analyze 30 fps or better. Redmon et al. [38] developed YOLO, which reaches a performance of 45 fps, and an even smaller version, Fast YOLO, which achieves 155 fps; both are suitable for smart cameras.

C. Joint DL with Other Approaches

DL architectures have also been used jointly with other machine learning approaches to make them more efficient. The nonlinear function approximation of DL models, which can support thousands or even billions of parameters, is a strong motivation to use this method within other machine learning approaches in need of such functions. Moreover, the automatic feature extraction of deep models is another motivating reason to exploit them jointly with other approaches. In the following subsections, a summary of such approaches that are suitable for IoT scenarios is provided.

1) Deep Reinforcement Learning:
Deep Reinforcement Learning (DRL) [39] is a combination of reinforcement learning (RL) with DNNs. It aims to create software agents that can learn by themselves to establish successful policies for gaining maximum long-term rewards. In this approach, RL finds the best policy of actions over the set of states in an environment using a DNN model. The need for a DNN in an RL model becomes evident when the underlying environment can be represented by a large number of states, a situation in which traditional RL is not efficient enough. Instead, a DL model can be used to approximate the action values in order to estimate the quality of an action in a given state. Systems that use DRL are in their infancy, but have already shown very promising results. In the field of IoT, the work presented in [40] uses DRL in a semi-supervised setting for localization in smart campus environments. Figure 14 shows a sample result of such a method, where a DNN model helps gain more rewards in a semi-supervised setting (left sub-figure in Figure 14) along with the corresponding accuracy (right sub-figure).
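A representative instance of this idea is the deep Q-network loss, in which a DNN with parameters θ approximates the action-value function (standard formulation, notation ours, not taken from [40]):

    L(θ) = E[ ( r + γ max_{a'} Q(s', a'; θ^-) - Q(s, a; θ) )^2 ]

where r is the reward for taking action a in state s, s' is the next state, γ is the discount factor, and θ^- are the parameters of a periodically updated target network. Minimizing L(θ) by gradient descent replaces the intractable tabular value updates of traditional RL in large state spaces.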

TABLE I
SUMMARY OF DEEP LEARNING MODELS.

AE: generative; unsupervised learning; various input data. Characteristics: suitable for feature extraction and dimensionality reduction; same number of input and output units; the output reconstructs the input data; works with unlabeled data.

RNN: discriminative; supervised learning; serial, time-series input data. Characteristics: processes sequences of data through internal memory; useful in IoT applications with time-dependent data.

RBM: generative; unsupervised or supervised learning; various input data. Characteristics: suitable for feature extraction, dimensionality reduction, and classification; expensive training procedure.

DBN: generative; unsupervised or supervised learning; various input data. Characteristics: suitable for hierarchical feature discovery; greedy layer-by-layer training of the network.

LSTM: discriminative; supervised learning; serial, time-series, long time-dependent input data. Characteristics: good performance with data of long time lags; access to memory cells is protected by gates.

CNN: discriminative; supervised learning; 2-D input data (image, sound, etc.). Characteristics: convolution layers take the biggest part of the computation; fewer connections compared to dense DNNs; needs a large training dataset for visual tasks.

VAE: generative; semi-supervised learning; various input data. Characteristics: a class of autoencoders; suitable for scarcity of labeled data.

GAN: hybrid; semi-supervised learning; various input data. Characteristics: suitable for noisy data; composed of two networks, one generator and one discriminator.

Ladder Net: hybrid; semi-supervised learning; various input data. Characteristics: suitable for noisy data; composed of three networks, two encoders and one decoder.

Fig. 14. Deep reinforcement learning (supervised and semi-supervised): obtaining rewards (left) and their corresponding accuracy measurement (right) [40].

2) Transfer Learning with Deep Models:
Transfer learning, which falls in the area of domain adaptation and multi-task learning, involves the adaptation and improvement of learning in a new domain by transferring the knowledge representation that has been learned from the data of a related domain [41]. Transfer learning is an interesting potential solution for many IoT applications where gathering training data is not an easy task. For example, consider the training of a localization system through Bluetooth Low Energy (BLE) or WiFi fingerprinting using smartphones: the RSSI values at the same time and location vary across platforms (e.g., iOS and Android). If we have a trained model for one platform, the model can be transferred to the other platform without re-collecting another set of training data for the new platform.

DL models are well matched to transfer learning due to their ability to learn both low-level and abstract representations from input data. Specifically, stacked denoising AEs [41] and other variations of AEs [42] have been shown to perform very well in this area. Transfer learning with DNNs is still an ongoing and active research area in the AI community, and we have not yet seen reported real-world applications in IoT.


3) Online Learning Algorithms Joint with DL:
As the streams of data generated by IoT applications go through cloud platforms for analysis, the role of online machine learning algorithms becomes more prominent, since the training model needs to be updated with the incrementally arriving data. This is opposed to what current technologies support, which is based on batch learning techniques, where the whole training dataset should be available for training and, thereafter, the trained model cannot evolve with new data. Several research works report applying online learning techniques to various DL models, including stacked denoising AEs [43], sum-product networks [44], and RBMs [45].
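As a minimal sketch of the contrast with batch learning (using Keras' train_on_batch; the streaming interface here is our assumption), an already-trained model can be refined incrementally as new readings arrive, instead of being retrained on the full historical dataset:

    def update_on_stream(model, stream):
        # model: a compiled Keras network already trained on history.
        # stream: an iterable yielding (x_batch, y_batch) arrays as new
        # IoT readings arrive; each call performs one gradient update.
        for x_batch, y_batch in stream:
            loss = model.train_on_batch(x_batch, y_batch)
            yield loss  # monitor drift in the incoming data

Note that such naive incremental updates can forget older patterns; the cited works [43]–[45] develop online techniques tailored to specific DL models.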

D. Frameworks

The rapid growth of interest in using DL architectures in different domains has been supported by the introduction of several DL frameworks in recent years. Each framework has its own strengths based on its supported DL architectures, optimization algorithms, and ease of development and deployment [46]. Several of these frameworks have been used widely in research for efficient training of DNNs. In this section, we review some of these frameworks.

H2O: H2O is a machine learning framework that provides interfaces for R, Python, Scala, Java, JSON, and CoffeeScript/JavaScript [47]. H2O can be run in different modes, including standalone mode, on Hadoop, or in a Spark cluster. In addition to common machine learning algorithms, H2O includes an implementation of a DL algorithm, which is based on feed-forward neural networks that can be trained by Stochastic Gradient Descent (SGD) with backpropagation. H2O's DL AE is based on the standard deep (multi-layer) neural net architecture, where the entire network is learned together, instead of being stacked layer-by-layer.

Tensorflow: Initially developed for the Google Brain project, Tensorflow is an open source library for machine learning systems using various kinds of DNNs [48]. It is used by many Google products, including Google Search, Google Maps and Street View, Google Translate, YouTube, and some other products. Tensorflow uses graph representations to build neural network models. Developers can also take advantage of TensorBoard, a package for visualizing neural network models and observing the learning process, including the updating of parameters. Keras (https://keras.io/) also provides a high level of programming abstraction for Tensorflow.

Torch: Torch is an open source framework for machine learning containing a wide range of DL algorithms for easy development of DNN models [49]. It was developed upon the Lua programming language to be light-weight and fast for training DL algorithms. It is used by several companies and research labs, like Google, Facebook, and Twitter. It supports developing machine learning models for both CPUs and GPUs, and provides powerful parallelization packages for training DNNs.

Theano: Theano is an open source Python-based framework for efficient machine learning algorithms, which supports compiling for CPUs and GPUs [50]. It uses the CUDA library to optimize the complicated code that needs to be run on GPUs, and also allows parallelism on CPUs. Theano uses graph representations for symbolic mathematical expressions. Through this representation, symbolic differentiation of mathematical expressions is supported in Theano. Several wrappers, including Pylearn2, Keras, and Lasagne, provide an easier programming experience on top of Theano [51].

Caffe: Caffe [52] is an open source framework for DL algorithms and a collection of reference models. It is based on C++, supports CUDA for GPU computations, and provides interfaces for Python and Matlab. Caffe separates model representation from its implementation; this is made possible by defining models through configurations, without hard-coding them in the source code. Switching between platforms (e.g., CPU to GPU or mobile devices) requires only changing a flag. Its speed on a GPU is reported to be 1 ms/image for prediction and 4 ms/image for training.

Neon: Neon (http://neon.nervanasys.com) is another open source DL framework, based on Python, with high performance for modern DNNs such as AlexNet [21], VGG, and GoogleNet. It supports developing several commonly used models, such as CNNs, RNNs, LSTMs, and AEs, on both CPUs and GPUs. The list is being extended, as GANs for semi-supervised learning have been implemented. It also supports easy switching of the hardware platform back-ends.

Bahrampour et al. [46] have provided a comparative study of four of the aforementioned tools, namely Caffe, Neon, Theano, and Torch. Although the performance of each tool varies across scenarios, Torch and Theano showed the best overall performance in most of them. Another benchmark is provided in [53], comparing the running performance of Caffe, TensorFlow, Torch, CNTK, and MXNet. Table II summarizes and compares the different DL frameworks.

E. Lessons Learned

In this section, we reviewed several common DL architectures that can serve in the analytics component of various IoT applications. Most of these architectures work with the various types of input data generated by IoT applications. However, to get better performance for serial or time-series data, RNNs and their variations are recommended. For cases where input data has more than one dimension, variations of CNNs work better. Table I summarizes the DL architectures.

A few attempts toward making DL architectures fast and real-time responsive were also discussed. This avenue needs more exploration and research to become applicable in many time-sensitive IoT applications. Emerging machine learning architectures and techniques that both benefit from DL and address specific IoT application requirements were also highlighted. Indeed, DRL can support autonomous operation of IoT applications, transfer learning can fill the gap caused by the lack of training datasets, and online learning matches the need for stream analysis of IoT data.

We also reviewed several common and powerful frameworks for the development of DL models. For IoT applications, training times, run times, and dynamic updates of the trained models are determining factors for a reliable and efficient analytics module.


TABLE II
PROPERTIES OF FRAMEWORKS FOR DEVELOPING DEEP LEARNING (AS OF SEPTEMBER 2017).

Framework | Core language | Interface | Pros | Cons | Used in IoT application
H2O | Java | R, Python, Scala, REST API | Wide range of interfaces | Limited number of supported models; not flexible | [54]
Tensorflow | C++ | Python, Java, C, C++, Go | Fast on LSTM training; support for visualizing networks | Slower training compared to other Python-based frameworks | [55]
Theano | Python | Python | Supports various models; fast LSTM training on GPU | Many low-level APIs | [56]
Torch | Lua | C, C++ | Supports various models; good documentation; helpful error/debugging messages | Requires learning a new language | [55], [57]
Caffe | C++ | Python, Matlab | Provides a collection of reference models; easy platform switching; very good at convolutional networks | Not very good for recurrent networks | [58]–[60]
Neon | Python | Python | Fast training time; easy platform switching; supports modern architectures like GAN | No support for CPU multi-threading | [61]
Chainer [62] | Python | Python | Supports modern architectures; easier implementation of complex architectures; dynamic change of model | Slower forward computation in some scenarios | [63]
Deeplearning4j | Java | Python, Scala, Clojure | Distributed training; imports models from major frameworks (e.g., TensorFlow, Caffe, Torch, and Theano); visualization tools | Longer training time compared to other tools | [64], [65]

Most of the current frameworks follow the pattern of "define-and-run" instead of "define-by-run" [62]. The former does not allow dynamic updates of the model once it is built, while the latter supports such modifications. Chainer [62] is a framework that follows the define-by-run pattern and can therefore handle dynamic changes of the model, as the sketch below illustrates.
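In the following minimal sketch, the computation graph is traced by ordinary Python control flow during the forward pass, so the model structure (here, its depth) can change from one call to the next; all values are illustrative:

```python
# Minimal define-by-run sketch in Chainer: the graph is built as the
# forward pass runs, so Python control flow can alter the architecture.
import numpy as np
import chainer.functions as F
from chainer import Variable

def forward(x, n_layers):
    h = Variable(x)
    for _ in range(n_layers):   # depth decided at run time
        h = F.relu(h)
    return F.sum(h)

y = forward(np.random.rand(3, 4).astype(np.float32), n_layers=2)
y.backward()                    # gradients follow the graph just traced
```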

IV. DL APPLICATIONS IN IOT

DL methods have shown promising, state-of-the-art results in several areas, such as signal processing, natural language processing, and image recognition, and the trend is growing in IoT verticals. Some neural network models work better in particular domains. For example, convolutional networks provide better performance in vision-related applications, while AEs perform very well for anomaly detection, data denoising, and dimensionality reduction for data visualization. It is therefore important to identify which kind of neural network model best fits each application domain.

In this section, we review successful applications of DL in IoT domains. Based on our observations, many IoT-related applications utilize vision and image classification (like traffic sign recognition or plant disease detection, which we discuss in Section IV-B) as their base intelligent service. There are other services, such as human pose detection, which are utilized in smart home applications or intelligent car assistance.

We identify several kinds of these services as foundational services, on which other IoT applications can be built. The common property of these services is that they should be handled in a fast analytics mode rather than piling up their data for later analysis. Beyond these foundational services, each domain may also have its own specific services. Figure 15 shows the foundational services and the IoT applications on top of them.

In the following subsections, we first review the foundational services of IoT that use DL as their intelligence engine, then highlight the IoT applications and domains in which a combination of foundational as well as domain-specific services may be utilized.

A. Foundational Services

1) Image Recognition:
A large portion of IoT applications addresses scenarios in which the input data for DL is in the form of videos or images. Ubiquitous mobile devices equipped with high-resolution cameras make it easy for everyone, everywhere, to generate images and videos. Moreover, intelligent video cameras are used in many places like smart homes, campuses, and manufacturing plants for different applications. Image recognition/classification and object detection are among the fundamental usages of such devices.

One issue with the IoT-related systems that address image recognition is the use of specific source datasets for the evaluation of their performance. Most of these systems employ the commonly available image datasets, such as MNIST, VGG, etc. Though good for comparison with other approaches, those datasets do not reflect the specific characteristics of IoT systems. For example, the input for the task of vehicle detection in smart cars is not always a clear image; the input may be captured at night, or in rainy or foggy weather. These cases are not covered by the available datasets, and hence models trained on them are not comprehensive enough.

Fig. 15. IoT applications and the foundational services.

2) Speech/Voice Recognition:
With the massive proliferation of smart mobile devices and wearables, automatic speech recognition is becoming a more natural and convenient way for people to interact with their devices [66]. Moreover, the small size of today's mobile devices and wearables lowers the possibility of having touch screens and keyboards as a means of input and interaction. However, the main concern for providing speech/voice recognition functionality on resource-constrained devices is its energy intensiveness, especially when the data is processed through neural networks. In a typical speech recognition neural network model, raw voice data is presented as input to the network and processed through the hidden layers, and the likelihood that the voice data corresponds to a particular speech sound is produced at the output layer.

Price et al. [67] reported building a special-purpose, low-power DL chip for automatic speech recognition. The specialized chip consumes between 0.2 and 10 milliwatts, about 100 times less than the energy needed to run a speech recognition tool on current mobile phones. In the chip, DNNs for speech recognition have been implemented. For the sake of energy saving, three levels of voice activity recognition are designed with three separate neural networks, each with a different level of complexity. The lowest-complexity network, which thus consumes the least energy, detects voice activity by monitoring the noise in the environment. If this network identifies a voice, the chip runs the next recognition network, whose task is acoustic modeling to determine whether the voice resembles speech. If this network outputs a high likelihood, the third network, having the highest energy consumption, is triggered to identify individual words.
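The energy-saving logic of this cascade can be summarized in a few lines. The sketch below is a hypothetical illustration (the stage networks and thresholds are placeholders, not the chip's actual design): each stage runs only if the cheaper stage before it fires, so the expensive word recognizer is invoked rarely.

```python
# Hypothetical sketch of a three-level recognition cascade: cheap stages
# gate expensive ones, so average energy per audio frame stays low.
def cascade_asr(audio, vad_net, acoustic_net, word_net,
                vad_thresh=0.5, speech_thresh=0.5):
    if vad_net(audio) < vad_thresh:          # stage 1: is there any voice?
        return None
    if acoustic_net(audio) < speech_thresh:  # stage 2: does it look like speech?
        return None
    return word_net(audio)                   # stage 3: full word recognition
```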

3) Indoor Localization:
Location-aware services, such as indoor navigation and location-aware marketing in retail, are becoming prevalent in indoor environments. Indoor localization also has applications in other IoT sectors, such as smart homes, smart campuses, and hospitals. The input data generated in such applications usually comes from different technologies, such as vision, visible light communication (VLC), infrared, ultrasound, WiFi, RFID, ultra-wide band, and Bluetooth. For WiFi- or Bluetooth-based approaches, most of the literature has used mobile phones to receive signals from fixed transmitters (i.e., access points or iBeacons); the received signal measurements are called fingerprints. Among these fingerprinting approaches, several attempts reported the use of DL models to predict the location [68]–[70].

DL has been used successfully to locate indoor positions with high accuracy. In a system called DeepFi [68], a DL method over fingerprinting WiFi channel state information data has been utilized to identify user positions. This system consists of an offline training phase and an online localization phase. In the offline training phase, DL is exploited to train all the weights based on the previously stored channel state information fingerprints. Other works [69], [70] report using variations of DL models in conjunction with other learning methods to extract features and estimate positions. These experiments assert that the number of hidden layers and units in DL models has a direct effect on the localization accuracy. In [71], a CNN is used for indoor localization by fusing both magnetic and visual sensing data. Moreover, a CNN has been trained in [72] to determine the indoor positions of users by analyzing an image of their surrounding scene.
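To make the two-phase workflow concrete, the following minimal sketch uses a simple Keras classifier as a stand-in for the deeper models in [68]–[70]; the real systems use channel state information features and more elaborate training, and all sizes and data here are placeholders:

```python
# Minimal sketch of fingerprint-based localization: offline training on
# stored fingerprints, then online mapping of a new signal vector to a
# reference location. Sizes and data are illustrative placeholders.
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

n_aps, n_locations = 20, 50            # hypothetical: APs and reference points

model = Sequential([
    Dense(64, activation='relu', input_shape=(n_aps,)),
    Dense(64, activation='relu'),
    Dense(n_locations, activation='softmax'),   # predict a reference point
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')

# Offline phase: train on surveyed fingerprints (signal vectors -> locations).
fingerprints = np.random.rand(1000, n_aps)      # placeholder survey data
locations = np.random.randint(0, n_locations, 1000)
model.fit(fingerprints, locations, epochs=5, verbose=0)

# Online phase: map a newly observed signal vector to the nearest location.
print(model.predict(np.random.rand(1, n_aps)).argmax())
```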

Lu et al. have also used LSTM networks for localizing soccer robots [73]. In this application, data collected from several sensors, namely an Inertial Navigation System (INS) and vision perception, are analyzed to predict the position of the robot. The authors reported improved accuracy and efficiency compared to two baseline methods, namely standard Extended Kalman Filtering (EKF) and static Particle Filtering.

4) Physiological and Psychological State Detection:
IoT combined with DL techniques has also been employed to detect various physiological and psychological states of humans, such as pose, activity, and emotion. Many IoT applications incorporate a module for human pose estimation or activity recognition to deliver their services, e.g., smart homes, smart cars, entertainment (e.g., Xbox), education, rehabilitation and health support, sports, and industrial manufacturing. For example, convenient applications in smart homes are built on the analysis of an occupant's poses: cameras transfer video of the occupant to a DNN, which determines the person's pose so that the most appropriate action can be taken. Toshev et al. [74] report a system employing a CNN model to achieve this goal. This sort of service can also be used in education to monitor the attention of students, and in retail stores to predict the shopping behavior of customers [75].

Ordonez et al. [76] proposed a DL framework that combines the strengths of CNN and LSTM neural networks for human activity recognition from wearable sensor data (accelerometer, gyroscope, etc.). Their model consists of four convolutional layers with rectified linear units (ReLUs), followed by two LSTM layers and a softmax layer. They showed that this combination outperformed a baseline model based only on convolutional layers by 4% on average. The work of Tao et al. [77] also used an LSTM architecture for human activity recognition based on mobile phone sensor data. Li et al. [78] likewise report using raw data from passive RFID tags to detect medical activities in a trauma room (e.g., blood pressure measurement, mouth exam, cardiac lead placement) with a deep CNN.
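A model in the spirit of [76] can be sketched compactly in Keras; the layer sizes, window length, and number of sensor channels below are illustrative rather than the exact configuration of the original work:

```python
# Minimal sketch of a CNN+LSTM activity recognizer: four 1-D convolutional
# layers with ReLU extract local features from the wearable sensor stream,
# two LSTM layers model temporal dynamics, and a softmax layer classifies.
from keras.models import Sequential
from keras.layers import Conv1D, LSTM, Dense

n_steps, n_channels, n_classes = 128, 9, 6   # window length, sensor axes, activities

model = Sequential([
    Conv1D(64, 5, activation='relu', input_shape=(n_steps, n_channels)),
    Conv1D(64, 5, activation='relu'),
    Conv1D(64, 5, activation='relu'),
    Conv1D(64, 5, activation='relu'),
    LSTM(128, return_sequences=True),
    LSTM(128),
    Dense(n_classes, activation='softmax'),
])
model.compile(optimizer='adam', loss='categorical_crossentropy')
model.summary()
```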

In [79], a combined model of CNN and RNN was proposed for gesture recognition in video frames. This model showed better results compared to models without such a combination, asserting the importance of the recurrent component for this task. In [80], Fragkiadaki et al. proposed a DNN model called Encoder-Recurrent-Decoder (ERD) for human body pose recognition and motion prediction in videos and motion capture datasets. The proposed model consists of an RNN with an encoder before the recurrent layers and a decoder after them. This architecture was shown to outperform Conditional Restricted Boltzmann Machines (CRBMs) for this application.

Beyond physical movements, the estimation of human emotions from video frames has also been investigated in [81], using a model that consists of a CNN, a DBN, and an AE. Furthermore, the work in [82] used mobile inertial sensor data for motion detection, and confirmed that human motion patterns can serve as a source of user identification and authentication. The model employed in this system is a combination of convolutional layers and a clockwork RNN.

5) Security and Privacy:
Security and privacy are major concerns in all IoT domains and applications. Smart homes, ITS, industry, the smart grid, and many other sectors consider security a critical requirement. Indeed, the validity of a system's functionality depends on protecting its machine learning tools and processes from attackers.

False Data Injection (FDI) is a common type of attack on data-driven systems. In [83], He et al. proposed a Conditional DBN to extract FDI attack features from the historical data of smart grids, and used these features for attack detection in real time. The work in [84] similarly addresses anomaly detection in vehicle networks.

Smart phones, as major contributors to IoT data and applications, are also under serious threat from hacker attacks. Consequently, protecting these devices from a variety of security issues is necessary from an IoT perspective, beyond the individual users' concerns. Yuan et al. [85] proposed a DL framework to identify malware in Android apps. Their architecture is based on a DBN, with which they reported an accuracy of 96.5% in detecting malware apps.

The security and privacy preservation of deep machine learning approaches are among the most important factors for the acceptance of these methods in IoT sectors. Shokri et al. [86] proposed a method to address privacy preservation in DL models that are subject to distributed learning. Their approach preserves both the privacy of participants' training data and the accuracy of the models. Its core insight is that the stochastic gradient descent optimization used in many DL architectures can be performed in a parallel and asynchronous way; individual participants can thus independently train the model on their own data and share only a portion of their model parameters with other participants. Abadi et al. [87] also proposed a method for privacy guarantees in DL models using a differentially private stochastic gradient descent algorithm.

B. Applications

1) Smart Homes:
The concept of smart homes involves a broad range of IoT-based applications, which can enhance a home's energy usage and efficiency as well as the convenience, productivity, and quality of life of its occupants. Nowadays, home appliances can connect to the Internet and provide intelligent services. For example, Microsoft and Liebherr, in a collaborative project, are applying Cortana DL to information gathered from inside the refrigerator [88]. These analytics and predictions can help the household gain better control over their home supplies and expenses and, in conjunction with other external data, can be used for monitoring and predicting health trends.

Over one third of the electricity generated in the U.S. is consumed by the residential sector [89], with HVAC and lighting devices constituting the largest sources of such consumption in buildings. This demand is expected to grow at a slower pace thanks to smart energy management as well as efficiency improvements in appliances. Hence, the ability to control and improve energy efficiency, and to predict future needs, is a must for smart home systems. Among smart home applications, electricity load prediction is the most common one employing DL networks. Manic et al. [89] performed a comparative analysis of load forecasting for home energy consumption using three DL architectures: LSTM, LSTM Sequence-to-Sequence (S2S), and CNN. Their results show that LSTM S2S predicts future usage best, followed by CNN and then LSTM. They also evaluated a conventional ANN on the same dataset, and all of the aforementioned models outperformed it.

Feng et al. [90] report the use of RBMs and DBNs for fall detection in a home care environment. Normal postures in such an environment are standing, sitting, bending, and lying; lying on the floor for longer than a threshold is considered a fall. Their evaluation shows that the RBM outperforms the DBN in terms of classification accuracy. The lack of large datasets and the offline nature of the detection are the main limitations of their method.

2) Smart City:
Smart city services span several IoT domains, such as transportation, energy, and agriculture. This area is particularly interesting from a machine learning perspective, as the heterogeneous data coming from different domains constitute big data that can yield high-quality output when analyzed with DL models.

Toshiba recently developed a DL testbed jointly with Dell Technologies, and used it in a Smart Community Center in Kawasaki, Japan, to evaluate the data collected in the center [91]. The aim of the testbed is to measure the effectiveness of DL architectures in IoT ecosystems and to identify best practices for service improvement, including increasing machine availability, optimizing monitoring sensors, and lowering maintenance expenses. The big data feeding the testbed was gathered from building management, air conditioning, and building security.

One important issue for smart cities is predicting crowd movement patterns and their use of public transportation. Song et al. [92] developed a system based on DNN models to achieve this goal at a city level. Their system is built on a four-layer LSTM neural network that learns from human mobility data (GPS data) jointly with the corresponding transportation transition modes (e.g., stay, walk, bicycle, car, train). Instead of merging all features into a single prediction task, they treated the prediction of people's mobility and of their transportation mode as two tasks, and based their learning system on a multi-task deep LSTM architecture that jointly learns from the two sets of features. The choice of LSTM was driven by the spatio-temporal nature of human mobility patterns. The authors assert that their multi-task deep LSTM approach achieves better performance than both shallow LSTMs with a single LSTM layer and deep LSTMs without multi-tasking.
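A multi-task architecture of this flavor can be sketched with the Keras functional API; the shapes, output heads, and losses below are illustrative stand-ins for the actual design of [92]:

```python
# Minimal multi-task LSTM sketch: a shared stack of LSTM layers learns from
# joint mobility features, with separate heads for location prediction and
# transportation-mode prediction. All dimensions are illustrative.
from keras.models import Model
from keras.layers import Input, LSTM, Dense

steps, feat = 24, 8                       # hypothetical: window length, features
inp = Input(shape=(steps, feat))
h = LSTM(64, return_sequences=True)(inp)  # shared representation
h = LSTM(64)(h)
loc_out = Dense(2, name='location')(h)                     # next (lat, lon)
mode_out = Dense(5, activation='softmax', name='mode')(h)  # transport mode

model = Model(inputs=inp, outputs=[loc_out, mode_out])
model.compile(optimizer='adam',
              loss={'location': 'mse', 'mode': 'categorical_crossentropy'})
```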

Liang et al. [93] presented a real-time crowd density prediction system for transportation stations that leverages mobile phone users' telecommunication data, known as caller detail records (CDR). CDR data are gathered when a user takes a telecommunication action (i.e., call, SMS, MMS, or Internet access) on the phone, and usually include the user's ID, time, location, and the type of action.

TABLE III
TYPICAL IOT-BASED SERVICES IN SMART CITY.

Service | Reference | Input data | DL architecture
Crowd density/transportation prediction | [92] | GPS / transition mode | LSTM
Crowd density/transportation prediction | [93] | Telecommunication data / CDR | RNN
Waste management | [58] | Garbage images | CNN
Parking lot management | [94], [95] | Images of parking spaces | CNN

They built their system on an RNN model for metro stations, and reported more accurate predictions compared to nonlinear autoregressive neural network models.

Waste management and garbage classification is another relevant task for smart cities. A straightforward way to automate it is through vision-based classification with deep CNNs, as was done in [58].

Amato et al. [94] developed a decentralized system to identify occupied and empty spots in parking lots using smart cameras and deep CNNs. The authors deployed a small CNN architecture on smart cameras equipped with Raspberry Pi 2 boards, so that each embedded device can run the CNN locally to classify the images of individual parking spaces as occupied or empty; the cameras then send only the classification output to a central server. Valipour et al. [95] also developed a system for detecting parking spots using a CNN, which showed satisfactory results compared to SVM baselines. Table III summarizes the aforementioned attempts.

3) Energy:
The two-way communication between energy consumers and the smart grid is a source of IoT big data. In this context, smart meters play the role of data generation and acquisition for fine-grained measurement of energy consumption. Energy providers are interested in learning local energy consumption patterns, predicting needs, and making appropriate decisions based on real-time analytics. Mocanu et al. [96] developed a variant of RBM to identify and predict buildings' energy flexibility in real time. Beyond showing good performance and accuracy, this model allows flexibility identification and flexibility prediction to be performed concurrently. In [97], two variations of RBMs were used to forecast short-term energy consumption in residential houses: a Conditional RBM (CRBM) and a Factored Conditional RBM (FCRBM). Their results indicate that the FCRBM performs better than the CRBM, RNN, and ANN. Moreover, as the forecasting horizon is extended, the FCRBM and CRBM show more accurate predictions than the RBM and ANN.

On the smart grid side, forecasting power from solar, wind, and other natural sustainable energy sources is an active research field, and DL is increasingly used in many of its applications. For example, in [98], Gensler et al. investigate the performance of several DL models, such as DBNs, AEs, and LSTMs, as well as the multilayer perceptron (MLP), for predicting the solar power of 21 photovoltaic plants. For solar power prediction, a main element of the input is the numerical weather forecast over a given time horizon. In their evaluation, the combination of AEs and LSTMs (Auto-LSTM) produced the best results, followed by the DBN. The reason for the good prediction score of the Auto-LSTM is its ability to extract features from the raw data, which is not the case for the ANN and MLP. In [63], an online forecasting system based on LSTM is proposed to predict solar power 24 hours ahead.

4) Intelligent Transportation Systems:
Data from Intelligent Transportation Systems (ITS) is another source of big data that is becoming more ubiquitous every day. Ma et al. [56] presented a transportation network analysis system based on DL. They employed RBM and RNN architectures as their models in a parallel computing environment, with GPS data from participating taxis as the models' input. The accuracy of their system in predicting traffic congestion evolution over one hour of aggregated data is reported to be as high as 88%, computed in less than 6 minutes. The authors of [99] investigated short-term traffic flow prediction, using LSTM as the learning model, and reported better accuracy for LSTM compared to other methods, including SVM, simple feed-forward neural networks, and stacked AEs. For the different intervals considered (15, 30, 45, and 60 min), LSTM showed the lowest mean absolute percentage error (MAPE). However, for the short interval of 15 minutes, the error rate of the LSTM model was slightly higher than that of the SVM. This result can be explained by the fact that the small number of data points in short intervals does not yield strong discrimination boundaries for the LSTM model compared to the SVM. In another study [84], ITS data are fed to a DNN-based intrusion detection system to improve the security of in-vehicle network communications.

ITS also motivates the development of methods for traffic sign detection and recognition. Applications such as autonomous driving, driver assistance systems, and mobile mapping need such mechanisms to provide reliable services. Ciresan et al. [100] presented a traffic sign recognition system based on DNNs with convolutional and max-pooling layers. They introduced a multi-column DNN architecture that comprises several columns of separate DNNs, and reported increased accuracy with this approach. The input is preprocessed by several different preprocessors, and the columns receive preprocessed copies of the input for training; the final prediction is the average of all the DNNs' outputs. Their results show that the proposed method, achieving a recognition rate of 99.46%, recognizes traffic signs better than humans on this task, with 0.62% higher accuracy.
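The column-averaging idea can be expressed generically as below; the model and preprocessor objects are placeholders rather than the exact setup of [100]:

```python
# Minimal sketch of the multi-column idea: several independently trained
# DNN "columns", each fed a differently preprocessed copy of the input,
# with the final prediction taken as the average of their outputs.
import numpy as np

def mcdnn_predict(image, columns, preprocessors):
    """columns: list of models with .predict(); preprocessors: matching callables."""
    outputs = [col.predict(prep(image)) for col, prep in zip(columns, preprocessors)]
    return np.mean(outputs, axis=0)       # average the per-column class scores
```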

For these analytics to be applicable in real scenarios, they must be performed in real time. Lim et al. [101] proposed real-time traffic sign detection based on a CNN integrated with a general-purpose GPU. They reported an F1 measure of at least 0.89 on data with illumination changes. To obtain a faster inference engine, they used a CNN with only two convolutional layers.

Furthermore, self-driving cars use DNNs to perform many tasks, such as detecting pedestrians, traffic signs, and obstacles. Several startups use DL in their self-driving cars to perform the different tasks involved in driving on the streets [102].

5) Healthcare and Wellbeing:
IoT combined with DL has also been employed to provide healthcare and wellbeing solutions for individuals and communities. For instance, developing mobile apps that accurately measure dietary intake is a line of research that can help monitor the health and wellbeing of individuals. Liu et al., in [103] and [59], developed a system to recognize food images and their relevant information, such as food type and portion size. Their image recognition algorithm is based on CNNs and achieved competitive results compared to baseline systems.

DL for the classification and analysis of medical images is a hot topic in the healthcare domain. For example, Pereira et al. [104] used the idea of recognizing handwritten images with CNNs to help identify Parkinson's disease in its early stages. Their model learns features from the signals of a smart pen that uses sensors to measure handwriting dynamics during an individual's exam. Muhammad et al. [105] proposed a voice pathology detection system using IoT and cloud frameworks, in which patients' voice signals are captured through sensor devices and sent to a cloud server for analytics; an extreme learning machine trained on voice signals diagnoses the pathology. In [106], DL was employed to detect cardiovascular disease from mammograms: Wang et al. used breast arterial calcification (BAC) revealed in mammograms as a sign of coronary artery disease, and developed a twelve-layer CNN to identify the existence of BAC in a patient. Their results show that the accuracy of their DL model matches that of human experts. Although this work was done offline, it shows the potential of developing or extending mammogram devices in IoT contexts for the online and early detection of such diseases.

Researchers have also used time-series medical data in conjunction with RNN-based models for the early diagnosis and prediction of diseases. Lipton et al. [107] investigated the performance of LSTM networks in analyzing and recognizing patterns in multivariate time series of medical measurements in intensive care units (ICUs). The input data in their system consist of sensor readings of vital signs as well as lab test results. Their results show that an LSTM model trained on raw time-series data outperforms an MLP network. A survey of DL in health informatics is provided in [108].

6) Agriculture:
Producing healthy crops and developing efficient ways of growing plants is a requirement for a healthy society and a sustainable environment. Disease recognition in plants using DNNs has been shown to be a viable solution. In a study reported by Sladojevic et al. [60], the authors developed a plant disease recognition system based on the classification of leaf images. They used a deep convolutional network implemented with the Caffe framework. With this model, diseased leaves in 13 categories can be distinguished from healthy ones with an accuracy of about 96%. Such a recognition model can be exploited in smart mobile applications that let farmers identify fruit, vegetable, or plant diseases from leaf images captured by their mobile devices, and can also help them select remedies or pesticides when used in conjunction with complementary data.

DL has also been used in remote sensing for land and crop detection and classification [109]–[111]. The direction established in these works enables the automated monitoring and management of agricultural land at large scale. In most of these works, deep convolutional networks are used to learn from images of the land or crops. In [109], using a CNN is reported to yield an accuracy of 85% in detecting major crops, including wheat, sunflower, soybeans, and maize, outperforming other approaches such as the MLP and random forest (RF).

Furthermore, DL has been utilized for prediction and detection tasks in automated farming. For example, [112] used a DL model based on deep CNNs for obstacle detection in agricultural fields, enabling autonomous machines to operate safely in them. The proposed system was able to detect a standardized object with an accuracy between 90.8% and 99.9%, depending on the field type (e.g., row crops or grass mowing).

Moreover, detecting fruit and determining its stage (raw or ripe) is critical for automated harvesting. In [113], Sa et al. used a variation of CNN, called Region-based CNN, for the image analysis of fruits. The input images come in two modes, one containing RGB colors and the other near-infrared; the information from both is combined in the model, which achieved improved detection compared to pixel-based training models.

7) Education:
IoT and DL contribute to the efficiency of education systems, from kindergarten to higher education. Mobile devices can gather learners' data, and deep analytical methods can be used to predict and interpret learners' progress and achievements. Augmented reality technology combined with wearables and mobile devices is another potential application area for DL methods, keeping students motivated, making lessons engaging, and improving the efficiency of teaching [114], [115]. Moreover, DL can serve as a personalized recommendation module [116], suggesting more relevant content to the learner. The applications of DL in other domains, such as natural language translation and text summarization, would also help smart education when it comes to online learning on mobile devices.

Furthermore, the advent of Massive Open Online Courses (MOOCs) and their popularity among students has generated a huge amount of data about learners' behavior in such courses. MOOC analysis can help identify struggling students in the early sessions of a course, so that instructors can provide those students with sufficient support and attention to achieve a better performance. Yang et al. [117] proposed a method for predicting student grades in MOOCs. They use clickstream data, collected while students watch and interact with lecture videos, fed to a time-series DNN that learns from both prior performance and the clickstream data. In addition, Piech et al. applied RNN and LSTM networks to predict learners' answers to exercises and quizzes based on their past activities and interactions in MOOCs [118]. The results showed improvement over Bayesian Knowledge Tracing (BKT) methods, which employ a Hidden Markov Model (HMM) to update the probabilities of single concepts. Mehmood et al. [54] also used DNNs in a personalized, ubiquitous e-teaching and e-learning framework based on IoT technologies, aiming at the development and delivery of educational content in smart cities. Their framework is built on top of an IoT infrastructure (e.g., smart phone sensors, smart watch sensors, virtual reality technologies) connecting the users in order to optimize the teaching and learning processes, and it uses a DNN for human activity recognition to deliver adaptive educational content to students.

Classroom occupancy monitoring is another application, investigated by Conti et al. [119]. The authors propose two methods for head detection and density estimation, both based on CNN architectures, for counting students in a classroom. The algorithms were deployed on an off-the-shelf embedded ARM mobile platform, and receive images taken by cameras in three classrooms at a rate of three pictures every 10 minutes. They report that the root-mean-square (RMS) error of their algorithms is at most 8.55.

8) Industry:
In the industry sector, IoT and cyber-physical systems (CPS) are the core elements advancing manufacturing technologies toward smart manufacturing (a.k.a. Industry 4.0). Providing high-accuracy intelligent systems is critical in such applications, as it directly leads to increased efficiency and productivity in assembly/product lines as well as decreased maintenance expenses and operation costs. DL can therefore play a key role in this field: a wide range of industrial applications, such as visual inspection of product lines, object detection and tracking, robot control, and fault diagnosis, can benefit from the introduction of DL models.

In [55], visual inspection is investigated using CNN architectures, including AlexNet and GoogLeNet, over different platforms (Caffe, Tensorflow, and Torch). In this work, several images of vehicles produced on the assembly line, along with their annotations, are submitted to a DL system. The best performance was achieved using Tensorflow, with an accuracy of 94%. Moreover, Tensorflow was the fastest framework in terms of training time, with the model reaching its peak accuracy in the shortest time, followed by Torch and then Caffe.

Shao et al. [120] used DNNs for feature extraction in a fault diagnosis (also referred to as fault detection and classification (FDC)) system for rotating devices. Models using a denoising auto-encoder (DAE) and a contractive auto-encoder (CAE) were developed. The features learned by these models were refined using a method called locality preserving projection (LPP) and fed to a softmax classifier for fault diagnosis. Based on their experiments on the fault diagnosis of rotor and locomotive bearing devices, the proposed approach is reported to outperform CNNs and shallow learning methods.

In another study, reported in [121], a DBN model was proposed in conjunction with an IoT deployment and a cloud platform to support the detection of defect types in cars' headlight modules in a vehicle manufacturing setting. The results confirmed the superior performance of the DBN model over two baseline methods, based on SVM and radial basis function (RBF) networks, in terms of the error rate on the test dataset. However, the reported error rate of the DBN model on the training dataset is comparable to that of the SVM model.

For the problem of fault detection and classification (FDC) in noisy settings, [122] employed stacked denoising AEs (SdAs) to both reduce the noise of sensory data caused by mechanical and electrical disturbances and perform fault classification. Their system was applied to fault detection in wafer samples of a photolithography process. Results show that the SdA yields 14% higher accuracy in noisy situations compared to several baseline methods, including K-Nearest Neighbors and SVM. Yan et al. [123] also used SdAs jointly with extreme learning machines for anomaly detection in the behavior of a gas turbine combustion system. Based on their results, using the features learned by the SdA leads to better classification accuracy than using hand-crafted features.

9) Government:
Governments can gain great potential advantages from the enhanced and intelligent connectivity that comes with the convergence of IoT and DL. Indeed, a wide variety of tasks that pertain to governments or city authorities require precise analysis and prediction. For instance, the recognition and prediction of natural disasters (landslides, hurricanes, forest fires, etc.) and environmental monitoring are of high importance for governments so that they can take appropriate actions. Liu et al. [124] proposed feeding optical remote sensing images to a network of deep AEs with softmax classifiers to predict geological landslides; an accuracy of 97.4% was reported, outperforming SVM and ANN models. In another study [125], an LSTM network was used for earthquake prediction, trained on historical data from the US Geological Survey website; the system achieved accuracies of 63% and 74% with 1-D and 2-D input data, respectively. In a further study by Liu et al. [61], a CNN architecture was used to detect extreme climate events, such as tropical cyclones, atmospheric rivers, and weather fronts. The training data in their system comprised image patterns of climate events. The authors developed their system in the Neon framework and achieved accuracies of 89%-99%.

In addition, damage detection in city infrastructure, such as roads and water pipelines, is another area where IoT and DL can benefit governments. In [126], the problem of road damage detection was addressed using DNNs fed through crowd-sourcing, which can be enabled by IoT devices: citizens report damage through a mobile app to a platform. However, these citizens lack the expert knowledge to accurately assess the status of road damage, which may lead to uncertain and/or wrong assessments. To eliminate such instances, the app determines the status of the road damage by analyzing an image of the scene. The analysis is performed by a deep CNN trained on citizen reports as well as road manager inspection results. Since training is beyond the capability of mobile phones, the DL model is created on a server and retrained every day; an Android application then downloads the latest model from the server upon each launch and identifies the status of road damage from the reported images. Evaluations showed a damage classification accuracy of 81.4%, with 1 second of analysis time on the mobile device.

10) Sport and Entertainment:
Sports analytics has been evolving rapidly in recent years and plays an important role in bringing a competitive advantage to a team or player. Professional sport teams nowadays have dedicated departments or employees for analytics [127]. Analytics and predictions in this field can be used to track players' behavior and performance, capture scores, and so on. DL is new to this area, and only a few works have used DNNs in different sports.

In [128], a DL method was proposed for an intelligent basketball arena. The system uses an SVM to choose the best camera for real-time broadcasting from among the available cameras around the court. It also feeds basketball energy images to a CNN to distinguish shooting and scoring clips from non-scoring ones, thereby providing accurate online score reporting and interesting highlight clips. This system was shown to achieve an accuracy of 94.59% in capturing score clips, with 45 ms of processing time per frame.

In another work, by Wang et al. [129], an RNN was used to classify offensive basketball plays in NBA games. The authors used video clips of the games from the SportVU dataset (http://go.stats.com/sportvu), which provides video at a rate of 25 frames per second from which players' unique IDs, their locations on the court, and the position of the ball can be extracted. Their model achieves top-1 and top-3 accuracies of 66% and 80%, respectively. Similarly, [130] used an RNN with LSTM units over the same dataset to predict the success rates of three-point shots, and reported better classification accuracy compared to the gradient boosted machine (GBM) and the generalized linear model (GLM).

Kautz et al. [131] investigated players' activity recognition in volleyball. Wearable sensor data and a CNN were employed for this task, and a classification accuracy of 83.2% in identifying players' activities was observed.

Group activity recognition is another interesting direction for sport teams. Ibrahim et al. [132] investigated it for a volleyball team using a hierarchical LSTM model. In this work, a single LSTM model was built to derive the activities of each player, and a top-level LSTM model was designed to aggregate the individual models and identify the overall behavior of the team. A CNN model was utilized to extract features from video frames and feed them to the individual LSTM models. Compared to several baseline models, the proposed hierarchical model obtained better classification results.

11) Retail:
Due to the proliferation of mobile devices, online shopping has grown greatly, and a recent shift toward product image retrieval through visual search techniques has been observed [133]. CNNs have been used for the visual search of clothes and fashion items, finding products in online shops that are identical or similar to something a user has seen in a movie [134] or on the street [135].

Moreover, shopping needs to be made convenient for visually impaired people. A combination of IoT technologies, including smart carts, integrated with DL methods can be a solution to this problem. In [136], a visual grocery assistance system comprising smart glasses, gloves, and a shopping cart was designed to help visually impaired people shop; this system also used a CNN to detect items in the aisles.

Furthermore, check-out counters in retail stores are usually bottlenecks where people queue up to pay for their purchases. The development of smart carts can enable real-time self check-out, and enhancing such systems with prediction capabilities can let them suggest items a customer may need based on his/her shopping history.

Recommending items to shoppers is another popular IoT application for retail that uses different technologies, such as BLE signals or visual cameras. The latter approach can be realized by identifying shop items or shoppers' actions (e.g., reaching toward a shelf, retracting from a shelf) [137] and providing a list of items related to the detected action.

To analyze customer interest in merchandise, Liu et al. [75] proposed a customer pose and orientation estimation system based on a DNN consisting of a CNN and an RNN. The input data comes from surveillance cameras: the CNN is used to extract image features, and these features, together with the previously predicted orientation features, are then fed to the RNN to obtain the output pose and orientation.

C. Lessons Learned

In this section, we identified five classes of IoT services as foundational services that can be used in a wide range of IoT applications, and discussed how DL has been used to deliver them. Moreover, we went through a wide range of IoT domains to find out how they exploit DL to deliver intelligent services. Table IV shows the works that utilize foundational services in IoT domains.

Many IoT domains and applications have greatly benefited from image recognition, and the interest is expected to grow faster as the high-resolution cameras embedded in smart phones make it ever easier to generate image and video data. The other foundational services, especially physiological and psychological detection as well as localization, can also be seen across different fields.

Fig. 16. The percentage of surveyed papers that have used each DL model.

However, the utilization of security and privacy services appears limited. This is a gap in the development of intelligent IoT applications, in which the potential activities of hackers and attackers are often ignored. Likewise, voice recognition with DL has not been used widely in IoT applications across several domains, such as smart homes, education, ITS, and industry; there are works that use voice recognition with traditional machine learning approaches, even though voice recognition has shown remarkable advancement with DL. One reason for the scarce appearance of this technique in IoT applications is the lack of comprehensive training datasets for each domain, as large training datasets are needed to train voice recognition DNNs.

Foundational services need fast data analytics to be efficient in their context. Despite several works in this direction, IoT fast data analytics based on DL still leaves much room for the development of new algorithms and architectures.

Table V summarizes the research in each domain and the DL models used. Figure 16 also depicts the frequency of the different models used across the surveyed works: about 43% of the papers used CNNs in their proposed systems, while DBNs were the least used (about 7%). RNNs and LSTMs together, as time-series models, were used in 30% of the works. The table also emphasizes the great impact of image recognition on IoT applications. Moreover, one third of the IoT applications deal with time-series or serial data, for which RNNs are a helpful approach.

V. DL ON IOT DEVICES

Prior to the era of IoT, most research on DL targeted the improvement of its models and algorithms to operate efficiently as the scale of the problem grows to big data, typically by deploying efficient models on cloud platforms. The emergence of IoT has opened up a totally different direction, in which the scale of the problem shrinks down to resource-constrained devices and real-time analytics are needed.

Smart objects need to support some sort of light-weight intelligence. Due to DL's successful results in speech and video applications, which are among the fundamental services and common uses of IoT, adapting its models and approaches for deployment on resource-constrained devices has become a crucial point of study.


TABLE IV
USAGE OF FOUNDATIONAL SERVICES IN IOT DOMAINS.

IoT domain | Image recognition | Voice recognition | Physiological & psychological detection | Localization | Security & privacy
Smart home | — | — | [90] | — | —
Smart city | [58], [94], [95] | — | [92], [93] | — | —
Energy | — | — | — | — | [83]
ITS | [100], [101] | — | — | — | [84]
Healthcare | [59], [103], [104], [106] | [105] | — | — | —
Agriculture | [60], [109]–[113] | — | — | — | —
Education | [119] | — | [54] | — | —
Industry | [55], [121] | — | [33] | — | —
Government | [61], [124], [126] | — | — | — | —
Sport | [128]–[130] | — | [131], [132] | [73] | —
Retail | [133]–[136] | — | [75], [137] | — | —

TABLE V
THE COMMON USE OF DIFFERENT DNN MODELS IN IOT DOMAINS.

Domain | AE | CNN | DBN | LSTM | RBM | RNN
Image recognition | [124] | [58]–[60], [94], [95], [100], [101], [104], [106], [109], [112], [119], [128], [132], [134] | [121] | [130] | — | [129], [130]
Physiological & psychological detection | [80], [81] | [74], [76], [78], [79], [81], [82] | [81] | [76], [77] | — | [79], [80], [82]
Localization | [69], [70] | [71], [72] | — | [73] | [68] | —
Privacy and security | — | [86] | [83], [85] | — | — | —
Smart home | — | [89] | [90] | [89] | [90] | —
Smart city | — | [58], [94], [95] | — | [92] | — | [93]
Energy | [98] | — | [98] | [98], [63] | [96], [97] | [97]
ITS | — | [100], [101] | [84] | [99] | [56] | [56]
Healthcare | — | [59], [103], [104], [106] | — | [107] | — | —
Agriculture | — | [60], [109]–[113] | — | — | — | —
Education | — | [119] | — | [118] | — | [117], [118]
Industry | [120], [122], [123] | [55] | [121] | — | — | —
Government | [124] | [61], [126] | — | [125] | — | —
Sport | — | [128], [131], [132] | — | [130], [132] | — | [129]
Retail | — | [133]–[136] | — | — | — | [137]

So far, DL methods can hardly be used on IoT and resource-constrained devices, since they demand a large share of resources such as processor cycles, battery energy, and memory. In some cases, the available resources are not even sufficient to run a DL algorithm [57]. Luckily, it has recently been shown that many of the parameters stored in DNNs may be redundant [138], and that it is sometimes unnecessary to use a large number of hidden layers to achieve high accuracy [139]. Consequently, efficiently removing such parameters and/or layers can considerably reduce the complexity of these DNNs without significant degradation of the output [138], [139]. In the remainder of this section, we discuss methods and technologies to achieve this goal and illustrate their applications in different domains.

A. Methods and Technologies

DL models may consist of millions or even billions of parameters, which require sophisticated computing and large storage resources. In this section, we discuss several state-of-the-art approaches that bring DL models to IoT embedded and resource-constrained devices.

1) Network Compression:
One way of adopting DNNs on resource-constrained devices is network compression, in which a dense network is converted into a sparse network. This approach reduces the storage and computational requirements of DNNs. Its main limitation is that it is not general enough to support all kinds of networks; it is only applicable to specific network models that can exhibit such sparsity.

Another interesting study on adopting compressed DL models on IoT devices was performed by Lane et al. [57]. In this study, the authors measure several factors that bound the ability of embedded, mobile, and wearable devices to run DL algorithms, including running time, energy consumption, and memory footprint. The study investigates the behavior of CNNs and DNNs on three hardware platforms used in IoT, mobile, and wearable applications: the Snapdragon 800, used in some models of smart phones and tablets; the Intel Edison, used in wearables and form-factor-sensitive IoT; and the Nvidia Tegra K1, employed in smart phones as well as IoT-enabled vehicles. Torch was used for developing and training the DNNs, with AlexNet [21] as the dominant model on these platforms. Their measurements of energy usage indicated that all the platforms, including the Intel Edison (the weakest one), were able to run the compressed models. In terms of execution time for CNNs, the later convolutional layers tend to consume less time, as their dimensions decrease.

Moreover, feed-forward layers are known to be much faster than convolutional layers in CNNs. Consequently, a good approach for improving CNN models on resource-constrained devices is to replace convolutional layers with feed-forward layers whenever possible. In addition, the choice of activation function in DNNs can greatly affect time efficiency: several tests have shown that ReLU functions are the most time-efficient, followed by Tanh and then Sigmoid. However, the overall runtime reduction from this choice is not significant (less than 7%) compared to the execution time of the layers themselves (at least 25%). In terms of memory usage, CNNs use less storage than DNNs, owing to the fewer parameters stored in convolutional layers compared to their fully connected counterparts.

As previously stated, reducing the number of parameters employed in DNNs, by pruning redundant and less important ones, is another important approach for making DNNs implementable on resource-constrained devices. One of the first works in this direction is Optimal Brain Damage [140], from 1989. At the core of this method, the second-order derivatives of the parameters are used to compute the importance of each parameter, and unimportant parameters are pruned from the network. The method presented in [141] also works by pruning redundant and unnecessary connections and neurons, as well as by using weight-sharing mechanisms. Weight sharing replaces each weight with an n-bit index into a shared table that has 2^n possible values. The steps to prune the network, as described by Han et al. [141], consist of the following (a code sketch follows the list):

• Train the network to find the connections with high weights.

• Prune the unimportant connections, i.e., those whose weight is below a threshold.

• After pruning, some neurons may remain with neither input nor output connections. The pruning process identifies these neurons and removes them, along with all their remaining connections.

• Retrain the network to fine-tune the weights of the updated model. The weights should be carried over from the previous training steps rather than reinitialized; otherwise the performance degrades considerably.
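The following minimal NumPy sketch captures the magnitude-based pruning step above; the threshold and layer size are illustrative, and in practice the mask would also be applied to the gradients during retraining so that pruned connections stay at zero:

```python
# Minimal sketch of magnitude-based pruning: weights below a threshold are
# masked out, and the surviving weights are kept for fine-tuning rather
# than being reinitialized.
import numpy as np

def prune_weights(weights, threshold):
    """Return pruned weights and the binary mask of surviving connections."""
    mask = np.abs(weights) > threshold     # keep only strong connections
    return weights * mask, mask

w = np.random.randn(256, 128)              # a trained layer's weight matrix
w_pruned, mask = prune_weights(w, threshold=1.0)
print("sparsity: %.1f%%" % (100.0 * (1 - mask.mean())))
# During retraining, gradients are also multiplied by `mask` so that pruned
# connections stay at zero while the remaining weights are fine-tuned.
```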

Fig. 17. The overall concept of pruning a DNN.

The authors evaluated this approach on four vision models: AlexNet, VGG-16, LeNet-300-100, and LeNet-5. The models were compressed by at least 9 times for AlexNet and 13 times for VGG-16, while the accuracy of the models was almost preserved. One limitation of the approach is that it cannot be used for other types of DNN models. Moreover, the resulting compressed networks are not efficient on all hardware platforms and CPU architectures, and thus require new kinds of accelerators that can handle dynamic activation sparsity and weight sharing. Figure 17 illustrates the concept of pruning a DNN.

In [142], an inference engine called EIE was designed with a special hardware architecture, using SRAMs instead of DRAMs, and was shown to work well with compressed network models. In this architecture, customized sparse matrix-vector multiplication and weight sharing are handled without losing the efficiency of the network. The engine consists of a scalable array of processing elements (PEs), each of which keeps a part of the network in SRAM and performs its corresponding computations. Since most of the energy used by neural networks is consumed by memory accesses, the energy usage of this accelerator is reported to be 120 times lower than that of the corresponding original network.

In HashedNets [143], the neural network connection weights are randomly grouped into hash buckets using a hash function. All connections that fall into the same bucket are represented by a single parameter. Backpropagation is used to fine-tune the parameters during training. Testing results show that the accuracy of this hash-based compressed model outperforms all other compression baseline methods.
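The hashing trick at the heart of this scheme can be sketched as follows; the bucket count, layer size, and hash function below are illustrative choices, not those of the original paper:

```python
# Minimal sketch of the hashing trick: a virtual weight matrix is backed
# by a small array of shared parameters, with a hash function mapping each
# (i, j) position to one of the shared buckets.
import numpy as np

def hashed_weight(i, j, shared, seed=0):
    """Look up the shared parameter backing virtual position (i, j)."""
    bucket = hash((i, j, seed)) % len(shared)   # a fixed hash would be used in practice
    return shared[bucket]

n_buckets = 64                          # real parameters (the "hash buckets")
shared = np.random.randn(n_buckets)

# Materialize a 256x128 virtual layer from only 64 shared parameters.
W = np.array([[hashed_weight(i, j, shared) for j in range(128)]
              for i in range(256)])
print(W.shape, "backed by", n_buckets, "parameters")
```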

The work in [144] by Courbariaux et al. proposed binarizing network weights and neurons during both the inference phase and the entire training phase, in order to reduce the memory footprint and the number of memory accesses. The network can also perform most arithmetic operations through bit-wise operations, leading to decreased power consumption. The MNIST, CIFAR-10, and SVHN datasets were tested with this approach over the Torch7 and Theano frameworks, with promising results.
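Deterministic binarization itself is a one-liner; the sketch below shows the forward-pass substitution (the full scheme in [144] also keeps full-precision shadow weights for the gradient updates, which is only hinted at here):

```python
# Minimal sketch of deterministic binarization: weights are constrained to
# {-1, +1} via the sign function on the forward pass, while full-precision
# copies are retained for the gradient updates (not shown).
import numpy as np

def binarize(w):
    return np.where(w >= 0, 1.0, -1.0)   # sign(w), with sign(0) -> +1

w_real = np.random.randn(4, 4)            # full-precision shadow weights
w_bin = binarize(w_real)                  # used in the forward pass
x = np.random.randn(4)
y = w_bin @ x                             # multiply-accumulate becomes add/subtract
```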

2) Approximate Computing:
Approximate computing is another approach for both implementing machine learning tools on IoT devices and contributing to the energy saving of their host devices [145], [146].


TABLE VI
METHODS AND TECHNOLOGIES TO BRING DL ON IOT DEVICES.

Method / Technology | References | Pros | Cons
Network compression | [141], [143], [144] | Reduces storage and computation | Not general for all DL models; needs specific hardware; the pruning process adds training overhead
Approximate computing | [145], [146] | Makes DL models fast; saves energy | Not suitable for precise systems
Accelerators | [67], [142], [147]–[150] | Integrates the DL model with the hardware; efficient computations | Does not work with traditional hardware platforms
Tinymotes with DL | [151] | Good for time-critical IoT applications; energy-efficient; provides more security and privacy for data | Special-purpose networks

The validity of this approach arises from the fact that, in many IoT applications, machine learning outputs (e.g., predictions) need not be exact; rather, they need only fall within an acceptable range that provides the desired quality. Indeed, these approaches need to define quality thresholds that the output must not violate. Integrating DL models with approximate computing can lead to more efficient DL models for resource-constrained devices. Venkataramani et al. [145] proposed extending approximate computing to neural networks, converting a neural network into an approximate neural network. In their approach, the authors extend backpropagation to identify the neurons with the least effect on output accuracy. The approximate NN is then formed by substituting the less important neurons of the original network with their approximate counterparts. Approximate neurons are produced through an approximate design technique called precision scaling: instead of representing computations with a fixed number of bits (16-bit or 32-bit formats), a varying number of bits (4 to 10) is used. After forming the approximate network, the precisions of the inputs and weights of the neurons are adjusted to reach an optimal trade-off between accuracy and energy. Other attempts have reported applying approximate computing with precision scaling to CNNs [146] and DBNs [152].
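To make precision scaling concrete, the following is a minimal sketch of quantizing values to a reduced, signed fixed-point representation; the rounding scheme and value range are illustrative assumptions, not the exact design of [145].

```python
import numpy as np

def precision_scale(x, n_bits, x_max=1.0):
    """Quantize values to a signed fixed-point grid with n_bits: fewer bits
    means cheaper arithmetic at the cost of accuracy. The 4-10 bit range
    mirrors the one reported above."""
    step = x_max / (2 ** (n_bits - 1))
    return np.clip(np.round(x / step) * step, -x_max, x_max - step)

w = np.array([0.734, -0.112, 0.055, -0.968])
for bits in (10, 6, 4):                     # increasingly aggressive scaling
    print(bits, precision_scale(w, bits))   # the accuracy/energy trade-off
```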

3) Accelerators: Designing specific hardware and circuits is another active research direction aiming to optimize the energy efficiency [148] and memory footprint [149] of DL models on IoT devices. In [153], several approaches for improving the intelligence of IoT devices are identified, including designing accelerators for DNNs and using post-CMOS technologies such as spintronics, which employs an electron-spinning mechanism [154]. The latter technology suggests a direction toward the development of hybrid devices that can store data, and perform computations and communications, within the same material technology.

The works in [147] and [148] report investigations into developing accelerators for DNNs and CNNs, respectively. Beyond hardware accelerators, the work in [150] proposed the use of a software accelerator for the inference phase of DL models on mobile devices. It employs two run-time resource control algorithms, one that compresses layers and another that decomposes deep architecture models across the available processors. This software accelerator can be a complementary solution to the hardware accelerator designs.

4) Tinymotes: In addition to all the prior solutions, developing tiny-size processors (micromotes) with strong DL capabilities is on the rise [155] [151]. Designed within the range of one cubic millimeter, such processors can be operated by batteries, consuming only about 300 microwatts while performing on-board analysis using deep network accelerators. With this technology, many time-critical IoT applications can perform decision-making on the device instead of sending data to high-performance computers and waiting for their response. For applications where data security and privacy are the main concerns, this integration of hardware and DL alleviates those concerns to some extent, as no or only limited data needs to be sent to the cloud for analysis. Moons et al. [156] also developed a tiny processor for CNNs (total active area of 1.2×2 mm²) that is power efficient (power consumption ranges from 25 to 288 mW).

B. Applications

There already exist mobile apps that employ DNNs to perform their tasks, such as using a CNN to identify garbage in images [58]. However, the resource consumption of these apps is still very high. Indeed, [58] reports about 5.6 seconds for returning a prediction response, while consuming 83% of the CPU and 67 MB of memory.

Amato et al. [94] ran a CNN on Raspberry Pi boards that were incorporated into smart cameras to find empty parking slots. Ravi et al. [157] reported the development of a fitness app for mobile devices that uses DL for classifying human activities. The DL model is trained on a standard machine and then transferred to the mobile platform for activity recognition. However, the input of the DL model is mixed with several engineered features to improve the accuracy. As the authors describe, the small number of layers in DNN models dedicated to a resource-constrained device is a potential reason for their poor performance. In addition, the performance will not be satisfactory if the training data does not represent the entire ecosystem well.

Nguyen et al. [158] proposed a conceptual software-hardware framework to support IoT smart applications. Their framework consists of a cognitive engine and a smart connectivity component. The cognitive engine, which provides cognitive functionality to smart objects, utilizes both DL algorithms and game-theoretic decision analytics. To be suitable for IoT, these algorithms must be deployed on low-power, application-specific processors. The smart connectivity component integrates with cognitive radio transceivers and baseband processors to provide flexible and reliable connections to IoT smart objects.

C. Lessons Learned

In this section, we discussed the need to move toward supporting DL on IoT embedded and resource-constrained devices. The conflicting characteristics of IoT devices and DL techniques make this direction challenging, since IoT devices can rarely host DL models due to their resource constraints. To tackle these challenges, several methods have been introduced in the recent literature, including:

• DNN compression
• Approximate computing for DL
• Accelerators
• Tinymotes with DL.

Network compression involves identifying unimportant connections and neurons in a DNN through several rounds of training. While this is a promising approach for getting close to real-time analytics on IoT devices, more investigation is needed to handle several challenges, such as:

• It is not clear whether network compression approaches are suitable for streaming data, especially when the DL model is dynamic and may evolve over time.
• Compression methods for time-series architectures, such as RNNs and LSTMs, have not been well investigated, and it remains to be seen whether the existing compression methods are applicable to these DL architectures.
• The trade-off between the rate of compression and the accuracy of a DNN needs to be specified, as more compression leads to degraded accuracy.

More recently, approximate computing approaches have also been utilized to make DL models simpler and more energy-efficient, so that they can operate on resource-constrained devices. Similar to network compression techniques, these methods also take advantage of insignificant neurons. However, instead of manipulating the network structure, they preserve the structure but change the computation representations through bit-length reduction. For that reason, they seem applicable to a variety of DL architectures and can even cover the dynamic evolution of network models at run-time. Keeping a balance between accuracy and energy usage is their common goal. Nonetheless, more work is needed to determine which of these approaches is superior for embedding DL models in IoT devices.

Moreover, we discussed the emergence of special, small form-factor hardware that is designed to efficiently run DL models on embedded and resource-constrained devices. These architectures can be utilized in wearable, mobile, and IoT devices, due to their reduced resource demands and their applicability to time-sensitive IoT applications. However, their generality to support any kind of DNN, as well as their interoperability and compatibility with existing hardware platforms, remain clear challenges.

Table VI summarizes the methods and technologies utilized in the recent literature to host DL analytics on IoT devices, along with their pros and cons.

We also reviewed some applications that have implemented DL on resource-constrained devices. Due to the aforementioned challenges, there are not many well-developed applications in this category. However, by resolving these challenges and barriers, we will see the rise of many IoT applications in which the core DL model is embedded into the sensors, actuators, and IoT smart objects.

VI. FOG AND CLOUD-CENTRIC DL FOR IOT

Cloud computing is considered a promising solution for IoT big data analytics. However, it may not be ideal for IoT data with security or legal/policy restrictions (e.g., data that should not be transferred to cloud centers hosted outside national territory), or with time constraints. On the other hand, the high-level abstraction of data for some analytics purposes can only be acquired by aggregating several sources of IoT data; hence, deploying analytic solutions on individual IoT nodes is insufficient in these cases.

Instead of being only on the cloud, the idea of bringing computing and analytics closer to the end users/devices has recently been proposed under the name of fog computing. Relying on fog-based analytics, we can benefit from the advantages of cloud computing while reducing or avoiding its drawbacks, such as network latency and security risks. It has been shown that, by hosting data analytics on fog computing nodes, the overall performance can be improved due to the avoidance of transmitting large amounts of raw data to distant cloud nodes [159]. It is also possible to perform real-time analytics to some extent, since the fog is hosted locally, close to the source of the data. Smart application gateways are the core elements of this new fog technology, performing some of the tasks currently done by cloud computing, such as data aggregation, classification, integration, and interpretation, thus facilitating the use of IoT local computing resources.

The work in [160] proposed an intelligent IoT gateway that supports mechanisms by which end users can control the application protocols in order to optimize performance. The intelligent gateway primarily supports the interoperation of different types of both IoT and resource-rich devices, allowing them to be treated similarly. In the proposed intelligent gateway, a lightweight analytic tool is embedded to increase performance at the application level. Equipping IoT gateways and edge nodes with efficient DL algorithms can localize many complex analytical tasks that are currently performed on the cloud. Table VII summarizes several products that have incorporated DL in their intelligent core and can serve IoT domains in the fog or cloud.

In the following subsection, we review several state-of-the-art enabling technologies that facilitate deep learning on fog and cloud platforms.


A. Enabling Technologies and Platforms

Despite the introduction of DL analytics on fog infrastructure, cloud computing remains the only viable solution for analytics in many IoT applications that cannot be handled by fog computing. For example, complex tasks such as video analysis require large and complex models with a lot of computing resources. Thus, designing scalable and high-performance cloud-centric DNN models and algorithms, which can perform analytics on massive IoT data, is still an important research area. Coates et al. [161] proposed a large-scale system, based on a cluster of GPU servers, that can train neural networks with 1 billion parameters on 3 machines in a few days. The system can also be scaled to train networks with 11 billion parameters on 16 machines.

Project Adam [162] is another attempt to develop a scalable and efficient DL model. The system is based on distributed DL, where the computation and communication of the whole system are optimized for high scalability and efficiency. The evaluation of this system using a cluster of 120 machines shows that training a large DNN with 2 billion connections achieves double the accuracy of a baseline system, while using 30 times fewer machines.

Google's Tensor Processing Unit (TPU) [163] is a specialized co-processor for DNNs in Google's data centers. It was designed in 2015 with the aim of accelerating the inference phase of DNNs written in the TensorFlow framework. Of the 95% of the DNN workload represented in their data centers, CNNs constitute only about 5%, while MLPs and LSTMs cover the other 90%. Performance evaluation showed that the TPU outperforms its contemporary GPUs and CPUs on average, achieving 15 to 30 times faster operation execution while consuming 30 to 80 times less energy per TeraOps/second.

Beyond the infrastructural advancements to host scalable DL models on cloud platforms, there is a need for mechanisms and approaches that make DL models accessible through APIs, so that they can be easily integrated into IoT applications. This aspect has not been investigated much, and only a few products are available, such as Amazon's AWS DL AMIs⁴, Google Cloud ML⁵, and IBM Watson⁶. This creates opportunities for cloud providers to offer "DL models as a service" as a new sub-category of Software as a Service (SaaS). However, it also imposes several challenges on cloud providers, since DL tasks are computationally intensive and may starve other cloud services. Moreover, due to the data thirstiness of DL models, data transfers to the cloud may become a bottleneck. Figure 18 presents a general stack for delivering DL models as a service on the cloud; different providers may use their own customized intelligence stack [164] [165].
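As an illustration of the "DL models as a service" integration pattern, the sketch below posts a batch of IoT readings to a cloud-hosted model over a RESTful API; the endpoint URL, payload schema, and response field are hypothetical assumptions for this example, not the API of any of the providers named above.

```python
import json
import urllib.request

# Hypothetical inference endpoint; not any specific provider's API.
ENDPOINT = "https://dl.example.com/v1/models/energy-forecast:predict"

def predict(instances, token):
    """POST a JSON batch of IoT readings to a cloud-hosted model and return
    its predictions, following the RESTful integration pattern described
    above."""
    body = json.dumps({"instances": instances}).encode("utf-8")
    req = urllib.request.Request(ENDPOINT, data=body, headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer " + token,
    })
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["predictions"]

# e.g. predict([{"sensor_id": "meter-42", "readings": [0.61, 0.58, 0.64]}], token)
```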

B. Challenges

When DL analytics is brought to fog nodes, several challenges need to be addressed, including:

• DL service discovery: Fog nodes are densely distributed in geographical regions, and each node may have specific analytics capabilities (e.g., one node runs CNN models for image detection, while another runs RNNs for time-series data prediction). Devices therefore need to identify the sources of appropriate analytic providers through some sort of extended service discovery protocol for DL analytics.

⁴ https://aws.amazon.com/amazon-ai/amis/
⁵ https://cloud.google.com/products/machine-learning/
⁶ https://www.ibm.com/watson/

TABLE VII
SOME PRODUCTS THAT USE DEEP LEARNING AND SERVE IOT DOMAINS ON THE FOG OR CLOUD.

Product | Description | Application | Platform
Amazon Alexa | Intelligent personal assistant (IPA) | Smart home | Fog
Microsoft Cortana | IPA | Smart car, Xbox | Fog
Google Assistant | IPA | Smart car, smart home | Fog
IBM Watson | Cognitive framework | IoT domains | Cloud

• DL model and task distribution: Partitioning the execution of DL models and tasks among the fog nodes, and optimally distributing the data stream among the available nodes, are critical for time-sensitive applications [166]. Aggregating the final results from the computing nodes and returning the action with the least latency are the other side of the coin.

• Design factors: Since fog computing environments are in their infancy and are expected to evolve, it is worthwhile to investigate how the design factors of the fog environment (e.g., architectures, resource management, etc.) and the deployment of DL models in this environment can impact the quality of analytic services. Alternatively, it would also be interesting to see how far these design factors can be tailored or extended to improve the operation and quality of DL analytics.

• Mobile edge: Given the ubiquity of mobile edge computing environments and their contribution to IoT analytics, it is important to account for the dynamicity of such environments when designing edge-assisted DL analytics, since mobile devices may join and leave the system. Also, the energy management of mobile edge devices should be accurate when analytic tasks are delegated to them.

A few attempts have reported the integration of DL on fog nodes in IoT ecosystems. For example, a proof of concept for deploying CNN models on fog nodes for machine health prognosis was proposed by Qaisar et al. [167]. In their work, a thorough search among fog nodes is done to find free nodes to which analytic tasks can be delegated. Also, Li et al. [168] proposed a system that leverages the collaboration of mobile and edge devices running CNN models for object recognition.

C. Lessons Learned

In this section, we highlighted the role of cloud and fog computing, their enabling technologies and platforms, and the challenges of delivering DL analytics to IoT applications. The great success of cloud computing in support of DL is backed by the advancement and employment of optimized processors for neural networks, as well as scalable distributed DL algorithms. Deploying DL models on fog platforms for IoT applications, such as smart homes and smart grids, would draw the attention of end users due to the ease of accessibility and fast response time. Nevertheless, cloud-based DL analytics remain of great importance for long-term and complex data analytics that exceed the capabilities of fog computing. Some smart city applications, the government sector, and nation-wide IoT deployments need to utilize cloud-based DL infrastructures.

Fig. 18. A general stack for DL models as a service on cloud platforms.

Currently, the integration of DL analytics into IoT applications is limited to RESTful APIs, which are based on the HTTP protocol. Several other application protocols are extensively used in IoT applications, such as the Message Queue Telemetry Transport (MQTT), the Constrained Application Protocol (CoAP), the Extensible Messaging and Presence Protocol (XMPP), and the Advanced Message Queuing Protocol (AMQP). Integrating DL analytic interfaces with these protocols calls for enhancing their compatibility, in order to eliminate the need for message-conversion proxies, which impose extra overhead on the analytics response time.

We identified several challenges related to the deployment and usage of DL models in support of analytics on fog nodes. DL service discovery is a necessary requirement due to the dense deployment of fog nodes, which makes brute-force search for available services inefficient. Currently used service discovery protocols in IoT applications, such as multicast DNS (mDNS) or DNS Service Discovery (DNS-SD) [1], need to be extended to support DL service discovery (e.g., to declare the type of analytics, the DL model, the input shape, etc.). The efficient distribution of DL models and tasks, the distribution of data streams over the fog nodes, and the aggregation of the results are other requirements that need to be addressed.
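As an illustration of such an extension, the sketch below encodes a fog node's DL capabilities in a DNS-SD style service record; the service type and TXT keys are hypothetical assumptions about how the protocol could be extended, not part of any standard.

```python
# A hypothetical DNS-SD style advertisement for a fog node's DL service.
# The "_dl-inference" service type and the TXT keys are illustrative
# assumptions, not standardized identifiers.
service = {
    "type": "_dl-inference._tcp.local.",     # assumed extended service type
    "instance": "camera-gateway-07",         # the advertising fog node
    "port": 8500,
    "txt": {                                 # declared analytics capability
        "analytics": "object-detection",
        "model": "cnn",
        "input_shape": "224x224x3",
        "latency_ms": "35",
    },
}
```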

VII. IOT CHALLENGES FOR DEEP LEARNING, AND FUTURE DIRECTIONS

In this section, we first review several challenges that are important from the machine learning point of view for implementing and developing IoT analytics. Then we point out research directions that can fill the existing gaps in DL-based IoT analytics.

A. Challenges

1) Lack of Large IoT Datasets: The lack of availability of large real-world datasets for IoT applications is a main hurdle for incorporating DL models in IoT, as more data is needed for DL to achieve higher accuracy. Moreover, more data helps prevent model overfitting. This shortage is a barrier to the deployment and acceptance of IoT analytics based on DL, since the empirical validation and evaluation of the system need to show promise in the real world. Access to copyrighted datasets and privacy considerations are additional burdens, most common in domains with human data such as healthcare and education. Also, a portfolio of appropriate datasets would be of great help to developers and researchers. A general list of useful datasets has been compiled on Wikipedia [169]. For the convenience of researchers in machine learning applications in IoT, Table VIII presents a collection of common datasets suitable for use with DL.

2) Preprocessing: Preparing raw data in an appropriate representation to be fed into DL models is another challenge for IoT applications. Most DL approaches need some sort of preprocessing to yield good results. For example, image processing by CNNs works better when the input data at the pixel level is normalized, scaled into a specific range, or transformed into a standard representation [21] [39]. For IoT applications, preprocessing is more complex, since the system deals with data from different sources that may have various formats and distributions, and may exhibit missing data.
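As a simple illustration, the sketch below imputes missing sensor readings and min-max scales each feature into [0, 1]; the imputation and scaling choices are common conventions assumed for this example, not prescriptions from the works cited above.

```python
import numpy as np

def preprocess(readings, lo, hi):
    """A minimal preprocessing sketch for heterogeneous IoT inputs: impute
    missing values with the per-feature mean, then min-max scale each
    feature into [0, 1] using its known range."""
    x = np.array(readings, dtype=float)
    col_mean = np.nanmean(x, axis=0)
    idx = np.where(np.isnan(x))
    x[idx] = np.take(col_mean, idx[1])       # fill missing sensor readings
    return (x - lo) / (hi - lo)              # scale into [0, 1]

# Two sensors with known ranges; NaN marks a dropped reading.
batch = [[21.5, 1010.0], [np.nan, 998.0], [19.0, np.nan]]
print(preprocess(batch, lo=np.array([0.0, 950.0]), hi=np.array([50.0, 1050.0])))
```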

3) Secure and Privacy-Preserving Deep Learning: Ensuring data security and privacy is a main concern in many IoT applications, since IoT big data will be transferred through the Internet for analytics and can thus be observed around the world. While anonymization is used in many applications, such techniques can be defeated and the data re-identified. Moreover, DL training models are also subject to malicious attacks, such as False Data Injection or adversarial sample inputs, by which many functional and non-functional (e.g., availability, reliability, validity, trustworthiness, etc.) requirements of IoT systems may be put in jeopardy. Indeed, DL models learn features from the raw data, and can therefore learn from any invalid data fed to them. In this case, DL models must be enhanced with some mechanism to discover abnormal or invalid data; a data-monitoring DL model accompanying the main model should work in such scenarios. Papernot et al. [170] investigated the vulnerability of DNNs in adversarial settings.


TABLE VIII
COMMON DATASETS FOR DEEP LEARNING IN IOT.

Dataset Name | Domain | Provider | Notes | Address/Link
CGIAR dataset | Agriculture, Climate | CCAFS | High-resolution climate datasets for a variety of fields including agriculture | http://www.ccafs-climate.org/
Educational Process Mining | Education | University of Genova | Recordings of 115 subjects' activities through a logging application while learning with an educational simulator | http://archive.ics.uci.edu/ml/datasets/Educational+Process+Mining+%28EPM%29%3A+A+Learning+Analytics+Data+Set
Commercial Building Energy Dataset | Energy, Smart Building | IIITD | Energy-related dataset from a commercial building where data is sampled more than once a minute | http://combed.github.io/
Individual household electric power consumption | Energy, Smart home | EDF R&D, Clamart, France | One-minute sampling rate over a period of almost 4 years | http://archive.ics.uci.edu/ml/datasets/Individual+household+electric+power+consumption
AMPds dataset | Energy, Smart home | S. Makonin | Electricity, water, and natural gas measurements at one-minute intervals over 2 years of monitoring | http://ampds.org/
UK Domestic Appliance-Level Electricity | Energy, Smart home | Kelly and Knottenbelt | Power demand from five houses; both the whole-house mains demand and the demand of individual appliances are recorded | http://www.doc.ic.ac.uk/~dk3810/data/
PhysioBank databases | Healthcare | PhysioNet | Archive of over 80 physiological datasets | https://physionet.org/physiobank/database/
Saarbruecken Voice Database | Healthcare | Universitat des Saarlandes | A collection of voice recordings from more than 2000 persons for pathological voice detection | http://www.stimmdatenbank.coli.uni-saarland.de/help_en.php4
T-LESS | Industry | CMP at Czech Technical University | An RGB-D dataset and evaluation methodology for detection and 6D pose estimation of texture-less objects | http://cmp.felk.cvut.cz/t-less/
CityPulse Dataset Collection | Smart City | CityPulse EU FP7 project | Road traffic data, pollution data, weather, parking | http://iot.ee.surrey.ac.uk:8080/datasets.html
Open Data Institute - node Trento | Smart City | Telecom Italia | Weather, air quality, electricity, telecommunication | http://theodi.fbk.eu/openbigdata/
Malaga datasets | Smart City | City of Malaga | A broad range of categories such as energy, ITS, weather, industry, sport, etc. | http://datosabiertos.malaga.eu/dataset
Gas sensors for home activity monitoring | Smart home | Univ. of California San Diego | Recordings of 8 gas sensors under three conditions: background, wine, and banana presentations | http://archive.ics.uci.edu/ml/datasets/Gas+sensors+for+home+activity+monitoring
CASAS datasets for activities of daily living | Smart home | Washington State University | Several public datasets related to Activities of Daily Living (ADL) performance in a two-story home, an apartment, and an office setting | http://ailab.wsu.edu/casas/datasets.html
ARAS Human Activity Dataset | Smart home | Bogazici University | Human activity recognition datasets collected from two real houses with multiple residents during two months | https://www.cmpe.boun.edu.tr/aras/
MERLSense Data | Smart home, building | Mitsubishi Electric Research Labs | Motion sensor data of residual traces from a network of over 200 sensors for two years, containing over 50 million records | http://www.merl.com/wmd
SportVU | Sport | Stats LLC | Video of basketball and soccer games captured from 6 cameras | http://go.stats.com/sportvu
RealDisp | Sport | O. Banos | Includes a wide range of physical activities (warm up, cool down and fitness exercises) | http://orestibanos.com/datasets.htm
Taxi Service Trajectory | Transportation | Prediction Challenge, ECML PKDD 2015 | Trajectories performed by all 442 taxis running in the city of Porto, Portugal | http://www.geolink.pt/ecmlpkdd2015-challenge/dataset.html
GeoLife GPS Trajectories | Transportation | Microsoft | GPS trajectories given as sequences of time-stamped points | https://www.microsoft.com/en-us/download/details.aspx?id=52367
T-Drive trajectory data | Transportation | Microsoft | Contains one-week trajectories of 10,357 taxis | https://www.microsoft.com/en-us/research/publication/t-drive-trajectory-data-sample/
Chicago Bus Traces data | Transportation | M. Doering | Bus traces from the Chicago Transport Authority for 18 days, at a rate between 20 and 40 seconds | http://www.ibr.cs.tu-bs.de/users/mdoering/bustraces/
Uber trip data | Transportation | FiveThirtyEight | About 20 million Uber pickups in New York City during 12 months | https://github.com/fivethirtyeight/uber-tlc-foil-response
Traffic Sign Recognition | Transportation | K. Lim | Three datasets: Korean daytime, Korean nighttime, and German daytime traffic signs based on Vienna traffic rules | https://figshare.com/articles/Traffic_Sign_Recognition_Testsets/4597795
DDD17 | Transportation | J. Binas | End-To-End DAVIS Driving Dataset | http://sensors.ini.uzh.ch/databases.html

In such settings, an adversary tries to provide inputs that lead to incorrect output classifications, thus corrupting the integrity of the classification. Developing further techniques to defend against and prevent the effect of this sort of attack on DL models is necessary for reliable IoT applications.
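As a generic illustration of this attack family, the sketch below crafts an adversarial input against a simple logistic classifier using the fast gradient sign method; it illustrates the kind of adversarial setting studied in [170], not the specific construction used there.

```python
import numpy as np

def fgsm(x, y, w, b, eps=0.1):
    """Craft an adversarial input for a logistic classifier: perturb x by a
    small step in the sign of the loss gradient with respect to the input,
    the direction that most increases the classification error."""
    z = x @ w + b
    p = 1.0 / (1.0 + np.exp(-z))              # predicted probability
    grad_x = (p - y) * w                      # d(cross-entropy)/dx
    return x + eps * np.sign(grad_x)          # small, worst-case perturbation

w, b = np.array([1.5, -2.0]), 0.1
x, y = np.array([0.4, 0.3]), 1.0
x_adv = fgsm(x, y, w, b)                      # nudges x toward misclassification
```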

4) Challenges of the 6V's: Despite the recent advancements in DL for big data, there are still significant challenges that need to be addressed to mature this technology. Each characteristic of IoT big data imposes a challenge for DL techniques. In the following, we highlight these challenges.

The massive volume of data poses a great challenge for DL, especially in terms of time and structural complexity. The voluminous number of input data, their broad number of attributes, and their high degree of classification result in very complex DL models and affect running-time performance. Running DL on distributed frameworks or clusters of CPUs with parallel processing is a viable solution that has been developed [6]. The high volume of IoT big data also brings another challenge, namely noisy and unlabeled data. Even though DL is very good at tolerating noisy data and learning from unlabeled data, it is not clear to what extent DL models can remain accurate in the presence of such abnormal data.

The variety of IoT data formats coming from various sources raises the challenge of managing conflicts between the different data sources. When there is no conflict among data sources, DL can work effectively on heterogeneous data.

The high velocity of IoT big data, i.e., the high rate of data generation, also brings the challenge of high-speed processing and analysis of data. Online learning is a solution for high velocity and has been proposed for DNNs. However, more research is needed to augment DL with online learning and sequential learning techniques.
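As a plain illustration of the online setting, the sketch below updates a linear model from each arriving sample and then discards it, so no growing training set is stored; the model, learning rate, and synthetic stream are illustrative assumptions.

```python
import numpy as np

def online_sgd_step(w, x, y, lr=0.05):
    """One online-learning update for a linear model under squared loss:
    the model is refreshed from each arriving sample, which is then
    discarded."""
    err = float(x @ w) - y
    return w - lr * err * x

rng = np.random.default_rng(0)
w_true, w = np.array([2.0, -1.0]), np.zeros(2)
for _ in range(1000):                # stand-in for an unbounded sensor stream
    x = rng.normal(size=2)
    y = float(x @ w_true) + rng.normal(scale=0.1)
    w = online_sgd_step(w, x, y)
print(w)                             # approaches w_true as the stream flows
```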

The veracity of IoT big data also presents challenges for DL analytics. IoT big data analytics will not be useful if the input data does not come from a trustworthy source. Data validation and trustworthiness should be checked at each level of big data analytics, especially when dealing with online streams of input data to an analytic engine [171].

Moreover, the variability of IoT big data (i.e., variation in the data flow rates) raises challenges for online analytics. In the case of immense streams of data, it is challenging for DL techniques, and in particular the online ones, to handle them; data sampling techniques would be beneficial in these scenarios.

Finally, a main challenge to the adoption of big data by business managers is that it is not clear to them how to use big data analytics to derive value from it and improve their business [172]. Beyond that, the analytic engine may produce abstractions that are not important to the stakeholders, or are not clear enough for them.

5) Deep Learning for IoT Devices: Developing DL on IoT devices poses a new challenge for IoT device designers, who must consider the requirements of handling DNNs on resource-constrained devices. These requirements are expected to grow as dataset sizes grow every day and new algorithms arise to become part of the solutions for DL in IoT.

6) Deep Learning Limitations: Despite showing impressive results in many applications, DL models still have several limitations. Nguyen et al. [173] reported the false confidence of DNNs when predicting images that are unrecognizable to humans: by producing fooling examples that are totally unrecognizable to humans, a DNN can be made to classify them as familiar objects.

The other limitation is the focus of DL models on classification, while many IoT applications (e.g., electricity load forecasting, temperature forecasting) need a form of regression at their analytics core. Few works have tried to enrich DNNs with regression capabilities, such as the work in [174], which proposes an ensemble of DBNs and Support Vector Regression (SVR) for regression tasks. However, more investigation is required to clarify the many aspects of regression with DL.
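As a generic illustration of regression with a neural network, the sketch below replaces the usual softmax classifier with a single linear output unit, a common way to obtain real-valued forecasts; it is a minimal sketch, not the DBN-SVR ensemble of [174], and the layer sizes are illustrative.

```python
import numpy as np

def forward(x, w1, b1, w2, b2):
    """A small MLP with a linear (identity) output layer, the usual way to
    repurpose a DNN for regression tasks such as load forecasting: a single
    linear unit trained under squared error replaces the softmax classifier."""
    h = np.maximum(0.0, x @ w1 + b1)     # ReLU hidden layer
    return h @ w2 + b2                   # linear output: a real-valued forecast

rng = np.random.default_rng(0)
w1, b1 = rng.normal(size=(4, 8)) * 0.1, np.zeros(8)
w2, b2 = rng.normal(size=(8, 1)) * 0.1, np.zeros(1)
x = np.array([[0.3, 0.8, 0.5, 0.1]])     # e.g., recent load readings
print(forward(x, w1, b1, w2, b2))        # predicted next value
```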

B. Future Directions

1) IoT Mobile Data: One remarkable portion of IoT data comes from mobile devices. Investigating efficient ways to utilize mobile big data in conjunction with DL approaches is a way to deliver better services in IoT domains, especially in smart city scenarios. In [175], the capabilities of DL models in mobile big data analytics were investigated using a distributed learning framework that executes an iterative MapReduce task on several parallel Spark workers.

2) Integrating Contextual Information: The environment's situation cannot be understood from IoT sensor data alone. Therefore, IoT data needs to be fused with other sources of data, namely context information that complements the understanding of the environment [10]. This integration can also help fast data analytics and quick reasoning, due to the bounded search space for the reasoning engine. For example, a smart camera with face pose recognition capability can perform its job in various contexts, such as security gates in smart homes or government buildings, or in smart cars for driving assistance. In all these situations, complementary contextual information (e.g., time of day, daily habits, etc.) helps the system to reason about the best action to take based on the detected pose of the person.

3) Online Resource Provisioning for IoT Analytics: The deployment of fast DL-based data analytics on the fog and cloud requires online provisioning of fog or cloud resources to host the stream of data. Due to the streaming nature of IoT data, knowing the volume of the data sequence in advance is not feasible. In this regard, we need a new class of algorithms that work based on the current stream of data and do not rely on prior knowledge of the data stream. A DL mechanism and an online auctioning algorithm are proposed in [65] and [176], respectively, to support online provisioning of fog and cloud resources for IoT applications.

4) Semi-supervised Analytic Frameworks: Most analytic algorithms are supervised, and thus need a large amount of labeled training data, which is either unavailable or expensive to prepare. Based on IDC's report [177], it is estimated that by 2012 only about 3% of all data in the digital universe had been annotated, which implies a poor supply of training datasets for DL. A combination of advanced machine learning algorithms designed for semi-supervised settings fits smart city systems well, where a small training dataset can be used while the learning agent improves its accuracy using a large amount of unlabeled data [4]. Figure 19 illustrates the role of semi-supervised learning in improving the output accuracy of deep reinforcement learning [40] in indoor localization experiments. In those experiments, only 15% of the data was labeled, yet the results were strengthened by utilizing the unlabeled data in the algorithm.

Fig. 19. Deep reinforcement learning with only labeled data (supervised) vs. with labeled and unlabeled data (semi-supervised). At each epoch, the semi-supervised model outperforms the supervised model both in terms of total received rewards and closeness to the target.

5) Dependable and Reliable IoT Analytics: As we rely more on CPS and IoT at large scales, the need for mechanisms to ensure the safety of the system against malicious attacks, as well as failures, becomes more crucial [178]. DL approaches can be applied in these directions by analyzing the huge amount of log traces of CPS and IoT systems, in order to identify and predict weak points of the system where attacks may occur or where functionality is defective. This will help the system to prevent or recover from faults, and consequently increase the level of dependability of CPS and IoT systems.

6) Self-organizing Communication Networks: With a huge number of IoT devices, the configuration and maintenance of their underlying physical M2M communications and networking become harder. Although the large body of network nodes and their relations is a challenge for traditional machine learning approaches, it opens the opportunity for DL architectures to prove their competency in this area by providing a range of self-services, such as self-configuration, self-optimization, self-healing, and self-load-balancing. Valente et al. [179] have provided a survey of traditional machine learning approaches for self-organizing cellular networks.

7) Emerging IoT Applications:
Unmanned aerial vehicles: The usage of unmanned aerial vehicles (UAVs) is a promising application that can improve service delivery in hard-to-reach regions or in critical situations. UAVs have also been used for many real-time image analysis tasks, such as surveillance, search-and-rescue operations, and infrastructure inspection [180]. These devices face several challenges to their adoption, including routing, energy saving, avoidance of private regions, and obstacle avoidance [181]. DL can have a great impact in this domain, supporting the prediction and decision-making of tasks to get the best out of UAVs.


Moreover, UAVs can be seen as on-the-fly analytics platforms that can potentially provide temporary fog computing analytic services, as well as distributed analytics.

Virtual/Augmented Reality: Virtual/augmented reality is another application area that can benefit from both IoT and DL. The latter can be used in this field to provide services such as object tracking [182], activity recognition, image classification, and object recognition [183], to name a few. Augmented reality can greatly affect several domains, such as education, museums, smart cars, etc.

VIII. CONCLUSION

DL and IoT have drawn the attention of researchers and commercial verticals in recent years, as these two technology trends have proven to have a positive effect on our lives, cities, and world. IoT and DL constitute a chain of data producer and consumer, in which IoT generates raw data that is analyzed by DL models, and the DL models produce high-level abstractions and insights that are fed back to the IoT systems for fine-tuning and improvement of services.

In this survey, we reviewed the characteristics of IoT data and its challenges for DL methods. Specifically, we highlighted IoT fast and streaming data, as well as IoT big data, as the two main categories of IoT data generation, along with their requirements for analytics. We also presented several main architectures of DL that are used in the context of IoT applications, followed by several open-source frameworks for the development of DL architectures. Reviewing different applications in various sectors of IoT that have utilized DL was another part of this survey, in which we identified five foundational services along with eleven application domains. By distinguishing foundational services as well as IoT vertical applications, and by reviewing their DL approaches and use cases, we provided a basis for other researchers to understand the principal components of IoT smart services and apply the relevant techniques to their own problems. The new paradigm of implementing DL on IoT devices was surveyed, and several approaches to achieving it were introduced. DL based on fog and cloud infrastructures to support IoT applications was another part of this survey. We also identified the challenges and future research directions on the path of DL for IoT applications.

LIST OF ACRONYMS

AE Auto-encoder
AI Artificial Intelligence
AMQP Advanced Message Queuing Protocol
ANN Artificial Neural Network
BAC Breast Arterial Calcification
BLE Bluetooth Low Energy
BPTT Backpropagation Through Time
CAE Contractive Auto-encoder
CDR Caller Detail Record
CNN Convolutional Neural Network
CoAP Constrained Application Protocol
CPS Cyber-physical System
CRBM Conditional Restricted Boltzmann Machine
DAE Denoising Auto-encoder
DBN Deep Belief Network
DL Deep Learning
DNN Deep Neural Network
DNS-SD DNS Service Discovery
DRL Deep Reinforcement Learning
FDC Fault Detection and Classification
FDI False Data Injection
GAN Generative Adversarial Network
GBM Gradient Boosted Machine
GLM Generalized Linear Model
HMM Hidden Markov Model
HVAC Heating, Ventilation and Air Conditioning
INS Inertia Navigation System
IoT Internet of Things
IPA Intelligent Personal Assistant
ITS Intelligent Transportation System
LSTM Long Short-term Memory
MAPE Mean Absolute Percentage Error
mDNS multicast DNS
MLP Multi-layer Perceptron
MOOC Massive Open Online Courses
MQTT Message Queue Telemetry Transport
RBM Restricted Boltzmann Machine
ReLU Rectified Linear Units
RNN Recurrent Neural Network
SaaS Software as a Service
SdA Stacked denoising Autoencoder
SGD Stochastic Gradient Descent
SVM Support Vector Machine
SVR Support Vector Regression
TPU Tensor Processing Unit
UAV Unmanned Aerial Vehicle
VAE Variational Auto-encoder
VLC Visual Light Communication
WSN Wireless Sensor Network
XMPP Extensible Messaging and Presence Protocol

REFERENCES

[1] A. Al-Fuqaha, M. Guizani, M. Mohammadi, M. Aledhari, and M. Ayyash, "Internet of Things: A survey on enabling technologies, protocols, and applications," IEEE Communications Surveys & Tutorials, vol. 17, no. 4, pp. 2347–2376, 2015.
[2] J. Manyika, M. Chui, J. Bughin, R. Dobbs, P. Bisson, and A. Marrs, Disruptive technologies: Advances that will transform life, business, and the global economy. McKinsey Global Institute, San Francisco, CA, 2013, vol. 180.
[3] K. Panetta. (2016) Gartner's top 10 strategic technology trends for 2017. [Online]. Available: http://www.gartner.com/smarterwithgartner/gartners-top-10-technology-trends-2017/
[4] M. Mohammadi and A. Al-Fuqaha, "Enabling cognitive smart cities using big data and machine learning: Approaches and challenges," IEEE Communications Magazine, vol. PP, no. 99, pp. 1–8, 2017.
[5] M. Chen, S. Mao, Y. Zhang, and V. C. Leung, Big data: related technologies, challenges and future prospects. Springer, 2014.
[6] X.-W. Chen and X. Lin, "Big data deep learning: challenges and perspectives," IEEE Access, vol. 2, pp. 514–525, 2014.
[7] M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. McCauley, M. Franklin, S. Shenker, and I. Stoica, "Fast and interactive analytics over hadoop data with spark," USENIX Login, vol. 37, no. 4, pp. 45–51, 2012.
[8] C. Engle, A. Lupher, R. Xin, M. Zaharia, M. J. Franklin, S. Shenker, and I. Stoica, "Shark: fast data analysis using coarse-grained distributed memory," in Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data. ACM, 2012, pp. 689–692.
[9] C.-W. Tsai, C.-F. Lai, M.-C. Chiang, L. T. Yang et al., "Data mining for internet of things: A survey," IEEE Communications Surveys and Tutorials, vol. 16, no. 1, pp. 77–97, 2014.
[10] C. Perera, A. Zaslavsky, P. Christen, and D. Georgakopoulos, "Context aware computing for the internet of things: A survey," IEEE Communications Surveys & Tutorials, vol. 16, no. 1, pp. 414–454, 2014.
[11] M. A. Alsheikh, S. Lin, D. Niyato, and H.-P. Tan, "Machine learning in wireless sensor networks: Algorithms, strategies, and applications," IEEE Communications Surveys & Tutorials, vol. 16, no. 4, pp. 1996–2018, 2014.
[12] Z. Fadlullah, F. Tang, B. Mao, N. Kato, O. Akashi, T. Inoue, and K. Mizutani, "State-of-the-art deep learning: Evolving machine intelligence toward tomorrow's intelligent network traffic control systems," IEEE Communications Surveys & Tutorials, vol. PP, no. 99, 2017.
[13] B. Li, Y. Diao, and P. Shenoy, "Supporting scalable analytics with latency constraints," Proceedings of the VLDB Endowment, vol. 8, no. 11, pp. 1166–1177, 2015.
[14] M. Hilbert, "Big data for development: a review of promises and challenges," Development Policy Review, vol. 34, no. 1, pp. 135–174, 2016.
[15] W. Fan and A. Bifet, "Mining big data: current status, and forecast to the future," ACM SIGKDD Explorations Newsletter, vol. 14, no. 2, pp. 1–5, 2013.
[16] H. Hu, Y. Wen, T.-S. Chua, and X. Li, "Toward scalable systems for big data analytics: A technology tutorial," IEEE Access, vol. 2, pp. 652–687, 2014.
[17] Y. Demchenko, P. Grosso, C. De Laat, and P. Membrey, "Addressing big data issues in scientific data infrastructure," in Collaboration Technologies and Systems (CTS), 2013 International Conference on. IEEE, 2013, pp. 48–55.
[18] M. Strohbach, H. Ziekow, V. Gazis, and N. Akiva, "Towards a big data analytics framework for iot and smart city applications," in Modeling and Processing for Next-Generation Big-Data Technologies. Springer, 2015, pp. 257–282.
[19] G. E. Hinton and R. R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science, vol. 313, no. 5786, pp. 504–507, 2006.
[20] L. Deng, "A tutorial survey of architectures, algorithms, and applications for deep learning," APSIPA Transactions on Signal and Information Processing, vol. 3, pp. 1–29, 2014.
[21] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "Imagenet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
[22] R. Pascanu, C. Gulcehre, K. Cho, and Y. Bengio, "How to construct deep recurrent neural networks," arXiv preprint arXiv:1312.6026v5 [cs.NE], 2013.
[23] M. Hermans and B. Schrauwen, "Training and analysing deep recurrent neural networks," in Advances in Neural Information Processing Systems, 2013, pp. 190–198.
[24] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[25] P. Baldi, "Autoencoders, unsupervised learning, and deep architectures," ICML Unsupervised and Transfer Learning, vol. 27, no. 37-50, p. 1, 2012.
[26] C. Doersch, "Tutorial on variational autoencoders," arXiv preprint arXiv:1606.05908v2 [stat.ML], 2016.
[27] D. P. Kingma, S. Mohamed, D. J. Rezende, and M. Welling, "Semi-supervised learning with deep generative models," in Advances in Neural Information Processing Systems, 2014, pp. 3581–3589.
[28] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative adversarial nets," in Advances in Neural Information Processing Systems, 2014, pp. 2672–2680.
[29] Y. Bengio et al., "Learning deep architectures for AI," Foundations and Trends in Machine Learning, vol. 2, no. 1, pp. 1–127, 2009.
[30] H. Valpola, "From neural PCA to deep unsupervised learning," Advances in Independent Component Analysis and Learning Machines, pp. 143–171, 2015.
[31] A. Rasmus, M. Berglund, M. Honkala, H. Valpola, and T. Raiko, "Semi-supervised learning with ladder networks," in Advances in Neural Information Processing Systems, 2015, pp. 3546–3554.
[32] N.-Y. Liang, G.-B. Huang, P. Saratchandran, and N. Sundararajan, "A fast and accurate online sequential learning algorithm for feedforward networks," IEEE Transactions on Neural Networks, vol. 17, no. 6, pp. 1411–1423, 2006.
[33] Z. Yang, P. Zhang, and L. Chen, "RFID-enabled indoor positioning method for a real-time manufacturing execution system using OS-ELM," Neurocomputing, vol. 174, pp. 121–133, 2016.
[34] H. Zou, H. Jiang, X. Lu, and L. Xie, "An online sequential extreme learning machine approach to wifi based indoor positioning," in Internet of Things (WF-IoT), 2014 IEEE World Forum on. IEEE, 2014, pp. 111–116.
[35] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards real-time object detection with region proposal networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1137–1149, 2017.
[36] R. Girshick, "Fast R-CNN," in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1440–1448.
[37] H. Mao, S. Yao, T. Tang, B. Li, J. Yao, and Y. Wang, "Towards real-time object detection on embedded systems," IEEE Transactions on Emerging Topics in Computing, vol. PP, no. 99, pp. 1–15, 2017.
[38] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You only look once: Unified, real-time object detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 779–788.
[39] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Riedmiller, "Playing atari with deep reinforcement learning," arXiv preprint arXiv:1312.5602v1 [cs.LG], 2013.
[40] M. Mohammadi, A. Al-Fuqaha, M. Guizani, and J.-S. Oh, "Semi-supervised deep reinforcement learning in support of IoT and smart city services," IEEE Internet of Things Journal, vol. PP, no. 99, pp. 1–12, 2017.
[41] Y. Bengio et al., "Deep learning of representations for unsupervised and transfer learning," ICML Unsupervised and Transfer Learning, vol. 27, pp. 17–36, 2012.
[42] J. Deng, R. Xia, Z. Zhang, Y. Liu, and B. Schuller, "Introducing shared-hidden-layer autoencoders for transfer learning and their application in acoustic emotion recognition," in Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on. IEEE, 2014, pp. 4818–4822.
[43] P. Wu, S. C. Hoi, H. Xia, P. Zhao, D. Wang, and C. Miao, "Online multimodal deep similarity learning with application to image retrieval," in Proceedings of the 21st ACM International Conference on Multimedia. ACM, 2013, pp. 153–162.
[44] P. Jaini, A. Rashwan, H. Zhao, Y. Liu, E. Banijamali, Z. Chen, and P. Poupart, "Online algorithms for sum-product networks with continuous variables," in Proceedings of the Eighth International Conference on Probabilistic Graphical Models, 2016, pp. 228–239.
[45] G. Chen, R. Xu, and S. N. Srihari, "Sequential labeling with online deep learning: Exploring model initialization," in Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 2016, pp. 772–788.
[46] S. Bahrampour, N. Ramakrishnan, L. Schott, and M. Shah, "Comparative study of deep learning software frameworks," arXiv preprint arXiv:1511.06435v3 [cs.LG], 2016.
[47] A. Candel, V. Parmar, E. LeDell, and A. Arora, "Deep learning with H2O," 2015.
[48] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin et al., "Tensorflow: Large-scale machine learning on heterogeneous distributed systems," arXiv preprint arXiv:1603.04467v2 [cs.DC], 2016.

[49] R. Collobert, K. Kavukcuoglu, and C. Farabet, “Torch7: A matlab-likeenvironment for machine learning,” in BigLearn, NIPS Workshop, no.EPFL-CONF-192376, 2011.

[50] F. Bastien, P. Lamblin, R. Pascanu, J. Bergstra, I. Goodfellow, A. Berg-eron, N. Bouchard, D. Warde-Farley, and Y. Bengio, “Theano: newfeatures and speed improvements,” arXiv preprint arXiv:1211.5590v1[cs.SC], 2012.

[51] S. Raschka and V. Mirjalili, Python Machine Learning, 2nd ed.Birmingham, UK: Packt Publishing, 2017.

[52] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick,S. Guadarrama, and T. Darrell, “Caffe: Convolutional architecture forfast feature embedding,” in Proceedings of the 22nd ACM internationalconference on Multimedia. ACM, 2014, pp. 675–678.

[53] S. Shi, Q. Wang, P. Xu, and X. Chu, “Benchmarking state-of-the-art deep learning software tools,” arXiv preprint arXiv:1608.07249v7[cs.DC], 2016.

[54] R. Mehmood, F. Alam, N. N. Albogami, I. Katib, A. Albeshri, andS. M. Altowaijri, “Utilearn: A personalised ubiquitous teaching andlearning system for smart societies,” IEEE Access, vol. 5, pp. 2615–2635, 2017.

[55] A. Luckow, M. Cook, N. Ashcraft, E. Weill, E. Djerekarov, andB. Vorster, “Deep learning in the automotive industry: Applications andtools,” in Big Data (Big Data), 2016 IEEE International Conferenceon. IEEE, 2016, pp. 3759–3768.

[56] X. Ma, H. Yu, Y. Wang, and Y. Wang, “Large-scale transportationnetwork congestion evolution prediction using deep learning theory,”PloS one, vol. 10, no. 3, p. e0119044, 2015.

[57] N. D. Lane, S. Bhattacharya, P. Georgiev, C. Forlivesi, and F. Kawsar,“An early resource characterization of deep learning on wearables,smartphones and internet-of-things devices,” in Proceedings of the 2015International Workshop on Internet of Things towards Applications.ACM, 2015, pp. 7–12.

[58] G. Mittal, K. B. Yagnik, M. Garg, and N. C. Krishnan, “Spotgarbage:smartphone app to detect garbage using deep learning,” in Proceedingsof the 2016 ACM International Joint Conference on Pervasive andUbiquitous Computing. ACM, 2016, pp. 940–945.

[59] C. Liu, Y. Cao, Y. Luo, G. Chen, V. Vokkarane, Y. Ma, S. Chen,and P. Hou, “A new deep learning-based food recognition system fordietary assessment on an edge computing service infrastructure,” IEEETransactions on Services Computing, 2017.

[60] S. Sladojevic, M. Arsenovic, A. Anderla, D. Culibrk, and D. Stefanovic,“Deep neural networks based recognition of plant diseases by leafimage classification,” Computational Intelligence and Neuroscience,vol. 2016, 2016.

[61] Y. Liu, E. Racah, J. Correa, A. Khosrowshahi, D. Lavers, K. Kunkel,M. Wehner, and W. Collins, “Application of deep convolutional neuralnetworks for detecting extreme weather in climate datasets,” Int’l Conf.on Advances in Big Data Analytics, 2016.

[62] S. Tokui, K. Oono, S. Hido, and J. Clayton, “Chainer: a next-generationopen source framework for deep learning,” in Proceedings of workshopon machine learning systems (LearningSys) in the twenty-ninth annualconference on neural information processing systems (NIPS), 2015.

[63] Y. Hada-Muranushi, T. Muranushi, A. Asai, D. Okanohara, R. Ray-mond, G. Watanabe, S. Nemoto, and K. Shibata, “A deep-learningapproach for operation of an automated realtime flare forecast,” SPACEWEATHER, 2016.

[64] J. A. C. Soto, M. Jentsch, D. Preuveneers, and E. Ilie-Zudor, “Ceml:Mixing and moving complex event processing and machine learningto the edge of the network for iot applications,” in Proceedings of the6th International Conference on the Internet of Things. ACM, 2016,pp. 103–110.

[65] M. Borkowski, S. Schulte, and C. Hochreiner, “Predicting cloud re-source utilization,” in Proceedings of the 9th International Conferenceon Utility and Cloud Computing. ACM, 2016, pp. 37–42.

[66] H. Larry. (2017) Voice control everywhere: Low-power special-purpose chip could make speech recognition ubiquitousin electronics. [Online]. Available: http://news.mit.edu/2017/low-power-chip-speech-recognition-electronics-0213

[67] M. Price, J. Glass, and A. Chandrakasan, “A scalable speech recognizerwith deep-neural-network acoustic models and voice-activated powergating,” in Proceedings of the IEEE ISSCC2017, 2017.

[68] X. Wang, L. Gao, S. Mao, and S. Pandey, “Deepfi: Deep learningfor indoor fingerprinting using channel state information,” in 2015IEEE Wireless Communications and Networking Conference (WCNC).IEEE, 2015, pp. 1666–1671.

[69] Y. Gu, Y. Chen, J. Liu, and X. Jiang, “Semi-supervised deep extremelearning machine for wi-fi based localization,” Neurocomputing, vol.166, pp. 282–293, 2015.

[70] W. Zhang, K. Liu, W. Zhang, Y. Zhang, and J. Gu, “Deep neuralnetworks for wireless localization in indoor and outdoor environments,”Neurocomputing, vol. 194, pp. 279–287, 2016.

[71] Z. Liu, L. Zhang, Q. Liu, Y. Yin, L. Cheng, and R. Zimmer-mann, “Fusion of magnetic and visual sensors for indoor localization:Infrastructure-free and more effective,” IEEE Transactions on Multi-media, vol. 19, no. 4, pp. 874–888, 2017.

[72] M. Becker, “Indoor positioning solely based on user’s sight,” inInternational Conference on Information Science and Applications.Springer, 2017, pp. 76–83.

[73] W. Lu, J. Zhang, X. Zhao, J. Wang, and J. Dang, “Multimodal sensoryfusion for soccer robot self-localization based on long short-termmemory recurrent neural network,” Journal of Ambient Intelligenceand Humanized Computing, pp. 1–9, 2017.

[74] A. Toshev and C. Szegedy, “DeepPose: Human pose estimation via deep neural networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 1653–1660.

[75] J. Liu, Y. Gu, and S. Kamijo, “Joint customer pose and orientation estimation using deep neural network from surveillance camera,” in Multimedia (ISM), 2016 IEEE International Symposium on. IEEE, 2016, pp. 216–221.

[76] F. J. Ordonez and D. Roggen, “Deep convolutional and LSTM recurrent neural networks for multimodal wearable activity recognition,” Sensors, vol. 16, no. 1, p. 115, 2016.

[77] D. Tao, Y. Wen, and R. Hong, “Multi-column bi-directional long short-term memory for mobile devices-based human activity recognition,” IEEE Internet of Things Journal, 2016.

[78] X. Li, Y. Zhang, I. Marsic, A. Sarcevic, and R. S. Burd, “Deep learning for RFID-based activity recognition,” in Proceedings of the 14th ACM Conference on Embedded Network Sensor Systems. ACM, 2016, pp. 164–175.

[79] L. Pigou, A. Van Den Oord, S. Dieleman, M. Van Herreweghe, and J. Dambre, “Beyond temporal pooling: Recurrence and temporal convolutions for gesture recognition in video,” International Journal of Computer Vision, pp. 1–10, 2015.

[80] K. Fragkiadaki, S. Levine, P. Felsen, and J. Malik, “Recurrent network models for human dynamics,” in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 4346–4354.

[81] S. E. Kahou, X. Bouthillier, P. Lamblin, C. Gulcehre, V. Michalski, K. Konda, S. Jean, P. Froumenty, Y. Dauphin, N. Boulanger-Lewandowski et al., “EmoNets: Multimodal deep learning approaches for emotion recognition in video,” Journal on Multimodal User Interfaces, vol. 10, no. 2, pp. 99–111, 2016.

[82] N. Neverova, C. Wolf, G. Lacey, L. Fridman, D. Chandra, B. Barbello, and G. Taylor, “Learning human identity from motion patterns,” IEEE Access, vol. 4, pp. 1810–1820, 2016.

[83] Y. He, G. J. Mendis, and J. Wei, “Real-time detection of false data injection attacks in smart grid: A deep learning-based intelligent mechanism,” IEEE Transactions on Smart Grid, 2017.

[84] M.-J. Kang and J.-W. Kang, “Intrusion detection system using deep neural network for in-vehicle network security,” PLoS ONE, vol. 11, no. 6, p. e0155781, 2016.

[85] Z. Yuan, Y. Lu, Z. Wang, and Y. Xue, “Droid-Sec: Deep learning in Android malware detection,” in ACM SIGCOMM Computer Communication Review, vol. 44, no. 4. ACM, 2014, pp. 371–372.

[86] R. Shokri and V. Shmatikov, “Privacy-preserving deep learning,” in Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security. ACM, 2015, pp. 1310–1321.

[87] M. Abadi, A. Chu, I. Goodfellow, H. B. McMahan, I. Mironov, K. Talwar, and L. Zhang, “Deep learning with differential privacy,” in Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. ACM, 2016, pp. 308–318.

[88] T. J. Hazen. (2016) Microsoft and Liebherr collaborating on new generation of smart refrigerators. [Online]. Available: http://blogs.technet.microsoft.com/machinelearning/2016/09/02/

[89] M. Manic, K. Amarasinghe, J. J. Rodriguez-Andina, and C. Rieger, “Intelligent buildings of the future: Cyberaware, deep learning powered, and human interacting,” IEEE Industrial Electronics Magazine, vol. 10, no. 4, pp. 32–49, 2016.

[90] P. Feng, M. Yu, S. M. Naqvi, and J. A. Chambers, “Deep learning for posture analysis in fall detection,” in Digital Signal Processing (DSP), 2014 19th International Conference on. IEEE, 2014, pp. 12–17.

[91] Toshiba Press Release. (2016) Toshiba and Dell Technologies’ deep learning testbed for IoT is first approved by Industrial Internet Consortium. [Online]. Available: https://www.toshiba.co.jp/about/press/2016 10/pr1702.htm

[92] X. Song, H. Kanasugi, and R. Shibasaki, “DeepTransport: Prediction and simulation of human mobility and transportation mode at a citywide level,” in IJCAI, 2016.

[93] V. C. Liang, R. T. Ma, W. S. Ng, L. Wang, M. Winslett, H. Wu, S. Ying, and Z. Zhang, “Mercury: Metro density prediction with recurrent neural network on streaming CDR data,” in Data Engineering (ICDE), 2016 IEEE 32nd International Conference on. IEEE, 2016, pp. 1374–1377.

[94] G. Amato, F. Carrara, F. Falchi, C. Gennaro, C. Meghini, and C. Vairo, “Deep learning for decentralized parking lot occupancy detection,” Expert Systems with Applications, 2017.

[95] S. Valipour, M. Siam, E. Stroulia, and M. Jagersand, “Parking-stall vacancy indicator system based on deep convolutional neural networks,” in 2016 IEEE 3rd World Forum on Internet of Things (WF-IoT), 2016, pp. 655–660.

[96] D. C. Mocanu, E. Mocanu, P. H. Nguyen, M. Gibescu, and A. Liotta, “Big IoT data mining for real-time energy disaggregation in buildings,” in Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, 2016, pp. 9–12.

[97] E. Mocanu, P. H. Nguyen, M. Gibescu, and W. L. Kling, “Deep learning for estimating building energy consumption,” Sustainable Energy, Grids and Networks, vol. 6, pp. 91–99, 2016.

[98] A. Gensler, J. Henze, B. Sick, and N. Raabe, “Deep learning for solar power forecasting—an approach using autoencoder and LSTM neural networks,” in Systems, Man, and Cybernetics (SMC), 2016 IEEE International Conference on. IEEE, 2016, pp. 2858–2865.

[99] Y. Tian and L. Pan, “Predicting short-term traffic flow by long short-term memory recurrent neural network,” in Smart City/SocialCom/SustainCom (SmartCity), 2015 IEEE International Conference on. IEEE, 2015, pp. 153–158.

[100] D. Ciresan, U. Meier, J. Masci, and J. Schmidhuber, “Multi-column deep neural network for traffic sign classification,” Neural Networks, vol. 32, pp. 333–338, 2012.

[101] K. Lim, Y. Hong, Y. Choi, and H. Byun, “Real-time traffic sign recognition based on a general purpose GPU and deep-learning,” PLoS ONE, vol. 12, no. 3, p. e0173317, 2017.

[102] E. Ackerman and A. Pagano, “Deep-learning first: Drive.ai’s path to autonomous driving,” IEEE Spectrum, 2017.

[103] C. Liu, Y. Cao, Y. Luo, G. Chen, V. Vokkarane, and Y. Ma, “DeepFood: Deep learning-based food image recognition for computer-aided dietary assessment,” in International Conference on Smart Homes and Health Telematics. Springer, 2016, pp. 37–48.

[104] C. R. Pereira, D. R. Pereira, J. P. Papa, G. H. Rosa, and X.-S. Yang, “Convolutional neural networks applied for Parkinson’s disease identification,” in Machine Learning for Health Informatics. Springer, 2016, pp. 377–390.

[105] G. Muhammad, S. M. M. Rahman, A. Alelaiwi, and A. Alamri, “Smart health solution integrating IoT and cloud: A case study of voice pathology monitoring,” IEEE Communications Magazine, vol. 55, no. 1, pp. 69–73, 2017.

[106] J. Wang, H. Ding, F. Azamian, B. Zhou, C. Iribarren, S. Molloi, and P. Baldi, “Detecting cardiovascular disease from mammograms with deep learning,” IEEE Transactions on Medical Imaging, 2017.

[107] Z. C. Lipton, D. C. Kale, C. Elkan, and R. Wetzell, “Learning to diagnose with LSTM recurrent neural networks,” in ICLR, 2016.

[108] D. Ravì, C. Wong, F. Deligianni, M. Berthelot, J. Andreu-Perez, B. Lo, and G.-Z. Yang, “Deep learning for health informatics,” IEEE Journal of Biomedical and Health Informatics, vol. 21, no. 1, pp. 4–21, 2017.

[109] N. Kussul, M. Lavreniuk, S. Skakun, and A. Shelestov, “Deep learning classification of land cover and crop types using remote sensing data,” IEEE Geoscience and Remote Sensing Letters, 2017.

[110] K. Kuwata and R. Shibasaki, “Estimating crop yields with deep learning and remotely sensed data,” in Geoscience and Remote Sensing Symposium (IGARSS), 2015 IEEE International. IEEE, 2015, pp. 858–861.

[111] G. J. Scott, M. R. England, W. A. Starms, R. A. Marcum, and C. H. Davis, “Training deep convolutional neural networks for land-cover classification of high-resolution imagery,” IEEE Geoscience and Remote Sensing Letters, vol. 14, no. 4, pp. 549–553, 2017.

[112] K. A. Steen, P. Christiansen, H. Karstoft, and R. N. Jørgensen, “Using deep learning to challenge safety standard for highly autonomous machines in agriculture,” Journal of Imaging, vol. 2, no. 1, p. 6, 2016.

[113] I. Sa, Z. Ge, F. Dayoub, B. Upcroft, T. Perez, and C. McCool, “DeepFruits: A fruit detection system using deep neural networks,” Sensors, vol. 16, no. 8, p. 1222, 2016.

[114] M. B. Ibanez, A. Di Serio, D. Villaran, and C. D. Kloos, “Experimenting with electromagnetism using augmented reality: Impact on flow student experience and educational effectiveness,” Computers & Education, vol. 71, pp. 1–13, 2014.

[115] L.-f. Kwok, “A vision for the development of i-campus,” Smart Learning Environments, vol. 2, pp. 1–12, 2015.

[116] H. Wang, N. Wang, and D.-Y. Yeung, “Collaborative deep learning for recommender systems,” in Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2015, pp. 1235–1244.

[117] T.-Y. Yang, C. G. Brinton, C. Joe-Wong, and M. Chiang, “Behavior-based grade prediction for MOOCs via time series neural networks,” IEEE Journal of Selected Topics in Signal Processing, 2017.

[118] C. Piech, J. Bassen, J. Huang, S. Ganguli, M. Sahami, L. J. Guibas, and J. Sohl-Dickstein, “Deep knowledge tracing,” in Advances in Neural Information Processing Systems, 2015, pp. 505–513.

[119] F. Conti, A. Pullini, and L. Benini, “Brain-inspired classroom occupancy monitoring on a low-power mobile platform,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2014, pp. 610–615.

[120] H. Shao, H. Jiang, F. Wang, and H. Zhao, “An enhancement deep feature fusion method for rotating machinery fault diagnosis,” Knowledge-Based Systems, vol. 119, pp. 200–220, 2017.

[121] H. Lee, “Framework and development of fault detection classification using IoT device and cloud environment,” Journal of Manufacturing Systems, 2017.

[122] H. Lee, Y. Kim, and C. O. Kim, “A deep learning model for robust wafer fault monitoring with sensor measurement noise,” IEEE Transactions on Semiconductor Manufacturing, vol. 30, no. 1, pp. 23–31, 2017.

[123] W. Yan and L. Yu, “On accurate and reliable anomaly detection for gas turbine combustors: A deep learning approach,” in Proceedings of the Annual Conference of the Prognostics and Health Management Society, 2015.

[124] Y. Liu and L. Wu, “Geological disaster recognition on optical remote sensing images using deep learning,” Procedia Computer Science, vol. 91, pp. 566–575, 2016.

[125] Q. Wang, Y. Guo, L. Yu, and P. Li, “Earthquake prediction based on spatio-temporal data mining: An LSTM network approach,” IEEE Transactions on Emerging Topics in Computing, 2017.

[126] H. Maeda, Y. Sekimoto, and T. Seto, “Lightweight road manager: Smartphone-based automatic determination of road damage status by deep neural network,” in Proceedings of the 5th ACM SIGSPATIAL International Workshop on Mobile Geographic Information Systems. ACM, 2016, pp. 37–45.

[127] L. Steinberg. (2015) Forbes - Changing the game: The rise of sports analytics. [Online]. Available: https://www.forbes.com/sites/leighsteinberg/2015/08/18/changing-the-game-the-rise-of-sports-analytics

[128] W. Liu, J. Liu, X. Gu, K. Liu, X. Dai, and H. Ma, “Deep learning based intelligent basketball arena with energy image,” in International Conference on Multimedia Modeling. Springer, 2017, pp. 601–613.

[129] K.-C. Wang and R. Zemel, “Classifying NBA offensive plays using neural networks,” in Proc. MIT Sloan Sports Analytics Conf., 2016.

[130] R. Shah and R. Romijnders, “Applying deep learning to basketball trajectories,” in ACM KDD ’16, 2016.

[131] T. Kautz, B. H. Groh, J. Hannink, U. Jensen, H. Strubberg, and B. M. Eskofier, “Activity recognition in beach volleyball using a deep convolutional neural network,” Data Mining and Knowledge Discovery, pp. 1–28, 2017.

[132] M. S. Ibrahim, S. Muralidharan, Z. Deng, A. Vahdat, and G. Mori, “A hierarchical deep temporal model for group activity recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 1971–1980.

[133] S. Bell and K. Bala, “Learning visual similarity for product design with convolutional neural networks,” ACM Transactions on Graphics (TOG), vol. 34, no. 4, p. 98, 2015.

[134] L. Xiao and X. Yichao, “Exact clothing retrieval approach based on deep neural network,” in Information Technology, Networking, Electronic and Automation Control Conference, IEEE. IEEE, 2016, pp. 396–400.

[135] M. Hadi Kiapour, X. Han, S. Lazebnik, A. C. Berg, and T. L. Berg, “Where to buy it: Matching street clothing photos in online shops,” in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 3343–3351.

[136] S. Advani, P. Zientara, N. Shukla, I. Okafor, K. Irick, J. Sampson, S. Datta, and V. Narayanan, “A multitask grocery assist system for the visually impaired: Smart glasses, gloves, and shopping carts provide auditory and tactile feedback,” IEEE Consumer Electronics Magazine, vol. 6, no. 1, pp. 73–81, 2017.

[137] B. Singh, T. K. Marks, M. Jones, O. Tuzel, and M. Shao, “A multi-stream bi-directional recurrent neural network for fine-grained action detection,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 1961–1970.

[138] M. Denil, B. Shakibi, L. Dinh, N. de Freitas et al., “Predicting parameters in deep learning,” in Advances in Neural Information Processing Systems, 2013, pp. 2148–2156.

[139] J. Ba and R. Caruana, “Do deep nets really need to be deep?” in Advances in Neural Information Processing Systems, 2014, pp. 2654–2662.

[140] Y. LeCun, J. S. Denker, S. A. Solla, R. E. Howard, and L. D. Jackel, “Optimal brain damage,” in NIPS, vol. 2, 1989, pp. 598–605.

[141] S. Han, J. Pool, J. Tran, and W. Dally, “Learning both weights and connections for efficient neural network,” in Advances in Neural Information Processing Systems, 2015, pp. 1135–1143.

[142] S. Han, X. Liu, H. Mao, J. Pu, A. Pedram, M. A. Horowitz, and W. J. Dally, “EIE: Efficient inference engine on compressed deep neural network,” arXiv preprint arXiv:1602.01528v2 [cs.CV], 2016.

[143] W. Chen, J. T. Wilson, S. Tyree, K. Q. Weinberger, and Y. Chen, “Compressing neural networks with the hashing trick,” in Proceedings of the 32nd International Conference on Machine Learning, vol. 37. JMLR: W&CP, 2015.

[144] M. Courbariaux and Y. Bengio, “Binarized neural networks: Training deep neural networks with weights and activations constrained to +1 or -1,” arXiv preprint arXiv:1602.02830v3 [cs.LG], 2016.

[145] S. Venkataramani, A. Ranjan, K. Roy, and A. Raghunathan, “AxNN: Energy-efficient neuromorphic systems using approximate computing,” in Proceedings of the 2014 International Symposium on Low Power Electronics and Design. ACM, 2014, pp. 27–32.

[146] B. Moons, B. De Brabandere, L. Van Gool, and M. Verhelst, “Energy-efficient convnets through approximate computing,” in Applications of Computer Vision (WACV), 2016 IEEE Winter Conference on. IEEE, 2016, pp. 1–8.

[147] S. G. Ramasubramanian, R. Venkatesan, M. Sharad, K. Roy, and A. Raghunathan, “Spindle: Spintronic deep learning engine for large-scale neuromorphic computing,” in Proceedings of the 2014 International Symposium on Low Power Electronics and Design. ACM, 2014, pp. 15–20.

[148] Y.-H. Chen, T. Krishna, J. S. Emer, and V. Sze, “Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks,” IEEE Journal of Solid-State Circuits, vol. 52, no. 1, pp. 127–138, 2017.

[149] T. Chen, Z. Du, N. Sun, J. Wang, C. Wu, Y. Chen, and O. Temam, “DianNao: A small-footprint high-throughput accelerator for ubiquitous machine-learning,” in ACM SIGPLAN Notices, vol. 49, no. 4. ACM, 2014, pp. 269–284.

[150] N. D. Lane, S. Bhattacharya, P. Georgiev, C. Forlivesi, L. Jiao, L. Qendro, and F. Kawsar, “DeepX: A software accelerator for low-power deep learning inference on mobile devices,” in Information Processing in Sensor Networks (IPSN), 2016 15th ACM/IEEE International Conference on. IEEE, 2016, pp. 1–12.

[151] S. Bang, J. Wang, Z. Li, C. Gao, Y. Kim, Q. Dong, Y.-P. Chen, L. Fick, X. Sun, R. Dreslinski et al., “14.7 A 288 µW programmable deep-learning processor with 270 KB on-chip weight storage using non-uniform memory hierarchy for mobile intelligence,” in Solid-State Circuits Conference (ISSCC), 2017 IEEE International. IEEE, 2017, pp. 250–251.

[152] X. Xu, S. Das, and K. Kreutz-Delgado, “ApproxDBN: Approximate computing for discriminative deep belief networks,” arXiv preprint arXiv:1704.03993v3 [cs.NE], 2017.

[153] S. Venkataramani, K. Roy, and A. Raghunathan, “Efficient embedded learning for IoT devices,” in 2016 21st Asia and South Pacific Design Automation Conference (ASP-DAC). IEEE, 2016, pp. 308–311.

[154] D. D. Awschalom and M. E. Flatte, “Challenges for semiconductor spintronics,” Nature Physics, vol. 3, no. 3, pp. 153–159, 2007.

[155] K. Bourzac, “Speck-size computers: Now with deep learning [News],” IEEE Spectrum, vol. 54, no. 4, pp. 13–15, 2017.

[156] B. Moons and M. Verhelst, “A 0.3–2.6 TOPS/W precision-scalable processor for real-time large-scale ConvNets,” in VLSI Circuits (VLSI-Circuits), 2016 IEEE Symposium on. IEEE, 2016, pp. 1–2.

[157] D. Ravi, C. Wong, B. Lo, and G.-Z. Yang, “A deep learning approach to on-node sensor data analytics for mobile or wearable devices,” IEEE Journal of Biomedical and Health Informatics, 2016.

[158] N. Nguyen-Thanh, L. Yang, D. H. Nguyen, C. Jabbour, B. Murmann et al., “Cognitive computation and communication: A complement solution to cloud for IoT,” in Advanced Technologies for Communications (ATC), 2016 International Conference on. IEEE, 2016, pp. 222–230.

[159] B. Tang, Z. Chen, G. Hefferman, S. Pei, W. Tao, H. He, and Q. Yang, “Incorporating intelligence in fog computing for big data analysis in smart cities,” IEEE Transactions on Industrial Informatics, 2017.

[160] A. Al-Fuqaha, A. Khreishah, M. Guizani, A. Rayes, and M. Mohammadi, “Toward better horizontal integration among IoT services,” IEEE Communications Magazine, vol. 53, no. 9, pp. 72–79, 2015.

[161] A. Coates, B. Huval, T. Wang, D. J. Wu, A. Y. Ng, and B. Catanzaro, “Deep learning with COTS HPC systems,” in Proceedings of the 30th International Conference on Machine Learning, vol. 28. JMLR: W&CP, 2013.

[162] T. M. Chilimbi, Y. Suzue, J. Apacible, and K. Kalyanaraman, “Project Adam: Building an efficient and scalable deep learning training system,” in 11th USENIX Symposium on Operating Systems Design and Implementation, vol. 14, 2014, pp. 571–582.

[163] N. P. Jouppi, C. Young, N. Patil, D. Patterson, G. Agrawal, R. Bajwa, S. Bates, S. Bhatia, N. Boden, A. Borchers et al., “In-datacenter performance analysis of a tensor processing unit,” in 44th International Symposium on Computer Architecture (ISCA), 2017.

[164] K. Daniel, “Lessons learned from deploying deep learning at scale,” O’Reilly Artificial Intelligence Conference, 2016. [Online]. Available: http://conferences.oreilly.com/artificial-intelligence/ai-ny-2016/public/schedule/detail/54098

[165] N. Hemsoth, “GPU platforms set to lengthen deep learning reach,” The Next Platform, 2015. [Online]. Available: http://www.nextplatform.com/2015/12/07/gpu-platforms-emerge-for-longer-deep-learning-reach/

[166] Y. Simmhan and S. Perera, “Big data analytics platforms for real-time applications in IoT,” in Big Data Analytics. Springer, 2016, pp. 115–135.

[167] S. B. Qaisar and M. Usman, “Fog networking for machine health prognosis: A deep learning perspective,” in International Conference on Computational Science and Its Applications. Springer, 2017, pp. 212–219.

[168] D. Li, T. Salonidis, N. V. Desai, and M. C. Chuah, “DeepCham: Collaborative edge-mediated adaptive deep learning for mobile object recognition,” in Edge Computing (SEC), IEEE/ACM Symposium on. IEEE, 2016, pp. 64–76.

[169] Wikipedia. (2017) List of datasets for machine learning research. [Online]. Available: https://en.wikipedia.org/wiki/List_of_datasets_for_machine_learning_research

[170] N. Papernot, P. McDaniel, S. Jha, M. Fredrikson, Z. B. Celik, and A. Swami, “The limitations of deep learning in adversarial settings,” in Security and Privacy (EuroS&P), 2016 IEEE European Symposium on. IEEE, 2016, pp. 372–387.

[171] M. M. Najafabadi, F. Villanustre, T. M. Khoshgoftaar, N. Seliya, R. Wald, and E. Muharemagic, “Deep learning applications and challenges in big data analytics,” Journal of Big Data, vol. 2, no. 1, p. 1, 2015.

[172] S. LaValle, E. Lesser, R. Shockley, M. S. Hopkins, and N. Kruschwitz, “Big data, analytics and the path from insights to value,” MIT Sloan Management Review, vol. 52, no. 2, p. 21, 2011.

[173] A. Nguyen, J. Yosinski, and J. Clune, “Deep neural networks are easily fooled: High confidence predictions for unrecognizable images,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 427–436.

[174] X. Qiu, L. Zhang, Y. Ren, P. N. Suganthan, and G. Amaratunga, “Ensemble deep learning for regression and time series forecasting,” in Computational Intelligence in Ensemble Learning (CIEL), 2014 IEEE Symposium on. IEEE, 2014, pp. 1–6.

[175] M. A. Alsheikh, D. Niyato, S. Lin, H.-P. Tan, and Z. Han, “Mobile big data analytics using deep learning and Apache Spark,” IEEE Network, vol. 30, no. 3, pp. 22–29, 2016.

[176] A. Gharaibeh, A. Khreishah, M. Mohammadi, A. Al-Fuqaha, I. Khalil, and A. Rayes, “Online auction of cloud resources in support of the Internet of Things,” IEEE Internet of Things Journal, vol. PP, no. 99, pp. 1–14, 2017.

[177] J. Gantz and D. Reinsel, “The digital universe in 2020: Big data, bigger digital shadows, and biggest growth in the Far East,” IDC iView: IDC Analyze the Future, vol. 2007, no. 2012, pp. 1–16, 2012.

[178] National Science Foundation. (2017) Cyber-Physical Systems (CPS) - Program Solicitation (NSF 17-529). [Online]. Available: https://www.nsf.gov/pubs/2017/nsf17529/nsf17529.pdf

[179] P. Valente Klaine, M. A. Imran, O. Onireti, and R. D. Souza, “A survey of machine learning techniques applied to self-organizing cellular networks,” IEEE Communications Surveys and Tutorials, vol. PP, 2017.

[180] J. Lee, J. Wang, D. Crandall, S. Sabanovic, and G. Fox, “Real-time, cloud-based object detection for unmanned aerial vehicles,” in Robotic Computing (IRC), IEEE International Conference on. IEEE, 2017, pp. 36–43.

[181] L. Tai, S. Li, and M. Liu, “A deep-network solution towards model-less obstacle avoidance,” in Intelligent Robots and Systems (IROS), 2016 IEEE/RSJ International Conference on. IEEE, 2016, pp. 2759–2764.

[182] O. Akgul, H. I. Penekli, and Y. Genc, “Applying deep learning in augmented reality tracking,” in Signal-Image Technology & Internet-Based Systems (SITIS), 2016 12th International Conference on. IEEE, 2016, pp. 47–54.

[183] R. E. Sutanto, L. Pribadi, and S. Lee, “3D integral imaging based augmented reality with deep learning implemented by faster R-CNN,” in International Conference on Mobile and Wireless Technology. Springer, 2017, pp. 241–247.

Mehdi Mohammadi (S’14) is a Ph.D. candidate in the Department of Computer Science, Western Michigan University (WMU), Kalamazoo, MI, USA. He received his B.S. degree in Computer Engineering from Kharazmi University, Tehran, Iran, in 2003 and his M.S. degree in Computer Engineering (Software) from Sheikhbahaee University, Isfahan, Iran, in 2010. His research interests include the Internet of Things, IoT data analytics, cloud computing, and machine learning. He has served as a reviewer for several journals, including IEEE Communications Magazine, IEEE Communications Letters, Wiley’s Security and Wireless Communication Networks journal, and Wiley’s Wireless Communications and Mobile Computing journal. He has held a Graduate Doctoral Assistantship with the WMU Libraries Information Technology since 2013. He was the recipient of six travel grants from the National Science Foundation (NSF).

Ala Al-Fuqaha (S’00-M’04-SM’09) received his M.S. and Ph.D. degrees in Electrical and Computer Engineering from the University of Missouri-Columbia and the University of Missouri-Kansas City, in 1999 and 2004, respectively. Currently, he is a Professor and director of the NEST Research Lab at the Computer Science Department of Western Michigan University. His research interests include smart services in support of the Internet of Things, Wireless Vehicular Networks (VANETs), cooperation and spectrum access etiquettes in cognitive radio networks, management and planning of software defined networks (SDN), intelligent services for the blind and the visually impaired, QoS routing in optical and wireless networks, and performance analysis and evaluation of high-speed computer and telecommunication networks. He is a senior member of the IEEE and has served as a Technical Program Committee member and reviewer for many international conferences and journals.

Sameh Sorour (S’98, M’11, SM’16) is an Assistant Professor in the Department of Electrical and Computer Engineering, University of Idaho. He received his B.Sc. and M.Sc. degrees in Electrical Engineering from Alexandria University, Egypt, in 2002 and 2006, respectively. In 2011, he obtained his Ph.D. degree in Electrical and Computer Engineering from the University of Toronto, Canada. After two postdoctoral fellowships at the University of Toronto and King Abdullah University of Science and Technology (KAUST), he joined King Fahd University of Petroleum and Minerals (KFUPM) in 2013 before moving to the University of Idaho in 2016. His research interests lie in the broad area of advanced communications, networking, computing, and learning technologies for smart city applications, including cyber-physical systems, the Internet of Things (IoT) and IoT-enabled systems, cloud and fog networking, network coding, device-to-device networking, autonomous driving and autonomous systems, intelligent transportation systems, and mathematical modelling and optimization for smart systems.

Mohsen Guizani (S’85, M’89, SM’99, F’09) received the B.S. (with distinction) and M.S. degrees in electrical engineering, and the M.S. and Ph.D. degrees in computer engineering, from Syracuse University, Syracuse, NY, USA, in 1984, 1986, 1987, and 1990, respectively. He is currently a Professor and the ECE Department Chair at the University of Idaho, USA. Previously, he served as the Associate Vice President of Graduate Studies at Qatar University, Chair of the Computer Science Department at Western Michigan University, and Chair of the Computer Science Department at the University of West Florida. He also served in academic positions at the University of Missouri-Kansas City, University of Colorado-Boulder, Syracuse University, and Kuwait University. His research interests include wireless communications and mobile computing, computer networks, mobile cloud computing, security, and smart grid. He currently serves on the editorial boards of several international technical journals and is the Founder and Editor-in-Chief of the Wireless Communications and Mobile Computing journal (Wiley). He is the author of nine books and more than 450 publications in refereed journals and conferences. He has guest edited a number of special issues in IEEE journals and magazines, and has served as a member, Chair, and General Chair of a number of international conferences. He received the teaching award multiple times from different institutions as well as the Best Research Award from three institutions. He was the Chair of the IEEE Communications Society Wireless Technical Committee and the Chair of the TAOS Technical Committee. He served as the IEEE Computer Society Distinguished Speaker from 2003 to 2005. He is a Fellow of the IEEE and a Senior Member of the ACM.