Learning Human Activities for Assisted Living Robotics

David Ada Adama∗, School of Science and Technology, Nottingham Trent University, Nottingham, UK NG11 8NS

Ahmad Lotfi†, School of Science and Technology, Nottingham Trent University, Nottingham, UK NG11 8NS

Caroline Langensiepen, School of Science and Technology, Nottingham Trent University, Nottingham, UK NG11 8NS

Kevin Lee, School of Science and Technology, Nottingham Trent University, Nottingham, UK NG11 8NS

Pedro Trindade, School of Science and Technology, Nottingham Trent University, Nottingham, UK NG11 8NS

ABSTRACT

Assisted living has gained increased focus in recent years with the increase in the elderly population. This has led to a desire for technical solutions to reduce cost. Learning to perform human activities of daily living through the use of assistive technology (especially assistive robots) becomes more important in areas like elderly care. This paper proposes an approach to learning to perform human activities using a method of activity recognition from information obtained from an RGB-D sensor. Key features obtained from clustering and classification of relevant aspects of an activity will be used for learning. Existing approaches to activity recognition still have limitations preventing them from going mainstream. This is part of a project directed towards transfer learning of human activities to enhance human-robot interaction. For test and validation of our method, the CAD-60 human activity data set is used.

CCS CONCEPTS

• Computing methodologies → Activity recognition and understanding; Supervised learning by classification; Feature selection; Vision for robotics; Image representations;

KEYWORDS

Activity recognition, Assistive robotics, Ambient assisted living, Feature extraction

ACM Reference format:
David Ada Adama, Ahmad Lotfi, Caroline Langensiepen, Kevin Lee, and Pedro Trindade. 2017. Learning Human Activities for Assisted Living Robotics. In Proceedings of PETRA '17, Island of Rhodes, Greece, June 21-23, 2017, 7 pages. DOI: http://dx.doi.org/10.1145/3056540.3076197

∗David A. Adama is a PhD research student who has conducted the research as part of his thesis.
†Ahmad Lotfi is the corresponding author. The corresponding email address is: ahmad.lotfi@ntu.ac.uk

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
PETRA '17, Island of Rhodes, Greece
© 2017 Copyright held by the owner/author(s). Publication rights licensed to ACM. 978-1-4503-5227-7/17/06...$15.00
DOI: http://dx.doi.org/10.1145/3056540.3076197

1 INTRODUCTION

With the increase in the ageing population in developed societies, Ambient Assisted Living (AAL) technology helps to reduce the cost of elderly care. One means of achieving this is by deploying assistive robotics in an AAL environment. This requires the robot to learn tasks that human carers will perform as part of routine duties.

Human Activity Recognition (HAR) has gained a lot of interest in recent years [9][10][11]. In applications involving human-computer interactions (for example, gaming and assisted living environments), HAR enables elaborate explanation/interpretation of activities performed by humans. This is key to understanding how these activities are learned and how one experience relates to another. This improves the collaboration and adaptation of assistive robots.

Prior to assistive robots performing a human activity, they need to learn these activities to perform them effectively. This requires grouping human movements with their descriptive semantics. Observing activities as they are performed through the use of visual or non-visual sensors makes it much easier to obtain information about human activities in an environment [4][19][20]. It would be extremely hard to understand and interpret activities using a normal visual sensor such as an RGB camera, which provides only 2D visual data [6]. These sensors provide limited information for an activity performed in a real-world environment. However, recent developments in RGB-D sensors show that they are better devices for observing human activities [6]. These sensors provide a means of better observing the world to detect the human poses used to build activity recognition systems [4][20]. They also provide a platform for exploiting depth maps, body shape and skeleton joint detection of humans in 3D space, which are used in developing sophisticated recognition algorithms.

This paper proposes a supervised learning method for human activity recognition that exploits 3D human skeleton data provided by an RGB-D sensor. The proposed method shows how activities are observed as they are performed and how features are extracted after a clustering method is applied, thus providing a means of classifying different activities. The extracted features simplify the process of learning the activities. The work presented here is part of a research project directed towards improving human-robot or robot-robot interaction through Transfer Learning [12]. This will enable more adaptable robots to learn to perform human activities of daily living from transferred knowledge.


Figure 1: An example of robot-robot interaction through learning from activity recognition.

For example, in Figure 1, two robots are shown performing an activity in which one acts as the teacher while the other acts as a learner, using transferred knowledge gained from visual observation as the teacher robot performs the activity. The goal of our research is to develop a state-of-the-art system with HAR and learning performance comparable to that of humans.

This paper is organized as follows: In Section 2, a review of related work in this area is presented. Section 3 gives details of the method applied in our approach, and some initial results are presented in Section 4. Section 5 presents conclusions and future work to be undertaken.

2 RELATED WORK

For an assistive robot to perform an Activity of Daily Living (ADL), an understanding of the activity is required, which is the process of learning. An important aspect of the learning process is the recognition of activities as they are performed by human actors. This creates the need for proper observation of the environment by the robot to correctly interpret the activity.

HAR from information obtained using RGB-D sensors gives very important information relevant for a robot to understand an activity. By exploring human pose detection using RGB-D sensors, activity recognition has seen more advancement in recent times [4][19]. These RGB-D sensors extract 3D skeleton data from depth images and body silhouettes for feature generation [1]. In [4], the RGB-D sensor is used to generate a human 3D skeleton model with matching of body parts linked by their joints. They extract positions of individual joints from the skeleton in 3D form (x, y, z). Authors in [10] use a similar RGB-D sensor to obtain depth silhouettes of human activities from which body point information is extracted for the activity recognition system. Another approach is shown in the work in [5], where the RGB-D sensor is used to obtain an orientation-based human representation of each joint relative to the human centroid in 3D space. The data sets obtained using RGB-D sensors have to be preprocessed, which is an important step in the activity recognition system. This process is carried out to filter the data for a better representation of the features of an activity.

HAR from video camera sensors can be broken down into two aspects [18]: feature-based and model-based. Feature-based techniques such as Histogram of Oriented Gradients (HOG) [21] and the subspace clustering based approach (SCAR) [25] are used to extract features for recognizing human activity from data acquired using sensors. The model-based approach has to do with the construction of a human model for recognition, either as a 2D, 3D or skeletal model. Authors in [1][23] construct models using a kinematic approach that extracts features from frame sequences for human structure representations. A combination of both feature-based and model-based approaches is seen in [20], where a Maximum Entropy Markov Model (MEMM) is used for classification of activities using features from skeleton tracking combined with HOG. Authors in [3] also used a neural network technique, proposing an end-to-end hierarchical recurrent neural network (RNN) for skeleton-based action recognition. They make use of the raw positions of human joints as input to the RNN.

All of the methods reported above use feature extraction techniques to obtain feature vectors which describe the activity performed. HAR approaches which utilize human 3D joint position information extracted from RGB-D sensors transform the joint positions of individual frames into column vectors. A matrix is then formed to encode the sequence of frames at specific time intervals. In [4], 14 features obtained from distances between different body parts are used to characterize 12 activities. These were used with a combination of multiple classifiers to form a Dynamic Bayesian Mixture Model (DBMM). Similarly, [8] applied the statistical covariance of 3D joints (Cov3DJ) as features to encode the skeleton data of raw joint positions. Another approach seen in [24] used a sequence of joint trajectories and applied wavelets to encode each temporal sequence of joints into features. Since not all joints of a person provide substantial information for interpreting an activity performed, different methods are proposed to select key joints which are more descriptive [2][7][14][16], and they will be investigated for an optimal method to be used in this research.

In the next section, the proposed methodology for feature extraction and activity recognition from raw joint positions of skeleton data obtained from an RGB-D sensor will be explained.

3 ACTIVITY RECOGNITION SYSTEM

To recognize activities from RGB-D sensors, the first stage is to obtain the raw joint positions of the skeleton. This is done by obtaining recorded frames of annotated data for activities performed by a human actor. The data captured include the hierarchical nature of activities, human skeletal features and their transitions over time. The system can be set up in a living environment. An example of an RGB and depth image frame extracted from an RGB-D sensor is shown in Figure 2. Once the raw joint position information is obtained, important features must be extracted from this data before they are classified into different activities. The proposed system architecture is shown in Figure 3. The following key steps are identified:


Figure 2: Frame of RGB and Depth image showing tracked human skeleton.

Figure 3: Overall system architecture of the proposed system.

• Posture feature extraction: Posture feature vectors representing key human poses that describe an activity are extracted by obtaining the centroids of clusters using the K-means clustering technique.

• Activity classification: Classification of activities from the extracted posture features.

Details of each step are briefly explained in the following sections. A high-level sketch of the overall pipeline is given below.
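The following is a minimal sketch of that pipeline under illustrative assumptions (NumPy/scikit-learn, joint 0 taken as the torso, and one fixed-length feature vector per sequence); it is not the authors' implementation, and the ordering of key postures described in Section 3.1 is omitted for brevity.

```python
# Illustrative outline of the two steps: posture feature extraction via K-means
# centroids, followed by activity classification. Names and layout are assumptions.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neural_network import MLPClassifier

def extract_posture_features(frames, k=25, torso_idx=0):
    """frames: (N_frames, n_joints, 3) raw joint positions of one activity sequence.
    Returns the K cluster centroids (key postures) flattened into one feature vector."""
    displacements = (frames - frames[:, torso_idx:torso_idx + 1, :]).reshape(len(frames), -1)
    km = KMeans(n_clusters=k, random_state=0).fit(displacements)
    return km.cluster_centers_.flatten()

# X = np.stack([extract_posture_features(seq) for seq in training_sequences])
# clf = MLPClassifier(hidden_layer_sizes=(10,)).fit(X, training_labels)
```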

3.1 Posture Feature Extraction

Posture feature extraction is a key step in our system for recognizing activities. Features used in activity recognition systems can be computed using the joint position coordinates extracted from RGB-D sensor skeleton data. This approach is one of the simplest employed by researchers. Two methods can be applied: those based on raw joint positions and those based on displacement representations, when considering temporal and spatial data.

The proposed system adopts a method which exploits displacement features from the skeleton joint coordinates. However, temporal information is excluded to make the system independent of the speed of movement. This method was introduced by [2]. For each skeleton frame, a vector F_N is used to represent the posture of the joints j_i in 3D, where N is the frame (or observation) number for an activity, as shown in Figure 4. The Kinect RGB-D sensor considers the skeleton frame of reference to be the sensor itself. However, in order to compensate for the position of the skeleton, the frame of reference for all joints is taken relative to the torso joint coordinates.

Consider a skeleton with joints j_i, i = 1, ..., n, and let j_x represent the torso coordinates. The distance vector d_i between the i-th joint and the torso (reference) is given by:

d_i = j_i - j_x    (1)

After computing the distance d_i for all the joints in a frame of an activity, a vector V representing the set of joint position distances relative to the torso is obtained. Each vector V represents the posture for one skeleton frame. The set of vectors V_N for all frames obtained represents an activity. This process is illustrated in Figure 4.

V_N = (d_1, d_2, d_3, ..., d_n)    (2)
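As a minimal sketch of Equations (1) and (2), the per-frame posture vector can be computed as below; NumPy is assumed, and the torso joint index is an assumption about the skeleton layout rather than a detail given in the paper.

```python
import numpy as np

def posture_vector(frame_joints, torso_idx=0):
    """frame_joints: (n_joints, 3) array of x, y, z positions for one frame.
    Returns V_N, the concatenation of the displacements d_i = j_i - j_x."""
    torso = frame_joints[torso_idx]
    d = frame_joints - torso      # d_i for every joint i (the torso row becomes zero)
    return d.flatten()            # V_N = (d_1, d_2, ..., d_n)

# One posture vector per frame; stacking them gives the set of V_N vectors
# describing the activity:
# V = np.array([posture_vector(f) for f in activity_frames])
```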

The proposed approach applies the K-means clustering technique to select the key human poses that represent an activity, thereby reducing the complexity of large activity data sets. This technique is used to represent an activity by a subset of poses, without having to use all observations. The V_N vectors representing frames of an activity are grouped into K clusters using the squared Euclidean distance as the metric.

For an activity consisting of V_N vectors, K clusters are defined to represent key postures for the activity. A cluster C_k is a grouping of closely related joint position components (joint coordinates) within frames of the activity. After computing, an output of K clusters [C_1, C_2, C_3, ..., C_K] is obtained. Each cluster is a vector containing the centroids for all joints that define a key posture within an activity. Once the clusters are formed, they are sorted following the order in which the cluster elements occur during the activity. The clustering algorithm provides the ID of the cluster to which an observation within the activity belongs. The sequence of cluster vectors obtained represents the sequence of observations that constitute the performed activity. The centroids of each cluster represent key postures, which are the most important features of an activity. These features form the activity features vector A.

288

Page 4: Learning Human Activities for Assisted Living Robotics activities.pdfthe activity recognition system. Another approach is shown in the work in [5] where the RGB-D sensor is used to

PETRA ’17, June 21-23, 2017, Island of Rhodes, Greece D. A. Adama et al.

Figure 4: Steps to activity recognition in the proposed method.

A = [C_1, C_2, C_3, ..., C_K]    (3)

For example, considering an activity represented with 10 observations and 3 clusters, one of the outputs after applying K-means clustering would be a sequence of cluster IDs representing the activity: [2, 2, 1, 1, 1, 1, 3, 3, 3, 1]. This means the first two observations are within cluster 2, the third to sixth observations in cluster 1, the seventh to ninth observations in cluster 3 and the tenth observation in cluster 1. Therefore, the activity features vector will be A = [C_2, C_1, C_3, C_1].
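A minimal sketch of this step, assuming scikit-learn's KMeans (whose cluster IDs are 0-based, unlike the 1-based IDs in the example above), could look as follows:

```python
import numpy as np
from sklearn.cluster import KMeans

def activity_feature_vector(V, k):
    """V: (N_frames, D) posture vectors of one activity; k: number of clusters.
    Collapses consecutive repeats in the cluster ID sequence and replaces each
    retained ID with its centroid (the key posture)."""
    km = KMeans(n_clusters=k, random_state=0).fit(V)
    ids = list(km.labels_)                                            # e.g. [2,2,1,1,1,1,3,3,3,1]
    order = [ids[0]] + [b for a, b in zip(ids, ids[1:]) if b != a]    # -> [2,1,3,1]
    return np.array([km.cluster_centers_[c] for c in order])          # A = [C2,C1,C3,C1]
```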

One of the challenges of using the K-means clustering algorithm in the proposed method is obtaining the optimal number of clusters that can effectively represent the posture features of an activity. Considering that, we use different numbers of clusters C_K for K = (5, 10, 15, 20, 25, 30) and compare the results. The result is evaluated by minimizing the error Z, which is the sum of the deviations between the centroids of the clusters and the actual joint positions through the sequence of frames of an activity:

Z = Σ_{i=1}^{n} (C_{k,n} - j_{i,n})    (4)

where i is the i-th joint, C_{k,n} is the centroid of cluster k at the n-th observation/frame, and j_{i,n} is the i-th joint at observation n.
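A brief sketch of this model-selection loop is given below; it assumes scikit-learn's KMeans and reads the per-joint deviation as a Euclidean norm, since the paper does not pin down the exact form of the deviation.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_error(V, k, n_joints):
    """V: (N_frames, n_joints*3) posture vectors of one activity; returns Z for K = k."""
    km = KMeans(n_clusters=k, random_state=0).fit(V)
    centroids = km.cluster_centers_[km.labels_]          # C_{k,n} for every frame n
    diff = (centroids - V).reshape(len(V), n_joints, 3)
    return np.sum(np.linalg.norm(diff, axis=2))          # summed over joints and frames

# for k in (5, 10, 15, 20, 25, 30):
#     print(k, cluster_error(V, k, n_joints=15))
```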

3.2 Activity Classification

Classification of the activity feature vector is computed to associate each extracted feature vector with the correct activity performed.

Figure 5: Artificial Neural Network classification structure.

To achieve the task of human activity classification, machine learning classification algorithms may be applied. One such algorithm is an Artificial Neural Network (ANN) trained as a classifier.

The proposed system employs an ANN consisting of a single hidden layer for the classification of human activities. Details of the ANN classification algorithm adopted in this work can be found in [15]. In training the classifier, the feature vectors obtained from clustering which describe a performed activity are passed as inputs and the activity labels as target outputs of the ANN. An iterative learning process, which is a key feature of ANNs, is performed, during which the weights of all neurons are adjusted to predict the correct activity label from the input activity feature vectors. The structure of the ANN classification is given in Figure 5.
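A minimal sketch of this classification step is shown below. The paper does not name a library; scikit-learn's MLPClassifier is used purely for illustration, with the single hidden layer of 10 neurons reported in Section 4, and it assumes the activity feature vectors have been padded or truncated to a common length.

```python
from sklearn.neural_network import MLPClassifier

# X_train: one row per activity sequence (a flattened activity features vector A);
# y_train: the corresponding activity labels.
clf = MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000, random_state=0)
# clf.fit(X_train, y_train)
# predicted_labels = clf.predict(X_test)
```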

289

Page 5: Learning Human Activities for Assisted Living Robotics activities.pdfthe activity recognition system. Another approach is shown in the work in [5] where the RGB-D sensor is used to

Learning Human Activities for Assisted Living Robotics PETRA ’17, June 21-23, 2017, Island of Rhodes, Greece

Figure 6: Selected frames of a human actor performing activities in a living environment from the CAD-60 dataset.

Figure 7: Confusion matrix showing the performance of our system for 6 activities.

The input to the hidden layer neurons is the weighted sum of the extracted activity feature vector A and the weights w_i, passed through an activation function. For the initial iteration, the weights are selected at random; subsequently, they are modified through successive iterations during training of the network, based on the error propagated from each iteration. The error, calculated at each iteration, is the difference between the actual output label for the activity and the predicted value.
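For concreteness, a standard formulation of such an error-driven weight update (assuming a squared-error loss and gradient descent, which the paper does not specify) is:

E = (1/2)(t - ŷ)²,    w_i ← w_i - η ∂E/∂w_i

where t is the target activity label encoding, ŷ is the network's prediction, and η is the learning rate.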

4 EXPERIMENTAL RESULTS AND DISCUSSION

The proposed method of learning from activity recognition is tested on a well-known human activity data set, the CAD-60 data set [22]. The data were captured using a Microsoft Kinect RGB-D sensor [13]. It contains RGB frames as well as depth frames for sequences of human activities performed.

The CAD-60 data set was collected at a frame rate of 15 fps, which is offered by the sensor, and comprises 12 activities performed by four people. However, for testing the proposed system, 6 of the 12 activities are selected:

(1) Brushing teeth
(2) Cooking - chopping
(3) Cooking - stirring
(4) Open pill container
(5) Talk on phone
(6) Write on board

These activities are selected based on the intended application of the proposed system: learning using an assistive robot capable of performing these activities. Therefore, only activities which require greater participation of the arms when performed are chosen for the tests. Selected frames for these activities are shown in Figure 6.

For posture feature extraction, the joint positions are extracted following the process described in Section 3.1. In computing the optimal number of clusters which best represents the activity feature vectors, different numbers of clusters ranging from 5 to 30 are tested. The best results are obtained using 25 and 30 clusters. To illustrate this, a plot of cluster centroids and joint positions through a sequence of the brushing teeth activity for one joint component (right hand x-coordinate) is presented in Figure 8. This joint position is selected because it is one of the most significant components in performing the activity.

After clustering, the activity feature vectors are associated with their corresponding activities through a training process using an ANN classification algorithm with 10 hidden neurons. For evaluation of our classifier, we adopt a cross-validation method of testing on data from a new person that was not used in training the classifier.


Figure 8: Result of different numbers of clusters in an activity for one of the key joints in performing the brushing teeth activity from the CAD-60 dataset.

Test data from a new person, consisting of 1416 observations of different activities, is introduced to the trained network, and the results achieve a recognition accuracy of 85.3%, as shown in the confusion matrix plot of Figure 7. The confusion matrix shows the numbers/percentages of true positives, true negatives, false positives and false negatives for observations from the new-person test data set. The axes represent the 6 activities chosen for this test.
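A brief sketch of this new-person evaluation (using scikit-learn's metrics purely for illustration; variable names are hypothetical) could look as follows:

```python
from sklearn.metrics import accuracy_score, confusion_matrix

def evaluate_on_new_person(clf, X_new_person, y_new_person):
    """clf: classifier trained on the other subjects; returns accuracy and confusion matrix."""
    y_pred = clf.predict(X_new_person)
    return accuracy_score(y_new_person, y_pred), confusion_matrix(y_new_person, y_pred)

# acc, cm = evaluate_on_new_person(clf, X_new_person, y_new_person)
```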

After training and validation of the classifier using the extracted activity feature vectors, each frame of the new-person test data is classified to the closest posture feature within the activity features vector. Each feature vector represents the centroids of closely related joint components for a pose, and therefore an input frame vector of joint positions can be associated with a corresponding feature vector for the activity. The system does not have to wait for a complete sequence of a new person's activity before classifying the activity performed. Considering that, the proposed system should be capable of fulfilling real-time performance requirements, which will be explored in subsequent research work.
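One way to realise this per-frame matching is a nearest-centroid lookup, sketched below; this only illustrates the association step (the full system still relies on the trained ANN classifier), and the data structures are assumptions.

```python
import numpy as np

def closest_activity(frame_vector, activity_features):
    """activity_features: dict mapping activity label -> (K, D) array of key-posture
    centroids. Returns the activity whose nearest centroid is closest to the frame."""
    best_label, best_dist = None, np.inf
    for label, centroids in activity_features.items():
        dist = np.min(np.linalg.norm(centroids - frame_vector, axis=1))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label
```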


5 CONCLUSION AND FUTURE WORK

The work in this paper presented a method of learning human activities by computing an activity features vector through clustering of raw joint positions extracted from a human skeleton representation obtained from an RGB-D sensor. The method can be applied to a human assistive robot which would be capable of performing these activities whenever the activity is detected. The robot could perform the activity via coordinate mapping of its joints to the human skeleton joints. However, this is beyond the scope of this paper.

The activity classification results show that some observations from the test data are classified wrongly. This could be the result of a feature vector representing a particular pose in one activity being similar to that of another activity. Therefore, part of the future work is to investigate the degree of membership with which a feature vector belongs to different activities, in order to distinctly classify the activity using more sophisticated computational intelligence methods.

Another aspect not considered in this work is the temporal information of performed human activities, which could be very useful in differentiating similar joint coordinate positions of differing activities. Also, in selecting the best number of clusters K for a specific activity, other approaches could be explored, for example the Bayesian Information Criterion (BIC) [17], as a way to score and select an optimal value of K. This will be considered in further research.

This method can be used in assisted living/elderly care to enable assistive robots to perform most ADLs effectively for elderly people or for people incapable of performing activities within a living environment. The results show a very promising outcome in the area of assistive robots' coexistence in human living environments.

REFERENCES

[1] Alexandros Andre Chaaraoui, Pau Climent-Perez, and Francisco Florez-Revuelta. 2013. Silhouette-based human action recognition using sequences of key poses. Pattern Recognition Letters 34, 15 (2013), 1799–1807.
[2] Alexandros Andre Chaaraoui, Jose Ramon Padilla-Lopez, Pau Climent-Perez, and Francisco Florez-Revuelta. 2014. Evolutionary joint selection to improve human action recognition with RGB-D devices. Expert Systems with Applications 41, 3 (2014), 786–794.
[3] Yong Du, Wei Wang, and Liang Wang. 2015. Hierarchical recurrent neural network for skeleton based action recognition. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. IEEE Computer Society, 1110–1118.
[4] Diego R. Faria, Cristiano Premebida, and Urbano Nunes. 2014. A Probabilistic Approach for Human Everyday Activities Recognition using Body Motion from RGB-D Images. In The 23rd IEEE International Symposium on Robot and Human Interactive Communication, RO-MAN. IEEE, 732–737.
[5] Ye Gu, Ha Do, Yongsheng Ou, and Weihua Sheng. 2012. Human gesture recognition through a kinect sensor. In IEEE International Conference on Robotics and Biomimetics (ROBIO). IEEE, 1379–1384.
[6] F. Han, B. Reily, W. Hoff, and H. Zhang. 2017. Space-Time Representation of People Based on 3D Skeletal Data: A Review. Computer Vision and Image Understanding (2017). DOI: http://dx.doi.org/10.1016/j.cviu.2017.01.011
[7] De-An Huang and Kris M Kitani. 2014. Action-reaction: Forecasting the dynamics of human interaction. In European Conference on Computer Vision. Springer, 489–504.
[8] Mohamed E Hussein, Marwan Torki, Mohammad Abdelaziz Gowayyed, and Motaz El-Saban. 2013. Human Action Recognition Using a Temporal Hierarchy of Covariance Descriptors on 3D Joint Locations. In Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence. AAAI Press, Beijing, China, 2466–2472.
[9] Jose Antonio Iglesias, Plamen Angelov, Agapito Ledezma, and Araceli Sanchis. 2010. Human activity recognition based on Evolving Fuzzy Systems. International Journal of Neural Systems 20, 5 (2010), 355–364.
[10] Ahmad Jalal and S. Kamal. 2014. Real-time life logging via a depth silhouette-based human activity recognition system for smart home services. In 11th IEEE International Conference on Advanced Video and Signal-Based Surveillance, AVSS 2014. IEEE, 74–80.
[11] Hema Swetha Koppula, Rudhir Gupta, and Ashutosh Saxena. 2013. Learning human activities and object affordances from RGB-D videos. The International Journal of Robotics Research 32, 8 (2013), 951–970.
[12] Jie Lu, Vahid Behbood, Peng Hao, Hua Zuo, Shan Xue, and Guangquan Zhang. 2015. Transfer learning using computational intelligence: A survey. Knowledge-Based Systems 80, C (2015), 14–23.
[13] Microsoft. 2017. Developing with Kinect for Windows. https://developer.microsoft.com/en-us/windows/kinect/develop. (2017). Accessed: 2017-02-28.
[14] Orasa Patsadu, Chakarida Nukoolkit, and Bunthit Watanapa. 2012. Human gesture recognition using Kinect camera. In Computer Science and Software Engineering (JCSSE), 2012 International Joint Conference on. IEEE, 28–32.
[15] David Reby, Sovan Lek, Ioannis Dimopoulos, Jean Joachim, Jacques Lauga, and Stephane Aulagnier. 1997. Artificial neural networks as a classification method in the behavioural sciences. Behavioural Processes 40, 1 (1997), 35–43.
[16] Miguel Reyes, Gabriel Dominguez, and Sergio Escalera. 2011. Feature weighting in dynamic time warping for gesture recognition in depth data. In Computer Vision Workshops (ICCV Workshops), 2011 IEEE International Conference on. IEEE, 1182–1188.
[17] Gideon Schwarz. 1978. Estimating the Dimension of a Model. The Annals of Statistics 6, 2 (1978), 461–464.
[18] T Subetha and S Chitrakala. 2016. A Survey on human activity recognition from videos. In 2016 International Conference on Information Communication and Embedded Systems (ICICES). IEEE, 1–7.
[19] Jaeyong Sung, Colin Ponce, Bart Selman, and Ashutosh Saxena. 2011. Human Activity Detection from RGBD Images. In Proceedings of the 16th AAAI Conference on Plan, Activity, and Intent Recognition (AAAIWS'11-16). AAAI Press, 47–55.
[20] Jaeyong Sung, Colin Ponce, Bart Selman, and Ashutosh Saxena. 2012. Unstructured human activity detection from RGBD images. In 2012 IEEE International Conference on Robotics and Automation. IEEE, 842–849.
[21] Sabanadesan Umakanthan, Simon Denman, Clinton Fookes, and Sridha Sridharan. 2014. Multiple instance dictionary learning for activity representation. In 2014 22nd International Conference on Pattern Recognition (ICPR). IEEE, 1377–1382.
[22] Cornell University. 2009. Cornell activity datasets CAD-60. http://pr.cs.cornell.edu/humanactivities/data.php. (2009). Accessed: 2017-02-28.
[23] Raviteja Vemulapalli, Felipe Arrate, and Rama Chellappa. 2014. Human Action Recognition by Representing 3D Skeletons As Points in a Lie Group. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR'14). IEEE Computer Society, Washington, DC, USA, 588–595.
[24] Ping Wei, Nanning Zheng, Yibiao Zhao, and Song-Chun Zhu. 2013. Concurrent action detection with structural prediction. In Proceedings of the IEEE International Conference on Computer Vision. IEEE, 3136–3143.
[25] Huiquan Zhang and Osamu Yoshie. 2012. Improving human activity recognition using subspace clustering. In 2012 International Conference on Machine Learning and Cybernetics (ICMLC), Vol. 3. IEEE, 1058–1063.
