
A Report On

GymNeus

Prepared in partial fulfilment of

TIC PROJECT

Prepared By

Yashvardhan Srivastava 2010B5A3540P
Aayush Jain 2010B1A3371P
Shriniwas Sharma 2010ABPS460P
Danish Pruthi 2011A7PS037P

Submitted to

Dr. Anu Gupta

BIRLA INSTITUTE OF TECHNOLOGY AND SCIENCE, PILANI

ACKNOWLEDGEMENT

Every work requires support and contribution from many sources and people for its successful completion and for achieving the desired outcome. I would like to thank Dr. Anu Gupta for providing me the opportunity to work on this project as a TIC Project for Knightvale Consultancy GmbH.

I would especially like to thank Mr. Puneet Teja, my mentor for this project, who guided me at every turn. His every suggestion was highly valued and helped me optimize the application.

Moreover, I would also like to thank the institution for providing round-the-clock access to internet facilities, which helped me get relevant information. In addition, I would like to pay heartfelt regards to those people who helped us directly and indirectly.

ABSTRACT

GYMNEUS

GymNeus is a service that allows users to track and analyze their workouts, including workouts in the gym, on their smartphone. The data can be sent to the user's smartphone if it is nearby; it can be saved on the device and recovered later, or it can be sent directly to the servers, where the user can get a complete analysis of his workout that can be tracked or even maintained as a training log for a trainer.

The accelerometer and gyroscope sensors send the data points, and the machine learning algorithms identify the workout being done and store it accordingly.

TABLE OF CONTENTS

Acknowledgement
Abstract
1. Introduction
2. Workflow
3. Results
4. Literature Survey
5. Decision Trees: An Introduction
6. Rating Decision Trees
7. Ranking Features
8. Automated Methodologies of Creating Decision Trees
9. Combination/Ensemble Techniques
10. Appendix
    10.1 ARFF File Documentation
    10.2 Java Code

INTRODUCTION

GYMNEUS

GymNeus is used to track workouts on strength training machines. Users can keep a detailed record of their workouts on all strength training machines as well as general workout sessions. In addition, the workouts can be shared with friends and peers on social networks, as well as with personal trainers.

GymNeus connects over Bluetooth with your mobile device, or with your computer using a micro USB cable.

GymNeus has accelerometer- and gyroscope-based sensors which take accurate data and transfer it to the smartphone, which then processes the data in such a manner that the type of workout is quickly identified.

This processing to identify/classify the workout type can also be done on the device electronics, and there could also be a provision for directly sending the data in JSON format over a 3G/GPRS connection to the servers.

ANDROID

Android is a software stack for mobile devices that includes an operating system, middleware and key applications. Android is a software platform and operating system for mobile devices based on the Linux kernel, developed by Google and the Open Handset Alliance. It allows developers to write managed code in a Java-like language that utilizes Google-developed Java libraries. The unveiling of the Android platform on 5 November 2007 was announced with the founding of the Open Handset Alliance, a consortium of 34 hardware, software and telecom companies devoted to advancing open standards for mobile devices. When released in 2008, most of the Android platform was made available under the Apache free-software and open-source license. Since then the market for smartphones has grown exponentially and different versions of the Android platform have been launched, with Android 4.4 KitKat being the latest. These versions provide enhanced support for various features and different types of sensors, which provide a very rich experience to the user and an opportunity for developers to experiment and create amazing Android apps.

SOFTWARE/RESOURCES USED

Eclipse IDE

Eclipse is an integrated development environment (IDE). It contains a base workspace and an extensible plug-in system for customizing the environment. Written mostly in Java, Eclipse can be used to develop applications. By means of various plug-ins, Eclipse may also be used to develop applications in other programming languages.

Java 7

Java is a computer programming language that is concurrent, class-based, object-oriented, and specifically designed to have as few implementation dependencies as possible. The seventh iteration of this language is used for programming and for generating various files which are later used in the project.

WEKA

Weka (Waikato Environment for Knowledge Analysis) is a popular suite of machine learning software written in Java, developed at the University of Waikato, New Zealand. Weka is free software available under the GNU General Public License.

The Weka (pronounced Weh-Kuh) workbench [1] contains a collection of visualization tools and algorithms for data analysis and predictive modelling, together with graphical user interfaces for easy access to this functionality.

Weka supports several standard data mining tasks, more specifically: data preprocessing, clustering, classification, regression, visualization, and feature selection. All of Weka's techniques are predicated on the assumption that the data is available as a single flat file or relation, where each data point is described by a fixed number of attributes (normally numeric or nominal attributes, but some other attribute types are also supported). ARFF files were developed by the Machine Learning Project at the Department of Computer Science of The University of Waikato for use with the Weka machine learning software.

An ARFF (Attribute-Relation File Format) file is an ASCII text file that describes a list of instances sharing a set of attributes. Further details about the ARFF file which is generated for use in Weka have been given in the Appendix.

WORKFLOW

As discussed earlier, the basic (or rather the initial) requirement is to identify/classify which type of workout is being done, looking only at the basic accelerometer and gyroscope readings.

The objective, being a simple classification problem, required a machine learning algorithm. It also required a sufficient amount of data so that the features calculated are enough for the machine to both train itself and learn to identify which type of workout is being performed. To judge the correctness of the algorithm, Weka has been used.

The basic procedure was a two-step process:

1. Obtain enough data points for 3 predefined workout types (pulls, curls, stretches).
2. Write Java code to generate the ARFF file which could be used by Weka.

The data points were collected using the accelerometer and gyroscope sensors of a smartphone. The data points for a particular repetition were stored separately in a file, in .csv format.

2014-02-12 10:17:13 +0000   0.034574  -0.68078  -0.81232   0.013885  -0.66577  -0.78806
2014-02-12 10:17:13 +0000   0.011716  -0.68789  -0.79446  -0.00897   -0.67288  -0.77019
2014-02-12 10:17:13 +0000   0.011716  -0.68789  -0.79446  -0.00897   -0.67288  -0.77019
2014-02-12 10:17:13 +0000  -0.02315   -0.69756  -0.76527  -0.04384   -0.68256  -0.741
2014-02-12 10:17:13 +0000  -0.02022   -0.6869   -0.77772  -0.04091   -0.67189  -0.75345
2014-02-12 10:17:13 +0000  -0.02022   -0.6869   -0.77772  -0.04091   -0.67189  -0.75345
2014-02-12 10:17:13 +0000   0.006238  -0.72957  -0.86001  -0.01445   -0.71457  -0.83574
2014-02-12 10:17:13 +0000  -0.04747   -0.73574  -0.83021  -0.06816   -0.72073  -0.80594
2014-02-12 10:17:13 +0000  -0.0322    -0.85549  -1.13512  -0.05289   -0.84048  -1.11086

    This file was cleaned to reveal only the data required to generate the arff file.

    121,-57,-16,-17,204,108

    11,-63,-50,-17,218,109

    87,-65,-51,-15,224,111

    57,-66,-45,-19,222,110

    980,-1086,-1629,34,219,102

    259,-2213,-2801,90,296,29

The values generated correspond to the accelerometer readings of the 3 axes, followed by the gyroscope readings of the corresponding axes.

A total of 117 such files (44 pulls, 46 stretches, 27 curls), each corresponding to a different repetition, was created.

This was followed by the calculation of various features that might be useful in determining the workout type. The features decided upon were: mean, variance, min-max difference, zero crossing rate, correlation of the corresponding pairs of axes, linear regression, and root mean square.

These features were calculated using a Java program and arranged according to the format described for ARFF files. The format has been described in the appendix.
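For concreteness, the per-axis feature computations listed above can be sketched as plain Java. This is a minimal standalone version of the kind of routines in the appendix's StdStats.java; the class and method names here are illustrative, not the project's actual code.

```java
public class FeatureSketch {
    // Min-max difference: largest minus smallest reading on an axis
    static double minMax(double[] a) {
        double min = a[0], max = a[0];
        for (double v : a) { if (v < min) min = v; if (v > max) max = v; }
        return max - min;
    }

    // Arithmetic mean of the readings
    static double mean(double[] a) {
        double s = 0;
        for (double v : a) s += v;
        return s / a.length;
    }

    // Population variance around the mean
    static double variance(double[] a) {
        double m = mean(a), s = 0;
        for (double v : a) s += (v - m) * (v - m);
        return s / a.length;
    }

    // Root mean square of the readings
    static double rms(double[] a) {
        double s = 0;
        for (double v : a) s += v * v;
        return Math.sqrt(s / a.length);
    }

    // Zero crossing rate: number of sign changes between consecutive readings
    static int zeroCrossings(double[] a) {
        int z = 0;
        for (int i = 1; i < a.length; i++)
            if (a[i - 1] * a[i] < 0) z++;
        return z;
    }
}
```

Each of these runs in a single pass over one axis of one repetition file, which keeps feature extraction cheap.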

    The final arff file that was generated can be visualized as:

@relation actvity

@attribute minmax_acc_x
...
@attribute minmax_vel_gyr_avg
@attribute rms_acc_x
...
@attribute rms_vel_gyr_avg
@attribute var_acc_x
...
@attribute var_vel_gyr_avg
@attribute mean_acc_x
...
@attribute mean_vel_gyr_avg
@attribute zcr_acc_x
...
@attribute zcr_vel_gyr_avg
@attribute corr_acc_xy
...
@attribute corr_vel_gyr_xz
@attribute type {curls, pulls, stretches}

@data


    2737.0,4701.0,7371.0,1521.6666666666667,.....600.0,448.0,456.0,188.66666666666666,stretches

    4946.0,8544.0,11733.0,5626.333333333333,......1588.0,4739.0,2950.0,2182.333333333333,stretches

    670.8845504257793,1151.2510933762453,......1480.7058587038819,358.81221334347646,stretches

    136.6925016231688,212.70552414077073,......146.34397835237363,101.68867521345072,stretches

    1616.3602321264898,4134.453303642454,......4825.417751863562,2473.2497964329355,curls

    878.2117284573236,2834.0528929432494,......1882.6672462227625,1295.8325714895939,curls

    468474.6266666669,1379896.8733333335,......2279139.583333333,133777.1177777778,curls

    17224.573333333334,6406.71,6723.309999999999,......2102.3770370370366,1881325.7433333334,curls

    9396463.31,1.8557338793333333E7,4249710.597037037,......279233.5733336,1919486.4166666667,curls

    931439.79,406143.2314814815,18.72,-26.04,67.2,......-17.8933333324,-46.36,197.72,122.32,pulls

    91.22666666666667,898.08,-2841.32,-2338.72,.......-1427.3199997,-709.36,2487.8,1627.96,pulls

    1135.4666666666667,6.0,6.0,6.0,10.0,4.0,2.0,2.0,0.0,......2.0,2.0,1.0,2.0,2.0,0.0,0.0,0.0,pulls

    -0.46675630191156164,0.9569454103983737,.....-0.40540496037786083,0.496238755629301,pulls

    -0.7020484458471432,-0.5874667698799683,......-0.8490728313603951,0.9753172476431332,pulls

    -0.8636550536040989,-0.7832024225046871,......0.9696033412535484,-0.9029983281113617,pulls

This generated file was used in Weka to produce results. A series of tests was performed, and the results are as follows:

Phase 1 (80% of the data used as the training set, the remaining 20% for testing):

Correctly Classified Instances      76        100 %
Incorrectly Classified Instances    0         0 %
Kappa statistic                     1
Mean absolute error                 0.0272
Root mean squared error             0.0763
Relative absolute error             6.1754 %
Root relative squared error         16.2985 %
Total Number of Instances           76

Phase 2:

Correctly Classified Instances      111       95.6897 %
Incorrectly Classified Instances    5         4.3103 %
Kappa statistic                     0.9337
Mean absolute error                 0.0331
Root mean squared error             0.1671
Relative absolute error             7.6344 %
Root relative squared error         35.8956 %
Total Number of Instances           116

Phase 3:

Correctly Classified Instances      34        97.1429 %
Incorrectly Classified Instances    1         2.8571 %
Kappa statistic                     0.9544
Mean absolute error                 0.019
Root mean squared error             0.138
Relative absolute error             4.3841 %
Root relative squared error         29.5843 %
Total Number of Instances           35

    === Detailed Accuracy By Class ===

TP Rate   FP Rate   Precision   Recall   F-Measure   Class
0.857     0         1           0.857    0.923       curl
1         0.053     0.941       1        0.97        pull
1         0         1           1        1           stretch

    === Confusion Matrix ===

    a b c



    LITERATURE SURVEY

The Classification Problem

Classification, which is the task of assigning objects to one of several predefined categories, is a pervasive problem that encompasses many diverse applications. Examples include detecting spam email messages based upon the message header and content, categorizing cells as malignant or benign based upon the results of MRI scans, and classifying galaxies based upon their shapes.

The input data for a classification task is a collection of records. Each record, also known as an instance or example, is characterized by a tuple (x, y), where x is the attribute set and y is a special attribute, designated as the class label (also known as the category or target attribute).

    Human Activity Recognition through feature extraction


This is followed by feature extraction. The feature extraction step is possibly the most important part of the activity recognition problem, since classification can be handled by any existing machine learning algorithm if the features are robust. In general, frequency-domain features have been found to perform best [6]. However, extracting these often requires too much computation to be feasible in realtime systems [7]. The feature extraction scheme that we devised is computationally efficient but less tolerant of person-to-person variations. We combined modified versions of techniques previously used in this domain with quantitative description methods used in electroencephalography (EEG) signal analysis. Our intended use case is activity recognition on cell phones. Important characteristics of that scenario are minimal processing capability, only one 3D accelerometer, a device carried in a mostly static orientation in the user's pocket or purse, and a system that can be trained and used by the same person, namely the owner of the phone. Performance on the standard dataset and the prototype cell phone application proves that our method is applicable to the targeted use case.

As a whole this work makes the following contributions:

1. A novel linear-time feature extraction scheme that uses various disparate methods to identify human activities is presented.
2. Accuracy of the proposed method is shown using various classification methods on a standard accelerometer-based dataset and realtime data on a cell phone.
3. A prototype application demonstrates that activities can be detected on modern cellphones in realtime without help from any external sensing or computing device.


The time- and frequency-domain features we consider are listed in Table I. There are two reasons why we consider all these features. Firstly, feature extraction costs computational as well as communication resources. There is a relationship between the cost, the robustness and the expressive power of the features; therefore, we closely examine the nature of these relationships. For example, all the time-domain features avoid the complexity of preprocessing, i.e., they do not require the laborious tasks of framing, windowing, filtering, Fourier transformation, liftering, and so on. Consequently, they not only consume little processing power, but the algorithms can also be directly deployed in resource-constrained nodes. However, they are not robust to measurement and calibration errors. The second reason is our desire to support rapid prototyping by providing application developers the knowledge and experience concerning the type of features they can consider if they choose to employ accelerometer sensors. The features we analyze are listed in Table I.

Decision Trees: An Introduction

Decision trees are important data structures, where each node poses a binary question on which a decision is taken, and the entire data splits according to the answer to the question posed at each node.

Decision trees are essential to classification problems in particular. A decision tree is a flowchart-like structure in which each internal node represents a test on an attribute, each branch represents an outcome of the test, and each leaf node represents a class label (the decision taken after evaluating all attributes). A path from root to leaf represents a classification rule.

Taking an example: suppose we need to identify an animal bite, so we have to somehow identify the animal behind it. Quite naturally we start asking questions, like: is the area swollen after the bite? Not all animal bites result in swollen areas, so the answer to this question eliminates a few animals. We repeat and ask more questions until we are certain of the animal that had bitten. This is the essential idea of decision trees!

A sample decision tree from our problem, which decides on the nature of the exercise (pulls, stretches or curls), has been shown below. The attributes/questions it considers are the maxmin_acceleration and maxmin_velo_gyroscope values.
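Since such a tree is just nested comparisons, it compiles down to a short if-else chain. A minimal sketch, with the two thresholds taken from the sample tree hard-coded in the appendix (1686 for maxmin_acc_z, 133 for maxmin_gyc_avg); the class and method names are illustrative:

```java
public class WorkoutTreeSketch {
    // Walks the two-question sample tree: first the accelerometer
    // min-max test, then (on the "yes" branch) the gyroscope test.
    static String classify(double maxminAccZ, double maxminGyrAvg) {
        if (maxminAccZ > 1686) {
            if (maxminGyrAvg > 133) return "pulls";
            return "curls";
        }
        return "stretches";
    }
}
```

A classifier in this form is trivial to port to the device electronics, since it needs no libraries at all.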

    Rating Decision Trees


After forming various decision trees, a very natural question to ask is: which of them are actually good?

Some traits of good decision trees are:

1. Accuracy: the decision tree should perform well when subjected to test cases other than the training set.
2. Low depth: smaller decision trees are easier and faster to compute; this parameter matters on resource-constrained devices like the Raspberry Pi, which was used for our task.
3. Independent nodes: the more independent the nodes are, the better, since an error in one of them would not affect the other nodes.

The better a decision tree does on the above three parameters, the better the rating it gets. We gave a mathematical measure to each of these three traits, and rated different decision trees based upon these mathematical formulations. Knowing the weights (ratings) of different trees is essential in the last step, when we club them together to generate the final outcome!
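The report does not state the exact formulations, but a combined rating along these lines could be sketched as follows. The 0.6/0.2/0.2 weights below are purely illustrative assumptions, not the project's actual formulas:

```java
public class TreeRatingSketch {
    // Hypothetical rating: reward held-out accuracy, penalise depth,
    // reward node independence. The weights are made-up examples;
    // the project's actual mathematical measures are not given here.
    static double rate(double accuracy, int depth, double independence) {
        return 0.6 * accuracy + 0.2 * (1.0 / depth) + 0.2 * independence;
    }
}
```

Whatever the exact weights, the shape is the same: a scalar score per tree that can later serve as its voting weight.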

Ranking Different Attributes

Among the different features available, we need to rank features by their importance. Some features are more informative than others; an important feature can help us decide better and quicker. Solutions to the attribute ranking problem therefore have numerous applications, not just in classification but in various other learning algorithms. The difficulty lies in measuring 'importance' statistically, i.e., deciding what is worth keeping and what is not.

There are various statistical parameters that govern how informative a particular trait is. One such parameter is the information gain value (InfoGainVal), which splits the entire dataset on a particular feature and measures how clean the resulting partitions are. Suppose an attribute can clearly distinguish between the three classes of exercises; then that feature carries a lot of information and hence will have a higher InfoGainVal than others.

There are other parameters, like the entropy value, which is the amount of entropy generated by a split using a particular feature. For an overlapping (bad) split, the entropy value (the measure of randomness among the different splits) would be larger. In a way, InfoGainVal is inversely related to the entropy value.

The parameter we used for our project was based upon InfoGainVal.
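A minimal sketch of how such an information gain value is computed: the gain of a split is the entropy of the whole label set minus the size-weighted entropy of the partitions. This is standalone illustrative Java; in the project itself the ranking was done by Weka's InfoGainAttributeEval, not hand-rolled code.

```java
import java.util.*;

public class InfoGainSketch {
    // Shannon entropy (in bits) of a collection of class labels
    static double entropy(List<String> labels) {
        Map<String, Integer> counts = new HashMap<>();
        for (String l : labels) counts.merge(l, 1, Integer::sum);
        double h = 0.0, n = labels.size();
        for (int c : counts.values()) {
            double p = c / n;
            h -= p * (Math.log(p) / Math.log(2));
        }
        return h;
    }

    // Information gain: parent entropy minus the size-weighted
    // entropy of the partitions produced by splitting on a feature
    static double infoGain(List<String> labels, List<List<String>> parts) {
        double gain = entropy(labels);
        for (List<String> part : parts)
            gain -= (part.size() / (double) labels.size()) * entropy(part);
        return gain;
    }
}
```

A feature that separates the exercise classes into pure partitions drives the weighted child entropy to zero, so its gain equals the full parent entropy; an overlapping split leaves the entropy high and the gain low, matching the inverse relationship described above.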

Automated Procedures/Flowcharts for Selecting Good Decision Trees

Two methodologies, which can be automated, are presented below.

Generating Weighted Trees

Some details:

1. The attribute ranking model to be used in Weka is 'InfoGainAttributeEval'.
2. I would recommend k to be somewhere between 4 and 10.
3. The generation of the decision tree is done using the 'J48' model in Weka.
4. While decrementing the weights assigned to subsequent trees, various decaying functions can be experimented upon.
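As an example of point 4, one simple decaying scheme is geometric decay, where each subsequent tree's weight is a constant fraction of the previous one. The sketch below is illustrative; the decay factor (e.g. 0.5) is an assumed choice, and any decreasing function could be substituted:

```java
public class DecaySketch {
    // Tree i (0-indexed, ordered best-first) gets weight alpha^i.
    // alpha in (0, 1) controls how quickly later trees lose influence.
    static double[] decayingWeights(int numTrees, double alpha) {
        double[] w = new double[numTrees];
        for (int i = 0; i < numTrees; i++) w[i] = Math.pow(alpha, i);
        return w;
    }
}
```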


Generating Equal-Weight Trees

Some details:

1. This model generates equal-weight trees: for the first tree we pick, say, the 1st, 6th, 11th, 17th and 23rd features (in this example, m = k = 5). The key idea is to exploit the best features in each and every tree.
2. I would recommend m and k to be somewhere between 5 and 9, with m ~= k.
3. The generation of the decision tree is done using the 'J48' model in Weka.
4. The attribute ranking model to be used in Weka is 'InfoGainAttributeEval'.


Combination/Ensemble Techniques

Suppose we want to combine 5 trees. Each decision tree is basically a set of if-else statements, so we will get five answers for each training data file we input. Each decision initially has weight 1. In this way we get a confidence vote for each group of trees. For example, if 4 trees give the answer "curl" and one tree gives the answer "pull", then by majority the confidence vote would be 4/5 = 0.8.

We can average out the confidence vote over the training data files to get a final mean confidence vote for the combination of trees. We also rechecked the confidence vote on different sets of training data files.

We also tried out weighted polling, where different decision trees have different priorities, and the vote of a high-priority tree counts for more than those of the lower-priority trees.

With these exercises we finalized a decent block (of if-else statements) that is able to accurately classify the exercise.
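The plain and weighted polling described above can be sketched as follows (standalone illustrative Java; with all weights equal to 1 this reduces to the simple majority vote in the example):

```java
import java.util.*;

public class VotingSketch {
    // Weighted poll: each tree's answer counts with its weight;
    // returns the label with the largest total weight.
    static String vote(String[] answers, double[] weights) {
        Map<String, Double> totals = new HashMap<>();
        for (int i = 0; i < answers.length; i++)
            totals.merge(answers[i], weights[i], Double::sum);
        String best = null;
        for (Map.Entry<String, Double> e : totals.entrySet())
            if (best == null || e.getValue() > totals.get(best))
                best = e.getKey();
        return best;
    }

    // Confidence of the decision: the winner's share of the total
    // weight, e.g. 4 of 5 equal-weight trees agreeing gives 4/5 = 0.8
    static double confidence(String[] answers, double[] weights) {
        String winner = vote(answers, weights);
        double won = 0, total = 0;
        for (int i = 0; i < answers.length; i++) {
            total += weights[i];
            if (answers[i].equals(winner)) won += weights[i];
        }
        return won / total;
    }
}
```

Averaging confidence(...) over the training data files gives the mean confidence vote used to compare different combinations of trees.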


REFERENCES

Waltenegus Dargie, "Analysis of Time and Frequency Domain Features of Accelerometer Measurements".

Nishkam Ravi, Nikhil Dandekar, Preetham Mysore and Michael L. Littman, "Activity Recognition from Accelerometer Data".

Mridul Khan, Sheikh Iqbal Ahamed, Miftahur Rahman and Roger O. Smith, "A Feature Extraction Method for Realtime Human Activity Recognition on Cell Phones".

Java documentation.

Weka documentation.


    APPENDIX

    ARFF FILE

    Overview

ARFF files have two distinct sections. The first section is the Header information, which is followed by the Data information.

The Header of the ARFF file contains the name of the relation, a list of the attributes (the columns in the data), and their types. An example header for the standard IRIS dataset looks like this:

    @RELATION iris

    @ATTRIBUTE sepallength NUMERIC

    @ATTRIBUTE sepalwidth NUMERIC

    @ATTRIBUTE petallength NUMERIC

    @ATTRIBUTE petalwidth NUMERIC

    @ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}

    The Data of the ARFF file looks like the following:

    @DATA

    5.1,3.5,1.4,0.2,Iris-setosa

    4.9,3.0,1.4,0.2,Iris-setosa

    4.7,3.2,1.3,0.2,Iris-setosa

    4.6,3.1,1.5,0.2,Iris-setosa

    5.0,3.6,1.4,0.2,Iris-setosa

    5.4,3.9,1.7,0.4,Iris-setosa

    4.6,3.4,1.4,0.3,Iris-setosa

    5.0,3.4,1.5,0.2,Iris-setosa

    4.4,2.9,1.4,0.2,Iris-setosa

    4.9,3.1,1.5,0.1,Iris-setosa

    Lines that begin with a % are comments.

    The @RELATION, @ATTRIBUTE and @DATA declarations are case insensitive.

The ARFF Header Section

The ARFF Header section of the file contains the relation declaration and attribute declarations.

The @relation Declaration

The relation name is defined as the first line in the ARFF file. The format is:

@relation <relation-name>

where <relation-name> is a string. The string must be quoted if the name includes spaces.

    The @attribute Declarations

    Attribute declarations take the form of an ordered sequence of @attribute statements. Each

    attribute in the data set has its own @attribute statement which uniquely defines the name

    of that attribute and its data type. The order the attributes are declared indicates the

    column position in the data section of the file. For example, if an attribute is the third one

    declared then Weka expects that all that attributes values will be found in the third comma

    delimited column.

The format for the @attribute statement is:

@attribute <attribute-name> <datatype>

where the <attribute-name> must start with an alphabetic character. If spaces are to be included in the name then the entire name must be quoted.

The <datatype> can be any of the four types supported by Weka:

numeric (integer and real are treated as numeric)
<nominal-specification>
string
date [<date-format>]

where <nominal-specification> and <date-format> are defined below. The keywords numeric, real, integer, string and date are case insensitive.

Numeric attributes

Numeric attributes can be real or integer numbers.

Nominal attributes

Nominal values are defined by providing a <nominal-specification> listing the possible values: {<nominal-name1>, <nominal-name2>, <nominal-name3>, ...}

The ARFF Data Section

The ARFF Data section of the file contains the data declaration line and the actual instance lines.

The @data Declaration

The @data declaration is a single line denoting the start of the data segment in the file. The format is:

@data


The Instance Data

Each instance is represented on a single line, with carriage returns denoting the end of the instance.

    Attribute values for each instance are delimited by commas. A comma may be followed by

    zero or more spaces. Attribute values must appear in the order in which they were declared

    in the header section (i.e., the data corresponding to the nth @attribute declaration is always

    the nth field of the attribute).

    A missing value is represented by a single question mark, as in:

    4.4,?,1.5,?,Iris-setosa

    Values of string and nominal attributes are case sensitive, and any that contain space must

    be quoted, as follows:

    @relation LCCvsLCSH

    @attribute LCC string

    @attribute LCSH string

    @data

    AG5, 'Encyclopedias and dictionaries.;Twentieth century.'

    AS262, 'Science -- Soviet Union -- History.'

    AE5, 'Encyclopedias and dictionaries.'

    AS281, 'Astronomy, Assyro-Babylonian.;Moon -- Phases.'

    AS281, 'Astronomy, Assyro-Babylonian.;Moon -- Tables.'

    Dates must be specified in the data section using the string representation specified in the

    attribute declaration. For example:

    @RELATION Timestamps

    @ATTRIBUTE timestamp DATE "yyyy-MM-dd HH:mm:ss"

    @DATA

    "2001-04-03 12:12:12"

    "2001-05-03 12:59:55"

    JAVA CODE FOR GENERATING THE REQUIRED ARFF FILE

    Accelerometer.java

    import java.io.BufferedReader;

    ...

    ...

    import java.util.Scanner;


    public class Accelerometer {

    static FileWriter f=null;

    public static void main(String[] args) {

    ArrayList acc_x = new ArrayList();

    ArrayList acc_y = new ArrayList();

    ...

    ...

    ArrayList vel_gyr_avg = new ArrayList();

    BufferedReader br = null;

    //Scanner sc = null; FileWriter f=null;

    try {

    f = new FileWriter(new File("outfile.arff"));

    } catch (IOException e1) {

    // TODO Auto-generated catch block

    e1.printStackTrace();

    }

    try {

    f.write("@relation actvity");

    f.write("\n");f.write("\n");

    //f.write("@attribute activity string");f.write("\n");

    f.write("@attribute minmax_acc_x");f.write("\n");

    f.write("@attribute minmax_acc_y");f.write("\n");

    ...

    ...

    f.write("@attribute mean_vel_gyr_z");f.write("\n");

    f.write("@attribute mean_vel_gyr_avg");f.write("\n");

    ...

    ...

f.write("@attribute type {curls, pulls, stretches}");f.write("\n");f.write("\n");

    f.write("@data");f.write("\n");f.write("\n");

    } catch (IOException e1) {

    // TODO Auto-generated catch block

    e1.printStackTrace();

    }

for(int iter=1;iter<=117;iter++){ // one input file per repetition (117 in total)
try {
String s;
br = new BufferedReader(new FileReader(getFileName(iter)));

    int i=0;

    while ((s = br.readLine()) != null) {

    String[] arr = s.split(",", 100);

    acc_x.add(i, arr[0]);

    acc_y.add(i, arr[1]);

    acc_z.add(i, arr[2]);

double avg_acc=(Double.parseDouble(arr[0])+Double.parseDouble(arr[1])+Double.parseDouble(arr[2]))/3; // average of the three accelerometer axes

    acc_avg.add(i, ""+avg_acc);

    gyr_x.add(i, arr[3]);

    gyr_y.add(i, arr[4]);

    gyr_z.add(i, arr[5]);

double avg_gyr=(Double.parseDouble(arr[3])+Double.parseDouble(arr[4])+Double.parseDouble(arr[5]))/3; // average of the three gyroscope axes

    gyr_avg.add(i, ""+avg_gyr);

    cum_acc_x+=Double.parseDouble(arr[0]);

    cum_acc_y+=Double.parseDouble(arr[1]);

    cum_acc_z+=Double.parseDouble(arr[2]);

    vel_acc_x.add(i, ""+cum_acc_x);

    vel_acc_y.add(i, ""+cum_acc_y);

    vel_acc_z.add(i, ""+cum_acc_z);

double avg_vel_acc=(cum_acc_x+cum_acc_y+cum_acc_z)/3;

    vel_acc_avg.add(i, ""+avg_vel_acc);

    cum_gyr_x+=Double.parseDouble(arr[3]);

    cum_gyr_y+=Double.parseDouble(arr[4]);

    cum_gyr_z+=Double.parseDouble(arr[5]);

    vel_gyr_x.add(i, ""+cum_gyr_x);

    vel_gyr_y.add(i, ""+cum_gyr_y);

    vel_gyr_z.add(i, ""+cum_gyr_z);

double avg_vel_gyr=(cum_gyr_x+cum_gyr_y+cum_gyr_z)/3;

    vel_gyr_avg.add(i, ""+avg_vel_gyr);

    i++;

    //System.out.println(arr[0]);

    }

    } catch (IOException e) {

    e.printStackTrace();


    } finally {

    try {

    if (br != null)br.close();

    } catch (IOException ex) {

    ex.printStackTrace();

    }

    }

    double minmax_acc_x =StdStats.minmax( acc_x);

    double minmax_acc_y =StdStats.minmax(acc_y );

    ...

    ... // calculating values

    ...

double corr_vel_gyr_yz = StdStats.correlation(vel_gyr_y, vel_gyr_z);
double corr_vel_gyr_xz = StdStats.correlation(vel_gyr_x, vel_gyr_z);

    try {

    f.write(""+ minmax_acc_x +",");

    f.write(""+ minmax_acc_y +",");

    ...

    ...

    f.write(""+ mean_vel_gyr_z +",");

    f.write(""+ mean_vel_gyr_avg +",");

// class label: ranges follow the file counts (44 pulls, 46 stretches, 27 curls)
if(iter>0&&iter<=44) f.write("pulls");
else if(iter>44&&iter<=90) f.write("stretches");
else if(iter>90&&iter<=117) f.write("curls");
f.write("\n");
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}

    }

    public static String getFileName(int n){

    ...

    ...

    return name;

    }

    }

    StdStats.java

public static double minmax(ArrayList a){
double arr[]= new double[a.size()];
for (int i = 0; i < a.size(); i++) arr[i] = Double.parseDouble((String) a.get(i));
double min = arr[0], max = arr[0];
for (int i = 1; i < arr.length; i++) { if (arr[i] < min) min = arr[i]; if (arr[i] > max) max = arr[i]; }
double mm=max-min;
return mm;
}

public static double root_mean_square(ArrayList a){
double arr[]= new double[a.size()];
for (int i = 0; i < a.size(); i++) arr[i] = Double.parseDouble((String) a.get(i));
double sum = 0.0;
for (int i = 0; i < arr.length; i++) sum += arr[i]*arr[i];
return Math.sqrt(sum/arr.length);
}

/**
* Returns the average value in the array a[], NaN if no such value.
*/
public static double mean(ArrayList a) {
double sum = 0.0;
for (int i = 0; i < a.size(); i++) sum += Double.parseDouble((String) a.get(i));
return sum/a.size();
}

public static double correlation(ArrayList a, ArrayList b) {
int n = a.size();
double sumx=0, sumy=0, sumxy=0, sumx2=0, sumy2=0;
for (int i = 0; i < n; i++) {
double x = Double.parseDouble((String) a.get(i));
double y = Double.parseDouble((String) b.get(i));
sumx += x; sumy += y; sumxy += x*y; sumx2 += x*x; sumy2 += y*y;
}
double corr_num= n*sumxy-sumx*sumy;
double corr_den=Math.sqrt(n*sumx2-sumx*sumx)*Math.sqrt(n*sumy2-sumy*sumy);
double corr=corr_num/corr_den;
return corr;
}

public static double var(ArrayList a) {
double avg = mean(a), sum = 0.0;
for (int i = 0; i < a.size(); i++) { double d = Double.parseDouble((String) a.get(i)) - avg; sum += d*d; }
return sum/a.size();
}

    System.out.println("out what type of workout it is.");

    }

    public static void play(BTNode current)

    {

    while (!current.isLeaf( ))

    {

    if (query(current.getData( )))

    current = current.getLeft( );

    else

    current = current.getRight( );

    }

System.out.print("Are you doing " + current.getData( ) + "? ");

    if (!query("Am I right?"))

    learn(current);

    else

    System.out.println("OK.");

    }

public static BTNode beginningTree( )
{
BTNode root;
BTNode child;
final String ROOT_QUESTION = "Is maxmin_acc_z greater than 1686?";
final String LEFT_QUESTION = "Is maxmin_gyc_avg greater than 133?";
//We can add RIGHT_QUESTION if we want.
//final String RIGHT_QUESTION = "Is <feature> greater/smaller than <value>?";
final String WORKOUT1 = "Pulls";
final String WORKOUT2 = "Curls";
final String WORKOUT3 = "Stretches";
//final String WORKOUT4 = "Stretches";
// Create the root node with the question "Is maxmin_acc_z greater than 1686?"
root = new BTNode(ROOT_QUESTION, null, null);
// Create and attach the left subtree.
child = new BTNode(LEFT_QUESTION, null, null);
child.setLeft(new BTNode(WORKOUT1, null, null));
child.setRight(new BTNode(WORKOUT2, null, null));
root.setLeft(child);
// Create and attach the right subtree.
child = new BTNode(WORKOUT3, null, null);
//child.setLeft(new BTNode(WORKOUT3, null, null));
//child.setRight(new BTNode(WORKOUT4, null, null));
root.setRight(child);
return root;
}


public static void learn(BTNode current)
// Precondition: current is a reference to a leaf in a taxonomy tree. This
// leaf contains a wrong guess that was just made.
// Postcondition: Information has been elicited from the user, and the tree
// has been improved.
{
String guessWORKOUT; // The WORKOUT that was just guessed
String correctWORKOUT; // The WORKOUT that the user was thinking of
String newQuestion; // A question to distinguish the two WORKOUTs
// Set Strings for the guessed workout, the correct workout and a new question.
guessWORKOUT = current.getData( );
System.out.println("I give up. What are you doing? ");
correctWORKOUT = stdin.nextLine( );
System.out.println("Please type a yes/no question that will distinguish a");
System.out.println(correctWORKOUT + " from a " + guessWORKOUT + ".");
newQuestion = stdin.nextLine( );
// Put the new question in the current node, and add two new children.
current.setData(newQuestion);
System.out.println("As a " + correctWORKOUT + ", " + newQuestion);
if (query("Please answer"))
{
current.setLeft(new BTNode(correctWORKOUT, null, null));
current.setRight(new BTNode(guessWORKOUT, null, null));
}
else
{
current.setLeft(new BTNode(guessWORKOUT, null, null));
current.setRight(new BTNode(correctWORKOUT, null, null));
}
}

    public static boolean query(String prompt)

    {

    String answer;

    System.out.print(prompt + " [Y or N]: ");

    answer = stdin.nextLine( ).toUpperCase( );

    while (!answer.startsWith("Y") && !answer.startsWith("N"))


    {

    System.out.print("Invalid response. Please type Y or N: ");

    answer = stdin.nextLine( ).toUpperCase( );

    }

    return answer.startsWith("Y");

    }

    }