
DEGREE PROJECT IN TECHNOLOGY, FIRST CYCLE, 15 CREDITS, STOCKHOLM, SWEDEN 2016

A comparative study on artificial neural networks and random forests for stock market prediction

VICTOR ERIKSSON

THUJEEPAN VARATHARAJAH

KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF COMPUTER SCIENCE AND COMMUNICATION

Page 2: A comparative study on artificial neural networks and ...kth.diva-portal.org/smash/get/diva2:927335/FULLTEXT01.pdf · CHAPTER2. BACKGROUND 2.2 Machinelearning Machine learning can

En komparativ studie mellan artificial neural networks och random forests med avseende på aktieprognoser
(A comparative study between artificial neural networks and random forests with respect to stock forecasts)

VICTOR ERIKSSON
THUJEEPAN VARATHARAJAH

Degree Project in Computer Science, DD143X
Supervisor: Atsuto Maki
Examiner: Örjan Ekeberg

May 11, 2016
CSC, KTH


Abstract

This study investigates the predictive performance of two different machine learning (ML) models on the stock market and compares the results. The chosen models are based on artificial neural networks (ANN) and random forests (RF).

The models are trained on two separate data sets and the predictions are made on the next-day closing price. The input vectors of the models consist of 6 different financial indicators which are based on the closing prices of the past 5, 10 and 20 days.

The performance evaluation is done by analyzing and comparing values such as the root mean squared error (RMSE) and the mean absolute percentage error (MAPE) for the test period. Specific behavior in subsets of the test period is also analyzed to evaluate the consistency of the models.

The results showed that the ANN model performed better than the RF model, as it had lower errors relative to the actual prices throughout the test period and thus made more accurate predictions overall.


Sammanfattning

This study investigates how well two different machine learning (ML) models can predict the stock market and then compares their results. The chosen models are based on artificial neural networks (ANN) and random forests (RF).

The models are trained on two separate data sets and the predictions are made on the next day's closing price. The input data for the models consists of 6 different financial indicators based on the closing prices of the past 5, 10 and 20 days.

Performance is evaluated by analyzing and comparing values such as the root mean squared error (RMSE) and the mean absolute percentage error (MAPE) over the test period. Specific trends in subsets of the test period are also examined to evaluate the consistency of the models.

The results showed that the ANN model performed better than the RF model, as it exhibited smaller errors relative to the actual values over the whole test period and thus made more accurate predictions.


Contents

1 Introduction
  1.1 Objective and problem statement
  1.2 Scope
  1.3 Outline

2 Background
  2.1 Stock prediction
  2.2 Machine learning
  2.3 Artificial neural networks
  2.4 Random forests
    2.4.1 Decision tree
    2.4.2 Regression tree
    2.4.3 Random forest

3 Method
  3.1 Data selection
  3.2 Technical analysis indicators used in the input vector
  3.3 ANN model
  3.4 RF model
  3.5 Performance metrics
    3.5.1 Root mean squared error
    3.5.2 Mean absolute percentage error

4 Results
  4.1 Performance on WMT stock
  4.2 Performance on IXIC index

5 Discussion and conclusion
  5.1 Discussion of results
  5.2 Discussion of method
    5.2.1 Amount of data/observations
    5.2.2 Partitioning of the data
    5.2.3 The choice of technical indicators
  5.3 Comments on the reviewed sources
  5.4 Final conclusion

Appendix A Biological neural networks

Bibliography

List of Figures

List of Tables

Acronyms & abbreviations


Chapter 1

Introduction

Stock market prediction is an important field of interest among investors. Accurate predictions mean that one can potentially foresee events or trends, thus making investments more profitable. Making such predictions is difficult due to the complex nature of the market, which is influenced by a wide variety of factors. Several methods exist, and technological methods are one main branch that has become increasingly popular in recent years. These kinds of methods utilize algorithmic models in order to make the predictions. The problem can be seen as a time series prediction problem, which is the task of estimating a value based solely on previous values in a time series.

Two examples of technological methods from machine learning (ML) that have been used are artificial neural networks (ANN) and random forests (RF) [1, 2, 3, 4, 5]. An ANN is a model that is inspired by the human nervous system and an RF is a collection of decision trees. Both these models are suitable for applications like stock prediction as they have the ability to model complex structures such as nonlinear patterns. These methods make predictions by analyzing existing data, building a model to reflect the underlying nature of the data, and then using this model to generalize on previously unseen data.

There exists quite a lot of research on ANNs within the field of stock prediction, which has yielded some interesting results [1]. RFs have also been researched within the area, but have generally not attracted the same amount of attention as ANNs. However, in Caruana and Niculescu-Mizil's [6] large-scale empirical comparison of a set of supervised learning algorithms, the overall performance of the RF model was ranked higher than that of the ANN model. Thus an interesting question emerges regarding how ANN and RF models compare when applied to stock prediction. The goal of this study is to compare the performance between optimized implementations of ANNs and RFs, respectively.

1.1 Objective and problem statement

The objective of this thesis is to investigate and compare the predictive properties that artificial neural networks and random forests have on the stock market. The question to be answered is:


How do artificial neural networks and random forests compare to one another when applied as one-day prediction models on the stock market, and which of the two has the better predictive performance?

1.2 Scope

The ANNs and RFs will be implemented for both the Walmart stock and the Nasdaq Composite stock index.

Due to the complexity of the models together with constraints in time and resources, the models will be implemented and optimized using built-in tools and classes in Matlab. To some extent, this also limits the level of optimization and alteration that can be made on the models.

Performance will be measured with the chosen metrics, and the conclusion will thus be based on these as well.

It is important to note that this study will strictly be of a comparative nature. There will be no analysis as to why one model performs better than the other. Instead, the study will draw its conclusion from comparing the performance results of both models on the chosen stocks.

This study will not provide any guidelines on how one can earn money from the stock market using these models, as this is not its aim.

1.3 Outline

The remainder of this report is outlined in the following fashion. Chapter 2 presents relevant background information on the topics of stock prediction, machine learning, ANNs and RFs. Chapter 3 presents the methodology of the conducted study in terms of data selection, how the input vectors were constructed and how the models were implemented and optimized. Chapter 3 also presents the metrics that are used to evaluate the performance of both models. Chapter 4 presents the results of the conducted study in terms of the performance metrics as well as graphs showing an overview of the results. Chapter 5 discusses the results and the methodology, offers source criticism and presents the final conclusion to answer the proposed problem statement of section 1.1. Appendix A gives a brief description of biological neural networks.


Chapter 2

Background

In this chapter, some of the key concepts regarding stock prediction, ML, ANNs and RFs are explained.

2.1 Stock prediction

Stocks relate to a stake in certain kinds of companies in which one may invest money and be liable only in proportion to the amount invested. These are called limited companies and have a finite amount of stock that is partitioned into shares. When buying shares in such a company, one becomes a shareholder and thus an owner of some fraction of the company. The type of stock regulates the terms of ownership, and therefore shareholders may have different influence on the business operation [7].

A limited company can be listed on a stock exchange, which is a marketplace for trading shares, but shares can also be traded privately; the combination of all possible trading venues is, in the context of this thesis, referred to as the stock market. The stock price at a moment in time is determined by the most recent price that a buyer and a seller agree upon. Consequently, one can say that the stock price is regulated by supply and demand [8]. The total amount of shares available in a company corresponds to the supply, while many different factors affect the demand, which may in turn have varying predictability. There are often several different prices to consider when analyzing a specific stock, but in this thesis only the closing price is used, which is the price of the last transaction just before the stock exchange closes for the day.

There are generally two branches of stock prediction: fundamental analysis and technical analysis. Fundamental analysis focuses on financial factors such as a company's balance sheet, market positions, credit value etc. and uses these to make an estimate of the company's intrinsic value. Technical analysis, on the other hand, bases its estimates purely on historical data such as the stock price and different key indicators [8].


2.2 Machine learning

Machine learning can be seen as a subfield of artificial intelligence (AI). Where AI generally involves any computational model based on intelligence, ML concerns itself with developing algorithms with the ability to learn and improve themselves by being exposed to new data. In this regard ML is a natural part of AI, as AI involves learning as well as other cognitive attributes often related to intelligence, such as reasoning and perception [9].

A model of ML can be described as a function representing the nature of the input data. Moreover, ML can be divided into two main branches: supervised learning and unsupervised learning. In supervised learning, the training set of the algorithm contains data points of inputs as well as their desired outputs. In unsupervised learning, the desired outputs are unknown to the learning algorithm [10]. The branch used throughout this thesis is supervised learning, because the implemented models have historical data points available in the form of dates in combination with stock prices.

ML can be further divided into regression and classification models. Classification models output a class, which is a value from a discrete set, while regression models instead output a continuous value [10]. This makes regression the valid choice for the application of this thesis, as the implementations must be able to output real-valued stock prices.

Ultimately, the goal of a ML model is to analyze a training set, find some underlying nature of the data and then be able to generalize this model to previously unseen data as well [11]. In supervised learning, this is done through the bias-variance tradeoff, which is the problem of simultaneously minimizing both the bias and the variance of a model. The bias is the difference between the predicted value and the actual value, while the variance indicates to what degree the values are spread out [12].

Bias:

$$B = \frac{1}{n}\sum_{k=1}^{n}\left(f_k - \hat{f}_k\right) \qquad (2.1)$$

where $f_k$ are the true values, $\hat{f}_k$ are the predictions and $n$ is the number of points in the data set.

Sample variance:

$$s^2 = \frac{1}{n-1}\sum_{k=1}^{n}\left(f_k - \bar{f}\right)^2 \qquad (2.2)$$

where $f_k$ is a value, $\bar{f}$ is the mean of all values and $n$ is the number of points in the data set.

High bias may cause underfitting, which means that relevant properties of the data are missed in the learning process. High variance may cause overfitting, which is the reverse of underfitting and means that the model is overly sensitive to certain properties of the training data. Both underfitting and overfitting result in decreased performance regarding the ability to generalize well to unseen data [10].

The general learning approach of ML can roughly be described in three steps. The first step is to partition the total data set into three parts: the training data, the validation data and the test data. The second and third steps are the hyperparameter optimization and the model parameter optimization, respectively. Basically, the hyperparameters are the parameters that are adjusted before the actual learning, while the model parameters are adjusted during the learning. In this respect the hyperparameter optimization is a so-called meta-optimization task, meaning that it optimizes another optimizer, which in this case is the learning algorithm itself. There is a wide variety of methods for optimizing the hyperparameters, and this study takes the approach of using cross-validation together with loss function optimization. This means that the hyperparameters of a ML model are iteratively changed until the minimum of the chosen loss function, evaluated over the validation set, is found. The third step, the model parameter optimization, can also be done in different ways, as is the case for the two models in this study; see sections 3.3 and 3.4. [10]
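The following toy sketch illustrates these three steps in Matlab; it is not code from the thesis, and the polynomial degree merely stands in for an arbitrary hyperparameter.

```matlab
% Toy illustration of the three-step approach: partition the data,
% search the hyperparameter (here: polynomial degree) on the validation
% set, then evaluate the final model on the held-out test set.
x = linspace(0, 1, 254)';                   % dummy time axis
y = sin(2*pi*x) + 0.1*randn(254, 1);        % dummy series to fit
xTr = x(1:204);   yTr = y(1:204);           % training data
xVa = x(205:216); yVa = y(205:216);         % validation data
xTe = x(217:254); yTe = y(217:254);         % test data

degrees = 1:6;                              % hyperparameter candidates
rmseVa = zeros(size(degrees));
for d = degrees
    p = polyfit(xTr, yTr, d);               % model parameter optimization
    rmseVa(d) = sqrt(mean((polyval(p, xVa) - yVa).^2));  % validation loss
end
[~, bestDeg] = min(rmseVa);                 % hyperparameter minimizing the loss
pBest = polyfit(xTr, yTr, bestDeg);
rmseTe = sqrt(mean((polyval(pBest, xTe) - yTe).^2));     % final test evaluation
```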

2.3 Artificial neural networks

ANNs are a collection of models inspired by the biological neural network (BNN) model of the human nervous system (see appendix A for a basic explanation of BNNs). Like BNNs, ANNs generally prove to be well suited for typical cognitive tasks such as pattern recognition and learning [13, 14]. Even though simplified compared to a BNN, the ANN models implement the basic structure of their biological equivalent and can be described as finite directed weighted graphs. Edges that are directed into vertices are inputs while edges directed out are outputs, and the collections of the m inputs or n outputs can be seen as m- or n-dimensional vectors, respectively.

ANN models have at least an input and an output layer, and the number of vertices in these is determined by the dimensionality of the problem at hand. An ANN can be viewed as a function as described in section 2.2; for the particular regression problem of this thesis, more specifically a function from $\mathbb{R}^6$ to $\mathbb{R}$.

One of the simplest ANN models is the feedforward neural network (FFNN), which can be modeled as an acyclic graph. There are many other kinds of ANN models, such as the FFNN's cyclic counterpart, the recurrent neural network (RNN), but FFNNs are more commonly used than the alternatives [15].

A more mathematically stringent formulation of a vertex in an ANN is that the $j$th vertex can be seen as a composite function $F(\vec{x}, \vec{w}) = (A \circ T)(\vec{x}, \vec{w})$, where $T$ is the transfer function, $A$ is the activation function, $\vec{x} = (x_1, \ldots, x_n)$ is the input vector and $\vec{w} = (w_1, \ldots, w_n)$ is a vector containing the edge weights for the corresponding inputs in $\vec{x}$; see figure 2.1 [13, 16].

Figure 2.1: A basic outline of an artificial neuron. Source: https://en.wikibooks.org/wiki/File:ArtificialNeuronModel_english.png

The transfer function $T$ for the $j$th vertex is most often a summation function defined as the dot product of the input and weight vectors, or equivalently the weighted sum of all the inputs, $T = \vec{x} \cdot \vec{w}$. The output of $T$ is passed on to the activation function $A$, which determines whether the artificial neuron is activated or not. The activation function may or may not use the threshold value ($\theta$ in figure 2.1) to perform the evaluation [13]. Different activation functions are used depending on the model, but generally a distinction can be made between linear and nonlinear activation functions. The training of an ANN is done according to the general principles of ML discussed in section 2.2, through exposing the network iteratively to known data and adjusting model parameters such as the edge weights correspondingly.

One of the early ANNs, modeled in the 1960's by Frank Rosenblatt, is called a single layer perceptron (SLP) and was an FFNN built on the concept of the perceptron algorithm [13, 17]. This algorithm can be defined as an artificial neuron that uses the Heaviside step function (2.3) as the activation function, which means that the output is a binary value dependent on the value of the transfer function in comparison with the threshold value $\theta$ [13]; see figure 2.1.

$$H(\theta) = \begin{cases} 0, & \theta < 0 \\ 1, & \theta \ge 0 \end{cases} \qquad (2.3)$$

The learning of an SLP is usually done according to the delta rule, which is an iterative gradient descent algorithm. This means that steps proportional to the negative gradient of the function are taken in order to find a minimum after a finite number of steps [17]. In the case of SLPs, the edge weights are first chosen randomly, then the desired output data is compared to the actual output data in a loss function and the weights are adjusted accordingly. A first limitation of this model is that the input data must be linearly separable, otherwise the training algorithm may not converge towards a solution in a finite number of steps, which means that there is a large number of fundamental examples that SLPs are unable to learn [13]. A second limitation is that the activation function can only perform binary classification, so regression applications cannot be modeled using SLPs.
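As a concrete illustration, the following minimal sketch (toy data, not from the thesis) trains an SLP on the linearly separable OR function, folding the threshold $\theta$ into the weight vector as a bias term.

```matlab
% Minimal single layer perceptron trained with the delta rule.
X = [0 0; 0 1; 1 0; 1 1];            % inputs (logical OR: linearly separable)
t = [0; 1; 1; 1];                    % desired binary outputs
Xb = [X, ones(4, 1)];                % constant input folds the threshold into w
w = rand(3, 1) - 0.5;                % edge weights chosen randomly at first
eta = 0.1;                           % learning rate
for epoch = 1:100
    for k = 1:4
        y = double(Xb(k, :) * w >= 0);          % transfer (dot product) + Heaviside
        w = w + eta * (t(k) - y) * Xb(k, :)';   % delta-rule weight update
    end
end
```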

To overcome the limitations of the SLP model, the multilayer perceptron (MLP) was developed in the 1970's. It is an FFNN model that differs from SLPs in several respects: in addition to the input and output layers, the MLP model contains an arbitrarily large number of hidden layers between them.

Figure 2.2: A multilayer perceptron (MLP) with one hidden layer. Source: https://upload.wikimedia.org/wikipedia/commons/c/c2/MultiLayerNeuralNetworkBigger_english.png

MLPs also use more sophisticated learning methods, based mainly on backpropagation (BP), which is a generalization of the delta rule used in SLPs. Aside from adjusting the edge weights, BP algorithms may also shift the activation function by using a so-called bias vertex, allowing it to trigger in a different manner [13]. The added layers and the BP algorithm provide the network with an internal structure that gives MLPs the ability to handle problems, such as non-linearly separable ones, that SLPs are unable to model appropriately [17, 13]. MLPs may also use activation functions other than the Heaviside step function, which means that MLP models can be used effectively both for classification and regression problems. Common activation functions used in MLPs are various linear functions, hyperbolic tangent functions and the sigmoid function [16]. The sigmoid function is the primary function in the implementations of this thesis and can be seen below in (2.4). It is continuous with range $y \in (0, 1)$, and this kind of smooth, non-constant activation function is vital for the functionality of the BP algorithm. Several different activation functions may be used in one single MLP model.

$$S(t) = \frac{1}{1 + e^{-t}} \qquad (2.4)$$

There is a wide range of learning algorithms that implement BP with some additional optimization technique. The Levenberg-Marquardt algorithm (LMA), which is used in this thesis, utilizes the Gauss-Newton algorithm (GNA) for the optimization and MSE as the loss function. LMA is generally considered to be a fast and efficient choice among learning algorithms [18].

For ANNs, the training data is used by the learning algorithm to adjust the edge weights and the bias vertex of the network. The validation data is used during training to evaluate the loss function and determine when satisfactory performance of the network has been reached. It is also used to optimize the hyperparameters as described in sections 2.2 and 3.3. The test data is purely used for performance evaluation.


2.4 Random forests

2.4.1 Decision tree

A decision tree is a model of supervised learning that can be represented by a tree graph. A decision tree is thus a set of questions organized in a hierarchical manner, where, given an input vector, the decision tree estimates an unknown property by asking successive questions about its known properties [19]. Starting at the root, the interior nodes of the tree represent each question and the edges between them are the corresponding answers. As such, the question asked next depends on the previous answer. In following such a path, a decision is made once a terminal node, or leaf, has been reached.

Figure 2.3: An illustrative example of a decision tree that is used to figure out whether a photo (the input vector) represents an indoor or an outdoor scene. Source: Criminisi et al. [19] (p. 88).

2.4.2 Regression tree

A regression tree is, as the name implies, a decision tree that is used for regression problems. The general idea is to use recursive partitioning on data that act in complicated nonlinear ways to get smaller partitions with lower variance, allowing for better predictions [19].

The process involves partitioning the data space into smaller disjoint regions, and again partitioning those sub-regions recursively, so that we get chunks of space that have simple predictive models contained in them (to predict unknown values). This recursive partitioning is what the regression tree represents. As such, each interior node represents a split as well as a partition of space following from earlier splits. The interior nodes guide a path to the terminal nodes, each of which represents a cell of a partition and has a predictive model attached to it. It is the predictive model contained in the terminal node that decides the actual form of the prediction; common models include the constant mean or linear/polynomial regression [19].

Figure 2.4: Regression tree. (a) An illustrative example of a 2-dimensional regression tree with binary splits. The predictive model in the terminal node takes the mean of all data points contained in that partition. (b) An illustrative example of the partitioned space and the contained data points. Source: https://dzone.com/articles/regression-tree-using-gini%E2%80%99s

The performance of a regression tree mainly comes down to how well the tree is set up and partitioned. A common way to split a regression tree is to use a greedy approach where each split maximizes the variance reduction [19, 20]. This is done by initially choosing the data point and the split that maximize the variance reduction, and setting this as the root node. As there are finitely many data points in the data set, there are only finitely many splits that need to be considered. The same process is then done recursively for each interior node. For instance, in a data space with a regression tree that does binary splits, the root node would represent the split on the data point that minimizes the sum of the variances of both partitions of the split. The interior nodes, which represent the subregions, would then go through the same process recursively.

The recursive partitioning keeps going until a stopping criterion has been reached, at which point the growing of the tree stops. A typical stopping criterion is that a partition contains some minimal number of observations (data points). The leaf with the predictive model thus represents a partition that has met the stopping criterion [19, 21].
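A minimal sketch of one such greedy split, with the minimal-observations stopping criterion included, might look as follows (toy data; x is a single feature and y the target):

```matlab
% One greedy binary split: pick the threshold minimizing the summed
% within-partition variance, i.e. maximizing the variance reduction.
x = rand(100, 1);  y = 3*x + 0.2*randn(100, 1);
minLeaf = 5;                                 % stopping criterion (observations per leaf)
thresholds = unique(x);                      % finitely many candidate splits
bestCost = inf;  bestThr = NaN;
for thr = thresholds'
    L = y(x <= thr);  R = y(x > thr);
    if numel(L) < minLeaf || numel(R) < minLeaf
        continue;                            % split would violate the stopping criterion
    end
    cost = numel(L)*var(L, 1) + numel(R)*var(R, 1);  % summed squared deviations
    if cost < bestCost
        bestCost = cost;  bestThr = thr;
    end
end
% The same search is then run recursively inside each resulting partition.
```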

A tree that has been overfitted may be a tree that has grown too much. This may occur when the stopping criterion's minimal number of observations is too low, leading the tree to generalize badly. The process of pruning, meaning removing unreliable branches after growing the tree, may improve its ability to generalize [20].


2.4.3 Random forest

An RF is an ensemble learning technique that has been shown to have better performance in generalization and overfitting avoidance in comparison to a single decision tree [19]. The introduction of RFs for classification and regression trees (CART) was first proposed in a paper by Leo Breiman [22]. As this study involves only regression trees, the term random forest will throughout this thesis correspond to a random regression forest.

An RF is a collection of randomly trained regression trees that are collectively used to output a prediction. In an RF, there is no need to prune each individual tree after growing it, the main reason being its use of the random subspace method in a collection of trees [22]. The random subspace method involves the following process when growing each individual tree:

1. Randomly choose a subset of the training data for training the individual regression tree.

2. At each node, choose a random subset of features, and only consider splitting on these features.

The prediction of the RF model is the mean of the predictions of all individual trees in the RF. Using an RF instead of a single regression tree has been shown to considerably decrease the risk of overfitting: as the individual trees are all randomly different from one another, their individual predictions are decorrelated, which makes the prediction of the RF generalize better [19, 22].
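The averaging step and the row-sampling part of the random subspace method can be sketched by hand as below (toy data; fitrtree is used here only for illustration, and the per-node feature subsampling of step 2 is handled internally by the TreeBagger class used in the actual study):

```matlab
% Hand-rolled bagged regression forest: each tree is grown on a random
% bootstrap sample of the training rows, and the forest prediction is
% the mean of the individual tree predictions.
X = rand(200, 6);  y = sum(X, 2) + 0.1*randn(200, 1);   % dummy training data
Xnew = rand(50, 6);                                     % unseen points to predict
nTrees = 100;
preds = zeros(50, nTrees);
for t = 1:nTrees
    idx = randi(200, 200, 1);                           % bootstrap sample
    tree = fitrtree(X(idx, :), y(idx), 'MinLeafSize', 5);
    preds(:, t) = predict(tree, Xnew);                  % one random tree's prediction
end
yForest = mean(preds, 2);                               % mean over all trees
```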

Hyperparameters that can affect the performance of an RF include the number of trees, the number of features considered at each node, and the stopping criterion [23].


Chapter 3

Method

To answer the proposed problem statement, this study began with a literature study on the topics of stock market prediction, ANNs, RFs and ML in general. The literature study also involved reading available material on the subject of ML techniques for stock market prediction, and in particular research material that used ANN and RF models as the models for prediction.

Suitable input vectors for one-day stock prediction were also researched, and it was decided that the use of technical analysis indicators would be a relevant choice. This decision was made after reading similar studies of ANN- and RF-based stock prediction that also used technical analysis indicators [2, 4, 5].

Both predictive models and the necessary operations to obtain results were implemented in Matlab R2015a (8.5.0.197613). The steps taken in Matlab involved processing data, initializing ANN and RF models, training and testing the models for prediction, as well as calculating performance metrics.

3.1 Data selection

The input data set comes from the Walmart stock price (WMT) and the Nasdaq Composite stock index (IXIC). These stocks were chosen due to differences in properties such as the closing prices and the volatility, which gives a good basis for a comparison between different types of data. To gain fair and comparable results, the same amount of data needed to be gathered and used for both models.

The literature study showed that the amount of data gathered in available studies of stock prediction using ANNs and RFs varies. Paluch and Jackowska-Strumillo [2] used one year of data in their ANN-based stock prediction model. In contrast, Ticknor [1] used 734 days to gain optimal results in his study of an ANN-based prediction model, and Manojlovic and Stajduhar [4] used 1512 days of data in their paper on RF-based stock forecasting.

In this study, the training, validation and test data consisted of 254 days for both models, as can be seen in figures 3.1 and 3.2. The data was gathered by initially fetching 274 days of closing price data from Yahoo Finance for WMT and IXIC in the timespan January 5th 2015 - February 4th 2016. These were then processed into 6-dimensional input vectors with 254 observations representing the timespan February 3rd 2015 - February 4th 2016. The input vector consisted of technical indicators based on the closing prices of 5, 10 and 20 days prior to the day to be predicted (see section 3.2).

Figure 3.1: The actual prices of the WMT stock in the interval Feb 3 2015 - Feb 4 2016. Source: Matlab.

The last 15 % of the gathered data was chosen as the subset to test performance. Thus the initial 85 % of the data was to be separated into training and validation sets. This was done by studying the properties of the gathered dataset for both stocks. The aim of the proposed method was to include unusual spikes and dips in the training data. For instance, in the WMT stock, when considering the initial 85 % of the 254 observations, there is a sudden dip in the closing price on day 177 and it reaches its lowest price at day 199 before starting to recover. With the proposed method, it was reasonable to have these days in the training data. In such a manner, the partitioning for each stock was done as detailed below.

• The WMT stock was partitioned by setting 80 % as training data, 5 % as validation data and the remaining 15 % as test data to evaluate performance.

• The IXIC index was partitioned by setting 70 % as training data, 15 % as validation data and the remaining 15 % as test data to evaluate performance.


Figure 3.2: The actual prices of the IXIC index in the interval Feb 3 2015 - Feb 4 2016. Source: Matlab.

This partitioning also means that the ANN and RF models for a given stock had the same subset of data for building their structure and the same subset to test their performance. Table 3.1 presents both partitioning schemes that were used.

        Training days    Validation days    Test days
WMT     1 - 204          205 - 216          217 - 254
IXIC    1 - 178          179 - 216          217 - 254

Table 3.1: The partitioning of the data sets for each stock. Each stock had the same partitioning to build and evaluate both of its ANN and RF models.
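In code, this index-based partitioning is a matter of a few index vectors; a minimal sketch (the IXIC scheme differs only in the boundaries):

```matlab
% Index-based partitioning of the 254 observations following Table 3.1
% (WMT scheme shown).
trainIdx = 1:204;      % 80 % training data
valIdx   = 205:216;    % 5 % validation data
testIdx  = 217:254;    % 15 % test data
% For IXIC: trainIdx = 1:178 (70 %), valIdx = 179:216 (15 %), testIdx = 217:254.
```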

3.2 Technical analysis indicators used in the input vector

The most commonly used technical indicators are moving averages and oscillators [2, 4]. As such, the approach was to use 5-, 10- and 20-day exponential moving averages (EMA) and rates of change (ROC) to construct the input vector.


Exponential moving average ($N = 5, 10, 20$ days):

$$\mathrm{EMA}_N(k) = \frac{C(k) + aC(k-1) + a^2C(k-2) + \cdots + a^{N-1}C(k-N+1)}{1 + a + a^2 + \cdots + a^{N-1}} \qquad (3.1)$$

where $C(k)$ denotes the closing price on day $k$, and $a = 1 - 2/(N+1)$.

Rate of change ($N = 5, 10, 20$ days):

$$\mathrm{ROC}_N(k) = \frac{C(k) - C(k-N)}{C(k-N)} \qquad (3.2)$$

where $C(k)$ denotes the closing price on day $k$. The ROC measures the rate of price change over a given period: a positive rate indicates an upward trend in the closing price, while a negative rate indicates a downward trend.
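A direct implementation of (3.1) and (3.2) for a single day k might look as follows (dummy prices; k must be larger than N):

```matlab
% Compute the two indicators from a vector of closing prices C.
C = 100 + cumsum(randn(274, 1));        % dummy closing-price series
N = 10;                                 % window length (5, 10 or 20 in the study)
k = 50;                                 % day to compute the indicators for
a = 1 - 2/(N + 1);                      % decay factor from (3.1)
w = a.^(0:N-1)';                        % weights 1, a, a^2, ..., a^(N-1)
emaN = (C(k:-1:k-N+1)' * w) / sum(w);   % weighted average of the last N closes
rocN = (C(k) - C(k-N)) / C(k-N);        % rate of change over N days
```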

3.3 ANN model

As mentioned in the beginning of this chapter, the ANN model implementation as well as the pre- and post-processing of data was done in Matlab, where the implementation utilized the built-in Neural Network Toolbox. The model is a feedforward three-layer MLP. This network configuration is common, as one hidden layer is enough to model nonlinear functions, while additional hidden layers pose an increased risk of overfitting depending on the nature of the data [3]. Furthermore, models with multiple hidden layers are more likely to fall into a local minimum when training the network [24].

As suggested by the dimensionality of the data, the input layer consists of six nodes and the output layer of a single node. Determining the optimal hyperparameter for the number of neurons in the hidden layer is not entirely straightforward, and a variety of recommendations exist regarding the subject. The following are some commonly used rule-of-thumb methods [25]:

• The size of the hidden layer is somewhere between the sizes of the input and output layers.

• The size of the hidden layer is 2/3 of the size of the input layer plus the size of the output layer.

• The size of the hidden layer is less than twice the size of the input layer.

The hyperparameter was empirically determined, with these three recommendations as a guideline, to be optimal at five nodes. This was done by cross-validation and loss function optimization as described in section 2.2, with RMSE as the loss function (see section 3.5.1). In this particular situation, the estimated optimal size coincides exactly with that of the second recommendation above.


The activation functions used were a sigmoid function for the hidden layer and a linear function for the output layer. During the training of the network, the inputs and targets were normalized to the range [-1, 1] with the mapminmax function, and the trained network was configured to output the unnormalized values for better interpretability. The stopping criterion used during training was that training terminated when the RMSE failed to decrease for six successive validation checks.

Figure 3.3: A schematic image reflecting the structure of the ANN model. Source: Matlab.

For backpropagation, the Levenberg-Marquardt algorithm was used. It is recommended in the API as a first choice due to its speed compared to some other alternatives [18]. The Bayesian regularization algorithm was tested without any noticeable improvement in performance, so Levenberg-Marquardt was deemed satisfactory. A schematic representation of the model is shown in figure 3.3.
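A sketch of how such a network can be set up with the Neural Network Toolbox is shown below. This is an illustrative reconstruction, not the thesis code: fitnet builds a fitting MLP with a sigmoid-shaped (tansig) hidden activation and a linear output activation, applies mapminmax normalization by default, and option names may vary slightly between toolbox versions.

```matlab
% Three-layer MLP: 6 inputs, 5 hidden nodes, 1 output (dummy data).
X = rand(6, 254);  T = rand(1, 254);   % columns are observations
net = fitnet(5, 'trainlm');            % 5 hidden nodes, Levenberg-Marquardt
net.divideFcn = 'divideind';           % index-based train/val/test split
net.divideParam.trainInd = 1:204;
net.divideParam.valInd   = 205:216;
net.divideParam.testInd  = 217:254;
net.trainParam.max_fail  = 6;          % stop after six failed validation checks
[net, tr] = train(net, X, T);          % BP training with LMA
yTest = net(X(:, 217:254));            % one-day predictions on the test set
```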

3.4 RF model

The RF models were implemented using the TreeBagger class in Matlab, and the implementation follows Breiman's RF model. It uses the mean of the observations in its terminal nodes to predict unknown values. The models for WMT and IXIC were individually tuned with the validation data to gain optimal performance. The hyperparameters tuned were NumTrees, NVarToSample and MinLeafSize, explained below.

• NumTrees - The number of regression trees to grow in the RF model.

• MinLeafSize - The minimum number of observations at each leaf, also known as the stopping criterion.

• NVarToSample - The number of features chosen at random for each split in the trees.

NumTrees was set to 100 trees for both WMT and IXIC. In general, more trees in an RF could lead to better performance, but this performance gain eventually converges to a limiting value [22]. In his original paper, Breiman used 100 trees for his RF model. In this study, testing was done to see if more than 100 trees had any positive effect on the RMSE (see section 3.5.1) of the validation data. As there was little to no gain, 100 trees was used as the hyperparameter value for both RF models.

MinLeafSize was set to 5 observations for both WMT and IXIC. This is the default value of the TreeBagger class. To test whether it was a suitable value, performance was compared between models that used 5, 10 or 20 observations on the training and validation data. As 5 observations gave the lowest RMSE, it was chosen as the hyperparameter value.

NVarToSample was set to 2 features for the WMT stock and 4 features for the IXIC index. The values were chosen by iterating through possible values for each stock and choosing the value that gave the lowest RMSE on the validation set.

        NumTrees    MinLeafSize    NVarToSample
WMT     100         5              2
IXIC    100         5              4

Table 3.2: The hyperparameter values set for each RF model for the respective stocks.
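Given these values, constructing and querying the WMT forest reduces to a few lines. The sketch below uses dummy data and the parameter names of the R2015a-era TreeBagger class (recent releases also accept newer synonyms such as NumPredictorsToSample):

```matlab
% RF model via TreeBagger with the WMT hyperparameters of Table 3.2.
X = rand(216, 6);  y = rand(216, 1);    % dummy training + validation data
rf = TreeBagger(100, X, y, ...          % NumTrees = 100
    'Method', 'regression', ...
    'MinLeaf', 5, ...                   % stopping criterion (MinLeafSize)
    'NVarToSample', 2);                 % features considered at each split
yPred = predict(rf, rand(38, 6));       % mean over the 100 trees' predictions
```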

3.5 Performance metrics

The chosen performance metrics are the root mean squared error (RMSE) and the mean absolute percentage error (MAPE). These are relevant measures of performance as they are based on the bias error, i.e. the errors are calculated from the actual price compared to the predicted price. The equations together with brief descriptions are presented in the subsections below.

3.5.1 Root mean squared error

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{k=1}^{n}\left(\hat{f}_k - f_k\right)^2} \qquad (3.3)$$

where $\hat{f}_k$ are the predicted values, $f_k$ are the true values and $n$ is the number of data points.

The RMSE is the square root of the average of the squared bias errors.

3.5.2 Mean absolute percentage error

$$\mathrm{MAPE} = \frac{1}{n}\sum_{k=1}^{n}\left|\frac{f_k - \hat{f}_k}{f_k}\right| \qquad (3.4)$$

where $\hat{f}_k$ are the predicted values, $f_k$ are the true values and $n$ is the number of data points.


The MAPE measures the size of the bias errors in percentage terms. It is calculated as the average of the unsigned percentage errors.
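Both metrics are one-liners in Matlab; a direct implementation of (3.3) and (3.4) with dummy values:

```matlab
% RMSE and MAPE for a vector of predictions fHat against true prices f.
f    = [60.1; 59.8; 61.2; 60.7];        % actual closing prices (dummy)
fHat = [60.4; 59.5; 60.9; 61.1];        % one-day predictions (dummy)
rmse = sqrt(mean((fHat - f).^2));       % root mean squared error (3.3)
mape = mean(abs((f - fHat) ./ f));      % mean absolute percentage error (3.4)
```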


Chapter 4

Results

In this chapter, the results of the ANN and RF models with respect to both stocks are presented. This involves graphs illustrating the difference between the actual price and the one-day predicted price on the test set, as well as tables containing the performance metrics.

4.1 Performance on WMT stock

In figure 4.1, the graph presents the differences between the actual and the one-day predicted closing prices on the WMT stock using the respective models on the test set.

Figure 4.1: A graph presenting the predictions of the two models on the WMT stock against the actual prices during the 38-day test period (days 217 - 254).


In table 4.1, the performance metrics for the models on the WMT stock are presented.

        RMSE      MAPE
ANN     1.1177    0.0137
RF      2.1480    0.0288

Table 4.1: The RMSE and MAPE metrics for the WMT stock, given to 4 decimal places.

Both the RMSE and the MAPE were better for the ANN model than for the RF model, indicating that the ANN performed better on the WMT stock.

4.2 Performance on IXIC index

In figure 4.2, the graph presents the differences between the actual and the one-day predicted closing prices on the IXIC stock index using the respective models on the test set.

Figure 4.2: A graph presenting the predictions of the two models on the IXIC stock index against the actual prices during the 38-day test period (days 217 - 254).


In table 4.2, the performance metrics for the models on the IXIC stock index are presented.

        RMSE        MAPE
ANN     79.5055     0.0139
RF      135.5593    0.0236

Table 4.2: The RMSE and MAPE metrics for the IXIC stock index, given to 4 decimal places.

Both the RMSE and the MAPE were better for the ANN model than for the RF model, indicating that the ANN performed better on the IXIC stock index.


Chapter 5

Discussion and conclusion

In this chapter, the results and the methodology of the performed study are discussed. The reviewed sources used in this study are also commented upon in the manner of source criticism. Finally, the proposed problem statement described in section 1.1 is answered in the final conclusion.

5.1 Discussion of results

The results for both WMT and IXIC consistently show better performance for the ANN model compared to the RF model. The MAPE metric shows that the errors of the predictive models are all within a couple of percent, with 2.88 %, for the RF model's prediction on the WMT stock, being the highest.

An observation of the WMT stock prediction seen in figure 4.1 is that there is a drastic price drop from around 67 USD to 57 USD between days 175 - 190. This is incidentally exactly the price range represented in the test data, except that the price development looks completely different there. As there are no other representations of this particular price range in the training data, this could potentially impact the models' abilities to accurately generalize these prices in the test period. The ANN model has better generalization performance in this case compared to the RF model.

When analyzing the results for the IXIC index in figure 4.2, a similar pattern is seen as for the WMT stock. The sudden divergence from day 235 and onwards is consistent with the hypothesis of bad generalization mentioned above, i.e. after day 235 the value drops unusually low compared to the training data in figure 3.2. Just as for the WMT stock, the ANN model here handles the lack of reliable training data better than the RF model, as it consistently shows lower biases for every data point throughout this period.

Further observations of the results might indicate that the quality of the validation data affects the two models' respective hyperparameter optimization processes differently. For the WMT stock prediction in figure 4.1, the RF model has a rather large bias consistently throughout the test period in comparison to the ANN model. In the IXIC index prediction in figure 4.2, prior to day 235, the difference in bias between the ANN and RF models is not as extreme. This might suggest that the hyperparameters were better optimized for the IXIC models than for the WMT models. However, as the ANN model still generalizes better for both stocks, it seems that it was less sensitive to overfitting than the RF model. The chosen partitioning of the data is further discussed in section 5.2.2.

Overall, the ANN model had some noticeable spikes during the test period for both stocks, while the RF model was smoother to a larger extent. The spikes generally synced well with the changes of the actual price, even if they occasionally, in connection with very sudden changes, resulted in a bigger misprediction than usual. The cause of this may be that, when sudden changes with high variance in the price occur, the ANN model tries to follow the recent change but is too slow to estimate these rather extreme day-to-day changes. These kinds of scenarios may be harder to predict than those over a longer time period because of the increased difficulty of generalizing such unique events from a limited set of training data.

5.2 Discussion of method

The complexity of ANNs and RFs means that there are several different hyperparameters that can be tuned to regulate their performance. When implementing the ANN and RF models, some of the hyperparameters were chosen through the general literature study as well as our own tests of what could be regarded as optimal. In contrast, some other hyperparameters were set rather arbitrarily, generally due to the time constraint. Studying these factors further could have improved the results.

5.2.1 Amount of data/observations

One of the factors that may affect performance is the total amount of data to collect for each stock. The literature study showed different reports using very different amounts of data. Choosing 255 trading days in total (about one year of trading days) was an amount that could have been adjusted by measuring performance using different amounts of data. It is possible that, in the instances where the predictive models had difficulties generalizing (see section 5.1), using more historical data for training and validating the models could have helped increase their performance. As such, when considering the size of the datasets, it might be viable to also study the behaviour of the stocks. For instance, considering a factor such as the stock's volatility over a certain period of time could aid in deciding whether to use more or less data for that specific stock. Further studies of the stock and its behaviour could thus aid in selecting an optimal amount of days/observations.

However, studying different amounts of data and assessing the optimal amount would have required much time and resources, especially considering the need to choose the same amount of data for both the ANN and RF models. It should also be noted that the aim of this study is to compare the performances of ANN and RF models. Using the amount chosen in this study may have aided in noticing possible strengths and weaknesses of the respective models.

5.2.2 Partitioning of the data

The partitioning of the data into training, validation and test sets is a factor that may affect performance as well (as briefly discussed in section 5.1). The chosen number of test days was set rather arbitrarily, and further large-scale studies to determine the optimal number of test days for both models could improve the results. The quality of the validation data is also of importance when optimizing the models. As mentioned in section 3.1, the training data was somewhat arbitrarily chosen to include sudden spikes and dips in the closing price, and the validation data was the remaining set of days prior to the test days.

For the WMT stock in figure 3.1, it is noticed that the stock price declines until around day 199, when it starts to recover. The partitioning presented in section 3.1 shows that the training data (days 1 - 204) capture prices of all possible levels, while the validation data (days 205 - 216) reflect a more local behaviour in this case, as these prices do not quite include the volatility of all 254 days, and of the test data in particular. An implication of this may be that the validation data do not give an entirely fair reflection of the true nature of the stock, and thus the hyperparameters may be further optimizable to some extent.

In contrast, the IXIC index in figure 3.2 shows that in the training data (days 1 - 178), the range of prices is approximately in the interval from 4500 USD to 5200 USD. The validation data (days 179 - 216) consist of prices from 4800 USD to 5150 USD and match the typical price development of the training data well, and to a large extent the test data as well. As discussed in section 5.1, this might suggest that the hyperparameter optimization process for the IXIC models was more effective than for the WMT models.

Distributing the training and validation sets is a complicated task. Increasing the validation data and decreasing the training data might create a superior model in some cases, whilst in other cases it might do the direct opposite. The complexity increases further due to the need for both the ANN and RF models to use the same partitioning scheme.

Overall, as both the ANN and RF models used the same partitioning scheme for the same stock, it would be viable to state that the chosen methodology was fair for the comparative analysis. The fact that the ANNs are able to perform better for both stocks, regardless of the possible shortcomings discussed above, is an interesting result. It is something that can be used to strengthen the case for the use of ANN models within the field of stock prediction.

5.2.3 The choice of technical indicators

The decision to use technical indicators to process the input vector was made after reading similar studies using the same approach. Thus, there was no deeper knowledge of technical analysis beyond the general ideas, and the technical indicators chosen were based on what had been read in similar reports. Further research on the subject of technical analysis and existing indicators, and possibly using more indicators, could have improved the results and the methodology.

5.3 Comments on the reviewed sources

The primary sources used in this study include Bishop's [13] textbook on ANNs for pattern recognition, Breiman's [22] paper on RFs, as well as Criminisi, Shotton and Konukoglu's [19] review of RFs and their use cases. These were among the main sources used to gain knowledge on ANNs and RFs and were thus key to being able to understand, implement and optimize the models. The legitimacy of these sources can be regarded as strong, as all authors have extensive backgrounds within the subject. Additional sources used to gain knowledge within these fields can also be regarded as strong, as they generally were printed within respectable institutes and/or were quite extensive in their material.

Other key sources include various studies on the subject of implementing ANN- and RF-based models for stock prediction. The reliability of these sources could be questioned, as most reports summarized how the models were implemented rather than describing them exhaustively. However, as these studies were made within educational institutes, it would be justifiable to regard them as viable sources.

5.4 Final conclusion

The purpose of this study was to compare the predictive performances of ANNs and RFs for one-day predictions on the stock market. ANN- and RF-based models were implemented and their predictive ability was tested on two different stocks over an interval of 38 consecutive days. The results of the study indicate the following:

• Seen over the whole test period, the ANN model is less erroneous than the RF model for both stocks. As indicated in tables 4.1 and 4.2, the ANN model showed better RMSE and MAPE.

• The nature of the data has a definite impact on the performance of the models. The ANN model was able to generalize better than the RF model on unreliable training data, even if both models deteriorated to some extent on these occasions.

• There are only a few isolated occasions in the test data where the RF model outperforms the ANN model.

According to these points, the ANN model showed better performance over the test period as a whole, proved less sensitive to unreliable training data and was the more consistent of the two. This makes the ANN model a better choice than the RF model for one-day stock prediction under these circumstances.


Appendix A

Biological neural networks

The intelligence of humans is dependent on the functioning and structure of the nervous system [26]. A neuron is a type of nervous-system cell responsible for transmitting information through electrical and chemical signals, and an interconnected set of neurons is called a biological neural network (BNN). The human brain, which is the center of the nervous system, has been estimated to consist of up to 10¹¹ neurons, where each neuron has up to 10⁵ connections to other neurons [27]. These connections are established through the dendrites and axon terminals that are interconnected via synapses, which allow signals to travel between neurons, see figure A.1.

Figure A.1: A basic illustrative example of a neuron. Source: https://upload.wikimedia.org/wikipedia/commons/8/86/1206_The_Neuron.jpg

The sum of the input signals determines whether the neuron will send out a signal through its axon, thus potentially starting a chain reaction in the BNN. Furthermore, synapses have variable connection strengths, and a connection grows stronger the more signals are passed through the synapse over time. Consequently, the signals passed between such neurons will also be stronger, as explained by Hebbian theory [28]. With each new learning experience, the synaptic connections adapt to configure the nervous system so that this knowledge is stored for future scenarios. This is the central model that characterizes the learning process of humans [29].
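
As a toy illustration of this strengthening mechanism, the following Matlab sketch applies the basic Hebbian update dw = eta * x * y described in [28]. The learning rate and the constant activity values are hypothetical choices for illustration only, not a model used in this study.

% Toy illustration of Hebbian strengthening (dw = eta * x * y) [28].
% Learning rate and activities are hypothetical illustrative values.
eta = 0.1;                 % learning rate (assumed)
w = 0.5;                   % initial synaptic strength (assumed)
for t = 1:20
    x = 1;                 % presynaptic activity: the neuron keeps firing
    y = w * x;             % postsynaptic response through the synapse
    w = w + eta * x * y;   % repeated co-activity strengthens the synapse
end
fprintf('Synaptic strength after 20 co-activations: %.3f\n', w);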


Bibliography

[1] J. L. Ticknor, “A Bayesian regularized artificial neural network for stock market forecasting,” Expert Systems with Applications, vol. 40, no. 14, pp. 5501–5506, 2013.

[2] M. Paluch and L. Jackowska-Strumillo, “The influence of using fractal analysis in hybrid MLP model for short-term forecast of close prices on Warsaw stock exchange,” in Computer Science and Information Systems (FedCSIS), Federated Conference, pp. 111–118, IEEE, 2014.

[3] L. S. Maciel and R. Ballini, “Design a neural network for time series financial forecasting: Accuracy and robustness analysis,” Instituto de Economía, Universidade Estadual de Campinas, Sao Paulo-Brasil, 2008.

[4] T. Manojlovic and I. Stajduhar, “Predicting stock market trends using random forests: A sample of the Zagreb stock exchange,” in Information and Communication Technology, Electronics and Microelectronics (MIPRO), 38th International Convention, pp. 1189–1193, IEEE, 2015.

[5] M. Kumar and M. Thenmozhi, “Forecasting stock index movement: A comparison of support vector machines and random forest,” in Indian Institute of Capital Markets 9th Capital Markets Conference Paper, 2006.

[6] R. Caruana and A. Niculescu-Mizil, “An empirical comparison of supervised learning algorithms,” in Proceedings of the 23rd International Conference on Machine Learning, pp. 161–168, ACM, 2006.

[7] P. H. Skärvad, Företagsekonomi 100. Liber, 2007.

[8] J. J. Murphy, Technical analysis of the financial markets: A comprehensive guide to trading methods and applications. Penguin, 1999.

[9] R. S. Michalski, J. G. Carbonell, and T. M. Mitchell, Machine learning: An artificial intelligence approach. Springer Science & Business Media, 2013.

[10] E. Alpaydin, Introduction to machine learning. MIT Press, 2014.

[11] P. Domingos, “A few useful things to know about machine learning,” Communications of the ACM, vol. 55, no. 10, pp. 78–87, 2012.


[12] G. James, D. Witten, T. Hastie, and R. Tibshirani, An introduction to statistical learning, vol. 112. Springer, 2013.

[13] C. M. Bishop, Neural networks for pattern recognition. Oxford University Press, 1995.

[14] L. Fausett, “Fundamentals of neural networks: architectures, algorithms, and applications,” 1994.

[15] H. Jiang, T. Liu, and M. Wang, “Direct estimation of fault tolerance of feedforward neural networks in pattern recognition,” in Neural Information Processing, pp. 124–131, Springer, 2006.

[16] V. Sharma, S. Rai, and A. Dev, “A comprehensive study of artificial neural networks,” International Journal of Advanced Research in Computer Science and Software Engineering, vol. 2, no. 10, 2012.

[17] K. Gurney, An introduction to neural networks. CRC Press, 1997.

[18] MathWorks, “Levenberg-Marquardt backpropagation.” http://se.mathworks.com/help/nnet/ref/trainlm.html. Accessed: 2016-04-11.

[19] A. Criminisi, J. Shotton, and E. Konukoglu, “Decision forests: A unified framework for classification, regression, density estimation, manifold learning and semi-supervised learning,” Foundations and Trends® in Computer Graphics and Vision, vol. 7, no. 2–3, pp. 81–227, 2012.

[20] M. Robnik-Šikonja and I. Kononenko, “Pruning regression trees with MDL,” in Proceedings of the 13th European Conference on Artificial Intelligence, pp. 455–459, John Wiley & Sons, Chichester, England, 1998.

[21] B. Kitts, “Regression Trees,” tech. rep., http://www.appliedaisystems.com/papers/RegressionTrees.doc, 1997.

[22] L. Breiman, “Random forests,” Machine learning, vol. 45, no. 1, pp. 5–32, 2001.

[23] The Pennsylvania State University, “From Bagging to Random Forests.” https://onlinecourses.science.psu.edu/stat857/node/181. Accessed: 2016-04-11.

[24] S. Karsoliya, “Approximating number of hidden layer neurons in multiple hidden layer BPNN architecture,” International Journal of Engineering Trends and Technology, vol. 3, no. 6, pp. 713–717, 2012.

[25] G. Panchal, A. Ganatra, Y. Kosta, and D. Panchal, “Behaviour analysis of multilayer perceptrons with multiple hidden neurons and hidden layers,” International Journal of Computer Theory and Engineering, vol. 3, no. 2, p. 332, 2011.

[26] A. C. Neubauer and A. Fink, “Intelligence and neural efficiency,” Neuroscience & Biobehavioral Reviews, vol. 33, no. 7, pp. 1004–1023, 2009.


[27] S. Herculano-Houzel, “The human brain in numbers: a linearly scaled-up primate brain,” Frontiers in Human Neuroscience, vol. 3, p. 31, 2009.

[28] J. Hertz, A. Krogh, and R. G. Palmer, Introduction to the theory of neural computation, vol. 1. Basic Books, 1991.

[29] M. Mayford, S. A. Siegelbaum, and E. R. Kandel, “Synapses and memory storage,” Cold Spring Harbor Perspectives in Biology, vol. 4, no. 6, p. a005751, 2012.


List of Figures

2.1 A basic outline of an artificial neuron. Source: https://en.wikibooks.org/wiki/File:ArtificialNeuronModel_english.png

2.2 A multilayer perceptron (MLP) with one hidden layer. Source: https://upload.wikimedia.org/wikipedia/commons/c/c2/MultiLayerNeuralNetworkBigger_english.png

2.3 An illustrative example of a decision tree that is used to figure out whether a photo (the input vector) represents an indoor or an outdoor scene. Source: Criminisi et al. [19] (p. 88).

2.4 Regression tree. (a) An illustrative example of a 2-dimensional regression tree with binary splits. The predictive model in the terminal node takes the mean of all data points contained in that partition. (b) An illustrative example of the partitioned space and the contained data points. Source: https://dzone.com/articles/regression-tree-using-gini%E2%80%99s

3.1 The actual prices of the WMT stock in the interval between Feb 3 2015 - Feb 4 2016. Source: Matlab.

3.2 The actual prices of the IXIC index in the interval between Feb 3 2015 - Feb 4 2016. Source: Matlab.

3.3 A schematic image reflecting the structure of the ANN model. Source: Matlab.

4.1 A graph presenting the predictions of the two models on the WMT stock against the actual prices during the 38-day test period ranging between days 217-254.

4.2 A graph presenting the predictions of the two models on the IXIC stock index against the actual prices during the 38-day test period ranging between days 217-254.

A.1 A basic illustrative example of a neuron. Source: https://upload.wikimedia.org/wikipedia/commons/8/86/1206_The_Neuron.jpg


List of Tables

3.1 Table presenting the partitioning of the data sets for each stock. The respective stock had the same partitioning to build and evaluate both of its respective ANN and RF models.

3.2 Table presenting the hyperparameter values set for each RF model for the respective stocks.

4.1 Table presenting the RMSE and MAPE metrics for the WMT stock. The figures are given with 4 decimal places.

4.2 Table presenting the RMSE and MAPE metrics for the IXIC stock index. The figures are given with 4 decimal places.


Acronyms & abbreviations

AI Artificial intelligence

ANN Artificial neural network

BNN Biological neural network

EMA Exponential moving average

FFNN Feedforward neural network

IXIC NASDAQ composite stock index

LMA Levenberg-Marquardt algorithm

MAPE Mean absolute percentage error

ML Machine learning

MLP Multilayer perceptron

MSE Mean squared error

RF Random forest

RMSE Root mean squared error

ROC Rate of change

SLP Single layer perceptron

WMT Walmart stock

