data analytics basics & understanding

Introduction to Data Analytics Prof. Rudra Pradhan IIT Kharagpur

Upload: manu-sharma

Post on 03-Jun-2018


TRANSCRIPT

Page 1: Data Analytics Basics & Understanding

8/12/2019 Data Analytics Basics & Understanding

http://slidepdf.com/reader/full/data-analytics-basics-understanding 1/37

Introduction to Data Analytics

Prof. Rudra Pradhan

IIT Kharagpur

Page 2: Data Analytics Basics & Understanding


Preamble

• What is data analytics?

• Why is it needed?

• How is it different from data analysis?

• What are its requirements?

• Course coverage

Page 3: Data Analytics Basics & Understanding


What is Data Analytics

• Analytics is the discovery and communication of meaningful patterns in data.

• Especially valuable in areas rich with recorded information, analytics relies on the simultaneous application of statistics, econometrics, computer programming and operations research to quantify performance.

• Analytics often favors data visualization to communicate insight.

Page 4: Data Analytics Basics & Understanding


What is Data Analysis

• Analysis of data is a process of inspecting, cleaning, transforming, and modeling data with the goal of discovering useful information, suggesting conclusions, and supporting decision making.

• Data analysis has multiple facets and approaches, encompassing diverse techniques under a variety of names, in different business, science, and social science domains.

Page 5: Data Analytics Basics & Understanding


Related Issues

• Data mining is a particular data analysis technique that focuses on modeling and knowledge discovery for predictive rather than purely descriptive purposes.

• Business intelligence covers data analysis that relies heavily on aggregation, focusing on business information.

Page 6: Data Analytics Basics & Understanding


Structure of Data Analysis

• Descriptive statistics

• Exploratory data analysis (EDA)

• Confirmatory data analysis (CDA)

EDA focuses on discovering new features in the data, while CDA focuses on confirming or falsifying existing hypotheses.

Page 7: Data Analytics Basics & Understanding


Related Issues

• Predictive analytics and text analytics:

Predictive analytics (PA) focuses on the application of statistical or structural models for predictive forecasting, while text analytics (TA) applies statistical and structural techniques to extract and classify information from textual sources, a species of unstructured data.

• Data integration is a precursor to data analysis, and data analysis is closely linked to data visualization and data dissemination.

• The term data analysis is sometimes used as a synonym for data modeling.

Page 8: Data Analytics Basics & Understanding


Analytics Vs. Analysis

• Analytics is a multi-dimensional discipline. There is extensive use of mathematics and statistics, and of descriptive techniques and predictive models, to gain valuable knowledge from data, i.e. data analysis. The insights from data are used to recommend action or to guide decision making rooted in business context.

• Analytics is not so much concerned with individual analyses or analysis steps, but with the entire methodology. There is a pronounced tendency to use the term analytics in business settings, e.g. text analytics vs. the more generic text mining, to emphasize this broader perspective.

• Advanced analytics is a term typically used to describe the technical aspects of analytics, especially predictive modeling and machine learning techniques like artificial neural networks.

Page 9: Data Analytics Basics & Understanding


Why Data Analytics

• Marketing optimization

• Portfolio management

• Risk management

• Stock market prediction

• Financial market forecasting

• Digital analytics

Page 10: Data Analytics Basics & Understanding


Few Questions

• How to set a perfect path?

• Do you need support?

• Do you need criteria?

• Do you need tricks?

• Is it reliable?

Page 11: Data Analytics Basics & Understanding


Principles of Modelling

Object/System

Why? What are we looking for

Find? What do we want to know

Model Variable, Parameters

Model Prediction

Valid, Accepted predictions

Test

Page 12: Data Analytics Basics & Understanding


Basic Understandings

• Data

• Variables

• Scaling

• Models: S&M

• Tools: statistics, mathematics, econometrics, operations research

• Statistical Modeling

• Mathematical Modeling

• Soft Computing

Page 13: Data Analytics Basics & Understanding


Modeling Structure

• Theory

• Assumptions

• Objectives

• Constraints

Modelling: it shows the relationships, direct and indirect, and the interrelationships of actions and reactions in terms of cause and effect.

Two types: Descriptive and predictive

Both dynamic and static

Page 14: Data Analytics Basics & Understanding


Examples of the kind of problems that may be solved by an Econometrician

1. Testing whether financial markets are weak-form informationally efficient.

2. Testing whether the CAPM or APT represent superior models for the determination of returns on risky assets.

3. Measuring and forecasting the volatility of bond returns.

4. Explaining the determinants of bond credit ratings used by the ratings agencies.

5. Modelling long-term relationships between prices and exchange rates

Page 15: Data Analytics Basics & Understanding


Examples of the kind of problems that may be solved by an Econometrician (cont'd)

6. Determining the optimal hedge ratio for a spot position in oil.

7. Testing technical trading rules to determine which makes the most money.

8. Testing the hypothesis that earnings or dividend announcements have no effect on stock prices.

9. Testing whether spot or futures markets react more rapidly to news.

10. Forecasting the correlation between the returns to the stock indices of two countries.

Page 16: Data Analytics Basics & Understanding


What are the Special Characteristics of Financial Data?

• Frequency & quantity of data

Stock market prices are measured every time there is a trade or somebody posts a new quote.

• Quality

Recorded asset prices are usually those at which the transaction took place. No possibility for measurement error but financial data are "noisy".

Page 17: Data Analytics Basics & Understanding


Types of Data and Notation

• There are 3 types of data which econometricians might use for analysis:

1. Time series data

2. Cross-sectional data

3. Panel data, a combination of 1. & 2.

• The data may be quantitative (e.g. exchange rates, stock prices, number of shares outstanding), or qualitative (e.g. day of the week).

• Examples of time series data

Series                              Frequency

GNP or unemployment                 monthly, or quarterly

government budget deficit           annually

money supply                        weekly

value of a stock market index       as transactions occur

Page 18: Data Analytics Basics & Understanding


Types of Data and Notation (cont'd)

• Examples of Problems that Could be Tackled Using a Time Series Regression

- How the value of a country's stock index has varied with that country's macroeconomic fundamentals.

- How the value of a company's stock price has varied when it announced the value of its dividend payment.

- The effect on a country's currency of an increase in its interest rate

• Cross-sectional data are data on one or more variables collected at a single point in time, e.g.

- A poll of usage of internet stock broking services

- Cross-section of stock returns on the New York Stock Exchange

- A sample of bond credit ratings for UK banks

Page 19: Data Analytics Basics & Understanding


Types of Data and Notation (cont'd)

• Examples of Problems that Could be Tackled Using a Cross-Sectional Regression

- The relationship between company size and the return to investing in its shares

- The relationship between a country's GDP level and the probability that the government will default on its sovereign debt.

• Panel Data has the dimensions of both time series and cross-sections, e.g. the daily prices of a number of blue chip stocks over two years.

• It is common to denote each observation by the letter t and the total number of observations by T for time series data, and to denote each observation by the letter i and the total number of observations by N for cross-sectional data.
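The three data types and the t/T and i/N notation can be illustrated with a small sketch. The ticker names and prices below are hypothetical, and storing a panel as a dictionary keyed by (entity, period) is just one convenient assumption, not something the slides prescribe.

```python
# Hypothetical daily prices for two stocks over three days (illustrative only).
# A panel observation is indexed by both entity i and period t.
panel = {
    ("AAA", 1): 10.0, ("AAA", 2): 10.5, ("AAA", 3): 10.2,
    ("BBB", 1): 20.0, ("BBB", 2): 19.5, ("BBB", 3): 20.4,
}

entities = sorted({i for i, _ in panel})   # cross-sectional units, i = 1..N
periods = sorted({t for _, t in panel})    # time periods, t = 1..T
N, T = len(entities), len(periods)

# Time series: one entity observed over all periods.
time_series = [panel[("AAA", t)] for t in periods]

# Cross-section: all entities observed at a single point in time.
cross_section = [panel[(i, 2)] for i in entities]
```

Fixing the entity gives a time series of length T, and fixing the period gives a cross-section of length N.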

Page 20: Data Analytics Basics & Understanding


Returns in Financial Modelling

• It is preferable not to work directly with asset prices, so we usually convert the raw prices into a series of returns. There are two ways to do this:

Simple returns:

$$R_t = \frac{p_t - p_{t-1}}{p_{t-1}} \times 100\%$$

or log returns:

$$R_t = \ln\left(\frac{p_t}{p_{t-1}}\right) \times 100\%$$

where $R_t$ denotes the return at time t, $p_t$ denotes the asset price at time t, and ln denotes the natural logarithm.

• We also ignore any dividend payments, or alternatively assume that the price series have been already adjusted to account for them.
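The two ways of converting prices into returns on this slide (simple returns and log returns, both in percent) can be sketched as follows. The function names and the price series are hypothetical, chosen only for illustration.

```python
import math

def simple_return(p_prev: float, p_now: float) -> float:
    """Simple return in percent: (p_t - p_{t-1}) / p_{t-1} * 100."""
    return (p_now - p_prev) / p_prev * 100.0

def log_return(p_prev: float, p_now: float) -> float:
    """Log return in percent: ln(p_t / p_{t-1}) * 100."""
    return math.log(p_now / p_prev) * 100.0

# Hypothetical raw prices, assumed already adjusted for dividends.
prices = [100.0, 102.0, 101.0, 105.0]

# Pair each price with its predecessor to get one return per period.
simple = [simple_return(a, b) for a, b in zip(prices, prices[1:])]
logs = [log_return(a, b) for a, b in zip(prices, prices[1:])]
```

For small price changes the two series are close, and they diverge as the moves get larger.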

Page 21: Data Analytics Basics & Understanding


Log Returns

• Log returns are also known as log price relatives, and they will be used throughout this book. There are a number of reasons for this:

1. They have the nice property that they can be interpreted as continuously compounded returns.

2. They can be added up, e.g. if we want a weekly return and we have calculated daily log returns:

$$\begin{aligned}
r_1 &= \ln(p_1/p_0) = \ln p_1 - \ln p_0 \\
r_2 &= \ln(p_2/p_1) = \ln p_2 - \ln p_1 \\
r_3 &= \ln(p_3/p_2) = \ln p_3 - \ln p_2 \\
r_4 &= \ln(p_4/p_3) = \ln p_4 - \ln p_3 \\
r_5 &= \ln(p_5/p_4) = \ln p_5 - \ln p_4
\end{aligned}$$

Adding these up, the intermediate terms cancel and the weekly return is

$$r_1 + r_2 + r_3 + r_4 + r_5 = \ln p_5 - \ln p_0 = \ln(p_5/p_0)$$
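The additivity of daily log returns over a week can be checked numerically. This is a minimal sketch with hypothetical prices p0 to p5. It also shows that daily simple returns do not sum to the weekly simple return.

```python
import math

# Hypothetical closing prices p0..p5 for one trading week (illustrative only).
prices = [50.0, 51.0, 50.5, 52.0, 51.5, 53.0]

# Daily log returns r_1..r_5.
daily_log_returns = [math.log(b / a) for a, b in zip(prices, prices[1:])]

# Their sum telescopes to ln(p5) - ln(p0) = ln(p5 / p0).
weekly_from_daily = sum(daily_log_returns)
weekly_direct = math.log(prices[-1] / prices[0])

# Simple returns compound rather than add, so their sum differs
# from the weekly simple return.
daily_simple = [b / a - 1 for a, b in zip(prices, prices[1:])]
weekly_simple = prices[-1] / prices[0] - 1
```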

Page 22: Data Analytics Basics & Understanding


A Disadvantage of using Log Returns

• There is a disadvantage of using the log-returns. The simple return on a portfolio of assets is a weighted average of the simple returns on the individual assets:

$$R_{pt} = \sum_{i=1}^{N} w_{ip} R_{it}$$

• But this does not work for the continuously compounded returns.
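The portfolio point can be seen with a two-asset example. This is a sketch under stated assumptions: the weights and prices are hypothetical, and the weights are taken to be value weights at time t-1 that sum to one.

```python
import math

weights = [0.6, 0.4]      # hypothetical portfolio weights at t-1, summing to 1
p_prev = [100.0, 50.0]    # hypothetical asset prices at t-1
p_now = [110.0, 45.0]     # hypothetical asset prices at t

simple = [b / a - 1 for a, b in zip(p_prev, p_now)]
logs = [math.log(b / a) for a, b in zip(p_prev, p_now)]

# The portfolio simple return is the weighted average of asset simple returns.
port_simple = sum(w * r for w, r in zip(weights, simple))

# The true portfolio log return follows from the portfolio simple return.
port_log_true = math.log(1 + port_simple)

# A weighted average of asset log returns is NOT the portfolio log return.
port_log_naive = sum(w * r for w, r in zip(weights, logs))
```

Here the weighted average of log returns understates the true portfolio log return, which is why portfolio aggregation is usually done with simple returns.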

Page 23: Data Analytics Basics & Understanding


Steps involved in the formulation of econometric models

Economic or Financial Theory (Previous Studies)

Formulation of an Estimable Theoretical Model

Collection of Data

Model Estimation

Is the Model Statistically Adequate?

No: Reformulate Model

Yes: Interpret Model

Use for Analysis

Page 24: Data Analytics Basics & Understanding


Some Points to Consider when reading papers in the academic finance literature

1. Does the paper involve the development of a theoretical model or is it merely a technique looking for an application, or an exercise in data mining?

2. Is the data of "good quality"? Is it from a reliable source? Is the size of the sample sufficiently large for asymptotic theory to be invoked?

3. Have the techniques been validly applied? Have diagnostic tests been conducted for violations of any assumptions made in the estimation of the model?

Page 25: Data Analytics Basics & Understanding


Some Points to Consider when reading papers in the academic finance literature (cont'd)

4. Have the results been interpreted sensibly? Is the strength of the results exaggerated? Do the results actually address the questions posed by the authors?

5. Are the conclusions drawn appropriate given the results, or has the importance of the results of the paper been overstated?

Page 26: Data Analytics Basics & Understanding


Objectives of Data Analytics

• Data reduction

• Structural simplification

• Analysis of dependence

• Analysis of interdependence

• Prediction & Forecasting

• Hypotheses construction and testing

• Strategy and policy implications

Page 27: Data Analytics Basics & Understanding


Course Modules

Module 1: Basic Applied Econometrics

Basics, probability distribution, regression analysis, issues and problems of regression analysis

Module 2: Advanced Econometrics

RRM, PDM, SEM

Module 3: Time series Econometrics

Integration and co-integration, VAR modelling, volatility modelling, bootstrapping

Module 4: Optimization Tools

Simple LPP, Integer programming, Goal programming, Simulation, AHP, WLP

Module 5: Soft computing

ANN, FL, GA, SVM

Page 28: Data Analytics Basics & Understanding


Modelling Structure

• Univariate structure

Central tendency, dispersion, skewness, kurtosis

• Bivariate structure

Covariance, correlation, regression

• Multivariate structure

Correlation, regression, factor analysis, conjoint analysis, cluster analysis, path analysis, MDS, AHP, SEM
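Regression appears under both the bivariate and the multivariate structure. As a minimal sketch of the bivariate case, assuming ordinary least squares with a single explanatory variable (the slide does not name an estimator), the slope is the ratio of the cross-deviation sum to the sum of squared deviations of x. The function name and data are hypothetical.

```python
def ols_fit(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data lying exactly on the line y = 1 + 2x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0, 5.0, 7.0, 9.0, 11.0]
intercept, slope = ols_fit(x, y)
```

Because the data lie exactly on a line, the fit recovers an intercept of 1 and a slope of 2 up to floating-point rounding.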

Page 29: Data Analytics Basics & Understanding


Statistical Modelling: A Basic Framework

Object/System

Research Design/Choice/Creativity

Univariate Modelling, Bivariate Modelling, Multivariate Modelling

Data Analysis

Interpretation and Conclusion

Page 30: Data Analytics Basics & Understanding


Research Process

Step 1: Define Research Problem

Step 2: Review of Literature (Review concepts and theories; Review previous research findings)

Step 3: Formulate Hypotheses

Step 4: Research Design

Step 5: Data Collection

Step 6: Data Analysis

Step 7: Interpretation

Page 31: Data Analytics Basics & Understanding


Soft Computing: Basics

• Soft computing is a term applied to a field within computer science which is characterized by the use of inexact solutions to computationally hard tasks such as the solution of non-deterministic polynomial (NP)-complete problems, for which there is no known algorithm that can compute an exact solution in polynomial time.

• Soft computing differs from conventional (hard) computing in that, unlike hard computing, it is tolerant of imprecision, uncertainty, partial truth, and approximation. In effect, the role model for soft computing is the human mind.

Page 32: Data Analytics Basics & Understanding


Tools of Soft Computing

• Artificial neural networks (ANN)

• Support Vector Machines (SVM)

• Fuzzy logic (FL)

• Evolutionary computation (EC), including:

 – Evolutionary algorithms

   • Genetic algorithms

   • Differential evolution

 – Metaheuristic and Swarm Intelligence

   • Ant colony optimization

   • Particle swarm optimization

• Ideas about probability including:

 – Bayesian network

• Chaos theory

• Wavelet analysis
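To give a flavour of one tool from this list, here is a toy genetic algorithm. It is only a sketch: the objective function, population size, mutation scale and the averaging crossover are all hypothetical choices made for illustration, not a method taken from the slides. It shows the soft-computing idea of accepting an approximate answer found by guided random search.

```python
import random

def fitness(x: float) -> float:
    """Toy objective with its maximum at x = 3 (hypothetical)."""
    return -(x - 3.0) ** 2

def genetic_search(pop_size=30, generations=50, seed=0):
    rng = random.Random(seed)
    pop = [rng.uniform(-10.0, 10.0) for _ in range(pop_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        parents = pop[: pop_size // 2]        # selection: keep the fitter half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = (a + b) / 2.0             # crossover: average two parents
            child += rng.gauss(0.0, 0.5)      # mutation: small random change
            children.append(min(10.0, max(-10.0, child)))
        pop = parents + children              # elitism: parents survive as-is
    best = max(pop, key=fitness)
    history.append(fitness(best))
    return best, history

best, history = genetic_search()
```

Because the fitter half always survives unchanged, the best fitness can never get worse from one generation to the next. The result is an approximation to the optimum, not an exact solution.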

Page 33: Data Analytics Basics & Understanding


Uni-variate Statistics

• Central Tendency

• Dispersion

• Skewness

• Kurtosis
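The four univariate summaries can be computed directly from central moments. This sketch assumes the population (divide by n) moment definitions, with skewness as the third and kurtosis as the fourth standardized moment; other conventions exist, such as sample-corrected or excess kurtosis. The data are hypothetical.

```python
import statistics

def central_moment(xs, k):
    """k-th central moment, dividing by n (population convention)."""
    m = statistics.fmean(xs)
    return sum((x - m) ** k for x in xs) / len(xs)

def skewness(xs):
    """Third standardized moment: asymmetry of the distribution."""
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5

def kurtosis(xs):
    """Fourth standardized moment (about 3 for a normal distribution)."""
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical sample

summary = {
    "mean": statistics.fmean(data),      # central tendency
    "median": statistics.median(data),   # central tendency
    "stdev": statistics.pstdev(data),    # dispersion
    "skewness": skewness(data),
    "kurtosis": kurtosis(data),
}
```

For this sample the mean is 5, the population standard deviation is 2, and the positive skewness reflects the longer right tail.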

Page 34: Data Analytics Basics & Understanding

Bi-variate Statistics

• Covariance

• Correlation
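Covariance and correlation can be sketched in a few lines. The version below assumes the sample (n - 1) covariance and the Pearson correlation coefficient; the function names and data are hypothetical.

```python
import math

def covariance(xs, ys):
    """Sample covariance with the n - 1 divisor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

# Hypothetical paired observations with a strong positive relationship.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]

cov_xy = covariance(x, y)
corr_xy = correlation(x, y)
```

Covariance depends on the units of x and y, while correlation is unit-free and bounded between -1 and 1.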

Page 35: Data Analytics Basics & Understanding


Why Multivariate Modelling

• Applicability: Client fields use these techniques

• Quantification: Create the habit of looking at the strength of a relationship, not just the significance.

• Creativity: Make introductory statistics give techniques that let students express their own interests.

• Empowerment: Move from paradox to understanding.

Page 36: Data Analytics Basics & Understanding


How to Teach Multivariate Modelling to Intro. Students

Replace algebra with computation, simulation and geometry.

• Simulation: Confidence intervals via bootstrapping; hypothesis testing via randomization of explanatory variables.

• Geometry: Regression as projection; ANOVA as Pythagorean vector decomposition; p-values from subtended angles.
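The two simulation ideas can be sketched with the standard library alone. The code assumes a percentile bootstrap for the mean and a randomization test that shuffles the explanatory variable; the slide names neither variant, so both are illustrative choices, and the function names and data are hypothetical.

```python
import random
import statistics

def bootstrap_ci(sample, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean."""
    rng = random.Random(seed)
    n = len(sample)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean([rng.choice(sample) for _ in range(n)])
        for _ in range(n_boot)
    )
    lower = means[int(n_boot * alpha / 2)]
    upper = means[int(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper

def cross_deviation(xs, ys):
    """Sum of cross-deviations; proportional to the regression slope."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys))

def randomization_p_value(xs, ys, n_perm=2000, seed=0):
    """Two-sided p-value from shuffling the explanatory variable."""
    rng = random.Random(seed)
    observed = abs(cross_deviation(xs, ys))
    shuffled = list(xs)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between x and y
        if abs(cross_deviation(shuffled, ys)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Hypothetical data with a strong linear relationship.
x = [float(i) for i in range(1, 11)]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9]

ci_low, ci_high = bootstrap_ci(y)
p_value = randomization_p_value(x, y)
```

Neither procedure needs a formula for the sampling distribution, which is the point of replacing algebra with simulation.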

Page 37: Data Analytics Basics & Understanding


Data Modelling and Packaged Software

• SPSS • EVIEWS • MICROFIT • GAUSS • LIMDEP • MATLAB • AMOS • MINITAB • STATISTICA • RATS • SYSTAT • STATA • LISREL • SAS • TSP • SHAZAM • DEA