Data Analyses Skills (ID6020 Module) Rahul R. Marathe Department of Management Studies

Upload: vanessa-hoover

Post on 22-Dec-2015



Data Analyses Skills

(ID6020 Module)

Rahul R. Marathe

Department of Management Studies


Introduction: Why?

Numbers everywhere!

-- Last year, ID6020 had 386 students registered. This year the number is 405.

-- Average time required to complete a typical catalysis experiment under laboratory conditions is 34.7.

• Successful professionals are those who can make sense of these numbers.

• In today’s world, it is more a case of information overload – too much data! It is our job to make this data tell us a story!

• Sort out what is important and what is not!


Introduction: Why?

• Whether you will be audited by the income tax authorities depends a lot on the sampling techniques used by the income tax (IT) department, and also on whether you hit certain numerical signals.

• Urban traffic planning is done using data collected from various locations in a city.

• Market research firms use statistical techniques on point-of-sale data to understand buyer behavior.

• Suitability of a drug is decided by analyzing the field data collected from trials conducted.

• That’s why every professional should know these techniques.


Introduction: Why?

• Data analysis was traditionally done through “statistical techniques”; in recent times, we call this “data analytics”.

• Today, data analytics encompasses areas like: Statistics (uni- and multivariate), Probability theory, Stochastic processes, Computational methods, Optimization techniques, Data mining, Artificial Intelligence, Econometrics, Numerical techniques, Simulation, ...

• Data analysis – Understanding the story told by the numbers!


Introduction: Why?

• Very likely, your research will involve data collection and analysis.

• Data could be experimental (most engineering applications), or secondary data (from surveys – humanities and management).

• Data collection and analyses require a deep understanding of the theory and techniques of data analytics.

• Your research area itself could be data analytics.

• You certainly require a good understanding of theory and techniques!


Introduction: Data

• Data: Any related observations.

• A collection of data is the data set and a single observation is a data point.

• Data can be collected by:
1. Observations of incidences occurring (direct recording)
2. Surveys (and sampling)
3. Conducting experiments etc.

• Data collection is the most important step, because if the collected data is not correct, the analyses and conclusions are incorrect and misleading!


Data collection

Before relying on any data, test the data by asking:

• Where did the data come from? Is the source biased?

• Do the data support or contradict other evidence we have?

• Is evidence missing that might cause us to come to a different conclusion?

• How many observations do we have? Do they represent all the groups we wish to study?

• Are the conclusions logical? Have we made conclusions that are not supported by data?


Example of misleading data

• A trucking company advertises: “75% of everything you use travels by truck.”

• What do you conclude?


Before the data analyses….

Identify: Samples and population

• A population is a collection of all the elements one wants to study and about which one is trying to draw conclusions.

• A sample is a collection of some, but not all, of the elements of a population.

Consider a beauty soap targeted at middle-class women customers aged between 18 and 45 years. The population is the entire set of middle-class females aged between 18 and 45. But you need to be careful about the definition of “middle-class”. Clearly, a school girl is not a member of the population.

A sample is any subset of the above set.


Before the data analyses….

• Identify and classify variables

Types of scales

Scale | Data type | Description | Example
Nominal | Qualitative | Data arranged in unordered categories | Gender {Male, Female}; Software {Code A, Code B}
Ordinal | Qualitative | Ordered categories | Quality of chemical {poor, average, good}
Interval | Quantitative | Rank and distance from arbitrary zero | Temperature (difference works, ratio doesn’t!)
Ratio | Quantitative | Interval + ratio with a meaning | Weight (an object weighing 20 kg is twice as heavy as an object weighing 10 kg)


Quick check

• Can variables with nominal scale be quantitative? Yes or No.
No – Nominal scale has categories. Categories are for qualitative data.

• Can variables with ordinal scale be qualitative? Yes or No.
Could be qualitative; could be quantitative. So yes!

• Can nominal or ordinal scale be continuous? Yes or No.
No! Nominal or ordinal scale is for categorical data. Categorical variables are discrete.

• Can interval scale be continuous and/or discrete? Yes or No.
It can be either continuous or discrete.


Before the data analyses….

• Check and question the assumptions made:
A. Linearity
B. Normality
C. Symmetry
D. Effect of uncommon observation


Example

Pressure | Current
12.1 | 4
12.5 | 3.9
12.9 | 4.11
13.4 | 4.4
14.9 | 2.01


Example (cont.)

Pressure | Current
12.1 | 4
12.5 | 3.9
12.9 | 4.11
13.4 | 4.4
14.9 | 2.01
14 | 3.7
14.8 | 2.75
11.8 | 3.45
14.65 | 2.68
14.2 | 2.9
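One possible reading of the two tables above, offered as an assumption because the slides do not state the intended takeaway: a relationship that looks linear on a few points can look quite different once an uncommon observation or more data are included. The minimal Python sketch below computes the Pearson correlation on the first four rows, the first five rows, and all ten rows. The helper name pearson_r is hypothetical and not from the slides.

```python
import math

# Pressure/current pairs copied from the table above (units are not given).
pressure = [12.1, 12.5, 12.9, 13.4, 14.9, 14, 14.8, 11.8, 14.65, 14.2]
current = [4, 3.9, 4.11, 4.4, 2.01, 3.7, 2.75, 3.45, 2.68, 2.9]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r_first4 = pearson_r(pressure[:4], current[:4])  # before the 14.9 / 2.01 row
r_first5 = pearson_r(pressure[:5], current[:5])  # with the 14.9 / 2.01 row
r_all = pearson_r(pressure, current)             # all ten rows

print(f"first 4 rows: r = {r_first4:+.2f}")
print(f"first 5 rows: r = {r_first5:+.2f}")
print(f"all 10 rows:  r = {r_all:+.2f}")
```

On these numbers the sign of the correlation changes from positive (first four rows) to negative once the fifth row is added, which is why an uncommon observation deserves a second look before any model is fitted.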


Before the data analyses….

• Understand the purpose: Data analysis is done to identify and understand patterns in data and use this information to make better decisions.

DATA = STRUCTURE + NON-STRUCTURE
DATA = EXPLAINED BEHAVIOR + WHITE NOISE


Steps in data analysis

• Once data is collected, we need to clean the data, and then summarize, interpret and make sense of it.

• Three categories:
1. Descriptive: How can the data be summarized?
2. Inferential: How can we draw inferences from the data?
3. Predictive: How can we build predictive models using the data available?


Summary of data

• Describe the data in a graphical or statistical way.

Some commonly used graphical tools – Frequency distribution tables; Line charts; Histogram; Higher dimensional plots; Scatter plot

Use of summary statistics –

• Measures of central tendency (measures of location). Examples?

• Measures of dispersion (extent of scatter). Examples?

• Measure of symmetry (skewness)

• Etc.
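As one set of possible answers to the "Examples?" prompts above, the sketch below computes a few common summary statistics using only Python's standard library. The input reuses the "Current" column from the pressure/current example earlier in the slides; that choice of data is an assumption made for illustration, and the moment-based skewness formula is one of several conventions.

```python
import statistics

# "Current" column from the earlier pressure/current example (illustrative input).
current = [4, 3.9, 4.11, 4.4, 2.01, 3.7, 2.75, 3.45, 2.68, 2.9]

# Measures of central tendency (location).
mean = statistics.mean(current)
median = statistics.median(current)

# Measures of dispersion (scatter).
stdev = statistics.stdev(current)          # sample standard deviation (n - 1)
value_range = max(current) - min(current)

# Measure of symmetry: moment-based skewness m3 / m2**1.5 (population moments).
n = len(current)
m2 = sum((v - mean) ** 2 for v in current) / n
m3 = sum((v - mean) ** 3 for v in current) / n
skewness = m3 / m2 ** 1.5

print(f"mean={mean:.3f} median={median:.3f}")
print(f"stdev={stdev:.3f} range={value_range:.2f}")
print(f"skewness={skewness:.3f}")
```

A negative skewness together with a median above the mean points to a longer tail on the low side of the data.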


Interpretation and prediction

Should depend on:

• Data (variable) type;

• Amount of data;

• Expected type of conclusions.

• Data type:

Independent variable X | Dependent variable Y: Quantitative | Dependent variable Y: Qualitative
Quantitative | Correlation, Regression | Convert X into qualitative
Qualitative | ANOVA | Crosstabulation (e.g. Pivot)
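The table above is essentially a lookup from the pair (type of X, type of Y) to a family of techniques. A minimal sketch of that lookup follows; the names METHOD_BY_TYPES and suggest_method are hypothetical, and the entries simply restate the table rather than add any new recommendation.

```python
# Keys are (type of independent variable X, type of dependent variable Y).
METHOD_BY_TYPES = {
    ("quantitative", "quantitative"): "Correlation, Regression",
    ("quantitative", "qualitative"): "Convert X into qualitative",
    ("qualitative", "quantitative"): "ANOVA",
    ("qualitative", "qualitative"): "Crosstabulation (e.g. Pivot)",
}


def suggest_method(x_type, y_type):
    """Return the technique family the table lists for the given variable types."""
    key = (x_type.strip().lower(), y_type.strip().lower())
    if key not in METHOD_BY_TYPES:
        raise ValueError(f"unknown variable types: {key}")
    return METHOD_BY_TYPES[key]


# Example: qualitative X (say, material) and quantitative Y (say, load at failure).
print(suggest_method("Qualitative", "Quantitative"))
```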


Example: Bridge failure

Material | Design Load | Corridor | Support | Status
Concrete | 100 tons | Bangalore | Central | Failed
Tar | 75 tons | Ahmedabad | Multiple | Failed
Tar | 150 tons | Mumbai | Multiple | Still there!
Concrete | 125 tons | Bareily | Beams | Failed
Synthetic | 200 tons | Gangtok | Central | Still there!
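Material and Status are both qualitative, so a crosstabulation is one natural first look at this table. The sketch below counts bridges per (Material, Status) pair with the standard library; it is an illustration under that assumption, not the analysis prescribed by the slides.

```python
from collections import Counter

# Rows copied from the bridge table: material, design load, corridor, support, status.
bridges = [
    ("Concrete", "100 tons", "Bangalore", "Central", "Failed"),
    ("Tar", "75 tons", "Ahmedabad", "Multiple", "Failed"),
    ("Tar", "150 tons", "Mumbai", "Multiple", "Still there!"),
    ("Concrete", "125 tons", "Bareily", "Beams", "Failed"),
    ("Synthetic", "200 tons", "Gangtok", "Central", "Still there!"),
]

# Count how often each (material, status) combination occurs.
crosstab = Counter((material, status) for material, _, _, _, status in bridges)

materials = sorted({row[0] for row in bridges})
statuses = sorted({row[4] for row in bridges})

# Print a small material-by-status table.
print("Material".ljust(12) + "".join(s.ljust(14) for s in statuses))
for material in materials:
    counts = "".join(str(crosstab[(material, s)]).ljust(14) for s in statuses)
    print(material.ljust(12) + counts)
```

With only five rows, any pattern in the counts is a prompt for further questions rather than a conclusion.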


Questions to ask

• Want to know: Reasons for failure

• Also: factors that may contribute to failure

• Is the data valid?

• Is the data sufficient?

• Can the conclusions be extrapolated?

• Possible methodology: Clustering algorithms.

• Interpretation depends on whether you look at this problem as a civil engineer, management researcher, or a computer scientist!


Example: Chemical reaction

• Time required to complete a chemical reaction in a set of experiments:

24.2, 20.15, 17.11, 14.83, …

Do you see a trend? Can we be more specific?

Solution methodology: Forecasting

What if the data has uncertainty?
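One way to "be more specific" about the trend is to look at successive differences and their ratios. The sketch below does this for the four listed values only; the values elided by "…" are not filled in. The one-step estimate at the end assumes the differences keep shrinking by the same ratio, which is an assumption for illustration and not a result stated in the slides.

```python
# The four reaction times listed on the slide.
times = [24.2, 20.15, 17.11, 14.83]

# Successive differences: how much the time drops from one experiment to the next.
diffs = [later - earlier for earlier, later in zip(times, times[1:])]

# Ratios of successive differences: a near-constant ratio suggests geometric decay.
ratios = [d2 / d1 for d1, d2 in zip(diffs, diffs[1:])]
mean_ratio = sum(ratios) / len(ratios)

# Naive one-step forecast, assuming the same ratio continues to hold.
next_estimate = times[-1] + diffs[-1] * mean_ratio

print("differences:", [round(d, 2) for d in diffs])
print("ratios:", [round(r, 3) for r in ratios])
print("one-step estimate:", round(next_estimate, 2))
```

Both ratios come out close to 0.75, so each drop is roughly three quarters of the previous one. If the data carried measurement noise, the ratios would scatter, and a fitted model with an error estimate would be more appropriate than this direct calculation.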


Example: Regression
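The transcript gives no data for this example, so the sketch below reuses the ten pressure/current pairs from the earlier example as an assumed input and fits a straight line by ordinary least squares. The helper name fit_line is hypothetical.

```python
# Pressure/current pairs from the earlier example (assumed input for illustration).
pressure = [12.1, 12.5, 12.9, 13.4, 14.9, 14, 14.8, 11.8, 14.65, 14.2]
current = [4, 3.9, 4.11, 4.4, 2.01, 3.7, 2.75, 3.45, 2.68, 2.9]


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(pressure, current)

# Residuals show how much of the data the straight line leaves unexplained.
residuals = [y - (intercept + slope * x) for x, y in zip(pressure, current)]

print(f"current = {intercept:.2f} + ({slope:.3f}) * pressure")
print("largest absolute residual:", round(max(abs(r) for r in residuals), 2))
```

Checking the residuals before trusting the line ties back to the earlier advice to question linearity and the effect of uncommon observations.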


Example: Nonlinear relationships


What should you be asking?

“Average time required to complete a typical catalysis experiment under laboratory conditions is 34.7.”

• What do you mean by “typical”?

• What do you mean by “laboratory conditions”?

• What were the other sample values? Was the average value affected by extreme values?

• What are the units?
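To see why the question about extreme values matters, the sketch below compares the mean and the median before and after one low reading is added. The numbers are borrowed from the "Current" column of the earlier pressure/current example; they are not catalysis measurements and serve only as an illustration.

```python
import statistics

# Four readings, then the same readings plus one unusually low value.
readings = [4, 3.9, 4.11, 4.4]
with_extreme = readings + [2.01]

mean_before = statistics.mean(readings)
mean_after = statistics.mean(with_extreme)
median_before = statistics.median(readings)
median_after = statistics.median(with_extreme)

print(f"mean:   {mean_before:.3f} -> {mean_after:.3f}")
print(f"median: {median_before:.3f} -> {median_after:.3f}")
```

The mean shifts much more than the median, so a reported "average" on its own does not reveal whether a single extreme value is driving it.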


Courses related to data analyses

Every department has some course(s) on analyses of data and modeling using data.

• Computational aerodynamics (AS5330)

• Analytical methods in transportation engineering (CE5390)

• Mathematical methods in thermal engg (ME6170)

• Modeling and simulation in manufacturing (ME7240)

• Mathematical methods in materials engg (MM5590)

• Probability and Statistics courses offered by Mathematics, MS and many other departments.


Courses related to data analyses

• Stochastic processes (multiple courses offered by EE, Mathematics, MS)

• Multiple courses offered by CSE (on data mining, AI, Data structures, Big Data)

• Optimization courses offered by CH, Mathematics, MS etc.

• Econometrics courses offered by HS, MS.

• These courses will probably not teach you how to draw a 3D plot using the data you have, or how to interpret the same.

• But these courses will help you understand the numbers and analysis in your research!


Tools for data analyses

Institute license, available on super-computing server:

• Abaqus

• Ansys

• LAMMPS

• Matlab

• Mathematica

• Many more!

• SPSS – Many departments have licenses. R is available free over the internet.

• Old friend: MS Excel


What should you be reading?

• Start from basic Data Analysis textbooks – understand the basics first.

• Read the advanced texts and research articles – need-based learning (see what you require, understand the prerequisites and then master the technique).

• General reading should never stop! E.g. “Freakonomics”: to understand what fun one can have simply by playing with data!


Data analyses

Do’s:

• Apply the correct analysis technique

• Understand the assumptions of the method

• Enter the data in the selected technique correctly

• Use the correct equations/software

• Be very careful about the conclusions you draw.

Don’ts:

• Try each and every technique to decide which “looks” good.

• Get fooled by jazzy graphs and colors.

• Extrapolate results and conclusions.


Final word

• Data analyses skills are extremely important and useful.

• Every researcher is going to require these skills at some point or the other.

• Equip yourself with these techniques and you are better prepared for the battle of logic.

• These weapons in your armory have to be used carefully, and after knowing their capabilities (and limitations).

• Don’t make the mistake of beating everything with the same stick – different demons require different tools!


Best wishes!!

Questions? Comments?

rrmarathe_at_iitm.ac.in