
Statistics for Social Sciences I (E563)

Prof. Sudip Ranjan Basu, Ph.D

25 September 2008


Lecture 2-Sudip R. Basu 2

« A statistical tie »

Think about these bar diagrams…


Measurement in Statistics

• Concepts of measurement:
• Measurement: a very specific process of assigning a number to a variable
    – Assignment by category (categorical/qualitative attributes)
    – Assignment by amount
        » assignment of a person to a particular category of a variable
    – Validity:
        • to describe the objective and accurately reflect the concept
        • to measure by a particular scale or index
        – Face validity/Content validity/Criterion validity/Construct validity
    – Reliability:
        • to have consistency of the data collected
        • likelihood that the scale gives consistent results when the measurement is repeated
        • Free of measurement errors
        – Split-half reliability/test-retest reliability


Forms of ‘variable’

• Variables: Concepts that vary, or change, from one observation to another in a sample or population

• Measurement scales differ

• Different statistical methods apply to quantitative and qualitative variables

Variable
    – Quantitative: the measurement scale has numerical values that imply amounts (e.g. annual income)
    – Categorical/Qualitative: the measurement scale is a set of categories that do not imply amounts (e.g. marital status)


Scales of measurement

• Quantitative variable:
    • Interval scale
        Annual income (CHF 50 − CHF 30 = CHF 20: the distance between two values is meaningful)
• Qualitative variable:
    • Unordered/nominal scale
        Primary mode of transportation (bus, tram, bicycle, walk)
• Qualitative variable:
    • Ordered/ordinal scale
    • Involves a rank order or other ordering
        Political philosophy (liberal, moderate, conservative)


Quantitative aspects of ordinal data

• Interval scale:
    – Class interval: An interval that indicates the space between two end points
    – Quantitative
        • vary in magnitude
• Nominal scale:
    – Qualitative
        • vary in quality not in quantity
• Ordinal scale:
    – Quantitative-qualitative
        • vary in quality not in quantity
    – Each level has a greater or smaller magnitude
    – Numerical scale by assigning numerical scores to categories
    – Often treated as interval rather than nominal
    – Sensitivity analysis: check whether conclusions change with the choice of scores
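
The idea of scoring ordinal categories and then running a sensitivity analysis can be sketched in a few lines of Python. Everything here is an illustrative assumption: the two groups, their counts, and both scoring schemes are made up, and only the category names come from the political philosophy example. The sketch compares two groups under an equally spaced and an unequally spaced scoring and checks whether the direction of the difference survives the change.

```python
# Hypothetical counts of an ordinal variable in two groups (illustration only).
group_a = {"liberal": 10, "moderate": 20, "conservative": 10}
group_b = {"liberal": 5, "moderate": 15, "conservative": 20}

# Two candidate scorings that respect the category order; both are assumptions.
equal_spacing = {"liberal": 1, "moderate": 2, "conservative": 3}
unequal_spacing = {"liberal": 1, "moderate": 2, "conservative": 5}


def mean_score(counts, scores):
    """Mean of the numerical scores, weighted by the category counts."""
    total = sum(counts.values())
    return sum(scores[category] * n for category, n in counts.items()) / total


def compare(scores):
    """Difference in mean score, group B minus group A."""
    return mean_score(group_b, scores) - mean_score(group_a, scores)


diff_equal = compare(equal_spacing)
diff_unequal = compare(unequal_spacing)

# The conclusion is robust to the scoring if both differences share a sign.
same_direction = (diff_equal > 0) == (diff_unequal > 0)

print(f"equal spacing:   difference = {diff_equal:.3f}")
print(f"unequal spacing: difference = {diff_unequal:.3f}")
print(f"same direction under both scorings: {same_direction}")
```

The size of the difference depends on the scores chosen, so the useful question is whether the substantive conclusion (which group scores higher) stays the same.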


Discrete and Continuous

• Discrete: A set of values that form separate numbers, such as 0, 1, 2, …
    • Unit of measurement cannot be subdivided
        » Number of siblings
        » Number of visits to a physician last year
• Categorical variables: nominal or ordinal
• Quantitative variables: discrete (number of siblings) or continuous (age)
• Continuous: An infinite continuum of possible real number values
    • Any real number possible between two values
        » Height
        » Weight


Summarize types of variables


Describing data
• Categorical data:
    – Frequency: headcounts or tallies indicating the number of cases in a particular category, or the total number of cases measured (the number of observations)
    – Scores: numbers that are used to represent amounts or rankings
    – Relative frequency
        • The proportion (number of observations in a category divided by the total number of observations) or percentage (proportion multiplied by 100) of the observations that fall in that category
        • The sum of the proportions equals 1.00
    – Frequency distribution
        • A tabulation that lists possible values for a variable, together with the number of observations at each level
    – Relative frequency distribution
        • A listing of possible values together with their proportions or percentages
• Quantitative data:
    – Frequency distribution
        • Intervals of values in frequency distributions are usually of equal width
        • Mutually exclusive intervals
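
The definitions of frequency and relative frequency can be made concrete with a short Python sketch. The responses below are hypothetical (they reuse the transportation categories from the nominal-scale example and are not course data), and the function names are chosen here for illustration only.

```python
from collections import Counter

# Hypothetical responses for a categorical variable (not data from the course).
responses = ["bus", "tram", "bus", "walk", "bicycle", "bus", "tram", "walk"]


def frequency_distribution(values):
    """Return {category: count}, ordered by category name."""
    counts = Counter(values)
    return {category: counts[category] for category in sorted(counts)}


def relative_frequency_distribution(values):
    """Return {category: proportion}; the proportions sum to 1."""
    freq = frequency_distribution(values)
    total = sum(freq.values())
    return {category: count / total for category, count in freq.items()}


freq = frequency_distribution(responses)
rel = relative_frequency_distribution(responses)

# One row per category: count, proportion, percentage.
for category in freq:
    print(f"{category:<8} {freq[category]:>3} {rel[category]:6.3f} {100 * rel[category]:6.1f}%")
```

Each percentage is simply the proportion multiplied by 100, so the percentage column sums to 100 whenever the proportion column sums to 1.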


Bar graphs

[Figure: Bar diagram of native language speakers, E563, by languages. Vertical axis: Native Language Speakers (0 to 20). Categories: Asian, EU-other, English, French, German. Bar labels as extracted: 11, 18, 15, 20, 6. Source: Statistics Class 1, SRBasu]


Comparing groups

• Compare: Same variable and different groups

• Relative frequency distributions
• Histograms
• Stem-and-leaf plots


Population and sample distribution

• The sample distribution is a ‘blurry’ picture of the population distribution
    – As the sample size increases, the sample proportion in any interval gets closer to the true population proportion

• Sample distribution → population distribution
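
A small simulation can illustrate the claim that the sample proportion approaches the population proportion as the sample grows. This is a minimal sketch under stated assumptions: the population proportion of 0.30, the seed, and the sample sizes are all hypothetical choices, and each observation is drawn independently.

```python
import random


def sample_proportion(population_proportion, sample_size, rng):
    """Proportion of observations falling in the interval, for one random sample."""
    hits = sum(rng.random() < population_proportion for _ in range(sample_size))
    return hits / sample_size


rng = random.Random(2008)  # fixed seed so the run is repeatable
true_proportion = 0.30     # hypothetical population proportion

for n in (10, 100, 1_000, 10_000, 100_000):
    p_hat = sample_proportion(true_proportion, n, rng)
    error = abs(p_hat - true_proportion)
    print(f"n = {n:>6}  sample proportion = {p_hat:.4f}  error = {error:.4f}")
```

The error does not shrink monotonically from one run to the next, but it tends to get smaller as n grows, which is the sense in which the picture becomes less blurry.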


Shape of a distribution
• Shapes of distributions differ
    – Symmetric
    – Skewed


SESSION 2 of Lecture 2


Working with STATA

http://www.ststa.com


Getting started with STATA

• Four windows open automatically after clicking the STATA icon:

• The most visible window is the Results Window, which shows results from commands you have typed in the Command Window.

• The Command Window is below the Results Window; this is where all your commands are typed.

• The Review Window lists all commands that have been entered from the Command Window. When you click on a command in the Review Window, it is pasted into the Command Window.

• The Variables Window lists all working variables in the file. When you click on a variable, it appears in the Command Window.


STATA window


Simple commands
• The data editor allows you to enter, view, or edit your working data file. Caution: This window must be closed in order to run commands in STATA.

• The do-file editor allows you to write, edit, and save STATA commands. STATA commands can be run from the do-file editor. These files are called do-files because they have the file extension .do

• Note: STATA treats lines that begin with an asterisk * or text between a pair of /* and */ as comments.


Save-Close files
• Open/Save/Close a data file using the icons at the top of the screen, the “File” menu, or commands in the Command Window.

• A STATA dataset is saved in the .dta format.

• You can use a separate program called Stat Transfer to translate a dataset from its current format into STATA format.

• Researchers prefer this program for large datasets. It retains any variable or value labels from the original file.


Help-Search
• Memory: STATA can handle large datasets once enough memory is allocated. For example, you can set a memory size of 20m with the following command in the Command Window:
    . set memory 20m
• The Help/Search facilities in STATA let you look up any command. Type help in the Command Window, or use the drop-down Help menu, which opens a separate window. You can also use the findit command for more information.

• If you do not know the STATA command name, use the Search facility from the drop-down Help menu. If you do know it, for example describe, type:
    . help describe
• STATA uses a simple command syntax. Almost all commands follow the structure:
    . command variable (variable variable…) , options


Creating a new dataset

• The easy way to create a dataset is to type values for each variable in the Data Editor, in columns that STATA automatically calls var1, var2, etc. Thus, var1 contains the names of students, var2 their statistics competency, and so forth.

• Rename and label:
    . rename var1 students
    . label variable students "Students in Statistics, 2008-2009"

• After typing in the information, close the window and save the data, say as stat2.dta:
    . save stat2


Working with Sample

• Specifying subsets of the data: You can restrict a command to a subset of the data by adding an in or if qualifier. For example, to use only the 1st through 25th observations, type
    . list in 1/25
    . sort origin
    . list origin program in 1/25
• The if qualifier also has broad applications; it selects observations based on specific variable values, such as
    . summarize if stat==1


Describing data
• Frequency tables and two-way cross-tabulations: You can tabulate categorical variables. Use the dataset stat to tabulate the categorical variable programme:
    . tabulate programme
• You can do a cross-tabulation of programme by stat:
    . tabulate programme stat
• To get column percentages, type
    . tabulate programme stat, column
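
For readers without STATA at hand, the arithmetic behind a two-way table with column percentages can be sketched in plain Python. This is not STATA output and does not reproduce its layout: the variable names programme and stat follow the slide, while the category labels and all values are hypothetical.

```python
from collections import Counter

# Hypothetical (programme, stat) pairs; names follow the slide, values are made up.
records = [
    ("MA", 1), ("MA", 0), ("MA", 1), ("MA", 1),
    ("PhD", 1), ("PhD", 1), ("PhD", 0), ("PhD", 0),
]

cell_counts = Counter(records)  # frequency of each (row, column) cell
rows = sorted({programme for programme, _ in records})
cols = sorted({stat for _, stat in records})
col_totals = {c: sum(cell_counts[(r, c)] for r in rows) for c in cols}


def column_percent(row, col):
    """Share of column `col` that falls in row `row`, in percent."""
    return 100 * cell_counts[(row, col)] / col_totals[col]


# One line per programme: count and column percentage for each value of stat.
for r in rows:
    cells = "  ".join(
        f"{cell_counts[(r, c)]:>2} ({column_percent(r, c):5.1f}%)" for c in cols
    )
    print(f"{r:<4} {cells}")
```

Column percentages answer the question "within each value of stat, how are observations spread across programmes?", so each column sums to 100.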


Data tabulation
• Multiple tables and multi-way cross-tabulations: To produce one-way tables for several variables at once, type
    . tab1 origin programme stat
    . tab1 programme-education
• You can get multiple two-way tables, that is, cross-tabulations of every two-way combination of the listed variables, type
    . tab2 origin programme stat
• To produce multi-way tables, if we do not need percentages or statistical tests, type
    . table programme, contents(freq)
• To produce a two-way frequency table or cross-tabulation, type
    . table origin programme, contents(freq)
• To produce more complicated tables, type
    . table origin programme, contents(freq) by(stat)


GRAPHS with STATA
• You can draw bar charts, type:
    . graph bar stat, over(programme) blabel(bar) bar(1, bcolor(gs10))
    . graph bar stat, over(programme) legend(label(1 "Frequency")) ytitle("Native Language Speakers") title("Bar diagram of native language speakers, E563") subtitle("by languages") note("Source: Statistics Class 1, SRBasu")
    . graph bar stat word, over(programme) blabel(bar) bar(1, bcolor(gs10)) bar(2, bcolor(gs7))

• You can draw horizontal bar charts, type:
    . graph hbar stat, over(programme) blabel(bar) bar(1, bcolor(gs10))
    . graph hbar stat word, over(programme) blabel(bar)


Working with datasets

See Week 2 web-course material

1) Assignment_1 Datasets:

2) Week2_Students Profile
3) Week2_World Socio-economic data


Week 3, 2 October
• Descriptive Statistics
    » Measures of Central Tendency and Dispersion, Moments, Skewness, and Kurtosis
• Readings:
    » AF-Chapter 3 (p.39-60)
    » MS-Chapter 4, MS-Chapter 5
• Assignment: Assignment 2
    » Note: Each student should turn in his/her own paper in hard copy to the teaching assistant at Rigot Office No. 31 or in class on Thursday 9 October (Week 4).
