SIAP-SRTC Training Course on Sampling, Acceed Center, AIM, Makati, Philippines, 4 April 2002
Post on 19-Mar-2016
2000 SPSS Public Sector User Exchange
*SIAP-SRTC Training on Sampling
OUTLINE
Statistical Computing Resources
Data Management with Stata
Table Generation
Tab and Table Commands
Survey Commands

Computing Resources
The Age of ICT has brought about a synergy of computing and communications.
Implications:
More DATA collected
More DATA stored
More DATA accessible and distributed

Computing Resources
A host of statistical software packages provide pre-programmed analytical and data management capabilities. These packages may be classified according to use and cost.

Computing Resources
Types of statistical software by usage:
General purpose: SAS, SPSS, R, S-Plus, Statistica, Stata
Special purpose: econometric modeling (EViews), seasonal adjustment (X12), Bayesian modeling (WinBUGS), survey data tabulation and variance estimation (IMPS, CENVAR)

Computing Resources
Types of statistical software by cost:
Commercial software: SAS, SPSS, Stata, S-Plus
Freeware: R, IMPS, X12

Computing Resources
FOR SURVEY DATA
Bascula from Statistics Netherlands
CENVAR (and IMPS) from U.S. Bureau of the Census
CLUSTERS from University of Essex
Epi Info from Centers for Disease Control
Generalized Estimation System (GES) from Statistics Canada
IVEware (beta version) from University of Michigan

Computing Resources
FOR SURVEY DATA
PCCARP from Iowa State University
SAS/STAT from SAS Institute
Stata from Stata Corporation
SUDAAN from Research Triangle Institute
VPLX from U.S. Bureau of the Census
WesVar from Westat, Inc.

Computing Resources
Lists of statistical software:
http://members.aol.com/johnp71/javasta2.html
http://www.stir.ac.uk/Departments/HumanSciences/SocInfo/Statistical.htm
http://www.fas.harvard.edu/~stats/survey-soft/
http://www.feweb.vu.nl/econometriclinks/software.html

Computing Resources
This afternoon, we will demonstrate how to use Stata for some of the most common tasks of data management, statistical computing, and analysis of survey data.

Computing Resources
Stata
Estimation of means, totals, ratios, and proportions; linear regression, logistic regression, and probit. Point estimates, associated standard errors, confidence intervals, and design effects are displayed for the full population or for subpopulations.

Computing Resources
Stata
Auxiliary commands display various information for linear combinations (e.g., differences) of estimators and conduct hypothesis tests.
New in Stata: contingency tables with Rao-Scott corrections of chi-squared tests; new survey-corrected regression commands, including tobit, interval, censored, instrumental variables, multinomial logit, ordered logit and probit, and Poisson.

Computing Resources
Stata
Stata handles stratified designs and cluster sampling. FPCs can be calculated for simple random sampling without replacement of sampling units within strata. Variance estimation for multistage sample data is carried out through the customary between-PSU squared-differences calculation.
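
The between-PSU squared-differences calculation can be illustrated with a short Python sketch. It assumes the textbook estimator for the variance of an estimated total in a stratified design, built from weighted PSU totals, with an optional FPC of the form (1 - n_h/N_h). The function name, the data, and the population PSU counts are hypothetical; this is an illustration of the idea, not Stata's own code.

```python
from collections import defaultdict


def between_psu_variance(psu_totals, population_psus=None):
    """Variance of an estimated total from weighted PSU totals.

    psu_totals: iterable of (stratum, weighted_psu_total) pairs.
    population_psus: optional dict mapping stratum -> number of PSUs in the
        population; when given, the FPC (1 - n_h / N_h) is applied.
    """
    totals_by_stratum = defaultdict(list)
    for stratum, total in psu_totals:
        totals_by_stratum[stratum].append(total)

    variance = 0.0
    for stratum, totals in totals_by_stratum.items():
        n_h = len(totals)
        if n_h < 2:
            raise ValueError(f"stratum {stratum!r} needs at least 2 PSUs")
        mean_h = sum(totals) / n_h
        # Sum of squared differences between PSU totals and their stratum mean.
        squared_diffs = sum((t - mean_h) ** 2 for t in totals)
        fpc = 1.0
        if population_psus is not None:
            fpc = 1.0 - n_h / population_psus[stratum]
        variance += fpc * n_h / (n_h - 1) * squared_diffs
    return variance


# Hypothetical weighted PSU totals in two strata.
sample = [("A", 10.0), ("A", 14.0), ("B", 20.0), ("B", 26.0), ("B", 23.0)]
variance_no_fpc = between_psu_variance(sample)
variance_with_fpc = between_psu_variance(sample, {"A": 4, "B": 6})
```

Because only differences between PSU totals within a stratum enter the formula, variation among households inside a PSU is captured implicitly, which is why a single calculation of this kind is commonly used for multistage samples.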

Computing Resources
Stata
Variance estimation in the survey analysis commands is done through Taylor-series linearization. There are also commands for jackknife and bootstrap variance estimation, but these are not specifically oriented toward survey data.

Computing Resources
Note:
We will demonstrate the use of Stata version 6. The current version is version 7, which also comes in a Special Edition (SE) that can handle up to 32,766 variables, strings of up to 244 characters, and matrices of up to 11,000 x 11,000.

Data Management
STARTING UP
Go to Start, Programs, Stata, Intercooled Stata.
Alternatively, from Windows Explorer, go to the folder c:\stata and double-click wstata.exe.

Data Management
CREATING A NEW DATASET
Open the Stata spreadsheet editor.

Data Management
CREATING A NEW DATASET
Enter data into the editor; when done, close the editor.

Data Management
CREATING A NEW DATASET
In the Stata Command window, enter the command
    save newfile

Data Management
NOTE
A Stata dataset has the extension .dta. That is, newfile is actually newfile.dta.
Public use files of some surveys, e.g., the VLSS (Vietnam Living Standards Survey), are in Stata format.

Data Management
INSPECTING THE DATABASE
In the Stata Command window, enter the following commands:
    describe
    list
    summarize

Data Management
NOTE:
Stata is case sensitive.
Stata commands may be abbreviated, e.g., d for describe, sum for summarize, etc.
We may use the Page Up/Page Down keys or the mouse to re-select commands in the Review window.

Data Management
NOTE:
Commands and output are shown in the Results window. Windows may be resized. Commands and output may be logged to a log file by pressing the Open Log button.

Data Management
RENAMING VARIABLES
ONE WAY (from the Data Editor): Double-click anywhere in the variable's column to bring up a dialogue box.

Data Management
RENAMING VARIABLES
SECOND WAY (in the Stata Command window): enter
    rename var1 domain
    rename var2 hcn
    rename var3 age
    label variable age "HH head age"
    d

Data Management
SAVING THE EDITED DATABASE
In the Stata Command window, enter the following command:
    save newfile, replace
Note: typing only save newfile will result in an error message, because newfile.dta already exists.

Data Management
READING A PRE-EXISTING STATA DATASET
If the dataset is in the folder c:\fies2000 and the filename is fies00small.dta, enter
    clear
    set mem 64m
    cd c:\fies2000
    use fies00small
NOTE: The set mem command is important for MEMORY MANAGEMENT.

Data Management
IMPORTING DATA
Suppose we have a dataset try.txt in the c:\fies2000 folder.
NOTE: Missing data are coded as a period (.).

Data Management
IMPORTING DATA
Suppose we have a dataset try.txt in the c:\fies2000 folder.
Use the infile command, with syntax
    infile variable-list using filename.raw
In particular, enter
    cd c:\fies2000
    infile domain hcn age using try.txt, automatic

Data Management
TRIVIA ON STRING VARIABLES
When using the infile command with character (string) variables, we need to identify these variables. For instance,
    infile domain hcn str30 prov using tr.txt
For more details regarding infile, enter
    help infile1

Data Management
IMPORTING DATA
Suppose we have a dataset try2.txt in the c:\fies2000 folder, with the data in specific fields.
NOTE: This assumes the last line of the file is a blank line.

Data Management
IMPORTING DATA
Suppose we have a dataset try2.txt in the c:\fies2000 folder, with the data in specific fields.
Use the infix command:
    infix domain 1 hcn 2 age 3-4 using try2.txt, clear
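
To make the column specification concrete, here is a small Python sketch of what a fixed-format read does with the layout above: domain in column 1, hcn in column 2, and age in columns 3-4. The sample records and the function name are made up for illustration, and treating a blank field as missing is an assumption of this sketch.

```python
def parse_fixed_width(line):
    """Split one fixed-format record into domain, hcn and age."""

    def to_number(field):
        field = field.strip()
        # A blank field is treated as missing in this sketch.
        return int(field) if field else None

    return {
        "domain": to_number(line[0:1]),  # column 1
        "hcn": to_number(line[1:2]),     # column 2
        "age": to_number(line[2:4]),     # columns 3-4
    }


# Hypothetical records, four characters each.
records = [parse_fixed_width(row) for row in ["1245", "2 38", "13 7"]]
```

The point of a fixed format is that values are located by position rather than by separators, so fields may touch each other or be left blank.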

Data Management
Thus, Stata can read text files with
infile (if the data in the text file are separated by spaces and have no strings, or if strings are just one word, or if all strings are enclosed in quotes)
infix (fixed-format text)
insheet (if the text file was created by a spreadsheet or database program)

Data Management
NOTE:
The commands infile, infix, and insheet read data from ASCII files. outfile is a way to save the data in ASCII. There are third-party programs, especially Stat/Transfer and DBMS/COPY, that translate from one data format (e.g., dBASE, Excel, SAS, SPSS, Stata) to another.

Data Management
OTHER USEFUL COMMANDS
To sort the dataset by age:
    sort age
To get a listing of the dataset:
    list
To get a listing of the 2nd to 4th observations:
    list in 2/4

Data Management
OTHER USEFUL COMMANDS
To summarize the restricted dataset of HHs whose head's age is less than or equal to 50:
    summarize if age <= 50
Relational operators available in if conditions include >, >=, ==, <, and <=.
To get the correlation matrix:
    correlate x y z

Data Management
GENERATING & REPLACING VARIABLES
Suppose we want to obtain the per capita income (pci) of FIES 2000 households:
    clear
    cd d:\fies00
    use fies00small
    gen pci=toinc/hsize

Data Management
GENERATING & REPLACING VARIABLES
Now tag the household as poor (1) if pci is below some threshold, say 13823, and determine the percentage of HHs that are poor.
    gen poor=1 if pci < 13823
    replace poor=0 if poor==.
    sum poor [aw=rfact]
    save fies00small, replace
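
As a rough illustration of what the weighted summary of a 0/1 indicator computes, the Python sketch below takes the weighted mean of a poor flag, which is the estimated share of poor households. The income values and weights are invented; only the threshold 13823 comes from the slide, and the function name is hypothetical.

```python
def weighted_poverty_incidence(pci_values, weights, threshold):
    """Weighted share of households with per capita income below threshold."""
    poor = [1 if pci < threshold else 0 for pci in pci_values]
    weighted_poor = sum(w * p for w, p in zip(weights, poor))
    return weighted_poor / sum(weights)


# Hypothetical per capita incomes and raising factors for four households.
pci = [9000.0, 15000.0, 12000.0, 30000.0]
rfact = [100.0, 250.0, 150.0, 500.0]
incidence = weighted_poverty_incidence(pci, rfact, 13823)
```

With these made-up numbers, the two poor households carry 250 of the 1000 units of total weight, so the weighted incidence is 25 percent even though half of the sampled households are poor.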

Data Management
NOTE
A small portion of the FIES 2000 data set was used. The Family Income and Expenditure Survey (FIES) is conducted by the National Statistics Office (NSO) every 3 years. Data may be purchased through the NSO website: www.census.gov.ph

SIAP-SRTC Training Course on Sampling, Acceed Center, AIM, Makati, Philippines, 5 April 2002

Data Management
RECALL
If we use our fies2000 data set:
    set mem 64m
    cd c:\fies2000
    use fies00small
    sum poor [aw=rfact]
Note that the poverty line we provided is a weighted average of the various poverty lines in the Philippines (for urban and rural areas across the different regions).

Estimating Food Poverty Line
The food poverty line is estimated from low-cost one-day menus (breakfast, lunch, supper, snack) constructed for each urban-rural area of a region by the Food and Nutrition Research Institute (FNRI). The menus meet 100% sufficiency in energy and protein requirements and 80% sufficiency in other nutrients and vitamins.
RDA for energy: 2000 kcal per person
RDA for protein: 50 grams per person
29 such menus were constructed on the basis of the 1988 Food Consumption Survey.

Annual Per Capita Food Line, Urban, by Region (table not reproduced in this transcript)

Annual Per Capita Food Line, Rural, by Region (table not reproduced in this transcript)

Estimating Poverty Line
Poverty Line = Food Threshold / Engel's Coefficient
Engel's coefficient is estimated by analyzing the consumption pattern of families having incomes within plus or minus 10 percentage points of the food threshold.
Engel's coefficient = Food Expenditure / Total Basic Expenditure
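
The two formulas can be chained in a short worked example. The Python sketch below uses invented figures (a food threshold of 9,000 and reference-group spending of 6,000 on food out of 10,000 in total basic expenditure); none of these numbers come from the FIES, and the function names are hypothetical.

```python
def engel_coefficient(food_expenditure, total_basic_expenditure):
    """Share of total basic expenditure that goes to food."""
    return food_expenditure / total_basic_expenditure


def poverty_line(food_threshold, engel):
    """Scale the food threshold up to cover non-food basic needs."""
    return food_threshold / engel


# Hypothetical reference-group figures (annual, per capita).
engel = engel_coefficient(6000.0, 10000.0)
line = poverty_line(9000.0, engel)
```

Here the coefficient is 0.6, so the poverty line is 9,000 / 0.6 = 15,000. The smaller the food share among families near the food threshold, the higher the resulting poverty line relative to the food line.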

Annual Per Capita Poverty Line, Urban, by Region (table not reproduced in this transcript)

Annual Per Capita Poverty Line, Rural, by Region (table not reproduced in this transcript)

Poverty Statistics (Family) [Standard Error]
Measures             2000             1997
Poverty Incidence    33.6% [0.3%]     31.8%
Poverty Gap          10.7% [0.1%]     10.0%
Severity Index        4.6% [0.1%]      4.3%
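
The slides do not define the three measures. Assuming they follow the standard Foster-Greer-Thorbecke (FGT) family, with incidence, gap, and severity corresponding to alpha = 0, 1, and 2, a minimal Python sketch looks like this. The incomes, weights, and poverty line are invented for illustration, and the function name is hypothetical.

```python
def fgt_index(incomes, weights, poverty_line, alpha):
    """Weighted FGT poverty measure of order alpha."""
    total_weight = sum(weights)
    total = 0.0
    for income, weight in zip(incomes, weights):
        if income < poverty_line:
            # Shortfall from the poverty line as a fraction of the line.
            gap_ratio = (poverty_line - income) / poverty_line
            total += weight * gap_ratio ** alpha
    return total / total_weight


# Hypothetical data: four equally weighted families, poverty line of 100.
incomes = [50.0, 80.0, 120.0, 200.0]
weights = [1.0, 1.0, 1.0, 1.0]
incidence = fgt_index(incomes, weights, 100.0, 0)
gap = fgt_index(incomes, weights, 100.0, 1)
severity = fgt_index(incomes, weights, 100.0, 2)
```

Under this definition, incidence counts the poor, the gap averages how far below the line they fall, and severity squares the shortfalls so that the poorest families count more heavily.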

Poverty Incidence, All Areas, by Region (table not reproduced in this transcript)

Small Area Poverty Stats?
Stata has some add-ons for generating SEs for poverty statistics.
If we wish to generate provincial poverty statistics, we will find that the SEs are too high, i.e., the figures are unreliable.

Data Management
NOTE:
Stata uses several types of weights:
fw    frequency weights
aw    analytic weights
iw    importance weights
pw    probability weights

Data Management
NOTE:
Within the generate or replace commands, we may create or transform variables by using functions, e.g.,
    generate loginc=ln(toinc)
    generate y=cos(x*_pi/180)
    replace newvar=normd(z)
    generate rvar=uniform()

Data Management
DELETING VARIABLES/DATA
To drop a variable, say age:
    drop age
To drop some observations:
    drop in 2/3
Try also the command keep.
To drop all data in memory:
    clear

Data Management
NOTE:
So far we have used Stata interactively. We can also do batch processing through the DO FILE editor.

Data Management
NOTE:
The Stata toolbar has 13 buttons.
The first three are to
OPEN a Stata dataset
SAVE the resident dataset to disk
PRINT a graph or log

Data Management
The next five are for
Starting/stopping/suspending a LOG
Bringing the Log to the front
Bringing the Dialog to the front
Bringing the Results to the front
Bringing the Graph to the front

Data Management
The last five are for
Opening the DO FILE editor
Opening the DATA editor
Opening the DATA browser
Telling Stata to continue when it has paused in the middle of long output
Stopping the current task

Exercise
What is the average income of families whose expenditure is below the mean family expenditure? What is it for families above the mean?

Exercise
Compare the correlation of food expenditures (fexp) and nonfood expenditures for families in rural and urban areas.

Extra
Enter
    graph food nfood

Extra
Now try
    sort urb
    graph food nfood, by(urb)
    graph food nfood, by(urb) total

Extra
Matrix plots:
    graph toinc food nfood, matrix

Table Generation w/ tab
Earlier, we showed the use of the tab(ulate) command. Try
    tab urb
    tab urb [aw=rfact]
    tab urb [iw=rfact]
    tab urb regn

Tab
The tab command has options for generating one-way tables of frequencies
    tab urb, summ(toinc)
and two-way tables
    tab urb sex
    tab urb sex, row
    tab urb sex, row col chi2
    tab urb sex, all exact

Table Generation w/ table
Aside from the tab command, we can generate tables of statistics with the table command. Compare
    tab urb
with
    table urb

Table
To generate the average (family) income and average (family) expenditure across urban and rural areas, enter
    table urb, c(mean toinc mean toexp)
Using weights:
    table urb [aw=rfact], c(mean toinc mean toexp)
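
To show what a weighted mean by group amounts to, the Python sketch below computes, for each group, the weight-times-value sum divided by the sum of weights. The urban/rural codes, incomes, and weights are invented, and the function name is hypothetical; it is meant only to mirror the idea behind the weighted table, not its output format.

```python
from collections import defaultdict


def weighted_group_means(groups, values, weights):
    """Return {group: weighted mean of values within that group}."""
    sum_wy = defaultdict(float)
    sum_w = defaultdict(float)
    for group, value, weight in zip(groups, values, weights):
        sum_wy[group] += weight * value
        sum_w[group] += weight
    return {group: sum_wy[group] / sum_w[group] for group in sum_w}


# Hypothetical data: group code, family income, and raising factor.
urb = [1, 1, 2, 2]
toinc = [100.0, 200.0, 50.0, 150.0]
rfact = [1.0, 3.0, 2.0, 2.0]
means = weighted_group_means(urb, toinc, rfact)
```

In group 1 the unweighted mean would be 150, but the second family carries three times the weight, pulling the weighted mean up to 175.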

Table
The contents option may specify at most five of the following statistics:
freq (for frequency)
mean varname (for mean of varname)
sd varname (for standard deviation)
sum varname (for sum)
rawsum varname (for sums ignoring optionally specified weight)
count varname (for count of nonmissing data)

Table
The contents option may specify at most five of the following statistics:
n varname (same as count)
max varname (for maximum)
min varname (for minimum)
median varname (for median)
p1 varname (for 1st percentile)
p2 varname (for 2nd percentile)
...
iqr varname (for interquartile range)

Exercise Using Table
Obtain the average and median per capita income of households by sex of household head:
    table sex, c(mean pci median pci)
Obtain the weighted frequency of poor and nonpoor households across regions:
    table poor regn [iw=rfact]

Using Survey Commands
Stata has a family of commands designed especially for sample surveys. These commands all begin with svy:
svyset      set variables
svydes      describe strata and PSUs
svymean     estimate population and subpopulation means
svytotal    estimate population and subpopulation totals

Using Survey Commands
Svy commands
svyprop      estimate population and subpopulation proportions
svyratio     estimate population and subpopulation ratios
svytab       for two-way tables
svyreg       for regression
svyivreg     for instrumental variables regression
svylogit     for logit regression
svyprobit    for probit regression

Using Survey Commands
Svy commands
svytest     for hypothesis testing
svylc       for estimating linear combinations
svymlog     for multinomial logistic regression
svyolog     for ordered logistic regression
svyoprob    for ordered probit regression
svypois     for Poisson regression
svyintrg    for censored and interval regression

Using Survey Commands
Before issuing any svy estimation command, we identify the weight, strata, and PSU identifier variables:
    svyset pweight rfact
    svyset strata domain
    svyset psu hcn

Using Survey Commands
To obtain the average family income and average family expenditure:
    svymean toinc toexp
To obtain the total family income and total family expenditure by region:
    svytotal toinc toexp, by(regn)

Using Survey Commands
To obtain the per capita income and per capita expenditure:
    svyratio toinc/fsize toexp/fsize
To obtain pci and pce by urban/rural:
    svyratio toinc/fsize toexp/fsize, by(urb)
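
A ratio estimate of this kind divides one weighted total by another. The Python sketch below computes that point estimate and, for contrast, the weighted mean of household-level ratios, which generally gives a different number. The incomes, family sizes, and weights are invented, and the function names are hypothetical; standard errors are not covered here.

```python
def ratio_of_totals(numerators, denominators, weights):
    """Weighted total of the numerator over weighted total of the denominator."""
    top = sum(w * y for w, y in zip(weights, numerators))
    bottom = sum(w * x for w, x in zip(weights, denominators))
    return top / bottom


def mean_of_ratios(numerators, denominators, weights):
    """Weighted mean of the unit-level ratios y/x, shown for comparison."""
    top = sum(w * y / x for w, y, x in zip(weights, numerators, denominators))
    return top / sum(weights)


# Hypothetical data: family income, family size, and raising factor.
toinc = [100.0, 200.0, 300.0]
fsize = [2.0, 5.0, 4.0]
rfact = [10.0, 20.0, 10.0]
per_capita_income = ratio_of_totals(toinc, fsize, rfact)
average_household_pci = mean_of_ratios(toinc, fsize, rfact)
```

The ratio of totals answers "total income divided by total persons", while the mean of ratios answers "what is the average household's per capita income"; which one is wanted depends on whether the unit of interest is the person or the household.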

Using Survey Commands
Linear regression of ln(pci):
    gen loginc=ln(pci)
    svyreg loginc age fsize sex prov urb
Compare the results with the regular regression command:
    reg loginc age fsize sex prov urb

Using Survey Commands
Two-way tables:
    svytab urb poor, row se
compared with
    tab urb poor [aw=rfact], nofreq row
Alternatives to STATA

Learning More about Stata
Online tutorial: type
    tutorial intro
List of Tutorials
Tutorial    Description
-----------------------------------------------------
intro       An introduction to Stata
graphics    How to make graphs
tables      How to make tables
regress     Estimating regression models, inc. 2SLS
anova       Estimating one-, two- and N-way ANOVA and ANCOVA models

Learning More about Stata
Tutorial    Description
-----------------------------------------------------
logit       Estimating maximum-likelihood logit and probit models
survival    Estimating ML survival models
factor      Estimating factor and principal component models
ourdata     Description of the data we provide
yourdata    How to input your own data into Stata

Learning More about Stata
Email distribution list: send email to Majordomo@hsphsun2.harvard.edu
In the body of your email message, type
    subscribe statalist email@address
or, for a daily summary,
    subscribe statalist-digest email@address

Maraming Salamat sa inyong pakikinig. (Thank you for your attention.)