Health Survey for England: Small area estimation teaching datasets

ESDS Government

Author: Alan Marshall Version: 1.1 Date: January 2011

UK Data Archive Study Number 6792 - Health Survey for England, 2000-2001: Small Area Estimation Teaching Dataset

Contents

List of figures
1. Introduction to the Health Survey for England
2. How to obtain the Health Survey for England: Small area estimation teaching datasets
3. Overview of datasets and teaching materials
4. Data and variables within the main dataset (‘HSE data.dta’)
5. Weighting the data
6. Missing values
7. Frequencies
8. Aggregate datasets
9. Appendix
9.1 Syntax used to create HSE data.dta
9.2 Syntax used to create Practical 1 - task 5 - data.dta
9.3 Syntax used to create Practical 2 data.dta

List of figures

Figure 1: Teaching materials that accompany the Small area estimation teaching datasets
Figure 2: Variables in the 2000 to 2001 HSE data for small area estimation of disability file
Figure 3: Descriptive statistics for the 'sex' variable
Figure 4: Summary statistics for the 'age' variable
Figure 5: Descriptive statistics for the 'gora' variable
Figure 6: Descriptive statistics for the 'year' variable
Figure 7: Descriptive statistics for the 'disab' variable
Figure 8: Descriptive statistics for the 'pcare' variable
Figure 9: Descriptive statistics for the 'sight' variable
Figure 10: Descriptive statistics for the 'mobility' variable
Figure 11: Descriptive statistics for the 'llti' variable

1. Introduction to the Health Survey for England

The Health Survey for England (HSE) is a series of annual surveys about the health of people living in England. Since 1994 the survey has been carried out by the Joint Health Surveys Unit of the National Centre for Social Research and the Department of Epidemiology and Public Health, Royal Free and University College Medical School, London. The survey is sponsored by the Department of Health to provide better and more reliable information about various aspects of people’s health and to monitor selected health targets. The HSE began in 1991 and has been carried out annually since then.

The survey combines questionnaire-based answers with physical measurements and the analysis of blood samples. Blood pressure, height, weight, smoking, drinking and general health are covered every year. An interview with each eligible person in the household is followed by a nurse visit. The interview is carried out face-to-face by a trained interviewer using a laptop computer. Some of the more sensitive topics are answered using self-completion questionnaires for confidentiality reasons.

A number of core questions are included every year, but each year’s survey also has a particular focus on a disease, condition or population group. In 2000 and 2001 a module of questions on disability was included. Topics are brought back at appropriate intervals in order to monitor change. The 'core' questions include: general health and psychosocial indicators, smoking, alcohol, demographic and socio-economic indicators, questions about use of health services and prescribed medicines, and measurements of height, weight and blood pressure. The modules may be about a single topic, several topics or population groups; for example, the 2000 HSE had a boost sample of older people in residential/care homes.

The Health Survey for England followed a multistage stratified probability sampling design in 2000 and 2001. In each year the primary sampling units were postcodes stratified according to health authority regions and the percentage of households with a head of household in a non-manual occupation. The HSE 2000 and 2001 are designed to provide data at both national and regional level about the population living in private households in England. In 2001 the HSE included the population at all ages, whilst in 2000 the HSE did not include those aged 0-1. A boosted sample of older people (aged 65 and over) resident in care homes was included in the HSE 2000. More information about the 2000 HSE [1] and the 2001 HSE [2], including the questionnaire and detailed information about the variables included in the dataset, is available from the Economic and Social Data Service (ESDS).

[1] http://www.esds.ac.uk/findingData/snDescription.asp?sn=4487
[2] http://www.esds.ac.uk/findingData/snDescription.asp?sn=4628

2. How to obtain the Health Survey for England: Small area estimation teaching datasets

The Health Survey for England (HSE): Small area estimation teaching datasets are freely available for download. The normal registration process has been waived because only a limited number of variables (14) are included in the dataset. The teaching dataset and materials can be downloaded from the ESDS Government webpages.

3. Overview of datasets and teaching materials

Four datasets make up the Health Survey for England (HSE): Small area estimation teaching datasets. The main dataset, ‘HSE data.dta’, contains data extracted from the Health Survey for England in 2000 and 2001, with each row representing an individual (aged 10 and over). There are 3 other aggregate (or area level) datasets that are derived from ‘HSE data.dta’ but also contain data from other sources, including the Census, mid-year population estimates and population projections. The aggregate datasets are listed below and explained in more detail in section 8:

• Practical 1 - task 5 - data.dta
• Practical 2 data.dta
• Pop data - practical 3.dta

There are a number of other teaching materials that accompany the data. These are summarised in the table below.

Figure 1: Teaching materials that accompany the Small area estimation teaching datasets

| File name | Description |
| --- | --- |
| Small area estimation using ESDS Government Surveys | This is a Word document that introduces various methods and issues associated with generating small area estimates using ESDS Government Surveys. It includes three practical case studies that enable users to produce district estimates of disability using the Health Survey for England (2000/2001), the Census, mid-year population estimates and population projections. |
| Practical_1 | This is a Stata do file that contains the syntax required to perform the analysis in practical 1 of the guide above (p28). |
| Practical_2 | This is a Stata do file that contains the syntax required to perform the analysis in practical 2 of the guide above (p51). |
| Practical_3 | This is a Stata do file that contains the syntax required to perform the analysis in practical 3 of the guide above (p65). |

It is important that you save the data you download to the C drive of your computer as all of the paths to files in the syntax files assume this location.
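
If the files cannot be saved to the C drive, one possible workaround is to hold the folder location in a global macro and build each path from it. The sketch below is an illustration only: the macro name saedir is hypothetical, the folder name "ESDS SAE practical" is taken from the paths that appear in the appendix syntax, and the supplied do files would still need their own paths edited to match.

```stata
* Hypothetical macro holding the folder where the download was saved.
* Change the path on the next line if the files are stored elsewhere.
global saedir "C:/ESDS SAE practical"

* Check that the main dataset can be found before trying to open it
capture confirm file "$saedir/Data/HSE data.dta"
if _rc == 0 {
    use "$saedir/Data/HSE data.dta", clear
    describe
}
else {
    display as error "HSE data.dta not found under $saedir - check the folder location"
}
```

Forward slashes are used here because Stata accepts them on Windows and they avoid any ambiguity when a path is combined with a macro.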

4. Data and variables within the main dataset (‘HSE data.dta’)

The main dataset (HSE data.dta) contains data from the HSE 2000 and the HSE 2001 on individuals aged 10 and over (disability questions were not asked of those under 10). The dataset also includes the boost sample of elderly people in residential/care homes that was undertaken in 2000.

The disability variables in the teaching dataset are modified from those in the HSE 2000 and 2001; full details of the modifications are given in the syntax that was used to derive ‘HSE data.dta’ (see appendix). Each of the disability variables has been coded so that a person either has a disability or not, whereas the original data had categories for higher and lower severities of disability. The questions that were used to determine whether a person had a disability of a particular type are based on ability to perform daily activities. More information on these can be found in the HSE disability report for 2001 [3]. Communication disability is not included in the teaching dataset; however, a person who had a communication disability and none of the other disabilities would be classed as having a disability under the overall disability measure.

A slight alteration is made to the age variable, with all those aged over 84 being given an age of 88 (matching the mean age of the 85+ population derived from the 2001 Sample of Anonymised Records [4]). The table below shows the variables that are included in the ‘HSE data.dta’ dataset.

Figure 2: Variables in the 2000 to 2001 HSE data for small area estimation of disability file

| Variable name | Description |
| --- | --- |
| sex | 1=male, 2=female |
| age | Single year of age |
| gora | Government Office Region (1=North East, 2=North West, 3=Yorkshire and Humberside, 4=West Midlands, 5=East Midlands, 6=East of England, 7=London, 8=South East, 9=South West) |
| area | Primary sampling unit (postcode) |
| weight | Weights to account for disproportionate sampling of children and older people |
| year | Survey year (2000 or 2001) |
| disab | Overall disability: indicates whether a person has one of the 5 disabilities measured in the HSE (mobility, personal care, hearing, sight, communication). 1=has a disability, 0=does not have a disability |
| pcare | Personal care disability. 1=has a personal care disability, 0=does not have a personal care disability |
| sight | Sight disability. 1=has a sight disability, 0=does not have a sight disability |
| hear | Hearing disability. 1=has a hearing disability, 0=does not have a hearing disability |
| mobility | Mobility disability. 1=has a mobility disability, 0=does not have a mobility disability |
| llti | Limiting long term illness (LLTI). 1=has an LLTI, 0=does not have an LLTI (the coding applied by the syntax in the appendix) |
| agesq | Age squared (age*age) |
| agecub | Age cubed (age*age*age) |

[3] http://www.archive2.official-documents.co.uk/document/deps/doh/survey01/disa/disa07.htm#a6
[4] http://www.ccsr.ac.uk/sars/

5. Weighting the data

The variable ‘weight’ should be used to weight the data. This corrects for disproportionate sampling of children (only two children per household are selected, so children from households with more than two children are under-represented) and of older people (to correct for the elderly boost sample). The weights in each survey have been centred so that their mean is equal to 1 (see syntax in the appendix). This ensures that one year of HSE data is not weighted more than the other.

A possible weakness of the weights in this dataset is that weights for the elderly population are not adjusted to reflect the fact that the HSE 2001 sample did not include a care home sample. A more robust approach would involve doubling the weights of those in care homes in 2000 to compensate for the lack of institutional coverage in the 2001 HSE. However, a comparison of the results from small area estimation models reveals very little difference between the estimates derived from each weighting approach. For more information see Marshall (2009), Appendix H [5].
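
As an illustration of what centring weights means in practice, the sketch below divides a set of raw weights by their mean within each survey year and then uses the centred weight to estimate a disability rate. The data values and the variable names rawwt and meanwt are hypothetical; only year, disab and weight correspond to variables in the teaching dataset, and this is not necessarily the exact procedure that produced the supplied weights.

```stata
* Hypothetical toy data: two survey years with uncentred raw weights
clear
input year rawwt disab
2000 0.5 0
2000 1.5 1
2000 1.0 0
2001 2.0 1
2001 4.0 0
2001 3.0 0
end

* Centre the weights within each survey year so that their mean is 1
bysort year: egen double meanwt = mean(rawwt)
generate double weight = rawwt / meanwt

* Check: the mean weight should now be 1 in both years
tabstat weight, by(year) statistics(mean)

* Weighted estimate of the proportion with a disability (disab == 1)
mean disab [pweight = weight]
```

Because each year's weights average 1 after centring, neither survey year contributes more than its share of respondents to the pooled estimate.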

6. Missing values

A dot (.) is used to denote missing values in the data. Information on why data is missing is removed from the data for all variables with the exception of the gora variable. Each of the disability variables has a very small number of missing values. The LLTI variable has a larger number of missing values resulting from the failure to include this question in proxy interviews in 2000. A Government Office Region of residence was not recorded for the care home sample and so a large number of ‘not applicable’ (-1) missing values result for the ‘gora’ variable.
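
The two kinds of missing value behave differently in Stata: the dot is a true missing value, whereas -1 in gora is an ordinary number that happens to mean ‘not applicable’. The sketch below uses a small made-up dataset (the values and the variable name gora_clean are hypothetical) to show one way of counting each kind and of guarding comparisons against the dot, which Stata treats as larger than any number.

```stata
* Hypothetical toy data using the same codes as the teaching dataset
clear
input gora llti
-1 .
1 1
7 0
9 .
. 0
end

* Count true (dot) missing values of llti
count if missing(llti)

* -1 is a 'not applicable' code, so it has to be counted explicitly
count if gora == -1

* Exclude the dot explicitly when using inequalities,
* otherwise missing values would satisfy gora > 5
count if gora > 5 & !missing(gora)

* Optionally treat -1 as missing in a copy of the variable
generate gora_clean = gora
replace gora_clean = . if gora == -1
tabulate gora_clean, missing
```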

[5] http://www.ccsr.ac.uk/staff/documents/Thesis_Alan_Marshall_Final_submitted_version.pdf

7. Frequencies

The following frequencies and descriptive statistics relate to the unweighted HSE Small area estimation teaching dataset (HSE data.dta).

Figure 3: Descriptive statistics for the ‘sex’ variable

| Sex | Frequency | % | Cumulative % |
| --- | --- | --- | --- |
| men | 12,394 | 43.6 | 43.6 |
| women | 16,057 | 56.4 | 100 |
| Total | 28,451 | 100 | |

Figure 4: Summary statistics for the ‘age’ variable

| Variable | Obs | Mean | Std. Dev. | Min | Max |
| --- | --- | --- | --- | --- | --- |
| Age | 28,451 | 47.8 | 22.3 | 10 | 88 |

Figure 5: Descriptive statistics for the 'gora' variable

| Government Office Region | Frequency | % | Cumulative % |
| --- | --- | --- | --- |
| -1 | 2,493 | 8.8 | 8.8 |
| North East | 1,637 | 5.8 | 14.5 |
| North West | 3,549 | 12.5 | 27.0 |
| Yorkshire and Humberside | 2,782 | 9.8 | 36.8 |
| West Midlands | 2,903 | 10.2 | 47.0 |
| East Midlands | 2,431 | 8.5 | 55.5 |
| East of England | 3,005 | 10.6 | 66.1 |
| London | 3,006 | 10.6 | 76.6 |
| South East | 3,845 | 13.5 | 90.2 |
| South West | 2,784 | 9.8 | 99.9 |
| . | 16 | 0.1 | 100 |
| Total | 28,451 | 100 | |

Figure 6: Descriptive statistics for the 'year' variable

| HSE survey year | Frequency | % | Cumulative % |
| --- | --- | --- | --- |
| 2000 | 11,277 | 39.6 | 39.6 |
| 2001 | 17,174 | 60.4 | 100 |
| Total | 28,451 | 100 | |

Figure 7: Descriptive statistics for the 'disab' variable

| Overall disability | Frequency | % | Cumulative % |
| --- | --- | --- | --- |
| No disability | 22,003 | 77.3 | 77.3 |
| Disability | 6,437 | 22.6 | 100.0 |
| . | 11 | 0.0 | 100 |
| Total | 28,451 | 100 | |

Figure 8: Descriptive statistics for the 'pcare' variable

| Personal care disability | Frequency | % | Cumulative % |
| --- | --- | --- | --- |
| No disability | 25,264 | 88.8 | 88.8 |
| Disability | 3,173 | 11.2 | 100.0 |
| . | 14 | 0.1 | 100 |
| Total | 28,451 | 100 | |

Figure 9: Descriptive statistics for the 'sight' variable

| Sight disability | Frequency | % | Cumulative % |
| --- | --- | --- | --- |
| No disability | 27,163 | 95.5 | 95.5 |
| Disability | 1,274 | 4.5 | 100.0 |
| . | 14 | 0.1 | 100 |
| Total | 28,451 | 100 | |

Figure 10: Descriptive statistics for the 'mobility' variable

| Mobility disability | Frequency | % | Cumulative % |
| --- | --- | --- | --- |
| No disability | 23,384 | 82.2 | 82.2 |
| Disability | 5,056 | 17.8 | 100.0 |
| . | 11 | 0.0 | 100 |
| Total | 28,451 | 100 | |

Figure 11: Descriptive statistics for the 'llti' variable

| Limiting long term illness | Frequency | % | Cumulative % |
| --- | --- | --- | --- |
| No LLTI | 20,086 | 70.6 | 70.6 |
| LLTI | 7,077 | 24.9 | 95.5 |
| . | 1,288 | 4.5 | 100 |
| Total | 28,451 | 100 | |

8. Aggregate datasets

As described in section 3, there are 3 aggregate (area) datasets that are used in the practicals of the Small area estimation guide. Each of these aggregate datasets contains age and sex specific counts of population and model/observed rates of LLTI/disability, either for the country as a whole or for specific districts. The population counts are either from the Census or from mid-year estimates/population projections that are developed by the Office for National Statistics (ONS). It should be noted that the online ONS mid-year estimates and population projections are rounded to the nearest 100. The ONS have kindly provided unrounded population counts for the calculations in these practicals; however, they do not recommend reporting the unrounded data, as it implies a false sense of accuracy in the figures they produce. This section provides more details on the aggregate datasets, whilst the appendix contains the syntax used to create the aggregate data.

‘Practical 1 - task 5 - data.dta’ contains a row for each single year of age (10, 11, ..., 84, 88) for males and females in each of the six case study districts (Barnet, Bury, Wakefield, South Bucks, Easington and Stroud). Figure 12 shows the variables that are included in this dataset.

Figure 12: Variables in the Practical 1 - task 5 - data.dta file

| Variable name | Description |
| --- | --- |
| zonecode | ONS code for local authority district |
| zonename | Name of local authority district |
| gora | Government Office Region of local authority district |
| sex | Sex (1=male, 2=female) |
| age | Age |
| pop_2001 | District population count in 2001 (Census), taken from table ST001 Age and sex by resident type |
| pop_2021 | Projected district population count in 2021 (2006 based population projections) [6] |
| pred_MO | Predicted regional mobility rates (from task 4 of Practical 1). See the Practical 1 do file for details of how these rates were estimated. |
| pred_MO_ENG | Predicted mobility rates for England (from task 3 of Practical 1). See the Practical 1 do file for details of how these rates were estimated. |

[6] Projections are available at: http://www.statistics.gov.uk/downloads/theme_population/SNPP-2006/InteractivePDF_2006-basedSNPP.pdf

The dataset ‘Practical 2 data.dta’ contains a row for each single year of age (10, 11, 12, ..., 83, 84, 88) for males and females in England and each of the six case study districts. The variables that are included in this dataset are shown in figure 13.

Figure 13: Variables in the ‘Practical 2 data’ dataset

| Variable | Description |
| --- | --- |
| zonecode | ONS code of local authority district |
| zonename | Name of local authority district |
| sex | Sex (1=male, 2=female) |
| age | Age (10, 11, ..., 83, 84, 88) |
| llti_2001 | Age and sex specific LLTI rates for England (Census). Taken from table ST016 (Sex and age by general health and limiting long term illness) and table ST065 (Sex and age by general health and limiting long term illness, communal establishments). |
| pop_2001 | Population counts (by single year of age and sex) in 2001 (Census) |
| pop_2021 | Population counts (by single year of age and sex) in 2021 (2006 based population projections) |
| MO_OBS_RT | Age and sex specific rates of mobility disability for England |
| D_OBS_RT | Age and sex specific rates of overall disability for England |
| PC_OBS_RT | Age and sex specific rates of personal care disability for England |
| HR_OBS_RT | Age and sex specific rates of hearing disability for England |
| ST_OBS_RT | Age and sex specific rates of sight disability for England |
| mobilitycount | The number of HSE observations (individuals) at each single year of age that contribute towards the calculations for the mobility rates for males and females (England). This variable is used (along with MO_OBS_RT) to calculate analytic weights (aweights) for use when fitting the relational models. |
| disabweight | Relational model weights for overall disability |
| pcareweight | Relational model weights for personal care disability |
| hearweight | Relational model weights for hearing disability |
| sightweight | Relational model weights for sight disability |

The variable llti_2001 contains rates of LLTI (for England and each of the six case study districts). The census data on LLTI are only released with quinary age detail. In order to generate single year estimates, these five year rates are smoothed using an Excel based tool developed by POPGROUP [7] users specifically for this purpose. For each quinary rate the Excel tool creates an estimated rate for each of the five single years. The rate for the middle year is set to equal the estimate for the quinary age group as a whole. The other four rates are calculated using the difference between the quinary rate in question and the rate for the neighbouring quinary group, using weights that are proportionate to the distance from the single year in question to the middle year. This ensures that a smooth graduation of the quinary rates is achieved in the estimated single year schedules. The Excel tool is available on request from the author or ESDS Government.
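
The Excel tool itself is not reproduced in this document. As a rough illustration of the idea, the sketch below places each quinary rate at the middle year of its age band and fills in the other single years by straight-line interpolation between neighbouring middle years. The three rates are made up, the variable names band_rate and rate_single are hypothetical, and linear extrapolation at the two ends (the epolate option) is an assumption, since the text does not say how the tool treats the first and last bands.

```stata
* Hypothetical quinary LLTI rates for the bands 10-14, 15-19 and 20-24
clear
set obs 15
generate age = 9 + _n                      // single years 10 to 24

* Attach each band's rate to the middle year of the band
generate double band_rate = .
replace band_rate = 0.04 if age == 12      // band 10-14
replace band_rate = 0.05 if age == 17      // band 15-19
replace band_rate = 0.07 if age == 22      // band 20-24

* Interpolate linearly between middle years; epolate also extends the
* line to the ages before the first and after the last middle year
ipolate band_rate age, generate(rate_single) epolate

list age band_rate rate_single, noobs
```

With this approach the single year rate at each middle year equals the quinary rate, and the rates in between change by equal steps, which is one simple way of obtaining a smooth graduation.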

[7] For more information on POPGROUP see www.ccsr.ac.uk/popgroup/

The rates of disability (D_OBS_RT, PC_OBS_RT, MO_OBS_RT, HR_OBS_RT and ST_OBS_RT) are all estimated from the Health Survey for England and relate to England as a whole (the same rates are repeated for each district). The syntax used to calculate these rates is given in the Practical 1 do file. The weights for each disability type (with the exception of mobility disability, for which the weights are calculated in practical 2) are calculated as defined in equation 16 of the Small area estimation using ESDS Government Surveys guide. Further detail on how this dataset was produced can be found in the syntax in the appendix.

The population data for practical 3 (Pop data - practical 3.dta) contains a population count for each of the 6 case study districts, distinguishing each combination of age (10, 11, 12, ..., 84, 88), sex (male, female) and LLTI (has LLTI, no LLTI). The census tables ST016 and ST065 are used for this data; however, as for the llti_2001 variable in the Practical 2 dataset, the Excel smoothing tool is needed to generate single year counts of the population with and without an LLTI. The smoothed LLTI data is imported straight into Stata to create ‘Pop data - practical 3.dta’, so no syntax is given in the appendix for this dataset.
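
To make the link between the observed rates and the relational model weights in the Practical 2 dataset more concrete, the sketch below computes a weighted disability rate for each sex and age cell and then a weight of the form n*p*(1-p), which is the form used in the appendix syntax (equation 16 of the guide is not reproduced here). The data values and the names num, denom, rate, n and relweight are hypothetical.

```stata
* Hypothetical toy data: six men aged 70 or 71 with survey weights
clear
input sex age disab weight
1 70 1 0.8
1 70 0 1.2
1 70 0 1.0
1 71 1 1.1
1 71 1 0.9
1 71 0 1.0
end

* Weighted numerator and denominator for each sex-age cell
generate double disab_w = disab * weight
bysort sex age: egen double num = total(disab_w)
bysort sex age: egen double denom = total(weight)
generate double rate = num / denom

* Unweighted number of respondents behind each rate
bysort sex age: egen n = count(disab)

* Weight of the form n*p*(1-p)
generate double relweight = n * rate * (1 - rate)

* Keep one row per sex-age cell
bysort sex age: keep if _n == 1
list sex age rate n relweight, noobs
```

Cells with more respondents, and with rates further from 0 or 1, receive larger weights under this form.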

9. Appendix

9.1 Syntax used to create HSE data.dta

*********************HSE 2000 data preparation************************
clear
set mem 200m
use "C:\ESDS SAE practical\Data preparation\HSE 2000\UKDA-4487-stata8\stata8\hse00ai.dta", clear
*Keep all the variables that will be used in the various models
keep sex age loco4 pcare sight hear wt_child wt_65 gor disab2 limitill area
*Oldest age in census is 85+ so create a new age variable with the oldest age group 85+
*The 85+ age group is labelled 88 (average age in HSE(2001) of those aged over 85)
sort age
replace age=88 if age>84
*2000 weights - rename wt_child to child_wt to match 2001 file
rename wt_child child_wt
*Create a single weighting variable - the mean of the child weights is 1, so replace all other ages with a weight of 1
replace child_wt=1 if age>=16
*Create a single weighting variable - the mean of the 65+ weights is 1, so replace all other ages with a weight of 1
replace wt_65=1 if age<65
*Create the single weighting variable which is:
*Equal to child weight under the age of 16.
*Equal to wt_65 over the age of 65
*Equal to 1 for all other ages
gen weight=wt_65 if age>=65
replace weight=child_wt if age<65
*Drop child_wt and wt_65
drop child_wt wt_65
*Create a year identifier
gen year=2000
rename gor gora
save "C:\ESDS SAE practical\Data preparation\HSE 2000 formatted.dta", replace

***********************HSE 2001 data preparation********************
clear
use "C:\ESDS SAE practical\Data preparation\HSE 2001\stata6\hse01ai.dta", clear
keep sex age loco4 pcare sight hear child_wt gora disab2 limitill area
*Oldest age in census is 85+ so create a new age variable with the oldest age group 85+
*The 85+ age group is labelled 88 (average age in HSE(2001) of those aged over 85)
sort age
replace age=88 if age>84
gen weight=child_wt
drop child_wt
gen year=2001
rename gora gor
gen gora=.
replace gora=1 if gor=="A"
replace gora=2 if gor=="B"
replace gora=3 if gor=="D"
replace gora=4 if gor=="F"
replace gora=5 if gor=="E"
replace gora=6 if gor=="G"
replace gora=7 if gor=="H"
replace gora=8 if gor=="J"
replace gora=9 if gor=="K"
drop gor
move gora limitill
save "C:\ESDS SAE practical\Data preparation\HSE 2001 formatted.dta", replace
append using "C:\ESDS SAE practical\Data preparation\HSE 2000 formatted.dta"
label define gora 1 "North East" 2 "North West" 3 "Yorkshire and Humberside" 4 "West Midlands" 5 "East Midlands" 6 "East of England" 7 "London" 8 "South East" 9 "South West"
label values gora gora
* recode the disability variables
*Disability - one of the five disability types
gen disab=disab2
recode disab -9 -8 -1=. 0=0 1 2=1
tab disab disab2
label define disab 0 "No disability" 1 "disability"
label values disab disab

*Personal Care
gen pcare2=pcare
recode pcare2 -9 -8 -1=. 0=0 1 2=1
tab pcare2 pcare
label define pcare2 0 "No pcare disability" 1 "pcare disability"
label values pcare2 pcare2
*Sight
gen sight2=sight
recode sight2 -9 -8 -1=. 0=0 1 2=1
tab sight sight2
label define sight2 0 "No sight disability" 1 "sight disability"
label values sight2 sight2
*Hearing
gen hear2=hear
recode hear2 -9 -8 -1=. 0=0 1 2=1
tab hear hear2
label define hear2 0 "No hearing disability" 1 "hearing disability"
label values hear2 hear2
*Mobility
gen mobility=loco4
recode mobility -9 -8 -1=. 0=0 1 2=1
tab mobility loco4
label define loco1 0 "No loco disability" 1 "loco disability"
label values mobility loco1
*LLTI
gen llti=limitill
recode llti -9 -8 -1=. 1=1 2=0 3=0
label define llti 0 "No LLTI" 1 "LLTI"
label values llti llti
*age squared and age cubed
gen agesq=age*age
gen agecub=age*age*age
drop limitill sight hear pcare loco4 disab2
rename pcare2 pcare
rename hear2 hear
rename sight2 sight
drop if age<10
save "C:\ESDS SAE practical\Data\HSE data.dta", replace

9.2 Syntax used to create Practical 1 - task 5 - data.dta

clear
*Open the population data and then browse its structure
use "C:\ESDS SAE practical\Data preparation\Population data reduced.dta", clear
*Close the data editor window
*Sort the data ready for merging
sort gora sex age
*Merge the population data with the file containing regional model schedules of disability rates
*(this is the file you saved at the end of task 4 of Practical 1). The matching variables are 'age' 'sex' and 'gora'.
*The syntax to create this file is at C:\ESDS SAE practical\Do files\Practical_1.do
merge gora sex age using "C:\ESDS SAE practical\Saved practical work\Practical 1 - task 4.dta"
*Keep only the observations in the merged dataset that were in both the two files that were merged together
*(e.g. data for children aged under 10 is only in the population file and so is dropped here)
keep if _merge==3
*Merge the current file with the file containing model schedules of mobility disability for England.
*The matching variables are sex and age. Note in order to do this we must first drop the _merge variable
*(that is created after all merges) and sort the data by age and sex (ready for the next merge).
drop _merge
sort sex age
merge sex age using "C:\ESDS SAE practical\Saved practical work\Practical 1 - task 3.dta"
*Sort data
sort gora sex age
*Keep the variables we need
keep zonecode zonename gora sex age pop_2001 pop_2021 pred_MO pred_MO_ENG
*Reorder variables slightly

move gora sex
*Save the data for use in task 5 of Practical 1
save "C:\ESDS SAE practical\Data\Practical 1 - task 5 - data.dta", replace

9.3 Syntax used to create Practical 2 data.dta

***********************Practical 2 data preparation***************
clear
set mem 200m
*First open the HSE 2000/2001 data
use "C:\ESDS SAE practical\Data\HSE data.dta", clear
*Generate weighted counts of population
gen count=1
gen count_w=count*weight
*Generate weighted counts of disability
gen mobility_w=mobility*weight
gen disab_w=disab*weight
gen pcare_w=pcare*weight
gen hear_w=hear*weight
gen sight_w=sight*weight
*Generate weighted counts of disabled populations for numerators in calculations of disability rates
sort sex age
by sex age: egen MO_num=total(mobility_w)
by sex age: egen D_num=total(disab_w)
by sex age: egen PC_num=total(pcare_w)
by sex age: egen HR_num=total(hear_w)
by sex age: egen ST_num=total(sight_w)
*Generate weighted counts of population for denominators in calculations of disability rates
by sex age: egen denom=total(count_w)
*Generate disability rates
gen MO_OBS_RT=MO_num/denom
gen D_OBS_RT=D_num/denom
gen PC_OBS_RT=PC_num/denom
gen HR_OBS_RT=HR_num/denom
gen ST_OBS_RT=ST_num/denom
*Generate sample sizes associated with each rate - this is needed for weights in relational models
egen mobilitycount=count(MO_OBS_RT), by(age sex)
*The syntax here calculates weights for relational models for each disability type (except mobility)
*Note several disability types use weights based on quinary age bands
gen ageband=.
replace ageband=12.5 if age>=10 & age<=14
replace ageband=17.5 if age>=15 & age<=19
replace ageband=22.5 if age>=20 & age<=24
replace ageband=27.5 if age>=25 & age<=29

replace ageband=32.5 if age>=30 & age<=34
replace ageband=37.5 if age>=35 & age<=39
replace ageband=42.5 if age>=40 & age<=44
replace ageband=47.5 if age>=45 & age<=49
replace ageband=52.5 if age>=50 & age<=54
replace ageband=57.5 if age>=55 & age<=59
replace ageband=62.5 if age>=60 & age<=64
replace ageband=67.5 if age>=65 & age<=69
replace ageband=72.5 if age>=70 & age<=74
replace ageband=77.5 if age>=75 & age<=79
replace ageband=82.5 if age>=80 & age<=84
replace ageband=88 if age>=85
egen disabcount=count(D_OBS_RT), by(age sex)
egen pcarecount5yr=count(PC_OBS_RT), by(ageband sex)
egen sightcount5yr=count(ST_OBS_RT), by(ageband sex)
egen hearcount5yr=count(HR_OBS_RT), by(ageband sex)
*Generate weighted counts of disabled populations for numerators in calculations of disability rates (quinary agebands)
sort sex ageband
by sex ageband: egen PC_num5yr=total(pcare_w)
by sex ageband: egen HR_num5yr=total(hear_w)
by sex ageband: egen ST_num5yr=total(sight_w)
*Generate weighted counts of population for denominators in calculations of disability rates (quinary agebands)
by sex ageband: egen denom5yr=total(count_w)
*Generate disability rates
gen PC_OBS_RT5yr=PC_num5yr/denom5yr
gen HR_OBS_RT5yr=HR_num5yr/denom5yr
gen ST_OBS_RT5yr=ST_num5yr/denom5yr
gen disabweight=(disabcount*D_OBS_RT*(1-D_OBS_RT))
gen pcareweight=(pcarecount5yr*PC_OBS_RT5yr*(1-PC_OBS_RT5yr))
gen sightweight=(sightcount5yr*ST_OBS_RT5yr*(1-ST_OBS_RT5yr))
gen hearweight=(hearcount5yr*HR_OBS_RT5yr*(1-HR_OBS_RT5yr))
duplicates drop age sex gora, force
keep sex age MO_OBS_RT D_OBS_RT PC_OBS_RT HR_OBS_RT ST_OBS_RT mobilitycount disabweight pcareweight sightweight hearweight
sort sex age
save "C:\ESDS SAE practical\Data preparation\HSE data practical 2 prep.dta", replace
clear
*Open the population data and then browse its structure
use "C:\ESDS SAE practical\Data preparation\Population data all uk districts.dta", clear
drop cen_weight hh_prop

keep if zonename=="Barnet"|zonename=="Bury"|zonename=="Wakefield"|zonename=="South Bucks"|zonename=="Easington"|zonename=="Stroud"|zonename=="ENGLAND"
*Sort the data ready for merging
sort sex age
*Merge the population data with the file of HSE disability rates and relational model weights
*(this is the file saved earlier in this syntax). The matching variables are 'sex' and 'age'.
merge sex age using "C:\ESDS SAE practical\Data preparation\HSE data practical 2 prep.dta"
*Keep only the observations in the merged dataset that were in both the two files that were merged together
*(e.g. data for children aged under 10 is only in the population file and so is dropped here)
keep if _merge==3
duplicates drop zonename age sex, force
*Browse the structure of the merged data file
browse
sort zonename sex age
drop _merge
drop pop_2002-pop_2020 pop_2022-pop_2031
save "C:\ESDS SAE practical\Data\Practical 2 data.dta", replace