Introduction to Excel for Statistics


Upload: jey-r-ventura

Post on 20-Jul-2016


Page 1: Introduction to Excel for Statistics



This booklet was prepared as part of supplementary reading material for a Statistics Concept Course for the staff of the Central Bureau of Statistics, facilitated by Biometry Unit Consultancy Services (BUCS) and the Statistical Services Center (SSC) and funded by DFID. You are welcome to use and share this material, as long as due credit is given to BUCS & SSC.


Introduction to Excel for Statistics

Part 1: Getting started

1. Introduction

This introductory guide covers the basics of Excel that are needed for data analysis. It is in two parts.

We assume that you are not new to computers. In the first part of this guide we review the basics of Windows and Excel that we assume you know. This is provided as pre-course reading for the Phase 1 training given to CBS staff. It may also be used during preparatory computer training.

The second part of this guide looks at the handling of data in Excel. We explain the importance of keeping data in a “list”, and introduce data auditing, filtering, sorting, and calculation, among other topics. The data used for illustration are taken from the 1997 welfare monitoring survey conducted by CBS. A CD from this survey provides the questionnaires, raw data, data dictionaries and reports1.

This guide is used as supporting material for the first session of the Phase 1 training for CBS, Kenya. Later sessions look at the production of tables and graphs in Excel, and at the use of a statistical add-in, for which there are further guides2.

The two parts of the guide follow the approach in the first chapters of the book “Data Analysis with Microsoft Excel” by Berk, K.N. and Carey, P. (2000). Reference copies of this book are available for those who need more detailed information. Some of the material also draws on the training notes from the first sessions of the short course “Excel for Statistics: What you can and can not do”, given by SSC in the UK and BUCS in Kenya.

2. Using Windows

There are many different versions of Windows. These notes apply to all versions from Windows 95 onwards.

In Fig. 2.1 we show a typical Windows desktop. This is the base from which you open application programs, such as Excel.

1 This is an excellent resource for the training and we are grateful to CBS (Central Bureau of Statistics, Kenya) for providing the information.

2 The guides are called “Good Tables for Excel Users”, “Guidelines for Good Statistical Graphics in Excel” and “Tutorial Introduction to SSC-Stat”.


Fig. 2.1 The Windows desktop (labelled: Start button and menu; Recycle bin; My Computer; taskbar including programs currently open; “Office” software; statistical software)

You must be comfortable with the use of your computer mouse to use Windows effectively. In the table below we describe the four basic mouse operations.

Mouse operations

Operation | Description
Clicking | Move the mouse so the tip of the pointer touches the element you want to use. Then press and release the left mouse button.
Right-clicking | Same as the above, but you press and release the right button.
Double-clicking | Press and release the left button twice in rapid succession.
Dragging | Press and hold down the left button. Then, with the button still pressed, move the mouse across the screen. Release the button when you are at your destination.


If you need practice, then here is a short exercise.

1. Double-click3 on the My Computer icon on the Windows desktop. This will open the My Computer window, as shown in Fig. 2.2.

Fig. 2.2 Practicing using the mouse (labelled: title bar; minimise, maximise and close buttons; My Computer icon on desktop; My Computer icon on the taskbar; corner or side for dragging)

2. Click the Minimise button. This will reduce the My Computer window to a button on the taskbar.

3. Click on the My Computer icon on the taskbar, see Fig. 2.2. This will restore the My Computer window to the desktop.

4. Click the Maximise button. The My Computer window now fills the whole screen.

5. Click the Midsize button. The My Computer window is now restored to its original size.

6. Move the mouse pointer, so it is on the title bar of the My Computer window, see Fig. 2.2. Drag the window to the bottom right-hand corner of the screen, and then release the mouse button. Then drag the window back to roughly its original position.

3 With some versions of Windows you only have to single-click to open the window.

7. Point to the lower left-hand corner of the My Computer window. The pointer will turn into a double-headed arrow. Drag the window corner down, then release the button. This has enlarged the window. Return it to roughly its previous size4.

8. Repeat this operation, but start with the pointer at an edge of the window, rather than a corner. This changes the shape of the window.

9. Click on the Close button. This closes the My Computer window.

If you are a relative beginner, then take time to practice these operations, both in the type of exercise above, and when you start using an application, like Excel.

Fig. 2.3 Windows Help (labelled: Help command; Help window)

If you need more information about Windows, then also take time to explore the on-line Help system. This provides tutorials and other information to help you to use Windows effectively. We show an example of the Windows Help in Fig. 2.3.

4 When you are working with more than one window, you will often wish to arrange them conveniently on the desktop.


Beginners to computing often look for books they can read to gain experience. If you are in this situation, we urge you instead to practice mainly with the on-line help and information. The best way to gain experience in computing is simply to use a computer!

One of the key features of Windows is that all applications are used in basically the same way. So look for the common points, such as the way to use Help, the way to open and save files, and so on. Then, with each new application, all you have to do is look for the new features, and they are probably why you chose that application.

Once you gain experience, you will also find that you do not need to wait for a formal training course before starting to use a new application. Ask the presenters of the next course that you attend how they learned. You will often find that they taught themselves many of the applications they now teach, using books and the on-line information.

3. Spreadsheets

Excel is an application for evaluating and presenting information in a spreadsheet format. Spreadsheets were originally developed for simple business uses, such as financial reports or inventory management.

Excel is now so flexible that it is used for many other applications, including data analysis. You can use Excel to enter data and then to carry out simple analyses, such as constructing the appropriate tables and graphs. The results can then be transferred to other applications, such as Word for inclusion in a report, or PowerPoint for inclusion in a presentation.

To start Excel, you probably have an icon on the desktop that you can double-click, just as you did to open the My Computer window5. If you do this, the screen will look roughly as shown in Fig. 3.1.

5 Otherwise click on the Start button, then on Programs, and then look for Excel.


Fig. 3.1 A typical Excel window (labelled: menu bar; toolbar; formula bar; active cell; column headings; row headings; worksheet; more sheets, making up a workbook; scroll bars)

The Excel window, shown in Fig. 3.1, is where you will analyse your data. If you are not experienced in Excel, take time to note the elements shown in Fig. 3.1.

You instruct Excel by using the menus or the toolbars. For example, to open an Excel file, click on the File menu, and then on Open.

The toolbars give one-click access to many of the same commands as the Excel menus. For example, in Fig. 3.1, clicking on the toolbar icon that looks like an open file is the same as using File > Open.

Excel documents are called workbooks. Each workbook is made up of individual worksheets. Each worksheet can have up to 256 columns, labelled A, B, C, ..., Z, AA, AB, and so on up to IV, and up to 65,536 rows6. A workbook can have up to 255 worksheets.
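Column labels run A to Z, then AA, AB, and so on: a base-26 numbering with no zero digit. The guide itself uses no code; the minimal Python sketch below is our own illustration (the function names are hypothetical helpers, not part of Excel). It converts between labels and column numbers, which helps with question d in the next section: for a list that starts in column A, the number of columns equals the number of its last column label.

```python
def column_letter_to_number(label: str) -> int:
    """Convert an Excel column label such as 'A', 'Z', 'AA' or 'IV' to a 1-based number."""
    if not label:
        raise ValueError("empty column label")
    number = 0
    for ch in label.upper():
        if not "A" <= ch <= "Z":
            raise ValueError(f"not a column label: {label!r}")
        # Shift the digits read so far one place (base 26) and add this letter (A=1 ... Z=26).
        number = number * 26 + (ord(ch) - ord("A") + 1)
    return number


def column_number_to_letter(number: int) -> str:
    """Convert a 1-based column number back to its Excel label."""
    if number < 1:
        raise ValueError("column numbers start at 1")
    letters = []
    while number > 0:
        # Subtract 1 first, because this numbering has no 'zero' letter.
        number, remainder = divmod(number - 1, 26)
        letters.append(chr(ord("A") + remainder))
    return "".join(reversed(letters))


if __name__ == "__main__":
    for label in ["A", "Z", "AA", "IV"]:
        print(label, column_letter_to_number(label))  # A 1, Z 26, AA 27, IV 256
```

For example, IV is column 256, the last column in the versions of Excel described here.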

6 These were the maxima when this guide was written, in February 2003. If these dimensions are limiting, you should probably be using other software anyway. Most statistics packages do not have such limits. For large surveys the limit of 256 columns is sometimes a restriction, because they have more questions than that.

4. A sample workbook

For illustration we use an Excel workbook that contains data from the 1997 household survey conducted in Kenya by the Central Bureau of Statistics, CBS. We use only the data from a single district.

In this section we review how you can open a workbook and navigate around it.

We assume that you are in Excel7. If so then:

1. Click the Open button on the Excel toolbar. The Open dialogue box appears, looking something like Fig. 4.1.

Fig 4.1 The Open Files dialogue (labelled: drop-down list with the folder tree; list of files in the current folder; selected file; button to open the selected file)

2. Click on the drop-down list to locate the folder with the Excel file to open.

3. Either double-click on the workbook to open, or click on it and then click on the Open button. The selected workbook, see Fig. 4.2, will open into Excel.

4. There are three ways to move from sheet to sheet in the workbook. Practice each of these ways.

a. Click on the tab at the bottom of the sheet, to move to the sheet you want.

7 If Excel is not already open, then an alternative is to use Windows Explorer. Find the file and double-click on it, to open Excel with the file ready for use.


b. Click on the worksheet navigation buttons.

c. Right-click on any of the worksheet navigation buttons, and select from the drop-down list that will appear.

Fig. 4.2 Open workbook in Excel (labelled: active sheet; other sheets in the workbook; worksheet navigation buttons)

5. Within any worksheet, also practice using the vertical and horizontal scroll bars. Then check that you can answer the following questions:

a. From the worksheet with core information for the head of the household, which we have called “CoreHead”, you will see that the data are from a single district. Go to the sheet called “codes” to find which district this is.

b. Use the sheet with the codes to check how many districts were included in this survey.

c. How many rows of data are there in the three sheets called “CoreHead”, “Expenditure” and “Agriculture”? If they do not have the same number, can you think why this should be?

d. Excel names the columns of data as A, B, C, etc. What is the name of the last column with data in the “CoreHead” sheet? So how many columns of data are there in this sheet?

e. How many rows of data are there in the sheet called Coreperson? Could you use this information to give you an idea of the average number of people per household in the district?
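Question e rests on a small piece of arithmetic: if each row of the person sheet describes one person, and each row of the household sheet one household, the average household size is the ratio of their data rows, after discounting the row of column labels at the top of each sheet. A minimal Python sketch, assuming one header row per sheet and no blank rows; the function names and the row numbers in the example are made up for illustration, not taken from the survey.

```python
def data_rows(last_row: int, header_rows: int = 1) -> int:
    """Number of data rows in a list whose labels occupy the first header_rows rows."""
    return last_row - header_rows


def average_household_size(person_last_row: int, household_last_row: int) -> float:
    """Average people per household, assuming one row per person and one row per household."""
    households = data_rows(household_last_row)
    if households <= 0:
        raise ValueError("the household sheet has no data rows")
    return data_rows(person_last_row) / households


if __name__ == "__main__":
    # Hypothetical last-row numbers, for illustration only.
    print(average_household_size(person_last_row=1251, household_last_row=251))  # 5.0
```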


5. Basic workbook operations

We continue to use the workbook that was introduced in the previous section. In this section we review how you can add and delete sheets. We also describe how data can be copied and moved.

In this section and beyond, it is important that you understand the logic of the tasks that we ask you to do. Do not simply try to remember the route to do each task. In Windows there are often different ways of doing the same task, and these will soon become automatic for the common tasks.

The first task is to add a new sheet to the workbook, so that you can then copy some data to it for your working.

1. Use Insert > Worksheet to insert a blank worksheet.

2. In the new worksheet, go to the tab at the bottom of the screen and right-click. You should get a popup menu as shown in Fig. 5.1.

Fig. 5.1 Popup menu gives basic operations on the current sheet

3. Click on the option to Rename and call the sheet “Temp” to signify you will use it for your temporary work.

4. We will now practice copying and pasting. Return to the sheet called CoreHead. Make the top left-hand cell, <A1>, the active cell and then drag the mouse down to the cell <D14>8.

5. If that was easy, also select a second range; otherwise skip to the last sentence of this step. Press and hold the CTRL key. With CTRL still pressed, put the cursor on the cell <F1> and drag the mouse down to the cell <F14>, see Fig. 5.2. Then use <Edit> <Copy>9 to copy the selected parts of the sheet to the clipboard.

8 If things go wrong, then use the <Undo> button and try again.

Fig. 5.2 Selecting parts of a worksheet (labelled: the cells that were dragged; the part where you used CTRL and then dragged)

6. Return to the new sheet that you called “Temp” and use Edit > Paste10.

7. Now we show a different way of pasting. Start by using the Undo button to remove the contents you have just pasted. Then use Edit > Paste Special, rather than just the simple Paste. The menu is shown in Fig. 5.3. Here use Paste Link as shown.

9 Or use the Copy button on the toolbar, or press <Ctrl> <C>.

10 Or use the Paste button on the toolbar, or press <Ctrl> <V>.


Fig. 5.3 The Paste Special dialogue

Fig. 5.4 Linking information across worksheets (labelled: this cell is linked, and not just copied as the number 23)


8. Explain why Paste Link might be a useful feature in your future use of Excel11. Do you think it might have a downside12?
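The difference between an ordinary Paste and Paste Link can be modelled outside Excel. In this minimal Python sketch, which is our own illustration and not how Excel stores cells, a pasted value is a snapshot, while a linked cell keeps a reference back to the source and looks the value up each time it is read. The helper names, the cell keys and the corrected value 24 are hypothetical.

```python
from typing import Any, Callable, Dict

# A sheet maps a cell address to its content: either a plain value, or (for a link)
# a zero-argument function that fetches the current value from the source sheet.
Sheet = Dict[str, Any]


def read(sheet: Sheet, address: str) -> Any:
    """Return the value shown in a cell, following a link if the cell holds one."""
    content = sheet[address]
    return content() if callable(content) else content


def paste_value(source: Sheet, address: str) -> Any:
    """Ordinary paste of a cell holding a plain value: copy the value as it is now."""
    return read(source, address)


def paste_link(source: Sheet, address: str) -> Callable[[], Any]:
    """Paste Link: store a reference that is re-evaluated whenever it is read."""
    return lambda: read(source, address)


if __name__ == "__main__":
    core_head = {"B2": 23}
    temp = {
        "B2_copied": paste_value(core_head, "B2"),
        "B2_linked": paste_link(core_head, "B2"),
    }
    core_head["B2"] = 24  # a correction made later in the original sheet
    print(read(temp, "B2_copied"), read(temp, "B2_linked"))  # 23 24
```

The linked cell picks up the correction, as footnote 11 says; the price is that every read has to follow the link, which hints at why many links can slow a large workbook (footnote 12).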

We now show one way of moving a range of cells.

1. In the sheet called Temp, select the data in the first four columns.

2. Move the mouse to a border of the area that is selected. The pointer will change from a + to an arrow.

3. Drag the selected area down 3 rows and across one column, as shown in Fig. 5.5.

4. Click on any cell to deselect the cell range.

5. When using Windows there are often different ways of doing the same thing. Can you think of another way of moving a range of cells13? Try to do this by pressing the Undo button, and then repeating the move.

Fig. 5.5 Moving a range of cells (labelled: border of new destination; pointer is an arrow; tooltip to show destination)
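Moving a range changes only its address. Assuming the first four columns in Temp occupy A1:D14 (adjust this to your own selection), dragging them down 3 rows and across one column puts them in B4:E17. The Python sketch below shows that address arithmetic; `shift_range` and its helpers are hypothetical names of ours, not Excel functions, and the sketch assumes the range is written top-left cell first.

```python
import re


def _col_to_num(label: str) -> int:
    """'A' -> 1, 'Z' -> 26, 'AA' -> 27."""
    number = 0
    for ch in label:
        number = number * 26 + (ord(ch) - ord("A") + 1)
    return number


def _num_to_col(number: int) -> str:
    """1 -> 'A', 26 -> 'Z', 27 -> 'AA'."""
    letters = []
    while number > 0:
        number, remainder = divmod(number - 1, 26)
        letters.append(chr(ord("A") + remainder))
    return "".join(reversed(letters))


def shift_range(ref: str, down: int, across: int) -> str:
    """Return the address of a range such as 'A1:D14' after moving it down and across."""
    match = re.fullmatch(r"([A-Z]+)(\d+):([A-Z]+)(\d+)", ref.upper())
    if match is None:
        raise ValueError(f"expected a range like 'A1:D14', got {ref!r}")
    first_col, first_row, last_col, last_row = match.groups()
    new_first_col = _col_to_num(first_col) + across
    new_first_row = int(first_row) + down
    if new_first_col < 1 or new_first_row < 1:
        raise ValueError("the move would take the range off the top or left of the sheet")
    new_last_col = _col_to_num(last_col) + across
    new_last_row = int(last_row) + down
    return (f"{_num_to_col(new_first_col)}{new_first_row}:"
            f"{_num_to_col(new_last_col)}{new_last_row}")


if __name__ == "__main__":
    print(shift_range("A1:D14", down=3, across=1))  # B4:E17
```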

11 It means the data are not duplicated. Any correction in the original data will automatically change the working copy.

12 These links can sometimes slow the processing of the data, particularly with large files.

13 For example, you could try cutting the selected range, and then pasting it at its new destination.

You should save your work regularly when you make changes to a workbook. Excel, like other Windows applications, offers you two options for saving. You can use the Save command, or its shortcut, which keeps the name of the file. Or you can use the Save As command, which allows you to give your new work a different name.

1. Click on File > Save As to open the corresponding dialogue box.

2. Check which folder you are saving your file in, see Fig. 5.6. If necessary change the folder to the one you wish to use.

3. Give the file a new name, and click on the Save button.

Fig. 5.6 The Save As dialogue (labelled: folder in which the file will be saved; new name for the file)

Before finishing this section we suggest that you delete the temporary worksheet that you created for your working.

1. Make sure that you are on the sheet called Temp.

2. Then use Edit > Delete Sheet. Excel will ask you if you really want to continue. Say that you do.

3. Note that the Undo button on the toolbar is now inactive. This operation cannot be undone.

4. Can you think of another way you could have done this task?14

14 You can right-click on the tab at the bottom of the sheet. This gives a popup menu and one option is to delete the sheet.


6. Excel add-ins

Excel’s capabilities can be extended through the use of special programs called add-ins. These are then used in the same way as Excel itself. Some add-ins are supplied with Excel, but are not automatically installed. Others are made available by other groups.

We check first that Excel’s own Data Analysis Toolpack has been installed.

1. Look on the Tools menu for the command called Data Analysis. If it is there, click Tools > Data Analysis and the dialogue shown in Fig. 6.1 will open. Click on Cancel, because we do not need these features now. If the command is not there, follow the steps below.

Fig. 6.1 The data analysis toolpack.

2. As an exercise15, click on Tools > Add-Ins. The dialogue shown in Fig. 6.2 appears.

Fig. 6.2 The add-in dialogue (the entries may be different on each machine)

15 You will have to do this if you could not find the Data Analysis command on the Tools menu.


3. If you did not find the Data Analysis Toolpack earlier, then the entry in Fig. 6.2 will not be ticked. In that case, tick the entry and then press OK. Otherwise press Cancel.

The second task is to install the add-in called SSC-Stat. This may already have been done, in which case you will have an extra menu, called SSC-Stat, every time you load Excel.

If you do not have this extra menu then the following tasks will install the SSC-Stat add-in.

1. The SSC-Stat add-in must first be installed on your computer. It is available on CD, or can be downloaded from the web site www.reading.ac.uk/ssc. Once downloaded click on the file and it will install. You do this outside Excel.

2. Then go into Excel and use Tools > Add-ins as before. Now use the Browse button on the dialogue and move to the directory where the file called SSC-Stat.xla16 was copied.

3. Double-click on the file called something like SSC-Stat.xla. Then click OK on the Add-ins dialogue and the extra menu should appear.

Add-ins are a very powerful feature of Excel. They enable third-parties to add to the facilities provided by Excel, or to tailor existing facilities for particular users.

The SSC-Stat add-in is designed to encourage good statistics with Excel. It has facilities to support the use of Excel for data manipulation, graphics and simple statistics.

Fig. 6.3 SSC-Stat, showing the graphics menu.

It is used just like any other Excel menu17, as indicated in Fig. 6.3.

16 If you do not remember where the add-in was copied, then you can search for the file. Search for all files beginning with ssc-stat, e.g. ssc-stat*.xla, because the name may include the version number, for example ssc-stat v2.0.xla.


7. In conclusion

If you are a relative beginner with Excel, then we have introduced some of the key aspects that apply whatever field you apply the software to.

If you have used Excel before, then most of the material in this chapter should have been familiar. However, Excel is a large package, and existing users may have found some points that were new and that will help them to use the software more effectively.

One feature of Windows is that there is usually more than one way of accomplishing the general tasks that have been introduced in this chapter. It is much more important to understand the logic of the task, than to remember the steps associated with one way of doing the work. Always try to use logic and not memory.

A key feature of Windows is that once you have mastered one application, you will find that others share many of its features. So, once you are confident in your use of Excel, this should also help with your use of other software.

17 SSC-Stat has its own tutorial. It is loaded automatically when SSC-Stat is installed. The tutorial and other help for the add-in are available from within SSC-Stat. Use SSC-Stat > General for access.


Introduction to Excel for Statistics

Part 2: Handling data

8. Introduction

In this part of the guide we look at the features of Excel that are particularly needed when using Excel for handling and analysing data. Excel is an “all-purpose” tool and, to use it effectively for statistical work, you have to work with the same discipline that would be automatic if you used a statistics package. We explain here what we mean by this. For illustration we continue with the data set introduced in Part 1 of this guide.

One element of “good practice” is to avoid making any worksheet too complicated. It is better to have workbooks whose sheets are simple and have names that show clearly what they contain. Then it is easier to find a particular set of data, or a table of results.

In the next section we describe the importance of keeping your data in what Excel calls a “List”. Then we show briefly how data can be entered and checked in Excel.

The term “meta-data” is used for information about the data themselves, for example the units of measurement, or the meaning of each category in a column that is coded 1 and 2. We consider how names are used in Excel, and also the value of adding comments to cells.

We then describe two powerful features that work only when the data are in the form of a list18. The first is filtering, to examine a subset of the data, and the second is sorting.

In the remainder of this part of the guide we look briefly at Excel’s system for calculating functions and formulae, at protecting data and at how to cope with missing values.

9. Structure of your data

For effective statistical work you need an efficient structure for holding, managing and analysing the data. This is usually achieved by organising the data in rectangular arrays.

In an array, usually:

Columns contain different measurements or information about the ‘individuals’, or ‘units’, being studied: the ‘variables’.

Rows contain all the information collected about a single ‘individual’: the ‘cases’.

18 Even more important for survey work is that Excel’s powerful facility for tabulation requires data to be in the form of a list.


The first row of the list contains labels for the columns. As an example we show part of the data from the 1997 household survey. Four columns are shown, for just 10 of the four hundred households in the district used.

Fig. 9.1 Data from the CBS survey is in list form.

These rectangular structures are called LISTS in Excel. Their advantages are that they:

• Allow the use of a range of database facilities.

• Make it easy to exchange data with other software such as databases and statistical packages.

• Give structure to the way data are handled. In particular the use of lists encourages the user to store information of the same kind within columns19.
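Because a list is just a row of labels followed by one row per case, it travels easily to other software, for example as a comma-separated (CSV) file. The minimal Python sketch below reads a small list and checks the guideline in footnote 19, that each column holds one kind of information. MEMBERS and ADULTQ are column names from the survey data; HHID, CROPS, all of the values and the function name are made up for illustration.

```python
import csv
import io

# A made-up list in CSV form: the first line holds the column labels (the variables),
# and every later line holds one case. Note the text entry "three" in MEMBERS.
SAMPLE_CSV = """HHID,MEMBERS,ADULTQ,CROPS
1,4,3.65,maize
2,3,2.40,beans
3,three,2.10,maize
"""


def column_kinds(csv_text: str) -> dict:
    """Classify each column as 'number' if every non-blank entry parses as a number, else 'text'."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    kinds = {}
    for name in reader.fieldnames:
        kind = "number"
        for row in rows:
            entry = (row[name] or "").strip()
            if entry == "":
                continue  # blank cells are treated as missing, not as text
            try:
                float(entry)
            except ValueError:
                kind = "text"
                break
        kinds[name] = kind
    return kinds


if __name__ == "__main__":
    print(column_kinds(SAMPLE_CSV))
    # {'HHID': 'number', 'MEMBERS': 'text', 'ADULTQ': 'number', 'CROPS': 'text'}
```

A single word typed into a column of counts is enough to turn the whole MEMBERS column into text, which is why mixed columns cause trouble when the data are passed to a statistics package.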

In the following table we show Excel’s own Guidelines for creating a list on a worksheet as they appear in the Help.

19 So a column with numbers should not contain text, and a text column does not include numbers that will need summarising.


In this part of the guide we will see many facilities of Excel that rely on the data being organised in a list. This starts with Excel’s facilities for data entry, validation and auditing, that we consider in the next section.

Microsoft Excel has a number of features that make it easy to manage and analyse data in a list. To take advantage of these features, enter data in a list according to the following guidelines.

List size and location

• Avoid having more than one list on a worksheet. Some list management features, such as filtering, can be used on only one list at a time.

• Leave at least one blank column and one blank row between the list and other data on the worksheet. Microsoft Excel can then more easily detect and select the list when you sort, filter, or insert automatic subtotals.

• Avoid putting blank rows and columns in the list so that Microsoft Excel can more easily detect and select the list.

• Avoid placing critical data to the left or right of the list; the data might be hidden when you filter the list.

Column labels

• Create column labels in the first row of the list. Microsoft Excel uses the labels to create reports and to find and organise data.

• Use a font, alignment, format, pattern, border, or capitalisation style for column labels that is different from the format you assign to the data in the list.

• When you want to separate labels from data, use cell borders and not blank rows or dashed lines to insert lines below the labels.

Row and Column Contents

• Design the list so that all rows have similar items in the same column.

• Don't insert extra spaces at the beginning of a cell; extra spaces affect sorting and searching.

• Don't use a blank row to separate column labels from the first row of data.
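Two of these guidelines are easy to see in a simplified model. A spreadsheet picks out a list as the block of filled cells around the active cell, so a blank row inside the data cuts the list short, and a leading space makes " maize" a different entry from "maize". The Python sketch below is a deliberate simplification that scans downwards only; it is not Excel's actual detection rule, and the function name and sheet contents are made up.

```python
def detect_list(rows, start=0):
    """Return rows from `start` down to, but not including, the first completely blank row.

    A simplification of how a spreadsheet picks out a list: it only scans downwards.
    """
    detected = []
    for row in rows[start:]:
        if all(str(cell).strip() == "" for cell in row):
            break  # a blank row ends the list
        detected.append(row)
    return detected


if __name__ == "__main__":
    sheet = [
        ["HHID", "CROP"],  # column labels
        [1, "maize"],
        [2, "beans"],
        ["", ""],          # a blank row inside the data
        [3, "maize"],
        [4, " maize"],     # note the leading space
    ]
    print(len(detect_list(sheet)))  # 3: the labels and two cases; cases 3 and 4 are missed
    crops = [row[1] for row in sheet[1:] if row[1] != ""]
    print(crops.count("maize"))  # 2, because " maize" does not match "maize"
```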

10. Entering data

For large surveys a system that is specially designed for data entry and verification is usually used, rather than Excel. Hence we consider data entry only briefly here20.

To see the sort of facilities for data entry in Excel you could try the following exercise:

1. Make a new sheet (as described in Part 1 of this guide).

20 Excel is commonly used for entering the data from small studies. We have a guide called “Disciplined Use of Spreadsheets for Data Entry”, that describes the facilities available in Excel. This is produced in a variety of forms and can be downloaded from www.reading.ac.uk/ssc if copies are not available locally.


2. Copy the first 21 rows and 8 columns from the sheet called Expenditure to this new sheet.

3. Use Data > Form to produce the type of data entry form shown in Fig. 10.1.

Fig. 10.1 Data entry form on a subset from the expenditure survey

4. Use this form to explore the 20 records that you copied across.

5. Press New and enter the next record, which is 2, 1, Y, 23, 63, 3, 4, 3.65.

Excel also offers facilities for validation and auditing. To show how these are used we assume that the number of people in a household is usually between 2 and 5. To set up a validation rule:

6. Select the whole of column G, called MEMBERS, by clicking on the column letter G.

7. Use Data > Validation and complete the dialogue as shown in Fig. 10.2.


Fig. 10.2 Setting a validation rule

We could use checks like this while we are entering data. They are also useful when auditing data that have already been entered, as we illustrate below.

8. Use Tools > Auditing > Show Auditing Toolbar and click on the option to Circle Invalid Data, as shown in Fig. 10.3.

9. Once you have seen the data that are outside the range, you can click on the option to remove the circles again.
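The validation rule and Circle Invalid Data amount to a simple check that can be written down explicitly. This minimal Python sketch assumes the rule in Fig. 10.2 is "a whole number between 2 and 5"; check the figure for the exact settings. The function name and the MEMBERS values are made up, and the row numbers assume the column label is in row 1.

```python
def invalid_entries(values, low=2, high=5):
    """Return (row, value) pairs that break a 'whole number between low and high' rule.

    Rows are numbered as on the worksheet, with the column label in row 1.
    """
    flagged = []
    for row, value in enumerate(values, start=2):
        is_whole = isinstance(value, int) or (isinstance(value, float) and value.is_integer())
        # `not is_whole` is tested first, so text entries never reach the comparison.
        if not is_whole or not low <= value <= high:
            flagged.append((row, value))
    return flagged


if __name__ == "__main__":
    members = [4, 6, 3, 1, 5, 2.5]  # hypothetical MEMBERS column
    print(invalid_entries(members))  # [(3, 6), (5, 1), (7, 2.5)]
```

As with the circles in Excel, the check only flags entries for you to look at; it does not change them.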


Fig. 10.3 Auditing data that were already entered.

Finally in this section we look at how to add a column of data that follows a regular sequence.

1. On the new sheet where you copied the 20 records, go to the top of the next free column. This is probably column I. Type the label Index in cell I1.

2. Type the number 1 in the next cell down, cell I2, and 2 in cell I3.

3. Select the range I2:I3. Notice the small black box at the lower right hand corner of the selected cells. This is called the fill handle.

4. Move the mouse over the fill handle, so the pointer changes from a fat plus to a thin +.

5. Click the mouse button and drag down to the end of the data. In Fig. 10.4 we show the series after we are halfway down the column.


Fig. 10.4 Entering a sequence

Fig. 10.5 Finding the dialogue to enter patterned data

6. Now clear the data you have just filled, because we will show an alternative way that is more flexible.

This second way is also an opportunity to show one element of the Data Analysis Toolpack.

7. Use Tools > Data Analysis. This will give the dialogue shown in Fig. 10.5. Choose the option to generate random numbers21, which gives the dialogue shown in Fig. 10.6.

21 You are not going to generate random numbers, but produce data with a regular pattern.


Fig. 10.6 Generating a regular sequence

8. In the dialogue in Fig. 10.6, specify the distribution as Patterned. Complete the dialogue as shown in Fig. 10.6 and press OK.
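Dragging the fill handle from the two seed cells 1 and 2, as in steps 1 to 5, continues the series with the same step as between the seeds. Here is that rule as a small Python sketch; it illustrates the result rather than Excel's own code, and the function name is ours. The 21 values match the 20 copied records plus the one added with the data form.

```python
def fill_series(first, second, count):
    """Continue a linear series from two seed values, keeping their difference as the step."""
    if count < 0:
        raise ValueError("count must not be negative")
    step = second - first
    return [first + i * step for i in range(count)]


if __name__ == "__main__":
    print(fill_series(1, 2, 21))   # 1, 2, ..., 21: one index value per record
    print(fill_series(10, 15, 4))  # [10, 15, 20, 25]
```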

This facility to generate patterned data is very powerful. Also the dialogue gives more control than you would have by dragging the mouse, particularly when the columns of data are long, as they often are with large surveys.
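The Patterned option describes a sequence by a start, an end, a step, how often each number is repeated and how often the whole sequence is repeated; we believe these are the settings in Fig. 10.6, but check the dialogue in your version. The Python sketch below reproduces that kind of pattern under those assumptions; the function and its parameter names are ours.

```python
import math


def patterned(start, stop, step=1, repeat_each=1, repeat_sequence=1):
    """Values from start to stop in the given step, each repeated repeat_each times,
    with the whole sequence repeated repeat_sequence times."""
    if step <= 0:
        raise ValueError("step must be positive")
    # How many distinct values fit between start and stop; the small tolerance guards
    # against floating-point round-off with fractional steps such as 0.1.
    n_values = int(math.floor((stop - start) / step + 1e-9)) + 1
    one_pass = []
    for i in range(max(n_values, 0)):
        one_pass.extend([start + i * step] * repeat_each)
    return one_pass * repeat_sequence


if __name__ == "__main__":
    print(patterned(1, 3, repeat_each=2, repeat_sequence=2))
    # [1, 1, 2, 2, 3, 3, 1, 1, 2, 2, 3, 3]
```

`patterned(1, 21)` gives the same 1 to 21 as the fill handle, while the repeat settings make it easy to build codes that repeat in a regular way down a long column.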

11. Comments and names

Adding comments to cells is a very useful feature of Excel. As a simple exercise we will add comments to some of the column names.

1. In the worksheet that you made in the last section, go to cell H1, which contains the column name ADULTQ.

2. Right-click to give the popup menu and take the option called Insert Comment.

3. The dictionary states that this column is the number of adult equivalent people in the household. So add this comment, as shown in Fig. 11.1.


Fig. 11.1 Adding a comment to a cell

Fig. 11.2 Options with comments

4. When you return to this cell, the comment will be shown as in Fig. 11.1.

5. When you are in a cell with a comment, right-click again. The popup menu is given in Fig. 11.2 and shows that you can now edit, or delete the comment.

6. Add comments to some more cells. They can also be cells containing data that you think need some further explanation.

7. If you have a large file with comments, it is sometimes useful to jump from comment to comment. How could you find out if this is possible in Excel22?

When the data are in an Excel list, then the first row gives the names of the columns. These names can simplify the use of Excel for simple statistics. They are used by Excel, and by various add-ins, including SSC-Stat.

1. Go to the sheet where you copied the data.

2. Use SSC-Stat > Analysis > Descriptive statistics23.

3. The dialogue is shown in Fig. 11.3. Notice, from the variables that are listed, that the dialogue has picked up the names of the columns.

22 It is possible. If you ask for Excel’s Help on “Comment” it gives many options, including one to “Select Cells that contain comments”. Then follow the instructions given.

23 This exercise assumes that SSC-Stat has been added to Excel. If not, then just read the exercise.


Fig. 11.3 SSC-Stat dialogue uses column names

Fig. 11.4 Results on a new worksheet

4. Select the ADULTQ column, as shown in Fig. 11.3 and press OK.

5. The results are shown in Fig. 11.4. Notice that an extra sheet has been generated for the results and given the name, “DescStat”.
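For reference, the kind of summary such a dialogue produces can be computed with Python's standard `statistics` module. This is a sketch of common descriptive statistics, not a reproduction of SSC-Stat's DescStat output, whose exact contents we have not copied; the function name and the ADULTQ values are made up, and None stands for a missing value.

```python
import statistics


def describe(values):
    """Common descriptive statistics for a numeric column, skipping missing values (None)."""
    present = [v for v in values if v is not None]
    if len(present) < 2:
        raise ValueError("need at least two non-missing values")
    return {
        "n": len(present),
        "missing": len(values) - len(present),
        "mean": statistics.mean(present),
        "sd": statistics.stdev(present),  # sample standard deviation (n - 1 divisor)
        "min": min(present),
        "median": statistics.median(present),
        "max": max(present),
    }


if __name__ == "__main__":
    adultq = [2.5, 3.0, None, 3.65, 4.0, 5.1]  # hypothetical ADULTQ values
    for name, value in describe(adultq).items():
        print(f"{name:8s} {value:.3f}" if isinstance(value, float) else f"{name:8s} {value}")
```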

The SSC-Stat add-in makes an effort to work with column names. It is also useful to use column names generally in Excel. So we will register the names of all the columns in this sheet.

1. Go back to the sheet containing the 20 records you copied over.

2. Select all the data, from A1:I22.

3. Use Insert > Name > Create.

4. Check the dialogue is as in Fig. 11.5, so the names will be taken from the top row of the data. Then press OK.


Fig. 11.5 Creating names from a selected array

5. To see that the names have been defined, press the down arrow on the pull-down list of the Name Box (the box showing A1 in Fig. 11.5).
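Insert > Name > Create takes each name from the top row and attaches it to the cells below. The idea can be sketched in Python, assuming the sheet is held as a list of rows with the headers first. The `create_names` helper, the HHID column and all values are hypothetical.

```python
def create_names(rows):
    """Use the first row as names for the columns of values beneath it."""
    header, *records = rows
    return {
        name: [record[col] for record in records]
        for col, name in enumerate(header)
    }


# Hypothetical sheet: the header row followed by three records
sheet = [
    ["HHID", "MEMBERS", "ADULTQ"],
    [101, 4, 2],
    [102, 7, 3],
    [103, 2, 1],
]
names = create_names(sheet)
print(names["MEMBERS"])  # [4, 7, 2]
```

Afterwards a column can be referred to by its name, as in `names["MEMBERS"]`, much as an Excel formula can refer to MEMBERS.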

We have therefore seen that sheets can be named, and columns can be named. We will use these names in later sections. It is sometimes also useful to name the rectangular block that contains the data, as we show in the next exercise.

1. Select the data on the sheet, if they are not already selected.

2. Use Insert > Name > Define and name the rectangle as Sample.

3. Go to another sheet, such as the sheet with the codes, see Fig. 11.5 and use the pull-down name box.

4. Select the name Sample as shown in Fig. 11.5 and you return straight to the rectangle that you named earlier.


Fig. 11.5 Using the name of an array

5. In this case the rectangle contains all the information on the sheet, but this is not always the case. With the data selected, move the rectangle down so that it starts on the 4th row.

6. Now add some information about the data in the top rows, as is often done. An example is shown in Fig. 11.6.

Fig. 11.6 Adding metadata to an Excel worksheet

7. In Fig. 11.6 we have also used the opportunity to colour the cells, and some of the text. You could do similarly, but this is an optional extra24.

[Fig. 11.6 annotations: the data have been moved down three rows; the column names are coloured and in bold; the toolbar button used to colour the cells is marked.]

24 If the formatting tools are not visible in your version of Excel, then use View > Toolbars and tick the formatting toolbar.


8. Now move to another sheet and again select the name Sample as before. You will return to the same sheet, with the rectangle selected as before.

9. If you added the rows and coloured the cells, then you could practise returning the sheet to its original form25.

12. Selecting subsets (Filters)

Filters display a subset of the cases in a list. Excel’s Help describes how to use AutoFilter roughly as follows:

a) Click a cell in the list you want to filter.

b) Use Data > Filter > AutoFilter.

c) To display only the rows that contain a specific value in one column, click the arrow at the top of the corresponding column.

d) Click the appropriate value.

e) To apply an additional condition based on a value in another column, repeat steps c) and d) in the other column.

f) To filter the list in a slightly more complicated way, click the arrow at the top of the column, and then click Custom.

You can apply up to two conditions to a column with AutoFilter. If you need to do something more complicated, you can use advanced filters.
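A custom AutoFilter on one column allows at most two comparisons, joined by And or Or. The Python sketch below expresses that rule. The `custom_filter` helper and the household sizes are hypothetical, and the function only returns the positions of the matching rows rather than hiding anything.

```python
import operator

# Comparison operators offered by a custom AutoFilter, keyed by their Excel symbols
OPS = {
    "=": operator.eq,
    "<>": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def custom_filter(values, cond1, cond2=None, join="and"):
    """Return the row positions whose value meets one or two (op, target) conditions."""

    def meets(x, cond):
        op, target = cond
        return OPS[op](x, target)

    kept = []
    for row, x in enumerate(values):
        ok = meets(x, cond1)
        if cond2 is not None:
            other = meets(x, cond2)
            ok = (ok and other) if join == "and" else (ok or other)
        if ok:
            kept.append(row)
    return kept


# Hypothetical household sizes
members = [4, 7, 2, 6, 5]
print(custom_filter(members, (">=", 6)))             # [1, 3]
print(custom_filter(members, (">=", 3), ("<=", 5)))  # [0, 4]
```

Anything needing a third condition does not fit this shape, which is where advanced filters, or the calculated-column approach described later in this section, come in.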

In the following exercise we just select the households with 6 people or more.

1. Click anywhere in the data and use Data > Filter > AutoFilter.

2. Click on the arrow at the top of the column called MEMBERS and then on the option called Custom, see Fig. 12.1.

25 If you are still practising basic Windows operations, then try removing these rows in different ways. For example, simply marking the cells and pressing <Delete> removes the contents, but not the rows themselves. Try Edit > Delete for more options. Then use Undo, mark the whole of the first 3 rows by clicking the first row heading and then <Shift>+<Click> on the third, and use Edit > Delete. There is also the option of marking the rows and then right-clicking to explore!


Fig. 12.1 Applying a custom filter

3. Complete the resulting dialogue as shown in Fig. 12.2.

4. Fig. 12.3 then shows there were just 5 households that satisfied this criterion.

Fig 12.3 The custom filter dialogue

Fig. 12.4 The filtered data

[Fig. 12.4 annotations: the selected rows are shown; the arrow is blue to show the column used in the filter.]


If you need to apply a more complicated condition, and wish to avoid using Excel’s advanced filters, then an alternative is to use Excel’s powerful calculation features and make a new column that corresponds to the condition you need. We illustrate below, and look in more detail at Excel’s functions in Section 14.

5. Use Data > Filter > AutoFilter again, to turn off the filter.

6. Go to the top of the next empty column, probably column J, and type the heading “SixPlus” in cell J1.

7. Go to the cell J2 and type the formula “=(MEMBERS>=6)” and then press <Enter>. See Fig. 12.5

Fig. 12.5 Entering a logical calculation

8. Drag the formula down, as shown in Fig. 12.5, so that it fills the whole column.

9. Now try filtering again, this time just selecting TRUE in the column called SixPlus.

The value of this approach is that the condition can be made more complicated, and it is still easy to use Excel’s simple AutoFilter. For example, the condition in step 7 above could be MEMBERS>=MEDIAN(MEMBERS), to show which households are at least the median household size26.
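The helper column holds one TRUE or FALSE per row, so the AutoFilter only ever has to select TRUE, however complicated the condition behind it. The Python sketch below builds the same kind of columns for hypothetical household sizes. Note that Excel’s PERCENTILE expects a fraction between 0 and 1, so the 80th percentile is PERCENTILE(MEMBERS,0.8); the sketch uses the "inclusive" quantile method, which interpolates in the same way.

```python
import statistics

# Hypothetical household sizes (the MEMBERS column)
members = [4, 7, 2, 6, 5, 3, 6]

# Helper column like =(MEMBERS>=6): one TRUE/FALSE per household
six_plus = [m >= 6 for m in members]

# A more complicated condition, like =(MEMBERS>=MEDIAN(MEMBERS))
median_size = statistics.median(members)
at_least_median = [m >= median_size for m in members]

# 80th percentile, like PERCENTILE(MEMBERS,0.8); quantiles(n=5) returns
# the 20th, 40th, 60th and 80th percentiles, so index 3 is the 80th
p80 = statistics.quantiles(members, n=5, method="inclusive")[3]
top_fifth = [m >= p80 for m in members]

print(six_plus)
print(median_size, at_least_median)
print(p80, top_fifth)
```

Whatever the condition, the result is a plain list of TRUE/FALSE values, which is exactly what a simple filter needs.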

Be aware that filters only affect the way the data are displayed, not the actual contents of the list. Any statistical analysis using Excel’s own statistical toolbox will still include the rows that have been filtered out. To analyse just the subset that you see, it is necessary to copy the filtered list into a new worksheet and then produce the required analysis. This extra step is not needed if you use the SSC-Stat add-in: then “what you see is what is analysed”.

26 This shows again the value of using names, because the formulae are much easier to understand. A further example is the condition MEMBERS>=PERCENTILE(MEMBERS,0.8), to identify roughly the top 20% of household sizes, and so on.
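The difference between what a filter shows and what an analysis uses can be sketched in Python, with a list of TRUE/FALSE flags standing in for the filter and a copied list standing in for the new worksheet. The records and the 6-or-more condition are hypothetical.

```python
import statistics

# Hypothetical list: each record is one household
records = [
    {"HHID": 101, "MEMBERS": 4},
    {"HHID": 102, "MEMBERS": 7},
    {"HHID": 103, "MEMBERS": 2},
    {"HHID": 104, "MEMBERS": 6},
]

# A filter only decides which rows are shown; the list itself is unchanged
shown = [r["MEMBERS"] >= 6 for r in records]

# An analysis run on the whole list still uses every row ...
mean_all = statistics.mean(r["MEMBERS"] for r in records)

# ... so copy the visible rows to a new "sheet" first, then analyse the copy
subset = [dict(r) for r, keep in zip(records, shown) if keep]
mean_subset = statistics.mean(r["MEMBERS"] for r in subset)

print(len(records), mean_all)
print(len(subset), mean_subset)
```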


13. Sorting

Perhaps the data would be easier to check if the respondents were sorted on household size.

1. Make the active cell somewhere in the column called MEMBERS and use Data > Sort.

2. Sort in descending order, see Fig. 13.1. Press OK to give the data sorted as shown in Fig. 13.2.

Fig. 13.1 Sort dialogue

Fig. 13.2 Data sorted on household size

3. In Fig 13.2 you can see clearly that there are up to 7 people in the households, and that 5 households have six people or more.

4. Now put the data back into their original order27.
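Sorting and then restoring the original order relies on an index column, as footnote 27 suggests. A Python sketch with hypothetical records, where INDEX plays the part of the index column:

```python
# Hypothetical records; INDEX is the index column, in ascending order
records = [
    {"INDEX": 1, "MEMBERS": 4},
    {"INDEX": 2, "MEMBERS": 7},
    {"INDEX": 3, "MEMBERS": 2},
    {"INDEX": 4, "MEMBERS": 6},
]

# Like Data > Sort on MEMBERS, descending (Python's sort is stable,
# so households of equal size keep their relative order)
by_size = sorted(records, key=lambda r: r["MEMBERS"], reverse=True)

# Put the data back into their original order by sorting on the index column
restored = sorted(by_size, key=lambda r: r["INDEX"])

print([r["INDEX"] for r in by_size])  # [2, 4, 1, 3]
print(restored == records)            # True
```

Without the index column there would be no reliable way back once other edits had been made, which is why it is worth adding before any sorting.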

14. Formulae and functions

To illustrate the use of Excel for calculation, we consider the expenditure of households on primary education. In the Expenditure sheet, column QD5.1 gives the expenditure on fees, while columns QD5.4, QD5.7, QD5.10, QD5.13 and QD5.16 give the expenditure on uniforms, books, transport, food and harambee (fund raising).

Our task is first to calculate the total expenditure on primary education, and then the percentage on books.

1. On the expenditure sheet, select the data, and then use Insert > Name > Create to name the columns of data.

2. In the first row of the next empty column (cell CB1 for us), type the name “Primary”.

27 Hint: Luckily you constructed an index column (see Fig. 13.2), which used to be in ascending order, so you can sort on this column. Alternatively, as you have only just done the sorting, you could use Undo.


3. In the second row enter “=qd5.1+qd5.4” and press <Enter>. We got 1300 for the sum.

4. Now complete the formula to add the other columns.

5. Then drag to the bottom of the data. This should give the extra column, see Fig. 14.1

6. The next calculations will be simpler if we “register” the name of the new column. So select the whole column and use Insert > Name > Define.

7. Now go to an empty cell. We will use the SUM function, so type “=sum(qd5.4)”.

8. In the next cell type “=sum(Primary)”.

9. Then, in a third cell, divide the first result by the second.

10. Finally do the same again, but in a single calculation, as in Fig. 14.2. So overall, 31% of the expense of primary education was on books.
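The two stages of this exercise, building the Primary column row by row and then dividing one column total by another, can be sketched in Python. Following the column list at the start of this section, the sketch treats QD5.7 as the books column; check this against your own sheet. All values and the `percent_of_total` helper are hypothetical.

```python
# Expenditure columns for primary education (one value per household)
PRIMARY_COLUMNS = ["QD5.1", "QD5.4", "QD5.7", "QD5.10", "QD5.13", "QD5.16"]

# Hypothetical households
rows = [
    {"QD5.1": 600, "QD5.4": 150, "QD5.7": 200, "QD5.10": 0, "QD5.13": 0, "QD5.16": 50},
    {"QD5.1": 0, "QD5.4": 0, "QD5.7": 0, "QD5.10": 0, "QD5.13": 0, "QD5.16": 0},
    {"QD5.1": 300, "QD5.4": 100, "QD5.7": 100, "QD5.10": 0, "QD5.13": 0, "QD5.16": 0},
]

# The Primary column: like "=qd5.1+qd5.4+..." dragged down every row
primary = [sum(row[c] for c in PRIMARY_COLUMNS) for row in rows]


def percent_of_total(part, whole):
    """Like =100*SUM(part)/SUM(whole); raises ZeroDivisionError where Excel shows #DIV/0!."""
    return 100 * sum(part) / sum(whole)


books = [row["QD5.7"] for row in rows]  # assumed books column
print(primary)
print(round(percent_of_total(books, primary), 1))
```

This is the ratio of the two totals, which is what the SUM-based formula gives. Averaging each household’s own percentage would give a different figure, and would fail for households that spent nothing.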

Fig. 14.2 Calculations on expenditure for primary education

[Fig. 14.2 annotation: percentage on books]

Next we might be asked what proportion (or percentage) of households spent money on primary education.


11. One way to do this is to use the COUNTIF function. You could try using the function wizard for this, or choose an empty cell and type the formula =COUNTIF(Primary,">0").

12. Then also calculate the count overall, i.e. “=COUNT(Primary)”

13. Finally the percentage is given by dividing one by the other. We got 100*199/402 = 49.5%.
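The COUNTIF/COUNT calculation can be sketched in Python as a count of positive numbers divided by a count of all numbers. The `percent_positive` helper and the column values are hypothetical; blank cells are represented by `None`.

```python
def percent_positive(values):
    """Like 100*COUNTIF(range,">0")/COUNT(range): share of numeric cells above zero."""
    # COUNT only counts numbers, so drop blanks (None), text and logical values
    numbers = [v for v in values
               if isinstance(v, (int, float)) and not isinstance(v, bool)]
    positive = sum(1 for v in numbers if v > 0)
    return 100 * positive / len(numbers)


# Hypothetical Primary column, including one blank cell
primary = [1000, 0, 500, None, 0, 250]
print(round(percent_positive(primary), 1))  # 60.0

# The same arithmetic as in the text
print(round(100 * 199 / 402, 1))  # 49.5
```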

15. Missing values

Missing values are common when analysing data. They are pieces of information that relate to the variable of interest, but are of a different nature from the main bulk of the data. In surveys they can arise because:

• The respondent refused to reply

• The respondent did not know

• The question was not applicable

In general, they can indicate that information was not available, that something was impossible to measure, or that a mathematical calculation was impossible, such as 2/0. An example is in Fig. 14.2, where 0/0 was given as #DIV/0!

The usual practice is to assign specific codes to different types of missing values. These codes are often numbers that do not occur in the variable. For example, in Fig. 15.1 we see that the number of acres of agricultural land includes missing values, coded as 999.90. From Fig. 15.2 we see that there were 4 missing values in this district.

Fig. 15.1 Missing values in the survey

Fig. 15.2 Showing 4 cases were missing
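A missing-value code such as 999.90 has to be counted and excluded explicitly; otherwise it is treated as a real measurement. A Python sketch with hypothetical acreage values (exact comparison with the code is reasonable here only because the code is stored exactly as typed):

```python
import statistics

MISSING_CODE = 999.90  # code used for missing acreage in this example

# Hypothetical acreage values, two of them coded as missing
acres = [2.5, 999.90, 1.0, 999.90, 4.0]

n_missing = sum(1 for a in acres if a == MISSING_CODE)
valid = [a for a in acres if a != MISSING_CODE]

print(n_missing)               # 2
print(statistics.mean(valid))  # 2.5
# Forgetting the code would treat 999.90 as a real acreage:
print(statistics.mean(acres))
```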

Unlike statistics packages, Excel does not have specific facilities to handle missing values. The user has to make sure that any analysis treats missing values appropriately. One approach is to leave the cells blank, and insert a comment to indicate the reason for the missing value.
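Excel’s AVERAGE and COUNT skip empty cells, so leaving missing values blank keeps them out of those results, whereas typing 0 would not. The Python sketch below mimics this, with `None` for a blank cell and a separate dictionary standing in for the cell comments. The helper name, values and reasons are hypothetical.

```python
# Hypothetical acreage column: None stands for a blank cell
acres = [2.5, None, 1.0, None, 4.0]

# Reasons for the blanks, kept separately like cell comments (row position -> note)
comments = {1: "Refused to reply", 3: "Did not know"}


def average_ignoring_blanks(values):
    """Like Excel's AVERAGE over a range: empty cells are left out of the count."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present)


print(average_ignoring_blanks(acres))           # 2.5
# Treating the blanks as zeros instead gives a misleading, smaller mean
print(sum(v or 0 for v in acres) / len(acres))  # 1.5
for row, note in comments.items():
    print(row, note)
```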


16. In conclusion

Some readers may be surprised that this guide is mainly concerned with data manipulation rather than analysis. However, we find that organising the data is often the most time-consuming part of data processing.

At the start of this part of the guide we stated that individual sheets should remain simple. One aspect of this simplicity is that results are best kept on sheets separate from those containing the data28. We did not do this ourselves when examining the expenditure on primary education in Section 14 (see, for example, Fig. 14.2), because for these small calculations it was more convenient to use part of the same sheet. However, now that those calculations are done, the results can easily be cut and pasted to a new sheet, as we show in Fig. 16.1.

Fig. 16.1 Putting results on a separate sheet

Sometimes you may want to show the actual values, perhaps sorted, and with summary results for different groups, as separate rows on the sheet. In this case we suggest that you first make a copy of the data sheet, and then mix data and results on the copy. Then there is always a master sheet containing just the data.

Excel is often used for tabulating data and for graphical work. These activities have their own guides.

28 This happens automatically when a statistics package is used for the analysis. They normally have a spreadsheet-type display for the data, with a separate window for the results.


Biometry Unit Consultancy Services College of Agriculture & Veterinary Sciences

University of Nairobi, Kabete Campus

E mail: [email protected]

Website: http://www.uonbi.ac.ke/acad_depts/BUCS

IN COLLABORATION WITH

Statistical Services Center School of Applied Statistics, The University of Reading

Reading RG6 6FN, UK

Website: http://www.reading.ac.uk/ssc