UTS CRICOS PROVIDER CODE 00099F
UTS Library
Easy Data Analytics in Excel
For Mac
Table of Contents
Finding Data
Putting Data into Excel
Making a ‘working’ data sheet.
Cleaning the data
Making a table
Creating a graph
Filtering a Chart
Part 2 - Top oil consuming countries
Extracting data from a pdf
Working with the Oil Consumption Data from BP
Using text filters
Making a Line Chart
Creating an average
Charting average oil consumption
Standardizing units of measurement with a formula
Introduction – Researching a Commodity (Oil)
Say we are researching oil and we want to know who is producing it and who is consuming it. In particular, we would like to know which countries are the top producers and which are the top consumers.
Part 1 – Top Oil Producing Countries
What is Excel?
Excel is a spreadsheet program that allows you to organise and manipulate data, and it includes calculation, graphing and charting tools. An Excel file is called a workbook, and within each workbook you can have one or more sheets. Sheets are like tabs that let you manage different tasks within the one overall project. As with any document, make sure you regularly save your workbook somewhere you can find it on your computer.
Finding Data
The website list provided on UTS Online contains a range of sources for you to consult. For oil, websites like the International Energy Agency would probably be able to answer this question, but you can search on Google too. The image shows the results for the search ‘top oil producing countries’.
For our example in this workbook we’ve chosen to use Crude Oil production from the CIA World Factbook. The World Factbook is a good solid source, as it’s produced by the US Government. The information it contains is pretty current too.
Putting Data into Excel
There is a download link above the oil production data, but in this case it creates a not terribly useful text file. So instead of using that, we’ll try to copy and paste from the webpage itself. The first thing we need to do is highlight all the text in the table (including the headings) and copy it.
Open Excel and save the workbook to your desktop. At the bottom of the workbook, double click on the name of the sheet and rename it to ‘Oil production raw’.
Click into the cell at the top left of the sheet (A1) and then paste your data.
After you copy and paste into Excel it looks like this:
Paste in the link of your data source beneath the table, so that you know where you found it.
Making a ‘working’ data sheet.
It is good practice to have one sheet for your raw data and a separate sheet where you work with the data. That way, if you accidentally delete or change the data, you can always return to the original.
To do this, go down to the ‘Oil production raw’ sheet tab and right click, then choose move or copy.
Check the ‘create a copy’ box and then rename the new sheet ‘Oil production working’.
Cleaning the data
We have some unnecessary information in this data set, so we need to remove it before we can make our chart. In our oil production working sheet, let’s remove column A (ranking). To do this, click in column A. Then right click and choose delete. Check the box saying entire column.
Now our sheet looks like this:
On a Mac you may find the formatting hasn’t been cleared, so for example all the countries might still be underlined. To remove formatting choose clear>formats.
Now our sheet looks like this:
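If you ever want to repeat this clean-up outside Excel, the same idea (keep the raw data untouched, work on a copy, drop the ranking column) can be sketched in a few lines of Python. This is only an illustration: the country names and numbers below are invented placeholders, not figures from the World Factbook, and the variable names are our own.

```python
# Hypothetical rows as they might look after pasting a web table:
# rank, country, value. Names and numbers are invented placeholders.
raw_rows = [
    ["Rank", "Country", "Barrels per day"],
    [1, "Country A", 300],
    [2, "Country B", 200],
    [3, "Country C", 100],
]

# Build a separate 'working' copy without column A (index 0),
# so the raw rows stay exactly as they were pasted.
working_rows = [row[1:] for row in raw_rows]

for row in working_rows:
    print(row)
```

Because the working rows are built as new lists, deleting or changing something in them leaves the raw rows as they were, which is the same reason we keep separate raw and working sheets.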
Making a table
If you want to sort the contents of this sheet or make a graph or chart out of the data, it’s a good idea to turn the data into a table.
To highlight the entire table do command-A.
You can also do command-shift-down and then command-shift-right to select the data if you prefer. After you select the data you can check the corners of the selection by doing ctrl-fullstop.
When you have highlighted the data do insert>table.
Ideally the header row box will be ticked:
Now my data looks like this
Creating a graph
Say you want to graph the top 20 producers in a bar or column chart. Just highlight the top 20 rows and do insert > chart
and choose the column graph
Then choose 2d clustered columns
That produces a graph that looks like this
We can rename the graph by double clicking on the title. We might do this to make the data more understandable, for example by calling it ‘Leading oil producers (BBL/day)’.
Filtering a Chart
By using chart>source data you can toggle elements of the chart on and off.
Say you want to compare the biggest producers on each continent. You can use the remove button to take out the countries that don’t apply, leaving yourself with the countries that do.
To change the colour of the chart, first click on the columns to highlight them
Then use chart layout to select colours like this
Part 2 - Top oil consuming countries
First, find a dataset for the top consuming countries. Do a Google search for ‘top oil consuming countries’ and click on the Wikipedia article.
Mousing over the superscript numbers will give you links to their sources. Click on number 2 – the Statistical Review of World Energy and open that link up.
This is a report produced by BP (a company). Information provided by companies needs a little extra scrutiny before being used but this is the kind of numerical data where there isn’t much ambiguity, and there are citations beneath the tables to describe where the data is being sourced from. There is some good oil consumption data on p.9 of this report:
However, if we try to copy and paste from the PDF, the data doesn’t come out properly and is unusable, so we need to figure out another way to grab this data.
Extracting data from a pdf
The first thing we need to do is download the pdf. Use the download button on the document (the little down arrow and computer) to do this.
Once it’s downloaded go to your downloads folder and open the file up.
This is a 48 page document, but we only want one page for our purposes. So what we need to do is extract page 9 from the pdf, because we don’t want 48 pages’ worth of Excel tables, especially when a lot of it is just text.
So, to extract page 9:
With the downloaded pdf open on your computer, use this view option to select thumbnail view
Then go down to page 9 in the thumbnails and click on it so a border appears around it
Now go to edit> copy
Then do file>new from clipboard
This will make a duplicate of that page called untitled 1
Then do file>save and give the file an indicative name like oil consumption
Now you can upload that single page pdf into a PDF to Excel file converter. The one I used was https://online2pdf.com/
Once you arrive at the website select the file you placed on the desktop and press convert
Once converted your file should look like this:
Working with the Oil Consumption Data from BP
This data is quite bunched up, and it’s formatted differently to our other sheet. So to standardize the appearance of this sheet we need to clear formats again
And to remove this bunching use format>autofit row height
Now the data looks like this:
The font size is 10 and the font is Times New Roman, so we can change that here if we like
Once our data is all tidied up, let’s use command-A to select it and copy it into a new sheet in the workbook that holds our production data. We’ll call this new sheet ‘Oil consumption raw’. Copy and paste the link to the source below this data as well.
Now we can make another sheet (right click > move or copy > choose move to end > check the box called ‘create a copy’) and rename the new sheet Oil Consumption Working.
For the purposes of this exercise we might delete the rightmost two columns that track percentage changes.
After doing that, highlight all the relevant data (headings, countries and data, but not the footnotes) and then choose table>new>insert table with headers
Then open the drop down beneath the header 2015 and choose sort descending
These are our sorted values. Note that the top rows are now all continent totals, which we don’t want.
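To see why the totals float to the top, here is a minimal Python sketch of the same descending sort. The row labels and values are invented placeholders (not BP’s figures), and the field names are our own assumptions for the illustration.

```python
# Invented placeholder rows: two countries and one regional total.
rows = [
    {"label": "Country A", "2015": 120.0},
    {"label": "Total Region X", "2015": 900.0},
    {"label": "Country B", "2015": 450.0},
]

# Sort by the 2015 column, largest first (like 'sort descending').
sorted_rows = sorted(rows, key=lambda row: row["2015"], reverse=True)

for row in sorted_rows:
    print(row["label"], row["2015"])
```

A total is the sum of the countries in its region, so it will generally be larger than any single country in that region and lands near the top of a descending sort.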
Using text filters
To get rid of the total rows we found above, you can highlight them and then right click>delete.
However, you can also use a text filter to help locate the rows you want to delete. This method is useful for large datasets where it isn’t easy to go through and delete rows manually.
Select the drop down menu under Thousand barrels daily.
Then choose filter >contains> and then write ‘total’
You’ll see all the rows that have the word total in them now.
Now highlight all the rows that you need to delete and then right click and choose delete row
Now to get back to your data, open up the drop down beneath thousand barrels daily again and then choose ‘clear filter’.
You can repeat this process with other words if that’s useful.
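The logic of the filter-then-delete steps above can be sketched in Python as follows. The rows are invented placeholders, and lowercasing the label before matching is our own choice for this sketch so that ‘Total’ and ‘total’ are both caught.

```python
# Invented placeholder rows: labels in the first column, one value column.
rows = [
    ["Country A", 120.0],
    ["Country B", 450.0],
    ["Total Region X", 570.0],
    ["Country C", 80.0],
    ["Total Region Y", 80.0],
]

def contains_word(label, word):
    """Return True if the label contains the word, ignoring case."""
    return word.lower() in label.lower()

# Rows the filter would show (the ones we want to delete).
total_rows = [row for row in rows if contains_word(row[0], "total")]

# Rows that remain after deleting them and clearing the filter.
country_rows = [row for row in rows if not contains_word(row[0], "total")]

print(total_rows)
print(country_rows)
```

To repeat the process with another word, call `contains_word` with that word instead of "total".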
Making a Line Chart
Let’s make a chart to show US consumption of oil over time. To do that we need to highlight two rows of the table: the headings and the US row.
Then do Insert> charts.
Then choose line
Then select a type of line chart
At first your data may look all back to front:
But if you go to data and choose plot series by row it will look normal again:
If you want to, you can add a label for the y axis. Use chart layout and then select axis titles
Then change the text saying axis title to Thousands of barrels per day.
We can also double click on the title and change this to US oil consumption 2005-2015. The finished product looks like this:
Creating an average
Go back to our original sheet for oil consumption. If you want to calculate an average across all those years, click into the cell to the right of the top row of values
Then click formulas>formula builder
Then double click the option called AVERAGE
Excel will autoselect all the values in that row.
If you are happy with the range, press enter and the average will autofill down across all the countries
This is useful, but the heading says 2016 where we want it to read Average. So double click on 2016 and relabel it.
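For comparison, here is a small Python sketch of what the AVERAGE step works out for each row: the mean of that country’s yearly values, stored in a new column headed Average. The years, country names and numbers are invented placeholders chosen to keep the arithmetic easy to check.

```python
from statistics import mean

# Invented placeholder table: a header row, then one row per country.
header = ["Country", "2013", "2014", "2015"]
rows = [
    ["Country A", 100.0, 110.0, 120.0],
    ["Country B", 40.0, 50.0, 60.0],
]

# Add an 'Average' heading and append each row's mean of the yearly values
# (everything after the country name in position 0).
header_with_average = header + ["Average"]
rows_with_average = [row + [mean(row[1:])] for row in rows]

print(header_with_average)
for row in rows_with_average:
    print(row)
```

Note that only the yearly values go into the average; the country name in the first column is left out, just as the range Excel selects should cover the numbers only.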
Charting average oil consumption
To show the average oil consumption of the top 10 consuming countries, hold down command and select the top 10 countries from the countries column, and also select the matching averages.
To graph this do insert > chart
Why not play around with the possibilities of charts and see what you can do with this data?
Standardizing units of measurement with a formula
If you wanted to compare these two datasets you would need to standardize the units of measurement, as one set (consumption) is expressed in thousands of barrels per day whereas production is expressed as a whole number of barrels per day. To standardize the two, we need to make the consumption figures also represent whole numbers of barrels, which means multiplying all the numbers in that set by 1000.
To do this, click into an empty cell in the same row as the first number you want to convert, e.g. if that number is in B2, choose an empty cell alongside it.
Then, in the formula bar (marked fx) in Excel, write
=(B2*1000) and then press enter
The multiplied number will appear in whichever cell you clicked before entering the formula. This needs to be a different cell from B2 itself, because a formula typed into B2 cannot refer to its own cell. It’s a good idea to make it the cell next to the original number.
Then go to the cell with the multiplied number. You will see a small green dot at the bottom right of the cell. Click-drag this green dot downwards. It will apply the same formula to all the numbers below it.
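The same conversion can be sketched in Python. This is only an illustration of the arithmetic: the country names and values are invented placeholders, and the variable names are our own.

```python
# Consumption in thousands of barrels per day (invented placeholder values).
consumption_thousands = {
    "Country A": 19.5,
    "Country B": 4.0,
}

BARRELS_PER_THOUSAND = 1000

# Multiply every value by 1000 to get whole barrels per day,
# which is the unit the production data uses.
consumption_barrels = {
    country: value * BARRELS_PER_THOUSAND
    for country, value in consumption_thousands.items()
}

for country, barrels in consumption_barrels.items():
    print(country, barrels)
```

As in the spreadsheet, the converted numbers are stored separately from the originals, so you can always check the result against the source figures.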