computer analysis and visualisation

Upload: khoi-bocchoi

Post on 01-Jun-2018

226 views

Category:

Documents


0 download

TRANSCRIPT

  • 8/9/2019 Computer analysis and Visualisation

    1/39

    Chapter 5Key knowledgeAfter completing this chapter you will be able to demonstrate knowledge of :After completing this chapter, you will be able to demonstrate knowledge of : stages of the problem-solving methodology stages of the problem-solving methodology ypes of information problems and users needs that can be met through types of information problems and users needs that can be met through

    presenting information in visual formspresenting information in visual forms problem-solving activities related to analysing information problems problem-solving activities related to analysing information problems types of data visualisations types of data visualisations sources of authentic data sources of authentic data data types and data structures relevant to selected software tools data types and data structures relevant to selected software tools purposes of data visualisations purposes of data visualisations suitability of different types of visualisations that meet users needs suitability of different types of visualisations that meet users needs design tools for representing data visualisations design tools for representing data visualisations needs of users that can inuence the type and presentation of visualisations needs of users that can inuence the type and presentation of visualisations

    criteria and techniques for evaluating visualisations criteria and techniques for evaluating visualisations characteristics of le formats and their ability to be converted into other characteristics of le formats and their ability to be converted into otherformatsformats

    unctions of appropriate software tools to select required data and to functions of appropriate software tools to select required data and tomanipulate data when developing visualisationsmanipulate data when developing visualisations

    ormats and conventions applied to visualisations in order to improve their formats and conventions applied to visualisations in order to improve theireffectiveness for intended users.effectiveness for intended users.

    For the student If you can imagine that the amount of data that is generated every day just byIf you can imagine that the amount of data that is generated every day just byusing social-networking tools you might also be able to imagine that there isusing social-networking tools, you might also be able to imagine that there is

    someone somewhere who is looking through a mountain of data looking forsomeone, somewhere, who is looking through a mountain of data looking formeaning.meaning.Data visualisation is the process in which we take large amounts of dataData visualisation is the process in which we take large amounts of data

    and process it into effective graphical representations that will meet theand process it into effective graphical representations that will meet theneeds of users or clients. These representations can take the form of chartsneeds of users or clients. These representations can take the form of charts,graphs spatial relationships and network diagrams. In some cases the datagraphs, spatial relationships and network diagrams. In some cases, the datavisualisation might involve interactivity and the inclusion of dynamic datavisualisation might involve interactivity and the inclusion of dynamic datathat allows the user to deduce further meaning from the visualisation.that allows the user to deduce further meaning from the visualisation.

    Data analysis and visualisation

  • 8/9/2019 Computer analysis and Visualisation

    2/39

    9780170187466177

    For the teacher This chapter introduces students to the knowledge and skills needed toThis chapter introduces students to the knowledge and skills needed touse software tools to access authentic data from repositories and presentuse software tools to access authentic data from repositories and presentthe information in a visual form. The key knowledge and skills are basedthe information in a visual form. The key knowledge and skills are based

    on the Unit 2 Area of Study 1. If a data visualisation is effective it reduceson the Unit 2 Area of Study 1. If a data visualisation is effective, it reducesthe effort needed by readers to interpret information. This chapter takesthe effort needed by readers to interpret information. This chapter takesstudents through the different types of visualisations and then uses astudents through the different types of visualisations, and then uses acase study to explore some of the tools available to process data fromcase study to explore some of the tools available to process data fromrepositories such as the Bureau of Meteorology and the Australian Bureaurepositories, such as the Bureau of Meteorology and the Australian Bureauof Statistics.of Statistics.

    Revising the problem-solving methodologyAs you might remember from Unit 1, problems exist when the currentinformation system does not meet the information needs of the clientsor users or when there is a new need identied. This could be a simpleproblem, such as miscalculations in a database, or a more complex onethat involves nding an appropriate tool to display otherwise complex andincomprehensible data in a visual form.

    The process that we go through is the application of the problem-solving methodology. You may remember that this is a framework that can

    be used to guide the development of an ICT solution. Each stage of themethodology takes you through key aspects of the solution to consider,while always focusing on the needs of the user (Figure 5-1).

    Many project managers invest themajority of their time at the analysis stageof the problem-solving methodology toprevent time being spent on unnecessarytasks at the development stage.

    Presenting information in a visual formMany people prefer information to be presented in a visual form rather thanlines of text or gures. If we take a close look at an information problem thatdeals with the processing of a lot of complex data, often the needs of the userscan be met by presenting the data in a visual form.

    Edward Tufte is known for his workwith information design. Seewww.edwardtufte.com

    FIGURE 5-1Problem-solving methodology.

    RequirementsConstraints

    Scope

    ANALYSIS

    functionality appearance

    Solution design

    Evaluation criteria

    DESIGN

    ManipulationValidationTestingDocumentation

    DEVELOPMENT

    Strategy (measuring)ReportingEVALUATION

    The problem-solving methodology isexplained in detail on page xiii.

  • 8/9/2019 Computer analysis and Visualisation

    3/39

    9780170187466178

    Information Technology VCE Units 1&2

    There are three different ways in which the information on latenesscould have been presented: written, such as the ten-page report, which was generated from the

    system oral, such as a voicemail or broadcast visual, such as a histogram or a chart.

    In the case of the late students, a chart will provide a more immediateunderstanding of the distribution of data across a timescale from 8.30 a.m.to 9.30 a.m.

    Using a simple data visualisation, such as a histogram or a line graph(see Figure 5-3), we can identify when the majority of students are signingin late and this information might be used by school administration toidentify a pattern and formulate a plan of action.

    For example, a secondary school might have a problem with largenumbers of students signing in late every day. It is assumed that thereis an issue with student behaviour, and there is a ten-page report thatis printed every day and distributed to each year-level coordinator.The information problem is that the ten-page printout is difcult to

    understand because of the vast quantities of data. A report listingevery late student at a school might seem impressive and informative, but the list is just that a list of data (Figure 5-2). The large reportmeets the needs of a year-level coordinator chasing up students;however, the list does not show the !"#$%"&'$"() of latecomers orwhether there is a relationship between lateness and different days ofthe week.

    The website Studying Style claims that 65per cent of people are visual learners.

    A !"#$%"&'$"() of data is whencollected data is spread over a period oftime or dates; for example, data on howmany people use public transport.

    Think about IT 5-1

    What other data sets might becomemore meaningful when distributedacross a timescale?

    FIGURE 5-2Late student list shows information that is only meaningful to the year-level coordinators.

    Time Student name Surname Home group Excuse8.35 Scott Kelly 7A Train was late

    8.35 Michael Fitzgerald 12B Slept in

    8.36 Sally Saunders 7C No excuse, justlate

    8.36 June Ho 9D Tram was late

    8.36 Rosanna MacKenzie 9A Train

    8.37 Dion Lei 9A Missed the train

    8.37 Tony Marchetti 9A Train was late

    8.40 Yolanda Kerr 7C Train was late

    8.40 Finn Crisp 10B Orthodontist

    8.41 Hong Chen 10B Car broke down

    8.41 Tim Morris 11E Mums fault

    8.42 Christine Nguyen 12B Train

  • 8/9/2019 Computer analysis and Visualisation

    4/39

  • 8/9/2019 Computer analysis and Visualisation

    5/39

    9780170187466

    Information Technology VCE Units 1&2

    180

    If the user is looking to: compare data, a bar chart might be used show a distribution of data, a histogram might be used show a relationship between two data sets, a scatter diagram might be used show a composition of data that changes over time, this might involve an

    animation or simulation.Often before we start this process, we have an idea of what the solutionmight roughly look like based on the needs of the users. We might ask thefollowing questions: Does the information problem involve data analysis? Does the data need to be compared? Does a distribution need to be identied from the data set? Does a relationship between two sets of data need to be identied? Does the user need to make a decision from a collection of data? Is there a need for the data to be communicated in a graphical way

    because of the types of users?

    Analysing information problemsMany people believe the analysis stage of the problem-solving methodologyis the most important part. If you get this stage correct, your solution is morelikely to be successful as you will have a clear understanding of the problem.

    This analysis stage of the problem-solving methodology often involves theprocess of determining the scope, requirements and constraints of a solution.

    More and more data is being presentedin a visual form. For example, Wordle can be used tocreate a text cloud, which shows thefrequency of certain words being used ina document or on a webpage.

    Abraham Lincoln is quoted as saying, IfI had eight hours to chop down a tree,Id spend six hours sharpening my axe.Time spent analysing and designing aninformation problem is never wasted.

    Think about IT 5-4

    Using the Wordle tool atwww.wordle.net/ , use the text from your schoolnewsletter to create a text c loud. Doesthis data visualisation tool clearly showany key themes from your schoolnewsletter?

    CASE STUDY:Southside Makers

    Southside Makers is a not-for-prot craft association that supportscraftspeople, hobbyist and micro businesses in Melbournes southernsuburbs. The Southside Makers comprise about forty people who gettogether socially to craft, and they also run a market every month inSt Kilda Town Hall to promote handicrafts.

    The Market Committee includes three hardworking volunteers,and all work contributing to the success of the market is done for thelove of it.

    As part of the Southside marketing strategy, a letterbox drop has beendone a few weeks before the market in suburbs surrounding the market.Members of the craft club also place bundles of pamphlets in cafs in thehope that customers will pick one up and then visit the market. In the

    past the letterbox drop has been an unplanned approach, with volunteerspicking random streets near their homes or in the suburbs surroundingthe market.

    The last few markets have not seen an increase in attendance;as a result the stallholders have not made as many sales as theywould normally have done. The stallholders have requested that theMarket Committee investigate possible reasons behind this drop inattendance.

  • 8/9/2019 Computer analysis and Visualisation

    6/39

    9780170187466

    Chapter 5 Data analysis and visualisation

    181

    Remember that the rst step in problem-solving is conducting an analysis.The analysis involves an investigation of the factors that affect the problem(constraints), identication of the needs of the users (requirements), andthe output that must be produced to meet those needs (scope). A commontechnique is to write out the problem as a short statement or question.

    CASE STUDY:Southside MakersAfter lengthy discussion by the Market Committee, the following

    *()#$%,")$# were identied: Southside Makers do not have a lot of resources to put towards asolution, so any solutions have to be reasonably priced with minimalimpact on the information system that they already have.

    Southside Makers also identied a time constraint. The solution hadto be achievable before the next market.Southside Makers have devised a short statement that denes their

    information problem: How can Southside Makers improve the effectivenessof their marketing campaign to increase attendance at their markets?

    Project*()#$%,")$# are often denedby the time and resources data, people,software, procedures available for thesolution. Another constraint may also bewhich area of the organisation is beingtargeted.

    CASE STUDY:Southside MakersAt the last market, the Southside Market Committee decided to undertakea visitor survey, both online and hard copy, to try and identify the averagemarket visitor. It is hoped that the information obtained is going to be usedto make changes to the current marketing strategy.

    FIGURE 5-5Southside Makers Market is run by volunteers.

    Obtaining feedback

  • 8/9/2019 Computer analysis and Visualisation

    7/39

    9780170187466

    Information Technology VCE Units 1&2

    182

    When seeking feedback there are two types of data that can be gathered:/',)$"$,$"0. and /',+"$,$"0. .

    Quantitative data is measurable and specic and therefore easier tochart or graph. At a simplistic level, quantitative data gathering is based onverifying theory through the use of statistics and largely numerical data,

    whilst qualitative data gathering typically generates theory. An example ofquantitative data is: Fifty per cent of market attendees bought a product fromthe market. An example of qualitative data gathering is more descriptive: Ona cold and wet, winters day, many older members of the community turned upto the market dressed in raincoats, wearing scarfs, and most carried umbrellas.

    Quantitative data can be gathered by using the following data gatheringinstruments: surveys questionnaires observation.When data has been gathered using the instruments mentioned above,quantitative data can be analysed by using software such as Excel, SPSS,Minitab or SAS. This takes times and often involves hours of data entrydepending on the complexity of the data gathering instrument (Figure 5-6).For simple data gathering, on-line surveys such as SurveyMonkey allowsusers to create surveys and manage the collection and analysis of quantitativedata. Results are often saved into a database and then downloadable inCSV (comma-separated value) format (see Figure 5-7 and Figure 5-8).SurveyMonkey also permits qualitative data to be entered.

    1',)$"$,$"0. !,$, is measurableand specic; for example, the amountof students who attend assembly eachweek.

    1',+"$,$"0. !,$, is harder tomeasure and deals with peoples opinions

    and feelings towards something.

    FIGURE 5-6Data gathering doesnt have to be complicated: it just needs to be effective.

    FIGURE 5-7A CSV le opened in Notepad. Data is unreadable and hard to understand, but the leformat means that the le size is very small.

    SurveyMonkey was discussed onpage 130.

    One popular online survey tool isSurveyMonkey. Registered account holders candownload reports and the raw data isgiven in a CSV format. This data can thenbe used to create data visualisations.

    What is a CSV le? A CSV le is plain text le that is speciallyformatted to store spreadsheet- ordatabase-style information. Each line ofthe le has one record on it, and eacheld within that record is separated by acomma.

  • 8/9/2019 Computer analysis and Visualisation

    8/39

    9780170187466

    Chapter 5 Data analysis and visualisation

    183

    FIGURE 5-8A CSV le opened in a spreadsheet program. The information starts to become moremeaningful when put into columns.

    Think about IT 5-5

    What are some of the other techniquesthat Southside Makers can use to

    obtain feedback from visitors, membersand stallholders?

    CASE STUDY:Southside MakersThe Southside Market Committee could see from the data that they hadgathered that: The key !.3(4%,56"* of people attending the market are women aged

    thirty to forty-ve, with at least one child. The establishment of a kids craft table was one feature of the market

    that rated highly on the survey. The letterbox drops have been largely ineffective, with few people

    identifying a pamphlet in their letterbox as their reason for attending. A signicant number of market attendees picked up a pamphlet from

    their local caf and cited this as the reason why they attended the market.Additional feedback was obtained from Southside Makers Market

    stallholders on the day, and informal or verbatim feedback was gatheredand compared with the data being obtained through formal surveys.

    7.3(4%,56"* refers to thecharacteristics of an area or group ofpeople. For example, a demographic maybe people aged 1825 from non-Englishspeaking backgrounds, or aged 75 andabove and living by themselves.

    Visitors attending Southside Market

    Poster in a shop or caf

    Letterbox dropNewspaper listing

    White hat listing

    Facebook

    Southside Makers blog

    Another craft blog

    Was walking past

    Invited by a stallholder

    Invited by a friend

    FIGURE 5-9A pie chart showing quantitative data.

    Quantitative data is easy to communicate using data visualisations,such as pie charts. The pie chart in Figure 5-9 clearly shows how visitorsfound out about the market. What percentage found out through pamphlets,newspapers or friends?

  • 8/9/2019 Computer analysis and Visualisation

    9/39

    9780170187466

    Information Technology VCE Units 1&2

    184

    Qualitative data is harder to measure than quantitative data. Whengathering qualitative data, interviews, video footage and observation aredata gathering instruments. Generally, these tools need to be recordedaccurately and transcribed at a later stage. The analysis of the qualitativedata is also quite different to quantitative data. With quantitative data,

    the researcher looks for themes or patterns through the use of numbers,whereas with qualitative data, the researcher establishes rich descriptionsand nds themes through classication and coding.

    A $.8$ *+('! can be used to analyse qualitative (that is, non-specic andharder to measure) feedback from the stallholders. Key words immediatelyjump off the page, clearly communicating what was emphasised in the text.

    A$.8$ *+('! can be used to visuallyrepresent the words from a survey, achapter of a book or a collection of tagsfrom a blog. Wordle isa good example of a text cloud creator.

    Queensland-based company Leximanceris pioneering the area of datavisualisations using qualitative datato create themes, concepts and theirassociated relationships. On Leximancers

    website ,interesting data-visualisation examples ofkey reports are given, such as the 2009

    Australian Defense White Paper.

    CASE STUDY:Southside MakersA simple text cloud was used to analyse the qualitative data from online

    surveys from the market to see whether there were any patterns in whatthe stallholders and visitors were saying about the market.

    Upon closer analysis, several words jumped off the page, such asmusic and customers (Figure 5-10). This visualisation was then used,along with the quantitative data, to compare how many stallholders werehappy with their day and whether they sold enough products.

    FIGURE 5-10A text cloud from qualitative feedback from stallholders created with Wordle.

    Data visualisation might be used as part of an information solution, butit can also be used at the analysis stage to identify further data that might

    be needed to properly #*(5. the problem before moving forward to thedesign phase.

    To#*(5. a problem, we clearly denewhat the problem is and what the solutionwill affect.

    CASE STUDY:Southside MakersThe current marketing strategy of letterbox drops does not target areas of thecommunity representative of the average market visitor or buyer. As a result,the volunteers are not using their time effectively to distribute pamphlets.

  • 8/9/2019 Computer analysis and Visualisation

    10/39

    9780170187466

    Chapter 5 Data analysis and visualisation

    185

    Defining the problemBased on the feedback from stallholders and visitors to the market, the MarketCommittee identied an area in which they believe problems exist.

    These problems are causing frustration for stallholders as the market

    threatens to become no longer viable for them to attend. A possible solutionto this problem is to identify effective areas to do letterbox or caf pamphletdrops, where there is high disposable income.

    Solution requirementsImagining the solution before it has been produced takes a lot of thought. How isthe organisation going to use the information? In what format does the solutionneed to be presented? The information being produced will be inuenced bythe type of data gathered. If the Southside Makers were to do a shopping stripsurvey on a Saturday morning in St Kilda, they would receive very different datathan they would from knocking on doors in the area during the week.

    CASE STUDY:Southside MakersSouthside Makers Market need information that will help them withtargeted letterbox drops in the St Kilda area. They need to reach thehouseholders who have money to be spent on impulse items at a market.The information to meet their needs is quite specic.

    The Market Committee could go doorknocking in the area to nd outwhat the average earnings are, or they could survey people on a Saturdaymorning at the nearby shopping strip. However, both of these data-collection methods take time, and the quality or quantity of informationis not guaranteed. Another option is to use data that is freely availableonline and is collected by a reputable data-collection organisation.

    Using freely available data obtained from the Australian Bureau ofStatistics (ABS), the Market Committee can identify: the income of people living in St Kilda the different types of households in St Kilda; for example, single or

    families.

    The ABS manages quite a number ofdata sets: from the National Census datacollected every ve years to the consumerprice index. Its data can be manipulatedin text form, or viewed as a thematic mapthrough its MapStats application on itswebsite atwww.abs.gov.au/ .

    Through discussions with stallholders, the committee feels that thereis an additional need to attract more customers with higher disposableincome so that they may purchase more products from stallholders. Thefollowing questions are asked: Where do high income earners hang out?and How would we nd out this information?

    Being a volunteer organisation, marketing is done on the cheap. Thecommittee needs to come up with a way to work smarter in terms ofadvertising the Southside Makers Market.

  • 8/9/2019 Computer analysis and Visualisation

    11/39

    9780170187466

    Information Technology VCE Units 1&2

    186

    A comparison of these !,$, #.$# should show the Southside MarketCommittee the areas around St Kilda in which they should be performingtheir letterbox drops (see Figure 5-11 for an example).

    Constraints and limitationsA *()#$%,")$ is any factor that inuences the nature of the solution. This may

    be the time that it might take to produce the solution, the style and size of thesolution or the characteristics of the users. It is important to try and identify

    as many constraints as possible before heading into the design phase of theproblem-solving methodology, such as: Does the solution need to appeal to a certain gender? Does the solution need to appeal to a certain learning style?

    A !,$, #.$ is a collection of data; forexample, census data collected every veyears might be regarded as a data set.

    9()#$%,")$# might involve: time time taken to produce the

    solution or a timeframe in whichthe solution needs to be created

    resources data, people, hardwareor software that needs to be usedas part of the solution.

    Think about IT 5-6

    Use the ABS website to nd out more about yourcommunity. Click on Census Dataand then on Community Prolesfor statistical data. If you are after avisualisation of the statistics, click onMapStats.

    CASE STUDY:Southside MakersA constraint on the information needed for the Market Committee might

    be an easy-to-read solution, where the committee can draw meaning fromit reasonably quickly.

    As the committee is dealing with a local area, the solution couldprovide them with a map that they can use to distribute pamphlets forthe next market.

    A list of statistics listing each street and their average earnings wouldnot assist the committee in making decisions about how to advertisetheir market. The problem in this case study is to provide the MarketCommittee with accurate data so that they can plan their marketingstrategy for their next event.

    FIGURE 5-11An example of a thematic map from the Australian Bureau of Statistics website showingthe distribution of data. Dark areas show high-income households, which are dened asearning more than $2500 per week.

  • 8/9/2019 Computer analysis and Visualisation

    12/39

    9780170187466

    Chapter 5 Data analysis and visualisation

    187

    If the resulting solution is a data visualisation, is the audience familiarwith the problem or will they be seeing this solution for the rst time?

    How much detail does the solution need to have?The +"3"$,$"()# of a solution also need to be considered. Will there be

    any variables that will limit the usability of the solution, such as:

    Does the solution have technical limitations? Can the solution only be viewed by a certain piece of software or by acertain browser?

    Do you need to register or have an account to access the solution? Is the solution time-sensitive? Does it have to be communicated within a

    timescale? Are there any aesthetic limitations that prevent the visually impaired from

    understanding the solution?

    Types of data visualisationsWhen IBM launched its personal computer range in the 1980s (Figure 5-12), one

    of its main marketing angles was that the average person could use software toolsto be more productive, in particular through spreadsheets to process and analyseits data. Images of people getting excited by green writing on black screens were

    broadcast everywhere, paired with images of graphs and charts being used tomake otherwise confusing data more meaningful and easier to understand.

    :"3"$,$"()# might be: technical, when using the solution economic, such as needing

    to purchase software or amembership to view a solution

    aesthetic, which might preventa visually impaired person fromviewing the solution.

    Think about IT 5-7

    What constraints and limitations mightaffect the development of a schoolwebsite?

    FIGURE 5-12IBM released the 5150 PC in September 1981.

    Now, we have more data and information at our ngertips than ever before.It is estimated by a report in the Economist in February 2010 that we are

    bombarded by up to 34 gigabytes of data every day by the various tools andwebsites that we access. Social-networking tools have impacted on us by greatlyadding to the mountain of data that is created every day (see Figure 5-13).

    But with challenges, there are also opportunities every day another data-visualisation tool is being developed to be able to cut through this constantstream of data and give the users something that they can work with.

    Data visualisation is a visual representation of data that wouldnormally be quite hard to understand and derive meaning from.Displaying data in a visual form can add clarity and provide theend user with a snapshot of what the data is trying to communicate.

    Juice Analytics has a handy little widgeton their website that allows you tochoose the reason why you are charting,and then you can download into yourspreadsheet package a blank template toplay around with.

  • 8/9/2019 Computer analysis and Visualisation

    13/39

    9780170187466

    Information Technology VCE Units 1&2

    188

    Often people who present to an audience will use data visualisations tocommunicate a point because it is easier for the audience to understand quickly.

    The four main ways in which data visualisation can be used are:

    comparing data showing distribution of data showing relationships between data showing the composition of data.

    Data visualisations to compare dataComparing data is one of the most common practices for businesses: Howmuch did we sell last year compared with this year? or How many peopleuse public transport on a Monday compared with a Friday? A comparativevisualisation is one of the easiest to set up and understand.

    A chart is the most common data visualisation used. It is a visualrepresentation of a set of data. The charts that might be used to show acomparison between two data sets are: column charts, bar charts, linecharts, pie charts and comparative maps.

    For example, a pie chart can be used to show a comparison betweentrafc sources for a website (see Figure 5-14). The colours and ratio of thewedges provide the user with an immediate understanding of the data

    being shown, and they can see immediately where people are comingfrom. Are they arriving at this website directly, via a search engine or arethey being referred from another website? This form of data visualisationwould be far more useful than a series of bar charts.

    There are a number of creative variationson the humble bar chart. The use of icons,words and 3-D shading can be used toemphasise the data being visualised.

    FIGURE 5-13This diagram shows a visual representation of the data being pushed out by Twitter every day.

  • 8/9/2019 Computer analysis and Visualisation

    14/39

  • 8/9/2019 Computer analysis and Visualisation

    15/39

    9780170187466

    Information Technology VCE Units 1&2

    190

    Data visualisations to showdistribution of dataDistribution of data can be used to analyse when an event is happening acrossa timeframe. We are not comparing two sets of data; rather, we are looking athow the data relates to a xed item, such as a time, date or even grades. Somecommon data visualisations used to show distribution of data are column6"#$(4%,3# , line histograms and scatter charts.

    Column histogramsFigure 5-18 features a column histogram that shows how hits on a websiteare distributed over a period of thirty days.

    A6"#$(4%,3 shows how data haschanged over a timeframe; for example,hours, days, months and years.

    A line chart or histogram might be a good tool to use to show acomparison of site visits over time (Figure 5-17). The web developer canthen start to track relationships between site visits and site updates.

    FIGURE 5-18SiteMeter column histogram for 30 days. This popular diagnostic tool allowsclients to view a histogram of visits to their registered website across a period of7 days, 30 days or 12 months. This gure is an example of a histogram for the craft

    blog Konstant Kaos.

    FIGURE 5-17Google Analytics line histogram showing visits over time: the manager of the websitecan observe changes in viewer habits over a period of time.

  • 8/9/2019 Computer analysis and Visualisation

    16/39

    9780170187466

    Chapter 5 Data analysis and visualisation

    191

    The user can clearly see that there was a spike with hits on a particularday; they can then go and investigate whether it was something that thewebsite did to cause this or an external force, such as someone linking tothe website from a blog.

    Using the column histogram, it is far easier to understand and

    identify patterns compared with the raw data (Figure 5-19). A histogramof todays hits clearly shows that people check the website at the start oftheir work day and at lunch time (see Figure 5-20). This may inuence adecision in regard to advertising a product.

    FIGURE 5-19A SiteMeter data set that shows hits on a website. In this form, it is harder to see thedistribution of data across the day.

    Think about IT 5-9

    Google Analytics and SiteMeter are justa couple of the many tools available totrack activity on a website. What doesyour school use?

    Line histogramsThe most common line histogram is the bell curve. Many educational institutionsuse this model to ensure that there is a spread of results for each subject. Forthe bell curve in Figure 5-21, for example, students with marks at the top of thecurve are average students; those on either side of the curve are below and aboveaverage. The majority of student marks are at the top of the bell curve.

    Scatter charts;*,$$.% *6,%$# can be used to show a distribution of data against a xed unit, suchas time. For example, a scatter chart might be used to visually show the time thatstudents are signing in late along one axis, and which year level they are in to seeif there is a relationship between the distributions of the data (see Figure 5-22).

    A#*,$$.% *6,%$ can be used to showa distribution of data against a xed unitof time.

  • 8/9/2019 Computer analysis and Visualisation

    17/39

  • 8/9/2019 Computer analysis and Visualisation

    18/39

    9780170187466

    Chapter 5 Data analysis and visualisation

    193

    Scatter chartsA scatter chart can again be used to show the relationship between two datasets. In Figure 5-23, for example, the scatter chart shows a relationship betweentwo sets of data: frequency of volcano eruptions and the waiting time betweeneruptions. In a scatter graph, the independent variable, the eruption duration,goes on the horizontal axis while the dependent variable, the waiting time

    between eruptions, goes on the vertical axis. The chart can clearly show arelationship between the two sets of data; and to assist with the understanding ofthis data, we can draw a line to show this.

    FIGURE 5-23The volcano scatter chart clearly shows the waiting time between volcano eruptions andthe duration of the eruption. A line of best t can then be drawn to show the relationship.

    100

    90

    80

    70

    60

    50

    40

    Sample scatter graph

    Eruption duration (min)

    W a i t i n g t i m e b e t w e e n e r u p t i o n s ( m

    i n )

    2.0 3.0 4.0 5.0

    FIGURE 5-22Scatter charts can show the distribution of data. In this example, you can clearly see that agroup of Year 9 students arrived at the same time on the day that the data was processed.

    Students that are late to school

    7

    8

    9

    10

    11

    12

    8.34 8.36 8.38 8.40 8.42 8.44 8.46 8.48 8.50

    Time students arrived

    Y e a r l e v e l

  • 8/9/2019 Computer analysis and Visualisation

    19/39

    9780170187466

    Information Technology VCE Units 1&2

    194

    Bubble chartsBubble charts are quite similar to scatter charts, but one set of data is shownas bubbles that represent a size or percentage. Take a look at Figure 5-24. Arandom survey of stallholders at the Southside Makers Market and their salesfrom the day reveal a relationship between the amount of products that theyhad on sale and the amount of sales that they achieved.

    The raw data for the bubble chart is easy enough to read andunderstand, but you cannot easily see the relationship between the twosets of data (see Figure 5-27). A line of best t (also known as a trend line)could also be added to the bubble chart to show more clearly how therelationship works.

    The Whitburn ProjectThe Whitburn Project isa labour of love by a group of obsessiverecord collectors. In their quest toelectronically preserve high-quality

    recordings of every popular song since the1890s, they have created a spreadsheetof 37 000 songs and 112 columns of rawdata. Search around the Internet, andyou can see the analysis of this projectpresented as a collection of scatter andbubble charts that show some interestingrelationships between song duration andhow long a song stays in the charts. (SeeFigures 5-25 and 5-26.)

    FIGURE 5-24This bubble chart takes raw data and displays it in a more meaningful way.

    Analysis of sales atSouthside Makers Market

    Total products for sale

    T o t a l s a l e s ( $ )

    1600.00

    1400.001200.00

    1000.00

    800.00

    600.00

    400.00

    200.00

    0

    200.00

    400.00

    5 10 15 20 25 300

    19440:00:00

    0:00:43

    0:01:26

    0:02:10

    0:02:53

    0:03:36

    0:04:19

    0:05:02

    1948 1952 1956 1960 1964 1968 1972

    Average Song Duration (1944-2008)

    S o n g d u r a t

    i o n ( m

    i n u t e s : s e c o n d s )

    1976 1980 1984 1988 1992 1996 2000 2004 2008

    FIGURE 5-25Scatter chart of Average Song Duration from The Whitburn Project.

  • 8/9/2019 Computer analysis and Visualisation

    20/39

    9780170187466

    Chapter 5 Data analysis and visualisation

    195

    Data visualisations to showcomposition of dataThe most common use for a data visualisation composition is to show howsets of data change over time. For example, using the

  • 8/9/2019 Computer analysis and Visualisation

    21/39

    9780170187466

    Information Technology VCE Units 1&2

    196

    Click on the World map button also on the left-hand side, and you cansee an animation or composition showing how the CO 2 emissions for bothcountries (and the world) have changed since the 1960s (see Figure 5-30).

    Click on the bubble icon to show a chart composition between two datasets: CO 2 emissions and electric power consumption (see Figure 5-31).

    Sources of authentic dataData can be divided up into primary- and secondary-source data. Primary-source data is acquired directly from the source, whereas secondary-sourcedata is an interpretation of a primary event or primary-source data.

    In the case study of Southside Makers, the sources for their authenticdata came from the Australian Bureau of Statistics (ABS) and the visitor

    FIGURE 5-29Line chart created in Google public data, comparing CO 2 emissions for Australia andGermany.

    FIGURE 5-28Line chart created in Google public data, showing how CO 2 emissions have risen overtime for Australia.

  • 8/9/2019 Computer analysis and Visualisation

    22/39

    9780170187466

    Chapter 5 Data analysis and visualisation

    197

    survey results. There is both a large data pool and a quality process in placefor the distribution and collection of surveys and for data entry.

    The ABS has a census quality statement that outlines four principal#('%*.# (> .%%(% in its census data set: respondent error when the person lling out the survey misunderstands

    the question and includes false data processing error when the technique used to input the data, either

    manual entry or automatic entry using optical character recognitionsoftware, into the database makes a mistake

    partial response when the person lling out the survey has onlycompleted part of the survey

    undercount when not all of the respondents to the survey havecompleted the survey due to a range of reasons.

    ;('%*.# (> .%%(% in a data set aresometimes beyond the control of thecollectors.

    FIGURE 5-30World map composition created in Google public data, showing CO 2 emissionshighlighting Australia and Germany.

    FIGURE 5-31Bubble chart composition showing relationship between CO 2 and electric powerconsumption.

  • 8/9/2019 Computer analysis and Visualisation

    23/39

  • 8/9/2019 Computer analysis and Visualisation

    24/39

    9780170187466

    Chapter 5 Data analysis and visualisation

    199

    A #'%0.- also uses a data-collection form and provides data about whatthe respondents think, such as their preferences in consumer goods orpolitical parties, what they want from an information system or their rolewithin an organisation.

    Valerie Browning is an Australian Nursewho lives and works in Ethiopia with the

    Afar people, an indigenous race of people

    dating back thousands of years. UsingWorld Health Organisation data sets wecan see that the area of Africa in whichshe lives is under-resourced comparedwith surrounding countries and even

    Australia.

    A#'%0.- is used to determine whatpeople believe is the current situation,or to provide facts as known to therespondent.

    If an organisation was looking to sponsor a child in a third world country,they might use the World Health Organisation (WHO) data to identifywhich areas of the world need their support and sponsorship.

    Qualied Health Professionals in Africa

    0

    0.2

    0.4

    0.6

    0.8

    1

    1.2

    1.4

    Eritrea Ethiopia Kenya Somalia Sudan

    Countries bordering Ethiopia

    Q u a

    l i e

    d h e a

    l t h p r o

    f e s s

    i o n a

    l s p e r

    1 0 0 0

    Per 1000: MidwivesPer 1000: NursesPer 1000: Physicians

    FIGURE 5-32Using World Health Organisation data we can select appropriate data to assist us inmaking a decision.

    Think about IT 5-12

    If your school sponsors a childin Africa, use the World HealthOrganisations database to nd furtherinformation about where they live.

    When you are planning on acquiring data to produce an informationsolution, you need to be mindful of what to expect and what form the datamight come in, especially when you have a handwritten survey.

    Surveys were also discussed onpage 122.

    FIGURE 5-33Using the data downloaded from the World Health Organisation site, we hide or deletethe columns of data that we dont want to work with.

  • 8/9/2019 Computer analysis and Visualisation

    25/39

    9780170187466

    Information Technology VCE Units 1&2

    200

    Data integrityFor a computer to produce useful information, the data that is input intoa database must have integrity. Data integrity is the degree to which datais correct. A misspelled movie title in a movie database is an example ofincorrect data. When a database contains these types of errors, it loses itsintegrity. The more errors the data contains, the lower its integrity. Userswill not rely on data that has little or no integrity.

    Data integrity is very important because information is used to makedecisions and take actions. When you order a product, such as a book aboutsewing or soccer, a process that eventually delivers you the book begins.Before you receive the book, the retailer usually charges the order to yourcredit card. The retailer will bill an incorrect amount to your credit card ifthe books price is not correct in its database. This type of error causes bothyou and the retailer extra time and effort to remedy.

    Validating data as it is being entered into the database can ensurethat integrity is kept high. When you purchase a product online, you are

    generally asked for a three-digit code on the back of the credit card. Thiscode is used as a check digit to ensure that the credit card information

    being typed in is correct.Throughout the process there are checks and balances introduced to ensure

    that the data order is correct and the correct payment information is received.

    Measurement Measurement is the most common technique used for gathering data thatis to be processed into information. Examples include an inventory clerkwho conducts a stocktake by counting each type of stock on the shelves andentering that number onto an inventory form. The manager at a sportsgroundmay determine the size of a crowd by counting the number of tickets used bypatrons. A nurse will record a patients weight by reading the measurementfrom a set of scales.

    Data measurement can also take place electronically. It occurs each timea shoppers groceries are scanned by a barcode reader at the supermarketcheckout. A school science experiment may be set up so that when thetemperature in a model hothouse reaches 21 degrees Celsius, the heater isswitched off. Other forms of measurement might be electronic, such as hitson a website and trafc passing through a router.

    The invention of the barcode in 1973 andthe Universal Product Code (UPC) systemheralded a new era in data integrity. Thelast column in the barcode was used asa check digit to ensure that none of theother bars had been compromised.

    A3.$%"* is just a measurement thatcan be used to evaluate.

    Think about IT 5-13

    Can you think of any other systemsthat use a similar technique to thecheck digit to ensure data integrity?

    CASE STUDY:Southside MakersAt the Southside Makers Market the committee might rely on a numberof strategies for creating 3.$%"*# of their event. Someone might count thenumber of people through the door of the Town Hall. After the event isover, the organisers might ask every stallholder to tell them how manysales they had for the day and how much they sold. Of course, metricsare only as good as the source they came from.

  • 8/9/2019 Computer analysis and Visualisation

    26/39

    9780170187466

    Chapter 5 Data analysis and visualisation

    201

    Data needs to be separated into distinct elds or columns that only haveone sort of !,$, $-5. in them. Common data types are integer, charactersand strings.

    Integer data typesThe integer data type is used to represent whole numbers. A number ofdifferent formats are used to represent integer data (see Figure 5-35). Theformat used depends on the size of the numbers that need to be stored. Anexample of an integer might be the total number of visitors to the market.

    CharactersA character variable or constant holds a single letter, number or symbol. Forexample, a common character used in survey is a (Y)es/(N)o/(M)aybe answer.

    A !,$, $-5. is a set of data withpredened characteristics.

    Data types were also discussedon page 5.

    Data types and structures relevant to selected software tools

    Many different software tools can be used to create data visualisations;however, most are derivatives of either a spreadsheet tool or a database tool.

    For example, using the online data-visualisation tool Many Eyes < http://manyeyes.alphaworks.ibm.com/manyeyes/ > to generate a chart might requireusers to format their data into a data table using column headers that displaythe unit of measurement, such as dollars ($) or percentages (%) (Figure 5-34).

    FIGURE 5-34When converting the data from a spreadsheet tool into Many Eyes, the user needs toremove any trace of the $ or % signs.

  • 8/9/2019 Computer analysis and Visualisation

    27/39

    9780170187466

    Information Technology VCE Units 1&2

    202

    StringsStrings are sequences of characters. For example, a string might be a commentleft by someone in a survey.

    Boolean dataBoolean data can have only two values: true and false. An example of aBoolean answer to a survey question Will you attend the market again?might be (Y)es and (N)o.

    Decimal numbersDecimal numbers are those that include a fractional or decimal part. Forexample, a survey might ask How much did you spend at the SouthsideMakers Market?

    Date and time dataDate and time data are not recorded as the actual date, such as 4.39 p.m. on9 May 1938 or 09/05/1938. Instead, the date is held in the form of an integerthat counts the number of days from a specied reference point, such as 1900,and time data is stored as a decimal fraction of the day (see Figure 5-8 onpage 183). For example, a date and time might have been recorded when yourattendance was checked off on the way into a market so that a distributionchart might be created later on to see which times were the busiest. For morein-depth explanation on data types, refer to Chapter 6.

    Purposes of data visualisationsThe main purpose of a data visualisation is to reduce the effort required toanalyse the information (make the communication of the data more efcient)and increase comprehension, or the effectiveness of the information.Ultimately, we make decisions with information that is given to us. Below aresome examples of the decisions that data visualisations aid with when using awebsite.

    A bar or pie chart can provide us with a visualisation that will enable usto compare the origin of visitors on a website.

    Think about IT 5-14

    Look through the newspaper andidentify a data visualisation that hasbeen used in a story. Does the datavisualisation accurately represent thestory? What type of visualisation havethey used and could they have used adifferent one?

    Integer type Range Memory used (bytes)Short 128 to 127 1

    Normal 32 768 to 32 767 2

    Long 2 147 483 648 to 2 147 483 647 4

    Extra-long 2 63 to 2 63 1 8

    FIGURE 5-35Type and range of integer data.

  • 8/9/2019 Computer analysis and Visualisation

    28/39

    9780170187466

    Chapter 5 Data analysis and visualisation

    203

    The organisation will be able to ask questions such as Are we gettingenough trafc from the desired country? This information might be used toevaluate an advertising campaign.

    A histogram showing at what time each day (or the distribution) peoplelook at the website will provide the organisation with data that might help

    them decide when information needs to be changed on the website. Forexample, if the bulk of visitors check the website at morning-tea time, theorganisation will know that any specials or updates need to be done by thistime.

    Data showing a distribution of hits both from a website and anadvertisement might indicate a direct relationship between advertisinghits and trafc. A composition created from the distribution of hits overa number of months and those on an advertisement might show a biggerpicture about the impact of advertising on web trafc.

    CASE STUDY:Southside MakersThere are many possibilities for Southside Makers to use datavisualisation for decision-making: A comparative chart might show how many customers are nding

    their way to the market each month. Is there a trend going up ordown?

    A distribution chart might show when the busiest period of the dayis for stallholders. Is it before, during or after lunch?

    A scatter chart showing customers and sales might indicate arelationship and possibly a tipping point for a good or bad market day.

    A composition might be created to show the relationship betweentime of day and sales (if you could get that data).

    DiRTDigital Research Tools Wiki is an excellent knowledge repositoryfor all processes associated with datavisualisation. Among the vast collectionof links are examples of how datavisualisations can be used.

    Design tools for representingdata visualisationsA variety of tools can be used to represent the design of a data visualisation: Layout diagram: A data visualisation needs to show what type of chart

    might be used and where the source data might appear; for example, alongthe x - or y -axis. Layout diagrams should show colours, headings, axis

    labels and legends. They can also show on a map where data might beshown and where the data source might come from (see Figure 5-36). Storyboard: Storyboards can be used to show how the data-visualisation

    animation might work (see Figure 5-38). For example, if you are usingFlash to show how one data visualisation morphs into another, thestoryboard might show the transitions from one visualisation to another.

    Flowchart: Flowcharts can be used to show the procedure that users needto complete to create a data visualisation.

  • 8/9/2019 Computer analysis and Visualisation

    29/39

    9780170187466

    Information Technology VCE Units 1&2

    204

    Characteristics of usersAs outlined in Chapter 1, many characteristics need to be taken intoconsideration when creating information, such as gender, special needs,culture, age, education level, status and location. When creating a datavisualisation, you might therefore want to also consider the following: Age: Visualisation being shown must be age appropriate. If a visualisation

    is being shown to a primary-school class about the distribution of sporting

    elds in a municipality, you might use different-sized footballs rather than bubbles to appeal to a younger audience.

    Special needs: Special needs such as sight impairment might change the wayyou visualise your data to ensure that people with special needs can read theaxis labels more clearly. Composition visualisations might move a bit moreslowly, or sound might be used to indicate an incline or decline in statistics.

    Culture: If you are preparing a visualisation that is to be presented to acommunity group whose members are mainly non-English speaking, youmay aim to use more symbols and drawings rather than words in yourpresentation.

    See page 10 for more information oncharacteristics of audiences.

    FIGURE 5-37Example of data that might

    be used for a visualisationshowing carbon dioxideemissions (metric tons percapita) over many decades.

    FIGURE 5-36An example of how you might show a composite data visualisation using a layoutdiagram.

    0 1960 1970 1980 1990 2000 2010

    5

    10

    15

    20

    C O 2 e m

    i s s i o n s ( m e t r i c

    t o n s

    )

    Year

    Data charted for Germany

    Layout diagram for composite data visualisation

    Data charted for Australia

    Source: OECD data from Google public data .

    Think about IT 5-15

    Create a owchart that accuratelyshows a procedure for creating asimple bar chart.

    FIGURE 5-38An example of how you might show a composite data visualisation using a storyboard.

    Map of world Germany

    Australia

    1960

    14 pt red red

    blue blue

    20 pt

    changes

    changes

    2010

    Map of world Germany

    Data changes to match CO 2 emissions

    Australia 14 pt

    Source: OECD data from Google public data.

  • 8/9/2019 Computer analysis and Visualisation

    30/39

  • 8/9/2019 Computer analysis and Visualisation

    31/39

    9780170187466

    Information Technology VCE Units 1&2

    206

    Accuracy: Is the data visualisation accurate? Can we compare a list of datawith the visualisation to ensure its message is accurate?

    Clarity: Does the data visualisation provide ambiguous information,therefore potentially leading to misinterpretation?

    Using problem statements to evaluatedata visualisationsWhen the problem statement is clearly dened, words can be used that willguide us in how we might evaluate the information or data visualisationproduced (Figure 5-39). Taking a look at the Southside Makers Marketproblem statement, one key word is used in describing the problem thatcan be used to evaluate the data visualisation: How can Southside Makersimprove the effectiveness of their marketing campaign to increase attendanceat their markets?

    When we evaluate the Southside Makers data visualisation against the

    problem statement, we should draw on the key word that we have in thisproblem statement: Does the data visualisation improve the quality (effectiveness) of

    the marketing campaign by providing information that is easier tounderstand?

    Is the visualisation more relevant (effective) to the task of improvingthe effectiveness of the marketing campaign compared with previouscampaigns?

    Is the data visualisation timely (effective)?

    Characteristics of le formats

    and ease of conversionMost of the data-visualisation tools will produce a visualisation that is able to be saved and then used as part of a presentation or report.

    In the case of the Southside Makers Market case study, thevisualisations created at the ABS website, such as Figure 5-11 on page 186,can be saved as a JPEG by simply right-clicking on the le and then savingit into an appropriate directory.

    Websites that deal with public data sets, such as Google public dataexplorer < www.google.com/publicdata/ >, use ash to animate theirvisualisations. Depending on the browser, there are tools available to enableusers to capture these ash les and save them to their hard drives andinsert them into PowerPoint.

    Tools such as Chart Chooser, by Juice Analytics < http://chartchooser.juiceanalytics.com/ >, allow users to download a template into their chosenapplication, such as a spreadsheet or presentation tool. They can then alterthe data set to produce a visualisation that meets their needs.

    Functions of data visualisation softwareA basic spreadsheet tool can provide you with the tool necessary to produce astatic data-visualisation information solution. Basic comparative, distributiveand relationship-based charts can be created using a common spreadsheeting

    More and more websites are using PNGas a le format. PNG stands for portablenetwork graphics. It is an improvedversion of the GIF le format. JPEG is stilla popular le format.

    Think about IT 5-18

    Using Chart Chooser, create a piechart that shows football teams yourclassmates support.

  • 8/9/2019 Computer analysis and Visualisation

    32/39

    9780170187466

    Chapter 5 Data analysis and visualisation

    207

    Think about IT 5-19

    Using Google public data from OECDcountries, create a visualisationcomparing two sets of data.

    package. However, if you are wanting to create an animated visualisaton of adata set, you need to register and use some of the more complex tools on theInternet, including: Many Eyes < http://manyeyes.alphaworks.ibm.com/manyeyes/ > allows

    registered users to upload data sets, create complex visualisations and

    share them with the world. Google Chart Tools < http://code.google.com/apis/charttools/ > allow users tocreate complex animations using a wide range of charts (Figure 5-39), and tooverlay these onto Google maps < http://maps.google.com/ > if required.

    FIGURE 5-39This punch-card scatter chart is just one example of the user-submitted charts availableat Google Chart Tools.

    SatFri

    ThrWedTue

    MonSun

    12 a.m. 1 2 3 4 5 6 7 8 9 10 11 12 a.m. 1 2 3 4 5 6 7 8 9 10 11

    Any Chart < www.anychart.com > allows users to play around withXML code to customise their charts. Although this product is not free,registered users can download a demonstration to use.

    Formats and conventions applied tovisualisationsA good data visualisation tells one story. This story should be easy tounderstand, clear to identify and without ambiguity. The formats andconventions that we use should support this. If you remember back to Chapter 1,a format and convention can be used to enhance the appearance of informationand make it more readable. Formats and conventions used in data visualisationsare similar to what you would use for most other solutions.

    Formats for data visualisationsA format helps to change the appearance of a piece of information usingfeatures such as font, margins, spacing, columns, tables, graphics, borders,pages numbers and headers and footers. A good data visualisation should useformats that increase the effectiveness of the information. These might include: easy-to-read fonts for headings a chart format that clearly shows the information being communicated colour gradients that are easy to distinguish.

    For example, using a bubble chart overlay over a map might not be the bestformat for communicating information on the number of bars in the UnitedStates (see Figure 5-40). The use of colour variants might have been a bettermethod for communicating which state has the most drinking establishments.

  • 8/9/2019 Computer analysis and Visualisation

    33/39

    9780170187466

    Information Technology VCE Units 1&2

    208

    Conventions for data visualisationsA convention is a well-known rule used to communicate information. The mostpopular example is the way an envelope is addressed. The convention is to putthe persons name, address, suburb, state, postcode and then country, if necessary.

    Conventions for data visualisations include: clearly identifying the data source at the footer of the visualisation providing a heading, stating the purpose of the visualisation inserting labels that show what the data is and what unit of measurement

    is being used inserting a footer showing who has created the chart and for whom.

    Junk Charts blogJunk Charts is an interesting blogthat looks at bad and confusing chartsand how they try to communicateinformation a great source of datavisualisations gone wrong.

    Think about IT 5-21

    Visit the Junk Charts website andselect a chart. Identify two ways in

    which formats and conventions havenot been correctly applied.

    Think about IT 5-20

    Using data retrieved from one of thelarge data providers (for example, WHO,World Bank or Australian Census),create an example of a good and a badvisualisation. Outline the good and badfeatures of each.

    FIGURE 5-40This data visualisation is overcrowded and too busy to get any meaningful informationfrom it.

    Fewer Referencesto Bars

    BARS: Size of symbol represents absolute number of mentions of bars in the Google Maps directory. A maximum value of 1139 islocated in Chicago, IL. Data collected in August 2008. Copyleft Floatingsheep.org, 2010

    More Referencesto Bars

    Axis headings must be clear

    What is the visualisation doing?

    Source of the data for the visualisation and when the data was retrieved

    Author clearly identified and when the visualisation was created

    Qualied Health Professionals in Africa

    0

    0.2

    0.4

    0.6

    0.8

    1

    1.2

    1.4

    Eritrea

    Source: World Health Organisation data:http://www.who.int/ 12 Sept 2010Created by: Sally Bloggs 20 Sept 2010

    Ethiopia Kenya Somalia SudanCountries bordering Ethiopia

    Q u a

    l i e

    d h e a

    l t h p r o

    f e s s

    i o n a

    l s p e r

    1 0 0 0

    Per 1000: MidwivesPer 1000: NursesPer 1000: Physicians

    FIGURE 5-41An annotated diagram showing which conventions have been used in the visualisation.

  • 8/9/2019 Computer analysis and Visualisation

    34/39

    9780170187466

    Chapter 5 Data analysis and visualisation

    209209

    What you should know

    1 The problem-solving methodology is used as aframework for solving information problems. 2 An information problem occurs when

    an information system does not producethe information required for a user or anorganisation.

    3 There are three different ways in whichinformation can be presented or communicated:written form, such as a ten-page report; oral,such as a voicemail or a broadcast; and visually,such as a chart or animation.

    4 When looking at a distribution of data , we lookfor changes over a period of time.

    5 A data visualisation is when data is presentedin a visual format so that it has greater meaningor where understanding can be reached in ashorter period of time.

    6 When processing data into information, there arefour basic types of data visualisations: comparing sets of data, showing a distribution , showing arelationship , and composing an animation thatshows how the data changes over time.

    7 When analysing an information problem, werst need to dene the scope , informationrequirements and information constraints .

    8 A project constraint is often dened by the timeand the resources available.

    9 Quantitative data is measurable and specic. 10 Qualitative data is harder to measure and

    often deals with peoples opinions and feelingstowards something.

    11 Quantitative data can be gathered throughsurveys (handwritten and online) and statistics.

    12 Qualitative data can be gathered throughinterviews and questionnaires (handwritten andonline).

    13 A CSV le is a plain text le that is specicallyformatted to store spreadsheet- or database-style

    information. 14 Demographic is data that deals with the

    characteristics of a group of people. 15 A text cloud is an effective way of

    communicating qualitative data as it highlightskey words and how often they have been used.

    16 Data visualisations can be used either as aproduct resulting from the problem-solvingmethodology or as a tool to further analyse aproblem.

    17 A data visualisation ensures that complexdata is communicated in a way that makesunderstanding it easier.

    18 A data set is a collection of data; for example,census data collected every ve years might beregarded as a data set.

    19 A constraint is any factor that affects thenature of a solution, such as time or resourcesavailable.

    20 A limitation might involve technical, economicor aesthetic limitations.

    21 Charts such as pie, column or bar can be used asdata visualisations to compare two or more sets

    of data.22 A data visualisation showing a distribution of

    data might be communicated using a scatterchart, line histogram or column histogram.

    23 A histogram shows how data has changed over atimeframe; for example, hours, days, months andyears.

    24 A scatter chart can be used to show adistribution of data against a xed unit of time.Complex scatter charts can show a comparisonof data and therefore identify a relationship

    between them. 25 A data visualisation showing a relationship

    between two sets of data might show thisrelationship using a scatter or bubble chart.

    26 A bubble chart displays one set of data as bubbles, which are relative to their values.

    27 A data visualisation showing a compositionof data might be communicated using ananimation that is, morphing one visualisationinto another.

    28 Data falls into two different types: primaryand secondary source . Primary-source data isdata acquired directly from the source, whereassecondary-source data is an interpretation of the

    original data. 29 When collecting data, there are four sources

    of error that might occur: respondent error,processing error, partial response andundercount.

    30 Respondent error is when a data-collection formhas been lled out incorrectly.

    31 Processing error occurs when there is an errorwith manual or automatic data entry into thesystem.

  • 8/9/2019 Computer analysis and Visualisation

    35/39

    9780170187466

    Information Technology VCE Units 1&2

    210

    32 Partial response is when only half of the data-collection form is lled out.

    33 Undercount is when not all respondents havelled out the data-collection form, thereforemaking the data set quite small.

    34 Data-collection forms are used to acquirelarge sets of data. This can be done manuallyor electronically. A common method for datacollection is a survey .

    35 Data integrity is the degree to which data iscorrect. Validating data as it is entered into thedatabase can keep integrity high.

    36 Data measurement , or metrics , is a simple way

    of collecting data. This deals with measuringxed items such as attendance, temperature,speed or stock levels.

    37 A data type is a way we categorise data. Basicdata types include integer, character, strings,Boolean data, decimal numbers and date andtime numbers.

    38 The purpose of a data visualisation is to reducethe effort required to analyse information andincrease comprehension.

    39 We can use a number of design tools forrepresenting data visualisations: layoutdiagrams, storyboards and owcharts.

    40 A layout diagram can show what type of chartwe might use, and indicate where the sourcedata, axis labels and headings come from.

    41 A storyboard can be used to show how adata visualisation animation might work; forexample, from data set A to data set B.

    42 A owchart might be used to show the processor procedure that the user needs to go through tocreate a visualisation.

    43 Characteristics of users that might inuence thetype and presentations of visualisation might beage, special needs and culture.

    44 Evaluating data visualisations might involveassessing whether the visualisation is efcient(less time, less cost and less effort).

    45 Evaluating data visualisations might involveassessing whether the visualisations are effective(higher quality, relevancy, timeliness, accuracyand clarity).

    46 When we evaluate a data visualisation against aproblem statement, we look for key words thatcan assist us in the evaluation statements.

    47 Most data visualisations can be saved into JPEG or PNG le formats . Animated datavisualisations can be saved as a ash le format.

    48 Software tools used to create data visualisationsrange from spreadsheets to bespoke online tools.

    49 Data visualisations should be formatted clearlyto show easy-to-read headings, a clear chartformat and colour gradients that are easy todistinguish.

    50 Data visualisations have the followingconventions applied to them: headings and axislabels, clear identication of data source, whothe visualisation is created for, and who created

    the chart.

  • 8/9/2019 Computer analysis and Visualisation

    36/39

    9780170187466

    Chapter 5 Data analysis and visualisation

    211

    Revising the problem-solving methodology 1 What is an information problem? 2 What are the three different ways information

    can be presented? 3 What is a data set? 4 What are the two factors that data visualisation

    can increase?

    Presenting information in a visual form 5 What is a data visualisation? 6 Why might we favour a data visualisation over

    other ways of presenting information? 7 What are the four different ways you can present

    information in a visual form? 8 What is the difference between a visualisation

    that compares and visualisation that showsrelationships?

    9 Give an example of a visualisation that wouldshow a distribution of data.

    Analysing information problems 10 What are the three factors that should be taken

    into consideration when analysing a problem?11 Identify three common constraints that might

    exist when analysing an information problem. 12 How might a set of user requirements change the

    nature of an information solution? 13 What is a scope? 14 What is the difference between quantitative and

    qualitative data? Give examples. 15 What strategies might we use when gathering

    quantitative data? 16 What data visualisations can we use to analyse

    qualitative data? 17 What is a thematic map? 18 Identity three constraints that might inuence

    the nature of a solution. 19 Why is it important to consider the limitations

    of a solution when designing a solution?

    Types of data visualisations 20 How many chart types do you know? List and

    give an example of how they might be used. 21 How does a histogram differ from a line chart? 22 What visualisation can we use to demonstrate

    what might happen over a given timeframe? 23 How might a scatter chart be used? 24 How does a bubble chart differ from a scatter

    chart? 25 If we had two sets of data and we needed to

    show how the data changed over time, what typeof visualisation might we create?

    Sources of authentic data 26 What are the four types of errors that can occurwhen acquiring data for a visualisation?

    27 Why would it be important to identify if a dataset has an undercount?

    28 What is the difference between primary- andsecondary-source data?

    29 Why is it important to identify the source of adata visualisation?

    30 What is the difference between a survey and adata-collection form?

    31 How does data integrity make a difference to thevisualisations you produce?

    32 Give three examples of data measurement.

    Data types and structures relevantto selected software tools 33 When creating a data visualisation, why is it

    important to dene the data type that you are using? 34 What is a CSV le?

    Purpose of a data visualisation 35 How might a design assist you in clarifying the

    purpose of a data visualisation? 36 Draw a owchart that explains the process that

    you need to go through to create a thematicmap from the Australian Bureau of Statisticswebsite.

    37 Create a storyboard that shows how a datavisualisation might work.

    Test your knowledge www.nelsonnet.com.au/infotech

  • 8/9/2019 Computer analysis and Visualisation

    37/39

    9780170187466212

    Information Technology VCE Units 1&2

    Characteristics that can inuence typeand presentation 38 How might age impact on the design of a data

    visualisation? 39 Provide an example of how culture might

    change the form of data visualisation. 40 Identify a special need that might change the

    format of a data visualisation.

    Criteria and techniques for evaluating adata visualisation 41 Provide an example of how you might evaluate a

    data visualisation to demonstrate efciency. 42 Create three questions that would test for clarity

    in a data visualisation. 43 How can we test for accuracy when creating a

    data visualisation?

    Characteristics of le formats 44 A member of your team has asked you for a copy

    of a data visualisation that appeared in a Worddocument. What le format would you send it in?

    45 Into what le formats can we save datavisualisation animations?

    Functions of appropriate software tools 46 Only two columns of data need to be used to

    create a data visualisation for a presentation.Using a owchart, explain how you might selectthe required data for the data visualisationrequired.

    Formats and conventions 47 Identify three formats that each data

    visualisation should demonstrate to ensureclarity.

    48 Identify and explain three conventions thatevery data visualisation should show to ensure

    relevancy.

  • 8/9/2019 Computer analysis and Visualisation

    38/39

  • 8/9/2019 Computer analysis and Visualisation

    39/39

    Information Technology VCE Units 1&2

    Preparing for Unit 2 Outcome 1On completion of this unit the students should be ableto apply the problem-solving methodology and useappropriate software tools to create data visualisationsthat meet users needs.

    In Unit 2 Outcome 1, you are required to produce adata visualisation that meets the need of a user. Thiswill involve using a complex data set and accessingsoftware or online tools that will enable you to convertthis data into a more meaningful format.

    Learning milestones1 Familiarise yourself with the design brief and

    analyse the information problem that exists. 2 Determine the type of data visualisation that

    might be needed to solve the informationproblem.

    3 Select appropriate sources of data and identifyrelevant data.

    4 Determine the suitability of different data typesand structures for creating visualisations.

    5 Select types of visualisations that areappropriate to the data.

    6 Select and apply appropriate tools to plan thedesign of the visualisations.

    7 Apply software functions to locate and acquire

    data that will be input and manipulated. 8 Use appropriate software tools, and select andapply a range of suitable functions.

    Steps to be followedThe analysis, design, development and evaluationstages of the problem-solving methodology will beused to create the solution. 1 Analyse and dene the problem. From the

    design brief, identify:a the factors that affect the problem

    b the data visualisation needs of the userc the data visualisation product to be produced

    to meet the users needsd appropriate sources of data that must be

    processed to produce the solution.The problem to be solved is best written as a

    question. For instance, identify a pattern in the lateattendance of students across a one-month period.

    2 Design the solution. This includes:a use of techniques such as owcharts,

    storyboards or layout diagrams for all aspects ofthe output, both on screen and in printed reports

    b a decision about which software and hardwareare most suitable for producing the datavisualisation

    c formats and conventions that will be used inthe data visualisation

    d selection of the test data that will be usedto conduct tests to check that the datavisualisation is producing the information thatis required.

    3 Develop a data visualisation that can be viewedeither on screen or in hard copy, depending onthe needs of the user.a Produce a clear and concise data visualisation.

    b Show the source of the data that you are using.c Ensure that you have correct headings, axis

    labels and measurements to ensure clarity inthe data visualisation.

    4 Test the solution. Detail how you have checkedthat the output produced is working as expected.Test data selected in the design stage will beused.a Test that the source data is displayed correctly

    in the data visualisation. b Test the clarity of the data visualisation.c Test the relevancy of the data visualisation.

    You will need to provide testing tables, showingfunctions of the data visualisation software tested,results that are expected using the data and the actualresult that was observed when using the software toproduce the data visualisation. For example, use atesting table similar to the one shown below. 5 Evaluate the solution.

    Prepare an evaluation feedback form and gainfeedback from the known audience.

    6 Discuss the likely impact on the ability of theuser to make clear decisions based on howinformation is communicated through thedata visualisation. For instance, identify thedecisions the user will be able to make with thedata visualisation.