data literacy

5th Seminar on Data Literacy, Jayanta Kr. Nayek, DRTC, ISIBC, 3rd Semester 2013-2015

Uploaded by: jayanta-nayek

Posted on 14-Jun-2015


DESCRIPTION

INDIAN STATISTICAL INSTITUTE, Documentation Research & Training Centre, 8th Mile, Mysore Road, RVCE Post, Bangalore-560 059

DRTC Seminar-5, 2014: Data Literacy

ABSTRACT: In our increasingly data-driven society, data literacy is an important civic skill that we should be developing. Data are slowly but steadily forcing their way into every part of society. Data literacy may seem less technical than computer science or other technical fields, but it still calls for a wide variety of tools for accessing, converting and manipulating data. Using these tools requires an understanding of relational databases (like MS Access), data manipulation techniques, statistical software tools (like Minitab, SPSS, STATA and MS Excel) and data representation software tools (like MS PowerPoint and MS Excel). This seminar gives an introduction to data literacy and its inter-relationship with information literacy and statistical literacy. It also covers the basic steps for working with data, followed by a short demonstration of data analysis techniques using the software STATA 11.

Speaker: Jayanta Kr. Nayek
Date: 29.10.2014
Time: 2 p.m.
Venue: DRTC, ISI Bangalore
All are cordially invited.
Seminar Coordinator: Biswanath Dutta

TRANSCRIPT

Page 1: Data literacy

5th Seminar on

Data Literacy

Jayanta Kr. Nayek, DRTC, ISIBC

3rd Semester, 2013-2015

Page 2: Data literacy

Contents

• Introduction
• What is data?
• Data Life Cycle
• Definitions of DL, IL & SL
• Relation between DL, IL & SL
• Conceptions of Data Literacy
• Why data literacy?
• Basic steps for working with Data
  – Data Visualization
  – Data Interpretation
  – Data Documentation
  – Data Transformation
• Data visualization and wrangling tools
• Becoming data literate: Basic Skills of Data Literacy
• Data Literacy in Libraries
• Big Data management
• SAS as a solution for Big Data
• Conclusion
• References


Page 3: Data literacy

Introduction

• The evaluation of information is a key element in information literacy, statistical literacy and data literacy. As such, all three literacies are inter-related. It is difficult to promote information literacy or data literacy without promoting statistical literacy.

• All librarians are interested in information literacy; archivists and data librarians are also interested in data literacy. Both groups should consider teaching statistical literacy as a service to users who need to critically evaluate information in arguments.


Page 4: Data literacy

What is Data?

• Webster's definition:

“facts or information used usually to calculate, analyze, or plan something”.

• Almost anything can be data: text, images, numbers, …

• For a computer to process data, the data need to be in a structured, machine-readable form.
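As a small illustration of "structured and machine-readable", here is a sketch contrasting the same facts written as a sentence and as CSV. The visitor counts and column names are hypothetical; only Python's standard `csv` module is used.

```python
import csv
import io

# Unstructured: easy for a person to read, but a program would need
# fragile text parsing to pull the numbers out of it.
sentence = "Bangalore had 120 visitors on Monday and 95 on Tuesday."

# Structured, machine-readable: the same (hypothetical) facts as CSV
# with an explicit header row naming each column.
csv_text = """city,day,visitors
Bangalore,Monday,120
Bangalore,Tuesday,95
"""

# csv.DictReader turns each data row into a dict keyed by the header.
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Because the structure is explicit, a total is a one-line computation.
total_visitors = sum(int(row["visitors"]) for row in rows)
print(total_visitors)  # 215
```

The point is not the arithmetic but that the named columns tell the program where each value lives, which the sentence does not.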


Page 5: Data literacy

Data Life Cycle

Page 6: Data literacy

Data Literacy

“Data-literacy is the ability to consume for knowledge, produce coherently and think critically about data.”

“Data literacy” includes:

• Information Literacy
• Statistical Literacy
• Understanding how to work with large data sets,
• How they were produced,
• How to connect various data sets, and
• How to interpret them.
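"Connecting various data sets" usually means joining records that share a key. The sketch below does an inner join on a hypothetical `district` field using plain Python lists of dicts; the data and the helper name `inner_join` are illustrative assumptions, not part of the seminar material.

```python
# Two hypothetical data sets that share a "district" key.
population = [
    {"district": "North", "population": 50000},
    {"district": "South", "population": 80000},
]
libraries = [
    {"district": "North", "libraries": 5},
    {"district": "South", "libraries": 4},
    {"district": "East", "libraries": 2},  # no matching population record
]

def inner_join(left, right, key):
    """Combine rows from two lists of dicts whose `key` values match.

    Rows without a partner in the other table are dropped (inner join).
    """
    # Index the right-hand table by key for fast lookups.
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    joined = []
    for lrow in left:
        for rrow in index.get(lrow[key], []):
            joined.append({**lrow, **rrow})
    return joined

combined = inner_join(population, libraries, "district")
for row in combined:
    # Once connected, the two sources support a new measure.
    per_10k = row["libraries"] / row["population"] * 10000
    print(row["district"], round(per_10k, 2))
```

Note that "East" silently disappears: knowing which records a join drops is part of interpreting the combined data.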


Page 7: Data literacy

Information Literacy

Page 8: Data literacy

Information Literacy

• An information-literate individual is able to:

(1) Determine the extent of information needed,

(2) Access the needed information effectively and efficiently,

(3) Evaluate information and its sources critically,

(4) Incorporate selected information into one’s knowledge base,

(5) Use information effectively to accomplish a specific purpose, and

(6) Understand the economic, legal, and social issues surrounding the use of information, and access and use information ethically and legally.


Page 9: Data literacy

Statistical Literacy

• Statistical literacy studies the use of statistics as evidence in arguments (Schield, 1998, 1999).

• A key element of statistical literacy is assembly: how the statistics are defined, selected and presented.

• A second key element of statistical literacy is the importance of context and confounding.
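Confounding is easiest to see in a case where an aggregate comparison reverses once a hidden factor is taken into account (often called Simpson's paradox). The counts below are hypothetical and only chosen to show the effect: treatment A does better within each severity group, yet B looks better overall, because A was given mostly to severe cases.

```python
# Hypothetical counts: (successes, patients) per treatment and case severity.
results = {
    "A": {"mild": (81, 87), "severe": (192, 263)},
    "B": {"mild": (234, 270), "severe": (55, 80)},
}

def rate(successes, total):
    """Success rate as a fraction between 0 and 1."""
    return successes / total

def pooled_rate(groups):
    """Overall rate after ignoring the severity groups entirely."""
    s = sum(succ for succ, _ in groups.values())
    n = sum(tot for _, tot in groups.values())
    return s / n

for treatment, groups in results.items():
    by_group = {name: round(rate(*g), 3) for name, g in groups.items()}
    print(treatment, "overall:", round(pooled_rate(groups), 3), "by group:", by_group)
```

Severity is the confounder here: it affects both which treatment a patient received and how likely success was, so the pooled comparison alone is misleading.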


Page 10: Data literacy

Relationship between IL, DL & SL

Page 11: Data literacy

Discussion

According to Schield (2004), data literacy is the part of statistical literacy that involves training individuals to access, assess, manipulate, summarize and present data, whereas statistical literacy aims to teach how to “think critically about descriptive statistics.”

Data literacy has also been seen as a complement to, or a form of, information literacy, which suggests that data literacy would be the umbrella concept covering statistical literacy.

Statistical literacy is envisaged as the component of data literacy involved in the critical appraisal, interpretation, processing, and statistical analysis of data.

Data literacy can be defined as the component of information literacy that enables individuals to access, interpret, critically assess, manage, handle and ethically use data.

Page 12: Data literacy

Conceptions of Data Literacy (1)

A social science perspective:

Data literacy is almost synonymous with statistical literacy, quantitative literacy and numeracy, but involves more than basic statistics and mathematical functions:

understanding data and its tabular and graphical representations, including statistical concepts and terms

finding, evaluating and using statistical information effectively and ethically as evidence for social inquiries

reading, interpreting and thinking critically about statistics

Page 13: Data literacy

Conceptions of Data Literacy (2)

Page 14: Data literacy

Conceptions of Data Literacy (3)

A science (STEM/information science) perspective:

Science data literacy shares aspects of social science conceptions, but requires awareness of the data life cycle, metadata issues, data tools and collaboration mechanisms

managing the data generated from experiments, surveys and observations by using sensors and other devices

understanding the attributes, quality and history of data to produce valid, reliable answers to scientific inquiries

accessing, collecting, processing, manipulating, converting, transforming, evaluating and using data

Page 16: Data literacy

Why data literacy?

• Slowly but steadily, data are forcing their way into every nook and cranny of every industry, company and job.

• Data literacy is the ability to ask and answer meaningful questions by collecting, analyzing and making sense of data encountered in our everyday lives.

• In our increasingly data-driven society, data literacy is an important civic skill that we should be developing.

Page 17: Data literacy

Basic Steps in Working with Data

There are at least three key concepts we need to understand when starting a data project:

• Data requests should begin with a list of questions you want to answer.

• Data often is messy and needs to be cleaned.

• Data may have undocumented features.
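To make "data is messy and needs to be cleaned" concrete, here is a minimal cleaning sketch. The raw rows and the set of missing-value markers are hypothetical assumptions; in a real data set those markers are exactly the kind of undocumented feature you would have to discover or confirm with the data's producer.

```python
# Hypothetical raw survey responses: inconsistent case, stray spaces,
# thousands separators and several spellings of "missing".
raw = [
    {"city": " Mysore", "income": "12,000"},
    {"city": "mysore ", "income": "n/a"},
    {"city": "BANGALORE", "income": " 15000 "},
    {"city": "Bangalore", "income": ""},
]

# Assumed missing-value markers (an assumption to verify, not a standard).
MISSING = {"", "n/a", "na", "-"}

def clean_income(value):
    """Return income as an int, or None when the value is missing."""
    v = value.strip().lower()
    if v in MISSING:
        return None
    return int(v.replace(",", ""))  # drop thousands separators

def clean_row(row):
    """Normalize one record: trimmed, title-cased city; numeric income."""
    return {
        "city": row["city"].strip().title(),
        "income": clean_income(row["income"]),
    }

cleaned = [clean_row(r) for r in raw]
for r in cleaned:
    print(r)
```

Keeping missing values as `None` rather than `0` avoids silently pulling averages down later.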


Page 18: Data literacy

Data Visualization

Visualization provides a unique perspective on the dataset.

Visualization is critical to data analysis. It provides a front line of attack, revealing intricate structure in data that cannot be absorbed in any other way. We discover unimagined effects, and we challenge imagined ones.

(William S. Cleveland, Visualizing Data)

Page 19: Data literacy

Data insights: a visualization (Gregor Aisch)


Page 20: Data literacy

How to Visualize Data

Tables: very powerful when you are dealing with a relatively small number of data points.

Charts: allow you to map dimensions in your data to visual properties of geometric shapes.

Maps: the power of a map is to reconnect the data to our physical world.

Graphs: all about showing the interconnections (edges) between your data points (nodes).
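The idea that a chart maps a data dimension to a visual property can be shown without any plotting library. This hypothetical sketch draws a text bar chart where each value is mapped to a bar length; the loan counts and the `text_bar_chart` helper are invented for illustration.

```python
def text_bar_chart(data, width=40, symbol="#"):
    """Return lines where each bar's length is proportional to its value."""
    largest = max(data.values())
    label_width = max(len(label) for label in data)
    lines = []
    for label, value in data.items():
        # Map the data dimension (value) to a visual property (bar length).
        length = round(value / largest * width)
        lines.append(f"{label.ljust(label_width)} | {symbol * length} {value}")
    return lines

# Hypothetical loan counts per subject.
loans = {"Statistics": 120, "Economics": 60, "History": 30}
print("\n".join(text_bar_chart(loans, width=20)))
```

Scaling every bar relative to the largest value keeps the proportions honest; a chart whose bars do not start from a common zero would distort them.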

Page 21: Data literacy

Analyze and Interpret

Once you have visualized your data, the next step is to learn something from the picture you have created. You could ask yourself:

• What can I see in this image? Is it what I expected?
• Are there any interesting patterns?
• What does this mean in the context of the data?

• Sometimes you might end up with a visualization that, in spite of its beauty, seems to tell you nothing of interest about your data. But there is almost always something you can learn from any visualization, however trivial.
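One simple way to back up "is it what I expected?" with a number is to flag values that sit far from the rest. This sketch uses a z-score rule with an assumed threshold of 2 standard deviations; the visitor counts are hypothetical, and a flagged point is only a candidate for a closer look, not automatically an error.

```python
import statistics

def unusual_points(values, threshold=2.0):
    """Return (index, value) pairs more than `threshold` population standard
    deviations from the mean."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return []  # all values identical: nothing stands out
    return [(i, v) for i, v in enumerate(values) if abs(v - mean) / sd > threshold]

# Hypothetical daily visitor counts with one surprising day.
visits = [40, 42, 38, 41, 39, 43, 120, 40]
print(unusual_points(visits))
```

Here the single day with 120 visits is flagged; the interpretation step is then asking why (an event, a holiday, a data-entry error).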


Page 22: Data literacy

Document Your Insights and Steps

Documentation is the most important step of the process, and it is also the one we are most likely to skip.

Help yourself

Help others
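A lightweight way to document steps as you go is to keep a machine-readable log of each transformation. The `record_step` helper and its fields below are hypothetical, a sketch of the habit rather than any standard format.

```python
import json
from datetime import datetime, timezone

# Append-only log of what was done to the data and why, so that you
# (and others) can retrace the analysis later.
log = []

def record_step(action, detail, rows_before, rows_after):
    """Hypothetical helper: append one documented step to the log."""
    log.append({
        "time": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "action": action,
        "detail": detail,
        "rows_before": rows_before,
        "rows_after": rows_after,
    })

data = [3, None, 7, None, 10]
before = len(data)
data = [x for x in data if x is not None]
record_step("drop_missing", "Removed rows with no value", before, len(data))

# Serialize the log as JSON; in practice you might write it to a file
# stored alongside the data.
print(json.dumps(log, indent=2))
```

Recording row counts before and after each step helps both yourself and others notice when a step removed more data than intended.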


Page 23: Data literacy

Transform data

Data transformation is the process of converting data or information from one format to another, usually from the format of a source system into the required format of a new destination system.

Data transformation can be divided into two steps:

1. Data mapping, which maps data elements from the source data system to the destination data system and captures any transformation that must occur.

2. Code generation, which creates the actual transformation program.
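The two steps above can be sketched in a few lines: a declarative mapping from destination fields to source fields plus a transform, and a function that turns that mapping into a working converter. The field names are hypothetical, and building a function at runtime is a simplified stand-in for real code generation, which would typically emit program text or a configured ETL job.

```python
# Step 1, data mapping (hypothetical spec):
# destination field -> (source field, transformation to apply).
MAPPING = {
    "title": ("Title", str.strip),
    "year": ("PubYear", int),
    "author": ("Creator", lambda s: s.strip().upper()),
}

def build_transformer(mapping):
    """Step 2, a stand-in for code generation: turn the mapping into a
    callable that converts one source record into the destination format."""
    def transform(source_record):
        return {
            dest: func(source_record[src])
            for dest, (src, func) in mapping.items()
        }
    return transform

to_destination = build_transformer(MAPPING)
source = {"Title": " Data Literacy ", "PubYear": "2014", "Creator": " smith "}
print(to_destination(source))
# {'title': 'Data Literacy', 'year': 2014, 'author': 'SMITH'}
```

Keeping the mapping separate from the conversion code means the mapping itself doubles as documentation of how source fields become destination fields.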

Page 24: Data literacy

Data visualization and wrangling tools

• Spreadsheets: LibreOffice, Excel or Google Docs
• Statistical programming frameworks: R (r-project.org) or Pandas (pandas.pydata.org), STATA, SPSS etc.
• Geographic Information Systems (GIS): Quantum GIS, ArcGIS, GRASS
• Visualization libraries: d3.js (mbostock.github.com/d3), Prefuse (prefuse.org), Flare (flare.prefuse.org)
• Data wrangling tools: Google Refine, DataWrangler
• Non-programming visualization software: ManyEyes, Tableau Public (tableausoftware.com/products/public)

Page 28: Data literacy

Basic Skills of Data Literacy

• “Learning key statistical terms, like the difference between mean and median, or why a standard deviation or margin of error might matter.”

• “Knowing what questions to ask about data or a statistic to gauge its potential relevance, quality or reliability.”

• “Performing basic statistical calculations: nothing fancy, just enough to do a quick reality check on whether you're understanding the story that a dataset might be telling.”

• “Putting data in context, such as considering the local unemployment rate in the context of Census data for your community, or local vs. state/national crime statistics.”
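The calculations named in these skills fit in a short reality-check script using Python's standard `statistics` module. All numbers are hypothetical; the margin of error uses the common normal approximation (1.96 standard errors for roughly 95% confidence), which is an assumption that works poorly for very small or very skewed samples like this one.

```python
import math
import statistics

# Hypothetical monthly incomes (in thousands) for ten households,
# including one very high earner.
incomes = [18, 20, 21, 22, 23, 24, 25, 26, 27, 250]

mean = statistics.mean(incomes)
median = statistics.median(incomes)
sd = statistics.stdev(incomes)  # sample standard deviation

# Approximate 95% margin of error for the mean (normal approximation).
margin = 1.96 * sd / math.sqrt(len(incomes))

print(f"mean={mean:.1f} median={median:.1f} sd={sd:.1f} moe=+/-{margin:.1f}")

def rate_per_100k(count, population):
    """Express a raw count relative to population, for fair comparison."""
    return count / population * 100_000

# Hypothetical crime counts: the larger area has more crimes but a lower rate.
print(rate_per_100k(300, 200_000), rate_per_100k(1_000, 1_000_000))
```

The single high income pulls the mean to 45.6 while the median stays at 23.5, which is why the median is often the better summary of a "typical" value. Likewise, comparing local and national crime only makes sense after converting counts to rates.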


Page 29: Data literacy

Data Literacy in Libraries’ Instructional Programs and Services

Academic libraries are deploying a four-fold response to the growing need to use research data:

1) hiring specialized staff (data librarians or data specialists) or furthering data management and analysis training for (generally reference) librarians;

2) intensifying the collection or compilation of and providing access to data sources;

3) participating in the development of institutional data repositories to preserve and share original research data and

4) incorporating data literacy in their instructional programs and services (whose design should follow today’s inescapable reference framework, namely the ACRL (2011b) recommendations).

Page 30: Data literacy

Following the ACRL (2011b) recommendations on information literacy, as well as its “Guidelines for Instruction Programs in Academic Libraries,” data literacy instruction can be offered:

• via the Web, with the publication of self-training resources (open to the public or for in-house use only);

• in the library itself, through reference service, one-on-one and on-demand or scheduled user training sessions;

• through face-to-face and online instruction, forming part of credit courses, either as specialized standalone instruction or, with instructors’ cooperation, as instruction embedded in other subjects.

Page 31: Data literacy

Examples of Libraries Providing Data Literacy Training

The Massachusetts Institute of Technology’s (MIT) Data Management and Publishing tutorial,

The EDINA Research Data Management Training (MANTRA),

The University of Edinburgh’s Data Library and

The University of Minnesota libraries’ Data Management Course for Structural Engineers.

Page 34: Data literacy

Data Library

A data library is a collection of numeric and/or geospatial data sets for secondary use in research.

A data library is normally part of a larger institution (academic, corporate, scientific, medical, governmental, etc.) established to serve the data users of that organisation.

The data library tends to house local data collections and provides access to them through various means (CD-/DVD-ROMs or central server for download).

A data library may also maintain subscriptions to licensed data resources for its users to access.

Page 35: Data literacy

Services of Data Libraries and Data Librarians

• Reference assistance
• User instruction
• Technical assistance
• Collection development and management
• Preservation and data sharing services

Page 36: Data literacy

Big Data: What It Means

Page 38: Data literacy

Handling the Big Data

There are three key technologies that can help you get a handle on big data and, even more importantly, extract meaningful business value from it:

• Information management for big data.

• High-performance analytics for big data.

• Flexible deployment options for big data.

Page 39: Data literacy

1. SAS Information Management

Unified data management capabilities

including data governance, data integration, data quality and metadata management.

Complete analytics management

including model management, model deployment, monitoring and governance of the analytics information asset.

Effective decision management

capabilities to easily embed information and analytical results directly into business processes while managing the necessary business rules, workflow and event logic

Page 40: Data literacy

2. High-performance Analytics

Grid computing: A centrally managed grid infrastructure provides dynamic workload balancing, high availability and parallel processing for data management, analytics and reporting.

In-database processing: Using a scalable architecture, in-database processing reduces the time needed to prepare data and to build, deploy and update analytical models.

In-memory analytics: Quickly create and deploy analytical models, and solve dedicated, industry-specific business challenges by processing detailed data in memory within a distributed environment, rather than on disk.

Support for Hadoop: With SAS Information Management, you can effectively manage data and processing in the Hadoop environment (which stores and processes large volumes of data on commodity hardware).
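The common idea behind grid computing and Hadoop-style processing is to split data into partitions, process each partition independently, then merge the partial results. This sketch shows only that split/process/merge structure with Python's standard `concurrent.futures`; it is not SAS or Hadoop code. In CPython, threads will not speed up CPU-bound work like this, and real systems distribute the partitions across many machines instead.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def count_words(chunk):
    """Map step: count words in one partition of the data."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts

def parallel_word_count(lines, partitions=4):
    """Split lines into partitions, count each in a worker, merge results."""
    # Deal lines round-robin into roughly equal partitions.
    chunks = [lines[i::partitions] for i in range(partitions)]
    with ThreadPoolExecutor(max_workers=partitions) as pool:
        partial_counts = list(pool.map(count_words, chunks))
    # Reduce step: merge the partial results into one total.
    total = Counter()
    for c in partial_counts:
        total.update(c)
    return total

lines = ["big data needs big tools", "data literacy", "Big data"]
print(parallel_word_count(lines, partitions=2).most_common(2))
```

Because each partition is processed without looking at the others, the same pattern scales out simply by adding workers, which is what the infrastructures described above provide.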

Page 41: Data literacy

3. Flexible Deployment

For some organizations, it won’t make sense to build the IT infrastructure to support big data, especially if data demands are highly variable or unpredictable.

Those organizations can benefit from cloud computing, where big data analytics is delivered as a service and IT resources can be quickly adjusted to meet changing business demands.

Page 42: Data literacy

Conclusion

Both data literacy and information literacy should be expanded to include critical thinking and statistical literacy.

Expanding data literacy to include statistical literacy will help to deal with inferring causation from associations (in social science).

Expanding information literacy to include statistical literacy will help to deal with information that involves statistics.

As such, including statistical literacy with information literacy and with data literacy will provide more opportunities for librarians to be of service in helping users think critically.

Page 43: Data literacy

References

Association of College and Research Libraries (ACRL). 2011a. “Information Literacy Competency Standards for Journalism Students and Professionals.” Accessed September 24, 2014. http://www.ala.org/acrl/sites/ala.org.acrl/files/content/standards/il_journalism.pdf

Schield, Milo. 2004. “Information Literacy, Statistical Literacy and Data Literacy.” Accessed October 21, 2014. http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=2CCC9BFA3C0B3E0F05CABBD154E1DA15?doi=10.1.1.144.6309&rep=rep1&type=pdf

International Association for Social Science Information Services and Technology (IASSIST). Available at: www.iassistdata.org and http://datalib.library.ualberta.ca/

Linden, Julie. 2002. “Finding, Evaluating and Using Numeric Data.” Presented at IASSIST 2002 conference, Storrs, Connecticut. Available at: http://ropercenter.uconn.edu/iassist2002/program.html

www.datajournalismhandbook.org

www.datalib.edina.ac.uk/Mantra/

www.ed.ac.uk

www.knightdigitalmediacenter.org

www.sas.com/resources/whitepaper/wp_46345.pdf

www.dataone.org
