
ADVISORY EXPERT GROUP
BIG DATA

Statistics Canada


Outline

• Big data and the National Accounts
• Establishing the right infrastructure
• Lessons learned: case studies from Statistics Canada
  • Traditional big data
  • Scanner data
  • Electricity consumption
  • Credit card and Interac
  • Remote sensing

2 Statistics Canada • Statistique Canada


Big data and the National Accounts

From a business perspective: “Big data is high volume, high velocity, and/or high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization.” – Gartner (2012), via Wikipedia

From an NSO perspective: “Big data is high volume, high velocity, and/or high variety information assets that require new forms of processing and that could reduce respondent burden, increase quality, help develop new statistical products or enhance the detail of existing statistical products.” – ????


Mick Couper of the University of Michigan’s Survey Research Center cites the following limitations that NSOs will face when confronting big data:¹

• lack of covariates in the datasets;
• self-selection and self-reporting biases;
• lack of stability;
• privacy issues;
• access issues;
• opportunity for mischief;
• size issues; and
• selective reporting of results (file drawer problem).

To these one could add:
• sustainability: data sources disappear, systems change, perceptions change.

1. Couper, Mick P. “Is the Sky Falling? New Technology, Changing Media, and the Future of Surveys.” Presentation, 5th Conference of the European Survey Research Association, Ljubljana, Slovenia, July 2013.


There needs to be up-front acknowledgement that we are trying to fit a square peg into a round hole.

The needs of business (big data to increase business intelligence) and of national accountants (big data to produce comprehensive macroeconomic statistics) are quite different.

Dimension of the data | Needs of National Accountants | Needs of business
Scope of the dataset | Comprehensive | Limited to the needs of the business
Use of the dataset | Produce meaningful aggregate statistics | Find patterns, explore the detail
Structure of the dataset | Ongoing, stable, regular | Can change as required by the business


Putting in place the appropriate infrastructure

To determine how best to leverage big data, an NSO needs to put in place the proper infrastructure to:

1. Obtain the data

2. Process the data

3. Evaluate the data

4. Integrate the data


Putting in place the appropriate infrastructure – Obtaining the data

Use of legislation – e.g., Section 13 of Canada’s Statistics Act states that “A person having the custody or charge of any documents or records that are maintained in any department or in any municipal office, corporation, business or organization, from which information sought in respect of the objects of this Act can be obtained or that would aid in the completion or correction of that information, shall grant access thereto for those purposes to a person authorized by the Chief Statistician to obtain that information or aid in the completion or correction of that information.” 1970-71-72, c. 15, s. 12.

Memoranda of understanding (MOUs), which outline:
• Roles and responsibilities
• Delivery mechanism
• Uses of data
• Termination of the agreement

Purchasing big data
• Many firms sell big data that can be used for business intelligence; it could also be purchased for statistical purposes. Under what conditions and terms should NSOs purchase big data?


Putting in place the appropriate infrastructure – Processing the data

File transfer system - NSOs need a secure, high capacity file transfer system to transfer data from the data provider to the NSO.

Storage and processing capacity - In most NSOs (especially in National Accounts divisions), the capacity to store and process big data does not exist.

Software - Statistics Canada is leveraging the SAS distributed computing solution called “SAS Grid” to shorten the time needed to process and analyze its larger data holdings. Also, the Data Analysis Resource Center at Statistics Canada maintains a research computer with analytical software installed, offering a wide range of add-ons that provide advanced analytical and visualization tools particular to big data analytics.

Information management policies – Access, privacy, confidentiality, retention


Putting in place the appropriate infrastructure – Evaluating the data

Big data community of practice
• There needs to be a structure in place that allows analysts and programs to gain knowledge and share experiences with respect to big data, to engage with colleagues internally or externally when needed, and to report findings to senior managers when appropriate.

Big data needs to be evaluated with respect to its:
• Quality
• Coverage
• Timeliness
• Detail
• Regularity

In order to leverage big data we need to develop a research and development orientation.
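As a minimal sketch of how the five evaluation criteria could be recorded for a candidate source, the Python below computes one simple indicator per criterion. The class, field names and indicators (share of records passing edit checks for quality, units covered over units in the target population for coverage, delivery lag for timeliness, number of classification dimensions for detail, share of on-time deliveries for regularity) are assumptions for illustration, not Statistics Canada's evaluation framework, and the values are made up.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class SourceAssessment:
    """Hypothetical screening record for one candidate big data source."""
    name: str
    records_total: int          # records received in the evaluation window
    records_failing_edits: int  # records rejected by basic edit checks
    units_in_source: int        # reporting units (stores, customers, firms) in the file
    units_in_frame: int         # units in the target population
    reference_end: date         # last day of the reference period
    delivered: date             # date the file reached the NSO
    detail: tuple               # classification dimensions available in the file
    deliveries_expected: int    # scheduled deliveries over the evaluation window
    deliveries_on_time: int     # deliveries received as scheduled

    def summary(self) -> dict:
        """One illustrative indicator per evaluation criterion."""
        return {
            "quality": 1 - self.records_failing_edits / self.records_total,
            "coverage": self.units_in_source / self.units_in_frame,
            "timeliness_days": (self.delivered - self.reference_end).days,
            "detail": len(self.detail),
            "regularity": self.deliveries_on_time / self.deliveries_expected,
        }


# Made-up values for illustration only.
candidate = SourceAssessment(
    name="candidate source (hypothetical)",
    records_total=1_000_000,
    records_failing_edits=2_500,
    units_in_source=900,
    units_in_frame=1_200,
    reference_end=date(2014, 6, 30),
    delivered=date(2014, 7, 15),
    detail=("region", "product group"),
    deliveries_expected=12,
    deliveries_on_time=11,
)
print(candidate.summary())
```

Keeping the indicators in one record makes it easy to compare sources side by side; which thresholds make a source usable would remain a judgment for the community of practice.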


Examples of big data research at Statistics Canada: International merchandise trade statistics

Collection/access agreement: Access to detailed customs data is governed by two memoranda of understanding: one with the Canada Revenue Agency and one with the U.S. Census Bureau.

Cost: Nil
Dimensions: 1.5 terabytes, 60 attributes
Uses: Balance of Payments, International Merchandise Trade Statistics
Timeliness: 35 days following the reference period
Frequency: Daily, if required
Potential uses: Creating an importer and exporter characteristics file, which can be used to analyze the entry and exit of Canadian traders within the Canadian economy and in studies of globalization, global production, goods for processing and foreign affiliate statistics.
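The entry and exit analysis could start from something as simple as comparing the sets of traders observed in consecutive years. A minimal sketch, assuming a hypothetical file that lists, for each year, the identifiers of businesses with at least one customs transaction; a production characteristics file would also have to handle identifier changes, mergers and traders that re-enter after a gap.

```python
def entry_exit(traders_by_year: dict[int, set[str]]) -> dict[int, dict[str, int]]:
    """Count entrants, exits and continuers between consecutive observed years.

    traders_by_year maps a year to the set of (hypothetical) trader identifiers
    with at least one import or export transaction in that year. Years are
    compared in sorted order, so a missing year is not detected.
    """
    years = sorted(traders_by_year)
    result = {}
    for prev, curr in zip(years, years[1:]):
        before, after = traders_by_year[prev], traders_by_year[curr]
        result[curr] = {
            "entrants": len(after - before),     # traded in curr but not in prev
            "exits": len(before - after),        # traded in prev but not in curr
            "continuers": len(after & before),   # traded in both years
        }
    return result


# Toy identifiers, not real business numbers.
observed = {
    2011: {"A", "B", "C"},
    2012: {"B", "C", "D", "E"},
    2013: {"C", "E", "F"},
}
print(entry_exit(observed))
```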


Examples of big data research at Statistics Canada: Taxation statistics

Collection/access agreement: Access to detailed taxation statistics is governed by a memorandum of understanding with the Canada Revenue Agency.

Cost: Approximately $1.6 million
Dimensions: 6 terabytes and growing
Uses: Benchmark estimates of wages and salaries, output, property incomes, taxes, etc.
Timeliness: Earliest use is 45 days following the reference period
Frequency: Mainly annual; some monthly (goods and services taxation statistics)
Potential uses: Creation of a National Accounts longitudinal file, a business-level microdata file that can be used to undertake studies such as GDP by city, GDP by firm size and productivity by firm size.
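Studies such as productivity by firm size amount to classifying each business in the longitudinal file and aggregating within classes. The sketch below assumes a hypothetical record layout (firm identifier, employment, value added), uses value added per employee as the productivity measure, and uses made-up size thresholds; it illustrates the idea and is not the National Accounts methodology.

```python
from collections import defaultdict

# Hypothetical employment size classes; real thresholds would follow NSO standards.
SIZE_CLASSES = [(1, 99, "small"), (100, 499, "medium"), (500, float("inf"), "large")]


def size_class(employees: int) -> str:
    for low, high, label in SIZE_CLASSES:
        if low <= employees <= high:
            return label
    raise ValueError(f"no size class for {employees} employees")


def productivity_by_size(firms):
    """Value added per employee, aggregated by firm size class.

    firms: iterable of (firm_id, employees, value_added) tuples from a
    hypothetical business-level longitudinal file.
    """
    value_added = defaultdict(float)
    employment = defaultdict(int)
    for _firm_id, employees, va in firms:
        if employees <= 0:
            continue  # ratio is undefined for firms without employees
        label = size_class(employees)
        value_added[label] += va
        employment[label] += employees
    return {label: value_added[label] / employment[label] for label in value_added}


# Toy records, not real data.
firms = [
    ("f1", 10, 1_000_000.0),
    ("f2", 50, 4_000_000.0),
    ("f3", 200, 30_000_000.0),
    ("f4", 1_000, 180_000_000.0),
    ("f5", 0, 50_000.0),
]
print(productivity_by_size(firms))
```

Ratios are computed from class totals (total value added over total employment) rather than by averaging firm-level ratios, so large firms within a class carry proportionally more weight.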


Examples of big data research at Statistics Canada: Government finance statistics

Collection/access agreement: No formal agreement in place – institutional understanding between Statistics Canada and the government jurisdictions.

Cost: Nil
Dimensions: 40 million financial transactions, 200 GB
Uses: Government Finance Statistics; government sector of the National Accounts
Timeliness: Earliest is 15 days following the reference period
Frequency: Monthly, quarterly, annual
Potential uses: Local government remains a ‘survey of municipalities’; access to electronic files would increase our ability to provide census metropolitan area (CMA) level data as well as more revenue and expenditure detail. The data also has potential uses for the health, education and justice programs.


Examples of big data research at Statistics Canada: Electronic household transactions (credit and debit)

Collection/access agreement: Memorandum of understanding outlining the roles and responsibilities of both Statistics Canada and the data provider.

Cost: Nil
Dimensions: “Aggregated” big data: the number and value of transactions, aggregated by merchant group, by place of transaction (domestic, international) and by class of transactor (personal or commercial)
Uses: Indicator for household final consumption expenditure and international travel
Timeliness: Earliest is 15 days following the reference period
Frequency: Monthly
Potential uses: International travel services, monthly household final consumption expenditure
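Because the provider delivers only aggregates, the aggregation itself happens on the provider's side. A minimal sketch of that step, assuming hypothetical transaction-level fields: each record is collapsed into a cell defined by merchant group, place of transaction and class of transactor, and only the count and total value per cell are kept.

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class Transaction(NamedTuple):
    merchant_group: str  # provider's merchant category, not an industry code
    place: str           # "domestic" or "international"
    transactor: str      # "personal" or "commercial"
    amount: float        # transaction value in dollars


def aggregate(transactions):
    """Collapse transaction records into (merchant group, place, transactor) cells.

    Returns {cell: (number of transactions, total value)}.
    """
    counts = Counter()
    values = defaultdict(float)
    for t in transactions:
        cell = (t.merchant_group, t.place, t.transactor)
        counts[cell] += 1
        values[cell] += t.amount
    return {cell: (counts[cell], values[cell]) for cell in counts}


# Toy records, not real card data.
txns = [
    Transaction("grocery", "domestic", "personal", 52.25),
    Transaction("grocery", "domestic", "personal", 17.75),
    Transaction("airlines", "international", "commercial", 640.00),
    Transaction("grocery", "international", "personal", 25.00),
]
cells = aggregate(txns)
print(cells)
```

Only the cell totals leave the provider, which is what makes this an “aggregated” delivery.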

[Charts: Electronic household transactions (credit and debit)]

Examples of big data research at Statistics Canada
Scanner data: vendor-specific

Collection/access agreement: MOU in negotiation
Cost: Current costs are nil, though the long-term approach being proposed would involve a quid pro quo agreement in which the Consumer Prices Division (CPD) would provide the company with its data back with value added (i.e., the division would bear an implicit cost).
Dimensions: Sales, quantities, and item descriptions of all goods sold for a given store over a given period
Uses: Consumer prices and household expenditure weights to feed the CPI
Timeliness: TBD, though potentially as little as a one-day lag (e.g., weekly data for a given week could be delivered on the first day of the following week)
Frequency: Initial data has been provided on a weekly aggregated basis. Future work will look at daily and/or transaction-level data.
Dataset size: For one week of sales data (aggregated over the week) for one store:

• roughly 4,000 KB

• roughly 30,000 rows (i.e., unique items sold)

• implies roughly 200 MB for one year of weekly aggregated data for one store (52 weeks × 4,000 KB ≈ 208,000 KB)

Potential uses moving forward: Direct input into the calculation of the CPI (potential replacement for collected prices), studies on consumer behaviour, CPI weights, household final consumption expenditures, retail sales.
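Scanner data of this kind supplies both sales and quantities, so a unit value (sales divided by quantity) can be derived for each item. The slide does not say how CPD would compute indices from it; one common approach in the scanner-data literature is an unweighted geometric mean (Jevons) of unit-value relatives over items sold in both periods, sketched below under that assumption. Item identifiers (for example, product barcodes) and values are hypothetical.

```python
import math


def unit_values(rows):
    """Map item id -> unit value (sales / quantity) for one week of store data.

    rows: iterable of (item_id, sales, quantity); rows with zero quantity are dropped.
    """
    return {item: sales / qty for item, sales, qty in rows if qty > 0}


def jevons_index(base_rows, current_rows):
    """Geometric mean of unit-value relatives over items sold in both periods."""
    base, current = unit_values(base_rows), unit_values(current_rows)
    matched = base.keys() & current.keys()  # items present in both weeks
    if not matched:
        raise ValueError("no items sold in both periods")
    log_sum = sum(math.log(current[i] / base[i]) for i in matched)
    return math.exp(log_sum / len(matched))


# Toy weekly aggregates: (item_id, sales in dollars, quantity sold).
week1 = [("milk-2L", 500.0, 100), ("bread", 300.0, 100), ("eggs", 400.0, 100)]
week2 = [("milk-2L", 550.0, 100), ("bread", 300.0, 100), ("jam", 90.0, 20)]
print(jevons_index(week1, week2))
```

Items that appear in only one period (here “eggs” and “jam”) drop out of the matched set, which is why scanner-data indices need an explicit treatment of product churn.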


Examples of big data research at Statistics Canada
Smart meter: household electricity consumption

Collection/access agreement: Two memoranda of understanding with two regional electricity distributors

Cost: Nil
Dimensions: Roughly 200 GB of raw hourly electricity consumption data have been obtained, providing detailed information on approximately 120,000 customers from 2008 to 2013
Uses: Household electricity consumption
Timeliness: Earliest is 15 days following the reference period
Frequency: Hourly
Potential uses: Household final consumption expenditure; the utilities component of monthly gross domestic product (GDP)
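Hourly meter readings have to be rolled up to the monthly frequency used for household final consumption expenditure and monthly GDP. A minimal sketch, assuming hypothetical rows of (customer identifier, hour-start timestamp, kWh) with timestamps already in local time; checks for missing or duplicated hours are omitted.

```python
from collections import defaultdict
from datetime import datetime


def monthly_consumption(readings):
    """Sum hourly kWh readings into (customer, year, month) totals.

    readings: iterable of (customer_id, timestamp, kwh), one row per metered hour.
    """
    totals = defaultdict(float)
    for customer, ts, kwh in readings:
        totals[(customer, ts.year, ts.month)] += kwh
    return dict(totals)


def residential_total(monthly, year, month):
    """Total consumption across all customers for one month."""
    return sum(v for (_, y, m), v in monthly.items() if (y, m) == (year, month))


# Toy readings spanning a month boundary.
readings = [
    ("c1", datetime(2013, 1, 31, 22), 1.5),
    ("c1", datetime(2013, 1, 31, 23), 2.0),
    ("c1", datetime(2013, 2, 1, 0), 0.5),
    ("c2", datetime(2013, 1, 31, 23), 1.25),
]
monthly = monthly_consumption(readings)
print(monthly)
print(residential_total(monthly, 2013, 1))
```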

[Chart: Smart meter household electricity consumption, total residential consumption]

Examples of big data research at Statistics Canada
Satellite Imaging: Land Account

Collection/access agreement: Public data
Cost: Nil
Dimensions: 20 GB. Although not apparent here, the “dimension” of this type of big data (which, strictly speaking, is not really big data) may well explode in the coming years. LiDAR datasets (light detection and ranging, a laser-based technique) as well as higher-resolution (space and time) satellite data will require terabytes of storage and “terahertz” of processing capacity.
Uses: Land accounts: land cover/land use change, 2000 and 2010–2013
Timeliness: 3-year lag
Frequency: Annual
Potential uses moving forward: Landscape and freshwater ecosystem accounts
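Land cover/land use change between two dates is commonly summarized as a change (transition) matrix: each grid cell is cross-tabulated by its class at the first and second date, and cell counts are converted to area. The sketch below assumes two co-registered classifications flattened to lists in the same cell order and a uniform, hypothetical cell area; real land-cover rasters would normally be processed with dedicated raster tools rather than Python lists.

```python
from collections import Counter


def change_matrix(cover_t0, cover_t1, cell_area_km2):
    """Cross-tabulate land cover classes of the same grid cells at two dates.

    cover_t0, cover_t1: equal-length sequences of class labels, one per cell,
    in the same cell order. Returns {(from_class, to_class): area_km2}.
    """
    if len(cover_t0) != len(cover_t1):
        raise ValueError("both classifications must cover the same cells")
    counts = Counter(zip(cover_t0, cover_t1))
    return {pair: n * cell_area_km2 for pair, n in counts.items()}


def net_change(matrix):
    """Net area gained (+) or lost (-) by each class between the two dates."""
    net = Counter()
    for (src, dst), area in matrix.items():
        if src != dst:
            net[src] -= area
            net[dst] += area
    return dict(net)


# Toy classifications of six cells; 0.25 km² per cell is an assumption.
t0 = ["forest", "forest", "cropland", "cropland", "wetland", "forest"]
t1 = ["forest", "built-up", "cropland", "built-up", "wetland", "forest"]
m = change_matrix(t0, t1, cell_area_km2=0.25)
print(m)
print(net_change(m))
```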

[Figure: Remote sensing, land use]

Examples of big data research at Statistics Canada
Water Measurement Instruments: Water Account

Collection/access agreement: Informal agreement with Water Survey of Canada (WSC)

Cost: Nil
Dimensions: Original WSC data is 5 GB; derived water yield data is 90 GB
Uses: Water accounts: water yield
Timeliness: From real time to a lag of several years
Frequency: Daily
Potential uses moving forward: Freshwater ecosystem accounts
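The slide does not describe how water yield is derived from the WSC measurements. As a simplified illustration only, one common way to express the yield of a gauged basin is as runoff depth: the volume of water passing a gauge over a period, divided by the basin's drainage area. The sketch assumes daily mean discharges in m³/s and a known drainage area in km²; the actual water account methodology is likely more involved (for instance, for ungauged areas).

```python
SECONDS_PER_DAY = 86_400


def runoff_depth_mm(daily_mean_discharge_m3s, drainage_area_km2):
    """Convert daily mean discharges (m³/s) to runoff depth (mm) over a drainage area.

    volume (m³) = sum of Q_day × 86,400 s
    depth (mm)  = volume / area (m²) × 1,000
    """
    volume_m3 = sum(q * SECONDS_PER_DAY for q in daily_mean_discharge_m3s)
    area_m2 = drainage_area_km2 * 1_000_000
    return volume_m3 / area_m2 * 1_000


# Hypothetical gauge: a constant 10 m³/s for a year over a 1,000 km² basin.
print(runoff_depth_mm([10.0] * 365, 1_000))
```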


Some lessons learned so far

1. Quid pro quo – this is important when trying to obtain ‘big data’. Firms are more willing to part with their ‘big data’ if you show them how they will receive a ‘business intelligence’ benefit in return.

2. Cost – ‘big data’ is not always the cheapest option. It is sometimes easier to have the firm complete the survey than to create an infrastructure to receive and process their data. For example, the data received from local electricity providers is equivalent to the completion of two questions on our current survey.

3. Classification systems – ‘big data’ does not follow any standard classification system. For example, electronic retail transactions are classified according to merchant groups rather than industries.

4. Big data aggregates – asking firms to aggregate their ‘big data’ is an option.

5. Data formats – we need to work with new data formats with which we are often not familiar.
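One way to bridge the classification gap in lesson 3 is a concordance that allocates each merchant group to one or more industries. A minimal sketch, assuming a hypothetical concordance with made-up allocation shares; building and maintaining realistic shares (for example, from survey data) is the hard part and is not shown.

```python
from collections import defaultdict

# Hypothetical concordance: each merchant group is split across industries
# using allocation shares that sum to 1 (labels and shares are made up).
CONCORDANCE = {
    "grocery stores": {"food retail": 0.9, "pharmacy retail": 0.1},
    "service stations": {"fuel retail": 0.7, "food retail": 0.3},
}


def to_industries(values_by_merchant_group, concordance=CONCORDANCE):
    """Reallocate values reported by merchant group to industries."""
    out = defaultdict(float)
    for group, value in values_by_merchant_group.items():
        shares = concordance.get(group)
        if shares is None:
            raise KeyError(f"merchant group not in concordance: {group}")
        if abs(sum(shares.values()) - 1) > 1e-9:
            raise ValueError(f"shares for {group} do not sum to 1")
        for industry, share in shares.items():
            out[industry] += value * share
    return dict(out)


print(to_industries({"grocery stores": 1000.0, "service stations": 500.0}))
```

Checking that shares sum to 1 ensures the reallocation neither creates nor loses value relative to the provider's totals.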


Discussion points for the AEG

• In order to exploit the potential of big data, NSOs need to make significant investments. How can we leverage the work taking place across various NSOs to minimize the investment and maximize the return?

• How do we promote using big data to develop new data products over using it to reconstruct existing data products? Do we adjust our frameworks to accommodate big data, or do we adjust big data to accommodate our frameworks?