Big Data – Big Noise
Its relevance to industrial statistics in the context of SDG monitoring
Shyam Upadhyaya, UNIDO
CCSA SPECIAL SESSION ON SHOWCASING BIG DATA, 1 October 2015, Bangkok
Data revolution and big data
“New technologies are leading to an exponential increase in the volume and types of data … that are bigger, faster and more detailed than ever before. This is the data revolution.”
(Report of the Independent Expert Advisory Group formed by the UN Secretary-General)
Data revolution generally refers to the technology of data collection, production and dissemination while big data is an outcome of the data revolution.
Big data
• Refers to non-traditional, IT-generated data outside official statistics
• Includes not only numbers, but also geo-spatial data, audio, video and text files
• More than 90% of the world's data today was created in the last two years
• Big data are increasingly studied under data science, while traditional data, especially official data, are produced according to the principles prescribed by statistical science
Characteristics of big data (but NOT the characteristics of statistical data)
• Huge in volume: terabytes (1000 GB) or petabytes (1000 TB); Facebook ingests 500 terabytes of new data every day
• High speed of processing: real-time data; Internet logs, tracking of locations by mobile phones
• Diverse in type: structured and unstructured data; numbers, text and images
Relevance of Internet and private data sources to industrial statistics
Source: Relevance
• Social networks (human-sourced information): Not so relevant; very difficult to extract data meaningful to economic aspects
• Process-mediated business system data: Relevant for consistency checking and comparison purposes
• Internet data (Google Trends): Useful for deriving an extrapolator
• Private data sources (PMI): Often collected from primary sources, but the scope, coverage and reference periods might be different
Some examples of using Big Data in industrial statistics
• Price data from Internet advertisements and catalogues are used to produce the Billion Prices CPI
• Monthly employment index from ADP, a company that processes payroll data
• Google Trends data make it possible to nowcast sales figures by commodity from the trends in related searches
[Figure: Nissan car searches in Google by region in Japan, July 2015]
(Figures on motor vehicles from Choi and Varian, 2011, "Predicting the Present with Google Trends"; MAE = mean absolute error)
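The mean absolute error (MAE) mentioned in the figure credit is the average size of the gap between nowcast and actual values. A minimal sketch of how two nowcasts could be compared on that measure follows; the series values and the names `baseline_nowcast` and `trends_nowcast` are hypothetical and are not taken from the slides or from the cited paper.

```python
def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over paired observations."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty series of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical monthly sales index values (illustrative only)
actual = [100.0, 104.0, 98.0, 102.0]
baseline_nowcast = [102.0, 100.0, 101.0, 99.0]  # assumed model without search data
trends_nowcast = [101.0, 103.0, 99.0, 101.0]    # assumed model with search data

mae_baseline = mean_absolute_error(actual, baseline_nowcast)
mae_trends = mean_absolute_error(actual, trends_nowcast)
print(f"MAE without search data: {mae_baseline}")
print(f"MAE with search data:    {mae_trends}")
```

A lower MAE for the search-based nowcast would suggest that the search series adds information, at least for the period tested.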
Employment data from Google
[Figure: Percentage of manufacturing in relation to total job searches, 2005 to 2015 (latest); vertical axis 0 to 100. Source: Google Trends]
Indicates the interest of job searchers based on perceived employment opportunities
[Figure: Number of employees in manufacturing industry of selected economies, 2005 to 2011, shown including and excluding China; vertical axis 0 to 140. Source: UNIDO INDSTAT Database]
Indicates the actual number of jobs offered (data available with certain time lags)
Effect of media on Internet generated data
[Figure: vertical axis 0 to 100, 2010 to 2015; series: Purchasing Manager's index and UNIDO Production index (IIP); annotations: "On UNIDO press release" and "On PMI report"]
Perceived and actual growth trends
• Trend based on popularity of terms among data searchers
• Actual growth trends
How good are big data?
• Differences in concepts and definitions
  • Statistical and business accounting terms mean different things even when they may sound similar
• Differences in reference year and other time frames
• Coverage and representativeness
• Knowledge and resources required to use big data
Area of potential use of Big data
• Reference data come from official statistics with certain time lags
• Big data may provide an extrapolator from:
  • commodity sales, price and registration data
  • investment company data (FDI), banks, etc.
• Estimates could be derived from the reference data (from official statistics, e.g. INDSTAT) and an extrapolator (from big data, e.g. Google Trends)
• Estimates from trends might be available much earlier than from the official sources
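One simple way to combine a lagged reference figure with a more timely indicator is ratio extrapolation: carry the last official value forward by the growth of the indicator since the reference period. The sketch below assumes this approach; the slides do not specify the method, and the function name `extrapolate` and all figures are hypothetical.

```python
def extrapolate(reference_value, indicator_at_reference, indicator_latest):
    """Carry a benchmark forward by the growth of an indicator series.

    estimate = reference_value * (indicator_latest / indicator_at_reference)
    """
    if indicator_at_reference <= 0:
        raise ValueError("indicator at the reference period must be positive")
    return reference_value * (indicator_latest / indicator_at_reference)


# Hypothetical figures, for illustration only
reference_employment = 2000.0  # last official figure (thousands of employees)
trend_reference_year = 80.0    # indicator value in the same reference year
trend_latest = 90.0            # indicator value in the most recent period

early_estimate = extrapolate(reference_employment, trend_reference_year, trend_latest)
print(f"Early estimate: {early_estimate:.1f} thousand employees")
```

The estimate inherits any bias in the indicator, so it would normally be revised once the official figure for the period becomes available.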
How Big Data are relevant to UNIDO’s role in SDGs monitoring
Industry related indicators
9.2 Manufacturing value added and employment
9.3 Share of small scale industry in industry output and access to financial services
9.4 CO2 emission per unit of industrial output
9.B Percentage of high-tech sectors in MVA
• At best, big data may provide ratios and growth rates, but SDG indicators require data disaggregated by sector, gender or region, as applicable
• SDG indicators require that figures are representative of the entire economy
• Internet and privately generated data lack uniformity in data compilation methods
Cost of using Big Data
• NSOs need the technical capacity to extract usable figures from big data
• Transition to a big data driven system requires investment of time and resources
• Not all data are easily accessible; there are privacy and proprietary barriers
• Big data is no panacea for data gaps, because data gaps in official statistics are paralleled by gaps in big data
• It would be more sensible to invest in national capacity building than in acquiring big data and transforming them into statistical indicators
Agenda 2030 on Statistics for SDG monitoring
We recognize that baseline data for several of the targets remain unavailable and we call for increased support for strengthening data collection and capacity building in Member States… (Para 57)
We will support developing countries, particularly African countries, LDCs, SIDS and LLDCs, in strengthening the capacity of national statistical offices… (Para 76)
No mention of big data anymore
What do we plan to do with “Big data”
• To begin with, UNIDO will identify relevant big data sources that have no confidentiality or access issues
• Develop the statistical methodology for deriving the extrapolator for early growth estimates
• Develop partnerships and data protocols with businesses and administrators that own big data
• Promote incremental and system-wide integration of statistical and IT systems with NSOs and international agencies
Conclusions
• We cannot ignore the changes brought by the data revolution, but we should be careful about how we embrace it to our benefit
• As an international agency, we should be able to advise NSOs on how to balance big data with official statistics
• We need to develop methodologies, conduct case studies and suggest classifications
• Utilize more of the technological part of the data revolution to strengthen the role of official statistics