Big Data – Big Noise Its relevance to industrial Statistics in the context of SDG monitoring Shyam Upadhyaya UNIDO CCSA SPECIAL SESSION ON SHOWCASING BIG DATA 1 October 2015, Bangkok


Big Data – Big Noise

Its relevance to industrial Statistics in the context of SDG monitoring

Shyam Upadhyaya UNIDO

CCSA SPECIAL SESSION ON SHOWCASING BIG DATA 1 October 2015, Bangkok


Data revolution and big data


“New technologies are leading to an exponential increase in the volume and types of data … that are bigger, faster and more detailed than ever before. This is the data revolution.”
(Report of the Independent Expert Advisory Group formed by the UN Secretary-General)

The data revolution generally refers to the technology of data collection, production and dissemination, while big data is an outcome of the data revolution.


Big data


• Refers to non-traditional, IT-generated data outside official statistics

• Includes not only numbers, but also geo-spatial data, audio, video and text files

• More than 90% of the world's data today was created in the last two years

• Big data are increasingly studied under data science, while traditional data, especially official data, are produced according to the principles prescribed by statistical science


Characteristics of big data (but NOT the characteristics of statistical data)

• Huge in volume: terabytes (1000 GB) or petabytes (1000 TB); Facebook ingests 500 terabytes of new data every day

• High speed of processing: real-time data; Internet logs, tracking of locations by mobile phones

• Diverse in type: structured and unstructured data; numbers, text and images


Relevance of Internet and private data sources to industrial statistics

Source: Relevance

• Social networks (human-sourced information): Not so relevant; very difficult to extract data meaningful to economic aspects

• Process-mediated business system data: Relevant for consistency checking and comparison purposes

• Internet data (Google Trends): Useful for deriving an extrapolator

• Private data sources (PMI): Often collected from the primary sources, but the scope, coverage and reference periods might be different


Some examples of using Big Data in industrial statistics

Nissan car search in Google by region in Japan, July 2015

• Price data from Internet advertisements and catalogs, used to produce the Billion Prices CPI

• Monthly employment index from ADP, a company that processes payroll data

• Google Trends data make it possible to use trends derived from searches to nowcast sales figures by commodity

(Figures on Motor vehicles from Choi, Varian - 2011 Predicting the present with Google trends; MAE – mean absolute error)
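The nowcasting idea above can be sketched roughly as follows. This is a minimal illustration, not the model from the cited paper: it compares a baseline regression of sales on last period's sales with one that also uses the current search index, and scores both by mean absolute error (MAE) on one-step-ahead forecasts. All data are synthetic and invented for the example, and the function names (`ols`, `rolling_mae`) are hypothetical helpers.

```python
import math

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def ols(rows, targets):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve_linear(xtx, xty)

def rolling_mae(features, targets, start):
    """One-step-ahead forecasts on an expanding window; returns the MAE."""
    errors = []
    for t in range(start, len(targets)):
        beta = ols(features[:t], targets[:t])      # fit on data before t only
        pred = sum(b * x for b, x in zip(beta, features[t]))
        errors.append(abs(targets[t] - pred))
    return sum(errors) / len(errors)

# Synthetic monthly data (invented): sales depend on last month's sales
# and on the current search index, plus a small deterministic disturbance.
n = 48
search = [50 + 10 * math.sin(0.9 * t) for t in range(n)]
noise = [0.1 * ((7 * t) % 5 - 2) for t in range(n)]
sales = [70.0]
for t in range(1, n):
    sales.append(20 + 0.5 * sales[t - 1] + 0.3 * search[t] + noise[t])

targets = sales[1:]
baseline = [[1.0, sales[t - 1]] for t in range(1, n)]                # lag only
with_trends = [[1.0, sales[t - 1], search[t]] for t in range(1, n)]  # lag + search

mae_baseline = rolling_mae(baseline, targets, start=24)
mae_trends = rolling_mae(with_trends, targets, start=24)
print(f"MAE baseline: {mae_baseline:.3f}, MAE with search index: {mae_trends:.3f}")
```

On real data the gain from the search index is an empirical question; the synthetic series here are built so that the index is informative by construction.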


Employment data from Google

Figure: Percentage of manufacturing in relation to total job searches, 2005 to 2015 (latest), on a scale of 0 to 100. Source: Google Trends.

Indicates the interest of job searchers based on perceived employment opportunities.

Figure: Number of employees in manufacturing industry of selected economies, 2005 to 2011, with series including China and excluding China. Source: UNIDO INDSTAT Database.

Indicates the actual number of jobs offered (data available with certain time lags).


Effect of media on Internet generated data

Figure panels: On UNIDO press release; On PMI report


Perceived and actual growth trends

Figure: "Trend based on popularity of terms among data searchers" and "Actual growth trends", with series Purchasing Manager's index and UNIDO Production index (IIP), 2010 to 2015, on a scale of 0 to 100.


How good are big data?

• Differences in concepts and definitions
  • Statistical and business accounting terms mean different things even when they may sound similar

• Differences in reference year and other time frames

• Coverage and representativeness

• Knowledge and resources required to use big data


Areas of potential use of big data

• Reference data come from official statistics with certain time lags

• Big data may provide an extrapolator from:
  • commodity sales, price and registration data
  • investment company data (FDI), banks, etc.

• Estimates could be derived from the reference data (from official statistics, INDSTAT) and an extrapolator (from big data, Google Trends, etc.)

• Estimates from trends might be available much earlier than from the official sources
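One simple way to combine reference data with an extrapolator is to carry the last official value forward in proportion to the movement of the indicator. The sketch below assumes that the indicator moves proportionally with the target variable, which would need to be checked in practice. The function name `extrapolate` and all figures are hypothetical, chosen only to illustrate the arithmetic.

```python
def extrapolate(reference, indicator):
    """Carry the last official value forward using the indicator's growth.

    reference: {year: official value}, available only with a time lag
    indicator: {year: index value}, available for more recent years
    """
    last_year = max(reference)
    if last_year not in indicator:
        raise ValueError("indicator must cover the last reference year")
    base_value = reference[last_year]
    base_indicator = indicator[last_year]
    # estimate_t = reference_T * indicator_t / indicator_T, for t > T
    return {
        year: base_value * indicator[year] / base_indicator
        for year in sorted(indicator)
        if year > last_year
    }

# Hypothetical figures: official data end in 2013, the index runs to 2015.
official = {2011: 100.0, 2012: 104.0, 2013: 110.0}
search_index = {2013: 80.0, 2014: 84.0, 2015: 88.0}

estimates = extrapolate(official, search_index)
for year, value in estimates.items():
    print(year, round(value, 1))
```

Estimates obtained this way are provisional and would normally be revised once the official figures for those years are released.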


How Big Data are relevant to UNIDO’s role in SDGs monitoring

Industry-related indicators:

• 9.2 Manufacturing value added and employment
• 9.3 Share of small scale industry in industry output and access to financial services
• 9.4 CO2 emission per unit of industrial output
• 9.B Percentage of high-tech sectors in MVA

• At best, big data may provide ratios and growth rates, but SDG indicators require data disaggregated by sector, gender and region, wherever applicable

• SDG indicators require figures that are representative of the entire economy

• Internet and privately generated data lack uniformity in data compilation methods

• Technical capacity of NSOs to extract usable figures from big data


Cost of using Big Data

• Transition to a big data driven system requires an investment of time and resources

• Not all data are easily accessible; there are privacy and proprietary barriers

• Big data is no panacea for data gaps, because data gaps in official statistics are paralleled by gaps in big data

• It would be more sensible to invest in national capacity building than in acquiring big data and transforming them into statistical indicators


Agenda 2030 on Statistics for SDG monitoring

“We recognize that baseline data for several of the targets remain unavailable and we call for increased support for strengthening data collection and capacity building in Member States…” (Para 57)

“We will support developing countries, particularly African countries, LDCs, SIDS and LLDCs, in strengthening the capacity of national statistical offices…” (Para 76)

No mention of big data anymore.


What do we plan to do with “Big data”?

• To begin with, UNIDO will identify the relevant big data sources that have no confidentiality or access issues

• Develop a statistical methodology for deriving the extrapolator for early growth estimates

• Develop partnerships and data protocols with businesses and administrators that own big data

• Promote incremental and system-wide integration of statistical and IT systems with NSOs and international agencies


Conclusions

• We cannot ignore the changes brought by the data revolution, but we should be careful about how we embrace it for our benefit

• As an international agency, we should be able to advise NSOs on how to balance big data with official statistics

• We need to develop methodologies, conduct case studies and suggest classifications

• Utilize more of the technological part of the data revolution to strengthen the role of official statistics


Thank you!

UNIDO STAT-INFO services [email protected]
