Big Data for Official Statistics @ Konferensi Big Data Indonesia 2016

Big Data, a Big Challenge for Official Statistics? Setia Pramana Pusat Kajian Komputasi Statistik Sekolah Tinggi Ilmu Statistik

Upload: setia-pramana

Post on 22-Jan-2018


Page 1: Big data for official statistics @ Konferensi Big Data Indonesia 2016

Big Data, a Big Challenge for Official Statistics?

Setia Pramana

Pusat Kajian Komputasi Statistik

Sekolah Tinggi Ilmu Statistik

Page 2: Big data for official statistics @ Konferensi Big Data Indonesia 2016

Official Statistics

Statistics published by government agencies or other public bodies such as international organizations as a public good.

One of the Official Statistics Producers in Indonesia is BPS

Page 3: Big data for official statistics @ Konferensi Big Data Indonesia 2016

Badan Pusat Statistik

POSITION

BPS is a non-ministerial government institution

Directly under and responsible to the President of the Republic of Indonesia

Headed by a Chief Statistician

TASK

To carry out government duties in the field of statistics in accordance with the prevailing laws and regulations

Page 4: Big data for official statistics @ Konferensi Big Data Indonesia 2016

Types of Data

Type: Basic Statistics
Description:
• Used for a broad range of purposes
• Utilized by government and society
• Characteristics: cross-sectoral, national in scale, broadly aggregated
Undertaken by: BPS-Statistics Indonesia

Type: Sectoral Statistics
Description:
• Utilized by particular institutions to fulfill their main tasks
Undertaken by: Govt institutions (independently or in collaboration with BPS)

Type: Special Statistics
Description:
• Utilized to fulfill the specific needs of business, education, socio-cultural, and other purposes
Undertaken by: Non-government institutions, organizations, individuals and/or other parts of society

Page 5: Big data for official statistics @ Konferensi Big Data Indonesia 2016

NATIONAL STATISTICAL SYSTEM

[Diagram: the national statistical system, tied together by Coordination, Integration, Synchronization, Standardization (CISS). BPS acts as clearing house of statistical information and as provider of statistical information. Other diagram labels: Statistical Data Request; Statistical Community Forum; Feedback; Component (Resources, Methods, Infrastructure, Science & Technology, Law). The numbers (1) to (5) on the arrows refer to the notes below.]

Type: Sectoral Statistics
Undertaker: Government Institution
Methods: Survey, Compilation of Administrative Product, Others
Result: Data

Type: Basic Statistics
Undertaker: BPS
Methods: Census, Survey, Compilation of Administrative Product, Others
Result: Data

Type: Special Statistics
Undertaker: Community
Methods: Survey, Compilation of Administrative Product, Others
Result: Synopsis

NOTES: (1) BPS coordinates statistical undertakings; (2) Govt institutions submit their survey plans to BPS, which provides recommendations; (3) Govt institutions submit their results to BPS; (4) Private institutions or the community submit a synopsis; (5) Govt institutions and private institutions/community coordinate and cooperate with BPS.

Page 6: Big data for official statistics @ Konferensi Big Data Indonesia 2016

Data Collection Method

CENSUS

Enumerating all population units in Indonesia

Is conducted to obtain characteristics of the population at a certain period of time

Is held decennially

SURVEY

Enumerating samples of a population

Is conducted to estimate the characteristics of a population at a certain period of time

COMPILATION OF ADMINISTRATIVE RECORDS

Data collection, processing, dissemination, and analysis based on administrative records of government or community

Page 7: Big data for official statistics @ Konferensi Big Data Indonesia 2016

Census Conducted by BPS (1)

• Mandated by Law No. 16/1997 and part of the UN agenda;

• Held every 10 years, in years ending in 0;

• The 2010 Population Census was the sixth population census in Indonesia, after 1961, 1971, 1980, 1990, and 2000;

• The 2010 Population Census was held to collect basic data on housing and population, demographic parameters, data for MDGs evaluation, and program targeting.

Population Census

http://sp2010.bps.go.id

Page 8: Big data for official statistics @ Konferensi Big Data Indonesia 2016

Census Conducted by BPS (2)

• Used as benchmark data for the agricultural sector;

• Conducted every 10 years, in years ending in 3;

• The 2013 Agricultural Census was the sixth agricultural census in Indonesia, after 1963, 1973, 1983, 1993, and 2003;

• Agricultural characteristics: farmer households, number of livestock, land tenure, etc.

Agricultural Census

http://st2013.bps.go.id

Page 9: Big data for official statistics @ Konferensi Big Data Indonesia 2016

Census Conducted by BPS (3)

• Conducted every 10 years, in years ending in 6;

• The upcoming 2016 Economic Census is going to be the fourth economic census in Indonesia, after 1986, 1996, and 2006;

• Enterprise characteristics: number of enterprises, labor force, etc.

Economic Census
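The three census slides describe a simple rule: the last digit of the year decides which census is held (0 for population, 3 for agriculture, 6 for the economy). A minimal Python sketch of that rule follows. The function names are made up for illustration, and the rule only reflects what the slides state; the 1961 and 1971 population censuses, for example, predate it.

```python
from typing import Optional, Tuple

# Last digit of the year -> census, following the rule stated on the slides.
CENSUS_BY_LAST_DIGIT = {
    0: "Population Census",
    3: "Agricultural Census",
    6: "Economic Census",
}


def census_for_year(year: int) -> Optional[str]:
    """Return the census scheduled for `year`, or None if there is none."""
    return CENSUS_BY_LAST_DIGIT.get(year % 10)


def next_census(year: int) -> Tuple[int, str]:
    """Return (year, census) for the first census held in or after `year`."""
    while census_for_year(year) is None:
        year += 1
    return year, census_for_year(year)
```

For example, `next_census(2017)` walks forward to 2020 and reports the Population Census.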

Page 10: Big data for official statistics @ Konferensi Big Data Indonesia 2016

Surveys Conducted by BPS

Surveys conducted by BPS include:

Susenas (National Economic and Social Survey),

Sakernas (National Labor Force Survey),

Price Survey (Consumer, Rural Consumer, Wholesale)

Business and Consumer Tendency Survey

Industrial Survey

Indonesia Demographic and Health Survey

Etc.

Page 11: Big data for official statistics @ Konferensi Big Data Indonesia 2016

Data Obtained from Compilation of Administrative Records

• Human Development Index

• Agricultural indicators

• Export and import

• Transportation

• Flow of funds accounts

• Gross domestic product

• Tourism

Page 12: Big data for official statistics @ Konferensi Big Data Indonesia 2016


Official Statistics News (Berita Resmi Statistik / BRS)

Monthly:
• Inflation/Consumer Price Index
• Export
• Import
• Trade Balance
• Tourism
• Transportation
• Farmer Terms of Trade
• Grain Producer Price
• Wage
• Wholesale Price Index

Quarterly (Feb, May, Aug, Nov):
• GDP/Economic Growth
• Business Tendency Index
• Consumer Tendency Index
• Manufacturing Industry: Large and Medium Scale; Micro and Small Scale
• Producer Price Index (Oct’13)

Four-monthly, Semesterly, Annually:
• Poverty (January and July)
• Employment (May and November)
• Forecast: Production of Paddy, Maize, and Soybeans: Preliminary Figure Year n-1 (March); Forecast I Year n (July); Forecast II Year n (November)

Page 13: Big data for official statistics @ Konferensi Big Data Indonesia 2016

Quality of Statistics

Accuracy

Relevance

Timeliness

Accessibility

Coherence

Interpretability

Page 14: Big data for official statistics @ Konferensi Big Data Indonesia 2016

Will Big Data Replace Official Statistics?

Page 15: Big data for official statistics @ Konferensi Big Data Indonesia 2016

Official Statistics vs. Big Data


Dr. Jose Ramon G. Albert, NSCB, Philippines

Page 16: Big data for official statistics @ Konferensi Big Data Indonesia 2016

Complementary Roles for Official Statistics and Big Data

• Provide variables to help BPS stratify better for sample surveys

• Improve sample survey estimates

• Help to compensate for nonresponse

• Help to check BPS estimates

• Help to improve the frequency and timeliness of data releases

• Help to improve and provide more small-area estimates


Cavan Capps and Tommy Wright, U.S. Census Bureau
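The first bullet above says big data can supply variables for better stratification. As a reminder of why strata matter, here is a minimal Python sketch of the standard stratified mean estimator: each stratum's sample mean is weighted by that stratum's share of the population. The function name, stratum labels, and numbers are illustrative assumptions, not figures from the talk.

```python
from statistics import mean


def stratified_mean(samples, stratum_sizes):
    """Estimate a population mean from per-stratum samples.

    samples:       {stratum: [observed values]}
    stratum_sizes: {stratum: number of population units in that stratum}
    Both dictionaries are assumed to share the same keys.
    """
    total = sum(stratum_sizes.values())
    return sum(
        stratum_sizes[h] / total * mean(values)
        for h, values in samples.items()
    )


# Illustrative numbers only (not from the talk).
samples = {"urban": [10.0, 12.0], "rural": [4.0, 6.0]}
sizes = {"urban": 300, "rural": 700}
estimate = stratified_mean(samples, sizes)  # 0.3 * 11 + 0.7 * 5
```

A stratification variable is useful when units within a stratum are similar to each other, which is where an auxiliary big data source could help.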

Page 17: Big data for official statistics @ Konferensi Big Data Indonesia 2016

Sources of Big Data

• Administrative data that arise from the administration of a program, be it governmental or not (e.g., electronic medical records, hospital visits, insurance records, bank records, and food banks)

• Commercial or transactional digital data that arise from the transaction between two entities (e.g., credit card transactions and online transactions, including from mobile devices)

• Sensor data (e.g., satellite imaging, road sensors, and climate sensors)

• GPS tracking devices (e.g., tracking data from mobile telephones)

• Behavioral data (e.g., online searches about a product, service, or any other type of information and online page views)

• Opinion data (e.g., comments on social media)

Page 18: Big data for official statistics @ Konferensi Big Data Indonesia 2016

What has been done?

Page 19: Big data for official statistics @ Konferensi Big Data Indonesia 2016

Connecting the Data


Google Dengue and Influenza Trends

Global Pulse: Price of rice trend based on Twitter

Page 20: Big data for official statistics @ Konferensi Big Data Indonesia 2016

What has been done?

Page 21: Big data for official statistics @ Konferensi Big Data Indonesia 2016

Proof of Concept Projects

Big Data for Predicting Commuting Patterns

Using Big Data to Nowcast Food Prices

Page 22: Big data for official statistics @ Konferensi Big Data Indonesia 2016

Big Data for Predicting Commuting Patterns

Page 23: Big data for official statistics @ Konferensi Big Data Indonesia 2016

Big Data for Predicting Commuting Patterns

Collaboration with Pulse Lab UN Jakarta

A big data project using multiple sources of data, e.g. social media and statistical data.

Aims to better understand inter-city commuting patterns using social media.

Offers a less expensive and easier way to collect information.

Page 24: Big data for official statistics @ Konferensi Big Data Indonesia 2016

Data Sources

Jabodetabek Commuter Survey 2014

Sample: 13,120 households from 13 kabupaten/kota (regencies/cities)

Twitter (February 2014)

7 million tweets

Based on geotags
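One common way to turn geotagged tweets into commuting flows is to guess each user's home from night-time tweets and their daytime location from working-hours tweets, then count the home-to-daytime pairs. The Python sketch below illustrates that heuristic. It is an assumption for illustration, not necessarily the method used in this project; the hour windows, record layout, and function names are all hypothetical.

```python
from collections import Counter, defaultdict

NIGHT_HOURS = set(range(21, 24)) | set(range(0, 6))  # assumed "at home" window
WORK_HOURS = set(range(9, 17))                        # assumed "at work" window


def most_common_place(places):
    """Most frequent place in a list, or None for an empty list."""
    return Counter(places).most_common(1)[0][0] if places else None


def commuting_flows(tweets):
    """Count origin-destination pairs from (user, hour, district) records."""
    night = defaultdict(list)
    day = defaultdict(list)
    for user, hour, district in tweets:
        if hour in NIGHT_HOURS:
            night[user].append(district)
        elif hour in WORK_HOURS:
            day[user].append(district)

    flows = Counter()
    for user in night:
        home = most_common_place(night[user])
        work = most_common_place(day.get(user, []))
        # Only users seen in a different district during the day count as commuters.
        if work is not None and work != home:
            flows[(home, work)] += 1
    return flows


# Made-up records: (user id, hour of day, district of the geotag).
tweets = [
    ("u1", 22, "Bogor"), ("u1", 23, "Bogor"), ("u1", 10, "Jakarta Pusat"),
    ("u2", 1, "Depok"), ("u2", 14, "Depok"),
    ("u3", 23, "Bekasi"), ("u3", 11, "Jakarta Pusat"), ("u3", 12, "Jakarta Pusat"),
]
flows = commuting_flows(tweets)
```

The resulting counts can then be compared with the origin-destination table from the commuter survey.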

Page 25: Big data for official statistics @ Konferensi Big Data Indonesia 2016

Preliminary Results

[Figure panels: Commuter Survey; Twitter]

Page 26: Big data for official statistics @ Konferensi Big Data Indonesia 2016

Using Big Data to Nowcast Food Prices

Page 27: Big data for official statistics @ Konferensi Big Data Indonesia 2016

Using Big Data to Nowcast Food Prices

Collaboration with Pulse Lab UN Jakarta

Aim: to nowcast food prices using multiple sources of data including social media, Google Trends, and crowdsourcing as well as official statistics from BPS, Ministry of Trade and Ministry of Agriculture.

Location: Kota Mataram, NTB (West Nusa Tenggara)

Time: March–July 2015
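Nowcasting here means estimating the current official price from faster indicators before the official figure is published. A minimal Python sketch with a single indicator follows: fit a least-squares line of the official price on the crowdsourced price, then apply it to the latest crowdsourced value. The model, function names, and numbers are assumptions for illustration; the slides do not say which model the project used.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx  # assumes x is not constant
    return mean_y - slope * mean_x, slope


def nowcast(intercept, slope, latest_indicator):
    """Predict the official value from the latest indicator value."""
    return intercept + slope * latest_indicator


# Made-up history: crowdsourced price vs. official price for the same periods.
crowd = [10000, 10200, 10400]
official = [9900, 10100, 10300]
intercept, slope = fit_line(crowd, official)
estimate = nowcast(intercept, slope, 10600)
```

With several sources (social media, Google Trends, crowdsourcing), the same idea extends to a regression with more than one predictor.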

Page 28: Big data for official statistics @ Konferensi Big Data Indonesia 2016

Data Sources

Page 29: Big data for official statistics @ Konferensi Big Data Indonesia 2016

Data Sources, cont’d

Crowdsource

Page 30: Big data for official statistics @ Konferensi Big Data Indonesia 2016

Premise

Page 31: Big data for official statistics @ Konferensi Big Data Indonesia 2016

Premise

Contributor payment methods:

online cash transfers (PayPal),

mobile money,

grocery vouchers,

bitcoin,

gift cards.

Plus other incentives

Page 32: Big data for official statistics @ Konferensi Big Data Indonesia 2016

Premise: Commodities

Page 33: Big data for official statistics @ Konferensi Big Data Indonesia 2016

Premise: Quality Checking and Fraud Detection

Profile Fraud

User creates multiple profiles on multiple phones. This gives the illusion that all observations are from different users with unique user IDs, when in actuality they are duplicates.

Group Collaboration Fraud

Users travel together to the same markets and submit the same items.

Page 34: Big data for official statistics @ Konferensi Big Data Indonesia 2016

Premise: Fraud Detection

Location Fraud

Users attempt to submit observations from one location as sourced from multiple locations by manually changing the store name.

Duplicate Data Fraud

Users attempt to submit the same products multiple times, often by changing the price for each submission.
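Two of the fraud patterns above can be screened with simple grouping rules. The Python sketch below flags possible duplicate data fraud (one user reporting the same product at the same store on the same day more than once) and possible location fraud (one set of coordinates reported under several store names). The record fields and function names are hypothetical, and the rules are illustrative; they are not Premise's actual checks.

```python
from collections import defaultdict


def flag_duplicates(observations):
    """Duplicate data fraud: same user, store, product, and date reported
    more than once (prices may differ between the copies)."""
    seen = defaultdict(list)
    for obs in observations:
        key = (obs["user"], obs["store"], obs["product"], obs["date"])
        seen[key].append(obs["id"])
    return [ids for ids in seen.values() if len(ids) > 1]


def flag_location_mismatch(observations):
    """Location fraud: identical coordinates reported under several store names.
    Real GPS readings would need rounding or a distance threshold."""
    stores_at = defaultdict(set)
    for obs in observations:
        stores_at[(obs["lat"], obs["lon"])].add(obs["store"])
    return {loc: names for loc, names in stores_at.items() if len(names) > 1}


# Made-up submissions for illustration.
observations = [
    {"id": 1, "user": "a", "store": "Pasar A", "product": "rice",
     "price": 10000, "date": "2015-03-03", "lat": -8.58, "lon": 116.10},
    {"id": 2, "user": "a", "store": "Pasar A", "product": "rice",
     "price": 10500, "date": "2015-03-03", "lat": -8.58, "lon": 116.10},
    {"id": 3, "user": "a", "store": "Pasar B", "product": "eggs",
     "price": 20000, "date": "2015-03-03", "lat": -8.58, "lon": 116.10},
    {"id": 4, "user": "b", "store": "Pasar C", "product": "rice",
     "price": 9800, "date": "2015-03-03", "lat": -8.60, "lon": 116.12},
]
duplicates = flag_duplicates(observations)
mismatches = flag_location_mismatch(observations)
```

Flags like these mark records for review rather than proving fraud. Profile and group collaboration fraud would need extra signals such as device identifiers or the timing of submissions.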

Page 35: Big data for official statistics @ Konferensi Big Data Indonesia 2016

Data Sources

Food prices: BPS Consumer Price Index

HK 1.1 and HK 1.2

3 March to 13 July 2015.

Page 36: Big data for official statistics @ Konferensi Big Data Indonesia 2016

Crowdsource: Data Preparation

107,973 records

Check and revise mismatched quantities and sizes.

Standardize prices, since crowdsourced data come in different units and unit sizes.

Data cleaning: remove incomplete records, erroneous records, and records with unacceptable values.

Spline approach
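The standardization and cleaning steps above can be sketched in Python as follows: convert every record to a price per kilogram, drop incomplete records, and drop values far from the median. The unit table, the record fields, and the factor-of-three threshold are assumptions chosen for illustration. The spline step is not shown.

```python
from statistics import median

# Assumed conversion factors to kilograms; real data would need a fuller table.
TO_KG = {"kg": 1.0, "g": 0.001}


def price_per_kg(record):
    """Convert a record's price to a price per kilogram, or None if unusable."""
    factor = TO_KG.get(record.get("unit"))
    price, quantity = record.get("price"), record.get("quantity")
    if factor is None or not price or not quantity:
        return None  # incomplete record or unknown unit
    return price / (quantity * factor)


def clean_prices(records, max_ratio=3.0):
    """Standardize prices and drop values far from the median.

    A value is kept if it lies within [median / max_ratio, median * max_ratio].
    The threshold is an arbitrary choice for illustration.
    """
    prices = [p for p in map(price_per_kg, records) if p is not None and p > 0]
    if not prices:
        return []
    mid = median(prices)
    return [p for p in prices if mid / max_ratio <= p <= mid * max_ratio]


# Made-up records for illustration.
records = [
    {"price": 10000, "quantity": 1, "unit": "kg"},
    {"price": 5500, "quantity": 500, "unit": "g"},     # 11,000 per kg
    {"price": 1, "quantity": 1, "unit": "kg"},         # implausible price
    {"price": 9000, "quantity": None, "unit": "kg"},   # incomplete
    {"price": 12000, "quantity": 1, "unit": "liter"},  # unknown unit
]
cleaned = clean_prices(records)
```

Only the first two records survive: the other three are removed as implausible, incomplete, or in an unknown unit.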

Page 37: Big data for official statistics @ Konferensi Big Data Indonesia 2016

Crowdsource: Challenges

Different unit sizes.

Unknown commodity quality.

Many outliers (large range between maximum and minimum prices).

Contains implausible observations, e.g. a price of Rp 1.

Incomplete data.

The number of observations varies over time.
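Two of these challenges, outliers and an uneven number of observations per day, can be softened by collapsing each day to its median price. A minimal Python sketch follows; the function name and data are illustrative assumptions, not the project's actual procedure.

```python
from collections import defaultdict
from statistics import median


def daily_median(observations):
    """Collapse (date, price) pairs into one median price per date."""
    by_date = defaultdict(list)
    for date, price in observations:
        by_date[date].append(price)
    return {date: median(prices) for date, prices in sorted(by_date.items())}


# Made-up observations: three on the first day (one implausible), one on the second.
observations = [
    ("2015-03-03", 10000),
    ("2015-03-03", 10500),
    ("2015-03-03", 1),
    ("2015-03-04", 11000),
]
series = daily_median(observations)
```

The median ignores the Rp 1 observation on the first day, and each day contributes exactly one value no matter how many records it has.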

Page 38: Big data for official statistics @ Konferensi Big Data Indonesia 2016

Preliminary Results

[Figure: Rice]

Page 39: Big data for official statistics @ Konferensi Big Data Indonesia 2016

Preliminary Results

[Figure panels: Beef; Chicken]

Page 40: Big data for official statistics @ Konferensi Big Data Indonesia 2016

Preliminary Results

[Figure panels: Eggs; Flour]

Page 41: Big data for official statistics @ Konferensi Big Data Indonesia 2016

Preliminary Results

[Figure panels: Tomato; Sweet Potato. Series in each panel: PREMISE, Rata-Rata Pasar (market average), Rata-Rata Swalayan (supermarket average).]

Page 42: Big data for official statistics @ Konferensi Big Data Indonesia 2016

[Figure panels: Long Bean; Green Chili. Series in each panel: PREMISE, Rata-Rata Pasar (market average), Rata-Rata Swalayan (supermarket average).]

Page 43: Big data for official statistics @ Konferensi Big Data Indonesia 2016

Discussion

Commuting behavior from the survey and movement inferred from Twitter show similar patterns.

The crowdsourcing approach and the BPS survey show similar trends for all commodities.

Data cleaning needs to be more robust and automated.

Extend to other commodities to predict inflation.

Can the crowdsourcing approach provide food prices quickly and reliably?
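One way to put a number on "similar trend" is the Pearson correlation between the crowdsourced series and the official series over the same dates. A minimal Python sketch follows. The choice of measure and the data are assumptions for illustration; the slides compare the series visually.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)  # assumes neither series is constant


# Made-up series: the official price sits 200 below the crowdsourced price.
crowd = [10000, 10200, 10400, 10300]
official = [9800, 10000, 10200, 10100]
r = pearson(crowd, official)
```

A correlation near 1 indicates that the two series move together. It says nothing about a constant gap in level, so the levels should be compared separately.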

Page 44: Big data for official statistics @ Konferensi Big Data Indonesia 2016

Summary

Big data complements official statistics

Many approaches still have to be explored and studied

Big data “may be replacing” the conventional approach in some areas

Contributions from stakeholders are needed

Page 45: Big data for official statistics @ Konferensi Big Data Indonesia 2016

Researchers

STIS

Setia Pramana

Ricky Yordani

Budi Yuniarto

Robert Kurniawan

Pulse Lab

Jonggun Lee

Imaddudin

Page 46: Big data for official statistics @ Konferensi Big Data Indonesia 2016