Big Data for Official Statistics @ Konferensi Big Data Indonesia 2016

Post on 22-Jan-2018


Big Data, a Big Challenge for Official Statistics?

Setia Pramana

Pusat Kajian Komputasi Statistik

Sekolah Tinggi Ilmu Statistik

Official Statistics

Statistics published by government agencies or other public bodies such as international organizations as a public good.

One of the Official Statistics Producers in Indonesia is BPS

Badan Pusat Statistik

POSITION

BPS is a non-ministerial government institution

Directly under and responsible to the President of the Republic of Indonesia

Headed by a Chief Statistician

TASK

To carry out governmental duties in the field of statistics in accordance with the prevailing laws and regulations

Types of Data

Type: Basic Statistics

• Used for a broad range of purposes

• Utilized by government and society

• Characteristics: cross-sectoral, national in scale, broadly aggregated

• Undertaken by: BPS-Statistics Indonesia

Type: Sectoral Statistics

• Utilized by particular institutions to fulfill their main tasks

• Undertaken by: government institutions (independently or in collaboration with BPS)

Type: Special Statistics

• Utilized to fulfill the specific needs of business, education, socio-cultural, and other purposes

• Undertaken by: non-government institutions, organizations, individuals, and/or other parts of society

NATIONAL STATISTICAL SYSTEM

[Diagram; labels reconstructed as text]

Statistical Data Request

Statistical Community Forum

Component: Resources, Methods, Infrastructure, Science & Technology, Law

BPS as Clearing House of Statistical Information and Provider of Statistical Information

Type / Undertaker / Methods / Result

• Basic Statistics / BPS / Census, Survey, Compilation of Administrative Product, Others / Data

• Sectoral Statistics / Government Institution / Survey, Compilation of Administrative Product, Others / Data

• Special Statistics / Community / Survey, Compilation of Administrative Product, Others / Data Synopsis

Feedback

Coordination, Integration, Synchronization, Standardization (CISS)

NOTES (numbers refer to the arrows in the diagram):

(1) BPS coordinates statistical undertakings;

(2) Govt institutions submit survey plans; recommendations are provided;

(3) Govt institutions submit the results to BPS;

(4) Private institutions or the community submit a synopsis;

(5) Govt institutions and private institutions/community coordinate and cooperate with BPS.

Data Collection Method

CENSUS

Enumerating all population units in Indonesia

Is conducted to obtain characteristics of the population at a certain period of time

Is held decennially

SURVEY

Enumerating samples of a population

Is conducted to estimate the characteristics of a population at a certain period of time

COMPILATION OF ADMINISTRATIVE RECORDS

Data collection, processing, dissemination, and analysis based on administrative records of government or community

Census Conducted by BPS (1)

• Mandated by Law No. 16/1997 and a UN agenda;

• Held every 10 years, in years ending in 0;

• The 2010 Population Census was the sixth Population Census in Indonesia after 1961, 1971, 1980, 1990, and 2000;

• The 2010 Population Census was held to collect basic data on housing and population, demographic parameters, data for MDGs evaluation, and program targeting.

Population Census

http://sp2010.bps.go.id

Census Conducted by BPS (2)

• Used as benchmark data for the agricultural sector;

• Conducted every 10 years, in years ending in 3;

• The 2013 Agricultural Census was the sixth agricultural census in Indonesia after 1963, 1973, 1983, 1993 and 2003;

• Agricultural characteristics: farmer households, number of livestock, land tenure, etc.

Agricultural Census

http://st2013.bps.go.id

Census Conducted by BPS (3)

• Conducted every 10 years, in years ending in 6;

• The upcoming 2016 Economic Census is going to be the fourth economic census in Indonesia after 1986, 1996, and 2006;

• Enterprise characteristics: number of enterprises, labor force, etc.

Economic Census

Surveys Conducted by BPS

Surveys conducted by BPS include:

Susenas (National Economic and Social Survey),

Sakernas (National Labor Force Survey),

Price Survey (Consumer, Rural Consumer, Wholesale)

Business and Consumer Tendency Survey

Industrial Survey

Indonesia Demographic and Health Survey

Etc.

Data Obtained from Compilation of Administrative Records

• Human Development Index,

• agricultural indicators,

• export and import,

• transportation,

• flow of funds accounts,

• gross domestic product

• tourism


Official Statistics News (Berita Resmi Statistik / BRS)

Monthly

• Inflation/Consumer Price Index

• Export

• Import

• Trade Balance

• Tourism

• Transportation

• Farmer Terms of Trade

• Grain Producer Price

• Wage

• Wholesale Price Index

Quarterly (Feb, May, Aug, Nov)

• GDP/Economic Growth

• Business Tendency Index

• Consumer Tendency Index

• Manufacturing Industry: Large and Medium Scale; Micro and Small Scale

• Producer Price Index (Oct’13)

Four-monthly, Semesterly, Annually

• Poverty (January and July)

• Employment (May and November)

• Forecast: Production of Paddy, Maize, and Soybeans, Preliminary Figure Year n-1 (March)

• Forecast: Production of Paddy, Maize, and Soybeans, Forecast I Year n (July)

• Forecast: Production of Paddy, Maize, and Soybeans, Forecast II Year n (November)

Quality of Statistics

Accuracy

Relevance

Timeliness

Accessibility

Coherence

Interpretability

Will Big Data Replace Official Statistics?

Official Statistics vs. Big Data


Dr. Jose Ramon G. Albert, NSCB, Philippines

Complementary Roles for Official Statistics and Big Data

• Provide variables to help BPS stratify better for sample surveys

• Improve sample survey estimates

• Help to compensate for nonresponse

• Help to check BPS estimates

• Help to improve the frequency and timeliness of data releases

• Help to improve and provide more small-area estimates


Cavan Capps and Tommy Wright, U.S. Census Bureau

Sources of Big Data

• Administrative data that arise from the administration of a program, be it governmental or not (e.g., electronic medical records, hospital visits, insurance records, bank records, and food banks)

• Commercial or transactional digital data that arise from the transaction between two entities (e.g., credit card transactions, online transactions including from mobile devices)

• Sensor data (e.g., satellite imaging, road sensors, and climate sensors)

• GPS tracking devices (e.g., tracking data from mobile telephones)

• Behavioral data (e.g., online searches about a product, service, or any other type of information and online page views)

• Opinion data (e.g., comments on social media)

What has been done?

Connecting the Data

Google Dengue and Influenza Trends

Global Pulse: Price of rice trend based on Twitter

What has been done?

Proof of Concept Projects

Big Data for Predicting Commuting Patterns

Using Big Data to Nowcast Food Prices

Big Data for Predicting Commuting Patterns


Collaboration with Pulse Lab UN Jakarta

A big data project using multiple sources of data, e.g., social media and statistical data.

Aim: to better understand inter-city commuting patterns using social media.

Offers a less expensive and easier way to collect information.

Data Sources

Jabodetabek Commuter Survey 2014

Sample: 13,120 households from 13 kabupaten/kota (regencies/cities)

Twitter (February 2014)

7 million tweets

Based on geotags
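
The slides do not describe how commuting flows were derived from the geotagged tweets. As a minimal sketch only, one common heuristic is to treat the city where a user tweets most often at night as "home" and the city where they tweet most often during working hours as the "activity" location, then count users whose two cities differ. The function names, the input tuple layout, and the hour windows below are assumptions for illustration, not the project's actual method.

```python
from collections import Counter, defaultdict


def infer_commutes(tweets):
    """Count home-to-activity city flows from geotagged tweets.

    tweets: iterable of (user_id, hour_of_day, city) tuples, where the
    geotag has already been mapped to a city (hypothetical input layout).

    Assumed heuristic: home = most frequent city at night (21:00-05:59),
    activity = most frequent city in working hours (09:00-16:59).
    """
    night = defaultdict(Counter)
    day = defaultdict(Counter)
    for user, hour, city in tweets:
        if hour >= 21 or hour < 6:
            night[user][city] += 1
        elif 9 <= hour < 17:
            day[user][city] += 1

    flows = Counter()
    # Only users observed in both time windows can be classified.
    for user in night.keys() & day.keys():
        home = night[user].most_common(1)[0][0]
        activity = day[user].most_common(1)[0][0]
        if home != activity:
            flows[(home, activity)] += 1
    return flows


def flow_shares(flows):
    """Convert flow counts to shares so they can be compared with survey shares."""
    total = sum(flows.values())
    return {pair: n / total for pair, n in flows.items()} if total else {}
```

Shares computed this way could then be set side by side with the origin-destination shares from the Jabodetabek Commuter Survey. Twitter users are not a random sample of commuters, so any such comparison is indicative at best.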

Preliminary Results

[Figure panels: Commuter Survey; Twitter]

Using Big Data to Nowcast Food Prices


Collaboration with Pulse Lab UN Jakarta

Aim: to nowcast food prices using multiple sources of data including social media, Google Trends, and crowdsourcing as well as official statistics from BPS, Ministry of Trade and Ministry of Agriculture.

Location: Kota Mataram, NTB (West Nusa Tenggara)

Time: March to July 2015

Data Sources

Data Sources, cont’d

Crowdsource

Premise


Contributor payment methods:

online cash transfers (PayPal),

mobile money,

grocery vouchers,

bitcoin,

gift cards.

Plus other incentives

Premise: Commodities

Premise: Quality checking and Fraud detection

Profile Fraud

A user creates multiple profiles on multiple phones. This gives the illusion that all observations are from different users with unique user IDs, when in actuality they are duplicates.

Group Collaboration Fraud

Users travel together to the same markets and submit the same items.

Premise: Fraud detection

Location Fraud

Users attempt to submit observations from one location as sourced from multiple locations by manually changing the store name.

Duplicate Data Fraud

Users attempt to submit the same products multiple times, often by changing the price for each submission.
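
The slides name the fraud types but not the checks used to catch them. The sketch below shows how two of them (duplicate data fraud and group collaboration fraud) might be flagged for manual review. The record keys (`user_id`, `store`, `product`, `date`, `price`) and both function names are hypothetical; this is not Premise's actual detection logic.

```python
from collections import defaultdict


def flag_duplicate_submissions(observations):
    """Flag possible duplicate data fraud.

    Returns lists of record indices where one user reported the same
    product at the same store on the same day more than once
    (the price may differ between the repeats).
    """
    groups = defaultdict(list)
    for i, obs in enumerate(observations):
        key = (obs["user_id"], obs["store"], obs["product"], obs["date"])
        groups[key].append(i)
    return [idxs for idxs in groups.values() if len(idxs) > 1]


def flag_group_collaboration(observations, min_users=2):
    """Flag possible group collaboration fraud.

    Returns (store, product, date) keys reported by at least `min_users`
    distinct users, mapped to the sorted user IDs involved.
    """
    users = defaultdict(set)
    for obs in observations:
        users[(obs["store"], obs["product"], obs["date"])].add(obs["user_id"])
    return {key: sorted(ids) for key, ids in users.items() if len(ids) >= min_users}
```

Both functions only produce candidates: independent contributors can legitimately visit the same market on the same day, so flagged groups would still need a reviewer or further evidence (for example, matching timestamps or GPS traces).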

Data Sources

Food prices: Consumer Price Index (BPS)

HK 1.1 and HK 1.2

3 March until 13 July 2015.

Crowdsource: Data Preparation

107,973 records

Check and revise mismatched quantities and sizes.

Standardize prices, since crowdsourced data come in different units and unit sizes.

Data cleaning: remove incomplete records, erroneous records, and records with unacceptable values.

Spline approach
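
The preparation steps above can be sketched as follows: convert each observation to a price per kilogram, then drop incomplete or implausible records. The conversion table, the record keys, and the acceptance bounds are illustrative assumptions, not the values used in the project. The spline step is not covered here.

```python
# Hypothetical conversion table: unit name -> kilograms per unit.
UNIT_TO_KG = {"kg": 1.0, "g": 0.001, "ons": 0.1}


def price_per_kg(price, quantity, unit):
    """Return the price in rupiah per kilogram, or None if the record is unusable."""
    if price is None or quantity is None or unit not in UNIT_TO_KG:
        return None  # incomplete record or unknown unit
    if price <= 0 or quantity <= 0:
        return None  # erroneous record
    return price / (quantity * UNIT_TO_KG[unit])


def clean_records(records, min_price=100, max_price=1_000_000):
    """Keep records whose standardized price lies within assumed bounds.

    records: list of dicts with hypothetical keys price, quantity, unit.
    The bounds are placeholders; in practice they would be set per commodity.
    """
    cleaned = []
    for rec in records:
        value = price_per_kg(rec.get("price"), rec.get("quantity"), rec.get("unit"))
        if value is None or not (min_price <= value <= max_price):
            continue
        cleaned.append({**rec, "price_per_kg": value})
    return cleaned
```

With these assumed bounds, a record such as a price of Rp 1 for one kilogram is discarded, while a Rp 5,000 observation for 500 g is standardized to Rp 10,000 per kilogram.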

Crowdsource: Challenges

Different unit sizes

Unknown commodity quality.

Many outliers (large range between maximum and minimum prices).

Contains implausible observations, e.g., a price of Rp 1

Incomplete data

The number of observations varies over time.
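
One way to cope with outliers and with a varying number of observations per day is a robust per-day filter. The sketch below drops prices that lie more than `k` scaled median absolute deviations (MAD) from the day's median and then averages the rest. The function name, the threshold `k`, and the input layout are assumptions; the slides do not state which outlier rule was used.

```python
from collections import defaultdict
from statistics import median


def daily_robust_price(observations, k=3.0):
    """Return {date: (mean_price, n_kept)} after a per-day MAD filter.

    observations: iterable of (date, price_per_kg) pairs.
    A price is kept if it lies within k * 1.4826 * MAD of the day's median;
    1.4826 rescales the MAD to be comparable to a standard deviation
    under normality.
    """
    by_day = defaultdict(list)
    for day, price in observations:
        by_day[day].append(price)

    result = {}
    for day, prices in sorted(by_day.items()):
        med = median(prices)
        mad = median([abs(p - med) for p in prices])
        if mad > 0:
            kept = [p for p in prices if abs(p - med) <= k * 1.4826 * mad]
        else:
            kept = prices  # too little spread to judge; keep everything
        result[day] = (sum(kept) / len(kept), len(kept))
    return result
```

Returning the number of retained observations alongside the mean makes it visible which days rest on very few reports.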

Preliminary Results

[Figures: price trends for Rice, Beef, Chicken, Eggs, Flour, Tomato, Sweet Potato, Long Bean, and Green Chili. The Tomato, Sweet Potato, Long Bean, and Green Chili panels compare three series: PREMISE, traditional market average (Rata-Rata Pasar), and supermarket average (Rata-Rata Swalayan).]

Discussion

Similar patterns for commuting behavior and Twitter movement

Similar trends between the crowdsourcing approach and the BPS survey for all commodities.

Data cleaning needs to be more robust and automated.

Extend to other commodities to predict inflation.

Can the crowdsourcing approach provide food prices quickly and reliably?
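
The "similar trend" observation could be quantified by aligning the crowdsourced and official price series on common dates and computing their Pearson correlation. This is a minimal sketch under that assumption; the function names and the dict-of-dates input are hypothetical, and the slides do not report which similarity measure, if any, was used.

```python
from math import sqrt


def align(series_a, series_b):
    """Return the values of two {date: price} series on their common dates."""
    dates = sorted(series_a.keys() & series_b.keys())
    return [series_a[d] for d in dates], [series_b[d] for d in dates]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / sqrt(sxx * syy)
```

A high correlation only shows that the two series move together; a constant gap between crowdsourced and official prices would still need to be checked separately, for example with the mean difference.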

Summary

Big data complements official statistics

Many approaches still have to be explored and studied

Big data “may be replacing” the conventional approach in some parts

Contributions from stakeholders are needed

Researchers

STIS

Setia Pramana

Ricky Yordani

Budi Yuniarto

Robert Kurniawan

Pulse Lab

Jonggun Lee

Imaddudin
