Tabulations in SAS with Time Series – A Perspective

Presenters

Karine Désilets, Statistics Canada, Ottawa, Canada.

Jun Li, Statistics Canada, Ottawa, Canada.

Abstract

Presenting efficient ways to calculate growth rates and to include Seasonal Data, Raking and Benchmarking in Tabulation Tools. The pros and cons of tabulating with Proc Means, Proc Summary, Proc Tabulate and Proc Report will also be compared.

Telling Canada’s story in numbers

Karine Désilets

Jun Li

System Engineering Division

Statistics Canada

May, 2018

www.statcan.gc.ca

Today’s Topics

40 minutes - with the Presentation of :

Our Generalized Tabulation Tools

Some Procs for Tabulation

Growth Rate Calculations

Seasonal Adjustment with X-12-ARIMA, 2 Days Class, H-0434

Theory and Application of Benchmarking, 2 Days Class, H-0436

Theory and Application of Raking for Time Series, 2 Days Class, H-0437

Agenda

Introduction to Time Series and Tabulation Tools

Econometrics with SAS/ETS

Background and Strategic Fit of Unadjusted/Seasonally Adjusted Data

Economic World: X12-ARIMA, Raking and Benchmarking

Pros / Cons :

Means, Summary, Tabulate, Report Procs

New Generation: Threaded Means Proc with Viya and CAS

Growth Rate / Percentage Change

Conclusion

What is a Time Series ? (Industry)

Source: twitter,visme

What is a Time Series ? (Key Indicators)

Source: Statistics Canada Official Web Site

What is a Time Series ? (Cansim)

Source: Statistics Canada Official Web Site - Cansim

Annual

Quarterly

Monthly

What is a Tabulation Tool ?

TABULATION

XML Input File

Metadata

Definitions

Injection File

Categorical Data

Dimensions

Weighted / Unweighted

Statistics to Compute

Confidentiality / Rounding

Input Microdata

ID  SEX  AGE GROUP  WEIGHT  INCOME
1   0    1          5       0
2   0    2          10      900
3   0    3          15      -5
4   0    4          1       3
5   1    1          5       6
6   1    2          5       10
7   1    3          12      4
8   1    4          1       2

Tabulated Data ("." means all categories of that dimension)

SEX  AGE GROUP  SUM(WEIGHT)  Weighted SUM(INCOME)
.    .          54           9058
0    .          31           8928
1    .          23           130
.    1          10           30
.    2          15           9050
.    3          27           -27
.    4          2            5
0    1          5            0
0    2          10           9000
0    3          15           -75
0    4          1            3
1    1          5            30
1    2          5            50
1    3          12           48
1    4          1            2
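As an illustration of how the Tabulated Data rows follow from the Input Microdata, here is a minimal Python sketch (not the SAS tool presented in these slides). It assumes that each record contributes its weight and its weight times income to every combination of the two dimensions; the names `MICRODATA`, `DIMENSIONS` and `tabulate` are hypothetical, and `None` stands in for the "." marker.

```python
from itertools import combinations

# Input microdata from the slide: (ID, SEX, AGE GROUP, WEIGHT, INCOME)
MICRODATA = [
    (1, 0, 1, 5, 0),
    (2, 0, 2, 10, 900),
    (3, 0, 3, 15, -5),
    (4, 0, 4, 1, 3),
    (5, 1, 1, 5, 6),
    (6, 1, 2, 5, 10),
    (7, 1, 3, 12, 4),
    (8, 1, 4, 1, 2),
]

# Hypothetical names for the two tabulation dimensions.
DIMENSIONS = ("sex", "age_group")


def tabulate(rows):
    """Return {(sex, age_group): (sum of weight, weighted sum of income)}.

    None plays the role of the "." (all categories) marker.
    """
    cells = {}
    for _id, sex, age_group, weight, income in rows:
        values = {"sex": sex, "age_group": age_group}
        # Each record contributes to every subset of the dimensions.
        for size in range(len(DIMENSIONS) + 1):
            for kept in combinations(DIMENSIONS, size):
                key = tuple(values[d] if d in kept else None for d in DIMENSIONS)
                sum_w, sum_wi = cells.get(key, (0, 0))
                cells[key] = (sum_w + weight, sum_wi + weight * income)
    return cells


table = tabulate(MICRODATA)
for key, (sum_weight, weighted_income) in table.items():
    print(key, sum_weight, weighted_income)
```

The grand-total cell `(None, None)` reproduces the first row of the Tabulated Data (54 and 9058), and the remaining cells match the other rows.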

Injection File

From continuous microdata there is a need to create categories.

Example:

Age         Description                 Age Group
0 – 17      Persons under 18 years      1
18 – 64     Persons 18 to 64 years      2
65 and up   Persons 65 years and over   3
.           Not Responded               4
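The age-to-group mapping above can be sketched as a small recoding function. This is only an illustration in Python, assuming a missing age is represented by `None`; the function name `age_group` is hypothetical and not part of the tool.

```python
def age_group(age):
    """Map a continuous age to the Age Group code of the example table."""
    if age is None:   # "." in the table: not responded
        return 4
    if age <= 17:     # persons under 18 years
        return 1
    if age <= 64:     # persons 18 to 64 years
        return 2
    return 3          # persons 65 years and over


for age in (5, 17, 18, 64, 65, None):
    print(age, "->", age_group(age))
```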

Tabulation Tool: Current State

A Generalized Tabulation Tool has been developed for Social Survey data, Administrative Data and, soon, Census Data. It can:

Create Tabulated Data Tables;

Calculate Precision Measures;

Apply confidentiality rules and/or rounding consistently across data sources;

Disseminate output or custom products for internal and/or external clients;

Dynamically produce tabulations for a specific period or for time series.

Unadjusted Data at :

Annual level

Infra-annual level (quarters or months)

Including socio-economic and economic fields would create the need to produce Seasonally Adjusted Data and then tabulate them.

Statistics Calculations Currently Available

Level – 1 Statistics

All statistics available in Proc Means.

Examples:

Sum, median, percentile, max,

min, weighted sum, count

Level – 2 Statistics

Gini

Geomean

Level – 3 Statistics

Ratio

Share

Distribution

Level – 4 Statistics

Moving Average

Level – 5 Statistics

Level Change

Percentage Change

Significance Test

What is a Tabulation Tool for Time Series?

Input Categorical Data

Year  ID  SEX  AGE GROUP  WEIGHT  INCOME
2001  1   0    1          5       0
2001  2   0    2          10      900
2001  3   0    3          15      -5
2001  4   0    4          1       3
2001  5   1    1          5       6
2001  6   1    2          5       10
2001  7   1    3          12      4
2001  8   1    4          1       2
2002  1   0    1          5       0
2002  2   0    2          10      900
2002  3   0    3          15      -5
2002  4   0    4          1       3
2002  5   1    1          5       6
2002  6   1    2          5       10
2002  7   1    3          12      4
2002  8   1    4          1       2

Year  Month  ID  SEX  AGE GROUP  WEIGHT  INCOME
2001  1      1   0    1          5       0
2001  2      2   0    2          10      516
2001  3      3   0    3          15      -4
2002  1      1   0    4          1       2
2002  2      2   1    1          5       6
2002  3      3   1    2          5       9
2003  1      1   1    3          12      0
2003  2      2   1    4          1       1
2003  3      3   0    1          5       0
2004  1      1   0    2          10      323
2004  2      2   0    3          15      -4
2004  3      3   0    4          1       1
2005  1      1   1    1          5       4
2005  2      2   1    2          5       9
2005  3      3   1    3          12      4

Quarter ?
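One possible answer to the "Quarter ?" question is to derive the quarter from the month and use it as an extra time dimension. The Python sketch below is an illustration only: it assumes calendar quarters (January to March is Q1), uses the first six rows of the monthly table, and the names `quarter` and `tabulate_by_quarter` are hypothetical.

```python
def quarter(month):
    """Return the calendar quarter (1 to 4) of a month number (1 to 12)."""
    if not 1 <= month <= 12:
        raise ValueError(f"invalid month: {month}")
    return (month - 1) // 3 + 1


# First rows of the monthly table:
# (Year, Month, ID, SEX, AGE GROUP, WEIGHT, INCOME)
MONTHLY = [
    (2001, 1, 1, 0, 1, 5, 0),
    (2001, 2, 2, 0, 2, 10, 516),
    (2001, 3, 3, 0, 3, 15, -4),
    (2002, 1, 1, 0, 4, 1, 2),
    (2002, 2, 2, 1, 1, 5, 6),
    (2002, 3, 3, 1, 2, 5, 9),
]


def tabulate_by_quarter(rows):
    """Return {(year, quarter): (sum of weight, weighted sum of income)}."""
    totals = {}
    for year, month, _id, _sex, _age_group, weight, income in rows:
        key = (year, quarter(month))
        sum_w, sum_wi = totals.get(key, (0, 0))
        totals[key] = (sum_w + weight, sum_wi + weight * income)
    return totals


quarterly = tabulate_by_quarter(MONTHLY)
for key in sorted(quarterly):
    print(key, quarterly[key])
```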

Econometrics Calculations with SAS/ETS

SAS/ETS software, a component of the SAS System, provides SAS

procedures for:

econometric analysis

time series analysis

time series forecasting

systems modeling and simulation

discrete choice analysis

analysis of qualitative and limited dependent variable models

seasonal adjustment of time series data

financial analysis and reporting

access to economic and financial databases

time series data management

Looking for Time Series related calculations? The answer is certainly in the documentation.

Economic Calculations with SAS/EG

Background and Strategic Fit

Time series are part of everyday life and drive the economy.

Tabulation tools already exist and can support Unadjusted Time Series :

• Annual level

• Infra annual level (Quarters or Months)

Next Steps:

• Unadjusted Data

• Seasonally Adjusted Data (PROC X12-ARIMA)

• Raking

• Benchmarking

• Incorporation of Growth Rates (Percentage Change) as a Statistic

• Which PROC to use to Tabulate ?

Could we tabulate time series microdata using basic time series concepts?

Briefly: What is Seasonally Adjusted Data?

Unadjusted Series:

Trend-Cycle

Seasonal Component

Trading-Day/Easter Effects

Irregular

Seasonally Adjusted Series :

Combination of trend-cycle and irregular components
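One common way to write the decomposition behind these two lists is shown here as a sketch, not as the presenters' own notation. The symbols are assumptions introduced for illustration: O_t is the unadjusted series, C_t the trend-cycle, S_t the seasonal component, D_t the trading-day/Easter effects, I_t the irregular and A_t the seasonally adjusted series.

```latex
\begin{aligned}
\text{Additive:} \quad & O_t = C_t + S_t + D_t + I_t, & A_t &= C_t + I_t = O_t - S_t - D_t \\
\text{Multiplicative:} \quad & O_t = C_t \times S_t \times D_t \times I_t, & A_t &= C_t \times I_t = \frac{O_t}{S_t \times D_t}
\end{aligned}
```

In both forms the seasonally adjusted series keeps only the trend-cycle and the irregular, which is the combination described above.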

Estela Dagum from Statistics Canada created the X11-ARIMA method in the 1970s.

[Figure: GDP at market prices, quarterly, Q1 2007 to Q1 2017; vertical axis from 350,000 to 550,000.]

Source of graph : Statistics Canada - CSMA foundations – module 8 extension – with the approval of Jim Tebrake.

How to Produce Seasonally Adjusted Data?

With PROC X-12-ARIMA:

The U.S. Census Bureau method is called X-13ARIMA-SEATS*.

The main goal is to apply moving averages to the calendar-adjusted time series to smooth out the seasonal fluctuations.*

Our objective today is to have a system that can handle X12-ARIMA with Tabulations.

Options:

1. Seasonally adjust the raw microdata directly and tabulate (Direct Bottom-up Approach) – multiplicative model with zero values

2. Seasonally adjust the lowest-level cuboid of the Tabulated Data and aggregate to higher levels (Semi Bottom-up Approach)

3. Seasonally adjust each cuboid of the Tabulated Data (Loss of Additivity)

* Extract from: Canadian System of Macroeconomic Accounts (CSMA) – Module 8

SEATS = Signal Extraction in ARIMA Time Series

How to Produce Seasonally Adjusted Data?

With PROC X12:

Source: https://support.sas.com/documentation/onlinedoc/ets/132/x12.pdf

Proc X12 Results and Database

Components to store in the Dataset :

Date

Dimensions (ex: Province, Sex, Age Group)

Weight / Unweighted

Raw Statistics

Seasonally Adjusted Statistics

Trend-Cycle

Seasonal Component

Trading-Day/Easter Effects

Irregular

Tabulations and Seasonally Adjusted Data

Computational Lattice of Cuboids

(Sex, Age Group, Province)

(Sex, Age Group)   (Sex, Province)   (Age Group, Province)

(Sex)   (Age Group)   (Province)

()

Tabulated Data

Date    Sex  Age group  Province  Types  SUM(Sales)  SEAS SUM(Sales)
JAN-18  .    .          .         000    9058
JAN-18  0    .          .         100    8928        …
JAN-18  1    .          .         100    130         …
JAN-18  .    1          .         010    30          …
JAN-18  .    2          .         010    9050        …
JAN-18  .    3          .         010    -27         …
JAN-18  .    4          .         010    5           …
JAN-18  .    .          Quebec    001    0           …
JAN-18  .    .          Quebec    001    9000        …
JAN-18  .    .          Quebec    001    -75         …
JAN-18  .    .          Quebec    001    3           …
JAN-18  .    .          Quebec    001    30          …
JAN-18  .    .          Quebec    001    50          …
JAN-18  .    .          Quebec    001    48          …
…       …    …          …         …      …

The SEAS SUM(Sales) column is computed from option 1, 2 or 3.

What do we do in the case of a hierarchy?

Country, Region, Province, City

O(n) where n is the number of dimensions
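The lattice of cuboids for the three flat dimensions on this slide can be enumerated as in the Python sketch below. It is an illustration only. It assumes that the Types column is a bit mask with one digit per dimension (1 when the dimension is kept, 0 when it is aggregated), which is an interpretation of the codes 000, 100, 010 and 001 in the table; the name `lattice` is hypothetical.

```python
from itertools import combinations

DIMENSIONS = ("Sex", "Age Group", "Province")


def lattice(dimensions):
    """Return a list of (types_code, cuboid) for every subset of dimensions."""
    result = []
    for size in range(len(dimensions) + 1):
        for kept in combinations(dimensions, size):
            # Assumed meaning of "Types": "1" = dimension kept, "0" = aggregated.
            code = "".join("1" if d in kept else "0" for d in dimensions)
            result.append((code, kept))
    return result


cuboids = lattice(DIMENSIONS)
for code, cuboid in cuboids:
    print(code, cuboid)
```

With three flat dimensions this gives the 2^3 = 8 cuboids listed in the diagram, from the apex `()` to the base cuboid `(Sex, Age Group, Province)`.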

What is Raking? When to Apply Raking?

Also known as Reconciliation, Balancing, Spreading or Dispersing.

Simplest Form

Province               AppleSold
Prince Edward Island   1,850
Nova Scotia            548
New Brunswick          761
Newfoundland           4,091
Quebec                 8,871
Ontario                34,333
Manitoba               5,866
Saskatchewan           13,632
Alberta                13,096
British Columbia       14,624
Nunavut                21
NWT                    292
Yukon                  4
CANADA TOTAL           97,990

New Total: 103,500

Formula :

$$\hat{c}_i = c_i \times \frac{\text{Bench}}{\sum_j c_j}$$

where c_i is the original value for province i and Bench is the new total.

Results of One-Dimensional Raking

Two types of constraint: Binding Total vs Non-binding
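The simplest form of raking above, a prorate of each province to the new total, can be sketched in Python as follows. This is an illustration and not one of the tools named in these slides; the names `APPLES_SOLD` and `prorate` are hypothetical. The listed provincial values add up to 97,989, one less than the printed total, so the sketch scales by the sum of the listed values.

```python
# Provincial values from the slide.
APPLES_SOLD = {
    "Prince Edward Island": 1850,
    "Nova Scotia": 548,
    "New Brunswick": 761,
    "Newfoundland": 4091,
    "Quebec": 8871,
    "Ontario": 34333,
    "Manitoba": 5866,
    "Saskatchewan": 13632,
    "Alberta": 13096,
    "British Columbia": 14624,
    "Nunavut": 21,
    "NWT": 292,
    "Yukon": 4,
}


def prorate(values, benchmark):
    """Scale every value by benchmark / sum(values) so they add up to benchmark."""
    total = sum(values.values())
    if total == 0:
        raise ValueError("cannot prorate a series whose total is zero")
    factor = benchmark / total
    return {name: value * factor for name, value in values.items()}


raked = prorate(APPLES_SOLD, 103_500)
for name, value in raked.items():
    print(f"{name:22s} {value:12.2f}")
print(f"{'CANADA TOTAL':22s} {sum(raked.values()):12.2f}")
```

Every province is multiplied by the same factor, so the provincial shares are unchanged and only the level moves to the new total.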

Quick Example : One-dimensional Raking

Initial table

              East   Center   West   Sum   Control total (Canada)
Q1            12     14       13     39    40
Q2            10     9        15     34    25
Q3            12     8        17     37    40
Q4            9      9        14     32    37
Annual Total  43     40       59

Result

              East    Center   West    Control total (Canada)
Q1            11.34   14.82    13.84   40
Q2            10      5.59     9.41    25
Q3            12.02   8.92     19.06   40
Q4            9.64    10.67    16.69   37
Annual Total  43      40       59

Source: Statistics Canada - Proc Ts-Raking Course Notes: An in-house SAS procedure for Balancing Time Series

Quick Example : Two-dimensional Raking

Initial table

               East   Center   West   Sum   Control total (Canada)
Cars           12     14       13     39    40
Vans           20     20       24     64    53
Sum            32     34       37
Control total  30     31       32

Result

               East    Center   West    Control total (Canada)
Cars           12.72   14.38    12.9    40
Vans           17.28   16.62    19.1    53
Control total  30      31       32

Results need to be rounded.

Source: Statistics Canada - Proc Ts-Raking Course Notes: An in-house SAS procedure for Balancing Time Series

How to do Raking?

1. Manual adjustments (based on subject-matter expertise)

2. Iterative Proportional Fitting, a.k.a. RAS (the basic algorithm), from the 1960s-70s

3. PROC OPTMODEL – SAS/OR with equations and linear constraints

4. PROC TS-RAKING – Statistics Canada Generalized System

5. Macro GSeriesTSBalancing – Statistics Canada Generalized System

6. Prorate in FAME (Forecasting Analysis and Modeling Environment) from SunGard
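To make option 2 concrete, here is a minimal Python sketch of Iterative Proportional Fitting applied to the Cars/Vans table of the two-dimensional example. It is an illustration under stated assumptions: the function name `ipf`, the tolerance and the iteration cap are hypothetical choices. Its result need not match the Result table shown earlier, which the source attributes to Proc TS-Raking, a different method.

```python
def ipf(table, row_targets, col_targets, tol=1e-9, max_iter=1000):
    """Alternately scale rows and columns until both sets of totals are met."""
    cells = [[float(v) for v in row] for row in table]
    for _ in range(max_iter):
        # Step 1: scale each row to its control total.
        for i, target in enumerate(row_targets):
            row_sum = sum(cells[i])
            cells[i] = [v * target / row_sum for v in cells[i]]
        # Step 2: scale each column to its control total.
        for j, target in enumerate(col_targets):
            col_sum = sum(row[j] for row in cells)
            for row in cells:
                row[j] *= target / col_sum
        # Columns now match exactly; stop once the rows match as well.
        worst = max(abs(sum(cells[i]) - t) for i, t in enumerate(row_targets))
        if worst < tol:
            break
    return cells


initial = [
    [12, 14, 13],  # Cars: East, Center, West
    [20, 20, 24],  # Vans: East, Center, West
]
raked_table = ipf(initial, row_targets=[40, 53], col_targets=[30, 31, 32])
for label, row in zip(("Cars", "Vans"), raked_table):
    print(label, [round(v, 2) for v in row])
```

The row and column control totals must add up to the same grand total (here 93 on both sides) for the iteration to settle.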

How to use Raking in Tabulation ?

Tabulated Data

Date    Sex  Age group  Province  Types  SUM(Sales)  NEW Sum(Sales)
JAN-18  .    .          .         000    9058
JAN-18  0    .          .         100    8928        8500
JAN-18  1    .          .         100    130         140
JAN-18  .    1          .         010    30          30
JAN-18  .    2          .         010    9050        8200
JAN-18  .    3          .         010    -27         400
JAN-18  .    4          .         010    5           10
JAN-18  .    .          Quebec    001    0           …
JAN-18  .    .          Quebec    001    9000        …
JAN-18  .    .          Quebec    001    -75         …
JAN-18  .    .          Quebec    001    3           …
JAN-18  .    .          Quebec    001    30          …
JAN-18  .    .          Quebec    001    50          …
JAN-18  .    .          Quebec    001    48          …
…       …    …          …         …      …

Full Lattice of Cuboids (in the diagram, one cuboid is labeled AddUp and the other seven are labeled Raked)

(Sex, AgeG, Prov)

(Sex, AgeG)   (Sex, Prov)   (AgeG, Prov)

(Sex)   (AgeG)   (Prov)

()

What is Benchmarking ?

Benchmarking often occurs when a new benchmark series is provided at the annual level and the sub-annual estimates need to be adjusted accordingly.

Prorating (one-dimensional raking) is not always suitable for a time series, because it can cause:

A “level-jump” (downward or upward) at the beginning of each year.

An opposite jump at the end of the year.

Benchmarking that avoids these jumps is also known as the Denton-Cholette Quadratic Minimization Method.
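The level jump caused by prorating can be seen in a small Python sketch. The data are made up for illustration: a flat quarterly series of 100 over two years, with hypothetical annual benchmarks of 400 and 440. The names `prorate_by_year` and `period_changes` are also hypothetical. The sketch shows only the problem; it does not implement the Denton-Cholette method.

```python
def prorate_by_year(quarterly, annual_benchmarks):
    """Scale each block of four quarters so it adds up to that year's benchmark."""
    if len(quarterly) != 4 * len(annual_benchmarks):
        raise ValueError("expected four quarters per annual benchmark")
    adjusted = []
    for year, benchmark in enumerate(annual_benchmarks):
        block = quarterly[4 * year: 4 * year + 4]
        factor = benchmark / sum(block)
        adjusted.extend(value * factor for value in block)
    return adjusted


def period_changes(series):
    """Return the change from each period to the next."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


# Made-up example: a flat quarterly series and two annual benchmarks.
original = [100.0] * 8
benchmarks = [400.0, 440.0]

adjusted = prorate_by_year(original, benchmarks)
print("adjusted:", [round(v, 2) for v in adjusted])
print("changes: ", [round(c, 2) for c in period_changes(adjusted)])
```

The original series has no quarter-to-quarter movement, yet the prorated series moves by about 10 between the fourth quarter of the first year and the first quarter of the second year. That step is the artifact a benchmarking method is meant to smooth out.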

Step-adjustment with the Denton-Cholette Method

[Figure: old and new annual series (Old ann, New ann) and old and new quarterly series (Old qtr, New qtr), 2010q1 to 2014q1; vertical axis from 1000 to 1600.]

Source of graph : Statistics Canada - CSMA foundations – module 8 extension – with the approval of Jim Tebrake

What to use to apply a Benchmark?

PROC Benchmarking – Statistics Canada Generalized System

Quadminz in FAME (Forecasting Analysis and Modeling Environment) from SunGard

How to use Benchmarking in Tabulation ?

Original Tabulated Data

Date   Industry  Province  Types  RAW SUM(Sales)  SEAS Sum(Sales)
Q1-14  1001      Quebec    11     9058
Q2-14  1001      Quebec    11     8928            8500
Q3-14  1001      Quebec    11     130             140
Q4-14  1001      Quebec    11     30              30
Q1-15  1001      Quebec    11     9050            8200
Q2-15  1001      Quebec    11     -27             400
Q3-15  1001      Quebec    11     5               10
Q4-15  1001      Quebec    11     0               …
Q1-16  1001      Quebec    11     9000            …
Q2-16  1001      Quebec    11     -75             …
Q3-16  1001      Quebec    11     3               …
Q4-16  1001      Quebec    11     30              …
Q1-17  1001      Quebec    11     50              …
Q2-17  1001      Quebec    11     48              …
Q3-17  …         …         …      …               …

New Provincial Benchmark

Date  Industry  Province  Types  New Annual
14    1001      Quebec    111    9058
15    1001      Quebec    111    8928
16    1001      Quebec    111    130
17    1001      Quebec    111    30

+ PROC Benchmarking options

=

New Tabulated Data to the Annual Benchmark

Date   Industry  Province  Types  RAW SUM(Sales)  SEAS Sum(Sales)
Q1-14  1001      Quebec    11
Q2-14  1001      Quebec    11

Time Series and Their Summary Statistics

Time Series Dataset:

Summary Statistics:

Candidate Procs

1. Proc Means: creating printed tables of summary statistics;

2. Proc Summary: creating datasets of summary statistics;

3. Proc Tabulate: creating tabular reports of summary statistics (reports can be either simple or highly customized tables);

4. Proc Report: creating both detail and summary reports containing both summary statistics and computed data using “compute” blocks.

Code Examples

1. Means proc:

2. Summary proc:

3. Tabulate proc:

Code Examples (cont.)

4. Report proc:

Comparison of Four Procs

• All four procedures can create reports of summary statistics with the same standard suite, and output the reports to SAS datasets.

• By default, PROC MEANS displays output; PROC SUMMARY does not (it needs the PRINT option to display output).

• When the VAR statement is missing, PROC MEANS analyzes all numeric variables that are not listed in other statements; PROC SUMMARY generates observation counts only, and does not work if statistics are specified in the OUTPUT statement.

• PROC TABULATE has more flexibility than the others in displaying summary statistics within groups in either rows or columns.

• PROC REPORT can calculate and display information based on other columns (COMPUTE block in PROC REPORT). It can provide both detail reporting and summary reporting.

• TABULATE and REPORT put more emphasis on visually displaying summary statistics.

• For summary statistics reports that are output to SAS datasets, MEANS and SUMMARY are easier to implement and more appropriate for use in a generalized tabulation tool.

Observation

Threaded Proc Means with Cloud Analytic Services (CAS)

Source: SAS Cloud Analytic Services 3.1: Fundamentals,

available on documentation.sas.com.

What is a Growth Rate ?

The Economic Times Wrote

Statistics Canada

Percentage Change refers to the actual change between the old value and the new one, expressed relative to the old value.

Formula

$$\frac{\text{Value}_t - \text{Value}_{t-1}}{\text{Value}_{t-1}} \times 100$$

How to calculate Growth Rate in SAS (1)

Equivalent Formula :

$$\left(\frac{\text{Value}_t}{\text{Value}_{t-1}} - 1\right) \times 100$$

With a Simple Dataset :

How to calculate Growth Rate in SAS (2)

Dataset with group:
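The SAS code for these two cases was not preserved in this transcript. As a language-neutral illustration of the percentage change formula, here is a minimal Python sketch for a simple series and for a dataset with groups. It is not the presenters' code: the names `pct_change` and `pct_change_by_group`, the `lag` parameter and the sample values are hypothetical. With quarterly data, a lag of 1 gives the period-to-period change and a lag of 4 the year-over-year change.

```python
def pct_change(values, lag=1):
    """Percentage change of each value relative to the value `lag` periods earlier.

    Returns None where no earlier value exists or where it is zero.
    """
    changes = []
    for i, value in enumerate(values):
        if i < lag or values[i - lag] == 0:
            changes.append(None)
        else:
            previous = values[i - lag]
            changes.append((value - previous) / previous * 100)
    return changes


def pct_change_by_group(rows, lag=1):
    """rows: iterable of (group, period, value) in any order.

    The lag never crosses a group boundary, so the first period(s)
    of every group get None.
    """
    series = {}
    for group, period, value in sorted(rows):
        series.setdefault(group, []).append((period, value))
    result = {}
    for group, points in series.items():
        changes = pct_change([value for _, value in points], lag)
        for (period, _), change in zip(points, changes):
            result[(group, period)] = change
    return result


# Made-up sample values.
print(pct_change([100, 110, 99]))
print(pct_change_by_group([
    ("Quebec", 2001, 200), ("Ontario", 2001, 100),
    ("Quebec", 2002, 210), ("Ontario", 2002, 150),
]))
```

The grouped version resets the lag at the start of each group, so the first value of one group is never compared with the last value of the previous group.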

Conclusion

A tabulation tool system is needed for dissemination (pre-Cansim data) or special tables. Integration of Seasonally Adjusted Data would be needed, along with statistics related to time series such as:

• Growth Rate / Percent Change

• Period to Period Change

• Year over Year Change

• Data at Annual Rate

• Index, Linked Index

• Laspeyres, Paasche, Fisher, Walsh, Tornqvist, etc.

SAS/EG tasks or custom tasks could be a good way to analyze and explore those data.

MEANS and SUMMARY are more appropriate procs than TABULATE and REPORT for use in a generalized tabulation tool.

Questions

References

• Seasonal Adjustment with X-12-ARIMA, Class Notes, H-0434.

• Theory and Application of Benchmarking, Class Notes, H-0436.

• Theory and Application of Raking for Time Series, Class Notes, H-0437.

• Han, J. & Kamber, M. Data Mining – Concepts and Techniques, 499 pages.

• Canadian System of Macroeconomic Accounts – Modules 1 to 8, Statistics Canada.

• SAS website and documentation.

A Special Thanks to :

Philip Smith – Alumni, Statistics Canada – Educator, Mentor, Consultant

Jim Tebrake – Executive Director Macroeconomic Accounts , Statistics Canada

Steve Holder – National Strategy, SAS Canada for SAS Viya + CAS expertise/discussions
