building a star schema v1.1


Upload: patrick-cuba

Post on 16-Apr-2017


TRANSCRIPT

Page 1: Building a Star Schema v1.1

Star Schemas using (SAS® Software) Scalable Performance Data Engine

Patrick Cuba – Consultant

Page 2: Building a Star Schema v1.1

AGENDA

• Case Study – Need for SPDE
• SPDE Library
• Case Study – Need for SPDS
• SPDS Server: Clusters, Star Schema, StarJoin
• Questions
• References

Page 3: Building a Star Schema v1.1

CASE STUDY

• Table build is 6 hours
• Query time is 20 minutes

• Latest table is 360 GB
• Generation tables hold 24 months
• Generation tables have grown to 1 TB each

• 300+ columns
• Four balances per credit card (Max 255)
• 20 million customers
• Growing customer base
• Keeps defaulted customers' balances

Page 4: Building a Star Schema v1.1

CASE STUDY

• At month end, the cycle-end and latest credit card data for the month are added to SAS Generation Tables
• Accounts cycle at different days in the month

[Diagram: Cycle-end and Month end generation tables, plus the Latest table]
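The 24-month history described above maps naturally onto SAS generation data sets. A minimal sketch, assuming a hypothetical library `cards` and table `cc_cycle_end` (neither name appears in the slides), with `sashelp.class` standing in for the month's data:

```sas
/* Hypothetical library path and table name, for illustration only */
libname cards '/disk1/cards';

/* GENMAX=24 keeps the base version plus 23 historical versions */
data cards.cc_cycle_end(genmax=24);
  set sashelp.class;   /* stand-in for this month's cycle-end data */
run;

/* Replacing the table rolls the previous copy into the history */
data cards.cc_cycle_end;
  set sashelp.class;
run;

/* GENNUM=-1 reads the most recent historical version */
proc print data=cards.cc_cycle_end(gennum=-1 obs=5);
run;
```

Once the 24 slots are full, each new month end pushes the oldest version out, which is how the tables stay at a fixed 24 months while still growing in size with the customer base.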

Page 5: Building a Star Schema v1.1

BASE LIBRARY

• SAS Datasets are flat files

[Diagram: SAS Dataset, Page]

libname all_users '/disk1/metadata';

Page 6: Building a Star Schema v1.1

SPDE LIBRARY

• Under BASE SAS License
• Scalable Performance Data Engine (SPDE)
• On SMP server (at least 2 CPUs)
• RAID

[Diagram: SAS SPD Dataset, made up of Meta, five Data Parts, an HBX Index and an IDX Index]

libname all_users spde '/disk1/metadata'
  datapath=('/disk2/userdata' '/disk3/userdata')
  indexpath=('/disk4/userindexes' '/disk5/userindexes')
  partsize=128M;
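To see the data parts and index files in practice, one option is to load a small indexed table into the SPDE library. This is only a sketch: the table `class` and its index are illustrative (using the shipped `sashelp.class` sample), and the directories in the LIBNAME statement are assumed to exist.

```sas
/* Same library definition as on the slide; the paths must exist */
libname all_users spde '/disk1/metadata'
  datapath=('/disk2/userdata' '/disk3/userdata')
  indexpath=('/disk4/userindexes' '/disk5/userindexes')
  partsize=128M;

/* Illustrative table: INDEX= builds a simple index on NAME at load time */
data all_users.class(index=(name));
  set sashelp.class;
run;

/* Reports the engine in use and the indexes defined on the table */
proc contents data=all_users.class;
run;

/* A WHERE clause on the indexed column lets the engine consider the index */
proc print data=all_users.class;
  where name = 'Alfred';
run;
```

After the load, the metadata file should sit under the primary path, the data partitions under the DATAPATH= directories and the index files under the INDEXPATH= directories.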

Page 7: Building a Star Schema v1.1

CASE STUDY

• Star Schema using StarJoin
• Clustered Cycle & Month end totalling 1 TB

• Table build is 30-40 minutes
• Query time is seconds to 5 minutes

[Diagram: star schema, a Fact table surrounded by four Dimension tables]

Page 8: Building a Star Schema v1.1

SPD SERVER

• Scalable Performance Data Server
• Client/Server
• SQL Pass-thru
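SQL pass-through sends the query text to the SPD Server for execution rather than pulling rows into the SAS session. A hedged sketch of the general shape; the domain, host, port, user, table and column names below are all placeholders and do not come from the slides:

```sas
proc sql;
  /* All connection values are placeholders */
  connect to sasspds
    (dbq='mydomain' host='spdshost' serv='5400' user='anonymous');

  /* The inner query runs on the SPD Server, not in the SAS session */
  select *
    from connection to sasspds
      (select customer_id, sum(balance) as total_balance
         from fact_balances
        group by customer_id);

  disconnect from sasspds;
quit;
```

Only the aggregated result set travels back to the client, which is the main attraction of the client/server arrangement for tables of this size.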

Page 9: Building a Star Schema v1.1

SPD SERVER

• Clusters

[Diagram: member tables M1 to M8 combined into one Cluster]

PROC SPDO LIBRARY=domain-name;
  SET ACLUSER user-name;
  CLUSTER CREATE cluster-table-name
    MEM = SPD-Server-table1
    MEM = SPD-Server-table2
    MAXSLOT=24;
QUIT;
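A concrete instance of the template above might look like the following sketch. The connection values and the cluster name `monthly_balances` are hypothetical; the member names follow the M1 to M8 labels in the diagram, and the members are assumed to already exist in the domain with identical columns and indexes.

```sas
/* Hypothetical connection values */
libname spds sasspds 'mydomain'
  host='spdshost' serv='5400' user='anonymous';

/* Bind three existing monthly tables into one cluster table;
   MAXSLOT=24 leaves room for 24 months of members */
proc spdo library=spds;
  set acluser anonymous;
  cluster create monthly_balances
    mem=m1
    mem=m2
    mem=m3
    maxslot=24;
quit;

/* The cluster is then queried like a single table */
proc sql;
  select count(*) as n_rows
    from spds.monthly_balances;
quit;
```

Because creating the cluster only binds existing tables together, adding a month does not require rewriting the earlier months, which fits the drop in build time reported in the case study.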

Page 10: Building a Star Schema v1.1

SPD SERVER

• Facts and Dimensions

[Diagram: star schema, a Fact table surrounded by four Dimension tables]

Pairwise: 7 Joins, 1 Select

StarJoin: 3 Steps

Page 11: Building a Star Schema v1.1

STARJOIN RULES

• 1. Turn it ON

execute(reset nostarjoin=<1/0>)
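In context, the reset is issued inside a pass-through connection. The sketch below uses placeholder connection values and assumes, as the option name suggests, that NOSTARJOIN=0 leaves StarJoin enabled and NOSTARJOIN=1 disables it:

```sas
proc sql;
  /* Placeholder connection values */
  connect to sasspds
    (dbq='mydomain' host='spdshost' serv='5400' user='anonymous');

  /* 0 = StarJoin allowed; 1 = fall back to pairwise joins,
     which is useful when comparing timings */
  execute(reset nostarjoin=0) by sasspds;

  disconnect from sasspds;
quit;
```

The setting applies to the connection it is issued on, so the star query itself would be submitted between the reset and the disconnect.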

Page 12: Building a Star Schema v1.1

STARJOIN RULES

• 2. No Snowflakes

[Diagram: a Fact table and six Dim tables arranged as a snowflake]

Page 13: Building a Star Schema v1.1

STARJOIN RULES

• 3. Single Fact Table

[Diagram: one Fact table joined to four Dim tables]

• 4. Single Join Condition

• 5. Fact & Dimension Indexes
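The rules above can be sketched as a query shape. Every table, column and connection value here is hypothetical, and the index layout is one plausible reading of rule 5 rather than the presenter's actual design:

```sas
proc sql;
  /* Placeholder connection values */
  connect to sasspds
    (dbq='mydomain' host='spdshost' serv='5400' user='anonymous');

  /* Rule 5: index the join columns on the fact and dimension tables.
     A simple index takes the name of its column. */
  execute(create index cust_key on fact_balances(cust_key)) by sasspds;
  execute(create index date_key on fact_balances(date_key)) by sasspds;
  execute(create index cust_key on dim_customer(cust_key)) by sasspds;
  execute(create index date_key on dim_date(date_key)) by sasspds;

  /* Rules 2 to 4: one fact table, every dimension joined directly to it
     by a single equality condition, no dimension-to-dimension joins */
  select *
    from connection to sasspds
      (select c.segment, d.month_label, sum(f.balance) as total_balance
         from fact_balances f, dim_customer c, dim_date d
        where f.cust_key = c.cust_key
          and f.date_key = d.date_key
          and c.segment = 'GOLD'
        group by c.segment, d.month_label);

  disconnect from sasspds;
quit;
```

A join from `dim_customer` to a further lookup table would turn this into a snowflake and break rule 2; flattening that lookup into the dimension keeps the query eligible.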

Page 14: Building a Star Schema v1.1

QUESTIONS

Patrick Cuba
Email: [email protected]
Phone: 0458 91 2634
LinkedIn: http://www.linkedin.com/in/patrickcuba

Page 15: Building a Star Schema v1.1

REFERENCES

• STARJOIN
  http://support.sas.com/documentation/cdl/en/spdsug/63088/HTML/default/viewer.htm#n0mlj75x9c4dtzn1ves84e1op3jt.htm
• SAS® 9.1 Scalable Performance Data Engine
  http://support.sas.com/documentation/onlinedoc/91pdf/sasdoc_91/base_dataeng_6996.pdf
• SAS® 9.2 Scalable Performance Data Engine
  http://support.sas.com/documentation/cdl/en/engspde/61887/PDF/default/engspde.pdf
• When should you use the SPDE engine
  http://support.sas.com/rnd/scalability/spde/when.html