isqs 6339, data management and business intelligence cubism – measures and dimensions

Post on 19-Jan-2016

43 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

DESCRIPTION

ISQS 6339, Data Management and Business Intelligence Cubism – Measures and Dimensions. Zhangxi Lin Texas Tech University. Outline. Measures Where we’ve been Populating fact table Types of dimensions. Structure and Components of Business Intelligence. SSMS. SSIS. SSAS. SSRS. SAS EG. - PowerPoint PPT Presentation

TRANSCRIPT

ISQS 6339, Data Management and Business ISQS 6339, Data Management and Business Intelligence Intelligence

Cubism – Measures and Cubism – Measures and DimensionsDimensionsZhangxi Lin

Texas Tech University

1

OutlineOutlineMeasuresWhere we’ve beenPopulating fact tableTypes of dimensions

2

Structure and Components of Structure and Components of Business IntelligenceBusiness Intelligence

3

SSMSSSMS SSISSSIS SSASSSAS

SSRSSSRS

SASEM

SASEM

SASEG

SASEG

Snowflake Schema of the Data MartSnowflake Schema of the Data Mart

4

Manufacturingfact

DimProduct

DimProductSubType

DimProductType

DimBatch

DimMachine

DimMachineType

DimMaterial

DimPlant

DimCountry

1

2

3

4

5

8

6

7

910

Where we’ve been and where we Where we’ve been and where we are noware nowExercise 1: Getting started Exercise 2: Creating a data mart with

SSMSExercise 3: Creating data mart with

BIDSExercise 4: Populating dimensions of a

data martExercise 5: Loading fact tablesExercise 6: Create and customize a

cube

5

What we need to do with the half-What we need to do with the half-done data mart?done data mart?Populate DimBatch dimenstion tablePopulate ManufacturingFact tableBuild an OLAP cube (we already did this

before)Check measuresCheck dimensions

6

7

MEASURES MEASURES

FactsFactsFacts are measurements associated with a

specific business process.Many facts can be derived from other facts,

including additive and semiadditive facts. Non-additive facts can be avoided by

calculating it from additive facts.Measures are clustered together in a

group, called measure group.

8

Types of measuresTypes of measures Three types

◦ Additive measures. Most facts are additive (calculative), such as sum

◦ Semiadditive measures. The measures that can be added along some dimensions, but not along others. For example, inventory level can be added along product dimension but not time dimension.

◦ Non-additive (such as max, average), or descriptive (e.g. factless fact table).

Aggregate functions◦ Additive: Sum◦ Semiadditive: ByAccount, Count, FirstChild,

FirstNonEmpty, LastChild, LastNonEmpty, Max, Min◦ Nonadditive: DistinctCount, None.

Measures and dimensionsMeasures and dimensionsDimensions are used to

aggregate measures. Therefore, they must be somehow related to measures

Granularity◦Important for the analysis◦There could be missing values in the

fact table

LOADING FACT LOADING FACT TABLESTABLES

11

Exercise 5: Loading Fact Exercise 5: Loading Fact TablesTables Project name: MMMFactLoad-lastname Package name: FactLoad.dtsx Tasks

◦ Create Inventory Fact table◦ Load Dim Batch◦ Load Manufacturing Fact◦ Load Inventory Fact

Deliverable: email a screenshot of the “green” outcome of the ETL project to isqs6347@gmail.com, with a subject title “ISQS 6339 EX5 - <lastname>”

12

Inventory Fact TableInventory Fact Table

Create a Table InventoryFact in your database.◦ Compound primary key: DateOfInventory,

ProductCode, and Material◦ Define two foreign keys

Column Name Data Type Allow Nulls

InventoryLevel Int No

NumberOnBackorder Int No

DateOfInventory Datatime No

ProductCode Int No

Material Varchar(30) No

13

Data Sources for Loading Data Sources for Loading FactFact For loading DimBatch table and ManufacturingFact

table◦ BatchInfo.CSV

For loading InventortyFact table ◦ Lin.OrderProcessingSystem Database

14

Control Flow for Loading Facts and the Control Flow for Loading Facts and the Remaining DimensionRemaining Dimension

Note: to ease debugging, you may use three packages and test them one by one, instead of doing everything in one package

15

Flat File Connection Flat File Connection Data types

◦BatchNumber, MachinNumber: four-byte signed integer [DT_I4]

◦ProductCode, NumberProduced, NumberRejected: four-byte signed integer [DT_I4]

◦TimeStarted, TimeStopped: database timestamp [DT_DBTimeStamp]

Only check BatchNumber as the input of Dim Batch

All columns are needed for fact tables

16

Some Frequently Used Some Frequently Used NodesNodes

Load DimBatch Data FlowLoad DimBatch Data Flow

18

Load DimBatch Data FlowLoad DimBatch Data Flow

19

Note: Because of duplication in the source file, we may insert An Aggregate item after the Flat File Source item.

The Flat File SourceThe Flat File Source

20

21

Sort Transformation

In the Aggregate item,Define “Group-by” BatchNumber.

In Derived column item, Define BatchName From BatchNumber

Use the expression(DT_WSTR, 50)[BatchNumber]To change the data typeOf BatchName.

Load Fact Data FlowLoad Fact Data Flow

22

Derived Columns for the Fact Derived Columns for the Fact tabletable

23

Expressions for the Derived ColumnsExpressions for the Derived Columns

AcceptedProducts◦ [NumberProduced] – [NumberRejected]

ElapsedTimeForManufacture◦ DATEDIFF(“mi”, [TimeStarted],[TimeStopped])

DateOfManufacture◦ (DT_DBTIMESTAMP)SUBSTRING((DT_WSTR,25)

[TimeStarted],1,10) This expression converts TimeStarted into a

string and selects the first ten characters of that string. This string is then converted back into a date time, without the time portion.

24

25

OLE DB DestinationFor loading the facttable

Load Inventory FactLoad Inventory Fact OLE DB Source

◦ OrderProcessingSystem.InventoryFact OLE DB Destination

◦ MaxMinManufacturingDM-lastname.InventoryFact No transformation

There are two ways to loading the table◦ Create the table and use ETL to load it◦ Import directly from the source to the database

MaxMinManufacturingDM-lastname

26

Debugging ResultsDebugging Results

27

Loading DimBatch Loading ManufacturingFact

28

BUILDING AN OLAP BUILDING AN OLAP CUBECUBE

Exercise 6: Design a CubeExercise 6: Design a Cube Project name: ISQS6339_EX6_2015_lastname Tasks

◦ Add in new date items (year, quarter, and month) to two fact tables

◦ Create time dimension using Manufacturing Fact table◦ Define calculated measures (Total Products, Percent

Rejected) ◦ Define hierarchies of attributes in dimension tables◦ Create a cube from the MaxMinManufacturing data

mart with hierarchical date dimension Deliverable:

◦ Screenshots: dimension hierarchies, dimensions, relationships of facts and dimensions, deployment result, format of measures, and browsing results.

29

Three Steps to Create a Cube from Three Steps to Create a Cube from Data SourcesData Sources Defining data source Defining data source view

◦ Add in three new columns of year, quarter, and month for the two fact tables

Building a cube. ◦ Define a new dimension Dim Time from

Manufacturing Fact table Customize the cube:

◦ Link two fact tables in a cube◦ Define new primary key for Dim Time◦ Define calculated measures◦ Relate dimensions to measures

30

T-SQL Expressions for DS View T-SQL Expressions for DS View Definition - ManufactureDefinition - Manufacture YearOfManufacture

CONVERT(char(4),YEAR(DateOfManufacture)) QuarterOfManufacture

CONVERT(char(4), YEAR(DateOfManufacture)) + CASE WHEN MONTH (DateOfManufacture) BETWEEN 1 AND 3

THEN 'Q1' WHEN MONTH (DateOfManufacture) BETWEEN 4 AND 6

THEN 'Q2' WHEN MONTH (DateOfManufacture) BETWEEN 7 AND 9

THEN 'Q3'ELSE 'Q4'END

MonthOfManufactureCONVERT(char(4), YEAR(DateOfManufacture)) +

RIGHT('0'+CONVERT(varchar(2), MONTH(DateOfManufacture)),2)

31

T-SQL Expressions for DS View T-SQL Expressions for DS View Definition - InventoryDefinition - Inventory YearOfInventory

CONVERT(char(4),YEAR(DateOfInventory)) QuarterOfInventory

CONVERT(char(4), YEAR(DateOfInventory)) + CASE WHEN MONTH (DateOfInventory) BETWEEN 1 AND 3 THEN

'Q1' WHEN MONTH (DateOfInventory) BETWEEN 4 AND 6 THEN

'Q2' WHEN MONTH (DateOfInventory) BETWEEN 7 AND 9 THEN

'Q3'ELSE 'Q4'END

MonthOfInventoryCONVERT(char(4), YEAR(DateOfInventory)) +

RIGHT('0'+CONVERT(varchar(2), MONTH(DateOfInventory)),2)

32

Data Source ViewData Source View

33

New columns

Select Measures PageSelect Measures Page

34

Uncheck ManufactureFact Count

35

The finished cube

36

CubeStructure

37

Defining a format string

38

Inventory measures

“Number on Backorder” is also set with these two parameters

Calculated measures – Calculated measures – made-up factsmade-up facts The definition of calculated measure is stored

in the OLAP cube itself. The actual values that result from a calculated

measure are not calculated, however, until a query containing that calculated measure is executed. The results of that calculation are then cached in the cube. The cached value is then delivered to any subsequent users requesting the same calculation.

The expressions of calculation are created using a language known as Multidimensional Expression Language (MDX) script. MDX is different from T-SQL. It is a special language with features designed to handle the advanced mathematics and formulas required by OLAP analysis. This is not found in T-SQL.

39

41

42

DIMENSIONSDIMENSIONSin SQL Serverin SQL Server

Types of DimensionsTypes of Dimensions Fact dimensions: the Dimensions created from attributes

in a fact table Parent-Child dimensions: Built on a table containing a

self-referential relationship, such as a parent attribute. Role playing dimensions: related to the same measure

group multiple times; each relationship represents a different role the dimension play; for example, time dimension plays three different roles: date of sale, data of shipment, and date of payment. ◦ To create a role playing dimension, add the dimension to the

Dimension Usage tab multiple times. Then create a relationship between each instance of the dimension and the measure group.

Reference dimensions: Not related directly to the measure group but to another regular dimension which in turn related to the measure group

Data mining dimensions: the information discovered by data mining

Many-to-many dimensions: e.g. multiple ship to addresses

Slowly changing dimensions

43

Slowly changing Slowly changing dimensionsdimensionsType 1 SCD – no trackType 2 SCD – tracking the entire history,

adding four attributes: SCD Original ID, SCD Start Date, SCD End Date, SCD Status

Type 3 SCD – Similar to Type 2 SCD but only track current state and the original state; two additional attribute: SCD Start Date, SCD Initial Value

Add a time dimension (a fact Add a time dimension (a fact dimension)dimension)

Rename time dimensionRename time dimension

Date HierarchyDate Hierarchy

Material Hierarchy & Plant Material Hierarchy & Plant HierarchyHierarchy

Product HierarchyProduct Hierarchy

Relating Dimensions in Relating Dimensions in the Cubethe Cube

top related