Informatica Easy Learning Online Training

Posted on 21-Jan-2017


Data Warehousing Concepts

• What is Data Warehousing?
• Dimensional Data Model
• Star Schema
• Snowflake Schema
• Slowly Changing Dimension
• Conceptual Data Model
• Logical Data Model
• Physical Data Model
• Conceptual, Logical, and Physical Data Model
• Data Integrity
• What is OLAP?
• MOLAP, ROLAP, and HOLAP

What is Data Warehousing?

Different people have different definitions for a data warehouse. The most popular definition came from Bill Inmon, who provided the following:

A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making process.

Another definition: data warehousing is the process of transforming data into information and making it available to users quickly enough to make a difference.

To summarize ...

• OLTP (online transaction processing) systems are used to “run” the business

• The data warehouse helps to “optimize” the business

Corporate Data

What is enterprise-wide corporate data? It includes:

• human resource data
• financial data
• facilities data
• sales data
• marketing expense data
• production planning costs
• manufacturing costs
• service delivery costs
• inventory management data
• shipping and payment data

How is Business Intelligence used in retail banking and in the retail industry?

KPIs (Key Performance Indicators)

A KPI can be used as a performance measurement tool.

KPIs in retail banking:

• Total cash deposits held in a month
• Average annual deposits held
• Growth in the average number of deposits per retail bank
• Average withdrawals made by each depositor
• Ratio of active to dormant depositors
• Average number of defaulting borrowers in a year
• Average number of credit cards issued by the retail bank
• Rate of borrowing risk
• Rate of default risk
• Average number of customers served in a day
• Average number of closed bank accounts
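Several of these KPIs are simple aggregations over account and transaction data. The Python sketch below computes two of them: total deposits in a month and the ratio of active to dormant depositors. The `Deposit` record, the function names, and the 365-day dormancy threshold are hypothetical assumptions for illustration; each bank defines dormancy by its own rules.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Deposit:
    account_id: str
    amount: float
    day: date


def total_deposits_in_month(deposits, year, month):
    """Total cash deposited during one calendar month."""
    return sum(d.amount for d in deposits if d.day.year == year and d.day.month == month)


def active_to_dormant_ratio(last_activity, as_of, dormant_after_days=365):
    """Ratio of active to dormant depositors.

    last_activity maps account_id -> date of the account's last transaction.
    dormant_after_days is a hypothetical threshold, not a banking standard.
    """
    active = sum(1 for d in last_activity.values() if (as_of - d).days <= dormant_after_days)
    dormant = len(last_activity) - active
    return active / dormant if dormant else float("inf")


deposits = [
    Deposit("A1", 500.0, date(2017, 1, 3)),
    Deposit("A2", 250.0, date(2017, 1, 15)),
    Deposit("A1", 100.0, date(2017, 2, 1)),
]
print(total_deposits_in_month(deposits, 2017, 1))  # 750.0

last_activity = {"A1": date(2017, 2, 1), "A2": date(2017, 1, 15), "A3": date(2015, 6, 30)}
print(active_to_dormant_ratio(last_activity, as_of=date(2017, 2, 28)))  # 2.0
```

In a data warehouse these figures would normally come from a query over the deposit fact table; the in-memory lists only stand in for that table.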

KPIs in the retail industry:

• Sales compared to budget and target
• Sales compared to last year (or any other period)
• Wage cost recovery
• Average sale per customer/transaction
• Units per customer/transaction
• Sales per hour
• Sales and gross margin
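Most of these retail KPIs are ratios of a few period totals. The Python sketch below computes them for one store and one period; the function name, parameter names, and sample figures are hypothetical. Wage cost recovery is left out because the slide does not say how it is calculated.

```python
def retail_kpis(sales, budget, last_year_sales, cost_of_goods,
                transactions, units, trading_hours):
    """Compute several retail KPIs for one store over one period."""
    gross_margin = sales - cost_of_goods
    return {
        "sales_vs_budget_pct": 100.0 * sales / budget,
        "growth_vs_last_year_pct": 100.0 * (sales - last_year_sales) / last_year_sales,
        "average_sale_per_transaction": sales / transactions,
        "units_per_transaction": units / transactions,
        "sales_per_hour": sales / trading_hours,
        "gross_margin": gross_margin,
        "gross_margin_pct": 100.0 * gross_margin / sales,
    }


kpis = retail_kpis(sales=12_000, budget=10_000, last_year_sales=10_000,
                   cost_of_goods=7_200, transactions=400, units=1_000,
                   trading_hours=80)
print(kpis["sales_vs_budget_pct"])           # 120.0
print(kpis["average_sale_per_transaction"])  # 30.0
```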

Examples of common departmental KPIs:

• Sales growth: analyzes the pace at which your organization's sales revenue is growing, for use in strategic decision-making.

• Marketing: analyzes the pace at which your organization's sales revenue is growing, for use in strategic decision-making.

• Financial: measures your organization's financial health by analyzing the readily available resources that could be used to meet short-term obligations.
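The sales growth and financial KPIs above reduce to simple formulas. The Python sketch below shows period-over-period sales growth and, for the financial KPI, the current ratio (current assets divided by current liabilities). The slide does not name a specific financial metric, so using the current ratio here is an assumption; it is one widely used measure of the ability to meet short-term obligations.

```python
def sales_growth_rate(current_period_sales, previous_period_sales):
    """Period-over-period sales growth, as a percentage."""
    return 100.0 * (current_period_sales - previous_period_sales) / previous_period_sales


def current_ratio(current_assets, current_liabilities):
    """How many times short-term resources cover short-term obligations."""
    return current_assets / current_liabilities


print(sales_growth_rate(115_000, 100_000))  # 15.0
print(current_ratio(300_000, 150_000))      # 2.0
```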

Data Warehousing

Data Warehousing Architecture

Data Warehousing Environment

Data Quality

Common data quality problems:

• Duplicate data
• Inconsistent values
• Missing data
• Unexpected use of fields
• Impossible or wrong values
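A first step in finding these problems is to profile the source data. The Python sketch below flags duplicate keys, missing mandatory values, and out-of-range (impossible) values in a list of records. The function name, the `customers` sample data, and the 0 to 130 age range are hypothetical choices for illustration.

```python
def profile(rows, key, required, valid_ranges):
    """Return the row indexes that show common data quality problems."""
    seen = set()
    duplicates, missing, impossible = [], [], []
    for i, row in enumerate(rows):
        k = row.get(key)
        if k in seen:                      # duplicate data
            duplicates.append(i)
        seen.add(k)
        if any(row.get(col) in (None, "") for col in required):
            missing.append(i)              # missing data
        for col, (lo, hi) in valid_ranges.items():
            value = row.get(col)
            if value is not None and not lo <= value <= hi:
                impossible.append(i)       # impossible or wrong value
                break
    return {"duplicates": duplicates, "missing": missing, "impossible": impossible}


customers = [
    {"id": 1, "name": "Ann", "age": 34},
    {"id": 2, "name": "", "age": 41},
    {"id": 1, "name": "Ann", "age": 34},
    {"id": 3, "name": "Raj", "age": -5},
]
print(profile(customers, key="id", required=["name"], valid_ranges={"age": (0, 130)}))
# {'duplicates': [2], 'missing': [1], 'impossible': [3]}
```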

Validations for Data Cleansing

• Data-type constraints
• Range constraints
• Mandatory constraints
• Unique constraints
• Set-membership constraints
• Foreign-key constraints
• Regular-expression patterns
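Each of these validation types can be written as a simple check on an incoming record. The Python sketch below applies one check per constraint type to an order record. The field names (`order_id`, `quantity`, `customer_id`, `country`, `store_id`, `email`), the 1 to 1000 quantity range, and the simplified e-mail pattern are hypothetical assumptions, not rules from the slide.

```python
import re


def validate(row, seen_ids, valid_countries, existing_store_ids):
    """Return the list of constraint violations for one record."""
    errors = []
    quantity = row.get("quantity")
    if not isinstance(quantity, int):                   # data-type constraint
        errors.append("data-type: quantity is not an integer")
    elif not 0 < quantity <= 1000:                      # range constraint
        errors.append("range: quantity outside 1..1000")
    if not row.get("customer_id"):                      # mandatory constraint
        errors.append("mandatory: customer_id is missing")
    if row.get("order_id") in seen_ids:                 # unique constraint
        errors.append("unique: duplicate order_id")
    seen_ids.add(row.get("order_id"))
    if row.get("country") not in valid_countries:       # set-membership constraint
        errors.append("set-membership: unknown country code")
    if row.get("store_id") not in existing_store_ids:   # foreign-key constraint
        errors.append("foreign-key: store_id not in store dimension")
    email = row.get("email") or ""
    if not re.fullmatch(r"[^@\s]+@[^@\s]+\.[a-z]{2,}", email):  # regex pattern
        errors.append("pattern: malformed email")
    return errors


seen = set()
good = {"order_id": 1, "quantity": 3, "customer_id": "C1", "country": "US",
        "store_id": 10, "email": "ann@example.com"}
bad = {"order_id": 1, "quantity": "3", "customer_id": "", "country": "XX",
       "store_id": 99, "email": "not-an-email"}
print(validate(good, seen, {"US", "IN"}, {10, 20}))  # []
for message in validate(bad, seen, {"US", "IN"}, {10, 20}):
    print(message)
```

Records that fail any check would typically be routed to an error table for review instead of being loaded.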

Views for designing a data warehouse:

• The top-down view
• The data source view
• The data warehouse view
• The business query view

Which approach is better for designing a data warehouse?

Top-Down Approach

Bottom-Up Approach

Data Warehousing Design

• Requirement Gathering
• Physical Environment Setup
• Data Modeling
• ETL
• OLAP Cube Design
• Front End Development
• Report Development
• Performance Tuning
• Query Optimization
• Quality Assurance
• Rolling out to Production
• Production Maintenance
• Incremental Enhancements

Why Data Warehousing?

Typical business questions a data warehouse helps answer:

• What is the daily, weekly, monthly, and quarterly profit of each store?
• How do sales and profit compare across time periods?
• How do sales compare across time bands of the day?
• Which products are in greater demand at which locations?
• What is the sales trend by time of day over the week, month, and year?
• On which day are sales highest?
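Questions like these are typically answered by grouping a sales fact table by one or more dimensions and summing a measure. The Python sketch below answers two of them over a few in-memory rows; the row layout, the `total_by` helper, and the sample amounts are hypothetical. It uses `date.weekday()` with an explicit name table so that the result does not depend on locale settings.

```python
from collections import defaultdict
from datetime import date

DAY_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")

# Hypothetical sales rows: (store, day, product, amount)
sales = [
    ("Store 1", date(2012, 1, 2), "Burger", 120.0),
    ("Store 1", date(2012, 1, 3), "Fries", 80.0),
    ("Store 2", date(2012, 1, 2), "Burger", 200.0),
    ("Store 2", date(2012, 1, 7), "Fries", 150.0),
]


def total_by(rows, key_fn):
    """Sum the amount column, grouped by whatever key_fn extracts from a row."""
    totals = defaultdict(float)
    for row in rows:
        totals[key_fn(row)] += row[3]
    return dict(totals)


# "On which day are sales highest?"
by_weekday = total_by(sales, lambda r: DAY_NAMES[r[1].weekday()])
best_day = max(by_weekday, key=by_weekday.get)
print(best_day, by_weekday[best_day])  # Monday 320.0

# "Which products are in greater demand at which locations?"
by_product_store = total_by(sales, lambda r: (r[2], r[0]))
print(by_product_store[("Burger", "Store 2")])  # 200.0
```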

Phases of a Data Warehousing Project

1. Identify and collect requirements

Typical requirements are the business questions listed under “Why Data Warehousing?” above.

Who collects the requirements?

Requirements are collected by business analysts and team leads.


2. Design the dimensional model

Pharmacy_Claims_Fact (fact table)
• Foreign keys: Drug_Id, Org_Id, Practitioner_Id, Product_Id, Time_ID, Claim_status_Id, Provider_Id, Subscriber_id, Demographic_key, InsuranceType_Id
• Other columns: Incurred_Date, Claim_Date, Claim_Settled_Date, Days_Supply, Dispensing_Fee, Incentive_Savings_Amount, Incentive_Fee_Paid_Amount, Amount_Claimed, Amount_Paid, Amount_Pending, Amount_Adjusted, CoPayment_Amount, CoInsurance_Amount, Deductible, Refill_Indicator, Claim_Production_Key, Claim_Production_Txn_No, Status_Change_Date, Last_Record_Flag

Dimension tables (key column first):
• Practitioner: Practitioner_Id, Practitioner_Name, Practitioner_Type, practioner_type_desc, Qualification, Specialisation, ssn, Medical_Assoc_Enroll_No
• Organisation: Org_Id, Org_prod_id, Org_Name, Address, City, County, State, Zip, Industry_Classification
• Subscriber: Subscriber_id, Subscriber_prod_key, Member_prod_key, Member_Name, Date_of_Birth, Subscriber_type, Address, City, County, State, Zip, Hobby1, Hobby2, Smoker_YN, Alcoholic_YN, Pre_Existing_Ailments
• Demographics: Demographic_key, Age_group, Income_group, Race, Country_of_birth, Marital_status, Gender, Citizenship_status
• Provider: Provider_Id, Provider_Name, Provider_Type, Address, City, County, State, Zip, Service_Area, Netwrok_Provider
• Insurance_Type: InsuranceType_Id, InsuranceType_Name, InsuranceType_Desc
• Product: Product_Id, Product_Name, Product_Category, LoB
• Claim_Status: Claim_status_Id, Claim_Status_Reason, Claim_stat_catg
• Time: Time_ID, Day, Week, Month, Quarter, Year, Season
• Drugs: Drug_Id, Drug_Name_Generic, Drug_Name_Trade, National_Drug_Code, Drug_Description, Drug_Category, Formulary, Manufacturer

The data model is designed by data modelers.
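To show how a dimensional model like this becomes physical tables, the Python sketch below uses the standard-library `sqlite3` module to create a small subset of the pharmacy claims model (the Drugs and Time dimensions plus a few fact columns) and runs a typical star-join query: amount paid per drug category per quarter. The column types, the chosen subset of columns, and the sample rows are assumptions; the model above specifies only column names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Drugs (
    Drug_Id INTEGER PRIMARY KEY,
    Drug_Name_Generic TEXT,
    Drug_Category TEXT
);
CREATE TABLE Time (
    Time_ID INTEGER PRIMARY KEY,
    Month INTEGER,
    Quarter INTEGER,
    Year INTEGER
);
-- Only a few of the fact table's columns are modeled here
CREATE TABLE Pharmacy_Claims_Fact (
    Drug_Id INTEGER REFERENCES Drugs(Drug_Id),
    Time_ID INTEGER REFERENCES Time(Time_ID),
    Amount_Claimed REAL,
    Amount_Paid REAL
);
INSERT INTO Drugs VALUES (1, 'drug_a', 'Cardiovascular'), (2, 'drug_b', 'Diabetes');
INSERT INTO Time VALUES (1, 1, 1, 2017), (2, 4, 2, 2017);
INSERT INTO Pharmacy_Claims_Fact VALUES (1, 1, 50.0, 40.0), (2, 1, 30.0, 30.0), (1, 2, 60.0, 45.0);
""")

# Star join: the fact table joins directly to each dimension it references
rows = conn.execute("""
SELECT t.Year, t.Quarter, d.Drug_Category, SUM(f.Amount_Paid)
FROM Pharmacy_Claims_Fact AS f
JOIN Drugs AS d ON d.Drug_Id = f.Drug_Id
JOIN Time AS t ON t.Time_ID = f.Time_ID
GROUP BY t.Year, t.Quarter, d.Drug_Category
ORDER BY t.Year, t.Quarter, d.Drug_Category
""").fetchall()
for row in rows:
    print(row)
```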


3. Create and maintain the tables

The database is maintained by DBAs.


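4. Load the data into the data warehouse and data marts

This is handled by the ETL team.

What is ETL?

ETL stands for Extract, Transform, and Load: data is extracted from source systems, transformed (cleansed, integrated, and reshaped), and loaded into the data warehouse. Informatica is an ETL application.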
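The three ETL stages can be illustrated outside Informatica with a short, generic Python sketch: it extracts rows from CSV text, transforms them (standardizes names, drops rows with a missing amount, converts types), and loads them into an SQLite table. The `sales_fact` table, the sample CSV, and the cleansing rules are hypothetical; in Informatica the same stages are built as mappings rather than hand-written code.

```python
import csv
import io
import sqlite3

RAW_CSV = """store,day,product,amount
Store 1,2012-01-01,burger,120.50
Store 1,2012-01-01,fries,
store 2,2012-01-02,Burger,200
"""


def extract(text):
    """Extract: read raw source rows as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: standardize values, drop incomplete rows, convert types."""
    clean = []
    for row in rows:
        if not row["amount"]:
            continue  # drop rows with a missing measure
        clean.append((
            row["store"].title(),    # "store 2" -> "Store 2"
            row["day"],
            row["product"].lower(),  # "Burger" -> "burger"
            float(row["amount"]),
        ))
    return clean


def load(rows, conn):
    """Load: write the cleaned rows into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales_fact "
                 "(store TEXT, day TEXT, product TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
print(conn.execute("SELECT * FROM sales_fact").fetchall())
# [('Store 1', '2012-01-01', 'burger', 120.5), ('Store 2', '2012-01-02', 'burger', 200.0)]
```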


5. Develop reports and dashboards

This is handled by the reporting team.


6. Test the ETL mappings and reports/dashboards

This is handled by the QA department.

7. Deploy to production and maintain

This is handled by the production department.

Where do we fit in after this training?

We can work as:
1. ETL Developer
2. ETL Administrator
3. ETL Tester

Data Modeling

What is Data Modeling?

• A data model defines the relationships between data.

• The dimensional data model is the one most often used in data warehousing systems.

• Data modeling is the process of learning about the data.

Data models are designed by data modelers.

What is Dimensional Modeling?

• It helps us store the data.

Goals and benefits of dimensional modeling:
• Faster data retrieval
• Better understandability
• Extensibility

It has two distinct categories:
• Dimensions
• Measures

Scenarios of Dimensional Data Modeling

McDonald’s client: “I want to store information on how many burgers and fries are sold per day at a single McDonald’s outlet.”

What is a dimension and what is a measure in this example?

Step 1: Identify the dimensions

1. Food (e.g., burgers and fries)
2. Store (McDonald’s)
3. A specific day

Step 2: Identify the measures

The number of burgers/fries sold is a measure.

The fact table captures the data that measures the organization’s business operations.

Step 3: Identify the attributes or properties of the dimensions

Food dimension
KEY | NAME
1   | Burger
2   | Fries

Store dimension
KEY | NAME
1   | Store 1
2   | Store 2
... | ...

Day dimension
KEY | DAY
1   | 01 Jan 2012
2   | 02 Jan 2012
3   | 03 Jan 2012
... | ...
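Putting the McDonald’s steps together gives a tiny star schema: three dimension tables keyed by surrogate keys, and a fact table whose rows hold one key per dimension plus the measure. The Python sketch below models this with plain dictionaries and tuples; the `units_sold` quantities and the `units_sold_at` helper are hypothetical.

```python
# Dimension tables: surrogate key -> attribute value
food_dim = {1: "Burger", 2: "Fries"}
store_dim = {1: "Store 1", 2: "Store 2"}
day_dim = {1: "01 Jan 2012", 2: "02 Jan 2012", 3: "03 Jan 2012"}

# Fact table: one row per (food, store, day) holding the measure.
# The quantities are hypothetical sample values.
sales_fact = [
    (1, 1, 1, 150),  # (food_key, store_key, day_key, units_sold)
    (2, 1, 1, 220),
    (1, 1, 2, 130),
    (2, 2, 2, 90),
]


def units_sold_at(store_key):
    """Resolve the fact table's keys into readable rows for one store."""
    return [
        (day_dim[day_key], food_dim[food_key], units)
        for food_key, s_key, day_key, units in sales_fact
        if s_key == store_key
    ]


for row in units_sold_at(1):
    print(row)
# ('01 Jan 2012', 'Burger', 150)
# ('01 Jan 2012', 'Fries', 220)
# ('02 Jan 2012', 'Burger', 130)
```

Keeping only keys and measures in the fact table keeps it narrow, while descriptive text lives once in each dimension.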

Step 4: Identify the granularity of the measures

What is meant by “granularity”?

Granularity refers to the lowest (most detailed) level of information stored in a table.
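The choice of grain matters because detailed data can always be rolled up to a coarser level, but not the other way round. The Python sketch below aggregates daily-grain fact rows to monthly grain; the row layout and sample numbers are hypothetical. If only the monthly totals had been stored, the daily values could not be recovered.

```python
from collections import defaultdict

# Hypothetical fact rows at daily grain: (ISO day, store, units sold)
daily_fact = [
    ("2012-01-01", "Store 1", 150),
    ("2012-01-02", "Store 1", 130),
    ("2012-02-01", "Store 1", 170),
]


def roll_up_to_month(rows):
    """Aggregate daily-grain rows to monthly grain."""
    monthly = defaultdict(int)
    for day, store, units in rows:
        monthly[(day[:7], store)] += units  # "2012-01-01"[:7] == "2012-01"
    return dict(monthly)


print(roll_up_to_month(daily_fact))
# {('2012-01', 'Store 1'): 280, ('2012-02', 'Store 1'): 170}
```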

Step 5: History preservation (optional)

If dimension attributes change over time and their history must be kept, this can be solved by designing the dimension tables as slowly changing dimensions (SCDs).
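The slides do not say which SCD type is meant; a common choice for preserving full history is Type 2, which adds a new row for every change instead of overwriting the old one. The Python sketch below shows that technique for a store dimension. The column names (`store_key`, `store_id`, `city`, `start_date`, `end_date`, `is_current`) and the `scd2_apply` function are hypothetical.

```python
from datetime import date


def scd2_apply(dim_rows, store_id, new_attrs, effective, next_key):
    """Apply a Type 2 change and return the next free surrogate key.

    The current row for store_id is expired (end_date set, is_current False)
    and a new current row with the changed attributes is appended.
    """
    current = next(
        (r for r in dim_rows if r["store_id"] == store_id and r["is_current"]), None
    )
    if current is not None:
        if all(current.get(k) == v for k, v in new_attrs.items()):
            return next_key  # no attribute changed: nothing to do
        current["is_current"] = False
        current["end_date"] = effective
    dim_rows.append({
        "store_key": next_key,   # new surrogate key for the new version
        "store_id": store_id,    # natural (business) key stays the same
        **new_attrs,
        "start_date": effective,
        "end_date": None,
        "is_current": True,
    })
    return next_key + 1


store_dim = [{"store_key": 1, "store_id": "S1", "city": "Chicago",
              "start_date": date(2012, 1, 1), "end_date": None, "is_current": True}]
next_key = scd2_apply(store_dim, "S1", {"city": "Boston"}, date(2013, 6, 1), next_key=2)
for row in store_dim:
    print(row["store_key"], row["city"], row["end_date"], row["is_current"])
# 1 Chicago 2013-06-01 False
# 2 Boston None True
```

Fact rows loaded before the change keep pointing at surrogate key 1, so historical reports still show the store in its old city.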

Entities

Entities are the things about which you want to store information. For example: EMPLOYEE.

Cardinalities

The cardinality shows how many instances on one side of a relationship belong to how many instances on the other side.

For example:
• How many customers belong to one sale?
• How many sales belong to one customer?
• How many sales take place in one shop?

• Customers --> Sales: one customer can buy something several times.
• Sales --> Customers: one sale is always made by exactly one customer.
• Customers --> Products: one customer can buy multiple products.
• Products --> Customers: one product can be purchased by multiple customers.
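The cardinality questions above can also be checked against sample data by counting, for each item on one side, how many distinct partners it has on the other. The Python sketch below classifies a relationship from observed (left, right) pairs; the function names and sample pairs are hypothetical. Note that sample data can only confirm what has been observed, so in a real model the cardinality comes from the business rules.

```python
from collections import defaultdict


def max_related(pairs, key_index, value_index):
    """Largest number of distinct partners that any single key has."""
    related = defaultdict(set)
    for pair in pairs:
        related[pair[key_index]].add(pair[value_index])
    return max(len(partners) for partners in related.values())


def cardinality(pairs):
    """Classify (left, right) pairs as 1:1, 1:N, N:1 or M:N from observed data."""
    right_per_left = max_related(pairs, 0, 1)
    left_per_right = max_related(pairs, 1, 0)
    if right_per_left == 1 and left_per_right == 1:
        return "1:1"
    if left_per_right == 1:
        return "1:N"  # one left item relates to many right items
    if right_per_left == 1:
        return "N:1"
    return "M:N"


customer_sales = [("C1", 101), ("C1", 102), ("C2", 103)]
customer_products = [("C1", "Burger"), ("C1", "Fries"), ("C2", "Burger")]
print(cardinality(customer_sales))     # 1:N
print(cardinality(customer_products))  # M:N
```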

Scenarios of Dimensional Data Modeling for Banking

Scenarios of Dimensional Data Modeling for Retail Banking

• Event 1: Set up banks and branches
• Event 2: Create a new customer
• Event 3: Set up a new account
• Event 4: Issue a credit card
• Event 5: Customer makes a deposit
• Event 6: Customer uses the card
• Event 7: Bank issues a statement
• Event 8: Customer closes the account

Data Modeling


Types of OLAP Servers

We have four types of OLAP servers:

• Relational OLAP (ROLAP)
• Multidimensional OLAP (MOLAP)
• Hybrid OLAP (HOLAP)
• Specialized SQL Servers
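The main difference between ROLAP and MOLAP is where aggregation happens: ROLAP computes aggregates from relational tables when a query arrives, while MOLAP precomputes them into a multidimensional cube (HOLAP mixes the two, typically keeping detail in relational tables and aggregates in a cube). The Python sketch below contrasts the two styles on a few hypothetical fact rows; a real OLAP server stores and indexes cubes far more efficiently than a dictionary.

```python
import itertools
from collections import defaultdict

# Hypothetical relational fact rows: (product, store, units)
facts = [("Burger", "Store 1", 150), ("Fries", "Store 1", 220), ("Burger", "Store 2", 90)]


def rolap_query(product=None, store=None):
    """ROLAP style: aggregate the relational rows at query time (None = all)."""
    return sum(
        units for p, s, units in facts
        if (product is None or p == product) and (store is None or s == store)
    )


def build_molap_cube(rows):
    """MOLAP style: precompute every aggregate once; None stands for 'all values'."""
    cube = defaultdict(int)
    for product, store, units in rows:
        for p, s in itertools.product((product, None), (store, None)):
            cube[(p, s)] += units
    return dict(cube)


cube = build_molap_cube(facts)
print(rolap_query(product="Burger"), cube[("Burger", None)])  # 240 240
print(rolap_query(), cube[(None, None)])                      # 460 460
```

Both approaches return the same numbers; MOLAP trades load time and storage for faster answers, while ROLAP avoids precomputation but does more work per query.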

OLTP vs. OLAP

OLTP Data Model


Snowflake Schema


Star Schema
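In a star schema each dimension is a single denormalized table joined directly to the fact table; a snowflake schema normalizes a dimension into further tables (for example, moving product category into its own table), which saves repetition but adds joins. The Python sketch below builds both shapes in SQLite with the standard-library `sqlite3` module and shows that the same question needs one join in the star and two in the snowflake. All table and column names and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star: the category name is stored directly in the product dimension
CREATE TABLE product_star (product_id INTEGER PRIMARY KEY, product_name TEXT, category_name TEXT);

-- Snowflake: the category is normalized into its own table
CREATE TABLE category (category_id INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE product_snow (product_id INTEGER PRIMARY KEY, product_name TEXT,
                           category_id INTEGER REFERENCES category(category_id));

CREATE TABLE sales_fact (product_id INTEGER, amount REAL);

INSERT INTO product_star VALUES (1, 'Burger', 'Food'), (2, 'Cola', 'Drinks');
INSERT INTO category VALUES (10, 'Food'), (20, 'Drinks');
INSERT INTO product_snow VALUES (1, 'Burger', 10), (2, 'Cola', 20);
INSERT INTO sales_fact VALUES (1, 5.0), (1, 5.0), (2, 2.0);
""")

star = conn.execute("""
SELECT p.category_name, SUM(f.amount) FROM sales_fact f
JOIN product_star p ON p.product_id = f.product_id
GROUP BY p.category_name ORDER BY p.category_name
""").fetchall()

snowflake = conn.execute("""
SELECT c.category_name, SUM(f.amount) FROM sales_fact f
JOIN product_snow p ON p.product_id = f.product_id
JOIN category c ON c.category_id = p.category_id
GROUP BY c.category_name ORDER BY c.category_name
""").fetchall()

print(star)       # [('Drinks', 2.0), ('Food', 10.0)]
print(snowflake)  # [('Drinks', 2.0), ('Food', 10.0)]
```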

Informatica
