b i - extraction, transformation & loading - power point show
TRANSCRIPT
-
7/22/2019 B I - Extraction, Transformation & Loading - POWER POINT SHOW
Extraction, Transformation and Loading (ETL) in Business Intelligence (BI)
-
Topics to be covered
Overview of the ETL process
Different Types of Source Systems
Overview of Data Sources
PSA
Transformations / Rule Types
DTPs / InfoPackages
Data Reconciliation
-
Overview of a Generic ETL Process
ETL is the process of taking raw data from a source system,
applying transformation rules to it, and loading it into an
InfoProvider (the target).
The ETL Process
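The three steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the sample records, and the in-memory list standing in for the InfoProvider are all hypothetical and are not SAP BI objects or APIs.

```python
# Hypothetical in-memory ETL sketch; none of these names are SAP BI APIs.

def extract(source_rows):
    """Take raw records from the source system unchanged."""
    return [dict(row) for row in source_rows]

def transform(rows):
    """Apply transformation rules: clean the customer name, convert the amount to a number."""
    return [
        {"customer": row["customer"].strip().upper(), "amount": float(row["amount"])}
        for row in rows
    ]

def load(rows, target):
    """Append the transformed records to the target (standing in for an InfoProvider)."""
    target.extend(rows)
    return target

# Assumed sample data from a source system.
source_system = [
    {"customer": " acme ", "amount": "100.50"},
    {"customer": "globex", "amount": "20"},
]
info_provider = []
load(transform(extract(source_system)), info_provider)
```

Keeping the three steps as separate functions mirrors the slide: each stage can be inspected or replaced on its own.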
-
BI ETL Process / BI Data Flow Details (diagrams)
-
Source Systems in SAP
A Source System is any system that is available to BI for data extraction and
transfer purposes. Examples include
mySAP ERP, mySAP CRM, and custom
systems based on an Oracle DB.
Source System Types and Interfaces
-
Overview of DataSources
DataSources are BI objects used to extract and stage data from source systems.
A DataSource contains a number of logically related fields that are
arranged in a flat structure (the extraction structure)
and contain the data to be transferred into BI.
There are 2 types of DataSources:
- DataSource for Transaction Data
- DataSource for Master Data
  - DataSource for Attributes
  - DataSource for Texts
  - DataSource for Hierarchy
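As a rough illustration of "a flat structure of logically related fields", a DataSource can be modeled as a small record type. Everything in this sketch is an assumption made for the example: the class, the enum, and the names `ZSALES_ITEMS`, `ZCUSTOMER_TEXT`, and `ERP_DEV` are hypothetical and do not come from the slides or from SAP.

```python
from dataclasses import dataclass
from enum import Enum

class DataSourceKind(Enum):
    TRANSACTION_DATA = "transaction data"
    ATTRIBUTES = "master data: attributes"
    TEXTS = "master data: texts"
    HIERARCHY = "master data: hierarchy"

@dataclass(frozen=True)
class DataSource:
    name: str
    source_system: str
    kind: DataSourceKind
    fields: tuple  # flat extraction structure: an ordered tuple of field names

    def is_master_data(self):
        """Everything except transaction data counts as master data."""
        return self.kind is not DataSourceKind.TRANSACTION_DATA

    def accepts(self, record):
        """A record fits the flat structure if it has exactly these fields."""
        return set(record) == set(self.fields)

# Hypothetical DataSources used only for illustration.
sales = DataSource("ZSALES_ITEMS", "ERP_DEV", DataSourceKind.TRANSACTION_DATA,
                   ("order_id", "customer", "amount"))
customer_texts = DataSource("ZCUSTOMER_TEXT", "ERP_DEV", DataSourceKind.TEXTS,
                            ("customer", "language", "text"))
```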
-
Overview of Data Source Use
- DataSources supply the metadata description of source data.
- They are used to extract data from a source system and to transfer the data to the BI system.
-
Persistent Staging Area (PSA) / InfoPackage
The Persistent Staging Area (PSA)
- is a transparent database table in which request data is stored
- is created per DataSource and source system.
It represents an initial store in BI, in which the requested data is saved
unchanged from the source system.
Key Point
The PSA can be bypassed, but this is strongly discouraged because the PSA is crucial for data backup purposes.
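The two properties named above (one store per DataSource and source system, and data kept unchanged) can be sketched as follows. This is a toy model, not the real PSA: the class, its methods, and the identifiers `ZSALES_ITEMS`, `ERP_DEV`, and `REQ_001` are hypothetical.

```python
import copy
from collections import defaultdict

class PersistentStagingArea:
    """Toy stand-in for the PSA: one table per (DataSource, source system)."""

    def __init__(self):
        # (datasource, source_system) -> {request_id: rows}
        self._tables = defaultdict(dict)

    def store_request(self, datasource, source_system, request_id, rows):
        # Deep-copy so later changes to the caller's data do not alter the staged copy.
        self._tables[(datasource, source_system)][request_id] = copy.deepcopy(rows)

    def read_request(self, datasource, source_system, request_id):
        return copy.deepcopy(self._tables[(datasource, source_system)][request_id])

psa = PersistentStagingArea()
raw = [{"order_id": 1, "amount": "100.50"}]
psa.store_request("ZSALES_ITEMS", "ERP_DEV", "REQ_001", raw)

# Even if the in-flight data is damaged afterwards, the staged request is intact.
raw[0]["amount"] = "corrupted later"
staged = psa.read_request("ZSALES_ITEMS", "ERP_DEV", "REQ_001")
```

The staged copy is what makes a reload possible without going back to the source system, which is the backup role the key point refers to.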
-
Transformation
The transformation process allows you to
consolidate, cleanse, and integrate data. Data
from heterogeneous sources can be semantically
synchronized.
When you load data from one BI object into another
BI object, the data is passed through a
transformation.
A transformation converts the fields of the source
into the format of the target.
-
Transformation
-
Transformation
Transformations ranging from very simple to highly complex can be created using:
Rule types
Aggregation types
Routines
Rule Types
A rule type determines whether and how a characteristic or key
figure, or a data field or key field, is updated into the target.
The different rule types are as follows:
KEY FIGURES
Direct Assignment
Formula
No Transformation
Routine
Routine with Unit
-
Transformation Rule Types
CHARACTERISTICS
Constant
Direct Assignment
Formula
Initial (Only for Key fields)
Read Master Data
Routine
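A few of the rule types listed above can be imitated with small Python functions that each compute one target field from a source record. This is a sketch under assumptions, not how BI implements rules: the helper names, the `CUSTOMER_MASTER` lookup table, and the field names are all hypothetical.

```python
# Hypothetical master data table: customer -> attributes.
CUSTOMER_MASTER = {"C1": {"country": "DE"}, "C2": {"country": "US"}}

def constant(value):
    """Constant: the target field always receives the same value."""
    return lambda src: value

def direct(field):
    """Direct assignment: copy the source field as is."""
    return lambda src: src[field]

def formula(fn):
    """Formula: compute the target value from source fields."""
    return lambda src: fn(src)

def read_master_data(key_field, attribute):
    """Read master data: look up an attribute of the characteristic value."""
    return lambda src: CUSTOMER_MASTER[src[key_field]][attribute]

# One rule per target field.
RULES = {
    "source_system": constant("ERP"),
    "customer": direct("customer"),
    "net_amount": formula(lambda src: src["gross"] - src["tax"]),
    "country": read_master_data("customer", "country"),
}

def apply_rules(source_record, rules):
    """Convert a source record into the target format, field by field."""
    return {target_field: rule(source_record) for target_field, rule in rules.items()}

result = apply_rules({"customer": "C1", "gross": 119.0, "tax": 19.0}, RULES)
```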
Aggregation Types
An aggregation type controls how a key figure or a data field is updated in the InfoProvider.
For InfoCubes:
Always Summation
For DataStore Objects:
Either Summation or Overwrite
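The difference between the two aggregation types shows up when two records arrive for the same key. The sketch below uses plain dictionaries as stand-ins for an InfoCube and a DataStore object; the function name and the sample data are assumptions for illustration only.

```python
def update_target(target, key, value, aggregation):
    """Update one key figure in a dict-based target using the given aggregation type."""
    if aggregation == "summation":
        target[key] = target.get(key, 0) + value
    elif aggregation == "overwrite":
        target[key] = value
    else:
        raise ValueError(f"unknown aggregation type: {aggregation}")
    return target

# Two records for the same order arrive one after the other.
records = [("ORDER_1", 100), ("ORDER_1", 40)]

info_cube = {}         # stand-in for an InfoCube: always summation
datastore_object = {}  # stand-in for a DataStore object set to overwrite
for key, amount in records:
    update_target(info_cube, key, amount, "summation")
    update_target(datastore_object, key, amount, "overwrite")
```

With summation the key figure accumulates (140 here), while overwrite keeps only the most recent value (40).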
-
Transformation Routine Types
Start Routine
- Runs for each data package at the start of the transformation.
- Has a table in the format of the source structure as input and output parameters.
- Is used to perform preliminary calculations and store the results in a global data structure.
- Can modify or delete data in the data package.
End Routine
- Is a routine with a table in the target structure format as input and output parameters.
- Is used to post-process data after the transformation on a package-by-package basis.
- For example, it can delete records that are not to be updated, or perform data checks.
Characteristic Routine
- This routine is available as a transformation rule for a key figure or a characteristic.
- The input and output values depend on the selected field in the transformation rule.
Expert Routine
- Intended only for special cases in which the standard functions are not sufficient to perform a
transformation.
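The order in which a start routine, the field rules, and an end routine touch a data package can be sketched like this. In a real BI system these routines are written inside the transformation itself; the Python functions, field names, and filtering conditions below are hypothetical and only show the package-by-package flow.

```python
def start_routine(source_package):
    """Runs first, on the package in source format: drop cancelled orders."""
    return [r for r in source_package if r["status"] != "CANCELLED"]

def field_rules(record):
    """Per-record rules: convert a source record into the target format."""
    return {"order_id": record["order_id"], "amount": float(record["amount"])}

def end_routine(result_package):
    """Runs last, on the package in target format: drop records that should not be updated."""
    return [r for r in result_package if r["amount"] != 0.0]

def run_transformation(source_package):
    package = start_routine(source_package)
    package = [field_rules(r) for r in package]
    return end_routine(package)

# Assumed sample data package.
data_package = [
    {"order_id": 1, "status": "OPEN", "amount": "50"},
    {"order_id": 2, "status": "CANCELLED", "amount": "75"},
    {"order_id": 3, "status": "OPEN", "amount": "0"},
]
result_package = run_transformation(data_package)
```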
-
InfoPackage / Data Transfer Process (DTP)
InfoPackages and DTPs initiate the data flow.
InfoPackage
An InfoPackage is used to load data into the PSA from any source using the DataSource structure.
An InfoPackage is a BI object that contains all the settings directing exactly how this data should
be uploaded from the source system.
The target of the InfoPackage is the PSA table tied
to the specific DataSource associated with the InfoPackage.
-
InfoPackage / Data Transfer Process (DTP)
Data Transfer Process
The DTP is the object that controls the actual data flow (filters and
update mode, delta or full) for a specific transformation.
-
Data Loads
Data loading is a two-step process:
The first step is loading the data from the source system. It involves multiple steps that differ depending on which source system is involved.
For example, if it is an SAP source system, a function call must be made to the other system, and an
extractor program associated with the DataSource might be initiated.
An InfoPackage is the BI object that contains all the settings directing exactly how this data should
be uploaded from the source system.
The target of the InfoPackage is the PSA table tied to the specific DataSource associated with the
InfoPackage.
The second step is the data transfer process. The DTP controls the actual data flow (filters and update mode, delta or full) for a specific transformation.
There can be more than one data transfer process if there is more than one transformation step or target in the ETL flow.
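The two steps can be imitated with a small class: one method stages a request in the PSA, the other transfers staged data with a filter and an update mode. This is a simplified sketch with hypothetical names; in particular, "delta" is modeled here as "only requests not yet transferred", which is an assumption made for the example rather than a description of the real delta mechanism.

```python
class DataFlow:
    """Toy model of the two-step load: InfoPackage into PSA, then DTP out of PSA."""

    def __init__(self):
        self.psa = []             # list of (request_id, rows)
        self.transferred = set()  # request ids a DTP has already processed

    def run_infopackage(self, request_id, extractor):
        """Step 1: call the extractor and stage its rows in the PSA."""
        self.psa.append((request_id, list(extractor())))

    def run_dtp(self, mode, row_filter=lambda row: True):
        """Step 2: transfer staged rows; 'delta' skips already transferred requests."""
        if mode not in ("full", "delta"):
            raise ValueError(f"unknown update mode: {mode}")
        rows_out = []
        for request_id, rows in self.psa:
            if mode == "delta" and request_id in self.transferred:
                continue
            rows_out.extend(row for row in rows if row_filter(row))
            self.transferred.add(request_id)
        return rows_out

def eu_only(row):
    return row["region"] == "EU"

flow = DataFlow()
flow.run_infopackage("REQ_1", lambda: [{"region": "EU", "amount": 10},
                                       {"region": "US", "amount": 5}])
first_delta = flow.run_dtp("delta", row_filter=eu_only)

flow.run_infopackage("REQ_2", lambda: [{"region": "EU", "amount": 7}])
second_delta = flow.run_dtp("delta", row_filter=eu_only)
full_load = flow.run_dtp("full")
```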
-
Data Reconciliation
An important aspect of ensuring the quality of
data in BI is the consistency of the data.
Data reconciliation allows you to check the integrity of the loaded data, for example
by comparing the totals of a key figure in the
DataStore object with the corresponding
totals in the PSA, which stores the data directly from the source system.
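The totals comparison described above amounts to summing one key figure on both sides and checking that the sums agree. The function names, the `tolerance` parameter, and the sample rows in this sketch are hypothetical.

```python
def total(rows, key_figure):
    """Sum one key figure over all rows."""
    return sum(row[key_figure] for row in rows)

def reconcile(psa_rows, dso_rows, key_figure, tolerance=0):
    """Compare the key figure total in the PSA with the total in the DataStore object."""
    psa_total = total(psa_rows, key_figure)
    dso_total = total(dso_rows, key_figure)
    return {
        "psa_total": psa_total,
        "dso_total": dso_total,
        "consistent": abs(psa_total - dso_total) <= tolerance,
    }

# Assumed sample data: the second DataStore load lost one record.
psa_rows = [{"order_id": 1, "amount": 100}, {"order_id": 2, "amount": 40}]
dso_ok = [{"order_id": 1, "amount": 100}, {"order_id": 2, "amount": 40}]
dso_missing = [{"order_id": 1, "amount": 100}]

good = reconcile(psa_rows, dso_ok, "amount")
bad = reconcile(psa_rows, dso_missing, "amount")
```

A totals check like this only flags that something differs; locating the affected records needs a comparison at record level.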