© 2012, http://www.journalofcomputerscience.com - TIJCSA All Rights Reserved 9
Data Warehousing Concept Using ETL Process For SCD Type-1
K. Srikanth1, N.V.E.S. Murthy2, J. Anitha3
1Andhra University, M.Tech (Ph.D), Visakhapatnam, India.
2Andhra University, Professor, Visakhapatnam, India.
3Andhra University, M.Tech (Ph.D), Visakhapatnam, India.
Abstract:
A Type 1 change overwrites an existing dimensional attribute with new information. In the customer name-change example, the new name overwrites the old name, and the value for the old version is lost. A Type 1 change updates only the attribute, doesn't insert new records, and affects no keys. The new incoming record (changed/modified data set) replaces the existing old record in the target. It is easy to implement but does not maintain any history of prior attribute values. Slowly Changing Dimensions (SCDs) are dimensions that have data that changes slowly, rather than changing on a time-based, regular schedule.
Keywords- ETL; Metadata; Mapping; Transformation.
I. INTRODUCTION
With Slowly Changing Dimensions (SCDs), data changes slowly[1], rather than changing on a time-based, regular schedule. For example, you may have a dimension in your database that tracks the sales records of your company's salespeople. Creating sales reports seems simple enough, until a salesperson is transferred from one regional office to another.
How do you record such a change in your sales dimension? You could calculate the sum or average of each salesperson's sales, but using that figure to compare the performance of salespeople might give misleading information. If the salesperson was transferred from a hot market where sales were easy to a market where sales are infrequent, his/her totals will look much stronger than those of the other salespeople in the new region. Or you could create a second salesperson record and treat the transferred person as a new salesperson, but that creates problems of its own.
K. Srikanth, N.V.E.S. Murthy, J. Anitha, The International Journal of Computer Science & Applications (TIJCSA), ISSN 2278-1080, Vol. 1 No. 10, December 2012
Dimensions that change over time are called Slowly Changing Dimensions. For instance, a product's price changes over time, and people change their names for various reasons[3]. These are a few examples of Slowly Changing Dimensions, since changes happen to them over a period of time. In a Type-1 load, the new incoming record (changed/modified data set) replaces the existing old record in the target. The implementation applies SCD Type-1 to source data from the Oracle EMP table (Table 1) and shows how the data is modified and stored.
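The overwrite behaviour described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the target is modelled as an in-memory dictionary keyed by EMPNO, and the function name `apply_type1` and the sample values are hypothetical, not part of the Informatica mapping.

```python
# Hypothetical in-memory target dimension keyed by EMPNO (illustrative values).
target = {
    7369: {"ename": "SMITH", "job": "CLERK", "sal": 800},
}

def apply_type1(target, empno, attrs):
    """Apply a Type 1 change: overwrite in place, or insert if the key is new."""
    if empno in target:
        target[empno].update(attrs)  # prior attribute values are lost
        return "update"
    target[empno] = dict(attrs)      # first time this key is seen
    return "insert"

first = apply_type1(target, 7369, {"ename": "SMITH", "job": "ANALYST", "sal": 1200})
second = apply_type1(target, 7999, {"ename": "JONES", "job": "CLERK", "sal": 900})
```

After the two calls the target still holds exactly one row per EMPNO, which is why Type 1 cannot answer questions about earlier attribute values.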
A. Implementation:
Source:
Table 1: Oracle SQL Query On EMP Table
II. SOURCE TABLE AND SOURCE ANALYZER
When you add a relational table source definition to a mapping, you need to connect it to a Source Qualifier transformation. The Source Qualifier transformation represents the records that the Informatica server reads when it runs a session (Figure 1).
Figure 1: Source Table and Source Analyzer
III. TARGET TABLE AND TARGET DESIGNER
Target definitions define the structure of tables in the target database, or the structure of file targets the Power Center Server creates when you run a workflow. If you add a target definition to the repository that does not exist in a relational database, you need to create the target tables in your target database (Figure 2). You do this by generating and executing the necessary SQL code within the Warehouse Designer.
Figure 2: Target Table and Target Designer
IV. EXPRESSION TRANSFORMATION IN INFORMATICA
Expression transformation is a connected, passive transformation used to calculate values on a single row[5]. Examples of calculations are concatenating the first and last name, adjusting employee salaries, and converting strings to dates. Expression transformation can also be used to test conditional statements before passing the data to other transformations.
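The row-level calculations mentioned above (concatenating names, adjusting a salary, converting a string to a date) can be illustrated with a small Python function. This is a sketch, not Informatica code: the column names, the 10% raise, and the date format are assumptions chosen for the example.

```python
from datetime import datetime

def expression_transform(row):
    """Passive, row-by-row calculation: one row in, one row out."""
    out = dict(row)
    # Concatenate first and last name.
    out["full_name"] = f"{row['first_name']} {row['last_name']}"
    # Adjust the salary (assumed 10% raise, rounded to two decimals).
    out["sal"] = round(row["sal"] * 1.10, 2)
    # Convert a string to a date (assumed DD-MM-YYYY input format).
    out["hiredate"] = datetime.strptime(row["hiredate"], "%d-%m-%Y").date()
    return out

row = {"first_name": "John", "last_name": "Smith", "sal": 800.0, "hiredate": "17-12-1980"}
result = expression_transform(row)
```

The function never changes the number of rows, which is what "passive" means for a transformation.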
A. Creating an Expression Transformation:
Follow the steps below to create an expression transformation:
1. In the Mapping Designer, create a new mapping or open an existing mapping.
2. On the toolbar, click Transformation -> Create and select the Expression transformation (Figure 3).
3. Enter a name, click Create, and then click Done.
Figure 3: Diagram for Expression Transformation
Figure 4: Creating Expression port logic
You can add ports to the expression transformation either by selecting and dragging ports from other transformations or by opening the expression transformation and creating ports manually (Figure 4). Here we add the port insert_flag with the string datatype. Its expression tests the employee key and returns 'TRUE' when the key is null (a new record) and 'FALSE' otherwise:
IIF(ISNULL(EMPKEY), 'TRUE', 'FALSE')
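As a rough Python equivalent of the flag expression, the sketch below treats `None` as a NULL employee key. It assumes that EMPKEY arrives from a lookup against the target, so a NULL means the employee is not yet loaded; the function name and sample rows are hypothetical.

```python
def insert_flag(empkey):
    """Mirror IIF(ISNULL(EMPKEY), 'TRUE', 'FALSE'); None models a NULL key."""
    return "TRUE" if empkey is None else "FALSE"

# Assumed rows after a lookup on the target: EMPKEY is None for unseen employees.
rows = [
    {"empno": 7369, "empkey": 1},
    {"empno": 7999, "empkey": None},
]
flags = [insert_flag(r["empkey"]) for r in rows]
```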
V. ROUTER TRANSFORMATION IN INFORMATICA
Router transformation is an active and connected transformation[8]. It is similar to the filter transformation in that it tests a condition and filters the data. In a filter transformation, you can specify only one condition, and rows that do not satisfy the condition are dropped. In a router transformation, by contrast, you can specify more than one condition and route the rows that meet each test condition to a separate group (Figure 5)[6]. Use a router transformation if you need to test the same input data against multiple conditions.
A. Creating Router Transformation
Follow the steps below to create a router transformation:
1. In the Mapping Designer, create a new mapping or open an existing mapping.
2. On the toolbar, click Transformation -> Create.
3. Select the Router transformation, enter a name, click Create, and then click Done.
4. Select the ports from the upstream transformation and drag them to the router transformation. You can also create input ports manually on the Ports tab.
Figure 5: Creating Router Transformation
We use the Router transformation to split the rows into two new groups, one named Insert and the other named Update. The flag literals must match the values produced by the expression transformation:
Insert: insert_flag = 'TRUE'
Update: insert_flag = 'FALSE'
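The two router groups can be sketched in Python as named predicates evaluated against every row. This is an illustration only: the `route` function, the DEFAULT bucket name, and the uppercase flag literals are assumptions made for the example, and the literals have to agree with whatever the flag expression actually emits.

```python
def route(rows, groups):
    """Send each row to every group whose condition it meets, else to DEFAULT."""
    routed = {name: [] for name in groups}
    routed["DEFAULT"] = []
    for row in rows:
        matched = False
        for name, condition in groups.items():
            if condition(row):
                routed[name].append(row)
                matched = True
        if not matched:
            routed["DEFAULT"].append(row)
    return routed

# One condition per group, tested against the same input rows.
groups = {
    "INSERT": lambda r: r["insert_flag"] == "TRUE",
    "UPDATE": lambda r: r["insert_flag"] == "FALSE",
}
rows = [
    {"empno": 7369, "insert_flag": "FALSE"},
    {"empno": 7999, "insert_flag": "TRUE"},
]
routed = route(rows, groups)
```

Unlike a filter, which keeps or drops rows on a single condition, every row here ends up in at least one output group.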
VI. UPDATE STRATEGY TRANSFORMATION IN INFORMATICA
Update strategy transformation is an active and connected transformation. It is used to insert, update, and delete records in the target table. It can also reject records so that they never reach the target table[7]. When you design a target table, you need to decide what data should be stored in the target.
When you want to maintain a history of the source in the target table, you insert a new record in the target table for every change in the source record. When you want an exact copy of the source data to be maintained in the target table, you update the corresponding records in the target whenever the source data changes[2]. The design of the target table decides how to handle the changes to existing rows (Figure 6). In Informatica, you can set the update strategy at two different levels:
• Session Level: Configuring at session level instructs the Integration Service either to treat all rows in the same way (insert, update, or delete) or to use the instructions coded in the session's mapping to flag rows for different database operations.
• Mapping Level: Use the update strategy transformation to flag rows for insert, update, delete, or reject.
A. Flagging Rows in Mapping with Update Strategy:
You have to flag each row for inserting, updating, deleting or rejecting. The constants and their numeric equivalents for each database operation are listed below.
• DD_INSERT: Numeric value is 0. Used for flagging the row as Insert.
• DD_UPDATE: Numeric value is 1. Used for flagging the row as Update.
• DD_DELETE: Numeric value is 2. Used for flagging the row as Delete.
• DD_REJECT: Numeric value is 3. Used for flagging the row as Reject.
Figure 6: Update Strategy Transformation
This mapping uses only Insert and Update in its Update Strategy transformations:
Transformation Attribute        Value
Update Strategy Expression      0 (DD_INSERT)
Update Strategy Expression      1 (DD_UPDATE)
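The effect of the four flags can be sketched in Python with the same numeric values. This is a simplified model, not the Integration Service's behaviour in full: the target is an assumed dictionary keyed by EMPNO, and the function name `apply_flagged_row` and sample rows are hypothetical.

```python
# Numeric equivalents of the update strategy constants.
DD_INSERT, DD_UPDATE, DD_DELETE, DD_REJECT = 0, 1, 2, 3

def apply_flagged_row(target, flag, row):
    """Apply one flagged row to an in-memory target keyed by EMPNO."""
    key = row["empno"]
    if flag == DD_INSERT:
        target[key] = dict(row)
    elif flag == DD_UPDATE:
        if key in target:            # an update of a missing key changes nothing
            target[key].update(row)
    elif flag == DD_DELETE:
        target.pop(key, None)
    elif flag == DD_REJECT:
        pass                         # the row never reaches the target
    else:
        raise ValueError(f"unknown update strategy flag: {flag}")

target = {7369: {"empno": 7369, "ename": "SMITH", "sal": 800}}
apply_flagged_row(target, DD_UPDATE, {"empno": 7369, "ename": "SMITH", "sal": 1200})
apply_flagged_row(target, DD_INSERT, {"empno": 7999, "ename": "JONES", "sal": 900})
apply_flagged_row(target, DD_REJECT, {"empno": 8000, "ename": "DOE", "sal": 1})
```

An SCD Type-1 load needs only the first two flags, since existing rows are overwritten and nothing is deleted or kept as history.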
VII. SEQUENCE GENERATOR TRANSFORMATION
• Passive and Connected Transformation.
• The Sequence Generator transformation generates numeric values.
• Use the Sequence Generator to create unique primary key values[5], replace missing primary keys, or cycle through a sequential range of numbers.
We use it mostly to generate surrogate keys in a data warehouse (DWH) environment. When we want to maintain history, we need a key other than the source primary key to uniquely identify each record, so we create a sequence 1, 2, 3, 4, and so on (Figure 7) and use this sequence as the key. Example: if EMPNO is the key, we can keep only one record per employee in the target and can't maintain history[10]. So we use the surrogate key as the primary key, not EMPNO.
A. Sequence Generator Ports :
The Sequence Generator transformation provides two output ports: NEXTVAL and CURRVAL.
• We cannot edit or delete these ports.
• Likewise, we cannot add ports to the transformation.
NEXTVAL:
Use the NEXTVAL port to generate sequence numbers by connecting it to a Transformation or target.
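Connecting NEXTVAL to a downstream port amounts to drawing the next number from a counter for each row. A minimal Python sketch, assuming a start value of 1 and an increment of 1 (the variable names and sample rows are hypothetical):

```python
from itertools import count

# Stand-in for the NEXTVAL port: yields 1, 2, 3, ... one value per row.
nextval = count(start=1, step=1)

rows = [{"empno": 7369}, {"empno": 7499}, {"empno": 7521}]

# Attach a surrogate key (EMPKEY) to each row, independent of EMPNO.
keyed = [{"empkey": next(nextval), **r} for r in rows]
```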
Figure 7: Sequence Generator Transformation
VIII. SCD TYPE-1 MAPPING DESIGN
Figure 8 shows the complete Slowly Changing Dimension mapping design flow. The flow shows in full how SCD Type-1 source data is loaded into the target and how the data processing is maintained.
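The whole flow (look up the key, flag the row, route it, then insert or update) can be approximated end to end with Python's built-in sqlite3 module. This is a sketch under assumptions, not the Informatica mapping itself: the table name emp_dim, its columns, and the sample rows are hypothetical, and the lookup, router, and update strategy steps are collapsed into one loop.

```python
import sqlite3
from itertools import count

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE emp_dim ("
    "empkey INTEGER PRIMARY KEY, empno INTEGER UNIQUE, ename TEXT, sal REAL)"
)
conn.execute("INSERT INTO emp_dim VALUES (1, 7369, 'SMITH', 800)")

def load_scd1(conn, source_rows):
    """Lookup -> flag -> route -> insert or update, overwriting in place."""
    max_key = conn.execute(
        "SELECT COALESCE(MAX(empkey), 0) FROM emp_dim"
    ).fetchone()[0]
    nextval = count(max_key + 1)  # surrogate key sequence
    for empno, ename, sal in source_rows:
        found = conn.execute(
            "SELECT empkey FROM emp_dim WHERE empno = ?", (empno,)
        ).fetchone()
        if found is None:
            # Insert group: no key found in the target, so this is a new record.
            conn.execute(
                "INSERT INTO emp_dim VALUES (?, ?, ?, ?)",
                (next(nextval), empno, ename, sal),
            )
        else:
            # Update group: overwrite attributes, keep the surrogate key.
            conn.execute(
                "UPDATE emp_dim SET ename = ?, sal = ? WHERE empkey = ?",
                (ename, sal, found[0]),
            )
    conn.commit()

load_scd1(conn, [(7369, "SMITH", 1200.0), (7999, "JONES", 900.0)])
rows = conn.execute(
    "SELECT empkey, empno, ename, sal FROM emp_dim ORDER BY empkey"
).fetchall()
```

The existing employee keeps surrogate key 1 with the new salary, and the new employee receives the next key in the sequence; no second row is ever created for the same EMPNO.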
Figure 8: Slowly Changing Dimensions (SCDs) Flow
A. Insert:
New employee records are inserted and changed records are updated in the target table; the complete result is shown in Table 2. The same data is displayed in graphical mode in the ETL tool after the insert and update, as shown in Table 3.
Table 2: New record inserted table
Table 3: Display the Designer Preview Data
Result: The preview data is displayed using Slowly Changing Dimensions (SCDs) Type-1 only. The new incoming record (changed/modified data set) replaces the existing old record in the target.
Source Data: Table 1
Target Data: Table 2, Table 3 [graphical view]
IX. CONCLUSIONS AND FUTURE WORK
Extraction-Transformation-Loading (ETL) tools are pieces of software responsible for the extraction of data from several sources. In this paper, we have focused on the Type 1 change, which updates only the attribute, doesn't insert new records, and affects no keys. It is easy to implement but does not maintain any history of prior attribute values. Slowly Changing Dimensions (SCDs) are dimensions that have data that changes slowly, rather than changing on a time-based, regular schedule. Under the framework of conventional ETL, the ETL process is defined[7] for each data source: a program or script is developed and compiled, and records are retrieved from the database. In this paper, a useful engineering study for ETL tool selection was developed. In the end, all three initial objectives were achieved[9]. Comprehensive ETL criteria were identified, testing procedures were developed, and this work was applied to commercial ETL tools. The study covered all major aspects of ETL usage and can be used to effectively compare and evaluate various ETL tools.
REFERENCES
[1] I. William, S. Derek, and N. Genia, DW 2.0: The Architecture for the Next Generation of Data
Warehousing. Burlington, MA: Morgan Kaufman, 2008, pp. 215-229.
[2] R. J. Davenport, September 2007. [Online] ETL vs. ELT: A Subjective View. In Source IT
Consulting Ltd., U.K. Available at: http://www.insource.co.uk/pdf/ETL_ELT.pdf.
[3] T. Jun, C. Kai, Feng Yu, T. Gang, “The Research and Application of ETL Tools in Business Intelligence Project,” in Proc. International Forum on Information Technology and Applications, 2009, IEEE, pp. 620-623.
[4] Kimball, R., Caserta, J.: The Data Warehouse ETL Toolkit: Practical Techniques for Extracting, Cleaning, Conforming, and Delivering Data. John Wiley & Sons, 2004.
[5] Labio, W., Garcia-Molina, H.: Efficient Snapshot Differential Algorithms for Data Warehousing. VLDB, 1996.
[6] Informatica Power Center, Available at: www.informatica.com/ products/ data integration/ power center/ default.htm .
[7] Teradata, Available at: www.teradata.com.
[8] Sun SPARC M9000 Processor, Available at: http://www.sun.com/servers/highend/m9000/
[9] L. Troy, C. Pydimukkala, How to Use Power Center with Teradata to Load and Unload Data, Informatica Corporation [Online], Available at: www.myinformatica.com.
[10] Widom, J.: Research Problems in Data Warehousing. CIKM, 1995.