ETL TESTING
This guide covers the following sections:
1. Data warehouse concepts
2. ETL development life cycle
3. ETL test plan
4. ETL testing life cycle (or) ETL test process
5. Types of ETL testing
6. Types of ETL bugs
7. Bug reporting
8. Testing templates (test case, bug reporting, etc.)
9. ETL performance testing
10. ETL interview questions
11. Project with example
12. SQL
13. Unix
1. Data warehouse concepts
A data warehouse is a relational database that holds a subject-oriented, integrated, time-variant and non-volatile collection of data used to support the strategic decision-making process.
Data warehouse Architecture:
2. ETL development life cycle
To learn ETL testing, SQL is mandatory and some knowledge of Unix is expected. Both are covered in the last sections of this guide.
ETL Testing:- ETL testing is similar to manual testing: the checks are carried out by a person rather than by an automated tool.
Once the ETL developer has inserted or updated data in the data mart, we test that data mart before it is loaded into the centralized data warehouse. This test is called ETL testing.
ETL development life cycle:
1. Requirement analysis
2. High level design
3. Low level design
4. Development
5. SIT (system integration testing)
6. Review
7. Testing (this is where the ETL testing life cycle runs)
8. UAT (user acceptance testing)
9. Production
3. ETL test plan
Test plan for a banking project:
Introduction: Banking
Background: Informatica, Oracle 10g
Test items: Fixed deposits, withdrawals
Features to be tested: For example, password; non-secure fields are to be tested
Approach: Types of ETL testing
Testing levels: Sanity, smoke
Feature pass and fail criteria: How many test cases pass and how many fail
Suspension criteria: Rules set by the company
Test environment: Staging server, client server (Alpha), production server (Beta), live server
Test deliverables: Test cases, bug logging, test procedure
Scheduled tasks: The timetable of the project or module
Staff and training: Required persons
Risk and mitigation: General holidays, sick leave
Sign-off: Higher authority
Features not to be tested: Secure fields and tables
4. ETL testing life cycle (or) ETL test process
ETL TESTING LIFE CYCLE:-
5. Types of etl testing
1) Constraint Testing:
In the constraint testing phase, the test engineer verifies whether the data is mapped correctly from source to target.
The test engineer covers the following constraints in the ETL testing process:
a) NOT NULL
b) UNIQUE
c) Primary Key
d) Foreign key
e) Check
f) Default
g) NULL
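A few of the constraint checks listed above can be sketched as queries against the target table. This is a minimal illustration using Python's built-in sqlite3 module rather than Oracle; the table name tgt_emp, its rows, and the rule that salary must be positive are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical target table, created without constraints so that
# bad rows can slip in and be caught by the test queries.
conn.executescript("""
    CREATE TABLE tgt_emp (eno INTEGER, ename TEXT, sal REAL);
    INSERT INTO tgt_emp VALUES (1, 'SMITH', 800), (2, 'ALLEN', 1600),
                               (2, 'WARD', 1250), (4, NULL, -50);
""")

# NOT NULL check: rows where a mandatory column is empty.
null_names = conn.execute(
    "SELECT COUNT(*) FROM tgt_emp WHERE ename IS NULL").fetchone()[0]

# UNIQUE / primary key check: key values that occur more than once.
dup_keys = [row[0] for row in conn.execute(
    "SELECT eno FROM tgt_emp GROUP BY eno HAVING COUNT(*) > 1")]

# CHECK constraint (assumed rule): salary must be positive.
bad_sal = conn.execute(
    "SELECT COUNT(*) FROM tgt_emp WHERE sal <= 0").fetchone()[0]

print(null_names, dup_keys, bad_sal)
```

Any non-zero count or non-empty list here would be raised as a constraint bug.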
2) Source to Target Count Testing:
This test checks whether the number of records in the source matches the number in the target. The order of the rows (ascending or descending) does not matter; only the count is required.
A tester can fall back on this type of testing when time is short.
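A count comparison can be sketched as below, assuming hypothetical src_emp and tgt_emp tables in an in-memory SQLite database. Note that the rows sit in a different order in the two tables, which does not affect the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical source and target tables; one row is missing in the target.
conn.executescript("""
    CREATE TABLE src_emp (eno INTEGER, ename TEXT);
    CREATE TABLE tgt_emp (eno INTEGER, ename TEXT);
    INSERT INTO src_emp VALUES (1, 'SMITH'), (2, 'ALLEN'), (3, 'WARD');
    INSERT INTO tgt_emp VALUES (3, 'WARD'), (1, 'SMITH');
""")

def row_count(table):
    # The table name is a trusted, hard-coded value in this sketch.
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

src_count = row_count("src_emp")
tgt_count = row_count("tgt_emp")
counts_match = src_count == tgt_count

print(src_count, tgt_count, counts_match)
```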
3) Source to Target Data Validation Testing:
In this testing, the tester validates each and every value of the data moved from source to target.
In most financial projects, the tester pays particular attention to decimal values.
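One common way to validate every value is a set difference between source and target. The sketch below assumes hypothetical src_emp and tgt_emp tables and uses SQLite's EXCEPT operator; in Oracle the equivalent operator is MINUS. The target row for ALLEN has lost a decimal digit, which is the kind of mismatch this test is meant to catch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: the second target salary was truncated during the load.
conn.executescript("""
    CREATE TABLE src_emp (eno INTEGER, ename TEXT, sal NUMERIC);
    CREATE TABLE tgt_emp (eno INTEGER, ename TEXT, sal NUMERIC);
    INSERT INTO src_emp VALUES (1, 'SMITH', 800.50), (2, 'ALLEN', 1600.25);
    INSERT INTO tgt_emp VALUES (1, 'SMITH', 800.50), (2, 'ALLEN', 1600.2);
""")

# Rows present in the source that have no identical row in the target.
mismatches = conn.execute("""
    SELECT eno, ename, sal FROM src_emp
    EXCEPT
    SELECT eno, ename, sal FROM tgt_emp
""").fetchall()

print(mismatches)
```

Running the same query with the two tables swapped would show rows that exist only in the target.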
4) Threshold/Data Integrated Testing:
In this testing, the test engineer checks the ranges of the data. It is typically used for population calculations, share market data and business finance analysis (quarterly, half-yearly, yearly).
MIN MAX RANGE
4 10 6
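The MIN, MAX and RANGE figures above can be computed with one aggregate query. This is a small sketch on a hypothetical tgt_values table whose contents were chosen to reproduce the 4, 10, 6 example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical target column holding the values under test.
conn.executescript("""
    CREATE TABLE tgt_values (val INTEGER);
    INSERT INTO tgt_values VALUES (4), (7), (10), (5);
""")

# Range is simply the maximum minus the minimum.
min_val, max_val, value_range = conn.execute(
    "SELECT MIN(val), MAX(val), MAX(val) - MIN(val) FROM tgt_values"
).fetchone()

print(min_val, max_val, value_range)
```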
5) Field to Field Testing:
In field to field testing, the test engineer checks how much space each field occupies in the database and that the data fits the data types of the table.
NOTE: Also check the order of the columns and that each source column maps to the right target column.
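A field to field comparison can be sketched by reading the column metadata of both tables and comparing it position by position. The example below uses SQLite's PRAGMA table_info (Oracle exposes similar metadata through its data dictionary views); the tables and the shortened ename length are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables: the target declares a shorter ename field.
conn.executescript("""
    CREATE TABLE src_emp (eno INTEGER, ename VARCHAR(20), sal NUMERIC(7,2));
    CREATE TABLE tgt_emp (eno INTEGER, ename VARCHAR(10), sal NUMERIC(7,2));
""")

def columns(table):
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk);
    # they come back in column order, so order is compared as well.
    return [(row[1], row[2]) for row in conn.execute(f"PRAGMA table_info({table})")]

src_cols = columns("src_emp")
tgt_cols = columns("tgt_emp")

# Pair up columns by position and keep the pairs that differ.
field_mismatches = [(s, t) for s, t in zip(src_cols, tgt_cols) if s != t]

print(field_mismatches)
```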
6) Duplicate Check Testing:
In this phase of ETL testing, a tester runs into duplicate values very frequently. Because the source and target tables hold a huge amount of data, the tester uses database queries to find them.
SELECT ENO, ENAME, SAL, COUNT(*) FROM EMP GROUP BY ENO, ENAME, SAL HAVING COUNT(*) > 1;
Note:
1) Duplicates may arise when the primary key is defined incorrectly or no primary key is allotted.
2) Duplicates may arise when a developer makes a mistake while transferring the data from source to target.
3) Duplicates may also arise from environment mistakes (for example, improper plugins in the tool).
7) Error/Exception Logical Testing:
1) Delimiter is available in Valid Tables
2) Delimiter is not available in invalid tables(Exception Tables)
8) Incremental and Historical Process Testing:
An incremental load must not corrupt the historical data. If the historical data is corrupted after an incremental load, a bug is raised.
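One simple way to check this is to snapshot the historical rows before the incremental load and compare them afterwards. The sketch below assumes a hypothetical tgt_emp table and an incremental load that only appends one new row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical target table that already holds historical rows.
conn.executescript("""
    CREATE TABLE tgt_emp (eno INTEGER PRIMARY KEY, ename TEXT, sal REAL);
    INSERT INTO tgt_emp VALUES (1, 'SMITH', 800), (2, 'ALLEN', 1600);
""")

# Snapshot of the historical data before the incremental load.
history_before = conn.execute(
    "SELECT eno, ename, sal FROM tgt_emp ORDER BY eno").fetchall()
last_old_key = history_before[-1][0]

# Hypothetical incremental load: appends new rows only.
conn.execute("INSERT INTO tgt_emp VALUES (3, 'WARD', 1250)")
conn.commit()

# Re-read only the rows that existed before the load and compare.
history_after = conn.execute(
    "SELECT eno, ename, sal FROM tgt_emp WHERE eno <= ? ORDER BY eno",
    (last_old_key,)).fetchall()
history_intact = history_before == history_after
total_rows = conn.execute("SELECT COUNT(*) FROM tgt_emp").fetchone()[0]

print(history_intact, total_rows)
```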
9) Control Columns and Defect Values Testing:
This is introduced by IBM
10) Navigation Testing:
Navigation testing is testing from the end user's point of view. If an end user cannot find their way through the application easily, the navigation is called bad or poor navigation.
During testing, the tester identifies such navigation scenarios so that unnecessary navigation can be avoided.
11) Initialization testing:
Testing the combination of hardware and software installed on the platform is called initialization testing.
12) Transformation Testing:
If a transformation applied while mapping from the source table to the target table does not match the mapping condition, the test engineer raises a bug.
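A transformation check recomputes the mapping rule from the source and compares it with what the target holds. The rule used here (target annual_sal = source sal * 12) and both tables are hypothetical; a real project would take the rule from its mapping document.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical mapping rule: tgt_emp.annual_sal = src_emp.sal * 12.
# The second target row was loaded without applying the rule.
conn.executescript("""
    CREATE TABLE src_emp (eno INTEGER, sal INTEGER);
    CREATE TABLE tgt_emp (eno INTEGER, annual_sal INTEGER);
    INSERT INTO src_emp VALUES (1, 800), (2, 1600);
    INSERT INTO tgt_emp VALUES (1, 9600), (2, 1600);
""")

# Rows where the loaded value differs from the expected transformed value.
violations = conn.execute("""
    SELECT s.eno, s.sal * 12 AS expected, t.annual_sal AS actual
    FROM src_emp s
    JOIN tgt_emp t ON t.eno = s.eno
    WHERE t.annual_sal <> s.sal * 12
""").fetchall()

print(violations)
```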
13) Regression Testing:
Modifying code to fix a bug or to implement new functionality can introduce new errors.
These introduced errors are called regressions. Checking for them is called regression testing.
14) Retesting:
Re executing the failed test cases after fixing the bug.
15) System Integration Testing:
Integration testing: After programming is complete, the developers integrate the modules. There are 3 models:
a) Top Down
b) Bottom Up
c) Hybrid
6. Types of etl bugs
1. User interface bugs/cosmetic bugs:- Related to GUI of application
Navigation, spelling mistakes, font style, font size, colors, alignment.
2. BVA Related bug:-
Minimum and maximum values
3. ECP Related bug:-
Valid and invalid type
4. Input/output bugs:-
Valid values not accepted
Invalid values accepted
5. Calculation bugs:-
Mathematical errors
Final output is wrong
6. Load condition bugs:-
Does not allow multiple users
Does not allow the load expected by the customer
7. Race condition bugs:-
System crashes and hangs
System cannot run on client platforms
8. Version control bugs:-
No logo matching
No version information available
This occurs usually in regression testing
9. H/W bugs:-
Device is not responding to the application
10. Source bugs:-
Mistakes in help documents
7. Bug reporting
Bug Life Cycle (or) Defect Tracking Process
DETECT DEFECT
REPRODUCE DEFECT
REPORT DEFECT
BUG FIXING
BUG RESOLVING
BUG CLOSING
8. Testing templates
1. Issue log/Clarification template
2. Test case template
3. Bug reporting template
4. Metrics template
Issue log/Clarification template:-
Columns: Reference (Doc Name), Issue Description, Clarification provider, Status, Raised date, Clarified date, Clarified by, Remarks
Test case template:-
Columns: S.NO, TC_ID, Description, Expected Result, Status, Query, Comment
Bug reporting template:-
Columns: Defect_ID, Description, Build_ID, Version_ID, Severity, Priority, Status, Assigned to, Detected By
Metrics template:-
Columns: Date, No. of test cases designed, No. of test cases executed, No. of test cases failed, No. of test cases on hold, No. of defects logged, Comments
9. ETL performance testing
ETL Performance Tuning:
In the ETL performance testing phase, a tester can work at the database level or the core database level. A database tester and an ETL tester can both be involved in performance tuning as well. Performance tuning is server-side work.
What is Performance Testing?
Performance testing checks the server response under different user loads. Its purpose is to find bottlenecks in the application.
What is a Bottleneck?
A bottleneck is the point at which the server reaches its peak: the breaking point where the server is fully occupied with user requests.
ETL Performance Life Cycle:
• Workflow requirements
• Performance objective
• Performance testing
• Performance measurements
• Performance tuning
• ETL workflow requirements:
In the workflow requirements phase, the ETL tester identifies the performance scenarios: how the database connects to the server, which environment supports performance testing, and which front-end and back-end environments, batch jobs, data merging, file system components and reporting events need to be checked.
• Performance objective:
The performance objective sets the scope for end-to-end performance testing. Most of the time it is decided by the client.
• Performance testing:
To measure the speed of the project, the ETL tester tests at the database level and checks whether the target is loaded properly. When the ETL developer does not load the data under proper conditions, the performance of the system suffers.
• Performance measurements:
During performance test execution, we need to measure the metrics below.
1. Client-side metrics
2. Hits/sec
3. Throughput
4. Memory allocation
5. Process resources
6. Database statistics and database user conditions
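As a rough illustration of a throughput measurement, the sketch below times a bulk load and reports rows loaded per second. It is not a real load test: the tgt_emp table, the 10,000 generated rows and the in-memory SQLite database are all stand-ins chosen so the example runs anywhere.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tgt_emp (eno INTEGER, ename TEXT, sal INTEGER)")

# Hypothetical batch of rows standing in for one ETL load.
rows = [(i, f"EMP{i}", 1000 + i) for i in range(10_000)]

start = time.perf_counter()
conn.executemany("INSERT INTO tgt_emp VALUES (?, ?, ?)", rows)
conn.commit()
elapsed = time.perf_counter() - start

loaded = conn.execute("SELECT COUNT(*) FROM tgt_emp").fetchone()[0]
# Throughput in rows per second; guard against a zero-length timing.
throughput = loaded / elapsed if elapsed > 0 else float("inf")

print(f"loaded {loaded} rows in {elapsed:.4f}s ({throughput:.0f} rows/sec)")
```

The same timing pattern can be wrapped around a real load job, with the result compared against the performance objective agreed with the client.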
• Performance tuning:
Performance tuning is the mechanism for getting performance-related issues fixed. As performance testers, we give recommendations to the team responsible for tuning at each level:
Code level: Developer
Database level: DBA
Network level: Administrator
System level: S/A
Server level: Server-side team
10. ETL interview questions:
1. What is the difference between OLAP and OLTP?
2. Tell me about your ETL workflow process.
3. What is the difference between an operational database and a warehouse?
4. What type of approach do you follow in your project?
5. What is the difference between a data mart and a data warehouse?
6. Which type of database are you using in your project, and how much space does it take?
7. Explain the test case template.
8. What is the difference between Severity and Priority?
9. What is the difference between SDLC and STLC?
10. What is the difference between Issue Log and Clarification Log?
11. What types of bugs have you faced in your project?
12. What is banking?
13. What are the types of banking?
14. What is the difference between a dimension table and a fact table?
15. Explain SCDs and their types. How are they used?
16. Explain bug reporting.
17. Are you using any models in SDLC?
18. Which process is used in ETL testing?
19. What is unit testing? Who does it?
20. What is the difference between incremental load and initial load?
21. Which document did you follow for your project?
22. Are you using the Requirements tab in QC?
11. Project
Here the emp table is taken as the example. The test scenarios and test cases below are written for it, that is, we are testing the emp table.
Check List or Test Scenarios:-
1. To validate the data in the table (emp)
2. To validate the table structure
3. To validate the null values of the table
4. To validate the null values of every attribute
5. To check the duplicate values of the table
6. To check the duplicate values of each attribute of the table
7. To check the field value or space (length of the field size)
8. To check the constraints (foreign key, primary key)
9. To check the names of the employees who have not earned any commission
10. To check all the employees who work in a given dept no (Accounts dept, Sales dept)
11. To check the row count of each attribute
12. To check the row count of the table
13. To check the max salary from the emp table
14. To check the min salary from the emp table
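Several of the scenarios above translate directly into queries. The sketch below runs a few of them against a small emp table in SQLite. The columns comm and deptno, the sample rows, and the use of department numbers 10 and 30 for the accounts and sales departments are assumptions for the example, not part of the checklist itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical emp table with a handful of sample rows.
conn.executescript("""
    CREATE TABLE emp (eno INTEGER, ename TEXT, sal INTEGER,
                      comm INTEGER, deptno INTEGER);
    INSERT INTO emp VALUES
        (1, 'SMITH', 800, NULL, 20),
        (2, 'ALLEN', 1600, 300, 30),
        (3, 'WARD', 1250, 500, 30),
        (4, 'KING', 5000, NULL, 10);
""")

def scalar(sql):
    """Run a query that returns a single value."""
    return conn.execute(sql).fetchone()[0]

def names(sql):
    """Run a query and return its first column as a list."""
    return [row[0] for row in conn.execute(sql)]

row_count = scalar("SELECT COUNT(*) FROM emp")                     # scenario 12
null_comm = scalar("SELECT COUNT(*) FROM emp WHERE comm IS NULL")  # scenario 4, one attribute
max_sal = scalar("SELECT MAX(sal) FROM emp")                       # scenario 13
min_sal = scalar("SELECT MIN(sal) FROM emp")                       # scenario 14
no_commission = names(                                             # scenario 9
    "SELECT ename FROM emp WHERE comm IS NULL OR comm = 0 ORDER BY ename")
in_depts = names(                                                  # scenario 10
    "SELECT ename FROM emp WHERE deptno IN (10, 30) ORDER BY ename")

print(row_count, null_comm, max_sal, min_sal, no_commission, in_depts)
```

Each result would be recorded in the test case template as the actual result and compared with the expected value from the source system.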
Introduction to database:-
Data: The properties of anything are called data.
Ex:- Meaningful facts, text, graphics, images, sound, video segments
Information: Data processed to be useful in decision making
Ex:- A student got 1st rank.
Database: A place to store the information
In earlier days, information was stored in flat file systems such as:
1. Spreadsheets
2. Folders
3. Ledgers
4. Lists
The storage methods mentioned above are called flat file systems.
Disadvantages:-
Data redundancy
Limited data sharing
Excessive program maintenance
File System Approach:
A separate file has to be maintained for each program.
To avoid these drawbacks, the RDBMS came into the picture.
RDBMS:
It is an advanced version of a DBMS, with relationships.
It stores and manages data more efficiently than a DBMS.
RDBMS Approach
The database does not allow programs to connect to the stored data directly, so access goes through the RDBMS.
SQL
SQL stands for Structured Query Language. Its purpose is to store and manage information in a relational database.
SQL is a set of standards maintained by ANSI.
Installation of Oracle 10g, 11g:
Install Oracle 10g (for example on Windows XP) or Oracle 11g (for example on Windows 7).
Once SQL is installed, work through the content below and practice as you go.
DATAWAREHOUSE-BASICS
What is a Data warehouse? Why do we need a Data warehouse?
According to Ralph Kimball:
A data warehouse is a relational database that is designed for querying and analyzing the business, not for transaction processing. It usually contains historical data derived from transactional data (different source systems).
According to W.H. Inmon:
A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data used to support the strategic decision-making process.
Characteristic features of a Data warehouse:
1. Subject Oriented
2. Integrated
3. Non-volatile
4. Time Variant
Note: The first data warehousing system was implemented in 1987 by W.H. Inmon.
Subject Oriented: Data warehouses are designed to be subject oriented: they are used to analyze the business by top-level management, by middle-level management, or by an individual department in an enterprise.
Transactional storage is process oriented; data warehouse storage is subject oriented.
For example, to learn more about your company's sales data, you can build a warehouse that concentrates on sales. Using this warehouse, you can answer questions like "Who was our best customer for this item last year?" This ability to define a data warehouse by subject matter, sales in this case, makes the data warehouse subject oriented.
Integrated: A data warehouse is an integrated database which contains the business information collected from various operational data sources.
Integration of data (transactional storage to data warehouse storage):
Encoding: Appl. A - M, F; Appl. B - 1, 0; Appl. C - X, Y; warehouse - M, F
Unit of attributes: Appl. A - pipeline cm; Appl. B - pipeline inches; Appl. C - pipeline mcf; warehouse - pipeline cm
Physical attributes: Appl. A - balance dec(13,2); Appl. B - balance PIC 9(9)V99; Appl. C - balance float; warehouse - balance dec(13,2)
Naming conventions: Appl. A - bal-on-hand; Appl. B - current_balance; Appl. C - balance; warehouse - balance
Data consistency: Appl. A - date (Julian); Appl. B - date (yymmdd); Appl. C - date (absolute); warehouse - date (Julian)
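The encoding part of integration can be sketched as a lookup that converts each application's gender code into the single M/F encoding kept in the warehouse. Which source code stands for which gender in applications B and C is not stated in the text, so the mappings below (1 and X for M, 0 and Y for F) are assumptions, and the function name is hypothetical.

```python
# Assumed per-application encodings mapped to the warehouse standard (M, F).
GENDER_MAPS = {
    "A": {"M": "M", "F": "F"},
    "B": {"1": "M", "0": "F"},   # assumption: 1 = male, 0 = female
    "C": {"X": "M", "Y": "F"},   # assumption: X = male, Y = female
}

def to_warehouse_gender(app, raw_value):
    """Convert a source gender code to the warehouse encoding.

    Raises KeyError for an unknown application or code, so that bad
    source values surface as errors instead of being loaded silently.
    """
    return GENDER_MAPS[app][str(raw_value).upper()]

# Records from three source applications, each with its own encoding.
source_rows = [("A", "F"), ("B", 1), ("C", "y")]
integrated = [to_warehouse_gender(app, value) for app, value in source_rows]

print(integrated)
```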
Time Variant: A data warehouse is a time-variant database which allows you to analyze and compare the business across various time periods (year, quarter, month, week, day), because it maintains historical data.
Transactional storage holds current data; data warehouse storage holds historical data.
Non-volatile: A data warehouse is a non-volatile database. That means once data has entered the data warehouse it cannot change. It does not reflect the changes taking place in the operational database, hence the data is static.
Transactional storage is volatile; data warehouse storage is non-volatile.
According to Babcock:
A data warehouse is a repository of data summarized or aggregated in simplified form from operational systems. End-user oriented data access and reporting tools let users get at the data for decision support.
Why do we need a Data warehouse?
1. To store large volumes of historical detail data from mission-critical applications
2. Better business intelligence for end users
3. Data security: to prevent unauthorized access to sensitive data
4. Replacement of older, less-responsive decision support systems
5. Reduction in time to locate, access, and analyze information
Evolution:
1. 60's: Batch reports
a) hard to find and analyze information
b) inflexible and expensive, reprogram every new request
2. 70's: Terminal-based DSS and EIS (executive information systems)
a) still inflexible, not integrated with desktop tools
3. 80's: Desktop data access and analysis tools
a) query tools, spreadsheets, GUIs
b) easier to use, but only access operational databases
4. 90's: Data warehousing with integrated OLAP engines and tools
What is an Operational System? OR What is OLTP?
1. Operational systems are the systems that help us run the day-to-day enterprise operations.
2. On Line Transactional Processing systems are not built to hold history data.
3. These systems hold current data only.
4. The data in these systems is maintained in 3NF. It is used for running the business, not for analyzing it.
5. Examples are online reservations, credit-card authorizations, and ATM withdrawals.
Difference between OLTP and Data warehouse (OLAP)
In general we can assume that OLTP systems provide source data to data warehouses, whereas OLAP systems help to analyze it.
Operational System (OLTP) | Data warehouse (OLAP)
Designed to support business transactional processing | Designed to support the decision-making process
Application oriented data | Subject oriented data
Current data | Historical data
Detailed data | Summary data
Volatile data | Non-volatile data
Less history (3-6 months) | More history (5-10 years)
Normalized data | De-normalized data
Designed for running the business | Designed for analyzing the business
Supports E-R modeling | Supports dimensional modeling
Clerical users can access this data | Knowledge users can access this data
DB size 100MB-GB | DB size 100GB-TB
Few indexes | Many indexes
Many joins | Some joins
Advantages of Data Warehousing:
1. High query performance
2. Queries not visible outside warehouse
3. Can operate when sources unavailable
4. Can query data not stored in a DBMS
5. Extra information at warehouse
a) Modify, summarize (store aggregates)
b) Add historical information
6. Improves the quality and accessibility of data
7. Reduces the requirements of users to access operational data
8. Allows new reports and studies to be introduced without disrupting operational systems
9. Increases the amount of information available to users
Types of Data warehouse:
There are three types of data warehouses.
Centralized data warehouse: A centralized DW is one in which data is stored in a single, large primary database. This database can be queried directly or used to feed data marts.
Functional data warehouse: A functional DW is dedicated to a subset of the business, such as a marketing or finance business function.
1. Separate DWs for different business capabilities
2. Easier to build initially