DWH Testing

Upload: karthik-raparthy

Posted on 12-Oct-2015



TRANSCRIPT

  • Business Value Assurance / Advanced DWH (Testing)

  • Table of Contents
    1. Challenges faced by the testing team in real-time scenarios.
    2. Challenges faced by the team in different phases of STLC.
    3. What tools are available & used for testing DWH at different stages.
    4. Any automation tool available for DWH.
    5. Any tool available and used to ensure data quality.
    6. How it is ensured that the data sample selected ensures completeness.
    7. How is data reconciliation done.
    8. How to test bulk data.
    9. Some information on performance tools and how the results are analyzed.

  • Challenges faced by the testing team in real-time scenarios.
    Challenges Faced: Lack of skilled testers
    Results: Resulted in incomplete and inadequate testing, which led to a lot of effort being spent on finding and reporting bugs.

  • Challenges Faced: Lack of availability of standard test data / datasets during testing
    Results: Led to insufficient test coverage.

  • Challenges Faced: The team members had insufficient knowledge of the domain standards
    Results: Resulted in inadequate testing.

  • Challenges Faced: Poor understanding of requirements, and miscommunication or no communication with the end-users during testing/development cycles
    Results: No specifics of what an application should or shouldn't do (the application's requirements), leading to poor quality of testing.

  • Challenges Faced: Not recording non-reproducible defects
    Results: Many times testers came across bugs during random / exploratory testing that appeared only on specific configurations and were non-reproducible. This made the testing task extremely tedious and time-consuming, as there would often be random hangs in the product.

  • Challenges Faced: Tedious manual verification and testing of the complete application
    Results: Even though this helped developers interpret specific results, it had to be done on a wide range of datasets and was repetitive work. Testing each and every combination was also challenging.

  • Challenges Faced: Interdependencies of components in the software
    Results: Since the software was complex, with different components, changes in one part of the software often caused breaks in other parts. There was pressure to handle current functionality changes, checks on previously working functionality, and bug tracking.

  • Challenges Faced: Testing always under time constraints
    Results: Often there was slippage in other phases of the project, which reduced the time left for testing against a committed end date to the customer. It was also observed that testers would simply focus on task completion and not on test coverage and quality of work. Testing was taken up as the last activity in the project life cycle, and there was always pressure to squeeze it into a short time.

  • Challenges Faced: Test system inadequacy, lack of dedicated resources for the test team, and underestimating testing effort in project estimates
    Results: Testing time was affected by the lack of dedicated test systems given to the test team; testers got assigned to test multiple modules, and developers were finally moved onto the testing job. Test engineers were forced to work at odd hours/weekends as the limited resources were controlled by the development team, and test engineers were given a lower priority during allocation of resources. The testing team was not involved during the scoping phase, and its effort was typically underestimated. This led to lower quality of testing, as sufficient effort could not be put in.

  • Challenges Faced: Lack of involvement of the test team across the entire life cycle
    Results: Test engineers were involved late in the life cycle, which limited their contribution to black box testing. The project team didn't use the services of the test team for the unit and integration testing phases. Because the testers were involved only in the testing phase, the test engineers took time to understand all the requirements of the product, were overloaded, and were finally forced to work many late hours.

  • Challenges Faced: Problems faced in coping with attrition
    Results: A few key employees left the company at very short career intervals, and management had a hard time coping with the attrition rate. New testers brought into the project required training from the beginning, and as this was a complex project it was difficult to understand, causing delays to the release date.

  • Challenges Faced: Hard or subtle bugs remained unnoticed
    Results: Since there was a lack of skilled testers and domain expertise, some testers concentrated more on finding easy bugs that did not require deep understanding.

  • Challenges Faced: Lack of relationship with the developers, and no documentation accompanying the releases provided to the test team
    Results: This is a big challenge. With no proper documentation accompanying the releases provided to the test team, the test engineers are not aware of the known issues, the main features to be tested, etc. Hence a lot of effort is wasted.

  • Challenges Faced: Problems faced in coping with scope creep and changes to the functionality
    Results: Delays in the implementation date because of a lot of rework. Since there were dependencies among parts of the project and frequent changes to be incorporated, many bugs resulted in the software.

  • Though automated testing has a lot of benefits, it also has some associated challenges:
    i. Selection of test tool
    ii. Customization of tool
    iii. Selection of automation level
    iv. Development and verification of scripts
    v. Implementation of a test management system

  • Challenges faced by the team in different phases of STLC.
    Testing the complete application: Is it possible? I think it is impossible. There are millions of test combinations. It's not possible to test each and every combination in either manual or automation testing. If you try all these combinations you will never release the product.

  • Misunderstanding of company processes: Sometimes you just don't pay proper attention to what the company-defined processes are and what purposes they serve. There is a myth among testers that they should always follow company processes, even when those processes are not applicable to their current testing scenario. This results in incomplete and inappropriate application testing.

  • Relationship with developers: A big challenge. It requires a very skilled tester to handle this relationship positively while still completing the work the tester's way. There are simply hundreds of excuses developers or testers can make when they do not agree on some points. This also requires good communication, troubleshooting and analytical skills from the tester.

  • Regression testing: As the project keeps expanding, the regression testing workload simply becomes uncontrolled. There is pressure to handle current functionality changes, checks on previously working functionality, and bug tracking.

  • Testing always under time constraint: "Hey tester, we want to ship this product by this weekend, are you ready for completion?" When this order comes from the boss, the tester simply focuses on task completion and not on test coverage and quality of work. There is a huge list of tasks that you need to complete within the specified time, including writing, executing, automating and reviewing the test cases.

  • Which tests to execute first? How will you decide which test cases should be executed, and with what priority? Which tests are more important than others? This requires good experience working under pressure.

  • Understanding the requirements: Sometimes testers are responsible for communicating with customers to understand the requirements. What if the tester fails to understand the requirements? Will the tester be able to test the application properly? Definitely not! Testers require good listening and understanding capabilities.

  • Decision to stop the testing: When to stop testing? It is a very difficult decision. It requires sound judgment of the testing processes and of the importance of each process, as well as the ability to decide on the fly.

  • One test team under multiple projects: It is challenging to keep track of each task, and there are communication challenges. Many times this results in the failure of one or both projects.

  • Reuse of test scripts: Application development methods are changing rapidly, making it difficult to manage test tools and test scripts. Test script migration or reuse is an essential but difficult task.

  • Testers focusing on finding easy bugs: If an organization rewards testers based on the number of bugs (a very bad approach to judging tester performance), then some testers concentrate only on finding easy bugs that don't require deep understanding and testing. Hard or subtle bugs remain unnoticed in such a testing approach.

  • Coping with attrition: Rising salaries and benefits are making many employees leave the company at very short career intervals, and management is facing hard problems coping with the attrition rate. New testers require project training from the beginning, complex projects are difficult to understand, and the shipping date gets delayed!

  • Different types of testing are required throughout the life cycle of a DWH implementation, so we face different challenges during the different phases of STLC.

  • ETL (Business Functionality, Data Quality, Performance): During the ETL phase of a DWH implementation, data quality testing is of utmost importance. Any defect slippage in this phase will be very costly to rectify later. Functional testing needs to be carried out to validate the transformation logic.

  • Data Load (Parameter & Settings Validation): During the setup of the data load functionality, specific testing on the load module is carried out. The parameters and settings for the data load are tested here.

  • Initial Data Load (Performance, Data Quality): The initial data load is when the underlying databases are loaded for the first time. Performance testing is of significance here. Data quality, once tested and signed off during the ETL testing phase, is re-tested here.

  • E2E Business Testing (UI & Interface Testing): Once the initial data load is done, the data warehouse is ready for an end-to-end functional validation. UI testing and interface testing are carried out during this phase.

  • Maintenance / Data Feeds (Regression): Data from the operational database should be input into the data warehouse periodically. During such periodic updates, regression testing should be executed. This ensures the new data updates have not broken any existing functionality. Periodic updates are required to ensure temporal consistency.

  • What tools are available and used for testing DWH at different stages? ETL software can help you automate the process of data loading from the operational environment to the data warehouse environment.

  • Create pairs of SQL queries (QueryPairs) and reusable queries (Query Snippets) to embed in queries.

  • Execute Scenarios that compare Source databases and / or files to Target data warehouses.

  • Agents execute your queries and return the results to the QuerySurge server for reporting and analysis.

    Analyze and drill down into your results and identify bad data and data defects with our robust reporting.
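The QueryPair idea above can be sketched in a few lines of Python with an in-memory SQLite database: run one query against the source, one against the target, and diff the result sets. This is an illustration of the concept only, not the QuerySurge product; the table names, columns and data are hypothetical.

```python
import sqlite3

# Minimal sketch of a "QueryPair": compare the rows returned by a source
# query and a target query. (Hypothetical tables and data for illustration.)
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_customer (id INTEGER, name TEXT);
    CREATE TABLE tgt_customer (id INTEGER, name TEXT);
    INSERT INTO src_customer VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
    INSERT INTO tgt_customer VALUES (1, 'Alice'), (2, 'Bob');
""")

def run_query_pair(conn, source_sql, target_sql):
    """Return rows present in the source result but missing from the target."""
    source_rows = set(conn.execute(source_sql).fetchall())
    target_rows = set(conn.execute(target_sql).fetchall())
    return sorted(source_rows - target_rows)

missing = run_query_pair(
    conn,
    "SELECT id, name FROM src_customer",
    "SELECT id, name FROM tgt_customer",
)
print(missing)  # rows that fell out during the ETL
```

A real scenario would point the two queries at different connections (source system vs. target warehouse); a single connection is used here only to keep the sketch self-contained.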

  • Issue: Missing Data
    Description: Data that does not make it into the target database
    Possible Causes: An invalid or incorrect lookup table in the transformation logic; bad data from the source database (needs cleansing); invalid joins
    Example(s): The lookup table should contain a field value of "High" which maps to "Critical". However, the source data field contains "Hig" - missing the "h" - and fails the lookup, resulting in the target data field containing null. If this occurs on a key field, a possible join would be missed and the entire row could fall out.
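The failed-lookup scenario can be sketched directly; the field names and rows below are illustrative, not from a real system.

```python
# Sketch of the failed-lookup cause of missing data: the lookup maps
# "High" -> "Critical", but one source row carries the misspelled "Hig",
# so the lookup misses and the target field ends up null (None).
lookup = {"High": "Critical", "Medium": "Major", "Low": "Minor"}
source_rows = [{"id": 1, "severity": "High"},
               {"id": 2, "severity": "Hig"},   # bad source data
               {"id": 3, "severity": "Low"}]

# The transformation as the ETL would apply it: lookup misses become None.
target_rows = [{"id": r["id"], "severity": lookup.get(r["severity"])}
               for r in source_rows]

# Test check: flag every source row whose lookup failed.
failed = [r["id"] for r, t in zip(source_rows, target_rows)
          if t["severity"] is None]
print(failed)
```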

  • Issue: Truncation of Data
    Description: Data being lost through truncation of the data field
    Possible Causes: Invalid field lengths on the target database; transformation logic not taking into account field lengths from the source
    Example(s): The source field value "New Mexico City" is truncated to "New Mexico C" because the target data field did not have the correct length to capture the entire value.
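A truncation test can simply compare source value lengths against the declared width of the target column. The 12-character width and the city values below are assumptions for illustration.

```python
# Sketch of a truncation check: flag source values wider than the
# target column's declared length (here assumed to be VARCHAR(12)).
TARGET_WIDTH = 12

source_values = ["New Mexico City", "Austin", "Chicago"]

truncated = [v for v in source_values if len(v) > TARGET_WIDTH]
for v in truncated:
    print(f"{v!r} would be stored as {v[:TARGET_WIDTH]!r}")
```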

  • Issue: Data Type Mismatch
    Description: Data types not set up correctly on the target database
    Possible Causes: Source data field not configured correctly
    Example(s): A source data field was required to be a date; however, when initially configured, it was set up as a VarChar.

  • Issue: Null Translation
    Description: Null source values not being transformed to the correct target values
    Possible Causes: The development team did not include the null translation in the transformation logic
    Example(s): A null source data field was supposed to be transformed to "None" in the target data field. However, the logic was not implemented, resulting in the target data field containing null values.
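A test for this can apply the intended rule itself and compare against what the ETL actually produced. The rule, field values and data below are illustrative assumptions.

```python
# Sketch of a null-translation check: source nulls were supposed to become
# the literal string "None" in the target. Apply the intended rule and
# flag target rows where the ETL skipped it.
def transform(value):
    """The intended rule: null source values become the string 'None'."""
    return "None" if value is None else value

source = [None, "Pending", None, "Closed"]
target = [None, "Pending", "None", "Closed"]   # first row missed the rule

mismatches = [i for i, (s, t) in enumerate(zip(source, target))
              if transform(s) != t]
print(mismatches)  # indexes of rows where the translation was not applied
```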

  • Issue: Wrong Translation
    Description: The opposite of the Null Translation error. A field should be null but is populated with a non-null value, or a field should be populated but with the wrong value
    Possible Causes: The development team incorrectly translated the source field for certain values
    Example(s): Ex. 1) The target field should only be populated when the source field contains certain values; otherwise it should be set to null. Ex. 2) The target field should be "Odd" if the source value is an odd number, but the target field is "Even" (a very basic example).

  • Issue: Misplaced Data
    Description: Source data fields not being transformed to the correct target data field
    Possible Causes: The development team inadvertently mapped the source data field to the wrong target data field
    Example(s): A source data field was supposed to be transformed to the target data field Last_Name. However, the development team inadvertently mapped it to First_Name.

  • Issue: Extra Records
    Description: Records which should not be in the ETL are included in the ETL
    Possible Causes: The development team did not include a filter in their code
    Example(s): If a case has the deleted field populated, the case and any data related to it should not be in any ETL.

  • Issue: Not Enough Records
    Description: Records which should be in the ETL are not included in the ETL
    Possible Causes: The development team had a filter in their code which should not have been there
    Example(s): If a case was in a certain state, it should be ETL'd over to the data warehouse but not the data mart.

  • Issue: Transformation Logic Errors/Holes
    Description: Testing can sometimes lead to finding holes in the transformation logic, or to realizing the logic is unclear
    Possible Causes: The development team did not take into account special cases. For example, international cities that contain language-specific characters might need to be dealt with in the ETL code
    Example(s): Ex. 1) Most cases may fall into a certain branch of logic for a transformation, but a small subset of cases (sometimes with unusual data) may not fall into any branch. The tester's code and the developer's code may handle these cases differently (and possibly both end up being wrong), and the logic is changed to accommodate the cases. Ex. 2) The tester and developer have different interpretations of the transformation logic, which results in different values. This leads to the logic being re-written to become clearer.

  • Issue: Simple/Small Errors
    Description: Capitalization, spacing and other small errors
    Possible Causes: The development team did not add an additional space after a comma when populating the target field
    Example(s): Product names on a case should be separated by a comma and then a space, but the target field has them separated only by a comma.

  • Issue: Sequence Generator
    Description: Ensuring that the sequence numbers of reports are in the correct order is very important when processing follow-up reports or answering to an audit
    Possible Causes: The development team did not configure the sequence generator correctly, resulting in records with a duplicate sequence number
    Example(s): Duplicate records in the sales report were doubling up several sales transactions, which skewed the report significantly.

  • Issue: Undocumented Requirements
    Description: Finding requirements that are understood but are not actually documented anywhere
    Possible Causes: Several members of the development team did not understand the "understood" undocumented requirements
    Example(s): There was a restriction in the WHERE clause that limited how certain reports were brought over, used in mappings that were understood to be necessary but were not actually in the requirements. Occasionally it turns out that the understood requirements are not what the business wanted.

  • Issue: Duplicate Records
    Description: Duplicate records are two or more records that contain the same data
    Possible Causes: The development team did not add the appropriate code to filter out duplicate records
    Example(s): Duplicate records in the sales report were doubling up several sales transactions, which skewed the report significantly.
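Duplicates are usually surfaced with a GROUP BY ... HAVING query. A minimal sketch with an in-memory SQLite database (the sales table and its rows are hypothetical):

```python
import sqlite3

# Sketch of a duplicate-record check: group on all data columns and keep
# groups that occur more than once.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (order_id INTEGER, product TEXT, amount REAL);
    INSERT INTO sales VALUES
        (100, 'Widget', 9.99),
        (101, 'Gadget', 19.99),
        (100, 'Widget', 9.99);   -- duplicate that doubles the sale
""")

duplicates = conn.execute("""
    SELECT order_id, product, amount, COUNT(*) AS n
    FROM sales
    GROUP BY order_id, product, amount
    HAVING COUNT(*) > 1
""").fetchall()
print(duplicates)  # each row: the duplicated values plus how often they occur
```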

  • Issue: Numeric Field Precision
    Description: Numbers that are not formatted to the correct decimal point or not rounded per specifications
    Possible Causes: The development team rounded the numbers to the wrong decimal point
    Example(s): The sales data did not contain the correct precision, and all sales were being rounded to the whole dollar.

  • Issue: Rejected Rows
    Description: Data rows that get rejected due to data issues
    Possible Causes: The development team did not take into account data conditions that could break the ETL for a particular row
    Example(s): Missing data rows in the sales table caused major issues with the end-of-year sales report.

  • Any tool available and used to ensure data quality.
    WizSoft - WizRule
    Vality - Integrity
    Prism Solutions, Inc. - Prism Quality Manager

  • Objective: Is your data complete and valid?
    Tool: WizSoft - WizRule; Vality - Integrity
    Features: Data examination - determines the quality of the data, patterns within it, and the number of different fields used.

  • Objective: Does your data comply with your business rules? (Do you have missing values, illegal values, inconsistent values, invalid relationships?)
    Tool: Prism Solutions, Inc. - Prism Quality Manager; WizSoft - WizRule; Vality - Integrity
    Features: Compare to business rules and assess data for consistency and completeness against the rules.

  • Objective: Are you using sources that comply with your business rules?
    Tool: WizSoft - WizRule; Vality - Integrity
    Features: Data reengineering - examining the data to determine what the business rules are.

  • Trillium Software - Parser
    i.d. Centric - DataRight
    Trillium Software - GeoCoder
    i.d. Centric - ACE, Clear I.D. Library
    Group 1 - NADIS
    Trillium Software - Matcher
    Innovative Systems - Match
    i.d. Centric - Match/Consolidation
    Group 1 - Merge/Purge Plus
    Innovative Systems - Corp-Match

  • Objective: Does your data need to be broken up between source and data warehouse?

    Tool: Trillium Software - Parser; i.d. Centric - DataRight

    Features: Data parsing (elementizing) - determines the context and destination of each component of each field.

  • Objective: Does your data have abbreviations that should be changed to ensure consistency?

    Tool: Trillium Software - Parser; i.d. Centric - DataRight

    Features: Data standardizing- converting data elements to forms that are standard throughout the DW.

  • Objective: Is your data correct?

    Tool: Trillium Software - Parser; Trillium Software - GeoCoder; i.d. Centric - ACE, Clear I.D. Library; Group 1 - NADIS

    Features: Data correction and verification- matches data against known lists (addresses, product lists, customer lists)

  • Objective: Is there redundancy in your data?

    Tool: Trillium Software - Matcher; Innovative Systems - Match; i.d. Centric - Match/Consolidation; Group 1 - Merge/Purge Plus

    Features: Record matching- determines whether two records represent data on the same object.

  • Objective: Are there multiple versions of company names in your database?

    Tool: Innovative Systems- Corp-Match

    Features: Record matching- based on user specified fields such as tax ID

  • Objective: Is your data consistent prior to entering data warehouse?

    Tool: Vality - Integrity; i.d. Centric - Match/Consolidation

    Features: Transform data - "1" for male, "2" for female becomes "M" & "F" - ensures consistent mapping between source systems and the data warehouse.

  • Objective: Do you have information in free form fields that differs between databases?

    Tool: Vality- Integrity

    Features: Data reengineering - examining the data to determine what the business rules are.

  • Objective: Do you have multiple individuals in the same household that need to be grouped together?

    Tool: i.d. Centric - Match/Consolidation; Trillium Software - Matcher

    Features: Householding - combining individual records that have the same address.
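At its core, householding is a grouping step over standardized addresses. A minimal sketch (real tools normalize and match addresses first; the records below are assumed to be already standardized and are purely illustrative):

```python
from collections import defaultdict

# Sketch of householding: combine individual records that share the same
# (already standardized) address into one household.
records = [
    {"name": "Ann Lee", "address": "12 Oak St"},
    {"name": "Bob Lee", "address": "12 Oak St"},
    {"name": "Cy Park", "address": "99 Elm Ave"},
]

households = defaultdict(list)
for r in records:
    households[r["address"]].append(r["name"])

print(dict(households))
```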

  • Objective: Does your data contain atypical words- such as industry specific words, ethnic or hyphenated names?

    Tool: i.d. Centric- ACE, Clear I.D.

    Features: Data parsing combined with data verification-comparison to industry specific lists.

  • Enterprise/Integrator by Carleton
    Semio - SemioMap

  • Objective: Do you have multiple formats to be accessed- relational dbs, flat files, etc.?

    Tool: Enterprise/Integrator by Carleton.

    Features: Access the data then map it to the dw schema.

  • Objective: Do you have free form text that needs to be indexed, classified, or otherwise processed?

    Tool: Semio- SemioMap

    Features: Text mining- extracts meaning and relevance from large amounts of information

  • Objective: Have the rules established during the data cleansing steps been reflected in the metadata?

    Tool: Vality- Integrity

    Features: Documenting- documenting the results of the data cleansing steps in the metadata.

  • Objective: Is the data Y2K compliant?
    Tool: Enterprise/Integrator by Carleton
    Features: Data verification within a migration tool.

  • How is it ensured that the data sample selected ensures completeness? By data verification with the help of a migration tool.

  • How is data reconciliation done? If the DDL that the data architect has produced somehow does not match the DDL that has already been defined to the DBMS, then there MUST BE a reconciliation before any other design and development ensues.

  • Many data warehouses are built on an n-tier architecture with multiple data extraction and data insertion jobs between two consecutive tiers. As it happens, the nature of the data changes as it passes from one tier to the next. Data reconciliation is the method of reconciling, or tying up, the data between any two consecutive tiers (layers).

  • Master Data ReconciliationMaster data reconciliation is the method of reconciling only the master data between source and target.

    Common examples of master data reconciliation: total count of rows, e.g. total customers in source and target, total number of products in source and target, etc.

    Total count of rows based on a condition, e.g. total number of active customers, total number of inactive customers, etc.
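The count checks above can be sketched with an in-memory SQLite database standing in for both tiers (table names, columns and data are illustrative assumptions):

```python
import sqlite3

# Sketch of master data reconciliation: compare total and conditional row
# counts between source and target tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_customer (id INTEGER, active INTEGER);
    CREATE TABLE tgt_customer (id INTEGER, active INTEGER);
    INSERT INTO src_customer VALUES (1, 1), (2, 1), (3, 0);
    INSERT INTO tgt_customer VALUES (1, 1), (2, 1), (3, 0);
""")

def count(conn, table, where="1=1"):
    return conn.execute(f"SELECT COUNT(*) FROM {table} WHERE {where}").fetchone()[0]

checks = {
    "total customers": (count(conn, "src_customer"), count(conn, "tgt_customer")),
    "active customers": (count(conn, "src_customer", "active = 1"),
                         count(conn, "tgt_customer", "active = 1")),
}
mismatched = [name for name, (s, t) in checks.items() if s != t]
print(mismatched)  # an empty list means the counts reconcile
```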

  • Transactional Data Reconciliation
    Sales quantity, revenue, tax amount, service usage etc. are examples of transactional data. Transactional data form the very basis of BI reports, so any mismatch in transactional data can directly impact the reliability of the report and of the whole BI system in general. That is why a reconciliation mechanism must be in place to detect such discrepancies beforehand (that is, before the data reaches the final business users).

    Some example measures used for transactional data reconciliation: sum of total revenue calculated from source and target, sum of total products sold calculated from source and target, etc.
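These aggregate comparisons can be sketched the same way as the count checks (schema and figures below are illustrative assumptions):

```python
import sqlite3

# Sketch of transactional data reconciliation: compare aggregate measures
# (sum of quantity, sum of revenue) between source and target and report
# the gap.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_sales (qty INTEGER, revenue REAL);
    CREATE TABLE tgt_sales (qty INTEGER, revenue REAL);
    INSERT INTO src_sales VALUES (2, 20.0), (1, 15.0), (3, 30.0);
    INSERT INTO tgt_sales VALUES (2, 20.0), (1, 15.0);  -- one row lost in the load
""")

def measure(conn, table):
    return conn.execute(f"SELECT SUM(qty), SUM(revenue) FROM {table}").fetchone()

src_qty, src_rev = measure(conn, "src_sales")
tgt_qty, tgt_rev = measure(conn, "tgt_sales")
print("qty gap:", src_qty - tgt_qty, "revenue gap:", src_rev - tgt_rev)
```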

  • Automated Data Reconciliation
    For large warehouse systems, it is often convenient to automate the data reconciliation process by making it an integral part of data loading. This can be done by maintaining separate loading metadata tables and populating those tables with reconciliation queries. The existing reporting architecture of the warehouse can then be used to generate and publish reconciliation reports at the end of the loading. Such automated reconciliation keeps all the stakeholders informed about the trustworthiness of the reports.
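The metadata-driven approach can be sketched as a table of stored reconciliation queries plus a small driver that runs each pair after loading and records pass/fail. Everything below (table names, query text) is an illustrative assumption, not a prescribed schema.

```python
import sqlite3

# Sketch of automated reconciliation: a metadata table holds the
# reconciliation queries; a driver runs each source/target pair after the
# load and records the outcome for reporting.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_t (id INTEGER);
    CREATE TABLE tgt_t (id INTEGER);
    INSERT INTO src_t VALUES (1), (2);
    INSERT INTO tgt_t VALUES (1), (2);

    CREATE TABLE recon_meta (check_name TEXT, src_sql TEXT, tgt_sql TEXT);
    INSERT INTO recon_meta VALUES
        ('row count', 'SELECT COUNT(*) FROM src_t', 'SELECT COUNT(*) FROM tgt_t');
""")

results = []
for name, src_sql, tgt_sql in conn.execute("SELECT * FROM recon_meta").fetchall():
    s = conn.execute(src_sql).fetchone()[0]
    t = conn.execute(tgt_sql).fetchone()[0]
    results.append((name, "PASS" if s == t else "FAIL"))
print(results)
```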

  • How to test bulk data? Using Automation tools.
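One common automation pattern is to synthesize a large batch of rows, bulk-load it, and time the load. A minimal sketch (row count, schema and data generation are illustrative; real bulk tests would use the warehouse's native bulk loader):

```python
import random
import sqlite3
import time

# Sketch of bulk-data testing with generated data: build 50,000 synthetic
# rows, bulk-insert them with executemany, and time the load.
random.seed(0)  # deterministic data for repeatable test runs
rows = [(i, random.choice("ABC"), random.random() * 100) for i in range(50_000)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_sales (id INTEGER, region TEXT, amount REAL)")

start = time.perf_counter()
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", rows)
elapsed = time.perf_counter() - start

loaded = conn.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]
print(f"loaded {loaded} rows in {elapsed:.2f}s")
```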

  • Some information on performance tools and how the results are analyzed.
    Open source load testing tool: It is a Java platform application. It is mainly considered a performance testing tool, and it can also be integrated with the test plan. In addition to the load test plan, you can also create a functional test plan. This tool can be loaded onto a server or network to check its performance and analyze its behavior under different conditions. It is of great use in testing the functional performance of resources such as Servlets, Perl scripts and Java objects.

  • Load and performance testing software: This is a tool used for measuring and analyzing the performance of a website. The performance and the end result can be evaluated using this tool, and any further steps can then be taken. This helps you improve and optimize the performance of your web application. The tool analyzes the performance of the web application by increasing the traffic to the website, so the performance under heavy load can be determined. It is available in two languages: English and French.

  • One of the key attractive features of this testing tool is that it can create and handle thousands of users at the same time. It enables you to gather all the required information with respect to performance and the underlying infrastructure. LoadRunner comprises different tools, namely the Virtual User Generator, Controller, Load Generator and Analysis.

  • Open source stress testing tool: This tool works effectively when integrated with the functional testing tool soapUI. It allows you to create, configure and update your tests while the application is being tested, and gives the user a visual aid with a drag-and-drop experience. This is not a static performance tool: its advanced analysis and report generating features allow you to examine the actual performance by pumping in new data even while the application is being tested. You need not restart LoadUI each time you modify or change the application; it automatically gets updated in the interface.

  • Load testing and stress testing tool for web applications: To find the bottlenecks of a website, it is necessary to examine its pros and cons. There are many performance testing tools available for measuring the performance of a given web application. WebLoad is one such tool used for load testing and stress testing. It can be used for load testing any internet application, such as Ajax, Adobe Flex, Oracle Forms and many more, and is widely used in environments where there is a high demand for maximum load testing.

  • WAPT refers to the Web Application Performance tool. These are analysis tools for measuring the performance and output of any web application or web-related interface. With this tool you have the advantage of testing web application performance in various environments and under different load conditions. WAPT provides detailed information about the virtual users and their output to its users during load testing. WAPT tools can also test the web application for compatibility with browsers and operating systems, and in certain cases are used for testing compatibility with Windows applications.

  • It is a desktop-based advanced HTTP load testing tool. The web browser can be used to record the scripts, which are easy to use and record. Using the GUI you can modify the basic script with dynamic variables to validate responses. With control over network bandwidth you can simulate a large virtual user base for your application stress tests. After the test is executed, an HTML report is generated for analysis.

  • It is a load testing tool mainly used for cloud-based services. It also helps in website optimization and in improving the working of any web application. The tool generates traffic to the website by simulating users, so as to find the stress and the maximum load under which it can work. LoadImpact comprises two main parts: the load testing tool and the page analyzer. The load testing can be divided into three types: Fixed, Ramp up and Timeout. The page analyzer works similarly to a browser and gives information regarding the working and statistics of the website. The credit for developing this load testing tool belongs to Gatorhole AB. It is a freemium service, meaning it can be acquired for free and is also available at a premium price.

  • It is an automated performance testing tool which can be used for a web application or a server-based application wherever a process of input and output is involved. The tool creates a demo of the original transaction process between the user and the web service. By the end, all the statistical information is gathered and analyzed to increase efficiency. Any leakage in the website or the server can be identified and rectified immediately with the help of this tool, which can be the best option for building an effective and error-free cloud computing service.

  • It is an automated testing tool which can be employed for testing the performance of any website, web application or other object. Many developers and testers make use of this tool to find bottlenecks in their web application and rectify them accordingly. The tool comes with a built-in editor which allows users to edit the testing criteria according to their needs. The Testing Anywhere tool involves 5 simple steps to create a test: object recorder, advanced web recorder, SMART test recorder, image recognition, and an editor with 385+ commands.

  • Thanks
    Prepared by Mr. Prashanth B S
    Software Testing Corporate Trainer
    On behalf of ISQT International

    ISQT - Process & Consulting Services Private Limited 732, 1st Floor, 12th Main, 3rd Block, Rajajinagar, Bangalore - 560 010, INDIA Phone: + 91- 80 - 23012501-15 Fax: + 91 80 23142425 www.isqtinternational.com email:[email protected]