
Testing Big Data: Three Fundamental Components

Big Data is a big topic in software development today. When it comes to practice, software testers may not yet fully understand what Big Data is exactly. What testers do know is that you need a plan for testing it. The problem here is the lack of a clear understanding of what to test and how deep a tester should go. There are some key questions that must be answered before going down this path. Since most Big Data lacks a traditional structure, what does Big Data quality look like? And what are the most appropriate software testing tools?

As a software tester, it is imperative to first have a clear definition of Big Data. Many of us mistakenly believe that Big Data is just a large amount of information. This is a completely incorrect approach. For example, a 2 petabyte Oracle database alone doesn't constitute a Big Data situation, just a high-load one. To be precise, Big Data is a series of approaches, tools and methods for processing high volumes of structured and (most importantly) unstructured data. The key difference between Big Data and ordinary high-load systems is the ability to create flexible queries.

The Big Data trend first appeared five years ago in the U.S., when researchers from Google announced their global achievement in the scientific journal Nature. Without relying on the results of any medical tests, they were able to track the spread of flu in the U.S. by analyzing the number of Google search queries related to influenza-like illness in the population.

Today, Big Data can be described by three Vs: Volume, Variety and Velocity. In other words, you have to process an enormous amount of data of various formats at high speed. The processing of Big Data, and therefore its software testing, can be split into three basic components. The process is illustrated below by an example based on the open source Apache Hadoop software framework:

1. Loading the initial data into the Hadoop Distributed File System (HDFS).
2. Execution of Map-Reduce operations.
3. Rolling out the output results from the HDFS.

Loading the Initial Data into HDFS

In this first step, the data is retrieved from various sources (social media, web logs, social networks etc.) and uploaded into the HDFS, where it is split into multiple files:

- Verify that the required data was extracted from the original system and that there was no data corruption.
- Validate that the data files were loaded into the HDFS correctly.
- Check the file partitions and copy them to different data units.
- Determine the most complete set of data that needs to be checked.

For a step-by-step validation, you can use tools such as Datameer, Talend or Informatica.

Execution of Map-Reduce Operations

In this step, you process the initial data using a Map-Reduce operation to obtain the desired result. Map-Reduce is a data processing concept for condensing large volumes of data into useful aggregated results:

- Check the required business logic on a standalone unit and then on the set of units (a minimal sketch of such a standalone check follows this section).
- Validate the Map-Reduce process to ensure that the key-value pairs are generated correctly.
- Check the aggregation and consolidation of data after the "reduce" operation is performed.
- Compare the output data with the initial files to make sure that the output file was generated and its format meets all the requirements.

The most appropriate language for the verification of data is Hive. Testers prepare requests in the Hive (SQL-style) Query Language (HQL) and send them to HBase to verify that the output complies with the requirements.
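To make the first check above concrete (verifying the business logic on a standalone unit before running it on the cluster), here is a minimal, hypothetical sketch in the style of Hadoop Streaming: the map and reduce steps are plain Python functions, and a small local driver stands in for the shuffle-and-sort phase. The record layout (tab-separated web-log lines whose third field is a URL) and all names are assumptions made for illustration, not part of the article's example.

```python
# Hypothetical Hadoop-Streaming-style Map-Reduce sketch. Assumed record layout:
# tab-separated web-log lines whose third field is a URL. The point is that the
# map and reduce logic are plain functions a tester can exercise locally.
from itertools import groupby
from operator import itemgetter


def map_record(line):
    """Emit (url, 1) key-value pairs; skip malformed lines instead of failing."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3 or not fields[2]:
        return  # corrupted record: a tester would typically count these separately
    yield fields[2], 1


def reduce_group(url, counts):
    """Aggregate the values for one key, mirroring the 'reduce' consolidation step."""
    return url, sum(counts)


def run_local(lines):
    """Simulate shuffle-and-sort so the business logic can be checked standalone."""
    pairs = sorted(kv for line in lines for kv in map_record(line))
    return [reduce_group(key, (value for _, value in group))
            for key, group in groupby(pairs, key=itemgetter(0))]


if __name__ == "__main__":
    sample = [
        "2015-10-07\tuser1\t/home\n",
        "2015-10-07\tuser2\t/home\n",
        "corrupted line\n",               # should be skipped, not counted
        "2015-10-07\tuser3\t/products\n",
    ]
    # Expected aggregation: /home -> 2, /products -> 1
    assert run_local(sample) == [("/home", 2), ("/products", 1)]
    print(run_local(sample))
```

The same mapper and reducer bodies could be wrapped in sys.stdin/print loops for an actual Hadoop Streaming run; the local driver only replaces the shuffle, so that key-value generation and aggregation can be verified in isolation before the job is executed on the full data set.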
HBase is a NoSQL database that can serve as the input and output for Map-Reduce jobs. You can also use other Big Data processing programs as an alternative to Map-Reduce. Frameworks like Spark or Storm are good examples of substitutes for this programming model, as they provide similar functionality and are compatible with the Hadoop ecosystem.

Rolling out the Output Results from HDFS

This final step includes unloading the data that was generated by the second step and loading it into the downstream system, which may be a repository for data used to generate reports or a transactional analysis system for further processing:

- Inspect the data aggregation to make sure that the data has been loaded into the required system and was not distorted (a minimal reconciliation sketch appears at the end of this article).
- Validate that the reports include all the required data, and that all indicators refer to concrete measures and are displayed correctly.

Test data in a Big Data project can be obtained in two ways: copying actual production data or creating data exclusively for testing purposes, the former being the preferred method for software testers. In this case, the conditions are as realistic as possible, and it becomes easier to work with a larger number of test scenarios. However, not all companies are willing to provide real data, as they prefer to keep some information confidential. In that case, you must create the test data yourself or request artificial data. The main drawback of this scenario is that artificial business scenarios created using limited data inevitably restrict testing; only real users can detect those defects.

As speed is one of Big Data's main characteristics, performance testing is mandatory. A huge volume of data and an infrastructure similar to the production infrastructure are usually created for performance testing. Furthermore, if this is acceptable, data is copied directly from production. To determine the performance metrics and to detect errors, you can use, for instance, a Hadoop performance monitoring tool. Performance testing covers fixed indicators like operating time and capacity, as well as system-level metrics like memory usage.

To be successful, Big Data testers have to learn the components of the Big Data ecosystem from scratch. Since the market has not yet created fully automated testing tools for Big Data validation, the tester has no other option but to acquire the same skill set as the Big Data developer in the context of leveraging Big Data technologies like Hadoop. This requires a tremendous mindset shift both for testers and for testing units within organizations. In order to be competitive, companies should invest in Big Data-specific training and in developing automation solutions for Big Data validation.

In conclusion, Big Data processing holds much promise for today's businesses. If you apply the right test strategies and follow best practices, you will improve Big Data testing quality, which will help to identify defects in early stages and reduce overall cost.
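Closing out this walkthrough, the sketch below illustrates the reconciliation check referenced in the rolling-out step: aggregated output pulled from HDFS (for example with hdfs dfs -get) is compared against an export from the downstream reporting system by key count and per-key totals. The file paths, delimiters and column names are assumptions made for the example; only the reconciliation idea comes from the article.

```python
# Hypothetical reconciliation check between Map-Reduce output copied from HDFS
# (e.g. via `hdfs dfs -get /output/part-r-00000 part-r-00000`) and a CSV export
# from the downstream reporting system. Paths and layouts are illustrative.
import csv
from decimal import Decimal


def load_hdfs_part(path):
    """Read tab-separated key<TAB>value lines, as typically written by a reducer."""
    totals = {}
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            key, value = line.rstrip("\n").split("\t")
            totals[key] = totals.get(key, Decimal(0)) + Decimal(value)
    return totals


def load_report_export(path):
    """Read the downstream CSV export, assumed to have 'key' and 'total' columns."""
    totals = {}
    with open(path, newline="", encoding="utf-8") as fh:
        for row in csv.DictReader(fh):
            totals[row["key"]] = totals.get(row["key"], Decimal(0)) + Decimal(row["total"])
    return totals


def reconcile(hdfs_totals, report_totals):
    """Return a list of human-readable discrepancies (an empty list means a pass)."""
    issues = []
    if len(hdfs_totals) != len(report_totals):
        issues.append(f"key count differs: {len(hdfs_totals)} vs {len(report_totals)}")
    for key, expected in hdfs_totals.items():
        actual = report_totals.get(key)
        if actual is None:
            issues.append(f"missing key in report: {key}")
        elif actual != expected:
            issues.append(f"value mismatch for {key}: {expected} vs {actual}")
    return issues


if __name__ == "__main__":
    problems = reconcile(load_hdfs_part("part-r-00000"),
                         load_report_export("report_export.csv"))
    print("PASS" if not problems else "\n".join(problems))
```

In practice the same comparison is often expressed in HQL or SQL on both sides; the value of a standalone script like this is that it gives the tester a repeatable, auditable pass/fail artifact.
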
Big Data Testing
Big data creates a new layer in the economy which is all about information: turning information, or data, into revenue. This will accelerate growth in the global economy and create jobs. In 2013, big data is forecast to drive $34 billion of IT spending (Gartner).

Data science is all about trying to create a process that allows you to chart out new ways of thinking about problems that are novel, or trying to use the existing data in a creative atmosphere with a pragmatic approach. Businesses are struggling to grapple with the phenomenal information explosion. Conventional database systems and business intelligence applications have given way to horizontal databases, columnar designs and cloud-enabled schemas powered by sharding techniques.

The role of QA is particularly challenging in this context, as it is still at a nascent stage. Testing Big Data applications requires a specific mindset, skillset, and a deep understanding of the technologies and pragmatic approaches to data science. Big Data from a tester's perspective is an interesting aspect: understanding the evolution of Big Data, what Big Data is meant for, and why we test Big Data applications is fundamentally important.

Big Data Testing Needs and Challenges

The following are some of the needs and challenges that make it imperative for Big Data applications to be tested thoroughly. An in-depth understanding of the four Vs of Big Data is key to successful Big Data Testing.

Increasing need for live integration of information: With information flowing in from multiple, disparate data sources, it has become imperative to facilitate live integration of information. This forces enterprises to maintain constantly clean and reliable data, which can only be ensured through end-to-end testing of the data sources and integrators.

Instant data collection and deployment: The power of predictive analytics and the ability to take decisive actions have pushed enterprises to adopt instant data collection solutions. These decisions bring significant business impact by leveraging the insights from the minute patterns in large data sets. Add to that the CIO's profile, which demands deployment of instant solutions to stay in tune with the changing dynamics of business. Unless the applications and data feeds are tested and certified for live deployment, these challenges cannot be met with the assurance that is essential for every critical operation.

Real-time scalability challenges: Big Data applications are built to match the level of scalability and monumental data processing involved in a given scenario. Critical errors in the architectural elements governing the design of Big Data applications can lead to catastrophic situations. Hardcore testing involving smarter data sampling and cataloguing techniques, coupled with high-end performance testing capabilities, is essential to meet the scalability problems that Big Data applications pose (a minimal sampling sketch follows below).

Data integration: Drawing large and disparate data sets together in real time. Current data integration platforms, built for an older generation of data challenges, limit IT's ability to support the business. In order to keep up, organizations are beginning to look at next-generation data integration techniques and platforms.
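The "smarter data sampling" mentioned above can be as simple as drawing a fixed-size, reproducible random sample from a feed that is far too large to inspect in full, so that the sample can be catalogued and validated by hand or by downstream checks. The sketch below uses reservoir sampling, a standard one-pass technique for streams of unknown length; the file name and record format are assumptions for illustration only.

```python
# Reservoir sampling: keep a uniform random sample of k records from a stream
# whose total size is unknown, in a single pass and with constant memory.
import random


def reservoir_sample(records, k, seed=42):
    """Return k records chosen uniformly at random from an iterable of any length."""
    rng = random.Random(seed)   # fixed seed so the test sample is reproducible
    sample = []
    for i, record in enumerate(records):
        if i < k:
            sample.append(record)
        else:
            j = rng.randint(0, i)   # keep the new record with probability k/(i+1)
            if j < k:
                sample[j] = record
    return sample


if __name__ == "__main__":
    # Illustrative only: sample 1,000 lines from a large (hypothetical) log extract.
    with open("weblogs_extract.tsv", encoding="utf-8") as fh:
        for line in reservoir_sample(fh, k=1000):
            print(line, end="")
```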

Ability to understand, analyze and create test sets that encompass multiple data sets is vital to ensure comprehensive Big Data Testing.

Testing Data-Intensive Applications and Business Intelligence Solutions

Cigniti leverages its experience of having tested large-scale data warehousing and business intelligence applications to offer a host of Big Data Testing services and solutions.

Testing New Age Big Data Applications - Cigniti Testlets

Cigniti Testlets offer point solutions for all the problems that a new age Big Data application would have to go through before being certified with QA levels that match industry standards.

To know more about how Cigniti can help you take advantage of large data sets through comprehensive testing of your Big Data application, write to [email protected]

Big Data testing: The challenge and the opportunity

The possibility of unknown scenarios in Big Data testing is gigantic when compared to testing techniques for conventional applications. The scope and range of the data harnessed in Big Data applications will demand new benchmarks of Software Quality Assurance.

The inherent production of digital data across economies and institutions is seen as an enormous source of information, which can help build a reliable knowledge base for critical decisions. As the IT-enabled global economy moves ahead, enterprises look at new ways of utilizing existing and growing data. At such moments, the Big Data perspective bridges current and emerging trends.

"Big data has purpose, little data has hope." While current trends suggest Big Data-driven business is an avenue that requires substantial investments, the future will see a growth of Big Data apps from ISVs and the Small and Medium Enterprise segment as well. Moreover, as business grows, enterprises need to accommodate and manage the increasing volume, variety and velocity of the data that flows into their IT systems.

The conventional columnar designs and horizontal databases demand continuous expansion to store and retrieve this data. The sheer volume in itself weighs on the cloud-enabled schemas and sharding techniques, forcing enterprises to look for new ways to accept, model and discard the data. Findings of an MIT research project by Andrew McAfee and Erik Brynjolfsson indicate that companies which inject big data and analytics into their operations show productivity rates and profitability that are 5 to 6 percent higher than those of their peers.

To accommodate Big Data test requirements, processes and infrastructure will be redesigned to achieve new levels of scalability, reusability and compatibility, ensuring comprehensive, continuous and context-driven test capabilities. To handle the volume and ensure live data integration, Big Data testing needs to empower developers and enterprises with the freedom to experiment and innovate.

One data layer

From a Big Data perspective, enterprises will seek validation of application design, data security, source verification and compliance with industry standards. The parameters of performance, speed, security and load will add magnitude and precision to sculpt and reorganize data volumes into blocks that match the emerging requirements.

Over time, the database and storage layers will merge into a single data layer, with options for retrieval and transmission exported out of the layer.

Business leaders now look at data maps to estimate and draft plans for emerging scenarios. The transformation of data into comprehensive reports in real time will add value to business decisions and enrich operations with higher levels of speed and accuracy. Test capabilities will acquire the ability to de-complicate data sources, types and structures and channel them along specified contexts to align with objectives.

In a story titled "The Top 7 Things Obama Taught Us About the Future of Business," Forbes reported that the Obama campaign used a testing tool called Optimizely to improve efficiency. Dan Siroker, co-founder of Optimizely, was quoted as saying: "We ran over 240 A/B tests to try different messaging and calls to action in an attempt to raise more money. Because of our efforts, we increased effectiveness 49 percent."
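The campaign's numbers are reported results, but the question behind any A/B test is the same: is the observed lift larger than chance alone would produce? As a hedged illustration (the visitor and conversion counts below are invented, chosen only to produce a 49 percent lift like the one quoted), here is a minimal two-proportion z-test in plain Python.

```python
# Minimal two-proportion z-test for an A/B experiment. The sample numbers are
# invented for illustration; only the arithmetic of the test is the point.
import math


def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Return (relative lift of B over A, z-score for the difference)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / p_a, (p_b - p_a) / se


if __name__ == "__main__":
    # Hypothetical: 10,000 visitors per variant, 400 vs 596 sign-ups.
    lift, z = two_proportion_z(400, 10_000, 596, 10_000)
    print(f"lift: {lift:.0%}, z-score: {z:.1f}")   # |z| > 1.96 ~ significant at 95%
```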

Why Big Data is a good opportunity for software testers

Consider this: a joint report by NASSCOM and CRISIL Global Research & Analytics suggests that by 2015, Big Data is expected to become a USD 25 billion industry, growing at a CAGR of 45 per cent. Managing data growth is the number two priority for IT organizations over the next 12-18 months. In order to sustain growth, enterprises will adopt next-generation data integration platforms and techniques, fueling the demand for Quality Assurance mechanisms around the new data perspectives.

Be a smart tester and ride the next wave of IT on Big Data

Testers can formulate service models through operational exposure to data acquisition techniques on Hadoop and related platforms. Test approaches can be developed by studying the deployment strategies of Mahout, Java, Python, Pig, Hive etc. Contextualization of data from diverse sources to streamlined outputs helps testers understand the channels of business logic in the data science.
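As a sketch of what contextualizing data from diverse sources into one streamlined output can look like, the snippet below uses PySpark, one of the Hadoop-ecosystem tools a tester is likely to encounter alongside Pig and Hive, to join a semi-structured JSON click-stream with a structured CSV customer extract. All paths, column names and the join key are assumptions made for the example, not a prescribed approach.

```python
# Hypothetical PySpark sketch: join two differently shaped sources into one
# streamlined, testable output. Paths, schemas and the join key are invented.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("contextualize-sources").getOrCreate()

clicks = spark.read.json("hdfs:///raw/clickstream/")          # semi-structured events
customers = spark.read.csv("hdfs:///raw/customers.csv",
                           header=True, inferSchema=True)     # structured extract

# One flat, query-friendly table that a tester can validate with simple counts
# and spot checks instead of reasoning about two raw formats at once.
streamlined = (clicks
               .join(customers, on="customer_id", how="left")
               .groupBy("customer_id", "segment")
               .agg(F.count("*").alias("events"),
                    F.countDistinct("page_url").alias("distinct_pages")))

streamlined.write.mode("overwrite").parquet("hdfs:///curated/click_summary/")
spark.stop()
```

A tester can then run the same aggregate queries against the curated output and against the raw sources to confirm that nothing was lost or duplicated along the way.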

Big Data is an emerging discipline that will have a profound impact on the global economy. The ability to explore the power of Big Data testing puts testers in a hotspot that will see plenty of action as innovations emerge to match new test requirements.