Data Quality and Data Integration: The Keys for Successful Data Warehousing

MAS Strategies

©2004 Group 1 Software, Inc. All rights reserved.



Many organizations have successfully implemented data warehouses to analyze the data contained in their multiple operational systems and to compare current and historical values. By doing so, they can better, and more profitably, manage their business, analyze past efforts, and plan for the future. When properly deployed, data warehouses benefit the organization by significantly enhancing its decision-making capabilities, thus improving both its efficiency and effectiveness.

However, the quality of the decisions that are facilitated by a data warehouse is only as good as the quality of the data contained in the data warehouse - this data must be accurate, consistent, and complete. For example, in order to determine its top ten customers, an organization must be able to aggregate sales across all of its sales channels and business units and recognize when the same customer is identified by multiple names, addresses, or customer numbers. In other words, the data used to determine the top ten customers must be integrated and of high quality. After all, if the data is incomplete or incorrect, then so will be the results of any analysis performed upon it.
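To make the top-ten-customers point concrete, the following sketch (hypothetical data and customer numbers) shows why consolidating duplicate customer identifiers must happen before the aggregation, not after:

```python
from collections import defaultdict

# Hypothetical illustration: the same customer appears under different
# customer numbers in different sales channels. A data quality process
# resolves each variant identifier to one canonical customer.
canonical_id = {
    "C-1001": "ACME",   # web channel
    "A-77":   "ACME",   # retail channel: same customer, different number
    "C-2002": "GLOBEX",
}

sales = [
    ("C-1001", 120_000.0),
    ("A-77",    80_000.0),
    ("C-2002",  90_000.0),
]

totals = defaultdict(float)
for cust, amount in sales:
    totals[canonical_id[cust]] += amount

# Rank customers by consolidated revenue. Without the canonical map,
# ACME's two accounts would rank separately and understate the customer.
top = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
print(top)  # [('ACME', 200000.0), ('GLOBEX', 90000.0)]
```

The ranking is only correct because the integration step ran first; skipping it silently produces a plausible but wrong answer.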



Concepts Underlying a Data Warehouse:

The data warehouse concept originated in an effort to solve data synchronization problems and resolve data inconsistencies that resulted when analysts acquired data from multiple operational or production systems. One of the most important functions of a data warehouse is to serve as a collection point for consolidating and further distributing data extracts from an organization's production systems. The data warehouse also must ensure that this data is uniform, accurate, and consistent, and thereby serves as a "single version of truth" for the enterprise.

However, this is much more complicated than it might first appear, especially since each production system was developed to satisfy a particular operational need. Consequently, each application system was designed with its own data standards and thus was poorly integrated with other systems. This integration is particularly challenging when dealing with legacy systems that were implemented before any real effort was made to establish enterprise data standards or even common data definitions.

Even if we lived in a world with enough disk space and CPU resources to allow time-stamped data values from each transaction associated with every production system to be saved forever, year-end data purges never took place, and computers could quickly read and aggregate all this

data for analysis, data warehouses would still be desirable. At a minimum, the data warehouse would be needed to integrate the data in each system and establish a common format.

Moreover, not all of the data an organization requires for analysis purposes is stored in its operational systems. Consequently, data warehouses are frequently augmented with data from third-party content providers. This content might, for example, include customer demographics and lifestyle data, credit information, or geographic data used to determine distances from firehouses, telephone company central offices, or even tax jurisdictions. Data warehouses are also likely to contain derived data fields and summary values resulting from the consolidation of data contained in one or more operational systems.

Even when organizations developed data standards, it was unlikely that they modified the existing operational systems to reflect these standards; rather, these standards were applied only when developing and implementing new systems. Consequently, when the data residing in these operational systems was needed to populate a data warehouse, it was often necessary to first transform the data from each source to be consistent with the enterprise data standards prior to loading it into the data warehouse. The data warehouse was sometimes the first attempt, and often the first place, that the data actually conformed to corporate standards!

Data integration and data quality are the two key components of a successful data warehouse, as both completeness and accuracy of information are of paramount importance. Once this data is collected, it can be made available both for direct analysis and for distribution to other, smaller data warehouses.

Challenge: Automate business processes associated with consumer purchases.

Solution: The organization deployed a data warehouse to enable information to flow effectively through the company by automating the entire supply chain. From the receipt of a customer order via the Internet, through sending the order to the manufacturer, and finally to organizing delivery logistics and invoicing - this retailer dramatically enhanced its end-to-end business processes.

Benefit: The acquired functionality included the ability to:
• Execute order placements
• Monitor the books
• Provide decision support
• Analyze transactional data, product performance, sales and earnings statistics, and information on customer experiences


Variations on a Theme: Data Warehouse, Data Mart, Operational Data Store, EII

The need to bring consistent data from disparate sources together for analysis purposes is the basic premise behind any data warehouse implementation. Based on this need, various data warehouse architectures and implementation approaches have evolved from the basic concept as originally formulated by Bill Inmon in his book "Building the Data Warehouse" (W.H. Inmon, 1992, John Wiley & Sons, Inc.). Inmon stated, "a data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management's decisions."

There are now a variety of approaches to data warehousing, including enterprise data warehouses, data marts, operational data stores, and enterprise information integration. However, most organizations deploy a hybrid combination, with each approach complementing the others. Although they may differ in content, scope, permanency, or update cycle, they all have two characteristics in common: the need to integrate data and the need for this data to be of high quality.

Data Warehouses:

From a conceptual perspective, data warehouses store snapshots and aggregations of data collected from a variety of source systems, and they encompass a variety of subject areas. Each of these source systems could store the same data in different formats, with different editing rules and different value lists. For example, gender code could be represented in three separate systems as male/female, 0/1, and M/F respectively; dates might be stored in a year/month/day, month/day/year, or day/month/year format. In the United States "03062004" could represent March 6, 2004, while in the United Kingdom it might represent June 3, 2004.

Data warehouses involve a long-term effort and are usually built in an incremental fashion. In addition to adding new subject areas with each iteration, the breadth of data content of existing subject areas is usually increased as users expand their analysis and their underlying data requirements.

Users and applications can directly use the data warehouse to perform their analysis. Alternately, a subset of the data warehouse data, often relating to a specific line of business and/or a specific functional area, can be exported to another, smaller data warehouse, commonly referred to as a data mart. Besides integrating and cleansing an organization's data for better analysis, one of the benefits of building a data warehouse is that the effort initially spent to populate it with complete and accurate data content further benefits any data marts that are sourced from the data warehouse.
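The gender-code and date-format examples above can be sketched as simple normalization rules. This is a minimal illustration with assumed conventions (in particular, the 0-means-male assignment is hypothetical); each source feed must declare which convention it uses:

```python
from datetime import datetime

# Hypothetical source code sets: three systems encode gender as
# male/female, 0/1, and M/F; the warehouse standardizes on "M"/"F".
# (Mapping 0 -> "M" and 1 -> "F" is an assumed convention.)
GENDER_MAP = {
    "male": "M", "female": "F",
    "0": "M", "1": "F",
    "m": "M", "f": "F",
}

def normalize_gender(raw) -> str:
    return GENDER_MAP[str(raw).strip().lower()]

def normalize_date(raw: str, source_format: str) -> str:
    # The same digits mean different dates per source convention:
    # "03062004" is March 6, 2004 under %m%d%Y (US) but June 3, 2004
    # under %d%m%Y (UK), so the format must travel with the feed.
    return datetime.strptime(raw, source_format).date().isoformat()

print(normalize_gender("Female"))             # F
print(normalize_gender(0))                    # M
print(normalize_date("03062004", "%m%d%Y"))   # 2004-03-06 (US feed)
print(normalize_date("03062004", "%d%m%Y"))   # 2004-06-03 (UK feed)
```

The key design point is that the date string alone is ambiguous; only metadata about the source resolves it, which is exactly the kind of knowledge a data integration process must capture.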


Challenge: Provide access to information on production efficiency, sales activities, and logistics and transform it into useful intelligence. Enable users to query data sources without having to rely on IT assistance.

Solution: Implemented a Web-based data warehousing solution, providing the ability to:
• Access and analyze data from anywhere via the Web
• Analyze group sales profits down to the customer or individual product level

Benefit:
• Reliability of product shipments
• Reduced manufacturing costs

Data Marts:

A data mart, if populated from a data warehouse, contains a subset of the data from the data warehouse. If this is the case, then it is generally considered to be a dependent data mart and can be implemented relatively quickly, as the data has already been collected and integrated within the data warehouse. The quality of its content is directly dependent upon the contents of the data warehouse.

Independent data marts are those that are developed without regard to an overall data warehouse architecture, perhaps at the departmental or line-of-business level, typically for use as a temporary solution. As an independent data mart cannot rely on an existing data warehouse for its content, implementation will take longer than for a dependent data mart - assuming, of course, that the data warehouse used to populate the dependent data mart already existed. Even though a data mart operates independently of any other data mart or data warehouse, it is still important that the data within it be complete and accurate. If not, erroneous analysis is likely to occur and invalid conclusions drawn.

Pragmatically, an independent data mart may be the only viable approach when the existing enterprise warehouse is being built incrementally and the data needed by the data mart is not yet available from the warehouse. Building a corporate data warehouse on a "subject by subject" approach is certainly a reasonable and proven strategy. Many organizations that have tried to populate their enterprise data warehouses with data for all requested subject areas prior to initial rollout have found that this was akin to attempting to "boil the ocean" - the task was simply too overwhelming to be realistically accomplished in anything other than a phased approach.

It is reasonable to assume that an organization's independent data marts will ultimately be combined. Eventually they will lose their independence as individual data needs are satisfied through an enterprise data warehouse. Combining the content requirements of these independent data marts to determine the contents of the enterprise data warehouse will be significantly easier if each data mart contains high-quality, complete data. This "bottom-up" approach - using the requirements of existing independent data marts to determine the requirements of the data warehouse from which they will later be populated - has been effective in organizations where several departments first needed to quickly implement their own solutions. These organizations simply could not wait for their "top down" data warehouse to first be built.

Operational Data Stores:

A common problem in many organizations is the inability to quickly combine operational data about the same entity, such as a customer or vendor, that exists in multiple systems. A classic example occurred when banking institutions first started adding new service offerings, such as investment accounts, to their more traditional savings and checking account offerings. Many of these new services were supported by systems that existed independently. When the bank needed to see all of the current financial information it had about a customer, it needed to combine and consolidate data from all of these systems - assuming, of course, it could identify that a customer whose account information resided in several systems was the same customer. As this need became increasingly important, the operational data store (ODS) came into vogue.


A primary difference between data warehouses and operational data stores is that while a data warehouse frequently contains multiple time-stamped historical data snapshots, with new snapshots being added on a well-defined periodic schedule, an operational data store contains current values that are continually in flux. A data warehouse adds new time-stamped data values and retains the old ones; an operational data store updates existing data values. While the initial load and continual updating of the operational data store are classic examples of data integration, the ability to identify and link different accounts, each captured from a different system, as belonging to the same customer is a classic example of data quality. This underscores the importance of, and interdependence between, data quality and data integration when solving real-world business problems.

Enterprise Information Integration (EII):

While not necessarily a new concept, the idea of enterprise information integration, or EII, has received much publicity in the past few years. Simply stated, it involves quickly bringing together data from multiple sources for analysis purposes without necessarily storing it in a separate database. Some vendors have even gone so far as to claim that an EII approach can replace a traditional data warehouse or data mart with a "virtual data warehouse" by eliminating the need to extract and store the data in another database. However, the ramifications associated with this approach (e.g., the underlying data changing, or even being purged, between analyses) must not be overlooked.
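The contrast between appending snapshots and updating in place can be shown in a few lines (a hypothetical schema, not any particular product's storage model):

```python
from datetime import date

# Illustrative contrast: a warehouse appends time-stamped snapshots
# and keeps history; an ODS overwrites the current value in place.
warehouse = []   # list of (snapshot_date, customer, balance)
ods = {}         # customer -> current balance only

def load_snapshot(snap_date, balances):
    # Data warehouse behavior: retain history by appending a snapshot.
    for customer, balance in balances.items():
        warehouse.append((snap_date, customer, balance))

def apply_update(customer, balance):
    # Operational data store behavior: replace the existing value.
    ods[customer] = balance

load_snapshot(date(2004, 1, 31), {"ACME": 100.0})
load_snapshot(date(2004, 2, 29), {"ACME": 150.0})
apply_update("ACME", 100.0)
apply_update("ACME", 150.0)

print(len(warehouse))   # 2 - both historical values retained
print(ods["ACME"])      # 150.0 - only the current value survives
```

The warehouse can answer "what was the balance in January?"; the ODS, by design, cannot.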

That said, an EII solution certainly complements the other data warehousing variants and can be a valuable resource for those wishing to perform quick, perhaps ad hoc, analysis on current data values residing in operational systems. It can help alleviate the backlog of requests that is a constant struggle for any IT staff.

Organizations must, however, recognize that the data in these operational systems may not be consistent across sources, that the data quality of each source may vary widely, and that historical values may not be available. This is a risk many users are willing to take for "quick and dirty" analysis when the needed data is not contained in a formal data warehouse or data mart. In fact, many organizations use an EII approach to establish processes and programming logic that enable their users to transform and pull together data from multiple sources for purposes that include desktop analysis. Of course, if the quality of the data in the underlying operational systems were high, the EII analysis would obviously benefit.

EII solutions can also be successfully used to prototype or evaluate additional subject areas for possible inclusion in a data warehouse. Some organizations have initially deployed EII solutions when their data warehouses or data marts did not contain the needed data and later added this data to their data warehouse content. In order to combine current and historical values, organizations can include an existing data warehouse or data mart as one of their EII sources.



Data Integration and Data Quality

While estimates vary, it is generally agreed that data integration and data quality represent the majority of the cost of implementing a data warehouse. Lack of data integration and poor data quality are the most common causes of post-implementation data warehouse failures. First impressions count; if decision-makers find that the data warehouse contains incomplete or incorrect data, they are likely to seek their data elsewhere.

At a very simple level, data integration and data quality are concerned with collecting accurate data, transforming it into a common format with a common set of values, providing appropriate aggregations or summary tables, and loading it into the data warehouse environment. This sounds simple enough, but there are many complicating factors that must be considered.

Multiple Data Sources:

The requisite data is likely to be stored in a variety of systems, in a variety of formats, and on a variety of platforms. Assuming the required data resides in a computer system, not on paper in a file cabinet, the data sources may be relational databases, XML documents, or legacy data structures (such as Adabas, IDMS, IMS, VSAM, or even sequential EBCDIC files). It may be contained in packaged enterprise application systems such as SAP or Siebel, where knowledge of the business logic is necessary to understand and access the underlying data. The data may reside on a wide variety of computing platforms, including mainframe, Unix, Windows, and Linux environments.

Data Transformations:

The detailed data residing in the operational systems must frequently be consolidated in order to generate and store, for example, daily sales by product by retail store, rather than the individual line-item detail for each cash register transaction. Complex arithmetic or statistical calculations frequently need to be applied to the source data in order to perform, for example, percent-of-sales calculations or "top x" rankings, especially if low-value items are grouped into an "others" category before being loaded into the warehouse.
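The "top x" ranking with an "others" bucket described above can be sketched as a pre-load transformation (hypothetical product data):

```python
# Hypothetical transformation: rank products by sales, keep the top 3,
# and collapse the remainder into an "others" bucket before loading.
sales_by_product = {
    "widgets": 500.0, "gears": 400.0, "bolts": 300.0,
    "nuts": 50.0, "washers": 25.0,
}

TOP_N = 3
ranked = sorted(sales_by_product.items(), key=lambda kv: kv[1], reverse=True)
top = ranked[:TOP_N]
others_total = sum(amount for _, amount in ranked[TOP_N:])

warehouse_rows = top + [("others", others_total)]
total = sum(amount for _, amount in warehouse_rows)

# Percent-of-sales is another derived value computed before the load,
# so analysts never have to recompute it against raw line items.
with_pct = [(name, amount, round(100 * amount / total, 1))
            for name, amount in warehouse_rows]
print(with_pct)
```

Storing these derived rows trades detail for query speed, which is exactly why the consolidation happens in the integration layer rather than at query time.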

Data Linking:

In many cases, data records concerned with the same object (for example, a given customer, employee, or product) reside in multiple source systems. These records must first be linked together and consolidated prior to being loaded into the data warehouse. Integration with data quality software is often the only realistic way of matching these records, especially when dealing with the nuances involved in identifying customers or vendors. Each system could contain its own variation, in format or spelling, of the same customer name and/or address. Once again, data quality software can greatly facilitate this task.

Data Augmentation:

Data warehouses and data marts are not exclusively populated from the data contained in an organization's operational systems. It is frequently desirable to augment this data with information from outside sources. While organizations can try to collect this on their own, there are many commercial sources of company-centric and people-centric data that can be used to augment data warehouse content.

Data Volumes:

It is frequently necessary to load very large data volumes into the warehouse in a short amount of time, thereby requiring parallel processing and memory-based processing techniques. While the initial data loads are usually the most voluminous, organizations have a relatively long load window in which to accomplish this task, since the initial load is done prior to the data warehouse being opened for production. After the data warehouse is in use, new

data content must be loaded on a periodic basis. The load volume can be reduced if change data capture techniques are employed to capture only data that has changed since the prior data load. In some cases, Enterprise Application Integration (EAI) technology, frequently involving message queues, can be used to link enterprise applications to the data warehouse data integration processes in order to capture new data on a near-real-time basis.

Collaborative User/IT Development Efforts:

Many data warehouses have been implemented only to quickly discover that the data content was not what the users had in mind. Much finger-pointing and general ill will can be avoided if the data integration staff can work collaboratively with the end-user analysts. They should be able to view the results of the data transformation process on real data rather than trying to interpret somewhat abstract data flow diagrams and transformation descriptions. The ability to view live data and the associated transformation processes involved in the data integration process can help avoid nasty surprises when, for example, a field in the source system thought to contain telephone numbers actually contains text data.
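The change data capture idea can be sketched as follows. This is a minimal, hypothetical illustration that diffs two full extracts; production CDC implementations typically read database transaction logs or update timestamps instead of comparing snapshots:

```python
# Minimal, hypothetical change data capture: compare the current extract
# against the previous one and load only new or changed rows.
previous = {"C1": ("Alice", 100), "C2": ("Bob", 200)}
current  = {"C1": ("Alice", 100), "C2": ("Bob", 250), "C3": ("Carol", 50)}

changed = {key: row for key, row in current.items()
           if previous.get(key) != row}

# Only C2 (updated) and C3 (new) need to be loaded this cycle,
# shrinking the load volume and the load window.
print(sorted(changed))  # ['C2', 'C3']
```

Either way, the payoff is the same: the periodic load handles the delta rather than the full data volume.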

Changing Requirements:

A successful data warehouse builds user momentum and generates increased user demand, resulting in a larger user audience and new data requirements. The data integration processes must be able to quickly respond to new data sources, or to changes in the underlying file structure of the existing source systems, without compromising existing processes or causing them to be rewritten. Increased user demand often translates into a narrower data warehouse load window, especially if new users are in geographic areas that now require access to the data warehouse at times during which there was previously little or no user demand.

Metadata Integration:

Most data integration tools store the metadata (or data about data) associated with their sources and targets in a metadata repository that is included with the product. At a minimum, this metadata includes information such as source and target data formats, transformation rules, business processes concerned with data flows from the production systems to the data warehouse (i.e., data lineage), and the formulas for computing the values of any derived data fields. While this metadata is needed by the


Challenge: Collect and analyze large volumes of data in different formats from both internal and external sources to optimize business processes and make predictive calculations for planning purposes.

Solution: A data warehouse that provided the infrastructure to manage high volumes of data critical to decision-making processes:
• Hourly forward and historic energy and capacity positions, including mark-to-market (exposure to price volatility) for two years into the future. The positions can be viewed hourly, daily, weekly, monthly, and yearly
• Retail choice analysis and statistics, including alternate supplier load and capacity obligation, which can be viewed by load aggregator, supplier, zone, and rate class
• Customer shopping statistics that provide forward and historic views of customer demand, including the measurement of customer churn
• Weather analysis and statistics for historical and forward views of temperature, dew point, and wind speed
• Portfolio performance reporting that measures the impact of business decisions over time
• Short-term energy deal "what-if" analysis

Benefits:
• Ability to manage its data in all of its different formats
• Develop tools for analysis
• Ultimately deliver it to the browsers of market analysts and managers in its organization

Consequently, the company was able to make the best decisions possible using the best information available.



data integration tool for use in defining and creating appropriate data transformation processes, its value is enhanced when shared with the other tools utilized in designing the data warehouse tables and with the business intelligence tools that access the warehouse data. If the metadata also includes information about which analysis program uses which data element, it can be a valuable source for analyzing the ramifications of any change to a data element (i.e., impact analysis).

Additional Thoughts on Data Quality:

Data quality is involved throughout the entire data warehousing environment and is an integral part of the data integration process. Data quality involves ensuring the accuracy, timeliness, completeness, and consistency of the data used by an organization, while also making sure that all parties utilizing the data have a common understanding of what the data represents. For example, does sales data include or exclude internal sales, and is it measured in units or dollars, or perhaps even euros?

In most data warehousing implementations, data quality is applied in at least two phases. The first phase is concerned with ensuring that the source systems themselves contain high-quality data, while the second phase is concerned with ensuring that the data extracted from these sources can then be combined and loaded into the data warehouse. As mentioned earlier, even if the data residing in each of the sources is already accurate and clean, it is not simply a matter of directly combining the individual sources, as the data in each source could exist in a different format and use a different value list or code set. One system might use the alphabetic codes (S, M, D) to represent "single," "married," and "divorced," while another might represent them with the numeric codes (1, 2, 3). The data loaded into the warehouse must conform to a single set of values; data cleansing and data

transformation technology must work together to ensure that they do. Of course, duplicate occurrences of the same customer or vendor across multiple systems, or even within the same system, with different variations of the same name and/or address, are a well-known example of a data quality issue that was discussed previously.
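The code-set harmonization described above (S/M/D in one system, 1/2/3 in another) can be sketched as a mapping step in the load process; the source labels and reject handling here are hypothetical:

```python
# Hypothetical code-set harmonization: two sources encode marital status
# differently; both are mapped to one warehouse-standard value list.
SOURCE_A = {"S": "single", "M": "married", "D": "divorced"}
SOURCE_B = {"1": "single", "2": "married", "3": "divorced"}

def to_standard(value, source):
    mapping = SOURCE_A if source == "A" else SOURCE_B
    try:
        return mapping[str(value).strip().upper()]
    except KeyError:
        # Route unmapped codes to data quality review rather than
        # silently loading a bad value into the warehouse.
        raise ValueError(f"unmapped {source} marital status: {value!r}")

print(to_standard("M", "A"))  # married
print(to_standard("2", "B"))  # married
```

The explicit failure path matters as much as the mapping: a value outside the agreed code set is a data quality defect to be surfaced, not a row to be loaded.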

Approaches to DQ/DI

The "Do It Yourself" Approach:

Many organizations have attempted to access and consolidate their data through in-house programming. After all, how difficult can it be to write a few programs to extract data from computer files? Assuming for a moment that the files are documented (and the documentation is up-to-date), the programming team has in many cases succeeded, although usually after the originally estimated completion date. Unfortunately, the initial extract and load is usually the easy part!

It is a fact of life and systems that "things change." Even when the initial data integration programs work as required, there will be a continuing need to maintain them and keep them up-to-date. This is one of the most overlooked costs of the "do it yourself" approach to data integration and one that is frequently ignored when estimating the magnitude of any in-house data integration effort. This approach frequently does not consider data quality. Even when it does, only the most obvious data quality issues are addressed, as the organization's programming staff does not have the time or experience to build data quality tools comparable to those offered by commercial data quality vendors.

Trends in Data Warehousing

Several trends are developing in the data warehouse market, many of which are directly concerned with data integration and data quality. These include:
• EAI and ETL will continue to converge due to the need to update the data warehouse with the most recent transactions
• The use of "active" data warehouses that directly feed analytical results back to operational systems will grow
• Pragmatic hybrid approaches to data warehousing will continue to win out over insistence on architectural purity
• Data quality will be recognized as an up-front requirement for both operational and analytical systems efforts, rather than an after-the-fact fix
• EII will succeed when marketed as complementary to, not a replacement for, traditional data warehouses and data marts
• Disparate data integration tools will give way to platforms that provide end-to-end data integration functionality
• Data integration platforms will be callable both through direct application programming interfaces and as Web services

Additionally, most packaged data integration software has a metadata repository component that allows for sharing of metadata with other data warehouse components such as database design and business intelligence tools. However, in-house software frequently does not provide for sharing its own metadata or for leveraging the metadata collected by other data warehouse components. In fact, the metadata collected in the "do it yourself" approach is usually rather limited and may only be contained in COBOL file descriptions for the input and output formats or in the actual program code for the transformation and aggregation logic. In general, metadata residing in "home-grown" software cannot be readily shared with other data warehouse tools.

Commercial Data Integration Tools:

Fortunately, the industry has recognized the need for data integration tools, and a variety of offerings are commercially available. An appropriate tool should be capable of accessing the organization's data sources and should provide connectors for packaged enterprise application software systems and XML data sources. It should include a powerful library of built-in data transformation and data aggregation functions that can be extended with the addition of new functions developed by the deploying organization. It must, of course, be able to populate the target data warehouse databases, be they relational databases or proprietary OLAP cubes.

The tool should also have sufficient performance not only for the initial data requirements but also for the additional content, or additional generations of current content, that can be expected in the future as the data warehouse subject areas grow - perhaps through the use of change data capture techniques. Sufficient headroom should be available to be

able to handle not only the current update frequency and data volumes but also anticipated future requirements. Both batch ETL loads and (should future plans require this capability) event-driven "near-real-time" EAI transactions should be supported.

The tool should provide a simple, yet powerful, user interface allowing users to visually perform the data integration tasks without having to write low-level code or even utilize a high-level programming language or 4GL. It should be usable by a variety of audiences, not just data integration specialists. Users and data integration specialists would be able to collaborate in an iterative process and, ideally, view the transformation process against live data samples prior to actually populating the data warehouse.

The data integration tool should itself be easy to integrate with other technology in use by the organization and should be callable through a variety of mechanisms, including application programming interfaces (APIs) and Web services. The technology should also be deployable as a standalone tool. Of particular importance is the ability to perform data cleansing and validation, or to be integrated with tools that can provide data quality. Strong metadata capabilities should be included, both for use by the data integration tool itself and to facilitate the sharing of metadata with the business intelligence tools that will access the integrated data.

If directly licensed, the product should operate in a "lights out" environment by setting up appropriate job steps and event-driven conditional error-handling routines, with the ability to notify the operations staff if an error condition is encountered that cannot be resolved. It should be offered with several pricing models so it can be economically deployed, such as through an application service provider (ASP), for utilization by small organizations or smaller


Challenge: Deliver real-time information to customers and improve information access among employees.

Solution: In under ninety days, deployed a data warehouse that provided access to:
• Information about all members, accounts, and products
• An integrated view of the business through one single entry point
• Detailed transaction information on loans, credit cards, and share products
• General ledger information for accounting and profitability analysis

Benefits:
• Can now provide reports to key decision makers more quickly
• Provide account information to members on demand



units of large enterprises.

Commercial Data Quality Tools:

Many organizations have attempted to resolve data quality issues through their own devices as well. While edit routines can be built into programs to check for proper formats, value ranges, and field value dependencies within the same record, these checks are relatively simple when compared, for example, to ensuring that a customer address is valid and up-to-date.

Data quality vendors, once best known for name and address correction and validation, have significantly expanded their capabilities in terms of the scope of the data they can handle (i.e., non-name-and-address data), the volumes they can support, and the databases they maintain in order to validate data. Many have expanded their offerings to include both software licensing and ASP delivery.

Data Integration Tools Supplied by Database Vendors:

Most database vendors now offer data integration tools packaged with, or as options for, their database offerings. Although these tools should certainly be considered when evaluating data integration products, it is important to recognize that database vendors want their own database to be at the center of their customers' data universe. Consequently, a data integration tool from a database vendor could very well be optimized to populate the vendor's own databases by, for example, taking advantage of proprietary features of that database. If used to populate other databases, it may not perform as well, or even at all.

That said, if an organization has standardized on a particular database for all of its data warehousing projects, the data integration tools offered by that database vendor could be used as the basis against which other data integration tools are compared.

Summary
Today almost all organizations recognize the significant advantages and value that data warehousing can provide, both for pure analysis and as a complement to operational systems. While data warehouses exist in many forms, including enterprise-scale centralized monoliths, dependent and independent data marts, operational data stores, and EII implementations, they all benefit from complete, consistent, and accurate data. After all, what is the value of any analysis that is based upon faulty or incomplete information?

While an organization's overall data warehouse architecture can encompass a variety of forms, each organization must decide what is right for its own purposes and recognize that implementing a successful data warehousing environment is a continuous journey, not a one-time event.

Whatever the choice, two things are certain: data integration and data quality will be key components of, if not the enabling technology for, the organization's data warehousing success. Data integration is an ongoing process that comes into play with each data load and with each subject-area extension; the quality of the data in the warehouse must be continually monitored to ensure its accuracy. Organizations that ignore these requirements risk inadvertently creating, instead of a data warehouse that benefits its users, a repository that provides suboptimal business value.

Many data warehousing industry vendors can provide robust data integration and data quality solutions. In addition to developing and marketing products, these vendors offer a wealth of experience and expertise that they can share with their customers. As a result, an organization is best served when it deploys a commercial, fully supported and maintained set of tools rather than trying to develop and maintain such technology on its own.
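The continual monitoring called for above can start with something as modest as tracking completeness metrics on each load batch. The minimal sketch below assumes a hypothetical batch represented as a list of dictionaries, with illustrative field names; a production warehouse would compute such metrics against its actual staging tables and trend them over time.

```python
# Hypothetical load batch; field names are illustrative only.
rows = [
    {"cust_id": "C1", "email": "a@example.com"},
    {"cust_id": "C2", "email": None},
    {"cust_id": None, "email": "c@example.com"},
]

def completeness(batch, field):
    """Fraction of rows in the batch where the given field is populated."""
    filled = sum(1 for r in batch if r.get(field))
    return filled / len(batch)

# Report a completeness score per monitored field for this load.
for field in ("cust_id", "email"):
    print(f"{field}: {completeness(rows, field):.0%} complete")
```

A sudden drop in a score like this between loads is often the first visible symptom of an upstream extract or integration problem.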


About MAS Strategies
MAS Strategies specializes in helping vendors market and position their business intelligence and data warehousing products in today's highly competitive market. Typical engagements include SWOT analysis, market research, due diligence support, white papers, public presentations, and helping organizations evaluate tactical and strategic product and marketing decisions.

MAS Strategies also assists user organizations in their data warehouse procurement evaluations, needs analyses, and project implementations. For more information about MAS Strategies, visit its Web site at www.mas-strategies.com.

About Group 1 Software
Group 1 Software (Nasdaq: GSOF) is a leading provider of solutions that help over 3,000 organizations worldwide maximize the value of their customer and other data. Group 1 provides industry-leading technologies that allow businesses to cleanse and enrich their corporate data, generate personalized customer communications, and integrate and deliver data across the enterprise. These technologies are essential components of enterprise applications including customer relationship management (CRM), enterprise resource planning (ERP), and business intelligence systems.

Founded in 1982 and headquartered in Lanham, Maryland, Group 1 offers solutions utilized by leaders in the financial services, banking, GIS/mapping, retail, telecommunications, utilities, insurance, and other industries. The company's customer base includes such recognized names as Entergy, GEICO, L.L. Bean, MapQuest, QVC, Siemens, Wal-Mart, and Wells Fargo. For more information about Group 1, visit the company's Web site at http://www.g1.com.

4200 Parliament Place, Suite 600 • Lanham, MD 20706-1844
Tel: (888) 413-6763 • Fax: (301) 918-0735 • Web: http://www.g1.com

Group 1 Software and the Group 1 logo are registered trademarks of Group 1 Software, Inc. ©2004 Group 1 Software, Inc. All rights reserved.